
My understanding is that much of the data collected at the Large Hadron Collider is similar to that in the image below, and that a vast amount of it contains little of specific and immediate interest.

My understanding is that data is recorded at about 700 MB/s and that roughly 15 petabytes are collected per year.

I assume this is far too much to be examined manually, as the second image below shows.

What automated methods are used by physicists to cull the vast amount of data collected and find results that are of interest?

I'm not referring to the vast number of computers throughout the world that are aggregated to do the data reduction. What I am looking for is the methods that are used. I'm particularly interested in how AI or neural networks are being used, if they are. What papers have been published on this?

If someone could help with the tags on this question, I'd appreciate it.

[Image by Lucas Taylor / CERN - http://cdsweb.cern.ch/record/628469, CC BY-SA 3.0]

[Image by Fermilab - http://history.fnal.gov/brochure_src/events_study.html]

innisfree
Jim

2 Answers


First, let me break your question into two pieces:

What automated methods are used by physicists to (i) cull the vast amount of data collected and (ii) find results that are of interest?

Regarding (i), the technique used is a so-called 'trigger system'. A trigger system decides which events to record to disk and which to discard. It is designed to reduce the frequency of events from about one billion per second to about 1000 per second. The trigger system is composed of three parts:

  • The Level 1 trigger, implemented as custom-built hardware, takes coarse-grained information about an event from the calorimeter. It makes a quick decision (about 2 μs) about whether to proceed with the event.
  • The Level 2 trigger, implemented as software, takes further information, though not the whole event, to make a decision about whether to veto. This level uses, e.g., Kalman filtering to judge whether tracks lead back to a hard scatter.
  • Level 3 and onwards are software triggers that consider all information about an event.

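The cascade above can be sketched as a chain of increasingly selective filters. This is purely an illustrative toy in Python, not CERN code: the variables, thresholds, and rates are invented, and real triggers run in hardware and dedicated software frameworks.

```python
import random

# Toy sketch of a multi-level trigger: each level is a predicate, and an
# event is written to disk only if every level accepts it.

def level1(event):
    # Coarse calorimeter cut: total deposited energy above a threshold.
    return event["calo_energy"] > 20.0

def level2(event):
    # Refined cut using partial tracking information.
    return event["n_tracks"] >= 2

def level3(event):
    # Full-event software decision.
    return event["calo_energy"] > 30.0 or event["has_muon"]

TRIGGER_CHAIN = [level1, level2, level3]

def passes_trigger(event):
    """Cheap levels run first, so most events are rejected early."""
    return all(level(event) for level in TRIGGER_CHAIN)

random.seed(0)
events = [
    {
        "calo_energy": random.expovariate(1 / 10.0),  # most events are soft
        "n_tracks": random.randint(0, 5),
        "has_muon": random.random() < 0.05,
    }
    for _ in range(100_000)
]
kept = [e for e in events if passes_trigger(e)]
print(f"kept {len(kept)} of {len(events)} events")
```

The point of the ordering is economy: the cheap Level-1 decision discards the bulk of events before the more expensive software levels ever see them.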
If an event passes the triggers, it is saved to disk. Now regarding (ii), finding results that are of interest: what is of interest depends greatly on what you are looking for. Broadly speaking, whatever you are looking for, the same signature can be mimicked by other, so-called background processes. An experimental team will therefore select events in which a high ratio of signal to background is expected.

The features to select on were historically chosen using physical insight guided by simulations. Recently, though, we have indeed seen the advent of machine learning techniques for this problem. See, e.g., arXiv:1807.06038, 1902.02634, 1806.02350. It is typically framed as a classification problem: we must classify events as signal or background.
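As an illustration of the traditional cut-based selection (a toy sketch with made-up distributions, not data or cuts from any experiment), one can scan a window on a single discriminating variable and keep the cut that maximises the naive significance S/√B:

```python
import math
import random

# Hypothetical discriminating variable: signal peaks near 125,
# background falls steeply. Both samples are invented for illustration.
random.seed(1)
signal = [random.gauss(125.0, 5.0) for _ in range(1_000)]
background = [random.expovariate(1 / 80.0) for _ in range(50_000)]

def significance(cut_lo, cut_hi):
    """Naive S/sqrt(B) for events falling inside the window."""
    s = sum(cut_lo < x < cut_hi for x in signal)
    b = sum(cut_lo < x < cut_hi for x in background)
    return s / math.sqrt(b) if b > 0 else 0.0

# Scan symmetric windows around the peak and keep the best half-width.
best = max(range(1, 30), key=lambda w: significance(125 - w, 125 + w))
print("best half-width:", best,
      "significance:", round(significance(125 - best, 125 + best), 2))
```

A machine-learning classifier plays the same role as the window here, but it can combine many features at once instead of cutting on one variable.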

I should point out that machine learning is also popular in a smaller sub-problem: jet reconstruction, where one tries to reconstruct so-called jets from the hadronic calorimeter. See, e.g., arXiv:1609.00607.
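To give a flavour of what jet reconstruction means, here is a crude cone-style grouping of calorimeter deposits. This is only a toy I made up for illustration; the LHC experiments actually use sequential-recombination algorithms such as anti-kt, and the tower values below are invented.

```python
import math

def delta_r(a, b):
    """Angular distance in (eta, phi) between two (pt, eta, phi) deposits."""
    dphi = abs(a[2] - b[2])
    if dphi > math.pi:
        dphi = 2 * math.pi - dphi  # phi wraps around
    return math.hypot(a[1] - b[1], dphi)

def cone_jets(deposits, R=0.4):
    """Greedily seed a jet on the hardest remaining deposit and absorb
    everything within radius R. deposits: list of (pt, eta, phi)."""
    remaining = sorted(deposits, reverse=True)  # hardest first
    jets = []
    while remaining:
        seed = remaining[0]
        members = [d for d in remaining if delta_r(seed, d) < R]
        pt = sum(d[0] for d in members)
        eta = sum(d[0] * d[1] for d in members) / pt  # pt-weighted centroid
        jets.append((pt, eta, seed[2]))  # crude: keep the seed's phi
        remaining = [d for d in remaining if d not in members]
    return jets

towers = [(50.0, 0.1, 1.0), (20.0, 0.2, 1.1), (5.0, 0.15, 0.9),
          (30.0, -1.5, -2.0), (8.0, -1.6, -2.1)]
jets = cone_jets(towers)
print(len(jets), "jets, leading pt:", jets[0][0])
```

The machine-learning papers cited above go further, e.g. classifying which kind of particle gave rise to a jet from the pattern of its constituents.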

innisfree

The second picture is from a bubble chamber; life was simpler then because the recorded data of the event were right there. The first picture is from a detector experiment, and producing that image already required programming so that the tracks could be rendered by a computer. See this ATLAS talk on how a track is detected and identified, to understand why your question can only be answered with hand waving: even after an event with many tracks has been reconstructed, further decisions have to be taken for the physics analysis.

All the methods in use can be seen by going through the papers published by the experiments on the CERN document server. If you search for "CMS neural net" you will get a number of theses and publications that use the method in analyses in various ways.

anna v
  • Thanks. Getting some familiarity with the terminology and some of the papers is a big help! – Jim Feb 11 '19 at 06:20