TRACKING WITH GRAPHS
XIANGYANG JU
DANIEL MURNANE
ON BEHALF OF THE
EXA.TRKX COLLABORATION 1
OVERVIEW
1. Why ML/GNNs for High Energy Physics?
2. GNN applications in HEP/ExatrkX
3. ML Tracking Pipeline
4. Metric Learning
Code Walkthrough (PyTorch)
5. Doublet & Triplet GNN
6. DBSCAN for TrackML Score
Code Walkthrough (TensorFlow)
2
WHY MACHINE LEARNING FOR TRACKING?
The high-luminosity scaling problem means we need something
to complement traditional tracking algorithms,
but why graphs?
4
In other words…
[Plot: computing power vs. time, energy, and number of collisions. Predicted capacity vs. traditional methods (which scale quadratically). HL-LHC, 14 TeV, 2024–2026: 6 billion events/second]
WHY GRAPHS SPECIFICALLY?
The high-luminosity scaling problem means we need something
to complement traditional tracking algorithms,
but why graphs?
Graphs can capture inherent sparsity of much physics data
5
Hits to graphs
WHY GRAPHS?
The high-luminosity scaling problem means we need something
to complement traditional tracking algorithms,
but why graphs?
Graphs can capture inherent sparsity of much physics data
Graphs can capture the manifold and relational structure of much physics data
Conversion to and from graphs can allow manipulation of dimensionality
Graph Neural Networks are booming (i.e. we wouldn't be talking about graphs if there weren't a wealth of classic algorithms and NN models for graph data)
Industry research and investment means a good outlook for software and hardware optimised for graphs
6
APPLICATIONS
TrackML dataset ~ HL-LHC silicon: https://indico.cern.ch/event/831165/contributions/3717124/
High Granularity Calorimeter data: https://arxiv.org/abs/2003.11603
LArTPC data ~ DUNE experiment: https://indico.cern.ch/event/852553/contributions/4059542/
Quantum GNN for Particle Track Reconstruction: https://indico.cern.ch/event/852553/contributions/4057625/
GNNs on FPGAs for Level-1 Trigger: https://indico.cern.ch/event/831165/contributions/3758961/
7
THE EXATRKX PROJECT
MISSION
Optimization, performance and validation studies of ML approaches to the Exascale tracking problem, to enable production-level tracking on next-generation detector systems.
PEOPLE
• Caltech: Joosep Pata, Maria Spiropulu, Jean-Roch Vlimant, Alexander Zlokapa
• Cincinnati: Adam Aurisano, Jeremy Hewes
• FNAL: Giuseppe Cerati, Lindsey Gray, Thomas Klijnsma, Jim Kowalkowski, Gabriel Perdue, Panagiotis Spentzouris
• LBNL: Paolo Calafiura (PI), Nicholas Choma, Sean Conlon, Steve Farrell, Xiangyang Ju, Daniel Murnane, Prabhat
• ORNL: Aristeidis Tsaris
• Princeton: Isobel Ojalvo, Savannah Thais
• SLAC: Pierre Cote De Soux, Francois Drielsma, Kazuhiro Terao, Tracy Usher
9
https://exatrkx.github.io/
THE PHYSICAL PROBLEM
• “TrackML Kaggle Competition” dataset
• Generated by HL-LHC-like tracking (ACTS) simulation
• 9000 events to train on
• Each event has up to 100,000 layer hits from around 10,000 particles
• Layers can be hit multiple times by the same particle (“duplicates”)
• Non-particle hits present (“noise”)
10
THE PHYSICAL PROBLEM
• Need to construct the hit data into graph data, i.e. nodes and edges
• Can use geometric heuristics (used in the past: ~45% efficiency, 5% purity)
• To improve performance, use a learned embedding construction
• The ideal final result is a “TrackML score” S ∈ [0, 1]
• All hits belonging to the same track labelled with the same unique label ⇒ S = 1
12
THE PHYSICAL PROBLEM
• We will follow a particular
event, let’s say event #4692
• We will follow a particular
particle, let’s say particle
#1058360480961134592
• It’s a mouthful, so let’s call
her Diane
13
TRACKING PIPELINE
1. Metric Learning
2. Doublet GNN
3. (Optional) Triplet GNN
4. DBSCAN → TrackML score
[Pipeline diagram: raw hit data embedded → filter likely, adjacent doublets → train/classify doublets in GNN → filter, convert to triplets → train/classify triplets in GNN → apply cut for seeds → DBSCAN for track labels]
14
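The final pipeline step, turning edge-classified hits into track labels with DBSCAN, can be sketched with scikit-learn. This is a minimal sketch on toy 2-D points: in the real pipeline the clustering runs on the GNN-processed representation, and the `eps`/`min_samples` values here are illustrative, not production settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy stand-in for the final hit representation; the real pipeline would
# cluster embedded / GNN-scored hits, not raw 2-D positions.
hits = np.array([
    [0.0, 0.0], [0.05, 0.02], [0.1, 0.0],   # hits from one "track"
    [3.0, 3.0], [3.05, 3.02], [3.1, 3.0],   # hits from another
])

# eps plays the role of the clustering radius; min_samples is the minimum
# number of hits needed to form a track candidate (illustrative values).
labels = DBSCAN(eps=0.2, min_samples=2).fit_predict(hits)
# Hits sharing a cluster share a track label; label -1 would mean noise.
```

All hits belonging to the same track receiving the same unique label is exactly the condition under which the TrackML score approaches 1.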
DATASET
OUR PREVIOUS STATE-OF-THE-ART
[Detector diagram: barrel and endcap layers, labelled η = −3 to 3]
15
DATASET
OUR PREVIOUS STATE-OF-THE-ART
Barrel only
16
DATASET
OUR PREVIOUS STATE-OF-THE-ART
Barrel only
Adjacency condition
Adjacency allows clear ordering of hits in a track, and therefore ground truth
Can also be used to prune false positives in the pipeline
17
DATASET
OUR PREVIOUS STATE-OF-THE-ART
But what about skipped layers?
18
DATASET
OUR PREVIOUS STATE-OF-THE-ART
What about the endcaps?
19
DATASET
OUR PREVIOUS STATE-OF-THE-ART
Let’s define a clear ground truth and see if an embedding space can be learned without knowledge of detector geometry
20
DATASET
GEOMETRY-FREE TRUTH GRAPH
1. For each particle, order hits by increasing distance from the creation vertex, R = √(x² + y² + z²)
2. Group by shared layers
3. Connect all combinations from layer Lᵢ to Lᵢ₊₁, where Rᵢ₋₁ < Rᵢ < Rᵢ₊₁
21
Creation vertex
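The three-step truth-graph construction above can be sketched as follows. This is an illustrative implementation, not the Exa.TrkX production code; the function name `truth_edges` and its signature are invented for this example.

```python
import numpy as np
from itertools import product

def truth_edges(xyz, layers, vertex):
    """Geometry-free truth graph for one particle's hits (sketch of the
    three steps above). xyz: (N, 3) hit positions; layers: (N,) layer
    id per hit; vertex: (3,) creation vertex."""
    R = np.linalg.norm(xyz - vertex, axis=1)   # 1. distance from creation vertex
    order = np.argsort(R)                      #    order hits by increasing R
    # 2. group hits that share a layer, walking in increasing-R order
    groups, current_layer = [], None
    for i in order:
        if current_layer is None or layers[i] != current_layer:
            groups.append([i])
            current_layer = layers[i]
        else:
            groups[-1].append(i)
    # 3. connect all combinations from one layer group to the next
    edges = []
    for g1, g2 in zip(groups[:-1], groups[1:]):
        edges.extend(product(g1, g2))
    return edges
```

For example, two hits on one layer followed by one hit on the next yields two truth edges, one per combination.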
DATASET
GEOMETRY-FREE TRUTH GRAPH
22
The real question is: Can such
a specific definition of truth be
learned in an embedded space?
1. For all hits in the barrel, embed features (coordinates, cell direction data, etc.) into an N-dimensional space
METRIC LEARNING
OUR PREVIOUS STATE OF THE ART
24
1. For all hits in the barrel, embed features (coordinates, cell direction data, etc.) into an N-dimensional space
2. Associate hits from the same track as close in N-dimensional distance (close = within Euclidean distance r)
METRIC LEARNING
OUR PREVIOUS STATE OF THE ART
25
1. For all hits in the barrel, embed features (coordinates, cell direction data, etc.) into an N-dimensional space
2. Associate hits from the same track as close in N-dimensional distance
3. Score each “target” hit within the embedding neighbourhood against the “seed” hit at the centre (score = Euclidean distance)
METRIC LEARNING
OUR PREVIOUS STATE OF THE ART
27
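The embed-then-query procedure above can be sketched in a few lines. The single linear map below stands in for the trained embedding MLP, and all shapes and the radius are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(features, W):
    # Stand-in for the learned embedding MLP: one linear map + tanh.
    return np.tanh(features @ W)

features = rng.normal(size=(100, 3))   # e.g. (r, phi, z) per hit
W = rng.normal(size=(3, 8))            # toy weights into an 8-dim embedding
emb = embed(features, W)

# Neighbourhood query: every "target" hit within Euclidean distance r of a
# "seed" hit in embedding space becomes a candidate doublet, scored by
# that distance.
r = 0.5
seed = 0
d = np.linalg.norm(emb - emb[seed], axis=1)
targets = np.flatnonzero((d < r) & (np.arange(len(emb)) != seed))
```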
Architecture specifics
“Comparative” hinge loss:
• Negatives are punished for being inside the margin radius
• Positives are punished for being outside the margin radius (Δ)
• Margin = training radius
Training with random pairs on only (r, 𝜙, z): 0.3–0.5% purity @ 96% efficiency.
METRIC LEARNING IN EMBEDDING SPACE
OUR PREVIOUS STATE OF THE ART
28
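The bullets above can be written down directly as a pairwise loss. This is one common squared-hinge variant consistent with the description; the exact functional form used in the pipeline may differ.

```python
import numpy as np

def hinge_pair_loss(d, y, margin=1.0):
    """Pairwise hinge loss (one possible variant of the slide's description).
    d: (N,) Euclidean distances between embedded hit pairs
    y: (N,) 1 for true (same-track) pairs, 0 for fake pairs."""
    pos = y * np.maximum(0.0, d - margin) ** 2        # positives punished outside the margin
    neg = (1 - y) * np.maximum(0.0, margin - d) ** 2  # negatives punished inside the margin
    return (pos + neg).mean()
```

With margin = 1, a true pair at distance 2 and a fake pair at distance 0.5 both contribute, while a true pair inside the margin or a fake pair outside it contributes nothing.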
METRIC LEARNING IN EMBEDDING SPACE
TOWARDS REALISTIC TRACKING
29
Where do we lose efficiency/purity?
• Most random-pair negatives are easy: the Curse of Easy Negatives
• Cell shape gives directional information to each hit
• Battle against GPU memory: we can only train on a subset of pairs
METRIC LEARNING IN EMBEDDING SPACE
TOWARDS REALISTIC TRACKING
30
Where do we lose efficiency/purity?
Most negatives are easy. Solution: Hard Negative Mining (HNM)
1. Run the event through the embedding
2. Build a radius graph
3. Train the loss on all hits within radius = margin = 1 of each hit
4. For speed, use a sparse custom-built technique plus Facebook’s GPU-powered FAISS KNN library
METRIC LEARNING IN EMBEDDING SPACE
EFFECTIVE TRAINING
32
Performance improvement with:
• Random Pairs (RP)
• Hard Negative Mining (HNM)
• Positive Weighting (PW)
• Cell Information (CI): shape
• Warm-up (WU)
Altogether: HPRCW3
FILTERING
OUR PREVIOUS STATE OF THE ART
The pipeline so far. We want to push purity (currently 1.3%) up further, so let’s filter these huge graphs.
[Architecture diagram: hit coordinates + cell features → 512-unit layers → 8-dimensional embedding trained with a hinge loss → radius graph → 512-unit layers with norm, trained with a cross-entropy loss]
34
FILTERING
TOWARDS REALISTIC TRACKING
Just like in the embedding, training on random pairs in the filter gives poor performance, since the selection mostly yields easy negatives.
35
Again, the solution is HNM: run the event through the model (without tracking gradients), get the hard negatives, then run these through the model (now tracking gradients), training only on them.
[Plots: distribution of all edges out of metric learning vs. distribution of true edges, in z]
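The two-pass trick can be sketched in PyTorch. Names are illustrative and the "hard" selection is simplified; the real filter model and cuts differ.

```python
import torch

def filter_train_step(model, pairs, labels, optimizer, loss_fn, cut=0.5):
    """Two-pass HNM for the filter stage (illustrative sketch).
    Pass 1 (no gradients): score all candidate edges, keep the hard ones.
    Pass 2 (with gradients): train only on the hard subset."""
    with torch.no_grad():                        # pass 1: cheap scoring
        scores = model(pairs).squeeze(-1)
        # "hard" = fakes the model still scores above the cut, plus all trues
        hard = (scores > cut) | (labels > 0.5)
    optimizer.zero_grad()
    out = model(pairs[hard]).squeeze(-1)         # pass 2: gradients tracked
    loss = loss_fn(out, labels[hard])
    loss.backward()
    optimizer.step()
    return loss.item()
```

Running the first pass under `torch.no_grad()` is what keeps the memory cost of scoring the full candidate set low.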
FILTERING
TOWARDS REALISTIC TRACKING
36
FILTERING
TOWARDS REALISTIC TRACKING
Regime: Phys. Truth purity @ 99% efficiency / PID Truth purity @ 99% efficiency
• Vanilla: 6.3% / 7.8%
• Cell info: 8.3% / 12.8%
• Cell info, layer+batch norm: 14.0% / 17.4%
• Graph size: O(1 million edges) / O(3 million edges)
Remember:
Physical Truth: truth is defined as edges between the closest hits in a track, on different layers
PID Truth: truth is defined as any edge connecting hits with the same Particle ID (PID)
37
38
FILTERING
TOWARDS REALISTIC TRACKING
Does it work? Let’s check Diane:
Pretty good!
True positive
False positive
No false negatives
39
FILTERING
TOWARDS REALISTIC TRACKING
Does it work? Let’s check Diane:
Not quite as good…
True positive
False positive
False negatives
This is where the GNN comes in
FILTERING
ROBUSTNESS
40
(In collaboration with the University of Washington:
Aditi Chauhan, Alex Schuy, Ami Oka, David Ho, Shih-Chieh Hsu)
FILTERING
ROBUSTNESS
Noise level is a proxy for low-pT hits
Robustness of the embedding to noise, without re-training:
out-of-the-box penalty of around 20%
41
METRIC LEARNING IN EMBEDDING SPACE
CODE WALKTHROUGH
42
GRAPH NEURAL NETWORKS
FOR HIGH ENERGY PHYSICS
Can approximate the geometry of the physics problem
Are a generalisation of many other machine-learning techniques
E.g. message-passing convolution generalises the CNN from flat to arbitrary geometry
Can learn node (i.e. hit/spacepoint) features and embeddings, as well as edge (i.e. relational) features and embeddings
E.g. in practice, for an LHC-like detector environment: join hits into a graph and iterate through message passing of hidden features
43
Doublet GNN architecture
Performance
Triplet construction +
performance
44
GRAPH NEURAL NETWORKS
THE AIM
45
GRAPH NEURAL NETWORKS
ARCHITECTURES
MESSAGE PASSING
v_0^(k+1) = φ(e_0j^k, v_j^k, v_0^k)
v_i^k: node features; e_ij^k: edge features, at iteration k
[Diagram: central node v_0 connected to v_1 … v_4 by edges e_01 … e_04]
46
GRAPH NEURAL NETWORKS
ARCHITECTURES
ATTENTION GNN
v_0^(k+1) = MLP( Σ_j e_0j^k · v_j^k , v_0^k )
e_0j^k = MLP([v_0^k, v_j^k])
v_i^k: node features; e_ij^k: edge score in [0, 1], at iteration k
Veličković, Petar, et al. “Graph attention networks.” arXiv preprint arXiv:1710.10903 (2017).
47
GRAPH NEURAL NETWORKS
ARCHITECTURES
INTERACTION NETWORK
e_0j^(k+1) = φ(v_0^k, v_j^k, e_0j^k)
v_0^(k+1) = φ(v_0^k, Σ_j e_0j^(k+1))
v_i^k: node features; e_ij^k: edge features, at iteration k
Battaglia, Peter, et al. “Interaction networks for learning about objects, relations and physics.” Advances in Neural Information Processing Systems. 2016.
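The two interaction-network updates above map directly onto code. This is a minimal PyTorch sketch: hidden sizes, MLP depth, and the class name are illustrative, not the pipeline's actual architecture.

```python
import torch
import torch.nn as nn

class InteractionStep(nn.Module):
    """One message-passing iteration of an interaction network,
    following the two update equations above."""
    def __init__(self, node_dim=8, edge_dim=8):
        super().__init__()
        self.phi_e = nn.Sequential(nn.Linear(2 * node_dim + edge_dim, edge_dim), nn.ReLU())
        self.phi_v = nn.Sequential(nn.Linear(node_dim + edge_dim, node_dim), nn.ReLU())

    def forward(self, v, e, edge_index):
        src, dst = edge_index                                   # each edge runs src -> dst
        # e_ij^(k+1) = phi(v_i^k, v_j^k, e_ij^k)
        e_new = self.phi_e(torch.cat([v[src], v[dst], e], dim=1))
        # sum the updated edge features arriving at each node
        agg = torch.zeros(v.shape[0], e_new.shape[1])
        agg.index_add_(0, dst, e_new)
        # v_i^(k+1) = phi(v_i^k, sum_j e_ij^(k+1))
        v_new = self.phi_v(torch.cat([v, agg], dim=1))
        return v_new, e_new
```

Note the ordering: edges are updated first, then nodes aggregate the already-updated edge features, exactly as in the equations.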
• We convert from a doublet graph to a triplet graph: triplet edges have direct access to curvature information, so we hypothesise the accuracy should be even better
• Doublets are associated to nodes; triplets are associated to edges
48
[Diagram: doublet graph with edge scores (0.99, 0.87, 0.84) between hits x1 … x4, converted to a triplet graph]
GRAPH NEURAL NETWORKS
PERFORMANCE
49
50
GRAPH NEURAL NETWORKS
PERFORMANCE
Barrel-only, adjacent-layers
51
Purity: 99.1% ± 0.07%
Inference time: ~5 seconds per event per GPU, split between:
• ~3 ± 1 seconds for embedding construction
• ~2 ± 1 seconds for the two GNN steps and processing
[Plot: total efficiency vs. pT [GeV]: seed efficiency (PU = 200)]
GRAPH NEURAL NETWORKS
TRACK SEEDING PERFORMANCE
53
• 0.84 TrackML score in the barrel, no noise
• The upper-limit score out of metric learning was 0.90, with only barrel, adjacent layers, etc.
• Now able to run the full detector, with geometry-free embedding
• The upper-limit score out of metric learning is now at least 0.935
• Can keep pushing this value at the expense of purity (= GPU memory)
GRAPH NEURAL NETWORKS
TRACK LABELLING PERFORMANCE
[Plot: TrackML score; barrel-only, adjacent-layers baseline]
55
GRAPH NEURAL NETWORKS
CHALLENGES
Going to the full detector, does the GNN understand our more epistemologically motivated truth?
Going to the full detector, can we fit a whole model on a GPU?
56
GRAPH NEURAL NETWORKS
CHALLENGES
Going to the full detector, does the GNN understand our more epistemologically motivated truth?
We will see performance results in the code walkthrough
Going to the full detector, can we fit a whole model on a GPU? No.
We need some memory hacks: checkpointing, a larger GPU, mixed precision
GRAPH NEURAL NETWORK
MEMORY MANAGEMENT
Half precision: performance unaffected
No speed improvements yet, pending a PyTorch (PyTorch Geometric) GitHub pull request
[Chart: peak GPU usage (GB) and MP/FP ratio per configuration (hidden features, message-passing iterations, training batch size): 32/8 iter/1 batch, 32/8 iter/2 batch, 64/6x/1 batch, 128/8x/1 batch; FP (GB) vs. MP (GB)]
57
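Mixed precision in PyTorch is a context manager around the forward pass. The sketch below uses CPU/bfloat16 so it runs anywhere; on a GPU one would use `torch.autocast("cuda")`, plus a `GradScaler` when training in float16.

```python
import torch

# Mixed precision: run matmul-heavy ops in a lower-precision dtype while
# master weights stay in float32.
a = torch.randn(64, 64)
b = torch.randn(64, 64)

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    c = a @ b    # computed in bfloat16 inside the autocast region

print(c.dtype)   # torch.bfloat16
```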
GRAPH NEURAL NETWORK
MEMORY MANAGEMENT
Checkpointing (PyTorch)
Memory improvement
Minimal speed penalty (~1.2× longer training)
We checkpoint each iteration (partial checkpointing), so there is still a scaling with iterations
Can checkpoint all iterations (maximal checkpointing)
58
[Plot: memory usage, no checkpointing vs. maximal checkpointing]
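Partial checkpointing, one checkpoint per message-passing iteration, looks like this in PyTorch. Here `step` is a stand-in module, not the actual GNN, and the iteration count is illustrative.

```python
import torch
from torch.utils.checkpoint import checkpoint

# Stand-in for one message-passing iteration of the GNN.
step = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU())

x = torch.randn(32, 16, requires_grad=True)
h = x
for _ in range(8):             # partial checkpointing: one checkpoint per iteration
    h = checkpoint(step, h)    # activations inside `step` are not stored
loss = h.sum()
loss.backward()                # iterations are re-run here to rebuild activations
```

Because each iteration is checkpointed separately, memory still scales with the number of iterations (that is the "partial" in partial checkpointing); wrapping the whole loop in a single checkpoint would be the maximal variant.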
GRAPH NEURAL NETWORK
CODE WALKTHROUGH
59
GRAPH NEURAL NETWORK
PERFORMANCE
60
Full detector, geometry-free construction
METRIC LEARNING & GRAPH NEURAL NETWORK PIPELINE
SUMMARY
We handle the full detector, noise, geometry-free inference, and distributed training, with care
Can learn an embedding space without layer information, provided we equip training with hard negative mining, cell information, and warm-up
Can run the GNN on a full event, provided we equip training with gradient checkpointing and mixed precision
Can include noise without re-training, at a small (~20%) penalty to purity
61
BACKUP
64
OUTLOOK
Converging on better architectures (attention, gated RNNs, generalising dense, flat methods to sparse graph structure; not that the two are mutually exclusive: there is increasing interest in sparse CNN techniques, for example)
…
Dwivedi, Vijay Prakash, et al. "Benchmarking graph neural networks." arXiv preprint arXiv:2003.00982 (2020).
65
OUTLOOK
Converging on better methods (sparse operations, triplet graph structure, fast clustering, approximate NN, piggy-backing off big-tech methods, e.g. Facebook FAISS)
[Figures: Google Trends of “Graph Neural Networks”; doublet graph to higher-order classification]
66
OUTLOOK
Converging on better hardware (mixed-precision handling on new GPUs/TPUs, sparse handling in IPUs, compilability of graph-structure ML libraries for FPGA ports, e.g. IEEE HPEC GraphChallenge)
67