Graph Data Structures and Graph Neural Networks in High Energy Physics


TRACKING WITH GRAPHS

XIANGYANG JU

DANIEL MURNANE

ON BEHALF OF THE

EXA.TRKX COLLABORATION

OVERVIEW

1. Why ML/GNNs for High Energy Physics?

2. GNN applications in HEP/ExatrkX

3. ML Tracking Pipeline

4. Metric Learning

Code Walkthrough (PyTorch)

5. Doublet & Triplet GNN

6. DBSCAN for TrackML Score

Code Walkthrough (TensorFlow)


WHY MACHINE LEARNING FOR TRACKING?

The high-luminosity scaling problem means we need something to complement traditional tracking algorithms.

But why graphs?


In other words…

[Figure: computing power vs. time / energy / number of collisions. Predicted computing capacity grows far more slowly than the cost of traditional methods, which scale quadratically. HL-LHC, 14 TeV, 2024–2026: ~6 billion events/second.]

WHY GRAPHS SPECIFICALLY?

The high-luminosity scaling problem means we need something to complement traditional tracking algorithms.

But why graphs?

Graphs can capture inherent sparsity of much physics data


[Figure: hits to graphs]

WHY GRAPHS?


Graphs can capture inherent sparsity of much physics data

Graphs can capture the manifold and relational structure of much physics data

Conversion to and from graphs can allow manipulation of dimensionality

Graph Neural Networks are booming (we wouldn't be talking about graphs if there weren't a wealth of classic algorithms and NN models for graph data)

Industry research and investment means a good outlook for software and hardware optimised for graphs


APPLICATIONS

TrackML dataset ~ HL-LHC silicon: https://indico.cern.ch/event/831165/contributions/3717124/

High Granularity Calorimeter data: https://arxiv.org/abs/2003.11603

LArTPC data ~ DUNE experiment: https://indico.cern.ch/event/852553/contributions/4059542/

Quantum GNN for Particle Track Reconstruction: https://indico.cern.ch/event/852553/contributions/4057625/

GNNs on FPGAs for Level-1 Trigger: https://indico.cern.ch/event/831165/contributions/3758961/


THE EXATRKX PROJECT

MISSION

Optimization, performance and validation studies of ML approaches to the Exascale tracking problem, to enable production-level tracking on next-generation detector systems.

PEOPLE

• Caltech: Joosep Pata, Maria Spiropulu, Jean-Roch Vlimant, Alexander Zlokapa

• Cincinnati: Adam Aurisano, Jeremy Hewes

• FNAL: Giuseppe Cerati, Lindsey Gray, Thomas Klijnsma, Jim Kowalkowski, Gabriel Perdue, Panagiotis Spentzouris

• LBNL: Paolo Calafiura (PI), Nicholas Choma, Sean Conlon, Steve Farrell, Xiangyang Ju, Daniel Murnane, Prabhat

• ORNL: Aristeidis Tsaris

• Princeton: Isobel Ojalvo, Savannah Thais

• SLAC: Pierre Cote De Soux, Francois Drielsma, Kazuhiro Terao, Tracy Usher


https://exatrkx.github.io/

THE PHYSICAL PROBLEM

• “TrackML Kaggle Competition” dataset

• Generated by HL-LHC-like tracking (ACTS) simulation

• 9000 events to train on

• Each event has up to 100,000 layer hits from around 10,000 particles

• Layers can be hit multiple times by the same particle (“duplicates”)

• Non-particle hits present (“noise”)


THE PHYSICAL PROBLEM

• Need to construct hit data into graph data, i.e. nodes and edges

• Can use geometric heuristics (used in the past: ~45% efficiency, 5% purity)

• To improve performance, use learned embedding construction

• The ideal final result is a “TrackML score” S ∈ [0, 1]

• All hits belonging to the same track labelled with the same unique label ⇒ S = 1

THE PHYSICAL PROBLEM

• We will follow a particular event, let's say event #4692

• We will follow a particular particle, let's say particle #1058360480961134592

• It's a mouthful, so let's call her Diane

[Figure: event display of event #4692 (x, y, z views)]

TRACKING PIPELINE

1. Metric Learning
2. Doublet GNN
3. (Optional) Triplet GNN
4. DBSCAN → TrackML score

Pipeline flow: raw hit data embedded → filter likely, adjacent doublets → train/classify doublets in GNN → filter, convert to triplets → train/classify triplets in GNN → apply cut for seeds → DBSCAN for track labels
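As a rough illustration of the final step, here is a minimal sketch (not the walkthrough code) that clusters embedded hit coordinates with scikit-learn's DBSCAN to assign track labels; the `embedded` array and all parameter values are placeholders.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical example: 8-dimensional embedded coordinates for each hit,
# produced by the metric-learning stage (random data here for illustration).
rng = np.random.default_rng(0)
embedded = rng.normal(size=(1000, 8))

# eps plays the role of the neighbourhood radius in the embedded space;
# min_samples=1 so that every hit receives a (possibly singleton) track label.
clusterer = DBSCAN(eps=0.25, min_samples=1, metric="euclidean")
track_labels = clusterer.fit_predict(embedded)

# track_labels[i] is the candidate track id of hit i; hits sharing a label
# form one reconstructed track, which is what the TrackML score evaluates.
print(track_labels[:10])
```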

DATASET

OUR PREVIOUS STATE-OF-THE-ART

[Figure: detector layout in the r–z plane, with lines of constant pseudorapidity from η = −3 to η = 3]

DATASET

OUR PREVIOUS STATE-OF-THE-ART

Barrel only


DATASET

OUR PREVIOUS STATE-OF-THE-ART

Barrel only
Adjacency condition

Adjacency allows a clear ordering of hits in a track, and therefore a ground truth.
It can also be used to prune false positives in the pipeline.

[Figure: three adjacent barrel layers, labelled 1, 2, 3]

DATASET

OUR PREVIOUS STATE-OF-THE-ART

But what about skipped layers? What about the endcaps?

Let's define a clear ground truth and see if an embedding space can be learned without knowledge of detector geometry.

DATASET

GEOMETRY-FREE TRUTH GRAPH

1. For each particle, order hits by increasing distance from the creation vertex, R = √(x² + y² + z²)

2. Group hits by shared layers

3. Connect all combinations from layer L_i to L_{i+1}, where R_{i−1} < R_i < R_{i+1}

[Figure: a particle's hits ordered by distance R from the creation vertex]
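A minimal sketch of this truth-graph construction, assuming a pandas DataFrame `hits` with `particle_id`, `layer_id`, `x`, `y`, `z` columns (TrackML-style names; the creation vertex is taken as the origin for simplicity):

```python
import itertools
import numpy as np
import pandas as pd

def truth_edges(hits: pd.DataFrame) -> np.ndarray:
    """Build geometry-free truth edges: all pairs between consecutive
    layer groups of each particle, ordered by distance from the vertex."""
    hits = hits.copy()
    # Step 1: distance from the creation vertex (origin assumed here).
    hits["R"] = np.sqrt(hits.x**2 + hits.y**2 + hits.z**2)

    edges = []
    for _, track in hits.groupby("particle_id"):
        track = track.sort_values("R")
        # Step 2: group hits that share a layer, keeping the R ordering.
        layer_groups = [g.index.to_list()
                        for _, g in track.groupby("layer_id", sort=False)]
        # Step 3: connect all combinations from layer group i to group i+1.
        for inner, outer in zip(layer_groups[:-1], layer_groups[1:]):
            edges.extend(itertools.product(inner, outer))
    return np.asarray(edges).T   # shape (2, n_edges)

# Tiny usage example: one particle crossing three layers.
hits = pd.DataFrame({
    "particle_id": [7, 7, 7],
    "layer_id":    [1, 2, 3],
    "x": [1.0, 2.0, 3.0], "y": [0.0, 0.0, 0.0], "z": [0.0, 0.1, 0.2],
})
print(truth_edges(hits))   # two truth edges: hit 0 -> hit 1 and hit 1 -> hit 2
```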

DATASET

GEOMETRY-FREE TRUTH GRAPH

The real question is: can such a specific definition of truth be learned in an embedded space?

METRIC LEARNING

OUR PREVIOUS STATE OF THE ART

1. For all hits in the barrel, embed features (co-ordinates, cell direction data, etc.) into an N-dimensional space

2. Associate hits from the same track as close in N-dimensional distance (close = within Euclidean distance r)

3. Score each “target” hit within the embedding neighbourhood against the “seed” hit at the centre; the score is the Euclidean distance

[Figure: embedding-space neighbourhood of radius r around a seed hit]

METRIC LEARNING IN EMBEDDING SPACE

OUR PREVIOUS STATE OF THE ART

Architecture specifics:
• “Comparative” hinge loss
• Negatives are punished for being inside the margin radius
• Positives are punished for being outside the margin radius (Δ)
• Margin = training radius
• Trained with random pairs using only (r, φ, z): 0.3 – 0.5% purity @ 96% efficiency
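A minimal PyTorch sketch of a comparative hinge loss of this kind (not necessarily the exact Exa.TrkX implementation); `emb_a`/`emb_b` are embedded hit pairs, `is_true_pair` marks same-track pairs, and the margin plays the role of the training radius:

```python
import torch

def comparative_hinge_loss(emb_a: torch.Tensor,
                           emb_b: torch.Tensor,
                           is_true_pair: torch.Tensor,
                           margin: float = 1.0) -> torch.Tensor:
    """Pairwise hinge loss as described on the slide (a sketch):
      - true pairs are penalised when further apart than the margin
      - fake pairs are penalised when closer than the margin."""
    d = torch.norm(emb_a - emb_b, dim=-1)       # Euclidean distance per pair
    pos = torch.clamp(d - margin, min=0.0)      # positives outside the margin
    neg = torch.clamp(margin - d, min=0.0)      # negatives inside the margin
    return torch.where(is_true_pair, pos, neg).mean()

# Usage sketch with random 8-dimensional embeddings for 16 candidate pairs.
emb_a, emb_b = torch.randn(16, 8), torch.randn(16, 8)
is_true_pair = torch.rand(16) > 0.5
print(comparative_hinge_loss(emb_a, emb_b, is_true_pair))
```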


METRIC LEARNING IN EMBEDDING SPACE

TOWARDS REALISTIC TRACKING

Where do we lose efficiency/purity?
• Most random-pair negatives are easy – the Curse of Easy Negatives
• Cell shape gives directional information to each hit
• Battle against GPU memory – we can only train on a subset of pairs

METRIC LEARNING IN EMBEDDING SPACE

TOWARDS REALISTIC TRACKING

Most negatives are easy. Solution: Hard Negative Mining (HNM)

1. Run the event through the embedding
2. Build a radius graph
3. Train the loss on all hits within radius = margin = 1 of each hit
4. For speed, use a sparse custom-built technique + the Facebook GPU-powered FAISS KNN library (see the sketch after this list)
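A minimal sketch of the neighbourhood search behind HNM, using a FAISS k-NN index as a stand-in for the radius graph (the real pipeline uses a sparse custom technique and the GPU-powered FAISS variant; array sizes and the value of k here are placeholders):

```python
import numpy as np
import faiss  # Facebook's similarity-search library

# Hypothetical embedded event: one 8-dimensional point per hit (step 1).
embedded = np.random.rand(5000, 8).astype("float32")

# Step 2: build a neighbourhood graph. A k-NN search stands in for the
# radius graph; neighbours further than the margin are masked afterwards.
index = faiss.IndexFlatL2(embedded.shape[1])
index.add(embedded)
k, margin = 32, 1.0
dist2, neighbours = index.search(embedded, k)     # squared L2 distances

# Step 3: candidate (seed, target) pairs within radius = margin of each hit.
seed = np.repeat(np.arange(len(embedded)), k)
target = neighbours.reshape(-1)
within = dist2.reshape(-1) < margin**2
hard_pairs = np.stack([seed[within], target[within]])   # feed these to the loss
print(hard_pairs.shape)
```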

METRIC LEARNING IN EMBEDDING SPACE

EFFECTIVE TRAINING

Performance improvement with:
• Random Pairs (RP)
• Hard Negative Mining (HNM)
• Positive Weighting (PW)
• Cell Information (CI) – shape
• (Warm-up WU) – altogether: HPRCW3

FILTERING

OUR PREVIOUS STATE OF THE ART

The pipeline so far. We want to get the purity (1.3%) up further, so let's filter these huge graphs.

[Architecture diagram: hit features (r, φ, z) + cell information → 512-unit MLP layers → 8-dimensional embedding trained with the hinge loss → radius graph → 512-unit MLP layers with normalisation, trained with a cross-entropy loss]

FILTERING

TOWARDS REALISTIC TRACKING

Just like in the embedding, training the filter on random pairs gives poor performance, since the selection mostly gives easy negatives.

Again, the solution is HNM: run the event through the model (without tracking gradients), get the hard negatives, then run these through the model (now tracking gradients), training only on them. A sketch of this two-pass step is below.

[Figure: distribution (in z) of all edges out of metric learning vs. the distribution of true edges]
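A minimal PyTorch sketch of that two-pass step for the filter stage (the model, feature count, and hard-negative cut are all placeholders, not the actual Exa.TrkX code):

```python
import torch

def filter_training_step(model: torch.nn.Module,
                         edge_features: torch.Tensor,
                         edge_truth: torch.Tensor,
                         optimizer: torch.optim.Optimizer,
                         hard_cut: float = 0.1) -> torch.Tensor:
    """One hard-negative-mining step for the filter (sketch): a gradient-free
    pass selects the edges the model finds hard, then only those edges are
    used in the gradient pass."""
    # Pass 1: no gradients, score every candidate edge.
    with torch.no_grad():
        scores = torch.sigmoid(model(edge_features)).squeeze(-1)
    # Hard edges: true edges, plus fakes the model has not confidently rejected.
    hard = edge_truth.bool() | (scores > hard_cut)

    # Pass 2: gradients tracked, but only on the hard subset.
    optimizer.zero_grad()
    logits = model(edge_features[hard]).squeeze(-1)
    loss = torch.nn.functional.binary_cross_entropy_with_logits(
        logits, edge_truth[hard].float())
    loss.backward()
    optimizer.step()
    return loss.detach()

# Usage sketch: a small MLP filter on 12 features per candidate edge.
model = torch.nn.Sequential(torch.nn.Linear(12, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
edge_features = torch.randn(2048, 12)
edge_truth = torch.rand(2048) > 0.9
print(filter_training_step(model, edge_features, edge_truth, optimizer))
```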

FILTERING

TOWARDS REALISTIC TRACKING

36

FILTERING

TOWARDS REALISTIC TRACKING

Regime Phys. Truth purity

@ 99% efficiency

PID Truth purity

@ 99% efficiency

Vanilla 6.3% 7.8%

Cell info 8.3% 12.8%

Cell info,

layer+batch norm

14.0% 17.4%

Graph size O(1 million edges) O(3 million edges)

Physical Truth:

Truth is defined as edges

between closest hits in

track, on different layers

PID Truth:

Truth is defined as any edge

connecting hits with the

same Particle ID (PID)

Remember

37

FILTERING

TOWARDS REALISTIC TRACKING

Does it work? Let's check Diane: pretty good!

[Figure: Diane's filtered edges – true positives and a few false positives, with no false negatives]

FILTERING

TOWARDS REALISTIC TRACKING

Does it work? Let's check Diane: not quite as good…

[Figure: Diane's filtered edges – true positives, false positives, and some false negatives]

This is where the GNN comes in.

FILTERING

ROBUSTNESS

(In collaboration with the University of Washington: Aditi Chauhan, Alex Schuy, Ami Oka, David Ho, Shih-Chieh Hsu)

FILTERING

ROBUSTNESS

Noise level is a proxy for low-pT hits.

Robustness of the embedding to noise, without re-training: the out-of-the-box penalty is around 20%.

METRIC LEARNING IN EMBEDDED SPACE

CODE WALKTHROUGH

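The walkthrough notebook itself is not reproduced here; as a stand-in, a minimal PyTorch sketch of the kind of embedding network the metric-learning stage trains (layer sizes, feature count, and embedding dimension are illustrative assumptions):

```python
import torch
import torch.nn as nn

class HitEmbedding(nn.Module):
    """Sketch of a metric-learning embedding network: hit features in,
    N-dimensional embedded coordinates out (sizes are illustrative)."""
    def __init__(self, in_features: int = 12, hidden: int = 512, emb_dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, emb_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalising the output keeps all hits on a unit hypersphere,
        # so the margin/query radius has a consistent meaning.
        return nn.functional.normalize(self.net(x), dim=-1)

# Usage: embed spatial coordinates plus cell-direction features (12 assumed here).
hits = torch.randn(1000, 12)
print(HitEmbedding()(hits).shape)   # -> torch.Size([1000, 8])
```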

GRAPH NEURAL NETWORKS

FOR HIGH ENERGY PHYSICS

• Can approximate the geometry of the physics problem

• Are a generalisation of many other machine learning techniques
  • E.g. message-passing convolution generalises the CNN from flat to arbitrary geometry

• Can learn node (i.e. hit / spacepoint) features and embeddings, as well as edge (i.e. relational) features and embeddings
  • E.g. in practice, for an LHC-like detector environment, join hits into a graph and iterate through message passing of hidden features

• Doublet GNN architecture
• Performance
• Triplet construction + performance

GRAPH NEURAL NETWORKS

THE AIM


GRAPH NEURAL NETWORKS

ARCHITECTURES

MESSAGE PASSING

v_0^{k+1} = φ(e_{0j}^k, v_j^k, v_0^k)

where v_i^k are node features and e_{ij}^k are edge features at iteration k.

[Figure: a central node v_0 connected to neighbours v_1 … v_4 by edges e_{01} … e_{04}]

GRAPH NEURAL NETWORKS

ARCHITECTURES

ATTENTION GNN

e_{0j}^k = MLP([v_0^k, v_j^k])
v_0^{k+1} = MLP(Σ_j e_{0j}^k v_j^k, v_0^k)

where v_i^k are node features and e_{ij}^k are edge scores in [0, 1] at iteration k.

Veličković, Petar, et al. "Graph attention networks." arXiv preprint arXiv:1710.10903 (2017).
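A minimal PyTorch sketch of this attention-style update (hidden sizes and MLP depths are illustrative; this is not the production Exa.TrkX model):

```python
import torch
import torch.nn as nn

class AttentionGNNLayer(nn.Module):
    """Sketch of the attention-style update on the slide:
    e_ij = MLP([v_i, v_j]) in [0, 1], then v_i <- MLP([sum_j e_ij * v_j, v_i])."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.edge_net = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                      nn.Linear(dim, 1), nn.Sigmoid())
        self.node_net = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                      nn.Linear(dim, dim))

    def forward(self, v: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        src, dst = edge_index                    # edge_index: shape (2, n_edges)
        e = self.edge_net(torch.cat([v[src], v[dst]], dim=-1))   # (n_edges, 1)
        # Weighted sum of neighbour features into each destination node.
        agg = torch.zeros_like(v).index_add_(0, dst, e * v[src])
        return self.node_net(torch.cat([agg, v], dim=-1))

# Usage on a toy graph with 5 nodes and 4 directed edges.
v = torch.randn(5, 64)
edge_index = torch.tensor([[0, 0, 0, 0], [1, 2, 3, 4]])
print(AttentionGNNLayer()(v, edge_index).shape)   # -> torch.Size([5, 64])
```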

GRAPH NEURAL NETWORKS

ARCHITECTURES

INTERACTION NETWORK

e_{0j}^{k+1} = φ(v_0^k, v_j^k, e_{0j}^k)
v_0^{k+1} = φ(v_0^k, Σ_j e_{0j}^{k+1})

where v_i^k are node features and e_{ij}^k are edge features at iteration k.

Battaglia, Peter, et al. "Interaction networks for learning about objects, relations and physics." Advances in Neural Information Processing Systems. 2016.
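And a corresponding sketch of the interaction-network update, which also carries hidden edge features between iterations (again, sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class InteractionNetworkLayer(nn.Module):
    """Sketch of the interaction-network update on the slide:
    e_ij <- phi_e([v_i, v_j, e_ij]), then v_i <- phi_v([v_i, sum_j e_ij])."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.phi_e = nn.Sequential(nn.Linear(3 * dim, dim), nn.ReLU(),
                                   nn.Linear(dim, dim))
        self.phi_v = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                   nn.Linear(dim, dim))

    def forward(self, v, e, edge_index):
        src, dst = edge_index
        # Edge update: combine the two endpoint nodes with the current edge state.
        e = self.phi_e(torch.cat([v[src], v[dst], e], dim=-1))
        # Node update: sum the updated incoming edge features into each node.
        agg = torch.zeros_like(v).index_add_(0, dst, e)
        v = self.phi_v(torch.cat([v, agg], dim=-1))
        return v, e

# Usage on a toy graph: 5 nodes, 4 edges, 64 hidden features each.
v, e = torch.randn(5, 64), torch.randn(4, 64)
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])
print([t.shape for t in InteractionNetworkLayer()(v, e, edge_index)])
```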

GRAPH NEURAL NETWORKS

PERFORMANCE

• We convert from a doublet graph to a triplet graph: triplet edges have direct access to curvature information, therefore we hypothesise the accuracy should be even better

• Doublets are associated to nodes, triplets are associated to edges (see the conversion sketch below)
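The conversion sketch referenced above: a minimal NumPy illustration of turning doublets (edges of the hit graph) into the nodes of a triplet graph, joining two doublets whenever they share a hit:

```python
import numpy as np

def doublets_to_triplets(edge_index: np.ndarray) -> np.ndarray:
    """Sketch of the doublet->triplet conversion: each doublet (hit pair)
    becomes a node of the new graph, and two doublets that share a middle
    hit (end of one = start of the other) are joined by a triplet edge."""
    triplet_edges = []
    n_doublets = edge_index.shape[1]
    for doublet_in in range(n_doublets):
        shared_hit = edge_index[1, doublet_in]
        # All doublets whose first hit is the end of this doublet.
        doublet_out = np.where(edge_index[0] == shared_hit)[0]
        triplet_edges.extend((doublet_in, d) for d in doublet_out)
    return np.asarray(triplet_edges).T   # shape (2, n_triplets)

# Usage: doublets (0->1), (1->2), (1->3), (3->4) give the triplets
# (0->1, 1->2), (0->1, 1->3) and (1->3, 3->4).
doublets = np.array([[0, 1, 1, 3],
                     [1, 2, 3, 4]])
print(doublets_to_triplets(doublets))
```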

[Figure: a classified doublet graph (edge scores e.g. 0.99, 0.87, 0.84) and its conversion to a triplet graph]

Barrel-only, adjacent-layers:

Purity: 99.1% ± 0.07%

Inference time: ~5 seconds per event per GPU, split between:
• ~3 ± 1 seconds for embedding construction
• ~2 ± 1 seconds for the two GNN steps and processing

GRAPH NEURAL NETWORKS

TRACK SEEDING PERFORMANCE

[Figure: seed efficiency vs. pT (GeV) at PU = 200; total efficiency roughly 0.82–0.94 for pT from 0 to 4 GeV]

GRAPH NEURAL NETWORKS

TRACK LABELLING PERFORMANCE

• 0.84 TrackML score in the barrel, no noise

• The upper-limit score out of metric learning was 0.90, with only barrel, adjacent layers, etc.

• Now able to run the full detector, with the geometry-free embedding

• The upper-limit score out of metric learning is now at least 0.935

• We can keep pushing this value at the expense of purity (i.e. GPU memory)

[Figure: TrackML score; barrel-only, adjacent-layers]

GRAPH NEURAL NETWORKS

CHALLENGES

Going to the full detector, does the GNN understand our more epistemologically motivated truth?

Going to the full detector, can we fit a whole model on a GPU?

GRAPH NEURAL NETWORKS

CHALLENGES

Going to the full detector, does the GNN understand our more epistemologically motivated truth?
We will see performance results in the code walkthrough.

Going to the full detector, can we fit a whole model on a GPU? No.
We need some memory hacks: checkpointing, a larger GPU, mixed precision.

GRAPH NEURAL NETWORK

MEMORY MANAGEMENT

Half precision:
• Performance unaffected
• No speed improvements yet, pending a PyTorch (PyTorch Geometric) GitHub pull request

[Figure: peak GPU usage (GB) in full precision (FP) vs. mixed precision (MP), and the MP/FP ratio (≈0.45–0.50), for configurations (hidden features, message-passing iterations, training batch size): (32, 8, 1), (32, 8, 2), (64, 6, 1), (128, 8, 1)]
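A minimal sketch of a mixed-precision training step with `torch.cuda.amp` (the model and data here are placeholders; it needs a CUDA device):

```python
import torch

# Placeholder model and data for one mixed-precision training step.
model = torch.nn.Linear(64, 1).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

features = torch.randn(4096, 64, device="cuda")
targets = (torch.rand(4096, 1, device="cuda") > 0.5).float()

optimizer.zero_grad()
with torch.cuda.amp.autocast():          # forward pass runs in half precision
    loss = torch.nn.functional.binary_cross_entropy_with_logits(
        model(features), targets)
scaler.scale(loss).backward()            # scale the loss to avoid fp16 underflow
scaler.step(optimizer)
scaler.update()
```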

GRAPH NEURAL NETWORK

MEMORY MANAGEMENT

Checkpointing (PyTorch):
• Memory improvement
• Minimal speed penalty (~1.2x longer training)
• We checkpoint each iteration (partial checkpointing), so there is still a scaling with iterations
• Can checkpoint all iterations (maximal checkpointing)

[Figure: memory usage with no checkpointing vs. maximal checkpointing]
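A minimal sketch of partial gradient checkpointing with `torch.utils.checkpoint`, re-computing each message-passing iteration on the backward pass instead of storing its activations (`gnn_layer` is a placeholder for one iteration of the network):

```python
import torch
from torch.utils.checkpoint import checkpoint

# Placeholder for one shared message-passing iteration of the GNN.
gnn_layer = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())

def forward_with_checkpointing(v: torch.Tensor, n_iterations: int = 8) -> torch.Tensor:
    for _ in range(n_iterations):
        # Only the iteration inputs are kept; activations inside the layer are
        # recomputed on backward, trading a little extra time for memory.
        v = checkpoint(gnn_layer, v)
    return v

# Usage: a large node-feature tensor, as in a full-event graph.
v = torch.randn(100000, 64, requires_grad=True)
forward_with_checkpointing(v).sum().backward()
```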

GRAPH NEURAL NETWORK

CODE WALKTHROUGH


GRAPH NEURAL NETWORK

PERFORMANCE


Full detector, geometry-free construction

METRIC LEARNING & GRAPH NEURAL NETWORK PIPELINE

SUMMARY

We handle the full detector, noise, geometry-free inference, and distributed training, with care.

We can learn an embedding space without layer information, provided we equip training with hard negative mining, cell information, and warm-up.

We can run the GNN on a full event, provided we equip training with gradient checkpointing and mixed precision.

We can include noise without re-training, at a small (~20%) penalty to purity.

BACKUP


OUTLOOK

Converging on better architectures (attention, gated RNNs, generalising dense, flat methods to sparse, graph structure – not that the two are mutually exclusive; there is increasing interest in sparse CNN techniques, for example).

Dwivedi, Vijay Prakash, et al. "Benchmarking graph neural networks." arXiv preprint arXiv:2003.00982 (2020).

OUTLOOK

Converging on better methods (sparse operations, triplet graph structure, fast clustering, approximate nearest neighbours, piggy-backing off big-tech methods, e.g. Facebook FAISS).

[Figure: Google Trends of “Graph Neural Networks”]
[Figure: doublet graph to higher-order classification]

Converging on better hardware (mixed-precision handling on new GPUs/TPUs, sparse handling in IPUs, compilability of graph-structure ML libraries for FPGA ports, e.g. the IEEE HPEC GraphChallenge).
