Identification of Jets Containing b-Hadrons with Recurrent Neural Networks at the ATLAS Experiment
ATL-PHYS-PUB-2017-003
Dan Guest
ATLAS Collaboration
UC Irvine
May 9, 2017
[email protected] (UCI) RNN b-tagging May 9, 2017 1 / 20
Why b-tag? (an oversimplification)
• Want Higgs?
• The Higgs mostly decays to b-quarks
• b-quarks make jets
• At the LHC, everything makes jets
• Not everything makes b-jets
[Figure: Standard Model of elementary particles: quarks (u, d, c, s, t, b), leptons (e, μ, τ and their neutrinos), gauge bosons (g, γ, Z, W) and the Higgs boson, each labelled with mass, charge, and spin]
The Standard Model (as Seen by Collider Physics)
• Some are stable
• Many are unstable
• Some form jets
• Some are metastable
• Neutrinos → E_T^miss
• Short-lived particles are a big part of what we measure!
The b-hadron Decay Chain
[Figure: b-hadron decay cascade (B → D → K), displaced from the primary vertex (PV)]
• b-hadrons decay through a cascade
• βγcτ ≈ 6.4 mm for a B hadron with p_T = 70 GeV
• But many decay distances are comparable to the detector resolution
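The 6.4 mm figure follows from βγ = p/m. A minimal sketch, assuming approximate PDG values that are not on the slide (m_B ≈ 5.28 GeV, cτ ≈ 0.48 mm, between the B0 and B+ values) and p = p_T cosh η:

```python
import math

# Assumed inputs (not from the slides): approximate PDG values.
M_B_GEV = 5.28     # B-meson mass [GeV]
CTAU_B_MM = 0.48   # proper decay length c*tau [mm], between the B0 and B+ values

def mean_decay_length_mm(pt_gev, eta=0.0, mass_gev=M_B_GEV, ctau_mm=CTAU_B_MM):
    """Mean lab-frame flight distance beta*gamma*c*tau, using beta*gamma = p/m."""
    p_gev = pt_gev * math.cosh(eta)   # |p| = pT * cosh(eta)
    return (p_gev / mass_gev) * ctau_mm

print(f"{mean_decay_length_mm(70.0):.1f} mm")  # 6.4 mm
```

This is only the mean: for a fixed momentum the flight distance is exponentially distributed, so many b-hadrons decay much closer to the primary vertex than 6.4 mm.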
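Reconstructing Secondary Vertices: The ATLAS Approaches
[Figure: the two ATLAS approaches: a single secondary vertex (SV) displaced from the primary vertex (PV), and JetFitter, which places vertices along the flight line from the PV]
• Many discriminants come from vertices; these are combined with machine learning (ML)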
The problem with SV tagging
• Sometimes we don’t find a vertex
• Requires cutting on track-vertex compatibility
• Cuts always lose information
• Tuned “by hand”
• Experiment-specific
[Figure: number of JetFitter (JF) vertices with ≥ 2 tracks for b, c, and light-flavour jets, arbitrary units; ATLAS Simulation Preliminary, √s = 13 TeV]
• There is no FastJet for vertex reconstruction
Impact parameter (IP) tagging
• Take all tracks in a jet
• Apply some selection
• Extrapolate to the perigee
• Per-track discriminants:
  ◦ S_d0 ≡ d0/σ_d0
  ◦ S_z0 ≡ z0/σ_z0
  ◦ track “quality”
[Figure: track impact parameter (IP) relative to the primary vertex (PV)]
• Compute a per-track likelihood L_f(track) with f ∈ {b, c, light}
  ◦ Per-jet likelihood: $p_f = \prod_{\text{trk}} L_f(\text{track variables})$
• IP-based tagging is the problem we solve with RNNs
  ◦ More on this later
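The per-track significances and the per-jet product can be written in a few lines. A NumPy sketch; the function names are hypothetical, and the per-track likelihood values would in practice come from simulation-based templates. Summing logarithms instead of multiplying avoids numerical underflow for jets with many tracks, and the product form makes explicit that each track is treated independently.

```python
import numpy as np

def significance(ip_mm, sigma_mm):
    """S = IP / sigma(IP), e.g. S_d0 = d0 / sigma_d0 (elementwise on arrays)."""
    return np.asarray(ip_mm, dtype=float) / np.asarray(sigma_mm, dtype=float)

def jet_log_likelihood(track_likelihoods):
    """ln p_f = sum over tracks of ln L_f(track), i.e. the log of the product."""
    return float(np.sum(np.log(np.asarray(track_likelihoods, dtype=float))))

def log_likelihood_ratio(lik_b, lik_light):
    """ln(p_b / p_light) for one jet, from per-track likelihoods under each hypothesis."""
    return jet_log_likelihood(lik_b) - jet_log_likelihood(lik_light)

# Hypothetical per-track likelihoods for a 2-track jet (illustrative numbers only).
print(log_likelihood_ratio([0.2, 0.5], [0.1, 0.1]))
```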
Putting it all together
Low-level
• IP: track-based variables
  ◦ Likelihood: gives p_b, p_c, p_light
• SV: gives vertex variables
• JetFitter: similar to SV
High-level
• MV2: combines the low-level outputs with a BDT
• It’s easy to focus on the high-level tagger (MV2), but the upstream taggers are important too
IP3D: ATLAS’s IP Tagger
• Need to define L_f(track)
  ◦ L_f(S_d0, S_z0, category)
  ◦ S_d0 distribution shown in the figure
• Use histograms from simulation
• 3D binning scheme:
  ◦ 35 bins in S_d0
  ◦ 20 bins in S_z0
  ◦ 14 track categories
• The track category represents the quality of the track
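A sketch of a histogram-template lookup with the quoted 35 × 20 × 14 binning, using NumPy. The bin edges are hypothetical (uniform, chosen for illustration); the real IP3D binning is not given on the slide. One such template would be filled per flavour.

```python
import numpy as np

N_SD0, N_SZ0, N_CAT = 35, 20, 14                    # binning quoted on the slide
# Hypothetical uniform bin edges, for illustration only.
SD0_EDGES = np.linspace(-20.0, 40.0, N_SD0 + 1)
SZ0_EDGES = np.linspace(-20.0, 20.0, N_SZ0 + 1)

def bin_index(values, edges):
    """Map values to bin indices, folding under/overflow into the edge bins."""
    idx = np.searchsorted(edges, values, side="right") - 1
    return np.clip(idx, 0, len(edges) - 2)

def fill_template(sd0, sz0, category):
    """Normalised 3D histogram (one per flavour) filled from simulated tracks."""
    hist = np.zeros((N_SD0, N_SZ0, N_CAT))
    np.add.at(hist, (bin_index(sd0, SD0_EDGES), bin_index(sz0, SZ0_EDGES),
                     np.asarray(category)), 1.0)
    return hist / hist.sum()

def track_likelihood(template, sd0, sz0, category):
    """L_f(track) = template content of the track's (S_d0, S_z0, category) bin."""
    return template[bin_index(sd0, SD0_EDGES), bin_index(sz0, SZ0_EDGES), category]
```

Empty bins return a likelihood of zero, which would make ln p_f = −∞; a realistic template would need some smoothing or a floor value, which this sketch omits.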
[Figure: track signed d0 significance for “Good” category tracks in b, c, and light-flavour jets (log scale, arbitrary units); ATLAS Simulation Preliminary, √s = 13 TeV]
Improving Upstream Taggers: What IP3D misses
• Relations among tracks:
  ◦ relation to neighbor bins
  ◦ relation to neighbor tracks
• These are important (see the leading vs. subleading S_d0 figures)
• New (SV-inspired) track variables:
  ◦ p_T^frac ≡ p_T^track / p_T^jet
  ◦ ΔR(track, jet)
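Both new variables are simple per-track quantities. A sketch, assuming the usual collider definition ΔR = √(Δη² + Δφ²) with pseudorapidity η and Δφ wrapped into [−π, π); the function names are hypothetical:

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance sqrt(d_eta^2 + d_phi^2) between, e.g., a track and its jet axis."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))

def pt_fraction(track_pt, jet_pt):
    """p_T^frac = p_T^track / p_T^jet."""
    return track_pt / jet_pt
```

The wrap matters: without it, a track at φ = 3.1 and a jet at φ = −3.1 would look 6.2 apart instead of about 0.08.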
Curse of Dimensionality
• Already 29,400 bins (35 × 20 × 14 = 9,800 bins for each of the three flavour templates)
• Each new variable → ∼10× more bins (and ∼10× more events to “train”)
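The bin-count arithmetic as a short sketch, assuming three flavour templates (b, c, light). The training-size estimate further assumes perfectly uniform bin occupancy and a hypothetical 100 entries per bin; both are optimistic, since the real distributions are strongly peaked.

```python
from math import prod

def n_template_bins(bins_per_variable, n_flavours=3):
    """Total bins over all flavour templates (b, c, light by default)."""
    return prod(bins_per_variable) * n_flavours

def tracks_needed(bins_per_variable, entries_per_bin=100, n_flavours=3):
    """Rough count of simulated tracks needed, assuming uniform bin occupancy."""
    return n_template_bins(bins_per_variable, n_flavours) * entries_per_bin

ip3d = [35, 20, 14]                        # S_d0, S_z0, track category
print(n_template_bins(ip3d))               # 29400
print(n_template_bins(ip3d + [10]))        # 294000: one extra 10-bin variable
print(tracks_needed(ip3d + [10]))          # 29400000
```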
[Figure: subleading vs. leading track S_d0 for b-jets and for light-jets, p_T > 20 GeV, |η| < 2.5; ATLAS Simulation Preliminary, √s = 13 TeV]
Recurrent Neural Networks (RNNs)
• RNNs can process a sequence of arbitrary length
• The output is a fixed-dimensional vector for each jet
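A toy NumPy illustration of this property: one recurrent update per track, so any number of tracks folds into a fixed-size hidden state, followed by a softmax over the four flavour outputs (b, c, light, τ) used in this talk. Assumptions: a plain tanh (Elman) cell, hidden size 8, and untrained random weights; the class name is hypothetical, and the cell type used by ATLAS is not specified on this slide.

```python
import numpy as np

FLAVOURS = ("b", "c", "light", "tau")

class TinyRNNTagger:
    """Toy Elman RNN + softmax head with random, untrained weights."""

    def __init__(self, n_inputs, n_hidden=8, n_outputs=len(FLAVOURS), seed=0):
        rng = np.random.default_rng(seed)
        self.W_x = rng.normal(scale=0.1, size=(n_hidden, n_inputs))
        self.W_h = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        self.b_h = np.zeros(n_hidden)
        self.W_o = rng.normal(scale=0.1, size=(n_outputs, n_hidden))
        self.b_o = np.zeros(n_outputs)

    def encode(self, tracks):
        """Fold a (n_tracks, n_inputs) sequence into one fixed-size hidden state."""
        h = np.zeros_like(self.b_h)
        for x in tracks:                       # one step per track, any number of tracks
            h = np.tanh(self.W_x @ x + self.W_h @ h + self.b_h)
        return h

    def __call__(self, tracks):
        """Return (p_b, p_c, p_light, p_tau) for one jet."""
        logits = self.W_o @ self.encode(tracks) + self.b_o
        z = np.exp(logits - logits.max())      # numerically stable softmax
        return z / z.sum()
```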
ROC Curves for a Multi-Background Discriminant
• Eventually we’ll combine this with vertex-based approaches
• Conventional HEP discriminants are binary
  ◦ Train against a mix of backgrounds (e.g. the MV2 background is 7% c-jets)
• We use 4 outputs:
  ◦ p_b: bottom jet
  ◦ p_c: charm jet
  ◦ p_light: “light” jet (u, d, s, g)
  ◦ p_τ: τ jet
• Combine everything into a single discriminant for the sake of plots
$$D_{\mathrm{RNN}} = \ln \frac{p_b}{f_c\, p_c + f_\tau\, p_\tau + (1 - f_c - f_\tau)\, p_{\mathrm{light}}} \qquad (1)$$
• The f weighting parameters can be adjusted after training
• For this talk: f_c = 0.07, f_τ = 0
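Eq. (1) as a function; since the f parameters only enter here, they can be changed without retraining the network. A NumPy sketch with the defaults quoted above (f_c = 0.07, f_τ = 0); the function name is hypothetical.

```python
import numpy as np

def d_rnn(p_b, p_c, p_light, p_tau, f_c=0.07, f_tau=0.0):
    """Log-ratio of p_b to a weighted background mix; works on scalars or arrays."""
    p_b, p_c, p_light, p_tau = (np.asarray(p, dtype=float)
                                for p in (p_b, p_c, p_light, p_tau))
    background = f_c * p_c + f_tau * p_tau + (1.0 - f_c - f_tau) * p_light
    return np.log(p_b / background)

# Same network outputs, two background mixes chosen after training.
print(d_rnn(0.7, 0.2, 0.1, 0.0), d_rnn(0.7, 0.2, 0.1, 0.0, f_c=0.5))
```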
RNN Performance (compared to IP3D)
[Figure: light-jet rejection (1/ε_l) and c-jet rejection (1/ε_c) vs. b-jet efficiency ε_b for IP3D, RNNIP(S_d0, S_z0, category), RNNIP(S_d0, S_z0, category, p_T^frac), and RNNIP(S_d0, S_z0, category, p_T^frac, ΔR); p_T > 20 GeV, |η| < 2.5; ATLAS Simulation Preliminary, √s = 13 TeV]
• The lowest curve is IP3D
• Next up: the RNN with IP3D inputs
• Each new variable adds discrimination
• At the 70% working point:
  ◦ The RNN with IP3D inputs improves light-jet rejection by a factor of 1.7
  ◦ Adding ΔR(track, jet) and p_T^frac improves light-jet rejection by a factor of 2.5
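Working points and rejections like these can be computed directly from discriminant samples. A sketch with made-up Gaussian discriminants for two hypothetical taggers (not ATLAS data); an improvement “by a factor X” is then the ratio of the two rejections at the same b-jet efficiency.

```python
import numpy as np

def cut_for_efficiency(disc_signal, eff=0.7):
    """Threshold such that a fraction `eff` of signal (b) jets have disc > threshold."""
    return float(np.quantile(np.asarray(disc_signal, dtype=float), 1.0 - eff))

def rejection(disc_background, cut):
    """Background rejection 1/epsilon at the given cut (inf if nothing passes)."""
    eff = float(np.mean(np.asarray(disc_background) > cut))
    return np.inf if eff == 0.0 else 1.0 / eff

# Toy comparison with made-up Gaussian discriminants (not ATLAS data).
rng = np.random.default_rng(1)
light = rng.normal(0.0, 1.0, 200_000)
rej = {}
for name, mean_b in [("tagger A", 2.0), ("tagger B", 2.5)]:
    b = rng.normal(mean_b, 1.0, 200_000)
    rej[name] = rejection(light, cut_for_efficiency(b, eff=0.7))
print(rej, "improvement factor:", rej["tagger B"] / rej["tagger A"])
```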
RNN Performance (compared to high-level tagger)
[Figure: light-jet rejection and c-jet rejection vs. b-jet efficiency ε_b for MV2c10, RNNIP, IP3D, and SV1; p_T > 20 GeV, |η| < 2.5; ATLAS Simulation Preliminary, √s = 13 TeV]
• MV2 (which uses IP3D) still rejects more background for ε_b < 0.9
• But MV2 also uses JetFitter and SV → much more information
• Using the RNN as an input to MV2 is outside the scope of this talk
  ◦ But we can imagine replacing IP3D with the RNN
RNN Performance by p_T
[Figure: light-jet rejection and c-jet rejection vs. b-jet p_T [GeV] at a flat 70% b-tagging working point (WP) for MV2c10, RNNIP, IP3D, and SV1; p_T > 20 GeV, |η| < 2.5; ATLAS Simulation Preliminary, √s = 13 TeV]
• Cut on the discriminant such that ε_b = 0.7 in each p_T bin
• Same trend as on the previous slide: rejection for IP3D < RNN < MV2
• The RNN tagger is no more p_T-dependent than the other taggers
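A flat-efficiency working point uses one threshold per p_T bin instead of one global cut. A NumPy sketch; the function names and the convention that jets outside the binned range fail the selection are assumptions.

```python
import numpy as np

def flat_efficiency_cuts(disc, pt, pt_edges, eff=0.7):
    """One threshold per pT bin so that each bin keeps a fraction `eff` of b-jets."""
    disc, pt = np.asarray(disc), np.asarray(pt)
    bin_idx = np.digitize(pt, pt_edges) - 1             # -1 or n_bins: out of range
    cuts = np.full(len(pt_edges) - 1, np.nan)
    for i in range(len(cuts)):
        in_bin = disc[bin_idx == i]
        if in_bin.size:
            cuts[i] = np.quantile(in_bin, 1.0 - eff)
    return cuts

def passes(disc, pt, pt_edges, cuts):
    """Apply the per-bin cuts; jets outside the binned pT range fail."""
    disc, pt = np.asarray(disc), np.asarray(pt)
    bin_idx = np.digitize(pt, pt_edges) - 1
    ok = (bin_idx >= 0) & (bin_idx < len(cuts))
    result = np.zeros(disc.shape, dtype=bool)
    result[ok] = disc[ok] > cuts[bin_idx[ok]]
    return result
```

With a single global cut, a discriminant that shifts with p_T would give a p_T-dependent b-jet efficiency; the per-bin cuts remove that effect so that the rejections in different bins are compared at the same ε_b.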
RNN output correlation with input: S_d0 and S_z0
[Figure: correlation ρ(D_RNN, S_d0) and ρ(D_RNN, S_z0) vs. the i-th track in the sequence, for b-, c-, and light-jets; p_T > 20 GeV, |η| < 2.5; ATLAS Simulation Preliminary, √s = 13 TeV]
• The D_RNN output is highly correlated with the S_d0 of “early” tracks in the |S_d0| ordering
• Interesting, but maybe not surprising: b-hadron decays give ∼5 tracks
• The effect is less pronounced for S_z0
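A sketch of how such a correlation-vs-position curve can be computed, assuming tracks are sorted by decreasing |S_d0| and using the Pearson coefficient over the jets that have at least i tracks. The function name is hypothetical, and the test data below are synthetic.

```python
import numpy as np

def correlation_by_position(disc, sd0_per_jet, max_tracks=15):
    """Pearson rho(D_RNN, S_d0 of the i-th track), tracks sorted by decreasing |S_d0|."""
    disc = np.asarray(disc, dtype=float)
    ordered = [s[np.argsort(-np.abs(s), kind="stable")]
               for s in map(np.asarray, sd0_per_jet)]
    rhos = np.full(max_tracks, np.nan)
    for i in range(max_tracks):
        has_ith = np.array([len(s) > i for s in ordered])   # jets with an i-th track
        if has_ith.sum() < 2:
            continue
        ith_sd0 = np.array([s[i] for s in ordered if len(s) > i])
        rhos[i] = np.corrcoef(disc[has_ith], ith_sd0)[0, 1]
    return rhos
```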
RNN output correlation with input: ΔR and p_T^frac
[Figure: correlation ρ(D_RNN, p_T^frac) and ρ(D_RNN, ΔR) vs. the i-th track in the sequence, for b-, c-, and light-jets; p_T > 20 GeV, |η| < 2.5; ATLAS Simulation Preliminary, √s = 13 TeV]
• Much less correlation between D_RNN and ΔR(track, jet) or p_T^frac
• But these are useful discriminants nonetheless
Notes on Software
• We train with Keras
  ◦ Using the Theano backend
• Our reconstruction framework doesn’t support batched NumPy arrays
• Within our reconstruction, we evaluate with lwtnn
  ◦ Used in ATLAS for top and W tagging
  ◦ Also used by CMS for DeepFlavour
Help and other ideas welcome
• lwtnn is written “as needed”
• Is there a more sustainable approach?
Conclusions
• RNNs are a promising tool for flavor tagging
  ◦ Use relatively low-level variables
  ◦ Can augment vertex-based approaches
• Many interesting questions:
  ◦ What other low-level variables could we include?
  ◦ How does this complement a high-level tagger (e.g. MV2, DeepFlavour)?
  ◦ How does this compare to the CMS approach?
  ◦ Can we “understand” (visualize) what we’ve learned?
• Thanks for listening, ideas are welcome!
BACKUP
Thanks
• Michela Paganini and Jonathan Shlomi for the graphics
• Zihao Jiang, Michael Kagan, Michela, and the rest of the RNN team for training lots of networks
• The ATLAS flavor tagging group for a good problem
• ATLAS for all the simulation
IP3D Categories
Fractional contribution of each track category [%]:

| # | Category | b-jets | c-jets | light-jets |
|---|---|---|---|---|
| 0 | No hits in first two layers; expected hit in IBL and b-layer | 1.9 | 2.0 | 1.9 |
| 1 | No hits in first two layers; expected hit in IBL and no expected hit in b-layer | 0.1 | 0.1 | 0.1 |
| 2 | No hits in first two layers; no expected hit in IBL and expected hit in b-layer | 0.04 | 0.04 | 0.04 |
| 3 | No hits in first two layers; no expected hit in IBL and b-layer | 0.03 | 0.03 | 0.03 |
| 4 | No hit in IBL; expected hit in IBL | 2.4 | 2.3 | 2.1 |
| 5 | No hit in IBL; no expected hit in IBL | 1.0 | 1.0 | 0.9 |
| 6 | No hit in b-layer; expected hit in b-layer | 0.5 | 0.5 | 0.5 |
| 7 | No hit in b-layer; no expected hit in b-layer | 2.4 | 2.4 | 2.2 |
| 8 | Shared hit in both IBL and b-layer | 0.01 | 0.01 | 0.03 |
| 9 | At least one shared pixel hit | 2.0 | 1.7 | 1.5 |
| 10 | Two or more shared SCT hits | 3.2 | 3.0 | 2.7 |
| 11 | Split hits in both IBL and b-layer | 1.0 | 0.87 | 0.6 |
| 12 | Split pixel hit | 1.8 | 1.4 | 0.9 |
| 13 | Good | 83.6 | 84.8 | 86.4 |

• Fractions are based on simulated tt̄
Training
• Use 3.2 million jets from simulated tt̄
• Training time: a few days on CPUs, on a (busy) cluster
• We only train on the first 15 tracks (only 0.5% of jets have 15+ tracks)
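Since jets have different numbers of tracks, batched training typically means padding or truncating each jet to a fixed length. A NumPy sketch that keeps the first 15 tracks (in whatever order the tracks are already sorted) and returns a mask marking real, non-padded entries; the function name and the zero padding value are assumptions, not details from the slides.

```python
import numpy as np

MAX_TRACKS = 15   # from the slide: only the first 15 tracks are used

def pad_tracks(jets, max_tracks=MAX_TRACKS):
    """jets: list of (n_tracks, n_features) arrays -> (n_jets, max_tracks, n_features) + mask."""
    n_features = jets[0].shape[1]
    out = np.zeros((len(jets), max_tracks, n_features), dtype=np.float32)
    mask = np.zeros((len(jets), max_tracks), dtype=bool)
    for i, tracks in enumerate(jets):
        n = min(len(tracks), max_tracks)      # truncate long jets, pad short ones
        out[i, :n] = tracks[:n]
        mask[i, :n] = True
    return out, mask
```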
Track Selection
• Jet algorithm: anti-k_t, R = 0.4
• Track p_T > 1 GeV
• |d0| < 1 mm, |z0 sin θ| < 1.5 mm
• n_Si hits ≥ 7, n_Si holes ≤ 2, n_pixel holes ≤ 1
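The track cuts above written as a single predicate. A sketch; the `Track` container and its field names are hypothetical, with p_T in GeV and d0, z0 in mm.

```python
import math
from dataclasses import dataclass

@dataclass
class Track:
    """Hypothetical track record holding only the quantities the selection needs."""
    pt_gev: float
    d0_mm: float
    z0_mm: float
    theta: float          # polar angle [rad]
    n_si_hits: int
    n_si_holes: int
    n_pixel_holes: int

def passes_track_selection(t: Track) -> bool:
    """Apply the track selection listed above."""
    return (
        t.pt_gev > 1.0
        and abs(t.d0_mm) < 1.0
        and abs(t.z0_mm * math.sin(t.theta)) < 1.5
        and t.n_si_hits >= 7
        and t.n_si_holes <= 2
        and t.n_pixel_holes <= 1
    )
```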