
Page 1:

Machine Learning in PandaRoot

GlueX-Panda Workshop, George Washington University, May 2019

Ralf Kliemt (GSI)

Page 2:

Motivation

• Machine Learning (ML) is about modelling data

• Self-learning algorithms gain knowledge from data to make predictions

Why?

• Gain computation speed in online scenarios

• More precision, e.g. by respecting correlations

• Let algorithms do the tedious tasks of recognising patterns, structures, principal components etc.

!2

Page 3:

!3

Which type of ML?

Page 4:

!4

Which type of ML?

Page 5:

ML Activities at PANDA

[Diagram: the PANDA data-processing chain (Sim, Digi, Local Reco, Global Reco, Event Building, Event Selection, Analysis, Storage, Paper), with Raw Data from the PANDA FPGA stage, Alignment & Calibration inputs, and the online / offline / sim domains.]

!5

Page 6:

ML Activities at PANDA

[Same data-processing diagram as on the previous slide.]

!6

Page 7:

Key Concept

!7

Page 8:

Boosted Decision Tree (BDT)

!8

• Data is broken down in steps of decisions

• Decisions are based on features in the training data

• Splits are chosen by maximising the information gain

—> Typical application: classification for PID
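For orientation, a minimal sketch of how such a BDT could be trained with ROOT/TMVA (ROOT 6 DataLoader interface) follows. The input file, tree and variable names (pid_training.root, emc_eop, stt_dedx, dirc_thetaC) are placeholders, not the actual PandaRoot PID trainer.

#include "TFile.h"
#include "TTree.h"
#include "TMVA/Factory.h"
#include "TMVA/DataLoader.h"
#include "TMVA/Types.h"

void trainPidBdt()
{
  // Hypothetical training sample with per-track PID observables
  TFile *input = TFile::Open("pid_training.root");
  TTree *sig = (TTree*)input->Get("signal");
  TTree *bkg = (TTree*)input->Get("background");

  TFile *out = TFile::Open("tmva_pid_bdt.root", "RECREATE");
  TMVA::Factory factory("PidBdt", out, "!V:AnalysisType=Classification");
  TMVA::DataLoader loader("dataset");

  loader.AddVariable("emc_eop", 'F');      // EMC E/p
  loader.AddVariable("stt_dedx", 'F');     // STT dE/dx
  loader.AddVariable("dirc_thetaC", 'F');  // DIRC Cherenkov angle
  loader.AddSignalTree(sig, 1.0);
  loader.AddBackgroundTree(bkg, 1.0);
  loader.PrepareTrainingAndTestTree("", "SplitMode=Random:NormMode=NumEvents");

  // Tree ensemble; node splits use the Gini index as separation criterion
  factory.BookMethod(&loader, TMVA::Types::kBDT, "BDT",
                     "NTrees=400:MaxDepth=3:BoostType=AdaBoost:SeparationType=GiniIndex");

  factory.TrainAllMethods();
  factory.TestAllMethods();
  factory.EvaluateAllMethods();
  out->Close();
}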

Page 9:

Artificial Neural Networks (ANN)

!9

• Transform data meaningfully

• Learn iteratively
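To make "learn iteratively" concrete, here is a purely illustrative toy in plain C++: a one-hidden-layer network fitted to y = sin(x) by stochastic gradient descent. The PANDA studies use full ML libraries instead; nothing below is PandaRoot code.

#include <cmath>
#include <cstdio>
#include <vector>

int main()
{
  std::vector<double> xs, ys;
  for (double x = 0.; x < 3.; x += 0.1) { xs.push_back(x); ys.push_back(std::sin(x)); }

  const int H = 8;                                  // hidden neurons
  std::vector<double> w1(H), b1(H, 0.), w2(H, 0.1); // weights and biases
  for (int j = 0; j < H; ++j) w1[j] = 0.5 * (j + 1) / H;
  double b2 = 0., lr = 0.01;                        // output bias, learning rate

  for (int epoch = 0; epoch < 2000; ++epoch) {      // iterative training loop
    for (size_t i = 0; i < xs.size(); ++i) {
      std::vector<double> h(H);
      double yhat = b2;                             // forward pass: tanh layer + linear output
      for (int j = 0; j < H; ++j) { h[j] = std::tanh(w1[j] * xs[i] + b1[j]); yhat += w2[j] * h[j]; }
      double err = yhat - ys[i];                    // backward pass: squared-error gradient
      for (int j = 0; j < H; ++j) {
        double dh = err * w2[j] * (1. - h[j] * h[j]);
        w2[j] -= lr * err * h[j];
        w1[j] -= lr * dh * xs[i];
        b1[j] -= lr * dh;
      }
      b2 -= lr * err;
    }
  }
  double y = b2;                                    // check the fit at x = 1
  for (int j = 0; j < H; ++j) y += w2[j] * std::tanh(w1[j] * 1.0 + b1[j]);
  std::printf("f(1.0) = %.3f, sin(1.0) = %.3f\n", y, std::sin(1.0));
  return 0;
}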

Page 10:

A lot to choose from…

!10

Page 11:

!11

A lot to choose from…

Page 12:

Programming Frameworks & Packages

!12

• ROOT / TMVA
• NumPy
• TensorFlow (Deep Learning)
• Keras (on top of TensorFlow, with GPUs)
• MLlib (Apache Spark)
• scikit-learn
• PyTorch (Deep Learning)
• DL4J (Deep Learning, Java)
• R implementations

Page 13:

Popular Choices

!13

Page 14:

Machine Learning for Forward Tracking

Page 15:

!15

Artificial Neural Networks:

Application to the FTS:

• Create all possible combinations of hit pairs (adjacent layers).

• Train the network to predict whether hit pairs are on the same track or not.

• Input observables:
  1) Hit pair positions in the x-z projection (vertical layers).
  2) Drift radii (isochrones).
  3) Distance between the hits.

• Output:
  1) Probability that the hit pair is on the same track.

• Connect hits that pass the probability cut (threshold), e.g. probability(h1-h2) > threshold and probability(h2-h3) > threshold, so h1, h2, h3 are on the same track (sketched in code below).

[Slide from: "Machine Learning For Track Finding at PANDA FTS", Waleed Esmail, Tobias Stockmanns, and James Ritman, Institut für Kernphysik (IKP), Forschungszentrum Jülich, on behalf of the PANDA Collaboration; contribution at "Connecting the Dots 2019", 03 April 2019.]
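The linking step described above can be sketched as follows. This is a hypothetical illustration of the procedure (adjacent-layer pairs, probability cut, chaining), with a dummy stand-in for the trained network; it is not the implementation by the Jülich group.

#include <cmath>
#include <utility>
#include <vector>

struct FtsHit { double x, z, isochrone; };

// Stand-in for the trained ANN, based only on the hit distance here;
// in the study the network also sees positions and isochrones.
double sameTrackProbability(const FtsHit &a, const FtsHit &b)
{
  double d = std::hypot(a.x - b.x, a.z - b.z);
  return std::exp(-d);                               // placeholder, not physics
}

// layers[l] holds the hits of straw layer l; a candidate is a chain of
// (layer, hit index) pairs, linked whenever the pair probability passes the cut.
std::vector<std::vector<std::pair<int, int> > >
buildCandidates(const std::vector<std::vector<FtsHit> > &layers, double threshold = 0.5)
{
  std::vector<std::vector<std::pair<int, int> > > candidates;
  std::vector<std::vector<bool> > used(layers.size());
  for (size_t l = 0; l < layers.size(); ++l) used[l].assign(layers[l].size(), false);

  for (size_t l = 0; l + 1 < layers.size(); ++l) {
    for (size_t i = 0; i < layers[l].size(); ++i) {
      if (used[l][i]) continue;
      std::vector<std::pair<int, int> > chain;
      chain.push_back(std::make_pair((int)l, (int)i));
      size_t curLayer = l, curHit = i;
      while (curLayer + 1 < layers.size()) {         // extend through adjacent layers
        int best = -1;
        double bestP = threshold;
        for (size_t j = 0; j < layers[curLayer + 1].size(); ++j) {
          double p = sameTrackProbability(layers[curLayer][curHit], layers[curLayer + 1][j]);
          if (p > bestP) { bestP = p; best = (int)j; }
        }
        if (best < 0) break;                         // no pair above the threshold
        chain.push_back(std::make_pair((int)(curLayer + 1), best));
        used[curLayer + 1][best] = true;
        ++curLayer; curHit = (size_t)best;
      }
      if (chain.size() >= 3) candidates.push_back(chain);  // keep chains of >= 3 hits
    }
  }
  return candidates;
}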

Page 16:

!16

ANN Tracking: Pattern recognition with parallel layers

[Figure: A) first layers, B) middle layers inside the magnet, C) last layers.]

Page 17:

!17

ANN Tracking: Residuals including skewed layers

First promising results inside the magnetic field

Page 18:

!18

RNN (LSTM) Tracking: Residuals including skewed layers

Next Step: Track Fitting with RNN - stay tuned

Page 19:

Machine Learning for Particle Identification

Page 20:

PID: Usual Observables

!20

[Panels: EMC: E/p vs. p; DRC: ΘC vs. p; DSC: ΘC vs. p; STT: dE/dx vs. p; MUO: L_iron vs. p (µ and non-µ).]

Figure 3: PID raw detector info for EMC, DIRC, DISC, STT and MUO. Distributions are superposed for all particle species (electrons, muons, pions, kaons, protons) as a function of momentum.

[Panels: P(e), P(π), P(K), P(p) vs. p for electrons, pions, kaons and protons (upper row) and for non-electrons, non-pions, non-kaons and non-protons (lower row).]

Figure 4: Graphical representation of the combined PID likelihood values (detectors: EMC, STT, DRC, DSC, MUO) for electrons, pions, kaons and protons as a function of particle momentum. The plots in the upper row show the distributions for the correct particle type, the lower ones for the incorrect type. It can clearly be seen that for the correct type the distributions tend to higher likelihood values, while for the incorrect type they accumulate around P = 0. The PID preselection for the studies in this note was chosen to be P > 0.1 as a very loose veto against wrong particle types.

(charged particles only)

Page 21:

PID Approaches

!21

Bayes (combination of measurements):

P(h \mid \vec{x}) = \frac{\mathcal{L}(\vec{x} \mid h)\, P(h)}{\sum_{h' = e,\mu,\pi,K,p} \mathcal{L}(\vec{x} \mid h')\, P(h')}

\mathcal{L}(\vec{x} \mid h) = \prod_k p_k(\vec{x} \mid h), \qquad k = \mathrm{MVD}\ dE/dx,\ \mathrm{DRC}\ \theta_C,\ \ldots

i.e. the probability that a given track with parameters \vec{x} corresponds to particle type h.

Machine Learning:
A. Boosted Decision Tree (BDT)
B. "Deep Learning" Artificial Neural Network (ANN)

—> gain performance by considering correlations
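A minimal sketch of the Bayesian combination written above, assuming per-detector likelihoods and priors are already available; the names and toy numbers are illustrative, not the PandaRoot PID interface.

#include <array>
#include <cstdio>

enum Hyp { kE, kMu, kPi, kK, kP, kNHyp };

// posterior[h] = prod_k p_k(x|h) * P(h), normalised over the five hypotheses
std::array<double, kNHyp> combine(const std::array<std::array<double, kNHyp>, 3> &detLikelihoods,
                                  const std::array<double, kNHyp> &prior)
{
  std::array<double, kNHyp> post{};
  double norm = 0.;
  for (int h = 0; h < kNHyp; ++h) {
    double L = 1.;
    for (const auto &det : detLikelihoods) L *= det[h];   // product over detectors k
    post[h] = L * prior[h];
    norm += post[h];
  }
  for (auto &p : post) p /= norm;
  return post;
}

int main()
{
  // toy per-detector likelihoods for one track (e.g. MVD dE/dx, STT dE/dx, DRC thetaC)
  std::array<std::array<double, kNHyp>, 3> L = {{{0.10, 0.10, 0.50, 0.20, 0.10},
                                                 {0.20, 0.20, 0.40, 0.10, 0.10},
                                                 {0.10, 0.10, 0.60, 0.10, 0.10}}};
  std::array<double, kNHyp> prior = {0.2, 0.2, 0.2, 0.2, 0.2};  // flat priors
  std::array<double, kNHyp> post = combine(L, prior);
  std::printf("P(pi | x) = %.3f\n", post[kPi]);
  return 0;
}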

Page 22:

Input Features

!22

Input:

• pp̄ → X Y, where X and Y = e∓, π∓, μ∓, K∓, p∓

• Beam momentum: 15 GeV/c

Page 23:

Performance Plots

!23

Boosted Decision Tree Artificial Neural Network

Page 24:

Confusion Matrices

!24

Boosted Decision Tree Artificial Neural Network

Page 25:

Pions

!25

Boosted Decision Tree Artificial Neural Network

Page 26:

Kaons

!26

Boosted Decision Tree Artificial Neural Network

Page 27:

Machine Learning for the Software Trigger

Page 28:

Expected Data Rates

!28

• PANDA will run with a continuous beam
• Event rates will be high; some events will overlap
• Storage constraints in size & bandwidth
• The data rate has to be reduced to ~1/1000
• No specific hardware trigger possible
• Event topology of signals is similar to background

—> no “Jets” or similarly obvious features

Solution: An online physics filter (“software trigger”)

Page 29:

Event Generation (Signal, Background)
→ Simulation & Reconstruction
→ Event Filtering (Combinatorics, Mass Window Selection, Trigger-Specific Selection → Event Tagging)
→ Global Trigger Tag

!29

Present status of the PANDA software trigger

March 5, 2014

Abstract

This note presents the current status of the PANDA software trigger project. Apart from the present results obtained from Monte Carlo simulated events for various PANDA physics channels of interest, the task is defined and intersections to the DAQ and detector projects are pointed out.

[Diagram from the note: EvtGen signal samples (Physics Channel 1 ... Physics Channel m) and DPM background, processed with Toy MC / Full MC, feed the trigger lines (Trigger 1, Trigger 2, ... Trigger n), which are combined in a Trigger Decision (logical OR).]

reduce bg. by 1/1000

Software Trigger
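A minimal sketch of the tagging logic on this slide: each trigger line applies a mass-window cut on its candidates and the global tag is the logical OR over all lines. Masses, window widths and the event representation are placeholders, not the software-trigger code.

#include <cmath>
#include <string>
#include <vector>

struct TriggerLine { std::string name; double mass; double window; };  // GeV/c^2

// one trigger line fires if any candidate mass falls inside its window
bool lineFires(const TriggerLine &line, const std::vector<double> &candidateMasses)
{
  for (double m : candidateMasses)
    if (std::fabs(m - line.mass) < line.window) return true;
  return false;
}

// global tag = logical OR over all trigger lines
bool globalTriggerTag(const std::vector<TriggerLine> &lines,
                      const std::vector<std::vector<double> > &candidatesPerLine)
{
  for (size_t i = 0; i < lines.size(); ++i)
    if (lineFires(lines[i], candidatesPerLine[i])) return true;
  return false;
}

int main()
{
  // toy numbers: two trigger lines and the candidate masses built for each
  std::vector<TriggerLine> lines = { {"J/psi -> mu mu", 3.097, 0.08}, {"D0 -> K pi", 1.865, 0.07} };
  std::vector<std::vector<double> > cands = { {3.05, 1.20}, {2.40} };
  return globalTriggerTag(lines, cands) ? 0 : 1;
}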

Page 30:

Software Trigger - Cuts

!30

• Cut-based approach, optimised on signal & background MC
• Many observables taken into account
• Correlations are not respected

[Mass spectra (ToyMC) for the ten trigger lines applied to the ηc → KS K+ π- signal dataset, with the individual signal efficiencies: pp̄ → e+e- (ε = 0.0%), φ → K+K- (ε = 0.3%), ηc → KS K+ π- (ε = 72.5%), J/ψ → e+e- (ε = 0.1%), J/ψ → μ+μ- (ε = 0.2%; ε_t = 89.1%), D0 → K-π+ (ε = 77.2%), D+ → K-π+π+ (ε = 45.7%), Ds+ → K+K-π+ (ε = 6.4%), Λ → pπ- (ε = 2.1%), Λc → pK-π+ (ε = 3.2%).]

Figure 9: Mass window cuts (ToyMC) — Illustration of simultaneous tagging for the ηc → KS K+ π- dataset example at √s = 5.5 GeV. The individual signal efficiencies ε for the different trigger lines after the mass window cuts have been applied are given on the corresponding plots; in addition the global efficiency ε_tot for all 10 channels applied simultaneously for triggering is given top/right, for discussion see text.

[Mass spectra (ToyMC) for the same trigger lines applied to the DPM background dataset: pp̄ → e+e- (ε = 0.0%), φ → K+K- (ε = 0.8%), ηc → KS K+ π- (ε = 3.4%), J/ψ → e+e- (ε = 0.0%), J/ψ → μ+μ- (ε = 0.0%; ε_t = 21.9%), D0 → K-π+ (ε = 7.2%), D+ → K-π+π+ (ε = 8.6%), Ds+ → K+K-π+ (ε = 2.8%), Λ → pπ- (ε = 7.6%), Λc → pK-π+ (ε = 5.2%).]

Figure 10: Mass window cuts (ToyMC) — Illustration of simultaneous tagging for the DPM background dataset example at √s = 5.5 GeV. The individual signal efficiencies ε for the different trigger lines after the mass window cuts have been applied are given on the corresponding plots; in addition the global efficiency ε_tot for all 10 channels applied simultaneously for triggering is given top/right, for discussion see text.

Note that due to combinatorics the 50k input events may result in a larger number of entries in the histograms, about a factor of three larger in this example. The other trigger lines will cross-tag the channel at hand at various rates; e.g. the e+e- trigger accepts no event of this dataset, thus the efficiency is ε = 0.0%, whereas e.g. ε = 43.6% of the events are accepted by the φ trigger, and so on. In total, the 10 trigger lines tag ε_t = 90.4% of the events of the Ds+ data.

For the second example, the ηc dataset (Fig. 9), the 8σ mass cut applied on the ηc mass for the ηc → KS K+ π- trigger results in an efficiency ε = 72.5%. Also here, the e+e- trigger does not accept any event of this dataset (ε = 0.0%), and e.g. ε = 0.3% of the events are accepted by the φ tag, and so on. The total efficiency of the 10 simultaneous trigger lines for the ηc data results in ε_t = 89.1%.

In case of the DPM dataset (Fig. 10), applying the 8σ mass cuts for all 10 trigger lines results in a total efficiency ε_t = 21.9% (e.g. ε = 2.8% for the Ds+ tag, ε = 3.4% for the ηc tag, ε = 0.0% for

2014

Page 31:

Software Trigger - Cuts

!31

[Plots (FullMC): signal efficiencies per trigger channel (e+e-, φ, ηc, J/ψ(2e), J/ψ(2μ), D0, D±, Ds, Λ, Λc) and the background fraction as a function of √s, for mass window cuts only, for all cuts optimised for high efficiency, and for all cuts optimised for high suppression.]

Figure 17: FullMC: Summary of signal (left) and background (right) efficiencies, after mass window cuts applied.

Figure 18: FullMC: Summary of signal and background efficiencies after all cuts, mass window cuts and further cuts optimised for signal efficiency.

Figure 19: FullMC: Summary of signal and background efficiencies after all cuts, mass window cuts and further cuts optimised for background suppression.

optimisation for background suppression, the total simultaneous trigger efficiencies obtained are ε_t = 14.5% (Ds+ data set), ε_t = 6.4% (ηc data set) and, "by definition", ε_t = 0.1% (DPM data set), respectively (Fig. 29). Again these efficiency values are summarised for all data sets at all five pp̄ centre-of-mass energies under study in Tab. 8 and Tab. 9. The full information on these results of each individual trigger line for each data set is summarised for completeness in Tab. 23 and Tab. 24, respectively, in the Appendix (Sec. 8).

All the results of signal and background efficiencies obtained after the further cuts applied for the two approaches of optimisation (Tab. 8 and Tab. 9) are in addition graphically compiled in Fig. 18 and Fig. 19.

While in the case of signal-efficiency-optimised cuts a significantly improved background suppression (factors roughly between 4 and 25) can be achieved, the signal efficiencies are basically

• Trade-off between signal efficiency & background suppression

• Many channels = much feedthrough & cross-tagging

Goal: BG suppression 1/1000

2014

Page 32:

Software Trigger - TMVA

!32

• First studies with many algorithms
• Dependence on offered observables
—> output performance
—> calculation speed
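For illustration, applying a trained classifier event by event could look like the sketch below; the observable names and the weight-file path are assumptions, only the generic TMVA::Reader interface is real.

#include "TMVA/Reader.h"

// event-shape observables bound to the reader (names are illustrative)
static float nCharged, meanTheta, maxPt;

TMVA::Reader *setupTriggerReader()
{
  TMVA::Reader *reader = new TMVA::Reader("!Color:!Silent");
  reader->AddVariable("nCharged",  &nCharged);
  reader->AddVariable("meanTheta", &meanTheta);
  reader->AddVariable("maxPt",     &maxPt);
  // weight file produced by the training step (hypothetical path)
  reader->BookMVA("ANN", "dataset/weights/SoftwareTrigger_ANN.weights.xml");
  return reader;
}

// fill the bound variables for the current event before calling this
bool tagEvent(TMVA::Reader *reader, float cutValue = 0.5)
{
  return reader->EvaluateMVA("ANN") > cutValue;
}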

[Plots: signal efficiency per trigger channel (φ, D0, D±, Ds, J/ψ(2e), J/ψ(2μ), ηc, e+e-, Λ, Λc) and background fraction vs. √s for the TMVA application, compared to the mass cut only.]

Figure 23: CFMlpANN with primary event shape variables only (FullMC) — Summary of signal (left) and background (right) efficiencies, after mass window cuts and CFMlpANN with primary event shape variables only applied.

Approach 3: CFMlpANN with primary event shape variables. The third TMVA approach is based on only the primary event shape variables. The number of input variables is fixed at 28 for each trigger category. To test the TMVA performance of global event shape variables, no secondary variables from each resonance, such as p or pt, are required. Thus the contents of the list differ slightly from approach 1. A few more variables, such as the number of hits in the trackers and the mean scattering angle of all charged particles, are additionally introduced to improve the TMVA performance. In this case all generated events can be put into the training, therefore no MC-truth-matched samples are necessary at all. This has the advantage of reducing the statistical uncertainty due to the size of the training samples. Again, all introduced variables are listed in Table ?? (Appendix Sec. 8). In Fig. 25, training responses and discriminator distributions for the approach with the few best variables are plotted for 9 tagging categories at E = 5.5 GeV. A cut value around 0.5 should be suitable to separate signal and background for every category. For the approach with global event shape variables, the background reduction performance is slightly worse than for TMVA approach 1 or 2. At E = 5.5 GeV the remaining background fraction increases up to ε_t = 5.74%. However, the signal efficiencies are much higher than for both TMVA approaches 1 and 2. A special feature of the approach with only global event shape variables is the enhanced tagging efficiency for the ηc and Λc data: the signal efficiency is recovered, reaching ε_t = 33.64% and ε_t = 42.08% for the ηc and Λc, respectively. Efficiencies and background reduction for TMVA with primary event shape variables at all five centre-of-mass energies are summarised in Fig. 23 and Table 12. The full information on these results of each individual trigger line for each data set is summarised for completeness in Table 30.

Approach 4: Systematics and summary of the TMVA application. As a systematic check, another well-known non-linear classifier, the Boosted Decision Tree (BDT), has been tested. Several different algorithms for boosted classifiers exist; here a version of the adaptive boost tree is taken, namely BDTD with variable transformation. To reduce the correlation among the variables for boosted algorithms, it is suggested to transform all input variables into more appropriate shapes in advance. This preprocessing transformation may lead to better performance for the BDT method and reduce the training time.

Table 12: CFMlpANN with primary event shape variables only (FullMC) — Summary of the total simultaneous trigger efficiencies ε_t [%] for the different data sets, after mass window cuts and CFMlpANN with primary event shape variables only applied.

√s (GeV)   e+e-    φ       ηc      J/ψ(ee)   J/ψ(μμ)   D0      D±      Ds      Λ       Λc      DPM
2.4        50.38   34.35   -       -         -         -       -       -       18.68   -       0.81
3.77       42.36   41.29   33.64   39.48     54.76     44.90   32.57   -       19.01   -       1.58
4.5        44.22   40.06   45.38   37.59     53.96     50.51   43.45   45.05   18.88   -       2.70
5.5        38.84   37.43   50.82   42.80     57.08     52.44   47.60   51.05   19.81   42.08   5.74

CFMlpANN

Goal: BG suppression 1/1000

[TMVA overtraining checks for nine classifiers (FDA_GA, Fisher, RuleFit, KNN, Likelihood, MLP, BDT, SVM, TMlpANN): classifier response distributions for signal and background, training vs. test samples, with Kolmogorov-Smirnov probabilities.]

Figure 30: Classification of 9 TMVA algorithms for the J/ψ selection in the J/ψ → l+l-π+π- event at E = 5.5 GeV data.

[ROC curves: background rejection versus signal efficiency for the MVA methods BDT, SVM, RuleFit, MLP, KNN, TMlpANN, Likelihood, FDA_GA and Fisher.]

Figure 31: ROC curve: summary of signal efficiency and background rejection obtained by different algorithms of J/ψ classification for the J/ψ → l+l-π+π- event at E = 5.5 GeV data.

2014

Page 33:

Software Trigger on GPU?

!33

FastSim - Principal Component Analysis - no GPU yet

Reaction: pp̄ → D+D- → K-π+π+ D-(incl.) (& c.c.)

MC input:

                      Signal   Background
Total events           24713        52180
True selection         23555        50350
False selection         1158         1830
True selection rate     0.953        0.965
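A small sketch of a principal-component step with ROOT's TPrincipal, in the spirit of the FastSim study above; the choice of four observables per event and the surrounding code are assumptions, not the actual trigger implementation.

#include <array>
#include <vector>
#include "TPrincipal.h"

// events: one row of four trigger observables per event (which observables is up to the analysis)
void runPca(const std::vector<std::array<double, 4> > &events)
{
  TPrincipal pca(4, "ND");                       // 4 variables, normalised covariance, keep data
  for (size_t i = 0; i < events.size(); ++i) pca.AddRow(events[i].data());
  pca.MakePrincipals();                          // eigen-decomposition of the covariance matrix
  pca.Print();                                   // eigenvalues: variance carried by each component

  if (!events.empty()) {
    double p[4];
    pca.X2P(events.front().data(), p);           // project one event onto the principal axes
  }
}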

Page 34:

Software Trigger on GPU

!34

FullSim - Artificial Neural Network - training on GTX1080Ti

Reaction: pp̄ → D+D- → K-π+π+ D-(incl.) (& c.c.)

MC input. All mass spectra normalised to 1.

Page 35:

Summary

• Machine learning sees a comeback in physics
• Many available libraries with fresh concepts

• ML in algorithms under development:
  • Forward tracking
  • Charged PID
  • Software trigger

• Potential parts which may benefit from ML:
  • EMC cluster shape analysis
  • Event building
  • Physics analysis

!35

Page 36:

Thanks for your attention.

Page 37:

Page 38:

PandaRoot Communication

• Code Repository: pandaatfair.githost.io

• Issue tracker, including discussions

• Wiki page: panda-wiki.gsi.de/foswiki/bin/view/Computing/PandaRoot

• Forums: forum.gsi.de & Slack: pandaroot.slack.com

• Bi-weekly online meetings: Thu. 10-11

• Dashboard:

https://cdash.gsi.de/index.php?project=PandaRoot

!38

Page 39:

!39

TString inputGenerator = "psi2s_Jpsi2pi_Jpsi_mumu.dec"; // "dpm", "ftf" or e.g. "box:type(211,1):p(1,1):tht(10,120):phi(0,360)"

PndMasterRunSim *fRun = new PndMasterRunSim();
fRun->SetInput(inputGenerator);
fRun->SetName("TGeant4");
fRun->SetOptions("");
fRun->SetParamAsciiFile("all.par");
fRun->SetBeamMom(7.0);
fRun->SetStoreTraj(kTRUE);

fRun->Setup("evtcomplete"); // file name prefix
fRun->CreateGeometry();
fRun->SetGenerator();
fRun->AddSimTasks();

fRun->Init();
fRun->Run(1000); // nEvents
fRun->Finish();

e.g. in macro/master

Short ROOT macros to start simulations
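Such a macro is typically executed non-interactively, e.g. root -l -b -q sim_complete.C (the macro name here is just an example); the generator string, beam momentum and number of events are the usual parameters to adapt.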