
Particle Identification

Kanglin He, Gang Qin, Bin Huang, Jifeng Hu, Xiaobin Ji, Yajun Mao

email: [email protected]

November 9, 2006

Contents

1 Introduction
2 The PID system of BESIII
  2.1 The dE/dx measurements
  2.2 The TOF counter
  2.3 The CsI(Tl) Calorimeter
  2.4 The muon system
3 The Correlated Analysis in TOF PID
  3.1 General algorithm
  3.2 Errors and correlations of TOF measurements
  3.3 Combining the time-of-flight from two layers' measurements
4 Control sample
  4.1 Hadron Sample
  4.2 Electron Sample
  4.3 Muon Sample
5 The Likelihood Method
  5.1 Probability Density Functions
  5.2 Likelihood
  5.3 Consistency
  5.4 Weighted Likelihood
  5.5 Using likelihood, consistencies, and probabilities
  5.6 An example of TOF and dE/dx PID
  5.7 Cell analysis
  5.8 The role of neural nets
6 The Toolkit for Multiple Variables Analysis
7 The Artificial Neural Network Method
  7.1 The TMVA Algorithm Factories

1 Introduction

Particle identification (PID) will play an essential role in most of the BESIII physics program. Good µ/π separation is required for precise f_D/f_Ds measurements, and excellent electron identification will help to improve the precision of the CKM elements V_cs and V_cd. The identification of hadrons (π/K/p) is the most common tool in BESIII physics analyses, and sometimes the most crucial one, for example in searches for D0−D̄0 mixing and CP violation.

Each part of the BESIII detector performs its own function and provides a vast amount of information, which determines the final efficacy of particle identification. Ideally, particle identification would discriminate correctly and unambiguously between particle species. In practice, physicists are imperfect, detectors are imperfect, backgrounds are present, and particles decay and interact as they traverse the detector. In general, the detector response depends on the incident angle and the hit position of the charged track. Such non-uniformities over the different geometrical and physical regions of the detector have been carefully calibrated [?].

The particle identification assignment made for any particular track is not always correct. However, by properly using all available information, one can discriminate between hypotheses most powerfully and test the consistency between the data and the selected hypothesis. In recent years a number of PID algorithms have been developed: the likelihood method, the Fisher discriminant, the H-matrix estimator, the artificial neural network, the boosted decision tree, etc.

2 The PID system of BESIII

The BESIII detector consists of a beryllium beam pipe, a helium-based small-cell drift chamber, Time-Of-Flight (TOF) counters for particle identification, a CsI(Tl) crystal calorimeter, a super-conducting solenoid magnet with a field of 1 Tesla, and a muon identifier using the magnet yoke interleaved with Resistive Plate Counters (RPC).

2.1 The dE/dx measurements

The Main Drift Chamber (MDC) measures the drift times and the energy losses (dE/dx) of charged particles as they pass through the working gas. It consists of 43 layers of sense wires and operates with a 60%/40% He/C3H8 gas mixture. The momentum of a particle is obtained by fitting a helical curve to the set of position coordinates provided by the drift-time measurements. The energy loss in the drift chamber provides additional information for particle identification. Figure 1 shows how the normalized pulse height varies with momentum and particle species. The normalized pulse height is proportional to the energy loss of the incident particle in the drift chamber, which is a function of βγ = p/m, where γ = 1/√(1 − β²), and p and m are the momentum and mass of the charged particle. Charged particles of different mass have different velocities at the same momentum.


Figure 1: Normalized pulse heights (dE/dx) vs. momentum of charged particles.

Together with the momentum measurement, dE/dx therefore provides information on the mass of the particle.
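To make this concrete, the following minimal Python sketch (illustrative only, not BOSS code; the masses are the usual PDG values) evaluates βγ = p/m for the π/K/p hypotheses at a fixed momentum, showing why the bands in Figure 1 separate at low momentum.

```python
# Illustrative sketch: beta*gamma = p/m for several mass hypotheses at fixed momentum.
PARTICLE_MASSES = {"pi": 0.13957, "K": 0.49368, "p": 0.93827}  # GeV/c^2

def beta_gamma(p_gev, mass_gev):
    """dE/dx is (approximately) a universal function of beta*gamma = p/m."""
    return p_gev / mass_gev

for name, mass in PARTICLE_MASSES.items():
    print(f"{name}: beta*gamma = {beta_gamma(0.6, mass):.2f} at p = 0.6 GeV/c")
```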

Many factors affect the dE/dx measurement: the number of hits, the average path length in each cell, space-charge and saturation effects, the non-uniformity of the electric field, etc. Most of them are related to the incident angle and momentum of the charged particle.

2.2 The TOF counter

Outside the MDC is the TOF system, which is crucial for particle identification. It consists of a two-layer barrel array of 88 scintillators (50 mm × 60 mm × 2320 mm, BC480) in each layer and endcap arrays of 48 fan-shaped BC404 scintillators. Hamamatsu R5942 fine-mesh phototubes are used, two on each barrel scintillator and one on each endcap scintillator. The expected time resolution for kaons and pions with two layers is 100–110 ps, giving a 2σ K/π separation up to 0.9 GeV/c for normal tracks.

In an e+e− collision, all produced particles fly from the interaction point (IP) toward the outer detector. The task of the TOF system is to measure the flight time for charged particle identification,

t = L/(βc),   β = p/√(p² + m²),

where c is the velocity of light, m is the mass of the charged particle, β is its velocity, and L and p are the flight path and the momentum of the charged particle given by the MDC measurements. Usually there are two equivalent ways to use the TOF information: comparing the measured time t_mea with the predicted time t_exp and looking for the hypothesis for which ∆t = t_mea − t_exp is closest to zero, or calculating the measured mass of the charged particle through

β = L/(c · t_mea),   m² = p² · (1 − β²)/β².   (2.1)

A typical mass-squared distribution calculated with Eq. (2.1) is shown in Figure 2.
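A minimal sketch of Eq. (2.1), with illustrative numbers rather than BESIII reconstruction output, is given below; the flight path is taken in mm and the time in ns.

```python
# Sketch of Eq. (2.1): infer beta and m^2 from momentum, flight path and measured time.
C_LIGHT_MM_PER_NS = 299.792458

def mass_squared(p_gev, path_mm, t_meas_ns):
    beta = path_mm / (C_LIGHT_MM_PER_NS * t_meas_ns)
    return p_gev ** 2 * (1.0 - beta ** 2) / beta ** 2  # in (GeV/c^2)^2

# Example: a 0.8 GeV/c track over a ~1 m flight path arriving after 3.9 ns
print(mass_squared(0.8, 1000.0, 3.9))  # ~0.24, close to the kaon mass squared
```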

The PID capability relies on the time resolution (σ_t) of the TOF system. σ_t depends on the pulse height, the hit position and the beam status. Because the performances of the scintillators, PMTs and electronics differ, the value of σ_t usually varies from counter to counter.


Figure 2: Mass square distribution from TOF measurements.

2.3 The CsI(Tl) Calorimeter

The CsI(Tl) crystal electromagnetic calorimeter (EMC) contains 6240 crystals and is used to measure the energies of photons precisely. The expected energy and spatial resolutions at 1 GeV are 2.5% and 0.6 cm, respectively. The electromagnetic shower characteristics are different for electrons, muons and hadrons, so the energy deposit and the shower shape in the calorimeter can be used to identify particles.

The energy loss per unit length by excitation and/or ionization is given by dE/dx and is essentially the same for all energetic particles. In CsI(Tl) the energy loss is approximately 5.63 MeV/cm for a minimum ionizing particle (MIP), and the energy deposited by ionization is about 0.165 GeV for a charged particle passing through the EMC at normal incidence. Since electrons and positrons produce electromagnetic showers as they pass through the calorimeter, their energy loss is dominated by pair production and bremsstrahlung, with only some energy lost by ionizing or exciting atomic electrons. They therefore deposit essentially all of their energy in the calorimeter, and the ratio of deposited energy to track momentum (E/p) is approximately unity. Hadrons sometimes deposit more energy than expected from dE/dx alone, giving a higher E/p ratio, because of nuclear interactions with the material. The energy loss of muons is governed by dE/dx only, so the E/p distribution for muons is lower and narrower. Figure 3(a) shows the energy deposit versus momentum for electrons, pions and muons in the EMC. Generally, we expect

(E/p)µ < (E/p)π < (E/p)e. (2.2)

The "shape" of a shower can be described by three parameters: E_seed, the energy deposited in the central crystal; E_3×3, the energy deposited in the central 3×3 crystal array; and E_5×5, the energy deposited in the central 5×5 crystal array. Muons pass through the crystals without generating a shower, leaving just a simple line of hits, so E_seed/E_3×3 and E_3×3/E_5×5 are both close to 1. These ratios are different for the electromagnetic showers produced by electrons and by some interacting pions. As shown in Figure 3(b), it is expected that

(E_seed/E_3×3)_e < (E_seed/E_3×3)_π < (E_seed/E_3×3)_µ,
(E_3×3/E_5×5)_e < (E_3×3/E_5×5)_π < (E_3×3/E_5×5)_µ.   (2.3)

The second moment S is defined as

S = Σ_i E_i · d_i² / Σ_i E_i,   (2.4)


Figure 3: (a) Energy deposit in the EMC vs. the momentum of electrons, pions and muons; (b) ratio E_3×3/E_5×5 for electrons, pions and muons.

where E_i is the energy deposited in the i-th crystal and d_i is the distance between that crystal and the reconstructed shower center. The original idea of S was developed by the Crystal Ball experiment to distinguish clusters generated by π0's from those generated by γ's. For a single electron or muon most of the energy is deposited in the central crystal, so S is small; for an interacting hadron S is relatively large, which helps to separate pions from electrons.
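A minimal sketch of Eq. (2.4) with hypothetical per-crystal inputs (not the BOSS EMC interface) is shown below; a compact electron-like cluster gives a small S, a spread-out hadron-like cluster a larger one.

```python
# Sketch of Eq. (2.4): second moment S of a cluster from crystal energies and distances.
def second_moment(energies_gev, distances_cm):
    """S = sum(E_i * d_i^2) / sum(E_i)."""
    weighted = sum(e * d * d for e, d in zip(energies_gev, distances_cm))
    return weighted / sum(energies_gev)

print(second_moment([0.90, 0.03, 0.02], [0.0, 5.0, 5.0]))   # compact, electron-like
print(second_moment([0.30, 0.25, 0.20], [0.0, 5.0, 10.0]))  # spread out, hadron-like
```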

2.4 The muon system

The magnet return iron is instrumented with 9 layers of Resistive Plate Chambers (RPC) in the barrel and 8 layers in the endcaps to form a muon counter. An average chamber efficiency of 95% is obtained, with a spatial resolution of 16.6 mm.

Electrons exhaust their energy in the calorimeter and cannot reach the muon counter. Most hadrons that pass through the material of the calorimeter and magnet are absorbed in the return iron, whereas muons have strong penetrating power. Muons usually produce one hit per layer, while hadrons may produce many hits in a given layer if an interaction occurs. The distances between the muon hits and the extrapolated positions of the inner track help to reduce the hadron contamination further, since the hits generated by a secondary muon from pion/kaon decay do not match the inner track very well. Figure ?? shows the distributions of the travel depth, the average number of hits per layer and the distance between the muon hit and the extrapolated position.

3 The Correlated Analysis in TOF PID

When a charged particle passes through the barrel TOF array, there are up to two or four measurements, corresponding to hits in one or two layers of counters. At BESIII the problem of averaging the TOF measurements is complicated by the fact that the different measurements have correlated errors from the common event start time. The appropriate choice is a weighted average of the different measurements that takes these correlations into account.


3.1 General algorithm

Suppose we have n measurements t_i of a particular time-of-flight. Since the measurements are correlated, we need more information than just the individual errors. Accordingly, let us define the covariance matrix V_t, whose elements are given by (V_t)_ij = <δt_i δt_j>, where δt_i = t_i − t and t is the average of the t_i. The best linear estimator of the time-of-flight which accounts for all measurements, including errors and correlations, can be constructed generally as

t = Σ_i w_i t_i,   with   Σ_i w_i = 1,   (3.1)

where the weights w_i must be determined. Writing δt = Σ_i w_i δt_i and using the definition of the standard deviation, we get

σ_t² = Σ_ij w_i w_j (V_t)_ij.   (3.2)

To minimize σ_t² subject to the condition Σ_i w_i = 1, we use the Lagrange multiplier technique. Let us write

σ_t² = Σ_ij w_i w_j (V_t)_ij + λ (Σ_i w_i − 1)   (3.3)

and set the derivatives of Eq. (3.3) with respect to the w_i and the Lagrange multiplier λ to zero. This gives the solution

w_i = Σ_k (V_t⁻¹)_ik / Σ_jk (V_t⁻¹)_jk.   (3.4)
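The weights of Eq. (3.4) are easy to evaluate numerically; the sketch below (plain NumPy, with illustrative numbers rather than calibrated BESIII resolutions) combines two correlated time measurements and returns the best estimate and its error from Eqs. (3.1)–(3.4).

```python
import numpy as np

# Sketch of Eqs. (3.1)-(3.4): best linear combination of correlated TOF measurements.
def combine_tof(times, cov):
    vinv = np.linalg.inv(cov)
    weights = vinv.sum(axis=1) / vinv.sum()      # w_i = sum_k (V^-1)_ik / sum_jk (V^-1)_jk
    t_best = weights @ times                     # Eq. (3.1)
    sigma_t = np.sqrt(weights @ cov @ weights)   # Eq. (3.2)
    return t_best, sigma_t, weights

# Two measurements sharing a common (correlated) start-time term; numbers are illustrative
times = np.array([3.92, 3.88])                   # ns
cov = np.array([[0.100 ** 2, 0.060 ** 2],
                [0.060 ** 2, 0.110 ** 2]])       # ns^2
print(combine_tof(times, cov))
```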

3.2 Errors and correlations of TOF measurements

The time resolution (σ_t) of a TOF counter can be factorized as a function of the pulse height Q and the hit position z [?]. The variation of σ_t with Q is complicated and needs a detailed study on real data. In this note only the z dependence of σ_t is taken into account, since it behaves in a similar manner for electrons, muons and hadrons. Fig. ?? shows a typical one-end readout σ_t as a function of z from Bhabha events.

The measured time t_mea is determined by both the end-time and the start-time measurements. The accuracy of the end-time is limited by the detector, while the precision of the start-time is controlled by the uncertainty of t0. Thus, for a given TOF counter, the t_mea at the two readout ends can be decomposed as

t_1 = t_c + (t_D)_1,   t_2 = t_c + (t_D)_2,   (3.5)

where t_{1,2} are the t_mea's at the two readout ends, t_c is the correlated part and (t_D)_{1,2} are the uncorrelated parts. Let us define

t_+ = (t_1 + t_2)/2,   t_− = (t_1 − t_2)/2.   (3.6)

The fluctuation of t_c (σ_c) can be extracted directly by comparing the time resolutions of t_+ and t_− (σ_{+,−}): σ_c = √(σ_+² − σ_−²). σ_+(z), σ_−(z) and σ_c(z) are drawn in Fig. ??; as shown there, σ_c(z) is approximately constant.


For a one-layer barrel TOF measurement, the covariance matrix can be expressed as

V_t = ( σ_1²   σ_c² )
      ( σ_c²   σ_2² ),   (3.7)

where σ_{1,2} are the time resolutions of the two readout ends, which are functions of z. Applying Eq. (3.7) to Eqs. (3.1)−(3.4), we get

w_1 = (σ_2² − σ_c²) / (σ_1² + σ_2² − 2σ_c²),   w_2 = (σ_1² − σ_c²) / (σ_1² + σ_2² − 2σ_c²),   (3.8)

and

σ_t² = (σ_1² · σ_2² − σ_c⁴) / (σ_1² + σ_2² − 2σ_c²),   (3.9)

from which the average t is easily obtained. The time resolution σ_t(z) is drawn in Fig. ??.
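The closed form of Eqs. (3.8)–(3.9) can be coded directly; the sketch below uses the same illustrative numbers as the general sketch in Section 3.1 and reproduces its result.

```python
# Sketch of Eqs. (3.8)-(3.9): two-end combination with a correlated start-time term.
def combine_two_ends(t1, t2, sigma1, sigma2, sigma_c):
    denom = sigma1 ** 2 + sigma2 ** 2 - 2.0 * sigma_c ** 2
    w1 = (sigma2 ** 2 - sigma_c ** 2) / denom
    w2 = (sigma1 ** 2 - sigma_c ** 2) / denom
    t = w1 * t1 + w2 * t2
    sigma_t = ((sigma1 ** 2 * sigma2 ** 2 - sigma_c ** 4) / denom) ** 0.5
    return t, sigma_t

print(combine_two_ends(3.92, 3.88, 0.100, 0.110, 0.060))
```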

3.3 Combining the time-of-flight from two layers’ measurements

4 Control sample

4.1 Hadron Sample

4.2 Electron Sample

At electron-positron colliders there is an excellent source of electrons, the QED Bhabha scattering process e+e− → e+e−. The electron is nearly massless, so radiative corrections are very important for this process. The spectrum of radiated photons is in general soft, but there is a long tail which extends up to the beam energy. Roughly half of the photons are emitted from the initial state and the remainder from the final state.

Photons from the initial state are seldom detected; they are emitted nearly parallel to the beam direction and so usually remain within the beam pipe. Thus, if the photon from the scattering process e+e− → γe+e− is detected, it is most likely the result of bremsstrahlung from one of the final-state electrons. This has several fortunate consequences. The final-state electron which does not radiate will have momentum approximately equal to the beam energy. The electron which does radiate will have lower momentum; its momentum distribution peaks at high momentum but extends to very low momentum.

The electron sample was drawn from events produced by radiative Bhabha scattering. For such events, stringent cuts were placed on the photon and on the charged track of higher momentum; the charged track of lower momentum was added to the electron sample. The cuts imposed were:

1) that there must be two and only two tracks in the drift chamber, with well-measured momenta and showers in the barrel calorimeter;

2) that there must be at least one neutral shower in the barrel calorimeter with a measured energy of at least 200 MeV;

3) that the higher-momentum track must have momentum and deposited energy consistent with the beam energy;

4) that the lower-momentum track must have momentum at least 200 MeV below the beam energy;

5) that the direction of the neutral track must match the direction of the missing momentum.

Neutral tracks were required to have a measured energy of at least 200 MeV for several reasons. The shower detector is almost completely efficient for photons of energy greater than 50 MeV, but some neutral showers are not associated with incident photons; these result from 'split-offs' of the showers of charged tracks or from electronic noise. Most of these spurious neutral tracks have low measured energy, and requiring the energy to be greater than 200 MeV removes the majority of them.

The energy requirement has another effect. The electron sample is needed to study the pattern of energy deposition by electrons in the shower detector. The photon and the radiating electron are emitted in approximately the same direction; the larger the photon energy, the larger the angle between their directions. If the photon is required to deposit at least 200 MeV in the EMC, the photon and electron showers will seldom overlap. Figure ?? shows the energy spectrum of the detected photons and the momentum spectrum of the electron sample.

4.3 Muon Sample

To study muon identification (especially for low-momentum muons), a large cosmic-ray sample covering a broad momentum range is selected from the data. Muons are the main component of cosmic rays: hadrons and other electromagnetic components are filtered by the iron yoke, so only a few percent of hadron background remains in the sample. The hadron fraction can easily be estimated by comparison with the e+e− → µ+µ− (dimu) sample. When a single cosmic ray passes through the tracking detector near the interaction region, it is in most cases reconstructed as two tracks.

The cosmic-ray selection proceeds as follows: two charged MDC tracks are required; a cut on the total EMC energy (less than 1.5 GeV) is applied to remove Bhabha events; and both charged tracks must have good TOF information, with the difference of the two times of flight required to be greater than 5 ns. Compared with collision events, cosmic-ray events have a different T0. To obtain better tracking resolution, the sample is reconstructed again with a T0 correction. Relative to collision events, the T0 of a cosmic-ray event is shifted by

T_0 = (T_1 + T_2)/2,   (4.1)

where T_1 and T_2 are the measured times of flight. The drift times are corrected as

T_corr = T_meas − T_0 + { 0                                      (φ > 180°)
                          (T_2 − T_1) × R_{4×(L−1)+W} / R_TOF     (φ ≤ 180°),   (4.2)


Figure 4: Momentum p, φ and cos θ distributions for the cosmic-ray sample selected from BESII data.

where R_{4×(L−1)+W} is the radius of the hit wire (L is the layer number and W the wire number) and R_TOF is the radius of the TOF counter. The tracking parameters are improved after the correction, and a cosmic ray with φ > 180° serves as an ideal "muon" track. Figure 4 shows the momentum, φ and cos θ distributions of the cosmic rays selected from BESII data.
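A hedged sketch of Eqs. (4.1)–(4.2) is given below. It is not the BOSS tracking code, and the geometry inputs (the wire radius for index 4×(L−1)+W and the TOF radius) are hypothetical placeholders that would come from the detector description.

```python
# Hedged sketch of Eqs. (4.1)-(4.2); r_wire_mm and R_TOF_MM are placeholder geometry values.
R_TOF_MM = 870.0  # assumed barrel TOF radius, for illustration only

def corrected_drift_time(t_meas_ns, t1_ns, t2_ns, phi_deg, r_wire_mm):
    t0 = 0.5 * (t1_ns + t2_ns)                       # Eq. (4.1)
    if phi_deg > 180.0:
        extra = 0.0                                  # upper half-track: no extra term
    else:
        extra = (t2_ns - t1_ns) * r_wire_mm / R_TOF_MM
    return t_meas_ns - t0 + extra                    # Eq. (4.2)

print(corrected_drift_time(t_meas_ns=450.0, t1_ns=-2.0, t2_ns=3.5,
                           phi_deg=120.0, r_wire_mm=300.0))
```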

The purity of the sample is quite high (greater than 98%). It can be used for detector calibration, including: the momentum- and incident-angle-dependent position resolution; µ-ID efficiency studies; checks of the uniformity of the detector response, together with the dimu sample; the E/p ratio and the shower shape measured in the EMC; the calibration of the dE/dx curve; the alignment of the tracking system; etc.

5 The Likelihood Method

Using relative likelihoods (likelihood ratios) allows the most powerful discrimination between hypotheses, while significance levels provide a measure of consistency between the data and the selected hypotheses.

5.1 Probability Density Functions

The response of a detector to each particle species is given by a probability density function (PDF). The PDF, written as P(x; p, H), describes the probability that a particle of species H = e±, µ±, π±, K±, p, p̄ leaves a signature x described by a vector of measurements (dE/dx, TOF, E/p, ...). P(x; p, H) dx is the probability for the detector to respond to a track of momentum p and type H with a measurement in the range (x, x + dx). As with any PDF, the integral over all possible values is unity, ∫ P(x; p, H) dx = 1. Note that the momentum is treated as part of the hypothesis for the PDF and is therefore placed to the right of the semicolon. Drift chamber momentum measurements are usually of sufficient precision that the momentum can be treated as a given quantity. In borderline cases, when the precision is almost sufficient, this is sometimes handled by assuming that the momentum is perfectly measured and smearing the PDF.

The vector x may describe a single measurement in one detector, several measurements in one detector, or several measurements in several detectors. The measurements may be correlated for a single hypothesis. An example of correlated measurements within a single device is E/p and the shower shape for electrons in the EMC; an example of correlated measurements in separate detectors is the energy deposited in the EMC and in the instrumented flux return by charged pions. In many cases of interest the correlations are reasonably small and the overall PDF can be determined as a product of the PDFs for the individual detectors. For example, the specific ionization deposited by a charged track as it traverses the drift chamber has almost no influence on the time-of-flight measurement in the TOF.

The difficult part of a PID analysis is determining the PDFs and their correlations (if any), and understanding the uncertainties of these distributions.

5.2 Likelihood

Given the relevant PDFs, the likelihood that a track with measurement vector x is a particle of species H is denoted by L(H; p, x). The functional forms of the PDF and the corresponding likelihood function are the same:

L(H; p, x) ≡ P(x; p,H) (5.1)

The difference between L(H; p, x) and P(x; p, H) is subtle: the probability is a function of the measurable quantities (x) for a fixed hypothesis (p, H), whereas the likelihood is a function of the particle type (H) for a fixed momentum p and measured value (x). Therefore, an observed track for which x has been measured has a likelihood for each particle type. Competing particle-type hypotheses should be compared using the ratio of their likelihoods. Other variables having a one-to-one mapping onto the likelihood ratio are equivalent. Two commonly used mappings of the likelihood ratio are the difference of log-likelihoods and a normalized likelihood ratio, sometimes called the likelihood fraction. For example, to distinguish between the K+ and π+ hypotheses for a track with measurements x_obs, these three quantities would be written as

L(K+; p_obs, x_obs) / L(π+; p_obs, x_obs),   (5.2)

log L(K+; p_obs, x_obs) − log L(π+; p_obs, x_obs),   (5.3)

L(K+; p_obs, x_obs) / [L(K+; p_obs, x_obs) + L(π+; p_obs, x_obs)].   (5.4)

It can be shown rigorously that the likelihood ratio (Eq. (5.2) and its equivalents Eq. (5.3) and Eq. (5.4)) discriminates between hypotheses most powerfully: for any particular cut on the likelihood ratio, there exists no other set of cuts or selection procedure which gives a higher signal efficiency for the same background rejection.
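The three equivalent quantities of Eqs. (5.2)–(5.4) are trivial to compute once the likelihoods are available; the sketch below assumes, purely for illustration, single Gaussian PDFs with hypothetical central values and resolution.

```python
import math

# Sketch of Eqs. (5.2)-(5.4) for a single Gaussian-distributed measurement.
def gauss(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

def compare_k_pi(x_obs, mu_k, mu_pi, sigma):
    l_k, l_pi = gauss(x_obs, mu_k, sigma), gauss(x_obs, mu_pi, sigma)
    ratio = l_k / l_pi                          # Eq. (5.2)
    delta_log = math.log(l_k) - math.log(l_pi)  # Eq. (5.3)
    fraction = l_k / (l_k + l_pi)               # Eq. (5.4), the likelihood fraction
    return ratio, delta_log, fraction

# Hypothetical expected times for K and pi and a 0.1 ns resolution
print(compare_k_pi(x_obs=3.95, mu_k=4.05, mu_pi=3.85, sigma=0.10))
```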

So far there has been an implicit assumption of perfect knowledge of the PDFs describing the detector. In the real world there are often tails on the distributions due to track confusion, nonlinearities in the detector response, and many other experimental sources which are imperfectly described by the PDFs. While deviations from the expected distributions can be determined from control samples of real data and thereby taken into account correctly, the tails of these distributions are often associated with fake or badly reconstructed tracks. This is one reason why experimentalists should include an additional consistency test.


5.3 Consistency

A statistical test for consistency does not try to distinguish between competing hypotheses: it addresses how well the measured quantities accord with those expected for a particle of type H. The question is usually posed as, "What fraction of genuine tracks of species H look less H-like than this track?" This is the prescription for a significance level. For a device measuring a single quantity with a Gaussian response function, a track lying 1σ (2σ) from the peak value is said to be consistent with the hypothesis at the 31.7% (4.55%) significance level. If the PDF is a univariate Gaussian,

P(x; p, H) = [1/(√(2π) σ(p, H))] exp[ −(1/2) ((x − µ(p, H))/σ(p, H))² ],   (5.5)

the significance level (SL) for hypothesis H of a track measured with x = x_obs is defined by

SL(x_obs; H) ≡ 1 − ∫_{µ_H − x_obs}^{µ_H + x_obs} P(x; H) dx.   (5.6)

Notice that the integration interval is defined to have symmetric limits around the central value; this is an example of a two-sided test. Mathematically, one may also define a one-sided test in which the integration interval ranges from x_obs to +∞ or from −∞ to x_obs. However, for a physicist establishing consistency, it is only sensible to use the symmetric, two-sided significance level defined in Eq. (5.6) when presented with a Gaussian PDF. This definition is equally sensible for other symmetric PDFs with a single maximum.
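For the Gaussian case, the two-sided significance level of Eq. (5.6) reduces to the complementary error function of the normalized deviation; a minimal sketch follows.

```python
import math

# Sketch of Eq. (5.6) for a univariate Gaussian PDF: two-sided significance level.
def significance_level(x_obs, mu, sigma):
    return math.erfc(abs(x_obs - mu) / (sigma * math.sqrt(2.0)))

print(significance_level(1.0, 0.0, 1.0))  # ~0.317 at one sigma
print(significance_level(2.0, 0.0, 1.0))  # ~0.0455 at two sigma
```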

Nature is not always kind enough to provide Gaussian or monotonic PDFs. For example, asymmetric PDFs are encountered when making specific ionization (dE/dx) measurements, and multiple peaks in a PDF might be encountered when considering the energy deposited by a 1 GeV π− in the EMC: although the π− will typically leave a minimum-ionizing signature, some fraction of the time there is a charge-exchange reaction (π− + p → π0 + n) which deposits most of the π− energy electromagnetically. A particularly useful generalization of the significance level of an observation x_obs given the hypothesis H is

SL(x_obs; H) = 1 − ∫_{P(x;H) > P(x_obs;H)} P(x; H) dx.   (5.7)

Although we define the consistency in terms of an integral over the PDF of x, note that the range(s) is (are) specified in terms of the PDF, not in terms of x. This allows a physically meaningful definition. While other definitions of the significance level are mathematically possible, we strongly recommend the definition in Eq. (5.7). Note that because the PDF is normalized to 1, the significance level can equivalently be defined as

SL(x_obs; H) = ∫_{P(x;H) < P(x_obs;H)} P(x; H) dx.   (5.8)

All significance levels derived from a smooth distribution under the true hypothesis are uniformly distributed between 0 and 1 (as are confidence levels). This can be used to test the correctness of the underlying PDF using a pure control sample.
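For a non-Gaussian, possibly multi-peaked PDF known only as a histogram, Eq. (5.8) can be evaluated by summing the bins where the PDF is below its value at the observation; the sketch below uses a toy double-peaked PDF (all numbers illustrative).

```python
import numpy as np

# Sketch of Eq. (5.7)/(5.8) for a binned PDF: integrate over the region where
# P(x; H) < P(x_obs; H).
def generalized_sl(pdf_values, bin_width, obs_index):
    p_obs = pdf_values[obs_index]
    return float(pdf_values[pdf_values < p_obs].sum() * bin_width)

# Toy double-peaked PDF (e.g. a MIP peak plus a charge-exchange component), normalized to 1
x = np.linspace(0.0, 1.0, 200)
width = x[1] - x[0]
pdf = 0.8 * np.exp(-0.5 * ((x - 0.2) / 0.03) ** 2) + 0.2 * np.exp(-0.5 * ((x - 0.7) / 0.10) ** 2)
pdf /= pdf.sum() * width
print(generalized_sl(pdf, width, obs_index=100))  # observation between the two peaks
```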


Using significance levels to remove tracks which are inconsistent with all hypotheses takes a toll on the efficiency (presumably small), and may also discriminate between hypotheses. In general, if a cut SL > α is made, the false negative rate is α; this is identical to the statement that the efficiency of the cut is 1 − α. The false positive rate β(H) can depend on the definition of the SL, i.e. on the design of the test, and is identical to the misidentification probability. The background fraction in a sample is Σ_i β_i P_A^i, where P_A^i is the fraction of particle species i in the sample.

Consistencies control only the efficiency. Minimizing the background, however, depends on the type of sample: a fixed cut on the consistency will produce very different background rates in different analyses.

Any procedure for combining confidence levels or significance levels (consistencies) is arbitrary, with an infinite number of equally valid alternatives. For example, one common method of combining confidence levels is mathematically equivalent to the following recipe (a sketch is given after the list):

1) use the inverse of CL = P(χ²|1) to convert each of the n probabilities CL_i into a χ²_i;

2) add them up, i.e. χ² = Σ_{i=1}^{n} χ²_i;

3) use CL = P(χ²|n) to convert χ² into a new "combined" CL.
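A minimal sketch of this recipe, assuming SciPy is available for the χ² distribution, is shown below (the input confidence levels are illustrative).

```python
from scipy.stats import chi2

# Sketch of the chi^2 recipe for combining n independent confidence levels.
def combine_confidence_levels(cls):
    chi2_sum = sum(chi2.isf(cl, df=1) for cl in cls)  # steps 1 and 2
    return chi2.sf(chi2_sum, df=len(cls))             # step 3

print(combine_confidence_levels([0.30, 0.50, 0.80]))
```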

5.4 Weighted Likelihood

In cases (such as particle identification) where the a priori probabilities of the competing hypotheses are known numbers P_A(H), the likelihoods can be used to calculate the expected purities of given selections. Consider the case of K/π separation: the fraction of kaons in a sample with measurement vector x is given by

F(K; x) = L(K; x) · P_A(K) / [L(π; x) · P_A(π) + L(K; x) · P_A(K)].   (5.9)

This can be considered a weighted likelihood ratio, where the weighting factors are the a priori probabilities. The F(K; x) are also called a posteriori probabilities, relative probabilities, or conditional probabilities, and their calculation according to Eq. (5.9) is an application of Bayes' theorem. The purity, i.e. the fraction of kaons in a sample selected with, say, F(K; x) > 0.9, is determined by calculating the number of kaons observed in the relevant range of F and normalizing to the total number of tracks observed there, e.g.

fraction(F_H > 0.9) = ∫_{0.9}^{1} [dN/dF(H; x)] F(H; x) dF(H; x) / ∫_{0.9}^{1} [dN/dF(H; x)] dF(H; x),   (5.10)

where the integration variable is the value of F(H; x).
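In practice Eq. (5.10) amounts to averaging F over the selected tracks; a minimal sketch on a toy list of F values (illustrative numbers only) follows.

```python
# Sketch of Eq. (5.10): the purity of the selection F(H; x) > cut is the mean of F
# over the tracks that pass the cut.
def estimated_purity(f_values, cut=0.9):
    selected = [f for f in f_values if f > cut]
    return sum(selected) / len(selected) if selected else float("nan")

print(estimated_purity([0.95, 0.97, 0.99, 0.40, 0.92]))
```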

5.5 Using likelihood, consistencies, and probabilities

If the PDFs (and the a priori probabilities) were perfectly understood, using likelihood ratios (and the probabilities calculated in Eq. (5.9)) to discriminate between hypotheses would suffice. However, the tails of the distributions are likely to be unreliable. Some tracks will have signatures in the detectors that are very unlikely for any hypothesis; others will have inconsistent signatures in different detectors, not in accord with any single hypothesis. We do not want to call something a K rather than a π when the observed value of some parameter is extremely improbable for either hypothesis, even if the likelihood ratio strongly favors the K hypothesis. Extremely improbable events indicate detector malfunctions and glitches more reliably than they indicate particle species, and they should be excluded. For many purposes this can be done conveniently by cutting on the consistency of the selected hypothesis. If the PDFs are reasonably well understood, this has the additional advantage of providing the efficiency of the cut.

Only in the case of a single Gaussian-distributed variable do consistencies contain all the information needed to calculate the corresponding likelihood functions: there is a two-to-one mapping from the variable to the consistency and a one-to-one mapping from the PDF to the consistency. One can compute probabilities directly from likelihoods only because they are proportional to PDFs. To compare relative likelihoods, one must either retain the likelihoods or have access to the PDFs used to compute the consistencies. If more than one variable is involved, or the distribution is non-Gaussian, even this possibility evaporates; any consistency corresponds to a surface in the parameter space, and one cannot recover the values of the parameters or the likelihood, even in principle.

5.6 An example of TOF and dE/dx PID

At BESIII, TOF and dE/dx are essential for hadron separation. Consider a TOF detector in which the time-of-flight t is measured with a Gaussian resolution σ_t, assumed here to be constant (∼80 ps); similarly, the energy loss in the drift chamber (dE/dx) is Gaussian distributed with a relative resolution σ_E ∼ 6.5%. If all incident particles are known to be pions, kaons or protons at some fixed momentum, then the distributions of t and dE/dx consist of superpositions of three Gaussians, centered at the expected values (t_π, t_K, t_p) and ((dE/dx)_π, (dE/dx)_K, (dE/dx)_p) for pions, kaons and protons. The PDFs for the pion hypothesis are the normalized probability functions

P(t; π) = [1/(√(2π) σ_t)] exp[ −(1/2) ((t − t_π)/σ_t)² ],
P(dE/dx; π) = [1/(√(2π) σ_E)] exp[ −(1/2) ((dE/dx − (dE/dx)_π)/σ_E)² ].   (5.11)

The PDFs for the kaon and proton hypotheses have the same form. Using the observed time of flight t and the dE/dx information, the likelihoods for the pion, kaon and proton hypotheses are constructed as

L(π) = L(π; t, dE/dx) = P(t; π) · P(dE/dx; π),
L(K) = L(K; t, dE/dx) = P(t; K) · P(dE/dx; K),
L(p) = L(p; t, dE/dx) = P(t; p) · P(dE/dx; p).   (5.12)

Let us consider K/π separation in a sample consisting of 80% pions and 20% kaons. Using the observed time of flight t and the energy loss in the drift chamber, one can calculate the relative probabilities of the pion and kaon hypotheses at the measured t and dE/dx.


Figure 5: The relative likelihoods constructed by combining the TOF and dE/dx information, for track momenta of 0.6, 0.8, 1.0 and 1.2 GeV. The time-of-flight distributions are calculated for a flight path of 1.0 m.

The relative probabilities are

F(π) = P_A(π)L(π) / [P_A(π)L(π) + P_A(K)L(K)],
F(K) = P_A(K)L(K) / [P_A(π)L(π) + P_A(K)L(K)].   (5.13)

By construction, F(π) + F(K) = 1. The calculation of the relative probabilities is illustrated in Figure 5. As shown there, the K/π separation at 0.6 GeV is better than that at 1.0 GeV.
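The sketch below implements Eqs. (5.11)–(5.13) end to end; the expected central values for t and dE/dx are hypothetical placeholders (the real ones come from the BESIII calibration at the given momentum), while the resolutions and a priori fractions are those quoted above.

```python
import math

# Sketch of Eqs. (5.11)-(5.13): combine TOF and dE/dx into relative probabilities.
SIGMA_T = 0.080   # ns, assumed constant TOF resolution
SIGMA_E = 0.065   # relative dE/dx resolution
PRIOR = {"pi": 0.8, "K": 0.2}                     # a priori fractions in the sample
EXPECTED = {"pi": {"t": 3.60, "dedx": 1.00},      # hypothetical expected values
            "K": {"t": 3.95, "dedx": 1.25}}       # at some fixed momentum

def gauss(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

def likelihood(hyp, t_obs, dedx_obs):
    exp_vals = EXPECTED[hyp]
    return gauss(t_obs, exp_vals["t"], SIGMA_T) * gauss(dedx_obs, exp_vals["dedx"], SIGMA_E)

def relative_probabilities(t_obs, dedx_obs):
    weighted = {h: PRIOR[h] * likelihood(h, t_obs, dedx_obs) for h in PRIOR}
    total = sum(weighted.values())
    return {h: w / total for h, w in weighted.items()}   # F(pi) + F(K) = 1

print(relative_probabilities(t_obs=3.80, dedx_obs=1.10))
```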

5.7 Cell analysis

In the example presented above we assumed there were no correlations between the particle identification provided by the TOF and that provided by dE/dx. This is a fine approach if the TOF is a purely passive detector and there are no other sources of correlation. An approach that takes all correlations into account explicitly is a cell analysis: one makes a multi-dimensional histogram of all relevant variables and computes the fraction of tracks that land in each cell for each hypothesis. These fractions can then be used as the likelihoods. If the cells are small enough, the result is optimal, with all correlations completely accounted for.

The trouble with this approach is that as the number of variables becomes larger, the number of cells quickly gets out of hand: it becomes impossible to find enough "training events" to map out the cell distributions with adequate statistics. Still, it is a viable approach for a small number of variables and is well suited to a problem such as combining E/p and shower shape in the calorimeter. This would involve three variables in principle (E/p, shape and dip angle), but one might get by with relatively large cells. A judicious choice of cells that uses our knowledge of the underlying physics can greatly reduce the number of cells needed; e.g. the dip angle might be eliminated as a variable if a dip-corrected shape can be invented. If groups of highly correlated variables can be treated together, we might be able to construct a set of relatively uncorrelated likelihoods. It may be necessary to combine information from several detectors to construct some of these variables.
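A minimal sketch of such a cell analysis, using toy two-variable samples in place of real control samples, is given below: the training events of each hypothesis are binned with numpy.histogramdd and the per-cell fractions serve as the likelihoods.

```python
import numpy as np

# Sketch of a cell analysis: per-hypothesis multi-dimensional histograms as likelihoods.
def build_cells(training_samples, bins):
    cells = {}
    for hyp, data in training_samples.items():            # data: (n_events, n_variables)
        hist, edges = np.histogramdd(data, bins=bins)
        cells[hyp] = (hist / hist.sum(), edges)            # fraction of tracks per cell
    return cells

def cell_likelihood(cells, hyp, x):
    hist, edges = cells[hyp]
    idx = tuple(int(np.clip(np.searchsorted(e, xi, side="right") - 1, 0, n - 1))
                for e, xi, n in zip(edges, x, hist.shape))
    return hist[idx]

# Toy (E/p, shape) control samples for electrons and pions, for illustration only
rng = np.random.default_rng(1)
train = {"e": rng.normal([1.00, 0.95], 0.05, size=(5000, 2)),
         "pi": rng.normal([0.40, 0.85], 0.10, size=(5000, 2))}
cells = build_cells(train, bins=20)
print(cell_likelihood(cells, "e", [0.98, 0.94]), cell_likelihood(cells, "pi", [0.98, 0.94]))
```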

5.8 The role of neural nets

If the variables are not highly correlated, multiplying together the likelihoods associated with each variable should suffice. If the correlations are simple enough, a change of variables or a cell analysis may suffice. If the variables are highly correlated, neural nets and other such opaque boxes might construct near-optimal discrimination variables. The PDFs of the resulting variables can then be used as the basis for a likelihood analysis. Using the same formalism for neural-network outputs as for conventional likelihood analyses allows a modular design of the analysis software with no loss of information and optimal discrimination between hypotheses.

6 The Toolkit for Multiple Variables Analysis

7 The Artificial Neural Network Method

An artificial neural network [?] is a computational structure inspired by the study of biological neural processing. There are many different types of neural networks, from relatively simple to quite complex, just as there are many theories on how biological neural processing works. In BESIII particle identification, a type of layered feed-forward neural network is applied.

A layered feed-forward neural network has layers, or subgroups of processing elements. The first layer is the input layer and the last is the output layer; the layers placed between them are the hidden layers. A layer of processing elements makes independent computations on the data it receives and passes the results to another layer. The next layer may in turn make its own computations and pass the results on to yet another layer. Finally, a subgroup of one or more processing elements determines the output of the network. Each processing element makes its computation based upon a weighted sum of its inputs. The processing elements are seen as units similar to the neurons in a human brain, and hence they are referred to as cells, neuromimes, or artificial neurons. Even though our subject matter deals with artificial neurons, we will simply refer to them as neurons.


Figure 6: The structure of the layered feed-forward neural network applied in BESIII particle identification. Four MDC parameters (ptrk, pt, goodHits, normPH) act as the input neurons and form the input layer; the right-most layer is the output layer, consisting of a single output neuron (type); the two layers between them are hidden layers with 8 and 3 neurons, respectively.


Figure 7: Error vs. number of training epochs for the training and test samples.

Synapses between neurons are referred to as connections, which are represented by the edges of a directed graph in which the nodes are the artificial neurons.
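A plain NumPy sketch of the forward pass through such a network, with the 4–8–3–1 layout of Figure 6 and randomly initialized (untrained) weights, is shown below; it is illustrative only and is not the BESIII or TMVA implementation.

```python
import numpy as np

# Sketch of a layered feed-forward network with the 4-8-3-1 layout of Figure 6.
rng = np.random.default_rng(0)
layer_sizes = [4, 8, 3, 1]                 # ptrk, pt, goodHits, normPH -> ... -> type
weights = [rng.normal(0.0, 0.5, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(inputs):
    """Each neuron forms a weighted sum of its inputs and applies a sigmoid."""
    a = np.asarray(inputs, dtype=float)
    for w, b in zip(weights, biases):
        a = 1.0 / (1.0 + np.exp(-(a @ w + b)))
    return a

print(forward([0.8, 0.6, 30.0, 1.1]))      # untrained output for one illustrative track
```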

As discussed in Section 2, the following variables are chosen as input neurons: the momentum and transverse momentum from the track fit; the normalized pulse height and the number of good hits from the dE/dx measurement; the mass squared calculated from the measured time-of-flight, together with the hit position and pulse height in the inner barrel, outer barrel and endcap TOF arrays; the energy deposit in the EMC, the shower-shape parameters E_seed, E_3×3, E_5×5 and the second moment S; and the travel depth, the average number of hits per layer and the distance between the hit and the extrapolated position in the first layer of the muon counter.

The data samples are generated and analyzed in the BESIII Offline Software System (BOSS). e/µ/π/K/p/p̄ particles are generated in the momentum range 0.1–2.1 GeV/c. When all the input variables are ready, they are fed into the network and the training starts. The aim of the training is to minimize the total error over the sum of weighted examples. The number of epochs is the most important parameter of the training; Figure 7 shows its effect on the training result. When this number is about 400 the result is good, while too few or too many epochs lead to a useless result or to over-training. The following figures show the variation with momentum of some of the input neurons for different particle species, which gives a better understanding of the input neurons' ability in particle identification.


Figure 8: NN-PID performance: identification efficiency vs. momentum for electron/pion, muon/pion and kaon/pion separation.

7.1 The TMVA Algorithm Factories

References
