The upgrade of the LHCb trigger for Run III Mark Whitehead on behalf of the LHCb collaboration



The upgrade of the LHCb trigger for Run III

Mark Whitehead on behalf of the LHCb collaboration



Introduction

• LHCb upgrade for Run III
  • Detector upgrades to cope with increased luminosity
    • Run II: L = 4 × 10³² cm⁻²s⁻¹ (13 TeV)
    • Run III: L = 2 × 10³³ cm⁻²s⁻¹ (14 TeV)
• Outline
  • Trigger strategy in Run III
  • Reconstruction
  • Bandwidth studies

8th July 2017 EPS 2017 Venice - M. Whitehead 2

CERN-LHCC-2014-016 Trigger TDR

LHCb-PUB-2017-005

LHCb-PUB-2017-006


Trigger in Run II and III

• Trigger strategy changes from Run II to Run III

8th July 2017 EPS 2017 Venice - M. Whitehead 3

Remove hardware trigger

Increased output rate to storage


Trigger in Run III

• A paradigm shift from Run II…
• Rates of beauty and charm are so high in the upgrade regime that the trigger will not just have to separate signal and background decay topologies
  • It must effectively separate signal decays from other signal decays
  • 24% of events will contain a reconstructible charm hadron
  • 2% will contain a beauty hadron
• Select specific signal channels while suppressing others
  • Exclusive selections will be the standard
  • Retain some inclusive triggers for breadth of the physics programme
• Selections should be almost the offline selections - aim for high purity and efficiency
  • More sensitivity to detector performance effects (e.g. asymmetries)
• Real-time alignment and calibration will be crucial

8th July 2017 EPS 2017 Venice - M. Whitehead 4


8th July 2017 EPS 2017 Venice - M. Whitehead 5

Tracking and reconstruction sequence

For details on the Run II real-time tracking and alignment performance and developments, please see Agnieszka Dziurda's talk from Thursday.


Tracking and reconstruction

• Take advantage of the Run II trigger strategy
  • Perform a fast reconstruction for real-time alignment and calibration
  • A second, "best", stage performs the full reconstruction
• Online quality = offline quality - no need for further processing

8th July 2017 EPS 2017 Venice - M. Whitehead 6

LHCb-PUB-2017-005

Upgrade trigger: Biannual performance update | Ref: LHCb-PUB-2017-005 | Public Note, Issue 1 | Section 2: Reconstruction Sequence | Date: February 23, 2017

minimum bias events, as no selection is presently applied between it and the fast stage. Future iterations will be optimised subject to available CPU, disk buffer space, and physics performance requirements, which will determine the optimal retention after the fast stage. In the current best stage, VELO tracks from the fast sequence are used to find additional long tracks with a decreased transverse momentum threshold of 50 MeV/c. Additional T tracks are found using a seeding algorithm which looks for track seeds inside the Sci-Fi detector, serving as input to the downstream tracking and track matching. All tracks are then combined into the "best" track container with the addition of a clone-killing algorithm and an additional Kalman filter. These serve as input to the remaining algorithms and subsequent trigger selections.

Figure 1: A schematic view of the fast tracking stage.

Figure 2: A schematic view of the best tracking stage.

2.5 Changes to individual algorithms since the trigger TDR

The reconstruction algorithms that form the fast and best sequences serve the same purpose as they did in the Run 1 and Run 2 trigger. As such, many of those described in the trigger TDR are under ongoing development, serving both the Run 2 trigger and the preparation for the upgrade. Changes to these algorithms since the trigger TDR are described here:

2.5.1 VELO

VELO tracks are reconstructed using a simple track-following algorithm, using seed tracklets formed from pairs of clusters on adjacent modules within a certain range of slopes. At the end of the track…



The fast stage

• Fast stage performance vital for the upgrade trigger
• First tuning for LHC upgrade conditions, to optimise both speed and physics performance
• Challenging environment - average number of PVs is 5 times higher
• Primary vertex resolution looks impressive

Fit function: A, B and C are free parameters; N is the number of tracks associated to the vertex.

8th July 2017 EPS 2017 Venice - M. Whitehead 7


2.6 Performance Update

In light of the changes described in the previous section to both the simulation and the reconstruction algorithms, the overall performance of the reconstruction must be quantified. We describe here the studies to date:

2.6.1 PV resolution

The PV resolution is strongly correlated with the number of tracks associated to the vertex, N, and can be described using the function:

σ(N) = A / N^B + C        (1)

where A, B and C denote free parameters. Figure 3 shows the PV resolution comparison between Run II and Upgrade simulation samples. Overall, a better resolution is found for the Upgrade sample.

[Figure 3: Comparison between the PV resolution in x (left) and z (right), measured in Run II and Upgrade Monte Carlo samples, as a function of N. The solid orange (blue) line corresponds to Run II (Upgrade). The resolution is parameterized by Eq. 1; the fitted values are:

x: Upgrade A = 78.5 ± 4.1 µm, B = 0.64 ± 0.03, C = 0.3 ± 0.3 µm; Run II A = 159.8 ± 8.3 µm, B = 0.86 ± 0.02, C = 2.5 ± 0.2 µm.
z: Upgrade A = 926.0 ± 51.4 µm, B = 0.84 ± 0.02, C = 10.7 ± 1.2 µm; Run II A = 1751.1 ± 90.6 µm, B = 1.04 ± 0.02, C = 18.0 ± 1.0 µm.

LHCb Unofficial.]
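The resolution model of Eq. 1 is simple enough to evaluate directly. A minimal sketch, using the x-direction fit values quoted in Figure 3 (read off the plot, not an official parameterisation):

```python
def pv_resolution(n_tracks, A, B, C):
    """PV resolution model of Eq. 1: sigma(N) = A / N**B + C."""
    return A / n_tracks ** B + C

# Fitted x-direction parameters quoted in Fig. 3 (micrometres)
UPGRADE = dict(A=78.5, B=0.64, C=0.3)
RUN2 = dict(A=159.8, B=0.86, C=2.5)

for n in (10, 30, 50):
    print(f"N={n:2d}: Upgrade {pv_resolution(n, **UPGRADE):5.1f} um,"
          f" Run II {pv_resolution(n, **RUN2):5.1f} um")
```

The power-law term dominates at low track multiplicity, while C sets the asymptotic resolution for well-populated vertices.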

2.6.2 Ghost probability performance

The performance of the trained ghost probability is studied by calculating the signal efficiency and the ghost-track rejection efficiency. The signal efficiency and ghost rejection efficiency for each track type are shown in Fig. 4. For long tracks, at 70% ghost rejection, the signal efficiency is around 95%, which is degraded with respect to the signal efficiency at the same ghost rejection in Run 2, where the efficiency is 99%. This is expected to improve with further study.

2.6.3 Track reconstruction efficiencies

Table 1: Track reconstruction efficiencies and ghost rates of the fast and best stages, compared to the trigger TDR. The best stage efficiencies and ghost rates are shown for several values of the ghost probability requirement.

                              Trigger TDR   Fast stage   Best stage (ghost probability)
                                                         < 0.9    < 0.75   < 0.3    < 0.1
Ghost rate                       10.9%        5.6%       18.8%    15.2%    7.8%     4.2%
long                             42.7%        42.9%      91.1%    90.8%    88.2%    84.3%
long, from B                     72.5%        72.7%      94.8%    94.6%    93.1%    90.6%
long, from B, pT > 0.5 GeV       92.3%        92.5%      96.5%    96.4%    95.4%    93.6%



LHCb-PUB-2017-005



The fast stage

• Fast stage performance vital for the upgrade trigger
• First tuning for LHC upgrade conditions, to optimise both speed and physics performance
• Challenging environment - average number of PVs is 5 times higher
• Ghost rejection now approaching Run II performance

8th July 2017 EPS 2017 Venice - M. Whitehead 8

LHCb-PUB-2017-005


[Figure 4: Signal efficiency versus ghost-track rejection for the most recent ghost probability (GP) training compared to the previous version, shown separately for VELO, long, upstream, downstream and TT tracks. The track χ²/ndof distribution is overlaid, and insets show the region around 90%.]

The tracking efficiency is determined on the B0s → φφ signal sample. The efficiency is determined as the number of reconstructed candidates with respect to MC particles that have left sufficient clusters in the subdetectors to be considered reconstructible. The efficiency includes the ghost probability requirement. In Table 1 it can be seen that the efficiency for all long tracks is considerably higher in the best stage, due to the looser momentum requirement. However, high-momentum particles from beauty hadrons are reconstructed in the fast stage with an efficiency very similar to what is possible with significantly more computing time in the best stage. In all cases the performance is equal to or better than that of the trigger TDR, with the fast stage having about half the ghost rate: improvements in the algorithms counteract the more realistic detector description, including spillover, larger gaps and a modified acceptance in the Sci-Fi. For the best stage, several efficiencies are provided as a function of the ghost probability requirement. This will be tuned to meet physics performance and throughput requirements.


• Uses a multivariate classifier
• New training for Run III
• Optimisation ongoing

LHCb Preliminary


Timing and performance

• Timing in line with the trigger TDR (CERN-LHCC-2014-016)
  • Where we expected to be - but more improvements to come
• Tracking efficiencies also look promising
  • Equal to or better than those in the trigger TDR
• Throughput performance targets challenging to meet
  • Hardware performance growth at equal cost is slowing
  • A lot of work on a new software framework underway
  • Fully exploit the multiprocessor paradigm
• Computing TDR expected early next year

8th July 2017 EPS 2017 Venice - M. Whitehead 9

LHCb-PUB-2017-005


3 Timing & throughput measurements

In the trigger TDR the timing was determined for the reconstruction sequence that would be needed prior to the Kalman fit and selections. The timing measurement is repeated here using the updated detector description, digitisation and algorithm improvements. We present a direct comparison between the fast sequence and the sequence used in the trigger TDR, and include for the first time the best sequence timing to serve as a baseline.

The extrapolations presented in the trigger TDR used the per-event timing of a single process to determine the expected farm capacity in 2021. This relies on three multiplicative factors: the scaling from a single process on a single core to multiple processes on the present number of logical cores, the extrapolated growth rate at equal cost for a node purchased in 2020, and the number of farm nodes expected to be purchased with the available budget. A more useful extrapolation is to measure the throughput on a fully loaded EFF candidate node, which removes the need to measure the first multiplicative factor, only requiring the growth rate at equal cost. We present here both the timing and throughput, using the throughput as input to the updated extrapolation.
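The two extrapolation schemes above differ only in whether the single-process-to-loaded-node scaling must be measured separately. A minimal sketch of the arithmetic, with entirely hypothetical numbers (the note does not quote these inputs):

```python
def farm_capacity_tdr(per_process_rate_hz, core_scaling, growth_at_equal_cost, n_nodes):
    """Trigger-TDR style: a single-process rate times three multiplicative factors."""
    return per_process_rate_hz * core_scaling * growth_at_equal_cost * n_nodes

def farm_capacity_loaded(node_throughput_hz, growth_at_equal_cost, n_nodes):
    """Updated method: a measured fully loaded node throughput removes the
    single-process-to-loaded-node scaling factor."""
    return node_throughput_hz * growth_at_equal_cost * n_nodes

# If the loaded-node measurement equals rate x scaling, the two schemes agree:
print(farm_capacity_tdr(10.0, 20.0, 1.5, 1000))   # Hz, hypothetical inputs
print(farm_capacity_loaded(200.0, 1.5, 1000))     # Hz, hypothetical inputs
```

The advantage of the second form is that the loaded-node throughput is measured directly, rather than inferred from a single-process benchmark.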

In the trigger TDR the candidate EFF node consisted of a dual-socket Intel X5650 "Westmere" server node, in which each processor has 6 physical cores with clock frequencies of 2.67 GHz and two virtual processing units, sharing 24 GB of RAM. The X5650 nodes were purchased in 2010, at a unit cost of 2.5 kCHF. At the time of purchase this was the most cost-efficient solution. In this note we compare to a more recent EFF candidate node, purchased in 2016, consisting of a dual-socket Intel "E5-2630 v4" server node in which each processor has 10 physical cores with clock frequencies of 2.20 GHz and two virtual processing units, sharing 128 GB of RAM. In 2016 these nodes had a unit cost of 3 kCHF.

3.1 Global Event Cut

In the trigger TDR a Global Event Cut (GEC) was used, based on our experience of using a similar cut in Run I and Run II. This GEC reduces the average event multiplicity, which reduces the average time needed to process an event, with the added advantage that cleaner events have a higher tracking efficiency. We apply the same GEC here as was used in the trigger TDR for consistency, consisting of a cut on the sum of clusters in the ECAL and HCAL. The value can be adjusted to meet timing requirements.

3.2 Single process timing comparison

The timing of the reconstruction sequence is performed in a manner identical to that of the trigger TDR: a candidate EFF node on which no other processes are running is used, and the timing is determined by taking an average over 20 runs using 20000 minimum bias events in each run. The average time per event for the sequence is compared to the trigger TDR in Table 2, using the X5650 trigger TDR benchmark node.

Table 2: Timing of the fast stage compared to that of the trigger TDR, with GEC cuts applied. Both timing tests are performed using an X5650 EFF node described in the text.

Timing [ms]         Trigger TDR   Fast stage
VELO tracking           2.0          2.0
VELO-UT tracking        1.3          0.5
Forward tracking        1.9          2.3
PV finding              0.4          1.1
Total                   5.6          6.0

The VELO-UT tracking has increased in speed considerably with no loss in performance, as described in the previous section. The new PV-finding algorithm is slower; however, it is yet to be fully optimized, and we expect this to improve. In spite of the additional complexity of the digitisation, the overall timing is similar to that measured in the trigger TDR, while still showing the potential to improve.



Timing and performance

• Timing in line with the trigger TDR (CERN-LHCC-2014-016)
  • Where we expected to be - but more improvements to come
• Tracking efficiencies also look promising
  • Equal to or better than those in the trigger TDR
• Throughput performance targets challenging to meet
  • Mostly from the hardware point of view
  • A lot of work on a new software framework underway
  • Fully exploit the multiprocessor paradigm
• TDR expected early next year

8th July 2017 EPS 2017 Venice - M. Whitehead 10

LHCb-PUB-2017-005



Timing and performance

• Timing in line with the trigger TDR (CERN-LHCC-2014-016)
  • Where we expected to be - but more improvements to come
• Tracking efficiencies also look promising
  • Equal to or better than those in the trigger TDR
• Throughput performance targets challenging to meet
  • Hardware performance growth at equal cost is slowing
  • A lot of work on new software underway
  • Fully exploit the multiprocessor paradigm
• Computing TDR expected early next year

8th July 2017 EPS 2017 Venice - M. Whitehead 11

LHCb-PUB-2017-005


8th July 2017 EPS 2017 Venice - M. Whitehead 12

Output bandwidth division studies


Output bandwidth division

• How do we divide up the trigger output bandwidth?
  • This is the output to offline storage
  • Finite disk space limits the output BW - not the network or trigger
  • TURBO stream - see Giulio Gazzoni's talk from Thursday
    • Reduced event size - more signal events for the same amount of disk space
• Use an automated method to divide between channels
  • BW per channel defined by number of channels and physics priority
  • Need a way to tune the output BW consumed per channel
  • Here we study it using an MVA classifier response
• Proof-of-principle study using 4 charm decay modes

8th July 2017 EPS 2017 Venice - M. Whitehead 13

Upgrade trigger: Bandwidth strategy proposal | Ref: LHCb-PUB-2017-006 | Public Note, Issue 1 | Section 4: Selection at HLT2 | Date: February 22, 2017

Table 4: Event size for each channel entering the bandwidth division. For the D0 → K0S π+π− decay mode the event size is shown for the default "Turbo + PersistReco" case, where all of the reconstructed event information is stored; the Turbo-without-PersistReco event size is indicated in brackets.

Channel                Event size
D+ → K+K−π+            14 kB
D0 → K+K−              12 kB
D0 → K+K−π+π−          14 kB
D0 → K0S π+π−          70 kB (14 kB)

where ε_i is the modified efficiency, defined in Eq. 3, for a signal decay mode during the global optimisation, and ε_i^max is the modified efficiency for a given channel if it is optimised for alone, i.e. it is allowed to take the total available output bandwidth. The use of this ratio ensures that channels with a lower selection efficiency are not penalised in favour of those which are easier to select. The parameter ω_i is a per-channel weight that can be varied in order to alter the relative importance of each channel in the division. The value of this weight would be decided by the collaboration based on a number of factors. For the purpose of this document, ω_i = 1.

The modified efficiency is defined as the total efficiency of a given channel with respect to all reconstructible signal candidates, ϵ_i, multiplied by a penalty factor if the bandwidth is greater than the bandwidth limit:

ε_i = ϵ_i                        if BW ≤ BW_limit,
ε_i = ϵ_i × BW_limit / BW        otherwise,        (3)

where BW is the bandwidth taken by the sum of all channels at their current MVA response values, and BW_limit is the total available bandwidth. The bandwidth is evaluated from the minimum bias sample as

BW [GB/s] = retention × rate × event size [kB/evt],        (4)

where

retention = N_MB(selected) / N_MB(total).        (5)

Here the rate is the 30 MHz collision rate, and the event size is the average size of an event saved to disk for a given decay channel. Finally, N_MB(selected) is the number of minimum bias events selected at the present MVA response for each channel, and N_MB(total) is the total minimum bias sample size.
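Eqs. 4 and 5 amount to a one-line calculation. A sketch with hypothetical event counts; only the 30 MHz rate and the 14 kB Turbo event size come from the note:

```python
RATE_HZ = 30e6  # the 30 MHz collision rate used in Eq. 4

def retention(n_selected, n_total):
    """Eq. 5: fraction of the minimum bias sample passing the selection."""
    return n_selected / n_total

def bandwidth_gb_per_s(n_selected, n_total, event_size_kb):
    """Eq. 4: BW = retention x rate x event size, converted kB -> GB."""
    return retention(n_selected, n_total) * RATE_HZ * event_size_kb * 1e3 / 1e9

# Hypothetical example: 100 of 1,000,000 minimum bias events selected,
# at the 14 kB Turbo event size from Table 4
print(bandwidth_gb_per_s(100, 1_000_000, 14.0), "GB/s")
```

Note how quickly the 30 MHz rate turns a tiny minimum bias retention into a sizeable bandwidth, which is why the per-channel MVA cuts must be tuned against a global limit.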

4.4 Event sizes

The Turbo paradigm means that analysts can choose to reduce the information that is stored for their analysis in order to obtain a higher rate for the same bandwidth [3]. In this note we consider two cases: the minimal case in which only the physics objects related to the exclusive decay chain are stored, referred to as "Turbo", and the case where all of the event information reconstructed in the trigger is persisted, referred to as "Turbo + PersistReco". In reality, analysts will be able to choose from a spectrum of event sizes between these two extremes. The D0 → K0S π+π− decay mode is considered for both Turbo and Turbo + PersistReco cases in order to study the effect on the bandwidth division. Event sizes for the signal decay modes are shown in Table 4.

The bandwidth division algorithm assumes that each channel is exclusive. If more than one MVA selects an event, the output will be the sum of the event sizes from each channel. This is a valid assumption, as Turbo candidates select a subset of the event and minimal overlap is foreseen.

4.5 Results

We study a baseline allocation of a maximum bandwidth of 60 MB/s. This is a reasonable estimate of the bandwidth that may be allocated for the four channels in Run 3, based on the present Run 2 output rates scaled by the increased luminosity. Figure 2 presents the signal efficiencies, rates and bandwidths determined by the algorithm for this limit. Two scenarios are shown: the case where the D0 → K0S π+π− mode is saved as Turbo, and the case where the full reconstruction is persisted.


LHCb-PUB-2017-006


Output bandwidth division

• Genetic algorithm approach
• Minimise the χ² by varying the MVA response for each decay, using:
  • channel weight ω_i (= 1.0 here)
  • channel efficiency ε_i
  • maximum channel efficiency ε_i^max, when given the full output BW

8th July 2017 EPS 2017 Venice - M. Whitehead 14


4 Selection at HLT2

At HLT2, the output bandwidth will be limited to what can be saved to offline storage. The bandwidth available to each analysis depends upon the number of analyses as well as the available computing resources. We propose an automated method to distribute the bandwidth between different decay modes. This method requires a single parameter that, when changed, varies the output rate of the selection. Here, we make use of a multivariate (MVA) classifier response as the parameter. We start with a preselection to reduce the input sample size for the MVA training, and with respect to which MVA efficiencies may be determined.

4.1 Preselection

The preselection consists of criteria similar to, but slightly looser than, those already applied in exclusive selections in the Run 2 trigger. These selections comprise mass windows on the D*+ and D0 candidates of 25 and 150 MeV respectively, and a ΔM = (m_D* − m_D0 − m_π) requirement of 25 MeV, where the published mass values are used [6]. Additionally, a requirement on the transverse momentum of pT > 250 MeV is applied to all tracks from the D meson decay, and pT > 100 MeV is required for the soft pion track from the decay of the D*+ candidate. Identical track quality requirements are applied to all final state tracks.

For the D+ → K+K−π+ channel three additional requirements are applied: the D+ lifetime is required to be τ > 0.4 ps, the sum of the pT of the D+ child tracks is required to be greater than 3 GeV, and the angle between the D+ momentum vector and its flight direction, known as the direction angle, is required to satisfy arccos(θ) < 10 mrad. In the D0 → K+K− channel, requirements of pT > 500 MeV and p > 5 GeV are applied to the D0 children. The D0 candidate pT is required to be greater than 1 GeV and the direction angle arccos(θ) < 17.3 mrad. For D0 → K+K−π+π− decays the criteria are τ > 0.1 ps, direction angle arccos(θ) < 20 mrad, pT > 2 GeV for the D0 candidate, and the minimum sum pT of the four child tracks greater than 1.8 GeV. Lastly, in the D0 → K0S π+π− channel, the requirements are τ > 0.1 ps, pT > 1.8 GeV and arccos(θ) < 34.6 mrad for the D0 candidate. The sum of the child pT is required to be more than 1.5 GeV.

4.2 MVA selection

With the preselections applied, an MVA is trained for each channel. In this note we use the AdaBoost meta-estimator implementation of the scikit-learn package [7]; however, any other implementation is acceptable as long as the MVA output response is continuous. It should be noted that the MVA selections used in this note may not be optimal; they are used to demonstrate the method. In future it is expected that analysts will train and provide their own MVA as input.

One MVA is trained for each of the exclusive charm decay modes, using the signal and minimum bias background samples described in Sec. 2. The training makes use of 50% of each sample, with the remainder used to validate and test for overtraining. Monte Carlo truth information is used to ensure that the training samples consist only of true signal and background decays respectively. The variables used as input to the MVAs include vertex and track qualities, particle identification information, and the p and pT of the tracks. After training, the MVA classifier response for both the signal and minimum bias samples is saved for each event.
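As a rough illustration of the training setup described above (scikit-learn AdaBoost, 50/50 split, continuous response), here is a toy sketch with synthetic Gaussian features standing in for the real vertex, track and PID variables. AdaBoostClassifier is the estimator the note names; nothing else here reflects the actual samples:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
# Toy stand-ins for the real inputs (vertex/track quality, PID, p, pT):
# two 4-feature Gaussian populations for "signal" and "minimum bias".
signal = rng.normal(+0.5, 1.0, size=(n, 4))
background = rng.normal(-0.5, 1.0, size=(n, 4))
X = np.vstack([signal, background])
y = np.array([1] * n + [0] * n)

# 50% for training, 50% held out for validation, as in the note
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# The continuous classifier response that the bandwidth division cuts on
response = clf.decision_function(X_te)
print("held-out accuracy:", clf.score(X_te, y_te))
```

The key requirement from the note is only that `decision_function` returns a continuous response per event, so a cut on it can be varied smoothly to tune each channel's rate.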

4.3 Bandwidth division strategy

The MVA responses for each of the signal channels and the minimum bias samples are used as input to a genetic-algorithm based bandwidth division [8]. The bandwidth division algorithm minimises the following χ² by varying the MVA responses of each input channel:

χ² = Σ_i ω_i × (1 − ε_i / ε_i^max)²,        (2)

where the sum runs over the channels i.

• Assign these channels a 60 MB/s bandwidth limit between them and use the algorithm to divide it up
• Efficiency calculated from signal MC samples
• Bandwidth calculated from minimum bias MC sample

Upgrade trigger: Bandwidth strategy proposal, Ref: LHCb-PUB-2017-006, Public Note, Issue: 1, Date: February 22, 2017

Table 4: Event size for each channel entering the bandwidth division. For the D0 → K0S π+π− decay mode the event size is shown for the default 'Turbo + PersistReco' case where all of the reconstructed event information is stored; the Turbo-without-PersistReco event size is indicated in brackets.

Channel              Event size
D+ → K+K−π+          14 kB
D0 → K+K−            12 kB
D0 → K+K−π+π−        14 kB
D0 → K0S π+π−        70 kB (14 kB)

where ε_i is the modified efficiency, defined in Eq. 3, for a signal decay mode during the global optimisation, and ε_i^max is the modified efficiency for a given channel if it is optimised for alone, i.e. it is allowed to take the total available output bandwidth. The use of this ratio ensures that channels with a lower selection efficiency are not penalised in favour of those which are easier to select. The parameter ω_i is a per-channel weight that can be varied in order to alter the relative importance of each channel in the division. The value of this weight would be decided by the collaboration based on a number of factors. For the purpose of this document, ω_i = 1.

The modified efficiency is defined as the total efficiency of a given channel with respect to all reconstructible signal candidates, ϵ_i, multiplied by a penalty factor if the bandwidth is greater than the bandwidth limit:

ε_i = ϵ_i                        if BW ≤ BW_limit,
ε_i = ϵ_i × BW_limit / BW        otherwise,   (3)

where BW is the bandwidth taken by the sum of all channels at their current MVA response values, and BW_limit is the total available bandwidth. The bandwidth is evaluated from the minimum bias sample as

BW [GB/s] = retention × rate × event size [kB/evt],   (4)

where

retention = N_MB(selected) / N_MB(total).   (5)

Here the rate is the 30 MHz collision rate, and the event size is the average size of an event saved to disk for a given decay channel. Finally, N_MB(selected) is the number of minimum bias events selected at the present MVA response for each channel and N_MB(total) is the total minimum bias sample size.
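Equations (4) and (5) are simple enough to check numerically. The numbers below are purely illustrative, not retentions or event sizes from the note; only the 30 MHz rate and the kB-scale event sizes follow the text.

```python
# Worked example of Eqs. (4) and (5) with illustrative numbers: an MVA
# working point that keeps 100 of 1,000,000 minimum-bias events, at the
# 30 MHz collision rate and a 14 kB average event size.
rate_hz = 30e6            # collision rate from the text
n_mb_selected = 100       # assumed: events passing the MVA cut
n_mb_total = 1_000_000    # assumed: minimum-bias sample size
event_size_kb = 14.0      # typical Turbo event size from Table 4

retention = n_mb_selected / n_mb_total                    # Eq. (5): 1e-4
output_rate_hz = retention * rate_hz                      # 3 kHz of output
bandwidth_mb_s = output_rate_hz * event_size_kb / 1000.0  # Eq. (4)
print(round(bandwidth_mb_s, 3))  # 42.0
```

At 42 MB/s a single such channel would already overshoot the 60 MB/s budget studied below once other channels are added, which is exactly the situation the penalty factor in Eq. (3) handles.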

4.4 Event sizes

The Turbo paradigm means that analysts can choose to reduce the information that is stored for their analysis in order to obtain a higher rate for the same bandwidth [3]. In this note we consider two cases: the minimal case in which only the physics objects related to the exclusive decay chain are stored, referred to as 'Turbo', and the case where all of the event information reconstructed in the trigger is persisted, referred to as 'Turbo + PersistReco'. In reality, analysts will be able to choose from a spectrum of event sizes in between these two extremes. The D0 → K0S π+π− decay mode is considered for both Turbo and Turbo + PersistReco cases in order to study the effect on the bandwidth division. Event sizes for the signal decay modes are shown in Table 4.

The bandwidth division algorithm assumes that each channel is exclusive. If more than one MVA selects an event, the output will be the sum of the event sizes from each channel. This is a valid assumption as Turbo candidates select a subset of the event and minimal overlap is foreseen.
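The division defined by Eqs. (2) and (3) can be sketched end to end. Everything below is a toy: the two channels, their efficiency/bandwidth curves and the weights are invented, and a plain random search stands in for the genetic algorithm [8] used in the note; only the 60 MB/s limit and the χ² structure follow the text.

```python
# Minimal sketch of the bandwidth division of Eqs. (2)-(3) on toy curves.
import numpy as np

rng = np.random.default_rng(1)
BW_LIMIT = 60.0  # MB/s, the shared output budget

# Toy monotone curves: tightening the MVA cut t in [0, 1] lowers both
# the signal efficiency and the minimum-bias bandwidth of a channel.
def efficiency(t, scale):
    return scale * (1.0 - t)

def bandwidth(t, bw0):
    return bw0 * (1.0 - t) ** 2

# (efficiency scale, bandwidth at the loosest cut, weight omega_i)
channels = [(0.9, 120.0, 1.0), (0.6, 200.0, 1.0)]

def modified_eff(t, scale, total_bw):
    """Eq. (3): penalise the efficiency when the summed bandwidth
    exceeds the limit."""
    eps = efficiency(t, scale)
    return eps * BW_LIMIT / total_bw if total_bw > BW_LIMIT else eps

# eps_max: each channel optimised alone, i.e. given the full bandwidth.
grid = np.linspace(0.0, 1.0, 1001)
eps_max = [
    max(modified_eff(t, scale, bandwidth(t, bw0)) for t in grid)
    for scale, bw0, _ in channels
]

def chi2(ts):
    """Eq. (2) for a vector of per-channel cut values."""
    total_bw = sum(bandwidth(t, bw0) for t, (_, bw0, _) in zip(ts, channels))
    return sum(
        w * (1.0 - modified_eff(t, scale, total_bw) / em) ** 2
        for t, (scale, _, w), em in zip(ts, channels, eps_max))

# Plain random search over the cut values (the note evolves them with
# a genetic algorithm instead).
best_ts, best_chi2 = None, np.inf
for _ in range(20000):
    ts = rng.random(len(channels))
    c = chi2(ts)
    if c < best_chi2:
        best_ts, best_chi2 = ts, c

shared_bw = sum(bandwidth(t, bw0) for t, (_, bw0, _) in zip(best_ts, channels))
```

Because both toy channels cannot simultaneously reach their standalone ε_i^max within the shared budget, the minimised χ² settles at a small positive value, with the penalty of Eq. (3) trading efficiency between the channels.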

4.5 Results

We study a baseline allocation of a maximum bandwidth of 60 MB/s. This is a reasonable estimate of the bandwidth that may be allocated for the four channels in Run 3, based on the present Run 2 output rates scaled by the increased luminosity. Figure 2 presents the signal efficiencies, rates and bandwidths determined by the algorithm for this limit. Two scenarios are shown: the case where the D0 → K0S π+π− mode is saved as Turbo, and the case where the full reconstruction is persisted.


LHCb-PUB-2017-006


Output bandwidth division
• Genetic algorithm approach
• Minimise the χ² by varying the MVA response for each decay channel
  • ω_i: channel weight (= 1.0 here)
  • ε_i: channel efficiency
  • ε_i^max: maximum channel efficiency when given the full output BW

8th July 2017 EPS 2017 Venice - M. Whitehead 15


Upgrade trigger: Bandwidth strategy proposal Ref: LHCb-PUB-2017-006Public Note Issue: 14 Selection at HLT2 Date: February 22, 2017

- K+ K→

+D - K+ K→

0D -π +

π -

K+ K→0D -

π +π

S0 K→

0D

Ba

nd

wid

th [

MB

/s]

0

10

20

30

40

50

60

- K+ K→

+D - K+ K→

0D -π +

π -

K+ K→0D -

π +π

S0 K→

0D

Ba

nd

wid

th [

MB

/s]

0

10

20

30

40

50

60

- K+ K→

+D - K+ K→

0D -π +

π -

K+ K→0D -

π +π S

0 K→0D

Effic

iency

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

- K+ K→

+D - K+ K→

0D -π +

π -

K+ K→0D -

π +π S

0 K→0D

Effic

iency

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

- K+ K→

+D - K+ K→

0D-

π +π

- K+ K→

+D -π +

π S0 K→

0D

Ra

te [

KH

z]

0

1

2

3

4

5

6

- K+ K→

+D - K+ K→

0D -π +

π -

K+ K→0D -

π +π

S0 K→

0D

Ra

te [

KH

z]

0

1

2

3

4

5

6

Figure 2: Output of the bandwidth division algorithm assuming a bandwidth limit of 60 MB/s. From top to bottom are the minimum bias bandwidth per channel, the signal efficiencies and the minimum bias rate. On the left, the default scenario in which D0 → K0S π+π− events are saved with PersistReco is presented, while on the right, only D0 → K0S π+π− signal tracks are saved. Distributions in red are those for the initial case in which each signal is allocated 100% of the available bandwidth; in blue are the final results in which each channel shares the bandwidth.

In the initial optimisation step ε_i^max is determined for each channel independently, and it can be seen that the full bandwidth is taken by each channel. In the final step the genetic algorithm determines the MVA response necessary to minimise the χ², resulting in a distribution of the output bandwidth among the channels. The D+ → K+K−π+ channel is considerably less pure than the other channels due to the lack of a Δm requirement. As a result, it receives proportionally more of the available bandwidth in order to obtain a similar signal efficiency. Conversely, the D0 → K0S π+π− mode is very pure: at a similar efficiency to the other channels it uses the same bandwidth, even though this bandwidth is driven by the larger event size in the Turbo + PersistReco case. Moving to Turbo only, where D0 → K0S π+π− has a similar event size to the other channels, it is considerably more efficient for a similar rate.

The evolution of the MVA selection efficiency over the range from 20 MB/s to 1 GB/s is shown in Figure 3. The efficiency for each channel rises rapidly as the bandwidth limit is increased, as expected, and plateaus as ε_i approaches ε_i^max. If 15 MB/s were available on average per channel, then charm trigger efficiencies of ∼30% could be expected in the upgrade. Allowing for 30 MB/s would increase this to ∼40%, assuming the MVA performance is indicative.



• Signal efficiencies will ultimately depend on analysts' ability to define powerful selections
• Use machine learning in the trigger
• Reduction of the event size: more signal for the same BW usage


Summary
• LHCb upgrade trigger studies well underway
• Tracking and reconstruction
  • Very promising performance on simulated data
  • Throughput will improve with further optimisation
  • Significant work will be done in the coming years
• Trigger bandwidth division
  • First look at charm and a proof of principle
  • Next step: extend studies to the full LHCb physics programme
• Lots more to come in the next couple of years

8th July 2017 EPS 2017 Venice - M. Whitehead 17


Backups

8th July 2017 EPS 2017 Venice - M. Whitehead 18


Upgrade detector

8th July 2017 EPS 2017 Venice - M. Whitehead 19


Best stage

8th July 2017 EPS 2017 Venice - M. Whitehead 20

• Best stage

Upgrade trigger: Biannual performance update, Ref: LHCb-PUB-2017-005, Public Note, Issue: 1, Date: February 23, 2017

2 Reconstruction Sequence

minimum bias events as no selection is presently applied between it and the fast stage. Future iterations will be optimised subject to available CPU, disk buffer space, and physics performance requirements, which will determine the optimal retention after the fast stage. In the current best stage, VELO tracks from the Fast sequence are used to find additional long tracks with a decreased transverse momentum threshold of 50 MeV/c. Additional T tracks are found using a seeding algorithm which looks for track seeds inside the Sci-Fi detector, serving as input to the downstream tracking and track matching. All tracks are then combined into the 'best' track container with the addition of a clone-killing algorithm and an additional Kalman filter. These serve as input to the remaining algorithms and subsequent trigger selections.

Figure 1: A schematic view of the fast tracking stage.

Figure 2: A schematic view of the best tracking stage.

2.5 Changes to individual algorithms since the trigger TDR

The reconstruction algorithms that form the fast and best sequences serve the same purpose as they did in the Run 1 and Run 2 trigger. As such, many of those described in the trigger TDR are under ongoing development to serve both the Run 2 trigger and the preparation for the upgrade. Changes to these algorithms since the trigger TDR are described here:

2.5.1 VELO

VELO tracks are reconstructed using a simple track-following algorithm, using seed tracklets formed from pairs of clusters on adjacent modules within a certain range of slopes. At the end of the track


LHCb-PUB-2017-005


Tracking and reconstruction

8th July 2017 EPS 2017 Venice - M. Whitehead 21

• Timing and efficiency performance (with captions)


2.6 Performance Update

In light of the changes described in the previous section to both the simulation and the reconstruction algorithms, the overall performance of the reconstruction must be quantified. We describe here the studies to date:

2.6.1 PV resolution

The PV resolution is strongly correlated with the number of tracks associated to the vertex, N, and can be described using the function:

σ(N) = A / N^B + C,   (1)

where A, B and C denote free parameters. Figure 3 shows the PV resolution comparison between Run II and Upgrade simulation samples. Overall, a better resolution is found for the Upgrade sample.

[Figure 3 plots: PV resolution (µm) versus N for Upgrade and Run II simulation. Fitted parameters, x resolution: Upgrade A = 78.5 ± 4.1 µm, B = 0.64 ± 0.03, C = 0.3 ± 0.3 µm; Run II A = 159.8 ± 8.3 µm, B = 0.86 ± 0.02, C = 2.5 ± 0.2 µm. z resolution: Upgrade A = 926.0 ± 51.4 µm, B = 0.84 ± 0.02, C = 10.7 ± 1.2 µm; Run II A = 1751.1 ± 90.6 µm, B = 1.04 ± 0.02, C = 18.0 ± 1.0 µm.]

Figure 3: Comparison between the PV resolution in x (left) and z (right) measured in Run II and Upgrade Monte Carlo samples. The solid orange (blue) line corresponds to Run II (Upgrade). The resolution is parameterised by Eq. 1, and the result of the fit is indicated in the plot.
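The fit behind Figure 3 can be sketched with a standard least-squares tool. The data points below are synthetic, generated from parameter values of the same order as the Upgrade x-resolution fit; the noise level, track-multiplicity range and starting values are assumptions.

```python
# Sketch of fitting the PV resolution model of Eq. (1) with SciPy.
import numpy as np
from scipy.optimize import curve_fit

def resolution(n, a, b, c):
    """Eq. (1): sigma(N) = A / N**B + C."""
    return a / n**b + c

# Synthetic resolution points in microns versus track multiplicity N.
n_tracks = np.arange(5, 60, dtype=float)
rng = np.random.default_rng(2)
data = resolution(n_tracks, 78.5, 0.64, 0.3) + rng.normal(0.0, 0.5, n_tracks.size)

# Starting values are a rough guess; pcov gives the parameter covariance.
popt, pcov = curve_fit(resolution, n_tracks, data, p0=(50.0, 0.5, 1.0))
a_fit, b_fit, c_fit = popt
```

In a real measurement the per-point uncertainties would be passed via `curve_fit`'s `sigma` argument so the returned covariance is meaningful.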

2.6.2 Ghost probability performance

The performance of the trained ghost probability is studied by calculating the signal efficiency and ghost track rejection efficiency. The signal efficiency and ghost rejection efficiency for each track type are shown in Fig. 4. For long tracks, at 70% ghost rejection, the signal efficiency is around 95%, which is degraded with respect to the signal efficiency at the same ghost rejection in Run 2, where the efficiency is 99%. This is expected to improve with further study.
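Reading off a signal efficiency at a fixed ghost-rejection working point, as done above, amounts to placing a cut at a score quantile. The scores below are toy Gaussians; the 95%/99% figures in the text are not reproduced by this sketch.

```python
# Sketch of a fixed ghost-rejection working point from classifier scores.
import numpy as np

rng = np.random.default_rng(3)
sig_scores = rng.normal(2.0, 1.0, 10_000)    # true-track scores (toy)
ghost_scores = rng.normal(0.0, 1.0, 10_000)  # ghost-track scores (toy)

# Cut giving 70% ghost rejection: the 70th percentile of the ghost scores.
cut = np.quantile(ghost_scores, 0.70)
ghost_rejection = np.mean(ghost_scores <= cut)
signal_efficiency = np.mean(sig_scores > cut)
```

Sweeping the cut over all quantiles traces out the full efficiency-versus-rejection curve shown per track type in Fig. 4.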

2.6.3 Track reconstruction efficiencies

Table 1: Track reconstruction efficiencies and ghost rates of the fast and best stages as compared to the trigger TDR. The best stage efficiencies and ghost rates are shown for several values of the ghost probability requirement.

                             Trigger TDR   Fast stage   Best stage
Ghost probability                 —            —        < 0.9    < 0.75   < 0.3    < 0.1
Ghost rate                      10.9%        5.6%       18.8%    15.2%    7.8%     4.2%
long                            42.7%        42.9%      91.1%    90.8%    88.2%    84.3%
long, from B                    72.5%        72.7%      94.8%    94.6%    93.1%    90.6%
long, from B, pT > 0.5 GeV      92.3%        92.5%      96.5%    96.4%    95.4%    93.6%


3 Timing & throughput measurements

In the trigger TDR the timing was determined for the reconstruction sequence that would be needed prior to the Kalman fit and selections. The timing measurement is repeated here using the updated detector description, digitisation and algorithm improvements. We present a direct comparison between the fast sequence and the sequence used in the trigger TDR, and include for the first time the best sequence timing to serve as a baseline.

The extrapolations presented in the trigger TDR used the per-event timing of a single process to determine the expected farm capacity in 2021. This relies on three multiplicative factors: the scaling from a single process on a single core to multiple processes on the present number of logical cores, the extrapolated growth rate at equal cost for a node purchased in 2020, and the number of farm nodes expected to be purchased with the available budget. A more useful extrapolation is to measure the throughput on a fully loaded EFF candidate node, which removes the need to measure the first multiplicative factor, leaving only the growth rate at equal cost. We present here both the timing and the throughput, using the throughput as input to the updated extrapolation.
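The multiplicative structure of the extrapolation described above can be written out explicitly. Every number here is invented for illustration; none comes from the note.

```python
# Illustrative farm-capacity extrapolation: per-node throughput measured
# on a fully loaded node, times cost/performance growth, times farm size.
per_node_throughput_khz = 10.0   # assumed: events/s per loaded EFF node
growth_at_equal_cost = 2.0       # assumed: performance per CHF by 2020
n_nodes = 1000                   # assumed: nodes affordable within budget

farm_capacity_mhz = per_node_throughput_khz * growth_at_equal_cost * n_nodes / 1000.0
print(farm_capacity_mhz)  # 20.0
```

Measuring throughput on a loaded node folds the single-core-to-loaded-node scaling into the measurement itself, which is why only the growth factor remains as an extrapolation.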

In the trigger TDR the candidate EFF node consisted of a dual-socket Intel X5650 'Westmere' server node in which each processor has 6 physical cores with clock frequencies of 2.67 GHz and two virtual processing units, sharing 24 GB of RAM. The X5650 nodes were purchased in 2010, at a unit cost of 2.5 kCHF. At the time of purchase this was the most cost-efficient solution. In this note we compare to a more recent EFF candidate node purchased in 2016, consisting of a dual-socket Intel 'E5-2630 v4' server node in which each processor has 10 physical cores with clock frequencies of 2.20 GHz and two virtual processing units, sharing 128 GB of RAM. In 2016 these nodes had a unit cost of 3 kCHF.

3.1 Global Event Cut

In the trigger TDR a Global Event Cut (GEC) was used, based on our experience with a similar cut in Run I and Run II. The GEC reduces the average event multiplicity, which reduces the average time needed to process an event, with the added advantage that cleaner events have a higher tracking efficiency. For consistency we apply the same GEC here as was used in the trigger TDR: a cut on the summed number of clusters in the ECAL and HCAL. The cut value can be adjusted to meet timing requirements.
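
A minimal sketch of such a cut, assuming the GEC is a simple threshold on the summed calorimeter cluster count; the threshold value below is an invented example, not the value used in the note.

```python
def passes_gec(n_ecal_clusters, n_hcal_clusters, threshold):
    """Global Event Cut: accept the event only if the summed ECAL and
    HCAL cluster count is below a tunable threshold (illustrative value,
    not the note's actual working point)."""
    return n_ecal_clusters + n_hcal_clusters < threshold

# A cleaner event passes while a high-multiplicity event is rejected:
clean_ok = passes_gec(250, 150, 600)
busy_ok = passes_gec(500, 400, 600)
```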

3.2 Single process timing comparison

The timing of the reconstruction sequence is measured in a manner identical to that of the trigger TDR: a candidate EFF node on which no other processes are running is used, and the timing is determined by averaging over 20 runs, using 20000 minimum bias events in each run. The average time per event for the sequence is compared to the trigger TDR in Table 2, using the X5650 trigger TDR benchmark node.
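
The averaging scheme can be sketched as follows. Here `sequence` and the event sample are stand-ins; this illustrates only the repeated-run averaging, not the real reconstruction code.

```python
import statistics
import time

def average_event_time_ms(sequence, events, n_runs=20):
    """Average per-event wall time of `sequence`, taken over n_runs
    passes through the same event sample, as in the note's procedure
    (20 runs of 20000 minimum bias events on an otherwise idle node)."""
    per_run = []
    for _ in range(n_runs):
        start = time.perf_counter()
        for event in events:
            sequence(event)
        per_run.append((time.perf_counter() - start) * 1000.0 / len(events))
    return statistics.mean(per_run)

# e.g. timing a stand-in reconstruction step over a toy event sample:
avg_ms = average_event_time_ms(lambda ev: sum(ev), [list(range(50))] * 200, n_runs=3)
```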

Table 2: Timing of the fast stage compared to that of the trigger TDR, with the GEC applied. Both timing tests are performed using the X5650 EFF node described in the text.

Timing [ms]         Trigger TDR   Fast stage
VELO tracking               2.0          2.0
VELO-UT tracking            1.3          0.5
Forward tracking            1.9          2.3
PV finding                  0.4          1.1
Total                       5.6          6.0

The VELO-UT tracking has become considerably faster with no loss in performance, as described in the previous section. The new PV finding algorithm is slower; however, it has not yet been fully optimised and we expect this to improve. In spite of the additional complexity of the digitisation, the overall timing is similar to that measured in the trigger TDR, while still leaving room for improvement.


Output bandwidth division

8th July 2017, EPS 2017 Venice - M. Whitehead

• Event size is very important: it drops from 70 kB to 14 kB when only signal tracks are saved
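
Since the factors of 1000 cancel, a rate in kHz multiplied by an event size in kB gives the bandwidth directly in MB/s. A minimal sketch; the 1 kHz rate is an assumed example, not a number from the slide.

```python
def bandwidth_mb_per_s(rate_khz, event_size_kb):
    """Output bandwidth = trigger rate x event size.
    kHz * kB = MB/s, since the two factors of 1000 cancel."""
    return rate_khz * event_size_kb

# At the same (assumed) 1 kHz output rate, shrinking the event from
# 70 kB to 14 kB cuts the bandwidth five-fold:
full = bandwidth_mb_per_s(1.0, 70.0)
turbo = bandwidth_mb_per_s(1.0, 14.0)
```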

Upgrade trigger: Bandwidth strategy proposal
Ref: LHCb-PUB-2017-006, Public Note, Issue: 1, Date: February 22, 2017

4 Selection at HLT2

[Figure 2 (plots not reproduced): paired panels showing, per channel, the minimum bias bandwidth [MB/s], the signal efficiency and the minimum bias rate [kHz] for D+ → K+K−π+, D0 → K+K−, D0 → π+π− and D0 → K0S π+π−.]

Figure 2: Output of the bandwidth division algorithm assuming a bandwidth limit of 60 MB/s. From top to bottom are the minimum bias bandwidth per channel, the signal efficiencies and the minimum bias rate. On the left, the default scenario in which D0 → K0S π+π− events are saved with PersistReco is presented, while on the right only D0 → K0S π+π− signal tracks are saved. Distributions in red are those for the initial case in which each signal is allocated 100% of the available bandwidth; in blue are the final results in which each channel shares the bandwidth.

In the initial optimisation step ε_i^max is determined for each channel independently, and it can be seen that the full bandwidth is taken by each channel. In the final step the genetic algorithm determines the MVA response necessary to minimise the χ², resulting in a distribution of the output bandwidth among the channels. The D+ → K+K−π+ channel is considerably less pure than the other channels due to the lack of a Δm requirement. As a result, it receives proportionally more of the available bandwidth in order to obtain a similar signal efficiency. Conversely, the D0 → K0S π+π− mode is very pure: at a similar efficiency to the other channels it uses the same bandwidth, even though this bandwidth is driven by the larger event size in the Turbo + PersistReco case. Moving to Turbo only, where D0 → K0S π+π− has a similar event size to the other channels, it is considerably more efficient for a similar rate.
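
The division described above can be caricatured in a few lines. This is a toy sketch under made-up rate and efficiency models, with a plain random search standing in for the note's genetic algorithm; it is not the actual implementation, only the shape of the constrained optimisation.

```python
import random

def efficiency(cut):
    """Toy model: loosening the MVA cut keeps more signal."""
    return 1.0 - cut

def cost_mb(cut, raw_mb):
    """Toy model: loosening the cut admits more minimum-bias bandwidth;
    less pure channels (larger raw_mb) pay more at the same cut."""
    return raw_mb * (1.0 - cut) ** 2

def divide_bandwidth(raw_rates_mb, limit_mb, n_trials=5000, seed=7):
    """Search for per-channel cuts maximising summed signal efficiency
    while keeping the total output bandwidth under a shared limit."""
    random.seed(seed)
    best_cuts, best_eff = None, -1.0
    for _ in range(n_trials):
        cuts = [random.random() for _ in raw_rates_mb]
        used = sum(cost_mb(c, r) for c, r in zip(cuts, raw_rates_mb))
        if used > limit_mb:
            continue  # infeasible: over the bandwidth budget
        total_eff = sum(efficiency(c) for c in cuts)
        if total_eff > best_eff:
            best_cuts, best_eff = cuts, total_eff
    return best_cuts

# Two made-up channels, the first considerably less pure, sharing 60 MB/s:
cuts = divide_bandwidth([120.0, 40.0], limit_mb=60.0)
```

In this toy the less pure channel must either spend more bandwidth or accept a tighter cut, mirroring how the note's D+ → K+K−π+ channel receives proportionally more bandwidth to reach a similar efficiency.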

The evolution of the MVA selection efficiency over the range from 20 MB/s to 1 GB/s is shown in Figure 3. The efficiency for each channel rises rapidly as the bandwidth limit is increased, as expected, and plateaus as ε_channel approaches ε_channel^max. If 15 MB/s were available on average per channel, charm trigger efficiencies of ~30% could be expected in the upgrade. Allowing for 30 MB/s would increase this to ~40%, assuming the MVA performance is indicative.

D0 → K0S π+π−: huge efficiency gains for zero bandwidth increase

Ref: LHCb-PUB-2017-006