
The BTeV Trigger

A Model Application for the Developers of Real Time and Embedded Technologies

Joel N. Butler, Fermilab

Workshop on High Performance, Fault-Adaptive Large Scale Real-Time Systems

Vanderbilt University, Nov. 14-15, 2002

What’s BTeV

• An experiment at the Fermilab Tevatron Collider to study the matter-antimatter asymmetry of decays of particles containing the b-quark, a quark which is about 5 times the mass of the proton and decays away with a mean lifetime of 1.5 picoseconds (1 ps = $10^{-12}$ s).

• When produced at high energies, Einstein’s time dilation allows these particles to travel a few millimeters from their production point before they decay. BTeV has enough tracking precision to reconstruct both the interaction vertex and the decay vertex and can therefore isolate and study the decays of b-particles. The goal is to study the asymmetry (difference in rate) between the decays of b-particles and b-antiparticles.

• Without these kinds of asymmetries, all matter would have found an antimatter particle to annihilate with into pure energy and there would be no matter excess to form the universe!

While I care about this problem, the main point here is that in attacking it, we’ve had to develop some hardware and software, which, viewed abstractly, might be useful to YOU.

What’s a Trigger

• A Trigger is a FILTER. It selects some high energy data – in the form of records of individual collisions – to “save” and consigns the rest to oblivion.

• The BTeV trigger is a filter with a vengeance, involving thousands of computers operating in parallel to do sophisticated selections at 7.6 MHz.

What’s a Model?

• Webster’s Third New International Dictionary has 14 definitions, among which are:

– A structural design or pattern

– A person or thing regarded as worthy of imitation

– A person or thing that serves as a pattern or source of inspiration for an artist or writer

– One who is employed to display clothes or appear in displays of other merchandise

Outline

• What a trigger does in high energy physics experiments and why they started out “highly specialized”

• How triggers got less “specialized”

• The BTeV Trigger as a real implementation

• The BTeV trigger as a model or abstraction

• What needs to be done to exploit the model

The What and Why of Triggers

• High energy collisions are single events, usually the result of a “beam projectile” colliding with either a nucleus in a solid target (fixed target experiment) or a projectile in a second beam coming from the opposite direction (colliding beam experiment).

• Most high energy physics events are “ordinary” – i.e. “understood” (sometimes that means not really understood but at least “familiar”)

• One is usually looking for “rare” events

• For one reason or another, shifting over time, it has been impossible to record “every” event.

In HEP, a “trigger system” is the collection of hardware and associated software used to select (YES/NO) which events are recorded to an archival medium, and are therefore available for analysis, and which are discarded, i.e. lost forever.

Some Features

• It must run in quasi-real time – decisions must keep up with the rate of interactions coming from the experiment – the BTeV trigger makes a decision every 132 ns on average

• It is “mission critical” – a defective trigger can throw out the “good events” (as well as, or instead of, the bad).

• It must be well understood – a malfunction in the trigger can create selection biases that make it very hard to extract information from the signal events.

• We worry that it can be inherently physics “biased” – if you set it up to be efficient at selecting what you are looking for, how will you ever find the unexpected?

You would not use a “trigger” if you could record and subsequently analyze every event that was produced. The need for a trigger is always due to the scarcity of some resource. Only 1 collision in 500 has a b-quark.

• There is one of these every 132 ns (7.6 MHz)

• Events have vastly different numbers of produced particles, and so a big variation in the number of struck channels

• The detector response is faster than the 132 ns interval between consecutive events

• For this detector, there are 25 million channels but only a few thousand have signals from tracks passing through in any given event

• Total event – all detectors – is 200 Kbytes/event, for a raw, sparsified rate of 1.5 Terabytes/second

• Runs last about 8 hours each and go on 24 x 7
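The raw rate quoted above follows directly from the crossing interval and the event size. A quick check, assuming decimal units (1 Kbyte = 1000 bytes, 1 Terabyte = $10^{12}$ bytes); the variable names are only illustrative:

```python
# Back-of-the-envelope check of the raw data rate.
CROSSING_INTERVAL_S = 132e-9   # one beam crossing every 132 ns (from the slide)
EVENT_SIZE_BYTES = 200e3       # ~200 Kbytes per sparsified event (from the slide)

crossing_rate_hz = 1.0 / CROSSING_INTERVAL_S
raw_rate_bytes_per_s = crossing_rate_hz * EVENT_SIZE_BYTES

print(f"crossing rate: {crossing_rate_hz / 1e6:.1f} MHz")
print(f"raw data rate: {raw_rate_bytes_per_s / 1e12:.1f} TB/s")
```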

30 station pixel detector

A High Energy Collision

Various Limitations

• Detector deadtime – early detectors needed to be put in a sensitive state and “fired” or “triggered” when there was an interaction. While they were “recovering”, other interactions were missed, so you only wanted to “trigger” on good events

• Trigger deadtime – sometimes data taking was impossible while the trigger was making up its mind

• Readout deadtime – sometimes the detector could not accept data while it was being read out

• Storage limitations – sometimes archiving could not keep up and, without affordable buffering, events were lost

Most of these limitations can now be avoided due to improvements in speed and price of electronics, pipelining and buffering using cheap disk and memory, leaving ….

Data Storage/Data Access Limitations

• A typical experiment works very hard to get down to 200 MBytes of output per second

• In a typical run of 1 year, accounting for scheduled downtime, accidental downtime, deadtime, etc., this results in 2 Pbytes/year

• Typically, the output of the computations done on this triples this number, to of order 6-8 Pbytes per year

• Not just affording the storage, but also being able to access such large datasets, is a final limitation

• To achieve this, the BTeV trigger must select, at most, 1 event out of 2000 events; the CMS and ATLAS triggers (CERN LHC) must select 1 “event” out of 40,000 events!
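The 2 Pbytes/year figure is what 200 MBytes/s gives over the time the experiment is actually taking data. The sketch below assumes about $10^{7}$ live seconds per year; that live time is an assumption (a common rule of thumb), not a number from the slides:

```python
# Yearly archive volume from the output rate.
OUTPUT_RATE_BYTES_PER_S = 200e6   # 200 MBytes/s (from the slide)
LIVE_SECONDS_PER_YEAR = 1e7       # ASSUMPTION: live time after all downtime and deadtime

volume_pb = OUTPUT_RATE_BYTES_PER_S * LIVE_SECONDS_PER_YEAR / 1e15
print(f"archived per year: {volume_pb:.1f} Pbytes")
```

A full calendar year has about $3.15 \times 10^{7}$ seconds, so the assumed live time corresponds to roughly one third of the year.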

Early Systems

• Early systems used the simplest aspects of a collision for the trigger – energy deposition in a few detectors

• Later, they began to use more complicated quantities, such as the total transverse energy – the sum over channels of $E_i \sin\theta_i$, where $\theta_i$ is the polar angle of channel $i$ – which could be done on specialized hardware boards with weighting schemes or ALUs. Since this usually took longer and caused more deadtime, only events which passed the simple hardware trigger – now called “Level 1” – were sent to this new subsystem – “Level 2”

Enter Computing – at Level 3

• Once microprocessors became available, it became the practice to add a “Level 3” to the “Trigger hierarchy” which used FARMS or CLUSTERS of general purpose processors to do much more sophisticated computations, but on the relatively small number of events passing Levels 1 and 2. In HEP, these were an outgrowth of the use of FARMS or CLUSTERS to exploit the “embarrassingly trivial parallelism” of OFFLINE analysis – i.e. each event is an (almost) independent analysis problem as far as event reconstruction goes.

Computing Invades Level 2!

• Within a very few years, general purpose computing was appearing also in subcomponents of the Level 2 triggers and even, in a few cases, for specialized purposes in Level 1.

• At the same time, FPGAs, PLAs, associative memories, etc. were beginning to blur the distinctions between computers and combinatorial logic

Rationale for Trigger Hierarchy

Collision Rate = R1 = input rate to Level 1.

• Average decision time is 1/R1.

• If L1 accept fraction is f1, output rate is f1 x R1.

Input rate to Level 2 is f1 x R1.

• Average decision time is 1/(f1 x R1).

• If L2 accept fraction is f2, output rate is R1 x f1 x f2.

Input rate to Level 3 is R1 x f1 x f2.

• Average decision time is 1/(R1 x f1 x f2).

• Output rate is R1 x f1 x f2 x f3; the L3 accept fraction, f3, is usually set by storage considerations.
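The rate arithmetic above can be sketched as a small function. The input rate and accept fractions in the example are the BTeV values quoted elsewhere in this talk (7.6 MHz, about 1/100 at Level 1, about 1/20 at Level 2/3); the function name and layout are only illustrative:

```python
def trigger_hierarchy(input_rate_hz, accept_fractions):
    """Return (input rate, average time per decision, output rate) for each level."""
    levels = []
    rate = input_rate_hz
    for fraction in accept_fractions:
        levels.append((rate, 1.0 / rate, rate * fraction))
        rate *= fraction   # the output of this level feeds the next one
    return levels

levels = trigger_hierarchy(7.6e6, [0.01, 0.05])
for n, (rate_in, decision_time, rate_out) in enumerate(levels, start=1):
    print(f"stage {n}: in {rate_in:.3g} Hz, "
          f"one decision every {decision_time * 1e6:.3g} us, out {rate_out:.3g} Hz")
```

The decision time here is the average interval between decisions for the level as a whole. If P processors work on different events in parallel, each processor can spend about P times longer on its own event.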

BTeV: Every Event Gets Computed

• The final step is to extend computing to all aspects of the trigger

• The different levels are really now mainly distinguished by the complexity of the algorithm, although at present there are still minor differences in the hardware at various levels, strictly due to cost considerations.

BTeV Spectrometer

The BTeV Level I Vertex Trigger

• Key Points

– This is made possible by a vertex detector with excellent spatial resolution, fast readout, low occupancy, and 3-d space points.

– A heavily pipelined and parallel processing architecture using inexpensive processing nodes optimized for specific tasks ~ 3000 processors (DSPs).

– Sufficient memory (~1 Terabyte) to buffer the event data while calculations are carried out.

The trigger will reconstruct every beam crossing and look for TOPOLOGICAL evidence of a B decaying downstream of the primary vertex. Runs at 7.6 MHz!

BTeV trigger block diagram

[Figure: block diagram of the BTeV trigger. Labels: BTeV detector; front-end electronics (PIX, > $2 \times 10^{7}$ channels); 1.5 TB/s at 7.6 MHz; Level-1 buffers; L1 muon; L1 vertex; Global Level-1 (GL1 accept); L1 rate reduction: ~100x; Information Transfer Control Hardware (ITCH); Level-2/3 crossing switch; Level-2/3 buffers; Level-2/3 processor farm (#1, #2, ..., #m-1, #m); L2/3 rate reduction: ~20x; Level-3 accept; 4 kHz, ~800 MB/s; 200 MB/s (4x compression); data logging. Signals: RDY; Crossing #N; Req. data for crossing #N.]

Level 1 vertex trigger architecture

[Figure: Level 1 vertex trigger architecture. Labels: 30 station pixel detector; FPGA segment trackers; switch: sort by crossing number; track/vertex farm (~2500 processors); merge; trigger decision to Global Level 1.]

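L1 vertex trigger algorithm

• Generate Level-1 accept if 2 or more “detached” tracks in the BTeV pixel detector satisfy:

– $p_T^2 \geq 0.25\ (\mathrm{GeV}/c)^2$

– $b \geq m\,\sigma_b$

– $b \leq 0.2$ cm

(Here $b$ is the impact parameter of the track with respect to the primary vertex, $\sigma_b$ is its uncertainty, and $m$ is the significance threshold.)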

[Figure: sketch of a p p collision producing a B-meson, with the impact parameter b marked. Label: Execute Trigger.]

Pixel L1 Trigger: finds the primary vertex, identifies tracks which miss it, and calculates the significance of detachment, $b/\sigma(b)$.
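A minimal sketch of the final accept decision, assuming track finding and primary-vertex reconstruction have already been done upstream. The `Track` class, the function names and the default significance threshold `m = 3` are hypothetical; the $p_T^2$ and maximum impact parameter cuts (0.25 (GeV/c)$^2$ and 0.2 cm) are the values as read from the slide:

```python
from dataclasses import dataclass

@dataclass
class Track:
    pt: float       # transverse momentum, GeV/c
    b: float        # impact parameter relative to the primary vertex, cm
    sigma_b: float  # uncertainty on b, cm

def is_detached(track, m=3.0, pt2_min=0.25, b_max=0.2):
    """True if the track passes the pT^2, significance and maximum-b cuts."""
    return (track.pt ** 2 >= pt2_min
            and track.b >= m * track.sigma_b
            and track.b <= b_max)

def level1_accept(tracks, n_required=2, m=3.0):
    """Accept the crossing if at least n_required tracks are detached."""
    n_detached = sum(is_detached(t, m=m) for t in tracks)
    return n_detached >= n_required

tracks = [
    Track(pt=1.2, b=0.030, sigma_b=0.005),  # 6 sigma from the vertex: detached
    Track(pt=0.8, b=0.020, sigma_b=0.004),  # 5 sigma: detached
    Track(pt=0.3, b=0.050, sigma_b=0.005),  # pT^2 = 0.09, fails the pT^2 cut
    Track(pt=2.0, b=0.002, sigma_b=0.004),  # 0.5 sigma: consistent with the vertex
]
print(level1_accept(tracks))                 # two detached tracks are enough
print(level1_accept(tracks, n_required=3))   # a tighter requirement fails
```

Raising `n_required` or `m` trades signal efficiency for background rejection, which is the trade-off the efficiency plots in this talk explore.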

[Figure: two plots of trigger efficiency versus the impact parameter cut in units of $\sigma_b$ ($b/\sigma_b$), each with curves for N = 1, 2, 3 and 4 tracks. Left: Trigger Efficiency, Minimum Bias Events (marked value 1%). Right: Trigger Efficiency, $B_s \to D_s K$ (marked value 74%).]

The Level 2/3 Trigger

• The Level 1 trigger rejects 99% of the events, retaining nearly 75% of all useful b-quark events

• The Level 2 and 3 trigger consists of a farm of 2500 Linux processors, which do a complete analysis – almost equivalent to the full offline analysis – using every piece of information available

• This system applies a sophisticated set of “physics filters” to achieve a further rejection of 95%, while retaining 90% of the useful b-quark events which survived Level 1.
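Multiplying the two stages together gives the overall selection. A short check using only the percentages quoted above (variable names are illustrative):

```python
# Fractions of events passing each stage.
l1_pass = 0.01            # Level 1 rejects 99% of events
l23_pass = 0.05           # Level 2/3 rejects a further 95%
l1_signal_eff = 0.75      # nearly 75% of useful b-quark events survive Level 1
l23_signal_eff = 0.90     # 90% of those survive Level 2/3

overall_pass = l1_pass * l23_pass
overall_signal_eff = l1_signal_eff * l23_signal_eff

print(f"events kept: about 1 in {1 / overall_pass:.0f}")
print(f"useful b-quark events kept: {overall_signal_eff:.1%}")
```

So roughly 1 event in 2000 is written out, while about two thirds of the useful b-quark events are retained.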

The Abstraction – A Selection Engine

• So far this looks pretty specialized

• But almost all the pieces are “commodity devices” and all are “programmable”

• The only BTeV “specific” part is where two data “substreams” – the pixel detector and the muon detector – are picked off and routed to the Level 1 trigger

• Let’s abstract this by having a “new data arrival notifier” and a “data extractor”

The Generalized Data Selection Engine

[Figure: the generalized data selection engine. Labels: Data Generation; Transient or Persistent Storage; Level 1 Filter; Level 2 Filter; Level N Filter; Persistent Storage.]
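Stripped of the physics, the engine is a chain of increasingly expensive filters between a data source and persistent storage. A minimal single-process sketch, in which the function name, the toy integer "records" and the filter predicates are all hypothetical stand-ins:

```python
def selection_engine(source, filters, store):
    """Pass each record through the filter levels in order; store the survivors.

    Returns a list of counts: counts[k] is the number of records that reached
    filter level k+1, and the last entry is the number of records stored.
    """
    counts = [0] * (len(filters) + 1)
    for record in source:
        for level, keep in enumerate(filters):
            counts[level] += 1
            if not keep(record):
                break              # rejected: the record is dropped for good
        else:
            counts[-1] += 1        # passed every level
            store.append(record)
    return counts

# Toy example: integers stand in for event records.
store = []
counts = selection_engine(
    source=range(1000),
    filters=[lambda x: x % 10 == 0,    # cheap "Level 1" test
             lambda x: x % 50 == 0],   # more selective "Level 2" test
    store=store,
)
print(counts)      # records seen by Level 1, by Level 2, and stored
print(store[:3])
```

In a real system each level would run on many processors in parallel, and the iterable `source` would be replaced by something like the data arrival notifier and data extractor.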

Data?

• From sensors

– HEP, Nuclear

– Space Science, Astrophysics observations

– Earth science, Geology

• Communications streams, Data Mining

– EMAIL, Internet traffic

– Written, graphic, verbal files

• Pattern matching

• …

What Software is Needed for Something This Complex to Work?

• A toolkit of parallelization software to split the computations among Levels and among many computers at each Level. THERE ARE SEVERAL TOOLKITS TO DO THIS.

• A toolkit of Fault Tolerant, Fault adaptive software to make sure all is working, data is not getting lost or miscalculated: CPU and network problems. THERE IS NOTHING AVAILABLE TO DO THIS AT THE SCALE OF THOUSANDS OF PROCESSORS!!!!

• In real time cases, software to check that the apparatus is working, which is another class of fault. This must be handled at the application level but can use many of the elements of the toolkits above. DITTO!!!!

• All this must be made to look “simple” to an operator or analyst

Fault Tolerance

• The trigger is working on many beam crossings at once. To achieve high utilization of all processors, it makes decisions as quickly as possible. There is no fixed latency and events are not emerging in the same time ordered sequence with which they enter the system.

• Keeping the trigger system going and being sure it is making the right decisions is a very demanding problem -- 6000-12,000 processing elements: FPGAs, DSPs, and commercial Linux processors

• We have to write a lot of sophisticated fault tolerant, fault adaptive software

• We are joined by a team of computer scientists who specialize in fault-tolerant computing under an award of $5M over 5 years from the US NSF.
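Because decisions come back out of order and with no fixed latency, one basic ingredient of fault detection is bookkeeping by crossing number: record when each crossing is handed to a processor and flag any that have not returned a decision after some deadline. The class below is a hypothetical illustration of that idea only, not the design of the BTeV fault-tolerance software:

```python
class CrossingTracker:
    """Track crossings that have been dispatched but not yet decided."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.in_flight = {}            # crossing number -> dispatch time (s)

    def dispatch(self, crossing, now):
        self.in_flight[crossing] = now

    def complete(self, crossing):
        """Record a decision; return False if the crossing was not in flight."""
        return self.in_flight.pop(crossing, None) is not None

    def overdue(self, now):
        """Crossings still undecided after more than timeout_s seconds."""
        return sorted(c for c, t0 in self.in_flight.items()
                      if now - t0 > self.timeout_s)

tracker = CrossingTracker(timeout_s=1.0)
for crossing in range(5):
    tracker.dispatch(crossing, now=0.0)
for crossing in (3, 0, 4):             # decisions arrive in arbitrary order
    tracker.complete(crossing)
print(tracker.overdue(now=2.0))        # crossings 1 and 2 never came back
```

An overdue crossing might point to a failed processor, a lost message or a runaway algorithm; deciding which, and what to do about it, is the job of the fault-adaptive layer.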

[Figure: fault management architecture for the trigger. Runtime labels: Global Operations Manager and Global Fault Manager; Region Operations Mgr and Region Fault Mgr; L1/DSP nodes (ARMOR/RTOS) and L2,3/RISC nodes (ARMOR/Linux), each with a Local Oper. Manager, a Local Fault Mgr and several Trig Algo. processes; Logical Control Network; Logical Data Net; Experiment Interface; Soft Real-Time, Hard. Design and Analysis labels: Modeling (Reconfig, Behavior, Algorithm, Fault Behavior, Resource); System Models; Synthesis; Performance Simulation; Diagnosability Analysis; Reliability Analysis; Feedback.]

Conclusion

• We believe/hope that many applications can use this kind of system, if not in detail, then at least its abstraction.

• We believe that the “fault adaptive, fault tolerant layer” is a key issue that will make it safe for non-experts, such as operators, to use.

• We hope that you will help us to identify promising application areas.

• We expect that these areas will have new requirements or different concerns/ emphases than HEP. This is your chance to influence the R&D!
