
Design, Deployment and Functional Tests of the online Event Filter for the ATLAS experiment


Page 1: Design Deployment and Functional Tests  of the online Event Filter for the ATLAS experiment

Nuclear Science Symposium – Rome, 20th October 2004


Andrea Negri, INFN Pavia

on behalf of the ATLAS HLT Group

Page 2: Design Deployment and Functional Tests  of the online Event Filter for the ATLAS experiment


ATLAS T/DAQ system

Level 1 trigger: hardware based; coarse granularity calo/muon data (rate: 40 MHz → ~75 kHz; latency: ~2 µs)

Level 2 trigger: detector sub-regions (RoIs) processed at full granularity for all subdetectors; fast rejection steering (rate: ~75 kHz → ~2 kHz; latency: ~10 ms)

Event Filter: full event access, "seeded" by the LVL2 result; algorithms inherited from offline (rate: ~2 kHz → ~200 Hz; latency: ~1 s)

Dataflow: pipeline memories → readout drivers (RODs) → ~1600 readout buffers (ROBs) → event builder network → EF farm (~1000 CPUs) → storage (~300 MB/s)

Overall, the TDAQ system selects roughly one event in every million.


CM energy: 14 TeV. Luminosity: 10³⁴ cm⁻²s⁻¹.

Collision rate: 40 MHz. Event rate: ~1 GHz. Detector channels: ~10⁸.
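As a quick consistency check (my arithmetic, not stated on the slide), the quoted storage bandwidth and EF output rate imply an average accepted-event size of about 1.5 MB:

```latex
\langle s_{\text{event}} \rangle \approx \frac{300~\text{MB/s}}{200~\text{Hz}} = 1.5~\text{MB}
```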

Page 3: Design Deployment and Functional Tests  of the online Event Filter for the ATLAS experiment


A common framework for offline and online, and similar reconstruction algorithms:

Avoids duplication of work

Simplifies performance/validation studies

Avoids selection biases

Common database access tools

General requirements

Scalability, flexibility and modularity

Hardware independence in order to follow technology trends

Reliability and fault tolerance

Avoid data losses

This could be critical, since the EF algorithms are inherited from the offline ones.

[Diagram: the EF is organized in sub-farms (EFSubFarm), each fed by a SubFarmInput (SFI) and writing out through a SubFarmOutput (SFO)]

Event Filter system: Constraints and Requirements

The computing instrument of the EF is organized as a set of independent sub-farms, connected to different output ports of the event builder (EB) switch.

This makes it possible to partition the EF resources and run multiple concurrent DAQ instances (e.g. for calibration and commissioning purposes).



Page 4: Design Deployment and Functional Tests  of the online Event Filter for the ATLAS experiment


Design features

Each processing node manages its own connection with the SFI and SFO elements, which implement the server part of the communication protocol.

This allows dynamic insertion/removal of sub-farms in the EF, or of processing hosts in a sub-farm.

It also allows geographically distributed implementations.

Multiple SFI connections are supported: dynamic re-routing in case of SFI malfunction (depending on the network topology); a minimal sketch of such a failover loop follows this list.

It avoids single points of failure: a faulty processing host does not interfere with the operations of the other sub-farm elements.
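A minimal sketch of the client-side failover idea, assuming plain TCP sockets; the endpoints, helper names and registration details are illustrative, not the actual EF communication protocol:

```cpp
// Hypothetical sketch (not the real EF protocol): a processing node acting
// as a TCP client that fails over between several SFI endpoints.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

struct Endpoint { std::string host; uint16_t port; };  // illustrative type

// Try each SFI in turn; return a connected socket, or -1 if all are down.
int connect_to_any_sfi(const std::vector<Endpoint>& sfis) {
    for (const auto& ep : sfis) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) continue;
        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_port = htons(ep.port);
        if (inet_pton(AF_INET, ep.host.c_str(), &addr.sin_addr) == 1 &&
            connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) == 0) {
            std::printf("connected to SFI %s:%u\n", ep.host.c_str(),
                        static_cast<unsigned>(ep.port));
            return fd;  // the node now takes events from this SFI
        }
        close(fd);  // this SFI is unreachable: re-route, try the next one
    }
    return -1;
}

int main() {
    // Hypothetical SFI addresses; in reality they come from the configuration.
    std::vector<Endpoint> sfis = {{"10.0.0.1", 9000}, {"10.0.0.2", 9000}};
    int fd = connect_to_any_sfi(sfis);
    if (fd < 0) { std::fprintf(stderr, "no SFI reachable\n"); return 1; }
    close(fd);
    return 0;
}
```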

To ensure data security in case of event processing problems, the design is based on the decoupling between the data flow and the data processing functionalities:

[Diagram: sub-farms behind the event builder network, including a geographically remote farm, all draining to storage]


Page 5: Design Deployment and Functional Tests  of the online Event Filter for the ATLAS experiment


In each EF processing host, data flow functionalities are provided by the Event Filter Dataflow (EFD) process, which:

Manages the communication with SFI and SFO

Stores the events during their transit through the Event Filter

Makes the events available to the Processing Tasks (PTs), which perform the data processing and event selection operations, running the EF algorithms in the standard ATLAS offline framework

A pluggable interface (PTIO) allows the PTs to access the data flow part via a unix domain socket (a sketch follows the diagram below).

DataFlow – DataProcessing decoupling

[Diagram: on node n, the EFD process receives incoming events from the SFI and sends accepted events to the SFO; PT #1 … PT #n attach to the EFD through PTIO]
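A minimal PT-side sketch of what such a unix-domain-socket exchange could look like; the socket path, one-byte request and reply layout are assumptions for illustration, not the real PTIO wire format:

```cpp
// Hypothetical PT-side sketch of a PTIO-like request over a unix domain
// socket; "/tmp/efd.sock" and the message format are illustrative only.
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <cstddef>
#include <cstdio>
#include <cstring>

int main() {
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    sockaddr_un addr{};
    addr.sun_family = AF_UNIX;
    std::strncpy(addr.sun_path, "/tmp/efd.sock", sizeof(addr.sun_path) - 1);
    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) < 0) {
        perror("connect");  // no EFD listening on this node
        return 1;
    }

    // Ask the EFD for the next event; it answers with the location of the
    // event inside the shared heap (illustrative wire format).
    const char request = 'G';  // "give me an event"
    write(fd, &request, 1);
    struct { size_t offset; size_t size; } reply{};
    if (read(fd, &reply, sizeof reply) == static_cast<ssize_t>(sizeof reply))
        std::printf("event at heap offset %zu, %zu bytes\n",
                    reply.offset, reply.size);
    close(fd);
    return 0;
}
```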


Page 6: Design Deployment and Functional Tests  of the online Event Filter for the ATLAS experiment


When an event enters the processing node, it is stored in a shared memory area (the sharedHeap) used to provide events to the PTs.

A PT, using the PTIO interface (socket):

Requests an event

Obtains a pointer to the sharedHeap portion that contains the event to be processed (the PTIO maps this portion in memory)

Processes the event

Communicates the filtering decision back to the EFD

A PT cannot corrupt the events, because the map is read only (sketched below); only the EFD manages the sharedHeap.

If a PT crashes, the event is still owned by the EFD, which may assign it to another PT or force-accept it.

Fault Tolerance: the sharedHeap (1)

[Diagram: events Evx, Evy, Evz stored in the SharedHeap on node n; the PTs see them through a read-only (RO) map]
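A minimal sketch of the read-only mapping idea using POSIX mmap, assuming a file-backed heap; the file name, offset and size are illustrative, not the real sharedHeap layout:

```cpp
// Hypothetical sketch: the PT maps the event's region of the shared heap
// with PROT_READ, so a buggy algorithm cannot corrupt the master copy
// owned by the EFD.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <cstdio>

int main() {
    int fd = open("/tmp/sharedheap", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    const off_t  offset = 0;     // would come from the PTIO reply
    const size_t size   = 4096;  // event size, also from the reply

    // PROT_READ: any attempted write by the PT faults (SIGSEGV) instead of
    // silently corrupting the event.
    void* ev = mmap(nullptr, size, PROT_READ, MAP_SHARED, fd, offset);
    if (ev == MAP_FAILED) { perror("mmap"); return 1; }

    std::printf("event mapped read-only at %p\n", ev);
    // ... run the selection algorithm on the mapped bytes ...
    munmap(ev, size);
    close(fd);
    return 0;
}
```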


Page 7: Design Deployment and Functional Tests  of the online Event Filter for the ATLAS experiment


To provide fault tolerance also in case of an EFD crash, the sharedHeap is implemented as a memory mapped file (sketched below).

The OS itself directly manages the actual write operations, avoiding useless disk I/O overhead.

The raw events can be recovered by reloading the sharedHeap file at EFD restart.

The system can get out of sync only in case of a power cut, OS crash or disk failure; these occurrences are completely decoupled from the event types and topology, and therefore do not entail physics biases on the recorded data.

Fault tolerance: the sharedHeap (2)

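A minimal sketch of the crash-recovery idea, assuming a file-backed heap with a trivial marker; the header layout and file name are illustrative:

```cpp
// Hypothetical sketch: events written through a memory mapped file survive
// an EFD process crash, because the dirty pages live in the kernel's page
// cache and are backed by the file.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

constexpr size_t kHeapSize = 1 << 20;  // 1 MiB demo heap

int main() {
    // O_CREAT without O_TRUNC: an existing heap file is reopened as-is,
    // so events from a previous (crashed) EFD instance are still there.
    int fd = open("/tmp/sharedheap", O_RDWR | O_CREAT, 0600);
    if (fd < 0 || ftruncate(fd, kHeapSize) < 0) { perror("heap"); return 1; }

    char* heap = static_cast<char*>(
        mmap(nullptr, kHeapSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));
    if (heap == MAP_FAILED) { perror("mmap"); return 1; }

    if (std::strncmp(heap, "EVT", 3) == 0)
        std::puts("recovered events from previous run");  // EFD restart path
    else
        std::memcpy(heap, "EVT", 3);  // first run: write a marker/event

    // No explicit write() needed: the OS flushes dirty pages on its own;
    // only a power cut, OS crash or disk failure can lose them.
    munmap(heap, kHeapSize);
    close(fd);
    return 0;
}
```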


Page 8: Design Deployment and Functional Tests  of the online Event Filter for the ATLAS experiment


Implementation example

[Diagram: inside the EFD on node n, Input tasks (one per SFI connection) feed Monitoring, ExtPTs (with PTs #1–#3 and #a–#b attached via PTIO) and Sorting tasks; Output tasks route events to the main output stream, a calibration data stream or a debugging channel through the SFOs, while a Trash task drops rejected events]

The EFD function is divided into different specific tasks that can be dynamically interconnected to form a configurable EF dataflow network.

The internal dataflow is based on reference passing: only the pointer to the event (stored in the sharedHeap) flows among the different tasks (see the sketch below).

Tasks that implement interfaces to external components are executed by independent threads (multi-threaded design), in order to absorb communication latencies and enhance performance.

Flexibility and Modularity
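A minimal sketch of the reference-passing idea between two such tasks, assuming a hand-rolled thread-safe queue; all names are illustrative, not the EFD's actual task API:

```cpp
// Hypothetical sketch: worker threads exchange pointers to events that live
// in one shared buffer, so the event payload itself is never copied.
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct Event { int id; std::vector<char> payload; };

// Minimal thread-safe queue carrying only Event* (the reference).
class TaskQueue {
    std::queue<Event*> q_;
    std::mutex m_;
    std::condition_variable cv_;
public:
    void push(Event* e) {
        { std::lock_guard<std::mutex> l(m_); q_.push(e); }
        cv_.notify_one();
    }
    Event* pop() {
        std::unique_lock<std::mutex> l(m_);
        cv_.wait(l, [this] { return !q_.empty(); });
        Event* e = q_.front(); q_.pop();
        return e;
    }
};

int main() {
    std::vector<Event> heap = {{1, {}}, {2, {}}};  // stand-in for the sharedHeap
    TaskQueue input_to_output;                     // link between two tasks

    std::thread output([&] {                       // "Output" task thread
        for (int i = 0; i < 2; ++i)
            std::printf("output task got event %d\n", input_to_output.pop()->id);
    });
    for (auto& e : heap) input_to_output.push(&e); // "Input" task passes pointers
    output.join();
    return 0;
}
```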

Page 9: Design Deployment and Functional Tests  of the online Event Filter for the ATLAS experiment


[Plot: SFI–EFD–SFO throughput (MB/s) vs event size (0.1–10000 kB)]

Verified the robustness of the architecture:

Week-long runs (>10⁹ events) without crashes or event losses (even when randomly killing PTs)

The EFD–PT communication mechanism scales with the number of running PTs

SFI–EFD–SFO communication protocol:

Exploits gigabit links for realistic event sizes

Rate limitations for small event sizes (or remote farm implementations): the EFD asks for a new event only after the previous one has been received, so the rate is limited by the round-trip time; improvements are under evaluation (see the worked example below)

Scalability tests carried out on 230 nodes:

Up to 21 subFarms, 230 EFDs, 16000 PTs

[Plot: event rate (Hz) vs number of PTs (1–100), for a real PT and a dummy PT, on a quad Xeon 2.5 GHz node with 4 GB RAM; the memory limit is indicated]
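A back-of-the-envelope illustration of that round-trip-time limit (the RTT value is an assumption, not a measurement from the slide): with a strictly serial request–response protocol, each link sustains at most one event per round trip, so

```latex
R_{\max} = \frac{1}{t_{\text{RTT}}}\,, \qquad
\text{throughput} = R_{\max}\, s_{\text{event}}
```

For an assumed RTT of 0.5 ms, R_max ≈ 2 kHz: a 1 kB event then moves only ~2 MB/s over a gigabit link (≈125 MB/s capacity), while for MB-sized events the transfer time dominates and the link saturates, which matches the qualitative behaviour reported above.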

Functional Tests

Page 10: Design Deployment and Functional Tests  of the online Event Filter for the ATLAS experiment

ATLAS Combined Test Beam

[Photos: the combined test beam line, with TRT, LAr, Tilecal and MDT-RPC BOS detectors]


Page 11: Design Deployment and Functional Tests  of the online Event Filter for the ATLAS experiment


Test Beam Layout

pROS and local LVL2 farm: the pROS contains the LVL2 result that steers/seeds the EF processing.

[Diagram: one ROS per subdetector (Pixel, SCT, TRT, LAr, Tile, MDT, CSC, TGC, RPC, LVL1mu, LVL1calo) feeds the EventBuilder (DFM + SFI) over a GbE data network; monitoring and run control are attached; the local EF farm (with its SFO) is complemented by an EF farm at Meyrin (a few km away) and, through a gateway, by remote farms in Poland, Canada and Denmark (infrastructure tests only)]


Page 12: Design Deployment and Functional Tests  of the online Event Filter for the ATLAS experiment


Online event monitoring:

Online histograms are obtained by merging data published by different PTs and gathered by a TDAQ monitoring process (the Gatherer)

Online event reconstruction:

E.g. track fitting

Online event selection:

Beam composed of π, μ and e

Track reconstruction in the muon chambers allowed the selection of μ events

Events were labelled according to the selection and/or sent to different output streams

Validation of the HLT muon slice (work in progress):

Transfer of the LVL2 result to the EF (via the pROS) and its decoding

Steering and seeding of the EF algorithm

Test Beam Online Event Processing

[Screenshot: presenter main window]

Page 13: Design Deployment and Functional Tests  of the online Event Filter for the ATLAS experiment


Online Event Processing

[Plot: residuals of the segment fit in the muon chambers (mm); σ = 61 µm]

Page 14: Design Deployment and Functional Tests  of the online Event Filter for the ATLAS experiment


Online Event Processing

[Plots: energy deposition in the calorimeter cells; hits in the muon chambers]

Page 15: Design Deployment and Functional Tests  of the online Event Filter for the ATLAS experiment


Online Event Processing

[Diagram: the muon-slice data path from the ROSes through the DFM and SFI, over the data network, to the local EF farm; the LVL2 result reaches the EF via the pROS]


Page 16: Design Deployment and Functional Tests  of the online Event Filter for the ATLAS experiment


Design: the EF is designed to cope with the challenging online requirements:

Scalable design, to allow dynamic hot-plug of processing resources, to follow technology trends and to allow geographically distributed implementations

High level of data security and fault tolerance, via the decoupling between data processing and data flow functionalities and the use of a memory mapped file

Modularity and flexibility, to allow different EF dataflows

Functional tests: the design was validated on different test beds:

Proven design robustness, scalability and data security mechanisms

No design limitations observed

Deployment on the test beam setup:

Online event processing, reconstruction and selection

Online validation of the HLT muon full slice

Conclusions