View
213
Download
0
Embed Size (px)
Citation preview
CHEP03 - UCSD - March 24th-28th 2003 T. M. Steinbeck, V. Lindenstruth, H. Tilsner, for the Alice Collaboration
Timm Morten Steinbeck, Computer Science / Computer Engineering
Kirchhoff Institute f. Physics, Ruprecht-Karls-University Heidelberg
A Software Data Transport Framework for Trigger
Applications on Clusters
CHEP03 - UCSD - March 24th-28th 2003 T. M. Steinbeck, V. Lindenstruth, H. Tilsner, for the Alice Collaboration
Requirements
Alice: A Large Ion Collider Experiment
Very large multiplicity: >15.000 particles/event
Full event size > 70 MB Data rate into last trigger
stage (High Level Trigger, HLT) up to 25 GB/s
HLT has full event data available
CHEP03 - UCSD - March 24th-28th 2003 T. M. Steinbeck, V. Lindenstruth, H. Tilsner, for the Alice Collaboration
From raw ADC values...
to
particle tracks
1, 2, 123, 255, 100, 30, 5, 1, 4,3, 2, 3, 4, 5, 3, 4, 60, 130, 30, 5,..........
Requirements
HLT primary task is event reconstruction for triggering and storage
CHEP03 - UCSD - March 24th-28th 2003 T. M. Steinbeck, V. Lindenstruth, H. Tilsner, for the Alice Collaboration
High Level Trigger Dataflow
HLT consists of Linux PC farm w. roughly »1000 nodes
Analysis is performed hierarchically
Several stages for data processing and merging
Natural mapping of dataflow, cluster topology, detector geometry
CHEP03 - UCSD - March 24th-28th 2003 T. M. Steinbeck, V. Lindenstruth, H. Tilsner, for the Alice Collaboration
High Level Trigger
Framework software required to transport data in HLT Flexible
Components communicating via standardized interface
Pluggable components to support different configurations
Support for runtime reconfiguration
Efficiency Minimize CPU usage to retain cycles for data processing
(primary)
Transport data as quickly as possible (secondary)
C++ Implementation
CHEP03 - UCSD - March 24th-28th 2003 T. M. Steinbeck, V. Lindenstruth, H. Tilsner, for the Alice Collaboration
Component Interface Only locally on a node Uses shared memory for
data and named pipes for descriptors
Multiple consumers attached to one producer (Publisher-Subscriber paradigm)
Buffer management has to be done in data producer
Event M
Data Block 0
Shared Memory
Data Block n
Publisher
Subscriber
SubscriberEvent MDescriptor
Block n
Block 0
Named Pipe
CHEP03 - UCSD - March 24th-28th 2003 T. M. Steinbeck, V. Lindenstruth, H. Tilsner, for the Alice Collaboration
Network Communication
Uses class library Abstract call interface Classes optimized for
Small message transfers Large data blocks
Implementations for multiple network technologies/protocols possible
Currently supported: TCP & SCI
Message Base Class
TCP Message Class SCI Message Class
Block Base Class
TCP Block Class SCI Block Class
CHEP03 - UCSD - March 24th-28th 2003 T. M. Steinbeck, V. Lindenstruth, H. Tilsner, for the Alice Collaboration
Components
Framework contains components to configure dataflow
To merge data streams belonging to one event
To split and rejoin a data stream (e.g. for load balancing)
EventScatterer EventGatherer
Event 1Event 2Event 3
Event 1
Event 3
Processing
Processing
Processing
Event 1Event 2Event 3
EventMerger
Event N, Part 1
Event N, Part 2
Event N, Part 3
Event N
CHEP03 - UCSD - March 24th-28th 2003 T. M. Steinbeck, V. Lindenstruth, H. Tilsner, for the Alice Collaboration
Components
Framework contains components to configure dataflow To transparently transport data over the network to
other computers (Bridge)
SubscriberBridgeHead has subscriber class for incoming data, PublisherBridgeHead uses publisher class to announce data
Both use network classes for communication
SubscriberBridgeHead
Node 1Network
PublisherBridgeHead
Node 2
CHEP03 - UCSD - March 24th-28th 2003 T. M. Steinbeck, V. Lindenstruth, H. Tilsner, for the Alice Collaboration
Components
Templates for user specific components
Data SourceAddressing &Handling
Buffer Management
Data Announcing
Data Analysis & Processing
Output Buffer Mgt.
Output Data Announcing
Accepting Input Data
Input Data Addressing
Data SinkAddressing &Writing
Accepting Input Data
Input Data Addressing
Read data from source and publish it (Data Source Template)
Accept data, process it, publish results (Analysis Template)
Accept data and process it, e.g. storing (Data Sink Template)
CHEP03 - UCSD - March 24th-28th 2003 T. M. Steinbeck, V. Lindenstruth, H. Tilsner, for the Alice Collaboration
Fault Tolerance
Data Source
Data Sink
Worker0
Worker1
Worker2
Spare
Data Source
Data Sink
Worker0
Worker1
Worker2
Spare
Components to handle software or hardware faults Processing distributed for load balancing + redundancy
Upon failure reschedule events and activate spare node
CHEP03 - UCSD - March 24th-28th 2003 T. M. Steinbeck, V. Lindenstruth, H. Tilsner, for the Alice Collaboration
„Real-World-Test“
CF 0AU 0
CF 1AU 1
CF 2AU 2
CF 3AU 3
CF 4AU 4
Cf 5AU 5
T 0T 1
T 0T 1
T 0T 1
T 2T 3
T 2T 3
T 2T 3
T 4T 5
T 4T 5
T 4T 5
PMPM
PMPM
PMPM
ESMESM
Patch 0 Patch 1 Patch 2 Patch 3 Patch 4 Patch 5
ADC Unpacker
Clusterfinder
2*Tracker
2*Patch Merger
Event Stream Merger
HL0
HL1
HL2
HL3
CHEP03 - UCSD - March 24th-28th 2003 T. M. Steinbeck, V. Lindenstruth, H. Tilsner, for the Alice Collaboration
„Real-World-Test“
Task: Tracking of simulated Alice pp events, Pile up of 25 events Simulate one sector of Alice TPC
(1/36 of detector)
Performance on mix of 800 MHz and 733 MHz systems:
Processing rate of more than 420 Hz
CHEP03 - UCSD - March 24th-28th 2003 T. M. Steinbeck, V. Lindenstruth, H. Tilsner, for the Alice Collaboration
Fault Tolerance Test
Curves are scaled independantly and arbitrarily
CHEP03 - UCSD - March 24th-28th 2003 T. M. Steinbeck, V. Lindenstruth, H. Tilsner, for the Alice Collaboration
Interface Performance
733 MHz PC 800 MHz PC 933 MHz PC
Average Event Rate [kHz] 11.86 12.73 14.41
168.7 157.1 138.8Average Time Overhead [s/event]
CHEP03 - UCSD - March 24th-28th 2003 T. M. Steinbeck, V. Lindenstruth, H. Tilsner, for the Alice Collaboration
Interface Performance
Reference PC memory benchmark scaling
CHEP03 - UCSD - March 24th-28th 2003 T. M. Steinbeck, V. Lindenstruth, H. Tilsner, for the Alice Collaboration
Conclusion Working framework Flexible configuration w. fault tolerance abilities Can already be used in real applications
To Do: Tool for easier configuration and setup Fault tolerance control instance/decision unit Fine-grained fault tolerance and recovery More tuning