
A Software Data Transport Framework for Trigger Applications on Clusters

Timm Morten Steinbeck, Computer Science / Computer Engineering
Kirchhoff Institute f. Physics, Ruprecht-Karls-University Heidelberg

T. M. Steinbeck, V. Lindenstruth, H. Tilsner, for the Alice Collaboration
CHEP03 - UCSD - March 24th-28th 2003


Requirements

Alice: A Large Ion Collider Experiment

Very large multiplicity: >15,000 particles/event
Full event size: > 70 MB
Data rate into last trigger stage (High Level Trigger, HLT): up to 25 GB/s

HLT has full event data available
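
A rough sanity check (my arithmetic, not a figure from the slides): at > 70 MB per event, an input rate of 25 GB/s corresponds to at most

    \[
      \frac{25~\mathrm{GB/s}}{70~\mathrm{MB/event}} \approx 360~\mathrm{events/s}
    \]

i.e. a few hundred events per second entering the HLT.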


Requirements

HLT primary task is event reconstruction for triggering and storage: from raw ADC values (1, 2, 123, 255, 100, 30, 5, 1, 4, 3, 2, 3, 4, 5, 3, 4, 60, 130, 30, 5, ...) to particle tracks.


High Level Trigger Dataflow

HLT consists of a Linux PC farm with roughly 1000 nodes

Analysis is performed hierarchically

Several stages for data processing and merging

Natural mapping between dataflow, cluster topology, and detector geometry


High Level Trigger

Framework software required to transport data in the HLT

Flexibility

Components communicating via standardized interface

Pluggable components to support different configurations

Support for runtime reconfiguration

Efficiency

Minimize CPU usage to retain cycles for data processing (primary)

Transport data as quickly as possible (secondary)

C++ Implementation


Component Interface

Operates only locally on a node

Uses shared memory for data and named pipes for descriptors

Multiple consumers can attach to one producer (Publisher-Subscriber paradigm)

Buffer management has to be done in the data producer

[Diagram: the Publisher places Data Blocks 0..n of Event M in shared memory; an Event M descriptor referencing the blocks is sent to the Subscribers through a named pipe.]
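
A minimal sketch of this mechanism, assuming POSIX shared memory and FIFOs; the names, descriptor layout, and single-block flow are illustrative only, not the framework's actual interface:

    // Publisher side: event data lives in shared memory, only a small
    // descriptor is written to the named pipe for the subscribers.
    #include <cstdint>
    #include <cstring>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    struct BlockDescriptor {
        uint64_t eventID;  // event this block belongs to (e.g. "Event M")
        uint64_t offset;   // block offset inside the shared memory region
        uint64_t size;     // block size in bytes
    };

    int main() {
        const size_t shmSize = 1 << 20;  // 1 MiB demo buffer

        // Shared memory region holding the actual event data.
        int shmFd = shm_open("/demo_publisher_buf", O_CREAT | O_RDWR, 0600);
        ftruncate(shmFd, shmSize);
        char* buf = static_cast<char*>(mmap(nullptr, shmSize,
                        PROT_READ | PROT_WRITE, MAP_SHARED, shmFd, 0));

        // Named pipe carrying only the descriptors (open() blocks until
        // a subscriber opens the other end for reading).
        mkfifo("/tmp/demo_publisher_pipe", 0600);
        int pipeFd = open("/tmp/demo_publisher_pipe", O_WRONLY);

        // "Publish": place the data in shared memory, then announce it
        // by sending the descriptor through the pipe.
        const char payload[] = "raw ADC data ...";
        std::memcpy(buf, payload, sizeof(payload));
        BlockDescriptor desc{1, 0, sizeof(payload)};
        write(pipeFd, &desc, sizeof(desc));

        close(pipeFd);
        munmap(buf, shmSize);
        close(shmFd);
        return 0;
    }

Because only descriptors travel through the pipe, the event data itself is never copied between producer and consumers on the same node.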


Network Communication

Uses a class library with an abstract call interface

Classes optimized separately for small message transfers and large data blocks

Implementations for multiple network technologies/protocols possible

Currently supported: TCP & SCI

[Class hierarchy: a Message base class with TCP and SCI message classes, and a Block base class with TCP and SCI block classes.]
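
A compile-only sketch of how such a hierarchy might be shaped; the class and method names here are assumptions for illustration, not the library's real call interface:

    #include <cstddef>

    // Abstract call interface, optimized for small message transfers.
    class MessageConnection {
    public:
        virtual ~MessageConnection() = default;
        virtual bool Send(const void* msg, std::size_t len) = 0;
        virtual bool Receive(void* buf, std::size_t maxLen, std::size_t& received) = 0;
    };

    // Abstract call interface, optimized for large data blocks.
    class BlockConnection {
    public:
        virtual ~BlockConnection() = default;
        virtual bool SendBlock(const void* data, std::size_t len) = 0;
    };

    // One concrete pair of classes per network technology, e.g. TCP:
    class TCPMessageConnection : public MessageConnection {
    public:
        explicit TCPMessageConnection(int socketFd) : fFd(socketFd) {}
        bool Send(const void*, std::size_t) override {
            return true;  // stub; a real version would write() to fFd
        }
        bool Receive(void*, std::size_t, std::size_t&) override {
            return true;  // stub; a real version would read() from fFd
        }
    private:
        int fFd;
    };
    // SCIMessageConnection / SCIBlockConnection would implement the same
    // interfaces on top of the SCI shared-memory interconnect.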


Components

Framework contains components to configure the dataflow:

To merge data streams belonging to one event

To split and rejoin a data stream, e.g. for load balancing (sketched below)

[Diagrams: an EventScatterer distributes Events 1-3 across three Processing chains, and an EventGatherer rejoins them into one stream; an EventMerger combines Event N, Parts 1-3, into a single Event N.]
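
The sketch below illustrates the load-balancing idea behind the EventScatterer/EventGatherer pair with a plain round-robin assignment; the actual components publish to and subscribe from other components, and their scheduling policy may differ:

    #include <cstddef>
    #include <cstdint>
    #include <iostream>
    #include <vector>

    int main() {
        const std::size_t kWorkers = 3;
        std::vector<std::vector<uint64_t>> workerQueues(kWorkers);

        // Scatter: hand each incoming event to the next worker in turn.
        for (uint64_t eventID = 1; eventID <= 6; ++eventID)
            workerQueues[eventID % kWorkers].push_back(eventID);

        // Gather: collect the processed events back into one stream.
        for (std::size_t w = 0; w < kWorkers; ++w)
            for (uint64_t id : workerQueues[w])
                std::cout << "worker " << w << " handled event " << id << "\n";
        return 0;
    }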


Components

Framework contains components to transparently transport data over the network to other computers (Bridge)

SubscriberBridgeHead uses a subscriber class for incoming data; PublisherBridgeHead uses a publisher class to announce data

Both use the network classes for communication (sketched below)

[Diagram: SubscriberBridgeHead on Node 1 connected over the network to PublisherBridgeHead on Node 2.]
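
A toy sketch of the bridge pairing; all names are illustrative stand-ins, and the real components use the subscriber/publisher classes and the network library described earlier:

    #include <cstdint>
    #include <iostream>
    #include <vector>

    using Block = std::vector<uint8_t>;

    // Stand-in for a network block connection between the two nodes.
    struct Link { std::vector<Block> inFlight; };

    // Node 1: receives locally published data, ships it over the network.
    void SubscriberBridgeHead(const Block& published, Link& link) {
        link.inFlight.push_back(published);
    }

    // Node 2: re-announces arriving blocks to its local subscribers.
    void PublisherBridgeHead(Link& link) {
        for (const Block& b : link.inFlight)
            std::cout << "announcing block of " << b.size() << " bytes\n";
        link.inFlight.clear();
    }

    int main() {
        Link link;
        SubscriberBridgeHead(Block(1024, 0), link);  // node 1 side
        PublisherBridgeHead(link);                   // node 2 side
        return 0;
    }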


Components

Templates for user-specific components

[Diagram: three component templates. Data Source: source addressing & handling, buffer management, data announcing. Analysis: accepting input data, input data addressing, data analysis & processing, output buffer management, output data announcing. Data Sink: accepting input data, input data addressing, sink addressing & writing.]

Read data from source and publish it (Data Source Template)

Accept data, process it, publish results (Analysis Template; see the sketch below)

Accept data and process it, e.g. storing (Data Sink Template)
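
As a rough illustration of the template idea (class and method names here are mine, not the framework's), the analysis template would leave only the processing step to the user:

    #include <cstddef>
    #include <cstdint>
    #include <cstring>

    // Conceptually provided by the framework: input acceptance, buffer
    // management and output announcing live in the template.
    class AnalysisComponent {
    public:
        virtual ~AnalysisComponent() = default;
        // Called per event; returns the size of the produced output block.
        virtual std::size_t ProcessEvent(uint64_t eventID,
                                         const void* inData, std::size_t inSize,
                                         void* outBuf, std::size_t outCapacity) = 0;
    };

    // User code: only the analysis step is filled in.
    class DemoClusterFinder : public AnalysisComponent {
    public:
        std::size_t ProcessEvent(uint64_t, const void* inData, std::size_t inSize,
                                 void* outBuf, std::size_t outCapacity) override {
            // Toy "processing": pass the input through, truncated to fit.
            std::size_t n = inSize < outCapacity ? inSize : outCapacity;
            std::memcpy(outBuf, inData, n);
            return n;
        }
    };

The Data Source and Data Sink templates would similarly leave only the reading or writing step to the user component.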


Fault Tolerance

[Diagrams: a Data Source feeds Worker0-Worker2, which feed a Data Sink, with an idle Spare standing by; after a failure, the Spare takes the failed worker's place.]

Components to handle software or hardware faults

Processing distributed for load balancing + redundancy

Upon failure, events are rescheduled and a spare node is activated (sketched below)
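
A toy sketch of this rescheduling policy; node names and data structures are illustrative, not the framework's:

    #include <cstdint>
    #include <deque>
    #include <iostream>
    #include <map>
    #include <string>

    int main() {
        // Events currently assigned to each worker; the spare idles.
        std::map<std::string, std::deque<uint64_t>> pending = {
            {"Worker0", {4}}, {"Worker1", {5}}, {"Worker2", {6}}};
        std::deque<std::string> spares = {"Spare"};

        // Worker1 fails: activate the spare and reschedule the failed
        // worker's unfinished events onto it.
        const std::string failed = "Worker1";
        std::string spare = spares.front();
        spares.pop_front();
        pending[spare] = std::move(pending[failed]);
        pending.erase(failed);

        for (const auto& [node, events] : pending)
            for (uint64_t id : events)
                std::cout << node << " will process event " << id << "\n";
        return 0;
    }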


Real-World Test

[Diagram: hierarchical test setup for six TPC patches (Patch 0-5). HL0: one ADC Unpacker (AU) and Clusterfinder (CF) per patch; HL1: Trackers (T), two per node; HL2: Patch Mergers (PM), two per node; HL3: Event Stream Mergers (ESM).]


Real-World Test

Task: tracking of simulated Alice pp events, with a pile-up of 25 events

Simulates one sector of the Alice TPC (1/36 of the detector)

Performance on a mix of 800 MHz and 733 MHz systems:

Processing rate of more than 420 Hz


Fault Tolerance Test

[Plot: fault tolerance test results; curves are scaled independently and arbitrarily.]


Interface Performance

                                     733 MHz PC   800 MHz PC   933 MHz PC
Average Event Rate [kHz]                  11.86        12.73        14.41
Average Time Overhead [µs/event]          168.7        157.1        138.8


Interface Performance

[Plot: reference PC memory benchmark scaling.]


Conclusion

Working framework

Flexible configuration with fault tolerance abilities

Can already be used in real applications

To do:

Tool for easier configuration and setup

Fault tolerance control instance / decision unit

Fine-grained fault tolerance and recovery

More tuning