Multi-threaded Event Processing with JANA David Lawrence – Jefferson Lab Nov. 3, 2008 11/3/08 Multi-threaded Event Processing with JANA - D. Lawrence JLab 1

Page 1

Multi-threaded Event Processing with JANA
David Lawrence – Jefferson Lab

Nov. 3, 2008

Page 2

Thomas Jefferson National Accelerator Facility (JLab)

• 6 GeV electron accelerator user facility funded by the US Dept. of Energy for basic research into the quark structure of nuclear matter

• Located in Newport News on the east coast of Virginia, USA

• 1 of the 2 major nuclear physics research labs in the U.S.

• 12 GeV upgrade: CD-3 approval came in Sept. 2008, with data planned in 2014

[Figure: accelerator site layout. Labels: CHL2, 12 GeV, 11 GeV]

Page 3

The GlueX Experiment

[Figure: GlueX detector. Labels: real beam; 2 Tesla solenoid magnet; 30 cm LH2 target; forward EM calorimeter and forward TOF wall downstream; cylindrical and planar drift chambers inside magnet; barrel EM calorimeter inside magnet]

The “continuous wave” 12 GeV electron beam at JLab has a beam bunch every 2 ns

A conventional meson has quantum numbers determined only by its constituent quarks

A hybrid meson has some quantum properties due to contributions from the “glue”

Page 4

Data Rates in 12 GeV era

Experiment (lab)   Front End DAQ Rate   Event Size   L1 Trigger Rate   Bandwidth to mass Storage
GlueX (JLab)       3 GB/s               15 kB        200 kHz           300 MB/s
CLAS12 (JLab)      100 MB/s             20 kB        10 kHz            100 MB/s
ALICE (LHC)        500 GB/s             2.5 MB       200 kHz           200 MB/s
ATLAS (LHC)        113 GB/s             1.5 MB       75 kHz            300 MB/s
CMS (LHC)          200 GB/s             1 MB         100 kHz           100 MB/s
LHCb (LHC)         40 GB/s              40 kB        1 MHz             100 MB/s
STAR (BNL)         8 GB/s               80 MB        100 Hz            30 MB/s
PHENIX (BNL)       900 MB/s             ~60 kB       ~15 kHz           450 MB/s

Sources:
CHEP 2007 talk, Sylvain Chapelin private comm.
* NIM A499, Mar. 2003, pp. 762-765
** CHEP2006 talk, Martin L. Purschke

Page 5

CPU development in the coming years

From “Platform 2015: Intel Platform Evolution for the Next Decade”

expect more than 100 cores in a box by 2014!

• CPU development has shifted from increased clock speed to multiple cores

• Dual and quad core CPUs are common today

• Some form of parallelization is needed to use all of the power of a next-generation CPU

Page 6

Multi-threading vs. Multiple Processes for a Single Input File

[Figure: Multiple Processes (option 1 and option 2) compared with Multiple Threads. Labels: FILE, single-threaded program, dispatcher, Merger, multi-threaded program, Accumulator, file output]

Bookkeeping overhead is reduced with multiple threads

Page 7

Threading benefits small scale processing

(individual developer cycle)

[Figure: Single Workstation. Axes: cores vs. processing time. Legend: multi-threaded, single-threaded, edit/compile]

Total CPU power proportional to area

Multi-threading leads to a more rapid turnaround time when developing

The relevant measure of CPU “power” now includes the number of cores used

$\Phi_{Nt} = \int N_{\mathrm{cores}} \, dt$

Page 8

The JANA Factory Model

• Traditional factory models pass ownership of created objects to the caller

• In JANA, only const pointers are passed out and ownership stays with the factory

• Passing out only const pointers guarantees that only the factory may modify the objects

• Subsequent requests get the same const pointers

    vector<const DTrack*> tracks;
    loop->Get(tracks);

• Templated Get() method helps ensure type safety

• Framework itself responsible for telling factories to delete objects at end of event

• Persistent flag marks factories that should not auto-delete objects
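The ownership rules on this slide can be illustrated with a minimal sketch. This is not JANA source code: `Track`, `TrackFactory`, `EndOfEvent()`, `SetPersistent()` and the two helper functions are hypothetical names chosen for illustration, and the templated `Get()` of the real framework is replaced here by a single non-templated method to keep the example short.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a reconstructed data class such as DTrack.
struct Track {
    double momentum;
};

// Hypothetical factory: it creates objects on the first request, keeps
// ownership of them, and hands out only const pointers.
class TrackFactory {
public:
    // Fill 'out' with const pointers to this event's objects.
    // Callers cannot modify the objects through these pointers.
    void Get(std::vector<const Track*>& out) {
        if (!evaluated_) {      // objects are made only when first requested
            Evaluate();
            evaluated_ = true;
        }
        out.clear();
        for (std::size_t i = 0; i < owned_.size(); ++i) out.push_back(owned_[i].get());
    }

    // Called by the framework at the end of each event.
    void EndOfEvent() {
        if (persistent_) return;    // persistent factories keep their objects
        owned_.clear();             // unique_ptr deletes the objects here
        evaluated_ = false;
    }

    void SetPersistent(bool p) { persistent_ = p; }
    int NumEvaluations() const { return nevaluations_; }

private:
    // Placeholder for a real reconstruction algorithm.
    void Evaluate() {
        ++nevaluations_;
        owned_.push_back(std::unique_ptr<Track>(new Track{1.5}));
        owned_.push_back(std::unique_ptr<Track>(new Track{2.5}));
    }

    std::vector<std::unique_ptr<Track> > owned_;
    bool evaluated_ = false;
    bool persistent_ = false;
    int nevaluations_ = 0;
};

// Two requests within one event return the same pointers, and the
// algorithm runs only once.
bool RepeatedGetReturnsSamePointers() {
    TrackFactory factory;
    std::vector<const Track*> first, second;
    factory.Get(first);
    factory.Get(second);
    return first == second && first.size() == 2 && factory.NumEvaluations() == 1;
}

// Number of times the algorithm runs when objects are requested in two
// consecutive events.
int EvaluationsOverTwoEvents(bool persistent) {
    TrackFactory factory;
    factory.SetPersistent(persistent);
    std::vector<const Track*> tracks;
    factory.Get(tracks);
    factory.EndOfEvent();
    factory.Get(tracks);
    return factory.NumEvaluations();
}
```

In this sketch a statement such as `first[0]->momentum = 3.0;` would fail to compile, which is the guarantee the const pointers are meant to give. A non-persistent factory re-runs its algorithm for each event, while a persistent one keeps serving the objects it already owns.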

Page 9

Threads in JANA

• Each thread in JANA is composed of its own event processing loop and a complete set of factories

• Reconstruction of a given event is done entirely inside of a single thread

• No mutex locking is required by authors of reconstruction code

• Threads work asynchronously to maximize rates at the expense of not maintaining the event order on output

[Figure: raw data read in; reconstructed values written out (e.g. ROOT tree)]
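The threading model described on this slide can be sketched roughly as follows. This is not JANA code: `EventSource`, `FactorySet` and `ProcessAll` are hypothetical names, and the "reconstruction" is a placeholder that only counts events. The sketch assumes the only shared state is the event source, so the one mutex lives there and the per-event code needs no locking.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical event source shared by all threads. Only the hand-off of
// the next event is serialized.
class EventSource {
public:
    explicit EventSource(int nevents) : remaining_(nevents) {}

    // Writes the next event number and returns true, or returns false
    // when no events are left.
    bool Next(int& event) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (remaining_ == 0) return false;
        event = remaining_--;
        return true;
    }

private:
    std::mutex mutex_;
    int remaining_;
};

// Hypothetical per-thread state standing in for a complete set of factories.
struct FactorySet {
    long nprocessed = 0;
    // Placeholder reconstruction: touches only this thread's own data.
    void Reconstruct(int event) {
        (void)event;
        ++nprocessed;
    }
};

// Process all events with 'nthreads' workers and return the total number
// of events reconstructed. Events finish in no particular order.
long ProcessAll(int nevents, unsigned nthreads) {
    EventSource source(nevents);
    std::vector<FactorySet> sets(nthreads);   // one private set per thread
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < nthreads; ++i) {
        workers.emplace_back([&source, &sets, i] {
            int event;
            // Each event is handled start to finish by a single thread.
            while (source.Next(event)) sets[i].Reconstruct(event);
        });
    }
    for (std::size_t i = 0; i < workers.size(); ++i) workers[i].join();

    long total = 0;
    for (std::size_t i = 0; i < sets.size(); ++i) total += sets[i].nprocessed;
    return total;
}
```

Calling `ProcessAll(1000, 4)` hands each of the 1000 events to exactly one of four threads. Because every thread owns its own `FactorySet`, the reconstruction step itself contains no mutex, which mirrors the claim that authors of reconstruction code do not need to lock anything.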

Page 10

Multi-threading when CPU limited

• CPU intensive jobs are the ideal application for multi-threading

• Blue circles are reconstruction of data from a Monte Carlo simulation

• Red triangles are from a CPU-hungry speed testing plugin

• Both show very good scaling of the event processing rate with the number of threads

Reconstruction of MC data, CPU bound jobs only

Overall event processing rate scales linearly with the number of threads

Page 11

Multi-threading when I/O limited

• Multiple processes trying to access different locations on the same disk compete with each other, causing the read head to physically move back and forth from one location on the disk to another

• A multi-threaded application will access a single file in sequence, reducing the number of moves the read head must make

blue circles: one multi-threaded process reading from a single file
red triangles: multiple single-threaded processes reading different files from the same disk

No processing of event data, I/O bound jobs only

Page 12

Features of JANA

The Event Processing Framework JANA includes the following features:

C++, object-oriented, STL

Multi-threaded: reconstruction program can launch any number of processing threads with each event being seen by only one thread

Plug-ins: an existing, compiled program can dynamically load other modules that extend or modify its behavior at run-time
    Reconstruction Algorithms
    Event (Data) sources
    Event Processors (i.e. the top-level “conductor”)

Data on demand: modules are not “activated” unless the data they produce is requested for that particular event

Page 13

Summary

In the 12 GeV era, JLab expects to produce more than 5 PB/yr

Performance improvements have been shown for both CPU and I/O limited jobs using a multi-threaded event processing framework.

Taking advantage of multi-core architectures requires very little effort from reconstruction code authors in a multi-threaded framework.

Other JANA features not covered:

Automatic TTree creation

Internal profiling and call graphing

Calib./Cond. DB API

…

Page 14

Backup Slides

Page 15

The janaroot plugin (for automatic creation of ROOT TTrees)

• Each data object implements a toStrings() method which provides an expression of the data object that may not be a full representation of the object

• The toStrings() mechanism was developed for allowing a simple, low-level dump of objects from single events to the screen

• This mechanism is leveraged by janaroot to provide a similar expression as TTrees

• An empty event tree is also created with all other trees listed as friends so that leaves from multiple objects can be used together in expressions

• Limitations mean this is not suitable for every application, but it does provide a quick, easy way to make plots of some reconstructed values for less experienced users

Each leaf is an array of size “N” to represent the N objects of this type in the event

A leaf named “N” is automatically added to each tree
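A minimal sketch of the toStrings() idea, written independently of JANA: the `Hit` class, the `StringItems` typedef, the method signature and the `Dump()` helper are all assumptions made for illustration and should not be read as the framework's actual interface. Each object expresses selected members as (name, value) string pairs, and a generic routine can then print any collection of such objects without knowing the class layout.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// (name, value) pairs describing one object. Hypothetical type alias.
typedef std::vector<std::pair<std::string, std::string> > StringItems;

// Hypothetical data object; a real class would hold far more than this.
struct Hit {
    int id;
    double energy;

    // Express selected members as strings. This need not be a full
    // representation of the object.
    void toStrings(StringItems& items) const {
        char buf[64];
        std::snprintf(buf, sizeof(buf), "%d", id);
        items.push_back(std::make_pair(std::string("id"), std::string(buf)));
        std::snprintf(buf, sizeof(buf), "%.3f", energy);
        items.push_back(std::make_pair(std::string("energy"), std::string(buf)));
    }
};

// Low-level dump of all objects of one type in an event: a count line
// followed by one "name=value ..." line per object.
std::string Dump(const std::vector<Hit>& hits) {
    std::string out = "N=" + std::to_string(hits.size()) + "\n";
    for (std::size_t i = 0; i < hits.size(); ++i) {
        StringItems items;
        hits[i].toStrings(items);
        for (std::size_t j = 0; j < items.size(); ++j) {
            if (j > 0) out += " ";
            out += items[j].first + "=" + items[j].second;
        }
        out += "\n";
    }
    return out;
}
```

The same (name, value) pairs that feed a screen dump could equally be used to name and fill tree leaves, with the object count playing the role of the “N” leaf, which is the reuse the slide attributes to janaroot.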

Page 16

The janadot plugin (for creating a factory call graph)

Number of calls and amount of time spent satisfying each is reported

Objects at bottom of graph are (mostly) supplied by event source

Arrows indicate calling sequence; data flow is in the opposite direction

Page 17

Important Roles of the Event Processing Framework

The framework should provide:

A clear structure for modular building of reconstruction code

An easy means for swapping out modules (e.g. replace one calorimeter clustering algorithm with another one)

A mechanism for moving data between modules

Standard interface to event sources (i.e. reconstruction agnostic as to whether event came from file, socket, web service, etc…)

Standard interface to Calibrations and Conditions DB

Centralized area for run-time settings with simple access mechanism (i.e. allow user to modify a setting at runtime and all modules can see it)

JANA has been designed to provide all of these!

Page 18

Threading benefits large scale processing

[Figure: Single Farm node. Axes: cores vs. processing time. Legend: multi-threaded, single-threaded]

Page 19

Threading benefits large scale processing

1 year of GlueX data = 10k to 20k files if 1 file every 10 min.
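As a rough cross-check of the file count, the total recording time implied by 10k to 20k ten-minute files can be worked out directly. The slide does not state how much running time per year is assumed, so the figures below are only the arithmetic consequence of the numbers quoted:

$$
10\,000 \times 10\ \text{min} = 100\,000\ \text{min} \approx 69\ \text{days}, \qquad
20\,000 \times 10\ \text{min} = 200\,000\ \text{min} \approx 139\ \text{days}
$$

For comparison, a full calendar year contains $365 \times 24 \times 6 = 52\,560$ ten-minute intervals.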