Page 1: 1 A Component Framework for Distributed Data Analysis in HEP Jakub T. Moscicki CERN IT/API


A Component Framework for Distributed Data Analysis in HEP

Jakub T. Moscicki, CERN IT/API, [email protected]


ACAT2002, June, Moscow CERN IT/API, [email protected] 2


Goals: study the requirements of semi-interactive parallel analysis in HEP; evaluate and choose the middleware technology (CORBA, MPI, Condor, LSF, ...); see how to integrate API products with the Grid; prototyping (focus on ntuple analysis).

A young project: started in June 2001 (0.5 FTE); by June 2002 a running prototype exists (1.5 FTE).

Sample ntuple analysis with Anaphe; run-level parallel Geant4 simulation (soon).


How does it fit with the Grid?

Grid-enabled framework for HEP applications: this framework will be a Grid component, via a gateway that understands Grid/JDL.

The framework uses lower-level Grid components: authentication, security, load balancing.

Distribution aspects: parallel cluster computation at the "institute" or "workgroup" level (Tier 1-3), i.e. a local computing center; remote analysis, geographically unlimited.


Distributed Analysis: Motivation

Why do we want distributed data analysis? To move the processing close to the data: for example, an ntuple job description is ~kB, while the data itself is MB, GB, TB, ... Rather than downloading gigabytes of data, let the remote server do the job.

Do it in parallel, which is faster: clusters of cheap PCs.


Topology of an I/O-intensive application

Ntuple analysis is mostly I/O intensive rather than CPU intensive: fast DB access from the cluster; a slow network from the user to the cluster; a very small amount of data exchanged between the tasks in comparison to the "input" data.


Parallel ntuple analysis

Data driven: all workers perform the same task (similar to SPMD); synchronization is quite simple (independent workers); master/worker model.
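The data-driven master/worker scheme can be sketched in C++ for a single-column projection. This is a minimal in-process illustration, assuming threads stand in for remote workers and an ntuple column is a plain `std::vector<double>`; `Histogram1D`, `worker` and `parallelProject` are hypothetical names, not the Anaphe or AIDA API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical fixed-binning 1D histogram (not the AIDA/Anaphe API).
struct Histogram1D {
    std::size_t nbins;
    double xmin, xmax;
    std::vector<long> counts;

    Histogram1D(std::size_t n, double lo, double hi)
        : nbins(n), xmin(lo), xmax(hi), counts(n, 0) {}

    void fill(double x) {
        if (x < xmin || x >= xmax) return;  // under/overflow ignored for brevity
        auto bin = static_cast<std::size_t>((x - xmin) / (xmax - xmin) * nbins);
        counts[std::min(bin, nbins - 1)] += 1;  // guard against rounding at the edge
    }

    // Master-side merge: bin-by-bin sum of a worker's partial result.
    void merge(const Histogram1D& other) {
        for (std::size_t i = 0; i < nbins; ++i) counts[i] += other.counts[i];
    }
};

// Worker: project rows [begin, end) of one column into its private histogram.
void worker(const std::vector<double>& column, std::size_t begin, std::size_t end,
            Histogram1D& partial) {
    for (std::size_t row = begin; row < end; ++row) partial.fill(column[row]);
}

// Master: split the rows into contiguous chunks, start one worker per chunk,
// wait for all of them, then merge. Workers share no mutable state.
Histogram1D parallelProject(const std::vector<double>& column, std::size_t nWorkers,
                            std::size_t nbins, double xmin, double xmax) {
    assert(nWorkers > 0 && nbins > 0);
    std::vector<Histogram1D> partials(nWorkers, Histogram1D(nbins, xmin, xmax));
    const std::size_t chunk = (column.size() + nWorkers - 1) / nWorkers;
    std::vector<std::thread> threads;
    for (std::size_t w = 0; w < nWorkers; ++w) {
        std::size_t begin = std::min(w * chunk, column.size());
        std::size_t end = std::min(begin + chunk, column.size());
        threads.emplace_back(worker, std::cref(column), begin, end, std::ref(partials[w]));
    }
    for (auto& t : threads) t.join();
    Histogram1D result(nbins, xmin, xmax);
    for (const auto& p : partials) result.merge(p);
    return result;
}
```

Because each worker fills a private histogram, the final join is the only synchronization point, which is what keeps this case simple.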


HEP public/workgroup clusters

Features: many users, many jobs; diverse applications (ntuple analysis, simulation, ...); interactive, semi-interactive and batch work; ~100s of machines.

Dynamic environment: users may submit their own analysis code; mixed CPU- and I/O-intensive work.

Some applications may be preconfigured: general analysis (e.g. ntuple projections) or experiment-specific applications.

Load balancing is important.

thanks to the Anaphe team


Example of ntuple projection

Example of semi-interactive analysis: data: 30 MB HBOOK ntuple, 37K rows, 160 columns; time: minutes to hours.

Timings: desktop (400 MHz, 128 MB RAM): ca. 4 minutes; standalone lxplus (800 MHz, SMP, 512 MB RAM): ca. 45 s; 6 lxplus workers: ca. 18 s.

Why is 6 × 18 ≠ 45, i.e. why is the speedup only ~2.5 instead of 6? The job is small, so a big fraction of the time is spent on compilation and DLL loading rather than on computation; pre-installing the application would improve the speed. Caveat: the example was run on AFS and public machines.
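The timings can be read through a simple two-term model, assumed here for illustration only: a fixed cost c that does not shrink with the number of workers (compilation, DLL loading) plus divisible work W, so T(N) = c + W/N. A small C++ sketch fitting this model to the 45 s and 18 s figures (`OverheadModel`, `fitModel` and `predictedTime` are hypothetical names):

```cpp
#include <cassert>

// Assumed two-term model: T(N) = c + W / N, where c is a fixed cost that does not
// shrink with more workers (compilation, DLL loading) and W is the divisible work.
struct OverheadModel {
    double fixedCost;      // c, in seconds
    double divisibleWork;  // W, in seconds
};

// Fit c and W from the single-process time t1 and the time tN with n workers:
//   t1 - tN = W * (1 - 1/n)  =>  W = (t1 - tN) * n / (n - 1),  c = t1 - W.
OverheadModel fitModel(double t1, double tN, int n) {
    assert(n > 1 && t1 >= tN);
    double w = (t1 - tN) * n / (n - 1);
    return {t1 - w, w};
}

// Predicted wall-clock time with n workers under the same model.
double predictedTime(const OverheadModel& m, int n) {
    assert(n > 0);
    return m.fixedCost + m.divisibleWork / n;
}
```

With `fitModel(45.0, 18.0, 6)` the fit gives c ≈ 12.6 s and W ≈ 32.4 s, so under this model no number of workers would bring the job below about 12.6 s. AFS access and load on the public machines are not modeled.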


Medical applications

Example: brachytherapy, optimization of the treatment planning by MC simulation.

Features: CPU intensive; few users, few jobs; one preconfigured application; interactive: seconds to minutes; ~10s of machines.

Ongoing joint collaboration with Geant4 and hospital units in Torino, Italy; to be deployed soon.

thanks to M.G. Pia


Space science applications

Example: LISA, MC simulation for a gravitational-wave experiment.

Features: CPU intensive; big jobs (10 processor-years); preconfigured applications; batch: days; 1000+ machines.

Requirements: error recovery is important; monitoring and diagnostics.

thanks to A. Howard


Master/Worker model

Applications share the same computation model, so they also share a big part of the framework code, but they have different non-functional requirements.

Figure: a Client, a WorkPlanner and several Worker components.



Architecture principles

Framework core 100% application independent: e.g. Anaphe/Lizard ntuple analysis is just one application.

Thin client approach: just create a well-formed job description in XML, send it via CORBA and read the results back in XML; so the client may be a standalone application in C++ or Python, or integrated into an analysis framework (e.g. Lizard).

Dynamic application repository: plugin repository in XML; dynamic loading on the server side; meta-tools (admin).
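The thin client only has to produce a small, well-formed XML message. A minimal C++ sketch, assuming a hypothetical schema: the `job`, `ntuple`, `expression` and `histogram` element names and the `ProjectionJob` fields are invented for illustration and are not the prototype's format. Sending the string via CORBA is not shown.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Escape the five XML special characters for use in text or attribute values.
std::string xmlEscape(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char ch : in) {
        switch (ch) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += ch;
        }
    }
    return out;
}

// Hypothetical job description for a standard projection.
struct ProjectionJob {
    std::string application;  // plugin name in the server's application repository
    std::string ntuple;       // ntuple identifier on the server side
    std::string expression;   // quantity to project
    long nbins;
    double xmin, xmax;
};

// Serialize the job into one small XML message (no pretty-printing;
// doubles use the default stream precision of 6 significant digits).
std::string toXml(const ProjectionJob& job) {
    assert(job.nbins > 0 && job.xmin < job.xmax);
    std::ostringstream xml;
    xml << "<job application=\"" << xmlEscape(job.application) << "\">"
        << "<ntuple>" << xmlEscape(job.ntuple) << "</ntuple>"
        << "<expression>" << xmlEscape(job.expression) << "</expression>"
        << "<histogram nbins=\"" << job.nbins << "\" xmin=\"" << job.xmin
        << "\" xmax=\"" << job.xmax << "\"/>"
        << "</job>";
    return xml.str();
}
```

The message stays in the hundreds of bytes regardless of the ntuple size, which is the point of the thin-client design.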


Architecture principles (2)

Component design of the core framework: find the common parts for all use cases; plug in use-case-specific components; do not over-generalize.

AIDA-based analysis applications: using Lizard/Anaphe, but any AIDA-compliant tool could be used (JAS, OpenScientist); see the ACAT talks by V. Serbo ("AIDA") and M. Sang ("Anaphe").

Integrated into the Python environment.


Deployment of Distributed Components

Layering: abstract middleware; dynamic application loading; plugin components.
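The three layers can be sketched with abstract interfaces in C++. All class names are hypothetical. The registry stands in for dynamic application loading (a real server would fill it from shared libraries, e.g. via `dlopen`, which is omitted here), and `LocalMiddleware` is an in-process substitute for a CORBA transport.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Plugin components: every application implements the same abstract interface.
class Application {
public:
    virtual ~Application() = default;
    virtual std::string run(const std::string& jobXml) = 0;  // returns result XML
};

// Dynamic application loading (simplified): factories registered by name.
class ApplicationRegistry {
public:
    using Factory = std::function<std::unique_ptr<Application>()>;

    void add(const std::string& name, Factory factory) {
        assert(factory && "factory must be callable");
        factories_[name] = std::move(factory);
    }

    std::unique_ptr<Application> create(const std::string& name) const {
        auto it = factories_.find(name);
        if (it == factories_.end()) throw std::runtime_error("unknown application: " + name);
        return it->second();
    }

private:
    std::map<std::string, Factory> factories_;
};

// Abstract middleware: the core only sees this interface, so the transport
// underneath can be swapped without touching the applications.
class Middleware {
public:
    virtual ~Middleware() = default;
    virtual std::string submit(const std::string& appName, const std::string& jobXml) = 0;
};

// In-process stand-in for a remote transport, handy for testing the upper layers.
class LocalMiddleware : public Middleware {
public:
    explicit LocalMiddleware(const ApplicationRegistry& registry) : registry_(registry) {}

    std::string submit(const std::string& appName, const std::string& jobXml) override {
        return registry_.create(appName)->run(jobXml);
    }

private:
    const ApplicationRegistry& registry_;
};
```

Keeping the transport behind `Middleware` is what lets the same applications run over different back ends.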


Using CORBA and XML

Inter-operability (shown in the prototype ntuple application):

cross-release (muchos gracias XML!): client running Lizard/Anaphe 3.6.6, server running 4.0.0-pre1;

cross-language (muchos gracias CORBA!): Python CORBA client (~30 lines), C++ CORBA server.

Compact XML data messages: 500 bytes to the server and 22 kB of XML description from the server, roughly a factor of 10^3 less than the original data (30 MB ntuple).

Thin client: there is no need to run Lizard on the client side, as an alternative use-case scenario.


Facade for end-user analysis

3 groups of user roles: developers of distributed analysis applications (brand new applications, e.g. simulation); advanced users with custom ntuple analysis code (similar to Lizard Analyzer: execute a custom algorithm on the parallel ntuple scan); interactive users doing the standard projections (just specify the histogram and the ntuple to project).

User-friendly means: show only the relevant details and hide the complexity of the underlying system.
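One way to read the facade is as two entry points over the same scan: a custom per-row algorithm for advanced users and a plain projection for interactive users. A serial C++ sketch under that assumption; `AnalysisFacade`, `Ntuple` and the in-memory store are hypothetical, and the chunk loop only marks where work would be handed to workers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory ntuple: column name -> values (all columns have equal length).
using Ntuple = std::map<std::string, std::vector<double>>;

class AnalysisFacade {
public:
    explicit AnalysisFacade(std::map<std::string, Ntuple> store, std::size_t chunkSize = 1000)
        : store_(std::move(store)), chunkSize_(chunkSize) {
        assert(chunkSize_ > 0);
    }

    // Advanced users: run a custom algorithm on every row of the ntuple scan.
    void scan(const std::string& ntupleName,
              const std::function<void(const Ntuple&, std::size_t row)>& perRow) const {
        const Ntuple& nt = store_.at(ntupleName);
        const std::size_t rows = nt.empty() ? 0 : nt.begin()->second.size();
        // Chunking is hidden from the caller; here chunks run one after another,
        // whereas a distributed deployment would hand each chunk to a worker.
        for (std::size_t begin = 0; begin < rows; begin += chunkSize_) {
            const std::size_t end = std::min(begin + chunkSize_, rows);
            for (std::size_t r = begin; r < end; ++r) perRow(nt, r);
        }
    }

    // Interactive users: only name the ntuple, the column and the binning.
    std::vector<long> project(const std::string& ntupleName, const std::string& column,
                              std::size_t nbins, double xmin, double xmax) const {
        assert(nbins > 0 && xmin < xmax);
        std::vector<long> counts(nbins, 0);
        scan(ntupleName, [&](const Ntuple& nt, std::size_t r) {
            const double x = nt.at(column)[r];
            if (x < xmin || x >= xmax) return;  // out of range: not counted
            auto bin = static_cast<std::size_t>((x - xmin) / (xmax - xmin) * nbins);
            counts[std::min(bin, nbins - 1)] += 1;
        });
        return counts;
    }

private:
    std::map<std::string, Ntuple> store_;
    std::size_t chunkSize_;
};
```

Building `project` on top of `scan` keeps a single code path for the distribution logic, while interactive users never see chunks or workers.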


Figure: Facade for end-user analysis


Choices for back-end s/w

For LHC: not yet certain (outcome of LCG).

Batch Job System (e.g. LSF): limited control -> submit jobs (black box); job queues with CPU limits; automatic load balancing and scheduling (task creation and dispatch); prototype: deployed (~10s workers).

Dedicated Interactive Cluster: custom daemons; more control -> explicit creation of tasks; load-balancing callbacks into the specific application; prototype: custom PULL load balancing (~10s ...)
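PULL dispatching means that idle workers ask for the next chunk, so faster workers automatically take more work. A minimal in-process C++ sketch, assuming threads stand in for remote workers and an atomic counter stands in for the master's queue; `pullDispatch` is a hypothetical name, not the prototype's implementation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// PULL dispatching: each worker repeatedly requests the next chunk index until the
// queue is exhausted. Returns how many chunks each worker processed.
// `process` must be safe to call concurrently for distinct chunks.
std::vector<std::size_t> pullDispatch(
        std::size_t nChunks, std::size_t nWorkers,
        const std::function<void(std::size_t worker, std::size_t chunk)>& process) {
    assert(nWorkers > 0);
    std::atomic<std::size_t> nextChunk{0};        // the master's queue, reduced to a counter
    std::vector<std::size_t> processed(nWorkers, 0);
    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < nWorkers; ++w) {
        workers.emplace_back([&, w] {
            for (;;) {
                std::size_t chunk = nextChunk.fetch_add(1);  // "pull" request
                if (chunk >= nChunks) break;                 // nothing left to do
                process(w, chunk);
                ++processed[w];                              // each worker owns its slot
            }
        });
    }
    for (auto& t : workers) t.join();
    return processed;
}
```

Compared with a PUSH model, the master needs no knowledge of worker speed; the price is one request per chunk, which is why chunk size matters.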



Dedicated Interactive Cluster (1)

Daemons per node; dynamic process allocation.


Dedicated Interactive Cluster (2)

Daemons per user per node; thread pools, per-user policies.


Towards a flexible architecture

CORBA Component Model (CCM): pluggable components and services; make a truly component system at the core architecture level.

A common interface to the service components is difficult due to the different nature of the service implementations. Example: the load-balancing service: Condor: process migration; LSF: black-box load balancing; custom PULL implementation: active load balancing.

But the first results are very encouraging.


Error recovery service

The mechanisms:

daemon control layer: make sure that the core framework processes are alive; periodic ping, which needs to be hierarchical to be scalable;

worker sandbox: protect from seg-faults in the user applications (memory corruption, exceptions, signals); based on standard Unix mechanisms: child processes and signals.
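The worker sandbox relies on the standard Unix pattern of running user code in a forked child and letting the parent inspect how it ended. A minimal POSIX C++ sketch; `runSandboxed` and `SandboxResult` are hypothetical names, and mapping an uncaught exception to exit code 2 is an arbitrary choice made for this illustration.

```cpp
#include <cerrno>
#include <csignal>
#include <functional>
#include <stdexcept>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// How the user code ended.
struct SandboxResult {
    bool crashed;    // true if the child was terminated by a signal
    int signal;      // terminating signal (meaningful only if crashed)
    int exitStatus;  // exit code (meaningful only if !crashed)
};

// Run `task` in a forked child: a segfault, abort or uncaught exception in user code
// takes down only the child, while the parent (the worker daemon) survives and
// inspects the exit status.
SandboxResult runSandboxed(const std::function<int()>& task) {
    const pid_t pid = fork();
    if (pid < 0) throw std::runtime_error("fork failed");
    if (pid == 0) {                     // child
        int code = 0;
        try {
            code = task();
        } catch (...) {
            code = 2;                   // uncaught exception -> arbitrary exit code 2
        }
        _exit(code);                    // skip the parent's atexit handlers
    }
    int status = 0;                     // parent
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);  // retry if interrupted by a signal
    if (r != pid) throw std::runtime_error("waitpid failed");
    if (WIFSIGNALED(status)) return {true, WTERMSIG(status), 0};
    return {false, 0, WEXITSTATUS(status)};
}
```

A daemon built this way can log the signal number, mark the chunk as failed and re-dispatch it, without restarting itself.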


thanks to G. Chwajol


Other services

Interactive data analysis: connection-oriented vs connectionless; monitoring and fault recovery.

User environment replication: do not rely on a common filesystem (e.g. AFS); distribution of application code (binary exchange possible for homogeneous clusters); distribution of local setup data (configuration files, etc.); binary dependencies (shared libraries, etc.).



Optimizing distributed I/O: access to data; clustering of the data in the DB on a per-task basis; depends on the experiment-specific I/O solution.

Load balancing: the framework is not directly addressing low-level issues, but the design must be LB-aware: partition the initial data set and assign data chunks to tasks (how big should the chunks be? static or adaptive algorithm?); push vs pull model for dispatching tasks; etc.
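The chunk-size question can be illustrated with two classic partitioning schemes: static equal chunks, one per worker, and guided self-scheduling as one possible adaptive algorithm, where each chunk is the remaining rows divided by the number of workers (with a lower bound), so chunks shrink toward the end and late stragglers are small. A C++ sketch; the framework's own choice is not specified on the slide, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Chunk = std::pair<std::size_t, std::size_t>;  // [first row, one past last row)

// Static partitioning: equal-sized chunks, one per worker, decided up front.
std::vector<Chunk> staticChunks(std::size_t nRows, std::size_t nWorkers) {
    assert(nWorkers > 0);
    std::vector<Chunk> chunks;
    const std::size_t size = (nRows + nWorkers - 1) / nWorkers;
    for (std::size_t begin = 0; begin < nRows; begin += size)
        chunks.emplace_back(begin, std::min(begin + size, nRows));
    return chunks;
}

// Guided self-scheduling: chunk = ceil(remaining / nWorkers), never below minChunk
// (except the final remainder), so chunk sizes decrease as the scan progresses.
std::vector<Chunk> guidedChunks(std::size_t nRows, std::size_t nWorkers, std::size_t minChunk) {
    assert(nWorkers > 0 && minChunk > 0);
    std::vector<Chunk> chunks;
    std::size_t begin = 0;
    while (begin < nRows) {
        const std::size_t remaining = nRows - begin;
        std::size_t size = std::max(minChunk, (remaining + nWorkers - 1) / nWorkers);
        size = std::min(size, remaining);
        chunks.emplace_back(begin, begin + size);
        begin += size;
    }
    return chunks;
}
```

For the 37K-row example with 6 workers, the static scheme yields 6 chunks of up to 6167 rows, while the guided scheme starts at 6167 rows and then hands out progressively smaller chunks, which pairs naturally with a pull model.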


Long term evolution

Full production in 2007 (LHC startup): software evolution and policy:

distributed technology (CORBA, RMI, DCOM, sockets, ...); persistency technology (LCG RTAGs -> ODBMS, RDBMS, RIO); programming/scripting languages (C++, Java, Python, ...); hardware evolution.

What will come out of the Grid? Globus, LCG, DataGrid, CrossGrid (interactive apps), ...



Model limited to Master/Worker: more complex synchronization patterns are not covered. Some particular CPU-intensive applications require fine-grained synchronization between workers; this is NOT provided by the framework and must be achieved by other means (e.g. MPI).

Intra-cluster scope: NOT a global metacomputer; a Grid-enabled gateway to enter the Grid universe; otherwise the framework is independent, thanks to Abstract Interfaces.


Similar projects in HEP

PIAF (history): using PAW.

TOP-C: G4 examples for parallelism at the event level.

BlueOx: Java, using JAS for analysis; some space for commonality via AIDA.

PROOF: based on ROOT.



First prototype ready and working: proof of concept for up to 50 workers; ~1000 workers still needs to be checked.

Deployment coming soon: integration with the Lizard analysis tool; medical apps.

Active R&D in component architecture; relation to LCG (?)



That's about it


Data Exchange Protocol API

/* NTupleProtocol.h */
class HistogramParams : public DXP::DataObject {
public:
    HistogramParams(DXP::DataObject *parent)
        : DXP::DataObject(parent), nbins(this), xmin(this), xmax(this) {}
    DXP::Long nbins;
    DXP::Double xmin;
    DXP::Double xmax;
};

class JobResult : public DXP::DataObject {
public:
    JobResult(DXP::DataObject *parent)
        : DXP::DataObject(parent), histoXML(this), jobData(this) {}
    DXP::String histoXML;
    JobData jobData;  // JobData is defined elsewhere (not shown on the slide)
};
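The `(this)` arguments in the protocol classes suggest that each member registers itself with its parent object, so a message can be serialized by walking the resulting tree. A minimal self-contained sketch of that pattern, under that assumption only: the `sketch` namespace, the explicit member names passed to each constructor, and the XML layout are hypothetical and are not the real DXP API.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

namespace sketch {  // hypothetical stand-in for the pattern, not the real DXP library

// Every node registers itself with its parent when constructed, so the parent can
// serialize all members without hand-written marshalling code.
class DataObject {
public:
    DataObject(DataObject* parent, std::string name) : name_(std::move(name)) {
        assert(!name_.empty() && "names become XML tags");
        if (parent) parent->children_.push_back(this);
    }
    virtual ~DataObject() = default;
    DataObject(const DataObject&) = delete;  // children_ stores raw pointers to members
    DataObject& operator=(const DataObject&) = delete;

    virtual void writeXml(std::ostream& out) const {
        out << '<' << name_ << '>';
        for (const DataObject* child : children_) child->writeXml(out);
        out << "</" << name_ << '>';
    }

protected:
    std::string name_;

private:
    std::vector<const DataObject*> children_;
};

// Leaf holding one value; T must be printable with operator<< (no XML escaping here).
template <typename T>
class Value : public DataObject {
public:
    Value(DataObject* parent, std::string name) : DataObject(parent, std::move(name)), value() {}
    void writeXml(std::ostream& out) const override {
        out << '<' << name_ << '>' << value << "</" << name_ << '>';
    }
    T value;
};

using Long = Value<long>;
using Double = Value<double>;

// Mirrors the slide's HistogramParams, with explicit member names added.
class HistogramParams : public DataObject {
public:
    explicit HistogramParams(DataObject* parent)
        : DataObject(parent, "HistogramParams"),
          nbins(this, "nbins"), xmin(this, "xmin"), xmax(this, "xmax") {}
    Long nbins;
    Double xmin;
    Double xmax;
};

// Convenience helper: serialize any node to a string.
inline std::string toXml(const DataObject& obj) {
    std::ostringstream out;
    obj.writeXml(out);
    return out.str();
}

}  // namespace sketch
```

Self-registration means adding a field to a message class is a one-line change, with no separate marshalling code to keep in sync.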
