A prototype for an extended PROOF
• What is PROOF ?
• ROOT analysis model …
• … on a multi-tier architecture
• Status
• New development
• Prototype based on XRD
• Demo
G. Ganis / CERN PH-SFT, June 2005
The ROOT analysis model: Trees
• Main data structure in ROOT, extending the concept of PAW ntuple
• Collection of independent entries
• Organized in
  • Leaves (basic type, array, C++ object)
  • Branches (collection of Leaves / Branches)
The ROOT analysis model: Trees (cont'd)
• Efficient access to portions of entry data
• Several facilities to work with trees
  • Tree friends (TTree::AddFriend):
    • extend an existing tree without touching it
    • e.g. an experiment read-only tree with user-specific branches / leaves
  • Tree chains (TChain)
    • list of trees to make tree size virtually unbounded (typical size of single tree is < 2 GB)
• In all cases the result behaves exactly as a single tree
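As a rough illustration of how a list of trees can behave like a single tree, the sketch below maps a global entry number onto a (tree index, local entry) pair using cumulative entry counts. It is plain C++, not ROOT code: ChainIndex, AddTree and Locate are hypothetical names and do not describe how TChain is actually implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a chain of trees: only entry counts are stored.
class ChainIndex {
public:
   // Register a tree holding `nEntries` entries at the end of the chain.
   void AddTree(long long nEntries) {
      assert(nEntries >= 0);
      long long before = fOffsets.empty() ? 0 : fOffsets.back();
      fOffsets.push_back(before + nEntries);
   }

   // Total number of entries visible through the chain.
   long long GetEntries() const { return fOffsets.empty() ? 0 : fOffsets.back(); }

   // Map a global entry to (tree index, entry inside that tree).
   // Returns {-1, -1} when the entry is out of range.
   std::pair<int, long long> Locate(long long entry) const {
      if (entry < 0 || entry >= GetEntries()) return {-1, -1};
      // First cumulative count strictly greater than `entry`.
      auto it = std::upper_bound(fOffsets.begin(), fOffsets.end(), entry);
      int tree = static_cast<int>(it - fOffsets.begin());
      long long first = (tree == 0) ? 0 : fOffsets[tree - 1];
      return {tree, entry - first};
   }

private:
   std::vector<long long> fOffsets;   // cumulative entry counts, one per tree
};
```

With trees of 100, 50 and 200 entries, global entry 100 would resolve to the first entry of the second tree, so a caller can loop over all 350 entries without knowing where the file boundaries are.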
The ROOT analysis model: Selector
• TSelector: main tool to define the data processing strategy
• Simple structure
• Framework automatically generated for a tree
  • tree->MakeSelector("MySelector")
void MySelector::Begin(TTree *tree)
{
   // Method called before starting the event loop
   fPtBranch = tree->GetBranch("pt");
   fPtBranch->SetAddress(&fPt);
   fPtHist = new TH1F("Pt", "Pt", 100, 0., 400.);
}

Bool_t MySelector::Process(Long64_t entry)
{
   // Method called for each entry in the tree
   fPtBranch->GetEntry(entry);
   fPtHist->Fill(fPt);
   return kTRUE;
}

void MySelector::Terminate()
{
   // Method called when the event loop is over
   fPtHist->Draw();
}
Read only what is needed by the algorithm
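To make the Begin / Process / Terminate contract concrete, here is a minimal sketch of a driver loop calling the three hooks in order. It deliberately avoids ROOT: Selector, RunLoop and PtSelector are hypothetical names, the "pt" column is a plain vector, and stopping the loop on a false return is a choice of this sketch, not a statement about TSelector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal interface mirroring the three hooks of a selector.
struct Selector {
   virtual ~Selector() = default;
   virtual void Begin() = 0;                      // before the event loop
   virtual bool Process(std::size_t entry) = 0;   // once per entry
   virtual void Terminate() = 0;                  // after the event loop
};

// Drives a selector over `nEntries` entries; returns the number processed.
inline std::size_t RunLoop(Selector &sel, std::size_t nEntries) {
   sel.Begin();
   std::size_t done = 0;
   for (std::size_t i = 0; i < nEntries; ++i) {
      if (!sel.Process(i)) break;   // a false return stops the loop in this sketch
      ++done;
   }
   sel.Terminate();
   return done;
}

// Example selector: fills a 100-bin histogram over [0, 400) from a "pt" column.
struct PtSelector : Selector {
   explicit PtSelector(const std::vector<double> &pt) : fPt(pt) {}
   void Begin() override { fBins.assign(100, 0); }
   bool Process(std::size_t entry) override {
      assert(entry < fPt.size());
      double x = fPt[entry];   // read only the column the algorithm needs
      if (x >= 0. && x < 400.) ++fBins[static_cast<std::size_t>(x / 4.)];
      return true;
   }
   void Terminate() override {
      fTotal = 0;
      for (long n : fBins) fTotal += n;   // entries that fell inside the range
   }
   const std::vector<double> &fPt;
   std::vector<long> fBins;
   long fTotal = 0;
};
```

Because the user code only implements the hooks, the same selector can be handed to a different driver, which is the property PROOF relies on.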
The ROOT analysis model: h1 analysis example
{
   // localProcessing.C

   // Define the data set
   TChain a("h42");
   a.Add("/home/ganis/rootdata/dstarmb.root");
   a.Add("/home/ganis/rootdata/dstarp1a.root");
   a.Add("/home/ganis/rootdata/dstarp1b.root");
   a.Add("/home/ganis/rootdata/dstarp2.root");

   // Process the selector
   a.Process("h1analysis.C");
}
root [0] .x localProcessing.C
Starting h1analysis with process option:
Starting h1analysis with process option:
Processing file: /home/ganis/rootdata/dstarmb.root
Processing file: /home/ganis/rootdata/dstarp1a.root
Processing file: /home/ganis/rootdata/dstarp1b.root
Processing file: /home/ganis/rootdata/dstarp2.root
 FCN=70.4023 FROM MIGRAD STATUS=CONVERGED 220 CALLS 221 TOTAL
 EDM=1.37834e-08 STRATEGY= 1 ERROR MATRIX ACCURATE
 EXT PARAMETER                                    STEP         FIRST
 NO.  NAME   VALUE         ERROR         SIZE          DERIVATIVE
  1   p0     9.59988e+05   9.07051e+04   7.92857e+01   -2.69331e-09
  2   p1     3.51130e-01   2.32881e-02   4.69706e-05    5.29292e-03
  3   p2     1.18502e+03   5.95938e+01   6.72112e-01    2.29626e-06
  4   p3     1.45569e-01   5.93851e-05   8.69320e-07   -1.75027e+00
  5   p4     1.24388e-03   6.63103e-05   7.86533e-07   -6.72432e-01
Real time 0:00:17.563133, CP time 5.880
PROOF
• Why ?
  • Data to be analyzed can only rarely all be local
  • Data transfer of full data sets takes time
• Goal: provide a tool for interactive analysis on a heterogeneous cluster
  • exploit inter-independence of entries in a tree
  • basic parallelism achieved by splitting the data into packets of variable size distributed to participant nodes
• Focus on:
  • Transparency
    • same selectors, … on PROOF as in local session
  • Scalability
    • linear scaling up to large number of workers (tested up to 1000)
  • Adaptability
    • cope automatically with different cluster configurations and varying running conditions / performances
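The idea of variable-size packets can be sketched as a pull model: each worker asks for its next packet, and the packet size is scaled to the rate that worker achieved so far, so that fast and slow nodes finish at about the same time. The class below is a hypothetical illustration in plain C++; the names Packetizer and Next, the target-time heuristic and the minimum size are assumptions, not the actual PROOF packetizer.

```cpp
#include <algorithm>
#include <cassert>

// A contiguous range of entries handed to one worker.
struct Packet {
   long long first;
   long long count;   // 0 means no work left
};

// Hypothetical packetizer: sizes each packet so that it should take roughly
// `targetSeconds` on the requesting worker.
class Packetizer {
public:
   Packetizer(long long nEntries, double targetSeconds, long long minSize)
      : fNext(0), fEnd(nEntries), fTarget(targetSeconds), fMin(minSize) {
      assert(nEntries >= 0 && targetSeconds > 0. && minSize > 0);
   }

   // `rate` is the worker's measured speed in entries per second
   // (0 if nothing has been measured yet).
   Packet Next(double rate) {
      long long want = std::max(fMin, static_cast<long long>(rate * fTarget));
      long long count = std::min(want, fEnd - fNext);   // never run past the end
      Packet p{fNext, count};
      fNext += count;
      return p;
   }

   bool Done() const { return fNext >= fEnd; }

private:
   long long fNext;   // first entry not yet handed out
   long long fEnd;    // one past the last entry
   double fTarget;    // desired processing time per packet
   long long fMin;    // lower bound on the packet size
};
```

A worker that has just joined gets the minimum packet; once its rate is known, later packets grow or shrink accordingly, which is one simple way to adapt to varying running conditions.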
Motto: Bring the KiloBytes to the PetaBytes and not the PetaBytes to the KiloBytes
PROOF: architecture
PROOF: connection layer
[Diagram: the client connects to the master, which connects to slaves 1 … n. On each node a parent proofd (always running) calls fork(); the child proofd calls execv() and transforms into proofserv (master) or proofslave (slaves). proofserv / proofslave are TProofServ instances.]
PROOF: simplified message flow
PROOF: workflow
PROOF: data access strategies
• Each slave is assigned, as far as possible, packets representing data in local files
• If there is no (more) local data, it gets remote data via (x)rootd, rfiod or dCache (needs a good LAN, e.g. Gigabit Ethernet)
• In case of SAN/NAS, just use a round-robin strategy
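The local-first strategy can be illustrated with a small assignment function: a worker is given an unprocessed file stored on its own host if one exists, and otherwise falls back to the first remaining file, which it would then read remotely. This is a sketch under simplifying assumptions (whole files instead of packets, no round robin for shared storage); FileInfo and NextFile are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct FileInfo {
   std::string url;    // how the file is opened
   std::string host;   // node holding the file on local disk
   bool assigned;      // already given to some worker
};

// Returns the index of the file the worker running on `workerHost` should
// process next, or -1 if nothing is left. Marks the chosen file as assigned.
inline int NextFile(std::vector<FileInfo> &files, const std::string &workerHost) {
   int fallback = -1;
   for (std::size_t i = 0; i < files.size(); ++i) {
      if (files[i].assigned) continue;
      if (files[i].host == workerHost) {   // local data first
         files[i].assigned = true;
         return static_cast<int>(i);
      }
      if (fallback < 0) fallback = static_cast<int>(i);   // first remote candidate
   }
   if (fallback >= 0) files[fallback].assigned = true;    // will be read remotely
   return fallback;
}
```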
PROOF: processing algorithms
TSelector adapted to PROOF
Natural additions
• Input list: code to be run, …
• Output list: results
• Methods to initialize and finalize processing within a slave
• Method to init a tree
void MySelector::Begin(TTree *tree)
{
   // Called in the client for local inits
}

void MySelector::SlaveBegin(TTree *tree)
{
   // Called in each slave before processing
   fPtHist = new TH1F("Pt", "Pt", 100, 0., 400.);
   fOutput->Add(fPtHist);
}

void MySelector::Init(TTree *tree)
{
   // Called at each tree change
   fPtBranch = tree->GetBranch("pt");
   fPtBranch->SetAddress(&fPt);
}

Bool_t MySelector::Process(Long64_t entry)
{
   // Called for each entry in the tree
   fPtBranch->GetEntry(entry);
   fPtHist->Fill(fPt);
   return kTRUE;
}

void MySelector::SlaveTerminate()
{
   // Called in each slave after processing
}

void MySelector::Terminate()
{
   // Called in the client after processing; the merged histogram
   // is looked up in the output list before drawing
   fPtHist = (TH1F *) fOutput->FindObject("Pt");
   if (fPtHist) fPtHist->Draw();
}
Defines the list of objects wanted back
Objects with Merge() method are automatically merged in Terminate
The modified TSelector works also in non-PROOF sessions
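The merging step can be pictured as follows: every slave returns an output list keyed by object name, and objects with the same name are combined through their Merge() method. The sketch below uses a toy histogram type in plain C++; Hist, OutputList and MergeOutputs are hypothetical names and are not the ROOT classes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy histogram with a Merge() method that adds bin contents.
struct Hist {
   std::vector<double> bins;
   void Fill(std::size_t bin) {
      assert(bin < bins.size());
      bins[bin] += 1.;
   }
   void Merge(const Hist &other) {
      assert(other.bins.size() == bins.size());
      for (std::size_t i = 0; i < bins.size(); ++i) bins[i] += other.bins[i];
   }
};

typedef std::map<std::string, Hist> OutputList;   // object name -> object

// Combine the output lists returned by the slaves into a single list.
inline OutputList MergeOutputs(const std::vector<OutputList> &perSlave) {
   OutputList merged;
   for (const OutputList &out : perSlave) {
      for (const auto &kv : out) {
         auto it = merged.find(kv.first);
         if (it == merged.end()) merged.insert(kv);   // first copy seen
         else it->second.Merge(kv.second);            // same name: add contents
      }
   }
   return merged;
}
```

The client then sees one object per name, as if the whole data set had been processed in a single session.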
PROOF: the data
Data set: dedicated class TDSet
• Specifies a collection of files with objects
• Understands logical file names
• Could be returned by a query to a database or file catalog or …
• API very close to TChain
{
   // proofProcessing.C

   // Define the data set
   TDSet a("TTree", "h42");
   a.Add("root://oplapro62.cern.ch//tmp/dstarmb.root");
   a.Add("root://oplapro62.cern.ch//tmp/dstarp1a.root");
   a.Add("root://oplapro62.cern.ch//tmp/dstarp1b.root");
   a.Add("root://oplapro62.cern.ch//tmp/dstarp2.root");

   // Process the selector
   a.Process("h1analysis.C");
}
root[0] gROOT->Proof("pcepsft43.cern.ch")
PROOF set to parallel mode (10 slaves)
root[1] .x proofProcessing.C
Starting h1analysis with process option:
Starting h1analysis with process option:
Processing file: /tmp/ganis/rootdata/dstarp1a.root
Processing file: /tmp/ganis/rootdata/dstarp2.root
Starting h1analysis with process option:
Processing file: //tmp/ganis/rootdata/dstarmb.root
Processing file: //tmp/ganis/rootdata/dstarp1b.root
Processing file: //tmp/ganis/rootdata/dstarp2.root
 FCN=70.4023 FROM MIGRAD STATUS=CONVERGED 220 CALLS 221 TOTAL
 EDM=1.37834e-08 STRATEGY= 1 ERROR MATRIX ACCURATE
 EXT PARAMETER                                    STEP         FIRST
 NO.  NAME   VALUE         ERROR         SIZE          DERIVATIVE
  1   p0     9.59988e+05   9.07051e+04   7.92857e+01   -2.69331e-09
  2   p1     3.51130e-01   2.32881e-02   4.69706e-05    5.29292e-03
  3   p2     1.18502e+03   5.95938e+01   6.72112e-01    2.29626e-06
  4   p3     1.45569e-01   5.93851e-05   8.69320e-07   -1.75027e+00
  5   p4     1.24388e-03   6.63103e-05   7.86533e-07   -6.72432e-01
root[2]
PROOF: running the query
Executing …
PROOF: additional features
• Possibility to upload and / or build additional packages
  • packed as PAR file (Proof ARchive, as Java JAR …)
    gProof->UploadPackage("MyPackage.par")
    gProof->EnablePackage("MyPackage")
• Cache system to minimize the number of file transfers
• File identity and integrity using message digest technology
• Feedback information at configurable time intervals
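A digest-based cache can be sketched as a table from file name to content digest: a file is transferred only if it is unknown or its digest has changed. The example below is a hypothetical illustration in plain C++. It uses the non-cryptographic FNV-1a hash purely as a stand-in for a real message digest, and FileCache, NeedsUpload and Store are invented names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// FNV-1a 64-bit hash. Stand-in only: a real system would use a proper
// message digest rather than this non-cryptographic hash.
inline std::uint64_t Digest(const std::string &content) {
   std::uint64_t h = 14695981039346656037ULL;   // FNV offset basis
   for (unsigned char c : content) {
      h ^= c;
      h *= 1099511628211ULL;                    // FNV prime
   }
   return h;
}

// Hypothetical cache remembering the digest of each file already transferred.
class FileCache {
public:
   // True if the file is unknown to the cache or its content has changed.
   bool NeedsUpload(const std::string &name, const std::string &content) const {
      auto it = fDigests.find(name);
      return it == fDigests.end() || it->second != Digest(content);
   }

   // Record that `content` is now stored under `name`.
   void Store(const std::string &name, const std::string &content) {
      assert(!name.empty());
      fDigests[name] = Digest(content);
   }

private:
   std::map<std::string, std::uint64_t> fDigests;
};
```

Re-running a query with an unchanged selector would then skip the transfer, while an edited selector is detected and sent again.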
PROOF: realtime feedback
Feedback histogram, updated every (e.g.) 1 second
Chain definition (header) is fetched from the PROOF master
PROOF on clusters
• PROOF can use "resource brokers" to find out where to start the slaves
• PROOF can use file catalogs to locate the files to be analysed
• Concrete examples:
  • Interface with Condor Computing-On-Demand system
    • master starts the slaves as COD jobs
  • PEAC: PROOF-Enabled Analysis Cluster
    • Complete event analysis solution: data catalog, resource broker, PROOF
  • TGrid: abstract Grid interface for all Grid services
    • Concrete implementation for AliEn
// Connect
TGrid *alien = TGrid::Connect("alien");

// Query
TGridResult *res = alien->Query("lfn:///alice/simulation/2001-04/V0.6*.root");

// Data set
TDSet *treeset = new TDSet("TTree", "AOD");
treeset->Add(res);

// Use files in result set to find remote nodes
gROOT->Proof(res);
treeset->Process("myselector.C");
PROOF: current limitations
• Originally intended for short queries
  • TDSet::Process blocks until processing is done
  • Stateful connection
    • everything is lost if the connection is lost or cut
• Originally designed for a local cluster
  • static configuration
• Robustness of some components
  • Interrupt control-flow based on Out-Of-Band messages
  • Authentication when different protocols are required at different steps
  • Sandbox when user account not available
• Documentation
PROOF: team for new developments
• Maarten Ballintijn
• Marek Biskup
• Rene Brun
• Derek Feichtinger (ARDA)
• G.G.
• Guenter Kickinger
• Andreas Peters (ARDA)
• Fons Rademakers
PROOF: new development fields
• Interactive batch
  • stateless connection
  • non-blocking queries
• Robustness
  • Get rid of OOB messages
• Setup / configuration issues
  • zero-config setup
  • allow slaves to come and go
• Grid interfacing
  • efficient use of grid information (catalogs, resource brokers, …)
• Performance issues
  • targeted read ahead, improved caching, query estimators
• Authentication
  • Adopt XROOTD framework
• Analysis issues:
  • Tree friends, event lists, indices
• GUI, Browsing
Typical query-time distribution
XPD: communication layer for PROOF based on XROOTD
• Transferring state from the client to the PROOF cluster requires a manager on the cluster side that keeps track of existing sessions and query submissions
• XROOTD (in ROOT since v4.01.02) provides a generic main component (xrd) for handling networking and protocol scheduling, plus utility tools (forking, error handling, security, …) on which the manager can be based
• Candidate to introduce
  • interactive-batch mode:
    • possibility to leave a session if a query takes too long and reconnect later to pick up the results
  • non-blocking query submission:
    • possibility to detach from the query while being processed (even for potentially short queries)
  • more robust authentication system
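The point of moving state to a cluster-side manager is that a query outlives the client connection. The sketch below models that with a small registry: submission returns a query id immediately, the processing side marks the query as complete, and the client can come back at any time with the id to check the status or fetch the result. It is a single-threaded, hypothetical model in plain C++; SessionManager and its methods are invented names and do not describe the XPD implementation.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class QueryStatus { kUnknown, kRunning, kDone };

// Hypothetical cluster-side manager keeping track of submitted queries.
class SessionManager {
private:
   struct Query {
      std::string selector;
      std::string result;
      bool done;
   };

public:
   // Non-blocking submission: register the query and return its id at once.
   int Submit(const std::string &selector) {
      int id = fNextId++;
      fQueries[id] = Query{selector, "", false};
      return id;
   }

   // Called by the processing side when the query has finished.
   void Complete(int id, const std::string &result) {
      auto it = fQueries.find(id);
      assert(it != fQueries.end());
      it->second.result = result;
      it->second.done = true;
   }

   QueryStatus Status(int id) const {
      auto it = fQueries.find(id);
      if (it == fQueries.end()) return QueryStatus::kUnknown;
      return it->second.done ? QueryStatus::kDone : QueryStatus::kRunning;
   }

   // Fetch the result after reconnecting; succeeds only for finished queries.
   bool Retrieve(int id, std::string &result) const {
      auto it = fQueries.find(id);
      if (it == fQueries.end() || !it->second.done) return false;
      result = it->second.result;
      return true;
   }

private:
   int fNextId = 1;
   std::map<int, Query> fQueries;
};
```

A blocking call can still be offered on top of this by submitting and then waiting for the status to become kDone, so both modes share one code path.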
How does XROOTD work
• Multi-component server based on a multi-thread architecture
• xrd component: provides networking, thread management, protocol scheduling
• Minimal set of threads:
  • Acceptor: opens connection; matches the protocol; submits job to scheduler
  • Pollers: react to any activity on open links; submit job to scheduler
  • Scheduler: schedules work to be done (jobs)
  • Worker(s): wait for jobs to be done
  • Buffer manager: dynamically optimizes use of memory buffers
• Workers created / destroyed following needs
• Links not attached to a specific worker: first worker free takes the job
• Jobs ≡ data/information to be processed for a given link
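The scheduler / worker relationship can be sketched with a shared job queue: the acceptor and pollers push jobs, and whichever worker is free first takes the next one, so no link is tied to a particular worker. This is a minimal illustration using standard C++ threads, with a fixed number of workers as a simplification (the slide describes workers being created and destroyed on demand). The class name Scheduler and its methods are assumptions, not the xrd code.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical scheduler: a job queue served by a fixed pool of workers.
class Scheduler {
public:
   explicit Scheduler(std::size_t nWorkers) {
      assert(nWorkers > 0);
      for (std::size_t i = 0; i < nWorkers; ++i)
         fWorkers.emplace_back([this] { WorkerLoop(); });
   }

   // Lets the workers drain the queue, then stops and joins them.
   ~Scheduler() {
      {
         std::lock_guard<std::mutex> lock(fMutex);
         fStop = true;
      }
      fCond.notify_all();
      for (std::thread &w : fWorkers) w.join();
   }

   // Called by the acceptor or a poller: queue a job for any free worker.
   void Schedule(std::function<void()> job) {
      {
         std::lock_guard<std::mutex> lock(fMutex);
         fJobs.push(std::move(job));
      }
      fCond.notify_one();
   }

private:
   void WorkerLoop() {
      for (;;) {
         std::function<void()> job;
         {
            std::unique_lock<std::mutex> lock(fMutex);
            fCond.wait(lock, [this] { return fStop || !fJobs.empty(); });
            if (fJobs.empty()) return;   // stop requested and nothing left
            job = std::move(fJobs.front());
            fJobs.pop();
         }
         job();   // run outside the lock so other workers can proceed
      }
   }

   std::mutex fMutex;
   std::condition_variable fCond;
   std::queue<std::function<void()>> fJobs;
   bool fStop = false;
   std::vector<std::thread> fWorkers;
};
```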
How does XROOTD work
[Diagram: XROOTD components: accept, poller, scheduler, workers (WN), buffer manager (BM), XrdJob, links, files, XrdXrootdProtocol.]
XrdXrootdProtocol
• one XrdXrootdProtocol instance per physical connection (i.e. per client session)
• client gateway to the files: used to communicate with all the files the client wants to access on that specific server
How does XPROOFD work
[Diagram: XPROOFD components: accept, poller, scheduler, workers (WN), XrdJob, links, proofserv, static area, XrdProofdProtocol.]
XrdProofdProtocol
• one XrdProofdProtocol instance per physical connection (i.e. per client session)
• client gateway to proofserv
• static area keeps all the relevant information about a user and their activities on the cluster
XPROOFD: communication layer
[Diagram: client, master and slaves 1 … n. The master and each slave run XrdProofd, which calls fork() to start proofserv (master) or proofslave (slaves). Other labels: TXPSocket, xc, PO, XRD pollers.]
Basic ingredients
• Client side:
  • new class TXPSocket
    • TSocket interface understanding the new communication protocol
  • new class TXProofMgr
    • reflects the status of a client with respect to a given cluster
    • start / attach sessions, described by TProof instances (no longer unique)
• Server side:
  • new implementation of XrdProtocol, XrdProofdProtocol
    • client gateway to the cluster, one-to-one relation to TXProofMgr
    • static area describing the persistent information (server lifetime)
  • new class XrdProofSrv
    • proxy to the external processor (proofserv), submitted queries, results, …
    • one per external processor
TXPSocket
• Separate thread for receiving messages
• Intensive use of unsolicited messages
  • normal asynchronous messages (i.e. in Collect)
  • interrupts (no OOB)
  • ping functionality
• Synchronous and asynchronous messages posted in separate queues
• Interrupt handler woken up with internal SIGURG (from reader to main thread)
• Ping treated as a special interrupt (level 0)
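The routing done by the reader can be sketched as a dispatcher that sorts every incoming message into one of three places: the synchronous queue, the asynchronous queue, or the interrupt path, with level 0 handled as a ping. The model below is single-threaded and hypothetical: MessageRouter, Message and MsgKind are invented names, and the SIGURG wake-up is reduced to recording the interrupt level.

```cpp
#include <cassert>
#include <deque>
#include <string>

enum class MsgKind { kSync, kAsync, kInterrupt };

struct Message {
   MsgKind kind;
   int level;          // interrupt level; 0 is treated as a ping in this sketch
   std::string body;
};

// Hypothetical router standing in for the reader side of the socket.
class MessageRouter {
public:
   // Called for every message received on the connection.
   void Post(const Message &msg) {
      assert(msg.level >= 0);
      switch (msg.kind) {
         case MsgKind::kSync:  fSync.push_back(msg);  break;
         case MsgKind::kAsync: fAsync.push_back(msg); break;
         case MsgKind::kInterrupt:
            if (msg.level == 0) ++fPings;       // ping: nothing else to do
            else fLastInterrupt = msg.level;    // real code would wake the handler
            break;
      }
   }

   // Pop the oldest message of a queue; return false if the queue is empty.
   bool NextSync(Message &out)  { return Pop(fSync, out); }
   bool NextAsync(Message &out) { return Pop(fAsync, out); }

   int Pings() const { return fPings; }
   int LastInterrupt() const { return fLastInterrupt; }

private:
   static bool Pop(std::deque<Message> &q, Message &out) {
      if (q.empty()) return false;
      out = q.front();
      q.pop_front();
      return true;
   }

   std::deque<Message> fSync, fAsync;
   int fPings = 0;
   int fLastInterrupt = -1;   // -1 until a non-ping interrupt arrives
};
```

Keeping the two queues separate means a caller waiting for a reply is never handed an unsolicited progress message, and interrupts bypass both queues.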
TXPSocket – Reader thread
[Diagram: the reader thread calls recv() on the TCP connection and posts events to the sync msg queue, the async msg queue, or the interrupts path (SIGURG).]
XPD: Demo!
Results achieved with the realistic prototype
• Multi-sessions
• Disconnect / Reconnect
• Process: blocking query
• Submit: non-blocking query
• Finalize results from different sessions
• Archive results to /afs using same daemon as file server
XPD: what next
• Deep test of the communication layer
  • latencies
  • synchronization problems
• Test with large realistic number of slaves
• Alternatives for internal connection
• Enable authentication
• XROOTD load balancing?
Other studies
Advanced prototype using a communication layer based on memory-mapped message queue technology (A. Peters, D. Feichtinger):
• full state in message queues
  • nice recovery features
• multi-thread master
  • queue insertion, configuration, scheduler, packetizer
  • client frontend
• slave splitting in supervisor and processors
  • not attached to a specific user
  • better use of resources
Summary
• A lot of activity going on to improve the PROOF system
• A working prototype with a communication layer based on XROOTD exists
  • interactive batch, multi-session, reconnect
• Alternative studies may provide good solutions for some issues
• Goal: have the new system in good shape for ROOT05