PROOF
Status and Perspectives
G. Ganis, CERN / LCG
VII ROOT Users Workshop, CERN, March 2007
27/03/2007 G. Ganis, ROOT Users Workshop

Outline
• (Very) quick introduction
• What’s new since ROOT05
• Current developments and plans

PROOF in a slide
• PROOF: a dynamic approach to end-user HEP analysis on distributed systems, exploiting the intrinsic parallelism of HEP data (see Backup slides)
[Figure: PROOF enabled facility. Labels: client; commands, scripts; list of output objects (histograms, …); top master; sub-masters, one per geographical domain, each with workers and MSS.]

PROOF aspects / issues
• Connection layer
    • Xrootd, authentication, error handling
• Software distribution
    • Optimized package / class handling
• Data access
    • Optimized distribution of data on worker nodes
• Classification / handling of the results
    • Query result manager
• Resource sharing among users
    • Client gets one ROOT session on each machine
    • Scheduling

What’s new since ROOT05
• Connection layer based on XROOTD
• Coordinator functionality
• Full implementation of the “interactive batch” model
• Dataset management
• Packetizer improvements
• Progress in uploading / enabling additional software
• Restructuring of the PROOF modules
• Progress in the integration with experiment software
• PROOF Wiki pages
• ALICE experience at the CAF (see J.F. Grosse-Oetringhaus talk)

Coordinator functionality
• Independent channel to control the cluster
    • Global view
    • Independent access to information (e.g. log files)
    • Needed for full implementation of “interactive batch”
• Not directly achievable with proofd
    • Daemon instance “disappearing” into proofserv
    • Session lifetime same as client connection lifetime
    • Parent proofd not aware of its children
• Natural candidate: XROOTD
    • Lightweight, industrial-strength networking and protocol handler

New connection layer based on XROOTD
• New PROOF-related protocol: XrdProofdProtocol (XPD)
    • XPD launches and controls PROOF sessions (proofserv)
• Client connection (XrdProofConn) based on XrdClient
    • Concept of physical (per client) / logical (per session) connection
    • Asynchronous reading via dedicated thread
        • Messages read as soon as available and added to a queue
        • Sets up a control interrupt network independent of OOB
• Cleaner security system
    • Physical connection authenticated
    • Associated logical connections inherit the “token”
• Client disconnection / reconnection handled naturally
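
The physical / logical split can be pictured with a small model: one connection per client authenticates once, and every per-session connection created on top of it reuses the same token. This is only an illustrative sketch; the class names, the token string and the login counter are invented here and are not the XrdProofConn API.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical model, not the XrdProofConn API: one authenticated physical
// connection per client, shared by any number of logical (per-session) ones.
class PhysicalConn {
public:
   PhysicalConn() : fLogins(0) {}

   // Authenticate once; the token value is invented for illustration.
   void Authenticate(const std::string &user)
   {
      fToken = "token-for-" + user;
      ++fLogins;
   }
   const std::string &Token() const { return fToken; }
   int Logins() const { return fLogins; }

private:
   std::string fToken;
   int fLogins;   // how many times authentication actually ran
};

class LogicalConn {
public:
   LogicalConn(std::shared_ptr<PhysicalConn> phys, int sessionId)
      : fPhys(phys), fSessionId(sessionId)
   {
      // The physical connection must already be authenticated.
      assert(fPhys && !fPhys->Token().empty());
   }
   // Logical connections do not authenticate; they reuse the physical token.
   const std::string &Token() const { return fPhys->Token(); }
   int SessionId() const { return fSessionId; }

private:
   std::shared_ptr<PhysicalConn> fPhys;
   int fSessionId;
};
```

Opening two sessions over the same physical connection then costs a single authentication, which is the property the slide describes as logical connections inheriting the token.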

XPD role
• XrdProofdProtocol: client gateway to proofserv
• XrdXrootdProtocol: client gateway to files
[Figure: two panels. PROOF Farm: client, XPD (links, XrdProofdProtocol, static area, MT stuff), proofserv, worker servers. File Server: client, XROOTD (links, XrdXrootdProtocol, MT stuff), files.]

XPD communication layer
[Figure: PROOF Farm communication layout. Labels: client (xc); master with XrdProofd (XS) and proofserv; workers 1 … n, each with XrdProofd (XS) and proofslave; fork(); XRD links; TXSocket.]

Stateless connection and “Interactive batch”
• “Interactive batch”: flexible submission system keeping the advantages of interactivity and batch
    • If a query is taking too long, have the option to abort it, to stop and retrieve the results, or to leave it running on the system, coming back later to browse / retrieve / archive the results
• Ingredients
    • Non-blocking running mode (v5.04.00, ROOT05)
    • Query result management (v5.04.00, ROOT05)
    • Stateless client connection (v5.08.00)
    • Ctrl-Z functionality (soon)

Exploiting the coordinator: client side
• Not yet fully exploited: new functionality added regularly
• Examples:
    • Log retrieval
        • TProofLog contains log files as TMacro and implements display, grep, save, … functionality
    • Session reset
        • Cleanup of user’s entry in the coordinator
        • Only way out when something bad happens
root[] TProofLog *pl = TProof::Mgr("user@master")->GetSessionLogs()
root[] pl->Grep("violation")

TProof::Reset("user@master")

Exploiting the coordinator: server side
• Static control of resource usage
    • Max number of users
    • Max number of workers per user
• Access, usage control
    • Role of server
    • List of users allowed to connect
• Define ROOT versions available on the cluster
    • Extendable to packages
• …

Dataset uploader
• Optimized distribution of data files on the farm using XROOTD functionality
    • By direct upload
    • By staging out from mass storage
• Direct upload
    • Sources: local directory, list of URLs
    • XROOTD/OLBD pool ensures optimal distribution
    • No special configuration (except for clean-up)
• Using a stager
    • Requires XROOTD configuration
    • e.g. CASTOR for ALICE @ CAF

Dataset manager
• Data-sets are identified by name
• Data-sets can be retrieved by name to automatically create TDSet’s
root[0] TProof *proof = TProof::Open("master");
root[1] proof->UploadDataSet("MCppH", "/data1/mc/ppH_*");
Uploading file:///data1/mc/ppH_01.root to \
   root://poolurl//poolpath/ppH_01.root
[TFile::Cp] Total 20.34 MB |===============| 100.00 % [6.9 MB/s]
root[2] proof->ShowDataSets();
Existing Datasets:
MCppH
root[] TDSet *dset = new TDSet(proof->GetDataSet("MCppH"));

Dataset manager
• Metadata stored in sandbox on the master
    • New sub-directory <SandBox>/dataset
• Concept of private / public data-sets
    • User’s private definitions
        • readable / writable by owner only
    • User’s public definitions
        • readable by anybody
    • Global public definitions
        • Workgroup- / experiment-wide (e.g. 2008 runs)
        • readable by anybody (group restrictions?)
        • writable by privileged account
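
The three kinds of definitions can be written down as a small access check. This is a sketch of the rules listed above, not the dataset manager's code: the type and function names are invented, and the rule that a user's public definition is writable only by its owner is an assumption, since the slide states only who may read it.

```cpp
#include <cassert>
#include <string>

// Hypothetical access model for the three kinds of data-set definitions.
enum class Scope { UserPrivate, UserPublic, GlobalPublic };

struct DataSetDef {
   std::string name;
   std::string owner;   // ignored for GlobalPublic
   Scope scope;
};

// Private definitions are readable by the owner only; the rest by anybody.
bool CanRead(const DataSetDef &ds, const std::string &user)
{
   assert(!user.empty());
   return ds.scope != Scope::UserPrivate || ds.owner == user;
}

// Global definitions need a privileged account; user definitions are
// writable by their owner (assumed for the user's public case).
bool CanWrite(const DataSetDef &ds, const std::string &user, bool privileged)
{
   assert(!user.empty());
   if (ds.scope == Scope::GlobalPublic) return privileged;
   return ds.owner == user;
}
```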

Packetizer improvements
• Packetizer’s goal: optimize work distribution to process queries as fast as possible
• Standard TPacketizer’s strategy
    • first process local files, then try to process remote data
• End-of-query bottleneck
[Figure: number of active workers versus processing time]

New strategy: TAdaptivePacketizer
• Predict processing time of local files for each worker
• Keep assigning remote files from the start of the query to workers expected to finish faster
• Processing time improved by up to 50%
[Figure: processing rate for all packets, NEW versus OLD strategy on the same scale; remote packets marked]
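
The idea of handing remote files to the workers expected to finish first can be sketched as a greedy assignment. This is an illustration only, not the TAdaptivePacketizer code: the `WorkerLoad` type, the function name and the cost model (time = megabytes / rate, with a fixed rate per worker) are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a worker: local work in MB and a processing rate in MB/s.
struct WorkerLoad {
   double localMB;
   double rateMBs;
   double remoteMB;   // remote data assigned so far

   WorkerLoad(double l, double r) : localMB(l), rateMBs(r), remoteMB(0.0) {}

   // Predicted time to finish everything currently assigned to this worker.
   double ExpectedFinish() const { return (localMB + remoteMB) / rateMBs; }
};

// Assign each remote file (size in MB) to the worker expected to finish first.
// Returns, for each remote file, the index of the chosen worker.
std::vector<std::size_t> AssignRemote(std::vector<WorkerLoad> &workers,
                                      const std::vector<double> &remoteFilesMB)
{
   assert(!workers.empty());
   std::vector<std::size_t> owner;
   for (double sizeMB : remoteFilesMB) {
      std::size_t best = 0;
      for (std::size_t i = 1; i < workers.size(); ++i) {
         if (workers[i].ExpectedFinish() < workers[best].ExpectedFinish())
            best = i;
      }
      workers[best].remoteMB += sizeMB;   // update the prediction for the next file
      owner.push_back(best);
   }
   return owner;
}
```

Because the prediction is updated after every assignment, a worker with little local data absorbs remote files from the start instead of idling at the end of the query.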

Progress in using additional software
• Package enabling
    • Separated behaviour client / cluster
    • Real-time feedback during build
• Load mechanism extended to single class / macro
• Selectors / macros / classes binaries are now cached
    • Decreases initialization time
• API to modify include / library paths on the workers
    • Use packages globally available on the cluster

root[] TProof *proof = TProof::Open("master")
root[] proof->Load("MyClass.C")

Restructuring of PROOF modules
• Reduce dependencies
    • Better control size of executables (proofserv)
    • Faster worker startup
• First step:
    • Get rid of TVirtualProof and PROOF dependencies in ‘tree’
    • All PROOF in ‘proof’, ‘proofx’, ‘proofd’
    • Still ‘proofserv’ needs a lot of libs
• 2nd step (current situation):
    • Separate out TProofPlayer, TPacketizer, … in ‘proofplayer’ (new libProofPlayer, v5.15.04)
    • proofserv size on workers reduced by a factor of ~2 at startup

Further optimization of PROOF libs
• Differentiate setups on client and cluster
    • Client:
        • Needs graphics
        • May not need all experiment software
        • TSelector: compile only Begin() and Terminate()
    • Servers:
        • Need all experiment software
        • Do not need graphics
        • TSelector: do not compile Begin() and Terminate()
• Client and Server versions of basic libs

Additional improvements (incomplete)
• GUI controller
    • Integration of the data set manager
    • Integration of the new features of package manager
    • Improved session / query history bookkeeping
• Improved user-friendliness of parameter setting
• Automatic support for dynamic environment setting
    • proofserv is a script launching proofserv.exe
    • Envs to define the context in which to run
    • Useful for experiment specific settings (see later) and/or for debugging purposes (e.g. run valgrind on worker …)
root[] TProof *proof = TProof::Open("master")
root[] proof->SetParameter("factor", 1.1)

Integration with experiment software
• Finding, using the experiment software
    • Environment settings, libraries loading
• Implementing the analysis algorithms
    • TSelector framework
        • Structured analysis and automated interaction with trees (chains) (+)
        • Tightly coupled with the tree (-)
            • New analysis implies new selector
            • Change in the tree definition implies a new selector
        • May conflict with existing experiment technologies
        • Add new layer to hide details irrelevant for the end-user

Setting the environment
• Experiment software available on nodes
• Additional dedicated software handled by the PROOF package manager
    • Allows user to run her/his own modifications
• The experiment environment can be set
    • Statically (e.g. ALICE)
        • before starting xrootd (inherited by proofserv)
    • Dynamically (e.g. CMS)
        • evaluating a user defined script in front of proofserv
        • Allows selecting different versions at run time

Dynamic environment setting: CMS
• CMS needs to run SCRAM before proofserv
• PROOF_INITCMD contains the path of a script (NEW)
• The script initializes the CMS environment using SCRAM

TProof::AddEnvVar("PROOF_INITCMD",
   "~maartenb/proj/cms/CMSSW_1_1_1/setup_proof.sh")

#!/bin/sh
# Export the architecture
export SCRAM_ARCH=slc3_ia32_gcc323
# Init CMS defaults
cd ~maartenb/proj/cms/CMSSW_1_1_1
. /app/cms/cmsset_default.sh
# Init runtime environment
scramv1 runtime -sh > /tmp/dummy
cat /tmp/dummy

Examples of implementing analysis algorithms
• ALICE:
    • Generic AliSelector hiding details
    • User’s selector derives from AliSelector
        • Access to ESD event by member fESD
    • Alternative technology using tasks
    • See J.F. Grosse-Oetringhaus talk
• TAM technology @ PHOBOS
    • Based on modularized tasks
    • Separate analysis tasks from interaction with tree
    • See C. Reed at ROOT05

Analysis algorithms in CMS
• CMSSW: provides EDAnalyzer for analysis
• Algorithms with a well defined interface can be used with both technologies (EDAnalyzer and TSelector)
• Used in a TSelector templated framework TFWLiteSelector
• Selector libraries distributed as PAR file

class MyAnalysisAlgorithm {
   void process( const edm::Event & );
   void postProcess( TList & );
   void terminate( TList & );
};

// Load framework library
gSystem->Load("libFWCoreFWLite");
// Load TSelector library
gSystem->Load("libPhysicsToolsParallelAnalysis");

TSelector *mysel = new TFWLiteSelector<MyAnalysisAlgorithm>();

Current developments and plans
• Scheduling
• Consolidation, error handling
    • Improved, but there are still cases where we lose control of the session
    • Processing error report
        • Associate to a query an object detailing what went wrong (e.g. data set elements not analyzed) and why
• Non-input-file-driven analysis
    • Current processing is based on tree or object files
• Local multi-core desktop optimization
    • No daemons, UNIX sockets (no master?)
• GUI: integration in a more general ROOT GUI controller

PROOF exploiting multi-cores
• ALICE search for π0’s
• 4 GB simulated data
• Instantaneous rates (evt/s, MB/s)
• Clear advantage of quad core
• Additional computing power fully exploited
• Demo at Intel Quad-Core Launch – Nov 2006

PROOF: scheduling multi-users
• Fair resource sharing
    • System scheduler not enough if N_users >= ~ N_workers / 2
• Enforce priority policies
• Two approaches
    • Quota-based worker level load balancing
        • Simple and solid implementation, no central unit
        • Group quotas defined in the configuration file
    • Central scheduler
        • Per-query decisions based on cluster load, resources needed by the query, user history and priorities
        • Generic interface to external schedulers planned
            • MAUI, LSF, …

Quota-based worker level load balancing
• Lower priority processes are slowed down
    • sleep before next packet request
• Sleeping time proportional to the used CPU time
    • factor depends on # users and the quotas
• Example: userA, quota 2/3; userB, quota 1/3
    • After T seconds:
        • CPU(A) = T/2, CPU(B) = T/2
        • Sleep B for T/2 seconds
    • After T + T/2 seconds:
        • CPU(A) = T/2 + T/2 = T = 2 * CPU(B), with CPU(B) = T/2
• General case of N users brings a tri-diagonal linear system
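
The two-user example can be checked with a short calculation. The sketch below rests on the assumptions of that example only: the higher-quota user never sleeps, both users split the CPU evenly while both are running, and the lower-quota user then sleeps for `factor` times the CPU time it just used. The function and type names are hypothetical, and the general N-user case (the tri-diagonal system) is not covered.

```cpp
#include <cassert>

// Hypothetical helper, not part of PROOF: sleep factor for the lower-quota
// user in the two-user case. In one cycle B uses c CPU seconds, then sleeps
// factor * c seconds during which A runs alone, so
// CPU(A) / CPU(B) = (c + factor * c) / c = 1 + factor.
double TwoUserSleepFactor(double quotaHigh, double quotaLow)
{
   assert(quotaLow > 0.0 && quotaHigh >= quotaLow);
   return quotaHigh / quotaLow - 1.0;
}

// CPU time accumulated by each user over one run-then-sleep cycle.
struct CycleCpu {
   double high;   // CPU seconds used by the higher-quota user
   double low;    // CPU seconds used by the lower-quota user
   double wall;   // wall-clock length of the cycle
};

CycleCpu SimulateCycle(double sharedWall, double factor)
{
   assert(sharedWall > 0.0 && factor >= 0.0);
   double each = sharedWall / 2.0;   // even split while both users are active
   double sleep = factor * each;     // the lower-quota user sleeps this long
   return {each + sleep, each, sharedWall + sleep};
}
```

With quotas 2/3 and 1/3 the factor comes out as 1: after T shared seconds B sleeps for T/2, and over the T + T/2 cycle the CPU shares are T and T/2, the 2:1 ratio of the example.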

Quota-based worker level load balancing
• Group quotas defined in the xrootd configuration file
• Factors recalculated by the master XPD each time a user starts or ends processing
    • Only active users considered
    • A low priority user will get 100% of resources when alone
• Under Linux, SCHED_RR system scheduling is enforced for the processes
    • The default, dynamic, SCHED_OTHER scheme screws up the whole idea, as sleeping processes get higher priority at restart

xpd.group tpc usra,usrb
xpd.grpparam tpc quota:70%

Demo
• Same sample analysis (h1, slightly slowed down) repeated 20 times
• 2 users
    • gganis: reserved quota 70%
    • ganis: taking what is left
• Histograms show processing rate in MB/s

Central scheduling
• Entity running on master XPD, loaded as plug-in
    • Abstract interface XrdProofSched defined
• Input:
    • Query info (via XrdProofServProxy -> proofserv)
    • Cluster status via OLBD control network
    • Policy
• Output:
    • List of workers to continue with

class XrdProofSched {
   …
public:
   virtual int GetWorkers(XrdProofServProxy *xps,
                          std::list<XrdProofWorker *> &wrks) = 0;
   …
};
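
A plug-in implementing such an interface might look like the sketch below. Everything in it is a stand-in: the `sketch::` types only mirror the shape of the `GetWorkers` signature, their fields are invented, and the "least loaded first" policy and the meaning of the return value (number of workers selected) are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

namespace sketch {

// Stand-ins for XrdProofServProxy and XrdProofWorker; all fields are invented.
struct ServProxy {
   std::size_t maxWorkers;   // hypothetical per-query cap set by the policy
};

struct Worker {
   std::string host;
   int activeSessions;       // hypothetical load figure
};

// Mirrors the shape of the abstract interface shown on the slide.
class Sched {
public:
   virtual ~Sched() {}
   virtual int GetWorkers(ServProxy *xps, std::list<Worker *> &wrks) = 0;
};

// Example policy: hand out the least-loaded workers, up to the cap.
class LeastLoadedSched : public Sched {
public:
   explicit LeastLoadedSched(std::vector<Worker> &pool) : fPool(pool) {}

   int GetWorkers(ServProxy *xps, std::list<Worker *> &wrks) override
   {
      assert(xps != nullptr);
      std::vector<Worker *> sorted;
      for (Worker &w : fPool) sorted.push_back(&w);
      // Stable sort keeps the configured order among equally loaded workers.
      std::stable_sort(sorted.begin(), sorted.end(),
                       [](const Worker *a, const Worker *b) {
                          return a->activeSessions < b->activeSessions;
                       });
      std::size_t n = std::min(xps->maxWorkers, sorted.size());
      wrks.assign(sorted.begin(), sorted.begin() + n);
      return static_cast<int>(n);
   }

private:
   std::vector<Worker> &fPool;   // cluster status as seen by the scheduler
};

} // namespace sketch
```

Keeping the policy behind a single virtual call is what lets the master load a different scheduler, or a wrapper around an external one, without touching the rest of the daemon.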

Central scheduling
[Figure: schematic view. Labels: Client, Master, TProof, TProofPlayer (session), Dataset Lookup, Scheduler, TPacketizer (query), XPD, PLB (olbd).]
• Needed ingredients:
    • Full exploitation of the OLBD network
    • Come&Go functionality for workers
    • …

Summary
• Several improvements in PROOF since ROOT05
    • Coordinator functionality
    • Data set manager
    • Resource control
• ALICE is stress testing the system in LHC environment using a test-CAF at CERN
    • a lot of useful feedback
• Efforts now concentrated on
    • Further consolidation and optimization
    • Scheduling
• PROOF is steadily improving: getting ready for LHC data

Credits
• PROOF team
    • M. Ballintijn, B. Bellenot, L. Franco, G.G., J. Iwaszkiewicz, F. Rademakers
• J.F. Grosse-Oetringhaus, A. Peters (ALICE)
• A. Hanushevsky (SLAC)

Backup
• See also presentations at previous ROOT workshops and at CHEPxx

The ROOT data model: Trees & Selectors
• Chain: events 1, 2, …, n, …, last; each event is made of branches and leaves; read needed parts only
• Selector: loop over events
    • Begin()
        • Create histos, …
        • Define output list
    • Process()
        • preselection, analysis
    • Terminate()
        • Final analysis (fitting, …)
    • output list
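
The three phases can be mimicked without ROOT in a few lines. This is a toy model, not the TSelector API: the `Event` type, the `SelectorSketch` class, the cut value and the use of a map as the "output list" are all invented for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Invented event type standing in for one tree entry.
struct Event {
   double pt;
};

// Hypothetical selector mirroring the Begin / Process / Terminate phases.
class SelectorSketch {
public:
   // Begin(): create the output objects and register them in the output list.
   void Begin()
   {
      fOutput["nSelected"] = 0.0;
      fOutput["sumPt"] = 0.0;
   }

   // Process(): called once per event; preselection first, then analysis.
   bool Process(const Event &ev)
   {
      if (ev.pt < 1.0) return false;   // preselection cut (arbitrary value)
      fOutput["nSelected"] += 1.0;
      fOutput["sumPt"] += ev.pt;
      return true;
   }

   // Terminate(): final analysis on the accumulated output (here, a mean).
   double Terminate() const
   {
      assert(fOutput.count("nSelected") == 1);   // Begin() must have run
      double n = fOutput.at("nSelected");
      return n > 0.0 ? fOutput.at("sumPt") / n : 0.0;
   }

private:
   std::map<std::string, double> fOutput;   // plays the role of the output list
};

// Event loop: what the framework does on behalf of the user.
double RunSelector(SelectorSketch &sel, const std::vector<Event> &chain)
{
   sel.Begin();
   for (const Event &ev : chain) sel.Process(ev);
   return sel.Terminate();
}
```

Since the user code never owns the event loop, the same selector can be driven locally or split across workers, with only the output list needing to be merged.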

Motivation for PROOF
• Provide an alternative, dynamic, approach to end-user HEP analysis on distributed systems
• Typical HEP analysis is a continuous refinement cycle: implement algorithm, run over data set, make improvements
• Data sets are collections of independent events
    • Large (e.g. ALICE ESD+AOD: ~350 TB / year)
    • Spread over many disks and mass storage systems
• Exploiting intrinsic parallelism is the only way to analyze the data in reasonable times
The PROOF approach
[Figure: a PROOF query (data file list, myAna.C) is sent to the MASTER of the PROOF farm. Labels: catalog, storage, scheduler, query, files, feedbacks, final outputs (merged).]
  farm perceived as extension of local PC
  same syntax as in local session
  more dynamic use of resources
  real time feedback
  automated splitting and merging
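The idea behind "automated splitting and merging" can be sketched without ROOT: because events are independent, the data set can be cut into ranges, each range filled into its own partial histogram, and the partial results summed bin by bin. All names below (`Counts`, `ProcessRange`, `Merge`, `SplitProcessMerge`) are hypothetical, and the equal-size contiguous split is an assumption made for simplicity, not a description of how PROOF assigns work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Counts = std::vector<long>;  // per-bin counts, a stand-in for a histogram

// Fill a partial histogram from the half-open event range [first, last).
// binOfEvent[i] is the bin that event i falls into.
Counts ProcessRange(const std::vector<int>& binOfEvent,
                    std::size_t first, std::size_t last, std::size_t nBins) {
    assert(first <= last && last <= binOfEvent.size());
    Counts partial(nBins, 0);
    for (std::size_t i = first; i < last; ++i) {
        assert(binOfEvent[i] >= 0);
        const std::size_t b = static_cast<std::size_t>(binOfEvent[i]);
        assert(b < nBins);
        ++partial[b];
    }
    return partial;
}

// Merge step: bin-by-bin sum of a partial result into the total.
void Merge(Counts& total, const Counts& partial) {
    assert(total.size() == partial.size());
    for (std::size_t b = 0; b < total.size(); ++b) total[b] += partial[b];
}

// Split the data set into nWorkers contiguous ranges, process each range
// independently (sequentially here), and merge the partial outputs.
Counts SplitProcessMerge(const std::vector<int>& binOfEvent,
                         std::size_t nWorkers, std::size_t nBins) {
    assert(nWorkers > 0);
    Counts total(nBins, 0);
    const std::size_t n = binOfEvent.size();
    for (std::size_t w = 0; w < nWorkers; ++w) {
        const std::size_t first = n * w / nWorkers;
        const std::size_t last = n * (w + 1) / nWorkers;
        Merge(total, ProcessRange(binOfEvent, first, last, nBins));
    }
    return total;
}
```

The merged result is identical to processing the whole range in one pass, which is why the user sees the same output as in a local session.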
PROOF design goals
Transparency
  Minimal impact on the ROOT user habits
Scalability
  Full exploitation of the available resources
Adaptability
  Cope transparently with heterogeneous environments
Preserve real-time interaction and feedback
Intended for
  Central Analysis Facilities
  Departmental workgroup computing facilities (Tier-2's)
  Multi-core / multi-disk desktops
PROOF dynamic load balancing
Pull architecture guarantees scalability
Adapts to variations in performance
[Figure: Worker 1, ..., Worker N and the Master]
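Why a pull architecture adapts to variations in performance can be seen in a small deterministic simulation. This is a sketch under stated assumptions, not PROOF code: `SimulatePull` and `secondsPerPacket` are hypothetical names, every packet is assumed to cost the same for a given worker, and communication time is neglected. Each worker asks for the next packet as soon as it becomes free, so faster workers naturally end up processing more packets.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Simulate a pull scheduler. secondsPerPacket[w] is the (assumed constant)
// time worker w needs for one packet. Returns the number of packets each
// worker ends up processing.
std::vector<int> SimulatePull(const std::vector<double>& secondsPerPacket, int nPackets) {
    assert(!secondsPerPacket.empty() && nPackets >= 0);

    // (time at which the worker becomes free, worker id); earliest first,
    // ties resolved in favour of the lower worker id.
    using Slot = std::pair<double, std::size_t>;
    std::priority_queue<Slot, std::vector<Slot>, std::greater<Slot>> freeAt;
    for (std::size_t w = 0; w < secondsPerPacket.size(); ++w) freeAt.push({0.0, w});

    std::vector<int> done(secondsPerPacket.size(), 0);
    for (int p = 0; p < nPackets; ++p) {
        const Slot s = freeAt.top();  // next worker to request a packet
        freeAt.pop();
        ++done[s.second];             // the master hands it one packet
        freeAt.push({s.first + secondsPerPacket[s.second], s.second});
    }
    return done;
}
```

With two workers, one twice as fast as the other, and nine packets, the fast worker takes six packets and the slow one three; no worker speeds have to be known in advance.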
PROOF intrinsic scalability
Strictly concurrent user jobs at CAF (100% CPU used)
  In-memory data
  Dual Xeon, 2.8 GHz
CMS analysis
  1 master, 80 workers
  Dual Xeon 3.2 GHz
  Local data: 1.4 GB / node
  Non-Blocking GB Ethernet
[Plot legend: 1 user, 2 users, 4 users, 8 users]
I. Gonzales, Cantabria
PROOF essentials: what can be done?
Ideally everything made of independent tasks
Currently available:
  Processing of trees
  Processing of independent objects in a file
Tree processing and drawing functionality complete

LOCAL
  // Create a chain of trees
  root[0] TChain *c = CreateMyChain.C;
  // MySelec is a TSelector
  root[1] c->Process("MySelec.C+");

PROOF
  // Create a chain of trees
  root[0] TChain *c = CreateMyChain.C;
  // Start PROOF and tell the chain to use it
  root[1] TProof::Open("masterURL");
  root[2] c->SetProof();
  // Process goes via PROOF
  root[3] c->Process("MySelec.C+");
BackupBackup
27/03/200727/03/2007 G. Ganis, ROOT Users WorkshopG. Ganis, ROOT Users Workshop 4646
The PROOF target
Short analysis using local resources, e.g.
  - end-analysis calculations
  - visualization
Medium term jobs, e.g. analysis design and development using also non-local resources
Long analysis jobs with well defined algorithms (e.g. production of personal trees)
Optimize response for short / medium jobs
Perceive medium as short
PROOF: additional remarks
Intrinsic serial overhead small
  requires reasonable connection between a (sub-)master and its workers
Hardware considerations
  IO bound analysis (frequent in HEP) often limited by hard drive access: N small disks are much better than 1 big one
  Good amount of RAM for efficient data caching
Data access is The Issue:
  Optimize for data locality, when possible
  Low-latency access to mass storage
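Why the serial overhead matters can be illustrated with Amdahl's law, a standard model that is not taken from the slides: if a fraction s of a job cannot be parallelised, the speed-up on N workers is 1 / (s + (1 - s) / N). The function name `AmdahlSpeedup` is hypothetical, and the numbers in the note below are illustrative, not measurements.

```cpp
#include <cassert>

// Amdahl's law: ideal speed-up on nWorkers when a fraction serialFraction
// of the job (0 = fully parallel, 1 = fully serial) cannot be parallelised.
double AmdahlSpeedup(double serialFraction, int nWorkers) {
    assert(serialFraction >= 0.0 && serialFraction <= 1.0 && nWorkers > 0);
    return 1.0 / (serialFraction + (1.0 - serialFraction) / nWorkers);
}
```

In this model a serial fraction of only 1% already limits 80 workers to a speed-up of roughly 45, which is why the overhead between a (sub-)master and its workers has to stay small.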
PROOF: data access issues
Low latency in data access is essential for high performance
  Not only a PROOF issue
File opening overhead
  Minimized using asynchronous open techniques
Data retrieval
  caching, pre-fetching of data segments to be analyzed
  Recently introduced in ROOT for TTree
Techniques improving network performance (e.g. InfiniBand) or file access (e.g. memory-based file serving, PetaCache) should be evaluated
PROOF: PAR archive files
Allow client to add software to be used in the analysis
Simple structure
  package/
    Source / binary files
  package/PROOF-INF/BUILD.sh
    How to build the package (makefile)
  package/PROOF-INF/SETUP.C
    How to enable the package (load, dependencies)
A PAR is a gzip'ed tar-ball of the package tree
Versioning support being added
PROOF essentials: monitoring
Internal
  File access rates, packet latencies, processing time, etc.
  Basic set of histograms available at tunable frequency
  Client temporary output objects can also be retrieved
  Possibility of detailed tree for further analysis
MonALISA-based
  Each host reports CPU, memory, swap, network
  Each worker reports CPU, memory, evt/s, IO vs. network rate
  pcalimonitor.cern.ch:8889
[Figure: network traffic between nodes]
PROOF GUI controller
Allows full on-click control
  define a new session
  submit a query, execute a command
  query editor
    create / pick up a chain
    choose selectors
  online monitoring of feedback histograms
  browse folders with results of query
  retrieve, delete, archive functionality