The ROOT Project in the multi-core CPU era
CHEP06, Mumbai
15 February 2006
René Brun
CERN
René Brun, CERN ROOT in the multi-core cpu era 2
Plan of talk
• ROOT: 11 years old !!
• Still many developments
• Multi Core cpus: parallelism
• ROOT, Software Obesity and the GRID
ROOT: a long story
• Started in January 1995. ROOT had to face many sociological obstacles at a time when most users were changing experiments, languages and lost in many fights. “Every problem has its root in failure of a relationship”
(The Times of India Tuesday 14 February)
• This initial opposition has been a key element for the success of the project. By spotting the inevitable weaknesses of some early designs, it forced the team to react quickly. The development method involving more and more users has been essential to get feedback. Designing a large system like ROOT is an iterative process. This process has involved many people in many experiments.
• ROOT is now strongly supported at CERN and FNAL. Many thanks to the management and my colleagues in the LCG project for facilitating a convergent process.
ROOT project: some numbers
• The ROOT project is comparable in size and complexity to the software of each LHC experiment. See, for instance, the evaluation by the sloccount tool
• sloccount by David A. Wheeler reports:
Total Physical Source Lines of Code (SLOC) = 1,709,170
Development Effort Estimate, Person-Years (Person-Months) = 495.97 (5,951.63)
Schedule Estimate, Years (Months) = 5.66 (67.97)
Estimated Average Number of Developers = 87.57
Total Estimated Cost to Develop = $66,998,665
(Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
(Basic COCOMO model, Months = 2.5 * (person-months**0.38))
(average salary = $56,286/year, overhead = 2.40)
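The sloccount figures follow from the two Basic COCOMO formulas quoted on the slide. A minimal sketch of that arithmetic is given here, assuming that person-months are converted to person-years by dividing by 12 and that cost is person-years times salary times overhead. The function and variable names are illustrative, not part of sloccount.

```python
def basic_cocomo(sloc, salary_per_year=56286.0, overhead=2.40):
    """Recompute the Basic COCOMO estimates quoted on the slide."""
    ksloc = sloc / 1000.0
    effort_pm = 2.4 * ksloc ** 1.05             # effort in person-months
    schedule_months = 2.5 * effort_pm ** 0.38   # elapsed schedule in months
    developers = effort_pm / schedule_months    # average team size
    # Assumption: cost = person-years * salary * overhead
    cost = (effort_pm / 12.0) * salary_per_year * overhead
    return effort_pm, schedule_months, developers, cost


effort_pm, schedule_months, developers, cost = basic_cocomo(1_709_170)
print(f"effort    : {effort_pm / 12:.2f} person-years ({effort_pm:.2f} person-months)")
print(f"schedule  : {schedule_months / 12:.2f} years ({schedule_months:.2f} months)")
print(f"developers: {developers:.2f}")
print(f"cost      : ${cost:,.0f}")
```

The results agree with the slide's numbers up to rounding of intermediate values.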
ROOT person power
CERN + FNAL
Only people working full time on the project
Presentations about ROOT & co at CHEP06
98 - PROOF - The Parallel ROOT Facility
Distributed Data Analysis - Monday 13 February 15:00
Presenter: GANIS, Gerardo (CERN)
187 - ROOT GUI, General Status
Software Tools and Information Systems - Monday 13 February 16:40
Presenter: RADEMAKERS, Fons (CERN)
188 - From Task Analysis to the Application Design
Software Tools and Information Systems - Monday 13 February 17:00
Presenter: Mr. RADEMAKERS, Fons (CERN)
129 - ROOT I/O for SQL databases
Software Components and Libraries - Monday 13 February 17:40
Presenter: Dr. LINEV, Sergey (GSI DARMSTADT)
185 - Reflex, reflection for C++
Software Components and Libraries - Tuesday 14 February 14:00
Presenter: Dr. ROISER, Stefan (CERN)
Xxx - Recent Developments in the ROOT I/O and TTrees
Software Components and Libraries - Monday 13 February 16:00
Presenter: Dr. Canal, Philippe (FNAL)
227 - New Developments of ROOT Mathematical Software Libraries
Software Components and Libraries - Tuesday 14 February 16:00
Presenter: Dr. MONETA, Lorenzo (CERN)
383 - New features in ROOT geometry modeller for representing non-ideal geometries
Software Components and Libraries - Wednesday 15 February 14:00
Presenter: CARMINATI, Federico (CERN)
93 - ROOT 3D graphics
Software Components and Libraries - Wednesday 15 February 16:00
Presenter: BRUN, Rene (CERN)
407 - Performance and Scalability of xrootd
Distributed Data Analysis - Wednesday 15 February 17:00
Presenter: HANUSHEVSKY, Andrew (Stanford Linear Accelerator Center)
92 - ROOT 2D graphics visualisation techniques
Poster - Monday 13 February 11:00
91 - ROOT 3D graphics overview and examples
Poster - Monday 13 February 11:00
189 - Recent User Interface Developments
Poster - Monday 13 February 11:00
186 - ROOT/CINT/Reflex integration
Poster - Monday 13 February 11:00
228 - The structure of the new ROOT Mathematical Software Libraries
Poster - Wednesday 15 February 09:00
249 - XrdSec - A high-level C++ interface for security services in client-server applications
Poster - Wednesday 15 February 09:00
408 - xrootd Server Clustering
Poster - Wednesday 15 February 09:00
Multi Core cpus
Impact on ROOT
Multi Core CPUs
http://www.intel.com/technology/computing/archinnov/platform2015/
This is going to affect the evolution of ROOT in many areas
Moore’s law revisited
Your laptop in 2016 with
32 processors
16 Gbytes RAM
16 Tbytes disk
> 50 today’s laptop
Impact on ROOT
• There are many areas in ROOT that can benefit from a multi-core architecture. Because the hardware is becoming available on commodity laptops, it is urgent to implement the most obvious ones as soon as possible.
• Multi-core often implies multi-threading. Several areas must be made not only thread-safe but also thread-aware.
• PROOF is an obvious candidate. By default a ROOT interactive session should run in PROOF mode. It would be nice if this could be made totally transparent to the user.
• Speed up I/O with multi-threaded I/O and read-ahead
• Buffer compression in parallel
• Minimization function in parallel
• Interactive compilation with ACLIC in parallel
• etc.
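As an illustration of the "buffer compression in parallel" item, here is a minimal sketch that compresses independent buffers on a thread pool. It is not ROOT code: the buffers are synthetic stand-ins for I/O buffers, the helper name `compress_buffers` is hypothetical, and the speed-up relies on the assumption that CPython's `zlib` releases the interpreter lock while compressing.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_buffers(buffers, workers=4, level=6):
    """Compress independent buffers in parallel, preserving their order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each buffer is compressed on its own; pool.map keeps input order.
        return list(pool.map(lambda b: zlib.compress(b, level), buffers))


# Synthetic stand-ins for I/O buffers (1 MB each).
buffers = [bytes([i]) * 1_000_000 for i in range(8)]
compressed = compress_buffers(buffers)
restored = [zlib.decompress(c) for c in compressed]
print(sum(map(len, buffers)), "->", sum(map(len, compressed)), "bytes")
```

Because the buffers are independent, the same pattern would apply to read-ahead decompression, with the order of results still matching the order of requests.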
CPU/Node hierarchy
Laptop node: 1->32->?? N cpus, latency 100 nanos
Local cluster: 1000xN cpus, latency 100 micros
GRID(s): 100x1000 nodes, latency 100 millis
Batch jobs pushed to the GRID
Maximum number of jobs run in one week/month
Interactive jobs run on the laptop and use processors on the GRID
Real Time important for short/medium queries
Analysis mainly on laptop and ONE cluster on the GRID
Software Obesity
Use local power as much as possible.
Can we simplify software installation on the GRID?
A proposal
Observations
• A considerable amount of time is spent installing software (up to one day for an expert).
• Porting to a new platform is non-trivial.
• Dependency problems arise when many packages must be installed.
• Only a small subset of the software is used.
• The installation may require a huge amount of disk space. Users are scared to download a new version.
• This does not fit well with the GRID concept.
• The GRID should be used to simplify this process and not to make it more complex.
LHC software
(columns: Alice | Atlas | CMS | ROOT)
number of lines in header files: 102282 | 698208 | 104923 | 153775
classes total: 1815 | 8910 | ??? | 1500
classes in dict: 1669 | >4120 | 2140 | 835 | 1422
lines in dict: 479849 | ??? | 103057 | 698000
classes c++ lines: 577882 | 1524866 | 277923 | 857390
total lines (classes + dict): 1057731 | ??? | 380980 | 1553390
total f77 lines: 736751 | 928574 | ??? | 3000
directories: 540 | 19522 | <500 | 958
comp time: 25’ | 750’ | 90’ | 30’
lines compiled/s: 1196 | 50 (70) | 71 | 863
Source of inefficiencies with Shared Libs
• fPIC (Position Independent Code) introduces a 20 per cent degradation (10 to 30%)
• In case of many shared libs, the percentage of classes and code used is small =>swapping (20%)
• Because shared libs are generated for maximum portability, one cannot use the advanced features of the local processor when compiling. The same optimization level is used everywhere
• But a very large fraction of the code does not need to be optimized: no gain at execution, big loss when compiling
• A small fraction of the code should be compiled with the highest possible optimization (10%)
• Maybe a factor of 2 loss!!!
Shared Libs vs Archive Libs
• In the Fortran era, often one subroutine/file
• Loader takes only the subroutines really referenced. However the percentage of referenced but not used code has increased with time.
• Shared libs were efficient at a time when code could be shared between different tasks on time sharing systems.
• Shared libs have partially solved the link time problem.
• Shared libs are not a solution for the long term.
• Archive libs are unusable in a large system, but nice to build static modules
• What to do?
Fraction of ROOT code really used in a batch job
(figure; axis label: shared lib size in bytes)
Fraction of ROOT code really used in a job with graphics
Fraction of code really used in one program
(figure; series: % classes used, % functions used)
(diagram labels: *.cxx, *.h [100 Mb]; c++ [800 l/s]; *.o [110 Mb]; ld; *.so [76 Mb]; myapp; memory; Cint [10000 l/s])
We are wasting a lot of time in writing/reading .o or .so files to/from disk
Proposal for a new scenario
Introducing BOOT
A Software Bootstrap system
What is BOOT?
• A small system to facilitate the life of many users doing mainly data analysis with ROOT and their own classes (users + experiment).
• It is a very small subset of ROOT (5 to 10 per cent)
• The same idea could be extended to other domains, like simulation and reconstruction.
(diagram labels: ROOT, BOOT)
What is BOOT?
• A small, easy to install, standalone executable module (< 5 Mbytes)
• One click in the web browser
• It must be a stable system that can cope with old and new versions of other packages including ROOT itself.
• It will include:
   • A subset of ROOT I/O, network and Core classes
   • A subset of Reflex
   • A subset of CINT (could also have a python flavor)
   • Possibly a GUI object browser
• From the BOOT GUI or command line, the referenced software (URL) will be automatically downloaded and locally compiled/cached in a transparent way.
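The "download once, then serve from a local cache" behaviour described for BOOT can be sketched as below. This is only an illustration of the pull-and-cache idea under stated assumptions: the class `SourceCache`, its hashing scheme for cache file names, and the `fake_fetch` stand-in are all hypothetical, and nothing here is taken from an actual BOOT implementation.

```python
import hashlib
import os
import tempfile


class SourceCache:
    """Pull remote files on first use and serve them from a local cache afterwards."""

    def __init__(self, cache_dir, fetch):
        self.cache_dir = cache_dir
        self.fetch = fetch                  # callable: url -> bytes
        os.makedirs(cache_dir, exist_ok=True)

    def path_for(self, url):
        # One cache file per URL, named by a hash of the URL (assumed scheme).
        return os.path.join(self.cache_dir, hashlib.sha256(url.encode()).hexdigest())

    def get(self, url):
        path = self.path_for(url)
        if not os.path.exists(path):        # cache miss: download once
            with open(path, "wb") as f:
                f.write(self.fetch(url))
        with open(path, "rb") as f:         # cache hit: no network access
            return f.read()


calls = []


def fake_fetch(url):
    """Stand-in for a real download so the sketch runs offline."""
    calls.append(url)
    return ("// source for " + url).encode()


cache = SourceCache(tempfile.mkdtemp(), fake_fetch)
first = cache.get("http://root.cern.ch/coderoot.root")
second = cache.get("http://root.cern.ch/coderoot.root")
print(len(calls), "download(s) for 2 requests")
```

In a real setting the `fetch` callable could be something like `lambda url: urllib.request.urlopen(url).read()`; keeping it injectable also makes the cache easy to test without a network.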
BOOT and existing applications
• BOOT must be able to run with the existing codes, maybe with reduced functionality.
• The next slides show a few use cases to illustrate the ideas.
• Do not take the syntax as a final word.
BOOT: Use Case 1
• Assumes BOOT already installed on your machine
• Nothing else on the machine, except the compiler (no ROOT, etc.)
• Import a ROOT file containing histograms, Trees and other classes (usecase1.root)
• Browse contents of file
• Draw a histogram
Use Case 1
usecase1.root (2 Mbytes): contains references (URL) to classes in namespace ROOT
http://root.cern.ch/coderoot.root: a compressed ROOT file containing
   the full ROOT source tree automatically built from CVS (25 Mbytes)
   + ROOT classes dictionary DS generated by Reflex (5 Mbytes)
   + the full classes documentation: objects generated by the source parser (5 Mbytes)
Local cache with the source of the classes really used + binaries for the classes or functions that are automatically generated from the interpreter (like ACLIC mechanism)
Use Case 1 pictures
(figure labels: code.root, usecase1.root)
Use Case 2
• BOOT already installed
• Want to write the shortest possible program using some classes in namespace ROOT and some classes from another namespace YYYY
//This code can be interpreted line by line,
//executed as a script or compiled with C/C++
//after corresponding code generation
use ROOT, YYYY=http://cms.cern.ch/packages/yyyy
h = new TH1F("h","example",100,0,1);
v = new LorentzVector(….);
gener = new myClass(v.x());
h.Fill(gener.Something());
h.Draw();
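To make the `use` line of this use case concrete, here is a small sketch of how such a directive might be parsed into a namespace-to-URL table. The slides stress that the syntax is not final, so the grammar assumed here (comma-separated items, each either `NAME` or `NAME=URL`) and the function name `parse_use` are illustrative guesses only.

```python
def parse_use(line):
    """Parse 'use A, B=url' into {namespace: url or None}."""
    keyword, _, rest = line.strip().partition(" ")
    if keyword != "use":
        raise ValueError("not a use directive: " + line)
    namespaces = {}
    for item in rest.split(","):
        name, _, url = item.strip().partition("=")
        # None means the directive gave no URL for this namespace.
        namespaces[name.strip()] = url.strip() or None
    return namespaces


table = parse_use("use ROOT, YYYY=http://cms.cern.ch/packages/yyyy")
print(table)
```

A namespace without a URL (here `ROOT`) is left as `None`; where its default location would come from is not specified in the slides.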
Use Case 3
• A variant of Use Case 2
• A bug has been found in class LorentzVector of ROOT and fixed in new version ROOT6
use ROOT, YYYY=http://cms.cern.ch/packages/yyyy
use ROOT6=http://root.cern.ch/root6/code.root
use ROOT6::LorentzVector
h = new TH1F("h","example",100,0,1);
v = new LorentzVector(….);
gener = new myClass(v.x());
h.Fill(gener.Something());
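Use Case 3 suggests that a line such as `use ROOT6::LorentzVector` pins one class to a newer package while every other class still comes from the default namespaces. A possible lookup rule is sketched below. The `catalog` of which namespace provides which class, and the helper names `parse_override` and `resolve`, are assumptions made for illustration; the slides do not define the resolution order.

```python
def parse_override(line):
    """Parse 'use NS::Class' into (class_name, namespace)."""
    namespace, _, class_name = line.split(None, 1)[1].strip().partition("::")
    return class_name, namespace


def resolve(class_name, search_order, overrides, catalog):
    """Return the namespace that should supply class_name.

    overrides:    {class_name: namespace} from lines like 'use ROOT6::LorentzVector'
    search_order: namespaces from plain 'use' lines, in order
    catalog:      {namespace: set of class names} (hypothetical metadata)
    """
    if class_name in overrides:              # explicit per-class pin wins
        return overrides[class_name]
    for namespace in search_order:           # otherwise first provider wins
        if class_name in catalog.get(namespace, ()):
            return namespace
    raise LookupError(class_name)


catalog = {
    "ROOT": {"TH1F", "LorentzVector"},
    "ROOT6": {"LorentzVector"},
    "YYYY": {"myClass"},
}
overrides = dict([parse_override("use ROOT6::LorentzVector")])
order = ["ROOT", "YYYY"]
for name in ("TH1F", "LorentzVector", "myClass"):
    print(name, "->", resolve(name, order, overrides, catalog))
```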
Use Case 4
• High Level ROOT Selector understanding named collections in memory (ROOT, STL) or collections in ROOT files.
use ROOT
use ATLFAST=http://atlas.cern.ch/atlfast/atlfastcode.root
TFile f("mcrun.root");
for each entry in f.Tree
   for each electron in Electrons
      h.Fill(electron.m_Pt);
h.Draw();
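The nested "for each entry / for each electron" loop of this use case amounts to flattening a named collection across all entries and histogramming one member. The sketch below shows that expansion on plain in-memory data. The `tree` list is a hypothetical stand-in for the Tree in mcrun.root, and the binning (5 bins on [0, 50)) is chosen only for the example.

```python
def fill_histogram(values, nbins, lo, hi):
    """Count values into nbins equal-width bins on [lo, hi)."""
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for x in values:
        if lo <= x < hi:
            # min() guards against rounding at the upper edge.
            counts[min(int((x - lo) / width), nbins - 1)] += 1
    return counts


# Hypothetical stand-in for the Tree: each entry holds a named
# collection "Electrons", and each electron has a member m_Pt.
tree = [
    {"Electrons": [{"m_Pt": 12.5}, {"m_Pt": 47.0}]},
    {"Electrons": []},
    {"Electrons": [{"m_Pt": 33.1}]},
]

# for each entry in the tree, for each electron in Electrons: fill m_Pt
pts = (electron["m_Pt"] for entry in tree for electron in entry["Electrons"])
h = fill_histogram(pts, nbins=5, lo=0.0, hi=50.0)
print(h)
```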
Use Case 5: Event Displays
• In general, Event Displays require the full experiment infrastructure (Pacific, Obelix, WonderLand, Crocodile).
• This is complex and not good for users and OUTREACH.
• A data file with the visualization scripts is far more powerful
• This implies that the GUI must be fully scriptable. This is the case for ROOT GUI.
(diagram labels: Event data in a Tree, C++ scripts)
Requirements: work to do
• libCore already has all the infrastructure for client-server communications and for accessing remote files on the GRID.
• We must understand how to use subsets of the compilers and linkers to bypass disk I/O.
• We must understand how to emulate a dynamic linker using pre-compiled objects in memory.
• We have to investigate various code generation tools and the coupling with an extended version of CINT (and possibly python).
• We must understand how to use the STL functionality without its penalty. Dynamic templates are also necessary.
Procedure
• These are just ideas. Making a firm proposal requires more investigations and prototyping.
• It must be clear that the top priority is the consolidation of ROOT to be ready for LHC data taking. This should not be an excuse to not look forward.
• This work will continue as a background activity.
Conclusions
• After more than 10 years of intensive development, the CORE work packages are consolidated.
• Important developments in PROOF, Math, CINT, Reflex, 3-D graphics.
• All packages must be adapted to a multi-threading environment made necessary by the multi core cpus.
• Instead of pushing gigabytes of source or shared libs to the GRID working nodes, BOOT could greatly optimize and simplify the use of the GRID. BOOT will use a PULL technique to download, incrementally, only the software (source) necessary to run an application.
• Hoping to show a working BOOT at the next CHEP.
Spare Slides
“Classic” approach
G. Ganis, CHEP06, 15 Feb 2006
(diagram labels: catalog, query, data file splitting, myAna.C, jobs, submit, Batch farm, queues, manager, Storage, files, outputs, merging, final analysis)
• “static” use of resources
• jobs frozen, 1 job / worker node
• “manual” splitting, merging
• limited monitoring (end of single job)
The PROOF approach
G. Ganis, CHEP06, 15 Feb 2006
(diagram labels: catalog, query, PROOF query: data file list, myAna.C, scheduler, MASTER, PROOF farm, Storage, files, feedbacks (merged), final outputs (merged))
• farm perceived as extension of local PC
• more dynamic use of resources
• real time feedback
• automated splitting and merging
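The "automated splitting and merging" idea can be sketched as a master that deals a file list out to workers and merges their partial results. This is a toy model, not PROOF: the fixed round-robin split is a simplification chosen for brevity (the slide describes a more dynamic use of resources), `analyse` is a stand-in for running myAna.C, and the per-file "event count" is fake.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split(files, n_workers):
    """Deal the file list out to workers round-robin."""
    return [files[i::n_workers] for i in range(n_workers)]


def analyse(files):
    """Stand-in for running the analysis on one worker; returns a partial result."""
    partial = Counter()
    for name in files:
        partial["files"] += 1
        partial["events"] += len(name)      # fake per-file event count
    return partial


def run_query(files, n_workers=4):
    """Master: split the work, run the workers, merge their outputs."""
    packets = split(files, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(analyse, packets))
    merged = Counter()
    for partial in partials:
        merged.update(partial)              # Counter.update adds the counts
    return merged


files = [f"run{i:03d}.root" for i in range(10)]
result = run_query(files)
print(dict(result))
```

The merged result is the same as what a single worker would produce over the full list, which is the property that lets the user see the farm as an extension of the local PC.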
Atlas packages with > 10000 lines
211677 dice fortran=211641
187691 atrecon fortran=138126,cpp=49354
129793 MuonSpectrometer fortran=121321,python=3715,csh=2613,sh=2136
118504 Tools cpp=67337,ansic=19012,python=13770,sh=7373,yacc=5659,fortran=3024,lex=1971
116327 PhysicsAnalysis cpp=107348,python=6070,sh=1649,csh=1260
115143 geant3 fortran=115040,ansic=67
112445 TileCalorimeter cpp=108580,python=2209,csh=920,sh=736
108200 atutil fortran=108000,ansic=164
80866 Applications fortran=71764,cpp=6961,ansic=1865
74721 Calorimeter cpp=65917,python=7854,sh=490,csh=460
67822 atlfast fortran=67786
64838 Tracking cpp=60255,python=2092,csh=1380,sh=1104
59429 Generators fortran=28136,cpp=25538,python=4123,sh=872,csh=760
49926 graphics java=40719,cpp=8312,python=321,sh=255,csh=220
40058 AtlasTest cpp=25159,python=5131,sh=4815,perl=4145,csh=517
39576 Control cpp=22030,python=15904,sh=907,csh=693
31192 DetectorDescription ansic=29540,csh=680,sh=562,python=343
29500 TestBeam cpp=27433,python=1491,csh=320,sh=256
25001 Reconstruction sh=10297,fortran=7559,python=5393,csh=1667
18989 atlsim fortran=17561,cpp=1380
18328 InnerDetector python=11466,csh=2860,sh=2641,ansic=1343
17291 Simulation python=13653,sh=2126,csh=1302,fortran=169
16139 Database perl=8310,sh=4299,java=2209,csh=709,python=566
14250 Event cpp=13522,python=296,csh=240,sh=192
12930 gcalor fortran=12894
11955 Trigger python=7860,csh=1780,sh=1673,perl=634
11195 LArCalorimeter python=6133,ansic=2045,csh=1620,sh=1347
3 million lines of code
1200 packages
Alice packages with > 10000 lines
398742 PDF fortran=398729,ansic=13
146414 PYTHIA6 fortran=140748,cpp=5413,ansic=153,pascal=100
128337 HLT cpp=127601,ansic=605,sh=100,csh=31
128103 ITS cpp=128010,sh=93
105763 MUON cpp=105673,sh=90
94548 DPMJET fortran=94267,cpp=281
72400 STEER cpp=72400
52443 HBTAN cpp=51260,fortran=1183
51489 TPC cpp=51479,sh=10
50932 PHOS cpp=50639,csh=293
46176 TRD cpp=46176
41998 ISAJET fortran=40483,cpp=1494,pascal=21
39407 RALICE cpp=29764,ansic=9355,sh=288
35916 EMCAL cpp=35410,fortran=383,csh=123
31820 ANALYSIS cpp=31820
27751 HERWIG fortran=27246,cpp=477,ansic=28
27025 FMD cpp=27021,sh=4
26667 TOF cpp=26667
24258 EVGEN cpp=24258
21588 HIJING fortran=21099,cpp=489
20562 JETAN cpp=19687,fortran=875
18344 RAW cpp=18344
15232 STRUCT cpp=15232
13142 PMD cpp=13142
12945 RICH cpp=12945
10966 FASTSIM cpp=10966
10944 MONITOR cpp=10944
10659 ZDC cpp=10659
1.5 million lines of code
(diagram labels: h.Draw(), CINT, local mode (Plug-in Manager), pm)
libGraf: …, TGraph, TGaxis, TPave, …
libX11: …, drawline, drawtext, …
libCore: …, I/O, TSystem, …
libHist: …, TH1, TH2, …
libHistPainter: …, THistPainter, TPainter3DAlgorithms, …
libGpad: …, TPad, TFrame, …
Problem with STL Inlining
• STL containers are very nice. However they have a very high cost in a real large environment.
• Compiling code with STL is much, much slower because of inlining (STL is only in header files). The situation improves a bit with precompiled headers (e.g. in gcc4), but not much.
• Object modules are bigger
• Compiler or linker is able to eliminate duplicate code in ONE object file or shared lib, not across libraries.
• If you have 100 shared libs, it is likely that you have the code for std::vector push_back or iterators 100 times!
• In-lining is nice if used with care (or toy benchmarks). It may have an opposite effect, generating more cache misses in a real application.
• Templates are statically defined and difficult to use in a dynamic interactive environment.
Can we gain with a better packaging?
• Yes and no
• One shared lib per class implies more administration, more dictionaries, more dependencies.
• 80 shared libs for ROOT is already a lot. 500 would be nonsense.
• A CORE library is essential. However some developers do not like this and penalize/complicate the life of the vast majority of users.
• Plug-in Manager helps