Nuons && Threads -> Suggestions
SFT meeting, December 15 2014
René Brun



From Algol to Nuons

• 1973: Thesis in Nuclear Physics (SC33/CERN, Diogene/Saturne/Saclay)
• 1973-1975: ISR/R232, p-p elastic scattering with C. Rubbia (Reconstruction)
• 1975-1980: SPS/NA4, deep inelastic muon scattering with C. Rubbia (Simul + Recons)
• 1978-1979: simulation of UA1 with C. Rubbia
• 1980-1989: simulation of OPAL with R. Heuer
• 1988-1993: simulation of GEM & SDC for the defunct SSC
• 1991-1994: simulation of ATLAS and CMS (Letters of Intent) with F. Gianotti, D. Froidevaux, V. Karimaki
• 1995-2010: busy with ROOT
• 2009-2010: interested in theoretical predictions for TOTEM (p-p elastic) and results
• 2009-2011: foundations for the Nuons model
• 2011...: computing particle masses to better than 1/1000
• 2012...: testing p-p elastic with TOTEM/UA4/D0/ISR
• 2012...: testing p-p interactions at the LHC (900 GeV, 2.76 TeV, 7 TeV)
• 2013...: testing the nuons model with Jets at the LHC
• 2014...: predictions for 13 TeV + paper draft


Nuons

[Figure: two panels labelled "proton" and "neutron"]


Nuons and C++

• I am implementing my « physics model » to:
    - Model elementary particles using « nuons »
    - Compute particle masses with high accuracy
    - Test the model at many energies for p-p elastic scattering
    - Test the model at LHC energies: particle production and Jets

Programs: findall.C, totem.C, collide.C


Example of event motivating my project


Standard proton model

Predicted cross section wrong by a factor of more than 1000 for t > 2 GeV^2


Collisions

[Figure: two panels labelled "PP elastic" and "PP inelastic"]


Some programming details

• The 3 C++ programs findall, totem and collide (about 12000 LOC in total) all run in batch, multi-threaded mode on several OpenLab machines with 2x6-core Westmere, 2x12-core Ivy Bridge or 2x14-core E5-2697v3 (now upgraded to 2x18 cores) processors. My programs run from a few minutes to one day.
    - nohup root.exe -b -q "collide.C+(7000)" > x1.log &
    - e.g. process ID 12756
• While the programs are running, I can inspect the results (histograms and/or Trees), say once per minute, from my laptop, then stop and launch again with a new set of parameters.
    - root > .x colshow.C(-12756)
    - This CINT script takes the file collide_12756.root from OpenLab/AFS and stores it on my laptop, where the histograms are visualized.


More programming details

• Findall is a bit « lattice QCD like ». 99.99% of the time is spent in TMinuit, computing the stable positions of a set of N nuons generated at random in a cube of size 1 fermi.
• Totem and Collide are quite similar to Pythia or Herwig. They simulate proton-proton collisions, generating output particles and Jets.
• The scripts run on my laptop and show plenty of graphs comparing with the results of the LHC experiments.

[Diagram: Batch on OpenLab machines -> Histograms, Tree (afs) -> scp -> Interactive script on my laptop]


More programming details (2)

• Findall saves results in a Tree (one particle per entry). It takes about 0.1 s to compute a pion, 10 minutes for a proton and 20 minutes for an Omega.
• Totem generates histograms only (about 20, 1D and 2D).
• Collide generates about 100 histograms (1D and 2D) and a Tree with a size ranging from a few Mbytes/minute to several Gbytes/minute, depending on the desired granularity of the collision information. About one billion collisions are generated in one day.
• Most histograms are filled millions of times per second.


Experience Suggestions

• All these applications are multi-threaded, a HUGE gain in REAL time for what I am doing.

• There are many applications in HEP that look very similar:
    - All detector simulations
    - All event generators
    - Most physics analyses

• To make the most efficient use of the hardware, I had to make simple changes in ROOT or implement solutions that should be implemented in a more general way in ROOT.


Main Topics

• Random numbers and distributions: trivial
• Histograms
• Trees
• I/O in general
• Thread scalability considerations

Current ROOT is a blocker for performant multi-threaded applications


Random Numbers

• No changes are required in the TRandomXX classes. I am using only the nice and efficient TRandom3 (Mersenne Twister).
• I create one TRandom3 object per thread, initialized with: TRandom3(pid + 1000*thnumb).
• I had to modify or circumvent all places referencing gRandom, in full backward compatibility and in totally trivial ways:
    - TF1::GetRandom() -> TF1::GetRandom(double r=-1)
    - Similar changes should be applied to TH1::GetRandom and FillRandom
    - TGenPhaseSpace: add SetRandom function and member fRandom
    - Similar changes should also be applied to:
        • TF2,TF3::GetRandom, TUnuran, TKDTree
        • TMVA: Dataset, RuleEnsemble
        • TGeoBBox, TGeoCompositeShape, TGeoChecker
        • TRobustEstimator, TAttParticle, TVirtualMC, RooStudyPackage
        • TApplicationRemote, TProof
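
The per-thread seeding described above can be sketched in standard C++. This is only an illustration under stated assumptions: std::mt19937 (also a Mersenne Twister) stands in for TRandom3, and the helper names threadSeed and sumUniformPerThread are hypothetical, not ROOT functions. The point is that each thread owns its generator, so no global generator such as gRandom is shared.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// Seed formula taken from the slide: pid + 1000*thnumb.
inline std::uint32_t threadSeed(std::uint32_t pid, std::uint32_t thnumb) {
    return pid + 1000u * thnumb;
}

// Hypothetical helper: every thread draws ndraws uniform numbers in [0,1)
// from its own private generator and stores the sum in its own slot.
std::vector<double> sumUniformPerThread(std::uint32_t pid, unsigned nthreads,
                                        unsigned ndraws) {
    assert(nthreads > 0);
    std::vector<double> sums(nthreads, 0.0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        workers.emplace_back([&sums, pid, t, ndraws] {
            std::mt19937 rng(threadSeed(pid, t));  // private to this thread
            std::uniform_real_distribution<double> uni(0.0, 1.0);
            double s = 0.0;
            for (unsigned i = 0; i < ndraws; ++i) s += uni(rng);
            sums[t] = s;  // each thread writes only its own element
        });
    }
    for (auto& w : workers) w.join();
    return sums;
}
```

Because the seed depends only on the process ID and the thread number, rerunning with the same pair reproduces the same per-thread sequences, which helps when debugging a multi-threaded job.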


Histograms & Threads

• Currently one has to set TH1::AddDirectory(0) to bypass gDirectory.
• However, this forces users to do the histogram book-keeping themselves, which makes the histogram merging phase a bit complex (see next slides with a solution).
• Histograms may be created in the main thread and filled with thread-locking at each fill. This is fine if the number of fills is negligible.
• The only realistic solution is to make a copy of all histograms per thread.
• However, in several applications, this can represent a substantial increase in memory.
    - In my case, I have at most 100 histograms (total 400 Kbytes per thread)
    - Alice monitoring has 14000 histograms, total size 1.5 Gbytes in memory!
    - Most analysis applications have a few hundred, up to a few thousand histograms
• Some tiny work is required to take advantage of the architecture already in place to:
    - Do lazy instantiation of the bin structures
    - Better exploit the TH1::SetBuffer mechanism, in particular in TH1::Merge, and make vectorization possible.
• I could not survive without my I/O check-pointing (around once per minute) for histograms and Trees. This allows me to inspect the current status of my jobs at any time, and to interrupt them and change my parameters when I see that the results are not the ones expected. It also makes running multi-threaded applications much safer.
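
The "one copy of each histogram per thread, then merge" approach can be sketched without ROOT as follows. Hist1D is a hypothetical stand-in for a 1-D histogram (it is not TH1), and fillInParallel is an illustrative helper; the binning of 66 bins in [0,66) is borrowed from the hncol histogram shown on a later slide.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a fixed-width 1-D histogram.
struct Hist1D {
    double lo, hi;
    std::vector<long> bins;
    Hist1D(std::size_t n, double lo_, double hi_) : lo(lo_), hi(hi_), bins(n, 0) {}
    void fill(double x) {
        if (x < lo || x >= hi) return;  // under/overflow ignored in this sketch
        std::size_t i = static_cast<std::size_t>((x - lo) / (hi - lo) * bins.size());
        if (i >= bins.size()) i = bins.size() - 1;  // guard against rounding
        ++bins[i];
    }
    void merge(const Hist1D& other) {
        assert(bins.size() == other.bins.size());
        for (std::size_t i = 0; i < bins.size(); ++i) bins[i] += other.bins[i];
    }
    long entries() const {
        long n = 0;
        for (long b : bins) n += b;
        return n;
    }
};

// Each thread fills its own copy without any locking; the copies are
// merged once, after all threads have finished.
Hist1D fillInParallel(unsigned nthreads, unsigned fillsPerThread) {
    std::vector<Hist1D> perThread(nthreads, Hist1D(66, 0.0, 66.0));
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        workers.emplace_back([&perThread, t, fillsPerThread] {
            Hist1D& h = perThread[t];  // private copy of this thread
            for (unsigned i = 0; i < fillsPerThread; ++i) h.fill(i % 66 + 0.5);
        });
    }
    for (auto& w : workers) w.join();
    Hist1D total(66, 0.0, 66.0);
    for (const Hist1D& h : perThread) total.merge(h);
    return total;
}
```

The memory cost is visible directly: perThread holds nthreads copies of the bin array, which is the trade-off the bullets above describe.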


Histograms (1): poor man's solution

Main thread: TH1 *hrun, *hwatch

Thread 1 ... Thread 6 ... Thread 12, each:
• Create 97 histograms
• Loop on events
• Every N events, save thread histograms to file

Then merge all thread files every NN events or at end of job.

What I have been doing for a long time; efficiency < 8/12.


Histograms (2): much better

Main thread: TH1 *hrun, *hwatch

Thread 1 ... Thread 6 ... Thread 12, each:
• Create 97 histograms
• Loop on events
• Every N events, merge histograms from all threads and save to file

My current version.


Histograms Management (1) (my current solution)

Main thread:

    TH1::AddDirectory(0);
    TList htr[nthreads];
    TH1D *hrun = new TH1D(…);

In thread thnumb:

    TThread::Lock();
    TList &hlist = htr[thnumb];
    TH1D *hncol = new TH1D("hncol","number of collisions",66,0,66);
    hlist.Add(hncol);
    TH1D *hpoiss = new TH1D("hpoiss","Jets particle multiplicity",50,0,50);
    hlist.Add(hpoiss);
    ……
    hncol->Fill(…);
    …

In any thread or end of main thread:

    TFile *fhist = TFile::Open(TString::Format("collide_%d.root",processID),"recreate");
    hrun->SetBinContent(26,mainwatch->GetRealTime());
    hrun->Write();
    TList hlistall;
    int nh = htr[0].GetSize();
    for (int ih=0;ih<nh;ih++) {
        TH1 *hcur = (TH1*)htr[0].At(ih)->Clone();   // start from the copy of thread 0
        hlistall.Clear();
        for (int t=1;t<ncpus;t++) {                 // same histogram in the other threads
            hlistall.Add(htr[t].At(ih));
        }
        hcur->Merge(&hlistall);
        hcur->Write();
        delete hcur;
    }
    fhist->SaveSelf();
    delete fhist;


Histograms Management (2) (what I would like to see in ROOT)

Main thread:

    TH1::InitializeThreads(nthreads);
    TH1D *hrun = new TH1D(…);

In thread thnumb:

    TH1::SetThreadDirectory(thnumb);
    TH1D *hncol = new TH1D("hncol","number of collisions",66,0,66);
    TH1D *hpoiss = new TH1D("hpoiss","Jets particle multiplicity",50,0,50);
    ……
    hncol->Fill(…);
    …

In any thread or end of main thread:

    TFile *fhist = TFile::Open(TString::Format("collide_%d.root",processID),"recreate");
    hrun->SetBinContent(26,mainwatch->GetRealTime());
    hrun->Write();
    TH1::MergeThreads()->Write();
    fhist->SaveSelf();
    delete fhist;


Histograms (3): muuuch better

Main thread: TH1 *hrun, *hwatch

Thread 1 ... Thread 6 ... Thread 12, each:
• Create 97 histograms
• Loop on events
• Every N events, merge histograms from all threads and save to file

Plus a non-blocking asynchronous I/O thread.

What I would like to see.


Trees & Threads

• Solution 1: one TTree per thread, one file per thread, then possibly merge files at end of job.
    - Currently this requires locking and/or fixing the non-thread-safe parts of TTree I/O
    - Not very user friendly as it requires more book-keeping
• Solution 2: use the TTree buffer merge facility
    - This is much more efficient, but requires more memory
    - This solution is not yet fully operational for threads
• Solution 3: create only one TTree in the main thread (or any thread)
    - For each fill: Lock, swap branch addresses, Fill, UnLock
    - This solution is nice for memory, but adds more sequentiality
    - This is my current solution, waiting for a better one, e.g. Solution 4
• Solution 4: same as Solution 3, but with
    - An optimized booking and swapping of branch addresses
    - Delegation of the pure I/O part to a separate asynchronous thread doing the zipping and disk writes
• Solution 5: same as Solution 4, with in addition
    - Possibility to call branch::Fill per thread (this will be essential for GeantV)
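
The "Lock, swap branch addresses, Fill, UnLock" sequence of Solution 3 can be sketched in standard C++. SharedTree and Event are hypothetical stand-ins (not TTree or its branches), and std::mutex replaces TThread::Lock/UnLock; the sketch only shows why fills from different threads become sequential.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical event record standing in for the branch variables.
struct Event { int thnumb; int nch; };

// Hypothetical stand-in for the single tree shared by all threads.
class SharedTree {
public:
    void fill(const Event* threadData) {
        std::lock_guard<std::mutex> guard(mutex_);  // Lock
        address_ = threadData;           // point the "branch" at this thread's data
        entries_.push_back(*address_);   // Fill
    }                                    // UnLock when guard goes out of scope
    std::size_t entries() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return entries_.size();
    }
private:
    mutable std::mutex mutex_;
    const Event* address_ = nullptr;
    std::vector<Event> entries_;
};

// Illustrative driver: every thread keeps a private event buffer and
// fills the shared tree from it.
std::size_t fillFromThreads(unsigned nthreads, int eventsPerThread) {
    assert(nthreads > 0);
    SharedTree tree;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        workers.emplace_back([&tree, t, eventsPerThread] {
            Event local{static_cast<int>(t), 0};  // private to this thread
            for (int i = 0; i < eventsPerThread; ++i) {
                local.nch = i;
                tree.fill(&local);
            }
        });
    }
    for (auto& w : workers) w.join();
    return tree.entries();
}
```

Only one copy of the tree exists, which is the memory advantage, but every fill holds the lock for its full duration, which is the added sequentiality that Solutions 4 and 5 aim to reduce.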


Trees & Threads (my current solution)

Main thread:

    TTree *T = 0;

In initialisation of thread thnumb:

    if (!T && fillTree) {
        TFile::Open(TString::Format("/data/brun/collide_%d_events.root",processID),"recreate");
        T = new TTree("T","selected collide events");
        T->Branch("i1",&i1,"i1/I");
        T->Branch("i2",&i2,"i2/I");
        T->Branch("nch",&nch,"nch/I");
        T->Branch("nchCMS",&nchCMS,"nchCMS/I");
        T->Branch("njets",&njets,"njets/I");
        T->Branch("njetsCMS",&njetsCMS,"njetsCMS/I");
        T->Branch("phi1",&phi1,"phi1/D");
        …….
        T->Branch("ptype",ptype,"ptype[nchCMS]/I");
        T->Branch("pjet",pjet,"pjet[nchCMS]/I");
        T->Branch("ppx",ppx,"ppx[nchCMS]/D");
        T->Branch("ppy",ppy,"ppy[nchCMS]/D");
        T->Branch("ppz",ppz,"ppz[nchCMS]/D");
        T->Branch("ppt",ppt,"ppt[nchCMS]/D");
        T->Branch("peta",peta,"peta[nchCMS]/D");
        T->AutoSave("SaveSelf");
    }

Filling the Tree in thread thnumb:

    if (fillTree && bigjet) {
        TThread::Lock();
        // point every branch at the variables of this thread
        T->SetBranchAddress("i1",&i1);
        T->SetBranchAddress("i2",&i2);
        T->SetBranchAddress("nch",&nch);
        T->SetBranchAddress("nchCMS",&nchCMS);
        T->SetBranchAddress("njets",&njets);
        T->SetBranchAddress("njetsCMS",&njetsCMS);
        T->SetBranchAddress("phi1",&phi1);
        …….
        T->SetBranchAddress("ptype",ptype);
        T->SetBranchAddress("pjet",pjet);
        T->SetBranchAddress("ppx",ppx);
        T->SetBranchAddress("ppy",ppy);
        T->SetBranchAddress("ppz",ppz);
        T->SetBranchAddress("ppt",ppt);
        T->SetBranchAddress("peta",peta);
        T->Fill();
        // every N events autosave
        if (event%1000==0) T->AutoSave("SaveSelf");
        TThread::UnLock();
    }


Trees & Threads (2) (what would be faster and simpler)

Main thread:

    TTree *T = 0;

In initialisation of thread thnumb:

    if (!T && fillTree) {
        TFile::Open(TString::Format("/data/brun/collide_%d_events.root",processID),"recreate");
        T = new TTree("T","selected collide events");
        T->Branch("i1",&i1,"i1/I");
        T->Branch("i2",&i2,"i2/I");
        T->Branch("nch",&nch,"nch/I");
        T->Branch("nchCMS",&nchCMS,"nchCMS/I");
        T->Branch("njets",&njets,"njets/I");
        T->Branch("njetsCMS",&njetsCMS,"njetsCMS/I");
        T->Branch("phi1",&phi1,"phi1/D");
        …….
        T->Branch("ptype",ptype,"ptype[nchCMS]/I");
        T->Branch("pjet",pjet,"pjet[nchCMS]/I");
        T->Branch("ppx",ppx,"ppx[nchCMS]/D");
        T->Branch("ppy",ppy,"ppy[nchCMS]/D");
        T->Branch("ppz",ppz,"ppz[nchCMS]/D");
        T->Branch("ppt",ppt,"ppt[nchCMS]/D");
        T->Branch("peta",peta,"peta[nchCMS]/D");
        T->AutoSave("SaveSelf");
        T->SaveThreadBranches(thnumb);
    }

Filling the Tree in thread thnumb:

    if (fillTree && bigjet) {
        TThread::Lock();
        T->SetThreadBranches(thnumb);
        T->Fill();
        // every N events autosave
        if (event%1000==0) T->AutoSave("SaveSelf");
        TThread::UnLock();
    }


Trees & Threads (3) (what would be much faster and even simpler)

Main thread:

    TTree *T = 0;

In initialisation of thread thnumb:

    if (!T && fillTree) {
        TFile::Open(TString::Format("/data/brun/collide_%d_events.root",processID),"recreate");
        T = new TTree("T","selected collide events");
        T->Branch("i1",&i1,"i1/I");
        T->Branch("i2",&i2,"i2/I");
        T->Branch("nch",&nch,"nch/I");
        T->Branch("nchCMS",&nchCMS,"nchCMS/I");
        T->Branch("njets",&njets,"njets/I");
        T->Branch("njetsCMS",&njetsCMS,"njetsCMS/I");
        T->Branch("phi1",&phi1,"phi1/D");
        …….
        T->Branch("ptype",ptype,"ptype[nchCMS]/I");
        T->Branch("pjet",pjet,"pjet[nchCMS]/I");
        T->Branch("ppx",ppx,"ppx[nchCMS]/D");
        T->Branch("ppy",ppy,"ppy[nchCMS]/D");
        T->Branch("ppz",ppz,"ppz[nchCMS]/D");
        T->Branch("ppt",ppt,"ppt[nchCMS]/D");
        T->Branch("peta",peta,"peta[nchCMS]/D");
        T->AutoSave("SaveSelf");
        T->SaveThreadBranches(thnumb);
    }

Filling the Tree in thread thnumb:

    if (fillTree && bigjet) {
        TThread::Lock();
        T->SetThreadBranchesFill(thnumb, kAutoSave %( n%1000==0));
        TThread::UnLock();
    }

Here SetThreadBranchesFill quickly copies the branch data to a circular buffer, immediately returns control to the calling thread, and passes the data asynchronously to another thread that fills the TreeCache and does the disk I/O.
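
The circular-buffer hand-off can be sketched in standard C++ as a bounded ring drained by one writer thread. Everything here is an assumption made for illustration: AsyncFiller, Record, runAsyncDemo and the capacity of 64 are hypothetical names and values, and appending to a vector stands in for zipping and disk writes. Unlike the ideal described above, fill() in this sketch blocks briefly when the ring is full.

```cpp
#include <array>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

struct Record { int thnumb; int seq; };  // hypothetical branch data

class AsyncFiller {
public:
    AsyncFiller() : writer_([this] { drain(); }) {}
    ~AsyncFiller() { close(); }

    // Called by worker threads: copy the record into the ring and return.
    void fill(const Record& r) {
        std::unique_lock<std::mutex> lock(m_);
        assert(!closed_);  // fill() must not be called after close()
        notFull_.wait(lock, [this] { return count_ < kCapacity; });
        ring_[(head_ + count_) % kCapacity] = r;
        ++count_;
        notEmpty_.notify_one();
    }

    // Let the writer flush what is left in the ring, then join it.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            if (closed_) return;
            closed_ = true;
        }
        notEmpty_.notify_one();
        writer_.join();
    }

    // Only meaningful after close(): the writer thread has been joined.
    std::size_t written() const { return out_.size(); }

private:
    void drain() {
        for (;;) {
            Record r;
            {
                std::unique_lock<std::mutex> lock(m_);
                notEmpty_.wait(lock, [this] { return count_ > 0 || closed_; });
                if (count_ == 0) return;  // closed and fully drained
                r = ring_[head_];
                head_ = (head_ + 1) % kCapacity;
                --count_;
                notFull_.notify_one();
            }
            out_.push_back(r);  // slow I/O would happen here, outside the lock
        }
    }

    static constexpr std::size_t kCapacity = 64;
    std::mutex m_;
    std::condition_variable notFull_, notEmpty_;
    std::array<Record, kCapacity> ring_{};
    std::size_t head_ = 0, count_ = 0;
    bool closed_ = false;
    std::vector<Record> out_;
    std::thread writer_;  // declared last so the other members exist first
};

// Illustrative driver: several producers, one asynchronous writer.
std::size_t runAsyncDemo(unsigned nthreads, int recordsPerThread) {
    AsyncFiller filler;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        workers.emplace_back([&filler, t, recordsPerThread] {
            for (int i = 0; i < recordsPerThread; ++i)
                filler.fill(Record{static_cast<int>(t), i});
        });
    }
    for (auto& w : workers) w.join();
    filler.close();
    return filler.written();
}
```

The worker threads hold the lock only for the time of one small copy, so the expensive part of the fill no longer serializes them.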
