Grid in action: from EasyGrid to LCG testbed and gridification techniques. James Cunha Werner University of Manchester Christmas Meeting - 2005

Page 1: Grid in action: from EasyGrid to LCG testbed and gridification techniques

Grid in action: from EasyGrid to LCG testbed and gridification techniques.

James Cunha Werner
University of Manchester

Christmas Meeting - 2005

Page 2: Grid in action: from EasyGrid to LCG testbed and gridification techniques

Going to grid

Conventional way:
• Usual code (your cuts)
• Run BetaMiniApp on several data files, one after the other.
• When all data is done, you have results!

Grid way:
• Same usual code (your cuts)
• Run several copies of BetaMiniApp, each running independently on one data file.
• At the end, join all results!

EasyGrid does it for you!
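The contrast between the two ways can be sketched in a few lines of Python. This is only an illustration, not EasyGrid's code: run_analysis is a hypothetical stand-in for one BetaMiniApp run, the file names are invented, and local threads play the role of grid jobs.

```python
from concurrent.futures import ThreadPoolExecutor


def run_analysis(data_file):
    """Hypothetical stand-in for one BetaMiniApp run over one data file."""
    events = range(100)  # fake events; a real job would read data_file
    # Fake "cut": keep events that satisfy an arbitrary condition.
    selected = sum(1 for e in events if (e + len(data_file)) % 3 == 0)
    return {"file": data_file, "selected": selected}


def conventional_way(data_files):
    """Process the files one after the other."""
    return [run_analysis(f) for f in data_files]


def grid_way(data_files, copies=4):
    """Run several independent copies at once, one data file per copy."""
    with ThreadPoolExecutor(max_workers=copies) as pool:
        # map() returns the results in the same order as data_files.
        return list(pool.map(run_analysis, data_files))


def join_results(results):
    """Join the per-file results at the end."""
    return sum(r["selected"] for r in results)


files = ["run1.root", "run2.root", "run10.root"]  # invented names
print(join_results(conventional_way(files)), join_results(grid_way(files)))
```

Because each copy depends only on its own data file, both ways give the same joined result; only the elapsed time differs.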

Page 3: Grid in action: from EasyGrid to LCG testbed and gridification techniques

General overview

• Gridification algorithms for generic software
• EasyGrid for datasets
• EasyTau for selected events
• Grid testbed
• Users’ software

Page 4: Grid in action: from EasyGrid to LCG testbed and gridification techniques

EasyGrid: an overview

• Prototype for future development.
  RPA = guarantee of useful software

• Provide all support for job submission system:
– Recovers results in users’ directory
– Generates reports for further analysis (aborts and abends) in one history file.

• It is a Framework users can adapt to their own needs and applications.

• Fully operational and integrated with LCG.

Page 5: Grid in action: from EasyGrid to LCG testbed and gridification techniques

./easygrid dataset_name

Page 6: Grid in action: from EasyGrid to LCG testbed and gridification techniques

Christmas 2004: My goals were…

• Develop a fail-proof submission system.

• Write web pages covering all elementary tasks in HEP/Babar, to help students and newcomers.

• Understand q-qbar interaction through Pi0.

What I have achieved in 2005…

Page 7: Grid in action: from EasyGrid to LCG testbed and gridification techniques

Achievements with EasyGrid

• User-friendly framework, flexible and reliable. It provides users with results, or with the information necessary for further analysis.

• Tutorial web pages for PhD students and new researchers. http://www.hep.man.ac.uk/u/jamwer

• Pi0 Project: analysis of 500 million events and generation of 5 million Monte Carlo events in 5 weeks. http://www.hep.man.ac.uk/u/jamwer/pi0alg5.html

• Anti-deuteron project: 1,500 million events in 1 week, running at several sites in the UK. More than 200 jobs in parallel. http://www.hep.man.ac.uk/u/jamwer/deutdesc.html

Page 8: Grid in action: from EasyGrid to LCG testbed and gridification techniques

LCG Installation and debug

• There are several problems in the LCG grid:
– a high number of jobs fail when running more than 200 jobs.
– installation issues.
– performance issues.

• Installation of a complete testbed from scratch using 10 obsolete computers: http://www.hep.man.ac.uk/u/jamwer/#sec0

Page 9: Grid in action: from EasyGrid to LCG testbed and gridification techniques

Testbed stress test

Processing time is zero: BetaMiniApp is replaced by a program that prints the dataset name and waits for some time (e.g. 300 s).

1,000 jobs were submitted in each test to a testbed with 6 WNs.
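A placeholder job of the kind described above could look like the following minimal sketch. The function name and the injectable sleep parameter are assumptions made for illustration; the program actually used in the test is not shown in the slides.

```python
import time


def dummy_job(dataset_name, wait_seconds=300.0, sleep=time.sleep):
    """Print the dataset name, wait, and do no real processing.

    `sleep` is injectable only so that the wait can be skipped when
    trying the function out (an assumption of this sketch).
    """
    print(dataset_name, flush=True)
    sleep(wait_seconds)
    return dataset_name
```

Since the job does no computation, any failures or delays seen in the test come from the submission chain and the testbed rather than from the analysis code.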

Page 10: Grid in action: from EasyGrid to LCG testbed and gridification techniques

             T0     T1       T2
Sub Fail      0      0        0
Aborts (1)   84    122        0
bf33        296    144        6
bf34        306    148      161
bf35        314    156      195
bf36          0    165      211
bf37          0    172      213
bf38          0     91 (2)  214

• T0 and T1: time between submissions is zero (continuous flow).
• T0: WNs bf36, bf37 and bf38 did not have pbs_mom started.
• T1: one WN crashed during the test (2).
• T2: time between submissions is 30 s.

CE (bf32) CPU use was >90%.

(1) Cannot plan: BrokerHelper: no compatible resources

Table values: number of jobs per WN
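The totals behind the table can be checked with a short script. The numbers are copied from the table; the dictionary layout and the summarise helper are assumptions of this sketch, as is reading "aborts / (aborts + jobs on WNs)" as the abort fraction.

```python
# Jobs per worker node, copied from the table (bf38 crashed during T1).
jobs_per_wn = {
    "T0": {"bf33": 296, "bf34": 306, "bf35": 314, "bf36": 0, "bf37": 0, "bf38": 0},
    "T1": {"bf33": 144, "bf34": 148, "bf35": 156, "bf36": 165, "bf37": 172, "bf38": 91},
    "T2": {"bf33": 6, "bf34": 161, "bf35": 195, "bf36": 211, "bf37": 213, "bf38": 214},
}
aborts = {"T0": 84, "T1": 122, "T2": 0}


def summarise(test):
    """Return (jobs on WNs, jobs accounted for, abort fraction) for one test."""
    on_wns = sum(jobs_per_wn[test].values())
    accounted = on_wns + aborts[test]
    return on_wns, accounted, aborts[test] / accounted


for test in ("T0", "T1", "T2"):
    on_wns, accounted, abort_fraction = summarise(test)
    print(f"{test}: {on_wns} on WNs, {accounted} accounted, {abort_fraction:.1%} aborted")
```

With the figures as transcribed, T0 and T2 each account for all 1,000 submitted jobs, while T1 accounts for 998. Aborts fall from 84 (T0) and 122 (T1) with continuous submission to 0 with a 30 s delay between submissions (T2).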

Page 11: Grid in action: from EasyGrid to LCG testbed and gridification techniques
Page 12: Grid in action: from EasyGrid to LCG testbed and gridification techniques

Recommendations

CEs are under heavy demand in the Grid (>90% CPU load!) and this affects grid performance:

• The number of WNs for each CE can be defined by the minimum value of submission delay and minimum queue time.

• Running a single CE for a large farm is a limiting factor. More matched CEs per RB would reduce failures and increase performance.

• File system study will provide more information soon.

Page 13: Grid in action: from EasyGrid to LCG testbed and gridification techniques

Research in Gridification technologies for conventional software

• Users spend years developing their source code, and they will not throw it away just to use web services.

• I developed an algorithm that will allow users to use their own software on top of a web service layer with LCG middleware.

• Preliminary tests using “fake” web services (simulated with PVM) show it is a viable and flexible approach.

Page 14: Grid in action: from EasyGrid to LCG testbed and gridification techniques

Gridification algorithm

• Creates parallel processes using PVM with ssh as the remote shell.

• There is a central job, which distributes tasks over the parallel processes as slave processes return results. No need for load balancing!

• Handles slave failures and resubmits the tasks to available slaves. There is no checkpoint system (not worth it).

• Transfer time can be a bottleneck. Task streams are implemented. Results with 300 empty processes on one laptop show a transfer time of 185 ms/process.
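The scheduling idea in the bullets above can be sketched as follows. This is not the author's implementation: a local Python thread pool replaces PVM and ssh, and run_task_farm, slave_fn, n_slaves and max_attempts are names invented for the sketch. It shows a central job that hands out a new task whenever a slave returns, and resubmits a task whose slave failed, with no checkpointing.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_task_farm(tasks, slave_fn, n_slaves=4, max_attempts=3):
    """Central job: keep every slave busy and resubmit failed tasks."""
    todo = [(task, 1) for task in tasks]  # (task, attempt number)
    todo.reverse()                        # pop() then follows input order
    results, failed = {}, {}
    with ThreadPoolExecutor(max_workers=n_slaves) as pool:
        running = {}                      # future -> (task, attempt)
        while todo or running:
            # Give a task to each idle slave; faster slaves simply come
            # back for more, so no explicit load balancing is needed.
            while todo and len(running) < n_slaves:
                task, attempt = todo.pop()
                running[pool.submit(slave_fn, task)] = (task, attempt)
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                task, attempt = running.pop(future)
                try:
                    results[task] = future.result()
                except Exception as exc:
                    if attempt < max_attempts:
                        # No checkpoint: the task restarts from scratch.
                        todo.append((task, attempt + 1))
                    else:
                        failed[task] = repr(exc)
    return results, failed
```

A call such as run_task_farm(range(100), my_function) returns two dictionaries keyed by task: the results, and the tasks that still failed after max_attempts tries. Tasks must be hashable in this sketch.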

Page 15: Grid in action: from EasyGrid to LCG testbed and gridification techniques

Conclusion

• EasyGrid is operational. Benchmarks were a proof-of-concept under real conditions.

• LCG testbed is operational, providing results, and supporting performance analysis and tuning.

• Gridification algorithm is running on one laptop with Genetic Programming/AI.

Page 16: Grid in action: from EasyGrid to LCG testbed and gridification techniques

New year resolution

• Analysis of Linux kernel related file server issues.

• LCG performance study and Linux kernel tuning.

• Implementation of EasyTau: a submission module for TauUser package using EasyGrid (running on ntuples).

• Gridification algorithm running with LCG and commercial applications (WebSphere, Tivoli, Symphony, etc.)

• EasyGrid Product development and startup.

• Run pi0 project again with EasyGrid Product and maybe … publish a paper about gridification!

Page 17: Grid in action: from EasyGrid to LCG testbed and gridification techniques

Happy new year!