Grid in action: from EasyGrid to LCG testbed and gridification techniques. James Cunha Werner, University of Manchester. Christmas Meeting - 2005. Conventional way: usual code (your cuts); run BetaMiniApp on several data files, one after the other. When all data is done, you have results!
Grid in action: from EasyGrid to LCG testbed and gridification techniques.
James Cunha Werner
University of Manchester
Christmas Meeting - 2005
Going to grid
Conventional way:
• Usual code (your cuts)
• Run BetaMiniApp on several data files, one after the other.
• When all data is done, you have results!
Grid way:
• Same usual code (your cuts)
• Run several copies of BetaMiniApp, each running independently on one data file.
• At the end, join all results!
EasyGrid does it for you!
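The difference between the two ways can be sketched in a few lines of Python. This is only an illustration, not EasyGrid itself: `analyse_file` is a hypothetical stand-in for one BetaMiniApp run, the file names are invented, and local threads stand in for grid jobs.

```python
from concurrent.futures import ThreadPoolExecutor


def analyse_file(path):
    """Hypothetical stand-in for one BetaMiniApp run over a single data file."""
    # Fake "result": the length of the file name plays the role of an event count.
    return {"events": len(path)}


def join_results(partials):
    """Join partial results into one total."""
    total = {"events": 0}
    for partial in partials:
        total["events"] += partial["events"]
    return total


def run_conventional(files):
    """Conventional way: process the files one after the other."""
    return join_results(analyse_file(path) for path in files)


def run_grid_style(files, workers=4):
    """Grid way: independent copies, one per file, joined at the end."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(analyse_file, files))
    return join_results(partials)


files = ["run1.root", "run22.root", "run333.root"]  # invented names
```

Both functions return the same total because each file is analysed independently; only the scheduling differs.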
General overview
• Gridification algorithms for generic software
• EasyGrid for datasets
• EasyTau for selected events
• Grid testbed
• Users’ software
EasyGrid: an overview
• Prototype for future development. RPA = guarantee of useful software
• Provides all support for the job submission system:
– Recovers results in users’ directory
– Generates reports for further analysis (aborts and abends) in one history file.
• It is a framework users can adapt to their own needs and applications.
• Fully operational and integrated with LCG.
./easygrid dataset_name
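The idea of keeping one history file for aborts and abends can be illustrated with a minimal Python sketch. The file layout (one tab-separated line per job), the status names, and both function names are assumptions made for this example; they are not taken from EasyGrid.

```python
import os
import tempfile


def record_outcome(history_path, job_id, status):
    """Append one line per finished job to a single history file (assumed format)."""
    with open(history_path, "a", encoding="utf-8") as history:
        history.write(f"{job_id}\t{status}\n")


def jobs_needing_attention(history_path):
    """Return the ids of jobs whose recorded status is not 'done'."""
    flagged = []
    with open(history_path, encoding="utf-8") as history:
        for line in history:
            job_id, status = line.rstrip("\n").split("\t")
            if status != "done":
                flagged.append(job_id)
    return flagged


def demo():
    """Record three invented job outcomes and list the ones to analyse further."""
    with tempfile.TemporaryDirectory() as workdir:
        history_path = os.path.join(workdir, "history.txt")
        outcomes = [("job-1", "done"), ("job-2", "abort"), ("job-3", "abend")]
        for job_id, status in outcomes:
            record_outcome(history_path, job_id, status)
        return jobs_needing_attention(history_path)
```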
Christmas 2004: My goals were…
• Develop a fail-proof submission system.
• Write web pages covering all elementary tasks in HEP/Babar, to help students and newbies.
• Understand q-qbar interaction through Pi0.
What I have achieved in 2005…
Achievements with EasyGrid
• User-friendly framework, flexible and reliable. It provides users with results, or with the information needed for further analysis.
• Tutorial web pages for PhD students and new researchers.
http://www.hep.man.ac.uk/u/jamwer
• Pi0 Project: analysis of 500 million events and generation of 5 million Monte Carlo events in 5 weeks.
http://www.hep.man.ac.uk/u/jamwer/pi0alg5.html
• Anti-deuteron project: 1,500 million events in 1 week, running at several sites in the UK. More than 200 jobs in parallel.
http://www.hep.man.ac.uk/u/jamwer/deutdesc.html
LCG Installation and debug
• There are several problems in the LCG grid:
– a high number of jobs fail when running more than 200 jobs.
– installation issues.
– performance issues.
• Installation of a complete testbed from scratch using 10 obsolete computers:
http://www.hep.man.ac.uk/u/jamwer/#sec0
Testbed stress test
Processing time is zero: BetaMiniApp is replaced by a program that prints the dataset name and waits some time (e.g. 300 s).
1,000 jobs were submitted in each test to the 6-WN testbed.
Number of jobs/WN:
              T0     T1       T2
Sub Fail       0      0        0
Aborts (1)    84    122        0
bf33         296    144        6
bf34         306    148      161
bf35         314    156      195
bf36           0    165      211
bf37           0    172      213
bf38           0     91 (2)  214
• T0 and T1: time between submissions is zero (continuous flow).
• T0: WNs bf36, bf37 and bf38 did not have pbs_mom started.
• T1: one WN crashed during the test (2).
• T2: time between submissions is 30 s.
CE (bf32) CPU use was >90%.
(1) Cannot plan: BrokerHelper: no compatible resources
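As a cross-check of the stress-test numbers above, the short Python sketch below sums each test column and expresses the aborts as a percentage of the 1,000 submitted jobs. The figures are copied from the table; the variable and function names are chosen here for illustration.

```python
# Columns are the tests T0, T1, T2; values are copied from the table.
aborts = [84, 122, 0]
jobs_per_wn = {
    "bf33": [296, 144, 6],
    "bf34": [306, 148, 161],
    "bf35": [314, 156, 195],
    "bf36": [0, 165, 211],
    "bf37": [0, 172, 213],
    "bf38": [0, 91, 214],
}
SUBMITTED = 1000  # jobs submitted in each test


def jobs_on_wns():
    """Total number of jobs that reached a WN, per test."""
    return [sum(row[i] for row in jobs_per_wn.values()) for i in range(3)]


def jobs_accounted_for():
    """Aborts plus jobs on WNs, per test."""
    return [a + n for a, n in zip(aborts, jobs_on_wns())]


def abort_percent():
    """Aborts as a percentage of the submitted jobs, per test."""
    return [100 * a / SUBMITTED for a in aborts]
```

With these figures, T0 and T2 account for all 1,000 jobs, T1 sums to 998, and the abort rate falls from 8.4% (T0) and 12.2% (T1) to 0% once submissions are spaced 30 s apart (T2).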
Recommendations
CEs are heavily loaded in the grid (>90% CPU load!), and this affects grid performance:
• The number of WNs for each CE can be defined by the minimum value of submission delay and the minimum queue time.
• Running a single CE for a large farm is a limiting factor. More matched CEs per RB would reduce failures and increase performance.
• File system study will provide more information soon.
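Spacing submissions, as in test T2, can be sketched as a small pacing loop in Python. This is an assumption-laden illustration: `submit` is a hypothetical callable standing in for whatever actually sends a job to the grid, and the `sleep` parameter exists only so the loop can be exercised without waiting.

```python
import time


def submit_paced(jobs, submit, delay_s=30.0, sleep=time.sleep):
    """Submit jobs one by one, pausing delay_s seconds between submissions.

    `submit` is a hypothetical callable; it is not an LCG or EasyGrid API.
    """
    submitted = []
    for index, job in enumerate(jobs):
        if index > 0 and delay_s > 0:
            sleep(delay_s)  # no pause before the first job
        submitted.append(submit(job))
    return submitted
```

The trade-off is submission time: at 30 s spacing, 1,000 jobs need 999 pauses, roughly 8.3 hours, before the last job is even submitted.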
Research in Gridification technologies for conventional software
• Users spend years developing their source code, and they will not throw it away just to use web services.
• I developed an algorithm that will allow users to use their own software on top of a web service layer with LCG middleware.
• Preliminary tests using “fake” web services (simulated with PVM) show it is a viable and flexible approach.
Gridification algorithm
• Creates parallel processes using PVM with ssh as the remote shell.
• A central job distributes tasks over the parallel processes as slave processes return results. No need for load balancing!
• Handles slave failures and resubmits their tasks to available slaves. There is no checkpoint system (not worth it).
• Transfer time can be a bottleneck. Task streams are implemented. Results with 300 empty processes on one laptop show a transfer time of 185 ms/process.
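The distribution scheme described above (a central job that hands out a new task whenever a slave returns a result, and resubmits the task of a failed slave) can be sketched in Python. This is not the author's PVM implementation: threads and queues stand in for PVM processes and messages, and all names are hypothetical.

```python
import queue
import threading
from collections import deque


def slave(name, work, inbox, outbox):
    """Run tasks from inbox until a None arrives or the work function fails."""
    while True:
        task = inbox.get()
        if task is None:
            return
        try:
            outbox.put((name, task, True, work(task)))
        except Exception as error:
            outbox.put((name, task, False, error))
            return  # a failed slave takes no further tasks


def run_master(tasks, slaves):
    """Distribute tasks on demand; `slaves` maps a name to a work callable."""
    todo = deque(tasks)
    outbox = queue.Queue()
    inboxes = {name: queue.Queue() for name in slaves}
    threads = [
        threading.Thread(target=slave, args=(name, work, inboxes[name], outbox))
        for name, work in slaves.items()
    ]
    for thread in threads:
        thread.start()

    alive, busy, results = set(slaves), set(), {}

    def feed_idle_slaves():
        # Give one queued task to each live slave that is not working.
        for name in sorted(alive - busy):
            if not todo:
                break
            inboxes[name].put(todo.popleft())
            busy.add(name)

    feed_idle_slaves()
    while busy:
        name, task, ok, value = outbox.get()
        busy.discard(name)
        if ok:
            results[task] = value
        else:
            alive.discard(name)  # retire the failed slave
            todo.append(task)    # resubmit its task, no checkpointing
        feed_idle_slaves()

    for name in alive:
        inboxes[name].put(None)  # tell surviving slaves to stop
    for thread in threads:
        thread.join()
    return results


def square(task):
    return task * task


def crash(task):
    raise RuntimeError("simulated slave crash")
```

Because a slave only receives a new task after returning the previous one, faster slaves naturally take more tasks, which is why no separate load balancing is needed. In this sketch, if every slave fails, the loop ends with some tasks unprocessed and the returned dictionary is incomplete.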
Conclusion
• EasyGrid is operational. Benchmarks were a proof-of-concept under real conditions.
• LCG testbed is operational, providing results, and supporting performance analysis and tuning.
• Gridification algorithm is running on one laptop with Genetic Programming/AI.
New year resolution
• Analysis of Linux kernel related file server issues.
• LCG performance study and Linux kernel tuning.
• Implementation of EasyTau: a submission module for the TauUser package using EasyGrid (running on ntuples).
• Gridification algorithm running with LCG and commercial applications (WebSphere, Tivoli, Symphony, etc.).
• EasyGrid Product development and startup.
• Run the pi0 project again with EasyGrid Product and maybe … publish a paper about gridification!
Happy new year!