James Cunha Werner, jamwer@hep.man.ac.uk. Enabling Grid Computer for HEP. Babar Team at University of Manchester. Resources: www.hep.man.ac.uk/u/jamwer


Page 1

James Cunha Werner, jamwer@hep.man.ac.uk

Enabling Grid Computer for HEP

Babar Team at

University of Manchester

Resources: www.hep.man.ac.uk/u/jamwer

Page 2

Human resource strategy

Physicists: Roger, George, John, Jenny, Mark, Marta, Christina, Ming, Nick, Mitch, Andy
• Load: 11 workers
• Goals: HEP, frontiers of Physics, …
• They don't care about computers, the grid, or the popcorn machine: if available, they use them.

Guinea Pig: James
• Goal: integration and support
• Load: 2 workers *

Computeers: Andrews, Alessandra, Mike, Chris, Sabah
• Load: 3 workers
• Goals: New technologies, new technologies, new technologies, …

Total demand: 16 workers' load

* Jobs with 5 events instead of millions.

Page 3

Resources Strategy

Before June:
• PCs: general interactive use; SLAC terminal (Babar Software)
• Production: 40 machines, 80 CPUs
• Know how: Workbook (Physics)

September 2004:
• PCs: general interactive use; SLAC terminal (Babar Software); Babar Software CM2/Monte Carlo
• Production:
  – Test Bed: 10 CPUs LCG2 (Babar Software CM2, Monte Carlo, Grid Application Dev)
  – Production: 70 CPUs LCG2 (only CE/WN, exclusive non-babar use)
• Know how: Workbook (physics); A to Z Babar Computing

Page 4

Grid Test Bed


Page 6

Software: 850 packages.
Tau datasets: range between 60 files of 1 GB and 150 files of 1 GB.
Total: 4,000 GB, ~10,000 files.

Page 7

Analysis Submission to Grid (Prototype)

• Single command: ./easygrid dataset_name
• Performs handler management and submission.
• Software based on a state machine:
  – Verify that skimData is available:
    • If not available, run BbkDatasetTCL to generate skimData. Each file will be a job.
  – Verify whether there are handlers pending:
    • If not, script generation (gera.c) with edg-job-submit and ClassAds, and script execution. Nest for submission policy and optimisation.
    • If yes, verify job status. When all the jobs have ended, recover the results into the user folder.
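The decision logic of the state machine described on this slide can be sketched as follows. This is a minimal illustration in Python, not the easygrid source; the names Action and next_action are hypothetical, and treating only the "Done" status as "ended" is an assumption made for the example.

```python
from enum import Enum, auto


class Action(Enum):
    """Steps the submission tool can take on one invocation (hypothetical names)."""
    GENERATE_SKIMDATA = auto()
    SUBMIT_JOBS = auto()
    WAIT = auto()
    RECOVER_RESULTS = auto()


# Assumption: a job counts as "ended" only when its status is "Done".
FINISHED = {"Done"}


def next_action(skimdata_ready, handles, statuses):
    """Decide the next step from the current state.

    skimdata_ready: True if the skimData files for the dataset already exist.
    handles: list of job handles saved by a previous submission (may be empty).
    statuses: mapping from handle to its current status string.
    """
    if not skimdata_ready:
        return Action.GENERATE_SKIMDATA
    if not handles:
        # No pending handlers: generate the scripts and submit, one job per file.
        return Action.SUBMIT_JOBS
    if all(statuses.get(h) in FINISHED for h in handles):
        return Action.RECOVER_RESULTS
    return Action.WAIT
```

Because every invocation recomputes the step from what is on disk and on the grid, the user can rerun the same single command until the results appear, which matches the sessions shown on the following slides.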

Page 8

Generation and submission

[jamwer@bfb babar]$ ./easygrid SP-1005-Tau11-R14
Invalid configuration filename: /opt/edg/etc/vomses
Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
Enter GRID pass phrase for this identity:
Creating temporary proxy ......................................................... Done
Creating proxy .................................................... Done
Searching pre selected skimdata.
Searching previous handlers.
Handlers not found.
Submiting to GRID .
Wait end of process...

Page 9

Job Status

[jamwer@bfb babar]$ ./easygrid SP-1005-Tau11-R14
Invalid configuration filename: /opt/edg/etc/vomses
Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
Enter GRID pass phrase for this identity:
Creating temporary proxy ............................ Done
Creating proxy ............................... Done
Searching pre selected skimdata.
Searching previous handlers.
Checking if jobs finished.
### Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/foRHhWyeDBnbqA9JkDADLg
Current Status: Scheduled
https://lcgrb01.gridpp.rl.ac.uk:9000/foRHhWyeDBnbqA9JkDADLg still pendent.
### Handle -> https://lxn1188.cern.ch:9000/8DdK3xruxtevNpei3zZbaA
Current Status: Scheduled
https://lxn1188.cern.ch:9000/8DdK3xruxtevNpei3zZbaA still pendent.
4 jobs did not finished ! Try again later.

Page 10

Job Status and recovery

[jamwer@bfb babar]$ ./easygrid SP-1005-Tau11-R14
Invalid configuration filename: /opt/edg/etc/vomses
Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
Enter GRID pass phrase for this identity:
Creating temporary proxy .......................................... Done
Creating proxy ........................................................... Done
Searching pre selected skimdata.
Searching previous handlers.
Checking if jobs finished.
### Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/foRHhWyeDBnbqA9JkDADLg
Current Status: Done
Exit code: 0
### Handle -> https://lxn1188.cern.ch:9000/8DdK3xruxtevNpei3zZbaA
Current Status: Done
Exit code: 0
0 jobs did not finished ! Try again later.
All jobs done. Recovering results in your folder.
Results in the following folders:
/home/jamwer/grid_sub/babar/jamwer_foRHhWyeDBnbqA9JkDADLg
/home/jamwer/grid_sub/babar/jamwer_8DdK3xruxtevNpei3zZbaA
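The status reports printed in these sessions follow a regular pattern ("### Handle -> URL" followed by "Current Status: X"), so they are easy to post-process. The sketch below shows one way to do that in Python; the function names parse_status_report and pending are hypothetical, and the two samples are fragments copied from the sessions on these slides, not new output.

```python
import re

# A handle line followed (after any whitespace or line break) by its status.
PAIR_RE = re.compile(r"### Handle ->\s*(\S+)\s+Current Status:\s*(\w+)")


def parse_status_report(text):
    """Return a dict mapping each job handle URL to its reported status."""
    return dict(PAIR_RE.findall(text))


def pending(statuses):
    """Return the handles whose status is not yet "Done"."""
    return [handle for handle, status in statuses.items() if status != "Done"]


# Fragment of the "Job Status" session (both jobs still scheduled).
SAMPLE_SCHEDULED = """\
### Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/foRHhWyeDBnbqA9JkDADLg
Current Status: Scheduled
https://lcgrb01.gridpp.rl.ac.uk:9000/foRHhWyeDBnbqA9JkDADLg still pendent.
### Handle -> https://lxn1188.cern.ch:9000/8DdK3xruxtevNpei3zZbaA
Current Status: Scheduled
https://lxn1188.cern.ch:9000/8DdK3xruxtevNpei3zZbaA still pendent.
"""

# Fragment of the "Job Status and recovery" session, kept on one line
# to show that the pattern does not depend on line breaks.
SAMPLE_DONE = (
    "### Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/foRHhWyeDBnbqA9JkDADLg "
    "Current Status: Done Exit code: 0 "
    "### Handle -> https://lxn1188.cern.ch:9000/8DdK3xruxtevNpei3zZbaA "
    "Current Status: Done Exit code: 0"
)

if __name__ == "__main__":
    print(pending(parse_status_report(SAMPLE_SCHEDULED)))
    print(pending(parse_status_report(SAMPLE_DONE)))
```

An empty list from pending corresponds to the "All jobs done" branch, at which point the results can be recovered.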

Page 11

Monte Carlo Submission to Grid (Prototype)

• Single command: ./mcgrid JobName num_copies
• Performs handler management and submission.
• Software based on a state machine:
  – Verify whether there are handlers pending:
    • If not, script generation (geramc.c) with edg-job-submit and ClassAds for each copy, and script execution. Nest for submission policy and optimisation.
    • If yes, verify job status. When all the jobs have ended, recover the results into the user folder.
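The "one job description per copy" step can be illustrated with a short sketch. It is written in Python for brevity, whereas the slides say the actual generator is geramc.c; the wrapper script name run_mc.sh, the file naming scheme, and the function names make_jdl and plan_submission are all assumptions for illustration. The attributes used are the standard JDL sandbox and executable fields, and the sketch only builds the command lines without running them.

```python
def make_jdl(job_name, copy_index):
    """Build the text of a job description for one Monte Carlo copy.

    run_mc.sh is a hypothetical wrapper script; the real prototype may
    use different files and arguments.
    """
    tag = f"{job_name}_{copy_index}"
    lines = [
        'Executable = "run_mc.sh";',
        f'Arguments = "{job_name} {copy_index}";',
        f'StdOutput = "{tag}.out";',
        f'StdError = "{tag}.err";',
        'InputSandbox = {"run_mc.sh"};',
        f'OutputSandbox = {{"{tag}.out", "{tag}.err"}};',
    ]
    return "\n".join(lines) + "\n"


def plan_submission(job_name, num_copies):
    """Return (jdl_path, jdl_text, command) for each requested copy.

    Nothing is written or executed here; the caller decides when to
    save the files and run the commands.
    """
    plan = []
    for i in range(1, num_copies + 1):
        jdl_path = f"{job_name}_{i}.jdl"
        command = ["edg-job-submit", jdl_path]
        plan.append((jdl_path, make_jdl(job_name, i), command))
    return plan


if __name__ == "__main__":
    for path, text, command in plan_submission("MCteste", 3):
        print(path, " ".join(command))
```

Keeping generation separate from execution makes it possible to inspect the generated descriptions before anything is sent to the Resource Broker.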

Page 12

MC Submission

[jamwer@bfb mcgrid1]$ ./mcgrid MCteste 3
Invalid configuration filename: /opt/edg/etc/vomses
Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
Enter GRID pass phrase for this identity:
Creating temporary proxy ................................. Done
Creating proxy ....................................................... Done
Searching previous handlers.
Handlers not found.
Submiting to GRID .
Wait end of process...

Page 13

Job Status

[jamwer@bfb mcgrid1]$ ./mcgrid MCteste 3
Invalid configuration filename: /opt/edg/etc/vomses
Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
Enter GRID pass phrase for this identity:
Creating temporary proxy ........................................ Done
Creating proxy ....................................... Done
Searching previous handlers.
Checking if jobs finished.
### Handle -> https://lxn1188.cern.ch:9000/9WzceoIMEQoTK24a-UvOmw
Current Status: Scheduled
https://lxn1188.cern.ch:9000/9WzceoIMEQoTK24a-UvOmw still pendent.
### Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/c4iCB8vioozaGteI9hybIg
Current Status: Ready
https://lcgrb01.gridpp.rl.ac.uk:9000/c4iCB8vioozaGteI9hybIg still pendent.
### Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/L5BD1OE--eckTm5RXkp2nA
Current Status: Ready
https://lcgrb01.gridpp.rl.ac.uk:9000/L5BD1OE--eckTm5RXkp2nA still pendent.
3 jobs did not finished ! Try again later.

Page 14

Job status and recovery

[jamwer@bfb mcgrid1]$ ./mcgrid MCteste 3
Invalid configuration filename: /opt/edg/etc/vomses
Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
Enter GRID pass phrase for this identity:
Creating temporary proxy .................................................. Done
Creating proxy .................................................... Done
Searching previous handlers.
Checking if jobs finished.
### Handle -> https://lxn1188.cern.ch:9000/9WzceoIMEQoTK24a-UvOmw
Current Status: Done
Exit code: 0
### Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/c4iCB8vioozaGteI9hybIg
Current Status: Done
Exit code: 0
0 jobs did not finished ! Try again later.
All jobs done. Recovering results in your folder.
Results in the following folders:
/home/jamwer/grid_sub/mcgrid1/jamwer_9WzceoIMEQoTK24a-UvOmw
/home/jamwer/grid_sub/mcgrid1/jamwer_c4iCB8vioozaGteI9hybIg
/home/jamwer/grid_sub/mcgrid1/jamwer_L5BD1OE--eckTm5RXkp2nA

Page 15

Testing Submission Script

• Load range: worker load x number of files
  – 16 x 60 files = 960 pending jobs
  – 16 x 150 files = 2400 pending jobs
• Test with the submission script:

                  100 Jobs                      1000 Jobs
             Submission  Result recovery   Submission  Result recovery
Done         99          99                255         253
Aborted      1           -                 36          117 **
Scheduled    -           -                 79          -
Fail         -           1 **              630 *       630

* sslv3 alert handshake failure
** "Please wait job enter the "Done" status." This never happens!

The Resource Broker is not reliable or robust. Sometimes it fails 3 days a week, or takes hours to submit/dispatch to a CE (an empty one!).
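The load range quoted on this slide follows from the head count on the human resources slide (11 + 2 + 3 = 16 workers) multiplied by the dataset sizes (60 to 150 files, one job per file). The short Python sketch below reproduces that arithmetic and the fraction of jobs reported as Done at submission; reading the four numeric columns as "submission" and "result recovery" for the 100-job and 1000-job tests is an interpretation of the slide, and the function names are hypothetical.

```python
# Worker load from the human resources slide:
# physicists + integration/support + computing staff.
WORKER_LOAD = 11 + 2 + 3


def pending_jobs(worker_load, files_per_dataset):
    """Jobs queued if every worker submits one dataset, one job per file."""
    return worker_load * files_per_dataset


def done_fraction(done, total):
    """Fraction of submitted jobs that reached the Done status."""
    return done / total


if __name__ == "__main__":
    print(pending_jobs(WORKER_LOAD, 60))    # smallest Tau dataset
    print(pending_jobs(WORKER_LOAD, 150))   # largest Tau dataset
    print(done_fraction(99, 100))           # 100-job test, at submission
    print(done_fraction(255, 1000))         # 1000-job test, at submission
```

The expected load (960 to 2400 pending jobs) sits at or above the 1000-job test, which is the regime where only about a quarter of the jobs completed.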

Page 16

Pending Infrastructure => Course of action

• Babar software know-how is not available at Manchester => Web page & network skills.
• Quality assurance => We are OK! from benchmark (E x P)
• A real application to perform the complete cycle, acquire know-how, and serve as grid proof-of-concept is missing => Partnership with physicists.
• CERN does NOT recognise the Babar community => Let's reduce their priority!
• RB at Manchester => 60MB binaries and policies freedom.
• SE/RC at Manchester => freedom in policies and job submission.
• Mass storage (10TB) for Babar purposes => CAP!
• UI in the AFS => wide access to Manchester farms.
• Apprenticeship at RAL and later at SLAC – production and experiment => improve where others fail.
• Configuration for optimal job performance/submission at Tier 2 (1 CE x 50 WN? Performance of dCache with Babar Software? Why 10TB if Liverpool bought 80TB? Electricity bill?) => analyse procedures to improve QoS and achieve a better site configuration.
• Update (software and data) and operational policies => operational standards to achieve high QoS.

Page 17

Aimed Hardware Architecture

(Redundant RB with alternate access)

Page 18

Aimed Software Architecture

Page 19

Production Job Submission Package

• Operational policies/integration with RB (application level).
• Recovery of aborted status.
• Resources optimisation.
• Integration with RC (application level) for replicas policies development.
• Interactive data visualisation (Useful?)
• Integration with GridSite (data visualisation, analysis, performance monitor, and submission)
• Professional version.

Page 20

Summary

Integrate LCG2 and job submission with Babar/CM2 at the University of Manchester for Tau physics modelling, analysis and MC generation.

We aim to be soon…
• The largest site in the UK.
• A leader in grid computing and HEP.

Page 21

Conclusion

Babar CM2 is running at Manchester!
LCG2 Grid is running with a real-world experiment!
The Babar submission prototype to the Grid is running!
LCG is not LHC software only! It is Babar's.
We are doing today what will take you years to achieve. Let's work together!