
Page 1

Distributed computing at the Facility level: applications and attitudes

Tom Griffin, STFC ISIS Facility

tom.griffin@stfc.ac.uk

NOBUGS 2008, Sydney

Page 2

Spare cycles

• Typical PC CPU usage is about 10%
• Usage minimal 5pm – 8am
• Most desktop PCs are really fast
• Waste of energy
• How can we use (“steal?”) unused CPU cycles to solve computational problems?

Page 3

Types of Application

• CPU intensive
• Low to moderate memory use
• Not too much file output
• Coarse grained
• Command line / batch driven
• Licensing issues?

Page 4

Distributed computing solutions

Lots of choice: Condor, Grid Engine, Grid MP…

• Grid MP server hardware
  • Two dual-Xeon 2.8 GHz servers, RAID 10

• Software
  • Servers run Red Hat Enterprise Linux / DB2
  • Unlimited Windows (and other) clients

• Programming
  • Web Services interface – XML, SOAP
  • Accessed with C++, Java, C#

• Management console
  • Web browser based
  • Can manage services, jobs, devices etc.

• Large industrial user base
  • GSK, J&J, Novartis etc.

Page 5

Installing and Running Grid MP

Server installation
2 hours

Client installation
Create MSI and RPM using ‘setmsiprop’
30 seconds

Manual install
Better security on Linux and Macs

Page 6

Adapting a program for Grid MP

1) Think about how to split your data (see the sketch after this list)

2) Wrap your executable

3) Write the application service
  • Pre- and post-processing
  • Fairly easy to write
  • Interface to grid via Web Services
  • C++, Java, C#
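As an illustration of step 1 (splitting the data), a minimal sketch in C#. This is not Grid MP code; the input file name and chunk size are hypothetical examples.

// Illustrative only: split a large input file into one chunk per work unit.
// "input.dat" and the chunk size of 1000 records are hypothetical.
using System;
using System.IO;
using System.Linq;

class SplitData
{
    static void Main()
    {
        string[] records = File.ReadAllLines("input.dat");
        int chunkSize = 1000;   // records per work unit
        int nChunks = (records.Length + chunkSize - 1) / chunkSize;

        for (int i = 0; i < nChunks; i++)
        {
            var chunk = records.Skip(i * chunkSize).Take(chunkSize);
            File.WriteAllLines(string.Format("workunit_{0:D4}.dat", i), chunk);
        }
    }
}

Each resulting file then becomes the input for one work unit.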

Page 7

Package your executable

A program module (the packaged executable) is uploaded to, and resident on, the server. It contains:
• Executable
• DLLs
• Standard data files
• Environment variables

Compress? Encrypt?
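A hedged sketch of the “compress?” idea: bundling the executable, DLLs and standard data files into a single archive before upload. This is generic C#, not the Grid MP packaging tool, and the directory and archive names are hypothetical.

// Illustrative only: zip an application directory (exe, DLLs, data files)
// into one package. Paths are hypothetical.
using System.IO.Compression;

class PackageExe
{
    static void Main()
    {
        ZipFile.CreateFromDirectory(@"C:\apps\myprogram", "program_package.zip");
    }
}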

Page 8

Create / run a job

[Diagram: client side and server side. Packages Pkg1–Pkg4 and datasets (e.g. Molecules, Proteins) are uploaded over https; creating the job generates the cross product of the datasets into workunits; then start the job.]
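One plausible reading of the cross-product step, as a generic C# sketch (not the Grid MP API; the data item names are hypothetical): each item of one dataset is paired with each item of the other, and every pair becomes a work unit.

// Illustrative only: cross product of two datasets -> one work unit per pair.
using System;

class CrossProduct
{
    static void Main()
    {
        string[] molecules = { "mol_01", "mol_02", "mol_03" };   // hypothetical
        string[] proteins  = { "prot_A", "prot_B" };             // hypothetical

        foreach (string mol in molecules)
            foreach (string prot in proteins)
                Console.WriteLine("workunit: " + mol + " + " + prot);
    }
}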

Page 9

Code examples

// Create the job on the server
Mgsi.Job job = new Mgsi.Job();
job.application_gid = app.application_gid;
job.description = txtJobName.Text.Trim();
job.state_id = 1;
job.job_gid = ud.createJob(auth, job);

// Create a job step within the job
Mgsi.JobStep js = new Mgsi.JobStep();
js.job_gid = job.job_gid;
js.state_id = 1;
js.max_concurrent = 1;
js.max_errors = 20;
js.num_results = 1;
js.program_gid = prog.program_gid;

Page 10

Code examples

// Create a dataset for the job
Mgsi.DataSet ds = new Mgsi.DataSet();
ds.job_gid = job.job_gid;
ds.data_set_name = job.description + "_ds_" + DateTime.Now.Ticks;
ds.data_set_gid = ud.createDataSet(auth, ds);

// Upload one data file per workunit and register it in the dataset
for (int i = 1; i <= numWorkunits.Value; i++)
{
    FileTransfer.UploadData uploadD = ft.uploadFile(auth, Application.StartupPath + "\\testdata.tar");
    Mgsi.Data data = new Mgsi.Data();
    data.data_set_gid = ds.data_set_gid;
    data.index = i;
    data.file_hash = uploadD.hash;
    data.file_size = long.Parse(uploadD.size);
    datas[i - 1] = data;
}

ud.createDatas(auth, datas);

// Generate workunits from the dataset(s)
ud.createWorkunitsFromDataSetsAsync(auth, js.job_step_gid, new string[] { ds.data_set_gid }, options);

Page 11

Performance
Famotidine form B, 13 degrees of freedom
P21/c, V = 1421 Å³
Synchrotron data to 1.64 Å
1 × 10^7 moves per run, 64 runs

Standard DASH
2.4 GHz Core 2 Quad, using a single core
Job complete = 9 hours

GDASH submit to a test grid of 5 in-use PCs
(4 × 2.4 GHz Core 2 Quad, 1 × 2.8 GHz Core 2 Quad)
Job complete = 24 minutes

Speedup = 540 min / 24 min = 22.5×

Page 12

Performance – 999 SA runs, full grid

[Chart: workunits completed vs. time]

317 cores from 163 devices:
• 42 Athlons: 1.6–2.2 GHz
• 168 Core 2 Duos: 1.8–3 GHz
• 36 Core 2 Quads: 2.4–2.8 GHz
• 1 Duron @ 1.2 GHz
• 42 P4s: 2.4–3.6 GHz
• 27 Xeons: 2.5–3.6 GHz

4 days 18 hours of CPU time in ~40 minutes elapsed time

Page 13

A Particular Success - McStas

HRPD supermirror guide design

Complex design
Meaningful simulations take a long time

Want to try lots of ideas

Many runs of >200 CPU days

Simpler model was best value

Massive improvement in flux

Significant cost savings

Page 14

Problems

McStas: interactions in the wild with Symantec Anti-Virus

Did not show up in testing

McStas restricted to night running only

Page 15

User Attitudes

A range of reactions:

Theft

“I’m not having that on my machine”

First thing to get blamed

Gaining more trust

Evangelism by users

Page 16

Flexibility with virtualisation

Request to run ‘GARefl’ code

ISIS is Windows based

Few Linux PCs

VMware Server is freeware

8 Hosts gave 26 cores

More cores = more demand

56 real cores recruited from servers, 64-core Beowulf

10 Mac cores

Run Linux as a job

Page 17

Flexibility with virtualisation

Page 18

The Future

Grid growing in power every day
New machines added, old ones still left on

Electricity
Energy-saving drive at STFC – switch machines off

Wake-on-LAN ‘magic packets’ + remote hibernate
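A minimal sketch of sending a Wake-on-LAN ‘magic packet’ in C# (the MAC address is a placeholder): 6 bytes of 0xFF followed by the target MAC repeated 16 times, broadcast over UDP.

// Illustrative sketch: send a Wake-on-LAN magic packet.
// The MAC address below is a placeholder.
using System;
using System.Net;
using System.Net.Sockets;

class WakeOnLan
{
    static void Main()
    {
        byte[] mac = { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55 };

        // Magic packet = 6 x 0xFF, then the MAC repeated 16 times.
        byte[] packet = new byte[6 + 16 * 6];
        for (int i = 0; i < 6; i++) packet[i] = 0xFF;
        for (int rep = 0; rep < 16; rep++)
            Buffer.BlockCopy(mac, 0, packet, 6 + rep * 6, 6);

        using (UdpClient client = new UdpClient())
        {
            client.EnableBroadcast = true;
            client.Send(packet, packet.Length, new IPEndPoint(IPAddress.Broadcast, 9));
        }
    }
}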

Laptops
Good or bad?

Page 19

Summary

Distributed computing: perfect for coarse-grained, CPU-intensive, ‘disk-lite’ applications.

Resources: use existing resources. Power increases with time, no need to write off assets. Scalable.

Not just faster: allows one to try different scenarios.

Virtualisation: Linux under Windows, Windows under Linux.

Green credentials: PCs are running anyway, better to utilise them. Can be powered down and up.

Page 20

Acknowledgements

ISIS Data Analysis Group
Kenneth Shankland
Damian Flannery

STFC FBU IT Service Desk and ISIS Computing Group

Key users
Richard Ibberson (HRPD)
Stephen Holt (GARefl)

Questions?