UKQCD, 9th GridPP Collaboration Meeting (GridPP 9 @ NeSC), 4/2/04

QCDgrid: Status and Future

Alan Irving, University of Liverpool

UKQCD and the Grid: QCDgrid architecture

PPARC support

• GridPP1: Phase 1: data grid

• GridPP1: Phase 2: pilot scheme for distributed processing

• GridPP2: Full distributed processing

• GridPP2: International Lattice Data Grid activities (ILDG)

QCDOC: Columbia + IBM + UKQCD + BNL

10,000+ processors

10 Tflops, £6.6M, July 2004

128 procs, Nov 03

Stop press ...

Following exhaustive tests of the ASIC, orders have now been placed for some 14,720 ASICs for:

• 2048-node development machine (> 1 Tflop sustained) for assembly in March

• 12,000+ node main machine, for assembly in May

UKQCD computing strategy with QCDOC

• Distributed computing Grid

• International standards (ILDG)

• SciDAC: US strategy

• Local resources compute/data

• Tier 1: Edinburgh

• Tier 2: Edinburgh, Liverpool, Swansea, Southampton (+ RAL)

[Diagram labels: Node, QCDOC, FE, Grid]

• UKQCD approved simulations

• International cooperation with: MILC, Columbia, ...

• Data grid for configuration acquisition and storage

• International nodes available

• Job submission software (JSS) for homogeneous physics analysis within UKQCD

• Need for significant clusters at computational nodes (Liverpool, RAL, ...)

Basics of the QCDgrid datagrid

• Currently has 4 sites with 7 RAID disk nodes

• Main design and implementation by EPCC (James Perry)

• Admin by C Maynard (Physics/Edinburgh) + local sys admins

• User requirement/testing driven by Liverpool (C McNeile)

• File replication managed by custom-written software built on Globus 2

• Central control thread ensures at least 2 copies of each file at different sites

• Replica catalogue maps logical names to physical locations

• Metadata catalogue associates physical parameters with files

• XML document for each data file

• XML document storage in eXist XML database, queried by XPath
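As a rough illustration of the replica catalogue and the "at least 2 copies at different sites" rule, here is a minimal Python sketch. It is not the QCDgrid implementation (which is built on Globus 2). The class `ReplicaCatalogue`, the function `plan_replication`, the file names, and the paths are all hypothetical. The site names are borrowed from the Tier 2 list only for illustration.

```python
from collections import defaultdict

MIN_COPIES = 2  # the slide states "at least 2 copies of each file at different sites"


class ReplicaCatalogue:
    """Hypothetical catalogue: logical file name -> set of (site, physical path)."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, logical_name, site, physical_path):
        self._replicas[logical_name].add((site, physical_path))

    def sites(self, logical_name):
        # Distinct sites holding a copy; two copies at one site count once.
        return {site for site, _ in self._replicas[logical_name]}

    def logical_names(self):
        return list(self._replicas)


def plan_replication(catalogue, all_sites, min_copies=MIN_COPIES):
    """Return {logical_name: [sites to copy to]} for under-replicated files."""
    plan = {}
    for name in catalogue.logical_names():
        holding = catalogue.sites(name)
        missing = min_copies - len(holding)
        if missing > 0:
            # Only sites that do not already hold the file are candidates.
            candidates = sorted(set(all_sites) - holding)
            plan[name] = candidates[:missing]
    return plan


# Illustrative data only: names and paths are invented.
catalogue = ReplicaCatalogue()
catalogue.register("config_0001", "edinburgh", "/raid1/config_0001")
catalogue.register("config_0001", "liverpool", "/raid0/config_0001")
catalogue.register("config_0002", "swansea", "/raid0/config_0002")

sites = ["edinburgh", "liverpool", "southampton", "swansea"]
plan = plan_replication(catalogue, sites)
print(plan)
```

A real control thread would run such a check periodically and trigger the file transfers itself. The sketch only computes which copies are missing.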

Operation of the QCDgrid datagrid

• Initial queries via browser GUI

• Production running via command line tools

• Current developments:

– Simple interface for data/metadata submission under development

– Grid administration tools

– Grid recovery tools including switching of control thread

– EDG software for virtual organisation management and security

– Data binding in QCDOC codes

QCDgrid metadata browser
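The metadata catalogue is queried with XPath over per-file XML documents. The sketch below shows the general idea, using Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset. The element names (`catalogue`, `gauge_configuration`, `logical_name`, `beta`, `kappa`) and the values are assumptions made up for illustration. They are not the QCDgrid or ILDG schema, and the real system stores its documents in eXist rather than in a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata: one entry per data file, with invented element names.
METADATA = """
<catalogue>
  <gauge_configuration>
    <logical_name>config_0001</logical_name>
    <beta>5.20</beta>
    <kappa>0.1350</kappa>
  </gauge_configuration>
  <gauge_configuration>
    <logical_name>config_0002</logical_name>
    <beta>5.29</beta>
    <kappa>0.1340</kappa>
  </gauge_configuration>
  <gauge_configuration>
    <logical_name>config_0003</logical_name>
    <beta>5.20</beta>
    <kappa>0.1355</kappa>
  </gauge_configuration>
</catalogue>
"""

root = ET.fromstring(METADATA)

# Select the logical names of all configurations whose <beta> child is "5.20".
matches = root.findall("./gauge_configuration[beta='5.20']/logical_name")
names = [m.text for m in matches]
print(names)
```

The returned logical names could then be resolved to physical locations through the replica catalogue.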

Pilot version of job submission software

• Globus toolkit

• EDG software for VO management and security

• Integrated with datagrid SW

• Pilot running on test grid at EPCC

• Command line job submission

• Job IO can go to user console

• Output files returned automatically

Soon ...

• Deploy on main grid

• Integrate with batch systems (PBS, ...)

• Better user interface (GUI, ...)

GridPP2 ...

• Full system with real analysis code

Job submission test

[alan@qcdtest gridwork]$ qcdgrid-job-submit qcdother.epcc.ed.ac.uk \
    /home/alan/gridwork/testrn \
    -input /home/alan/gridwork/in_seed.dat
Storing results in local directory qcdgridjob000002
Storing results in remote directory /tmp/qcdgridjob000024
RSL=&(executable=/opt/qcdgrid/qcdgrid-job-controller)(arguments=/tmp/qcdgridjob000024/jobdesc)(environment=(LD_LIBRARY_PATH /opt/globus/lib:/opt/qcdgrid))
Connecting to port 16395...
OUTPUT:
 iter r.n.
 0 0.586089
 1 0.651327
r.n. seeds written to out_seed.dat
testrn: finished Ok!
Job has completed
Retrieving jobdesc
Retrieving controller.log
Retrieving wrapper.log
Retrieving stdout
Retrieving stderr
Retrieving out_seed.dat
[alan@qcdtest gridwork]$
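The RSL line in the transcript has a simple structure: an executable, its arguments, and an environment list. The Python sketch below composes a string of that shape. The helper `build_rsl` is hypothetical and is not part of the QCDgrid software or the Globus API. It only reproduces the string printed by the job submission test.

```python
def build_rsl(executable, arguments, environment):
    """Compose an RSL request string of the shape shown in the transcript.

    `environment` is a dict of variable names to values; each pair is
    written as "(NAME value)" inside the environment clause.
    """
    env = "".join(f"({name} {value})" for name, value in environment.items())
    return (
        f"&(executable={executable})"
        f"(arguments={arguments})"
        f"(environment={env})"
    )


# Values copied from the job submission transcript.
rsl = build_rsl(
    "/opt/qcdgrid/qcdgrid-job-controller",
    "/tmp/qcdgridjob000024/jobdesc",
    {"LD_LIBRARY_PATH": "/opt/globus/lib:/opt/qcdgrid"},
)
print(rsl)
```

Note that the user's own program (`testrn`) does not appear in the RSL. The job controller is what gets launched, with the job description file as its argument.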

International Lattice Data Grid

• UKQCD launched this in 2002 in Boston

• Participants from: USA (SciDAC), Japan, Germany, ...

• Enable data sharing

• Agree standards

• Steering group of national reps + ...

• 2 working groups

Metadata WG

– XML schema

– gauge formats etc.

Middleware WG

– Web service standards

– Storage Resource Manager

** Feb 3: CP-PACS (Japan) launch ILDG node at http://www.lqa.rccp.tsukuba.ac.jp/

3-continent file browsing

[Screenshot labels: JLAB, UKQCD, LATT03]

ILDG file browser

QCDgrid and GridPP2

• Extend Job Submission Software, resource brokering, ...

• XML mark-up within main QCDOC production codes

• Web services implementation of replica and metadata catalogues

• Web services ILDG replica and metadata catalogues

• Web services based compute grid using UK and non-UK nodes

QCDgrid websites

• QCDgrid home page (at GridPP?):

http://www.gridpp.ac.uk/qcdgrid

• QCDgrid project page at NeSCForge development site:

http://qcdgrid.forge.nesc.ac.uk/

• ILDG project page at JLAB, USA:

http://qcdgrid.lqcd.org/ildg

CONCLUSIONS

• UKQCD has an operational data grid (QCDgrid)

• QCDOC preparations are well advanced

• ‘Tier 2’ nodes have been (are being) installed

• Work continues on XML tools

• Prototype job submission SW exists and is being developed

• International activity is increasing

• Open software development via NeSC Forge