GridPP 9 @ NeSC AC Irving, 4/2/04 1
UKQCD
9th GridPP Collaboration Meeting
QCDgrid: Status and Future
Alan Irving
University of Liverpool
UKQCD and the Grid: QCDgrid architecture
PPARC support
• GridPP1: Phase 1: data grid
• GridPP1: Phase 2: pilot scheme for distributed processing
• GridPP2: Full distributed processing
• GridPP2: International Lattice Data Grid activities (ILDG)
QCDOC: Columbia + IBM + UKQCD + BNL
10,000+ processors
10 Tflops, £6.6M, July 2004
128 procs, Nov 03
Stop press ...
Following exhaustive tests of the ASIC, orders have now been placed for some 14,720 ASICs for:
• 2048 node development machine (> 1 Tflop sustained) for assembly in March
• 12,000+ node main machine, for assembly in May
UKQCD computing strategy with QCDOC
• Distributed computing Grid
• International standards (ILDG)
• SciDAC: US strategy
• Local resources compute/data
• Tier 1: Edinburgh
• Tier 2: Edinburgh, Liverpool, Swansea, Southampton (+ RAL)
[Diagram labels: Node, QCDOC, FE, Grid]
• UKQCD approved simulations
• International cooperation with: MILC, Columbia, ...
• Data grid for configuration acquisition and storage
• International nodes available
• Job submission software (JSS) for homogeneous physics analysis within UKQCD
• Need for significant clusters at computational nodes (Liverpool, RAL, ...)
Basics of the QCDgrid datagrid
• Currently has 4 sites with 7 RAID disk nodes
• Main design and implementation by EPCC (James Perry)
• Admin by C Maynard (Physics/Edinburgh) + local sys admins
• User requirement/testing driven by Liverpool (C McNeile)
• File replication managed by custom-written software built on Globus 2
• Central control thread ensures at least 2 copies of each file at different sites
• Replica catalogue maps logical names to physical locations
• Metadata catalogue associates physical parameters with files
• XML document for each data file
• XML document storage in eXist XML database, queried by XPath
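The replica catalogue and the "at least 2 copies at different sites" rule can be illustrated with a minimal Python sketch. This is not the QCDgrid implementation (which is built on Globus 2); the class name, method names, file names and paths below are all hypothetical, chosen only to show the idea of mapping logical names to physical locations and flagging under-replicated files.

```python
from collections import defaultdict

# From the slide: at least 2 copies of each file at different sites.
MIN_COPIES = 2


class ReplicaCatalogue:
    """Toy in-memory catalogue (hypothetical, for illustration only).

    Maps a logical file name to a set of (site, physical_path) replicas.
    """

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, logical_name, site, physical_path):
        """Record that a copy of logical_name exists at site/physical_path."""
        self._replicas[logical_name].add((site, physical_path))

    def locations(self, logical_name):
        """Return all known physical locations of a logical file."""
        return sorted(self._replicas.get(logical_name, ()))

    def sites(self, logical_name):
        """Return the distinct sites holding a copy of a logical file."""
        return {site for site, _ in self._replicas.get(logical_name, ())}

    def under_replicated(self, min_copies=MIN_COPIES):
        """List logical files held at fewer than min_copies distinct sites.

        A control thread could poll this list and schedule extra copies.
        """
        return sorted(
            name for name in self._replicas
            if len(self.sites(name)) < min_copies
        )


catalogue = ReplicaCatalogue()
catalogue.register("config_0001", "edinburgh", "/raid0/config_0001")
catalogue.register("config_0001", "liverpool", "/raid1/config_0001")
catalogue.register("config_0002", "edinburgh", "/raid0/config_0002")
print(catalogue.under_replicated())
```

Counting distinct sites rather than distinct paths matters here: two copies on different disks at the same site would not satisfy the rule stated on the slide.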
Operation of the QCDgrid datagrid
• Initial queries via browser GUI
• Production running via command line tools
• Current developments:
  – Simple interface for data/metadata submission under development
  – Grid administration tools
  – Grid recovery tools including switching of control thread
  – EDG software for virtual organisation management and security
  – Data binding in QCDOC codes
QCDgrid metadata browser
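As a rough illustration of what a metadata query involves, the sketch below selects data files by physical parameters using XPath expressions over one XML document per file. It is only a stand-in: QCDgrid stores its documents in the eXist XML database, whereas this uses the limited XPath subset in Python's standard `xml.etree.ElementTree`. The element names (`gauge_configuration`, `beta`, `kappa`, `lattice`) and all values are hypothetical, not the QCDgrid or ILDG schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata documents, one per data file (not the real schema).
DOCS = {
    "config_0001": "<gauge_configuration><beta>5.20</beta>"
                   "<kappa>0.1350</kappa><lattice nx='16' nt='32'/>"
                   "</gauge_configuration>",
    "config_0002": "<gauge_configuration><beta>5.29</beta>"
                   "<kappa>0.1340</kappa><lattice nx='16' nt='32'/>"
                   "</gauge_configuration>",
    "config_0003": "<gauge_configuration><beta>5.20</beta>"
                   "<kappa>0.1355</kappa><lattice nx='24' nt='48'/>"
                   "</gauge_configuration>",
}


def find_matching(docs, xpath):
    """Return the logical names whose metadata document matches xpath.

    Each document is placed under a throwaway wrapper element so the
    expression can select the document root with a predicate.
    """
    matches = []
    for name, text in docs.items():
        wrapper = ET.Element("wrapper")
        wrapper.append(ET.fromstring(text))
        if wrapper.find(xpath) is not None:
            matches.append(name)
    return sorted(matches)


# Files generated at a given coupling (child element text predicate).
print(find_matching(DOCS, "./gauge_configuration[beta='5.20']"))
# Files on a lattice with a given time extent (attribute predicate).
print(find_matching(DOCS, "./gauge_configuration/lattice[@nt='32']"))
```

A real XML database evaluates such queries across the whole collection server-side; the loop here only mimics that behaviour on a small in-memory dictionary.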
Pilot version of job submission software
• Globus toolkit
• EDG software for VO management and security
• Integrated with datagrid SW
• Pilot running on test grid at EPCC
• Command line job submission
• Job IO can go to user console
• Output files returned automatically
Soon ...
• Deploy on main grid
• Integrate with batch systems (PBS, ...)
• Better user interface (GUI, ...)
GridPP2 ...
• Full system with real analysis code
Job submission test
[alan@qcdtest gridwork]$ qcdgrid-job-submit qcdother.epcc.ed.ac.uk \
    /home/alan/gridwork/testrn \
    -input /home/alan/gridwork/in_seed.dat
Storing results in local directory qcdgridjob000002
Storing results in remote directory /tmp/qcdgridjob000024
RSL=&(executable=/opt/qcdgrid/qcdgrid-job-controller)
(arguments=/tmp/qcdgridjob000024/jobdesc)(environment=(LD_LIBRARY_PATH /opt/globus/lib:/opt/qcdgrid))
Connecting to port 16395...
OUTPUT:
 iter r.n.
 0 0.586089
 1 0.651327
r.n. seeds written to out_seed.dat
testrn: finished Ok!
Job has completed
Retrieving jobdesc
Retrieving controller.log
Retrieving wrapper.log
Retrieving stdout
Retrieving stderr
Retrieving out_seed.dat
[alan@qcdtest gridwork]$
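The RSL line in the transcript is a Globus Resource Specification Language string describing the job controller to run on the remote node. A minimal sketch of how such a string could be assembled is given below; the function `build_rsl` is hypothetical and is not part of qcdgrid-job-submit or the Globus toolkit. It handles only the three attributes that appear in the transcript and does no quoting of special characters.

```python
def build_rsl(executable, arguments, environment=None):
    """Assemble a simple RSL conjunction like the one in the transcript.

    Hypothetical helper for illustration: covers executable, arguments
    and environment only, and assumes values need no quoting.
    """
    parts = [
        f"(executable={executable})",
        f"(arguments={' '.join(arguments)})",
    ]
    if environment:
        # Each variable becomes a "(NAME value)" pair inside environment=.
        pairs = "".join(f"({name} {value})" for name, value in environment.items())
        parts.append(f"(environment={pairs})")
    # A leading "&" marks a conjunction of relations in RSL.
    return "&" + "".join(parts)


# Values copied from the transcript above.
rsl = build_rsl(
    executable="/opt/qcdgrid/qcdgrid-job-controller",
    arguments=["/tmp/qcdgridjob000024/jobdesc"],
    environment={"LD_LIBRARY_PATH": "/opt/globus/lib:/opt/qcdgrid"},
)
print("RSL=" + rsl)
```

With the transcript's values this reproduces the RSL string shown in the job submission output, apart from the line break introduced by the slide layout.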
International Lattice Data Grid
• UKQCD launched this in 2002 in Boston
• Participants from: USA (SciDAC), Japan, Germany, ...
• Enable data sharing
• Agree standards
• Steering group of national reps + ...
• 2 working groups:
  – Metadata WG: XML schema, gauge formats etc.
  – Middleware WG: Web service standards, Storage Resource Manager
** Feb 3: CP-PACS (Japan) launch ILDG node at http://www.lqa.rccp.tsukuba.ac.jp/
3-continent file browsing
[Figure labels: JLAB, UKQCD, LATT03]
ILDG file browser
QCDgrid and GridPP2
• Extend Job Submission Software, resource brokering, ...
• XML mark-up within main QCDOC production codes
• Web services implementation of replica and metadata catalogues
• Web services ILDG replica and metadata catalogues
• Web services based compute grid using UK and non-UK nodes
QCDgrid websites
• QCDgrid home page (at GridPP?):
http://www.gridpp.ac.uk/qcdgrid
• QCDgrid project page at NeSCForge development site:
http://qcdgrid.forge.nesc.ac.uk/
• ILDG project page at JLAB, USA:
http://qcdgrid.lqcd.org/ildg
CONCLUSIONS
• UKQCD has operational data grid (QCDgrid)
• QCDOC preparations are well advanced
• ‘Tier 2’ nodes have been (are being) installed
• Work continues on XML tools
• Prototype job submission SW exists and is being developed
• International activity is increasing
• Open software development via NeSC Forge