Tony Doyle, a.doyle@physics.gla.ac.uk
GridPP – From Prototype To Production, GridPP10 Meeting, CERN, 2 June 2004
Tony Doyle - University of Glasgow
Outline
• GridPP Project
  • Introduction
  • UK Context
  • Components: A. Management, B. Middleware, C. Applications, D. Tier-2, E. Tier-1, F. Tier-0
• Challenges:
  1. Middleware Validation
  2. Improving Efficiency
  3. Meeting Experiment Requirements
  4. ..via The Grid?
  5. Work Group Computing
  6. Events.. To Files.. To Events
  7. Software Distribution
  8. Distributed Analysis
  9. Production Accounting
  10. Sharing Resources
• Summary
GridPP – A UK Computing Grid for Particle Physics
GridPP
19 UK Universities, CCLRC (RAL & Daresbury) and CERN
Funded by the Particle Physics and Astronomy Research Council (PPARC)
GridPP1 - Sept. 2001-2004 £17m "From Web to Grid"
GridPP2 – Sept. 2004-2007 £16(+1)m "From Prototype to Production"
GridPP in Context
[Diagram, not to scale: GridPP positioned among the UK Core e-Science Programme, Institutes, Tier-2 Centres, Tier-1/A, CERN LCG, EGEE, Middleware/Security/Networking, Experiments, the Grid Support Centre, and Applications Development/Integration]
GridPP1 Components
(6/Feb/2004) [Pie chart of GridPP1 funding: £3.57m, £5.67m, £3.74m, £2.08m and £1.84m across CERN, DataGrid, Tier-1/A, Applications and Operations]
- LHC Computing Grid Project (LCG): Applications, Fabrics, Technology and Deployment
- European DataGrid (EDG): Middleware Development
- UK Tier-1/A Regional Centre: Hardware and Manpower
- Grid Application Development: LHC and US Experiments + Lattice QCD
- Management, Travel etc.
GridPP2 Components
(May 2004) [Pie chart of GridPP2 funding: £0.75m, £2.62m, £3.02m, £0.88m, £0.69m, £2.75m, £2.79m, £1.00m and £2.40m across Tier-1/A Hardware, Tier-1/A Operations, Tier-2 Operations, Applications, M/S/N, LCG-2, Manager/Travel and Operations]
A. Management, Travel, Operations
B. Middleware, Security and Network Development
C. Grid Application Development: LHC and US Experiments + Lattice QCD + Phenomenology
D. Tier-2 Deployment: 4 Regional Centres - M/S/N support and System Management
E. Tier-1/A Deployment: Hardware, System Management, Experiment Support
F. LHC Computing Grid Project (LCG Phase 2) [review]
A. GridPP Management
[Organisation chart: Collaboration Board; Project Management Board; Project Leader; Project Manager; Technical (Deployment) Board; Experiments (User) Board; (Production Manager); (Dissemination Officer); GGF, LCG, EDG (EGEE) and UK e-Science liaison; GridPP1 (GridPP2); Project Map; Risk Register]
GridPP PMB Who’s Who
CB Chair: Steve Lloyd
Project Leader: Tony Doyle
Deputy Project Leader: John Gordon
Project Manager: Dave Britton
User Board Chair: Roger Barlow
Deployment Board Chair: Dave Kelsey
Applications Coordinator: Roger Jones
Middleware Coordinator: Robin Middleton
Tier-2 Board Chair: Steve Lloyd
Tier-1 Board Chair: Tony Doyle
Production Manager: Jeremy Coles
“External Input”
Dissemination Officer: Sarah Pearce
CERN Liaison: Tony Cass
UK e-Science Liaison: Neil Geddes
GGF Liaison: Pete Clarke
PPARC Head of e-Science: Guy Rickett
“authority” via the Collaboration Board; “reporting” via the Project Manager; “strategic” input from the User Board and Deployment Board; “external” input from the Dissemination Officer and liaison members
Roles: http://ppewww.ph.gla.ac.uk/~doyle/gridpp2/roles/
Context: http://www.gridpp.ac.uk/pmb/docs/PMB-36-Work_Areas-1.4.doc
A. Management Structure in LCG Context
[Diagram: the PMB and CB oversee the Deployment Board (Tier1/Tier2, testbeds, rollout; service specification & provision) and the User Board (requirements, application development, user feedback), linking ARDA, the Experiments, EGEE and LCG with the middleware areas: Metadata, Workload, Network, Security, Information & Monitoring, Storage]
A. GridPP Management Staff Effort
A. Management, Travel, Operations
GridPP2 Roles FTE
Project Leader + Admin. Assistant 0.67
Project Manager 0.9
CB and Tier-2 Board Chair 0.5
Applications Coordinator 0.5
Middleware Coordinator 0.5
DB Chair 0.5
Total 3.57
GridPP2 Roles FTE
Production Manager 1.0
Dissemination Officer 1.0
Total 2.0
Reporting lines: the Production Manager reports via the Deputy Project Leader to the EGEE SA1 Infrastructure activity; the Dissemination Officer reports via the Project Manager and, partially, to the EGEE NA2 Dissemination activity.
GridPP2 Project: Managing the Middleware
[Diagram layers: I. Experiment Layer; II. Application Middleware; III. Grid Middleware; IV. Facilities and Fabrics, overseen by the User Board and the Deployment Board]
B. Middleware, Security and Network Development
M/S/N builds upon UK strengths as part of International development
[Diagram: work areas spanning Middleware, Security and Networking — Configuration Management, Storage Interfaces, Network Monitoring, Security, Information Services, Grid Data Management]
B. Middleware, Security and Network Development: Staff Effort
B. Middleware, Security and Network Development
GridPP2 Work Area          PPARC funding   Other funding
Metadata                        1.0             0.0
Storage Management              2.0             0.0
Workload Management             1.0             3.0*
Security                        3.5             0.0
Information & Monitoring        4.0             4.0
Network Sector                  2.0             3.0
LHC Applications                1.0             0.0
Totals                         14.5            10.0
Reporting line: via the middleware coordinator and also to the LCG/EGEE JRA1 Middleware area, if agreed, within the LCG/EGEE work areas.
C. Application Development
[Diagram: SAMGrid (D0) data handling architecture. Client Applications: web, Python and Java codes, command line, D0 Framework C++ codes (“Dataset Editor”, “File Storage Server”, “Project Master”, “Station Master”). Collective Services: Significant Event Logger, Naming Service, Database Manager, Catalog Manager, SAM Resource Management, Batch Systems (LSF, FBS, PBS, Condor), Data Mover, Job Services, Storage Manager, Job Manager, Cache Manager, Request Manager, Request Formulator and Planner, “Stager”, “Optimiser”. Connectivity and Resource: CORBA, UDP, file transfer protocols (ftp, bbftp, rcp, GridFTP), mass storage system protocols (e.g. encp, hpss), catalog protocols. Fabric: Tape Storage Elements, Disk Storage Elements, Compute Elements, LANs and WANs, Resource and Services Catalog, Replica Catalog, Meta-data Catalog, Code Repository. Authentication and Security: GSI, SAM-specific user, group, node and station registration, bbftp ‘cookie’. Names in quotes are SAM-given software component names; shading indicates components to be replaced, or added/enhanced using PPDG and Grid tools.]
Application areas: GANGA, SAMGrid, Lattice QCD, AliEn → ARDA, CMS, BaBar
C. Application Development: Staff Effort
C. Grid Application Development: LHC and US Experiments + Lattice QCD + Phenomenology
GridPP2 Work Area FTE
ATLAS/LHCb (GANGA) 2.0
ATLAS 2.5
BaBar 2.0
CDF/D0 (SAM) 2.0
CDF 1.0
CMS 3.0
D0 1.0
LHCb 2.0
Portal 1.0
UKQCD 1.0
PhenoGrid 1.0
Total 18.5
Reporting line: via the applications coordinator.
D. UK Tier-2 Centres
NorthGrid ****: Daresbury, Lancaster, Liverpool, Manchester, Sheffield
SouthGrid *: Birmingham, Bristol, Cambridge, Oxford, RAL PPD, Warwick
ScotGrid *: Durham, Edinburgh, Glasgow
LondonGrid ***: Brunel, Imperial, QMUL, RHUL, UCL
Current UK Status: 10 sites via LCG (2 at RAL)
D. UK Tier-2 Centres: Staff Effort
D. Tier-2 Deployment: 4 Regional Centres - M/S/N support and System Management
Reporting line: via the Tier-2 Board Chair for Operations staff. UK Support Posts report via the Production Manager and also to the Deputy Project Leader for the EGEE SA1 Infrastructure activity.

GridPP2 Work Area          FTE
Security                   1.0
Resource Broker            1.0
Network                    0.5
Data Management            1.0
Storage Management         1.0
VO Management              0.5
ScotGrid Operations        1.0
NorthGrid Operations       4.5
Southern Grid Operations   1.0
London Grid Operations     2.5
Total                     14.0 [+4.0]
E. The UK Tier-1/A Centre
• High quality data services
• National and International Role
• UK focus for International Grid development
• Grid Operations Centre
April 2004: 700 dual-CPU nodes, 80 TB disk, 60 TB tape (capacity 1 PB)
[Chart: usage by experiment — LHCb, ATLAS, CMS, BaBar]
E. The UK Tier-1/A Centre: Staff Effort
E. Tier-1/A Deployment: Hardware, System Management, Experiment Support
GridPP2 Work Area   PPARC funding   CCLRC funding
CPU                      2.0             0.0
Disk                     1.5             0.0
Tape                     1.5             1.0
Core Services            1.5             0.5
Operations               2.0             0.5
Networking               0.0             0.5
Deployment               2.0             0.0
Experiments              2.0             0.0
Management               1.0             0.5
Totals                  13.5             3.0
Reporting line: via the Tier-1 Manager to the Tier-1/A Board.
Real Time Grid Monitoring
LCG2, 1 June 2004
E. Grid Operations
• Grid Operations Centre: Core Operational Tasks
  – Monitor infrastructure, components and services
  – Troubleshooting
  – Verification of new sites joining the Grid
  – Acceptance tests of new middleware releases
  – Verify suppliers are meeting SLAs
  – Performance tuning and optimisation
  – Publishing use figures and accounts
  – Grid information services
  – Monitoring services
  – Resource brokering
  – Allocation and scheduling services
  – Replica data catalogues
  – Authorisation services
  – Accounting services
• Grid Support Centre: Core Support Tasks
  – Running the UK Certificate Authority
E. Grid Operations: Staff Effort
Reporting line: via the Deputy Project Leader to the EGEE SA1 Infrastructure activity.
GridPP2 Work Area FTE
Tier-2 Coordinators +4.0
Operation Centre +3.0
Documentation +1.0
Other +1.5
Total +9.5
F. Tier 0 and LCG: Foundation Programme
• Aim: build upon Phase 1
• Ensure development programmes are linked
• Project management: GridPP and LCG
• Shared expertise
• LCG establishes the global computing infrastructure
• Allows all participating physicists to exploit LHC data
• Earmarked UK funding to be reviewed in Autumn 2004
Required Foundation: LCG Fabric, Technology and Deployment
F. LHC Computing Grid Project (LCG Phase 2) [review]
[Diagram: the software validation process. WPs add unit-tested code to the repository; the Build System runs nightly builds and automatic tests, with individual WP tests on the Development Testbed (~15 CPU); a tagged release is selected for certification; Grid certification and problem fixing run on the Certification Testbed (~40 CPU); Application Certification runs on the Application Testbed (~1000 CPU); a certified release is selected for deployment and tagged as a package, yielding a certified public release for use by apps (24x7). The Integration Team runs integration and overall release tests; stages run Unit Test → Build → Certification → Production, with problem reports fed back from Users, the Test Group and Apps. Representatives. The process defines: test frameworks, test support, test policies, test documentation, test platforms/compilers.]
The Challenges Ahead I: Implementing the Validation Process
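The staged flow above (Unit Test → Build → Certification → Production, with problem reports sent back on failure) can be sketched as a minimal state machine. This is an illustrative sketch only; the release tag and stage names are assumptions, not GridPP software.

```python
# Hypothetical sketch of the release-certification flow: a release
# advances one stage per passing test run; a failure files a problem
# report and holds it at the current stage until fixed.
STAGES = ["unit-test", "build", "certification", "production"]

def promote(release, passed):
    """Advance the release one stage if tests passed, else log a problem."""
    i = STAGES.index(release["stage"])
    if passed and i < len(STAGES) - 1:
        release["stage"] = STAGES[i + 1]
    elif not passed:
        release["problems"].append(f"failed at {release['stage']}")
    return release

r = {"tag": "lcg2-2004-06", "stage": "unit-test", "problems": []}
promote(r, True)   # unit-test -> build
promote(r, True)   # build -> certification
promote(r, False)  # problem report; stays at certification
promote(r, True)   # fix applied; certification -> production
```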
The Challenges Ahead II: Improving Grid “Efficiency”
[Chart: efficiency (successful jobs / jobs submitted), 0-100%, by month from Dec-02 to Feb-04, for CMS EDG v1.4, ATLAS EDG v1.4, LHCb EDG v1.4, LCG1 (EDG v2.0) and the EDG application testbed v2.x]
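The efficiency metric plotted above is simply successful jobs divided by jobs submitted per month. A minimal sketch, with invented job counts (the real figures come from testbed logs):

```python
# Efficiency = successful jobs / jobs submitted, guarding against
# months with no submissions. All counts below are illustrative.
def efficiency(successful, submitted):
    """Return the success fraction for one month."""
    return successful / submitted if submitted else 0.0

monthly = {"Dec-02": (310, 1000), "Jan-03": (450, 1000)}  # hypothetical
rates = {m: efficiency(s, n) for m, (s, n) in monthly.items()}
```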
The Challenges Ahead III: Meeting Experiment Requirements (UK)
[Charts: CPU requirements (0-12000 kSI2000 years) and disk requirements (0-2500 TB) by year, 2004-2007, broken down by experiment (ATLAS, CMS, LHCb, ALICE, Phenomenology, ZEUS, UKQCD, UKDMC, MINOS, MICE, LISA, D0, CRESST, CDF, BaBar, ANTARES) and grouped into LHC vs non-LHC]
Total Requirement:
Year             2004   2005   2006   2007
CPU [kSI2000]    2395   4066   6380   9965
Disk [TB]         369    735   1424   2285
Tape [TB]         376    752   1542   2623
In International Context: Q2 2004 LCG Resources
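A quick check of what the requirements table implies: resources must grow by roughly a factor of 4 (CPU) to 7 (tape) between 2004 and 2007. A small sketch using only the table's own numbers:

```python
# Growth factors 2004 -> 2007 from the "Total Requirement" table above.
req = {
    "CPU [kSI2000]": {2004: 2395, 2007: 9965},
    "Disk [TB]":     {2004: 369,  2007: 2285},
    "Tape [TB]":     {2004: 376,  2007: 2623},
}
growth = {k: v[2007] / v[2004] for k, v in req.items()}
```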
The Challenges Ahead IV: Using (Anticipated) Grid Resources
Dynamic Grid Optimisation over the JANET Network
2004: ~7,000 1 GHz CPUs, ~400 TB disk → 2007: ~30,000 1 GHz CPUs, ~2200 TB disk (note x2 scale change)
The Challenges Ahead V: Work Group Computing
The Challenges Ahead VI: Events.. to Files.. to Events
[Diagram: Tier-0 (International), Tier-1 (National), Tier-2 (Regional) and Tier-3 (Local) each hold RAW, ESD, AOD and TAG data files. An “Interesting Events List” derived from TAG data points from selected events (Event 1, Event 2, Event 3) back to the RAW, ESD and AOD data files that contain them.]
• VOMS-enhanced Grid certificates to access databases via metadata
• Non-Trivial..
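The events-to-files-to-events lookup above can be sketched as a metadata index: a TAG selection yields event IDs, and each event maps to the files that must be staged. All file names and the index structure here are invented for illustration:

```python
# Hypothetical event -> file metadata index (TAG-style lookup).
tag_index = {  # event id -> {format: file holding that event}
    1: {"RAW": "raw_001.dat", "ESD": "esd_001.dat", "AOD": "aod_01.dat"},
    2: {"RAW": "raw_001.dat", "ESD": "esd_001.dat", "AOD": "aod_01.dat"},
    3: {"RAW": "raw_002.dat", "ESD": "esd_002.dat", "AOD": "aod_01.dat"},
}

def files_for(interesting_events, fmt="AOD"):
    """Return the distinct files needed to read the selected events."""
    return sorted({tag_index[e][fmt] for e in interesting_events})
```

Note the asymmetry the slide hints at: three interesting events may live in one AOD file but several RAW files, which is why access priorities and staging policy matter.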
The Challenges Ahead VII: Software Distribution
• ATLAS Data Challenge (DC2) this year to validate world-wide computing model
• Packaging, distribution and installation at scale: one release build takes 10 hours and produces 2.5 GB of files
• Complexity: 500 packages, millions of lines of code, 100s of developers and 1000s of users
  – the ATLAS collaboration is widely distributed: 140 institutes, all wanting to use the software
  – needs ‘push-button’ easy installation..
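A back-of-envelope figure makes the distribution problem concrete: pushing one release to every institute moves hundreds of gigabytes. The arithmetic uses only the numbers quoted above:

```python
# One 2.5 GB release pushed to 140 institutes = 350 GB moved per release.
release_gb = 2.5
institutes = 140
total_gb = release_gb * institutes
```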
[Diagram. Step 1, Monte Carlo Data Challenges: Physics Models → Detector Simulation → MC Raw Data (with Monte Carlo Truth Data) → Reconstruction → MC Event Summary Data + MC Event Tags. Step 2, Real Data: Trigger System → Data Acquisition → Level 3 trigger (Trigger Tags) → Raw Data → Reconstruction (with Calibration Data and Run Conditions) → Event Summary Data (ESD) + Event Tags.]
Complex workflow… LCG/ARDA Development
1. AliEn (ALICE Grid) provided a pre-Grid implementation [Perl scripts]
2. ARDA provides a framework for PP application middleware
The Challenges Ahead VIII: Distributed Analysis
Complex workflow… LCG/ARDA Development
Online monitoring
Automatic accounting
Meeting LCG and other requirements
The Challenges Ahead IX: Production Accounting
GridPP Grid Report for Tue, 1 Jun 2004 14:00:47 +0100
CPUs Total: 1055
Hosts up: 442
Hosts down: 82
Avg Load (15, 5, 1m): 33%, 35%, 36%
Localtime: 2004-06-01 14:00
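One accounting figure derivable directly from the report above is host availability, the fraction of known hosts currently up. A minimal sketch using the reported counts:

```python
# Availability from the monitoring report: 442 up of 524 known hosts.
hosts_up, hosts_down = 442, 82
availability = hosts_up / (hosts_up + hosts_down)  # fraction in [0, 1]
```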
The Challenges Ahead X: Sharing… MoUs, Guidelines and Policies
• Disk/CPU resources allocated to each “group”
• Grid is based on distributed resources - a “group” is an experiment
• An institute is typically involved in many experiments
• Institutes define priorities on computing resources via OPEN policy statements
• All jobs submitted via Globus authentication - certificates identified by user and experiment
• Need to implement Grid “priority”:
  – Minimum amount of data to deliver at a time for a job?
  – Where to store files?
  – Which data access/storing activities have the highest priority?
  – Sharing of the resources among groups?
  – Users belong to multiple groups?
  – How many jobs per group are allowed?
  – What processing activities are allowed at each site?
  – To which sites should data access and processing activities be sent?
  – How should the resources of a local cluster of PCs be shared among groups?
• Tier-2 discussion prior to the Collaboration Meeting… issues will arise which require ALL Tier centres to define/sign up to an MoU and publish a policy (See Steve’s talk)
* Implemented by site administrators, with OPEN policies defined at each site based on e.g. the case made to the funding authority.
What’s new? The ability to monitor and allocate unused resources.
We will be judged by how well we work as a set of Virtual Organisations.
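The per-group sharing question raised above (fixed policy shares, plus reuse of unused allocation) can be sketched as a simple fair-share pass. The function, share values and group names are all hypothetical, not GridPP policy:

```python
# Hypothetical fair-share sketch: split a site's CPU slots by published
# policy shares, then hand spare (unclaimed) slots to groups that still
# have pending demand, largest share first.
def allocate(total_cpus, shares, demand):
    """shares: group -> policy fraction; demand: group -> jobs wanted."""
    alloc = {g: min(int(total_cpus * f), demand.get(g, 0))
             for g, f in shares.items()}
    spare = total_cpus - sum(alloc.values())
    for g in sorted(shares, key=shares.get, reverse=True):
        extra = min(spare, demand.get(g, 0) - alloc[g])
        alloc[g] += extra
        spare -= extra
    return alloc
```

For example, with shares of 50/30/20% on 100 CPUs, a group demanding fewer jobs than its share frees slots that a busier group can absorb.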
GridPP – Theory and Experiment
• UK GridPP started 1/9/01
• EU DataGrid: first middleware ~1/9/01
• Development requires a testbed with feedback: an “Operational Grid”
• Fit into UK e-Science structures
• Experience in distributed computing essential to build and exploit the Grid
• Scale in UK? 0.5 PBytes and 2,000 distributed CPUs
GridPP in Sept 2004:
• Grid jobs are being submitted now.. the user feedback loop is important..
• All experiments have immediate requirements
• Current Experiment Production: “The Grid” is a small component
• Non-technical issues:– Recognising context– Building upon expertise– Defining roles – Sharing resources
• Major deployment activity is LCG/EGEE
– We contribute significantly to LCG and our success depends critically on LCG
• “Production Grid” will be difficult to realise: GridPP2 planning underway as part of LCG/EGEE
• Work Areas and Roles defined
• Many Challenges Ahead..
GridPP Summary: From Web to Grid
GridPP Summary: From Prototype to Production
[Diagram: evolution from 2001 to 2004 to 2007.
2001: separate experiments, resources and multiple accounts: BaBar, D0, CDF, ATLAS, CMS, LHCb, ALICE; 19 UK Institutes; RAL Computer Centre; CERN Computer Centre.
2004: prototype Grids: SAMGrid, BaBarGrid, EDG, GANGA, LCG, EGEE, ARDA; UK Prototype Tier-1/A Centre; 4 UK Prototype Tier-2 Centres; CERN Prototype Tier-0 Centre.
2007: ‘One’ Production Grid: LCG; UK Tier-1/A Centre; 4 UK Tier-2 Centres; CERN Tier-0 Centre.]