Tony Doyle, a.doyle@physics.gla.ac.uk
GridPP – From Prototype To Production, HEPiX Meeting, Edinburgh, 25 May 2004
Tony Doyle - University of Glasgow
Outline
• GridPP Project
• Introduction
• UK Context
• Components:
A. Management
B. Middleware
C. Applications
D. Tier-2
E. Tier-1
F. Tier-0
• Challenges:
– Middleware Validation
– Improving Efficiency
– Meeting Experiment Requirements
– ..Via The Grid?
– Work Group Computing
– Events.. To Files.. To Events
– Software Distribution
– Distributed Analysis
• Historical Perspective
• What is the Grid Anyway?
• Is GridPP a Grid?
• Summary
GridPP – A UK Computing Grid for Particle Physics
GridPP
19 UK Universities, CCLRC (RAL & Daresbury) and CERN
Funded by the Particle Physics and Astronomy Research Council (PPARC)
GridPP1 - Sept. 2001-2004 £17m "From Web to Grid"
GridPP2 – Sept. 2004-2007 £16(+1)m "From Prototype to Production"
GridPP in Context
[Diagram, not to scale: GridPP shown alongside the UK Core e-Science Programme, CERN LCG, EGEE, the Grid Support Centre, the Tier-1/A and Tier-2 Centres, the Institutes and the Experiments. GridPP areas labelled: Middleware, Security, Networking; Apps Dev; Apps Int.]
GridPP1 Components
[Pie chart, 6 Feb 2004. Segments: CERN, DataGrid, Tier-1/A, Applications, Operations. Values as extracted (assignment to segments not recoverable): £3.57m, £5.67m, £3.74m, £2.08m, £1.84m.]
• CERN: LHC Computing Grid Project (LCG): Applications, Fabrics, Technology and Deployment
• DataGrid: European DataGrid (EDG): Middleware Development
• Tier-1/A: UK Tier-1/A Regional Centre: Hardware and Manpower
• Applications: Grid Application Development: LHC and US Experiments + Lattice QCD
• Operations: Management, Travel etc.
GridPP2 Components
[Pie chart, May 2004. Segments: Tier-1/A Hardware, Tier-2 Operations, Applications, M/S/N, LCG-2, Mgr, Travel, Ops, Tier-1/A Operations. Values as extracted (assignment to segments not recoverable): £0.75m, £2.62m, £3.02m, £0.88m, £0.69m, £2.75m, £2.79m, £1.00m, £2.40m.]
A. Management, Travel, Operations
B. Middleware, Security, Network Development
C. Grid Application Development: LHC and US Experiments + Lattice QCD + Phenomenology
D. Tier-2 Deployment: 4 Regional Centres, M/S/N support and System Management
E. Tier-1/A Deployment: Hardware, System Management, Experiment Support
F. LHC Computing Grid Project (LCG Phase 2) [review]
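As a quick consistency check, the segment values quoted on the GridPP1 and GridPP2 component charts can be summed and compared with the headline funding figures (£17m and £16(+1)m). The sketch below is illustrative only: the variable and function names are hypothetical, and the values are taken in the order they appear on the slides, without assigning them to named segments.

```python
from decimal import Decimal

# Pie-chart segment values in £m, in slide order (segment names not assigned).
GRIDPP1_SEGMENTS = ["3.57", "5.67", "3.74", "2.08", "1.84"]
GRIDPP2_SEGMENTS = ["0.75", "2.62", "3.02", "0.88", "0.69",
                    "2.75", "2.79", "1.00", "2.40"]


def total_millions(segments):
    """Sum segment values exactly, avoiding binary floating-point rounding."""
    return sum((Decimal(s) for s in segments), Decimal("0"))


gridpp1_total = total_millions(GRIDPP1_SEGMENTS)
gridpp2_total = total_millions(GRIDPP2_SEGMENTS)
print(f"GridPP1: £{gridpp1_total}m, GridPP2: £{gridpp2_total}m")
```

Both charts sum to £16.90m, in line with the rounded headline figures for the two phases.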
A. GridPP Management
[Organisation chart. Legend: GridPP1 (GridPP2).]
• Collaboration Board
• Project Management Board
• Project Leader
• Project Manager
• (Production Manager)
• (Dissemination Officer)
• Technical (Deployment) Board
• Experiments (User) Board
• Liaison: GGF, LCG, EDG (EGEE), UK e-Science
• Project Map
• Risk Register
A. Management Structure: In LCG Context
[Diagram elements: CB; PMB; Deployment Board (Tier1/Tier2, Testbeds, Rollout; Service specification & provision); User Board (Requirements; Application Development; User feedback); ARDA, Expmts, EGEE, LCG; middleware areas: Metadata, Workload, Network, Security, Info. Mon., Storage.]
B. Middleware, Security and Network Development
GridPP2 Project: Managing the Middleware
[Diagram: the labels of the preceding management-structure diagram (ARDA, Expmts, EGEE, LCG; Deployment Board; User Board; Metadata, Workload, Network, Security, Info. Mon., Storage; PMB) set against four layers:
I. Experiment Layer
II. Application Middleware
III. Grid Middleware
IV. Facilities and Fabrics
with the User Board and Deployment Board marked.]
B. Middleware, Security and Network Development
M/S/N builds upon UK strengths as part of International development
• Configuration Management
• Storage Interfaces
• Network Monitoring
• Security
• Information Services
• Grid Data Management
Areas: Security, Middleware, Networking
C. Application Development
[Diagram: SAM layered architecture.
• Client Applications: Web; Command line; D0 Framework C++ codes; Python codes, Java codes
• Collective Services: Request Formulator and Planner; Storage Manager; Job Manager; Cache Manager; Request Manager; Significant Event Logger; Naming Service; Database Manager; Catalog Manager; SAM Resource Management; Batch Systems (LSF, FBS, PBS, Condor); Data Mover; Job Services; Catalog protocols
• SAM component names shown: "Dataset Editor", "File Storage Server", "Project Master", "Station Master", "Stager", "Optimiser"
• Connectivity and Resource: CORBA; UDP; File transfer protocols (ftp, bbftp, rcp, GridFTP); Mass Storage systems protocols (e.g. encp, hpss)
• Authentication and Security: GSI; SAM-specific user, group, node, station registration; Bbftp 'cookie'
• Fabric: Tape Storage Elements; Disk Storage Elements; Compute Elements; LANs and WANs; Resource and Services Catalog; Replica Catalog; Meta-data Catalog; Code Repository
Legend: a name in "quotes" is the SAM-given software component name; marked components will be replaced, added or enhanced using PPDG and Grid tools.]
Application projects shown: GANGA, SAMGrid, Lattice QCD, AliEn → ARDA, CMS, BaBar
D. UK Tier-2 Centres
• NorthGrid ****: Daresbury, Lancaster, Liverpool, Manchester, Sheffield
• SouthGrid *: Birmingham, Bristol, Cambridge, Oxford, RAL PPD, Warwick
• ScotGrid *: Durham, Edinburgh, Glasgow
• LondonGrid ***: Brunel, Imperial, QMUL, RHUL, UCL
Current UK Status: 10 Sites via LCG
D. The UK Testbed: Hidden Sector
E. The UK Tier-1/A Centre
• High quality data services
• National and International Role
• UK focus for International Grid development
Experiments: LHCb, ATLAS, CMS, BaBar
April 2004:
• 700 Dual CPU
• 80 TB Disk
• 60 TB Tape (Capacity 1 PB)
Grid Operations Centre
Real Time Grid Monitoring
LCG2, 24 May 2004
E. Grid Operations
• Grid Operations Centre
– Core Operational Tasks
– Monitor infrastructure, components and services
– Troubleshooting
– Verification of new sites joining Grid
– Acceptance tests of new middleware releases
– Verify suppliers are meeting SLA
– Performance tuning and optimisation
– Publishing use figures and accounts
– Grid information services
– Monitoring services
– Resource brokering
– Allocation and scheduling services
– Replica data catalogues
– Authorisation services
– Accounting services
• Grid Support Centre
– Core Support Tasks
– Running UK Certificate Authority
F. Tier 0 and LCG: Foundation Programme
• Aim: build upon Phase 1
• Ensure development programmes are linked
• Project management:
GridPP LCG
• Shared expertise:
• LCG establishes the global computing infrastructure
• Allows all participating physicists to exploit LHC data
• Earmarked UK funding to be reviewed in Autumn 2004
Required Foundation: LCG Fabric, Technology and Deployment
The Challenges Ahead I: Implementing the Validation Process
[Diagram: release validation flow.
• Stages: Unit Test, Build, Certification, Production
• Groups: WPs, Integration Team, Test Group, Apps. Representatives, Users
• Testbeds: Development Testbed ~15 CPU; Certification Testbed ~40 CPU; Application Testbed ~1000 CPU
• Steps shown: add unit-tested code to repository; run nightly build & auto. tests (Build System); individual WP tests; Integration: overall release tests; tagged release selected for certification; Grid certification; fix problems; Application Certification; certified release selected for deployment; certified public release for use by apps. (24x7)
• Flows labelled: Tagged package; Release candidates; Tagged Releases; Certified Releases; Problem reports]
Process to:
• Test frameworks
• Test support
• Test policies
• Test documentation
• Test platforms/compilers
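The release flow on this slide names four stages (Unit Test, Build, Certification, Production), with problem reports feeding back into fixes. A minimal sketch of that promotion logic follows, assuming a release advances exactly one stage per successful test round and otherwise stays where it is; the `Stage` and `promote` names are hypothetical and not part of any EDG or LCG tool.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Stages named on the slide, in order."""
    UNIT_TEST = 1
    BUILD = 2
    CERTIFICATION = 3
    PRODUCTION = 4


def promote(stage: Stage, tests_passed: bool) -> Stage:
    """Return the next stage if tests passed, else stay put (problem report)."""
    if not tests_passed or stage is Stage.PRODUCTION:
        return stage
    return Stage(stage + 1)


# Walk one release through the pipeline with a single certification failure.
stage = Stage.UNIT_TEST
history = [stage]
for outcome in [True, True, False, True]:
    stage = promote(stage, outcome)
    history.append(stage)
print([s.name for s in history])
```

In this walk-through the release is held at Certification for one extra round before reaching Production.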
The Challenges Ahead II: Improving Grid “Efficiency”
[Chart: Efficiency (Successful Jobs / Jobs submitted), 0% to 100%, by month from Dec-02 to Feb-04. Series: CMS EDG v1.4; ATLAS EDG v1.4; LHCb EDG v1.4; LCG1 (EDG v2.0); EDG appl. TB v2.x.]
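The efficiency plotted here is defined on the slide as successful jobs divided by jobs submitted. A small sketch of that ratio follows; the function name is hypothetical, and the job counts in the usage line are made-up illustrative numbers, not values read from the chart.

```python
def grid_efficiency(successful_jobs: int, submitted_jobs: int) -> float:
    """Fraction of submitted jobs that completed successfully (0.0 to 1.0)."""
    if submitted_jobs <= 0:
        raise ValueError("submitted_jobs must be positive")
    if not 0 <= successful_jobs <= submitted_jobs:
        raise ValueError("successful_jobs must lie between 0 and submitted_jobs")
    return successful_jobs / submitted_jobs


# Illustrative numbers only: 750 successes out of 1000 submissions.
example = grid_efficiency(750, 1000)
print(f"{example:.0%}")
```

Rejecting a zero submission count keeps an idle month from showing up as a misleading 0% or a division error.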
The Challenges Ahead III: Meeting Experiment Requirements (UK)
[Chart: CPU (kSI2000 year) by Year, 2004 to 2007, scale 0 to 12000. LHC: ATLAS, CMS, LHCb, ALICE. Non-LHC: Phenomenology, ZEUS, UKQCD, UKDMC, MINOS, MICE, LISA, D0, CDF, BaBar, ANTARES.]
[Chart: Disk (TB) by Year, 2004 to 2007, scale 0 to 2500. LHC: ATLAS, CMS, LHCb, ALICE. Non-LHC: Phenomenology, UKQCD, UKDMC, MINOS, MICE, D0, CRESST, CDF, BaBar, ANTARES.]
Total Requirement:
Year 2004 2005 2006 2007
CPU [kSI2000] 2395 4066 6380 9965
Disk [TB] 369 735 1424 2285
Tape [TB] 376 752 1542 2623
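The totals in this table imply steep growth between 2004 and 2007. The sketch below computes the 2007-to-2004 ratio for each resource directly from the tabulated values; the dictionary layout and function name are assumptions made for illustration.

```python
YEARS = [2004, 2005, 2006, 2007]

# Total UK requirement per year, copied from the table.
REQUIREMENTS = {
    "CPU [kSI2000]": [2395, 4066, 6380, 9965],
    "Disk [TB]": [369, 735, 1424, 2285],
    "Tape [TB]": [376, 752, 1542, 2623],
}


def growth_factor(series):
    """Ratio of the last year's requirement to the first year's."""
    return series[-1] / series[0]


growth = {name: round(growth_factor(vals), 2) for name, vals in REQUIREMENTS.items()}
for name, factor in growth.items():
    print(f"{name}: x{factor} from {YEARS[0]} to {YEARS[-1]}")
```

With these figures the CPU requirement grows roughly fourfold, disk roughly sixfold and tape roughly sevenfold over the three years.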
In International Context – Q2 2004 LCG Resources:
The Challenges Ahead IV: Using (Anticipated) Grid Resources
Dynamic Grid Optimisation over JANET Network
[Figure (note x2 scale change):
• 2004: ~7,000 1 GHz CPUs, ~400 TB disk
• 2007: ~30,000 1 GHz CPUs, ~2200 TB disk]
The Challenges Ahead V: Work Group Computing
The Challenges Ahead VI: Events.. to Files.. to Events
[Diagram: Event 1, Event 2 and Event 3 each carry RAW, ESD, AOD and TAG data; these are held in RAW, ESD, AOD and TAG data files across Tier-0 (International), Tier-1 (National), Tier-2 (Regional) and Tier-3 (Local). An “Interesting Events List” is also shown.]
• VOMS-enhanced Grid certificates to access databases via metadata
• Non-Trivial..
The Challenges Ahead VII: Software Distribution
• ATLAS Data Challenge (DC2) this year to validate world-wide computing model
• Packaging, distribution and installation:
– Scale: one release build takes 10 hours and produces 2.5 GB of files
– Complexity: 500 packages, Mloc, 100s of developers and 1000s of users
– ATLAS collaboration is widely distributed: 140 institutes, all wanting to use the software
– needs ‘push-button’ easy installation..
[Diagram: data flow in two steps.
• Step 1: Monte Carlo Data Challenges: Physics Models; Monte Carlo Truth Data; Detector Simulation; MC Raw Data; Reconstruction; MC Event Summary Data; MC Event Tags
• Step 2: Real Data: Trigger System; Data Acquisition; Level 3 trigger; Trigger Tags; Raw Data; Calibration Data; Run Conditions; Reconstruction; Event Summary Data (ESD); Event Tags]
The Challenges Ahead VIII: Distributed Analysis
Complex workflow… LCG/ARDA Development
1. AliEn (ALICE Grid) provided a pre-Grid implementation [Perl scripts]
2. ARDA provides a framework for PP application middleware
Historical Perspective
• I wrote in 1990 a program called "WorlDwidEweb", a point and click hypertext editor which ran on the "NeXT" machine. This, together with the first Web server, I released to the High Energy Physics community at first, and to the hypertext and NeXT communities in the summer of 1991.
• Tim Berners-Lee
• The first three years were a phase of persuasion, aided by my colleague and first convert Robert Cailliau, to get the Web adopted…
• We needed seed servers to provide incentive and examples, and all over the world inspired people put up all kinds of things…
• Between the summers of 1991 and 1994, the load on the first Web server ("info.cern.ch") rose steadily by a factor of 10 every year…
What is The Grid Anyway?
From a Particle Physics Perspective, The Grid is:
• not hype, but surrounded by it
• a working prototype running on testbed(s)…
• about seamless discovery of PC resources around the world
• using evolving standards for interoperation
• the basis for particle physics computing in the 21st Century
• not (yet) as transparent as end-users want it to be
What is “The Grid” Anyway? Is GridPP a Grid?
1. Coordinates resources that are not subject to centralized control
2. … using standard, open, general-purpose protocols and interfaces
3. … to deliver nontrivial qualities of service
1. YES. This is why development and maintenance of a UK-EU-US testbed is important
2. YES... Globus/CondorG/EDG meet this requirement. Common experiment application layers are also important here.
3. NO(T YET)… Experiments define whether this is true - currently only ~100,000 jobs submitted via the testbed, cf. internal component tests of up to 10,000 jobs per day. Next step: LCG-2 deployment outcome… this year
http://www-fp.mcs.anl.gov/~foster/Articles/WhatIsTheGrid.pdf
GridPP Summary: From Web to Grid
GridPP – Theory and Experiment
• UK GridPP started 1/9/01
• EU DataGrid: First Middleware ~1/9/01
• Development requires a testbed with feedback
– “Operational Grid”
• Fit into UK e-Science structures
• Experience in distributed computing essential to build and exploit the Grid
• Scale in UK? 0.5 PBytes and 2,000 distributed CPUs: GridPP in Sept 2004
• Grid jobs are being submitted now.. user feedback loop is important..
• All experiments have immediate requirements
• Current Experiment Production: “The Grid” is a small component
• Non-technical issues:
– Recognising context
– Building upon expertise
– Defining roles
– Sharing resources
• Major deployment activity is LCG
– We contribute significantly to LCG and our success depends critically on LCG
• “Production Grid” will be difficult to realise: GridPP2 planning underway as part of LCG/EGEE
• Many Challenges Ahead..
GridPP Summary: From Prototype to Production
[Timeline diagram: 2001, 2004, 2007.
• Phases: Separate Experiments, Resources, Multiple Accounts; Prototype Grids; 'One' Production Grid
• Experiments: BaBar, D0, CDF, ATLAS, CMS, LHCb, ALICE
• Resources: 19 UK Institutes; RAL Computer Centre; CERN Computer Centre
• Prototype centres: 4 UK Prototype Tier-2 Centres; UK Prototype Tier-1/A Centre; CERN Prototype Tier-0 Centre
• Production centres: 4 UK Tier-2 Centres; UK Tier-1/A Centre; CERN Tier-0 Centre
• Grid projects shown: SAMGrid, BaBarGrid, LCG, EDG, GANGA, EGEE, ARDA]