Tony Doyle, [email protected]
“GridPP2 Proposal and Responses to Questions”, Grid Steering Committee, Coseners, 28 July 2003
Tony Doyle - University of Glasgow
GridPP2 Proposal
1. Executive Summary
2. Outline
3. Introduction
   3.1 Experimental Motivation
   3.2 GridPP1: From Web to Grid
   3.3 GridPP2: From Prototype to Production
4. Experimental Requirements for the Production Grid
   4.1 CPU Requirements
   4.2 Storage Requirements
   4.3 Networking Requirements
   4.4 Total Hardware Requirements
5. The Grid Computing Hierarchy
   5.1 Tier-0
   5.2 Tier-1: A National and International UK Role
   5.3 Tier-2
   5.4 Summary of Tier Centre Production Services
6. Meeting the Experiments' Hardware Requirements
7. Grid Development
   7.1 Middleware, Security and Network Development
       7.1.1 Data and Storage Management
       7.1.2 Workload Management
       7.1.3 Information & Monitoring
       7.1.4 Security
       7.1.5 The Network Sector
   7.2 Summary of Middleware, Security and Network Services
   7.3 Application Interfaces
8. Management
9. Dissemination and Technology Transfer
   9.1 Dissemination
   9.2 Technology Transfer
10. Resource Request
   10.1 Overview
   10.2 Tier-0 Resources
   10.3 Tier-1 Resources
   10.4 Tier-2 Resources
   10.5 Application Interfaces
   10.6 Middleware, Security and Networking
   10.7 Management, Dissemination and Operations
   10.8 Travel and Consumables
   10.9 Resource Request Details and Interface Issues
   10.10 Resource Request Summary
11. Conclusion
12. Appendix
   12.1 Tier-0 Planning Document
   12.2 Tier-1 Planning Document
   12.3 Tier-2 Planning Document
   12.4 Middleware Planning Document
   12.5 Applications Planning Document
   12.6 Management Planning Document
   12.7 Travel Planning Document
   12.8 Hardware Requirements Planning Document
http://www.gridpp.ac.uk/docs/gridpp2/
~30 page proposal + figures/tables
+ 11 planning documents:
15. Tier-0
16. Tier-1
17. Tier-2
18. The Network Sector
19. Middleware
20. Applications
21. Hardware Requirements
22. Management
23. Travel
24. Dissemination
25. From Testbed to Production
GridPP Timeline…
• 29/10/02: CB ~outline approval for GridPP2 ("Prototype to Production") and EGEE ("Grid = Network++")
• Dec 02: EU Call for FP6 (EGEE)
• Jan 03: GridPP Tier-2 tenders; four ½ posts to be assigned to regional management (Testbed: Grid enablement of hardware)
• Feb 03: PPARC Call for e-Science proposals
• 19/02/03: CB review of GridPP II plans
• Apr 03: EGEE proposal submission
• ** Tier-1 and Tier-2 Centres not yet defined (work with prototypes) **
• End-May 03: GridPP II proposal
• Sep 03: Approval? (Limited) Tier-2 funding starts
• Dec 03: DataGrid funding ends
• ~Jan 04: DataGrid' as part of EGEE?
• Sep 04: Start of GridPP II Production Phase…
Production Grid
Whole Greater than the Sum of Parts…
[Diagram: software release and certification workflow. Work Packages (WPs) add unit-tested code to the repository; the Build System runs nightly builds and automated tests, with individual WP tests by the Integration Team on a Development Testbed (~15 CPU). Tagged releases (release candidates) selected for certification undergo Grid certification and application certification by the Test Group and Apps. Representatives on a Certification Testbed (~40 CPU), with problem reports fed back for fixing. Certified releases selected for deployment become tagged packages and certified public releases for use by applications and users on the Production Testbed (~1000 CPU, 24x7). Stages: Unit Test, Build, Certification, Production.]
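The workflow above is essentially a staged promotion protocol: a release only moves to a larger testbed once the current stage's tests pass, otherwise problem reports go back for fixing. A minimal sketch of that gating logic, with all names invented for illustration (this is not GridPP or EDG tooling):

    # Illustrative sketch of the staged release-promotion flow in the figure.
    # All names are hypothetical; this is not GridPP/EDG tooling.

    STAGES = ["development", "certification", "production"]  # ~15, ~40, ~1000 CPUs

    class Release:
        def __init__(self, tag):
            self.tag = tag
            self.stage = None          # not yet on any testbed
            self.problems = []

        def promote(self, passed_tests):
            """Move to the next testbed only when the current stage's tests pass;
            otherwise record a problem report for the integration team."""
            if not passed_tests:
                self.problems.append(f"{self.tag} failed at {self.stage}")
                return self.stage      # stays put until problems are fixed
            nxt = 0 if self.stage is None else STAGES.index(self.stage) + 1
            if nxt < len(STAGES):
                self.stage = STAGES[nxt]
            return self.stage

    r = Release("edg-2.0-rc1")                 # hypothetical tag
    r.promote(passed_tests=True)   # nightly build + unit tests -> development
    r.promote(passed_tests=True)   # tagged release -> certification testbed
    r.promote(passed_tests=True)   # certified release -> production (24x7)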
[Chart: FTE comparison (scale 0-10) of the GridPP1 Testbed Team (University WP6 posts, RAL WP6 post, WP8 post) with the proposed GridPP2 Production Team (Operations Manager, four Tier-2 Experts, two Tier-1 Experts, Applications Expert).]
From Testbed to Production
Experiment Requirements: UK only
[Charts: UK-only experiment requirements, 2004-2007, grouped into LHC and non-LHC.
CPU (kSI2000 years, scale 0-12000): ATLAS, CMS, LHCb, ALICE, Phenomenology, ZEUS, UKQCD, UKDMC, MINOS, MICE, LISA, D0, CDF, BaBar, ANTARES.
Disk (TB, scale 0-2500): ATLAS, CMS, LHCb, ALICE, Phenomenology, UKQCD, UKDMC, MINOS, MICE, D0, CRESST, CDF, BaBar, ANTARES.
Tape (TB, scale 0-3000): ATLAS, CMS, LHCb, ALICE, UKDMC, MINOS, MICE, D0, ANTARES.]
Total Requirement:
Year            2004   2005   2006   2007
CPU [kSI2000]   2395   4066   6380   9965
Disk [TB]        369    735   1424   2285
Tape [TB]        376    752   1542   2623
Projected Hardware Resources

Total Resources:
[Charts: total projected hardware resources in 2004 and 2007 (note the x2 scale change between the two plots).]
Tier Centres - Functionality

[Diagram: the tier hierarchy and its data products. Tier-0 (International), Tier-1 (National) and Tier-2 (Regional) each hold RAW, ESD, AOD and TAG data; Tier-3 (Local) holds data files. A TAG-based "interesting" events list (Event 1, Event 2, …, Event n) resolves each selected event to its RAW, ESD and AOD data files.]
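To make the data flow concrete: TAG data is a compact per-event summary, so selections run over TAGs locally and only the matching events' bulkier AOD (or ESD/RAW) files are fetched from higher tiers. A toy sketch, assuming an invented event-record layout (not any experiment's real event model):

    # Toy illustration of TAG-based selection; the record layout is invented.
    # Each event has a compact TAG record plus pointers to its RAW/ESD/AOD files.

    events = [
        {"id": 1, "tag": {"n_muons": 2, "missing_et": 45.0},
         "files": {"RAW": "raw_001.dat", "ESD": "esd_001.dat", "AOD": "aod_001.dat"}},
        {"id": 2, "tag": {"n_muons": 0, "missing_et": 12.0},
         "files": {"RAW": "raw_002.dat", "ESD": "esd_002.dat", "AOD": "aod_002.dat"}},
    ]

    # The selection runs over TAG data only (small, held locally) ...
    interesting = [e["id"] for e in events
                   if e["tag"]["n_muons"] >= 2 and e["tag"]["missing_et"] > 30.0]

    # ... and only the selected events' AOD files are fetched from a higher tier.
    to_fetch = [e["files"]["AOD"] for e in events if e["id"] in interesting]
    print(interesting, to_fetch)   # [1] ['aod_001.dat']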
GridPP2 Project Map

• Built-in metrics and tasks to identify progress…
[Project map, status date 1-Apr-03. GridPP goal: "To develop and deploy a large scale science Grid in the UK for the use of the Particle Physics community". The map breaks the goal into seven numbered areas (1 CERN, 2 DataGrid, 3 Applications, 4 Infrastructure, 5 Interoperability, 6 Dissemination, 7 Resources) and their numbered tasks, covering LCG Creation, Fabric, Technology and Deployment; WP1-WP8; the ATLAS/LHCb, CMS, BaBar, CDF/D0 and UKQCD applications; Tier-1, Tier-A, Testbed, Rollout and Data Challenges; Int. Standards, Open Source, Worldwide and UK Integration, and Monitoring; Engagement and Participation. Each task carries a status flag (Metric OK / Metric not OK / Task complete / Task overdue / Due within 60 days / Task not due soon / No longer active / No task or metric), with navigate up/down, external links, links to goals, and Update and Clear controls.]

[GridPP2 project map. GridPP2 goal: "To develop and deploy a large scale production quality grid in the UK for the use of the Particle Physics community", with seven numbered areas whose elements span the Production Grid (management, planning, rollout, deployment, monitoring, running Grid services), Grid middleware and technology (security, information & monitoring, data & storage, workload management, network support), LHC applications (ATLAS, CMS, LHCb, Ganga), non-LHC applications (BaBar, CDF, D0, UKQCD, PhenoGrid, other applications), LCG development and computing fabric, infrastructure and support, and dissemination (outreach, engagement, interoperability, technology transfer).]
Components: GridPP2 Proposal

[Pie chart: cost components of GridPP2, distinguishing GridPP2-proposal from externally funded elements. Proposal segments: Tier-0 Staff and Hardware, Tier-1 Staff and Hardware, Tier-2 Staff and Hardware, Application Integration, Middleware/Security/Network, Dissemination, Travel and Ops, Management. Externally funded segments: Application Development, Tier-2 Staff (Inst.), Tier-1 Staff (CLRC), Middleware/Security/Network. Segment shares range from 1% to 20%.]
*Note: added estimate of Tier-2 SRIF-2 hardware value.
• PPARC Call (27/2/03)
• GridPP2 Response (30/5/03)
• Projects Peer Review Panel (14-15/7/03)
• Grid Steering Committee (28-29/7/03)
• Science Committee (October 03)
Components: GridPP2 Proposal (£23.1m)
[Charts: GridPP2 proposal (£23.1m) spend profile, £0m-£14m per year over 2003-2008, split by component (Tier-0/1/2 staff and hardware, Tier-1 and 2 staff, application integration and development, middleware/security/network, dissemination, travel and operations, management) and into GridPP2-proposal versus externally funded elements. A companion bar chart (scale £m to £7m) gives component totals of £0.24m, £2.21m, £2.84m, £4.60m, £3.55m, £3.73m, £5.99m, £2.67m and £3.16m across Tier-0 Hardware, Tier-0 Staff, Tier-1 Hardware, Tier-1 Staff, Tier-2 Hardware, Tier-2 Staff, App. Integration, LHC and non-LHC Application Development, Middleware/Security/Network and Operations/Management/Dissem., annotated with external funding: 25% EGEE + others, 98% non-PPARC, 20% Institutes, Experiment Collaboration Bids, ~50% of £5m SRIF-1/SRIF-2, 15% CLRC; vertical integration across the components.]
Conclusion

We request £23.1m to fund our three-year project, GridPP2. This will provide a Production Grid incorporating:
1. access to the Tier-0 Centre at CERN and the LCG deployment releases
2. the UK Tier-1 Centre
3. integration of four distributed Tier-2 centres
4. technical development in Middleware, Security and Networking
5. Grid integration of the experiments

The project is in direct support of PPARC's highest priority science programme, the LHC.

Starting point for consideration of priorities.
GridPP Responses to Questions
1. “If funded at a 25% reduced level, what would the GridPP2 priorities be and what would be delivered?”

Prior to writing the planning documents and the proposal, we discussed the priorities needed to achieve a “Production Grid”: a robust, reliable, resilient, secure and stable service delivered to end-user applications. We adapt to reduced funding with this objective in mind.

GridPP2 Reduced Programme

2. “What extra added value would be bought/delivered by increasing funding above this level?”

We present this in terms of a 15% reduced-level scenario and the corresponding GridPP2 Regained Programme. (Other scenarios are possible.)
Priorities: GridPP2 Proposal

1. Tier-1/A staff – National Grid Centre
2. Tier-1/A hardware – International Role
3. Tier-2 staff – UK e-Science Grid
4. Applications – Grid Integration (GridPP2); Development (experiments' proposals)
5. Middleware – EU-wide development
6. Tier-2 hardware – non-PPARC funding
7. CERN staff – quality assurance
8. CERN hardware – pro-rata contribution

• Established entering the proposal-writing phase…

ALL of these are required to address the LHC Computing Challenge.
GridPP2 Priorities: £17.3m Funding Scenario

Cuts by area:
• Tier-1/A Staff and Hardware – 25% reduction
• Tier-2 Staff – 30% reduction
• Applications – 25% reduction
• Middleware, Security and Networking – 40% reduction
• Tier-0/LCG Contribution – fixed cost
• GridPP Management – fixed cost

Rationale:
• Highest priority – (inter)national Grid via LCG
• Best value for money – national Grid
• End-user driven project – Grid interfaces required
• Essential Grid & e-Science component – Grid development required
• Basis of deployment – international negotiation required with PPARC
• Added functionality – Operations Manager and Dissemination Officer

GridPP2 priorities: an end-user driven programme with a focus on Production Grid deployment, assessed by area. Risk: reduced ability to develop and maintain the Production Grid.
£17.3m Funding Scenario

[Charts: as for the proposal, but at £17.3m. Component totals (bar chart, scale £m to £7m): £0.24m, £2.21m, £2.13m, £3.43m, £2.61m, £2.79m, £3.76m, £2.12m and £3.16m across Tier-0 Hardware, Tier-0 Staff, Tier-1 Hardware, Tier-1 Staff, Tier-2 Hardware, Tier-2 Staff, App. Integration, LHC and non-LHC Application Development, Middleware/Security/Network and Operations/Management/Dissem., annotated with external funding: 28% EGEE + others, 27% Institutes, Experiment Collaboration Bids, 98% non-PPARC, ~50% of £5m SRIF-1/SRIF-2, 14% CLRC. Spend profile: £0m-£14m per year over 2003-2008 with the same GridPP2-proposal versus externally-funded component breakdown.]
Tier 0 and LCG: Foundation Programme

• Aim: retain UK influence
• Ensure the GridPP and LCG development programmes are linked: shared project management and expertise
• LCG establishes the global computing infrastructure
• Allows all participating physicists to exploit LHC data
• Proposed funding determined based on:
  – recently increased funding at CERN supporting LCG (recognition of Grid importance within the LHC programme)
  – an appropriate share for the UK
  – the requirements of LCG Phase 2
  – the past GridPP1 contribution

Required Foundation: LCG Fabric, Technology and Deployment.
Tier 1: Reduced System

Reduce Tier 1 manpower resources by 25%:

Staff [FTE]      GridPP2 Proposal   Reduced Services
CPU              2.0                2.0
Disk             1.5                1.5
AFS              0.5                0.0
Tape             2.5                2.5
Core Services    2.5                2.0
Operations       3.0                2.5
Networking       0.5                0.5
Security         1.0                0.0
Deployment       2.0                1.0
Experiments      2.0                1.0
Management       1.5                1.0
Total            19                 14

• Reduce hardware by 25%: 3 MSI2K of CPU and 850 TB of disk by 2007 (c.f. a requirement of 12 MSI2K and 2200 TB in 2007)

Risk: reduced international significance and ability to contribute to Grid Deployment.
[Charts: Tier-1 integrated disk (TB, scale 0-1400, 2002-2007) and CPU evolution (kSI2000, scale 0-2500, 2002-2007) under the proposal, 15% cut and 25% cut scenarios; the 25% cut reduces the numbers of CPUs and disks by 25% of the original year-1 figures.]
Tier 1: Reduced Services

Reduced Services (concentrate on delivery to LHC/LCG):
• Reduce Application Support by 1 FTE – no explicit support for BaBar at the Tier-A, and best-effort support for non-LHC experiments
• Reduce Management by 0.5 FTE – compromises the ability to contribute fully to external projects
• Reduce the Deployment Team by 1 FTE – reduced participation in and support for the Grid Deployment programme at the UK centre
• Remove Security Support (1 FTE) – spread the load across the Support Team; risk of security exposure and less focus on propagating security knowledge to other sites
• Drop AFS Support (0.5 FTE) – no specialist file service support
• Reduce Operations Support and Core Services by 1 FTE – slower fixing of broken boxes; reduced hardware utilisation
• Reduce Hardware by 25% – concentrate on data services; greater reliance on Tier-2 centres for CPU-intensive jobs

Risk: reduced international significance and ability to contribute to Grid Deployment.
Tier 2: Reduced System Support

Reduce Tier 2 manpower resources by 30%:
• Reduce Hardware Support by 1 FTE – reduces hardware support at a given Tier-2 centre by 50%
• Reduce User Support by 1 FTE (50%) – harder to induct users in Grid technology and support them afterwards
• Reduce Data Management by 1 FTE (50%) – no longer a dedicated specialist service (½ post) at each Tier-2 centre
• Remove the Network (1 FTE) and VO Management (1 FTE) services – services required by all e-Science Grid users; rely on the local centres to provide them, with increasing risk if they are not responsive to GridPP requirements

Staff [FTE]          GridPP2 Proposal     Reduced Service
                     Y1    Y2    Y3       Y1    Y2    Y3
Hardware Support     8.0   8.0   8.0      7.0   7.0   7.0
Core Services        4.0   4.0   4.0      4.0   4.0   4.0
User Support         2.0   2.0   2.0      1.0   1.0   1.0
Specialist Services:
  Security           1.0   1.0   1.0      1.0   1.0   1.0
  Resource Broker    1.0   1.0   1.0      1.0   1.0   1.0
  Network            1.0   1.0   1.0      0.0   0.0   0.0
  Data Management    2.0   2.0   2.0      1.0   1.0   1.0
  VO Management      1.0   1.0   1.0      0.0   0.0   0.0
Subtotal             20.0  20.0  20.0     15.0  15.0  15.0
Existing Staff       -4.0  -4.0  -4.0     -4.0  -4.0  -4.0
GridPP2              16.0  16.0  16.0     11.0  11.0  11.0
Total SY             48.0                 33.0

Risk: reduces the ability to access significant Tier-2 resources via the Production Grid by ~ one third.
Middleware, Security & Networking: Reduced Development

Reduce middleware manpower by 40%:
• Reduce Security by 1 FTE – omit the local access and usage control framework
• Reduce Information Services by 2 FTE – matching funding needed for EGEE participation; risks UK leadership; the other potential solution (MDS) falls short of LCG requirements
• Reduce Data & Storage by 1 FTE – reduced data replication capabilities
• No Workload Management development – no technology transfer from the Core Programme; no leverage of OGSA development
• Reduce Networking by 1 FTE – no active participation in the UKLight programme
• Rely upon non-GridPP developments – significantly increased project risk

Experiments' priorities:
1. Data & Storage – mission critical to PP
   • Information Services & Monitoring – essential for "understanding" the Grid
2. Security – robust against hackers and denial of service; secure file storage
   • Networking – PP input to performance and provisioning
3. Workload – brokering development highly desirable

Activity                   Proposal   Reduced
Security                   4.0        3.0
Info. Services & Monit.    5.0        3.0
Data & Storage             4.0        3.0
Workload                   2.5        0
Networking                 3.5        2.5
TOTAL                      19.0       11.5

Risk: reduced ability to develop and understand Middleware as part of the Production Grid environment.
Middleware, Security & Networking: Reduced Development

Task 1: Deployment and maintenance of the EDG and future LCG software base.
o Establishing a brokering infrastructure for UK resources.
o Deployment of the latest releases, investigation of problems and creation of bug fixes for the current EDG software base.

Task 2: Updating user requirements.
o Define the service granularity needed by the GridPP2 community.
o Define how these services should be made available to application developers.
o Define how these services should be made available to the user.

Task 3: Standardisation of workload-management-related services within the Global Grid Forum.
o Establishment of the Job Submission Description Language (JSDL) Working Group and contribution to its activity.
o Standardisation of Resource Brokering and Workflow service interfaces from a user and inter-service perspective.
o Working with established WGs (e.g. GRAAP) or new WGs to define interfaces that promote service interoperability with other work in this area, e.g. EGEE.

Task 4: OGSIfication of the EDG Resource Brokering software stack.
o Establish an instrumented testbed within LeSC resources to examine the scalability of workload management software.
o Develop OGSA-compliant interfaces to and within the workload management software to enable (for example) the submission of JSDL documents and the extraction of job status reports.
o Develop a notification interface so that other services may subscribe to workload management events (e.g. job completion).

Task 5: Redesign of the workload management architecture.
o Analysis of testbed scalability and reliability.
o Identification of bottlenecks in the current virtualised architecture.
o Identification of failures, to find the 'weakest link' and eliminate it.
o Performance- and reliability-driven replacement of identified problem areas with OGSA services.
o Improvements in workload management scalability through load balancing across multiple server instances.

Task 6: Development of a 'pure' Java client for interaction with the workload management infrastructure.
o Develop a toolkit to interact with workload management services.
o Develop a Java JSDL client to submit, monitor and retrieve jobs from the workload management infrastructure.
o Develop management tools to monitor and report on the performance of the workload management instances.

Task 7: Enhancements to the workload management infrastructure.
o Autonomic reaction to workload (i.e. as load increases and jobs fail, deploy and configure 'spare' RB units).
o Examination of alternative scheduling algorithms.
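For orientation, a JSDL document (Tasks 3, 4 and 6) is an XML description of a job (executable, arguments, resource requirements) that a client submits to a brokering service. The sketch below is hypothetical on both sides: the element names only follow the general JSDL idea (the Working Group was still being established at this time), and WorkloadClient is an invented stand-in for the toolkit Task 6 proposes:

    # Hypothetical sketch of JSDL-style job submission. Element names are
    # illustrative only, and WorkloadClient is an invented stand-in for the
    # Task 6 toolkit; a real client would POST the document to the broker.
    import xml.etree.ElementTree as ET

    def make_jsdl(executable, args, cpu_count):
        job = ET.Element("JobDefinition")
        app = ET.SubElement(job, "Application")
        ET.SubElement(app, "Executable").text = executable
        for a in args:
            ET.SubElement(app, "Argument").text = a
        res = ET.SubElement(job, "Resources")
        ET.SubElement(res, "TotalCPUCount").text = str(cpu_count)
        return ET.tostring(job, encoding="unicode")

    class WorkloadClient:
        """Invented client: submit a JSDL document and poll job status."""
        def __init__(self, broker_url):
            self.broker_url = broker_url      # placeholder endpoint
            self._jobs = {}

        def submit(self, jsdl_doc):
            job_id = f"job-{len(self._jobs) + 1}"
            self._jobs[job_id] = jsdl_doc
            return job_id

        def status(self, job_id):
            return "Running" if job_id in self._jobs else "Unknown"

    client = WorkloadClient("https://broker.example.org")
    jid = client.submit(make_jsdl("/opt/exp/bin/reco", ["--events", "1000"], 4))
    print(jid, client.status(jid))

A notification interface (Task 4) would extend this so that other services subscribe to job-state changes rather than polling.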
Task 4: Grid Optimisation.
o Improving the current Grid simulation tool (OptorSim) to develop better algorithms for Grid optimisation, and validating its predictions against the existing Tevatron experiments in a variety of experimental phases, including debugging, full-scale production, Monte Carlo production and pre-conference analysis. This comparison with real data will enable us to use the simulation to examine problems of practical concern to particle physics computing.
o Leveraging the latest research performed in GridPP and EDG to improve the capabilities and performance of the Optimisation module. The Optimisation module is part of the replica management system; its development is driven by the Grid models studied using OptorSim.
o Incorporation of the Optimisation module into the Fermilab SAM/JIM system.
o Continuing to build on strong relationships with the UK networking community, on how their expertise can add value to the overall data management software. Their input has already been vital to the grid optimisation component of the software.
o Defining and implementing the necessary interfaces between the data management system, the workload management system and the experiment software to enable effective job splitting.
o Examining how file metadata from the experiments can be used to improve the Optimisation module for particle physics usage.
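As a toy illustration of the kind of decision the Optimisation module makes (and emphatically not OptorSim's actual algorithm): given several sites holding replicas of a file, pick the source minimising estimated transfer time, and create a new local replica once a file becomes "hot". All numbers below are invented:

    # Toy replica-selection heuristic; NOT OptorSim's actual algorithm,
    # and all site names and link speeds are invented.

    def best_replica(replicas, bandwidth_mbps):
        """Pick the replica site with the lowest estimated transfer time.

        replicas: {site: file_size_gb}; bandwidth_mbps: {site: Mb/s to us}
        """
        def transfer_time(site):
            size_mbits = replicas[site] * 8000.0      # GB -> megabits
            return size_mbits / max(bandwidth_mbps.get(site, 1.0), 1.0)
        return min(replicas, key=transfer_time)

    def should_replicate_locally(recent_accesses, threshold=5):
        """Crude economic model: replicate once a file is accessed often."""
        return recent_accesses >= threshold

    sites = {"RAL": 2.0, "CERN": 2.0, "FNAL": 2.0}     # same 2 GB file everywhere
    links = {"RAL": 600.0, "CERN": 155.0, "FNAL": 45.0}
    print(best_replica(sites, links))                  # -> RAL
    print(should_replicate_locally(recent_accesses=7)) # -> True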
1. Local Access Control

The first task of this proposal involves removing these limitations by allowing users to belong to multiple Virtual Organisations and to use the associated authorization rights, group memberships, roles etc. in jobs. Initially, this will be done by supplying methods to dynamically create, manage and expire the Unix rights and privileges associated with local Pool Accounts, in accordance with the set of Grid-wide rights a particular job has. At first this feature will be especially useful for users engaged in operations and deployment, since it will allow them to assume the subset of rights necessary to perform a specific test, for example to test the Storage Element file servers at two sites which are members of two different Application VOs. As GridPP puts the regional Tier-2 model into practice, this feature will be essential to allow users to gain access to pooled resources by virtue of GridPP or Tier-2 membership, rather than by membership of a VO to which not all GridPP or Tier-2 sites belong. The first task also involves applying this dynamic Grid-to-local rights mapping to the local batch systems used to queue jobs for execution.

The task will then further develop the Grid Access Control List (GACL) language and the associated GACL handling libraries for C/C++ and Java environments. Initial versions of these were produced by GridPP and are used in the EDG Storage Element and in the GridSite and SlashGrid prototypes discussed below. Part of this subtask will involve co-ordination of GACL development with other access control language work, via the Global Grid Forum Authorization Working Group, co-chaired by Andrew McNab of Manchester HEP.

Finally, the first task envisages using the SlashGrid filesystem prototype to implement Grid-aware access control directly in the operating system layer, rather than translating it into local Unix privileges as in the initial Pool Accounts scheme. Although the existing SlashGrid prototype has most of this functionality already, some additional work will be required to interface it to sources of credentials as they enter the site via job submission, and to update the software as the GACL library is developed. By applying these controls at a low level, it is easier to produce access controls that are free of loopholes, implemented efficiently, and persistent, so that, for example, file ownership is preserved on disk for months and can be transferred along with the files onto backup media.

In summary, the stages of the task are:
a. Pool accounts and Unix permission management.
b. GACL access control language development.
c. Batch system access control interfaces.
d. SlashGrid filesystem access control.

2. Local Usage Control

The second task has a similar structure to the first, but concentrates on usage rather than access control, by which we mean control of the amount of resources available to a user, job or process, rather than the decision whether to allow access to a specific site, host or file. In parallel with the first task, this envisages initially mapping the permitted usage, quotas and limits of a given Grid identity to local Unix usage-limit mechanisms. In parallel with this, a language for describing these properties will be developed and used by administrators to specify permitted usage. Finally, the SlashGrid system will be extended to enforce these limits at the operating system level.

The work of this task will benefit greatly from co-operation with the GGF Accounting work already being led by the North West e-Science Centre and Manchester Computing, as discussed below. Both the first and second tasks assume that the EDG Virtual Organisation Membership Service (VOMS) attribute certificate system will be the only source of authorization credentials, and will use it as their source of authorization evidence. Subsequent tasks extend this to support other sources of authorization. The stages of the task are:
a. Interface to Unix resource usage-limit controls.
b. Usage control description development.
c. Batch system usage control interfaces.
d. SlashGrid filesystem usage control.
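The pool-accounts idea in the first task is easy to sketch: lease a generic local Unix account to a Grid identity for a bounded time, apply rights derived from its VO attributes, and expire the lease so accounts can be recycled. A minimal illustration; all names are hypothetical, and real VOMS and pool-account machinery is considerably more involved:

    # Minimal sketch of dynamic pool-account leasing. Names are hypothetical;
    # real VOMS / pool-account handling is considerably more involved.
    import time

    POOL = [f"pool{n:03d}" for n in range(1, 11)]   # generic local Unix accounts
    leases = {}                                     # grid DN -> (account, expiry)

    def expire_stale(now):
        """Expire old leases so accounts (and their files) can be recycled."""
        for dn, (acct, expiry) in list(leases.items()):
            if expiry < now:
                del leases[dn]

    def lease_account(grid_dn, vo_attributes, ttl=3600):
        """Map a Grid identity (plus its VO roles/groups) to a pool account."""
        expire_stale(now=time.time())
        if grid_dn in leases:
            return leases[grid_dn][0]               # reuse an existing lease
        in_use = {acct for acct, _ in leases.values()}
        free = [a for a in POOL if a not in in_use]
        if not free:
            raise RuntimeError("pool exhausted")
        leases[grid_dn] = (free[0], time.time() + ttl)
        # A real implementation would now create/adjust Unix groups, quotas
        # and permissions according to vo_attributes (VO, group, role).
        return free[0]

    acct = lease_account("/C=UK/O=eScience/CN=A N Other",
                         {"vo": "atlas", "role": "production"})
    print(acct)   # -> pool001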
Task 4: Schema & Information Provider co-ordination. An essential activity is the co-ordination of information schema production. This will be done through the production of guidance notes and template scripts, as well as through direct support channels. Standardisation of schemas is very important, both as a major component of the Grid Information Model and where interoperability between, or federation of, different grids is required. It will be necessary to participate in the development of schema standards such as GLUE, in the adaptation of standards such as CIM, and in associated developments at GGF (such as under the OGSA umbrella).
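An information provider in this style is typically a small script that gathers local fabric state and prints attribute/value pairs conforming to the agreed schema. The sketch below emits GLUE-flavoured attributes in an LDIF-like form; the attribute names are illustrative of the GLUE approach rather than a verified slice of the schema, and the batch-system query is stubbed out:

    # Sketch of a GLUE-style information provider. Attribute names illustrate
    # the GLUE approach rather than quoting a verified slice of the schema.

    def gather_ce_state():
        """Stub: a real provider would query the local batch system (PBS, ...)."""
        return {"free_cpus": 42, "running_jobs": 20, "waiting_jobs": 3}

    def emit_ldif(ce_id, state):
        """Print attribute/value pairs in an LDIF-like form for the info server."""
        lines = [
            f"dn: GlueCEUniqueID={ce_id},mds-vo-name=local,o=grid",
            f"GlueCEUniqueID: {ce_id}",
            f"GlueCEStateFreeCPUs: {state['free_cpus']}",
            f"GlueCEStateRunningJobs: {state['running_jobs']}",
            f"GlueCEStateWaitingJobs: {state['waiting_jobs']}",
        ]
        print("\n".join(lines))

    emit_ldif("ce.example.ac.uk:2119/jobmanager-pbs-short", gather_ce_state())

Template scripts of this shape (with the batch-system query filled in per site) are one way the guidance notes mentioned above could standardise schema production across centres.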
Task 5: Development & integration of displays and end-user tools. It will be desirable to provide a baseline set of tools and displays to exploit the GIMS (and thus avoid the development of a number of similar tools within different projects). Requirements for the set will be derived, though it is very likely that a wide diversity will result and it will only be possible to meet a basic subset. This set will be drawn from both existing tools and new developments where gaps exist. In either case the tools will be integrated with the GIMS and documentation developed. Existing packages such as Nagios, Ganglia and the EDG Fabric Management tools are current contenders for inclusion in the baseline. One or more information portals, of varying characteristics, will also be included, and it is anticipated that the need to federate information sources will influence choices. Tools will be chosen or developed to support three basic domains: application users, system administrators and grid administrators. The last category will include the tools necessary for use in a Grid Operations Centre.
Participation of GridPP2 in UKLight

HEFCE recently approved the construction of UKLight, a leading-edge optical networking research "point of presence" which will allow the UK to join the global optical networking research infrastructure. The UKLight facility will be situated in London, with links to StarLight and NetherLight, which are in turn connected to other facilities in the US, Canada, the Nordic countries and CERN. Connections within the UK to participating institutes will be via the SuperJANET development network. UKLight is described in more detail in the appendix.

With UKLight we are on the threshold of a new era of international research networking collaboration based upon an integrated global infrastructure. A mainline part of the scope of UKLight is to demonstrate the benefit of hybrid multi-service networks (meaning a mix of layers 1, 2 and 3) to high-profile, high-demand applications, and PP has been one of the highest profile of these. In practice UKLight will give access to "on demand" multi-Gbit/s dedicated circuits which can be routed to CERN, FNAL and possibly SLAC for high-performance data transport experiments.

GridPP should ensure that PP benefits from this infrastructure by collaborating directly in the first "early success project" in conjunction with other disciplines, including radio astronomy and possibly the HPC community for visualisation. The PP goal will of course be to show the benefit of "lightpath" switching to LHC and Tevatron and/or BaBar data delivery. It is important to emphasise that GridPP is not expected to resource the entire activity but, as for some middleware activities, to contribute PP-applications-focused effort to leverage a larger effort and thereby ensure that the outcome benefits PP. This activity will also leverage the benefits of direct collaboration with other directly related projects:
- Technical programme funding applications, to be submitted to industrial collaboration lines and future core programme calls.
- ULTRALIGHT: a US applications-driven project led by leading US PP organisations, with similar goals, i.e. demonstration of the benefit of multi-service networks to PP experiments and other applications. FNAL and SLAC have supported this project. We (the UK PP networking interest) are naturally already involved in supporting this proposal, in order to benefit from the wider expertise as well as the FNAL and SLAC buy-in to the project.
- A Framework 6 optical networking project which will be submitted. If successful, we would expect this to bring at least a matching EU-funded post to the UK.
Deliverables:
D3.1 [M6]: Static lightpath connectivity to one remote site.
D3.2 [M12]: Demonstration of high-capacity data transport for at least one experiment; static lightpath connectivity to all sites.
D3.3 [M24]: Dynamic lightpath connectivity; demonstration of high-capacity data transfer for the production requirements of two experiments, at least one of which must be an LHC experiment.
D3.4 [M36]: Strategic report describing the benefits of hybrid service networking to high-demand applications.
Resource: cuts taken from the Reduced Development Programme (Middleware, Security and Networking).
Applications: Reduced Development

• Reduce Applications Resources by 25%: 5 FTEs removed (fewer than the current programme)
  – cuts in the ongoing programme of work
  – loss of leadership within the experiments' programme
  – reduced non-LHC involvement
  – no new experiment involvement
• Long-term problem of running separate (non-Grid and Grid) computing systems
• Disenfranchises a sizeable part of the community:
  – MICE
  – Linear Collider
  – non-accelerator physics
  – phenomenology

Risks: loss of leadership and expertise; failure to engage the whole PP community.

[Diagram: Experiments A-D sharing servers through common Middleware.]
Regained Programme

• Regained Programme: reduce the GridPP2 programme by 15%, bringing the following areas back into the scope of the GridPP2 programme
• Note: some cuts are already included (in both scenarios):
  – Dissemination de-scoping: two-year funding within GridPP2
  – Spine Point Savings: SP11 used as the average for new University appointments
Losses [£m]                 Reduced Services (25% cut)   Regained Programme (15% cut)
Tier-1 staff                -0.99                        -0.62
Tier-1 hardware             -0.71                        -0.43
Tier-2 staff                -0.91                        -0.45
Middleware posts            -1.85                        -0.99
Application posts           -0.93                        -0.54
Dissemination de-scoping    -0.15                        -0.15
Spine Point Savings         -0.10                        -0.13
Total                       -5.7                         -3.4
The proposed cuts would compromise large areas of the GridPP2 prioritised programme. We propose to regain the GridPP2 priority areas, and so maintain the Production Grid, by reducing the severity of the cuts.
[Chart: Regained Programme spend profile, £0m-£14m per year over 2003-2008, with the same component breakdown as the earlier scenarios (Tier-0/1/2 staff and hardware, application integration and development, middleware/security/network, dissemination, travel and operations, management; GridPP2 proposal versus externally funded).]
Tier 1: Regained Programme

• The regained programme includes 13.5 (GridPP) + 3.0 (CCLRC) FTE
• Hardware: regain 3.5 MSI2K of CPU (+15%) and 1 PB of disk (+17.5%) by 2007
• System support: focus on the outward programme
• Restore 1 FTE for Experiment Support – continue as the BaBar Tier-A through to 2007
• Restore 1 FTE to Deployment – regain participation in and support for the Grid Deployment programme
• Restore 0.5 FTE to Management – enable participation in the wider Grid programme

Staff [FTE]      Reduced Services   Regained Programme
CPU              2.0                2.0
Disk             1.5                1.5
AFS              0.0                0.0
Tape             2.5                2.5
Core Services    2.0                2.0
Operations       2.5                2.5
Networking       0.5                0.5
Security         0.0                0.0
Deployment       1.0                2.0
Experiments      1.0                2.0
Management       1.0                1.5
Total            14.0               16.5

Restore Tier 1 manpower resources to 87% of the original proposal.
Restores international significance and the ability to lead Grid Deployment.
Tier 2: Regained Programme

Staff [FTE]          Reduced Service      Regained Programme
                     Y1    Y2    Y3       Y1    Y2    Y3
Hardware Support     7.0   7.0   7.0      4.0   8.0   8.0
Core Services        4.0   4.0   4.0      4.0   4.0   4.0
User Support         1.0   1.0   1.0      1.0   2.0   2.0
Specialist Services:
  Security           1.0   1.0   1.0      1.0   1.0   1.0
  Resource Broker    1.0   1.0   1.0      1.0   1.0   1.0
  Network            0.0   0.0   0.0      0.5   0.5   0.5
  Data Management    1.0   1.0   1.0      2.0   2.0   2.0
  VO Management      0.0   0.0   0.0      0.5   0.5   0.5
Subtotal             15.0  15.0  15.0     14.0  19.0  19.0
Existing Staff       -4.0  -4.0  -4.0     -4.0  -4.0  -4.0
GridPP2              11.0  11.0  11.0     10.0  15.0  15.0
Total SY             33.0                 40.0

• Reprofile Hardware Support to later years – delays the Production Grid roll-out but establishes longer-term support
• Reprofile and restore User Support to 2 FTE in the 2nd and 3rd years – induct users in Grid technology and support them as LHC turn-on approaches
• Restore Data Management to 2 FTE – allows a dedicated specialist service, inc. 0.5 FTE at each Tier-2
• Partially restore the Network and VO Management services to 0.5 FTE each – reduces reliance on the local centres and the risk that they are not responsive to GridPP requirements

Restore Tier 2 manpower resources to 86% of the original proposal.
Restores the ability to access managed Tier-2 resources via the Production Grid.
Middleware, Security & Networking: Regained Programme

• Restore 1.0 FTE Data & Storage – meet PP data replication requirements
• Restore 1.0 FTE Information & Monitoring – enable delivery of robust information services
• Restore 0.5 FTE Security – enable application of local site policies
• Restore 0.5 FTE Networking – enable direct UKLight participation
• Regain 1.5 FTE Workload – enable a viable job brokering development programme

• Regained programme defined by:
  – mission criticality (experiment-requirements driven)
  – International/UK-wide lead
  – leverage of EGEE, UK core and LCG developments

Activity                   Proposal   Reduced   Regained
Security                   4.0        3.0       3.5
Info. Services & Monit.    5.0        3.0       4.0
Data & Storage             4.0        3.0       4.0
Workload                   2.5        0         1.5
Networking                 3.5        2.5       3.0
TOTAL                      19.0       11.5      16.0

Restores the ability to develop key Middleware as part of the Production Grid environment.
Applications: Regained Programme

• Restore Applications Resources: 2 FTEs restored (regaining the current GridPP1 complement)
• The ongoing programme of work can continue
  – difficult to involve experiment activity not already engaged within GridPP
  – still a risk in providing Grid access across the PP community
  – would need re-scoping (or de-scoping) of current activities
• The project would need to build on cross-experiment collaboration; GridPP1 already has experience:
  – GANGA: ATLAS & LHCb
  – SAM: CDF & D0
  – Persistency: CMS & BaBar
• Encourage new joint developments across experiments

The current knowledge base is maintained and current engagement protected, with the ability to re-scope to engage the whole PP community.
Conclusions

• The GridPP2 proposal's strategic aim: meet all particle physics computing requirements via a production Grid
• A balanced programme
• Priorities focus on LCG development and deployment
• We recognise the challenge in going from prototype to production systems
• A 25% reduced funding scenario would require significant de-scoping; one scenario is presented, focussing on LHC end-user driven objectives
• A 15% reduced funding scenario would re-enable key aspects of the GridPP2 programme. Regain:
  – Tier-1: international significance and the ability to lead Grid Deployment
  – Tier-2: the ability to access managed Tier-2 resources via the Production Grid
  – Middleware: the ability to develop key Middleware as part of the Production Grid environment
  – Applications: maintained leadership and the ability to re-scope to engage the whole PP community