High Energy Physics and Data Grids

Paul Avery
University of Florida
http://www.phys.ufl.edu/~avery/
[email protected]

US/UK Grid Workshop, San Francisco
August 4-5, 2001
Essentials of High Energy Physics

Better name: “Elementary Particle Physics”
Science: elementary particles and fundamental forces

Particles:
- Quarks: (u, d), (c, s), (t, b)
- Leptons: (e, νe), (μ, νμ), (τ, ντ)

Forces:
- Strong: gluon
- Electroweak: γ, W±, Z⁰
- Gravity: graviton

Goal: a unified theory of nature
- Unification of forces (Higgs, superstrings, extra dimensions, …)
- Deep connections to the large-scale structure of the universe
- Large overlap with astrophysics, cosmology, nuclear physics
HEP Short History and Frontiers

Distance scale, energy, time after the Big Bang, and the milestones at each frontier:

10⁻¹⁰ m   ~10 eV                   >300,000 yr   1900-: quantum mechanics, atomic physics
10⁻¹⁵ m   MeV - GeV                ~3 min        1940-50: quantum electrodynamics; 1950-65: nuclei, hadrons, symmetries, field theories; 1965-75: quarks, gauge theories
10⁻¹⁶ m   >> GeV                   ~10⁻⁶ sec     1970-83: SPS: electroweak unification, QCD
10⁻¹⁸ m   ~100 GeV                 ~10⁻¹⁰ sec    1990: LEP: 3 families, precision electroweak; 1994: Tevatron: top quark
10⁻¹⁹ m   ~10² GeV                 ~10⁻¹² sec    The next step: 2007, LHC: Higgs? Supersymmetry? Origin of masses
10⁻³² m   ~10¹⁶ GeV                ~10⁻³² sec    Proton decay? (underground) Grand Unified Theories?
10⁻³⁵ m   ~10¹⁹ GeV (Planck scale) ~10⁻⁴³ sec    Quantum gravity? Superstrings? The origin of the universe

(Inset Feynman diagram: u ū → Z → e⁺ e⁻)
HEP Research

- Experiments are primarily accelerator based: fixed target, colliding beams, special beams
- Detectors: small, large, general purpose, special purpose
- … but a wide variety of other techniques: cosmic rays, proton decay, g-2, neutrinos, space missions
- Increasing scale of experiments and laboratories:
  - Forced on us by ever higher energies
  - Complexity, scale, and costs → large collaborations
  - International collaborations are the norm today
  - Global collaborations are the future (LHC)
- LHC discussed in the next few slides
The CMS Collaboration

1809 physicists and engineers, 31 countries, 144 institutions

                    Scientists   Laboratories
Member States       1010         58
Non-Member States    448         50
USA                  351         36
Total               1809         144

Associated institutes: 365 scientists

(Map of participating countries: Armenia, Austria, Belarus, Belgium, Bulgaria, China, China (Taiwan), Croatia, Cyprus, Estonia, Finland, France, Georgia, Germany, Greece, Hungary, India, Italy, Korea, Pakistan, Poland, Portugal, Russia, Slovak Republic, Spain, Switzerland, Turkey, UK, Ukraine, USA, Uzbekistan, plus CERN)
CERN LHC Site

(Aerial view of the LHC ring with the four experiment sites: CMS, ATLAS, LHCb, ALICE)
High Energy Physics at the LHC

(The “Compact” Muon Solenoid at the LHC (CERN), drawn beside the Smithsonian standard man for scale)
Collisions at LHC (2007?)

- Proton-proton collisions: 2835 bunches/beam, 10¹¹ protons/bunch
- Beam energy: 7 TeV (7×10¹² eV)
- Luminosity: 10³⁴ cm⁻²s⁻¹
- Crossing rate: 40 MHz (every 25 nsec), average ~20 collisions per crossing
- Collision rate: ~10⁹ Hz
- New physics rate: ~10⁻⁵ Hz
- Selection: 1 in 10¹³

(Diagram: partons (quarks, gluons) in the colliding protons produce, e.g., Higgs → Z⁰ Z⁰ → e⁺e⁻ e⁺e⁻, plus leptons, jets, SUSY, …)

The ~20 collisions per crossing and the ~10⁹ Hz collision rate follow from the luminosity and crossing rate, as the sketch below checks.
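A minimal back-of-the-envelope check in Python; the inelastic pp cross section (~70 mb) is my assumption, not a number from the slide:

```python
# Check the pileup figure: collisions/crossing = L * sigma / crossing rate.
luminosity = 1e34        # cm^-2 s^-1 (slide value)
sigma_inel = 70e-27      # cm^2: ~70 mb inelastic pp cross section (assumed)
crossing_rate = 40e6     # Hz, i.e. every 25 ns (slide value)

collision_rate = luminosity * sigma_inel   # ~7e8 Hz, i.e. ~1e9 Hz
pileup = collision_rate / crossing_rate    # ~17.5, i.e. "~20 per crossing"
print(f"collision rate ~ {collision_rate:.0e} Hz, pileup ~ {pileup:.0f}")
```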
HEP Data

Scattering is the principal technique for gathering data:
- Collisions of beam-beam or beam-target particles
- Typically caused by a single elementary interaction
- But background collisions also obscure the physics

Each collision generates many particles: an “event”
- Particles traverse the detector, leaving an electronic signature
- Information is collected and put into mass storage (tape)
- Each event is independent → trivial computational parallelism

Data intensive science:
- Size of raw event record: 20 KB - 1 MB
- 10⁶ - 10⁹ events per year
- BaBar (SLAC): 0.3 PB per year (2001)
- CDF, D0 (Fermilab): 1 PB per year (2005)
- ATLAS, CMS (LHC): 5 PB per year (2007)

These volumes are simply events/year times event size, as the sketch below shows.
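A minimal sketch of the arithmetic; the pairing of event counts with event sizes here is illustrative (my choice), not per-experiment data from the talk:

```python
# Yearly raw-data volume = (events per year) x (bytes per event).
PB = 1e15  # bytes per petabyte

def yearly_volume_pb(events_per_year, event_size_bytes):
    """Raw data volume per year, in petabytes."""
    return events_per_year * event_size_bytes / PB

# Slide ranges: 20 KB - 1 MB per event, 1e6 - 1e9 events per year.
print(yearly_volume_pb(1e9, 0.3e6))  # 0.3 PB/yr: BaBar scale (2001)
print(yearly_volume_pb(1e9, 1e6))    # 1 PB/yr of raw data: LHC scale; the
                                     # quoted 5 PB/yr presumably also counts
                                     # derived/simulated data (my reading)
```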
Data Rates: From Detector to Storage

Detector readout:                                       40 MHz    ~1000 TB/sec
Level 1 trigger (special hardware) →                    75 kHz    75 GB/sec
Level 2 trigger (commodity CPUs) →                      5 kHz     5 GB/sec
Level 3 trigger (commodity CPUs, physics filtering) →   100 Hz    100 MB/sec raw data to storage

The per-level rejection factors are computed in the sketch below.
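The cascade is easiest to see as successive rejection factors; a minimal sketch using the slide's rates (the ~1 MB event size is taken from the later CMS slide):

```python
# Rejection factor at each trigger level, from the slide's rates.
levels = [
    ("detector readout", 40e6),  # Hz
    ("after Level 1",    75e3),
    ("after Level 2",     5e3),
    ("after Level 3",    100.0),
]
for (name, rate), (next_name, next_rate) in zip(levels, levels[1:]):
    print(f"{name} -> {next_name}: keep 1 in {rate / next_rate:.0f}")
# Overall: 40e6 / 100 = a 400,000x reduction; at ~1 MB/event the surviving
# 100 Hz is the slide's 100 MB/sec written to storage.
```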
LHC Data Complexity

“Events” resulting from beam-beam collisions:
- The signal event is obscured by ~20 overlapping, uninteresting collisions in the same crossing
- CPU time does not scale from previous generations (2000 vs. 2007)
Example: Higgs Decay into 4 Muons

40M events/sec, selectivity: 1 in 10¹³

(Event displays: all charged tracks with pt > 2 GeV vs. reconstructed tracks with pt > 25 GeV, with +30 minimum-bias events overlaid)
LHC Computing Challenges

- Complexity of the LHC environment and the resulting data
- Scale: petabytes of data per year (100 PB by ~2010), millions of SpecInt95s of CPU
- Geographical distribution of people and resources: 1800 physicists, 150 institutes, 32 countries
Transatlantic Net WG (HN, L. Price)

Tier0 - Tier1 BW Requirements [*]

Experiment    2001      2002    2003    2004    2005    2006
CMS            100       200     300     600     800    2500
ATLAS          100       200     300     600     800    2500
BaBar          300       600    1100    1600    2300    3000
CDF            600      1200    1600    2000    3000    4000
D0             600      1200    1600    2000    3000    4000
BTeV            20        40     100     200     300     500
DESY           100       180     210     240     270     300
CERN BW    155-310       622    1250    2500    5000   10000

[*] Installed BW in Mbps; maximum link occupancy 50%; work in progress
Hoffmann LHC Computing Report 2001

Tier0 - Tier1 link requirements:

(1) Tier1 ↔ Tier0 data flow for analysis: 0.5 - 1.0 Gbps
(2) Tier2 ↔ Tier0 data flow for analysis: 0.2 - 0.5 Gbps
(3) Interactive collaborative sessions (30 peak): 0.1 - 0.3 Gbps
(4) Remote interactive sessions (30 flows peak): 0.1 - 0.2 Gbps
(5) Individual (Tier3 or Tier4) data transfers: 0.8 Gbps (limit to 10 flows of 5 MBytes/sec each)

TOTAL per Tier0 - Tier1 link: 1.7 - 2.8 Gbps

Corresponds to ~10 Gbps baseline BW installed on the US-CERN link; adopted by the LHC experiments (Steering Committee report). The sketch below adds up the flows and applies the 50% occupancy rule from the previous slide.
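```python
# Sum the itemized per-link flows (Gbps) and size the installed link.
flows_gbps = [
    (0.5, 1.0),  # (1) Tier1 <-> Tier0 analysis
    (0.2, 0.5),  # (2) Tier2 <-> Tier0 analysis
    (0.1, 0.3),  # (3) interactive collaborative sessions
    (0.1, 0.2),  # (4) remote interactive sessions
    (0.8, 0.8),  # (5) individual Tier3/Tier4 transfers (capped)
]
low = sum(lo for lo, hi in flows_gbps)    # 1.7 Gbps
high = sum(hi for lo, hi in flows_gbps)   # 2.8 Gbps
# Max 50% link occupancy doubles the installed requirement; headroom and
# growth take it to the ~10 Gbps baseline the slide quotes.
print(f"needed {low:.1f}-{high:.1f} Gbps -> install >= {2 * high:.1f} Gbps")
```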
LHC Computing Challenges

Major challenges are associated with:
- Scale of the computing systems
- Network distribution of computing and data resources
- Communication and collaboration at a distance
- Remote software development and physics analysis

Result of these considerations: Data Grids
Global LHC Data Grid Hierarchy

- Tier0: CERN
- Tier1: National laboratory
- Tier2: Regional center (university, etc.)
- Tier3: University workgroup
- Tier4: Workstation

(Diagram: Tier 0 at CERN fans out to Tier 1 centers, each serving several Tier 2 centers, which in turn serve Tier 3 workgroups and Tier 4 workstations)

Key ideas:
- Hierarchical structure
- Tier2 centers
- Operate as a unified Grid

A toy model of the hierarchy is sketched below.
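A toy rendering of the hierarchy as a tree, useful for reasoning about where data and requests flow; the site names and fan-outs are invented for illustration:

```python
# Toy model of the LHC tier hierarchy (names and fan-outs are invented).
from dataclasses import dataclass, field

@dataclass
class Site:
    name: str
    tier: int
    children: list = field(default_factory=list)

    def walk(self):
        """Yield this site and everything below it."""
        yield self
        for child in self.children:
            yield from child.walk()

cern = Site("CERN", 0, [
    Site("national lab A", 1, [Site("regional center A1", 2),
                               Site("regional center A2", 2)]),
    Site("national lab B", 1, [Site("regional center B1", 2)]),
])
for site in cern.walk():
    print("  " * site.tier + f"Tier{site.tier}: {site.name}")
```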
Example: CMS Data Grid

- Online system: one bunch crossing per 25 nsec, 100 triggers per second, each event ~1 MByte
- Detector → online system: ~PBytes/sec
- Online system → CERN computer center (Tier 0+1, >20 TIPS): ~100 MBytes/sec
- CERN → Tier 1 national centers (USA, France, Italy, UK): 2.5 Gbits/sec
- Tier 1 → Tier 2 centers: ~622 Mbits/sec
- Tier 2 → Tier 3 institutes (~0.25 TIPS each, physics data cache): physicists work on analysis “channels”; each institute has ~10 physicists working on one or more channels
- Institutes → Tier 4 workstations and other portals: 100 - 1000 Mbits/sec

Experiment resource ratios: CERN/outside ~1:2; Tier0 : ΣTier1 : ΣTier2 ~1:1:1
Tier1 and Tier2 Centers

Tier1 centers:
- National-laboratory scale: large CPU, disk, and tape resources
- High speed networks
- Many personnel with broad expertise
- Central resource for a large region

Tier2 centers:
- New concept in the LHC distributed computing hierarchy
- Size ~ [national lab × university]^(1/2), i.e., the geometric mean (see the sketch below)
- Based at a large university or small laboratory
- Emphasis on small staff, simple configuration & operation

Tier2 role:
- Simulations, analysis, data caching
- Serve a small country, or a region within a large country
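The sizing rule is just a geometric mean; a minimal sketch with placeholder capacities (the units and numbers are mine, not the talk's):

```python
# Tier2 size = geometric mean of lab-scale and university-scale capacity.
import math

national_lab = 1_000_000   # e.g. CPU capacity in SpecInt95s (hypothetical)
university = 10_000        # university workgroup scale (hypothetical)

tier2 = math.sqrt(national_lab * university)
print(f"Tier2 ~ {tier2:,.0f} SpecInt95s")   # ~100,000: between the two scales
```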
LHC Tier2 Center (2001)

(Diagram: WAN → router → GEth switch feeding FEth switches; data server with RAID and tape, attached via a hi-speed channel)
Hardware Cost Estimates

Buy late, but not too late: phased implementation
- R&D phase 2001-2004; implementation phase 2004-2007
- R&D to develop capabilities and the computing model itself
- Prototyping at increasing scales of capability & complexity

(Chart: hardware price/performance time constants of 1.4, 1.2, 1.1, and 2.1 years; the “buy late” arithmetic is sketched below)
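If the chart's time constants are price/performance doubling times (my reading of the figure, not stated in the extracted text), the "buy late" logic is easy to quantify:

```python
# "Buy late": if price/performance doubles every T years, the same capability
# bought d years later costs a factor 2**(-d/T) as much.
# Interpreting the chart's constants (1.4, 1.2, 1.1, 2.1 years) as doubling
# times is an assumption; T = 1.2 years is used as the example here.
def cost_factor(delay_years, doubling_time_years):
    return 2 ** (-delay_years / doubling_time_years)

print(f"{cost_factor(3.0, 1.2):.2f}")  # ~0.18: 3 years later, ~18% of the cost
```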
HEP Related Data Grid Projects

Funded projects:
- GriPhyN (USA): NSF, $11.9M + $1.6M
- PPDG I (USA): DOE, $2M
- PPDG II (USA): DOE, $9.5M
- EU DataGrid (EU): $9.3M

Proposed projects:
- iVDGL (USA): NSF, $15M + $1.8M + UK
- DTF (USA): NSF, $45M + $4M/yr
- DataTag (EU): EC, $2M?
- GridPP (UK): PPARC, > $15M

Other national projects:
- UK e-Science (> $100M for 2001-2004)
- Italy, France, (Japan?)
(HEP Related) Data Grid Timeline

Q2 2000 - Q3 2001, in rough chronological order:
- Submit GriPhyN proposal, $12.5M
- Outline of US-CMS Tier plan
- GriPhyN approved, $11.9M + $1.6M
- Caltech-UCSD install prototype Tier2
- Submit iVDGL preproposal
- EU DataGrid approved, $9.3M
- 1st Grid coordination meeting
- Submit PPDG proposal, $12M
- Submit DataTAG proposal, $2M
- Submit DTF proposal, $45M
- Submit iVDGL proposal, $15M
- PPDG approved, $9.5M
- 2nd Grid coordination meeting
- DataTAG approved; iVDGL approved? DTF approved?
Coordination Among Grid Projects

Particle Physics Data Grid (US, DOE):
- Data Grid applications for HENP
- Funded 1999, 2000 ($2M); funded 2001-2004 ($9.4M)
- http://www.ppdg.net/

GriPhyN (US, NSF):
- Petascale Virtual-Data Grids
- Funded 9/2000 - 9/2005 ($11.9M + $1.6M)
- http://www.griphyn.org/

European Data Grid (EU):
- Data Grid technologies, EU deployment
- Funded 1/2001 - 1/2004 ($9.3M)
- http://www.eu-datagrid.org/

In common:
- HEP
- Focus: infrastructure development & deployment
- International scope
- Now developing a joint coordination framework (GridPP, DTF, iVDGL very soon?)
Data Grid Management
PPDG

(Venn diagram: PPDG sits at the intersection of the experiments' data-management efforts on one side - BaBar, D0, CDF, CMS, ATLAS, and nuclear physics data management - and the middleware teams and their user communities on the other: the Globus team, Condor, the SRB team, and HENPGC, each with its users)
EU DataGrid Project

Work Package   Title                                    Lead contractor
WP1            Grid Workload Management                 INFN
WP2            Grid Data Management                     CERN
WP3            Grid Monitoring Services                 PPARC
WP4            Fabric Management                        CERN
WP5            Mass Storage Management                  PPARC
WP6            Integration Testbed                      CNRS
WP7            Network Services                         CNRS
WP8            High Energy Physics Applications         CERN
WP9            Earth Observation Science Applications   ESA
WP10           Biology Science Applications             INFN
WP11           Dissemination and Exploitation           INFN
WP12           Project Management                       CERN
PPDG and GriPhyN Projects

PPDG focuses on today's (evolving) problems in HENP:
- Current HEP: BaBar, CDF, D0
- Current NP: RHIC, JLAB
- Future HEP: ATLAS, CMS

GriPhyN focuses on tomorrow's solutions:
- ATLAS, CMS, LIGO, SDSS
- Virtual data, “petascale” problems (petaflops, petabytes)
- Toolkit, export to other disciplines, outreach/education

Both emphasize:
- Application science drivers
- CS/application partnership (reflected in funding)
- Performance

Explicitly complementary
PPDG Multi-site Cached File Access System

(Diagram: a primary site with data acquisition, tape, CPU, disk, and robot; satellite sites with tape, CPU, disk, and robot; universities with CPU, disk, and users)

Services: resource discovery, matchmaking, co-scheduling/queueing, tracking/monitoring, problem trapping + resolution. A toy matchmaking sketch follows below.
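As an illustration of the matchmaking idea (in the spirit of Condor ClassAds; this trivial ranking is invented, not PPDG's actual mechanism):

```python
# Toy matchmaker: pair a cached-file access request with a suitable site.
# Site names, fields, and the ranking rule are invented for illustration.
sites = [
    {"name": "primary",    "cached": {"run17.dat"}, "free_cpus": 2},
    {"name": "satellite1", "cached": {"run17.dat", "run18.dat"}, "free_cpus": 8},
    {"name": "university", "cached": set(), "free_cpus": 16},
]

def match(request):
    """Prefer sites that already cache the file, then the most free CPUs."""
    candidates = [s for s in sites if s["free_cpus"] >= request["cpus"]]
    candidates.sort(key=lambda s: (request["file"] in s["cached"],
                                   s["free_cpus"]), reverse=True)
    return candidates[0]["name"] if candidates else None

print(match({"file": "run17.dat", "cpus": 4}))   # -> satellite1
```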
GriPhyN: PetaScale Virtual-Data Grids

~1 petaflop, ~100 petabytes

(Architecture diagram: production teams, individual investigators, and workgroups use interactive user tools; these drive virtual data tools, request planning & scheduling tools, and request execution & management tools; underneath sit resource management services, security and policy services, and other Grid services, over raw data sources, transforms, and distributed resources: code, storage, CPUs, networks)
Virtual Data in Action

A data request may:
- Compute locally or remotely
- Access local or remote data

Scheduling based on:
- Local policies
- Global policies
- Cost

(Diagram: item requests move between local facilities/caches, regional facilities/caches, and major facilities/archives)

A toy cost-based planner is sketched below.
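A minimal sketch of the cost-driven choice: for a requested item, compare local and remote data access against recomputation; all names and costs are invented for illustration:

```python
# Toy cost-based planner: satisfy an item request however it is cheapest.
# The options, costs, and availability flags are invented for illustration.
options = {
    "local cache":    {"cost": 1,  "available": False},  # cheapest if present
    "regional cache": {"cost": 10, "available": True},
    "major archive":  {"cost": 50, "available": True},
    "recompute":      {"cost": 30, "available": True},   # CPU instead of network
}

feasible = {name: o["cost"] for name, o in options.items() if o["available"]}
plan = min(feasible, key=feasible.get)
print(plan)   # -> "regional cache"; real planners would also apply policies
```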
GriPhyN Goals for Virtual Data

Transparency with respect to location:
- Caching and catalogs in a large-scale, high-performance Data Grid

Transparency with respect to materialization:
- Exact specification of algorithm components
- Traceability of any data product
- Cost of storage vs. CPU vs. networks

Automated management of computation:
- Issues of scale, complexity, transparency
- Complications: calibrations, data versions, software versions, …

Explore the concept of virtual data and its applicability to data-intensive science. A toy virtual-data catalog is sketched below.
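A toy sketch of materialization transparency: a catalog records how each product is derived, so a request either returns a cached product or recomputes it from its recorded provenance (these structures are invented, not GriPhyN's design):

```python
# Toy virtual-data catalog: each product is either materialized or derivable
# from a recorded (transform, inputs) recipe, which also gives traceability.
catalog = {
    "raw":     {"data": [1, 2, 3], "recipe": None},
    "calib":   {"data": None, "recipe": (lambda xs: [2 * x for x in xs], ["raw"])},
    "summary": {"data": None, "recipe": (sum, ["calib"])},
}

def materialize(name):
    entry = catalog[name]
    if entry["data"] is None:                    # not yet materialized
        transform, inputs = entry["recipe"]
        args = [materialize(i) for i in inputs]  # recurse through provenance
        entry["data"] = transform(*args)         # cache (materialize) result
    return entry["data"]

print(materialize("summary"))   # 12, derived via raw -> calib -> summary
```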
Data Grid Reference Architecture

Layers, bottom to top:
- Fabric: storage systems, compute systems, networks, catalogs, code repositories
- Connectivity: communication, service discovery (DNS), authentication, delegation
- Resource: storage mgmt protocol, compute mgmt protocol, network mgmt protocol, catalog mgmt protocol, code mgmt protocol, service registration protocol, enquiry protocol
- Collective: replica selection services, replica management services, request planning services, request management services, community authorization service, online certificate repository, information services, co-allocation services, distributed catalog services, consistency management services, system monitoring services, resource brokering services, usage accounting services
- Application: discipline-specific Data Grid applications

A minimal lookup-table rendering of the layering follows below.