“Data Handling in HEP”
Towards LHC computing…
EPS Conference
July 22, 2003
David Stickland, Princeton University
Slide 2
Overview
Why GRIDs?
High Level Triggers: demonstration that LHC Physics can be selected by purely software triggers and “commodity” computing farms
Software: GEANT4 almost ready for prime-time; Persistency and Core Framework Software
Computing: PASTA summary of computing costs and projections; planning for LHC Computing; deploying an HEP 24x7 operational GRID
Slide 3
Why GRIDs?
The Computing must be distributed: Politics, Economics, Physics, Manpower, …
Optimized use of globally distributed resources: retrieve data from the remote disks with the best availability; submit jobs to the centers for which they are best suited; base these decisions on the current system status
Requires common protocols for the exchange of data and status information: the GRID
Slide 4
Without GRIDs?
Major experiments (CDF, D0, BaBar, Belle) are running now using only some GRID base components.
Production tasks can always be made to work (if there are enough resources).
The collaboration members need to be “organized” to avoid resource contention suffocating the system.
D0 developed the “SAM” system, an advanced Data Management system for HEP analysis (now also adopted by CDF). Regional Analysis Centers are becoming operational based on SAM. An important tool for the Tevatron (and a testing ground for LHC).
LHC data rates to tape are >20 times those of the Tevatron.
Widely spread, large collaborations require efficient access to what is expected to be a very rich physics environment.
Slide 5
The LCG: LHC Computing GRID
Building a Production GRID service: “LCG1” is ready for deployment now, for the 2003/4 Data Challenges
Developing and maintaining some of the base framework software
Infrastructure: Savannah, SCRAM, External Software, …
LCG Projects: POOL (Persistency), SEAL (Framework Services), …
LCG Contributions/Collaboration: ROOT, GEANT4, …
Deeply collaborating with the Experiment Software teams
Integrating the Worldwide GRID prototype software: GLOBUS, EDG, VDT, …
Collaborating with the recently approved EGEE EU project to build a heterogeneous and interoperable European GRID
Managed by the LHC Experiments, CERN and the Regional Centers
Slide 6
p-p collisions at LHC
Crossing rate: 40 MHz; event rate: ~10⁹ Hz
Max LV1 Trigger: 100 kHz; event size: ~1 MByte; readout network: 1 Terabit/s; Filter Farm: ~10⁷ SI2K; trigger levels: 2; online rejection: 99.9997% (100 Hz from 50 MHz); system dead time: ~ %; event selection: ~1/10¹³
Luminosity: Low 2×10³³ cm⁻² s⁻¹, High 10³⁴ cm⁻² s⁻¹
(Figure: event rate and “discovery” rate, with the Level-1 Trigger rate and the rate to tape indicated)
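To make the selection and storage numbers concrete, here is a small back-of-the-envelope sketch (standalone C++, not from the talk) that recomputes the online rejection and the resulting data rate to storage from the figures above:

```cpp
// Recompute the online selection numbers quoted above (all inputs taken from the slide).
#include <cstdio>

int main() {
    const double lv1_input_hz  = 50.0e6;  // ~50 MHz of interactions ahead of the software trigger
    const double hlt_output_hz = 100.0;   // ~100 Hz written to storage
    const double event_size_mb = 1.0;     // ~1 MByte per event

    const double rejection       = 1.0 - hlt_output_hz / lv1_input_hz;  // fraction rejected online
    const double rate_to_tape_mb = hlt_output_hz * event_size_mb;       // MB/s to storage

    std::printf("online rejection ~%.4f%% (slide quotes 99.9997%%)\n", rejection * 100.0);
    std::printf("rate to tape ~%.0f MB/s\n", rate_to_tape_mb);
    return 0;
}
```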
Slide 7
HLT Muon track reconstruction
Standalone Muon Reconstruction, “Level-2”: seeded by Level-1 muons; Kalman filtering technique applied to DT/CSC/RPC track segments; GEANE used for propagation through the iron; trajectory building works from inside out; track fitting works from outside in; the track is fit with a beam constraint
Inclusion of Tracker Hits, “Level-3”: define a region of interest through the tracker based on the L2 track with parameters at the vertex; find pixel seeds, and propagate from the innermost layers out, including the muon (a structural sketch follows below)
(Figure: Level-3 algorithmic efficiency for single muons with 10 < Pt < 100 GeV/c)
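For readers unfamiliar with the two-step structure described above, the following minimal C++ sketch illustrates only the flow; the structs, the cone-based region of interest and all numbers are illustrative stand-ins, not CMS reconstruction code:

```cpp
// Structural sketch of the two-step HLT muon chain (illustrative stand-ins, not CMS code).
#include <cmath>
#include <cstdio>
#include <vector>

struct Hit   { double eta, phi; };      // simplified tracker hit
struct Track { double pt, eta, phi; };  // simplified track parameters at the vertex

// "Level-2": stand-in for the Kalman fit of DT/CSC/RPC segments seeded by a Level-1 muon.
Track level2StandaloneMuon(const Track& l1Seed) {
    return l1Seed;  // a real fit would refine pt/eta/phi and apply the beam constraint
}

// "Level-3": keep only tracker hits inside a cone (region of interest) around the L2 track.
std::vector<Hit> regionOfInterest(const Track& l2, const std::vector<Hit>& hits, double dRmax) {
    const double kPi = 3.14159265358979323846;
    std::vector<Hit> selected;
    for (const Hit& h : hits) {
        double dEta = h.eta - l2.eta;
        double dPhi = std::fabs(h.phi - l2.phi);
        if (dPhi > kPi) dPhi = 2.0 * kPi - dPhi;  // wrap the azimuthal difference
        if (std::sqrt(dEta * dEta + dPhi * dPhi) < dRmax) selected.push_back(h);
    }
    return selected;
}

int main() {
    Track l1{20.0, 0.5, 1.2};  // hypothetical Level-1 muon candidate
    std::vector<Hit> trackerHits{{0.49, 1.21}, {0.52, 1.18}, {2.0, -1.0}};

    Track l2 = level2StandaloneMuon(l1);
    std::vector<Hit> roi = regionOfInterest(l2, trackerHits, 0.3);
    std::printf("Level-3 would refit %zu tracker hit(s) in the region of interest\n", roi.size());
    return 0;
}
```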
Slide 8
Inclusive b tagging at HLT
An inclusive b tag at HLT is possible, provided alignment is under control
Use tracks to define the Jet axis (if relying on the L1 Calo Jet, the signed IP is ~randomized)
Performance of simple signed-IP “track counting” tags is ~the same as after full track reconstruction (a toy version of the tag is sketched below)
Regional Tracking: look only in the Jet-track matching cone, with a loose Primary Vertex association
Conditional Tracking: stop a track as soon as a pixel seed is found (PXL) / 6 hits are found (Trk), or if Pt < 1 GeV with high C.L.
Timing: ~300 ms at low lumi, ~1 s at high lumi
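The signed-IP “track counting” idea mentioned above can be illustrated with a short, self-contained sketch; the significance cut and the required track multiplicity here are placeholder values, not the CMS HLT settings:

```cpp
// Illustrative sketch of a signed-IP "track counting" b tag (thresholds are placeholders).
#include <cstdio>
#include <vector>

struct TrackIP { double ip; double ipError; };  // signed impact parameter w.r.t. the jet axis

// Tag the jet if at least nTracksRequired tracks have a signed IP significance above sigCut.
bool trackCountingTag(const std::vector<TrackIP>& tracks, double sigCut = 3.0, int nTracksRequired = 2) {
    int n = 0;
    for (const TrackIP& t : tracks) {
        if (t.ipError > 0.0 && t.ip / t.ipError > sigCut) ++n;
        if (n >= nTracksRequired) return true;
    }
    return false;
}

int main() {
    std::vector<TrackIP> jetTracks{{0.020, 0.004}, {0.015, 0.004}, {-0.002, 0.004}};
    std::printf("b-tagged: %s\n", trackCountingTag(jetTracks) ? "yes" : "no");
    return 0;
}
```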
Slide 9
HLT table: LHC start…
Level-1 rate (“DAQ staging”): 50 kHz
Total rate: 105 Hz
Average HLT CPU: 300 ms on a 1 GHz processor
Improvements are possible
Channel efficiency (for fiducial objects):
H(115 GeV)→γγ 77%
H(160 GeV)→WW*→2μ 92%
H→ZZ→4μ 92%
A/H(200 GeV)→2τ 45%
SUSY (~0.5 TeV sparticles) ~60%
with R-parity violation ~20%
W→eν 67% (fid: 60%)
W→μν 69% (fid: 50%)
Top→μX 72%
HLT performance: priority to the discovery channels
Trigger               Threshold (ε = 90-95%) (GeV)   Indiv. rate (Hz)   Cumul. rate (Hz)
1e, 2e                29, 17                          34                 34
1γ, 2γ                80, (40*25)                     9                  43
1μ, 2μ                19, 7                           29                 72
1τ, 2τ                86, 59                          4                  76
Jet * Miss-ET         180 * 123                       5                  81
1-jet, 3-jet, 4-jet   657, 247, 113                   9                  89
e * jet               19 * 52                         1                  90
Inclusive b-jets      237                             5                  95
Calibration/other                                     10                 105
Slide 10
HLT: CPU usage
All numbers for a 1 GHz, Intel Pentium-III CPU
Trigger               CPU (ms)   Rate (kHz)   Total (s)
1e/γ, 2e/γ            160        4.3          688
1μ, 2μ                710        3.6          2556
1τ, 2τ                130        3.0          390
Jets, Jet * Miss-ET   50         3.4          170
e * jet               165        0.8          132
B-jets                300        0.5          150
Total: 4092 s for 15.1 kHz, i.e. 271 ms/event. The time is completely dominated by the slow GEANE extrapolation in the muon code and will improve. Consider a ~50% uncertainty.
Today: ~300 ms/event on a 1 GHz Pentium-III CPU
Physics start-up (50 kHz LVL1 output): need 15,000 CPUs
Moore’s Law: 8x faster CPUs in 2007, so ~40 ms/event in 2007 and ~2,000 CPUs, i.e. ~1,000 dual-CPU boxes in the Filter Farm (checked in the sketch below)
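A quick back-of-the-envelope check (not part of the talk) of the farm sizing quoted above:

```cpp
// Back-of-the-envelope check of the Filter Farm sizing quoted above (sketch only).
#include <cstdio>

int main() {
    const double cpu_total_s   = 4092.0;  // CPU-seconds per second of running (table above)
    const double lvl1_rate_khz = 15.1;    // rate assumed for the table
    const double ms_per_event  = cpu_total_s / (lvl1_rate_khz * 1000.0) * 1000.0;  // ~271 ms

    const double startup_rate_hz = 50e3;               // 50 kHz LVL1 output at physics start-up
    const double cpus_2003 = startup_rate_hz * 0.300;  // ~15,000 1 GHz CPUs at ~300 ms/event
    const double cpus_2007 = cpus_2003 / 8.0;          // ~2,000 CPUs if CPUs are 8x faster

    std::printf("%.0f ms/event; ~%.0f CPUs today; ~%.0f CPUs in 2007 (~%.0f dual-CPU boxes)\n",
                ms_per_event, cpus_2003, cpus_2007, cpus_2007 / 2.0);
    return 0;
}
```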
Slide 11
CMS Full Detector Simulation with GEANT4
Going into production for Data Challenge DC04 (now)
(Figure: Tracker reconstructed hits, GEANT4 vs GEANT3, as a function of …)
Slide 12
Muon Energy Loss in Liquid Argon
The Geant4 simulation (plus electronic noise) describes the beam test data well
(Figures: Geant4 vs beam test data for muon energy loss. Fraction of events per 0.1 GeV vs reconstructed energy [GeV] in the Electromagnetic Barrel Calorimeter EMB (Liquid Argon/Lead Accordion), for Eμ = 100 GeV, ημ ≈ 0.975; and calorimeter signal [nA] for 180 GeV μ in the Hadronic EndCap Calorimeter HEC (Liquid Argon/Copper Parallel Plate).)
Slide 13
Electron shower shapes in EMB
(Figure: energy deposited in the longitudinal samplings (presampler and samplings 1-3, each as a fraction of Ebeam in %) as a function of …, for 250 GeV e–, comparing Geant4, Geant3 and data.)
Geant4 electromagnetic showers for 20-245 GeV electrons in the EMB are more compact longitudinally than in Geant3.
The latest comparison with data shows that Geant4 does the better job overall.
A small discrepancy remains in the last sampling (2X0 out of the 24X0 full calorimeter depth), but the energy deposition there is very small and the uncertainty is large.
Slide 14
LCG Blueprint Software Decomposition
(Diagram: the blueprint decomposes the software into Foundation and Utility Libraries; Core Services (Dictionary, Whiteboard, PluginMgr, Monitor, Scheduler, Scripting, NTuple, FileCatalog); domain layers for Event Generation (EvtGen), Detector Simulation (Engine), Reconstruction (Algorithms), Calibration, Geometry and Event Model, Persistency (StoreMgr), Grid Services, and Interactive Services (Modeler, GUI, Analysis, Fitter); all built on external packages such as ROOT, GEANT4, DataGrid, Python, Qt, MySQL, FLUKA, …)
Building a Common Core Software Environment for the LHC Experiments
Slide 15
The LCG Persistency Framework
POOL is the LCG Persistency Framework: a Pool of persistent objects for LHC
Started in April ’02. A common effort in which the experiments take a major share of the responsibility for defining the system architecture and for the development of POOL components
The LCG POOL project provides a hybrid store integrating object streaming (e.g. ROOT I/O) with RDBMS technology (e.g. MySQL/Oracle) for consistent metadata handling
Strong emphasis on component decoupling and well-defined communication/dependencies
Transparent cross-file and cross-technology object navigation via C++ smart pointers (a schematic illustration follows below)
Integration with Grid technology (via EDG-RLS) while preserving both networked and grid-decoupled working models
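The cross-file navigation idea can be illustrated with a small sketch. Note that the class and function names below (ObjectRef, resolve) are illustrative stand-ins, not the actual POOL API:

```cpp
// Illustrative sketch of cross-file navigation through a smart pointer: the reference stores
// a file/container/object token and loads its target lazily, so user code can follow links
// between files and storage technologies transparently. Not the actual POOL API.
#include <cstdio>
#include <map>
#include <memory>
#include <string>

struct RawEvent { int id; };

// Stand-in for a file catalog + storage service: resolves a token to an object.
std::shared_ptr<RawEvent> resolve(const std::string& token) {
    static std::map<std::string, RawEvent> store{{"raw:file42:event7", {7}}};
    auto it = store.find(token);
    return it == store.end() ? nullptr : std::make_shared<RawEvent>(it->second);
}

// Smart pointer that dereferences by looking the object up on first use (lazy loading).
template <class T>
class ObjectRef {
    std::string token_;
    mutable std::shared_ptr<T> cached_;
public:
    explicit ObjectRef(std::string token) : token_(std::move(token)) {}
    const T* operator->() const {
        if (!cached_) cached_ = resolve(token_);  // load from the backing store on demand
        return cached_.get();
    }
};

int main() {
    ObjectRef<RawEvent> backLink("raw:file42:event7");  // e.g. stored inside a DST object
    std::printf("navigated back to raw event %d\n", backLink->id);
    return 0;
}
```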
Slide 16
POOL, Files and Navigation
The GRID mostly deals with data at file-level granularity; the File Catalog connects POOL to Grid Resources, e.g. via the EDG-RLS backend
The POOL Storage Service deals with the intra-file structure and needs its connection via standard Grid file access
Both File-based and Object-based Collections are seen as important end-user concepts; POOL offers a consistent interface to both types
The goal is transparent navigation back from an object in an “ntuple” through the DST and even back to the Raw Data
This gives the possibility to do a complex selection, deep-copy only the relevant data, and run a new calibration or reconstruction pass
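The selection-plus-deep-copy pattern described above can be sketched as follows; the types and the selection cut are illustrative only, not POOL or CMS code:

```cpp
// Illustrative sketch of the skimming pattern: run a selection on compact "ntuple"-level
// objects, then deep-copy only the referenced heavier payloads of the selected events.
#include <cstdio>
#include <vector>

struct DstRecord { double mass; std::vector<double> fullReco; };  // heavier payload
struct NtupleRow { double mass; const DstRecord* dst; };          // compact row + back link

int main() {
    std::vector<DstRecord> dstStore{{120.0, {1, 2, 3}}, {85.0, {4, 5}}, {118.0, {6}}};
    std::vector<NtupleRow> ntuple;
    for (const DstRecord& d : dstStore) ntuple.push_back({d.mass, &d});

    // Selection on the compact data, deep copy of only the matching heavier payloads.
    std::vector<DstRecord> skim;
    for (const NtupleRow& row : ntuple)
        if (row.mass > 100.0) skim.push_back(*row.dst);  // follow the back link, copy payload

    std::printf("selected %zu of %zu events for re-reconstruction\n", skim.size(), dstStore.size());
    return 0;
}
```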
A functionally complete POOL V1.1 release was produced in June
CMS and ATLAS are integrating and testing it now
(CMS hopes to have POOL in production by the end of the summer)
Slide 17
PASTA III Technology Review
A: Semiconductor Technology: Ian Fisk (UCSD/CMS), Alessandro Machioro (CERN), Don Petravik (Fermilab)
B: Secondary Storage: Gordon Lee (CERN), Fabien Collin (CERN), Alberto Pace (CERN)
C: Mass Storage: Charles Curran (CERN), Jean-Philippe Baud (CERN)
D: Networking Technologies: Harvey Newman (Caltech/CMS), Olivier Martin (CERN), Simon Leinen (Switch)
E: Data Management Technologies: Andrei Maslennikov (Caspur), Julian Bunn (Caltech/CMS)
F: Storage Management Solutions: Michael Ernst (Fermilab/CMS), Nick Sinanis (CERN/CMS), Martin Gasthuber (DESY)
G: High Performance Computing Solutions: Bernd Panzer (CERN), Ben Segal (CERN), Arie Van Praag (CERN)
Chair: David Foster; Editor: Gordon Lee
http://lcg.web.cern.ch/LCG/PEB/PASTAIII/pasta2002Report.htm
Slide 18
Basic System Components: Processors
Performance evolution and associated cost evolution for both high-end machines (15 k$ for a quad processor) and low-end machines (2 k$ for a dual CPU)
Slide 19
Network Progress
Network backbones are advancing rapidly to the 10 Gbps range
“Gbps” end-to-end throughput data flows will be in production soon (in 1-2 years)
Wide-area data migration/replication is now feasible and affordable. Tests of multiple streams to the US running over 24 hrs at the full capacity of 2 Gbit/s were successful (see the volume sketch below).
Network advances are changing the view of the networks’ roles. This is likely to have a profound impact on the experiments’ Computing Models and bandwidth requirements.
Advanced integrated applications, such as Data Grids, rely on seamless “transparent” operation of our LANs and WANs, with reliable, quantifiable (monitored), high performance.
Networks need to be integral parts of the Grid(s) design.
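As a rough volume check (a sketch, not from the talk), 2 Gbit/s sustained over 24 hours corresponds to roughly 20 TB:

```cpp
// Data volume implied by a 2 Gbit/s stream sustained for 24 hours (back-of-the-envelope).
#include <cstdio>

int main() {
    const double rate_gbit_s = 2.0;
    const double seconds     = 24.0 * 3600.0;
    const double terabytes   = rate_gbit_s * seconds / 8.0 / 1000.0;  // Gbit -> GByte -> TByte
    std::printf("~%.0f TB moved in 24 h at %.0f Gbit/s\n", terabytes, rate_gbit_s);  // ~22 TB
    return 0;
}
```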
Slide 20
HENP Major Links: Bandwidth Roadmap (in Gbps)
Year   Production             Experimental             Remarks
2001   0.155                  0.622-2.5                SONET/SDH
2002   0.622                  2.5                      SONET/SDH; DWDM; GigE Integ.
2003   2.5                    10                       DWDM; 1 + 10 GigE Integration
2005   10                     2-4 X 10                 Switch; Provisioning
2007   2-4 X 10               ~10 X 10; 40 Gbps        1st Gen. Grids
2009   ~10 X 10 or 1-2 X 40   ~5 X 40 or ~20-50 X 10   40 Gbps Switching
2011   ~5 X 40 or ~20 X 10    ~25 X 40 or ~100 X 10    2nd Gen Grids; Terabit Networks
2013   ~Terabit               ~MultiTbps               ~Fill One Fiber
Continuing the Trend: ~1000 Times Bandwidth Growth Per Decade
Slide 21
LHC Computing Outlook
The HEP trend is to fewer and bigger experiments: multi-Peta-Bytes, GB/s, MSI2k; worldwide collaborations, thousands of physicists, …
LHC experiments will be extreme cases, but CDF, D0, BaBar and Belle are approaching the same scale and tackling the same problems even now.
(Worldwide) hardware computing costs at LHC will be in the region of 50M€ per year
Worldwide software development for the GRID in HEP is also in this ballpark
With so few experiments, so many collaborators, so much money: we have to get this right (enough)…
Slide 22
LHC Data Grid Hierarchy
(Diagram: the experiment’s Online System (~PByte/sec from the detector) feeds the Tier 0 +1 CERN Center (PBs of disk, tape robot) at ~100-1500 MBytes/sec. Tier 1 centers (FNAL, IN2P3, INFN, RAL, …) connect at 2.5-10 Gbps, Tier 2 centers at ~2.5-10 Gbps, and institutes (Tier 3) and workstations (Tier 4) at 0.1 to 10 Gbps. Physics data cache of tens of Petabytes by 2007-8, an Exabyte ~5-7 years later. CERN/Outside resource ratio ~1:2; Tier0 / (sum of Tier1) / (sum of Tier2) ~ 1:1:1.)
Emerging Vision: A Richly Structured, Global Dynamic System
Slide 23
Scheduled Computing
Organized, scheduled simulation and large-scale event reconstruction is a task we understand “well”
We can make reasonably accurate estimates of the computing required
We can perform simple optimizations to share the work between the large computing centers (a toy split of this kind is sketched below)
(Figure: total computing power required by CMS, in kSI2k, per year from 2003 to 2008, broken down into Regional T2s, T1s, T1 at CERN and the CERN T0; the scale extends to 30,000 kSI2k.)
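As a toy illustration of the “simple optimization” mentioned above, the sketch below splits a fixed reconstruction workload across centers in proportion to their capacity; the center names and numbers are invented for the example:

```cpp
// Toy sketch: share a scheduled reconstruction workload across computing centers in
// proportion to their capacity (center names and numbers are illustrative only).
#include <cstdio>
#include <vector>

struct Center { const char* name; double capacity_kSI2k; };

int main() {
    const double totalEvents = 50e6;  // events to reconstruct (illustrative)
    std::vector<Center> centers{{"CERN T0/T1", 8000}, {"Tier-1 A", 3000}, {"Tier-1 B", 3000},
                                {"Regional T2s", 4000}};

    double totalCapacity = 0;
    for (const Center& c : centers) totalCapacity += c.capacity_kSI2k;

    for (const Center& c : centers) {
        double share = c.capacity_kSI2k / totalCapacity;  // fraction of the workload
        std::printf("%-12s gets %4.1f%% of the work (~%.1fM events)\n",
                    c.name, 100.0 * share, totalEvents * share / 1e6);
    }
    return 0;
}
```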
Slide 24
Chaotic Computing
Data Analysis is a “Feeding Frenzy”: data is widely dispersed and may be geographically mismatched to the available CPU
Choosing between data and job movement? How/when will we have the information to motivate those choices? (A toy version of this choice is sketched below.)
Move Data to Job: move only those parts of the data that the user really needs. All of some events, or some parts of some events? Very different resource requirements. Web-Services/Web-Caching may be the right technologies here.
Move Job to Data: the information required to describe the data requirements can (will) be complex and poorly described, making it difficult for a resource broker to make good scheduling choices. Current Resource Brokers are quite primitive.
Balancing the many priorities internal to an experiment is essential: completing the a-priori defined critical physics as quickly and correctly as possible, while enabling the collaboration to explore the full physics richness
Build a Flexible System, Avoid Optimizations now
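A toy sketch (illustrative only, not an LCG resource broker) of the move-data-vs-move-job choice: compare the estimated cost of shipping the input data against the estimated queueing cost where the data already sits. All numbers are assumptions:

```cpp
// Toy "move data vs. move job" decision (real brokers weigh many more factors).
#include <cstdio>

int main() {
    const double dataset_gb        = 500.0;  // input data the job needs (illustrative)
    const double wan_rate_gb_per_h = 90.0;   // ~2 Gbit/s effective wide-area throughput
    const double queue_remote_h    = 2.0;    // expected queue time at the site holding the data
    const double queue_local_h     = 0.5;    // expected queue time at the site with free CPU

    const double move_job_h  = queue_remote_h;                                   // run where the data is
    const double move_data_h = queue_local_h + dataset_gb / wan_rate_gb_per_h;   // ship the data first

    std::printf("move job: %.1f h, move data: %.1f h -> %s\n",
                move_job_h, move_data_h, move_data_h < move_job_h ? "move data" : "move job");
    return 0;
}
```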
Slide 25
(Some) Guiding Principles for LHC Computing
Access to data is more of a bottleneck than access to CPU: make multiple distributed copies as early as possible
The experiment needs to be able to enact its priority policy: stream data from Raw onwards (some overlap allowed); partition CPU according to experiment priorities
Initial detailed analysis steps will be run at the T1s, which need access to large data samples
T2s have (by definition?) more limited disk/network than the T1s: good for final analysis on small (TB) samples, provided there is rapid access to locally replicated copies; perfect for Monte-Carlo production
User analysis tasks are equal in magnitude to production tasks: 50% of resources for each, a self-correcting fraction (when a user task gets too big, there is strong motivation to make it a common production task)
Slide 26
Data Challenge CMS DC04
Starting now; the “true” DC04 runs in February 2004
(Diagram: the DC04 chain. Pre-Challenge Production: 50M events, 75 TByte, 1 TByte/day for 2 months, into the CERN tape archive. DC04 T0 challenge: Fake DAQ (CERN) and 1st-pass reconstruction at 25 Hz (1.5 MB/evt, 40 MByte/s, 3.2 TB/day), with a ~40 TByte CERN disk pool (~20 days of data), the MASTER Conditions DB, a disk cache and archive storage on the CERN tape archive; output streams of 25 Hz raw at 1 MB/evt and 25 Hz reco DST at 0.5 MB/evt. DC04 Calibration challenge: calibration sample and calibration jobs against a replica Conditions DB. DC04 Analysis challenge: event streams (Higgs DST, SUSY background DST, TAG/AOD at 20 kB/evt, with replicas) distributed from the T0 to T1 and T2 centers; a Higgs background study requests new events from an event server, possibly through an HLT filter(?).)
Slide 27
Deployment Goals for LCG-1: A First Production GRID
Production service for the Data Challenges in the second half of 2003 & 2004, initially focused on batch production work
Gain experience in close collaboration between the Regional Centers; participation must be wide enough to understand the issues
Learn how to maintain and operate a global grid
Focus on a production-quality service: robustness, fault-tolerance, predictability, and supportability take precedence; additional functionality gets prioritized
LCG should be integrated into the sites’ physics computing services; it should not be something apart
Slide 28
The Goal is the Physics, not the Computing…
Motivation: at L₀ = 10³³ cm⁻²s⁻¹, 1 fill (6 hrs) ~ 13 pb⁻¹, 1 day ~ 30 pb⁻¹, 1 month ~ 1 fb⁻¹, 1 year ~ 10 fb⁻¹ (a small conversion sketch follows below)
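A quick conversion sketch (not from the talk) relating instantaneous to integrated luminosity per fill; the slide's ~13 pb⁻¹ for a 6-hour fill is below the naive peak-luminosity value, presumably reflecting luminosity decay and other losses during the fill:

```cpp
// Convert peak instantaneous luminosity to integrated luminosity for one fill (sketch only).
#include <cstdio>

int main() {
    const double lumi_cm2_s      = 1.0e33;        // peak luminosity, cm^-2 s^-1
    const double fill_seconds    = 6.0 * 3600.0;  // one 6-hour fill
    const double cm2inv_to_pbinv = 1.0e-36;       // 1 pb = 1e-36 cm^2, so 1 cm^-2 = 1e-36 pb^-1

    const double int_lumi_pb = lumi_cm2_s * fill_seconds * cm2inv_to_pbinv;  // ~21.6 pb^-1 at peak
    std::printf("naive upper bound per fill: ~%.1f pb^-1 (slide quotes ~13 pb^-1)\n", int_lumi_pb);
    return 0;
}
```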
Most of the Standard-Model Higgs can be probed within a few months; ditto for SUSY
The turn-on of the detector, together with the computing and software, will be crucial