Physics Grids and Open Science Grid
Paul Avery, University of Florida ([email protected])
CASC Meeting, Washington, DC
July 14, 2004
U.S. “Trillium” Grid Consortium
Trillium = PPDG + GriPhyN + iVDGL
  Particle Physics Data Grid (PPDG): $12M (DOE), 1999 – 2004+
  GriPhyN: $12M (NSF), 2000 – 2005
  iVDGL: $14M (NSF), 2001 – 2006
Basic composition (~150 people)
  PPDG: 4 universities, 6 labs
  GriPhyN: 12 universities, SDSC, 3 labs
  iVDGL: 18 universities, SDSC, 4 labs, foreign partners
  Experiments: BaBar, D0, STAR, JLab, CMS, ATLAS, LIGO, SDSS/NVO
Complementarity of projects
  GriPhyN: CS research, Virtual Data Toolkit (VDT) development
  PPDG: “end to end” Grid services, monitoring, analysis
  iVDGL: Grid laboratory deployment using VDT
  Experiments provide frontier challenges
  Unified entity when collaborating internationally
Trillium Science Drivers
  Experiments at the Large Hadron Collider: 100s of Petabytes, 2007 – ?
  High Energy & Nuclear Physics experiments: ~1 Petabyte (1000 TB), 1997 – present
  LIGO (gravity wave search): 100s of Terabytes, 2002 – present
  Sloan Digital Sky Survey: 10s of Terabytes, 2001 – present
[Chart: data growth and community growth trends, 2001–2009]
Future Grid resources: massive CPU (PetaOps), large distributed datasets (>100 PB), global communities (1000s)
Large Hadron Collider (LHC) @ CERN
Search for the origin of mass & supersymmetry (2007 – ?)
27 km tunnel spanning Switzerland & France
[Diagram: the LHC ring with its experiments: CMS, ATLAS, LHCb, ALICE, TOTEM]
Trillium: Collaborative Program of Work
  Common experiments, leadership, participants
  CS research: workflow, scheduling, virtual data
  Common Grid toolkits and packaging: Virtual Data Toolkit (VDT) + Pacman packaging
  Common Grid infrastructure: Grid3, a national Grid for testing, development, and production
  Advanced networking: Ultranet, UltraLight, etc.
  Integrated education and outreach effort, plus collaboration with outside projects
  Unified entity in working with international projects: LCG, EGEE, South America
Goal: Peta-scale Virtual-Data Grids for Global Science
[GriPhyN architecture diagram: single researchers, workgroups, and production teams use interactive user tools backed by virtual data tools, request planning & scheduling tools, and request execution & management tools; these sit on resource management services, security and policy services, and other Grid services that manage distributed resources (code, storage, CPUs, networks), raw data sources, and transforms. Targets: PetaOps, Petabytes, performance]
LHC: Petascale Global Science
  Complexity: millions of individual detector channels
  Scale: PetaOps (CPU), 100s of Petabytes (data)
  Distribution: global distribution of people & resources
  CMS example (2007): 5000+ physicists, 250+ institutes, 60+ countries
  BaBar/D0 example (2004): 700+ physicists, 100+ institutes, 35+ countries
Global LHC Data Grid Hierarchy (CMS experiment)
[Diagram: the online system feeds the Tier 0 CERN Computer Center at 0.1 - 1.5 GBytes/s; data flows down to Tier 1 national centers (USA, Korea, Russia, UK), to Tier 2 centers (UCSD, Caltech, U Florida, Iowa), and on to Tier 3 institute physics caches (Maryland, FIU) and Tier 4 PCs; link speeds range from 10-40 Gb/s and >10 Gb/s near the top of the hierarchy to 2.5-10 Gb/s further down]
~10s of Petabytes/yr by 2007-08; ~1000 Petabytes in < 10 yrs?
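A quick sanity check of the volumes above: a minimal Python sketch (arithmetic only; the ~10 PB/yr figure is the slide's own estimate) of the sustained network rate such a yearly volume implies.

    # Back-of-envelope: what sustained rate does ~10 PB/yr imply?
    PB = 1e15                          # bytes per petabyte (decimal convention)
    SECONDS_PER_YEAR = 365 * 24 * 3600

    volume_bytes = 10 * PB             # slide's estimate: ~10s of PB/yr by 2007-08
    rate_gbps = volume_bytes * 8 / SECONDS_PER_YEAR / 1e9
    print(f"10 PB/yr ~ {rate_gbps:.1f} Gb/s sustained")
    # -> roughly 2.5 Gb/s, comparable to a single 2.5-10 Gb/s tier link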
HEP Bandwidth Roadmap (Gb/s)
HEP: leading role in network development/deployment

Year   Production   Experimental   Remarks
2001   0.155        0.622-2.5      SONET/SDH
2002   0.622        2.5            SONET/SDH; DWDM; GigE integration
2003   2.5          10             DWDM; 1+10 GigE integration
2005   10           2-4 × 10       λ switching; λ provisioning
2007   2-4 × 10     ~1 × 40        1st-generation λ grids
2009   1-2 × 40     ~5 × 40        40 Gb/s λ switching
2011   ~5 × 40      ~25 × 40       Terabit networks
2013   ~Terabit     ~MultiTbps     ~Fill one fiber
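The production column implies a steady compounding of capacity; a short Python sketch (one assumption is mine: reading "~Terabit" in 2013 as ~1000 Gb/s) computes the implied annual growth factor.

    # Implied annual growth of HEP production bandwidth, 2001 -> 2013
    start_gbps, end_gbps = 0.155, 1000.0   # 1000 is my reading of "~Terabit"
    years = 2013 - 2001
    growth = (end_gbps / start_gbps) ** (1 / years)
    print(f"~{growth:.2f}x per year")      # ~2.1x: capacity roughly doubles annually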
Tier2 Centers
  Tier2 facility: 20 – 40% of a Tier1? “1 FTE support”: commodity CPU & disk, no hierarchical storage
  Essential university role in extended computing infrastructure
  Validated by 3 years of experience with proto-Tier2 sites
  Functions: physics analysis, simulations; support experiment software; support smaller institutions
  Official role in Grid hierarchy (U.S.)
    Sanctioned by MOU with parent organization (ATLAS, CMS, LIGO)
    Local P.I. with reporting responsibilities
    Selection by collaboration via a careful process
Analysis by Globally Distributed Teams
  Non-hierarchical: chaotic analyses + productions
  Superimpose significant random data flows
Trillium Grid Tools: Virtual Data Toolkit
[Diagram: VDT build pipeline. Sources from CVS are patched into GPT source bundles and, together with contributor builds (VDS, etc.), built, tested, and packaged on the NMI Build & Test Condor pool (37 computers); the resulting RPMs and binaries are tested and published to a Pacman cache. Plan: use NMI processes later]
Virtual Data Toolkit: Tools in VDT 1.1.12
  Globus Alliance: Grid Security Infrastructure (GSI), job submission (GRAM), information service (MDS), data transfer (GridFTP), Replica Location Service (RLS)
  Condor Group: Condor/Condor-G, DAGMan, Fault Tolerant Shell, ClassAds
  EDG & LCG: Make Gridmap, Certificate Revocation List updater, Glue Schema/info provider
  ISI & UC: Chimera & related tools, Pegasus
  NCSA: MyProxy, GSI-OpenSSH
  LBL: PyGlobus, NetLogger
  Caltech: MonALISA
  VDT: VDT System Profiler, configuration software
  Others: KX509 (U. Michigan)
VDT Growth (currently 1.1.14)
[Chart: number of components per VDT release, Jan 2002 - May 2004]
  VDT 1.0: Globus 2.0b, Condor 6.3.1
  VDT 1.1.3, 1.1.4 & 1.1.5: pre-SC 2002
  VDT 1.1.7: switch to Globus 2.2
  VDT 1.1.8: first real use by LCG
  VDT 1.1.11: Grid3
  VDT 1.1.14: May 10, 2004
Packaging of Grid Software: Pacman
  Language: define software environments
  Interpreter: create, install, configure, update, verify environments
  Combines and manages software from arbitrary sources: LCG/Scram, ATLAS/CMT, CMS DPE/tar/make, LIGO/tar/make, OpenSource/tar/make, Globus/GPT, NPACI/TeraGrid/tar/make, D0/UPS-UPD, Commercial/tar/make
  “1-button install” reduces the burden on administrators:
    % pacman -get iVDGL:Grid3
  Remote experts define installation/configuration/updating for everyone at once
[Diagram: one pacman command pulls software from the VDT, ATLAS, NPACI, D-Zero, iVDGL, UCHEP, CMS/DPE, and LIGO caches]
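To make the "1-button install" idea concrete, here is a minimal Python sketch (a conceptual model, not Pacman's actual implementation; the cache contents and script names are invented for illustration) of remotely published package descriptions resolved depth-first, the way a pacman -get of iVDGL:Grid3 pulls in its dependencies.

    # Conceptual sketch of cache-based, dependency-resolving installation.
    # CACHES stands in for package descriptions published by remote experts.
    CACHES = {
        "iVDGL": {"Grid3": {"depends": ["VDT:Globus", "VDT:Condor"],
                            "install": "setup-grid3.sh"}},
        "VDT": {"Globus": {"depends": [], "install": "setup-globus.sh"},
                "Condor": {"depends": [], "install": "setup-condor.sh"}},
    }

    def get(spec, installed=None):
        """Install 'cache:package' after its dependencies, skipping repeats."""
        installed = installed if installed is not None else set()
        if spec in installed:
            return
        cache, name = spec.split(":")
        pkg = CACHES[cache][name]
        for dep in pkg["depends"]:        # dependencies first
            get(dep, installed)
        print("running", pkg["install"])  # stand-in for fetch/configure steps
        installed.add(spec)

    get("iVDGL:Grid3")   # one command installs the whole environment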
Trillium Collaborative Relationships: Internal and External
[Diagram: computer science research and the Virtual Data Toolkit exchange requirements, prototyping & experiments, and techniques & software; production deployment on U.S. and international Grids, plus outreach, connects Trillium to partner physics projects, partner outreach projects, and the larger science community (Globus, Condor, NMI, iVDGL, PPDG, EU DataGrid, LHC experiments, QuarkNet, CHEPREO), with technology transfer outward. Other linkages: work force, CS researchers, industry]
Grid3: An Operational National Grid
  28 sites: universities + national labs
  2800 CPUs; 400–1300 jobs
  Running since October 2003
  Applications in HEP, LIGO, SDSS, genomics
  http://www.ivdgl.org/grid3
Grid3: Three Months Usage
Production Simulations on Grid3
US-CMS Monte Carlo simulation
[Chart: CPU usage split between USCMS and non-USCMS sites; resources used ≈ 1.5 × the dedicated US-CMS resources]
Grid3 → Open Science Grid
  Build on Grid3 experience: a persistent, production-quality Grid of national + international scope
  Ensure U.S. leading role in international science: Grid infrastructure for large-scale collaborative scientific research
  Create a large computing infrastructure: combine resources at DOE labs and universities to effectively become a single national computing infrastructure for science
  Provide opportunities for educators and students: participate in building and exploiting this Grid infrastructure; develop and train the scientific and technical workforce; transform the integration of education and research at all levels
  http://www.opensciencegrid.org
Roadmap towards Open Science Grid
  Iteratively build & extend Grid3 into a national infrastructure
    Shared resources benefiting a broad set of disciplines
    Strong focus on operations
    Grid3 → Open Science Grid
  Contribute US-LHC resources to provide the initial federation: US-LHC Tier-1 and Tier-2 centers provide significant resources
  Build OSG from laboratories, universities, campus grids, etc.
    Argonne, Fermilab, SLAC, Brookhaven, Berkeley Lab, Jefferson Lab
    UW Madison, U Florida, Purdue, Chicago, Caltech, Harvard, etc.
    Corporations?
  Further develop OSG
    Partnerships with and contributions from other sciences and universities
    Incorporation of advanced networking
    Focus on general services, operations, end-to-end performance
Possible OSG Collaborative Framework
[Diagram: a Consortium Board (1) oversees Technical Groups (0…n, small), joint committees (0…N, small), and activities (0…N, large); participants include research Grid projects, VOs, researchers, sites, service providers, campuses, labs, and enterprise]
Participants provide: resources, management, project steering groups
Open Science Grid Consortium
  Join U.S. forces into the Open Science Grid consortium: application scientists, technology providers, resource owners, ...
  Provide a lightweight framework for joint activities
    Coordinating activities that cannot be done within one project
    Achieving technical coherence and a strategic vision (roadmap)
    Enabling communication and providing liaising
    Reaching common decisions when necessary
  Create Technical Groups: propose, organize, and oversee activities; peer and liaise
    Now: security and storage services
    Later: support centers, policies
  Define activities: contributing tasks provided by participants joining the activity
Applications, Infrastructure, Facilities
Open Science Grid Partnerships
Partnerships for Open Science Grid
  A complex matrix of partnerships brings both challenges & opportunities:
    NSF and DOE; universities and labs; physics and computing departments
    U.S. LHC and Trillium Grid projects
    Application sciences, computer sciences, information technologies
    Science and education
    U.S., Europe, Asia, North and South America
    Grid middleware and infrastructure projects
    CERN and U.S. HEP labs; LCG and experiments; EGEE and OSG
  ~85 participants on the OSG stakeholder mail list
  ~50 authors on the OSG abstract for the CHEP conference (Sep. 2004)
  Meetings
    Jan. 12: initial stakeholders meeting
    May 20-21: joint Trillium Steering meeting to define the program
    Sep. 9-10: OSG workshop, Boston
Education and Outreach
Grids and the Digital Divide, Rio de Janeiro, Feb. 16-20, 2004
[Screenshot: conference website with bulletins, registration, travel and hotel information, participant list, program, and contacts]
Background: World Summit on the Information Society; HEP Standing Committee on Inter-regional Connectivity (SCIC)
Themes: global collaborations, Grids, and addressing the Digital Divide
Next meeting: 2005 (Korea)
http://www.uerj.br/lishep2004
iVDGL, GriPhyN Education / Outreach
  Basics: $200K/yr, led by UT Brownsville
  Workshops, portals
  Partnerships with CHEPREO, QuarkNet, …
June 21-25 Grid Summer School
  First of its kind in the U.S. (South Padre Island, Texas): 36 students of diverse origins and types (M, F, MSIs, etc.)
  Marks a new direction for Trillium
    First attempt to systematically train people in Grid technologies
    First attempt to gather relevant materials in one place
    Today: students in CS and physics; later: students, postdocs, junior & senior scientists
  Reaching a wider audience
    Put lectures, exercises, and video on the web
    More tutorials, perhaps 3-4/year
    Dedicated resources for remote tutorials
    Create a “Grid book”, e.g. Georgia Tech
  New funding opportunities: NSF's new training & education programs
CHEPREO: Center for High Energy Physics Research and Educational Outreach, Florida International University
  Physics Learning Center
  CMS research
  iVDGL Grid activities
  AMPATH network (S. America)
  Funded September 2003: $4M initially (3 years), spanning 4 NSF directorates!
UUEO: A New Initiative
  Meeting April 8 in Washington, DC brought together ~40 outreach leaders (including NSF)
  Proposed a Grid-based framework for a common E/O effort
Grid Project References
  Grid3: www.ivdgl.org/grid3
  PPDG: www.ppdg.net
  GriPhyN: www.griphyn.org
  iVDGL: www.ivdgl.org
  Globus: www.globus.org
  LCG: www.cern.ch/lcg
  EU DataGrid: www.eu-datagrid.org
  EGEE: egee-ei.web.cern.ch
  “The Grid” book, 2nd edition: www.mkp.com/grid2
Extra Slides
Outreach: QuarkNet-Trillium Virtual Data Portal
  More than a web site:
    Organize datasets
    Perform simple computations
    Create new computations & analyses
    View & share results
    Annotate & enquire (metadata)
    Communicate and collaborate
  Easy to use, ubiquitous, no tools to install
  Open to the community: grow & extend
  Initial prototype implemented by graduate student Yong Zhao and M. Wilde (U. of Chicago)
GriPhyN Achievements
  Virtual Data paradigm to express science processes
    Unified language (VDL) to express general data transformations
    Advanced planners, executors, monitors, predictors, and fault recovery to make the Grid “like a workstation”
  Virtual Data Toolkit (VDT)
    Tremendously simplified installation & configuration of Grids
    Close partnership with, and adoption by, multiple sciences: ATLAS, CMS, LIGO, SDSS, bioinformatics, EU projects
  Broad education & outreach program (UT Brownsville)
    25 graduate students, 2 undergraduates; 3 CS PhDs by end of 2004
    Virtual Data for the QuarkNet Cosmic Ray project
    Grid Summer School 2004, with 3 MSIs participating
Grid3 Broad Lessons
  Careful planning and coordination are essential to build Grids: community investment of time/resources
  An operations team is needed to run the Grid as a facility
    Tools, services, procedures, documentation, organization
    Security, account management, multiple organizations
  Strategies are needed to cope with increasingly large scale: “interesting” failure modes appear as scale increases
  Delegation of responsibilities to conserve human resources
    Project, virtual org., Grid service, site, application
    Better services, documentation, packaging
  Grid3 experience is critical for building “useful” Grids: frank discussion in the “Grid3 Project Lessons” doc
Grid2003 Lessons (1)
  Need for financial investment
    Grid projects: PPDG + GriPhyN + iVDGL: $35M
    National labs: Fermilab, Brookhaven, Argonne, LBL
    Other experiments: LIGO, SDSS
  Critical role of collaboration
    CS + 5 HEP experiments + LIGO + SDSS + biology
    Makes possible a large common effort
    Collaboration sociology accounts for ~50% of the issues
  Building “stuff”: prototypes → testbeds → production Grids
    Builds expertise, debugs software, develops tools & services
  Need for an “operations team”
    Point of contact for problems, questions, outside inquiries
    Resolution of problems by direct action, consultation w/ experts
Grid2003 Lessons (2): Packaging
  VDT + Pacman: installation and configuration
    Simple installation and configuration of Grid tools + applications
    Major advances over 14 VDT releases
  Why this is critical for us
    Uniformity and validation across sites
    Lowered barriers to participation → more sites!
    Reduced FTE overhead & communication traffic
    The next frontier: remote installation
Grid2003 and Beyond
  Further evolution of Grid3 (Grid3+, etc.)
    Contribute resources to a persistent Grid
    Maintain a development Grid; test new software releases
    Integrate software into the persistent Grid
    Participate in LHC data challenges
  Involvement of new sites: new institutions and experiments; new international partners (Brazil, Taiwan, Pakistan?, …)
  Improvements in Grid middleware and services: storage services, integrating multiple VOs, monitoring, troubleshooting, accounting
International Virtual Data Grid Laboratory
[Map: iVDGL sites (Tier1, Tier2, other): UF, FIU, Caltech, UCSD, ISI, LBL, Austin, Brownsville, KC, Iowa, UW Madison, UW Milwaukee, Chicago, Argonne, Fermilab, Indiana, Vanderbilt, Michigan, Hampton, PSU, J. Hopkins, Buffalo, Boston U, BNL; partners in the EU, Brazil, and Korea]
Roles of iVDGL Institutions
  U Florida: CMS (Tier2), management
  Caltech: CMS (Tier2), LIGO (management)
  UC San Diego: CMS (Tier2), CS
  Indiana U: ATLAS (Tier2), iGOC (operations)
  Boston U: ATLAS (Tier2)
  Harvard: ATLAS (management)
  UW Milwaukee: LIGO (Tier2)
  Penn State: LIGO (Tier2)
  Johns Hopkins: SDSS (Tier2), NVO
  U Chicago: CS, coordination/management, ATLAS (Tier2)
  Vanderbilt*: BTeV (Tier2, unfunded)
  U Southern California: CS
  UW Madison: CS
  UT Austin: CS
  Salish Kootenai: LIGO (outreach, Tier3)
  Hampton U: ATLAS (outreach, Tier3)
  UT Brownsville: LIGO (outreach, Tier3)
  Fermilab: CMS (Tier1), SDSS, NVO
  Brookhaven: ATLAS (Tier1)
  Argonne: ATLAS (management), coordination
“Virtual Data”: Derivation & Provenance
  Most scientific data are not simple “measurements”: they are computationally corrected/reconstructed, or produced by numerical simulation
  Science & engineering projects are increasingly CPU- and data-intensive
    Programs are significant community resources (transformations)
    So are the executions of those programs (derivations)
  Management of dataset dependencies is critical!
    Derivation: instantiation of a potential data product
    Provenance: complete history of any existing data product
  Previously: manual methods; GriPhyN: automated, robust tools
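The derivation/provenance bookkeeping described above can be made concrete with a minimal Python sketch (assumptions mine: the catalog layout and product names are invented for illustration) that records derivations and answers the recompute question posed on the next slide.

    # Minimal derivation/provenance catalog: transformations are programs,
    # derivations are recorded executions linking input and output products.
    from dataclasses import dataclass

    @dataclass
    class Derivation:
        transformation: str      # program that was run
        inputs: list             # data products consumed
        outputs: list            # data products generated

    catalog = [
        Derivation("calibrate",   ["raw.muon"],   ["calib.muon"]),
        Derivation("reconstruct", ["calib.muon"], ["tracks"]),
        Derivation("analyze",     ["tracks"],     ["histograms"]),
    ]

    def downstream(product):
        """All products depending (transitively) on `product`, i.e. what
        must be recomputed after, say, a muon calibration error."""
        hit, frontier = set(), {product}
        while frontier:
            nxt = set()
            for d in catalog:
                if frontier & set(d.inputs):
                    nxt |= set(d.outputs) - hit
            hit |= nxt
            frontier = nxt
        return hit

    print(downstream("raw.muon"))   # {'calib.muon', 'tracks', 'histograms'}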
Virtual Data Motivations
[Diagram: data is the product of a derivation; a derivation is the execution of a transformation; data is consumed by / generated by derivations]
  “I've detected a muon calibration error and want to know which derived data products need to be recomputed.”
  “I've found some interesting data, but I need to know exactly what corrections were applied before I can trust it.”
  “I want to search a database for 3-muon SUSY events. If a program that does this analysis exists, I won't have to write one from scratch.”
  “I want to apply a forward jet analysis to 100M events. If the results already exist, I'll save weeks of computation.”
Virtual Data Example: LHC Analysis
[Diagram: derivation tree for a Higgs-style analysis. A “mass = 160” dataset branches into “decay = WW”, “decay = ZZ”, and “decay = bb” derivations; the WW branch is refined in stages (WW → leptons, WW → e + X, Pt > 20), each followed by other cuts]
Scientist adds a new derived data branch & continues analysis
Sloan Digital Sky Survey (SDSS): Using Virtual Data in GriPhyN
[Plot: galaxy cluster size distribution in Sloan data; number of clusters (1 to 100,000, log scale) vs. number of galaxies per cluster (1 to 100, log scale)]
The LIGO International Grid
LIGO Grid: 6 US sites + 3 EU sites (Cardiff & Birmingham/UK, AEI-Golm/Germany)
LHO, LLO: observatory sites
LSC: LIGO Scientific Collaboration
[Map: LIGO Grid sites]
Grids: Enhancing Research & Learning
  Fundamentally alters the conduct of scientific research
    “Lab-centric”: activities center around a large facility
    “Team-centric”: resources shared by distributed teams
    “Knowledge-centric”: knowledge generated and used by a community
  Strengthens the role of universities in research
    Couples universities to data-intensive science
    Couples universities to national & international labs
    Brings front-line research and resources to students
    Exploits intellectual resources of formerly isolated schools
    Opens new opportunities for minority and women researchers
  Builds partnerships to drive advances in IT/science/engineering
    HEP ↔ physics, astronomy, biology, CS, etc.
    “Application” sciences ↔ computer science
    Universities ↔ laboratories
    Scientists ↔ students
    Research community ↔ IT industry