Slide 1: Open Science Grid: An Introduction
Ruth Pordes, Fermilab
Slide 2: OSG Provenance
[Timeline figure, 1999-2009: three grid projects, PPDG (DOE, from 1999), GriPhyN (NSF, from 2000) and iVDGL (NSF, from 2001), joined as the Trillium collaboration, deployed Grid3, and evolved into OSG (DOE+NSF).]
Slide 3: Introducing myself
- At Fermilab for 25 years (plus 2 years in the "pioneer" 1970s).
- Started on data acquisition for High Energy Physics experiments.
- A "builder" of the Sloan Digital Sky Survey.
- Led development of a common data acquisition system for 6 experiments at Fermilab (DART).
- Coordinator of the CDF/D0 Joint Run II offline projects (with Dane).
- Coordinator of the Particle Physics Data Grid SciDAC I collaboratory.
- Founder of the Trillium collaboration of iVDGL, GriPhyN and PPDG, and of GLUE interoperability between the US and EU.

Now I am variously:
- Executive Director of the Open Science Grid,
- an Associate Head of the Computing Division at Fermilab, and
- US CMS Grid Services and Interfaces Coordinator.
Slide 4: A Common Grid Infrastructure
Slide 5: A Common Grid Infrastructure (continued)
The common infrastructure is overlaid by community computational environments, serving anything from a single researcher to large groups, located locally to worldwide.
Slide 6: Grid of Grids, from Local to Global
[Figure: Community, Campus and National grids.]
Slide 7: Current OSG Deployment
- 96 resources across the production and integration infrastructures.
- 20 Virtual Organizations plus 6 operations VOs, including 25% non-physics.
- ~20,000 CPUs (site contributions range from 30 to 4,000 CPUs, shared between OSG and local use).
- ~6 PB of tape and ~4 PB of shared disk.

Jobs running on OSG over 9 months, sustained through OSG submissions:
- 3,000-4,000 simultaneous jobs;
- ~10K jobs/day and ~50K CPU-hours/day;
- peaks of ~15K short validation jobs;
- using production and research networks.
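As a quick sanity check, these sustained rates are mutually consistent: ~50K CPU-hours/day over ~10K jobs/day implies jobs of about 5 hours each, and an average of roughly 2,000 busy slots, the same order as the quoted 3,000-4,000 simultaneous jobs. A minimal sketch of the arithmetic (all inputs are the quoted figures; nothing here is an additional OSG statistic):

    # Back-of-the-envelope consistency check of the quoted OSG rates.
    cpu_hours_per_day = 50_000    # quoted sustained rate
    jobs_per_day = 10_000         # quoted sustained rate

    avg_hours_per_job = cpu_hours_per_day / jobs_per_day   # 5.0 hours/job
    avg_busy_slots = cpu_hours_per_day / 24                # ~2,083 slots busy on average

    # Compare with the quoted 3,000-4,000 simultaneous jobs: same order of magnitude.
    print(f"~{avg_hours_per_job:.1f} h/job, ~{avg_busy_slots:.0f} slots busy on average")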
Slide 8: Examples of Sharing

Last week of ATLAS jobs, by site:

Site                   Max # Jobs
ASGC_OSG                        9
BU_ATLAS_Tier2                154
CIT_CMS_T2                     99
FIU-PG                         58
FNAL_GPFARM                    17
OSG_LIGO_PSU                    1
OU_OCHEP_SWT2                  82
Purdue-ITaP                     3
UC_ATLAS_MWT2                  88
UFlorida-IHEPA                  1
UFlorida-PG (CMS)               1
UMATLAS
UWMadisonCMS                  594
UWMilwaukee                     2
osg-gw-2.t2.ucsd.edu            2

CPU-hours: 55,000

Last week at UCSD (a CMS site), jobs by VO:

VO             Max # Jobs
ATLAS                   2
CDF                   279
CMS                   559
COMPBIOGRID            10
GADU                    1
LIGO                   75

Average # of jobs (~300 batch slots): 253
CPU-hours: 30,000
Jobs completed: 50,000
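A rough reading of the UCSD numbers (a sketch; that the figures cover one full 168-hour week is my assumption):

    # Rough utilization check for the UCSD CMS-site numbers above.
    batch_slots = 300         # quoted pool size
    avg_running_jobs = 253    # quoted weekly average
    cpu_hours = 30_000        # quoted weekly total
    week_hours = 7 * 24       # assumption: figures cover one full week

    slot_occupancy = avg_running_jobs / batch_slots     # ~84% of slots in use
    avg_busy_cpus = cpu_hours / week_hours              # ~179 CPU-equivalents

    print(f"occupancy ~{slot_occupancy:.0%}, ~{avg_busy_cpus:.0f} CPU-equivalents busy")

That ~179 CPU-equivalents sits below the 253 average running jobs, which would suggest the jobs were not fully CPU-bound; unsurprising for data-intensive workloads.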
Slide 9: OSG Consortium
[Organization figure: the Consortium is made up of Contributors and the Project.]
Slide 10: OSG Project
Slide 11: OSG and its goals
The project receives ~$6M/year for 5 years from DOE and NSF to sustain and evolve the distributed facility, bring on board new communities and capabilities, and support Education, Outreach and Training (EOT). Hardware resources are contributed by OSG Consortium members.

Goals:
- Support data storage, distribution and computation for High Energy, Nuclear and Astro Physics collaborations, in particular delivering to the needs of LHC and LIGO science.
- Engage and benefit other research and science of all scales by progressively supporting their applications.
- Educate and train students, administrators and educators.
- Provide a petascale distributed facility across the US with guaranteed and opportunistic access to shared compute and storage resources.
- Interface, federate and collaborate with campus, regional, and other national and international grids, in particular with EGEE and TeraGrid.
- Provide an integrated, robust software stack for the facility and applications, tested on a well-provisioned, at-scale validation facility.
- Evolve capabilities by deploying externally developed new technologies through joint projects with the development groups.
Slide 12: Middleware Stack and Deployment
- OSG middleware is deployed on existing farms and storage systems, interfacing to the existing installations of OS, utilities and batch systems.
- VOs have VO-scoped environments in which they deploy applications (and other files), execute code and store data.
- VOs are responsible for, and have control over, their end-to-end distributed system built on the OSG infrastructure; a submission sketch follows this list.
- The end-to-end software stack is deployed into production through the Integration Grid (~15 sites).
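To make the execution model concrete, here is a minimal sketch of how a VO member might hand a job to an OSG compute element through Condor-G, one common route in the VDT-based stack. The gatekeeper host osg-ce.example.edu and the file names are hypothetical; a real gatekeeper address would come from OSG's information services for a site where the VO is authorized.

    # Minimal sketch: submitting a job to an OSG compute element via Condor-G.
    # "osg-ce.example.edu" is a hypothetical gatekeeper, not a real OSG site.
    import subprocess
    import textwrap

    submit_description = textwrap.dedent("""\
        universe      = grid
        grid_resource = gt2 osg-ce.example.edu/jobmanager-condor
        executable    = analyze.sh
        transfer_input_files = analysis.tar.gz
        output        = job.out
        error         = job.err
        log           = job.log
        queue
    """)

    with open("osg_job.sub", "w") as f:
        f.write(submit_description)

    # Requires a valid VOMS proxy (voms-proxy-init) before submission.
    subprocess.run(["condor_submit", "osg_job.sub"], check=True)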
Slide 13: Global Data Transfer, Storage and Access
OSG will support global data transfer, storage and access at GBytes/sec, 365 days a year. For example, CMS:
- Data to and from tape at a Tier-1 needs to triple in ~1 year.
- Data to disk caches (data samples) runs at 200 MB/sec to 600 MB/sec.
- Data is distributed to Tier-2 sites from ~7 Tier-1s, CERN, and other Tier-2s.

OSG must enable data placement, disk usage and resource management policies covering 10s of Gbit/sec data movement, 10s of petabytes of tape store, and local shared disk caches of 100s of TB, across 10s of sites for more than 10 VOs.

Data distribution will depend on and integrate with advanced network infrastructures:
- Internet2 will provide "layer 2" connectivity between OSG university sites and peers in Europe.
- ESnet will provide "layer 2" connectivity between OSG DOE laboratory sites and the EU GEANT network.
- Both include use of the IRNC link (NSF) from the US to Amsterdam.
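For scale (a sketch; the only assumption is that the upper quoted rate is sustained around the clock), the cache-filling rate implies volumes in line with the ~20 PB archive milestone on the timeline in slide 16:

    # Rough scale of the quoted disk-cache transfer rate, assumed sustained.
    mb_per_sec = 600                      # upper quoted rate
    seconds_per_day = 86_400

    tb_per_day = mb_per_sec * seconds_per_day / 1e6    # ~51.8 TB/day
    pb_per_year = tb_per_day * 365 / 1e3               # ~18.9 PB/year

    print(f"~{tb_per_day:.0f} TB/day, ~{pb_per_year:.1f} PB/year at {mb_per_sec} MB/s")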
Slide 14: Security Infrastructure
- Identity: X.509 certificates. Authentication and authorization use VOMS extended attribute certificates.
- Security process modelled on NIST procedural controls (management, operational, technical), starting from an inventory of the OSG assets.
- User and VO management:
  - A VO registers with the Operations Center; users register through VOMRS or a VO administrator; sites register with the Operations Center.
  - Each VO centrally defines and assigns roles.
  - Each site provides role-to-access mappings based on VO/VO-group, and can reject individuals (see the sketch after this list).
- Identity management systems are heterogeneous: OSG vs. TeraGrid/EGEE, grid vs. local, compute vs. storage, head-node vs. ..., old version vs. new version. Issues include:
  - cross-domain rights management;
  - rights and identity management of software modules and resources;
  - error/rejection propagation;
  - solutions and approaches that work end-to-end.
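A minimal sketch of the site-side mapping step described above: a VOMS (VO, group, role) triple is mapped to a local account, with per-user rejection. The mapping table and DNs here are hypothetical; real OSG sites typically drive this from a service such as GUMS or from grid-mapfiles rather than a hard-coded table.

    # Sketch of a site's role-to-account mapping (hypothetical table; real OSG
    # sites typically use GUMS or grid-mapfiles rather than a hard-coded dict).

    ROLE_TO_ACCOUNT = {
        ("cms", "/cms/uscms", "production"): "cmsprod",
        ("cms", "/cms/uscms", None):         "cmsuser",
        ("ligo", "/ligo", None):             "ligo",
    }

    # The site can reject specific individuals regardless of their VO role.
    BANNED_DNS = {"/DC=org/DC=example/CN=Revoked User"}

    def map_to_local_account(dn, vo, group, role):
        """Return the local account for a VOMS (VO, group, role), or None to deny."""
        if dn in BANNED_DNS:
            return None
        # Prefer an exact role match, then fall back to the group default.
        return (ROLE_TO_ACCOUNT.get((vo, group, role))
                or ROLE_TO_ACCOUNT.get((vo, group, None)))

    print(map_to_local_account("/DC=org/DC=example/CN=Some User",
                               "cms", "/cms/uscms", "production"))  # -> cmsprod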
Slide 15: Education, Outreach, Training
- Training workshops for administrators and application developers, e.g. the Grid Summer Workshop (in its 4th year).
- Outreach, e.g. Science Grid This Week, now grown into International Science Grid This Week.
- Education through e-Labs.
Slide 16: OSG Initial Timeline & Milestones - Summary
[Reconstructed from a timeline chart spanning 2006-2011: project start in 2006, end of Phase I mid-way, end of Phase II in 2011.]

Science communities:
- LHC: support simulations; contribute to the Worldwide LHC Computing Grid; LHC event data distribution and analysis; grow to 1,000 users and a 20 PB data archive.
- LIGO: contribute to workflow and data analysis; LIGO data run SC5; Advanced LIGO, with the LIGO Data Grid dependent on OSG.
- STAR, CDF, D0, astrophysics: D0 reprocessing, then D0 simulations; CDF simulation, then CDF simulation and analysis; STAR data distribution and jobs at 10K jobs/day.
- Additional science communities: +1 community each year.

Facility:
- Security (risk assessment, audits, incident response, management, operations, technical controls): security plan v1 and a first audit and risk assessment, then an audit and risk assessment each year.
- VDT and OSG software releases: a major release every 6 months and minor updates as needed (VDT 1.4.0, 1.4.1, 1.4.2, ...; OSG 0.6.0, 0.8.0, 1.0, 2.0, 3.0, ...), with incremental VDT updates in between.
- Operations and metrics: increase robustness and scale; operational metrics defined and validated each year.
- Interoperate and federate with campus and regional grids.

Extended capabilities, and increased scalability and performance for jobs and data to meet stakeholder needs:
- dCache with role-based authorization; SRM/dCache extensions; VDS with SRM.
- Accounting and auditing; federated monitoring and information services; integrated network management.
- Common software distribution with TeraGrid; EGEE using VDT 1.4.x; transparent data and job movement with TeraGrid; transparent data management with EGEE.
- "Just in time" workload management; VO services infrastructure; improved workflow and resource selection; data analysis (batch and interactive) workflow.
- Work with SciDAC-2 CEDS and security with Open Science.