OSG Today

[Figure labels: NERSC, BU, UNM, SDSC, UTA, OU, FNAL, ANL, U WISC, BNL, VANDERBILT, PSU, UVA, CALTECH, IOWA STATE, PURDUE, IU, BUFFALO, TTU, CORNELL, ALBANY, UMICH, INDIANA, IUPUI, STANFORD, UWM, UNL, UFL, UNI, WSU, MSU, LTU, LSU, CLEMSON, UMISS, UIUC, UCR, UCLA, LEHIGH, NSF, ORNL, HARVARD, UIC, SMU, UCHICAGO, MIT, RENCI, LBL, GEORGETOWN, UIOWA, UCDAVIS, ND]

Contributing >30% of throughput to ATLAS and CMS in Worldwide LHC Computing Grid

Reliant on production and advanced networking from ESNET, LHCNET and Internet2.

Virtual Data Toolkit: Common software developed between Computer Science & applications used by OSG and others.

OSG Job Throughput

29 VOs

~75 sites (19 SE & 82 CE)

~400,000 wall clock hours per day (peaks over 500,000)

25-30% opportunistic use

~15% is non-physics

>20,000 cores used per day

>43,000 cores accessible

US-CMS, US-ATLAS and OSG ready for LHC startup
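The wall clock figures above can be related to the core counts with simple arithmetic. The sketch below is illustrative only: it assumes a wall clock hour means one core busy for one hour, and the function name is hypothetical. Because the slide gives ">43,000 cores accessible", the computed share is an upper bound.

```python
HOURS_PER_DAY = 24


def average_busy_cores(wall_clock_hours_per_day: float) -> float:
    """Cores that must be busy around the clock to accumulate the given hours in one day.

    Assumption: one wall clock hour equals one core occupied for one hour.
    """
    return wall_clock_hours_per_day / HOURS_PER_DAY


typical = average_busy_cores(400_000)  # the "~400,000 hours per day" figure
peak = average_busy_cores(500_000)     # the "peaks over 500,000" figure
accessible_cores = 43_000              # lower bound quoted on the slide

print(f"typical: {typical:,.0f} cores busy on average")
print(f"peak:    {peak:,.0f} cores busy on average")
print(f"typical share of accessible cores: at most {typical / accessible_cores:.0%}")
```

Under that assumption, roughly 16,700 cores are busy on average on a typical day, which is consistent with the ">20,000 cores used per day" figure, since a core counts as used even when it is busy for only part of the day.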

OSG Data Throughput

Petabytes a month distributed from CERN to Tier-1s, between Tier-1s and to/from Tier-2s.

Transfer bursts of >10 Gb/s.

Relies on ESNET, LHCNet and Internet2 in the US.
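To put "petabytes a month" and ">10 Gb/s" on the same scale, the following sketch converts between a monthly volume and a sustained rate. It assumes decimal units (1 PB = 10^15 bytes) and a 30-day month; both function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def sustained_gbps(petabytes: float, days: float = 30.0) -> float:
    """Average rate in Gb/s needed to move `petabytes` in `days` (1 PB = 1e15 bytes)."""
    bits = petabytes * 1e15 * 8
    return bits / (days * SECONDS_PER_DAY) / 1e9


def petabytes_moved(gbps: float, days: float = 30.0) -> float:
    """Volume in PB moved by a link running at `gbps` for `days` without pause."""
    bytes_per_second = gbps * 1e9 / 8
    return bytes_per_second * days * SECONDS_PER_DAY / 1e15


print(f"1 PB in 30 days needs about {sustained_gbps(1.0):.2f} Gb/s sustained")
print(f"10 Gb/s held for 30 days moves about {petabytes_moved(10.0):.2f} PB")
```

Under these assumptions one petabyte a month corresponds to about 3 Gb/s sustained, so bursts above 10 Gb/s are several times the average rate implied by a single petabyte per month.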

Are These Estimates Realistic? Yes.

Slide Courtesy ESNET:

FNAL outbound CMS traffic for 4 months, to Sept. 1, 2007

Max = 8.9 Gb/s (1064 MBy/s of data), Average = 4.1 Gb/s (493 MBy/s of data)
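The paired Gb/s and MBy/s figures above can be cross-checked with a unit conversion. The slide does not say whether "MBy" means 10^6 or 2^20 bytes, so this sketch tries both; the function name and the two interpretations are assumptions, not something stated by ESnet.

```python
def gbps_to_mbytes_per_s(gbps: float, bytes_per_mbyte: int) -> float:
    """Convert gigabits per second to megabytes per second for a given megabyte size."""
    return gbps * 1e9 / 8 / bytes_per_mbyte


DECIMAL_MB = 10**6  # assumption: MBy = 1,000,000 bytes
BINARY_MB = 2**20   # assumption: MBy = 1,048,576 bytes

# (label, rate in Gb/s, data rate in MBy/s), both values as quoted on the slide
quoted = [("max", 8.9, 1064), ("average", 4.1, 493)]

for label, gbps, mby in quoted:
    dec = gbps_to_mbytes_per_s(gbps, DECIMAL_MB)
    bin_ = gbps_to_mbytes_per_s(gbps, BINARY_MB)
    print(f"{label}: quoted {mby} MBy/s, decimal {dec:.1f}, binary {bin_:.1f}")
```

With binary megabytes the converted values land within about 1% of the quoted ones, while decimal megabytes leave a gap of roughly 4%. Either a binary unit or a few percent of protocol overhead would explain the quoted pairs; the slide itself does not settle which.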

[Plot: axes labeled "Gigabits/sec of network traffic" and "Megabytes/sec of data traffic"; tick marks 0 to 10]

Destinations:

Known LHC Tier 2+3 Sites Drive Many of the ESnet Peering Point Location and Design Decisions

Slide Courtesy Internet2:

background slides


OSG Platform for the US-LHC Collaborations

Software/Middleware
a) Support the movement, storage and management of the petabyte LHC data sets.
b) Support of job workflow, scheduling and execution at the Tier-1, Tier-2 and Tier-3 sites, with transparent access across the European and US grids.

Services
a) Information, accounting and monitoring services publishing to the WLCG.
b) Reliability and availability monitoring, used by the experiments to determine the availability of sites and by the WLCG to check against the MOU.

Support
a) Security monitoring, incident response, notification and mitigation.
b) Operational support including centralized ticket handling, with automated bi-directional communication between the systems in Europe and the USA.
c) Collaboration with ESNET and Internet2 network projects for the integration and monitoring of the underlying network fabric.
d) Site coordination and common support for Tier 3 sites (>8 now on OSG).
e) End-to-end support for simulation, production, analysis and focused data challenges, enabling USLHC readiness for real data taking.

OSG Reporting to WLCG on behalf of US-LHC (Example)

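ATLAS T2 Federations

| Site | Reliability | Availability | CPU wallclock hours for owner VO | CPU efficiency for owner VO | CPU hours for owner VO | MoU pledge * | Wallclock hours delivered to all OSG VOs |
|---|---|---|---|---|---|---|---|
| US-AGLT2 | 96% | 96% | 444,517 | 90% | 401,400 | 416,880 | 462,449 |
| US-MWT2 | 100% | 100% | 882,569 | 98% | 863,373 | 480,384 | 975,449 |
| US-NET2 | 99% | 99% | 308,304 | 94% | 290,869 | 287,280 | 308,304 |
| US-SWT2 | 100% | 100% | 463,413 | 95% | 435,971 | 598,752 | 686,350 |
| US-WT2 | 88% | 90% | 399,035 | 81% | 324,410 | 354,240 | 399,035 |

CMS T2s

| Site | Reliability | Availability | CPU wallclock hours for owner VO | CPU efficiency for owner VO | CPU hours for owner VO | MoU pledge * | Wallclock hours delivered to all OSG VOs |
|---|---|---|---|---|---|---|---|
| T2_US_Caltech | 83% | 86% | 419,135 | 78% | 327,857 | 432,000 | 451,886 |
| T2_US_Florida | 96% | 97% | 450,173 | 76% | 344,198 | 432,000 | 623,556 |
| T2_US_MIT | 92% | 93% | 568,596 | 87% | 493,368 | 432,000 | 949,936 |
| T2_US_Nebraska | 91% | 93% | 378,784 | 62% | 235,869 | 432,000 | 661,090 |
| T2_US_Purdue | 98% | 98% | 2,098,491 | 65% | 1,370,777 | 432,000 | 2,484,099 |
| T2_US_UCSD | 99% | 99% | 1,411,529 | 39% | 554,206 | 432,000 | 1,737,658 |
| T2_US_Wisconsin | 100% | 100% | 605,646 | 79% | 480,678 | 432,000 | 610,905 |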

US LHC Tier2 Activity for September 2008
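The CPU efficiency column in the table above appears to be the ratio of CPU hours to wallclock hours for the owner VO. That reading is an assumption, since the slide does not define the column; the sketch below checks it against three rows copied from the table. The function and variable names are illustrative.

```python
# (site, owner-VO wallclock hours, owner-VO CPU hours, efficiency % as reported)
rows = [
    ("US-AGLT2", 444_517, 401_400, 90),
    ("T2_US_Purdue", 2_098_491, 1_370_777, 65),
    ("T2_US_UCSD", 1_411_529, 554_206, 39),
]


def cpu_efficiency(cpu_hours: float, wallclock_hours: float) -> float:
    """Fraction of occupied wallclock time spent on the CPU (assumed definition)."""
    return cpu_hours / wallclock_hours


for site, wallclock, cpu, reported in rows:
    computed = 100 * cpu_efficiency(cpu, wallclock)
    print(f"{site}: computed {computed:.1f}%, reported {reported}%")
```

For these three rows the computed ratio rounds to the reported percentage. A low value, as at T2_US_UCSD, means jobs held their slots much longer than they used the CPU, for example while waiting on input data.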

Long path to success, and there remains fragility in end-to-end process


US-ATLAS Production on OSG

ATLAS Operations on the OSG April thru September 2008

[Charts, monthly from Apr-08 to Sept-08: Wall Clock Hours (axis 0 to 3,000,000); Six Month Sum of Wall Clock Hours (axis 0 to 14,000,000); Number of Jobs (axis 0 to 5,000,000); Usage Type at ATLAS Sites (0% to 100%; Owned, Opportunistic); Average Number of CPUs Delivered (axis 0 to 4,000); Petabytes Moved (axis 0.000 to 0.350)]


US-CMS Production on OSG

CMS Operations on the OSG April thru Sept 2008

[Charts, monthly from Apr-08 to Sept-08: Wall Clock Hours (axis 0 to 5,000,000); Six Month Sum of Wall Clock Hours (axis 0 to 20,000,000); Number of Jobs (axis 0 to 1,200,000); Usage Type at CMS Sites (0% to 100%; Owned, Opportunistic); Average Number of CPUs Delivered (axis 0 to 7,000); Petabytes Moved (axis 0 to 2; Out, In)]


US-LHC Benefits from OSG

Common to US-ATLAS and US-CMS
1. Serves as integration and delivery point for core middleware components including compute and storage elements (VDT)
2. Cyber Security operations support within OSG and across Grids (e.g. WLCG) in case of security incidents
3. Cyber Security infrastructure including site-level authorization service, operational service for updating certificates and revocation lists
4. Service availability monitoring of critical site infrastructure services, i.e. Computing and Storage Elements (RSV)
5. Service availability monitoring and forwarding of results to WLCG
6. Site level accounting services and forwarding accumulated results to WLCG
7. Consolidation of Grid client utilities incl. incorporation of LCG client suite, resolving Globus library inconsistencies
8. dCache packaging through VDT and support through OSG-Storage
9. Integration testbed for new releases of the OSG software, pre-production deployment testing
10. Continuous support of the distributed Computing Facility and production services through the weekly OSG facility phone meetings

US-LHC Benefits from OSG (continued)

Specific to US-ATLAS
1. LCG File Catalog (LFC) server and client packaging – needed in support of the ATLAS global Distributed Data Management system (DDM)
2. Bestman and xrootd: SRM and file system support for Tier 2 and Tier 3 facilities
3. Support for integration and extension of security services in the PanDA workload management system and the GUMS grid identity mapping service, for compliance with OSG security policies and requirements

Specific to US-CMS
1. Bestman: SRM support for Tier 3 facilities
2. lcg-utils tools for data management
3. Scalability testing of OSG services, incl. BDII, CE, SE, and work with developers to improve the underlying middleware.