
Page 1: Open Science Grid For CI-Days NYSGrid Meeting Sebastien Goasguen, sebgoa@clemson.edu John McGee, mcgee@renci.org OSG Engagement Manager School of Computing

Open Science Grid for CI-Days

NYSGrid Meeting

Sebastien Goasguen, sebgoa@clemson.edu
John McGee, mcgee@renci.org

OSG Engagement Manager

School of Computing, Clemson University, Clemson, SC

Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC


21st Century Discovery

• The three-fold way
  – theory
  – experiment
  – computational analysis

• Supported by
  – multimodal collaboration systems
  – distributed, multi-petabyte data archives
  – leading edge computing systems
  – distributed experimental facilities
  – distributed multidisciplinary teams

• Socialization and community
  – multidisciplinary groups
  – geographic distribution
  – new enabling technologies
  – creation of 21st century IT infrastructure

• sustainable, multidisciplinary communities

[Figure labels: Theory, Experiment, Simulation]


Shift from Single User, Single Resource

To:

Multiple Users, Multiple Resources

Any Combination of users and resources forms a Virtual Organization (VO)

Grid computing solves the problem of sharing resources among VOs


Cyberinfrastructure:

“Information Technology infrastructure to support a Virtual Organization”

• Therefore there are many cyberinfrastructures, not a single one

• The IT infrastructure is not only about HPC, but also software and applications

• The CI is put together to meet the needs of the VO members

• There are many re-usable components

• Leveraging existing assets is encouraged

• CI follows basic principles of service orientation and grid architecture

The Open Science Grid aims to support VOs in order to enable science. It can be a component of the CI you build for a particular VO.

Disclaimer: This slide is the view of the author…


The Open Science Grid

• OSG is a consortium of software, service and resource providers and researchers, from universities, national laboratories and computing centers across the U.S., who together build and operate the OSG project. The project is funded by the NSF and DOE, and provides staff for managing various aspects of the OSG.

• Brings petascale computing and storage resources into a uniform grid computing environment

• Integrates computing and storage resources from over 50 sites in the U.S. and beyond

A framework for large-scale distributed resource sharing, addressing the technology, policy, and social requirements of sharing


Principal Science Drivers

• High energy and nuclear physics
  – 100s of petabytes (LHC) 2007
  – Several petabytes 2005

• LIGO (gravitational wave search)
  – 0.5 - several petabytes 2002

• Digital astronomy
  – 10s of petabytes 2009
  – 10s of terabytes 2001

• Other sciences emerging
  – Bioinformatics (10s of petabytes)
  – Nanoscience
  – Environmental
  – Chemistry
  – Applied mathematics
  – Materials Science


Virtual Organizations (VOs)

• The OSG infrastructure trades in groups, not individuals.

• VO management services allow registration, administration and control of members of the group.

• Facilities trust and authorize VOs.

• Storage and compute services prioritize according to VO group.

[Figure labels: Set of Available Resources; VO Management Service; OSG and WAN; VO Management & Applications; Campus Grid. Image courtesy: UNM]


Current OSG Resources

• OSG has more than 50 participating institutions, including self-operated research VOs, campus grids, regional grids and OSG-operated VOs
• Provides about 10,000 CPU-days per day in processing
• Provides 10 Terabytes per day in data transport
• CPU usage averages about 75%
• OSG is starting to offer support for MPI
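
As a rough sanity check on the throughput figures above, the short Python sketch below converts them into more familiar units. It assumes decimal units (1 TB = 10^12 bytes) and treats the quoted values as steady daily averages; the function names are illustrative and not part of any OSG tool.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def cpu_hours_per_day(cpu_days_per_day: float) -> float:
    """Convert a daily processing figure from CPU-days to CPU-hours."""
    return cpu_days_per_day * 24


def average_gbit_per_s(terabytes_per_day: float) -> float:
    """Average network rate implied by a daily transfer volume.

    Assumes decimal units: 1 TB = 1e12 bytes, 1 Gbit = 1e9 bits.
    """
    bits_per_day = terabytes_per_day * 1e12 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e9


if __name__ == "__main__":
    print(f"{cpu_hours_per_day(10_000):,.0f} CPU-hours per day")
    print(f"{average_gbit_per_s(10):.2f} Gbit/s average transfer rate")
```

Put differently, 10,000 CPU-days per day corresponds to roughly 10,000 processors kept busy around the clock, and 10 TB per day to a sustained average just under 1 Gbit/s.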


What the OSG offers that you may need to support your VO(s)

• Low-threshold access to many distributed computing and storage resources

• A combination of dedicated, scheduled, and opportunistic computing

• The Virtual Data Toolkit software packaging and distributions

• Grid Operations, including facility-wide monitoring, validation, information services and system integration testing

• Operational security
• Troubleshooting of end-to-end problems
• Education and Training


Date range: 2007-04-29 00:00:00 GMT - 2007-05-07 23:59:59 GMT


OSG Bottom line:

Framework to support VOs:

• VO of users only

• VO of resources

• VO of users and resources

Can help you with:

• Supporting your VO

• Making your resources available inside and outside campus

• Enabling science through user engagement
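
The three kinds of VO listed above can be pictured with a small data model. The Python sketch below is only an illustration of the idea that a VO is a named combination of users and resources; the class, its fields and the example names are hypothetical and do not correspond to any OSG software interface.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualOrganization:
    """Hypothetical model of a VO as a named set of users and resources."""
    name: str
    users: set = field(default_factory=set)
    resources: set = field(default_factory=set)

    def kind(self) -> str:
        """Classify the VO by what it brings: users, resources, or both."""
        if self.users and self.resources:
            return "users and resources"
        if self.users:
            return "users only"
        if self.resources:
            return "resources only"
        return "empty"


if __name__ == "__main__":
    # Example names are made up for illustration.
    research_group = VirtualOrganization("research-group", users={"alice", "bob"})
    campus_pool = VirtualOrganization("campus-pool", resources={"condor-pool"})
    campus_grid = VirtualOrganization(
        "campus-grid", users={"carol"}, resources={"cluster-a"}
    )
    for vo in (research_group, campus_pool, campus_grid):
        print(f"{vo.name}: VO of {vo.kind()}")
```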


Campus Grids to the Rescue


Why should my University facilitate (or drive) resource sharing?

Because it’s the right thing to do
– Enables new modalities of collaboration
– Enables new levels of scale
– Democratizes large scale computing
– Sharing locally leads to sharing globally
– Better overall resource utilization
– Funding agencies

At the heart of the cyberinfrastructure vision is the development of a cultural community that supports peer-to-peer collaboration and new modes of education based upon broad and open access to leadership computing; data and information resources; online instruments and observatories; and visualization and collaboration services.

- Arden Bement, CI Vision for 21st Century introduction


Campus Grids

• They are a fundamental building block of the OSG
  – The multi-institutional, multi-disciplinary nature of the OSG is a macrocosm of many campus IT infrastructure coordination issues.

• Currently OSG has three operational campus grids on board:
  – Fermilab, Purdue, Wisconsin
  – Working to add Clemson, Harvard, Lehigh

• Elevation of jobs from Campus CI to OSG is transparent

• Campus scale brings value through
  – Richness of common software stack with common interfaces
  – Higher common denominator makes sharing easier
  – Greater collective buying power with vendors
  – Synergy through common goals and achievements


Simplified View



Submitting jobs through OSG to UW Campus Grid (Dan Bradley, UW Madison)

[Figure labels: Open Science Grid User; condor_submit; schedd (Job caretaker); condor gridmanager; Globus gatekeeper; GUMS; schedd (Job caretaker); flocking; HEP matchmaker; CS matchmaker; GLOW matchmaker; startd (Job Executor)]
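
To make the user-facing end of this path concrete, the Python sketch below renders a minimal Condor-G submit description of the kind a user might hand to condor_submit. It assumes a pre-web-services (GT2) Globus gatekeeper, which the slide does not specify; the host name, jobmanager name and file names are placeholders, and the helper function itself is hypothetical.

```python
def condor_g_submit_description(gatekeeper: str, jobmanager: str, executable: str) -> str:
    """Render a minimal Condor-G submit description for a GT2 gatekeeper.

    All argument values are supplied by the caller; nothing here is
    specific to the UW campus grid shown on the slide.
    """
    lines = [
        "universe = grid",
        f"grid_resource = gt2 {gatekeeper}/jobmanager-{jobmanager}",
        f"executable = {executable}",
        "output = job.out",
        "error = job.err",
        "log = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # gatekeeper.example.edu is a placeholder host, not a real OSG site.
    print(condor_g_submit_description("gatekeeper.example.edu", "condor", "analyze.sh"))
```

Saved to a file, the generated text would be passed to condor_submit; the local schedd then tracks the job while the gridmanager talks to the remote gatekeeper.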


FermiGrid - Current Architecture (Keith Chadwick)

Step 1 - user issues voms-proxy-init; user receives VOMS-signed credentials

Step 2 - user submits their grid job via globus-job-run, globus-job-submit, or condor-g

Step 3 - Gateway checks against Site Authorization Service

Step 4 - Gateway requests GUMS mapping based on VO & Role

Step 5 - Grid job is forwarded to target cluster

Clusters send ClassAds via CEMon to the site-wide gateway

[Figure labels: VOMS Server; SAZ Server; GUMS Server; Site Wide Gateway; BlueArc; Periodic Synchronization; Exterior; Interior; clusters CMSWC1, CDFOSG1, CDFOSG2, D0CAB1, D0CAB2, GPFarm]
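
The gateway's part of this flow (steps 3 to 5) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not FermiGrid's implementation: the site authorization check is reduced to a ban list, the GUMS mapping to a dictionary keyed by VO and role, and cluster selection to a per-VO lookup, whereas the real gateway chooses a cluster from the ClassAds it receives. All class, account and cluster names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    """Stand-in for the VOMS-signed proxy obtained in step 1."""
    subject: str
    vo: str
    role: str


class SiteGateway:
    """Toy model of a site-wide gateway handling an incoming grid job."""

    def __init__(self, banned_subjects, account_map, cluster_by_vo):
        self.banned_subjects = set(banned_subjects)  # assumption: authorization as a ban list
        self.account_map = dict(account_map)         # assumption: (vo, role) -> local account
        self.cluster_by_vo = dict(cluster_by_vo)     # assumption: vo -> target cluster

    def route(self, cred: Credential):
        # Step 3: check the credential against the site authorization service.
        if cred.subject in self.banned_subjects:
            raise PermissionError(f"{cred.subject} is not authorized at this site")
        # Step 4: request a local account mapping based on VO and role.
        account = self.account_map.get((cred.vo, cred.role))
        if account is None:
            raise PermissionError(f"no mapping for VO {cred.vo!r}, role {cred.role!r}")
        # Step 5: forward the job to a target cluster.
        cluster = self.cluster_by_vo.get(cred.vo)
        if cluster is None:
            raise LookupError(f"no cluster accepts jobs from VO {cred.vo!r}")
        return cluster, account


if __name__ == "__main__":
    gateway = SiteGateway(
        banned_subjects={"/CN=mallory"},
        account_map={("examplevo", "production"): "exprod"},
        cluster_by_vo={"examplevo": "cluster-a"},
    )
    print(gateway.route(Credential("/CN=alice", "examplevo", "production")))
```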


Clemson Campus Condor Pool

• Machines in 27 different locations on campus

• ~1,700 job slots

• >1.8M hours served in 6 months

• Users from Industrial and Chemical Engineering, and Economics

• Fast ramp up of usage

• Accessible to the OSG through a gatekeeper
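
The pool figures above imply a rough average utilization, worked out in the Python sketch below. It assumes that all ~1,700 slots were available for the whole six months and that a month is one twelfth of a 365-day year; neither is stated on the slide. Since usage was still ramping up and the hours figure is a lower bound, the result is only indicative.

```python
HOURS_PER_YEAR = 365 * 24


def average_slot_utilization(hours_served: float, job_slots: int, months: float) -> float:
    """Fraction of available slot-hours that were actually used.

    Assumes every slot was available for the entire period.
    """
    available_slot_hours = job_slots * HOURS_PER_YEAR * months / 12
    return hours_served / available_slot_hours


if __name__ == "__main__":
    utilization = average_slot_utilization(1.8e6, 1700, 6)
    print(f"Average utilization of about {utilization:.0%}")
```

With these inputs the pool offers about 7.4 million slot-hours in six months, so 1.8 million hours served corresponds to roughly a quarter of the available capacity.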


Campuses and Regional Grids

Campus Condor pool backfills idle nodes in PBS clusters - provided 5.5 million CPU-hours in 2006, all from idle nodes in clusters

Use on TeraGrid: 2.4 million hours in 2006 spent building a database of hypothetical zeolite structures; 2007: 5.5 million hours allocated to TG

http://www.cs.wisc.edu/condor/PCW2007/presentations/cheeseman_Purdue_Condor_Week_2007.ppt


Engaging Users (more this afternoon)

“What impressed me most was how quickly we were able to access the grid and start using it. We learned about it at RENCI, and we were running jobs about two weeks later,” says Kuhlman.

“For each protein we design, we consume about 3,000 CPU hours across 10,000 jobs,” says Kuhlman. “Adding in the structure and atom design process, we’ve consumed about 100,000 CPU hours in total so far.”


What can we do together?

• Clemson’s OSG team is looking for a few partners to help deploy campus-wide grid infrastructure that integrates with local enterprise infrastructure and the national CI

• RENCI’s OSG team is available to help scientists get their applications running on OSG
  – low-impact starting point
  – Help your researchers gain significant compute cycles while exploring OSG as a framework for your own campus CI

mailto: [email protected]


END

Sebastien Goasguen, sebgoa@clemson.edu

John McGee, mcgee@renci.org

Questions?