Grid computing using Open Science Grid (OSG) Alina Bejan University of Chicago

Grid computing using Open Science Grid (OSG)

Alina Bejan, University of Chicago

Open Science Grid (OSG)

• takes High Throughput Computing to the next level, to transform data-intensive science through a cross-domain, self-managed nationally distributed cyber-infrastructure.

• brings together campuses and communities, and facilitates the needs of Virtual Organizations at all scales.

• The OSG Consortium includes
– universities
– national laboratories
– scientific collaborations
– software developers
working together to meet these goals.

What is a grid?

• A grid is a system that:
– coordinates resources that are not subject to centralized control,
– using standard, open, general-purpose protocols and interfaces,
– to deliver nontrivial qualities of service
(based on Ian Foster’s definition in http://www.gridtoday.com/02/0722/100136.html)

Grids consist of distributed clusters

[Figure: a Grid Client (application & user interface; grid client middleware; resource, workflow & data catalogs) talks over grid protocols to Grid Site 1: Fermilab, Grid Site 2: São Paulo, … Grid Site N: UWisconsin. Each site runs grid service middleware in front of a compute cluster and grid storage.]

• Do you have a project that takes too long when running on a single processor?

• Do you deal with large amounts of data from simulations or experiments?

Scaling up Science: Citation Network Analysis in Sociology

[Figure: citation network snapshots for 1975, 1980, 1985, 1990, 1995, 2000 and 2002.]

Work of James Evans, University of Chicago, Department of Sociology

Scaling up the analysis

• Query and analysis of 25+ million citations
• Work started on desktop workstations
• Queries grew to month-long duration
• With data distributed across U of Chicago TeraPort cluster:
– 50 (faster) CPUs gave 100X speedup
– Many more methods and hypotheses can be tested!
• Higher throughput and capacity enable deeper analysis and broader community access.
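The speedup figure can be unpacked with a little arithmetic. The sketch below is only a back-of-the-envelope reading: the 30-day baseline and the assumption of ideal (linear) scaling across CPUs are illustrative choices, not numbers reported by the study.

```python
# Back-of-the-envelope reading of "50 (faster) CPUs gave 100X speedup".
# Assumptions (not from the original study): a 30-day baseline query on
# one desktop CPU, and work that splits evenly across the cluster CPUs.

def per_cpu_speedup(total_speedup: float, n_cpus: int) -> float:
    """Speedup each CPU must contribute if the work splits evenly."""
    return total_speedup / n_cpus


def scaled_runtime_hours(baseline_days: float, total_speedup: float) -> float:
    """Wall-clock hours left after applying the overall speedup."""
    return baseline_days * 24 / total_speedup


cpu_factor = per_cpu_speedup(100, 50)
hours = scaled_runtime_hours(30, 100)
print(f"each cluster CPU is roughly {cpu_factor:.1f}x a desktop CPU")
print(f"a 30-day query would take roughly {hours:.1f} hours")
```

Under these assumptions, half of the gain comes from parallelism alone and the rest from faster processors, which is why the slide stresses "(faster)".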

Mining Seismic data for hazard analysis (Southern Calif. Earthquake Center).

InSAR Image of the Hector Mine Earthquake

• A satellite-generated Interferometric Synthetic Aperture Radar (InSAR) image of the 1999 Hector Mine earthquake.

• Shows the displacement field in the direction of radar imaging.

• Each fringe (e.g., from red to red) corresponds to a few centimeters of displacement.

[Figure: Seismic Hazard Model, drawing on seismicity, paleoseismology, local site effects, geologic structure, faults, stress transfer, crustal motion, crustal deformation, seismic velocity structure and rupture dynamics.]

Grids work like a CHARMM for molecular dynamics

• Understanding the mathematics of molecular movement helps researchers simulate slices of the atomic world

• But when accurate nanosecond simulations pose a serious challenge, how can you simulate full microseconds of complex molecular dynamics?

Designing Proteins from Scratch

• Scientists use OSG to design proteins that adopt specific 3D structures and, more ambitiously, bind and regulate target proteins important in cell biology and pathogenesis.

Genetics

• Grid computing is helping microbiologists solve the mysteries of mapping new genomes using GADU (Genome Analysis and Database Update)

Genome Analysis and Database Update (GADU)

• Runs across TeraGrid and OSG. Uses the Virtual Data System (VDS) workflow & provenance.

• Passes through public DNA and protein databases for new and newly updated genomes of different organisms and runs BLAST, Blocks, Chisel. The resulting database has 1200 users.

• Request: 1000 CPUs for 1-2 weeks. Once a month, every month. On OSG at the moment: >600 CPUs and 17,000 jobs a week.

Stormy weather: grid computing powers fine-scale climate modeling

• Why run individual models when you can run models in combination?

• When it comes to climate modeling, meteorologists are showing 16 forecasts are better than one.

Which sciences can benefit?

• particle and nuclear physics
• astrophysics
• bioinformatics
• gravitational-wave science
• computer science
• mathematics
• medical imaging
• nanotechnology
• potentially any other science …

Grid Resources in the US

OSG

• Research participation
– Majority from physics: Tevatron, LHC, STAR, LIGO.
– Used by 10 other (small) research groups.
– 90 members, 30 VOs.
• Contributors
– 5 DOE labs: BNL, Fermilab, NERSC, ORNL, SLAC.
– 65 universities.
– 5 partner campus/regional grids.
• Accessible resources
– 43,000+ cores
– 6 petabytes disk cache
– 10 petabytes tape stores
– 14 internetwork partnerships
• Usage
– 15,000 CPU wall-clock days/day
– 1 petabyte data distributed/month
– 100,000 application jobs/day
– 20% of cycles through resource sharing, opportunistic use.

TeraGrid

• Research participation
– Support for Science Gateways
– Over 100 scientific data collections (discipline-specific databases)
• Contributors
– 11 supercomputing centers: Indiana, LONI, NCAR, NCSA, NICS, ORNL, PSC, Purdue, SDSC, TACC and UC/ANL
• Computational resources
– > 1 petaflop computing capability
– 30 petabytes of storage (disk and tape)
– Dedicated high-performance internet connections (10G)
– 750 TFLOPS (161K cores) in parallel computing systems, and growing

Open Science Grid

Overview

The Open Science Grid Consortium brings:

• grid service providers:
– middleware developers
– cluster, network and storage administrators
– local-grid communities
• the grid consumers:
– global collaborations
– single researchers
– campus communities
– under-served science domains

into a cooperative infrastructure to share and sustain a common heterogeneous distributed facility in the US and beyond.

OSG sites

OSG Snapshot

• 96 resources across production & integration infrastructures
• 30 Virtual Organizations + 6 operations; includes 25% non-physics
• ~30,000 CPUs (from 30 to 4000)
• ~6 PB tapes
• ~4 PB shared disk

Snapshot of jobs on OSG

• Sustained through OSG submissions: 3,000-4,000 simultaneous jobs
• ~100K jobs/day
• ~50K CPU hours/day
• Peak test jobs of 15K a day
• Using production & research networks

Overlaid by virtual computational environments that serve anything from single researchers to large groups, local to worldwide

To efficiently use a Grid, you must locate and monitor its resources.

• Check the availability of different grid sites

• Discover different grid services

• Check the status of “jobs”

• Make better scheduling decisions with information maintained on the “health” of sites
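As a rough illustration of the last point, the sketch below picks a submission target from a list of site health records. Everything in it is hypothetical: the `SiteStatus` fields, the site names and the numbers are invented for the example, and the ranking rule (fewest queued jobs per free CPU) is one simple choice among many, not how any OSG tool actually schedules.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SiteStatus:
    """Hypothetical health record for one grid site."""
    name: str
    services_ok: bool   # did the most recent service probe pass?
    free_cpus: int
    queued_jobs: int


def pick_site(sites: List[SiteStatus]) -> Optional[SiteStatus]:
    """Return the healthy site with the least queue pressure, or None."""
    healthy = [s for s in sites if s.services_ok and s.free_cpus > 0]
    if not healthy:
        return None
    # Lower queued-jobs-per-free-CPU is better; break ties by more free CPUs.
    return min(healthy, key=lambda s: (s.queued_jobs / s.free_cpus, -s.free_cpus))


# Made-up monitoring data, for illustration only.
sites = [
    SiteStatus("site_a", True, 100, 400),
    SiteStatus("site_b", False, 500, 0),   # failing probes, so skipped
    SiteStatus("site_c", True, 50, 20),
]
best = pick_site(sites)
print(best.name if best else "no healthy site")
```

The point of the sketch is only that unhealthy sites are filtered out before any load comparison is made.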

Virtual Organization Resource Selector - VORS

http://vors.grid.iu.edu/

• Custom web interface to a grid scanner that checks services and resources on:
– Each Compute Element
– Each Storage Element
• Very handy for checking:
– Paths of installed tools on Worker Nodes.
– Location & amount of disk space for planning a workflow.
– Troubleshooting when an error occurs.

Open Science Grid

VORS entry for OSG_LIGO_PSU

OSG Consortium Mtg, March 2007: Quick Start Guide to the OSG

Gratia -- job accounting system

http://gratia-osg.fnal.gov:8880/gratia-reporting/

How do you join the OSG? A software perspective

Joining OSG

• Assumption:
– You have a campus grid
• Question:
– What changes do you need to make to join OSG?

Your Campus Grid

• Assuming that you have a cluster with a batch system:
– Condor
– Sun Grid Engine
– PBS/Torque
– LSF

Administrative Work

• You need a security contact
– Who will respond to security concerns
• You need to register your site
• You should have a web page about your site.
– This will be published
– People can learn about your site.

Big Picture

• Compute Element (CE)
– OSG jobs are submitted to the CE, which gives them to the batch system
– Also has information services and lots of support software
• Shared file system
– OSG requires a couple of directories to be mounted on all worker nodes
• Storage Element (SE)
– How you manage the storage at your site

Installing Software

• The OSG Software Stack
– Based on the VDT
  • The majority of the software you’ll install
  • It is grid independent
– OSG Software Stack:
  • VDT + OSG-specific configuration
• Installed via Pacman

What is installed?

• GRAM:
– Allows job submissions
• GridFTP:
– Allows file transfers
• CEMon/GIP:
– Publishes site information
• Some authorization mechanism
– grid-mapfile: file that lists authorized users, or
– GUMS (grid identity mapping service)
• And a few other things…
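To make the grid-mapfile option concrete: in the common Globus layout, each line pairs a quoted certificate subject (DN) with a local account name. The sketch below parses that simple form only. The sample DNs and account names are invented for illustration, and variants such as comma-separated account lists are deliberately not handled.

```python
import shlex


def parse_grid_mapfile(text: str) -> dict:
    """Map certificate subject (DN) -> local account.

    Handles only the simple form: "<quoted DN>" <account>.
    Blank lines, comments and malformed lines are skipped.
    """
    mapping = {}
    for line in text.splitlines():
        # shlex honours the double quotes around the DN and drops '#' comments.
        fields = shlex.split(line, comments=True)
        if len(fields) != 2:
            continue
        dn, account = fields
        mapping[dn] = account
    return mapping


# Hypothetical entries, for illustration only.
SAMPLE = '''
# DN                                              local account
"/DC=org/DC=example/OU=People/CN=Jane Doe 12345" jdoe
"/DC=org/DC=example/OU=People/CN=John Roe 67890" osgedu
'''

for dn, account in parse_grid_mapfile(SAMPLE).items():
    print(f"{account:8s} <- {dn}")
```

A static file like this has to be kept in sync by hand or by a script, which is part of why larger sites tend to prefer a mapping service such as GUMS.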

OSG Middleware

[Figure: layered middleware stack, top to bottom]

Applications
• User Science Codes and Interfaces
• VO Middleware
– HEP: data and workflow management etc.
– Biology: portals, databases etc.
– Astrophysics: data replication etc.

Infrastructure
• OSG Release Cache: OSG-specific configurations, utilities etc.
• Virtual Data Toolkit (VDT): core technologies + software needed by stakeholders; many components shared with EGEE
• Core grid technology distributions: Condor, Globus, MyProxy; shared with TeraGrid and others
• Existing operating systems, batch systems and utilities

Picture of a basic site

Shared file system

• OSG_APP
– For users to store applications
• OSG_DATA
– A place to store data
– Highly recommended, not required
• OSG_GRID
– Software needed on worker nodes
– Not required
– May not exist on non-Linux clusters
• Home directories for users
– Not required, but often very convenient
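A job landing on a worker node can check these locations before it starts real work. The sketch below assumes the directories are exposed to jobs as environment variables named `OSG_APP`, `OSG_DATA` and `OSG_GRID`, and it treats `OSG_APP` as the only required one because that is how the list above reads; both points are assumptions for illustration, and `check_shared_dirs` is a hypothetical helper, not part of any OSG package.

```python
import os

REQUIRED = ("OSG_APP",)               # assumed required (no "not required" note)
OPTIONAL = ("OSG_DATA", "OSG_GRID")   # described as optional above


def check_shared_dirs(env=None):
    """Report the state of each shared directory variable.

    Returns (report, problems): report maps each variable to
    "ok", "unset" or "missing"; problems lists required variables
    that are not "ok".
    """
    env = os.environ if env is None else env
    report = {}
    for name in REQUIRED + OPTIONAL:
        path = env.get(name)
        if not path:
            report[name] = "unset"
        elif os.path.isdir(path):
            report[name] = "ok"
        else:
            report[name] = "missing"   # variable set, directory not mounted
    problems = [name for name in REQUIRED if report[name] != "ok"]
    return report, problems


report, problems = check_shared_dirs()
for name, state in report.items():
    print(f"{name}: {state}")
if problems:
    print("cannot run here, missing:", ", ".join(problems))
```

Failing early like this is cheaper than discovering a missing application directory halfway through a long job.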

Storage Element

• Some folks require more sophisticated storage management
– How do worker nodes access data?
– How do you handle terabytes (petabytes?) of data?
• Storage Elements are more complicated
– More planning needed
– Some are complex to install and configure
• Two OSG-supported SRM options:
– dCache
– BeStMan

OSG - Education, Training and Outreach

OpenScienceGrid.org/Education

OpenScienceGrid.org/About/Outreach

[email protected]

OSG EOT Mission

• Organize and deliver training for OSG
– OSG End Users
– Site Administrators
– Support new communities / VOs joining OSG
• Engage young people in (e)Science and CS
– Primary focus: undergraduate and early graduate students
– Reach high schools through I2U2 (QuarkNet follow-on)
– Promote and train in interdisciplinary collaboration
• Reach out
– To under-represented communities
  • Engage and assist minority students and minority-serving institutions by providing resources and opportunities.
– Internationally
  • Strengthen and assist emerging, underserved regions of strategic importance to form bonds to US science and Grid communities
  • Focus (for outreach) is on Latin America and Africa
  • OISE focus on engagement in Europe and Asia

OSG EOT Program Overview

• End User Education
– In-person workshops
– Online training
– EOT VO for student engagement, access and support
• Community Outreach
– International student/faculty exchange via OISE
– Supporting under-represented and under-resourced communities in US, Latin America and Africa through workshops, technical assistance and grid access
– High School Education
– I2U2 support - http://ed.fnal.gov/uueo/i2u2.html
• Site Admin Training
– Training grid administrators in setup and support of OSG sites using the OSG/VDT software stack

2007-08 Workshop Program

www.opensciencegrid.org/workshops

• Georgetown University Grid School 2008, April 15-17, DC

• Tuskegee University Grid School 2008, Feb 6-8 - Tuskegee AL

• Florida International Grid School 2008, Jan 23-25, at Florida International University, Miami, Florida

• Supercomputing ’07 tutorials, Nov 11 & 13, at Reno, Nevada

• Great Plains Grid School (GPGS’07), Aug 8-10, at the U. of Nebraska-Lincoln

• Rio Grande Grid School (RGGS’07), Jun 8-10, at the U. of Texas at Brownsville, coordinated with UT-Pan American

• TeraGrid Conference tutorials, Jun 4-8, at the U. of Wisconsin-Madison

• South Africa Workshop, Mar 26-30, at the IFIP School on Software (ISS’07), Gordon's Bay, South Africa

• Midwest Grid Workshop (MGW’07), Mar 24-25 at the U. of Illinois at Chicago

• Argentine Grid Workshop, Mar 12-14 at Santa Fe, Argentina

Grid School Syllabus

• Intro to distributed computing and the Grid
• Grid security and basic Grid access
• Grid resource and job management
• Grid data management
• Building, monitoring, maintaining & using Grids
• Grid applications and frameworks
• Workflow and related issues (scheduling, provenance)
• Future:
– Porting applications to the Grid
– Web services and the resource framework
– Advanced networking; data mining

Self-paced / online instruction

• opensciencegrid.org/OnlineGridCourse

• Flexible roadmaps for navigating the material

• Lectures and labs

• Access to online community to provide support

• Online office hours

I2U2: Interactions In Understanding the Universe

• The Grid for Secondary Science Education “educational virtual organization”
• creates an infrastructure to develop
– hands-on laboratory course content and
– an interactive learning experience that brings tangible aspects of each experiment into a “virtual laboratory.”
• These labs use the Grid for education in the same way that science uses the Grid.
• www.i2u2.org

I2U2

• “e-Labs”
– delivered as Web-based portals accessible in the classroom and at home
– implemented with Web-based media capabilities
• “i-Labs”
– delivered as interactive interfaces typically located within science museums and similar public venues
– leverage the latest advances in
  • display technology and
  • human-computer interaction,
– and bring the experiences and appreciation of scientific investigation and inquiry to the wide audience of informal education

List of e-Labs

– Cosmic Ray e-Lab
  • High school students investigate data from a cosmic ray detector array (it is not necessary to have a detector to participate).
  • Possible investigations: Muon Lifetime, Diurnal changes in flux, Effects of shielding, High-energy showers, Altitude effects
– CMS Test Beam e-Lab (Beta version)
  • High school students analyze CMS test beam data in an online graphical ROOT environment.
  • Shower Depth, Lateral Shower Size, Beam Purity, Detector Resolution
– LIGO e-Lab (Beta version)
  • High school and middle school students investigate seismic behavior with data from LIGO (Laser Interferometer Gravitational-wave Observatory).
  • Earthquake Studies, Frequency Band Studies, Microseismic Studies, Studies of Human-induced Seismic Activity
– ATLAS e-Lab
– STAR e-Lab

i-Labs

• To engage the general public in science, we envision using appealing museum exhibits to attract visitors' attention and engage them in a short taste of exploration.

• They will use virtual data tools and techniques to access, process and publish data, report their results as online posters, have online discussions about their work with peers, and then present posters and meet scientists at museums.

i-Lab Example

• Adler Planetarium
– is developing a cosmic ray i-Lab with support from QuarkNet and the Compact Muon Solenoid (CMS) experiment.
– effort to research an informal-education model

Cooperation with EGEE

International Schools on Grid Computing

– OSG as co-organizer for ISSGC’07 and ISSGC’08

– sponsor alumni of US Grid Schools to attend the International Summer school.

– Joint lectureships and material sharing / development efforts

– Content sharing

Cooperation with TeraGrid

• Another major national cyberinfrastructure

• Partnership of 11 organizations
– Mostly supercomputer labs

• Use of TG and OSG resources

• Contribute content

• Joint training

Education VO

• Interested in getting started with OSG?

• Join OSGEDU VO
– Use OSG resources
– Contribute resources

• Wiki, email lists, follow-up discussions
– Support, engagement
– Postings of opportunities for students

Students

2004-2008 facts:

• International participation:
– Argentina, Brazil, Canada, Colombia, India, Mexico, New Zealand, Russia, South Africa, Uruguay
• Women
– Approx. 15%
• Minorities
– Approx. 15%

Try to improve these statistics

Participants’ domains

• Computer Science
– Image processing
– Communications
– Networking
• Physics
– Astrophysics
– High Energy Nuclear Physics
– Optical Networks
– Theoretical solid state physics
– Atomic Physics
– Computational Physics
• Chemistry
– Computational Chemistry
– Molecular Dynamics & Simulation
• Applied Mathematics
• Geosciences
• Computational Multibody Dynamics for Distributed computing
• Judicial Administration
• Engineering
– Materials Science
• Quantum theory
• …and others …

Acknowledgments

Various OSG members and contributors (Alain Roy, Mike Wilde, Ruth Pordes, Gabrielle Allen and many others …)

Summary of OSG

• Provides core services, software and a distributed facility for an increasing set of research communities.

• Helps VOs access resources on many different infrastructures.

• Interested in collaborating and contributing our experience and efforts.

It’s the people… that make the grid a community!

http://www.opensciencegrid.org