
The Global Storage Grid

Or, Managing Data for “Science 2.0”

Ian Foster, Computation Institute

Argonne National Lab & University of Chicago

2

“Web 2.0”

Software as services: data- & computation-rich network services

Services as platforms: easy composition of services to create new capabilities (“mashups”) that may themselves be made accessible as new services

Enabled by massive infrastructure buildout: Google projected to spend $1.5B on computers, networks, and real estate in 2006; dozens of others are spending substantially

Paid for by advertising

Declan Butler, Nature

3

Science 2.0: E.g., Cancer Bioinformatics Grid

Figure: a researcher or client app submits a <WorkflowInputs> document; a BPEL engine executes the <BPEL WorkflowDoc>, linking a data service @ uchicago.edu with analytic services @ osu.edu and @ duke.edu, and returns <WorkflowResults>.

caBiG: https://cabig.nci.nih.gov/ BPEL work: Ravi Madduri et al.

4

Science 2.0: E.g., Virtual Observatories

Figure (S. G. Djorgovski): users reach data archives, analysis tools, and discovery tools through a gateway.

5

Science 1.0 → Science 2.0: For Example, Digital Astronomy

“Tell me about this star” → “Tell me about these 20K stars”

Support 1000s of users

E.g., Sloan Digital Sky Survey, ~40 TB; others much bigger soon

6

Global Data Requirements

Service consumer: discover data; specify compute-intensive analyses; compose multiple analyses; publish results as a service

Service provider: host services enabling data access/analysis; support remote requests to services; control who can make requests; support time-varying, data- and compute-intensive workloads

7

Analyzing Large Data: “Move Computation to the Data”

But: the amount of computation can be enormous; load can vary tremendously; and users want to compose distributed services, so data must sometimes be moved anyway.

Fortunately: networks are getting much faster (in parts), and workloads can have significant locality of reference.

8

“Move Computation to the Core”

Figure: a poorly connected “periphery” surrounds a highly connected “core”.

9

Highly Connected “Core”: For Example, TeraGrid

Figure: TeraGrid network map linking Starlight, LA, Atlanta, SDSC, TACC, PU, IU, ORNL, NCSA, ANL, and PSC.

• 16 supercomputers: 9 different types, multiple sizes
• World’s fastest network
• Globus Toolkit and other middleware providing single login, application management, data movement, web services

30 Gigabits/s to large sites = 20-30 times major university links = 30,000 times my home broadband = 1 full-length feature film per second

75 Teraflops (trillion calculations/s) = 12,500 times faster than all 6 billion humans on Earth each doing one calculation per second

10

Open Science Grid

Figure: 4000 jobs, May 7-14, 2006.

11

The Two Dimensions of Science 2.0

Decompose across network

Clients integrate dynamically: select & compose services; select “best of breed” providers; publish results as new services

Decouple resource & service providers

Figure (S. G. Djorgovski): function and resource dimensions, spanning users, data archives, analysis tools, and discovery tools.

12

Technology Requirements: Integration & Decomposition

Service-oriented applications: wrap applications & data as services; compose services into workflows that users invoke (see the sketch after the reference below)

Service-oriented Grid infrastructure: provision physical resources to support application workloads

“The Many Faces of IT as Service”, ACM Queue, Foster, Tuecke, 2005
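To make “wrap applications & data as services, then compose them into workflows” concrete, here is a minimal Python sketch. It is not Globus, BPEL, or any real web-service stack; the Service and Workflow classes and the example services are purely illustrative stand-ins for remote services.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class Service:
    """A wrapped application or data source exposing a uniform interface."""
    name: str
    fn: Callable[[Any], Any]

    def invoke(self, request: Any) -> Any:
        # In a real deployment this would be a remote web-service call;
        # here it is a local function call for illustration only.
        return self.fn(request)

class Workflow:
    """Compose services into a linear pipeline (a tiny stand-in for a BPEL workflow)."""
    def __init__(self, stages: List[Service]):
        self.stages = stages

    def run(self, inputs: Any) -> Any:
        result = inputs
        for stage in self.stages:
            result = stage.invoke(result)
        return result

# Hypothetical services: a data service and two analytic services.
data_service = Service("data", lambda query: [1.0, 2.0, 3.0, 4.0])                       # fetch records
filter_service = Service("filter", lambda xs: [x for x in xs if x > 1])                  # analysis step 1
stats_service = Service("stats", lambda xs: {"n": len(xs), "mean": sum(xs) / len(xs)})   # analysis step 2

workflow = Workflow([data_service, filter_service, stats_service])
print(workflow.run("stars brighter than 20th magnitude"))  # {'n': 3, 'mean': 3.0}
```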

13

Technology Requirements: Within the Core …

Provide “service hosting services” that allow consumers to negotiate the hosting of arbitrary data analysis services

Dynamically manage resources (compute, storage, network) to meet diverse computational demands

Provide strong internal security, bridging to diverse external sources of attributes

14

Globus Software Enables Grid Infrastructure

Web service interfaces for behaviors relating to integration and decomposition: primitives (resources, state, security) and services (execution, data movement, …)

Open source software that implements those interfaces, in particular the Globus Toolkit (GT4)

All standard Web services: “Grid is a use case for Web services, focused on resource management”

15

Open Source Grid Software

Globus Toolkit v4 (www.globus.org) components, by category:

Data Mgmt: GridFTP; Reliable File Transfer; Data Access & Integration; Data Replication; Replica Location

Security: Authentication & Authorization; Community Authorization; Delegation; Credential Mgmt

Execution Mgmt: Grid Resource Allocation & Management; Community Scheduling Framework; Workspace Management; Grid Telecontrol Protocol

Info Services: Index; Trigger; WebMDS

Common Runtime: Java Runtime; C Runtime; Python Runtime

Globus Toolkit Version 4: Software for Service-Oriented Systems, LNCS 3779, 2-13, 2005

16

GT4 Data Services

Data movement: GridFTP (secure, reliable, performant); Reliable File Transfer (managed transfers); Data Replication Service (managed replication)

Replica Location Service: scales to 100s of millions of replicas

Data Access & Integration services: access to, and server-side processing of, structured data

Figure: Bandwidth vs. striping, disk-to-disk on TeraGrid. Bandwidth (Mbps, axis up to ~20,000) as a function of degree of striping (0-70), for 1, 2, 4, 8, 16, and 32 streams.
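The plot’s point, that parallel streams and striping raise aggregate bandwidth, can be illustrated with a toy local sketch in plain Python (this is not GridFTP; file names and the chunking scheme are made up). Each worker thread copies one byte range, standing in for one stream; on a local disk the gain is small, but over a high-bandwidth, high-latency WAN this kind of parallelism is what lets transfers approach the numbers in the figure.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def copy_chunk(src_path: str, dst_path: str, offset: int, length: int) -> None:
    """Copy one byte range; each call stands in for one parallel stream."""
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        src.seek(offset)
        dst.seek(offset)
        dst.write(src.read(length))

def parallel_copy(src_path: str, dst_path: str, streams: int = 4) -> None:
    """Split the file into `streams` byte ranges and copy them concurrently."""
    size = os.path.getsize(src_path)
    with open(dst_path, "wb") as dst:
        dst.truncate(size)                # pre-size destination so ranges can be written in place
    chunk = (size + streams - 1) // streams
    with ThreadPoolExecutor(max_workers=streams) as pool:
        futures = [
            pool.submit(copy_chunk, src_path, dst_path, i * chunk,
                        min(chunk, size - i * chunk))
            for i in range(streams) if i * chunk < size
        ]
        for f in futures:
            f.result()                    # propagate any errors

if __name__ == "__main__":
    with open("example.dat", "wb") as f:  # make a small test file
        f.write(os.urandom(1 << 20))
    parallel_copy("example.dat", "copy.dat", streams=4)
    assert open("example.dat", "rb").read() == open("copy.dat", "rb").read()
```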

17

Security Services

Attribute Authority (ATA): issues signed attribute assertions (incl. identity, delegation & mapping)

Authorization Authority (AZA): decisions based on assertions & policy

Figure: VO users A and B hold VO member attributes from their VO ATAs; a delegation assertion states that “User B can use Service A”; mapping ATAs translate VO-A and VO-B attributes; together with resource admin attributes, these drive the VO AZA’s authorization decisions at the VO A and VO B services. (A minimal sketch of this attribute/authorization split follows.)
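A minimal sketch of the ATA/AZA split, assuming plain Python objects in place of signed SAML or X.509 attribute assertions (issuer, subject, and attribute names are invented for illustration): the authorization authority grants access only if a trusted issuer has asserted the attribute that policy requires.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class AttributeAssertion:
    """A (notionally signed) statement by an issuer about a subject."""
    issuer: str      # e.g. "VO-A ATA" or a delegating user
    subject: str     # e.g. "UserB"
    attribute: str   # e.g. "VO-B:member" or "delegation:ServiceA"

@dataclass
class PolicyRule:
    """Grant `action` on `resource` to subjects holding `required_attribute`."""
    resource: str
    action: str
    required_attribute: str
    trusted_issuers: List[str]

def authorize(subject: str, resource: str, action: str,
              assertions: List[AttributeAssertion],
              policy: List[PolicyRule]) -> bool:
    """Authorization authority: decide based on assertions & policy."""
    for rule in policy:
        if rule.resource != resource or rule.action != action:
            continue
        for a in assertions:
            if (a.subject == subject
                    and a.attribute == rule.required_attribute
                    and a.issuer in rule.trusted_issuers):
                return True
    return False

# User B presents a VO membership attribute plus a delegation assertion from User A.
assertions = [
    AttributeAssertion("VO-B ATA", "UserB", "VO-B:member"),
    AttributeAssertion("UserA", "UserB", "delegation:ServiceA"),
]
policy = [PolicyRule("ServiceA", "invoke", "delegation:ServiceA", trusted_issuers=["UserA"])]
print(authorize("UserB", "ServiceA", "invoke", assertions, policy))  # True
```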

18

Service Hosting

Figure: through an interface exposed by the resource provider, and subject to policy, a client allocates/provisions and configures an environment, then initiates, monitors, and controls an activity within it. (A sketch of such an interface follows the protocol list below.)

WSRF (or WS-Transfer/WS-Man, etc.), Globus GRAM, Virtual Workspaces
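The hosting interface in the figure can be sketched as an abstract contract; the method names below mirror the slide’s verbs and are assumptions, not the actual WSRF, WS-Management, or GRAM operation names.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict

class ServiceHost(ABC):
    """Abstract hosting interface a resource provider exposes to clients.
    Real systems express these behaviors via WSRF (or WS-Transfer/WS-Man),
    Globus GRAM, or virtual workspaces; this is only an illustrative contract."""

    @abstractmethod
    def allocate(self, requirements: Dict[str, Any]) -> str:
        """Provision an environment (CPU, storage, network); return its id."""

    @abstractmethod
    def configure(self, env_id: str, settings: Dict[str, Any]) -> None:
        """Install software and apply policy in the provisioned environment."""

    @abstractmethod
    def initiate(self, env_id: str, activity: str) -> str:
        """Start an activity (service instance, job); return an activity id."""

    @abstractmethod
    def monitor(self, activity_id: str) -> Dict[str, Any]:
        """Expose activity state (running, load, failures)."""

    @abstractmethod
    def control(self, activity_id: str, action: str) -> None:
        """Suspend, resume, or terminate the activity."""
```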

19

Virtual OSG Clusters

Figure: a virtual OSG cluster runs in Xen hypervisors hosted on a TeraGrid cluster and joins the OSG.

“Virtual Clusters for Grid Communities,” Zhang et al., CCGrid 2006

20

Managed Storage: GridFTP with NeST (Demoware)

Figure: a GT4 GridFTP server with a NeST module sits in front of disk storage; clients perform file transfers over GSI-FTP, the GridFTP server talks to the NeST server over chirp for file transfers, and a custom application uses chirp directly for lot operations, etc.

Bill Allcock, Nick LeRoy, Jeff Weber, et al.

21

Hosting Science Services

1) Integrate services from other sources: virtualize external services as VO services

2) Coordinate & compose: create new services from existing ones

Figure: a community obtains content and services from services providers, and capacity from capacity providers.

“Service-Oriented Science”, Science, 2005

22

Virtualizing Existing Services

Establish a service agreement with the service (e.g., WS-Agreement)

Delegate use to community (“VO”) users

Figure: a VO admin establishes agreements with existing services and delegates their use to VO users A and B.

23

The Globus-Based LIGO Data Grid

LIGO Gravitational Wave Observatory

Replicating >1 Terabyte/day to 8 sites (including Birmingham, Cardiff, and AEI/Golm)

>40 million replicas so far

MTBF = 1 month

www.globus.org/solutions

24

Data Replication Service

Pull “missing” files to a storage system: given a list of required files, the Data Replication Service consults the Replica Location Index and local replica catalogs (data location), then drives the Reliable File Transfer Service and GridFTP (data movement) to copy the files and register the new replicas in the local replica catalog. (A toy sketch of this pattern follows the reference below.)

“Design and Implementation of a Data Replication Service Based on the Lightweight Data Replicator System,” Chervenak et al., 2005
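A toy sketch of the “pull missing files” pattern, with dictionaries and callables standing in for the Replica Location Index, local replica catalog, and RFT/GridFTP transfer service (none of these are the real DRS APIs; the logical file names are invented):

```python
from typing import Callable, Dict, List, Set

def replicate_missing(required: List[str],
                      local_catalog: Set[str],
                      location_index: Dict[str, List[str]],
                      transfer: Callable[[str], None]) -> List[str]:
    """Pull files that are required but not yet held locally.

    required:        logical file names the site must hold
    local_catalog:   logical names already registered locally
    location_index:  logical name -> remote replica URLs (stand-in for the RLI)
    transfer:        callable(url) performing the copy (stand-in for RFT/GridFTP)
    """
    fetched = []
    for lfn in required:
        if lfn in local_catalog:
            continue                      # already replicated locally
        sources = location_index.get(lfn, [])
        if not sources:
            print(f"no known replica for {lfn}")
            continue
        transfer(sources[0])              # data movement
        local_catalog.add(lfn)            # register the new local replica
        fetched.append(lfn)
    return fetched

# Toy usage with invented names.
index = {"run42.h5": ["gsiftp://siteA/data/run42.h5"],
         "run43.h5": ["gsiftp://siteB/data/run43.h5"]}
local = {"run42.h5"}
print(replicate_missing(["run42.h5", "run43.h5"], local, index,
                        transfer=lambda url: print("copying", url)))
# copying gsiftp://siteB/data/run43.h5
# ['run43.h5']
```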

25

Data Replication Service: Dynamic Deployment

Layered deployment: procure hardware (physical machine); deploy hypervisor/OS; deploy virtual machines; deploy container (JVM); deploy services (DRS, GridFTP, local replica catalog) as VO services.

State is exposed & accessed uniformly at all levels; provisioning, management, and monitoring at all levels.

26

Decomposition Enables Separation of Concerns & Roles

User to service provider: “Provide access to data D at S1, S2, S3 with performance P”

Service provider to resource provider: “Provide storage with performance P1, network with P2, …”

Figure: data D is delivered to sites S1, S2, S3 via mechanisms such as a replica catalog, user-level multicast, …

27

Example: Biology

Public PUMA Knowledge Base

Information about proteins analyzed against ~2 million gene sequences

Back office analysis on Grid: millions of BLAST, BLOCKS, etc., runs on OSG and TeraGrid

Natalia Maltsev et al., http://compbio.mcs.anl.gov/puma2

28

Genome Analysis & Database Update (GADU) on OSG

Figure: ~3,000 GADU jobs, April 24, 2006.

29

Example: Earth System Grid

Climate simulation data: per-collection control; different user classes; server-side processing

Implementation (GT): portal-based user registration (PURSE); PKI, SAML assertions; GridFTP, GRAM, SRM

>2000 users; >100 TB downloaded

www.earthsystemgrid.org — DOE OASCR

30

“Inside the Core” of ESG

31

Example: Astro Portal Stacking Service

Purpose: on-demand “stacks” of random locations within a ~10 TB dataset

Challenge: rapid access to 10-10K “random” files; time-varying load

Solution: dynamic acquisition of compute & storage

Figure: many image cutouts (from the S4 Sloan data, accessed via a web page or web service) are summed into a single stacked image. (A minimal stacking sketch follows.)

Joint work with Ioan Raicu & Alex Szalay
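Stacking means co-adding many small cutouts so that faint sources rise above the noise. A minimal numpy sketch, with a stub in place of the real data access layer (which, at 10-10K random files in a ~10 TB dataset, is the hard, Grid-scale part); read_cutout and its file names are hypothetical:

```python
import numpy as np

def read_cutout(file_id: str, x: int, y: int, size: int = 32) -> np.ndarray:
    """Stub for fetching a size x size cutout around (x, y) from one survey file.
    In the real service this is the expensive step: many 'random' files spread
    across a very large dataset."""
    rng = np.random.default_rng(hash((file_id, x, y)) % (2**32))
    return rng.normal(loc=0.0, scale=1.0, size=(size, size))

def stack(locations) -> np.ndarray:
    """Co-add cutouts; averaging N frames suppresses noise by roughly sqrt(N)."""
    cutouts = [read_cutout(f, x, y) for f, x, y in locations]
    return np.mean(cutouts, axis=0)

locations = [(f"field_{i}.fits", 100 + i, 200 + i) for i in range(50)]
stacked = stack(locations)
print(stacked.shape, float(stacked.std()))   # noise std falls to ~1/sqrt(50) of a single cutout
```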

32

Preliminary Performance (TeraGrid, LAN GPFS)

Joint work with Ioan Raicu & Alex Szalay

33

Example: Cybershake

Calculate hazard curves by generating synthetic seismograms from estimated rupture forecast

Figure: rupture forecast → strain Green tensor → synthetic seismogram → spectral acceleration → hazard curve → hazard map. (A simplified hazard-curve sketch follows.)

Tom Jordan et al., Southern California Earthquake Center
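A hazard curve reports, for each ground-motion level, the probability of exceeding it per unit time. The sketch below shows that final step only, under simplifying assumptions (independent ruptures, made-up annual probabilities and spectral accelerations rather than CyberShake's actual outputs):

```python
import numpy as np

# Hypothetical inputs: for each rupture, an annual occurrence probability and the
# peak spectral acceleration (in g) its synthetic seismogram produces at the site.
annual_prob = np.array([0.010, 0.004, 0.002, 0.001])
spectral_acc = np.array([0.10, 0.25, 0.45, 0.80])

def hazard_curve(levels: np.ndarray) -> np.ndarray:
    """P(at least one rupture exceeds each level in a year), assuming independence."""
    exceed = np.empty_like(levels)
    for i, level in enumerate(levels):
        p_miss = np.prod(1.0 - annual_prob[spectral_acc > level])  # no exceedance by any rupture
        exceed[i] = 1.0 - p_miss
    return exceed

levels = np.array([0.05, 0.2, 0.4, 0.6])
for g, p in zip(levels, hazard_curve(levels)):
    print(f"P(SA > {g:.2f} g) per year = {p:.4f}")
```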

34

Enlisting TeraGrid Resources

20 TB, 1.8 CPU-years

Figure: a workflow scheduler/engine and VO scheduler, backed by a VO service catalog, provenance catalog, and data catalog, move work and data among SCEC storage, TeraGrid compute, and TeraGrid storage.

Ewa Deelman, Carl Kesselman, et al., USC Information Sciences Institute

Figure: number of jobs per day and CPU hours per day over 23 days (10/19-11/10); 261,823 jobs and 15,706 CPU hours (1.8 years) in total.

35

http://dev.globus.org

Guidelines (Apache)

Infrastructure (CVS, email, bugzilla, Wiki)

Projects include …

dev.globus — Community Driven Improvement of Globus Software, NSF OCI

36

Summary

“Science 2.0” (science as service, and service as platform) demands: new infrastructure (service hosting); new technology (hosting & management); new policy (hierarchically controlled access)

Data & storage management cannot be separated from computation management, and increasingly these become community roles

A need for new technologies, skills, & roles: creating, publishing, hosting, discovering, composing, archiving, explaining … services

39

Acknowledgements

Carl Kesselman for many discussions

Many colleagues, including those named on slides, for research collaborations and/or slides

Colleagues involved in the TeraGrid, Open Science Grid, Earth System Grid, caBIG, and other Grid infrastructures

Globus Alliance members for Globus software R&D

DOE OASCR, NSF OCI, & NIH for support

40

For More Information

Globus Alliance: www.globus.org

Dev.Globus: dev.globus.org

Open Science Grid: www.opensciencegrid.org

TeraGrid: www.teragrid.org

Background: www.mcs.anl.gov/~foster

2nd Edition: www.mkp.com/grid2

Thanks to DOE, NSF, and NIH for research support!!