Major Grid Computing Initiatives. Ian Foster, Mathematics and Computer Science Division, Argonne National Laboratory, and Department of Computer Science, The University of Chicago


Page 1: Major Grid Computing Initiatives

Major Grid Computing Initiatives

Ian Foster
Mathematics and Computer Science Division
Argonne National Laboratory
and
Department of Computer Science
The University of Chicago

Page 2: Major Grid Computing Initiatives

Ian Foster ARGONNE CHICAGO

Overview

  • The Grid concept
  • Historical background
  • Grid computing initiatives
  • Grid technology roadmap
  • Data Grid projects
  • Summary

Page 3: Major Grid Computing Initiatives

The Grid Concept

  • Enable communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals, in the absence of central control, omniscience, trust relationships
  • Via investigations of
      • New applications that become possible when resources can be shared in a coordinated way
      • Protocols, algorithms, persistent infrastructure to facilitate sharing

Page 4: Major Grid Computing Initiatives

A Little History

  • Early 90s
      • Gigabit testbeds, metacomputing
  • Mid to late 90s
      • Early experiments (e.g., I-WAY), academic software projects (e.g., Globus), application experiments
  • 2000
      • Major application communities emerging
      • Major infrastructure deployments
      • Clear architecture picture, rich technology base
      • Grid Forum: >300 people, >90 orgs, 11 countries

Page 5: Major Grid Computing Initiatives

Layered Grid Architecture (By Analogy to Internet Architecture)

Grid layers, top to bottom:

  • Application
  • User: “Specialized services”: user- or appln-specific distributed services
  • Collective: “Managing multiple resources”: ubiquitous infrastructure services
  • Resource: “Sharing single resources”: negotiating access, controlling use
  • Connectivity: “Talking to things”: communication (Internet protocols) & security
  • Fabric: “Controlling things locally”: access to, & control of, resources

Internet Protocol Architecture layers shown alongside: Application, Transport, Internet, Link

Page 6: Major Grid Computing Initiatives

Grid Technology Base

  • Development of Grid protocols & services
      • Protocol-mediated access to remote resources
      • New services: e.g., resource brokering
      • “On the Grid” = speak Intergrid protocols
      • Mostly (extensions to) existing protocols
  • Development of Grid APIs & SDKs
      • Facilitate application development by supplying higher-level abstractions
  • The (hugely successful) model is the Internet
  • The Grid is not a distributed OS!

Page 7: Major Grid Computing Initiatives

U.S. Grid Computing Activities (Excluding Data Grid Projects)

  • NSF: Fundamental IT research, plus
      • NSF PACI program (~$3M/yr)
      • NEESgrid ($10M over 3 years)
  • DOE SC: Fundamental IT research, plus
      • NGI program (done), SciDAC (perhaps)
  • DOE DP: DISCOM ($3M/yr?)
  • DARPA: Parts of Quorum ($2M/yr?)
  • NASA: Information Power Grid (~$5M/yr)
  • Funds [inadequate] support for research, development, deployment, operations

Page 8: Major Grid Computing Initiatives

Data Intensive Computing and Grids

  • The term “Data Grid” is often used
      • Unfortunate as it implies a distinct infrastructure, which it isn’t; but easy to say
  • Data-intensive computing shares numerous requirements with collaboration, instrumentation, computation, …
  • Important to exploit commonalities as very unlikely that multiple infrastructures can be maintained
  • Fortunately this seems easy to do!

Page 9: Major Grid Computing Initiatives

Emerging Data Grid Architecture

Layers, top to bottom:

  • Appln: Discipline-Specific Data Grid Application
  • User: Coherency control, replica selection, task management, virtual data catalog, virtual data code catalog, …
  • Collective: Replica catalog, replica management, co-allocation, certificate authorities, metadata catalogs, …
  • Resource: Access to data, access to computers, access to network performance data, …
  • Connect: Communication, service discovery (DNS), authentication, authorization, delegation
  • Fabric: Storage systems, clusters, networks, network caches, …

Page 10: Major Grid Computing Initiatives

Major Data Grid Projects

  • Clipper (DOE Science)
      • Technologies for reliable high-speed transfer
  • Earth System Grid (DOE Office of Science)
      • DG technologies, climate applications
  • European Data Grid (EU)
      • DG technologies & deployment in EU
  • GriPhyN (NSF ITR)
      • Investigation of “Virtual Data” concept
  • Particle Physics Data Grid (DOE Science)
      • DG applications for HENP experiments

Page 11: Major Grid Computing Initiatives

High-Level View of Earth System Grid: A Model Architecture for Data Grids

[Diagram: An Application sends an Attribute Specification to the Metadata Catalog and receives a Logical Collection and Logical File Name. The Replica Catalog maps these to Multiple Locations. Replica Selection uses Performance Information & Predictions from NWS and MDS to choose the Selected Replica, which is then fetched with GridFTP commands. Replica Locations 1, 2, and 3 are shown with their storage: Tape Library, Disk Array, and Disk Caches.]
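The lookup chain in this diagram can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two catalogs are plain dictionaries, the host names, logical file name, and bandwidth figures are invented, and the selection rule (highest predicted bandwidth wins) is one plausible policy rather than the one Earth System Grid actually uses. No real Globus, NWS, or MDS API is called.

```python
from urllib.parse import urlparse

# Everything below is a hypothetical stand-in: the catalogs are plain dicts,
# and the host names, file names, and bandwidth figures are invented.

# Metadata catalog: attribute specification -> logical file name.
METADATA_CATALOG = {
    frozenset({("variable", "temperature"), ("year", "1999")}): "lfn:temperature-1999",
}

# Replica catalog: logical file name -> physical replica URLs.
REPLICA_CATALOG = {
    "lfn:temperature-1999": [
        "gsiftp://tape.site1.example/data/temperature-1999",
        "gsiftp://cache.site2.example/data/temperature-1999",
        "gsiftp://disk.site3.example/data/temperature-1999",
    ],
}

# Predicted bandwidth to each host in Kbps, the kind of figure a
# forecasting service such as NWS might supply.
PREDICTED_KBPS = {
    "tape.site1.example": 8_000,
    "cache.site2.example": 45_000,
    "disk.site3.example": 20_000,
}


def find_logical_file(attributes: dict) -> str:
    """Resolve an attribute specification to a logical file name."""
    return METADATA_CATALOG[frozenset(attributes.items())]


def select_replica(logical_name: str) -> str:
    """Pick the replica whose host has the highest predicted bandwidth."""
    replicas = REPLICA_CATALOG[logical_name]
    return max(
        replicas,
        key=lambda url: PREDICTED_KBPS.get(urlparse(url).hostname, 0),
    )


logical = find_logical_file({"variable": "temperature", "year": "1999"})
chosen = select_replica(logical)
print(chosen)  # gsiftp://cache.site2.example/data/temperature-1999
```

The chosen URL is what would then be handed to a GridFTP client for the actual transfer; that step is left out here.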

Page 12: Major Grid Computing Initiatives

GriPhyN Overview (www.griphyn.org)

  • 5-year, $12M NSF ITR proposal to realize the concept of virtual data, via:
      1) CS research on
          • Virtual data technologies (info models, management of virtual data software, etc.)
          • Request planning and scheduling (including policy representation and enforcement)
          • Task execution (including agent computing, fault management, etc.)
      2) Development of Virtual Data Toolkit (VDT)
      3) Applications: ATLAS, CMS, LIGO, SDSS
  • PIs=Avery (Florida), Foster (Chicago)

Page 13: Major Grid Computing Initiatives

The Petascale Virtual Data Grid (PVDG) Model

  • Data suppliers publish data to the Grid
  • Users request raw or derived data from Grid, without needing to know
      • Where data is located
      • Whether data is stored or computed
  • User can easily determine
      • What it will cost to obtain data
      • Quality of derived data
  • PVDG serves requests efficiently, subject to global and local policy constraints
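A toy Python sketch may make the "stored or computed" idea concrete. It is an illustration only, not GriPhyN's design: the class and method names (`VirtualDataCatalog`, `Derivation`, `estimate_cost`), the flat fetch cost, and the cost units are all assumptions, and policy constraints and data quality are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Derivation:
    """Recipe for computing one data product from another (hypothetical)."""
    input_name: str
    transform: Callable[[bytes], bytes]
    compute_cost: float  # arbitrary cost units


class VirtualDataCatalog:
    """Serves requests for data that is either stored or derivable."""

    FETCH_COST = 1.0  # assumed flat cost of reading a stored product

    def __init__(self) -> None:
        self._stored: Dict[str, bytes] = {}
        self._derivations: Dict[str, Derivation] = {}

    def publish(self, name: str, data: bytes) -> None:
        """A data supplier publishes a product."""
        self._stored[name] = data

    def register(self, name: str, derivation: Derivation) -> None:
        """Record how a derived product can be computed on demand."""
        self._derivations[name] = derivation

    def estimate_cost(self, name: str) -> float:
        """Tell the user what a request would cost before issuing it."""
        if name in self._stored:
            return self.FETCH_COST
        d = self._derivations[name]  # KeyError if the product is unknown
        return self.estimate_cost(d.input_name) + d.compute_cost

    def request(self, name: str) -> bytes:
        """Return the product; the caller never learns if it was computed."""
        if name in self._stored:
            return self._stored[name]
        d = self._derivations[name]
        result = d.transform(self.request(d.input_name))
        self._stored[name] = result  # keep the result for later requests
        return result


catalog = VirtualDataCatalog()
catalog.publish("raw_events", b"3 1 2")
catalog.register(
    "sorted_events",
    Derivation("raw_events", lambda raw: b" ".join(sorted(raw.split())), 5.0),
)

cost_before = catalog.estimate_cost("sorted_events")  # fetch + compute
data = catalog.request("sorted_events")               # computed on demand
cost_after = catalog.estimate_cost("sorted_events")   # now stored
print(cost_before, data, cost_after)
```

After the first request the derived product is stored, so the estimated cost of asking again drops to the fetch cost alone.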

Page 14: Major Grid Computing Initiatives

PVDG Scenario

[Diagram: a user request ("?") is served across three tiers: Major Archive Facilities, Network caches & regional centers, and Local sites.]

User requests may be satisfied via a combination of data access and computation at local, regional, and central sites

Page 15: Major Grid Computing Initiatives

User View of PVDG Architecture

[Diagram: layers, top to bottom:]

  • Users: Production Team, Individual Investigator, Other Users
  • Interactive User Tools
  • Virtual Data Tools; Request Planning and Scheduling Tools; Request Execution Management Tools
  • Resource Management Services; Security and Policy Services; Other Grid Services
  • Transforms; Raw data source; Distributed resources (code, storage, computers, and network)

Page 16: Major Grid Computing Initiatives

Other Activities Relevant to Data Grids

  • Simulation activities
      • MONARC, MicroGrid
  • Globus Data Grid/replica mgmt services
      • GridFTP: secure high-performance FTP
      • Replica catalog/replica management
  • Grid Data Management Pilot (GDMP)
      • Being used to move data CERN->Caltech
      • Uses GridFTP
      • http://cmsdoc.cern.ch/cms/grid/

Page 17: Major Grid Computing Initiatives

Example Tech Developments: Globus Data Grid Services

[Diagram: software stack of Globus data grid components. The legend distinguishes libraries from programs and marks the components that already exist.]

Components shown:

  • globus-url-copy, Replica Programs, Custom Servers, Custom Clients
  • globus_replica_manager, globus_replica_catalog
  • globus_gass_copy, globus_gass_transfer, globus_gass
  • globus_ftp_client, globus_ftp_control
  • globus_io, globus_common, GSI (security), OpenLDAP client

Page 18: Major Grid Computing Initiatives

Example Technology Developments: Quality of Service for Bulk Transfer

[Plot: Bandwidth (Kbps, 0 to 100000) versus Time (0 to 250), with three series: background, foreground, competitive.]

  • When a reservation begins, the bulk-transfer backs off
  • When a reservation ends, the bulk-transfer speeds up
  • The competitive UDP traffic never interferes

GARA: www.mcs.anl.gov/qos
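The back-off behavior described on this slide can be illustrated with a small Python model. It is a sketch under assumptions, not a description of how GARA works: the link capacity is taken from the top of the plot's bandwidth axis, the reservation times and rates are invented, the bulk transfer is assumed to use whatever capacity is not reserved, and the competitive UDP traffic is not modeled at all.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Reservation:
    """A foreground bandwidth reservation (hypothetical record)."""
    start: int      # first time step at which the reservation is active
    end: int        # first time step at which it is no longer active
    rate_kbps: int  # bandwidth set aside for the foreground flow


def bulk_rate_kbps(t: int, capacity_kbps: int,
                   reservations: List[Reservation]) -> int:
    """Rate left for the background bulk transfer at time t."""
    reserved = sum(r.rate_kbps for r in reservations if r.start <= t < r.end)
    return max(0, capacity_kbps - reserved)


CAPACITY_KBPS = 100_000  # assumed: the top of the plot's bandwidth axis
reservations = [
    Reservation(start=50, end=100, rate_kbps=40_000),   # invented values
    Reservation(start=150, end=200, rate_kbps=40_000),  # invented values
]

timeline = [
    (t, bulk_rate_kbps(t, CAPACITY_KBPS, reservations))
    for t in range(0, 250, 25)
]
for t, rate in timeline:
    print(f"t={t:3d}  bulk transfer rate={rate:6d} Kbps")
```

In this model the bulk rate drops from 100000 to 60000 Kbps while a reservation is active and returns to 100000 Kbps when it ends, which mirrors the two annotations on the plot.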

Page 19: Major Grid Computing Initiatives

Summary

  • New data-intensive applications require a new type of infrastructure: “Data Grids”
  • Concerns and infrastructure requirements have much in common with other “Grids”
  • Development requires substantial R&D in caching, security, policy, QoS, etc., etc.
  • Existing technology base enables construction of Data Grids to start now

www.globus.org
www.griphyn.org
www.gridforum.org
www.ppdg.net
grid.web.cern.ch