Computer Science Research

Ian Foster
University of Chicago & Argonne National Laboratory
foster@mcs.anl.gov

GriPhyN NSF Project Review
29-30 January 2003
Chicago

Computer Science Research

Introduction & Context (Ian Foster: 30 mins)
– Vision: Virtual data as e-science enabler
– Organization: Structure & interactions
– Dissemination: Targets and mechanisms
– The nature of future challenges

Computer science research
– Virtual data (Mike Wilde: 15)
– Scheduling, planning (Ewa Deelman: 15)
– Execution (Mike Franklin: 15)
– Performance (Valerie Taylor: 15)

Technology delivery (Miron Livny: 15)
– Virtual Data Toolkit

Student presentations (60)

PetaScale Virtual Data Grids (1)

[Figure: petascale virtual data grid architecture. Labels: Production Team, Individual Investigator, Research group; Interactive User Tools; Virtual Data Tools; Request Planning & Scheduling Tools; Request Execution & Management Tools; Resource Management Services; Security and Policy Services; Other Grid Services; Transforms; Raw data source; Distributed resources (code, storage, computers, and network); Performance: PetaOps, Petabytes.]

Petascale Virtual Data Grids (2)

[Figure: components of a petascale virtual data grid. Labels: Data Grid (storage elements, replica location service, virtual data catalogs, virtual data index, Data Transport, Storage Resource Mgmt); Computing Grid (workflow planner, request planner, workflow executor (DAGman), request executor (Condor-G, GRAM), request predictor (Prophesy)); Grid Operations; Grid Monitor; Science Review; Production Manager; Researcher; detector. Flow labels: planning, discovery, composition, simulation, analysis, sharing, derivation, raw data, simulation data.]

Computer Science and GriPhyN

[Figure: relationships between GriPhyN computer science and its partners. Boxes: Computer Science Research; Virtual Data Toolkit; Partner Physics Projects; Larger Science Community; Globus, Condor, NMI, EU DataGrid, PPDG Communities. Arrow labels: Techniques & software; Requirements; Prototyping & experiments; Production Deployment; Tech Transfer.]

Other linkages:
- Work force
- CS researchers
- Industry

Computer Science Challenges (1)

Virtual data
– Representation, discovery, & manipulation of workflows and associated data & programs

Planning
– Mapping workflows in an efficient, policy-aware manner to distributed resources

Execution
– Executing workflows, including data movements, reliably and efficiently

Performance
– Monitoring aspects of system performance for scheduling & troubleshooting
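To make the virtual data, planning, and execution challenges concrete, here is a minimal Python sketch. It is not Chimera, Pegasus, or DAGman code: the `Derivation` and `VirtualDataCatalog` names, their fields, and the sample datasets are all hypothetical. It only illustrates the idea of recording how a dataset is derived, planning the missing steps in dependency order, and executing them while reusing data that already exists. Policy, resource mapping, and data movement are left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Derivation:
    """Hypothetical record: how one dataset is produced from others."""
    output: str
    inputs: Tuple[str, ...]
    transform: Callable[..., object]


class VirtualDataCatalog:
    """Toy catalog; names and structure are illustrative assumptions."""

    def __init__(self) -> None:
        self.derivations: Dict[str, Derivation] = {}
        self.materialized: Dict[str, object] = {}  # datasets that already exist

    def register(self, derivation: Derivation) -> None:
        self.derivations[derivation.output] = derivation

    def plan(self, target: str) -> List[str]:
        """Planning: list the missing datasets in dependency order."""
        order: List[str] = []
        visiting: set = set()

        def visit(name: str) -> None:
            if name in self.materialized or name in order:
                return  # already available or already scheduled
            if name not in self.derivations:
                raise KeyError(f"no data and no derivation for {name!r}")
            if name in visiting:
                raise ValueError(f"cyclic derivation involving {name!r}")
            visiting.add(name)
            for dep in self.derivations[name].inputs:
                visit(dep)
            visiting.discard(name)
            order.append(name)

        visit(target)
        return order

    def request(self, target: str) -> object:
        """Execution: run each planned step, reusing existing data."""
        for name in self.plan(target):
            d = self.derivations[name]
            args = [self.materialized[i] for i in d.inputs]
            self.materialized[name] = d.transform(*args)
        return self.materialized[target]


catalog = VirtualDataCatalog()
catalog.materialized["raw"] = [3, 1, 2]  # stand-in for detector data
catalog.register(Derivation("calibrated", ("raw",), lambda r: [x * 10 for x in r]))
catalog.register(Derivation("summary", ("calibrated",), lambda c: sum(c)))

steps = catalog.plan("summary")      # steps needed before anything has run
result = catalog.request("summary")  # derive on demand
```

Once `summary` has been derived, a second `plan("summary")` returns an empty list, which is the sense in which a dataset is "virtual": it is materialized only when it is requested and missing.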

Computer Science Challenges (2)

Engage meaningfully with physics groups

Provide educational opportunities

Develop, package, deliver, and support quality software

Achieve outreach to groups outside partner physics experiments

GriPhyN Computer Science Team

U.Chicago: Dumitrescu, Foster, Iamnitchi, Milligan, Ranganathan, Ripeanu, Voeckler, Wilde
USC/ISI: Deelman, Kesselman, Mehta, Patil, Singh, Vahi
NWU -> TAMU: Taylor, Yin
UCB: Franklin, Liu
UCSD: Marzullo, Moore, Zhang, Jagatheesan
UW-Madison: Alderman, Arpaci-Dusseau, Arpaci-Dusseau, Bailey, Bent, Kosar, Livny, Roy, Stanley, Thain
UF: Arbee, George, Jiang, Katageri, Ranka, Rodriguez
UT Brownsville: Campanelli, Morris, Zamora
LBNL: Shoshani

Faculty/Staff, Student/Postdoc (underlined = present)

Computer Science Research: How Do We Work?

System architecture & virtual data toolkit as two overarching organizational mechanisms

Project activities all defined in relationship to these organizing principles:
– Research: Explore new techniques to guide evolution of the system architecture and VDT
– Development: Construct VDT software
– Evaluation: Apply and evaluate VDT software and/or new techniques in context of application challenges

Computer Science Research: How Are We Coordinated?

The activities of this large, multidisciplinary group are coordinated by frequent and multivalent communications

– Face-to-face meetings in large & small groups

– Formal and informal documents defining requirements, challenge problems, testbeds

– Email, phone calls, videoconferences

– Cooperation on challenge problems and technology and application demonstrations

– Cooperation on software releases

GriPhyN Architecture/VDT and CS Research Projects

[Figure: CS research projects placed against the GriPhyN architecture/VDT areas (Virtual Data, Planning, Execution).
VDT components: Chimera Virtual Data System + Pegasus Planner; DAGman Workflow; Globus Toolkit, Condor, Ganglia, etc.
Research projects: Partial queries (Liu, Franklin); Decentralized scheduling (Ranganathan); Fault-tolerant master-worker (Marzullo); Scalable replica location service (UC, ISI team); Policy-aware scheduling (Dumitrescu); Ontologies (Zhao); NeST storage mgmt (UW team); Virtual data language design (Voeckler, Wilde); AI planning (Deelman, Narang); Virtual data language applns (Milligan, Zhao); DAGman enhancements (UW team); Prophesy (Taylor, Yin); HP monitoring (George).
Legend: VDT, Research.]

GriPhyN Arch/VDT - CS Research: Degree of Coupling

[Figure: the CS research projects of the previous slide, each marked by its degree of coupling to the architecture/VDT. Legend: VDT, Research; Already, Underway, Pending.]

Examples of Technology Injection: Chimera R&D Timeline

[Figure: timeline spanning 2002, 2003, and 2004, with a TECH track of Chimera releases and an APPS track of applications.]

TECH

Chimera-0
• Derivations only
• Grid exec environment (prototype)
• Perl & PostgreSQL

Chimera-1
• Java code & class model
• XML VDL
• TR/DV model
• Compound TRs
• General Grid exec env
• Optimized DB schema

Chimera-2
• Type model
• Dataset catalog
• Metadata
• Hyperlinks
• Instance tracking
• Performance data

Chimera-3
• Knowledge repr.
• Policy-driven planners
• VD browsers, composers
• …

APPS

• Sloan cluster finding
• CMS analysis prototype w/ROOT
• CMS official event simulation
• Sloan cluster-finding science
• CMS & ATLAS analysis w/ROOT, CLARENS, JAS
• LIGO pulsar search
• ATLAS events-on-demand
• CMS event simulation prototyping
• Sloan near-earth object
• BioGrid
• facility…

Dissemination: Targets

Researchers and educators
– Facilitate creation of new knowledge

Computer science research community
– Contribute to knowledge
– Engage community in solving our problems

Open source community
– Contribute to open Grid technology base

Industry
– Contribute to vibrant commercial technology

Dissemination: Mechanisms

Software
– VDT: adoption by LHC Computing Grid
– Globus Toolkit and Condor systems

Publications and talks
– XX papers, YY tech reports, ZZ talks

Workshops and meetings
– E.g., “Data Derivation & Provenance”, Oct 02

Community activities
– E.g., advisory committees, GGF standards

Representative Publications

Annis, J., Zhao, Y., Voeckler, J., Wilde, M., Kent, S., Foster, I., Applying Chimera Virtual Data Concepts to Cluster Finding in the Sloan Sky Survey. SC'2002, 2002.

Bent, J., Venkataramani, V., LeRoy, N., Roy, A., Stanley, J., Arpaci-Dusseau, A.C., Arpaci-Dusseau, R.H., Livny, M., Flexibility, Manageability, and Performance in a Grid Storage Appliance, HPDC’11, 2002.

Deelman, E., Blackburn, K., Ehrens, P., Kesselman, C., Koranda, S., Lazzarini, A., Mehta, G., Meshkat, L., Pearlman, L. and Williams, R., GriPhyN and LIGO: Building a Virtual Data Grid for Gravitational Wave Scientists, HPDC’11, 2002.

Foster, I., Voeckler, J., Wilde, M., Zhao, Y., Chimera: A Virtual Data System for Representing, Querying, and Automating Data Derivation, SSDBM, 2002.

Iamnitchi, A., Ripeanu, M., Foster, I., Locating Data in (Small-World?) Peer-to-Peer Scientific Collaborations. 1st Intl. Workshop on Peer-to-Peer Systems, 2002.

Raman, P., George, A., Radlinski, M., Subramaniyan, R., GEMS: Gossip-Enabled Monitoring Service for Heterogeneous Distributed Systems, Technical Report, UF, 2002.

Ranganathan, K. and Foster, I., Decoupling Computation and Data Scheduling in Distributed Data Intensive Applications, HPDC’11, 2002.

Ripeanu, M., Foster, I., Iamnitchi, A., Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design. Internet Computing, 6 (1). 50-57. 2002.

The Nature of Future Challenges

GriPhyN R&D is proving very successful
– In terms of “new ideas”
– In terms of interest & adoption

Our major challenges as we move forward are to scale and sustain the effort
– Research scope: virtual data => KR; planning, execution => x1000 larger; …; …
– Software support: we need NMIx10!
– Infrastructure & application support

See Atkins cyberinfrastructure report!

Summary

CS has made significant contributions both to experiments and to knowledge, e.g.
– Virtual data concepts and technologies
– Scheduling in large-scale distributed systems
– DAGman workflow management & execution
– Scalable replica location services

VDT (& underlying Globus Toolkit & Condor systems) a good technology transfer vehicle
– Adoption by major science projects
– Adoption of Grid concepts within industry

Major challenge: exploiting opportunities