
Page 1

Trillium and Open Science Grid

3rd International Workshop on HEP Data Grids
Kyungpook National University, Daegu, Korea
August 27, 2004

Paul Avery
University of Florida
[email protected]

Page 2

U.S. "Trillium" Grid Consortium

Trillium = PPDG + GriPhyN + iVDGL
  - Particle Physics Data Grid: $12M (DOE) (1999 – 2004+)
  - GriPhyN: $12M (NSF) (2000 – 2005)
  - iVDGL: $14M (NSF) (2001 – 2006)

Basic composition (~150 people)
  - PPDG: 4 universities, 6 labs
  - GriPhyN: 12 universities, SDSC, 3 labs
  - iVDGL: 18 universities, SDSC, 4 labs, foreign partners
  - Experiments: BaBar, D0, STAR, Jlab, CMS, ATLAS, LIGO, SDSS/NVO

Complementarity of projects
  - GriPhyN: CS research, Virtual Data Toolkit (VDT) development
  - PPDG: "end to end" Grid services, monitoring, analysis
  - iVDGL: Grid laboratory deployment using VDT
  - Experiments provide frontier challenges
  - Unified entity when collaborating internationally

Page 3

Trillium Science Drivers

  - Experiments at Large Hadron Collider: 100s of Petabytes, 2007 – ?
  - High Energy & Nuclear Physics experiments: ~1 Petabyte (1000 TB), 1997 – present
  - LIGO (gravity wave search): 100s of Terabytes, 2002 – present
  - Sloan Digital Sky Survey: 10s of Terabytes, 2001 – present

[Figure: data growth and community growth over 2001 – 2009]

Future Grid resources
  - Massive CPU (PetaOps)
  - Large distributed datasets (>100 PB)
  - Global communities (1000s)

Page 4

Trillium: Collaborative Program of Work

  - Common experiments, leadership, participants
  - CS research: workflow, scheduling, virtual data
  - Common Grid toolkits and packaging: Virtual Data Toolkit (VDT) + Pacman packaging
  - Common Grid infrastructure: Grid3, a national Grid for testing, development and production
  - Advanced networking: UltraNet, UltraLight, etc.
  - Integrated education and outreach effort + collaboration with outside projects
  - Unified entity in working with international projects: LCG, EGEE, Asia, South America

Page 5

Large Hadron Collider (LHC) @ CERN
Search for Origin of Mass & Supersymmetry (2007 – ?)

[Figure: the 27 km tunnel in Switzerland & France, with the ATLAS, CMS, ALICE, LHCb and TOTEM experiments]

Page 6

The LIGO Scientific Collaboration (LSC) and the LIGO Grid

  - LIGO Grid: 6 US sites + 3 EU sites (Birmingham and Cardiff/UK, AEI/Golm/Germany)
  - LHO, LLO: observatory sites
  - LSC (LIGO Scientific Collaboration) sites are iVDGL supported
  - iVDGL has enabled LSC to establish a persistent production grid

Page 7

Sloan Digital Sky Survey (SDSS): Using Virtual Data in GriPhyN

[Figure: galaxy cluster size distribution from Sloan data; number of clusters vs. number of galaxies]

Page 8

Goal: Peta-scale Virtual-Data Grids for Global Science

[Architecture figure] Users (single researchers, workgroups, production teams) work through interactive user tools and virtual data tools, which drive request planning & scheduling tools and request execution & management tools. These rely on resource management services, security and policy services, and other Grid services, all running over distributed resources (code, storage, CPUs, networks), transforms and raw data sources, delivering PetaOps, Petabytes, Performance.

Page 9

LHC: Petascale Global Science

  - Complexity: millions of individual detector channels
  - Scale: PetaOps (CPU), 100s of Petabytes (data)
  - Distribution: global distribution of people & resources

CMS example (2007): 5000+ physicists, 250+ institutes, 60+ countries
BaBar/D0 example (2004): 700+ physicists, 100+ institutes, 35+ countries

Page 10

Global LHC Data Grid Hierarchy (CMS Experiment)

[Figure: tiered data grid. Tier 0: online system and CERN computer center (0.1 - 1.5 GBytes/s). Tier 1: national centers (USA, Korea, Russia, UK). Tier 2 / Tier 3: university sites such as Caltech, UCSD, U Florida, Iowa, Maryland and FIU. Tier 4: physics caches and PCs. Wide-area links of 2.5-10, 10-40 and >10 Gb/s connect the tiers.]

~10s of Petabytes/yr by 2007-08; ~1000 Petabytes in < 10 yrs?
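
As a rough cross-check of the volumes quoted above, the sketch below converts the Tier 0 rate into a yearly figure. It assumes ~1e7 seconds of effective data-taking per year, a common HEP rule of thumb that is not stated on the slide.

```python
# Rough cross-check of the "~10s of Petabytes/yr" figure.
# Assumption (not on the slide): ~1e7 seconds of effective data-taking per year.

def yearly_volume_pb(rate_gb_per_s: float, live_seconds: float = 1e7) -> float:
    """Convert a sustained data rate in GBytes/s into Petabytes per year."""
    return rate_gb_per_s * live_seconds / 1e6   # 1 PB = 1e6 GB

for rate in (0.1, 1.5):                         # the 0.1 - 1.5 GBytes/s range above
    print(f"{rate:3.1f} GB/s  ->  ~{yearly_volume_pb(rate):.0f} PB/yr")
# ~1 PB/yr at 0.1 GB/s and ~15 PB/yr at 1.5 GB/s, consistent with
# "~10s of Petabytes/yr" once several experiments and derived data are included.
```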

Page 11

Tier2 Centers

Tier2 facility
  - 20 – 40% of Tier1?
  - "1 FTE support": commodity CPU & disk, no hierarchical storage
  - Essential university role in extended computing infrastructure
  - Validated by 3 years of experience with proto-Tier2 sites

Functions
  - Perform physics analysis, simulations
  - Support experiment software
  - Support smaller institutions

Official role in Grid hierarchy (U.S.)
  - Sanctioned by MOU with parent organization (ATLAS, CMS, LIGO)
  - Local P.I. with reporting responsibilities
  - Selection by collaboration via careful process

Page 12

Analysis by Globally Distributed Teams

  - Non-hierarchical: chaotic analyses + productions
  - Superimpose significant random data flows

Page 13

Trillium Grid Tools: Virtual Data Toolkit

[Build pipeline figure] Sources (CVS) are patched and built into GPT source bundles; the NMI Build & Test Condor pool (37 computers) builds, tests and packages them into the VDT; contributors (VDS, etc.) supply additional builds; the resulting binaries and RPMs are published through a Pacman cache. Contributor builds are to use NMI processes later.

Page 14

Virtual Data Toolkit: Tools in VDT 1.1.12

  - Globus Alliance: Grid Security Infrastructure (GSI), job submission (GRAM), information service (MDS), data transfer (GridFTP), Replica Location (RLS)
  - Condor Group: Condor/Condor-G, DAGMan, Fault Tolerant Shell, ClassAds
  - EDG & LCG: Make Gridmap, Cert. Revocation List Updater, Glue Schema/Info provider
  - ISI & UC: Chimera & related tools, Pegasus
  - NCSA: MyProxy, GSI OpenSSH
  - LBL: PyGlobus, Netlogger
  - Caltech: MonaLisa
  - VDT: VDT System Profiler, configuration software
  - Others: KX509 (U. Mich.)

Page 15

VDT Growth (1.1.14 Currently)

[Figure: number of VDT components vs. time, Jan 2002 – May 2004]
  - VDT 1.0: Globus 2.0b, Condor 6.3.1
  - VDT 1.1.3, 1.1.4 & 1.1.5: pre-SC 2002
  - VDT 1.1.7: switch to Globus 2.2
  - VDT 1.1.8: first real use by LCG
  - VDT 1.1.11: Grid3
  - VDT 1.1.14: May 10

Page 16

Packaging of Grid Software: Pacman

  - Language: define software environments
  - Interpreter: create, install, configure, update, verify environments
  - Combines and manages software from arbitrary sources: LCG/Scram, ATLAS/CMT, CMS DPE/tar/make, LIGO/tar/make, OpenSource/tar/make, Globus/GPT, NPACI/TeraGrid/tar/make, D0/UPS-UPD, Commercial/tar/make

% pacman -get iVDGL:Grid3

  - "1 button install": reduces the burden on administrators
  - Remote experts define installation/configuration/updating for everyone at once

[Figure: a single "% pacman" command pulling packages from the VDT, ATLAS, NPACI, D-Zero, iVDGL, UCHEP, CMS/DPE and LIGO caches]
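
To illustrate the "define once, install everywhere" idea behind Pacman caches, here is a minimal hypothetical sketch. The cache contents, package names and the get() helper are invented for illustration; this is not the real Pacman implementation.

```python
# Hypothetical sketch of the Pacman idea: remote experts publish package
# definitions in caches; a site admin asks for one top-level package and
# everything it depends on is fetched and installed.  Cache contents and
# install scripts below are invented for illustration only.

CACHES = {
    "iVDGL": {"Grid3": {"requires": ["VDT:Globus", "VDT:Condor"], "install": "setup-grid3.sh"}},
    "VDT":   {"Globus": {"requires": [], "install": "install-globus.sh"},
              "Condor": {"requires": [], "install": "install-condor.sh"}},
}

def get(spec: str, installed=None):
    """Install 'Cache:Package' and, recursively, everything it requires."""
    installed = set() if installed is None else installed
    if spec in installed:
        return installed
    cache, name = spec.split(":")
    pkg = CACHES[cache][name]
    for dep in pkg["requires"]:                    # install dependencies first
        get(dep, installed)
    print(f"running {pkg['install']} for {spec}")  # stand-in for the real work
    installed.add(spec)
    return installed

if __name__ == "__main__":
    get("iVDGL:Grid3")   # the "1 button install" of the slide
```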

Page 17

Trillium Collaborative Relationships: Internal and External

[Figure] Computer science research supplies techniques & software to the Virtual Data Toolkit; requirements, prototyping & experiments flow back. The VDT feeds production deployment and tech transfer toward the larger science community, partner physics projects (Globus, Condor, NMI, iVDGL, PPDG, EU DataGrid, LHC experiments), partner outreach projects (QuarkNet, CHEPREO), and U.S. and international Grids. Other linkages: outreach, work force, CS researchers, industry.

Page 18

Grid3: An Operational National Grid

  - 28 sites: universities + 4 national labs
  - 2800 CPUs, 400–1300 simultaneous jobs (800 now)
  - Running since October 2003
  - Applications in HEP, LIGO, SDSS, genomics

http://www.ivdgl.org/grid3

Page 19

Grid2003 Applications

  - High energy physics: US-ATLAS analysis (DIAL), US-ATLAS GEANT3 simulation (GCE), US-CMS GEANT4 simulation (MOP), BTeV simulation
  - Gravity waves: LIGO blind search for continuous sources
  - Digital astronomy: SDSS cluster finding (maxBcg)
  - Bioinformatics: bio-molecular analysis (SnB), genome analysis (GADU/Gnare)
  - CS demonstrators: Job Exerciser, GridFTP, NetLogger-grid2003

Page 20

Grid3: Three Months Usage

Page 21

CMS Production Simulations on Grid3

[Figure: US-CMS Monte Carlo simulation production on Grid3, split into USCMS and non-USCMS contributions; total = 1.5 × US-CMS dedicated resources]

Page 22

ATLAS DC2 Production System

  - 150K jobs total so far (12-24 hrs/job); 650 jobs running now

[Architecture figure] Windmill supervisors communicate (via jabber or SOAP) with grid-specific executors (Lexor for LCG, Dulcinea for NorduGrid, Capone for Grid3, plus an LSF executor) that run jobs on LCG, NorduGrid, Grid3 and local LSF resources. Shared services: the production database (prodDB), the Don Quijote data management system (dms), AMI, and RLS replica catalogs.
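
The figure above is essentially a supervisor/executor pattern. The following is a schematic sketch of that split under the architecture described, not the actual Windmill/Capone/Dulcinea/Lexor code; the class names and the in-memory "prodDB" are invented for the example.

```python
# Schematic sketch of the supervisor/executor split in the DC2 diagram above.
# Illustrative pseudo-infrastructure only, not the real production system.

from dataclasses import dataclass

@dataclass
class Job:
    job_id: int
    grid: str          # target flavor: "LCG", "NorduGrid", "Grid3" or "LSF"
    transformation: str

class Executor:
    """Base class: knows how to submit to one particular grid flavor."""
    def submit(self, job: Job) -> None:
        print(f"[{type(self).__name__}] submitting job {job.job_id} ({job.transformation})")

class LCGExecutor(Executor): pass
class NorduGridExecutor(Executor): pass
class Grid3Executor(Executor): pass
class LSFExecutor(Executor): pass

class Supervisor:
    """Pulls job definitions from the production DB and routes them to executors."""
    def __init__(self, prod_db, executors):
        self.prod_db = prod_db
        self.executors = executors

    def run_once(self):
        for job in self.prod_db:
            self.executors[job.grid].submit(job)

if __name__ == "__main__":
    prod_db = [Job(1, "Grid3", "G4 simulation"), Job(2, "LCG", "digitization")]
    Supervisor(prod_db, {
        "LCG": LCGExecutor(), "NorduGrid": NorduGridExecutor(),
        "Grid3": Grid3Executor(), "LSF": LSFExecutor(),
    }).run_once()
```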

Page 23

Grid2003 Broad Lessons

  - Careful planning and coordination essential to build Grids
      - Community investment of time/resources
  - Operations team needed to operate the Grid as a facility
      - Tools, services, procedures, documentation, organization
      - Security, account management, multiple organizations
  - Strategies needed to cope with increasingly large scale
      - "Interesting" failure modes as scale increases
      - Delegation of responsibilities to conserve human resources: project, virtual org., Grid service, site, application
      - Better services, documentation, packaging
  - Grid2003 experience critical for building "useful" Grids
      - Frank discussion in the "Grid2003 Project Lessons" doc

Page 24

Grid2003 Lessons: Packaging

VDT + Pacman: installation and configuration
  - Simple installation, configuration of Grid tools + applications
  - Major advances over 14 VDT releases

Why this is critical for us
  - Uniformity and validation across sites
  - Lowered barriers to participation → more sites!
  - Reduced FTE overhead & communication traffic

Further work: remote installation

Page 25

Grid3 → Open Science Grid

Build on Grid3 experience
  - Persistent, production-quality Grid, national + international scope
  - Continue U.S. leading role in international science
  - Grid infrastructure for large-scale collaborative scientific research

Create large computing infrastructure
  - Combine resources at DOE labs and universities to effectively become a single national computing infrastructure for science
  - Maintain interoperability with LCG (Gagliardi talk)

Provide opportunities for educators and students
  - Participate in building and exploiting this grid infrastructure
  - Develop and train scientific and technical workforce
  - Transform the integration of education and research at all levels

http://www.opensciencegrid.org

Page 26

Roadmap towards Open Science Grid

Iteratively build & extend Grid3 to a national infrastructure
  - Shared resources, benefiting a broad set of disciplines
  - Strong focus on operations
  - Grid3 → Open Science Grid

Contribute US-LHC resources to provide the initial federation
  - US-LHC Tier-1 and Tier-2 centers provide significant resources

Build OSG from laboratories, universities, campus grids, etc.
  - Argonne, Fermilab, SLAC, Brookhaven, Berkeley Lab, Jeff. Lab
  - UW Madison, U Florida, Purdue, Chicago, Caltech, Harvard, etc.
  - Corporations

Further develop OSG
  - Partnerships and contributions from other sciences, universities
  - Incorporation of advanced networking
  - Focus on general services, operations, end-to-end performance

Page 27

Possible OSG Collaborative Framework

[Figure] One Consortium Board, a small number (0…n) of Technical Groups, small joint committees (0…N), and a large number (0…N) of activities. Participants (enterprise and research grid projects, VO organizations, researchers, sites, service providers, campuses and labs) provide resources, management and project steering groups.

Page 28

Open Science Grid Consortium

Join U.S. forces into the Open Science Grid consortium
  - Application scientists, technology providers, resource owners, ...

Provide a lightweight framework for joint activities
  - Coordinate activities that cannot be done within one project
  - Achieve technical coherence and strategic vision (roadmap)
  - Enable communication and provide liaising
  - Reach common decisions when necessary

Create Technical Groups
  - Propose, organize, oversee activities; peer; liaise
  - Now: security and storage services
  - Later: support centers, policies

Define activities
  - Contributing tasks provided by participants joining the activity

Page 29

Applications, Infrastructure, Facilities

Page 30

Open Science Grid Partnerships

Page 31

Partnerships for Open Science Grid

Complex matrix of partnerships: challenges & opportunities
  - NSF and DOE, universities and labs, physics and computing departments
  - U.S. LHC and Trillium Grid projects
  - Application sciences, computer sciences, information technologies
  - Science and education
  - U.S., Europe, Asia, North and South America
  - Grid middleware and infrastructure projects
  - CERN and U.S. HEP labs, LCG and experiments, EGEE and OSG

~85 participants on OSG stakeholder mail list
~50 authors on OSG abstract to CHEP conference (Sep. 2004)

Page 32

Open Science Grid Meetings

  - Sep. 17, 2003 @ NSF: early discussions with other HEP people; strong interest of NSF education people
  - Jan. 12, 2004 @ Fermilab: initial stakeholders meeting, 1st discussion of governance
  - May 20-21, 2004 @ Univ. of Chicago: joint Trillium Steering meeting to define OSG program
  - July 2004 @ Wisconsin: first attempt to define OSG Blueprint (document)
  - Sep. 7-8, 2004 @ MIT: 2nd Blueprint meeting
  - Sep. 9-10, 2004 @ Harvard: major OSG workshop; three sessions: (1) Technical, (2) Governance, (3) Sciences

Page 33

Page 34

Education and Outreach

Page 35


Grids and the Digital Divide
Rio de Janeiro, Feb. 16-20, 2004

Background
  - World Summit on Information Society
  - HEP Standing Committee on Inter-regional Connectivity (SCIC)

Themes
  - Global collaborations, Grids and addressing the Digital Divide

Next meeting: May 2005 (Korea)
http://www.uerj.br/lishep2004

Page 36

iVDGL, GriPhyN Education / Outreach

Basics
  - $200K/yr
  - Led by UT Brownsville
  - Workshops, portals
  - Partnerships with CHEPREO, QuarkNet, …

Page 37

June 21-25 Grid Summer School

  - First of its kind in the U.S. (South Padre Island, Texas)
  - 36 students, diverse origins and types (M, F, MSIs, etc.)

Marks a new direction for Trillium
  - First attempt to systematically train people in Grid technologies
  - First attempt to gather relevant materials in one place
  - Today: students in CS and physics
  - Later: students, postdocs, junior & senior scientists

Reaching a wider audience
  - Put lectures, exercises, video on the web
  - More tutorials, perhaps 3-4/year
  - Dedicated resources for remote tutorials
  - Create a "Grid book", e.g. Georgia Tech

New funding opportunities
  - NSF: new training & education programs

Page 38

Page 39

CHEPREO: Center for High Energy Physics Research and Educational Outreach
Florida International University

  - Physics Learning Center
  - CMS research
  - iVDGL Grid activities
  - AMPATH network (S. America)

Funded September 2003: $4M initially (3 years), 4 NSF Directorates!

Page 40

UUEO: A New Initiative

  - Meeting April 8 in Washington DC
  - Brought together ~40 outreach leaders (including NSF)
  - Proposed Grid-based framework for common E/O effort

Page 41

Grid Project References

  - Grid3: www.ivdgl.org/grid3
  - PPDG: www.ppdg.net
  - GriPhyN: www.griphyn.org
  - iVDGL: www.ivdgl.org
  - Globus: www.globus.org
  - LCG: www.cern.ch/lcg
  - EU DataGrid: www.eu-datagrid.org
  - EGEE: egee-ei.web.cern.ch
  - 2nd Edition: www.mkp.com/grid2

Page 42

Extra Slides

Page 43

Outreach: QuarkNet-Trillium Virtual Data Portal

More than a web site
  - Organize datasets
  - Perform simple computations
  - Create new computations & analyses
  - View & share results
  - Annotate & enquire (metadata)
  - Communicate and collaborate

  - Easy to use, ubiquitous, no tools to install
  - Open to the community; grow & extend

Initial prototype implemented by graduate student Yong Zhao and M. Wilde (U. of Chicago)

Page 44

GriPhyN Achievements

Virtual Data paradigm to express science processes
  - Unified language (VDL) to express general data transformation
  - Advanced planners, executors, monitors, predictors, fault recovery to make the Grid "like a workstation"

Virtual Data Toolkit (VDT)
  - Tremendously simplified installation & configuration of Grids
  - Close partnership with and adoption by multiple sciences: ATLAS, CMS, LIGO, SDSS, bioinformatics, EU projects

Broad education & outreach program (UT Brownsville)
  - 25 graduate, 2 undergraduate students; 3 CS PhDs by end of 2004
  - Virtual Data for QuarkNet Cosmic Ray project
  - Grid Summer School 2004, 3 MSIs participating

Page 45

Grid2003 Lessons (1)

Need for financial investment
  - Grid projects: PPDG + GriPhyN + iVDGL: $35M
  - National labs: Fermilab, Brookhaven, Argonne, LBL
  - Other experiments: LIGO, SDSS

Critical role of collaboration
  - CS + 5 HEP + LIGO + SDSS + biology
  - Makes possible large common effort
  - Collaboration sociology ~50% of issues

Building "stuff": prototypes → testbeds → production Grids
  - Build expertise, debug software, develop tools & services

Need for "operations team"
  - Point of contact for problems, questions, outside inquiries
  - Resolution of problems by direct action, consultation w/ experts

Page 46

Grid2003 Lessons (2): Packaging

VDT + Pacman: installation and configuration
  - Simple installation, configuration of Grid tools + applications
  - Major advances over 14 VDT releases

Why this is critical for us
  - Uniformity and validation across sites
  - Lowered barriers to participation → more sites!
  - Reduced FTE overhead & communication traffic
  - The next frontier: remote installation

Page 47

Grid2003 and Beyond

Further evolution of Grid3 (Grid3+, etc.)
  - Contribute resources to persistent Grid
  - Maintain development Grid, test new software releases
  - Integrate software into the persistent Grid
  - Participate in LHC data challenges

Involvement of new sites
  - New institutions and experiments
  - New international partners (Brazil, Taiwan, Pakistan?, …)

Improvements in Grid middleware and services
  - Storage services
  - Integrating multiple VOs
  - Monitoring
  - Troubleshooting
  - Accounting

Page 48

International Virtual Data Grid Laboratory

[Map of Tier1, Tier2 and other sites] UF, UW Madison, BNL, Indiana, Boston U, SKC, Brownsville, Hampton, PSU, J. Hopkins, Caltech, FIU, Austin, Michigan, LBL, Argonne, Vanderbilt, UCSD, Fermilab, Iowa, Chicago, UW Milwaukee, ISI, Buffalo

Partners: EU, Brazil, Korea

Page 49

Roles of iVDGL Institutions

  - U Florida: CMS (Tier2), Management
  - Caltech: CMS (Tier2), LIGO (Management)
  - UC San Diego: CMS (Tier2), CS
  - Indiana U: ATLAS (Tier2), iGOC (operations)
  - Boston U: ATLAS (Tier2)
  - Harvard: ATLAS (Management)
  - Wisconsin, Milwaukee: LIGO (Tier2)
  - Penn State: LIGO (Tier2)
  - Johns Hopkins: SDSS (Tier2), NVO
  - Chicago: CS, Coord./Management, ATLAS (Tier2)
  - Vanderbilt*: BTeV (Tier2, unfunded)
  - Southern California: CS
  - Wisconsin, Madison: CS
  - Texas, Austin: CS
  - Salish Kootenai: LIGO (Outreach, Tier3)
  - Hampton U: ATLAS (Outreach, Tier3)
  - Texas, Brownsville: LIGO (Outreach, Tier3)
  - Fermilab: CMS (Tier1), SDSS, NVO
  - Brookhaven: ATLAS (Tier1)
  - Argonne: ATLAS (Management), Coordination

Page 50

“Virtual Data”: Derivation & Provenance

Most scientific data are not simple "measurements"
  - They are computationally corrected/reconstructed
  - They can be produced by numerical simulation
  - Science & engineering projects are more CPU and data intensive

Programs are significant community resources (transformations)
  - So are the executions of those programs (derivations)

Management of dataset dependencies is critical!
  - Derivation: instantiation of a potential data product
  - Provenance: complete history of any existing data product
  - Previously: manual methods; GriPhyN: automated, robust tools
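
To make the transformation / derivation / provenance vocabulary concrete, here is a minimal hypothetical sketch of the bookkeeping (illustrative only, not GriPhyN's VDL or the Chimera catalog API); it also answers the kind of recomputation and provenance queries quoted on the next slide.

```python
# Minimal, hypothetical sketch of virtual-data bookkeeping: transformations
# are programs, derivations are executions of those programs, and each
# derivation records which data products it consumed and generated.
# All names below are invented for illustration.

from dataclasses import dataclass, field

@dataclass
class Derivation:
    transformation: str            # name of the program that was run
    consumed: list                 # input data product names
    generated: list                # output data product names

@dataclass
class Catalog:
    derivations: list = field(default_factory=list)

    def provenance(self, product: str):
        """Complete history of an existing data product (walk backwards)."""
        for d in self.derivations:
            if product in d.generated:
                history = [d]
                for parent in d.consumed:
                    history += self.provenance(parent)
                return history
        return []                  # raw data source: no derivation produced it

    def downstream(self, product: str):
        """All products that must be recomputed if `product` changes."""
        hit = set()
        for d in self.derivations:
            if product in d.consumed:
                for out in d.generated:
                    hit.add(out)
                    hit |= self.downstream(out)
        return hit

# Example chain: raw data -> calibrated muons -> analysis histogram
catalog = Catalog([
    Derivation("calibrate_muons", ["raw_run_1"], ["muons_run_1"]),
    Derivation("forward_jet_analysis", ["muons_run_1"], ["jet_histogram"]),
])
print(catalog.downstream("raw_run_1"))   # {'muons_run_1', 'jet_histogram'} (order may vary)
print([d.transformation for d in catalog.provenance("jet_histogram")])
# ['forward_jet_analysis', 'calibrate_muons']
```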

Page 51

Virtual Data Motivations

[Figure: each data product is consumed-by/generated-by a derivation; a derivation is an execution-of a transformation; data is a product-of its derivation]

  - "I've detected a muon calibration error and want to know which derived data products need to be recomputed."
  - "I've found some interesting data, but I need to know exactly what corrections were applied before I can trust it."
  - "I want to search a database for 3 muon SUSY events. If a program that does this analysis exists, I won't have to write one from scratch."
  - "I want to apply a forward jet analysis to 100M events. If the results already exist, I'll save weeks of computation."

Page 52

Virtual Data Example: LHC Analysis

[Figure: derivation tree for a "mass = 160" sample, branching into "decay = WW", "decay = ZZ" and "decay = bb"; the WW branch is refined further ("decay = WW, WW → leptons", "decay = WW, WW → e…", "decay = WW, WW → e…, Pt > 20"), each node carrying its own "other cuts" children]

Scientist adds a new derived data branch & continues analysis

Page 53

Grids: Enhancing Research & Learning

Fundamentally alters the conduct of scientific research
  - "Lab-centric": activities center around a large facility
  - "Team-centric": resources shared by distributed teams
  - "Knowledge-centric": knowledge generated/used by a community

Strengthens role of universities in research
  - Couples universities to data intensive science
  - Couples universities to national & international labs
  - Brings front-line research and resources to students
  - Exploits intellectual resources of formerly isolated schools
  - Opens new opportunities for minority and women researchers

Builds partnerships to drive advances in IT/science/engineering
  - HEP ↔ physics, astronomy, biology, CS, etc.
  - "Application" sciences ↔ computer science
  - Universities ↔ laboratories
  - Scientists ↔ students
  - Research community ↔ IT industry