Distributed Analysis

Craig E. Tull

HCG/NERSC/LBNL

(US) ATLAS Grid Software Workshop

BNL - May 7, 2002

Distributed Analysis

• How to participate in the new PPDG activity in distributed analysis

• Grid portals
  — Grappa: brief status, and plans for the remainder of '02
  — Ganga: Karl Harrison will give a 15-minute overview (PowerPoint, PDF)

Distributed Processing Models

• Batch-like Processing (à la WP1)
• Distributed Single Event (MPP)
• Client-Server (interactive)
• WAN Data Access (AMS, Clipper)
• File Transfer and Local Processing (GDMP)
• Agent-based Processing (distributed control)
• Check-Point & Migrate (save & restore)
• Scatter & Gather (parallel events)

• Move the data or move the executable?
  — No experiment is planning to write PetaBytes of Code!

ATLAS Distributed Processing Model

• At this point, it is still not clear what the final ATLAS distributed computing model will be. Although newer ideas like Agent-based Processing have a great deal of appeal, they are as yet unproven in a large-scale production environment.

• A conservative approach would be some combination of Batch-like Processing and File Transfer and Local Processing for batch jobs, with perhaps a Client-Server-like approach for interactive jobs (w/ some Scatter/Gather?).

Data Access Patterns

• Data access patterns of physics jobs also heavily influence our thinking about interacting with the Grid. It is likely that all possible data access patterns will be present in ATLAS data processing at various stages of that processing. We may find that some data access patterns lend themselves to efficient use of the Grid much better than others (see the sketch after the list below).

• Data access patterns include:

— Sequential Access (reconstruction)

— Random Access (interactive analysis)

— File/Data Set Driven (LFN-friendly)

— Navigational Driven (OODB-like)

— Query Driven (SQL/OQL/JDO/etc)
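
To make the distinction concrete, here is a minimal Python sketch (the event_tags table, the jet_et attribute, and the toy three-events-per-file reader are all invented for illustration; this is not ATLAS code) contrasting file/data-set-driven access, which touches every event in a list of logical files, with query-driven access, which first resolves an attribute query against an event-tag database:

# Sketch contrasting two of the access patterns above; not ATLAS code.
import sqlite3
from typing import Iterable, Iterator, Tuple

def file_driven_events(lfns: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """File/data-set driven: read every event of each logical file, in order."""
    for lfn in lfns:
        # A real framework would stage and open the file here;
        # we pretend each file holds three events.
        for event_id in range(3):
            yield (lfn, event_id)

def query_driven_events(db: sqlite3.Connection, min_et: float) -> Iterator[Tuple[str, int]]:
    """Query driven: an SQL query over event tags yields (file, event) pairs,
    so only matching events (and their files) are ever touched."""
    yield from db.execute(
        "SELECT lfn, event_id FROM event_tags WHERE jet_et > ? ORDER BY lfn",
        (min_et,))

if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE event_tags (lfn TEXT, event_id INTEGER, jet_et REAL)")
    db.executemany("INSERT INTO event_tags VALUES (?, ?, ?)",
                   [("lfn:file1", 0, 12.0), ("lfn:file1", 1, 55.0),
                    ("lfn:file2", 0, 80.0)])
    print(list(file_driven_events(["lfn:file1", "lfn:file2"])))  # all 6 events
    print(list(query_driven_events(db, min_et=50.0)))            # only 2 events

The query-driven pattern clearly rewards a Grid that can co-locate jobs with only the matching files, while the sequential pattern rewards bulk file transfer.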

Athena/Grid Interface

• For the programmatic interface to Grid services, we are thinking in terms of Gaudi services to capture and present the functionality of the grid services (not necessarily a one-to-one mapping, BTW).

• I think it is important at this stage (maybe forever) to ensure that the framework is "grid-capable" without being "grid-dependent", i.e., we should always be able to run without grid services available.
  — Gaudi's component architecture makes this approach to using the grid quite natural.
  — How do we switch between Grid/non-Grid? (A sketch follows below.)
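
One natural pattern – sketched below in plain Python rather than real Gaudi/Athena C++, with an invented FileCatalogSvc interface and an invented "UseGrid" jobOptions flag – is to hide the grid behind an abstract service interface and choose the concrete implementation from job configuration, so algorithm code never knows whether grid services are present:

# Sketch of "grid-capable, not grid-dependent" service selection.
# FileCatalogSvc is an invented interface; in Gaudi this would be an
# abstract service located through the framework's service manager.
from abc import ABC, abstractmethod

class FileCatalogSvc(ABC):
    @abstractmethod
    def resolve(self, lfn: str) -> str:
        """Turn a logical file name into a physical file name."""

class GridCatalogSvc(FileCatalogSvc):
    """Would consult a grid replica catalog (e.g. a WP2-style service)."""
    def resolve(self, lfn: str) -> str:
        return f"gsiftp://some.grid.site/{lfn}"   # placeholder lookup

class LocalCatalogSvc(FileCatalogSvc):
    """Fallback when no grid services are available: a local mapping."""
    def resolve(self, lfn: str) -> str:
        return f"/local/data/{lfn}"

def make_catalog_svc(job_options: dict) -> FileCatalogSvc:
    # The Grid/non-Grid switch lives entirely in configuration
    # (cf. jobOptions), not in algorithm code.
    if job_options.get("UseGrid", False):
        return GridCatalogSvc()
    return LocalCatalogSvc()

svc = make_catalog_svc({"UseGrid": False})
print(svc.resolve("lfn:atlas/dc1/events.0001.root"))  # -> /local/data/...

Because clients see only the abstract interface, switching between Grid and non-Grid reduces to a one-line configuration change.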

Athena/Gaudi - ATLAS/LHCb Collaboration

• Some collaboration is already occurring. UK GridPP funding exists for:

• Installation Kit
  — Many common tools & problems
    • CMT, Gaudi, AFS

• Controlling Interface
  — How to interact with WP1's JCL
  — "GANGA"-like concept

• Grid Services API
  — Grid Services should be presented as Gaudi Services

Interfacing to the GRID

• Making the framework work in the GRID environment requires:

— Collecting use-cases and producing an architectural design

— Identifying the [Gaudi/Athena] components that need to be adapted/re-implemented to make use of the Grid services

• Started to identify areas of work:

— Data access (persistency)

— Event Selection

— GANGA (job configuration & monitoring, resource estimation & booking, job scheduling, etc.)

[Diagram: the GANGA GUI wraps a GAUDI/Athena program; JobOptions and Algorithms feed the program, which talks to GRID Services and produces Histograms, Monitoring and Results.]

Ganga Scenarios

• Original Proposal - October 2001

• Scenario 1
  — User makes a "high-level" selection of data to process and defines the processing job.
    • "High-level" means based on event characteristics and not on file or event identity.
  — High-level event selection uses the ATLAS Bookkeeping DataBase (similar to the current LArC Bookkeeping database or BNL's Magda) to select event & logical file identities.
  — Construct JDL for WP1 using LFNs
  — Construct jobOptions.py using PFNs (w/ WP2)
  — Submit job(s) using JDL & jobOptions.py in a sandbox (see the sketch below).

• Scenario 2 - The same except jobOptions.py now contains LFNs. This requires the Replica Service API-enabled EvtSelector or ConversionSrv.
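
A minimal Python sketch of the Scenario 1 chain – bookkeeping query to LFNs, then a WP1-style JDL plus an Athena jobOptions.py fragment – follows; the bookkeeping records, JDL fields, and file names are all invented for illustration, and real JDL/jobOptions carry many more settings:

# Sketch of Ganga Scenario 1: high-level selection -> LFNs -> JDL +
# jobOptions.py. All records, fields and names are illustrative only.
def select_lfns(bookkeeping, query):
    """Stand-in for an ATLAS Bookkeeping DataBase query: collect the
    logical file names whose event records match a high-level selection."""
    return sorted({rec["lfn"] for rec in bookkeeping if query(rec)})

def make_jdl(executable, lfns):
    """Build a WP1-style JDL stanza that names input data by LFN."""
    data = ", ".join(f'"{lfn}"' for lfn in lfns)
    return (f'Executable = "{executable}";\n'
            f'InputData = {{{data}}};\n'
            'InputSandbox = {"jobOptions.py"};\n')

def make_joboptions(pfns):
    """Build a jobOptions.py fragment; in Scenario 1 the LFNs were
    already resolved to PFNs (with WP2 replica tools) before submission."""
    files = ", ".join(f'"{p}"' for p in pfns)
    return f"EventSelector.InputCollections = [{files}]\n"

bookkeeping = [{"lfn": "lfn:dc1.0001.root", "n_jets": 4},
               {"lfn": "lfn:dc1.0002.root", "n_jets": 1}]
lfns = select_lfns(bookkeeping, lambda rec: rec["n_jets"] >= 2)
print(make_jdl("athena", lfns))
pfns = ["/data/" + lfn.split("lfn:")[1] for lfn in lfns]
print(make_joboptions(pfns))

Scenario 2 would drop the PFN-resolution step: the LFNs would go straight into jobOptions.py and be resolved at run time by the replica-aware EvtSelector/ConversionSrv.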

CS-11 Analysis Tools

“interface and integrate interactive data analysis tools with the grid and to identify common components and services.”

First:
  — identify appropriate individuals to participate in this area, within and from outside of PPDG – several identified from each experiment
  — assemble a list of references to white papers, publications, tools and related activities – available at http://www.ppdg.net/pa/ppdg-pa/idat/related-info.html
  — produce a white-paper-style requirements document as an initial view of a coherent approach to this topic – draft circulated by June
  — develop a roadmap for the future of this activity – at/post face-to-face meeting

Analysis of large datasets over the Grid

• Dataset does not fit on disk: Need access s/w to couple w/ processing; Distributed management implementing global experiment and local site policies

• Demand significantly exceeding available resources: Queues always full. When/how to move job and/or data; Global optimization of (or at least not totally random) total system throughput without too many local constraints (e.g. single points of failure)

• Data and Job Definition – in physicist terminology. For D0-SAM, a web + command-line interface to specify Datasets + Dataset Snapshots, saved in an RDBMS for tracking and reuse. Many "dimensions" or attributes can be combined to define a dataset; definitions can be iterative and extended; new versions are defined at a specific date. SAM transforms the dataset definition into an SQL query to the database and saves the transform definition (see the sketch after this list).

• Distributed processing and control: Schedule, control and monitor access to shared resources – CPU, disk, network. E.g. All D0-SAM job executions pass through a SAM-wrapper and are tracked in the database for monitoring and analysis.

• Faults of all kinds occur: Preemption, exceptions, resource unavailability, crashes; Checkpointing and restart; Workflow management to complete failed tasks; Error reporting and diagnosis

• Chaotic and large spikes in load: analysis needs vary widely and are difficult to predict – especially at a sniff of a new discovery.

• Estimation, Prediction, Planning, Partial Results - GriPhyN research areas.
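
To make the D0-SAM-style dataset-definition bullet concrete, here is a minimal sketch of turning an attribute selection in "physicist terminology" into an SQL query and saving the transform for tracking and reuse (the schema, attribute names, and dataset name are invented; this is the spirit of the mechanism, not actual SAM code):

# Sketch: dataset definition -> SQL query, with the transform recorded.
import sqlite3

def definition_to_sql(attrs):
    """Combine attribute "dimensions" with AND into a parameterized query."""
    keys = sorted(attrs)
    clauses = " AND ".join(f"{k} = ?" for k in keys)
    return f"SELECT file_name FROM files WHERE {clauses}", [attrs[k] for k in keys]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files (file_name TEXT, run INTEGER, trigger TEXT)")
db.execute("CREATE TABLE dataset_defs (name TEXT, sql TEXT)")  # saved transforms
db.executemany("INSERT INTO files VALUES (?, ?, ?)",
               [("f1.raw", 1001, "2MU"), ("f2.raw", 1002, "2MU"),
                ("f3.raw", 1001, "1EM")])

sql, params = definition_to_sql({"run": 1001, "trigger": "2MU"})
db.execute("INSERT INTO dataset_defs VALUES (?, ?)", ("mu_run1001", sql))
snapshot = [row[0] for row in db.execute(sql, params)]  # a dataset snapshot
print(snapshot)  # -> ['f1.raw']

Saving the transform rather than only the resulting file list is what lets definitions be iterated, extended, and re-versioned at a later date.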

References supplied by PPDG participants to date

• Proposal to NSF for CMS Analysis: an Interactive Grid-Enabled Environment (CAIGEE) – Julian Bunn, Caltech
• Grid Analysis Environment work at Caltech, April 2002 – Julian Bunn, Caltech
• Views of CMS Event Data – Koen Holtman, Caltech
• ATLAS Athena & Grid – Craig Tull, LBNL
• CMS Distributed analysis workshop, April 2001 – Koen Holtman, Caltech
• PPDG-8, Comparison of datagrid tools capabilities – Reagan Moore, SDSC
• Interactivity in a Batched Grid Environment – David Liu, UCB
• Deliverables document from CrossGrid WP4
• Portals, UI examples, etc.
• GENIUS: Grid Enabled web eNvironment for site Independent User job Submission – Roberto Barbera, INFN
• SciDAC CoG Kit (Commodity Grid Kit)
• ATLAS Grid Access Portal for Physics Applications – XCAT, a Common Component Architecture implementation

Tools etc

• Java Analysis Studio (JAS) – Tony Johnson, SLAC
• Distributed computing with JAS (prototype) – Tony Johnson, SLAC
• Abstract Interfaces for Data Analysis (AIDA) – Tony Johnson, SLAC
• BlueOx: Distributed Analysis with Java – Jeremiah Mans, Princeton
• Parallel ROOT Facility (PROOF) – Fons Rademakers, CERN
• Integration of ROOT and SAM – Gabriele Garzoglio, FNAL
• Clarens Remote Analysis – Conrad Steenberg, Caltech
• IMW: Interactive Master-Worker Style Parallel Data Analysis Tool on the Grid – Miron Livny, Wisconsin
• SC2001 demo of Bandwidth Greedy Grid-enabled Object Collection Analysis for Particle Physics – Koen Holtman, Caltech

CS-11 - Short-term Status

• The requirements document is now in the process of being outlined – Joseph Perl, Doug Olson – based on posted contributions.

• A workshop is being planned to bring people together at LBL in mid-June (18th? 19th?). We won't know more specifics until after the meeting. Clearly, experiments are starting to think about remote analysis (D0), analysis for Grid simulation production (CMS), and ATLAS/ALICE.

• Many experiments (will) use ROOT (& Carrot? PROOF?). In conjunction with a Run 2 visit to Fermilab, Rene will have discussions with PPDG and CS groups in the last week of May.

• Need to identify the narrow band in which PPDG can be a contributor rather than just adding to the meeting load: keep to our mission of using/extending existing tools "for real" over the short/medium term (but encourage, and do not derail, needed longer-term development work!)

Generic data flow in HENP?

[Diagram: a generic HENP data-flow chain – "skims", "microDST production", etc. – with filtering at each step chosen to make the output a convenient size. Stages are annotated by scale: $100M, 10 yr, 100 people; 10 yr, 20 people; 1 yr, 50 people, 5x/yr; 1 mo, 1 person, 100x/yr. What's going on in this box? Is this picture anywhere close to reality?]

Many groups are grappling with requirements now.

PPDG Plan - Distributed analysis services

• User-transparent data analysis: automatic and transparent optimization of processing locale, and transparent return of results

• Principal initial application: analysis executing at a major center under transparent local control from the home institute

PPDG Plan - Distributed analysis services

• Components:
  — Job specification tools:
    • tools to specify job conditions and requests, record them (cf. data signature catalog), and transpose them into actual job scripts, config files, etc.
  — Distributed job management:
    • automatic and transparent optimization of processing locale based on data and resource availability

—request management

—resource optimization

—interaction and integration with job control services

PPDG Plan - Distributed analysis services

• Components:
  — transparent local availability of (selected, compact) results; further transparent distributed processing of results
    • results browsing, catenation, return
    • management of additional processing and/or reprocessing
    • selective public/private cataloguing of results
  — Cost-aware optimized data retrieval:
    • tools and services providing efficient access to distributed data, optimized across all retrieval requests, with flexible support for access policies and priorities established by the experiments.

PPDG Plan - Distributed analysis services

• Components:
  — Services for user-owned data management
  — Grid enabling of analysis, statistics, graphics tools
  — Integrated and deployed distributed analysis services:
    • testbed and production services deployed in the experiments.
  — In ATLAS: CERN <--> BNL Tier 1 <--> Tier N
  — Timescale:
    • Basic remote submission services: during 2001
    • Initial comprehensive implementation: 2003 (ATLAS MDC2)

PPDG Plan

• ATLAS short-term workplan for a distributed grid architecture performing increasingly complex operations (L. Perini talk, 12/00):
  — submit event generation and simulation locally and remotely;
  — store events locally and remotely;
  — access remote data (e.g., background events, stored centrally);
  — (partially) duplicate event databases;
  — schedule job submission;
  — allocate resources;
  — monitor job execution;
  — optimize performance.