Caltech and CMS Grid Work Overview Koen Holtman Caltech/CMS May 22, 2002

Page 1

Caltech and CMS Grid Work Overview

Koen Holtman

Caltech/CMS

May 22, 2002

Page 2

CMS distributed computing

CMS wanted to build a distributed computing system all along!

CMS CTP (Dec 1996):

One integrated computing system with a single global view of the data

Used by the 1000s of CMS collaborators around the world

We now call this the `CMS Data Grid System'

Page 3

PPDG: Mission-Oriented Pragmatic Methodology

End-to-end integration and deployment of experiment applications using existing and emerging Grid services

Deployment of Grid technologies and services in production (24x7) environments

With stressful performance needs

Collaborative development of Grid middleware and extensions between application and middleware groups

Leading to pragmatic and acceptable-risk solutions.

HENP experiments extend their adoption of common infrastructures to higher layers of their data analysis and processing applications.

Much attention to integration, coordination, interoperability and interworking

With emphasis on incremental deployment of increasingly functional working systems

Page 4

CMS Grid Requirements

Major Grid requirements effort completed

Document writing by Caltech group

Catania CMS week Grid workshop (June 2001, about 12 hours over various sessions)

CMS consensus on many strategic issues

Division of labor between Grid projects and CMS Computing group

Needed for planning, manpower estimates

Grid job execution model

Grid data model, replication model

Object handling and the Grid

Main Grid Requirements Document: CMS Data Grid System Overview and Requirements. CMS Note 2001/037 http://kholtman.home.cern.ch/kholtman/cmsreqs.pdf

Additional documents on object views, hardware sizes, workload model, data model (K. Holtman) CMS Note 2001/047

[Figure: cover of the requirements document. Labels: 28 pages; 2003 CMS data grid system vision]

Page 5

Objects and Files in the Grid

CMS computing is object-oriented, and database oriented

Fundamentally we have a persistent data model with 1 object = 1 piece of physics data (KB-MB size)

Much of the thinking in the Grid projects and Grid community is file oriented

`Computer center' view of large applications

Do not look inside application code

Think about application needs in terms of CPU batch queues, disk space for files, file staging and migration

How to reconcile this?

CMS requirements 2001-2003:

Grid project components do not need to deal with objects directly

Specify file handling requirements in such a way that a CMS layer for object handling can be built on top

LCG Project (SC2, PEB) has started to develop new object handling layer
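The layering described on this slide can be illustrated with a small sketch: the Grid side sees only files, while a CMS-side layer maps objects to the files that contain them. The class and method names (`ObjectFileMap`, `register`, `files_for`) and the file names are hypothetical and are not taken from any CMS or LCG software.

```python
from collections import defaultdict


class ObjectFileMap:
    """Hypothetical CMS-side layer: maps object IDs to the file holding them."""

    def __init__(self):
        self._file_of = {}                    # object ID -> logical file name
        self._objects_in = defaultdict(set)   # logical file name -> object IDs

    def register(self, object_id, file_name):
        """Record that one physics object is stored in the given file."""
        self._file_of[object_id] = file_name
        self._objects_in[file_name].add(object_id)

    def files_for(self, object_ids):
        """Return the sorted list of files needed to read the given objects.

        This list is all that a file-oriented Grid service needs to see;
        it never has to look at the objects themselves.
        """
        return sorted({self._file_of[oid] for oid in object_ids})


catalog = ObjectFileMap()
catalog.register("event-1/tracks", "run1_a.db")
catalog.register("event-2/tracks", "run1_a.db")
catalog.register("event-3/tracks", "run1_b.db")

needed = catalog.files_for(["event-1/tracks", "event-3/tracks"])
print(needed)
```

An analysis request expressed in objects is thereby turned into a request for files, which is the form the Grid components are asked to handle.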

Page 6

Grid Services for CMS: Division of Labor (CMS Week, June 2001)

Provided by CMS:

Mapping between objects and files (persistency layer)

Local and remote extraction and packaging of objects to/from files

Consistency of software configuration for each site

Configuration meta-data for each sample

Aggregation of sub-jobs

Policy for what we want to do (e.g. priorities for what to run first, the production manager)

Some error recovery too

Not needed by 2003:

Auto-discovery of arbitrary identical/similar samples

Needed from Somebody:

Tool to implement common CMS configuration on remote sites?

Provided by the Grid:

Distributed job scheduler: if a file is remote the Grid will run appropriate CMS software (often remotely; split over systems)

Resource management, monitoring, and accounting tools and services

Query estimation tools (to WHAT DEPTH?)

Resource optimisation with some user hints / control (coherent management of local copies, replication, caching)

Transfer of collections of data

Error recovery tools (from e.g. job/disk crashes)

Location information of Grid-managed files

File management such as creation, deletion, purging, etc.

Remote virtual login and authentication / authorisation
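The distributed job scheduler item ("if a file is remote the Grid will run appropriate CMS software, often remotely") can be sketched as a placement rule that sends a job to the site already holding most of its input files. This is only an illustration under assumed names (`choose_site`, `site_a`, `site_b`, the file names); the slide does not specify how the scheduler decides.

```python
def choose_site(input_files, file_locations, default_site):
    """Pick the site holding the most of a job's input files.

    input_files:    list of logical file names the job reads
    file_locations: dict mapping file name -> set of sites holding a copy
    default_site:   site to use when no input file is found anywhere
    """
    counts = {}
    for name in input_files:
        for site in file_locations.get(name, ()):
            counts[site] = counts.get(site, 0) + 1
    if not counts:
        return default_site
    # Highest count wins; ties are broken alphabetically for determinism.
    return min(counts, key=lambda site: (-counts[site], site))


locations = {
    "hits_001.db": {"site_a", "site_b"},
    "hits_002.db": {"site_b"},
}
best = choose_site(["hits_001.db", "hits_002.db"], locations, "site_a")
print(best)
```

A real scheduler would also weigh CPU availability, queue lengths and policy, which the slide lists as separate Grid services.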

Page 7

GriPhyN/PPDG Architecture

[Figure: architecture diagram. Component labels: Application; Planner; Executor; Catalog Services; Info Services; Policy/Security; Monitoring; Repl. Mgmt.; Reliable Transfer Service; Compute Resource; Storage Resource. Technology labels: GSI, CAS; MDS; MCAT; GriPhyN catalogs. Legend: marked components = initial solution is operational.]

Ian Foster, Carl Kesselman, Mike Wilde, others

Page 8

CMS Production

Worldwide Production at 21 Sites

[Figure: status chart of production sites. Column labels: Common Prod. tools (IMPALA); GDMP; Digitization; PU; No PU; Simulation. Sites: Imperial College; UFL; Caltech; Helsinki; IN2P3; Wisconsin; Bristol; UCSD; INFN (10); Moscow; FNAL; CERN. Legend: Fully operational; In progress.]

Page 9

Data Produced in 2001

Simulated Events (TOTAL = 8.4 M)

Site          Events
Helsinki      0.13 M
Wisconsin     0.07 M
IN2P3         0.31 M
INFN          0.76 M
CERN          1.10 M
Bristol/RAL   1.27 M
Caltech       2.50 M
UFL           0.05 M
UCSD          0.06 M
Moscow        0.43 M
FNAL          1.65 M

TYPICAL EVENT SIZES

Simulated: 1 CMSIM event = 1 OOHit event = 1.4 MB

Reconstructed: 1 “10^33” event = 1.2 MB

Reconstructed: 1 “2x10^33” event = 1.6 MB

Reconstructed: 1 “10^34” event = 5.6 MB

OBJECTIVITY DATA (TOTAL = 29 TB)

Site          Data
Bristol/RAL   0.22 TB
UFL           0.08 TB
Helsinki      -
Wisconsin     0.05 TB
IN2P3         0.10 TB
UCSD          0.20 TB
INFN          0.40 TB
Moscow        0.45 TB
FNAL          12 TB
CERN          14 TB
Caltech       0.60 TB
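As a quick consistency check, the per-site figures on this slide can be summed and compared with the stated totals. Where the extracted slide lists values and site names separately, the pairing below is inferred from their order and should be read as an assumption; the sums do not depend on it. The last line is an illustrative estimate only: it multiplies the stated 8.4 M simulated events by the 1.4 MB simulated event size, taking 1 TB = 10^6 MB.

```python
# Simulated events per site, in millions, as listed on the slide.
simulated_millions = {
    "Helsinki": 0.13, "Wisconsin": 0.07, "IN2P3": 0.31, "INFN": 0.76,
    "CERN": 1.10, "Bristol/RAL": 1.27, "Caltech": 2.50,
    # Pairing of the next four values with sites is inferred from order.
    "UFL": 0.05, "UCSD": 0.06, "Moscow": 0.43, "FNAL": 1.65,
}

# Objectivity data per site, in TB. Helsinki is listed as "-" and omitted.
objectivity_tb = {
    "Bristol/RAL": 0.22, "UFL": 0.08, "Wisconsin": 0.05, "IN2P3": 0.10,
    "UCSD": 0.20, "INFN": 0.40, "Moscow": 0.45, "FNAL": 12.0,
    "CERN": 14.0, "Caltech": 0.60,
}

events_sum = round(sum(simulated_millions.values()), 2)
data_sum = round(sum(objectivity_tb.values()), 2)

# Illustrative estimate: stated event total times simulated event size.
simulated_volume_tb = round(8.4e6 * 1.4 / 1e6, 2)

print(events_sum, data_sum, simulated_volume_tb)
```

The sums come out slightly below the totals stated on the slide (8.4 M events, 29 TB), which may simply reflect rounding of the per-site values.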

Page 10

GDMP

Tool to transfer and manage files in production

Easy to handle this manually with a few centers, impossible with lots of data at many centers

GDMP is based around Globus Middleware and a Flexible architecture

Globus Replica Catalogue

Provided an early model of collaboration between HEP and Grid middleware providers

Successfully used to replicate > 1 TB of CMS data

Now a PPDG/EU DataGrid joint project

Grid Data Management Pilot (GDMP): A Tool for Wide Area Replication. Applied Informatics Conference (AI2001), Innsbruck, Austria, 2/2001.

Authors: Caltech, CERN/CMS, FNAL, CERN/IT; PPDG, GriPhyN, EU DataGrid WP2

Integration with ENSTORE; HPSS, Castor Tape Systems
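The idea of managing file replication through a replica catalogue can be sketched as follows. This is a toy model written for illustration: `ReplicaCatalog`, `replicate` and the `transfer` callback are hypothetical names and do not reproduce the GDMP or Globus Replica Catalogue interfaces.

```python
class ReplicaCatalog:
    """Toy replica catalog: logical file name -> sites holding a copy."""

    def __init__(self):
        self._replicas = {}

    def add_replica(self, logical_name, site):
        self._replicas.setdefault(logical_name, set()).add(site)

    def sites_for(self, logical_name):
        return sorted(self._replicas.get(logical_name, set()))

    def missing_at(self, site):
        """Logical files known to the catalog that have no copy at `site`."""
        return sorted(name for name, sites in self._replicas.items()
                      if site not in sites)


def replicate(catalog, site, transfer):
    """Copy every file missing at `site`, then register the new replica.

    `transfer(logical_name, source_site, site)` is a caller-supplied
    function standing in for the real wide-area file transfer.
    """
    copied = []
    for name in catalog.missing_at(site):
        source = catalog.sites_for(name)[0]
        transfer(name, source, site)
        catalog.add_replica(name, site)
        copied.append(name)
    return copied


catalog = ReplicaCatalog()
catalog.add_replica("run1_a.db", "cern")
catalog.add_replica("run1_b.db", "cern")
catalog.add_replica("run1_a.db", "caltech")

log = []
copied = replicate(catalog, "caltech", lambda n, s, d: log.append((n, s, d)))
print(copied, log)
```

Keeping the catalog as the single record of where copies live is what makes replication manageable once many centers are involved, which is the point the slide makes about manual handling.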

Page 11

PPDG MOP system

PPDG Developed MOP System

Allows submission of CMS prod. jobs from a central location, run on remote locations, and return results

Relies on:

GDMP for replication

Globus GRAM, Condor-G and local queuing systems for Job Scheduling

IMPALA for Job Specification

Shown in SC2001 demo

Now being deployed in USCMS testbed

Proposed as basis for next CMS-wide production infrastructure
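Central submission of a production request to remote sites can be sketched as splitting the request into jobs and assigning each job to a site. The function below is a minimal illustration, not MOP's actual logic: the round-robin assignment, the 500-events-per-job size and the site names are assumptions, while the 200K-event request size is the one quoted elsewhere in this talk.

```python
def split_request(total_events, events_per_job, sites):
    """Split a production request into jobs assigned round-robin to sites.

    Returns a list of (job_number, site, first_event, n_events) tuples.
    """
    if events_per_job <= 0 or not sites:
        raise ValueError("need a positive job size and at least one site")
    jobs = []
    first = 0
    while first < total_events:
        n_events = min(events_per_job, total_events - first)
        site = sites[len(jobs) % len(sites)]
        jobs.append((len(jobs), site, first, n_events))
        first += n_events
    return jobs


# Job size and site names are hypothetical.
jobs = split_request(200_000, 500, ["site_a", "site_b", "site_c"])
print(len(jobs), jobs[0], jobs[-1])
```

In the system described on the slide, job specification is handled by IMPALA and scheduling by Globus GRAM, Condor-G and the local queuing systems; the sketch only shows the bookkeeping involved in dividing the work.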

Page 12

US CMS Prototypes and Test-beds

All U.S. CMS S&C Institutions are involved in DOE and NSF Grid Projects

1. Integrating Grid software into CMS systems

2. Bringing CMS Production on the Grid

3. Understanding the operational issues

1. MOP used as first pilot application

2. MOP system got official CMS production assignment of 200K CMSIM events

3. 50K have been produced and registered already

Page 13

Installing middleware

Virtual Data Toolkit

Globus 2.0 beta

Essential Grid Tools

Essential Grid Services I & II

Grid API

Condor-G 6.3.1

Condor 6.3.1

ClassAds 0.9

GDMP 3.0 alpha 3

We found the VDT to be very easy to install, but a little bit more challenging to configure

Page 14

Prototype VDG System (production)

[Figure: system diagram. Component labels: User; Planner; Executor; Catalog Services; Replica Mngmt; Storage Resource; Compute Resource. Implementation labels: Abstract Planner; Concrete Planner/WP1; MOP/WP1; RefDB; Materialized Data Catalog; Virtual Data Catalog; Replica Catalog; GDMP; Objectivity Metadata Catalog; Local Grid Storage; Local Tracking DB; BOSS; Wrapper Scripts; CMKIN; CMSIM; ORCA/COBRA.]

Page 15

Prototype VDG System (production)

[Figure: the same system diagram as on the previous page, with a status legend. Component labels: User; Planner; Executor; Catalog Services; Replica Mngmt; Storage Resource; Compute Resource. Implementation labels: Abstract Planner; Concrete Planner/WP1; MOP/WP1; RefDB; Materialized Data Catalog; Virtual Data Catalog; Replica Catalog; GDMP; Objectivity Metadata Catalog; Local Grid Storage; Local Tracking DB; BOSS; Wrapper Scripts; CMKIN; CMSIM; ORCA/COBRA. Legend: no code; existing; implemented using MOP.]

Page 16

Analysis part

Physics data analysis will be done by 100s of users

Caltech taking responsibility for developing the analysis part of the vertically integrated system

Analysis part is connected to same catalogs

Maintain a global view of all data

Big analysis jobs can use production job handling mechanisms

Analysis services based on tags

Page 17

Optimization of “Tag” Databases

Tags are small (~0.2 - 1 kbyte) summary objects for each event

Crucial for fast selection of interesting event subsets; this will be an intensive activity

Past work concentrated in three main areas:

Integration of CERN’s “HepODBMS” generic Tag system with the CMS “COBRA[*]” framework

Investigations of Tag bitmap indexing to speed queries

Comparisons of OO and traditional databases (SQL Server, soon Oracle 9i) as efficient stores for Tags

New work concentrates on tag based analysis services
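Bitmap indexing of tags, mentioned above as a way to speed queries, can be sketched in a few lines. The sketch keeps one bitmap per (attribute, value) pair and answers a multi-cut selection with bitwise ANDs. The class name `BitmapIndex` and the tag attributes `n_muons` and `high_pt` are invented for illustration; the slide does not describe the indexing scheme that was actually investigated.

```python
class BitmapIndex:
    """One bitmap per (attribute, value); bit i is set if tag i matches."""

    def __init__(self, n_tags):
        self.n_tags = n_tags
        self._bitmaps = {}

    def add(self, tag_number, attribute, value):
        key = (attribute, value)
        self._bitmaps[key] = self._bitmaps.get(key, 0) | (1 << tag_number)

    def select(self, *conditions):
        """Return tag numbers matching all (attribute, value) conditions."""
        result = (1 << self.n_tags) - 1           # start with all bits set
        for key in conditions:
            result &= self._bitmaps.get(key, 0)   # bitwise AND combines cuts
        return [i for i in range(self.n_tags) if (result >> i) & 1]


# Hypothetical tags: one small summary record per event.
tags = [
    {"n_muons": 2, "high_pt": True},
    {"n_muons": 0, "high_pt": True},
    {"n_muons": 2, "high_pt": False},
    {"n_muons": 2, "high_pt": True},
]

index = BitmapIndex(len(tags))
for i, tag in enumerate(tags):
    for attribute, value in tag.items():
        index.add(i, attribute, value)

selected = index.select(("n_muons", 2), ("high_pt", True))
print(selected)
```

The selection never scans the tag records themselves, which is the property that makes bitmap indexes attractive for fast selection of event subsets.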

Page 18

CLARENS: a Portal to the Grid

Grid-enabling the working environment for non-specialist physicists' data analysis

Clarens consists of a server communicating with various clients via the commodity XML-RPC protocol. This ensures implementation independence.

The server is implemented in C++ to give access to the CMS OO analysis toolkit.

The server will provide a remote API to Grid tools:

Security services provided by the Grid (GSI)

The Virtual Data Toolkit: Object collection access

Data movement between Tier centers using GSI-FTP

CMS analysis software (ORCA/COBRA)

Current prototype is running on the Caltech proto-Tier2

More information at http://clarens.sourceforge.net, along with a web-based demo
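The client/server pattern described here can be illustrated with the XML-RPC modules in the Python standard library. This is not the Clarens API: the remote method `list_collections` and its return values are hypothetical, and the stand-in server is written in Python only so the example is self-contained, whereas the slide states that the real server is implemented in C++.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def list_collections():
    """Hypothetical remote method standing in for an analysis-toolkit call."""
    return ["sample_a", "sample_b"]


def start_server():
    """Start a local XML-RPC server on a free port in a background thread."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(list_collections)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


def call_remote():
    """Call the remote method as any XML-RPC client could."""
    server = start_server()
    host, port = server.server_address
    try:
        with ServerProxy(f"http://{host}:{port}/") as proxy:
            return proxy.list_collections()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(call_remote())
```

Because the client only depends on the XML-RPC wire protocol, it could equally be written in another language, which is the implementation independence the slide refers to.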

Page 19

Globally Scalable Monitoring Service

CMS (Caltech and Pakistan)

[Figure: monitoring service architecture. Labels: RC Monitor Service; Farm Monitor; Lookup Service; Client (other service); Registration; Discovery; Proxy; Component Factory; GUI marshaling; Code Transport; RMI data access; Push & Pull; rsh & ssh existing scripts; snmp.]

Page 20

Current events

GDMP and MOP just had very favorable internal reviews in PPDG

Testbed: currently MOP deployment under way

Stresses the Grid middleware in new ways: new issues and bugs being discovered in Globus, Condor

Testbed MOP production request:

200K CMSIM events requested, now 50K (~10 GB) finished and validated.

New fully integrated system: first versions expected by summer

System will be the basis for demos at SC2002

Upcoming: CMS workshop on Grid based production (CERN)

Upcoming: PPDG analysis workshop (Berkeley)

Page 21

2000 - 2001

Main `Grid task' activities in 2000 - 2001:

Ramp-up of Grid projects, establish a new mode of working

Grid project requirements documents, architecture

GDMP

Started as griddified package for data transport in CMS production, is now a more generic project

Used widely in 2001 production

Also demo of mode of working

MOP

Vertical integration of CMS production software, GDMP, Condor

Both GDMP and MOP just had very successful internal reviews in PPDG

Page 22

2002

Grid task main activities (in US) in 2002:

Build USCMS test grid

Deploy Globus 2.0, EU DataGrid components

Use MOP as a basis for developing a larger vertically integrated system with

Virtual data features

Central catalogs and a global view of data

Production facilities

● Participate in real CMS production with non-trivial jobs

Analysis facilities

Caltech team's main role is towards analysis facilities

Page 23

Summary: 2000 - 2002

Main `Grid task' activities in 2000 - 2001:

Grid project requirements documents, architecture

GDMP

MOP

Main `Grid task' activities (in US) in 2002:

Build USCMS test grid

Deploy Globus 2.0, EU DataGrid components

Use MOP as a basis for developing a larger vertically integrated system with

Virtual data features

Central catalogs and a global view of data

Production facilities

● Participate in real CMS production

Analysis facilities