Caltech and CMS Grid Work Overview
Koen Holtman
Caltech/CMS
May 22, 2002
CMS distributed computing
CMS wanted to build a distributed computing system all along!
CMS CTP (Dec 1996): One integrated computing system with a single global view of the data
Used by the 1000s of CMS collaborators around the world
We now call this the 'CMS Data Grid System'
PPDG: Mission-Oriented Pragmatic Methodology
End-to-end integration and deployment of experiment applications using existing and emerging Grid services
Deployment of Grid technologies and services in production (24x7) environments
With stressful performance needs
Collaborative development of Grid middleware and extensions between application and middleware groups
Leading to pragmatic and acceptable-risk solutions.
HENP experiments extend their adoption of common infrastructures to higher layers of their data analysis and processing applications.
Much attention to integration, coordination, interoperability and interworking
With emphasis on incremental deployment of increasingly functional working systems
Major Grid requirements effort completed
Document writing by Caltech group
Catania CMS week Grid workshop (June 2001, about 12 hours over various sessions)
CMS consensus on many strategic issues:
- Division of labor between Grid projects and CMS Computing group (needed for planning, manpower estimates)
- Grid job execution model
- Grid data model, replication model
- Object handling and the Grid
Main Grid Requirements Document: CMS Data Grid System Overview and Requirements. CMS Note 2001/037, 28 pages. http://kholtman.home.cern.ch/kholtman/cmsreqs.pdf
Additional documents on object views, hardware sizes, workload model, data model (K. Holtman): CMS Note 2001/047
2003 CMS data grid system vision
CMS Grid Requirements
Objects and Files in the Grid
CMS computing is object-oriented and database-oriented: fundamentally we have a persistent data model with 1 object = 1 piece of physics data (KB-MB size)
Much of the thinking in the Grid projects and Grid community is file-oriented: a 'computer center' view of large applications that does not look inside application code, but thinks about application needs in terms of CPU batch queues, disk space for files, file staging and migration
How to reconcile this? CMS requirements 2001-2003:
- Grid project components do not need to deal with objects directly
- Specify file handling requirements in such a way that a CMS layer for object handling can be built on top
LCG Project (SC2, PEB) has started to develop a new object handling layer
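To make the layering concrete, here is a minimal sketch (names invented for illustration, not CMS code) of an object-handling layer built on top of file-oriented Grid services: CMS addresses individual physics objects, while the Grid below only ever stages and replicates whole files.

```python
# Minimal sketch (invented names, not CMS code) of the object-handling
# layer the requirements describe: CMS maps fine-grained physics objects
# into files, so the Grid layer below only deals with whole files.

class ObjectFileCatalog:
    """Maps object IDs to (logical file name, offset, length)."""

    def __init__(self):
        self._index = {}

    def register(self, object_id, lfn, offset, length):
        self._index[object_id] = (lfn, offset, length)

    def files_for(self, object_ids):
        """All the Grid needs to know: which files to stage or replicate."""
        return {self._index[oid][0] for oid in object_ids}

    def read_object(self, object_id, open_local_file):
        """Extract one object once the Grid has staged its file locally."""
        lfn, offset, length = self._index[object_id]
        with open_local_file(lfn) as f:
            f.seek(offset)
            return f.read(length)
```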
Grid Services for CMS: Division of Labor (CMS Week, June 2001)

Provided by CMS:
- Mapping between objects and files (persistency layer)
- Local and remote extraction and packaging of objects to/from files
- Consistency of software configuration for each site
- Configuration meta-data for each sample
- Aggregation of sub-jobs
- Policy for what we want to do (e.g. priorities for what to run first, the production manager)
- Some error recovery too

Not needed by 2003:
- Auto-discovery of arbitrary identical/similar samples

Needed from somebody:
- Tool to implement common CMS configuration on remote sites?

Provided by the Grid:
- Distributed job scheduler: if a file is remote the Grid will run appropriate CMS software (often remotely; split over systems)
- Resource management, monitoring, and accounting tools and services
- Query estimation tools (to WHAT DEPTH?)
- Resource optimisation with some user hints / control (coherent management of local copies, replication, caching)
- Transfer of collections of data
- Error recovery tools (from e.g. job/disk crashes)
- Location information of Grid-managed files
- File management such as creation, deletion, purging, etc.
- Remote virtual login and authentication / authorisation
GriPhyN/PPDG Architecture (Ian Foster, Carl Kesselman, Mike Wilde, others)
[Architecture diagram: an Application feeds a Planner and an Executor, supported by Catalog Services (MCAT; GriPhyN catalogs), Info Services (MDS), Policy/Security (GSI, CAS), Monitoring, Replica Management and a Reliable Transfer Service, sitting above Compute and Storage Resources; components are marked where an initial solution is operational.]
[Map slide: worldwide CMS production sites - CERN, FNAL, INFN (10), Moscow, Caltech, UCSD, UFL, Wisconsin, Bristol, Imperial College, Helsinki, IN2P3 - marked as fully operational or in progress, running simulation and digitization (with and without pile-up, PU) using the common production tools (IMPALA) and GDMP.]
Worldwide Production at 21 Sites
CMS Production Data Produced in 2001

Site          Simulated events   Objectivity data
Caltech       2.50 M             0.60 TB
FNAL          1.65 M             12 TB
Bristol/RAL   1.27 M             0.22 TB
CERN          1.10 M             14 TB
INFN          0.76 M             0.40 TB
Moscow        0.43 M             0.45 TB
IN2P3         0.31 M             0.10 TB
Helsinki      0.13 M             -
Wisconsin     0.07 M             0.05 TB
UCSD          0.06 M             0.20 TB
UFL           0.05 M             0.08 TB
TOTAL         8.4 M              29 TB

Typical event sizes:
Simulated: 1 CMSIM event = 1 OOHit event = 1.4 MB
Reconstructed: 1 "10^33" event = 1.2 MB; 1 "2x10^33" event = 1.6 MB; 1 "10^34" event = 5.6 MB
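A back-of-envelope cross-check of these totals (an illustration only; the split between simulated and derived data is an assumption, not stated on the slide):

```python
# Simulated data alone, at 1.4 MB per OOHit event:
simulated_events = 8.4e6              # total events produced in 2001
oohit_mb = 1.4                        # 1 CMSIM event = 1 OOHit event
print(simulated_events * oohit_mb / 1e6, "TB")   # -> ~11.8 TB
# The remainder of the 29 TB Objectivity total would then be
# reconstructed and other derived data, which is larger per event
# (1.2-5.6 MB depending on luminosity scenario) -- an assumption,
# not stated on the slide.
```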
GDMP
Tool to transfer and manage files in production: easy to handle this manually with a few centers, impossible with lots of data at many centers
GDMP is based around Globus middleware, a flexible architecture, and the Globus Replica Catalogue
Provided an early model of collaboration between HEP and Grid middleware providers
Successfully used to replicate > 1 TB of CMS data
Now a PPDG/EU DataGrid joint project
Paper: Grid Data Management Pilot (GDMP): A Tool for Wide Area Replication. Applied Informatics Conference (AI2001), Innsbruck, Austria, 2/2001. Authors: Caltech, CERN/CMS, FNAL, CERN/IT; PPDG, GriPhyN, EU DataGrid WP2
Integration with ENSTORE, HPSS, Castor tape systems
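The replica-catalogue idea GDMP builds on can be sketched in a few lines (a simplification with invented names, not GDMP code): one logical file name maps to many physical copies, so a transfer tool can locate or add replicas.

```python
# Minimal sketch of a replica catalogue (not GDMP or Globus code):
# logical file name -> set of physical locations.
replica_catalogue = {}

def publish(lfn, pfn):
    """Register a new physical copy of a logical file."""
    replica_catalogue.setdefault(lfn, set()).add(pfn)

def locate(lfn):
    """Find all known physical copies."""
    return replica_catalogue.get(lfn, set())

# Hypothetical hosts, for illustration only:
publish("lfn:/cms/sim/run001.db", "gsiftp://caltech.example.edu/data/run001.db")
publish("lfn:/cms/sim/run001.db", "gsiftp://cern.example.ch/data/run001.db")
print(locate("lfn:/cms/sim/run001.db"))
```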
PPDG MOP system
PPDG developed the MOP system: allows submission of CMS prod. jobs from a central location, to be run at remote locations, with results returned
Relies on GDMP for replication; Globus GRAM, Condor-G and local queuing systems for job scheduling; IMPALA for job specification
Shown in SC2001 demo. Now being deployed in the US CMS testbed. Proposed as basis for the next CMS-wide production infrastructure.
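Schematically, the MOP pattern looks like the sketch below (hypothetical helper names, not MOP code): a central loop creates job specifications (IMPALA's role), dispatches them to remote sites (the Condor-G / Globus GRAM role), and publishes outputs for replication back (GDMP's role).

```python
# Hedged sketch of the MOP flow; every name here is a stand-in.

def impala_spec(run):
    # Stand-in for IMPALA job specification
    return {"run": run, "app": "CMSIM", "events": 500}

def dispatch(site, spec):
    # Stand-in for Condor-G submission to a remote Globus GRAM gatekeeper
    print(f"submit {spec['app']} run {spec['run']} "
          f"({spec['events']} events) to {site}")

def publish_output(lfn):
    # Stand-in for GDMP registration/replication of the result file
    print(f"publish {lfn} to the replica catalogue")

for run, site in enumerate(["fnal.example.gov", "caltech.example.edu"]):
    spec = impala_spec(run)
    dispatch(site, spec)
    publish_output(f"lfn:/cms/prod/run{run:03d}.fz")
```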
US CMS Prototypes and Test-beds
All U.S. CMS S&C institutions are involved in DOE and NSF Grid projects:
1. Integrating Grid software into CMS systems
2. Bringing CMS production on the Grid
3. Understanding the operational issues

1. MOP used as first pilot application
2. MOP system got an official CMS production assignment of 200K CMSIM events
3. 50K have been produced and registered already
Installing middleware
Virtual Data Toolkit:
- Globus 2.0 beta
- Essential Grid Tools
- Essential Grid Services I & II
- Grid API
- Condor-G 6.3.1
- Condor 6.3.1
- ClassAds 0.9
- GDMP 3.0 alpha 3
We found the VDT to be very easy to install, but a little bit more challenging to configure
Prototype VDG System (production)
[Diagram, shown twice with annotations: a User works through an Abstract Planner (MOP/WP1) and a Concrete Planner/WP1 driving an Executor; catalog services comprise RefDB, a Materialized Data Catalog, a Virtual Data Catalog, the Replica Catalog (GDMP) and an Objectivity Metadata Catalog, plus a Local Tracking DB; the Compute Resource runs wrapper scripts around BOSS, CMKIN, CMSIM and ORCA/COBRA, with Local Grid Storage and a Storage Resource under replica management. The legend marks each component as no code / existing / implemented using MOP.]
Analysis part
Physics data analysis will be done by 100s of users
Caltech is taking responsibility for developing the analysis part of the vertically integrated system
The analysis part is connected to the same catalogs: maintain a global view of all data
Big analysis jobs can use production job handling mechanisms
Analysis services based on tags
Optimization of "Tag" Databases
Tags are small (~0.2 - 1 kbyte) summary objects for each event
Crucial for fast selection of interesting event subsets; this will be an intensive activity
Past work concentrated in three main areas:
- Integration of CERN's "HepODBMS" generic Tag system with the CMS "COBRA[*]" framework
- Investigations of Tag bitmap indexing to speed queries
- Comparisons of OO and traditional databases (SQL Server, soon Oracle 9i) as efficient stores for Tags
New work concentrates on tag-based analysis services
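To illustrate why tags and bitmap indexing speed up selection (a sketch with invented tag fields, not CMS code): cuts scan ~1 kbyte summaries instead of MB-scale events, and a frequently used cut can be precomputed as a bitmap with one bit per event, so combining cuts is a bitwise AND.

```python
from dataclasses import dataclass

@dataclass
class Tag:                 # ~0.2-1 kbyte summary of one event (fields invented)
    event_id: int
    n_jets: int
    missing_et: float

def build_bitmap(tags, predicate):
    """Precompute one cut as a bitmap over the event sample."""
    bits = 0
    for i, t in enumerate(tags):
        if predicate(t):
            bits |= 1 << i
    return bits

tags = [Tag(1, 3, 42.0), Tag(2, 1, 95.5), Tag(3, 4, 120.0)]
jets = build_bitmap(tags, lambda t: t.n_jets >= 2)
etmiss = build_bitmap(tags, lambda t: t.missing_et > 50.0)
combined = jets & etmiss   # no event data touched, just bit operations
print([t.event_id for i, t in enumerate(tags) if combined >> i & 1])  # -> [3]
```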
CLARENS: a Portal to the Grid
Grid-enabling the working environment for non-specialist physicists' data analysis
Clarens consists of a server communicating with various clients via the commodity XML-RPC protocol. This ensures implementation independence.
The server is implemented in C++ to give access to the CMS OO analysis toolkit.
The server will provide a remote API to Grid tools:
- Security services provided by the Grid (GSI)
- The Virtual Data Toolkit: object collection access
- Data movement between Tier centers using GSI-FTP
- CMS analysis software (ORCA/COBRA)
Current prototype is running on the Caltech proto-Tier2
More information at http://clarens.sourceforge.net, along with a web-based demo
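Because Clarens speaks commodity XML-RPC, any language with an XML-RPC library can act as a client. A hypothetical session sketch (the URL and method names are invented for illustration; only the protocol choice comes from the slide):

```python
import xmlrpc.client

# Hypothetical endpoint; Clarens exposes its API over plain XML-RPC.
server = xmlrpc.client.ServerProxy("http://proto-tier2.example.edu:8080/clarens")

# If the server exposed calls like these (invented names), a physicist
# could browse catalogs and run analysis remotely from any client:
# collections = server.catalog.list_collections()
# summary = server.analysis.tag_summary("missing_et", 50)
```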
Globally Scalable Monitoring Service, CMS (Caltech and Pakistan)
[Diagram: farm monitors collect data via push & pull (rsh & ssh existing scripts, SNMP) and an RC Monitor Service; each Farm Monitor registers with Lookup Services, through which clients and other services discover it; a Proxy and Component Factory provide GUI marshaling, code transport and RMI data access.]
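The registration/discovery pattern in the diagram can be sketched as follows (a Python stand-in for the RMI/lookup-service stack shown; all names invented):

```python
class LookupService:
    """Directory that farm monitors register with and clients query."""

    def __init__(self):
        self._services = {}

    def register(self, name, endpoint):
        # A farm monitor announces itself (the "Registration" arrow)
        self._services[name] = endpoint

    def discover(self, name):
        # A client or other service finds it (the "Discovery" arrow)
        return self._services.get(name)

lookup = LookupService()
lookup.register("FarmMonitor/caltech", "monitor01.example.edu:9000")
print(lookup.discover("FarmMonitor/caltech"))
```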
Current events
GDMP and MOP just had very favorable internal reviews in PPDG
Testbed: currently MOP deployment under way. Stresses the Grid middleware in new ways: new issues and bugs being discovered in Globus, Condor
Testbed MOP production request: 200K CMSIM events requested; 50K (~10 GB) now finished and validated
New fully integrated system: first versions expected by summer; system will be the basis for demos at SC2002
Upcoming: CMS workshop on Grid based production (CERN)
Upcoming: PPDG analysis workshop (Berkeley)
2000 - 2001
Main 'Grid task' activities in 2000 - 2001:
- Ramp-up of Grid projects, establishing a new mode of working
- Grid project requirements documents, architecture
- GDMP: started as a griddified package for data transport in CMS production, is now a more generic project; used widely in 2001 production; also a demo of the mode of working
- MOP: vertical integration of CMS production software, GDMP, Condor
Both GDMP and MOP just had very successful internal reviews in PPDG
2002
Grid task main activities (in US) in 2002:
- Build USCMS test grid
- Deploy Globus 2.0, EU DataGrid components
- Use MOP as a basis for developing a larger vertically integrated system with:
  - Virtual data features
  - Central catalogs and a global view of data
  - Production facilities: participate in real CMS production with non-trivial jobs
  - Analysis facilities
Caltech team's main role is towards analysis facilities
Summary: 2000 - 2002
Main 'Grid task' activities in 2000 - 2001:
- Grid project requirements documents, architecture
- GDMP
- MOP
Main 'Grid task' activities (in US) in 2002:
- Build USCMS test grid
- Deploy Globus 2.0, EU DataGrid components
- Use MOP as a basis for developing a larger vertically integrated system with:
  - Virtual data features
  - Central catalogs and a global view of data
  - Production facilities: participate in real CMS production
  - Analysis facilities