
3rd June 2004 CDF Grid SAM:Metadata and Middleware Components Mòrag Burgon-Lyon University of Glasgow



SAM: Metadata and Middleware Components

Mòrag Burgon-Lyon

University of Glasgow


Contents

• CDF Computing Goals

• SAM

• CAF

• DCAF

• JIM

• How it all fits together

• SAM TV


CDF Computing Goals

• The CDF experiment intends to have:

– 25% of computing offsite by June 2004

– 50% by June 2005

• To achieve these goals several components are being developed and deployed:

– SAM – data handling system

– CAF & DCAF – batch systems

– JIM – Grid extension to SAM

– SAM TV – monitoring for SAM Stations


SAM

• Sequential Access via Metadata

• Mature data handling system

• Users can start SAM projects, e.g. running AC++Dump.

• Large volumes of files (in datasets) may be requested by SAM and are processed by the SAM projects. These are transferred from either the main cache at Fermilab, or from neighbouring SAM stations.


CAF

• The original CDF Analysis Farm

• The CAF is a 600 CPU farm of computers running Linux

• Access to the CDF data handling system and databases to allow CDF collaborators to run batch analysis jobs.

• Since standard Unix accounts are not created for users (i.e. you cannot "log into" the CAF), custom software provides the user with remote job submission, control, monitoring, and an output interface.

• Strongly authenticated via Kerberos.

• http://cdfcaf.fnal.gov/


CAF

• Users compile and link their analysis jobs on their desktop.

• The required files are archived into a temporary tar file and copied to the CAF head node.

• Jobs are executed using a distributed batch system, Farm Batch System Next Generation (FBSNG).

• Output is tarred up and either received back on the user's desktop or saved to scratch space on the CAF FTP server, for later retrieval.

• A cdfsoft installation is required to submit jobs. Two 8-way Linux SMP systems are provided for users without cdfsoft on their local desktops, and for general reference for users having problems with their local installations.
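The packaging step described above (archive the required files into a temporary tar file before copying them to the head node) can be illustrated with a minimal Python sketch. This is not the CAF client code: the function names `build_job_tarball` and `list_tarball` are hypothetical, and the copy to the head node and the Kerberos authentication are deliberately left out.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import List


def build_job_tarball(job_dir: str) -> str:
    """Archive every file under job_dir into a temporary .tar.gz; return its path.

    Hypothetical helper: it only illustrates the "temporary tar file" step.
    """
    job_path = Path(job_dir)
    # delete=False keeps the archive on disk after the handle is closed,
    # so a separate transfer step (not shown) could pick it up.
    tmp = tempfile.NamedTemporaryFile(suffix=".tar.gz", delete=False)
    tmp.close()
    with tarfile.open(tmp.name, "w:gz") as tar:
        for item in sorted(job_path.rglob("*")):
            if item.is_file():
                # Store paths relative to the job directory so the archive
                # unpacks cleanly on the remote side.
                tar.add(str(item), arcname=item.relative_to(job_path).as_posix())
    return tmp.name


def list_tarball(tar_path: str) -> List[str]:
    """Return the sorted member names of the archive, for a quick sanity check."""
    with tarfile.open(tar_path, "r:gz") as tar:
        return sorted(tar.getnames())
```

Listing the archive before shipping it is a cheap way to confirm that the executable and any shared libraries were actually included.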


CAF


CAF

• Initially configured to favour large reads and small writes (e.g. producing small skims, histograms, etc. from official secondary datasets).

• Extensions have been made to allow users to store their output files back into the SAM data handling system allowing jobs with larger writes to run easily.

• CAF has also been used for large-scale Monte Carlo and tertiary data set production.

• Users typically use CAF GUI, though command line job submissions are also possible.


CAF Monitoring


DCAF

• Decentralised CDF Analysis Farm

• CAF implemented at several remote sites from Taiwan to Canada

• Rollout began in January 2004

• Core set of 6 DCAF sites provides the backbone

• New sites continually being added

• User selects site on which to run


DCAF Hardware Resources

Site            GHz now  TB now  GHz Summer  TB Summer  Notes
INFN            250      5       950         30         Priority to INFN users; pinned data sets exist
Taiwan          100      2.5     150         2.5        Pinned data sets exist
Korea           120      -       120         -          Running MC only now
UCSD            280      5       280         5          Pools resources from several US groups. Min. guaranteed from x2 larger farm (CDF+CMS)
Rutgers         100      4       400         4          In-kind, will do MC production
TTU             6        2       60          4          2 DCAFs, test site + CDF+CMS cluster
Germany GridKa  ~200     16      ~240        18         Min. guaranteed CPU from x8 larger pool. Open to all by ~Dec (JIM)
Canada          240+     -       240+        -          In-kind, doing MC production, + common pool
Japan           -        -       150         6          Under construction
Cantabria       30       1       60          2          ~1 month away
MIT             -        -       200         -          ~1 month away
UK              -        -       400         -          Open to all by ~Dec (JIM), + common pool
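As a rough cross-check of the hardware table, the per-site figures can be summed to give the total offsite capacity now and expected by summer. The sketch below is illustrative only and makes two assumptions that are not in the slides: approximate entries ("~200", "240+") are taken at their stated number, and "-" entries are treated as contributing nothing.

```python
from typing import Dict, Optional, Tuple

# (GHz now, TB now, GHz summer, TB summer); None stands for "-" in the table.
# Assumption: "~200", "~240" and "240+" are taken at face value.
Row = Tuple[Optional[float], Optional[float], Optional[float], Optional[float]]

DCAF_SITES: Dict[str, Row] = {
    "INFN": (250, 5, 950, 30),
    "Taiwan": (100, 2.5, 150, 2.5),
    "Korea": (120, None, 120, None),
    "UCSD": (280, 5, 280, 5),
    "Rutgers": (100, 4, 400, 4),
    "TTU": (6, 2, 60, 4),
    "Germany GridKa": (200, 16, 240, 18),
    "Canada": (240, None, 240, None),
    "Japan": (None, None, 150, 6),
    "Cantabria": (30, 1, 60, 2),
    "MIT": (None, None, 200, None),
    "UK": (None, None, 400, None),
}


def column_totals(sites: Dict[str, Row]) -> Tuple[float, float, float, float]:
    """Sum each of the four numeric columns, skipping missing ("-") entries."""
    totals = [0.0, 0.0, 0.0, 0.0]
    for row in sites.values():
        for i, value in enumerate(row):
            if value is not None:
                totals[i] += value
    return (totals[0], totals[1], totals[2], totals[3])


ghz_now, tb_now, ghz_summer, tb_summer = column_totals(DCAF_SITES)
print(f"CPU:  {ghz_now:.0f} GHz now, {ghz_summer:.0f} GHz by summer")
print(f"Disk: {tb_now:.1f} TB now, {tb_summer:.1f} TB by summer")
```

Under these assumptions the listed sites add up to roughly 1,300 GHz and 35 TB now, rising to about 3,250 GHz and 70 TB by summer. The real totals depend on how the approximate and minimum-guaranteed entries are interpreted.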


DCAF

• Recent DCAF report (1st June):

– Taiwan DCAF has finished copying and pinning 3 large muon datasets with no major problems.

– Request for ~600GHz of MC production for June has been received.

– Storing MC results in a timely way was a priority.

– The MC producers have been educated in storage of files through SAM (web-pages, tutorials), requiring only the CDF dataset name or MC request ID.


JIM

• Job and Information Management

• Grid extension to SAM allowing users to submit jobs using a local thin client.

• Remote broker assigns each job to an execution site based on where the most data is present and the queue is the shortest.

• Job progress can be monitored through a web page.

• Job output can be downloaded using a web browser.
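The brokering rule stated above (prefer the site holding the most data, with the shortest queue) can be sketched as a simple ranking. This is a hypothetical illustration, not JIM's actual broker: the slides do not say how the two criteria are combined, so the sketch assumes data locality is compared first and queue length breaks ties. The `Site` fields and site names are invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Site:
    """Hypothetical view of an execution site as seen by a broker."""
    name: str
    files_cached: int   # how many of the job's input files are already at the site
    queued_jobs: int    # current length of the site's batch queue


def choose_site(sites: List[Site]) -> Site:
    """Pick the site with the most input data present; break ties by shortest queue.

    Assumption: the two criteria are applied lexicographically. The site name
    is a final tie-breaker so that the choice is deterministic.
    """
    if not sites:
        raise ValueError("no candidate sites")
    return min(sites, key=lambda s: (-s.files_cached, s.queued_jobs, s.name))


candidates = [
    Site("site-a", files_cached=120, queued_jobs=40),
    Site("site-b", files_cached=120, queued_jobs=5),
    Site("site-c", files_cached=30, queued_jobs=0),
]
print(choose_site(candidates).name)
```

A production broker would more likely blend the two criteria into a single score, since an empty queue at a site with little data can still beat a long wait at the site holding all of it.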


JIM


JIM

• JIM can run on shared resources, and can interface with most batch systems

• CDF environment can be tar-balled, for running Monte Carlo on non-CDF equipment.

• D0 have successfully run large Monte Carlo

• CDF Monte Carlo has been run interactively on D0 cluster. Next step is JIM submission.


How it all fits together


SAM TV

Adam Lyon at Fermilab has created a set of web pages that can be used to monitor SAM stations and projects.

Demo:

• http://ncdf151.fnal.gov:8520/samTV/current/samTV.html


SAM TV

• Snapshot summaries – lists the stations with a pie-chart showing the number of file transfers.

• SAM project snapshot – all the projects on the selected station with a plot of file delivery/time.

• Project details – including time and plot of last file delivery

• Consumer and process – consumer and process IDs, application, node, user, etc.

• Files – list of files desired by a project


SAM TV


Challenges and Future Work

• Implementation and rollout of JIM for MC

• More DCAF installations

• Encourage user migration

• Solve fragmented disks and caches problem (suggestions welcome!)