Remote Operation of a Monte Carlo Production Farm Using Globus

Dirk Hufnagel, Teela Pulliam, Thomas Allmendinger, Klaus Honscheid (Ohio State University)

CHEP2003, 03/27/2003




The Problem:

• High-luminosity experiments need large MC samples (Belle and BaBar require hundreds of millions of MC events)
• Massive computing power is needed (farms of Linux machines)
• Farms are typically geographically distributed
  • CLEO: two sites
  • DELPHI: five sites
  • BaBar: two dozen sites (US and Europe)
  • Belle: eight sites


Hardware alone is not sufficient:

• Hardware, system-level software maintenance
• Experiment-specific MC software setup
• MC production
  • Job submission
  • Job monitoring (rerun failed jobs)
  • Data transfer
• Coordination


Is there another way?

• Reduced manpower requirements
• More efficient coordination
• Our approach
  • Select one of the steps in the MC production chain
    • MC production
  • Centralize operations
    • Remote submission and monitoring
  • Evaluate GRID tools. Can they help with MC production?
    • Globus toolkit


OSU MC Production Farm

• 27 dual Athlon nodes (1U)
• 1 dual Athlon server (4U)
• 840 GB disk in RAID
• OpenPBS batch system
• File/batch queue server
• 600-700k MC events/day
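As a rough worked example of what this throughput means, the sketch below estimates how long one farm of this size would need to generate a sample of the scale mentioned earlier. The target of 100 million events is an illustrative assumption (the slides only say "hundreds of millions"), and the calculation assumes that all 54 CPUs of the 27 dual nodes run production continuously, which the slides do not state.

```python
# Rough throughput arithmetic for the farm described above.
# Assumption: the 27 dual-CPU nodes (54 CPUs) all run production.
# TARGET_EVENTS is an illustrative figure, not a number from the slides.

SECONDS_PER_DAY = 24 * 60 * 60
CPUS = 27 * 2
TARGET_EVENTS = 100_000_000


def days_needed(target_events: int, events_per_day: int) -> float:
    """Days of continuous running needed to produce target_events."""
    return target_events / events_per_day


def cpu_seconds_per_event(events_per_day: int, cpus: int = CPUS) -> float:
    """Average CPU time per event, given the farm's daily output."""
    return cpus * SECONDS_PER_DAY / events_per_day


# Evaluate both ends of the quoted 600-700k events/day range.
for rate in (600_000, 700_000):
    print(
        f"{rate} events/day: "
        f"{days_needed(TARGET_EVENTS, rate):.0f} days for the sample, "
        f"{cpu_seconds_per_event(rate):.1f} CPU-seconds per event"
    )
```

Under these assumptions a single farm needs on the order of five months per 100 million events, which is one way to see why production is spread over several sites.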


Globus Toolkit

• Globus
  • Secure access
    • Certificates for user and server
  • Remote command execution system
    • We observed significant overhead
      • a few seconds for a single command
  • Integrated tools
    • e.g. GridFTP
• Installation at Ohio State
  • Globus 2.2.4 on dedicated server
  • Separate batch queue system for testing
  • No Resource Broker
    • Farm configuration details hidden
    • Loss of dynamic configurability but much simpler
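The per-command overhead noted above can be measured with a small timing wrapper. The sketch below assumes that the Globus Toolkit 2 command-line client `globus-job-run` is installed and that a valid proxy certificate exists (for example created with `grid-proxy-init`); the gatekeeper host name is hypothetical. The timing helper takes the runner as a parameter so that it can be exercised without a Grid installation.

```python
import subprocess
import time
from typing import Callable, List, Sequence, Tuple

# Hypothetical gatekeeper contact; replace with the real farm server.
GATEKEEPER = "farm.example.edu"


def remote_command(contact: str, executable: str, *args: str) -> List[str]:
    """Build a globus-job-run command line: contact, executable, arguments."""
    return ["globus-job-run", contact, executable, *args]


def timed(cmd: Sequence[str],
          runner: Callable[[Sequence[str]], str]) -> Tuple[str, float]:
    """Run cmd through runner and return (output, elapsed seconds)."""
    start = time.perf_counter()
    output = runner(cmd)
    return output, time.perf_counter() - start


def run_subprocess(cmd: Sequence[str]) -> str:
    """Runner that executes the command for real and returns its stdout."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

With a working Grid setup, `timed(remote_command(GATEKEEPER, "/bin/date"), run_subprocess)` would return the remote output together with the wall-clock time of the round trip, which is the quantity the slide describes as a few seconds.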


MC production I: Job submission

• Typical input information:
  • (MC software release), run range, #events …
• To do:
  • Build MC jobs and submit them
• Choose one option:
  • One Globus command starts the whole run range production
    • many (thousands) of local jobs
    • still need local script
  • One Globus command starts a single MC production job
    • Too slow
  • Submit all production runs at once
  • Only submit enough runs to fill queue
    • Re-submitted jobs proceed faster
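The "only submit enough runs to fill queue" option can be sketched as a simple throttling step. This is a minimal illustration, not the scripts used at Ohio State: the queue capacity, the way the number of queued jobs is obtained, and the `submit` callback are all assumptions.

```python
from typing import Callable, Iterator, List


def fill_queue(pending_runs: Iterator[int],
               jobs_in_queue: int,
               queue_capacity: int,
               submit: Callable[[int], None]) -> List[int]:
    """Submit runs from pending_runs until the queue is full.

    Returns the run numbers submitted in this pass. Runs that do not
    fit stay in pending_runs for the next pass.
    """
    submitted = []
    free_slots = max(queue_capacity - jobs_in_queue, 0)
    for _ in range(free_slots):
        run = next(pending_runs, None)
        if run is None:  # run range exhausted
            break
        submit(run)
        submitted.append(run)
    return submitted


# Example with a hypothetical run range and a stand-in for the real
# submission command (which would call the local batch system).
runs = iter(range(1000, 1010))
first_pass = fill_queue(runs, jobs_in_queue=2, queue_capacity=6,
                        submit=lambda run: None)
second_pass = fill_queue(runs, jobs_in_queue=6, queue_capacity=6,
                         submit=lambda run: None)
print(first_pass, second_pass)
```

Calling such a function periodically keeps the queue full while leaving free slots near the front for resubmitted jobs, which is presumably why they proceed faster than when every run is queued at once.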


MC production II: Job monitoring

• Job status (“qstat”)
  • Use local script to monitor log files
    • Resubmit crashed jobs locally
  • Monitor through Globus (remotely)
    • Speed?
• Data quality monitoring
  • Check physics histograms
  • Not always done during production
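A local monitoring script of the kind described above might scan job log files and collect the jobs that need resubmitting. The sketch below is only an illustration: the log directory layout, the `.log` suffix, and the success and failure marker strings are hypothetical, since the slides do not describe the actual log format.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical markers; the real MC software would define its own.
SUCCESS_MARKER = "JOB FINISHED"
FAILURE_MARKER = "FATAL"


def classify_log(text: str) -> str:
    """Classify one job log as 'done', 'crashed', or 'running'."""
    if FAILURE_MARKER in text:
        return "crashed"
    if SUCCESS_MARKER in text:
        return "done"
    return "running"


def scan_logs(log_dir: Path) -> Dict[str, List[str]]:
    """Group job names (log file stems) by status."""
    status: Dict[str, List[str]] = {"done": [], "crashed": [], "running": []}
    for log_file in sorted(log_dir.glob("*.log")):
        text = log_file.read_text(errors="replace")
        status[classify_log(text)].append(log_file.stem)
    return status
```

The `crashed` list returned by `scan_logs` is what a local resubmission step would consume. Running the scan on the farm and sending back only this summary would need one remote command per check rather than one per job.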


MC production III: Data transfer

• Easy if MC output is in file format
  • GridFTP …
• Can be complicated otherwise
  • Example would be writing MC into a database
• Limited disk space -> delete generated MC
  • Log files
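For file-based output, the transfer-then-delete step could look like the sketch below. It assumes that the `globus-url-copy` client from the Globus Toolkit is available and that a valid proxy exists; the destination host and paths are hypothetical. The local file is deleted only after the copy command reports success, because deleting generated MC to free disk space removes the only local copy.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence


def copy_command(local_file: Path, dest_host: str, dest_dir: str) -> List[str]:
    """Build a globus-url-copy command from a local file to a GridFTP URL."""
    source = local_file.resolve().as_uri()  # file:// URL of the absolute path
    target = f"gsiftp://{dest_host}{dest_dir.rstrip('/')}/{local_file.name}"
    return ["globus-url-copy", source, target]


def run_command(cmd: Sequence[str]) -> int:
    """Default runner: execute cmd and return its exit status."""
    return subprocess.run(cmd).returncode


def transfer_and_delete(local_file: Path, dest_host: str, dest_dir: str,
                        runner: Callable[[Sequence[str]], int] = run_command) -> bool:
    """Copy the file; delete the local copy only if the transfer succeeded."""
    if runner(copy_command(local_file, dest_host, dest_dir)) != 0:
        return False  # keep the file so the transfer can be retried
    local_file.unlink()
    return True
```

The `runner` parameter exists only so the logic can be tried without a GridFTP server; in production the default `run_command` would be used.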


Conclusion

• MC production for a high-luminosity experiment requires significant hardware and manpower resources.
• GRID tools can help to centralize this effort.
  • Simple tests show that remote operation of MC farms is possible
  • Relatively easy to set up
    • Globus framework (secure access, remote command execution)
    • Local scripts for job submission, monitoring
• Still, significant software infrastructure (“local scripts”) is required.
• Other parts of the MC production chain need to be addressed before this becomes a realistic option.
  • Remote MC software installation and version management