Physics with SAM-Grid, Stefan Stonjek, University of Oxford, CHEP 2003, 25th March 2003, San Diego


Physics with SAM-Grid

Stefan Stonjek, University of Oxford

CHEP 2003 25th March 2003

San Diego


Outline

• Components of SAM-Grid

• Job submission

• Example

• Problems

• Outlook

• Summary


Components of SAM-Grid (global)

• JIM (Job and Information Management system)
  – Frontend and glue

• Condor-G for global submission and brokering

• GRAM protocol to transfer job to execution site

• Authentication via GSI (Grid Security Infrastructure)


Components of SAM-Grid (local)

• Data handling with SAM (Sequential data Access via Meta-data)

• Local job submission to
  – CDF: CAF (Central Analysis Facility)
    • FBS: Farm batch system
    • Kerberized tools to write data back to FNAL
  – DØ: OpenPBS
    • Output data handling via SAM


Submission to the Grid

• User must provide:
  – Grid proxy
  – Job description file
  – “tar” file with executable and configuration files

• GUI exists (generates “jdf” file, “tar” file and submits)

• Submission via JIM to Condor

• Submit to the site with the most files of the required dataset already present
  – New Condor-MMS feature: execution of external code when negotiating the matches
  – Here: calls SAM to check for the presence of input data at different locations
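The brokering rule above (send the job to the site that already holds the most files of the required dataset) can be sketched in a few lines. This is only an illustration of the ranking idea, not the Condor or SAM interface: the function name `pick_site`, the site names and the in-memory cache map are all assumptions made for the example.

```python
from typing import Dict, Iterable, Optional, Set


def pick_site(dataset_files: Iterable[str],
              site_caches: Dict[str, Set[str]]) -> Optional[str]:
    """Return the site holding the most files of the dataset (None if no sites).

    Hypothetical helper: in SAM-Grid the file locations would come from SAM
    during match negotiation, not from a dictionary like this one.
    """
    wanted = set(dataset_files)
    best_site = None
    best_count = -1
    # Sorted iteration makes ties resolve deterministically by site name.
    for site in sorted(site_caches):
        count = len(wanted & site_caches[site])
        if count > best_count:
            best_site, best_count = site, count
    return best_site


# Usage with made-up file and site names.
dataset = ["f1", "f2", "f3", "f4"]
caches = {
    "site_a": {"f1"},
    "site_b": {"f1", "f2", "f3"},
    "site_c": {"f9"},
}
print(pick_site(dataset, caches))
```

Counting the overlap per site keeps the rule independent of how the file locations are obtained, so the lookup could be replaced by a real catalogue query without touching the ranking.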


Local Submission

• Job is transferred as “tar” file via the GRAM protocol and then submitted to the local batch system

• Different local batch systems are possible
  – Need adaptor for submission and job status information
  – Supported at the moment: FBS, PBS, LSF, Condor

• Queues have to be the same at all sites
  – Problem: job should not stop in the middle of an input file
  – User has to limit amount of input relative to CPU time in queue (needs to know queue CPU time) or provide CPU time per event (difficult)
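The queue constraint above amounts to simple arithmetic: given an estimate of CPU time per event and the CPU limit of the queue, accept only as many whole input files as fit. A minimal sketch, assuming the per-event cost and the event count of each file are known; the function name and the numbers are illustrative and not part of SAM-Grid.

```python
from typing import List, Tuple


def files_within_cpu_limit(files: List[Tuple[str, int]],
                           cpu_seconds_per_event: float,
                           queue_cpu_seconds: float) -> List[str]:
    """Select whole files, in order, whose estimated CPU cost fits the queue limit.

    `files` holds (file name, number of events) pairs. Selection stops before
    the first file that would exceed the limit, so no file is cut off midway.
    """
    selected = []
    used = 0.0
    for name, n_events in files:
        cost = n_events * cpu_seconds_per_event
        if used + cost > queue_cpu_seconds:
            break  # this file could not be finished within the queue limit
        selected.append(name)
        used += cost
    return selected


# Usage with made-up numbers: 0.5 CPU seconds per event, 1600 s queue limit.
inputs = [("a", 1000), ("b", 2000), ("c", 500)]
print(files_within_cpu_limit(inputs, 0.5, 1600))
```

In practice a safety margin on the per-event estimate would be sensible, since the slide notes that this number is difficult to provide.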


SAM-Grid Architecture


Job Description File

• executable = ./run-job.sh
• sam_dataset = jbot0g
• input_sandbox_tgz = inbox.tgz
• output_sandbox = [email protected]:/www/output.tgz
• email = [email protected]
• job_type = caf
• caf_job_type = sam
• caf_initial_section = 1
• caf_final_section = 1
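The job description file is a flat list of `key = value` entries, so it can be read with a very small parser. The sketch below is an assumption about the syntax based only on the entries shown on this slide; it is not the parser used by JIM, and the handling of blank lines and `#` comments is a guess. The sample omits the two entries whose addresses are not legible in the source.

```python
from typing import Dict


def parse_jdf(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines into a dict.

    Assumptions (not confirmed by the source): blank lines and lines starting
    with '#' are ignored, and a later duplicate key overrides an earlier one.
    """
    entries = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"line {line_no}: expected 'key = value', got {raw!r}")
        entries[key.strip()] = value.strip()
    return entries


sample = """\
executable = ./run-job.sh
sam_dataset = jbot0g
input_sandbox_tgz = inbox.tgz
job_type = caf
caf_job_type = sam
caf_initial_section = 1
caf_final_section = 1
"""
jdf = parse_jdf(sample)
print(jdf["sam_dataset"], jdf["job_type"])
```

All values are kept as strings; converting fields such as `caf_initial_section` to integers is left to the caller, because the source does not specify field types.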


Data Handling (SAM)

• If needed, SAM transfers the files to the local site

• SAM translates dataset name to list of files

• Selection can be based on physics meta-data

• File transfer and delivery is transparent for the user
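The two steps listed above, resolving a dataset to a file list by meta-data and transferring only what is missing locally, can be illustrated with a toy in-memory catalogue. Everything here is hypothetical: the catalogue layout, the meta-data keys (`trigger`, `run`) and both function names are invented for the example and do not reflect SAM's actual interface or query language.

```python
from typing import Dict, List, Set


def resolve_dataset(catalog: List[Dict[str, str]], **criteria: str) -> List[str]:
    """Return the names of catalogue files whose meta-data match every criterion."""
    return [
        entry["file_name"]
        for entry in catalog
        if all(entry.get(key) == value for key, value in criteria.items())
    ]


def files_to_transfer(needed: List[str], local_cache: Set[str]) -> List[str]:
    """Return the needed files that are not yet present at the local site."""
    return [name for name in needed if name not in local_cache]


# Toy catalogue with invented meta-data keys and values.
catalog = [
    {"file_name": "f1", "trigger": "jpsi", "run": "1"},
    {"file_name": "f2", "trigger": "jpsi", "run": "2"},
    {"file_name": "f3", "trigger": "top", "run": "2"},
]
needed = resolve_dataset(catalog, trigger="jpsi")
print(needed)
print(files_to_transfer(needed, local_cache={"f1"}))
```

Keeping the resolution step separate from the transfer step mirrors the slide: the user names a dataset, and the delivery of missing files stays transparent.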


Job Monitoring

• Monitoring via a Web page

• Job is identified by a global job ID

• Decentralized approach
  – Several independent web-servers possible


Layout of the Example (shown at Supercomputing 2002, November 2002)

• Submission via command line

• Broker to one site

• Transfer via GRAM protocol

• Local job submission by CAF

• Job monitoring via Web

• Transfer of results via kerberized rcp to FNAL web server


Grid Map

• CDF
  – Kyungpook National University, Korea
  – Rutgers State University, New Jersey, US
  – Rutherford Appleton Laboratory, UK
  – Texas Tech, Texas, US
  – University of Toronto, Canada

• DØ
  – Imperial College, London, UK
  – Michigan State University, Michigan, US
  – University of Michigan, Michigan, US
  – University of Texas at Arlington, Texas, US


Physics

Standard CDF analysis job submitted via SAM-Grid and executed somewhere

[Figure: J/ψ → µ+ µ-, with plot labels z0(µ1) and z0(µ2)]


Problems (Security)

• Firewalls (different settings at different sites)
  – Block all incoming connections to unprivileged ports
  – Cancel idle TCP/IP connections
  – Communication problems, in particular for remote execution

• Authentication (ssh, kerberos, GSI, ...)
  – FNAL allows just kerberized access

• Different and local policies

• Problem: How to write back the data?

• Grid Security Infrastructure (GSI) might help


Problems (Private Networks)

• Already problems with SAM

• Worker node can contact outside world

• Outside world cannot call back

• Problem if a long time passes between the call from the worker and the response from the outside

• IPv6 might be a solution


Outlook

• Deploy SAM-Grid to further locations

• Develop SAM-Grid towards production readiness


Summary

• Several new tools and protocols were used to form a Grid-enabled environment to do physics

• SAM-Grid is able to use Grid technology to perform real-world physics analysis
