Stork: An Introduction
Condor Week 2006, Milan
Condor Project, Computer Sciences Department, University of Wisconsin-Madison
http://www.cs.wisc.edu/condor


Two Main Ideas

•Make data transfers a “first class citizen” in Condor

•Reuse items in the Condor toolbox


The tools

•ClassAds

•Matchmaking

•DAGMan


The data transfer problem

•Process large data sets at sites on grid. For each data set:

o stage in data from remote server

o run CPU data processing job

o stage out data to remote server


Simple Data Transfer Job

#!/bin/sh
globus-url-copy source dest

Often works fine for short, simple data transfers, but…


What can go wrong?

•Too many transfers at one time

•Service down; need to try later

•Service down; need to try alternate data source

•Partial transfers

•Time out; not worth waiting anymore
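
Three of the failure modes listed above (service down and retry later, service down and try an alternate source, time out) can be sketched in a few lines. This is only an illustration of the bookkeeping a plain shell script lacks, not how Stork itself is implemented. The function name, the `copy` callback, and the mirror URL are hypothetical; throttling and partial-transfer cleanup are not covered.

```python
import time
from typing import Callable, Optional, Sequence


def transfer_with_retries(
    sources: Sequence[str],
    dest: str,
    copy: Callable[[str, str], bool],
    max_attempts: int = 3,
    deadline_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Try every source in turn, pausing between rounds, until the deadline.

    `copy` is a hypothetical callback that performs one transfer and
    returns True on success. Returns the source that worked, or None.
    """
    start = clock()
    for attempt in range(max_attempts):
        for src in sources:
            # Time out: not worth waiting anymore.
            if clock() - start > deadline_seconds:
                return None
            if copy(src, dest):
                return src
        # Every source failed in this round: back off, then try again later.
        if attempt + 1 < max_attempts:
            sleep(2 ** attempt)
    return None


# Illustration only: a fake copy in which the primary server is down
# and a hypothetical mirror is reachable.
def fake_copy(src: str, dest: str) -> bool:
    return src == "gsiftp://mirror/path"


result = transfer_with_retries(
    ["gsiftp://server/path", "gsiftp://mirror/path"],
    "file:///dir/file",
    copy=fake_copy,
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(result)
```

Even this small sketch needs attempt counting, a deadline, and a list of alternates, which is the kind of policy the following slides move out of the user's script and into a scheduler.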


Stork

•What Schedd is to CPU jobs, Stork is to data placement jobs.

o Job queue

o Flow control

o Failure-handling policies

o Event log


Supported Data Transfers

•local file system

•GridFTP

•FTP

•HTTP

•SRB

• NeST

• SRM

• other protocols via simple plugin


Stork Commands

stork_submit - submit a job
stork_q - list the job queue
stork_status - show completion status
stork_rm - cancel a job


Creating a Submit Description File

• A plain ASCII text file

• Tells Stork about your job:

o source/destination

o alternate protocols

o proxy location

o debugging logs

o command-line arguments


Simple Submit File

// c++ style comment lines
[
    dap_type = "transfer";
    src_url = "gsiftp://server/path";
    dest_url = "file:///dir/file";
    x509proxy = "default";
    log = "stage-in.out.log";
    output = "stage-in.out.out";
    err = "stage-in.out.err";
]

Note: different format from Condor submit files
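
When many transfers differ only in their URLs, the submit description can be generated rather than written by hand. The sketch below writes the bracketed `name = "value";` layout shown in the slide. The helper name `render_submit_description` is hypothetical, it handles string values only, and it refuses values containing a double quote instead of guessing at an escaping rule.

```python
def render_submit_description(attrs: dict) -> str:
    """Render attributes in the bracketed layout used by the slide's example.

    Hypothetical helper: string values only, no escaping support.
    """
    lines = ["["]
    for key, value in attrs.items():
        text = str(value)
        if '"' in text:
            # Escaping rules are not covered by the slides, so refuse.
            raise ValueError(f"unsupported character in value for {key!r}")
        lines.append(f'    {key} = "{text}";')
    lines.append("]")
    return "\n".join(lines) + "\n"


# The same attributes as the slide's stage-in example.
stage_in = {
    "dap_type": "transfer",
    "src_url": "gsiftp://server/path",
    "dest_url": "file:///dir/file",
    "x509proxy": "default",
    "log": "stage-in.out.log",
    "output": "stage-in.out.out",
    "err": "stage-in.out.err",
}

print(render_submit_description(stage_in))
```

The printed text could be saved to a file such as `stage-in.stork` and handed to `stork_submit`.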


Sample stork_submit

# stork_submit stage-in.stork
using default proxy: /tmp/x509up_u19100
================
Sending request:
    [
        dest_url = "file:///dir/file";
        src_url = "gsiftp://server/path";
        err = "path/stage-in.out.err";
        output = "path/stage-in.out.out";
        dap_type = "transfer";
        log = "path/stage-in.out.log";
        x509proxy = "default"
    ]
================

Request assigned id: 1    <- returned job id
#
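
A wrapper script usually needs the returned job id for later `stork_status` or `stork_rm` calls. A minimal sketch of pulling it out of the captured output follows. It assumes the output contains a line of the form `Request assigned id: N`, as in the sample; other versions may print something different, and the function name is hypothetical.

```python
import re
from typing import Optional

# Assumes the "Request assigned id: N" line seen in the sample output.
_ID_PATTERN = re.compile(r"^Request assigned id:\s*(\d+)", re.MULTILINE)


def parse_job_id(output: str) -> Optional[int]:
    """Return the job id found in stork_submit output, or None if absent."""
    match = _ID_PATTERN.search(output)
    return int(match.group(1)) if match else None


# Abbreviated copy of the sample output; the request body is elided.
sample = """using default proxy: /tmp/x509up_u19100
================
Sending request: ...
================

Request assigned id: 1
"""

print(parse_job_id(sample))
```

In practice the text would come from running `stork_submit` as a subprocess and capturing its standard output.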


Sample Stork User Log

000 (001.-01.-01) 04/17 19:30:00 Job submitted from host: <128.105.121.53:54027>
...
001 (001.-01.-01) 04/17 19:30:01 Job executing on host: <128.105.121.53:9621>
...
008 (001.-01.-01) 04/17 19:30:01 job type: transfer
...
008 (001.-01.-01) 04/17 19:30:01 src_url: gsiftp://server/path
...
008 (001.-01.-01) 04/17 19:30:01 dest_url: file:///dir/file
...
005 (001.-01.-01) 04/17 19:30:02 Job terminated.
    (1) Normal termination (return value 0)
        Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
        Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
    0 - Run Bytes Sent By Job
    0 - Run Bytes Received By Job
    0 - Total Bytes Sent By Job
    0 - Total Bytes Received By Job
...
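
Because the user log is line oriented, a short script can follow a transfer's progress. The sketch below assumes the layout seen in the sample: each event starts with a header line (three-digit code, job id in parentheses, date, time, message), may carry indented detail lines, and ends with a line containing only `...`. The `Event` type and `parse_user_log` function are hypothetical names, and real logs may contain forms this does not handle.

```python
import re
from typing import List, NamedTuple

# Header layout taken from the sample log; an assumption for other logs.
HEADER = re.compile(
    r"^(?P<code>\d{3}) \((?P<cluster>\d+)\.(?P<proc>-?\d+)\.(?P<subproc>-?\d+)\) "
    r"(?P<date>\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<message>.*)$"
)


class Event(NamedTuple):
    code: str
    cluster: int
    date: str
    time: str
    message: str
    details: List[str]


def parse_user_log(text: str) -> List[Event]:
    """Split a user log into events separated by '...' lines."""
    events: List[Event] = []
    block: List[str] = []
    # The extra "..." flushes a final event that lacks a terminator.
    for line in text.splitlines() + ["..."]:
        if not line.strip():
            continue
        if line.strip() != "...":
            block.append(line)
            continue
        if block:
            m = HEADER.match(block[0])
            if m:
                events.append(Event(
                    m["code"], int(m["cluster"]), m["date"], m["time"],
                    m["message"], [d.strip() for d in block[1:]],
                ))
        block = []
    return events


sample_log = """000 (001.-01.-01) 04/17 19:30:00 Job submitted from host: <128.105.121.53:54027>
...
001 (001.-01.-01) 04/17 19:30:01 Job executing on host: <128.105.121.53:9621>
...
008 (001.-01.-01) 04/17 19:30:01 src_url: gsiftp://server/path
...
005 (001.-01.-01) 04/17 19:30:02 Job terminated.
    (1) Normal termination (return value 0)
...
"""

events = parse_user_log(sample_log)
for event in events:
    print(event.code, event.time, event.message)
```

In the sample, code 005 accompanies "Job terminated.", so a monitor could poll the log until an event with that code appears for its job id.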


Who needs Stork?

SRM exists. It provides a job queue, logging, etc.

Why not use that?


Use whatever makes sense!

•Another way to view Stork:

•Glue between DAGMan and data transport or transport scheduler.

•So one DAG can describe a workflow, including both data movement and computation steps.


Stork jobs in a DAG

• A DAG is defined by a text file, listing each job and its dependents:

# data-process.dag
Data IN in.stork
Job CRUNCH crunch.condor
Data OUT out.stork
Parent IN Child CRUNCH
Parent CRUNCH Child OUT

• each node will run the Condor or Stork job specified by accompanying submit file

IN -> CRUNCH -> OUT
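
To make the dependency structure concrete, the sketch below reads the three line forms used in the example (`Data`, `Job`, and `Parent ... Child ...`) and computes an order in which the nodes could run. It is a toy reader, not DAGMan's parser: it ignores every other DAG file feature, it treats keywords as case-insensitive as an assumption, and it relies on Python 3.9+ for `graphlib`. The name `parse_dag` is hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple


def parse_dag(text: str) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Return (nodes, run_order) for the line forms shown in the slide.

    nodes maps a node name to (kind, submit_file), where kind is
    "job" (Condor) or "data" (Stork).
    """
    nodes: Dict[str, Tuple[str, str]] = {}
    sorter: TopologicalSorter = TopologicalSorter()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        words = line.split()
        keyword = words[0].lower()
        if keyword in ("job", "data"):
            name, submit_file = words[1], words[2]
            nodes[name] = (keyword, submit_file)
            sorter.add(name)
        elif keyword == "parent":
            split = [w.lower() for w in words].index("child")
            parents, children = words[1:split], words[split + 1:]
            for child in children:
                # A child may only run after all of its parents.
                sorter.add(child, *parents)
    return nodes, list(sorter.static_order())


DAG_TEXT = """# data-process.dag
Data IN in.stork
Job CRUNCH crunch.condor
Data OUT out.stork
Parent IN Child CRUNCH
Parent CRUNCH Child OUT
"""

nodes, order = parse_dag(DAG_TEXT)
for name in order:
    kind, submit_file = nodes[name]
    print(name, kind, submit_file)
```

For the example file the order is IN, CRUNCH, OUT: stage in with Stork, compute with Condor, stage out with Stork.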


Important Stork Parameters

•STORK_MAX_NUM_JOBS limits number of active jobs

•STORK_MAX_RETRY limits job attempts, before job marked as failed

•STORK_MAXDELAY_INMINUTES specifies “hung job” threshold
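
The three limits can be read as three small decisions a scheduler makes about each job. The sketch below spells those decisions out under assumed semantics: the field names merely mirror the parameter names, the numeric values are made up for illustration, and whether Stork compares with "less than" or "at most" is not stated in the slides.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    max_num_jobs: int        # mirrors STORK_MAX_NUM_JOBS
    max_retry: int           # mirrors STORK_MAX_RETRY
    maxdelay_minutes: float  # mirrors STORK_MAXDELAY_INMINUTES


def may_start(active_jobs: int, limits: Limits) -> bool:
    """Flow control: start another job only while below the active limit."""
    return active_jobs < limits.max_num_jobs


def after_failure(attempts_so_far: int, limits: Limits) -> str:
    """Failure policy: retry until the attempt limit, then mark as failed."""
    return "retry" if attempts_so_far < limits.max_retry else "failed"


def is_hung(running_minutes: float, limits: Limits) -> bool:
    """Hung-job threshold: a job running longer than this is treated as stuck."""
    return running_minutes > limits.maxdelay_minutes


# Illustrative values only; they are not defaults from the slides.
limits = Limits(max_num_jobs=2, max_retry=3, maxdelay_minutes=10)

print(may_start(1, limits), may_start(2, limits))
print(after_failure(2, limits), after_failure(3, limits))
print(is_hung(11, limits))
```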


Features in Development

Matchmaking

o Job ClassAd with site ClassAd

o Global max transfers per site limits

o Load balancing across sites

o Dynamic reconfiguration of sites

o Coordination of multiple instances of Stork

Working prototype developed with Globus gridftp team
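
The idea of matching a transfer to a site while respecting per-site limits and balancing load can be illustrated with a toy matcher. It is not ClassAd matchmaking and not the prototype mentioned above: real ClassAds evaluate arbitrary requirement expressions, whereas this sketch checks one protocol and one counter. The `Site` type, the site names, and the selection rule (lowest fraction of capacity in use) are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    name: str
    protocols: frozenset
    max_transfers: int  # per-site limit on simultaneous transfers
    active: int = 0


def match(protocol: str, sites: List[Site]) -> Optional[Site]:
    """Pick the least-loaded site that speaks the protocol and has capacity."""
    candidates = [
        s for s in sites
        if protocol in s.protocols and s.active < s.max_transfers
    ]
    if not candidates:
        return None  # the job waits in the queue until a slot frees up
    chosen = min(candidates, key=lambda s: s.active / s.max_transfers)
    chosen.active += 1
    return chosen


# Hypothetical sites for illustration.
sites = [
    Site("site-a", frozenset({"gsiftp"}), max_transfers=2),
    Site("site-b", frozenset({"gsiftp", "http"}), max_transfers=1),
]

assigned = []
for _ in range(4):
    site = match("gsiftp", sites)
    assigned.append(site.name if site else None)

print(assigned)
```

With these made-up limits, the fourth request finds no free slot and gets `None`, which is the point at which a scheduler would hold the job rather than overload a site.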


Further Ahead

•Automatic startup of personal stork server on demand

•Fair sharing between users

•Fit into new pluggable scheduling framework à la schedd-on-the-side


Summary

•Stork manages a job queue for data transfers

•A DAG may describe a workflow containing both data movement and processing steps.


Additional Resources

•http://www.cs.wisc.edu/condor/stork/

•Condor Manual, Stork sections

[email protected] list

[email protected] list