Stork: An Introduction
Condor Week 2006, Milan


Condor Project
Computer Sciences Department
University of Wisconsin-Madison
http://www.cs.wisc.edu/condor

Two Main Ideas

• Make data transfers a “first class citizen” in Condor

• Reuse items in the Condor toolbox

The tools

• ClassAds

• Matchmaking

• DAGMan

The data transfer problem

• Process large data sets at sites on the grid. For each data set:

o stage in data from a remote server

o run a CPU data processing job

o stage out data to a remote server

Simple Data Transfer Job

#!/bin/sh
globus-url-copy source dest

Often works fine for short, simple data transfers, but…

What can go wrong?

• Too many transfers at one time

• Service down; need to try later

• Service down; need to try an alternate data source

• Partial transfers

• Time out; not worth waiting anymore
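Stork exists to automate exactly these cases. To see why they are painful to handle by hand, here is a rough Python sketch of the retry-and-failover policy; the `transfer` function, its parameters, and the backoff values are my own illustration, not Stork's code:

```python
import subprocess
import time

def transfer(src_urls, dest, copy_cmd=("globus-url-copy",),
             max_retry=3, retry_delay=1.0, timeout=600):
    """Try each alternate source per round, for up to max_retry rounds.

    Illustrative only: covers retries, alternate sources, and hung-transfer
    timeouts from the list above. Partial transfers and global throttling
    ("too many transfers at one time") need a real scheduler like Stork.
    """
    for _ in range(max_retry):
        for src in src_urls:
            try:
                result = subprocess.run([*copy_cmd, src, dest], timeout=timeout)
                if result.returncode == 0:
                    return True          # transfer succeeded
            except (subprocess.TimeoutExpired, FileNotFoundError):
                pass                     # hung (or missing) tool: try next source
        time.sleep(retry_delay)          # back off before the next round
    return False                         # give up: all attempts exhausted
```

Even this toy version needs a queue, per-site limits, and logging before it is usable at scale, which is the gap Stork fills.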

Stork

• What the schedd is to CPU jobs, Stork is to data placement jobs:

o Job queue

o Flow control

o Failure-handling policies

o Event log

Supported Data Transfers

• local file system

• GridFTP

• FTP

• HTTP

• SRB

• NeST

• SRM

• other protocols via simple plugin

Stork Commands

stork_submit - submit a job
stork_q - list the job queue
stork_status - show completion status
stork_rm - cancel a job

Creating a Submit Description File

• A plain ASCII text file

• Tells Stork about your job:

o source/destination

o alternate protocols

o proxy location

o debugging logs

o command-line arguments

Simple Submit File

// c++ style comment lines
[
  dap_type = "transfer";
  src_url = "gsiftp://server/path";
  dest_url = "file:///dir/file";
  x509proxy = "default";
  log = "stage-in.out.log";
  output = "stage-in.out.out";
  err = "stage-in.out.err";
]

Note: different format from Condor submit files
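As a quick illustration of the bracketed ClassAd-style format, a few lines of Python can emit a submit description like the one above; the `stork_submit_description` helper is my own sketch, not part of Stork:

```python
def stork_submit_description(job):
    """Render a dict as a bracketed ClassAd-style submit description.

    Illustrative helper only: the output format mirrors the example
    submit file above, but this function is not part of Stork itself.
    """
    body = "\n".join(f'  {key} = "{value}";' for key, value in job.items())
    return "[\n" + body + "\n]"

# A stage-in job like the one in the example submit file.
stage_in = stork_submit_description({
    "dap_type": "transfer",
    "src_url": "gsiftp://server/path",
    "dest_url": "file:///dir/file",
    "x509proxy": "default",
    "log": "stage-in.out.log",
    "output": "stage-in.out.out",
    "err": "stage-in.out.err",
})
print(stage_in)
```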

Sample stork_submit

# stork_submit stage-in.stork
using default proxy: /tmp/x509up_u19100
================
Sending request:
[
  dest_url = "file:///dir/file";
  src_url = "gsiftp://server/path";
  err = "path/stage-in.out.err";
  output = "path/stage-in.out.out";
  dap_type = "transfer";
  log = "path/stage-in.out.log";
  x509proxy = "default"
]
================
Request assigned id: 1    (the returned job id)
#

Sample Stork User Log

000 (001.-01.-01) 04/17 19:30:00 Job submitted from host: <128.105.121.53:54027>
...
001 (001.-01.-01) 04/17 19:30:01 Job executing on host: <128.105.121.53:9621>
...
008 (001.-01.-01) 04/17 19:30:01 job type: transfer
...
008 (001.-01.-01) 04/17 19:30:01 src_url: gsiftp://server/path
...
008 (001.-01.-01) 04/17 19:30:01 dest_url: file:///dir/file
...
005 (001.-01.-01) 04/17 19:30:02 Job terminated.
        (1) Normal termination (return value 0)
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
        0  -  Total Bytes Sent By Job
        0  -  Total Bytes Received By Job
...

Who needs Stork?

SRM exists. It provides a job queue, logging, etc.

Why not use that?

Use whatever makes sense!

• Another way to view Stork: glue between DAGMan and a data transport tool or transport scheduler.

• One DAG can then describe a workflow including both data movement and computation steps.

Stork jobs in a DAG

• A DAG is defined by a text file listing each job and its dependents:

# data-process.dag
Data IN in.stork
Job CRUNCH crunch.condor
Data OUT out.stork
Parent IN Child CRUNCH
Parent CRUNCH Child OUT

• Each node runs the Condor or Stork job specified by its accompanying submit file.

(Diagram: IN → CRUNCH → OUT)
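The DAG file above can also be generated rather than written by hand. The node and file names below come from the slide's example; the generator script itself is only a sketch, and the written file would then be handed to DAGMan's condor_submit_dag:

```python
# Assemble the three-node workflow from the slide: IN -> CRUNCH -> OUT.
# "Data" nodes are Stork data placement jobs; "Job" nodes are Condor jobs.
nodes = [
    ("Data", "IN", "in.stork"),
    ("Job", "CRUNCH", "crunch.condor"),
    ("Data", "OUT", "out.stork"),
]
edges = [("IN", "CRUNCH"), ("CRUNCH", "OUT")]

lines = ["# data-process.dag"]
lines += [f"{kind} {name} {submit_file}" for kind, name, submit_file in nodes]
lines += [f"Parent {parent} Child {child}" for parent, child in edges]
dag_text = "\n".join(lines) + "\n"

with open("data-process.dag", "w") as out:
    out.write(dag_text)
```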

Important Stork Parameters

• STORK_MAX_NUM_JOBS limits the number of active jobs

• STORK_MAX_RETRY limits job attempts before the job is marked as failed

• STORK_MAXDELAY_INMINUTES specifies the “hung job” threshold
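In a Condor-style configuration file these appear as plain macros; the values below are purely illustrative, not recommended defaults:

```
# Illustrative Stork settings (example values, not recommendations)
STORK_MAX_NUM_JOBS = 10         # at most 10 transfers active at once
STORK_MAX_RETRY = 5             # mark a job failed after 5 attempts
STORK_MAXDELAY_INMINUTES = 10   # consider a transfer hung after 10 minutes
```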

Features in Development

• Matchmaking:
  o Job ClassAd with site ClassAd
  o Global max-transfers-per-site limits
  o Load balancing across sites
  o Dynamic reconfiguration of sites
  o Coordination of multiple instances of Stork

• Working prototype developed with the Globus GridFTP team

Further Ahead

• Automatic startup of a personal Stork server on demand

• Fair sharing between users

• Fit into the new pluggable scheduling framework, a la schedd-on-the-side

Summary

• Stork manages a job queue for data transfers

• A DAG may describe a workflow containing both data movement and processing steps

Additional Resources

• http://www.cs.wisc.edu/condor/stork/

• Condor Manual, Stork sections

• stork-announce@cs.wisc.edu list

• stork-discuss@cs.wisc.edu list
