Case Studies of Using Condor for Scientists
Barcelona, 2006

Condor Project
Computer Sciences Department
University of Wisconsin-Madison
[email protected]
http://www.cs.wisc.edu/condor



Agenda

• Extended user's tutorial

• Advanced Uses of Condor
    • Java programs
    • DAGMan
    • Stork
    • MW
    • Grid Computing

• Case studies, and a discussion of your application's needs


BLAST


Background

• Each species has a genetic encoding within its cells

• Humans are made of approximately 10^14 cells


Background

• The nucleus of each human cell contains 46 chromosomes

• Each chromosome contains between 231 and 2958 genes

• Each chromosome is made of somewhere between 25 million and 237 million (approximately) base pairs


Base Pairs (Simplified)

• Each base pair is one of 4 nucleotides

• Each nucleotide is represented by one letter:

A C G T


The Science Issue

Scientists ask many questions and pose computationally difficult issues:

• map a species' genome - build a huge database of information

• understand evolution at a genetic level - answer homology and related questions

• identify mutations and genes - to develop diagnoses and medical treatments


BLAST

• Basic Local Alignment Search Tool

• A really good pattern matching program

• An answer to the science questions often requires queries such as:
    Does the following nucleotide sequence (~1000 pairs), or something close, appear in the database (several billions of pairs)? To what certainty is there a match?
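To make the query above concrete, here is a minimal sketch of "does this sequence, or something close, appear in the database?" It is not how BLAST works internally: it simply slides the query along the database and counts mismatched letters, which would be far too slow for billions of base pairs. The function name `best_approximate_match` is made up for this illustration.

```python
from typing import Tuple


def best_approximate_match(query: str, database: str) -> Tuple[int, int]:
    """Return (position, mismatches) of the database window closest to query.

    Brute-force illustration only; BLAST uses much faster heuristics.
    """
    if not query or len(query) > len(database):
        raise ValueError("query must be non-empty and no longer than database")

    best_pos, best_mismatches = 0, len(query) + 1
    for pos in range(len(database) - len(query) + 1):
        window = database[pos:pos + len(query)]
        # Count positions where the nucleotide letters differ.
        mismatches = sum(1 for q, d in zip(query, window) if q != d)
        if mismatches < best_mismatches:
            best_pos, best_mismatches = pos, mismatches
    return best_pos, best_mismatches


if __name__ == "__main__":
    position, mismatches = best_approximate_match("ACGT", "TTACGATT")
    print(f"closest window starts at {position} with {mismatches} mismatch(es)")
```

The mismatch count is a crude stand-in for the "certainty" of a match; real tools report statistical scores instead.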


The Biological Magnetic Resonance Data Bank

Department of Biochemistry at University of Wisconsin-Madison

Part of the Center for Eukaryotic Structural Genomics (CESG)

Working on three dimensional protein structure


The BMRB and BLAST

The BMRB (with the help of the Condor Team) has a weekly set of automated BLAST runs

These BLAST runs compare progress on the BMRB set of working proteins to the Protein Data Bank


Serial versus Parallel

Too slow: The BMRB working set could be input as a single BLAST program execution

• Load the Protein Data Bank database

• Serially query the database with each protein in the working set

Faster: Divide the working set into pieces that allow parallel executions of BLAST
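Dividing the working set can be sketched as follows. This is only an illustration under a simplifying assumption: every protein takes about the same time to query, so equal-sized pieces are a reasonable split. The function name `split_working_set` and the protein identifiers are hypothetical.

```python
def split_working_set(items, pieces):
    """Split items into `pieces` contiguous chunks whose sizes differ by at most one."""
    if pieces < 1:
        raise ValueError("pieces must be at least 1")
    base, extra = divmod(len(items), pieces)
    chunks, start = [], 0
    for i in range(pieces):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    proteins = [f"protein{i}" for i in range(10)]  # hypothetical identifiers
    for number, chunk in enumerate(split_working_set(proteins, 3)):
        print(number, chunk)
```

Each chunk would then become the input of one BLAST job, so that the jobs can run at the same time on different machines.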


Weekly BMRB Runs

1. Obtain and install the BLAST executable and Protein Data Bank database

2. Decide on the best way to split the BMRB working set of proteins to minimize the parallel execution time

3. Make a custom DAG for this split

4. Produce a report on the BMRB run


The Custom DAG

[Figure: a DAG built from several B nodes (B B B . . .), several E nodes (E E . . .), and a C node]

B is BLAST

E is Extract results


An Economics Application

Computations are done at points on a coordinate plane

• Initial values are known along the axes

• Computation of one point at a time is too slow (serial execution)

• Each point is dependent on 2 neighboring points: (x,y) can be computed knowing (x-1,y) and (x,y-1)
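The dependency rule above means that all points on the same anti-diagonal (equal x + y) are independent of one another and can be computed at the same time. A minimal sketch, with a hypothetical function name `wavefronts`, groups the points of a grid into such waves:

```python
def wavefronts(width, height):
    """Group grid points (x, y), 1-based, into waves that can run in parallel.

    A point needs (x-1, y) and (x, y-1), both of which lie on the previous
    anti-diagonal, so every point with the same x + y can be computed together.
    """
    waves = []
    for total in range(2, width + height + 1):
        wave = [(x, total - x)
                for x in range(1, width + 1)
                if 1 <= total - x <= height]
        waves.append(wave)
    return waves


if __name__ == "__main__":
    for step, wave in enumerate(wavefronts(3, 3), start=1):
        print(step, wave)
```

For a 3 x 3 grid this gives 5 waves instead of 9 serial steps; in general a width-by-height grid needs width + height - 1 waves.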

The Coordinate Plane

[Figure sequence (6 slides): a grid with both axes numbered 1 to 6. Legend: known, result, inputs ready.]

The DAG

[Figure: DAG nodes, one level per row]

1-1

1-2  2-1

1-3  2-2  3-1

1-4  2-3  3-2  4-1

etc.


Use DAGMan

Write a program to generate the DAG input file

The submit description file (and the executable) is the same for each node in the DAG
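A generator program of this kind might look like the sketch below. It is an assumption-laden illustration, not the program used for the original application: the function name `generate_dag` and the grid size are made up, and the `Vars` lines are emitted only for nodes that have both a left and a below neighbor, because the slides do not show how nodes next to the axes receive their known initial values.

```python
def generate_dag(size, submit_file="gonkulate.submit"):
    """Return the lines of a DAGMan input file for a size x size grid of nodes."""
    lines = []
    for x in range(1, size + 1):
        for y in range(1, size + 1):
            node = f"{x}-{y}"
            lines.append(f"Job {node} {submit_file}")

            # Parents were already declared: (x-1, y) in an earlier outer
            # iteration, (x, y-1) in an earlier inner iteration.
            parents = []
            if x > 1:
                parents.append(f"{x - 1}-{y}")
            if y > 1:
                parents.append(f"{x}-{y - 1}")
            if parents:
                lines.append(f"Parent {' '.join(parents)} Child {node}")

            # Assumption: only interior nodes get file-name macros here.
            if x > 1 and y > 1:
                lines.append(f'Vars {node} left="file{x - 1}-{y}"')
                lines.append(f'Vars {node} below="file{x}-{y - 1}"')
                lines.append(f'Vars {node} result="file{x}-{y}"')
    return lines


if __name__ == "__main__":
    print("\n".join(generate_dag(3)))
```

Redirecting the output to a file gives something that could be handed to condor_submit_dag, once the boundary nodes are handled in whatever way the application requires.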


DAG Input File

    Job 1-1 gonkulate.submit
    Job 1-2 gonkulate.submit
    Parent 1-1 Child 1-2
    Job 2-1 gonkulate.submit
    Parent 1-1 Child 2-1
    Job 1-3 gonkulate.submit
    Parent 1-2 Child 1-3
    Job 2-2 gonkulate.submit
    Parent 1-2 2-1 Child 2-2
    Vars 2-2 left="file1-2"
    Vars 2-2 below="file2-1"
    Vars 2-2 result="file2-2"
    . . .

DAG input file, continued

    Job 3-4 gonkulate.submit
    Parent 2-4 3-3 Child 3-4
    Vars 3-4 left="file2-4"
    Vars 3-4 below="file3-3"
    Vars 3-4 result="file3-4"
    . . .


Submit Description File

In gonkulate.submit:

    universe = vanilla
    executable = gonkulate
    output = $(result)
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT
    transfer_input_files = $(left), $(below)
    log = gonkulate.log
    notification = Never
    queue


Nug30


Description of Nug30

• nug30 (a Quadratic Assignment Problem instance of size 30) had been the "holy grail" of computational QAP research since 1968

• In 2000, Anstreicher, Brixius, Goux, & Linderoth set out to solve this problem

• Using a mathematically sophisticated and well-engineered algorithm, they still estimated that solving the problem would require 11 CPU years


Nugent’s Problem

There is a set of N locations and a set of N facilities, and each facility must be assigned to a location. To measure the cost of each possible assignment, the flow between each pair of facilities is multiplied by the distance between the pair's assigned locations, and then a sum is taken over all of the pairs.

For Nug30, N = 30


QAP Definition*

The formal definition of the quadratic assignment problem is:

Given two sets, P ("facilities") and L ("locations"), of equal size, together with a weight function $w : P \times P \to \mathbb{R}$ and a distance function $d : L \times L \to \mathbb{R}$, find the bijection $f : P \to L$ ("assignment") such that the cost function

$$\sum_{a,b \in P} w(a,b) \cdot d(f(a), f(b))$$

is minimized.

Usually the weight and distance functions are viewed as square real-valued matrices.

* Wikipedia
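With the weight and distance functions stored as square matrices, the cost of one assignment can be computed directly from the definition. The sketch below assumes facilities and locations are numbered from 0, and that `assignment[a]` is the location given to facility `a`; the function name `qap_cost` and the small matrices are invented for illustration.

```python
def qap_cost(weights, distances, assignment):
    """Sum of w(a, b) * d(f(a), f(b)) over all ordered pairs of facilities."""
    n = len(assignment)
    return sum(
        weights[a][b] * distances[assignment[a]][assignment[b]]
        for a in range(n)
        for b in range(n)
    )


if __name__ == "__main__":
    # Hypothetical 3-facility instance (symmetric flows and distances).
    weights = [[0, 5, 2], [5, 0, 3], [2, 3, 0]]
    distances = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]
    print(qap_cost(weights, distances, [0, 1, 2]))
    print(qap_cost(weights, distances, [0, 2, 1]))
```

Because the sum runs over ordered pairs, each unordered pair of facilities is counted twice when the matrices are symmetric; this scales every cost by the same factor and does not change which assignment is best.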


Scope of the Problem

This QAP problem is difficult due to the excessively large number of possible facility assignments.

The number of possible assignments is factorial in the number of facilities.

N! = N x (N-1) x (N-2) x . . . x 2

30! is approximately 2.65 x 10^32
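The size of 30! can be checked with a few lines of Python. The enumeration rate used below (one billion assignments per second) is a purely hypothetical figure chosen to show why exhaustive search is out of the question.

```python
import math

assignments = math.factorial(30)
print(f"30! = {assignments:.3e}")

# Hypothetical rate: one billion assignments evaluated per second.
RATE_PER_SECOND = 10**9
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

years = assignments / (RATE_PER_SECOND * SECONDS_PER_YEAR)
print(f"Exhaustive enumeration at that rate: about {years:.1e} years")
```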


The Simplified Approach

• Method of choice is branch and bound

• The complete tree has 30! leaves

• Branching grows the tree

• Bounding results in pruning the tree
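A toy version of branch and bound for a small QAP instance is sketched below. It does not use the quadratic programming bound from the Nug30 work. Instead it uses the simplest possible bound, the cost of the facilities placed so far, which is a valid lower bound only under the assumption that all weights and distances are non-negative. The function name and the sample matrices are hypothetical.

```python
def solve_qap_branch_and_bound(weights, distances):
    """Return (best_cost, best_assignment) for a small QAP instance.

    Assumes non-negative weights and distances, so the cost of a partial
    assignment never exceeds the cost of any completion of it.
    """
    n = len(weights)
    best = {"cost": float("inf"), "assignment": None}

    def partial_cost(assignment):
        k = len(assignment)
        return sum(
            weights[a][b] * distances[assignment[a]][assignment[b]]
            for a in range(k)
            for b in range(k)
        )

    def branch(assignment, used):
        cost = partial_cost(assignment)
        if cost >= best["cost"]:
            return  # Bounding: this subtree cannot beat the best known solution.
        if len(assignment) == n:
            best["cost"], best["assignment"] = cost, list(assignment)
            return
        # Branching: try every unused location for the next facility.
        for location in range(n):
            if location not in used:
                branch(assignment + [location], used | {location})

    branch([], frozenset())
    return best["cost"], best["assignment"]


if __name__ == "__main__":
    weights = [[0, 5, 2], [5, 0, 3], [2, 3, 0]]
    distances = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]
    print(solve_qap_branch_and_bound(weights, distances))
```

A tighter bound prunes more of the tree, which is why the quality of the bound matters so much for an instance as large as Nug30.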


The Nug30 Solution

• Used a new algorithm called the quadratic programming bound, developed by Anstreicher and Brixius

• Sequential execution would have taken 7 years, so parallelization of the algorithm was important

• Used MW


Nug30 Computational Grid

Number  Arch/OS        Location
   414  Intel/Linux    Argonne
    96  SGI/Irix       Argonne
  1024  SGI/Irix       NCSA
    16  Intel/Linux    NCSA
    45  SGI/Irix       NCSA
   246  Intel/Linux    Wisconsin
   146  Intel/Solaris  Wisconsin
   133  Sun/Solaris    Wisconsin
   190  Intel/Linux    Georgia Tech
    94  Intel/Solaris  Georgia Tech
    54  Intel/Linux    Italy (INFN)
    25  Intel/Linux    New Mexico
    12  Sun/Solaris    Northwestern
     5  Intel/Linux    Columbia U.
    10  Sun/Solaris    Columbia U.
  2510  CPUs total

Used tricks to make it look like one Condor pool:

• Flocking

• Glidein


Workers Over Time


Nug30 solved

Wall Clock Time      6 days, 22:04:31
Avg # Machines       653
CPU Time             11 years
Parallel Efficiency  93%


The Football Pool Problem


Win By Gambling

Each week, 6 games are played

The outcome of each game is

1. win

2. lose

3. tie


Bet, and win $$$

• Get 5 of the 6 games correctly predicted, and you win

• What is the minimum number of predictions you must make to guarantee winning?


Known Values

number of games   minimum predictions
3                 5
4                 9
5                 27
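The 4-game entry can be checked by brute force. The sketch below assumes the rule generalizes as "at most one wrong prediction wins", which matches 5 of 6 for six games. The 9 tickets are built from a standard construction, the ternary Hamming code of length 4 with codewords (a, b, a+b, a+2b) mod 3; the slides do not say which 9 tickets were meant, and this check shows only that 9 tickets suffice, not that fewer cannot.

```python
from itertools import product


def covers_all_outcomes(tickets, games):
    """True if every possible outcome differs from some ticket in at most one game."""
    for outcome in product(range(3), repeat=games):
        covered = any(
            sum(o != t for o, t in zip(outcome, ticket)) <= 1
            for ticket in tickets
        )
        if not covered:
            return False
    return True


# Outcomes are coded 0, 1, 2 (for example win, lose, tie).
tickets = [(a, b, (a + b) % 3, (a + 2 * b) % 3)
           for a in range(3) for b in range(3)]

if __name__ == "__main__":
    print(len(tickets), "tickets cover all outcomes:",
          covers_all_outcomes(tickets, 4))
```

The same exhaustive check is cheap for 6 games (729 outcomes), but searching over all possible sets of tickets is what makes the problem hard.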


Problem Description

• A covering code

• An NP Hard problem

• Many years of research and effort for 6 games leads to
    65 < minimum number of predictions < 73

• An integer programming problem

• Best solver is the commercial application CPLEX


Why the Problem is Difficult

• Number of tickets possible: 6! x 3^6

• The tree that represents the problem (and solutions) has many isomorphic branches. This makes it difficult to prune the tree.

• New techniques have been developed, which narrow the interval that contains the solution

• The latest and greatest approach solves many smaller problems using MW


Solution! Not yet. . .

• The first effort (many CPU years' worth of time) had a very small error in input

• Second effort is still in progress

• All this to improve the lower bound from 65 to 70, thereby reducing the range for the solution