
Using Abstractions to Scale Up Applications to Campus Grids Douglas Thain University of Notre Dame 28 April 2009


Page 2:

Outline

What is a Campus Grid?

Challenges of Using Campus Grids.

Solution: Abstractions

Examples and Applications
– All-Pairs: Biometrics
– Wavefront: Economics
– Assembly: Genomics

Combining Abstractions Together

Page 3:

What is a Campus Grid?

A campus grid is an aggregation of all available computing power found in an institution:
– Idle cycles from desktop machines.
– Unused cycles from dedicated clusters.

Examples of campus grids:
– 600 CPUs at the University of Notre Dame
– 2000 CPUs at the University of Wisconsin
– 13,000 CPUs at Purdue University

Page 4:

Provides robust batch queueing on a complex distributed system.

Resource owners control consumption:
– “Only run jobs on this machine at night.”
– “Prefer biology jobs over physics jobs.”

End users express needs:
– “Only run this job where RAM>2GB”
– “Prefer to run on machines

http://www.cs.wisc.edu/condor

Page 10:

The Assembly Language of Campus Grids

User Interface:
– N x { run program X with files F and G }

System Properties:
– Wildly varying resource availability.
– Heterogeneous resources.
– Unpredictable preemption.

Effect on Applications:
– Jobs can’t run for too long...
– But, they can’t run too quickly, either!
– Use file I/O for inter-process communication.
– Bad choices cause chaos on the network and heartburn for system administrators.

Page 11:

I have 10,000 iris images acquired in my research lab. I want to reduce each one to a feature space, and then compare all of them to each other. I want to spend my time doing science, not struggling with computers.

I have a laptop.

I own a few machines.

I can get cycles from ND and Purdue.

Now What?

Page 12:

Observation

In a given field of study, a single person may repeat the same pattern of work many times, making slight changes to the data and algorithms.

If we knew in advance the intended pattern, then we could do a better job of mapping a complex application to a complex system.

Page 13:

Abstractions for Distributed Computing

Abstraction: a declarative specification of the computation and data of a workload.

A restricted pattern, not meant to be a general purpose programming language.

Uses data structures instead of files.

Provides users with a bright path.

Regular structure makes it tractable to model and predict performance.

Page 14:

All-Pairs Abstraction

AllPairs( set A, set B, function F )

returns matrix M where

M[i][j] = F( A[i], B[j] ) for all i,j

[Figure: sets A1…An and B1…Bn feed AllPairs(A,B,F), which applies F to each pair (A1–A3 against B1–B3 shown) to fill a matrix.]

Moretti, Bulosan, Flynn, Thain, AllPairs: An Abstraction… IPDPS 2008

allpairs A B F.exe
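
The definition above can be read as a plain sequential reference. What follows is a minimal Python sketch, assuming in-memory lists instead of files and a hypothetical comparison function `similarity` (neither comes from the talk). The real tool distributes this work across the grid rather than looping on one machine.

```python
from typing import Callable, List, Sequence, TypeVar

A = TypeVar("A")
B = TypeVar("B")
R = TypeVar("R")


def all_pairs(a: Sequence[A], b: Sequence[B], f: Callable[[A, B], R]) -> List[List[R]]:
    """Sequential reference: M[i][j] = F(A[i], B[j]) for all i, j."""
    return [[f(x, y) for y in b] for x in a]


def similarity(x: str, y: str) -> float:
    """Hypothetical stand-in for F: fraction of positions where two strings agree."""
    if not x and not y:
        return 1.0
    matches = sum(1 for p, q in zip(x, y) if p == q)
    return matches / max(len(x), len(y))


if __name__ == "__main__":
    # Two items in A, three in B, so M has 2 rows and 3 columns.
    m = all_pairs(["abcd", "abcf"], ["abcd", "zzzz", "abff"], similarity)
    for row in m:
        print(row)
```

Because every M[i][j] depends only on A[i] and B[j], the cells can be computed in any order and on any machine, which is what makes the pattern easy to distribute.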

Page 15:

Example Application

Goal: Design robust face comparison function.

[Figure: F applied to two pairs of images, producing scores 0.05 and 0.97.]

Page 16:

Similarity Matrix Construction

1   .8  .1  0   0   .1
    1   0   .1  .1  0
        1   0   .1  .3
            1   0   0
                1   .1
                    1

[Figure: F fills in each entry of the matrix.]

Current Workload:
4000 images
256 KB each
10s per F
(five days)

Future Workload:
60000 images
1MB each
1s per F
(three months)

Page 17:

Non-Expert User on a Campus Grid

Try 1: Each F is a batch job.
Failure: Dispatch latency >> F runtime.

Try 2: Each row is a batch job.
Failure: Too many small ops on FS.

Try 3: Bundle all files into one package.
Failure: Everyone loads 1GB at once.

Try 4: User gives up and attempts to solve an easier or smaller problem.

[Figure: for each attempt, HN dispatching F jobs to CPUs.]

Page 18:

All-Pairs Abstraction

AllPairs( set A, set B, function F )

returns matrix M where

M[i][j] = F( A[i], B[j] ) for all i,j

[Figure: sets A1…An and B1…Bn feed AllPairs(A,B,F), which applies F to each pair (A1–A3 against B1–B3 shown) to fill a matrix.]

Page 19:

% allpairs compare.exe adir bdir

Page 21:

Distribute Data Via Spanning Tree

Page 23:

An Interesting Twist

Send the absolute minimum amount of data needed to each of N nodes from a central server:
– Each job must run on exactly 1 node.
– Data distribution time: O( D sqrt(N) )

Send all data to all N nodes via spanning tree distribution:
– Any job can run on any node.
– Data distribution time: O( D log(N) )

It is both faster and more robust to send all data to all nodes via spanning tree.
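
The logarithmic bound can be illustrated by counting rounds in a tree-style broadcast. This is a rough sketch under assumptions that are not stated in the talk: every node that already holds the data sends one full copy to one other node per round, and every transfer takes the same time. The function names are hypothetical.

```python
import math


def spanning_tree_rounds(n_nodes: int) -> int:
    """Rounds until n_nodes workers all hold the data, starting from one server.

    Assumption: each holder (server included) sends one full copy per round,
    so the number of holders doubles every round.
    """
    holders = 1  # the central server
    rounds = 0
    while holders < n_nodes + 1:  # server plus n_nodes workers
        holders *= 2
        rounds += 1
    return rounds


def relative_times(n_nodes: int, d: float = 1.0) -> dict:
    """Compare the two growth rates quoted above, ignoring constant factors."""
    return {
        "minimum_data_from_server": d * math.sqrt(n_nodes),
        "all_data_via_spanning_tree": d * math.log2(n_nodes),
    }


if __name__ == "__main__":
    for n in (16, 256, 1024):
        print(n, spanning_tree_rounds(n), relative_times(n))
```

With constants ignored, the two expressions are equal at N = 16, and the spanning tree pulls ahead as N grows beyond that.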

Page 24:

Choose the Right # of CPUs

Page 25:

What is the right metric?

Page 26:

What’s the right metric?

Speedup?
– Seq Runtime / Parallel Runtime

Parallel Efficiency?
– Speedup / N CPUs?

Neither works, because the number of CPUs varies over time and between runs.

Better Choice: Cost Efficiency
– Work Completed / Resources Consumed
– Cars: Miles / Gallon
– Planes: Person-Miles / Gallon
– Results / CPU-hours
– Results / $$$
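
One way to compute Results / CPU-hours when the CPU count changes during a run is to sum over intervals. The sketch below assumes the run is logged as a list of (CPUs in use, hours) intervals; that log format and the function name are illustrative, not something described in the talk.

```python
from typing import List, Tuple


def cost_efficiency(results: int, usage: List[Tuple[int, float]]) -> float:
    """Results per CPU-hour, where usage is a list of (cpus, hours) intervals."""
    cpu_hours = sum(cpus * hours for cpus, hours in usage)
    if cpu_hours <= 0:
        raise ValueError("no resources consumed")
    return results / cpu_hours


if __name__ == "__main__":
    # Hypothetical run: 100 CPUs for 2 h, then 40 CPUs for 1 h, then 200 CPUs for 0.5 h.
    usage = [(100, 2.0), (40, 1.0), (200, 0.5)]
    print(cost_efficiency(680, usage))
```

Unlike speedup, this figure stays well defined when workers join and leave, because it never needs a single value for N.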

Page 27:

All-Pairs Abstraction

Page 28:

Wavefront( R[x,0], R[0,y], F(x,y,d) )

[Figure: a grid R[0..4, 0..4]. Row R[0,0]…R[4,0] and column R[0,1]…R[0,4] are given; F, with inputs labeled x, y, and d, fills in the remaining cells up to R[4,4].]

Page 29:

% wavefront func.exe infile outfile
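
A sequential reading of the Wavefront signature can be sketched in a few lines. The sketch assumes that the inputs x, y, and d of F are the already-computed neighbors R[i-1,j], R[i,j-1], and R[i-1,j-1]; the slides label the inputs but do not spell out that mapping, so treat it as an assumption. The boundary values and the example F are made up.

```python
from typing import Callable, List


def wavefront(row0: List[int], col0: List[int],
              f: Callable[[int, int, int], int]) -> List[List[int]]:
    """Fill R given row0[x] = R[x][0] and col0[y] = R[0][y].

    Assumed recurrence: R[x][y] = f(R[x-1][y], R[x][y-1], R[x-1][y-1]).
    """
    if row0[0] != col0[0]:
        raise ValueError("row0 and col0 must agree on R[0][0]")
    nx, ny = len(row0), len(col0)
    r = [[0] * ny for _ in range(nx)]
    for x in range(nx):
        r[x][0] = row0[x]
    for y in range(ny):
        r[0][y] = col0[y]
    for x in range(1, nx):
        for y in range(1, ny):
            r[x][y] = f(r[x - 1][y], r[x][y - 1], r[x - 1][y - 1])
    return r


if __name__ == "__main__":
    grid = wavefront([1, 1, 1], [1, 1, 1], lambda x, y, d: x + y + d)
    for row in grid:
        print(row)
```

Cells on the same anti-diagonal do not depend on each other, so a parallel version can compute each diagonal concurrently once the previous one is done.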

Page 30:

The Performance Problem

Dispatch latency really matters: a delay in one holds up all of its children.

If we dispatch larger sub-problems:
– Concurrency on each node increases.
– Distributed concurrency decreases.

If we dispatch smaller sub-problems:
– Concurrency on each node decreases.
– Spend more time waiting for jobs to be dispatched.

So, model the system to choose the block size.
And, build a fast-dispatch execution system.
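
The trade-off can be made concrete with a toy model. Everything in this sketch is an assumption for illustration and is not the model used in the talk: an n x n problem is cut into square blocks, blocks on the same anti-diagonal run in parallel, each block costs one dispatch delay plus block*block evaluations of F, and blocks run one at a time per worker.

```python
import math
from typing import Iterable


def modeled_runtime(n: int, block: int, t_f: float, dispatch: float, workers: int) -> float:
    """Estimated wall-clock time for an n x n wavefront cut into block x block tasks.

    Partial edge blocks are charged as full blocks, so this slightly
    overestimates when block does not divide n.
    """
    g = math.ceil(n / block)                 # blocks per side
    per_task = dispatch + block * block * t_f
    total = 0.0
    for k in range(2 * g - 1):               # one step per anti-diagonal of blocks
        width = min(k + 1, g, 2 * g - 1 - k)  # blocks on this diagonal
        total += math.ceil(width / workers) * per_task
    return total


def best_block_size(n: int, t_f: float, dispatch: float, workers: int,
                    candidates: Iterable[int]) -> int:
    """Pick the candidate block size with the smallest modeled runtime."""
    return min(candidates, key=lambda b: modeled_runtime(n, b, t_f, dispatch, workers))


if __name__ == "__main__":
    sizes = [1, 2, 5, 10, 20, 50, 100]
    print(best_block_size(1000, t_f=1.0, dispatch=30.0, workers=200, candidates=sizes))
```

In this model a zero dispatch delay favors the smallest blocks, while a large delay pushes the choice toward bigger blocks, which mirrors the tension described above.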

Page 31:

Wavefront( R[x,0], R[0,y], F(x,y,d) )

[Figure: the same grid R[0..4, 0..4] with F filling in the interior cells, now grouped into blocks.]

Block Size = 2

Page 32:

Model of 1000x1000 Wavefront

Page 33:

[Figure: wavefront sends queued tasks to a work queue and receives tasks done. The work queue drives 100s of workers dispatched to Notre Dame, Purdue, and Wisconsin. Inside a worker, F reads in.txt and writes out.txt.]

put F.exe
put in.txt
exec F.exe <in.txt >out.txt
get out.txt
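
The master/worker pattern in this diagram can be imitated on one machine with standard-library threads. This is only a stand-in to show the flow of tasks queued and tasks done; it is not the API of the work queue system shown on the slide, and passing a Python function replaces the put/exec/get steps that move files to and from remote workers.

```python
import queue
import threading
from typing import Any, Callable, List


def worker(tasks: queue.Queue, done: queue.Queue) -> None:
    """Take a task, run it, report the output; stop on the None sentinel."""
    while True:
        item = tasks.get()
        if item is None:
            break
        task_id, func, data = item
        done.put((task_id, func(data)))


def run_master(func: Callable[[Any], Any], inputs: List[Any], n_workers: int = 4) -> List[Any]:
    """Queue one task per input and collect results in input order.

    Simplification: func is assumed not to raise; a real system must
    handle failed and preempted workers.
    """
    tasks: queue.Queue = queue.Queue()
    done: queue.Queue = queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, done)) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for task_id, data in enumerate(inputs):
        tasks.put((task_id, func, data))
    results = {}
    for _ in range(len(inputs)):
        task_id, out = done.get()
        results[task_id] = out
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return [results[i] for i in range(len(inputs))]


if __name__ == "__main__":
    print(run_master(lambda s: s.upper(), ["a", "b", "c"], n_workers=2))
```

Workers pull tasks as they become free, so fast and slow machines are balanced without the master planning placements in advance.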

Page 34:

[Figure: the same work queue diagram, with an additional wavefront and F shown. 100s of workers are dispatched to Notre Dame, Purdue, and Wisconsin.]

put F.exe
put in.txt
exec F.exe <in.txt >out.txt
get out.txt

Page 35:

500x500 Wavefront on ~200 CPUs

Page 36:

Wavefront on a 200-CPU Cluster

Page 37:

Wavefront on a 32-Core CPU

Page 38:

The Genome Assembly Problem

AGTCGATCGATCGATAATCGATCCTAGCTAGCTACGA

Chemical Sequencing (genome to reads)
Computational Assembly (reads back to genome)

AGTCGATCGATCGAT
AGCTAGCTACGA
TCGATAATCGATCCTAGCTA

Millions of “reads”, 100s of bytes long.
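
The three reads on this slide overlap at their ends, which is what assembly exploits. Below is a toy greedy merger in Python, offered as an illustration only: it is not the assembler used in the talk, and the function names and the minimum overlap of 3 are arbitrary choices. Real assemblers must also cope with sequencing errors and repeats.

```python
from typing import List


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    seqs = list(reads)
    while len(seqs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge
        merged = seqs[best_i] + seqs[best_j][best_k:]
        seqs = [s for idx, s in enumerate(seqs) if idx not in (best_i, best_j)]
        seqs.append(merged)
    return seqs


if __name__ == "__main__":
    reads = ["AGTCGATCGATCGAT", "AGCTAGCTACGA", "TCGATAATCGATCCTAGCTA"]
    print(greedy_assemble(reads))
```

The nested loop compares every read with every other read, which is why the number of candidate pairs, and not only the number of reads, drives the cost at scale.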

Page 39:

Sample Genomes

Genome                 Reads   Data    Pairs   Sequential Time
A. gambiae scaffold    101K    80MB    738K    12 hours
A. gambiae complete    180K    1.4GB   12M     6 days
S. bicolor             7.9M    5.7GB   84M     30 days

Page 40:

Assemble( set S, Test(), Align(), Assm() )

Sequence Data:
0. AGCCTGCATTA…
1. CATTAACGAAC…
2. GACTGACTAGC…
3. TGACCGATAAA…

Candidate Pairs:
0 is similar to 1
1 is similar to 3
1 is similar to 4

List of Alignments:
0. AGCCTGCATTA
1. CATTAACGAAC…

Assembled Sequence:
AGTCGATCGATCGATAATC…

[Figure: the stages Test, Align, and Assem connect these data sets; the stages carry the labels I/O Bound, CPU Bound, and RAM Bound.]

Page 41:

Distributed Genome Assembly

[Figure: test feeds the align master, which sends queued tasks to a work queue and receives tasks done; assemble follows. The work queue drives 100s of workers dispatched to Notre Dame, Purdue, and Wisconsin.]

Detail of a single worker (align.exe reads in.txt and writes out.txt):

put align.exe
put in.txt
exec F.exe <in.txt >out.txt
get out.txt

Page 42:

Small Genome (101K reads)

Page 43:

Medium Genome (180K reads)

Page 44:

Large Genome (7.9M)

Page 45:

From Workstation to Grid

Page 46:

What’s the Upshot?

We can do full-scale assemblies as a routine matter on existing conventional machines.

Our solution is faster (wall-clock time) than the next fastest assembler run on 1024x BG/L.

You could almost certainly do better with a dedicated cluster and a fast interconnect, but such systems are not universally available.

Our solution opens up research in assembly to labs with “NASCAR” instead of “Formula-One” hardware.

Page 47:

What Other Abstractions Might Be Useful?

Map( set S, F(s) )
Explore( F(x), x: [a….b] )
Minimize( F(x), delta )
Minimax( state s, A(s), B(s) )
Search( state s, F(s), IsTerminal(s) )
Query( properties ) -> set of objects

FluidFlow( V[x,y,z], F(v), delta )

Page 48:

How do we connect multiple abstractions together?

Need a meta-language, perhaps with its own atomic operations for simple tasks:
Need to manage (possibly large) intermediate storage between operations.
Need to handle data type conversions between almost-compatible components.
Need type reporting and error checking to avoid expensive errors.
If abstractions are feasible to model, then it may be feasible to model entire programs.

Page 49:

Connecting Abstractions in BXGrid

S = Select( color=“brown” )

B = Transform( S,F )

M = AllPairs( A, B, F )

[Figure: Select filters records by eye color (L brown, L blue, R brown, R brown) to give S1…S3; Transform applies F to each; AllPairs compares A1–A3 against B1–B3 with F; the result feeds an ROC curve.]

Bui, Thomas, Kelly, Lyon, Flynn, Thain, BXGrid: A Repository and Experimental Abstraction… poster at IEEE eScience 2008
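
The three-step chain can be imitated in memory to show how the output of one abstraction becomes the input of the next. This sketch is not BXGrid code: the record layout, `reduce_to_feature`, and `compare` are hypothetical, and the slide's set A is replaced by reusing B since A is not defined there.

```python
def select(records, **properties):
    """Keep records whose fields match every given property."""
    return [r for r in records if all(r.get(k) == v for k, v in properties.items())]


def transform(items, f):
    """Apply f to each item."""
    return [f(item) for item in items]


def all_pairs(a, b, f):
    """M[i][j] = f(a[i], b[j])."""
    return [[f(x, y) for y in b] for x in a]


# Hypothetical records and functions, for illustration only.
records = [
    {"id": "S1", "color": "brown", "pixels": [3, 5, 8]},
    {"id": "S2", "color": "blue", "pixels": [1, 1, 1]},
    {"id": "S3", "color": "brown", "pixels": [2, 2, 2]},
]


def reduce_to_feature(record):
    """Stand-in for the slide's Transform function."""
    return sum(record["pixels"])


def compare(x, y):
    """Stand-in for the slide's comparison function."""
    return abs(x - y)


s = select(records, color="brown")
b = transform(s, reduce_to_feature)
m = all_pairs(b, b, compare)  # B is reused in place of the slide's set A

if __name__ == "__main__":
    print(m)
```

Each stage has a different cost profile, which is one reason a real system may want to run each on a different kind of resource.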

Page 52:

Implementing Abstractions

S = Select( color=“brown” )

B = Transform( S,F )

M = AllPairs( A, B, F )

[Figure: the three operations shown alongside the systems that run them: DBMS / Relational Database (2x), Active Storage Cluster (16x), Condor Pool (500x).]

Page 53:

Largest Combination so Far

Complete Select/Transform/All-Pairs biometric experiment on 58,396 irises from the Face Recognition Grand Challenge.

To our knowledge, the largest experiment ever run on publicly available data.

Competing biometric research relies on samples of 100-1000 images, which can miss important population effects.

Reduced computation time from 800 days to 10 days, making it feasible to repeat multiple times for a graduate thesis.

Page 55:

Abstractions Redux

Campus grids provide enormous computing power, but are very challenging to use effectively.

An abstraction provides a robust, scalable solution to a category of problems.

Multiple abstractions can be chained together to solve very large problems.

Could a menu of abstractions cover a significant fraction of the application mix?

Page 56:

Acknowledgments

Cooperative Computing Lab
– http://www.cse.nd.edu/~ccl

Grad Students
– Chris Moretti
– Hoang Bui
– Li Yu
– Mike Olson
– Michael Albrecht

Faculty:
– Patrick Flynn
– Nitesh Chawla
– Kenneth Judd
– Scott Emrich

NSF Grants CCF-0621434, CNS-0643229

Undergrads
– Mike Kelly
– Rory Carmichael
– Mark Pasquier
– Christopher Lyon
– Jared Bulosan