Scaling Up Data Intensive Scientific Applications to Campus Grids
Douglas Thain
University of Notre Dame
LSAP Workshop
Munich, June 2009

Overview
Challenges in Using Campus Grids
Solution: Abstractions
Examples and Applications
– All-Pairs: Biometrics, Data Mining
– Wavefront: Genomics and Economics
– Some-Pairs: Genomics
Abstractions, Workflows, and Languages

What is a Campus Grid?
A campus grid is an aggregation of all available computing power found in an institution:
– Idle cycles from desktop machines.
– Unused cycles from dedicated clusters.
Examples of campus grids:
– 600 CPUs at the University of Notre Dame
– 2000 CPUs at the University of Wisconsin
– 13,000 CPUs at Purdue University
Cluster, cloud, and grid are all similar concepts.

Campus grids can give us access to more machines than we can possibly use.
But are they easy to use?

Example: Biometrics Research
Goal: Design a robust face comparison function.
[Figure: F applied to two pairs of face images, producing scores of 0.05 and 0.97.]

Similarity Matrix Construction
1.0 0.8 0.1 0.0 0.0 0.1
    1.0 0.0 0.1 0.1 0.0
        1.0 0.0 0.1 0.3
            1.0 0.0 0.0
                1.0 0.1
                    1.0
Challenge Workload:
– 60,000 iris images
– 1 MB each
– 0.02 s per F
– 833 CPU-days
– 600 TB of I/O
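
The CPU-days figure can be checked with a little arithmetic. The sketch below assumes every image is compared against every image (the full 60,000 x 60,000 matrix, not just the upper triangle), which is the reading that reproduces the quoted 833 CPU-days. The variable names are illustrative only.

```python
# Back-of-the-envelope check of the challenge workload figures.
n_images = 60_000        # iris images (from the slide)
seconds_per_f = 0.02     # runtime of one comparison F (from the slide)

# Assumption: the full n x n matrix is computed.
comparisons = n_images * n_images
cpu_seconds = comparisons * seconds_per_f
cpu_days = cpu_seconds / 86_400   # 86,400 seconds in a day

print(f"{comparisons:.2e} comparisons, about {cpu_days:.0f} CPU-days")
```

At this scale a single machine is clearly not an option, which is what motivates the campus grid in the first place.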
I have 60,000 iris images acquired in my research lab. I want to reduce each one to a feature space, and then compare all of them to each other. I want to spend my time doing science, not struggling with computers.
I have a laptop.
I own a few machines. We have access to a campus grid.
What should I do?

We said:
Try using our campus grid!
(How hard could it be?)

Non-Expert User Using 500 CPUs
Try 1: Each F is a batch job.
Failure: Dispatch latency >> F runtime.
Try 2: Each row is a batch job.
Failure: Too many small ops on FS.
Try 3: Bundle all files into one package.
Failure: Everyone loads 1GB at once.
Try 4: User gives up and attempts to solve an easier or smaller problem.
[Figures: for each try, a head node (HN) dispatching F jobs to the CPUs.]

Why are Grids Hard to Use?
System Properties:
– Wildly varying resource availability.
– Heterogeneous resources.
– Unpredictable preemption.
– Unexpected resource limits.
User Considerations:
– Jobs can’t run for too long... but, they can’t run too quickly, either!
– I/O operations must be carefully matched to the capacity of clients, servers, and networks.
– Users often do not even have access to the necessary information to make good choices!
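
The "can't run too quickly" constraint can be made concrete with a small calculation. The sketch below uses the 0.02 s runtime of F from the biometrics example and the roughly 30 second dispatch latency quoted later in the talk for an unloaded grid. The 10% overhead target and the function name are assumptions chosen purely for illustration.

```python
import math

def min_calls_per_job(dispatch_latency_s, f_runtime_s, max_overhead):
    """Smallest number of F calls to bundle into one job so that dispatch
    latency is at most `max_overhead` (a fraction) of the useful compute time."""
    return math.ceil(dispatch_latency_s / (max_overhead * f_runtime_s))

# Hypothetical target: dispatch latency should cost no more than 10% of the work.
calls = min_calls_per_job(dispatch_latency_s=30.0, f_runtime_s=0.02, max_overhead=0.10)
job_minutes = calls * 0.02 / 60
print(f"bundle roughly {calls} calls per job (about {job_minutes:.0f} minutes of compute)")
```

Under these assumptions each job has to bundle on the order of fifteen thousand comparisons. Bundling far more than that pushes in the other direction: long jobs are more exposed to preemption.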

Observation
In a given field of study, many people repeat the same pattern of work many times, making slight changes to the data and algorithms.
If the system knows the overall pattern in advance, then it can do a better job of executing it reliably and efficiently.
If the user knows in advance what patterns are allowed, then they have a better idea of how to construct their workloads.

What’s the Most Successful Parallel Programming Language?
OpenGL:
– A declarative specification of a workload.
– Ported to a wide variety of HW over 20 years.
– The graphics pipeline is very specific:
  Transform points to coordinate space.
  Connect polygons to transformed points.
  Stretch textures across polygons.
  Sort everything by Z-depth.
– Can we apply the same idea to grids?

Abstractions for Distributed Computing
Abstraction: a declarative specification of the computation and data of a workload.
A restricted pattern, not meant to be a general purpose programming language.
Uses data structures instead of files.
Provides users with a bright path.
Regular structure makes it tractable to model and predict performance.

Abstractions as Higher-Order Functions
AllPairs( set A, set B, function F )
– returns M[i,j] = F( A[i], B[j] )
SomePairs( set A, list(i,j) L, function F )
– returns list of F( A[i], A[j] )
Wavefront( matrix R, function F )
– returns R[i,j] = F( R[i-1,j], R[i,j-1] )
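
To make the three signatures concrete, here is a minimal sequential sketch in Python. It only pins down the semantics of each abstraction. It is not the grid implementation, and the assumption that Wavefront is handed a matrix whose first row and first column are already filled in is ours.

```python
def all_pairs(A, B, F):
    """Return M with M[i][j] = F(A[i], B[j])."""
    return [[F(a, b) for b in B] for a in A]

def some_pairs(A, L, F):
    """Return [F(A[i], A[j]) for each (i, j) in the list L]."""
    return [F(A[i], A[j]) for (i, j) in L]

def wavefront(R, F):
    """Fill R in place with R[i][j] = F(R[i-1][j], R[i][j-1]).
    Assumes row 0 and column 0 of R are given as boundary values."""
    for i in range(1, len(R)):
        for j in range(1, len(R[i])):
            R[i][j] = F(R[i - 1][j], R[i][j - 1])
    return R

# Tiny usage examples with toy functions.
sums = all_pairs([1, 2], [10, 20], lambda a, b: a + b)
joined = some_pairs(["a", "b", "c"], [(0, 1), (2, 2)], lambda x, y: x + y)
grid = wavefront([[1, 1, 1], [1, 0, 0], [1, 0, 0]], lambda up, left: up + left)
```

In each case the user supplies only the data and F. Everything about ordering, placement, and data movement is left to whoever implements the abstraction.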

Working with Abstractions
[Figure: the user supplies function F and the sets A1..An and B1..Bn to AllPairs( A, B, F ) as a compact data structure; a custom workflow engine executes it on the campus grid.]

All-Pairs Abstraction
AllPairs( set A, set B, function F )
returns matrix M where
M[i][j] = F( A[i], B[j] ) for all i,j
[Figure: F applied to every pairing of A1..An with B1..Bn, filling the matrix AllPairs(A,B,F).]
allpairs A B F.exe

How Does the Abstraction Help?
The custom workflow engine:
– Chooses the right data transfer strategy.
– Chooses the right number of resources.
– Chooses blocking of functions into jobs.
– Recovers from a larger number of failures.
– Predicts overall runtime accurately.
All of these tasks are nearly impossible for arbitrary workloads, but are tractable (not trivial) to solve for a specific abstraction.
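
As one illustration of "blocking of functions into jobs", the sketch below tiles the All-Pairs result matrix into square blocks, so that each job evaluates many calls to F while touching only a slice of A and a slice of B. This is a hedged sketch of the general idea, not the engine's actual policy. The function names and the fixed block size are hypothetical.

```python
def block_jobs(n_a, n_b, block):
    """Yield (rows, cols) index ranges that tile an n_a x n_b result matrix."""
    for i0 in range(0, n_a, block):
        for j0 in range(0, n_b, block):
            yield (range(i0, min(i0 + block, n_a)),
                   range(j0, min(j0 + block, n_b)))

def run_block(A, B, F, rows, cols):
    """Work done by a single job: every F call inside one tile."""
    return {(i, j): F(A[i], B[j]) for i in rows for j in cols}

# Usage: 5 x 5 comparisons grouped into 2 x 2 tiles.
A = list(range(5))
B = list(range(5))
F = lambda a, b: a * 10 + b   # stand-in for a real comparison function

tiles = list(block_jobs(len(A), len(B), block=2))
M = {}
for rows, cols in tiles:
    M.update(run_block(A, B, F, rows, cols))
```

A real engine would pick the block size from measured values such as the runtime of F and the dispatch latency, rather than taking it as a fixed argument.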

Distribute Data Via Spanning Tree

Choose the Right # of CPUs

Conventional vs Abstraction

All-Pairs in Production
Our All-Pairs implementation has provided over 57 CPU-years of computation to the ND biometrics research group over the last year.
Largest run so far: 58,396 irises from the Face Recognition Grand Challenge. The largest experiment ever run on publicly available data.
Competing biometric research relies on samples of 100-1000 images, which can miss important population effects.
Reduced computation time from 833 days to 10 days, making it feasible to repeat multiple times for a graduate thesis. (We can go faster yet.)

Wavefront( matrix M, function F(x,y,d) )
returns matrix M such that
M[i,j] = F( M[i-1,j], M[i,j-1], M[i-1,j-1] )
[Figure: a 5x5 matrix in which the edge cells M[0,0]..M[4,0] and M[0,1]..M[0,4] are given; each interior cell is computed by F from its neighbors x and y and its diagonal neighbor d, sweeping toward M[4,4].]
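
The recurrence implies that all cells on the same anti-diagonal (constant i + j) are independent of one another, which is where the parallelism comes from. The sketch below evaluates a square matrix diagonal by diagonal. It runs sequentially and assumes row 0 and column 0 are given; in a grid setting each diagonal's cells could be dispatched as independent tasks. The function name is ours.

```python
def wavefront_by_diagonals(M, F):
    """Fill square matrix M in place using
    M[i][j] = F(M[i-1][j], M[i][j-1], M[i-1][j-1]),
    visiting cells one anti-diagonal (i + j = s) at a time."""
    n = len(M)
    for s in range(2, 2 * n - 1):
        lo = max(1, s - (n - 1))
        hi = min(n - 1, s - 1)
        # Every cell in this list depends only on earlier diagonals,
        # so the cells could all be computed concurrently.
        for i in range(lo, hi + 1):
            j = s - i
            M[i][j] = F(M[i - 1][j], M[i][j - 1], M[i - 1][j - 1])
    return M

# Usage with a toy F and a boundary of ones.
result = wavefront_by_diagonals(
    [[1, 1, 1], [1, 0, 0], [1, 0, 0]],
    lambda x, y, d: x + y + d,
)
```

The number of independent cells grows to n - 1 in the middle of the sweep and shrinks again toward the far corner, so the available parallelism varies over the run.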

Applications of Wavefront
Bioinformatics:
– Compute the alignment of two large DNA strings in order to find similarities between species. Existing tools do not scale up to complete DNA strings.
Economics:
– Simulate the interaction between two competing firms, each of which has an effect on resource consumption and market price. E.g. When will we run out of oil?
Applies to any kind of optimization problem solvable with dynamic programming.

Problem: Dispatch Latency
Even with an infinite number of CPUs, dispatch latency controls the total execution time: O(n) in the best case.
However, job dispatch latency in an unloaded grid is about 30 seconds, which may outweigh the runtime of F.
Things get worse when queues are long!
Solution: Build a lightweight task dispatch system. (Idea from Falkon@UC)
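
The O(n) bound follows from the dependency chain: in an n x n wavefront, the longest chain of cells that must run one after another has 2n - 1 links, and each link pays the dispatch latency. The sketch below turns that into a lower bound on wall-clock time. The 500 x 500 size matches the experiment shown later in the talk; counting every cell of the grid as a task and the function name are simplifying assumptions.

```python
def wavefront_lower_bound_s(n, dispatch_latency_s, f_runtime_s):
    """Lower bound on wall-clock seconds for an n x n wavefront with
    unlimited CPUs: the critical path has 2n - 1 sequential tasks."""
    critical_path = 2 * n - 1
    return critical_path * (dispatch_latency_s + f_runtime_s)

# Latency alone, ignoring the runtime of F entirely.
latency_only_s = wavefront_lower_bound_s(n=500, dispatch_latency_s=30, f_runtime_s=0)
print(f"{latency_only_s} s, about {latency_only_s / 3600:.1f} hours of pure dispatch latency")
```

With a 30 second dispatch latency, the bound is hours long before F has done any work at all, which is the argument for a lightweight dispatcher.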

[Figure: the wavefront engine queues tasks to a work queue and collects the tasks that are done; the work queue drives 1000s of workers dispatched via Condor/SGE/SSH. For each task, a worker receives:]
put F.exe
put in.txt
exec F.exe <in.txt >out.txt
get out.txt
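
The essence of the design is that workers are started once and then pull many small tasks, so the batch system's dispatch latency is paid per worker rather than per task. The sketch below imitates that pattern inside a single process with threads. It is a stand-in to show the control flow only: it is not the API of the work queue system in the figure, and every name in it is hypothetical.

```python
import queue
import threading

def run_work_queue(tasks, func, n_workers=4):
    """Run func over tasks using long-lived workers that pull from a shared
    queue. Returns results in the same order as tasks."""
    todo = queue.Queue()
    done = queue.Queue()
    for task_id, arg in enumerate(tasks):
        todo.put((task_id, arg))

    def worker():
        # A worker keeps taking tasks until the queue is empty,
        # instead of being launched once per task.
        while True:
            try:
                task_id, arg = todo.get_nowait()
            except queue.Empty:
                return
            done.put((task_id, func(arg)))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    results = [None] * len(tasks)
    while not done.empty():
        task_id, value = done.get()
        results[task_id] = value
    return results

squares = run_work_queue([1, 2, 3, 4, 5], lambda x: x * x, n_workers=3)
```

In the real system each task would also carry the files to put, the command to execute, and the files to get back, as listed in the figure.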

Problem: Performance Variation
Tasks can be delayed for many reasons:
– Heterogeneous hardware.
– Interference with disk/network.
– Policy based suspension.
Any delayed task in Wavefront has a cascading effect on the rest of the workload.
Solution - Fast Abort: Keep statistics on task runtimes, and abort those that lie significantly outside the mean. Prefer to assign jobs to machines with a fast history.
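
One way to read "significantly outside the mean" is a threshold of the mean plus a few standard deviations of completed runtimes. The sketch below implements that reading together with a simple "fastest history first" worker choice. The multiplier k, the minimum sample count, and both function names are assumptions for illustration; the talk does not specify the exact rule.

```python
import statistics

def should_abort(elapsed_s, completed_runtimes_s, k=3.0, min_samples=5):
    """Abort a running task once its elapsed time exceeds
    mean + k * stddev of the runtimes of tasks that already completed."""
    if len(completed_runtimes_s) < min_samples:
        return False  # not enough history to judge yet
    mean = statistics.mean(completed_runtimes_s)
    stdev = statistics.pstdev(completed_runtimes_s)
    return elapsed_s > mean + k * stdev

def pick_worker(avg_runtime_by_worker):
    """Prefer the worker with the lowest average task runtime so far."""
    return min(avg_runtime_by_worker, key=avg_runtime_by_worker.get)

history = [10, 11, 9, 10, 10]                 # seconds, completed tasks
slow = should_abort(30, history)              # far beyond the usual runtime
normal = should_abort(10.5, history)          # within normal variation
best = pick_worker({"node-a": 12.0, "node-b": 9.5})
```

An aborted task is simply queued again, ideally on a machine with a better track record.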

500x500 Wavefront on ~200 CPUs

Wavefront on a 200-CPU Cluster

Wavefront on a 32-Core CPU

Performance Prediction is Possible
Often, users have no idea whether a task will take one day or one year -> better to find out at the beginning!
Allows the system to choose automatically whether to run locally or on the campus grid.
Of course, performance prediction is technically and philosophically dangerous: we simply argue that abstractions are more predictable than general programs.

The Genome Assembly Problem
AGTCGATCGATCGATAATCGATCCTAGCTAGCTACGA
Chemical Sequencing
AGTCGATCGATCGAT
TCGATAATCGATCCTAGCTA
AGCTAGCTACGA
Millions of “reads”, 100s of bytes long.
Computational Assembly
AGTCGATCGATCGAT
          TCGATAATCGATCCTAGCTA
                         AGCTAGCTACGA

Sample Genomes
Genome                 Reads   Data    Pairs   Sequential Time
A. gambiae scaffold    101K    80MB    738K    12 hours
A. gambiae complete    180K    1.4GB   12M     6 days
S. bicolor simulated   7.9M    5.7GB   84M     30 days

Genome Assembly Today
Several commercial firms provide an assembly service that takes weeks on a dedicated cluster, costs O($10K), and is based on human genome heuristics.
Genome researchers would like to be able to perform custom assemblies using their own data and heuristics.
Can this be done on a campus grid?

Some-Pairs Abstraction
SomePairs( set A, list (i,j), function F(x,y) )
returns
list of F( A[i], A[j] )
[Figure: F applied only to the listed pairings of A1..An, for example the list (1,2) (2,1) (2,3) (3,3).]

Distributed Genome Assembly
[Figure: a somepairs master takes A1..An, F, and the pair list (1,2) (2,1) (2,3) (3,3), queues tasks to a work queue, and collects the tasks that are done from 100s of workers dispatched to Notre Dame, Purdue, and Wisconsin.]
Detail of a single worker:
put align.exe
put in.txt
exec F.exe <in.txt >out.txt
get out.txt

Small Genome (101K reads)

Medium Genome (180K reads)

Large Genome (7.9M reads)

From Workstation to Grid

What’s the Upshot?
We can do full-scale assemblies as a routine matter on existing conventional machines.
Our solution is faster (wall-clock time) than the next fastest assembler run on 1024x BG/L.
You could almost certainly do better with a dedicated cluster and a fast interconnect, but such systems are not universally available.
Our solution opens up research in assembly to labs with “NASCAR” instead of “Formula-One” hardware.

Other Abstractions for Computing
– Directed Graph
– Bag of Tasks
– Map-Reduce

Partial Lattice of Abstractions
[Figure: a lattice relating Directed Graph; Bag of Tasks, Wavefront, and Map Reduce; Some Pairs; All-Pairs; and Lambda Calculus, drawn against two axes labeled Robust Performance and Expressive Power.]

Two Abstractions Compared
AllPairs( set A, set B, F(x,y) )
SomePairs( set S, list L, F(x,y) )
Assuming that A = B = S…
Can you express AllPairs using SomePairs?
– Yes, but you must enumerate all pairs explicitly. It is not trivial for SomePairs to minimize the amount of data transferred to each node.
Can you express SomePairs using AllPairs?
– Yes, but only by doing excessive amounts of work, and then winnowing the results.
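
Both reductions are easy to write down, which makes their costs visible. The sketch below uses plain sequential definitions of the two abstractions (our own minimal versions, not the grid implementations) and counts calls to F to show the extra work in the second direction.

```python
def all_pairs(A, B, F):
    return [[F(a, b) for b in B] for a in A]

def some_pairs(S, L, F):
    return [F(S[i], S[j]) for (i, j) in L]

def all_pairs_via_some_pairs(S, F):
    """AllPairs(S, S, F) expressed with SomePairs: the caller must
    enumerate all n*n index pairs explicitly."""
    n = len(S)
    L = [(i, j) for i in range(n) for j in range(n)]
    flat = some_pairs(S, L, F)
    return [flat[i * n:(i + 1) * n] for i in range(n)]

def some_pairs_via_all_pairs(S, L, F):
    """SomePairs expressed with AllPairs: compute the full matrix,
    then winnow it down to the requested entries."""
    M = all_pairs(S, S, F)
    return [M[i][j] for (i, j) in L]

# Count how many times F runs in each formulation.
calls = {"n": 0}
def F(x, y):
    calls["n"] += 1
    return x * y

S = [1, 2, 3, 4]
L = [(0, 1), (2, 3)]

direct = some_pairs(S, L, F)
direct_calls = calls["n"]

calls["n"] = 0
winnowed = some_pairs_via_all_pairs(S, L, F)
winnowed_calls = calls["n"]
```

Here the direct form calls F once per listed pair, while the AllPairs route calls it for every entry of the matrix before discarding most of the results.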

Abstractions as a Social Tool
Collaboration with outside groups is how we encounter the most interesting, challenging, and important problems in computer science.
However, often neither side understands which details are essential or non-essential:
– Can you deal with files that have upper case letters?
– Oh, by the way, we have 10TB of input, is that ok?
– (A little bit of an exaggeration.)
An abstraction is an excellent chalkboard tool:
– Accessible to anyone with a little bit of mathematics.
– Makes it easy to see what must be plugged in.
– Forces out essential details: data size, execution time.

Conclusion
Grids, clouds, and clusters provide enormous computing power, but are very challenging to use effectively.
An abstraction provides a robust, scalable solution to a narrow category of problems; each requires different kinds of optimizations.
Limiting expressive power results in systems that are usable, predictable, and reliable.
Is there a menu of abstractions that would satisfy many consumers of grid computing?

Acknowledgments
Cooperative Computing Lab
– http://www.cse.nd.edu/~ccl
Grad Students
– Chris Moretti
– Hoang Bui
– Li Yu
– Mike Olson
– Michael Albrecht
Faculty:
– Patrick Flynn
– Nitesh Chawla
– Kenneth Judd
– Scott Emrich
NSF Grants CCF-0621434, CNS-0643229
Undergrads
– Mike Kelly
– Rory Carmichael
– Mark Pasquier
– Christopher Lyon
– Jared Bulosan