GPU-Euler
Sequence Assembly using GPGPU
S. Mahmood H. Rangwala
Department of Computer Science
George Mason University
International Conference on High Performance Computing & Communications, 2011
Banff, Canada
Mahmood, Rangwala GPU-Euler
Outline
1 Motivation
    Genome Assembly
    Previous Work
    GPGPU
2 Method
    Parallel Eulerian Assembly
    Time Complexity Analysis
    Evaluation
3 Results
    Data sets
    Results
Genome
Genome: a biological blueprint.
Very long chains of four types of nucleobases:
    Adenine
    Guanine
    Cytosine
    Thymine
Important for understanding the function of the organism.
Figure: Double helix DNA representation (1)
(1) Image courtesy of Image Library of Biological Macromolecules, Jena, Germany. http://www.imb-jena.de/IMAGE.html
Sequence Assembly: Challenges
The total number of nucleobases in a genome is very large,
e.g. the human genome has 3.2 billion base pairs.
Existing technologies can read only a fraction of this long strand at a time.
The smaller fragments (reads) must be stitched together.
Figure: Sequence Assembly (2)
(2) Image courtesy of Center for Bioinformatics and Computational Biology, UMD. www.cbcb.umd.edu/research/assembly_primer.html
Problem Statement
Given an alphabet Σ = {A, G, C, T} and a set of strings R = {r1, r2, r3, ..., rn} over Σ,
construct a superstring S containing every string in R.
Similar to the Shortest Common Superstring problem.
Repeats must be taken into account.
The volume of data is massive.
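A minimal sketch of why repeats matter for the superstring formulation. The toy genome, the read length, and the helper names is_superstring and all_reads are assumptions made for illustration; they do not come from the slides. The example shows that a string shorter than the genome can still contain every read once a repeat is collapsed.

```python
def is_superstring(candidate, reads):
    """True if every read occurs as a substring of `candidate`."""
    return all(read in candidate for read in reads)


def all_reads(genome, read_length):
    """Set of all substrings of `genome` with the given length (error-free reads)."""
    return {genome[i:i + read_length]
            for i in range(len(genome) - read_length + 1)}


# Hypothetical toy genome in which the word ACG occurs three times in a row.
genome = "TACGACGACGG"
reads = all_reads(genome, 4)

# A shorter string that drops one copy of the repeat still contains all reads.
shorter = "TACGACGG"
print(is_superstring(genome, reads), is_superstring(shorter, reads))
```

Because the shorter string also satisfies the containment condition, a pure shortest-superstring objective would under-represent the repeat, which is the reason the slide singles repeats out.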
Sequence Assembly: Techniques & Tools
Greedy assemblers
    VCAKE
Overlap-layout-consensus
    Celera
Eulerian path
    Euler, EulerSR, Velvet, ABySS
General Purpose GPU Computing
GPUs for general-purpose computing
Massive parallelism for applications.
nVidia CUDA, a framework for development on nVidia GPUs.
Similar models for parallel computation:
    Parallel Random Access Machine (PRAM)
    Single Instruction Multiple Data (SIMD)
Figure: CUDA Application Stack (3)
(3) Image courtesy of nVidia: nVidia CUDA Toolkit Reference Manual
CUDA: Compute Unified Device Architecture
A CUDA-enabled device has
    Streaming Multiprocessors (SMs).
    Each SM has a set of Streaming Processors (SPs).
    Global memory.
Concurrent execution of the same code on all SMs.
Computations use GPU memory.
Figure: Hardware Architecture (4)
(4) Image courtesy of nVidia: nVidia CUDA Toolkit Reference Manual
Concepts
de-Bruijn graph: a directed graph in which
    each vertex is a word of length k, and
    each edge represents an overlap of k−1 characters between two vertices.
Contigs: assembled sequences from the input data.
Euler tour: a graph traversal visiting each edge exactly once.
Figure: de-Bruijn Graph
Objective
Represent the input reads as a de-Bruijn graph.
Each edge then corresponds to a single base.
An Euler tour visits each base exactly once.
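The correspondence between edges and bases can be sketched as follows. This is an illustration under stated assumptions, not the GPU-Euler code: the function names de_bruijn_edges and spell and the toy sequence are hypothetical, and the graph is built from one error-free sequence rather than from reads.

```python
def de_bruijn_edges(sequence, k):
    """Edges of the de Bruijn graph of `sequence`.

    Each edge is a (k+1)-mer, stored as (prefix k-mer, suffix k-mer);
    the two vertices overlap in k-1 characters.
    """
    return [(sequence[i:i + k], sequence[i + 1:i + k + 1])
            for i in range(len(sequence) - k)]


def spell(walk):
    """Spell a walk: the first vertex, then one new base per edge."""
    first_vertex = walk[0][0]
    return first_vertex + "".join(head[-1] for _, head in walk)


edges = de_bruijn_edges("ACGTACG", 3)
print(edges)
print(spell(edges))
```

Traversing the edges in order appends exactly one base per edge, so a tour that uses every edge once accounts for every base once beyond the first k-mer.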
Parallel Eulerian Assembly
Construct de-Bruijn Graph
Find Euler Tour
Output Contigs
Figure: GPU-Euler workflow. FASTA file (input) -> Reads -> de Bruijn Graph Construction -> Graph -> Euler Tour Construction -> Annotated Graph -> Identify Contigs -> Contigs -> FASTA file (output). The three processing stages are grouped as EulerGPU.
Parallel de-Bruijn Graph Construction
Assign each CUDA thread to one read.
Generate k-mers and (k+1)-mers.
Store them in a hash table.
Create vertices from k-mers and edges from (k+1)-mers.
Figure: Graph Construction. Input: FASTA file; CUDA kernels: Count Edges, Setup Edges, Setup Vertices.
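The construction steps above can be emulated sequentially as a rough sketch. Everything here is an assumption for illustration: Python's Counter stands in for the GPU hash table, the loop body stands in for the work of one CUDA thread, and the names kmers, build_tables, and setup_edges are hypothetical.

```python
from collections import Counter


def kmers(read, k):
    """All substrings of length k in a read, left to right."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_tables(reads, k):
    """Count k-mers (vertices) and (k+1)-mers (edges) over all reads."""
    vertex_table, edge_table = Counter(), Counter()
    for read in reads:  # on the GPU, each iteration would be one thread
        vertex_table.update(kmers(read, k))
        edge_table.update(kmers(read, k + 1))
    return vertex_table, edge_table


def setup_edges(edge_table):
    """Turn each distinct (k+1)-mer into a (tail k-mer, head k-mer) edge."""
    return sorted((word[:-1], word[1:]) for word in edge_table)


vertex_table, edge_table = build_tables(["ACGTA", "CGTAC"], k=3)
print(setup_edges(edge_table))
```

Counting before setting up edges mirrors the Count Edges and Setup Edges kernels named in the figure: the counts tell a parallel implementation how much storage each vertex needs before edges are written.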
Parallel Euler Tour
Create an Edge Successor Graph from the de-Bruijn Graph.
Identify circuits in the Edge Successor Graph.
Create a Circuit Graph by identifying adjacent circuits.
Calculate a spanning tree of the Circuit Graph.
Traverse the Circuit Graph and switch the successor edges of adjacent circuits.
Figure: Parallel Euler Tour. Steps: Assign Successor (CUDA), Find Component (CUDA), Create Circuit Graph (CUDA), Find Spanning Tree, Execute Swipe (CUDA). Data passed between steps: DeBruijn Graph, Annotated Graph, Comp. Label, Circuit Graph, Spanning Tree, Euler Tour.
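The five steps can be followed in a sequential sketch. This is an emulation under assumptions, not the CUDA implementation: it expects a connected graph in which every vertex has equal in-degree and out-degree, it uses a union-find structure in place of an explicit Circuit Graph and spanning tree (each successful union corresponds to one tree edge), and the function name euler_tour is hypothetical.

```python
from collections import defaultdict


def euler_tour(edges):
    """Return edge indices forming an Euler circuit.

    `edges` is a list of (tail, head) pairs of a connected directed graph
    whose vertices all have equal in-degree and out-degree.
    """
    if not edges:
        return []
    incoming, outgoing = defaultdict(list), defaultdict(list)
    for index, (tail, head) in enumerate(edges):
        outgoing[tail].append(index)
        incoming[head].append(index)

    # Edge successor graph: pair each incoming edge with an outgoing edge.
    successor = [None] * len(edges)
    for vertex in set(incoming) | set(outgoing):
        if len(incoming[vertex]) != len(outgoing[vertex]):
            raise ValueError("in-degree and out-degree differ at %r" % (vertex,))
        for edge_in, edge_out in zip(incoming[vertex], outgoing[vertex]):
            successor[edge_in] = edge_out

    # Identify circuits by following successor pointers.
    circuit = [None] * len(edges)
    circuit_count = 0
    for start in range(len(edges)):
        if circuit[start] is None:
            edge = start
            while circuit[edge] is None:
                circuit[edge] = circuit_count
                edge = successor[edge]
            circuit_count += 1

    # Union-find over circuits; a successful union is one spanning-tree edge.
    parent = list(range(circuit_count))

    def find(label):
        while parent[label] != label:
            parent[label] = parent[parent[label]]
            label = parent[label]
        return label

    # Two circuits are adjacent when they enter the same vertex. Swapping the
    # successors of one incoming edge from each merges them into one circuit.
    for ins in incoming.values():
        for first, second in zip(ins, ins[1:]):
            root_a, root_b = find(circuit[first]), find(circuit[second])
            if root_a != root_b:
                successor[first], successor[second] = successor[second], successor[first]
                parent[root_a] = root_b

    # Read off the single remaining circuit, starting from edge 0.
    tour, edge = [0], successor[0]
    while edge != 0:
        tour.append(edge)
        edge = successor[edge]
    return tour


example_edges = [(0, 1), (1, 2), (2, 0), (0, 3), (3, 4), (4, 0), (3, 5), (5, 3)]
print([example_edges[i] for i in euler_tour(example_edges)])
```

Swaps are only applied to edges that currently lie on different circuits, so each swap merges two circuits and never splits one; that is the property the spanning tree provides in the slide's formulation.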
Decomposition of Different Phases
Phase                                     Computation
I/O and k-mer Extraction                  CPU + GPU
Hash Table Construction                   GPU
de Bruijn Graph Construction              GPU
Euler Tour Construction                   GPU + CPU
  Sub-steps for Euler Tour Construction
    Finding Connected Component           GPU
    Circuit Graph Creation                GPU
    Spanning Tree                         CPU
    Swipe Execution                       GPU
    Traversal (Other)                     GPU
Contig Generation (O/P)                   CPU
Time Complexity Analysis
Step                            Complexity    Processors
de-Bruijn Graph Construction    O(1)          O(n)
Euler Tour Construction         O(log n)      O(n)
Spanning Tree                   O(log |V|)    O(|V|)
GPU-Euler                       O(log n)      O(n)
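To give a feel for where a logarithmic step count with O(n) processors can come from, here is a sketch of pointer jumping, a standard PRAM technique, applied to labelling the circuits of a successor array. Whether GPU-Euler's kernels use exactly this scheme is an assumption; the slides state only the bounds. The name label_circuits is hypothetical, and each list comprehension stands for one synchronous step executed by n processors.

```python
import math


def label_circuits(successor):
    """Label each element with the smallest index on its circuit.

    `successor` must be a permutation of range(n). Returns (labels, rounds),
    where rounds is the number of synchronous parallel steps performed.
    """
    n = len(successor)
    label = list(range(n))
    pointer = list(successor)
    rounds = math.ceil(math.log2(n)) if n > 1 else 0
    for _ in range(rounds):
        # After r rounds, label[i] is the minimum over 2**r consecutive
        # elements of the circuit starting at i, and pointer[i] is 2**r
        # positions ahead of i.
        label = [min(label[i], label[pointer[i]]) for i in range(n)]
        pointer = [pointer[pointer[i]] for i in range(n)]
    return label, rounds


# Three circuits: (0 1 2), (3 4), and (5).
labels, rounds = label_circuits([1, 2, 0, 4, 3, 5])
print(labels, rounds)
```

The window covered by each element doubles every round, so ceil(log2 n) rounds are enough for any circuit of length at most n.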
Experimental Protocol
Compared timing, N50 score, and mean length against EulerSR using various parameters.
Why EulerSR?
    Based on the same concept
    Shared memory approach
    Supports short reads
Contigs with length > 100 were included in the comparison.
Calculated contig coverage using MUMmer.
Individual GPU computations were timed as well.
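For reference, a short sketch of how an N50 score is commonly computed, with the length > 100 filter from the protocol applied first. The slides do not show the evaluation scripts, so the function name n50, the default threshold argument, and the sample lengths are assumptions.

```python
def n50(contig_lengths, min_length=100):
    """N50 of the contigs longer than `min_length`.

    N50 is the length L of the contig at which the running total, taken
    from the longest contig downward, first reaches half of the total
    assembled length.
    """
    kept = sorted((n for n in contig_lengths if n > min_length), reverse=True)
    half_total = sum(kept) / 2
    running = 0
    for length in kept:
        running += length
        if running >= half_total:
            return length
    return 0  # no contig passed the filter


# Hypothetical contig lengths; the 80-base contig is filtered out.
print(n50([80, 120, 200, 300, 500, 1000]))
```

A higher N50 means that more of the assembly sits in long contigs, which is why it is reported alongside timing and accuracy.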
Data Sets: genome size and number of simulated reads for different read lengths
Genome                    Length (bp)   36 bp reads   50 bp reads   256 bp reads
Campylobacter Jejuni      1,641,481     911,934       656,593       128,241
Neisseria Meningitidis    2,184,406     1,213,559     873,763       170,657
Lactococcus Lactis        2,635,589     1,314,216     946,236       184,812
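As a worked check on the table, the Campylobacter Jejuni row can be converted to sequencing depth, assuming the usual definition coverage = number of reads × read length / genome length. The slides do not state a coverage figure, so the result is derived here rather than quoted, and the function name coverage is hypothetical.

```python
def coverage(num_reads, read_length, genome_length):
    """Average number of reads covering each base of the genome."""
    return num_reads * read_length / genome_length


# Values copied from the Campylobacter Jejuni row of the table.
genome_length = 1_641_481
reads_by_length = {36: 911_934, 50: 656_593, 256: 128_241}

for read_length, num_reads in reads_by_length.items():
    print(read_length, round(coverage(num_reads, read_length, genome_length), 2))
```

For this row, all three read lengths come out at roughly the same depth, which suggests the simulated read sets were sized to hold coverage constant while varying read length.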
Results: Execution Time Comparison
Figure: Runtime comparison for Campylobacter Jejuni, Neisseria Meningitidis, and Lactococcus Lactis (three panels). x-axis: read length (36 bp, 50 bp, 256 bp); y-axis: assembly time (seconds); series: EulerSR, EulerSR*, GPU-Euler.
Results: N50 Score Comparison
Figure: N50 score comparison for Campylobacter Jejuni, Neisseria Meningitidis, and Lactococcus Lactis (three panels). x-axis: read length (36 bp, 50 bp, 256 bp); y-axis: N50 score (bases); series: EulerSR, EulerSR*, GPU-Euler.
Results: Accuracy Comparison
Figure: Weighted accuracy for Campylobacter Jejuni, Neisseria Meningitidis, and Lactococcus Lactis (three panels). x-axis: read length (36 bp, 50 bp, 256 bp); y-axis: weighted accuracy; series: EulerSR, EulerSR*, GPU-Euler.
GPU Euler Phase Distribution
Phase                                     Computation   % Time
I/O and k-mer Extraction                  CPU + GPU     77.29 + 1.44
Hash Table Construction                   GPU           0.31
de Bruijn Graph Construction              GPU           1.15
Euler Tour Construction                   GPU + CPU
  Sub-steps for Euler Tour Construction
    Finding Connected Component           GPU           10.06
    Spanning Tree                         CPU           0.06
    Swipe Execution                       GPU           0.01
    Circuit Graph & Traversal (Other)     GPU           0.72
Contig Generation (O/P)                   CPU           4.39
Summary
Exploiting GPUs for Sequence Assembly.
Implementation of PRAM algorithm on CUDA devices.
Outlook
    No error correction
    Graph simplification
Questions
Questions?
Thank you
Thank you!!