

GPU-Euler

Sequence Assembly using GPGPU

S. Mahmood, H. Rangwala

Department of Computer Science

George Mason University

International Conference on High Performance Computing & Communications, 2011

Banff, Canada

Mahmood, Rangwala GPU-Euler


Outline

1 Motivation: Genome Assembly, Previous Work, GPGPU

2 Method: Parallel Eulerian Assembly, Time Complexity Analysis, Evaluation

3 Results: Data Sets, Results


Genome

Genome: a biological blueprint.

Very long chains of four types of nucleobases:

Adenine, Guanine, Cytosine, Thymine

Important for understanding the function of the organism.

Figure: Double helix DNA representation [1]

[1] Image courtesy of Image Library of Biological Macromolecules, Jena, Germany. http://www.imb-jena.de/IMAGE.html


Sequence Assembly: Challenges

The total number of nucleobases in a genome is very large,

e.g., the human genome has about 3.2 billion base pairs.

Existing technologies can only read a small fraction of this long strand at a time.

Smaller fragments (reads) must therefore be stitched together.

Figure: Sequence Assembly [2]

[2] Image courtesy of the Center for Bioinformatics and Computational Biology, UMD. www.cbcb.umd.edu/research/assembly_primer.html


Problem Statement

Given an alphabet Σ = {A, G, C, T} and a set of strings R = {r1, r2, r3, ..., rn} over Σ:

Construct a superstring S containing every string in R as a substring.

Similar to the Shortest Common Superstring problem.

Repeats must be taken into account.

Massive volume of data.
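To make the problem statement concrete, the following Python sketch computes an exact shortest common superstring for a handful of reads by trying every read ordering and merging neighbors at their maximal suffix-prefix overlap. It only illustrates the problem being solved and is not part of GPU-Euler: the search is exponential in the number of reads, which is one reason assemblers for massive read sets turn to graph formulations instead. All function names are hypothetical.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def remove_contained(reads: list[str]) -> list[str]:
    """Drop duplicates and reads that occur inside another read."""
    uniq = list(dict.fromkeys(reads))
    return [r for r in uniq if not any(r != o and r in o for o in uniq)]


def merge_in_order(reads: tuple[str, ...]) -> str:
    """Concatenate reads in the given order, collapsing maximal overlaps."""
    s = reads[0]
    for r in reads[1:]:
        if r in s:  # already covered by what has been merged so far
            continue
        s += r[overlap(s, r):]
    return s


def shortest_superstring(reads: list[str]) -> str:
    """Exact SCS by brute force over all orderings (tiny inputs only)."""
    core = remove_contained(reads)
    return min((merge_in_order(p) for p in permutations(core)), key=len)


print(shortest_superstring(["ACGT", "GTCA", "CAGG"]))  # ACGTCAGG
```

Removing contained reads first matters: for a substring-free set, the best ordering with maximal overlaps is guaranteed to give a shortest superstring.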


Sequence Assembly: Techniques & Tools

Greedy Assemblers

VCAKE

Overlap-layout-consensus

Celera

Eulerian Path

Euler, EulerSR, Velvet, ABySS
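The greedy family listed first can be illustrated with a generic overlap-merge loop: repeatedly join the two fragments with the longest suffix-prefix overlap until no remaining overlap reaches a threshold. This Python sketch is a textbook greedy assembler, not VCAKE's actual algorithm (VCAKE works by k-mer extension); `min_overlap` and the function names are assumptions for illustration.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 2) -> list[str]:
    """Greedy overlap merging; returns the resulting contigs."""
    frags = list(dict.fromkeys(reads))
    while True:
        # Discard fragments fully contained in another fragment.
        frags = [f for f in frags if not any(f != g and f in g for g in frags)]
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n < min_overlap:
            return frags  # no sufficiently long overlap remains
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)


print(greedy_assemble(["ATGGC", "GGCAT", "CATTA"]))  # ['ATGGCATTA']
```

Because each merge is chosen locally, a repeat longer than the overlap can make the greedy choice join the wrong pair, which is one motivation for the overlap-layout-consensus and Eulerian approaches.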


General Purpose GPU Computing

GPUs for General Purpose Computing

Massive parallelism for applications.

nVidia CUDA, a framework for development on nVidia GPUs.

Similar models of parallel computation:

Parallel Random Access Machine (PRAM)

Single Instruction Multiple Data (SIMD)

Figure: CUDA Application Stack [3]

[3] Image courtesy of nVidia: nVidia CUDA Toolkit Reference Manual
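The PRAM and SIMD models assume that all processors execute the same step in lock-step. A classic example is the Hillis-Steele inclusive prefix sum, which needs only ⌈log2 n⌉ rounds when one processor is available per element. The Python sketch below simulates those synchronous rounds sequentially: each round builds a new list from the previous one, as a lock-step step would. It illustrates the model only and is not code from GPU-Euler.

```python
def hillis_steele_scan(values: list[int]) -> tuple[list[int], int]:
    """Inclusive prefix sum in PRAM style; returns (sums, number_of_rounds)."""
    x = list(values)
    n = len(x)
    step, rounds = 1, 0
    while step < n:
        # All "processors" read the previous round's array before any write,
        # so the comprehension builds the next round from the old list.
        x = [x[i] + x[i - step] if i >= step else x[i] for i in range(n)]
        step *= 2
        rounds += 1
    return x, rounds


sums, rounds = hillis_steele_scan([3, 1, 4, 1, 5, 9, 2, 6])
print(sums, rounds)  # [3, 4, 8, 9, 14, 23, 25, 31] 3
```

The trade-off is typical of PRAM-style algorithms: O(log n) time with n processors, at the cost of O(n log n) total work instead of the O(n) of a sequential loop.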


CUDA: Compute Unified Device Architecture

A CUDA-enabled device has:

Streaming Multiprocessors (SMs)

Each SM has a set of Streaming Processors (SPs).

Global memory.

Concurrent execution of the same code on all SMs.

Computations use GPU memory.

Figure: Hardware Architecture [4]

[4] Image courtesy of nVidia: nVidia CUDA Toolkit Reference Manual
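On such a device a kernel is launched as a grid of thread blocks, and each thread usually derives a global index from CUDA's built-in `blockIdx.x`, `blockDim.x`, `threadIdx.x` and `gridDim.x`. The Python sketch below mimics the common grid-stride pattern to show which (block, thread) pair would handle each item, for example each read when one thread is assigned per read. The function name and launch sizes are illustrative assumptions; on real hardware these threads run concurrently across the SMs rather than in nested loops.

```python
def grid_stride_owners(n_items: int, block_dim: int, grid_dim: int) -> dict[int, tuple[int, int]]:
    """Map each item index to the (blockIdx.x, threadIdx.x) that would process it."""
    owners = {}
    stride = block_dim * grid_dim  # total number of threads in the grid
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            global_id = block_idx * block_dim + thread_idx
            # Grid-stride loop: a thread handles global_id, global_id + stride, ...
            for item in range(global_id, n_items, stride):
                owners[item] = (block_idx, thread_idx)
    return owners


owners = grid_stride_owners(n_items=10, block_dim=4, grid_dim=2)
print(owners[5], owners[9])  # (1, 1) (0, 1)
```

The stride lets a fixed-size grid cover any number of items, so the launch configuration can be chosen to fit the SMs rather than the input size.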


Concepts

de Bruijn Graph: a directed graph whose vertices are k-length words (k-mers); an edge represents a (k−1)-length overlap between two vertices.

Contigs: assembled sequences built from the input data.

Euler Tour: a graph traversal visiting each edge exactly once.

Figure: de-Bruijn Graph
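A minimal Python sketch of the definition above: every (k+1)-length window of a read becomes an edge from its first k characters to its last k characters, so the two endpoint k-mers share a (k−1)-character overlap. Repeated edges are collapsed into a set here for readability, which is a simplification; the function name is hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Adjacency sets of a de Bruijn graph with k-mer vertices."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            window = read[i:i + k + 1]          # a (k+1)-mer, i.e. one edge
            graph[window[:-1]].add(window[1:])  # prefix k-mer -> suffix k-mer
    return dict(graph)


print(de_bruijn(["ACGTA"], k=2))
# {'AC': {'CG'}, 'CG': {'GT'}, 'GT': {'TA'}}
```

Vertices with no outgoing edge (here `TA`) appear only as targets, not as keys.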


Objective

Represent the input reads as a de Bruijn graph.

Each edge then corresponds to a single base.

An Euler tour therefore visits each base only once.
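The objective can be illustrated sequentially: treat each (k+1)-mer as an edge, find a path that uses every edge once, and spell the sequence as the first k-mer followed by the last base of every later vertex, so each edge contributes exactly one base. The sketch below uses Hierholzer's algorithm as a simple sequential stand-in; it is not GPU-Euler's parallel method, and it assumes the edges admit an Eulerian path (no sequencing errors, no coverage gaps). The function name is hypothetical.

```python
from collections import defaultdict


def spell_euler_path(edges: list[str]) -> str:
    """Spell a sequence from (k+1)-mer edges via an Eulerian path (Hierholzer)."""
    out: dict[str, list[str]] = defaultdict(list)
    balance: dict[str, int] = defaultdict(int)  # out-degree minus in-degree
    for e in edges:
        u, v = e[:-1], e[1:]
        out[u].append(v)
        balance[u] += 1
        balance[v] -= 1
    # An Eulerian path starts where out-degree exceeds in-degree by one, if anywhere.
    start = next((u for u, b in balance.items() if b == 1), edges[0][:-1])
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if out[u]:
            stack.append(out[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())    # dead end: vertex is finished
    path.reverse()
    # First k-mer in full, then one new base per traversed edge.
    return path[0] + "".join(v[-1] for v in path[1:])


print(spell_euler_path(["ATG", "TGA", "GAT", "ATG", "TGC"]))  # ATGATGC
```

The repeated `ATG` edge shows why an edge-centric tour is attractive: the repeat is simply traversed twice instead of forcing a choice between overlapping reads.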


Parallel Eulerian Assembly

Construct de-Bruijn Graph

Find Euler Tour

Output Contigs

Figure: GPU-Euler workflow. FASTA file (input) → reads → de Bruijn graph construction → graph → Euler tour construction → annotated graph → identify contigs → contigs → FASTA file (output).


Parallel de Bruijn Graph Construction

Assign each CUDA thread to one read.

Generate k-mers and (k+1)-mers.

Store them in a hash table.

Create vertices from k-mers and edges from (k+1)-mers.

Figure: Graph Construction (input: FASTA file; CUDA kernels: Setup Vertices, Count Edges, Setup Edges)
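One plausible way to realize the three kernels named in the figure is a compressed sparse row (CSR) layout: count each vertex's outgoing edges, turn the counts into offsets with an exclusive prefix sum, then write every edge into its vertex's slice. The Python sketch below runs those phases sequentially, with one loop iteration per read standing in for one CUDA thread per read and a dict standing in for the GPU hash table. The CSR layout, the function name and keeping repeated (k+1)-mers as parallel edges are assumptions for illustration, not details confirmed by the slides.

```python
from itertools import accumulate


def build_csr(reads: list[str], k: int) -> tuple[dict[str, int], list[int], list[int]]:
    """Return (vertex ids, CSR offsets, CSR edge targets) for k-mer vertices."""
    # "Setup Vertices": give every distinct k-mer an id (stand-in for the hash table).
    vid: dict[str, int] = {}
    for read in reads:
        for i in range(len(read) - k + 1):
            vid.setdefault(read[i:i + k], len(vid))

    # "Count Edges": each (k+1)-mer adds one out-edge to its prefix k-mer.
    # On a GPU, concurrent threads would need an atomic add here.
    count = [0] * len(vid)
    for read in reads:  # one iteration per read ~ one CUDA thread per read
        for i in range(len(read) - k):
            count[vid[read[i:i + k]]] += 1

    # Exclusive prefix sum: offsets[u] .. offsets[u + 1] is vertex u's edge slice.
    offsets = [0, *accumulate(count)]

    # "Setup Edges": write each edge's target vertex into its source's slice.
    targets = [0] * offsets[-1]
    cursor = offsets[:-1]  # next free slot per vertex (slicing makes a new list)
    for read in reads:
        for i in range(len(read) - k):
            u = vid[read[i:i + k]]
            targets[cursor[u]] = vid[read[i + 1:i + 1 + k]]
            cursor[u] += 1
    return vid, offsets, targets


vid, offsets, targets = build_csr(["ACGTA"], k=2)
print(vid, offsets, targets)
# {'AC': 0, 'CG': 1, 'GT': 2, 'TA': 3} [0, 1, 2, 3, 3] [1, 2, 3]
```

Splitting counting from writing lets every thread know in advance where its edges go, so no dynamic memory allocation is needed on the device.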


Parallel Euler Tour

Create an Edge Successor Graph from the de Bruijn graph.

Identify circuits in the Edge Successor Graph.

Create a Circuit Graph by identifying adjacent circuits.

Calculate a spanning tree of the Circuit Graph.

Traverse the Circuit Graph and swap the successor edges of adjacent circuits.

Figure: Parallel Euler Tour. Steps: Assign Successor, Find Component, Create Circuit Graph, Find Spanning Tree, Execute Swipe; intermediate data: de Bruijn graph, component labels, circuit graph, spanning tree, annotated graph, Euler tour.
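The five steps can be followed on a small example. The Python sketch below assumes a connected, balanced directed multigraph (every vertex has equal in- and out-degree), so an Euler circuit exists. It pairs incoming with outgoing edges at each vertex to form the successor function, labels the resulting circuits, builds a spanning tree of the circuit graph with union-find, and then, for every tree edge, exchanges the successors of two edges entering a shared vertex; each exchange merges two circuits into one. The pairing rule, the union-find spanning tree and the sequential loops are illustrative choices, not GPU-Euler's exact kernels (the slides list the spanning tree as a CPU step).

```python
def euler_circuit(edges: list[tuple[int, int]]) -> list[int]:
    """Return edge indices in Euler-circuit order for a connected, balanced digraph."""
    m = len(edges)
    incoming: dict[int, list[int]] = {}
    outgoing: dict[int, list[int]] = {}
    for e, (u, v) in enumerate(edges):
        outgoing.setdefault(u, []).append(e)
        incoming.setdefault(v, []).append(e)

    # 1. Assign Successor: pair the i-th incoming edge of v with its i-th outgoing edge.
    succ = [0] * m
    for v, ins in incoming.items():
        for e_in, e_out in zip(ins, outgoing[v]):
            succ[e_in] = e_out

    # 2. Find Component: succ is a permutation, so it splits into disjoint circuits.
    circuit = [-1] * m
    n_circuits = 0
    for start in range(m):
        if circuit[start] != -1:
            continue
        e = start
        while circuit[e] == -1:
            circuit[e] = n_circuits
            e = succ[e]
        n_circuits += 1

    # 3-4. Circuit Graph + Spanning Tree: circuits meeting at a vertex are adjacent;
    # union-find keeps only the adjacencies that join two separate groups.
    parent = list(range(n_circuits))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree_swaps: list[tuple[int, int]] = []
    for ins in incoming.values():
        for e in ins[1:]:
            a, b = find(circuit[ins[0]]), find(circuit[e])
            if a != b:
                parent[a] = b
                tree_swaps.append((ins[0], e))

    # 5. Execute Swipe: exchanging the successors of two edges that enter the same
    # vertex keeps succ valid and merges their two circuits into one.
    for e1, e2 in tree_swaps:
        succ[e1], succ[e2] = succ[e2], succ[e1]

    # Walk the single remaining circuit starting from edge 0.
    order, e = [], 0
    for _ in range(m):
        order.append(e)
        e = succ[e]
    return order


edges = [(0, 1), (1, 0), (0, 2), (2, 0), (1, 2), (2, 1)]
print(euler_circuit(edges))  # [0, 4, 5, 1, 2, 3]
```

Because the swaps follow tree edges, no exchange ever joins two edges that already lie on the same circuit, which would split it again instead of merging.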


Decomposition of Different Phases

Phase                                   Computation
I/O and k-mer Extraction                CPU + GPU
Hash Table Construction                 GPU
de Bruijn Graph Construction            GPU
Euler Tour Construction                 GPU + CPU
  Sub-steps for Euler Tour Construction:
  Finding Connected Component           GPU
  Circuit Graph Creation                GPU
  Spanning Tree                         CPU
  Swipe Execution                       GPU
  Traversal (Other)                     GPU
Contig Generation (O/P)                 CPU


Time Complexity Analysis

Step                           Complexity    Processors
de Bruijn Graph Construction   O(1)          O(n)
Euler Tour Construction        O(log n)      O(n)
Spanning Tree                  O(log |V|)    O(|V|)
GPU-Euler                      O(log n)      O(n)
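Logarithmic bounds of this kind typically rest on PRAM techniques such as pointer jumping. As one example (an assumption about how a step like Find Component can reach O(log n) time, not a description of GPU-Euler's kernel), the Python sketch below labels every element of a successor permutation with the smallest index on its circuit. Each synchronous round doubles how far along the circuit a label has looked, so ⌈log2 n⌉ rounds suffice with one processor per element. The function name is hypothetical.

```python
def label_circuits(succ: list[int]) -> tuple[list[int], int]:
    """Label each element of permutation `succ` with the minimum index on its cycle."""
    m = len(succ)
    label = list(range(m))
    nxt = list(succ)
    rounds = 0
    span = 1  # label[i] currently covers `span` consecutive elements from i
    while span < m:
        # Synchronous PRAM step: both updates read only the previous round's arrays.
        label = [min(label[i], label[nxt[i]]) for i in range(m)]
        nxt = [nxt[nxt[i]] for i in range(m)]
        span *= 2
        rounds += 1
    return label, rounds


print(label_circuits([1, 0, 3, 2, 5, 4]))  # ([0, 0, 2, 2, 4, 4], 3)
```

Once `span` reaches the number of elements it covers every circuit entirely, so the final labels identify the circuits regardless of their lengths.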


Experimental Protocol

Compared running time, N50 score and mean contig length with EulerSR, using various parameters.

Why EulerSR?

Based on the same (Eulerian path) concept

Shared-memory approach

Supports short reads

Only contigs longer than 100 bases were included in the comparison.

Contig coverage was calculated using MUMmer.

Individual GPU computations were timed as well.
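N50 is the contig length L such that contigs of length L or longer together contain at least half of the assembled bases. The Python sketch below applies the protocol's length filter (only contigs longer than 100 bases) and then computes N50 and mean contig length. It takes a plain list of contig lengths, which is an assumed input format; the function name is hypothetical.

```python
def contig_stats(lengths: list[int], min_len: int = 100) -> tuple[int, float]:
    """Return (N50, mean length) over contigs strictly longer than min_len."""
    kept = sorted((n for n in lengths if n > min_len), reverse=True)
    if not kept:
        return 0, 0.0
    total = sum(kept)
    running = 0
    n50 = kept[-1]
    for n in kept:
        running += n
        if 2 * running >= total:  # reached half of all assembled bases
            n50 = n
            break
    return n50, total / len(kept)


print(contig_stats([150, 200, 300, 400, 500, 90]))  # (400, 310.0)
```

Unlike the mean, N50 is weighted toward the long contigs, so many short fragments do not drag it down as much.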


Data Sets: genome size and number of simulated reads for different read lengths

Genome                   Length (bp)   36 bp reads   50 bp reads   256 bp reads
Campylobacter jejuni     1,641,481     911,934       656,593       128,241
Neisseria meningitidis   2,184,406     1,213,559     873,763       170,657
Lactococcus lactis       2,365,589     1,314,216     946,236       184,812
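The read counts can be related to genome size through average sequencing depth: coverage = (number of reads × read length) / genome length. The Python sketch below applies this formula to the first two rows of the table; by this arithmetic each simulated read set corresponds to roughly 20× coverage at every read length. The formula ignores sequencing errors and uneven sampling, and the names are hypothetical.

```python
GENOMES = {
    # genome: (genome length in bp, {read length in bp: number of reads})
    "Campylobacter jejuni": (1_641_481, {36: 911_934, 50: 656_593, 256: 128_241}),
    "Neisseria meningitidis": (2_184_406, {36: 1_213_559, 50: 873_763, 256: 170_657}),
}


def coverage(genome_len: int, read_len: int, n_reads: int) -> float:
    """Average depth: total sequenced bases divided by genome length."""
    return n_reads * read_len / genome_len


for name, (genome_len, reads) in GENOMES.items():
    for read_len, n_reads in reads.items():
        print(f"{name}, {read_len} bp: {coverage(genome_len, read_len, n_reads):.2f}x")
```

Holding coverage fixed while varying read length means fewer reads are needed for longer reads, which is consistent with the shrinking read counts across each row.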


Results: Execution Time Comparison

Figure: Runtime comparison for Campylobacter jejuni (assembly time in seconds vs. read length: 36 bp, 50 bp, 256 bp; series: EulerSR, EulerSR*, GPU-Euler)

Figure: Runtime comparison for Neisseria meningitidis (same axes and series)

Figure: Runtime comparison for Lactococcus lactis (same axes and series)


Results: N50 Score Comparison

Figure: N50 score comparison for Campylobacter jejuni (N50 score in bases vs. read length: 36 bp, 50 bp, 256 bp; series: EulerSR, EulerSR*, GPU-Euler)

Figure: N50 score comparison for Neisseria meningitidis (same axes and series)

Figure: N50 score comparison for Lactococcus lactis (same axes and series)


Results: Accuracy Comparison

Figure: Weighted accuracy for Campylobacter jejuni (weighted accuracy vs. read length: 36 bp, 50 bp, 256 bp; series: EulerSR, EulerSR*, GPU-Euler)

Figure: Weighted accuracy for Neisseria meningitidis (same axes and series)

Figure: Weighted accuracy for Lactococcus lactis (same axes and series)


GPU Euler Phase Distribution

Phase                                   Computation   % Time
I/O and k-mer Extraction                CPU + GPU     77.29 + 1.44
Hash Table Construction                 GPU           0.31
de Bruijn Graph Construction            GPU           1.15
Euler Tour Construction                 GPU + CPU
  Sub-steps for Euler Tour Construction:
  Finding Connected Component           GPU           10.06
  Spanning Tree                         CPU           0.06
  Swipe Execution                       GPU           0.01
  Circuit Graph & Traversal (Other)     GPU           0.72
Contig Generation (O/P)                 CPU           4.39


Summary

Exploiting GPUs for Sequence Assembly.

Implementation of a PRAM algorithm on CUDA devices.

Outlook

No Error Correction

Graph Simplification


Questions

Questions?


Thank you

Thank you!!
