Revolutionizing the Datacenter Accelerating Genome Assembly with Power8 Seung-Jong Park, Ph.D. School of EECS, CCT, Louisiana State University Join the Conversation #OpenPOWERSummit

Accelerating Genome Assembly with Power8 · 2/14/2016 · Slides dated 4/1/2016


Revolutionizing the Datacenter

Join the Conversation #OpenPOWERSummit

Accelerating Genome Assembly with Power8

Seung-Jong Park, Ph.D.

School of EECS, CCT, Louisiana State University



Agenda

The Genome Assembly Problem

Accelerating Graph Construction with POWER8

Accelerating Graph Simplification with CAPI Flash


The Genome Assembly Problem


NGS Technologies Outpaced Moore’s Law

Challenges for Genome Assemblers

  • Software with Extreme Scalability

  • HPC Platform
    • More Compute Cycles
    • Extreme I/O Performance
    • Huge Storage Space

[Figure: Genome → NGS → Reads (TBs) → HPC (Data and Compute Intensive) → Re-constructed Genome (MBs/GBs)]


MapReduce-based Graph Construction

[Figure: Map tasks slide a window over each read and emit (4-mer:next base) pairs, with N marking a k-mer at the end of a read. Reduce tasks merge the pairs that share a k-mer, producing entries such as TAGA:G,T, TAGT:C and ACAG:N.]
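The map and reduce steps in this figure can be sketched in a few lines of Python. This is a minimal single-process illustration, not the presenters' Hadoop code. The reads are the example reads from the later "Graph Construction on IBM CAPI Flash" slide, and the rule that N is kept only when a k-mer has no other next base is an assumption inferred from the slide's reduce output. The function names are hypothetical.

```python
from collections import defaultdict

K = 4  # k-mer length used in the slides' toy example

# Example reads as they appear on the CAPI Flash graph construction slide.
READS = [
    "TAGTCGAGGCT", "GGCTTTAGATC", "TGAGGCTTTAG",
    "TTTAGAGACAG", "GATCCGATGAG", "TAGTCGAGGCT",
]

def map_read(read, k=K):
    """Map step: emit (k-mer, next base) pairs; 'N' marks the end of a read."""
    for i in range(len(read) - k + 1):
        nxt = read[i + k] if i + k < len(read) else "N"
        yield read[i:i + k], nxt

def reduce_pairs(pairs):
    """Reduce step: merge next bases per k-mer (assumed rule: drop 'N' if a real base exists)."""
    grouped = defaultdict(set)
    for kmer, nxt in pairs:
        grouped[kmer].add(nxt)
    graph = {}
    for kmer, bases in grouped.items():
        real = sorted(bases - {"N"})
        graph[kmer] = real if real else ["N"]
    return graph

graph = reduce_pairs(pair for read in READS for pair in map_read(read))
for kmer in sorted(graph):
    print(f"{kmer}:{','.join(graph[kmer])}")
```

On these six reads the sketch yields 27 distinct k-mers, including the branching entry TAGA:G,T shown in the reduce output.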


Accelerating Graph Construction with POWER8


Experimental Test Beds

| System Type | IBM PKY Cluster | LSU SuperMikeII |
|---|---|---|
| Processor | Two 10-core IBM Power8 | Two 8-core Intel SandyBridge Xeon |
| Maximum #Nodes used in various experiments | 40 | 120 |
| #Physical cores/node | 20 (8-way simultaneous multithreading) | 16 (Hyper-Threading disabled) |
| #vcores/node | 160 | 16 |
| RAM/node (GB) | 256 | 32 |
| #Disks/node | 5 | 3 |
| #Disks/node used for shuffled data | 3 | 1 |
| Total storage space/node used for shuffled data | 1.8 | 0.5 |
| Network | 56Gbps InfiniBand (non-blocking) | 40Gbps InfiniBand (2:1 blocking) |


Datasets

| Genome data set | Input size | Shuffle data size | Output size |
|---|---|---|---|
| Rice genome | 12GB | 70GB | 50GB |
| Bumble bee genome | 90GB | 600GB | 95GB |
| Metagenome | 3.2TB | 20TB | 8.6TB |


Hadoop Configurations

| Hadoop Parameters | IBM Power8 | SuperMikeII |
|---|---|---|
| Yarn.nodemanager.cpu.resource.vcore | 120 | 16 |
| Yarn.nodemanager.memory.mb | 231000 | 29000 |
| Mapreduce.map/reduce.cpu.vcore | 4 | 2 |
| Mapreduce.map/reduce.memory.mb | 7000 | 3500 |
| Mapreduce.map/reduce.java.opts | 6500m | 3000m |
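As a rough check of what these settings imply, the sketch below estimates how many map or reduce tasks can run concurrently on one node. It assumes the scheduler enforces both the vcore and the memory limits, which depends on the YARN resource calculator in use; with memory-only scheduling, only the memory bound would apply. The slide's parameter labels appear to be shorthand for the standard properties (for example `yarn.nodemanager.resource.cpu-vcores` and `mapreduce.map.memory.mb`); that mapping is an assumption, and the dictionary keys below are hypothetical names.

```python
# Values copied from the Hadoop configuration slide.
NODES = {
    "IBM Power8": {"node_vcores": 120, "node_mem_mb": 231000,
                   "task_vcores": 4, "task_mem_mb": 7000},
    "SuperMikeII": {"node_vcores": 16, "node_mem_mb": 29000,
                    "task_vcores": 2, "task_mem_mb": 3500},
}

def max_tasks_per_node(cfg):
    """Upper bound on concurrent tasks: the tighter of the CPU and memory limits."""
    by_cpu = cfg["node_vcores"] // cfg["task_vcores"]
    by_mem = cfg["node_mem_mb"] // cfg["task_mem_mb"]
    return min(by_cpu, by_mem)

for name, cfg in NODES.items():
    print(name, max_tasks_per_node(cfg))
```

Under these assumptions a Power8 node is CPU-bound at 30 concurrent tasks (memory would allow 33), while a SuperMikeII node fits 8 tasks on both counts.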


Hadoop Scalability with POWER8 SMTs

Tested with the small rice genome data set on 2 nodes

Almost linear scalability with increasing SMTs


Rice Genome

Analyzing the small (12GB) rice genome data set

Eliminates the impact of network and disk I/O

7.5x performance improvement per server


Bumble Bee Genome

Analyzing the medium-size (90GB) bumble bee genome

7.5x improvement in performance per server


Metagenome

Analyzing huge (3.2TB) metagenome data

Only 6.5 hours on 40-node IBM Power8 cluster

More than 9x improvement in terms of performance per server


Graph Simplification with Distributed NoSQL

Graph entries (k-mer:next bases):

TAGA:G,T TAGT:C TCCG:A TCGA:G TGAG:G TTAG:A TTTA:G
ACAG:N AGAC:A AGAG:A AGAT:C AGGC:T AGTC:G ATCC:G ATGA:G
GACA:G GAGA:C GAGG:C GATC:C GATG:A GCTT:T GGCT:T GTCG:A
CCGA:T CGAG:G CGAT:G CTTT:A

Simplified paths: TAGTCGAG, GAGGCTTTAGA
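One way to read this slide is that simplification compresses every chain of k-mers with a single way in and a single way out into one longer sequence. The sketch below does this in memory on the slide's graph. It is an illustration under that assumption, not the presenters' distributed NoSQL implementation, and the function names are hypothetical.

```python
from collections import Counter

# Graph entries from the slide: k-mer -> possible next bases ('N' = none).
GRAPH_TEXT = """
TAGA:G,T TAGT:C TCCG:A TCGA:G TGAG:G TTAG:A TTTA:G
ACAG:N AGAC:A AGAG:A AGAT:C AGGC:T AGTC:G ATCC:G ATGA:G
GACA:G GAGA:C GAGG:C GATC:C GATG:A GCTT:T GGCT:T GTCG:A
CCGA:T CGAG:G CGAT:G CTTT:A
"""

def parse(text):
    """Turn 'KMER:B1,B2' entries into {kmer: [next bases]}, dropping the 'N' marker."""
    graph = {}
    for entry in text.split():
        kmer, bases = entry.split(":")
        graph[kmer] = [b for b in bases.split(",") if b != "N"]
    return graph

def simplify(graph):
    """Compress non-branching chains of k-mers into longer sequences."""
    succ = {k: [k[1:] + b for b in bases] for k, bases in graph.items()}
    indeg = Counter(v for vs in succ.values() for v in vs)
    # A node continues a chain if it has exactly one predecessor and that
    # predecessor has exactly one successor.
    chained = {vs[0] for vs in succ.values() if len(vs) == 1 and indeg[vs[0]] == 1}
    paths = []
    # Every other node starts a new path (isolated cycles are not handled here).
    for start in sorted(k for k in graph if k not in chained):
        path, cur = start, start
        while len(succ.get(cur, [])) == 1 and succ[cur][0] in chained:
            cur = succ[cur][0]
            path += cur[-1]
        paths.append(path)
    return paths

print(simplify(parse(GRAPH_TEXT)))
```

On the slide's graph this produces four paths, two of which are the sequences shown on the slide (TAGTCGAG and GAGGCTTTAGA). The other two are the branches that leave TAGA.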


Accelerating Simplification with IBM CAPI Flash

[Charts: NoSQL I/O throughput (keys/sec) and CAPI Flash I/O throughput (bytes/sec)]

Only 20 Power8 cores + CAPI: 500GB graph traversal in 7.5 hrs


Computational Challenges – The Next Step

Graph building is the most expensive phase in terms of time and resources

The Obvious Solutions: Either use a single machine with LOTS of memory, or run on a cluster.

Idea: Use CAPI accelerated flash instead of main memory


Graph Construction on IBM CAPI Flash

[Figure: The input reads (TAGTCGAGGCT, GGCTTTAGATC, TGAGGCTTTAG, TTTAGAGACAG, GATCCGATGAG, TAGTCGAGGCT) pass through a Map step that emits (4-mer:next base) pairs such as TAGT:C and GGCT:N. Sort steps then order the pairs by k-mer and merge duplicates (for example TAGA:G,T), using the NoSQL data engine APIs.]
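The Map and Sort stages in this figure resemble an external sort: pairs are sorted in small runs and then merged in a stream, so only a little data needs to sit in memory at once. The sketch below illustrates that idea with in-memory lists standing in for sorted runs on flash. The slide does not describe how the NoSQL data engine APIs are actually called, so the run size, the function names and the merge rule for N are all assumptions.

```python
import heapq
from itertools import groupby

K = 4  # k-mer length used in the slide's example

def map_read(read, k=K):
    """Emit (k-mer, next base) pairs; 'N' marks the end of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], (read[i + k] if i + k < len(read) else "N")

def sorted_runs(reads, run_size):
    """Sort mapped pairs in small batches (stand-ins for sorted runs kept on flash)."""
    for i in range(0, len(reads), run_size):
        batch = [pair for read in reads[i:i + run_size] for pair in map_read(read)]
        yield sorted(batch)

def merge_runs(runs):
    """Stream a k-way merge over the runs and combine pairs that share a k-mer."""
    for kmer, group in groupby(heapq.merge(*runs), key=lambda pair: pair[0]):
        bases = sorted({base for _, base in group} - {"N"})
        yield kmer, bases or ["N"]

reads = ["TAGTCGAGGCT", "GGCTTTAGATC", "TGAGGCTTTAG",
         "TTTAGAGACAG", "GATCCGATGAG", "TAGTCGAGGCT"]
graph = dict(merge_runs(sorted_runs(reads, run_size=2)))
for kmer, bases in graph.items():
    print(f"{kmer}:{','.join(bases)}")
```

Because `heapq.merge` consumes the runs lazily, the merge stage holds only one pending pair per run at a time, which is the property that makes a flash-backed variant attractive when main memory is the bottleneck.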


Initial Results of Graph Construction

Compared on the 85GB bumblebee dataset: an 8-node Hadoop cluster vs. a single node with CAPI-accelerated flash.

Hadoop Cluster (20 physical cores per node)

  • Peak memory usage of 60GB per datanode

  • 1 HDD per datanode

  • 1 hr 56 mins

CAPI Accelerated Flash server (20 physical cores)

  • Peak memory usage of 7GB

  • 1 HDD and 1 CAPI card

  • 3 hrs 44 mins

Summary

  • Peak memory usage reduced by 60 times.

  • Execution time reduced by 3.5 times per node.
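The slide does not say how the two summary ratios were derived. One plausible reading, assumed here, is that both are normalized across nodes: aggregate peak memory over the 8 datanodes versus the single CAPI server, and node-minutes of execution on each system. The arithmetic under that assumption:

```python
# Figures copied from the slide.
hadoop_nodes = 8
hadoop_minutes = 1 * 60 + 56        # 1 hr 56 mins
hadoop_peak_gb_per_node = 60

capi_nodes = 1
capi_minutes = 3 * 60 + 44          # 3 hrs 44 mins
capi_peak_gb = 7

# Assumed normalization: totals across all nodes of each system.
memory_ratio = (hadoop_nodes * hadoop_peak_gb_per_node) / capi_peak_gb
node_time_ratio = (hadoop_nodes * hadoop_minutes) / (capi_nodes * capi_minutes)

print(f"aggregate peak memory ratio: {memory_ratio:.1f}x")
print(f"node-minutes ratio: {node_time_ratio:.1f}x")
```

This gives roughly 69x for memory and 4.1x for node-minutes, the same order as the slide's 60x and 3.5x but not identical, so the presenters may have used a somewhat different normalization or rounding.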