Dependency-Driven Disk-based Graph Processing
Keval Vora
Simon Fraser University
https://github.com/pdclab/lumos
USENIX ATC'19 - Renton, WA, July 11, 2019
Graph Processing
Iterative Graph Processing
[Figure: PageRank computation on a six-vertex example graph (a-f), with vertex values updated from iteration t to iteration t+1.]
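The iterative computation on this slide can be written as a plain bulk synchronous PageRank loop, where every iteration-(t+1) value is computed only from iteration-t values. A minimal sketch: the toy edge list below is an illustrative stand-in (the slide's exact edges are not given), and the damping factor 0.85 is the conventional choice, not a value from the talk.

```python
# Minimal synchronous PageRank: values for iteration t+1 are computed
# only from values of iteration t (bulk synchronous parallel semantics).
def pagerank(edges, num_vertices, iterations=10, d=0.85):
    out_deg = [0] * num_vertices
    for u, v in edges:
        out_deg[u] += 1
    rank = [1.0 / num_vertices] * num_vertices
    for _ in range(iterations):
        contrib = [0.0] * num_vertices
        for u, v in edges:  # stream every edge once per iteration
            contrib[v] += rank[u] / out_deg[u]
        rank = [(1 - d) / num_vertices + d * c for c in contrib]
    return rank

# Toy 6-vertex graph (a..f mapped to 0..5); every vertex has out-degree >= 1.
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 2)]
ranks = pagerank(edges, 6)
```

Because every vertex here has at least one out-edge, the rank mass is conserved and the values sum to 1 after each iteration.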
Out-of-Core Graph Processing
[Figure: the vertex set V = {a, b, c, d, e, f} is split into partitions P0 and P1 that reside on disk; each iteration loads, processes, and writes back one partition at a time, alternating I/O and compute phases.]
I/O accounts for 69-90% of execution time.
Lumos
• Out-of-Order processing
• Amortize I/O across multiple iterations
• Guarantee Bulk Synchronous Parallel semantics
• Dependency-Driven Cross-Iteration value propagation
• Safely compute future values
Computing Future Values
[Figure: while iteration t streams partitions P0 and P1, future vertex values VF for iteration t+1 are computed alongside; vertices whose future dependencies are not met must wait for the next iteration.]
[Figure: percentage of vertices computing future values and percentage of edges they contribute, under Chunk, Cyclic, and Random partitioning with 16, 64, and 256 partitions, on UK (1B edges, 39.5M vertices), TW (1.5B, 41.7M), TT (2B, 52.6M), and FT (2.5B, 68.3M).]
30-45% of vertices compute future values, yet they contribute less than 4% of edges.
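The dependency condition behind this slide can be sketched as a simple predicate: a vertex can have its iteration-(t+1) value computed during iteration t only if every in-neighbor's iteration-t value is final by then, i.e. no in-edge arrives from a partition processed later than its own. The partitioning and edge list below are hypothetical illustrations, not the slide's graph.

```python
# Sketch: a vertex's future (iteration t+1) value can safely be computed
# during iteration t only if all of its cross-iteration dependencies are met,
# i.e. every in-edge comes from a partition processed no later than its own.
def future_safe_vertices(edges, partition_of):
    unsafe = set()
    for u, v in edges:
        if partition_of[u] > partition_of[v]:
            unsafe.add(v)  # v depends on a later partition: future value unsafe
    return set(partition_of) - unsafe

# Hypothetical layout: P0 = {0, 1, 2}, P1 = {3, 4, 5}.
partition_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
edges = [(0, 3), (1, 2), (4, 0), (2, 5)]
safe = future_safe_vertices(edges, partition_of)  # vertex 0 is not safe: edge (4, 0)
```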
Graph Computation
[Figure: partial aggregation of edge contributions, finalized by the vertex function f.]
Cross-Iteration Value Propagation
[Figure: partial aggregates built from future values during iteration t are merged with contributions streamed in iteration t+1, eliminating the I/O for the corresponding edges.]
[Figure: percentage of edges over which future values are computed, under Chunk, Cyclic, and Random partitioning with 16, 64, and 256 partitions, on UK, TW, TT, and FT.]
Future values are computed across more than 45% of edges.
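The merge that makes cross-iteration propagation valid can be sketched directly: when the aggregation is associative and commutative (a sum here, for illustration), contributions pushed early as future values during iteration t combine with contributions streamed in iteration t+1 in any order. Vertex names and values below are illustrative, and the function name is mine, not a Lumos API.

```python
# Sketch: merge the partial aggregate built from future values (pushed in
# iteration t) with the partial aggregate streamed in iteration t+1.
# Any associative, commutative aggregation (here: sum) permits this merge.
def merge_partials(from_future, from_stream):
    merged = dict(from_stream)
    for v, contribution in from_future.items():
        merged[v] = merged.get(v, 0.0) + contribution
    return merged

from_future = {"d": 0.2}             # propagated early, without reloading edges
from_stream = {"d": 0.1, "f": 0.3}   # streamed normally in iteration t+1
merged = merge_partials(from_future, from_stream)
```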
Intra-Partition Propagation
[Figure: values are additionally propagated along edges within a partition during iteration t, further reducing the edges that must be loaded in iteration t+1.]
Lumos: Cross-Iteration Value Propagation
• Propagation across multiple future iterations
• Synergy with Dynamic Shards [ATC'16]
• Detailed comparison with out-of-order techniques
• Generalized graph layout & partitioning strategies
• Cross-iteration propagation across different partition sizes
• Interplay with selective scheduling
• Lumos for Asynchronous Algorithms
Lumos for Asynchronous Algorithms
[Figure: shortest paths computation on the example graph, iteration t to iteration t+1.]
• Monotonic selection (KickStarter [ASPLOS'17])
• Partial aggregations merged directly
• Partial values safe to be propagated
• Any change can be propagated immediately
• Intra-partition propagation based on degree of asynchrony (ASPIRE [OOPSLA'14])
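For a monotonic algorithm such as shortest paths, the selection function is min(), so a partial value is safe to propagate the moment it changes: any later, smaller value can only tighten the result, never invalidate it. A minimal sketch of this immediate-propagation style, on an illustrative weighted graph:

```python
# Sketch: shortest paths with monotonic min() selection. Because updates are
# monotonic, any change can be propagated immediately without waiting for an
# iteration barrier; re-streaming until a fixed point yields correct distances.
def sssp(weighted_edges, source, num_vertices):
    INF = float("inf")
    dist = [INF] * num_vertices
    dist[source] = 0.0
    changed = True
    while changed:                      # keep propagating until fixed point
        changed = False
        for u, v, w in weighted_edges:
            if dist[u] + w < dist[v]:   # monotonic: distances only decrease
                dist[v] = dist[u] + w
                changed = True
    return dist

# Illustrative graph: the longer direct edge (0, 2, 5.0) is beaten by 0->1->2.
weighted_edges = [(0, 1, 2.0), (1, 2, 1.0), (0, 2, 5.0)]
dist = sssp(weighted_edges, source=0, num_vertices=3)
```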
Lumos System
• Grid layout based on GridGraph [ATC'15]
[Figure: edges arranged in a source x target grid of partitions; future values are propagated for upper-triangle and diagonal edges, so only lower-triangle edges are loaded in iteration t+1.]
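The grid classification on this slide reduces to one comparison per edge: with partitions processed in order, an edge whose source partition is at most its target partition lies in the upper triangle or on the diagonal and can carry a future value forward; only the remaining lower-triangle edges must be reloaded in iteration t+1. The layout and edges below are hypothetical.

```python
# Sketch: split edges by grid position. Upper-triangle + diagonal edges
# (source partition <= target partition) can propagate future values;
# only lower-triangle edges need to be loaded again next iteration.
def split_grid_edges(edges, partition_of):
    upper, lower = [], []
    for u, v in edges:
        (upper if partition_of[u] <= partition_of[v] else lower).append((u, v))
    return upper, lower

partition_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}  # P0 = {0,1,2}, P1 = {3,4,5}
edges = [(0, 3), (1, 2), (4, 0), (2, 5), (5, 3)]
upper, lower = split_grid_edges(edges, partition_of)
```

Here only the single lower-triangle edge (4, 0) would need reloading; the other four ride along with future values.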
Light-weight Degree-Aware Partitioning
• HOF: Highest Out-Degree First
• HIL: Highest In-Degree Last
• HRF: Highest Out-Degree to In-Degree Ratio First
Goal: maximize edges in the upper triangle.
[Figure: percentage of edges eligible for cross-iteration propagation under HOF, HIL, and HRF with 16-256 partitions, on TT (2B edges, 52.6M vertices), FT (2.5B, 68.3M), and YH (6.6B, 1.4B).]
Cross-iteration propagation covers 72-93% of edges.
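The HOF heuristic can be sketched as a plain reordering: placing high out-degree vertices earliest pushes their many out-edges forward, into the upper triangle. The hub graph below and the function names are illustrative assumptions, not the Lumos implementation.

```python
# Sketch of HOF (Highest Out-Degree First): order vertices by descending
# out-degree so that edges tend to run from earlier to later positions,
# i.e. into the upper triangle eligible for cross-iteration propagation.
def hof_order(edges, num_vertices):
    out_deg = [0] * num_vertices
    for u, _ in edges:
        out_deg[u] += 1
    order = sorted(range(num_vertices), key=lambda v: -out_deg[v])
    return {v: i for i, v in enumerate(order)}  # new position of each vertex

def upper_triangle_fraction(edges, position):
    forward = sum(1 for u, v in edges if position[u] <= position[v])
    return forward / len(edges)

# Illustrative hub graph: vertex 5 has four out-edges but sits last, so the
# identity order leaves most edges in the lower triangle; HOF moves it first.
edges = [(5, 0), (5, 1), (5, 2), (5, 3), (0, 5)]
identity = {v: v for v in range(6)}
hof = hof_order(edges, 6)
```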
Experimental Setup
• Graph algorithms: PageRank (weighted and unweighted), Belief Propagation, Co-Training Expectation Maximization, Dispersion, Label Propagation
• Performance on a single disk: h1.2xlarge (278MB/sec HDD read bandwidth)
• Scaling I/O:
• d2.4xlarge: 195-768MB/sec read bandwidth over 1-4 HDDs
• i3.8xlarge: 1.2-4.1GB/sec read bandwidth over 1-4 SSDs
Performance
[Figure: execution time of LUMOS normalized to GridGraph for PR, CoEM, DP, BP, WPR, and LP on TT (2B edges, 52.6M vertices) and YH (6.6B, 1.4B), broken down into read and compute time; h1.2xlarge, 278MB/sec HDD read bandwidth.]
Scaling I/O
[Figure: execution time over 1-4 drives (HDD and SSD) on RMAT29 (8.6B edges, 537M vertices).]
Read bandwidth by device (single drive; RAID-0 with k = 2, 3, 4 drives):
• d2.4xlarge HDD: 195MB/s; 368MB/s; 590MB/s; 768MB/s
• i3.8xlarge SSD: 1.2GB/s; 3.8GB/s; 4.1GB/s; 3.9GB/s
Detailed Evaluation of Lumos
[Figure: disk read throughput (MB/s) and I/O wait time (ms) over time for 1-16 threads; execution time and space of degree-ordered layouts (LBL) normalized to GridGraph (GG) on TW, TT, FT, and YH; per-algorithm read/compute breakdown for PR, CoEM, DP, BP, WPR, and LP.]
Conclusion
• Dependency-driven cross-iteration value propagation
• Out-of-order processing to reduce I/O
• Guarantees Bulk Synchronous Parallel semantics
• A generic technique that fundamentally eliminates the I/O performance barrier
github.com/pdclab/lumos