X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Page 1: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Amitabha Roy
Ivo Mihailovic
Willy Zwaenepoel

Page 2: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Graphs
• Interesting information is encoded as graphs
• Graphs + HyperANF, Pagerank, ALS, …

Page 3: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Big Graphs
• Large graphs are a subset of the big data problem
• Billions of vertices and edges, hundreds of gigabytes
• Normally tackled on large clusters
  • Pregel, Giraph, Graphlab …
  • Complexity, Power consumption …

• Can we do large graphs on a single machine?

Page 4: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

X-Stream

Process large graphs on a single machine

1U server = 64 GB RAM + 2 x 200 GB SSD + 3 x 3 TB drive

Page 5: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Approach
• Problem: Graph traversal = random access
• Random access is inefficient for storage
  • Disk (500X slower)
  • SSD (20X slower)
  • RAM (2X slower)

Solution: X-Stream makes graph accesses sequential

Page 6: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Contributions
• Edge-centric scatter gather model
• Streaming partitions

Page 7: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Standard Scatter Gather
• Edge-centric scatter gather is based on standard scatter gather
• Popular graph processing model
  • Pregel [Google, SIGMOD 2010] … Powergraph [OSDI 2012]

Page 8: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Standard Scatter Gather
• State stored in vertices
• Vertex operations
  • Scatter updates along outgoing edges
  • Gather updates from incoming edges

[Figure: a vertex V in the scatter phase and a vertex V in the gather phase]

Page 9: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Standard Scatter Gather

[Figure: BFS on an example graph with vertices 1 to 8]

Page 10: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Vertex-Centric Scatter Gather
• Iterates over vertices

Scatter:

    for each vertex v
        if v has update
            for each edge e from v
                scatter update along e

• Standard scatter gather is vertex-centric
• Does not work well with storage

Page 11: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Vertex-Centric Scatter Gather

[Figure: BFS on the example graph (vertices 1 to 8); the vertex array V (1 to 8) reaches the edge list through a lookup index]

SOURCE  DEST
1       3
1       5
2       7
2       4
3       2
3       8
4       3
4       7
4       8
5       6
6       1
8       5
8       6

Page 12: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Transformation

Vertex-Centric Scatter:

    for each vertex v
        if v has update
            for each edge e from v
                scatter update along e

Edge-Centric Scatter:

    for each edge e
        if e.src has update
            scatter update along e
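The two scatter loops can be compared on the example graph from the slides. The following is a minimal Python sketch, not X-Stream's implementation: the function names, the dictionary-based vertex state, and the shared `gather` helper are assumptions made for illustration. BFS levels are used as the vertex state.

```python
from collections import defaultdict

# Edge list of the 8-vertex example graph from the slides (source, dest).
EDGES = [(1, 3), (1, 5), (2, 7), (2, 4), (3, 2), (3, 8), (4, 3),
         (4, 7), (4, 8), (5, 6), (6, 1), (8, 5), (8, 6)]

def gather(level, updates):
    """Apply updates to unvisited vertices; return the newly reached set."""
    reached = set()
    for dst, new_level in updates:
        if dst not in level:
            level[dst] = new_level
            reached.add(dst)
    return reached

def bfs_vertex_centric(edges, root):
    # Vertex-centric scatter needs an index from a vertex to its out-edges.
    index = defaultdict(list)
    for src, dst in edges:
        index[src].append(dst)
    level, frontier = {root: 0}, {root}
    while frontier:
        updates = []
        for v in frontier:              # for each vertex v that has an update
            for dst in index[v]:        # for each edge e from v (index lookup)
                updates.append((dst, level[v] + 1))
        frontier = gather(level, updates)
    return level

def bfs_edge_centric(edges, root):
    level, frontier = {root: 0}, {root}
    while frontier:
        updates = []
        for src, dst in edges:          # for each edge e (one sequential scan)
            if src in frontier:         # if e.src has an update
                updates.append((dst, level[src] + 1))
        frontier = gather(level, updates)
    return level

print(bfs_vertex_centric(EDGES, 1) == bfs_edge_centric(EDGES, 1))
```

Both functions compute the same levels. The edge-centric version never builds an index and gives the same answer for any ordering of `EDGES`, at the price of scanning every edge in every iteration.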

Page 13: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

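Edge-Centric Scatter Gather

[Figure: BFS on the example graph (vertices 1 to 8); the edge list is streamed against the vertex array V (1 to 8)]

SOURCE  DEST
1       3
1       5
2       7
2       4
3       2
3       8
4       3
4       7
4       8
5       6
6       1
8       5
8       6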

Page 14: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

An edge list sorted by source = the same edge list in arbitrary order

SOURCE  DEST        SOURCE  DEST
1       3           1       3
1       5           8       6
2       7           5       6
2       4           2       4
3       2     =     3       2
3       8           4       7
4       3           4       3
4       7           3       8
4       8           4       8
5       6           2       7
6       1           6       1
8       5           8       5
8       6           1       5

• No index
• No clustering
• No sorting

Page 15: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Tradeoff

Edge-centric Scatter-Gather:

Vertex-centric Scatter-Gather:

• Sequential Access Bandwidth >> Random Access Bandwidth
• Few scatter gather iterations for real world graphs
  • Well connected, variety of datasets covered in the paper
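The tradeoff can be made concrete with a back-of-the-envelope model. The cost formulas below are an assumption (the expressions on this slide did not survive extraction): edge-centric processing is charged one full sequential scan of the edge data per iteration, and vertex-centric processing is charged one random-access pass over the edge data in total. The slowdown factors are the ones quoted on the "Approach" slide. The model is deliberately crude and ignores index construction, caching, and partial scans.

```python
# Slowdown of random access relative to sequential access, per medium,
# as quoted on the "Approach" slide.
RANDOM_SLOWDOWN = {"Disk": 500, "SSD": 20, "RAM": 2}

def edge_centric_cost(iterations):
    """Assumed model: each iteration streams all edges once, sequentially.
    Cost is in units of 'one sequential scan of the edge data'."""
    return float(iterations)

def vertex_centric_cost(medium):
    """Assumed model: every edge is fetched once, through random access."""
    return float(RANDOM_SLOWDOWN[medium])

def edge_centric_wins(iterations, medium):
    """True when streaming is cheaper than random access in this model."""
    return edge_centric_cost(iterations) < vertex_centric_cost(medium)

for medium, slowdown in RANDOM_SLOWDOWN.items():
    print(f"{medium}: streaming is cheaper while iterations < {slowdown}")
```

Under these assumptions the break-even point is simply the slowdown factor, which is why a small number of scatter gather iterations matters.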

Page 16: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Contributions
• Edge-centric scatter gather model
• Streaming partitions

Page 17: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Streaming Partitions
• Problem: still have random access to vertex set

[Figure: vertex array V holding vertices 1 to 8]

• Solution: partition the graph into streaming partitions

Page 18: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Streaming Partitions
• A streaming partition is
  • A subset of the vertices that fits in RAM
  • All edges whose source vertex is in that subset
• No requirement on quality of the partition

Page 19: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Partitioning the Graph

Subset of vertices V1 = {1, 2, 3, 4} and its edges:

SOURCE  DEST
1       5
4       7
2       7
4       3
4       8
3       8
2       4
1       3
3       2

Subset of vertices V2 = {5, 6, 7, 8} and its edges:

SOURCE  DEST
5       6
8       6
8       5
6       1
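The partitioning step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not X-Stream's code: `make_streaming_partitions` and its arguments are hypothetical names, and the vertex subsets are given up front rather than chosen to fit a RAM budget.

```python
def make_streaming_partitions(edges, vertex_subsets):
    """Split an unsorted edge list into one edge list per vertex subset.

    An edge goes to the partition that owns its source vertex. A single
    pass over the edges is enough; nothing is sorted.
    """
    owner = {v: i for i, subset in enumerate(vertex_subsets) for v in subset}
    partitions = [[] for _ in vertex_subsets]
    for src, dst in edges:
        partitions[owner[src]].append((src, dst))
    return partitions

# Unsorted edge list of the example graph and the two vertex subsets.
EDGES = [(1, 3), (8, 6), (5, 6), (2, 4), (3, 2), (4, 7), (4, 3),
         (3, 8), (4, 8), (2, 7), (6, 1), (8, 5), (1, 5)]
V1, V2 = {1, 2, 3, 4}, {5, 6, 7, 8}

p1, p2 = make_streaming_partitions(EDGES, [V1, V2])
print(len(p1), "edges for V1,", len(p2), "edges for V2")
```

The order of edges inside each partition depends only on the order of the input stream, which is consistent with the slide's point that partition quality is not a requirement.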

Page 20: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Random Accesses for Free

[Figure: vertex subset V1 = {1, 2, 3, 4} next to its edge list]

SOURCE  DEST
1       5
4       7
2       7
4       3
4       8
3       8
2       4
1       3
3       2

Page 21: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Generalization

Applies to any two level memory hierarchy

• Fast storage: vertex subset V1 = {1, 2, 3, 4}
• Slow storage: edge list

SOURCE  DEST
1       5
4       7
2       7
4       3
4       8
3       8
2       4
1       3
3       2

Page 22: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Generally Applicable

Slow storage   Fast storage
Disk           RAM
    OR
SSD            RAM
    OR
RAM            CPU Cache

Page 23: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Parallelism
• Simple Parallelism
  • State is stored in vertex
  • Streaming partitions have disjoint vertices
  • Can process streaming partitions in parallel
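Because each streaming partition owns a disjoint set of source vertices, the scatter phase of different partitions can run independently. The sketch below shows only the structure, using Python's standard `ThreadPoolExecutor`; the function names and the BFS-level state are assumptions, and CPython threads will not give a real speedup for this CPU-bound loop.

```python
from concurrent.futures import ThreadPoolExecutor

def scatter(partition_edges, level, frontier):
    """Stream one partition's edges and emit (dest, new_level) updates.
    The shared vertex state is only read here, never written."""
    return [(dst, level[src] + 1)
            for src, dst in partition_edges if src in frontier]

def parallel_scatter(partitions, level, frontier, workers=2):
    """Run scatter over all partitions concurrently and merge the updates."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_partition = pool.map(
            lambda edges: scatter(edges, level, frontier), partitions)
        return [update for updates in per_partition for update in updates]

# The two partitions of the example graph (sources 1-4 and sources 5-8).
P1 = [(1, 5), (4, 7), (2, 7), (4, 3), (4, 8), (3, 8), (2, 4), (1, 3), (3, 2)]
P2 = [(5, 6), (8, 6), (8, 5), (6, 1)]

# Second BFS step from root 1: vertices 3 and 5 were reached in step one.
updates = parallel_scatter([P1, P2], level={1: 0, 3: 1, 5: 1}, frontier={3, 5})
print(sorted(updates))
```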

Page 24: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Gathering Updates

[Figure: a shuffler routing updates (X, Y) between the edges and the vertices of Partition 1 to Partition 100]

• Minimize random access for large number of partitions
• Multi-round copying akin to merge sort but cheaper
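One way to read "multi-round copying" is a shuffle that groups updates by destination partition in several passes, each pass splitting into only a few buckets. The sketch below is an assumption about the general idea, not the shuffler X-Stream ships: `shuffle`, `partition_of`, and `fanout` are hypothetical names. Unlike a sort, updates inside a bucket are never ordered.

```python
def shuffle(updates, num_partitions, partition_of, fanout=4):
    """Group (dest, value) updates by destination partition.

    Each round copies every chunk into `fanout` buckets, so a round only
    appends to a handful of output streams. `fanout` must be at least 2.
    """
    # Smallest number of rounds with fanout ** rounds >= num_partitions.
    rounds, span = 0, 1
    while span < num_partitions:
        span *= fanout
        rounds += 1
    chunks = [list(updates)]
    for _ in range(rounds):
        span //= fanout  # partitions covered by one bucket after this round
        next_chunks = []
        for chunk in chunks:
            buckets = [[] for _ in range(fanout)]
            for update in chunk:
                partition = partition_of(update[0])
                buckets[(partition // span) % fanout].append(update)
            next_chunks.extend(buckets)
        chunks = next_chunks
    # chunks[i] now holds exactly the updates destined for partition i.
    return chunks[:num_partitions]

updates = [(v, v * 10) for v in [5, 0, 7, 3, 5, 1, 6, 2, 4]]
grouped = shuffle(updates, num_partitions=8, partition_of=lambda v: v, fanout=2)
print(grouped[5])
```

With 8 partitions and a fanout of 2 this takes three rounds; with a fanout of 4 it takes two. The result is the same either way.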

Page 25: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Performance

• Focus on SSD results in this talk
• Similar results with in-memory graphs

Page 26: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Baseline
• Graphchi [OSDI 2012]
• First to show that graph processing on a single machine
  • Is viable
  • Is competitive

• Also targets larger sequential bandwidth of SSD and Disk

Page 27: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Different Approaches
• Fundamentally different approaches to same goal
• Graphchi uses “shards”
  • Partitions edges into sorted shards

• X-Stream uses sequential scans
  • Partitions edges into unsorted streaming partitions

Page 28: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Baseline to Graphchi
• Replicated OSDI 2012 experiments on our SSD

Graphchi: Input → Create shards → Shards → Run Algorithm → Answer

X-Stream: Input → Run Algorithm → Answer

Page 29: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

X-Stream Speedup over Graphchi

[Figure: bar chart of speedup (axis 0 to 6) for Netflix/ALS, Twitter/Pagerank, Twitter/Belief Propagation, RMAT27/WCC]

Mean Speedup = 2.3

Page 30: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Baseline to Graphchi
• Replicated OSDI 2012 experiments on our SSD

Graphchi: Input → Create shards → Shards → Run Algorithm → Answer

X-Stream: Input → Run Algorithm → Answer

Page 31: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

X-Stream Speedup over Graphchi (+ sharding)

[Figure: bar chart of speedup (axis 0 to 6) for Netflix/ALS, Twitter/Pagerank, Twitter/Belief Propagation, RMAT27/WCC]

Mean Speedup: Prev = 2.3, Now = 3.7

Page 32: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Preprocessing Impact

[Figure: bar chart of time in seconds (axis 0 to 3000) comparing Graphchi sharding with X-Stream runtime for Netflix/ALS, Twitter/Pagerank, Twitter/Belief Propagation, RMAT27/WCC]

X-Stream returns answers before Graphchi finishes sharding

Page 33: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Sequential Access Bandwidth
• Graphchi shard
  • All vertices and edges must fit in memory

• X-Stream partition
  • Only vertices must fit in memory

• More Graphchi shards than X-Stream partitions
  • Makes access more random for Graphchi
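The difference in partition counts follows from what has to fit in memory. The arithmetic below is a rough sketch: the graph sizes are hypothetical, and the formulas ignore buffers and any other memory either system reserves.

```python
import math

GB = 2 ** 30

def xstream_partitions(vertex_bytes, ram_bytes):
    """Only the vertex state of a partition has to fit in memory."""
    return math.ceil(vertex_bytes / ram_bytes)

def graphchi_shards(vertex_bytes, edge_bytes, ram_bytes):
    """Vertices and edges of a shard both have to fit in memory."""
    return math.ceil((vertex_bytes + edge_bytes) / ram_bytes)

# Hypothetical graph: 4 GB of vertex state, 64 GB of edges, 8 GB of RAM.
partitions = xstream_partitions(4 * GB, 8 * GB)
shards = graphchi_shards(4 * GB, 64 * GB, 8 * GB)
print(partitions, "streaming partition(s) vs", shards, "shards")
```

For this made-up graph the edge data dominates, so the shard count is several times the partition count.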

Page 34: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

SSD Read Bandwidth (Pagerank on Twitter)

[Figure: read bandwidth in MB/s (axis 0 to 1000) over a 5 minute window, X-Stream vs. Graphchi]

Page 35: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

SSD Write Bandwidth (Pagerank on Twitter)

[Figure: write bandwidth in MB/s (axis 0 to 800) over a 5 minute window, X-Stream vs. Graphchi]

Page 36: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Disk Transfers (Pagerank on Twitter)

Metric         X-Stream     Graphchi
Data moved     224 GB       322 GB
Time taken     398 seconds  2613 seconds
Transfer rate  578 MB/s     126 MB/s

• SSD can sustain reads = 667 MB/s, writes = 576 MB/s
• X-Stream uses all available bandwidth from the storage device
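The transfer rates in the table are data moved divided by time taken. The check below assumes 1 GB = 1024 MB, which is a guess about the units used on the slide. It approximately reproduces the X-Stream figure (the small gap is presumably rounding of the 224 GB value) and matches the Graphchi figure after rounding.

```python
def transfer_rate_mb_per_s(gigabytes, seconds):
    """Average transfer rate, assuming 1 GB = 1024 MB."""
    return gigabytes * 1024 / seconds

xstream_rate = transfer_rate_mb_per_s(224, 398)     # table says 578 MB/s
graphchi_rate = transfer_rate_mb_per_s(322, 2613)   # table says 126 MB/s
runtime_ratio = 2613 / 398                          # Graphchi time / X-Stream time

print(f"X-Stream: {xstream_rate:.0f} MB/s")
print(f"Graphchi: {graphchi_rate:.0f} MB/s")
print(f"Graphchi takes {runtime_ratio:.1f}x as long")
```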

Page 37: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Scaling up

[Figure: Weakly Connected Components; time (HH:MM:SS, axis 0:00:01 to 96:00:00) against input edge data (384 MB to 1.5 TB), on 16 GB RAM, 400 GB SSD and 6 TB Disk]

• 8 Million V, 128 Million E, 8 sec
• 256 Million V, 4 Billion E, 33 mins
• 4 Billion V, 64 Billion E, 26 hours
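The three data points can be turned into edges processed per second of total runtime. This is plain arithmetic on the numbers above, assuming "Million" means 10^6 and "Billion" means 10^9; the helper name is made up for the example.

```python
# (label, edges, runtime in seconds) for the three annotated points.
POINTS = [
    ("8 Million V", 128e6, 8),
    ("256 Million V", 4e9, 33 * 60),
    ("4 Billion V", 64e9, 26 * 3600),
]

def edges_per_second(edges, seconds):
    """Edges in the input divided by total runtime."""
    return edges / seconds

rates = [edges_per_second(edges, seconds) for _, edges, seconds in POINTS]
for (label, _, _), rate in zip(POINTS, rates):
    print(f"{label}: {rate / 1e6:.2f} million edges/s")
```

The rate drops as the input grows from 128 million to 64 billion edges, while the graph size grows by a factor of 500.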

Page 38: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Conclusion

Big graphs
X-Stream
Edge-centric processing + Streaming Partitions = Sequential Access
Good Performance: RAM, SSD, Disk

Download from http://labos.epfl.ch/xstream

Page 39: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

BACKUP

Page 40: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

API Restrictions
• Updates must be commutative
• Cannot access all edges from a vertex in a single step
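A plausible reading of the first restriction is that updates reach a vertex in whatever order the edges happen to be streamed, so the gathered result must not depend on that order. The sketch below is illustrative only: `gather_min` and `gather_overwrite` are hypothetical gather functions, not part of X-Stream's API.

```python
import itertools

def gather_min(state, updates):
    """Commutative gather: keep the smallest value seen per vertex."""
    result = dict(state)
    for dst, value in updates:
        result[dst] = min(result.get(dst, float("inf")), value)
    return result

def gather_overwrite(state, updates):
    """Non-commutative gather: the last update to arrive wins."""
    result = dict(state)
    for dst, value in updates:
        result[dst] = value
    return result

UPDATES = [(2, 5), (2, 3), (7, 4), (2, 9)]

def distinct_outcomes(gather):
    """Number of different final states over all arrival orders."""
    outcomes = {tuple(sorted(gather({}, order).items()))
                for order in itertools.permutations(UPDATES)}
    return len(outcomes)

print(distinct_outcomes(gather_min), distinct_outcomes(gather_overwrite))
```

The minimum-based gather yields a single outcome for every arrival order, while the overwriting gather yields a different state depending on which update to vertex 2 arrives last.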

Page 41: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Applications
• X-Stream can solve a variety of problems
  BFS, SSSP, Weakly connected components, Strongly connected components, Maximal independent sets, Minimum cost spanning trees, Belief propagation, Alternating least squares, Pagerank, Betweenness centrality, Triangle counting, Approximate neighborhood function, Conductance, K-Cores

Q. Average distance between people on a social network?
A. Use approximate neighborhood function.

Page 42: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Edge-centric Scatter Gather
• Real world graphs have low diameter

[Figure: the example graph with vertices 1 to 8]
D=3, BFS in 3 steps, Most real-world graphs

[Figure: a chain of vertices 1 to 8]
D=7, BFS in 7 steps
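The step counts can be checked by counting how many full edge scans discover new vertices. This is a small sketch under two assumptions: edges are treated as directed and BFS starts at vertex 1, and the second graph is taken to be a simple chain 1 → 2 → … → 8, which is what a diameter of 7 on 8 vertices implies.

```python
def bfs_steps(edges, root):
    """Number of edge-centric scatter gather steps that reach new vertices."""
    level = {root: 0}
    steps = 0
    while True:
        # One pass over all edges: scatter from vertices reached last step.
        discovered = {dst for src, dst in edges
                      if level.get(src) == steps and dst not in level}
        if not discovered:
            return steps
        steps += 1
        for vertex in discovered:
            level[vertex] = steps

# Example graph from the slides.
EXAMPLE = [(1, 3), (1, 5), (2, 7), (2, 4), (3, 2), (3, 8), (4, 3),
           (4, 7), (4, 8), (5, 6), (6, 1), (8, 5), (8, 6)]
# Chain of 8 vertices (assumed shape of the second figure).
CHAIN = [(i, i + 1) for i in range(1, 8)]

print(bfs_steps(EXAMPLE, 1), bfs_steps(CHAIN, 1))
```

Each step costs one scan of the whole edge list, so the chain needs more than twice as many scans as the well-connected example despite having fewer edges.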

Page 43: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

X-Stream Main Memory Performance

[Figure: BFS (32M vertices/256M edges); runtime in seconds (axis 0 to 100, lower is better) against threads (1, 2, 4, 8, 16) for BFS-1 [HPC 2010], BFS-2 [PACT 2011] and X-Stream]

Page 44: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Runtime impact of Graphchi Sharding

[Figure: Graphchi Runtime Breakdown; fraction of runtime (axis 0 to 1) split into Compute + I/O and Re-sort shard, per benchmark: Netflix/ALS, Twitter/Pagerank, Twitter/Belief Propagation, RMAT27/WCC]

Page 45: X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Pre-processing Overhead
• Low overhead for producing streaming partitions
• Strictly cheaper than sorting edges by source vertex