Grazelle Hardware-Optimized In-Memory Graph Processing Samuel Grossman, Heiner Litz, and Christos Kozyrakis

Grazelle - Stanford University Talks/samuel-grossman-1.pdfEvaluation Name Abbreviation Vertices Edges Size Domain cit-Patents C 3.7M 16.5M 250MB Citations web dimacs-usa D 23.9M 58.3M





Existing Work

Properties of Graph Problems

• Irregular graph data

• Difficult to partition

• Unpredictable access pattern

Scalability Optimizations

• Partitioning algorithms

• Dynamic scheduling, load balancing

• Sharing and synchronization optimizations


Existing Work

Properties of Graph Problems

• Irregular graph data

• Difficult to partition

• Unpredictable access pattern

Modern Hardware Features

• Vector processing units

• Sequential memory accesses

• Prefetchers

• NUMA


Grazelle

Properties of Graph Problems

• Irregular graph data

• Simple and easy to partition

• Predictable access pattern

Modern Hardware Features

• Vector processing units

• Sequential memory accesses

• Prefetchers

• NUMA


Grazelle

Grazelle is a single-machine, in-memory Gather-Apply-Scatter (GAS) graph processing engine that:

• Leverages modern hardware features

• Improves throughput by 4.4× to 36.2× over existing work

Grazelle is not a complete graph analytics framework.


Top-Level Execution Flow

Start

1. Gather Phase (GAS Gather)

2. Combine Phase (GAS Apply, Scatter)

Finish
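The two-phase flow above can be sketched in a few lines. This is a minimal, single-threaded illustration, not Grazelle's implementation: PageRank is used only as an example workload (the slides do not name one), and `run_two_phase`, `damping`, and the edge-tuple format are hypothetical names chosen for the sketch.

```python
def run_two_phase(edges, num_vertices, iterations=20, damping=0.85):
    """Run PageRank as alternating Gather and Combine phases.

    edges is a list of (source, destination) pairs (assumed format).
    """
    out_degree = [0] * num_vertices
    for src, _dst in edges:
        out_degree[src] += 1

    rank = [1.0 / num_vertices] * num_vertices
    for _ in range(iterations):
        # 1. Gather phase (GAS Gather): sum the contributions that
        #    arrive along each vertex's in-edges into an accumulator.
        accumulators = [0.0] * num_vertices
        for src, dst in edges:
            accumulators[dst] += rank[src] / out_degree[src]

        # 2. Combine phase (GAS Apply, Scatter): fold each accumulator
        #    into the vertex value that the next Gather will read.
        rank = [(1.0 - damping) / num_vertices + damping * acc
                for acc in accumulators]
    return rank


# A directed 3-cycle: by symmetry every vertex keeps rank 1/3.
ranks = run_two_phase([(0, 1), (1, 2), (2, 0)], num_vertices=3)
```

Keeping the phases separate means Gather only reads vertex values and Combine only reads accumulators, which is what lets each phase use simple access patterns.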


Key Design Principles

• Vector-optimized data structures with minimal indirection

• Thread-private memory writes

• Mostly sequential memory accesses

• Simple, static partitioning and scheduling

• Synchronization via thread barriers between phases
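As a rough sketch of how static partitioning, thread-private writes, and phase barriers fit together, the example below splits the vertex range into fixed contiguous slices, one per thread. It assumes that each thread owns the accumulators and vertex values for its slice; `static_ranges`, `run_phases`, and the "sum of in-neighbor values" workload are hypothetical and chosen only for illustration.

```python
import threading


def static_ranges(num_items, num_parts):
    """Split [0, num_items) into contiguous, near-equal ranges decided up front."""
    base, extra = divmod(num_items, num_parts)
    ranges, start = [], 0
    for part in range(num_parts):
        end = start + base + (1 if part < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def run_phases(in_neighbors, values, num_threads=2, iterations=1):
    """Each iteration replaces values[v] with the sum of its in-neighbors' values."""
    n = len(values)
    ranges = static_ranges(n, num_threads)
    accumulators = [0] * n
    barrier = threading.Barrier(num_threads)

    def worker(tid):
        lo, hi = ranges[tid]
        for _ in range(iterations):
            # Gather: read shared vertex values, write only accumulators[lo:hi].
            for v in range(lo, hi):
                accumulators[v] = sum(values[u] for u in in_neighbors[v])
            barrier.wait()  # every gather finishes before any value changes
            # Combine: each thread updates only the vertices it owns.
            for v in range(lo, hi):
                values[v] = accumulators[v]
            barrier.wait()  # every value is updated before the next gather

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return values


result = run_phases([[1, 2], [0], [0, 1], []], [1, 2, 3, 4])
```

Because no two threads ever write the same location, the only synchronization needed is the barrier between phases; no locks or atomic updates appear in the sketch.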


Gather: Topology Data Structures

Existing Work

• “Compressed Sparse Row”

Grazelle

• Vector-encoded edge list
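To make the contrast concrete, the sketch below converts a Compressed Sparse Row structure into a list of fixed-width edge vectors. It is an illustration under assumptions, not Grazelle's actual layout: the CSR is taken to index in-edges (rows are destinations), each vector is modeled as a `(destination, sources, valid)` tuple, and `csr_to_edge_vectors` is a hypothetical name.

```python
LANES = 4  # elements per vector (assumed; matches a 256-bit vector of 64-bit elements)


def csr_to_edge_vectors(row_offsets, column_ids, lanes=LANES):
    """Turn each CSR row into one or more padded, fixed-width edge vectors.

    CSR needs two dependent lookups (offsets, then neighbor IDs); the result
    here is a single flat list that can be streamed front to back.
    """
    vectors = []
    for row in range(len(row_offsets) - 1):
        neighbors = column_ids[row_offsets[row]:row_offsets[row + 1]]
        for i in range(0, len(neighbors), lanes):
            chunk = list(neighbors[i:i + lanes])
            pad = lanes - len(chunk)
            # Unused lanes are padded and flagged invalid.
            vectors.append((row,
                            chunk + [0] * pad,
                            [True] * len(chunk) + [False] * pad))
    return vectors


# Vertex 0 has five in-neighbors, vertex 1 none, vertex 2 two.
edge_vectors = csr_to_edge_vectors([0, 5, 5, 7], [1, 2, 3, 4, 5, 0, 1])
```

The cost of the flat layout is padding: a vertex whose degree is not a multiple of the vector width leaves some lanes empty.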


Gather: Topology Data Structures

[Figure: edge vector layout, 256 bits, 4 elements. Field labels: Valid, Part of Destination Vertex ID, Source Vertex ID.]
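One way to picture the fields named in the figure is as bit ranges inside four 64-bit elements. The slide does not give field widths, so the numbers below are assumptions made for the sketch: a 48-bit source ID in the low bits, a 12-bit piece of a 48-bit destination ID above it, and the valid flag in bit 63. `pack_edge_vector` and `unpack_edge_vector` are hypothetical helpers.

```python
LANES = 4             # 4 elements x 64 bits = 256 bits
SRC_BITS = 48         # assumed width of a source vertex ID
DST_PIECE_BITS = 12   # assumed: a 48-bit destination ID split across 4 lanes
VALID_SHIFT = 63      # assumed position of the valid flag

SRC_MASK = (1 << SRC_BITS) - 1
PIECE_MASK = (1 << DST_PIECE_BITS) - 1


def pack_edge_vector(dst, sources):
    """Pack one destination and up to LANES sources into LANES 64-bit words."""
    if not 1 <= len(sources) <= LANES:
        raise ValueError("need between 1 and LANES sources")
    words = []
    for lane in range(LANES):
        # Every lane carries its piece of the shared destination ID.
        piece = (dst >> (lane * DST_PIECE_BITS)) & PIECE_MASK
        word = piece << SRC_BITS
        if lane < len(sources):
            word |= (1 << VALID_SHIFT) | (sources[lane] & SRC_MASK)
        words.append(word)
    return words


def unpack_edge_vector(words):
    """Recover (destination, valid sources) from a packed vector."""
    dst, sources = 0, []
    for lane, word in enumerate(words):
        dst |= ((word >> SRC_BITS) & PIECE_MASK) << (lane * DST_PIECE_BITS)
        if (word >> VALID_SHIFT) & 1:
            sources.append(word & SRC_MASK)
    return dst, sources


packed = pack_edge_vector(0xABCDEF012345, [7, 9])
```

Spreading the destination ID across lanes keeps the whole edge group inside a single vector register, so no separate index array has to be consulted while streaming edges.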


Gather: Execution

• Edges: Vector Load (private, read-only)

• Vertices: Vector Gather (shared, read-only)

• Accumulators: Scalar Store (private, write-only)
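The three memory operations in the Gather phase can be emulated in scalar Python to show the data flow. This is only a sketch: real vector loads and gathers are hardware instructions, the `(destination, sources, valid)` tuple is an assumed stand-in for an edge vector, and summation is an assumed reduction.

```python
def gather_phase(edge_vectors, vertex_values, num_vertices):
    """Process edge vectors in order, accumulating per destination vertex."""
    accumulators = [0.0] * num_vertices
    for dst, sources, valid in edge_vectors:
        # "Vector load": one sequential read brings in a whole edge vector.
        # "Vector gather": fetch the source vertex values for all lanes,
        # masking out padded lanes.
        gathered = [vertex_values[s] if ok else 0.0
                    for s, ok in zip(sources, valid)]
        # Reduce the lanes, then issue one "scalar store" to the accumulator.
        accumulators[dst] += sum(gathered)
    return accumulators


example_vectors = [
    (0, [1, 2, 3, 0], [True, True, True, False]),
    (2, [0, 1, 0, 0], [True, True, False, False]),
]
sums = gather_phase(example_vectors, [10.0, 20.0, 30.0, 40.0], num_vertices=4)
```

Only the gather step touches memory at unpredictable addresses; the edge reads and accumulator writes proceed in order.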


Combine: Execution

• Accumulators: Vector Load (private, read-only)

• Vertices: Vector Store (private, write-only)


NUMA Partitioning

[Figure: data placement across NUMA Node 0 and Node 1]

• Edges always NUMA-local

• Accumulators always NUMA-local

• Vertices sometimes NUMA-remote
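A small model shows why these three placement properties can hold together. It assumes vertices are split into contiguous blocks per node and each edge is stored on the node that owns its destination; the slides do not spell out the partitioning rule, so this rule and the name `numa_partition` are illustrative assumptions.

```python
def numa_partition(edges, num_vertices, num_nodes=2):
    """Assign each (source, destination) edge to the node owning its destination.

    Returns the per-node edge lists and, per node, how many source-vertex
    reads would have to cross to another node.
    """
    per_node = -(-num_vertices // num_nodes)  # ceiling division

    def owner(vertex):
        return vertex // per_node

    node_edges = [[] for _ in range(num_nodes)]
    remote_reads = [0] * num_nodes
    for src, dst in edges:
        node = owner(dst)
        # The edge and the accumulator for dst both live on this node.
        node_edges[node].append((src, dst))
        # Only the source vertex value may live elsewhere.
        if owner(src) != node:
            remote_reads[node] += 1
    return node_edges, remote_reads


placement, remote = numa_partition([(0, 1), (2, 1), (3, 2), (1, 3)],
                                   num_vertices=4)
```

Under this rule the remote traffic is limited to read-only vertex values, so no write ever crosses a node boundary during Gather.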


Evaluation

Processor: 4× Intel Xeon E7-4850 (14 cores, 2-way SMT, 35 MB LLC)

RAM: 1 TB total, 256 GB per socket

Storage: 12× 6 TB magnetic disks, RAID-10

OS: Ubuntu 14.04 LTS

Compiler: GCC 4.8


Evaluation

Name                Abbreviation  Vertices  Edges    Size      Domain
cit-Patents         C             3.7 M     16.5 M   250 MB    Citations web
dimacs-usa          D             23.9 M    58.3 M   900 MB    Road network
twitter-2010        T             41.7 M    1.47 B   20 GB     Social
uk-2007             U             105.9 M   3.74 B   60 GB     Internet
(skewed synthetic)                ≤ 134 M   ≤ 17 B   ≤ 250 GB


Comparison

Feature                      X-Stream  Polymer  Grazelle
Vector processing units      No        No       Yes
Sequential memory accesses   Yes       Yes      Yes
Prefetching                  No        No       Yes
NUMA awareness               No        Yes      Yes
Caching overheads            Yes       Partial  Yes
Simultaneous multithreading  No        No       Yes


Comparison: Throughput (Real Graphs)

[Figure: bar charts of Perf. (B edges/sec) for X-Stream, Polymer, and Grazelle on graphs C, D, T, and U. Panels: 1 Socket, 4 Sockets.]


Comparison: Throughput (Synthetic Graphs)

[Figure: Perf. (B edges/sec) versus # Edges for X-Stream, Polymer, and Grazelle. Panels: 1 Socket, 4 Sockets. One panel spans 17M to 1B edges, the other 17M to 17B edges.]


Memory Bandwidth Utilization

[Figure: Bandwidth (GB/sec), 0 to 60, for cit-Patents, dimacs-usa, twitter-2010, and uk-2007. Series: Read (Gather), Write (Gather), Read (Combine), Write (Combine).]


Edge Vector Packing Efficiency

[Figure: Avg. Packing Efficiency (0% to 100%) versus Avg. Degree (1 to 4096). Series: 4 Elements, 8 Elements, 16 Elements.]
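The slide does not define packing efficiency, so the sketch below assumes a natural reading: the fraction of vector lanes that hold a real edge when each vertex's edges are packed into whole vectors. `packing_efficiency` is a hypothetical helper, and the degree list is made-up input for illustration.

```python
def packing_efficiency(degrees, lanes):
    """Fraction of lanes holding valid edges (assumed definition).

    Each vertex with degree d occupies ceil(d / lanes) vectors, so any
    remainder leaves padded lanes in that vertex's last vector.
    """
    edges = sum(degrees)
    vectors = sum(-(-d // lanes) for d in degrees)  # ceiling division
    if vectors == 0:
        return 1.0  # no edges, so no lanes are wasted
    return edges / (vectors * lanes)


degrees = [1, 4, 5]
eff4 = packing_efficiency(degrees, lanes=4)    # 10 edges in 4 vectors of 4 lanes
eff8 = packing_efficiency(degrees, lanes=8)    # 10 edges in 3 vectors of 8 lanes
```

Under this definition, wider vectors waste more lanes on low-degree vertices, and the waste shrinks as the average degree grows relative to the vector width.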


Load Balance Effectiveness

[Figure, two panels. Time Division: Work vs. Barrier: Time Contribution (0% to 100%) across Threads, series Work and Barrier. L2 Stall Cycles: % L2 Stall Cycles (0% to 100%) across Threads, series Stall. Annotation: 30% off ideal.]


Conclusion

• Grazelle maps graph problems to a regular and predictable software implementation without sacrificing scalability or balance

• Grazelle effectively leverages modern hardware and significantly outperforms the state of the art

• Future work:

    • Expand to secondary storage devices like flash

    • Build higher-level optimizations on top of Grazelle
