GraphMat: Bridging the Productivity-Performance Gap in Graph Analytics
Narayanan Sundaram
Parallel Computing Lab, Intel Labs

Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15



© 2015 Intel Corporation 2

A cybersecurity application

• Intel Security
• Loopy belief propagation for reputation management
• ~2 billion vertices, ~6 billion edges
• Needed to run daily
• Took almost a day with Giraph on 16 machines

How can we handle Internet-of-Things reputation management without increased performance?

[Figure labels: Port scanning, DDoS, Normal traffic]


A social problem

• PageRank
• ~1 trillion edges in graph
• Takes 3 minutes/iteration on 200 machines on Giraph

How can we handle personalized PageRank for even the top 1% of users without increased performance?

Ching, Avery, Sergey Edunov, Maja Kabiljo, Dionysios Logothetis, and Sambavi Muthukrishnan. "One trillion edges: graph processing at Facebook-scale." Proceedings of the VLDB Endowment 8, no. 12 (2015): 1804-1815.


Problem scale

Social network: ~1 billion vertices, ~100 billion connections

Web graph: ~50 billion pages, ~1 trillion hyperlinks

Brain network: ~100 billion neurons, ~100 trillion connections

Image credits:

Marc Smith: NodeXL Twitter Network Graphs: CHI2010: https://www.flickr.com/photos/marc_smith/4511844243 (License: CC BY 2.0 http://creativecommons.org/licenses/by/2.5 )

Larry & Teddy Page: Blog webgraph: https://www.flickr.com/photos/igboo/1814232325 (License: CC BY 2.0 http://creativecommons.org/licenses/by/2.5 )

Xavier Gigandet et al.: Gigandet X, Hagmann P, Kurant M, Cammoun L, Meuli R, et al. (2008) Estimating the Confidence Level of White Matter Connections Obtained with MRI Tractography. PLoS ONE 3(12): e4006. doi:10.1371/journal.pone.0004006 (License: CC BY 2.0 http://creativecommons.org/licenses/by/2.5 )


GraphMat

• What is GraphMat?
  • GraphMat is a graph programming framework with vertex programming as front-end and sparse matrix operations as back-end
  • “Matrix level performance with vertex program productivity”

• How can it help you?
  • “I know vertex programming and I like it, but Giraph/GraphX/Pregel/GraphLab… is too slow”
  • “I heard that graph programs can be written as matrix operations (and matrices are fast), but I do not want to recode my graph algorithms as matrix algorithms”

Narayanan Sundaram, Nadathur Satish, Md Mostofa Ali Patwary, Subramanya R Dulloor, Michael Anderson, Satya Gautam Vadlamudi, Dipankar Das, Pradeep Dubey, “GraphMat: High performance graph analytics made productive”, PVLDB, Vol 8 No 11, 2015.


Why?

• Why GraphMat?
  • We want to enable super-fast distributed graph processing on x86 servers

• Why open-source?
  • We want to enable super-fast distributed graph processing on x86 servers for everyone
  • C++/MPI, BSD license

• Integrate it with your data processing/ML frameworks
  • We can help


Current state-of-the-art

• GraphMat is faster than other distributed graph frameworks
  • Faster than GraphLab, CombBLAS, GraphX, Giraph…

• Optimized for multi-node and multi-core

• Uses vertex programming

• Bringing sparse matrix optimizations from High Performance Computing to Big Graph processing


Diversity in current graph frameworks

• Vertex programming “think like a vertex”
  – GraphLab, Giraph, MapGraph, Pregel, GraphX

• Matrix based “graphs are sparse matrices”
  – CombBLAS, PEGASUS

• Task models
  – Galois

• Declarative programming
  – SociaLite (datalog-like)

• Domain-specific languages
  – Green-Marl

[Figure: bar chart "PageRank (8 mi..." (title truncated in the source); axis "Speedup w.r.t. Giraph", ticks 0 to 160; bars for Giraph, GraphLab, CombBLAS, Galois, Native]


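Diversity in current graph frameworks (contd.)

[Table: columns Framework, Productivity, Performance; rows GraphLab, Giraph, CombBLAS, Galois, GraphX, GraphMat. The color-coded ratings are not preserved in this transcript.]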

Combine high productivity with great performance

Green = good, orange = ok, red = bad.

Nadathur Satish, Narayanan Sundaram, Mostofa Patwary, Jiwon Seo, Jongsoo Park, Muhammad Hassaan, Shubho Sengupta, Zhaoming Yin, Pradeep Dubey, “Navigating the Maze of Graph Analytics Frameworks using Massive Graph Datasets”, SIGMOD 2014


Assumptions

• Vertex programming is productive
• Fewer building blocks are better
• Sparse matrix operations are scalable
• Very few people have the ability and interest to optimize “to the metal”
• Can use MPI in distributed setting (even on cloud)
  • This assumption may be relaxed in the future
• Graph data fits in memory


GraphMat: High level operation

Benefits
• High productivity (vertex programming for users)
• High performance (optimized sparse matrix backend)

Vertex program:
• Send message to all edges
• Process incoming message
• Reduce
• Operate on vertex

Our transformation:
• Send message → Create (sparse) vector
• Process message → SpMV multiply
• Reduce → SpMV add
• Apply → Data parallel operator

[Diagram labels: Scatter, Gather, Apply; Generalized SpMV / SpGEMM]
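The mapping above can be sketched in a few lines of Python. This is an illustration of the idea only, not GraphMat's C++ API: the `superstep` function, the edge-list format and all parameter names are assumptions made for this example. Messages from active vertices form a sparse vector, the process and reduce steps together behave like a sparse matrix-vector product with user-supplied "multiply" and "add", and the apply step is an independent per-vertex update.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical edge-list format: (source, destination, edge_value).
Edge = Tuple[int, int, float]


def superstep(
    edges: List[Edge],
    state: Dict[int, float],
    active: Set[int],
    send_message: Callable[[float], float],
    process_message: Callable[[float, float], float],
    reduce_fn: Callable[[float, float], float],
    apply: Callable[[float, float], float],
) -> Dict[int, float]:
    # Send message: active vertices populate a sparse vector.
    messages = {v: send_message(state[v]) for v in active}

    # Process message + Reduce: a generalized SpMV over the edges.
    reduced: Dict[int, float] = {}
    for src, dst, edge_value in edges:
        if src in messages:
            result = process_message(messages[src], edge_value)
            reduced[dst] = reduce_fn(reduced[dst], result) if dst in reduced else result

    # Apply: independent per-vertex update for vertices that received a value.
    new_state = dict(state)
    for v, result in reduced.items():
        new_state[v] = apply(result, state[v])
    return new_state


edges = [(0, 1, 2.0), (0, 2, 3.0), (1, 2, 4.0)]
state = {0: 1.0, 1: 10.0, 2: 0.0}

# With multiply and add, the superstep is an ordinary sparse matrix-vector product.
new_state = superstep(
    edges, state, active={0, 1},
    send_message=lambda value: value,
    process_message=lambda message, edge_value: message * edge_value,
    reduce_fn=lambda a, b: a + b,
    apply=lambda result, old: result,
)
print(new_state)  # {0: 1.0, 1: 2.0, 2: 43.0}
```

Swapping the four callables changes the algorithm while the loop over edges stays the same, which is why a single optimized backend can serve many vertex programs.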


Example (Vertex Degree)

Can process in-edges, out-edges or all edges.

C++ templates for handling arbitrary types

User-defined functions to specify a particular algorithm
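The code shown on this slide is not preserved in the transcript. As a rough stand-in, the sketch below expresses vertex degree with the same four user-defined steps. It is written in Python rather than GraphMat's C++ templates, and the function names, the `direction` parameter and the edge-list format are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (source, destination); hypothetical edge-list format


def send_message() -> int:
    return 1  # every vertex sends the constant 1


def process_message(message: int) -> int:
    return message  # the edge value is ignored


def reduce_fn(a: int, b: int) -> int:
    return a + b  # sum the incoming messages


def vertex_degree(edges: List[Edge], direction: str = "in") -> Dict[int, int]:
    """Count degrees over in-edges, out-edges, or all edges."""
    if direction not in ("in", "out", "all"):
        raise ValueError("direction must be 'in', 'out' or 'all'")
    degree: Dict[int, int] = defaultdict(int)  # apply: store the reduced sum
    for src, dst in edges:
        if direction in ("in", "all"):   # message travels along the edge
            degree[dst] = reduce_fn(degree[dst], process_message(send_message()))
        if direction in ("out", "all"):  # message travels against the edge
            degree[src] = reduce_fn(degree[src], process_message(send_message()))
    return dict(degree)


edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
print(vertex_degree(edges, "in"))
print(vertex_degree(edges, "out"))
print(vertex_degree(edges, "all"))
```

On this small graph the in-degrees are 1, 1 and 2 for vertices 0, 1 and 2, and the out-degrees are 2, 1 and 1.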


What is new?

• Graph algorithms as linear algebra are well-known

• Unifying vertex programming with linear algebra is new
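One common way to write the linear-algebra view, given here as a sketch in notation that is not taken from the slides: let E be the edge set, a_{ji} the value on edge j → i, x_j the message sent by vertex j, ⊗ the per-edge "process" operator and ⊕ the "reduce" operator. One step of a vertex program then computes

```latex
y_i \;=\; \bigoplus_{(j,i) \in E} \left( x_j \otimes a_{ji} \right)
```

Choosing ⊗ as multiplication and ⊕ as addition gives the ordinary sparse matrix-vector product y = Aᵀx, while choosing ⊗ as addition and ⊕ as min gives the relaxation step of a shortest-path computation.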


Example

[Figure: weighted graph on vertices A, B, C, D, E; edge weights as extracted: 1, 2, 1, 3, 4, 2, 2]

Single Source Shortest Path
SEND_MESSAGE : message ← vertex_distance
PROCESS_MESSAGE : result ← message + edge_value
REDUCE : result ← min(result, operand)
APPLY : vertex_distance = min(result, vertex_distance)


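Distance vector: [∞ ∞ ∞ ∞ ∞] → (Init) → [0 ∞ ∞ ∞ ∞]

[Figure: the example graph on vertices A, B, C, D, E drawn three times, covering Iteration 0 and Iteration 1 and labeled "reduced values", "previous distances" and "updated distances". Per-vertex values as extracted: (0, ∞, ∞, ∞, ∞), (0, ∞, 1, 2, 3), (0, 4, 1, 2, 2). The mapping of values to vertices is not recoverable from this transcript.]

Single Source Shortest Path
SEND_MESSAGE : message ← vertex_distance
PROCESS_MESSAGE : result ← message + edge_value
REDUCE : result ← min(result, operand)
APPLY : vertex_distance = min(result, vertex_distance)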
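The four definitions above translate almost line for line into code. The sketch below is a Python illustration, not GraphMat code. The graph topology on the slide cannot be recovered from this transcript, so the edge list here is a made-up example, and the `sssp` driver with its active-set loop is an assumption about how the iterations could be organized.

```python
import math
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str, float]  # (source, destination, edge_value)


def send_message(vertex_distance: float) -> float:
    return vertex_distance


def process_message(message: float, edge_value: float) -> float:
    return message + edge_value


def reduce_fn(result: float, operand: float) -> float:
    return min(result, operand)


def apply(result: float, vertex_distance: float) -> float:
    return min(result, vertex_distance)


def sssp(edges: List[Edge], vertices: Iterable[str], source: str) -> Dict[str, float]:
    distance = {v: math.inf for v in vertices}
    distance[source] = 0.0  # Init
    active = {source}
    while active:  # terminates for non-negative edge values
        messages = {v: send_message(distance[v]) for v in active}
        reduced: Dict[str, float] = {}
        for src, dst, edge_value in edges:
            if src in messages:
                result = process_message(messages[src], edge_value)
                reduced[dst] = reduce_fn(reduced[dst], result) if dst in reduced else result
        active = set()
        for v, result in reduced.items():
            updated = apply(result, distance[v])
            if updated != distance[v]:  # only changed vertices send next round
                distance[v] = updated
                active.add(v)
    return distance


# Hypothetical graph, not the one drawn on the slide.
edges = [("A", "B", 1.0), ("A", "C", 4.0), ("B", "C", 2.0), ("C", "D", 1.0), ("B", "D", 5.0)]
distances = sssp(edges, "ABCD", "A")
print(distances)  # {'A': 0.0, 'B': 1.0, 'C': 3.0, 'D': 4.0}
```

Only vertices whose distance changed in the previous round send messages, which keeps the message vector sparse as the computation converges.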


Optimizations

• Flexible graph partitioning
  • 1-D, 2-D, Block cyclic

• Flexible data structures
  • Compressed Sparse Row (CSR)
  • Doubly compressed sparse column (DCSC)
  • Dense with bitvectors

• Low-level
  • Compiler optimizations
  • Vectorization
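To make two of the data structures named above concrete, here is a small Python sketch that builds CSR and DCSC arrays from (row, column, value) triples. The function and array names are chosen for this example and do not come from GraphMat. The point it illustrates is that CSR keeps one pointer per row, whereas DCSC keeps pointers only for columns that contain at least one nonzero, which saves space when many columns are empty.

```python
from typing import List, Tuple

Triple = Tuple[int, int, float]  # (row, column, value)


def to_csr(n_rows: int, triples: List[Triple]):
    """CSR: row i owns col_idx[row_ptr[i]:row_ptr[i + 1]] and the matching values."""
    ordered = sorted(triples)  # row-major order
    row_ptr = [0] * (n_rows + 1)
    for r, _, _ in ordered:
        row_ptr[r + 1] += 1
    for i in range(n_rows):
        row_ptr[i + 1] += row_ptr[i]  # prefix sum of per-row counts
    col_idx = [c for _, c, _ in ordered]
    values = [v for _, _, v in ordered]
    return row_ptr, col_idx, values


def to_dcsc(triples: List[Triple]):
    """DCSC: like CSC, but pointers exist only for non-empty columns."""
    ordered = sorted(triples, key=lambda t: (t[1], t[0]))  # column-major order
    col_ids, col_ptr, row_idx, values = [], [], [], []
    for r, c, v in ordered:
        if not col_ids or col_ids[-1] != c:  # first nonzero of a new column
            col_ids.append(c)
            col_ptr.append(len(row_idx))
        row_idx.append(r)
        values.append(v)
    col_ptr.append(len(row_idx))  # end pointer for the last column
    return col_ids, col_ptr, row_idx, values


# 4x4 matrix with nonzeros in columns 1 and 3 only.
triples = [(0, 1, 5.0), (2, 1, 7.0), (3, 3, 9.0), (0, 3, 1.0)]
csr = to_csr(4, triples)
dcsc = to_dcsc(triples)
print(csr)   # ([0, 2, 2, 3, 4], [1, 3, 1, 3], [5.0, 1.0, 7.0, 9.0])
print(dcsc)  # ([1, 3], [0, 2, 4], [0, 2, 0, 3], [5.0, 7.0, 1.0, 9.0])
```

In the DCSC output only two column entries are stored for a matrix with four columns, while the CSR row pointer array always has one entry per row plus one.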


GraphMat vs others

[Figure: bar chart with bars for MapGraph, Galois, CombBLAS and GraphLab; axis "Slowdown vs GraphMat (>1 implies GraphMat is faster)", ticks 0 to 8]


Is GraphMat good enough?

[Figure: "Native runtime vs GraphMat", bar chart of time in seconds (ticks 0 to 8) for PageRank, Breadth First Search, Triangle counting and Shortest path; series: GraphMat, Native optimized code]

* Native code performs direction-optimized sweeps for BFS; GraphMat performs only forward sweeps

Within 1.2X of native performance on average


Scalability (Preliminary results)

Weak scaling, RMAT 128 M edges/node

[Figure: two plots comparing GraphMat and GraphX on 1, 2 and 4 nodes. PageRank: time per iteration (in sec), ticks 0.1, 1, 10, 100. Shortest path: time in seconds, ticks 0.1, 1, 10, 100, 1000.]


Availability

• Open source under BSD license
  • https://github.com/narayanan2004/GraphMat
  • (Single-node code only at the moment)

• Plan to integrate with 3rd party data processing frameworks
  • JNI wrappers to call with Spark as a first step


Summary

• GraphMat bridges the productivity-performance gap for graph analytics
  • Within 20% of native code performance
  • Faster than GraphLab, CombBLAS, Galois, and GraphX
  • As easy as vertex programming

• Integration with other frameworks on the way

• Code available under BSD at https://github.com/narayanan2004/GraphMat


Acknowledgements (Parallel Computing Lab, Intel Labs)

Michael J. Anderson
Nadathur Rajagopalan Satish
Md Mostofa Ali Patwary
Subramanya Dulloor
Satya Gautam Vadlamudi
Nesreen Ahmed
Dipankar Das
Ted Willke
Pradeep Dubey