Giraph: Production-grade graph processing infrastructure for trillion edge graphs 8/12/2014 ATPESC Avery Ching


Motivation


Apache Giraph

• Inspired by Google’s Pregel but runs on Hadoop

• “Think like a vertex”

• Maximum value vertex example

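[Figure: maximum value example split across Processor 1 and Processor 2 over time; vertices starting with values such as 2, 1 and 5 exchange values until every vertex holds the maximum, 5]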
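The max-value example can be replayed outside Giraph. Below is a minimal, self-contained sketch in plain Java (no Giraph classes; `MaxValueBsp` and its helpers are hypothetical names) that models supersteps, per-vertex inboxes and vote-to-halt by hand. A vertex forwards its value only when it changes, which is what lets the job stop once no messages are in flight.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Standalone simulation of the "maximum value" vertex program under a
// Pregel-style BSP model. NOT the Giraph API: supersteps, message delivery
// and vote-to-halt are modeled by hand for illustration.
class MaxValueBsp {
    /** Propagates the maximum in place; returns the number of supersteps run. */
    static int run(int[] value, int[][] outEdges) {
        int n = value.length;
        List<List<Integer>> inbox = emptyInboxes(n);
        int superstep = 0;
        boolean messagesInFlight = true;
        while (messagesInFlight) {
            List<List<Integer>> next = emptyInboxes(n);
            messagesInFlight = false;
            for (int v = 0; v < n; v++) {
                List<Integer> msgs = inbox.get(v);
                // Every vertex votes to halt after compute(), so after superstep 0
                // only vertices that received a message are woken up.
                if (superstep > 0 && msgs.isEmpty()) continue;
                boolean changed = superstep == 0;
                for (int m : msgs) {
                    if (m > value[v]) {
                        value[v] = m;
                        changed = true;
                    }
                }
                // Only announce a new value; unchanged vertices stay silent.
                if (changed) {
                    for (int dst : outEdges[v]) {
                        next.get(dst).add(value[v]);
                        messagesInFlight = true;
                    }
                }
            }
            inbox = next; // messages become visible in the next superstep
            superstep++;
        }
        return superstep;
    }

    private static List<List<Integer>> emptyInboxes(int n) {
        List<List<Integer>> boxes = new ArrayList<>(n);
        for (int i = 0; i < n; i++) boxes.add(new ArrayList<>());
        return boxes;
    }

    public static void main(String[] args) {
        int[] value = {2, 5, 1};              // hypothetical chain 0 - 1 - 2
        int[][] edges = {{1}, {0, 2}, {1}};
        int steps = run(value, edges);
        System.out.println(Arrays.toString(value) + " after " + steps + " supersteps");
    }
}
```

On the hypothetical chain 2 - 5 - 1 this prints `[5, 5, 5] after 3 supersteps`: the 5 reaches both neighbors in superstep 1, and superstep 2 only confirms that nothing changed.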


Giraph on Hadoop / Yarn

[Figure: Giraph running on Hadoop via MapReduce or YARN; Hadoop versions shown: Hadoop 0.20.x, Hadoop 0.20.203, Hadoop 1.x, Hadoop 2.0.x]


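Page rank in Giraph

• Calculate updated page rank value from neighbors (superstep 1 onward)

• Send page rank value to neighbors for 30 iterations

```java
import java.io.IOException;

import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;

public class PageRankComputation extends BasicComputation<
    LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {
  private static final int MAX_SUPERSTEPS = 30;

  @Override
  public void compute(Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
      Iterable<DoubleWritable> messages) throws IOException {
    if (getSuperstep() >= 1) {
      // Sum the rank shares sent by in-neighbors in the previous superstep.
      double sum = 0;
      for (DoubleWritable message : messages) {
        sum += message.get();
      }
      vertex.setValue(new DoubleWritable(0.15d / getTotalNumVertices() + 0.85d * sum));
    }
    if (getSuperstep() < MAX_SUPERSTEPS) {
      // Split the current rank evenly across all out-edges.
      sendMessageToAllEdges(vertex,
          new DoubleWritable(vertex.getValue().get() / vertex.getNumEdges()));
    } else {
      vertex.voteToHalt();
    }
  }
}
```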
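Because the Giraph class needs a Hadoop/Giraph runtime, here is a standalone sketch that replays the same superstep schedule on in-memory arrays. The class name `PageRankBsp`, the 1/N starting rank and the requirement that every vertex has at least one out-edge are assumptions of this sketch, not properties of Giraph.

```java
import java.util.Arrays;

// Standalone (non-Giraph) replay of the PageRankComputation superstep logic.
// Assumptions: every vertex has >= 1 out-edge and starts with rank 1/N
// (the slide leaves the initial value to the input format).
class PageRankBsp {
    static double[] run(int[][] outEdges, int maxSupersteps) {
        int n = outEdges.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        // Incoming messages, pre-summed per destination (what a sum combiner would do).
        double[] inbox = new double[n];
        for (int superstep = 0; superstep <= maxSupersteps; superstep++) {
            double[] next = new double[n];
            for (int v = 0; v < n; v++) {
                if (superstep >= 1) {
                    rank[v] = 0.15 / n + 0.85 * inbox[v];
                }
                if (superstep < maxSupersteps) {
                    double share = rank[v] / outEdges[v].length;
                    for (int dst : outEdges[v]) next[dst] += share;
                }
                // Otherwise the vertex votes to halt; with no messages the job ends.
            }
            inbox = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        int[][] outEdges = {{1, 2}, {2}, {0}};   // hypothetical 3-vertex graph
        System.out.println(Arrays.toString(run(outEdges, 30)));
    }
}
```

With no dangling vertices the ranks keep summing to 1, since each superstep computes 0.15 + 0.85 × (previous total).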


Giraph compared to MPI

Giraph’s vertex-centric API is higher-level (and narrower) than MPI

• Vertex message queues vs process-level messaging

• Enforced BSP model

• Data distribution, checkpointing handled by Giraph

• Giraph aggregators are user-level MPI_Allreduce reductions

Java (Giraph) vs C/C++ (MPI)

Scheduled with Hadoop clusters

Easy integration with other Hadoop pipelines
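The aggregator/MPI_Allreduce analogy can be sketched as follows, assuming each worker first reduces the values of its own vertices and the partials are then combined with the same operator. `AggregatorSketch` and `allReduce` are hypothetical names; this models the idea, not Giraph's aggregator API.

```java
import java.util.List;
import java.util.function.DoubleBinaryOperator;

// Illustrative model of an aggregator as an allreduce (hypothetical names).
// The operator must be associative and commutative so partials can be
// combined in any order.
class AggregatorSketch {
    static double allReduce(List<double[]> valuesPerWorker, double identity,
                            DoubleBinaryOperator op) {
        double global = identity;
        for (double[] workerValues : valuesPerWorker) {
            // Local reduction over the vertices a worker owns.
            double partial = identity;
            for (double v : workerValues) partial = op.applyAsDouble(partial, v);
            // Combine worker partials into the global value.
            global = op.applyAsDouble(global, partial);
        }
        // Every vertex would read this value in the next superstep.
        return global;
    }

    public static void main(String[] args) {
        List<double[]> workers = List.of(new double[] {1, 4}, new double[] {2, 5, 3});
        System.out.println(allReduce(workers, 0.0, Double::sum));                    // 15.0
        System.out.println(allReduce(workers, Double.NEGATIVE_INFINITY, Math::max)); // 5.0
    }
}
```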


Apache Giraph data flow

Loading the graph

• The input format divides the input into splits (Split 0 to Split 3); with the master coordinating, Worker 0 and Worker 1 each load splits and send the graph data on to the workers that own it.

Compute / Iterate

• The in-memory graph is held as partitions (Part 0 to Part 3) on the workers; each worker computes and sends messages, then sends stats to the master and iterates.

Storing the graph

• Each worker writes its partitions (Part 0 to Part 3) through the output format.


Pipelined computation

Master “computes”

• Sets the computation, incoming/outgoing message classes and combiner for the next superstep

• Can set/modify aggregator values

[Figure: timeline of Worker 0, Worker 1 and the master moving through phases 1a, 1b, 2 and 3]


Use case


Affinity propagation

Frey and Dueck “Clustering by passing messages between data points” Science 2007

Organically discover exemplars based on similarity

[Figure panels: initialization, intermediate, convergence]


Responsibility r(i,k)

• How well suited is k to be an exemplar for i?

Availability a(i,k)

• How appropriate is it for point i to choose point k as an exemplar, given all of k’s responsibilities?

Update exemplars

• Based on known responsibilities/availabilities, which vertex should be my exemplar?


* Dampen responsibility, availability

3 stages
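For reference, the update rules from Frey and Dueck's paper can be written as below; the three stages presumably implement these per vertex. Here s(i,k) is the similarity and s(k,k) the "preference". On a sparse graph the max and the sums would range over a vertex's edges rather than all points, and the damping factor λ is left symbolic because the slides do not give a value.

```latex
\begin{aligned}
r(i,k) &\leftarrow s(i,k) - \max_{k' \neq k} \{\, a(i,k') + s(i,k') \,\} \\
a(i,k) &\leftarrow \min\Big\{ 0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\{0, r(i',k)\} \Big\}, \quad i \neq k \\
a(k,k) &\leftarrow \sum_{i' \neq k} \max\{0, r(i',k)\} \\
x &\leftarrow \lambda\, x_{\text{old}} + (1-\lambda)\, x_{\text{new}}, \quad x \in \{r, a\},\; 0 \le \lambda < 1 \\
\text{exemplar}(i) &= \arg\max_{k} \{\, a(i,k) + r(i,k) \,\}
\end{aligned}
```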


Responsibility

Every vertex i with an edge to k maintains the responsibility of k for i

Sends responsibility to k in ResponsibilityMessage (senderid, responsibility(i,k))

[Figure: vertices A, B, C and D; C, D and B send r(c,a), r(d,a) and r(b,a) to A, and B sends r(b,d) to D]
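A sketch of the responsibility stage at a single vertex i, assuming i stores s(i,k) and the latest a(i,k) for each out-edge k (including a self-edge carrying the preference s(i,i)). `ResponsibilityStep` is a hypothetical name; tracking the two largest values of a + s makes each "max over k' ≠ k" constant time, and the sketch assumes at least two edges.

```java
import java.util.Arrays;

// Responsibility stage at vertex i (hypothetical helper, not Giraph code):
// r(i,k) = s(i,k) - max_{k' != k} (a(i,k') + s(i,k')). Needs >= 2 edges.
class ResponsibilityStep {
    static double[] responsibilities(double[] s, double[] a) {
        // Largest and second-largest a + s, so each exclusion is O(1).
        double best = Double.NEGATIVE_INFINITY;
        double second = Double.NEGATIVE_INFINITY;
        int bestIdx = -1;
        for (int k = 0; k < s.length; k++) {
            double v = a[k] + s[k];
            if (v > best) {
                second = best;
                best = v;
                bestIdx = k;
            } else if (v > second) {
                second = v;
            }
        }
        double[] r = new double[s.length];
        for (int k = 0; k < s.length; k++) {
            // Exclude edge k itself: use the runner-up when k holds the maximum.
            r[k] = s[k] - (k == bestIdx ? second : best);
        }
        return r; // each r[k] would go to k in a ResponsibilityMessage
    }

    public static void main(String[] args) {
        double[] s = {-2, -1, -4};   // hypothetical similarities
        double[] a = {0, 0, 0};      // availabilities start at zero
        System.out.println(Arrays.toString(responsibilities(s, a)));  // [-1.0, 1.0, -3.0]
    }
}
```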


Availability

Vertex k sums the positive responsibility messages it receives

Sends availability to i in AvailabilityMessage (senderid, availability(i,k))

[Figure: A sends a(c,a) to C, a(d,a) to D and a(b,a) to B; D sends a(b,d) to B]
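The availability stage at a vertex k can be sketched as follows, assuming k has received r(i,k) from each in-neighbor i ≠ k and knows its own r(k,k). `AvailabilityStep` is a hypothetical name. The positive responsibilities are summed once, and each recipient's own contribution is subtracted back out.

```java
import java.util.Arrays;

// Availability stage at vertex k (hypothetical helper, not Giraph code).
// rIn[j] = r(i_j, k) from in-neighbor i_j != k; rSelf = r(k, k).
class AvailabilityStep {
    /** Sum of max(0, r(i', k)) over all senders: "sums positive messages". */
    static double positiveSum(double[] rIn) {
        double sum = 0;
        for (double r : rIn) sum += Math.max(0, r);
        return sum;
    }

    /** a(i_j, k) = min(0, r(k,k) + sum over i' not in {i_j, k} of max(0, r(i',k))). */
    static double[] availabilities(double[] rIn, double rSelf) {
        double total = positiveSum(rIn);
        double[] a = new double[rIn.length];
        for (int j = 0; j < rIn.length; j++) {
            double others = total - Math.max(0, rIn[j]);   // drop the recipient's own term
            a[j] = Math.min(0, rSelf + others);
        }
        return a; // each a[j] would go back to i_j in an AvailabilityMessage
    }

    /** a(k, k) = sum over i' != k of max(0, r(i', k)). */
    static double selfAvailability(double[] rIn) {
        return positiveSum(rIn);
    }

    public static void main(String[] args) {
        double[] rIn = {2, -1, 3};   // hypothetical incoming responsibilities
        System.out.println(Arrays.toString(availabilities(rIn, -4)));  // [-1.0, 0.0, -2.0]
        System.out.println(selfAvailability(rIn));                    // 5.0
    }
}
```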


Update exemplars

Dampens availabilities and scans edges to find exemplar k

Updates self-exemplar

[Figure: vertices A, B, C and D each run an update; three end up with exemplar=a and one with exemplar=d]
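The update-exemplars stage might look like the sketch below: damp the availabilities, then scan the edges for the k that maximizes a(i,k) + r(i,k). The damping factor 0.5 in `main` and the helper names are assumptions for illustration; the slide only says that availabilities are dampened before the scan.

```java
// Update-exemplars stage at vertex i (hypothetical helpers, not Giraph code).
class ExemplarStep {
    /** Damped update: keep lambda of the old value and (1 - lambda) of the new one. */
    static double damp(double oldValue, double newValue, double lambda) {
        return lambda * oldValue + (1 - lambda) * newValue;
    }

    /** Scan the edges (targetIds[j] may be i itself) for the k maximizing a(i,k) + r(i,k). */
    static long exemplar(long[] targetIds, double[] a, double[] r) {
        long bestId = targetIds[0];
        double best = a[0] + r[0];
        for (int j = 1; j < targetIds.length; j++) {
            double score = a[j] + r[j];
            if (score > best) {
                best = score;
                bestId = targetIds[j];
            }
        }
        return bestId;
    }

    public static void main(String[] args) {
        double[] oldA = {0.0, -2.0, 0.0};
        double[] newA = {0.0, 0.0, -1.0};
        double[] a = new double[3];
        for (int j = 0; j < 3; j++) a[j] = damp(oldA[j], newA[j], 0.5);  // {0.0, -1.0, -0.5}
        double[] r = {-1.0, 2.0, -3.0};
        System.out.println(exemplar(new long[] {10, 20, 30}, a, r));        // 20
    }
}
```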


Master logic

initial state → calculate responsibility → calculate availability → update exemplars → halt (or back to calculate responsibility)

if (exemplars agree they are exemplars && changed exemplars < ∆) then halt, otherwise continue
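The master logic amounts to a small state machine. The sketch below assumes hypothetical phase names and takes the convergence inputs (whether exemplars agree, how many changed, the threshold ∆) as plain arguments; in a Giraph job the master would presumably read these from aggregators and select the next superstep's computation accordingly.

```java
// Hypothetical master-side phase selection for the affinity propagation job.
class ApMasterSketch {
    enum Phase { RESPONSIBILITY, AVAILABILITY, UPDATE_EXEMPLARS, HALT }

    static Phase next(Phase current, boolean exemplarsAgree, long changedExemplars, long delta) {
        switch (current) {
            case RESPONSIBILITY:
                return Phase.AVAILABILITY;
            case AVAILABILITY:
                return Phase.UPDATE_EXEMPLARS;
            case UPDATE_EXEMPLARS:
                // Halt only when exemplars agree and fewer than delta exemplars changed.
                return (exemplarsAgree && changedExemplars < delta)
                    ? Phase.HALT : Phase.RESPONSIBILITY;
            default:
                return Phase.HALT;
        }
    }

    public static void main(String[] args) {
        Phase p = Phase.RESPONSIBILITY;            // first phase after the initial state
        p = next(p, false, 0, 10);                 // AVAILABILITY
        p = next(p, false, 0, 10);                 // UPDATE_EXEMPLARS
        System.out.println(next(p, true, 3, 10));  // HALT
    }
}
```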


Performance & Scalability


Example graph sizes

Twitter 255M MAU (https://about.twitter.com/company), 208 average followers (Beevolve 2012)

→ Estimated >53B edges

Facebook 1.28B MAU (Q1/2014 report), 200+ average friends (2011 S1)

→ Estimated >256B edges
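The edge estimates are simple products of active users and average connections per user:

```latex
\begin{aligned}
\text{Twitter:}\quad & 255\times10^{6}\ \text{MAU} \times 208\ \text{followers} \approx 5.30\times10^{10} \approx 53\text{B edges} \\
\text{Facebook:}\quad & 1.28\times10^{9}\ \text{MAU} \times 200\ \text{friends} = 2.56\times10^{11} = 256\text{B edges}
\end{aligned}
```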

[Chart: Graphs used in research publications, edges in billions (0 to 7): Clueweb 09, Twitter dataset, Friendster, Yahoo! web]

[Chart: Rough social network scale*, edges in billions (0 to 300): Twitter Est*, Facebook Est*]


Faster than Hive?

| Application | Graph size | CPU time speedup | Elapsed time speedup |
|---|---|---|---|
| Page rank (single iteration) | 400B+ edges | 26x | 120x |
| Friends of friends score | 71B+ edges | 12.5x | 48x |


Apache Giraph scalability

[Chart: Scalability of workers (200B edges): seconds (0 to 500) vs. # of workers (50 to 300), Giraph vs. ideal]

[Chart: Scalability of edges (50 workers): seconds (0 to 500) vs. # of edges (1E+09, 7E+10, 1E+11, 2E+11), Giraph vs. ideal]


Trillion social edges page rank

[Chart: minutes per iteration (0 to 4), 6/30/2013 vs. 6/2/2014]

Improvements

• GIRAPH-840 - Netty 4 upgrade

• G1 Collector / tuning


Graph partitioning


Why balanced partitioning

Random partitioning == good balance

BUT ignores entity affinity

[Figure: 12 vertices (0 to 11) spread across partitions by random partitioning]
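To make "good balance but ignores affinity" concrete, the sketch below scores a partition assignment by partition sizes and by the fraction of edges whose endpoints land in the same partition. It uses id hash modulo partition count as a stand-in for random partitioning (an assumption, not necessarily Giraph's exact partitioner), and the 12-vertex graph of three 4-cycles joined by two bridge edges is made up.

```java
// Balance vs. locality for a partition assignment (hypothetical example graph).
class PartitionQuality {
    /** Stand-in for random partitioning: id hash modulo partition count. */
    static int hashPartition(long id, int numPartitions) {
        return Math.floorMod(Long.hashCode(id), numPartitions);
    }

    /** Fraction of edges whose endpoints share a partition (no network hop needed). */
    static double localEdgeFraction(int[][] edges, int[] partOf) {
        int local = 0;
        for (int[] e : edges) if (partOf[e[0]] == partOf[e[1]]) local++;
        return (double) local / edges.length;
    }

    static int[] partitionSizes(int[] partOf, int numPartitions) {
        int[] sizes = new int[numPartitions];
        for (int p : partOf) sizes[p]++;
        return sizes;
    }

    public static void main(String[] args) {
        // Three 4-cycles {0..3}, {4..7}, {8..11} plus bridges 3-4 and 7-8.
        int[][] edges = {{0, 1}, {1, 2}, {2, 3}, {3, 0}, {4, 5}, {5, 6}, {6, 7}, {7, 4},
                         {8, 9}, {9, 10}, {10, 11}, {11, 8}, {3, 4}, {7, 8}};
        int[] hashed = new int[12];
        int[] byCluster = new int[12];
        for (int v = 0; v < 12; v++) {
            hashed[v] = hashPartition(v, 3);
            byCluster[v] = v / 4;                  // affinity-aware assignment
        }
        System.out.printf("hash: %.3f local, clustered: %.3f local%n",
            localEdgeFraction(edges, hashed), localEdgeFraction(edges, byCluster));
    }
}
```

Both assignments put 4 vertices in each partition, but this prints `hash: 0.214 local, clustered: 0.857 local`: balance alone says nothing about how many messages cross machines.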


Balanced partitioning application

Results from one service:

Cache hit rate grew from 70% to 85%, bandwidth cut in half

[Figure: the same 12 vertices regrouped into partitions by affinity]


Balanced label propagation results

* Loosely based on Ugander and Backstrom. Balanced label propagation for partitioning massive graphs, WSDM '13
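The gist of balanced label propagation can be illustrated with a simplified greedy variant: each round, vertices are visited in id order and move to the partition holding most of their neighbors, but only if that partition is below a capacity cap. This is a named substitute, not the published method: Ugander and Backstrom decide how many vertices may move between each pair of partitions with a linear program, and the visiting order, capacity and class name here are assumptions.

```java
import java.util.Arrays;

// Simplified greedy stand-in for balanced label propagation (hypothetical).
// Moves are sequential (not BSP-synchronous) and gated by a capacity cap.
class GreedyBalancedLabelPropagation {
    static int[] run(int[][] adj, int[] initial, int numParts, int capacity, int maxRounds) {
        int[] part = initial.clone();
        int[] size = new int[numParts];
        for (int p : part) size[p]++;
        for (int round = 0; round < maxRounds; round++) {
            boolean moved = false;
            for (int v = 0; v < adj.length; v++) {
                int[] neighborsIn = new int[numParts];
                for (int u : adj[v]) neighborsIn[part[u]]++;
                // Start from the current label so a tie never triggers a move.
                int best = part[v];
                for (int p = 0; p < numParts; p++) {
                    if (neighborsIn[p] > neighborsIn[best]) best = p;
                }
                if (best != part[v] && size[best] < capacity) {   // balance constraint
                    size[part[v]]--;
                    size[best]++;
                    part[v] = best;
                    moved = true;
                }
            }
            if (!moved) break;                                      // converged
        }
        return part;
    }

    public static void main(String[] args) {
        // Two 4-cliques {0..3} and {4..7} joined by the edge 3-4.
        int[][] adj = {{1, 2, 3}, {0, 2, 3}, {0, 1, 3}, {0, 1, 2, 4},
                       {3, 5, 6, 7}, {4, 6, 7}, {4, 5, 7}, {4, 5, 6}};
        int[] start = {0, 1, 0, 1, 0, 1, 0, 1};                     // balanced but scattered
        System.out.println(Arrays.toString(run(adj, start, 2, 5, 10)));
    }
}
```

Starting from an alternating assignment, this prints `[1, 1, 1, 1, 0, 0, 0, 0]`: each clique ends up in its own partition and the sizes stay at 4 and 4.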


Leveraging partitioning

Explicit remapping

Native remapping

• Transparent

• Embedded


Explicit remapping

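Original graph + partitioning mapping → join → remapped graph → compute shortest paths from 0 → compute output + reverse partition mapping → join → final compute output

Original graph

| Id | Edges |
|---|---|
| San Jose | (Chicago, 4), (New York, 6) |
| Chicago | (San Jose, 4), (New York, 3) |
| New York | (San Jose, 6), (Chicago, 3) |

Partitioning mapping

| Id | Alt Id |
|---|---|
| San Jose | 0 |
| Chicago | 1 |
| New York | 2 |

Remapped graph

| Id | Edges |
|---|---|
| 0 | (1, 4), (2, 6) |
| 1 | (0, 4), (2, 3) |
| 2 | (0, 6), (1, 3) |

Compute output

| Id | Distance |
|---|---|
| 0 | 0 |
| 1 | 4 |
| 2 | 6 |

Reverse partition mapping

| Alt Id | Id |
|---|---|
| 0 | San Jose |
| 1 | Chicago |
| 2 | New York |

Final compute output

| Id | Distance |
|---|---|
| San Jose | 0 |
| Chicago | 4 |
| New York | 6 |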
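The explicit-remapping workflow can be sketched end to end in memory: build the Id → Alt Id mapping, rewrite the graph with integer ids, compute, then join the output back through the reverse mapping. In this sketch the mapping is simply the input's insertion order (a real mapping would come from the partitioner), and a Bellman-Ford-style relaxation loop stands in for the Giraph shortest-paths job; `ExplicitRemap` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Explicit remapping: pre-join, compute on int ids, post-join (hypothetical sketch).
class ExplicitRemap {
    static Map<String, Long> run(Map<String, Map<String, Long>> graph, String source) {
        // Partitioning mapping (Id -> Alt Id) and reverse mapping (Alt Id -> Id).
        Map<String, Integer> toAlt = new HashMap<>();
        List<String> toId = new ArrayList<>();
        for (String id : graph.keySet()) {
            toAlt.put(id, toId.size());
            toId.add(id);
        }
        // Pre-join: rewrite every edge with Alt Ids.
        int n = toId.size();
        List<Map<Integer, Long>> remapped = new ArrayList<>();
        for (String id : toId) {
            Map<Integer, Long> edges = new HashMap<>();
            graph.get(id).forEach((dst, w) -> edges.put(toAlt.get(dst), w));
            remapped.add(edges);
        }
        // Compute: relax edges until nothing changes (each pass ~ one superstep).
        long[] dist = new long[n];
        Arrays.fill(dist, Long.MAX_VALUE);
        dist[toAlt.get(source)] = 0;
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int v = 0; v < n; v++) {
                if (dist[v] == Long.MAX_VALUE) continue;       // not reached yet
                for (Map.Entry<Integer, Long> e : remapped.get(v).entrySet()) {
                    long candidate = dist[v] + e.getValue();
                    if (candidate < dist[e.getKey()]) {
                        dist[e.getKey()] = candidate;
                        changed = true;
                    }
                }
            }
        }
        // Post-join: map Alt Ids in the compute output back to original ids.
        Map<String, Long> result = new LinkedHashMap<>();
        for (int alt = 0; alt < n; alt++) result.put(toId.get(alt), dist[alt]);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Long>> g = new LinkedHashMap<>();   // order fixes Alt Ids
        g.put("San Jose", Map.of("Chicago", 4L, "New York", 6L));
        g.put("Chicago", Map.of("San Jose", 4L, "New York", 3L));
        g.put("New York", Map.of("San Jose", 6L, "Chicago", 3L));
        System.out.println(run(g, "San Jose"));   // {San Jose=0, Chicago=4, New York=6}
    }
}
```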


Native transparent remapping

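Original graph + partitioning mapping → original graph with group information → compute shortest paths from “San Jose” → final compute output

Original graph

| Id | Edges |
|---|---|
| San Jose | (Chicago, 4), (New York, 6) |
| Chicago | (San Jose, 4), (New York, 3) |
| New York | (San Jose, 6), (Chicago, 3) |

Partitioning mapping

| Id | Group |
|---|---|
| San Jose | 0 |
| Chicago | 1 |
| New York | 2 |

Original graph with group information

| Id | Group | Edges |
|---|---|---|
| San Jose | 0 | (Chicago, 4), (New York, 6) |
| Chicago | 1 | (San Jose, 4), (New York, 3) |
| New York | 2 | (San Jose, 6), (Chicago, 3) |

Final compute output

| Id | Distance |
|---|---|
| San Jose | 0 |
| Chicago | 4 |
| New York | 6 |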


Native embedded remapping

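Original graph + partitioning mapping → original graph with mapping embedded in Id → compute shortest paths from “San Jose” → final compute output

Original graph

| Id | Edges |
|---|---|
| 0 | (1, 4), (2, 6) |
| 1 | (0, 4), (2, 3) |
| 2 | (0, 6), (1, 3) |

Partitioning mapping

| Id | Mach |
|---|---|
| 0 | 0 |
| 1 | 1 |
| 2 | 0 |

Original graph with mapping embedded in Id

| Top bits machine, Id | Edges |
|---|---|
| 0, 0 | (1, 4), (2, 6) |
| 1, 1 | (0, 4), (2, 3) |
| 0, 2 | (0, 6), (1, 3) |

Final compute output

| Id | Distance |
|---|---|
| 0 | 0 |
| 1 | 4 |
| 2 | 6 |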

Not all graphs can leverage this technique; Facebook's can, since its ids are longs with unused bits.
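Both native variants come down to what the partitioner consults. The sketch below contrasts a group looked up in a side table loaded with the input (transparent) with a machine decoded from the top bits of a long id (embedded). The 8-bit machine field, the class and the method names are assumptions for illustration, not Giraph APIs.

```java
import java.util.Map;

// Two hypothetical partitioner strategies for native remapping.
class NativeRemapSketch {
    static final int MACHINE_BITS = 8;                     // assumption: up to 256 machines
    static final int ID_BITS = Long.SIZE - MACHINE_BITS;   // low 56 bits keep the original id
    static final long ID_MASK = (1L << ID_BITS) - 1;

    /** Transparent: look the group up in a table loaded with the input (extra memory). */
    static int transparentPartition(String id, Map<String, Integer> groupOf, int numWorkers) {
        return groupOf.get(id) % numWorkers;
    }

    /** Embedded: store the machine in the (assumed unused) top bits of the id. */
    static long embed(long machine, long id) {
        if (machine < 0 || machine >= (1L << MACHINE_BITS)) {
            throw new IllegalArgumentException("machine does not fit in the top bits");
        }
        if (id < 0 || id > ID_MASK) {
            throw new IllegalArgumentException("id already uses the top bits");
        }
        return (machine << ID_BITS) | id;
    }

    /** The partitioner only needs an unsigned shift; no lookup table in memory. */
    static int embeddedPartition(long embeddedId) {
        return (int) (embeddedId >>> ID_BITS);
    }

    /** Recover the application-visible id for output. */
    static long originalId(long embeddedId) {
        return embeddedId & ID_MASK;
    }

    public static void main(String[] args) {
        long[] machineOf = {0, 1, 0};                      // Id -> Mach, as in the mapping table
        for (int id = 0; id < machineOf.length; id++) {
            long packed = embed(machineOf[id], id);
            System.out.println(id + " -> machine " + embeddedPartition(packed));
        }
    }
}
```

The trade-off matches the comparison that follows: the embedded form costs no extra memory, but the application has to work with the packed id type.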


Remapping comparison

| | Explicit | Native transparent | Native embedded |
|---|---|---|---|
| Pros | Can also add id compression | No application change, just additional input parameters | Utilize unused bits |
| Cons | Application aware of remapping; workflow complexity; pre and post joins overhead | Additional memory usage on input; group information uses more memory | Application changes Id type |


Partitioning experiments

345B edge page rank

[Chart: seconds per iteration (0 to 160) for Random, 47% Local and 60% Local partitioning]


Message explosion


Avoiding out-of-core

Example: Mutual friends calculation between neighbors

1. Send your friends a list of your friends

2. Intersect with your friend list

1.23B users (as of 1/2014)

200+ average friends (2011 S1)

8-byte ids (longs)

= 1.23B users × 200 friends × 200 ids per friend list × 8 bytes ≈ 394 TB of messages; at 100 GB per machine, that is ≈3,940 machines (not including the graph)

[Figure: five-vertex example (A, B, C, D, E) listing, for each vertex, the friend-list intersections it computes]
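The two steps map onto two supersteps: superstep 0 sends each vertex's friend list to every friend, and superstep 1 intersects every received list with the receiver's own. The sketch below runs this on a small hypothetical graph and also reproduces the message-volume estimate; `MutualFriends` and `messageBytes` are illustrative names. The unrounded estimate is 393.6 TB, about 3,936 machines at 100 GB each, which the slide rounds to 394 TB and 3,940.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Two-superstep mutual-friends computation plus the message-volume estimate.
class MutualFriends {
    /** For each vertex v: friend -> number of friends they have in common. */
    static Map<Integer, Map<Integer, Integer>> run(Map<Integer, Set<Integer>> friends) {
        // Superstep 0: each vertex sends (its id, its friend list) to every friend.
        Map<Integer, List<Map.Entry<Integer, Set<Integer>>>> inbox = new HashMap<>();
        for (Map.Entry<Integer, Set<Integer>> vertex : friends.entrySet()) {
            for (int friend : vertex.getValue()) {
                inbox.computeIfAbsent(friend, k -> new ArrayList<>())
                     .add(Map.entry(vertex.getKey(), vertex.getValue()));
            }
        }
        // Superstep 1: intersect every received list with the receiver's own list.
        Map<Integer, Map<Integer, Integer>> mutual = new HashMap<>();
        for (Map.Entry<Integer, List<Map.Entry<Integer, Set<Integer>>>> e : inbox.entrySet()) {
            Set<Integer> mine = friends.getOrDefault(e.getKey(), Set.of());
            Map<Integer, Integer> counts = new HashMap<>();
            for (Map.Entry<Integer, Set<Integer>> msg : e.getValue()) {
                Set<Integer> common = new HashSet<>(msg.getValue());
                common.retainAll(mine);
                counts.put(msg.getKey(), common.size());
            }
            mutual.put(e.getKey(), counts);
        }
        return mutual;
    }

    /** Bytes in flight when every user sends avgFriends ids to each of avgFriends friends. */
    static double messageBytes(double users, double avgFriends, int bytesPerId) {
        return users * avgFriends * avgFriends * bytesPerId;
    }

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> f = new HashMap<>();   // triangle 1-2-3 plus edge 3-4
        f.put(1, Set.of(2, 3));
        f.put(2, Set.of(1, 3));
        f.put(3, Set.of(1, 2, 4));
        f.put(4, Set.of(3));
        System.out.println(run(f));
        double bytes = messageBytes(1.23e9, 200, 8);
        System.out.printf("%.1f TB, %.0f machines%n", bytes / 1e12, bytes / 100e9);
    }
}
```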


Superstep splitting

Subsets of source/destination edges per superstep

* Currently manual - future work automatic!

[Figure: four sub-supersteps over vertex groups A and B]

• Sources: A (on), B (off); Destinations: A (on), B (off)

• Sources: A (on), B (off); Destinations: A (off), B (on)

• Sources: A (off), B (on); Destinations: A (on), B (off)

• Sources: A (off), B (on); Destinations: A (off), B (on)
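A sketch of the manual splitting schedule: one sub-superstep per (source group, destination group) pair, where a vertex sends along an edge only when both endpoint groups are switched on. Assigning groups by id modulo the group count is an assumption of this sketch. With g groups each edge is used in exactly one of the g² sub-supersteps, so with evenly spread edges each one carries roughly 1/g² of the messages.

```java
import java.util.ArrayList;
import java.util.List;

// Manual superstep splitting over (source group, destination group) pairs.
// Group = id mod groups is an assumption of this sketch.
class SuperstepSplitting {
    /** Sub-superstep schedule as {sourceGroup, destinationGroup} pairs. */
    static List<int[]> schedule(int groups) {
        List<int[]> steps = new ArrayList<>();
        for (int src = 0; src < groups; src++) {
            for (int dst = 0; dst < groups; dst++) {
                steps.add(new int[] {src, dst});
            }
        }
        return steps;
    }

    /** Send along an edge only when both endpoint groups are "on" in this step. */
    static boolean sends(long from, long to, int[] step, int groups) {
        return Math.floorMod(from, (long) groups) == step[0]
            && Math.floorMod(to, (long) groups) == step[1];
    }

    public static void main(String[] args) {
        String[] name = {"A", "B"};
        for (int[] step : schedule(2)) {
            System.out.println("Sources: " + name[step[0]] + " (on), Destinations: "
                + name[step[1]] + " (on)");
        }
    }
}
```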


Giraph in production

Over 1.5 years in production

Hundreds of production Giraph jobs processed a week

• Lots of untracked experiments

30+ applications in our internal application repository

Sample production job - 700B+ edges

Job times range from minutes to hours


GiraphicJam demo


Giraph related projects

Graft: The distributed Giraph debugger


Giraph roadmap

2/12 - 0.1, 5/13 - 1.0, 6/14 - 1.1


The future


Scheduling jobs

[Figure: two timelines of queued jobs]

Snapshot automatically after a time period and restart at end of queue
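One way to read this proposal is round-robin time slicing: a job runs for at most one slice, is snapshotted if unfinished, and goes to the back of the queue, so long jobs no longer block short ones. The sketch below simulates only that queueing behavior; the slice length, work units and `SnapshotScheduler` name are hypothetical, and no real checkpointing is performed.

```java
import java.util.AbstractMap;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical time-sliced scheduler: run a slice, snapshot if unfinished, requeue.
class SnapshotScheduler {
    /** Returns job names in completion order; work and slice use the same time unit. */
    static List<String> run(LinkedHashMap<String, Integer> work, int slice) {
        Deque<Map.Entry<String, Integer>> queue = new ArrayDeque<>();
        work.forEach((job, units) -> queue.addLast(new AbstractMap.SimpleEntry<>(job, units)));
        List<String> finished = new ArrayList<>();
        while (!queue.isEmpty()) {
            Map.Entry<String, Integer> job = queue.pollFirst();
            int left = job.getValue() - slice;               // run for at most one slice
            if (left <= 0) {
                finished.add(job.getKey());
            } else {
                // "Snapshot" the remaining work and restart at the end of the queue.
                queue.addLast(new AbstractMap.SimpleEntry<>(job.getKey(), left));
            }
        }
        return finished;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Integer> jobs = new LinkedHashMap<>();  // queue order
        jobs.put("long", 10);
        jobs.put("short", 2);
        jobs.put("mid", 5);
        System.out.println(run(jobs, 3));   // [short, mid, long]
    }
}
```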


Democratize Giraph?

Higher level primitives (e.g. HelP - Salihoglu)

• Filter

• Aggregating Neighbor Values (ANV)

• Local Update of Vertices (LUV)

• Update Vertices Using One Other Vertex (UVUOV)

  • Updates vertex values by using a value from one other vertex (not necessarily a neighbor)

• Form Supervertices (FS)

• Aggregate Global Value (AGV)

Graph traversal language (e.g. Gremlin)

```groovy
// calculate basic
// collaborative filtering for
// vertex 1
m = [:]
g.v(1).out('likes').in('likes').out('likes').groupCount(m)
m.sort{-it.value}

// calculate the primary
// eigenvector (eigenvector
// centrality) of a graph
m = [:]; c = 0;
g.V.as('x').out.groupCount(m).loop('x'){c++ < 1000}
m.sort{-it.value}
```

Implement lots of algorithms?

```shell
./run-page-rank -input pages -output page_rank_output
./run-mutual-friends -input friendlist -output pair_count_output
./run-graph-partitioning -input vertices_edges -output vertex_partition_list
```


Future work

Investigate alternative computing models

• Giraph++ (IBM research)

• Giraphx (University at Buffalo, SUNY)

Performance

Applications


Our team


Maja Kabiljo

Sergey Edunov

Pavan Athivarapu

Avery Ching

Sambavi Muthukrishnan
