Graph Processing Frameworks, Lecture 24, CSCI 4974/6971, 5 Dec 2016


Page 1:

Graph Processing Frameworks
Lecture 24

CSCI 4974/6971

5 Dec 2016

1 / 13

Page 2:

Today’s Biz

1. Reminders

2. Review

3. Graph Processing Frameworks

4. 2D Partitioning

2 / 13

Page 3:

Reminders

I Assignment 6: due date Dec 8th

I Final Project Presentation: December 8th

I Project Report: December 11th
  I Intro, Background and Prior Work, Methodology, Experiments, Results
  I Include: Report as PDF, compilable source, data if small or link if large (Google Drive, linux.cs.rpi.edu, or CCI filesystems)

I Office hours: Tuesday & Wednesday 14:00-16:00, Lally 317
  I Or email me for other availability

3 / 13

Page 4:

Today’s Biz

1. Reminders

2. Review

3. Graph Processing Frameworks

4. 2D Partitioning

4 / 13

Page 5:

Quick Review

Graphs on Manycores:

I Manycores: Xeon Phis and GPUs
I Hundreds to thousands of cores, even more threads
I Work balance among threads is king

5 / 13

Page 6:

Today’s Biz

1. Reminders

2. Review

3. Graph Processing Frameworks

4. 2D Partitioning

6 / 13

Page 7:

PREGEL: A System for Large-Scale Graph Processing

Page 8:

The Problem

• Large graphs are often part of computations required in modern systems (social networks, web graphs, etc.).

• There are many graph computing problems (shortest paths, clustering, PageRank, minimum cut, connected components, etc.), but no scalable general-purpose system exists for implementing them.

2 Pregel

Page 9:

Characteristics of the algorithms

• They often exhibit poor locality of memory access.

• Very little computation work required per vertex.

• Changing degree of parallelism over the course of execution.

Refer to [1, 2].

3 Pregel

Page 10:

Possible solutions

• Crafting a custom distributed framework for every new algorithm.

• Existing distributed computing platforms like MapReduce
  – These are sometimes used to mine large graphs [3, 4], but often give sub-optimal performance and have usability issues.

• Single-computer graph algorithm libraries
  – Limiting the scale of the graph is necessary
  – BGL, LEDA, NetworkX, JDSL, Stanford GraphBase, or FGL

• Existing parallel graph systems that do not handle fault tolerance and other issues
  – The Parallel BGL [5] and CGMgraph [6]

Pregel 4

Page 11:

Pregel

To overcome these challenges, Google came up with Pregel.

• Provides scalability

• Fault-tolerance

• Flexibility to express arbitrary algorithms

The high-level organization of Pregel programs is inspired by Valiant's Bulk Synchronous Parallel model [7].

Pregel 5

Page 12:

Message passing model

A pure message passing model has been used, omitting remote reads and ways to emulate shared memory, because:

1. The message passing model was found sufficient for all graph algorithms.

2. The message passing model performs better than reading remote values because latency can be amortized by delivering large batches of messages asynchronously.

Pregel 6

Page 13:

Message passing model

Pregel 7

Page 14:

Example

Find the largest value of a vertex in a strongly connected graph

8 Pregel

Page 15:

[Figure: Finding the largest value in a graph. The initial values 3, 6, 2, 1 are propagated superstep by superstep until every vertex holds 6. Blue arrows are messages; blue vertices have voted to halt.]

9 Pregel

Page 16:

Basic Organization

• Computations consist of a sequence of iterations called supersteps.

• During a superstep, the framework invokes a user-defined function for each vertex, which specifies the behavior at a single vertex V and a single superstep S. The function can:
  – Read messages sent to V in superstep S-1
  – Send messages to other vertices that will be received in superstep S+1
  – Modify the state of V and of its outgoing edges
  – Make topology changes (introduce/delete/modify edges/vertices)
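The per-superstep contract above can be sketched as a toy BSP loop outside any real framework; everything below (VertexState, inbox, outbox) is illustrative and not Pregel's API. It runs the max-value example that appears a few slides later: a vertex that sees no improvement votes to halt, and an incoming message reactivates it.

// Toy BSP superstep loop (illustrative only; none of this is the Pregel API).
#include <cstdio>
#include <map>
#include <vector>

struct VertexState { double value; bool halted = false; };

int main() {
  // Tiny directed cycle 0 -> 1 -> 2 -> 0; goal: every vertex learns the max value.
  std::vector<std::vector<int>> out_edges = {{1}, {2}, {0}};
  std::vector<VertexState> v = {{3.0}, {6.0}, {2.0}};

  std::map<int, std::vector<double>> inbox;        // messages sent in superstep S-1
  for (int superstep = 0; ; ++superstep) {
    std::map<int, std::vector<double>> outbox;     // messages for superstep S+1
    bool all_halted = true;
    for (int i = 0; i < (int)v.size(); ++i) {
      std::vector<double>& msgs = inbox[i];
      if (v[i].halted && msgs.empty()) continue;   // inactive and no message: skip
      v[i].halted = false;                         // a message reactivates the vertex
      double best = v[i].value;
      for (double m : msgs) best = (m > best) ? m : best;
      if (best > v[i].value || superstep == 0) {   // changed (or first superstep):
        v[i].value = best;                         //   modify state and send messages
        for (int t : out_edges[i])                 //   to be received in superstep S+1
          outbox[t].push_back(best);
      } else {
        v[i].halted = true;                        // vote to halt
      }
      all_halted = all_halted && v[i].halted;
    }
    inbox = std::move(outbox);
    if (all_halted && inbox.empty()) break;        // no active vertices, no messages
  }
  for (const auto& s : v) std::printf("%g\n", s.value);  // prints 6 three times
}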

10 Pregel

Page 17:

Basic Organization - Superstep

11 Pregel

Page 18:

Model Of Computation: Entities

VERTEX

• Identified by a unique identifier.

• Has a modifiable, user defined value.

EDGE

• Source vertex and Target vertex identifiers.

• Has a modifiable, user defined value.

Pregel 12

Page 19:

Model Of Computation: Progress

• In superstep 0, all vertices are active.

• Only active vertices participate in a superstep.

– They can go inactive by voting to halt.

– They can be reactivated by an external message from another vertex.

• The algorithm terminates when all vertices have voted to halt and there are no messages in transit.

13 Pregel

Page 20:

Model Of Computation: Vertex

State machine for a vertex

14 Pregel

Page 21:

Comparison with MapReduce

Graph algorithms can be implemented as a series of MapReduce invocations, but this requires passing the entire state of the graph from one stage to the next, which is not the case with Pregel.

The Pregel framework also reduces programming complexity by using supersteps.

15 Pregel

Page 22:

The C++ API

Creating a Pregel program typically involves subclassing the predefined Vertex class.

• The user overrides the virtual Compute() method, which is executed for every active vertex in each superstep.

• Compute() can get the vertex’s associated value by GetValue() or modify it using MutableValue()

• Values of edges can be inspected and modified using the out-edge iterator.

16 Pregel

Page 23:

The C++ API – Message Passing

Each message consists of a value and the name of the destination vertex.

– The type of the value is specified in the template parameter of the Vertex class.

Any number of messages can be sent in a superstep.

– The framework guarantees delivery and non-duplication, but not in-order delivery.

A message can be sent to any vertex if its identifier is known.

17 Pregel

Page 24:

The C++ API – Pregel Code

Pregel Code for finding the max value

class MaxFindVertex
    : public Vertex<double, void, double> {
 public:
  virtual void Compute(MessageIterator* msgs) {
    // Current value of this vertex (value type is double, per the template).
    double currMax = GetValue();
    // Propose our value to all neighbors.
    SendMessageToAllNeighbors(currMax);
    // Fold in the messages received from the previous superstep.
    for ( ; !msgs->Done(); msgs->Next()) {
      if (msgs->Value() > currMax)
        currMax = msgs->Value();
    }
    if (currMax > GetValue())
      *MutableValue() = currMax;  // improved: stay active
    else
      VoteToHalt();               // no change: vote to halt
  }
};

18 Pregel

Page 25:

The C++ API – Combiners

Sending a message to a vertex on a different machine has some overhead. However, if the algorithm doesn't need each message individually but only a function of them (for example, their sum), combiners can be used.

This can be done by overriding the Combine() method.

– It can be used only for associative and commutative operations.
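The effect can be sketched without the framework: messages headed for the same destination are folded with the combine operation (max here) before they would leave the worker, so only one message per destination crosses the network. The names below are illustrative and not Pregel's Combiner API.

// Illustrative max-combiner: fold pending messages per destination before "sending".
#include <algorithm>
#include <cstdio>
#include <unordered_map>
#include <utility>
#include <vector>

int main() {
  // (destination vertex, message value) pairs produced during a superstep.
  std::vector<std::pair<int, double>> pending = {
      {7, 3.0}, {7, 6.0}, {7, 2.0}, {9, 1.0}, {9, 5.0}};

  std::unordered_map<int, double> combined;        // one entry per destination
  for (const auto& m : pending) {
    auto it = combined.find(m.first);
    if (it == combined.end())
      combined[m.first] = m.second;
    else
      it->second = std::max(it->second, m.second); // Combine(): keep only the max
  }
  // Only one message per destination crosses the network instead of one per edge.
  for (const auto& kv : combined)
    std::printf("send max %g to vertex %d\n", kv.second, kv.first);
}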

19 Pregel

Page 26:

The C++ API – Combiners

Example: Say we want to count the number of incoming links to all the pages in a set of interconnected pages.

In the first iteration, for each link from a vertex (page) we will send a message to the destination page.

Here, a count function over the incoming messages can be used as a combiner to optimize performance.

In the MaxValue Example, a Max combiner would reduce the communication load.

20 Pregel

Page 27:

The C++ API – Combiners

21 Pregel

Page 28:

The C++ API – Aggregators

They are used for global communication, monitoring, and data.

Each vertex can produce a value in a superstep S for the Aggregator to use. The Aggregated value is available to all the vertices in superstep S+1.

Aggregators can be used for statistics and for global communication.

Can be implemented by subclassing the Aggregator class.

Commutativity and associativity are required.

22 Pregel

Page 29:

The C++ API – Aggregators

Example: A Sum operator applied to the out-edge count of each vertex can be used to generate the total number of edges in the graph and communicate it to all the vertices.

– More complex reduction operators can even generate histograms.

In the MaxValue example, we can finish the entire program in a single superstep by using a Max aggregator.

23 Pregel
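A standalone sketch of that Sum aggregator idea (illustrative names only, not the Pregel Aggregator class): every vertex contributes its out-degree in superstep S, the contributions are reduced with a commutative and associative sum, and the total is what each vertex would read in superstep S+1.

// Illustrative sum aggregator over per-vertex out-degree contributions.
#include <cstdio>
#include <numeric>
#include <vector>

int main() {
  std::vector<std::vector<int>> out_edges = {{1, 2}, {2}, {0, 1, 3}, {}};

  // Superstep S: every vertex reports a value to the aggregator.
  std::vector<long> contributions;
  for (const auto& adj : out_edges) contributions.push_back((long)adj.size());

  // The framework reduces the contributions (the op must be commutative/associative).
  long total_edges =
      std::accumulate(contributions.begin(), contributions.end(), 0L);

  // Superstep S+1: the aggregated value would be visible to all vertices.
  std::printf("total edges known to every vertex: %ld\n", total_edges);
}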

Page 30:

The C++ API – Topology Mutations

The Compute() function can also be used to modify the structure of the graph.

Example: Hierarchical Clustering

Mutations take effect in the superstep after the requests were issued. Ordering of mutations, with
  – deletions taking place before additions,
  – deletion of edges before vertices, and
  – addition of vertices before edges,
resolves most of the conflicts. The rest are handled by user-defined handlers.

24 Pregel

Page 31:

Implementation

Pregel is designed for the Google cluster architecture.

The architecture schedules jobs to optimize resource allocation, which may involve killing instances or moving them to different locations.

Persistent data is stored as files on a distributed storage system like GFS [8] or BigTable.

25 Pregel

Page 32:

Basic Architecture

The Pregel library divides a graph into partitions, based on the vertex ID, each consisting of a set of vertices and all of those vertices’ out-going edges.

The default function is hash(ID) mod N, where N is the number of partitions.
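A minimal sketch of that default assignment (std::hash standing in for whatever hash function Pregel actually uses; the vertex IDs and N below are made up):

// Default-style partition assignment: partition = hash(vertex ID) mod N.
#include <cstdio>
#include <functional>
#include <vector>

int main() {
  const std::size_t num_partitions = 4;            // N
  std::hash<long long> h;                          // stand-in hash function
  std::vector<long long> ids = {3, 17, 42, 1000};  // example vertex IDs
  for (long long id : ids)
    std::printf("vertex %lld -> partition %zu\n", id, h(id) % num_partitions);
}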

The next few slides describe the several stages of the execution of a Pregel program.

26 Pregel

Page 33:

Pregel Execution

1. Many copies of the user program begin executing on a cluster of machines. One of these copies acts as the master.

The master is not assigned any portion of the graph, but is responsible for coordinating worker activity.

27 Pregel

Page 34:

Pregel Execution

2. The master determines how many partitions the graph will have and assigns one or more partitions to each worker machine.

Each worker is responsible for maintaining the state of its section of the graph, executing the user’s Compute() method on its vertices, and managing messages to and from other workers.

28 Pregel

Page 35:

Pregel Execution

29 Pregel

[Figure: example graph with vertices 1-12 divided into partitions.]

Page 36:

Pregel Execution

3. The master assigns a portion of the user’s input to each worker.

The input is treated as a set of records, each of which contains an arbitrary number of vertices and edges.

After the input has finished loading, all vertices are marked as active.

30 Pregel

Page 37:

Pregel Execution

4. The master instructs each worker to perform a superstep. The worker loops through its active vertices and calls Compute() for each active vertex. It also delivers messages that were sent in the previous superstep.

When the worker finishes, it responds to the master with the number of vertices that will be active in the next superstep.

31 Pregel

Page 38:

Pregel Execution

32 Pregel

Page 39:

Pregel Execution

33 Pregel

Page 40:

Fault Tolerance

• Checkpointing is used to implement fault tolerance.

– At the start of every superstep the master may instruct the workers to save the state of their partitions in stable storage.

– This includes vertex values, edge values and incoming messages.

• The master uses "ping" messages to detect worker failures.

34 Pregel

Page 41:

Fault Tolerance

• When one or more workers fail, their associated partitions’ current state is lost.

• The master reassigns these partitions to the available set of workers.
  – They reload their partition state from the most recent available checkpoint, which can be many supersteps old.
  – The entire system is restarted from this superstep.

• Confined recovery can be used to reduce this load.

35 Pregel

Page 42:

Applications

PageRank

36 Pregel

Page 43:

PageRank

PageRank is a link analysis algorithm that is used to determine the importance of a document based on the number of references to it and the importance of the source documents themselves.

[This was named after Larry Page (and not after the rank of a webpage)]

37 Pregel

Page 44:

PageRank

A = A given page

T1 …. Tn = Pages that point to page A (citations)

d = Damping factor between 0 and 1 (usually kept as 0.85)

C(T) = number of links going out of T

PR(A) = the PageRank of page A

PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + PR(T2)/C(T2) + ... + PR(Tn)/C(Tn) )

38 Pregel

Page 45:

PageRank

Courtesy: Wikipedia

39 Pregel

Page 46:

PageRank

40 Pregel

PageRank can be solved in 2 ways:
• A system of linear equations
• An iterative loop till convergence

We look at the pseudo code of the iterative version:

Initial value of PageRank of all pages = 1.0;
While (sum of PageRank of all pages - numPages > epsilon) {
  for each Page Pi in list {
    PageRank(Pi) = (1-d);
    for each page Pj linking to page Pi {
      PageRank(Pi) += d * (PageRank(Pj) / numOutLinks(Pj));
    }
  }
}
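The same loop as a small self-contained C++ program (the link structure is a made-up toy; convergence is checked via the total change per iteration, one common way to realize the loop condition above):

// Iterative PageRank along the lines of the pseudo code above (d = 0.85).
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
  // links[j] = pages that page Pj links to (its out-links).
  std::vector<std::vector<int>> links = {{1, 2}, {2}, {0}, {0, 2}};
  const int n = (int)links.size();
  const double d = 0.85, epsilon = 1e-10;

  std::vector<double> pr(n, 1.0);                  // initial PageRank of all pages = 1.0
  double change = 1.0;
  while (change > epsilon) {
    std::vector<double> next(n, 1.0 - d);          // PageRank(Pi) = (1 - d)
    for (int j = 0; j < n; ++j)                    // for each page Pj ...
      for (int i : links[j])                       // ... linking to page Pi
        next[i] += d * pr[j] / (double)links[j].size();
    change = 0.0;
    for (int i = 0; i < n; ++i) change += std::fabs(next[i] - pr[i]);
    pr.swap(next);
  }
  for (int i = 0; i < n; ++i) std::printf("PR(%d) = %.4f\n", i, pr[i]);
}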

Page 47:

PageRank in MapReduce – Phase 1

Parsing HTML
• Map task takes (URL, page content) pairs and maps them to (URL, (PRinit, list-of-urls))
  – PRinit is the "seed" PageRank for URL
  – list-of-urls contains all pages pointed to by URL

• Reduce task is just the identity function

41 Pregel

Page 48:

PageRank in MapReduce – Phase 2

PageRank Distribution
• Map task takes (URL, (cur_rank, url_list))
  – For each u in url_list, emit (u, cur_rank/|url_list|)
  – Emit (URL, url_list) to carry the points-to list along through iterations
• Reduce task gets (URL, url_list) and many (URL, val) values
  – Sum vals and fix up with d
  – Emit (URL, (new_rank, url_list))

42 Pregel

Page 49:

PageRank in MapReduce - Finalize

• A non-parallelizable component determines whether convergence has been achieved

• If so, write out the PageRank lists - done

• Otherwise, feed output of Phase 2 into another Phase 2 iteration

43 Pregel

Page 50:

PageRank in Pregel

class PageRankVertex
    : public Vertex<double, void, double> {
 public:
  virtual void Compute(MessageIterator* msgs) {
    if (superstep() >= 1) {
      // Sum the PageRank fractions received from in-neighbors.
      double sum = 0;
      for ( ; !msgs->Done(); msgs->Next())
        sum += msgs->Value();
      *MutableValue() = 0.15 + 0.85 * sum;
    }
    if (superstep() < 30) {
      // Send our tentative PageRank, split evenly over the out-edges.
      const int64 n = GetOutEdgeIterator().size();
      SendMessageToAllNeighbors(GetValue() / n);
    } else {
      VoteToHalt();
    }
  }
};

44 Pregel

Page 51:

PageRank in Pregel

The Pregel implementation contains the PageRankVertex, which inherits from the Vertex class.

The class has the vertex value type double to store the tentative PageRank and message type double to carry PageRank fractions.

The graph is initialized so that in superstep 0, the value of each vertex is 1.0.

45 Pregel

Page 52:

PageRank in Pregel

In each superstep, each vertex sends out along each outgoing edge its tentative PageRank divided by the number of outgoing edges.

Also, each vertex sums up the values arriving on messages into sum and sets its own tentative PageRank to 0.15 + 0.85 * sum.

For convergence, either there is a limit on the number of supersteps or aggregators are used to detect convergence.

46 Pregel

Page 53:

Apache Giraph: Large-scale Graph Processing on Hadoop

Claudio Martella <[email protected]> @claudiomartella

Hadoop Summit @ Amsterdam - 3 April 2014

Page 54:

2

Page 55:

Graphs are simple

3

Page 56:

A computer network

4

Page 57:

A social network

5

Page 58:

A semantic network

6

Page 59:

A map

7

Page 60:

Graphs are huge

• Google's index contains 50B pages

• Facebook has around 1.1B users

• Google+ has around 570M users

• Twitter has around 530M users

VERY rough estimates!

8

Page 61:

9

Page 62:

Graphs aren’t easy

10

Page 63:

Graphs are nasty.

11

Page 64:

Each vertex depends on its neighbours, recursively.

12

Page 65:

Recursive problems are nicely solved iteratively.

13

Page 66:

PageRank in MapReduce

•Record: < v_i, pr, [ v_j, ..., v_k ] >

•Mapper: emits < v_j, pr / #neighbours >

•Reducer: sums the partial values

14

Page 67:

MapReduce dataflow

15

Page 68:

Drawbacks

•Each job is executed N times

•Job bootstrap

•Mappers send PR values and structure

•Extensive IO at input, shuffle & sort, output

16

Page 69:

17

Page 70:

Timeline

• Inspired by Google Pregel (2010)

• Donated to ASF by Yahoo! in 2011

• Top-level project in 2012

• 1.0 release in January 2013

• 1.1 release expected in days (2014)

18

Page 71:

Plays well with Hadoop

19

Page 72:

Vertex-centric API

20

Page 73:

BSP machine

21

Page 74:

BSP & Giraph

22

Page 75:

Advantages

•No locks: message-based communication

•No semaphores: global synchronization

•Iteration isolation: massively parallelizable

23

Page 76:

Architecture

24

Page 77:

Giraph job lifetime

25

Page 78:

Designed for iterations

•Stateful (in-memory)

•Only intermediate values (messages) sent

•Hits the disk at input, output, checkpoint

•Can go out-of-core

26

Page 79:

A bunch of other things

•Combiners (minimises messages)

•Aggregators (global aggregations)

•MasterCompute (executed on master)

•WorkerContext (executed per worker)

•PartitionContext (executed per partition)

27

Page 80:

Shortest Paths

28

Page 81:

Shortest Paths

29

Page 82:

Shortest Paths

30

Page 83:

Shortest Paths

31

Page 84:

Shortest Paths

32

Page 85:

Composable API

33

Page 86:

Checkpointing

34

Page 87:

No SPoFs

35

Page 88:

Giraph scales

36

ref: https://www.facebook.com/notes/facebook-engineering/scaling-apache-giraph-to-a-trillion-edges/10151617006153920

Page 89:

Giraph is fast

• 100x over MR (Pr)

• jobs run within minutes

• given you have resources ;-)

37

Page 90:

Serialised objects

38

Page 91:

Primitive types

•Autoboxing is expensive

•Objects overhead (JVM)

•Use primitive types on your own

• Use primitive-type-based libs (e.g. fastutil)

39

Page 92:

Sharded aggregators

40

Page 93:

Many stores with Gora

41

Page 94:

And graph databases

42

Page 95:

Current and next steps

•Out-of-core graph and messages

•Jython interface

•Remove Writable from < I V E M >

•Partitioned supernodes

•More documentation

43

Page 96:

GraphLab: A New Framework for Parallel Machine Learning

Yucheng Low, Aapo Kyrola, Carlos Guestrin, Joseph Gonzalez, Danny Bickson, Joe Hellerstein

Presented by Guozhang Wang

DB Lunch, Nov.8, 2010

Page 97:

Overview

Programming ML Algorithms in Parallel

◦ Common Parallelism and MapReduce

◦ Global Synchronization Barriers

GraphLab

◦ Data Dependency as a Graph

◦ Synchronization as Fold/Reduce

Implementation and Experiments

From Multicore to Distributed Environment

Page 98:

Parallel Processing for ML

Parallel ML is a Necessity

◦ 13 Million Wikipedia Pages

◦ 3.6 Billion photos on Flickr

◦ etc

Parallel ML is Hard to Program

◦ Concurrency vs. Deadlock

◦ Load Balancing

◦ Debug

◦ etc

Page 99:

MapReduce is the Solution?

High-level abstraction: Statistical Query Model [Chu et al., 2006]

Weighted Linear Regression: only sufficient statistics

𝚹 = A⁻¹b,  A = 𝚺 wᵢ(xᵢxᵢᵀ),  b = 𝚺 wᵢ(xᵢyᵢ)
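A tiny single-variable instance of that sufficient-statistics view (toy data of my own): each data point contributes w_i*x_i^2 to A and w_i*x_i*y_i to b independently of the others, so the sums fit the map phase, and a single reducer forms 𝚹 = A⁻¹b (a plain division in the scalar case).

// Weighted linear regression via sufficient statistics (1-D case):
// theta = A^-1 * b with A = sum_i w_i*x_i^2 and b = sum_i w_i*x_i*y_i.
#include <cstdio>
#include <vector>

int main() {
  struct Point { double x, y, w; };
  std::vector<Point> data = {{1, 2.1, 1}, {2, 3.9, 1}, {3, 6.2, 2}, {4, 8.0, 1}};

  // "Map": each point's contribution is independent of all other points.
  double A = 0, b = 0;
  for (const auto& p : data) {
    A += p.w * p.x * p.x;
    b += p.w * p.x * p.y;
  }

  // "Reduce": one division plays the role of A^-1 * b in the scalar case.
  std::printf("theta = %.4f\n", b / A);
}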

Page 100:

MapReduce is the Solution?

High-level abstraction: Statistical Query Model [Chu et al., 2006]

K-Means: only data assignments

class mean = avg(xi), xi in class

Embarrassingly Parallel: independent computation

No Communication needed

Page 101:

ML in MapReduce

Multiple Mapper

Single Reducer

Iterative MapReduce needs global synchronization at the single reducer

◦ K-means

◦ EM for graphical models

◦ gradient descent algorithms, etc

Page 102:

Not always Embarrassingly Parallel

Data Dependency: not MapReducable

◦ Gibbs Sampling

◦ Belief Propagation

◦ SVM

◦ etc

Capture Dependency as a Graph!

Page 103:

Overview

Programming ML Algorithms in Parallel

◦ Common Parallelism and MapReduce

◦ Global Synchronization Barriers

GraphLab

◦ Data Dependency as a Graph

◦ Synchronization as Fold/Reduce

Implementation and Experiments

From Multicore to Distributed Environment

Page 104:

Key Idea of GraphLab

Sparse Data Dependencies

Local Computations

[Figure: sparse dependency graph over variables X1-X9.]

Page 105:

GraphLab for ML

High-level Abstraction

◦ Express data dependencies

◦ Iterative

Automatic Multicore Parallelism

◦ Data Synchronization

◦ Consistency

◦ Scheduling

Page 106:

Main Components of GraphLab

Data Graph

Shared Data Table

Scheduling

Update Functions and Scopes

GraphLab Model

Page 107:

Data Graph

A graph with data associated with every vertex and edge.

[Figure: example data graph over variables X1-X11; x3 is a sample value, C(X3) stores sample counts, and Φ(X6,X9) is a binary potential on the edge between X6 and X9.]

Page 108:

Update Functions

Operations applied on a vertex that transform data in the scope of the vertex.

Gibbs Update:
- Read samples on adjacent vertices
- Read edge potentials
- Compute a new sample for the current vertex

Page 109:

Scope Rules

Consistency vs. Parallelism

◦ Belief Propagation: only uses edge data

◦ Gibbs Sampling: needs to read adjacent vertices

Page 110:

Scheduling

The scheduler determines the order of Update Function evaluations.

Static Scheduling

◦ Round Robin, etc

Dynamic Scheduling

◦ FIFO, Priority Queue, etc

Page 111:

Dynamic Scheduling

[Figure: dynamic scheduling: a task queue holding vertices a-k is consumed by CPU 1 and CPU 2.]

Page 112:

Global Information

Shared Data Table in Shared Memory

◦ Model parameters (updatable)

◦ Sufficient statistics (updatable)

◦ Constants, etc (fixed)

Sync Functions for Updatable Shared Data

◦ Accumulate performs an aggregation over vertices

◦ Apply makes a final modification to the accumulated data

Page 113:

Sync Functions

Much like Fold/Reduce

◦ Execute Aggregate over every vertex in turn

◦ Execute Apply once at the end

Can be called

◦ Periodically while update functions are active (asynchronous), or

◦ By the update function or user code (synchronous)
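A fold/reduce-style sketch of such a sync function (illustrative code, not GraphLab's API): Accumulate is folded over every vertex value in turn, and Apply post-processes the accumulated result once, e.g. turning a running sum into a mean destined for the shared data table.

// Sync function sketch: fold Accumulate over all vertices, then Apply once.
#include <cstdio>
#include <vector>

struct Acc { double sum = 0; long count = 0; };

// Accumulate: combine one vertex's data into the running accumulator.
static void Accumulate(Acc& acc, double vertex_value) {
  acc.sum += vertex_value;
  acc.count += 1;
}

// Apply: one final modification before publishing to the shared data table.
static double Apply(const Acc& acc) {
  return acc.count ? acc.sum / (double)acc.count : 0.0;
}

int main() {
  std::vector<double> vertex_values = {0.2, 1.4, 3.1, 0.9};
  Acc acc;
  for (double v : vertex_values) Accumulate(acc, v);   // fold step
  double shared_entry = Apply(acc);                    // finalize step
  std::printf("shared data entry (mean vertex value): %.3f\n", shared_entry);
}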

Page 114:

GraphLab

GraphLab Model

Data Graph

Shared Data Table

Scheduling

Update Functions and Scopes

Page 115:

Overview

Programming ML Algorithms in Parallel

◦ Common Parallelism and MapReduce

◦ Global Synchronization Barriers

GraphLab

◦ Data Dependency as a Graph

◦ Synchronization as Fold/Reduce

Implementation and Experiments

From Multicore to Distributed Environment

Page 116:

Implementation and Experiments

Shared Memory Implementation in C++ using Pthreads

Applications:

◦ Belief Propagation

◦ Gibbs Sampling

◦ CoEM

◦ Lasso

◦ etc (more on the project page)

Page 117:

Parallel Performance

[Figure: speedup vs. number of CPUs (up to 16), comparing a round-robin schedule and a colored schedule against optimal linear speedup.]

Page 118:

From Multicore to Distributed Environment

MapReduce and GraphLab work well for multicores

◦ Simple high-level abstraction

◦ Local computation + global synchronization

When migrating to clusters

◦ Rethink Scope synchronization

◦ Rethink Shared Data single "reducer"

◦ Think Load Balancing

◦ Maybe rethink the abstract model?

Page 119:

22.06.2015 DIMA – TU Berlin 1

Fachgebiet Datenbanksysteme und Informationsmanagement Technische Universität Berlin

http://www.dima.tu-berlin.de/

Hot Topics in Information Management

PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs

Igor Shevchenko

Mentor: Sebastian Schelter

Page 120:

22.06.2015 DIMA – TU Berlin 2

Agenda

1. Natural Graphs: Properties and Problems;

2. PowerGraph: Vertex Cut and Vertex Programs;

3. GAS Decomposition;

4. Vertex Cut Partitioning;

5. Delta Caching;

6. Applications and Evaluation;

Paper: Gonzalez et al. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs.

Page 121:

22.06.2015 DIMA – TU Berlin 3

■ Natural graphs are graphs derived from real-world or natural phenomena;

■ Graphs are big: billions of vertices and edges and rich metadata;

Natural graphs have

Power-Law Degree Distribution

Natural Graphs

Page 122:

22.06.2015 DIMA – TU Berlin 4

Power-Law Degree Distribution

(Andrei Broder et al. Graph structure in the web)

Page 123:

22.06.2015 DIMA – TU Berlin 5

■ We want to analyze natural graphs;

■ Essential for Data Mining and Machine Learning;

Goal

Identify influential people and information; Identify special nodes and communities; Model complex data dependencies;

Target ads and products; Find communities; Flow scheduling;

Page 124:

22.06.2015 DIMA – TU Berlin 6

■ Existing distributed graph computation systems perform poorly on natural graphs (Gonzalez et al. OSDI '12);

■ The reason is the presence of high-degree vertices;

Problem

High Degree Vertices: Star-like motif

Page 125:

22.06.2015 DIMA – TU Berlin 7

Possible problems with high degree vertices:

■ Limited single-machine resources;

■ Work imbalance;

■ Sequential computation;

■ Communication costs;

■ Graph partitioning;

Applicable to:

■ Hadoop; GraphLab; Pregel (Piccolo);

Problem Continued

Page 126:

22.06.2015 DIMA – TU Berlin 8

■ High degree vertices can exceed the memory capacity of a single machine;

■ Store edge meta-data and adjacency information;

Problem: Limited Single-Machine Resources

Page 127:

22.06.2015 DIMA – TU Berlin 9

■ The power-law degree distribution can lead to significant work imbalance and frequent barriers;

■ For ex. with synchronous execution (Pregel):

Problem: Work Imbalance

Page 128:

22.06.2015 DIMA – TU Berlin 10

■ No parallelization of individual vertex-programs;

■ Edges are processed sequentially;

■ Locking does not scale well to high degree vertices (for ex. in GraphLab);

Problem: Sequential Computation

[Figure: edges are processed sequentially; asynchronous execution requires heavy locking.]

Page 129:

22.06.2015 DIMA – TU Berlin 11

■ Generate and send a large number of identical messages (for ex. in Pregel);

■ This results in communication asymmetry;

Problem: Communication Costs

Page 130:

22.06.2015 DIMA – TU Berlin 12

■ Natural graphs are difficult to partition;

■ Pregel and GraphLab use random (hashed) partitioning on natural graphs thus maximizing the network communication;

Problem: Graph Partitioning

Page 131:

22.06.2015 DIMA – TU Berlin 13

■ Natural graphs are difficult to partition;

■ Pregel and GraphLab use random (hashed) partitioning on natural graphs thus maximizing the network communication;

Expected fraction of edges that are cut: 1 - 1/p, where p = number of machines.

Examples:

■ 10 machines: 90% of edges cut;

■ 100 machines: 99% of edges cut;

Problem: Graph Partitioning Continued
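That estimate is easy to check empirically (a toy simulation of my own): place each endpoint of an edge on a uniformly random machine and count how often the two endpoints land on different machines; the measured cut fraction approaches 1 - 1/p.

// Empirical check: random (hashed) vertex placement cuts about 1 - 1/p of the edges.
#include <cstdio>
#include <random>

int main() {
  std::mt19937 rng(42);
  for (int p : {10, 100}) {
    std::uniform_int_distribution<int> machine(0, p - 1);
    const int trials = 1000000;                    // number of sampled edges
    int cut = 0;
    for (int e = 0; e < trials; ++e)
      if (machine(rng) != machine(rng)) ++cut;     // endpoints on different machines
    std::printf("p = %3d machines: cut fraction = %.3f (expected %.3f)\n",
                p, cut / (double)trials, 1.0 - 1.0 / p);
  }
}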

Page 132:

22.06.2015 DIMA – TU Berlin 14

■ GraphLab and Pregel are not well suited for computations on natural graphs;

Reasons:

■ Challenges of high-degree vertices;

■ Low quality partitioning;

Solution:

■ PowerGraph: a new abstraction;

In Summary

Page 133:

22.06.2015 DIMA – TU Berlin 15

PowerGraph

Page 134:

22.06.2015 DIMA – TU Berlin 16

Two approaches for partitioning the graph in a distributed environment:

■ Edge Cut;

■ Vertex Cut;

Partition Techniques

Page 135:

22.06.2015 DIMA – TU Berlin 17

■ Used by Pregel and GraphLab abstractions;

■ Evenly assign vertices to machines;

Edge Cut

Page 136:

22.06.2015 DIMA – TU Berlin 18

■ Used by PowerGraph abstraction;

■ Evenly assign edges to machines;

Vertex Cut (the strong point of the paper)

[Figure: vertex cut example: the edges are split evenly, 4 edges per machine.]

Page 137:

22.06.2015 DIMA – TU Berlin 19

Think like a Vertex

[Malewicz et al. SIGMOD’10]

User-defined Vertex-Program:

1. Runs on each vertex;

2. Interactions are constrained by graph structure;

Pregel and GraphLab also use this concept, where parallelism is achieved by running multiple vertex programs simultaneously;

Vertex Programs

Page 138:

22.06.2015 DIMA – TU Berlin 20

■ Vertex cut distributes a single vertex-program across several machines;

■ Allows parallelizing high-degree vertices;

GAS Decomposition (the strong point of the paper)

Page 139:

22.06.2015 DIMA – TU Berlin 21

Generalize the vertex-program into three phases:

1. Gather

Accumulate information about neighborhood;

2. Apply

Apply accumulated value to center vertex;

3. Scatter

Update adjacent edges and vertices;

GAS Decomposition (the strong point of the paper)

Gather, Apply and Scatter are user-defined functions;
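A standalone sketch of one GAS update for a PageRank-style vertex program (illustrative functions, not PowerGraph's API): Gather sums contributions over the in-edges, Apply writes the merged result into the centre vertex, and Scatter visits the out-edges (here it only reports them). Because Gather runs per edge, the gathers of a high-degree vertex can be spread over the machines holding its edges and merged with the same sum.

// GAS sketch: Gather (per in-edge), Apply (per vertex), Scatter (per out-edge).
#include <cstdio>
#include <vector>

struct Graph {
  std::vector<std::vector<int>> in_edges;   // in_edges[v]  = sources u with u -> v
  std::vector<std::vector<int>> out_edges;  // out_edges[v] = targets w with v -> w
  std::vector<double> rank;
};

// Gather: executed on the edges (possibly on several machines) and summed.
static double Gather(const Graph& g, int v) {
  double acc = 0;
  for (int u : g.in_edges[v])
    acc += g.rank[u] / (double)g.out_edges[u].size();
  return acc;
}

// Apply: executed once on the central vertex with the merged accumulator.
static void Apply(Graph& g, int v, double acc) {
  g.rank[v] = 0.15 + 0.85 * acc;
}

// Scatter: executed on the out-edges; a real program would update edge data here.
static void Scatter(const Graph& g, int v) {
  for (int w : g.out_edges[v])
    std::printf("scatter along edge %d -> %d\n", v, w);
}

int main() {
  Graph g;
  g.out_edges = {{1, 2}, {2}, {0}};
  g.in_edges = {{2}, {0}, {0, 1}};
  g.rank = {1.0, 1.0, 1.0};
  for (int v = 0; v < 3; ++v) {
    Apply(g, v, Gather(g, v));
    Scatter(g, v);
  }
}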

Page 140:

22.06.2015 DIMA – TU Berlin 22

■ Executed on the edges in parallel;

■ Accumulate information about neighborhood;

Gather Phase

Page 141:

22.06.2015 DIMA – TU Berlin 23

■ Executed on the central vertex;

■ Apply accumulated value to center vertex;

Apply Phase

Page 142:

Today’s Biz

1. Reminders

2. Review

3. Graph Processing Frameworks

4. 2D Partitioning

7 / 13

Page 143:

2D Partitioning
Aydın Buluç and Kamesh Madduri

8 / 13

Page 144:

Graph Partitioning for Scalable Distributed Graph Computations

Aydın Buluç Kamesh Madduri [email protected] [email protected]

10th DIMACS Implementation Challenge, Graph Partitioning and Graph Clustering February 13-14, 2012

Atlanta, GA

Page 145:

Overview of our study

• We assess the impact of graph partitioning for computations on ‘low diameter’ graphs

• Does minimizing edge cut lead to lower execution time?

• We choose parallel Breadth-First Search as a representative distributed graph computation

• Performance analysis on DIMACS Challenge instances

2

Page 146:

Key Observations for Parallel BFS

• Well-balanced vertex and edge partitions do not guarantee load-balanced execution, particularly for real-world graphs

– Range of relative speedups (8.8-50X, 256-way parallel concurrency) for low-diameter DIMACS graph instances.

• Graph partitioning methods reduce overall edge cut and communication volume, but lead to increased computational load imbalance

• Inter-node communication time is not the dominant cost in our tuned bulk-synchronous parallel BFS implementation

3

Page 147:

Talk Outline

• Level-synchronous parallel BFS on distributed-memory systems

– Analysis of communication costs

• Machine-independent counts for inter-node communication cost

• Parallel BFS performance results for several large-scale DIMACS graph instances

4

Page 148:

Parallel BFS strategies

5

1. Expand current frontier (level-synchronous approach, suited for low diameter graphs)
• O(D) parallel steps
• Adjacencies of all vertices in current frontier are visited in parallel
[Figure: example graph with a marked source vertex]

2. Stitch multiple concurrent traversals (Ullman-Yannakakis, for high-diameter graphs)
• path-limited searches from "super vertices"
• APSP between "super vertices"
[Figure: example graph with a marked source vertex]

Page 149:

“2D” graph distribution

• Consider a logical 2D processor grid (pr * pc = p) and the dense matrix representation of the graph

• Assign each processor a sub-matrix (i.e., the edges within that sub-matrix)

[Figure: adjacency matrix of a 9-vertex example graph distributed over 9 processors on a 3x3 processor grid; each processor flattens its sparse sub-matrix into a per-processor local graph representation]
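As a rough illustration (not the reference implementation), one common way to realize this layout is to give each vertex a block row and a block column and assign edge (u, v) to the processor at grid position (block_row(u), block_col(v)); the block sizes and the row-major rank formula below are assumptions.

#include <cstdint>

// Hypothetical owner computation for a 2D (pr x pc) edge distribution:
// vertices are split into pr "block rows" and pc "block columns",
// and edge (u, v) is stored by the processor at position
// (block_row(u), block_col(v)) of the grid.
struct Grid2D {
  int64_t n;   // number of vertices
  int pr, pc;  // processor grid dimensions, pr * pc = p

  int block_row(int64_t u) const {
    int64_t rows_per_block = (n + pr - 1) / pr;   // ceiling division
    return static_cast<int>(u / rows_per_block);
  }
  int block_col(int64_t v) const {
    int64_t cols_per_block = (n + pc - 1) / pc;
    return static_cast<int>(v / cols_per_block);
  }
  // Row-major rank of the processor that owns edge (u, v).
  int owner(int64_t u, int64_t v) const {
    return block_row(u) * pc + block_col(v);
  }
};

// Example: 9 vertices on a 3x3 grid, as in the slide.
// Grid2D g{9, 3, 3}; g.owner(4, 7) == 1 * 3 + 2 == 5.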


BFS with a 1D-partitioned graph

Steps:

1. Local discovery: Explore adjacencies of vertices in the current frontier.

2. Fold: All-to-all exchange of adjacencies.

3. Local update: Update distances/parents for unvisited vertices.

[Figure: example 7-vertex graph split across four partitions; each partition stores the adjacency pairs of its owned vertices, e.g. [0,1] [0,3] ... [6,4]]

Consider an undirected graph with n vertices and m edges

Each processor ‘owns’ n/p vertices and stores their adjacencies (~ 2m/p per processor, assuming balanced partitions).
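For contrast with the 2D layout, a minimal sketch of the 1D block-ownership rule described here, with the block size and naming as assumptions:

#include <cstdint>

// 1D block distribution: process r owns vertices [r*block, min((r+1)*block, n)).
struct Dist1D {
  int64_t n;  // total number of vertices
  int p;      // number of processes

  int64_t block() const { return (n + p - 1) / p; }          // vertices per process
  int owner(int64_t v) const { return static_cast<int>(v / block()); }
  int64_t local_id(int64_t v) const { return v - owner(v) * block(); }
};

// Example: n = 7 vertices, p = 4 processes (as in the four partitions P0..P3):
// Dist1D d{7, 4}; d.owner(6) == 3, d.local_id(6) == 0.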


BFS with a 1D-partitioned graph

Steps: 1. Local discovery, 2. Fold (all-to-all exchange of adjacencies), 3. Local update (distances/parents).

Current frontier: vertices 1 (partition Blue) and 6 (partition Green)

1. Local discovery:

[Figure: the owning partitions generate candidate [frontier vertex, neighbor] pairs ([1,0], [1,4], [1,6] from vertex 1 and [6,1], [6,2], [6,3], [6,4] from vertex 6), bucketed by destination process P0-P3; the two partitions that own no frontier vertices have no work]
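A hedged sketch of step 1 (local discovery), not the course's blank code: each process scans the adjacencies of its owned frontier vertices and buckets the resulting candidates by the process that owns the neighbor. The slide writes candidates as [frontier vertex, neighbor]; the sketch below stores them as (neighbor, parent) pairs, and all names and the CSR layout are assumptions.

#include <cstdint>
#include <utility>
#include <vector>

// Bucket (neighbor, parent) candidate pairs by destination process.
// 'offsets'/'adjacencies' hold the local CSR slice; 'frontier' holds locally
// owned frontier vertices as global IDs; 'owner(v)' maps a global vertex ID
// to the process that owns it (1D block rule).
std::vector<std::vector<std::pair<int64_t, int64_t>>>
local_discovery(const std::vector<int64_t>& offsets,
                const std::vector<int64_t>& adjacencies,
                const std::vector<int64_t>& frontier,
                int64_t first_local_vertex, int nprocs,
                int (*owner)(int64_t)) {
  std::vector<std::vector<std::pair<int64_t, int64_t>>> outgoing(nprocs);
  for (int64_t u : frontier) {
    int64_t lu = u - first_local_vertex;            // local index of u
    for (int64_t e = offsets[lu]; e < offsets[lu + 1]; ++e) {
      int64_t v = adjacencies[e];                   // candidate neighbor (global ID)
      outgoing[owner(v)].push_back({v, u});         // route to v's owner
    }
  }
  return outgoing;  // outgoing[r] is sent to process r in the fold step
}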


BFS with a 1D-partitioned graph

Steps: 1. Local discovery, 2. Fold (all-to-all exchange of adjacencies), 3. Local update (distances/parents).

Current frontier: vertices 1 (partition Blue) and 6 (partition Green)

2. All-to-all exchange:

[Figure: before the exchange, the buffered pairs [1,0], [1,4], [1,6], [6,1], [6,2], [6,3], [6,4] still reside on the discovering processes P0-P3; two processes have nothing to send]


BFS with a 1D-partitioned graph

Steps: 1. Local discovery, 2. Fold (all-to-all exchange of adjacencies), 3. Local update (distances/parents).

Current frontier: vertices 1 (partition Blue) and 6 (partition Green)

2. All-to-all exchange:

[Figure: after the exchange, each candidate pair ([1,0], [1,4], [1,6], [6,1], [6,2], [6,3], [6,4]) resides on the process P0-P3 that owns the pair's destination vertex]
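A minimal sketch of step 2 (the fold), assuming the per-destination candidate buffers have been flattened into int64 arrays of alternating (neighbor, parent) values; the counts are exchanged with MPI_Alltoall and the data with MPI_Alltoallv. Buffer layout and function names are assumptions.

#include <mpi.h>
#include <cstdint>
#include <vector>

// Exchange flattened (neighbor, parent) pairs: send_bufs[r] goes to rank r.
// Returns all pairs received by this rank, concatenated.
std::vector<int64_t> fold_exchange(
    const std::vector<std::vector<int64_t>>& send_bufs, MPI_Comm comm) {
  int nprocs;
  MPI_Comm_size(comm, &nprocs);

  std::vector<int> sendcounts(nprocs), recvcounts(nprocs);
  std::vector<int> sdispls(nprocs), rdispls(nprocs);
  std::vector<int64_t> sendbuf;

  for (int r = 0; r < nprocs; ++r) {
    sdispls[r] = static_cast<int>(sendbuf.size());
    sendcounts[r] = static_cast<int>(send_bufs[r].size());
    sendbuf.insert(sendbuf.end(), send_bufs[r].begin(), send_bufs[r].end());
  }

  // First exchange the counts, then the actual candidate pairs.
  MPI_Alltoall(sendcounts.data(), 1, MPI_INT,
               recvcounts.data(), 1, MPI_INT, comm);

  int total = 0;
  for (int r = 0; r < nprocs; ++r) { rdispls[r] = total; total += recvcounts[r]; }

  std::vector<int64_t> recvbuf(total);
  MPI_Alltoallv(sendbuf.data(), sendcounts.data(), sdispls.data(), MPI_INT64_T,
                recvbuf.data(), recvcounts.data(), rdispls.data(), MPI_INT64_T,
                comm);
  return recvbuf;  // flat [neighbor, parent, neighbor, parent, ...] pairs
}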


BFS with a 1D-partitioned graph

Steps: 1. Local discovery, 2. Fold (all-to-all exchange of adjacencies), 3. Local update (distances/parents).

Current frontier: vertices 1 (partition Blue) and 6 (partition Green)

3. Local update:

[Figure: each process P0-P3 applies the received pairs ([1,0], [1,4], [1,6], [6,1], [6,2], [6,3], [6,4]) to its owned vertices; the newly discovered vertices 0, 2, 3, and 4 form the frontier for the next iteration]
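A sketch of step 3 (local update) under the same assumed (neighbor, parent) packing: each received candidate is checked against the local parent array, and vertices discovered for the first time join the next frontier.

#include <cstdint>
#include <vector>

// Apply received (neighbor, parent) pairs to the locally owned vertices.
// 'parent' and 'level' are indexed by local ID; returns the next frontier
// (global IDs of vertices first discovered in this iteration).
std::vector<int64_t> local_update(const std::vector<int64_t>& recv_pairs,
                                  int64_t first_local_vertex, int cur_level,
                                  std::vector<int64_t>& parent,
                                  std::vector<int>& level) {
  std::vector<int64_t> next_frontier;
  for (size_t i = 0; i + 1 < recv_pairs.size(); i += 2) {
    int64_t child = recv_pairs[i];
    int64_t par   = recv_pairs[i + 1];
    int64_t lc = child - first_local_vertex;      // local index of the child
    if (parent[lc] < 0) {                         // not yet visited
      parent[lc] = par;
      level[lc]  = cur_level + 1;
      next_frontier.push_back(child);             // frontier for next iteration
    }
  }
  return next_frontier;
}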


Modeling parallel execution time

• Time dominated by local memory references and inter-node communication

• Assuming perfectly balanced computation and communication, we have

12

Local memory references:

roughly α_{L,n/p} · (n/p) + β_L · (m/p), where α_{L,n/p} is the local latency on a working set of size n/p and β_L is the inverse local RAM bandwidth.

Inter-node communication:

roughly β_{a2a(p)} · (edgecut/p), where β_{a2a(p)} is the inverse all-to-all remote bandwidth with p participating processors.
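As an illustrative plug-in with assumed (not measured) numbers: for n = 2^27 vertices, m = 2^31 edges, p = 256 processes, and an edge cut of 0.5 m, the local term touches roughly n/p + m/p ≈ 0.5M + 8.4M ≈ 9M words per process, while the all-to-all moves roughly edgecut/p ≈ 4.2M words per process; which term dominates then depends entirely on the ratio of the all-to-all remote bandwidth to the local RAM bandwidth.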


BFS with a 2D-partitioned graph

• Avoid expensive p-way All-to-all communication step

• Each processor row collectively ‘owns’ n/pr vertices

• Additional ‘Allgather’ communication step for processes in a row

13

Local memory references:

per-process work is still on the order of n/p + m/p, as in the 1D case, but lookups now hit working sets of size n/pr and n/pc, with correspondingly different local latencies.

Inter-node communication:

the p-process all-to-all is replaced by an all-to-all among only the processes of one grid dimension (moving roughly edgecut/p data per process), plus an additional allgather among the processes in a row whose volume is proportional to the number of frontier vertices gathered per row.
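One way (an assumption, not necessarily the authors' implementation) to organize the 2D algorithm's communication is to split MPI_COMM_WORLD into row and column communicators, so the allgather runs over one communicator and the smaller all-to-all over the other; a sketch with row-major rank ordering:

#include <mpi.h>

// Split MPI_COMM_WORLD into row and column communicators for a pr x pc grid
// (p = pr * pc ranks, row-major ordering assumed). The frontier allgather
// runs on one of them and the smaller all-to-all on the other.
void make_grid_comms(int pc, MPI_Comm* row_comm, MPI_Comm* col_comm) {
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  int my_row = rank / pc;   // which block row of the grid this rank is in
  int my_col = rank % pc;   // which block column

  // Ranks sharing a row land in the same row communicator; likewise for columns.
  MPI_Comm_split(MPI_COMM_WORLD, my_row, my_col, row_comm);
  MPI_Comm_split(MPI_COMM_WORLD, my_col, my_row, col_comm);
}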


Temporal effects and communication-minimizing tuning prevent us from obtaining tighter bounds

• The volume of communication can be further reduced by maintaining state of non-local visited vertices

14

[Figure: local pruning prior to the All-to-all step, on the same 7-vertex example. P0's raw candidate pairs [0,3] [0,3] [1,3] [0,4] [1,4] and [0,6] [1,6] [1,6] are pruned to one pair per destination vertex, [0,3] [0,4] [1,6], before being sent.]
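A minimal sketch of the pruning idea, assuming a per-process set of already-proposed destination vertices (a real implementation might use a bitmap instead): repeated candidates for the same vertex are dropped before the all-to-all.

#include <cstdint>
#include <unordered_set>
#include <utility>
#include <vector>

// Drop (neighbor, parent) pairs whose destination vertex this process has
// already proposed, keeping one candidate parent per destination vertex.
std::vector<std::pair<int64_t, int64_t>> prune_outgoing(
    const std::vector<std::pair<int64_t, int64_t>>& pairs,
    std::unordered_set<int64_t>& already_sent) {
  std::vector<std::pair<int64_t, int64_t>> kept;
  for (const auto& p : pairs) {
    if (already_sent.insert(p.first).second) {  // first time proposing this vertex
      kept.push_back(p);
    }
  }
  return kept;
}

// E.g. candidates for destination vertices {3,3,3,4,4,6,6,6} reduce to one
// pair each for 3, 4, and 6, matching the spirit of the slide's example.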


Predictable BFS execution time for synthetic small-world graphs

• Randomly permuting vertex IDs ensures load balance on R-MAT graphs (used in the Graph 500 benchmark).

• Our tuned parallel implementation for the NERSC Hopper system (Cray XE6) is ranked #2 on the current Graph 500 list.

15

Buluç & Madduri, “Parallel breadth-first search on distributed memory systems,” Proc. SC’11, 2011.

Execution time is dominated by work performed in a few parallel phases


Modeling BFS execution time for real-world graphs

• Can we further reduce communication time utilizing existing partitioning methods?

• Does the model predict execution time for arbitrary low-diameter graphs?

• We try out various partitioning and graph distribution schemes on the DIMACS Challenge graph instances

– Natural ordering, Random, Metis, PaToH

16


Experimental Study

• The (weak) upper bound on aggregate communication data volume can be computed statically (based on the partitioning of the graph)

• We determine runtime estimates of

– Total aggregate communication volume

– Sum of max. communication volume during each BFS iteration

– Intra-node computational work balance

– Communication volume reduction with 2D partitioning

• We obtain and analyze execution times (at several different parallel concurrencies) on a Cray XE6 system (Hopper, NERSC)

17


Orderings for the CoPapersCiteseer graph

18

[Figure: adjacency matrix spy plots of CoPapersCiteseer under Natural, Random, Metis, PaToH, and PaToH checkerboard orderings]


BFS All-to-all phase total communication volume normalized to # of edges (m)

[Table: total All-to-all communication volume as a percentage of m, per graph, for Natural, Random, and PaToH partitionings at several partition counts]

19


Ratio of max. communication volume across iterations to total communication volume

[Table: ratio of the maximum per-iteration communication volume to the total communication volume, per graph, for Natural, Random, and PaToH partitionings at several partition counts]

20


Reduction in total All-to-all communication volume with 2D partitioning

21

[Table: total All-to-all communication volume with 2D partitioning, as a ratio of the 1D volume, per graph, for Natural, Random, and PaToH orderings at several partition counts]


Edge count balance with 2D partitioning

[Table: edge count balance (max/avg ratio) with 2D partitioning, per graph, for Natural, Random, and PaToH orderings at several partition counts]
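The max/avg ratio reported here is straightforward to compute from per-partition edge counts; a small helper (interface assumed) is sketched below. A ratio near 1 indicates well-balanced partitions.

#include <cstdint>
#include <vector>

// Max/Avg. ratio of per-partition edge counts; 1.0 means perfectly balanced.
double max_over_avg(const std::vector<int64_t>& edges_per_part) {
  if (edges_per_part.empty()) return 0.0;
  int64_t total = 0, max_edges = 0;
  for (int64_t e : edges_per_part) {
    total += e;
    if (e > max_edges) max_edges = e;
  }
  double avg = static_cast<double>(total) / edges_per_part.size();
  return avg > 0.0 ? static_cast<double>(max_edges) / avg : 0.0;
}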


Parallel speedup on Hopper with 16-way partitioning

23


Execution time breakdown

24

[Figure: BFS time (ms) and communication time (ms) breakdowns into Computation, Fold, and Expand phases for the Random-1D, Random-2D, Metis-1D, and PaToH-1D partitioning strategies, on eu-2005 and kron-simple-logn18]


Imbalance in parallel execution

25

[Figure: execution timelines of 4 of 16 processes for eu-2005 under PaToH and Random partitionings. The PaToH-partitioned graph suffers from severe load imbalance in the computational phases.]


Conclusions

• Randomly permuting vertex identifiers improves computational and communication load balance, particularly at higher process concurrencies

• Partitioning methods reduce overall communication volume, but introduce significant load imbalance

• Substantially lower parallel speedup with real-world graphs compared to synthetic graphs (8.8X vs. 50X at 256-way parallel concurrency)

– Points to the need for dynamic load balancing

26


Today: In-class work

I Develop 2D partitioning strategy

I Implement BFS

Blank code and data available on the website (Lecture 24):

www.cs.rpi.edu/~slotag/classes/FA16/index.html

9 / 13