Pregel reading circle


Slides for a paper presentation in our lab


Pregel: A System for Large-Scale Graph Processing

2014/5/14

Ishikawa Yasutaka

About this Paper

• Authors: Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski

• A paper from Google

• Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (SIGMOD '10)

Outline

• Introduction

• Model of computation

• Pregel’s API

• Implementation

• Application

• Experiments

• Conclusion


Today’s problems of graph processing

• Poor locality of memory access

• Very little work per vertex

Methods of graph processing (1/2)

1. Crafting a custom distributed infrastructure → typically requires substantial implementation effort

2. Relying on an existing distributed computing platform (e.g., MapReduce) → can lead to suboptimal performance and usability issues

Methods of graph processing (2/2)

3. Using a single-computer graph algorithm library → limits the scale of problems that can be handled

4. Using an existing parallel graph system → such systems do not address fault tolerance or other issues that are important for very large-scale distributed systems

What is Pregel?

• A scalable graph processing model

- Based on BSP (Bulk Synchronous Parallel)

- Designed for efficient, scalable, and fault-tolerant implementation on clusters

- Distribution-related details are hidden behind an abstract API

• Not open-source software

- Apache Giraph is an open-source implementation of Pregel

Bulk Synchronous Parallel

• A bridging model for designing parallel algorithms

• A BSP computation proceeds as a series of supersteps, and all processes synchronize at the end of each superstep

[Figure: a sequence of supersteps]

BSP algorithm (1/3)

1. Concurrent computation

2. Communication

3. Barrier synchronisation

Each thread processes its own data concurrently and independently

BSP algorithm (2/3)

1. Concurrent computation

2. Communication

3. Barrier synchronisation

Threads exchange messages with one another

BSP algorithm (3/3)

1. Concurrent computation

2. Communication

3. Barrier synchronisation

Each thread waits until all other threads have finished passing messages

Then the next superstep begins
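
The three BSP phases can be sketched as a small simulation. This is a minimal sketch under stated assumptions: each BSP process is a Python thread, the ring communication pattern and the "add up what you receive" computation are hypothetical choices for illustration, and `run_bsp` is an invented helper, not part of any BSP library. Two sets of inboxes (indexed by superstep parity) make sure a message sent in superstep s is only read in superstep s+1.

```python
import threading


def run_bsp(initial, num_supersteps):
    """Simulate BSP with one thread per process; returns each process's final value."""
    n = len(initial)
    values = list(initial)  # values[w] is only touched by thread w
    # Messages sent in superstep s go to inboxes[(s + 1) % 2] and are read in superstep s + 1.
    inboxes = [[[] for _ in range(n)] for _ in range(2)]
    locks = [threading.Lock() for _ in range(n)]
    barrier = threading.Barrier(n)

    def process(w):
        for step in range(num_supersteps):
            cur, nxt = step % 2, (step + 1) % 2
            # 1. Concurrent computation: local data plus messages from the previous superstep.
            values[w] += sum(inboxes[cur][w])
            inboxes[cur][w].clear()
            # 2. Communication: send the updated value to the next process in a ring.
            dest = (w + 1) % n
            with locks[dest]:
                inboxes[nxt][dest].append(values[w])
            # 3. Barrier synchronisation: nobody starts the next superstep early.
            barrier.wait()

    threads = [threading.Thread(target=process, args=(w,)) for w in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return values


print(run_bsp([0, 1, 2, 3], 3))  # [8, 4, 4, 8]
```

Without the barrier, a fast thread could read its inbox before a slow neighbour had written to it, and the result would depend on thread timing; with it, every run gives the same answer.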


Pregel’s input and output

• Input: graph

• Output: graph

• Pregel iterates supersteps, each of which consists of running the user-defined function and passing messages

[Figure: input graph → superstep → superstep → superstep → output graph]

Graph component

• A Pregel graph consists of vertices and edges

• Vertex:

- Consists of a unique identifier and a user-defined value

- Its value and outgoing edges are modifiable

• Edge:

- Consists of a source vertex, a target vertex, and a user-defined value

- Its user-defined value is modifiable

- Not a first-class citizen (no computation is associated with edges)

[Figure: example graph with vertices A, B, C, D and edges a, b, c, d; vertex values, outgoing edges, and edge values are modifiable]
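
One way to model these components in memory, as a sketch: the Python dataclasses, the string vertex IDs, and the float values are assumptions made for illustration. Each edge is stored with its source vertex, so `Edge` only records the target and its own value.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Edge:
    target: str          # identifier of the target vertex; the source is the owning Vertex
    value: float = 0.0   # user-defined, modifiable edge value


@dataclass
class Vertex:
    vertex_id: str                                       # unique identifier
    value: float = 0.0                                   # user-defined, modifiable value
    out_edges: list[Edge] = field(default_factory=list)  # modifiable outgoing edges


# The vertex value, edge values, and the set of outgoing edges can all change.
a = Vertex("A", 1.0, [Edge("B", 2.0)])
a.value = 5.0
a.out_edges[0].value = 3.0
a.out_edges.append(Edge("C", 1.0))
```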

State of vertex

• A vertex has two states: active and inactive

• A vertex becomes inactive by voting to halt

• An inactive vertex becomes active again when it receives a message

[Figure: vertex state machine. Active → Inactive on "Vote to halt"; Inactive → Active on "Message received"]
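
The two transitions can be written as a tiny state function. This is a sketch; `State` and `next_state` are hypothetical names. Note that an incoming message wins: a vertex that voted to halt but was sent a message in the same superstep is active again in the next one.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"


def next_state(state: State, voted_to_halt: bool, has_messages: bool) -> State:
    """State of a vertex at the start of the next superstep (sketch)."""
    if has_messages:
        return State.ACTIVE      # "Message received": a message (re)activates the vertex
    if voted_to_halt:
        return State.INACTIVE    # "Vote to halt": the vertex deactivates itself
    return state                 # otherwise nothing changes
```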

Pregel’s Superstep

1. In superstep S, each active vertex V executes the user-defined function, reading the messages sent to V in superstep S-1

2. V can send messages to other vertices; these are received in superstep S+1

3. V can modify its own state and that of its outgoing edges

4. Once all vertices have finished steps 1 to 3, the computation moves on to superstep S+1

• The algorithm terminates, producing its output, when all vertices are inactive and no messages are in transit
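
A sequential sketch of this superstep loop, assuming a simplified compute signature: the hypothetical `compute(superstep, value, messages)` returns `(new_value, message_or_None, vote_to_halt)`, and a non-None message goes along every outgoing edge. `run_pregel` and `sum_of_in_neighbours` are invented for illustration; the real system runs vertices in parallel across workers.

```python
def run_pregel(out_edges, initial_values, compute, max_supersteps=1000):
    """Run supersteps until every vertex is inactive and no message is in transit.

    out_edges: {vertex_id: [target_id, ...]}
    Returns (final_values, number_of_supersteps_run).
    """
    values = dict(initial_values)
    active = {v: True for v in values}   # every vertex starts active in superstep 0
    inbox = {}                           # messages sent in the previous superstep
    superstep = 0
    while superstep < max_supersteps and (any(active.values()) or inbox):
        outbox = {}                      # messages to be received in superstep + 1
        for v in values:
            messages = inbox.get(v, [])
            if messages:
                active[v] = True         # a message reactivates a halted vertex
            if not active[v]:
                continue
            values[v], message, halt = compute(superstep, values[v], messages)
            if message is not None:
                for target in out_edges.get(v, []):
                    outbox.setdefault(target, []).append(message)
            active[v] = not halt
        inbox = outbox                   # barrier: superstep S+1 sees only these messages
        superstep += 1
    return values, superstep


# Hypothetical compute: announce the value once, then take the sum of what arrives.
def sum_of_in_neighbours(superstep, value, messages):
    if superstep == 0:
        return value, value, True
    return sum(messages), None, True


print(run_pregel({"a": ["b"], "b": ["c"], "c": ["a"]}, {"a": 1, "b": 2, "c": 3},
                 sum_of_in_neighbours))  # ({'a': 3, 'b': 1, 'c': 2}, 2)
```

In the usage example every vertex votes to halt in superstep 0, yet superstep 1 still runs because messages are in transit; only when both conditions hold does the loop stop.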

Example: maximum value (1/4)

Superstep 0: 3 6 2 1

[Figure: vertex values in each superstep, with active and inactive vertices marked]

Example: maximum value (2/4)

Superstep 0: 3 6 2 1

Superstep 1: 6 6 2 6

[Figure: vertex values in each superstep, with active and inactive vertices marked]

Example: maximum value (3/4)

Superstep 0: 3 6 2 1

Superstep 1: 6 6 2 6

Superstep 2: 6 6 6 6

[Figure: vertex values in each superstep, with active and inactive vertices marked]

Example: maximum value (4/4)

Superstep 0: 3 6 2 1

Superstep 1: 6 6 2 6

Superstep 2: 6 6 6 6

Superstep 3: 6 6 6 6

[Figure: vertex values in each superstep, with active and inactive vertices marked]
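
The maximum-value run can be reproduced with a small simulation. The slides do not list the edges, so the layout below (vertices A, B, C, D holding 3, 6, 2, 1, with edges A↔B, B→D, C↔D) is an assumption chosen so that the values match the slides superstep by superstep. Each vertex announces its value in superstep 0; afterwards, a vertex that learns a larger value adopts and forwards it, and every vertex votes to halt at the end of each superstep.

```python
# Hypothetical edge layout (an assumption, chosen so the run matches the slides' values).
EDGES = {"A": ["B"], "B": ["A", "D"], "C": ["D"], "D": ["C"]}
INITIAL = {"A": 3, "B": 6, "C": 2, "D": 1}


def max_value_compute(superstep, value, messages):
    """Returns (new_value, message_or_None, vote_to_halt)."""
    if superstep == 0:
        return value, value, True               # announce own value, then halt
    largest = max(messages, default=value)
    if largest > value:
        return largest, largest, True           # learned a larger value: adopt and forward it
    return value, None, True                    # nothing new: stay halted


def run_max_value(edges, initial):
    """Returns the vertex values (in A, B, C, D order) after each superstep."""
    values = dict(initial)
    active = {v: True for v in values}
    inbox = {}
    history = []
    superstep = 0
    while any(active.values()) or inbox:        # stop: all halted and no messages in transit
        outbox = {}
        for v in sorted(values):
            messages = inbox.get(v, [])
            if messages:
                active[v] = True                # a message wakes a halted vertex
            if not active[v]:
                continue
            values[v], message, halt = max_value_compute(superstep, values[v], messages)
            if message is not None:
                for target in edges[v]:
                    outbox.setdefault(target, []).append(message)
            active[v] = not halt
        history.append([values[v] for v in sorted(values)])
        inbox = outbox
        superstep += 1
    return history


for step, row in enumerate(run_max_value(EDGES, INITIAL)):
    print(f"Superstep {step}: {row}")
```

This prints `Superstep 0: [3, 6, 2, 1]`, `Superstep 1: [6, 6, 2, 6]`, `Superstep 2: [6, 6, 6, 6]` and `Superstep 3: [6, 6, 6, 6]`. Superstep 3 changes nothing, so every vertex stays halted, no messages are sent, and the computation ends.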


Vertex class

• Writing a Pregel program involves subclassing the predefined Vertex class

• Its Compute() method is executed at each active vertex in every superstep
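
Pregel's own API is C++: Vertex is a class template parameterized by the vertex value, edge value, and message types, and a program overrides Compute(), calling helpers such as SendMessageTo() and VoteToHalt(). The Python below is only a hypothetical analogue of that shape: the class, its snake_case methods, and the `MinLabelVertex` example are invented for illustration and are not Pregel's or Giraph's actual API. Type variables stand in for the C++ template parameters.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Generic, TypeVar

V = TypeVar("V")  # vertex value type
E = TypeVar("E")  # edge value type
M = TypeVar("M")  # message type


class Vertex(ABC, Generic[V, E, M]):
    """Hypothetical Python analogue of Pregel's Vertex class template."""

    def __init__(self, vertex_id: str, value: V, out_edges: dict[str, E]):
        self.vertex_id = vertex_id
        self.value = value
        self.out_edges = out_edges            # target vertex id -> edge value
        self.superstep = 0                    # set by the (omitted) framework
        self.halted = False
        self.outbox: list[tuple[str, M]] = []

    @abstractmethod
    def compute(self, messages: list[M]) -> None:
        """Called once per superstep on each active vertex."""

    def send_message_to(self, target: str, message: M) -> None:
        self.outbox.append((target, message))

    def vote_to_halt(self) -> None:
        self.halted = True


class MinLabelVertex(Vertex[int, None, int]):
    """Example subclass: propagate the smallest label seen so far."""

    def compute(self, messages: list[int]) -> None:
        smallest = min(messages, default=self.value)
        if self.superstep == 0 or smallest < self.value:
            self.value = min(self.value, smallest)
            for target in self.out_edges:
                self.send_message_to(target, self.value)
        self.vote_to_halt()


v = MinLabelVertex("a", 5, {"b": None, "c": None})
v.compute([])  # superstep 0: announce own label to b and c, then halt
```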

Message Passing

• The type of message a vertex sends is specified by the user as a template parameter of the Vertex class

• Messages in the iterator arrive in no guaranteed order, but every message is guaranteed to be delivered

Combiners

• Sending a message to a vertex on another machine incurs some overhead

• In some cases, combiners can reduce the number of messages, e.g., when a vertex only needs the sum of its incoming messages

• To enable this, the user subclasses the Combiner class

[Figure: reduction of messages]
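
A minimal sketch of what a combiner does on the sending side, assuming messages are (target vertex id, number) pairs; `combine_outgoing` is a hypothetical helper, not the Combiner class's real interface. All messages bound for the same vertex collapse into one before leaving the worker. Because the system does not promise which messages get combined or in what order, the combining function should be commutative and associative, like sum or min.

```python
from __future__ import annotations

from typing import Callable, Iterable


def combine_outgoing(
    messages: Iterable[tuple[str, float]],
    combine: Callable[[float, float], float],
) -> dict[str, float]:
    """Collapse all messages bound for the same target vertex into a single message."""
    combined: dict[str, float] = {}
    for target, message in messages:
        if target in combined:
            combined[target] = combine(combined[target], message)
        else:
            combined[target] = message
    return combined


outgoing = [("x", 3), ("y", 1), ("x", 5), ("x", 2)]   # hypothetical messages on one worker
print(combine_outgoing(outgoing, min))                 # {'x': 2, 'y': 1}
print(combine_outgoing(outgoing, lambda a, b: a + b))  # {'x': 10, 'y': 1}
```

Four messages become two, which is exactly the saving the slide describes when `x` lives on another machine.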

Aggregators(1/2)

• Pregel aggregators are a mechanism for global communication

• Each vertex can provide a value in Superstep S, and this value is made available to all vertices in Superstep S+1

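[Figure: sum aggregator computing the number of edges. In superstep S, three vertices provide their out-degrees 4, 2 and 1; in superstep S+1, every vertex sees the aggregated value 4+2+1 = 7]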

Aggregators(2/2)

• To define a new aggregator, a user subclasses the predefined Aggregator class

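
The two aggregator slides can be combined into one sketch: a base class that a user subclasses, and the sum-of-out-degrees example from the figure. `Aggregator`, `SumAggregator`, and their methods are hypothetical names chosen for illustration, not Pregel's real Aggregator interface, and the vertex names are invented.

```python
from abc import ABC, abstractmethod


class Aggregator(ABC):
    """Hypothetical sketch: values provided in superstep S are reduced to one value
    that every vertex can read in superstep S + 1."""

    @abstractmethod
    def initial(self):
        """Identity element of the reduction."""

    @abstractmethod
    def reduce(self, accumulated, value):
        """Fold one provided value into the running result."""

    def aggregate(self, provided_values):
        result = self.initial()
        for value in provided_values:
            result = self.reduce(result, value)
        return result


class SumAggregator(Aggregator):
    def initial(self):
        return 0

    def reduce(self, accumulated, value):
        return accumulated + value


# Superstep S: each vertex provides its out-degree (degrees taken from the figure).
out_degree = {"v1": 4, "v2": 2, "v3": 1}
total_edges = SumAggregator().aggregate(out_degree.values())
# Superstep S + 1: every vertex reads the same global value.
seen_in_next_superstep = {v: total_edges for v in out_degree}
print(total_edges)  # 7
```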

Topology Mutations(1/2)

• Some graph algorithms need to change the graph’s topology

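- Clustering algorithms

- Minimum spanning tree algorithms

• A user’s Compute() function can issue requests to add or remove vertices or edges

- Requests issued in the same superstep may conflict (e.g., two requests to add the same vertex with different initial values)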

Topology Mutations(2/2)

• Pregel resolves these conflicts with two mechanisms:

- Partial ordering: edge removal → vertex removal → vertex addition → edge addition

- Handlers: by default, the system picks one of the conflicting requests arbitrarily; users can define a handler method in their Vertex subclass to resolve conflicts differently

• Partial ordering yields deterministic results for most conflicts
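
A sketch of the partial ordering, assuming mutation requests are collected during a superstep as `(kind, item)` pairs and applied together afterwards; the request kinds and both helpers are hypothetical. Here, removing a vertex also drops its out-edges. Because requests are sorted by kind before being applied, the result does not depend on the order in which requests arrived.

```python
# Partial order from the slide: edge removal, vertex removal, vertex addition, edge addition.
RANK = {"remove_edge": 0, "remove_vertex": 1, "add_vertex": 2, "add_edge": 3}


def order_mutations(requests):
    """sorted() is stable, so requests of the same kind keep their arrival order;
    conflicts *within* a kind are what Pregel's handlers are for."""
    return sorted(requests, key=lambda request: RANK[request[0]])


def apply_mutations(vertices, edges, requests):
    """vertices: set of ids; edges: set of (source, target) pairs."""
    vertices, edges = set(vertices), set(edges)
    for kind, item in order_mutations(requests):
        if kind == "remove_edge":
            edges.discard(item)
        elif kind == "remove_vertex":
            vertices.discard(item)
            edges = {e for e in edges if e[0] != item}   # its out-edges go with it
        elif kind == "add_vertex":
            vertices.add(item)
        elif kind == "add_edge":
            edges.add(item)
    return vertices, edges


requests = [("add_edge", ("C", "B")), ("add_vertex", "C"), ("remove_vertex", "A")]
print(apply_mutations({"A", "B"}, {("A", "B")}, requests))
```

Adding the edge C→B before vertex C exists would be a problem if requests were applied in arrival order; the ordering makes that case, and its reverse, come out the same.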

Input and output

• Pregel supports many file formats for input and output

- It decouples interpreting an input file from the graph computation itself

- The library provides readers and writers

- Users can write their own by subclassing the Reader and Writer classes

[Figure: Readers load file formats A and B for Compute; Writers emit file formats C and D]
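
A sketch of the Reader/Writer split, assuming a plain-text "source target weight" edge list as input and a tab-separated "vertex value" listing as output. Both formats and all four class names are hypothetical; the point is that the computation only ever sees the in-memory graph, never the file format.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class Reader(ABC):
    """Turns one file format into the in-memory graph the computation uses."""

    @abstractmethod
    def read(self, text: str) -> dict: ...


class Writer(ABC):
    """Turns final vertex values into one output file format."""

    @abstractmethod
    def write(self, values: dict) -> str: ...


class EdgeListReader(Reader):
    """Hypothetical format: one 'source target weight' triple per line."""

    def read(self, text: str) -> dict:
        graph: dict = {}  # vertex id -> list of (target id, weight)
        for line in text.splitlines():
            if not line.strip():
                continue                                  # skip blank lines
            source, target, weight = line.split()
            graph.setdefault(source, []).append((target, float(weight)))
            graph.setdefault(target, [])                  # make sure sinks exist too
        return graph


class TsvWriter(Writer):
    """Hypothetical format: one 'vertex<TAB>value' line per vertex, sorted by id."""

    def write(self, values: dict) -> str:
        return "\n".join(f"{v}\t{values[v]}" for v in sorted(values))


graph = EdgeListReader().read("a b 1.5\nb c 2\n")
print(TsvWriter().write({"b": 2, "a": 1}))
```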


Basic architecture(1/2)

• The Pregel library divides a graph into partitions

• Assignment of a vertex to a partition depends solely on the vertex ID

- The default partitioning function is hash(ID) mod N, where N is the number of partitions
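
A sketch of hash(ID) mod N partitioning. CRC-32 is a stand-in hash chosen for this example, not the hash Pregel uses. The choice matters in Python: the built-in `hash()` of a string is salted per process, so two worker processes could disagree about which partition owns a vertex, while a stable hash lets any machine compute the owner from the ID alone.

```python
import zlib


def partition_of(vertex_id: str, num_partitions: int) -> int:
    """hash(ID) mod N, using CRC-32 as a process-independent hash."""
    return zlib.crc32(vertex_id.encode("utf-8")) % num_partitions


owners = {v: partition_of(v, 4) for v in ["A", "B", "C", "D"]}
print(owners)
```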

Basic architecture(2/2)

• The execution of a Pregel program consists of several stages

1. Many copies of the user program begin executing on a cluster of machines. One of these acts as the master

2. The master determines how many partitions the graph will have, and assigns partitions to each worker

3. The master assigns a portion of the user’s input to each worker

4. The master instructs each worker to perform a superstep


Fault tolerance(1/2)

• Fault tolerance is achieved through checkpointing

• The master instructs the workers to save the state of their partitions to persistent storage

- This includes vertex values, edge values, and incoming messages

- The master separately saves the aggregator values

Fault tolerance(2/2)

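• Worker failures are detected using regular "ping" messages that the master issues to workers

• When one or more workers fail, the master reassigns their graph partitions to the currently available workers, which reload partition state from the most recent checkpoint

- The supersteps executed since that checkpoint are repeated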
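
A toy, single-process model of checkpoint-based recovery. Assumptions: the whole computation state fits in one dict, checkpoints are taken at the start of every k-th superstep (k is a hypothetical parameter), a deep copy stands in for persistent storage, and exactly one failure is injected. `run_with_checkpoints` and `add_superstep_number` are invented names. After the rollback, the lost supersteps are simply repeated, so the result matches a failure-free run at the cost of extra supersteps.

```python
import copy


def run_with_checkpoints(state, step_fn, num_supersteps, checkpoint_every, fail_at=None):
    """Run step_fn for supersteps 0..num_supersteps-1, checkpointing at the start of every
    `checkpoint_every`-th superstep and simulating one failure during superstep `fail_at`.
    Returns (final_state, supersteps_executed)."""
    checkpoint = (0, copy.deepcopy(state))
    executed = 0
    superstep = 0
    failed = False
    while superstep < num_supersteps:
        if superstep % checkpoint_every == 0:
            checkpoint = (superstep, copy.deepcopy(state))    # save to "persistent storage"
        if superstep == fail_at and not failed:
            failed = True                                     # a worker dies mid-superstep
            superstep, state = checkpoint[0], copy.deepcopy(checkpoint[1])  # reload
            continue                                          # then repeat the lost supersteps
        state = step_fn(state, superstep)
        executed += 1
        superstep += 1
    return state, executed


def add_superstep_number(state, superstep):  # hypothetical per-superstep work
    return {"total": state["total"] + superstep}


print(run_with_checkpoints({"total": 0}, add_superstep_number, 6, 2))             # ({'total': 15}, 6)
print(run_with_checkpoints({"total": 0}, add_superstep_number, 6, 2, fail_at=5))  # ({'total': 15}, 7)
```

With a failure in superstep 5, recovery resumes from the checkpoint taken at superstep 4, so one superstep is repeated. Checkpointing more often shortens recovery but costs more I/O on every run.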

Worker implementation

• A worker machine maintains the state of its portion of the graph in memory

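• Each vertex has two copies of its active flag and incoming message queue

- One for the current superstep and one for the next superstep

• Messages are sent in one of two ways:

- Remote: buffered and then sent to the worker that owns the destination vertex

- Local: placed directly into the destination vertex's incoming message queue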
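
A sketch of a worker's double-buffered bookkeeping, under stated assumptions: one Python object per worker, dicts keyed by vertex ID, and a plain list standing in for the network buffer. `WorkerPartition` and its methods are hypothetical. Messages written during superstep S go into the "next" queue and only become readable after the swap at the end of S; a vertex is active in S+1 if it did not vote to halt or if it has new messages.

```python
class WorkerPartition:
    """Hypothetical sketch: two copies of the active flags and incoming message queues."""

    def __init__(self, worker_id, vertex_ids, owner_of):
        self.worker_id = worker_id
        self.owner_of = owner_of                          # vertex id -> id of owning worker
        self.current_msgs = {v: [] for v in vertex_ids}   # read during this superstep
        self.next_msgs = {v: [] for v in vertex_ids}      # written during this superstep
        self.current_active = {v: True for v in vertex_ids}
        self.next_active = {v: True for v in vertex_ids}
        self.remote_buffer = []                           # (worker, vertex, message) to ship later

    def send(self, target, message):
        if self.owner_of[target] == self.worker_id:
            self.deliver(target, message)                 # local: straight into the next queue
        else:                                             # remote: buffer for the owning worker
            self.remote_buffer.append((self.owner_of[target], target, message))

    def deliver(self, target, message):
        """Also the entry point for messages arriving from other workers."""
        self.next_msgs[target].append(message)

    def vote_to_halt(self, vertex_id):
        self.next_active[vertex_id] = False

    def end_superstep(self):
        # Active in S+1 if the vertex did not vote to halt, or if it has new messages.
        self.current_active = {v: self.next_active[v] or bool(self.next_msgs[v])
                               for v in self.next_msgs}
        self.next_active = dict(self.current_active)
        # Swap queues: messages written during S become readable; start fresh for S+2.
        self.current_msgs = self.next_msgs
        self.next_msgs = {v: [] for v in self.current_msgs}


w0 = WorkerPartition(0, ["a", "c"], {"a": 0, "b": 1, "c": 0})   # hypothetical ownership
w0.send("a", "hello")   # local: vertex a lives on this worker
w0.send("b", "world")   # remote: vertex b lives on worker 1
w0.vote_to_halt("a")
w0.vote_to_halt("c")
w0.end_superstep()
print(w0.current_msgs, w0.current_active)
```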

Master implementation

• The master assigns a unique identifier to each worker when the worker registers

• The master maintains a list of all workers currently known to be alive

• If any worker fails, the master enters recovery mode

• The master runs an HTTP server that displays statistics about the progress of the computation


[1] PageRank (1/2)

• The PageRank algorithm estimates the importance of web pages

• The idea is borrowed from the evaluation of academic papers

- A good paper is likely to be cited by many other papers

- "A paper that is cited by papers that are themselves cited by many papers" is also likely to be a good paper

• It is named after one of Google’s founders, Larry "Page"

[1] PageRank (2/2)
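
A sequential Python sketch of PageRank in Pregel style, modeled on the per-vertex rule in the Pregel paper's PageRank example: every vertex starts with value 1/N; from superstep 1 on it sets its value to 0.15/N + 0.85 × (sum of incoming messages); until superstep 30 it sends value / out-degree along each outgoing edge, after which it would vote to halt. The function name and dict-based graph are assumptions. Vertices without out-edges send nothing, so their rank leaks away; that is a simplification kept from the rule, not a fix.

```python
def pagerank(out_edges, last_superstep=30):
    """out_edges: {vertex: [target, ...]}. Returns {vertex: rank} after last_superstep."""
    n = len(out_edges)
    value = {v: 1.0 / n for v in out_edges}       # value in superstep 0
    inbox = {v: [] for v in out_edges}
    for superstep in range(last_superstep + 1):
        outbox = {v: [] for v in out_edges}
        for v, targets in out_edges.items():
            if superstep >= 1:
                value[v] = 0.15 / n + 0.85 * sum(inbox[v])
            if superstep < last_superstep:
                for t in targets:                  # share the rank equally over out-edges
                    outbox[t].append(value[v] / len(targets))
            # otherwise the vertex would vote to halt
        inbox = outbox                             # delivered in the next superstep
    return value


ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
print(ranks)
```

On a graph where every vertex has out-edges, the ranks keep summing to 1 from superstep to superstep, and on a simple cycle every vertex keeps rank 1/N.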

[2]Shortest Path(1/6)

• Shortest-path problem: find the shortest path between two given nodes of a weighted graph

• There are several variants of the shortest-path problem

- The single-source shortest paths problem

- The s-t shortest path problem

- The all-pairs shortest paths problem

• This paper focuses on the single-source shortest paths problem

[2]Shortest Path(2/6)

[Figure: example weighted graph at Superstep 0; the source vertex has distance 0 and the other three vertices ∞]

[2]Shortest Path(3/6)

[Figure: Superstep 1; vertex distances are 0 (source), 5, 3 and ∞]

[2]Shortest Path(4/6)

[Figure: Superstep 2; vertex distances are 0 (source), 4, 3 and 6]

[2]Shortest Path(5/6)

[Figure: Superstep 3; vertex distances are 0 (source), 4, 3 and 5]

[2]Shortest Path(6/6)

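
A sequential sketch of single-source shortest paths in Pregel style, modeled on the Pregel paper's shortest-path vertex: every value starts at infinity; in each superstep a vertex takes the minimum of 0 (if it is the source) and its incoming distances; if that beats its current value it updates and sends distance + edge weight along each out-edge; then it votes to halt. The function name, the dict-based graph, and the test graph are assumptions (the test graph is not the one drawn on the slides), and the sketch assumes no negative-weight cycles.

```python
import math


def shortest_paths(out_edges, source):
    """out_edges: {vertex: [(target, weight), ...]}.
    Returns ({vertex: distance}, number_of_supersteps_run)."""
    dist = {v: math.inf for v in out_edges}    # every vertex value starts at INF
    inbox = {v: [] for v in out_edges}
    active = set(out_edges)                    # all vertices are active in superstep 0
    superstep = 0
    while active:
        outbox = {v: [] for v in out_edges}
        for v in active:
            mindist = 0 if v == source else math.inf
            mindist = min([mindist] + inbox[v])   # best distance offered this superstep
            if mindist < dist[v]:
                dist[v] = mindist
                for target, weight in out_edges[v]:
                    outbox[target].append(mindist + weight)
            # every vertex then votes to halt; only a new message wakes it up
        inbox = outbox
        active = {v for v, messages in inbox.items() if messages}
        superstep += 1
    return dist, superstep


graph = {"s": [("a", 5), ("b", 3)], "a": [("c", 2)], "b": [("a", 1), ("c", 4)], "c": []}
print(shortest_paths(graph, "s"))
```

As on the slides, a vertex's tentative distance can improve over several supersteps (here `a` goes from 5 to 4 and `c` from 7 to 6) before every vertex halts.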


Experiment details

• Three experiments, all using single-source shortest paths

• Run on a cluster of 300 multicore commodity PCs

• Runtimes are reported for binary trees and log-normal random graphs

- Binary tree, varying number of worker tasks

- Binary tree, varying graph sizes

- Log-normal random graphs, varying graph sizes

[1] 1-billion-vertex binary tree: varying number of worker tasks

• Setting

- A binary tree with a billion vertices; the number of Pregel workers varies from 50 to 800

• Result

- Using 16 times as many workers yields a speedup of about 10

[2] Binary tree: varying graph sizes on 800 worker tasks

• Setting

- Trees vary in size from a billion to 50 billion vertices, using a fixed number of 800 worker tasks

• Result

- As the tree size grows from a billion to 50 billion vertices, the time increases from 17.3 to 702 seconds

[3] Log-normal random graphs: varying graph sizes on 800 worker tasks (1/2)

• Binary trees are not representative of graphs encountered in practice

• Use a log-normal distribution of outdegrees

• In this experiment, μ = 4, σ = 1.3

$$p(d) = \frac{1}{\sqrt{2\pi}\,\sigma d}\, e^{-(\ln d - \mu)^2 / (2\sigma^2)}$$
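
A quick worked check of what these parameters mean for the graphs: by the standard log-normal formula, the mean outdegree is $\exp(\mu + \sigma^2/2)$. The helper name is an assumption; the arithmetic is the point.

```python
import math


def lognormal_mean(mu: float, sigma: float) -> float:
    """Mean of a log-normal distribution with parameters mu and sigma."""
    return math.exp(mu + sigma ** 2 / 2)


mean_outdegree = lognormal_mean(4, 1.3)   # exp(4 + 0.845) = exp(4.845)
print(round(mean_outdegree, 1))           # 127.1
```

So these graphs average roughly 127 out-edges per vertex, far denser than a binary tree, which is why they are a harder test than the tree experiments.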

[3] Log-normal random graphs: varying graph sizes on 800 worker tasks (2/2)

• Setting

- Graphs vary in size from 10 million to a billion vertices

• Result

- The largest graph took a little over 10 minutes


Conclusion

• The authors propose a computing model that is well suited to graph processing and provides scalability and fault tolerance

• They report that programmers can implement graph processing algorithms easily with Pregel

Sources for these slides (1/4)

• http://www.slideshare.net/doryokujin/largescale-graph-processingintroduction

• http://shnya.jp/blog/?p=797

• http://www.slideshare.net/sscdotopen/introducing-apache-giraph-for-large-scale-graph-processing

• http://teppei.hateblo.jp/entry/2013/11/11/232052

• http://ja.wikipedia.org/wiki/%E5%AF%BE%E6%95%B0%E6%AD%A3%E8%A6%8F%E5%88%86%E5%B8%83

Sources for these slides (2/4)

• http://keisan.casio.jp/exec/system/1161228861

• http://www.atmarkit.co.jp/ait/articles/1203/22/news165_2.html

• http://en.wikipedia.org/wiki/Bulk_synchronous_parallel

• http://research.preferred.jp/2011/06/bsp_piccolo_spark_introduction/

• http://ja.wikipedia.org/wiki/%E3%83%9A%E3%83%BC%E3%82%B8%E3%83%A9%E3%83%B3%E3%82%AF

Sources for these slides (3/4)

• http://ja.wikipedia.org/wiki/%E3%82%B9%E3%83%91%E3%83%8B%E3%83%B3%E3%82%B0%E3%83%84%E3%83%AA%E3%83%BC%E3%83%97%E3%83%AD%E3%83%88%E3%82%B3%E3%83%AB

• http://ja.wikipedia.org/wiki/%E6%9C%80%E7%9F%AD%E7%B5%8C%E8%B7%AF%E5%95%8F%E9%A1%8C

• http://matome.naver.jp/odai/2128685245125920701?&page=1

• http://www.cs.ucsb.edu/~prakash/projects/cs290b/index.html

Sources for these slides (4/4)

• http://homepage2.nifty.com/well/Template.html

• http://ja.wikipedia.org/wiki/%E7%AC%AC%E4%B8%80%E7%B4%9A%E3%82%AA%E3%83%96%E3%82%B8%E3%82%A7%E3%82%AF%E3%83%88

• http://ja.wikipedia.org/wiki/%E3%82%AF%E3%83%AA%E3%83%BC%E3%82%AF_(%E3%82%B0%E3%83%A9%E3%83%95%E7%90%86%E8%AB%96)

• http://www.alaxala.com/jp/techinfo/archive/manual/AX2000R/HTML/KAISETS2/0078.HTM
