Pregel reading circle

Pregel: A System for Large-Scale Graph Processing

2014/5/14

Ishikawa Yasutaka

About this Paper

• Authors: Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski

• A paper from Google

• Proceedings of the 2010 international conference on Management of data - SIGMOD '10

Outline

• Introduction

• Model of computation

• Pregel’s API

• Implementation

• Application

• Experiments

• Conclusion

Today’s problems of graph processing

• Poor locality of memory access

• Very little work per vertex

Methods of graph processing…(1/2)

1. Crafting a custom distributed infrastructure → typically requires a substantial implementation effort

2. Relying on an existing distributed computing platform (e.g., MapReduce) → this can lead to suboptimal performance and usability issues

Methods of graph processing…(2/2)

3. Using a single-computer graph algorithm library → limits the scale of problems that can be handled

4. Using an existing parallel graph system → these do not address fault tolerance or other issues that are important for very large scale distributed systems

What is Pregel

• Scalable graph processing model

- Based on BSP (Bulk Synchronous Parallel)

- Designed for efficient, scalable, and fault-tolerant implementation on clusters

- Distribution-related details are hidden behind an abstract API

• Not open source software

- Apache Giraph is an open source implementation of Pregel

Bulk Synchronous Parallel

• Bridging model for designing parallel algorithms

• BSP iterates supersteps for computation and synchronizes all processes at each superstep

BSP’s algorithm

1. Concurrent computation: each thread processes its own data concurrently and independently

2. Communication: the threads pass messages to one another

3. Barrier synchronisation: each thread waits until all other threads have finished message passing

Next superstep…
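
The three BSP phases can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `run_bsp`, the ring-shaped communication pattern, and the "add what you received" computation are all hypothetical and are not taken from the paper. Two sets of inboxes are used so that messages sent in one superstep are only read in the next.

```python
import threading

def run_bsp(initial_values, num_supersteps):
    """Toy BSP run: each worker adds the messages it received to its value."""
    n = len(initial_values)
    values = list(initial_values)
    # Two sets of inboxes: one read in the current superstep, one filled for the next.
    boxes = [[[] for _ in range(n)] for _ in range(2)]
    barrier = threading.Barrier(n)

    def worker(w):
        for step in range(num_supersteps):
            inbox = boxes[step % 2][w]
            next_inboxes = boxes[(step + 1) % 2]
            # 1. Concurrent computation: touch local data only.
            values[w] += sum(inbox)
            inbox.clear()
            # 2. Communication: send the local value to the right-hand neighbour.
            next_inboxes[(w + 1) % n].append(values[w])
            # 3. Barrier synchronisation: wait until every worker has sent.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return values

print(run_bsp([1, 2, 3, 4], 3))  # [12, 8, 8, 12]
```

Because of the barrier, no worker can read a message before the superstep in which it was sent has finished everywhere, which is why the result is deterministic even though the workers run concurrently.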

Pregel’s input and output

• Input: graph

• Output: graph

• Iterates supersteps, each of which consists of a user-defined function and message passing

[Figure: input graph → supersteps → output graph]

Graph component

• A Pregel graph consists of vertices and edges

• Vertex:

- Consists of a unique identifier and a user-defined value

- Its value and its outgoing edges are modifiable

• Edge:

- Consists of a source vertex, a target vertex, and a user-defined value

- The user-defined value is modifiable

- Not a first-class citizen

State of vertex

• A vertex has two states: Active and Inactive

• An Active vertex becomes Inactive when it votes to halt

• An Inactive vertex becomes Active again when it receives a message

[Figure: vertex state machine. Active → Inactive on "Vote to halt"; Inactive → Active on "Message received"]

Pregel’s Superstep

1. In Superstep S, each vertex V computes the user-defined function on the messages sent to it in Superstep S-1

2. V sends messages to other vertices, which will be received in Superstep S+1

3. V may modify its own state

4. Once all vertices have finished steps 1 to 3, the computation moves on to Superstep S+1

• The algorithm terminates with its output when all vertices are inactive and no messages are in transit

Example: maximum value

[Figure: vertex values in each superstep, with vertices marked Active or Inactive, up to Superstep 3]

Superstep 0: 3 6 2 1

Superstep 1: 6 6 2 6

Superstep 2: 6 6 6 6
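
The maximum-value example can be replayed with a small single-process simulation. This is a sketch, not Pregel's API: the function name `max_value_pregel` and the edge list (A→B, B→A, B→D, C→D, D→C) are assumptions, with the edges chosen so that the values 3 6 2 1 evolve as in the rows of the example. A vertex sends its value in Superstep 0 or whenever it learns a larger one, and otherwise votes to halt.

```python
def max_value_pregel(values, out_edges):
    """Return the list of vertex values after each superstep."""
    values = dict(values)
    active = {v: True for v in values}
    inbox = {v: [] for v in values}
    history = []
    superstep = 0
    # Run until every vertex is inactive and no messages are in transit.
    while any(active.values()) or any(inbox.values()):
        outbox = {v: [] for v in values}
        for v in values:
            messages = inbox[v]
            if messages:
                active[v] = True  # a message reactivates a halted vertex
            if not active[v]:
                continue
            changed = False
            if messages and max(messages) > values[v]:
                values[v] = max(messages)
                changed = True
            if superstep == 0 or changed:
                for target in out_edges[v]:
                    outbox[target].append(values[v])
            else:
                active[v] = False  # vote to halt
        history.append([values[v] for v in sorted(values)])
        inbox = outbox
        superstep += 1
    return history

# Hypothetical topology for the four vertices holding 3, 6, 2 and 1.
start = {"A": 3, "B": 6, "C": 2, "D": 1}
edges = {"A": ["B"], "B": ["A", "D"], "C": ["D"], "D": ["C"]}
for step, row in enumerate(max_value_pregel(start, edges)):
    print("Superstep", step, row)
```

With this topology the run ends after Superstep 3, once every vertex holds 6 and has voted to halt.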

Vertex class

• Writing a Pregel program involves subclassing the predefined Vertex class

• The Compute() method is executed at each active vertex in every superstep

Message Passing

• The type of message sent by a vertex is specified by the user as a template parameter of the Vertex class

• The order of messages in the iterator is not guaranteed, but every message is guaranteed to be delivered

Combiners

• Sending a message to a vertex on another machine incurs some overhead

• In some cases, combiners can reduce the number of messages

• To enable this, the user subclasses the Combiner class

[Figure: reduction of messages]
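
The idea behind a combiner can be sketched as follows. The helper `combine_messages` and the message layout (a list of `(target, value)` pairs) are hypothetical and only illustrate the technique; they are not Pregel's Combiner interface. The sketch assumes the combining operation is commutative and associative, as is the case for `min`, `max`, or a sum.

```python
def combine_messages(outgoing, combine):
    """Merge all messages addressed to the same target into a single message."""
    combined = {}
    for target, value in outgoing:
        if target in combined:
            combined[target] = combine(combined[target], value)
        else:
            combined[target] = value
    return combined

# Four messages leave this machine, but only two need to cross the network.
outgoing = [("v1", 5), ("v2", 7), ("v1", 3), ("v1", 9)]
print(combine_messages(outgoing, min))  # {'v1': 3, 'v2': 7}
```

A `min` combiner like this one would suit an algorithm in which the receiver only cares about the smallest incoming value, such as shortest paths.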

Aggregators(1/2)

• Pregel aggregators are a mechanism for global communication

• Each vertex can provide a value in Superstep S, and this value is made available to all vertices in Superstep S+1

[Figure: sum aggregator for the number of edges. Values such as 4+2+1… are provided in Superstep S, and the sum is available in Superstep S+1]

Aggregators(2/2)

• To define a new aggregator, a user subclasses the predefined Aggregator class
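
A sum aggregator that counts edges could look roughly like this. The class below is a hypothetical stand-in, not the predefined Aggregator class: the method names `aggregate` and `end_superstep` and the out-degree values are assumptions made for illustration. The point it shows is the one-superstep delay between providing values and seeing the result.

```python
class SumAggregator:
    """Hypothetical sum aggregator: values given in superstep S are visible in S+1."""

    def __init__(self):
        self._accumulating = 0  # values provided during the current superstep
        self.value = None       # result of the previous superstep, visible to all vertices

    def aggregate(self, x):
        self._accumulating += x

    def end_superstep(self):
        # Called at the barrier: publish the sum and start a fresh accumulation.
        self.value = self._accumulating
        self._accumulating = 0

# Each vertex provides its out-degree; the sum is the number of edges.
out_degrees = {"a": 4, "b": 2, "c": 1}
edge_count = SumAggregator()
for vertex, degree in out_degrees.items():
    edge_count.aggregate(degree)
seen_in_superstep_s = edge_count.value        # still None during Superstep S
edge_count.end_superstep()
seen_in_superstep_s_plus_1 = edge_count.value
print(seen_in_superstep_s, seen_in_superstep_s_plus_1)  # None 7
```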

Topology Mutations(1/2)

• Some graph algorithms need to change the graph’s topology

- Clustering algorithm

- Minimum spanning tree algorithm

• The user’s Compute() function can issue requests to add or remove vertices or edges

- Multiple requests issued in the same superstep can conflict

Topology Mutations(2/2)

• These conflicts are resolved using two mechanisms

- Partial ordering: edge removal → vertex removal → vertex addition → edge addition

- Handlers: remaining conflicts are resolved by handlers. By default one request is picked arbitrarily; users can define handler methods in their Vertex subclass

• Partial ordering yields deterministic results for most conflicts
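
The effect of the partial ordering can be sketched with a small helper. Everything named here (`ORDER`, `apply_mutations`, the request tuples) is hypothetical and exists only to illustrate the ordering; handlers for the remaining conflicts are left out. The sketch assumes that removing a vertex also drops its outgoing edges.

```python
# Order from the slide: edge removal, vertex removal, vertex addition, edge addition.
ORDER = ("remove_edge", "remove_vertex", "add_vertex", "add_edge")

def apply_mutations(vertices, edges, requests):
    """Apply (kind, argument) requests in the partial order; returns new sets."""
    vertices, edges = set(vertices), set(edges)
    for kind in ORDER:
        for request_kind, arg in requests:
            if request_kind != kind:
                continue
            if kind == "remove_edge":
                edges.discard(arg)
            elif kind == "remove_vertex":
                vertices.discard(arg)
                # Assumption: removing a vertex also drops its outgoing edges.
                edges = {(src, dst) for (src, dst) in edges if src != arg}
            elif kind == "add_vertex":
                vertices.add(arg)
            else:  # add_edge
                edges.add(arg)
    return vertices, edges

requests = [
    ("add_edge", ("c", "a")),
    ("add_vertex", "c"),
    ("remove_vertex", "b"),
    ("remove_edge", ("a", "b")),
]
result = apply_mutations({"a", "b"}, {("a", "b"), ("b", "a")}, requests)
# The order in which requests were issued does not change the outcome.
assert result == apply_mutations({"a", "b"}, {("a", "b"), ("b", "a")}, requests[::-1])
```

Because requests are grouped by kind before being applied, the edge to the new vertex `c` is added only after `c` exists, whatever order the requests arrived in.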

Input and output

• Pregel supports many file formats for input and output

- It decouples the task of interpreting an input file from the task of graph computation

- The library provides readers and writers

- Users can write their own by subclassing the Reader and Writer classes

[Figure: file formats A and B → Reader → Compute → Writer → file formats C and D]
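
The decoupling of file formats from computation can be sketched as below. The base classes and their `read` / `write` methods are hypothetical stand-ins for the Reader and Writer classes mentioned on the slide, and both file formats (a whitespace-separated adjacency list and a tab-separated value list) are invented for the example.

```python
class Reader:
    def read(self, text):
        raise NotImplementedError

class Writer:
    def write(self, graph):
        raise NotImplementedError

class AdjacencyListReader(Reader):
    """Parses lines such as 'a 3 b c': vertex id, value, then out-neighbours."""

    def read(self, text):
        graph = {}
        for line in text.splitlines():
            if not line.strip():
                continue
            vertex_id, value, *targets = line.split()
            graph[vertex_id] = (int(value), targets)
        return graph

class TsvWriter(Writer):
    """Writes one 'vertex<TAB>value' line per vertex."""

    def write(self, graph):
        return "\n".join(f"{v}\t{value}" for v, (value, _) in sorted(graph.items()))

graph = AdjacencyListReader().read("a 3 b\nb 6 a\n")
output = TsvWriter().write(graph)
```

The computation in between only ever sees the in-memory graph, so swapping either class changes the file format without touching the algorithm.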

Basic architecture(1/2)

• The Pregel library divides a graph into partitions

• Assignment of a vertex to a partition depends solely on the vertex ID

- The default partitioning function is hash(ID) mod N, where N is the number of partitions
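
A hash-based partitioning function of this shape can be sketched as follows. The function name `partition_for` is hypothetical, and `zlib.crc32` is used here only as a stand-in for a stable hash (Python's built-in `hash` of strings varies between processes); the hash function Pregel actually uses is not specified on the slide.

```python
import zlib

def partition_for(vertex_id, num_partitions):
    """Map a vertex ID to a partition index using hash(ID) mod N."""
    digest = zlib.crc32(str(vertex_id).encode("utf-8"))  # stand-in stable hash
    return digest % num_partitions

# Any machine can compute where a vertex lives from its ID alone.
assignment = {vertex: partition_for(vertex, 4) for vertex in ["v1", "v2", "v3", "v42"]}
```

Since the result depends only on the ID and N, a worker can route a message to the right partition without consulting the master.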

Basic architecture(2/2)

• The execution of a Pregel program consists of several stages

1. Many copies of the user program begin executing on a cluster of machines. One of these acts as the master

2. The master determines how many partitions the graph will have, and assigns partitions to each worker

3. The master assigns a portion of the user’s input to each worker

4. The master instructs each worker to perform a superstep

Fault tolerance(1/2)

• Fault tolerance is achieved through checkpointing

• The master instructs workers to save the state of their partitions to persistent storage

- This includes vertex values, edge values, and incoming messages

- The master separately saves the aggregator values

Fault tolerance(2/2)

• Worker failures are detected using regular “ping” messages the master issues to workers

• When one or more workers fail, the master reassigns their graph partitions to the currently available workers

- The workers reload partition state from the most recent checkpoint and repeat the missing supersteps
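
Checkpoint-based recovery can be illustrated with a toy simulation. All names here (`run_with_recovery`, `checkpoint_every`, `fail_at`) are hypothetical, the whole computation is collapsed into a single state value, and a fixed checkpoint interval is an assumption of the sketch. A failure is modelled as happening during one superstep, before its result is kept.

```python
import copy

def run_with_recovery(state, step_fn, num_supersteps, checkpoint_every, fail_at):
    """Run supersteps, injecting one failure; returns (final state, executed supersteps)."""
    checkpoints = {}
    executed = []
    failed = False
    s = 0
    while s < num_supersteps:
        if s % checkpoint_every == 0:
            checkpoints[s] = copy.deepcopy(state)  # save state at the start of the superstep
        if s == fail_at and not failed:
            failed = True
            s = max(checkpoints)                   # roll back to the most recent checkpoint
            state = copy.deepcopy(checkpoints[s])
            continue
        state = step_fn(state)
        executed.append(s)
        s += 1
    return state, executed

final_state, executed = run_with_recovery(
    0, lambda x: x + 1, num_supersteps=6, checkpoint_every=2, fail_at=3
)
print(final_state, executed)  # 6 [0, 1, 2, 2, 3, 4, 5]
```

Superstep 2 appears twice in the trace because it had to be repeated after the rollback, while the final state matches a failure-free run.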

Worker implementation

• A worker machine maintains the state of its portion of the graph in memory

• There are two copies of the active flag and the incoming message queue

- One for the current superstep and another for the next superstep

• There are two message-sending patterns, depending on whether the destination vertex is remote or local

Master implementation

• The master assigns a unique identifier to each worker at the time of registration

• The master maintains a list of all workers known to be active

• If any worker fails, the master enters recovery mode

• The master runs an HTTP server that displays statistics about the progress of the computation

[1]PageRank(1/2)

• The PageRank algorithm determines the importance of web pages

• The algorithm is based on how papers are evaluated

- A good paper is likely to be cited by many other papers

- A paper that is cited by papers that are themselves cited by many papers is likely to be a good paper

• It is named after Larry Page, one of Google’s founders

[1]PageRank(2/2)
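
A minimal single-process sketch of PageRank written in the vertex-centric style: each vertex sums its incoming messages, updates its rank, and sends an equal share of the rank along every outgoing edge. The function name `pagerank`, the damping factor 0.85, the limit of 30 supersteps, and the three-page example graph are assumptions of this sketch, and every vertex is assumed to have at least one outgoing edge.

```python
def pagerank(out_edges, num_supersteps=30, damping=0.85):
    """Vertex-centric PageRank over a dict {vertex: [targets]}."""
    n = len(out_edges)
    rank = {v: 1.0 / n for v in out_edges}
    inbox = {v: [] for v in out_edges}
    for superstep in range(num_supersteps + 1):
        outbox = {v: [] for v in out_edges}
        for v, targets in out_edges.items():
            if superstep >= 1:
                # Combine the shares received from in-neighbours.
                rank[v] = (1 - damping) / n + damping * sum(inbox[v])
            if superstep < num_supersteps:
                # Send an equal share of the rank along every outgoing edge.
                for target in targets:
                    outbox[target].append(rank[v] / len(targets))
            # Otherwise the vertex would vote to halt.
        inbox = outbox
    return rank

# Hypothetical three-page web: a links to b and c, b links to c, c links to a.
ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

In this example page c ends up ranked above page b, since c is linked from both a and b while b is linked only from a.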

[2]Shortest Path(1/6)

• Shortest-path problem: find the shortest path between two given nodes of a weighted graph

• There are several variants of the shortest-path problem

- The single-source shortest paths problem

- The s-t shortest path problem

- The all-pairs shortest paths problem

• The paper focuses on the single-source shortest paths problem

[Figure: single-source shortest paths from Superstep 0 to Superstep 3; distances start at ∞]
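
Single-source shortest paths can be sketched in the same vertex-centric style. The function name and the four-vertex example graph are hypothetical. In each superstep an active vertex takes the minimum of its incoming messages, and only if that improves its distance does it send updated distances along its outgoing edges; it then votes to halt and is woken again only by a new message.

```python
import math

def single_source_shortest_paths(edges, source):
    """edges: {vertex: [(target, weight), ...]}; returns {vertex: distance}."""
    dist = {v: math.inf for v in edges}
    inbox = {v: [] for v in edges}
    active = set(edges)  # every vertex is active in Superstep 0
    while active:
        outbox = {v: [] for v in edges}
        for v in active:
            candidate = 0 if v == source else math.inf
            candidate = min([candidate] + inbox[v])
            if candidate < dist[v]:
                dist[v] = candidate
                for target, weight in edges[v]:
                    outbox[target].append(candidate + weight)
            # The vertex votes to halt here; only a message reactivates it.
        inbox = outbox
        active = {v for v, messages in inbox.items() if messages}
    return dist

graph = {
    "s": [("a", 1), ("b", 4)],
    "a": [("b", 2), ("c", 6)],
    "b": [("c", 1)],
    "c": [],
}
distances = single_source_shortest_paths(graph, "s")
print(distances)  # {'s': 0, 'a': 1, 'b': 3, 'c': 4}
```

Vertices that are unreachable from the source never receive a message, so their distance stays at ∞.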

Experiment details

• Three experiments with the single-source shortest paths

• Using a cluster of 300 multicore commodity PCs

• Reporting runtime for binary trees and log-normal random graphs

- Binary tree, varying number of worker tasks

- Binary tree, varying graph sizes

- Log-normal random graphs, varying graph sizes

[1]1 billion vertex binary tree: varying number of worker tasks

• Setting

- A billion vertices, with the number of Pregel workers varying from 50 to 800

• Result

- Using 16 times as many workers yields a speedup of about 10
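
As a rough worked check of this result, assuming the speedup is measured relative to the 50-worker run, the ratio of speedup to worker ratio (often called parallel efficiency, a term the slide itself does not use) can be computed as:

```latex
\[
\text{worker ratio} = \frac{800}{50} = 16,
\qquad
\text{parallel efficiency} \approx \frac{\text{speedup}}{\text{worker ratio}} = \frac{10}{16} \approx 0.63
\]
```

So each worker in the 800-worker run is doing roughly 63% as much useful work as a worker in the 50-worker run.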

[2]Binary tree: varying graph sizes on 800 worker tasks

• Setting

- Tree size varying from a billion to 50 billion vertices, using a fixed number of 800 worker tasks

• Result

- As the tree size grows from a billion to 50 billion vertices, the runtime increases from 17.3 to 702 seconds

[3]Log-normal random graphs: varying graph sizes on 800 worker tasks(1/2)

• Binary trees are not representative of graphs encountered in practice

• Use a log-normal distribution of outdegrees

• In this experiment, μ = 4, σ = 1.3

\[
p(d) = \frac{1}{\sqrt{2\pi}\,\sigma d}\, e^{-(\ln d - \mu)^2 / 2\sigma^2}
\]
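
To get a feel for what μ = 4 and σ = 1.3 imply, the log-normal density and its mean can be evaluated directly. The function name `lognormal_pdf` is hypothetical; the mean formula exp(μ + σ²/2) is the standard one for a log-normal distribution.

```python
import math

MU, SIGMA = 4.0, 1.3  # parameters quoted on the slide

def lognormal_pdf(d, mu=MU, sigma=SIGMA):
    """Probability density of outdegree d under a log-normal distribution."""
    exponent = -((math.log(d) - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (math.sqrt(2 * math.pi) * sigma * d)

# Mean of a log-normal distribution: exp(mu + sigma^2 / 2).
mean_outdegree = math.exp(MU + SIGMA ** 2 / 2)
print(round(mean_outdegree, 1))  # 127.1
```

A mean outdegree of about 127 makes these graphs far denser than the binary trees used in the first two experiments.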

[3]Log-normal random graphs: varying graph sizes on 800 worker tasks(2/2)

• Setting

- Graph size varying from 10 million to a billion vertices

• Result

- The largest graph took a little over 10 minutes

Conclusion

• The authors propose a computing model that is suitable for graph processing and is scalable and fault-tolerant

• They report that programmers can easily implement graph processing algorithms with Pregel
