Pregel: A System for Large-Scale Graph Processing

Presented by Dylan Davis

Authors: Grzegorz Malewicz, Matthew H. Austern, Aart J.C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski (Google, Inc.)



Overview

• What is a graph?
• Graph Problems
• The Purpose of Pregel
• Model of Computation
• C++ API
• Implementation
• Applications
• Experiments


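What is a graph?

G = (V, E), where V is a set of vertices and E is a set of edges between them.

[Figure: a binary tree as an example graph]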


Graph Problems

• Network Routing
• Social Network Connections


The Purpose of Pregel

• Google wanted applications that could run internet-related graph algorithms, such as PageRank, so it designed Pregel to perform these tasks efficiently.
• Pregel is a scalable, general-purpose system for implementing graph algorithms in a distributed environment.
• Its focus is on "thinking like a vertex" and on parallelism.


Model of Computation


Model of Computation (Vertex)

[Figure: a vertex with a Vertex ID and a Vertex Value, and two outgoing edges, each labeled with an Edge Value and a Vertex ID]
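The labels in the vertex diagram can be read as a small data structure. The following is a minimal sketch in Python rather than Pregel's C++ API; the class and field names (`Vertex`, `Edge`, `target_id`, `out_edges`) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Edge:
    target_id: str      # ID of the vertex this edge points to
    value: float        # user-defined edge value, e.g. a weight


@dataclass
class Vertex:
    vertex_id: str      # unique identifier of the vertex
    value: float        # user-defined, modifiable vertex value
    out_edges: List[Edge] = field(default_factory=list)


# One vertex with two outgoing edges (all values are made up).
v = Vertex("A", 0.0, [Edge("B", 2.0), Edge("C", 5.0)])
targets = [e.target_id for e in v.out_edges]
```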


Model of Computation (Superstep)

[Figure: execution time runs through Superstep 0, Superstep 1, and Superstep 2; each superstep contains several Compute() calls]
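A superstep can be sketched as a loop that calls a compute function on every vertex and only then delivers the messages. The sketch below is a single-process Python illustration, not Pregel's implementation; the chain graph, the `compute` signature, and the `first_seen` bookkeeping are assumptions. It shows that a message sent in one superstep becomes visible in the next.

```python
from collections import defaultdict

# Hypothetical three-vertex chain: 0 -> 1 -> 2.
out_edges = {0: [1], 1: [2], 2: []}
first_seen = {}     # vertex -> superstep in which it first held the token


def compute(vertex, messages, superstep, send):
    """Per-vertex function, called once per superstep for each vertex."""
    has_token = (superstep == 0 and vertex == 0) or bool(messages)
    if has_token and vertex not in first_seen:
        first_seen[vertex] = superstep
        for target in out_edges[vertex]:
            send(target, "token")


inbox = defaultdict(list)           # messages readable in the current superstep
for superstep in range(3):
    outbox = defaultdict(list)      # messages produced in the current superstep
    for vertex in out_edges:        # conceptually parallel across vertices
        compute(vertex, inbox[vertex], superstep,
                lambda target, msg: outbox[target].append(msg))
    inbox = outbox                  # barrier: deliver only after all calls finish
```

The token advances exactly one hop per superstep, because `outbox` is swapped in only after every vertex has run.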


Model of Computation (Vertex Actions)

[Figure: a vertex with its Vertex ID and Vertex Value]

A vertex can:

• Modify its values
• Receive messages from the previous superstep
• Send messages
• Request topology changes
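Three of the actions listed above (modify a value, receive messages, send messages) can be seen together in a maximum-value propagation, a common introductory Pregel example. This is a hedged Python sketch; the graph, the initial values, and the `compute` signature are assumptions, and topology changes are not shown.

```python
from collections import defaultdict

# Hypothetical graph: vertex -> list of out-neighbours.
out_edges = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
values = {"a": 3, "b": 6, "c": 1}


def compute(vertex, messages, superstep, send):
    changed = False
    if messages and max(messages) > values[vertex]:
        values[vertex] = max(messages)      # modify own value
        changed = True
    if superstep == 0 or changed:
        for target in out_edges[vertex]:
            send(target, values[vertex])    # readable in the next superstep


inbox, superstep = defaultdict(list), 0
while superstep == 0 or inbox:              # stop once no messages are in flight
    outbox = defaultdict(list)
    for vertex in out_edges:
        compute(vertex, inbox.get(vertex, []), superstep,
                lambda target, msg: outbox[target].append(msg))
    inbox, superstep = outbox, superstep + 1
```

Each vertex only ever looks at its own value and its incoming messages, yet every vertex ends up holding the global maximum.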


Model of Computation (State Machine)
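In the Pregel paper, the vertex state machine has two states: a vertex starts active, becomes inactive when it votes to halt, and is reactivated by an incoming message. The run ends when every vertex is inactive and no messages are in transit. The Python sketch below illustrates those transitions; the class and method names are assumptions, not Pregel's API.

```python
class VertexState:
    """Active/inactive state of a single vertex (names are illustrative)."""

    def __init__(self):
        self.active = True              # every vertex starts out active

    def vote_to_halt(self):
        self.active = False             # no more work for now

    def on_messages(self, messages):
        if messages:                    # an incoming message reactivates the vertex
            self.active = True


def finished(states, pending_messages):
    """True when all vertices are inactive and no messages are in transit."""
    return not pending_messages and all(not s.active for s in states)


s = VertexState()
s.vote_to_halt()
halted = finished([s], pending_messages=[])
s.on_messages(["wake up"])
reactivated = s.active
```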


C++ API


C++ API (Message Passing)

[Figure: a message consists of a Destination Vertex ID and a Message Value; example messages are held in a Message Buffer]
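A message buffer can be sketched as a map from destination vertex ID to the list of values waiting for that vertex. This is an illustrative Python sketch only; `MessageBuffer`, `send_message_to`, and `deliver` are hypothetical names, and the message values are arbitrary.

```python
from collections import defaultdict


class MessageBuffer:
    """Holds (destination vertex ID, message value) pairs until delivery."""

    def __init__(self):
        self._pending = defaultdict(list)

    def send_message_to(self, dest_vertex_id, value):
        self._pending[dest_vertex_id].append(value)

    def deliver(self):
        """Return all buffered messages grouped by destination, then reset."""
        delivered, self._pending = dict(self._pending), defaultdict(list)
        return delivered


buf = MessageBuffer()
buf.send_message_to(2, 57)      # arbitrary example values
buf.send_message_to(1, 2)
buf.send_message_to(2, 9)
inboxes = buf.deliver()
```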


C++ API (Combiners & Aggregators)

[Figure: two panels, labeled Combiner and Aggregator]
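The difference between the two can be sketched briefly. A combiner merges several messages bound for the same vertex into one, which is valid when the merge operation is commutative and associative. An aggregator reduces one contribution per vertex to a single global value. The Python functions below are assumptions chosen for illustration (a min combiner and a sum aggregator), not Pregel's C++ classes.

```python
def combine_min(outgoing):
    """Combiner sketch: keep only the minimum message per destination."""
    combined = {}
    for dest, value in outgoing:
        combined[dest] = min(value, combined.get(dest, value))
    return combined


def aggregate_sum(contributions):
    """Aggregator sketch: reduce one contribution per vertex to a global value."""
    total = 0
    for value in contributions:
        total += value
    return total


# Three messages for vertex 7 and one for vertex 9 (made-up values).
outgoing = [(7, 12), (7, 4), (9, 8), (7, 30)]
combined = combine_min(outgoing)

# Each vertex reports its out-degree; the sum is the global edge count.
edge_count = aggregate_sum([2, 0, 3, 1])
```

With the min combiner, vertex 7 receives one message instead of three, which reduces the amount of data sent and buffered.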


C++ API (Topology Mutations)

[Figure: a vertex V shown against a Superstep axis]
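According to the Pregel paper, mutation requests issued in one superstep take effect in the next. Removals are applied first (edges before vertices), then additions (vertices before edges). The Python sketch below illustrates that ordering; the request names and the `apply_mutations` helper are hypothetical, and dropping edges that point at a removed vertex is a simplification of this sketch.

```python
def apply_mutations(edges, requests):
    """Apply queued requests between supersteps: removals first, then additions.

    `edges` maps vertex ID -> set of target IDs. `requests` is a list of
    (op, args) tuples; the op names are hypothetical.
    """
    order = ["remove_edge", "remove_vertex", "add_vertex", "add_edge"]
    for op in order:
        for kind, args in requests:
            if kind != op:
                continue
            if op == "remove_edge":
                src, dst = args
                edges.get(src, set()).discard(dst)
            elif op == "remove_vertex":
                (vid,) = args
                edges.pop(vid, None)            # drops the vertex and its out-edges
                for targets in edges.values():
                    targets.discard(vid)        # sketch-only: drop dangling edges
            elif op == "add_vertex":
                (vid,) = args
                edges.setdefault(vid, set())
            elif op == "add_edge":
                src, dst = args
                edges.setdefault(src, set()).add(dst)
    return edges


edges = {"a": {"b"}, "b": {"a"}}
# Requests issued during one superstep, in arbitrary order.
requests = [("add_edge", ("c", "a")), ("remove_vertex", ("b",)), ("add_vertex", ("c",))]
edges = apply_mutations(edges, requests)
```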


C++ API (Input and Output)

     0  1  2  3  4
0    0  0  1  1  0
1    0  0  0  1  1
2    1  1  0  1  1
3    0  1  1  0  1
4    1  1  1  0  0
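One way to read the 5x5 grid on this slide is as an adjacency matrix, assuming row i, column j marks an edge from vertex i to vertex j. Under that assumption, the Python sketch below converts the matrix into per-vertex edge lists and back into text lines. The function names and the text format are hypothetical.

```python
# Adjacency matrix as read from the slide: matrix[i][j] == 1 means an edge i -> j.
matrix = [
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [1, 1, 1, 0, 0],
]


def matrix_to_out_edges(matrix):
    """Turn an adjacency matrix into per-vertex lists of target vertex IDs."""
    return {
        i: [j for j, present in enumerate(row) if present]
        for i, row in enumerate(matrix)
    }


def out_edges_to_lines(out_edges):
    """Serialize as one text line per vertex: 'id: target target ...'."""
    return [f"{v}: {' '.join(map(str, targets))}" for v, targets in out_edges.items()]


out_edges = matrix_to_out_edges(matrix)
lines = out_edges_to_lines(out_edges)
```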


Implementation


Implementation (Basic Architecture)


Implementation (Program Execution)

Flow:
1. Copy the user program: one master copy and the worker copies
2. The master assigns graph partitions to the workers
3. The master takes the user input data and assigns it to the workers, which load the vertex data
4. Supersteps (Compute() and send messages)
5. Save output
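Steps 2 and 3 of the flow above can be sketched as two small functions. The Pregel paper describes hash(ID) mod N as the default partitioning. The sketch below uses integer IDs and a plain modulo to stand in for the hash, and assigns partitions to workers round-robin, which is an assumption. All names are hypothetical.

```python
def assign_partitions(vertex_ids, num_partitions):
    """Decide which partition owns each vertex (ID mod N stands in for a hash)."""
    partitions = {p: [] for p in range(num_partitions)}
    for vid in vertex_ids:
        partitions[vid % num_partitions].append(vid)
    return partitions


def assign_to_workers(partitions, workers):
    """Hand partitions to workers round-robin (an assumption of this sketch)."""
    assignment = {w: [] for w in workers}
    for p in sorted(partitions):
        assignment[workers[p % len(workers)]].append(p)
    return assignment


vertex_ids = range(10)
partitions = assign_partitions(vertex_ids, num_partitions=4)
assignment = assign_to_workers(partitions, ["worker-0", "worker-1"])
```

Because ownership depends only on the vertex ID, any worker can work out where to send a message without asking the master.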


Implementation (Fault Tolerance)

[Figure: two panels. Checkpoint: three workers, each labeled Save(). Recover: three workers, one marked X as failed, with Recompute() labels]
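Checkpoint-and-recompute can be sketched in a single process: save the state every few supersteps and, after a failure, reload the latest checkpoint and redo the supersteps since then. The Python sketch below is a simplified illustration. The `run` function, its parameters, and the simulated failure are assumptions; real recovery also involves reassigning partitions to workers.

```python
import copy


def run(values, step, num_supersteps, checkpoint_every, fail_at=None):
    """Run `step` repeatedly, checkpointing state and surviving one failure.

    `fail_at` is the superstep in which a simulated worker failure occurs.
    Returns the final state and the list of supersteps actually executed.
    """
    checkpoint = (0, copy.deepcopy(values))     # (superstep, saved state)
    executed, superstep = [], 0
    while superstep < num_supersteps:
        if superstep % checkpoint_every == 0:
            checkpoint = (superstep, copy.deepcopy(values))     # save
        if superstep == fail_at:
            fail_at = None                                      # fail only once
            superstep, values = checkpoint[0], copy.deepcopy(checkpoint[1])
            continue                                            # recompute from checkpoint
        values = step(values)
        executed.append(superstep)
        superstep += 1
    return values, executed


def double(vals):
    return {k: v * 2 for k, v in vals.items()}


clean, _ = run({"a": 1}, double, 6, checkpoint_every=2)
recovered, executed = run({"a": 1}, double, 6, checkpoint_every=2, fail_at=5)
```

The recovered run repeats superstep 4 and still ends in the same state as the failure-free run.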


Implementation (Worker)

[Figure: two workers]


Implementation (Master)

[Figure: the Master, with its List of Workers and Partitions]


Applications


Applications (Shortest Path)

[Figure: example graph labeled with the values 2, 1, 5, and 3]
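Single-source shortest paths fits the vertex-centric model well. Each vertex keeps its best known distance, takes the minimum over incoming messages, and, if that improves its value, sends the new distance plus the edge weight along each outgoing edge. The Python sketch below is a single-process illustration under those rules; the graph and its weights are hypothetical and are not the graph from the slide.

```python
import math
from collections import defaultdict

# Hypothetical weighted directed graph: vertex -> [(target, edge weight), ...].
out_edges = {
    "s": [("a", 2), ("b", 5)],
    "a": [("b", 1), ("c", 3)],
    "b": [("c", 1)],
    "c": [],
}


def shortest_paths(out_edges, source):
    dist = {v: math.inf for v in out_edges}     # vertex values start at infinity
    inbox = defaultdict(list)
    superstep = 0
    while superstep == 0 or inbox:              # stop when no messages are in flight
        outbox = defaultdict(list)
        for v in out_edges:
            candidate = 0 if v == source else math.inf
            candidate = min([candidate] + inbox.get(v, []))
            if candidate < dist[v]:
                dist[v] = candidate             # found a shorter path
                for target, weight in out_edges[v]:
                    outbox[target].append(candidate + weight)
            # otherwise the vertex stays idle until a new message arrives
        inbox, superstep = outbox, superstep + 1
    return dist


dist = shortest_paths(out_edges, "s")
```

A min combiner would suit this algorithm, since each vertex only uses the smallest incoming distance.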


Experiments


Experiments (Description)

• Test the execution times of Pregel running the Single-Source Shortest Path algorithm.
• Use a cluster of 300 multicore commodity PCs.
• Run Pregel with binary tree graphs, and with a more realistic, randomly distributed graph.
• Results do not include initialization, graph generation, or result verification times.
• Failure recovery is not included, which reduces overhead.


Conclusion

• Pregel is a model suitable for large-scale graph computing, with a production-quality, scalable, and fault-tolerant implementation.
• Programs are expressed as a sequence of iterations, in each of which a vertex can receive messages sent in the previous iteration, send messages to other vertices, and modify its own state and that of its outgoing edges.
• This implementation is flexible enough to express a broad set of algorithms.