The Pregel Programming Model with Spark GraphX

Agenda

- GraphX introduction
- Pregel programming model
- Code examples

The main focus will be on the programming model.


GraphX is a graph processing system built on top of Apache Spark:

- property graph representation
- based on RDDs
- user-defined partitioning on RDDs
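The sketch below illustrates the property-graph idea in plain Scala, without a Spark cluster. Every vertex and every edge carries a user-defined attribute. A "triplet" joins an edge with the attributes of both of its endpoints. The names `Edge`, `Triplet` and `triplets` are hypothetical stand-ins for GraphX's `Edge` and `EdgeTriplet` concepts, not the GraphX API. In GraphX, the vertices and edges would live in partitioned RDDs instead of a `Map` and a `Seq`.

```scala
// Hypothetical, in-memory stand-ins for GraphX's property graph concepts
// (not the GraphX API): vertices and edges both carry user-defined attributes.
object PropertyGraphSketch {
  type VertexId = Long

  // An edge with a source, a destination and an attribute (e.g. a distance)
  final case class Edge[ED](srcId: VertexId, dstId: VertexId, attr: ED)

  // An edge joined with the attributes of both endpoints; this is the view
  // that a Pregel sendMsg function works on
  final case class Triplet[VD, ED](srcId: VertexId, srcAttr: VD,
                                   dstId: VertexId, dstAttr: VD, attr: ED)

  // Builds one triplet per edge by looking up both endpoint attributes
  def triplets[VD, ED](vertices: Map[VertexId, VD],
                       edges: Seq[Edge[ED]]): Seq[Triplet[VD, ED]] =
    edges.map(e => Triplet(e.srcId, vertices(e.srcId), e.dstId, vertices(e.dstId), e.attr))

  def main(args: Array[String]): Unit = {
    // Vertex attribute = city name, edge attribute = distance in km
    val vertices = Map(1L -> "Washington", 2L -> "Baltimore")
    val edges = Seq(Edge(1L, 2L, 27.0))
    triplets(vertices, edges).foreach(println)
  }
}
```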


GraphX / Spark software stack


Pregel Programming Model

https://kowshik.github.io/JPregel/pregel_paper.pdf

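- vertex-centric: the computation is expressed from the point of view of a single vertex
- vertices send messages to and receive messages from their neighbours
- the computation proceeds in supersteps, separated by synchronization barriers
- each vertex has a status (active / inactive): a vertex votes to halt to become inactive and is reactivated when it receives a message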
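The sketch below applies these rules on a single machine, in plain Scala with no Spark dependency. It is in the spirit of the maximum-value example from the Pregel paper. In every superstep, a vertex takes the largest of its own value and its incoming messages. It forwards that value to its neighbours only in superstep 0 or when the value changed, and then votes to halt. Messages sent in one superstep are delivered in the next one. The object `VertexCentricMax`, the helper `run` and the bidirectional chain 1 - 2 - 3 - 4 are hypothetical. The vertex values 3, 6, 2, 1 match the max-value results in this deck.

```scala
import scala.collection.mutable

// Minimal single-machine sketch of Pregel's vertex-centric model (not a real
// framework): synchronous supersteps, messages delivered at the next superstep,
// vote to halt, and reactivation of halted vertices on incoming messages.
object VertexCentricMax {
  type VertexId = Long

  /** Propagates the maximum value; returns the final values and the number
    * of supersteps executed. */
  def run(values: Map[VertexId, Int],
          outEdges: Map[VertexId, Seq[VertexId]]): (Map[VertexId, Int], Int) = {
    var current = values
    var inbox = Map.empty[VertexId, Seq[Int]]
    var active = values.keySet // every vertex starts active
    var superstep = 0
    while (active.nonEmpty || inbox.nonEmpty) {
      val outbox = mutable.Map.empty[VertexId, Vector[Int]]
      // compute() runs on active vertices and on halted vertices that received messages
      for (v <- active ++ inbox.keySet) {
        val newValue = (current(v) +: inbox.getOrElse(v, Seq.empty)).max
        if (superstep == 0 || newValue > current(v)) {
          current = current.updated(v, newValue)
          // send the (possibly new) value to every out-neighbour
          for (n <- outEdges.getOrElse(v, Seq.empty))
            outbox(n) = outbox.getOrElse(n, Vector.empty) :+ newValue
        }
      }
      active = Set.empty   // every vertex votes to halt after compute()
      inbox = outbox.toMap // messages arrive in the next superstep
      superstep += 1
    }
    (current, superstep)
  }

  // Hypothetical graph: a bidirectional chain 1 - 2 - 3 - 4 with values 3, 6, 2, 1
  val sampleValues: Map[VertexId, Int] = Map(1L -> 3, 2L -> 6, 3L -> 2, 4L -> 1)
  val sampleEdges: Map[VertexId, Seq[VertexId]] =
    Map(1L -> Seq(2L), 2L -> Seq(1L, 3L), 3L -> Seq(2L, 4L), 4L -> Seq(3L))

  def main(args: Array[String]): Unit = {
    val (values, supersteps) = run(sampleValues, sampleEdges)
    println(s"final values: $values after $supersteps supersteps")
  }
}
```

The computation stops by itself once a superstep produces no messages. At that point every vertex has voted to halt and nothing can reactivate it.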


Pregel example: finding the maximum value


GraphX implementation of Pregel

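It uses three user-supplied functions:

- vprog computes the new vertex value from the current value and the incoming message
- sendMsg decides which neighbours to send the new value to
- mergeMsg merges the messages arriving at the same vertex into a single message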
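The control flow that ties the three functions together can be imitated in memory. The sketch below is a hypothetical, simplified re-creation, not GraphX code, and it only models `activeDirection = EdgeDirection.Out`:

- **Superstep 0:** vprog runs on every vertex with the initial message, and sendMsg runs on every edge.
- **Later supersteps:** vprog runs only on vertices that received a merged message, and sendMsg runs only on edges whose source vertex just received one.
- **Termination:** the loop stops when no messages are produced or `maxIterations` is reached.

The names `MiniPregel`, `Edge`, `Triplet` and `maxValue`, and the directed cycle 1 → 2 → 3 → 4 → 1, are assumptions. The three functions passed in are the max-value functions from the GraphX example.

```scala
// Hypothetical in-memory imitation of GraphX's Pregel loop, restricted to
// activeDirection = EdgeDirection.Out. Not the GraphX API or implementation.
object MiniPregel {
  type VertexId = Long
  final case class Edge[ED](srcId: VertexId, dstId: VertexId, attr: ED)
  final case class Triplet[VD, ED](srcId: VertexId, srcAttr: VD,
                                   dstId: VertexId, dstAttr: VD, attr: ED)

  def pregel[VD, ED, A](vertices: Map[VertexId, VD], edges: Seq[Edge[ED]],
                        initialMsg: A, maxIterations: Int = Int.MaxValue)(
      vprog: (VertexId, VD, A) => VD,
      sendMsg: Triplet[VD, ED] => Iterator[(VertexId, A)],
      mergeMsg: (A, A) => A): Map[VertexId, VD] = {

    // Run sendMsg on the given edges and merge the messages per destination vertex
    def collect(g: Map[VertexId, VD], active: Seq[Edge[ED]]): Map[VertexId, A] =
      active
        .flatMap(e => sendMsg(Triplet(e.srcId, g(e.srcId), e.dstId, g(e.dstId), e.attr)))
        .groupMapReduce(_._1)(_._2)(mergeMsg)

    // Superstep 0: vprog on every vertex with initialMsg, then sendMsg on every edge
    var g = vertices.map { case (id, attr) => id -> vprog(id, attr, initialMsg) }
    var messages = collect(g, edges)
    var i = 0
    while (messages.nonEmpty && i < maxIterations) {
      // vprog runs only on vertices that received a (merged) message
      g = g.map { case (id, attr) => id -> messages.get(id).fold(attr)(m => vprog(id, attr, m)) }
      // Out direction: sendMsg only on edges whose source just received a message
      messages = collect(g, edges.filter(e => messages.contains(e.srcId)))
      i += 1
    }
    g
  }

  // Hypothetical graph: directed cycle 1 -> 2 -> 3 -> 4 -> 1 with values 3, 6, 2, 1
  val sampleValues: Map[VertexId, Int] = Map(1L -> 3, 2L -> 6, 3L -> 2, 4L -> 1)
  val sampleEdges: Seq[Edge[Int]] =
    Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(3L, 4L, 1), Edge(4L, 1L, 1))

  // The max-value vprog, sendMsg and mergeMsg from the GraphX example
  def maxValue(values: Map[VertexId, Int], edges: Seq[Edge[Int]]): Map[VertexId, Int] =
    pregel(values, edges, Int.MinValue)(
      (vertexId: VertexId, current: Int, incoming: Int) => if (incoming > current) incoming else current,
      (t: Triplet[Int, Int]) => if (t.srcAttr > t.dstAttr) Iterator((t.dstId, t.srcAttr)) else Iterator.empty,
      (a: Int, b: Int) => if (a > b) a else b)

  def main(args: Array[String]): Unit =
    println(maxValue(sampleValues, sampleEdges))
}
```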


GraphX communication diagram


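Max Value implementation

import org.apache.spark.graphx.{EdgeDirection, EdgeTriplet}

// graph: Graph[Int, Int], where each vertex attribute is the vertex value
graph.pregel(
  initialMsg = Int.MinValue,
  maxIterations = Int.MaxValue,
  activeDirection = EdgeDirection.Out)(
  // vprog: keep the larger of the current value and the incoming message
  (vertexId: Long, currentVertexAttr: Int, newVertexAttr: Int) =>
    if (newVertexAttr > currentVertexAttr) newVertexAttr else currentVertexAttr,
  // sendMsg: propagate the source value only if it is larger than the destination value
  (edgeTriplet: EdgeTriplet[Int, Int]) => {
    if (edgeTriplet.srcAttr > edgeTriplet.dstAttr) Iterator((edgeTriplet.dstId, edgeTriplet.srcAttr))
    else Iterator.empty
  },
  // mergeMsg: when several messages reach the same vertex, keep the largest
  (attribute1: Int, attribute2: Int) =>
    if (attribute1 > attribute2) attribute1 else attribute2)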


Max Value implementation results:

Graph initial state:
Node [1]: 3
Node [2]: 6
Node [3]: 2
Node [4]: 1

Graph final state:
Node [1]: 6
Node [2]: 6
Node [3]: 6
Node [4]: 6

Max value of the graph is 6.

Dijkstra's algorithm

Step-by-step run of Dijkstra's algorithm from Washington (graph diagrams not preserved). The set of unvisited nodes shrinks as follows:

- Baltimore, Detroit, Chicago, NewYork, Philadelphia
- Detroit, Chicago, NewYork, Philadelphia
- Chicago, NewYork, Philadelphia
- Chicago, Philadelphia
- Chicago
- (none)
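For comparison with the Pregel version, here is a sketch of the classic sequential algorithm with a priority queue. The slides do not give the edge weights. The ones below are inferred from differences between consecutive distances in the results, for example Baltimore → Detroit = 62 - 27 = 35. The two edges marked hypothetical are invented so that some tentative distances are later improved. Under these assumptions, the visiting order matches the sequence of unvisited nodes above. The city ids are the ones printed in the results.

```scala
import scala.collection.mutable

// Textbook (sequential) Dijkstra on a small directed, weighted graph of cities.
// Weights are inferred from the results slide, except the two hypothetical edges.
object DijkstraWalkthrough {
  val names: Map[Long, String] = Map(1L -> "Washington", 2L -> "Baltimore", 3L -> "Detroit",
                                     4L -> "Chicago", 5L -> "NewYork", 6L -> "Philadelphia")

  val edges: Map[Long, Seq[(Long, Double)]] = Map(
    1L -> Seq(2L -> 27.0, 3L -> 80.0), // Washington -> Detroit (80) is hypothetical
    2L -> Seq(3L -> 35.0, 4L -> 90.0), // Baltimore -> Chicago (90) is hypothetical
    3L -> Seq(5L -> 14.0),
    5L -> Seq(4L -> 29.0, 6L -> 15.0)
  )

  /** Returns the final distances and the order in which vertices are visited. */
  def run(source: Long): (Map[Long, Double], List[Long]) = {
    val dist = mutable.Map.from(names.keys.map(_ -> Double.PositiveInfinity))
    dist(source) = 0.0
    val visited = mutable.Set.empty[Long]
    val order = mutable.ListBuffer.empty[Long]
    // PriorityQueue is a max-heap, so reverse the ordering to pop the smallest distance first
    val byDistance = Ordering.by[(Double, Long), Double](_._1)(Ordering.Double.TotalOrdering).reverse
    val queue = mutable.PriorityQueue((0.0, source))(byDistance)
    while (queue.nonEmpty) {
      val (d, u) = queue.dequeue()
      if (!visited(u)) { // skip stale entries left behind by earlier relaxations
        visited += u
        order += u
        for ((v, w) <- edges.getOrElse(u, Seq.empty) if d + w < dist(v)) {
          dist(v) = d + w // relax the edge u -> v
          queue.enqueue((d + w, v))
        }
      }
    }
    (dist.toMap, order.toList)
  }

  def main(args: Array[String]): Unit = {
    val (dist, order) = run(1L)
    println("Visiting order: " + order.map(names).mkString(", "))
    for (id <- order) println(s"${names(id)}: ${dist(id)} km")
  }
}
```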


Dijkstra's algorithm implementation

Type definitions:

type VertexId = scala.Long // GraphX already provides this alias as org.apache.spark.graphx.VertexId

case class City(name: String, id: VertexId)

case class VertexAttribute(cityName: String, distance: Double, path: List[City])
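The slides do not show how `initialGraph` is built. One plausible way, sketched here in plain Scala as an assumption, sets up the vertex attributes as follows:

- the source city gets distance 0 and a path containing only itself;
- every other city gets an infinite distance and an empty path.

This matches the `Washington [1]` path reported for the source in the results. In GraphX the same mapping would likely be a `Graph.mapVertices` call. The helper name `initialAttributes` is hypothetical.

```scala
// Hypothetical construction of the initial vertex attributes for Dijkstra
// (the slides do not show how initialGraph is built).
object InitialAttributes {
  type VertexId = Long
  case class City(name: String, id: VertexId)
  case class VertexAttribute(cityName: String, distance: Double, path: List[City])

  def initialAttributes(cities: Map[VertexId, String],
                        sourceId: VertexId): Map[VertexId, VertexAttribute] =
    cities.map { case (id, name) =>
      if (id == sourceId) id -> VertexAttribute(name, 0.0, List(City(name, id))) // source: distance 0, path = itself
      else id -> VertexAttribute(name, Double.PositiveInfinity, List.empty[City]) // not reached yet
    }

  def main(args: Array[String]): Unit = {
    // City ids as they appear in the results slide
    val cities = Map(1L -> "Washington", 2L -> "Baltimore", 3L -> "Detroit",
                     4L -> "Chicago", 5L -> "NewYork", 6L -> "Philadelphia")
    initialAttributes(cities, sourceId = 1L).toSeq.sortBy(_._1).foreach(println)
  }
}
```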


Dijkstra's algorithm implementation

// The initial message has an infinite distance, so in superstep 0 vprog keeps
// every vertex's starting attribute
val shortestPathGraph = initialGraph.pregel(
  initialMsg = VertexAttribute("", Double.PositiveInfinity, List[City]()),
  maxIterations = Int.MaxValue,
  activeDirection = EdgeDirection.Out
)(vprog, sendMsg, mergeMsg)


Dijkstra's algorithm implementation

// vprog: keep the attribute with the shorter distance (on a tie, keep the current one)
val vprog = (vertexId: VertexId, currentVertexAttr: VertexAttribute, newVertexAttr: VertexAttribute) =>
  if (currentVertexAttr.distance <= newVertexAttr.distance) currentVertexAttr else newVertexAttr

// mergeMsg: of two candidate paths to the same vertex, keep the shorter one
val mergeMsg = (attribute1: VertexAttribute, attribute2: VertexAttribute) =>
  if (attribute1.distance < attribute2.distance) attribute1 else attribute2
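The following quick check runs both functions, copied from the slide into a self-contained object. It shows why vprog compares with `<=`. In superstep 0 every vertex receives the initial message, whose distance is infinite. A vertex still at infinity therefore keeps its own attribute, including its city name, instead of taking the empty name from the message. The 80 km direct candidate for Detroit is a hypothetical value. The 62 km path through Baltimore is the one from the results.

```scala
// Self-contained copy of the slide's types, vprog and mergeMsg, for a quick check
object DijkstraFunctions {
  type VertexId = Long
  case class City(name: String, id: VertexId)
  case class VertexAttribute(cityName: String, distance: Double, path: List[City])

  // Same initial message as in the pregel call
  val initialMsg = VertexAttribute("", Double.PositiveInfinity, List[City]())

  // vprog: keep the attribute with the shorter distance (on a tie, keep the current one)
  val vprog = (vertexId: VertexId, currentVertexAttr: VertexAttribute, newVertexAttr: VertexAttribute) =>
    if (currentVertexAttr.distance <= newVertexAttr.distance) currentVertexAttr else newVertexAttr

  // mergeMsg: of two candidate paths to the same vertex, keep the shorter one
  val mergeMsg = (attribute1: VertexAttribute, attribute2: VertexAttribute) =>
    if (attribute1.distance < attribute2.distance) attribute1 else attribute2

  val washington = City("Washington", 1L)
  val baltimore = City("Baltimore", 2L)
  val detroit = City("Detroit", 3L)

  // Detroit before any message arrives: infinite distance, empty path
  val unreachedDetroit = VertexAttribute("Detroit", Double.PositiveInfinity, Nil)
  // Two candidate paths to Detroit (the 80 km direct one is hypothetical)
  val direct = VertexAttribute("Detroit", 80.0, List(washington, detroit))
  val viaBaltimore = VertexAttribute("Detroit", 62.0, List(washington, baltimore, detroit))

  def main(args: Array[String]): Unit = {
    println(vprog(3L, unreachedDetroit, initialMsg))   // keeps the "Detroit" attribute
    println(mergeMsg(direct, viaBaltimore).distance)   // the shorter candidate wins
    println(vprog(3L, unreachedDetroit, mergeMsg(direct, viaBaltimore)).path.map(_.name))
  }
}
```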


Dijkstra's algorithm implementation

// sendMsg: relax the edge src -> dst; send a message only when going through src
// gives dst a shorter distance than the one it currently has
val sendMsg = (edgeTriplet: EdgeTriplet[VertexAttribute, Double]) => {
  if (edgeTriplet.srcAttr.distance < (edgeTriplet.dstAttr.distance - edgeTriplet.attr)) {
    Iterator((edgeTriplet.dstId,
      VertexAttribute(
        edgeTriplet.dstAttr.cityName,
        edgeTriplet.srcAttr.distance + edgeTriplet.attr,
        edgeTriplet.srcAttr.path :+ City(edgeTriplet.dstAttr.cityName, edgeTriplet.dstId))))
  } else Iterator.empty
}
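The sketch below runs the slide's vprog, sendMsg and mergeMsg end to end through a simplified, in-memory imitation of GraphX's Pregel loop. Superstep 0 covers all vertices and edges. Each later superstep covers only the vertices that received messages and their outgoing edges. This is not the GraphX implementation, and `maxIterations` is not modeled. The graph is an assumption: edge weights are inferred from the results slide, plus two hypothetical edges (Washington → Detroit, 80; Baltimore → Chicago, 90). The initial attributes are also assumed, since the slides do not show them. Under these assumptions, the computed distances and paths agree with the reported results.

```scala
// Hypothetical end-to-end run of the slide's Dijkstra functions on an in-memory
// imitation of GraphX's Pregel loop (activeDirection = Out). Not Spark code.
object DijkstraPregelSketch {
  type VertexId = Long
  case class City(name: String, id: VertexId)
  case class VertexAttribute(cityName: String, distance: Double, path: List[City])
  case class Edge(srcId: VertexId, dstId: VertexId, attr: Double)
  // Stand-in for GraphX's EdgeTriplet[VertexAttribute, Double]
  case class Triplet(srcId: VertexId, srcAttr: VertexAttribute,
                     dstId: VertexId, dstAttr: VertexAttribute, attr: Double)

  val vprog = (vertexId: VertexId, currentVertexAttr: VertexAttribute, newVertexAttr: VertexAttribute) =>
    if (currentVertexAttr.distance <= newVertexAttr.distance) currentVertexAttr else newVertexAttr

  val mergeMsg = (attribute1: VertexAttribute, attribute2: VertexAttribute) =>
    if (attribute1.distance < attribute2.distance) attribute1 else attribute2

  val sendMsg = (t: Triplet) =>
    if (t.srcAttr.distance < t.dstAttr.distance - t.attr)
      Iterator((t.dstId, VertexAttribute(t.dstAttr.cityName, t.srcAttr.distance + t.attr,
        t.srcAttr.path :+ City(t.dstAttr.cityName, t.dstId))))
    else Iterator.empty

  val cities: Map[VertexId, String] = Map(1L -> "Washington", 2L -> "Baltimore", 3L -> "Detroit",
                                          4L -> "Chicago", 5L -> "NewYork", 6L -> "Philadelphia")

  val edges: Seq[Edge] = Seq(
    Edge(1L, 2L, 27.0), Edge(2L, 3L, 35.0), Edge(3L, 5L, 14.0), // inferred from results
    Edge(5L, 4L, 29.0), Edge(5L, 6L, 15.0),                     // inferred from results
    Edge(1L, 3L, 80.0), Edge(2L, 4L, 90.0)                      // hypothetical
  )

  def shortestPaths(sourceId: VertexId): Map[VertexId, VertexAttribute] = {
    // Assumed initial state: source at distance 0, every other city at infinity
    var graph = cities.map { case (id, name) =>
      id -> (if (id == sourceId) VertexAttribute(name, 0.0, List(City(name, id)))
             else VertexAttribute(name, Double.PositiveInfinity, Nil))
    }
    // sendMsg on the given edges, messages merged per destination with mergeMsg
    def messages(active: Seq[Edge]): Map[VertexId, VertexAttribute] =
      active.flatMap(e => sendMsg(Triplet(e.srcId, graph(e.srcId), e.dstId, graph(e.dstId), e.attr)))
        .groupMapReduce(_._1)(_._2)(mergeMsg)

    val initialMsg = VertexAttribute("", Double.PositiveInfinity, List[City]())
    graph = graph.map { case (id, attr) => id -> vprog(id, attr, initialMsg) } // superstep 0
    var pending = messages(edges)
    while (pending.nonEmpty) {
      graph = graph.map { case (id, attr) => id -> pending.get(id).fold(attr)(m => vprog(id, attr, m)) }
      pending = messages(edges.filter(e => pending.contains(e.srcId))) // out-edges of updated vertices
    }
    graph
  }

  def main(args: Array[String]): Unit =
    for ((_, attr) <- shortestPaths(1L).toSeq.sortBy(_._2.distance)) {
      val path = attr.path.map(c => s"${c.name} [${c.id}]").mkString(" => ")
      println(s"Going from Washington to ${attr.cityName} has a distance of ${attr.distance} km. Path is: $path")
    }
}
```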


Dijkstra's algorithm implementation results:

Going from Washington to Chicago has a distance of 105.0 km. Path is: Washington [1] => Baltimore [2] => Detroit [3] => NewYork [5] => Chicago [4]
Going from Washington to Washington has a distance of 0.0 km. Path is: Washington [1]
Going from Washington to Philadelphia has a distance of 91.0 km. Path is: Washington [1] => Baltimore [2] => Detroit [3] => NewYork [5] => Philadelphia [6]
Going from Washington to Detroit has a distance of 62.0 km. Path is: Washington [1] => Baltimore [2] => Detroit [3]
Going from Washington to NewYork has a distance of 76.0 km. Path is: Washington [1] => Baltimore [2] => Detroit [3] => NewYork [5]
Going from Washington to Baltimore has a distance of 27.0 km. Path is: Washington [1] => Baltimore [2]


Questions & Answers


Thanks!

The code is available at https://github.com/andreaiacono/TalkGraphX