Kineograph: Taking the Pulse of a Fast-Changing and Connected World
Speaker: LIN Qian
http://www.comp.nus.edu.sg/~linqian
Information
time-sensitive
rich connections
Challenges
1. Timeliness guarantees
2. Graph
3. Graph-mining
Kineograph
distributed in-memory graph storage
incremental graph mining
[Figure: Kineograph architecture, with continuous data feeds, ingest nodes, graph nodes (graph storage, graph computation), a master, the progress table, and the snapshooter. Graph updates flow from ingest nodes to graph nodes; computation runs incrementally on a static, globally consistent graph snapshot.]
Graph nodes
storage layer
computation layer
Storage layer
key/value store
logical partitions
Graph partitioning
edge-cut
no locality consideration
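The storage and partitioning bullets above can be illustrated with a minimal sketch. It assumes, for illustration only, that a vertex's logical partition is chosen by hashing its key (no locality consideration) and that each partition is a key/value map from vertex id to adjacency list, so an edge whose endpoints hash to different partitions is "cut". The helper names `partition_of` and `build_partitions` and the choice of MD5 are hypothetical, not Kineograph's code.

```python
import hashlib
from collections import defaultdict


def partition_of(vertex_id: str, num_partitions: int) -> int:
    """Map a vertex key to a logical partition by hashing (no locality).

    MD5 is used (an assumption) because Python's built-in hash() of str is
    randomized per process and would not be stable across machines.
    """
    digest = hashlib.md5(vertex_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def build_partitions(edges, num_partitions):
    """Store each source vertex's adjacency list in its hash partition.

    Each partition is a key/value map: vertex id -> list of out-neighbors.
    Returns the partitions and the number of cut edges (edge-cut).
    """
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    cut = 0
    for src, dst in edges:
        p_src = partition_of(src, num_partitions)
        partitions[p_src][src].append(dst)
        if p_src != partition_of(dst, num_partitions):
            cut += 1  # endpoints live in different partitions
    return partitions, cut
```

Because placement ignores the graph structure, any edge may cross partitions; the trade-off is that placement needs no global knowledge and stays cheap as new vertices stream in.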
Snapshot
ingest nodes
graph nodes
global progress table
Ingest node
graph-update operations
sequence number
Epoch commit protocol
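[Figure: epoch commit protocol. Ingest nodes s1 … sn send sequence-numbered graph updates to partitions u and v on the graph nodes and record their progress in the progress table; the snapshooter turns the progress table into a global tx vector, so each epoch is specified by the progress table and the snapshooter.]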
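A minimal, single-process sketch of the epoch commit idea may help. It assumes that each ingest node numbers its updates 1, 2, 3, …, reports its latest sequence number to the progress table only after every target graph node has received the update, and that a snapshot includes exactly the updates whose sequence number is at or below the global tx vector entry for their ingest node. Class and method names (`IngestNode.submit`, `GraphNode.apply_epoch`, `snapshooter`) are hypothetical, and network delivery is simulated with direct calls.

```python
from dataclasses import dataclass, field


@dataclass
class GraphNode:
    """Buffers sequence-numbered updates until an epoch's tx vector covers them."""
    pending: list = field(default_factory=list)  # (ingest_id, seq, update)
    applied: list = field(default_factory=list)  # updates visible in snapshots

    def receive(self, ingest_id, seq, update):
        self.pending.append((ingest_id, seq, update))

    def apply_epoch(self, tx_vector):
        """Apply every buffered update with seq <= tx_vector[ingest_id]."""
        keep = []
        # Sort by (ingest id, seq) so each ingest node's updates apply in order.
        for ingest_id, seq, update in sorted(self.pending, key=lambda t: (t[0], t[1])):
            if seq <= tx_vector.get(ingest_id, 0):
                self.applied.append(update)
            else:
                keep.append((ingest_id, seq, update))  # belongs to a later epoch
        self.pending = keep


class IngestNode:
    def __init__(self, ingest_id, progress_table):
        self.ingest_id = ingest_id
        self.seq = 0
        self.progress_table = progress_table

    def submit(self, update, graph_nodes):
        self.seq += 1
        for node in graph_nodes:  # every node holding an affected partition
            node.receive(self.ingest_id, self.seq, update)
        # Report progress only after all target graph nodes have the update.
        self.progress_table[self.ingest_id] = self.seq


def snapshooter(progress_table):
    """Copy the progress table into a global tx vector that defines an epoch."""
    return dict(progress_table)
```

For example, if s1 and s2 each submit one update, the snapshooter takes the vector `{"s1": 1, "s2": 1}`, and a later s1 update stays pending on its graph node until the next epoch. No global serialization of individual updates is needed; graph nodes only agree on the vector.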
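Graph update / compute pipeline
[Figure: timeline of incoming tweets, snapshot construction, and graph computation. Snapshots S(i-1), S(i), S(i+1) are produced once per epoch, and computation C(i) on snapshot S(i) overlaps with construction of the next snapshot; time marks t(i-1), t(i), t(i)', t(i)''.]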
Timeliness
Consistency
no global serialization (unlike 2PL or timestamp ordering)
Atomicity
Deterministic vertex creation
Computation layer
incremental graph-mining
vertex-based computation model
Incremental Graph Computation
[Figure: incremental graph computation flowchart with steps Init, Detect Vertex Status, Compute New Vertex Values, Change Significantly? (Y/N), Propagate Updates, Updates from other vertices, and Graph-Scale Aggregation.]
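The flowchart's loop (compute a new vertex value, propagate only if it changed significantly) can be sketched as a worklist algorithm. This is an illustration under assumptions, not Kineograph's scheduler: `compute` is a caller-supplied vertex function, "significant" means an absolute change above a hypothetical `epsilon`, and the graph-scale aggregation step is not modeled.

```python
from collections import deque


def incremental_compute(neighbors, values, compute, dirty, epsilon=1e-6):
    """Recompute only affected vertices; propagate significant changes.

    neighbors: dict vertex -> out-neighbors to notify when its value changes
    values:    dict vertex -> current value (updated in place)
    compute:   function(vertex, values) -> new value for that vertex
    dirty:     vertices whose inputs changed in the new snapshot
    """
    queue = deque(dirty)
    queued = set(dirty)
    while queue:
        v = queue.popleft()
        queued.discard(v)
        old = values.get(v)
        new = compute(v, values)
        values[v] = new
        # "Change significantly?" -> Y: propagate updates to neighbors.
        if old is None or abs(new - old) > epsilon:
            for w in neighbors.get(v, []):
                if w not in queued:
                    queue.append(w)
                    queued.add(w)
        # N: stop here; neighbors are not rescheduled.
    return values
```

As a usage example, hop distances from vertex `a` on the chain a→b→c come out as 0, 1, 2; after adding the edge a→c, rerunning with only `c` marked dirty lowers its distance to 1 without touching `a` or `b`. (This simple scheme suits monotone updates such as edge insertions; deletions may need extra handling.)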
Push model
sender-side aggregation
Pull model
read a subset of neighbors
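A small sketch contrasts the two models. For push, it assumes messages can be merged with a commutative, associative combiner (summation here) so that a sender emits at most one message per destination vertex; for pull, it assumes a vertex samples at most `k` in-neighbors uniformly at random. Function names and the sampling policy are hypothetical illustrations.

```python
import random
from collections import defaultdict


def push_with_sender_aggregation(messages, partition_of):
    """Push model: combine messages for the same vertex before sending.

    messages:     list of (dst_vertex, value) produced on one sender
    partition_of: function(vertex) -> destination partition
    Returns {partition: {dst_vertex: summed value}}, i.e. one combined
    message per destination vertex per sender.
    """
    outbox = defaultdict(lambda: defaultdict(float))
    for dst, value in messages:
        outbox[partition_of(dst)][dst] += value  # sender-side aggregation
    return {p: dict(m) for p, m in outbox.items()}


def pull_subset(vertex, in_neighbors, values, k, rng=random):
    """Pull model: read the values of at most k in-neighbors."""
    nbrs = in_neighbors[vertex]
    sample = nbrs if len(nbrs) <= k else rng.sample(nbrs, k)
    return [values[u] for u in sample]
```

Sender-side aggregation cuts network traffic when many updates target the same high-degree vertex, which is common in power-law graphs; pulling a subset bounds the read cost per vertex at the price of an approximate result.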
Execution model
BSP + dynamic scheduling
3 apps
TunkRank
SP
K-exposure
TunkRank
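For reference, TunkRank (proposed by Daniel Tunkelang) scores a user's influence as Influence(X) = Σ over followers Y of X of (1 + p · Influence(Y)) / |Following(Y)|, where p is the probability that a follower retweets. The sketch below is a plain batch fixed-point iteration of that formula, not Kineograph's incremental version; the parameter values `p=0.05` and `iterations=50` are arbitrary assumptions.

```python
def tunkrank(followers, following_count, p=0.05, iterations=50):
    """Batch fixed-point iteration of TunkRank.

    followers:       dict user -> list of users who follow that user
    following_count: dict user -> number of accounts that user follows
    p:               assumed retweet probability (0 <= p < 1)
    """
    influence = {u: 0.0 for u in followers}
    for _ in range(iterations):
        # Synchronous update: every score is computed from the previous round.
        influence = {
            x: sum((1 + p * influence.get(y, 0.0)) / following_count[y]
                   for y in followers[x])
            for x in followers
        }
    return influence
```

With b following a, and c following both a and b, the scores converge to b = 0.5 and a = 1 + 0.05 · 0.5 + 0.5 = 1.525, while c (no followers) stays at 0.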
SP
K-exposure
Fault tolerance among servers
Paxos-based solution
Ingest node failure
incarnation number
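One way an incarnation number can be used is sketched below, under an explicit assumption: a restarted ingest node takes a higher incarnation number than the one that failed, and graph nodes discard any message still in flight from a lower incarnation. The `UpdateFilter` class is hypothetical and only shows that comparison, not Kineograph's recovery procedure.

```python
class UpdateFilter:
    """Graph-node-side filter for messages from ingest nodes.

    Hypothetical rule: remember the highest incarnation seen per ingest
    node and reject anything from an older (failed) incarnation.
    """

    def __init__(self):
        self.latest = {}  # ingest_id -> highest incarnation seen so far

    def accept(self, ingest_id, incarnation):
        current = self.latest.get(ingest_id, -1)
        if incarnation < current:
            return False  # stale message from a failed incarnation
        self.latest[ingest_id] = incarnation
        return True
```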
Fault tolerance @ storage layer
quorum-based replication
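Quorum-based replication generally relies on read and write quorums overlapping: with N replicas, a write quorum of W and a read quorum of R satisfying R + W > N guarantee that every read sees at least one replica holding the latest write. The sketch assumes N = 3, W = 2, R = 2 and versioned values, and it models replica failures simply as a list of live replica indices; these choices are illustrative, not Kineograph's configuration.

```python
class QuorumStore:
    """N replicas; writes need W replicas, reads consult R (R + W > N)."""

    def __init__(self, n=3, w=2, r=2):
        if r + w <= n:
            raise ValueError("read and write quorums must intersect")
        self.replicas = [dict() for _ in range(n)]
        self.w, self.r = w, r

    def write(self, key, value, version, live):
        """Write (version, value) to W live replicas, or fail."""
        if len(live) < self.w:
            return False  # not enough replicas to form a write quorum
        for i in live[: self.w]:
            self.replicas[i][key] = (version, value)
        return True

    def read(self, key, live):
        """Read from R live replicas and return the newest version's value."""
        if len(live) < self.r:
            raise RuntimeError("not enough replicas for a read quorum")
        answers = [self.replicas[i].get(key, (0, None)) for i in live[: self.r]]
        return max(answers, key=lambda a: a[0])[1]
```

Because any two quorums share a replica, a read after replica 0 fails still returns the latest committed value, while a write with only one live replica is refused.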
Fault tolerance @ computation layer
roll back & re-execute
primary/backup replication
Incremental expansion
Decaying
C#
17,000 LOC
Twitter feeds
8M vertices, 29M edges
100M tweets at 100K tweets/sec
power-law
Graph-update throughput
Incremental vs. Non-incremental
Scalability
Incoming data rate
Failure recovery