Trinity: A Distributed Graph Engine on a Memory Cloud

Speaker: LIN Qian

http://www.comp.nus.edu.sg/~linqian/

Graph applications

Online query processing: low latency

Offline graph analytics: high throughput

Online queries

Random data access

e.g., BFS, sub-graph matching, …

Offline computations

Performed iteratively

Insight: Keeping the graph in memory

at least the topology

Trinity

Online query + Offline analytics

Random data access problem in large graph computation

Globally addressable distributed memory

Random access abstraction

Belief

High-speed networks are more available

DRAM is cheaper

In-memory solutions become practical

“Trinity itself is not a system that comes with comprehensive built-in graph computation modules.”

Trinity cluster

Stack of Trinity system modules

User-defined: graph schema, communication protocols, computation paradigms

Memory cloud

Partition memory space into trunks

Hashing

Memory trunks

2^p > m

1. Trunk-level parallelism

2. Efficient hashing

Hashing

Key-value store

p-bit value i ∈ [0, 2^p – 1]

Inner trunk hash table
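The two-level lookup on the slides above (hash a key to a p-bit trunk index, then look the key up in that trunk's inner hash table) might be sketched as follows. This is only an illustration under stated assumptions: the choice of hash function, the round-robin trunk-to-machine table, and all names (`P`, `MACHINES`, `ADDRESSING_TABLE`, `trunk_index`, `put`, `get`) are hypothetical and not taken from Trinity.

```python
import hashlib

P = 4                          # assumed: 2**P = 16 memory trunks
MACHINES = ["m0", "m1", "m2"]  # assumed: m = 3 machines, so 2**P > m

# Addressing table: trunk index -> machine (round-robin is an assumption).
ADDRESSING_TABLE = [MACHINES[i % len(MACHINES)] for i in range(2 ** P)]

# One inner hash table per trunk; a plain dict stands in for it here.
TRUNKS = [dict() for _ in range(2 ** P)]


def trunk_index(key: int) -> int:
    """Hash a non-negative 64-bit key to a p-bit value i in [0, 2**P - 1]."""
    digest = hashlib.blake2b(key.to_bytes(8, "little"), digest_size=8).digest()
    return int.from_bytes(digest, "little") & ((1 << P) - 1)


def put(key: int, value: bytes) -> str:
    """Store the value in its trunk and return the machine hosting that trunk."""
    i = trunk_index(key)
    TRUNKS[i][key] = value
    return ADDRESSING_TABLE[i]


def get(key: int):
    """Return (machine, value) for a key that was stored earlier."""
    i = trunk_index(key)
    return ADDRESSING_TABLE[i], TRUNKS[i][key]
```

Because a key maps to a trunk rather than directly to a machine, moving a trunk only requires updating one entry of the addressing table in this sketch.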

Data partitioning and addressing

Benefits: scalability, fault tolerance

Modeling graph

Cell: value + schema

Represent a node in a cell
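To make "value + schema" concrete, the sketch below stores a node as a flat byte blob and uses a fixed layout to interpret it. The layout (a 64-bit node id, a 32-bit neighbor count, then 64-bit neighbor ids) and the function names are assumptions chosen for illustration; they are not Trinity's actual cell format.

```python
import struct

HEADER = "<qI"                           # assumed schema: int64 id, uint32 neighbor count
HEADER_SIZE = struct.calcsize(HEADER)    # 12 bytes, no padding with "<"


def encode_node(node_id: int, neighbors: list) -> bytes:
    """Pack a node and its adjacency list into one contiguous blob (the cell value)."""
    return struct.pack(f"{HEADER}{len(neighbors)}q", node_id, len(neighbors), *neighbors)


def decode_node(blob: bytes):
    """Apply the schema to the blob to recover (node_id, neighbors)."""
    node_id, count = struct.unpack_from(HEADER, blob, 0)
    neighbors = list(struct.unpack_from(f"<{count}q", blob, HEADER_SIZE))
    return node_id, neighbors
```

The blob itself carries no type information; only the schema tells the reader where the id ends and the neighbor list begins.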

TSL

Object-oriented cell manipulation

Data integration

Network communication

Online queries

Traversal based

New paradigm
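A traversal-based online query can be pictured as a bounded breadth-first exploration in which every hop is a random access into the store. The sketch below is a single-process illustration only; `load_neighbors` is a hypothetical callback standing in for a lookup in the memory cloud, and `k_hop_neighbors` is not a Trinity API.

```python
from collections import deque


def k_hop_neighbors(load_neighbors, start, max_hops):
    """Return every node reachable from `start` within `max_hops` hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in load_neighbors(node):  # one random access per expanded node
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    seen.discard(start)
    return seen
```

For example, with `graph = {1: [2, 3], 2: [4], 3: [], 4: [5], 5: []}`, calling `k_hop_neighbors(graph.__getitem__, 1, 2)` explores two hops from node 1.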

Vertex centric offline analytics

Restrictive vertex centric model
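As an illustration of a vertex-centric computation in which each vertex only sends messages to its own neighbors, here is a minimal single-process sketch of superstep-style minimum-label propagation (it labels connected components). The algorithm choice and the name `min_label_propagation` are assumptions for illustration, and the input is assumed to list every vertex as a key.

```python
def min_label_propagation(adjacency):
    """adjacency: vertex -> list of neighbors (every vertex appears as a key)."""
    labels = {v: v for v in adjacency}
    active = set(adjacency)
    while active:
        # Superstep, phase 1: each active vertex sends its label to its neighbors only.
        inbox = {}
        for v in active:
            for u in adjacency[v]:
                inbox.setdefault(u, []).append(labels[v])
        # Barrier, then phase 2: each receiver keeps the smallest label it has seen.
        active = set()
        for u, messages in inbox.items():
            best = min(messages)
            if best < labels[u]:
                labels[u] = best
                active.add(u)  # only changed vertices speak in the next superstep
    return labels
```

Since the set of receivers for each vertex is fixed by the topology, the communication pattern is known before the first superstep starts, which is what makes message scheduling possible.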

Message passing optimization

Create a bipartite partition of the local graph

Buffer hub vertices
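One way to picture "buffer hub vertices" is to count, for each remote sender, how many local vertices need its messages, and to keep the most widely needed senders buffered locally. The selection rule below (top senders by number of local receivers, up to a fixed budget) is an assumption made for illustration, as are the names `pick_hubs`, `local_in_edges`, and `budget`.

```python
from collections import Counter


def pick_hubs(local_in_edges, budget):
    """local_in_edges: local vertex -> iterable of remote sender ids.

    Returns up to `budget` remote senders needed by the most local vertices.
    """
    fan_out = Counter(
        sender
        for senders in local_in_edges.values()
        for sender in set(senders)  # count each (sender, receiver) pair once
    )
    return {sender for sender, _ in fan_out.most_common(budget)}
```

Messages from senders outside the returned set would then be fetched only when the local vertices that need them are being processed.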

A new paradigm for offline analytics

1. Aggregate answers from local computations

2. Employ probabilistic inference

Circular memory management

• Aim to avoid memory gaps between a large number of key-value pairs

Fault tolerance

Heartbeat-based failure detection

BSP: checkpointing

Async.: “periodical interruption”
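Heartbeat-based failure detection can be sketched as a timeout check over the last time each machine reported in. The class below is a minimal illustration, not Trinity's implementation; the name `HeartbeatMonitor`, the explicit `now` argument (used instead of a real clock so the example is deterministic), and the timeout value are all assumptions.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat per machine and reports machines that went silent."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def beat(self, machine, now):
        """Record a heartbeat from `machine` at time `now`."""
        self.last_seen[machine] = now

    def failed(self, now):
        """Return machines whose last heartbeat is older than the timeout."""
        return sorted(
            m for m, t in self.last_seen.items() if now - t > self.timeout
        )
```

In a BSP computation, a machine reported by `failed` would trigger recovery from the latest checkpoint; this sketch covers only the detection step.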

Performance

Performance (cont.)

Education
