Upload
ldbc-council
View
67
Download
0
Embed Size (px)
Citation preview
Big Graph Analytics Engine
Yinglong Xia6/23/2016
8th Linked Data Benchmark Council TUC Meeting@Oracle Conference Center
4
Recent Growth
Revenue
Net Profits
Cash flow from operating activities
http://www.huawei.com/en/about-huawei
5
Collaboration
Industrial Partners
Universities
Standards
Technical organizations
Global Research Institute & Labs
Open Source
6
Graph Analytics for Smart Big Data
Big Data Analytics & Management
Graph Machine Learning
NLP Deep Learning
10
Challenges● Very large scale graphs for analysis
• 10B~1000B in terms of the number of vertices• a few hundreds of properties, static and dynamic• distributed communication introduces additional overhead
● Irregularity in graph data access • Low data locality results in high disk/communication IO overhead• Data access patterns are diverse among graph analysis algorithms
● Near real-time requirement• Incorporate with incremental graph updates• Approximate query & analysis should be considered
● Efficiency and productivity to balance
11
Graph Platform for Smart Big Data
Infrastructure
Data Management
Graph engines
Visualization
Analytics
Single Machine Cluster GPU Server Cloud
Structure Management
PropertyManagement
Metadata Management
Permission Control
Basic Engine
Streaming Graph Graphical Model Hyper Graph
Bayes NetCommunity
Label propagationCentrality
Anomaly detection
Matching
Ego Feature
Max Flow
Dynamic Graph Vis Property Vis Large Graph Vis
Incremental Update
bi-temp query
E→Edge Prop
12
Graph Platform
Data Source Graph Topology and Property
V→AdjacencyKC/KV Store
V→Vertex PropProp Idx
Encoding
External(Solr/CLucene)
ConcurrencyControl
Main Storage Dynamic Graph
Onl
ine
quer
y/m
odifi
catio
n
Property indices
Ingestion
V→TimeStamp→Adjacency
Streaming graph storage
V→TimeStamp→Vprop
V→TimeStamp→EpropSlid
ing
Win
dow KC/KV Store
CSRSparse subgraphs
Densse SubgraphDense subgraphs
GPU
Offload
Direct Solver
Iterative Solver
Snapshots Double buffering Batch processing
StreamingGraph
TripleStore
Streaming algorithms
Graph Inference
Inference Tools (Virtuoso, Jena, etc.)
Knowledge Graph
Online update property graph
Periodically updated static graph snapshots
Probabilistic Graphical Model & InferenceOffline Batch Processing
online/offline analysis
MVCC
KV Store
Snap
shot
Man
agem
ent
13
Unified Graph Data Access Patterns1
2
3
4
5
6
1 2 3 4 5 60.3
0.2
1.4
0.5 0.6
0.8
0.4
0.3
0.8
0.2
1.9
0.6
0.9 1.20.3
1.1equivalent
src dst value 1 2 0.33 2 0.24 1 1.45 1 0.5 2 0.6 6 2 0.8
src dst value 1 3 0.42 3 0.33 4 0.85 3 0.26 4 1.9
src dst value 2 5 0.63 5 0.9 6 1.24 5 0.35 6 1.1
shard 1 (1, 2) shard 2 (3,4) shard 3 (5,6)
src dst value 1 2 0.33 2 0.24 1 1.45 1 0.5 2 0.6 6 2 0.8
src dst value 1 3 0.42 3 0.33 4 0.85 3 0.26 4 1.9
src dst value 2 5 0.63 5 0.9 6 1.24 5 0.35 6 1.1
src dst value 1 2 0.33 2 0.24 1 1.45 1 0.5 2 0.6 6 2 0.8
src dst value 1 3 0.42 3 0.33 4 0.85 3 0.26 4 1.9
src dst value 2 5 0.63 5 0.9 6 1.24 5 0.35 6 1.1
1
2
3
4
5
6
0.3
0.2
1.4
0.5 0.6
0.8
0.4
0.3
0.8
0.2
1.9
0.6
0.9 1.2
0.3
1.1
1
2
3
4
5
6
0.3
0.2
1.4
0.5 0.6
0.8
0.4
0.3
0.8
0.2
1.9
0.6
0.9 1.2
0.3
1.1
step
1st
ep 2
step
3
obse
rvat
ion
on P
SW d
ata
acce
ss
patte
rns
insp
ires
high
ly e
ffici
ent
shar
ding
repr
esen
tatio
n
Itera
tion i
14
Construct Edge-set Flows1
2
3
4
5
6
1 2 3 4 5 60.3
0.2
1.4
0.5 0.6
0.8
0.4
0.3
0.8
0.2
1.9
0.6
0.9 1.2
0.3
1.1
3
5
1
2
4
6
1 2 3 4 5 60.2
0.5 0.6
0.8
0.2
0.9 1.2
1.1
0.3 0.4
0.3 0.6
1.4 0.3
0.8 1.9
3
5
1
2
4
6
1 2 3 4 5 60.2
0.5 0.6
0.8
0.2
0.9 1.2
1.1
0.3 0.4
0.3 0.6
1.4 0.3
0.8 1.9
1 4 7 1 2 3 2 5 8 4 5 6
row permutation column permutation Physical edge-sets
1 2 34 5 67 8 9
Flow direction
15
Preliminary Experiments - Preproc.
Graph Ingestion/Preprocessing Time
Create the data in our format
16
Preliminary Experiments - Comp.
PageRank w/o Loading Time
Decent speedup achieved w/ or w/o loading time
18
Conclusion● Many big data problems involve links among a lot of entities,
naturally represented as a graph● Property graph is highly expressive● Industry is looking for graph/graphical model engines for complex
network analysis, streaming graph, probabilistic graphical models, and RDF graph computing
● Efficiency is the key in many industry graph analysis systems, especially when the data volume is big
● Eventually, the graph engine should serve for AI Business systems