19
Big Graph Analytics Engine Yinglong Xia 6/23/2016 8th Linked Data Benchmark Council TUC Meeting@Oracle Conference Center

8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine

Embed Size (px)

Citation preview

Big Graph Analytics Engine

Yinglong Xia6/23/2016

8th Linked Data Benchmark Council TUC Meeting@Oracle Conference Center

2

Introduction

3

Introduction

http://www.huawei.com/en/about-huawei/corporate-governance/corporate-governance

4

Recent Growth

Revenue

Net Profits

Cash flow from operating activities

http://www.huawei.com/en/about-huawei

5

Collaboration

Industrial Partners

Universities

Standards

Technical organizations

Global Research Institute & Labs

Open Source

6

Graph Analytics for Smart Big Data

Big Data Analytics & Management

Graph Machine Learning

NLP Deep Learning

7

Graph in ONOS

HotSDN’2014

8

Topology Impact on Information Propagation

9

Explore the Variety in Graph Analytics

Graph

10

Challenges● Very large scale graphs for analysis

• 10B~1000B in terms of the number of vertices• a few hundreds of properties, static and dynamic• distributed communication introduces additional overhead

● Irregularity in graph data access • Low data locality results in high disk/communication IO overhead• Data access patterns are diverse among graph analysis algorithms

● Near real-time requirement• Incorporate with incremental graph updates• Approximate query & analysis should be considered

● Efficiency and productivity to balance

11

Graph Platform for Smart Big Data

Infrastructure

Data Management

Graph engines

Visualization

Analytics

Single Machine Cluster GPU Server Cloud

Structure Management

PropertyManagement

Metadata Management

Permission Control

Basic Engine

Streaming Graph Graphical Model Hyper Graph

Bayes NetCommunity

Label propagationCentrality

Anomaly detection

Matching

Ego Feature

Max Flow

Dynamic Graph Vis Property Vis Large Graph Vis

Incremental Update

bi-temp query

E→Edge Prop

12

Graph Platform

Data Source Graph Topology and Property

V→AdjacencyKC/KV Store

V→Vertex PropProp Idx

Encoding

External(Solr/CLucene)

ConcurrencyControl

Main Storage Dynamic Graph

Onl

ine

quer

y/m

odifi

catio

n

Property indices

Ingestion

V→TimeStamp→Adjacency

Streaming graph storage

V→TimeStamp→Vprop

V→TimeStamp→EpropSlid

ing

Win

dow KC/KV Store

CSRSparse subgraphs

Densse SubgraphDense subgraphs

GPU

Offload

Direct Solver

Iterative Solver

Snapshots Double buffering Batch processing

StreamingGraph

TripleStore

Streaming algorithms

Graph Inference

Inference Tools (Virtuoso, Jena, etc.)

Knowledge Graph

Online update property graph

Periodically updated static graph snapshots

Probabilistic Graphical Model & InferenceOffline Batch Processing

online/offline analysis

MVCC

KV Store

Snap

shot

Man

agem

ent

13

Unified Graph Data Access Patterns1

2

3

4

5

6

1 2 3 4 5 60.3

0.2

1.4

0.5 0.6

0.8

0.4

0.3

0.8

0.2

1.9

0.6

0.9 1.20.3

1.1equivalent

src dst value 1 2 0.33 2 0.24 1 1.45 1 0.5 2 0.6 6 2 0.8

src dst value 1 3 0.42 3 0.33 4 0.85 3 0.26 4 1.9

src dst value 2 5 0.63 5 0.9 6 1.24 5 0.35 6 1.1

shard 1 (1, 2) shard 2 (3,4) shard 3 (5,6)

src dst value 1 2 0.33 2 0.24 1 1.45 1 0.5 2 0.6 6 2 0.8

src dst value 1 3 0.42 3 0.33 4 0.85 3 0.26 4 1.9

src dst value 2 5 0.63 5 0.9 6 1.24 5 0.35 6 1.1

src dst value 1 2 0.33 2 0.24 1 1.45 1 0.5 2 0.6 6 2 0.8

src dst value 1 3 0.42 3 0.33 4 0.85 3 0.26 4 1.9

src dst value 2 5 0.63 5 0.9 6 1.24 5 0.35 6 1.1

1

2

3

4

5

6

0.3

0.2

1.4

0.5 0.6

0.8

0.4

0.3

0.8

0.2

1.9

0.6

0.9 1.2

0.3

1.1

1

2

3

4

5

6

0.3

0.2

1.4

0.5 0.6

0.8

0.4

0.3

0.8

0.2

1.9

0.6

0.9 1.2

0.3

1.1

step

1st

ep 2

step

3

obse

rvat

ion

on P

SW d

ata

acce

ss

patte

rns

insp

ires

high

ly e

ffici

ent

shar

ding

repr

esen

tatio

n

Itera

tion i

14

Construct Edge-set Flows1

2

3

4

5

6

1 2 3 4 5 60.3

0.2

1.4

0.5 0.6

0.8

0.4

0.3

0.8

0.2

1.9

0.6

0.9 1.2

0.3

1.1

3

5

1

2

4

6

1 2 3 4 5 60.2

0.5 0.6

0.8

0.2

0.9 1.2

1.1

0.3 0.4

0.3 0.6

1.4 0.3

0.8 1.9

3

5

1

2

4

6

1 2 3 4 5 60.2

0.5 0.6

0.8

0.2

0.9 1.2

1.1

0.3 0.4

0.3 0.6

1.4 0.3

0.8 1.9

1 4 7 1 2 3 2 5 8 4 5 6

row permutation column permutation Physical edge-sets

1 2 34 5 67 8 9

Flow direction

15

Preliminary Experiments - Preproc.

Graph Ingestion/Preprocessing Time

Create the data in our format

16

Preliminary Experiments - Comp.

PageRank w/o Loading Time

Decent speedup achieved w/ or w/o loading time

17

Preliminary Experiments

PageRank Total Time

18

Conclusion● Many big data problems involve links among a lot of entities,

naturally represented as a graph● Property graph is highly expressive● Industry is looking for graph/graphical model engines for complex

network analysis, streaming graph, probabilistic graphical models, and RDF graph computing

● Efficiency is the key in many industry graph analysis systems, especially when the data volume is big

● Eventually, the graph engine should serve for AI Business systems