LDBC & The Social Network Benchmark Peter Boncz Database Architectures (DA) @ CWI Special chair “Large-Scale Data Engineering” @ VU event.cwi.nl/lsde2015 LDBC & The Social Network Benchmark - Scientific Meeting 2015-3-27


LDBC & The Social Network Benchmark

Peter Boncz

Database Architectures (DA) @ CWI

Special chair “Large-Scale Data Engineering” @ VU

event.cwi.nl/lsde2015

LDBC & The Social Network Benchmark - Scientific Meeting 2015-3-27


Engines for Data Analysis

Inaugural Lecture

October 2014


The Start-Up Company Experience

1996-2003, 2008-, 2013-


The relational industry has been reshaped...


Benchmarking?

A benchmark is a standard test that measures efficiency.

Goal: quantification

• make competing systems comparable

• important tool in experimental science

• accelerate progress, make technology viable

• social goal, influence a research field

Page 7: LDBC & The Social Network Benchmark Peter Boncz Database Architectures (DA) @ CWI Special chair “Large-Scale Data Engineering” @ VU event.cwi.nl/lsde2015

LDBC & The Social Network Benchmark - Scientific Meeting 2015-3-27

Graph data management

Many Big Data problems revolve around graphs:

• Social network data

• AI methods that build/discover relationships

Wave of new systems (/research):

• Graph database systems, e.g. Neo4j: graph & paths as “first class citizens”

• RDF / SPARQL systems

• Graph extensions to relational systems, e.g. recursive queries, traversals

• Graph Programming Frameworks leveraging cluster computing for graph algorithms, e.g. GraphLab (distributed AI algorithms), Giraph (“think like a vertex”)
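
To make the “think like a vertex” idea concrete, here is a minimal single-machine sketch of a vertex-centric PageRank in plain Python. It is not the Giraph or GraphLab API; the function name `vertex_centric_pagerank`, the damping factor, and the toy graph are assumptions made for illustration only.

```python
from collections import defaultdict


def vertex_centric_pagerank(out_edges, supersteps=50, damping=0.85):
    """Pregel-style PageRank: each vertex only uses its own rank and inbox.

    `out_edges` maps every vertex to a list of its out-neighbours; every
    neighbour must itself appear as a key (assumption of this sketch).
    """
    vertices = list(out_edges)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(supersteps):
        inbox = defaultdict(float)
        # "Send" phase: each vertex splits its rank over its out-neighbours.
        for v in vertices:
            targets = out_edges[v] or vertices  # dangling vertex: spread evenly
            share = rank[v] / len(targets)
            for t in targets:
                inbox[t] += share
        # "Compute" phase: each vertex combines the messages it received.
        rank = {v: (1 - damping) / n + damping * inbox[v] for v in vertices}
    return rank


# Hypothetical toy graph; the vertex names are illustrative only.
graph = {"P1": ["P2", "P3"], "P2": ["P3"], "P3": ["P1"], "P4": ["P3"]}
ranks = vertex_centric_pagerank(graph)
```

In a real framework the two phases would run in parallel across a cluster, with messages exchanged between workers at superstep boundaries; the sequential loops here only mimic that structure.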


SNB (Social Network Benchmark) schema


SNB Workloads

• Interactive: tests a system's throughput with relatively simple queries with concurrent updates. Example: for one person, recommend a friend based on shared friends and interests.

• Business Intelligence: consists of complex structured queries for analyzing online behavior. Example: who are influential people on the topic of open source development?

• Graph Analytics: tests the functionality and scalability on most of the data as a single operation. Examples: PageRank, Shortest Paths, Community Detection.
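
The Interactive example (recommend a friend based on shared friends and interests) can be sketched in a few lines. This is not one of the official SNB queries; the scoring rule (shared friends plus shared interests), the function name `recommend_friends`, and the sample data are assumptions chosen to illustrate the kind of short neighbourhood query this workload contains.

```python
def recommend_friends(person, knows, interests, top_k=3):
    """Rank friends-of-friends by shared friends plus shared interests."""
    friends = knows.get(person, set())
    scores = {}
    for friend in friends:
        for candidate in knows.get(friend, set()):
            # Skip the person themselves and people they already know.
            if candidate == person or candidate in friends:
                continue
            scores[candidate] = scores.get(candidate, 0) + 1  # one more shared friend
    mine = interests.get(person, set())
    for candidate in scores:
        scores[candidate] += len(mine & interests.get(candidate, set()))
    # Highest score first; ties broken by name for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))[:top_k]


# Hypothetical sample data.
knows = {
    "P1": {"P2", "P3"},
    "P2": {"P1", "P4", "P5"},
    "P3": {"P1", "P4"},
    "P4": {"P2", "P3"},
    "P5": {"P2"},
}
interests = {"P1": {"open source"}, "P4": {"music"}, "P5": {"open source"}}
recommendations = recommend_friends("P1", knows, interests)
```

Here P4 shares two friends with P1, while P5 shares one friend and one interest, so both score 2 and are returned in name order.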


Social Networks

Correlation between property values and network structure


SNB datagen: correlated graph structure

[Figure: example graph of persons P1 to P5 with properties such as <firstname> (“Anna”, “Laura”), <studyAt> (“University of Leipzig”, “University of Amsterdam”), <liveAt> (“Germany”, “Netherlands”), <birthYear> (“1990”) and <like> (Britney Spears), linked by <knows> edges.]


SNB datagen: correlated graph structure

[Figure: the same persons P1 to P5 and their properties (<firstname>, <studyAt>, <liveAt>, <birthYear>, <like>), shown without <knows> edges; question marks stand for the connections still to be generated.]

• Compute similarity of two nodes based on their (correlated) properties.

• Use a probability density function w.r.t. this similarity for connecting nodes.

[Figure: connection probability plotted against similarity, from highly similar to less similar.]

Danger: this is very expensive to compute on a large graph! (quadratic, random access)
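
The two steps above, and why they are expensive, can be sketched as follows. The slides do not give the actual similarity measure or probability function used by SNB datagen, so `similarity` (fraction of matching properties) and `connection_probability` (a quadratic curve) are hypothetical stand-ins, and the property values are illustrative.

```python
import itertools
import random


def similarity(a, b):
    """Fraction of property keys on which two persons agree (0.0 to 1.0)."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    return sum(1 for k in keys if k in a and k in b and a[k] == b[k]) / len(keys)


def connection_probability(sim, base=0.9):
    """Hypothetical curve: higher similarity gives higher probability."""
    return base * sim ** 2


def connect_all_pairs(persons, rng):
    """Naive generator: tests every pair, so the cost is quadratic."""
    edges = []
    for (id_a, a), (id_b, b) in itertools.combinations(persons.items(), 2):
        if rng.random() < connection_probability(similarity(a, b)):
            edges.append((id_a, id_b))
    return edges


# Illustrative property values only.
persons = {
    "P1": {"studyAt": "University of Leipzig", "birthYear": 1990},
    "P3": {"studyAt": "University of Leipzig", "birthYear": 1990},
    "P2": {"studyAt": "University of Amsterdam", "liveAt": "Netherlands"},
}
edges = connect_all_pairs(persons, random.Random(42))
```

With n persons this loop evaluates n(n-1)/2 pairs, and each evaluation touches two arbitrary persons, which is the quadratic, random-access cost the slide warns about.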


SNB datagen: correlated graph structure

[Figure: the same persons P1 to P5 and their properties, now linked by the generated <knows> edges.]

Probability that two nodes are connected is skewed w.r.t. the similarity between the nodes (due to the probability distribution).

[Figure: connection probability from highly similar to less similar, with a similarity window marked.]

Trick: disregard nodes with too large a similarity distance (only connect nodes within a similarity window).
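
A minimal sketch of the window trick, under stated assumptions: persons are sorted on a similarity key so that similar persons end up close together, and each person is only tested against the next `window` persons in that order. The sort key, the halving probability curve, and the name `connect_in_window` are hypothetical; the slides do not specify the actual datagen parameters.

```python
import random


def connect_in_window(persons, sort_key, window, rng, base=0.5):
    """Sort by a similarity key, then only test pairs at most `window` apart."""
    ordered = sorted(persons, key=sort_key)
    edges = []
    for i, a in enumerate(ordered):
        for distance in range(1, window + 1):
            if i + distance >= len(ordered):
                break
            # Hypothetical skewed curve: probability halves with each step.
            if rng.random() < base ** distance:
                edges.append((a["id"], ordered[i + distance]["id"]))
    return edges


# Illustrative property values only.
persons = [
    {"id": f"P{n}", "studyAt": uni, "birthYear": year}
    for n, (uni, year) in enumerate(
        [("University of Leipzig", 1990), ("University of Amsterdam", 1991),
         ("University of Leipzig", 1990), ("University of Amsterdam", 1990),
         ("University of Leipzig", 1992)], start=1)
]
edges = connect_in_window(
    persons, sort_key=lambda p: (p["studyAt"], p["birthYear"]),
    window=2, rng=random.Random(7))
```

The work drops from all pairs to a sort plus at most `window` tests per person, and the scan over the sorted list is sequential rather than random access.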


SNB datagen: MapReduce approach


SNB datagen: temporal effects


SNB datagen: friend degree distribution

• Based on the “Anatomy of Facebook” blog post (2013)

• Diameter increases logarithmically with dataset scale factor


SNB datagen: how realistic is it?

GRADES2014 “How community-like is the structure of synthetically generated graphs” - Arnau Prat (UPC); David Domínguez-Sal (Sparsity Technologies)

[Figure panels: Livejournal, LFR3 (synthetic), SNB datagen]


ldbcouncil.org Code @ github/ldbc


Industry Membership


Summary

LDBC

• Graph and RDF benchmark council

• Choke-point driven benchmark design (user + system expert involvement)

Social Network Benchmark (SNB)

• Advanced social network generator (scale-free, power laws, clustering, correlations)

• Real data distributions from DBpedia

• SIGMOD 2015 publication (to appear)


Designing Engines for Data Analysis - Inaugural Lecture - 14/10/2014

Working with Industry increases impact

• Jim Gray: ACM Turing Award 1998

• Michael Stonebraker: IEEE Von Neumann Medal 2004, ACM Turing Award 2015