
Democratizing Big Semantic Data management

… or how to query a labelled graph with 28 billion edges on a standard laptop

Javier D. Fernández

26th September 2017

WECOS workshop, CSH Vienna

Knowledge of RDF/Linked Data

0. Zero knowledge
1. I have just heard of RDF and/or Linked Data
2. I know the basic foundations and I gave it a try
3. I often manage RDF/Linked Data

(img: Nick Youngson)

Linked Data Introduction

Preliminaries

"Linked Data is simply about using the Web to create typed links between data from different sources."

A practical scenario…

Computer scientists working in Vienna, younger than 40.


The information is already on the Web… but with no structure

https://www.wu.ac.at/en/infobiz/team/fernandez   http://myPersonalWebCV

… Javier Fernández … 33 years old …

… Javier David Fernandez … WU (Vienna University of Economics and Business) … is a postdoctoral researcher …

The Web of Data (Semantic Web): linking data to data

https://www.wu.ac.at/en/infobiz/team/fernandez   http://myPersonalWebCV

… Javier Fernández … 33 years old …

… Javier David Fernandez … WU (Vienna University of Economics and Business) … is a postdoctoral researcher …

[Graph: Javier Fernández -age-> 33; Javier David Fernández -works-> WU; Javier David Fernández -is a-> postdoctoral researcher; WU -is located in-> Vienna; Javier Fernández -same as-> Javier David Fernández]

Quick intro to RDF

Resource Description Framework (W3C Rec. 2004)

Machine-processable descriptions: Web services, protocols, persons, proteins, geography…

Data model based on triples/sentences: Subject, Predicate, Object

Example:

Javier isA Person
Javier hasName "Javier Fernandez"
Javier worksAt WU
Javier knows tim
Javier knows axel
axel hasName "Axel Polleres"
tim hasName "Tim Berners-Lee"
tim hasCreated http://linkeddata.org

Is this the same Javier as Javier Bardem (the actor)?

Is "worksAt" the same as "researchAt"?

Quick intro to RDF

Resource Description Framework (W3C Rec. 2004)

Machine-processable descriptions: Web services, protocols, persons, proteins, geography…

Data model based on triples/sentences: Subject, Predicate, Object

Example:

<http://Fernandez.net/Javier> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://Fernandez.net/Javier> <http://xmlns.com/foaf/0.1/name> "Javier Fernandez" .
<http://Fernandez.net/Javier> <http://xmlns.com/foaf/0.1/workplaceHomepage> <http://www.wu.ac.at> .
<http://Fernandez.net/Javier> <http://xmlns.com/foaf/0.1/knows> <http://tim.org/foaf.rdf#tim> .
<http://Fernandez.net/Javier> <http://xmlns.com/foaf/0.1/knows> <http://polleres.net/me> .
<http://polleres.net/me> <http://xmlns.com/foaf/0.1/name> "Axel Polleres" .
<http://tim.org/foaf.rdf#tim> <http://xmlns.com/foaf/0.1/name> "Tim Berners-Lee" .
<http://tim.org/foaf.rdf#tim> <http://xmlns.com/foaf/0.1/made> <http://linkeddata.org> .

URIs × URIs × (URIs ∪ Literals)


[Graph: <http://Fernandez.net/Javier> -foaf:knows-> <http://polleres.net/me>; with foaf:name "Javier Fernandez" and "Axel Polleres"; both rdf:type foaf:Person]

Formal query: SPARQL

Similar to SQL:

SELECT ?people ?name
WHERE {
  ?people foaf:knows <http://polleres.net/me> .
  ?people foaf:name ?name .
}

Result: ?people = <http://Fernandez.net/Javier>, ?name = "Javier Fernandez"

[Graph: <http://Fernandez.net/Javier> -foaf:knows-> <http://polleres.net/me>, -foaf:name-> "Javier Fernandez", rdf:type foaf:Person]
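The mechanics behind this query can be sketched in a few lines: each triple pattern yields variable bindings, and the two patterns are joined on their shared variable. This is a toy illustration, not the talk's software; the data and variable convention mirror the example above.

```python
# Minimal sketch of SPARQL basic-graph-pattern evaluation over an
# in-memory triple list: pattern matching plus a nested-loop join.
FOAF = "http://xmlns.com/foaf/0.1/"

triples = [
    ("http://Fernandez.net/Javier", FOAF + "knows", "http://polleres.net/me"),
    ("http://Fernandez.net/Javier", FOAF + "name", "Javier Fernandez"),
    ("http://polleres.net/me", FOAF + "name", "Axel Polleres"),
]

def match(pattern):
    """Yield one binding dict per triple matching the pattern; '?x' is a variable."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                binding[term] = value
            elif term != value:
                break
        else:
            yield binding

def join(left, right):
    """Nested-loop join of two binding lists on their shared variables."""
    for a in left:
        for b in right:
            if all(a[v] == b[v] for v in a.keys() & b.keys()):
                yield {**a, **b}

# ?people foaf:knows <http://polleres.net/me> .  ?people foaf:name ?name .
p1 = list(match(("?people", FOAF + "knows", "http://polleres.net/me")))
p2 = list(match(("?people", FOAF + "name", "?name")))
results = list(join(p1, p2))
print(results)  # one row: Javier and his name
```

Real engines replace the nested loop with index lookups and join reordering, but the binding-and-join structure is the same.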

Current RDF data

Query

The Web of Linked Data (2017)

BIG (LINKED) SEMANTIC DATA COMPRESSION

~10K datasets organized into 9 domains, covering many and varied knowledge fields

150B statements, including entity descriptions and (inter-/intra-dataset) links between them

>500 live endpoints serving this data

http://lod-cloud.net
http://stats.lod2.eu
http://sparqles.ai.wu.ac.at

Big Semantic Data

The greatness of Linked Open Data


>150B triples

1K-6K datasets

>557 SPARQL Endpoints

http://lod-cloud.net, https://datahub.io, http://stats.lod2.eu, http://sparqles.ai.wu.ac.at

But what about Web-scale queries?

E.g. retrieve all entities in LOD with the label "Axel Polleres"

Solutions?

select distinct ?x
where { ?x rdfs:label "Axel Polleres" }

Let's fish in our Linked Data Eco System


The Web of Data Eco System

http://sparqles.ai.wu.ac.at

A) Federated Queries

1. Get a list of potential SPARQL Endpoints
   (datahub.io, LOV, other catalogs)
2. Query each SPARQL Endpoint

Problems:

Many SPARQL Endpoints have low availability

SPARQL Endpoints are usually restricted (timeout/number of results)

Moreover, it can be tricky with complex queries (joins) due to intermediate results, delays, etc.
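A naive federation loop can be sketched as below. The endpoints here are stubbed local functions (no network, hypothetical names), so the availability problem is simulated: each endpoint gets a per-request timeout, and slow or dead ones are simply skipped.

```python
# Sketch of naive federation: send one query to every known endpoint,
# with a timeout per endpoint; unavailable endpoints are skipped.
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time

def endpoint_dbpedia(query):
    # Stub standing in for a responsive SPARQL endpoint.
    return [{"x": "http://dbpedia.org/resource/Axel_Polleres"}]

def endpoint_dead(query):
    # Stub standing in for a low-availability endpoint.
    time.sleep(2)
    return []

ENDPOINTS = {"dbpedia": endpoint_dbpedia, "dead": endpoint_dead}

def federate(query, timeout=0.5):
    results, skipped = [], []
    with ThreadPoolExecutor(max_workers=len(ENDPOINTS)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in ENDPOINTS.items()}
        for name, fut in futures.items():
            try:
                results.extend(fut.result(timeout=timeout))
            except TimeoutError:
                skipped.append(name)  # low availability: move on
    return results, skipped

rows, skipped = federate('select distinct ?x where { ?x rdfs:label "Axel Polleres" }')
print(rows, skipped)  # the dead endpoint is skipped, the live one answers
```

Note this only covers availability; the harder federation problems the slide mentions (joins across endpoints, intermediate result shipping) need a query planner, not just timeouts.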


B) Follow-your-nose

1. Follow self-descriptive IRIs and links
2. Filter the results you are interested in

Problems:

You need some initial seed (DBpedia could be a good start)

It's slow (fetching many documents)

Where should I start for unbounded queries such as ?x rdfs:label "Axel Polleres"?


C) Use the RDF dumps by yourself

1. Crawl the Web of Data
   (probably starting with datahub.io, LOV, other catalogs)
2. Download the datasets
   (you'd better have some free space on your machine)
3. Index the datasets locally
   (you'd better be patient and survive parsing errors)
4. Query all datasets
   (you'd better be alive by then)

Problems:

Huge resources + messiness of the data


1) LOD Laundromat

Challenges:

You still need to query 650K datasets

Of course it does not contain all of LOD, but "a good approximation"

A Linked Data hacker toolkit

2) HDT

Highly compact serialization of RDF

Allows fast RDF retrieval in compressed space (without prior decompression)

Includes internal indexes to solve basic queries with a small (3%) memory footprint

Very fast on basic queries (triple patterns): up to 15x faster than Virtuoso, Jena, RDF-3X

Supports full SPARQL as the compressed backend store of Jena, with efficiency on the same scale as current, more optimized solutions

Challenges:

The publisher pays a small overhead to convert the RDF dataset to HDT (but then it is ready to consume efficiently)

A Linked Data hacker toolkit

~431 Mtriples

N-Triples: 63 GB
N-Triples + gzip: 5 GB
HDT: 6.6 GB (slightly more, but you can query it)

https://github.com/rdfhdt (C++ and Java tools)
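The core HDT idea can be illustrated in miniature: a dictionary maps every distinct IRI/literal to a small integer ID, and the triples are stored as sorted ID tuples, so triple patterns are resolved by binary search without ever decompressing. This is a toy sketch of that layout, not the real HDT format (which additionally bit-packs and front-codes the dictionary).

```python
# Toy sketch of HDT's dictionary + triples split: long strings become
# integer IDs, and an (S,?,?) pattern is answered with two binary searches.
from bisect import bisect_left, bisect_right

triples = [
    ("http://Fernandez.net/Javier", "http://xmlns.com/foaf/0.1/knows", "http://polleres.net/me"),
    ("http://Fernandez.net/Javier", "http://xmlns.com/foaf/0.1/name", "Javier Fernandez"),
    ("http://polleres.net/me", "http://xmlns.com/foaf/0.1/name", "Axel Polleres"),
]

# Dictionary component: every distinct term gets one integer ID.
terms = sorted({t for triple in triples for t in triple})
term_id = {t: i for i, t in enumerate(terms)}

# Triples component: ID tuples sorted by (subject, predicate, object),
# so all triples of a given subject form one contiguous run.
encoded = sorted((term_id[s], term_id[p], term_id[o]) for s, p, o in triples)
subjects = [s for s, _, _ in encoded]

def search_by_subject(subject_iri):
    """Resolve the triple pattern (S, ?, ?) on the sorted ID array."""
    sid = term_id[subject_iri]
    lo, hi = bisect_left(subjects, sid), bisect_right(subjects, sid)
    return [(terms[s], terms[p], terms[o]) for s, p, o in encoded[lo:hi]]

print(search_by_subject("http://Fernandez.net/Javier"))  # two triples
```

Because each IRI string is stored once and triples shrink to three small integers, the structure is both the compressed archive and the index, which is what lets HDT query without prior decompression.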

Only in the last two weeks… [screenshots: HDT-cpp and HDT-java repository activity]

3) Linked Data Fragments

Challenges

Still room for optimization for complex federated queries (delays intermediate results hellip)


LDF interfaces

LOD-a-lot

25

But what about Web-scale queries

- flashback -

26

[Diagram: Linked Open Data -> LOD Laundromat -> Dataset 1 … Dataset 650K, each as N-Triples (zip), plus a SPARQL endpoint for the metadata -> combined into LOD-a-lot]

LOD-a-lot: 28B triples

Disk size:
  HDT: 304 GB
  HDT-FoQ (additional indexes): 133 GB

Memory footprint (to query):
  15.7 GB of RAM (3% of the size)
  144 seconds loading time
  (8 cores at 2.6 GHz, 32 GB RAM, SATA HDD, Ubuntu 14.04.5 LTS)

LDF page resolution in milliseconds

27

LOD-a-lot (some numbers)

305 €

(LOD-a-lot creation took 64 h & 170 GB RAM; HDT-FoQ took 8 h & 250 GB RAM)


LOD-a-lot

https://datahub.io/dataset/lod-a-lot

http://purl.org/HDT/lod-a-lot

Query resolution at Web scale
  (using LDF, Jena)

Evaluation and benchmarking
  (no excuse!)

RDF metrics and analytics

LOD-a-lot (some use cases)

(subjects, predicates, objects)

Identity closure: ?x owl:sameAs ?y

Graph navigations: e.g. shortest path, random walk
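The identity-closure use case amounts to computing connected components over the owl:sameAs links. A minimal sketch with union-find, on hypothetical example IRIs (not LOD-a-lot data):

```python
# Sketch: owl:sameAs identity closure via union-find, so every IRI
# naming the same real-world entity ends up in one identity class.
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

triples = [
    ("ex:JavierWU", SAME_AS, "ex:JavierDBpedia"),
    ("ex:JavierDBpedia", SAME_AS, "ex:JavierORCID"),
    ("ex:Axel", SAME_AS, "ex:AxelDBpedia"),
]

parent = {}

def find(x):
    """Representative of x's class, with path halving."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(x, y):
    parent[find(x)] = find(y)

for s, p, o in triples:
    if p == SAME_AS:          # sameAs is symmetric/transitive: merge classes
        union(s, o)

# Group IRIs by representative: each group is one identity class.
classes = {}
for iri in parent:
    classes.setdefault(find(iri), set()).add(iri)
print(sorted(map(sorted, classes.values())))
```

At LOD-a-lot scale the same algorithm works streaming over the single HDT file, which is exactly why a one-file view of the LOD Cloud makes this analysis cheap.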


LOD-a-lot (some use cases)

Wouter Beek, Javier D. Fernández and Ruben Verborgh. LOD-a-lot: A Single-File Enabler for Data Science. In Proc. of SEMANTiCS 2017.

More use cases

http://hdt.lod.labs.vu.nl/triple?object=%22Axel%20Polleres%22

Retrieve all entities in LOD with the label "Axel Polleres"

Update LOD-a-lot regularly
  More and newer datasets from the LOD Cloud

Keep named graphs with the provenance of each triple
  Currently supported only via LOD Laundromat

… implement the use cases and help the community democratize access to LOD

low-cost access to LOD = high-impact research

Roadmap


ACKs


RDF Archiving: Archiving policies

a) Independent Copies/Snapshots (IC):
V1: exC1 ex:hasProfessor exP1 . exS1 ex:study exC1 . exS2 ex:study exC1 .
V2: exC1 ex:hasProfessor exP1 . exS1 ex:study exC1 . exS3 ex:study exC1 .
V3: exC1 ex:hasProfessor exP2 . exC1 ex:hasProfessor exS2 . exS1 ex:study exC1 . exS3 ex:study exC1 .

b) Change-based approach (CB):
V1 (full): exC1 ex:hasProfessor exP1 . exS1 ex:study exC1 . exS2 ex:study exC1 .
V1->V2: added exS3 ex:study exC1 ; deleted exS2 ex:study exC1
V2->V3: added exC1 ex:hasProfessor exP2 and exC1 ex:hasProfessor exS2 ; deleted exC1 ex:hasProfessor exP1

c) Timestamp-based approach (TB):
exC1 ex:hasProfessor exP1 [V1, V2]
exC1 ex:hasProfessor exP2 [V3]
exC1 ex:hasProfessor exS2 [V3]
exS1 ex:study exC1 [V1, V2, V3]
exS2 ex:study exC1 [V1]
exS3 ex:study exC1 [V2, V3]

[Diagram: each policy is accessed through a retrieval mediator]
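The timestamp-based (TB) policy above is easy to sketch: annotate each triple with the set of versions in which it holds, then both version materialization and delta queries become set filters. A minimal illustration on the slide's example data (the `ex:` terms are the slide's toy vocabulary):

```python
# Sketch of the timestamp-based (TB) archiving policy: one copy of each
# triple, annotated with the versions in which it holds.
annotated = {
    ("exC1", "ex:hasProfessor", "exP1"): {1, 2},
    ("exC1", "ex:hasProfessor", "exP2"): {3},
    ("exC1", "ex:hasProfessor", "exS2"): {3},
    ("exS1", "ex:study", "exC1"): {1, 2, 3},
    ("exS2", "ex:study", "exC1"): {1},
    ("exS3", "ex:study", "exC1"): {2, 3},
}

def materialize(version):
    """Snapshot of one version (a 'version materialization' query)."""
    return {t for t, versions in annotated.items() if version in versions}

def diff(v_from, v_to):
    """Added and deleted triples between two versions (a 'delta' query)."""
    a, b = materialize(v_from), materialize(v_to)
    return b - a, a - b

added, deleted = diff(1, 2)
print(added)    # exS3 started studying exC1 in V2
print(deleted)  # exS2 stopped studying exC1 after V1
```

Compared with IC (fast snapshots, heavy storage) and CB (small deltas, slow materialization), TB trades one annotated copy per triple for cheap cross-version queries.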

Democratizing Open Data preservationmonitoring

Enhance the usability of Open Data and enhance its accessibility for non-expert users

Deep search and re-usable visualization components

Integrate Open Data support into online discussion and Web Intelligence platforms

OPEN DATA PORTAL WATCH

Vadim Savenkov

Periodically monitoring a list of Open Data Portals

90 CKAN-powered Open Data Portals

Quality assessment

Evolution tracking

Meta data

Data

The CommuniData Project

http://data.wu.ac.at/portalwatch

Jürgen Umbrich, Sebastian Neumaier, Axel Polleres. Images: Ad Meskens, Doug Coulter

We are currently facing Big Linked Data challenges:

generation, publication and consumption

archiving, evolution…

Thanks to compression, the Big Linked Data of today will be the "pocket" data of tomorrow

Compression democratizes the access to Big Linked Data = cheap, scalable consumers

low-cost access to LOD = high-impact research


Take-home messages

Thank you

TOP-K shortest path

Vadim Savenkov

Page 2: Democratizing Big Semantic Data management · Democratizing Big Semantic Data management … or how to query a labelled graph with 28 billion edges in a standard laptop Javier D

0 Zero knowledge

1 I have just heard of RDF andor Linked Data

2 I know the basic foundations and I gave it a try

3 I often manage RDFLinked Data

PAGE 2

Knowledge of RDFLinked Data

img Nick Youngson

Linked Data Introduction

Preliminaries

Linked Data is simply about using the Web to create typed links between data from different sourcesldquo

A practical scenariohellip

computer scientists working in Vienna younger than 40

4

5

The information is already in the Webhellip but with no structure

httpswwwwuacateninfobizteamfernandezhttpmyPersonalWebCV

hellip Javier Fernaacutendez helliphelliphelliphelliphellip33 years oldhelliphelliphelliphelliphelliphelliphelliphellip helliphellip

hellip Javier David FernandezhelliphelliphelliphellipWU (Vienna University of Economics and Business)helliphelliphellip hellipis a postdoctoral researcherhelliphelliphelliphellip

The Web of Data (Semantic Web)Linking data to data

httpswwwwuacateninfobizteamfernandezhttpmyPersonalWebCV

hellip Javier Fernaacutendez helliphelliphelliphelliphellip33 years oldhelliphelliphelliphelliphelliphelliphelliphellip helliphellip

hellip Javier David FernandezhelliphelliphelliphellipWU (Vienna University of Economics and Business)helliphelliphellip hellipis a postdoctoral researcherhelliphelliphelliphellip

Javier Fernaacutendez

33

age

Javier David Fernaacutendez

WU

works

postdoctoral researcher

Vienna

is a

is located in

same as

Quick intro to

Resource Description Framework (W3C Rec 2004)

Machine processable descriptions

Webs services protocols Persons Proteins geographyhellip

Data model Based on Triplessentences Subject Predicate Object

8

Example

Javier isA Person Javier hasName ldquoJavier Fernandezrdquo Javier worksAt WU Javier knows tim Javier knows axelaxel hasName ldquoAxel Polleresldquotim hasName ldquoTim Berners-Leeldquo tim hasCreated httplinkeddataorg

Is this the same Javier as Javier Bardem (actor)

Is ldquoWorksAtrdquo thesame as

ldquoresearchAtrdquo

I

Quick intro to

Resource Description Framework (W3C Rec 2004)

Machine processable descriptions

Webs services protocols Persons Proteins geographyhellip

Data model Based on Triplessentences Subject Predicate Object

9

Example

lthttpFernandeznetJaviergt lthttpwwww3org19990222-rdf-syntax-nstypegt lthttpxmlnscomfoaf01Persongt lthttpFernandeznetJaviergt lthttpxmlnscomfoaf01namegt ldquoJavier Fernandezrdquo lthttpFernandeznetJaviergt lthttpxmlnscomfoaf01workplaceHomepagegt lthttpwwwwuacatgt lthttpFernandeznetJaviergt lthttpxmlnscomfoaf01knowsgt lthttptimorgfoafrdftimgtlthttpFernandeznetJaviergt lthttpxmlnscomfoaf01knowsgt lthttppolleresnetmegt lthttppolleresnetmegt lthttpxmlnscomfoaf01namegt ldquoAxel Polleresrdquolthttptimorgfoafrdftimgt lthttpxmlnscomfoaf01namegt ldquoTim Berners-Leeldquo lthttptimorgfoafrdftimgt lthttpxmlnscomfoaf01madegt lthttplinkeddataorggt

URIs x URIs x (URIs U Literals)

I

lthttpFernandeznetJaviergtlthttppolleresnetmegt

ldquoJavier Fernandezrdquo ldquoAxel Polleresrdquo

foafPerson

rdftype

Formal Query SPARQL

Similar to SQL

SELECT people name

WHERE

people foafknows lthttppolleresnetmegt

people foafname name

people

name

people name

lthttpFernandeznetJaviergt ldquoJavier Fernandezrdquo

lthttpFernandeznetJaviergtlthttppolleresnetmegt

ldquoJavier Fernandezrdquo

foafPerson

rdftype

10

lthttppolleresnetmegt

foafknows

Current RDF data

Query

The Web of Linked Data (2017)

BIG (LINKED) SEMANTIC DATA COMPRESSIONPAGE 11

~10K datasets organized into 9 domains which include many and varied knowledge fields

150B statements including entity descriptions and (interintra-dataset) links between them

gt500 live endpoints serving this data

httplod-cloudnet

httpstatslod2eu

httpsparqlesaiwuacat

Big Semantic Data

The greatness of Linked Open Data

13

gt 150B triples

1K-6K datasets

gt557 SPARQL Endpoints

httplod-cloudnethttpsdatahubiohttpstatslod2euhttpsparqlesaiwuacat

But what about Web-scale queries

Eg retrieve all entities in LOD with the label ldquoAxel Polleresldquo

Solutions

14

select distinct x

x rdfslabel Axel Polleres

15

Letrsquos fish in our Linked Data Eco System

A) Federated Queries

1 Get a list of potential SPARQL Endpoints

datahubio LOV other catalogs

2 Query each SPARQL Endpoint

Problems

Many SPARQL Endpoints have low availability

16

The Web of Data Eco System

httpsparqlesaiwuacat

A) Federated Queries

1 Get a list of potential SPARQL Endpoints

datahubio LOV other catalogs

2 Query each SPARQL Endpoint

Problems

Many SPARQL Endpoints have low availability

SPARQL Endpoints are usually restricted (timeoutresults)

Moreover it can be tricky with complex queries (joins) due to intermediary results delays etc

17

The Web of Data Eco System

B) Follow-your-nose

1 Follow self-descriptive IRIs and links

2 Filter the results you are interested in

Problems

You need some initial seed

DBpedia could be a good start

Itrsquos slow (fetching many documents)

Where should I start for unbounded queries

x rdfslabel ldquoAxel Polleres

18

The Web of Data Eco System

C) Use the RDF dumps by yourself

1 Crawl de Web of Data

Probably start with datahubio LOV other catalogs

2 Download datasets

You better have some free space in your machine

3 Index the datasets locally

You better are patience and survive parsing errors

4 Query all datasets

You better are alive by then

Problems

Hugh resources

+ Messiness of the data

19

The Web of Data Eco System

1) LOD Laundromat

Challenges

Still you need to query 650K datasets

Of course it does not contain all LOD but ldquoa good approximationrdquo

20

A Linked Data hacker toolkit

2) HDT

Highly compact serialization of RDF

Allows fast RDF retrieval in compressed space (without prior decompression)

Includes internal indexes to solve basic queries with small (3) memory footprint

Very fast on basic queries (triple patterns) x 15 faster than Virtuoso Jena RDF3X

Supports FULL SPARQL as the compressed backend store of Jena with an efficiency on the same scale as current more optimized solutions

Challenges

Publisher has to pay a bit of overhead to convert the RDF dataset to HDT (but then it is ready to consume efficiently)

21

A Linked Data hacker toolkit

431 Mtriples~

63 GB

NT + gzip5 GB

HDT 66 GB

Slightly more but you can query

httpsgithubcomrdfhdt C++ and Java tools

Only in the last two weekshellip

HDT-cpp

HDT-java

3) Linked Data Fragments

Challenges

Still room for optimization for complex federated queries (delays intermediate results hellip)

23

A Linked Data hacker toolkit

PAGE 24

LDF interfaces

LOD-a-lot

25

But what about Web-scale queries

- flashback -

26

LOD

Laundromat

Dataset 1

N-Triples (zip)

Dataset 650K

N-Triples (zip)

Linked Open Data

SPARQL endpoint

(metadata)

LOD-a-lot

LOD-a-lot28B triples

Disk size

HDT 304 GB

HDT-FoQ (additional indexes) 133 GB

Memory footprint (to query)

157 GB of RAM (3 of the size)

144 seconds loading time

8 cores (26 GHz) RAM 32 GB SATA HDD on Ubuntu 14045 LTS

LDF page resolution in milliseconds

27

LOD-a-lot (some numbers)

305euro

(LOD-a-lot creation took 64 h amp 170GB RAM HDT-FoQ took 8 h amp 250GB RAM)

28

LOD-a-lot

httpsdatahubiodatasetlod-a-lot

httppurlorgHDTlod-a-lot

Query resolution at Web scale

Using LDF Jena

Evaluation and Benchmarking

No excuse

RDF metrics and analytics

29

LOD-a-lot (some use cases)

subjects predicates objects

Identity closure

x owlsameAs y

Graph navigations

Eg shortest path random walk

30

LOD-a-lot (some use cases)

Wouter Beek Javier D Fernaacutendez and Ruben Verborgh LOD-a-lot A Single-File Enabler for Data Science In Proc of SEMANTiCS 2017

More use cases

httphdtlodlabsvunltripleobject=22Axel20Polleres22

Retrieve all entities in LOD with the label ldquoAxel Polleresldquo

Update LOD-a-lot regularly

More and newer datasets from the LOD Cloud

Keep named graphs with the provenance of each triple

Currently supported only via LOD Laundromat

hellip implement the use cases and help the community to democratize the access to LOD

low-cost access to LOD = high-impact research

Roadmap

21

32

ACKs

36

RDF Archiving Archiving policies

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS3 exstudy exC1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2 exS1 exstudy exC1 exS3 exstudy exC1

V2 V3

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

exS3 exstudy exC1

exS2 exstudy exC1

exC1 exhasProfessor exP1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2

V12

3exC1 exhasProfessor exP1 [V1V2]exC1 exhasProfessor exP2 [V3]exC1 exhasProfessor exS2 [V3]exS1 exstudy exC1 [V1V2V3]exS2 exstudy exC1 [V1]exS3 exstudy exC1 [V2V3]

a) Independent CopiesSnapshots (IC)

b) Change-based approach (CB)

c) Timestamp-based approach (TB)

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

Democratizing Open Data preservationmonitoring

Enhance usability of Open Data and to enhance its accessibility for non-expert users

Deep search and re-usable visualization components

Integrate Open Data support into the online discussion and Web Intelligence platforms

OPEN DATA PORTAL WATCH

Vadim Savenkov

Periodically monitoring a list of Open Data Portals

90 CKAN powered Open Data Portals

Quality assessment

Evolution tracking

Meta data

Data

The CommuniData Project

httpdatawuacatportalwatch

Juumlrgen UmbrichSebastian NeumaierAxel Polleres ImagesAd Meskens Doug Coulter

We are currently facing Big Linked Data challenges

Generation publication and consumption

Archiving evolutionhellip

Thanks to compression the Big Linked Data today will be the ldquopocketrdquo data tomorrow

Compression democratizes the access to Big Linked Data

= Cheap scalable consumers

low-cost access to LOD = high-impact research

PAGE 40

Take-home messages

Thank you

TOP-K shortest path

Vadim Savenkov

Page 3: Democratizing Big Semantic Data management · Democratizing Big Semantic Data management … or how to query a labelled graph with 28 billion edges in a standard laptop Javier D

Linked Data Introduction

Preliminaries

Linked Data is simply about using the Web to create typed links between data from different sourcesldquo

A practical scenariohellip

computer scientists working in Vienna younger than 40

4

5

The information is already in the Webhellip but with no structure

httpswwwwuacateninfobizteamfernandezhttpmyPersonalWebCV

hellip Javier Fernaacutendez helliphelliphelliphelliphellip33 years oldhelliphelliphelliphelliphelliphelliphelliphellip helliphellip

hellip Javier David FernandezhelliphelliphelliphellipWU (Vienna University of Economics and Business)helliphelliphellip hellipis a postdoctoral researcherhelliphelliphelliphellip

The Web of Data (Semantic Web)Linking data to data

httpswwwwuacateninfobizteamfernandezhttpmyPersonalWebCV

hellip Javier Fernaacutendez helliphelliphelliphelliphellip33 years oldhelliphelliphelliphelliphelliphelliphelliphellip helliphellip

hellip Javier David FernandezhelliphelliphelliphellipWU (Vienna University of Economics and Business)helliphelliphellip hellipis a postdoctoral researcherhelliphelliphelliphellip

Javier Fernaacutendez

33

age

Javier David Fernaacutendez

WU

works

postdoctoral researcher

Vienna

is a

is located in

same as

Quick intro to

Resource Description Framework (W3C Rec 2004)

Machine processable descriptions

Webs services protocols Persons Proteins geographyhellip

Data model Based on Triplessentences Subject Predicate Object

8

Example

Javier isA Person Javier hasName ldquoJavier Fernandezrdquo Javier worksAt WU Javier knows tim Javier knows axelaxel hasName ldquoAxel Polleresldquotim hasName ldquoTim Berners-Leeldquo tim hasCreated httplinkeddataorg

Is this the same Javier as Javier Bardem (actor)

Is ldquoWorksAtrdquo thesame as

ldquoresearchAtrdquo

I

Quick intro to

Resource Description Framework (W3C Rec 2004)

Machine processable descriptions

Webs services protocols Persons Proteins geographyhellip

Data model Based on Triplessentences Subject Predicate Object

9

Example

lthttpFernandeznetJaviergt lthttpwwww3org19990222-rdf-syntax-nstypegt lthttpxmlnscomfoaf01Persongt lthttpFernandeznetJaviergt lthttpxmlnscomfoaf01namegt ldquoJavier Fernandezrdquo lthttpFernandeznetJaviergt lthttpxmlnscomfoaf01workplaceHomepagegt lthttpwwwwuacatgt lthttpFernandeznetJaviergt lthttpxmlnscomfoaf01knowsgt lthttptimorgfoafrdftimgtlthttpFernandeznetJaviergt lthttpxmlnscomfoaf01knowsgt lthttppolleresnetmegt lthttppolleresnetmegt lthttpxmlnscomfoaf01namegt ldquoAxel Polleresrdquolthttptimorgfoafrdftimgt lthttpxmlnscomfoaf01namegt ldquoTim Berners-Leeldquo lthttptimorgfoafrdftimgt lthttpxmlnscomfoaf01madegt lthttplinkeddataorggt

URIs x URIs x (URIs U Literals)

I

lthttpFernandeznetJaviergtlthttppolleresnetmegt

ldquoJavier Fernandezrdquo ldquoAxel Polleresrdquo

foafPerson

rdftype

Formal Query SPARQL

Similar to SQL

SELECT people name

WHERE

people foafknows lthttppolleresnetmegt

people foafname name

people

name

people name

lthttpFernandeznetJaviergt ldquoJavier Fernandezrdquo

lthttpFernandeznetJaviergtlthttppolleresnetmegt

ldquoJavier Fernandezrdquo

foafPerson

rdftype

10

lthttppolleresnetmegt

foafknows

Current RDF data

Query

The Web of Linked Data (2017)

BIG (LINKED) SEMANTIC DATA COMPRESSIONPAGE 11

~10K datasets organized into 9 domains which include many and varied knowledge fields

150B statements including entity descriptions and (interintra-dataset) links between them

gt500 live endpoints serving this data

httplod-cloudnet

httpstatslod2eu

httpsparqlesaiwuacat

Big Semantic Data

The greatness of Linked Open Data

13

gt 150B triples

1K-6K datasets

gt557 SPARQL Endpoints

httplod-cloudnethttpsdatahubiohttpstatslod2euhttpsparqlesaiwuacat

But what about Web-scale queries

Eg retrieve all entities in LOD with the label ldquoAxel Polleresldquo

Solutions

14

select distinct x

x rdfslabel Axel Polleres

15

Letrsquos fish in our Linked Data Eco System

A) Federated Queries

1 Get a list of potential SPARQL Endpoints

datahubio LOV other catalogs

2 Query each SPARQL Endpoint

Problems

Many SPARQL Endpoints have low availability

16

The Web of Data Eco System

httpsparqlesaiwuacat

A) Federated Queries

1 Get a list of potential SPARQL Endpoints

datahubio LOV other catalogs

2 Query each SPARQL Endpoint

Problems

Many SPARQL Endpoints have low availability

SPARQL Endpoints are usually restricted (timeoutresults)

Moreover it can be tricky with complex queries (joins) due to intermediary results delays etc

17

The Web of Data Eco System

B) Follow-your-nose

1 Follow self-descriptive IRIs and links

2 Filter the results you are interested in

Problems

You need some initial seed

DBpedia could be a good start

Itrsquos slow (fetching many documents)

Where should I start for unbounded queries

x rdfslabel ldquoAxel Polleres

18

The Web of Data Eco System

C) Use the RDF dumps by yourself

1 Crawl de Web of Data

Probably start with datahubio LOV other catalogs

2 Download datasets

You better have some free space in your machine

3 Index the datasets locally

You better are patience and survive parsing errors

4 Query all datasets

You better are alive by then

Problems

Hugh resources

+ Messiness of the data

19

The Web of Data Eco System

1) LOD Laundromat

Challenges

Still you need to query 650K datasets

Of course it does not contain all LOD but ldquoa good approximationrdquo

20

A Linked Data hacker toolkit

2) HDT

Highly compact serialization of RDF

Allows fast RDF retrieval in compressed space (without prior decompression)

Includes internal indexes to solve basic queries with small (3) memory footprint

Very fast on basic queries (triple patterns) x 15 faster than Virtuoso Jena RDF3X

Supports FULL SPARQL as the compressed backend store of Jena with an efficiency on the same scale as current more optimized solutions

Challenges

Publisher has to pay a bit of overhead to convert the RDF dataset to HDT (but then it is ready to consume efficiently)

21

A Linked Data hacker toolkit

431 Mtriples~

63 GB

NT + gzip5 GB

HDT 66 GB

Slightly more but you can query

httpsgithubcomrdfhdt C++ and Java tools

Only in the last two weekshellip

HDT-cpp

HDT-java

3) Linked Data Fragments

Challenges

Still room for optimization for complex federated queries (delays intermediate results hellip)

23

A Linked Data hacker toolkit

PAGE 24

LDF interfaces

LOD-a-lot

25

But what about Web-scale queries

- flashback -

26

LOD

Laundromat

Dataset 1

N-Triples (zip)

Dataset 650K

N-Triples (zip)

Linked Open Data

SPARQL endpoint

(metadata)

LOD-a-lot

LOD-a-lot28B triples

Disk size

HDT 304 GB

HDT-FoQ (additional indexes) 133 GB

Memory footprint (to query)

157 GB of RAM (3 of the size)

144 seconds loading time

8 cores (26 GHz) RAM 32 GB SATA HDD on Ubuntu 14045 LTS

LDF page resolution in milliseconds

27

LOD-a-lot (some numbers)

305euro

(LOD-a-lot creation took 64 h amp 170GB RAM HDT-FoQ took 8 h amp 250GB RAM)

28

LOD-a-lot

httpsdatahubiodatasetlod-a-lot

httppurlorgHDTlod-a-lot

Query resolution at Web scale

Using LDF Jena

Evaluation and Benchmarking

No excuse

RDF metrics and analytics

29

LOD-a-lot (some use cases)

subjects predicates objects

Identity closure

x owlsameAs y

Graph navigations

Eg shortest path random walk

30

LOD-a-lot (some use cases)

Wouter Beek Javier D Fernaacutendez and Ruben Verborgh LOD-a-lot A Single-File Enabler for Data Science In Proc of SEMANTiCS 2017

More use cases

httphdtlodlabsvunltripleobject=22Axel20Polleres22

Retrieve all entities in LOD with the label ldquoAxel Polleresldquo

Update LOD-a-lot regularly

More and newer datasets from the LOD Cloud

Keep named graphs with the provenance of each triple

Currently supported only via LOD Laundromat

hellip implement the use cases and help the community to democratize the access to LOD

low-cost access to LOD = high-impact research

Roadmap

21

32

ACKs

36

RDF Archiving Archiving policies

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS3 exstudy exC1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2 exS1 exstudy exC1 exS3 exstudy exC1

V2 V3

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

exS3 exstudy exC1

exS2 exstudy exC1

exC1 exhasProfessor exP1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2

V12

3exC1 exhasProfessor exP1 [V1V2]exC1 exhasProfessor exP2 [V3]exC1 exhasProfessor exS2 [V3]exS1 exstudy exC1 [V1V2V3]exS2 exstudy exC1 [V1]exS3 exstudy exC1 [V2V3]

a) Independent CopiesSnapshots (IC)

b) Change-based approach (CB)

c) Timestamp-based approach (TB)

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

Democratizing Open Data preservationmonitoring

Enhance usability of Open Data and to enhance its accessibility for non-expert users

Deep search and re-usable visualization components

Integrate Open Data support into the online discussion and Web Intelligence platforms

OPEN DATA PORTAL WATCH

Vadim Savenkov

Periodically monitoring a list of Open Data Portals

90 CKAN powered Open Data Portals

Quality assessment

Evolution tracking

Meta data

Data

The CommuniData Project

httpdatawuacatportalwatch

Juumlrgen UmbrichSebastian NeumaierAxel Polleres ImagesAd Meskens Doug Coulter

We are currently facing Big Linked Data challenges

Generation publication and consumption

Archiving evolutionhellip

Thanks to compression the Big Linked Data today will be the ldquopocketrdquo data tomorrow

Compression democratizes the access to Big Linked Data

= Cheap scalable consumers

low-cost access to LOD = high-impact research

PAGE 40

Take-home messages

Thank you

TOP-K shortest path

Vadim Savenkov

Page 4: Democratizing Big Semantic Data management · Democratizing Big Semantic Data management … or how to query a labelled graph with 28 billion edges in a standard laptop Javier D

A practical scenariohellip

computer scientists working in Vienna younger than 40

4

5

The information is already in the Webhellip but with no structure

httpswwwwuacateninfobizteamfernandezhttpmyPersonalWebCV

hellip Javier Fernaacutendez helliphelliphelliphelliphellip33 years oldhelliphelliphelliphelliphelliphelliphelliphellip helliphellip

hellip Javier David FernandezhelliphelliphelliphellipWU (Vienna University of Economics and Business)helliphelliphellip hellipis a postdoctoral researcherhelliphelliphelliphellip

The Web of Data (Semantic Web)Linking data to data

httpswwwwuacateninfobizteamfernandezhttpmyPersonalWebCV

hellip Javier Fernaacutendez helliphelliphelliphelliphellip33 years oldhelliphelliphelliphelliphelliphelliphelliphellip helliphellip

hellip Javier David FernandezhelliphelliphelliphellipWU (Vienna University of Economics and Business)helliphelliphellip hellipis a postdoctoral researcherhelliphelliphelliphellip

[Diagram: the same facts as a linked-data graph: "Javier Fernández" -age-> 33; "Javier David Fernández" -works-> WU, -is a-> postdoctoral researcher; WU -is located in-> Vienna; the two "Javier" nodes connected by a "same as" link.]

Quick intro to RDF

Resource Description Framework (W3C Rec. 2004)

Machine-processable descriptions: Web services, protocols, persons, proteins, geography…

Data model based on triples/sentences: Subject, Predicate, Object

Example

Javier isA Person .
Javier hasName "Javier Fernandez" .
Javier worksAt WU .
Javier knows tim .
Javier knows axel .
axel hasName "Axel Polleres" .
tim hasName "Tim Berners-Lee" .
tim hasCreated <http://linkeddata.org> .

Is this the same Javier as Javier Bardem (the actor)?

Is "worksAt" the same as "researchAt"?

Quick intro to RDF

Resource Description Framework (W3C Rec. 2004)

Machine-processable descriptions: Web services, protocols, persons, proteins, geography…

Data model based on triples/sentences: Subject, Predicate, Object

Example

<http://Fernandez.net/Javier> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://Fernandez.net/Javier> <http://xmlns.com/foaf/0.1/name> "Javier Fernandez" .
<http://Fernandez.net/Javier> <http://xmlns.com/foaf/0.1/workplaceHomepage> <http://www.wu.ac.at> .
<http://Fernandez.net/Javier> <http://xmlns.com/foaf/0.1/knows> <http://tim.org/foaf.rdf#tim> .
<http://Fernandez.net/Javier> <http://xmlns.com/foaf/0.1/knows> <http://polleres.net/me> .
<http://polleres.net/me> <http://xmlns.com/foaf/0.1/name> "Axel Polleres" .
<http://tim.org/foaf.rdf#tim> <http://xmlns.com/foaf/0.1/name> "Tim Berners-Lee" .
<http://tim.org/foaf.rdf#tim> <http://xmlns.com/foaf/0.1/made> <http://linkeddata.org> .

URIs × URIs × (URIs ∪ Literals)

[Diagram: <http://Fernandez.net/Javier> and <http://polleres.net/me> as nodes of rdf:type foaf:Person, with foaf:name "Javier Fernandez" and "Axel Polleres".]
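Since each N-Triples statement is just three terms, parsing it into (subject, predicate, object) tuples takes only a few lines. A minimal Python sketch, illustrative only and far from the full N-Triples grammar (it handles only IRIs in angle brackets and plain double-quoted literals):

```python
import re

# Matches one term of our tiny N-Triples subset: <iri> or "literal".
TERM = re.compile(r'<([^>]*)>|"([^"]*)"')

def parse_ntriples(text):
    """Yield (subject, predicate, object) tuples from N-Triples-like lines."""
    for line in text.strip().splitlines():
        terms = [iri or lit for iri, lit in TERM.findall(line)]
        if len(terms) == 3:  # skip blank or malformed lines
            yield tuple(terms)

doc = '''
<http://Fernandez.net/Javier> <http://xmlns.com/foaf/0.1/name> "Javier Fernandez" .
<http://Fernandez.net/Javier> <http://xmlns.com/foaf/0.1/knows> <http://polleres.net/me> .
'''
triples = list(parse_ntriples(doc))
print(triples[0][2])  # prints: Javier Fernandez
```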

Formal Query: SPARQL

Similar to SQL:

select ?people ?name
where {
  ?people foaf:knows <http://polleres.net/me> .
  ?people foaf:name ?name .
}

Result:

?people: <http://Fernandez.net/Javier>, ?name: "Javier Fernandez"

[Diagram: the query matched against the current RDF data, where <http://Fernandez.net/Javier> foaf:knows <http://polleres.net/me> and has foaf:name "Javier Fernandez".]
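Conceptually, evaluating this query is a join of two triple patterns over the data. A naive in-memory sketch in Python (illustrative only, not how real SPARQL engines are implemented):

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# The example data as a set of (subject, predicate, object) tuples.
triples = {
    ("http://Fernandez.net/Javier", FOAF + "knows", "http://polleres.net/me"),
    ("http://Fernandez.net/Javier", FOAF + "name", "Javier Fernandez"),
    ("http://polleres.net/me", FOAF + "name", "Axel Polleres"),
}

def match(pattern, triples):
    """Match one triple pattern: '?var' terms bind, other terms must be equal."""
    for t in triples:
        binding = {}
        for p, v in zip(pattern, t):
            if p.startswith("?"):
                binding[p] = v
            elif p != v:
                break
        else:
            yield binding

def query(patterns, triples):
    """Join triple patterns left to right, substituting bound variables."""
    results = [{}]
    for pat in patterns:
        results = [
            {**r, **b}
            for r in results
            for b in match(tuple(r.get(term, term) for term in pat), triples)
        ]
    return results

rows = query([
    ("?people", FOAF + "knows", "http://polleres.net/me"),
    ("?people", FOAF + "name", "?name"),
], triples)
print(rows)  # [{'?people': 'http://Fernandez.net/Javier', '?name': 'Javier Fernandez'}]
```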

The Web of Linked Data (2017)

Big (Linked) Semantic Data Compression

~10K datasets organized into 9 domains covering many and varied knowledge fields

150B statements, including entity descriptions and (inter-/intra-dataset) links between them

>500 live endpoints serving this data

http://lod-cloud.net

http://stats.lod2.eu

http://sparqles.ai.wu.ac.at

Big Semantic Data

The greatness of Linked Open Data

> 150B triples

1K–6K datasets

> 557 SPARQL Endpoints

http://lod-cloud.net, https://datahub.io, http://stats.lod2.eu, http://sparqles.ai.wu.ac.at

But what about Web-scale queries?

E.g. retrieve all entities in LOD with the label "Axel Polleres"

Solutions

select distinct ?x
where { ?x rdfs:label "Axel Polleres" }

Let's fish in our Linked Data ecosystem

A) Federated Queries

1. Get a list of potential SPARQL endpoints

datahub.io, LOV, other catalogs

2. Query each SPARQL endpoint

Problems:

Many SPARQL endpoints have low availability

The Web of Data ecosystem

http://sparqles.ai.wu.ac.at

A) Federated Queries

1. Get a list of potential SPARQL endpoints

datahub.io, LOV, other catalogs

2. Query each SPARQL endpoint

Problems:

Many SPARQL endpoints have low availability

SPARQL endpoints are usually restricted (timeouts, result limits)

Moreover, complex queries (joins) can be tricky due to intermediate results, delays, etc.

The Web of Data ecosystem
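The federation idea, and why low availability hurts it, can be sketched with stub endpoints (plain Python functions standing in for remote SPARQL services; all names and data here are hypothetical):

```python
def endpoint_a(pattern):
    # Stub for a live endpoint holding one matching triple.
    return [("http://example.org/axel", "rdfs:label", "Axel Polleres")]

def endpoint_b(pattern):
    # Stub for an endpoint with low availability.
    raise TimeoutError("endpoint unreachable")

def federated_query(pattern, endpoints):
    """Query every endpoint, union the results, tolerate failures."""
    results, failures = [], 0
    for ep in endpoints:
        try:
            results.extend(ep(pattern))
        except Exception:
            failures += 1  # low availability: skip this endpoint and move on
    return results, failures

rows, down = federated_query(("?x", "rdfs:label", "Axel Polleres"),
                             [endpoint_a, endpoint_b])
print(rows, down)  # one result row, one endpoint down
```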

B) Follow-your-nose

1. Follow self-descriptive IRIs and links

2. Filter the results you are interested in

Problems:

You need some initial seed (DBpedia could be a good start)

It's slow (fetching many documents)

Where should I start for unbounded queries?

?x rdfs:label "Axel Polleres"

The Web of Data ecosystem
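Follow-your-nose is essentially a breadth-first traversal of dereferenceable documents. A toy sketch over an in-memory "web" (all IRIs and data hypothetical; a real crawler would HTTP-dereference each IRI):

```python
from collections import deque

# Toy web: each IRI "dereferences" to a document of triples.
WEB = {
    "http://dbpedia.org/resource/Vienna": [
        ("http://dbpedia.org/resource/Vienna", "related", "http://example.org/axel"),
    ],
    "http://example.org/axel": [
        ("http://example.org/axel", "rdfs:label", "Axel Polleres"),
    ],
}

def follow_your_nose(seed, want_label):
    """BFS over linked documents, collecting subjects carrying a given label."""
    seen, queue, hits = set(), deque([seed]), []
    while queue:
        iri = queue.popleft()
        if iri in seen:
            continue
        seen.add(iri)
        for s, p, o in WEB.get(iri, []):       # "dereference" the IRI
            if p == "rdfs:label" and o == want_label:
                hits.append(s)
            elif o.startswith("http") and o not in seen:
                queue.append(o)                # follow the link
    return hits

print(follow_your_nose("http://dbpedia.org/resource/Vienna", "Axel Polleres"))
# ['http://example.org/axel']
```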

C) Use the RDF dumps yourself

1. Crawl the Web of Data

Probably start with datahub.io, LOV, other catalogs

2. Download the datasets

You'd better have some free space on your machine

3. Index the datasets locally

You'd better be patient and survive the parsing errors

4. Query all datasets

You'd better still be alive by then

Problems:

Huge resource requirements + messiness of the data

The Web of Data ecosystem

1) LOD Laundromat

Challenges:

You still need to query 650K datasets

Of course it does not contain all of LOD, but "a good approximation"

A Linked Data hacker toolkit

2) HDT

Highly compact serialization of RDF

Allows fast RDF retrieval in compressed space (without prior decompression)

Includes internal indexes to solve basic queries with a small (~3%) memory footprint

Very fast on basic queries (triple patterns): 1.5x faster than Virtuoso, Jena, RDF-3X

Supports full SPARQL as the compressed backend store of Jena, with efficiency on the same scale as current, more optimized solutions

Challenges:

The publisher pays a bit of overhead to convert the RDF dataset to HDT (but then it is ready to be consumed efficiently)

A Linked Data hacker toolkit
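The core idea behind HDT-style compression is dictionary encoding: store every term once, represent triples as integer ID tuples, and answer triple patterns directly over the IDs. A minimal sketch of that idea (not the actual HDT format, which additionally bit-packs and indexes the ID streams):

```python
class DictEncodedGraph:
    """Toy dictionary-encoded triple store: terms stored once, triples as int tuples."""

    def __init__(self):
        self.term2id, self.id2term = {}, []
        self.triples = set()

    def _id(self, term):
        # Assign each distinct term a small integer ID, once.
        if term not in self.term2id:
            self.term2id[term] = len(self.id2term)
            self.id2term.append(term)
        return self.term2id[term]

    def add(self, s, p, o):
        self.triples.add((self._id(s), self._id(p), self._id(o)))

    def pattern(self, s=None, p=None, o=None):
        """Resolve a triple pattern over the IDs, decoding only the matches."""
        ids = tuple(None if t is None else self.term2id.get(t, -1) for t in (s, p, o))
        for t in self.triples:
            if all(q is None or q == v for q, v in zip(ids, t)):
                yield tuple(self.id2term[v] for v in t)

g = DictEncodedGraph()
g.add("ex:Javier", "foaf:knows", "ex:Axel")
g.add("ex:Javier", "foaf:name", "Javier Fernandez")
print(list(g.pattern(s="ex:Javier", p="foaf:name")))
# [('ex:Javier', 'foaf:name', 'Javier Fernandez')]
```

Repeated terms (long IRIs shared by many triples) are what makes this pay off: each repetition costs one integer instead of one string.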

~431M triples:

N-Triples: 63 GB
N-Triples + gzip: 5 GB
HDT: 6.6 GB (slightly more than gzip, but you can query it)

https://github.com/rdfhdt — C++ and Java tools

Only in the last two weeks… (commit activity in HDT-cpp and HDT-java)

3) Linked Data Fragments

Challenges:

Still room for optimization for complex federated queries (delays, intermediate results, …)

A Linked Data hacker toolkit

PAGE 24

LDF interfaces
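A Linked Data Fragments server exposes only paginated triple-pattern lookups, leaving joins to the client. A stub of that interface (hypothetical page size and data; real Triple Pattern Fragments servers return pages of around 100 triples with count metadata):

```python
PAGE_SIZE = 2  # tiny for illustration; real TPF servers use e.g. 100

# Toy data: five entities sharing the same label.
DATA = [("ex:s%d" % i, "rdfs:label", "Axel Polleres") for i in range(5)]

def fragment(pattern, page):
    """Server side: one page of matches plus the total count, like a TPF response."""
    matches = [t for t in DATA
               if all(q is None or q == v for q, v in zip(pattern, t))]
    start = page * PAGE_SIZE
    return matches[start:start + PAGE_SIZE], len(matches)

def all_matches(pattern):
    """Client side: keep requesting pages until the fragment is exhausted."""
    page, out = 0, []
    while True:
        chunk, total = fragment(pattern, page)
        out.extend(chunk)
        page += 1
        if len(out) >= total or not chunk:
            return out

print(len(all_matches((None, "rdfs:label", "Axel Polleres"))))  # 5
```

The design choice: the server stays cheap and cacheable because it only ever answers single triple patterns; query planning and joining move to the client.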

LOD-a-lot

But what about Web-scale queries?

- flashback -

[Diagram: the LOD Laundromat crawls Linked Open Data, cleans each of the 650K datasets and republishes it as zipped N-Triples plus a SPARQL endpoint over the metadata; LOD-a-lot merges all of them into a single file.]

LOD-a-lot: 28B triples

Disk size:

HDT: 304 GB
HDT-FoQ (additional indexes): 133 GB

Memory footprint (to query):

15.7 GB of RAM (~3% of the size)
14.4 seconds loading time
(8 cores @ 2.6 GHz, 32 GB RAM, SATA HDD, Ubuntu 14.04.5 LTS)

LDF page resolution in milliseconds

LOD-a-lot (some numbers)

305 €

(LOD-a-lot creation took 64 h & 170 GB RAM; HDT-FoQ took 8 h & 250 GB RAM)

LOD-a-lot

https://datahub.io/dataset/lod-a-lot

http://purl.org/HDT/lod-a-lot

Query resolution at Web scale

Using LDF Jena

Evaluation and Benchmarking

No excuse

RDF metrics and analytics


LOD-a-lot (some use cases)

Statistics over subjects, predicates and objects

Identity closure:

?x owl:sameAs ?y

Graph navigations:

e.g. shortest path, random walk

LOD-a-lot (some use cases)

Wouter Beek, Javier D. Fernández and Ruben Verborgh. LOD-a-lot: A Single-File Enabler for Data Science. In Proc. of SEMANTiCS 2017.
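Computing the identity closure over owl:sameAs links is a connected-components problem, for which union-find is the standard tool. A sketch (the IRIs are hypothetical examples):

```python
parent = {}

def find(x):
    """Return the representative of x's identity cluster (with path halving)."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(x, y):
    """Merge the clusters of x and y (an owl:sameAs edge)."""
    parent[find(x)] = find(y)

same_as = [
    ("dbpedia:Axel_Polleres", "wikidata:Q123"),  # hypothetical identifiers
    ("wikidata:Q123", "wu:axel"),
]
for x, y in same_as:
    union(x, y)

# All three IRIs now land in one identity cluster.
print(find("dbpedia:Axel_Polleres") == find("wu:axel"))  # True
```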

More use cases:

http://hdt.lod.labs.vu.nl/triple?object=%22Axel%20Polleres%22

Retrieve all entities in LOD with the label "Axel Polleres"

Update LOD-a-lot regularly

More and newer datasets from the LOD Cloud

Keep named graphs with the provenance of each triple

Currently supported only via LOD Laundromat

… implement the use cases and help the community democratize access to LOD

low-cost access to LOD = high-impact research

Roadmap

ACKs

RDF Archiving: archiving policies

Example versions:

V1: {exC1 ex:hasProfessor exP1 . exS1 ex:study exC1 . exS2 ex:study exC1}
V2: {exC1 ex:hasProfessor exP1 . exS1 ex:study exC1 . exS3 ex:study exC1}
V3: {exC1 ex:hasProfessor exP2 . exC1 ex:hasProfessor exS2 . exS1 ex:study exC1 . exS3 ex:study exC1}

a) Independent Copies/Snapshots (IC): store each version in full, behind a retrieval mediator.

b) Change-based approach (CB): store V1 in full plus the deltas between consecutive versions (e.g. V1→V2 adds exS3 ex:study exC1 and deletes exS2 ex:study exC1).

c) Timestamp-based approach (TB): annotate each triple with the versions in which it holds:

exC1 ex:hasProfessor exP1 [V1,V2]
exC1 ex:hasProfessor exP2 [V3]
exC1 ex:hasProfessor exS2 [V3]
exS1 ex:study exC1 [V1,V2,V3]
exS2 ex:study exC1 [V1]
exS3 ex:study exC1 [V2,V3]
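The three policies trade storage space against retrieval cost. A sketch that materializes versions under IC, CB and TB, using the example triples above (set operations stand in for a real delta store):

```python
V1 = {("exC1", "ex:hasProfessor", "exP1"),
      ("exS1", "ex:study", "exC1"),
      ("exS2", "ex:study", "exC1")}
V2 = {("exC1", "ex:hasProfessor", "exP1"),
      ("exS1", "ex:study", "exC1"),
      ("exS3", "ex:study", "exC1")}
V3 = {("exC1", "ex:hasProfessor", "exP2"),
      ("exC1", "ex:hasProfessor", "exS2"),
      ("exS1", "ex:study", "exC1"),
      ("exS3", "ex:study", "exC1")}

# IC: store every snapshot in full (cheap retrieval, expensive storage).
ic = [V1, V2, V3]

# CB: store the first snapshot plus (added, deleted) deltas between versions.
cb = [V1, (V2 - V1, V1 - V2), (V3 - V2, V2 - V3)]

def cb_materialize(store, i):
    """Replay deltas on top of the initial snapshot to rebuild version i (0-based)."""
    current = set(store[0])
    for added, deleted in store[1:i + 1]:
        current = (current - deleted) | added
    return current

# TB: annotate each triple with the versions in which it holds.
tb = {}
for v, snapshot in enumerate(ic, start=1):
    for triple in snapshot:
        tb.setdefault(triple, []).append(v)

def tb_materialize(store, v):
    """Select the triples annotated with version v."""
    return {t for t, versions in store.items() if v in versions}

print(cb_materialize(cb, 2) == V3 == tb_materialize(tb, 3))  # True
```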

Democratizing Open Data preservationmonitoring

Enhance the usability of Open Data and its accessibility for non-expert users

Deep search and re-usable visualization components

Integrate Open Data support into the online discussion and Web Intelligence platforms

OPEN DATA PORTAL WATCH

Vadim Savenkov

Periodically monitoring a list of Open Data Portals

90 CKAN powered Open Data Portals

Quality assessment

Evolution tracking

Meta data

Data

The CommuniData Project

httpdatawuacatportalwatch

Juumlrgen UmbrichSebastian NeumaierAxel Polleres ImagesAd Meskens Doug Coulter

We are currently facing Big Linked Data challenges

Generation publication and consumption

Archiving evolutionhellip

Thanks to compression the Big Linked Data today will be the ldquopocketrdquo data tomorrow

Compression democratizes the access to Big Linked Data

= Cheap scalable consumers

low-cost access to LOD = high-impact research

PAGE 40

Take-home messages

Thank you

TOP-K shortest path

Vadim Savenkov

Page 5: Democratizing Big Semantic Data management · Democratizing Big Semantic Data management … or how to query a labelled graph with 28 billion edges in a standard laptop Javier D

5

The information is already in the Webhellip but with no structure

httpswwwwuacateninfobizteamfernandezhttpmyPersonalWebCV

hellip Javier Fernaacutendez helliphelliphelliphelliphellip33 years oldhelliphelliphelliphelliphelliphelliphelliphellip helliphellip

hellip Javier David FernandezhelliphelliphelliphellipWU (Vienna University of Economics and Business)helliphelliphellip hellipis a postdoctoral researcherhelliphelliphelliphellip

The Web of Data (Semantic Web)Linking data to data

httpswwwwuacateninfobizteamfernandezhttpmyPersonalWebCV

hellip Javier Fernaacutendez helliphelliphelliphelliphellip33 years oldhelliphelliphelliphelliphelliphelliphelliphellip helliphellip

hellip Javier David FernandezhelliphelliphelliphellipWU (Vienna University of Economics and Business)helliphelliphellip hellipis a postdoctoral researcherhelliphelliphelliphellip

Javier Fernaacutendez

33

age

Javier David Fernaacutendez

WU

works

postdoctoral researcher

Vienna

is a

is located in

same as

Quick intro to

Resource Description Framework (W3C Rec 2004)

Machine processable descriptions

Webs services protocols Persons Proteins geographyhellip

Data model Based on Triplessentences Subject Predicate Object

8

Example

Javier isA Person Javier hasName ldquoJavier Fernandezrdquo Javier worksAt WU Javier knows tim Javier knows axelaxel hasName ldquoAxel Polleresldquotim hasName ldquoTim Berners-Leeldquo tim hasCreated httplinkeddataorg

Is this the same Javier as Javier Bardem (actor)

Is ldquoWorksAtrdquo thesame as

ldquoresearchAtrdquo

I

Quick intro to

Resource Description Framework (W3C Rec 2004)

Machine processable descriptions

Webs services protocols Persons Proteins geographyhellip

Data model Based on Triplessentences Subject Predicate Object

9

Example

lthttpFernandeznetJaviergt lthttpwwww3org19990222-rdf-syntax-nstypegt lthttpxmlnscomfoaf01Persongt lthttpFernandeznetJaviergt lthttpxmlnscomfoaf01namegt ldquoJavier Fernandezrdquo lthttpFernandeznetJaviergt lthttpxmlnscomfoaf01workplaceHomepagegt lthttpwwwwuacatgt lthttpFernandeznetJaviergt lthttpxmlnscomfoaf01knowsgt lthttptimorgfoafrdftimgtlthttpFernandeznetJaviergt lthttpxmlnscomfoaf01knowsgt lthttppolleresnetmegt lthttppolleresnetmegt lthttpxmlnscomfoaf01namegt ldquoAxel Polleresrdquolthttptimorgfoafrdftimgt lthttpxmlnscomfoaf01namegt ldquoTim Berners-Leeldquo lthttptimorgfoafrdftimgt lthttpxmlnscomfoaf01madegt lthttplinkeddataorggt

URIs x URIs x (URIs U Literals)

I

lthttpFernandeznetJaviergtlthttppolleresnetmegt

ldquoJavier Fernandezrdquo ldquoAxel Polleresrdquo

foafPerson

rdftype

Formal Query SPARQL

Similar to SQL

SELECT people name

WHERE

people foafknows lthttppolleresnetmegt

people foafname name

people

name

people name

lthttpFernandeznetJaviergt ldquoJavier Fernandezrdquo

lthttpFernandeznetJaviergtlthttppolleresnetmegt

ldquoJavier Fernandezrdquo

foafPerson

rdftype

10

lthttppolleresnetmegt

foafknows

Current RDF data

Query

The Web of Linked Data (2017)

BIG (LINKED) SEMANTIC DATA COMPRESSIONPAGE 11

~10K datasets organized into 9 domains which include many and varied knowledge fields

150B statements including entity descriptions and (interintra-dataset) links between them

gt500 live endpoints serving this data

httplod-cloudnet

httpstatslod2eu

httpsparqlesaiwuacat

Big Semantic Data

The greatness of Linked Open Data

13

gt 150B triples

1K-6K datasets

gt557 SPARQL Endpoints

httplod-cloudnethttpsdatahubiohttpstatslod2euhttpsparqlesaiwuacat

But what about Web-scale queries

Eg retrieve all entities in LOD with the label ldquoAxel Polleresldquo

Solutions

14

select distinct x

x rdfslabel Axel Polleres

15

Letrsquos fish in our Linked Data Eco System

A) Federated Queries

1 Get a list of potential SPARQL Endpoints

datahubio LOV other catalogs

2 Query each SPARQL Endpoint

Problems

Many SPARQL Endpoints have low availability

16

The Web of Data Eco System

httpsparqlesaiwuacat

A) Federated Queries

1 Get a list of potential SPARQL Endpoints

datahubio LOV other catalogs

2 Query each SPARQL Endpoint

Problems

Many SPARQL Endpoints have low availability

SPARQL Endpoints are usually restricted (timeoutresults)

Moreover it can be tricky with complex queries (joins) due to intermediary results delays etc

17

The Web of Data Eco System

B) Follow-your-nose

1 Follow self-descriptive IRIs and links

2 Filter the results you are interested in

Problems

You need some initial seed

DBpedia could be a good start

Itrsquos slow (fetching many documents)

Where should I start for unbounded queries

x rdfslabel ldquoAxel Polleres

18

The Web of Data Eco System

C) Use the RDF dumps by yourself

1 Crawl de Web of Data

Probably start with datahubio LOV other catalogs

2 Download datasets

You better have some free space in your machine

3 Index the datasets locally

You better are patience and survive parsing errors

4 Query all datasets

You better are alive by then

Problems

Hugh resources

+ Messiness of the data

19

The Web of Data Eco System

1) LOD Laundromat

Challenges

Still you need to query 650K datasets

Of course it does not contain all LOD but ldquoa good approximationrdquo

20

A Linked Data hacker toolkit

2) HDT

Highly compact serialization of RDF

Allows fast RDF retrieval in compressed space (without prior decompression)

Includes internal indexes to solve basic queries with small (3) memory footprint

Very fast on basic queries (triple patterns) x 15 faster than Virtuoso Jena RDF3X

Supports FULL SPARQL as the compressed backend store of Jena with an efficiency on the same scale as current more optimized solutions

Challenges

Publisher has to pay a bit of overhead to convert the RDF dataset to HDT (but then it is ready to consume efficiently)

21

A Linked Data hacker toolkit

431 Mtriples~

63 GB

NT + gzip5 GB

HDT 66 GB

Slightly more but you can query

httpsgithubcomrdfhdt C++ and Java tools

Only in the last two weekshellip

HDT-cpp

HDT-java

3) Linked Data Fragments

Challenges

Still room for optimization for complex federated queries (delays intermediate results hellip)

23

A Linked Data hacker toolkit

PAGE 24

LDF interfaces

LOD-a-lot

25

But what about Web-scale queries

- flashback -

26

LOD

Laundromat

Dataset 1

N-Triples (zip)

Dataset 650K

N-Triples (zip)

Linked Open Data

SPARQL endpoint

(metadata)

LOD-a-lot

LOD-a-lot28B triples

Disk size

HDT 304 GB

HDT-FoQ (additional indexes) 133 GB

Memory footprint (to query)

157 GB of RAM (3 of the size)

144 seconds loading time

8 cores (26 GHz) RAM 32 GB SATA HDD on Ubuntu 14045 LTS

LDF page resolution in milliseconds

27

LOD-a-lot (some numbers)

305euro

(LOD-a-lot creation took 64 h amp 170GB RAM HDT-FoQ took 8 h amp 250GB RAM)

28

LOD-a-lot

httpsdatahubiodatasetlod-a-lot

httppurlorgHDTlod-a-lot

Query resolution at Web scale

Using LDF Jena

Evaluation and Benchmarking

No excuse

RDF metrics and analytics

29

LOD-a-lot (some use cases)

subjects predicates objects

Identity closure

x owlsameAs y

Graph navigations

Eg shortest path random walk

30

LOD-a-lot (some use cases)

Wouter Beek Javier D Fernaacutendez and Ruben Verborgh LOD-a-lot A Single-File Enabler for Data Science In Proc of SEMANTiCS 2017

More use cases

httphdtlodlabsvunltripleobject=22Axel20Polleres22

Retrieve all entities in LOD with the label ldquoAxel Polleresldquo

Update LOD-a-lot regularly

More and newer datasets from the LOD Cloud

Keep named graphs with the provenance of each triple

Currently supported only via LOD Laundromat

hellip implement the use cases and help the community to democratize the access to LOD

low-cost access to LOD = high-impact research

Roadmap

21

32

ACKs

36

RDF Archiving Archiving policies

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS3 exstudy exC1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2 exS1 exstudy exC1 exS3 exstudy exC1

V2 V3

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

exS3 exstudy exC1

exS2 exstudy exC1

exC1 exhasProfessor exP1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2

V12

3exC1 exhasProfessor exP1 [V1V2]exC1 exhasProfessor exP2 [V3]exC1 exhasProfessor exS2 [V3]exS1 exstudy exC1 [V1V2V3]exS2 exstudy exC1 [V1]exS3 exstudy exC1 [V2V3]

a) Independent CopiesSnapshots (IC)

b) Change-based approach (CB)

c) Timestamp-based approach (TB)

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

Democratizing Open Data preservationmonitoring

Enhance usability of Open Data and to enhance its accessibility for non-expert users

Deep search and re-usable visualization components

Integrate Open Data support into the online discussion and Web Intelligence platforms

OPEN DATA PORTAL WATCH

Vadim Savenkov

Periodically monitoring a list of Open Data Portals

90 CKAN powered Open Data Portals

Quality assessment

Evolution tracking

Meta data

Data

The CommuniData Project

httpdatawuacatportalwatch

Juumlrgen UmbrichSebastian NeumaierAxel Polleres ImagesAd Meskens Doug Coulter

We are currently facing Big Linked Data challenges

Generation publication and consumption

Archiving evolutionhellip

Thanks to compression the Big Linked Data today will be the ldquopocketrdquo data tomorrow

Compression democratizes the access to Big Linked Data

= Cheap scalable consumers

low-cost access to LOD = high-impact research

PAGE 40

Take-home messages

Thank you

TOP-K shortest path

Vadim Savenkov

Page 6: Democratizing Big Semantic Data management · Democratizing Big Semantic Data management … or how to query a labelled graph with 28 billion edges in a standard laptop Javier D

The information is already in the Webhellip but with no structure

httpswwwwuacateninfobizteamfernandezhttpmyPersonalWebCV

hellip Javier Fernaacutendez helliphelliphelliphelliphellip33 years oldhelliphelliphelliphelliphelliphelliphelliphellip helliphellip

hellip Javier David FernandezhelliphelliphelliphellipWU (Vienna University of Economics and Business)helliphelliphellip hellipis a postdoctoral researcherhelliphelliphelliphellip

The Web of Data (Semantic Web)Linking data to data

httpswwwwuacateninfobizteamfernandezhttpmyPersonalWebCV

hellip Javier Fernaacutendez helliphelliphelliphelliphellip33 years oldhelliphelliphelliphelliphelliphelliphelliphellip helliphellip

hellip Javier David FernandezhelliphelliphelliphellipWU (Vienna University of Economics and Business)helliphelliphellip hellipis a postdoctoral researcherhelliphelliphelliphellip

Javier Fernaacutendez

33

age

Javier David Fernaacutendez

WU

works

postdoctoral researcher

Vienna

is a

is located in

same as

Quick intro to

Resource Description Framework (W3C Rec 2004)

Machine processable descriptions

Webs services protocols Persons Proteins geographyhellip

Data model Based on Triplessentences Subject Predicate Object

8

Example

Javier isA Person Javier hasName ldquoJavier Fernandezrdquo Javier worksAt WU Javier knows tim Javier knows axelaxel hasName ldquoAxel Polleresldquotim hasName ldquoTim Berners-Leeldquo tim hasCreated httplinkeddataorg

Is this the same Javier as Javier Bardem (actor)

Is ldquoWorksAtrdquo thesame as

ldquoresearchAtrdquo

I

Quick intro to

Resource Description Framework (W3C Rec 2004)

Machine processable descriptions

Webs services protocols Persons Proteins geographyhellip

Data model Based on Triplessentences Subject Predicate Object

9

Example

lthttpFernandeznetJaviergt lthttpwwww3org19990222-rdf-syntax-nstypegt lthttpxmlnscomfoaf01Persongt lthttpFernandeznetJaviergt lthttpxmlnscomfoaf01namegt ldquoJavier Fernandezrdquo lthttpFernandeznetJaviergt lthttpxmlnscomfoaf01workplaceHomepagegt lthttpwwwwuacatgt lthttpFernandeznetJaviergt lthttpxmlnscomfoaf01knowsgt lthttptimorgfoafrdftimgtlthttpFernandeznetJaviergt lthttpxmlnscomfoaf01knowsgt lthttppolleresnetmegt lthttppolleresnetmegt lthttpxmlnscomfoaf01namegt ldquoAxel Polleresrdquolthttptimorgfoafrdftimgt lthttpxmlnscomfoaf01namegt ldquoTim Berners-Leeldquo lthttptimorgfoafrdftimgt lthttpxmlnscomfoaf01madegt lthttplinkeddataorggt

URIs x URIs x (URIs U Literals)

I

lthttpFernandeznetJaviergtlthttppolleresnetmegt

ldquoJavier Fernandezrdquo ldquoAxel Polleresrdquo

foafPerson

rdftype

Formal Query SPARQL

Similar to SQL

SELECT people name

WHERE

people foafknows lthttppolleresnetmegt

people foafname name

people

name

people name

lthttpFernandeznetJaviergt ldquoJavier Fernandezrdquo

lthttpFernandeznetJaviergtlthttppolleresnetmegt

ldquoJavier Fernandezrdquo

foafPerson

rdftype

10

lthttppolleresnetmegt

foafknows

Current RDF data

Query

The Web of Linked Data (2017)

BIG (LINKED) SEMANTIC DATA COMPRESSIONPAGE 11

~10K datasets organized into 9 domains which include many and varied knowledge fields

150B statements including entity descriptions and (interintra-dataset) links between them

gt500 live endpoints serving this data

httplod-cloudnet

httpstatslod2eu

httpsparqlesaiwuacat

Big Semantic Data

The greatness of Linked Open Data

13

gt 150B triples

1K-6K datasets

gt557 SPARQL Endpoints

httplod-cloudnethttpsdatahubiohttpstatslod2euhttpsparqlesaiwuacat

But what about Web-scale queries

Eg retrieve all entities in LOD with the label ldquoAxel Polleresldquo

Solutions

14

select distinct x

x rdfslabel Axel Polleres

15

Letrsquos fish in our Linked Data Eco System

A) Federated Queries

1 Get a list of potential SPARQL Endpoints

datahubio LOV other catalogs

2 Query each SPARQL Endpoint

Problems

Many SPARQL Endpoints have low availability

16

The Web of Data Eco System

httpsparqlesaiwuacat

A) Federated Queries

1 Get a list of potential SPARQL Endpoints

datahubio LOV other catalogs

2 Query each SPARQL Endpoint

Problems

Many SPARQL Endpoints have low availability

SPARQL Endpoints are usually restricted (timeoutresults)

Moreover it can be tricky with complex queries (joins) due to intermediary results delays etc

17

The Web of Data Eco System

B) Follow-your-nose

1 Follow self-descriptive IRIs and links

2 Filter the results you are interested in

Problems

You need some initial seed

DBpedia could be a good start

Itrsquos slow (fetching many documents)

Where should I start for unbounded queries

x rdfslabel ldquoAxel Polleres

18

The Web of Data Eco System

C) Use the RDF dumps by yourself

1 Crawl de Web of Data

Probably start with datahubio LOV other catalogs

2 Download datasets

You better have some free space in your machine

3 Index the datasets locally

You better are patience and survive parsing errors

4 Query all datasets

You better are alive by then

Problems

Hugh resources

+ Messiness of the data

19

The Web of Data Eco System

1) LOD Laundromat

Challenges

Still you need to query 650K datasets

Of course it does not contain all LOD but ldquoa good approximationrdquo

20

A Linked Data hacker toolkit

2) HDT

Highly compact serialization of RDF

Allows fast RDF retrieval in compressed space (without prior decompression)

Includes internal indexes to solve basic queries with small (3) memory footprint

Very fast on basic queries (triple patterns) x 15 faster than Virtuoso Jena RDF3X

Supports FULL SPARQL as the compressed backend store of Jena with an efficiency on the same scale as current more optimized solutions

Challenges

Publisher has to pay a bit of overhead to convert the RDF dataset to HDT (but then it is ready to consume efficiently)

21

A Linked Data hacker toolkit

431 Mtriples~

63 GB

NT + gzip5 GB

HDT 66 GB

Slightly more but you can query

httpsgithubcomrdfhdt C++ and Java tools

Only in the last two weekshellip

HDT-cpp

HDT-java

3) Linked Data Fragments

Challenges

Still room for optimization for complex federated queries (delays intermediate results hellip)

23

A Linked Data hacker toolkit

PAGE 24

LDF interfaces

LOD-a-lot

25

But what about Web-scale queries

- flashback -

26

LOD

Laundromat

Dataset 1

N-Triples (zip)

Dataset 650K

N-Triples (zip)

Linked Open Data

SPARQL endpoint

(metadata)

LOD-a-lot

LOD-a-lot28B triples

Disk size

HDT 304 GB

HDT-FoQ (additional indexes) 133 GB

Memory footprint (to query)

157 GB of RAM (3 of the size)

144 seconds loading time

8 cores (26 GHz) RAM 32 GB SATA HDD on Ubuntu 14045 LTS

LDF page resolution in milliseconds

27

LOD-a-lot (some numbers)

305euro

(LOD-a-lot creation took 64 h amp 170GB RAM HDT-FoQ took 8 h amp 250GB RAM)

28

LOD-a-lot

httpsdatahubiodatasetlod-a-lot

httppurlorgHDTlod-a-lot

Query resolution at Web scale

Using LDF Jena

Evaluation and Benchmarking

No excuse

RDF metrics and analytics

29

LOD-a-lot (some use cases)

subjects predicates objects

Identity closure

x owlsameAs y

Graph navigations

Eg shortest path random walk

30

LOD-a-lot (some use cases)

Wouter Beek Javier D Fernaacutendez and Ruben Verborgh LOD-a-lot A Single-File Enabler for Data Science In Proc of SEMANTiCS 2017

More use cases

httphdtlodlabsvunltripleobject=22Axel20Polleres22

Retrieve all entities in LOD with the label ldquoAxel Polleresldquo

Update LOD-a-lot regularly

More and newer datasets from the LOD Cloud

Keep named graphs with the provenance of each triple

Currently supported only via LOD Laundromat

hellip implement the use cases and help the community to democratize the access to LOD

low-cost access to LOD = high-impact research

Roadmap

21

32

ACKs

36

RDF Archiving Archiving policies

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS3 exstudy exC1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2 exS1 exstudy exC1 exS3 exstudy exC1

V2 V3

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

exS3 exstudy exC1

exS2 exstudy exC1

exC1 exhasProfessor exP1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2

V12

3exC1 exhasProfessor exP1 [V1V2]exC1 exhasProfessor exP2 [V3]exC1 exhasProfessor exS2 [V3]exS1 exstudy exC1 [V1V2V3]exS2 exstudy exC1 [V1]exS3 exstudy exC1 [V2V3]

a) Independent CopiesSnapshots (IC)

b) Change-based approach (CB)

c) Timestamp-based approach (TB)

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

Democratizing Open Data preservationmonitoring

Enhance usability of Open Data and to enhance its accessibility for non-expert users

Deep search and re-usable visualization components

Integrate Open Data support into the online discussion and Web Intelligence platforms

OPEN DATA PORTAL WATCH

Vadim Savenkov

Periodically monitoring a list of Open Data Portals

90 CKAN powered Open Data Portals

Quality assessment

Evolution tracking

Meta data

Data

The CommuniData Project

httpdatawuacatportalwatch

Juumlrgen UmbrichSebastian NeumaierAxel Polleres ImagesAd Meskens Doug Coulter

We are currently facing Big Linked Data challenges

Generation publication and consumption

Archiving evolutionhellip

Thanks to compression the Big Linked Data today will be the ldquopocketrdquo data tomorrow

Compression democratizes the access to Big Linked Data

= Cheap scalable consumers

low-cost access to LOD = high-impact research

PAGE 40

Take-home messages

Thank you

TOP-K shortest path

Vadim Savenkov

Page 7: Democratizing Big Semantic Data management · Democratizing Big Semantic Data management … or how to query a labelled graph with 28 billion edges in a standard laptop Javier D

The Web of Data (Semantic Web)Linking data to data

httpswwwwuacateninfobizteamfernandezhttpmyPersonalWebCV

hellip Javier Fernaacutendez helliphelliphelliphelliphellip33 years oldhelliphelliphelliphelliphelliphelliphelliphellip helliphellip

hellip Javier David FernandezhelliphelliphelliphellipWU (Vienna University of Economics and Business)helliphelliphellip hellipis a postdoctoral researcherhelliphelliphelliphellip

Javier Fernaacutendez

33

age

Javier David Fernaacutendez

WU

works

postdoctoral researcher

Vienna

is a

is located in

same as

Quick intro to

Resource Description Framework (W3C Rec 2004)

Machine processable descriptions

Webs services protocols Persons Proteins geographyhellip

Data model Based on Triplessentences Subject Predicate Object

8

Example

Javier isA Person Javier hasName ldquoJavier Fernandezrdquo Javier worksAt WU Javier knows tim Javier knows axelaxel hasName ldquoAxel Polleresldquotim hasName ldquoTim Berners-Leeldquo tim hasCreated httplinkeddataorg

Is this the same Javier as Javier Bardem (actor)

Is ldquoWorksAtrdquo thesame as

ldquoresearchAtrdquo

I

Quick intro to

Resource Description Framework (W3C Rec 2004)

Machine processable descriptions

Webs services protocols Persons Proteins geographyhellip

Data model Based on Triplessentences Subject Predicate Object

9

Example

lthttpFernandeznetJaviergt lthttpwwww3org19990222-rdf-syntax-nstypegt lthttpxmlnscomfoaf01Persongt lthttpFernandeznetJaviergt lthttpxmlnscomfoaf01namegt ldquoJavier Fernandezrdquo lthttpFernandeznetJaviergt lthttpxmlnscomfoaf01workplaceHomepagegt lthttpwwwwuacatgt lthttpFernandeznetJaviergt lthttpxmlnscomfoaf01knowsgt lthttptimorgfoafrdftimgtlthttpFernandeznetJaviergt lthttpxmlnscomfoaf01knowsgt lthttppolleresnetmegt lthttppolleresnetmegt lthttpxmlnscomfoaf01namegt ldquoAxel Polleresrdquolthttptimorgfoafrdftimgt lthttpxmlnscomfoaf01namegt ldquoTim Berners-Leeldquo lthttptimorgfoafrdftimgt lthttpxmlnscomfoaf01madegt lthttplinkeddataorggt

URIs x URIs x (URIs U Literals)

I

lthttpFernandeznetJaviergtlthttppolleresnetmegt

ldquoJavier Fernandezrdquo ldquoAxel Polleresrdquo

foafPerson

rdftype

Formal Query SPARQL

Similar to SQL

SELECT people name

WHERE

people foafknows lthttppolleresnetmegt

people foafname name

people

name

people name

lthttpFernandeznetJaviergt ldquoJavier Fernandezrdquo

lthttpFernandeznetJaviergtlthttppolleresnetmegt

ldquoJavier Fernandezrdquo

foafPerson

rdftype

10

lthttppolleresnetmegt

foafknows

Current RDF data

Query

The Web of Linked Data (2017)

BIG (LINKED) SEMANTIC DATA COMPRESSIONPAGE 11

~10K datasets organized into 9 domains which include many and varied knowledge fields

150B statements including entity descriptions and (interintra-dataset) links between them

gt500 live endpoints serving this data

httplod-cloudnet

httpstatslod2eu

httpsparqlesaiwuacat

Big Semantic Data

The greatness of Linked Open Data

13

gt 150B triples

1K-6K datasets

gt557 SPARQL Endpoints

httplod-cloudnethttpsdatahubiohttpstatslod2euhttpsparqlesaiwuacat

But what about Web-scale queries

Eg retrieve all entities in LOD with the label ldquoAxel Polleresldquo

Solutions

14

select distinct x

x rdfslabel Axel Polleres

15

Letrsquos fish in our Linked Data Eco System

A) Federated Queries

1 Get a list of potential SPARQL Endpoints

datahubio LOV other catalogs

2 Query each SPARQL Endpoint

Problems

Many SPARQL Endpoints have low availability

16

The Web of Data Eco System

httpsparqlesaiwuacat

A) Federated Queries

1 Get a list of potential SPARQL Endpoints

datahubio LOV other catalogs

2 Query each SPARQL Endpoint

Problems

Many SPARQL Endpoints have low availability

SPARQL Endpoints are usually restricted (timeoutresults)

Moreover it can be tricky with complex queries (joins) due to intermediary results delays etc

17

The Web of Data Eco System

B) Follow-your-nose

1. Follow self-descriptive IRIs and links

2. Filter the results you are interested in

Problems:

You need some initial seed (DBpedia could be a good start)

It's slow (fetching many documents)

Where should I start for unbounded queries such as ?x rdfs:label "Axel Polleres"?

The Web of Data Eco System
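The follow-your-nose strategy above can be sketched as a breadth-first crawl from a seed IRI. This is a hedged toy sketch: the `WEB` dictionary is a made-up stand-in for real HTTP dereferencing, and all IRIs in it are hypothetical.

```python
# "Follow your nose": dereference a seed IRI, collect the triples it serves,
# and recursively follow every IRI seen so far.
from collections import deque

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

WEB = {  # IRI -> triples served when dereferencing it (toy data)
    "http://example.org/axel": [
        ("http://example.org/axel", RDFS_LABEL, "Axel Polleres"),
        ("http://example.org/axel", "http://xmlns.com/foaf/0.1/knows",
         "http://example.org/javier"),
    ],
    "http://example.org/javier": [
        ("http://example.org/javier", RDFS_LABEL, "Javier Fernandez"),
    ],
}

def crawl(seed):
    seen, queue, triples = {seed}, deque([seed]), []
    while queue:
        iri = queue.popleft()
        for s, p, o in WEB.get(iri, []):
            triples.append((s, p, o))
            for term in (s, o):  # follow any IRI we have not visited yet
                if term.startswith("http://") and term not in seen:
                    seen.add(term)
                    queue.append(term)
    return triples

labels = [s for s, p, o in crawl("http://example.org/axel")
          if p == RDFS_LABEL and o == "Axel Polleres"]
print(labels)  # ['http://example.org/axel']
```

The sketch makes the two problems visible: everything hinges on the seed, and an unbounded query only finds what the crawl happens to reach.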

C) Use the RDF dumps by yourself

1. Crawl the Web of Data (probably starting with datahub.io, LOV, other catalogs)

2. Download the datasets (you'd better have some free space on your machine)

3. Index the datasets locally (you'd better be patient and survive the parsing errors)

4. Query all datasets (you'd better still be alive by then)

Problems:

Huge resources + messiness of the data

The Web of Data Eco System

1) LOD Laundromat

Challenges:

Still, you need to query 650K datasets

Of course, it does not contain all LOD, but "a good approximation"

A Linked Data hacker toolkit

2) HDT

Highly compact serialization of RDF

Allows fast RDF retrieval in compressed space (without prior decompression)

Includes internal indexes to solve basic queries with a small (3%) memory footprint

Very fast on basic queries (triple patterns): ×15 faster than Virtuoso, Jena, RDF3X

Supports full SPARQL as the compressed backend store of Jena, with an efficiency on the same scale as current, more optimized solutions

Challenges:

The publisher has to pay a bit of overhead to convert the RDF dataset to HDT (but then it is ready to be consumed efficiently)

A Linked Data hacker toolkit
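A rough, simplified sketch of HDT's core idea (an assumption about the design, not the actual file format): map every term to a dictionary ID and keep the triples as sorted integer tuples, so a pattern with a bound subject becomes a binary search instead of a scan over strings.

```python
# Dictionary-encode toy triples and answer an (S ? ?) pattern by binary search.
import bisect

raw = [
    ("ex:Javier", "foaf:knows", "ex:Axel"),
    ("ex:Javier", "foaf:name", '"Javier Fernandez"'),
    ("ex:Axel", "foaf:name", '"Axel Polleres"'),
]

terms = sorted({t for triple in raw for t in triple})
to_id = {t: i for i, t in enumerate(terms)}               # dictionary: term -> int
ids = sorted(tuple(to_id[t] for t in tr) for tr in raw)   # SPO-sorted ID triples

def pattern_s(subject):
    """All triples with the given subject (pattern S ? ?), via binary search."""
    s = to_id[subject]
    lo = bisect.bisect_left(ids, (s,))
    out = []
    while lo < len(ids) and ids[lo][0] == s:
        out.append(tuple(terms[i] for i in ids[lo]))      # decode IDs back to terms
        lo += 1
    return out

print(pattern_s("ex:Javier"))
```

The compression comes from storing each (long, repetitive) IRI string once in the dictionary; the real HDT adds succinct bitmap structures on top, but the dictionary-plus-sorted-IDs layout is the essence.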

~431M triples

NT: 63 GB

NT + gzip: 5 GB

HDT: 6.6 GB (slightly more, but you can query!)

https://github.com/rdfhdt: C++ and Java tools

Only in the last two weeks… [GitHub activity for HDT-cpp and HDT-java]

3) Linked Data Fragments

Challenges:

Still room for optimization for complex, federated queries (delays, intermediate results, …)

A Linked Data hacker toolkit

LDF interfaces
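The client/server split behind a Triple Pattern Fragments-style interface can be sketched as follows (a simplified assumption about the interface, not the real LDF API): the server only answers single triple patterns, one page at a time, and the client pages until the fragment is exhausted, doing any joins itself.

```python
# Toy paged triple-pattern interface: a low-cost server, a patient client.
PAGE_SIZE = 1  # tiny page size so the paging loop is actually exercised

DATA = [
    ("ex:s1", "rdfs:label", '"Axel Polleres"'),
    ("ex:s2", "rdfs:label", '"Axel Polleres"'),
    ("ex:s3", "rdfs:label", '"Javier Fernandez"'),
]

def fragment(pattern, page):
    """Server side: one triple pattern, one page of matches, plus total count."""
    hits = [t for t in DATA
            if all(q is None or q == v for q, v in zip(pattern, t))]
    return hits[page * PAGE_SIZE:(page + 1) * PAGE_SIZE], len(hits)

def resolve(pattern):
    """Client side: keep requesting pages until all matches are in."""
    out, page = [], 0
    while True:
        hits, total = fragment(pattern, page)
        out.extend(hits)
        if len(out) >= total:
            return out
        page += 1

print(resolve((None, "rdfs:label", '"Axel Polleres"')))
```

Because each request is a cheap pattern lookup, the server stays highly available; the price is that complex queries cost the client many round trips, which is exactly the optimization headroom noted above.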

LOD-a-lot

But what about Web-scale queries?

- flashback -

[Figure: Linked Open Data crawled into LOD Laundromat as 650K cleaned datasets in N-Triples (zip), with a SPARQL endpoint for the metadata, all merged into a single LOD-a-lot file]

LOD-a-lot: 28B triples

Disk size:

HDT: 304 GB

HDT-FoQ (additional indexes): 133 GB

Memory footprint (to query):

15.7 GB of RAM (3% of the size), 144 seconds loading time

8 cores (2.6 GHz), 32 GB RAM, SATA HDD, on Ubuntu 14.04.5 LTS

LDF page resolution in milliseconds

LOD-a-lot (some numbers)

305 €

(LOD-a-lot creation took 64 h & 170 GB RAM; HDT-FoQ took 8 h & 250 GB RAM)

LOD-a-lot

https://datahub.io/dataset/lod-a-lot

http://purl.org/HDT/lod-a-lot

Query resolution at Web scale: using LDF, Jena

Evaluation and benchmarking: no excuse!

RDF metrics and analytics

LOD-a-lot (some use cases)

subjects / predicates / objects

Identity closure: ?x owl:sameAs ?y

Graph navigations: e.g., shortest path, random walk

LOD-a-lot (some use cases)
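The identity-closure use case boils down to connected components over owl:sameAs links. A minimal sketch with toy, hypothetical IRIs (the real task runs over the hundreds of millions of sameAs links in LOD-a-lot):

```python
# owl:sameAs identity closure: treat sameAs links as undirected edges and
# compute the component (identity set) an IRI belongs to.
from collections import defaultdict

same_as = [  # hypothetical sameAs links
    ("dbpedia:Axel_Polleres", "wikidata:Q123"),
    ("wikidata:Q123", "ex:axel"),
    ("dbpedia:Vienna", "wikidata:Q1741"),
]

adj = defaultdict(set)
for x, y in same_as:
    adj[x].add(y)
    adj[y].add(x)

def identity_set(iri):
    """All IRIs transitively reachable via sameAs (including iri itself)."""
    seen, stack = {iri}, [iri]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

print(sorted(identity_set("ex:axel")))
# ['dbpedia:Axel_Polleres', 'ex:axel', 'wikidata:Q123']
```

Having all of LOD in one local HDT file is what makes computing such a closure feasible at all: no federation, no crawling, just pattern lookups and graph traversal.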

Wouter Beek, Javier D. Fernández and Ruben Verborgh. LOD-a-lot: A Single-File Enabler for Data Science. In Proc. of SEMANTiCS 2017.

More use cases?

http://hdt.lod.labs.vu.nl/triple?object=%22Axel%20Polleres%22

Retrieve all entities in LOD with the label "Axel Polleres"

Update LOD-a-lot regularly: more and newer datasets from the LOD Cloud

Keep named graphs with the provenance of each triple (currently supported only via LOD Laundromat)

… implement the use cases and help the community democratize the access to LOD

low-cost access to LOD = high-impact research

Roadmap

ACKs

RDF Archiving: archiving policies

Example: three versions of a graph.

V1: ex:C1 ex:hasProfessor ex:P1 . ex:S1 ex:study ex:C1 . ex:S2 ex:study ex:C1 .
V2: ex:C1 ex:hasProfessor ex:P1 . ex:S1 ex:study ex:C1 . ex:S3 ex:study ex:C1 .
V3: ex:C1 ex:hasProfessor ex:P2 . ex:C1 ex:hasProfessor ex:S2 . ex:S1 ex:study ex:C1 . ex:S3 ex:study ex:C1 .

a) Independent Copies/Snapshots (IC): store each version V1, V2, V3 in full.

b) Change-based approach (CB): store V1 in full, plus the changesets between consecutive versions, e.g. V1→V2 adds ex:S3 ex:study ex:C1 and deletes ex:S2 ex:study ex:C1.

c) Timestamp-based approach (TB): annotate each triple with the versions in which it holds:

ex:C1 ex:hasProfessor ex:P1 [V1,V2]
ex:C1 ex:hasProfessor ex:P2 [V3]
ex:C1 ex:hasProfessor ex:S2 [V3]
ex:S1 ex:study ex:C1 [V1,V2,V3]
ex:S2 ex:study ex:C1 [V1]
ex:S3 ex:study ex:C1 [V2,V3]

All three policies are accessed through a retrieval mediator.
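A minimal sketch of the three archiving policies on toy data (two versions only, invented ex: triples): IC stores full snapshots, CB stores a base version plus (added, deleted) changesets, TB annotates each triple with the versions in which it holds.

```python
# IC, CB and TB archiving of two toy graph versions.
V1 = {("ex:C1", "ex:hasProfessor", "ex:P1"),
      ("ex:S1", "ex:study", "ex:C1"),
      ("ex:S2", "ex:study", "ex:C1")}
V2 = {("ex:C1", "ex:hasProfessor", "ex:P1"),
      ("ex:S1", "ex:study", "ex:C1"),
      ("ex:S3", "ex:study", "ex:C1")}

ic = {1: V1, 2: V2}                        # a) independent copies: full snapshots

delta_12 = (V2 - V1, V1 - V2)              # b) change-based: (added, deleted)
cb = {"base": V1, "changes": {2: delta_12}}

def materialize(cb, version):
    """Rebuild a version from the CB base by replaying changesets."""
    g = set(cb["base"])
    for v in range(2, version + 1):
        added, deleted = cb["changes"][v]
        g = (g - deleted) | added
    return g

tb = {}                                    # c) timestamp-based: triple -> versions
for v, snapshot in ic.items():
    for t in snapshot:
        tb.setdefault(t, set()).add(v)

assert materialize(cb, 2) == ic[2]         # CB replay reproduces the IC snapshot
print(tb[("ex:S1", "ex:study", "ex:C1")])  # {1, 2}
```

The trade-off the sketch exposes: IC is fast to query but wasteful in space, CB is compact but pays replay cost to materialize a version, and TB answers "in which versions does this triple hold?" directly.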

Democratizing Open Data preservation/monitoring

Enhance the usability of Open Data and its accessibility for non-expert users

Deep search and re-usable visualization components

Integrate Open Data support into online discussion and Web Intelligence platforms

OPEN DATA PORTAL WATCH (Vadim Savenkov)

Periodically monitoring a list of Open Data Portals: 90 CKAN-powered Open Data Portals

Quality assessment and evolution tracking of metadata and data

The CommuniData Project

http://data.wu.ac.at/portalwatch

Jürgen Umbrich, Sebastian Neumaier, Axel Polleres. Images: Ad Meskens, Doug Coulter

We are currently facing Big Linked Data challenges:

Generation, publication and consumption

Archiving, evolution…

Thanks to compression, the Big Linked Data of today will be the "pocket" data of tomorrow

Compression democratizes the access to Big Linked Data = cheap, scalable consumers

low-cost access to LOD = high-impact research

Take-home messages

Thank you

TOP-K shortest path

Vadim Savenkov

Quick intro to RDF

Resource Description Framework (W3C Rec. 2004)

Machine-processable descriptions: Web services, protocols, persons, proteins, geography…

Data model based on triples/sentences: Subject, Predicate, Object

Example:

Javier isA Person .
Javier hasName "Javier Fernandez" .
Javier worksAt WU .
Javier knows tim .
Javier knows axel .
axel hasName "Axel Polleres" .
tim hasName "Tim Berners-Lee" .
tim hasCreated http://linkeddata.org .

Is this the same Javier as Javier Bardem (the actor)? Is "worksAt" the same as "researchAt"?

I

Quick intro to

Resource Description Framework (W3C Rec 2004)

Machine processable descriptions

Webs services protocols Persons Proteins geographyhellip

Data model Based on Triplessentences Subject Predicate Object

9

Example

lthttpFernandeznetJaviergt lthttpwwww3org19990222-rdf-syntax-nstypegt lthttpxmlnscomfoaf01Persongt lthttpFernandeznetJaviergt lthttpxmlnscomfoaf01namegt ldquoJavier Fernandezrdquo lthttpFernandeznetJaviergt lthttpxmlnscomfoaf01workplaceHomepagegt lthttpwwwwuacatgt lthttpFernandeznetJaviergt lthttpxmlnscomfoaf01knowsgt lthttptimorgfoafrdftimgtlthttpFernandeznetJaviergt lthttpxmlnscomfoaf01knowsgt lthttppolleresnetmegt lthttppolleresnetmegt lthttpxmlnscomfoaf01namegt ldquoAxel Polleresrdquolthttptimorgfoafrdftimgt lthttpxmlnscomfoaf01namegt ldquoTim Berners-Leeldquo lthttptimorgfoafrdftimgt lthttpxmlnscomfoaf01madegt lthttplinkeddataorggt

URIs x URIs x (URIs U Literals)

I

lthttpFernandeznetJaviergtlthttppolleresnetmegt

ldquoJavier Fernandezrdquo ldquoAxel Polleresrdquo

foafPerson

rdftype

Formal Query SPARQL

Similar to SQL

SELECT people name

WHERE

people foafknows lthttppolleresnetmegt

people foafname name

people

name

people name

lthttpFernandeznetJaviergt ldquoJavier Fernandezrdquo

lthttpFernandeznetJaviergtlthttppolleresnetmegt

ldquoJavier Fernandezrdquo

foafPerson

rdftype

10

lthttppolleresnetmegt

foafknows

Current RDF data

Query

The Web of Linked Data (2017)

BIG (LINKED) SEMANTIC DATA COMPRESSIONPAGE 11

~10K datasets organized into 9 domains which include many and varied knowledge fields

150B statements including entity descriptions and (interintra-dataset) links between them

gt500 live endpoints serving this data

httplod-cloudnet

httpstatslod2eu

httpsparqlesaiwuacat

Big Semantic Data

The greatness of Linked Open Data

13

gt 150B triples

1K-6K datasets

gt557 SPARQL Endpoints

httplod-cloudnethttpsdatahubiohttpstatslod2euhttpsparqlesaiwuacat

But what about Web-scale queries

Eg retrieve all entities in LOD with the label ldquoAxel Polleresldquo

Solutions

14

select distinct x

x rdfslabel Axel Polleres

15

Letrsquos fish in our Linked Data Eco System

A) Federated Queries

1 Get a list of potential SPARQL Endpoints

datahubio LOV other catalogs

2 Query each SPARQL Endpoint

Problems

Many SPARQL Endpoints have low availability

16

The Web of Data Eco System

httpsparqlesaiwuacat

A) Federated Queries

1 Get a list of potential SPARQL Endpoints

datahubio LOV other catalogs

2 Query each SPARQL Endpoint

Problems

Many SPARQL Endpoints have low availability

SPARQL Endpoints are usually restricted (timeoutresults)

Moreover it can be tricky with complex queries (joins) due to intermediary results delays etc

17

The Web of Data Eco System

B) Follow-your-nose

1 Follow self-descriptive IRIs and links

2 Filter the results you are interested in

Problems

You need some initial seed

DBpedia could be a good start

Itrsquos slow (fetching many documents)

Where should I start for unbounded queries

x rdfslabel ldquoAxel Polleres

18

The Web of Data Eco System

C) Use the RDF dumps by yourself

1 Crawl de Web of Data

Probably start with datahubio LOV other catalogs

2 Download datasets

You better have some free space in your machine

3 Index the datasets locally

You better are patience and survive parsing errors

4 Query all datasets

You better are alive by then

Problems

Hugh resources

+ Messiness of the data

19

The Web of Data Eco System

1) LOD Laundromat

Challenges

Still you need to query 650K datasets

Of course it does not contain all LOD but ldquoa good approximationrdquo

20

A Linked Data hacker toolkit

2) HDT

Highly compact serialization of RDF

Allows fast RDF retrieval in compressed space (without prior decompression)

Includes internal indexes to solve basic queries with small (3) memory footprint

Very fast on basic queries (triple patterns) x 15 faster than Virtuoso Jena RDF3X

Supports FULL SPARQL as the compressed backend store of Jena with an efficiency on the same scale as current more optimized solutions

Challenges

Publisher has to pay a bit of overhead to convert the RDF dataset to HDT (but then it is ready to consume efficiently)

21

A Linked Data hacker toolkit

431 Mtriples~

63 GB

NT + gzip5 GB

HDT 66 GB

Slightly more but you can query

httpsgithubcomrdfhdt C++ and Java tools

Only in the last two weekshellip

HDT-cpp

HDT-java

3) Linked Data Fragments

Challenges

Still room for optimization for complex federated queries (delays intermediate results hellip)

23

A Linked Data hacker toolkit

PAGE 24

LDF interfaces

LOD-a-lot

25

But what about Web-scale queries

- flashback -

26

LOD

Laundromat

Dataset 1

N-Triples (zip)

Dataset 650K

N-Triples (zip)

Linked Open Data

SPARQL endpoint

(metadata)

LOD-a-lot

LOD-a-lot28B triples

Disk size

HDT 304 GB

HDT-FoQ (additional indexes) 133 GB

Memory footprint (to query)

157 GB of RAM (3 of the size)

144 seconds loading time

8 cores (26 GHz) RAM 32 GB SATA HDD on Ubuntu 14045 LTS

LDF page resolution in milliseconds

27

LOD-a-lot (some numbers)

305euro

(LOD-a-lot creation took 64 h amp 170GB RAM HDT-FoQ took 8 h amp 250GB RAM)

28

LOD-a-lot

httpsdatahubiodatasetlod-a-lot

httppurlorgHDTlod-a-lot

Query resolution at Web scale

Using LDF Jena

Evaluation and Benchmarking

No excuse

RDF metrics and analytics

29

LOD-a-lot (some use cases)

subjects predicates objects

Identity closure

x owlsameAs y

Graph navigations

Eg shortest path random walk

30

LOD-a-lot (some use cases)

Wouter Beek Javier D Fernaacutendez and Ruben Verborgh LOD-a-lot A Single-File Enabler for Data Science In Proc of SEMANTiCS 2017

More use cases

httphdtlodlabsvunltripleobject=22Axel20Polleres22

Retrieve all entities in LOD with the label ldquoAxel Polleresldquo

Update LOD-a-lot regularly

More and newer datasets from the LOD Cloud

Keep named graphs with the provenance of each triple

Currently supported only via LOD Laundromat

hellip implement the use cases and help the community to democratize the access to LOD

low-cost access to LOD = high-impact research

Roadmap

21

32

ACKs

36

RDF Archiving Archiving policies

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS3 exstudy exC1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2 exS1 exstudy exC1 exS3 exstudy exC1

V2 V3

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

exS3 exstudy exC1

exS2 exstudy exC1

exC1 exhasProfessor exP1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2

V12

3exC1 exhasProfessor exP1 [V1V2]exC1 exhasProfessor exP2 [V3]exC1 exhasProfessor exS2 [V3]exS1 exstudy exC1 [V1V2V3]exS2 exstudy exC1 [V1]exS3 exstudy exC1 [V2V3]

a) Independent CopiesSnapshots (IC)

b) Change-based approach (CB)

c) Timestamp-based approach (TB)

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

Democratizing Open Data preservationmonitoring

Enhance usability of Open Data and to enhance its accessibility for non-expert users

Deep search and re-usable visualization components

Integrate Open Data support into the online discussion and Web Intelligence platforms

OPEN DATA PORTAL WATCH

Vadim Savenkov

Periodically monitoring a list of Open Data Portals

90 CKAN powered Open Data Portals

Quality assessment

Evolution tracking

Meta data

Data

The CommuniData Project

httpdatawuacatportalwatch

Juumlrgen UmbrichSebastian NeumaierAxel Polleres ImagesAd Meskens Doug Coulter

We are currently facing Big Linked Data challenges

Generation publication and consumption

Archiving evolutionhellip

Thanks to compression the Big Linked Data today will be the ldquopocketrdquo data tomorrow

Compression democratizes the access to Big Linked Data

= Cheap scalable consumers

low-cost access to LOD = high-impact research

PAGE 40

Take-home messages

Thank you

TOP-K shortest path

Vadim Savenkov

Page 9: Democratizing Big Semantic Data management · Democratizing Big Semantic Data management … or how to query a labelled graph with 28 billion edges in a standard laptop Javier D

Quick intro to

Resource Description Framework (W3C Rec 2004)

Machine processable descriptions

Webs services protocols Persons Proteins geographyhellip

Data model Based on Triplessentences Subject Predicate Object

9

Example

lthttpFernandeznetJaviergt lthttpwwww3org19990222-rdf-syntax-nstypegt lthttpxmlnscomfoaf01Persongt lthttpFernandeznetJaviergt lthttpxmlnscomfoaf01namegt ldquoJavier Fernandezrdquo lthttpFernandeznetJaviergt lthttpxmlnscomfoaf01workplaceHomepagegt lthttpwwwwuacatgt lthttpFernandeznetJaviergt lthttpxmlnscomfoaf01knowsgt lthttptimorgfoafrdftimgtlthttpFernandeznetJaviergt lthttpxmlnscomfoaf01knowsgt lthttppolleresnetmegt lthttppolleresnetmegt lthttpxmlnscomfoaf01namegt ldquoAxel Polleresrdquolthttptimorgfoafrdftimgt lthttpxmlnscomfoaf01namegt ldquoTim Berners-Leeldquo lthttptimorgfoafrdftimgt lthttpxmlnscomfoaf01madegt lthttplinkeddataorggt

URIs x URIs x (URIs U Literals)

I

lthttpFernandeznetJaviergtlthttppolleresnetmegt

ldquoJavier Fernandezrdquo ldquoAxel Polleresrdquo

foafPerson

rdftype

Formal Query SPARQL

Similar to SQL

SELECT people name

WHERE

people foafknows lthttppolleresnetmegt

people foafname name

people

name

people name

lthttpFernandeznetJaviergt ldquoJavier Fernandezrdquo

lthttpFernandeznetJaviergtlthttppolleresnetmegt

ldquoJavier Fernandezrdquo

foafPerson

rdftype

10

lthttppolleresnetmegt

foafknows

Current RDF data

Query

The Web of Linked Data (2017)

BIG (LINKED) SEMANTIC DATA COMPRESSIONPAGE 11

~10K datasets organized into 9 domains which include many and varied knowledge fields

150B statements including entity descriptions and (interintra-dataset) links between them

gt500 live endpoints serving this data

httplod-cloudnet

httpstatslod2eu

httpsparqlesaiwuacat

Big Semantic Data

The greatness of Linked Open Data

13

gt 150B triples

1K-6K datasets

gt557 SPARQL Endpoints

httplod-cloudnethttpsdatahubiohttpstatslod2euhttpsparqlesaiwuacat

But what about Web-scale queries

Eg retrieve all entities in LOD with the label ldquoAxel Polleresldquo

Solutions

14

select distinct x

x rdfslabel Axel Polleres

15

Letrsquos fish in our Linked Data Eco System

A) Federated Queries

1 Get a list of potential SPARQL Endpoints

datahubio LOV other catalogs

2 Query each SPARQL Endpoint

Problems

Many SPARQL Endpoints have low availability

16

The Web of Data Eco System

httpsparqlesaiwuacat

A) Federated Queries

1 Get a list of potential SPARQL Endpoints

datahubio LOV other catalogs

2 Query each SPARQL Endpoint

Problems

Many SPARQL Endpoints have low availability

SPARQL Endpoints are usually restricted (timeoutresults)

Moreover it can be tricky with complex queries (joins) due to intermediary results delays etc

17

The Web of Data Eco System

B) Follow-your-nose

1 Follow self-descriptive IRIs and links

2 Filter the results you are interested in

Problems

You need some initial seed

DBpedia could be a good start

Itrsquos slow (fetching many documents)

Where should I start for unbounded queries

x rdfslabel ldquoAxel Polleres

18

The Web of Data Eco System

C) Use the RDF dumps by yourself

1 Crawl de Web of Data

Probably start with datahubio LOV other catalogs

2 Download datasets

You better have some free space in your machine

3 Index the datasets locally

You better are patience and survive parsing errors

4 Query all datasets

You better are alive by then

Problems

Hugh resources

+ Messiness of the data

19

The Web of Data Eco System

1) LOD Laundromat

Challenges

Still you need to query 650K datasets

Of course it does not contain all LOD but ldquoa good approximationrdquo

20

A Linked Data hacker toolkit

2) HDT

Highly compact serialization of RDF

Allows fast RDF retrieval in compressed space (without prior decompression)

Includes internal indexes to solve basic queries with small (3) memory footprint

Very fast on basic queries (triple patterns) x 15 faster than Virtuoso Jena RDF3X

Supports FULL SPARQL as the compressed backend store of Jena with an efficiency on the same scale as current more optimized solutions

Challenges

Publisher has to pay a bit of overhead to convert the RDF dataset to HDT (but then it is ready to consume efficiently)

21

A Linked Data hacker toolkit

431 Mtriples~

63 GB

NT + gzip5 GB

HDT 66 GB

Slightly more but you can query

httpsgithubcomrdfhdt C++ and Java tools

Only in the last two weekshellip

HDT-cpp

HDT-java

3) Linked Data Fragments

Challenges

Still room for optimization for complex federated queries (delays intermediate results hellip)

23

A Linked Data hacker toolkit

PAGE 24

LDF interfaces

LOD-a-lot

25

But what about Web-scale queries

- flashback -

26

LOD

Laundromat

Dataset 1

N-Triples (zip)

Dataset 650K

N-Triples (zip)

Linked Open Data

SPARQL endpoint

(metadata)

LOD-a-lot

LOD-a-lot28B triples

Disk size

HDT 304 GB

HDT-FoQ (additional indexes) 133 GB

Memory footprint (to query)

157 GB of RAM (3 of the size)

144 seconds loading time

8 cores (26 GHz) RAM 32 GB SATA HDD on Ubuntu 14045 LTS

LDF page resolution in milliseconds

27

LOD-a-lot (some numbers)

305euro

(LOD-a-lot creation took 64 h amp 170GB RAM HDT-FoQ took 8 h amp 250GB RAM)

28

LOD-a-lot

httpsdatahubiodatasetlod-a-lot

httppurlorgHDTlod-a-lot

Query resolution at Web scale

Using LDF Jena

Evaluation and Benchmarking

No excuse

RDF metrics and analytics

29

LOD-a-lot (some use cases)

subjects predicates objects

Identity closure

x owlsameAs y

Graph navigations

Eg shortest path random walk

30

LOD-a-lot (some use cases)

Wouter Beek Javier D Fernaacutendez and Ruben Verborgh LOD-a-lot A Single-File Enabler for Data Science In Proc of SEMANTiCS 2017

More use cases

httphdtlodlabsvunltripleobject=22Axel20Polleres22

Retrieve all entities in LOD with the label ldquoAxel Polleresldquo

Update LOD-a-lot regularly

More and newer datasets from the LOD Cloud

Keep named graphs with the provenance of each triple

Currently supported only via LOD Laundromat

hellip implement the use cases and help the community to democratize the access to LOD

low-cost access to LOD = high-impact research

Roadmap

21

32

ACKs

36

RDF Archiving Archiving policies

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS3 exstudy exC1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2 exS1 exstudy exC1 exS3 exstudy exC1

V2 V3

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

exS3 exstudy exC1

exS2 exstudy exC1

exC1 exhasProfessor exP1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2

V12

3exC1 exhasProfessor exP1 [V1V2]exC1 exhasProfessor exP2 [V3]exC1 exhasProfessor exS2 [V3]exS1 exstudy exC1 [V1V2V3]exS2 exstudy exC1 [V1]exS3 exstudy exC1 [V2V3]

a) Independent CopiesSnapshots (IC)

b) Change-based approach (CB)

c) Timestamp-based approach (TB)

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

Democratizing Open Data preservationmonitoring

Enhance usability of Open Data and to enhance its accessibility for non-expert users

Deep search and re-usable visualization components

Integrate Open Data support into the online discussion and Web Intelligence platforms

OPEN DATA PORTAL WATCH

Vadim Savenkov

Periodically monitoring a list of Open Data Portals

90 CKAN powered Open Data Portals

Quality assessment

Evolution tracking

Meta data

Data

The CommuniData Project

httpdatawuacatportalwatch

Juumlrgen UmbrichSebastian NeumaierAxel Polleres ImagesAd Meskens Doug Coulter

We are currently facing Big Linked Data challenges

Generation publication and consumption

Archiving evolutionhellip

Thanks to compression the Big Linked Data today will be the ldquopocketrdquo data tomorrow

Compression democratizes the access to Big Linked Data

= Cheap scalable consumers

low-cost access to LOD = high-impact research

PAGE 40

Take-home messages

Thank you

TOP-K shortest path

Vadim Savenkov

Page 10: Democratizing Big Semantic Data management · Democratizing Big Semantic Data management … or how to query a labelled graph with 28 billion edges in a standard laptop Javier D

Formal Query SPARQL

Similar to SQL

SELECT people name

WHERE

people foafknows lthttppolleresnetmegt

people foafname name

people

name

people name

lthttpFernandeznetJaviergt ldquoJavier Fernandezrdquo

lthttpFernandeznetJaviergtlthttppolleresnetmegt

ldquoJavier Fernandezrdquo

foafPerson

rdftype

10

lthttppolleresnetmegt

foafknows

Current RDF data

Query

The Web of Linked Data (2017)

BIG (LINKED) SEMANTIC DATA COMPRESSIONPAGE 11

~10K datasets organized into 9 domains which include many and varied knowledge fields

150B statements including entity descriptions and (interintra-dataset) links between them

gt500 live endpoints serving this data

httplod-cloudnet

httpstatslod2eu

httpsparqlesaiwuacat

Big Semantic Data

The greatness of Linked Open Data

13

gt 150B triples

1K-6K datasets

gt557 SPARQL Endpoints

httplod-cloudnethttpsdatahubiohttpstatslod2euhttpsparqlesaiwuacat

But what about Web-scale queries

Eg retrieve all entities in LOD with the label ldquoAxel Polleresldquo

Solutions

14

select distinct x

x rdfslabel Axel Polleres

15

Letrsquos fish in our Linked Data Eco System

A) Federated Queries

1 Get a list of potential SPARQL Endpoints

datahubio LOV other catalogs

2 Query each SPARQL Endpoint

Problems

Many SPARQL Endpoints have low availability

16

The Web of Data Eco System

httpsparqlesaiwuacat

A) Federated Queries

1 Get a list of potential SPARQL Endpoints

datahubio LOV other catalogs

2 Query each SPARQL Endpoint

Problems

Many SPARQL Endpoints have low availability

SPARQL Endpoints are usually restricted (timeoutresults)

Moreover it can be tricky with complex queries (joins) due to intermediary results delays etc

17

The Web of Data Eco System

B) Follow-your-nose

1 Follow self-descriptive IRIs and links

2 Filter the results you are interested in

Problems

You need some initial seed

DBpedia could be a good start

Itrsquos slow (fetching many documents)

Where should I start for unbounded queries

x rdfslabel ldquoAxel Polleres

18

The Web of Data Eco System

C) Use the RDF dumps by yourself

1 Crawl de Web of Data

Probably start with datahubio LOV other catalogs

2 Download datasets

You better have some free space in your machine

3 Index the datasets locally

You better are patience and survive parsing errors

4 Query all datasets

You better are alive by then

Problems

Hugh resources

+ Messiness of the data

19

The Web of Data Eco System

1) LOD Laundromat

Challenges

Still you need to query 650K datasets

Of course it does not contain all LOD but ldquoa good approximationrdquo

20

A Linked Data hacker toolkit

2) HDT

Highly compact serialization of RDF

Allows fast RDF retrieval in compressed space (without prior decompression)

Includes internal indexes to solve basic queries with small (3) memory footprint

Very fast on basic queries (triple patterns) x 15 faster than Virtuoso Jena RDF3X

Supports FULL SPARQL as the compressed backend store of Jena with an efficiency on the same scale as current more optimized solutions

Challenges

Publisher has to pay a bit of overhead to convert the RDF dataset to HDT (but then it is ready to consume efficiently)

21

A Linked Data hacker toolkit

431 Mtriples~

63 GB

NT + gzip5 GB

HDT 66 GB

Slightly more but you can query

httpsgithubcomrdfhdt C++ and Java tools

Only in the last two weekshellip

HDT-cpp

HDT-java

3) Linked Data Fragments

Challenges

Still room for optimization for complex federated queries (delays intermediate results hellip)

23

A Linked Data hacker toolkit

PAGE 24

LDF interfaces

LOD-a-lot

25

But what about Web-scale queries

- flashback -

26

LOD

Laundromat

Dataset 1

N-Triples (zip)

Dataset 650K

N-Triples (zip)

Linked Open Data

SPARQL endpoint

(metadata)

LOD-a-lot

LOD-a-lot28B triples

Disk size

HDT 304 GB

HDT-FoQ (additional indexes) 133 GB

Memory footprint (to query)

157 GB of RAM (3 of the size)

144 seconds loading time

8 cores (26 GHz) RAM 32 GB SATA HDD on Ubuntu 14045 LTS

LDF page resolution in milliseconds

27

LOD-a-lot (some numbers)

305euro

(LOD-a-lot creation took 64 h amp 170GB RAM HDT-FoQ took 8 h amp 250GB RAM)

28

LOD-a-lot

httpsdatahubiodatasetlod-a-lot

httppurlorgHDTlod-a-lot

Query resolution at Web scale

Using LDF Jena

Evaluation and Benchmarking

No excuse

RDF metrics and analytics

29

LOD-a-lot (some use cases)

subjects predicates objects

Identity closure

x owlsameAs y

Graph navigations

Eg shortest path random walk

30

LOD-a-lot (some use cases)

Wouter Beek Javier D Fernaacutendez and Ruben Verborgh LOD-a-lot A Single-File Enabler for Data Science In Proc of SEMANTiCS 2017

More use cases

httphdtlodlabsvunltripleobject=22Axel20Polleres22

Retrieve all entities in LOD with the label ldquoAxel Polleresldquo

Update LOD-a-lot regularly

More and newer datasets from the LOD Cloud

Keep named graphs with the provenance of each triple

Currently supported only via LOD Laundromat

hellip implement the use cases and help the community to democratize the access to LOD

low-cost access to LOD = high-impact research

Roadmap

21

32

ACKs

36

RDF Archiving Archiving policies

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS3 exstudy exC1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2 exS1 exstudy exC1 exS3 exstudy exC1

V2 V3

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

exS3 exstudy exC1

exS2 exstudy exC1

exC1 exhasProfessor exP1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2

V12

3exC1 exhasProfessor exP1 [V1V2]exC1 exhasProfessor exP2 [V3]exC1 exhasProfessor exS2 [V3]exS1 exstudy exC1 [V1V2V3]exS2 exstudy exC1 [V1]exS3 exstudy exC1 [V2V3]

a) Independent CopiesSnapshots (IC)

b) Change-based approach (CB)

c) Timestamp-based approach (TB)

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

Democratizing Open Data preservationmonitoring

Enhance usability of Open Data and to enhance its accessibility for non-expert users

Deep search and re-usable visualization components

Integrate Open Data support into the online discussion and Web Intelligence platforms

OPEN DATA PORTAL WATCH

Vadim Savenkov

Periodically monitoring a list of Open Data Portals

90 CKAN powered Open Data Portals

Quality assessment

Evolution tracking

Meta data

Data

The CommuniData Project

httpdatawuacatportalwatch

Juumlrgen UmbrichSebastian NeumaierAxel Polleres ImagesAd Meskens Doug Coulter

We are currently facing Big Linked Data challenges

Generation publication and consumption

Archiving evolutionhellip

Thanks to compression the Big Linked Data today will be the ldquopocketrdquo data tomorrow

Compression democratizes the access to Big Linked Data

= Cheap scalable consumers

low-cost access to LOD = high-impact research

PAGE 40

Take-home messages

Thank you

TOP-K shortest path

Vadim Savenkov

Page 11: Democratizing Big Semantic Data management · Democratizing Big Semantic Data management … or how to query a labelled graph with 28 billion edges in a standard laptop Javier D

The Web of Linked Data (2017)

BIG (LINKED) SEMANTIC DATA COMPRESSIONPAGE 11

~10K datasets organized into 9 domains which include many and varied knowledge fields

150B statements including entity descriptions and (interintra-dataset) links between them

gt500 live endpoints serving this data

httplod-cloudnet

httpstatslod2eu

httpsparqlesaiwuacat

Big Semantic Data

The greatness of Linked Open Data

13

gt 150B triples

1K-6K datasets

gt557 SPARQL Endpoints

httplod-cloudnethttpsdatahubiohttpstatslod2euhttpsparqlesaiwuacat

But what about Web-scale queries

Eg retrieve all entities in LOD with the label ldquoAxel Polleresldquo

Solutions

14

select distinct x

x rdfslabel Axel Polleres

15

Letrsquos fish in our Linked Data Eco System

A) Federated Queries

1 Get a list of potential SPARQL Endpoints

datahubio LOV other catalogs

2 Query each SPARQL Endpoint

Problems

Many SPARQL Endpoints have low availability

16

The Web of Data Eco System

httpsparqlesaiwuacat

A) Federated Queries

1 Get a list of potential SPARQL Endpoints

datahubio LOV other catalogs

2 Query each SPARQL Endpoint

Problems

Many SPARQL Endpoints have low availability

SPARQL Endpoints are usually restricted (timeoutresults)

Moreover it can be tricky with complex queries (joins) due to intermediary results delays etc

17

The Web of Data Eco System

B) Follow-your-nose

1 Follow self-descriptive IRIs and links

2 Filter the results you are interested in

Problems

You need some initial seed

DBpedia could be a good start

Itrsquos slow (fetching many documents)

Where should I start for unbounded queries

x rdfslabel ldquoAxel Polleres

18

The Web of Data Eco System

C) Use the RDF dumps by yourself

1 Crawl de Web of Data

Probably start with datahubio LOV other catalogs

2 Download datasets

You better have some free space in your machine

3 Index the datasets locally

You better are patience and survive parsing errors

4 Query all datasets

You better are alive by then

Problems

Hugh resources

+ Messiness of the data

19

The Web of Data Eco System

1) LOD Laundromat

Challenges

Still you need to query 650K datasets

Of course it does not contain all LOD but ldquoa good approximationrdquo

20

A Linked Data hacker toolkit

2) HDT

Highly compact serialization of RDF

Allows fast RDF retrieval in compressed space (without prior decompression)

Includes internal indexes to solve basic queries with small (3) memory footprint

Very fast on basic queries (triple patterns) x 15 faster than Virtuoso Jena RDF3X

Supports FULL SPARQL as the compressed backend store of Jena with an efficiency on the same scale as current more optimized solutions

Challenges

Publisher has to pay a bit of overhead to convert the RDF dataset to HDT (but then it is ready to consume efficiently)

21

A Linked Data hacker toolkit

431 Mtriples~

63 GB

NT + gzip5 GB

HDT 66 GB

Slightly more but you can query

https://github.com/rdfhdt : C++ and Java tools

Only in the last two weeks…

HDT-cpp

HDT-java

3) Linked Data Fragments

Challenges

Still room for optimization for complex federated queries (delays, intermediate results, …)
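A toy version of the Triple Pattern Fragments idea: the server only answers single triple patterns, and the client performs the join itself (all data below is made up):

```python
# Miniature LDF setup: server-side fragments + client-side join.
DATA = [
    ("ex:axel", "rdfs:label", "Axel Polleres"),
    ("ex:axel", "ex:worksAt", "ex:WU"),
    ("ex:javier", "ex:worksAt", "ex:WU"),
    ("ex:javier", "rdfs:label", "Javier Fernandez"),
]

def fragment(s=None, p=None, o=None):
    """'Server' side: one triple pattern per request, nothing more."""
    return [t for t in DATA
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# 'Client' side: who works at ex:WU, and what is their label?
# Evaluate the selective pattern first, then one request per binding --
# this per-binding chatter is where the delays mentioned above come from.
people = [s for s, _, _ in fragment(p="ex:worksAt", o="ex:WU")]
labels = {s: [o for _, _, o in fragment(s=s, p="rdfs:label")] for s in people}
```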

23

A Linked Data hacker toolkit

PAGE 24

LDF interfaces

LOD-a-lot

25

But what about Web-scale queries?

- flashback -

26

LOD

Laundromat

Dataset 1

N-Triples (zip)

Dataset 650K

N-Triples (zip)

Linked Open Data

SPARQL endpoint

(metadata)

LOD-a-lot

LOD-a-lot: 28B triples

Disk size

HDT: 304 GB

HDT-FoQ (additional indexes): 133 GB

Memory footprint (to query):

15.7 GB of RAM (~3% of the size)

144 seconds loading time

8 cores (2.6 GHz), 32 GB RAM, SATA HDD, on Ubuntu 14.04.5 LTS

LDF page resolution in milliseconds

27

LOD-a-lot (some numbers)

305 €

(LOD-a-lot creation took 64 h & 170 GB RAM; HDT-FoQ took 8 h & 250 GB RAM)

28

LOD-a-lot

https://datahub.io/dataset/lod-a-lot

http://purl.org/HDT/lod-a-lot

Query resolution at Web scale

Using LDF Jena

Evaluation and Benchmarking

No excuse

RDF metrics and analytics

29

LOD-a-lot (some use cases)

subjects predicates objects

Identity closure

?x owl:sameAs ?y

Graph navigations

E.g. shortest path, random walk
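The identity-closure use case can be sketched with union-find over owl:sameAs statements (the IRIs below are illustrative): every connected component groups IRIs that denote the same real-world entity.

```python
# Union-find over owl:sameAs links; not how LOD-a-lot itself does it,
# just the standard technique for computing the closure.
parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving
        x = parent[x]
    return x

def union(x, y):
    parent[find(x)] = find(y)

same_as = [                             # made-up ?x owl:sameAs ?y results
    ("dbr:Vienna", "wd:Q1741"),
    ("wd:Q1741", "ex:Wien"),
    ("dbr:Graz", "wd:Q13298"),
]
for x, y in same_as:
    union(x, y)

# Group IRIs into identity sets (the closure).
closure = {}
for iri in list(parent):
    closure.setdefault(find(iri), set()).add(iri)
```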

30

LOD-a-lot (some use cases)

Wouter Beek, Javier D. Fernández and Ruben Verborgh. LOD-a-lot: A Single-File Enabler for Data Science. In Proc. of SEMANTiCS 2017

More use cases

http://hdt.lod.labs.vu.nl/triple?object=%22Axel%20Polleres%22

Retrieve all entities in LOD with the label "Axel Polleres"

Update LOD-a-lot regularly

More and newer datasets from the LOD Cloud

Keep named graphs with the provenance of each triple

Currently supported only via LOD Laundromat

… implement the use cases and help the community democratize access to LOD

low-cost access to LOD = high-impact research

Roadmap


32

ACKs

36

RDF Archiving: Archiving policies

a) Independent Copies/Snapshots (IC):

V1: ex:C1 ex:hasProfessor ex:P1 . ex:S1 ex:study ex:C1 . ex:S2 ex:study ex:C1 .

V2: ex:C1 ex:hasProfessor ex:P1 . ex:S1 ex:study ex:C1 . ex:S3 ex:study ex:C1 .

V3: ex:C1 ex:hasProfessor ex:P2 . ex:C1 ex:hasProfessor ex:S2 . ex:S1 ex:study ex:C1 . ex:S3 ex:study ex:C1 .

b) Change-based approach (CB):

V1: ex:C1 ex:hasProfessor ex:P1 . ex:S1 ex:study ex:C1 . ex:S2 ex:study ex:C1 .

ΔV1,V2: added ex:S3 ex:study ex:C1 ; deleted ex:S2 ex:study ex:C1

ΔV2,V3: added ex:C1 ex:hasProfessor ex:P2 , ex:C1 ex:hasProfessor ex:S2 ; deleted ex:C1 ex:hasProfessor ex:P1

c) Timestamp-based approach (TB):

ex:C1 ex:hasProfessor ex:P1 [V1,V2]
ex:C1 ex:hasProfessor ex:P2 [V3]
ex:C1 ex:hasProfessor ex:S2 [V3]
ex:S1 ex:study ex:C1 [V1,V2,V3]
ex:S2 ex:study ex:C1 [V1]
ex:S3 ex:study ex:C1 [V2,V3]
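The three policies can be computed from the toy course example above (version contents as reconstructed from the TB annotations; ex: prefixes abbreviated):

```python
# a) IC: store every snapshot in full.
V1 = {("C1", "hasProfessor", "P1"), ("S1", "study", "C1"), ("S2", "study", "C1")}
V2 = {("C1", "hasProfessor", "P1"), ("S1", "study", "C1"), ("S3", "study", "C1")}
V3 = {("C1", "hasProfessor", "P2"), ("C1", "hasProfessor", "S2"),
      ("S1", "study", "C1"), ("S3", "study", "C1")}
versions = [V1, V2, V3]

# b) CB: store V1 plus per-version deltas (added, deleted) as set differences.
deltas = [(new - old, old - new) for old, new in zip(versions, versions[1:])]

# c) TB: annotate each distinct triple with the set of versions it belongs to.
tb = {}
for i, v in enumerate(versions, start=1):
    for triple in v:
        tb.setdefault(triple, set()).add(i)
```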

RETRIEVAL MEDIATOR


Democratizing Open Data preservation/monitoring

Enhance the usability of Open Data and its accessibility for non-expert users

Deep search and re-usable visualization components

Integrate Open Data support into the online discussion and Web Intelligence platforms

OPEN DATA PORTAL WATCH

Vadim Savenkov

Periodically monitoring a list of Open Data Portals

90 CKAN-powered Open Data Portals

Quality assessment

Evolution tracking

Metadata

Data

The CommuniData Project

http://data.wu.ac.at/portalwatch

Jürgen Umbrich, Sebastian Neumaier, Axel Polleres. Images: Ad Meskens, Doug Coulter

We are currently facing Big Linked Data challenges

Generation publication and consumption

Archiving, evolution…

Thanks to compression, the Big Linked Data of today will be the "pocket" data of tomorrow

Compression democratizes the access to Big Linked Data

= Cheap, scalable consumers

low-cost access to LOD = high-impact research

PAGE 40

Take-home messages

Thank you

TOP-K shortest path

Vadim Savenkov


We are currently facing Big Linked Data challenges

Generation publication and consumption

Archiving evolutionhellip

Thanks to compression the Big Linked Data today will be the ldquopocketrdquo data tomorrow

Compression democratizes the access to Big Linked Data

= Cheap scalable consumers

low-cost access to LOD = high-impact research

PAGE 40

Take-home messages

Thank you

TOP-K shortest path

Vadim Savenkov

Page 17: Democratizing Big Semantic Data management · Democratizing Big Semantic Data management … or how to query a labelled graph with 28 billion edges in a standard laptop Javier D

A) Federated Queries

1. Get a list of potential SPARQL endpoints

(datahub.io, LOV, other catalogs)

2. Query each SPARQL endpoint

Problems:

Many SPARQL endpoints have low availability

SPARQL endpoints are usually restricted (timeouts, limited results)

Moreover, it can be tricky with complex queries (joins) due to intermediate results, delays, etc.
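A toy sketch of why this is hard (the endpoints and triples here are invented for illustration, echoing the "computer scientists in Vienna younger than 40" scenario): a conjunctive query must join bindings across endpoints, so every extra round-trip adds delay and every server restriction can truncate the intermediate results.

```python
# Minimal sketch of the federated-query problem: each "endpoint" only holds
# part of the data, so a join query must ship intermediate results around.
endpoint_a = [("ex:javier", "ex:worksIn", "ex:vienna"),
              ("ex:axel", "ex:worksIn", "ex:vienna")]
endpoint_b = [("ex:javier", "ex:age", 33),
              ("ex:axel", "ex:age", 45)]

def ask(endpoint, p):
    """Stand-in for one SPARQL request: all (s, o) pairs for predicate p."""
    return [(s, o) for s, pp, o in endpoint if pp == p]

# ?x ex:worksIn ex:vienna . ?x ex:age ?a . FILTER(?a < 40)
workers = [s for s, o in ask(endpoint_a, "ex:worksIn") if o == "ex:vienna"]
ages = dict(ask(endpoint_b, "ex:age"))          # second round-trip
result = [x for x in workers if ages.get(x, 99) < 40]
print(result)  # -> ['ex:javier']
```

Each `ask` call stands for one HTTP request; with real endpoints, the `workers` list would have to be shipped to the second endpoint (or fetched entirely), which is exactly where timeouts and result limits bite.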

The Web of Data Ecosystem

B) Follow-your-nose

1. Follow self-descriptive IRIs and links

2. Filter the results you are interested in

Problems:

You need some initial seed

(DBpedia could be a good start)

It's slow (fetching many documents)

Where should I start for unbounded queries?

?x rdfs:label "Axel Polleres"
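A minimal follow-your-nose traversal, with the "Web" mocked as an in-memory dict (all IRIs and documents here are made up): dereference a seed, collect matching labels, and follow outgoing links breadth-first.

```python
# Toy follow-your-nose traversal: "dereferencing" an IRI returns the triples
# of its document; we follow links from a seed and filter for an rdfs:label.
web = {
    "ex:dbpedia": [("ex:dbpedia", "ex:links", "ex:axel")],
    "ex:axel":    [("ex:axel", "rdfs:label", "Axel Polleres"),
                   ("ex:axel", "ex:links", "ex:wu")],
    "ex:wu":      [("ex:wu", "rdfs:label", "WU Vienna")],
}

def follow_your_nose(seed, label):
    seen, frontier, hits = set(), [seed], []
    while frontier:
        iri = frontier.pop(0)
        if iri in seen:
            continue
        seen.add(iri)
        for s, p, o in web.get(iri, []):       # one HTTP fetch per document
            if p == "rdfs:label" and o == label:
                hits.append(s)
            if isinstance(o, str) and o in web and o not in seen:
                frontier.append(o)             # follow the link
    return hits

print(follow_your_nose("ex:dbpedia", "Axel Polleres"))  # -> ['ex:axel']
```

The two problems from the slide are visible directly: the answer depends on the `seed`, and each hop is a separate fetch.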


C) Use the RDF dumps by yourself

1. Crawl the Web of Data

(Probably start with datahub.io, LOV, other catalogs)

2. Download the datasets

You'd better have some free space on your machine

3. Index the datasets locally

You'd better be patient and survive the parsing errors

4. Query all datasets

You'd better still be alive by then

Problems:

Huge resources + messiness of the data


1) LOD Laundromat

Challenges:

Still, you need to query 650K datasets

Of course it does not contain all of LOD, but "a good approximation"


A Linked Data hacker toolkit

2) HDT

Highly compact serialization of RDF

Allows fast RDF retrieval in compressed space (without prior decompression)

Includes internal indexes to solve basic queries with a small (~3%) memory footprint

Very fast on basic queries (triple patterns): x15 faster than Virtuoso, Jena, RDF3X

Supports full SPARQL as the compressed backend store of Jena, with efficiency on the same scale as current, more optimized solutions

Challenges:

The publisher has to pay a bit of overhead to convert the RDF dataset to HDT (but then it is ready to be consumed efficiently)
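The idea behind HDT-style retrieval can be sketched in plain Python (a toy illustration, not the actual HDT binary format): terms go into a shared dictionary, triples become sorted ID tuples, and prefix-bound triple patterns resolve by binary search instead of decompression.

```python
from bisect import bisect_left, bisect_right

# Toy HDT-like layout: a term dictionary plus ID-triples sorted in SPO order,
# so (s ? ?), (s p ?) and (s p o) patterns become range scans.
triples = [
    ("ex:S1", "ex:study", "ex:C1"),
    ("ex:S3", "ex:study", "ex:C1"),
    ("ex:C1", "ex:hasProfessor", "ex:P1"),
]

terms = sorted({t for tr in triples for t in tr})               # shared dictionary
tid = {term: i for i, term in enumerate(terms)}                 # term -> ID
ids = sorted((tid[s], tid[p], tid[o]) for s, p, o in triples)   # SPO index

def query(s=None, p=None, o=None):
    """Resolve a triple pattern whose bound terms form a prefix (s / s,p / s,p,o)."""
    prefix = tuple(tid[x] for x in (s, p, o) if x is not None)
    lo = bisect_left(ids, prefix)
    hi = bisect_right(ids, prefix + (len(terms),) * (3 - len(prefix)))
    return [tuple(terms[i] for i in ids[k]) for k in range(lo, hi)]

print(query(s="ex:C1"))  # -> [('ex:C1', 'ex:hasProfessor', 'ex:P1')]
```

Patterns not bound by subject (e.g. only the object bound) would need extra orderings; in real HDT those are the role of the additional HDT-FoQ indexes mentioned later in the deck.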


~431 Mtriples

NT: 63 GB

NT + gzip: 5 GB

HDT: 6.6 GB

Slightly more than gzip, but you can query!

https://github.com/rdfhdt (C++ and Java tools)

Only in the last two weeks…

HDT-cpp

HDT-java

3) Linked Data Fragments

Challenges:

Still room for optimization for complex federated queries (delays, intermediate results, …)
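The LDF idea (in its Triple Pattern Fragments flavor) can be simulated in a few lines, with hypothetical data and a fixed page size: the server only answers single triple patterns, one page at a time, and the client loops over pages and does the joins itself.

```python
# Sketch of a Triple Pattern Fragments-style interface: cheap server calls,
# paged responses, client-side query processing.
PAGE_SIZE = 2
data = [("ex:s%d" % i, "ex:p", "ex:o") for i in range(5)]  # made-up triples

def fragment(s=None, p=None, o=None, page=0):
    """One server call: matches for a single triple pattern, one page at a time."""
    match = [t for t in data
             if s in (None, t[0]) and p in (None, t[1]) and o in (None, t[2])]
    return match[page * PAGE_SIZE:(page + 1) * PAGE_SIZE], len(match)

def all_matches(**pattern):
    """Client side: keep requesting pages until the fragment is exhausted."""
    page, out = 0, []
    while True:
        chunk, total = fragment(page=page, **pattern)
        out.extend(chunk)
        page += 1
        if len(out) >= total or not chunk:
            return out

print(len(all_matches(p="ex:p")))  # -> 5
```

The trade-off from the slide follows directly: the server stays cheap and highly available, but a complex federated query turns into many page requests with intermediate results held on the client.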



LDF interfaces

LOD-a-lot


But what about Web-scale queries?

- flashback -

[Diagram: LOD Laundromat crawls Linked Open Data and republishes each of the 650K datasets as N-Triples (zip), with a SPARQL endpoint for the metadata]

LOD-a-lot

28B triples

Disk size:

HDT: 304 GB

HDT-FoQ (additional indexes): 133 GB

Memory footprint (to query):

15.7 GB of RAM (~3% of the size)

144 seconds loading time

(8 cores at 2.6 GHz, 32 GB RAM, SATA HDD, on Ubuntu 14.04.5 LTS)

LDF page resolution in milliseconds


LOD-a-lot (some numbers)

305€ (cost of the machine used to query it)

(LOD-a-lot creation took 64 h & 170 GB RAM; building HDT-FoQ took 8 h & 250 GB RAM)


LOD-a-lot

https://datahub.io/dataset/lod-a-lot

http://purl.org/HDT/lod-a-lot

Query resolution at Web scale

(using LDF, Jena)

Evaluation and Benchmarking

No excuse!

RDF metrics and analytics

LOD-a-lot (some use cases)

subjects / predicates / objects

Identity closure:

?x owl:sameAs ?y

Graph navigations:

E.g., shortest path, random walk
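The identity-closure use case is essentially union-find over owl:sameAs pairs; a sketch with invented IRIs:

```python
# owl:sameAs identity closure as union-find: every chain of co-identical IRIs
# ends up in the same equivalence class (toy pairs, made up for illustration).
pairs = [("dbpedia:Vienna", "wikidata:Q1741"),
         ("wikidata:Q1741", "geonames:2761369"),
         ("dbpedia:Graz", "wikidata:Q13298")]

parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving keeps trees shallow
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

for a, b in pairs:
    union(a, b)

classes = {}
for iri in parent:
    classes.setdefault(find(iri), set()).add(iri)
print(sorted(len(c) for c in classes.values()))  # -> [2, 3]
```

At LOD-a-lot scale the same pass runs over hundreds of millions of sameAs triples, which is exactly what a single-file, sequentially readable HDT copy makes practical.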


Wouter Beek, Javier D. Fernández and Ruben Verborgh. LOD-a-lot: A Single-File Enabler for Data Science. In Proc. of SEMANTiCS 2017.

More use cases:

http://hdt.lod.labs.vu.nl/triple?object=%22Axel%20Polleres%22

Retrieve all entities in LOD with the label "Axel Polleres"

Update LOD-a-lot regularly:

More and newer datasets from the LOD Cloud

Keep named graphs with the provenance of each triple

(currently supported only via LOD Laundromat)

… implement the use cases and help the community to democratize the access to LOD

low-cost access to LOD = high-impact research

Roadmap

ACKs

RDF Archiving: archiving policies

a) Independent Copies/Snapshots (IC): each version is stored as a full snapshot

V1: ex:C1 ex:hasProfessor ex:P1 . ex:S1 ex:study ex:C1 . ex:S2 ex:study ex:C1 .
V2: ex:C1 ex:hasProfessor ex:P1 . ex:S1 ex:study ex:C1 . ex:S3 ex:study ex:C1 .
V3: ex:C1 ex:hasProfessor ex:P2 . ex:C1 ex:hasProfessor ex:S2 . ex:S1 ex:study ex:C1 . ex:S3 ex:study ex:C1 .

b) Change-based approach (CB): V1 is stored as a full snapshot; later versions only as deltas

Δ(V1,V2): added ex:S3 ex:study ex:C1 — deleted ex:S2 ex:study ex:C1
Δ(V2,V3): added ex:C1 ex:hasProfessor ex:P2 and ex:C1 ex:hasProfessor ex:S2 — deleted ex:C1 ex:hasProfessor ex:P1

c) Timestamp-based approach (TB): each triple is annotated with the versions in which it holds

ex:C1 ex:hasProfessor ex:P1 [V1,V2]
ex:C1 ex:hasProfessor ex:P2 [V3]
ex:C1 ex:hasProfessor ex:S2 [V3]
ex:S1 ex:study ex:C1 [V1,V2,V3]
ex:S2 ex:study ex:C1 [V1]
ex:S3 ex:study ex:C1 [V2,V3]

A retrieval mediator sits on top of each storage policy to answer queries across versions.
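The three archiving policies can be computed mechanically from version snapshots; a sketch using the same toy course/student data as the example:

```python
# IC, CB and TB representations derived from toy version snapshots.
V1 = {("ex:C1", "ex:hasProfessor", "ex:P1"),
      ("ex:S1", "ex:study", "ex:C1"),
      ("ex:S2", "ex:study", "ex:C1")}
V2 = {("ex:C1", "ex:hasProfessor", "ex:P1"),
      ("ex:S1", "ex:study", "ex:C1"),
      ("ex:S3", "ex:study", "ex:C1")}
V3 = {("ex:C1", "ex:hasProfessor", "ex:P2"),
      ("ex:C1", "ex:hasProfessor", "ex:S2"),
      ("ex:S1", "ex:study", "ex:C1"),
      ("ex:S3", "ex:study", "ex:C1")}
versions = [V1, V2, V3]

# a) IC: store every snapshot as-is -> just `versions`

# b) CB: store V1 plus (added, deleted) deltas between consecutive versions
deltas = [(new - old, old - new) for old, new in zip(versions, versions[1:])]

# c) TB: annotate each triple with the versions in which it holds
tb = {}
for i, v in enumerate(versions, start=1):
    for t in v:
        tb.setdefault(t, []).append("V%d" % i)

print(deltas[0])  # delta(V1,V2): added S3-study-C1, deleted S2-study-C1
print(tb[("ex:S1", "ex:study", "ex:C1")])  # -> ['V1', 'V2', 'V3']
```

The trade-off is the classic one: IC is cheap to query but wasteful in space, CB is compact but needs delta replay to materialize a version, and TB supports cross-version queries directly at the cost of per-triple annotations.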

Democratizing Open Data preservation/monitoring

Enhance the usability of Open Data and enhance its accessibility for non-expert users

Deep search and re-usable visualization components

Integrate Open Data support into online discussion and Web Intelligence platforms

OPEN DATA PORTAL WATCH

Vadim Savenkov

Periodically monitoring a list of Open Data Portals

90 CKAN-powered Open Data Portals

Quality assessment

Evolution tracking

(Metadata and Data)

The CommuniData Project

http://data.wu.ac.at/portalwatch

Jürgen Umbrich, Sebastian Neumaier, Axel Polleres. Images: Ad Meskens, Doug Coulter

We are currently facing Big Linked Data challenges:

Generation, publication and consumption

Archiving, evolution, …

Thanks to compression, the Big Linked Data of today will be the "pocket" data of tomorrow

Compression democratizes the access to Big Linked Data

= Cheap, scalable consumers

low-cost access to LOD = high-impact research

Take-home messages

Thank you!

TOP-K shortest path

Vadim Savenkov


c) Timestamp-based approach (TB)

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

Democratizing Open Data preservationmonitoring

Enhance usability of Open Data and to enhance its accessibility for non-expert users

Deep search and re-usable visualization components

Integrate Open Data support into the online discussion and Web Intelligence platforms

OPEN DATA PORTAL WATCH

Vadim Savenkov

Periodically monitoring a list of Open Data Portals

90 CKAN powered Open Data Portals

Quality assessment

Evolution tracking

Meta data

Data

The CommuniData Project

httpdatawuacatportalwatch

Juumlrgen UmbrichSebastian NeumaierAxel Polleres ImagesAd Meskens Doug Coulter

We are currently facing Big Linked Data challenges

Generation publication and consumption

Archiving evolutionhellip

Thanks to compression the Big Linked Data today will be the ldquopocketrdquo data tomorrow

Compression democratizes the access to Big Linked Data

= Cheap scalable consumers

low-cost access to LOD = high-impact research

PAGE 40

Take-home messages

Thank you

TOP-K shortest path

Vadim Savenkov

Page 24: Democratizing Big Semantic Data management · Democratizing Big Semantic Data management … or how to query a labelled graph with 28 billion edges in a standard laptop Javier D

PAGE 24

LDF interfaces

LOD-a-lot

25

But what about Web-scale queries

- flashback -

26

LOD

Laundromat

Dataset 1

N-Triples (zip)

Dataset 650K

N-Triples (zip)

Linked Open Data

SPARQL endpoint

(metadata)

LOD-a-lot

LOD-a-lot28B triples

Disk size

HDT 304 GB

HDT-FoQ (additional indexes) 133 GB

Memory footprint (to query)

157 GB of RAM (3 of the size)

144 seconds loading time

8 cores (26 GHz) RAM 32 GB SATA HDD on Ubuntu 14045 LTS

LDF page resolution in milliseconds

27

LOD-a-lot (some numbers)

305euro

(LOD-a-lot creation took 64 h amp 170GB RAM HDT-FoQ took 8 h amp 250GB RAM)

28

LOD-a-lot

httpsdatahubiodatasetlod-a-lot

httppurlorgHDTlod-a-lot

Query resolution at Web scale

Using LDF Jena

Evaluation and Benchmarking

No excuse

RDF metrics and analytics

29

LOD-a-lot (some use cases)

subjects predicates objects

Identity closure

x owlsameAs y

Graph navigations

Eg shortest path random walk

30

LOD-a-lot (some use cases)

Wouter Beek Javier D Fernaacutendez and Ruben Verborgh LOD-a-lot A Single-File Enabler for Data Science In Proc of SEMANTiCS 2017

More use cases

httphdtlodlabsvunltripleobject=22Axel20Polleres22

Retrieve all entities in LOD with the label ldquoAxel Polleresldquo

Update LOD-a-lot regularly

More and newer datasets from the LOD Cloud

Keep named graphs with the provenance of each triple

Currently supported only via LOD Laundromat

hellip implement the use cases and help the community to democratize the access to LOD

low-cost access to LOD = high-impact research

Roadmap

21

32

ACKs

36

RDF Archiving Archiving policies

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS3 exstudy exC1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2 exS1 exstudy exC1 exS3 exstudy exC1

V2 V3

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

exS3 exstudy exC1

exS2 exstudy exC1

exC1 exhasProfessor exP1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2

V12

3exC1 exhasProfessor exP1 [V1V2]exC1 exhasProfessor exP2 [V3]exC1 exhasProfessor exS2 [V3]exS1 exstudy exC1 [V1V2V3]exS2 exstudy exC1 [V1]exS3 exstudy exC1 [V2V3]

a) Independent CopiesSnapshots (IC)

b) Change-based approach (CB)

c) Timestamp-based approach (TB)

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

Democratizing Open Data preservationmonitoring

Enhance usability of Open Data and to enhance its accessibility for non-expert users

Deep search and re-usable visualization components

Integrate Open Data support into the online discussion and Web Intelligence platforms

OPEN DATA PORTAL WATCH

Vadim Savenkov

Periodically monitoring a list of Open Data Portals

90 CKAN powered Open Data Portals

Quality assessment

Evolution tracking

Meta data

Data

The CommuniData Project

httpdatawuacatportalwatch

Juumlrgen UmbrichSebastian NeumaierAxel Polleres ImagesAd Meskens Doug Coulter

We are currently facing Big Linked Data challenges

Generation publication and consumption

Archiving evolutionhellip

Thanks to compression the Big Linked Data today will be the ldquopocketrdquo data tomorrow

Compression democratizes the access to Big Linked Data

= Cheap scalable consumers

low-cost access to LOD = high-impact research

PAGE 40

Take-home messages

Thank you

TOP-K shortest path

Vadim Savenkov

Page 25: Democratizing Big Semantic Data management · Democratizing Big Semantic Data management … or how to query a labelled graph with 28 billion edges in a standard laptop Javier D

LOD-a-lot

25

But what about Web-scale queries

- flashback -

26

LOD

Laundromat

Dataset 1

N-Triples (zip)

Dataset 650K

N-Triples (zip)

Linked Open Data

SPARQL endpoint

(metadata)

LOD-a-lot

LOD-a-lot28B triples

Disk size

HDT 304 GB

HDT-FoQ (additional indexes) 133 GB

Memory footprint (to query)

157 GB of RAM (3 of the size)

144 seconds loading time

8 cores (26 GHz) RAM 32 GB SATA HDD on Ubuntu 14045 LTS

LDF page resolution in milliseconds

27

LOD-a-lot (some numbers)

305euro

(LOD-a-lot creation took 64 h amp 170GB RAM HDT-FoQ took 8 h amp 250GB RAM)

28

LOD-a-lot

httpsdatahubiodatasetlod-a-lot

httppurlorgHDTlod-a-lot

Query resolution at Web scale

Using LDF Jena

Evaluation and Benchmarking

No excuse

RDF metrics and analytics

29

LOD-a-lot (some use cases)

subjects predicates objects

Identity closure

x owlsameAs y

Graph navigations

Eg shortest path random walk

30

LOD-a-lot (some use cases)

Wouter Beek Javier D Fernaacutendez and Ruben Verborgh LOD-a-lot A Single-File Enabler for Data Science In Proc of SEMANTiCS 2017

More use cases

httphdtlodlabsvunltripleobject=22Axel20Polleres22

Retrieve all entities in LOD with the label ldquoAxel Polleresldquo

Update LOD-a-lot regularly

More and newer datasets from the LOD Cloud

Keep named graphs with the provenance of each triple

Currently supported only via LOD Laundromat

hellip implement the use cases and help the community to democratize the access to LOD

low-cost access to LOD = high-impact research

Roadmap

21

32

ACKs

36

RDF Archiving Archiving policies

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS3 exstudy exC1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2 exS1 exstudy exC1 exS3 exstudy exC1

V2 V3

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

exS3 exstudy exC1

exS2 exstudy exC1

exC1 exhasProfessor exP1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2

V12

3exC1 exhasProfessor exP1 [V1V2]exC1 exhasProfessor exP2 [V3]exC1 exhasProfessor exS2 [V3]exS1 exstudy exC1 [V1V2V3]exS2 exstudy exC1 [V1]exS3 exstudy exC1 [V2V3]

a) Independent CopiesSnapshots (IC)

b) Change-based approach (CB)

c) Timestamp-based approach (TB)

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

Democratizing Open Data preservationmonitoring

Enhance usability of Open Data and to enhance its accessibility for non-expert users

Deep search and re-usable visualization components

Integrate Open Data support into the online discussion and Web Intelligence platforms

OPEN DATA PORTAL WATCH

Vadim Savenkov

Periodically monitoring a list of Open Data Portals

90 CKAN powered Open Data Portals

Quality assessment

Evolution tracking

Meta data

Data

The CommuniData Project

httpdatawuacatportalwatch

Juumlrgen UmbrichSebastian NeumaierAxel Polleres ImagesAd Meskens Doug Coulter

We are currently facing Big Linked Data challenges

Generation publication and consumption

Archiving evolutionhellip

Thanks to compression the Big Linked Data today will be the ldquopocketrdquo data tomorrow

Compression democratizes the access to Big Linked Data

= Cheap scalable consumers

low-cost access to LOD = high-impact research

PAGE 40

Take-home messages

Thank you

TOP-K shortest path

Vadim Savenkov

Page 26: Democratizing Big Semantic Data management · Democratizing Big Semantic Data management … or how to query a labelled graph with 28 billion edges in a standard laptop Javier D

26

LOD

Laundromat

Dataset 1

N-Triples (zip)

Dataset 650K

N-Triples (zip)

Linked Open Data

SPARQL endpoint

(metadata)

LOD-a-lot

LOD-a-lot28B triples

Disk size

HDT 304 GB

HDT-FoQ (additional indexes) 133 GB

Memory footprint (to query)

157 GB of RAM (3 of the size)

144 seconds loading time

8 cores (26 GHz) RAM 32 GB SATA HDD on Ubuntu 14045 LTS

LDF page resolution in milliseconds

27

LOD-a-lot (some numbers)

305euro

(LOD-a-lot creation took 64 h amp 170GB RAM HDT-FoQ took 8 h amp 250GB RAM)

28

LOD-a-lot

httpsdatahubiodatasetlod-a-lot

httppurlorgHDTlod-a-lot

Query resolution at Web scale

Using LDF Jena

Evaluation and Benchmarking

No excuse

RDF metrics and analytics

29

LOD-a-lot (some use cases)

subjects predicates objects

Identity closure

x owlsameAs y

Graph navigations

Eg shortest path random walk

30

LOD-a-lot (some use cases)

Wouter Beek Javier D Fernaacutendez and Ruben Verborgh LOD-a-lot A Single-File Enabler for Data Science In Proc of SEMANTiCS 2017

More use cases

httphdtlodlabsvunltripleobject=22Axel20Polleres22

Retrieve all entities in LOD with the label ldquoAxel Polleresldquo

Update LOD-a-lot regularly

More and newer datasets from the LOD Cloud

Keep named graphs with the provenance of each triple

Currently supported only via LOD Laundromat

hellip implement the use cases and help the community to democratize the access to LOD

low-cost access to LOD = high-impact research

Roadmap

21

32

ACKs

36

RDF Archiving Archiving policies

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS3 exstudy exC1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2 exS1 exstudy exC1 exS3 exstudy exC1

V2 V3

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

exS3 exstudy exC1

exS2 exstudy exC1

exC1 exhasProfessor exP1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2

V12

3exC1 exhasProfessor exP1 [V1V2]exC1 exhasProfessor exP2 [V3]exC1 exhasProfessor exS2 [V3]exS1 exstudy exC1 [V1V2V3]exS2 exstudy exC1 [V1]exS3 exstudy exC1 [V2V3]

a) Independent CopiesSnapshots (IC)

b) Change-based approach (CB)

c) Timestamp-based approach (TB)

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

Democratizing Open Data preservationmonitoring

Enhance usability of Open Data and to enhance its accessibility for non-expert users

Deep search and re-usable visualization components

Integrate Open Data support into the online discussion and Web Intelligence platforms

OPEN DATA PORTAL WATCH

Vadim Savenkov

Periodically monitoring a list of Open Data Portals

90 CKAN powered Open Data Portals

Quality assessment

Evolution tracking

Meta data

Data

The CommuniData Project

httpdatawuacatportalwatch

Juumlrgen UmbrichSebastian NeumaierAxel Polleres ImagesAd Meskens Doug Coulter

We are currently facing Big Linked Data challenges

Generation publication and consumption

Archiving evolutionhellip

Thanks to compression the Big Linked Data today will be the ldquopocketrdquo data tomorrow

Compression democratizes the access to Big Linked Data

= Cheap scalable consumers

low-cost access to LOD = high-impact research

PAGE 40

Take-home messages

Thank you

TOP-K shortest path

Vadim Savenkov

Page 27: Democratizing Big Semantic Data management · Democratizing Big Semantic Data management … or how to query a labelled graph with 28 billion edges in a standard laptop Javier D

Disk size

HDT 304 GB

HDT-FoQ (additional indexes) 133 GB

Memory footprint (to query)

157 GB of RAM (3 of the size)

144 seconds loading time

8 cores (26 GHz) RAM 32 GB SATA HDD on Ubuntu 14045 LTS

LDF page resolution in milliseconds

27

LOD-a-lot (some numbers)

305euro

(LOD-a-lot creation took 64 h amp 170GB RAM HDT-FoQ took 8 h amp 250GB RAM)

28

LOD-a-lot

httpsdatahubiodatasetlod-a-lot

httppurlorgHDTlod-a-lot

Query resolution at Web scale

Using LDF Jena

Evaluation and Benchmarking

No excuse

RDF metrics and analytics

29

LOD-a-lot (some use cases)

subjects predicates objects

Identity closure

x owlsameAs y

Graph navigations

Eg shortest path random walk

30

LOD-a-lot (some use cases)

Wouter Beek Javier D Fernaacutendez and Ruben Verborgh LOD-a-lot A Single-File Enabler for Data Science In Proc of SEMANTiCS 2017

More use cases

httphdtlodlabsvunltripleobject=22Axel20Polleres22

Retrieve all entities in LOD with the label ldquoAxel Polleresldquo

Update LOD-a-lot regularly

More and newer datasets from the LOD Cloud

Keep named graphs with the provenance of each triple

Currently supported only via LOD Laundromat

hellip implement the use cases and help the community to democratize the access to LOD

low-cost access to LOD = high-impact research

Roadmap

21

32

ACKs

36

RDF Archiving Archiving policies

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS3 exstudy exC1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2 exS1 exstudy exC1 exS3 exstudy exC1

V2 V3

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

exS3 exstudy exC1

exS2 exstudy exC1

exC1 exhasProfessor exP1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2

V12

3exC1 exhasProfessor exP1 [V1V2]exC1 exhasProfessor exP2 [V3]exC1 exhasProfessor exS2 [V3]exS1 exstudy exC1 [V1V2V3]exS2 exstudy exC1 [V1]exS3 exstudy exC1 [V2V3]

a) Independent CopiesSnapshots (IC)

b) Change-based approach (CB)

c) Timestamp-based approach (TB)

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

Democratizing Open Data preservationmonitoring

Enhance usability of Open Data and to enhance its accessibility for non-expert users

Deep search and re-usable visualization components

Integrate Open Data support into the online discussion and Web Intelligence platforms

OPEN DATA PORTAL WATCH

Vadim Savenkov

Periodically monitoring a list of Open Data Portals

90 CKAN powered Open Data Portals

Quality assessment

Evolution tracking

Meta data

Data

The CommuniData Project

httpdatawuacatportalwatch

Juumlrgen UmbrichSebastian NeumaierAxel Polleres ImagesAd Meskens Doug Coulter

We are currently facing Big Linked Data challenges

Generation publication and consumption

Archiving evolutionhellip

Thanks to compression the Big Linked Data today will be the ldquopocketrdquo data tomorrow

Compression democratizes the access to Big Linked Data

= Cheap scalable consumers

low-cost access to LOD = high-impact research

PAGE 40

Take-home messages

Thank you

TOP-K shortest path

Vadim Savenkov

Page 28: Democratizing Big Semantic Data management · Democratizing Big Semantic Data management … or how to query a labelled graph with 28 billion edges in a standard laptop Javier D

28

LOD-a-lot

httpsdatahubiodatasetlod-a-lot

httppurlorgHDTlod-a-lot

Query resolution at Web scale

Using LDF Jena

Evaluation and Benchmarking

No excuse

RDF metrics and analytics

29

LOD-a-lot (some use cases)

subjects predicates objects

Identity closure

x owlsameAs y

Graph navigations

Eg shortest path random walk

30

LOD-a-lot (some use cases)

Wouter Beek Javier D Fernaacutendez and Ruben Verborgh LOD-a-lot A Single-File Enabler for Data Science In Proc of SEMANTiCS 2017

More use cases

httphdtlodlabsvunltripleobject=22Axel20Polleres22

Retrieve all entities in LOD with the label ldquoAxel Polleresldquo

Update LOD-a-lot regularly

More and newer datasets from the LOD Cloud

Keep named graphs with the provenance of each triple

Currently supported only via LOD Laundromat

hellip implement the use cases and help the community to democratize the access to LOD

low-cost access to LOD = high-impact research

Roadmap

21

32

ACKs

36

RDF Archiving Archiving policies

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS3 exstudy exC1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2 exS1 exstudy exC1 exS3 exstudy exC1

V2 V3

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

exS3 exstudy exC1

exS2 exstudy exC1

exC1 exhasProfessor exP1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2

V12

3exC1 exhasProfessor exP1 [V1V2]exC1 exhasProfessor exP2 [V3]exC1 exhasProfessor exS2 [V3]exS1 exstudy exC1 [V1V2V3]exS2 exstudy exC1 [V1]exS3 exstudy exC1 [V2V3]

a) Independent CopiesSnapshots (IC)

b) Change-based approach (CB)

c) Timestamp-based approach (TB)

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

Democratizing Open Data preservationmonitoring

Enhance usability of Open Data and to enhance its accessibility for non-expert users

Deep search and re-usable visualization components

Integrate Open Data support into the online discussion and Web Intelligence platforms

OPEN DATA PORTAL WATCH

Vadim Savenkov

Periodically monitoring a list of Open Data Portals

90 CKAN powered Open Data Portals

Quality assessment

Evolution tracking

Meta data

Data

The CommuniData Project

httpdatawuacatportalwatch

Juumlrgen UmbrichSebastian NeumaierAxel Polleres ImagesAd Meskens Doug Coulter

We are currently facing Big Linked Data challenges

Generation publication and consumption

Archiving evolutionhellip

Thanks to compression the Big Linked Data today will be the ldquopocketrdquo data tomorrow

Compression democratizes the access to Big Linked Data

= Cheap scalable consumers

low-cost access to LOD = high-impact research

PAGE 40

Take-home messages

Thank you

TOP-K shortest path

Vadim Savenkov

Page 29: Democratizing Big Semantic Data management · Democratizing Big Semantic Data management … or how to query a labelled graph with 28 billion edges in a standard laptop Javier D

Query resolution at Web scale

Using LDF Jena

Evaluation and Benchmarking

No excuse

RDF metrics and analytics

29

LOD-a-lot (some use cases)

subjects predicates objects

Identity closure

x owlsameAs y

Graph navigations

Eg shortest path random walk

30

LOD-a-lot (some use cases)

Wouter Beek Javier D Fernaacutendez and Ruben Verborgh LOD-a-lot A Single-File Enabler for Data Science In Proc of SEMANTiCS 2017

More use cases

httphdtlodlabsvunltripleobject=22Axel20Polleres22

Retrieve all entities in LOD with the label ldquoAxel Polleresldquo

Update LOD-a-lot regularly

More and newer datasets from the LOD Cloud

Keep named graphs with the provenance of each triple

Currently supported only via LOD Laundromat

hellip implement the use cases and help the community to democratize the access to LOD

low-cost access to LOD = high-impact research

Roadmap

21

32

ACKs

36

RDF Archiving Archiving policies

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS3 exstudy exC1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2 exS1 exstudy exC1 exS3 exstudy exC1

V2 V3

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

exS3 exstudy exC1

exS2 exstudy exC1

exC1 exhasProfessor exP1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2

V12

3exC1 exhasProfessor exP1 [V1V2]exC1 exhasProfessor exP2 [V3]exC1 exhasProfessor exS2 [V3]exS1 exstudy exC1 [V1V2V3]exS2 exstudy exC1 [V1]exS3 exstudy exC1 [V2V3]

a) Independent CopiesSnapshots (IC)

b) Change-based approach (CB)

c) Timestamp-based approach (TB)

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

Democratizing Open Data preservationmonitoring

Enhance usability of Open Data and to enhance its accessibility for non-expert users

Deep search and re-usable visualization components

Integrate Open Data support into the online discussion and Web Intelligence platforms

OPEN DATA PORTAL WATCH

Vadim Savenkov

Periodically monitoring a list of Open Data Portals

90 CKAN powered Open Data Portals

Quality assessment

Evolution tracking

Meta data

Data

The CommuniData Project

httpdatawuacatportalwatch

Juumlrgen UmbrichSebastian NeumaierAxel Polleres ImagesAd Meskens Doug Coulter

We are currently facing Big Linked Data challenges

Generation publication and consumption

Archiving evolutionhellip

Thanks to compression the Big Linked Data today will be the ldquopocketrdquo data tomorrow

Compression democratizes the access to Big Linked Data

= Cheap scalable consumers

low-cost access to LOD = high-impact research

PAGE 40

Take-home messages

Thank you

TOP-K shortest path

Vadim Savenkov

Page 30: Democratizing Big Semantic Data management · Democratizing Big Semantic Data management … or how to query a labelled graph with 28 billion edges in a standard laptop Javier D

Identity closure

x owlsameAs y

Graph navigations

Eg shortest path random walk

30

LOD-a-lot (some use cases)

Wouter Beek Javier D Fernaacutendez and Ruben Verborgh LOD-a-lot A Single-File Enabler for Data Science In Proc of SEMANTiCS 2017

More use cases

httphdtlodlabsvunltripleobject=22Axel20Polleres22

Retrieve all entities in LOD with the label ldquoAxel Polleresldquo

Update LOD-a-lot regularly

More and newer datasets from the LOD Cloud

Keep named graphs with the provenance of each triple

Currently supported only via LOD Laundromat

hellip implement the use cases and help the community to democratize the access to LOD

low-cost access to LOD = high-impact research

Roadmap

21

32

ACKs

36

RDF Archiving Archiving policies

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS3 exstudy exC1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2 exS1 exstudy exC1 exS3 exstudy exC1

V2 V3

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

exS3 exstudy exC1

exS2 exstudy exC1

exC1 exhasProfessor exP1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2

V12

3exC1 exhasProfessor exP1 [V1V2]exC1 exhasProfessor exP2 [V3]exC1 exhasProfessor exS2 [V3]exS1 exstudy exC1 [V1V2V3]exS2 exstudy exC1 [V1]exS3 exstudy exC1 [V2V3]

a) Independent CopiesSnapshots (IC)

b) Change-based approach (CB)

c) Timestamp-based approach (TB)

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

Democratizing Open Data preservationmonitoring

Enhance usability of Open Data and to enhance its accessibility for non-expert users

Deep search and re-usable visualization components

Integrate Open Data support into the online discussion and Web Intelligence platforms

OPEN DATA PORTAL WATCH

Vadim Savenkov

Periodically monitoring a list of Open Data Portals

90 CKAN powered Open Data Portals

Quality assessment

Evolution tracking

Meta data

Data

The CommuniData Project

httpdatawuacatportalwatch

Juumlrgen UmbrichSebastian NeumaierAxel Polleres ImagesAd Meskens Doug Coulter

We are currently facing Big Linked Data challenges

Generation publication and consumption

Archiving evolutionhellip

Thanks to compression the Big Linked Data today will be the ldquopocketrdquo data tomorrow

Compression democratizes the access to Big Linked Data

= Cheap scalable consumers

low-cost access to LOD = high-impact research

PAGE 40

Take-home messages

Thank you

TOP-K shortest path

Vadim Savenkov

Page 31: Democratizing Big Semantic Data management · Democratizing Big Semantic Data management … or how to query a labelled graph with 28 billion edges in a standard laptop Javier D

Update LOD-a-lot regularly

More and newer datasets from the LOD Cloud

Keep named graphs with the provenance of each triple

Currently supported only via LOD Laundromat

hellip implement the use cases and help the community to democratize the access to LOD

low-cost access to LOD = high-impact research

Roadmap

21

32

ACKs

36

RDF Archiving Archiving policies

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS3 exstudy exC1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2 exS1 exstudy exC1 exS3 exstudy exC1

V2 V3

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

exS3 exstudy exC1

exS2 exstudy exC1

exC1 exhasProfessor exP1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2

V12

3exC1 exhasProfessor exP1 [V1V2]exC1 exhasProfessor exP2 [V3]exC1 exhasProfessor exS2 [V3]exS1 exstudy exC1 [V1V2V3]exS2 exstudy exC1 [V1]exS3 exstudy exC1 [V2V3]

a) Independent CopiesSnapshots (IC)

b) Change-based approach (CB)

c) Timestamp-based approach (TB)

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

Democratizing Open Data preservationmonitoring

Enhance usability of Open Data and to enhance its accessibility for non-expert users

Deep search and re-usable visualization components

Integrate Open Data support into the online discussion and Web Intelligence platforms

OPEN DATA PORTAL WATCH

Vadim Savenkov

Periodically monitoring a list of Open Data Portals

90 CKAN powered Open Data Portals

Quality assessment

Evolution tracking

Meta data

Data

The CommuniData Project

httpdatawuacatportalwatch

Juumlrgen UmbrichSebastian NeumaierAxel Polleres ImagesAd Meskens Doug Coulter

We are currently facing Big Linked Data challenges

Generation publication and consumption

Archiving evolutionhellip

Thanks to compression the Big Linked Data today will be the ldquopocketrdquo data tomorrow

Compression democratizes the access to Big Linked Data

= Cheap scalable consumers

low-cost access to LOD = high-impact research

PAGE 40

Take-home messages

Thank you

TOP-K shortest path

Vadim Savenkov

Page 32: Democratizing Big Semantic Data management · Democratizing Big Semantic Data management … or how to query a labelled graph with 28 billion edges in a standard laptop Javier D

32

ACKs

36

RDF Archiving Archiving policies

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS3 exstudy exC1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2 exS1 exstudy exC1 exS3 exstudy exC1

V2 V3

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

exS3 exstudy exC1

exS2 exstudy exC1

exC1 exhasProfessor exP1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2

V12

3exC1 exhasProfessor exP1 [V1V2]exC1 exhasProfessor exP2 [V3]exC1 exhasProfessor exS2 [V3]exS1 exstudy exC1 [V1V2V3]exS2 exstudy exC1 [V1]exS3 exstudy exC1 [V2V3]

a) Independent CopiesSnapshots (IC)

b) Change-based approach (CB)

c) Timestamp-based approach (TB)

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

Democratizing Open Data preservationmonitoring

Enhance usability of Open Data and to enhance its accessibility for non-expert users

Deep search and re-usable visualization components

Integrate Open Data support into the online discussion and Web Intelligence platforms

OPEN DATA PORTAL WATCH

Vadim Savenkov

Periodically monitoring a list of Open Data Portals

90 CKAN powered Open Data Portals

Quality assessment

Evolution tracking

Meta data

Data

The CommuniData Project

httpdatawuacatportalwatch

Juumlrgen UmbrichSebastian NeumaierAxel Polleres ImagesAd Meskens Doug Coulter

We are currently facing Big Linked Data challenges

Generation publication and consumption

Archiving evolutionhellip

Thanks to compression the Big Linked Data today will be the ldquopocketrdquo data tomorrow

Compression democratizes the access to Big Linked Data

= Cheap scalable consumers

low-cost access to LOD = high-impact research

PAGE 40

Take-home messages

Thank you

TOP-K shortest path

Vadim Savenkov

Page 33: Democratizing Big Semantic Data management · Democratizing Big Semantic Data management … or how to query a labelled graph with 28 billion edges in a standard laptop Javier D

36

RDF Archiving Archiving policies

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS3 exstudy exC1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2 exS1 exstudy exC1 exS3 exstudy exC1

V2 V3

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

V1

exC1 exhasProfessor exP1 exS1 exstudy exC1 exS2 exstudy exC1

exS3 exstudy exC1

exS2 exstudy exC1

exC1 exhasProfessor exP1

exC1 exhasProfessor exP2 exC1 exhasProfessor exS2

V12

3exC1 exhasProfessor exP1 [V1V2]exC1 exhasProfessor exP2 [V3]exC1 exhasProfessor exS2 [V3]exS1 exstudy exC1 [V1V2V3]exS2 exstudy exC1 [V1]exS3 exstudy exC1 [V2V3]

a) Independent CopiesSnapshots (IC)

b) Change-based approach (CB)

c) Timestamp-based approach (TB)

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

RETRIEVAL MEDIATOR

Democratizing Open Data preservationmonitoring

Enhance usability of Open Data and to enhance its accessibility for non-expert users

Deep search and re-usable visualization components

Integrate Open Data support into the online discussion and Web Intelligence platforms

OPEN DATA PORTAL WATCH

Vadim Savenkov

Periodically monitoring a list of Open Data Portals

90 CKAN powered Open Data Portals

Quality assessment

Evolution tracking

Meta data

Data

The CommuniData Project

httpdatawuacatportalwatch

Juumlrgen UmbrichSebastian NeumaierAxel Polleres ImagesAd Meskens Doug Coulter

We are currently facing Big Linked Data challenges

Generation publication and consumption

Archiving evolutionhellip

Thanks to compression the Big Linked Data today will be the ldquopocketrdquo data tomorrow

Compression democratizes the access to Big Linked Data

= Cheap scalable consumers

low-cost access to LOD = high-impact research

PAGE 40

Take-home messages

Thank you

TOP-K shortest path

Vadim Savenkov
