NLP&DBpedia2015 - Exposing Digital Content as Linked Data, and Linking them using StoryBlink

Exposing Digital Content as Linked Data, and Linking them using StoryBlink

Ben De Meester, Tom De Nies, Laurens De Vocht,

Ruben Verborgh, Erik Mannens, and Rik Van de Walle

University Ghent – iMinds – Multimedia Lab
ben.demeester@ugent.be | @Ben__DM

NLP&DBpedia2015@ISWC | October 11th 2015 | Bethlehem, PA

We live in a fast world, with a lot of content to sift through

http://blog.qmee.com/qmee-online-in-60-seconds/

Book ≠ Fast

Finding a good book in a short time?

Recommendations!

Recommendations?

Social recommendations: long tail

Metadata recommendations: manual?

What do we want?

Automatic content-based metadata to fuel future recommendation engines

Content-based metadata

Get the tags… (DBpedia Spotlight)

… use them to represent books’ content … (EPUB CFI, NIF, ITS, …)

… and link to other books in a good way (TPF, EiCE)

StoryBlink!

Get the tags

Find out what a book is about…

Semantic tags!

Using NER/NED!

Extract all semantic concepts from the book
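To make the extraction step concrete, here is a minimal Python sketch that turns a DBpedia Spotlight annotation result into a list of concept mentions. It assumes the JSON layout that Spotlight's `/rest/annotate` endpoint returns when asked for `application/json`: a `Resources` array whose entries carry `@URI`, `@surfaceForm` and `@offset` as strings. The function name `concepts_from_spotlight` and the sample response are hypothetical; this is not the StoryBlink code.

```python
import json

def concepts_from_spotlight(response_text):
    """Return (resource URI, surface form, character offset) tuples from a Spotlight JSON response."""
    data = json.loads(response_text)
    # "Resources" may be absent when nothing was recognized in the text
    resources = data.get("Resources", [])
    return [(r["@URI"], r["@surfaceForm"], int(r["@offset"])) for r in resources]

# Made-up response for the text "A chamois in the desert."
sample = json.dumps({
    "@text": "A chamois in the desert.",
    "Resources": [
        {"@URI": "http://dbpedia.org/resource/Chamois", "@surfaceForm": "chamois", "@offset": "2"},
        {"@URI": "http://dbpedia.org/resource/Desert", "@surfaceForm": "desert", "@offset": "17"},
    ],
})

for uri, surface, offset in concepts_from_spotlight(sample):
    print(offset, surface, uri)
```

Offsets are relative to the text that was sent, so when a book is annotated chunk by chunk they still have to be mapped back to a position in the book.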

AGDISTIS

Open source

Local

NER/NED/NEL

From a book to a semantic book

Split HTML into chunks

HTML to text

Local Spotlight
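The first two steps of this pipeline can be sketched with Python's standard `html.parser`. Splitting at block-level elements (paragraphs, headings, list items) is an assumption made for illustration; the slides do not say how chunks are delimited. The names `ChunkExtractor`, `BLOCK_TAGS` and `html_to_chunks` are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical choice of elements that delimit a chunk
BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "li"}

class ChunkExtractor(HTMLParser):
    """Collect the plain text of each outermost block-level element as one chunk."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._buffer = []
        self._depth = 0  # > 0 while inside a block element

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Collapse whitespace so line breaks in the markup do not leak into the text
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.chunks.append(text)
                self._buffer = []

    def handle_data(self, data):
        # Inline markup such as <em> is dropped; only its text is kept
        if self._depth > 0:
            self._buffer.append(data)

def html_to_chunks(html):
    parser = ChunkExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks

print(html_to_chunks("<body><h2>Chapter 1</h2><p>The <em>chamois</em> crossed the desert.</p></body>"))
```

Each returned chunk would then be passed to the locally running Spotlight instance for annotation.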

Represent a book by tags
@prefix schema: <http://schema.org/> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix dbr: <http://dbpedia.org/resource/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix pg84: <http://www.gutenberg.org/ebooks/84.epub#> .

pg84:book a schema:Book .

<http://www.gutenberg.org/ebooks/84.epub#epubcfi(/6/12!/4/2/4)> itsrdf:taIdentRef dbr:Chamois ;
    nif:sourceUrl pg84:book .
<http://www.gutenberg.org/ebooks/84.epub#epubcfi(/6/2!/4/46[chap01]/16/42)> itsrdf:taIdentRef dbr:Chamois ;
    nif:sourceUrl pg84:book .
<http://www.gutenberg.org/ebooks/84.epub#epubcfi(/6/12!/4/2/6)> itsrdf:taIdentRef dbr:Desert ;
    nif:sourceUrl pg84:book .

...
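Triples of this shape can be generated from (EPUB CFI, DBpedia resource) pairs. This sketch writes every CFI fragment and resource as a full IRI in angle brackets, because characters such as `(`, `/` and `!` are not allowed unescaped in Turtle prefixed names, and `[` cannot be escaped there at all. The function `annotations_to_turtle` and the constant `BOOK_IRI` are illustrative assumptions, not part of StoryBlink.

```python
# Hypothetical base IRI of the EPUB being annotated (Project Gutenberg ebook 84)
BOOK_IRI = "http://www.gutenberg.org/ebooks/84.epub"

PREFIXES = "\n".join([
    "@prefix schema: <http://schema.org/> .",
    "@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .",
    "@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .",
])

def annotations_to_turtle(book_iri, annotations):
    """Serialize (EPUB CFI, DBpedia resource IRI) pairs as ITS/NIF triples in Turtle."""
    book = f"<{book_iri}#book>"
    lines = [PREFIXES, "", f"{book} a schema:Book .", ""]
    for cfi, resource_iri in annotations:
        # The CFI becomes the fragment of the EPUB's IRI, written out in full
        lines.append(f"<{book_iri}#epubcfi({cfi})> itsrdf:taIdentRef <{resource_iri}> ;")
        lines.append(f"    nif:sourceUrl {book} .")
    return "\n".join(lines)

print(annotations_to_turtle(BOOK_IRI, [
    ("/6/12!/4/2/4", "http://dbpedia.org/resource/Chamois"),
    ("/6/2!/4/46[chap01]/16/42", "http://dbpedia.org/resource/Chamois"),
    ("/6/12!/4/2/6", "http://dbpedia.org/resource/Desert"),
]))
```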

Link to other books

Open Source

Linked data path finding

Multiple paths
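The slides do not detail how paths are found. As a stand-in, the sketch below enumerates multiple paths between two resources with a plain breadth-first search over a small in-memory graph; it is not the EiCE algorithm, and the concept links in the usage example are made up rather than taken from DBpedia. `find_paths` and `undirected` are hypothetical helpers.

```python
from collections import deque

def find_paths(graph, start, goal, max_edges=3):
    """Return every simple path from start to goal with at most max_edges edges.

    graph maps each node to the set of its neighbours. Paths come out in
    breadth-first order, so shorter paths are listed first.
    """
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            paths.append(path)
            continue
        if len(path) - 1 == max_edges:
            continue  # path is already as long as allowed
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in path:  # keep paths simple (no cycles)
                queue.append(path + [neighbour])
    return paths

def undirected(edges):
    """Build an adjacency map in which every link can be followed both ways."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

# Hypothetical links between concepts, for illustration only
toy_graph = undirected([
    ("Chamois", "Alps"),
    ("Alps", "Switzerland"),
    ("Switzerland", "Geneva"),
    ("Chamois", "Mountain"),
    ("Mountain", "Switzerland"),
])

for path in find_paths(toy_graph, "Chamois", "Geneva"):
    print(" -> ".join(path))
```

Because every partial path is queued, the work grows quickly with `max_edges` and with the number of concept pairs to connect, which gives an intuition for why keeping every mentioned concept makes path finding slow.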

Keeping all concepts…

Not all mentioned concepts are useful.

The path finding becomes really slow.

What happens if we keep the top X%?

[Figure: #paths and Time (s) plotted against the amount of considered concepts (%)]

Keeping the top 50% of found concepts gives similar paths, but a lot faster
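Keeping the "top X%" requires a ranking of concepts. The slides do not state the ranking criterion; this sketch assumes concepts are ranked by how often they are mentioned in the book. `top_concepts` is a hypothetical helper.

```python
import math
from collections import Counter

def top_concepts(mentions, fraction):
    """Keep the most frequently mentioned concepts.

    mentions: one entry per mention of a concept in the book.
    fraction: share of distinct concepts to keep, e.g. 0.5 for the top 50%.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    counts = Counter(mentions)
    keep = math.ceil(len(counts) * fraction)
    # most_common() orders by count; equal counts keep first-seen order
    return [concept for concept, _ in counts.most_common(keep)]

mentions = ["Chamois", "Desert", "Chamois", "Ice", "Chamois", "Desert"]
print(top_concepts(mentions, 0.5))  # → ['Chamois', 'Desert']
```

Rounding up with `math.ceil` is a design choice so that a small book with few distinct concepts still keeps at least one of them.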

Time-out

Optimized results
@prefix schema: <http://schema.org/> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix dbr: <http://dbpedia.org/resource/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix pg84: <http://www.gutenberg.org/ebooks/84.epub#> .

pg84:book a schema:Book .

pg84:book itsrdf:taIdentRef dbr:Chamois, dbr:Desert, ...

http://uvdt.test.iminds.be/storyblinkdata/books
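The optimized representation replaces per-fragment annotations with a single book-level statement. Below is a sketch of that aggregation, assuming the concept IRIs have already been filtered (for example to the top 50%); duplicates are dropped while keeping first-seen order. `book_level_turtle` is a hypothetical name, and full IRIs are used instead of prefixes so the output stands on its own.

```python
SCHEMA_BOOK = "<http://schema.org/Book>"
TA_IDENT_REF = "<http://www.w3.org/2005/11/its/rdf#taIdentRef>"

def book_level_turtle(book_iri, concept_iris):
    """Collapse a book's concept annotations into one statement with an object list."""
    unique = list(dict.fromkeys(concept_iris))  # drop duplicates, keep first-seen order
    if not unique:
        return f"<{book_iri}> a {SCHEMA_BOOK} ."
    objects = ",\n        ".join(f"<{iri}>" for iri in unique)
    return f"<{book_iri}> a {SCHEMA_BOOK} ;\n    {TA_IDENT_REF} {objects} ."

print(book_level_turtle(
    "http://www.gutenberg.org/ebooks/84.epub#book",
    ["http://dbpedia.org/resource/Chamois",
     "http://dbpedia.org/resource/Desert",
     "http://dbpedia.org/resource/Chamois"],
))
```

Dropping the CFI fragments loses where each concept occurs, but leaves one compact list of concepts per book to feed into path finding.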

StoryBlink

Exploring the links between classic works

Choose two books, and…

StoryBlink

Next steps

Scale

Indirect paths, e.g. a book about WWI and a book about WWII

Relevancy measures: knowledge base influence, filtering influence

StoryBlink gives a semantic representation of the important semantic concepts inside books, and uses those to connect books content-wise

http://uvdt.test.iminds.be/storyblink

Demo

Our project

The Publisher of the Future

Our pilot project partners:
