DBpedia: An Interlinking-Hub in the Web of Data. Chris Bizer, Christian Becker, Georgi Kobilarov. Freie Universität Berlin. London, September 4, 2008

DBpedia - An Interlinking Hub in the Web of Data

DESCRIPTION

Gives an overview of the DBpedia project and the role of DBpedia in the Web of Data, and outlines the next steps for the DBpedia project as well as ideas for using DBpedia data within the BBC.

Page 1: DBpedia - An Interlinking Hub in the Web of Data

DBpedia

An Interlinking-Hub in the Web of Data

Chris Bizer
Christian Becker
Georgi Kobilarov

Freie Universität Berlin

London, September 4, 2008

Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)

Page 2: DBpedia - An Interlinking Hub in the Web of Data

Hello

Name: Chris Bizer

Job: Associate Professor at Freie Universität Berlin, Germany

Projects
- RAP - RDF API for PHP (together with Universität Leipzig)
- D2RQ and D2R Server (together with HP Labs Bristol)
- Named Graphs and NG4J (together with HP Labs Bristol)
- Fresnel Display Vocabulary (together with MIT and INRIA)
- DBpedia (together with Universität Leipzig and OpenLink)
- Linking Open Data (community project sponsored by W3C)

Page 3: DBpedia - An Interlinking Hub in the Web of Data

Outline

1. The DBpedia Project

2. DBpedia and the Web of Data

3. What’s next for DBpedia?

4. The BBC, DBpedia, and the Web of Data


Page 4: DBpedia - An Interlinking Hub in the Web of Data

DBpedia

DBpedia is a community effort to
- extract structured information from Wikipedia
- make this information available on the Web under an open license
- interlink the DBpedia dataset with other open datasets on the Web

Contributors
- Freie Universität Berlin (Germany)
- Universität Leipzig (Germany)
- OpenLink Software (UK)
- Linking Open Data Community (W3C SWEO)

Page 5: DBpedia - An Interlinking Hub in the Web of Data

Structured Information in Wikipedia

Wikipedia consists of
- 11.2 million articles (2.49 million in English)
- in 264 languages
- monthly growth-rate: 4%

Wikipedia articles contain structured information
- infoboxes which use a template mechanism
- categorization of the article
- images depicting the article's topic
- links to external webpages
- intra-wiki links to other articles
- inter-language links to articles about the same topic in different languages

Page 6: DBpedia - An Interlinking Hub in the Web of Data

Extracting Structured Information from Wikipedia

http://en.wikipedia.org/wiki/Calgary

<http://dbpedia.org/resource/Calgary>
    dbpedia:native_name "Calgary" ;
    dbpedia:elevation "1048" ;
    dbpedia:population_city "988193" ;
    dbpedia:population_metro "1079310" ;
    mayor_name dbpedia:Dave_Bronconnier ;
    governing_body dbpedia:Calgary_City_Council ;
    ...

using a PHP extraction framework
framework session: Tomorrow
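
The step from infobox to triples shown on this slide can be sketched roughly in Python. This is an illustration only, not the project's PHP extraction framework: the wikitext snippet, the function name infobox_to_triples, the rule that a [[wiki link]] becomes a dbpedia: resource, and the dbpedia: prefix on every property are assumptions made for the example.

```python
import re

# Hypothetical, simplified infobox wikitext (not the real article source).
WIKITEXT = """{{Infobox Settlement
| native_name     = Calgary
| elevation       = 1048
| population_city = 988193
| mayor_name      = [[Dave Bronconnier]]
| governing_body  = [[Calgary City Council]]
}}"""

# "| key = value" lines inside the template.
LINE = re.compile(r"^\|\s*(\w+)\s*=\s*(.+?)\s*$")
# A value that consists of exactly one wiki link, optionally with a label.
LINK = re.compile(r"^\[\[([^\]|]+)(?:\|[^\]]*)?\]\]$")


def infobox_to_triples(subject, wikitext):
    """Turn infobox key/value lines into (subject, predicate, object) tuples."""
    triples = []
    for line in wikitext.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # skips the template's opening and closing lines
        key, value = match.groups()
        link = LINK.match(value)
        if link:
            # A wiki link names another article, so emit a resource reference.
            obj = "dbpedia:" + link.group(1).strip().replace(" ", "_")
        else:
            obj = '"%s"' % value  # everything else stays a plain literal
        triples.append((subject, "dbpedia:" + key, obj))
    return triples


triples = infobox_to_triples("<http://dbpedia.org/resource/Calgary>", WIKITEXT)
for s, p, o in triples:
    print(s, p, o, ".")
```

Real infoboxes are far less regular than this snippet (nested templates, units, multiple links per value), which is the data-quality problem the later slides return to.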


Page 7: DBpedia - An Interlinking Hub in the Web of Data

The DBpedia Dataset

Data about 2.49 million "things", including at least
- 108,000 persons
- 392,000 places
- 57,000 music albums
- 36,000 films
- 80,000 species

Altogether 218 million pieces of information (RDF triples)
- 29 million triples originate from infoboxes
- 588,000 links to pictures
- 3,150,000 links to relevant external web pages
- 207,000 Wikipedia categories
- 75,000 YAGO categories

Page 8: DBpedia - An Interlinking Hub in the Web of Data

Multi-Lingual Abstracts

The dataset contains a short and a long abstract for each concept.

Short abstracts
- English: 2,490,000
- German: 391,000
- French: 383,000
- Dutch: 284,000
- Polish: 256,000
- Italian: 286,000
- Spanish: 226,000
- Japanese: 199,000
- Portuguese: 246,000
- Swedish: 144,000
- Chinese: 101,000

Page 9: DBpedia - An Interlinking Hub in the Web of Data

Accessing the DBpedia Dataset over the Web

1. SPARQL Endpoint

2. Linked Data Interface

3. Data Dumps for Download
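
As a rough illustration of the second access path, a Linked Data client dereferences a resource URI over HTTP and asks for RDF instead of an HTML page. The sketch below only builds such a request and does not send it. The Accept value application/rdf+xml and the helper name build_linked_data_request are assumptions for the example; the slides do not say which formats the interface serves.

```python
from urllib.request import Request


def build_linked_data_request(resource_uri):
    """Prepare (but do not send) an HTTP GET that asks for an RDF representation."""
    # Assumption: the server honours content negotiation via the Accept header.
    return Request(resource_uri, headers={"Accept": "application/rdf+xml"})


request = build_linked_data_request("http://dbpedia.org/resource/Calgary")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the prepared request to urllib.request.urlopen would perform the actual lookup.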


Page 10: DBpedia - An Interlinking Hub in the Web of Data

The DBpedia SPARQL Endpoint

http://dbpedia.org/sparql

hosted on an OpenLink Virtuoso server

can answer SPARQL queries like
- Give me all sitcoms that are set in NYC?
- All German musicians that were born in Berlin in the 19th century?
- All tennis players from Moscow?
- All films by Quentin Tarantino?
- All soccer players with shirt number 11, playing for a club having a stadium with over 40,000 seats, who were born in a country with over 10 million inhabitants?
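
One of the questions above, "All films by Quentin Tarantino?", could be put to the endpoint along the following lines. This is a hedged sketch: the endpoint URL comes from the slide, but the property URI http://dbpedia.org/property/director and the resource URI for the director are assumptions about the dataset's vocabulary, and the code only builds the request URL without sending it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint named on the slide

# Assumed vocabulary: both URIs in the pattern are illustrative guesses.
QUERY = """
SELECT ?film WHERE {
  ?film <http://dbpedia.org/property/director>
        <http://dbpedia.org/resource/Quentin_Tarantino> .
}
"""


def build_query_url(endpoint, query):
    """Encode a SPARQL query as an HTTP GET URL using the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again shows that the query survives the round trip.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded)
```

The more elaborate questions on the slide would need longer graph patterns with filters, but they would be sent to the endpoint in the same way.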


Page 11: DBpedia - An Interlinking Hub in the Web of Data

Improving Wikipedia Search


Page 12: DBpedia - An Interlinking Hub in the Web of Data

2. DBpedia and the Web of Data


Page 13: DBpedia - An Interlinking Hub in the Web of Data

The Web of Documents

The Web is a single information space built on open standards and hyperlinks.

[Diagram: Web Browsers and Search Engines access HTML documents in A, B, C, and D over HTTP; the HTML documents are connected by hyperlinks.]

Page 14: DBpedia - An Interlinking Hub in the Web of Data

Linked Data

Use RDF and HTTP to
1. publish structured data on the Web,
2. set links between data from one data source to data within other data sources.

[Diagram: things in data sources A, B, C, D, and E, connected across sources by data links.]
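
A data link of this kind is itself an RDF triple whose subject lives in one data source and whose object lives in another. The sketch below serializes one such link as an N-Triples statement. The slide does not name a link predicate; owl:sameAs is used here as an assumption, and the example.org target URI is a placeholder rather than a real dataset.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def rdf_link(source_uri, target_uri, predicate=OWL_SAME_AS):
    """Serialize one RDF link between two data sources as an N-Triples line."""
    return "<%s> <%s> <%s> ." % (source_uri, predicate, target_uri)


link = rdf_link(
    "http://dbpedia.org/resource/Calgary",
    "http://example.org/places/calgary",  # placeholder target in another source
)
print(link)
```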


Page 15: DBpedia - An Interlinking Hub in the Web of Data

What can I do with this?

[Diagram: Linked Data Browsers, Search Engines, and Linked Data Mashups access things in data sources A, B, C, D, and E over HTTP; the things are connected by data links.]

Page 16: DBpedia - An Interlinking Hub in the Web of Data


Page 17: DBpedia - An Interlinking Hub in the Web of Data


Page 18: DBpedia - An Interlinking Hub in the Web of Data

Falcons


Page 19: DBpedia - An Interlinking Hub in the Web of Data

DBtune Slashfacet

- visualizes music-related Linked Data
- uses LastFM, MySpace, and BBC data

Page 20: DBpedia - An Interlinking Hub in the Web of Data

DBpedia Mobile

Geospatial entry point into the Web of Data

Starts with DBpedia, Revyu and Flickr data

Page 21: DBpedia - An Interlinking Hub in the Web of Data

DERI Semantic Web Pipes


Page 22: DBpedia - An Interlinking Hub in the Web of Data

Is the Web of Data real?

[Diagram: things in data sources A, B, C, D, and E, connected across sources by typed links.]

Page 23: DBpedia - An Interlinking Hub in the Web of Data

W3C Linking Open Data Project

Community effort to
- publish existing open license datasets as Linked Data on the Web
- interlink things between different data sources

Page 24: DBpedia - An Interlinking Hub in the Web of Data

LOD Datasets on the Web: May 2007

Over 500 million RDF triples.

Page 25: DBpedia - An Interlinking Hub in the Web of Data

LOD Datasets on the Web: April 2008

Over 2 billion RDF triples.

Page 26: DBpedia - An Interlinking Hub in the Web of Data

DBpedia as Interlinking Hub


Page 27: DBpedia - An Interlinking Hub in the Web of Data

3. What is next for DBpedia?


Page 28: DBpedia - An Interlinking Hub in the Web of Data

Improve the Quality of Extracted Data

Problem
- chaotic usage of infoboxes within Wikipedia

Solution
- smarter version of the infobox extractor
- smushes multiple properties with the same meaning
- smushes different infoboxes for the same class
- uses knowledge about property ranges
- generates a clean class hierarchy

Status
- clean dataset available within the next 2 weeks
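
The idea of smushing multiple properties with the same meaning can be sketched as a lookup from raw infobox keys to one canonical property. This is a minimal illustration, not the extractor's actual logic: the synonym table, the canonical names, and the "first value wins" rule are all hypothetical.

```python
# Hypothetical synonym table: raw infobox key -> canonical property name.
CANONICAL = {
    "population_city": "population",
    "population_total": "population",
    "pop": "population",
    "elevation_m": "elevation",
}


def smush(infobox):
    """Collapse synonymous infobox keys onto one canonical property each."""
    merged = {}
    for key, value in infobox.items():
        canonical = CANONICAL.get(key, key)  # unknown keys pass through unchanged
        merged.setdefault(canonical, value)  # assumption: the first value wins
    return merged


raw = {"pop": "988193", "elevation_m": "1048", "mayor_name": "Dave Bronconnier"}
clean = smush(raw)
print(clean)
```

In practice, building and maintaining the synonym table is the hard part, and conflicting values for one canonical property would need a better rule than keeping the first.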


Page 29: DBpedia - An Interlinking Hub in the Web of Data

Better Interfaces for Common Wikipedia Users

Cooperation with Neofonie (Berlin search engine company)

Direction: free-text search + facet-browsing


Page 30: DBpedia - An Interlinking Hub in the Web of Data

Cross-Language Data Fusion

Opportunity
- there are 264 Wikipedia editions in different languages.
- there are cross-language links.
- the Italian Wikipedia knows more about Italian villages than the English one.
- the German Wikipedia contains more person infoboxes than the English one.

Idea
- Augment the infobox dataset with facts from other Wikipedia editions.

Result
- A much richer DBpedia dataset.
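
The fusion idea can be sketched as filling gaps in one edition's facts with facts from another edition of the same article. The function name fuse, the rule that the primary edition is never overwritten, and all values below are assumptions made up for illustration; the sketch also assumes that property names have already been aligned across languages.

```python
def fuse(primary, *others):
    """Fill gaps in `primary` with facts from other editions, never overwriting."""
    fused = dict(primary)
    for edition in others:
        for prop, value in edition.items():
            fused.setdefault(prop, value)  # keep the primary value if present
    return fused


# Made-up facts about a fictional village, for illustration only.
english = {"name": "Esempio", "region": "Liguria"}
italian = {"name": "Esempio", "region": "Liguria (IT)", "population": "1234"}

fused = fuse(english, italian)
print(fused)
```

Here the English edition keeps its own value for region and gains the population fact that only the Italian edition has.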


Page 31: DBpedia - An Interlinking Hub in the Web of Data

Augment DBpedia with Data from External Sources

Opportunity
- the Linking Open Data cloud provides lots of useful data which is not contained in Wikipedia yet. For instance:
  - EuroStat provides additional statistical information about countries.
  - Musicbrainz contains additional information about other bands.
  - Geonames provides additional information about locations.

Idea
- Augment DBpedia with additional data from external sources.

Result
- A much richer DBpedia dataset.

Page 32: DBpedia - An Interlinking Hub in the Web of Data

Live Update

Current Situation
- DBpedia update cycle: 3 months
- Wikipedia provides a (commercial) live update stream

Opportunity
- Increase the currency of the DBpedia dataset using this update stream

Result
- DBpedia in synchronization with Wikipedia.

Page 33: DBpedia - An Interlinking Hub in the Web of Data

Contribute back to the Wikipedia Community

Opportunity
- augmentation with data from the LOD cloud makes the DBpedia dataset richer than Wikipedia itself.
- infobox data is extracted from Wikipedia editions in various languages.

Idea
- Extend the Wikipedia authoring environment with
  - Suggestions for infobox values
  - Cross-language consistency checking for infoboxes
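
Cross-language consistency checking could, in its simplest form, compare the value each language edition gives for the same property and report the properties where editions disagree. The sketch below is one hypothetical way to do that: it assumes property names are already aligned across editions, compares values as plain strings, and uses a made-up German value.

```python
def inconsistencies(editions):
    """Return {property: {language: value}} for properties whose values differ."""
    by_property = {}
    for language, infobox in editions.items():
        for prop, value in infobox.items():
            by_property.setdefault(prop, {})[language] = value
    return {
        prop: values
        for prop, values in by_property.items()
        if len(set(values.values())) > 1  # more than one distinct value
    }


editions = {
    "en": {"elevation": "1048", "population": "988193"},
    "de": {"elevation": "1048", "population": "1019942"},  # made-up value
}
conflicts = inconsistencies(editions)
print(conflicts)
```

A report like this could be shown to an editor as a hint; deciding which edition is right still needs a human or a trusted external source.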

Initialize Wikipedia Clean-Up Cycles
- Data-driven search interfaces expose the weaknesses of Wikipedia's template system.
- Preferred items not showing up in end-user interfaces may motivate Wikipedia editors to use templates more stringently.

Page 34: DBpedia - An Interlinking Hub in the Web of Data

Organization of the DBpedia Project

Present
- members' spare time
- cross-financed from research grants about other topics
- master theses

Future
- research grants
- industrial grants
- DBpedia Foundation

Page 35: DBpedia - An Interlinking Hub in the Web of Data

4. The BBC, DBpedia and the Web of Data


Page 36: DBpedia - An Interlinking Hub in the Web of Data

What benefits could DBpedia and the Web of Data provide for the BBC?

Page 37: DBpedia - An Interlinking Hub in the Web of Data

DBpedia as Controlled Vocabulary

DBpedia provides identifiers for concepts from various domains.

Identifiers are backed with concept descriptions (data, multi-lingual abstracts).

DBpedia evolves as Wikipedia changes.

BBC CIS / DBpedia Interlinking (session at 3 pm)

Page 38: DBpedia - An Interlinking Hub in the Web of Data

DBpedia as Data Supplier for the BBC

DBpedia provides data about a wide range of domains
- 2.49 million things
- 218 million facts
- multi-lingual abstracts

DBpedia provides four classification hierarchies
- Wikipedia categories
- Infobox-based taxonomy
- YAGO taxonomy
- OpenCyc

DBpedia evolves as Wikipedia changes

Page 39: DBpedia - An Interlinking Hub in the Web of Data

The Web of Data as Data Supplier for the BBC

The Web of Data is growing rapidly.

Being interlinked with DBpedia allows the BBC to access various datasets.

Page 40: DBpedia - An Interlinking Hub in the Web of Data

What contribution could the BBC provide to the Web of Data?

Page 41: DBpedia - An Interlinking Hub in the Web of Data

Un-silo BBC content

The BBC provides lots of high-quality content.

Publishing RDF metadata about content provides the basis for intelligent mashups.

(mockup)


Page 42: DBpedia - An Interlinking Hub in the Web of Data

Freebase and DBpedia

Freebase
- commercial company
- 57 million US$ venture capital
- has to make money in the long run

DBpedia
- open community effort (as the Wikipedia community itself)
- 0 US$
- aims at improving Wikipedia and the Web itself

Page 43: DBpedia - An Interlinking Hub in the Web of Data

Thanks!

References

DBpedia
http://dbpedia.org/About

W3C Linking Open Data Project
http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData

Tutorial: How to Publish Linked Data on the Web
http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/