Gives an overview of the DBpedia project and the role of DBpedia in the Web of Data, outlines the next steps for the DBpedia project, and presents ideas for using DBpedia data within the BBC.
DBpedia
An Interlinking-Hub in the Web of Data
Chris Bizer, Christian Becker, Georgi Kobilarov
Freie Universität Berlin
Christian Bizer: An Interlinking-Hub in the Web of Data (9/4/2008)
London, September 4, 2008
Hello
Name Chris Bizer
Job Associate Professor at Freie Universität Berlin, Germany
Projects
- RAP - RDF API for PHP (together with Universität Leipzig)
- D2RQ and D2R Server (together with HP Labs Bristol)
- Named Graphs and NG4J (together with HP Labs Bristol)
- Fresnel Display Vocabulary (together with MIT and INRIA)
- DBpedia (together with Universität Leipzig and OpenLink)
- Linking Open Data (community project sponsored by W3C)
Outline
1. The DBpedia Project
2. DBpedia and the Web of Data
3. What’s next for DBpedia?
4. The BBC, DBpedia, and the Web of Data
DBpedia
DBpedia is a community effort to
- extract structured information from Wikipedia
- make this information available on the Web under an open license
- interlink the DBpedia dataset with other open datasets on the Web
Contributors
- Freie Universität Berlin (Germany)
- Universität Leipzig (Germany)
- OpenLink Software (UK)
- Linking Open Data Community (W3C SWEO)
Structured Information in Wikipedia
Wikipedia consists of
- 11.2 million articles (2.49 million in English)
- in 264 languages
- monthly growth rate: 4%
Wikipedia articles contain structured information
- infoboxes which use a template mechanism
- categorization of the article
- images depicting the article's topic
- links to external webpages
- intra-wiki links to other articles
- inter-language links to articles about the same topic in different languages
Extracting Structured Information from Wikipedia
http://en.wikipedia.org/wiki/Calgary
<http://dbpedia.org/resource/Calgary>
    dbpedia:native_name "Calgary" ;
    dbpedia:elevation "1048" ;
    dbpedia:population_city "988193" ;
    dbpedia:population_metro "1079310" ;
    dbpedia:mayor_name dbpedia:Dave_Bronconnier ;
    dbpedia:governing_body dbpedia:Calgary_City_Council ;
    ...
using a PHP extraction framework
framework session: tomorrow
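The extraction step above can be illustrated with a small sketch. It is written in Python rather than PHP and is not the project's extraction framework; the `| key = value` infobox syntax is standard wikitext, but the sample infobox, the function name `infobox_to_triples`, and the choice of the `http://dbpedia.org/property/` namespace for predicates are assumptions made for illustration.

```python
import re

# Namespaces assumed for this sketch.
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"
DBPEDIA_PROPERTY = "http://dbpedia.org/property/"


def infobox_to_triples(title, wikitext):
    """Turn `| key = value` infobox lines into (subject, predicate, object) triples."""
    subject = DBPEDIA_RESOURCE + title.replace(" ", "_")
    triples = []
    for line in wikitext.splitlines():
        match = re.match(r"^\s*\|\s*([^=|]+?)\s*=\s*(.*?)\s*$", line)
        if not match:
            continue  # not an infobox attribute line
        key, value = match.groups()
        if not value:
            continue  # attribute present but left empty
        predicate = DBPEDIA_PROPERTY + key.replace(" ", "_")
        # A value that is exactly one wiki link becomes a resource URI.
        link = re.fullmatch(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]", value)
        if link:
            obj = DBPEDIA_RESOURCE + link.group(1).strip().replace(" ", "_")
        else:
            obj = value  # everything else stays a plain literal
        triples.append((subject, predicate, obj))
    return triples


# Hypothetical, heavily shortened infobox for the Calgary article.
SAMPLE = """{{Infobox settlement
| native_name = Calgary
| elevation = 1048
| population_city = 988193
| mayor_name = [[Dave Bronconnier]]
}}"""

triples = infobox_to_triples("Calgary", SAMPLE)
for triple in triples:
    print(triple)
```

Real infoboxes are far messier (nested templates, units, multiple links per value), which is why a dedicated framework is needed.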
The DBpedia Dataset
Data about 2.49 million "things", including at least
- 108,000 persons
- 392,000 places
- 57,000 music albums
- 36,000 films
- 80,000 species
Altogether 218 million pieces of information (RDF triples)
- 29 million triples originate from infoboxes
- 588,000 links to pictures
- 3,150,000 links to relevant external web pages
- 207,000 Wikipedia categories
- 75,000 YAGO categories
Multi-Lingual Abstracts
The dataset contains a short and a long abstract for each concept.
Short abstracts
- English: 2,490,000
- German: 391,000
- French: 383,000
- Dutch: 284,000
- Polish: 256,000
- Italian: 286,000
- Spanish: 226,000
- Japanese: 199,000
- Portuguese: 246,000
- Swedish: 144,000
- Chinese: 101,000
Accessing the DBpedia Dataset over the Web
1. SPARQL Endpoint
2. Linked Data Interface
3. Data Dumps for Download
The DBpedia SPARQL Endpoint
http://dbpedia.org/sparql
hosted on an OpenLink Virtuoso server
can answer SPARQL queries like
- Give me all sitcoms that are set in NYC?
- All German musicians who were born in Berlin in the 19th century?
- All tennis players from Moscow?
- All films by Quentin Tarantino?
- All soccer players with shirt number 11 who play for a club whose stadium has over 40,000 seats and who were born in a country with over 10 million inhabitants?
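As a rough sketch of how one of these questions could be sent to the endpoint, the Python snippet below builds a request URL for http://dbpedia.org/sparql. The `query` parameter comes from the SPARQL protocol; the `format` parameter is a Virtuoso convenience and the property name `p:director` is an assumption about how the infobox attribute is named in the dataset, so the query may need adjusting against the live data.

```python
from urllib.parse import urlencode, urlparse, parse_qs

ENDPOINT = "http://dbpedia.org/sparql"

# "All films by Quentin Tarantino" as a SPARQL query.
# The predicate p:director is an assumed property name, not confirmed by the slides.
QUERY = """
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX p: <http://dbpedia.org/property/>
SELECT ?film WHERE {
  ?film p:director dbpedia:Quentin_Tarantino .
}
"""


def build_query_url(query, result_format="application/sparql-results+json"):
    """Encode a SPARQL query as a GET URL for the endpoint."""
    params = {"query": query.strip(), "format": result_format}
    return ENDPOINT + "?" + urlencode(params)


url = build_query_url(QUERY)
print(url)
```

Fetching the URL with any HTTP client (for example `urllib.request.urlopen`) would return the result bindings; the network call is left out so the sketch runs offline.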
Improving Wikipedia Search
2. DBpedia and the Web of Data
The Web of Documents
The Web is a single information space built on open standards and hyperlinks.
(diagram: Web browsers and search engines access HTML documents on servers A, B, C, and D over HTTP; hyperlinks connect the documents)
Linked Data
Use RDF and HTTP to
1. publish structured data on the Web,
2. set links between data from one data source to data within other data sources.
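The idea of a data link can be sketched in a few lines of Python. Two data sources are modelled as in-memory lists of triples; the second source, its `example.org` URIs, and its values are made up for illustration, and `owl:sameAs` is used as one common link type. A real client would dereference the URIs over HTTP instead of looking them up locally.

```python
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Data source A: a fragment of DBpedia, including one link into source B.
source_a = [
    ("http://dbpedia.org/resource/Calgary",
     "http://dbpedia.org/property/elevation", "1048"),
    ("http://dbpedia.org/resource/Calgary",
     SAME_AS, "http://example.org/places/calgary"),
]

# Data source B: a hypothetical second dataset describing the same thing.
source_b = [
    ("http://example.org/places/calgary",
     "http://example.org/vocab/timezone", "America/Edmonton"),
]


def describe(uri, sources):
    """Collect facts about a thing, following sameAs links across sources."""
    seen, todo, facts = set(), [uri], []
    while todo:
        current = todo.pop()
        if current in seen:
            continue
        seen.add(current)
        for triples in sources:
            for subject, predicate, obj in triples:
                if subject != current:
                    continue
                if predicate == SAME_AS:
                    todo.append(obj)  # follow the link into the other source
                else:
                    facts.append((predicate, obj))
    return sorted(facts)


facts = describe("http://dbpedia.org/resource/Calgary", [source_a, source_b])
for fact in facts:
    print(fact)
```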
(diagram: things in data sources A, B, C, D, and E connected by data links)
What can I do with this?
(diagram: Linked Data browsers, search engines, and Linked Data mashups access things in data sources A to E over HTTP; data links connect the things)
Falcons
DBtune Slashfacet
visualizes music-related Linked Data
uses LastFM, MySpace, and BBC data
DBpedia Mobile
Geospatial entry point into the Web of Data
Starts with DBpedia, Revyu, and Flickr data
DERI Semantic Web Pipes
Is the Web of Data real?
(diagram: things in data sources A to E connected by typed links)
W3C Linking Open Data Project
Community effort to
- publish existing open license datasets as Linked Data on the Web
- interlink things between different data sources
LOD Datasets on the Web: May 2007
Over 500 million RDF triples.
LOD Datasets on the Web: April 2008
Over 2 billion RDF triples.
DBpedia as Interlinking Hub
3. What is next for DBpedia?
Improve the Quality of Extracted Data
Problem
- chaotic usage of infoboxes within Wikipedia
Solution
- smarter version of the infobox extractor
- smushes multiple properties with the same meaning
- smushes different infoboxes for the same class
- uses knowledge about property ranges
- generates a clean class hierarchy
Status
- clean dataset available within the next 2 weeks
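A minimal sketch of what smushing properties and using range knowledge could look like, assuming a hand-written synonym table and range table. Both tables and all property names below are hypothetical; the slides do not say how the improved extractor represents them.

```python
# Hypothetical synonym table: infobox attribute variants -> one canonical property.
CANONICAL = {
    "birthplace": "birth_place",
    "place_of_birth": "birth_place",
    "born": "birth_place",
    "pop": "population",
    "population_total": "population",
}

# Hypothetical range knowledge: expected Python type per canonical property.
RANGES = {"population": int}


def smush(raw):
    """Map attribute variants onto canonical properties and enforce ranges."""
    clean = {}
    for key, value in raw.items():
        name = key.strip().lower().replace(" ", "_")
        name = CANONICAL.get(name, name)
        caster = RANGES.get(name)
        if caster is not None:
            try:
                value = caster(str(value).replace(",", ""))
            except ValueError:
                continue  # value does not fit the expected range, so drop it
        clean.setdefault(name, value)  # first value wins if variants collide
    return clean


print(smush({"Place of birth": "Berlin", "pop": "988,193"}))
```

Keeping the first value on collisions is only one possible policy; a production extractor would more likely record the conflict.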
Better Interfaces for Common Wikipedia Users
Cooperation with Neophonie (Berlin search engine company)
Direction: free-text search + facet-browsing
Cross-Language Data Fusion
Opportunity
- there are 264 Wikipedia editions in different languages.
- there are cross-language links.
- the Italian Wikipedia knows more about Italian villages than the English one.
- the German Wikipedia contains more person infoboxes than the English one.
Idea
- Augment the infobox dataset with facts from other Wikipedia editions.
Result
- A much richer DBpedia dataset.
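One simple way to picture the fusion idea is a merge with a language preference order: a fact is taken from the first edition that provides it. The sketch below assumes the facts of each edition have already been mapped to shared property names; the village data and the function name `fuse_editions` are invented for illustration.

```python
def fuse_editions(editions, preference):
    """Merge per-language facts, keeping the value from the first preferred edition."""
    fused = {}
    for lang in preference:
        for prop, value in editions.get(lang, {}).items():
            # setdefault keeps an earlier (more preferred) value if one exists
            fused.setdefault(prop, (value, lang))
    return fused


# Hypothetical facts about one Italian village from two Wikipedia editions.
editions = {
    "en": {"country": "Italy"},
    "it": {"country": "Italia", "population": "1250", "elevation": "310"},
}

fused = fuse_editions(editions, preference=["en", "it"])
for prop, (value, lang) in fused.items():
    print(prop, value, "from", lang)
```

Recording the source language next to each value keeps the provenance of the fused facts visible.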
Augment DBpedia with Data from External Sources
Opportunity
- the Linking Open Data cloud provides lots of useful data which is not contained in Wikipedia yet.
- For instance:
  - EuroStat provides additional statistical information about countries.
  - Musicbrainz contains additional information about other bands.
  - Geonames provides additional information about locations.
Idea
- Augment DBpedia with additional data from external sources.
Result
- A much richer DBpedia dataset.
Live Update
Current Situation
- DBpedia update cycle: 3 months
- Wikipedia provides a (commercial) live update stream
Opportunity
- Increase the currency of the DBpedia dataset using this update stream
Result
- DBpedia in synchronization with Wikipedia.
Contribute back to the Wikipedia Community
Opportunity
- augmentation with data from the LOD cloud makes the DBpedia dataset richer than Wikipedia itself.
- infobox data is extracted from Wikipedia editions in various languages.
Idea
- Extend the Wikipedia authoring environment with
  - Suggestions for infobox values
  - Cross-language consistency checking for infoboxes
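Cross-language consistency checking could, in its simplest form, compare the values that different editions give for the same property and report disagreements to the editor. The sketch below assumes the properties are already aligned across languages; the German elevation value is made up to produce a conflict.

```python
def find_conflicts(editions):
    """Return properties whose values differ between language editions."""
    by_property = {}
    for lang, facts in editions.items():
        for prop, value in facts.items():
            by_property.setdefault(prop, {})[lang] = value
    return {
        prop: values
        for prop, values in by_property.items()
        if len(set(values.values())) > 1  # more than one distinct value
    }


# Infobox values for the same article; the "de" elevation is hypothetical.
editions = {
    "en": {"population": "988193", "elevation": "1048"},
    "de": {"population": "988193", "elevation": "1045"},
}

conflicts = find_conflicts(editions)
print(conflicts)
```

An authoring tool could show such conflicts next to the infobox being edited and offer the other edition's value as a suggestion.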
Initialize Wikipedia Clean-Up Cycles
- Data-driven search interfaces expose the weaknesses of the Wikipedia template system.
- Preferred items not showing up in end-user interfaces may motivate Wikipedia editors to use templates more stringently.
Organization of the DBpedia Project
Present
- members' spare time
- cross-financed from research grants about other topics
- master's theses
Future
- research grants
- industrial grants
- DBpedia Foundation
4. The BBC, DBpedia and the Web of Data
What benefits could DBpedia and the Web of Data provide for the BBC?
DBpedia as Controlled Vocabulary
DBpedia provides identifiers for concepts from various domains.
Identifiers are backed with concept descriptions (data, multi-lingual abstracts).
DBpedia evolves as Wikipedia changes.
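Because each DBpedia identifier mirrors the title of an English Wikipedia article (as in the Calgary example earlier), a content system that already stores Wikipedia links could derive the matching identifier mechanically. The helper below is a sketch under that assumption; the function name `dbpedia_uri` is hypothetical.

```python
from urllib.parse import urlparse

RESOURCE_NAMESPACE = "http://dbpedia.org/resource/"


def dbpedia_uri(wikipedia_url):
    """Derive the DBpedia identifier for an English Wikipedia article URL."""
    parsed = urlparse(wikipedia_url)
    prefix = "/wiki/"
    if parsed.netloc != "en.wikipedia.org" or not parsed.path.startswith(prefix):
        raise ValueError("not an English Wikipedia article URL: " + wikipedia_url)
    # The article title after /wiki/ is reused unchanged as the local name.
    return RESOURCE_NAMESPACE + parsed.path[len(prefix):]


print(dbpedia_uri("http://en.wikipedia.org/wiki/Calgary"))
```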
BBC CIS / DBpedia Interlinking (session at 3 pm)
DBpedia as Data Supplier for the BBC
DBpedia provides data about a wide range of domains
- 2.49 million things
- 218 million facts
- multi-lingual abstracts
DBpedia provides four classification hierarchies
- Wikipedia categories
- Infobox-based taxonomy
- YAGO taxonomy
- OpenCyc
DBpedia evolves as Wikipedia changes
The Web of Data as Data Supplier for the BBC
The Web of Data is growing rapidly.
Being interlinked with DBpedia allows the BBC to access various datasets.
What contribution could the BBC provide to the Web of Data?
Un-silo BBC content
The BBC provides lots of high-quality content.
Publishing RDF metadata about content provides the basis for intelligent mashups.
(mockup)
Freebase and DBpedia
Freebase
- commercial company
- 57 million US$ venture capital
- has to make money in the long run
DBpedia
- open community effort (as the Wikipedia community itself)
- 0 US$
- aims at improving Wikipedia and the Web itself
Thanks!
References
DBpedia
http://dbpedia.org/About
W3C Linking Open Data Project
http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
Tutorial: How to Publish Linked Data on the Web
http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/