A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight. Pablo N. Mendes, Christian Bizer, [email protected], Web Based Systems Group, Freie Universität Berlin. SemTechBiz Berlin, February 6th 2012

A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight - SemTech Berlin 2012

DESCRIPTION

A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight. Presented at SemTech Berlin 2012.

Wikipedia is one of the most important repositories of human knowledge, containing millions of interlinked articles. The DBpedia project extracts and combines Wikipedia information into a large multilingual knowledge base that enables semantic processing in a wide range of applications. We have built DBpedia Spotlight, a tool that recognizes ambiguous terms in text and automatically assigns unambiguous definitions to those terms by connecting them to DBpedia. Such interconnection enriches information by providing explicit semantic relationships, enabling semantic indexing and faceted exploration, among other data processing enhancements.

In this talk we will describe how DBpedia Spotlight can be applied to establish a virtuous cycle of semantic enhancement. On the one hand, it can enhance knowledge interconnectivity in document collections. On the other hand, it learns how to annotate better from user feedback. Such a positive feedback loop can be applied to Wikipedia itself, or in enterprises to alleviate the cold start problem and reduce knowledge management costs.

Page 1: A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight - SemTech Berlin 2012

A Virtuous Cycle of Semantic Enhancement with

DBpedia Spotlight

Pablo N. Mendes, Christian Bizer

[email protected]

Web Based Systems Group

Freie Universität Berlin

SemTechBiz Berlin

February 6th 2012

Page 2: A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight - SemTech Berlin 2012

FREIE UNIVERSITÄT BERLIN

http://wbsg.de

SemTechBiz Berlin, February 2012

Pablo N. Mendes, Christian Bizer: A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight

Agenda

• What do we mean by semantic enhancement?

• How does DBpedia Spotlight work?

• From Wikipedia to DBpedia Spotlight

• From DBpedia Spotlight to Wikipedia

• In your project

Can you also enable a virtuous cycle?

Page 3: A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight - SemTech Berlin 2012

Semantic Enhancement?

• Generally:

– Making something easier to understand

• For humans:

– Say what you mean (reduce ambiguity)

– Make associations

– Access to definitions, background

• For machines:

– the same, but in structured format

Page 4: A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight - SemTech Berlin 2012

Semantic Enhancement (Example)

News Annotation

• Links to “topics”

• Topic pages lead to related content

Semantic Enhancement

• Links text to unique identifiers

• Adds background information

• Interconnects related content

http://nyti.ms/qsYAyt

Page 5: A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight - SemTech Berlin 2012

A Virtuous Cycle of Semantic Enhancement

Page 6: A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight - SemTech Berlin 2012

DBpedia Spotlight

• DBpedia is a collection of entity descriptions extracted from Wikipedia & shared as linked data

• DBpedia Spotlight uses data from DBpedia and text from associated Wikipedia pages

• Learns how to recognize that a DBpedia resource was mentioned

• Given plain text as input, generates annotated text

Page 7: A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight - SemTech Berlin 2012

DBpedia Spotlight: Text Annotation

• From:

(…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.

• To:

(…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.

– “New York”: http://dbpedia.org/resource/New_York_City

– “Apple Corps”: http://dbpedia.org/resource/Apple_Corps
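One way to picture the annotated output is as the same sentence with each recognized span turned into a link to its DBpedia resource. The sketch below is only an illustration: the `(offset, surface_form, uri)` tuple layout and the function name `render_annotations` are assumptions made for this example, not the format DBpedia Spotlight itself produces.

```python
from html import escape

def render_annotations(text, annotations):
    """Wrap each annotated span of `text` in an HTML link to its URI.

    `annotations` is a list of (offset, surface_form, uri) tuples; this
    layout is a hypothetical one chosen for the example.
    """
    out = []
    cursor = 0
    for offset, surface, uri in sorted(annotations):
        # Skip annotations that overlap an earlier one or do not match the text.
        if offset < cursor or text[offset:offset + len(surface)] != surface:
            continue
        out.append(escape(text[cursor:offset]))
        out.append('<a href="{}">{}</a>'.format(escape(uri, quote=True), escape(surface)))
        cursor = offset + len(surface)
    out.append(escape(text[cursor:]))
    return "".join(out)

TEXT = ("Upon their return, Lennon and McCartney went to New York "
        "to announce the formation of Apple Corps.")
ANNOTATIONS = [
    (TEXT.index("New York"), "New York", "http://dbpedia.org/resource/New_York_City"),
    (TEXT.index("Apple Corps"), "Apple Corps", "http://dbpedia.org/resource/Apple_Corps"),
]

print(render_annotations(TEXT, ANNOTATIONS))
```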

Page 8: A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight - SemTech Berlin 2012

Challenge: Term Ambiguity

• ...this apple on the palm of my hand...

• ...Apple tried to acquire Palm Inc....

• ...eating an apple sitting next to a palm tree...

• What do “apple” and “palm” mean in each case?

• Our objective is to recognize entities/topics and disambiguate their meaning, generating DBpedia annotations in text.

Page 9: A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight - SemTech Berlin 2012

Stage 1: Spotting

• Find substrings that seem worthy of annotation

• Simplest approach relies on a dictionary of known entity names.

– Other: Named Entity Recognition, Keyphrase Extraction, ...

Input:

(…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.

Output: “Lennon”, “McCartney”, “New York”, “Apple Corps”
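The dictionary-based approach to spotting can be sketched in a few lines. This is a minimal illustration, not DBpedia Spotlight's own spotter: the function name `spot` and the tiny `KNOWN_NAMES` dictionary are assumptions for the example. Longer names are tried first so that “Apple Corps” is preferred over “Apple”.

```python
import re

def spot(text, dictionary):
    """Return (offset, surface_form) pairs for dictionary names found in text."""
    if not dictionary:
        return []
    # Longest names first, so regex alternation prefers "Apple Corps" to "Apple".
    names = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    return [(m.start(), m.group(0)) for m in pattern.finditer(text)]

# Hypothetical dictionary of known entity names.
KNOWN_NAMES = {"Lennon", "McCartney", "New York", "Apple", "Apple Corps"}

TEXT = ("Upon their return, Lennon and McCartney went to New York "
        "to announce the formation of Apple Corps.")

print([surface for _, surface in spot(TEXT, KNOWN_NAMES)])
```

A real dictionary would hold millions of names, where a trie or Aho-Corasick automaton is a more usual choice than one large regular expression.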

Page 10: A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight - SemTech Berlin 2012

Stage 2: Candidate Mapping

• Find possible meanings for each of the spotted substrings.

Input (spotted names): “Lennon”, “McCartney”, “New York”, “Apple Corps”

Output (candidate map):

– “Lennon”: { Lennon_(album), Lennon,_Michigan, … }

– “McCartney”: { McCartney_(surname), Paul_McCartney, … }

– “New York”: { New_York_State, New_York_City, … }

– “Apple Corps”: { Apple_Corps }
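Candidate mapping amounts to a lookup from surface form to the set of resources it may refer to. The sketch below assumes a small in-memory index; `CANDIDATE_INDEX` lists only the candidates named on the slide, and `map_candidates` is a hypothetical helper name. A real index would be derived from Wikipedia (for example from link labels, redirects and disambiguation pages).

```python
DBPEDIA = "http://dbpedia.org/resource/"

# Illustrative index containing only the candidates shown on the slide.
CANDIDATE_INDEX = {
    "Lennon": ["Lennon_(album)", "Lennon,_Michigan"],
    "McCartney": ["McCartney_(surname)", "Paul_McCartney"],
    "New York": ["New_York_State", "New_York_City"],
    "Apple Corps": ["Apple_Corps"],
}

def map_candidates(surface_forms, index):
    """Map each spotted surface form to a list of candidate resource URIs."""
    return {sf: [DBPEDIA + name for name in index.get(sf, [])]
            for sf in surface_forms}

spotted = ["Lennon", "McCartney", "New York", "Apple Corps"]
for surface, uris in map_candidates(spotted, CANDIDATE_INDEX).items():
    print(surface, "->", uris)
```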

Page 11: A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight - SemTech Berlin 2012

Stage 3: Disambiguation

• Select the correct candidate DBpedia Resource for a given substring.

• The decision is based on the context (1) in which the substring was mentioned

(1) con·text, n.: the parts of a discourse that surround a word or passage and can throw light on its meaning

http://mw1.merriam-webster.com/dictionary/context
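To make context-based disambiguation concrete, the sketch below scores each candidate by the similarity between the words around the mention and a word-count profile of the candidate. Plain cosine similarity on raw counts is used here as a stand-in, since the slides do not specify the scoring function. The profiles are hypothetical apart from the John_Lennon counts, which are borrowed from the co-occurrence example later in the deck.

```python
import math
import re
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def disambiguate(context_text, candidates, profiles):
    """Pick the candidate whose profile best matches the surrounding text."""
    if not candidates:
        return None
    context = Counter(re.findall(r"\w+", context_text.lower()))
    return max(candidates,
               key=lambda c: cosine(context, profiles.get(c, Counter())))

# Hypothetical context profiles (lower-cased word counts per resource).
PROFILES = {
    "John_Lennon": Counter({"john": 981, "beatles": 320, "mccartney": 100}),
    "Lennon_(album)": Counter({"album": 50, "box": 20, "set": 20}),
    "Lennon,_Michigan": Counter({"michigan": 60, "village": 30, "county": 25}),
}

TEXT = ("Upon their return, Lennon and McCartney went to New York "
        "to announce the formation of Apple Corps.")

print(disambiguate(TEXT, list(PROFILES), PROFILES))
```

In this toy setting the mention of “McCartney” in the sentence is the only word shared with any profile, which is enough to favour John_Lennon.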

Page 12: A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight - SemTech Berlin 2012

Learning the Context for a resource

• Collect context for DBpedia Resources from all articles in Wikipedia

– e.g. co-occurrence statistics: John_Lennon = {John:981, Beatles:320, McCartney:100, ...}

• Types of context

– Wikipedia pages

– Definitions from disambiguation pages

– Paragraphs that link to resources

(…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.
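Co-occurrence statistics of this kind can be gathered from paragraphs that link to a resource. The sketch below is an assumption-laden illustration: it parses `[[Target|label]]` wiki links with a simple regular expression, treats every word of the paragraph as context for each linked target, and uses two made-up paragraphs as input. The function name `collect_context` is hypothetical.

```python
import re
from collections import Counter, defaultdict

# Matches [[Target]] and [[Target|label]] wiki links.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")

def collect_context(paragraphs):
    """Build a word-count profile per linked resource from wiki paragraphs."""
    profiles = defaultdict(Counter)
    for paragraph in paragraphs:
        targets = [m.group(1).strip().replace(" ", "_")
                   for m in LINK.finditer(paragraph)]
        # Replace each link with its visible label so its words stay in context.
        plain = LINK.sub(lambda m: m.group(2) or m.group(1), paragraph)
        tokens = re.findall(r"\w+", plain.lower())
        for target in targets:
            profiles[target].update(tokens)
    return profiles

# Made-up paragraphs in wiki markup, for illustration only.
PARAGRAPHS = [
    "[[John Lennon|Lennon]] and [[Paul McCartney|McCartney]] wrote songs for the Beatles.",
    "The Beatles were formed by [[John Lennon]] in Liverpool.",
]

profiles = collect_context(PARAGRAPHS)
print(profiles["John_Lennon"].most_common(3))
```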

Page 13: A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight - SemTech Berlin 2012

DBpedia Spotlight

http://spotlight.dbpedia.org/demo

Freely available Web Service; Open Source, Java/Scala

Apache V2 License.
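A client for the Web service might look roughly like the sketch below. The endpoint path `/rest/annotate` and the `text` and `confidence` query parameters are assumptions for this example and should be checked against the service's documentation. The example only builds and prints the request; `annotate` would perform the actual call and is not executed here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint; the slides only give the demo URL on the same host.
ENDPOINT = "http://spotlight.dbpedia.org/rest/annotate"

def build_request(text, confidence=0.4, endpoint=ENDPOINT):
    """Build a GET request asking for annotations of `text` as JSON."""
    query = urlencode({"text": text, "confidence": confidence})
    return Request(endpoint + "?" + query,
                   headers={"Accept": "application/json"})

def annotate(text, **kwargs):
    """Send the request and decode the JSON response (needs network access)."""
    with urlopen(build_request(text, **kwargs), timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))

request = build_request("Lennon and McCartney went to New York.")
print(request.full_url)
```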

Page 14: A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight - SemTech Berlin 2012

A Virtuous Cycle

Page 15: A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight - SemTech Berlin 2012

The “Suggest” Button

• User decides to add a link

Page 16: A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight - SemTech Berlin 2012

The “Suggest” Button

• System suggests targets, user chooses

Page 17: A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight - SemTech Berlin 2012

Sztakipedia

• Developed by Mihály Héder et al. at MTA SZTAKI (Hungarian Academy of Sciences)

• Adds a toolbar to Wikipedia that can use DBpedia Spotlight to suggest links

– Also suggests Categories, Infoboxes, Books

• Helps editors to refine knowledge in Wikipedia

– More interconnections, more entity types, more structured data!

http://pedia.sztaki.hu

Page 18: A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight - SemTech Berlin 2012

Sztakipedia (screenshots)

http://pedia.sztaki.hu

Source: http://www.youtube.com/watch?v=8VW0TrvXpl4

Page 19: A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight - SemTech Berlin 2012

Beyond Wikipedia? RDFaCE

• Developed by Ali Khalili at U. Leipzig (AKSW)

• Helps users to add RDFa markup via a WYSIWYG interface

• Can use DBpedia Spotlight, among other services, to disambiguate entity names

• Available as a WordPress plugin

– Enables blogs as sources of context for disambiguation

http://rdface.aksw.org/

Page 20: A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight - SemTech Berlin 2012

RDFaCE http://rdface.aksw.org

Page 21: A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight - SemTech Berlin 2012

A Virtuous Cycle in My Enterprise

• Select a database of entity identifiers

• Select textual sources that talk about those entities

• Use semantic enhancement editors (with automatic suggestions) to annotate text

• Use annotated text to re-train annotator

DBpedia Spotlight is ready for MediaWiki XML and TSV.

Other formats to come! Take a look at NIF http://nlp2rdf.org/nif-1-0
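The last two steps of the cycle, annotating with user confirmation and feeding the result back into training, could be connected by exporting accepted suggestions as TSV. The sketch below is hypothetical throughout: the feedback record fields and the column order (URI, surface form, context, offset) are assumptions for illustration, not the TSV layout DBpedia Spotlight expects.

```python
import csv
import io

def feedback_to_tsv(feedback):
    """Serialize user-accepted annotations as TSV rows for re-training."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for item in feedback:
        # Only suggestions the user confirmed become training examples.
        if item["accepted"]:
            writer.writerow([item["uri"], item["surface"],
                             item["context"], item["offset"]])
    return buffer.getvalue()

CONTEXT = "Lennon and McCartney went to New York."

# Hypothetical feedback gathered from an annotation editor.
FEEDBACK = [
    {"uri": "http://dbpedia.org/resource/New_York_City", "surface": "New York",
     "context": CONTEXT, "offset": CONTEXT.index("New York"), "accepted": True},
    {"uri": "http://dbpedia.org/resource/New_York_State", "surface": "New York",
     "context": CONTEXT, "offset": CONTEXT.index("New York"), "accepted": False},
]

print(feedback_to_tsv(FEEDBACK), end="")
```

Rejected suggestions are dropped here for simplicity; they could also be kept as negative examples.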

Page 22: A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight - SemTech Berlin 2012

Semantic Enhancement Marketplace

• User-generated annotations are valuable crowdsourced knowledge

• Can be used as currency:

– “sweat for web service provision”

– b2b partnerships

• Example: RoboTagger.com

– entity annotation service (in German)

– entity types are not fixed (crowd-sourced)

– users rewarded with more access to web service

Page 23: A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight - SemTech Berlin 2012

Conclusion

• Unstructured information (text) and structured information (e.g. RDF)

– Mutually dependent and beneficial

• DBpedia Spotlight sits on the border of two worlds:

– From Wikipedia, an automatic annotator

– From the auto-annotator, a more interconnected Wikipedia

• Fosters a semantic enhancement ecosystem!

Page 24: A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight - SemTech Berlin 2012

Thank you!

On Twitter: @pablomendes

E-mail: [email protected]

Web: http://pablomendes.com

http://slideshare.net/pablomendes

• Special thanks to Mihály Héder and Iavor Jelev for many fruitful discussions

• DBpedia Spotlight is partially funded by LOD2.eu

http://spotlight.dbpedia.org

Page 25: A Virtuous Cycle of Semantic Enhancement with DBpedia Spotlight - SemTech Berlin 2012

Links

• Download

– DBpedia: http://dbpedia.org/Downloads37/

– DBpedia Spotlight: http://sourceforge.net/projects/dbp-spotlight/

– RDFaCE

• http://code.google.com/p/rdface/

• Wordpress plugin: http://wordpress.org/extend/plugins/rdface/