12
Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia Milan Dojchinovski 1 , Tomas Kliegr 2 1 Faculty of Information Technology Czech Technical University in Prague 2 Faculty of Informatics and Statistics University of Economics, Prague The 7th Workshop on Intelligent and Knowledge Oriented Technologies (WIKT 2012) November 22-23, 2012, Smolenice, SK Milan Dojchinovski milan.dojchinovski@fit.cvut.cz - @m1ci - http://dojchinovski.mk Except where otherwise noted, the content of this presentation is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported Czech Technical University in Prague University of Economics Prague

Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia

Embed Size (px)

DESCRIPTION

The 7th Workshop on Intelligent and Knowledge Oriented Technologies, Smolenice, Slovakia.

Citation preview

Page 1: Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia

Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia

Milan Dojchinovski1, Tomas Kliegr2

1 Faculty of Information TechnologyCzech Technical University in Prague

2Faculty of Informatics and StatisticsUniversity of Economics, Prague

The 7th Workshop on Intelligent and Knowledge Oriented Technologies (WIKT 2012)November 22-23, 2012, Smolenice, SK

Milan [email protected] - @m1ci - http://dojchinovski.mk

Except where otherwise noted, the content of this presentation is licensed underCreative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported

Czech Technical University in Prague

University of Economics Prague

Page 2: Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia

Overview

‣ Introduction

‣ Entity Recognition, Classification and Publication

‣ Experiments

‣ Conclusion and Future Work

Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk 2

Page 3: Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia

Introduction

‣ Unsupervised and fully-automated:- entity recognition - rule based lexico-syntactic patterns- entity classification by extraction of hypernyms - targeted hypernym extraction- entity linking to DBpedia concepts

‣ Publication as Linked Data- results in NLP Interchange Format (NIF)

3Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk

Page 4: Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia

Overview

‣ Introduction

‣ Entity Recognition, Classification and Publication

‣ Experiments

‣ Conclusion and Future Work

Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk 4

Page 5: Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia

Tool Architecture

5

‣ Available as Web 2.0 application at: http://ner.vse.cz/thd‣ Web API available at: http://ner.vse.cz/thd/docs

Fig 1. Architecture overview

Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk

Page 6: Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia

Entity Recognition and Classification

6

‣ Entity Recognition- 2 JAPE grammars: 1) NNP+ 2) JJ* NN+- input: free text- output: Named (e.g., “Diego Maradona”) or Common Entities (e.g., “hockey player”)

‣ Entity Classification- supported by the Targeted Hypernym Discovery algorithm- lexico-syntactic patterns, e.g. “_x_ is a _y_ “

Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk

Page 7: Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia

Entity Linking and Publication

7

‣ Entity Linking- linking with concepts from DBpedia- used Wikipedia Search API- mapping Wikipedia article URL to its DBpedia representation

‣ Publication in NIF- NLP Interchange Format (RDF-based representation)- each processed document (context) has unique identifier- each entity and hypernym as offset-based string

Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk

Page 8: Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia

Overview

‣ Introduction

‣ Entity Recognition, Classification and Publication

‣ Experiments

‣ Conclusion and Future Work

Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk 8

Page 9: Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia

Experiments

9

‣ Question addressed- How well our tool recognizes, classifies and links Named and Common Entities?

‣ Experiment setup- manually created dataset, “Czech Traveler Dataset”- 101 Named Entities, 85 Common Entities- comparison with 3 other systems: DBpedia Spotlight, Open Calais, Alchemy API

‣ Results- Named Entities,

• f-score: recognition 0.66, classification 0.66, linking 0.58

- Common Entities

• f-score: recognition 0.60, classification 0.51, linking 0.61

- better results in all tasks

• overtaken only by DBpedia Spotlight - linking of common entities with f-score 0.69

Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk

Page 10: Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia

Overview

‣ Introduction

‣ Entity Recognition, Classification and Publication

‣ Experiments

‣ Conclusion and Future Work

Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk 10

Page 11: Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia

Conclusion and Future Work

11

‣ Tool for Entity Recognition, Classification and Publication

‣ Future directions- multilingual support - Dutch, German and Czech language- grammar improvements- evaluation on a standard benchmark

Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia - @m1ci - http://dojchinovski.mk

Page 12: Recognizing, Classifying and Linking Entities with Wikipedia and DBpedia

Feedback

12

Thank you!Questions, comments, ideas?

Milan Dojchinovski @[email protected] http://dojchinovski.mk

Except where otherwise noted, the content of this presentation is licensed underCreative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported

demo at: http://ner.vse.cz/thd