PhD Day: Entity Linking using Generic Linked Data Datasets

Presentation at the 4th NLP PhD Day at the National University of Ireland, Galway (DERI), on 23/04/2013.

Copyright 2009 Digital Enterprise Research Institute. All rights reserved.

Digital Enterprise Research Institute www.deri.ie

Entity Linking Using Generic Linked Data Datasets

PhD Day – April/2013

Bianca Pereira

Agenda

Motivation

Problem

Related Work

Research Questions

Next Steps

Challenges

Motivation

Most of the content available on the web is unstructured natural language text.

How can natural language text be structured so that it is easier to process?

Motivation

There are three possible solutions to this problem:

Extract knowledge from text according to a given structure (ontology population from text).

Extract knowledge from text without using a previous structure (ontology learning from text).

Link mentions in text to entities in a structured knowledge base (entity linking).
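
To make the third option concrete, the sketch below links mentions to entities by exact label lookup. It is only a minimal illustration and not the method used in this work: the tiny knowledge base, its `ex:` identifiers, and the function `link_mentions` are all hypothetical.

```python
import re

# Hypothetical toy knowledge base: lowercase label -> entity identifier.
# The identifiers are made up for illustration only.
KNOWLEDGE_BASE = {
    "galway": "ex:Galway",
    "ireland": "ex:Ireland",
    "national university of ireland": "ex:NUI",
}

def link_mentions(text, kb=KNOWLEDGE_BASE, max_len=4):
    """Return (mention, entity_id) pairs found by greedy longest-match lookup."""
    tokens = re.findall(r"\w+", text)
    links = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            entity = kb.get(candidate.lower())
            if entity is not None:
                match = (candidate, entity, n)
                break
        if match:
            links.append((match[0], match[1]))
            i += match[2]  # skip past the linked mention
        else:
            i += 1
    return links

print(link_mentions("She studies at the National University of Ireland in Galway."))
```

A label lookup like this ignores ambiguity (several entities sharing one label), which is where most of the difficulty of entity linking lies.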

Motivation

Entity Linking..

.. enables reusing knowledge already published on the web.

.. can be used as the first step for ontology learning and population algorithms.

Motivation

Many datasets have been used for Entity Linking:

Relational datasets

Wikipedia

DBpedia, YAGO, MusicBrainz, Freebase, …

Motivation

Linked Data datasets are promising because..

.. many of them are public.

.. they are already structured.

.. they are interlinked.

.. they are available under diverse ownership.

.. they provide knowledge in diverse domains.

.. the LOD cloud is growing.

Motivation

There are already some Entity Linking solutions using Linked Data datasets.

Problem

Current entity linking approaches work only with a small, fixed number of Linked Data datasets.

AIDA (YAGO)

Alchemy API (CIA Factbook, CrunchBase, Freebase, GeoNames, MusicBrainz, OpenCyc, UMBEL, US Census, YAGO)

DBpedia Spotlight (DBpedia)

Open Calais (Calais)

Problem

Current tools work well with generic knowledge and public datasets. But what do we do if we want to..

.. link enterprise text to a private dataset?

.. identify domain-specific entities?

Problem

AELA (Adaptive Entity Linking Approach) was developed to solve this problem..

Problem

What AELA does not solve is..

.. the recognition of generalized entities/topics (such as genes and diseases).

.. the recognition of individuals with the same name as their classes (such as ambulance, coffee machine and airplane).

Related Work

Which topics are related to Entity Linking?

Entity Resolution, coreference resolution, merge-purge, data deduplication, object identification, mention matching, tuple matching, record linkage, entity disambiguation, anaphora resolution, instance identification, database hardening, entity identification, identity resolution, reference reconciliation, record matching, name matching, identity uncertainty, duplicate detection, entity matching, instance matching, entity consolidation, entity reconciliation, object consolidation, topic consolidation, reference disambiguation, instance fusion, data fusion.

Research Questions

Which methods developed over the last five decades can be used to improve AELA's results?

How can AELA adapt itself to a given domain?

What are the use cases in which AELA can be applied? Is it better than previous approaches?

Can AELA be language-independent?

Research Questions

The most important question..

What is an entity!?

Object? Concept? Topic?

Next Steps

Survey the methods used in related areas.

Evaluate these methods within the AELA architecture.

Develop a method to select a suitable Linked Data dataset given the domain of the text.

Apply AELA to the news domain.

Evaluate AELA using datasets in other languages.
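
As a rough illustration of the dataset selection step, the sketch below picks the dataset whose descriptive keywords overlap most with the words of the input text. This is an assumed baseline, not the method planned for AELA: the dataset names, the keyword sets, and the function `select_dataset` are hypothetical.

```python
import re

# Hypothetical dataset descriptions: dataset name -> set of domain keywords.
DATASET_KEYWORDS = {
    "music_dataset": {"album", "band", "song", "singer", "concert"},
    "geo_dataset": {"city", "river", "country", "capital", "mountain"},
}

def select_dataset(text, datasets=DATASET_KEYWORDS):
    """Return the dataset whose keywords overlap most with the text, or None."""
    words = set(re.findall(r"\w+", text.lower()))
    best_name, best_score = None, 0
    for name, keywords in datasets.items():
        score = len(words & keywords)  # number of shared keywords
        if score > best_score:
            best_name, best_score = name, score
    return best_name

print(select_dataset("The band released a new album and a song."))
```

In practice the keyword sets would have to be derived from the datasets themselves, for example from their class and property labels, rather than written by hand.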

Next Steps

Define entity.

Challenges

A large body of previous work

Big Data issues

Linked Data issues (standards and data quality)

Evaluation issues

QUESTIONS?
