NUIG Research Showcase 2014

Entity Linking with Multiple Knowledge Bases

What is the text talking about?

Motivation

Written communication has long been a common way of sharing knowledge between humans.

But machines process natural language text as a sequence of characters without any meaning.

When asked about a term (a sequence of characters), a computer can spot that sequence but cannot explain its meaning.

Bianca Pereira

This project has been funded by Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289.

Proposed Solution

Even big cross-domain Knowledge Bases do not cover all knowledge in the world.

Our solution therefore aims to use multiple Knowledge Bases to perform Entity Linking. In other words, we want to enable the use of different sources of concepts.

Our approach is based on three main steps: selection of textual features, selection of Knowledge Base features, and use of a Collective Inference algorithm.

When a human reader wants to understand the content of a text, she uses the words around a given term (context words) to determine its meaning. Noun phrases and verbs are the main source of information. Likewise, words appearing near the term are more relevant than those appearing far from it in the text. In a computer-based environment, those features are extracted and used to measure how probable it is that a given concept in the Knowledge Base has been cited by that term.
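The textual-feature step can be sketched roughly as follows. This is only an illustration, not the poster's implementation: the `1/distance` weighting, the window size, the candidate identifiers, and the keyword sets attached to each candidate are all assumptions, and the sketch skips the noun phrase and verb selection mentioned above.

```python
from collections import Counter


def context_weights(tokens, mention_index, window=5):
    """Weight each word within `window` positions of the mention by 1/distance."""
    weights = Counter()
    for i, tok in enumerate(tokens):
        distance = abs(i - mention_index)
        if 0 < distance <= window:
            weights[tok.lower()] += 1.0 / distance
    return weights


def candidate_scores(weights, candidate_keywords):
    """Score candidates by weighted keyword overlap, normalised to sum to 1."""
    raw = {c: sum(weights[w] for w in kws) for c, kws in candidate_keywords.items()}
    total = sum(raw.values())
    if total == 0:
        # No evidence at all: fall back to a uniform distribution.
        return {c: 1.0 / len(raw) for c in raw}
    return {c: s / total for c, s in raw.items()}


tokens = "Michael Jackson the singer of Black or White died in 2009".split()
weights = context_weights(tokens, mention_index=1)  # the mention is "Jackson"

# Hypothetical candidates and keywords, made up for this example.
candidates = {
    "jackson_singer": {"singer", "black", "white"},
    "jackson_composer": {"composer", "boogie"},
}
scores = candidate_scores(weights, candidates)
```

Here "singer" sits two tokens from the mention and contributes more than "black", which sits four tokens away, so nearer context words dominate the score.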

When analyzing those context words, a human also maps the words in the text to her previous knowledge. This mapping is used to modify the probability that the term is citing a given concept instead of another one. In a computer-based environment, the relationships between concepts in a Knowledge Base can be used to modify the probability of linking with a given entry.
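One simple way to picture how Knowledge Base relationships could adjust a linking probability is shown below. It is a minimal sketch under stated assumptions: the multiplicative `boost`, the edge set, and the concept identifiers are hypothetical and are not taken from the poster.

```python
def relatedness_boost(priors, kb_edges, context_concepts, boost=2.0):
    """Multiply the prior of candidates related to a context concept, then renormalise."""
    adjusted = {}
    for cand, prior in priors.items():
        linked = any(
            (cand, other) in kb_edges or (other, cand) in kb_edges
            for other in context_concepts
        )
        adjusted[cand] = prior * (boost if linked else 1.0)
    total = sum(adjusted.values())
    if total == 0:
        return dict(priors)
    return {c: v / total for c, v in adjusted.items()}


# Hypothetical Knowledge Base relation: the composer is linked to the song.
kb_edges = {("jackson_composer", "blame_it_on_the_boogie")}
priors = {"jackson_singer": 0.5, "jackson_composer": 0.5}

posterior = relatedness_boost(priors, kb_edges, ["blame_it_on_the_boogie"])
```

With equal priors, the candidate that has a relation to a concept found in the context ends up with the higher probability.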

In the last step, a human uses the coherence of a text to understand all of its terms. The basic assumption is that terms appearing in a coherent text are somehow related in the previous knowledge of the reader (unless they are concepts introduced by the text). In a computer-based environment, this step aggregates all features and, using the computed probabilities, detects the best linking between each term in the text and its respective concept in the Knowledge Base. This is done through a process called Collective Inference.
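To make the idea of deciding all links jointly concrete, the sketch below uses a brute-force search over every combination of candidates. This is a stand-in chosen for brevity, not the Collective Inference algorithm used in this work; the scoring function, `coherence_weight`, and all identifiers are assumptions.

```python
from itertools import product


def collective_link(local_scores, related, coherence_weight=1.0):
    """Pick the joint assignment maximising local scores plus pairwise coherence."""
    mentions = list(local_scores)
    best, best_score = None, float("-inf")
    # Iterating a dict yields its keys, so each combo holds one candidate per mention.
    for combo in product(*(local_scores[m] for m in mentions)):
        score = sum(local_scores[m][c] for m, c in zip(mentions, combo))
        # Add a bonus for every pair of chosen concepts that the KB relates.
        score += coherence_weight * sum(
            1
            for i in range(len(combo))
            for j in range(i + 1, len(combo))
            if frozenset((combo[i], combo[j])) in related
        )
        if score > best_score:
            best, best_score = dict(zip(mentions, combo)), score
    return best


# Hypothetical local scores and one hypothetical KB relation.
local_scores = {
    "Michael Jackson": {"jackson_singer": 0.6, "jackson_composer": 0.4},
    "Blame it on the Boogie": {"boogie_song": 1.0},
}
related = {frozenset(("jackson_composer", "boogie_song"))}

links = collective_link(local_scores, related)
```

Taken alone, the term "Michael Jackson" would be linked to the singer, but the joint decision prefers the composer because that concept is related to the song mentioned in the same text. Exhaustive search grows exponentially with the number of terms, so it is only suitable for toy inputs like this one.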

Problem Statement

Natural language texts are hard to understand due to two linguistic features: polysemy and synonymy.

Related Work

Humans process the content of a text first by matching the terms with their previous knowledge. In a computer-based environment, this previous knowledge is given by a Knowledge Base.

In Computer Science, the process that mimics this linking is called Entity Linking. It is the task of linking terms in a text with Knowledge Base entries that represent the same real-world concept.

Previous work [1][2] has been successful in linking text with cross-domain Knowledge Bases (e.g. Wikipedia, DBPedia and YAGO).

Challenges

The disambiguation of terms is our key challenge: in other words, determining the right concept for each term cited in the text.

Since our goal is to use multiple Knowledge Bases, there are two other challenges to address: the processing of Big Data and the heterogeneity in the semantic description of Knowledge Bases.

This text is not meaningful for machines.

SOURCE: http://google.com SOURCE: http://bing.com SOURCE: http://yahoo.com

Polysemy happens when a single term may be related to more than one concept.

Synonymy happens when there are many terms that refer to the same concept.

Jackson

NUIG

National University of Ireland, Galway
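The two phenomena can be illustrated with a small term-to-concept table built from the examples above ("Jackson", and "NUIG" versus its full name). The concept identifiers are hypothetical placeholders, not entries from a real Knowledge Base.

```python
from collections import defaultdict

# Hypothetical mapping from terms (surface forms) to concept identifiers.
term_to_concepts = {
    "Jackson": {"concept:jackson_singer", "concept:jackson_composer"},
    "NUIG": {"concept:nui_galway"},
    "National University of Ireland, Galway": {"concept:nui_galway"},
}

# Polysemy: one term related to more than one concept.
polysemous_terms = {t for t, cs in term_to_concepts.items() if len(cs) > 1}

# Synonymy: many terms referring to the same concept (invert the mapping).
concept_to_terms = defaultdict(set)
for term, concepts in term_to_concepts.items():
    for concept in concepts:
        concept_to_terms[concept].add(term)
synonym_groups = {c: ts for c, ts in concept_to_terms.items() if len(ts) > 1}
```

Entity Linking has to handle both directions of this table: choosing one concept for a polysemous term, and recognising that different terms point to the same concept.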

Michael Jackson, the singer of Black or White, died in 2009.

http://en.wikipedia.org/wiki/Michael_Jackson

http://en.wikipedia.org/wiki/Black_or_White

X X

I started my night watching Copacabana and ended in a party dancing Havana D’Primera.

Michael Jackson, the composer of Blame it on the Boogie, has the same name as the member of Jackson 5.

? ?

context words

http://musicbrainz.org/work/8ffc75e5-3ddb-4a6a-a2d5-8ec5ecee1c78

singer_of composer_of

http://musicbrainz.org/artist/f27ec8db-af05-4f36-916e-3d57f91ecf5e

http://musicbrainz.org/artist/059e57d8-af63-4d90-8078-ebed36985fff


Main Findings

Not all Knowledge Bases contain textual descriptions for all concepts, as most previous work assumes.

Is it possible to perform Entity Linking with Knowledge Bases other than the cross-domain ones used in previous work [3]?

How does the method perform when applied to cross-domain ones [4]?

To be continued.. (a.k.a. Future Work)

References

[1] Hachey, B., Radford, W., Nothman, J., Honnibal, M., & Curran, J. R. (2013). Evaluating Entity Linking with Wikipedia. Artificial Intelligence, 194, 130-150.

[2] Hoffart, J., Yosef, M. A., Bordino, I., Fürstenau, H., Pinkal, M., Spaniol, M., … & Weikum, G. (2011, July). Robust Disambiguation of named entities in text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 782-792). Association for Computational Linguistics.

[3] Pereira, B., Aggarwal, N., & Buitelaar, P. (2013, May). AELA: an adaptive entity linking approach. In Proceedings of the 22nd international conference on World Wide Web companion (pp. 87-88). International World Wide Web Conferences Steering Committee.

[4] EuroSentiment Project. Work Package 4. http://eurosentiment.eu

Pictures from http://pixabay.com


Technology
