Speaker: Shih-Han Lo
Advisor: Professor Jia-Ling Koh
Author: Aasish Pappu, Roi Blanco, Yashar Mehdad, Amanda Stent, Kapil Thadani
Date: 2017/09/19
Source: WSDM ’17
Lightweight Multilingual Entity
Extraction and Linking
Outline
Introduction
Method
Experiment
Conclusion
Introduction
Key tasks for text analytic systems:
Named Entity Recognition (NER)
Named Entity Linking (NEL)
Some systems perform NER and NEL jointly.
Introduction
Most approaches involve (some of) the following steps:
Mention detection
Mention normalization
Candidate entity retrieval for each mention
Entity disambiguation for mentions with multiple candidate entities
Mention clustering for mentions that do not link to any entity
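The steps above can be sketched as a minimal pipeline. All function names, the toy capitalization heuristic, and the alias index below are hypothetical illustrations, not the paper's implementation:

```python
# Illustrative entity-linking pipeline skeleton; every name here is
# a placeholder, not from the paper.

def detect_mentions(text):
    """Mention detection: return (start, end) spans of likely mentions.
    Toy heuristic: capitalized tokens longer than one character."""
    spans, pos = [], 0
    for tok in text.split():
        start = text.index(tok, pos)
        if tok[0].isupper() and len(tok) > 1:
            spans.append((start, start + len(tok)))
        pos = start + len(tok)
    return spans

def normalize(text, span):
    """Mention normalization: canonicalize the surface form."""
    return text[span[0]:span[1]].strip(".,").lower()

def retrieve_candidates(mention, alias_index):
    """Candidate retrieval: look up entities sharing this alias."""
    return alias_index.get(mention, [])

def link(text):
    """Run detection -> normalization -> candidate retrieval."""
    alias_index = {"paris": ["Paris_France", "Paris_Texas"]}  # toy index
    results = {}
    for span in detect_mentions(text):
        m = normalize(text, span)
        results[m] = retrieve_candidates(m, alias_index)
    return results
```

Disambiguation and clustering would then operate on the candidate lists `link` returns.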
Motivation
Mention Detection
Typically consists of running an NER system over input text.
We use simple CRFs and only a few lexical, syntactic and semantic features.
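As a rough illustration of the kind of lightweight lexical features a CRF tagger consumes (the exact feature set here is assumed, not the paper's), per-token features can be extracted in the dict format common to CRF toolkits such as CRFsuite:

```python
def token_features(tokens, i):
    """Lexical features for token i, in the attribute-dict style used
    by CRF toolkits (e.g. CRFsuite). The feature choice is illustrative,
    not the paper's exact feature set."""
    tok = tokens[i]
    feats = {
        "lower": tok.lower(),      # lowercased surface form
        "is_upper": tok.isupper(), # all-caps flag
        "is_title": tok.istitle(), # capitalization flag
        "is_digit": tok.isdigit(), # numeric flag
        "prefix3": tok[:3],        # character prefix
        "suffix3": tok[-3:],       # character suffix
    }
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True  # beginning-of-sentence marker
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True  # end-of-sentence marker
    return feats
```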
System Description
Candidate Entity Retrieval
Entity Embeddings
We aim to simultaneously learn D-dimensional representations of entities (Ent) and words (W) in a common vector space.
Training our embedding model: continuous skip-grams with 300 dimensions and a window size of 10.
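Skip-gram training reduces to generating (target, context) pairs inside a sliding window. The window size below matches the slide; treating entity IDs as ordinary tokens in the corpus, so that words and entities land in one space, is our assumption about the preprocessing:

```python
def skipgram_pairs(tokens, window=10):
    """Generate (target, context) training pairs as in continuous
    skip-gram. Entity IDs (e.g. 'ENT/Paris') are treated as ordinary
    tokens, so words and entities share one embedding space."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the target itself
                pairs.append((target, tokens[j]))
    return pairs
```

A real system would feed these pairs to a skip-gram trainer (negative sampling or hierarchical softmax) with 300-dimensional vectors.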
Candidate Entity Retrieval
Fast Entity Linking
Fast Entity Linker (FEL) is an unsupervised approach.
FEL imposes contextual dependencies by computing the cosine distance between entity embeddings.
Candidate entities are retrieved from the substrings of the input string.
Minimal perfect hash function
Elias-Fano integer coding
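Retrieving candidates from input substrings amounts to looking up every token n-gram in an alias dictionary. A hedged sketch, where a plain Python dict stands in for the compressed structures the slide names:

```python
def candidate_spans(tokens, alias_index, max_len=3):
    """Look up every token n-gram (substring of the input) in an alias
    dictionary. FEL stores this dictionary compactly using a minimal
    perfect hash function and Elias-Fano integer coding; a plain dict
    stands in for both here."""
    out = []
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + max_len, len(tokens)) + 1):
            surface = " ".join(tokens[i:j]).lower()
            if surface in alias_index:
                out.append(((i, j), surface, alias_index[surface]))
    return out
```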
Entity Disambiguation
The task of determining to which candidate entity a mention refers.
The task is complex because a mention may refer to different entities depending on its local context.
Entity Disambiguation
Forward-Backward Algorithm (FwBw)
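A minimal sketch of forward-backward over a chain of mentions, computing posterior marginals for each mention's candidate entities. How the paper derives the unary and pairwise scores (e.g. from prior popularity and embedding similarity) is not shown here; the score tables are assumed inputs:

```python
def forward_backward(unary, pairwise):
    """Posterior marginals over candidate entities for a chain of
    mentions. unary[t][k]: score of candidate k at mention t;
    pairwise[t][k][l]: transition score from candidate k at mention t
    to candidate l at mention t+1. Scores are unnormalized."""
    T = len(unary)
    # Forward pass: alpha[t][l] sums over paths ending in candidate l.
    alpha = [list(unary[0])]
    for t in range(1, T):
        alpha.append([
            unary[t][l] * sum(alpha[t - 1][k] * pairwise[t - 1][k][l]
                              for k in range(len(unary[t - 1])))
            for l in range(len(unary[t]))
        ])
    # Backward pass: beta[t][k] sums over paths continuing from k.
    beta = [[1.0] * len(u) for u in unary]
    for t in range(T - 2, -1, -1):
        beta[t] = [
            sum(pairwise[t][k][l] * unary[t + 1][l] * beta[t + 1][l]
                for l in range(len(unary[t + 1])))
            for k in range(len(unary[t]))
        ]
    # Combine and normalize per mention.
    marginals = []
    for t in range(T):
        raw = [alpha[t][k] * beta[t][k] for k in range(len(unary[t]))]
        z = sum(raw)
        marginals.append([r / z for r in raw])
    return marginals
```

Picking the highest-marginal candidate per mention then yields the disambiguated chain.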
Entity Disambiguation
Exemplar (Clustering)
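To give a flavor of exemplar-based clustering in general (this greedy variant is a stand-in, not the paper's algorithm), each item either joins an existing exemplar it is similar enough to or founds a new cluster:

```python
def greedy_exemplars(items, sim, threshold=0.5):
    """Greedy exemplar-based clustering: each item joins the first
    exemplar whose similarity exceeds the threshold, otherwise it
    becomes a new exemplar. Illustrative only, not the paper's
    Exemplar algorithm."""
    exemplars, clusters = [], {}
    for it in items:
        for ex in exemplars:
            if sim(it, ex) >= threshold:
                clusters[ex].append(it)
                break
        else:  # no exemplar was similar enough
            exemplars.append(it)
            clusters[it] = [it]
    return clusters
```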
Entity Disambiguation
Label Propagation (LabelProp)
Modified Adsorption (MAD)
For a subset of seed nodes, we inject seed labels L.
For the remaining nodes V′, we assign a label distribution.
Along with the graph, MAD takes three hyper-parameters as input.
We pick the highest-ranked label for each node in V as the final candidate.
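A simplified label-propagation sketch along these lines (not the full MAD update; MAD's third hyper-parameter, which regularizes toward an abandonment label, is omitted, and the update rule here is an illustrative simplification):

```python
def label_propagation(adj, seeds, labels, iters=30, mu1=1.0, mu2=1e-2):
    """Simplified label propagation on a graph. adj[v]: list of
    (neighbor, weight) pairs; seeds[v]: fixed label distribution for
    seed nodes. mu1 weights seed injection and mu2 neighbor agreement,
    echoing two of MAD's three hyper-parameters."""
    dist = {v: dict(seeds.get(v, {})) for v in adj}
    for _ in range(iters):
        new = {}
        for v in adj:
            scores = {l: 0.0 for l in labels}
            for l, p in seeds.get(v, {}).items():  # seed injection
                scores[l] += mu1 * p
            for u, w in adj[v]:                    # neighbor agreement
                for l, p in dist[u].items():
                    scores[l] += mu2 * w * p
            z = sum(scores.values()) or 1.0
            new[v] = {l: s / z for l, s in scores.items()}
        dist = new
    return dist

def best_label(dist, v):
    """Pick the highest-ranked label for node v."""
    return max(dist[v], key=dist[v].get)
```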
Experiment
Datasets:
Cross-lingual TAC KBP 2013
Mono-lingual AIDA-CoNLL 2003
Experiment
Setup
N-best: N = 10
FwBw: λ = 0.5
Exemplar: max_iterations = 300, λ = 0.5
LabelProp: μ1 = 1, μ2 = 1e-2, μ3 = 1e-2
Experiment
TAC KBP Evaluation Results
Experiment
Analysis
Experiment
AIDA Evaluation
Experiment
Runtime Performance
Conclusion
Our NER implementation is outperformed only by NER systems that use much more complex feature engineering and/or modeling methods.
In future work, we plan to improve the performance of our system for other languages, by expanding the pool of entities for which we have information.
Candidate retrieval in Spanish is relatively poor compared to English and Chinese.