Msm2013challenge

DESCRIPTION

Our submission for the Making Sense of Microposts IE Challenge at the World Wide Web Conference 2013.

Page 1: Msm2013challenge

ELIS – Multimedia Lab

Fréderic Godin, Pedro Debevere, Erik Mannens, Wesley De Neve and Rik Van de Walle

MSM2013 IE Challenge: Leveraging Existing Tools for Named Entity Recognition in Microposts

Multimedia Lab, Ghent University – iMinds, Belgium

Image and Video Systems Lab, KAIST, South Korea

Page 2: Msm2013challenge

Introduction: The challenge

Existing tools for NER are developed for news corpora

Develop NER tools for microposts

4 entity types: Person, Location, Organisation, and Miscellaneous (film/movie, entertainment award event, political event, programming language, sporting event, and TV show)

Page 3: Msm2013challenge

How do current NER tools perform? (1)

Rizzo et al. evaluated the performance of AlchemyAPI, DBpedia Spotlight, Evri, Extractiv, OpenCalais, and Zemanta

on 5 TED talks, 1000 news articles, and 217 conference abstracts.

Could we do the same evaluation for microposts?

Page 4: Msm2013challenge

How do current NER tools perform? (2)

Preprocessing: convert bracket placeholder tokens back to literal brackets
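As an aside, a minimal sketch of this preprocessing step in Python. The placeholder tokens below (-LRB-, -RRB-, etc., Penn Treebank style) are an assumption, not confirmed by the slides; the mapping should be adapted to the tokens actually used in the challenge data.

```python
# Restore literal brackets before querying the NER services.
# NOTE: the placeholder tokens below (Penn Treebank style) are assumed;
# substitute whatever bracket tokens the challenge data actually contains.
BRACKET_TOKENS = {
    "-LRB-": "(", "-RRB-": ")",
    "-LSB-": "[", "-RSB-": "]",
    "-LCB-": "{", "-RCB-": "}",
}

def restore_brackets(text: str) -> str:
    for token, bracket in BRACKET_TOKENS.items():
        text = text.replace(token, bracket)
    return text

print(restore_brackets("RT -LRB-via @user-RRB- great talk!"))
# RT (via @user) great talk!
```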

Note: values can differ depending on the ontology mapping used!

F1 values:

                 PER      LOC      ORG      MISC
AlchemyAPI       78.20%   74.60%   54.40%   10.20%
Spotlight (0.2)  57.60%   46.40%   24.40%    5.00%
Spotlight (0.5)  32.90%    3.70%    6.50%    7.30%
OpenCalais       69.30%   73.10%   55.80%   31.40%
Zemanta          70.40%   64.30%   48.10%   29.30%

Page 5: Msm2013challenge

How do current NER tools perform? (3)

AlchemyAPI: performs poorly at recognizing exotic names, small villages, buildings, and organizations

Zemanta: same shortcomings as AlchemyAPI, and additionally relies on capitalisation

OpenCalais: poor at recognizing small villages, buildings, and organizations, but does recognize big events!

DBpedia Spotlight: returns multiple ‘possible’ entities

What if we combine the power of all 4 services?

Page 6: Msm2013challenge

Combining existing services (1)

Apply machine learning to a feature vector built from the outputs of the different services (see the sketch below)

[Architecture diagram] AlchemyAPI, DBpedia Spotlight, OpenCalais, and Zemanta each contribute a confidence level for PER, LOC, ORG, and MISC (mapped from each service-specific entity type), giving 16 features in total; a Random Forest classifier maps this feature vector to the final PER, LOC, ORG, or MISC label.
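For concreteness, a minimal sketch of this combination step, assuming scikit-learn's RandomForestClassifier. The 4 services x 4 entity types = 16-feature layout follows the diagram above; the exact feature encoding, the toy data, and the helper names are illustrative, not the submission's actual implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

SERVICES = ["AlchemyAPI", "Spotlight", "OpenCalais", "Zemanta"]
TYPES = ["PER", "LOC", "ORG", "MISC"]

def to_features(scores):
    """Map {service: {entity_type: confidence}} to a 16-dim vector
    (4 services x 4 entity types); a missing output becomes 0.0."""
    return [scores.get(s, {}).get(t, 0.0) for s in SERVICES for t in TYPES]

# Toy training data: per-service confidences for two candidate entities.
train_scores = [
    {"AlchemyAPI": {"PER": 0.9}, "Zemanta": {"PER": 0.7}},
    {"OpenCalais": {"LOC": 0.8}, "Spotlight": {"LOC": 0.6}},
]
train_labels = ["PER", "LOC"]  # gold entity types

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(np.array([to_features(s) for s in train_scores]), train_labels)

# Classify a new candidate entity from its per-service confidences.
candidate = {"AlchemyAPI": {"PER": 0.8}, "Spotlight": {"PER": 0.5}}
print(clf.predict([to_features(candidate)]))
```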

Page 7: Msm2013challenge

Combining existing services (2)

Evaluation per entity type (combined approach, varying the DBpedia Spotlight confidence threshold)

                 PER      LOC      ORG      MISC
Spotlight (0.2)  82.20%   75.70%   60.40%   47.40%
Spotlight (0.5)  81.60%   74.30%   59.40%   40.50%

Noisier input data (Spotlight threshold 0.2 instead of 0.5) gives better results

(final results on the test set are not included; they are part of the challenge)
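For reference, per-entity-type F1 scores like the ones above can be computed with scikit-learn's f1_score; the gold labels and predictions below are made up for illustration and are not the challenge data.

```python
from sklearn.metrics import f1_score

TYPES = ["PER", "LOC", "ORG", "MISC"]
# Made-up gold labels and classifier predictions, one per candidate entity.
gold = ["PER", "LOC", "ORG", "MISC", "PER", "LOC"]
pred = ["PER", "LOC", "ORG", "PER",  "PER", "ORG"]

# average=None returns one F1 score per label, in the order of TYPES.
for entity_type, f1 in zip(TYPES, f1_score(gold, pred, labels=TYPES, average=None)):
    print(f"{entity_type}: {f1:.2%}")
```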

Page 8: Msm2013challenge

Conclusions

Current NER tools perform well in most cases

Shortcomings: incorrect use of capital letters; abbreviations of organisations; small villages, counties, and buildings

Combining the output of several services yields good results

Page 9: Msm2013challenge

#Questions @frederic_godin #MMLab