Intelligent Database Systems Presenter : Chang,Chun-Chih Authors : David Milne * , Ian H. Witten 2012, AI An open-source toolkit for mining Wikipedia

Intelligent Database Systems Lab

Presenter: Chang, Chun-Chih

Authors: David Milne*, Ian H. Witten

2012, AI

An open-source toolkit for mining Wikipedia

Outlines

• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments

Motivation

The online encyclopedia Wikipedia is a vast, constantly evolving tapestry of interlinked articles.

For developers and researchers, it represents a giant multilingual database of concepts and semantic relations, and a potential resource for natural language processing.

Objectives

• To introduce the Wikipedia Miner toolkit, an open-source software system that allows researchers and developers to integrate Wikipedia’s rich semantics into their own applications.

• Wikipedia Miner is intended to be a platform for sharing data mining techniques.

Methodology - Architecture of the Wikipedia Miner toolkit

Methodology - Measuring relatedness between concepts
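One common way to measure relatedness between two Wikipedia concepts is to compare the sets of articles that link to each of them. The sketch below assumes the Normalized Google Distance formulation applied to in-link sets, as in Milne and Witten's earlier link-based measure; it is an illustration, not the toolkit's API. The function name `inlink_relatedness` and the toy link sets are hypothetical, and `total_articles` stands for |W|, the number of articles in Wikipedia.

```python
import math


def inlink_relatedness(links_a: set[str], links_b: set[str], total_articles: int) -> float:
    """Hypothetical helper: relatedness of two articles from their in-link sets.

    links_a, links_b: titles of the articles that link to article a and b.
    total_articles:   |W|, the number of articles in the collection.

    Distance follows the Normalized Google Distance pattern (assumed here):
        (log max(|A|,|B|) - log |A & B|) / (log |W| - log min(|A|,|B|))
    and relatedness is 1 - distance, clamped to [0, 1].
    """
    overlap = len(links_a & links_b)
    if overlap == 0:
        return 0.0  # no shared in-links: treat the pair as unrelated
    larger = max(len(links_a), len(links_b))
    smaller = min(len(links_a), len(links_b))
    denominator = math.log(total_articles) - math.log(smaller)
    if denominator <= 0:
        return 1.0  # degenerate case: both in-link sets span the whole collection
    distance = (math.log(larger) - math.log(overlap)) / denominator
    return max(0.0, 1.0 - distance)


a = {"x1", "x2", "x3", "x4"}
b = {"x3", "x4", "x5"}
print(round(inlink_relatedness(a, b, 1000), 3))  # prints 0.881
```

Identical in-link sets give a distance of 0 (relatedness 1), and the score falls as the shared fraction of in-links shrinks relative to the size of the collection.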

Methodology - Features for measuring article relatedness
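Another link-based feature for article relatedness treats each article's outgoing links as a weighted vector and takes the cosine of the angle between two such vectors, as in Milne and Witten's earlier link-based work. This is a minimal sketch under stated assumptions: each link target t is weighted by log(|W| / |T|), where |T| is the number of articles linking to t, so links to rarely linked pages count more. Whether the toolkit uses exactly this weighting among its features is an assumption, and `outlink_relatedness` and the toy counts are hypothetical.

```python
import math


def outlink_relatedness(
    out_a: set[str],
    out_b: set[str],
    inlink_counts: dict[str, int],
    total_articles: int,
) -> float:
    """Hypothetical helper: cosine similarity of two articles' out-link vectors.

    Each target t linked from an article gets weight log(|W| / |T|), where
    |T| = inlink_counts[t] is how many articles link to t (an assumed
    inverse-link-frequency weighting). Targets an article does not link to
    get weight 0.
    """
    targets = sorted(out_a | out_b)  # shared vector dimensions

    def weight(links: set[str], target: str) -> float:
        if target not in links:
            return 0.0
        return math.log(total_articles / inlink_counts[target])

    vec_a = [weight(out_a, t) for t in targets]
    vec_b = [weight(out_b, t) for t in targets]
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(y * y for y in vec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an article with no informative out-links
    return dot / (norm_a * norm_b)


counts = {"t1": 10, "t2": 100, "t3": 10}
print(round(outlink_relatedness({"t1", "t2"}, {"t2", "t3"}, counts, 1000), 3))  # prints 0.2
```

In the example the two articles share only the more commonly linked target `t2`, which carries a lower weight than the rarer targets, so the similarity is modest.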

Experiments - Impact of thresholds for disambiguation and detection

Experiments - Impact of relatedness dependencies

Experiments - Impact of training data

Experiments - Performance of the disambiguator

Experiments - Performance of the detector

Conclusions

• Our aim in releasing this work open source is not to provide a complete and polished product, but rather a resource for the research community to collaborate around and continue building together.

Comments

• Advantages
• Applications
  - Wikipedia
  - Disambiguation
  - Annotation