Ranking Definitions with Supervised Learning Methods J.Xu, Y.Cao, H.Li and M.Zhao WWW 2005 Presenter: Baoning Wu

Ranking Definitions with Supervised Learning Methods

J.Xu, Y.Cao, H.Li and M.Zhao

WWW 2005

Presenter: Baoning Wu

Motivation

People may need to find definitions of terms on the Web.

Traditional information retrieval is designed to find relevant documents, so it is not well suited to this task.

Google's definition search may suffer from relying on glossary pages and from ranking results in alphabetical order.

Task for definition search

Receive a query term, usually a noun.

Extract definition candidates from the document collection.

Rank the candidates according to the degree to which each one is good.

Output the result.

Definition search is useful

Candidates are not all good definitions

Three categories of definitions

Good: must contain the general notion of the term and several important properties.

Bad: describes neither the general notion nor the properties of the term.

Indifferent: between good and bad.

First step: collecting candidates

Parse all sentences with a Base NP (base noun phrase) parser and identify <term> with the following rules:

<term> is the first Base NP of the first sentence.

Two Base NPs separated by "of" or "for" are considered as <term>.

Extract definition candidates with the patterns:

<term> is a|an|the *

<term>, *, a|an|the *

<term> is one of *
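
The three surface patterns can be approximated with regular expressions. The sketch below is only an illustration, not the authors' implementation: it assumes the query term is already known and skips the Base NP parsing step entirely. The function names `build_patterns` and `extract_candidates` and the sample sentences are hypothetical.

```python
import re


def build_patterns(term):
    """Compile the three surface patterns for one term (case-insensitive).

    Assumption: the term is matched literally at the start of the sentence,
    standing in for "first Base NP of the first sentence".
    """
    t = re.escape(term)
    return [
        re.compile(rf"^{t} is (?:a|an|the) .+", re.IGNORECASE),     # <term> is a|an|the *
        re.compile(rf"^{t}, .+, (?:a|an|the) .+", re.IGNORECASE),   # <term>, *, a|an|the *
        re.compile(rf"^{t} is one of .+", re.IGNORECASE),           # <term> is one of *
    ]


def extract_candidates(term, sentences):
    """Return the sentences that match at least one of the patterns."""
    patterns = build_patterns(term)
    return [s for s in sentences if any(p.search(s.strip()) for p in patterns)]


# Made-up sentences, for illustration only.
sentences = [
    "Linux is an open source operating system.",
    "Linux, created in 1991, a Unix-like kernel, runs on many devices.",
    "Linux is one of the most widely used server platforms.",
    "I installed Linux yesterday.",
]
candidates = extract_candidates("Linux", sentences)
```

Patterns this loose accept many poor sentences, which is why a separate ranking step is needed.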

Second step: Ranking candidates

Ranking based on ordinal regression (ordinal classification): Ranking SVM is used.

Ranking based on classification: SVM is used.

Ranking based on Ordinal Regression

Ordinal regression is a problem in which a classifier assigns instances to a number of ordered categories.

Ranking SVM is used as the model.

For each candidate x, $U(x) = w^{T}x$, where w represents a vector of weights.

The higher $U(x)$ is, the better x is as a definition.
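
To make the scoring rule concrete, the sketch below ranks candidates by a linear utility and checks how many ordered pairs the weights respect. Ranking SVM learns w from such pairs, because a preference of x_i over x_j corresponds to w^T(x_i - x_j) > 0. The weight vector, feature values, and function names here are hypothetical, and no training is performed.

```python
def utility(w, x):
    """Linear utility U(x) = w^T x for one feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def rank_candidates(w, candidates):
    """Sort (label, features) pairs by descending utility."""
    return sorted(candidates, key=lambda c: utility(w, c[1]), reverse=True)


def satisfied_pairs(w, preferred_pairs):
    """Count pairs (x_i, x_j), x_i preferred, for which w^T (x_i - x_j) > 0."""
    count = 0
    for xi, xj in preferred_pairs:
        diff = [a - b for a, b in zip(xi, xj)]
        if utility(w, diff) > 0:
            count += 1
    return count


# Hypothetical weights and features; a real w would be learned by Ranking SVM.
w = [1.5, -0.5, 0.8]
candidates = [
    ("bad", [0.0, 1.0, 0.0]),
    ("good", [1.0, 0.0, 1.0]),
    ("indifferent", [0.5, 0.5, 0.5]),
]
ranked = rank_candidates(w, candidates)

# Ordered categories give preference pairs: good > indifferent > bad.
pairs = [
    ([1.0, 0.0, 1.0], [0.5, 0.5, 0.5]),
    ([0.5, 0.5, 0.5], [0.0, 1.0, 0.0]),
    ([1.0, 0.0, 1.0], [0.0, 1.0, 0.0]),
]
n_ok = satisfied_pairs(w, pairs)
```

Because the utility is linear, only the ordering of the scores matters, so no bias term is needed.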

Ranking based on Classification

Only good and bad definitions are used for training, so this is a binary classification problem.

SVM is used as the model.

$F(x) = w^{T}x + b$
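
The sketch below illustrates the data preparation this setting implies: indifferent examples are dropped and good/bad are mapped to +1/-1. It also evaluates the decision value for one candidate, on the assumption (not stated on the slide) that candidates are then sorted by this value. The weights, bias, and function names are hypothetical, and no SVM training is performed.

```python
def to_binary_training_set(labeled):
    """Keep only good/bad examples and map them to +1/-1."""
    mapping = {"good": 1, "bad": -1}
    return [(x, mapping[label]) for x, label in labeled if label in mapping]


def decision_value(w, b, x):
    """F(x) = w^T x + b; a larger value means 'more like a good definition'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Hypothetical labeled feature vectors.
labeled = [
    ([1.0, 0.5], "good"),
    ([0.5, 0.5], "indifferent"),
    ([0.0, 1.0], "bad"),
]
train = to_binary_training_set(labeled)

# Hypothetical parameters; a real (w, b) would come from a trained SVM.
w, b = [2.0, -1.0], -0.5
score = decision_value(w, b, [1.0, 0.5])
```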

Features

Removing redundant candidates

After ranking, duplicate definitions may exist.

Use edit distance to detect duplicates, and remove the one with the lower ranking score.
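
One way to sketch this step is a greedy pass over the ranked list that drops any candidate too close to an already kept, higher-scored one. The slide does not give a threshold or say whether the distance is normalized, so the character-level Levenshtein distance, the length normalization, and the 0.2 cutoff below are all assumptions.

```python
def edit_distance(a, b):
    """Character-level Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def remove_redundant(ranked, threshold=0.2):
    """Keep a candidate only if it is not a near-duplicate of a kept one.

    `ranked` is a list of (text, score) sorted by descending score, so the
    lower-scored member of each duplicate pair is the one dropped.
    `threshold` is a hypothetical cutoff on the length-normalized distance.
    """
    kept = []
    for text, score in ranked:
        is_duplicate = any(
            edit_distance(text, k) / max(len(text), len(k), 1) <= threshold
            for k, _ in kept
        )
        if not is_duplicate:
            kept.append((text, score))
    return kept


# Made-up ranked candidates, for illustration only.
ranked = [
    ("Linux is an open source operating system.", 2.3),
    ("Linux is an open-source operating system.", 1.9),
    ("Linux is one of the most widely used server platforms.", 1.1),
]
unique = remove_redundant(ranked)
```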

Sample result

Evaluation metric

Results: For intranet data

Results: For TREC.gov data

Results: for definitional sentences

Conclusions

Addresses the problem of searching for definitions by ranking definition candidates.

Results are better than traditional IR.

An enterprise search system has been developed.

The approach is not limited to the search of definitions.