Taking the Kitchen Sink Seriously:
An Ensemble Approach to Word Sense Disambiguation from
Christopher Manning et al.
Overview
● 23 student WSD projects combined in a 2-layer voting scheme (an ensemble of ensemble classifiers).
● Performed well on SENSEVAL-2: 4th place out of 21 supervised systems on the English Lexical Sample task.
● Offers some valuable lessons for both WSD and ensemble methods in general.
System Overview
● 23 different "1st order" classifiers.
– Independently developed WSD systems.
– Use a variety of algorithms (naïve Bayes, n-gram, etc.).
● These 1st order classifiers combined into a variety of 2nd order classifiers/voting mechanisms.
– 2nd order classifiers vary with respect to:
● Algorithms used to combine 1st order classifiers.
● Number of voters: each takes the top k 1st order classifiers, where k is one of {1, 3, 5, 7, 9, 11, 13, 15}.
Voting Algorithms
● Majority vote (each vote has weight 1).
● Weighted voting, with weights determined by EM.
– Tries to choose weights that maximize the likelihood of 2nd order training instances, where the probability of a sense (given the votes) is defined as the sum of weighted votes for that sense.
● Maximum entropy using features derived from the votes of the 1st order classifiers.
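The first two voting schemes can be sketched in a few lines. The following is a minimal, runnable illustration, not the paper's code: the function names and data layout are assumptions, and the EM sketch implements the stated model that P(sense | votes) is the sum of the weights of the voters choosing that sense, with weights summing to 1.

```python
from collections import Counter

def majority_vote(votes):
    """Unweighted vote: return the sense chosen by the most voters."""
    return Counter(votes).most_common(1)[0][0]

def em_vote_weights(all_votes, truths, n_iters=50):
    """Estimate voter weights by EM (a sketch of the scheme above).

    Model: P(sense | votes) = sum of the weights of the voters that
    chose that sense, with weights summing to 1.
    E-step: each voter that picked the true sense gets credited with
    its share of that probability.  M-step: renormalize the credit
    into new weights.  all_votes[t] holds every voter's sense label
    for training instance t.
    """
    n = len(all_votes[0])
    w = [1.0 / n] * n
    for _ in range(n_iters):
        credit = [0.0] * n
        for votes, y in zip(all_votes, truths):
            p = sum(w[i] for i in range(n) if votes[i] == y)
            if p == 0.0:          # no voter got this instance right
                continue
            for i in range(n):
                if votes[i] == y:
                    credit[i] += w[i] / p
        total = sum(credit)
        if total == 0.0:
            break
        w = [c / total for c in credit]
    return w
```

On data where one voter is consistently right and another is usually wrong, EM shifts nearly all the weight onto the reliable voter, which is the intended effect of likelihood maximization here.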
Classifier Construction Process
● For each word:
– Train each 1st order classifier on ¾ of the training data
– Use the remaining ¼ to rank the performance of the 1st order classifiers
– For each 2nd order classifier:
● Take the top k 1st order classifiers for this word
● Train the 2nd order on ¾ of the training data using this ensemble
– Rank the performance of the 2nd order classifiers with the remaining ¼ of the training data
– Take the top 2nd order as the classifier for this word. Retrain on all the training data.
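The per-word split-and-rank steps above can be sketched as follows. This is an illustrative skeleton under assumed interfaces: `fit`/`predict` are hypothetical method names for the student systems, and the exact ¾/¼ split procedure is not specified in the slides.

```python
import random

def three_quarter_split(data, seed=0):
    """Shuffle, then use the first 3/4 for training and the last 1/4
    for ranking (the split procedure itself is an assumption)."""
    data = list(data)
    random.Random(seed).shuffle(data)
    cut = (3 * len(data)) // 4
    return data[:cut], data[cut:]

def rank_by_held_out_accuracy(classifiers, train, held_out):
    """Train each classifier on `train`, score it on `held_out`
    (pairs of instance and gold sense), return them sorted best-first."""
    def accuracy(clf):
        clf.fit(train)
        return sum(clf.predict(x) == y for x, y in held_out) / len(held_out)
    return sorted(classifiers, key=accuracy, reverse=True)
```

Per word, one would rank the 1st order classifiers this way, build a 2nd order voter over the top k for each k in {1, 3, ..., 15}, rank those 2nd order classifiers with the same held-out quarter, and finally retrain the winner on all the training data.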
Results
● 61.7% accuracy in SENSEVAL-2 competition (4th place).
● After competition, improved performance:
– Used global performance (i.e., over all words) as a tie breaker for the rankings of both 1st and 2nd order classifiers.
– Improved accuracy to 63.9% (would have been 2nd).
Results for 2nd Order Classifiers
● Results are averaged over all words.
● Note MaxEnt's ability to resist dilution.
Evaluating Effects of Combination
● We want different classifiers to make different mistakes.
● We can measure this differentiation as the average, over all pairs of 1st order classifiers, of the fraction of errors that are shared: the fewer errors a pair shares, the greater the error independence.
● When error independence and word difficulty grow, the advantage of combination grows.
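One plausible formalization of the pairwise measure above is the overlap of each pair's error sets, averaged over all pairs; the exact definition used in the paper is not given on the slide, so the intersection-over-union form here is an assumption.

```python
from itertools import combinations

def avg_shared_error_fraction(error_sets):
    """Average, over all pairs of classifiers, of the fraction of
    errors the pair shares (|intersection| / |union| of their error
    sets).  Lower values mean more independent mistakes, which is
    what makes combination pay off."""
    fracs = []
    for a, b in combinations(error_sets, 2):
        union = a | b
        if union:
            fracs.append(len(a & b) / len(union))
    return sum(fracs) / len(fracs) if fracs else 0.0
```

For example, two classifiers whose error sets are {1, 2, 3} and {3, 4} share one of four distinct errors, giving 0.25.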
Lessons for WSD
● Every word is a separate problem.
– Every 1st and 2nd order classifier was the best performer on at least some words.
● Implementation details:
– Large or small window sizes work better than medium window sizes.
– This suggests that senses are determined on both a very local, collocational level and a very general, topical level.
– Smoothing is very important.
Lessons for Ensemble Methods
● Variety within the ensemble is desirable.
– Qualitatively different approaches are better than minor perturbations in similar approaches.
– We can measure the extent to which this ideal is achieved.
● Variety in combination algorithms helps as well.
– In particular, it can help with overfitting (because different algorithms will start overtraining at different points).