Improving Machine Learning Approaches to Coreference Resolution. Vincent Ng and Claire Cardie, Cornell Univ., ACL 2002. Slides prepared by Ralph Grishman.


Page 1: Improving Machine Learning Approaches to Coreference Resolution

Improving Machine Learning Approaches to Coreference Resolution

Vincent Ng and Claire Cardie

Cornell Univ.

ACL 2002

slides prepared by Ralph Grishman

Page 2

Goal

Improve on Soon et al. by:

• better preprocessing (chunking, names, …)
• better search procedure for antecedent
• better selection of positive examples
• more features, more features, more features …

Page 3

Better search for antecedent

Soon et al. use a decision tree as a binary classifier and take the nearest antecedent classified as positive.

Ng & Cardie use the same sort of classifier, but count the positive and negative training examples at each leaf, and use those counts to compute a probability.

Ng & Cardie then take the highest-ranking antecedent (if its probability > 0.5).
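The best-first selection described above can be sketched as follows; the leaf counts and candidate list are assumed interfaces, not the authors' actual code:

```python
# Sketch of best-first antecedent selection: instead of taking the
# nearest candidate the tree labels positive (Soon et al.), score every
# candidate with a probability estimated from the +ve/-ve training
# counts at the decision-tree leaf the pair falls into, and take the
# most probable candidate above 0.5.

def leaf_probability(pos_count, neg_count):
    """Probability estimate from the +ve/-ve counts at the leaf."""
    return pos_count / (pos_count + neg_count)

def best_antecedent(candidates):
    """candidates: list of (mention, pos_count, neg_count) for each
    preceding NP, where the counts come from the leaf that the
    (candidate, anaphor) pair reaches. Returns the most probable
    antecedent, or None if none exceeds 0.5."""
    best, best_p = None, 0.5
    for mention, pos, neg in candidates:
        p = leaf_probability(pos, neg)
        if p > best_p:
            best, best_p = mention, p
    return best

# Toy example: three candidate antecedents with hypothetical leaf counts.
cands = [("the company", 8, 2),   # p = 0.8
         ("it", 3, 7),            # p = 0.3
         ("IBM", 9, 1)]           # p = 0.9
print(best_antecedent(cands))     # -> IBM
```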

Page 4

Better choice of positive examples

Soon et al. always use the most recent antecedent.

For Ng & Cardie, if the anaphor is not a pronoun, they use the most recent antecedent that is not a pronoun.
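That selection rule can be sketched as below; the mention representation (`text`, `is_pronoun`) is an assumed interface for illustration:

```python
# Sketch of the positive-example selection rule: Soon et al. pair the
# anaphor with its most recent coreferent antecedent; Ng & Cardie, for a
# non-pronominal anaphor, instead use the most recent coreferent
# antecedent that is itself not a pronoun.

def positive_example(anaphor, antecedents):
    """antecedents: coreferent mentions preceding the anaphor, most
    recent first. Each mention is a dict with 'text' and 'is_pronoun'."""
    if not anaphor["is_pronoun"]:
        for m in antecedents:
            if not m["is_pronoun"]:
                return m
    # pronominal anaphor (or no non-pronoun found): most recent antecedent
    return antecedents[0] if antecedents else None

chain = [{"text": "it", "is_pronoun": True},
         {"text": "the merger", "is_pronoun": False}]
anaphor = {"text": "the deal", "is_pronoun": False}
print(positive_example(anaphor, chain)["text"])  # -> the merger
```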

Page 5

More features #1

Soon et al. have a ‘same string’ feature.

Ng & Cardie split this up into three features, for pronominals, nominals, and names.
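The split can be sketched as three type-restricted match features; the feature names, NP typing, and case-folding normalization below are illustrative assumptions:

```python
# Sketch of splitting one 'same string' feature into three features
# that fire only for matching NPs of the same type: both pronominal,
# both proper names, or both (non-pronominal) nominals.

def string_match_features(np1, np2):
    """Each NP is a (text, type) pair; type is one of
    'pronoun', 'name', 'nominal'."""
    (t1, k1), (t2, k2) = np1, np2
    same = t1.lower() == t2.lower()
    return {
        "PRO_STR":   same and k1 == k2 == "pronoun",
        "PN_STR":    same and k1 == k2 == "name",
        "WORDS_STR": same and k1 == k2 == "nominal",
    }

feats = string_match_features(("IBM", "name"), ("IBM", "name"))
print(feats)  # only PN_STR is True
```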

Page 6

First improvements: F scores

                    MUC-6   MUC-7
Soon et al.          62.6    60.4
Better preproc.      66.3    61.2
Better search        66.3    62.3
+ve ex. selection    65.8    61.1
String features      66.7    62.0
Combined             67.5    63.0

Page 7

More features

Added 41 more features:

• lexical
• grammatical
• semantic

Page 8

Lexical features (examples)

• Non-empty overlap of words of the two NPs
• Prenominal modifiers of one NP are a subset of the prenominal modifiers of the other
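These two example features can be sketched with plain set operations; the tokenization and case-folding are assumptions of the sketch:

```python
# Sketch of two example lexical features: non-empty word overlap
# between the two NPs, and whether the prenominal modifiers of one NP
# are a subset of the other's.

def word_overlap(np1_words, np2_words):
    """True iff the two NPs share at least one word (case-folded)."""
    return bool({w.lower() for w in np1_words} &
                {w.lower() for w in np2_words})

def modifier_subset(mods1, mods2):
    """True iff one NP's prenominal modifiers are a subset of the other's."""
    s1 = {m.lower() for m in mods1}
    s2 = {m.lower() for m in mods2}
    return s1 <= s2 or s2 <= s1

print(word_overlap(["the", "software", "giant"], ["the", "giant"]))  # True
print(modifier_subset(["big", "blue"], ["blue"]))                    # True
```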

Page 9

Grammatical features (examples)

• NPs are in a predicate nominal construction
• One NP spans the other
• NP1 is a quoted string
• One of the NPs is a title
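Two of these checks are simple enough to sketch directly; the NP representation with character offsets and raw text is an assumption of the illustration:

```python
# Sketch of two example grammatical features: one NP's span containing
# the other's, and an NP being a quoted string.

def spans(np_outer, np_inner):
    """True iff np_outer's character span contains np_inner's span."""
    return (np_outer["start"] <= np_inner["start"]
            and np_inner["end"] <= np_outer["end"])

def is_quoted(np):
    """True iff the NP's text is wrapped in quotation marks."""
    t = np["text"]
    return len(t) >= 2 and t[0] in "\"'\u201c" and t[-1] in "\"'\u201d"

np1 = {"text": "the Cornell researchers", "start": 10, "end": 33}
np2 = {"text": "Cornell", "start": 14, "end": 21}
print(spans(np1, np2))                                           # True
print(is_quoted({"text": '"Big Blue"', "start": 0, "end": 10}))  # True
```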

Page 10

Semantic features (examples)

For nominals with different heads:

• direct or indirect hypernym relation in WordNet
• distance of the hypernym relation
• sense number for the hypernym relation
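The hypernym-relation and hypernym-distance features can be sketched with a toy hand-built hypernym table standing in for WordNet; the taxonomy below is illustrative, not real WordNet data:

```python
# Sketch of the WordNet-derived features, using a toy single-parent
# hypernym table (illustrative only) in place of WordNet lookups.

TOY_HYPERNYMS = {
    "car": "automobile",
    "automobile": "vehicle",
    "vehicle": "artifact",
    "company": "organization",
}

def hypernym_distance(head1, head2, table=TOY_HYPERNYMS):
    """Number of hypernym links from head1 up to head2,
    or None if head2 is not an ancestor of head1."""
    dist, node = 0, head1
    while node is not None:
        if node == head2:
            return dist
        node = table.get(node)
        dist += 1
    return None

print(hypernym_distance("car", "vehicle"))  # 2 (car -> automobile -> vehicle)
print(hypernym_distance("car", "company"))  # None: no hypernym relation
```

A binary "hypernym relation exists" feature is then just `hypernym_distance(...) is not None` in either direction.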

Page 11

Selecting features

Full feature set yielded very low precision on nominal anaphors

• overtraining: too many features for too little data

So they (manually) eliminated many features that led to low precision (on training data)

• no ‘development set’ separate from training and test sets
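Ng & Cardie did this pruning by hand; the measurement step behind it can be sketched as a leave-one-out check over the feature set, with a toy stand-in for retraining and scoring the system:

```python
# Sketch of the measurement behind manual feature elimination: flag
# features whose removal raises precision on the training data.
# `evaluate` is a stand-in for retraining and scoring the system.

def flag_harmful(features, evaluate):
    """Return the features whose removal improves precision."""
    base = evaluate(features)
    return [f for f in features
            if evaluate([g for g in features if g != f]) > base]

# Toy evaluator: precision drops 0.02 for each 'noisy' feature included.
NOISY = {"feat_c"}
def toy_eval(feats):
    return 0.70 - 0.02 * len(NOISY & set(feats))

print(flag_harmful(["feat_a", "feat_b", "feat_c"], toy_eval))  # ['feat_c']
```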

Page 12

Adding features: F scores

                        MUC-6   MUC-7
Intermediate system      67.5    63.0
All features             63.8    61.6
Hand-selected features   69.1    63.4