Improving Machine Learning Approaches to Coreference
Resolution
Vincent Ng and Claire Cardie
Cornell Univ.
ACL 2002
slides prepared by Ralph Grishman
Goal
Improve on Soon et al. by:
better preprocessing (chunking, names, …)
better search procedure for antecedent
better selection of positive examples
more features, more features, more features ...
Better search for antecedent
Soon et al. use a decision tree as a binary classifier and take the nearest antecedent classified as +ve
Ng&Cardie use the same sort of classifier, but count the +ve and -ve training examples at each leaf and use them to compute a probability
Ng&Cardie then take the highest-ranking antecedent (if its probability > 0.5)
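The best-first search above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: `score` stands for the tree classifier, and the function and parameter names are assumptions.

```python
def leaf_probability(leaf_pos, leaf_neg):
    """Coreference probability for a (candidate, anaphor) pair: the
    fraction of +ve training examples at the decision-tree leaf that
    the pair falls into."""
    return leaf_pos / (leaf_pos + leaf_neg)

def best_first_antecedent(anaphor, candidates, score, threshold=0.5):
    """Ng&Cardie-style search: score every preceding NP and take the
    highest-scoring one, but only if it clears the threshold.
    `score(anaphor, candidate)` is assumed to return a leaf probability."""
    best, best_p = None, threshold
    for cand in candidates:          # all NPs preceding the anaphor
        p = score(anaphor, cand)
        if p > best_p:
            best, best_p = cand, p
    return best                      # None -> anaphor starts a new chain
```

Contrast with Soon et al., who would stop at the first (nearest) candidate scored above threshold rather than ranking them all.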
Better choice of positive examples
Soon et al. always use the most recent antecedent
For Ng&Cardie, if the anaphor is not a pronoun, they use the most recent antecedent that is not a pronoun
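A minimal sketch of that training-example rule (the helper name and the `chain_mentions` representation are assumptions of this example, not the paper's code):

```python
def pick_positive_antecedent(anaphor, chain_mentions, is_pronoun):
    """Choose the antecedent used as the positive training example for
    `anaphor`. `chain_mentions` lists the earlier mentions of the same
    entity, most recent last.

    Soon et al.: always the most recent mention.
    Ng&Cardie:  for a non-pronominal anaphor, the most recent
                non-pronominal mention instead."""
    if not chain_mentions:
        return None
    if is_pronoun(anaphor):
        return chain_mentions[-1]
    for mention in reversed(chain_mentions):
        if not is_pronoun(mention):
            return mention
    return chain_mentions[-1]   # fall back if the chain is all pronouns
```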
More features #1
Soon et al. have a ‘same string’ feature
Ng&Cardie split this up into 3 features, for pronominals, nominals, and names
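A sketch of that split. The feature names and the determiner-stripping step are illustrative assumptions; only the idea of one string-match indicator per mention type comes from the slide.

```python
def string_match_features(np1, np2, anaphor_type):
    """Replace a single same-string feature with three type-specific
    ones. `anaphor_type` is one of "pronoun", "nominal", "name"
    (a hypothetical argument for this sketch)."""
    def strip_det(s):
        # compare strings after dropping determiners (assumption)
        return " ".join(w for w in s.lower().split()
                        if w not in {"a", "an", "the"})
    match = strip_det(np1) == strip_det(np2)
    return {
        "pro_str":  match and anaphor_type == "pronoun",
        "nom_str":  match and anaphor_type == "nominal",
        "name_str": match and anaphor_type == "name",
    }
```

Splitting the feature lets the decision tree learn that an exact string match is near-conclusive for names but much weaker evidence for pronouns.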
First improvements: F scores

                   MUC-6  MUC-7
Soon et al.         62.6   60.4
Better preproc.     66.3   61.2
Better search       66.3   62.3
+ve ex. selection   65.8   61.1
String features     66.7   62.0
Combined            67.5   63.0
More features
Added 41 more features: lexical, grammatical, semantic
Lexical features (examples)
Non-empty overlap of words of two NPs
Prenominal modifiers of one NP are a subset of prenominal modifiers of the other
Grammatical features (examples)
NPs are in predicate nominal construct
One NP spans the other
NP1 is a quoted string
One of the NPs is a title
Semantic features (examples)
For nominals with different heads:
direct or indirect hypernym relation in WordNet
distance of hypernym relation
sense number for hypernym relation
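The hypernym features can be illustrated with a toy IS-A table standing in for WordNet (the real system queries WordNet; the dictionary and function names here are assumptions of this sketch):

```python
# Toy single-parent IS-A hierarchy standing in for WordNet.
TOY_HYPERNYMS = {
    "dog":    "canine",
    "canine": "mammal",
    "mammal": "animal",
    "cat":    "feline",
    "feline": "mammal",
}

def hypernym_distance(head1, head2):
    """If head2 is a direct or indirect hypernym of head1, return the
    number of IS-A links between them; otherwise None."""
    node, dist = head1, 0
    while node is not None:
        if node == head2:
            return dist
        node = TOY_HYPERNYMS.get(node)
        dist += 1
    return None

def semantic_features(head1, head2):
    """Hypernym features for two nominals with different heads."""
    d = hypernym_distance(head1, head2)
    return {"hypernym": d is not None, "hypernym_distance": d}
```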
Selecting features
Full feature set yielded very low precision on nominal anaphors
• overtraining: too many features for too little data
So they (manually) eliminated the features that led to low precision (on the training data)
• no ‘development set’ separate from training and test sets
Adding features: F scores

                        MUC-6  MUC-7
Intermediate system      67.5   63.0
All features             63.8   61.6
Hand-selected features   69.1   63.4