Learning Within-Sentence Semantic Coherence
Elena Eneva, Rose Hoberman, Lucian Lita
Carnegie Mellon University

Page 1:

Learning Within-Sentence Semantic Coherence

Elena Eneva
Rose Hoberman
Lucian Lita

Carnegie Mellon University

Page 2:

Semantic (in)Coherence

Trigram model: content words can be unrelated

Effect on speech recognition:

– Actual Utterance: “THE BIRD FLU HAS AFFECTED CHICKENS FOR YEARS BUT ONLY RECENTLY BEGAN MAKING HUMANS SICK”

– Top Hypothesis: “THE BIRD FLU HAS AFFECTED SECONDS FOR YEARS BUT ONLY RECENTLY BEGAN MAKING HUMAN SAID”

Our goal: model semantic coherence

Page 3:

A Whole Sentence Exponential Model [Rosenfeld 1997]

P0(s) is an arbitrary initial model (typically an N-gram)

fi(s)'s are arbitrary computable properties of s (aka features)

λi's are the corresponding feature weights

Z is a universal normalizing constant

$$
\Pr(s) \;\stackrel{\mathrm{def}}{=}\; \frac{1}{Z}\, P_0(s)\, \exp\Big(\sum_i \lambda_i f_i(s)\Big)
$$
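As a minimal sketch (not the paper's implementation), such a model can rescore recognizer hypotheses by comparing unnormalized log-scores, since Z is shared by all sentences; the baseline and feature below are made-up placeholders:

```python
def log_score(sentence, p0_logprob, features, weights):
    """Unnormalized log Pr(s): log P0(s) + sum_i lambda_i * f_i(s).
    The -log Z term is omitted; it cancels when ranking hypotheses."""
    return p0_logprob(sentence) + sum(
        lam * f(sentence) for lam, f in zip(weights, features))

# Hypothetical usage: a dummy "N-gram" log-prob and one toy feature.
p0 = lambda s: -2.0 * len(s.split())
f_coh = lambda s: 1.0 if "chickens" in s else 0.0   # stand-in coherence feature
print(log_score("the bird flu has affected chickens", p0, [f_coh], [0.5]))
```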

Page 4:

A Methodology for Feature Induction

Given corpus T of training sentences:

1. Train best-possible baseline model, P0(s)

2. Use P0(s) to generate corpus T0 of “pseudo sentences”

3. Pose a challenge: find (computable) differences that allow discrimination between T and T0

4. Encode the differences as features fi(s)

5. Train a new model:

$$
P_1(s) \;=\; \frac{1}{Z}\, P_0(s)\, \exp\Big(\sum_i \lambda_i f_i(s)\Big)
$$
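A minimal sketch of this induction loop, in which every helper (train_baseline, sample_from, find_features, fit_weights) is a hypothetical stand-in supplied by the caller, not the paper's actual code:

```python
def induce_features(T, train_baseline, sample_from, find_features, fit_weights,
                    rounds=1):
    """Feature-induction loop corresponding to steps 1-5 above."""
    model = train_baseline(T)                  # 1. best-possible baseline P0
    for _ in range(rounds):
        T0 = sample_from(model, n=len(T))      # 2. corpus of pseudo sentences
        feats = find_features(T, T0)           # 3-4. computable differences, encoded as features
        model = fit_weights(model, feats, T)   # 5. train the new exponential model P1
    return model
```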

Page 5:

Discrimination Task:

1. - - - feel - - sacrifice - - sense - - - - - - - - -meant - - - - - - - - trust - - - - truth

2. - - kind - free trade agreements - - - living - - ziplock bag - - - - - - university japan's daiwa bank stocks step –

Are these content words generated from a trigram model, or taken from a natural sentence?

Page 6:

Building on Prior Work

Define "content words" (all but the top 50 words)

Goal: model the distribution of content words in a sentence

Simplify: model pairwise co-occurrences ("content word pairs")

Collect contingency tables; calculate a measure of association for them

Page 7:

Q Correlation Measure

Q values range from –1 to +1

$$
Q \;=\; \frac{c_{11}c_{22} - c_{12}c_{21}}{c_{11}c_{22} + c_{12}c_{21}}
$$

Derived from the co-occurrence contingency table:

              W1 yes   W1 no
    W2 yes     c11      c21
    W2 no      c12      c22
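A small sketch of computing this statistic (Yule's Q) from the four cell counts; the example counts are invented:

```python
def yules_q(c11, c12, c21, c22):
    """Q from a 2x2 co-occurrence contingency table; ranges over [-1, +1]."""
    num = c11 * c22 - c12 * c21
    den = c11 * c22 + c12 * c21
    return num / den if den else 0.0

# Invented counts: a word pair that co-occurs far more often than chance.
print(yules_q(c11=40, c12=10, c21=5, c22=945))  # close to +1
```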

Page 8:

Density Estimates

We hypothesized:
– Trigram sentences: wordpair correlation completely determined by distance
– Natural sentences: wordpair correlation independent of distance

Kernel density estimation:
– distribution of Q values in each corpus
– at varying distances
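One way to realize this step, assuming SciPy is available; the Q-value arrays below are made up, and in practice there would be one estimate per corpus per distance:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Made-up Q values for word pairs observed at one fixed distance.
q_natural = np.array([0.9, 0.7, 0.8, 0.2, 0.95, 0.6, 0.4])
q_trigram = np.array([0.1, -0.3, 0.2, 0.5, -0.1, 0.0, 0.3])

density_natural = gaussian_kde(q_natural)   # estimate of Pr(Q | d, BNews)
density_trigram = gaussian_kde(q_trigram)   # estimate of Pr(Q | d, Trigram)

# Estimated densities of Q = 0.76 at this distance under each corpus:
print(density_natural(0.76)[0], density_trigram(0.76)[0])
```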

Page 9:

Q Distributions

[Figure: density of Q values at distance 1 and distance 3; dashed = trigram-generated, solid = broadcast news]

Page 10:

Likelihood Ratio Feature

$$
L(s) \;=\; \prod_{\text{wordpairs } i,j} \frac{\Pr(Q_{ij} \mid d_{ij}, \text{BNews})}{\Pr(Q_{ij} \mid d_{ij}, \text{Trigram})}
$$

she is a country singer searching for fame and fortune in nashville

Q(country, nashville) = 0.76, distance = 8
Pr(Q=0.76 | d=8, BNews) = 0.32
Pr(Q=0.76 | d=8, Trigram) = 0.11
Likelihood ratio = 0.32 / 0.11 ≈ 2.9
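A sketch of this feature computed in log space, where p_bnews(q, d) and p_trigram(q, d) stand for the estimated densities (hypothetical names); the usage reproduces the single-pair example above:

```python
import math

def log_likelihood_ratio(pairs, p_bnews, p_trigram):
    """pairs: (Q value, distance) for each content word pair in the sentence."""
    return sum(math.log(p_bnews(q, d)) - math.log(p_trigram(q, d))
               for q, d in pairs)

# One pair with Q = 0.76 at distance 8 contributes 0.32 / 0.11 ≈ 2.9.
lr = math.exp(log_likelihood_ratio([(0.76, 8)],
                                   lambda q, d: 0.32,
                                   lambda q, d: 0.11))
print(round(lr, 1))  # 2.9
```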

Page 11:

Simpler Features

Q value based:
– Mean, median, min, max of Q values for content word pairs in the sentence (Cai et al. 2000)
– Percentage of Q values above a threshold
– High/low correlations across large/small distances

Other:
– Word and phrase repetition
– Percentage of stop words
– Longest sequence of consecutive stop/content words
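A sketch of computing these features for one sentence; the stopword list here is an abbreviated stand-in for the paper's top-50 list:

```python
import itertools
import statistics

STOPWORDS = {"the", "a", "of", "to", "and", "in", "is", "it", "you", "that"}

def simple_features(tokens, q_values, threshold=0.5):
    """Q-value summary statistics plus surface statistics of the sentence."""
    is_stop = [t in STOPWORDS for t in tokens]
    content_runs = [len(list(g))
                    for stop, g in itertools.groupby(is_stop) if not stop]
    return {
        "q_mean": statistics.mean(q_values),
        "q_median": statistics.median(q_values),
        "q_min": min(q_values),
        "q_max": max(q_values),
        "q_above_thresh": sum(q > threshold for q in q_values) / len(q_values),
        "stopword_pct": sum(is_stop) / len(tokens),
        "word_repetition": len(tokens) - len(set(tokens)),
        "longest_content_run": max(content_runs, default=0),
    }
```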

Page 12:

Datasets

LM and contingency tables (Q values) derived from 103 million words of Broadcast News (BN)

From the remainder of the BN corpus and sentences sampled from the trigram LM:
– Q value distributions estimated from ~100,000 sentences
– Decision tree trained and tested on ~60,000 sentences

Disregarded sentences with < 7 words, e.g.:
– "Mike Stevens says it's not real"
– "We've been hearing about it"

Page 13:

Experiments

Learners:
– C5.0 decision tree
– Boosting decision stumps with Adaboost.MH

Methodology:
– 5-fold cross validation on ~60,000 sentences
– Boosting for 300 rounds
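For orientation, a rough modern stand-in for this setup using scikit-learn (≥ 1.2): boosted depth-1 trees for 300 rounds with 5-fold cross-validation. This is only an approximation (the paper used C5.0 and Adaboost.MH), and X, y stand for the feature matrix and real-vs-trigram labels:

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def evaluate(X, y):
    clf = AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=1),  # decision stumps
        n_estimators=300)                               # 300 boosting rounds
    scores = cross_val_score(clf, X, y, cv=5)           # 5-fold cross validation
    return scores.mean(), scores.std()
```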

Page 14:

Results

Feature Set                                  Classification Accuracy
Q mean, median, min, max (previous work)     73.39 ± 0.36
Likelihood ratio                             77.76 ± 0.49
All but likelihood ratio                     80.37 ± 0.42
All features                                 80.37 ± 0.46
Likelihood ratio + non-Q

Page 15:

Shannon-Style Experiment

50 sentences:
– half "real" and half trigram-generated
– stopwords replaced by dashes

30 participants:
– average accuracy of 73.77% ± 6
– best individual accuracy 84%

Our classifier:
– accuracy of 78.9% ± 0.42

Page 16:

Summary

Introduced a set of statistical features which capture aspects of semantic coherence

Trained a decision tree to classify with accuracy of 80%

Next step: incorporate features into exponential LM

Page 17:

Future Work

Combat data sparsity:
– Confidence intervals
– Different correlation statistic
– Stemming or clustering the vocabulary

Evaluate derived features:
– Incorporate into an exponential language model
– Evaluate the model on a practical application

Page 18:

Agreement among Participants

Page 19:

Expected Perplexity Reduction

Semantic coherence feature:
– 78% of broadcast news sentences
– 18% of trigram-generated sentences

Kullback-Leibler divergence: 0.814

Average perplexity reduction per word = .0419 (2^.814/21) per sentence?

Features modify the probability of the entire sentence, so the effect of the feature on per-word probability is small
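A sketch of the reasoning, stated symbolically (assuming D is the per-sentence KL divergence in bits and n the average sentence length, since the slide's exact arithmetic is kept as-is above):

$$
\text{per-sentence reduction factor} = 2^{D}, \qquad
\text{per-word reduction factor} = \left(2^{D}\right)^{1/n} = 2^{D/n}
$$

With D = 0.814 bits and realistic n, the per-word factor is close to 1, which matches the conclusion that the per-word effect is small.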

Page 20:

Distribution of Likelihood Ratio

[Figure: density of likelihood ratio values; dashed = trigram-generated, solid = broadcast news]

Page 21:

Discrimination Task

Natural sentence:
– "but it doesn't feel like a sacrifice in a sense that you're really saying this is you know i'm meant to do things the right way and you trust it and tell the truth"

Trigram-generated:
– "they just kind of free trade agreements which have been living in a ziplock bag that you say that i see university japan's daiwa bank stocks step though"

Page 22:

Q Values at Distance 1

[Figure: density of Q values at distance 1; dashed = trigram-generated, solid = broadcast news]

Page 23:

Q Values at Distance 3

[Figure: density of Q values at distance 3; dashed = trigram-generated, solid = broadcast news]

Page 24:

Outline

The problem of semantic (in)coherence

Incorporating this into the whole-sentence exponential LM

Finding better features for this model using machine learning

Semantic coherence features

Experiments and results