+ Detecting Genre Shift Mark Dredze, Tim Oates, Christine Piatko Paper to appear at EMNLP-10

+Natural Language Processing and Machine Learning

Extracting findings from scientific papers

•Genetic epidemiology (development domain)

•PubMed search produces thousands of papers

•Manually reviewed to extract findings

•Findings determine relevant papers/studies

•Automate this process with ML/NLP methods

•Create searchable database of findings

•Allow machine inference over findings

•Suggest new scientific hypotheses

+Genre Shift in Statistical NLP

… told that John Paul Stevens is retiring this summer …

Named Entity Recognition

… President Barack Obama is urging members to …

… President Barack Obama is urging members to …

+Supervised Machine Learning for Named Entity Recognition

Today the Atlantic Ocean is in an uproar and North Carolina remains in a state of anxiety.

Windowed text                       Label
Today the Atlantic Ocean is         B
the Atlantic Ocean is in            I
Atlantic Ocean is in an             O
Ocean is in an uproar               O
is in an uproar and                 O
in an uproar and North              O
an uproar and North Carolina        O
uproar and North Carolina remains   B
and North Carolina remains in       I
North Carolina remains in a         O
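Reading the table, each row appears to be a five-word window whose label is the BIO tag of its center word (B for Atlantic, I for Ocean, O for is). A minimal Python sketch of that windowing, assuming whitespace tokenization and hand-assigned tags; the function `windows` and the tag assignment are our own illustration, not code from the paper:

```python
def windows(tokens, tags, half_width=2):
    """Return (window_text, label) pairs. Each window spans
    2 * half_width + 1 tokens and takes the BIO tag of its center token.
    Only full windows are produced (no padding at sentence edges)."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    pairs = []
    for i in range(half_width, len(tokens) - half_width):
        window = tokens[i - half_width : i + half_width + 1]
        pairs.append((" ".join(window), tags[i]))
    return pairs


sentence = ("Today the Atlantic Ocean is in an uproar and "
            "North Carolina remains in a state of anxiety .")
tokens = sentence.split()
# Hypothetical gold tags: "Atlantic Ocean" and "North Carolina" are entities.
tags = ["B" if t in ("Atlantic", "North")
        else "I" if t in ("Ocean", "Carolina")
        else "O"
        for t in tokens]

for text, label in windows(tokens, tags)[:3]:
    print(f"{text:<32} {label}")
```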

+Supervised Machine Learning for Named Entity Recognition

Windowed text                  Label
Today the Atlantic Ocean is    B
the Atlantic Ocean is in       I
Atlantic Ocean is in an        O

Feature vector                                      Label
[today, the, atlantic, ocean, is, U, L, U, U, L]    B
[the, atlantic, ocean, is, in, L, U, U, L, L]       I
[atlantic, ocean, is, in, an, U, U, L, L, L]        O
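The vectors above look like the five lowercased words followed by one capitalization flag per word (U for an initial capital, L otherwise). A sketch under that reading; `features` is a hypothetical helper, not the paper's feature extractor:

```python
def features(window_text):
    """Lowercased words followed by one case flag per word:
    U if the word starts with an uppercase letter, L otherwise."""
    words = window_text.split()
    lowered = [w.lower() for w in words]
    flags = ["U" if w[:1].isupper() else "L" for w in words]
    return lowered + flags


print(features("Today the Atlantic Ocean is"))
# ['today', 'the', 'atlantic', 'ocean', 'is', 'U', 'L', 'U', 'U', 'L']
```

Under this encoding an all-caps window such as "IS IN AN UPROAR AND" gets U for every word, one plausible reason a model trained on mixed-case text can misfire on all-caps input.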

+Genre Shift in Statistical NLP

… told that John Paul Stevens is retiring this summer …

Named Entity Recognition

… PRESIDENT BARACK OBAMA IS URGING MEMBERS TO…

???

+This is a Pervasive Problem

Extracting regulatory pathways from online bioinformatics journals using a parser trained on the WSJ

Finding faces in images of disaster victims using a model trained on “mug shot” images

Identifying RNA sequences that regulate gene expression in a lab in Baltimore using a model trained on data gathered in a lab in Germany

When things change in a way that’s harmful, we’d like to know!

+Data Streams Change Over Time

•Natural drift

•Users unaware of system limitations

Sentiment classification from movie reviews

+Detecting Genre Shift

Two problems:

1) Detect changes in a stream of numbers (A-distance)

2) Convert a document stream to a stream of informative numbers (margin)

Genre shift hurts system performance (accuracy)

+Detecting Genre Shift

•Measure accuracy directly
  •Requires labeled examples!

•Look for changes in feature distributions
  •Words become more/less common
  •New words appear

Genre shift hurts system performance (accuracy)

+Measuring Changes in Streams: The A-Distance

A nonparametric, distribution-independent measure of changes in univariate, real-valued data streams (Kifer, Ben-David, and Gehrke, 2004)

(Figure: distributions P and P’; a change is flagged when the distance between them exceeds ε)
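Kifer, Ben-David, and Gehrke define the A-distance between two distributions as twice the largest difference in the probability they assign to any set in a collection of sets. Here that collection is taken to be a fixed set of tiles (intervals) of the real line, which gives a value no larger than the distance over a richer class such as all intervals. The sketch below compares a fixed reference window with a sliding window and reports the first point where the distance exceeds ε; the function names, cut points, and non-overlapping reference/sliding windows are our assumptions, not the paper's exact procedure:

```python
import bisect


def tile_proportions(sample, edges):
    """Fraction of sample points in each tile. edges are sorted cut points,
    giving len(edges) + 1 tiles (the two outer tiles are unbounded)."""
    counts = [0] * (len(edges) + 1)
    for x in sample:
        counts[bisect.bisect_right(edges, x)] += 1
    return [c / len(sample) for c in counts]


def a_distance(sample_p, sample_q, edges):
    """Empirical A-distance restricted to the tiling: 2 * max_A |p_A - p'_A|."""
    p = tile_proportions(sample_p, edges)
    q = tile_proportions(sample_q, edges)
    return 2 * max(abs(a - b) for a, b in zip(p, q))


def first_change(stream, edges, n, epsilon):
    """Compare a fixed reference window (the first n values) with a sliding
    window of the n most recent values after it. Return the index of the
    value that first pushes the A-distance above epsilon, or None."""
    reference = stream[:n]
    for end in range(2 * n, len(stream) + 1):
        current = stream[end - n:end]
        if a_distance(reference, current, edges) > epsilon:
            return end - 1
    return None


# A stream of scores that jumps from 1.0 to -1.0 at index 50.
stream = [1.0] * 50 + [-1.0] * 50
print(first_change(stream, edges=[0.0], n=20, epsilon=0.5))
```

Note that 2 * max |p_A - p'_A| > ε is the same condition as some tile having |p_A - p'_A| > ε/2.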

+Changes in Document Streams

… President Barack Obama is urging members to …

X (word counts): Obama = 4, embassy = 1, …

W (weights): Obama = 1.6, embassy = 0.1, …

WX = 1.6 * 4 + 0.1 * 1 + … = 3.7

•WX = margin

•Sign of WX is the class label (+/-)

•Magnitude of WX is the “certainty” in the label
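The margin computation above, written out for sparse bag-of-words vectors. The weights and counts are the two terms shown on the slide; the elided "…" terms are left out, so the result is 6.5 rather than 3.7. Lowercased word keys and labeling a zero margin as "+" are our assumptions:

```python
def margin(weights, counts):
    """W·X for a bag-of-words document; words missing from the weight
    vector contribute nothing."""
    return sum(weights.get(word, 0.0) * count for word, count in counts.items())


def classify(weights, counts):
    """Return (label, certainty): the sign of the margin gives the label,
    its magnitude the classifier's certainty."""
    m = margin(weights, counts)
    return ("+" if m >= 0 else "-"), abs(m)


weights = {"obama": 1.6, "embassy": 0.1}
print(classify(weights, {"obama": 4, "embassy": 1}))
```

Because words absent from the weights contribute nothing, adding them to `counts` leaves the margin unchanged, which is why features with zero weight do not affect the margin.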

+Why Margins?

We have an easy way of producing them from unlabeled examples!

•We want to track feature changes

•Margins are linear combinations of feature values

•Removing important features yields smaller margins

•Margins track only the features that matter: features with zero (small) weight don’t affect the margin (much)

Spoiler alert! Tracking margins works really well for unsupervised detection of genre shifts.

+Accuracy vs. Margins

DVD to Electronics (plot legend: average in block; average over last 100 instances)

+Confidence Weighted Margins

Margins can be viewed as a measure of confidence

We detect when confidence in classifications drops

Confidence Weighted (CW) learning refines this idea:

•Gaussian distribution over weight vectors

•Mean of weight vector: $\mu \in \mathbb{R}^N$

•Diagonal covariance matrix: $\sigma \in \mathbb{R}^{N \times N}$

•Low variance → high confidence

Normalized margin: $\mu \cdot x / (x^\top \sigma x)^{1/2}$

Called VARIANCE in the slides that follow

(Figure: example weights with means μ = 1.6 and μ = 0.1 and variances σ = 0.02 and σ = 1.74)
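A sketch of the normalized (VARIANCE) margin, with the diagonal covariance stored as a vector of per-feature variances. The example pairs μ = (1.6, 0.1) with σ = (0.02, 1.74) in the order they appear on the slide, which is our reading of the figure; the function name is hypothetical:

```python
import math


def normalized_margin(mu, sigma_diag, x):
    """CW margin divided by its standard deviation:
    (mu · x) / sqrt(x^T Sigma x), with Sigma diagonal."""
    mean = sum(m * xi for m, xi in zip(mu, x))
    variance = sum(s * xi * xi for s, xi in zip(sigma_diag, x))
    if variance <= 0:
        raise ValueError("x^T Sigma x must be positive")
    return mean / math.sqrt(variance)


print(round(normalized_margin([1.6, 0.1], [0.02, 1.74], [4, 1]), 2))  # 4.53
```

With the same mean margin, a larger variance yields a smaller normalized margin, so the score drops when the features driving a prediction are ones the model is unsure about.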

+Experiments

•Datasets

  •Sentiment classification between domains (Blitzer et al., 2007): DVDs, electronics, books, kitchen appliances

  •Spam classification between users (Jiang and Zhai, 2007)

  •Named entity classification between genres (ACE 2005): news articles, broadcast news, telephone, blogs, etc.

•Algorithms

  •Baselines: SVM, MIRA, CW

  •Our method: VARIANCE

+Experiments

•Simulated domain shifts between each pair of genres: 38 pairs, 10 trials each with a different random instance ordering; 500 source examples followed by 1500 target examples

•False change: 11 datasets with no shift, 10 trials each with a different random instance ordering

•If no shift is found, the detection is recorded as the end of the target examples when computing averages
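The bookkeeping in this protocol can be sketched as follows, assuming the shift happens at index 500 (the first target instance) and that a detection is the position in the 2000-instance stream where a change is first flagged. Function names are ours, and the slides do not say how detections before the shift are scored, so they are simply passed through as negative delays:

```python
import random

N_SOURCE = 500   # source examples before the shift
N_TARGET = 1500  # target examples after the shift


def make_stream(source_docs, target_docs, rng):
    """One trial: shuffled source examples followed by shuffled target examples."""
    source, target = list(source_docs), list(target_docs)
    rng.shuffle(source)
    rng.shuffle(target)
    return source + target


def detection_delay(detected_at, shift_at=N_SOURCE, n_target=N_TARGET):
    """Instances from the point of shift to detection. A trial with no
    detection (None) is scored as the end of the target examples."""
    if detected_at is None:
        return n_target
    # Negative values would mean a detection before the shift; the slides
    # do not say how those are scored, so they are returned as-is.
    return detected_at - shift_at


def average_delay(detections):
    """Mean delay over trials, one detection index (or None) per trial."""
    delays = [detection_delay(d) for d in detections]
    return sum(delays) / len(delays)


print(average_delay([620, None]))
```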

+Comparing Algorithms

(Figure, plotted in instances from point of shift; regions labeled “Good for our approach!” and “Good for baseline”)

+SVM vs. VARIANCE

+Summary of Results Thus Far

VARIANCE detected shifts faster than …

  •SVM 34 times out of 38

  •MIRA 26 times out of 38

  •CW 27 times out of 38

+Gradual Shifts

+What if you have labels?

STEPD: a Statistical Test of Equal Proportions to Detect concept drift (Nishida and Yamauchi, 2007)

Monitors the accuracy of a classifier on a stream of labeled examples

Parameters: window size, W, and threshold, α
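A rough sketch of the equal-proportions test STEPD relies on, as we understand Nishida and Yamauchi (2007): accuracy on the W most recent labeled examples is compared with accuracy on all earlier ones using a continuity-corrected two-proportion statistic, and drift is reported when the one-sided p-value falls below α. The default W and α values, the function names, and the omission of STEPD's separate warning level are simplifications for illustration; consult the original paper before relying on the details:

```python
import math


def stepd_statistic(correct_old, n_old, correct_recent, n_recent):
    """Two-proportion test statistic with continuity correction."""
    p_hat = (correct_old + correct_recent) / (n_old + n_recent)
    inv = 1 / n_old + 1 / n_recent
    denom = math.sqrt(p_hat * (1 - p_hat) * inv)
    if denom == 0:
        return 0.0  # all predictions right (or all wrong): no evidence of change
    diff = abs(correct_old / n_old - correct_recent / n_recent)
    return (diff - 0.5 * inv) / denom


def stepd_drift(outcomes, window=30, alpha=0.003):
    """outcomes: 1 for a correct prediction, 0 for a wrong one, oldest first.
    True if recent accuracy is significantly lower than older accuracy."""
    if len(outcomes) <= window:
        return False
    old, recent = outcomes[:-window], outcomes[-window:]
    r_o, r_r = sum(old), sum(recent)
    n_o, n_r = len(old), len(recent)
    if r_r / n_r >= r_o / n_o:
        return False  # only a drop in accuracy counts as drift here
    t = stepd_statistic(r_o, n_o, r_r, n_r)
    p_value = 0.5 * math.erfc(t / math.sqrt(2))  # upper tail of N(0, 1)
    return p_value < alpha


print(stepd_drift([1] * 300 + [0] * 30))
```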

+Comparison to STEPD

+What about false positives?

+The A-Distance: Choosing Parameters

(Figure: distribution P with tiles A; parameters n and ε)

+The A-Distance: Choosing Parameters

•A-distance paper gives bounds on FPs and FNs

•Bounds depend on n and ε

•Bounds do not depend on tiling!

•So loose as to be meaningless

•No guidance on how to choose tiling

•What if tiles lie outside the support of the data?

+Better Bounds

$P_A$ = true probability of a point falling in tile A

$h$ = number of points that actually fell in A

$p_A = h/n$ = ML estimate of $P_A$

Define $P'_A$, $h'$, and $p'_A$ for the second window

Suppose $P_A = P'_A$; then any change detected is a false positive

What is the probability that $|p_A - p'_A| > \epsilon/2$?
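When P_A = P'_A, the counts h and h' in the two windows are independent Binomial(n, P_A) draws, so for a single tile with a known P_A the false-positive probability can be summed exactly. A sketch with hypothetical function names; it assumes P_A is known, whereas the following slides place a posterior over it:

```python
from math import comb


def binomial_pmf(n, p):
    """P(h = k) for k = 0..n. Converting comb() to float is fine for window
    sizes like n = 200 but overflows for n above roughly 1000."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def false_positive_prob(p_a, n, epsilon):
    """P(|h/n - h'/n| > epsilon / 2) when both windows of n points share
    the same tile probability P_A = P'_A = p_a."""
    pmf = binomial_pmf(n, p_a)
    total = 0.0
    for h, ph in enumerate(pmf):
        for h2, ph2 in enumerate(pmf):
            # |h - h2| / n > epsilon / 2, rearranged to avoid a division
            if 2 * abs(h - h2) > epsilon * n:
                total += ph * ph2
    return total


for eps in (0.05, 0.1, 0.2):
    print(eps, false_positive_prob(0.5, 200, eps))
```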

+Posterior Over $P_A$

$B(\alpha, \beta)$ is the Beta function over $\alpha + \beta$ Bernoulli trials

$\alpha$ trials have one outcome (point lands in tile A)

$\beta$ trials have the other (point lands in some other tile)
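The slide's posterior formula did not survive extraction. Under the common assumption of a uniform prior on P_A (our assumption, not stated on the slide), observing h of n points in tile A gives a Beta(h + 1, n − h + 1) posterior. A sketch using log-gamma for numerical stability; the names are hypothetical:

```python
import math


def log_beta(a, b):
    """log B(a, b) via log-gamma, stable for large a and b."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def posterior_density(p, h, n):
    """Density of P_A at p after h of n points land in tile A,
    assuming a uniform prior, i.e. Beta(h + 1, n - h + 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    a, b = h + 1, n - h + 1
    log_density = (a - 1) * math.log(p) + (b - 1) * math.log(1 - p) - log_beta(a, b)
    return math.exp(log_density)


def posterior_mean(h, n):
    """Mean of Beta(h + 1, n - h + 1)."""
    return (h + 1) / (n + 2)


print(posterior_mean(40, 200), posterior_density(0.2, 40, 200))
```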

+False Positives: Two Cases

+Don’t worry, I’m not going to explain this (much)

+Probability of a FP (n = 200)

+Probability of FN

+Minimizing Expected Loss

+Moving Forward

(Figure: genre classifier with outputs Newswire, Transcribed Broadcast News, and Twitter)

+Genre Shift “Fix”

… told that John Paul Stevens is retiring this summer …

Named Entity Recognition

… PRESIDENT BARACK OBAMA IS URGING MEMBERS TO…

… President Barack Obama is urging members to …
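The “fix” maps all-caps input back to the mixed case the NER model saw in training. One simple way to do that, not necessarily what the authors had in mind, is a frequency-based truecaser that restores each word's most common casing from mixed-case text. The functions and the lowercase fallback for unseen words are illustrative assumptions:

```python
from collections import Counter, defaultdict


def train_truecaser(sentences):
    """Map each lowercased word to its most frequent casing in mixed-case text."""
    forms = defaultdict(Counter)
    for sentence in sentences:
        for token in sentence.split():
            forms[token.lower()][token] += 1
    return {low: counts.most_common(1)[0][0] for low, counts in forms.items()}


def truecase(text, model):
    """Restore casing token by token; unseen words are simply lowercased
    (an arbitrary fallback)."""
    return " ".join(model.get(tok.lower(), tok.lower()) for tok in text.split())


model = train_truecaser(["President Barack Obama is urging members to"])
print(truecase("PRESIDENT BARACK OBAMA IS URGING MEMBERS TO", model))
```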

+Conclusion

Changes in margins convey useful information about changes in classification accuracy. No need for labeled examples!

The A-distance applied to margin streams finds genre shifts with few false positives/negatives

Confidence weighted margins normalized by variance detect shifts faster than SVM, MIRA, or (non-normalized) CW margins

Our approach even works with gradual shifts and compares favorably to shift detectors that use labeled examples

+Thank you!