+Detecting Genre Shift
Mark Dredze, Tim Oates, Christine Piatko
Paper to appear at EMNLP-10
+Natural Language Processing and Machine Learning
Extracting findings from scientific papers
•Genetic epidemiology (development domain)
•PubMed search produces thousands of papers
•Manually reviewed to extract findings
•Findings determine relevant papers/studies
•Automate this process with ML/NLP methods
•Create searchable database of findings
•Allow machine inference over findings
•Suggest new scientific hypotheses
+Genre Shift in Statistical NLP
… told that John Paul Stevens is retiring this summer …
Named Entity Recognition
… President Barack Obama is urging members to …
+Supervised Machine Learning for Named Entity Recognition
Today the Atlantic Ocean is in an uproar and North Carolina remains in a state of anxiety.
Windowed Text Label
Today the Atlantic Ocean is B
the Atlantic Ocean is in I
Atlantic Ocean is in an O
Ocean is in an uproar O
is in an uproar and O
in an uproar and North O
an uproar and North Carolina O
uproar and North Carolina remains B
and North Carolina remains in I
North Carolina remains in a O
+Supervised Machine Learning for Named Entity Recognition
Windowed Text Label
Today the Atlantic Ocean is B
the Atlantic Ocean is in I
Atlantic Ocean is in an O
Feature Vector Label
[today, the, atlantic, ocean, is, U, L, U, U, L] B
[the, atlantic, ocean, is, in, L, U, U, L, L] I
[atlantic, ocean, is, in, an, U, U, L, L, L] O
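The mapping from windowed text to feature vectors can be sketched in a few lines. This is a minimal illustration and not the authors' code: the function name `window_features`, the five-token window, and the rule that the label comes from the window's centre token are assumptions read off the tables above.

```python
def window_features(tokens, labels, size=5):
    """Build (feature_vector, label) pairs for every full window of `size` tokens.

    Assumption: each window is labeled with the tag of its centre token,
    which matches the B/I/O labels shown in the slide tables.
    """
    half = size // 2
    examples = []
    for start in range(len(tokens) - size + 1):
        window = tokens[start:start + size]
        words = [t.lower() for t in window]                       # lowercased word features
        caps = ["U" if t[0].isupper() else "L" for t in window]   # capitalization features
        examples.append((words + caps, labels[start + half]))
    return examples


tokens = "Today the Atlantic Ocean is in an uproar".split()
labels = ["O", "O", "B", "I", "O", "O", "O", "O"]  # one tag per token
for features, label in window_features(tokens, labels)[:3]:
    print(features, label)
```

Note that lowercasing the words discards exactly the capitalization signal that the U/L features preserve, which is why an all-caps genre looks so different to the classifier.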
+Genre Shift in Statistical NLP
… told that John Paul Stevens is retiring this summer …
Named Entity Recognition
… PRESIDENT BARACK OBAMA IS URGING MEMBERS TO…
???
+This is a Pervasive Problem
Extracting regulatory pathways from online bioinformatics journals using a parser trained on the WSJ
Finding faces in images of disaster victims using a model trained on “mug shot” images
Identifying RNA sequences that regulate gene expression in a lab in Baltimore using a model trained on data gathered in a lab in Germany
When things change in a way that’s harmful, we’d like to know!
+Data Streams Change Over Time
Natural drift
Users unaware of system limitations
Sentiment classification from movie reviews
+Detecting Genre Shift
Two problems
1) Detect changes in a stream of numbers (A-distance)
2) Convert the document stream to a stream of informative numbers (margin)
Genre shift hurts system performance (accuracy)
+Detecting Genre Shift
Measure accuracy directly
• Requires labeled examples!
Look for changes in feature distributions
• Words become more/less common
• New words appear
Genre shift hurts system performance (accuracy)
+Measuring Changes in Streams: The A-Distance
A nonparametric, distribution-independent measure of changes in univariate, real-valued data streams (Kifer, Ben-David, and Gehrke, 2004)
[Figure: two windows over the stream, P and P’; a change is flagged when the distance between them is > ε]
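A rough sketch of the window comparison follows. It assumes the usual form of the A-distance, twice the largest difference in empirical probability over a fixed set of intervals (tiles); the factor of 2, the half-open tiles, and all function names are assumptions for illustration, not taken from the slides.

```python
def tile_fraction(window, lo, hi):
    """Fraction of values in `window` that fall in the half-open tile [lo, hi)."""
    return sum(lo <= v < hi for v in window) / len(window)


def a_distance(window_p, window_q, tiles):
    """2 * max over tiles of |p_A - p'_A|, using empirical tile frequencies.

    Assumption: this is the A-distance restricted to the given tiles.
    """
    return 2 * max(
        abs(tile_fraction(window_p, lo, hi) - tile_fraction(window_q, lo, hi))
        for lo, hi in tiles
    )


def change_detected(window_p, window_q, tiles, epsilon):
    """Flag a change when the distance between the two windows exceeds epsilon."""
    return a_distance(window_p, window_q, tiles) > epsilon


tiles = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]   # hypothetical tiling
reference = [0.5] * 10                          # all mass in the first tile
current = [0.5] * 5 + [1.5] * 5                 # half the mass has moved
print(a_distance(reference, current, tiles))
```

In a streaming setting the reference window would typically stay fixed after training while the current window slides forward over the incoming values.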
+Changes in Document Streams
… President Barack Obama is urging members to …
Feature   Value (X)   Weight (W)
Obama     4           1.6
embassy   1           0.1
…
WX = 1.6 * 4 + 0.1 * 1 + … = 3.7
• WX = margin
• sign of WX is class label (+/-)
• magnitude of WX is “certainty” in label
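The margin computation can be sketched as a sparse dot product. The dictionary representation and the function names are assumptions for illustration. Only the two features named on the slide are included, so the result is 1.6 * 4 + 0.1 * 1 and not the slide's 3.7, which also includes the elided terms.

```python
def margin(weights, features):
    """Dot product W.X over a sparse feature dict; unseen features get weight 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def predict(weights, features):
    """Return (class label, certainty): the sign and the magnitude of the margin."""
    m = margin(weights, features)
    return ("+" if m >= 0 else "-"), abs(m)


# Only the two features shown on the slide; the remaining terms are elided there.
weights = {"obama": 1.6, "embassy": 0.1}
features = {"obama": 4, "embassy": 1}
print(predict(weights, features))
```

No label is needed anywhere in this computation, which is what makes the margin usable as an unsupervised signal.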
+Why Margins?
We have an easy way of producing them from unlabeled examples!
We want to track feature changes
• Margins are linear combinations of feature values
• Removing important features yields smaller margins
• Only track features that matter; features with zero (small) weight don’t affect margin (much)
Spoiler alert! Tracking margins works really well for unsupervised detection of genre shifts.
+Accuracy vs. Margins
[Figure: accuracy vs. margins for the DVD to Electronics shift, shown as the average in each block and the average over the last 100 instances]
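The two smoothing schemes named on this slide, a per-block average and an average over the last 100 instances, might be computed as below. The function names and the choice to drop a trailing partial block are assumptions; the same helpers apply to a stream of margins or of 0/1 accuracy values.

```python
from collections import deque


def block_averages(values, block_size):
    """Mean of each consecutive, non-overlapping block (a trailing partial block is dropped)."""
    return [
        sum(values[i:i + block_size]) / block_size
        for i in range(0, len(values) - block_size + 1, block_size)
    ]


def trailing_averages(values, window=100):
    """Running mean over the most recent `window` values, one output per input."""
    recent = deque(maxlen=window)
    total = 0.0
    out = []
    for v in values:
        if len(recent) == window:
            total -= recent[0]      # the oldest value is about to be evicted
        recent.append(v)
        total += v
        out.append(total / len(recent))
    return out
```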
+Confidence Weighted Margins
Margins can be viewed as measure of confidence
We detect when confidence in classifications drops
Confidence Weighted (CW) learning refines this idea
• Gaussian distribution over weight vectors
• Mean of weight vector: $\mu \in \mathbb{R}^N$
• Diagonal covariance matrix: $\sigma \in \mathbb{R}^{N \times N}$
• Low variance means high confidence
Normalized margin: $\mu \cdot x / (x^\top \sigma x)^{0.5}$
Called VARIANCE in slides that follow
[Figure: example weights μ = 1.6 and μ = 0.1, with σ = 0.02 and σ = 1.74]
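A small sketch of the normalized margin, assuming the diagonal covariance is stored as a plain list of per-feature variances. The function name is hypothetical, and the pairing of the example numbers (1.6 with 0.02, 0.1 with 1.74) is an assumption based on the order in which they appear on the slide.

```python
import math


def normalized_margin(mu, sigma_diag, x):
    """mu.x / sqrt(x^T sigma x) for a diagonal covariance given as a list.

    Assumption: x has at least one nonzero entry on a feature with positive
    variance, so the denominator is not zero.
    """
    numerator = sum(m * xi for m, xi in zip(mu, x))
    variance = sum(s * xi * xi for s, xi in zip(sigma_diag, x))
    return numerator / math.sqrt(variance)


mu = [1.6, 0.1]            # mean weights from the slide
sigma_diag = [0.02, 1.74]  # per-feature variances (pairing assumed)
x = [4, 1]                 # feature values from the earlier example
print(normalized_margin(mu, sigma_diag, x))
```

Dividing by the standard deviation means a margin built on well-estimated (low-variance) weights counts for more than the same raw margin built on uncertain ones.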
+Experiments
Datasets
• Sentiment classification between domains (Blitzer et al., 2007): DVDs, electronics, books, kitchen appliances
• Spam classification between users (Jiang and Zhai, 2007)
• Named entity classification between genres (ACE 2005): news articles, broadcast news, telephone, blogs, etc.
Algorithms
• Baselines: SVM, MIRA, CW
• Our method: VARIANCE
+Experiments
Simulated domain shifts between each pair of genres
• 38 pairs, 10 trials each with different random instance orderings
• 500 source examples
• 1500 target examples
False change
• 11 datasets with no shift, 10 trials with different random instance orderings
If no shift is found, detection is recorded as the end of the target examples when computing averages
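The averaging rule for missed detections can be made concrete as follows. This is only a sketch: the function name and the use of `None` to mark a trial with no detection are assumptions, while the default of 1500 is the number of target examples stated above.

```python
def mean_detection_delay(detections, n_target=1500):
    """Average number of target instances seen before a shift is detected.

    `detections` holds one entry per trial: the detection point counted from
    the shift, or None if no shift was found. A missed detection is recorded
    as the end of the target examples.
    """
    delays = [n_target if d is None else d for d in detections]
    return sum(delays) / len(delays)


print(mean_detection_delay([100, None, 200]))
```

Recording a miss as the full stream length penalizes a detector that never fires without excluding that trial from the average.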
+Comparing Algorithms
[Figure: instances from point of shift for each pair of algorithms; one region is marked “Good for our approach!” and the other “Good for baseline”]
+SVM vs. VARIANCE
+Summary of Results Thus Far
VARIANCE detected shifts faster than …
• SVM: 34 times out of 38
• MIRA: 26 times out of 38
• CW: 27 times out of 38
+Gradual Shifts
+What if you have labels?
STEPD: a Statistical Test of Equal Proportions to Detect concept drift (Nishida and Yamauchi, 2007)
Monitors accuracy of classifier from stream of labeled examples
Parameters: window size, W, and threshold, α
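A labeled-data detector in the spirit of STEPD might look like the sketch below. It uses a standard two-proportion z-test with continuity correction to compare accuracy on the most recent W examples against accuracy on everything before them. The exact statistic and thresholds in the cited paper may differ; the function names and the two-sided p-value are assumptions.

```python
import math


def equal_proportions_p_value(correct_older, n_older, correct_recent, n_recent):
    """Two-sided p-value of a two-proportion z-test with continuity correction."""
    p_hat = (correct_older + correct_recent) / (n_older + n_recent)
    if p_hat in (0.0, 1.0):
        return 1.0  # no variation at all, so no evidence of a difference
    inv = 1.0 / n_older + 1.0 / n_recent
    diff = abs(correct_older / n_older - correct_recent / n_recent)
    z = max(0.0, diff - 0.5 * inv) / math.sqrt(p_hat * (1.0 - p_hat) * inv)
    return math.erfc(z / math.sqrt(2.0))


def drift_detected(outcomes, window, alpha):
    """outcomes: 1 for a correct prediction, 0 for a wrong one, oldest first.

    Flags drift when accuracy on the last `window` outcomes is significantly
    lower (p < alpha) than accuracy on all earlier outcomes.
    """
    if len(outcomes) < 2 * window:
        return False  # assumption: wait until the older segment is at least as long
    older, recent = outcomes[:-window], outcomes[-window:]
    dropped = sum(recent) / len(recent) < sum(older) / len(older)
    p = equal_proportions_p_value(sum(older), len(older), sum(recent), len(recent))
    return dropped and p < alpha
```

Unlike the margin-based approach, this requires a true label for every monitored example.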
+Comparison to STEPD
+What about false positives?
+The A-Distance: Choosing Parameters
[Figure: window P of n points, tile A, and threshold ε]
• A-distance paper gives bounds on FPs and FNs
• Bounds depend on n and ε
• Bounds do not depend on tiling!
• So loose as to be meaningless
• No guidance on how to choose tiling
• What if tiles lie outside support of data?
+Better Bounds
$P_A$ = true probability of a point falling in tile A
$h$ = number of points that actually fell in A
$p_A = h/n$ = ML estimate of $P_A$
Define $P'_A$, $h'$, and $p'_A$ for the second window
Suppose $P_A = P'_A$; then any change detected is a false positive
What is the probability that $|p_A - p'_A| > \epsilon/2$?
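For a single tile, this probability can be computed exactly if the true tile probability is treated as known. That is a simplification of the analysis on the slides, which place a posterior over the tile probability. The function names are hypothetical, and both windows are assumed to hold n independent points.

```python
from math import comb


def binom_pmf(n, p):
    """Probabilities of 0..n points landing in the tile out of n draws."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]


def false_positive_prob(n, p_a, epsilon):
    """P(|h/n - h'/n| > epsilon/2) when both windows share tile probability p_a.

    Assumption: p_a is known and the two windows are independent.
    """
    pmf = binom_pmf(n, p_a)
    threshold = epsilon * n / 2  # |h - h'| must exceed this count difference
    return sum(
        pmf[h] * pmf[h2]
        for h in range(n + 1)
        for h2 in range(n + 1)
        if abs(h - h2) > threshold
    )


print(false_positive_prob(200, 0.5, 0.2))
```

The double sum has (n + 1)^2 terms, which is cheap for n = 200; a larger window would call for a normal approximation instead.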
+Posterior Over PA
$B(\alpha, \beta)$ is the Beta function over $\alpha + \beta$ Bernoulli trials
• $\alpha$ trials have one outcome (point lands in tile A)
• $\beta$ trials have the other (point lands in some other tile)
+False Positives: Two Cases
+Don’t worry, I’m not going to explain this (much)
+Probability of a FP (n = 200)
+Probability of FN
+Minimizing Expected Loss
+Moving Forward
[Figure: Genre Classifier; Newswire; Transcribed Broadcast News]
+Genre Shift “Fix”
… told that John Paul Stevens is retiring this summer …
Named Entity Recognition
… PRESIDENT BARACK OBAMA IS URGING MEMBERS TO…
… President Barack Obama is urging members to …
+Conclusion
Changes in margins convey useful information about changes in classification accuracy
No need for labeled examples!
The A-distance applied to margin streams finds genre shifts with few false positives/negatives
Confidence weighted margins normalized by variance detect shifts faster than SVM, MIRA, or (non-normalized) CW margins
Our approach even works with gradual shifts and compares favorably to shift detectors that use labeled examples
+Thank you!