
Encoding Logic Rules in Sentiment Classification

Kalpesh Krishna* UMass Amherst

Preethi Jyothi IIT Bombay

Mohit Iyyer UMass Amherst

* Work done at IIT Bombay

Sentiment Classification

Classify a sentence as positive or negative

this movie has a great story

Sentiment = Positive

Not Always Easy!

this movie has a great story

Solution: lexicons, bag of words (sketched below). Easy!

Contrastive: this movie is funny, but horribly directed

Negation: this is not a movie worth waiting for

Much Harder!
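A minimal sketch of why the bag-of-words lexicon approach handles the easy case but not the harder ones (the word lists and scoring rule are illustrative assumptions, not the paper's method):

    # Toy lexicon / bag-of-words sentiment scorer (illustrative only).
    POSITIVE = {"great", "funny", "worth"}
    NEGATIVE = {"horribly", "boring"}

    def lexicon_sentiment(sentence):
        words = sentence.lower().replace(",", "").split()
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        if score > 0:
            return "positive"
        if score < 0:
            return "negative"
        return "tie"

    print(lexicon_sentiment("this movie has a great story"))                # positive: correct
    print(lexicon_sentiment("this movie is funny, but horribly directed"))  # tie: the contrast is invisible
    print(lexicon_sentiment("this is not a movie worth waiting for"))       # positive: negation is invisible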

Logic Rules

this movie is funny, but horribly directed   [an A-but-B sentence]

sentiment(A-but-B) = sentiment(B)
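At prediction time the rule can be applied directly. Here is a minimal sketch, where classify is a stand-in for any base sentiment classifier and the clause splitting is a simplifying assumption:

    def apply_a_but_b_rule(sentence, classify):
        # sentiment(A-but-B) = sentiment(B): classify only the B clause.
        if " but " in sentence:
            b_clause = sentence.split(" but ", 1)[1]
            return classify(b_clause)
        return classify(sentence)

    # apply_a_but_b_rule("this movie is funny, but horribly directed", classify)
    # -> classify("horribly directed") -> negative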

Neural Nets + Logic Rules?

  Method     Previous Work               Our Contributions
  Explicit   Hu et al. (ACL 2016)        Contribution #1: replication study finds incorrect conclusions
  Implicit?  Peters et al. (NAACL 2018)  Contribution #2: ELMo embeddings learn logic rules without explicit supervision

  Contribution #3: robustness of explicit / implicit methods to varying annotator agreement in A-but-B sentences

Digression: Reproducibility

Small benchmark datasets (SST, MR, CR)

Significant variation in performance on every run (due to random initialization / GPU parallelization)

Solution: average performance over a large number of random seeds (Reimers and Gurevych 2017)
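A minimal sketch of the seed-averaging protocol, where train_and_eval is a stand-in for one full training run returning test accuracy:

    import statistics

    def average_over_seeds(train_and_eval, num_seeds=100):
        # One full training run per seed; report the mean and spread.
        accs = [train_and_eval(seed=s) for s in range(num_seeds)]
        return statistics.mean(accs), statistics.stdev(accs)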

Large Variation (100 seeds)

[figure: test performance varies substantially across 100 random seeds]

Outline

  Method     Previous Work               Our Contributions
  Explicit   Hu et al. (ACL 2016)        Contribution #1: replication study finds incorrect conclusions
  Implicit?  Peters et al. (NAACL 2018)  Contribution #2: ELMo embeddings learn logic rules without explicit supervision

  Contribution #3: robustness of explicit / implicit methods to varying annotator agreement in A-but-B sentences

Model in Hu et al. 2016

Expectation-Maximization style algorithm:

E: Projection (Ganchev et al. 2010)

M: Distillation (Hinton et al. 2014)

E: Projection (Ganchev et al. 2010)

this movie is funny, but horribly directed

negative = 0.34, positive = 0.66  →  project  →  negative = 0.77, positive = 0.23

Projection is a convex optimization problem; the new distribution is consistent with the logic rules.
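A minimal sketch of the projection step, using the closed-form solution of the rule-constrained KL projection from Hu et al. (2016); the rule strength C is an assumed value here, not the paper's tuned hyperparameter:

    import numpy as np

    def project(p, rule_sat, C=2.0):
        # Closed form (Ganchev et al. 2010; Hu et al. 2016):
        #   q(y) ∝ p(y) * exp(-C * (1 - r(x, y)))
        # where r(x, y) = 1 iff label y is consistent with the rule.
        q = p * np.exp(-C * (1.0 - rule_sat))
        return q / q.sum()

    # "this movie is funny, but horribly directed": the B clause is negative,
    # so the A-but-B rule is satisfied only by the negative label (index 0).
    p = np.array([0.34, 0.66])   # [negative, positive] from the model
    r = np.array([1.0, 0.0])
    print(project(p, r))         # probability mass shifts toward negative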

M: Distillation (Hinton et al. 2014)

Train the model with the projected distribution as a soft label.

Hu et al. 2016 algorithm (E: Projection, M: Distillation)

for each minibatch (x, y):
    p = forward(x)                  # current model predictions
    q = project(p)                  # E-step: project p to satisfy the rules
    theta += grad_update(p, q, y)   # M-step: distill q back into the model
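A runnable sketch of one such update in PyTorch; the rule strength C and imitation weight pi are assumed hyperparameters, and rule_sat is the per-label rule-satisfaction indicator from the projection sketch above:

    import torch
    import torch.nn.functional as F

    def hu_et_al_step(model, optimizer, x, y, rule_sat, C=2.0, pi=0.9):
        logits = model(x)
        p = F.softmax(logits, dim=-1)

        # E-step: closed-form projection of p onto rule-consistent distributions.
        with torch.no_grad():
            q = p * torch.exp(-C * (1.0 - rule_sat))
            q = q / q.sum(dim=-1, keepdim=True)

        # M-step: distillation loss mixing the true label y with the soft label q.
        loss = (1 - pi) * F.cross_entropy(logits, y) + \
               pi * F.kl_div(F.log_softmax(logits, dim=-1), q, reduction="batchmean")

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()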

Conclusions in Hu et al. 2016

1) Distilled model is better than a single projection

2) The distilled neural network shows a significant gain on SST2 because it learns the A-but-B rule

Our Conclusions

1) A single projection is a good way to explicitly encode logic rules

2) Distilled neural nets aren't learning logic rules

Distillation is ineffective. Single Projection is good, and sufficient!

[bar chart: SST2 gain % (0 to 3) for Distillation, Single Projection, and Distillation + Single Projection; reported numbers vs. numbers averaged over 100 seeds]

Consistent Trend on A-but-B

  Method                      Gain % (averaged; reported: N/A)
  Distillation                1.9
  Single Projection           9.3
  Distillation + Projection   8.9

Again, a single projection at test time is sufficient!


Outline

  Method     Previous Work               Our Contributions
  Explicit   Hu et al. (ACL 2016)        Contribution #1: replication study finds incorrect conclusions
  Implicit?  Peters et al. (NAACL 2018)  Contribution #2: ELMo embeddings learn logic rules without explicit supervision

  Contribution #3: robustness of explicit / implicit methods to varying annotator agreement in A-but-B sentences

ELMo Representations: Embeddings from Language Models

A large language model trained on the 1 Billion Word Benchmark

Learned representations are used for the downstream task

Unlike word2vec, these embeddings are contextual

ELMo Results (100 seeds)

  Model           SST2   A-but-B   A-but-B + negation
  CNN (baseline)  86.0   78.7      80.1
  CNN + ELMo      88.9   86.5      87.2
  Gain %           2.9    7.8       7.1

Significant improvement, even after averaging!

Is ELMo Learning Logic Rules?

60% of the improvement is on A-but-B sentences and negations, yet only 24.5% of the corpus is A-but-B / negations.

ELMo + Explicit (Projection)

  Model            SST2   A-but-B
  ELMo             88.9   86.5
  ELMo + project   89.0   87.2
  Gain %            0.1    0.7

Test-time projection is ineffective for ELMo.

The distance between the ELMo distribution and the projected distribution is only 0.13 (vs. 0.26 for distillation and 0.27 for the baseline), so ELMo's predictions already nearly satisfy the rule.

Clustering ELMo Vectors

Cosine similarity between every pair of words in a sentence.

Contrastive (A-but-B): there are slow and repetitive parts, but it has just enough spice to keep it interesting

[heatmaps: pairwise word similarities under word2vec vs. ELMo; with ELMo, the words cluster into the A part and the B part of the A-but-B sentence]

ELMo representations learn the scope of a contrastive conjunction!
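A minimal sketch of the similarity computation; embed is a placeholder for any function returning one contextual vector per token (e.g., an ELMo forward pass):

    import numpy as np

    def pairwise_cosine(vectors):
        # vectors: (num_tokens, dim) array of token embeddings.
        unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
        return unit @ unit.T   # (num_tokens, num_tokens) cosine similarity matrix

    # tokens = "there are slow and repetitive parts but it has just enough spice".split()
    # sims = pairwise_cosine(embed(tokens))   # embed() is a stand-in for ELMo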

Outline

  Method     Previous Work               Our Contributions
  Explicit   Hu et al. (ACL 2016)        Contribution #1: replication study finds incorrect conclusions
  Implicit?  Peters et al. (NAACL 2018)  Contribution #2: ELMo embeddings learn logic rules without explicit supervision

  Contribution #3: robustness of explicit / implicit methods to varying annotator agreement in A-but-B sentences

Sentiment is Ambiguous!

beautiful film, but those who have read the book will be disappointed

Nine crowd workers label each A-but-B sentence as positive / negative / neutral.

We test our models on subsets of varying annotator agreement.
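A sketch of how sentences can be bucketed by annotator agreement, assuming nine labels per sentence as described above:

    from collections import Counter

    def agreement(labels):
        # labels: the nine crowd-worker labels for one sentence.
        majority, count = Counter(labels).most_common(1)[0]
        return majority, count / len(labels)

    # agreement(["positive"] * 7 + ["negative"] * 2) -> ("positive", 7/9 ≈ 0.78)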

Consistent trends on all levels of agreement

Projection degrades accuracy on high-agreement sentences!

Key Takeaways

• Carefully perform sentiment classification research:
  • variation across runs → average across several seeds
  • ambiguous sentences → benchmark on subsets of varying annotator agreement

• ELMo embeddings implicitly learn logic rules for sentiment classification

Thank You!

Code + Data: github.com/martiansideofthemoon/logic-rules-sentiment
