Upload
others
View
5
Download
0
Embed Size (px)
Citation preview
Natural Language Inference What are NLI models really learning?
Kicking out Premises…
Takeaways
SNLI (Bowman et. al, 2015) 570 K MultiNLI (Williams et. al., 2017) 433 K
★Results consistent with numerous findings of issues with NLP datasets like ROC (Cai et. al., 2017) and VQA (Agrawal et. al., 2016).
★Annotation artifacts could be addressed by improving annotation protocols preemptively to correct for common biases.
NLI models learn from lexical cues rather than entailment semantics.
SNLI
Acc
urac
y (%
)
50
62.5
75
87.5
100
DAM ESIM DIIN
93.492.692.4
72.771.369.4
86.885.884.7
Full Hard Easy
MultiNLI Matched
0
22.5
45
67.5
90
DAM ESIM DIIN
87.686.285.3
64.159.355.8
7774.172
MultiNLI Mismatched
0
22.5
45
67.5
90
DAM ESIM DIIN
86.885.285.7
64.458.956.2
76.573.172.1
Two dogs are running through a field.
Contradiction
The pets are sitting on a couch.
Entailment
There are animals
outdoors.
Neutral
Some puppies are running to catch
a stick.
Premise
Tex
t Cl
assi
fica
tion
A
ccur
acy
(%)
0
17.5
35
52.5
70
SNLI MultiNLI MultiNLI
52.353.9
67
35.235.434.3
Majority ClassHypothesis-only
Matched Mismatched
Annotation Artifacts in Natural Language Inference Data
Suchin Gururangan* Swabha Swayamdipta* Omer Levy Roy Schwartz Samuel R. Bowman Noah A. Smith
* equal contribution
Negation
Older man with white hair and a red cap painting the golden gate bridge on the shore with the golden gate bridge in the distance.
Contradiction
Nobody wears a cap.
Premise
Cats!
Three dogs racing on racetrack.
Contradiction
Three cats race on a track.
Premise
Contradiction Artifacts
Purpose
Clauses
A group of female athletes are huddled together and
excited.
Neutral
They are huddled together
because they are working together.
Premise
Modifiers
A middle-aged man works under the engine of a train on
rail tracks.
Neutral
A man is doing work on a black Amtrack train.
Premise
Neutral Artifacts
Shortening
A person in a red shirt is mowing the grass with a
green riding mower.
Entailment
A person in red is cutting the grass on a
riding mower.
Premise
Generalizatio
n
Some men and boys are playing frisbee in a grassy
area.
Entailment
People play frisbee outdoors.
Premise
Entailment Artifacts
C
E
N
Hypothesis
Premise
C
E
N
Hypothesis
Easy
Hard
Premise
Hard/Easy Classification
Tex
t Cl
assi
fica
tion
SNLI premises are Flickr captions. MultiNLI premises are collected from diverse genre. Hypotheses are crowdsource-generated.
Resources★Hard benchmarks for MultiNLI:
www.kaggle.com/c/multinli-matched-open-hard-evaluation/
★Hard benchmark for SNLI: www.nlp.stanford.edu/projects/snli/snli_1.0_test_hard.jsonl
DAM - Decomposable Attention Model (Parikh et. al. 2016) ESIM - Enhanced Sequential Inference Model (Chen et. al., 2017) DIIN - Densely Interactive Inference Network (Gong et. al. 2018)
Over 50% of NLI examples can be
correctly classified without ever observing the
premise!