61
Henning Wachsmuth [email protected] June 19, 2019 Computational Argumentation — Part XI Assessment of the Quality of Argumentation

Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

Henning Wachsmuth [email protected]

June 19, 2019

Computational Argumentation — Part XI

Assessment of the Quality of Argumentation

Page 2: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

2

I.  Introduction to computational argumentation II.  Basics of natural language processing III.  Basics of argumentation IV.  Applications of computational argumentation V.  Resources for computational argumentation

VI.  Mining of argumentative units VII. Mining of supporting and objecting units VIII. Mining of argumentative structure IX.  Assessment of the structure of argumentation X.  Assessment of the reasoning of argumentation XI.  Assessment of the quality of argumentation XII. Generation of argumentation XIII. Development of an argument search engine

XIV. Conclusion

Outline

Assessment of the Quality of Argumentation, Henning Wachsmuth

•  Introduction •  A quality taxonomy •  Absolute rating •  Relative comparison •  Objective assessment •  Inclusion of subjectivity •  Conclusion

Page 3: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

3

§  Concepts •  Get to know various quality dimensions of argumentation. •  Learn about differences between quality in theory and in practice. •  Understand the subjective nature of quality.

§  Methods •  Learn how to assess quality with supervised learning. •  Learn how to assess quality through graph analyses.

§  Associated research fields •  Argumentation theory and rhetoric •  Computational linguistics

§  Within this course •  Understand how to distinguish good from bad arguments. •  See to what extent computational assessment is doable currently.

Learning goals

Assessment of the Quality of Argumentation, Henning Wachsmuth

http

s://c

omm

ons.

wik

imed

ia.o

rg

http

s://p

ixab

ay.c

om

http

s://p

ixab

ay.c

om

Page 4: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

4

Introduction

Assessment of the Quality of Argumentation, Henning Wachsmuth

Page 5: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

5

§  Argumentation quality •  Natural language argumentation is rarely logically correct or complete. •  Need to measure how good an argument unit, argument, or argumentation is.

§  Observations •  Granularity. Quality may be addressed at different levels of text granularity. •  Dimensions. Several dimensions of quality may be considered. •  Goal orientation. What is important, depends on the goal of argumentation.

§  Notice •  The study of logical quality in terms of fallacies is beyond the score here.

What is argumentation quality?

Assessment of the Quality of Argumentation, Henning Wachsmuth

” Everyone has an inalienable human right to life, even those who commit murder; sentencing a person to death and executing them violates that right.”

argument cogent?

premises acceptable?

effective in persuading?

relevant to discussion? linguistically

clear?

reasonably argued?

Page 6: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

6

Debate (dialogical argumentation)

Argumentation (monological)

Argument

Granularity levels of argumentation (recap)

Assessment of the Quality of Argumentation, Henning Wachsmuth

Alice. Some people say refugees threaten peace, as many of them were criminals. In fact, Spiegel Online just reported results from a study of the federal police about numbers of refugees and crimes: Overall, there is no correlation at all! Rather, the police confirmed that the main reason for committing crime is poverty. So, if you believe the police then you shouldn't believe those people. Syrians are even involved less in crimes than Germans according to the study. So, the more Syrians come to Germany, the more peaceful it gets there, right?

Bob. The question is here why should I believe the police!? Argument failed :P

Argumentative discourse unit

Page 7: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

7

Argumentation quality dimensions (Wachsmuth et al., 2017b)

Assessment of the Quality of Argumentation, Henning Wachsmuth

cogency reason- ableness

effectiveness

local relevance

local acceptability

local sufficiency

global relevance

global acceptability

global sufficiency

clarity

appropriateness

credibility emotional appeal

arrangement

Argumentation quality

Rhetorical quality

Logical quality

Dialectical quality

Page 8: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

8

§  Persuasion •  Changing or reinforcing the stance of an audience

towards an issue.

§  Agreement •  Resolving a dispute between multiple parties or

achieving a settlement in a negotiation.

§  Justification •  Giving reasons or explanations for an attitude or

action that might be controversial.

§  Recommendation •  Suggesting a decision to make, an action to take,

a product to buy, or similar.

§  Deliberation •  Deepening one‘s own understanding of an issue.

Goals of argumentation (recap) based on Tindale (2007)

Assessment of the Quality of Argumentation, Henning Wachsmuth

http

s://d

e.w

ikip

edia

.org

ht

tps:

//com

mon

s.w

ikim

edia

.org

ht

tps:

//pix

abay

.com

ht

tps:

//de.

m.w

ikip

edia

.org

ht

tps:

//pix

abay

.com

Page 9: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

9

§  Argumentation quality assessment •  Identification of indisputable flaws or requirements of argumentation. •  Judgment about a specific quality dimension. •  Determination whether argumentation successfully achieves its goal.

§  Observations •  Choice of comparison. Dimensions can be assessed absolutely or relatively. •  Subjectivity. Perceived quality depends on the view of the reader/audience.

(and maybe also on the author/speaker)

§  How to approach quality assessment •  Input. Argumentative text, metadata (e.g., author), external knowledge, ... •  Techniques. Supervised classification/regression, graph-based analyses, ...

Several example approaches discussed in this lecture.

What is argumentation quality assessment?

Assessment of the Quality of Argumentation, Henning Wachsmuth

linguistically clear?

effective in persuading?

Page 10: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

10

§  Two ways of assessing a quality dimension •  Absolute rating. Assignment of a score from a predefined scale.

Typical scales: Integers (possibly with half-points): 1–3, 1–4, 1–5, 1–10, -2–2, ... Real valued: [0,1], [-1,1]

•  Relative comparison. Given two instances, which of them is better.

§  Observations •  Both allow for ranking the assessed instances. •  Absolute ratings entail relative comparisons. •  Absolute ratings imply a maximum and minimum.

§  Absolute vs. relative assessment •  A relative assessment is often much easier. •  Still, absolute ratings are widely spread and often work well.

Absolute vs. relative assessment

Assessment of the Quality of Argumentation, Henning Wachsmuth

”If you wanna hear my view I think that the death penalty should be abolished. It legitimizes an irreversible act of violence. As long as human justice remains fallible, the risk of executing the innocent can never be eliminated.”

”Human beings never act freely and thus should not be punished for even the most horrific crimes.“

4 / 5

better than

Page 11: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

11

§  Quality in theory •  The normative view of quality in terms of cogency, reasonableness,

or similar. •  Suggests to use absolute quality ratings.

§  Quality in practice •  Quality is decided by the effectiveness on (some type of) people. •  Relative comparisons are often more suitable.

§  Unresolved questions •  Should quality be aligned with how we should or how with we do argue? •  Is this actually so different? à more on this below

Argumentation quality in theory and in practice

Assessment of the Quality of Argumentation, Henning Wachsmuth

” Is a strong argument an effective argument which gains the adherence of the audience, or is it a valid argument, which ought to gain it?“

(Perelman and Olbrechts-Tyteca, 1969)

http

s://d

e.w

ikip

edia

.org

ht

tps:

//com

mon

s.w

ikim

edia

.org

Page 12: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

12

§  Reader (or audience) •  Argumentation often targets a

particular audience. •  Different arguments and ways of

arguing work for different readers.

http

s://p

ixab

ay.c

om

§  Author (or speaker) •  Argumentation is connected to the

person who argues. •  The same argument is perceived

differently depending on the author.

The role of participants in argumentation (recap)

Assessment of the Quality of Argumentation, Henning Wachsmuth

”University education must be free. That is the only way to achieve equal opportunities for everyone.“

”According to the study of XYZ found online, avoiding tuition fees is beneficial in the long run, both socially and economically.“

http

s://c

omm

ons.

wik

imed

ia.o

rg

http

s://p

ixab

ay.c

om

§  Questions •  May the assessment ignore the author/speaker? And the reader/audience?

The author/speaker is unknown in some application scenarios, but rarely the reader/audience is.

Page 13: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

13

§  Subjectiveness of quality assessment •  Many dimensions are inherently subjective. •  Quality depends on the subjective weighting

of different aspects of an issue. •  Also depends on preconceived opinions.

§  Example: Which argument is more relevant?

§  Two ways to approach this problem (both will be detailed below) •  Either, focus on properties that can be assessed ”objectively“. •  Or, include a model of the reader/audience in the quality assessment.

Subjective (and objective) assessment

Assessment of the Quality of Argumentation, Henning Wachsmuth

” The death penalty doesn’t deter people from committing serious violent crimes. The thing that deters is the likelihood of being caught and punished.”

” The death penalty legitimizes an irreversible act of violence. As long as human justice remains fallible, the risk of executing the innocent can never be eliminated.”

Pro comfort hygge

Con ugliness expense

exam

ple

and

figur

es fr

om h

ttps:

//arg

min

ing2

017.

files

.wor

dpre

ss.c

om/

2016

/12/

argm

inin

g201

7-in

vite

d-ta

lk-c

hris

tian-

kock

.ppt

x

”Should we buy a Chesterfield armchair?”

(credit to Christian Kock for this example)

Page 14: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

14

§  Why assessing argumentation quality? •  Mining arguments and understanding the reasoning is not enough in practice. •  For successful argumentation, we need to choose the ”best“ arguments. •  Critical for any application of computational argumentation.

§  Example applications •  Argument search. What argument to rank highest? •  Writing support. How good is an argumentative text,

what flaws does it have? •  Automatic decision making. Which arguments outweigh

which others?

Importance of quality assessment

Assessment of the Quality of Argumentation, Henning Wachsmuth

http

s://p

ixab

ay.c

om

http

s://w

ww

.pub

licdo

mai

npic

ture

s.ne

t

”In some, sense the question about the quality of an argument is the ‘ultimate’ one for argumentation mining.“

(Stede and Schneider, 2018)

Page 15: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

15

A quality taxonomy based on Wachsmuth et al. (2017b)

Assessment of the Quality of Argumentation, Henning Wachsmuth

Page 16: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

16

Survey of existing research

Assessment of the Quality of Argumentation, Henning Wachsmuth

Freeman (2011)

Damer (2009) Tindale (2007)

O‘Keefe and Jackson (1995) Aristotle (2007)

van Eemeren (2015)

Perelman and Olbrecht-Tyteca (1969)

van Eemeren and Grootendorst (2004)

Cohen (2011)

Walton (2006)

Johnson and Blair (2006)

Hamblin (1970)

Blair (2012) Govier (2010)

Toulmin (1958) Walton et al. (2008)

Hoeken (2001)

Mercier and Sperber (2011)

Braunstain et al. (2016)

Rahimi et al. (2014)

Stab and Gurevych (2017)

Persing and Ng (2013)

Feng et al. (2014)

Park et al. (2015)

Persing and Ng (2014)

Persing and Ng (2015)

Cabrio and Villata (2012)

Wachsmuth et al. (2017a)

Boltužic and Šnajder (2015) ´

Persing et al. (2010)

Rahimi et al. (2015)

Tan et al. (2016) Wei et al. (2016)

Habernal and Gurevych (2016)

Zhang et al. (2016)

Rhetoric

Logic Dialectic

Argumentation quality

argumentation theory

assessment approaches

Page 17: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

17

Three main quality aspects

Assessment of the Quality of Argumentation, Henning Wachsmuth

A A à B B

Rhetoric

Logic Dialectic

Argumentation quality

A A à B B

http

s://c

omm

ons.

wik

imed

ia.o

rg

A A à B B

B à C C

http

s://d

e.w

ikip

edia

.org

Blair (2012)

”An argument is cogent if its premises are relevant to its

conclusion, individually acceptable, and together sufficient to draw

the conclusion.“

Aristotle (2007)

”In making a speech, one must study three points:

the means of producing persuasion, the style or language to be used,

and the proper arrangement of the various parts.“

van Eemeren (2015)

”A dialectical discussion derives its reasonableness from

a dual criterion: problem validity and intersubjective validity.“

Page 18: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

18

Unification of views

Assessment of the Quality of Argumentation, Henning Wachsmuth

soundness validity

strength well-formedness

amount of rebuttal

fallaciousness

satisfac- toriness

convincingness

level of support

amount of evidence

sufficiency

thesis clarity

prompt adherence

global coherence

evaluability

argument strength

persuasiveness

winning side

organization

argument relevance

prominence

Rhetoric

Logic Dialectic

local/probative relevance

premise acceptability

premise sufficiency

cogency

effectiveness

clarity of style

appropriateness of style

credibility emotional appeal

arrangement

global/dialectical relevance

intersubjective acceptability

dialectical sufficiency

reason- ableness

argument acceptability

Argumentation quality

focus on theory

focus on accepted

prefer general

unify names

Page 19: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

19

A taxonomy of argumentation quality

Assessment of the Quality of Argumentation, Henning Wachsmuth

cogency reason- ableness

effectiveness

local relevance

local acceptability

local sufficiency

global relevance

global acceptability

global sufficiency

clarity

appropriateness

credibility emotional appeal

arrangement

Argumentation quality

thesis clarity Persing and Ng (2013)

prompt adherence Persing and Ng (2014)

global coherence Feng et al. (2014)

evaluability Park et al. (2015)

amount of evidence Rahimi et al. (2014)

sufficiency Stab and

Gurevych (2017)

level of support Braunstain et al. (2016)

argument acceptability Cabrio and Villata (2012)

argument prominence Boltužic and Šnajder (2015) ´

argument relevance Wachsmuth et al. (2017a)

organization Persing et al. (2010) Rahimi et al. (2015)

argument strength Persing and Ng (2015) persuasiveness Tan et al. (2016) winning side Wang et al. (2016) Zhang et al. (2016) convincingness Habernal and Gurevych (2016)

Page 20: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

20

§  A cogent argument. Has acceptable, relevant, and sufficient premises. •  Local acceptability. The premises are worthy being believed as true. •  Local relevance. The premises are relevant to the conclusion. •  Local sufficiency. The premises are sufficient to draw the conclusion.

§  Effective argumentation. Persuades the target audience. •  Credibility. Makes the authors worthy of credence. •  Emotional appeal. Makes the audience open to be persuaded. •  Clarity. Is linguistically clear and as simple as possible. •  Appropriateness. Linguistically matches the audience and issue. •  Arrangement. Presents content in the right order.

§  Reasonable argumentation. Is acceptable, relevant, and sufficient. •  Global acceptability. Worthy to be considered in the way stated. •  Global relevance. Contributes to resolution of issue. •  Global sufficiency. Adequately rebuts potential counterarguments.

  Notice: cogency also adds to effectiveness, and cogency and effectiveness also add to reasonableness.

Quality dimensions in the taxonomy

Assessment of the Quality of Argumentation, Henning Wachsmuth

Rhetoric

Logic

Dialectic

Page 21: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

21

cogency local acceptability local relevance local sufficiency

effectiveness credibility emotional appeal clarity appropriateness arrangement reasonableness global acceptability global relevance global sufficiency overall quality

Dimension Alpha .44 .46 .47 .44

.45

.37

.26

.35

.36

.39

.50

.44

.42

.27

.51

Maj. 92% 91% 92% 93%

94% 96% 94% 90% 88% 93% 96% 95% 90% 98% 94%

Mean 1.6 1.9 2.3 1.5

1.4 1.7 1.9 2.1 2.1 1.8 1.6 1.9 2.0 1.2 1.6

§  Corpus based on the taxonomy •  320 debate portal arguments

(Habernal and Gurevych, 2016a)

•  10 per issue/stance pair •  3 annotators per argument •  Score from [1,3] for all 15 dimensions

§  Agreement •  Krippendorff‘s alpha limited •  Majority agreement very high

§  Correlations •  Overall quality correlates most with

reasonableness (.86), cogency (.84), and effectiveness (.81)

•  Several other intuitive correlations

The Dagstuhl-15512 ArgQuality corpus

Assessment of the Quality of Argumentation, Henning Wachsmuth

Page 22: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

22

Absolute rating

Assessment of the Quality of Argumentation, Henning Wachsmuth

Page 23: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

23

§  Problem •  Can we predict whether an argument(ation) is good (cogent, effective, ...)? •  Can we rate how good it is?

§  Main idea •  See quality assessment as a standard classification

or regression task. •  Learn what linguistic feature or metadata speaks for quality?

§  Existing approaches •  Persuasiveness. Prediction based on interaction of participants. (Tan et al., 2016)

•  Organization. Assessment based on tuned features. (Persing et al., 2010) Analog approaches for thesis clarity, prompt adherence, and argument strength (Persing and Ng, 2013–2015).

•  Amount of evidence. Count of evidence supporting conclusion. (Rahimi et al., 2014)

•  Sufficiency. Prediction using convolutional neural networks (Stab and Gurevych, 2017).   ... among other approaches

Absolute quality rating: Overview

Assessment of the Quality of Argumentation, Henning Wachsmuth

4 / 5 Conclusion Premises

Page 24: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

24

Absolute rating: Covered dimensions

Assessment of the Quality of Argumentation, Henning Wachsmuth

cogency reason- ableness

effectiveness

local relevance

local acceptability

local sufficiency

global relevance

global acceptability

global sufficiency

clarity

appropriateness

credibility emotional appeal

arrangement

Argumentation quality

thesis clarity Persing and Ng (2013)

prompt adherence Persing and Ng (2014)

global coherence Feng et al. (2014)

evaluability Park et al. (2015)

amount of evidence Rahimi et al. (2014)

sufficiency Stab and

Gurevych (2017)

level of support Braunstain et al. (2016)

argument acceptability Cabrio and Villata (2012)

argument prominence Boltužic and Šnajder (2015) ´

argument relevance Wachsmuth et al. (2017a)

organization Persing et al. (2010) Rahimi et al. (2015)

argument strength Persing and Ng (2015) persuasiveness Tan et al. (2016) winning side Wang et al. (2016) Zhang et al. (2016) convincingness Habernal and Gurevych (2016)

Page 25: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

25

§  Task •  In a discussion, what will persuade someone open to be persuaded?

§  Approach

•  Analysis of correlations between linguistic, interaction, and meta-discussion features with persuasion.

•  Prediction based on features as to whether persuasion will happen.

§  Data •  20k+ discussions from Reddit ChangeMyView. •  Discussion. An opinion poster (OP) states a view,

others argue against, OP gives Δ to convincing arguments.

§  Selected results •  Accuracy. 69% in balanced setting. •  Insights. Some interactions and many participants help;

appropriate style, not to similar to OP‘s style most persuasive.

Absolute rating of effectiveness (Tan et al., 2016)

Assessment of the Quality of Argumentation, Henning Wachsmuth

view

cha

nged

0%

2%

4%

6%

# interactions2 4 6+

http

s://d

e.w

ikip

edia

.org

Page 26: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

26

§  Task •  Given a persuasive essay, rate argumentation-related quality dimensions.

§  Dimensions •  Organization. How well is the essay‘s argumentation arranged? •  Thesis clarity. How easy to understand is the essay‘s thesis? •  Prompt adherence. How close does the essay stay to the prompt? •  Argument strength. How strong is the argument made for the thesis?

§  Research question •  Can we leverage argument mining

to assess the argumentation quality of persuasive essays?

§  Data (Persing et al., 2010; Persing and Ng, 2013–2015)

•  800–1003 essays with scores from [1,4] annotated for each dimension

Absolute rating of four rhetorical dimensions (Wachsmuth et al., 2016)

Assessment of the Quality of Argumentation, Henning Wachsmuth

0

200

400

1 2 3 4

thesis clarity

prompt adherence organization

argument strength

essays

score

Page 27: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

27

Motivation: Argumentative writing support (Wachsmuth et al., 2016)

Assessment of the Quality of Argumentation, Henning Wachsmuth

Web Technology and Information Systems www.webis.de [email protected]

Using Argument Mining to Assess the Argumentation Quality of EssaysHenning Wachsmuth, Khalid Al-Khatib, Benno Stein

Statistical insights into argumentation based on the output of mining

The first study of argument miningfor argumentation quality assessment

State-of-the-art assessment of essayorganization and argument strength

Argument mining determines the argumentative structure of texts. The benefit of this structure has rarely been evaluated.

Argumentation quality assessment is needed for envisaged applications such as argumentative writing support.

Argumentative writing support for persuasive essays: 1. Mining of an essay‘s argumentative structure. 2. Assessment of argumentation quality dimensions. 3. Synthesis of suggestions for improvements (future work).

Modeling of an essay as a flow of paragraph-level arguments with sentence-level argumentative discourse units (ADUs).

Novel feature types for argumentation-related essay scoring based on the output of mining.

We score persuasive essays based on the output of mining for four argumentation-related quality dimensions: – Organization (Persing et al., EMNLP 2010)

– Thesis clarity (Persing and Ng, ACL 2013)

– Prompt adherence (Persing and Ng, ACL 2014)

– Argument strength (Persing and Ng, ACL 2015)

Main contributions of our work: – The first study of the benefit of argument mining for argumentation quality assessment. – Statistical insights into essay argumentation. – The new state of the art for two quality dimensions.

Learning of mining four ADU types using standard features on the Argument Annotated Essays corpus (Stab and Gurevych, COLING 2014)

Application of mining on all 6085 student essays from the International Corpus of Learner English (Granger et al., 2009).

If we take a look back in time we are in a position to see man dreaming, philosophizing and using his imagination of whatever comes his way. We see man transcending his ego I a way and thus becoming a God - like figure. And by putting down these sacred words, what is taking shape in my mind is the fact that using his imagination Man is no longer this organic and material substance like his contemporary counterpart who is putting his trump card on science, technology and industrialization but Man is a way transcends himself through his imagination.For instance, if we take into account the Renaissance or Romantic periods of mankind and close our eyes we could see Shakespeare applying his imagination in the fancy world of his comedies: elf and nymphs circling the stage making it a dream that will lost forever in our minds. We could even hear their high-pitched weird chuckle piercing with a gentle touch our ears, but "open those eyes that must eclipse the day" and you'll wee the high-tech wiping out every trace of the human elevated spirit that have dominated over the previous centuries. What we see now is "deux aux machina" or the fake "God from the machine" who with the touch of a button could unleash Armageddon.For poets and literate people of yore it was a common idea to transcend reality or to go beyond it by using their imagination not by using reason as we the homosapiens of our time do. For example, if we indulge in entertaining the idea of the film "The matrix" it has a lot to do with the period of Romanticism. But the difference is that a poet from that time could transcend reality, become one with Nature, and cruise wherever he wants using his imagination. Whereas now in the 21st century and in "The matrix" in particular the scientific type of Man thinks that at last he has succeeded in making travelling without boundaries via the virtual reality of his PC.As a logical conclusion to my essay I would like to put only one thing. "Wouldn't it be better if imagination makes the world go round". If I was to answer this question, the answer would be positive, but given the aquisitive or consumer society conditions we live in let's make a match between imagination and science. It would be somewhat more realistic.

prompt

essay

Some people say that in our modern world, dominated by science and technology and industrialisation, there is no longer a place for dreaming and imagination. What is your opinion?

none

conclusion

premise

Analysis of common ADU change flows in all ICLE paragraphs.

Evaluation on all 830–1003 ICLE essays that are labeled for each quality dimension with a score from [1, 4].

Experimental set-up exactly as in the papers of the (former) state-of-the-art approaches.

Essay scoring with several supervised approaches: – Average score baseline – State-of-the-art baseline (Persing et al. EMNLP 2010, Persing and Ng ACL 2013–2015)

– Content: Token n-grams, prompt similarities – POS: Part-of-speech n-grams – Flows: Sentiment flow patterns (Wachsmuth et al., COLING 2014, EMNLP 2015)

– Our approach: ADU flows, n-grams, and compositions

Mean squared errors in 5-fold cross-validation:

Mining

Assessment

Synthesis

argumentativestructure

argumentationquality

essay(input)

suggestion(output)

organization 2.0clarity 3.0

adherence 4.0strength 2.5

x 2x 1x 1

1

2

3

...

essay level paragraph level sentence level

argumentativestructure

... ...

argument1

argument2

argumentk

......

ADU type21

ADU type2m...

...

...

...

Argument mining approach Accuracy F1-score

Majority baseline 0.525 0.361State-of-the-art baseline (Stab and Gurevych, EMNLP 2014) 0.773 0.726Our approach 0.745 0.745

Essay scoring Thesis Prompt Argumentapproach Organization clarity adherence strength

Average score baseline 0.349 0.469 0.291 0.266State-of-the-art baseline 0.175 0.369 0.197 0.244

Content 0.336 0.425 0.231 0.236POS 0.326 0.461 0.231 0.233Flows 0.228 0.481 0.257 0.259

Our approach 0.184 0.470 0.241 0.242ADU flows 0.234 0.461 0.247 0.242ADU n-grams 0.225 0.466 0.265 0.243ADU compositions 0.194 0.457 0.239 0.239

Our approach + POS / Flows 0.164 0.496 0.232 0.246ADU compositions + Content 0.178 0.435 0.216 0.226

Paragraph of essay

# ADU change flow average first last

1 (conclusion, premise) 25.1% – 13.1%2 (conclusion) 22.4% 0.9% 31.6%3 (conclusion, premise, conclusion) 17.0% – 27.2%4 (none) 5.8% 42.7% 0.4%5 (premise) 4.3% – 1.4%6 (none, thesis) 3.4% 25.9% –7 (premise, conclusion) 2.9% – 2.7% (mean squared errors in green significantly improve the state of the art with a confidence of over 90%)

Demo

webis16.medien.uni-weimar.de/essay-scoring

...

ADUn-grams

ADUcompositions

argumentativestructure

ADUflows

featureextraction

...

0.5000.2000.2000.100

0.3330.222

0.111

...

0.2500.125

0.125...

012

> 2

0.333

0.000

0.333

0.333

0.333

0.667

0.000

0.000

0.667

0.333

0.000

0.000

0.667

0.333

0.000

0.000

minmax

meanmed

0

3

1.667

2

0

1

0.667

1

0

0.667

2

0

0

1

0.333

0

n = 1

n = 2

n = 3

ADUflow

changeflow

change+ w/o none

0.3330.3330.333

( )( )

( )

0.6670.333

( )( )

0.3330.3330.333

( )( )

( )

w/onone

0.6670.333

( )( )

0.6670.333

( )( )

w/o none+ change

0.111

Page 28: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

28

§  Mining of argument units •  Task. Classify sentence-level units as thesis, conclusion, premise, or none. •  Approach. Support vector machine (SVM)

with different standard features. •  Data. AAE corpus (Stab and Gurevych, 2014)

•  Results. Comparable to state of the art.

§  Analysis of mined argumentative structure •  Task. Mine and analyze common unit type flows (consider changes only). •  Data. All paragraphs of full ICLE corpus (6085 student essays). (Granger et al., 2009)

•  Insights. Some flows very common, 1st and last flow in text differ entirely.

Shallow mining of argumentative structure (Wachsmuth et al., 2016)

Assessment of the Quality of Argumentation, Henning Wachsmuth

Unit type flows Average First Last Conclusion, Premises 25.1% – 13.1% Conclusion, Premises, Conclusion 17.0% – 27.2% None, thesis 3.4% 25.9% – Premises, Conclusion 2.9% – 2.7%

Approach Acc.. F1

Majority baseline 52.5 36.1 State of the art 77.3 72.6 Our classifier 74.5 74.5

Page 29: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

29

None

Conclusion

Premise

”If we take a look back in time we are in a position to see man dreaming, philosophizing and using his imagination of whatever comes his � way. We see man transcending his ego I a way and thus becoming a God - like figure. And by putting down these sacred words, what is � taking shape in my mind is the fact that using his imagination Man is no longer this organic and material substance like his � contemporary counterpart who is putting his trump card on science, technology and industrialization but Man is a way transcends � himself through his imagination.

For instance, if we take into account the Renaissance or Romantic periods of mankind and close our eyes we could see Shakespeare � applying his imagination in the fancy world of his comedies: elf and nymphs circling the stage making it a dream that will lost forever in � our minds. We could even hear their high-pitched weird chuckle piercing with a gentle touch our ears, but "open those eyes that must � eclipse the day" and you'll wee the high-tech wiping out every trace of the human elevated spirit that have dominated over the previous � centuries. What we see now is "deux aux machina" or the fake "God from the machine“ who with the touch of a button could unleash � Armageddon.

For poets and literate people of yore it was a common idea to transcend reality or to go beyond it by using their imagination not by � using reason as we the homosapiens of our time do. For example, if we indulge in entertaining the idea of the film "The matrix" it has a � lot to do with the period of Romanticism. But the difference is that a poet from that time could transcend reality, become one with � Nature, and cruise wherever he wants using his imagination. Whereas now in the 21st century and in "The matrix" in particular the � scientific type of Man thinks that at last he has succeeded in making travelling without boundaries via the virtual reality of his PC.

As a logical conclusion to my essay I would like to put only one thing. ’Wouldn't it be better if imagination makes the world go round‘. � If I was to answer this question, the answer would be positive, but given the aquisitive or consumer society conditions we live in let's � make a match between imagination and science. It would be somewhat more realistic.”

Example essay with mined structure (Wachsmuth et al., 2016)

Assessment of the Quality of Argumentation, Henning Wachsmuth

§  Prompt

§  Essay

”Some people say that in our modern world, dominated by science and technology and industrialisation, � there is no longer a place for dreaming and imagination. What is your opinion?”

Organization 3.0 Thesis clarity 2.0 Prompt adherence 4.0 Argument strength 2.0

Page 30: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

30

§  Quality assessment based on structure •  Approach. SVM based on standard and argument-specific features.

§  Evaluation •  Results. Lowest mean squared error for the structure-related dimensions. •  Insights. Best feature type captures composition of argument units.

Assessment of argumentation quality (Wachsmuth et al., 2016)

Assessment of the Quality of Argumentation, Henning Wachsmuth

Approach Organization Clarity Adherence Strength Average baseline 0.349 0.469 0.291 0.266 Previous state of the art 0.175 0.369 0.197 0.244 Our approach 0.164 0.425 0.216 0.226

Page 31: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

31

Relative comparison

Assessment of the Quality of Argumentation, Henning Wachsmuth

Page 32: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

32

§  Problem •  Rating the quality of an argument in isolation may be hard or even doubtful. •  Is there an easier or more realistic way to assess quality?

§  Main idea •  Often, we are only interested in the best available argument. •  It‘s enough to compare the quality of an argument to others. •  Dilemma. Unclear in the end whether the best argument is good.

§  Existing approaches •  Winning side. Prediction of the debate winner from debate flow. (Zhang et al., 2016)

•  Winning side. Prediction of the winner from content and style (Wang et al., 2016)

•  Convincingness. Argument comparison with standard supervised learning. (Habernal and Gurevych, 2016a)

•  Level of support. Ranking of arguments by support of claim. (Braunstain et al., 2016)

Relative quality comparison: Overview

Assessment of the Quality of Argumentation, Henning Wachsmuth

vs

Conclusion Premises

Conclusion Premises

Page 33: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

33

Relative quality comparison: Covered dimensions

Assessment of the Quality of Argumentation, Henning Wachsmuth

cogency reason- ableness

effectiveness

local relevance

local acceptability

local sufficiency

global relevance

global acceptability

global sufficiency

clarity

appropriateness

credibility emotional appeal

arrangement

Argumentation quality

thesis clarity Persing and Ng (2013)

prompt adherence Persing and Ng (2014)

global coherence Feng et al. (2014)

evaluability Park et al. (2015)

amount of evidence Rahimi et al. (2014)

sufficiency Stab and

Gurevych (2017)

level of support Braunstain et al. (2016)

argument acceptability Cabrio and Villata (2012)

argument prominence Boltužic and Šnajder (2015) ´

argument relevance Wachsmuth et al. (2017a)

organization Persing et al. (2010) Rahimi et al. (2015)

argument strength Persing and Ng (2015) persuasiveness Tan et al. (2016) winning side Wang et al. (2016) Zhang et al. (2016) convincingness Habernal and Gurevych (2016)

Page 34: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

34

§  Task •  Given a full Oxford-style debate, which opponent wins?

§  Approach

•  Mining of supporting points each side. •  Modeling of the ”conversational flow“:

When does a side puts forward own points, when does it attack opponent points.

•  Logistic regression classifier with features capturing the flow.

§  Data •  108 Intelligence2 debates (117 turns on average). •  Winning side and audience feedback given.

§  Results •  Accuracy. Approach (0.65) beats audience feedback (0.6). •  Insights. Attacking the opponent’s points better than focus on own points.

Relative comparison of effectiveness (Zhang et al. 2016)

Assessment of the Quality of Argumentation, Henning Wachsmuth

winner

winnerloser

loser

own points opponent‘s pointschange in usage in interactive stage

–8%

–4%

0%

4%

http

s://d

e.w

ikip

edia

.org

pro debt

reality

college

boomer con economy

engage

volunteer

home

”Millennials don’t stand a chance“

Page 35: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

35

§  Task •  Given two arguments with the same topic and

stance, which one is more convincing?

§  Supervised learning approaches •  SVM. SVM with RBF kernel and a rich set of linguistic features. •  BiLSTM. Bi-directional long short-term memory neural network using GloVe.

Notice: The focus of the paper was not the approaches but the data construction.

§  Crowdsourced data •  16,927 pairs of 1052 debate portal arguments for 32 topic-stance pairs. •  Each annotated 5 times for convincingness (most reliable annotation taken).

Reliability can be estimated with MACE (Hovy et al., 2013). Annotators also had to give reasons.

§  Results in 32-fold cross-validation •  Accuracy. SVM (0.78) beats BiLSTM (0.76). Human performance 0.93. •  Insights. Surface features like capitalization easy, ”inverted“ sentiment hard.

Relative comparison of effectiveness (Habernal et al., 2016a)

Assessment of the Quality of Argumentation, Henning Wachsmuth

A B

”Ban plastic water bottles?“ pro pro

vs

Page 36: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

36

Absolute vs. relative assessment ~ Theory vs. practice

§  Data representing theory (Wachsmuth et al., 2017b)

•  Absolute expert ratings •  Normative guidelines •  15 predefined quality dimensions

§  Empirical comparison of theory and practice(Wachsmuth et al., 2017d)

•  736 argument pairs are available with ratings and labels. •  Compute Kendall‘s τ correlations of all dimensions and reasons.

Assessment of the Quality of Argumentation, Henning Wachsmuth

§  Data representing practice (Habernal and Gurevych, 2016b)

•  Relative lay comparisons •  No guidelines •  17+1 resulting reason labels

attacking/abusive

language/grammar issues

unclear/hard to follow

no credible evidence

insufficient reasoning irrelevant reasons

only opinion

non-sense/confusing

off-topic

generally weak/vague

details/facts/examples

objective/two-sided credible / confident

crisp / well-written

close to topic makes you think

well thought through

convincing

cogency reason- ableness

effectiveness

local relevance

local acceptability

local sufficiency

global relevance

global acceptability

global sufficiency

clarity

appropriateness credibility emotional

appeal

arrangement

overall quality

http

s://d

e.w

ikip

edia

.org

http

s://c

omm

ons.

wik

imed

ia.o

rg

Page 37: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

37

How different is assessment in theory and in practice?

§  Selected insights •  Convincing correlates most with overall quality (0.64). •  Generally high ”correlations“ between 0.3 and 1.0.

•  Perfect: Global acceptability + attacking/abusive (1.0). •  Mostly very intuitive, such as clarity + unclear (0.91).

•  Top overall quality for well thought through (mean score 1.8 of 3). •  Lowest overall quality for off-topic (mean score 1.1 of 3).

•  Few unintuitive results, e.g., ”only“ 0.52 for credibility + no credible evidence. •  Local sufficiency + global sufficiency hard to separate.

§  Conclusions •  Theory and practice match more than expected. •  Theory can guide quality assessment in practice. •  Practice indicates what to focus on to simplify theory.

Assessment of the Quality of Argumentation, Henning Wachsmuth

http

s://d

e.w

ikip

edia

.org

http

s://c

omm

ons.

wik

imed

ia.o

rg

vs

Page 38: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

38

Objective assessment

Assessment of the Quality of Argumentation, Henning Wachsmuth

Page 39: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

39

§  Problem •  How to assess quality without learning from subjective annotations? •  What are objective argumentation quality indicators?

§  Main idea •  Assess quality based on the structure induced by

the set of all arguments. •  Works for both for absolute and relative assessment. •  Dilemma. Evaluation on subjective annotations?

A solution is to rely on majority assessments of many annotators.

§  Existing approaches •  Acceptability. Assessment based on the attack relations. (Cabrio and Villata, 2012)

•  Relevance. Assessment based on reuse of argument units. (Wachsmuth et al., 2017a)

•  Prominence. Assessment based on argument frequency. (Boltužic and Šnajder, 2015)

Objective quality assessment: Overview

Assessment of the Quality of Argumentation, Henning Wachsmuth

Conclusion Premises

Conclusion Premises ≈ ≈

Conclusion Premises

Conclusion Premises

Conclusion Premises

support attack

Page 40: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

40

Objective quality assessment: Covered dimensions

Assessment of the Quality of Argumentation, Henning Wachsmuth

cogency reason- ableness

effectiveness

local relevance

local acceptability

local sufficiency

global relevance

global acceptability

global sufficiency

clarity

appropriateness

credibility emotional appeal

arrangement

Argumentation quality

thesis clarity Persing and Ng (2013)

prompt adherence Persing and Ng (2014)

global coherence Feng et al. (2014)

evaluability Park et al. (2015)

amount of evidence Rahimi et al. (2014)

sufficiency Stab and

Gurevych (2017)

level of support Braunstain et al. (2016)

argument acceptability Cabrio and Villata (2012)

argument prominence Boltužic and Šnajder (2015) ´

argument relevance Wachsmuth et al. (2017a)

organization Persing et al. (2010) Rahimi et al. (2015)

argument strength Persing and Ng (2015) persuasiveness Tan et al. (2016) winning side Wang et al. (2016) Zhang et al. (2016) convincingness Habernal and Gurevych (2016)

Page 41: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

41

§  Background: Abstract argumentation framework (Dung, 1995)

•  A directed graph where nodes represent arguments and edges attack relations between arguments.

•  Graph analysis reveals whether to accept an argument.

•  Accepted. If all arguments attacking it are rejected. •  Not accepted. If an accepted argument attacks it.

Extensions with weightings and with support+attack exist.

§  Approach (Cabrio and Villata, 2012) •  Given a set of arguments, use textual entailment algorithm to classify attacks. •  Assess acceptability of arguments following Dung‘s framework.

§  Evaluation •  Tested on 100 argument pairs from idebate.org, 45 attacking each other. •  Attack classification. Accuracy 0.67 •  Acceptability assessment. Accuracy 0.75

Objective assessment of global acceptability

Assessment of the Quality of Argumentation, Henning Wachsmuth

attack A1 A2

A3 A4 attack accepted

accepted

atta

ck

Page 42: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

42

Objective assessment of global relevance (Wachsmuth et al., 2017a)

Assessment of the Quality of Argumentation, Henning Wachsmuth

§  Task •  Given a set of arguments, which one is

most relevant to some issue? •  Problem. Relevance is highly subjective.

§  Research question •  Can we develop an ”objective” measure of relevance?

§  Key hypothesis •  The relevance of a conclusion depends on what other arguments

across the web use it as a premise. •  Rationale. Author cannot control who ”cites“ a conclusion in this way.

§  Approach •  Ignore content and inference of arguments (for now). •  Derive relevance structurally from the reuse of conclusions

at web scale.

Conclusion Premises

Conclusion Premises

Page 43: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

43

Building an argument graph for the web (Wachsmuth et al., 2017a)

Assessment of the Quality of Argumentation, Henning Wachsmuth

”If you wanna hear my view !

I think that the death penalty

should be abolished. It

legitimizes an irreversible act

of violence . As long as human

justice remains fallible , the

risk of executing the innocent

can never be eliminated .”

”If you wanna hear my view !

I think that the death penalty

should be abolished. It

legitimizes an irreversible act

of violence . As long as human

justice remains fallible , the

risk of executing the innocent

can never be eliminated .”

”If you wanna hear my view !

I think that the death penalty

should be abolished. It

legitimizes an irreversible act

of violence . As long as human

justice remains fallible , the

risk of executing the innocent

can never be eliminated .”

”If you wanna hear my view !

I think that the death penalty

should be abolished. It

legitimizes an irreversible act

of violence . As long as human

justice remains fallible , the

risk of executing the innocent

can never be eliminated .”

”If you wanna hear my view !

I think that the death penalty

should be abolished. It

legitimizes an irreversible act

of violence . As long as human

justice remains fallible , the

risk of executing the innocent

can never be eliminated .”

”If you wanna hear my view !

I think that the death penalty

should be abolished. It

legitimizes an irreversible act

of violence . As long as human

justice remains fallible , the

risk of executing the innocent

can never be eliminated .”

”If you wanna hear my view !

I think that the death penalty

should be abolished. It

legitimizes an irreversible act

of violence . As long as human

justice remains fallible , the

risk of executing the innocent

can never be eliminated .”

”If you wanna hear my view !

I think that the death penalty

should be abolished. It

legitimizes an irreversible act

of violence . As long as human

justice remains fallible , the

risk of executing the innocent

can never be eliminated .”

Conclusion Premises

abolish the death penalty

≈ ≈

≈ ≈

≈ stance

stance stance

The death penalty doesn‘t deter people from committing serious violent crimes.

A survey of the UN on the relation between the death penalty and homicide rates gave

no support to the deterrent hypothesis.

It does not deter people from

committing serious violent crimes.

Even if it did, is it acceptable to pay

for predicted future crimes of others?

The death penalty should be abolished.

Page et al. (1999)

” PageRank, a method for rating web pages objectively

and mechanically, effectively measuring human interest “

Page 44: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

44

§  Original PageRank score of a web page d (Page et al., 1999)

§  Adapted PageRank score of an argument unit c (Wachsmuth et al., 2017a)

§  Argument relevance is aggregation of premise scores •  Minimum, average, maximum, or sum

p̂(c) = (1� ↵) · p(d) · |D||A| + ↵ ·

X

i

p̂(ci)

|Pi|c

Approach: Adapt PageRank for argument relevance

Assessment of the Quality of Argumentation, Henning Wachsmuth

p(d) = (1� ↵) · 1

|D| + ↵ ·X

i

p(di)

|Di|

ground relevance

recursive relevance

ground relevance

recursive relevance

page di links to d

# pages di links to

same score for each page

conclusion ci uses c as premise

# premises of ci

PageRank of page d containing c

di‘

d di

<a>

<a>

<a>

...

ci‘

Pi‘

≈ ci

Pi ≈

... ≈

Page 45: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

45

Data (Wachsmuth et al., 2017a)

Assessment of the Quality of Argumentation, Henning Wachsmuth

§  No use of argument mining here •  Evaluation of PageRank without noise.

§  A ground-truth argument graph •  57 argument corpora from www.aifdb.org. •  Merged all arguments except for duplicates. •  17,877 arguments, 31,080 different units. •  PageRank computed based on assumption

that units match if they span the same text.

§  Benchmark rankings •  Since no objective relevance assessments

exist, use average assessments a proxy. •  110 arguments for 32 general claims.

2-6 arguments per claim.

•  Ranked by seven annotators (mean Kendall‘s τ = .36, highest τ = .59).

0! 2000! 4000! 6000! 8000! 10000! 12000! 14000! 16000! 18000!

10–122 5–9

4 3 2 1 0 17372

10595 1846

663 288 266

50

usage as conclusion

0! 2000! 4000! 6000! 8000! 10000! 12000! 14000! 16000! 18000!

6–8 5 4 3 2 1 0 12892

17093 694

172 123 95 11

usage as premise

Page 46: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

46

Evaluation of relevance assessment (Wachsmuth et al., 2017a)

Assessment of the Quality of Argumentation, Henning Wachsmuth

§  Evaluation of unsupervised ranking approaches

§  Experiment on ground-truth graph

•  Rank arguments with each approach. •  Correlate with benchmark rankings.

§  Results •  PageRank best (with sum aggregation). •  Notable correlation despite ignorance

of content and inference.

1 2 3 4 5 6

# Kendall‘s τ0.28 0.19 0.12 0.10 0.02 0.00

PageRank Number Sentiment Frequency Similarity Random

Approach best results for each ranking approach

PageRank of premises

Frequency of premises X

Sentiment of premises

J Similarity of units

c⇠PNumber

of premises

|P |Random ranking

each for minimum, average, maximum, and sum aggregation

Page 47: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

47

” Strawberries are good for your ticker.”

” One cup of strawberries, for instance, contains your full recommended daily intake of vitamin C, along with high quantities of folic acid and fiber.” #2

” Berries are superfoods because they’re so high in antioxidants without being high in calories, says Giovinazzo MS, RD, a nutritionist at Clay health club and spa, in New York City.”

#1

Assessment of the Quality of Argumentation, Henning Wachsmuth

” Strawberries are the best choice for your breakfast meal.”

Examples of ”objective“ argument relevance

” Technology has given us a means of social interaction that wasn't possible before.”

” The internet has enabled us to widen our knowledge.”

” The use of technology has revolutionized business.”

#3

#1

#2

#3

” Technology has enhanced the daily life of humans.”

http

s://d

e.w

ikip

edia

.org

ht

tps:

//pix

abay

.com

Page 48: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

48

Inclusion of subjectivity

Assessment of the Quality of Argumentation, Henning Wachsmuth

Page 49: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

49

http

s://p

ixab

ay.c

om

Inclusion of Subjectivity: Overview

Assessment of the Quality of Argumentation, Henning Wachsmuth

§  Problem •  Ultimately, effective argumentation requires to consider the target audience. •  Humans would barely argue without doing so.

§  Main idea •  Model the target audience within quality assessment. •  This also includes to have audience-specific ground-truth annotations.

§  Missing approaches •  Audience model have rarely been included explicitly so far. •  Implicitly, some annotated corpora may actually represent specific audiences. •  Recent studies analyze the quality perception of different audiences.

§  Studies •  Different personalities. Effectiveness of emotional vs. rational arguments.

(Lukin et al., 2017)

•  Different ideologies. Effectiveness of news editorials. (El Baff et al., 2018)

Page 50: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

50

Studying effectiveness based on personality (Lukin et al., 2017)

Assessment of the Quality of Argumentation, Henning Wachsmuth

§  Hypothesis •  People with different personalities are open to different types of arguments.

§  Study •  Impact of personality on the effectiveness of

emotional and factual arguments. •  Personality. Here, the ”Big Five“.

§  Data •  5185 arguments from online dialogs. •  Quality. Each annotated for whether it

changed the belief (to pro, to con, neither). •  Personality. Each annotator did Big Five test.

§  Selected insights •  Agreeable people easiest to predict (F1 ~.48), extroverted hardest (F1 ~.44). •  Factual arguments best for agreeable people, emotional best for open people.

http

s://c

omm

ons.

wik

imed

ia.o

rg

Page 51: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

51

§  Effects of news editorials •  News editorials are said to shape public opinion, but they rarely change a

reader‘s prior stance. •  Rather, they challenge or reinforce stance — or neither.

§  Dialectical notion of argumentation quality •  A good editorial reinforces one side and challenges the other. •  Or it challenges both sides.

Argumentation quality in news editorials (El Baff et al., 2018)

Assessment of the Quality of Argumentation, Henning Wachsmuth

Opposite stance Same stance

Stance of the editorial

No effe

ct

Chang

e

of sta

nce

Strong

ly

challe

nging

Somew

hat

challe

nging

No effe

ct

Empower

Strong

ly

reinfo

rcing

Somew

hat

reinfo

rcing

Page 52: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

52

§  Hypothesis •  Prior stance depends on political ideology (and personality). •  Ideology needs to be known to assess the effectiveness of news editorials.

§  Study •  Impact of ideology (and personality) on the effectiveness of news editorials. •  Ideology. Here, conservative vs. liberal.

§  Data •  1000 editorials from NY Times. •  Quality. Each annotated for persuasive

effect by 3 conservatives and 3 liberals. •  Ideology. All 24 annotators (in total) did the

Political Typology Quiz. •  Personality. Also, Big Five test was taken.

Studying effectiveness based on ideology

Assessment of the Quality of Argumentation, Henning Wachsmuth

Core Conservatives

Country First Conservatives

Market Skeptic Republicans

New Era Enterprisers

Devout and Diverse

Disaffected Democrats

Opportunity Democrats

Solid Liberals

0

4

6

2

0

6

3

3

Liberals

Conservatives

Page 53: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

53

§  Majority effect distribution in the corpus

§  Effect depending on ideology and personality Kendall‘s τ correlation with challenge/reinforce

Selected results of the ideology study (El Baff et al., 2018)

Assessment of the Quality of Argumentation, Henning Wachsmuth

1%

5%

44%

2%

38%

10%

0% 20% 40%

Challenge & Challenge

Challenge & Reinforce

Reinforce & Reinforce

Challenge & No_effect

Reinforce & No_effect

No effect & No_effect

71 269

708

1402

550

72 275

1282

798 578

0

300

600

900

1200

1500

Strongly challenging

Somewhat challenging

No effect Somewhat reinforcing

Strongly reinforcing

Conservatives Liberals

33 35

Change stance

-0.14

-0.04

0.06

0.16

0.26

Agreeability Conscientiousness Extraversion Neuroticism Openness

Page 54: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

54

Conclusion

Assessment of the Quality of Argumentation, Henning Wachsmuth

Page 55: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

55

§  Argumentation quality •  Several quality dimensions at different granularity levels. •  What dimension is important, depends on the goal. •  Many dimensions are highly subjective.

§  Assessment of argumentation quality •  Either absolute rating or relative comparison. •  Structural analyses help to counter subjectiveness. •  Diverse approaches exist, often learning-based.

§  Selected assessment approaches •  Argument-specific features for rhetorical dimensions. •  Modeling conversational flow to predict debate winners. •  PageRank for ”objective“ argument relevance.

Conclusion

Assessment of the Quality of Argumentation, Henning Wachsmuth

A B

”Ban plastic water bottles?“ pro pro

vs

Page 56: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

56

§  Aristotle (2007). Aristotle (George A. Kennedy, Translator). On Rhetoric: A Theory of Civic Discourse. Clarendon Aristotle series. Oxford University Press, 2007.

§  Blair (2012). J. Anthony Blair. Groundwork in the Theory of Argumentation. Springer Netherlands, 2012.

§  Boltužic and Šnajder (2015). Filip Boltužic and Jan Šnajder. Identifying Prominent Arguments in Online Debates using Semantic Textual Similarity. In Proceedings of the 2nd Workshop on Argumentation Mining, pages 110–115, 2015.

§  Braunstain et al. (2016). Liora Braunstain, Oren Kurland, David Carmel, Idan Szpektor, and Anna Shtok. Supporting Human Answers for Advice-seeking Questions in CQA Sites. In Proceedings of the 38th European Conference on IR Research, pages 129–141, 2016.

§  Cabrio and Villata (2012). Elena Cabrio and Serena Villata. Combining Textual Entailment and Argumentation Theory for Supporting Online Debates Interactions. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 208–212, 2012.

§  Cohen (2001). Daniel H. Cohen. Evaluating Arguments and Making Meta-Arguments. Informal Logic, 21(2):73–84, 2001.

§  Damer (2009). T. Edward Damer. Attacking Faulty Reasoning: A Practical Guide to Fallacy-Free Arguments. Wadsworth, Cengage Learning, Belmont, CA, 6th edition, 2009.

§  Dung (1995): Phan Minh Dung. On the Acceptability of Arguments and its Fundamental Role in Nonmonotonic Reasoning, Logic Programming and n-Person Games. Artificial Intelligence, 77(2):321–357, 1995.

§  El Baff et al. (2018). Roxanne El Baff, Henning Wachsmuth, Khalid Al-Khatib, and Benno Stein. Challenge or Empower: Revisiting Argumentation Quality in a News Editorial Corpus. In Proceedings of the 22nd Conference on Computational Natural Language Learning, pages 454–464, 2018.

References

Assessment of the Quality of Argumentation, Henning Wachsmuth

Page 57: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

57

§  Feng et al. (2014). Vanessa Wei Feng, Ziheng Lin, and Graeme Hirst. The Impact of Deep Hierarchical Discourse Structures in the Evaluation of Text Coherence. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 940–949. Dublin City University and Association for Computational Linguistics, 2014.

§  Freeley and Steinberg (2009). Austin J. Freeley and David L. Steinberg. Argumentation and Debate. Cengage Learning, 12th edition, 2008.

§  Freeman (2011). Argument Structure: Representation and Theory. Springer, 2011.

§  Govier (2010). Trudy Govier. A Practical Study of Argument. Wadsworth, Cengage Learning, Belmont, CA, 7th edition, 2010.

§  Granger et al. (2009). Sylviane Granger, Estelle Dagneaux, Fanny Meunier, and Magali Paquot. International Corpus of Learner English (version 2), 2009.

§  Habernal and Gurevych (2016a). Ivan Habernal and Iryna Gurevych. 2016. Which Argument is More Convincing? Analyzing and Predicting Convincingness of Web Arguments using Bidirectional LSTM. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1589–1599.

§  Habernal and Gurevych (2016b). Ivan Habernal and Iryna Gurevych. What makes a convincing argument? Empirical Analysis and Detecting Attributes of Convincingness in Web Argumentation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1214–1223, 2016.

§  Hamblin (1970). Charles L. Hamblin. Fallacies. Methuen, London, UK, 1970.

§  Hoeken (2001). Hans Hoeken. Anecdotal, Statistical, and Causal evidence: Their Perceived and Actual Persuasiveness. Argumentation, 15(4):425–437, 2001.

References

Assessment of the Quality of Argumentation, Henning Wachsmuth

Page 58: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

58

§  Hovy et al. (2013). Dirk Hovy, Taylor Berg-Kirkpatrick, Ashish Vaswani, and Eduard Hovy. 2013. Learning Whom to Trust with MACE. In Proceedings of NAACL-HLT 2013, pages 1120–1130.

§  Johnson and Blair (2006). Ralph H. Johnson and J. Anthony Blair. 2006. Logical Self-defense. International Debate Education Association.

§  O‘Keefe and Jackson (1995). Daniel J. O’Keefe and Sally Jackson. Argument Quality and Persuasive Effects: A Review of Current Approaches. In Argumentation and Values: Proceedings of the Ninth Alta Conference on Argumentation, pages 88–92, 1995.

§  Mercier and Sperber (2011). Hugo Mercier and Dan Sperber. 2011. Why Do Humans Reason? Arguments for an Argumentative Theory. Behavioral and Brain Sciences, 34:57–111.

§  Page et al. (1999). Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report 1999-66, Stanford InfoLab. Previous number = SIDL-WP-1999-0120, 1999.

§  Park et al. (2015). Joonsuk Park, Cheryl Blake, and Claire Cardie. Toward Machine-assisted Participation in eRulemaking: An Argumentation Model of Evaluability. In Proceedings of the 15th International Conference on Artificial Intelligence and Law, pages 206–210, 2015.

§  Perelman and Olbrecht-Tyteca (1969). Chaïm Perelman and Lucie Olbrechts-Tyteca. 1969. The New Rhetoric: A Treatise on Argumentation (John Wilkinson and Purcell Weaver, translator). University of Notre Dame Press.

§  Persing and Ng (2013): Isaac Persing and Vincent Ng. Modeling Thesis Clarity in Student Essays. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 260–269, 2013.

§  Persing and Ng (2014): Isaac Persing and Vincent Ng. Modeling Prompt Adherence in Student Essays. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 1534–1543, 2014.

References

Assessment of the Quality of Argumentation, Henning Wachsmuth

Page 59: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

59

§  Persing and Ng (2015): Isaac Persing and V. Ng. Modeling Argument Strength in Student Essays. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pages 543–552, 2015.

§  Persing et al. (2010). Isaac Persing, Alan Davis, and Vincent Ng. Modeling organization in student essays. In Pro- ceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 229–239, 2010.

§  Rahimi et al. (2014). Zahra Rahimi, Diane J. Litman, Richard Correnti, Lindsay Clare Matsumura, Elaine Wang, and Zahid Kisa. Automatic Scoring of an Analytical Response-to-Text Assessment. In Proceedings of the 12th International Conference on Intelligent Tutoring Systems, pages 601–610, 2014.

§  Rahimi et al. (2015). Zahra Rahimi, Diane Litman, Elaine Wang, and Richard Correnti. Incorporating Coherence of Topics as a Criterion in Automatic Response-to-Text Assessment of the Organization of Writing. In Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 20–30, 2015.

§  Stab and Gurevych (2014). Christian Stab and Iryna Gurevych. Annotating Argument Components and Relations in Persuasive Essays. In Proceedings of the 25th Conference on Computational Linguistics, pages 1501–1510, 2014.

§  Stab and Gurevych (2017). Christian Stab and Iryna Gurevych. Recognizing Insufficiently Supported Arguments in Argumentative Essays. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pages 980–990, 2017.

§  Stede and Schneider (2018). Manfred Stede and Jodi Schneider. Argumentation Mining. Synthesis Lectures on Human Language Technologies 40, Morgan & Claypool, 2018.

§  Tindale (2007). Christopher W. Tindale. 2007. Fallacies and Argument Appraisal. Critical Reasoning and Argumentation. Cambridge University Press.

§  Toulmin (1958). Stephen E. Toulmin. The Uses of Argument. Cambridge University Press, 1958.

References

Assessment of the Quality of Argumentation, Henning Wachsmuth

Page 60: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

60

§  van Eemeren (2015). Frans H. van Eemeren. Reasonableness and Effectiveness in Argumentative Discourse: Fifty Contributions to the Development of Pragma-Dialectics. Argumentation Library. Springer International Publishing, 2015.

§  van Eemeren and Grootendoorst (2004). Frans H. van Eemeren and Rob Grootendorst. 2004. A Systematic Theory of Argumentation: The Pragma-Dialectical Approach. Cambridge University Press, Cambridge, UK.

§  Wachsmuth et al. (2016). Henning Wachsmuth, Khalid Al-Khatib, and Benno Stein. Using Argument Mining to Assess the Argumentation Quality of Essays. In: Proceedings of the 26th International Conference on Computational Linguistics, pages 1680–1692, 2016.

§  Wachsmuth et al. (2017a). Henning Wachsmuth, Benno Stein, and Yamen Ajjour. ”PageRank“ for Argument Relevance. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pages 1116–1126, 2017.

§  Wachsmuth et al. (2017b). Henning Wachsmuth, Nona Naderi, Yufang Hou, Yonatan Bilu, Vinodkumar Prabhakaran, Tim Alberdingk Thijm, Graeme Hirst, and Benno Stein. Computational Argumentation Quality Assessment in Natural Language. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pages 176–187, 2017.

§  Wachsmuth et al. (2017d). Henning Wachsmuth, Nona Naderi, Ivan Habernal, Yufang Hou, Graeme Hirst, Iryna Gurevych, and Benno Stein. Argumentation Quality Assessment: Theory vs. Practice. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, Vancouver, Canada, pages 250–255, 2017.

§  Walton (2006). Douglas Walton. Fundamentals of Critical Argumentation. Cambridge University Press, 2006.

§  Walton et al. (2008). Douglas Walton, Christopher Reed, and Fabrizio Macagno. Argumentation Schemes. Cambridge University Press, 2008.

References

Assessment of the Quality of Argumentation, Henning Wachsmuth

Page 61: Computational Argumentation — Part XI Assessment of the ... · XI. Assessment of the quality of argumentation XII. Generation of argumentation XIII.Development of an argument search

61

§  Wang et al. (2017). Lu Wang, Nick Beauchamp, Sarah Shugars, and Kechen Qin. Winning on the Merits: The Joint Effects of Content and Style on Debate Outcomes. In: Transactions of the Association for Computational Linguistics 5, pages 219--232, 2017.

§  Wei et al. (2016). Zhongyu Wei, Yang Liu, and Yi Li. Is this Post Persuasive? Ranking Argumentative Comments in Online Forum. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 195–200, 2016.

§  Zhang et al. (2016). Justine Zhang, Ravi Kumar, Sujith Ravi, and Cristian Danescu-Niculescu-Mizil. Conversational Flow in Oxford-style Debates. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 136–141, 2016.

References

Assessment of the Quality of Argumentation, Henning Wachsmuth