The Strong-Weak Pronoun Distinction as a Marker of Literariness

Andreas van Cranenburgh

28 February 2019, Würzburg

Digital Stylistics in Romance Studies and Beyond conference



Outline

• General problem: What is literature?

• This talk: Strong/weak pronouns

• Results
    • Correlations
    • Outliers
    • Analysis

1 / 17

Background: What is literature?

What makes a literary novel literary?

• Cultural capital of critics and publishers

• Subjective aesthetic value-judgments

• “Objective” textual features, writing style: literariness

2 / 17

The Riddle of Literary Quality

[Figure: mean literary rating (scale 1 to 7) with 95% confidence intervals for a selection of novels: James, 50 shades; Child, 61 hours; Smith, Those in peril; Austin, Until we reach..; Adler-Olsen, Conspiracy of..; Watson, Before I go..; Scholten, Kameraad baron; Beijnum, Soort familie; Barnes, Sense of ending]

• 401 contemporary Dutch-language novels

• Large reader survey of general public

• Literary ratings on 7-point Likert scale

• Stylometry and machine learning with texts of novels

3 / 17

http://literaryquality.huygens.knaw.nl

Previous results: R² = 76.0 %

[Figure: scatter plot of predicted reader judgments (y-axis, 1 to 7) against actual reader judgments (x-axis, 1 to 7), with novels marked by genre (Fiction, Suspense, Romantic, Other). Labeled novels: James, Fifty shades of Grey; Kinsella, Remember me?; Smeets, Afrekening; Gilbert, Eat pray love; Koch, The dinner; Stockett, The help; Donoghue, Room; Franzen, Freedom; Barnes, Sense of an ending; Lewinsky, Johannistag; Mortier, Godenslaap; Rosenboom, Zoete mond; Baldacci, Hell's corner; French, Blue monday]

• Literariness is highly predictable from text

• Word frequencies, cliches, syntactic complexity yield good predictions,

• but many, many features, hard to interpret.

This talk is not about getting better predictions, but about understanding one specific stylistic aspect

4 / 17

van Cranenburgh & Bod (2017). A data-oriented model of literary language

van Cranenburgh et al. (2019). Vector space explorations of literary language

Background: strong and weak pronouns

• Some Dutch pronouns have strong and weak (reduced) forms

• Same meaning, but strong/weak is sometimes obligatory or preferred

• Other times, free choice (style!)

                Strong (subj, obj)   Weak
1st sg          ik, mij              me
2nd sg          jij, jou             je
3rd sg fem      zij, haar            ze
3rd sg masc     hij, hem             -
3rd sg neut     het                  -
1st pl          wij, ons             we
2nd pl          jullie               -
3rd pl          zij, hen/hun         ze

Red/blue: pronouns with both forms.
Not shown: weak pronouns avoided in written language or only used as possessive/reflexive.

5 / 17

Cf. Haeseryn et al. (1997). Algemene Nederlandse Spraakkunst

Previous work: why strong vs weak?

Kaiser (2011) and works cited there:

Salience:
null > reduced pronoun > full pronoun > demonstrative > full NP . . . etc.
(most salient referent on the left, less salient referent on the right)

Contrast:
The referent is in a contrast relation to other entities in the discourse

6 / 17

Kaiser (2011). Salience and contrast effects in reference resolution: The interpretation of Dutch pronouns and demonstratives.

This talk

Missing from previous work:

Stylistic dimension:
Weak vs strong pronouns are related to informality and the tone of a text

Research Questions:

1. Is there an association between strong/weak pronouns and literariness?

2. How common is it for these variants to be
    a. Free stylistic choice
    b. Preferred
    c. Obligatory

3. Why could there be such an association?

7 / 17


Method

Calculate correlation between:

1. Mean literary ratings of novels

2. Two independent measurements:

    a. Baseline: frequency of both pronoun forms
    b. Proportion of strong pronouns vs both forms
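The correlation step can be sketched as follows. The slides report r and p values but do not name the coefficient, so Pearson's r is an assumption here; the function name `pearson_r` and the toy numbers are made up for illustration and are not corpus data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values, one entry per novel (NOT data from the study)
mean_ratings = [2.1, 3.4, 4.0, 5.2, 6.1]
prop_strong = [0.08, 0.12, 0.10, 0.20, 0.25]
r = pearson_r(mean_ratings, prop_strong)
```

With real data, each list would hold one value per novel (N=401): the mean literary rating and one of the two pronoun measurements.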

8 / 17

Correlation: pronouns vs literary rating (N=401)

[Figure: scatter plot of % pronouns (y-axis, 0 to 7) against literary rating (x-axis, 1 to 7); r = -0.32, p = 4.3e-11]

• Count both forms

• Divide by total words (rel. freq)

• Result: Fewer pronouns, more literary

• Probably proxy for amount of dialogue vs narrative description

9 / 17

Correlation: strong pronouns vs literary rating (N=401)

[Figure: scatter plot of proportion of strong pronouns (y-axis, 0 to 60) against literary rating (x-axis, 1 to 7); r = 0.38, p = 3.8e-15. Labeled outliers: SpringerQuadriga, MitchellNietVerhoordeGebeden, KootenVerrekijker, DewulfKleineDagen, JapinVaslav, BernlefZijnDood, VerhulstLaatsteLiefdeVan, SiebelinkOscar, AbdolahKoning]

• Count strong pronouns, divide by count of both forms

• Independent of total number of pronouns

• On average, 85 % of pronouns are weak

• More strong pronouns, more literary

• Several strong outliers!
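A minimal sketch of the two measurements, assuming plain token matching. The `STRONG` and `WEAK` sets are an assumption based on the forms named on the outlier slide (mij/me, jij/jou/je, zij/ze, wij/we, hen/hun/ze); the study's actual extraction is not described in the slides and presumably separates out possessive, reflexive and generic uses, which this sketch does not. The function name and the example sentence are made up.

```python
import re

# Assumed form inventory; not confirmed as the exact sets used in the study
STRONG = {"mij", "jij", "jou", "zij", "wij", "hen", "hun"}
WEAK = {"me", "je", "ze", "we"}


def pronoun_measures(text):
    """Return (relative frequency of both forms, proportion of strong forms)."""
    tokens = re.findall(r"\w+", text.lower())
    strong = sum(tok in STRONG for tok in tokens)
    weak = sum(tok in WEAK for tok in tokens)
    both = strong + weak
    # Baseline: pronouns of either form, relative to all words
    rel_freq = both / len(tokens) if tokens else 0.0
    # Second measure: strong forms relative to both forms only,
    # so it does not depend on how many pronouns a text contains
    prop_strong = strong / both if both else 0.0
    return rel_freq, prop_strong


# Made-up sentence: "She saw me, but we did not see you."
rel_freq, prop_strong = pronoun_measures("Ze zag mij, maar we zagen je niet.")
```

Dividing by the count of both forms rather than by total words is what keeps the second measure independent of the baseline.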

10 / 17

Distribution of pronouns in outliers

[Figure: heatmap of pronoun forms (columns: mij, me, jij, jou, je, zij-SG, ze-SG, wij, we, zij-PL, hen, hun, ze-PL) for the outliers (rows: SpringerQuadriga, MitchellNietVerhoordeGebeden, KootenVerrekijker, DewulfKleineDagen, JapinVaslav, BernlefZijnDood, VerhulstLaatsteLiefdeVan, SiebelinkOscar, AbdolahKoning); color scale ticks: 1.2, 0.8, 0.4, 0.0, 0.4, 0.8, 1.2]

Table: Divergence of relative frequencies (wrt corpus mean)

• All Dutch authors (except Mitchell), highly literary (> 5)

• Less je, ze (SG, FEM). More mij, wij, zij (SG, FEM)

11 / 17

Possible explanations

Why are strong pronouns more common in literary texts?

Stylistic choice (deliberate or not):

• Non-literary texts have more informal, idiomatic language

• Literary authors are less afraid of sounding “unnatural”

Discourse structure more complicated:

• Larger number of characters

• Multiple perspectives, storylines

give rise to higher frequency of less salient referents and use of contrast.

12 / 17

Manual Analysis

• In first 100 sentences of the outliers

• Annotate each pronoun:
    • Strong vs weak
    • Free choice, preferred, or obligatory
    • Used for emphasis/contrast?
    • Type: personal, possessive, generic, non-personal, verb

Limitation: annotation was done without looking at discourse context.

Distribution of types:

Personal        303
Possessive       26
Generic          20
Non-personal      6
Verb              1
Total           356

We’ll only consider personal pronouns, which allow both forms.
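One way to represent such annotations and tally them is sketched below. The record type `PronounAnnotation`, its field names and the example records are hypothetical; the slides do not describe the annotation format that was actually used.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PronounAnnotation:
    token: str
    form: str       # "strong" or "weak"
    status: str     # "free", "preferred", or "obligatory"
    emphasis: bool  # used for emphasis/contrast?
    kind: str       # "personal", "possessive", "generic", "non-personal", "verb"


def breakdown(annotations):
    """Count personal pronouns by (emphasis, form, status)."""
    return Counter(
        (a.emphasis, a.form, a.status)
        for a in annotations
        if a.kind == "personal"  # other types are left out of the breakdown
    )


# Invented toy records, not the annotated data
records = [
    PronounAnnotation("we", "weak", "free", False, "personal"),
    PronounAnnotation("je", "weak", "preferred", False, "personal"),
    PronounAnnotation("mij", "strong", "free", True, "personal"),
    PronounAnnotation("je", "weak", "obligatory", False, "generic"),
]
counts = breakdown(records)
```

Keying the counter on (emphasis, form, status) gives one count per cell of a cross-tabulation of form against status, split by emphasis.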

13 / 17


Breakdown (N=303 pronouns)

[Figure: bar charts of counts (y-axis, 0 to 80) of strong and weak forms, in two panels (emphasis = False, emphasis = True), broken down by status: free, preferred, obligatory]

• Emphasis is rare.

• Weak often preferred, but large part is free choice.

14 / 17

Typical examples

Weak:

(1) a. Free: We speuren erfgenamen op
       We track down heirs

    b. Preferred: Dat weet je toch?
       You know that, right?

    c. Obligatory: Mooie gouvernante is me dat.
       Nice governess that is.

Strong:

(2) a. Free: Hoort u mij?
       Do you hear me?

    b. Preferred: Maar dan kennen ze mij niet.
       But then they haven’t met me.

    c. Obligatory: Je ziet dat het niet van mij is!
       You can tell it’s not mine!

15 / 17

Interesting examples

Arguably unnatural usage of strong pronoun:

(3) a. Ik keek om mij heen
       I looked around me

    b. aangezien [. . . ] heb ik altijd mijn eigen Duralexglas bij mij
       since [. . . ] I always have my own Duralex glass with me

Weak vs strong pronouns pick different referents:

(4) Ik heb nooit kunnen vaststellen dat ze mij in de gaten hielden, al deden ze dat natuurlijk wel, en zij in de eerste plaats.
    I have never been able to confirm that they were watching me, although of course they did, and she most of all.

16 / 17

Conclusion

Answers to Research Questions:

1. Is there an association between strong/weak pronouns and literariness?
    • Negative correlation between pronouns and literariness
    • Positive correlation between strong pronouns and literariness
    • Striking outliers: Dutch literary authors who use lots of strong pronouns

2. How common is it for these variants to be free choice, preferred, or obligatory?
    • Majority is free, stylistic choice.
    • Even in the outliers, weak pronouns are often preferred
    • Obligatoriness and emphasis are rare.

3. Why could there be such an association?
    • Seems to be predominantly a stylistic choice
    • Should look into discourse context

17 / 17


EXTRA SLIDES

18 / 17

Future Work

• Analyze antecedents: distance, grammatical function, competing referents

• Compare with non-literary corpora (news etc.)

• See whether weak/strong distinction is useful as a feature for automatic anaphora resolution

• Analyze dialogue for each character separately from narrative text

19 / 17

Properties of strong and weak pronouns

Strong: (emphatic)

• Default, unmarked
• Often stressed: emphasis/contrast
• Less salient referents
• Obligatory for:
    • Comparisons (ik ben rijker dan jij, *dan je)
    • Conjunctions of two pronouns (hij en zij)
    • Oblique arguments, e.g. rel. clause (voor hen die . . . )
• Preferred:
    • Sounds unnatural when repeated
    • Preferred in writing, even when the weak form is used when the same sentence is spoken (applies to ie, ’m, d’r, ’t, but also to a lesser extent to me/je/we/ze)

Weak: (unemphatic)

• Reduced, marked
• Always unstressed: no emphasis
• Salient referents
• Obligatory for:
    • Idioms (e.g., dank je, *dank jij)
    • Generic you (je weet maar nooit!)
    • 3rd pers. pl. non-personal (ze, *zij, *hen, *hun)
• Preferred:
    • Can be repeated; or sentence has one strong pronoun followed by several weak instances
    • Less personal, more informal

Cf. Haeseryn et al. (1997). Algemene Nederlandse Spraakkunst;
Bresnan (1998). Markedness and morphosyntactic variation in pronominal systems

20 / 17

Distribution of grammatical functions

[Figure: heatmap of pronoun form (columns: mij, me, jij, jou, je, zij-SG, ze-SG, wij, we, zij-PL, hen, hun, ze-PL) by grammatical function (row labels: su, obj1, obj2, se, cnj, hd); color scale 0 to 60. Cell values as extracted, row by row, with empty cells omitted:]

3.1 23.5 1.6 43.7 1.0 11.7 0.5 14.9
13.5 39.2 6.0 19.1 10.6 0.2 11.4
8.6 57.4 2.5 21.6 1.4 3.9 4.6
1.3 74.2 24.5
0.8 10.8 3.3 0.4 24.8 0.5 15.3 1.2 42.8
24.0 26.9 13.0 2.2 31.8 2.0 0.2
10.8 0.2 19.9 10.0 0.9 7.5 0.6 17.3 0.7 11.6 18.2 1.4 0.9

Percentages add up to 100 for each row

21 / 17