The Strong-Weak Pronoun Distinction as a Marker of Literariness
Andreas van Cranenburgh
28 February 2019, Würzburg
Digital Stylistics in Romance Studies and Beyond conference
Outline
- General problem: What is literature?
- This talk: Strong/weak pronouns
- Results
  - Correlations
  - Outliers
  - Analysis
Background: What is literature?
What makes a literary novel literary?
- Cultural capital of critics and publishers
- Subjective aesthetic value-judgments
- “Objective” textual features, writing style: literariness
The Riddle of Literary Quality
[Figure: mean rating (scale 1-7) with 95% confidence interval for selected novels: James, 50 shades; Child, 61 hours; Smith, Those in peril; Austin, Until we reach..; Adler-Olsen, Conspiracy of..; Watson, Before I go..; Scholten, Kameraad baron; Beijnum, Soort familie; Barnes, Sense of ending]
- 401 contemporary Dutch-language novels
- Large reader survey of general public
- Literary ratings on 7-point Likert scale
- Stylometry and machine learning with texts of novels
http://literaryquality.huygens.knaw.nl
Previous results: R² = 76.0 %
[Figure: scatter plot of predicted reader judgments (y-axis, 1-7) against actual reader judgments (x-axis, 1-7); legend: Fiction, Suspense, Romantic, Other. Labeled novels: James, Fifty shades of Grey; Kinsella, Remember me?; Smeets, Afrekening; Gilbert, Eat pray love; Koch, The dinner; Stockett, The help; Donoghue, Room; Franzen, Freedom; Barnes, Sense of an ending; Lewinsky, Johannistag; Mortier, Godenslaap; Rosenboom, Zoete mond; Baldacci, Hell's corner; French, Blue monday]
- Literariness is highly predictable from text
- Word frequencies, cliches, syntactic complexity yield good predictions,
- but many, many features, hard to interpret.

This talk is not about getting better predictions, but about understanding one specific stylistic aspect

van Cranenburgh & Bod (2017). A data-oriented model of literary language
van Cranenburgh et al. (2019). Vector space explorations of literary language
Background: strong and weak pronouns
- Some Dutch pronouns have strong and weak (reduced) forms
- Same meaning, but strong/weak is sometimes obligatory or preferred
- Other times, free choice (style!)

              Strong (subj, obj)   Weak
1st sg        ik, mij              me
2nd sg        jij, jou             je
3rd sg fem    zij, haar            ze
3rd sg masc   hij, hem             -
3rd sg neut   het                  -
1st pl        wij, ons             we
2nd pl        jullie               -
3rd pl        zij, hen/hun         ze

Red/blue: pronouns with both forms.
Not shown: weak pronouns avoided in written language or only used as possessive/reflexive.
Cf. Haeseryn et al. (1997). Algemene Nederlandse Spraakkunst
Previous work: why strong vs weak?
Kaiser (2011) and works cited there:
Salience:
null > reduced pronoun > full pronoun > demonstrative > full NP ... etc.
(ordered from most salient referent to less salient referent)

Contrast:
The referent is in a contrast relation to other entities in the discourse

Kaiser (2011). Salience and contrast effects in reference resolution: The interpretation of Dutch pronouns and demonstratives.
This talk
Missing from previous work:
Stylistic dimension:
Weak vs strong pronouns are related to informality and the tone of a text

Research Questions:
1. Is there an association between strong/weak pronouns and literariness?
2. How common is it for these variants to be
   a. Free stylistic choice
   b. Preferred
   c. Obligatory
3. Why could there be such an association?
Method
Calculate correlation between:
1. Mean literary ratings of novels
2. Two independent measurements:
   a. Baseline: frequency of both pronoun forms
   b. Proportion of strong pronouns vs both forms
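The correlation step of the method can be sketched as follows. This is a minimal illustration that assumes the reported r values are Pearson coefficients (the slides do not name the coefficient). The function name `pearson_r` and the toy numbers are hypothetical, not the survey data, and no p-value is computed.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy values (NOT the survey data): one entry per novel.
mean_ratings = [2.1, 3.4, 4.0, 5.2, 6.1]      # mean literary rating, 1-7
measurement = [8.0, 12.5, 11.0, 19.0, 24.5]   # e.g. % strong pronouns

r = pearson_r(mean_ratings, measurement)
print(f"r = {r:.2f}")
```

The same function would be applied once per measurement (baseline frequency, proportion of strong pronouns), each time against the per-novel mean ratings.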
Correlation: pronouns vs literary rating (N=401)
[Figure: scatter plot of % pronouns (y-axis, 0-7) against literary rating (x-axis, 1-7); r=-0.32; p=4.3e-11]
- Count both forms
- Divide by total words (rel. freq)
- Result: fewer pronouns, more literary
- Probably a proxy for amount of dialogue vs narrative description
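A rough sketch of the baseline measurement: count pronouns that have both a strong and a weak form, then divide by the total number of words. The form lists are read off the pronoun table in this talk. The regex tokenizer and the names `tokenize` and `pronoun_rel_freq` are assumptions for illustration. Plain token matching also counts possessive and reflexive uses of me/je/hun, so it only approximates the count.

```python
import re

# Pronouns with both a strong and a weak form (from the pronoun table).
STRONG = {"mij", "jij", "jou", "zij", "wij", "hen", "hun"}
WEAK = {"me", "je", "ze", "we"}


def tokenize(text):
    """Very simple tokenizer (an assumption): lowercase runs of letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def pronoun_rel_freq(text):
    """Percentage of tokens that are a strong or weak pronoun form."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in STRONG or tok in WEAK)
    return 100.0 * hits / len(tokens)


# Two example sentences quoted elsewhere in this talk.
sample = "Dat weet je toch? Maar dan kennen ze mij niet."
print(pronoun_rel_freq(sample))
```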
Correlation: strong pronouns vs literary rating (N=401)
[Figure: scatter plot of proportion of strong pronouns (y-axis, 0-60) against literary rating (x-axis, 1-7); r=0.38; p=3.8e-15. Labeled outliers: Springer, Quadriga; Mitchell, NietVerhoordeGebeden; Kooten, Verrekijker; Dewulf, KleineDagen; Japin, Vaslav; Bernlef, ZijnDood; Verhulst, LaatsteLiefdeVan; Siebelink, Oscar; Abdolah, Koning]
- Count strong pronouns, divide by count of both forms
- Independent of total number of pronouns
- On average, 85 % of pronouns are weak
- More strong pronouns, more literary
- Several strong outliers!
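The proportion measure can be sketched like this, assuming a pre-tokenized text. The name `proportion_strong` is hypothetical. Token matching cannot separate singular from plural zij/ze, which the figures in this talk do distinguish.

```python
from collections import Counter

# Pronouns with both a strong and a weak form (from the pronoun table).
STRONG = ("mij", "jij", "jou", "zij", "wij", "hen", "hun")
WEAK = ("me", "je", "ze", "we")


def proportion_strong(tokens):
    """Percentage of strong forms among all strong + weak forms.

    Returns None when the text contains no pronoun of either kind.
    """
    counts = Counter(tok.lower() for tok in tokens)
    strong = sum(counts[form] for form in STRONG)
    weak = sum(counts[form] for form in WEAK)
    total = strong + weak
    return None if total == 0 else 100.0 * strong / total


# Example sentence quoted elsewhere in this talk, punctuation removed.
tokens = ("Ik heb nooit kunnen vaststellen dat ze mij in de gaten hielden "
          "al deden ze dat natuurlijk wel en zij in de eerste plaats").split()
print(proportion_strong(tokens))
```

Because the denominator is the number of pronouns rather than the number of words, repeating the text or padding it with pronoun-free narrative leaves the value unchanged.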
Distribution of pronouns in outliers
[Figure: heatmap with one row per outlier novel (Springer, Quadriga; Mitchell, NietVerhoordeGebeden; Kooten, Verrekijker; Dewulf, KleineDagen; Japin, Vaslav; Bernlef, ZijnDood; Verhulst, LaatsteLiefdeVan; Siebelink, Oscar; Abdolah, Koning) and one column per form (mij, me, jij, jou, je, zij-SG, ze-SG, wij, we, zij-PL, hen, hun, ze-PL)]
Table: Divergence of relative frequencies (wrt corpus mean)
- All Dutch authors (except Mitchell), highly literary (rating > 5)
- Less je, ze-SG,FEM. More mij, wij, zij-SG,FEM
Possible explanations
Why are strong pronouns more common in literary texts?
Stylistic choice (deliberate or not):
- Non-literary texts have more informal, idiomatic language
- Literary authors are less afraid of sounding “unnatural”

Discourse structure more complicated:
- Larger number of characters
- Multiple perspectives, storylines
give rise to higher frequency of less salient referents and use of contrast.
Manual Analysis
- In first 100 sentences of the outliers
- Annotate each pronoun:
  - Strong vs weak
  - Free choice, preferred, or obligatory
  - Used for emphasis/contrast?
  - Type: personal, possessive, generic, non-personal, verb

Limitation: annotation was done without looking at discourse context.

Distribution of types:
Personal       303
Possessive      26
Generic         20
Non-personal     6
Verb             1
Total          356

We’ll only consider personal pronouns, which allow both forms.
Breakdown (N=303 pronouns)
[Figure: bar charts of count (y-axis, 0-80) by form (strong, weak) in two panels (emphasis = False, emphasis = True), with bars split by status (free, preferred, obligatory)]
- Emphasis is rare.
- Weak often preferred, but large part is free choice.
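One way to represent the annotations and tabulate such a breakdown is sketched here. The record layout (`PronounAnnotation`) is an assumption, not the annotation format actually used. The six toy records follow the typical examples given in this talk, and their `emphasis` values are placeholders.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PronounAnnotation:
    """One annotated personal pronoun (hypothetical record layout)."""
    token: str
    form: str       # "strong" or "weak"
    status: str     # "free", "preferred", or "obligatory"
    emphasis: bool  # used for emphasis/contrast?


# Toy records; emphasis=False is a placeholder, not an annotated value.
annotations = [
    PronounAnnotation("we", "weak", "free", False),
    PronounAnnotation("je", "weak", "preferred", False),
    PronounAnnotation("me", "weak", "obligatory", False),
    PronounAnnotation("mij", "strong", "free", False),
    PronounAnnotation("mij", "strong", "preferred", False),
    PronounAnnotation("mij", "strong", "obligatory", False),
]

# Cross-tabulate emphasis x form x status, mirroring the panels of the chart.
breakdown = Counter((a.emphasis, a.form, a.status) for a in annotations)

for (emphasis, form, status), count in sorted(breakdown.items()):
    print(f"emphasis={emphasis} form={form} status={status}: {count}")
```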
Typical examples
Weak:
(1) a. Free: We speuren erfgenamen op
       We track down heirs
    b. Preferred: Dat weet je toch?
       You know that, right?
    c. Obligatory: Mooie gouvernante is me dat.
       Nice governess that is.

Strong:
(2) a. Free: Hoort u mij?
       Do you hear me?
    b. Preferred: Maar dan kennen ze mij niet.
       But then they haven’t met me.
    c. Obligatory: Je ziet dat het niet van mij is!
       You can tell it’s not mine!
Interesting examples
Arguably unnatural usage of strong pronoun:
(3) a. Ik keek om mij heen
       I looked around me
    b. aangezien [...] heb ik altijd mijn eigen Duralexglas bij mij
       since [...] I always have my own Duralex glass with me

Weak vs strong pronouns pick different referents:
(4) Ik heb nooit kunnen vaststellen dat ze mij in de gaten hielden, al deden ze dat natuurlijk wel, en zij in de eerste plaats.
    I have never been able to confirm that they were watching me, although of course they did, and she most of all.
Conclusion
Answers to Research Questions:
1. Is there an association between strong/weak pronouns and literariness?
   - Negative correlation between pronouns and literariness
   - Positive correlation between strong pronouns and literariness
   - Striking outliers: Dutch literary authors who use lots of strong pronouns
2. How common is it for these variants to be free choice, preferred, or obligatory?
   - Majority is free, stylistic choice.
   - Even in the outliers, weak pronouns are often preferred
   - Obligatoriness and emphasis are rare.
3. Why could there be such an association?
   - Seems to be predominantly a stylistic choice
   - Should look into discourse context
Future Work
- Analyze antecedents: distance, grammatical function, competing referents
- Compare with non-literary corpora (news etc.)
- See whether weak/strong distinction is useful as a feature for automatic anaphora resolution
- Analyze dialogue for each character separately from narrative text
Properties of strong and weak pronouns

Strong: (emphatic)
- Default, unmarked
- Often stressed: emphasis/contrast
- Less salient referents
- Obligatory for:
  - Comparisons (ik ben rijker dan jij, *dan je)
  - Conjunctions of two pronouns (hij en zij)
  - Oblique arguments, e.g. rel. clause (voor hen die ...)
- Preferred:
  - Sounds unnatural when repeated
  - Preferred in writing, even when weak form used when same sentence is spoken (applies to ie, ’m, d’r, ’t, but also to a lesser extent to me/je/we/ze)

Weak: (unemphatic)
- Reduced, marked
- Always unstressed: no emphasis
- Salient referents
- Obligatory for:
  - Idioms (e.g., dank je, *dank jij)
  - Generic you (je weet maar nooit!)
  - 3rd pers. pl. non-personal (ze, *zij, *hen, *hun)
- Preferred:
  - Can be repeated; or sentence has one strong pronoun followed by several weak instances
- Less personal, more informal

Cf. Haeseryn et al. (1997). Algemene Nederlandse Spraakkunst;
Bresnan (1998). Markedness and morphosyntactic variation in pronominal systems
Distribution of grammatical functions
[Figure: heatmap of grammatical function (rows, func: su, obj1, obj2, se, cnj, hd) by pronoun form (columns: mij, me, jij, jou, je, zij-SG, ze-SG, wij, we, zij-PL, hen, hun, ze-PL); color scale 0-60. Percentages add up to 100 for each row.]

Non-empty cell values per row, left to right (empty cells omitted):
3.1 23.5 1.6 43.7 1.0 11.7 0.5 14.9
13.5 39.2 6.0 19.1 10.6 0.2 11.4
8.6 57.4 2.5 21.6 1.4 3.9 4.6
1.3 74.2 24.5
0.8 10.8 3.3 0.4 24.8 0.5 15.3 1.2 42.8
24.0 26.9 13.0 2.2 31.8 2.0 0.2
10.8 0.2 19.9 10.0 0.9 7.5 0.6 17.3 0.7 11.6 18.2 1.4 0.9