Optimality Theory and Pragmatics
Compact seminar at the University of Vienna, Summer Semester 2005
Manfred Krifka
Stochastic Optimality Theory, Learning Algorithms
Evolutionary Optimality Theory
Differential Case Marking: Objects
In many languages, case marking of subject and object depends on a variety of factors.
Hebrew: Only definite object NPs are case marked.
  Ha-seret her’a ‘et ha-milxama. ‘the-movie showed ACC the-war’
  Ha-seret her’a (*‘et) milxama. ‘the-movie showed (*ACC) war’
Spanish: Only animate object NPs are case marked.
  Busco a una señora. ‘I-look-for ACC a woman.’
  Busco (*a) una casa. ‘I-look-for (*ACC) a house.’
Bossong (1985): differential object marking, attested in more than 300 languages.
Explanation in Aissen (2002): Two scales that determine differential object marking:
• Animacy: Human > Animate > Inanimate
• Definiteness: Pers. Pronoun > Name > Def. NP > Indef. Spec. NP > Nonspec. NP
Generalization: Object marking is more likely at the high end of the scales.
A closer look: DOM in medieval Spanish
From Judith Aissen, Differential Object Marking: Iconicity vs. Economy. Draft, Stanford 2000.
Differential Object Marking in German: Nominative/Accusative Syncretism
German, too, shows differential object marking, determined by gender:
Masculine: Der Mann sieht den Hasen. / Der Hase sieht den Mann. (NOM ≠ ACC)
Feminine: Die Frau sieht den Hasen. / Der Hase sieht die Frau. (NOM = ACC, syncretism)
Neuter: Das Kind sieht den Hasen. / Der Hase sieht das Kind. (NOM = ACC, syncretism)
The syncretism in the neuter is inherited (general in the Indo-European languages); in the feminine it developed in Middle High German / Early New High German.
Syncretism within one inflection class of nouns (n-stems), according to animacy:
  der Mensch / den Mensch-en, der Bote / den Bot-en, der Hase / den Has-en ...
  der Regen / den Regen, der Kragen / den Kragen, der Besen / den Besen ...
Fish as inanimate: der Karpfen / den Karpfen, der Rochen / den Rochen
Cases categorizable either way: der Same(n), der Wille(n), der Friede(n), ...
Doublets: der Drache / der Drachen; der Rappe / der Rappen; der Lump / der Lumpen
Differential Object Marking in German: Nominative/Accusative Syncretism
Animacy as a factor in case syncretism in general: masculine nouns are more likely to be animate than feminine ones.
Example: corpus of Ruoff (1981), 500,000 words, spoken everyday narratives from the Swabian region
Frequency of animacy, nouns with > 8 occurrences (> 0.01%)
[Bar chart: counts (0–300) of animate vs. inanimate nouns for masculine, feminine, neuter, and pluralia tantum.]
[Pie chart: gender of the 100 most frequent nouns denoting animates: masculine 69%, feminine 16%, neuter 9%, pluralia tantum 6%.]
Differential Object Marking in German: Nominative/Accusative Syncretism
Animacy as a systematic factor: nominal derivation
• Masculine derivations are often animate: Lehr-er, Lehr-ling, Praktik-ant, Psycho-loge
• Feminine derivations are not animate: Frei-heit, Freund-schaft, Kleid-ung, Diskuss-ion, Sing-erei; exception: Movierung (female-deriving suffixation), Präsident-in.
• Neuter derivations are likewise often not animate.
Hence: case syncretism in German, too, shows an affinity to the general regularities of differential object marking.
Differential Case Marking: Subjects
Differential subject marking (“Split Ergativity”): Example: Dyirbal, Australia.
1st and 2nd person pronouns: No marking of subject NP
  ɲana banaga-nyu. ‘we returned.’
  ɲana ɲurra-na bura-n. ‘we you-ACC saw.’
  ɲurra banaga-nyu. ‘you returned.’
  ɲurra ɲana-na bura-n. ‘you us-ACC saw.’
Other pronouns and NPs: Ergative marking of the subject of a transitive sentence:
  ɲuma banaga-nyu. ‘Father returned.’
  ɲuma-ɲgu yabu bura-n. ‘Father-ERG mother saw.’
Mixed system:
  ɲuma-ɲgu ɲurra-na bura-n. ‘Father-ERG you-ACC saw.’
In hundreds of languages (Basque, Georgian, Hindi, ...), the distribution of subject marking is governed by similar scales (Silverstein 1976):
• Animacy: Human > Animate > Inanimate
• Definiteness: Pers.Pronoun > Name > Def.NP > Indef.Spec.NP > Nonspec. NP
• Generalization: Subject marking more likely at the low end of the scales.
Differential Case Marking: Scale Alignment
Aissen (2002): Case marking patterns as the result of the alignment of two scales, here illustrated with the definiteness scale.
Harmonic alignment: case marking unlikely. Disharmonic alignment: case marking likely.
Alignment of two scales produces the following markedness scales:• Subj/pronoun > Subj/name > Subj/def > Subj/spec > Subj/nonspec• Obj/nonspec > Obj/spec > Obj/def > Obj/name > Obj/pronoun
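The alignment operation itself is mechanical; here is a small sketch (illustrative function and scale names, not Aissen's formal definition): the higher element of the binary scale is combined with the prominence scale in its given order, the lower element with the reversed order.

```python
# Harmonic alignment of a binary relational scale X > Y with a prominence
# scale a > b > ...: X is aligned with the scale as given, Y with the
# reversed scale, yielding the two markedness subhierarchies.
def harmonic_alignment(binary, prominence):
    high, low = binary
    return (
        [f"{high}/{p}" for p in prominence],           # harmonic order
        [f"{low}/{p}" for p in reversed(prominence)],  # reversed order
    )

definiteness = ["pronoun", "name", "def", "spec", "nonspec"]
subj_scale, obj_scale = harmonic_alignment(("Subj", "Obj"), definiteness)
# subj_scale: Subj/pronoun > Subj/name > Subj/def > Subj/spec > Subj/nonspec
# obj_scale:  Obj/nonspec > Obj/spec > Obj/def > Obj/name > Obj/pronoun
```

Reading each markedness scale top-down and prefixing each element with "*" yields the constraint subhierarchies used below.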
[Diagram: Subject and Object rows aligned over the scale pronoun > name > definite NP > spec. indef. NP > nonspec. NP]
Scale Alignment and OT Constraints
Expression of marking tendencies, Hebrew:
Relevant parts of the basic hierarchies: Subj > Obj, +def > –def
Aligned hierarchies: Subj/+def > Subj/–def (harmonic > disharmonic)
                     Obj/–def > Obj/+def (only this one relevant here)
Corresponding constraint ranking: *Obj/+def >> *Obj/–def
  “Not marking definite objects is worse than not marking indefinite objects”
  Better interpretation: “Case marking of definite objects is more important than case marking of indefinite objects”
Markedness constraint *STRUC: Avoid structure (explicit marking); speaker economy (not strictly necessary for the Hebrew case, but relevant later)
Constraint ranking: *Obj/+def >> *STRUC >> *Obj/–def
  ‘the movie showed the war / war’     *Obj/+def   *STRUC   *Obj/–def
☞ Ha-seret her’a ‘et ha-milxama                      *
  Ha-seret her’a ha-milxama              *
  Ha-seret her’a ‘et milxama                         *
☞ Ha-seret her’a milxama                                        *
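The tableau can be checked mechanically with a standard strict-domination evaluation (a minimal sketch; constraint names and violation counts are taken from the tableau above, and the glosses are given in plain ASCII):

```python
# Strict-domination OT evaluation: candidates are filtered constraint by
# constraint in ranking order; whoever survives the last filter is optimal.
def evaluate(candidates, violations, ranking):
    pool = list(candidates)
    for c in ranking:
        best = min(violations[x][c] for x in pool)
        pool = [x for x in pool if violations[x][c] == best]
    return pool

ranking = ["*Obj/+def", "*STRUC", "*Obj/-def"]

# Candidates for the definite object 'the war':
definite = {
    "'et ha-milxama": {"*Obj/+def": 0, "*STRUC": 1, "*Obj/-def": 0},
    "ha-milxama":     {"*Obj/+def": 1, "*STRUC": 0, "*Obj/-def": 0},
}
# Candidates for the indefinite object 'war':
indefinite = {
    "'et milxama": {"*Obj/+def": 0, "*STRUC": 1, "*Obj/-def": 0},
    "milxama":     {"*Obj/+def": 0, "*STRUC": 0, "*Obj/-def": 1},
}

print(evaluate(definite, definite, ranking))      # the marked form wins
print(evaluate(indefinite, indefinite, ranking))  # the unmarked form wins
```

Passing a dict for `candidates` works because iterating a dict yields its keys; the same `evaluate` function is reusable for the Dyirbal tableau below.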
Derivation of Dyirbal System
The facts, again:
1st and 2nd person pronouns: No marking of subject NP
  ɲana banaga-nyu. ‘we returned.’
  ɲana ɲurra-na bura-n. ‘we you-ACC saw.’
  ɲurra banaga-nyu. ‘you returned.’
  ɲurra ɲana-na bura-n. ‘you us-ACC saw.’
Other pronouns and NPs: Ergative marking of the subject of a transitive sentence:
  ɲuma banaga-nyu. ‘Father returned.’
  ɲuma-ɲgu yabu bura-n. ‘Father-ERG mother saw.’
Mixed marking:
  ɲuma-ɲgu ɲurra-na bura-n. ‘Father-ERG you-ACC saw.’
No marking:
  ɲana ɲuma bura-n. ‘we Father saw.’
OT Constraints, Case Marking in a Dyirbal-like Language
Basic hierarchies, universal: S(ubj) > O(bj); 1(st) > 3(rd)
Aligned hierarchies: S/1 > S/3; O/3 > O/1
Generated constraint orders: *S/3 >> *S/1; *O/1 >> *O/3
  (“marking of S/3 is more important than marking of S/1”)
Combined constraints: {*S/3, *O/1} >> *STRUC >> {*S/1, *O/3}
  Subj      Obj       *S/3   *O/1   *STRUC   *S/1   *O/3
☞ 1st-Ø     3rd-Ø                             *      *
  1st-ERG   3rd-Ø                    *               *
  1st-Ø     3rd-ACC                  *        *
  1st-ERG   3rd-ACC                  **
  3rd-Ø     3rd-Ø      *                             *
☞ 3rd-ERG   3rd-Ø                    *               *
  3rd-Ø     3rd-ACC    *             *
  3rd-ERG   3rd-ACC                  **
  3rd-Ø     1st-Ø      *      *
  3rd-ERG   1st-Ø             *      *
  3rd-Ø     1st-ACC    *             *
☞ 3rd-ERG   1st-ACC                  **
Where do the hierarchies come from?
Aissen simply assumes hierarchies like S > O, 1 > 3, def > indef as given.
Bresnan, Dingare & Manning (2001), Zeevat & Jäger (2002): The hierarchies can be explained by typical patterns of language use.
Example: Subjects and objects in 3151 simple transitive clauses of Swedish everyday conversation (SAMTAL corpus, Ö. Dahl)
        total   +def   –def   +pron   –pron   +anim   –anim
Subj    3151    3098     53    2984     167    2984     203
Obj     3151    1830   1321    1512    1639     317    2834
Biases in the SAMTAL Corpus
Probabilities that subjects and objects have certain properties, SAMTAL corpus of spoken Swedish (collected by Ö. Dahl, analyzed by Zeevat & Jäger).
Resulting statistical biases, expressed as conditional probabilities, e.g. p(Subj | +def): the probability that a +def NP is a subject: 63%
  p(Subj | +def) = 63%    p(Subj | –def) = 4%
  p(Obj | +def) = 37%     p(Obj | –def) = 96%
  p(Subj | +pron) = 66%   p(Subj | –pron) = 9%
  p(Obj | +pron) = 33%    p(Obj | –pron) = 91%
  p(Subj | +anim) = 90%   p(Subj | –anim) = 7%
  p(Obj | +anim) = 10%    p(Obj | –anim) = 93%
This holds for a fairly large and representative corpus of spoken Swedish; the findings can be reproduced in their tendencies for other languages and communities, but collecting further data is absolutely necessary.
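The conditional probabilities follow directly from the corpus counts; a quick sketch to re-derive them (counts copied from the table above):

```python
# SAMTAL counts of subject and object NPs by feature (see table above)
counts = {
    "Subj": {"+def": 3098, "-def": 53, "+pron": 2984, "-pron": 167,
             "+anim": 2984, "-anim": 203},
    "Obj":  {"+def": 1830, "-def": 1321, "+pron": 1512, "-pron": 1639,
             "+anim": 317, "-anim": 2834},
}

def p_subj(feature):
    """p(Subj | feature): share of NPs with this feature that are subjects."""
    s, o = counts["Subj"][feature], counts["Obj"][feature]
    return s / (s + o)

for f in ["+def", "-def", "+pron", "-pron", "+anim", "-anim"]:
    print(f"p(Subj | {f}) = {p_subj(f):.0%}")
# prints p(Subj | +def) = 63%, p(Subj | -def) = 4%, p(Subj | +anim) = 90%, ...
```

The complementary values, e.g. p(Obj | +def) = 37%, are simply 1 minus these.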
Statistical Bias and Bidirectional OT
Zeevat & Jäger (2002), Jäger (2003): Economical encoding:
• Case marking is disfavored for frequent combinations, e.g. indefinite objects: p(Obj | –def) = 96%
• but case marking is favored for infrequent combinations, e.g. indefinite subjects: p(Subj | –def) = 4%; definite objects: p(Obj | +def) = 37%
A case for weak bidirectional optimization?
• Preference for simple forms: –case >> +case
• Preference for meanings that correspond to the bias: Obj/–def >> Obj/+def
[Diagram: the four form–meaning pairs
  ⟨–case, Obj/–def⟩   ⟨–case, Obj/+def⟩
  ⟨+case, Obj/–def⟩   ⟨+case, Obj/+def⟩
Optimal pairs: ⟨–case, Obj/–def⟩ and ⟨+case, Obj/+def⟩, the case marking pattern of Hebrew.]
Problem: There is no choice of interpreting a given NP as +def or –def; this is explicitly marked!
Statistical Bias and Bidirectional OT
Zeevat & Jäger assume the following constraints:
• *STRUC: Avoid structure, i.e. avoid overt marking
• FAITH: Faithful interpretation of case morphemes, e.g. ACC: Obj, ERG: Subj
• BIAS: An NP of a certain category is interpreted as having the grammatical function that is most probable for this category, e.g. Obj: inanimate
Ranking: FAITH >> BIAS >> *STRUC
Hearer optimality and speaker optimality (asymmetric Bi-OT):
• Hearer optimality: For a given form, choose the meaning that shows the least severe constraint violation! In the case at hand, interpret an NP according to its case marking pattern; if there is no case marking, follow the statistical bias (I-implicature).
• If two competing forms are both hearer-optimal for a given meaning, the speaker can choose the preferred one (here: the one without case marking).
Hearers have to be served first, as speakers want to be understood.
Definition:
• A pair ⟨F, M⟩ ∈ GEN is hearer-optimal iff there is no alternative ⟨F, M′⟩ ∈ GEN such that ⟨F, M′⟩ > ⟨F, M⟩.
• A pair ⟨F, M⟩ ∈ GEN is optimal iff it is hearer-optimal and there is no alternative form F′ with ⟨F′, M⟩ ∈ GEN such that ⟨F′, M⟩ is hearer-optimal and ⟨F′, M⟩ > ⟨F, M⟩.
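The definition can be rendered as a small computation. The following sketch uses the animacy scenario of the next slide (FAITH >> BIAS >> *STRUC), with illustrative encodings ("zero" for Ø marking); it is not Zeevat & Jäger's implementation. Violation profiles are tuples ordered by the ranking, so lexicographic tuple comparison plays the role of "less severe constraint violation".

```python
# Forms pair a marking with the NP's animacy; meanings are grammatical
# functions. Only forms of the same animacy compete for a meaning.
FORMS = [(m, a) for m in ("ERG", "zero", "ACC") for a in ("anim", "inanim")]
GEN = [(f, fn) for f in FORMS for fn in ("Subj", "Obj")]

def cost(form, fn):
    marking, anim = form
    # FAITH: ERG must be read as Subj, ACC as Obj
    faith = int((marking == "ERG" and fn == "Obj") or
                (marking == "ACC" and fn == "Subj"))
    # BIAS: animates are expected to be subjects, inanimates to be objects
    bias = int((anim == "anim") == (fn == "Obj"))
    struc = int(marking != "zero")          # *STRUC: avoid overt marking
    return (faith, bias, struc)             # ordered FAITH >> BIAS >> *STRUC

def hearer_optimal():
    """Pairs whose meaning is the best interpretation of their form."""
    return {(f, fn) for f, fn in GEN
            if all(cost(f, fn) <= cost(f, fn2) for fn2 in ("Subj", "Obj"))}

def optimal():
    """Hearer-optimal pairs with no cheaper hearer-optimal rival form
    (of the same animacy) for the same meaning."""
    ho = hearer_optimal()
    return {(f, fn) for f, fn in ho
            if not any((f2, fn) in ho and f2[1] == f[1]
                       and cost(f2, fn) < cost(f, fn)
                       for f2 in FORMS)}
```

Running `optimal()` yields exactly the split pattern of the tableau below: zero-marked animate subjects, ACC-marked animate objects, ERG-marked inanimate subjects, and zero-marked inanimate objects.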
Example: Animacy in a language with ERG and ACC
  Form          Meaning   FAITH   BIAS   *STRUC
  anim-ERG      Subj                      *        hearer-optimal
                Obj        *       *      *
☞ anim-Ø        Subj                               hearer-optimal, optimal
                Obj                *
  anim-ACC      Subj       *              *
☞               Obj                *      *        hearer-optimal, optimal
☞ inanim-ERG    Subj               *      *        hearer-optimal, optimal
                Obj        *              *
  inanim-Ø      Subj               *
☞               Obj                                hearer-optimal, optimal
  inanim-ACC    Subj       *       *      *
                Obj                       *        hearer-optimal
From Pragmatics to Grammar?
One caveat: The OT tableaus typically abstract away from important factors, e.g. word order, plausibility, selectional restrictions.
  The lightning killed the man.
Even though the man is animate and in object position, it wouldn’t need case marking, as only animates can be killed.
A second caveat: Case marking is typically part of the core grammar, and not derived by pragmatic rules.
But: pragmatic tendencies are one source of core grammar (functionalist view of grammar).
Motivation for Stochastic Optimality Theory
Judith Aissen (2000) and Joan Bresnan (2002): There is not just a universal tendency towards differential case marking in the core grammars of languages; optional case marking within a language can also be described.
Example: Case marking by postpositions in colloquial Japanese (data: Fry 2001, Ellipsis and wa-marking in Japanese conversation):
  Subj/anim: 60%   Subj/inanim: 70%
  Obj/anim: 54%    Obj/inanim: 47%
Obligatory case marking patterns can be seen as extreme cases of statistical marking patterns, e.g. Spanish:
  Obj/anim: 100%   Obj/inanim: 0%
Stochastic Optimality Theory (StOT), Boersma (1998), Functional Phonology, originally developed for phonological phenomena, can be used to model this intuition: core grammar phenomena are not essentially different from usage-based statistical tendencies in phenomena that core grammar leaves, to a certain degree, optional.
Stochastic Optimality Theory (StOT)
Main differences between standard OT and Stochastic OT:
• Constraint ranking on a continuous scale: Every constraint is assigned a real number which determines the ranking of the constraints and is a measure of the distance between them.
• Stochastic evaluation: For each evaluation, the placement of a constraint is modified by adding a noise value with normal distribution. The ordering of the constraints after adding this noise value determines the actual evaluation of the set of candidates.
Constraints C1, C2 overlap: mostly C1 >> C2, sometimes C2 >> C1.
Constraints C1, C2 do not overlap: C1 >> C2 (almost) all the time.
Stochastic OT: Ordering Probabilities
Difference between mean values > 10: C1 dominates C2 categorically, p(C2 > C1) < 10⁻¹⁰
Difference between mean values ≈ 5: preference for C1 >> C2, but C2 >> C1 also leads to grammatical results, p(C2 > C1) ≈ 10%
Difference between mean values = 0: no ranking preference, p(C2 > C1) = p(C1 > C2) = 50%, random outcomes.
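These ordering probabilities can be checked by simulation (a sketch; the noise standard deviation of 2.0 follows Boersma's usual evaluation noise, and the exact percentages depend on that assumption):

```python
import random

random.seed(0)

def p_reversal(mean_diff, noise_sd=2.0, trials=200_000):
    """Estimate p(C2 > C1) when C1's mean ranking value lies mean_diff
    above C2's and each constraint independently receives normally
    distributed noise at evaluation time."""
    hits = sum(
        random.gauss(0.0, noise_sd) > mean_diff + random.gauss(0.0, noise_sd)
        for _ in range(trials)
    )
    return hits / trials

print(p_reversal(0.0))   # close to 0.5: equal means, random outcomes
print(p_reversal(5.0))   # rare but possible reversals
print(p_reversal(10.0))  # reversals practically absent
```

Analytically, the difference of two independent Gaussians has standard deviation noise_sd·√2, so p(C2 > C1) = Φ(−d / (noise_sd·√2)) for mean difference d.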
Stochastic OT and Gradual Learning
Boersma (1998), Boersma & Hayes (2001, Linguistic Inquiry): Gradual Learning Algorithm (GLA) for learning constraint rankings (not for learning the set of possible candidates, GEN)
• In phonology, GEN contains pairs of phonological forms and phonetic interpretations: ⟨/input/, [output]⟩
• In semantics/pragmatics, GEN contains pairs of syntactic/morphological forms and semantic/pragmatic interpretations: ⟨F, M⟩
Boersma’s Gradual Learning Algorithm (GLA)
0. Initial state: All constraint values are set to 0.
1. Learning datum: an input–output pair ⟨i, o⟩.
2. Generation:
   a. For each constraint, a noise value, with probability following a normal distribution, is added to its current ranking value: this is the selection point of the constraint.
   b. Constraints are ranked by order of their selection points.
   c. The grammar generates an output o′ for the input i; alternative pair: ⟨i, o′⟩.
3. Comparison: If o′ = o, nothing happens. Otherwise, the algorithm compares the constraint violations of the learning datum ⟨i, o⟩ with those of the generated datum ⟨i, o′⟩.
4. Adjustment:
   a. All constraints that favor the learning datum ⟨i, o⟩ over the self-generated ⟨i, o′⟩ are increased by a small predefined numerical amount (“plasticity”).
   b. All constraints that favor the self-generated ⟨i, o′⟩ over the learning datum ⟨i, o⟩ are decreased by the plasticity value.
5. Final state: Steps 1–4 are repeated until the constraint values stabilize. Plasticity may change over a lifetime from high to low.
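Steps 0–5 can be sketched as a toy simulation (two constraints with illustrative names, one input whose learning data are always case-marked; not Boersma's implementation):

```python
import random

random.seed(1)

def sample_order(values, noise_sd=2.0):
    """Steps 2a-b: add normally distributed noise to each ranking value
    (the selection points) and rank constraints accordingly."""
    points = {c: v + random.gauss(0.0, noise_sd) for c, v in values.items()}
    return sorted(points, key=points.get, reverse=True)

def generate(candidates, violations, order):
    """Step 2c: standard strict-domination evaluation under the order."""
    pool = list(candidates)
    for c in order:
        best = min(violations[x][c] for x in pool)
        pool = [x for x in pool if violations[x][c] == best]
    return pool[0]

def gla_step(values, datum, candidates, violations, plasticity=0.1):
    """Steps 2-4 for one learning datum."""
    produced = generate(candidates, violations, sample_order(values))
    if produced == datum:
        return                                  # step 3: match, no change
    for c in values:                            # step 4: adjustment
        if violations[datum][c] < violations[produced][c]:
            values[c] += plasticity             # favors the learning datum
        elif violations[datum][c] > violations[produced][c]:
            values[c] -= plasticity             # favors the self-generated form

# Toy setup: definite objects are always case marked in the learning data.
violations = {
    "marked":   {"m:Obj/+def": 0, "*STRUC": 1},
    "unmarked": {"m:Obj/+def": 1, "*STRUC": 0},
}
values = {"m:Obj/+def": 0.0, "*STRUC": 0.0}     # step 0: initial state
for _ in range(2000):                           # step 5: iterate
    gla_step(values, "marked", list(violations), violations)
# After training, m:Obj/+def ranks clearly above *STRUC.
```

Each mismatch promotes m:Obj/+def and demotes *STRUC, so their distance grows until mismatches become rare, which is the stabilization of step 5.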
Bidirectional Gradual Learning Algorithm (BiGLA)
Jäger (2003): ‘The bidirectional gradual learning algorithm’
• Speaker-based learning: input: meaning, output: form; ⟨i, o⟩ = ⟨M, F⟩. The speaker compares different forms.
• Hearer-based learning: input: form, output: likely meaning; ⟨i, o⟩ = ⟨F, M⟩. The hearer compares different meanings.
The hearer also uses speaker-based reasoning: On hearing F with likely meaning M, the hearer checks: Would I have used a different F′ to express M? If yes: adjust the rankings to increase the likelihood of using F to express M.
[Diagram: observed form with its observed likely meaning, contrasted with a hypothesized form and a hypothesized meaning.]
Modelling Pragmatics
The Bidirectional Gradual Learning Algorithm (BiGLA) can be tested experimentally.
Implementation: evolOT, downloadable with files at: http://uni-potsdam.de/~jaeger/nasslli03
Example: Differential object marking triggered by definiteness (e.g., Hebrew); input: statistical distributions of the SAMTAL corpus.
Development of Differential Object Marking
[Plot: ranking differences between the constraints across generations, for ‘mark definite objects!’ (m:Obj/+def), *STRUC, and ‘mark indefinite objects!’ (m:Obj/–def).
Starting state: the constraints start out equally ranked.
After 1000 generations, the ranking of the constraints is firmly established, including the previously observed m:Obj/+def >> *STRUC >> m:Obj/–def.]
Development of Split Ergativity (Animacy)
[Plot: ranking development of ‘mark animate objects!’, ‘mark inanimate subjects!’, ‘don’t mark animate subjects!’, ‘don’t mark inanimate objects!’, and *STRUC.
Starting state with a high value of FAITH: every NP is case marked.
With a lower value of FAITH: fewer NPs are case marked.]
Development of Split Ergativity: Initial State doesn’t matter
[Two plots with different initial states: both converge on the same final ranking of ‘mark animate objects!’, ‘mark inanimate subjects!’, ‘don’t mark animate subjects!’, ‘don’t mark inanimate objects!’, and *STRUC.]
Development of Split Ergativity: Initial State doesn’t matter
[Two further plots with yet other initial states, again converging on the same final ranking.]
Learning under the Microscope: Speaker Mode
Incoming datum: Subj.anim-Ø (non-marked animate subject).
In speaker mode, the algorithm produces one of the forms:
a. Subj.anim-Ø (= learning datum, nothing happens)
b. Subj.anim-ERG (satisfying FAITH)
Comparison with the learning datum:
b. *STRUC favors the datum and is promoted; m:S/+a disfavors the datum and is demoted. Ultimately, *STRUC will rank higher than m:S/+a, suppressing marking of animate subjects.
In general: If a form is produced that differs from the datum and is
– a non-marked NP: promotion of *STRUC and/or demotion of the marking constraint (see example)
– a case-marked NP: demotion of *STRUC, promotion of FAITH if the case marking is different.
Assume the current constraint ranking includes the following relative ranking, where m:S/+a = ‘mark animate subjects’ and *STRUC = ‘avoid marking’:
[Diagram: positions of m:S/+a and *STRUC on the ranking scale before and after the adjustment.]
Learning under the Microscope: Hearer Mode
Incoming datum: Subj.anim-Ø (non-marked animate NP interpreted as subject).
In hearer mode, the algorithm produces one of the interpretations (as subject or object):
a. Subj.anim-Ø (= learning datum, nothing happens)
b. Obj.anim-Ø
Comparison with the learning datum:
b. m:S/+a favors the datum and is promoted; m:O/+a disfavors the datum and is demoted.
In general: If a meaning is produced that differs from the datum and the NP is
– a case-marked NP: promotion of FAITH
– a non-marked NP: promotion and/or demotion of marking constraints (see example)
Assume the current constraint ranking includes the following relative ranking, where m:S/+a = ‘mark animate subjects’ and m:O/+a = ‘mark animate objects’:
[Diagram: positions of m:S/+a and m:O/+a on the ranking scale before and after the adjustment.]