Uncertainty reduction as a predictor of reading difficulty in Chinese relative clauses
Zhong Chen (Cornell University, USA), Lena Jäger (Universität Potsdam, Germany), John Hale (Cornell University, USA)
IsCLL-13, Taipei, June 2012


Page 1

Uncertainty reduction as a predictor of reading difficulty in Chinese relative clauses

Zhong Chen, Lena Jäger, John Hale

Cornell University, USA
Universität Potsdam, Germany
Cornell University, USA

IsCLL-13, Taipei
June 2012

Page 2

Plan

1. Entropy Reduction: a complexity metric for human sentence comprehension

2. Processing Chinese relative clauses: previous proposals vs. new results; modeling reading-time data

Page 3

Incremental processing

Page 4

Incremental processing

the ...

Page 5

Incremental processing

the sailor ...

Page 6

Guess the rest of the sentence

the sailor who ...

Page 7

the sailor who told ...

Guess the rest of the sentence (again)

Page 8

Reducing entropy

the sailor who          entropy: 9.5 bits
the sailor who told     entropy: 7.3 bits

info gain = 2.2 bits

Page 9

Incremental complexity metric

the sailor who tell -ed the story be -ed interesting

[Slide figure: "ER at this prefix" annotated at successive prefix positions 1-4 of the sentence above.]

Page 10

Probabilistic grammar

1. formal grammar, e.g. Minimalist Grammars (Stabler 1997)

2. construction frequencies, e.g. relative clauses in the Brown corpus:
   1430 subject RC
   929 direct object
   167 indirect object
   41 oblique
   34 genitive subject
   4 genitive object

[Figure: the X-bar derivation tree assigned by the Minimalist Grammar to "the boy who the sailor tell -ed the story to be -ed interesting", with traces of the moved DPs and NPs.]

Page 11

Entropy

H(X) = −∑_{x∈X} p(x) log₂ p(x)
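
For concreteness, a minimal Python sketch of this definition (ours, not part of the original talk); the toy distribution is made up purely for illustration, not taken from the grammar:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H(X) = -sum over x of p(x) * log2 p(x)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Toy distribution over four alternatives (illustrative values only).
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits
```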

Page 12

Entropy Reduction

An incremental comprehender maintains expectations about as-yet-unheard words. As new words come in, the probability of the grammatical alternatives fluctuates, given what one has already read. The idea of ER is that decreased uncertainty about the whole sentence, including probabilistic expectations, correlates with observed processing difficulty.[1] Such processing difficulty reflects the amount of information that a word supplies about the overall disambiguation task in which the reader is engaged.

The average uncertainty of specified alternatives can be quantified using the fundamental information-theoretic notion, entropy, as formulated below in definition (1). In a language-processing scenario, the random variable X in (1) might take values that are derivations on a probabilistic grammar G. We could further specialize X to reflect derivations proceeding from various categories, e.g. NP, VP, S, etc. Since rewriting grammars always have a start symbol, e.g. S, the expression H_G(S) reflects the average uncertainty of guessing any derivation that G generates.

H(X) = −∑_{x∈X} p(x) log₂ p(x)    (1)

This entropy notation extends naturally to express conditioning events. If w₁w₂…wᵢ is an initial substring of a sentence generated by G, the conditional entropy H_G(S | w₁w₂…wᵢ) will be the uncertainty about just those derivations that have w₁w₂…wᵢ as a prefix.[2] Abbreviating H_G(S | w₁w₂…wᵢ) as Hᵢ, the cognitive load ER(i) reflects the difference between the conditional entropies before and after wᵢ, a particular word in a particular position in a sentence.

ER(i) = Hᵢ₋₁ − Hᵢ,  when this difference is positive;
        0,          otherwise.                              (2)

Formula (2) defines the ER complexity metric. It says that cognitive work is predicted whenever uncertainty about the sentence's structure, as generated by the grammar G, goes down after reading in a new word.
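
As a small illustrative sketch (not from the original paper), formula (2) can be applied to a sequence of prefix entropies; with the slides' values H₃ = 9.5 and H₄ = 7.3, it yields about 2.2 bits of predicted work at the fourth word:

```python
def entropy_reduction(prefix_entropies):
    """ER(i) = max(H_{i-1} - H_i, 0) for each successive word position i."""
    return [max(prev - cur, 0.0)
            for prev, cur in zip(prefix_entropies, prefix_entropies[1:])]

# H_3 = 9.5 bits ("the sailor who"), H_4 = 7.3 bits ("the sailor who told").
print(entropy_reduction([9.5, 7.3]))  # ~2.2 bits of disambiguation work
```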

Intuitively, disambiguation occurs when uncertainty about the rest of the sentence decreases. In such a situation, readers' "beliefs" in the various syntactic alternatives take on a more concentrated probability distribution (Jurafsky, 1996). The disambiguation work spent on this change is exactly the entropy reduction. By contrast, when beliefs about syntactic alternatives become more diffuse, e.g. when many syntactic expectations are roughly equiprobable, no disambiguation work has been done and the parser has become more confused. The background assumption of ER is that human sentence comprehension makes progress towards a peaked, disambiguated parser state, and that the disambiguation effort expended along the way can be quantified by the reductions in structural uncertainty that words convey.

The ER proposal is not to be confused with another widely applied complexity metric, Surprisal (Hale, 2001; Levy, 2008), which is the conditional expectation of the log-ratio between forward probabilities of string prefixes before and after a word.

[1] ER is a generalization of Narayanan and Jurafsky's (2002) idea of "flipping the preferred interpretation". Here the flip only counts if the reader moves towards a less-confused state of mind.
[2] Conditional entropies can be calculated using standard techniques from computational linguistics such as chart parsing (Bar-Hillel, Perles & Shamir, 1964; Nederhof & Satta, 2008). See Chapter 13 of Jurafsky and Martin (2008) for more details.

Let w₁w₂…wᵢ be an initial substring of a sentence generated by the grammar.

Page 13

Expectations: H₃ = 9.5 bits

1. 0.16 “the sailor who tell -ed the story be -ed interesting”

2. 0.11 “the sailor who tell -ed the sailor be -ed interesting”

3. 0.10 “the sailor who the story tell -ed be -ed interesting”

4. 0.07 “the sailor who the sailor tell -ed be -ed interesting”

5. 0.01 “the sailor who tell -ed the sailor who tell -ed the story be -ed interesting”

6. 0.01 “the sailor who tell -ed the sailor who tell -ed the sailor be -ed interesting”

7. 0.009 “the sailor who the sailor who tell -ed the story tell -ed be -ed interesting”

8. 0.009 “the sailor who tell -ed the sailor who the story tell -ed be -ed interesting”

9. 0.008 “the sailor who tell -ed the sailor which tell -ed the story be -ed interesting”

10. 0.006 “the sailor who the sailor who tell -ed the sailor tell -ed be -ed interesting”...
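
A side calculation of ours, not on the slide: the ten alternatives listed above account for only part of H₃; the bulk of the 9.5 bits comes from the long tail of lower-probability parses that the grammar also generates.

```python
import math

# Conditional probabilities of the ten listed continuations.
top10 = [0.16, 0.11, 0.10, 0.07, 0.01, 0.01, 0.009, 0.009, 0.008, 0.006]

# Their contribution to H_3 = -sum p log2 p (the full sum runs over all parses).
print(round(-sum(p * math.log2(p) for p in top10), 2))  # ~1.73 bits of the 9.5
```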


Page 15

Expectations: H₄ = 7.3 bits

1. 0.29 “the sailor who tell -ed the story be -ed interesting”

2. 0.20 “the sailor who tell -ed the sailor be -ed interesting”

3. 0.02 “the sailor who tell -ed the sailor who tell -ed the story be -ed interesting”

4. 0.019 “the sailor who tell -ed the sailor who tell -ed the sailor be -ed interesting”

5. 0.017 “the sailor who tell -ed the sailor who the story tell -ed be -ed interesting”

6. 0.016 “the sailor who tell -ed the sailor which tell -ed the story be -ed interesting”

7. 0.012 “the sailor who tell -ed the sailor who the sailor tell -ed be -ed interesting”

8. 0.011 “the sailor who tell -ed the sailor which tell -ed the sailor be -ed interesting”

9. 0.010 “the sailor who tell -ed the sailor which the story tell -ed be -ed interesting”

10. 0.008 “the sailor who tell -ed the boy be -ed interesting”...

Page 16

Plan

1. Entropy Reduction: a complexity metric for human sentence comprehension

2. Processing Chinese relative clauses: previous proposals vs. new results; modeling reading-time data

Page 17

Subject relative preference

Subject relatives (SRs) are easier to process than object relatives (ORs).

Evidence

in Dutch, English, French, German, Japanese, Korean ...

using self-paced reading, eye-tracking, ERP, fMRI, PET ...

Page 18

RC processing principles

Table 1. Processing principles proposed for relative clauses

WORD ORDER
  Bever (1970); MacDonald & Christiansen (2002): The sequence of words in SRs is closer to the canonical word order than that in ORs.

ACCESSIBILITY HIERARCHY
  Keenan & Comrie (1977): A universal markedness hierarchy of grammatical relations ranks relativization from subject highest.

WORKING MEMORY
  Linear distance: Wanner & Maratsos (1978); Gibson (2000); Lewis & Vasishth (2005)
  Structural distance: O'Grady (1997); Hawkins (2004)
  ORs are harder than SRs because they impose a greater working memory burden.

STRUCTURAL FREQUENCY
  Tuning hypothesis: Mitchell, Cuetos, Corley & Brysbaert (1995); Jurafsky (1996): SRs occur more frequently than ORs and are therefore more expected and easier to process.
  Surprisal: Hale (2001); Levy (2008): ORs are more difficult because they require a low-probability rule.
  Entropy Reduction: Hale (2006): ORs are harder because they force the comprehender through more confusing intermediate states.

2.2 Conflicting Results in Chinese

In the past decade, the SR advantage demonstrated in English and other languages has held up, but not in every experiment. The processing-difficulty contrast between Chinese SRs and ORs, shown below in (3), is particularly interesting because previous studies have reported conflicting results.

(3) a. Subject Relative
    [ eᵢ 邀請 富豪 的 ]RC 官員ᵢ 打了 記者
      eᵢ invite tycoon DE officialᵢ hit reporter
    'The official who invited the tycoon hit the reporter.'

    b. Object Relative
    [ 富豪 邀請 eᵢ 的 ]RC 官員ᵢ 打了 記者
      tycoon invite eᵢ DE officialᵢ hit reporter
    'The official who the tycoon invited hit the reporter.'

In the above examples, the head noun "official" comes after the relative clause. As a result, the distance between the gap, indicated with a co-indexed empty category eᵢ, and the relativized head noun is longer in SRs than in ORs, in contrast to postnominal RCs in languages like English. This filler-gap distance is particularly relevant to working-memory theories based on linear distance (Table 1).

Page 19

Memory load

e.g. Wanner, Maratsos & Kaplan's HOLD hypothesis (1970s), Gibson's DLT (1990s), or Lewis & Vasishth's ACT-R model (2000s)

SRC: The reporter who t sent the photographer to the editor hoped for a good story
ORC: The reporter who the photographer sent t to the editor hoped for a good story

Memory load: 0 (SRC) vs. 2 (ORC)


Page 21

Chinese relative clauses

SRC: [ eᵢ 邀請 富豪 的 ] (官員ᵢ) 打了 記者
       eᵢ yaoqing fuhao de guanyuanᵢ da-le jizhe
       eᵢ invite tycoon DE officialᵢ hit reporter
     'The official who invited the tycoon hit the reporter.'

ORC: [ 富豪 邀請 eᵢ 的 ] (官員ᵢ) 打了 記者
       fuhao yaoqing eᵢ de guanyuanᵢ da-le jizhe
       tycoon invite eᵢ DE officialᵢ hit reporter
     'The official who the tycoon invited hit the reporter.'

Page 22

Conflicting results

SR preference: Lin & Bever (2006, 2007, 2011); Lin (2008); Wu (2009); Chen, Jäger, Li & Vasishth (under review)

OR preference: Hsiao & Gibson (2003); Hsu & Chen (2007); Lin & Garnsey (2011); Gibson & Wu (in press)

Page 23

Temporary ambiguities

SRC: [ eᵢ 邀請 富豪 的 ] (官員ᵢ) 打了 記者
       eᵢ yaoqing fuhao de guanyuanᵢ da-le jizhe
       eᵢ invite tycoon DE officialᵢ hit reporter
     'The official who invited the tycoon hit the reporter.'

ORC: [ 富豪 邀請 eᵢ 的 ] (官員ᵢ) 打了 記者
       fuhao yaoqing eᵢ de guanyuanᵢ da-le jizhe
       tycoon invite eᵢ DE officialᵢ hit reporter
     'The official who the tycoon invited hit the reporter.'

Page 24

SR:
那個 昨晚 [ eᵢ 揍了 服務生 一頓 的 ]RC 顧客ᵢ 見過 老板 …
na-ge zuowan eᵢ zou-le fuwusheng yi-dun de guke jian-guo laoban …
det-cl last.night eᵢ hit-asp waiter one-cl DE customerᵢ see-asp boss …
'That customer who hit the waiter last night had seen the boss before …'

OR:
那個 昨晚 [ 服務生 揍了 eᵢ 一頓 的 ]RC 顧客ᵢ 見過 老板 …
na-ge zuowan fuwusheng zou-le eᵢ yi-dun de guke jian-guo laoban …
det-cl last.night waiter hit-asp eᵢ one-cl DE customerᵢ see-asp boss …
'That customer who the waiter hit last night had seen the boss before …'

Less-ambiguous

Page 25

Observed reading times

[Figure: region-by-region mean reading times for subject-modifying relative clauses, SR vs. OR. Panel 1 regions: Det, Adv, V(sr)/N(or), N(sr)/V(or), Comp, DE, head, head+1, head+2. Panel 2 regions: Det+CL, Adv, V(sr)/N(or), N(sr)/V(or), FreqP/DurP, DE, head, head+1, head+2. Y-axis: Reading Time (ms), 200-1000.]

Page 26

Formal grammar

Movement analysis (Kayne, 1994)

A sample of lexical entries in a Minimalist Grammar (phonetic form :: feature bundle):
  Noun :: N, -case, -wh
  e    :: N-null, -case, -wh
  Vt   :: =N, +case, V
  e    :: =>V, N, v
  de   :: =T, +f, C, -k
  e    :: =C, +wh, N

X-bar structures

1. (probability 0.895711587708)
[Figure: the corresponding X-bar structure, a subject-modifying relative clause of the form "Det Cl Time Vt Noun Freq de Noun Vt Noun", shown with traces of the moved constituents.]

Page 27

Distributional information

The Entropy Reduction complexity metric derives processing difficulty, in part, from probabilities. This means we need to weight the prepared MG grammar with help from language resources.

The methodology of the grammar weighting is to estimate probabilistic context-free rules of MG derivations by parsing a "mini-Treebank." We obtain corpus counts[8] for the relevant structural types in the Chinese Treebank 7.0 (Xue et al., 2010) and treat these counts as attestation counts of the corresponding constructions in a mini-Treebank. For example, Table 3 lists corpus counts of four RC constructions. It suggests that SRs are more frequent than ORs when they modify matrix subjects. In addition, ORs allow covert heads more often than SRs do. By parsing each of these sentence types in the mini-Treebank, we estimate the weight of a particular grammar rule by adding up the number of times the rule is used across sentence types, weighted by how often each sentence type occurs in the mini-Treebank (Chi, 1999). However, for structurally stricter sentence types, such as the RC with a determiner-classifier and a frequency phrase in (2), we could not get enough corpus counts from the Chinese Treebank. We therefore estimate their counts proportionally, based on their counterparts in a simpler version, such as the regular RCs in (1).

Table 3. A fragment of the "mini-Treebank" with corpus counts for Chinese RCs

String                    | Construction type                       | Corpus count
Vt Noun de Noun Vt Noun   | Subject-modifying SR with Vt            | 366
Vt Noun de Vt Noun        | Subject-modifying headless SR with Vt   | 123
Noun Vt de Noun Vt Noun   | Subject-modifying OR with Vt            | 203
Noun Vt de Vt Noun        | Subject-modifying headless OR with Vt   | 149
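
To make the weighting procedure concrete, here is a minimal sketch under strong simplifying assumptions: each construction in Table 3 is paired with a single derivation, represented only as a bag of rule names, and the rule names themselves are hypothetical stand-ins. The real system parses each sentence type with the MG and counts uses of the grammar's actual rules.

```python
from collections import Counter

# Attestation counts from Table 3 (subject-modifying RCs in the Chinese Treebank).
type_counts = {"SR": 366, "headless SR": 123, "OR": 203, "headless OR": 149}

# Hypothetical rule uses per construction (stand-ins for the MG derivation rules).
rule_uses = {
    "SR":          Counter({"NP -> CP N'": 1, "CP -> subj-gap": 1}),
    "headless SR": Counter({"NP -> CP":    1, "CP -> subj-gap": 1}),
    "OR":          Counter({"NP -> CP N'": 1, "CP -> obj-gap": 1}),
    "headless OR": Counter({"NP -> CP":    1, "CP -> obj-gap": 1}),
}

# Accumulate rule counts, scaled by how often each construction is attested,
# then renormalize per left-hand side (relative-frequency estimation; Chi, 1999).
weighted = Counter()
for construction, n in type_counts.items():
    for rule, uses in rule_uses[construction].items():
        weighted[rule] += n * uses

lhs_total = Counter()
for rule, w in weighted.items():
    lhs_total[rule.split(" ->")[0]] += w

for rule, w in sorted(weighted.items()):
    print(f"{rule}: {w / lhs_total[rule.split(' ->')[0]]:.3f}")
```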

3.3 Prefix Parsing as Intersection Grammars

The weighted grammar allows us to calculate the probability of constructing a complete sentence. But since we are more interested in predicting human comprehension difficulty word-by-word, it is necessary for us to determine the entropy of a prefix string.

Based on a weighted grammar G obtained after the training stage, we use chart parsing to recover probabilistic “intersection” grammars G’ conditioned on each prefix of the sentences of interest (Nederhof & Satta, 2008). An intersection grammar derives all and only the sentences in the language of G that are consistent with the initial prefix. It implicitly defines comprehenders’ expectations about how the sentence continues. Given the prefix string, the conditional entropy of the start symbol models a reader’s degree of confusion about which construction he or she is in at that point in the sentence. Comparing the conditional entropies before and after adding a new word, any decrease will quantify disambiguation work that, ideally, could have been done at that word.
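
The chart-parsing and intersection-grammar machinery is beyond a short example, but the quantity being computed can be sketched directly for a language small enough to enumerate: keep the complete sentences consistent with the prefix, renormalize their probabilities, and take the entropy. The sentences and probabilities below are invented for illustration; the actual model computes the same conditional entropy over the infinitely many derivations licensed by the intersection grammar.

```python
import math

def conditional_entropy(sentence_probs, prefix):
    """H(S | prefix): entropy over the sentences consistent with the prefix,
    after renormalizing their probabilities."""
    consistent = {s: p for s, p in sentence_probs.items() if s[:len(prefix)] == prefix}
    total = sum(consistent.values())
    return -sum((p / total) * math.log2(p / total) for p in consistent.values())

# A toy "language": complete sentences (word tuples) with made-up probabilities.
language = {
    ("Det", "Cl", "Vt", "Noun", "de", "Noun", "Vt", "Noun"): 0.4,   # SR
    ("Det", "Cl", "Noun", "Vt", "de", "Noun", "Vt", "Noun"): 0.3,   # OR
    ("Det", "Cl", "Vt", "Noun", "de", "Vt", "Noun"): 0.2,           # headless SR
    ("Det", "Cl", "Noun", "Vt", "de", "Vt", "Noun"): 0.1,           # headless OR
}

h_before = conditional_entropy(language, ("Det", "Cl"))         # both paths still open
h_after = conditional_entropy(language, ("Det", "Cl", "Vt"))    # SR path chosen
print(h_before, h_after, max(h_before - h_after, 0.0))          # ER at "Vt"
```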

Besides computing entropies, our system also samples syntactic alternatives from the intersection grammars to give an intuitive picture of how uncertainty is reduced during parsing. These syntactic alternatives are discussed further in Section 3.4.

[8] Corpus queries for construction types were carried out with the pattern-matching tool Tregex (Levy & Andrew, 2006). For an example of Tregex queries for Chinese RCs, see Table 4 in Chen et al. (in press).

Page 28

Derived from the grammar: Entropy Reduction (bits) by region, SR vs. OR

Region         | SR   | OR
Det+Cl         | 0    | 0
Time           | 0.02 | 0.02
Vt(sr)/N(or)   | 2.32 | 2.53
N(sr)/Vt(or)   | 0.53 | 0.60
Freq           | 0.86 | 0.61
DE             | 0    | 0
N(head)        | 0.66 | 0.97

Page 29

RC-medial disambiguation work

At the shared prefix "Det Cl Time" (entropy: 6.131 bits):
Rank | Cond. Prob. | Continuation
1    | 0.168       | Det Cl Time Vt Noun de Noun Vt Noun
2    | 0.083       | Det Cl Time Noun Vt de Noun Vt Noun
3    | 0.057       | Det Cl Time Time Vt Noun de Noun Vt Noun
4    | 0.050       | Det Cl Time Vt Noun de Noun Vi
5    | 0.043       | Det Cl Time Noun Vt de Vt Noun
…    | …           | …

After the SR continuation "Det Cl Time Vt" (entropy: 3.815 bits):
Rank | Cond. Prob. | Continuation
1    | 0.386       | Det Cl Time Vt Noun de Noun Vt Noun
2    | 0.116       | Det Cl Time Vt Noun de Noun Vi
3    | 0.079       | Det Cl Time Vt Noun de Vt Noun
4    | 0.065       | Det Cl Time Vt Noun Freq de Noun Vt Noun
5    | 0.060       | Det Cl Time Vt de Noun Vt Noun
…    | …           | …

After the OR continuation "Det Cl Time Noun" (entropy: 3.600 bits):
Rank | Cond. Prob. | Continuation
1    | 0.381       | Det Cl Time Noun Vt de Noun Vt Noun
2    | 0.197       | Det Cl Time Noun Vt de Vt Noun
3    | 0.068       | Det Cl Time Noun Vt Freq de Noun Vt Noun
4    | 0.066       | Det Cl Time Noun de Noun Vt de Noun Vt Noun
5    | 0.045       | Det Cl Time Noun Vt Freq de Vt Noun
…    | …           | …

Entropy Reduction at this RC-medial word:
SR: H↓ = 2.32 bits (6.131 − 3.815)
OR: H↓ = 2.53 bits (6.131 − 3.600)

Page 30

Is it headless?

SR, before the head noun (prefix "Det Cl Time Vt Noun Freq de"; entropy: 2.406 bits):
Rank | Cond. Prob. | Continuation
1    | 0.552       | Det Cl Time Vt Noun Freq de Noun Vt Noun
2    | 0.166       | Det Cl Time Vt Noun Freq de Noun Vi
3    | 0.112       | Det Cl Time Vt Noun Freq de Vt Noun
4    | 0.058       | Det Cl Time Vt Noun Freq de Noun Vt Noun de Noun
5    | 0.034       | Det Cl Time Vt Noun Freq de Vi
…    | …           | …

SR, after the head noun (prefix "Det Cl Time Vt Noun Freq de Noun"; entropy: 1.750 bits):
Rank | Cond. Prob. | Continuation
1    | 0.664       | Det Cl Time Vt Noun Freq de Noun Vt Noun
2    | 0.200       | Det Cl Time Vt Noun Freq de Noun Vi
3    | 0.069       | Det Cl Time Vt Noun Freq de Noun Vt Noun de Noun
4    | 0.008       | Det Cl Time Vt Noun Freq de Noun Vt Det Cl Vt Noun de Noun
5    | 0.008       | Det Cl Time Vt Noun Freq de Noun Vt Vt Noun de Noun
…    | …           | …

OR, before the head noun (prefix "Det Cl Time Noun Vt Freq de"; entropy: 2.390 bits):
Rank | Cond. Prob. | Continuation
1    | 0.490       | Det Cl Time Noun Vt Freq de Noun Vt Noun
2    | 0.320       | Det Cl Time Noun Vt Freq de Vt Noun
3    | 0.051       | Det Cl Time Noun Vt Freq de Noun Vt Noun de Noun
4    | 0.033       | Det Cl Time Noun Vt Freq de Vt Noun de Noun
5    | 0.015       | Det Cl Time Noun Vt Freq de Noun Vi
…    | …           | …

OR, after the head noun (prefix "Det Cl Time Noun Vt Freq de Noun"; entropy: 1.422 bits):
Rank | Cond. Prob. | Continuation
1    | 0.810       | Det Cl Time Noun Vt Freq de Noun Vt Noun
2    | 0.085       | Det Cl Time Noun Vt Freq de Noun Vt Noun de Noun
3    | 0.025       | Det Cl Time Noun Vt Freq de Noun Vi
4    | 0.010       | Det Cl Time Noun Vt Freq de Noun Vt Det Cl Vt Noun de Noun
5    | 0.010       | Det Cl Time Noun Vt Freq de Noun Vt Vt Noun de Noun
…    | …           | …

Entropy Reduction at the head noun:
SR: H↓ = 0.66 bits (2.406 − 1.750)
OR: H↓ = 0.97 bits (2.390 − 1.422)

Page 31

Conclusions

Entropy Reduction:

is calculated from generative grammars weighted by corpus counts

derives processing patterns consistent with empirical findings, e.g. for Chinese RCs

instantiates the structural-frequency idea by highlighting the particular disambiguation decisions made at each word

Page 32

Thank you

Tim Hunter (Yale)

Charles Lin (Indiana)

Shravan Vasishth (Potsdam)

John Whitman (Cornell)

Jiwon Yun (Cornell)

Cognitive Science @ Cornell