
The role of syntactic structure in guiding prosody perception

with spontaneous speech

Jennifer Cole

Yoonsook Mo Soondo Baek

University of Illinois at Urbana-Champaign

This work supported by:

$$$ NSF IIS-0414117

Friends and collaborators:


April 13, 2008 ETAP 2

José Ignacio Hualde, Eun Kyung Lee, Mark Hasegawa-Johnson, Xiaodan Zhuang, Chilin Shih, Zak Hulstrom, Margaret Fleck, Tae-Jin Yoon

Thanks to the ETAP 2008 conference organizers and participants!

This talk presents evidence that “naïve” listeners perceive prosodic phrases in everyday speech.

Part 1. The methods of our naïve prosody transcription experiment

Part 2. The syntactic alignment of perceived prosodic boundaries

Part 3. Is boundary perception driven by syntax, acoustics, or both? Correlation analyses

Prosody and phrasing

• Every language displays a characteristic pattern of prosodic modulation that serves to break continuous speech into smaller, meaningful chunks (phrasing).


• Through its phrasing aspect, prosody defines an interface between syntax, semantics, and phonology.


Getting the prosody correct is important for computer speech applications, as Jurafsky showed in his talk yesterday.

How do ordinary listeners perceive prosody in everyday speech?

• Do listeners differ in how they perceive prosody for the same utterance?

– A qualified “yes”

• What properties of an utterance determine how prosody is perceived?

– acoustic, phonological, syntactic, semantic, pragmatic…

Prosodic transcription as a tool for prosody research

• Reliability

• High agreement rates between transcribers on the same utterance(s) indicate:

– Speakers produce salient acoustic cues to prosody, and

– Listeners perceive acoustic cues similarly.

→ Prosody perception is signal-driven

OR… prosody perception is determined in part by “higher” level structure.

→ Prosody perception is expectation-driven

Our method: Naïve prosody transcription

• The transcribers: many (70+) “naïve” listeners.

• The transcriptions: locate prominent words and boundaries, ignoring differences in tune and strength

• The analysis: evaluates variation in prosodic transcription across listeners and assigns probabilistic prosody labels

• Speed: Real-time comprehension, to diminish strategic analysis

• Reliability: measured using Fleiss’ Kappa statistic to calculate agreement rates for multiple (> 2) transcribers.

Adapting a method from Rotondo 1984; Streefkerk et al. 1997, 1998; Swerts 1997; Buhmann et al. 2001
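Fleiss’ Kappa, the multi-rater agreement statistic named in the Reliability point, can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function name `fleiss_kappa` and the toy count table are hypothetical, and the z-statistic used for significance testing is not computed here.

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa for a table of category counts.

    counts[i][j] = number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number of ratings")
    n_cats = len(counts[0])
    # Proportion of all ratings that fall in each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    # Observed agreement per item, averaged over items.
    p_obs = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Agreement expected by chance (undefined kappa if this equals 1).
    p_exp = sum(p * p for p in p_cat)
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical toy data: 4 words, 4 listeners,
# columns = [marked as boundary, not marked].
toy_counts = [[4, 0], [0, 4], [3, 1], [0, 4]]
toy_kappa = fleiss_kappa(toy_counts)
print(round(toy_kappa, 3))  # 0.746
```

Each word is treated as one rated item with two categories (marked or not marked), which is one natural way to cast the boundary and prominence labels as a multi-rater agreement problem.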

Our method: Naïve prosody transcription

Listeners hear speech excerpts from the Buckeye Corpus of American English spontaneous speech (Pitt et al. 2007)

– 38 short excerpts (19 speakers x 2 excerpts), ~ 25 sec. each

– 2110 words

Listeners mark up a printed transcript of each excerpt (no punctuation or capitalization):

• Prominence: word word word

• Boundary: word | word word…

Transcriptions pooled over listeners to obtain two population-wise prosody scores for each word:

P-score & B-score


The B-score codes the boundary label at the right edge of the word:

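…word | word word…
B-scores: 0.4 at the right edge of the first word (before the boundary mark), 0.0 at the right edge of the second word.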
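The pooling step can be illustrated with a short sketch. It assumes that a word's score is simply the proportion of listeners who marked it, which is one reading of "pooled over listeners"; the function name `pooled_scores` and the five-listener data are hypothetical.

```python
from typing import List, Set


def pooled_scores(n_words: int, marks_by_listener: List[Set[int]]) -> List[float]:
    """Proportion of listeners who marked each word (0-based word indices)."""
    n_listeners = len(marks_by_listener)
    if n_listeners == 0:
        raise ValueError("need at least one listener")
    counts = [0] * n_words
    for marks in marks_by_listener:
        for i in marks:
            counts[i] += 1
    return [c / n_listeners for c in counts]


# Hypothetical data: 5 listeners, 3 words ("word | word word").
# Each set holds the indices of words followed by a boundary mark.
boundary_marks = [{0}, {0}, set(), set(), set()]
b_scores = pooled_scores(3, boundary_marks)
print(b_scores)  # [0.4, 0.0, 0.0]
```

The same function would yield P-scores if each set instead held the indices of the words a listener marked as prominent.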

Our method: Naïve prosody transcription

Probabilistic prosody scores by word

[Figure: prosody scores p(P) and p(B), on a scale from 0 to 1, for each word of the excerpt “i don't know a lot of the places that are coming up the gentleman's clubs and things like that i just i think that's gonna be really bad for the kids”]

Assessing inter-transcriber agreement

z = 2.32, α = 0.01

                      Exp. 1             Exp. 2
                   Grp.1    Grp.2     Grp.3    Grp.4
Prominence  Kappa  0.373    0.421     0.394    0.407
            z      19.43    20.48     18.15    18.31
Boundary    Kappa  0.612    0.544     0.621    0.575
            z      27.62    21.87     25.05    26.22

Fleiss’ multi-rater Kappa coefficient & z-statistic were used to assess agreement

→ Agreement between transcribers is highly significant for labeling of prominent words and boundaries

→ Agreement is higher for boundaries than for prominent words.


Assessing inter-transcriber agreement

Pairwise transcriber agreement by Cohen’s Kappa statistic

Mean Kappa for Boundary = 0.582

Mean Kappa for Prominence = 0.392

Two transcribers listening to the same speaker perceive prosody differently!
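A mean pairwise Kappa of this kind can be sketched as below, assuming each transcriber's labels are coded as one 0/1 value per word. The function names and the three short label sequences are hypothetical and only illustrate the calculation.

```python
from itertools import combinations
from typing import List, Sequence


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two raters' binary (0/1) labels over the same words."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    pa1, pb1 = sum(a) / n, sum(b) / n
    # Chance agreement: both mark the word, or both leave it unmarked.
    p_exp = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    return (p_obs - p_exp) / (1 - p_exp)


def mean_pairwise_kappa(labels: List[Sequence[int]]) -> float:
    """Average Cohen's kappa over all unordered pairs of raters."""
    pairs = list(combinations(labels, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical boundary labels from three transcribers over six words.
r1 = [1, 0, 0, 1, 0, 0]
r2 = [1, 0, 0, 0, 0, 0]
r3 = [1, 0, 1, 1, 0, 0]
print(mean_pairwise_kappa([r1, r2, r3]))
```

Looking at the individual pairwise values, not only their mean, is what shows how far any two transcribers diverge on the same speech.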

Variability in perceived prosody -- by speaker

Mean number of words between prominences and boundaries

[Figure: mean number of words (y-axis, 0.0 to 12.0) between prominences (P-interval) and between boundaries (B-interval), plotted by speaker (x-axis, by id. number: 22 21 25 32 11 4 24 26 2 33 35 15 16 10 3 17 13 14)]

• Interval for P > B: Some chunks have no prominences!

• Interval for P ≈ B: One prominence per chunk.

• Interval for P < B: Multiple prominences per chunk.
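One way to obtain a P-interval or B-interval is to measure the distance in words from each mark to the next and average those distances. The sketch below assumes a single 0/1 mark sequence per excerpt; how the marks were actually discretized and averaged for the plot is not stated on the slide, so the function `mean_interval` and the sample sequences are hypothetical.

```python
from typing import List, Optional


def mean_interval(marks: List[int]) -> Optional[float]:
    """Mean distance, in words, from one mark to the next in a 0/1 sequence.

    Returns None when fewer than two words are marked.
    """
    positions = [i for i, m in enumerate(marks) if m]
    if len(positions) < 2:
        return None
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return sum(gaps) / len(gaps)


# Hypothetical marks over a 10-word stretch.
prominence = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
boundary = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
p_interval = mean_interval(prominence)
b_interval = mean_interval(boundary)
print(p_interval, b_interval)  # 3.5 3.0
```

In this toy case the P-interval exceeds the B-interval, the pattern the slide associates with chunks that contain no prominence.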

Interim Summary: Part 1

• Naïve transcribers agree on the location of prosodic boundaries and prominent words in spontaneous speech at levels well above chance.

• Agreement is highest for prosodic phrase boundaries.

• Even so, there is variation in the transcriptions across listeners, and by speaker.

• Variation in perception yields a probabilistic prosody score on each word.

→ A continuous measure. Watson’s and Ladd’s talks suggest continuous prosody features based on production evidence.

Perceived prosodic boundaries: Syntactic correlates

• Utterances manually coded for syntactic structure, following Penn Treebank guidelines.

• Each word was coded for the highest syntactic category at its left and right edge:

Example: “ [ I ] [ think ] in today’s world…”

S NP VP Word (= phrase-medial)

Mean B-Scores: 0.9 0.0 = 0.0 0.2

Perceived prosodic boundaries: Syntactic encoding

Clausal categories

S: matrix S

S-bar: subordinate or relative clause
    the fact | that it was something
    | if he would choose

S2: S preceded by conj. or rel. pronoun
    the fact that | it was something
    if | he would choose

CC-S: coordinating conjunction following or preceding a sentence

Non-clausal categories

CC-XP: coordinating conj. preceding or following an XP
    we did | or didn’t …

Phrase (XP): any XP that is not a clause

Within phrase: any word boundary that does not align with a coded syntactic boundary

Disfluency: filled pause, repetition & repair disfluencies

Discourse marker: yknow, like, so, I mean, …

Other

Frequency of syntactic boundaries (left, right)

Category            Left    Right
S                   124     159
S-bar               112     22
S2                  154     0
CC-S                84      84
CC-XP               37      34
Phrase (XP)         776     480
Within phrase       410     1016
Disfluency          78      78
Discourse marker    60      64
Other               237     158

Perceived prosodic boundaries: Syntactic correlates

[Figure: Mean boundary scores by syntactic category at L/R word edge, with clauses and coordinating conjunctions (Co.Conj.) highlighted]

Boundary scores are generally high at the edges of clauses and coordinating conjunctions.

Patterns of correlation are stronger for B-scores at the right edge of a word than at the left edge.

We focus now on right-edge patterns.

Perceived prosodic boundaries: Syntactic correlates

[Figure: Mean boundary scores by syntactic category at right edge of word. x-axis: Syntactic boundary (right): S, S-bar, CC-S, XP, CC-XP, WP, Other]

A decreasing cline of Boundary scores at the right edges of:

Clauses > Coordinating Conj. > XP, WP, Other

IP? ip?
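The per-category means behind such a cline amount to a grouped average of B-scores. The sketch below assumes each word is represented as a (right-edge category, B-score) pair; the function name and the five sample words are hypothetical and are not values from the study.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def mean_score_by_category(words: List[Tuple[str, float]]) -> Dict[str, float]:
    """Average B-score for each right-edge syntactic category."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for category, b_score in words:
        totals[category] += b_score
        counts[category] += 1
    return {cat: totals[cat] / counts[cat] for cat in totals}


# Hypothetical coded words: (category at right edge, B-score).
coded_words = [("S", 0.8), ("S", 0.6), ("XP", 0.2), ("WP", 0.0), ("XP", 0.4)]
category_means = mean_score_by_category(coded_words)

# Print categories from highest to lowest mean B-score.
for cat, score in sorted(category_means.items(), key=lambda kv: -kv[1]):
    print(cat, round(score, 2))
```

Sorting the categories by their mean score is a quick way to check whether an ordering such as clauses above XP above within-phrase (WP) holds in a given data set.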

Perceived prosodic boundaries: Discourse markers and disfluency

[Figure: Distribution of boundary scores by category defined at right edge of word (S, S’, DM, Disf). Reference line: mean B-score for the right edge of XP]

Boundary scores are HIGHEST at the right edges of clauses, discourse markers and disfluency.

Perceived prosodic boundaries: Phrasing implications

[Figure: Mean boundary score at word edge by syntactic category]

• B-scores are similar for right and left edge of disfluency
→ disf. phrased alone

• B-scores are higher at right edge for clauses
→ Prosodic phrases align with end of clause

• B-scores are higher at left edge of DM, CC
→ These items are pro-clitics to syntactic phrase

Interim Summary: Part 2

• Boundary scores decrease at the right edge of syntactic units from higher-level units (clauses) to lower-level ones (XP, word).

• B-scores are HIGHEST at the right edges of clauses and disfluency.

• Discourse markers and coordinating conjunctions are perceived with a boundary at their left edge

→ as pro-clitics to the following clause or phrase

• Disfluencies tend to be perceived as separate prosodic “chunks”.

Is boundary perception cued by acoustic duration?

Final lengthening effects are robust cues to prosodic boundaries in English (Wightman et al. 1992).

These data from the BU Radio News corpus show graded lengthening of the vowel in the domain-final rime, at three levels of prosodic boundary: Word, ip, IP

(Yoon, Cole, Hasegawa-Johnson, ICPhS, 2007)

[Figure: vowel lengthening in the domain-final rime at the Wd, ip, and IP boundary levels]

Is boundary perception cued by acoustic duration?

[Figure: Normalized vowel duration and B-scores from vowels in word-final stressed syllables.]

Normalized duration measures are highly variable, but longer vowels are more likely to be perceived as final in the prosodic phrase.
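Duration normalization of this kind is often done with z-scores. The sketch below assumes normalization within a single group of vowel tokens (for example, one speaker or one vowel type) and uses the population standard deviation; the slides do not state which grouping or variance estimate was used, so both choices, along with the function name `z_scores`, are hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def z_scores(durations: List[float]) -> List[float]:
    """Convert raw durations to z-scores within one group of tokens."""
    mu = mean(durations)
    sigma = pstdev(durations)  # population SD; an assumption, see lead-in
    if sigma == 0:
        raise ValueError("durations have zero variance")
    return [(d - mu) / sigma for d in durations]


# Hypothetical vowel durations in milliseconds for one group.
durations_ms = [80.0, 100.0, 120.0]
normalized = z_scores(durations_ms)
print([round(z, 3) for z in normalized])  # [-1.225, 0.0, 1.225]
```

A positive value then marks a vowel that is longer than the group average, which is the direction associated with a perceived phrase-final position.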

Does vowel duration directly encode the syntactic category?

[Figure: Mean vowel duration (z-score) by syntactic category at right edge of word. y-axis: Vowel duration (normalized); x-axis: Syntactic Category]

• V duration at the right edge of CC-XP (“… we did or] didn’t do….”) is highly variable, suggesting this is not a single category. Some of these CCs may exhibit final lengthening, while others may be reduced.

• Considering the other syntactic categories, vowel duration is longer at the right edge of clauses, and shorter at the right edge of words that are final in XP or within-phrase.

Correlation analysis

Testing for a linear relationship between three measures of each word:

• boundary score

• duration of the word-final vowel (normalized, stressed Vs only)

• syntactic category at its right edge

Correlations (Kendall’s tau)

                B-scores &    B-scores &             V duration &
                V duration    SynCat (Right edge)    SynCat (Right edge)
Coefficient     .369          -.352                  -.225
Significance    <.001         <.001                  <.001

• B-scores increase with final V duration

• B-scores decrease from higher to lower syntactic categories

• Final V duration decreases from higher to lower syntactic categories
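Kendall’s tau is a rank-based coefficient, so it measures how consistently two variables rise or fall together. The sketch below computes the tie-corrected variant (tau-b) by direct pair counting; whether the reported coefficients are tau-b is an assumption, and the four data points are hypothetical. For the syntactic category, a comparable calculation would need the categories coded as ranks (for instance, lower numbers for higher-level units), which is also an assumption.

```python
from itertools import combinations
from math import sqrt
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b (tie-corrected) by direct O(n^2) pair counting."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: excluded from both factors
        if dx == 0:
            ties_x += 1  # tied on x only
        elif dy == 0:
            ties_y += 1  # tied on y only
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = sqrt(
        (concordant + discordant + ties_x) * (concordant + discordant + ties_y)
    )
    if denom == 0:
        raise ValueError("tau is undefined when one variable is constant")
    return (concordant - discordant) / denom


# Hypothetical values: normalized final-vowel duration and B-score per word.
v_duration = [-0.5, 0.1, 0.8, 1.2]
b_score = [0.0, 0.2, 0.1, 0.6]
tau = kendall_tau_b(v_duration, b_score)
print(round(tau, 3))  # 0.667
```

A positive tau corresponds to the first pattern in the table (B-scores rising with final vowel duration); a negative tau would arise when one variable falls as the other's rank rises.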

• Final vowel duration and the syntactic category at the right edge of the word are about equally correlated with B-scores.

… Is the syntactic category effect covertly encoding the effects of final vowel duration?

• Not completely. Final vowel duration and syntactic category are more weakly correlated.

→ Suggests that the relationship between syntactic category and boundary perception is not completely due to lengthening effects on the vowel at the right edge of the syntactic category.

… a weak independent effect of syntax on boundary perception.

What we’ve learned

• Naïve transcribers agree on the location of prosodic boundaries and prominent words in spontaneous speech at levels well above chance.

• Agreement is highest for prosodic phrase boundaries.

• Boundary scores decrease at the right edge of syntactic units from higher-level units (clauses) to lower-level ones (XP, word).

• Boundary perception at the right edge of a word correlates with V duration and with the highest syntactic boundary at that location.

• Syntax may have a weakly independent effect on boundary perception.
