The role of syntactic structure in guiding prosody perception
with spontaneous speech
Jennifer Cole
Yoonsook Mo Soondo Baek
University of Illinois at Urbana-Champaign
This work was supported by:
$$$ NSF IIS-0414117
Friends and collaborators:
April 13, 2008 ETAP 2
José Ignacio Hualde, Eun Kyung Lee, Mark Hasegawa-Johnson, Xiaodan Zhuang, Chilin Shih, Zak Hulstrom, Margaret Fleck, Tae-Jin Yoon
Thanks to the ETAP 2008 conference organizers and participants!
This talk presents evidence that “naïve” listeners perceive prosodic phrases in everyday speech.
Part 1. The methods of our naïve prosody transcription experiment
Part 2. The syntactic alignment of perceived prosodic boundaries
Part 3. Is boundary perception driven by syntax, acoustics, or both? Correlation analyses
[Diagram: syntactic bracketing [S [NP ]NP [VP ]VP ]S → B-score predictor??]
Prosody and phrasing
• Every language displays a characteristic pattern of prosodic modulation that serves to break continuous speech into smaller, meaningful chunks (phrasing).
• Through its phrasing aspect, prosody defines an interface between syntax, semantics, and phonology.
Getting the prosody correct is important for computer speech applications, as Jurafsky showed in his talk yesterday.
How do ordinary listeners perceive prosody in everyday speech?
• Do listeners differ in how they perceive prosody for the same utterance?
– A qualified “yes”
• What properties of an utterance determine how prosody is perceived?
– acoustic, phonological, syntactic, semantic, pragmatic…
Prosodic transcription as a tool for prosody research
• Reliability
• High agreement rates between transcribers on the same utterance(s) indicate:
– Speakers produce salient acoustic cues to prosody, and
– Listeners perceive acoustic cues similarly.
→ Prosody perception is signal-driven
OR… prosody perception is determined in part by “higher” level structure.
→ Prosody perception is expectation-driven
Our method: Naïve prosody transcription
• The transcribers: many (70+) “naïve” listeners.
• The transcriptions: locate prominent words and boundaries,
ignoring differences in tune and strength
Adapting a method from Rotondo 1984; Streefkerk et al. 1997, 1998; Swerts 1997; Buhmann et al. 2001
• The analysis: evaluates variation in prosodic transcription across
listeners and assigns probabilistic prosody labels
• Speed: Real-time comprehension, to diminish strategic analysis
• Reliability: measured using Fleiss’ Kappa statistic to calculate
agreement rates for multiple (> 2) transcribers.
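The reliability measure named above can be sketched in a few lines. This is a minimal illustration of the standard Fleiss' kappa formula, not the authors' analysis script; the function name `fleiss_kappa` and the input layout (one list of labels per word, one label per transcriber) are assumptions made for the example.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for multiple raters.

    ratings: list of items (here: words); each item is a list holding
    one category label per rater (e.g. 1 = boundary marked, 0 = not).
    Every item must be labeled by the same number of raters.
    This input layout is an assumption of this sketch.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number of raters")

    categories = sorted({label for item in ratings for label in item})
    counts = [Counter(item) for item in ratings]

    # Observed agreement: mean over items of the proportion of
    # agreeing rater pairs for that item.
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_obs = sum(p_items) / n_items

    # Chance agreement from the overall category proportions.
    p_cat = [sum(c[cat] for c in counts) / (n_items * n_raters)
             for cat in categories]
    p_exp = sum(p ** 2 for p in p_cat)

    if p_exp == 1.0:
        return float("nan")  # only one category was ever used
    return (p_obs - p_exp) / (1 - p_exp)
```

With three transcribers and two words, `fleiss_kappa([[1, 1, 1], [0, 0, 0]])` gives full agreement, while mixed labels pull the value toward or below zero.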
Our method: Naïve prosody transcription
Listeners hear speech excerpts from the Buckeye Corpus
of American English spontaneous speech (Pitt et al. 2007)
– 38 short excerpts (19 speakers x 2 excerpts), ~ 25 sec. each
– 2110 words
Listeners mark up a printed transcript of each excerpt (no punctuation or capitalization):
• Prominence: word word word
• Boundary: word | word word…
Transcriptions pooled over listeners to obtain two population-wise prosody scores for each word:
P-score & B-score
The B-score codes the boundary label at the right edge of the word:
…word | word word…
B-scores: 0.4 (first word), 0.0 (second word)
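A minimal sketch of how pooled marks could become a per-word score. It assumes the score is simply the proportion of listeners who marked the word; the slides say only that transcriptions are pooled into probabilistic scores, so the exact formula is an assumption. The function name `prosody_scores` and the 0/1 input layout are hypothetical.

```python
def prosody_scores(transcriptions):
    """Pool binary marks over listeners into one score per word.

    transcriptions: one list per listener, with a 0/1 mark per word
    (1 = the listener marked a boundary after the word, or marked the
    word as prominent). Assumed definition: score = share of listeners
    who marked the word.
    """
    n_listeners = len(transcriptions)
    n_words = len(transcriptions[0])
    if any(len(t) != n_words for t in transcriptions):
        raise ValueError("all listeners must label the same words")
    return [
        sum(t[i] for t in transcriptions) / n_listeners
        for i in range(n_words)
    ]


# Five hypothetical listeners, three words; two listeners hear a
# boundary after the first word, nobody hears one after the others.
b_marks = [
    [1, 0, 0],
    [1, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
]
b_scores = prosody_scores(b_marks)
```

The same function would serve for P-scores if the marks encode prominence instead of boundaries.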
Our method: Naïve prosody transcription
[Figure: Probabilistic prosody scores by word. y-axis: Prosody score (0 to 1); series: p(P), p(B); x-axis: the words of the excerpt "i don't know a lot of the places that are coming up the gentleman's clubs and things like that i just i think that's gonna be really bad for the kids"]
Assessing inter-transcriber agreement
                      Exp. 1             Exp. 2
                      Grp. 1   Grp. 2    Grp. 3   Grp. 4
Prominence   Kappa    0.373    0.421     0.394    0.407
             z        19.43    20.48     18.15    18.31
Boundary     Kappa    0.612    0.544     0.621    0.575
             z        27.62    21.87     25.05    26.22
(z = 2.32, α = 0.01)
Fleiss’ multi-rater Kappa coefficient & z-statistic were used to assess agreement
→ Agreement between transcribers is highly significant for labeling of prominent words and boundaries
→ Agreement is higher for boundaries than for prominent words.
Assessing inter-transcriber agreement
Pairwise transcriber agreement by Cohen’s Kappa statistic
Mean Kappa for Boundary = 0.582
Mean Kappa for Prominence = 0.392
Two transcribers listening to the same speaker perceive prosody differently!
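For illustration, a mean pairwise Cohen's kappa could be computed as below. This is a sketch of the textbook formula under assumed inputs (one 0/1 label sequence per transcriber); the names `cohen_kappa` and `mean_pairwise_kappa` are hypothetical and not taken from the talk.

```python
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(a) != len(b):
        raise ValueError("both raters must label the same items")
    n = len(a)
    labels = set(a) | set(b)
    # Observed proportion of items on which the two raters agree.
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance from each rater's label rates.
    p_exp = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    if p_exp == 1.0:
        return float("nan")  # both raters used a single label
    return (p_obs - p_exp) / (1 - p_exp)


def mean_pairwise_kappa(transcriptions):
    """Average Cohen's kappa over every pair of transcribers."""
    pairs = list(combinations(transcriptions, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)
```

Averaging over pairs keeps each transcriber pair visible, which is useful when the question is how differently two individual listeners hear the same speaker.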
Variability in perceived prosody -- by speaker
[Figure: Mean number of words between prominences (P-interval) and between boundaries (B-interval). y-axis: # words (mean), 0.0 to 12.0; x-axis: Speaker (by id. number): 22 21 25 32 11 4 24 26 2 33 35 15 16 10 3 17 13 14]
• Interval for P > B: some chunks have no prominences!
• Interval for P ≈ B: one prominence per chunk.
• Interval for P < B: multiple prominences per chunk.
Interim Summary: Part 1
• Naïve transcribers agree on the location of prosodic boundaries and prominent words in spontaneous speech at levels well above chance.
• Agreement is highest for prosodic phrase boundaries.
• Even so, there is variation in the transcriptions across listeners, and by speaker.
• Variation in perception yields a probabilistic prosody score on each word.
A continuous measure. Watson’s and Ladd’s talks suggest continuous prosody features based on production evidence.
Perceived prosodic boundaries: Syntactic correlates
• Utterances manually coded for syntactic structure, following Penn Treebank guidelines.
• Each word was coded for the highest syntactic category at its left and right edge:
Example: “ [ I ] [ think ] in today’s world…”
Category:        S     NP    VP    Word (= phrase-medial)
Mean B-Scores:   0.9   0.0 = 0.0   0.2
Perceived prosodic boundaries: Syntactic encoding
Clausal categories
S        matrix S
S-bar    subordinate or relative clause
         the fact | that it was something
S2       S preceded by conj. or rel. pronoun
         the fact that | it was something
         if | he would choose
CC-S     coordinating conjunction following or preceding a sentence
         | if he would choose
Perceived prosodic boundaries: Syntactic encoding
Non-clausal categories
CC-XP              coordinating conj. preceding or following an XP
                   we did | or didn’t …
Phrase (XP)        any XP that is not a clause
Within phrase      any word boundary that does not align with a coded syntactic boundary
Disfluency         filled pause, repetition & repair disfluencies
Discourse marker   yknow, like, so, I mean, …
Other
Frequency of syntactic boundaries (left, right)
Category           Left   Right
S                  124    159
S-bar              112    22
S2                 154    0
CC-S               84     84
CC-XP              37     34
Phrase (XP)        776    480
Within phrase      410    1016
Disfluency         78     78
Discourse marker   60     64
Other              237    158
Perceived prosodic boundaries: Syntactic correlates
[Figure: Mean boundary scores by syntactic category at L/R word edge; highlighted groups: Clauses, Co. Conj.]
Boundary scores are generally high at the edges of clauses and coordinating conjunctions.
Patterns of correlation are stronger for B-scores at the right edge of a word than at the left edge.
We focus now on right-edge patterns.
Perceived prosodic boundaries: Syntactic correlates
[Figure: Mean boundary scores by syntactic category at right edge of word. x-axis: Syntactic boundary (right): S, S-bar, CC-S, XP, CC-XP, WP, Other]
A decreasing cline of boundary scores at the right edges of:
Clauses > Coordinating Conj. > XP, WP, Other
IP? ip?
Perceived prosodic boundaries: Discourse markers and disfluency
[Figure: Distribution of boundary scores by category defined at right edge of word (S, S’, DM, Disf). Reference line: mean B-score for the right edge of XP]
Boundary scores are HIGHEST at the right edges of clauses, discourse markers and disfluency.
Perceived prosodic boundaries: Phrasing implications
[Figure: Mean boundary score at word edge by syntactic category]
• B-scores are similar for right and left edge of disfluency
→ disf. phrased alone
• B-scores are higher at right edge for clauses
→ Prosodic phrases align with end of clause
• B-scores are higher at left edge of DM, CC
→ These items are pro-clitics to syntactic phrase
Interim Summary: Part 2
• Boundary scores decrease at the right edge of syntactic units from higher-level units (clauses) to lower-level ones (XP, word).
• B-scores are HIGHEST at the right edges of clauses and disfluency.
• Discourse markers and coordinating conjunctions are perceived with a boundary at their left edge
→ as pro-clitics to the following clause or phrase
• Disfluencies tend to be perceived as separate prosodic “chunks”.
Is boundary perception cued by acoustic duration?
Final lengthening effects are robust cues to prosodic boundaries in English (Wightman et al. 1992).
These data from the BU Radio News corpus show graded lengthening of the vowel in the domain-final rime, at three levels of prosodic boundary: Word, ip, IP
(Yoon, Cole, Hasegawa-Johnson, ICPhS, 2007)
[Figure: vowel duration in the domain-final rime at Wd, ip, and IP boundaries]
Is boundary perception cued by acoustic duration?
[Figure: Normalized vowel duration and B-scores from vowels in word-final stressed syllables.]
Normalized duration measures are highly variable, but longer vowels are more likely to be perceived as final in the prosodic phrase.
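The slides report vowel duration as a z-score but do not say how the normalization was grouped (for example by speaker or by vowel identity). As a hedged sketch, plain z-scoring over one pool of durations would look like this; the function name `z_normalize` and the millisecond values are hypothetical.

```python
from statistics import mean, pstdev


def z_normalize(durations):
    """Convert raw durations to z-scores within one pool.

    Which pool to use (per speaker, per vowel, whole corpus) is not
    stated in the slides; a single pool is assumed here.
    """
    mu = mean(durations)
    sigma = pstdev(durations)  # population standard deviation
    if sigma == 0:
        raise ValueError("durations have zero variance")
    return [(d - mu) / sigma for d in durations]


# Hypothetical vowel durations in milliseconds.
z = z_normalize([80.0, 100.0, 120.0])
```

After normalization a positive value means the vowel is longer than the pool average, which is the sense in which longer vowels are compared against B-scores above.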
Does vowel duration directly encode the syntactic category?
[Figure: Mean vowel duration (z-score) by syntactic category at right edge of word. x-axis: Syntactic Category; y-axis: Vowel duration (normalized)]
V duration at the right edge of CC-XP (“… we did or] didn’t do….”) is highly variable, suggesting this is not a single category. Some of these CCs may exhibit final lengthening, while others may be reduced.
Considering the other syntactic categories, vowel duration is longer at right edge of clauses… and shorter at the right edge of words that are final in XP or within-phrase.
Correlation analysis
Testing for a linear relationship between three measures of each word:
• boundary score
• duration of the word-final vowel (normalized, stressed Vs only)
• syntactic category at its right edge

Correlations (Kendall’s tau)
                B-scores &    B-scores &             V duration &
                V duration    SynCat (Right edge)    SynCat (Right edge)
Coefficient     .369          -.352                  -.225
Significance    <.001         <.001                  <.001

• B-scores increase with final V duration
• B-scores decrease from higher to lower syntactic categories
• Final V duration decreases from higher to lower syntactic categories
• Final vowel duration and the syntactic category at the right edge of the word are about equally correlated with B-scores.
… Is the syntactic category effect covertly encoding the effects of final vowel duration?
• Not completely. Final vowel duration and syntactic category are more weakly correlated.
→ Suggests that the relationship between syntactic category and boundary perception is not completely due to lengthening effects on the vowel at the right edge of the syntactic category.
… a weak independent effect of syntax on boundary perception.
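A rough sketch of the rank correlation used above. Kendall's tau is rank-based, so syntactic category has to be coded ordinally; the coding below (1 = clause, larger numbers = lower-level units) is an assumption chosen to match the negative sign in the table, and the data values are invented for illustration. The tau-b variant is used because category codes produce many ties; whether the original analysis used tau-b is not stated.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b, which corrects for ties in either variable."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue            # tied on both: ignored by tau-b
            elif dx == 0:
                ties_x += 1         # tied on x only
            elif dy == 0:
                ties_y += 1         # tied on y only
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x)
                      * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical words: ordinal category code at the right edge
# (1 = clause, 2 = XP, 3 = within phrase) and their B-scores.
syn_rank = [1, 1, 2, 3]
b_score = [0.9, 0.8, 0.3, 0.0]
tau = kendall_tau_b(syn_rank, b_score)
```

In this toy data the tau is negative because B-scores fall as the category code rises, mirroring the direction of the B-score and SynCat coefficient reported in the table.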
What we’ve learned
• Naïve transcribers agree on the location of prosodic boundaries and prominent words in spontaneous speech at levels well above chance.
• Agreement is highest for prosodic phrase boundaries.
• Boundary scores decrease at the right edge of syntactic units from higher-level units (clauses) to lower-level ones (XP, word).
• Boundary perception at the right edge of a word correlates with V duration and with the highest syntactic boundary at that location.
• Syntax may have a weakly independent effect on boundary perception.