
COST2102 International School - Development of Multimodal Interfaces slide 1

Analyzing complementary acoustic cues for signalling prominence in different languages

William J. Barry, Bistra Andreeva, Jacques Koreman


Basis for this presentation

This talk presents the related results from three recent presentations:

• Koreman, J., Andreeva, B. & Barry, W.J. (2008). Accentuation cues in French and German. In: P.A. Barbosa, S. Madureira & C. Reis (eds.), Proc. Speech Prosody 2008, Campinas (Brazil), 613–616. Campinas, Brazil: Editora RG/CNPq.

• Koreman, J., Van Dommelen, W., Sikveland, R., Andreeva, B. & Barry, W.J. (in press). Cross-language differences in the production of phrasal prominence in Norwegian and German. Proc. Nordic Prosody 2008, Helsinki (Finland).

• Barry, W.J. & Andreeva, B. (2009). Cross-language and individual differences in the production and perception of syllabic prominence. Annual Meeting SPP 1234 Sprachlautliche Kompetenz 2009, Cologne (Germany).


Why present this here?

• Björn Granström: “Coherence between audio and video?”, e.g. between nodding and F0 in “Båten seglede forbi”.

• Kristiina Jokinen: “To what extent does non-verbal activity, esp. gestures and facial expressions, co-occur with verbal expressions?” (culture-dependence, communicative function)

Are there cross-cultural (cross-language) differences in the importance of acoustic and visual cues? (There are for prosodic dimensions.)

Are they complementary? (Prosodic dimensions are.)

What does that mean for synchrony detection? (Trouble?)

This talk deals only with the acoustics of prominence. But because prominence involves several prosodic dimensions, the data analysis may also be relevant to multimodal speech.


Outline

The ideas about the acoustic realization of prominence that I present here are mainly Bill Barry’s and Bistra Andreeva’s. (This is an acknowledgement, not an attempt to evade responsibility.)

Material from each of the three presentations:

• Research questions

• Recordings

• Measurements

• Statistical analysis

• Results

• Discussion

• Conclusion and possible relevance to COST 2102


Research questions

• How do different languages exploit the universal means of signalling the varying prominence of words in an utterance?
  • duration
  • fundamental frequency
  • energy
  • spectral properties

• Do the different word-phonological requirements of a language affect the degree to which these properties are exploited?
  • duration (length opposition; word stress)
  • fundamental frequency (tonal word accent)
  • spectral properties (phonologized vowel reduction)


Project

• The present work is part of a larger project funded by the German Research Council: “Cross-language and individual differences in the production and perception of syllabic prominence. Rhythm-typology revisited.”

• The languages investigated in the project are German, English, Norwegian, Bulgarian, Russian, French and Japanese.


Recordings

• Six speakers from homogeneous groups in each language
• Comparable production task across languages: varying accentuation due to different focus on critical words (CWs), elicited by questions:
  • broad
  • narrow non-contrastive (early or late)
  • narrow contrastive (early or late)

• Text replies to the questions, each followed by a “dada” version

Norwegian sentences:
1. Hun Siv drar med skipet snart.
2. Han Karl tenker på fag nå.
3. Hans far brukte sagen da.
4. Min pasta blir kald til da.
6. Min stabsmann forblir bak nå.
7. Han Krister fikk skiftet mitt.

German sentences:
1. Der Mann fuhr den Wagen vor.
2. Das Bild soll nicht hässlich sein.
3. Das Kind sollte im Bett sein.
4. Der Peter kann den Film gucken.
5. Das Mädchen soll ein Bild malen.
6. Mein Vater kann Türkisch lesen.

Results given here were checked against the text versions.

[Figure: broad (B), early (E) and late (L) focus, text and “dada” versions]


Measurements

Duration
• Duration (ms) of stressed vowels, stressed syllables, CWs and feet

F0
• Mean F0 (semitones) across the stressed vowel of the CW
• F0 contour, by comparing the stressed vowel in the CW with the preceding/following vowels

Intensity
• Mean intensity (dB) of the stressed vowel in the CW
• Spectral balance = difference between the 70–1000 Hz band and the 1200–5000 Hz band in the stressed vowel of the CW

Normalized relative to the mean across corresponding units in the sentence
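The semitone measure and the sentence-level normalization can be illustrated with a short sketch. It rests on assumptions the slide does not state: a 100 Hz semitone reference, normalization as a ratio to the sentence mean for durations and as a difference for log-scaled measures (semitones, dB), and an F0-change measure taken as the difference from the mean of the neighbouring vowels. All function names are hypothetical.

```python
import math
from statistics import mean


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """F0 in semitones relative to ref_hz (the 100 Hz reference is an assumption)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def relative_to_sentence(values, index, as_ratio=True):
    """Express values[index] relative to the mean of the corresponding units in
    the sentence: as a ratio (e.g. durations in ms) or as a difference
    (e.g. semitones or dB, which are already logarithmic)."""
    sentence_mean = mean(values)
    if as_ratio:
        return values[index] / sentence_mean
    return values[index] - sentence_mean


def f0_change(vowel_semitones, index):
    """Hypothetical F0-change measure: semitone difference between the CW's
    stressed vowel and the mean of its preceding/following vowels."""
    neighbours = [vowel_semitones[i] for i in (index - 1, index + 1)
                  if 0 <= i < len(vowel_semitones)]
    return vowel_semitones[index] - mean(neighbours)


# Stressed-vowel durations (ms, toy values); the CW's vowel is at index 1.
print(relative_to_sentence([80, 120, 60, 100], 1))  # 120 ms / 90 ms mean, about 1.33
```

Using a ratio for durations and a difference for semitones keeps each normalized value on a scale where "1" or "0" means "equal to the sentence mean".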

Spectr. def.
• F1–F3 at the middle of the stressed nucleus of the CW
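The spectral-balance measure (difference between the 70–1000 Hz and 1200–5000 Hz bands in the stressed vowel) can be sketched as below. Assumptions: the sign convention (low band minus high band, following the order on the slide), a Hann window, and band energies summed from a DFT power spectrum; the function names are hypothetical. A naive DFT keeps the sketch dependency-free; in practice an FFT routine such as numpy.fft would be used.

```python
import cmath
import math


def band_energy_db(samples, sr, lo_hz, hi_hz):
    """Energy (dB) of the DFT bins between lo_hz and hi_hz after a Hann window.
    Naive O(N * bins) DFT, for clarity rather than speed."""
    n = len(samples)
    # Hann window limits spectral leakage from one band into the other.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(samples)]
    energy = 0.0
    for k in range(n // 2 + 1):
        if lo_hz <= k * sr / n <= hi_hz:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                        for i, x in enumerate(windowed))
            energy += abs(coeff) ** 2
    return 10 * math.log10(energy + 1e-12)  # small floor avoids log10(0)


def spectral_balance(samples, sr):
    """Low-band (70-1000 Hz) minus high-band (1200-5000 Hz) energy, in dB.
    Positive values mean relatively more low-frequency energy."""
    return (band_energy_db(samples, sr, 70, 1000)
            - band_energy_db(samples, sr, 1200, 5000))


sr = 16000
t = [i / sr for i in range(800)]  # 50 ms frame, e.g. from the middle of a vowel
vowel = [math.sin(2 * math.pi * 500 * x) + 0.1 * math.sin(2 * math.pi * 2000 * x)
         for x in t]
print(round(spectral_balance(vowel, sr), 1))  # about +20 dB for a 10:1 amplitude ratio
```

A flatter spectrum (more high-frequency energy, as is typical of more effortful, accented vowels) lowers this value.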


Statistical analysis FR-GE (Speech Prosody data)

• Multivariate ANOVAs for CW1 and CW2 separately, with independent variables:
  • language (FR, GE)
  • focus (accented, deaccented)
  • number of syllables in CW (1, 2)

• Multivariate ANOVAs per language (FR, GE)

• Stepwise discriminant analyses (cue weighting):
  • for CW1 and CW2 separately
  • for each language separately
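The cue weights from the discriminant analyses are reported below as "sdc" (presumably standardized discriminant coefficients). As a rough illustration of what such a coefficient is, the sketch computes them for a two-group (accented vs. deaccented) Fisher discriminant. It is not the stepwise procedure used in the study, which additionally enters cues one at a time; the data layout and all function names are hypothetical.

```python
from statistics import mean


def pooled_within_cov(groups):
    """Pooled within-group covariance matrix (divisor: N minus number of groups)."""
    p = len(groups[0][0])
    scatter = [[0.0] * p for _ in range(p)]
    for group in groups:
        centre = [mean(col) for col in zip(*group)]
        for row in group:
            d = [x - c for x, c in zip(row, centre)]
            for i in range(p):
                for j in range(p):
                    scatter[i][j] += d[i] * d[j]
    dof = sum(len(g) for g in groups) - len(groups)
    return [[v / dof for v in row] for row in scatter]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting
    (assumes a is non-singular)."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def standardized_coefficients(accented, deaccented):
    """Two-group discriminant weights w = S^-1 (m1 - m2), scaled so that the
    scores have unit pooled within-group variance, then multiplied by each
    cue's pooled within-group SD so that cues on different scales are comparable."""
    s = pooled_within_cov([accented, deaccented])
    m1 = [mean(col) for col in zip(*accented)]
    m2 = [mean(col) for col in zip(*deaccented)]
    w = solve(s, [a - b for a, b in zip(m1, m2)])
    p = len(w)
    score_var = sum(w[i] * s[i][j] * w[j] for i in range(p) for j in range(p))
    return [w[i] / score_var ** 0.5 * s[i][i] ** 0.5 for i in range(p)]


# Toy tokens: (normalized syllable duration, mean F0 in semitones) per CW.
accented = [(1.4, 3.0), (1.3, 2.5), (1.5, 3.2), (1.2, 2.8)]
deaccented = [(1.0, 0.5), (1.1, 1.0), (0.9, 0.2), (1.0, 0.8)]
print(standardized_coefficients(accented, deaccented))
```

Because each weight is adjusted for the others through S^-1, strongly correlated cues can share or even flip the sign of their weights, which is one reason such coefficients are hard to interpret when acoustic cues co-vary.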


Results: MANOVAs

Main effects for language
Parameter                                           CW1        CW2
vowel dur., syllable dur., word dur., foot dur.     *******-   ***********
F0 mean, F0 difference                              ******     --
intensity, spect. bal.                              ******     ****
F1, F2, F3                                          ***-*      *****-

Interactions language × accentuation
Parameter                                           CW1        CW2
vowel dur., syllable dur., word dur., foot dur.     ****-      *********-
F0 mean, F0 difference                              ******     ******
intensity, spect. bal.                              --         --
F1, F2, F3                                          -**-       **-

(Marks for the parameters in each row are given in the order listed.)


Results for duration

[Figure: CW1 syllable duration and CW1 word duration for German (GE) and French (FR)]


Results for duration

[Figure: CW2 word duration (in final foot) and CW2 syllable duration for German (GE) and French (FR)]

Effects are greater for French than for German.


Results: discriminant analyses (sdc = standardized discriminant coefficient)

CW1
Language   Parameter         sdc
French     mean F0           0.709
           syllable dur.     0.665
           vowel dur.       -0.379
           intensity         0.328
German     intensity         0.683
           mean F0           0.575
           word duration     0.399
           spect. balance   -0.209
           vowel dur.        0.171
           foot dur.         0.158

CW2
Language   Parameter         sdc
French     mean F0           0.962
           intensity         0.576
           F0 change        -0.419
           vowel dur.        0.279
           word dur.         0.164
German     intensity         0.932
           vowel dur.        0.671
           mean F0           0.515
           syllable dur.    -0.430
           spect. balance   -0.345


Discussion

• The duration effects (accented vs. deaccented) in the ANOVAs are greater for French than for German: is their exploitation in German constrained by the segmental vowel length opposition?

• Spectral balance is included as a DA predictor for German: vowel reduction increases the accented/deaccented opposition (but there is no language × accentuation interaction in the ANOVAs).

• However, the greater importance of duration in French than in German is less clear in the DA, probably because the acoustic cues are correlated. DA is therefore not very suitable for analyzing these data.


Statistical analysis NO-GE (Nordic Prosody data)

• Multivariate ANOVAs for CW1 and CW2 separately, with independent variables:
  • language (NO, GE)
  • focus (broad, early narrow, late narrow)
  • number of syllables in CW (1, 2)

• Multivariate ANOVAs per language (NO, GE)


Results

Main effects for language

Parameter CW1 CW2

vowel dur., syllable dur., word dur., foot dur.
n.s.
n.s.
F0 mean, F0 difference
intensity, spect. balance
F1, F2, F3

Interactions language × accentuation

Parameter CW1 CW2
vowel dur., syllable dur., word dur., foot dur.
F0 mean, F0 difference
intensity, spect. balance
n.s.
F1, F2, F3
n.s. n.s. n.s.
n.s. n.s. n.s.


Results: MANOVAs per language

η² values for accentuation (both CWs, NO and GE)*

* η² = ratio of treatment variance to total variance

Values marked + have η² > 0.5.

Parameter            NO CW1   NO CW2   GE CW1   GE CW2
Vowel duration       .556+    .669+    .038     .020
Syllable duration    .684+    .756+    .390     .168
Word duration        .527+    .627+    .335     .243
Foot duration        .155     .454     .035     .067
F0 mean              .576+    .246     .837+    .884+
F0 difference        .145     .053     .709+    .702+
Intensity            .433     .428     .756+    .884+
Spectral balance     .057     .112     .123     .437
F1                   .134     .058     .331     .392
F2                   .012     .095     .022     .013
F3                   .003     .007     .026     .004
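The η² footnote (treatment variance over total variance) amounts to a ratio of sums of squares. The sketch below computes it for a one-way grouping by focus condition; it ignores the other factors in the design (e.g. number of syllables) and the multivariate part of the analysis, so it illustrates the definition rather than reproducing the values in the table. The function name is hypothetical.

```python
from statistics import mean


def eta_squared(groups):
    """eta^2 = SS_treatment / SS_total.
    groups: one list of measurements per focus condition (e.g. broad, early, late)."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    # Between-condition variation: each condition mean's distance from the
    # grand mean, weighted by the number of tokens in that condition.
    ss_treatment = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    return ss_treatment / ss_total


# Toy values for three focus conditions.
print(eta_squared([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 54 / 60 = 0.9
```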


Results

• η² values are the ratio of treatment variance to total variance, and thus indicate the proportion of the total variance explained by the focus conditions.

• In Norwegian, durational cues (esp. syllable duration) distinguish the three conditions.

• In German, intensity and F0 are the strongest cues to distinguish the three conditions.

• The lack of importance of F0 in Norwegian is most likely an artefact of the different realizations of the lexical tone 1 for mono- and disyllabic stimuli.
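The cue rankings stated above can be checked mechanically against the η² table: the sketch lists, for each language and CW, the parameters whose η² exceeds the 0.5 threshold used on the slide, strongest first. The values are transcribed from the table; ETA2 and strong_cues are hypothetical names.

```python
# eta^2 for accentuation, transcribed from the table above.
COLUMNS = ("NO CW1", "NO CW2", "GE CW1", "GE CW2")
ETA2 = {
    "Vowel duration": (.556, .669, .038, .020),
    "Syllable duration": (.684, .756, .390, .168),
    "Word duration": (.527, .627, .335, .243),
    "Foot duration": (.155, .454, .035, .067),
    "F0 mean": (.576, .246, .837, .884),
    "F0 difference": (.145, .053, .709, .702),
    "Intensity": (.433, .428, .756, .884),
    "Spectral balance": (.057, .112, .123, .437),
    "F1": (.134, .058, .331, .392),
    "F2": (.012, .095, .022, .013),
    "F3": (.003, .007, .026, .004),
}


def strong_cues(column, threshold=0.5):
    """Parameters whose eta^2 exceeds the threshold in one column, strongest first
    (ties keep table order, since Python's sort is stable)."""
    idx = COLUMNS.index(column)
    hits = [(name, vals[idx]) for name, vals in ETA2.items() if vals[idx] > threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


for col in COLUMNS:
    print(col, [name for name, _ in strong_cues(col)])
```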


Results for intensity

• Similar patterns of (normalized) intensity for German and Norwegian

• But greater differences between early, late and broad focus in German than in Norwegian

• In Norwegian, the intensity of CW2 under late and broad focus is lower than that of CW1; this is not the case in German

[Figure: mean normalized vowel intensity of CW1 and CW2 for German and Norwegian, by focus (early, late, broad)]


Results for duration, critical word 1

• Greater (normalized) durational differences between early, broad and late focus in Norwegian than in German

• Similar effect for CW2

[Figure: mean normalized syllable duration and word duration of CW1 for German and Norwegian, by focus (early, late, broad) and number of syllables (1σ, 2σ)]


Results: summary

• German strongly uses intensity to signal prominence

• Norwegian uses duration more. → But Norwegian, like German, has a vowel length opposition and is classified as the same rhythm type (stress-timed), so this disconfirms the hypothesis that the use of acoustic cues depends on their phonological status in a language!

• F0 does play a role (esp. for German), but our measures do not reflect the different accent types well. → There is a difference between Norwegian and German in the peak alignment for early vs. late/broad focus.


Analysis of six languages (SPP 1234 data)

• ANOVAs with language as the independent variable
• Dependent variable: mean change in values from broad to contrastive focus
• Mean change is expressed as a percentage (duration, F0) or in dB (intensity)
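The dependent variable can be sketched as below, assuming paired measurements of the same item under broad and contrastive focus, with the change computed per pair and then averaged; whether the study averaged per-pair changes or compared condition means is not stated on the slide. Function names and values are hypothetical.

```python
from statistics import mean


def mean_percent_change(pairs):
    """Mean change from broad to contrastive focus as a percentage of the broad
    value (used for duration and F0). pairs: (broad, contrastive) per item."""
    return mean(100.0 * (contrastive - broad) / broad for broad, contrastive in pairs)


def mean_db_change(pairs):
    """Mean intensity change from broad to contrastive focus in dB. dB values are
    already logarithmic, so the change is a plain difference, not a percentage."""
    return mean(contrastive - broad for broad, contrastive in pairs)


# Syllable durations in ms (toy values): +50 % and +20 % give a mean of +35 %.
print(mean_percent_change([(100, 150), (200, 240)]))
# Intensities in dB (toy values): +5 dB and +2 dB give a mean of +3.5 dB.
print(mean_db_change([(60, 65), (62, 64)]))
```

Expressing duration and F0 changes as percentages makes languages with different baseline syllable durations and speaker pitch ranges comparable.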


Results for syllable duration of [da]

Languages use the acoustic carriers of prominence to different degrees (CS=Critical Syllable):

CS1: NO (46%) > FR (32%) > RU (25%) ~ GE (22%) > EN (17%) ~ BU (16%)

CS2: NO (53%) > FR (38%) > RU (26%) > GE (17%) ~ BU (17%) ~ EN (14%)

Note: there is no apparent connection between a vowel length opposition and the use of duration for accentuation (in contrast to Rebecca Dauer’s claim).


Results for F0 in text recordings

Languages use the acoustic carriers of prominence to different degrees:

CS1: FR (72%) > EN (61%) ~ GE (58%) > BU (28%) ~ NO (27%) > RU (20%)

CS2: GE (64%) ~ FR (62%) > EN (51%) > BU (38%) > RU (31%) > NO (10%)

Note: despite some shifts in rank among FR, EN and GE, and between NO and RU, from the early (CS1) to the late position (CS2), the general high vs. low dynamics of the groups remain (the ranking for [dada] is even more consistent).


Results for intensity in [dada] recordings

Languages use the acoustic carriers of prominence to different degrees (intensity changes in dB):

CS1: BU (5.8) > FR (3.2) ~ GE (3.0) > RU (2.7) ~ EN (2.5) > NO (1.6)

CS2: BU (6.5) > FR (5.6) = GE (5.6) > EN (4.2) > RU (3.7) > NO (2.8)

Note: the intensity differences are larger for CS2 than for CS1.


Conclusion and possible relevance

• For each acoustic parameter, there is a hierarchy of its exploitation for signalling focus-induced prominence in different languages.

• Similar differences may exist between languages/cultures in the way they exploit different gestures (face, hand, arm, etc.) and/or in the relative exploitation of acoustic and visual cues, e.g. to signal focus or other communicative functions.

• Possibly not only correlation (synchrony), but also complementarity of parameters.


Thank you for your attention