VIU Seminar, 14. - 17. April 2009

Alcoholized Speech: F0 and Rhythm

Florian Schiel
Bavarian Archive for Speech Signals
Institute of Phonetics and Speech Processing
Ludwig-Maximilians-Universität München, Germany

Special Thanks to: Chr. Heinrich, S. Barfüßer, I. Dhillon, Prof. Th. Gilg, RaOLG Tourneur



Overview

- Motivation, Goals and Earlier Work
- ALC Corpus
- F0 Analysis
- Rhythm Analysis
- Discussion: Prosodic Features for ALC

Motivation


Why is alcoholized speech interesting?

Phonetic Forensics:
- Speaker identification from alcoholized speech samples
- Determine alcoholization from radio traffic recordings (for example, the Exxon Valdez grounding in 1989)
- Traffic accidents: determine alcoholization from in-car recordings if blood samples are not available

Motivation


Why is alcoholized speech interesting?

Speech Production:
- How does intoxication influence planning and motor control?

Speech Perception:
- Can listeners judge the alcoholization from a speech sample?
- Which features do listeners use for their judgement?

Motivation


Why is alcoholized speech interesting?

Traffic safety:
- Can a voice-controlled car judge the alcoholization of its driver (and then take measures)?

OnFocus / OffFocus

Motivation


What has been done already?
- Forensic studies (2)
- Perception studies (3)
- Phonetic features (10)
- Recognition (2)

Motivation

Common problems:
- mostly male speakers
- number of speakers is low (< 40), statistics not valid
- intoxication measured by breath alcohol concentration (BRAC)
- lab speech ('Northwind and the Sun' etc.)
- results partly contradictory


What features have been investigated?
- F0 parameters
- formant parameters
- RMS / loudness
- spectral tilt of signal or source signal
- speech rate parameters
- pause length and number
- mispronunciations: deletions, insertions, repairs, stutter
- errors in phonetic gestures
  - incomplete gestures (measurement?)
  - lateralization /r/ -> /l/ (measurement?)
  - shift of place /s/ -> /S/ or /s/ -> /T/
  - nasalization, de-nasalization (?)

Motivation


... and what has not been investigated?
- dysfluencies
- centralization of vowels
- rhythm
- prosodic contours

Motivation

- female speech
- 'outside the lab' speech
- command & control speech
- dialogue speech
- statistically valid data (> 100 speakers, > 2 million phonemes)

... so let's do it! (Yes, we can!)


Our goals:
- verify/falsify reported findings on a larger database
- check for rhythm parameters
- check for prosodic contours (with Uwe's help?)
- check for centralization of vowels
- check for 'linguistic irregularities'
- check for gender / age / speech type influences
- check on sober control group
- perception experiments: what features are important?

Motivation

Help wanted!


- alcoholization experiments at the Institute of Legal Medicine
- blood alcohol concentration: 0.05 – 0.2%
- breath sample (BRAC) and blood sample test (BAC)
- 15 minutes of recording in two cars, SpeechRecorder, 2 mics
- read, monologue, dialogue, command & control (with engine)
- annotation: SpeechDat extended by Verbmobil tags
- export into BAS Partitur Format, canonical pronunciation by BALLOON, MAUS segmentation
- import into Emu hierarchy, F0, formants, RMS
- analysis using R

ALC Corpus


ALC Corpus

Examples


ALC Corpus

Time line and estimates

Time axis: Nov 2007 | 2008 | 2009 |

Milestones (in the order they appear on the slide):
- First recordings
- 14 speakers recorded
- LREC 2008
- First contact with Legal Medicine
- 61 speakers recorded
- 82 speakers recorded
- First F0 analysis
- Rhythm features analysis
- Analysis of irregularities
- First perception tests
- 150 speakers recorded
- DFG application

75 female + 75 male speakers, age 22 – 75, BAC 0.00 - 0.20%


ALC Corpus

Problems:
- 2nd sober recording: loss rate of 20%
- MAUS segmentation of dialogues is unreliable; solution: pre-segmentation into speaker and non-speaker parts, then MAUS on each speaker part
- gender balance: we need more male speakers
- age balance: very few speakers above 50


Analysis

RM-ANOVA requires one measurement per speaker and within-factor combination.

between-factors: sex, age, (drinking habits)
within-factors: alc, speech type, (content, car noise)

Definition:
utterance group (UG): all utterances of one speaker and one within-factor combination

Example:
UG(speaker=006, alc=a, type=spont) = 3 monologues, 2 dialogues and 5 spontaneous commands
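The grouping into UGs can be sketched roughly as follows. This is only an illustration, not the R/Emu code used for the corpus: the dictionary keys `speaker`, `alc`, `type` and `f0`, the code `na` for sober recordings, and all function names are assumptions made for this example.

```python
from collections import defaultdict
from statistics import median

def group_utterances(utterances):
    """Collect utterances into utterance groups (UGs).

    Each utterance is a dict with the hypothetical keys 'speaker', 'alc',
    'type' and 'f0' (a list of F0 samples in Hz).
    A UG is identified by (speaker, alc, type).
    """
    groups = defaultdict(list)
    for utt in utterances:
        groups[(utt["speaker"], utt["alc"], utt["type"])].append(utt)
    return dict(groups)

def one_value_per_ug(groups, feature):
    """Reduce every UG to a single measurement, as RM-ANOVA requires."""
    return {key: feature(utts) for key, utts in groups.items()}

def pooled_f0_median(utts):
    """Example feature: median over all F0 samples pooled across the UG."""
    return median(f0 for utt in utts for f0 in utt["f0"])

# Toy data: two intoxicated and one sober spontaneous utterance of speaker 006.
utterances = [
    {"speaker": "006", "alc": "a", "type": "spont", "f0": [200.0, 210.0]},
    {"speaker": "006", "alc": "a", "type": "spont", "f0": [220.0]},
    {"speaker": "006", "alc": "na", "type": "spont", "f0": [190.0, 195.0, 205.0]},
]
groups = group_utterances(utterances)
table = one_value_per_ug(groups, pooled_f0_median)
```

Pooling all samples of a UG before taking the statistic is one possible reading of "over UG"; averaging per-utterance values would be another.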


F0 Analysis

F0 from the Schäfer-Vincent pitch period detector (Emu)

1. F0 Median Fm over utterance group (UG)


F0 Analysis

2. F0 quarter-quantile distance Fqq over UG
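A minimal sketch of the two global F0 measures, assuming that the quarter-quantile distance Fqq is the distance between the 25% and 75% quantiles (the interquartile range) and that unvoiced frames are coded as 0 Hz. Both are assumptions of this example, and the function name is hypothetical.

```python
import numpy as np

def f0_median_and_qq(f0_values):
    """Return (Fm, Fqq) for the pooled F0 samples of one utterance group.

    Fm  : median of the voiced F0 samples
    Fqq : 75% quantile minus 25% quantile (assumed definition)
    """
    f0 = np.asarray(f0_values, dtype=float)
    f0 = f0[f0 > 0]  # assumption: unvoiced frames are coded as 0 Hz
    if f0.size == 0:
        raise ValueError("no voiced F0 samples in utterance group")
    q1, fm, q3 = np.percentile(f0, [25, 50, 75])
    return fm, q3 - q1

fm, fqq = f0_median_and_qq([0, 100, 110, 120, 130, 140, 0])
```

Both measures are robust against isolated octave errors of the pitch detector, which is a plausible reason to prefer them over mean and standard deviation.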


F0 Analysis

3. F0 in lexically accented vowels /a: e: E: i: u: o:/ in same context in read speech

22 female / 24 male speakers

Results:

Median and quarter-quantile distance of F0 behave like the global values, with the following exceptions:
- no significant increase of Fm for male speakers in back vowels /o:/ and /u:/
- no significant increase of Fqq in /a:/, /o:/ and /u:/


F0 Analysis

4. F0 change per speaker

read speech: Fm(alc) – Fm(non-alc)

45 female / 37 male
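The per-speaker change is simply the difference of the two medians. A small sketch, assuming the medians are already available per speaker and condition; the condition codes `a` and `na` and the function name are hypothetical.

```python
def f0_change_per_speaker(fm_by_ug):
    """Compute Fm(alc) - Fm(non-alc) for every speaker with both recordings.

    fm_by_ug maps (speaker, alc) -> F0 median in Hz, where alc is
    'a' (intoxicated) or 'na' (sober); these codes are assumptions.
    """
    changes = {}
    for (speaker, alc), fm in fm_by_ug.items():
        if alc != "a":
            continue
        sober = fm_by_ug.get((speaker, "na"))
        if sober is not None:  # skip speakers without a sober recording
            changes[speaker] = fm - sober
    return changes

changes = f0_change_per_speaker({
    ("006", "a"): 215.0, ("006", "na"): 205.0,
    ("007", "a"): 118.0, ("007", "na"): 121.0,
    ("008", "a"): 190.0,  # sober recording missing
})
```

A positive value means that the speaker's F0 median rose under intoxication.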


F0 Analysis

5. Hypothesis: F0 + energy contours differ

Example:

- take simple declarative sentences with a single phrase
- calculate F0 with the Schäfer-Vincent detector
- linearly interpolate F0 gaps
- calculate the 2nd (tilt) and 3rd (curvature) coefficients of the Discrete Cosine Transform (DCT)
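The interpolation and DCT steps can be sketched as below. This is not the code used for the slides: the DCT-II scaling (coefficient 0 equals the mean of the contour) and the coding of unvoiced frames as 0 or NaN are assumptions, and the function names are hypothetical.

```python
import numpy as np

def interpolate_gaps(f0):
    """Linearly interpolate unvoiced frames (assumed coded as 0 or NaN).

    Leading and trailing gaps are filled with the nearest voiced value.
    """
    f0 = np.asarray(f0, dtype=float)
    voiced = np.isfinite(f0) & (f0 > 0)
    if not voiced.any():
        raise ValueError("contour contains no voiced frames")
    idx = np.arange(len(f0))
    return np.interp(idx, idx[voiced], f0[voiced])

def dct_coefficients(contour, n_coeffs=3):
    """First n_coeffs DCT-II coefficients of a contour.

    Scaling (an assumption): coefficient 0 is the mean (bias),
    coefficient k > 0 is (2/N) * sum(x[i] * cos(pi * k * (i + 0.5) / N)).
    Coefficient 1 reflects tilt, coefficient 2 curvature.
    """
    x = np.asarray(contour, dtype=float)
    n = len(x)
    t = (np.arange(n) + 0.5) * np.pi / n
    return np.array([
        np.sum(x * np.cos(k * t)) * (1.0 if k == 0 else 2.0) / n
        for k in range(n_coeffs)
    ])

filled = interpolate_gaps([100.0, 0.0, 0.0, 130.0])
flat = dct_coefficients([200.0] * 20)
falling = dct_coefficients([300.0 - 2.0 * i for i in range(50)])
```

With this sign convention a falling contour yields a positive tilt coefficient and a flat contour yields zero tilt and curvature.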


F0 Analysis

Figure legend:
- blue: raw F0
- red: linear interpolation
- green: DCT coefficients 0-2

First example: DCT-0 = 313.62 (bias), DCT-1 = 35.73 (tilt), DCT-2 = -0.93 (curvature)
Second example: DCT-0 = 338.17 (bias), DCT-1 = 31.01 (tilt), DCT-2 = -3.92 (curvature)


F0 Analysis

2-dim. plot of DCT-1 (tilt) vs. DCT-2 (curvature)

-> centroids identical, variation increases for alcoholized speech


Rhythm

Rhythm in this context: the segmental structure of V, C and P clusters

syllable nuclei = middle of V cluster
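A rough sketch of how clusters and nuclei could be derived from a segmentation. It assumes that every segment has already been classified as V (voiced), C (unvoiced) or P (pause) and carries start and end times in seconds; the tuple layout and the function names are hypothetical.

```python
from itertools import groupby

def clusters(segments):
    """Merge consecutive segments of the same class into clusters.

    segments: list of (cls, start, end), cls in {'V', 'C', 'P'}, times in sec.
    Returns a list of (cls, start, end) clusters.
    """
    result = []
    for cls, run in groupby(segments, key=lambda seg: seg[0]):
        run = list(run)
        result.append((cls, run[0][1], run[-1][2]))
    return result

def syllable_nuclei(cluster_list):
    """Time points of the syllable nuclei: the middle of every V cluster."""
    return [(start + end) / 2 for cls, start, end in cluster_list if cls == "V"]

segments = [
    ("P", 0.0, 0.2), ("C", 0.2, 0.3), ("V", 0.3, 0.4), ("V", 0.4, 0.5),
    ("C", 0.5, 0.6), ("V", 0.6, 0.8), ("P", 0.8, 1.0),
]
cl = clusters(segments)
nuclei = syllable_nuclei(cl)
```

In the toy data the two adjacent V segments merge into one cluster, so the utterance has two nuclei, at about 0.4 s and 0.7 s.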


Rhythm

Rhythm features

Two basic types of measurements:

counts (normalized over time or on number of syllables) or proportions, calculated across the UG -> one measurement per UG : <feature>

multiple measurements (e.g. per syllable) averaged across UG, usually expressed as mean (.m) and standard deviation (.sd) -> two values per UG : <feature>.m, <feature>.sd

Usually the initial and final silence interval of a recording is disregarded.


Rhythm feature overview

Voicing
- %V: proportion (time) of voiced signal

Speech rate
- sylrate: number of syllables (nuclei) per sec

Silence intervals
- ps-persyl: number of short pauses (< 1 sec) per syllable
- ps-persec: number of short pauses per sec
- pl-persyl: number of long pauses (> 1 sec) per syllable
- pl-persec: number of long pauses per sec
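These count and proportion features could be computed roughly as follows. The sketch assumes a list of (class, start, end) clusters with classes V (voiced), C (unvoiced) and P (pause) for one recording, takes the total time as the span without initial and final silence, and counts a pause of exactly 1 sec as long; all of these are assumptions, as is the function name.

```python
def basic_rhythm_features(cluster_list, long_pause=1.0):
    """Voicing, speech rate and pause counts for one recording.

    cluster_list: (cls, start, end) tuples, cls in {'V', 'C', 'P'}, times in sec.
    """
    cl = list(cluster_list)
    # Disregard the initial and final silence interval.
    while cl and cl[0][0] == "P":
        cl.pop(0)
    while cl and cl[-1][0] == "P":
        cl.pop()
    n_syl = sum(1 for cls, _, _ in cl if cls == "V")  # one nucleus per V cluster
    if n_syl == 0:
        raise ValueError("no voiced cluster found")
    total = sum(end - start for _, start, end in cl)
    voiced = sum(end - start for cls, start, end in cl if cls == "V")
    pauses = [end - start for cls, start, end in cl if cls == "P"]
    n_short = sum(1 for d in pauses if d < long_pause)
    n_long = len(pauses) - n_short  # assumption: exactly 1 sec counts as long
    return {
        "%V": voiced / total,
        "sylrate": n_syl / total,
        "ps-persyl": n_short / n_syl,
        "ps-persec": n_short / total,
        "pl-persyl": n_long / n_syl,
        "pl-persec": n_long / total,
    }

features = basic_rhythm_features([
    ("P", 0.0, 0.5), ("V", 0.5, 1.0), ("C", 1.0, 1.5), ("P", 1.5, 2.0),
    ("V", 2.0, 2.5), ("P", 2.5, 4.0), ("V", 4.0, 4.5), ("P", 4.5, 5.0),
])
```

For a whole utterance group the counts and durations would be summed over all recordings before the ratios are taken.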

Rhythm


Rhythm feature overview (Cont.)

Silence dimensions
- durs: length of short pauses (< 1 sec)

Cluster dimensions
- deltaV, deltaC (Ramus et al. 1999): voiced and unvoiced cluster lengths
- deltaSN: nuclei distances

Cluster structure
- nPVI-V, nPVI-C (Grabe & Low 2004): length difference of consecutive clusters normalized to average length of both clusters
- nPVI-SN: distance difference of consecutive syllable nuclei normalized to average length of both distances
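A minimal sketch of the two kinds of measures for a sequence of cluster durations (or nuclei distances). It assumes the usual published definitions: delta as the standard deviation of the durations, and nPVI as the mean normalized difference of consecutive durations scaled by 100. The scaling factor, the use of the population standard deviation, and the function names are assumptions of this example.

```python
from statistics import mean, pstdev

def delta(durations):
    """Variability of durations as population standard deviation (assumed)."""
    if len(durations) < 2:
        raise ValueError("need at least two durations")
    return pstdev(durations)

def npvi(durations):
    """Normalized pairwise variability index.

    For each pair of consecutive durations the absolute difference is divided
    by the average of the pair; the mean over all pairs is scaled by 100.
    """
    if len(durations) < 2:
        raise ValueError("need at least two durations")
    terms = [abs(a - b) / ((a + b) / 2)
             for a, b in zip(durations, durations[1:])]
    return 100 * mean(terms)

regular = npvi([0.1, 0.1, 0.1])      # perfectly even durations
alternating = npvi([0.1, 0.3])       # strong short-long contrast
spread = delta([0.1, 0.3])
```

Because each difference is normalized by the local average, nPVI is less sensitive to overall speech rate than the delta measures.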

Rhythm


Some results (45 female + 37 male, read + command speech)

Rhythm

Feature (names not recoverable from the figure):   1           2         3             4         5
RM-ANOVA:                                          p = 0.0014  p > 0.05  p < 0.001     p > 0.05  p = 0.049
Post hoc speech type:                              no          -         only command  -         only read

Post hoc gender: no interaction with gender in any feature


Conclusion

Work in progress, therefore: no conclusions!

But ...


Prosodic Features for ALC?

---

Thank you!

Discussion