VIU Seminar, 14. - 17. April 2009

Alcoholized Speech: F0 and Rhythm

Florian Schiel
Bavarian Archive for Speech Signals
Institute of Phonetics and Speech Processing
Ludwig-Maximilians-Universität München, Germany

Special thanks to: Chr. Heinrich, S. Barfüßer, I. Dhillon, Prof. Th. Gilg, RaOLG Tourneur


Overview

Motivation, Goals and Earlier Work
ALC Corpus
F0 Analysis
Rhythm Analysis
Discussion: Prosodic Features for ALC

Motivation


Why is alcoholized speech interesting?

Phonetic Forensics:

Speaker identification from alcoholized speech samples

Determine alcoholization from radio traffic recordings (for example the Exxon Valdez tanker grounding in 1989)

Traffic accidents: determine alcoholization from in-car recordings if blood samples are not available

Motivation


Why is alcoholized speech interesting?

Speech Production:

How does intoxication influence planning and motor control?

Speech Perception:

Can listeners judge the alcoholization from a speech sample?
Which features do listeners use for their judgement?

Motivation


Why is alcoholized speech interesting?

Traffic safety:

Can a voice-controlled car judge the alcoholization of its driver (and then take measures)?

OnFocus / OffFocus

Motivation


What has been done already?
Forensic studies (2)
Perception studies (3)
Phonetic features (10)
Recognition (2)

Motivation

Common problems:
mostly male speakers
number of speakers is low (<40), statistics not valid
intoxication measured by breath alcohol concentration (BRAC)
lab speech ('The North Wind and the Sun' etc.)
results partly contradictory


What features have been investigated?
F0 parameters
formant parameters
RMS / loudness
spectral tilt of signal or source signal
speech rate parameters
pause length, number
mispronunciations: deletions, insertions, repairs, stutter
errors in phonetic gestures
  - incomplete gestures (measurement?)
  - lateralisation /r/ -> /l/ (measurement?)
  - shift of place /s/ -> /S/ or /s/ -> /T/
  - nasalisation, denasalisation (?)

Motivation


... and what has not been investigated?
dysfluencies
centralisation of vowels
rhythm
prosodic contours

Motivation

female speech
'outside the lab' speech
command & control speech
dialogue speech
statistically valid data (>100 speakers, >2 million phonemes)

... so let's do it! (Yes, we can!)


Our goals:
verify/falsify reported findings on a larger database
check for rhythm parameters
check for prosodic contours (with Uwe's help?)
check for centralisation of vowels
check for 'linguistic irregularities'
check for gender / age / speech type influences
check on sober control group
perception experiments: what features are important?

Motivation

Help wanted!


alcoholization experiments at the Institute of Legal Medicine
blood alcohol concentration: 0.05 – 0.2%
breath sample (BRAC) and blood sample test (BAC)
15 minutes of recording in two cars, SpeechRecorder, 2 mics
read, monologue, dialogue, command & control (with engine)
annotation: SpeechDat extended by Verbmobil tags
export into BAS Partitur Format, canonical pronunciation by BALLOON, MAUS segmentation
import into Emu hierarchy; F0, formants, RMS
analysis using R

ALC Corpus



Examples


ALC Corpus

Time line and estimates (time axis: Nov 2007 | 2008 | 2009)

First recordings
14 speakers recorded
LREC 2008
First contact with Legal Medicine
61 speakers recorded
82 speakers recorded
First F0 analysis
Rhythm features analysis
Analysis of irregularities
First perception tests
150 speakers recorded
DFG application

75 female + 75 male speakers
age 22 – 75
BAC 0.00 - 0.20%


ALC Corpus

Problems

2nd sober recording: loss rate of 20%

MAUS segmentation of dialogues unreliable
solution: pre-segmentation into speaker and non-speaker parts, then MAUS on each speaker part

gender balance: we need more male speakers

age balance: very few speakers above 50


Analysis

RM-ANOVA requires one measurement per speaker and within-factor combination.

between-factors: sex, age, (drinking habits)
within-factors: alc, speech type, (content, car noise)

Definition:
utterance group (UG): all utterances of one speaker and one within-factor combination

Example:

UG(speaker=006, alc=a, type=spont) = 3 monologues, 2 dialogues and 5 spontaneous commands
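A minimal sketch of how utterances could be pooled into utterance groups, so that each speaker contributes exactly one measurement per within-factor combination. The record layout, the factor code "na" for the sober condition and all values are hypothetical and only serve the illustration.

```python
from collections import defaultdict

# Hypothetical utterance records: (speaker, alc, speech_type, value).
# Codes and numbers are made up for this sketch, not taken from the corpus.
utterances = [
    ("006", "a", "spont", 210.0),
    ("006", "a", "spont", 214.0),
    ("006", "a", "read", 205.0),
    ("006", "na", "read", 198.0),
    ("007", "a", "read", 120.0),
]


def build_utterance_groups(records):
    """Collect the values of all utterances that share one speaker
    and one within-factor combination (here: alc and speech type)."""
    groups = defaultdict(list)
    for speaker, alc, speech_type, value in records:
        groups[(speaker, alc, speech_type)].append(value)
    return dict(groups)


ugs = build_utterance_groups(utterances)

# Each key is one cell of the repeated-measures design; a feature is
# later reduced to a single number per key.
for key in sorted(ugs):
    print(key, len(ugs[key]))
```

Keying the groups by the full factor tuple keeps the step independent of which within-factors are actually used in a given analysis.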


F0 Analysis

F0 from Schäfer-Vincent pitch period detector (Emu)

1. F0 Median Fm over utterance group (UG)


F0 Analysis

2. F0 quarter-quantile distances Fqq over UG
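The two global measures can be sketched as follows. This assumes that the quarter-quantile distance is the distance between the 25% and 75% quantiles, that all voiced F0 frames of an utterance group are pooled, and that unvoiced frames are coded as 0; none of these details are stated on the slides.

```python
import statistics


def f0_median_and_fqq(f0_frames):
    """Return (Fm, Fqq) for the pooled F0 frames of one utterance group.

    Assumptions of this sketch: frames with a value of 0 are unvoiced
    and are dropped; Fqq is taken as Q3 - Q1.
    """
    voiced = sorted(v for v in f0_frames if v > 0)
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    fm = statistics.median(voiced)
    # n=4 yields the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(voiced, n=4, method="inclusive")
    return fm, q3 - q1


# Hypothetical frame values in Hz, with two unvoiced frames.
fm, fqq = f0_median_and_fqq([0, 100, 110, 0, 120, 130, 140])
print(fm, fqq)
```

Median and quantile distance are both insensitive to isolated octave errors of the pitch detector, which is one reason to prefer them over mean and standard deviation here.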


F0 Analysis

3. F0 in lexically accented vowels /a: e: E: i: u: o:/ in same context in read speech

22 female / 24 male speakers

Results:

Median and quarter-quantile distance of F0 behave like the global values, with the following exceptions:

no significant increase of Fm for male speakers in back vowels /o:/ and /u:/

no significant increase of Fqq in /a:/, /o:/ and /u:/


F0 Analysis

4. F0 change per speaker

read speech: Fm(alc) – Fm(non-alc)

45 female, 37 male
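As an illustration, the per-speaker change could be computed from the Fm values of the two conditions as below. The dictionary layout and the condition labels are hypothetical; speakers without a sober recording are skipped.

```python
def f0_change_per_speaker(fm_by_ug):
    """Map each speaker to Fm(alc) - Fm(non-alc).

    fm_by_ug: dict {(speaker, condition): Fm in Hz}, where condition is
    "alc" or "non-alc" (labels chosen for this sketch).
    """
    changes = {}
    for (speaker, condition), fm_alc in fm_by_ug.items():
        if condition != "alc":
            continue
        fm_sober = fm_by_ug.get((speaker, "non-alc"))
        if fm_sober is not None:  # no sober recording: no difference
            changes[speaker] = fm_alc - fm_sober
    return changes


# Hypothetical Fm values; speaker "s3" has no sober recording.
example = {
    ("s1", "alc"): 210.0, ("s1", "non-alc"): 200.0,
    ("s2", "alc"): 118.0, ("s2", "non-alc"): 121.0,
    ("s3", "alc"): 190.0,
}
changes = f0_change_per_speaker(example)
print(changes)
```

A positive value means that the speaker's median F0 was higher in the alcoholized recording.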


F0 Analysis

5. Hypothesis: F0 + energy contours differ

Example:

simple declarative sentences with single phrase

calculate F0 by Schäfer-Vincent

linearly interpolate F0 gaps

calculate 2nd (tilt) and 3rd (curvature) coefficients of the Discrete Cosine Transform (DCT)
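The contour parametrization can be sketched in plain Python as follows. Several details are assumptions of this sketch and not taken from the slides: unvoiced frames are coded as 0, unvoiced frames at the edges are trimmed instead of extrapolated, and the DCT-II is scaled so that coefficient 0 equals the mean of the contour. The Emu implementation may scale differently.

```python
import math


def interpolate_gaps(f0):
    """Trim unvoiced edges (0 = unvoiced) and fill interior gaps
    by linear interpolation between the neighbouring voiced frames."""
    voiced_idx = [i for i, v in enumerate(f0) if v > 0]
    if len(voiced_idx) < 2:
        raise ValueError("need at least two voiced frames")
    out = []
    for left, right in zip(voiced_idx, voiced_idx[1:]):
        span = right - left
        for step in range(span):
            out.append(f0[left] + (f0[right] - f0[left]) * step / span)
    out.append(f0[voiced_idx[-1]])
    return out


def dct_coefficients(x, n_coef=3):
    """First n_coef DCT-II coefficients of x.

    Scaling (assumption): coefficient 0 is the mean (bias), the others
    are the amplitudes of half-cosine waves fitted to the contour.
    """
    n = len(x)
    coefs = []
    for k in range(n_coef):
        s = sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                for i, v in enumerate(x))
        coefs.append(s / n if k == 0 else 2.0 * s / n)
    return coefs


# Hypothetical falling contour in Hz with one unvoiced gap.
contour = interpolate_gaps([0, 130, 0, 110, 100, 0])
bias, tilt, curvature = dct_coefficients(contour)
print(contour, bias, tilt, curvature)
```

With this sign convention a falling contour gives a positive coefficient 1, and a purely linear contour gives a coefficient 2 of zero.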


F0 Analysis

blue: raw F0
red: linear interpolation
green: DCT coefficients 0-2

DCT-0 = 313.62 (bias), DCT-1 = 35.73 (tilt), DCT-2 = -0.93 (curvature)

DCT-0 = 338.17 (bias), DCT-1 = 31.01 (tilt), DCT-2 = -3.92 (curvature)


F0 Analysis

2-dim. plot of DCT-1 (tilt) vs. DCT-2 (curvature)

-> centroids identical, variation increases for alcoholized speech


Rhythm

Rhythm in this context: the segmental structure of V, C and P clusters

syllable nuclei = middle of V cluster
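A minimal sketch of the cluster structure, assuming that a segmentation is available as time-ordered (class, start, end) tuples and that V, C and P stand for voiced stretches, unvoiced stretches and pauses. The tuple layout is hypothetical and does not reflect the Emu data format.

```python
def build_clusters(segments):
    """Merge adjacent segments of the same class into clusters.

    segments: time-ordered list of (cls, start, end), cls in {"V", "C", "P"}.
    """
    clusters = []
    for cls, start, end in segments:
        if clusters and clusters[-1][0] == cls:
            # Extend the running cluster to the end of this segment.
            clusters[-1] = (cls, clusters[-1][1], end)
        else:
            clusters.append((cls, start, end))
    return clusters


def syllable_nuclei(clusters):
    """A syllable nucleus is placed at the temporal middle of each V cluster."""
    return [(start + end) / 2 for cls, start, end in clusters if cls == "V"]


# Hypothetical segmentation in seconds.
segments = [
    ("P", 0.0, 0.5), ("C", 0.5, 0.625), ("C", 0.625, 0.75),
    ("V", 0.75, 1.0), ("C", 1.0, 1.25),
    ("V", 1.25, 1.5), ("V", 1.5, 1.75), ("P", 1.75, 2.0),
]
clusters = build_clusters(segments)
nuclei = syllable_nuclei(clusters)
print(clusters)
print(nuclei)
```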


Rhythm

Rhythm features

Two basic types of measurements:

counts (normalized over time or on number of syllables) or proportions, calculated across the UG -> one measurement per UG : <feature>

multiple measurements (e.g. per syllable) averaged across UG, usually expressed as mean (.m) and standard deviation (.sd) -> two values per UG : <feature>.m, <feature>.sd

Usually the initial and final silence interval of a recording is disregarded.


Rhythm feature overview

Voicing
  - %V : proportion (time) of voiced signal

Speech rate
  - sylrate : number of syllables (nuclei) per sec

Silence intervals
  - ps-persyl : number of short pauses (<1sec) per syllable
  - ps-persec : number of short pauses per sec
  - pl-persyl : number of long pauses (>1sec) per syllable
  - pl-persec : number of long pauses per sec
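The count and proportion features listed above could be derived from a cluster list as sketched below. Assumptions of this sketch: clusters are (class, start, end) tuples with P marking pauses and each V cluster counting as one syllable, all per-second values are normalized on the duration between the initial and final silence, and a pause of exactly 1 sec is counted as long.

```python
def rhythm_counts(clusters, long_pause_s=1.0):
    """Return %V, sylrate and the four pause-count features for one
    list of (cls, start, end) clusters, cls in {"V", "C", "P"}."""
    trimmed = list(clusters)
    # The initial and final silence interval is disregarded.
    while trimmed and trimmed[0][0] == "P":
        trimmed.pop(0)
    while trimmed and trimmed[-1][0] == "P":
        trimmed.pop()
    n_syl = sum(1 for cls, _, _ in trimmed if cls == "V")
    if n_syl == 0:
        raise ValueError("no voiced cluster found")
    total = trimmed[-1][2] - trimmed[0][1]
    voiced = sum(end - start for cls, start, end in trimmed if cls == "V")
    pauses = [end - start for cls, start, end in trimmed if cls == "P"]
    n_short = sum(1 for d in pauses if d < long_pause_s)
    n_long = len(pauses) - n_short
    return {
        "%V": voiced / total,
        "sylrate": n_syl / total,
        "ps-persyl": n_short / n_syl,
        "ps-persec": n_short / total,
        "pl-persyl": n_long / n_syl,
        "pl-persec": n_long / total,
    }


# Hypothetical clusters in seconds: one short and one long internal pause.
example_clusters = [
    ("P", 0.0, 0.5), ("V", 0.5, 1.0), ("C", 1.0, 1.25), ("P", 1.25, 1.5),
    ("V", 1.5, 2.0), ("P", 2.0, 3.5), ("V", 3.5, 4.0), ("C", 4.0, 4.5),
    ("P", 4.5, 5.0),
]
features = rhythm_counts(example_clusters)
print(features)
```

For a whole utterance group, the counts and durations would be summed over all utterances before dividing, so that the group yields one value per feature.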

Rhythm


Rhythm feature overview (Cont.)

Silence dimensions
  - durs : length of short pauses (<1sec)

Cluster dimensions
  - deltaV, deltaC (Ramus et al. 1999) : voiced and unvoiced cluster lengths
  - deltaSN : nuclei distances

Cluster structure
  - nPVI-V, nPVI-C (Grabe & Low 2004) : length difference of consecutive clusters normalized to average length of both clusters
  - nPVI-SN : distance difference of consecutive syllable nuclei normalized to average length of both distances
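A sketch of the multiple-measurement features, each reduced to a mean (.m) and a standard deviation (.sd). The nPVI term follows the verbal definition above; whether the original analysis scales it by 100 or uses the sample or the population standard deviation is not stated, so the unscaled term and the sample standard deviation are assumptions here.

```python
import statistics


def mean_and_sd(values):
    """Return (.m, .sd) of a list with at least two values."""
    return statistics.mean(values), statistics.stdev(values)


def nucleus_distances(nuclei):
    """Distances between consecutive syllable nuclei (basis of deltaSN)."""
    return [b - a for a, b in zip(nuclei, nuclei[1:])]


def npvi_terms(durations):
    """Absolute difference of consecutive durations, normalized to the
    average of the two durations involved."""
    return [abs(a - b) / ((a + b) / 2)
            for a, b in zip(durations, durations[1:])]


# Hypothetical voiced cluster durations and nucleus times in seconds.
v_durations = [0.5, 0.5, 1.5]
delta_v = mean_and_sd(v_durations)             # deltaV.m, deltaV.sd
npvi_v = mean_and_sd(npvi_terms(v_durations))  # nPVI-V.m, nPVI-V.sd
sn_dist = nucleus_distances([0.5, 1.0, 2.0, 2.5])
npvi_sn = mean_and_sd(npvi_terms(sn_dist))     # nPVI-SN.m, nPVI-SN.sd
print(delta_v, npvi_v, sn_dist, npvi_sn)
```

The same two helper functions serve V clusters, C clusters and nucleus distances, which mirrors the way the feature names differ only in their suffix.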

Rhythm


Some results (45 female + 37 male, read + command speech)

Rhythm

RM-ANOVA: p = 0.0014 | p > 0.05 | p < 0.001 | p > 0.05 | p = 0.049

Post hoc speech type: no | - | only command | - | only read

Post hoc gender: no interaction with gender in any feature


Conclusion

Work in progress, therefore: no conclusions!

But ...


Prosodic Features for ALC?

---

Thank you!

Discussion
