VIU Seminar 14. - 17. April 2009
Alcoholized Speech: F0 and Rhythm
Florian Schiel
Bavarian Archive for Speech Signals
Institute of Phonetics and Speech Processing
Ludwig-Maximilians-Universität München, Germany
Special Thanks to: Chr. Heinrich, S. Barfüßer, I. Dhillon, Prof. Th. Gilg, RaOLG Tourneur
Overview
- Motivation, Goals and Earlier Work
- ALC Corpus
- F0 Analysis
- Rhythm Analysis
- Discussion: Prosodic Features for ALC
Motivation
Why is alcoholized speech interesting?
Phonetic Forensics:
- Speaker identification from alcoholized speech samples
- Determine alcoholization from radio traffic recordings (for example, the Exxon Valdez grounding in 1989)
- Traffic accidents: determine alcoholization from in-car recordings if blood samples are not available
Motivation
Why is alcoholized speech interesting?
Speech Production:
- How does intoxication influence planning and motor control?
Speech Perception:
- Can listeners judge alcoholization from a speech sample?
- Which features do listeners use for their judgement?
Motivation
Why is alcoholized speech interesting?
Traffic safety:
- Can a voice-controlled car judge the alcoholization of its driver (and then take measures)?
- OnFocus / OffFocus
Motivation
What has been done already?
- Forensic studies (2)
- Perception studies (3)
- Phonetic features (10)
- Recognition (2)
Motivation
Common problems:
- mostly male speakers
- number of speakers is low (<40), statistics not valid
- intoxication measured by breath alcohol concentration (BRAC)
- lab speech ('The North Wind and the Sun' etc.)
- results partly contradictory
What features have been investigated?
- F0 parameters
- formant parameters
- RMS / loudness
- spectral tilt of signal or source signal
- speech rate parameters
- pause length, number
- mispronunciations: deletions, insertions, repairs, stutter
- errors in phonetic gestures
  - incomplete gestures (measurement?)
  - lateralization /r/ -> /l/ (measurement?)
  - shift of place /s/ -> /S/ or /s/ -> /T/
  - nasalization, denasalization (?)
Motivation
... and what has not been investigated?
- dysfluencies
- centralization of vowels
- rhythm
- prosodic contours
Motivation
- female speech
- speech 'outside the lab'
- command & control speech
- dialogue speech
- statistically valid data (>100 speakers, >2 million phonemes)
... so let's do it! (Yes, we can!)
Our goals:
- verify/falsify reported findings on a larger database
- check for rhythm parameters
- check for prosodic contours (with Uwe's help?)
- check for centralization of vowels
- check for 'linguistic irregularities'
- check for gender / age / speech type influences
- check on sober control group
- perception experiments: what features are important?
Motivation
Help wanted!
- alcoholization experiments at the Institute of Legal Medicine
- blood alcohol concentration: 0.05 – 0.2%
- breath sample (BRAC) and blood sample test (BAC)
- 15 minutes of recording in two cars, SpeechRecorder, 2 mics
- read, monologue, dialogue, command & control (with engine)
- annotation: SpeechDat extended by Verbmobil tags
- export into BAS Partitur Format, canonical pronunciation by BALLOON, MAUS segmentation
- import into Emu hierarchy, F0, formants, RMS
- analysis using R
ALC Corpus
ALC Corpus
Time line and estimates
Time axis: Nov 2007 | 2008 | 2009
Milestones:
- First recordings, 14 speakers recorded
- LREC 2008
- First contact with Legal Medicine
- 61 speakers recorded
- 82 speakers recorded
- First F0 Analysis
- Rhythm features Analysis
- Analysis of irregularities
- First perception tests
- 150 speakers recorded
- DFG application
75 female + 75 male speakers, age 22 – 75, BAC 0.00 - 0.20%
ALC Corpus
Problems
- 2nd sober recording: loss rate of 20%
- MAUS segmentation of dialogues unreliable; solution: pre-segmentation into speaker and non-speaker parts, then MAUS on each speaker part
- gender balance: we need more male speakers
- age balance: very few speakers above 50
Analysis
RM-ANOVA requires one measurement per speaker and within-factor combination.
between-factors: sex, age, (drinking habits)
within-factors: alc, speech type, (content, car noise)
Definition: utterance group (UG): all utterances of one speaker and one within-factor combination
Example:
UG(speaker=006, alc=a, type=spont) = 3 monologues, 2 dialogues and 5 spontaneous commands
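The UG definition can be sketched as a simple grouping step. This is a minimal illustration, not the corpus tooling: the record layout, the condition codes "a" / "na" and the utterance names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical utterance records: (speaker, alc, speech_type, utterance_id).
# "a" = alcoholized, "na" = sober; these codes and names are assumptions.
utterances = [
    ("006", "a", "spont", "mono_1"),
    ("006", "a", "spont", "dial_1"),
    ("006", "a", "read", "read_1"),
    ("006", "na", "read", "read_1"),
    ("007", "a", "read", "read_1"),
]

def group_into_ugs(records):
    """Map each (speaker, alc, type) combination to its list of utterances."""
    ugs = defaultdict(list)
    for speaker, alc, speech_type, utt_id in records:
        ugs[(speaker, alc, speech_type)].append(utt_id)
    return dict(ugs)

ugs = group_into_ugs(utterances)
```

Each key of `ugs` then corresponds to one cell of the RM-ANOVA design, so any feature computed over the utterances of a key yields exactly one measurement per speaker and within-factor combination.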
F0 Analysis
F0 from Schaefer-Vincent pitch period detector (Emu)
1. F0 Median Fm over utterance group (UG)
F0 Analysis
3. F0 in lexically accented vowels /a: e: E: i: u: o:/ in same context in read speech
22 female / 24 male speakers
Results:
Median and quarter-quantile distance of F0 behave like the global values, with the following exceptions:
- no significant increase of Fm for male speakers in back vowels /o:/ and /u:/
- no significant increase of Fqq in /a:/, /o:/ and /u:/
F0 Analysis
4. F0 change per speaker
read speech: Fm(alc) – Fm(non-alc)
45 female, 37 male
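The per-speaker change Fm(alc) – Fm(non-alc) can be sketched as follows. The F0 values are invented for illustration, and the key layout (speaker, condition) is an assumption, not the corpus format.

```python
from statistics import median

# Hypothetical voiced-frame F0 values (Hz) pooled per utterance group.
# Keys are (speaker, condition) with "a" = alcoholized, "na" = sober.
f0_by_ug = {
    ("006", "na"): [198.0, 202.0, 205.0, 210.0, 201.0],
    ("006", "a"): [204.0, 209.0, 215.0, 220.0, 207.0],
    ("007", "na"): [118.0, 121.0, 125.0],
    ("007", "a"): [119.0, 120.0, 124.0],
}

def f0_change_per_speaker(f0_by_ug):
    """Return Fm(alc) - Fm(non-alc) in Hz for every speaker with both UGs."""
    fm = {key: median(values) for key, values in f0_by_ug.items()}
    speakers = sorted({speaker for speaker, _ in fm})
    return {
        s: fm[(s, "a")] - fm[(s, "na")]
        for s in speakers
        if (s, "a") in fm and (s, "na") in fm
    }

change = f0_change_per_speaker(f0_by_ug)
```

A positive value means the speaker's median F0 is higher in the alcoholized recording; speakers lacking either recording are skipped.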
F0 Analysis
5. Hypothesis: F0 + energy contours differ
Example:
- simple declarative sentences with a single phrase
- calculate F0 by Schaefer-Vincent
- linearly interpolate F0 gaps
- calculate 2nd (tilt) and 3rd (curvature) coefficients of the Discrete Cosine Transform (DCT)
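The interpolation and DCT steps can be sketched in plain Python. Several details are assumptions, since the slides do not specify them: unvoiced frames are marked as 0.0, edge gaps are filled with the nearest voiced value, and the DCT-II is scaled so that coefficient 0 equals the mean of the contour.

```python
import math

def interpolate_gaps(f0):
    """Linearly interpolate unvoiced frames (marked 0.0) between voiced ones.

    Leading and trailing gaps are filled with the nearest voiced value
    (an assumption; the slides do not say how the edges are handled).
    """
    voiced = [i for i, v in enumerate(f0) if v > 0.0]
    if not voiced:
        raise ValueError("contour has no voiced frames")
    out = list(f0)
    for i in range(voiced[0]):
        out[i] = f0[voiced[0]]
    for i in range(voiced[-1] + 1, len(f0)):
        out[i] = f0[voiced[-1]]
    for left, right in zip(voiced, voiced[1:]):
        for i in range(left + 1, right):
            frac = (i - left) / (right - left)
            out[i] = f0[left] + frac * (f0[right] - f0[left])
    return out

def dct_coefficients(x, count=3):
    """First `count` DCT-II coefficients; coefficient 0 is scaled to the mean."""
    n = len(x)
    coeffs = []
    for k in range(count):
        scale = 1.0 / n if k == 0 else 2.0 / n  # assumed normalization
        coeffs.append(scale * sum(
            v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(x)))
    return coeffs

raw = [0.0, 220.0, 0.0, 0.0, 190.0, 180.0, 0.0]  # hypothetical F0 track in Hz
contour = interpolate_gaps(raw)
bias, tilt, curvature = dct_coefficients(contour)
```

With this sign convention a falling contour gives a positive tilt coefficient, and a flat contour gives tilt and curvature of zero. Absolute values depend on the chosen scaling, so they are only comparable across contours processed with the same normalization.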
F0 Analysis
blue: raw F0; red: linear interpolation; green: DCT coefficients 0-2
DCT-0 = 313.62 (bias), DCT-1 = 35.73 (tilt), DCT-2 = -0.93 (curvature)
DCT-0 = 338.17 (bias), DCT-1 = 31.01 (tilt), DCT-2 = -3.92 (curvature)
F0 Analysis
2-dim. plot of DCT-1 (tilt) vs. DCT-2 (curvature)
-> centroids identical, variation increases for alcoholized speech
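One way to quantify "same centroid, larger variation" is to compare the mean and the standard deviation of the (tilt, curvature) pairs per condition. The sketch below uses invented points, not corpus data, and the population standard deviation is an arbitrary choice.

```python
from statistics import mean, pstdev

def centroid_and_spread(points):
    """Return ((mean tilt, mean curvature), (sd tilt, sd curvature))."""
    tilts = [p[0] for p in points]
    curvatures = [p[1] for p in points]
    centroid = (mean(tilts), mean(curvatures))
    spread = (pstdev(tilts), pstdev(curvatures))
    return centroid, spread

# Hypothetical (DCT-1, DCT-2) pairs, one per sentence; not corpus data.
sober = [(30.0, -2.0), (34.0, -1.0), (32.0, -3.0)]
alcoholized = [(24.0, -6.0), (40.0, 2.0), (32.0, -2.0)]

sober_centroid, sober_spread = centroid_and_spread(sober)
alc_centroid, alc_spread = centroid_and_spread(alcoholized)
```

In this toy data both conditions share the centroid (32, -2), while the spread in both dimensions is larger for the alcoholized set.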
Rhythm
Rhythm in this context: the segmental structure of V, C and P clusters
syllable nuclei = middle of V cluster
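A minimal sketch of the cluster structure, assuming the input is a list of segments already labelled V (voiced), C (unvoiced) or P (pause) with start and end times in seconds. The segment list is invented for illustration.

```python
from itertools import groupby

def merge_clusters(segments):
    """Merge adjacent segments of the same class into (class, start, end) clusters."""
    clusters = []
    for cls, group in groupby(segments, key=lambda seg: seg[0]):
        group = list(group)
        clusters.append((cls, group[0][1], group[-1][2]))
    return clusters

def syllable_nuclei(clusters):
    """Place one nucleus at the temporal midpoint of every V cluster."""
    return [(start + end) / 2.0 for cls, start, end in clusters if cls == "V"]

# Hypothetical labelled segments: (class, start_s, end_s).
segments = [
    ("P", 0.00, 0.25), ("C", 0.25, 0.35), ("V", 0.35, 0.45),
    ("V", 0.45, 0.55), ("C", 0.55, 0.75), ("V", 0.75, 1.00),
    ("P", 1.00, 1.50),
]
clusters = merge_clusters(segments)
nuclei = syllable_nuclei(clusters)
```

The two adjacent V segments collapse into one cluster, so the example yields two nuclei, one per V cluster.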
Rhythm
Rhythm features
Two basic types of measurements:
counts (normalized over time or on number of syllables) or proportions, calculated across the UG -> one measurement per UG : <feature>
multiple measurements (e.g. per syllable) averaged across UG, usually expressed as mean (.m) and standard deviation (.sd) -> two values per UG : <feature>.m, <feature>.sd
Usually the initial and final silence interval of a recording is disregarded.
Rhythm feature overview
Voicing
- %V: proportion (time) of voiced signal
Speech rate
- sylrate: number of syllables (nuclei) per sec
Silence intervals
- ps-persyl: number of short pauses (<1sec) per syllable
- ps-persec: number of short pauses per sec
- pl-persyl: number of long pauses (>1sec) per syllable
- pl-persec: number of long pauses per sec
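The count and proportion features can be sketched from a list of V/C/P clusters. Assumptions not stated on the slides: the time base is the span between the initial and final silence (internal pauses included), each V cluster counts as one syllable, and a pause of exactly 1 sec counts as long. The cluster list is invented.

```python
def strip_edge_silence(clusters):
    """Drop a leading and a trailing pause cluster, if present."""
    trimmed = list(clusters)
    if trimmed and trimmed[0][0] == "P":
        trimmed = trimmed[1:]
    if trimmed and trimmed[-1][0] == "P":
        trimmed = trimmed[:-1]
    return trimmed

def rhythm_counts(clusters, long_pause_s=1.0):
    """Compute %V, sylrate and the four pause-count features for one UG."""
    clusters = strip_edge_silence(clusters)
    total = sum(end - start for _, start, end in clusters)
    voiced = sum(end - start for cls, start, end in clusters if cls == "V")
    n_syl = sum(1 for cls, _, _ in clusters if cls == "V")
    pauses = [end - start for cls, start, end in clusters if cls == "P"]
    n_short = sum(1 for d in pauses if d < long_pause_s)
    n_long = len(pauses) - n_short
    return {
        "%V": voiced / total,  # returned as a proportion between 0 and 1
        "sylrate": n_syl / total,
        "ps-persyl": n_short / n_syl,
        "ps-persec": n_short / total,
        "pl-persyl": n_long / n_syl,
        "pl-persec": n_long / total,
    }

# Hypothetical clusters: (class, start_s, end_s).
clusters = [
    ("P", 0.0, 0.5), ("V", 0.5, 1.0), ("C", 1.0, 1.25), ("P", 1.25, 1.5),
    ("V", 1.5, 2.0), ("P", 2.0, 3.5), ("V", 3.5, 4.0), ("C", 4.0, 4.5),
    ("P", 4.5, 5.0),
]
features = rhythm_counts(clusters)
```

The sketch does not guard against recordings without any V cluster; real data would need that check before dividing by the syllable count.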
Rhythm
Rhythm feature overview (Cont.)
Silence dimensions
- durs: length of short pauses (<1sec)
Cluster dimensions
- deltaV, deltaC (Ramus et al. 1999): voiced and unvoiced cluster lengths
- deltaSN: nuclei distances
Cluster structure
- nPVI-V, nPVI-C (Grabe & Low 2004): length difference of consecutive clusters normalized to average length of both clusters
- nPVI-SN: distance difference of consecutive syllable nuclei normalized to average length of both distances
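The duration-based features can be sketched as below. The factor 100 follows the usual definition of the nPVI; the use of the population standard deviation and the example durations are assumptions.

```python
from statistics import mean, pstdev

def duration_stats(durations):
    """Mean (.m) and standard deviation (.sd) of a list of durations."""
    return mean(durations), pstdev(durations)

def nuclei_distances(nuclei):
    """Distances between consecutive syllable nuclei (input for the SN features)."""
    return [b - a for a, b in zip(nuclei, nuclei[1:])]

def npvi(durations):
    """Normalized pairwise variability index over consecutive durations."""
    pairs = list(zip(durations, durations[1:]))
    if not pairs:
        raise ValueError("need at least two durations")
    ratios = [abs(a - b) / ((a + b) / 2.0) for a, b in pairs]
    return 100.0 * mean(ratios)

# Hypothetical voiced-cluster durations in seconds.
voiced_durations = [0.125, 0.375, 0.125]
delta_v_m, delta_v_sd = duration_stats(voiced_durations)
npvi_v = npvi(voiced_durations)
```

Perfectly regular durations give an nPVI of 0; the alternating short/long pattern in the example gives 100. The same two functions apply to unvoiced clusters (deltaC, nPVI-C) and, via `nuclei_distances`, to the syllable nuclei (deltaSN, nPVI-SN).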
Rhythm
Some results (45 female + 37 male, read + command speech)
Rhythm
RM-ANOVA: p = 0.0014; p > 0.05; p < 0.001; p > 0.05; p = 0.049
Post hoc speech type: no; only command; only read
Post hoc gender: no interaction with gender in any feature