Processing the Prosody of Oral Presentations
Rebecca Hincks
KTH, The Royal Institute of Technology
Department of Speech, Music and Hearing
The Unit for Language and Communication
English in Sweden
• A second language rather than a foreign language
• Nearly all beginners are children
• ASR not appropriate or necessary for acquisition of sounds
Support for advanced L2 users?
• Vision: Speech checker analogous to a spellchecker or grammar checker
• Practice an oral presentation, get feedback on:
– Lexicon
– Pronunciation
– Prosody
• Making a presentation can be difficult in a native language, and is even more difficult in an L2
• Standard advice for how to deliver a presentation
– Use a lively voice, don’t speak too fast, take pauses
• These qualities can be processed automatically using speech analysis
What is a lively voice?
• A voice that varies in pitch and rhythm
• A voice that shows enthusiasm
• Difficult for native speakers, but more difficult for non-native speakers
• Studies have shown that non-native speakers use a narrower pitch range than native speakers (Pickering 2004)
• Tools for helping speakers increase their liveliness should be welcomed
• Research Question: How can we measure liveliness automatically?
Corpus of student speech
• Audio recordings of 35 ten-minute presentations in English made by engineering students
• Recordings made in the classroom
• Selected 10 women and 10 men
– Varied levels of ability in English
– All native speakers of Swedish
• Written feedback on the presentations from teachers and classmates
• In preparation: listener ratings of liveliness and fluency
Pitch dynamism quotient, PDQ
PDQ = (standard deviation of F0 in Hertz) / (mean F0 in Hertz)
• F0 = Fundamental frequency = pitch
• Necessary to normalize the standard deviation in order to compare voices that are naturally high or naturally low
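A minimal sketch of the quotient, assuming the F0 contour is already available as a list of values in Hertz. The function name, the convention that 0 marks an unvoiced frame, and the use of the population standard deviation are illustrative assumptions; the slides do not specify them.

```python
from statistics import mean, pstdev


def pitch_dynamism_quotient(f0_hz):
    """Return the standard deviation of F0 divided by the mean F0 (both in Hz)."""
    # Assumption: values of 0 (or below) mark unvoiced frames and are skipped.
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced F0 values")
    # Assumption: population standard deviation; the slides do not say which form was used.
    return pstdev(voiced) / mean(voiced)


# Two synthetic contours with the same relative movement, one an octave higher.
low_voice = [100.0, 110.0, 90.0, 120.0, 80.0]
high_voice = [2 * f for f in low_voice]

print(pitch_dynamism_quotient(low_voice))
print(pitch_dynamism_quotient(high_voice))
```

Doubling every F0 value doubles both the standard deviation and the mean, so the two contours give the same quotient. That is the effect the normalization is meant to achieve for naturally high and naturally low voices.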
Time, frequencies and editing
• Between 7 and 10 minutes per person
• Divided into intervals of (1 min, 30 s, 15 s,) 10 seconds
• WaveSurfer’s ESPS settings: 60-400 Hz men, 75-600 Hz women
• Have also analyzed at 25-400 Hz men, 25-500 Hz women
• Visually inspected every contour and edited away as many errors as possible
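The interval step can be sketched as follows, assuming the pitch contour has been exported as (time in seconds, F0 in Hertz) pairs. The function name, the frame format, and the parameter names are hypothetical. The range check is only a stand-in for the pitch tracker's own search range: in the study the range was set inside WaveSurfer, and errors were edited away by hand, which this sketch does not attempt.

```python
from statistics import mean, pstdev


def windowed_pdq(frames, window_s=10.0, f0_min=60.0, f0_max=400.0):
    """Return one PDQ value (or None) per consecutive window of window_s seconds.

    frames: iterable of (time_s, f0_hz) pairs (assumed export format).
    """
    buckets = {}
    for time_s, f0 in frames:
        # Keep only values inside the assumed pitch range; this also drops
        # unvoiced frames stored as 0.
        if f0_min <= f0 <= f0_max:
            buckets.setdefault(int(time_s // window_s), []).append(f0)
    if not buckets:
        return []
    result = []
    for index in range(max(buckets) + 1):
        values = buckets.get(index, [])
        # A window needs at least two values for a standard deviation.
        result.append(pstdev(values) / mean(values) if len(values) >= 2 else None)
    return result


# Synthetic contour: 10 s alternating between 100 and 120 Hz, then 10 s flat at 100 Hz.
frames = [(i * 0.5, 100.0 if i % 2 == 0 else 120.0) for i in range(20)]
frames += [(10.0 + i * 0.5, 100.0) for i in range(20)]
frames.append((3.2, 0.0))  # an unvoiced frame, ignored by the range check

print(windowed_pdq(frames))
```

The defaults follow the 60-400 Hz setting given for men; for women the slides give 75-600 Hz, passed here as `f0_min=75.0, f0_max=600.0`.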
Mean pitch dynamism quotient for 7-10 minutes of speech
[Figure: mean PDQ (y-axis, 0.05 to 0.25) per student (x-axis: "Student, by placement test"), with separate series for females and males. Student labels: 42CS1, 45GK1, 50TN2, 52VJ2, 54ML3, 56PT2, 58CN3, 58EL2, 63AJ3, 63NW3, 64MN2, 68HÖ3, 69TM2, 70JH3, 80NB4, 85OM4, 88KS4, 88TO4, 89EH4, 90MN4]
Three proficient speakers
[Figure: pitch dynamism quotient (y-axis, 0.00 to 0.35) over consecutive time periods of 10 seconds (x-axis, 1 to 59) for speakers 85OM, 88TO and 89EH]
Lively speaker 1
• Mean PDQ: .23
“the divergence”
Feedback described the presentation as “well-structured,” “confident,” “easy to follow,” and “very coherent,” and the speech as “well-modulated” with “varied intonation.”
[Figure: PDQ (y-axis, 0.00 to 0.35) over consecutive time periods of 10 seconds (x-axis, 1 to 58), speaker 85OM4]
Lively speaker 2
[Figure: PDQ (y-axis, 0.00 to 0.35) over consecutive time periods of 10 seconds (x-axis, 1 to 58), speaker 88TO4]
• Mean PDQ: .21
Her presentation was “well-rehearsed” and “professional.”
Monotone speaker
• Mean PDQ: .12
“why is voice over IP interesting?”
Delivery was “a little deadpan,” “more animated facial expressions would be good,” and the presentation would be improved by “showing more enthusiasm.”
[Figure: PDQ (y-axis, 0.00 to 0.35) over consecutive time periods of 10 seconds (x-axis, 1 to 58), speaker 89EH4]
Test values; 9 per speaker
[Figure: PDQ (y-axis, 0.00 to 0.40) for each individual ten-second segment (x-axis, 0 to 90), with separate series for males and females]
Selection of files for listening test
• 3 lowest PDQ
• 3 closest to mean
• 3 highest
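One way to sketch this selection, assuming each speaker's ten-second segments are held in a dictionary that maps a segment number to its PDQ. The function name and data layout are hypothetical. The slides do not say which mean is meant or how overlaps are handled, so this sketch assumes the speaker's own mean and picks the "closest to mean" segments only from those not already chosen as lowest or highest.

```python
from statistics import mean


def select_test_segments(pdq_by_segment, k=3):
    """Return (lowest, closest_to_mean, highest) lists of k segment ids each."""
    if k < 1 or len(pdq_by_segment) < 3 * k:
        raise ValueError("need at least 3 * k segments and k >= 1")
    ranked = sorted(pdq_by_segment, key=pdq_by_segment.get)
    lowest, highest = ranked[:k], ranked[-k:]
    # Assumption: the reference point is the mean over this speaker's segments.
    speaker_mean = mean(pdq_by_segment.values())
    # Assumption: segments already chosen as lowest or highest are excluded.
    remaining = ranked[k:-k]
    closest = sorted(remaining, key=lambda s: abs(pdq_by_segment[s] - speaker_mean))[:k]
    return lowest, closest, highest


# Synthetic PDQ values for eleven segments of one speaker.
pdq = {1: 0.10, 2: 0.12, 3: 0.14, 4: 0.16, 5: 0.18, 6: 0.20,
       7: 0.22, 8: 0.24, 9: 0.26, 10: 0.28, 11: 0.30}

lowest, closest, highest = select_test_segments(pdq)
print(lowest, sorted(closest), highest)
```

With k = 3 this yields nine segments per speaker, matching the nine test values per speaker in the figure above.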
Conclusions
• Normalized standard deviation can be used as a measure of liveliness in speaking styles used for oral presentations
• Hypothesis: PDQ values over .15 lively, over .30 very lively, between .20 and .25 a good target
– Different preferences depending on personality and culture?
• Unclear effect of Swedish L1 and of proficiency in English
• Applications: teaching, presentation skills
• Appropriate feedback: not values but a talking head that moves from alert to sleepy
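The hypothesized thresholds could be turned into coarse feedback along the following lines. This is only a sketch of the hypothesis stated above, not a validated scale. The function names and the label "not lively" are illustrative assumptions, and a real application would present a talking head rather than text labels.

```python
def liveliness_label(pdq, lively=0.15, very_lively=0.30):
    """Map a PDQ value to a coarse label using the hypothesized thresholds."""
    if pdq > very_lively:
        return "very lively"
    if pdq > lively:
        return "lively"
    return "not lively"  # assumption: the slides give no label below .15


def in_target_range(pdq, low=0.20, high=0.25):
    """True when the PDQ falls in the suggested target range of .20 to .25."""
    return low <= pdq <= high


# Mean PDQ values reported for the three proficient speakers.
for speaker, value in [("85OM4", 0.23), ("88TO4", 0.21), ("89EH4", 0.12)]:
    print(speaker, liveliness_label(value), in_target_range(value))
```

Under these thresholds the two speakers described as lively fall inside the target range, and the monotone speaker falls below the .15 threshold.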
Thank you for your attention…