[1] Processing the Prosody of Oral Presentations Rebecca Hincks KTH, The Royal Institute of Technology Department of Speech, Music and Hearing The Unit

[1]

Processing the Prosody of Oral Presentations

Rebecca Hincks

KTH, The Royal Institute of Technology

Department of Speech, Music and Hearing

The Unit for Language and Communication

Rebecca Hincks [2]

English in Sweden

• A second language rather than a foreign language

• Nearly all beginners are children

• ASR (automatic speech recognition) is neither appropriate nor necessary for acquiring the sounds

Support for advanced L2 users?

• Vision: a speech checker analogous to a spellchecker or grammar checker

• Practice an oral presentation and get feedback on:
– Lexicon
– Pronunciation
– Prosody

• Making a presentation can be difficult in a native language, and is even more difficult in an L2

• Standard advice on how to deliver a presentation:
– Use a lively voice, don’t speak too fast, take pauses

• These qualities can be processed automatically using speech analysis

What is a lively voice?

• A voice that varies in pitch and rhythm

• A voice that shows enthusiasm

• Difficult for native speakers, but more difficult for non-native speakers

• Studies have shown that non-native speakers use a narrower pitch range than native speakers (Pickering 2004)

• Tools for helping speakers increase their liveliness should be welcomed

• Research Question: How can we measure liveliness automatically?

Corpus of student speech

• Audio recordings of 35 ten-minute presentations in English made by engineering students

• Recordings made in the classroom

• Selected 10 women and 10 men
– Varied levels of ability in English
– All native speakers of Swedish

• Written feedback on the presentations from teachers and classmates

• In preparation: listener ratings of liveliness and fluency

Pitch dynamism quotient, PDQ

$$\mathrm{PDQ} = \frac{\text{standard deviation of } F_0 \text{ in Hz}}{\text{mean } F_0 \text{ in Hz}}$$

• F0 = fundamental frequency, the main acoustic correlate of perceived pitch

• The standard deviation must be divided by the mean so that naturally high and naturally low voices can be compared; the resulting quotient has no unit
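The PDQ formula can be sketched in a few lines of Python. This is an illustration, not the procedure used in the study: it assumes a pitch track given as one F0 value in Hz per analysis frame, with unvoiced frames marked as 0, and it uses the population standard deviation. The slides specify neither detail, and the function name is hypothetical.

```python
import statistics


def pitch_dynamism_quotient(f0_hz):
    """PDQ = standard deviation of F0 / mean F0, computed over voiced frames.

    Assumptions (not from the slides): unvoiced frames are marked with 0,
    and the population standard deviation is used.
    """
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    return statistics.pstdev(voiced) / statistics.mean(voiced)


# Same relative pitch movement, one voice an octave above the other
low_voice = [100, 0, 150, 200, 0]
high_voice = [200, 300, 0, 400]
print(pitch_dynamism_quotient(low_voice))   # about 0.272
print(pitch_dynamism_quotient(high_voice))  # about 0.272
```

Because the standard deviation is divided by the mean, doubling every F0 value leaves the PDQ unchanged, which is exactly the normalization between high and low voices that the slide calls for.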

Time, frequencies and editing

• Between 7 and 10 minutes of speech per person

• Divided into intervals of 10 seconds (also of 1 min, 30 s and 15 s)

• WaveSurfer’s ESPS settings: 60-400 Hz for men, 75-600 Hz for women

• Also analyzed with 25-400 Hz for men and 25-500 Hz for women

• Every contour was inspected visually and as many errors as possible were edited away
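Dividing a speaker's pitch track into fixed intervals and computing one PDQ per interval might look like the following sketch. It assumes one F0 value per frame with unvoiced frames marked as 0, a 10 ms frame step, and that a trailing partial interval is dropped; none of these details come from the slides, and the function name is hypothetical. The pitch-range settings listed above belong to the pitch tracker itself and are not modeled here.

```python
import statistics


def segment_pdq(f0_hz, frame_step_s=0.01, segment_s=10.0):
    """Return one PDQ value per complete segment of a frame-based F0 track.

    Assumptions for this sketch (not from the slides): one F0 value in Hz
    per frame, unvoiced frames marked with 0, a 10 ms frame step, and any
    trailing partial segment dropped.
    """
    frames_per_segment = round(segment_s / frame_step_s)
    results = []
    last_start = len(f0_hz) - frames_per_segment
    for start in range(0, last_start + 1, frames_per_segment):
        segment = f0_hz[start:start + frames_per_segment]
        voiced = [f for f in segment if f > 0]
        if len(voiced) < 2:
            # Too little voicing in this segment to estimate variation
            results.append(None)
        else:
            results.append(statistics.pstdev(voiced) / statistics.mean(voiced))
    return results


# Toy track with 1-second "frames" and 3-second segments, to keep it readable
track = [100, 150, 200, 0, 0, 120, 200, 300, 400, 50]
print(segment_pdq(track, frame_step_s=1.0, segment_s=3.0))
```

With the assumed 10 ms frame step, each 10-second segment spans 1,000 frames. A sequence of per-segment values of this kind is what lets PDQ be followed over the course of a presentation.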

Mean pitch dynamism quotient for 7-10 minutes of speech

[Figure: mean PDQ (y-axis, 0.05 to 0.25) per student, females and males distinguished. x-axis: "Student, by placement test", with student IDs 42CS1, 45GK1, 50TN2, 52VJ2, 54ML3, 56PT2, 58CN3, 58EL2, 63AJ3, 63NW3, 64MN2, 68HÖ3, 69TM2, 70JH3, 80NB4, 85OM4, 88KS4, 88TO4, 89EH4, 90MN4]

Three proficient speakers

[Figure: pitch dynamism quotient (y-axis, 0.00 to 0.35) over consecutive time periods of 10 seconds (x-axis, 1 to 59) for speakers 85OM, 88TO and 89EH]

Lively speaker 1

• Mean PDQ: .23

“the divergence”

Feedback described the presentation as “well-structured,” “confident,” “easy to follow,” and “very coherent,” and the speech as “well-modulated” with “varied intonation.”

Speaker 85OM4

[Figure: PDQ (y-axis, 0.00 to 0.35) over consecutive segments (x-axis, 1 to 58)]

Lively speaker 2

Speaker 88TO4

[Figure: PDQ (y-axis, 0.00 to 0.35) over consecutive segments (x-axis, 1 to 58)]

• Mean PDQ: .21

Her presentation was “well-rehearsed” and “professional.”

Monotone speaker

“why is voice over IP interesting?”

• Mean PDQ: .12

Delivery was “a little deadpan,” “more animated facial expressions would be good,” and the presentation would be improved by “showing more enthusiasm.”

Speaker 89EH4

[Figure: PDQ (y-axis, 0.00 to 0.35) over consecutive segments (x-axis, 1 to 58)]

Test values: 9 per speaker

[Figure: PDQ (y-axis, 0.00 to 0.40) for each individual ten-second segment (x-axis, 0 to 90), males and females distinguished]

Selection of files for listening test

• 3 segments with the lowest PDQ

• 3 segments closest to the mean PDQ

• 3 segments with the highest PDQ
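A possible implementation of this selection for a single speaker is sketched below. The slides do not say which mean was used or how overlaps were handled, so this sketch assumes the mean of the speaker's own segment PDQs and picks the segments closest to it only from those not already chosen as lowest or highest. The function name and the dictionary layout are hypothetical.

```python
import statistics


def select_test_segments(pdq_by_segment, k=3):
    """Choose k lowest, k closest-to-mean and k highest PDQ segments.

    pdq_by_segment maps a segment index to that segment's PDQ.
    Assumptions (not from the slides): the mean is the speaker's mean
    segment PDQ, and no segment is selected twice.
    """
    if k < 1 or len(pdq_by_segment) < 3 * k:
        raise ValueError(f"need k >= 1 and at least {3 * k} segments")
    # Segment indices ordered from lowest to highest PDQ
    ranked = sorted(pdq_by_segment, key=pdq_by_segment.get)
    lowest = ranked[:k]
    highest = ranked[-k:]
    mean_pdq = statistics.mean(list(pdq_by_segment.values()))
    # Closest to the mean, chosen among the segments not already taken
    remaining = ranked[k:-k]
    middle = sorted(remaining, key=lambda seg: abs(pdq_by_segment[seg] - mean_pdq))[:k]
    return {"lowest": lowest, "mean": middle, "highest": highest}


# Hypothetical PDQ values for ten segments of one speaker
example = dict(enumerate([0.05, 0.10, 0.12, 0.15, 0.20, 0.21, 0.30, 0.32, 0.35, 0.40]))
print(select_test_segments(example))
```

Excluding the extremes before searching for the mean-like segments guarantees nine distinct files per speaker even when a speaker has few segments.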

Conclusions

• The normalized standard deviation of F0 (PDQ) can be used as a measure of liveliness in the speaking style of oral presentations

• Hypothesis: PDQ values over .15 are lively, over .30 very lively; values between .20 and .25 are a good target
– Different preferences depending on personality and culture?

• Unclear effect of Swedish L1 and of proficiency in English

• Applications: teaching, presentation skills

• Appropriate feedback: not numerical values but a talking head that moves from alert to sleepy
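As a rough illustration of the proposed feedback, the hypothesized thresholds can be turned into a lookup from a PDQ value to a liveliness band and a coarse talking-head state. The thresholds are the hypothesis stated above, not validated values; the band label below .15, the intermediate "awake" state and the function name are hypothetical.

```python
def liveliness_feedback(pdq):
    """Map a PDQ value onto the hypothesized liveliness bands.

    Thresholds come from the hypothesis on this slide: over .15 lively,
    over .30 very lively, .20 to .25 a good target. The band name below
    .15 and the talking-head states are hypothetical labels.
    """
    if pdq > 0.30:
        band = "very lively"
    elif pdq > 0.15:
        band = "lively"
    else:
        band = "below lively range"
    in_target = 0.20 <= pdq <= 0.25
    # Hypothetical talking-head state: alert from the target band upwards,
    # sleepy at or below the lively threshold, awake in between
    if pdq >= 0.20:
        head = "alert"
    elif pdq > 0.15:
        head = "awake"
    else:
        head = "sleepy"
    return band, in_target, head


# Mean PDQ values reported for the three example speakers
for speaker, mean_pdq in [("85OM4", 0.23), ("88TO4", 0.21), ("89EH4", 0.12)]:
    print(speaker, liveliness_feedback(mean_pdq))
```

Applied to the mean PDQ values reported for the three example speakers, the two lively speakers fall inside the .20 to .25 target band and the monotone speaker falls below .15.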

Thank you for your attention…
