Data collection and experimentation
Why should we talk about data collection?
• It is a central part of most, if not all, aspects of current speech technology
• The higher grades (A, B; as tested in the home exam assignments and the project) require a measure of data collection
What is data collection?
• In speech technology, the gathering of human communicative behaviours that can be used for implementation of e.g. spoken dialogue systems
• What do we gather?
- Speech
- Text
- Voices
- Gestures
- Patterns!
All vs one?
• Recognition: we want to have seen all possibilities
• Synthesis: we want one, consistent behaviour
Group exercise
• Same groups as before
• Design one or more data collection(s) that will become the basis for a spoken dialogue system intended to inform users of the television program
• Take note of why you make your design choices
• We’ll talk about it here in 30 minutes
• Application
  - Remote control
    • Select programme
    • Menu options - tree
  - TV guide
    • More free speech
    • But connected to GUI options (e.g. for lists)
• Data
  - Room environment
  - Age recognition data
    • Recognize age
    • Recognize identity of a specific mother
  - Usage probabilities
  - Asking people - ratings
  - Language? Programmes are in English and Swedish
  - Read TV guide
  - But people speak differently (“trean”)
  - Monitor corpus (updated)
  - “Beta” version – iterative process (h/h, WoZ, beta)
  - Demography: adults, elderly, kids?
  - Keywords
    • Cloud
    • Times
    • Some commands
Multimodal corpus work: manual annotation, validation and computer driven analysis. Jens Edlund, 2012-09-01-05
What is a corpus?
• Wikipedia:
- A collection of written or spoken material in machine-readable form, assembled for the purpose of studying linguistic structures, frequencies, etc.
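As a small illustration of "studying frequencies" in machine-readable material, the sketch below counts word frequencies in a toy corpus. The utterances, the function name `word_frequencies` and the very simple tokenisation rule are assumptions made for this example; they are not taken from the lecture or from any real corpus.

```python
import re
from collections import Counter


def word_frequencies(utterances):
    """Count lower-cased word tokens across a list of transcribed utterances."""
    counts = Counter()
    for utterance in utterances:
        # Naive tokenisation (an assumption): runs of letters and apostrophes.
        counts.update(re.findall(r"[a-zåäö']+", utterance.lower()))
    return counts


# Hypothetical transcriptions, invented for illustration only.
toy_corpus = [
    "What is on channel three tonight",
    "What is on tonight",
    "Switch to channel three",
]

freqs = word_frequencies(toy_corpus)
for word, count in freqs.most_common():
    print(f"{word}\t{count}")
```

A real corpus would need a proper tokeniser and far more material, but the principle is the same: once the material is machine-readable, frequencies can be computed rather than estimated by hand.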
Why collect a corpus?
- ”[...] for the purpose of studying linguistic structures, frequencies, etc.”
- Sample - cannot analyze all
- Training data for duplicating behaviours
- Analysis of how humans do things
- Generalisability, representativeness
  • Same results in different corpora
  • Use constraints, standards, theories to form the corpus
  • If findings are expected - corroborate theory - we're better off
How is a corpus collected?
• Often high formal demands:
- Structure
- Balance
• Audio, visual, audiovisual - choice of modalities
- Requires equipment
- Silent lab
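One way to make the "balance" demand concrete is to check a planned speaker list before recording starts. The sketch below is only an assumption about how such a check could look: the speaker records, the `age_group` attribute, the function names and the tolerance of 0.1 are all hypothetical, and real corpus designs may balance on entirely different criteria.

```python
from collections import Counter


def balance_report(speakers, attribute):
    """Return the share of speakers for each value of the given attribute."""
    counts = Counter(speaker[attribute] for speaker in speakers)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def is_balanced(shares, tolerance=0.1):
    """True if every group is within `tolerance` of an equal share."""
    target = 1 / len(shares)
    return all(abs(share - target) <= tolerance for share in shares.values())


# Hypothetical recording plan, invented for illustration only.
planned_speakers = [
    {"id": "s01", "age_group": "kid"},
    {"id": "s02", "age_group": "kid"},
    {"id": "s03", "age_group": "adult"},
    {"id": "s04", "age_group": "adult"},
    {"id": "s05", "age_group": "elderly"},
    {"id": "s06", "age_group": "elderly"},
]

shares = balance_report(planned_speakers, "age_group")
print(shares, "balanced:", is_balanced(shares))
```

Running the same check on the recordings actually collected shows whether drop-outs have skewed the plan.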
Where are corpora collected?
When are corpora collected?
• Often collected once, then static
- But monitor corpora exist
- And the web is, as always, changing things
Examples of corpora?
Thank you! Questions?