1
Zöe Handley, Learning Sciences Research Institute (LSRI), University of Nottingham
and
Marie-Josée Hamel, Department of French, Dalhousie University
Is Text-to-Speech Synthesis Ready for use in CALL?
CALL 2008, Antwerp, Belgium
2
Plan
– TTS synthesis in CALL
– Evaluation
– Requirements analysis
– Readiness of TTS synthesis for CALL
– Conclusions
3
TTS synthesis
What is TTS synthesis?
– Speech synthesis “systems … allow the generation of novel messages, either from scratch (i.e. entirely by rule) or by re-combining shorter pre-stored units” (van Bezooijen and van Heuven, 1997: 709)
– Text-to-Speech (TTS) synthesis systems allow the automatic generation of speech from text
Why use TTS in CALL?
– There is a general need in language learning and teaching for “self-paced interactive learning environments” which provide “controlled interactive speaking practice outside the classroom” (Ehsani and Knodt, 1998: 45).
http://www.acapela-group.com/text-to-speech-interactive-demo.html
Graham Lucy
4
CALL Applications
Reading machine
– Talking dictionaries, texts (de Pijper, 1997; Hamel, 2003), word processors and conjugators, dictations (Santiago-Oriola, 1999; Mercier et al., 1999), and grapheme-phoneme exercises
Pronunciation model
– Auditory discrimination; repetition (Hamel, 1998; Mercier et al., 2000)
Conversational partner
– In combination with automatic speech recognition and speech understanding, the generative power of TTS synthesis can be harnessed to provide learners with interactive speaking practice, i.e. a dialogue partner (Raux and Eskenazi, 2004; Seneff et al., 2004)
Oxford Hachette 4 French Dictionary on CD-ROM
5
Benefits of TTS synthesis
Improvements on other media
– Easy creation and editing of speech samples
– Simultaneous presentation of text and speech
– Low storage requirements
– Non-human and therefore perceived as non-judgemental
Adds value
– Generation of examples on demand (Sherwood, 1981) and therefore the automatic generation of feedback, conversational turns, and exercises with speech models
6
Why evaluation?
Few CALL applications integrating TTS synthesis are available on the market
Few evaluations of TTS synthesis for the purposes of CALL have been conducted
Since the failure of the language laboratory, teachers have been sceptical about unevaluated technologies
TTS synthesis is being used in CALL in roles that it has not assumed in applications outside CALL: the most common, perhaps only, role that TTS synthesis assumes outside CALL is that of a reading machine
7
Framework for the evaluation of TTS synthesis for use in CALL (Handley and Hamel, 2005)
1. Basic research evaluation of TTS synthesis for use in CALL
– Viability and potential benefits of the use of TTS synthesis in CALL
2. Technology evaluation of TTS synthesis for use in CALL
– Adequacy of TTS synthesis for use in CALL
3. Judgemental evaluation of the CALL application
– Potential of the CALL program to provide ideal conditions for SLA
4. Judgemental evaluation of the teacher-planned activity
– Potential of the planned activity to provide ideal conditions for SLA
5. Usage evaluation of the teacher-planned activity
– Learner’s performance in the planned activity
This framework combines the levels of evaluation recommended by Chapelle (2001) for the evaluation of CALL activities and by ELSE (1999) for the evaluation of Speech and Language Technologies (SALT).
8
Evaluations of TTS Synthesis for CALL
Technology evaluations of TTS synthesis for use in CALL
• Stratil et al. (1987)
  – Evaluated the quality of a Spanish TTS chip for the presentation of grammar exercises in a language laboratory.
Usage evaluation of the teacher-planned activity
– Outcome-oriented
  • Santiago-Oriola (1999)
    – Evaluated the use of a French TTS synthesiser for the presentation of dictation exercises.
  • Hincks (2002)
    – Evaluated the use of a Swedish TTS synthesiser in combination with a speech editor (re-synthesis) for teaching the lexical stress of English to Swedophones.
– Process-oriented
  • Cohen (1993)
    – Evaluated the use of a talking word processor to support literacy activities, namely writing stories, for young learners of French as a second language.
9
Requirements analysis
The evaluation process
– ISO (1999) and EAGLES (1999) guidelines
– Establish the evaluation requirements
  • Establish the purpose of the evaluation
  • Identify the types of products to be evaluated
  • Specify the quality model
– Specify the evaluation
  • Select metrics
  • Establish rating levels for metrics
  • Establish criteria for assessment
– Design the evaluation
– Execute the evaluation
CALL requirements
“When the language competence of the system begins to outstrip that of some of the better second language users, such systems become useful adjunct tools” (Keller and Zellner-Keller, 2000)
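The "Specify the evaluation" step of the process above (select metrics, establish rating levels, establish criteria for assessment) can be pictured as a small data structure. The following is only a sketch under stated assumptions: the `Metric` class, the `assess` function, the 1 to 5 scale and the threshold values are hypothetical illustrations and are not taken from the ISO or EAGLES guidelines or from the study itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One selected metric with its rating levels (hypothetical structure)."""
    name: str
    scale: tuple            # (lowest, highest) point of the rating scale
    acceptable_from: float  # assumed rating level counted as acceptable


def assess(metrics, mean_scores):
    """Apply an assumed criterion: every metric must reach its acceptable level.

    Returns a per-metric verdict and an overall verdict.
    """
    verdicts = {}
    for metric in metrics:
        score = mean_scores[metric.name]
        lowest, highest = metric.scale
        if not lowest <= score <= highest:
            raise ValueError(f"{metric.name}: score {score} is outside the scale")
        verdicts[metric.name] = score >= metric.acceptable_from
    return verdicts, all(verdicts.values())


# Hypothetical metrics and scores, for illustration only.
metrics = [
    Metric("comprehensibility", (1, 5), 4.0),
    Metric("naturalness", (1, 5), 3.0),
]
verdicts, overall = assess(metrics, {"comprehensibility": 4.2, "naturalness": 2.8})
print(verdicts, overall)
```

Keeping the rating levels and the assessment criterion separate from the scores mirrors the order of the process: the criteria are fixed before the evaluation is designed and executed.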
10
CALL requirements analysis
Ideal conditions for Second Language Acquisition (SLA) (Chapelle, 2001)
– Language learning potential
  • Goals of SLA
    – Communicative competence
    – Quality of the output
    – Primary requirement: Comprehensibility/intelligibility
    – Secondary requirements: Accuracy and naturalness
    – At both the level of individual speech sounds and the prosodic level
  • Focus on form
    – Flexibility
    – Speech rate, pitch
11
Explorative investigation (Handley and Hamel, 2005)
Research questions
1. Do the different roles identified impose different requirements on the quality of speech synthesis?
2. Does comprehensibility account for acceptability for use in CALL?
Method
– 17 French teachers
– One research TTS system, FIPSvox from the University of Geneva
– 3 roles: (1) reading machine, (2) pronunciation model, and (3) conversational partner
– Likert scales: (1) comprehensibility, (2) acceptability, and (3) appropriateness
– Word pointing paradigm (van Santen, 1993)
Results
1. Most suitable as a dialogue partner. Least suitable as a pronunciation model.
2. Comprehensibility is not the only requirement. Accuracy and naturalness matter, as do register and expressiveness.
12
Is TTS synthesis ready for use in CALL?
Research questions
– Do the different roles identified impose different requirements on the quality of speech synthesis?
– Is TTS synthesis ready for use in CALL?
Design
– Within subjects, N = 17, French teachers
– Dependent variables
  – Quality of the speech output
  – Acceptability
  – Adequacy
– Independent variables
  – Role of TTS in CALL: (1) Reading Machine (RM), (2) Pronunciation Model (PM) at the (a) segmental level and (b) suprasegmental level, and (3) Conversational Partner (CP)
  – TTS synthesis system
13
Systems evaluated
1. http://www.research.att.com/~ttsweb/tts/demo.php#top French English
2. http://212.8.184.250/tts/demo_login.jsp French English
3. http://www.multitel.be/TTS/layout.php?page=eLite_demo French English
4. http://www.acapela-group.com/text-to-speech-interactive-demo.html French English
14
Questionnaire
MOS-CALL
– ITU-T Overall Quality Test
– MOS-X (Polkosky and Lewis, 2003)
On-line presentation of questionnaire
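As an illustration of how questionnaire responses of this kind are typically summarised, the sketch below computes a mean rating per system and role. The `mean_ratings` function, the tuple layout and the sample ratings are assumptions made for the example; they are not the MOS-CALL items or the study's data.

```python
from collections import defaultdict
from statistics import mean


def mean_ratings(responses):
    """Average the ratings for each (system, role) pair.

    `responses` is an iterable of (system, role, rating) tuples, one per
    answer given by a rater.
    """
    grouped = defaultdict(list)
    for system, role, rating in responses:
        grouped[(system, role)].append(rating)
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical ratings on an assumed 5-point scale, for illustration only.
responses = [
    ("System 4", "RM", 4),
    ("System 4", "RM", 5),
    ("System 4", "CP", 3),
]
summary = mean_ratings(responses)
print(summary)
```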
15
Is TTS synthesis ready for use in CALL?
Different TTS synthesis systems are most suitable for use in different roles
Reinforces the need to evaluate every TTS synthesis system
System 4 is ready for use in all applications where TTS synthesis adds value
Mean ratings of adequacy
Mean ratings of acceptability
16
System 1: AT&T Next-Gen (Alain)
Mean ratings of quality of output
17
System 2: Nuance Vocalizer (Julie)
Mean ratings of quality of output
18
System 3: eLite (Vincent)
Mean ratings of quality of output
19
Do the different roles have different requirements?
Differences in adequacy were statistically significant for systems 2 and 4 (χ²r = 8.010, df = 3, p = 0.046; χ²r = 8.063, df = 3, p = 0.045, respectively)
But not for systems 1 and 3 (χ²r = 2.352, df = 3, p = 0.503; χ²r = 3.467, df = 3, p = 0.325, respectively)
Differences in acceptability were not significant (system 1 χ²r = 6.616, df = 3, p = 0.085, system 2 χ²r = 6.303, df = 3, p = 0.098, system 3 χ²r = 3.194, df = 3, p = 0.363, and system 4 χ²r = 5.547, df = 3, p = 0.163)
Mean ratings of adequacy
Mean ratings of acceptability
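The χ²r statistics above, with df = 3 in a within-subjects design over four role conditions (RM, PM segmental, PM suprasegmental, CP), are consistent with a Friedman test, although the slides do not name the test or the software used, so this is an assumption. The sketch below computes the uncorrected Friedman statistic from a raters-by-conditions table and the p-value for df = 3. The sample ratings are hypothetical, and no tie correction is applied, whereas statistical packages usually apply one, which matters for Likert data with many ties.

```python
import math


def average_ranks(row):
    """Rank one rater's values (1 = lowest), giving tied values their mean rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(ratings):
    """Uncorrected Friedman chi-square for n raters (rows) by k conditions (columns)."""
    n, k = len(ratings), len(ratings[0])
    rank_sums = [0.0] * k
    for row in ratings:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Hypothetical ratings: 3 raters, 4 role conditions (not the study's data).
toy = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
stat = friedman_statistic(toy)
print(stat, chi2_sf_df3(stat))

# p-values implied by the statistics reported for systems 2 and 4.
print(round(chi2_sf_df3(8.010), 3), round(chi2_sf_df3(8.063), 3))
```

Feeding the reported statistics into `chi2_sf_df3` is a quick way to check that the stated p-values agree with a chi-square distribution on 3 degrees of freedom.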
20
Conclusions
Some French TTS synthesis systems are reaching readiness for use in CALL in applications which add value
In order to fully meet the requirements of CALL, more attention needs to be paid to accuracy and naturalness, in particular at the prosodic level, and to expressiveness
– Expressive speech synthesis is the focus of much current research (Campbell et al., 2006)
This may not be the case for all languages; different languages pose different problems for TTS
It will not be long before learners are able to benefit from the support of an untiring, non-judgemental substitute native speaker 24/7 in CALL applications.