Senior Project – Electrical Engineering - 2008
Tool for Improving Non-Native French Speech Pronunciation
Joseph Ciaburri
Advisors – Professor Catravas, Professor Chilcoat
Abstract:
For foreign language students, pronunciation can be one of the most frustrating aspects of mastering the language. Correct pronunciation is harder to encapsulate in a set of rules than grammar. As a result, students must rely either on instructor critiques or on their own aural judgment. In language laboratories, the student typically hears a native speaker, speaks into a recorder, and listens to his or her voice replayed. Such an approach does not take advantage of the potential for facial movement to provide feedback. This work investigates the introduction of a video monitor of facial movement into a pronunciation software tool, alongside traditional aural and signal-processing-based techniques. Much as in a language laboratory, a native speaker reads a phrase, which the student repeats. Matlab acquires the student's response via a webcam and microphone and replays the attempt, allowing the student to self-diagnose. The audio signal is analyzed and displayed in the frequency domain as a short-time Fourier transform, in the form of a spectrogram, and in the quefrency domain as the cepstrum. The initial focus is on vowel sounds. Future work will include efforts to provide a bull's-eye display comparing a numerical figure of merit with a target reference. A software tool that focuses on both audio and video for language learning has the potential to increase pronunciation skill while decreasing learning time. This project can also provide a platform for testing the effectiveness and significance of the different feedback mechanisms employed for language pronunciation.
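As an illustration of the spectrogram display mentioned in the abstract, the following is a minimal sketch of a short-time Fourier transform. It is written in Python using only the standard library, not the Matlab code used in the project. The frame length, hop size, Hann window, sampling rate, and the synthetic 1000 Hz test tone are all assumptions chosen for the example; the names `fft` and `spectrogram` are hypothetical helpers.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def spectrogram(signal, frame_len=256, hop=128):
    """Return one magnitude spectrum (bins 0..frame_len/2) per Hann-windowed frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        spectrum = fft(frame)
        frames.append([abs(c) for c in spectrum[: frame_len // 2 + 1]])
    return frames

# Synthetic tone standing in for a recorded vowel (assumption, not project data).
fs = 8000  # assumed sampling rate in Hz
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(2048)]
spec = spectrogram(tone)

# Strongest frequency bin in the first frame, converted to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * fs / 256
```

Each row of `spec` corresponds to one time slice of the spectrogram. For a real vowel, the strongest broad peaks across rows would indicate the formants rather than a single tone.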
[Figure: system block diagram. Labeled elements: User; Window 1, Native Speaker (audio, video); Speech; Microphone and Webcam; Data Acquisition; Window 2, Audio and Video of User Speaking; Window 3, Diagnostics; Audio/Visual Databank. Webcam image: http://www.logitech.com/repository/471/jpg/3770.1.0.jpg]
Time Domain
Acknowledgements:
Professor Rudko, Professor Hanson, Professor Streignitz, Professor Cotter, Professor Catravas, Professor Chilcoat, Professor Pickering
Listen and Repeat System
Language Lab
Proposed System
[Figure: time-domain waveforms for the Native Speaker and Non-Native Speaker; amplitude (-1 to 1) versus time (0 to 1.5 s)]
[Figure: cepstrum for the Native Speaker and Non-Native Speaker]
[Figure: spectrogram for the Native Speaker and Non-Native Speaker]
Results:
The building blocks for the modules of the teaching tool, shown in the upper right, are ready for implementation. The ability to acquire synchronized audio and video, and to play back the audio and video (not synchronized), forms the basis for the visual and aural feedback. The audio signal, as shown in the analysis to the right, can be displayed in the time domain, the frequency domain, and the quefrency domain, which will allow for quantitative analysis of the signal. The time domain can be used to identify phonemes and stress. The frequency domain is used to produce the spectrogram, which is used to identify vowels from their formants and consonants from their transitions. The quefrency domain is used to produce the cepstrum, which is used to find the fundamental frequency.
Frequency Domain
Quefrency Domain
Analysis for the word “Analyste” (Analyst)
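The Results section states that the cepstrum is used to find the fundamental frequency. A minimal sketch of that idea follows, in Python with only the standard library rather than the project's Matlab code. It computes the real cepstrum (the inverse transform of the log magnitude spectrum) and picks the strongest quefrency peak within an assumed pitch range of 80 to 400 Hz. The idealized impulse-train input, the sampling rate, the small offset inside the logarithm, and the names `fft`, `real_cepstrum`, and `estimate_f0` are assumptions for illustration.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def real_cepstrum(frame):
    """Real cepstrum of one Hann-windowed frame."""
    n = len(frame)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    spectrum = fft([s * w for s, w in zip(frame, window)])
    # Small offset avoids log(0) in empty bins (assumed value).
    log_mag = [math.log(abs(c) + 1e-9) for c in spectrum]
    # log_mag is real and even, so its inverse FFT equals real(FFT) / n.
    return [c.real / n for c in fft(log_mag)]

def estimate_f0(frame, fs, f_min=80.0, f_max=400.0):
    """Estimate the fundamental frequency in Hz from the cepstral peak."""
    ceps = real_cepstrum(frame)
    q_min = int(fs / f_max)  # shortest pitch period, in samples
    q_max = int(fs / f_min)  # longest pitch period, in samples
    peak_q = max(range(q_min, q_max + 1), key=lambda q: ceps[q])
    return fs / peak_q

# Idealized voiced excitation: one pulse every 64 samples (assumption).
fs = 8000  # assumed sampling rate in Hz
frame = [1.0 if n % 64 == 0 else 0.0 for n in range(1024)]
f0 = estimate_f0(frame, fs)  # pulse period of 64 samples corresponds to 125 Hz
```

Restricting the search to a plausible pitch range keeps the estimate from locking onto multiples of the pitch period, which also produce cepstral peaks. Recorded speech would typically need frame-by-frame analysis and a voicing check before this estimate is meaningful.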