Senior Project – Electrical Engineering - 2008

Tool for Improving Non-Native French Speech Pronunciation

Joseph Ciaburri

Advisors – Professor Catravas, Professor Chilcoat

Abstract:

For foreign language students, pronunciation can be one of the most frustrating aspects of mastering the language. The art of correct pronunciation is more difficult to encapsulate in a set of rules than grammar. As a result, students must rely either on instructor critiques or on their own aural judgment. In language laboratories, the student typically hears a native speaker, speaks into a recorder, and listens to his or her voice replayed. Such an approach does not take advantage of the potential for facial movement to provide feedback. This work investigates the introduction of a video monitor of facial movement into a pronunciation software tool, alongside traditional aural and signal-processing-based techniques. Much like a language laboratory, a native speaker reads a phrase, which the student repeats. Matlab acquires the student's response via a webcam and microphone and replays the attempt, allowing the student to self-diagnose. The audio signal is analyzed and displayed in the frequency domain as a short-time Fourier transform, in the form of a spectrogram, and in the quefrency domain as the cepstrum. The initial focus is on vowel sounds. Future work will include efforts to provide a bull's-eye display comparing a numerical figure of merit with a target reference. A software tool that focuses on both audio and video for language learning has the potential to increase pronunciation skill while decreasing learning time. This project can also provide a platform for testing the effectiveness and significance of the different feedback mechanisms employed for language pronunciation.

[Figure: Listen and Repeat System. Block diagrams titled "Language Lab" and "Proposed System". Elements shown: User; Microphone; Webcam; Data Acquisition; Audio/Visual Databank; Window 1, Native Speaker; Window 2, Audio and Video of User Speaking; Window 3, Diagnostics. Signal labels: Speech, Audio, Video, Data. Webcam image: http://www.logitech.com/repository/471/jpg/3770.1.0.jpg]

Acknowledgements: Professor Rudko, Professor Hanson, Professor Streignitz, Professor Cotter, Professor Catravas, Professor Chilcoat, Professor Pickering

[Figure: Time Domain. Amplitude versus Time (s), Native Speaker and Non-Native Speaker.]

[Figure: Cepstrum, Native Speaker and Non-Native Speaker.]

[Figure: Spectrogram, Native Speaker and Non-Native Speaker.]

Results:

The building blocks for the modules of the teaching tool, shown in the upper right, are ready for implementation. The ability to acquire synchronized audio and video, and to play back the audio and video (not synchronized), forms the basis for the visual and aural feedback. The audio signal, as shown in the analysis to the right, can be displayed in the time domain, the frequency domain, and the quefrency domain, which will allow the signal to be quantified. The time-domain view can be used to identify phonemes and stress. The frequency-domain view is the spectrogram, which is used to identify vowels from their formants and consonants from their transitions. The quefrency-domain view is the cepstrum, which is used to find the fundamental frequency.
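As an illustration of the last step, the sketch below estimates a fundamental frequency from the peak of the real cepstrum. It is not the project's Matlab code: it is a minimal Python example under stated assumptions, using only the standard library. The function names (`fft`, `real_cepstrum`, `estimate_f0`), the 8000 Hz sampling rate, the 80 to 400 Hz search range, and the synthetic test signal (a decaying pulse repeated every 5 ms, not recorded speech) are all hypothetical choices made for the example.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def real_cepstrum(frame):
    """Real cepstrum: inverse DFT of the log-magnitude spectrum."""
    n = len(frame)
    # Hamming window to reduce spectral leakage at the frame edges.
    windowed = [
        s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(frame)
    ]
    # Small constant avoids log(0) in bins with no energy.
    log_mag = [math.log(abs(c) + 1e-12) for c in fft(windowed)]
    # The log-magnitude of a real signal is real and even, so its
    # inverse DFT equals the real part of its forward DFT divided by n.
    return [c.real / n for c in fft(log_mag)]


def estimate_f0(frame, fs, f_min=80.0, f_max=400.0):
    """Pick the cepstral peak in the quefrency range for f_min..f_max."""
    ceps = real_cepstrum(frame)
    q_lo = int(fs / f_max)  # shortest pitch period, in samples
    q_hi = int(fs / f_min)  # longest pitch period, in samples
    peak = max(range(q_lo, q_hi + 1), key=lambda q: ceps[q])
    return fs / peak


# Hypothetical test signal: a decaying pulse repeated every 40 samples,
# i.e. a 200 Hz periodic signal at an assumed 8000 Hz sampling rate.
FS = 8000
PERIOD = 40
frame = [0.9 ** (i % PERIOD) for i in range(1024)]

f0 = estimate_f0(frame, FS)
print(f"estimated fundamental frequency: {f0:.1f} Hz")
```

The search is restricted to quefrencies that correspond to plausible pitch periods because the low-quefrency part of the cepstrum mainly reflects the spectral envelope (the formant structure) rather than the pitch. Real speech would also need framing and a voiced/unvoiced decision, which this sketch omits.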

[Figure: Analysis for the word “Analyste” (Analyst), with panels for the Frequency Domain and the Quefrency Domain.]
