Upload
duongthien
View
214
Download
0
Embed Size (px)
Citation preview
►
2
Question:
Can you speak French if you don’t it?
For example:
Laissez-moi tranquille
Introduction
►
3
Answer:
Of course! You can!
Try to pronounce:
Laissez-moi tranquille
Less-ay mwah trahn-KEEL
Introduction
►
4
phrasebooks additionally provide phonetic spellings that approximate the sounds of the foreign phrase. These spellings employ the familiar writing system and sounds of the speaker’s language.
Travel Phrasebook
►
5
Input: Text entered by the speaker, in her own
language.
Output: Phonetic rendering of a foreign language translation of that text, which, when pronounced by the speaker, can be understood by the listener.
Software Program
►
6
Build several candidate Chinese-to-Chinglish systems and evaluate them as follows:
• Compute the normalized edit distance between the system’s output and a human generated Chinglish reference.
Evaluation
►
7
• A Chinese speaker pronounces the system’s output out loud, and an English listener takes dictation. Measure the normalized edit distance against an English reference.
• Automate the previous evaluation by replace
the two humans with: (1) a Chinese speech synthesizer, and (2) an English speech recognizer.
Evaluation
►
8
Source of training data:
Travel Phrasebook
Get:
1312 <Chinese, English, Chinglish> tuples
1182 utterances for training, 65 for development,
and 65 for test.
Data
►
9
First, because Chinglish and Chinese are written with the same characters, they render the same inventory of 416 distinct syllables but the distribution of Chinglish syllables differs with each other.
Properties
►
10
Second, multiple occurrences of an English word type are generally associated with the same Chinglish sequence. Also, Chinglish characters do not generally span multiple English words.
Properties
►
12
Use an online MT system to convert Chinese to an English word sequence (Eword).
FST A generates an English sound sequence (Epron). It is constructed from the CMU Pronouncing Dictionary.
Model
►
13
wFST B translates English sounds into Chinese sounds (Pinyin-split).
Pinyin is an off icial syllable-based romanization of Mandarin Chinese characters.
Pinyin-split is a standard separation of Pinyin syllables into initial and f inal parts.
Model
►
15
Construct string pairs <Epron, Pinyin-split>
Epron: by pronouncing the phrasebook English with FST A.
Pinyin-split: by pronouncing the phrasebook Chinglish with FSTs D and C.
Phoneme-based model
►
17
Mappings between phonemes are context sensitive.
For example: Grandmother
As the reference Pinyin-split sequence is:
Phoneme-phrase-based model
►
18
Second, extract phoneme phrase pairs consistent with these alignments.
Phoneme-phrase-based model
First, obtain Viterbi alignments using the phoneme-based model.
►
19
Create <English, Pinyin> training pairs from the phrasebook by pronouncing the Chinglish with
FST D.
EM learns values for parameters like P(nai te|night), plus Viterbi alignments such as:
Word-based model
►
20
To improve the accuracy of word-based EM alignment, use the phoneme based model to decode each English word in the training data to
Pinyin.
Hybrid training
►
21
wFST E fails if an utterance contains a new English word type, previously unseen in training.
phoneme-based models fail when some English word type falls outside the CMU pronouncing dictionary.
Use the word-based model for known English words, and the phoneme-based models for unknown English
words.
Hybrid decoding
►
22
Measure the Chinglish output against references from
the test portion of the phrasebook, using edit distance.
Start with reference English and measure the accuracy of Pinyin syllable production, since the choice of Chinglish character does not affect the Chinglish pronunciation.
First Evaluation
►
23
Word-based method has very high accuracy,
but low coverage.
Best system is the Hybrid training/decoding method.
First Evaluation
►
24
Speak Chinglish character sequence output aloud and ask an English monolingual person to transcribe it.
Measure edit distance between human transcription and reference English from our phrasebook.
Second Evaluation
►
25
Repeat the last experiment, but removing the human from the loop, using both automatic Chinese speech synthesis and English speech recognition.
Third Evaluation
►
27
Conclusion
Help people speak foreign languages they don’t know, by providing native phonetic spellings that approximate the sounds of foreign phrases.
Use a cascade of f inite-state transducers to accomplish the task.
Improve the model by adding phrases, word boundary constraints, and improved alignment.