How to Speak a Language without Knowing It

Yibin Jiang

2

Question:

Can you speak French if you don’t know it?

For example:

Laissez-moi tranquille

Introduction

3

Answer:

Of course! You can!

Try to pronounce:

Laissez-moi tranquille

Less-ay mwah trahn-KEEL

Introduction

4

In addition to translations, phrasebooks provide phonetic spellings that approximate the sounds of each foreign phrase. These spellings use the familiar writing system and sounds of the speaker’s language.

Travel Phrasebook

5

Input: Text entered by the speaker, in her own language.

Output: Phonetic rendering of a foreign language translation of that text, which, when pronounced by the speaker, can be understood by the listener.

Software Program

6

Build several candidate Chinese-to-Chinglish systems and evaluate them as follows:

• Compute the normalized edit distance between the system’s output and a human-generated Chinglish reference.

Evaluation

7

• A Chinese speaker pronounces the system’s output out loud, and an English listener takes dictation. Measure the normalized edit distance against an English reference.

• Automate the previous evaluation by replacing the two humans with: (1) a Chinese speech synthesizer, and (2) an English speech recognizer.

Evaluation
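All three evaluations rely on a normalized edit distance. A minimal sketch follows, assuming token-level Levenshtein distance divided by the reference length; the slides do not state which normalization was used, so that choice and the function names are assumptions.

```python
def edit_distance(hyp, ref):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete h
                            curr[j - 1] + 1,      # insert r
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def normalized_edit_distance(hyp, ref):
    """Edit distance divided by reference length (assumed normalization)."""
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(hyp, ref) / len(ref)
```

The same function can score Pinyin syllables against a Chinglish reference or transcribed English words against an English reference, since it only compares token lists.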

8

Source of training data:

Travel Phrasebook

Get:

1312 <Chinese, English, Chinglish> tuples

1182 utterances for training, 65 for development, and 65 for test.

Data

9

First, because Chinglish and Chinese are written with the same characters, they render the same inventory of 416 distinct syllables. However, the distribution of Chinglish syllables differs from that of Chinese.

Properties

10

Second, multiple occurrences of an English word type are generally associated with the same Chinglish sequence. Also, Chinglish characters do not generally span multiple English words.

Properties

11

A cascade of weighted finite-state transducers (wFSTs)

Model

12

Use an online MT system to convert Chinese to an English word sequence (Eword).

FST A generates an English sound sequence (Epron). It is constructed from the CMU Pronouncing Dictionary.

Model
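The role of FST A can be illustrated as a dictionary lookup from English words to phoneme sequences. This is only a sketch: `TOY_CMUDICT` is a two-entry, hand-copied excerpt in the style of the CMU Pronouncing Dictionary, and the function name `pronounce` is hypothetical. A real system would load the full dictionary and encode it as a transducer.

```python
# Tiny illustrative lexicon in CMU Pronouncing Dictionary style
# (ARPAbet phonemes, stress digits on vowels). Not the full dictionary.
TOY_CMUDICT = {
    "good": ["G", "UH1", "D"],
    "night": ["N", "AY1", "T"],
}


def pronounce(words, lexicon=TOY_CMUDICT):
    """Map an English word sequence (Eword) to a phoneme sequence (Epron).

    Raises KeyError for a word missing from the lexicon.
    """
    epron = []
    for word in words:
        key = word.lower()
        if key not in lexicon:
            raise KeyError(f"no pronunciation for {word!r}")
        epron.extend(lexicon[key])
    return epron


print(pronounce(["Good", "night"]))
```

The `KeyError` branch corresponds to words that fall outside the dictionary, which any lookup-based pronunciation step has to handle somehow.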

13

wFST B translates English sounds into Chinese sounds (Pinyin-split).

Pinyin is an official syllable-based romanization of Mandarin Chinese characters.

Pinyin-split is a standard separation of Pinyin syllables into initial and final parts.

Model
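Pinyin-split can be sketched as a prefix match against the list of Pinyin initials. The inventory below is an assumption of this sketch, in particular the decision to treat "y" and "w" as initials; the slides do not list the exact inventory used.

```python
# Pinyin initials, two-letter ones first so "zh", "ch", "sh" win over "z", "c", "s".
# Treating "y" and "w" as initials is an assumption of this sketch.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]


def split_syllable(syllable):
    """Split one toneless Pinyin syllable into (initial, final)."""
    for initial in INITIALS:
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return initial, syllable[len(initial):]
    return "", syllable  # zero-initial syllable such as "ai" or "er"


def pinyin_split(syllables):
    """Flatten a Pinyin syllable sequence into a Pinyin-split sequence."""
    parts = []
    for syllable in syllables:
        initial, final = split_syllable(syllable)
        if initial:
            parts.append(initial)
        parts.append(final)
    return parts


print(pinyin_split(["nai", "te"]))
```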

14

FSTs A, C, and D are unweighted.

Train wFST B and wFST E.

Training

15

Construct string pairs <Epron, Pinyin-split>

Epron: by pronouncing the phrasebook English with FST A.

Pinyin-split: by pronouncing the phrasebook Chinglish with FSTs D and C.

Phoneme-based model

16

Run the EM algorithm to learn wFST B parameters and Viterbi alignments.

Phoneme-based model
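To make the EM step concrete, here is a simplified stand-in rather than the model from the slides: an IBM Model 1-style trainer in which each Pinyin-split symbol is generated by exactly one English phoneme. The actual wFST B may use a different generative story (for example, one English phoneme emitting several Chinese sounds). The function names and the toy data in the usage note are hypothetical.

```python
from collections import defaultdict


def train_model1(pairs, iterations=10):
    """EM for P(c | e) over (Epron, Pinyin-split) pairs of token lists."""
    c_vocab = {c for _, cs in pairs for c in cs}
    t = defaultdict(lambda: 1.0 / len(c_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: collect fractional counts for every possible link.
        for es, cs in pairs:
            for c in cs:
                z = sum(t[(c, e)] for e in es)
                for e in es:
                    frac = t[(c, e)] / z
                    count[(c, e)] += frac
                    total[e] += frac
        # M-step: renormalize counts per English phoneme.
        t = defaultdict(float, {(c, e): n / total[e]
                                for (c, e), n in count.items()})
    return t


def viterbi_alignment(es, cs, t):
    """For each Chinese sound, the index of its most probable English phoneme."""
    return [max(range(len(es)), key=lambda i: t[(c, es[i])]) for c in cs]
```

With the toy pairs `[(["N", "AY"], ["n", "ai"]), (["N"], ["n"])]`, the second pair pins "n" to "N", and EM then shifts probability for "ai" toward "AY" in the first pair.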

17

Mappings between phonemes are context sensitive.

For example: Grandmother

As the reference Pinyin-split sequence is:

Phoneme-phrase-based model

18

First, obtain Viterbi alignments using the phoneme-based model.

Second, extract phoneme phrase pairs consistent with these alignments.

Phoneme-phrase-based model
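Extraction of phrase pairs consistent with an alignment can be sketched with the consistency criterion commonly used in phrase-based statistical machine translation: a pair is kept only if no link leaves the selected spans. This is a simplified illustration, not necessarily the exact procedure from the slides. The function name, the `max_e_len` limit, and the alignment for "night" in the usage note are assumptions.

```python
def extract_phrase_pairs(es, cs, alignment, max_e_len=3):
    """Extract (English-phoneme phrase, Pinyin-split phrase) pairs.

    alignment is a set of (i, j) links: es[i] is aligned to cs[j].
    max_e_len bounds the English side only.
    """
    pairs = set()
    for i1 in range(len(es)):
        for i2 in range(i1, min(i1 + max_e_len, len(es))):
            # Chinese positions linked to the English span [i1, i2].
            js = [j for (i, j) in alignment if i1 <= i <= i2]
            if not js:
                continue
            j1, j2 = min(js), max(js)
            # Consistency: nothing inside [j1, j2] may link outside [i1, i2].
            if any(j1 <= j <= j2 and not (i1 <= i <= i2)
                   for (i, j) in alignment):
                continue
            pairs.add((tuple(es[i1:i2 + 1]), tuple(cs[j1:j2 + 1])))
    return pairs
```

For example, with `es = ["N", "AY", "T"]`, `cs = ["n", "ai", "t", "e"]`, and the assumed links `{(0, 0), (1, 1), (2, 2), (2, 3)}`, the phrase `("T",)` is paired with `("t", "e")` but never with `("t",)` alone, because "e" is also linked to "T".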

19

Create <English, Pinyin> training pairs from the phrasebook by pronouncing the Chinglish with FST D.

EM learns values for parameters like P(nai te|night), plus Viterbi alignments such as:

Word-based model

20

To improve the accuracy of word-based EM alignment, use the phoneme-based model to decode each English word in the training data to Pinyin.

Hybrid training

21

wFST E fails if an utterance contains a new English word type, previously unseen in training.

Phoneme-based models fail when some English word type falls outside the CMU Pronouncing Dictionary.

Use the word-based model for known English words, and the phoneme-based models for unknown English words.

Hybrid decoding
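The fallback rule can be sketched as a per-word dispatch. Both components here are hypothetical stand-ins: `word_model` is a plain dictionary standing in for wFST E, and `phoneme_decoder` is any callable standing in for the phoneme-based cascade.

```python
def hybrid_decode(words, word_model, phoneme_decoder):
    """Decode English words to Pinyin syllables.

    word_model: dict from known English words to Pinyin syllable lists
        (stand-in for the word-based model).
    phoneme_decoder: callable mapping one word to a Pinyin syllable list
        (stand-in for the phoneme-based models).
    """
    pinyin = []
    for word in words:
        key = word.lower()
        if key in word_model:
            pinyin.extend(word_model[key])        # known word: word-based model
        else:
            pinyin.extend(phoneme_decoder(key))   # unknown word: phoneme-based model
    return pinyin
```

This sketch does not cover the case where a word is unknown to both components; a real system would need a further fallback there.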

22

Measure the Chinglish output against references from the test portion of the phrasebook, using edit distance.

Start with reference English and measure the accuracy of Pinyin syllable production, since the choice of Chinglish character does not affect the Chinglish pronunciation.

First Evaluation

23

The word-based method has very high accuracy, but low coverage.

The best system is the hybrid training/decoding method.

First Evaluation

24

Speak the Chinglish character sequence output aloud and ask a monolingual English speaker to transcribe it.

Measure the edit distance between the human transcription and the reference English from our phrasebook.

Second Evaluation

25

Repeat the last experiment, but remove the humans from the loop by using automatic Chinese speech synthesis and English speech recognition.

Third Evaluation

26

Third Evaluation

27

Conclusion

Help people speak foreign languages they don’t know, by providing native phonetic spellings that approximate the sounds of foreign phrases.

Use a cascade of finite-state transducers to accomplish the task.

Improve the model by adding phrases, word boundary constraints, and improved alignment.

28

Acknowledgement

Thank you