Data-driven approach to rapid prototyping Xhosa speech synthesis

Albert Visagie, Justus Roux
Centre for Language and Speech Technology
Stellenbosch University, South Africa



Introduction

• Japan-South African Intergovernmental Science and Technology Cooperation Programme.

• Goals:
  – Understand what is needed from a linguistic and technology standpoint.
  – Build a text-analysis front-end.
  – Build an experimental platform.


Outline

• Xhosa:
  – orthography,
  – phonetics,
  – tone.

• Approach:
  – text analysis,
  – HTS.


Xhosa

• Xhosa is spoken in South Africa by about 8 million people.

• One of the official languages of South Africa.
• The writing system is relatively young and is based on the English (Latin) alphabet.
• Many dialects.
• Click consonants borrowed from the Khoisan languages.


Xhosa: Orthography

• Agglutinative language.
• Nouns:
  – 15 classes (singular and plural classes counted separately).
  – Nouns take diminutive affixes.
• Verbs:
  – Verbs take affixes for subject, tense, negation, etc.

Examples:
  – teach: -fund-
  – preacher (teacher): umfundisi = u + m(u) + fund + is + i
  – small preacher: umfundisana = u + m(u) + fund + is + ana
  – He/she will teach them: uzakubafundisa = u + za + ku + ba + fund + is + a
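
The decompositions above can be checked mechanically. A minimal sketch, assuming that material in parentheses (the (u) of m(u)) is absent from the surface form and that no sound changes apply at the morpheme boundaries of these three words. That holds here, but Xhosa composition can in general involve changes such as vowel coalescence, so plain concatenation is not a general rule. The `surface` helper is hypothetical.

```python
import re

def surface(morphemes):
    """Concatenate morphemes, dropping optional material in parentheses: m(u) -> m.

    Assumes no sound changes at morpheme boundaries (true for the examples
    below, not for Xhosa morphology in general).
    """
    return "".join(re.sub(r"\(.*?\)", "", m) for m in morphemes)

# Decompositions copied from the slide examples.
EXAMPLES = {
    "umfundisi": ["u", "m(u)", "fund", "is", "i"],                 # preacher (teacher)
    "umfundisana": ["u", "m(u)", "fund", "is", "ana"],             # small preacher
    "uzakubafundisa": ["u", "za", "ku", "ba", "fund", "is", "a"],  # he/she will teach them
}

if __name__ == "__main__":
    for word, parts in EXAMPLES.items():
        print(word, surface(parts) == word)
```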


Xhosa: Phonetics

Consonants:
• Implosive /ɓ/ (written b).
• Ejective and aspirated versions of stops.
• 15 clicks.

Vowels:
• Five basic vowels, which also occur long.
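
The count of 15 clicks matches the standard orthography: three places of articulation (c dental, q alveolar, x lateral), each combined with five accompaniments. A small sketch that enumerates the written click graphemes; the accompaniment labels are descriptive choices for illustration, not terms taken from the slide.

```python
from itertools import product

# Place of articulation -> base letter in the standard orthography.
PLACES = {"dental": "c", "alveolar": "q", "lateral": "x"}

# Accompaniment -> how the grapheme is built from the base letter.
ACCOMPANIMENTS = {
    "plain": lambda b: b,              # c, q, x
    "aspirated": lambda b: b + "h",    # ch, qh, xh
    "voiced": lambda b: "g" + b,       # gc, gq, gx (breathy-voiced)
    "nasal": lambda b: "n" + b,        # nc, nq, nx
    "voiced nasal": lambda b: "ng" + b,  # ngc, ngq, ngx
}

# (place, accompaniment) -> grapheme, for every combination.
CLICKS = {
    (place, acc): build(base)
    for (place, base), (acc, build) in product(PLACES.items(), ACCOMPANIMENTS.items())
}

if __name__ == "__main__":
    print(len(CLICKS), sorted(CLICKS.values()))
```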


Xhosa: Tone

• According to the literature, Xhosa is a tone language.
• High, Low, and Falling tones.
• A recent dictionary marks tone on root morphemes; rules can be constructed to predict how tones move under morphological composition.

• Recent work:
  – Downing and Roux argue for accent.
  – Kuun: a statistical experiment suggests a highly regular structure.

• The observed regularity of pitch rises and duration increases gives a simple method to use in a first prototype.


Approach

• Focus on language-dependent components:
  – build the text analyser;
  – use an existing synthesiser.

• Choice: HTS 2.0
  – Model-driven, trainable synthesiser.
  – Contains language-independent F0 and duration models.
  – Makes good use of the synthesis database by predicting spectrum, F0 and segment duration separately.


HTS


HTS: Symbolic Features

Each segment of audio (HMM state) is labelled according to its linguistic context.

Examples:
• Phonetic context: labels of the preceding and following phones.
• Parts-of-speech.
• Stress or canonical tone.
• Counting features, e.g. the position of the syllable within the word.
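
To make the labelling concrete, here is a sketch of how per-phone context labels might be assembled. The quinphone part imitates the p1^p2-p3+p4=p5 notation of the HTS demo labels; the /POS: and /SYL: fields, the `Phone` class and the tag NOUN are hypothetical simplifications, not the label format used in this project. The example word is lali, from the demo phrase kule lali.

```python
from dataclasses import dataclass

@dataclass
class Phone:
    name: str     # phone label from the letter-to-sound stage
    pos: str      # part-of-speech tag of the containing word (hypothetical tag set)
    syl_fwd: int  # syllable position counted from the start of the word
    syl_bwd: int  # syllable position counted from the end of the word

def context_labels(phones, pad="x"):
    """Build one hypothetical context label per phone.

    Quinphone context (two phones either side, padded at the edges)
    followed by illustrative POS and syllable-position fields.
    """
    names = [pad, pad] + [p.name for p in phones] + [pad, pad]
    labels = []
    for i, p in enumerate(phones):
        l2, l1, c, r1, r2 = names[i:i + 5]  # c is the current phone
        labels.append(f"{l2}^{l1}-{c}+{r1}={r2}/POS:{p.pos}/SYL:{p.syl_fwd}_{p.syl_bwd}")
    return labels

# "lali": two syllables, la + li.
WORD = [Phone("l", "NOUN", 1, 2), Phone("a", "NOUN", 1, 2),
        Phone("l", "NOUN", 2, 1), Phone("i", "NOUN", 2, 1)]

if __name__ == "__main__":
    for label in context_labels(WORD):
        print(label)  # first line: x^x-l+a=l/POS:NOUN/SYL:1_2
```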


Text Analyser Components

Components:
  – Orthographic to phonetic
  – Morphological analysis
  – Parts-of-speech
  – Canonical tone marks


Orthographic to Phonetic

• The orthography is very young and highly consistent with the pronunciation.

• Hand-written letter-to-sound rewrite rules.

• Lexicon for loan words.
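
Because the orthography is so regular, rewrite rules of this kind can largely be applied as a greedy longest match over graphemes, with the loan-word lexicon consulted first. A minimal sketch under those assumptions: the grapheme list is a small hypothetical subset (mainly clicks and common digraphs), the output symbols are simply the graphemes rather than a real phone set, and the lexicon entry and its transcription are invented for illustration.

```python
# Hypothetical subset of multi-letter graphemes; a real rule set is larger
# and would map each grapheme to a proper phone symbol.
MULTI_GRAPHEMES = {
    "ngc", "ngq", "ngx", "gc", "gq", "gx", "nc", "nq", "nx",  # clicks
    "ch", "qh", "xh",                                          # aspirated clicks
    "tsh", "ts", "ph", "th", "kh", "bh", "hl", "dl", "ny", "sh",
}
MAX_LEN = max(len(g) for g in MULTI_GRAPHEMES)

# Hypothetical loan-word / proper-noun lexicon, consulted before the rules.
LEXICON = {
    "stellenbosch": ["s", "t", "e", "l", "e", "n", "b", "o", "s"],
}

def letter_to_sound(word):
    """Return a list of grapheme-level 'phones' for one word."""
    word = word.lower()
    if word in LEXICON:
        return list(LEXICON[word])
    phones, i = [], 0
    while i < len(word):
        # Greedy longest match: try the longest candidate first. A single
        # letter always matches, so the loop always advances.
        for n in range(min(MAX_LEN, len(word) - i), 0, -1):
            chunk = word[i:i + n]
            if n == 1 or chunk in MULTI_GRAPHEMES:
                phones.append(chunk)
                i += n
                break
    return phones

if __name__ == "__main__":
    print(letter_to_sound("ukunqwenela"))
```

The greedy choice matters: letter-by-letter conversion would split the click nq in ukunqwenela into n + q.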


Morphology

• Specially bootstrapped from a Zulu version for this project.

• Requires a lexicon of root morphemes.
• Works with isolated words.
• Ambiguous: a single word can receive several analyses.
• Ideal: root morpheme boundaries, affix types, and a POS tagger for disambiguation.
• Implemented: none.
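
The ambiguity is easy to reproduce with a lexicon-driven segmenter that enumerates every way of splitting a word into prefixes, one root and suffixes. A toy sketch; the affix and root inventories are hypothetical, drawn only from the umfundisi examples earlier in this deck, and a real analyser would also label affix types.

```python
# Toy morpheme inventories drawn from the slide examples (hypothetical).
PREFIXES = {"u", "m", "um", "za", "ku", "ba"}
ROOTS = {"fund"}
SUFFIXES = {"is", "i", "a", "ana"}

def analyses(word):
    """Return every segmentation of the form prefix* root suffix*, sorted."""
    results = []

    def take_suffixes(rest, acc):
        if not rest:
            results.append(acc)
            return
        for s in SUFFIXES:
            if rest.startswith(s):
                take_suffixes(rest[len(s):], acc + [s])

    def take_prefixes(rest, acc):
        for root in ROOTS:
            if rest.startswith(root):
                take_suffixes(rest[len(root):], acc + [root])
        for p in PREFIXES:
            if rest.startswith(p):
                take_prefixes(rest[len(p):], acc + [p])

    take_prefixes(word.lower(), [])
    return sorted(results)

if __name__ == "__main__":
    print(analyses("umfundisi"))
```

Even this toy inventory yields two analyses for umfundisi (u + m + ... and um + ...); with a full root lexicon the number of competing analyses grows quickly.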


Parts-of-Speech

• Morphological analysis.

• Ideal: POS tagger.

• Implemented: exhaustive lists of closed-class words (pronouns, conjunctions, prepositions, etc.).
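
The implemented approach amounts to a lookup tagger: closed-class words are tagged from exhaustive lists, and everything else is left as an open-class word. A sketch; the lists below hold only a few illustrative entries (e.g. mna 'I', kodwa 'but'), not the project's actual lists, and the tag names are hypothetical.

```python
# Illustrative entries only; the real lists would be exhaustive.
CLOSED_CLASSES = {
    "PRON": {"mna", "wena", "yena", "thina", "nina", "bona"},
    "CONJ": {"kodwa", "kwaye", "okanye"},
}
OPEN_TAG = "OPEN"  # hypothetical tag for words not found in any closed list

def tag(words):
    """Tag each word from the closed-class lists, defaulting to OPEN_TAG."""
    return [
        next((t for t, members in CLOSED_CLASSES.items() if w.lower() in members), OPEN_TAG)
        for w in words
    ]

if __name__ == "__main__":
    print(tag(["Kodwa", "mna", "umfundisi"]))
```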


Tone

• A printed dictionary with canonical tone markings for root morphemes is available.

• Rules can be constructed to determine movement of at least High tones, under morphological composition.

• Highly regular structure: the antepenultimate (3rd-from-last) syllable starts a high pitch excursion, and the penultimate (2nd-from-last) syllable is lengthened.

• Ideal: exhaustive specification of set tones.
• Implemented: word-level syllable counts (3-1, 2-2, 1-3).
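
Syllable-count features of this kind need no tone knowledge. A sketch, assuming (our reading, not stated on the slide) that pairs such as 3-1 give a syllable's position counted from the start and from the end of the word, and assuming purely open (C)V syllables, so syllabic nasals such as the m of umfundisi are ignored. The printed cues simply attach the slide's observed regularities to backward positions 3 and 2 for illustration; in the prototype, presumably HTS learns such patterns from the counts.

```python
VOWELS = set("aeiou")

def syllabify(word):
    """Split a word into open syllables: each syllable ends at a vowel.

    Simplification: syllabic nasals and other special cases are ignored;
    any trailing consonants are attached to the last syllable.
    """
    syllables, current = [], ""
    for ch in word.lower():
        current += ch
        if ch in VOWELS:
            syllables.append(current)
            current = ""
    if current:
        if syllables:
            syllables[-1] += current
        else:
            syllables.append(current)
    return syllables

def syllable_features(word):
    """Return (syllable, position from start, position from end) per syllable."""
    syls = syllabify(word)
    n = len(syls)
    return [(s, i + 1, n - i) for i, s in enumerate(syls)]

if __name__ == "__main__":
    for syl, fwd, bwd in syllable_features("amadoda"):
        cue = {3: "pitch rise starts", 2: "lengthened"}.get(bwd, "")
        print(f"{syl:4} {fwd}-{bwd} {cue}")
```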


Tests

• Basic intelligibility test: listeners were asked to transcribe what they heard.
  – Incomplete phrases.
  – Two versions of the question set, and natural (recorded) utterances.
  – Mother-tongue and second-language speakers.

• Impressions:
  – “He’s from the townships.”
  – “That’s perfect, there’s nothing wrong with that.”
  – Also frowns and repeats.
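
The slide does not say how the transcriptions were scored. One common option is word error rate: the word-level edit distance between the intended text and the listener's transcription, divided by the number of reference words. A minimal sketch of that metric, offered as an assumption rather than the procedure used in these tests.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

if __name__ == "__main__":
    # One reference word missing from the transcription: WER = 1/3.
    print(word_error_rate("Ubangele amadoda amaninzi", "ubangele amadoda"))
```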


Next Steps

• Comprehension test?

• Impressions.

• Baseline comparative/preference test.

• Improvements:
  – Question phrases.
  – Information from morphological analysis.
  – Canonical tone markings.

• Zulu


Conclusion

• The system worked very well, considering the bare minimum of linguistic knowledge it currently incorporates.

• A data-driven approach with HTS is well suited to bootstrapping a new language.

• We now have an experimental platform.


Demos

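“Ubangele amadoda amaninzi kule lali,”
  – Natural: (audio sample not reproduced)
  – Synthesised: (audio sample not reproduced)

“waqalisa ukunqwenela ukuba nomzi.”
  – Natural: (audio sample not reproduced)
  – Synthesised: (audio sample not reproduced)

Click song: (audio sample not reproduced)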