Data-driven approach to rapid prototyping Xhosa speech synthesis
Albert Visagie, Justus Roux
Centre for Language and Speech Technology
Stellenbosch University, South Africa
Introduction
• Japan-South African Intergovernmental Science and Technology Cooperation Programme.
• Goals:
  – Understand what is needed from a linguistic and technology standpoint.
  – Build a text-analysis front-end.
  – Experimental platform.
Outline
• Xhosa:
  – orthography,
  – phonetics,
  – tone.
• Approach:
  – text analysis,
  – HTS.
Xhosa
• Xhosa is spoken in South Africa, by about 8 million people.
• One of the official languages of South Africa.
• Writing system is relatively young, and based on English letters.
• Many dialects.
• Borrowed clicks from Khoisan.
Xhosa: Orthography
Agglutinative language.
Nouns:
  – 15 classes (including plural & singular).
  – Nouns affixed for diminutive.
Verbs:
  – Verbs affixed according to subject, tense, negative etc.
Examples:
  teach: -fund-
  preacher (teacher): umfundisi = u + m(u) + fund + is + i
  small preacher: umfundisana = u + m(u) + fund + is + ana
  He/she will teach them: uzakubafundisa = u + za + ku + ba + fund + is + a
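The decompositions above can be checked mechanically. The following is a minimal sketch, assuming that parenthesised material such as the (u) in m(u) marks a vowel that is dropped in the surface form; the function name `surface_form` is made up for this illustration and is not part of the project's analyser.

```python
import re

def surface_form(morphemes):
    """Join morphemes into a word, dropping parenthesised (elided) material."""
    return "".join(re.sub(r"\([^)]*\)", "", m) for m in morphemes)

# Decompositions taken from the examples on this slide.
examples = {
    "umfundisi": ["u", "m(u)", "fund", "is", "i"],
    "umfundisana": ["u", "m(u)", "fund", "is", "ana"],
    "uzakubafundisa": ["u", "za", "ku", "ba", "fund", "is", "a"],
}

for word, parts in examples.items():
    print(word, "<-", " + ".join(parts), "|", surface_form(parts) == word)
```

Going the other way, from a surface word to its morphemes, is the hard and ambiguous direction; this sketch only covers composition.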
Xhosa: Phonetics
Consonants:
• Implosive /b/.
• Ejectives and aspirated versions of stops.
• 15 clicks.
Vowels:
• Five basic vowels, including long versions.
Xhosa: Tone
• According to the literature, it’s a tone language.
• High, Low, and Falling tones.
• Recent dictionary: has tone marked for root morphemes; rules can be constructed to predict movement under morphological composition.
• Recent work:
  – Downing, Roux, argue for accent.
  – Kuun: Statistical experiment suggests highly regular structure.
• Observed regularity on pitch rises and duration increase gives a simple method to use in a first prototype.
Approach
Focus on language dependent components:
  – Build the text analyser,
  – use an existing synthesiser.
Choice: HTS 2.0
  – Model driven, trainable synthesiser.
  – Contains language independent F0 and duration models.
  – Good use of synthesis database by predicting spectrum, F0 and segment duration separately.
HTS
HTS: Symbolic Features
Each segment of audio (HMM state) is labelled according to its linguistic context.
Examples:
• Phonetic context: labels of preceding and following phones.
• Parts-of-speech.
• Stress or canonical tone.
• Counting (e.g. the position of a syllable within the word).
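To make the idea of symbolic features concrete, here is a minimal sketch that attaches a context string to every phone of a word. The label layout (`prev-cur+next/POS:.../FWD:.../BWD:...`) is invented for this illustration and is not the actual HTS label format; the function name `context_labels` and the `sil` boundary symbol are likewise assumptions.

```python
def context_labels(phones, pos_tag, sil="sil"):
    """Return one context label per phone.

    Each label records the previous and next phone, the word's
    part-of-speech, and the phone's position counted from the
    start (FWD) and from the end (BWD) of the word.
    """
    labels = []
    n = len(phones)
    for i, phone in enumerate(phones):
        prev_phone = phones[i - 1] if i > 0 else sil
        next_phone = phones[i + 1] if i < n - 1 else sil
        labels.append(
            f"{prev_phone}-{phone}+{next_phone}/POS:{pos_tag}/FWD:{i + 1}/BWD:{n - i}"
        )
    return labels

for label in context_labels(["l", "a", "l", "i"], "noun"):
    print(label)
```

A trainable synthesiser can then cluster HMM states by asking questions about such features, which is why the text analyser only has to produce the labels.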
Text Analyser Components
Components:
  – Orthographic to phonetic
  – Morphological analysis
  – Parts-of-speech
  – Canonical tone marks
Orthographic to Phonetic
• The orthography is very young, and highly consistent with the pronunciation.
• Hand-written letter-to-sound rewrite rules.
• Lexicon for loan words.
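A minimal sketch of how such a component might look: a lexicon lookup for loan words, followed by longest-match rewrite rules. The rule table is a tiny illustrative subset (aspirated stops written ph, th, kh and the click letters c, q, x), not the project's rule set, and the output phone names are made up for this example.

```python
# Illustrative subset only; phone names on the right are hypothetical.
RULES = {
    "ph": "p_h", "th": "t_h", "kh": "k_h",   # aspirated stops
    "c": "click_dental",
    "q": "click_alveolar",
    "x": "click_lateral",
}

def letter_to_sound(word, rules=RULES, lexicon=None):
    """Convert a word to a phone list: lexicon first, then rewrite rules."""
    word = word.lower()
    if lexicon and word in lexicon:
        return list(lexicon[word])        # loan words bypass the rules
    max_len = max(len(key) for key in rules)
    phones, i = [], 0
    while i < len(word):
        # Try the longest grapheme first so "ph" wins over "p" + "h".
        for size in range(max_len, 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                phones.append(rules[chunk])
                i += len(chunk)
                break
        else:
            phones.append(word[i])        # default: letter maps to itself
            i += 1
    return phones

print(letter_to_sound("waqalisa"))
```

Because the orthography is highly consistent with pronunciation, most letters fall through to the default case and only the multi-letter graphemes and clicks need explicit rules.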
Morphology
• Specially bootstrapped from a Zulu version for this project.
• Requires a lexicon of root morphemes.
• Works with isolated words.
• Ambiguous!
• Ideal: root morpheme boundaries, affix types, POS tagger for disambiguation.
• Implemented: None
Parts-of-Speech
• Morphological analysis.
• Ideal: POS tagger.
• Implemented: Exhaustive lists of closed sets – pronouns, conjunctions, prepositions, etc.
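The closed-set approach amounts to a table lookup. Below is a minimal sketch; the word lists contain only a few illustrative entries that would need checking against a real Xhosa lexicon, and the tag names and the `open` fallback tag are assumptions of this example.

```python
# Illustrative entries only; a real system would use exhaustive lists.
CLOSED_CLASSES = {
    "pronoun": {"mna", "wena"},
    "conjunction": {"kodwa", "okanye"},
}

def tag_word(word, closed_classes=CLOSED_CLASSES):
    """Return the closed-class tag for a word, or 'open' if it is in no list."""
    word = word.lower()
    for tag, members in closed_classes.items():
        if word in members:
            return tag
    return "open"

def tag_sentence(words):
    """Tag each word independently; no context is used for disambiguation."""
    return [(w, tag_word(w)) for w in words]

print(tag_sentence(["kodwa", "amadoda"]))
```

Everything that is not in a closed list, that is all nouns and verbs, ends up with the same fallback tag, which is the gap a proper POS tagger would fill.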
Tone
• A printed dictionary with canonical tone markings for root morphemes is available.
• Rules can be constructed to determine movement of at least High tones, under morphological composition.
• Highly regular structure: 3rd-from-last syllable starts high pitch excursion, 2nd-from-last syllable lengthened.
• Ideal: Exhaustive specification of set tones
• Implemented: Word-level syllable counts (3-1, 2-2, 1-3)
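The syllable counts can be sketched as follows, assuming each label pairs a syllable's position counted from the end of the word with its position counted from the start, so that a three-syllable word yields 3-1, 2-2, 1-3. The syllabifier is a rough approximation that closes a syllable at every vowel letter; it ignores syllabic nasals and would split long vowels written with doubled letters, so it is not the project's implementation.

```python
VOWELS = set("aeiou")

def syllabify(word):
    """Split a word into open syllables, closing one at each vowel letter."""
    syllables, current = [], ""
    for ch in word.lower():
        current += ch
        if ch in VOWELS:
            syllables.append(current)
            current = ""
    if current:                       # trailing consonants, if any
        if syllables:
            syllables[-1] += current
        else:
            syllables.append(current)
    return syllables

def position_counts(word):
    """Pair each syllable with a 'from-end'-'from-start' position label."""
    syllables = syllabify(word)
    n = len(syllables)
    return [(s, f"{n - i}-{i + 1}") for i, s in enumerate(syllables)]

print(position_counts("amadoda"))
```

With these labels the model can learn that a syllable labelled 3-x tends to start the pitch rise and one labelled 2-x tends to be lengthened, without any explicit tone marking.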
Tests
• Basic intelligibility test: Listeners asked to transcribe what they hear.
  – Incomplete phrases.
  – Two versions of the question set, and natural utterances (recoded)
  – Mother-tongue and second language speakers.
• Impressions:
  – “He’s from the townships.”
  – “That’s perfect, there’s nothing wrong with that.”
  – Also frowns and repeats.
Next Steps
• Comprehension test?
• Impressions.
• Baseline comparative/preference test.
• Improvements
  – Question phrases.
  – Information from morphological analysis.
  – Canonical tone markings.
• Zulu
Conclusion
• The system worked very well, considering the bare minimum of knowledge currently incorporated.
• Data driven approach with HTS well suited to bootstrapping a new language.
• An experimental platform is now in place.
Demos
“Ubangele amadoda amaninzi kule lali,”
– Natural:
– Synthesised:
“waqalisa ukunqwenela ukuba nomzi.”
– Natural:
– Synthesised:
Click song: