MobAppDev
Text-To-Speech Synthesis
Vladimir Kulyukin
Outline
● Text-to-Speech Synthesis (TTS)
● TTS on Android
● TTS Customization
● Overcoming TTS Limitations with Phonetic Spelling & Human Recording
Review
TTS: Text To Speech
● The General Problem: Take a sequence of characters and generate a waveform
● Words are pronounced as a sequence of individual units called phones
● Phonetic alphabets describe how phones are pronounced
● Phonological rules specify how phones combine into speech
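To make the idea of phones concrete, here is a minimal Java sketch of a pronunciation lexicon. The class name ToyLexicon, the three entries, and the choice of ARPAbet symbols are assumptions made for illustration; a real engine relies on a much larger lexicon plus letter-to-sound rules.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy lexicon: maps a word to its phones (ARPAbet symbols, stress marks omitted).
final class ToyLexicon {
    private static final Map<String, List<String>> PHONES = new HashMap<>();
    static {
        PHONES.put("cat", Arrays.asList("K", "AE", "T"));
        PHONES.put("text", Arrays.asList("T", "EH", "K", "S", "T"));
        PHONES.put("speech", Arrays.asList("S", "P", "IY", "CH"));
    }

    // Returns the phones for a word, or an empty list if the word is not in the lexicon.
    static List<String> phonesFor(String word) {
        List<String> phones = PHONES.get(word.toLowerCase(Locale.ROOT));
        return phones == null ? Collections.<String>emptyList() : phones;
    }

    public static void main(String[] args) {
        System.out.println(phonesFor("Speech"));
    }
}
```

A word missing from the lexicon yields an empty list, which is the point where a real engine would fall back to pronunciation rules.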
TTS Engine Anatomy
● A typical TTS engine consists of three components: text analyzer, linguistic analyzer, and waveform generator
● Text Analysis – parse text (after transliterating it if necessary) and identify words and utterances
● Linguistic Analysis – identify phrases and assign prosodies (accents, emphasis, duration, pauses, etc.)
● Waveform Generation – generate a waveform from a fully specified linguistic description
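The three stages can be sketched as a toy pipeline in plain Java. Everything here is an assumption made for illustration: the class ToyTtsPipeline, the rule that a word lasts 60 ms per letter, the 300 ms pause, and the 220 Hz tone that stands in for real speech. It shows only how data flows from text to samples, not how an actual engine works.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the three TTS stages; all names and constants are hypothetical.
final class ToyTtsPipeline {
    static final int SAMPLE_RATE = 8000; // samples per second

    // A stretch of audio to generate: either a voiced word or a silent pause.
    static final class Segment {
        final boolean silent;
        final int millis;
        Segment(boolean silent, int millis) { this.silent = silent; this.millis = millis; }
    }

    // Stage 1, text analysis: split text into utterances at '.', '!' and '?'.
    static List<String> utterances(String text) {
        List<String> result = new ArrayList<>();
        for (String part : text.split("[.!?]+")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) result.add(trimmed);
        }
        return result;
    }

    // Stage 2, linguistic analysis: give each word a duration and end with a pause.
    static List<Segment> prosody(String utterance) {
        List<Segment> plan = new ArrayList<>();
        for (String word : utterance.split("\\s+")) {
            plan.add(new Segment(false, 60 * word.length()));
        }
        plan.add(new Segment(true, 300));
        return plan;
    }

    // Stage 3, waveform generation: a 220 Hz tone for words, zeros for pauses.
    static short[] waveform(List<Segment> plan) {
        int total = 0;
        for (Segment s : plan) total += s.millis * SAMPLE_RATE / 1000;
        short[] samples = new short[total];
        int i = 0;
        for (Segment s : plan) {
            int n = s.millis * SAMPLE_RATE / 1000;
            for (int k = 0; k < n; k++, i++) {
                samples[i] = s.silent ? 0
                    : (short) (8000 * Math.sin(2 * Math.PI * 220 * k / SAMPLE_RATE));
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        for (String u : utterances("Hello world. Bye!")) {
            System.out.println(u + " -> " + waveform(prosody(u)).length + " samples");
        }
    }
}
```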
TTS Approaches
● Full Automation – machine does everything
● Mixed Initiative – human records a set of known texts; machine learning is used to extract the rules
● Human-Based Recording – human records words/sentences/texts; machine plays them as needed
TTS on Android
Android TTS
● Android TTS is a multilingual speech synthesis engine
● Android TTS can be used as a black box: text in, speech out
● Android TTS can be parameterized
Starting TTS
● It is best practice to check if TTS is available on the device
● This is done via an Intent that checks for TTS data
● If the check is successful, an instance of TTS can be created
● The Activity (or some other component) that uses TTS implements the OnInitListener interface
Overriding onPause() and onDestroy()
● When your Activity is paused (e.g., it loses focus), have TTS stop synthesizing
● When your Activity is destroyed, shut TTS down to notify Android that the resources can be released and given to other activities or applications
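The two rules above can be sketched in plain Java, so that the logic runs off-device. SpeechEngine and SpeakingScreen are hypothetical stand-ins, not Android classes; in a real Activity the engine would be an android.speech.tts.TextToSpeech instance, whose stop() and shutdown() methods play these two roles.

```java
// Plain-Java stand-in for an Activity that owns a speech engine.
// SpeechEngine and SpeakingScreen are hypothetical names used for illustration.
interface SpeechEngine {
    void stop();      // interrupt the current utterance and drop queued ones
    void shutdown();  // release the engine's resources
}

final class SpeakingScreen {
    private SpeechEngine engine;

    SpeakingScreen(SpeechEngine engine) { this.engine = engine; }

    // Called when the screen loses focus: silence the engine but keep it.
    void onPause() {
        if (engine != null) engine.stop();
    }

    // Called when the screen is torn down: release the engine for good.
    void onDestroy() {
        if (engine != null) {
            engine.shutdown();
            engine = null; // guards against a second shutdown
        }
    }

    boolean hasEngine() { return engine != null; }
}
```

The null checks matter because initialization may have failed, in which case there is no engine to stop or shut down.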
TTS Customization
Overcoming TTS Limitations
● Every TTS engine mispronounces some words (one can think of it as a fundamental theorem of TTS)
● There are two ways of overcoming this limitation:
  ○ Phonetic spelling: spell mispronounced words the way they sound, generate waveforms, associate words with waveforms, & save them
  ○ Human recording: have a human record mispronounced words, save them in audio files, and use those files
Audio Dictionary Application
● Develop an application that allows the user to create an audio dictionary of phonetically spelled words if their accurate spellings are mispronounced by the TTS engine
● The application allows the user to spell words as they are pronounced
● The phonetic words are converted into wav files by the TTS engine and saved on the device's sdcard
● The saved wav files are associated with the correct spelling
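The association between a correct spelling and a saved wav file can be modeled in a few lines of plain Java. AudioDictionary, its methods, and the sample path are hypothetical. Splitting a sentence into words is a simplification of this sketch; on Android, TextToSpeech.addSpeech() maps a whole text string to a sound file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of an audio dictionary: correct spelling -> wav file path.
final class AudioDictionary {
    private final Map<String, String> wavBySpelling = new HashMap<>();

    // Associate a correctly spelled word with the wav file of its phonetic spelling.
    void associate(String spelling, String wavPath) {
        wavBySpelling.put(spelling, wavPath);
    }

    // For each word, decide whether to play a saved file or synthesize the text.
    List<String> plan(String sentence) {
        List<String> steps = new ArrayList<>();
        for (String word : sentence.trim().split("\\s+")) {
            String wav = wavBySpelling.get(word);
            steps.add(wav != null ? "play " + wav : "synthesize " + word);
        }
        return steps;
    }

    public static void main(String[] args) {
        AudioDictionary dict = new AudioDictionary();
        dict.associate("Kulyukin", "/sdcard/phonetic_spelling/kulyukin.wav");
        System.out.println(dict.plan("Professor Kulyukin"));
    }
}
```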
Audio Dictionary Application Screenshot
Implementation
Source code is here: https://github.com/VKEDCO/TTSOnAndroid/blob/master/AudioDictionaryViaSpeechSynthesis.zip?raw=true
Storing Files on SDCard
● Create a directory on the device's sdcard (manually or programmatically)
● If you are using Eclipse:
  ○ open the DDMS perspective
  ○ click on the device's name in the Devices panel on the left
  ○ click on the File Explorer tab on the right
  ○ go to /storage/sdcard and create a folder (e.g., my_audio_files)
● You can do the same steps on your Android device by connecting it to your computer as a storage device with a USB cable
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
Setting Reading & Writing Permissions in AndroidManifest.xml
// Initialize TTS in onCreate() of the main activity
String mSDCardFolder = null;
@Override
public void onCreate(Bundle savedInstance) {
  super.onCreate(savedInstance);
  // Do the GUI stuff here & TTS initialization
  mSDCardFolder = Environment.getExternalStorageDirectory() + "/phonetic_spelling/";
}
Setting the External Storage Directory
public class AudioDictionaryAct extends Activity
  implements OnInitListener {
  // If TTS is initialized successfully, enable the Speak and
  // Record buttons
  @Override
  public void onInit(int status) {
    if ( status == TextToSpeech.SUCCESS ) {
      btnSpeak.setEnabled(true);
      btnRecord.setEnabled(true);
    }
  }
}
Implement OnInitListener in the Main Activity
// Start the TTS data check in onCreate() of the main activity
@Override
public void onCreate(Bundle savedInstance) {
  super.onCreate(savedInstance);
  // Do the GUI stuff here
  Intent checkIntent = new Intent();
  checkIntent.setAction(TextToSpeech.Engine.ACTION_CHECK_TTS_DATA);
  startActivityForResult(checkIntent, REQ_TTS_STATUS_CHECK);
}
TTS Initialization
TextToSpeech mTTS = null;
@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
  if ( requestCode == REQ_TTS_STATUS_CHECK ) {
    switch ( resultCode ) {
    case TextToSpeech.Engine.CHECK_VOICE_DATA_PASS:
      // TTS data is present: create the engine; onInit() is called when it is ready
      mTTS = new TextToSpeech(this, this);
      Log.v(TAG, TTS_INSTALLED_MSG);
      break;
    case TextToSpeech.Engine.CHECK_VOICE_DATA_FAIL:
      // TTS data is missing: ask Android to install it
      Log.v(TAG, INSTALL_TTS_DATA_MSG + resultCode);
      Intent installTTSDataIntent = new Intent();
      installTTSDataIntent.setAction(TextToSpeech.Engine.ACTION_INSTALL_TTS_DATA);
      startActivity(installTTSDataIntent);
      break; // without this break, control falls through to default
    default:
      Log.e(TAG, TTS_UNAVAILABLE_MSG);
    }
  }
}
TTS Initialization
Button Logic
btnSpeak = (Button)findViewById(R.id.btnSpeak);
btnSpeak.setOnClickListener(new OnClickListener() {
public void onClick(View view) {
mTTS.speak(edTxtPhoneticSpelling.getText().toString(), TextToSpeech.QUEUE_ADD, null);
}
});
Speak
btnRecord = (Button)findViewById(R.id.btnRecord);
btnRecord.setOnClickListener(new OnClickListener() {
public void onClick(View view) {
soundFilename = mSDCardFolder + edTxtUserFileName.getText().toString();
soundFile = new File(soundFilename);
if (soundFile.exists()) { soundFile.delete(); }
if (mTTS.synthesizeToFile(edTxtPhoneticSpelling.getText().toString(), null,
soundFilename) == TextToSpeech.SUCCESS ) {
btnPlay.setEnabled(true);
btnAssociate.setEnabled(true);
}}});
Record
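The Record handler above uses the file name exactly as the user typed it. One possible hardening, sketched here in plain Java, is to normalize the name before building the path. The helper names safeWavName and wavFile, and the rule of replacing every character other than letters, digits, '_' and '-', are assumptions of this sketch and are not part of the original application.

```java
import java.io.File;
import java.util.Locale;

// Hypothetical helper: turn whatever the user typed into a safe .wav file name.
final class WavNames {
    static String safeWavName(String userInput) {
        String name = userInput.trim();
        // Drop a trailing ".wav" so that it is not doubled below.
        if (name.toLowerCase(Locale.ROOT).endsWith(".wav")) {
            name = name.substring(0, name.length() - 4);
        }
        // Keep letters, digits, '_' and '-'; replace everything else, including '/' and '.'.
        name = name.replaceAll("[^A-Za-z0-9_-]", "_");
        if (name.isEmpty()) name = "untitled";
        return name + ".wav";
    }

    // Build the target file inside the dictionary folder.
    static File wavFile(File folder, String userInput) {
        return new File(folder, safeWavName(userInput));
    }

    public static void main(String[] args) {
        System.out.println(wavFile(new File("phonetic_spelling"), "my word").getPath());
    }
}
```

Because path separators are replaced, the resulting file always stays inside the chosen folder.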
btnPlay = (Button)findViewById(R.id.btnPlay);
btnPlay.setOnClickListener( new OnClickListener() {
  public void onClick(View view) {
    try {
      Log.v("AUDIODICTIONARY", soundFilename);
      mPlayer = new MediaPlayer();
      mPlayer.setDataSource(soundFilename);
      mPlayer.prepare();
      mPlayer.start();
    }
    catch (Exception e) {
      // handle exception, e.g., the file is missing or cannot be decoded
      Log.e("AUDIODICTIONARY", "Playback failed", e);
    }
  }
});
Play
btnAssociate = (Button)findViewById(R.id.btnAssociate);
btnAssociate.setOnClickListener(new OnClickListener() {
public void onClick(View view) {
mTTS.addSpeech(edTxtRealSpelling.getText().toString(), soundFilename);
}
});
Associate Audio with Spelling
Overcoming TTS Limitations through Human Recording
What Is This?
Bhagavatgita, Verse 1
dharmakshetre kurukshetre samaveta yuyutsavah
mamakah pandavashcaiva kim akurvata sanjaya
Bhagavatgita, V. 1 Transliterated
Что свершали, - скажи Санджая, -
сыновья мои и Пандавы,
ради битвы сойдясь на поле
Kурукшетры, на поле дхармы?
Перевод В.С. Семенцова
What Is This?
Что свершали, - скажи Санджая, -
сыновья мои и Пандавы,
ради битвы сойдясь на поле
Kурукшетры, на поле дхармы?
Перевод В.С. Семенцова
The Russian Translation of Bhagavatgita V. 1
Chto svershali, - skazhi Sandzhaya, -
synovya moi i Pandavy,
radi bitvy soydyas' na pole
Kurukshetry, na pole dharmy?
Translated by V.S. Sementsov
Transliteration of Russian Translation
Oh, Sanjaya, tell me what happened at Kurukshetra, the field of dharma, where my family and the Pandavas gathered to fight?
Translated by Eknath Easwaran
English Translation of Bhagavatgita, V. 1
TTS Bhagavatgita Project
Source code is here: https://github.com/VKEDCO/TTSOnAndroid/blob/master/BhagavatGitaTTS_v43.zip
The Problem
Have your Android device read the first verse of Bhagavatgita in Sanskrit, Russian, & English.
Sample Screenshot
Logical Steps of a Solution
● Write a Devanagari transliterator that takes Sanskrit texts and produces their Latin transliterations
● Write a Cyrillic transliterator that takes Russian texts and produces their Latin transliterations
● Have human readers record Sanskrit and Russian words
● Associate strings with specific recordings
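To give a feel for the Cyrillic step, here is a toy letter-by-letter transliterator in plain Java. The class name and the letter table are assumptions of this sketch; it ignores context, stress, and the many exceptions a real transliterator must handle, and its output can differ from a hand-made transliteration (for example, it keeps the soft sign as an apostrophe). Cyrillic letters are written as Unicode escapes so that the source compiles under any file encoding.

```java
// Toy letter-by-letter Cyrillic-to-Latin transliterator (hypothetical scheme).
final class ToyCyrillicTransliterator {
    // Latin equivalents for U+0430..U+044F, the Russian lowercase letters in code-point order.
    private static final String[] LATIN = {
        "a", "b", "v", "g", "d", "e", "zh", "z", "i", "y", "k", "l", "m", "n", "o", "p",
        "r", "s", "t", "u", "f", "kh", "ts", "ch", "sh", "shch", "", "y", "'", "e", "yu", "ya"
    };

    static String toLatin(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            char lower = Character.toLowerCase(c);
            String latin;
            if (lower >= '\u0430' && lower <= '\u044f') {
                latin = LATIN[lower - '\u0430'];
            } else if (lower == '\u0451') { // the letter yo, which sits outside the main range
                latin = "yo";
            } else {
                out.append(c); // not a Russian letter: copy unchanged
                continue;
            }
            if (c != lower && !latin.isEmpty()) { // preserve capitalization
                latin = Character.toUpperCase(latin.charAt(0)) + latin.substring(1);
            }
            out.append(latin);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The escapes spell the Russian word for "of battle" from the verse.
        System.out.println(toLatin("\u0431\u0438\u0442\u0432\u044b"));
    }
}
```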
Real Steps
● We will skip transliterator implementation (quite likely an M.S./Ph.D. type of project)
● Record .wav files & save them on the SD card
● Associate .wav files with specific strings
● Have the TTS engine load the recordings associated with those strings from the SD card at run time
mTTS.addSpeech("sn_akurvata", snPath + "sn_akurvata.wav");
mTTS.addSpeech("sn_dharmakshetre", snPath + "sn_dharmakshetre.wav");
mTTS.addSpeech("sn_kim", snPath + "sn_kim.wav");
mTTS.addSpeech("sn_kurukshetre", snPath + "sn_kurukshetre.wav");
mTTS.addSpeech("sn_mamakah", snPath + "sn_mamakah.wav");
mTTS.addSpeech("sn_pandavashcaiva", snPath + "sn_pandavashcaiva.wav");
mTTS.addSpeech("sn_samaveta", snPath + "sn_samaveta.wav");
mTTS.addSpeech("sn_samjaya", snPath + "sn_samjaya.wav");
mTTS.addSpeech("sn_yuyutsavah", snPath + "sn_yuyutsavah.wav");
Adding Sanskrit to TTS Engine
mTTS.addSpeech("ru_bitvy", ruPath + "ru_bitvy.wav");
mTTS.addSpeech("ru_chto", ruPath + "ru_chto.wav");
mTTS.addSpeech("ru_dharmy", ruPath + "ru_dharmy.wav");
mTTS.addSpeech("ru_i", ruPath + "ru_i.wav");
mTTS.addSpeech("ru_kurukshetry", ruPath + "ru_kurukshetry.wav");
mTTS.addSpeech("ru_moi", ruPath + "ru_moi.wav");
mTTS.addSpeech("ru_na", ruPath + "ru_na.wav");
mTTS.addSpeech("ru_pandavy", ruPath + "ru_pandavy.wav");
mTTS.addSpeech("ru_pole", ruPath + "ru_pole.wav");
Adding Russian to TTS Engine
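The registrations above all follow one pattern: the key is a prefix plus the word, and the file is the folder plus the key plus ".wav". A minimal plain-Java sketch can derive the pairs from a word list instead of writing each line by hand. SpeechRegistry, entries, and the sample folder path are hypothetical names introduced for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: derive the (key, wav path) pairs from a word list.
final class SpeechRegistry {
    static Map<String, String> entries(String prefix, String folder, String... words) {
        Map<String, String> map = new LinkedHashMap<>(); // keeps the word order
        for (String w : words) {
            String key = prefix + w;             // e.g. "ru_" + "pole"
            map.put(key, folder + key + ".wav"); // e.g. folder + "ru_pole.wav"
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> ru = entries("ru_", "/sdcard/bhagavatgita/ru/", "bitvy", "chto", "pole");
        for (Map.Entry<String, String> e : ru.entrySet()) {
            // On Android, this is where mTTS.addSpeech(e.getKey(), e.getValue()) would go.
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```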
final static String SN_PREFIX = "sn_";
public void saySanskritWords() {
for(String w: mSNWords) speakWord(SN_PREFIX + w);
}
final static String RU_PREFIX = "ru_";
public void sayRussianWords() {
for(String w: mRUWords) speakWord(RU_PREFIX + w);
}
Speaking Sanskrit & Russian
public void speakWord(String word) {
mTTS.speak(word, TextToSpeech.QUEUE_ADD, null);
}
Speaking Sanskrit & Russian
Storing Audio Files on SDCard
● Create a folder called /bhagavatgita on the sdcard, inside the directory returned by Environment.getExternalStorageDirectory().getPath()
● Create two subfolders /bhagavatgita/sn/ and /bhagavatgita/ru/
● Place the audio files from this zip archive into the appropriate folders
● Here is the full link to the above zip archive: https://github.com/VKEDCO/TTSOnAndroid/blob/master/bhagavatgita.zip
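Before registering the recordings, it can help to verify that every expected file was actually copied into the folder. The following plain-Java sketch does that; RecordingCheck and missing are hypothetical names, and the check is an addition of this sketch, not a step of the original project.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical check: which expected recordings are absent from a folder?
final class RecordingCheck {
    // Returns the names of the expected wav files that do not exist in the folder.
    static List<String> missing(File folder, String prefix, String... words) {
        List<String> absent = new ArrayList<>();
        for (String w : words) {
            String name = prefix + w + ".wav";
            if (!new File(folder, name).isFile()) absent.add(name);
        }
        return absent;
    }

    public static void main(String[] args) {
        File sn = new File("bhagavatgita/sn");
        System.out.println("Missing: " + missing(sn, "sn_", "kim", "kurukshetre"));
    }
}
```

An empty result means every listed word has a recording in place.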
References & Reading Suggestions
● http://developer.android.com/reference/android/speech/tts/TextToSpeech.html