
Special Keyboard (Papan Ketik Khusus, PATIKUS): Text-to-Speech Conversion on a Portable Device


Implementation of Text-to-Speech Conversion on a Portable Tool Integrated with Cell Phones for People with Disabilities

Muhammad Taufiq, Akhmad Hendriawan, Ardik Wijayanto

Electronics Engineering Study Program of Applied Graduate

Department of Electronics Engineering

Electronic Engineering Polytechnic Institute of Surabaya

EEPIS Campus, Raya Sukolilo street ITS, Surabaya, East Java – Indonesia 60111

Tlp: (031) 594 7280; Fax: (031) 594 6114

Email: [email protected], [email protected], [email protected]

ABSTRACT

Research on implementing text-to-speech methods has been carried out before, for example in "The Design Tool Talk in the Shape of Text to Voice Changers Portable Equipped with Input Text". However, that study has shortcomings in its syllable recognition ability and in the capacity of the voice database used. This study therefore aims to overcome those shortcomings, both in the size of the sound database and in the method used for syllable recognition. The voice database contains a total of 4700 recorded sounds and syllables with the patterns V, VK, VKK, K, KV, KVK, KVKK, KVKKK, KKV, KKVK, KKVKK, KKVKKK, KKKV, KKKVK (V is a vowel and K is a consonant). This database serves as the reference sound for generating the synthesized speech signal. The input text is first normalized into a row of capital letters and then converted into a row of syllables using Finite State Automata (FSA). The syllables are then processed by syllable concatenation: each syllable is matched to its corresponding sound in the database, and the sounds are combined to obtain the final synthesized speech. Test results show that the system can recognize syllables and convert them into sound according to the input text. Syllable recognition succeeded for 90% of the 10 kinds of text tested. The conversion of syllables into sound achieved a maximum success rate of 75% among 20 respondents.

Keywords: Text to Speech, Syllable Concatenation, Finite State Automata (FSA), Syllable Database, People with Disabilities

I. INTRODUCTION

Much research explores the problems faced by persons with disabilities. One problem they often face is communication. In modern times, many vital pieces of telecommunications equipment work using sound input, such as telephones, handy-talkie (HT) radios, the internet, and others. This equipment is very important in emergencies, for example to call the police post, fire station, hospital, and so forth. People with this kind of disability are almost certainly unable to use such equipment [1]. This shows that telecommunication technology has yet to fully reach people with special needs. In addition, as social beings they naturally still want to communicate with others, especially family, even when meeting in person is not possible. Telecommunication tools designed for them are therefore very much needed.

However, telecommunication tools of this kind, such as the TTY phone, are still limited [2]. The phone still relies on the wired PSTN as its data channel and does not yet operate on GSM or CDMA networks, so it can only be used in fixed places such as at home or in the office. In addition, the phone is still fairly expensive at $339.17 [3]. Moreover, these devices are not available in Indonesia, so bringing one in from abroad incurs an additional fee.

Given this situation, an engineering solution is needed that can deliver a telecommunication tool that is effective, economical, and efficient, and that answers the needs of people with disabilities. Several studies have explored such solutions, for example the work of Dwi Lisnasari, who implemented a portable text-to-voice speaking aid [4]. That work, however, still has shortcomings in its syllable recognition ability and in the capacity of the voice database used. Therefore, in this thesis, the author sets out to create a long-distance telecommunication tool in the form of a portable gadget equipped with a virtual keyboard for text input. Through the text-to-speech system integrated in it, the tool converts text into sound that substitutes for the user's voice when making a phone call. In this way, the tool is expected to provide ease and comfort in telecommunication and to address the technology gap experienced by people with disabilities.

Electronics Jornal of EEPIS, Electronics Engineering, Vol.2, No.2, (2023)

II. DESIGN and DEVELOPMENT

In general, this tool is an embedded system that integrates a touchscreen TFT LCD, functioning as a virtual keyboard, with an ARM Cortex-M3 microcontroller, and is attached to a mobile phone through the headset. With a text-to-speech engine embedded in software as the artificial voice synthesizer, the end result is a new device that serves as a telecommunications tool for people with disabilities.

Figure 1. Block diagram of the system

Text-to-speech (TTS), or speech synthesis, is a system that converts a row of text into speech as output. In principle, a speech synthesis system consists of two basic parts:

Figure 2. Block diagram of the text-to-speech system [5]

1. Text-to-phoneme converter

The text-to-phoneme converter takes an input sentence in a particular language in the form of lines of text and rewrites items such as numbers and signs into the written form of how they should sound, a step often called text normalization. It then determines the phonetic code (phonetic transcription) of each word, along with its duration and pitch. A phoneme code represents a unit of sound as spoken; the pronunciation of a word or phrase is, in principle, a sequence of phoneme codes.

2. Phoneme-to-speech converter

The phoneme-to-speech converter accepts the phoneme codes, pitch, and duration generated by the previous part. From these codes it produces the sound or speech signal corresponding to the sentence to be spoken. Several alternative techniques can be used to implement this part; one of them is diphone concatenation. A system that uses diphone concatenation must be supported by a diphone database containing recorded speech segments of each diphone.

[Figure 2 block labels: Text; Text-to-Phoneme Converter; phoneme code, tone, and duration; Phoneme-to-Speech Converter; Speech; Indonesian intonation model; Indonesian diphone database]


The text-to-voice converter can use the Finite State Automata (FSA) algorithm. The following diagram shows how the FSA algorithm recognizes where syllables are cut:

Figure 3. Diagram of the FSA algorithm [6]

Another highly important system component is the voice database. This database is used as the reference for generating the sound signal. It contains approximately 4700 sound samples recorded from raw Indonesian syllables with the patterns V, VK, VKK, K, KV, KVK, KVKK, KVKKK, KKV, KKVK, KKVKK (V is a vowel and K is a consonant). The samples are recorded at a sampling frequency of 44100 Hz in *.WAV format and stored on an SD memory card.
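The V/K pattern notation above can be illustrated with a short Python sketch. It is only an illustration: the function name `syllable_pattern` and the letter-by-letter mapping are assumptions, not the authors' implementation, and digraphs such as "NG" are counted here as two consonants.

```python
import string

VOWELS = set("AEIOU")

# Patterns listed in the abstract of this paper (V = vowel, K = consonant).
PATTERNS_IN_ABSTRACT = {
    "V", "VK", "VKK", "K", "KV", "KVK", "KVKK", "KVKKK",
    "KKV", "KKVK", "KKVKK", "KKVKKK", "KKKV", "KKKVK",
}


def syllable_pattern(syllable: str) -> str:
    """Map each letter of a syllable to V or K, e.g. 'TEKS' -> 'KVKK'."""
    letters = syllable.upper()
    if not letters or any(ch not in string.ascii_uppercase for ch in letters):
        raise ValueError(f"not a plain alphabetic syllable: {syllable!r}")
    return "".join("V" if ch in VOWELS else "K" for ch in letters)
```

For example, the syllables "PRO" and "TRAK" from Table 1 map to KKV and KKVK, both of which appear in the list of patterns.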

Text is entered through an interface in the form of a virtual keyboard. The buttons are the standard ones commonly used for short messages (SMS) on a smartphone. The buttons and their functions are as follows:

a. Number keys 0 to 9: input numeric characters.
b. Alphabet keys A to Z and a to z: input alphabetic characters.
c. Period key: inputs the period character.
d. Shift key: switches the letter case.
e. Space bar: inputs the space character.
f. Delete button: removes all text that has been entered.
g. Backspace key: removes the last character that has been entered.
h. Speak button: processes the entered text into sound.
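The button behaviour listed above can be modelled as a small text buffer. The following Python sketch is hypothetical: the class name `VirtualKeyboardBuffer`, the key names, and the assumption that Shift toggles between lower and upper case are illustrative and are not taken from the device firmware.

```python
from typing import List, Optional


class VirtualKeyboardBuffer:
    """Toy model of the text buffer behind the virtual keyboard."""

    def __init__(self) -> None:
        self.chars: List[str] = []
        self.shift_on = False  # assumed: Shift toggles upper/lower case

    def press(self, key: str) -> Optional[str]:
        """Apply one key press; return the text only when Speak is pressed."""
        if key == "SHIFT":
            self.shift_on = not self.shift_on
        elif key == "SPACE":
            self.chars.append(" ")
        elif key == "DELETE":
            self.chars.clear()          # remove all entered text
        elif key == "BACKSPACE":
            if self.chars:
                self.chars.pop()        # remove the last character only
        elif key == "SPEAK":
            return "".join(self.chars)  # hand the text to the TTS stage
        elif len(key) == 1 and key.isascii() and (key.isalnum() or key == "."):
            self.chars.append(key.upper() if self.shift_on else key.lower())
        else:
            raise ValueError(f"unknown key: {key!r}")
        return None
```

Returning the text only on Speak keeps the editing keys free of side effects, which mirrors the description that conversion starts when the Speak button is pressed.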

[Figure 3 states]
START
Q0: initial state
Q1: recognizes a space
Q2: recognizes a vowel (V)
Q3: recognizes a vowel (V)
Q3, Q4, Q5, Q7: recognize a consonant (K)
Q6: recognizes a two-letter consonant
Q8: recognizes a consonant-vowel pair (KV)

[Figure 3 transition labels: blank/vowel; vowel; 'G', 'Y'; 'N'; 'H'; 'Y'; consonant except N, K, S; 'K'; 'S']
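Because the transition table of the FSA in Figure 3 cannot be fully recovered from the text, the following Python sketch uses a simplified rule-based splitter instead of the authors' automaton. The rules are assumptions chosen for illustration: "NG" and "NY" are treated as single consonants, every vowel starts its own syllable nucleus (diphthongs are not handled), and a consonant cluster between two vowels gives its last one or two consonants to the following syllable.

```python
from typing import List

VOWELS = set("AEIOU")
DIGRAPHS = ("NG", "NY")  # assumed two-letter consonants


def to_units(word: str) -> List[str]:
    """Split a word into letters, keeping NG and NY together."""
    units, i = [], 0
    while i < len(word):
        step = 2 if word[i:i + 2] in DIGRAPHS else 1
        units.append(word[i:i + step])
        i += step
    return units


def syllabify(word: str) -> List[str]:
    """Simplified syllable splitter (not the FSA of Figure 3)."""
    units = to_units(word.upper())
    vowel_idx = [i for i, u in enumerate(units) if u in VOWELS]
    if not vowel_idx:
        return ["".join(units)] if units else []
    boundaries = []
    for left, right in zip(vowel_idx, vowel_idx[1:]):
        n = right - left - 1  # consonant units between the two vowels
        # 0 consonants: split between the vowels; 1 or 2: one consonant
        # starts the next syllable; 3 or more: two consonants start it.
        onset = 0 if n == 0 else (1 if n <= 2 else 2)
        boundaries.append(right - onset)
    starts = [0] + boundaries
    ends = boundaries + [len(units)]
    return ["".join(units[s:e]) for s, e in zip(starts, ends)]
```

Under these assumptions the sketch reproduces most splits in Table 1, for example `syllabify("IMPLEMENTASI")` gives IM, PLE, MEN, TA, SI. It does not reproduce every row: "KHUSUS" is split as KHU, SUS here, whereas Table 1 reports KU, SUS.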


Figure 4. Performance of virtual keyboard

Designing and Creating Algorithms

The software is the program that carries out the algorithm for converting text into sound. The following flowchart shows how the software works.

Figure 5. Flowchart of the text-to-voice conversion process

III. TESTING and ANALYSIS

Testing the text-to-syllable converter

This test is intended to determine how reliably the software converts text into syllables. It runs on the ARM microcontroller as the computing center, and the resulting syllables are displayed through HyperTerminal on a PC screen.

Block diagram of the test system

Figure 6. Block diagram of the test system for the text-to-syllable converter

[Figure 6 block labels: Keyboard; Minsys STM32; PC HyperTerminal]

The test results

[Figure 5 flowchart labels: START; input text; normalize text; convert text into syllables; convert syllables to voice; synthesize voice; STOP]


Figure 7. Results of the text-to-syllable conversion test

Table 1. Text-to-syllable conversion test

| No. | Text input     | Conversion outcome       | Indication |
| 1   | Itu buku saya  | I1 TU3 BU1 KU3 SA1 YA3   | Success    |
| 2   | KePolisiAN     | KE1 PO2 LI2 SI2 AN3      | Success    |
| 3   | KAPAN SAJA     | KA1 PAN3 SA1 JA3         | Success    |
| 4   | ImpleMEntasi   | IM1 PLE2 MEN2 TA2 SI3    | Success    |
| 5   | Konteks bahasa | KON1 TEKS3 BA1 HA2 SA3   | Success    |
| 6   | Khusus Anda    | KU1 SUS3 AN1 DA3         | Success    |
| 7   | PROYEK AKHIR   | PRO1 YEK3 AK1 HIR3       | Success    |
| 8   | menggunakan    | MENG1 GU2 NA2 KAN3       | Success    |
| 9   | Ekstraksi zat  | EKS1 TRAK2 SI3 ZAT3      | Success    |
| 10  | memBU@L s@JA   | MEM1 BU2                 | Failed     |

Analysis of test results

The test converts input text, in the form of a sentence, into a row of syllables. To obtain this result, the entered text is first normalized into a string of capital letters and then converted into a row of syllables using Finite State Automata (FSA). The method is modified so that it produces the conversions shown in Table 1; in particular, it recognizes the position of each syllable within a word. This position determines which database is used. There are three positions: beginning, middle, and end, represented by the numbers 1, 2, and 3 appended to the end of each syllable.
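The position numbering can be sketched in Python as follows. The rule used here is inferred from the rows of Table 1 rather than stated by the authors: the last syllable of a word is tagged 3 (including single-syllable words such as ZAT3), the first syllable is otherwise tagged 1, and all remaining syllables are tagged 2. The function name `tag_positions` is hypothetical.

```python
from typing import List


def tag_positions(words: List[List[str]]) -> List[str]:
    """Append 1 (beginning), 2 (middle), or 3 (end) to each syllable.

    `words` is a list of words, each already split into syllables.
    """
    tagged = []
    for syllables in words:
        last = len(syllables) - 1
        for i, syllable in enumerate(syllables):
            if i == last:
                position = 3  # end of the word (also single-syllable words)
            elif i == 0:
                position = 1  # beginning of the word
            else:
                position = 2  # middle of the word
            tagged.append(f"{syllable}{position}")
    return tagged
```

For row 9 of Table 1, `tag_positions([["EKS", "TRAK", "SI"], ["ZAT"]])` yields EKS1, TRAK2, SI3, ZAT3.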

Based on Table 1, the system converted nine of the ten test inputs correctly. In the tenth test the conversion failed because the system does not recognize the character @ (which is outside the letters of the alphabet); the system enters an error state at that position and issues a warning that the input text is invalid. The user must then enter the text again from the beginning.
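The normalization and rejection behaviour described above can be sketched in Python. This is a minimal illustration under stated assumptions: only letters and spaces are accepted, digits and the period are left out for brevity, and the function name `normalize` and the error message are hypothetical. Unlike the device, which reports the error when it reaches the invalid position, this sketch validates the whole text before any conversion.

```python
import string

ALLOWED = set(string.ascii_uppercase + " ")  # assumed: letters and spaces only


def normalize(text: str) -> str:
    """Upper-case the text and reject characters outside the alphabet."""
    upper = text.upper()
    for index, ch in enumerate(upper):
        if ch not in ALLOWED:
            raise ValueError(f"invalid character {ch!r} at position {index}")
    return " ".join(upper.split())  # collapse repeated spaces
```

With this sketch, "KePolisiAN" becomes "KEPOLISIAN", while the tenth test input "memBU@L s@JA" raises an error at the first @ character.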

Testing the syllable-to-sound converter

This test was conducted to determine the characteristics of the sound produced when syllables are converted into sound.

Block diagram of the test system


Figure 8. Block diagram of the test system for the syllable-to-voice converter

Test results

Figure 11. Recorded voice signal spectrum of "SATU"

Analysis of test results

When the two signals in Figure 9 and Figure 10 are combined, they form a new voice signal as in Figure 11. The combined signal contains a delay, or pause, introduced by the software while it searches the database for the matching syllable sound. The length of this pause depends on how long the software takes to find the database file. As a result, the output falls short of what was expected.
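The effect of the pause on the combined signal can be illustrated with a short Python sketch that joins sample sequences with a silent gap. The 44100 Hz sampling rate comes from the paper; the gap duration, the function name `concatenate`, and the use of plain integer lists as samples are assumptions for illustration only.

```python
from typing import List, Sequence

SAMPLE_RATE_HZ = 44100  # sampling frequency of the voice database


def concatenate(segments: Sequence[Sequence[int]], gap_seconds: float) -> List[int]:
    """Join syllable sample sequences, inserting silence between them."""
    gap = [0] * round(gap_seconds * SAMPLE_RATE_HZ)
    output: List[int] = []
    for index, segment in enumerate(segments):
        if index > 0:
            output.extend(gap)  # models the database search delay
        output.extend(segment)
    return output
```

For instance, a hypothetical 10 ms search delay between "SA" and "TU" corresponds to 441 silent samples at 44100 Hz, so shortening the lookup time directly shortens the audible pause.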

System integration testing with mobile phone

This test is intended to assess the clarity of the sound produced by the tool after the overall system integration.

Block diagram of testing system

Figure 12. Block diagram of the test system for the syllable-to-voice converter

Documentation of testing

[Diagram labels recovered from Figures 8, 11, and 12: Tool, Headset, Speaker; "SA", "TU", delay; Tool, Headset, Handphone 1, Handphone 2]


Figure 13. Documentation of system testing

Description:
A: Patikus gadget
B: Headset out
C: Headset in

Test results

Table 2. Clarity test of the synthesized sound

| No. | Synthesized word | Response           |
| 1   | Fajar            | voice fairly clear |
| 2   | Terang           | voice fairly clear |
| 3   | Source           | voice faint        |
| 4   | Panah            | voice fairly clear |
| 5   | Semua            | voice faint        |
| 6   | Terka            | voice fairly clear |
| 7   | Tidak            | voice fairly clear |
| 8   | Sekarang         | voice fairly clear |
| 9   | Mendapatkan      | voice fairly clear |
| 10  | Selamannya       | voice faint        |

In this test, the tool is attached to mobile phone 1 through the headset, while mobile phone 2 is used by the respondents. The synthesis results are played to the respondents over a phone call. The reference parameter is the level of voice clarity. According to Table 2, of the ten randomly chosen words, not all synthesized voices were clearly audible. This is because the quality of the voice database is lower than expected, specifically the accuracy of the pronunciation when the database was recorded. Syllables that contain the letters 'n', 'ny', 'ng', or 'm' are still problematic.

In addition, the faint sound is also caused by noise in the connection with the mobile device: the two are connected through a modified headset, and little attention was paid to the signal noise that arises in it. This is evident from the fact that when the voice is played directly through speakers, without the headset, it is heard more clearly. The noise in the connection therefore makes the synthesized voice fainter and less clear.

IV. CONCLUSION


After testing and analysis, the following conclusions can be drawn about the performance of the system:

1. Syllable recognition succeeded for 90% of the 10 kinds of text tested.
2. The conversion of syllables into sound achieved a maximum success rate of 75% among 20 respondents.
3. In testing of the system as a whole with 5 respondents with disabilities, 80% of the responses rated the sound quality as good and 20% as adequate.
4. In testing of the entire system, the synthesized sound tends to be vague and unclear because of noise in the connection with the mobile device.
5. The Finite State Automata (FSA) method is reliable enough to recognize and distinguish syllables.
6. The higher the sampling frequency of the sounds used, the better the resulting sound quality. However, it also generates sizable noise when PWM is used as the medium for outputting the voice.
7. The quality of the recorded syllable database is determined by the speaker's pronunciation ability. Recording should be done in a single continuous session at the same time of day. If something goes wrong, the pronunciation should be re-recorded at that same time of day so that the sound quality stays the same.
8. The words that most often sound faint are those containing the letters 'n', 'ny', 'ng', or 'm'.

BIBLIOGRAPHY

[1] http://alatbantualb.blogspot.com/2010/11/alat-bantu-tuna-wicara.html?m=1, visited on May 20th 2012.
[2] http://en.wikipedia.org/wiki/Telecommunications_devices_forthe_deaf, visited on May 20th 2012.
[3] http://www.uic.edu/depts/accc/telecom2.0/phone/deafdevices.shtml, visited on May 20th 2012.
[4] Lisnasari, Dwi. 2010. Perancangan dan implementasi komunikasi data Text To Speech (TTS) dalam bahasa Indonesia. Graduation Project EEPIS-ITS. Surabaya.
[5] Arman, Arry Akhmad. 2004. Konversi dari Teks ke Ucapan. Bandung.
[6] Basuki, Thomas Anung. 2000. Pengenalan Suku Kata Bahasa Indonesia Menggunakan Finite-State Automata. Bandung.