6
10: 90 VOICE SEARCH FOR QURANIC VERSES RECITATION Ammar Mohammed MaGIC-X UTM -IRDA Universiti Teknologl Malaysia Johor Bahru, Malaysia [email protected] Mohd Shahrizal Bin SURar MaG IC-X UTM -IRDA Umversiti Teknologi Malaysia Johor Bahru, Malaysia [email protected] Md.Sah Hj.Salam. Faculty of Computing Universiti Teknologi Malaysia Johor Bahru, Malaysia [email protected] ,Abs" act This paper describes challenges and solutions for building II. suecessful voice sel rc h system as applied to Quranlc verses. The paper descnbes tbe techniques used to deal with an finit e \'ocabularJ how modelling completely in the voice dom ain (or lan guage model and dictionary can avoid some system cornplealry, and how we built dictionaries, language and acoustic models in the framework, The Holy Quran Is written In Arabic language, and the Arabic is one of the oldest langu ages In the world that presents its own features and challenges while searching for Ara ble-based content. The most search systems for Ihe Holy Qura n Is organized around te xt words (contained In the target verses) but no system organized around voice "hil e there is need to search In qur anle red tation. The speech recognition approaches are applled to build the dictionary and test the system, " 'hile the MFCC and stemming techniques will be will be applied to find the stem of the word. The development of vetce search for Quranlc r edt ation led to a significa nt simplification of the original process to hulld a system to retrieving audio information related to the Qur'an , it also helps to build a system to serve people with sp edal needs such as blind.paralyzed, and othe" who can not use the ke ybeerd . vetce search; Quranic recitation; Quranic aClllstlc; Spe« h recognition I. lNTRODUCIlON voice searc h the ability to perfonn web searches simply by speaking, first appeared around 2008 on iPhone and Android phone in US English. Soon after a several of researchers focusing on developing voice searc h systems for other languages.[ I] Arabic language is not similar to US English, it is a semantic language with a composite morphology. Arabic words are categorized as particles, nouns, or verbs. Unlike most western languages. Arabic script writin g orientation is from right to left. There are 28 characters in Arabic. The characters are connected and do not start with cap ital letter as in English. Most of me characters differ in shape based in their position in me sentence and adjunc tleners.[2 , 3) The Quran is the holy book of Islam,originally written in Classical Arab ic language, consists of 6236 verses divided into 114 chapters called suras. Each surah also differs from one another in terms of the number of verses (ayat) [3, 4}. Orthographic variations and the use of diacritics and glyphs in the rep resentat ion of the lang .. uage of Classical Arab ic increase the difficulty of stemming[5]. Many verses are similar and even identical. Searching for similar words (e.g verses) could return thousands of verses. that when displayed completely or partly as list would ma ke analysis and understanding difficult and confus ing. Moreover it would be visually impossible to instantly figure out the overall distribution of the Identified or retrieved verses in the Quran. (4J II. LITERATURE REVIEW Using our vo ice to access informa tion has bee n a part of scie nce fiction ever. Today, with powerful smartphones, cloud-based computing, and speech recognition techniques science fiction is becoming reality. One of the important voice applications is is a speech recognition technology that allows users to searc h by saying terms aloud rather than typing them into a search field. The information normally exists in a large database, and the query has to be compared with a field in the database to obtain the relevant informationle, 7]. The proliferation of smart phones and other small, Web-enabled mobile devices has spurred interest in voice search. The voice applications are available in the markets which recites the holy Quran. One of the most popular and commonly used is Quran Auto Reciter (QAR) [8], and some of applications depend on speech recogniton techniques like Hafas[9] and Ehafiz[ I0, II] . This study proposes the system the ability to search in recitation by voice, And the expected benefit of Quranic voice search system is using in retrieving audio information related to the Holy Quran, as well as a systems to serve people with special needs who can not use the keyboard. ARELATED SnIDIES In 2008. a new version ofGOOG-41 1 has been deployed which allowed (and encouraged) the user to state their need in a single utterance rather than in sequential utterances that split apart the location and the business. This was motivated by our desire to ' 24

90 VOICE SEARCH FOR QURANIC VERSES RECITATIONeprints.utm.my/id/eprint/60959/1/MdSahSalam2014_Voice...are well versed with Tajweed rules. However. USCTS'w hoare not Arabie speakers

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

  • 10: 90

    VOICE SEARCH FOR QURANIC VERSESRECITATION

    Ammar Mohamm edMaGIC-X UTM -IRDA

    Universiti Teknologl Ma laysiaJohor Bahru, Malaysia

    [email protected]

    Mohd Shahrizal Bin SURarMaG IC-X UTM -IRDA

    Umversiti Teknologi Ma laysiaJohor Bahru, Malaysia

    [email protected]

    Md.Sah Hj .Salam.Faculty of Computing

    Universiti Teknologi MalaysiaJohor Bahru, Malaysia

    [email protected]

    ,Abs" act • This paper describes challenges and solutions forbuilding II. suecessful voice selrc h system as ap plied to Qu ranlcverses. The paper descnbes tbe techniques used to deal with anfinit e \'ocabularJ how modelling completely in th e voice dom ain(or lan guage model and dictionary can avoid some systemcornplealry, and how we built dictionaries, language and acousticmodels in the fr amework, The Holy Qura n Is wri tt en In Ar abiclanguage, and the Arabic is one of the oldest languages In the worldthat presents its own features and challenges while searching forArable-based content. The most search systems for Ihe HolyQura n Is organized around te xt words (contained In the targetverses) but no system organized around voice "hile there is needto search In quranle red tation. The speech recognition approachesare applled to build the dictionary and test the system, " 'hile theMFCC and stemming techniques will be will beapplied to find thestem of the word. The development of vetce search for Quranlcredtation led to a significant simplification of the original processto hulld a system to retrieving audio information related to theQur'an , it also helps to build a system to serve people with spedalneeds such as blind.paralyzed, and othe" who can not use thekeybeerd.

    X~words- vetce search; Quranic recitation; Quranic aClllstlc;Spe«h recognition

    I. lNTRODUCIlON

    vo ice searc h the ability to perfonn web searches simply byspeaking, first appeared around 2008 on iPhone and Andro idphone in US English. Soo n after a seve ral of researchersfocusing on developing voice search systems for otherlanguages.[ I]Arabic language is not similar to US Englis h, it is a semanticlanguage with a composite morphology. Arabic words arecategorized as particles, nouns, or verbs . Unlike most westernlanguages. Arabic script writin g orientation is from right to left .The re are 28 characters in Arabic. The characters are connectedand do not start with cap ital letter as in English. Most of mecharacters differ in shape based in their position in me sentenceand adjunc tleners.[2, 3)

    The Quran is the holy book of Islam,originally written inClassical Arab ic language, consists of 6236 verses divided into114 chapters called suras. Each surah also differs from oneanother in terms of the number of verses (ayat) [3, 4}.

    Orthographic variations and the use of diacritics and glyphs inthe rep resentat ion of the lang..u age of Classical Arab ic increasethe difficulty of stemming[5]. Many verses are similar and evenidentical.Searching for similar words (e.g verses) could return thousandsofverses. that when displayed completely or partly as list wouldma ke analysis and understanding difficult and confus ing.Moreover it would be visually impossible to instantly figure outthe ove rall distribution of the Identified or retrieved verses inthe Quran. (4J

    II. LITERATURE REVIEW

    Using our vo ice to access information has been a part of scie ncefiction ever. Today, with powerful smartphones, cloud-basedcompu ting, and speech recognition techniques science fiction isbecoming reali ty.One of the important voice applications is is a speechrecognition technology that allows users to searc h by sayingterms aloud rather than typing them into a search field. Theinformation normally exists in a large database, and the queryhas to be compared with a fiel d in the database to obtai n therelevant informationle, 7].Th e proliferation ofsmart phones and other small, Web-enabledmobile devices has spurred interest in vo ice search.

    The voice app lications are avai lable in the markets whic hrecites the holy Quran. One of the most popular and commonlyused is Quran Auto Reciter (QAR) [8], and some ofapp licationsdepend on speech recogniton techniques like Hafas[9] andEhafiz[ I0, II].This study proposes the system the ability to search in recitationby voice , And the expected benefit of Quranic voice sea rchsystem is us ing in retrieving audio info rmatio n related to theHoly Quran, as well as a systems to serve people with specialneeds who can not use the keyboard.

    ARELATED SnIDIES

    In 2008. a new version ofGOOG-41 1 has been dep loyed whichallowed (and encouraged) the user to state their need in a singleutterance rat her than in sequential utterances tha t split apart thelocation and the business. This was motivated by our des ire to

    ' 24

  • accommodate faster interactions as .....ell as allow the usergreater flexibility in how they describe their needs (6].

    With regard to Quranic search systems. most of researchershave interested on the development of search techniques for theQuranic text, but the principle can be app lied in thedeve lopment of voice search systems. for examp le.Kadri et at (3] proposed a new stemming method that tries todetermine the core of a word according to inguistic rules. Thenew method shows the best retrieval effectiveness. TheIinguistic-based method can better determine the semantic coreofa word.Naglaa Thabet [5J proposed a new light stemming approach thaigives better results. when applied to a rich vocalized text as theQuran. The stemmer is basically a light stemmer to removeprefixes and suffixes and is applied to a version of the Qurantransliterated into western script.Riyad Alshalabi [3] provided a technique for extracting thetrilateral Arabic mol for an unvccahzed Arabic corpus. Itprovides an efficient way to remove suffixes and prefixes fromthe inflected words. Then it matches the resulting word with theavailable patterns to find the suitable one and men extracts thethree letters of the root by removing all infixes in that pattern.

    Regard ing researches in the development ofspeech recognitiontec hniques to use in Quran recitation applications, there areseveral research in the field ofQw'anic voice recognition. ofthemost famous of such research is an automated del imiterintroduced by Hassan Tabbal et al.[S] Which extracts versesfrom an aud io file and coverts Quran verses in audio file, usingspeec h recognition technique. The Sphinx. IV framework isused to develop this system.

    The core recogniti on process is provided automatically by thesphinx engine using the appropriate langua ge and acousticmodels. The sphinx framework must be configured using anxrnl based configuration tile.The recognitio n rat io in the case ofTarteel is slightly better thanin the case of TAJWEED. One possible reason for this could bethat the majority ofthe Tarteel recitat ions available now followsthe same monotony and the duration ( in time) of each phonemediffers slightly from one reciter to another. There is also theextra noise that is caused by the compression of the audio filesand the low quality of the recordi ngs.

    Although we had antic ipated this by using noisy audio filesduring the training. but the differences in compression ratiosbetween the files add a lot of ven ation of the added no ise andthus causing extra errors. When unskilled persons tested thesystem (we eve n tested it on children), it behaved astonishinglywell even .....ben the reciter was a w"Oma n. II case that cannot beencountered in rea l life because it's not usual to have a womanreciting the Holy Quean. There is also an interesting observationdra.....l1 from these tests: It is elways recommended [12]to trainthe system with more than 500 different voices in order to reachspeaker independence. But we didn' t train our system wi th this

    relatively large number and still we were able to haveremarkable speaker indepe ndence results.

    The system introduced by Hassan Tabbal[8J an automateddelimiter, which extracts verses from the audio ti les, usingMFCe feature extraction. This sys tem is useful for people whoare well versed with Tajweed rules. However. USCTS'who are notArab ie speakers don't benefit from this system. In addition, itmay also not help reciters to improve recitat ion abilities. Thesystem is useful 10 those people who already know the correctrecitation of the holy Qw'an and the subsequent roles and notsuitable for none Arabic speakers. A system that will be able 10help users to know recitation rules (TAJWEED), pointing outmistakes made during recitation is a necessity and a taskachieved by the E - Hafiz system.

    The other system introduced by Bushra Abro [13] Arabiclanguage which included isolated words and sentences. Thedataset comprised of few Arabic sentences and words. Thesystem was buill on Al-Alaoui algorithm lhat trains NeuralNetworks (NN) and has been able to achieve 8~;' accuracy onsentence recognition. But the drawback of this technique is thatthe system was trained on distinct sentences.Similarities in sentences separate NNs and therefore are neededto be trained whic h is computationally very expensive. for thepurpose of Qur'an memorization, MFee was used to extractfeatures. The dataset consists of few small Quranic verses andpattern match ing was do ne by Vector Quantization. The systemgained good recognition rate but the approach is statis tical andtherefore difficult to scale up for larger system to be built oncomplete Qur'an. It is also known to take much time and spacecomplexity which is a sbortfa ll of a rea l time system.

    Za idi Razzak [14] presents different recognition techniquesused for the recita tion of the Quran verses in Arabic versepointing out the adva ntages and the drawbacks. The most usefulmethod for the project "Quranic verse Recitation Recognitionfor support j-QAF leaming" is exp lained therein. J·QAF is apilot program. that aims to encourage learne rs to learn Quraoreading skills, understanding Tejweed and Islamic obligations.The method of teaching j-QAF (teacher and student) is stillhandled manuall y. One basic goal of this paper is to automatethe learnin g process.

    B. HIDDEN MARKOV MODELS

    Markov cha in is a series of cases and final set of randomfunctions in times of intermittent and eontinuous[15]. Wher eassume some conditions. and notices that are generatedrandomly as initial values for the model. And is moving fromcase to case aga in by the probability matrix. Can see the outputof the random function and its relationship with the case, butcan't figure out stages ofthe transition from one slate to another.nor knowledge of the sequence of observations and time. socalled hidden Markov models. Hidden Markov models areclassified on the basis of time into two classes: Notes inintermittent times and in times ofcontinuing notes.

    125

  • emerge out, like from the roots of a plant ; the stem and manybranches emerge.For examp le. the root", J e. ~ gives the stem words:",.M;-" (ilm)and ........ (alam). Furthennore, from the stem word :-,Joo.-(ilm) comes the branching words : "~. (ya'tamu), -+.(a1eem),"~ - (salim) etc; and from -,Ji:." (alam} comes thebranching words -~ "(aalameen} etc .In this stage, also remove of noise from digital audio signals inorder to extract the distinct ive features ofthe voice clear ly. Alsothe time and frequency must be consolidated, thai can beapplied in several ways such as (MA Anal ysis) In order to be aspec ific time and frequency( EX. lOs, 22.0S0hz) [10 ]. Thisstudy will focus on the roots of Quranic words (representing itskeywords in databas e file).

    Mel-Frequency Ceps tral Coeffi cients (MFCC) are extensivelyin ASR.MFCC is based on signal decomposition with the help

    Several different feature extraction algorithms exist, name lyLinea r Predictive Cepstral Coefficients (LPCC): computesSpectral envelop before conven ing it into CepstraI coe fficient.

    Perceptual Linear Prediction (PLP) Cepstra: it is based on theNonl inear Bark scale. The PLP is designed for speechrecognition with removing of speaker dependent characteristics.

    III. METHODOLOGY

    B .PREPROCESSING & F EATURE EXTRACTION

    This stage, to remo ve of noise from digital audio signals inorder to extract the distinctive features of the voice clearly. Alsothe time and frequency must be consolidated, that can beapplied in several ways suc h as (MA Analysis) In order to be aspec ific time and frequency( EX. lOs, 22 .050hz ).Also the process of finding the roots will be done at this stage,Ste mming is a pre-processing step in any Infonnation Retri eval

    B·. ~ - [:ata (IR) system. It is the process of removing all affixes (prefixes,B:I::i' M~~ .& Fearu: 11 ' suffIxes and affixes) from a wo rd to extract its root.Extra

  • -+-

    ...

    Mlddl FIn.e I

    j

    l

    •"

    Inttl. 1

    l

    j

    t

    Jll

    .\1;

  • extracted features with its reference model. This referencemodel is developed, after the enrolment or training phase hadbeen successfully implemented. In this case, the referencemodel (stored model in database) used consist of 2 types ofmodels, which are Word based Model and Phoneme basedMode l.The reference model for phoneme[ I2 , 21) based model istotally differs from word based model, where speech featuresthat have been extracted are directly compared to the wordtemplates.The Quranic recita tion has 60[22J phoneme that led moredifficulties to the dictionary building. Here, each of wordtemplates in direct matching model will store as a vector offeatu res parameters. Word based model will be used as a firstmodel, while phoneme based model been the second modelused as template matching at testing/recogni tion pan.

    E. ACOUSTIC MODELING & TEST

    The acoustic modelling will be done by HMMs, Neverthelessthere are three methods for: Hidden Markov Models HMM,Vector Quantization (VQ), Artificial Neura l Network(ANN )[ 12J and Hybird Model[23].HMM or VQ can be apply for tra ining and testing. HMM isusedwhen Arabic language recognition has to perform and VQfor English language[24] HMM had introduced the Viterbialgorithm for decoding HMMs, and the Bawn-Welch orForward-Backward algorithm for training HMMs. All thealgorithm ofHMM playa crucial role in ASR. It involved withstates , transitions, and observa tions map into the speechrecognition task .The extensions to the Beum-Welch algorithms needed to dealwith spoken language. These methods had been implemen tedby D. Jurafsky and 1. H. Martin (2007) in their research. Here,speech recognition systems train each phone HMM embeddedin an entire sentence . Hence, the segmenta tion and phonealignmenl are performed automatically as parts of the trainingprocedure{l4]. It cons ists of two interrela ted stochasticprocesses common to describe the statistical characteristics ofthe signal. One of which is hidden (unobserved) finite-stateMarkov chain, and the other is the observation vector associatedwith each state of the Markov chain stochastic process(observable)[l7).

    Artificia l Neural Network (ANN) is a computational model ormathematical model based on biological neural networks. Theprocedure depends on the way a person applies intelligence invisualizing, analyzing and characterized the speech based on aset of measured acoustic features[14], but the basic neuralnetworks are not well equipped to address these problems ascompared to HMM's.

    Vector Quantizat ion (VQ) Quantization is the process ofapproximati ng continuous ampl itude signals by discretesymbols. It can be quantized on a single signal value or

    128

    parameter known as scalar quantizatio n, vector quant ization orothers. VQ is divided into 2 parts , known as features trainingand matching features . Features training is mainly concernedwith randomly selecting feature vectors and perform trainingfor the codebook using vector quantizat ion (VQ) algorithm[ 14].

    IV. EXPECTED RESULT

    The results can besummarized as expected:t. Create a new system that able to search in Holy Quran byvioce.2. Help the special needs people in the Quran search andaccess to the selec ted Sura easly .3. Assist in the development of audio information retrievalsystems regarding the Quran .

    V. CONCLUSION

    This paper has covered many aspects of speech recognitionsystem and Quranic recognition,and focusing on The voicesearch in Holy Quran. And this research funding will be highlybeneficia l with special needs and who can not use the keyboard,as it will be useful in information retrieval researches in theHoly Quran ..By the end of this study it is expected that the existing modelswill be enhanced and improved for more suitable for specificitythe Holy Quran.

    ACKNOWLEDGMENT

    This research is supported by UTM Magicx at Department ofComputer Graphics and Multi media, Faculty of Computing,Univers iti Tekno logi Malaysia .

    REFERENCES

    I Schuster, M., and Nakajima, K.: ' Japanese and koreanvoice search', in Editor (Ed.)"(Eds.): 'Book Japanese andkorean voice search' (IEEE, 20 12, edn.), pp. 5149·5 1522 Al-Shammari, E.T.: ' A Novel Algorithm for NormalizingNoisy Arabic Text', in Editor (Ed.)"(Eds.): 'Book A NovelAlgorithm for Normalizing Noisy Arabic Text ' (IEEE, 2009,edn.), pp. 477-4823 Al-Taani, A.T., and Al-Gharaibeh, A.M.: 'SearchingConcepts and Keywords in the Holy Quran', 20104 NasSOUIOU, M.: 'Assisting Analysis and Understanding ofQuran Search Results with Interactive Scatter Plots and Tables ',in Editor (Ed.)"(Eds.): 'Book Assisting Ana lysis andUnderstanding ofQuran Search Results with Interactive ScatterPlots and Tables ' (2011 , edn.), pp.

  • 5 Thabet, N.: 'Stemming the Qur'an ' , in Editor (Ed.)"(Eds.) :' Book Stemming the Qur'an ' (Association for ComputationalLinguistics, 2004, eOO.), pp. 85-886 Schalkwyk, J., Beefennan, D., Beaufays, F., Byrne, R ,Che1ba, C., Cohen, M.• Kamvar, M., and Strope, 8. : "YourWord is my Command": Google Search by Voice: A CaseStudy' : ' Advances in Speech Recognition' (Springer, 2010),pp.61-907 Wang, Y.-Y., Yu, D., Ju, Y.-C., and Acero, A.: 'Anintroduction to voice search', Signal Processing Magazine,IEEE, 2008, 25, (3), pp. 28-388 Hassan, T., Wassim, A.-F., and Bassem, M.: 'Analysis andImplementation of an Automated Delimiter of' Quranic"Verses in Audio Files using Speech Recognition Techniques' ,Robust Speech Recognition and Understanding, 2007, pp. 351910 Elhadj, Y.a.M.: 'E-Halagat: An Eel.earning System forTeaching the Holy Quran' , Turkish Online Journal ofEducational Technology, 2010, 9, (1)II Muhammad, A., u1 Qayyum, Z., Tanveer, W.M.M.s., andSyed, A.Z.: ' EiHafiz: Intelligent System to Help Muslims inRecitation and Memorization ofQuran' , Life Science Journa l,2012,9, (I), pp. 534-54 112 Ibrahim. N.J., Idris, M.Y.I., Razak, Z., and Rahman,N.N.A.: 'Automated tajweed checking rules engine for Quraniclearning' , Multicultural Education & Technology Journal,20 13, 7, (4),PP. 275-28713 Abrc, B., Naqvi, A.B., and Hussain, A.: ' Qur'anrecognition for the purpose of mernorisarion using SpeechRecognition technique ', in Editor (Ed.)"{Eds.): 'Book QUI'anrecognition for the purpose of memorisation using SpeechRecognition technique ' (IEEE, 2012, edn.), pp. 30-3414 Zaidi Razakf-, N.JJ. , Mohd Yamani Idna Idrist , EmranMohd Tamilt , Mohd Yakub @Zulkifli Mohd Yusofft t andNoor Naemah Abdul Rabmaott: ' Quranic Verse RecitationRecognition Module for Support in j-QAF Learning: AReview', UCSNS International Journal of Computer Scienceand Network Security, VOL.8 No.8, August 2008, 200815 Nilsson, M., and Ejoarsson, M.: 'Speech Recognition usingHidden Markov Model' , Department of Telecommunicat ionsand Speech Processing, Blekinge Institute ofTechnology, 200216 201417 Meng. J.,Zhang. 1., and Zhao, H.: ' Overview ofthe SpeechRecognition Technology ' , in Editor (Ed.)"(Eds.): ' BookOverview ofthe Speech Recognition Technology' (IEEE, 2012,edn.), pp.199-20218 Truong. K.: 'A utomatic pronunciation error detection inDutch as a second language: an acoustic-p honetic approach ' ,200419 Ahsiah, I., Noor, N.. and Idris, M.: "Tajweed checkingsystem 10 support recitation' , in Editor (Ed.Y(Eds.): ' BookTajweed checking system to support recitation' (IEEE, 2013,edn.), pp.1 89-19320 Ibrahim. N.J.: 'Automated TAJWEED checking rulesengine for Quranic verse recitation ', 201021 Ahmad, M.A., and EI Awady, R.M.: ' Phonetic recognitionof Arabic alpbabet letters using neural networks ' , International

    129

    Journal of Electric & Computer Sciences IJECS.IJENS, 2011,11, (01)22 Yahya, A., Aighamdi, M., and Alansari, M.A.A.: 'A utoCorrection of the Holy Quran Recitation', Journal of ComputerResearch . I I: 01- 15. (in Arabic), 201223 Zarrouk, E., Ayed, Y.B., and Gargouri , F.: ' Hybridcontinuous speech recognition systems by HMM, MLP andSVM: a comparative study ', International Journa l of SpeechTechnology, 2014, pp. I -I I24 Muhammad, W.M., Muhammad, R., Muhammad, A., andMartinez-Enriquez, A.: ' Voice Content Matching System forQuran Readers' , in Editor (Ed.)"(Eds.): ' Book Voice ContentMatching System for Quran Readers' (IEEE, 2010, edn.), pp.148-153