View
225
Download
0
Category
Preview:
Citation preview
8/14/2019 Speech Recognition UTHM
1/30
Design of Automatic Speech RecognitionSystems for Voice Identification and
Analysis
Supervisor: Dr. Mohammed Faiz Liew bin AbdullahDr. Rozlan bin Alias
En.Ariffuddin bin Joret
DINESHWARAN GUNALAN HE120104TUAN MUSTAQIM HE120105
UNIVERSITY TUN HUSSEIN ONN (UTHM)
8/14/2019 Speech Recognition UTHM
2/30
8/14/2019 Speech Recognition UTHM
3/30
Advantages and Applications
Dictation
Command and control
Telephony
Medical/disabilities
8/14/2019 Speech Recognition UTHM
4/30
How the Brain Work
Articulation produces
sound waves which the ear conveys to the brain
for processing
8/14/2019 Speech Recognition UTHM
5/30
How the Computer Work
Digitization
Acoustic analysis of thespeech signal
Linguistic interpretation
coustic waveform Acoustic signal
Speech recognition
8/14/2019 Speech Recognition UTHM
6/30
Challenges Faced
Ease of use
Robust performance
Automatic learning of new words and sounds
Grammar for spoken language
Control of synthesized voice quality
Integrated learning for speech recognition and synthesis
Getting a computer to understand spoken language
Digitization
Converting analogue signal into digital representation
Signal processing
Separating speech from background noise
Phonetics
Variability in human speech
Phonology
Recognizing individual sound distinctions (similar phonemes)
8/14/2019 Speech Recognition UTHM
7/30
Digitization
Analogue to digital conversion Sampling and quantizing
Use filters to measure energy levels forvarious points on the frequency spectrum
Knowing the relative importance of differentfrequency bands (for speech) makes thisprocess more efficient
E.g. high frequency sounds are lessinformative, so can be sampled using abroader bandwidth (log scale)
8/14/2019 Speech Recognition UTHM
8/30
Separating Speech from Background Noise
Noise cancelling microphones
Two mics, one facing speaker, the other facingaway
Ambient noise is roughly same for both mics Knowing which bits of the signal relate to speech
Spectrograph analysis
8/14/2019 Speech Recognition UTHM
9/30
Variability in Individuals Speech
Variation among speakers due toVocal range (f0, and pitch rangesee later)
Voice quality (growl, whisper, physiologicalelements such as nasality, adenoidality, etc)
ACCENT !!! (especially vowel systems, but alsoconsonants, allophones, etc.)
Variation within speakers due to
Health, emotional stateAmbient conditions
Speech style: formal read vs spontaneous
8/14/2019 Speech Recognition UTHM
10/30
Identifying Phonemes
Differences between some phonemesare sometimes very small
May be reflected in speech signal (eg
vowels have more or less distinctive f1 andf2)
Often show up in coarticulation effects(transition to next sound)
e.g. aspiration of voiceless stops in English
Allophonic variation
8/14/2019 Speech Recognition UTHM
11/30
Disambiguating Homophones
Mostly differences are recognised by humansby context and need to make sense
Its hard to wreck a nice beach
What dimes a necks drain to stop port?
Systems can only recognize words that are intheir lexicon, so limiting the lexicon is anobvious ploy
Some ASR systems include a grammar whichcan help disambiguation
8/14/2019 Speech Recognition UTHM
12/30
Interpreting Prosodic Features
Pitch, length and loudness are used toindicate stress
All of these are relative
On a speaker-by-speaker basisAnd in relation to context
Pitch and length are phonemic in some
languages
8/14/2019 Speech Recognition UTHM
13/30
Pitch
Pitch contour can be extracted fromspeech signal
But pitch differences are relative
One mans high is another (wo)mans low Pitch range is variable
Pitch contributes to intonation
But has other functions in tone languages Intonation can convey meaning
8/14/2019 Speech Recognition UTHM
14/30
Length
Length is easy to measure but difficult tointerpret
Again, length is relative
It is phonemic in many languages Speech rate is not constantslows down at the
end of a sentence
8/14/2019 Speech Recognition UTHM
15/30
Loudness
Loudness is easy to measure butdifficult to interpret
Again, loudness is relative
8/14/2019 Speech Recognition UTHM
16/30
Objectives of our Project
Design and implement an automatic voicerecognition system.
Design a GUI system.
Design an Interface circuit to blink LED
MATLAB program is used to implement theproject.
8/14/2019 Speech Recognition UTHM
17/30
Speech Recognition
Main pr inciple
of Speech
Recognition
Identification Verification
8/14/2019 Speech Recognition UTHM
18/30
Hardware
8/14/2019 Speech Recognition UTHM
19/30
GUI Software
8/14/2019 Speech Recognition UTHM
20/30
Input Speech Signals
There are silence at the beginning and at theend of the signals.
The word consists of two syllables.
ze ro
8/14/2019 Speech Recognition UTHM
21/30
Frame Blocking
FrameBlocking Windowing FFT
Final Output
Continuous
speech
frame spectrum
Continuous speech signal is blocked into frames of Nsamples with adjacent frames being separated by M(M
8/14/2019 Speech Recognition UTHM
22/30
Frame Blocking
FrameBlocking Windowing FFT
Final Output
Continuous
speech
frame spectrum
Continuous speech signal is blocked into frames of Nsamples with adjacent frames being separated by M(M
8/14/2019 Speech Recognition UTHM
23/30
After Frame Blocking
The speech signals were blocked into framesof N samples with overlap.
8/14/2019 Speech Recognition UTHM
24/30
Windowing
FrameBlocking Windowing FFT
Final Output
Continuous
speech
frame spectrum
Each individual frame will be windowed.
Hamming window is used in this project.
8/14/2019 Speech Recognition UTHM
25/30
After Windowing
8/14/2019 Speech Recognition UTHM
26/30
8/14/2019 Speech Recognition UTHM
27/30
After Windowing
FrameBlocking Windowing FFT
Final Output
Continuous
speech
frame spectrum
Convert each frame of N samples form timedomain into the frequency domain.
8/14/2019 Speech Recognition UTHM
28/30
8/14/2019 Speech Recognition UTHM
29/30
Spectrum
8/14/2019 Speech Recognition UTHM
30/30
Thank You
For
Your Time
Recommended