Audio & Speech Technology for Consumer Electronics Basics and Technical Challenges ICCE Consumer Electronics Society Webinar Reinhard MOELLER University of Wuppertal


Audio & Speech Technology for Consumer Electronics: Basics and Technical Challenges

ICCE Consumer Electronics Society Webinar

Reinhard MOELLER University of Wuppertal


IEEE Consumer Electronics Society

21.09.17

● Introduction
● Historical Facts
● Mathematical Elements of Speech Technology
● Speech Processing


● Introduction


Introduction

● Humans distinguish between sound and noise
● Sound and noise are the evolutionary basis of communication between humans and their environment
● Humans can feel and hear acoustic information


Principles of Sound

Sound
• Travels in waves, produced when an object pushes on the air around it, causing small changes in air pressure
• Properties: frequency, wavelength, period, amplitude, phase and speed
• Can be a single tone or a mixture of several tones with equal or different properties. Examples:
– Music: a mixture of different frequencies and amplitudes
– White noise: a mix of frequencies with equal power per unit of bandwidth over a given frequency range (flat spectrum); often perceived as “unwanted”, harsh or crisp sounding noise
– Pink noise: a mix of frequencies with equal power per octave, i.e. an equal power distribution on a logarithmic frequency scale; sounds like “natural” environmental noise
– Speech


Human Audio “Sensors”: The Ears

The principle of hearing (after H. v. Helmholtz, 1873)

The inner ear is an active sound analyzer


Measurement of Sound

• The sound level heard by human ears is commonly measured in decibels (dB)
• A decibel value expresses the ratio of two sound powers on a logarithmic scale: 10 log10(P2/P1) dB. Because power is proportional to the square of the sound pressure amplitude, the same level computed from amplitudes A2 and A1 is 20 log10(A2/A1) dB
• The decibel is useful because it represents the wide range of sound levels the human ear can hear on a more manageable scale
• On the decibel scale, the softest sound that can be heard is 0 dB (P2 = P1, with the threshold of hearing as reference P1). Every increase of 10 dB is a tenfold increase in sound power and roughly a doubling of the perceived loudness
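The decibel arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch; the function names are illustrative and not from the slides. It assumes that 10 log10 applies to power ratios and 20 log10 to amplitude (sound pressure) ratios, since power grows with the square of the amplitude.

```python
import math


def power_ratio_to_db(p2: float, p1: float) -> float:
    """Level of power p2 relative to the reference power p1, in dB."""
    return 10.0 * math.log10(p2 / p1)


def amplitude_ratio_to_db(a2: float, a1: float) -> float:
    """Level of amplitude a2 relative to a1; power scales with amplitude squared."""
    return 20.0 * math.log10(a2 / a1)


print(power_ratio_to_db(1.0, 1.0))               # equal powers give 0 dB
print(power_ratio_to_db(10.0, 1.0))              # tenfold power gives about 10 dB
print(amplitude_ratio_to_db(1_000_000.0, 1.0))   # a 1,000,000:1 amplitude range gives about 120 dB
```

Read this way, a 1,000,000:1 range of sound pressure spans about 120 dB, which is why a logarithmic scale is the more manageable one.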


Dynamics of Human Hearing

Very soft

Extremely loud

Dynamic range of a bicycle: 7:1

Dynamic range of a human ear: 1,000,000:1

Issues:
● How loud is too loud?
● What about hearing impairment?


Human Audio “Actuator”: Speech and Tone

● Speech production model: source-filter interaction
– Anatomical structure (vocal tract/glottis) is conveyed in the speech spectrum

[Figure: glottal pulses → vocal tract → speech signal]


● Historical Facts


Pre-History of Audio and Speech Technology

● 1653: Cyrano de Bergerac, “Comical History of the States and Empires of the Moon”
– “.. books are little mechanical boxes like wristwatches .. the reader attaches it to his nerves and listens to the sound …”

● 1786: Baron Münchhausen, “The Ride on the Cannonball and Other Adventures”
– “.. frozen sound carried in a post horn, melted behind a warm oven and resounded ..”


Pre-History of Audio and Speech Technology

● 1634: Kepler
– “One day we will produce speaking machines, but they will have a snarling tone”

● 1761: Euler
– “It would be one of our most important inventions, if we could build a machine able to imitate all sounds of our words with all articulations... The thing does not seem to be impossible to me”

● 1773: Ch. G. Kratzenstein
– Single vowels using resonance tubes connected to organ pipes


Pre-History of Audio and Speech Technology

● 1791: Wolfgang von Kempelen
– “Mechanism of the human speech and description of a speaking machine”
– The Chess Turk
– detailed construction plans, basis for later reconstructions and improvement
– called “.. the first phonetician ..”

● 1824: Johann Maelzel
– speaking doll (Mama, Papa)

[Figure: Kempelen's Speaking Machine. Source: German Museum]


History of consumer audio recording

● 1877: Edison's Phonograph
– Information carrier is a cylinder
– Intended applications:
  ● dictaphone, voice recorder
  ● archive of voices of famous people
– First recorded and replayed word: HELLO


History of consumer audio recording

● 1887: Berliner's Gramophone started the success story of music mass reproduction
– wax-coated zinc plate
● 1892: pressed rubber disc
● 1895: shellac disc
● 1896: Edison's spring-motor-enhanced phonograph
● 1908: double-sided disc
● 1948: PVC


History of consumer audio recording

1898: Piano Roll in mass production


History of consumer audio electronics recording: Music media

● 1930s: magnetizable tapes
● 1983: Digital Audio Tape (DAT)
– originally for consumer use
– professional 8-channel S-VHS since 1993
● 1980: Red Book Standard (Audio CD)
– 44.1 kHz, 16 bit, 74 minutes
● 1990+: DVD Audio, MiniDisc, iPod, solid-state disc & more
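The Red Book figures above (44.1 kHz, 16 bit, 74 minutes) imply a fixed raw data rate. The following sketch works out the arithmetic; the two-channel (stereo) factor is an assumption added here, since the slide does not state the channel count, and error-correction and subcode overhead on the disc are ignored.

```python
SAMPLE_RATE_HZ = 44_100   # from the slide
BITS_PER_SAMPLE = 16      # from the slide
PLAY_TIME_MIN = 74        # from the slide
CHANNELS = 2              # assumption: stereo

# Raw PCM bit rate: samples per second x bits per sample x channels
bit_rate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS

# Total audio payload for the full playing time, in bytes
total_bytes = bit_rate // 8 * PLAY_TIME_MIN * 60

print(bit_rate)                    # 1411200 bit/s
print(total_bytes)                 # 783216000 bytes
print(round(total_bytes / 2**20))  # 747 (MiB)
```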


Consumer audio electronics: Development towards spatial audio

● 2-channel stereo: one-dimensional (width of stage)
● 2-channel surround: two dimensions (added depth of room)
● N-channel 3D: added audio tracks for upper frequency bands
● N-channel object-based VR: binaural technology, outside head
● Future: audio AR, e.g. for gaming and navigation

[Figure: immersion increasing from Stereo (2-3 speakers) through Surround (5 to 7 speakers) and 3D (7 plus speakers) to Audio VR (7 plus speakers); timeline labels: 1960s, 1970s, ~2016]


Mathematical Elements of Speech Technology


HMI: Dialog and Speech Understanding

“A symbolic description should be calculated from a speech signal, that allows a usable reaction of a system to a verbally expressed user demand in context of a human-machine dialog.”

according to: Sagerer, Automatisches Verstehen gesprochener Sprache, BI-Wiss.-Verl., 1990


Mathematical Elements

● Elements
– Signal, System, Frequency, Amplitude, Phase, Spectrum
– Sampling, Quantization
● Acoustic Models of Speech Production
– Tube Model
– Source-Filter Model
– Perturbation Model (Formant Shifting)
● Spectral Attributes of Sound Classes
● Spectral Analysis
– Basics
– Windowing


Basics and Terminology

● Signal
– analog (continuous in time and value)
  • modulated signals: amplitude-modulated, frequency-modulated
– digital (discrete in time and value)
● Signal parameters
– Frequency
– Amplitude
– Phase
● Spectrum


Frequency, Amplitude, Phase

● Frequency = 1 / cycle time [Hz]

● Phase = displacement of a wave with respect to a fixed point in time

[Figure: sine waves over time t, annotated with cycle time and amplitude; one panel shows waves with the same phase, the other waves with different phase]
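The three parameters on this slide fully describe a sine wave. The sketch below is illustrative only: the function name, the 10 ms cycle time and the quarter-cycle phase offset are assumptions chosen for the example. It evaluates two waves of the same frequency that differ only in phase.

```python
import math


def sine_sample(t: float, frequency_hz: float,
                amplitude: float = 1.0, phase_rad: float = 0.0) -> float:
    """Value of a sine wave at time t (seconds)."""
    return amplitude * math.sin(2.0 * math.pi * frequency_hz * t + phase_rad)


cycle_time_s = 0.01                 # assumed cycle time of 10 ms
frequency_hz = 1.0 / cycle_time_s   # frequency = 1 / cycle time, here 100 Hz

# Same frequency and amplitude, but shifted by a quarter cycle (pi/2):
in_phase = sine_sample(0.0, frequency_hz)
shifted = sine_sample(0.0, frequency_hz, phase_rad=math.pi / 2)
print(in_phase, shifted)  # the first wave starts at zero, the shifted one at its peak
```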


Analog to Digital Signal Conversion

● Analog Signal

● Sampling
– Time becomes discrete

● Quantization
– Values become discrete


Sampling

● Nyquist/Shannon sampling theorem
– A signal is fully reconstructable if fsample > 2 fmax
– Otherwise we get aliasing

● Example, speech analysis:
– fmax ~ 7 kHz
– fsample = 16 kHz

● Sampling rate:
– Number of samples per second
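The sampling condition can be made concrete with a small sketch. The helper names are hypothetical; the example uses the 16 kHz sampling rate from the slide and shows that a 9 kHz tone, which violates fsample > 2 fmax, becomes indistinguishable from a 7 kHz tone after sampling.

```python
def is_alias_free(fmax_hz: float, fsample_hz: float) -> bool:
    """Sampling condition from the slide: fsample > 2 * fmax."""
    return fsample_hz > 2 * fmax_hz


def alias_frequency(f_hz: float, fsample_hz: float) -> float:
    """Apparent frequency of a sine of f_hz after sampling at fsample_hz."""
    f_mod = f_hz % fsample_hz
    # Frequencies above fsample/2 fold back into the range 0 .. fsample/2
    return min(f_mod, fsample_hz - f_mod)


print(is_alias_free(7_000, 16_000))    # True: 7 kHz speech band at 16 kHz sampling
print(alias_frequency(7_000, 16_000))  # 7000: represented correctly
print(alias_frequency(9_000, 16_000))  # 7000: a 9 kHz tone folds back onto 7 kHz
```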


Quantization

[Figure: quantization of a sampled signal; labels: sampling value, mean value of interval, quantization error, maximum quantization error]
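A uniform quantizer that maps each sampling value to the mean value of its interval can be sketched as follows. The 3-bit resolution and the value range of -1 to 1 are assumptions for illustration. With this scheme the quantization error never exceeds half an interval width.

```python
import math


def quantize(x: float, step: float) -> float:
    """Map x to the midpoint (mean value) of the interval of width `step` containing it."""
    index = math.floor(x / step)
    return (index + 0.5) * step


bits = 3                 # assumed resolution
step = 2.0 / 2 ** bits   # interval width for values in the range -1 .. 1, here 0.25

samples = [-0.9, -0.3, 0.0, 0.3, 0.62, 0.99]
for x in samples:
    q = quantize(x, step)
    print(x, q, abs(x - q))  # the error stays at or below step / 2 = 0.125
```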


Topics of Speech Acoustics

● Concerned with signal processing and speech communication

● Topics:
– Speech production, vocal tract models

– Speech signal analysis

– Speech perception, intelligibility and quality

– Speech and sound coding

– Speech synthesis

– Noise suppression, robust Speech-signal processing

– Speech recognition

– Speaker recognition


Speech signal in time and frequency domain

[Figure: the word “aua” (German for “ouch”) in the time domain]

[Figure: the word “aua” in the frequency domain]


Signal Spectrogram vs. Cascade Spectrogram


Spectrogram II

● Wide-band spectrogram
– Shows formants (resonances of the vocal tract) = characteristics of the filter

● Narrow-band spectrogram
– Shows harmonics = characteristics of the source

● Synonym: sonagram
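Whether a spectrogram is wide-band or narrow-band depends on the length of the analysis window: a window of duration T resolves frequency detail of roughly 1/T. The sketch below uses assumed values that are not given on the slide (a 100 Hz fundamental, 5 ms and 30 ms windows) to show why a short window blurs the harmonics and leaves the formant envelope, while a long window separates the individual harmonics.

```python
def frequency_resolution_hz(window_s: float) -> float:
    """Approximate frequency resolution of an analysis window of the given duration."""
    return 1.0 / window_s


def resolves_harmonics(window_s: float, f0_hz: float) -> bool:
    """Harmonics are spaced f0 apart; they are separated only if the resolution is finer."""
    return frequency_resolution_hz(window_s) < f0_hz


f0_hz = 100.0  # assumed fundamental frequency of the voice

print(resolves_harmonics(0.005, f0_hz))  # False: 5 ms window, about 200 Hz, wide-band
print(resolves_harmonics(0.030, f0_hz))  # True: 30 ms window, about 33 Hz, narrow-band
```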


“Flat” spectrogram (sonagram)

[Figure: spectrogram with time on the horizontal axis and frequency on the vertical axis; amplitude shown by density]


Acoustic Models of Speech Production

● Source/Filter Model

● Tube Model

● Perturbation Model (formant shifting)


1) Source/Filter Model

[Figure: source (stimulation) → filter (sound forming) → speech signal]
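The source/filter idea can be imitated in a few lines: a pulse train stands in for the glottal source, and a single two-pole resonator stands in for one resonance of the vocal tract filter. This is a rough sketch under assumed values (16 kHz sampling rate, 100 Hz pulse rate, one resonance at 500 Hz with 80 Hz bandwidth), not the model used in the talk; a real vocal tract has several resonances.

```python
import math


def impulse_train(f0_hz: float, fs_hz: int, n: int) -> list:
    """Source: one unit pulse per pitch period."""
    period = round(fs_hz / f0_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def resonator(x: list, center_hz: float, bandwidth_hz: float, fs_hz: int) -> list:
    """Filter: two-pole resonator y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / fs_hz)   # pole radius (< 1, so stable)
    theta = 2.0 * math.pi * center_hz / fs_hz       # pole angle
    a1, a2 = 2.0 * r * math.cos(theta), -r * r
    y, y1, y2 = [], 0.0, 0.0
    for s in x:
        out = s + a1 * y1 + a2 * y2
        y.append(out)
        y2, y1 = y1, out
    return y


fs_hz = 16_000
source = impulse_train(100.0, fs_hz, 1_600)           # 0.1 s of pulses
speech_like = resonator(source, 500.0, 80.0, fs_hz)   # pulses shaped by one resonance
print(len(speech_like), max(speech_like))
```

Changing the pulse rate alters the pitch without moving the resonance, and changing the resonator alters the timbre without moving the pitch, which is the separation the source/filter model expresses.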


2) Tube Model

● Vocal tract modelled with tube elements of different diameters

[Figure: approximation of the changing cross-section between glottis and lips with piecewise homogeneous tubes (tube model)]


Simplified tube model

● Assumptions:

– The whole vocal tract is a homogeneous tube

– Diameter is much less than length

– Equal diameter over the whole length

– Glottis = total reflector (closed end)

– Lips = open end

● Result:
– Resonant (standing) wave
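Under these assumptions the tube is closed at one end and open at the other, so it resonates at odd multiples of a quarter wavelength: F_k = (2k - 1) c / (4 L). The sketch below evaluates this for an assumed tract length of 17 cm and an assumed speed of sound of 340 m/s; neither value appears on the slide.

```python
def tube_resonances_hz(length_m: float, count: int = 3, c_m_per_s: float = 340.0) -> list:
    """Resonances of a uniform tube closed at one end (glottis) and open at the other (lips)."""
    return [(2 * k - 1) * c_m_per_s / (4.0 * length_m) for k in range(1, count + 1)]


# Assumed vocal tract length of 17 cm
for k, f in enumerate(tube_resonances_hz(0.17), start=1):
    print(f"F{k} is about {f:.0f} Hz")
```

With these values the first three resonances fall near 500, 1500 and 2500 Hz, which is close to the formants of a neutral vowel.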


3) Formant shifting model

● Formants are defined by local energy maxima in the spectrum

● The center frequency of each maximum is defined as the formant frequency

● Independent of the fundamental frequency

● Based on the resonance characteristics (size and shape) of the articulation tract

● The 1st and 2nd formants define vowels
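Because the first two formants define a vowel, a measured pair (F1, F2) can be matched against reference values. The following is only a sketch: the reference table holds rough, rounded numbers for an adult male voice that are assumed here for illustration and do not come from the slides, and a plain nearest-neighbour rule stands in for a real classifier.

```python
# Assumed, rounded reference formants (F1, F2) in Hz; illustrative values only
REFERENCE_FORMANTS = {
    "i": (300.0, 2300.0),
    "a": (700.0, 1100.0),
    "u": (300.0, 900.0),
}


def classify_vowel(f1_hz: float, f2_hz: float) -> str:
    """Return the reference vowel whose (F1, F2) pair is closest to the measurement."""
    def squared_distance(vowel: str) -> float:
        r1, r2 = REFERENCE_FORMANTS[vowel]
        return (f1_hz - r1) ** 2 + (f2_hz - r2) ** 2

    return min(REFERENCE_FORMANTS, key=squared_distance)


print(classify_vowel(320.0, 2100.0))  # i
print(classify_vowel(650.0, 1200.0))  # a
print(classify_vowel(350.0, 800.0))   # u
```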


Formant-Shifting (Perturbation Model)

● Raising (+) or lowering (-) of the first three formants by shifting the local constriction of the articulation tract


Sonagrams i, u, a


Speech Recognition


Interdisciplinarity of Speech Technology

● Contributing disciplines: Engineering / Computer Science, Computational Linguistics, Phonetics
● Joint topics: natural dialog, speech understanding, text-to-speech
● e.g. systems for: Consumer Electronics


Typical Tasks in Speech Recognition

Goal: Automatically extract the information transmitted in the speech signal

Speech signal →
● Speech recognition → words: “How are you?”
● Language recognition → language name: English
● Speaker recognition → speaker name: Glenn Miller


Three Steps of Speech Processing

Spoken input, with reduction of uncertainty from step to step:

● Speech recognition (acoustic speech analysis, word lists)
– What did the speaker say? 100 alternatives
● Speech analysis (grammar, word definitions)
– What does the speaker mean? 10 alternatives
● Speech understanding (knowledge about topic, dialog partner and context)
– What is the intent of the speaker? Unambiguous understanding within the dialog

acc. to W. Wahlster, DFKI


Speech Recognition: Dependencies

● Environment: noise, acoustics, S/N ratio

● Speaker's state: health, stress, gender ..

● Speaker's literacy: language, vocabulary size

● Software: system, dynamics, algorithm, error handling

● Use case: translation, user-device dialog, robotics

● Hardware: microphones, speakers

● Dialog architecture: software design

● Training


Noise contamination of speech

● Environmental noise
– Continuous, e.g. air conditioner, motors, fans, continuous conversation
– Transient, e.g. phone, vocal/conversational, alarms
● Personal noise
– Related to breathing, e.g. body motion, respiratory infections, distorted respiration
– Not related to breathing, e.g. indoor/outdoor movement, clothes, joint crackles


Process Chain in Speech Processing

● Speech recognition: acoustic wave, possible phonemes, possible words, possible sentences
● Speech analysis: possible sentences, grammar structure, word meaning, phrase/sentence meaning
● Speech understanding and translation: sentence meaning, discourse meaning in source language, phrase choice in target language
● Generation and synthesis: discourse meaning in target language, phrase choice in target language, sentence production, prosody generation, speech synthesis


Remember: Technical Evaluation of a Speech Signal

● Speech is a continuous evolution of the vocal tract
– Need to extract a time series of spectra
– Use a sliding window: 20 ms window, 10 ms shift

[Figure: successive windowed frames, each passed through a Fourier transform followed by a magnitude computation]

• Produces the time-frequency evolution of the spectrum
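The sliding-window analysis can be sketched with the standard library alone. The 20 ms window and 10 ms shift come from the slide; the 8 kHz sampling rate, the 1 kHz test tone, the Hann window and the function names are assumptions for this example. A direct DFT is used for clarity, where a real system would use an FFT.

```python
import cmath
import math


def split_frames(signal: list, fs_hz: int, window_s: float = 0.020, shift_s: float = 0.010) -> list:
    """Cut the signal into overlapping frames (20 ms window, 10 ms shift by default)."""
    win = round(window_s * fs_hz)
    hop = round(shift_s * fs_hz)
    return [signal[i:i + win] for i in range(0, len(signal) - win + 1, hop)]


def magnitude_spectrum(frame: list) -> list:
    """Magnitude of the DFT of one Hann-windowed frame, bins 0 .. n/2."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    return [abs(sum(w * cmath.exp(-2j * math.pi * k * i / n)
                    for i, w in enumerate(windowed)))
            for k in range(n // 2 + 1)]


fs_hz = 8_000                                   # assumed sampling rate
tone = [math.sin(2.0 * math.pi * 1_000 * i / fs_hz) for i in range(800)]  # 0.1 s, 1 kHz

frames = split_frames(tone, fs_hz)
spectrum = magnitude_spectrum(frames[0])
peak_bin = spectrum.index(max(spectrum))
print(len(frames), peak_bin * fs_hz / len(frames[0]))  # frame count and peak frequency in Hz
```

Stacking the magnitude spectra of all frames side by side gives the time-frequency picture shown as a sonagram.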


Sonagram

[Figure: narrow-band sonagram and broad-band sonagram of the same utterance; axes: time (s) and frequency; voiced segments and formants are marked]


Segmentability of Sonagrams: Phonemes


Speech Recognition: Problems

● Problems: “calligraphy”, spontaneous speech, nonlinear time distortion, channel distortion, “cocktail party effect”, co-articulation, variation in speech (slang), no break between words, punctuation and capitalization

● Example utterances:
– A very good morning Mrs. Lennard. How is the state of your actual workplan?
– Hi Jane, what's up with your plans?
– Hi Jane what's up with your plans
– HiJanewhatsupwithyourplans
– Uh Jaine, whatss up with ya plan

acc. to W. Wahlster, DFKI


Speech Recognition: Variety of Signals

“Ich habe einen Termin um 17 Uhr 30” (“I have an appointment at 5:30 pm”)


Speech Recognition: Word Hypothesis Graph

“It's hard to recognize speech”

U Washington, CS


Application to Consumer Electronics Dialog Systems

[Figure: dialog systems arranged by system complexity (low to very high) and size of vocabulary (small to very large); entries: Standard IVR Systems, Command & Control, Telephone Dialogs, Dialog Systems, Dictation, “Star Trek Dialogs”]


Characteristics of speech processing systems

Training efforts

● Speaker-dependent:
– high training effort
– limited group of users
– highly individual and sensitive to small changes

● Speaker-independent:
– no training, robust
– small vocabulary

● Speaker-adaptive:
– learning system
– instant improvement of recognition

Input types

• Single-word recognition:
– recognition of isolated spoken words

• Discrete recognition:
– short breaks between words

• Continuous recognition:
– no break between words

• Spontaneous recognition:
– speech with or without delays
– interrupted words


Questions?


Speaker Recognition


Features for Speaker Recognition

• Speech is a continuous evolution of the vocal tract
– Need to extract a time series of spectra
– Use a sliding window: 20 ms window, 10 ms shift

[Figure: successive windowed frames, each passed through a Fourier transform followed by a magnitude computation]

• Produces the time-frequency evolution of the spectrum


General Theory: Speaker Models

● Speaker models (voiceprints) represent the voice biometric in a compact and generalizable form

[Figure: example for the sound sequence h-a-d]

• Modern speaker verification systems use Hidden Markov Models (HMMs)

– HMMs are statistical models of how a speaker produces sounds

– HMMs represent underlying statistical variations in the speech state (e.g., phoneme) and temporal changes of speech between the states.

– Fast training algorithms (EM) exist for HMMs with guaranteed convergence properties.
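How an HMM scores an utterance can be illustrated with the forward algorithm, which sums the probability of the observation sequence over all state paths. The two-state left-to-right model and all probabilities below are made up for illustration and are not taken from the slides. In a verification system the resulting likelihood under the claimed speaker's model would be compared with that under a competing model.

```python
def forward_likelihood(observations: list, initial: list,
                       transition: list, emission: list) -> float:
    """P(observations | model) for a discrete HMM, computed with the forward algorithm."""
    n_states = len(initial)
    # Initialisation: start in state s and emit the first symbol
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n_states)]
    # Recursion: move between states (temporal change), then emit the next symbol
    for obs in observations[1:]:
        alpha = [sum(alpha[p] * transition[p][s] for p in range(n_states)) * emission[s][obs]
                 for s in range(n_states)]
    return sum(alpha)


# Hypothetical two-state left-to-right model with two observation symbols (0 and 1)
initial = [1.0, 0.0]
transition = [[0.5, 0.5],
              [0.0, 1.0]]
emission = [[0.9, 0.1],   # state 0 mostly emits symbol 0
            [0.2, 0.8]]   # state 1 mostly emits symbol 1

print(forward_likelihood([0, 1], initial, transition, emission))  # about 0.405
print(forward_likelihood([1, 0], initial, transition, emission))  # about 0.055
```

The sequence that follows the model's preferred order of sounds receives the higher likelihood, which is the basis for accepting or rejecting a claimed identity.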


Neural network-based speech recognition

Another approach to acoustic modeling is the use of neural networks. They can solve much more complicated recognition tasks, but do not scale as well as HMMs to large vocabularies. Rather than being used in general-purpose speech recognition applications, they are applied where low-quality or noisy data and speaker independence have to be handled. Such systems can achieve greater accuracy than HMM-based systems, as long as training data is available and the vocabulary is limited. A more general approach using neural networks is phoneme recognition. This is an active field of research, and the results are generally better than for HMMs. There are also NN-HMM hybrid systems that use the neural network part for phoneme recognition and the hidden Markov model part for language modeling.


Following: Part II

Applications


Psychoacoustics

University of Surrey, UK


Voiceprint as a Biometric

• Biometric: a human generated signal or attribute for authenticating a person’s identity

• Voice is a popular biometric:
– natural signal to produce
– ubiquitous: telephones, microphone-equipped PCs

• Voice biometric combined with other forms of security
– Something we have, e.g., badge
– Something we are, e.g., voice
– Something we know, e.g., password

[Figure: overlapping circles labelled Have, Know and Are; the strongest security lies where all three overlap]