Meena Ramani 04/12/06 EEL 6586 Automatic Speech Processing


Page 1: Meena Ramani 04/12/06

Meena Ramani

04/12/06

EEL 6586 Automatic Speech Processing

Page 2: Meena Ramani 04/12/06

Topics to be covered

Lecture 1: The incredible sense of hearing 1

Anatomy

Perception of Sound

Lecture 2: The incredible sense of hearing 2

Psychoacoustics

Hearing aids and cochlear implants

Page 3: Meena Ramani 04/12/06

The incredible sense of hearing 2

“Behind these unprepossessing flaps lie structures of such delicacy that they shame the most skillful craftsman”

-Stevens, S.S. [Professor of Psychophysics, Harvard University]

Page 4: Meena Ramani 04/12/06

How do we hear?

Page 5: Meena Ramani 04/12/06

Threshold of Hearing
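
The threshold curve on this slide survives only as a title. As a rough stand-in, the sketch below evaluates Terhardt's analytic approximation of the threshold in quiet, which is widely used in audio-coding work. The formula is an assumption brought in for illustration; it is not the data plotted in the lecture, and the function name is hypothetical.

```python
import math

def threshold_in_quiet_db_spl(freq_hz):
    """Approximate absolute threshold of hearing (dB SPL) for a young normal listener.

    Terhardt-style approximation; assumed here, not taken from the slides.
    """
    f_khz = freq_hz / 1000.0
    return (3.64 * f_khz ** -0.8                        # steep rise toward low frequencies
            - 6.5 * math.exp(-0.6 * (f_khz - 3.3) ** 2)  # dip of best sensitivity near 3.3 kHz
            + 1e-3 * f_khz ** 4)                        # rise toward high frequencies

for f in (100, 1000, 3300, 10000):
    print(f"{f:>6} Hz: {threshold_in_quiet_db_spl(f):6.1f} dB SPL")
```

The dip near 3.3 kHz is where the ear is most sensitive; the threshold climbs steeply below a few hundred Hz.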

Page 6: Meena Ramani 04/12/06

Equal loudness curves

Page 7: Meena Ramani 04/12/06

The Bass Loss Problem

Rock music

Played too low: no bass

Played too high: too much bass

Page 8: Meena Ramani 04/12/06

Threshold variation with age

[Figure: Thresholds of hearing for normal & HI listeners. x-axis: Frequency (Hz), log scale, 10^2 to 10^4; y-axis: Threshold of hearing (dB SPL), -10 to 90; curves: Normal hearing, Hearing impaired]

Page 9: Meena Ramani 04/12/06

The Audiogram

[Figure: Audiogram. x-axis: Frequency, Hz, 0 to 6000; y-axis: Hearing Level (HL), dB, -20 to 100; curves: Left Ear, Right Ear]

Page 10: Meena Ramani 04/12/06

The Audiogram (contd.)

Pure tone audiogram

[250 500 1K 2K 4K 6k] Hz

<20 dB HL is Normal Hearing

[Figure: Audiogram. x-axis: Frequency, Hz, 0 to 6000; y-axis: Hearing Level (HL), dB, -20 to 100; curves: Left Ear, Right Ear]
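
The reading of a pure tone audiogram against the 20 dB HL criterion can be sketched in a few lines. This is a minimal illustration, assuming the six test frequencies listed on the slide; the function name and the example thresholds are hypothetical.

```python
AUDIOGRAM_FREQS_HZ = [250, 500, 1000, 2000, 4000, 6000]  # test frequencies from the slide
NORMAL_LIMIT_DB_HL = 20  # slide: "<20 dB HL is Normal Hearing"

def flag_elevated(thresholds_db_hl):
    """Return the test frequencies whose threshold is not below the normal limit."""
    return [f for f, t in zip(AUDIOGRAM_FREQS_HZ, thresholds_db_hl)
            if t >= NORMAL_LIMIT_DB_HL]

# Illustrative sloping loss, one threshold per test frequency (dB HL).
example_loss = [10, 20, 30, 60, 80, 90]
print(flag_elevated(example_loss))
```

With this example only 250 Hz counts as normal; every higher test frequency is flagged.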

Page 11: Meena Ramani 04/12/06

Loudness Growth Curve

[Figure: LGOB loudness growth curve at 250 Hz. x-axis: Input level (dB SPL), 0 to 100; y-axis: LGOB loudness rating, 0 to 7; curves: Normal hearing, Hearing impaired]

Page 12: Meena Ramani 04/12/06

Otoacoustic emissions

• The ear produces some sounds!
– OHC: outer hair cell

• Used to test hearing for infants & check if patient is feigning a loss

Page 13: Meena Ramani 04/12/06

Monaural beats

If two tones are presented monaurally with a small frequency difference, a beating pattern can be heard

Examples: 500 & 502 Hz; 500 & 520 Hz

Interaction of the two tones in the same auditory filter

Waveform: 150 Hz + 170 Hz
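
Why two close tones beat follows from the sum-to-product identity: sin(2πf1t) + sin(2πf2t) = 2 cos(π(f1 - f2)t) sin(π(f1 + f2)t), a fast carrier at the mean frequency under a slow envelope. The sketch below checks the identity numerically and reports the beat rate |f1 - f2|; the function names are illustrative only.

```python
import math

def two_tone(t, f1, f2):
    """Sum of two unit-amplitude tones at time t (seconds)."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)

def beat_form(t, f1, f2):
    """The same signal written as slow envelope times fast carrier."""
    envelope = 2 * math.cos(math.pi * (f1 - f2) * t)
    carrier = math.sin(math.pi * (f1 + f2) * t)
    return envelope * carrier

def beat_rate_hz(f1, f2):
    """Loudness maxima per second: the envelope magnitude repeats every 1/|f1 - f2| s."""
    return abs(f1 - f2)

for f1, f2 in [(500, 502), (500, 520), (150, 170)]:
    print(f"{f1} Hz + {f2} Hz -> {beat_rate_hz(f1, f2)} beats per second")
```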

Page 14: Meena Ramani 04/12/06

Binaural beats

Beating can also be heard when the tones are presented to different ears!

Beating arises from neural interaction

Only perceived if the tones are sufficiently close in frequency

Example: 500 Hz to the left ear, 520 Hz to the right ear (binaural)

Page 15: Meena Ramani 04/12/06

The case of the missing fundamental

Telephone BW: 300-3400 Hz

How do we know the pitch?

Primary Auditory cortex

• Pitch-sensitive neurons [Bendor and Wang, Nature 2005]

• Neuron responds to fundamental and harmonics

• What are the inputs to these neurons?

How do spikes represent periodic, temporal and spectral information?
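
The missing-fundamental effect can be reproduced with a toy signal. The sketch below builds a harmonic complex for a hypothetical 200 Hz voice, keeps only the harmonics inside the 300-3400 Hz telephone band, and recovers the pitch from the autocorrelation peak. Autocorrelation is used here purely as a signal-processing stand-in; it is not a claim about how the pitch-sensitive neurons work. The sampling rate, pitch, and function names are assumptions.

```python
import cmath
import math

FS = 8000            # sampling rate in Hz (assumed, telephone-style)
F0 = 200             # hypothetical talker pitch in Hz
BAND = (300, 3400)   # telephone bandwidth from the slide

def telephone_harmonics(n_samples=800):
    """Sum of the harmonics of F0 that fall inside the telephone band."""
    harmonics = [k * F0 for k in range(1, FS // (2 * F0) + 1)
                 if BAND[0] <= k * F0 <= BAND[1]]
    signal = [sum(math.sin(2 * math.pi * f * n / FS) for f in harmonics)
              for n in range(n_samples)]
    return signal, harmonics

def tone_amplitude(signal, freq_hz):
    """Amplitude of one sinusoidal component, via a single DFT-style projection."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * i / FS)
              for i, x in enumerate(signal))
    return 2 * abs(acc) / len(signal)

def autocorr_pitch(signal, fmin=80, fmax=500):
    """Pitch estimate (Hz) from the lag of the largest autocorrelation value."""
    def r(lag):
        return sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))
    best_lag = max(range(FS // fmax, FS // fmin + 1), key=r)
    return FS / best_lag

signal, harmonics = telephone_harmonics()
print("lowest harmonic present:", harmonics[0], "Hz")
print("energy at F0:", round(tone_amplitude(signal, F0), 6))
print("estimated pitch:", autocorr_pitch(signal), "Hz")
```

The signal has no component at 200 Hz, yet its waveform still repeats every 1/200 s, so the periodicity-based estimate lands on the fundamental.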

Page 16: Meena Ramani 04/12/06

Auditory-periphery model (Zhang et al. ~2001)

Matlab code available

Feed it a wav file

Spits out a PSTH (post-stimulus time histogram)
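
A PSTH itself is simple to compute once spike times are available. The sketch below is not the auditory-periphery model; it only shows the histogram step, assuming spike times (in seconds, relative to stimulus onset) recorded over repeated presentations. The function name and the spike times are made up for illustration.

```python
def psth(spike_trains, duration_s, bin_s):
    """Firing rate (spikes/s) per time bin, averaged over repeated trials.

    spike_trains: one list of spike times per stimulus presentation.
    """
    n_bins = int(round(duration_s / bin_s))
    counts = [0] * n_bins
    for trial in spike_trains:
        for t in trial:
            idx = int(t / bin_s)          # bin that contains this spike
            if 0 <= idx < n_bins:
                counts[idx] += 1
    n_trials = len(spike_trains)
    return [c / (n_trials * bin_s) for c in counts]

# Two hypothetical trials of a 2 s stimulus, 0.5 s bins.
trials = [[0.1, 0.6, 0.7], [0.2, 1.9]]
print(psth(trials, duration_s=2.0, bin_s=0.5))
```

Real PSTHs use much finer bins (fractions of a millisecond) and many more trials; the coarse numbers here just keep the example easy to check by hand.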

Page 17: Meena Ramani 04/12/06

Critical bands

Psychoacoustic experiments

Equally loud, close in frequency

• Same IHCs

• Slightly louder

Equally loud, separated in frequency

• Different IHCs

• Twice as loud

Page 18: Meena Ramani 04/12/06

Critical Band (cont.)

• Proposed by Fletcher

• How to measure?
– S/N ratio vs noise BW

• CB ≈ 1.5 mm spacing on the BM

• 24 such band-pass filters

• BW of the filters increases with fc

• Logarithmic relationship
– Weber’s law example

• Bark scale

Center freq (Hz)    Critical BW (Hz)
100                 90
200                 90
500                 110
1000                150
2000                280
5000                700
10000               1200
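
The Bark scale maps frequency to critical-band number. A minimal sketch, assuming the Zwicker and Terhardt closed-form approximation (a common choice, not necessarily the one used in the lecture); the function name is hypothetical.

```python
import math

def hz_to_bark(freq_hz):
    """Approximate critical-band rate in Bark for a frequency in Hz.

    Zwicker & Terhardt style approximation, assumed here for illustration.
    """
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))

for f in (100, 200, 500, 1000, 2000, 5000, 10000):
    print(f"{f:>6} Hz -> {hz_to_bark(f):5.2f} Bark")
```

The mapping is close to linear below about 500 Hz and close to logarithmic above it, and it reaches roughly 24 Bark near the upper limit of hearing, in line with the 24 band-pass filters mentioned on the slide.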

Page 19: Meena Ramani 04/12/06

Critical bands for HI

[Figure: 4 kHz tuning curve for normal & HI listeners. x-axis: Desired tone frequency (Hz), log scale, 10^3 to 10^4; y-axis: Desired tone threshold (dB SPL), 0 to 90; curves: Masker, Normal hearing, Hearing impaired]

Page 21: Meena Ramani 04/12/06

Frequency Masking

• Masking occurs when two frequencies lie within a critical band and the higher-amplitude one masks the lower-amplitude signal

• Maskers can be broadband noise, narrowband noise, pure tones or complex tones

• Low-frequency broadband sounds mask the most
– E.g. truck on road, water flowing

• Masking threshold
– Level (in dB) at which a test tone is just audible in the presence of noise

Page 22: Meena Ramani 04/12/06

Temporal Aspects of Masking

• Simultaneous masking

• Pre-stimulus / backward / premasking
– 1st test tone, 2nd masker

• Post-stimulus / forward / postmasking
– 1st masker, 2nd test tone

Page 23: Meena Ramani 04/12/06

Temporal Aspects of Masking (contd.)

Simultaneous masking
– For durations > 200 ms the test tone threshold is constant
– Assume the hearing system integrates over a period of 200 ms

Postmasking
– Effect of the masker decays over 100 ms
– More dominant

Premasking
– Takes place 20 ms before masker is on!!
– Each sensation is not instantaneous; it requires build-up time
• Quick build-up for loud maskers
• Slower build-up for softer maskers
– Less dominant effect

Page 24: Meena Ramani 04/12/06

Temporal masking for HI

[Figure: Temporal resolution at 4 kHz for normal & HI listeners. x-axis: Desired-Masker tone separation (ms), 0 to 140; y-axis: Desired tone threshold (dB SPL), 0 to 80; curves: Normal hearing, Hearing impaired]

Page 25: Meena Ramani 04/12/06

Meena Ramani

04/14/06

EEL 6586 Automatic Speech Processing

Page 26: Meena Ramani 04/12/06

What do the hearing impaired hear?

Normal Hearing

Sensorineural Hearing Loss (SNHL)

Mild to Severe Loss

[10 20 30 60 80 90] dB HL

[Figure: two spectrograms, "Cell phone speech for normal hearing" and "Cell phone speech for SNHL". x-axis: Time (s), 0 to 2; y-axis: Frequency (Hz), 0 to 4000; color scale: -250 to 0]

Page 27: Meena Ramani 04/12/06

Facts on Hearing Loss in Adults

• One in every ten (28 million) Americans has hearing loss. 

• The vast majority of Americans (95% or 26 million) with hearing loss can have their hearing loss treated with hearing aids. 

• Only 6 million use HAs

• Millions of Americans with hearing loss could benefit from hearing aids but avoid them because of the stigma.

Page 28: Meena Ramani 04/12/06

Types of Hearing aids

Behind the ear

In the ear

In the canal

Completely in the canal

Page 29: Meena Ramani 04/12/06

Anatomy of a Hearing Aid

• Microphone

• Tone hook

• Volume control

• On/off switch

• Battery compartment

Page 30: Meena Ramani 04/12/06

Ear Mold Measurements

Hearing Aid Fitting

Page 31: Meena Ramani 04/12/06

Acclimatization effect

Auditory cortex brain plasticity

Time for the HI to reuse the HF information: Acclimatization effect

How does this affect HA fitting?
– Multiple fitting sessions
– Initial fitting should be the optimum one

Page 32: Meena Ramani 04/12/06

So doc, what is the fitting methodology employed by the hearing aid company to compensate for my hearing loss?

Not-so-average Joe

(PhD EE/Speech person)

CONFIDENTIAL?

Page 33: Meena Ramani 04/12/06

So, do you want your HA to:

1) Always be comfortably loud
2) Equalize loudness across frequencies
3) Normalize loudness
…?

Which fitting methodology is the best?

Page 34: Meena Ramani 04/12/06

Existing HL compensation algorithms

Rationale
• Ad hoc: Half Gain (HG), POGO
• Make speech comfortable: NAL-R
• Loudness normalization: IHAFF, Fig 6
• Loudness equalization: DSL

Hearing aid fitting algorithms

Threshold-only / Suprathreshold

NAL-R, POGO, HG, Fig 6, IHAFF, DSL
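
The simplest of the named rules, Half Gain, prescribes a gain of about half the hearing loss at each audiometric frequency. A minimal sketch of that rule alone follows; the other prescriptions (POGO, NAL-R, and so on) add frequency-dependent corrections that are not reproduced here. The function name is hypothetical and the loss values are illustrative.

```python
def half_gain_rule(thresholds_db_hl):
    """Prescribed gain (dB) per frequency: half of the hearing loss in dB HL."""
    return [0.5 * t for t in thresholds_db_hl]

freqs_hz = [250, 500, 1000, 2000, 4000, 6000]
loss_db_hl = [10, 20, 30, 60, 80, 90]   # illustrative mild-to-severe sloping loss

for f, g in zip(freqs_hz, half_gain_rule(loss_db_hl)):
    print(f"{f:>5} Hz: {g:4.1f} dB gain")
```

Because it uses thresholds only, the rule says nothing about how loudness grows above threshold, which is the gap the suprathreshold methods try to fill.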

Page 35: Meena Ramani 04/12/06

Spectrograms and sound files

Sensorineural hearing loss: [10 20 30 60 80 90] dB HL
Speech level = 65 dBA

Normal hearing

Hearing impaired

HI with Linear gain

HI with DSL gain

HI with RBC gain

Section Two

Page 36: Meena Ramani 04/12/06

Performance metrics

Speech Intelligibility (SI): The degree to which speech can be understood
• Objective measures: AI, STI
• Subjective measures: HINT

Speech Quality: “Does the speech match your expectations?”
• Objective measures: PESQ
• Subjective measures: MOS

Page 37: Meena Ramani 04/12/06

Performance metrics (contd.)

• Objective speech quality measure
– Perceptual Evaluation of Speech Quality (PESQ)

• Subjective speech quality measure
– Mean Opinion Score (MOS)

• Subjective speech intelligibility measure
– Hearing In Noise Test (HINT)

[Diagram labels: Reference signal, Comparison signal, Score]
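
A Mean Opinion Score is just the average of listener ratings on the five-point scale (1-Bad to 5-Excellent). The sketch below shows that averaging step with made-up ratings; the helper names are hypothetical and the data are not the study's results.

```python
MOS_LABELS = {1: "Bad", 2: "Poor", 3: "Fair", 4: "Good", 5: "Excellent"}

def mean_opinion_score(ratings):
    """Average of integer listener ratings on the 1-5 scale."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r not in MOS_LABELS for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return sum(ratings) / len(ratings)

def nearest_label(score):
    """Verbal category closest to a (possibly fractional) MOS."""
    return MOS_LABELS[min(MOS_LABELS, key=lambda k: abs(k - score))]

ratings = [4, 4, 5, 3, 4]   # made-up ratings from five listeners
mos = mean_opinion_score(ratings)
print(mos, nearest_label(mos))
```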

Page 38: Meena Ramani 04/12/06

Hearing In Noise Test (HINT)

Page 39: Meena Ramani 04/12/06

Subjective listening experiments

Location:
Shands speech & hearing clinic (sound proof booth)

Subjects:
15 HI people
– PTA: 40-70 dB HL
15 normal hearing people

Tools used:
Matlab HINT and MOS GUIs

[Figure: Audiograms of the HI patients, "Left ear audiograms of the HI subjects". x-axis: Frequency (Hz), 0 to 8000; y-axis: Threshold of hearing (dB HL), 0 to 120]

Page 40: Meena Ramani 04/12/06

Subjective HINT and MOS scores for RBC: hearing impaired, cell phone speech

RBC has a 7 dB improvement in SI when compared to DSL

MOS scores reveal that RBC has a quality rating of ‘Good’

[Figure: "Ave. MOSs of 15 HI subjects". x-axis: Algorithm (None, HPF, RBC, NALR, POGO, HG, NALRP, DSL); y-axis: MOS, from 1-Bad, 2-Poor, 3-Fair, 4-Good to 5-Excellent]

[Figure: "Ave. HINT scores of 15 HI subjects". x-axis: Algorithm (None, HPF, RBC, NALR, POGO, HG, NALRP, DSL); y-axis: SNR relative to baseline (dB), -20 to 5]

Page 41: Meena Ramani 04/12/06

Subjective HINT and MOS scores for RBC: normal hearing, cell phone speech

RBC has a 12 dB improvement in SI when compared to DSL

MOS scores reveal that RBC has a quality rating of ‘Good’

Page 42: Meena Ramani 04/12/06

Cochlear Implants

The first fully functional Brain Machine Interface (BMI)

Definition:

A device that electrically stimulates the auditory nerve of patients with severe-to-profound hearing loss to provide them with sound and speech information

Page 43: Meena Ramani 04/12/06

Who is a candidate?

• Severe-to-profound sensorineural hearing loss

• Hearing loss did not reach severe-to-profound level until after acquiring oral speech and language skills

• Limited benefit from hearing aids

Page 44: Meena Ramani 04/12/06

CI statistics

• Worldwide:
– Over 100,000 multi-channel implants

• At Univ of Florida:
– Implanted first patient in 1985
– Currently follow over 400 cochlear implant patients

Page 45: Meena Ramani 04/12/06

Technical and Safety Issues

• Magnetic Resonance Imaging

• Surgical issues

Page 46: Meena Ramani 04/12/06

How does the Cochlea encode frequencies?

Page 47: Meena Ramani 04/12/06
Page 48: Meena Ramani 04/12/06
Page 49: Meena Ramani 04/12/06

Example: New Freedom

Page 50: Meena Ramani 04/12/06
Page 51: Meena Ramani 04/12/06

CI characteristics

1. Electrode design – Number of electrodes, electrode configuration

2. Type of stimulation – Analog or pulsatile

3. Transmission link – Transcutaneous or percutaneous

4. Signal processing – Waveform representation or feature extraction

Page 52: Meena Ramani 04/12/06

Signal processing

• Compressed Analog (CA)

• Continuous Interleaved Sampling (CIS)

• Multiple Peak (MPEAK)

• Spectral Maxima Sound Processor (SMSP)

• Spectral Peak (SPEAK)

Page 53: Meena Ramani 04/12/06

Compressed Analog (CA) approach

Page 54: Meena Ramani 04/12/06

CA activation signals

Page 55: Meena Ramani 04/12/06

Continuous Interleaved Sampling (CIS)

Page 56: Meena Ramani 04/12/06

CIS activation signals
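
The defining feature of CIS is that the electrodes are never pulsed at the same instant: each channel's envelope sets the amplitude of a pulse train, and the trains are offset in time so that they interleave. The sketch below shows only that scheduling step, assuming the band envelopes have already been extracted and compressed upstream. The function name, channel count, and pulse period are hypothetical.

```python
def cis_pulse_schedule(envelopes, pulse_period_s):
    """Interleave one pulse per channel per stimulation cycle.

    envelopes: per-channel lists of (already compressed) envelope amplitudes,
               one value per stimulation cycle.
    Returns a time-ordered list of (time_s, channel, amplitude) tuples in which
    no two channels share a time slot.
    """
    if not envelopes:
        raise ValueError("need at least one channel")
    n_channels = len(envelopes)
    n_cycles = len(envelopes[0])
    slot_s = pulse_period_s / n_channels      # time slice owned by each channel
    schedule = []
    for cycle in range(n_cycles):
        for ch in range(n_channels):
            t = cycle * pulse_period_s + ch * slot_s
            schedule.append((t, ch, envelopes[ch][cycle]))
    return schedule

# Four hypothetical channels, two stimulation cycles, 1 ms per cycle.
env = [[1, 2], [3, 4], [5, 6], [7, 8]]
for pulse in cis_pulse_schedule(env, pulse_period_s=0.001):
    print(pulse)
```

Keeping the pulses non-simultaneous avoids the summation of electric fields between neighboring electrodes, which is the main drawback of simultaneous analog stimulation.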

Page 57: Meena Ramani 04/12/06

Multiple Peak (MPEAK)

Page 58: Meena Ramani 04/12/06

MPEAK activated electrodes

Page 59: Meena Ramani 04/12/06

Spectral Maxima Sound Processor (SMSP)

Page 60: Meena Ramani 04/12/06

SMSP activated electrodes
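
Spectral-maxima strategies such as SMSP and SPEAK stimulate only the electrodes whose filter bands carry the most energy in each analysis frame. A minimal sketch of that "pick the n largest" step follows; the band energies, the band count, and the number of maxima are made-up values, and the function name is hypothetical.

```python
def select_spectral_maxima(band_energies, n_maxima):
    """Indices of the n largest band energies, returned in band (electrode) order."""
    ranked = sorted(range(len(band_energies)),
                    key=lambda i: band_energies[i], reverse=True)
    return sorted(ranked[:n_maxima])

# One hypothetical analysis frame across six filter bands.
frame = [0.10, 0.90, 0.30, 0.80, 0.05, 0.70]
print(select_spectral_maxima(frame, n_maxima=3))
```

Only the selected bands would be stimulated in that frame; the set of active electrodes changes from frame to frame as the spectrum changes.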

Page 61: Meena Ramani 04/12/06

Spectral Peak (SPEAK)

Page 62: Meena Ramani 04/12/06

SPEAK activated electrodes

Page 63: Meena Ramani 04/12/06

Outcomes for Post-lingual Adults

• Wide range of success

• Most score 90-100% on AV sentence materials

• Majority score > 80% on high context

• Performance more varied on single word tests

Page 64: Meena Ramani 04/12/06

Auditory Brainstem Implant

• Approved October 20, 2000

• Uses the Nucleus 24 system processors

• Plate array with 21 electrodes

Page 65: Meena Ramani 04/12/06

Review-1

Pinna:
– ITDs, IIDs: Horizontal localization
– Reflections: Vertical localization

Ear canal:
– ¼ wave resonance 1-3 kHz

Middle ear:
– Amplification by lever action and by area
– Stapedius reflex

Cochlea:
– IHCs/OHCs: convert mechanical to electrical
– Place theory: frequency analysis
– Missing fundamental

Page 66: Meena Ramani 04/12/06

Review-2

Adaptation: AN firing sensitive to changes

Otoacoustic emissions: Produced by movement of OHCs

Beats: Monaural & binaural

Measurement of hearing:
– Audiogram: threshold of hearing
– Threshold variation with age
– Equal loudness curves

Bass loss problem: discrimination against LFs

Page 67: Meena Ramani 04/12/06

Review-3

Critical bands:
– Used for efficient encoding
– Bark scale

Masking:
– Frequency: LFs mask more
– Temporal: simultaneous, pre and post

Hearing impairment:
– Hearing aids: external to cochlea
– Cochlear implants: inside cochlea