Page 1

Perceptual Linear Predictive Analysis of Speech

Hynek Hermansky, Speech Technology Laboratory, J. Acoustical Society of America, April 1990

Presenter: 張志豪

Page 2

Outline

• Linear Prediction Coding

• Mel-scale Frequency Cepstral Coefficients

• Perceptual Linear Predictive

Page 3

Introduction

• Feature Extraction
  – Speech Production Model
    • Linear Prediction Coding
  – Speech Perception Model
    • Mel-scale Frequency Cepstral Coefficients

Page 4

Linear Prediction Coding

• Property
  – Approximates the areas of high-energy concentration while smoothing out the fine harmonic structure and other less-relevant spectral details.
  – The approximated high-energy spectral areas often correspond to the resonance frequencies of the vocal tract (formants).

Page 5

Linear Prediction Coding

• Autocorrelation
  – Levinson-Durbin recursion
• Impulse response

[Figure: three panels: "Speech" (time domain), "LPC" impulse response (time domain), and "Speech and LPC" spectra (frequency domain, axis 0 to 8000)]
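The autocorrelation method with the Levinson-Durbin recursion can be sketched in C as follows. This is a minimal illustration, assuming an already windowed frame and the predictor convention x[n] ≈ Σ a[k]·x[n−k]; the function names and the 64-coefficient cap are hypothetical, not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>

#define LPC_MAX_ORDER 64  /* hypothetical cap for the scratch buffer */

/* Autocorrelation r[0..order] of a (typically windowed) frame x[0..n-1]. */
void autocorrelation(const double *x, size_t n, double *r, size_t order)
{
    for (size_t k = 0; k <= order; k++) {
        double sum = 0.0;
        for (size_t i = k; i < n; i++)
            sum += x[i] * x[i - k];
        r[k] = sum;
    }
}

/* Levinson-Durbin recursion: solves the normal equations for predictor
 * coefficients a[1..order] with x[n] ~ sum_{k=1..order} a[k] * x[n-k].
 * a[0] is set to 1.0 for convenience. Returns the final prediction-error
 * energy, or -1.0 if r[0] <= 0 (silent frame). */
double levinson_durbin(const double *r, double *a, size_t order)
{
    assert(order >= 1 && order <= LPC_MAX_ORDER);
    double prev[LPC_MAX_ORDER + 1];
    double err = r[0];
    if (err <= 0.0)
        return -1.0;

    a[0] = 1.0;
    for (size_t i = 1; i <= order; i++)
        a[i] = 0.0;

    for (size_t i = 1; i <= order; i++) {
        /* reflection coefficient k_i from the order-(i-1) solution */
        double acc = r[i];
        for (size_t j = 1; j < i; j++)
            acc -= a[j] * r[i - j];
        double k = acc / err;

        /* a_j(i) = a_j(i-1) - k_i * a_{i-j}(i-1), for j = 1..i-1 */
        for (size_t j = 1; j < i; j++)
            prev[j] = a[j];
        for (size_t j = 1; j < i; j++)
            a[j] = prev[j] - k * prev[i - j];
        a[i] = k;

        err *= 1.0 - k * k;  /* prediction error shrinks at each order */
    }
    return err;
}
```

For the autocorrelation of a first-order process, r = {1, 0.5, 0.25}, a 2nd-order fit returns a[1] = 0.5 and a[2] = 0, and the returned error energy (0.75) can serve as the squared gain of the all-pole model.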

Page 6

Linear Prediction Coding

• Disadvantages
  – LPC approximates speech equally well at all frequencies of the analysis band. This is inconsistent with human hearing: above about 800 Hz, the spectral resolution of hearing decreases with frequency.
  – At the amplitude levels typically encountered in conversational speech, hearing is more sensitive in the middle frequency range of the audible spectrum.
  – LPC analysis does not always preserve or discard the spectral details of speech according to their auditory prominence.

Page 7

Mel-scale Frequency Cepstral Coefficients

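• Mel scale
  – At low frequencies, human hearing is relatively sensitive.
  – At high frequencies, human perception becomes progressively coarser.
  – Human perception of frequency varies roughly logarithmically.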

Page 8

Mel-scale Frequency Cepstral Coefficients

Page 9

Mel-scale Frequency Cepstral Coefficients

• Discrete cosine transform
  – Maps the frequency domain back to the time domain (the cepstral, or "quefrency", domain)
  – A "frequency of frequency" representation

Page 10

MFCC & LPC

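• Mel-scale Frequency Cepstral Coefficients
  – Advantage
    • Emphasizes the characteristics of the speech spectrum, so it maintains a comparatively good recognition rate even in noisy environments.
  – Disadvantage
    • Higher computational cost.

• Linear Prediction Coding
  – Advantage
    • Low computational cost.
  – Disadvantage
    • Does not take the characteristics of the speech spectrum into account, so its recognition rate drops as noise increases.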

Page 11

Perceptual Linear Predictive

[Figure: PLP analysis diagram; labels: "MFCC", "LPC"]

Page 12

Perceptual Linear Predictive

• Equal-Loudness Preemphasis

$$E(\omega) = \frac{(\omega^2 + 56.8\times10^{6})\,\omega^4}{(\omega^2 + 6.3\times10^{6})^2\,(\omega^2 + 0.38\times10^{9})\,(\omega^6 + 9.58\times10^{26})}$$
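The weighting E(ω) can be evaluated directly. This sketch assumes ω is angular frequency in rad/s (as in Hermansky's formulation) and that E is applied by multiplying each critical-band output by E at that band's center frequency; equal_loudness and apply_equal_loudness are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Equal-loudness weight E(w); w is angular frequency in rad/s. */
double equal_loudness(double w)
{
    assert(w >= 0.0);
    double w2 = w * w;
    double w4 = w2 * w2;
    double w6 = w4 * w2;
    double num = (w2 + 56.8e6) * w4;
    double d1 = w2 + 6.3e6;
    double den = d1 * d1 * (w2 + 0.38e9) * (w6 + 9.58e26);
    return num / den;
}

/* Multiply each band energy by E evaluated at the band's center frequency (Hz). */
void apply_equal_loudness(double *bands, const double *center_hz, size_t n)
{
    const double two_pi = 6.283185307179586;
    for (size_t j = 0; j < n; j++)
        bands[j] *= equal_loudness(two_pi * center_hz[j]);
}
```

The absolute values of E are tiny (around 1e-28 near 1 kHz), but only the shape matters for the predictor coefficients: scaling the whole spectrum by a constant scales the autocorrelation by a constant and leaves the all-pole coefficients unchanged. The curve rises through the low frequencies and, because of the ω⁶ term, rolls off again above roughly 5 kHz.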

Page 13

Perceptual Linear Predictive

• Equal-Loudness Preemphasis (cont.)
  – Is the effect the same as pre-emphasis?

[Figure: two frequency-domain panels plotted against filter index: "Filter Bank" output and "Equal-Loudness" weighted output (y-axis scale ×10^9)]

Page 14

Perceptual Linear Predictive

• Intensity-Loudness Power Law
  – Perceived loudness, L(ω), is approximately the cube root of the intensity, I(ω):

$$L(\omega) = I(\omega)^{1/3}$$

[Figure: two frequency-domain panels plotted against filter index: "Equal-Loudness" output (y-axis ×10^9) and "Intensity-Loudness" output after cube-root compression]
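One consequence of the cube-root law, worked out: multiplying the intensity by 8 doubles the predicted loudness, and a factor of 8 in intensity corresponds to about 9 dB.

$$\frac{L_2}{L_1} = \left(\frac{I_2}{I_1}\right)^{1/3} = 8^{1/3} = 2, \qquad 10\log_{10} 8 \approx 9.03\ \mathrm{dB}$$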

Page 15

Perceptual Linear Predictive

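• Intensity-Loudness Power Law (cont.)
  – The power spectrum no longer needs the square root:
      ek = (float)sqrt((double)(t1*t1 + t2*t2));   /* MFCC: magnitude */
      ek = (float)(t1*t1 + t2*t2);                 /* PLP: power */
  – The filter-bank outputs no longer need the log, since the cube-root compression replaces it:
      bins[bin] = log((double)t1);                 /* MFCC */
      bins[bin] = t1;                              /* PLP */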

Page 16

Perceptual Linear Predictive

• Inverse Discrete Fourier Transform
  – Converts from the frequency domain back to the time domain

[Figure: "Intensity-Loudness" spectrum (frequency domain, plotted against filter index) and its inverse DFT, labeled "ac" (time domain, indices 1–12)]

Page 17

Perceptual Linear Predictive

• Autoregressive Modeling (LPC)

[Figure: two time-domain panels over indices 1–12: "ac" (autocorrelation) and "c"]

Page 18

Experiment

Model       3       6       9
MFCC      54.21   55.11   55.37
PLP_05    39.01   39.32   39.88
PLP_10    52.55   53.02   53.02
PLP_12    53.79   54.49   54.94
PLP_14    53.62   53.94   54.27
*PLP_12   31.65   32.08   32.03

Page 19

Thanks

Page 21

Choice Of The Order Of The Autoregressive PLP Model

• Introduction

• Spectral distortion measure of PLP

• Single-frame phoneme identification

• Isolated-word identification

Page 22

Choice Of The Order Of The Autoregressive PLP Model

• Introduction
  – With increasing model order, the spectrum of the all-pole model asymptotically approaches the auditory spectrum.

Page 23

Choice Of The Order Of The Autoregressive PLP Model

• Spectral Distortion Measure of PLP
  – Group-delay distortion measure
    • The spectral peaks of the model are enhanced and its spectral slope is suppressed.
    • The group-delay metric is more sensitive to the distance between narrow peaks.
    • The group-delay measure is more sensitive to the actual value of the spectral peak width.
  – Exponential measure
    • Allows for various degrees of peak enhancement.

Page 24

Choice Of The Order Of The Autoregressive PLP Model

• Single-Frame Phoneme Identification
  – PLP identification accuracy increases up to about the 5th-order autoregressive model and then starts decreasing as the model order increases further.

Page 25

Choice Of The Order Of The Autoregressive PLP Model

• Isolated-Word Identification

Page 26

Choice Of The Order Of The Autoregressive PLP Model

• Discussion
  – The advantage of PLP over LP is that it allows effective suppression of speaker-dependent information through the choice of model order.
  – The linguistically relevant, speaker-independent cues lie in the gross shape of the auditory spectrum. This gross shape can be characterized by the one or two spectral peaks of the 5th-order PLP model.

Page 27

PLP and Human Hearing

• Introduction

• Formant Frequency Changes

• Sensitivity to Bandwidth Changes

• Sensitivity to Spectral Tilt

• Sensitivity to F0

• Discussion

Page 28

PLP and Human Hearing

• Introduction
  – Human sensitivity to changes in the first three formant frequencies is approximately constant in relative frequency. LP analysis, which models all frequencies with equal resolution, conflicts with this.

Page 29

PLP and Human Hearing

• Formant Frequency Changes

Page 30

PLP and Vowel Perception

• Introduction

• The effective second formant

• Spectral peak integration theory

• The significance of the bandwidth B2

• Discussion
