ICA 2010: 20th Int. Congress on Acoustics, 23-27 August 2010, Sydney, Australia [Wed, 25th Aug, R. 201, Speech processing & communication systems 2, 15.40]

Enhancement of Electrolaryngeal Speech by Spectral Subtraction, Spectral Compensation, and Introduction of Jitter and Shimmer

Prem C. Pandey, S. Khadar Basha

{pcpandey, basha}@ee.iitb.ac.in

http://www.ee.iitb.ac.in/~spilab

IIT Bombay, India


OVERVIEW

1. Introduction

2. Spectral subtraction

3. Estimation of noise spectrum

4. Jitter, shimmer, & spectral compensation

5. Results

6. Conclusion



1 INTRODUCTION

Natural speech: glottal excitation to the vocal tract

Electrolaryngeal speech: excitation to the vocal tract from an external vibrator


Problems with electrolarynx

• Dynamic control of level, voicing, & pitch not feasible

• Background noise due to leakage of acoustic energy, affecting the intelligibility

• Unnatural quality due to

▪ Low frequency spectral deficit

▪ Constant pitch & level


Methods of noise reduction

• Acoustic shielding of vibrator (Espy-Wilson et al 1996)

• 2-input noise cancellation based on LMS algorithm (Espy-Wilson et al 1996)

• Single input noise cancellation using spectral subtraction

▪ Averaging based noise est. & pitch-synch. generalized spectral subtraction (Pandey et al 2002)

▪ Quantile based noise estimation (Pandey et al 2004)

▪ Parameter adaptation using freq. domain auditory masking (Liu et al 2006)

▪ Min. statistics based noise estimation (Mitra & Pandey 2006, Kabir et al 2008)


2 SPECTRAL SUBTRACTION

Noise generation (Pandey et al 2002)

• Leakage of vibrations produced by vibrator membrane

• Improper coupling of vibrations to the neck tissue


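$s(n) = e(n) * h_v(n)$, $l(n) = e(n) * h_l(n)$, $x(n) = s(n) + l(n)$

$X_n(e^{j\omega}) = E_n(e^{j\omega}) \, [H_{vn}(e^{j\omega}) + H_{ln}(e^{j\omega})]$

• Assuming $h_v(n)$ & $h_l(n)$ to be uncorrelated,

$|X_n(e^{j\omega})|^2 = |E_n(e^{j\omega})|^2 \, [\, |H_{vn}(e^{j\omega})|^2 + |H_{ln}(e^{j\omega})|^2 \,]$

• For short-time spectra calculated using a pitch-synchronous window, $|E_n(e^{j\omega})|^2$ may be considered as constant, $|E(e^{j\omega})|^2$

• During non-speech intervals, $s(n)$ will be negligible, so

$|X_n(e^{j\omega})|^2 = |L_n(e^{j\omega})|^2 = |E(e^{j\omega})|^2 \, |H_{ln}(e^{j\omega})|^2$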


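Generalized spectral subtraction (Berouti et al 1979) using FFT

$E(k) = |X_n(k)|^{\gamma} - \alpha \, |L_n(k)|^{\gamma}$

Clean mag. spectrum

$$|Y_n(k)| = \begin{cases} [E(k)]^{1/\gamma}, & \text{if } E(k) > [\beta \, |L_n(k)|]^{\gamma} \\ \beta \, |L_n(k)|, & \text{otherwise} \end{cases}$$

($\alpha$: subtraction factor, $\beta$: spectral floor factor, $\gamma$: subtraction power)

$y_n(m) = \mathrm{IDFT} \, [\, |Y_n(k)| \, e^{j\theta_n(k)} \,]$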
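
The per-frame operation on this slide can be sketched as follows. This is a minimal illustration assuming NumPy, not the authors' implementation: the function names `generalized_spectral_subtraction` and `enhance_frame` are hypothetical, the noise magnitude spectrum is assumed to be available at the full FFT length, and the noisy phase is used for resynthesis.

```python
import numpy as np

def generalized_spectral_subtraction(X_mag, L_mag, alpha, beta, gamma=1.0):
    """Return |Y_n(k)| for one frame.

    X_mag: |X_n(k)|, magnitude spectrum of the noisy frame
    L_mag: |L_n(k)|, estimated magnitude spectrum of the leakage noise
    alpha, beta, gamma: subtraction factor, spectral floor factor, power
    """
    X_mag = np.asarray(X_mag, dtype=float)
    L_mag = np.asarray(L_mag, dtype=float)
    E = X_mag ** gamma - alpha * L_mag ** gamma
    floor = beta * L_mag
    # Keep the subtracted value only where it exceeds the spectral floor
    # (compared in the gamma-power domain); use the floor elsewhere.
    keep = E > floor ** gamma
    Y_mag = floor.copy()
    Y_mag[keep] = E[keep] ** (1.0 / gamma)
    return Y_mag

def enhance_frame(x, L_mag, alpha, beta, gamma=1.0):
    """Spectral subtraction on one time-domain frame, reusing the noisy phase."""
    X = np.fft.fft(x)
    Y_mag = generalized_spectral_subtraction(np.abs(X), L_mag, alpha, beta, gamma)
    theta = np.angle(X)  # noisy phase
    return np.real(np.fft.ifft(Y_mag * np.exp(1j * theta)))
```

With a zero noise estimate the frame passes through unchanged, which is a convenient sanity check before tuning the three parameters.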


Phase estimation

▪ Noisy phase: $\theta_n(k) = \angle X_n(k)$

▪ Zero phase: $\theta_n(k) = 0$

▪ Random phase: $\theta_n(k) = r$ ($r$ uniformly distributed over $[0, 2\pi]$)

▪ Min. phase calculation: iterative technique (Quatieri and Oppenheim 1981), cepstrum based non-iterative calculation (Oppenheim & Schafer 1975, Rabiner & Schafer 1978, Yegnanarayana & Dhayalan 1983)

▪ Phase set for continuity across the frames:

$\theta_n(k) = \theta_{n-1}(k) + (2\pi n_d k)/N$

where $n_d$ = window shift, $N$ = FFT size

▪ Noisy phase resulted in better quality than others
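
The phase options listed above can be sketched as small helper functions. This is an illustrative sketch assuming NumPy; the function names are hypothetical, and `minimum_phase` is one common cepstrum-based non-iterative construction (folding the real cepstrum, even FFT size assumed), which may differ in detail from the cited methods.

```python
import numpy as np

def noisy_phase(X):
    """Phase of the noisy spectrum X_n(k)."""
    return np.angle(X)

def zero_phase(X):
    return np.zeros(len(X))

def random_phase(X, rng):
    """Phase drawn uniformly from [0, 2*pi); rng is a numpy Generator."""
    return rng.uniform(0.0, 2.0 * np.pi, size=len(X))

def continued_phase(theta_prev, n_d, N):
    """theta_n(k) = theta_{n-1}(k) + 2*pi*n_d*k/N (n_d: window shift, N: FFT size)."""
    k = np.arange(N)
    return theta_prev + 2.0 * np.pi * n_d * k / N

def minimum_phase(mag, eps=1e-12):
    """Minimum-phase spectrum phase from a magnitude spectrum (even length N)."""
    N = len(mag)
    # Real cepstrum of the log magnitude spectrum.
    c = np.real(np.fft.ifft(np.log(np.maximum(mag, eps))))
    # Fold the cepstrum onto non-negative quefrencies.
    w = np.zeros(N)
    w[0] = 1.0
    w[1:N // 2] = 2.0
    w[N // 2] = 1.0
    # FFT of the folded cepstrum is log|X| + j*theta_min; take the phase part.
    return np.imag(np.fft.fft(c * w))
```

For a sequence that is already minimum phase, such as a unit impulse followed by a smaller sample, `minimum_phase(np.abs(H))` should closely reproduce `np.angle(H)`.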


Block diagram of spectral subtraction


3 ESTIMATION OF NOISE SPECTRUM

• Variation in noise due to change in the electrolarynx orientation

• Voice activity detection difficult in electrolaryngeal speech

• Averaging based noise est. (Pandey et al 2002): unsuitable for long term use

• Quantile based noise est. (Stahl et al 2000), used for electrolaryngeal speech (Pandey et al 2004): difficult to implement for real-time processing

• Minimum statistics based method (Martin 1994), used for electrolaryngeal speech (Mitra & Pandey 2006, Kabir et al 2008): not effective with fixed subtraction parameters
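
The three families of noise estimators named above differ mainly in the statistic taken per frequency bin over a set of magnitude spectra. A simplified sketch, assuming NumPy and a 2-D array with one magnitude spectrum per row: the function names are hypothetical, the buffer of past frames and the quantile value are assumptions, and the minimum tracker omits the smoothing and bias compensation of the full minimum statistics method.

```python
import numpy as np

def noise_by_averaging(noise_frames_mag):
    """Mean magnitude per bin over frames known to contain only noise."""
    return np.mean(noise_frames_mag, axis=0)

def noise_by_quantile(past_frames_mag, q=0.5):
    """Per-bin quantile over past frames; q = 0.5 gives the median."""
    return np.quantile(past_frames_mag, q, axis=0)

def noise_by_minimum(past_frames_mag):
    """Per-bin minimum over past frames (simplified minimum tracking)."""
    return np.min(past_frames_mag, axis=0)
```

The quantile and minimum estimators need no voice activity detection, since they operate on all recent frames; the averaging estimator requires a noise-only segment.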

4 INTRO. OF JITTER & SHIMMER & SPECTRAL COMPENSATION

• Introduction of jitter and shimmer, using LPC based analysis synthesis, after spectral subtraction for reducing unnaturalness

• Spectral compensation for low-frequency spectral deficit



Implementation of shimmer

Impulse amplitude = $a(1 + s r_1)$

$a$ = mean amplitude

$s$ = peak-to-peak shimmer

$r_1$ = random number uniformly distributed over ±0.5

Implementation of jitter

Impulse repetition period = $N(1 + j r_2)$

$N$ = mean pitch period in number of samples

$j$ = peak-to-peak jitter

$r_2$ = random number uniformly distributed over ±0.5
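
The two formulas above can be combined into an impulse-train generator. This is a minimal sketch assuming NumPy; the function name is hypothetical, and rounding each period to an integer number of samples is an assumption, since the slide does not say how fractional periods are handled.

```python
import numpy as np

def excitation_with_jitter_shimmer(n_samples, N, a, j, s, rng):
    """Impulse train with random period and amplitude perturbations.

    N: mean pitch period in samples, a: mean amplitude,
    j: peak-to-peak jitter, s: peak-to-peak shimmer (both as fractions),
    rng: numpy random Generator.
    """
    e = np.zeros(n_samples)
    pos = 0
    while pos < n_samples:
        r1, r2 = rng.uniform(-0.5, 0.5, size=2)
        e[pos] = a * (1.0 + s * r1)              # shimmer: amplitude a(1 + s*r1)
        period = int(round(N * (1.0 + j * r2)))  # jitter: period N(1 + j*r2)
        pos += max(period, 1)                    # guard against a zero step
    return e

# Example: 6 % peak-to-peak jitter, no shimmer, 100-sample mean period.
excitation = excitation_with_jitter_shimmer(
    8000, N=100, a=1.0, j=0.06, s=0.0, rng=np.random.default_rng(0))
```

Because $r_1$ and $r_2$ span ±0.5, the amplitude stays within $a(1 \pm s/2)$ and the period within $N(1 \pm j/2)$, so $s$ and $j$ are the peak-to-peak ranges.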


Spectral compensation

• Low frequency spectral deficit in electrolaryngeal speech

• High frequency spectral emphasis in resynthesized speech due to impulse train excitation in LPC analysis-synthesis

• Spectral compensation filter designed by comparing LPC-smoothed spectra of natural and resynthesized /a/, /i/, /u/, and inserted in the excitation path
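
One way such a compensation filter could be obtained is by frequency sampling of the ratio of the two smoothed spectra. The sketch below assumes NumPy and is not the authors' design procedure: the function name, the tap count, the Hamming window, and the assumption that both envelopes are already available on a uniform grid from 0 to half the sampling rate are all illustrative choices.

```python
import numpy as np

def compensation_fir(env_natural, env_resynth, n_taps=65, eps=1e-12):
    """Linear-phase FIR approximating env_natural / env_resynth.

    env_natural, env_resynth: smoothed magnitude envelopes sampled at
    M + 1 uniformly spaced frequencies from 0 to fs/2 (n_taps odd, <= 2*M).
    """
    env_natural = np.asarray(env_natural, dtype=float)
    env_resynth = np.asarray(env_resynth, dtype=float)
    desired = env_natural / np.maximum(env_resynth, eps)
    # A real, non-negative response gives a zero-phase impulse response
    # that is circularly symmetric about index 0.
    h_zero_phase = np.fft.irfft(desired)
    # Rotate so the peak sits at the centre tap, then truncate and taper.
    h = np.roll(h_zero_phase, n_taps // 2)[:n_taps]
    return h * np.hamming(n_taps)
```

The filter would then be applied to the impulse-train excitation, for example with `np.convolve(excitation, h)`, before LPC synthesis.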



5 RESULTS

Subtraction parameters (γ = 1):

• Averaging: α = 10, β = 0.001

• Min.: α = 25, β = 0.005

• Median: α = 1.5, β = 0.001

Material: “… Where were you a year ago? 1 2 3 4 5 6 7 8 9 10”; Electrolarynx: Solatone



[Grid: jitter (columns) vs. shimmer (rows), each at 0%, 6%, 12%, 20%, 40%]

[Electrolaryngeal speech; enhanced electrolaryngeal speech after spectral subtraction with MBNE (α = 1.2, β = 0.001, γ = 1)]

Material: “… Where were you a year ago?”; Electrolarynx: Solatone



6 CONCLUSION

▪ Median based noise estimation could be used for noise suppression without varying the oversubtraction factor.

▪ Phase estimation based on minimum phase and phase continuity did not improve the quality above that obtained with the noisy phase.

▪ Introduction of shimmer did not improve speech quality.

▪ Introduction of peak-to-peak jitter of up to 6 % and spectral compensation helped in improving the quality.



Thank You



P. C. Pandey, S. K. Basha, “Enhancement of electrolaryngeal speech by spectral subtraction, spectral compensation, and introduction of jitter and shimmer”, Proc. 20th International Congress on Acoustics (ICA 2010), 23-27 August 2010, Sydney, Australia.

Abstract -- An electrolarynx, a verbal communication aid used by laryngectomy patients, is a vibrator held against the neck tissue to provide excitation to the vocal tract, as a substitute for that provided by the glottal vibrations. Although the user can set the vibration level and pitch, a dynamic control of level, voicing, and pitch during speech production is not feasible. In addition to this basic limitation, electrolaryngeal speech suffers from (i) presence of background noise caused by leakage of acoustic energy from the vibrator and vibrator-tissue interface, (ii) low-frequency spectral deficiency, and (iii) unnatural quality due to constant pitch and level. Background noise decreases the intelligibility, while the other two factors affect the speech quality. The present study involved investigations for improving the intelligibility and quality of electrolaryngeal speech. Pitch-synchronous application of generalized spectral subtraction was used for reducing the background noise. In order to track the variation in the spectrum of the leakage noise due to changes in vibrator orientation and pressure during speech production, a dynamic estimation of noise was carried out from a set of past frames. The estimated noise spectrum was subtracted from that of the noisy speech and the resulting magnitude spectrum was combined with the original phase spectrum. The speech signal was resynthesized using the overlap-add method, with analysis frames of two pitch periods and an overlap of one period. Estimation of phase spectrum by minimum-phase assumption and the assumption of phase continuity did not improve the speech quality. An introduction of jitter and shimmer in the speech signal, using LPC based analysis-synthesis, was investigated for improving its naturalness. The excitation for synthesis was an impulse train with the frequency equal to that of the vibrator, with random frequency and amplitude modulations for providing the jitter and the shimmer, respectively.
An FIR filtering of the excitation was used to match the long-term average spectral envelope of the processed electrolaryngeal speech to that of the normal speech. A peak-to-peak jitter of up to 6 % increased the naturalness, while introduction of shimmer decreased the quality.
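
The resynthesis framing described in the abstract (frames of two pitch periods with one period of overlap) can be sketched as below. This assumes NumPy, a constant pitch period in samples, and a periodic Hann analysis window; the window choice and the names `pitch_synchronous_ola` and `process_frame` are assumptions, since the abstract does not specify them.

```python
import numpy as np

def pitch_synchronous_ola(x, period, process_frame):
    """Overlap-add with frames of two pitch periods and a hop of one period.

    x: input signal, period: pitch period in samples,
    process_frame: callable applied to each windowed frame
    (a placeholder for the per-frame enhancement).
    """
    frame_len = 2 * period
    # Periodic Hann window: copies shifted by one period sum to 1.
    window = np.hanning(frame_len + 1)[:frame_len]
    y = np.zeros(len(x))
    for start in range(0, len(x) - frame_len + 1, period):
        frame = x[start:start + frame_len] * window
        y[start:start + frame_len] += process_frame(frame)
    return y
```

With an identity `process_frame`, the output equals the input everywhere except the first and last period, where only one frame contributes.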


REFERENCES

1 M. Weiss, G. Y. Komshian, and J. Heinz, “Acoustic and perceptual characteristics of speech produced with an electronic artificial larynx,” J. Acoust. Soc. Am., 65, 1298-1308 (1979).

2 H. L. Barney, F. E. Haworth, and H. K. Dunn, “An experimental transistorized artificial larynx,” Bell Systems Tech. J., 38, 1337-1356 (1959).

3 Q. Yingyong and B. Weinberg, “Low frequency energy deficit in electrolaryngeal speech,” J. Speech Hearing Res., 34, 1250-1256 (1991).

4 C. Y. Espy-Wilson, V. R. Chari, and C. B. Huang, “Enhancement of alaryngeal speech by adaptive filtering,” Proc. ICSLP, 764-771 (1996).

5 P. C. Pandey, S. M. Bhandarkar, G. K. Baccher, and P. K. Lehana, “Enhancement of alaryngeal speech using spectral subtraction,” Proc. 14th Int. Conf. Digital Signal Processing (DSP 2002), Santorini, Greece, 591-594 (2002).

6 P. C. Pandey, S. S. Pratapwar, and P. K. Lehana, “Enhancement of electrolaryngeal speech by reducing leakage noise using spectral subtraction with quantile based dynamic estimation of noise,” Proc. 18th Int. Congress Acoustics (ICA 2004), Kyoto, Japan, 3029-3032 (2004).

7 H. Liu, Q. Zhao, M. Wan, and S. Wang, “Application of spectral subtraction method on enhancement of electrolaryngeal speech,” J. Acoust. Soc. Am., 120, 398-406 (2006).

8 H. Liu, Q. Zhao, M. Wan, and S. Wang, “Enhancement of electrolarynx speech based on auditory masking,” IEEE Trans. Biomed. Eng., 53, 865-874 (2006).

9 P. Mitra and P. C. Pandey, “Enhancement of electrolaryngeal speech by spectral subtraction with minimum statistics-based noise estimation,” J. Acoust. Soc. Am., 120, 3039 (2006).

10 R. Kabir, A. Greenblatt, K. Panetta, and S. Agaian, “Enhancement of alaryngeal speech utilizing spectral subtraction and minimum statistics,” Proc. 7th International Conference on Machine Learning and Cybernetics, Kunming, 12-15 July (2008).

11 S. F. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Trans. Acoust., Speech, Signal Process., 27, 113-120 (1979).

12 M. Berouti, R. Schwartz, and J. Makhoul, “Enhancement of speech corrupted by acoustic noise,” Proc. IEEE ICASSP’79, 208-211 (1979).

13 V. Stahl, A. Fischer, and R. Bippus, “Quantile based noise estimation for spectral subtraction and Wiener filtering,” Proc. IEEE ICASSP’00, 3, 1875-1878 (2000).


14 R. Martin, “Spectral subtraction based on minimum statistics,” Proc. 7th European Signal Processing Conf. (EUSIPCO-94), Edinburgh, Scotland, 1182-1185 (1994).

15 T. F. Quatieri and A. V. Oppenheim, “Iterative techniques for minimum phase signal reconstruction from phase or magnitude,” IEEE Trans. Acoust., Speech, Signal Process., 29, 1187-1193 (1981).

16 B. Yegnanarayana and A. Dhayalan, “Noniterative techniques for minimum phase signal reconstruction from phase or magnitude,” Proc. IEEE ICASSP, 639-642 (1983).

17 A. V. Oppenheim and R. W. Schafer, Digital Signal Processing. (Prentice-Hall, Englewood Cliffs, New Jersey, 1975).

18 L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, (Prentice Hall, Englewood Cliffs, New Jersey, 1978).
