Workshop “Research Domains in Electronics & Telecomm Engineering”, 9th July 2014


Workshop “Research Domains in Electronics & Telecomm Engineering”, 9th July 2014, Vivekanand Education Society's Institute of Technology, Chembur, Mumbai 400 074

Single-channel Speech Enhancement for Real-time Applications

Prof P C Pandey
SPI Lab, EE Dept., IIT Bombay
http://www.ee.iitb.ac.in/~pcpandey
http://www.ee.iitb.ac.in/~spilab

IIT Bombay, 09 July 2014

Overview
1. Introduction
2. Speech Enhancement Using Spectral Subtraction
3. Investigations Using Offline Processing
4. Implementation for Real-time Processing
5. Summary & Conclusions
----------------------------------------
Main contributors: Santosh K. Waddi, Nitya Tiwari

References
S. K. Waddi, P. C. Pandey, and N. Tiwari, "Speech enhancement using spectral subtraction and cascaded-median based noise estimation for hearing impaired listeners," in Proc. Nat. Conf. Commun. 2013, New Delhi, India, doi: 10.1109/NCC.2013.6487989.
N. Tiwari, S. K. Waddi, and P. C. Pandey, "Speech enhancement and multi-band frequency compression for suppression of noise and intraspeech spectral masking in hearing aids," in Proc. 10th Annual Conference of the IEEE India Council (IEEE Indicon 2013), Mumbai, December 13-15, 2013, paper no. 524.

1. Introduction

Sensorineural hearing loss
- Increased hearing thresholds and high-frequency loss
- Decreased dynamic range & abnormal loudness growth
- Reduced speech perception due to increased spectral & temporal masking
- Decreased speech intelligibility in noisy environments

Signal processing in hearing aids
- Frequency selective amplification
- Automatic volume control
- Multichannel dynamic range compression (settable attack time, release time, and compression ratios)

Processing for reducing the effect of increased spectral masking in sensorineural loss
- Binaural dichotic presentation (Lunner et al. 1993, Kulkarni et al. 2012)
- Spectral contrast enhancement (Yang et al. 2003)
- Multiband frequency compression (Arai et al. 2004, Kulkarni et al. 2012)

Techniques for reducing the background noise
- Directional microphone
- Adaptive filtering (a second microphone needed for noise reference)
- Single-channel noise suppression using spectral subtraction (Boll 1979, Berouti et al. 1979, Martin 1994, Kamath & Loizou 2002, Loizou 2007, Lu & Loizou 2008, Paliwal et al. 2010)

Processing steps
- Dynamic estimation of non-stationary noise spectrum
  - During non-speech segments using voice activity detection
  - Continuously using statistical techniques
- Estimation of noise-free speech spectrum
  - Spectral noise subtraction
  - Multiplication by noise suppression function
- Speech resynthesis (using enhanced magnitude and noisy phase)

Objective
Real-time single-input speech enhancement for use in hearing aids and other sensory aids (cochlear prostheses, etc.) for hearing impaired listeners and in communication devices

Main challenges
- Noise estimation without voice activity detection, to avoid errors under low SNR & during long speech segments
- Low signal delay (algorithmic + computational) for real-time application
- Low computational complexity & memory requirement for implementation on a low-power processor

2. Speech Enhancement Using Spectral Subtraction
- Dynamic estimation of non-stationary noise spectrum
- Estimation of noise-free speech spectrum
- Speech resynthesis
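The three steps above can be sketched offline as follows. This is a minimal illustration, not the authors' implementation: it uses the generalized subtraction rule of Berouti et al. (1979) with subtraction factor alpha, floor factor beta and exponent gamma, and the frame length (300), FFT length (512) and 81-frame history follow values quoted later in these slides. The function names, the periodic Hann window, and the plain median over past frames (current frame included) are assumptions made for this sketch.

```python
import numpy as np

def spectral_subtract(mag, noise, alpha=2.0, beta=0.01, gamma=1.0):
    """Generalized spectral subtraction on magnitude spectra, bin by bin."""
    floor = beta ** (1.0 / gamma) * noise
    # Clamp at zero only to keep the root real; the floored branch covers that region.
    diff = np.maximum(mag ** gamma - alpha * noise ** gamma, 0.0) ** (1.0 / gamma)
    return np.where(mag < (alpha + beta) ** (1.0 / gamma) * noise, floor, diff)

def enhance(x, frame_len=300, fft_len=512, n_past=81,
            alpha=2.0, beta=0.01, gamma=1.0):
    """Analysis-modification-synthesis with 50% overlap and noisy phase."""
    x = np.asarray(x, dtype=float)
    hop = frame_len // 2
    n = np.arange(frame_len)
    # Assumed window: periodic Hann, which sums to 1 at 50% overlap.
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / frame_len)
    y = np.zeros(len(x))
    history = []                      # past magnitude spectra
    n_frames = (len(x) - frame_len) // hop + 1
    for i in range(n_frames):
        start = i * hop
        X = np.fft.rfft(x[start:start + frame_len] * window, fft_len)
        mag = np.abs(X)
        # Step 1: noise estimate as the median over the last n_past frames.
        history.append(mag)
        history = history[-n_past:]
        noise = np.median(history, axis=0)
        # Step 2: estimate the noise-free magnitude spectrum.
        out_mag = spectral_subtract(mag, noise, alpha, beta, gamma)
        # Step 3: resynthesis with noisy phase, without computing the phase.
        Y = out_mag * X / np.maximum(mag, 1e-12)
        y[start:start + frame_len] += np.fft.irfft(Y, fft_len)[:frame_len]
    return y
```

With alpha = 0 and beta = 0 the sketch reduces to plain analysis-synthesis, which is a convenient check that the overlap-add reconstructs the input.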

Generalized spectral subtraction (Berouti et al. 1979)

Power subtraction
- Windowed speech spectrum = $X_n(k)$
- Estimated noise magnitude spectrum = $D_n(k)$
- Estimated speech spectrum:
  $Y_n(k) = [\,|X_n(k)|^2 - (D_n(k))^2\,]^{0.5} \, e^{j \angle X_n(k)}$

Generalized subtraction
  $|Y_n(k)| = \beta^{1/\gamma} D_n(k)$, if $|X_n(k)| < (\alpha + \beta)^{1/\gamma} D_n(k)$
  $|Y_n(k)| = [\,|X_n(k)|^{\gamma} - \alpha (D_n(k))^{\gamma}\,]^{1/\gamma}$, otherwise
- $\gamma$ = exponent factor (2: power subtraction, 1: magnitude subtraction)
- $\alpha$ = over-subtraction factor (for limiting the effect of short-term variations in noise spectrum)
- $\beta$ = floor factor to mask the musical noise due to over-subtraction

Re-synthesis with noisy phase without explicit phase calculation
  $Y_n(k) = |Y_n(k)| \, X_n(k) / |X_n(k)|$

Multi-band spectral subtraction (Kamath & Loizou 2002)
- Noise does not affect the spectrum uniformly
- Speech spectrum divided into B non-overlapping bands; spectral subtraction is performed independently in each band
- Test material: 10 sentences from HINT database, noise: speech-shaped noise, noisy speech: 0 dB and 5 dB SNR
- Evaluation: Itakura-Saito (IS) distance as an objective measure
- Improvement over the conventional power spectral subtraction, with very little trace of musical noise

Geometric approach to spectral subtraction (Lu & Loizou 2008)
- Derived without assuming the cross-terms to be zero
- Test material: NOIZEUS database, noise: babble, street, car, white, noisy speech: 0 dB, 5 dB and 15 dB SNR
- Evaluation: mean square error (MSE), PESQ, log likelihood ratio
- Cross terms can be ignored at very low and high SNRs but not near 0 dB
- Processed output: no audible musical noise, smooth and pleasant residual noise
- Performed significantly better than power spectral subtraction in all conditions

Noise estimation

Minimal-tracking algorithms
- Minimum statistics (Martin 1994): tracks the noise as minima of past frames
- Minimum tracking (Doblinger 1995): smooths the noisy speech power spectra in each frequency bin using non-linear smoothing

Time-recursive averaging algorithms
- SNR-dependent recursive averaging (Lin et al. 2003)
  - Noisy speech decomposed into sub-band signals; noisy signal power is smoothed and noise is estimated adaptively
  - Smoothing parameter: function of estimated SNR
- Weighted spectral averaging (Hirsch & Ehrlicher 1995)
  - First-order recursive weighted average of past spectral magnitude values over 400 ms which are below an adaptive threshold

Improved minima-controlled recursive averaging (Cohen 2003)
- Two iterations of smoothing and tracking
  - First iteration: rough voice activity detection is provided in each frequency band
  - Second iteration: smoothing excludes strong speech components, making the minimum tracking robust during speech activity
- Smoothing parameter: frequency-dependent & dynamically adjusted by signal presence probability
- Lower estimation error than minimum statistics
- Method is combined with log-spectral amplitude estimator
- Higher segmental SNR improvement than minimum statistics

Histogram-based technique (Hirsch & Ehrlicher 1995)
- Histogram: noisy speech over 400 ms
- Noise estimate: maximum of the distribution in each sub-band
- To avoid spikes, estimated values are smoothed along the time axis
- Objective evaluation: relative error. Low relative error compared to the weighted spectral average method (Hirsch & Ehrlicher 1995)

Cascaded-median based estimation (Basha & Pandey 2012)
- Moving median approximated by a p-point q-stage cascaded median, with a saving in memory & computation for real-time implementation
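A p-point q-stage cascaded median can be sketched as below. This is one plausible reading, not the published implementation: each stage is assumed to collect p values per frequency bin, pass their median to the next stage, and start over, so the final estimate is refreshed once every p^q frames. The class name and the `update` interface are hypothetical.

```python
import numpy as np

class CascadedMedian:
    """p-point q-stage cascaded median over magnitude spectra (sketch)."""

    def __init__(self, n_bins, p=3, q=4):
        self.p, self.q = p, q
        # Storage: p values per stage, i.e. p*q values per frequency bin.
        self.buf = np.zeros((q, p, n_bins))
        self.count = [0] * q
        self.estimate = None          # no estimate until p**q frames are seen

    def update(self, mag):
        """Feed one magnitude spectrum; return the current noise estimate."""
        value = np.asarray(mag, dtype=float)
        for stage in range(self.q):
            self.buf[stage, self.count[stage]] = value
            self.count[stage] += 1
            if self.count[stage] < self.p:
                break                 # this stage is not full yet
            self.count[stage] = 0
            value = np.median(self.buf[stage], axis=0)
        else:
            # All q stages completed: value is the new cascaded median.
            self.estimate = value
        return self.estimate
```

With p = 3 and q = 4 the estimator spans 81 frames while holding 12 values per frequency bin. For a monotonically increasing input 0, 1, ..., 80 it returns 40, the same as the true 81-point median; for general inputs the two can differ.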

Quantile-based noise estimation (Stahl et al. 2000)
- Speech signal energy: low in most of the frames, high in only 10–20 % of the frames
- Noise estimation: selecting a certain quantile value from previous frames of the noisy speech spectrum
- Quantile selection can be frequency-dependent and SNR-dependent
- Median-based noise estimation is robust, but is difficult to use for real-time applications

Project objective
- Implementation of generalized spectral subtraction along with cascaded-median based noise estimation for real-time processing using a low-power DSP
- Selection of optimal set of processing steps and parameters, using offline processing
- Implementation on a DSP board with a 16-bit fixed-point processor & evaluation

Comparison of median-based noise estimation (MBNE) and cascaded-median based noise estimation (CMBNE)
- Condition for reducing sorting operations: low p (p = 3 also simplifies the code for sorting operations)
- Condition for reducing storage: q ≈ ln(M)

Median type                                  Storage per freq. bin   No. of sortings per frame per freq. bin
M-point moving median                        2M                      (M-1)/2
p-point q-stage cascaded median (M = p^q)    pq                      p(p-1)/2

3. Investigations Using Offline Processing

Test material
- Speech material 1: recording with three isolated vowels, a Hindi sentence, and an English sentence ("-/a/-/i/-/u/" "aayiye aap kaa naam kyaa hai?" "Where were you a year ago?") from a male speaker. Referred to as "VHSES"
- Speech material 2: six sentences from NOIZEUS database of one male speaker
- Noise: white, pink, street, babble, car, and train noises
- SNR: , 18, 15, 12, 9, 6, 3, 0, -3, -6 dB

Evaluation methods
- Informal listening
- Objective evaluation using PESQ measure (0–4.5)
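Noisy test signals at a chosen SNR are commonly produced by scaling the noise before adding it to the clean speech. The sketch below shows that scaling; it is an assumption about the procedure rather than something stated in these slides. It defines SNR as the ratio of average powers over the whole recording, and the function name is hypothetical.

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Return speech + scaled noise at the requested global SNR (in dB).

    Assumes the noise recording is at least as long as the speech.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[:len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Gain that makes speech_power / (gain**2 * noise_power) equal the target.
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise
```

A global power ratio includes silent segments in the speech power, so the SNR during active speech is higher than the nominal value.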

Investigations (fs = 10 kHz)
- Overlap of 50% & 75%: indistinguishable outputs
- γ = 1 (magnitude subtraction): higher tolerance to variation in α, β values

Investigation on noise estimation

[Figure: scatter plots for magnitude spectra, speech material: VHSES. Panels: (a) clean speech signal, (b) white noise, (c) noisy speech: white noise, 3 dB SNR, (d) noisy speech: white noise, 0 dB SNR]

[Figure: scatter plots for magnitude spectra, speech material: NOIZEUS. Panels: (a) clean speech signal, (b) white noise, (c) noisy speech: white noise, 3 dB SNR, (d) noisy speech: white noise, 0 dB SNR]

[Figure: mean, median and minimum of magnitude spectra of clean speech signal, noise and noisy speech (white, SNR: 0 dB), speech material: VHSES. Panels: (a) mean, (b) median, (c) minimum]

Noisy signal median tracks the noise median & noisy signal minimum tracks the noise minimum at almost all the frequencies

[Figure: mean, median and minimum of magnitude spectra of clean speech signal, noise and noisy speech (white, SNR: 0 dB), speech material: NOIZEUS. Panels: (a) mean, (b) median, (c) minimum]

Objective evaluation of the accuracy of noise estimation

[Figure: relative RMS error (dB). Panels: (a) speech material: VHSES, (b) speech material: NOIZEUS]

- Relative RMS error (dB) decreases as SNR decreases

Effect of window length and noise estimation duration
- Processing: magnitude spectral subtraction with median-based noise estimation
- High PESQ score: noise estimation across 81 past frames & 20–40 ms window length
- 30 ms window length was chosen (81 frames: approximately 1.2 s duration)

[Figure panels: (a) speech: VHSES, noise: white, SNR: 0 dB; (b) speech: NOIZEUS, noise: white, SNR: 0 dB]

Comparison of enhanced speech using MBNE and CMBNE
- MBNE requires large memory & is computation intensive
- 3-point 4-stage cascaded median significantly reduces memory requirement & computations
  - Reduction in storage requirement per freq. bin: from 162 to 12 samples
  - Reduction in number of sorting operations per frame per freq. bin: from 40 to 3
- Informal listening: perceptually the same
- Objective evaluation: almost the same in most cases, with a maximum difference of 0.06

PESQ score of the enhanced speech. Speech: NOIZEUS, SNR: 0 dB

Noise type   Unproc.   Proc. using MBNE   Proc. using CMBNE
white        1.55      1.84               1.84
babble       1.75      1.80               1.81
street       1.83      2.08               2.04
pink         1.60      2.00               1.98
train        2.05      2.40               2.35
car          1.72      1.95               1.89

Effect of spectral subtraction parameters
- Processing: magnitude spectral subtraction using 3-point 4-stage CMBNE
- Analysis-synthesis: 30 ms window length & 50% overlap
- Spectral floor factor β: 0.01 appropriate for all the cases
- Subtraction factor α: in 2–2.5 for VHSES and in 1.2–1.4 for NOIZEUS speech material

Phase estimation for spectral subtraction
- Processing: magnitude spectral subtraction using 3-point 4-stage CMBNE
- Phase: zero, cepstrum (1978), Quatieri & Oppenheim 1981, Nawab et al. 1983
- Analysis-synthesis: 50% overlap rect. win., 75% overlap rect. win., Griffin-Lim method (Griffin & Lim 1984)
- Informal listening: no improvement from using estimated phase over using noisy phase
- Objective evaluation: signal estimated using noisy phase has higher PESQ score

Comparison of proposed method with other methods
- Proposed method: magnitude spectral subtraction with cascaded-median based noise estimation
- Analysis-synthesis: 30 ms window and 50% overlap
- Comparison: spectral-subtractive, statistical-model based, and subspace algorithms (implementations available on CD accompanying Loizou 2007: specsub, mband, ga, wiener_iter, wiener_as, wiener_wt, mt_mask, audnoise, mmse, logmmse, logmmse_spu, stsa_weuclid, stsa_wcosh, stsa_mis, klt, pklt)

Comparison of PESQ scores for VHSES speech material, 0 dB SNR

Enhancement method   white   babble   street   pink   train   car
unproc.              1.54    1.73     1.78     1.59   2.00    1.67
specsub              1.78    1.74     1.85     2.01   2.33    1.91
mband                1.43    1.88     2.06     1.72   2.61    2.09
ga                   1.82    2.05     2.42     2.11   2.67    2.26
mmse                 1.95    1.86     1.99     2.22   2.69    2.24
klt                  2.51    1.89     1.84     2.27   2.32    1.91
Proposed method      2.14    1.93     2.16     2.31   2.63    2.15

Comparison of PESQ scores for NOIZEUS speech material, 0 dB SNR

Enhancement method   white   babble   street   pink   train   car
unproc.              1.55    1.75     1.83     1.60   2.05    1.72
specsub              1.63    1.58     1.81     1.78   2.04    1.77
mband                1.59    1.84     2.02     1.73   2.30    1.91
ga                   1.61    1.82     2.07     1.86   2.35    1.96
mmse                 1.82    1.78     2.02     2.01   2.43    1.99
klt                  1.89    1.71     1.87     1.97   2.07    1.78
Proposed method      1.83    1.81     2.03     1.95   2.33    1.90

Observation: the proposed method is comparable to the best of the compared methods

Discussion
- FFT length N = 512 & higher: indistinguishable outputs
- Processing: magnitude spectral subtraction with 3-point 4-stage CMBNE, analysis-synthesis with 30 ms window length & 50% overlap
- Informal listening: significant enhancement for all noises at different SNRs
- Spectral subtraction parameters: β = 0.01 appropriate for all the cases, α in 2–2.5 for VHSES and in 1.2–1.4 for NOIZEUS speech material
- SNR advantage: 4–13 dB for VHSES & 2–7 dB for NOIZEUS speech material

             Material: VHSES              Material: NOIZEUS
Noise type   SNR advantage   Optimal α    SNR advantage   Optimal α
white        13 dB           2.0          7 dB            1.4
babble       4 dB            2.0          2 dB            1.2
street       5 dB            2.5          4 dB            1.4
pink         11 dB           2.0          7 dB            1.4
train        7 dB            2.0          5 dB            1.4
car          6 dB            2.0          5 dB            1.4

4. Implementation for Real-time Processing

16-bit fixed-point DSP: TI/TMS320C5515
- 16 MB memory space: 320 KB on-chip RAM with 64 KB dual-access RAM, 128 KB on-chip ROM
- Three 32-bit programmable timers, 4 DMA controllers each with 4 channels
- FFT hardware accelerator (8 to 1024-point FFT)
- Max. clock speed: 120 MHz

DSP board: eZdsp
- 4 MB on-board NOR flash for user program
- Codec TLV320AIC3204: stereo ADC & DAC, 16/20/24/32-bit quantization, 8–192 kHz sampling
- Development environment for C: TI's 'CCStudio, ver. 4.0'

Implementation
- One codec channel (ADC and DAC) with 16-bit quantization
- Sampling frequency: 10 kHz
- Window length of 30 ms (L = 300) with 50% overlap, FFT length N = 512
- Storage of input samples, spectral values, processed samples: 16-bit real & 16-bit imaginary parts

Data transfers and buffering operations (S = L/2)

DMA cyclic buffers (each block with S samples)
- 3-block input buffer
- 2-block output buffer

Pointers (incremented cyclically on DMA interrupt)
- current input block
- just-filled input block
- current output block
- write-to output block

Signal delay
- Algorithmic: 1 frame (30 ms)
- Computational: ≤ frame shift (15 ms)
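The input side of the buffering scheme above can be simulated as follows. This is an illustrative sketch, not the DSP code: it assumes that each analysis frame of L = 2S samples is formed from the just-filled block and the block before it while DMA fills the third block, and that the pointers advance modulo 3 on every (simulated) DMA interrupt. The function name is hypothetical.

```python
import numpy as np

def frames_from_cyclic_buffer(samples, block_len):
    """Simulate a 3-block input cyclic buffer; return the L = 2S sample frames."""
    samples = np.asarray(samples, dtype=float)
    buffer = np.zeros(3 * block_len)
    current_input = 0                 # block currently being filled by "DMA"
    frames = []
    for b in range(len(samples) // block_len):
        lo = current_input * block_len
        buffer[lo:lo + block_len] = samples[b * block_len:(b + 1) * block_len]
        # "DMA interrupt": advance the pointers cyclically.
        just_filled = current_input
        current_input = (current_input + 1) % 3
        previous = (just_filled - 1) % 3
        if b == 0:
            continue                  # only one block available so far
        first = buffer[previous * block_len:(previous + 1) * block_len]
        second = buffer[just_filled * block_len:(just_filled + 1) * block_len]
        frames.append(np.concatenate([first, second]))
    return frames
```

Each returned frame overlaps the previous one by S samples, which gives the 50% overlap. The 2-block output buffer can be handled the same way, with the current output block and the write-to block swapping on each interrupt.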

Results

[Figure: PESQ score vs SNR for noisy and enhanced speech using offline and real-time processing. Panels: (a) speech: VHSES, (b) speech: NOIZEUS]

- Offline proc. improvement: 0.57–0.80 for VHSES & 0.28–0.44 for NOIZEUS
- Real-time proc. improvement: 0.39–0.71 for VHSES & 0.22–0.32 for NOIZEUS

Example of processing: "-/a/-/i/-/u/" "aayiye aap kaa naam kyaa hai?" "Where were you a year ago?", with white noise at 3 dB SNR

[Figure panels: (a) clean speech, (b) noisy speech, (c) offline processed, (d) real-time processed]

Comparison of enhanced speech between offline and real-time processing. Speech: VHSES, SNR: 0 dB

Noise type   Unproc.   Offline proc.   Real-time proc.
white        1.54      2.14            2.10
babble       1.73      1.93            1.87
street       1.78      2.16            1.92
pink         1.59      2.31            2.20
train        2.00      2.63            2.45
car          1.67      2.15            2.09

- Real-time processing tested using white, babble, car, pink, train noises: real-time processed output perceptually similar to the offline processed output
- Signal delay = 48 ms
- Lowest clock for satisfactory operation = 16.4 MHz
- Processing capacity used: about 1/7 of the capacity with highest clock (120 MHz)

5. Summary & Conclusions
- Investigation & implementation of spectral subtraction for real-time operation: magnitude spectrum subtraction and resynthesis using noisy phase, along with cascaded-median based dynamic noise estimation for reducing computation and memory requirement
- Enhancement of speech with different types of additive stationary and non-stationary noise: SNR advantage of 4–13 dB for VHSES & 2–7 dB for NOIZEUS
- Implementation for real-time operation using 16-bit fixed-point processor TI/TMS320C5515: implementation with 10 kHz sampling using 1/7 of processing capacity, signal delay = 48 ms

Further work
- Frequency & a posteriori SNR-dependent subtraction & spectral floor factors
- Combination of speech enhancement technique with other processing techniques in the sensory aids
- Implementation using other processors
- Subjective evaluation of intelligibility and quality of enhanced speech

Thank You

Abstract
Sensorineural loss is generally associated with increased spectral masking due to widened auditory filters, and listeners with this kind of hearing impairment often experience great difficulty when the speech is contaminated by noise. This thesis presents investigations for real-time enhancement of noisy speech using spectral subtraction for suppressing the external noise. Investigation using offline processing for enhancing the noisy speech with different types of noise and SNR values is carried out to select the optimal set of steps and parameters for real-time processing. PESQ score is used for objective comparison of quality of the enhanced speech.
Results show that median-based noise estimation is effective in estimating noise from noisy speech without a voice activity detector, for a wide variety of stationary and non-stationary noises and a range of SNR values, and that a cascaded median can be used as an approximation to the median for significantly reducing the computation and memory requirement, without adversely affecting the noise estimation. Speech enhancement using magnitude spectrum subtraction with 3-point 4-stage cascaded median for noise estimation and resynthesis using noisy phase resulted in improvements in PESQ scores in the range 0.28–0.44 for speech material from NOIZEUS database with added white noise. Resynthesis using phase estimated from the enhanced magnitude spectrum did not result in any further improvement in the scores. The processing technique is implemented and tested for satisfactory operation, with sampling frequency of 10 kHz and 30 ms analysis window with 50% overlap, using a DSP board based on the 16-bit fixed-point DSP processor TMS320C5515 with on-chip FFT hardware. The implementation uses data transfer and buffering operations devised for an efficient realization of analysis-synthesis, and codec and DMA for acquisition of the input signal and outputting of the processed output signal. The real-time operation is achieved with signal delay of approximately 48 ms and using about one-seventh of the computing capacity of the processor.

Bibliography
[1] H. Levitt, J. M. Pickett, and R. A. Houde, Eds., Sensory Aids for the Hearing Impaired. New York: IEEE Press, 1980, pp. 3–10.
[2] J. M. Pickett, The Acoustics of Speech Communication: Fundamentals, Speech Perception Theory, and Technology. Boston, Mass.: Allyn Bacon, 1999, pp. 289–323.
[3] H. Dillon, Hearing Aids. New York: Thieme Medical, 2001.
[4] B. C. J. Moore, An Introduction to the Psychology of Hearing. London, UK: Academic, 1997, pp. 66–107.
[5] T. Lunner, S. Arlinger, and J. Hellgren, "8-channel digital filter bank for hearing aid use: preliminary results in monaural, diotic, and dichotic modes," Scand. Audiol. Suppl., vol. 38, pp. 75–81, 1993.
[6] P. N. Kulkarni, P. C. Pandey, and D. S. Jangamashetti, "Binaural dichotic presentation to reduce the effects of spectral masking in moderate bilateral sensorineural hearing loss," Int. J. Audiol., vol. 51, no. 4, pp. 334–344, 2012.
[7] J. Yang, F. Luo, and A. Nehorai, "Spectral contrast enhancement: Algorithms and comparisons," Speech Commun., vol. 39, no. 1–2, pp. 33–46, 2003.
[8] T. Arai, K. Yasu, and N. Hodoshima, "Effective speech processing for various impaired listeners," in Proc. 18th Int. Cong. Acoust. (ICA 2004), Kyoto, Japan, 2004, pp. 1389–1392.
[9] P. N. Kulkarni, P. C. Pandey, and D. S. Jangamashetti, "Multi-band frequency compression for improving speech perception by listeners with moderate sensorineural hearing loss," Speech Commun., vol. 54, no. 3, pp. 341–350, 2012.
[10] P. C. Loizou, Speech Enhancement: Theory and Practice. New York: CRC, 2007.
[11] S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust., Speech, Signal Process., vol. 27, no. 2, pp. 113–120, 1979.
[12] M. Berouti, R. Schwartz, and J. Makhoul, "Enhancement of speech corrupted by acoustic noise," in Proc. IEEE ICASSP 1979, Washington, DC, pp. 208–211.
[13] S. Kamath and P. Loizou, "A multi-band spectral subtraction method for enhancing speech corrupted by colored noise," in Proc. IEEE ICASSP, 2002, Orlando, Florida, vol. 4, pp. IV-4164.
[14] Y. Lu and P. C. Loizou, "A geometric approach to spectral subtraction," Speech Commun., vol. 50, no. 6, pp. 453–466, 2008.
[15] K. Paliwal, K. Wojcicki, and B. Schwerin, "Single-channel speech enhancement using spectral subtraction in the short-time modulation domain," Speech Commun., vol. 52, no. 5, pp. 450–475, 2010.
[16] R. Martin, "Spectral subtraction based on minimum statistics," in Proc. Eur. Signal Process. Conf., 1994, pp. 1182–1185.
[17] I. Cohen, "Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging," IEEE Trans. Speech Audio Process., vol. 11, no. 5, pp. 466–475, 2003.
[18] H. Hirsch and C. Ehrlicher, "Noise estimation techniques for robust speech recognition," in Proc. IEEE ICASSP, 1995, Detroit, MI, pp. 153–156.
[19] V. Stahl, A. Fischer, and R. Bippus, "Quantile based noise estimation for spectral subtraction and Wiener filtering," in Proc. IEEE ICASSP, 2000, Istanbul, Turkey, pp. 1875–1878.
[20] G. Doblinger, "Computationally efficient speech enhancement by spectral minima tracking in subbands," in Proc. 4th Eur. Conf. Speech Commun. and Technology (EUROSPEECH'95), Madrid, Spain, 1995, pp. 1513–1516.
[21] L. Lin, W. H. Holmes, and E. Ambikairajah, "Adaptive noise estimation algorithm for speech enhancement," Electronics Letters, vol. 39, no. 9, pp. 754–755, 2003.
[22] C. Ris and S. Dupont, "Assessing local noise level estimation methods: application to noise robust ASR," Speech Commun., vol. 34, no. 1–2, pp. 141–158, 2001.
[23] S. K. Basha and P. C. Pandey, "Real-time enhancement of electrolaryngeal speech by spectral subtraction," in Proc. Nat. Conf. on Commun. 2012 (NCC 2012), Kharagpur, India, 2012, pp. 516–520.
[24] S. K. Waddi, P. C. Pandey, and N. Tiwari, "Speech enhancement using spectral subtraction and cascaded-median based noise estimation for hearing impaired listeners," in Proc. Nat. Conf. Commun. (NCC 2013), Delhi, India, 2013, paper no. 1569696063.
[25] ITU, "Perceptual evaluation of speech quality (PESQ): an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs," ITU-T Rec. P.862, 2001.
[26] Y. Hu and P. C. Loizou, "Subjective evaluation and comparison of speech enhancement algorithms," Speech Communication, vol. 49, pp. 588–601, 2007.
[27] T. F. Quatieri and A. V. Oppenheim, "Iterative techniques for minimum phase signal reconstruction from phase or magnitude," IEEE Trans. Acoust., Speech, Signal Process., vol. 29, no. 6, pp. 1187–1193, 1981.
[28] S. H. Nawab, T. F. Quatieri, and J. S. Lim, "Signal reconstruction from short-time Fourier transform magnitude," IEEE Trans. Acoust., Speech, Signal Process., vol. 31, no. 4, pp. 986–998, 1983.
[29] L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals. Englewood Cliffs, New Jersey: Prentice Hall, 1978, pp. 356–362.
[30] D. W. Griffin and J. S. Lim, "Signal estimation from modified short-time Fourier transform," IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 2, pp. 236–243, 1984.
[31] Spectrum Digital, Inc. (2010) TMS320C5515 eZdsp USB Stick Technical Reference. [Online]. Available: support.spectrumdigital.com/boards/usbstk5515/reva/files/usbstk5515_TechRef_RevA.pdf
[32] Texas Instruments, Inc. (2011) TMS320C5515 Fixed-Point Digital Signal Processor. [Online]. Available: focus.ti.com/lit/ds/symlink/tms320c5515.pdf
[33] Texas Instruments, Inc. (2008) TLV320AIC3204 Ultra Low Power Stereo Audio Codec. [Online]. Available: focus.ti.com/lit/ds/symlink/tlv320aic3204.pdf
[34] S. K. Waddi, "Real-time enhancement of noisy speech using spectral subtraction," M.Tech thesis, Electrical Engineering, Indian Institute of Technology Bombay, 2013.
[35] S. K. Waddi, P. C. Pandey, and N. Tiwari, "Speech enhancement using spectral subtraction and cascaded-median based noise estimation for hearing impaired listeners," in Proc. Nat. Conf. Commun. 2013, New Delhi, India, doi: 10.1109/NCC.2013.6487989.
[36] N. Tiwari, S. K. Waddi, and P. C. Pandey, "Speech enhancement and multi-band frequency compression for suppression of noise and intraspeech spectral masking in hearing aids," in Proc. 10th Annual Conference of the IEEE India Council (IEEE Indicon 2013), Mumbai, December 13-15, 2013, paper no. 524.

[Figure: block diagram of spectral subtraction. Labels: Input Speech x(n), Windowing, FFT, Complex Spectrum Xn(k), magnitude |Xn(k)|, Noise Estimation, Dn(k), Spectral Subtraction, |Yn(k)|, Yn(k), IFFT, Overlap-Add, Enhanced Speech y(n)]

[Figure: p-point q-stage cascaded median. Labels: |Xn(k)|, Mag.Sp.-1, Mag.Sp.-2, ..., Mag.Sp.-p, Median 1, Med1-1, Med1-2, ..., Med1-P, Median 2, ..., MedQ-1-1, MedQ-1-2, ..., MedQ-1-p, Median q, Dn(k)]

[Figure: data transfers and buffering operations. Labels: Input Samples, DMA Input Cyclic Buffer (3S samples), Input Block, Just-Filled Block, Input Data Buffer (L samples, N words), FFT, IFFT, Output Data Buffer, DMA Output Cyclic Buffer (2S samples), Output Block, Write-to Block, Output Samples]
