Research Article Noise Estimation and Suppression Using ...downloads.hindawi.com/journals/js/2016/5352437.pdf · speech enhancement to improve the speech intelligibility or quality

Research ArticleNoise Estimation and Suppression Using NonlinearFunction with A Priori Speech Absence Probability inSpeech Enhancement

Soojeong Lee1 and Gangseong Lee2

1School of Electronic Engineering Hanyang University 222 Wangsimni-ro Seongdong Seoul 133-791 Republic of Korea2Kwangwoon University 20 Kwangwoon-ro Nowon-gu Seoul Republic of Korea

Correspondence should be addressed to Soojeong Lee leesoo86hanyangackr

Received 7 December 2015 Accepted 6 April 2016

Academic Editor Marco Anisetti

Copyright copy 2016 S Lee and G Lee This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

This paper proposes a noise-biased compensation of minimum statistics (MS) method using a nonlinear function and a priorispeech absence probability (SAP) for speech enhancement in highly nonstationary noisy environments The MS method is a well-known technique for noise power estimation in nonstationary noisy environments however it tends to bias noise estimation belowthat of the true noise level The proposed method is combined with an adaptive parameter based on a sigmoid function and apriori SAP for residual noise reduction Additionally our method uses an autoparameter to control the trade-off between speechdistortion and residual noise We evaluate the estimation of noise power in highly nonstationary and varying noise environmentsThe improvement can be confirmed in terms of signal-to-noise ratio (SNR) and the Itakura-Saito Distortion Measure (ISDM)

1 Introduction

Noise estimation algorithms are essential components ofmany modern mobile communication speech recogni-tion and human computer interaction systems for speechenhancement [1 2] It is generally included as a part of thespeech enhancement to improve the speech intelligibilityor quality of a signal corrupted by noise However it isdifficult to reduce noise without distorting speech becausethe performance of any noise estimation algorithm usuallydepends on a trade-off between speech distortion and noisereduction

Current single microphone speech enhancement meth-ods belong to two groups namely time domain methodssuch as the subspacemethod and frequency domainmethodssuch as the spectral subtraction (SS) [3] and minimummeansquare error (MMSE) estimator [4] Both methods havetheir own advantages and drawbacks Subspace methodsprovide a mechanism to control the trade-off between speechdistortion and residual noise but with the cost of a heavycomputational load [5] Frequency domain methods on theother hand usually consume less computational resources

but do not have a theoretically establishedmechanism to con-trol trade-off between speech distortion and residual noiseAmong them spectral subtraction (SS) is computationallyefficient and has a simple mechanism to control trade-offbetween speech distortion and residual noise but suffersfrom a notorious artifact known as musical noise [6] Thesespectral noise reduction algorithms require an estimate ofthe noise spectrum which can be obtained from speechabsence frames indicated by a voice activity detector (VAD)or alternatively with the minimum statistic (MS) methods[7] that is by tracking spectral minima in each frequencyband

Several recent studies have proposed noise estimationschemes for unknown noise signals [1ndash14] The minimumstatistics (MS) noise estimation scheme [7] is one that workswell in nonstationary noisy environments Martin proposedan algorithm for noise estimation based on minimum statis-tics [7]The ability to track varying noise levels is a prominentfeature of the minimum statistics (MS) algorithm [7] Thenoise estimate is obtained as theminima values of a smoothedpower estimate of the noisy signal multiplied by a factor thatcompensates the bias However the MS algorithm still has

Hindawi Publishing CorporationJournal of SensorsVolume 2016 Article ID 5352437 7 pageshttpdxdoiorg10115520165352437

2 Journal of Sensors

a tendency to bias the noise estimate below that of the truenoise level regardless of the number of frames [8]Thereforeit leaves residual noise in the frames of speech absence andin the frames of variation of noise characteristic in highlynonstationary noisy environments

To solve this problem we propose a combined adaptivefactor based on a sigmoid function and a priori speechabsence probability (SAP) estimation [9] for biased com-pensation Specifically we apply the adaptive factor 120575 asa posteriori SNR When the a posteriori SNR decreases 120575

increases but is constrained to take a value between 120575minand 120575max Thus the proposed adaptive biased compensationfactor 120575 approaches 120575max at times when the SNR is low Inaddition when the a priori SAP equals unity the adaptivebiased compensation factor 120575 also approaches 120575max in eachfrequency bin and vice versa Furthermore our method usesanother adaptive parameter to control the trade-off betweenspeech distortion and residual noise for suppressing theestimated noise in highly nonstationary and various noisyenvironments The autocontrol parameter is controlled by aposteriori signal-to-noise ratio (SNR) as the variation of thenoise level

We evaluate the performance of the proposed algorithmfor nonstationary noise and various noise environmentsTheimprovement can be confirmed in the segmental SNR andthe Itakura-SaitoDistortionMeasure (ISDM) [15]The resultsshow that our proposed method is superior to the conven-tional MS approach The structure of the paper is as followsSection 2 reviews theminimum statistics and the a priori SAPestimation algorithms Section 3 addresses noise estimationand suppression using a linear and a nonlinear functionIn Section 4 we express the combined sigmoid functionusing the a posteriori SNR and a priori SAP estimation forrobust biased compensation In Section 5 we discuss theexperimental results

2 Minimum Statistics (MS) and SpeechAbsence Probability (SAP)

21 Review of MS The noisy speech signal 119910(119899) can berepresented as 119910(119899) = 119909(119899) + 119889(119899) where 119909(119899) is the cleanspeech signal and 119889(119899) is the noise signal Dividing thesignal into overlapping frames using a window function andapplying the short-time Fourier transform (STFT) [16] toeach frame yield the time-frequency representation 119884(119896 119897) =

119883(119896 119897) + 119863(119896 119897) where 119896 = 1 2 119870 is the frequency binindex and 119897 = 1 2 119871 is the time frame index It can beshown that

1003816100381610038161003816119884119896 (119897)10038161003816100381610038162asymp

1003816100381610038161003816119883119896 (119897)10038161003816100381610038162+

1003816100381610038161003816119863119896 (119897)10038161003816100381610038162 (1)

where |119884119896(119897)|2 |119883119896(119897)|

2 and |119863119896(119897)|2 are the power spectrum

of the noisy speech signal clean speech and noise respec-tively

The MS algorithm relies on the fact that the noisy powerspectrum often becomes equal to the noise power spectrumduring periods of speech pauses [7 13 17] Therefore an esti-mate of the noise power spectrum is obtained by separatelytracking the minimum of the noisy speech in each frequency

bin In addition because the minimum is biased towardslower values an unbiased estimate may be obtained throughmultiplication by a bias factor which is derived from thestatistics of the local minimum To search for the minimumwe take the first-order recursive of the noisy power spectrum

119878119896 (119897) = 120572119878119896 (119897 minus 1) + (1 minus 120572)1003816100381610038161003816119884119896 (119897)

10038161003816100381610038162 (2)

where 119878119896(119897) is the smoothed periodogram and 120572 is thesmoothing factor The smoothing factor used in (2) must beclose to 1 to keep the variance of the minimum tracking assmall as possible Hence time and frequency dependenceare required to determine if speech is present or absentThe smoothing factor is therefore derived by minimizing themean square error between 119878119896(119897) and 120590

2

119889119896(119897)

119864 (119878119896 (119897) minus 1205902

119889119896(119897))2

| 119878119896 (119897 minus 1) (3)

where 1205902

119889119896(119897) is the noise variance

119878119896 (119897) = 120572119896 (119897) 119878119896 (119897 minus 1) + (1 minus 120572119896 (119897))1003816100381610038161003816119884119896 (119897)

10038161003816100381610038162 (4)

In (4) the time-frequency dependent smoothing factor 120572119896(119897)is used instead of the fixed 120572 defined in (2) Substituting(4) into (3) and setting the first derivative to 0 we find theoptimum value for 120572119896(119897)

120572opt119896 (119897) =1

1 + (119878119896 (119897 minus 1) 1205902

119889119896(119897) minus 1)

(5)

According to (5) the smoothing factor can vary between0 and 1 but such a smoothing factor is not practical [15]The value of 120572opt becomes progressively smaller for a largea posteriori SNR 120574 asymp (119878119896 (119897 minus 1)120590

2

119889119896(119897)) (speech present)

However smoothing is required even during periods ofspeech because the speech power spectrum also containsa percentage of noise Hence the smoothing factor has afloor of (03) which results in a maximum of only (70)of the original spectrum remaining within any one frameConversely when the a posteriori SNR 120574 is low (speech isabsent) 120572 tends towards 1 which causes the smoothed outputto lock onto the previous value To eliminate this (5) ismultiplied by 120572max = 096 From (5) we note that 120572opt119896(119897)

depends on the true noise variance1205902119889119896

(119897) which is unknownIn practice we can replace 120590

2

119889119896(119897) with the latest estimated

value 2

119889119896(119897 minus 1) In general however this lags the true noise

variance and hence the estimated smoothing factor may betoo small or large Problems may arise when 120572opt119896(119897) is closeto 1 because 119878119896(119897) will not respond fast enough to changesin the noise Thus tracking errors were monitored in [7] bycomparing the average short-term smoothed periodogram tothe estimated noise variance After including the correctionfactor [7]

120572119888 (119897) =1

1 + (sum119870

119896=1119878119896 (119897 minus 1) sum

119870

119896=1

1003816100381610038161003816119884119896 (119897)10038161003816100381610038162minus 1)2 (6)

Journal of Sensors 3

the final factor

120572opt119896 (119897) =120572max sdot 120572119888 (119897)

1 + (119878119896 (119897 minus 1) 2

119889119896(119897 minus 1) minus 1)

2 (7)

is also smoothed over time [7]The estimated noise power based the MS algorithm [7] is

obtained by searching for a minimumwithin a finite windowlength 119862 of the smoothed power estimates 119875(119896 119897)

119878min119896 (119897) = min 119878119896 (119897) 119878119896 (119897 minus 1) 119878119896 (119897 minus 119862) (8)

Because the minimum power estimate obtained through thetime-varying smoothing factor is smaller than the meanvalue the MS algorithm requires a bias compensation for theunbiasednoise power estimate as detailed in the following [7]

2

119889119896(119897) = 120573min119896 (119897) sdot 119878min119896 (119897) (9)

where 2

119889119896(119897) is the unbiased noise power estimateThe quan-

tity 120573min119896(119897) is the bias compensation factor

22 Review of Speech Absence Probability The two-statemodel of speech events can be represented as a binaryhypothesis model [9 15 17]

1198670 (119896 119897) 119884 (119896 119897) = 119863 (119896 119897)

1198671 (119896 119897) 119884 (119896 119897) = 119883 (119896 119897) + 119863 (119896 119897)

(10)

where 1198670(119896 119897) and 1198671(119896 119897) represent the absence and pres-ence of speech in the 119896th frequency bin of the 119897th framerespectively and where

119875 (1198670 (119896 119897) equiv 119902 (119896 119897)) (11)

is the a priori probability that speech will be absent Anefficient estimator is derived for the a priori SAP using a soft-decision approach based on the estimated a priori SNR [9] Arecursive average of this can be defined as

120577 (119896 119897) = 120573120577 (119896 119897 minus 1) + (1 minus 120573) (119896 119897 minus 1) (12)

where 120573 is a time constant The decision-directed methodproposed by Ephraim and Malah [4] provides a usefulestimation scheme for the a priori SNR

(119896 119897) = 1198862

(119896 119897 minus 1)

2

119889(119896 119897)

+ (1 minus 119886)max [120574 (119896 119897) minus 1 0] (13)

where 119886 (0 lt 119886 lt 1) is a smoothing factor max is a functionthat prevents negative values and 120574(119896 119897) asymp |119884(119896 119897)|

22

119889(119896 119897)

represents the a posteriori SNR [9] The local and globalaveraging window are then applied to (13) [9] resulting in

120577120582 (119896 119897) =

119908120582

sum

119894=minus119908120582

ℎ120582 (119894) 120577 (119896 minus 119894 119897) (14)

where the subscript 120582 may denote either ldquolocalrdquo or ldquoglobalrdquowindow and ℎ120582 is a normalized window of size 2119908120582 + 1 We

define two parameters 119875local and 119875global which represent therelationship between the above averages and the likelihoodof speech in the 119896th frequency bin of the 119897th frame Theseparameters are given as [9]

119875120582 (119896 119897) =

0 if 120577120582 (119896 119897) le 120577min

1 if 120577120582 (119896 119897) ge 120577max

log (120577120582 (119896 119897) 120577min)

log (120577max120577min) otherwise

(15)

where 120577min and 120577max are empirical constants maximizedto attenuate noise while leaving weak speech componentsunaffected The third parameter 119875frame(119897) which is requiredto attenuate more noise in speech-absent frames is based onthe speech energy in neighboring frames [9]

If 120577frame(119897) gt 120577min then

if 120577frame(119897) gt 120577frame(119897 minus 1) then119875frame(119897) = 1

120577peak(119897) = minmax[120577frame 120577119901min] 120577119901max

else119875frame(119897) = 120583(119897)

Else

119875frame(119897) = 0

where 120577frame(119897) = (1(1198702 + 1))sum1198702+1

119896=1120577(119896 119897) is an average in

the frequency domain 120583(119897) represents a soft transition fromspeech to noise 120577peak is a confined peak value of 120577frame and120577119901min and 120577119901max are empirical constants that determine thedelay of the transition as defined in [9] Finally the a prioriSAP can be defined as [9]

(119896 119897) = 1 minus 119875local (119896 119897) sdot 119875global (119896 119897) sdot 119875frame (119897) (16)

Accordingly (119896 119897) is larger if either previous frames or recentneighboring frequency bins do not contain speechThereforewhen SAP goes to 1 the speech presence probability goes to0

3 Noise Estimation and Suppression UsingLinear and Nonlinear Function

31 Combining Adaptive Factor Based on Sigmoid Functionand A Priori SAP In this section we propose a methodthat combines the adaptive factor based on the sigmoidfunction and the a priori SAP estimation [9] to achieve biasedcompensation

First we can detect the adaptive factor by requiring thesmoothed power spectrum 119875(119896 119897) be equal to the updatednoise power estimator 120590

2

119889during speech absence region In

particular we can determine the adaptive factor by minimiz-ing the mean squared error (MSE) between 119875(119896 119897) and 120590

2

119889as

follows

119864 (119875 (119896 119897) minus 1205902

119889(119896 119897))

2

| 119875 (119896 119897 minus 1) (17)


0 50 100 150 200 250 300 350 400 450 5001

102

104

Frames

120573ne

w

(a)

minus5 0 5 10 15 20 25 30 350

005

01

A priori SNR in DD

120573ne

w

(b)

Figure 1 (a) Plot of the adaptive factor 120575 in the frame index (b) Adaptive factor 120575 using a sigmoid function based on the a posteriori SNR

where we assume that the updated noise power estimator 1205902

119889

during the speech absence region is

1205902

119889(119896 119897)2asymp 2

119889(119896 119897)2+ 120575 (119896 119897) (18)

Substituting (18) into (17) then after taking the first derivativeof theMSE with respect to 120575(119897) and setting it equal to zero weget the adaptive factor for 120575(119897)

120575 (119897) = 119875 (119896 119897) minus 2

119889(119896 119897) (19)

where 2

119889is the unbiased noise power estimate in (9) We

apply the adaptive factor based on the sigmoid function to thebiased compensation factor of theMS algorithm according tothe a posteriori SNR

120575 (119897) asymp 120578 sdot1

(1 + exp (minus (minus120588 sdot SNR (119897)))) (20)

where 120575(119897) is derived from the slope factor 120588 = 05 and theempirical constant 120578 = 01 for 120575max The a posteriori SNR is

SNR (119897) = 10 sdot log(

10038171003817100381710038171003817|119884 (119896 119897)|

210038171003817100381710038171003817100381710038171003817100381710038172

119889(119896 119897)

10038171003817100381710038171003817

) (21)

where sdot is the Euclidean length of a vector The adaptivefactor 120575(119897) is controlled by the a posteriori SNR When thea posteriori SNR decreases 120575(119897) increases but is constrainedto take a value between 120575min and 120575max Thus the proposedadaptive biased compensation factor 120575(119897) approaches 120575max attimes when the SNR is low In addition when the a prioriSAP equals unity the adaptive biased compensation factor120575(119897) is also equal to 120575max in each frequency bin and vice versaThe adaptive factor is shown to be a biased compensation inFigure 1 It shows as suggested by (20) and (21) that as the aposteriori SNR increases 120575(119897) decreases but 120575(119897) maintains avalue between 120575max (120575max ≪ 01) and 120575minThus the adaptivefactor 120575(119897) approaches 120575min when the SNR is close to 20 dBSimulation results show that an increase in the 120575(119897) is goodfor noisy signals with a low SNR of less than 5 dB and thata decrease in 120575(119897) is good for noisy signals with a relativelyhigh SNR greater than 10 dB We can thus control the trade-off between speech distortion and residual noise in the frameindex using 120575(119897) In (22) let 120590

2

119889(119896 119897) be the updated noise

power estimate according to the combined a priori SAP andthe adaptive factor

1205902

119889(119896 119897) =

2

119889(119896 119897) + 120575 (119897) sdot (119896 119897) (22)

The term (119896 119897) is the a priori SAP in (16) When (119896 119897)

becomes 1 the adaptive biased compensation factor 120575(119897)

is equal to 120575max Therefore the speech absence region isefficiently compensated by combining the a priori SAP andthe adaptive factor in the 119896th frequency bin of the 119897th frameAs a result the updated noise power estimator for the optimalsmoothing factor opt(119896 119897) of 120572opt(119896 119897) is deduced from (7) as

opt (119896 119897) =120572max sdot 120572119888 (119897)

1 + (119875 (119896 119897 minus 1) 1205902

119889(119896 119897 minus 1) minus 1)

2 (23)

32 Estimated Noise Suppression Using Linear Function Inthis subsection our method uses another adaptive param-eter to control the trade-off between speech distortion andresidual noise for suppressing the estimated noise in ahighly nonstationary and varying noisy environment Theautocontrol parameter is controlled by a posteriori signal-to-noise ratio (SNR) as the variation of the noise level

The estimated clean speech power spectrum can berepresented as shown in (28) One has

SNR (119896) = log(119875 (119896 119897)

1205902

119889(119896 119897)

) (24)

120577119904 =120577min minus 120577max

SNRmax minus SNRmin (25)

120577119900 = 120577max minus 120577119904 sdot SNRmax (26)

120577 (119896) = 120577119904 sdot SNR (119896) + 120577119900 (27)

10038161003816100381610038161003816 (119896 119897)

10038161003816100381610038161003816

2

asymp |119884 (119896 119897)|2minus 120577 (119896) sdot 120590

2

119889(119896 119897) (28)

where 120577(119896) is the oversubtraction factor 120577119904 is the slope and120577119900 is the offset The constants 120577min = 1 120577max = 3 SNRmax =

20 dB and SNRmin = minus5 dB respectively [3] The adaptivelinear factor 120577(119896) affects the amount of speech distortioncaused by the spectral subtraction in (28) The factor 120577(119896)

offers a large amount of flexibility to the modified spectralsubtraction (MSS) scheme The SNR(119896) in (24) is the aposteriori SNR in frequency bin The estimated clean speechsignal can then be transformed back to the time domain bytaking the inverse STFT and synthesizing using the overlap-add method


0 20 40 60 80 100 120 140minus35minus30minus25minus20minus15minus10

minus505

1015

Short-time frames

Noisy signalNoise estimated by MS methodNoise estimated by proposed method

Residualnoise

Residual noise

Figure 2 Comparison between the noisy signal noise estimated byMS and noise estimated by the proposed method in restaurant 5 dBnoisy environment

4 Experimental Results and Discussion

The noisy signals used in our evaluation were taken fromthe NOIZEUS database [15] We used 30 test utterancesof which three each were from male and female speechsignals The analyzed signal was sampled at 8 kHz and short-time Fourier-transformed using 50 overlapping Hammingwindows of 256 samples Both the MS [7] and proposedmethods track the minimum of the noisy speech to updatethe noise estimate in Figure 2 The MS method is obtainedby tracking the minimum of the noisy power spectrumover a specified number of frames Thus the MS algorithmnoise estimate tends to be biased below the true noiselevel regardless of the number of frames Our proposedmethod efficiently compensates the speech absence region bycombining the adaptive bias compensation factor and a prioriSAPThis implies that the proposed method is more accuratethan the conventional one and could improve residual noisereduction

Figure 3 shows the clear superiority of the proposedmethod in highly nonstationary noisy environments Theconventional method [7] does not work well from initialframe to 20 frames of car noise (15 dB) and from 110 framesto 130 frames of car (15 dB) and also suffered from residualnoise A different outcome is observed in the red circleof Figure 3 Particularly the robust characteristics of theproposed method in spite of the variation of the noisyenvironments are well demonstrated Thus we can estimatemore exactly the noise level to reduce a residual noise whencompared with conventional method in highly nonstationarynoisy environments

The spectrum of the clean signal is given in Figure 4(a)and the spectrum of the noisy speech signal for speechenhancement using the MS plus spectral subtraction (SS)(MS + SS) [3 7] method is given in Figure 4(b) We canalso observe the minimum controlled recursive averaging(MCRA) with SS in Figure 4(c) There is residual noise inFigure 4(c) from 0 s lt 119905 lt 015 s and at 119905 gt 18 s partly

0 100 200 300 400 500 600 700 800minus60

minus50

minus40

minus30

minus20

minus10

0

10

20

Frames


Car5dB

Babble5dBWhite

0dBBabble10dB

Car15dB

Figure 3 Comparison between the noisy signal noise estimatedby minimum statistics (MS) and noise estimated by the proposedmethod in highly nonstationary noisy environments

Table 1 Objective evaluation and comparison of the proposedmethod segmental SNR values

Noise Method SNR0 (dB) 5 (dB) 10 (dB) 15 (dB)

WhiteMS 427 877 1283 1657

MCRA 508 999 1356 1715Proposed 569 1064 1409 1740

CarMS 344 748 1201 1610

MCRA 492 793 1185 1642Proposed 539 884 1245 1706

BabbleMS 183 600 1117 1503

MCRA 373 679 1052 1622Proposed 383 705 1160 1623

AirportMS 175 704 985 1473

MCRA 164 754 966 1515Proposed 217 816 1010 1573

StreetMS 277 675 1088 1531

MCRA 234 740 988 1436Proposed 318 824 1062 1503

RestaurantMS 031 448 940 1474

MCRA 027 547 920 1524Proposed 028 557 943 1558

because of the inability of the noise estimation algorithm tobias below the true noise level The spectrogram of the pro-posed methods for noise reduction is shown in Figure 4(d)In contrast panel Figure 4(d) shows that the residual noise ismore clearly reduced than the conventional methods

Tables 1 and 2 summarize the averaged results of thesegmental SNR and the Itakura-Saito Distortion Measure(ISDM) [15] The segmental SNR can be evaluated in eitherthe time or frequency domain The time domain measureis perhaps one of the simplest objective measures used toevaluate speech enhancement method For this measure to


Freq

uenc

y (H

z)

0 1 2 3 4 5 6 70

2000

4000

Time (s)

(a)

Freq

uenc

y (H

z)

0 1 2 3 4 5 6 70

2000

4000

Time (s)

(b)

Freq

uenc

y (H

z)

0 1 2 3 4 5 6 70

2000

4000

Time (s)

(c)

Time (s)

Freq

uenc

y (H

z)

0 1 2 3 4 5 6 70

2000

4000

(d)

Figure 4 Frequency domain results of speech enhancement for exhibition noise 5 dB SNRs in noisy environments (a) Original spectrogram(b) spectrogram using MS with SS method (c) spectrogram using the MCRA with SS method and (d) spectrogram using the proposedmethod

Table 2 Objective evaluation and comparison of the Itakura-SaitoDistortion Measure (ISDM)

Noise Method ISDM0 (dB) 5 (dB) 10 (dB) 15 (dB)

WhiteMS 120 087 060 042

MCRA 092 044 055 037Proposed 084 043 038 034

CarMS 015 016 002 001

MCRA 021 005 002 001Proposed 009 002 002 002

BabbleMS 015 006 002 001

MCRA 012 002 001 001Proposed 008 002 001 001

AirportMS 019 006 002 002

MCRA 014 004 002 001Proposed 010 004 001 001

StreetMS 016 018 006 003

MCRA 055 013 009 005Proposed 017 009 008 004


MCRA 010 003 001 001Proposed 010 001 001 001

be meaningful it is important that the original and processedsignals be aligned in time and that any phase error presentbe corrected [15] For various noise types with an input SNRranging from 0 to 15 dB the segmental SNR after processingwas clearly better for the proposed method compared toconventional ones [7] except for the case of (highlightedin bold) We can also confirm that our methods workwell to control the trade-off between speech distortion andresidual noise for suppressing the estimated noise in highlynonstationary and various noisy environments

The ISDM was shown to give a good correlation withsubjective intelligibility measures specifically the diagnosticacceptability measure (DAM) This results in an objectivetest that can be used to produce a good meaningful resultThis also results in a test that shows the distortion and noisereduction [15] Here we can confirm that the results of theISDM with the proposed method produce good results ofISDMwhen comparedwith the conventionalmethods exceptfor the case of the theMSmethodwith SS in street 10 dB noisysignal

5 Conclusion

We presented a modified noise estimation and suppressionalgorithm that combined the nonlinear function and a prioriSAP estimation for biased compensation Moreover ourmethod uses another adaptive parameter to control thetrade-off between speech distortion and residual noise forsuppressing the estimated noise in highly nonstationary andvarious noisy environments The performance of the newalgorithm was evaluated by measuring the segment SNRand the ISDM We showed that the proposed algorithm wasgenerally superior to conventional methods reducing bothresidual noise and speech distortion in nonstationary andnoisy environments In the future we plan to evaluate itspossible application in preprocessing for signal processingarea

Competing Interests

The authors declare no competing interests

Acknowledgments

This research was supported by NRF (2013R1A1A2012536)


References

[1] S Lee C Lim and J-H Chang ldquoA new a priori SNR estima-tor based on multiple linear regression technique for speechenhancementrdquo Digital Signal Processing vol 30 no 7 pp 154ndash164 2014

[2] S Lee and J Chang ldquoOn using multivariate polynomial regres-sion model with spectral difference for statistical model-basedspeech enhancementrdquo Journal of Systems Architecture In press

[3] M Berouti M Schwartz and J Makhoul ldquoEnhancement ofspeech corrupted by acoustic noiserdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing pp 208ndash211 Washington DC USA April 1979

[4] Y Ephraim and D Malah ldquoSpeech enhancement using aminimum mean-square error short-time spectral amplitudeestimatorrdquo IEEE Transactions on Acoustics Speech and SignalProcessing vol 32 no 6 pp 1109ndash1121 1984

[5] Y Hu Subspace and multitaper methods for speech enhancement[PhD dissertation] University of Texas at Dallas RichardsonTex USA 2003

[6] O Cappe ldquoElimination of the musical noise phenomenon withthe Ephraim andMalah noise suppressorrdquo IEEE Transactions onSpeech and Audio Processing vol 2 no 2 pp 345ndash349 1994

[7] R Martin ldquoNoise power spectral density estimation based onoptimal smoothing andminimum statisticsrdquo IEEE Transactionson Speech and Audio Processing vol 9 no 5 pp 504ndash512 2001

[8] S Rangachari and P C Loizou ldquoA noise-estimation algorithmfor highly non-stationary environmentsrdquo Speech Communica-tion vol 48 no 2 pp 220ndash231 2006

[9] I Cohen ldquoOptimal speech enhancement under signal presenceuncertainty using log-spectral amplitude estimatorrdquo IEEE Sig-nal Processing Letters vol 9 no 4 pp 113ndash116 2002

[10] I Cohen ldquoNoise spectrum estimation in adverse environmentsimproved minima controlled recursive averagingrdquo IEEE Trans-actions on Speech and Audio Processing vol 11 no 5 pp 466ndash475 2003

[11] L Lin W H Holmes and E Ambikairajah ldquoAdaptive noiseestimation algorithm for speech enhancementrdquo ElectronicsLetters vol 39 no 9 pp 754ndash755 2003

[12] S J Lee and S H Kim ldquoNoise estimation based on standarddeviation and sigmoid function using a posteriori signal tonoise ratio in nonstantionary noisy environmentsrdquo Interna-tional Journal of Control Automation and Systems vol 6 no6 pp 818ndash827 2008

[13] R Martin ldquoBias compensationmethods for minimum statisticsnoise power spectral density estimationrdquo Signal Processing vol86 no 6 pp 1215ndash1229 2006

[14] D Mauler and R Martin ldquoImproved reproduction of stops innoise reduction systems with adaptive windows and nonsta-tionarity detectionrdquo EURASIP Journal on Advances in SignalProcessing vol 2009 Article ID 469480 17 pages 2009

[15] P C Loizou Speech Enhancement Theory and Practice CRCPress Boca Raton Fla USA 2007

[16] J R Deller J H L Hansen and J G Proakis Discrete TimeProcessing of Speech Signals IEEE Press New York NY USA1999

[17] Y-S Park and J-H Chang ldquoA probabilistic combinationmethod of minimum statistics and soft decision for robustnoise power estimation in speech enhancementrdquo IEEE SignalProcessing Letters vol 15 pp 95ndash98 2008

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014


Active and Passive Electronic Components

Control Scienceand Engineering

Journal of



RotatingMachinery


Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design



Shock and Vibration


Civil EngineeringAdvances in

Acoustics and VibrationAdvances in



Electrical and Computer Engineering

Journal of

Advances inOptoElectronics


Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of


Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014


Chemical EngineeringInternational Journal of Antennas and

Propagation




Navigation and Observation



DistributedSensor Networks



a tendency to bias the noise estimate below that of the truenoise level regardless of the number of frames [8]Thereforeit leaves residual noise in the frames of speech absence andin the frames of variation of noise characteristic in highlynonstationary noisy environments

To solve this problem we propose a combined adaptivefactor based on a sigmoid function and a priori speechabsence probability (SAP) estimation [9] for biased com-pensation Specifically we apply the adaptive factor 120575 asa posteriori SNR When the a posteriori SNR decreases 120575

increases but is constrained to take a value between 120575minand 120575max Thus the proposed adaptive biased compensationfactor 120575 approaches 120575max at times when the SNR is low Inaddition when the a priori SAP equals unity the adaptivebiased compensation factor 120575 also approaches 120575max in eachfrequency bin and vice versa Furthermore our method usesanother adaptive parameter to control the trade-off betweenspeech distortion and residual noise for suppressing theestimated noise in highly nonstationary and various noisyenvironments The autocontrol parameter is controlled by aposteriori signal-to-noise ratio (SNR) as the variation of thenoise level

We evaluate the performance of the proposed algorithmfor nonstationary noise and various noise environmentsTheimprovement can be confirmed in the segmental SNR andthe Itakura-SaitoDistortionMeasure (ISDM) [15]The resultsshow that our proposed method is superior to the conven-tional MS approach The structure of the paper is as followsSection 2 reviews theminimum statistics and the a priori SAPestimation algorithms Section 3 addresses noise estimationand suppression using a linear and a nonlinear functionIn Section 4 we express the combined sigmoid functionusing the a posteriori SNR and a priori SAP estimation forrobust biased compensation In Section 5 we discuss theexperimental results

2 Minimum Statistics (MS) and SpeechAbsence Probability (SAP)

21 Review of MS The noisy speech signal 119910(119899) can berepresented as 119910(119899) = 119909(119899) + 119889(119899) where 119909(119899) is the cleanspeech signal and 119889(119899) is the noise signal Dividing thesignal into overlapping frames using a window function andapplying the short-time Fourier transform (STFT) [16] toeach frame yield the time-frequency representation 119884(119896 119897) =

119883(119896 119897) + 119863(119896 119897) where 119896 = 1 2 119870 is the frequency binindex and 119897 = 1 2 119871 is the time frame index It can beshown that

1003816100381610038161003816119884119896 (119897)10038161003816100381610038162asymp

1003816100381610038161003816119883119896 (119897)10038161003816100381610038162+

1003816100381610038161003816119863119896 (119897)10038161003816100381610038162 (1)

where |119884119896(119897)|2 |119883119896(119897)|

2 and |119863119896(119897)|2 are the power spectrum

of the noisy speech signal clean speech and noise respec-tively

The MS algorithm relies on the fact that the noisy powerspectrum often becomes equal to the noise power spectrumduring periods of speech pauses [7 13 17] Therefore an esti-mate of the noise power spectrum is obtained by separatelytracking the minimum of the noisy speech in each frequency

bin In addition because the minimum is biased towardslower values an unbiased estimate may be obtained throughmultiplication by a bias factor which is derived from thestatistics of the local minimum To search for the minimumwe take the first-order recursive of the noisy power spectrum

119878119896 (119897) = 120572119878119896 (119897 minus 1) + (1 minus 120572)1003816100381610038161003816119884119896 (119897)

10038161003816100381610038162 (2)

where 119878119896(119897) is the smoothed periodogram and 120572 is thesmoothing factor The smoothing factor used in (2) must beclose to 1 to keep the variance of the minimum tracking assmall as possible Hence time and frequency dependenceare required to determine if speech is present or absentThe smoothing factor is therefore derived by minimizing themean square error between 119878119896(119897) and 120590

2

119889119896(119897)

119864 (119878119896 (119897) minus 1205902

119889119896(119897))2

| 119878119896 (119897 minus 1) (3)

where 1205902

119889119896(119897) is the noise variance

119878119896 (119897) = 120572119896 (119897) 119878119896 (119897 minus 1) + (1 minus 120572119896 (119897))1003816100381610038161003816119884119896 (119897)

10038161003816100381610038162 (4)

In (4) the time-frequency dependent smoothing factor 120572119896(119897)is used instead of the fixed 120572 defined in (2) Substituting(4) into (3) and setting the first derivative to 0 we find theoptimum value for 120572119896(119897)

120572opt119896 (119897) =1

1 + (119878119896 (119897 minus 1) 1205902

119889119896(119897) minus 1)

(5)

According to (5) the smoothing factor can vary between0 and 1 but such a smoothing factor is not practical [15]The value of 120572opt becomes progressively smaller for a largea posteriori SNR 120574 asymp (119878119896 (119897 minus 1)120590

2

119889119896(119897)) (speech present)

However smoothing is required even during periods ofspeech because the speech power spectrum also containsa percentage of noise Hence the smoothing factor has afloor of (03) which results in a maximum of only (70)of the original spectrum remaining within any one frameConversely when the a posteriori SNR 120574 is low (speech isabsent) 120572 tends towards 1 which causes the smoothed outputto lock onto the previous value To eliminate this (5) ismultiplied by 120572max = 096 From (5) we note that 120572opt119896(119897)

depends on the true noise variance1205902119889119896

(119897) which is unknownIn practice we can replace 120590

2

119889119896(119897) with the latest estimated

value 2

119889119896(119897 minus 1) In general however this lags the true noise

variance and hence the estimated smoothing factor may betoo small or large Problems may arise when 120572opt119896(119897) is closeto 1 because 119878119896(119897) will not respond fast enough to changesin the noise Thus tracking errors were monitored in [7] bycomparing the average short-term smoothed periodogram tothe estimated noise variance After including the correctionfactor [7]

120572119888 (119897) =1

1 + (sum119870

119896=1119878119896 (119897 minus 1) sum

119870

119896=1

1003816100381610038161003816119884119896 (119897)10038161003816100381610038162minus 1)2 (6)


the final factor

120572opt119896 (119897) =120572max sdot 120572119888 (119897)

1 + (119878119896 (119897 minus 1) 2

119889119896(119897 minus 1) minus 1)

2 (7)





2

119889119896(119897) = 120573min119896 (119897) sdot 119878min119896 (119897) (9)

where 2




1198670 (119896 119897) 119884 (119896 119897) = 119863 (119896 119897)

1198671 (119896 119897) 119884 (119896 119897) = 119883 (119896 119897) + 119863 (119896 119897)

(10)


119875 (1198670 (119896 119897) equiv 119902 (119896 119897)) (11)




(119896 119897) = 1198862

(119896 119897 minus 1)

2

119889(119896 119897)

+ (1 minus 119886)max [120574 (119896 119897) minus 1 0] (13)


22

119889(119896 119897)


120577120582 (119896 119897) =

119908120582

sum

119894=minus119908120582

ℎ120582 (119894) 120577 (119896 minus 119894 119897) (14)



119875120582 (119896 119897) =

0 if 120577120582 (119896 119897) le 120577min

1 if 120577120582 (119896 119897) ge 120577max

log (120577120582 (119896 119897) 120577min)


(15)





else119875frame(119897) = 120583(119897)

Else

119875frame(119897) = 0

where 120577frame(119897) = (1(1198702 + 1))sum1198702+1

119896=1120577(119896 119897) is an average in







2



2

119889as

follows

119864 (119875 (119896 119897) minus 1205902

119889(119896 119897))

2

| 119875 (119896 119897 minus 1) (17)


0 50 100 150 200 250 300 350 400 450 5001

102

104

Frames

120573ne

w

(a)

minus5 0 5 10 15 20 25 30 350

005

01

A priori SNR in DD

120573ne

w

(b)



119889


1205902

119889(119896 119897)2asymp 2

119889(119896 119897)2+ 120575 (119896 119897) (18)


120575 (119897) = 119875 (119896 119897) minus 2

119889(119896 119897) (19)

where 2



120575 (119897) asymp 120578 sdot1



SNR (119897) = 10 sdot log(

10038171003817100381710038171003817|119884 (119896 119897)|

210038171003817100381710038171003817100381710038171003817100381710038172

119889(119896 119897)

10038171003817100381710038171003817

) (21)


2



1205902

119889(119896 119897) =

2

119889(119896 119897) + 120575 (119897) sdot (119896 119897) (22)




opt (119896 119897) =120572max sdot 120572119888 (119897)

1 + (119875 (119896 119897 minus 1) 1205902

119889(119896 119897 minus 1) minus 1)

2 (23)



SNR (119896) = log(119875 (119896 119897)

1205902

119889(119896 119897)

) (24)

120577119904 =120577min minus 120577max



120577 (119896) = 120577119904 sdot SNR (119896) + 120577119900 (27)

10038161003816100381610038161003816 (119896 119897)

10038161003816100381610038161003816

2

asymp |119884 (119896 119897)|2minus 120577 (119896) sdot 120590

2

119889(119896 119897) (28)






minus505

1015

Short-time frames


Residualnoise

Residual noise






0 100 200 300 400 500 600 700 800minus60

minus50

minus40

minus30

minus20

minus10

0

10

20

Frames


Car5dB

Babble5dBWhite

0dBBabble10dB

Car15dB




WhiteMS 427 877 1283 1657

MCRA 508 999 1356 1715Proposed 569 1064 1409 1740

CarMS 344 748 1201 1610

MCRA 492 793 1185 1642Proposed 539 884 1245 1706

BabbleMS 183 600 1117 1503

MCRA 373 679 1052 1622Proposed 383 705 1160 1623

AirportMS 175 704 985 1473

MCRA 164 754 966 1515Proposed 217 816 1010 1573

StreetMS 277 675 1088 1531

MCRA 234 740 988 1436Proposed 318 824 1062 1503


MCRA 027 547 920 1524Proposed 028 557 943 1558




Freq

uenc

y (H

z)

0 1 2 3 4 5 6 70

2000

4000

Time (s)

(a)

Freq

uenc

y (H

z)

0 1 2 3 4 5 6 70

2000

4000

Time (s)

(b)

Freq

uenc

y (H

z)

0 1 2 3 4 5 6 70

2000

4000

Time (s)

(c)

Time (s)

Freq

uenc

y (H

z)

0 1 2 3 4 5 6 70

2000

4000

(d)




WhiteMS 120 087 060 042

MCRA 092 044 055 037Proposed 084 043 038 034

CarMS 015 016 002 001

MCRA 021 005 002 001Proposed 009 002 002 002

BabbleMS 015 006 002 001

MCRA 012 002 001 001Proposed 008 002 001 001

AirportMS 019 006 002 002

MCRA 014 004 002 001Proposed 010 004 001 001

StreetMS 016 018 006 003

MCRA 055 013 009 005Proposed 017 009 008 004


MCRA 010 003 001 001Proposed 010 001 001 001



5 Conclusion


Competing Interests


Acknowledgments



References




















RoboticsJournal of





Journal of



RotatingMachinery





VLSI Design



Shock and Vibration







Journal of



Volume 2014


SensorsJournal of





Propagation










the final factor

120572opt119896 (119897) =120572max sdot 120572119888 (119897)

1 + (119878119896 (119897 minus 1) 2

119889119896(119897 minus 1) minus 1)

2 (7)





2

119889119896(119897) = 120573min119896 (119897) sdot 119878min119896 (119897) (9)

where 2




1198670 (119896 119897) 119884 (119896 119897) = 119863 (119896 119897)

1198671 (119896 119897) 119884 (119896 119897) = 119883 (119896 119897) + 119863 (119896 119897)

(10)


119875 (1198670 (119896 119897) equiv 119902 (119896 119897)) (11)




(119896 119897) = 1198862

(119896 119897 minus 1)

2

119889(119896 119897)

+ (1 minus 119886)max [120574 (119896 119897) minus 1 0] (13)


22

119889(119896 119897)


120577120582 (119896 119897) =

119908120582

sum

119894=minus119908120582

ℎ120582 (119894) 120577 (119896 minus 119894 119897) (14)



119875120582 (119896 119897) =

0 if 120577120582 (119896 119897) le 120577min

1 if 120577120582 (119896 119897) ge 120577max

log (120577120582 (119896 119897) 120577min)


(15)





else119875frame(119897) = 120583(119897)

Else

119875frame(119897) = 0

where 120577frame(119897) = (1(1198702 + 1))sum1198702+1

119896=1120577(119896 119897) is an average in







2



2

119889as

follows

119864 (119875 (119896 119897) minus 1205902

119889(119896 119897))

2

| 119875 (119896 119897 minus 1) (17)


0 50 100 150 200 250 300 350 400 450 5001

102

104

Frames

120573ne

w

(a)

minus5 0 5 10 15 20 25 30 350

005

01

A priori SNR in DD

120573ne

w

(b)



119889


1205902

119889(119896 119897)2asymp 2

119889(119896 119897)2+ 120575 (119896 119897) (18)


120575 (119897) = 119875 (119896 119897) minus 2

119889(119896 119897) (19)

where 2



120575 (119897) asymp 120578 sdot1



SNR (119897) = 10 sdot log(

10038171003817100381710038171003817|119884 (119896 119897)|

210038171003817100381710038171003817100381710038171003817100381710038172

119889(119896 119897)

10038171003817100381710038171003817

) (21)


2



1205902

119889(119896 119897) =

2

119889(119896 119897) + 120575 (119897) sdot (119896 119897) (22)




opt (119896 119897) =120572max sdot 120572119888 (119897)

1 + (119875 (119896 119897 minus 1) 1205902

119889(119896 119897 minus 1) minus 1)

2 (23)



SNR (119896) = log(119875 (119896 119897)

1205902

119889(119896 119897)

) (24)

120577119904 =120577min minus 120577max



120577 (119896) = 120577119904 sdot SNR (119896) + 120577119900 (27)

10038161003816100381610038161003816 (119896 119897)

10038161003816100381610038161003816

2

asymp |119884 (119896 119897)|2minus 120577 (119896) sdot 120590

2

119889(119896 119897) (28)






minus505

1015

Short-time frames


Residualnoise

Residual noise






0 100 200 300 400 500 600 700 800minus60

minus50

minus40

minus30

minus20

minus10

0

10

20

Frames


Car5dB

Babble5dBWhite

0dBBabble10dB

Car15dB




WhiteMS 427 877 1283 1657

MCRA 508 999 1356 1715Proposed 569 1064 1409 1740

CarMS 344 748 1201 1610

MCRA 492 793 1185 1642Proposed 539 884 1245 1706

BabbleMS 183 600 1117 1503

MCRA 373 679 1052 1622Proposed 383 705 1160 1623

AirportMS 175 704 985 1473

MCRA 164 754 966 1515Proposed 217 816 1010 1573

StreetMS 277 675 1088 1531

MCRA 234 740 988 1436Proposed 318 824 1062 1503


MCRA 027 547 920 1524Proposed 028 557 943 1558




Freq

uenc

y (H

z)

0 1 2 3 4 5 6 70

2000

4000

Time (s)

(a)

Freq

uenc

y (H

z)

0 1 2 3 4 5 6 70

2000

4000

Time (s)

(b)

Freq

uenc

y (H

z)

0 1 2 3 4 5 6 70

2000

4000

Time (s)

(c)

Time (s)

Freq

uenc

y (H

z)

0 1 2 3 4 5 6 70

2000

4000

(d)




WhiteMS 120 087 060 042

MCRA 092 044 055 037Proposed 084 043 038 034

CarMS 015 016 002 001

MCRA 021 005 002 001Proposed 009 002 002 002

BabbleMS 015 006 002 001

MCRA 012 002 001 001Proposed 008 002 001 001

AirportMS 019 006 002 002

MCRA 014 004 002 001Proposed 010 004 001 001

StreetMS 016 018 006 003

MCRA 055 013 009 005Proposed 017 009 008 004


MCRA 010 003 001 001Proposed 010 001 001 001



5 Conclusion


Competing Interests


Acknowledgments



References




















RoboticsJournal of





Journal of



RotatingMachinery





VLSI Design



Shock and Vibration







Journal of



Volume 2014


SensorsJournal of





Propagation










0 50 100 150 200 250 300 350 400 450 5001

102

104

Frames

120573ne

w

(a)

minus5 0 5 10 15 20 25 30 350

005

01

A priori SNR in DD

120573ne

w

(b)



119889


1205902

119889(119896 119897)2asymp 2

119889(119896 119897)2+ 120575 (119896 119897) (18)


120575 (119897) = 119875 (119896 119897) minus 2

119889(119896 119897) (19)

where 2



120575 (119897) asymp 120578 sdot1



SNR (119897) = 10 sdot log(

10038171003817100381710038171003817|119884 (119896 119897)|

210038171003817100381710038171003817100381710038171003817100381710038172

119889(119896 119897)

10038171003817100381710038171003817

) (21)


2



1205902

119889(119896 119897) =

2

119889(119896 119897) + 120575 (119897) sdot (119896 119897) (22)




opt (119896 119897) =120572max sdot 120572119888 (119897)

1 + (119875 (119896 119897 minus 1) 1205902

119889(119896 119897 minus 1) minus 1)

2 (23)



SNR (119896) = log(119875 (119896 119897)

1205902

119889(119896 119897)

) (24)

120577119904 =120577min minus 120577max



120577 (119896) = 120577119904 sdot SNR (119896) + 120577119900 (27)

10038161003816100381610038161003816 (119896 119897)

10038161003816100381610038161003816

2

asymp |119884 (119896 119897)|2minus 120577 (119896) sdot 120590

2

119889(119896 119897) (28)






minus505

1015

Short-time frames


Residualnoise

Residual noise






0 100 200 300 400 500 600 700 800minus60

minus50

minus40

minus30

minus20

minus10

0

10

20

Frames


Car5dB

Babble5dBWhite

0dBBabble10dB

Car15dB




WhiteMS 427 877 1283 1657

MCRA 508 999 1356 1715Proposed 569 1064 1409 1740

CarMS 344 748 1201 1610

MCRA 492 793 1185 1642Proposed 539 884 1245 1706

BabbleMS 183 600 1117 1503

MCRA 373 679 1052 1622Proposed 383 705 1160 1623

AirportMS 175 704 985 1473

MCRA 164 754 966 1515Proposed 217 816 1010 1573

StreetMS 277 675 1088 1531

MCRA 234 740 988 1436Proposed 318 824 1062 1503


MCRA 027 547 920 1524Proposed 028 557 943 1558




Freq

uenc

y (H

z)

0 1 2 3 4 5 6 70

2000

4000

Time (s)

(a)

Freq

uenc

y (H

z)

0 1 2 3 4 5 6 70

2000

4000

Time (s)

(b)

Freq

uenc

y (H

z)

0 1 2 3 4 5 6 70

2000

4000

Time (s)

(c)

Time (s)

Freq

uenc

y (H

z)

0 1 2 3 4 5 6 70

2000

4000

(d)




WhiteMS 120 087 060 042

MCRA 092 044 055 037Proposed 084 043 038 034

CarMS 015 016 002 001

MCRA 021 005 002 001Proposed 009 002 002 002

BabbleMS 015 006 002 001

MCRA 012 002 001 001Proposed 008 002 001 001

AirportMS 019 006 002 002

MCRA 014 004 002 001Proposed 010 004 001 001

StreetMS 016 018 006 003

MCRA 055 013 009 005Proposed 017 009 008 004


MCRA 010 003 001 001Proposed 010 001 001 001



5 Conclusion


Competing Interests


Acknowledgments



References




















RoboticsJournal of





Journal of



RotatingMachinery





VLSI Design



Shock and Vibration







Journal of



Volume 2014


SensorsJournal of





Propagation











minus505

1015

Short-time frames


Residualnoise

Residual noise






0 100 200 300 400 500 600 700 800minus60

minus50

minus40

minus30

minus20

minus10

0

10

20

Frames


Car5dB

Babble5dBWhite

0dBBabble10dB

Car15dB




WhiteMS 427 877 1283 1657

MCRA 508 999 1356 1715Proposed 569 1064 1409 1740

CarMS 344 748 1201 1610

MCRA 492 793 1185 1642Proposed 539 884 1245 1706

BabbleMS 183 600 1117 1503

MCRA 373 679 1052 1622Proposed 383 705 1160 1623

AirportMS 175 704 985 1473

MCRA 164 754 966 1515Proposed 217 816 1010 1573

StreetMS 277 675 1088 1531

MCRA 234 740 988 1436Proposed 318 824 1062 1503


MCRA 027 547 920 1524Proposed 028 557 943 1558




Freq

uenc

y (H

z)

0 1 2 3 4 5 6 70

2000

4000

Time (s)

(a)

Freq

uenc

y (H

z)

0 1 2 3 4 5 6 70

2000

4000

Time (s)

(b)

Freq

uenc

y (H

z)

0 1 2 3 4 5 6 70

2000

4000

Time (s)

(c)

Time (s)

Freq

uenc

y (H

z)

0 1 2 3 4 5 6 70

2000

4000

(d)




WhiteMS 120 087 060 042

MCRA 092 044 055 037Proposed 084 043 038 034

CarMS 015 016 002 001

MCRA 021 005 002 001Proposed 009 002 002 002

BabbleMS 015 006 002 001

MCRA 012 002 001 001Proposed 008 002 001 001

AirportMS 019 006 002 002

MCRA 014 004 002 001Proposed 010 004 001 001

StreetMS 016 018 006 003

MCRA 055 013 009 005Proposed 017 009 008 004


MCRA 010 003 001 001Proposed 010 001 001 001



5 Conclusion


Competing Interests


Acknowledgments



References




















RoboticsJournal of





Journal of



RotatingMachinery





VLSI Design



Shock and Vibration







Journal of



Volume 2014


SensorsJournal of





Propagation










Freq

uenc

y (H

z)

0 1 2 3 4 5 6 70

2000

4000

Time (s)

(a)

Freq

uenc

y (H

z)

0 1 2 3 4 5 6 70

2000

4000

Time (s)

(b)

Freq

uenc

y (H

z)

0 1 2 3 4 5 6 70

2000

4000

Time (s)

(c)

Time (s)

Freq

uenc

y (H

z)

0 1 2 3 4 5 6 70

2000

4000

(d)




WhiteMS 120 087 060 042

MCRA 092 044 055 037Proposed 084 043 038 034

CarMS 015 016 002 001

MCRA 021 005 002 001Proposed 009 002 002 002

BabbleMS 015 006 002 001

MCRA 012 002 001 001Proposed 008 002 001 001

AirportMS 019 006 002 002

MCRA 014 004 002 001Proposed 010 004 001 001

StreetMS 016 018 006 003

MCRA 055 013 009 005Proposed 017 009 008 004


MCRA 010 003 001 001Proposed 010 001 001 001



5 Conclusion


Competing Interests


Acknowledgments



References




















RoboticsJournal of





Journal of



RotatingMachinery





VLSI Design



Shock and Vibration







Journal of



Volume 2014


SensorsJournal of





Propagation










References




















RoboticsJournal of





Journal of



RotatingMachinery





VLSI Design



Shock and Vibration







Journal of



Volume 2014


SensorsJournal of





Propagation











RoboticsJournal of





Journal of



RotatingMachinery





VLSI Design



Shock and Vibration







Journal of



Volume 2014


SensorsJournal of





Propagation









Documents

Research Article Noise Estimation and Suppression Using ...downloads.hindawi.com/journals/js/2016/5352437.pdf · speech enhancement to improve the speech intelligibility or quality