Suppression of Acoustic Noise in Speech Using Spectral Subtraction-YNI


IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-27, NO. 2, APRIL 1979, p. 113

Abstract—A stand-alone noise suppression algorithm is presented for reducing the spectral effects of acoustically added noise in speech. Effective performance of digital speech processors operating in practical environments may require suppression of noise from the digital waveform. Spectral subtraction offers a computationally efficient, processor-independent approach to effective digital speech analysis. The method, requiring about the same computation as high-speed convolution, suppresses stationary noise from speech by subtracting the spectral noise bias calculated during nonspeech activity. Secondary procedures are then applied to attenuate the residual noise left after subtraction. Since the algorithm resynthesizes a speech waveform, it can be used as a preprocessor to narrow-band voice communications systems, speech recognition systems, or speaker authentication systems.

I. INTRODUCTION

BACKGROUND noise acoustically added to speech can degrade the performance of digital voice processors used for applications such as speech compression, recognition, and authentication [1], [2]. Digital voice systems will be used in a variety of environments, and their performance must be maintained at a level near that measured using noise-free input speech. To ensure continued reliability, the effects of background noise can be reduced by using noise-cancelling microphones, internal modification of the voice processor algorithms to explicitly compensate for signal contamination, or preprocessor noise reduction.

Noise-cancelling microphones, although essential for extremely high noise environments such as the helicopter cockpit, offer little or no noise reduction above 1 kHz [3] (see Fig. 5). Techniques available for voice processor modification to account for noise contamination are being developed [4], [5]. But due to the time, effort, and money spent on the design and implementation of these voice processors [6]-[8], there is a reluctance to internally modify these systems.

Preprocessor noise reduction [12], [21] offers the advantage that noise stripping is done on the waveform itself, with the output being either digital or analog speech. Thus, existing voice processors tuned to clean speech can continue to be used unmodified. Also, since the output is speech, the noise stripping becomes independent of any specific subsequent

Manuscript received June 1, 1978; revised September 12, 1978. This research was supported by the Information Processing Branch of the Defense Advanced Research Projects Agency, monitored by the Naval Research Laboratory under Contract N00173-77-C-0041.

The author is with the Department of Computer Science, University of Utah, Salt Lake City, UT 84112.

speech processor implementation (it could be connected to a CCD channel vocoder or a digital LPC vocoder).

The objectives of this effort were to develop a noise suppression technique, implement a computationally efficient algorithm, and test its performance in actual noise environments. The approach used was to estimate the magnitude frequency spectrum of the underlying clean speech by subtracting the noise magnitude spectrum from the noisy speech spectrum. This estimator requires an estimate of the current noise spectrum. Rather than obtain this noise estimate from a second microphone source [9], [10], it is approximated using the average noise magnitude measured during nonspeech activity. Using this approach, the spectral approximation error is then defined, and secondary methods for reducing it are described.

The noise suppressor is implemented using about the same amount of computation as required in a high-speed convolution. It is tested on speech recorded in a helicopter environment. Its performance is measured using the Diagnostic Rhyme Test (DRT) [11] and is demonstrated using isometric plots of short-time spectra.

The paper is divided into sections which develop the spectral estimator, describe the algorithm implementation, and demonstrate the algorithm performance.

II. SUBTRACTIVE NOISE SUPPRESSION ANALYSIS

A. Introduction

This section describes the noise-suppressed spectral estimator. The estimator is obtained by subtracting an estimate of the noise spectrum from the noisy speech spectrum. Spectral information required to describe the noise spectrum is obtained from the signal measured during nonspeech activity. After developing the spectral estimator, the spectral error is computed, and four methods for reducing it are presented.

The following assumptions were used in developing the analysis. The background noise is acoustically or digitally added to the speech. The background noise environment remains locally stationary to the degree that its spectral magnitude expected value just prior to speech activity equals its expected value during speech activity. If the environment changes to a new stationary state, there exists enough time (about 300 ms) to estimate a new background noise spectral magnitude expected value before speech activity commences. For the slowly varying nonstationary noise environment, the algorithm requires a speech activity detector to signal the


program that speech has ceased and a new noise bias can be estimated. Finally, it is assumed that significant noise reduction is possible by removing the effect of noise from the magnitude spectrum only.

Speech, suitably low-pass filtered and digitized, is analyzed by windowing data from half-overlapped input data buffers. The magnitude spectra of the windowed data are calculated, and the spectral noise bias calculated during nonspeech activity is subtracted off. Resulting negative amplitudes are then zeroed out. Secondary residual noise suppression is then applied. A time waveform is recalculated from the modified magnitude. This waveform is then overlap added to the previous data to generate the output speech.

B. Additive Noise Model

Assume that a windowed noise signal $n(k)$ has been added to a windowed speech signal $s(k)$, with their sum denoted by $x(k)$. Then

$$x(k) = s(k) + n(k).$$

Taking the Fourier transform gives

$$X(e^{j\omega}) = S(e^{j\omega}) + N(e^{j\omega})$$

where

$$X(e^{j\omega}) = \sum_{k=0}^{L-1} x(k)\, e^{-j\omega k}$$

and

$$x(k) = \frac{1}{2\pi} \int_{0}^{2\pi} X(e^{j\omega})\, e^{j\omega k}\, d\omega.$$

C. Spectral Subtraction Estimator

The spectral subtraction filter $H(e^{j\omega})$ is calculated by replacing the noise spectrum $N(e^{j\omega})$ with spectra which can be readily measured. The magnitude $|N(e^{j\omega})|$ of $N(e^{j\omega})$ is replaced by its average value $\mu(e^{j\omega})$ taken during nonspeech activity, and the phase $\theta_N(e^{j\omega})$ of $N(e^{j\omega})$ is replaced by the phase $\theta_X(e^{j\omega})$ of $X(e^{j\omega})$. These substitutions result in the spectral subtraction estimator $\hat{S}(e^{j\omega})$:

$$\hat{S}(e^{j\omega}) = \left[\,|X(e^{j\omega})| - \mu(e^{j\omega})\,\right] e^{j\theta_X(e^{j\omega})} = H(e^{j\omega})\, X(e^{j\omega})$$

where

$$H(e^{j\omega}) = 1 - \frac{\mu(e^{j\omega})}{|X(e^{j\omega})|}.$$
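As a concrete sketch, the estimator above can be written out directly. The naive DFT, the short frame, and the per-bin noise average below are illustrative assumptions for demonstration, not values from the paper:

```python
import cmath

def dft(x):
    # Naive DFT; adequate for a short illustrative frame.
    L = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * w * k / L) for k in range(L))
            for w in range(L)]

def spectral_subtraction_frame(x, mu):
    """One-frame spectral subtraction: subtract the average noise
    magnitude mu(w) from |X(w)| and keep the noisy phase theta_X(w).
    Negative differences are left in place here; half-wave
    rectification (a later step in the paper) would zero them."""
    S_hat = []
    for Xw, mw in zip(dft(x), mu):
        S_hat.append((abs(Xw) - mw) * cmath.exp(1j * cmath.phase(Xw)))
    return S_hat
```

With mu identically zero the estimator passes X through unchanged, which is a quick sanity check on an implementation.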

D. Spectral Error

The spectral error $\varepsilon(e^{j\omega})$ resulting from this estimator is given by

$$\varepsilon(e^{j\omega}) = \hat{S}(e^{j\omega}) - S(e^{j\omega}) = N(e^{j\omega}) - \mu(e^{j\omega})\, e^{j\theta_X(e^{j\omega})}.$$

A number of simple modifications are available to reduce the auditory effects of this spectral error. These include: 1) magnitude averaging; 2) half-wave rectification; 3) residual noise reduction; and 4) additional signal attenuation during nonspeech activity.

E. Magnitude Averaging

Since the spectral error equals the difference between the noise spectrum $N$ and its mean $\mu$, local averaging of spectral magnitudes can be used to reduce the error. Replacing $|X(e^{j\omega})|$ with $\overline{|X(e^{j\omega})|}$, where

$$\overline{|X(e^{j\omega})|} = \frac{1}{M} \sum_{i=0}^{M-1} |X_i(e^{j\omega})|$$

and $X_i(e^{j\omega})$ is the $i$th time-windowed transform of $x(k)$, gives

$$\hat{S}_A(e^{j\omega}) = \left[\,\overline{|X(e^{j\omega})|} - \mu(e^{j\omega})\,\right] e^{j\theta_X(e^{j\omega})}.$$

The rationale behind averaging is that the spectral error becomes approximately

$$\varepsilon(e^{j\omega}) \cong \overline{|N(e^{j\omega})|} - \mu$$

where

$$\overline{|N(e^{j\omega})|} = \frac{1}{M} \sum_{i=0}^{M-1} |N_i(e^{j\omega})|.$$

Thus, the sample mean of $|N(e^{j\omega})|$ will converge to $\mu(e^{j\omega})$ as a longer average is taken.

The obvious problem with this modification is that the speech is nonstationary, and therefore only limited time averaging is allowed. DRT results show that averaging over more than three half-overlapped windows with a total time duration of 38.4 ms will decrease intelligibility. Spectral examples and DRT scores with and without averaging are given in the "Results" section. Based upon these results, it appears that averaging coupled with half rectification offers some improvement. The major disadvantage of averaging is the risk of some temporal smearing of short transitory sounds.

F. Half-Wave Rectification

For each frequency $\omega$ where the noisy signal spectrum magnitude $|X(e^{j\omega})|$ is less than the average noise spectrum magnitude $\mu(e^{j\omega})$, the output is set to zero. This modification can be simply implemented by half-wave rectifying $H(e^{j\omega})$. The estimator then becomes

$$\hat{S}(e^{j\omega}) = H_R(e^{j\omega})\, X(e^{j\omega})$$

where

$$H_R(e^{j\omega}) = \frac{H(e^{j\omega}) + |H(e^{j\omega})|}{2}.$$

The input-output relationship between $X(e^{j\omega})$ and $\hat{S}(e^{j\omega})$ at each frequency $\omega$ is shown in Fig. 1.

Thus, the effect of half-wave rectification is to bias down the magnitude spectrum at each frequency by the noise bias determined at that frequency. The bias value can, of course,
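A minimal sketch of the rectified gain, assuming per-bin magnitude lists as inputs (the function name and argument layout are illustrative):

```python
def half_wave_rectified_gain(X_mag, mu):
    """H_R(w) = (H(w) + |H(w)|)/2 with H(w) = 1 - mu(w)/|X(w)|.
    Bins whose noisy magnitude falls below the noise average get zero
    gain; all other bins are biased down by mu."""
    gains = []
    for mag, mw in zip(X_mag, mu):
        H = 1.0 - mw / mag if mag > 0.0 else 0.0
        gains.append((H + abs(H)) / 2.0)
    return gains
```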


BOLL: SUPPRESSION OF ACOUSTIC NOISE IN SPEECH

Fig. 1. Input-output relation between $X(e^{j\omega})$ and $\hat{S}(e^{j\omega})$.

change from frequency to frequency as well as from analysis time window to time window. The advantage of half rectification is that the noise floor is reduced by $\mu(e^{j\omega})$. Also, any low variance coherent noise tones are essentially eliminated. The disadvantage of half rectification can exhibit itself in the situation where the sum of the noise plus speech at a frequency $\omega$ is less than $\mu(e^{j\omega})$. Then the speech information at that frequency is incorrectly removed, implying a possible decrease in intelligibility. As discussed in the section on "Results," for the helicopter speech data base this processing did not reduce intelligibility as measured using the DRT.

G. Residual Noise Reduction

After half-wave rectification, speech plus noise lying above $\mu$ remain. In the absence of speech activity, the difference $N_R = N - \mu e^{j\theta_N}$, which shall be called the noise residual, will, for uncorrelated noise, exhibit itself in the spectrum as randomly spaced narrow bands of magnitude spikes (see Fig. 7). This noise residual will have a magnitude between zero and a maximum value measured during nonspeech activity. Transformed back to the time domain, the noise residual will sound like the sum of tone generators with random fundamental frequencies which are turned on and off at a rate of about 20 ms. During speech activity the noise residual will also be perceived at those frequencies which are not masked by the speech.

The audible effects of the noise residual can be reduced by taking advantage of its frame-to-frame randomness. Specifically, at a given frequency bin, since the noise residual will randomly fluctuate in amplitude at each analysis frame, it can be suppressed by replacing its current value with its minimum value chosen from the adjacent analysis frames. Taking the minimum value is used only when the magnitude of $\hat{S}(e^{j\omega})$ is less than the maximum noise residual calculated during nonspeech activity. The motivation behind this replacement scheme is threefold: first, if the amplitude of $\hat{S}(e^{j\omega})$ lies below the maximum noise residual, and it varies radically from analysis frame to frame, then there is a high probability that the spectrum at that frequency is due to noise; therefore, suppress it by taking the minimum; second, if $\hat{S}(e^{j\omega})$ lies below the maximum but has a nearly constant value, there is a high probability that the spectrum at that frequency is due to low energy speech; therefore, taking the minimum will retain the information; and third, if $\hat{S}(e^{j\omega})$ is greater than the maximum, there is speech present at that frequency; therefore, removing the bias is sufficient. The amount of noise reduction using this replacement scheme was judged equivalent to that obtained by averaging over three frames. However, with this approach high energy frequency bins are not averaged together. The disadvantage of the scheme is that more storage is required to save the maximum noise residuals and the magnitude values for three adjacent frames.

The residual noise reduction scheme is implemented as

$$|\hat{S}_i(e^{j\omega})| = |\hat{S}_i(e^{j\omega})|, \quad \text{for } |\hat{S}_i(e^{j\omega})| \ge \max|N_R(e^{j\omega})|$$

$$|\hat{S}_i(e^{j\omega})| = \min\left\{ |\hat{S}_j(e^{j\omega})| : j = i-1,\, i,\, i+1 \right\}, \quad \text{for } |\hat{S}_i(e^{j\omega})| < \max|N_R(e^{j\omega})|$$

where $i$ indexes the analysis frame and

$$\max|N_R(e^{j\omega})| = \text{maximum value of the noise residual measured during nonspeech activity}.$$
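The minimum-over-adjacent-frames rule can be sketched as follows; the frame layout and names are illustrative assumptions:

```python
def residual_noise_reduction(frames, max_residual):
    """For each interior frame i and bin b: if the magnitude lies below
    the maximum noise residual for that bin, replace it with the
    minimum over frames i-1, i, i+1 (introducing a one-frame delay);
    otherwise keep it unchanged.

    frames: list of per-frame magnitude lists.
    max_residual: per-bin maxima measured during nonspeech activity.
    """
    smoothed = []
    for i in range(1, len(frames) - 1):
        frame = []
        for b, mag in enumerate(frames[i]):
            if mag < max_residual[b]:
                mag = min(frames[i - 1][b], frames[i][b], frames[i + 1][b])
            frame.append(mag)
        smoothed.append(frame)
    return smoothed
```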

H. Additional Signal Attenuation During Nonspeech Activity

The energy content of $\hat{S}(e^{j\omega})$ relative to $\mu(e^{j\omega})$ provides an accurate indicator of the presence of speech activity within a given analysis frame. If speech activity is absent, then $\hat{S}(e^{j\omega})$ will consist of the noise residual which remains after half-wave rectification and minimum value selection. Empirically, it was determined that the average (before versus after) power ratio was down at least 12 dB. This implied a measure for detecting the absence of speech given by

$$T = 20 \log_{10} \left[ \frac{1}{2\pi} \int_{-\pi}^{\pi} \left| \frac{\hat{S}(e^{j\omega})}{\mu(e^{j\omega})} \right| d\omega \right].$$

If $T$ was less than $-12$ dB, the frame was classified as having no speech activity. During the absence of speech activity there are at least three options prior to resynthesis: do nothing, attenuate the output by a fixed factor, or set the output to zero. Having some signal present during nonspeech activity was judged to give the higher quality result. A possible reason for this is that noise present during speech activity is partially masked by the speech. Its perceived magnitude should be balanced by the presence of the same amount of noise during nonspeech activity. Setting the buffer to zero had the effect of amplifying the noise during speech activity. Likewise, doing nothing had the effect of amplifying the noise during nonspeech activity. A reasonable, though by no means optimum, amount of attenuation was found to be $-30$ dB. Thus, the output spectral estimate including output attenuation during nonspeech activity is given by

$$\hat{S}(e^{j\omega}) = \begin{cases} \hat{S}(e^{j\omega}), & T \ge -12 \text{ dB} \\ c\, X(e^{j\omega}), & T < -12 \text{ dB} \end{cases}$$

where $20 \log_{10} c = -30$ dB.
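A sketch of the detector and attenuation rule, using a discrete mean over bins in place of the integral (the function name and bin layout are assumptions):

```python
import math

def nonspeech_attenuation(S_hat_mag, X_mag, mu, thresh_db=-12.0, atten_db=-30.0):
    """Compute T = 20*log10(mean over bins of |S_hat(w)|/mu(w)); if T
    falls below the -12 dB threshold, classify the frame as nonspeech
    and output c*|X| with 20*log10(c) = -30 dB; otherwise pass the
    spectral subtraction estimate through unchanged."""
    ratio = sum(s / m for s, m in zip(S_hat_mag, mu)) / len(mu)
    T = 20.0 * math.log10(ratio) if ratio > 0.0 else float("-inf")
    if T >= thresh_db:
        return S_hat_mag, T
    c = 10.0 ** (atten_db / 20.0)
    return [c * x for x in X_mag], T
```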


    Fig. 2. Data segmentation and advance.

III. ALGORITHM IMPLEMENTATION

A. Introduction

Based on the development of the last section, a complete analysis-synthesis algorithm can be constructed. This section presents the specifications required to implement a spectral subtraction noise suppression system.

B. Input-Output Data Buffering and Windowing

Speech from the A-D converter is segmented and windowed such that in the absence of spectral modifications, if the synthesis speech segments are added together, the resulting overall system reduces to an identity. The data are segmented and windowed using the result [12] that if a sequence is separated into half-overlapped data buffers, and each buffer is multiplied by a Hanning window, then the sum of these windowed sequences adds back up to the original sequence. The window length is chosen to be approximately twice as large as the maximum expected pitch period for adequate frequency resolution [13]. For the sampling rate of 8.00 kHz, a window length of 256 points shifted in steps of 128 points was used. Fig. 2 shows the data segmentation and advance.

C. Frequency Analysis
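The half-overlap Hanning identity is easy to verify numerically; the 8-point window below is a toy stand-in for the paper's 256-point window:

```python
import math

def hanning_periodic(N):
    # Periodic Hanning window: w[n] = 0.5*(1 - cos(2*pi*n/N)).
    # With a hop of N/2, w[n] + w[n + N/2] = 1 for every n.
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / N)) for n in range(N)]

def window_and_overlap_add(x, N=8):
    """Split x into half-overlapped buffers, apply the Hanning window,
    and overlap-add. Away from the edges (where only one window
    contributes) the output reproduces x exactly."""
    w = hanning_periodic(N)
    hop = N // 2
    y = [0.0] * len(x)
    for start in range(0, len(x) - N + 1, hop):
        for n in range(N):
            y[start + n] += x[start + n] * w[n]
    return y
```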

The DFT of each data window is taken and the magnitude is computed. Since real data are being transformed, two data windows can be transformed using one FFT [14]. The FFT size is set equal to the window size of 256. Augmentation with zeros was not incorporated. As correctly noted by Allen [15], spectral modification followed by inverse transforming can distort the time waveform due to temporal aliasing caused by circular convolution with the time response of the modification. Augmenting the input time waveform with zeros before spectral modification will minimize this aliasing. Experiments with and without augmentation using the helicopter speech resulted in negligible differences, and therefore augmentation was not incorporated. Finally, since real data are analyzed, transform symmetries were taken advantage of to reduce storage requirements essentially in half [14].
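The two-windows-per-FFT trick mentioned above relies on conjugate symmetry; a sketch, with a naive DFT standing in for the FFT:

```python
import cmath

def dft(x):
    # Naive DFT standing in for an FFT in this illustration.
    L = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * w * k / L) for k in range(L))
            for w in range(L)]

def two_real_transforms(a, b):
    """Transform two real windows with one complex DFT: pack
    z = a + j*b, then recover A(w) = (Z(w) + conj(Z(L-w)))/2 and
    B(w) = (Z(w) - conj(Z(L-w)))/(2j)."""
    L = len(a)
    Z = dft([ai + 1j * bi for ai, bi in zip(a, b)])
    A, B = [], []
    for w in range(L):
        Zc = Z[(L - w) % L].conjugate()
        A.append((Z[w] + Zc) / 2.0)
        B.append((Z[w] - Zc) / 2j)
    return A, B
```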

D. Magnitude Averaging

As was described in the previous section, the variance of the noise spectral estimate is reduced by averaging over as many spectral magnitude sets as possible. However, the nonstationarity of the speech limits the total time interval available for local averaging. The number of averages is limited by the number of analysis windows which can be fit into the stationary speech time interval. The choice of window length and averaging interval must compromise between conflicting requirements. For acceptable spectral resolution, a window length greater than twice the expected largest pitch period is required, with a 256-point window being used. For minimum noise variance, a large number of windows are required for averaging. Finally, for acceptable time resolution, a narrow analysis interval is required. A reasonable compromise between variance reduction and time resolution appears to be three averages. This results in an effective analysis time interval of 38 ms.

E. Bias Estimation

The spectral subtraction method requires an estimate at each frequency bin of the expected value of the noise magnitude spectrum,

$$\mu(e^{j\omega}) = E\{\,|N(e^{j\omega})|\,\}.$$

This estimate is obtained by averaging the signal magnitude spectrum $|X|$ during nonspeech activity. Estimating $\mu$ in this manner places certain constraints when implementing the method. If the noise remains stationary during the subsequent speech activity, then an initial startup or calibration period of noise-only signal is required. During this period (on the order of a third of a second) an estimate of $\mu$ can be computed. If the noise environment is nonstationary, then a new estimate of $\mu$ must be calculated prior to bias removal each time the noise spectrum changes. Since the estimate is computed using the noise-only signal during nonspeech activity, a voice switch is required. When the voice switch is off, an average noise spectrum can be recomputed. If the noise magnitude spectrum is changing faster than an estimate of it can be computed, then time averaging to estimate $\mu$ cannot be used. Likewise, if the expected value of the noise spectrum changes after an estimate of it has been computed, then noise reduction through bias removal will be less effective or even harmful, i.e., removing speech where little noise is present.

F. Bias Removal and Half-Wave Rectification
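A running mean over noise-only frames is one simple way to maintain the bias estimate; the incremental-update form below is an illustrative choice, not the paper's stated implementation:

```python
def update_noise_bias(mu, X_mag, count):
    """Incrementally average |X| over frames flagged as nonspeech by
    the voice switch. Returns the updated per-bin mean and the number
    of frames averaged so far."""
    count += 1
    mu = [m + (x - m) / count for m, x in zip(mu, X_mag)]
    return mu, count
```

Each noise-only frame from the calibration period (or any later voice-switch-off interval) is fed through this update; when the switch turns on, mu is frozen and used for bias removal.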

The spectral subtraction spectral estimate is obtained by subtracting the expected noise magnitude spectrum $\mu$ from the magnitude signal spectrum $|X|$. Thus

$$|\hat{S}(e^{j\omega_k})| = |X(e^{j\omega_k})| - \mu(e^{j\omega_k}), \quad \omega_k = \frac{2\pi k}{L}, \quad k = 0, 1, \ldots, L-1$$

where $L$ = DFT buffer length. After subtracting, the differenced values having negative magnitudes are set to zero (half-wave rectification). These


negative differences represent frequencies where the sum of speech plus local noise is less than the expected noise.

G. Residual Noise Reduction

As discussed in the previous section, the noise that remains after the mean is removed can be suppressed or even removed by selecting the minimum magnitude value from the three adjacent analysis frames in each frequency bin where the current amplitude is less than the maximum noise residual measured during nonspeech activity. This replacement procedure follows bias removal and half-wave rectification. Since the minimum is chosen from values on each side of the current time frame, the modification induces a one frame delay. The improvement in performance was judged superior to three frame averaging in that an equivalent amount of noise suppression resulted without the adverse effect of high-energy spectral smoothing. The following section presents examples of spectra with and without residual noise reduction.

H. Additional Noise Suppression During Nonspeech Activity

The final improvement in noise reduction is signal suppression during nonspeech activity. As was discussed, a balance must be maintained between the magnitude and characteristics of the noise that is perceived during speech activity and the noise that is perceived during speech absence.

An effective speech activity detector was defined using spectra generated by the spectral subtraction algorithm. This detector required the determination of a threshold signaling absence of speech activity. This threshold ($T = -12$ dB) was empirically determined to ensure that only signals definitely consisting of background noise would be attenuated.


    Fig. 3. System block diagram.

I. Synthesis

After bias removal, rectification, residual noise removal, and nonspeech signal suppression, a time waveform is reconstructed from the modified magnitude corresponding to the center window. Again, since only real data are generated, two time windows are computed simultaneously using one inverse FFT. The data windows are then overlap added to form the output speech sequence. The overall system block diagram is given in Fig. 3.
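Putting the pieces together, the core analysis-synthesis loop can be sketched end to end (residual noise reduction and nonspeech attenuation are omitted for brevity); the toy 8-point window and naive DFT are illustrative stand-ins for the 256-point FFT version described in the text:

```python
import cmath
import math

def dft(x):
    L = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * w * k / L) for k in range(L))
            for w in range(L)]

def idft(X):
    L = len(X)
    return [sum(X[w] * cmath.exp(2j * cmath.pi * w * k / L)
                for w in range(L)).real / L for k in range(L)]

def spectral_subtract(x, mu, N=8):
    """Window (half-overlapped Hanning), subtract the noise bias with
    half-wave rectification, keep the noisy phase, resynthesize by
    inverse transform, and overlap-add the frames."""
    w = [0.5 * (1.0 - math.cos(2.0 * math.pi * n / N)) for n in range(N)]
    hop = N // 2
    y = [0.0] * len(x)
    for s in range(0, len(x) - N + 1, hop):
        X = dft([x[s + n] * w[n] for n in range(N)])
        S = [max(abs(Xw) - mw, 0.0) * cmath.exp(1j * cmath.phase(Xw))
             for Xw, mw in zip(X, mu)]
        frame = idft(S)
        for n in range(N):
            y[s + n] += frame[n]
    return y
```

With mu identically zero the loop reduces to the half-overlap windowing identity, so interior output samples reproduce the input, which makes a useful regression check.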

IV. RESULTS

A. Introduction

Examples of the performance of spectral subtraction will be presented in two forms: isometric plots of time versus frequency magnitude spectra, with and without noise cancellation; and intelligibility and quality measurements obtained from the Diagnostic Rhyme Test (DRT) [11]. The DRT is a well-established method for evaluating speech processing devices. Testing and scoring of the DRT data base was provided by Dynastat Inc. [16]. A limited single speaker DRT test was used. The DRT data base consisted of 192 words using speaker RH recorded in a helicopter environment. A crew of 8 listeners was used.

The results are presented as follows: 1) short-time amplitude spectra of helicopter speech; 2) DRT intelligibility and quality scores on LPC vocoded speech using as input the data

given in 2); and 3) short-time spectra showing additional improvements in noise rejection through residual noise suppression and nonspeech signal attenuation.

B. Short-Time Spectra of Helicopter Speech

Isometric plots of time versus frequency magnitude spectra were constructed from the data by computing and displaying magnitude spectra from 64 overlapped Hanning windows. Each line represents a 128-point frequency analysis. Time increases from bottom to top and frequency from left to right. A 920 ms section of speech recorded with a noise-cancelling microphone in a helicopter environment is presented. The phrase "Save your" was filtered at 3.2 kHz and sampled at 6.67 kHz. Since the noise was acoustically added, no underlying clean speech signal is available. Fig. 4 shows the digitized time signal. Fig. 5 shows the average noise magnitude spectrum computed by averaging over the first 300 ms of nonspeech activity. The short-time spectrum of the noisy signal is shown in Fig. 6. Note the high amplitude, narrow-band ridges corresponding to the fundamental (1550 Hz) and first harmonic (3100 Hz) of the helicopter engine, as well as the ramped noise floor above 1800 Hz. Fig. 7 shows the result from bias removal and rectification. Figs. 8 and 9 show the noisy spectrum and the spectral subtraction estimate using three frame averaging.

These figures indicate that considerable noise rejection has


Fig. 4. Time waveform of helicopter speech, "Save your."

Fig. 5. Average noise magnitude of helicopter noise.

Fig. 6. Short-time spectrum of helicopter speech.

Fig. 7. Short-time spectrum using bias removal and half-wave rectification.

Fig. 8. Short-time spectrum of helicopter speech using three frame averaging.

Fig. 9. Short-time spectrum using bias removal and half-wave rectification after three frame averaging.


been achieved, although some noise residual remains. The next step was to quantitatively measure the effect of spectral subtraction on intelligibility and quality. For this task a limited single speaker DRT was invoked to establish an anchor point for credibility.

C. Intelligibility and Quality Results Using the DRT

The DRT data base consisted of 192 words recorded in a helicopter environment. The data base was filtered at 4 kHz and sampled at 8 kHz. During the pause between each word, the noise bias was updated. Six output speech files were generated: 1) digitized original; 2) speech resulting from bias removal and rectification without averaging; 3) speech resulting from bias removal and rectification using three averages; 4) an LPC vocoded version of the original speech; 5) an LPC vocoded version of 2); and 6) an LPC vocoded version of 3). The last three experiments were conducted to measure intelligibility and quality improvements resulting from the use of spectral subtraction as a preprocessor to an LPC analysis-synthesis device. The LPC vocoder used was a nonreal-time floating-point implementation [17]. A ten-pole autocorrelation implementation was used with a SIFT pitch tracker. The channel parameters used for synthesis were not quantized. Thus, any degradation would not be attributed to parameter quantization, but rather to the all-pole approximation to the spectrum and to the buzz-hiss approximation to the error signal. In addition, a frame rate of 40 frames/s was used, which is typical of 2400 bit/s implementations. The vocoder on 3.2 kHz filtered clean speech achieved a DRT score of 8.

In addition to intelligibility, a coarse measure of quality [19] was conducted using the same DRT data base. These quality scores are neither quantitatively nor qualitatively equivalent to the more rigorous quality tests such as PARM or DAM [20]. However, they do indicate on a relative scale improvements between data sets. Modern 2.4 kbit/s systems are expected to range from 45 to 50 on composite acceptability; unprocessed speech, 88-92.

The results of the tests are summarized in Tables I-IV. Tables I and II indicate that spectral subtraction alone does not decrease intelligibility, but does increase quality, especially in the areas of increased pleasantness and inconspicuousness of the noise background.
Tables III and IV clearly indicate that spectral subtraction can be used to improve the intelligibility and quality of speech processed through an LPC bandwidth compression device.

D. Short-Time Spectra Using Residual Noise Reduction and Nonspeech Signal Attenuation

Based on the promising results of these preliminary DRT experiments, the algorithm was modified to incorporate residual noise reduction and nonspeech signal attenuation. Fig. 10 shows the short-time spectra using the helicopter speech data with both modifications added. Note that now noise between words has been reduced below the resolution of the graph, and noise within the words has been significantly attenuated (compare with Fig. 7).

TABLE I
DIAGNOSTIC RHYME TEST SCORES

TABLE II
QUALITY RATINGS

TABLE III
DIAGNOSTIC RHYME TEST SCORES

TABLE IV
QUALITY RATINGS

Fig. 10. Short-time spectrum using bias removal, half-wave rectification, residual noise reduction, and nonspeech signal attenuation (helicopter speech).

V. SUMMARY AND CONCLUSIONS

A preprocessing noise suppression algorithm using spectral subtraction has been developed, implemented, and tested. Spectral estimates for the background noise were obtained from the input signal during nonspeech activity. The algorithm can be implemented using a single microphone source and requires about the same computation as a high-speed convolution. Its performance was demonstrated using short-time spectra with and without noise suppression and quantitatively tested for improvements in intelligibility and quality using the Diagnostic Rhyme Test conducted by Dynastat Inc. Results indicate overall significant improvements in quality and intelligibility when used as a preprocessor to an LPC speech analysis-synthesis vocoder.

ACKNOWLEDGMENT

The author would like to thank K. Evans for preparation of the manuscript and M. Milochik for preparation of the photographs.

REFERENCES

[1] B. Gold, "Robust speech processing," M.I.T. Lincoln Lab., Tech. Note 1976-6, DDC AD-A012 P99/0, Jan. 27, 1976.
[2] M. R. Sambur and N. S. Jayant, "LPC analysis/synthesis from speech inputs containing quantizing noise or additive white noise," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-24, pp. 488-494, Dec. 1976.
[3] D. Coulter, private communication.
[4] S. F. Boll, "Improving linear prediction analysis of noisy speech by predictive noise cancellation," in Proc. IEEE Int. Conf. on Acoust., Speech, Signal Processing, Philadelphia, PA, Apr. 12-14, 1976, pp. 10-13.
[5] J. S. Lim and A. V. Oppenheim, "All-pole modeling of degraded speech," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-26, pp. 197-210, June 1978.
[6] B. Gold, "Digital speech networks," Proc. IEEE, vol. 65, pp. 1636-1658, Dec. 1977.
[7] B. Beek, E. P. Neuberg, and D. C. Hodge, "An assessment of the technology of automatic speech recognition for military applications," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-25, pp. 310-322, Aug. 1977.
[8] J. D. Markel, "Text independent speaker identification from a large linguistically unconstrained time-spaced data base," in Proc. IEEE Int. Conf. on Acoust., Speech, Signal Processing, Tulsa, OK, Apr. 1978, pp. 287-291.
[9] B. Widrow et al., "Adaptive noise cancelling: Principles and applications," Proc. IEEE, vol. 63, pp. 1692-1716, Dec. 1975.
[10] S. F. Boll and D. Pulsipher, "Noise suppression methods for robust speech processing," Dep. Comput. Sci., Univ. Utah, Salt Lake City, Semi-Annu. Tech. Rep., Utec-CSc-77-202, pp. 50-54, Oct. 1977.
[11] W. D. Voiers, A. D. Sharpley, and C. H. Helmsath, "Research on diagnostic evaluation of speech intelligibility," AFSC, Final Rep., Contract AF19628-70-C-0182, 1973.
[12] M. R. Weiss, E. Aschkenasy, and T. W. Parsons, "Study and development of the INTEL technique for improving speech intelligibility," Nicolet Scientific Corp., Final Rep. NSC-FR/4023, Dec. 1974.
[13] J. Makhoul and J. Wolf, "Linear prediction and the spectral analysis of speech," Bolt, Beranek, and Newman Inc., BBN Rep. 2304, NTIS No. AD-749066, pp. 172-185, 1972.
[14] E. O. Brigham, The Fast Fourier Transform. Englewood Cliffs, NJ: Prentice-Hall, 1974.
[15] J. Allen, "Short term spectral analysis, synthesis, and modification by discrete Fourier transform," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-25, pp. 235-239, June 1977.
[16] Dynastat Inc., Austin, TX 78705.
[17] S. F. Boll, "Selected methods for improving synthesis speech quality using linear predictive coding: System description, coefficient smoothing and STREAK," Dep. Comput. Sci., Univ. Utah, Salt Lake City, UTEC-CS-74-151, Nov. 1974.
[18] J. D. Markel and A. H. Gray, Linear Prediction of Speech. New York: Springer-Verlag, 1976, pp. 206-210.
[19] In-house research, Dynastat Inc., Austin, TX.
[20] W. D. Voiers, "Diagnostic acceptability measure for speech communication systems," in Proc. Int. Conf. on Acoust., Speech, Signal Processing, Hartford, CT, May 1977, pp. 204-207.
[21] S. F. Boll, "Suppression of noise in speech using the SABER method," in Proc. IEEE Int. Conf. on Acoust., Speech, Signal Processing, Tulsa, OK, Apr. 1978, pp. 606-609.