
Deep Neural Networks for Acoustic Modeling in Speech Recognition

Hinton, Geoffrey, et al. "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups." IEEE Signal Processing Magazine 29.6 (2012): 82-97.

Presented by Peidong Wang, 04/04/2016

Content

• Speech Recognition System
• GMM-HMM Model
• Training Deep Neural Networks
• Generative Pretraining
• Experiments
• Discussion

Content

• Speech Recognition System
• GMM-HMM Model
• Training Deep Neural Networks
• Generative Pretraining
• Experiments
• Discussion

Speech Recognition System

• Goal
• Converting speech to text

• A Mathematical Perspective (see the toy sketch below)

$\hat{w} = \arg\max_{w} P(w \mid Y)$, or, by Bayes' rule, $\hat{w} = \arg\max_{w} P(Y \mid w)\,P(w)$
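
As a toy illustration of this decision rule (the hypotheses and probabilities below are invented for the example, not taken from the paper), the following Python snippet picks the word sequence w that maximizes P(Y|w)P(w):

```python
# Toy MAP decoding: pick the hypothesis w maximizing P(Y|w) * P(w).
# The hypothesis set and probabilities are made up for illustration only.
acoustic_likelihood = {"recognize speech": 1e-4, "wreck a nice beach": 3e-4}   # P(Y|w)
language_prior      = {"recognize speech": 1e-2, "wreck a nice beach": 1e-5}   # P(w)

best_w = max(acoustic_likelihood,
             key=lambda w: acoustic_likelihood[w] * language_prior[w])
print(best_w)  # "recognize speech": the language prior outweighs the higher acoustic score
```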

Content

• Speech Recognition System
• GMM-HMM Model
• Training Deep Neural Networks
• Generative Pretraining
• Experiments
• Discussion

GMM-HMM Model

• GMM and HMM
• GMM is short for Gaussian Mixture Model, and HMM is short for Hidden Markov Model.

• Predecessor of DNNs
• Before deep neural networks (DNNs), the most commonly used speech recognition systems consisted of GMMs and HMMs.

GMM-HMM Model

• HMM
• The HMM is used to deal with the temporal variability of speech.

• GMM
• The GMM is used to represent the relationship between HMM states and the acoustic input.

GMM-HMM Model

• Features
• The features are typically represented by concatenating Mel-frequency cepstral coefficients (MFCCs) or perceptual linear predictive coefficients (PLPs) computed from the raw waveform with their first- and second-order temporal differences.

GMM-HMM Model

• Shortcoming
• GMM-HMM models are statistically inefficient for modeling data that lie on or near a nonlinear manifold in the data space.
• For example, modeling the set of points that lie very close to the surface of a sphere.

Content

• Speech Recognition System
• GMM-HMM Model
• Training Deep Neural Networks
• Generative Pretraining
• Experiments
• Discussion

Training Deep Neural Networks

• Deep Neural Network (DNN)
• A DNN is a feed-forward, artificial neural network that has more than one layer of hidden units between its inputs and its outputs.
• With nonlinear activation functions, a DNN is able to model an arbitrary nonlinear function (projection from inputs to outputs). [*]

[*] Added by the presenter.

Training Deep Neural Networks

• Activation Function of the Output Units
• The activation function of the output units is the "softmax" function.
• The mathematical expression is as follows (see the sketch below).

$p_j = \frac{\exp(x_j)}{\sum_k \exp(x_k)}$
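
A minimal numpy sketch of the softmax; subtracting the maximum logit is a standard numerical-stability trick not mentioned on the slide:

```python
import numpy as np

def softmax(x):
    """p_j = exp(x_j) / sum_k exp(x_k); shifting by max(x) avoids overflow."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # probabilities that sum to 1
```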

Training Deep Neural Networks

• Objective Function
• When using the softmax output function, the natural objective function (cost function) C is the cross-entropy between the target probabilities d and the outputs of the softmax, p.
• The mathematical expression is as follows (see the sketch below).

$C = -\sum_j d_j \log p_j$
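
A small sketch of the cross-entropy cost on softmax outputs. The final line prints p - d, the well-known gradient of C with respect to the logits when p comes from a softmax (a standard result, not stated on the slide):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def cross_entropy(d, p, eps=1e-12):
    """C = -sum_j d_j * log(p_j) for target probabilities d and softmax outputs p."""
    return -np.sum(d * np.log(p + eps))

logits = np.array([2.0, 1.0, 0.1])
d = np.array([1.0, 0.0, 0.0])          # one-hot target
p = softmax(logits)
print(cross_entropy(d, p))
print(p - d)                            # gradient of C with respect to the logits
```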

Training Deep Neural Networks

• Weight Penalties and Early Stopping
• To reduce overfitting, large weights can be penalized in proportion to their squared magnitude, or the learning can simply be terminated at the point at which performance on a held-out validation set starts getting worse (see the sketch below).
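
A minimal sketch of both ideas on a toy linear-regression problem (the data, model, and hyperparameters are illustrative, not the paper's setup): the gradient includes an L2 weight penalty, and training stops once held-out error stops improving.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10)); w_true = rng.normal(size=10)
y = X @ w_true + 0.5 * rng.normal(size=200)
X_tr, y_tr, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

w = np.zeros(10)
lr, l2, patience = 0.01, 1e-3, 10
best_w, best_val, bad = w.copy(), np.inf, 0
for step in range(2000):
    grad = X_tr.T @ (X_tr @ w - y_tr) / len(y_tr) + l2 * w   # squared-error gradient + L2 weight penalty
    w -= lr * grad
    val = np.mean((X_val @ w - y_val) ** 2)                  # held-out validation error
    if val < best_val:
        best_w, best_val, bad = w.copy(), val, 0
    else:
        bad += 1
        if bad >= patience:                                  # early stopping
            break
print(best_val)
```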

Training Deep Neural Networks

• Overfitting Reduction
• Generally speaking, there are three methods.
• Weight penalties and early stopping can reduce the overfitting, but only by removing much of the modeling power.
• Very large training sets can reduce overfitting, but only by making training very computationally expensive.
• Generative pretraining

Content

• Speech Recognition System
• GMM-HMM Model
• Training Deep Neural Networks
• Generative Pretraining
• Experiments
• Discussion

Generative Pretraining

• Purpose
• The multiple layers of feature detectors (the result of this step) can be used as a good starting point for a discriminative "fine-tuning" phase, during which backpropagation through the DNN slightly adjusts the weights and improves the performance.
• In addition, this step can significantly reduce overfitting.

Generative Pretraining

• Restricted Boltzmann Machine (RBM)
• An RBM consists of a layer of stochastic binary "visible" units that represent binary input data, connected to a layer of stochastic binary hidden (latent) units that learn to model significant nonindependencies between the visible units.
• There are undirected connections between visible and hidden units, but no visible-visible or hidden-hidden connections.

Generative Pretraining

• Restricted Boltzmann Machine (RBM) (Cont'd)
• The framework of an RBM is shown below.

From: slides in CSE 5526 Neural Networks

Generative Pretraining

• Restricted Boltzmann Machine (RBM) (Cont'd)
• An RBM uses a single set of parameters, W, to define the joint probability of a vector of values of the observable variables, v, and a vector of values of the latent variables, h, via an energy function, E (see the sketch below).

$p(v,h;W) = \frac{1}{Z} e^{-E(v,h;W)}, \qquad Z = \sum_{v',h'} e^{-E(v',h';W)}$

$E(v,h) = -\sum_{i \in \text{visible}} a_i v_i \;-\; \sum_{j \in \text{hidden}} b_j h_j \;-\; \sum_{i,j} v_i h_j w_{ij}$
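
A tiny numpy sketch of this energy function for binary v and h (variable names and values are illustrative only):

```python
import numpy as np

def rbm_energy(v, h, W, a, b):
    """E(v, h) = -sum_i a_i v_i - sum_j b_j h_j - sum_{i,j} v_i h_j w_ij."""
    return -(a @ v) - (b @ h) - (v @ W @ h)

rng = np.random.default_rng(0)
n_vis, n_hid = 6, 4
W = 0.01 * rng.normal(size=(n_vis, n_hid))
a, b = np.zeros(n_vis), np.zeros(n_hid)
v = rng.integers(0, 2, n_vis).astype(float)
h = rng.integers(0, 2, n_hid).astype(float)
print(rbm_energy(v, h, W, a, b))
```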

Generative Pretraining

• Restricted Boltzmann Machine (RBM) (Cont'd)
• The probability that the network assigns to a visible vector, v, is given by summing over all possible hidden vectors.

$p(v) = \frac{1}{Z} \sum_{h} e^{-E(v,h)}$

• The derivative of the log probability of a training set with respect to a weight is surprisingly simple. The angle brackets denote expectations under the corresponding distribution.

$\frac{1}{N} \sum_{n=1}^{N} \frac{\partial \log p(v_n)}{\partial w_{ij}} = \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}$

Generative Pretraining

• Restricted Boltzmann Machine (RBM) (Cont'd)
• The learning rule is thus as follows.

$\Delta w_{ij} = \varepsilon \left( \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}} \right)$

• A better learning procedure is contrastive divergence (CD), which is shown below. The subscript "recon" denotes a step in CD in which the states of the visible units are set to 0 or 1 according to the current states of the hidden units (see the sketch below).

$\Delta w_{ij} = \varepsilon \left( \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{recon}} \right)$
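
A minimal numpy sketch of one CD-1 parameter update for a binary-binary RBM. Names, the learning rate, and the use of hidden probabilities (rather than samples) for the reconstruction statistics follow common practice and are not taken verbatim from the slides:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, a, b, lr=0.1, rng=np.random.default_rng(0)):
    """One contrastive-divergence (CD-1) update on a minibatch v0 of shape (N, n_vis)."""
    ph0 = sigmoid(v0 @ W + b)                         # p(h = 1 | data)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sampled binary hidden states
    pv1 = sigmoid(h0 @ W.T + a)                       # "reconstruction" of the visible units
    v1 = (rng.random(pv1.shape) < pv1).astype(float)  # visible units set to 0 or 1
    ph1 = sigmoid(v1 @ W + b)                         # p(h = 1 | reconstruction)
    # Delta w_ij = lr * (<v_i h_j>_data - <v_i h_j>_recon), averaged over the minibatch.
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)
    a += lr * (v0 - v1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)
    return W, a, b
```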

Generative Pretraining

• Modeling Real-Valued Data
• Real-valued data, such as MFCCs, are more naturally modeled by linear variables with Gaussian noise, and the RBM energy function can be modified to accommodate such variables, giving a Gaussian-Bernoulli RBM (GRBM).

$E(v,h) = \sum_{i \in \text{vis}} \frac{(v_i - a_i)^2}{2\sigma_i^2} \;-\; \sum_{j \in \text{hid}} b_j h_j \;-\; \sum_{i,j} \frac{v_i}{\sigma_i} h_j w_{ij}$

Generative Pretraining

• Stacking RBMs to Make a Deep Belief Network
• After training an RBM on the data, the inferred states of the hidden units can be used as data for training another RBM that learns to model the significant dependencies between the hidden units of the first RBM.
• This can be repeated as many times as desired to produce many layers of nonlinear feature detectors that represent progressively more complex statistical structure in the data (see the sketch below).
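
A schematic sketch of the greedy stacking procedure. `train_rbm` stands in for any RBM trainer (for example the CD-1 step sketched above, run over many minibatches); it is a placeholder, not a function from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_stack(data, hidden_sizes, train_rbm):
    """Greedy layer-wise pretraining: train an RBM on the data, then treat the inferred
    hidden activations as data for the next RBM, and so on for each requested layer."""
    weights, x = [], data
    for n_hidden in hidden_sizes:
        W, b_hid = train_rbm(x, n_hidden)   # assumed to return the RBM's weights and hidden biases
        weights.append((W, b_hid))
        x = sigmoid(x @ W + b_hid)          # hidden states become the next layer's "data"
    return weights                          # used to initialize the DNN before fine-tuning
```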

Generative Pretraining

• Stacking RBMs to Make a Deep Belief Network (Cont'd)

From: the paper

Generative Pretraining

• Interfacing a DNN with an HMM
• In an HMM framework, the hidden variables denote the states of the phone sequence, and the "visible" variables denote the feature vectors. [*]

[*] Added by the presenter.

From: Gales, Mark, and Steve Young. "The application of hidden Markov models in speech recognition." Foundations and Trends in Signal Processing 1.3 (2008): 195-304.

Generative Pretraining

• Interfacing a DNN with an HMM (Cont'd)
• To compute a Viterbi alignment or to run the forward-backward algorithm within the HMM framework, we require the likelihood p(AcousticInput | HMM state).
• A DNN, however, outputs probabilities of the form p(HMM state | AcousticInput).

Generative Pretraining

• Interfacing a DNN with an HMM (Cont'd)
• The posterior probabilities that the DNN outputs can be converted into scaled likelihoods by dividing them by the frequencies of the HMM states in the forced alignment that is used for fine-tuning the DNN (see the sketch below).
• Forced alignment is a procedure used to generate labels for the training process. [*]

[*] Added by the presenter.
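
A small sketch of the conversion, done in the log domain as is typical in HMM decoders (the numbers are made up):

```python
import numpy as np

def scaled_log_likelihoods(log_posteriors, state_priors, eps=1e-10):
    """Convert DNN outputs p(state | acoustics) into scaled likelihoods
    p(acoustics | state) / p(acoustics) by dividing by the state priors,
    i.e. the relative frequencies of the states in the forced alignment."""
    return log_posteriors - np.log(state_priors + eps)

# Toy usage: 3 frames, 4 HMM states.
posteriors = np.array([[0.7, 0.1, 0.1, 0.1],
                       [0.2, 0.5, 0.2, 0.1],
                       [0.1, 0.1, 0.2, 0.6]])
priors = np.array([0.4, 0.3, 0.2, 0.1])   # state frequencies from the alignment
print(scaled_log_likelihoods(np.log(posteriors), priors))
```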

Generative Pretraining

• Interfacing a DNN with an HMM (Cont'd)
• All of the likelihoods produced in this way are scaled by the same unknown factor of p(AcousticInput).
• Although this appears to have little effect on some recognition tasks, it can be important for tasks where training labels are highly unbalanced.

Content

• Speech Recognition System
• GMM-HMM Model
• Training Deep Neural Networks
• Generative Pretraining
• Experiments
• Discussion

Experiments

• Phonetic Classification and Recognition on TIMIT
• The TIMIT dataset is a relatively small dataset that provides a simple and convenient way of testing new approaches to speech recognition.

Experiments

• Phonetic Classification and Recognition on TIMIT (Cont'd)

From: the paper

Experiments

• Bing-Voice-Search Speech Recognition Task
• This task used 24 h of training data with a high degree of acoustic variability caused by noise, music, side-speech, accents, sloppy pronunciation, etc.
• The best DNN-HMM acoustic model achieved a sentence accuracy of 69.6% on the test set, compared with 63.8% for a strong, minimum phone error (MPE)-trained GMM-HMM baseline.

Experiments

• Bing-Voice-Search Speech Recognition Task (Cont'd)

From: the paper

Experiments

• Other Large Vocabulary Tasks
• Switchboard Speech Recognition Task (a corpus containing over 300 h of training data)
• Google Voice Input Speech Recognition Task
• YouTube Speech Recognition Task
• English Broadcast News Speech Recognition Task

Experiments

• Other Large Vocabulary Tasks (Cont'd)

From: the paper

Content

• Speech Recognition System
• GMM-HMM Model
• Training Deep Neural Networks
• Generative Pretraining
• Experiments
• Discussion

Discussion

• Convolutional DNNs for Phone Classification and Recognition
• Although convolutional models along the temporal dimension achieved good classification results on the TIMIT corpus, applying them to phone recognition is not straightforward.
• This is because temporal variations in speech can be partially handled by the dynamic programming procedure in the HMM component and by hidden trajectory models.

Discussion

• Speeding Up DNNs at Recognition Time
• The time that a DNN-HMM system requires to recognize 1 s of speech can be reduced from 1.6 s to 210 ms, without decreasing recognition accuracy, by quantizing the weights down to 8 bits and using a CPU.
• Alternatively, it can be reduced to 66 ms by using a graphics processing unit (GPU).

Discussion

• Alternative Pretraining Methods for DNNs
• It is possible to learn a DNN by starting with a shallow neural net with a single hidden layer. Once this net has been trained discriminatively, a second hidden layer is interposed between the first hidden layer and the softmax output units, and the whole network is again discriminatively trained. This can be continued until the desired number of hidden layers is reached, after which full backpropagation fine-tuning is applied.

Discussion

• Alternative Pretraining Methods for DNNs (Cont'd)
• Purely discriminative training of the whole DNN from random initial weights works well, too.
• Various types of autoencoders with one hidden layer can also be used in the layer-by-layer generative pretraining process.

Discussion

• Alternative Fine-Tuning Methods for DNNs
• Most DBN-DNN acoustic models are fine-tuned by applying stochastic gradient descent with momentum to small minibatches of training cases.
• More sophisticated optimization methods can be used, but it is not clear that they are worthwhile, since the fine-tuning process is typically stopped early to prevent overfitting.

Discussion

• Using DBN-DNNs to Provide Input Features for GMM-HMM Systems
• This class of methods uses neural networks to provide the feature vectors for the training process of the GMM in a GMM-HMM system.
• The most common approach is to train a randomly initialized neural net with a narrow bottleneck middle layer and to use the activations of the bottleneck hidden units as features (see the sketch below).
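
A schematic numpy sketch of the bottleneck idea: a forward pass through an (untrained, randomly initialized) net with a narrow middle layer, returning the bottleneck activations as features. Layer sizes are illustrative only:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
dims = [429, 1024, 39, 1024, 2000]          # narrow 39-unit bottleneck in the middle (toy sizes)
Ws = [0.01 * rng.normal(size=(m, n)) for m, n in zip(dims[:-1], dims[1:])]

def bottleneck_features(x, Ws, bottleneck_index=2):
    """Forward pass that stops at the bottleneck layer and returns its activations."""
    h = x
    for i, W in enumerate(Ws):
        h = sigmoid(h @ W)
        if i + 1 == bottleneck_index:
            return h                         # features handed to the GMM-HMM system
    return h

print(bottleneck_features(rng.normal(size=(10, 429)), Ws).shape)   # (10, 39)
```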

Discussion

• Using DNNs to Estimate Articulatory Features for Detection-Based Speech Recognition
• DBN-DNNs are effective for detecting subphonetic speech attributes (also known as phonological or articulatory features).

Discussion

• Summary
• Most of the gain comes from using DNNs to exploit information in neighboring frames and from modeling tied context-dependent states.
• There is no reason to believe that the optimal types of hidden units or the optimal network architectures are being used, and it is highly likely that both the pretraining and fine-tuning algorithms can be modified to reduce the amount of overfitting and the amount of computation.

Thank You!


Investigation of Speech Separation as a Front-End for Noise Robust Speech Recognition

Narayanan, Arun, and DeLiang Wang. "Investigation of speech separation as a front-end for noise robust speech recognition." IEEE/ACM Transactions on Audio, Speech, and Language Processing 22.4 (2014): 826-835.

Presented by Peidong Wang, 04/04/2016

Content

• Introduction
• System Description
• Evaluation Results
• Discussion

Content

• Introduction
• System Description
• Evaluation Results
• Discussion

Introduction

• Background
• Although automatic speech recognition (ASR) systems have become fairly powerful, the inherent variability can still pose challenges.
• Typically, ASR systems that work well in clean conditions suffer from a drastic loss of performance in the presence of noise.

Introduction

• Feature-Based Methods
• This class of methods focuses on feature extraction or feature normalization.
• Feature-based techniques have the potential to generalize well, but do not always produce the best results.

Introduction

• Two Groups of Feature-Based Methods
• When stereo [*] data is unavailable, prior knowledge about speech and/or noise is used, as in spectral-reconstruction-based missing-feature methods, direct masking methods, and feature enhancement methods.
• When stereo data is available, feature mapping methods and recurrent neural networks have been used.

[*] By stereo we mean noisy and the corresponding clean signals.

Introduction

• Model-Based Methods
• The ASR model parameters are adapted to match the distribution of noisy or enhanced features.
• Model-based methods work well when the underlying assumptions are met, but typically involve significant computational overhead.
• The best performances are usually obtained by combining feature-based and model-based methods.

Introduction

• Supervised Classification Based Speech Separation
• Stereo training data is also used by supervised classification based speech separation algorithms.
• Such algorithms typically estimate the ideal binary mask (IBM), a binary mask defined in the time-frequency (T-F) domain that identifies speech-dominant and noise-dominant T-F units.
• The above method can be extended to the ideal ratio mask (IRM), which represents the ratio of speech to mixture energy.

Content

• Introduction
• System Description
• Evaluation Results
• Discussion

System Description

• Block Diagram of the Proposed System

From: the paper

System Description

• Addressing Additive Noise and Convolutional Distortion
• The additive noise and the convolutional distortion are dealt with in two separate stages: noise removal followed by channel compensation.
• Noise is removed via T-F masking using the IRM. To compensate for channel mismatch and the errors introduced by masking, a nonlinear mapping function is learned that undoes these distortions.

System Description

• Time-Frequency Masking

System Description

• Time-Frequency Masking (Cont'd)
• Here the authors perform T-F masking in the mel-frequency domain, unlike some of the other systems that operate in the gammatone feature domain.
• To obtain the mel-spectrogram of a signal, it is first pre-emphasized and transformed to the linear frequency domain using a 320-channel fast Fourier transform (FFT). A 20 ms Hamming window is used. The 161-dimensional spectrogram is then converted to a 26-channel mel-spectrogram (see the sketch below).
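
A numpy-only sketch of this front end. The 320-point FFT, 20 ms Hamming window, and 26 mel channels follow the slide; the 10 ms frame shift, pre-emphasis coefficient, and the exact triangular filterbank construction are common defaults rather than details from the paper:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_spectrogram(x, sr=16000, n_fft=320, hop=160, n_mels=26, preemph=0.97):
    """Pre-emphasis, 20-ms Hamming windows, 320-point FFT (161 bins),
    then a 26-channel triangular mel filterbank."""
    x = np.append(x[0], x[1:] - preemph * x[:-1])                  # pre-emphasis
    win = np.hamming(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2                # (n_frames, 161)

    # Triangular filters spaced evenly on the mel scale.
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fb[m - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    return power @ fb.T                                            # (n_frames, 26) mel-spectrogram

mel = mel_spectrogram(np.random.default_rng(0).normal(size=16000))  # 1 s of toy audio
print(mel.shape)
```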

System Description

• Time-Frequency Masking (Cont'd)
• The authors use DNNs to estimate the IRM, as DNNs show good performance and training using stochastic gradient descent scales well compared to other nonlinear discriminative classifiers.

System Description

• Time-Frequency Masking (Cont'd)
• Target Signal
• The ideal ratio mask is defined as the ratio of the clean signal energy to the mixture energy at each time-frequency unit.
• The mathematical expression is shown below (see also the sketch that follows).

$\mathrm{IRM}(t,f) = \frac{10^{\mathrm{SNR}(t,f)/10}}{10^{\mathrm{SNR}(t,f)/10} + 1}, \qquad \mathrm{SNR}(t,f) = 10 \log_{10}\!\big(X(t,f) / N(t,f)\big)$
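
A small numpy sketch of the IRM computed from per-unit speech and noise energies (toy values); note that the formula reduces to X / (X + N):

```python
import numpy as np

def ideal_ratio_mask(speech_energy, noise_energy, eps=1e-10):
    """IRM(t,f) = 10^(SNR/10) / (10^(SNR/10) + 1), SNR(t,f) = 10*log10(X/N)."""
    snr = 10.0 * np.log10((speech_energy + eps) / (noise_energy + eps))
    lin = 10.0 ** (snr / 10.0)
    return lin / (lin + 1.0)       # algebraically equal to X / (X + N)

X = np.array([[4.0, 1.0], [0.25, 9.0]])   # toy clean-speech energies per T-F unit
N = np.ones_like(X)                        # toy noise energies
print(ideal_ratio_mask(X, N))              # [[0.8, 0.5], [0.2, 0.9]]
```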

System Description

• Time-Frequency Masking (Cont'd)
• Target Signal
• Rather than estimating the IRM directly, the authors estimate a transformed version of the SNR.
• The mathematical expression of the sigmoidal transformation is shown below.

$d(t,f) = \frac{1}{1 + \exp\!\big(-\alpha \, (\mathrm{SNR}(t,f) - \beta)\big)}$

System Description

• Time-Frequency Masking (Cont'd)
• Target Signal
• During testing, the values output from the DNN are mapped back to their corresponding IRM values (see the sketch below).
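
A small sketch of the sigmoidal target and its inverse mapping back to the IRM at test time. The values of alpha and beta here are placeholders; the paper chooses its own settings:

```python
import numpy as np

def snr_to_target(snr, alpha=0.5, beta=0.0):
    """d(t,f) = 1 / (1 + exp(-alpha * (SNR(t,f) - beta))); alpha, beta are placeholders."""
    return 1.0 / (1.0 + np.exp(-alpha * (snr - beta)))

def target_to_irm(d, alpha=0.5, beta=0.0):
    """Invert the sigmoid to recover the SNR, then map the SNR to the IRM."""
    d = np.clip(d, 1e-6, 1.0 - 1e-6)
    snr = beta - np.log(1.0 / d - 1.0) / alpha
    lin = 10.0 ** (snr / 10.0)
    return lin / (lin + 1.0)

print(target_to_irm(snr_to_target(np.array([-10.0, 0.0, 10.0]))))  # ~[0.09, 0.5, 0.91]
```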

System Description

• Time-Frequency Masking (Cont'd)
• Features
• Feature extraction is performed both at the fullband and the subband level.
• A combination of features is used: 31-dimensional MFCCs, 13-dimensional RASTA-filtered PLPs, and 15-dimensional amplitude modulation spectrogram (AMS) features.

System Description

• Time-Frequency Masking (Cont'd)
• Features
• The fullband features are derived by splicing together fullband MFCCs and RASTA-PLPs, along with their delta and acceleration components, and subband AMS features.
• The subband features are derived by splicing together subband MFCCs, RASTA-PLPs, and AMS features. Some auxiliary components are also added.

System Description

• Time-Frequency Masking (Cont'd)
• Supervised Learning
• IRM estimation is performed in two stages. In the first stage, multiple DNNs are trained using fullband and subband features. In the second stage, the final estimate is obtained using an MLP that combines the outputs of the fullband and the subband DNNs.

System Description

• Time-Frequency Masking (Cont'd)
• Supervised Learning
• The fullband DNNs would be cognizant of the overall spectral shape of the IRM and the information conveyed by the fullband features, whereas the subband DNNs are expected to be more robust to noise occurring at frequencies outside their passband.

System Description

• Time-Frequency Masking (Cont'd)

From: the paper

System Description

• Feature Mapping

System Description

• Feature Mapping (Cont'd)
• Even after T-F masking, channel mismatch can still significantly impact performance.
• This happens for two reasons. Firstly, the algorithm learns to estimate the ratio mask using mixtures of speech and noise recorded using a single microphone. Secondly, because channel mismatch is convolutional, speech and noise, which now includes both background noise and convolutive noise, are clearly not uncorrelated.

System Description

• Feature Mapping (Cont'd)
• The goal of feature mapping in this work is to learn the spectro-temporal correlations that exist in speech, in order to undo the distortions introduced by unseen microphones and by the first stage of the algorithm.

System Description

• Feature Mapping (Cont'd)
• Target Signal
• The target is the clean log-mel spectrogram (LMS). The "clean" LMS here corresponds to that obtained from the clean signals recorded using a single microphone in a single filter setting.

System Description

• Feature Mapping (Cont'd)
• Target Signal
• Instead of using the LMS directly as the target, the authors apply a linear transform to limit the target values to the range [0, 1], so that the sigmoidal transfer function can be used for the output layer of the DNN.
• The mathematical expression is as follows.

$X_d(t,f) = \frac{\ln(X(t,f)) - \min\big(\ln(X(\cdot,f))\big)}{\max\big(\ln(X(\cdot,f))\big) - \min\big(\ln(X(\cdot,f))\big)}$

System Description

• Feature Mapping (Cont'd)
• Target Signal
• During testing, the output of the DNN is mapped back to the dynamic range of the utterances in the training set (see the sketch below).
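
A small numpy sketch of the per-channel min-max normalization and its inverse. Here the min/max statistics come from the same toy array; in the paper they would come from the training set, so the inverse mapping at test time reuses the training-set dynamic range:

```python
import numpy as np

def normalize_lms(lms):
    """X_d(t,f) = (ln X(t,f) - min_t ln X(t,f)) / (max_t ln X(t,f) - min_t ln X(t,f)),
    computed separately for each frequency channel f; `lms` is already ln X."""
    lo = lms.min(axis=0, keepdims=True)
    hi = lms.max(axis=0, keepdims=True)
    return (lms - lo) / (hi - lo + 1e-10), (lo, hi)

def denormalize_lms(xd, lo, hi):
    """Map DNN outputs in [0, 1] back to the stored dynamic range."""
    return xd * (hi - lo) + lo

lms = np.log(np.random.default_rng(0).random((100, 26)) + 1e-3)   # toy log-mel spectrogram (frames x channels)
xd, (lo, hi) = normalize_lms(lms)
print(np.allclose(denormalize_lms(xd, lo, hi), lms, atol=1e-6))   # True
```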

System Description

• Feature Mapping (Cont'd)
• Features
• The authors use both the noisy and the masked LMS.

• Supervised Learning
• Unlike the DNNs used for IRM estimation, the hidden layers of the DNN for this task use rectified linear units (ReLUs). In addition, the output layer uses sigmoid activations.

System Description

• Feature Mapping (Cont'd)

From: the paper

System Description

• Acoustic Modeling

System Description

• Acoustic Modeling (Cont'd)
• The acoustic models are trained using the Aurora-4 dataset.
• Aurora-4 is a 5000-word closed-vocabulary recognition task based on the Wall Street Journal database. The corpus has two training sets, clean and multi-condition, both with 7138 utterances.

System Description

• Acoustic Modeling (Cont'd)
• Gaussian Mixture Models
• The HMMs and the GMMs are initially trained using the clean training set. The clean models are then used to initialize the multi-condition models; both clean and multi-condition models have the same structure and differ only in transition and observation probability densities.

System Description

• Acoustic Modeling (Cont'd)
• Deep Neural Networks
• The authors first align the clean training set to obtain senone labels at each time frame for all utterances in the training set. DNNs are then trained to predict the posterior probability of senones, using either clean features or features extracted from the multi-condition set.

System Description

• Diagonal Feature Discriminant Linear Regression (dFDLR)

System Description

• Diagonal Feature Discriminant Linear Regression (Cont'd)
• dFDLR is a semi-supervised feature adaptation technique.
• The motivation for developing dFDLR is to address the problem of generalization to unseen microphone conditions in the dataset, which is where the DNN-HMM systems perform the worst.

System Description

• Diagonal Feature Discriminant Linear Regression (Cont'd)
• To apply dFDLR, an initial senone-level labeling of the test utterances is first obtained using the unadapted models. Features are then transformed to minimize the cross-entropy error in predicting these labels.
• The mathematical expressions are as follows.

$\hat{O}_t(f) = w_f \, O_t(f) + b_f$

$\min \sum_t E\big(s_t, \, D_{\text{out}}(\hat{O}_{t-5} \ldots \hat{O}_{t+5})\big)$

System Description

• Diagonal Feature Discriminant Linear Regression (Cont'd)
• The parameters can easily be learned within the DNN framework by adding a layer between the input layer and the first hidden layer of the original DNN. After initialization, the standard backpropagation algorithm is run for 10 epochs to learn the parameters of the dFDLR model. During backpropagation, the weights of the original hidden layers are kept unchanged and only the parameters of the dFDLR are updated (see the sketch below).
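
A toy numpy sketch of the dFDLR idea: a per-dimension scale w_f and bias b_f inserted below a frozen, randomly initialized "DNN", adapted to reduce the cross-entropy against the initial labels. For simplicity the gradient is taken by finite differences instead of backpropagation, and all sizes, data, and label generation are illustrative only:

```python
import numpy as np

def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))
def softmax(x): e = np.exp(x - x.max(axis=-1, keepdims=True)); return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
D, H, S = 20, 32, 10                              # feature dim, hidden units, senones (toy sizes)
W1 = 0.1 * rng.normal(size=(D, H))                # frozen "already trained" DNN weights
W2 = 0.1 * rng.normal(size=(H, S))

def dnn_posteriors(feats):                        # original DNN, kept unchanged during adaptation
    return softmax(sigmoid(feats @ W1) @ W2)

def adaptation_loss(params, feats, labels):
    w, b = params[:D], params[D:]                 # dFDLR: one scale and one bias per feature dimension
    p = dnn_posteriors(feats * w + b)             # O^_t(f) = w_f * O_t(f) + b_f
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

feats = rng.normal(size=(200, D))                 # toy test-utterance features
labels = dnn_posteriors(feats).argmax(axis=1)     # initial senone labels from the unadapted model

params = np.concatenate([np.ones(D), np.zeros(D)])  # identity initialization of the dFDLR layer
for epoch in range(10):                             # a few epochs of gradient descent on w_f, b_f only
    grad = np.zeros_like(params)
    for i in range(len(params)):                     # finite-difference gradient (slow but simple)
        e = np.zeros_like(params); e[i] = 1e-4
        grad[i] = (adaptation_loss(params + e, feats, labels)
                   - adaptation_loss(params - e, feats, labels)) / 2e-4
    params -= 0.5 * grad
print(adaptation_loss(params, feats, labels))
```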

Content

• Introduction
• System Description
• Evaluation Results
• Discussion

Evaluation Results

From: the paper

Evaluation Results

From: the paper

Content

• Introduction
• System Description
• Evaluation Results
• Discussion

Discussion

• Several interesting observations can be made from the results presented in the previous section.
• Firstly, the results clearly show that the speech separation front-end does a good job of removing noise and handling channel mismatch.
• Secondly, with no channel mismatch, T-F masking alone worked well in removing noise.

Discussion

• Finally, directly performing feature mapping from noisy features to clean features performs reasonably well, but it does not perform as well as the proposed front-end.

Thank You!

