
1.7. Gaussian Processes

Gaussian Processes (GP) are a generic supervised learning method designed to solve regression and probabilistic classification problems.

The advantages of Gaussian processes are:

- The prediction interpolates the observations (at least for regular kernels).
- The prediction is probabilistic (Gaussian) so that one can compute empirical confidence intervals and decide based on those if one should refit (online fitting, adaptive fitting) the prediction in some region of interest.
- Versatile: different kernels can be specified. Common kernels are provided, but it is also possible to specify custom kernels.

The disadvantages of Gaussian processes include:

- They are not sparse, i.e., they use the whole samples/features information to perform the prediction.
- They lose efficiency in high-dimensional spaces, namely when the number of features exceeds a few dozen.

1.7.1. Gaussian Process Regression (GPR)

The GaussianProcessRegressor implements Gaussian processes (GP) for regression purposes. For this, the prior of the GP needs to be specified. The prior mean is assumed to be constant and zero (for normalize_y=False) or the training data's mean (for normalize_y=True). The prior's covariance is specified by passing a kernel object. The hyperparameters of the kernel are optimized during fitting of GaussianProcessRegressor by maximizing the log-marginal-likelihood (LML) based on the passed optimizer. As the LML may have multiple local optima, the optimizer can be started repeatedly by specifying n_restarts_optimizer. The first run is always conducted starting from the initial hyperparameter values of the kernel; subsequent runs are conducted from hyperparameter values that have been chosen randomly from the range of allowed values. If the initial hyperparameters should be kept fixed, None can be passed as optimizer.

The noise level in the targets can be specified by passing it via the parameter alpha, either globally as a scalar or per datapoint. Note that a moderate noise level can also be helpful for dealing with numeric issues during fitting as it is effectively implemented as Tikhonov regularization, i.e., by adding it to the diagonal of the kernel matrix. An alternative to specifying the noise level explicitly is to include a WhiteKernel component into the kernel, which can estimate the global noise level from the data (see example below).

The implementation is based on Algorithm 2.1 of [RW2006]. In addition to the API of standard scikit-learn estimators, GaussianProcessRegressor:

- allows prediction without prior fitting (based on the GP prior)
- provides an additional method sample_y(X), which evaluates samples drawn from the GPR (prior or posterior) at given inputs
- exposes a method log_marginal_likelihood(theta), which can be used externally for other ways of selecting hyperparameters, e.g., via Markov chain Monte Carlo.
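A minimal sketch of this workflow; the toy data, kernel choice and hyperparameter values below are illustrative assumptions, not taken from the text:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    rng = np.random.RandomState(0)
    X = rng.uniform(0, 5, 20)[:, np.newaxis]
    y = np.sin(X).ravel() + rng.normal(0, 0.1, X.shape[0])

    # alpha adds noise to the diagonal of the kernel matrix (Tikhonov regularization).
    gpr = GaussianProcessRegressor(kernel=1.0 * RBF(length_scale=1.0),
                                   alpha=0.1, n_restarts_optimizer=5)

    # Prediction is possible even before fitting; it is then based on the GP prior.
    y_prior_mean, y_prior_std = gpr.predict(X, return_std=True)

    gpr.fit(X, y)
    y_mean, y_std = gpr.predict(X, return_std=True)  # probabilistic prediction
    samples = gpr.sample_y(X, n_samples=3)           # draws from the posterior
    lml = gpr.log_marginal_likelihood(gpr.kernel_.theta)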

1.7.2. GPR examples

1.7.2.1. GPR with noise-level estimation

This example illustrates that GPR with a sum-kernel including a WhiteKernel can estimate the noise level of data. An illustration of the log-marginal-likelihood (LML) landscape shows that there exist two local maxima of LML.


The first corresponds to a model with a high noise level and a large length scale, which explains all variations in the data by noise.

The second one has a smaller noise level and shorter length scale, which explains most of the variation by the noise-free functional relationship. The second model has a higher likelihood; however, depending on the initial value for the hyperparameters, the gradient-based optimization might also converge to the high-noise solution. It is thus important to repeat the optimization several times for different initializations.
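A condensed sketch of such a sum-kernel setup; the toy data generation below is an illustrative assumption:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.RandomState(0)
    X = rng.uniform(0, 5, 25)[:, np.newaxis]
    y = 0.5 * np.sin(3 * X).ravel() + rng.normal(0, 0.5, X.shape[0])

    # The WhiteKernel component lets GPR estimate the noise level itself,
    # so no explicit alpha is needed here.
    kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0)

    # Restarting the optimizer reduces the risk of ending up in the
    # high-noise local maximum of the LML discussed above.
    gpr = GaussianProcessRegressor(kernel=kernel, alpha=0.0,
                                   n_restarts_optimizer=9).fit(X, y)
    print(gpr.kernel_)  # optimized kernel, including the estimated noise level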


1.7.2.2. Comparison of GPR and Kernel Ridge Regression

Both kernel ridge regression (KRR) and GPR learn a target function by employing internally the "kernel trick". KRR learns a linear function in the space induced by the respective kernel which corresponds to a non-linear function in the original space. The linear function in the kernel space is chosen based on the mean-squared error loss with ridge regularization. GPR uses the kernel to define the covariance of a prior distribution over the target functions and uses the observed training data to define a likelihood function. Based on Bayes theorem, a (Gaussian) posterior distribution over target functions is defined, whose mean is used for prediction.

A major difference is that GPR can choose the kernel's hyperparameters based on gradient-ascent on the marginal likelihood function while KRR needs to perform a grid search on a cross-validated loss function (mean-squared error loss). A further difference is that GPR learns a generative, probabilistic model of the target function and can thus provide meaningful confidence intervals and posterior samples along with the predictions while KRR only provides predictions.

The following figure illustrates both methods on an artificial dataset, which consists of a sinusoidal target function and strong noise. The figure compares the learned model of KRR and GPR based on an ExpSineSquared kernel, which is suited for learning periodic functions. The kernel's hyperparameters control the smoothness (length_scale) and periodicity of the kernel (periodicity). Moreover, the noise level of the data is learned explicitly by GPR by an additional WhiteKernel component in the kernel and by the regularization parameter alpha of KRR.
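A sketch of the corresponding setup; the sinusoidal toy data and the grid of candidate hyperparameters are assumptions for illustration:

    import numpy as np
    from sklearn.kernel_ridge import KernelRidge
    from sklearn.model_selection import GridSearchCV
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import ExpSineSquared, WhiteKernel

    rng = np.random.RandomState(0)
    X = 15 * rng.rand(100, 1)
    y = np.sin(X).ravel() + 3 * (0.5 - rng.rand(X.shape[0]))  # strong noise

    # KRR: hyperparameters (and the noise via alpha) chosen by grid search.
    param_grid = {"alpha": [1e0, 1e-1, 1e-2, 1e-3],
                  "kernel": [ExpSineSquared(length_scale=l, periodicity=p)
                             for l in np.logspace(-2, 2, 5)
                             for p in np.logspace(0, 2, 5)]}
    krr = GridSearchCV(KernelRidge(), param_grid=param_grid).fit(X, y)

    # GPR: hyperparameters (including the noise level via WhiteKernel) chosen
    # by gradient ascent on the log-marginal-likelihood.
    kernel = 1.0 * ExpSineSquared(length_scale=1.0, periodicity=3.0) \
        + WhiteKernel(noise_level=1.0)
    gpr = GaussianProcessRegressor(kernel=kernel).fit(X, y)

    y_krr = krr.predict(X)
    y_gpr, y_std = gpr.predict(X, return_std=True)  # GPR also provides uncertainty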


The figure shows that both methods learn reasonable models of the target function. GPR correctly identifies the periodicity of the function to be roughly 2π (6.28), while KRR chooses the doubled periodicity. Besides that, GPR provides reasonable confidence bounds on the prediction which are not available for KRR. A major difference between the two methods is the time required for fitting and predicting: while fitting KRR is fast in principle, the grid search for hyperparameter optimization scales exponentially with the number of hyperparameters ("curse of dimensionality"). The gradient-based optimization of the parameters in GPR does not suffer from this exponential scaling and is thus considerably faster on this example with a 3-dimensional hyperparameter space. The time for predicting is similar; however, generating the variance of the predictive distribution of GPR takes considerably longer than just predicting the mean.

1.7.2.3. GPR on Mauna Loa CO2 data

This example is based on Section 5.4.3 of [RW2006]. It illustrates an example of complex kernel engineering and hyperparameter optimization using gradient ascent on the log-marginal-likelihood. The data consists of the monthly average atmospheric CO2 concentrations (in parts per million by volume (ppmv)) collected at the Mauna Loa Observatory in Hawaii, between 1958 and 1997. The objective is to model the CO2 concentration as a function of the time t.

The kernel is composed of several terms that are responsible for explaining different properties of the signal:

- A long-term, smooth rising trend is to be explained by an RBF kernel. The RBF kernel with a large length-scale enforces this component to be smooth; it is not enforced that the trend is rising, which leaves this choice to the GP. The specific length-scale and the amplitude are free hyperparameters.
- A seasonal component, which is to be explained by the periodic ExpSineSquared kernel with a fixed periodicity of 1 year. The length-scale of this periodic component, controlling its smoothness, is a free parameter. In order to allow decaying away from exact periodicity, the product with an RBF kernel is taken. The length-scale of this RBF component controls the decay time and is a further free parameter.
- Smaller, medium-term irregularities are to be explained by a RationalQuadratic kernel component, whose length-scale and alpha parameter, which determines the diffuseness of the length-scales, are to be determined. According to [RW2006], these irregularities can better be explained by a RationalQuadratic than an RBF kernel component, probably because it can accommodate several length-scales.
- A "noise" term, consisting of an RBF kernel contribution, which shall explain the correlated noise components such as local weather phenomena, and a WhiteKernel contribution for the white noise. The relative amplitudes and the RBF's length scale are further free parameters.
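A sketch of how such a composite kernel could be assembled in code; the initial hyperparameter values below are placeholders (they are optimized during fitting), not the values used in the original example:

    from sklearn.gaussian_process.kernels import (RBF, ExpSineSquared,
                                                  RationalQuadratic, WhiteKernel)

    # Long-term, smooth rising trend.
    long_term = 50.0**2 * RBF(length_scale=50.0)
    # Seasonal component: periodic kernel with a fixed period of 1 year,
    # multiplied by an RBF to allow decay away from exact periodicity.
    seasonal = 2.0**2 * RBF(length_scale=100.0) \
        * ExpSineSquared(length_scale=1.0, periodicity=1.0,
                         periodicity_bounds="fixed")
    # Smaller, medium-term irregularities.
    medium_term = 0.5**2 * RationalQuadratic(length_scale=1.0, alpha=1.0)
    # Correlated short-range noise plus white noise.
    noise = 0.1**2 * RBF(length_scale=0.1) + WhiteKernel(noise_level=0.01)

    kernel = long_term + seasonal + medium_term + noise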

Maximizing the log-marginal-likelihood after subtracting the target's mean yields the following kernel with an LML of -83.214:

34.4**2 * RBF(length_scale=41.8)
+ 3.27**2 * RBF(length_scale=180) * ExpSineSquared(length_scale=1.44, periodicity=1)
+ 0.446**2 * RationalQuadratic(alpha=17.7, length_scale=0.957)
+ 0.197**2 * RBF(length_scale=0.138)
+ WhiteKernel(noise_level=0.0336)

Thus, most of the target signal (34.4 ppm) is explained by a long-term rising trend (length-scale 41.8 years). The periodic component has an amplitude of 3.27 ppm, a decay time of 180 years and a length-scale of 1.44. The long decay time indicates that we have a locally very close to periodic seasonal component. The correlated noise has an amplitude of 0.197 ppm with a length scale of 0.138 years and a white-noise contribution of 0.197 ppm. Thus, the overall noise level is very small, indicating that the data can be very well explained by the model. The figure also shows that the model makes very confident predictions until around 2015.


1.7.3. Gaussian Process Classification (GPC)

The GaussianProcessClassifier implements Gaussian processes (GP) for classification purposes, more specifically for probabilistic classification, where test predictions take the form of class probabilities. GaussianProcessClassifier places a GP prior on a latent function f, which is then squashed through a link function to obtain the probabilistic classification. The latent function f is a so-called nuisance function, whose values are not observed and are not relevant by themselves. Its purpose is to allow a convenient formulation of the model, and f is removed (integrated out) during prediction. GaussianProcessClassifier implements the logistic link function, for which the integral cannot be computed analytically but is easily approximated in the binary case.

In contrast to the regression setting, the posterior of the latent function f is not Gaussian even for a GP prior since a Gaussian likelihood is inappropriate for discrete class labels. Rather, a non-Gaussian likelihood corresponding to the logistic link function (logit) is used. GaussianProcessClassifier approximates the non-Gaussian posterior with a Gaussian based on the Laplace approximation. More details can be found in Chapter 3 of [RW2006].

The GP prior mean is assumed to be zero. The prior's covariance is specified by passing a kernel object. The hyperparameters of the kernel are optimized during fitting of GaussianProcessClassifier by maximizing the log-marginal-likelihood (LML) based on the passed optimizer. As the LML may have multiple local optima, the optimizer can be started repeatedly by specifying n_restarts_optimizer. The first run is always conducted starting from the initial hyperparameter values of the kernel; subsequent runs are conducted from hyperparameter values that have been chosen randomly from the range of allowed values. If the initial hyperparameters should be kept fixed, None can be passed as optimizer.

GaussianProcessClassifier supports multi-class classification by performing either one-versus-rest or one-versus-one based training and prediction. In one-versus-rest, one binary Gaussian process classifier is fitted for each class, which is trained to separate this class from the rest. In "one_vs_one", one binary Gaussian process classifier is fitted for each pair of classes, which is trained to separate these two classes. The predictions of these binary predictors are combined into multi-class predictions. See the section on multi-class classification for more details.

In the case of Gaussian process classification, "one_vs_one" might be computationally cheaper since it has to solve many problems involving only a subset of the whole training set rather than fewer problems on the whole dataset. Since Gaussian process classification scales cubically with the size of the dataset, this might be considerably faster. However, note that "one_vs_one" does not support predicting probability estimates but only plain predictions. Moreover, note that GaussianProcessClassifier does not (yet) implement a true multi-class Laplace approximation internally, but as discussed above is based on solving several binary classification tasks internally, which are combined using one-versus-rest or one-versus-one.
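A minimal sketch of multi-class GPC; the iris data and RBF kernel are illustrative choices, not part of the text:

    from sklearn.datasets import load_iris
    from sklearn.gaussian_process import GaussianProcessClassifier
    from sklearn.gaussian_process.kernels import RBF

    X, y = load_iris(return_X_y=True)

    # One binary classifier per class (default strategy); "one_vs_one" is the
    # alternative but does not support predict_proba.
    gpc = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0),
                                    multi_class="one_vs_rest").fit(X, y)
    proba = gpc.predict_proba(X)  # class probabilities
    labels = gpc.predict(X)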

1.7.4. GPC examples

1.7.4.1. Probabilistic predictions with GPC


This example illustrates the predicted probability of GPC for an RBF kernel with different choices of the hyperparameters. The first figure shows the predicted probability of GPC with arbitrarily chosen hyperparameters and with the hyperparameters corresponding to the maximum log-marginal-likelihood (LML).

While the hyperparameters chosen by optimizing LML have a considerably larger LML, they perform slightly worse according to the log-loss on test data. The figure shows that this is because they exhibit a steep change of the class probabilities at the class boundaries (which is good) but have predicted probabilities close to 0.5 far away from the class boundaries (which is bad). This undesirable effect is caused by the Laplace approximation used internally by GPC.

The second figure shows the log-marginal-likelihood for different choices of the kernel's hyperparameters, highlighting the two choices of the hyperparameters used in the first figure by black dots.

1.7.4.2. Illustration of GPC on the XOR dataset

This example illustrates GPC on XOR data. Compared are a stationary, isotropic kernel (RBF) and a non-stationary kernel (DotProduct). On this particular dataset, the DotProduct kernel obtains considerably better results because the class boundaries are linear and coincide with the coordinate axes. In practice, however, stationary kernels such as RBF often obtain better results.


1.7.4.3. Gaussian process classification (GPC) on iris dataset

This example illustrates the predicted probability of GPC for an isotropic and an anisotropic RBF kernel on a two-dimensional version of the iris dataset. This illustrates the applicability of GPC to non-binary classification. The anisotropic RBF kernel obtains slightly higher log-marginal-likelihood by assigning different length-scales to the two feature dimensions.

1.7.5. Kernels for Gaussian Processes

Kernels (also called "covariance functions" in the context of GPs) are a crucial ingredient of GPs which determine the shape of prior and posterior of the GP. They encode the assumptions on the function being learned by defining the "similarity" of two datapoints combined with the assumption that similar datapoints should have similar target values. Two categories of kernels can be distinguished: stationary kernels depend only on the distance of two datapoints and not on their absolute values, and are thus invariant to translations in the input space, while non-stationary kernels depend also on the specific values of the datapoints. Stationary kernels can further be subdivided into isotropic and anisotropic kernels, where isotropic kernels are also invariant to rotations in the input space. For more details, we refer to Chapter 4 of [RW2006].

1.7.5.1. Gaussian Process Kernel API


The main usage of a Kernel is to compute the GP's covariance between datapoints. For this, the method __call__ of the kernel can be called. This method can either be used to compute the "auto-covariance" of all pairs of datapoints in a 2d array X, or the "cross-covariance" of all combinations of datapoints of a 2d array X with datapoints in a 2d array Y. The following identity holds true for all kernels k (except for the WhiteKernel): k(X) == k(X, Y=X).

If only the diagonal of the auto-covariance is being used, the method diag() of a kernel can be called, which is more computationally efficient than the equivalent call to __call__: np.diag(k(X, X)) == k.diag(X).

Kernels are parameterized by a vector θ of hyperparameters. These hyperparameters can for instance control length-scales or periodicity of a kernel (see below). All kernels support computing analytic gradients of the kernel's auto-covariance with respect to θ via setting eval_gradient=True in the __call__ method. This gradient is used by the Gaussian process (both regressor and classifier) in computing the gradient of the log-marginal-likelihood, which in turn is used to determine the value of θ which maximizes the log-marginal-likelihood, via gradient ascent. For each hyperparameter, the initial value and the bounds need to be specified when creating an instance of the kernel. The current value of θ can be retrieved and set via the property theta of the kernel object. Moreover, the bounds of the hyperparameters can be accessed via the property bounds of the kernel. Note that both properties (theta and bounds) return log-transformed values of the internally used values since those are typically more amenable to gradient-based optimization. The specification of each hyperparameter is stored in the form of an instance of Hyperparameter in the respective kernel. Note that a kernel using a hyperparameter with name "x" must have the attributes self.x and self.x_bounds.
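A small sketch of these calls; the RBF kernel and toy inputs are arbitrary:

    import numpy as np
    from sklearn.gaussian_process.kernels import RBF

    X = np.array([[1.0], [3.0], [5.0]])
    Y = np.array([[2.0], [4.0]])

    k = RBF(length_scale=1.0, length_scale_bounds=(1e-2, 1e2))

    K_auto = k(X)      # auto-covariance; identical to k(X, Y=X)
    K_cross = k(X, Y)  # cross-covariance between X and Y
    assert np.allclose(np.diag(k(X, X)), k.diag(X))

    # Gradient of the auto-covariance w.r.t. the (log-transformed) hyperparameters.
    K, K_gradient = k(X, eval_gradient=True)

    print(k.theta, k.bounds)  # both log-transformed
    k.theta = np.log([0.5])   # hyperparameters can also be set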

The abstract base class for all kernels is Kernel. Kernel implements an interface similar to that of Estimator, providing the methods get_params(), set_params(), and clone(). This also allows setting kernel values via meta-estimators such as Pipeline or GridSearchCV. Note that due to the nested structure of kernels (obtained by applying kernel operators, see below), the names of kernel parameters might become relatively complicated. In general, for a binary kernel operator, parameters of the left operand are prefixed with k1__ and parameters of the right operand with k2__. An additional convenience method is clone_with_theta(theta), which returns a cloned version of the kernel but with the hyperparameters set to theta. An illustrative example is shown below.

All Gaussian process kernels are interoperable with sklearn.metrics.pairwise and vice versa: instances of subclasses of Kernel can be passed as metric to pairwise_kernels from sklearn.metrics.pairwise. Moreover, kernel functions from pairwise can be used as GP kernels by using the wrapper class PairwiseKernel. The only caveat is that the gradient of the hyperparameters is not analytic but numeric and all those kernels support only isotropic distances. The parameter gamma is considered to be a hyperparameter and may be optimized. The other kernel parameters are set directly at initialization and are kept fixed.
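A sketch of both directions of this interoperability; the data and the choice of the "rbf" metric are illustrative:

    import numpy as np
    from sklearn.metrics.pairwise import pairwise_kernels
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, PairwiseKernel

    X = np.arange(5, dtype=float)[:, np.newaxis]
    y = np.sin(X).ravel()

    # A GP kernel instance used as the metric of pairwise_kernels.
    K = pairwise_kernels(X, metric=RBF(length_scale=1.0))

    # A pairwise kernel function used as a GP kernel via PairwiseKernel;
    # only gamma is treated as a hyperparameter and optimized (numeric gradient).
    gpr = GaussianProcessRegressor(kernel=PairwiseKernel(gamma=1.0, metric="rbf"))
    gpr.fit(X, y)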

1.7.5.2. Basic kernels

The ConstantKernel kernel can be used as part of a Product kernel where it scales the magnitude of the other factor (kernel) or as part of a Sum kernel, where it modifies the mean of the Gaussian process. It depends on a parameter constant_value. It is defined as:

k(x_i, x_j) = constant_value for all x_i, x_j

The main use case of the WhiteKernel kernel is as part of a sum-kernel where it explains the noise component of the signal. Tuning its parameter noise_level corresponds to estimating the noise level. It is defined as:

k(x_i, x_j) = noise_level if x_i == x_j, else 0

The following example illustrates a kernel combined via operators, its nested parameter names, and the log-transformed theta and bounds properties:

>>> from sklearn.gaussian_process.kernels import ConstantKernel, RBF
>>> kernel = ConstantKernel(constant_value=1.0, constant_value_bounds=(0.0, 10.0)) * RBF(length_scale=0.5, length_scale_bounds=(0.0, 10.0)) + RBF(length_scale=2.0, length_scale_bounds=(0.0, 10.0))
>>> for hyperparameter in kernel.hyperparameters: print(hyperparameter)
Hyperparameter(name='k1__k1__constant_value', value_type='numeric', bounds=array([[ 0., 10.]]), n_elements=1, fixed=False)
Hyperparameter(name='k1__k2__length_scale', value_type='numeric', bounds=array([[ 0., 10.]]), n_elements=1, fixed=False)
Hyperparameter(name='k2__length_scale', value_type='numeric', bounds=array([[ 0., 10.]]), n_elements=1, fixed=False)
>>> params = kernel.get_params()
>>> for key in sorted(params): print("%s : %s" % (key, params[key]))
k1 : 1**2 * RBF(length_scale=0.5)
k1__k1 : 1**2
k1__k1__constant_value : 1.0
k1__k1__constant_value_bounds : (0.0, 10.0)
k1__k2 : RBF(length_scale=0.5)
k1__k2__length_scale : 0.5
k1__k2__length_scale_bounds : (0.0, 10.0)
k2 : RBF(length_scale=2)
k2__length_scale : 2.0
k2__length_scale_bounds : (0.0, 10.0)
>>> print(kernel.theta)  # Note: log-transformed
[ 0.         -0.69314718  0.69314718]
>>> print(kernel.bounds)  # Note: log-transformed
[[      -inf  2.30258509]
 [      -inf  2.30258509]
 [      -inf  2.30258509]]



1.7.5.3. Kernel operators

Kernel operators take one or two base kernels and combine them into a new kernel. The Sum kernel takes two kernels k1 and k2 and combines them via k_sum(X, Y) = k1(X, Y) + k2(X, Y). The Product kernel takes two kernels k1 and k2 and combines them via k_product(X, Y) = k1(X, Y) * k2(X, Y). The Exponentiation kernel takes one base kernel and a scalar parameter p and combines them via k_exp(X, Y) = k(X, Y)^p.
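A short sketch of the three operators; the base kernels are arbitrary:

    from sklearn.gaussian_process.kernels import (RBF, ConstantKernel,
                                                  Sum, Product, Exponentiation)

    k1 = ConstantKernel(constant_value=2.0)
    k2 = RBF(length_scale=1.5)

    sum_kernel = k1 + k2       # equivalent to Sum(k1, k2)
    product_kernel = k1 * k2   # equivalent to Product(k1, k2)
    exp_kernel = k2 ** 2       # equivalent to Exponentiation(k2, exponent=2)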

1.7.5.4. Radial-basis function (RBF) kernel

The RBF kernel is a stationary kernel. It is also known as the "squared exponential" kernel. It is parameterized by a length-scale parameter l > 0, which can either be a scalar (isotropic variant of the kernel) or a vector with the same number of dimensions as the inputs x (anisotropic variant of the kernel). The kernel is given by:

k(x_i, x_j) = exp(-d(x_i, x_j)^2 / (2 * l^2))

where d(·, ·) is the Euclidean distance.
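A sketch of the isotropic and anisotropic variants; the length-scale values are arbitrary:

    from sklearn.gaussian_process.kernels import RBF

    isotropic = RBF(length_scale=1.0)           # one length-scale for all dimensions
    anisotropic = RBF(length_scale=[1.0, 5.0])  # one length-scale per input dimension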

This kernel is infinitely differentiable, which implies that GPs with this kernel as covariance function have mean squared derivatives of all orders, and are thus very smooth. The prior and posterior of a GP resulting from an RBF kernel are shown in the following figure:

1.7.5.5. Matérn kernel

The Matern kernel is a stationary kernel and a generalization of the RBF kernel. It has an additional parameter ν which controls the smoothness of the resulting function. It is parameterized by a length-scale parameter l > 0, which can either be a scalar (isotropic variant of the kernel) or a vector with the same number of dimensions as the inputs x (anisotropic variant of the kernel). The kernel is given by:

k(x_i, x_j) = (2^(1-ν) / Γ(ν)) * (sqrt(2ν) * d(x_i, x_j) / l)^ν * K_ν(sqrt(2ν) * d(x_i, x_j) / l)

where d(·, ·) is the Euclidean distance, K_ν(·) is a modified Bessel function and Γ(·) is the gamma function.


As ν → ∞, the Matérn kernel converges to the RBF kernel. When ν = 1/2, the Matérn kernel becomes identical to the absolute exponential kernel, i.e.,

k(x_i, x_j) = exp(-d(x_i, x_j) / l)

In particular, ν = 3/2:

k(x_i, x_j) = (1 + sqrt(3) * d(x_i, x_j) / l) * exp(-sqrt(3) * d(x_i, x_j) / l)

and ν = 5/2:

k(x_i, x_j) = (1 + sqrt(5) * d(x_i, x_j) / l + 5 * d(x_i, x_j)^2 / (3 * l^2)) * exp(-sqrt(5) * d(x_i, x_j) / l)

are popular choices for learning functions that are not infinitely differentiable (as assumed by the RBF kernel) but at least once (ν = 3/2) or twice differentiable (ν = 5/2).
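A sketch constructing these two popular variants:

    from sklearn.gaussian_process.kernels import Matern

    matern_once_diff = Matern(length_scale=1.0, nu=1.5)   # once differentiable
    matern_twice_diff = Matern(length_scale=1.0, nu=2.5)  # twice differentiable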

The flexibility of controlling the smoothness of the learned function via ν allows adapting to the properties of the true underlying functional relation. The prior and posterior of a GP resulting from a Matérn kernel are shown in the following figure:


See [RW2006], pp. 84 for further details regarding the different variants of the Matérn kernel.

1.7.5.6. Rational quadratic kernel

The RationalQuadratic kernel can be seen as a scale mixture (an infinite sum) of RBF kernels with different characteristic length-scales. It is parameterized by a length-scale parameter l > 0 and a scale mixture parameter α > 0. Only the isotropic variant where l is a scalar is supported at the moment. The kernel is given by:

k(x_i, x_j) = (1 + d(x_i, x_j)^2 / (2 * α * l^2))^(-α)

The prior and posterior of a GP resulting from a RationalQuadratic kernel are shown in the following figure:

1.7.5.7. Exp-Sine-Squared kernel

The ExpSineSquared kernel allows modeling periodic functions. It is parameterized by a length-scale parameter l > 0 and a periodicity parameter p > 0. Only the isotropic variant where l is a scalar is supported at the moment. The kernel is given by:

k(x_i, x_j) = exp(-2 * (sin(π * d(x_i, x_j) / p) / l)^2)

The prior and posterior of a GP resulting from an ExpSineSquared kernel are shown in the following figure:


1.7.5.8. Dot-Product kernel

The DotProduct kernel is non-stationary and can be obtained from linear regression by putting N(0, 1) priors on the coefficients of x_d (d = 1, ..., D) and a prior of N(0, σ_0^2) on the bias. The DotProduct kernel is invariant to a rotation of the coordinates about the origin, but not to translations. It is parameterized by a parameter σ_0^2. For σ_0^2 = 0, the kernel is called the homogeneous linear kernel; otherwise it is inhomogeneous. The kernel is given by:

k(x_i, x_j) = σ_0^2 + x_i · x_j

The DotProduct kernel is commonly combined with exponentiation. An example with exponent 2 is shown in the following figure:
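A sketch of this combination; the sigma_0 value is arbitrary:

    from sklearn.gaussian_process.kernels import DotProduct

    # Quadratic kernel obtained by exponentiating the DotProduct kernel.
    kernel = DotProduct(sigma_0=1.0) ** 2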