
Circular Regression Based on Gaussian Processes


Pablo Guerrero

Departamento de Ciencias de la Computación

Advanced Mining Technology Center

Universidad de Chile

Blanco Encalada 2120. Santiago. Chile.

Email: [email protected]

Javier Ruiz del Solar

Departamento de Ingeniería Eléctrica

Advanced Mining Technology Center

Universidad de Chile

Tupper 2007. Santiago. Chile.

Email: [email protected]

Abstract—Circular data is very relevant in many fields such as Geostatistics, Mobile Robotics and Pose Estimation. However, some existing angular regression methods do not cope with arbitrary nonlinear functions properly. Moreover, some other regression methods that do cope with nonlinear functions, like Gaussian Processes, are not designed to work well with angular responses. This paper presents two novel methods for circular regression based on Gaussian Processes. The proposed methods were tested on both synthetic data from basic functions, and real data obtained from a computer vision application. In these experiments, both proposed methods showed superior performance to that of Gaussian Processes.

I. INTRODUCTION

Circular data [1], [2], [3], [4], [5], [6] corresponds to 2D angular data. Circular data is usually associated with directions or time. This kind of data is present in many interesting problems in various areas of knowledge. Just to mention a few of these areas, circular data is of interest in biology [7], geology [8], [9], and meteorology [10], [11]. Circular data is also relevant for computer vision [12] and image processing [13]. For instance, in [12], the orientation of a car in a picture is estimated from features extracted from the picture. The availability of reliable methods for circular regression is of key importance for performing adequate estimations in those domains.

On the one hand, most of the existing regression methods for dealing with circular data do not cope well with the nonlinearities of the problem. On the other hand, there are nonlinear and nonparametric regression methods, such as Gaussian Processes (GPs), which are not well designed to cope with circular data. In fact, as will be shown in the results section of this paper, GPs perform poorly when solving a regression problem with an angular response. The reason for this poor performance appears to be that the predicted mean of a GP is always a linear combination of the training targets. However, a linear combination of angles does not cope well with the discontinuities that are inherent to an angular space.
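The discontinuity problem can be illustrated with a small numerical sketch. The two targets and the equal weights below are hypothetical values chosen for illustration, not data from the paper; the point is only that a linear combination of raw angle values can land far from targets that are close to each other on the circle.

```python
import numpy as np

# Two training targets that are only 2 degrees apart on the circle,
# but lie on opposite sides of the +/-180 degree cut.
targets_deg = np.array([179.0, -179.0])
weights = np.array([0.5, 0.5])  # hypothetical GP-style linear weights

# Linear combination of the raw angle values, as in a standard GP mean.
naive_prediction_deg = float(weights @ targets_deg)

# Distance on the circle from the naive prediction to each target,
# mapped to [-180, 180) before taking the absolute value.
gap_deg = np.abs((targets_deg - naive_prediction_deg + 180.0) % 360.0 - 180.0)

print(naive_prediction_deg)  # 0.0
print(gap_deg)               # [179. 179.]
```

The combined value points in almost exactly the opposite direction to both targets, which is the behavior the proposed methods are designed to avoid.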

The two methods proposed in this document follow the same methodology. This methodology is intended to overcome the problem of the discontinuities by employing a better strategy for combining angles, which consists of the following three steps: (i) getting an estimation of the sine and cosine of each angle, (ii) combining linearly the estimated sine and cosine values independently, and (iii) using the arctangent function to get the final estimation of the angle from the estimated sine and cosine values. This approach is based on the projected normal distribution (see [4] for example).
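The three steps can be sketched in a few lines. This is a minimal illustration under assumptions: the function name `combine_angles`, the example angles and the equal weights are hypothetical, and in the proposed methods the weights would come from the GP predictor rather than being fixed by hand.

```python
import numpy as np

def combine_angles(angles_rad, weights):
    """Combine angles through their sines and cosines (steps i to iii)."""
    sines = np.sin(angles_rad)        # step (i): sine of each angle
    cosines = np.cos(angles_rad)      # step (i): cosine of each angle
    s = weights @ sines               # step (ii): linear combination of sines
    c = weights @ cosines             # step (ii): linear combination of cosines
    return np.arctan2(s, c)           # step (iii): two-argument arctangent

# Hypothetical example: 170 and -150 degrees straddle the +/-180 degree cut.
angles = np.radians([170.0, -150.0])
weights = np.array([0.5, 0.5])
combined_deg = float(np.degrees(combine_angles(angles, weights)))
print(combined_deg)  # approximately -170, the direction halfway between the inputs
```

With equal weights the result is the direction that bisects the two inputs on the circle, whereas averaging the raw values would give 10 degrees.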

The structure of the document is as follows: Related work on circular regression and on nonlinear regression is reviewed in Section II. Then, the nonlinear regression based on Gaussian Processes is outlined in Section III. The proposed methods for nonlinear angular regression are described in Section IV. In Section V, real and simulated experiments and their results are presented. Finally, some conclusions are drawn in Section VI.

II. RELATED WORK

The regression problem can be defined as the problem of learning a function $f$ from finite paired realizations, $(x_i, y_i)$, $i \in \{1, \dots, N\}$, of a covariate $x$ and a response $y$. We are interested in the case in which $y$ is circular. Given a test point $x^* = (x_1^*, \dots, x_d^*)^\top$, a regression method estimates a prediction of the probability distribution of the related response $y^*$. The predicted mean $\mu^* = E(y^*)$ is of particular interest. The circular regression problem has been studied in the last few decades using various approaches. For instance, an early solution for the circular regression problem, proposed by [14], calculated $\mu^*$ as a linear combination of the covariate components:

$$\mu^* = \mu_0 + \sum_i \beta_i x_i^* \tag{1}$$

These kinds of regression models are known as helical. [15] noted that these models had infinitely many maxima in the likelihood function. To overcome this problem for a single covariate, they proposed a specific model for the joint distribution of $y$ and a linear variable, $x$, with a completely specified marginal distribution function $F(x)$. Then, they calculated the mean as:

$$\mu^* = \mu_0 + 2\pi F(x^*) \tag{2}$$

Following this idea, [3] introduced the use of a nonlinear link function, $g$, that maps $\mathbb{R}$ to $(-\pi, \pi)$ monotonically, and is applied to a linear combination of $x^*$:

$$\mu^* = \mu_0 + g\left(\sum_i \beta_i x_i^*\right) \tag{3}$$

2014 22nd International Conference on Pattern Recognition

1051-4651/14 $31.00 © 2014 IEEE

DOI 10.1109/ICPR.2014.631


[16] pointed out that these methods are practically unusable in the general case due to several flaws, including implausibility of fitted models, non-identifiability of parameters, and difficulties in the computation of the parameter estimates.

Finally, some Bayesian methods for circular regression have been proposed. For example, [4] proposed a model based on a projected normal distribution, where $u = (\cos y, \sin y)^\top$ is the projection onto the unit circle of the bivariate Gaussian $z$:

$$u = z/R, \quad R = \|z\| \tag{4}$$

Then, $\mu_z^* = E(z^*)$ is calculated as:

$$\mu_z^* = \sum_i \beta_i x_i^* \tag{5}$$

To infer the parameters $\beta_i$, a Bayesian procedure based on Gibbs sampling is proposed. As in [4], the method proposed in this paper uses the projected normal distribution to model the probability distribution of the response variable. However, differing from [4], the method proposed here uses Gaussian Processes (GPs) for estimating the mean and variance of the bivariate Gaussian $z$.
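The projection in Equation (4) can be made concrete by drawing samples. The sketch below is only illustrative: the mean `mu_z`, the isotropic standard deviation of 0.5 and the sample size are assumed values, not parameters taken from [4] or from this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

mu_z = np.array([1.0, 1.0])   # hypothetical mean of the bivariate Gaussian z
z = rng.normal(loc=mu_z, scale=0.5, size=(1000, 2))  # assumed isotropic covariance

R = np.linalg.norm(z, axis=1)     # R = ||z||
u = z / R[:, None]                # u = z / R lies on the unit circle
y = np.arctan2(u[:, 1], u[:, 0])  # angle y with u = (cos y, sin y)
```

Each row of `u` has unit norm, and `y` is the circular response whose distribution is the projected normal induced by `z`.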

There are also methods based on Gaussian Processes that allow the estimation of linear responses having circular covariates. For that purpose, periodic covariance functions have been used (see for instance [17]). Differing from those approaches, in this paper, circular responses are estimated using linear covariates.

III. GAUSSIAN PROCESSES FOR REGRESSION

Gaussian Processes (GPs) provide a non-parametric tool for non-linear regression and classification. An excellent summary that includes both theoretical and practical aspects and references to deeper theoretical insights can be found in [17]. Although we are only interested in the regression capabilities of GPs, they can also solve classification problems.

We are interested in the case when the observations in the training samples have an associated noise, i.e., $y_i = f(x_i) + \varepsilon_i$. GPs are able to solve this kind of regression problem at least when the noise $\varepsilon_i$, sometimes called the observational noise, is assumed to be Gaussian with zero mean and arbitrary variance $\sigma_n^2$. Furthermore, for any arbitrary test input, $x^*$, GPs are able to give a predictive mean and variance of the response $y^*$.

 A. Covariance Functions

Covariance functions are a key component of GPs for regression and classification. Covariance functions encode the information of the kind of functions that a GP can learn. They also restrict the possible measures of proximity that are necessary for the regression mechanism to operate. Usually covariance functions have parameters, called hyperparameters and denoted $\theta$. There are several methods for estimating the hyperparameters from the training data. In general terms, the covariance function is defined as the covariance of the values of $f$ at two points $x$ and $x'$:

$$k(x, x') = \mathrm{Cov}(f(x), f(x')) \tag{6}$$

There are several kinds of covariance functions examined in the literature, some of which are described in [17]. The selection of an adequate covariance function is crucial for the correct resolution of a given problem. The most commonly used covariance function is the squared exponential covariance function:

$$k_\theta(x, x') = \sigma_f^2 \exp\left(-\frac{1}{2}(x - x')^\top W (x - x')\right) \tag{7}$$

where $W$ is a diagonal matrix with scaling factors in its diagonal. Note that the subscript in $k_\theta$ indicates the dependence of the covariance function on the hyperparameters $\theta$.
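As an illustration of Equation (7), the following sketch evaluates the squared exponential covariance between two sets of points. The function name `sq_exp_kernel`, the convention of storing points as columns, and the example values are assumptions made for this sketch.

```python
import numpy as np

def sq_exp_kernel(XA, XB, sigma_f, w_diag):
    """Squared exponential covariance between columns of XA (d x m) and XB (d x n).

    w_diag holds the diagonal of W, i.e. one scaling factor per input dimension.
    Returns an (m x n) matrix of covariances.
    """
    diff = XA[:, :, None] - XB[:, None, :]                       # shape (d, m, n)
    quad = np.einsum("d,dmn->mn", np.asarray(w_diag, dtype=float), diff ** 2)
    return sigma_f ** 2 * np.exp(-0.5 * quad)

# Two 2D points stored as columns: (0, 0) and (1, 0).
X = np.array([[0.0, 1.0],
              [0.0, 0.0]])
K = sq_exp_kernel(X, X, sigma_f=2.0, w_diag=[1.0, 1.0])
```

The diagonal of `K` equals `sigma_f ** 2`, and the off-diagonal entries decay with the scaled squared distance between the points.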

 B. Prediction

Given a covariance function and a vector $\theta$ of values for its hyperparameters, a GP can predict the mean and variance of the function value for any arbitrary input point set. For convenience, given a set of input vectors, $\{x_1, \dots, x_m\}$, we will define its aggregated matrix column-wise as:

$$\mathrm{Agg}(\{x_1, \dots, x_m\}) = [x_1 | \dots | x_m] \tag{8}$$

Then, let $X$ denote the aggregated matrix of the training input set, $\{x_i\}$, and $y$ denote the transpose of the aggregated matrix of the training target set, $\{y_i\}$. Note that since the response is one-dimensional, $y$ is actually a vector. If we have a set of test inputs, $\{x_i^*\}$, then let $X^*$ denote the aggregated matrix of $\{x_i^*\}$. Additionally, let us define $f_i^* = f(x_i^*) \sim \mathcal{N}(\bar{f}_i^*, \mathrm{Var}(f_i^*))$, which, of course, is a random variable. Finally, let $f^*$ denote the aggregated matrix of $\{f_i^*\}$, $\bar{f}^*$ its mean, and $\mathrm{Cov}(f^*)$ its covariance matrix.

Given two aggregated matrices $X$ and $X'$ of the input sets $\{x_i\}$ and $\{x'_i\}$, respectively, we can define the covariance matrix between $X$ and $X'$, $K_\theta(X, X')$, as the matrix whose components are defined by:

$$K_\theta(X, X')_{i,j} = k_\theta(x_i, x'_j) \tag{9}$$

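Let us define $K_X = K_\theta(X, X)$ and call it simply the covariance matrix. Let us also define $K_n = K_X + \sigma_n^2 I$, the covariance matrix of the noisy training targets. If there is only one test point $x^*$, we can write $k_* = K_\theta(X, x^*)$ and then the predictive mean and variance of the function value $f^*$, for the input $x^*$, are [17]:

$$\bar{f}^* = k_*^\top K_n^{-1} y \tag{10}$$

$$\mathrm{Var}(f^*) = k_\theta(x^*, x^*) - k_*^\top K_n^{-1} k_* \tag{11}$$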

Then, given that $\varepsilon_i$ is assumed to have zero mean and variance $\sigma_n^2$, the response, $y^*$, at input point $x^*$ has mean $\bar{f}^*$ and variance $\mathrm{Var}(f^*) + \sigma_n^2$. Stated in equations,

$$E(y^*) = k_*^\top K_n^{-1} y \tag{12}$$

$$\mathrm{Var}(y^*) = k_\theta(x^*, x^*) - k_*^\top K_n^{-1} k_* + \sigma_n^2 \tag{13}$$
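Equations (10) to (13) translate almost directly into code. The sketch below assumes the squared exponential covariance of Equation (7), points stored as columns, and $K_n = K_X + \sigma_n^2 I$; the function names and the tiny one-dimensional data set are hypothetical.

```python
import numpy as np

def sq_exp_kernel(XA, XB, sigma_f, w_diag):
    """Squared exponential covariance between columns of XA (d x m) and XB (d x n)."""
    diff = XA[:, :, None] - XB[:, None, :]
    quad = np.einsum("d,dmn->mn", np.asarray(w_diag, dtype=float), diff ** 2)
    return sigma_f ** 2 * np.exp(-0.5 * quad)

def gp_predict(X, y, x_star, sigma_f, w_diag, sigma_n):
    """Predictive mean and variances at a single test point x_star."""
    N = X.shape[1]
    x_star = np.asarray(x_star, dtype=float).reshape(-1, 1)
    K_n = sq_exp_kernel(X, X, sigma_f, w_diag) + sigma_n ** 2 * np.eye(N)
    k_star = sq_exp_kernel(X, x_star, sigma_f, w_diag)[:, 0]
    k_ss = sq_exp_kernel(x_star, x_star, sigma_f, w_diag)[0, 0]
    mean = k_star @ np.linalg.solve(K_n, y)                   # Eqs. (10) and (12)
    var_f = k_ss - k_star @ np.linalg.solve(K_n, k_star)      # Eq. (11)
    var_y = var_f + sigma_n ** 2                              # Eq. (13)
    return mean, var_f, var_y

# Hypothetical 1D training set with three points, nearly noise-free.
X = np.array([[0.0, 1.0, 2.0]])
y = np.array([0.0, 1.0, 0.0])
mean, var_f, var_y = gp_predict(X, y, [1.0], sigma_f=1.0, w_diag=[1.0], sigma_n=1e-3)
```

With a very small noise level, the prediction at a training input is close to its target and the predictive variance of $f^*$ is close to zero. Solving linear systems with `np.linalg.solve` is used here instead of forming $K_n^{-1}$ explicitly, which is a common numerical choice rather than something prescribed by the paper.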


C. Learning

As has been stated before, there are automatic methodologies to learn the hyperparameters from the training data. One of these methodologies pursues the maximization of the log marginal likelihood, which corresponds to (see [17] for derivation):

$$\log p(y|X) = -\frac{1}{2} y^\top K_n^{-1} y - \frac{1}{2} \log |K_n| - \frac{N}{2} \log 2\pi \tag{14}$$

Then, the learning process consists of finding the hyperparameter vector $\theta$ that maximizes $\log p(y|X)$:

$$\theta = \arg\max_\theta \log p(y|X) \tag{15}$$

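Note that the hyperparameter vector contains a set of parameters for the covariance function that influence $\log p(y|X)$ through $K_X$, and the noise variance, $\sigma_n^2$, that influences $\log p(y|X)$ directly. A gradient-based optimization algorithm may be employed for finding the maximum; since the log marginal likelihood is not convex in general, the maximum found may be a local one. A useful expression for the maximization algorithm is the marginal likelihood gradient:

$$\frac{\partial \log p(y|X, \theta)}{\partial \theta_j} = \frac{1}{2} \mathrm{tr}\left(\left(\alpha \alpha^\top - K_n^{-1}\right) \frac{\partial K_n}{\partial \theta_j}\right) \tag{16}$$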

where $\alpha = K_n^{-1} y$ and $\theta_j$ is the $j$th element of $\theta$, i.e., the $j$th hyperparameter.
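Equations (14) and (16) can be checked numerically. The sketch below is written under assumptions: the hyperparameter vector is taken to be `(sigma_f, w, sigma_n)` with a single shared scaling factor ($W = wI$), and the data are random placeholders. The returned gradient can be compared with finite differences of the log marginal likelihood.

```python
import numpy as np

def sq_dists(X):
    """Pairwise squared Euclidean distances between the columns of X (d x N)."""
    diff = X[:, :, None] - X[:, None, :]
    return (diff ** 2).sum(axis=0)

def log_marginal_likelihood(theta, X, y):
    """Return log p(y|X) (Eq. 14) and its gradient with respect to theta (Eq. 16).

    theta = (sigma_f, w, sigma_n) is a hypothetical parameterization:
    signal scale, one shared scaling factor (W = w * I), and noise scale.
    """
    sigma_f, w, sigma_n = theta
    N = y.size
    sq = sq_dists(X)
    K_X = sigma_f ** 2 * np.exp(-0.5 * w * sq)
    K_n = K_X + sigma_n ** 2 * np.eye(N)
    K_inv = np.linalg.inv(K_n)
    alpha = K_inv @ y
    _, logdet = np.linalg.slogdet(K_n)
    logp = -0.5 * y @ alpha - 0.5 * logdet - 0.5 * N * np.log(2.0 * np.pi)
    # Derivatives of K_n with respect to sigma_f, w and sigma_n.
    dK = [2.0 * K_X / sigma_f, -0.5 * sq * K_X, 2.0 * sigma_n * np.eye(N)]
    A = np.outer(alpha, alpha) - K_inv
    grad = np.array([0.5 * np.trace(A @ dK_j) for dK_j in dK])
    return logp, grad

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(2, 8))                 # placeholder inputs
y = np.sin(3.0 * X[0]) + 0.1 * rng.standard_normal(8)   # placeholder targets
theta0 = np.array([1.3, 0.7, 0.4])
logp0, grad0 = log_marginal_likelihood(theta0, X, y)
```

In practice the pair `(logp, grad)` would be handed to a gradient-based optimizer; the explicit matrix inverse is kept here only to mirror the notation of Equation (16).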

IV. REGRESSION WITH AN ANGULAR RESPONSE

In this section, two novel GP-based methods for circular regression are proposed.

 A. SinCos-GP

The first method to be introduced solves the previously mentioned angle regression problem by separating it into two independent regression problems: one for the sine and the other for the cosine of the angle. For that reason, we call this method SinCos-GP. Two independent GPs are trained using the standard GP training procedure: one, $GP^{\sin}$, for the sine of the angle, and the other, $GP^{\cos}$, for the cosine. Then, two independent sets of hyperparameters must be learned:

$$\theta^{\cos} = \arg\max_\theta \log p(y_{\cos}|X) \tag{17}$$

$$\theta^{\sin} = \arg\max_\theta \log p(y_{\sin}|X) \tag{18}$$

where $y_{\cos}$ and $y_{\sin}$ are vectors whose elements are the cosine and the sine, respectively, of the elements of $y$. In element-wise equations, $[y_{\cos}]_i = \cos y_i$ and $[y_{\sin}]_i = \sin y_i$.

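Then, for each test input, SinCos-GP will predict the sine and the cosine of the angular response, $y^*$, independently:

$$y^*_{\cos} = E(\cos y^*) = (k_*^{\cos})^\top [K_n^{\cos}]^{-1} y_{\cos} \tag{19}$$

$$y^*_{\sin} = E(\sin y^*) = (k_*^{\sin})^\top [K_n^{\sin}]^{-1} y_{\sin} \tag{20}$$

where

$$K_n^{\cos} = K_X^{\cos} + (\sigma_n^{\cos})^2 I \tag{21}$$

$$K_X^{\cos} = K_{\theta^{\cos}}(X, X) \tag{22}$$

$$k_*^{\cos} = K_{\theta^{\cos}}(X, x^*) \tag{23}$$

and

$$K_n^{\sin} = K_X^{\sin} + (\sigma_n^{\sin})^2 I \tag{24}$$

$$K_X^{\sin} = K_{\theta^{\sin}}(X, X) \tag{25}$$

$$k_*^{\sin} = K_{\theta^{\sin}}(X, x^*) \tag{26}$$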

Finally, SinCos-GP approximates the mean of the angular response as the result of the two-argument arctangent function applied to the predicted sine and cosine:

$$E(y^*) = \arctan(y^*_{\sin}, y^*_{\cos}) \tag{27}$$
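The predictive part of SinCos-GP (Equations 19 to 27) can be sketched as follows. This is not the authors' implementation: the hyperparameters are fixed by hand instead of being learned through Equations (17) and (18), a single shared scaling factor is assumed for the squared exponential covariance, and the training grid and test point are hypothetical.

```python
import numpy as np

def sq_exp_kernel(XA, XB, sigma_f, w):
    """Squared exponential covariance with W = w * I; points are columns."""
    diff = XA[:, :, None] - XB[:, None, :]
    return sigma_f ** 2 * np.exp(-0.5 * w * (diff ** 2).sum(axis=0))

def gp_mean(X, targets, x_star, sigma_f, w, sigma_n):
    """Predictive mean k_*^T K_n^{-1} targets at a single test point."""
    K_n = sq_exp_kernel(X, X, sigma_f, w) + sigma_n ** 2 * np.eye(X.shape[1])
    k_star = sq_exp_kernel(X, x_star[:, None], sigma_f, w)[:, 0]
    return k_star @ np.linalg.solve(K_n, targets)

def sincos_gp_predict(X, y_rad, x_star, theta_cos, theta_sin):
    """Predict an angle from independent GPs on the cosine and sine targets."""
    y_cos_star = gp_mean(X, np.cos(y_rad), x_star, *theta_cos)   # Eq. (19)
    y_sin_star = gp_mean(X, np.sin(y_rad), x_star, *theta_sin)   # Eq. (20)
    return np.arctan2(y_sin_star, y_cos_star)                    # Eq. (27)

# Hypothetical training grid on [-1, 1]^2 and an angular target function.
axis = np.linspace(-1.0, 1.0, 16)
g1, g2 = np.meshgrid(axis, axis)
X = np.vstack([g1.ravel(), g2.ravel()])
y_rad = np.arctan2(X[1], X[0]) + np.pi / 4.0

theta = (1.0, 10.0, 0.05)            # assumed (sigma_f, w, sigma_n), not learned
x_star = np.array([-0.8, 0.05])      # test point close to the angular discontinuity
pred = sincos_gp_predict(X, y_rad, x_star, theta, theta)

true = np.arctan2(x_star[1], x_star[0]) + np.pi / 4.0
error_deg = (np.degrees(pred - true) + 180.0) % 360.0 - 180.0   # wrapped error
```

Because the sine and cosine targets are continuous across the cut of the arctangent, the prediction near the discontinuity is expected to remain close to the true angle once the error is wrapped.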

 B. Angle-GP

Angle-GP is also a solution for the linear-circular regression problem that is based on SinCos-GP. In fact, the whole predictive part of Angle-GP is identical to that of SinCos-GP (see Equations 19 to 27).

However, differing from SinCos-GP, Angle-GP assumes that the covariances of the sine and the cosine of the response are identical. For this purpose, the same hyperparameters are learned for both $GP^{\cos}$ and $GP^{\sin}$: $\theta^{\cos} = \theta^{\sin} = \theta$.

To account for the influence of both the sine and the cosine in the learning process, the log marginal likelihood, $\log L = \log p(y_{\cos}, y_{\sin}|X)$, of both $y_{\cos}$ and $y_{\sin}$ is maximized. Assuming that $y_{\cos}$ and $y_{\sin}$ are independent:

$$\begin{aligned}
\log L &= \log p(y_{\cos}|X) + \log p(y_{\sin}|X) \\
&= -\frac{1}{2} y_{\cos}^\top K_n^{-1} y_{\cos} - \frac{1}{2} y_{\sin}^\top K_n^{-1} y_{\sin} - \log |K_n| - N \log 2\pi
\end{aligned} \tag{28}$$

The hyperparameters are selected in order to maximize $\log L$:

$$\theta = \arg\max_\theta \log L \tag{29}$$

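The gradient of $\log L$ may be useful for performing this maximization. It can be calculated as:

$$\frac{\partial \log L}{\partial \theta_j} = \frac{1}{2} \mathrm{tr}\left(\left(\alpha_{\cos} \alpha_{\cos}^\top + \alpha_{\sin} \alpha_{\sin}^\top - 2 K_n^{-1}\right) \frac{\partial K_n}{\partial \theta_j}\right) \tag{30}$$

where $\alpha_{\cos} = K_n^{-1} y_{\cos}$ and $\alpha_{\sin} = K_n^{-1} y_{\sin}$. In fact,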


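$$\begin{aligned}
\frac{\partial \log L}{\partial \theta_j}
&= -\frac{1}{2} y_{\cos}^\top \frac{\partial K_n^{-1}}{\partial \theta_j} y_{\cos} - \frac{1}{2} y_{\sin}^\top \frac{\partial K_n^{-1}}{\partial \theta_j} y_{\sin} - \frac{\partial \log |K_n|}{\partial \theta_j} \\
&= \frac{1}{2} y_{\cos}^\top K_n^{-1} \frac{\partial K_n}{\partial \theta_j} K_n^{-1} y_{\cos} + \frac{1}{2} y_{\sin}^\top K_n^{-1} \frac{\partial K_n}{\partial \theta_j} K_n^{-1} y_{\sin} - \mathrm{tr}\left(K_n^{-1} \frac{\partial K_n}{\partial \theta_j}\right) \\
&= \frac{1}{2} \mathrm{tr}\left(\left(\alpha_{\cos} \alpha_{\cos}^\top + \alpha_{\sin} \alpha_{\sin}^\top - 2 K_n^{-1}\right) \frac{\partial K_n}{\partial \theta_j}\right)
\end{aligned}$$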

As in SinCos-GP, when a test input $x^*$ is presented, Angle-GP uses $GP^{\cos}$ and $GP^{\sin}$ to predict the means of the cosine and sine of the angle independently, and then applies the arctangent function to them in order to get the predicted mean of the angular response.
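The joint objective of Angle-GP (Equation 28) and its gradient (Equation 30) can be sketched with a shared covariance matrix. As before, this is an illustrative sketch under assumptions: the hyperparameter vector is taken to be `(sigma_f, w, sigma_n)` with $W = wI$, and the inputs and angles are random placeholders.

```python
import numpy as np

def sq_dists(X):
    """Pairwise squared Euclidean distances between the columns of X (d x N)."""
    diff = X[:, :, None] - X[:, None, :]
    return (diff ** 2).sum(axis=0)

def angle_gp_log_likelihood(theta, X, y_rad):
    """Return log L (Eq. 28) and its gradient (Eq. 30) for shared hyperparameters.

    theta = (sigma_f, w, sigma_n) is a hypothetical parameterization.
    """
    sigma_f, w, sigma_n = theta
    N = y_rad.size
    y_cos, y_sin = np.cos(y_rad), np.sin(y_rad)
    sq = sq_dists(X)
    K_X = sigma_f ** 2 * np.exp(-0.5 * w * sq)
    K_n = K_X + sigma_n ** 2 * np.eye(N)
    K_inv = np.linalg.inv(K_n)
    a_cos, a_sin = K_inv @ y_cos, K_inv @ y_sin
    _, logdet = np.linalg.slogdet(K_n)
    log_L = (-0.5 * y_cos @ a_cos - 0.5 * y_sin @ a_sin
             - logdet - N * np.log(2.0 * np.pi))
    # Derivatives of K_n with respect to sigma_f, w and sigma_n.
    dK = [2.0 * K_X / sigma_f, -0.5 * sq * K_X, 2.0 * sigma_n * np.eye(N)]
    A = np.outer(a_cos, a_cos) + np.outer(a_sin, a_sin) - 2.0 * K_inv
    grad = np.array([0.5 * np.trace(A @ dK_j) for dK_j in dK])
    return log_L, grad

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(2, 10))       # placeholder inputs
y_rad = np.arctan2(X[1], X[0]) + np.pi / 4.0   # placeholder angular targets
theta0 = np.array([1.0, 2.0, 0.3])
log_L0, grad0 = angle_gp_log_likelihood(theta0, X, y_rad)
```

Because a single $K_n$ is shared, both the sine and the cosine targets constrain the same hyperparameters, at the cost of one matrix factorization per evaluation instead of two.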

V. RESULTS

The SinCos-GP and Angle-GP methods are tested using both synthetic data generated in MATLAB and real data obtained from a database of car images. The performances of Angle-GP and SinCos-GP are compared to that of a GP. As a measure of performance, the root mean squared error (RMSE) is employed.

 A. Simulated Experiment 

In this experiment, each method was used to perform the regression of the following function:

$$y = \arctan(x_2, x_1) + 45^\circ \tag{31}$$

for a two-dimensional covariate $x = (x_1, x_2)^\top$. Regardless of the definition of the output interval of the arctangent function, the function was discontinuous at one point. The definition of the output interval selected for the purposes of this experiment was $(-180^\circ, 180^\circ]$. The size, $N$, of the training set varied from 5 to 100 samples, increasing in increments of 5. For each training set size, 100 trials were performed, with different randomly sampled training sets. In each training sample, $x_1$ and $x_2$ were independently sampled in the interval $[-1, 1]$. In order to check the regression accuracy of the methods, a fixed set of test points was obtained from a fixed $20 \times 20$ uniform grid in $[-1, 1] \times [-1, 1]$.

For each trial and for each method, the RMSE¹ of the prediction was calculated in order to have a global comparison measure. Figure 1 shows the results in terms of comparative performance for the described regression task using a GP, a SinCos-GP, and an Angle-GP.

From these results, it is clear that Angle-GP surpasses the performance of both the GP and the SinCos-GP. As the number of training samples grows, the three methods being compared improve their predictions and the accuracy of SinCos-GP comes closer to that of Angle-GP. However, the GP is not able to get a low RMSE even with a high number of training samples.

¹ Note that the angular error cannot be obtained by a simple subtraction. Sometimes ±360° must be added to the result of the subtraction in order to get an error in the (−180°, 180°] interval.
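The wrapped angular error described in the footnote, and the RMSE built on it, can be sketched as follows. The function names and the example values are hypothetical.

```python
import numpy as np

def angular_error_deg(pred_deg, true_deg):
    """Difference pred - true in degrees, wrapped to the interval (-180, 180]."""
    d = np.asarray(pred_deg, dtype=float) - np.asarray(true_deg, dtype=float)
    return 180.0 - (180.0 - d) % 360.0

def angular_rmse_deg(pred_deg, true_deg):
    """Root mean squared wrapped angular error, in degrees."""
    e = angular_error_deg(pred_deg, true_deg)
    return float(np.sqrt(np.mean(e ** 2)))

# A plain subtraction would give 358 degrees for the first pair;
# the wrapped error is -2 degrees.
errors = angular_error_deg([179.0, 10.0], [-179.0, 20.0])
rmse = angular_rmse_deg([179.0, 10.0], [-179.0, 20.0])
print(errors)  # [ -2. -10.]
print(rmse)    # sqrt(52), approximately 7.211
```

The modulo operation adds or subtracts 360° as needed, so no explicit case analysis is required.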

Fig. 1. Trial RMSE for each Training Set Size. [Plot: trial RMSE (°) against training set size, from 0 to 100, for GP, SinCos-GP and Angle-GP.]

Fig. 2. Whole Experiment RMSE. (a) 5 samples, (b) 100 samples. [Both panels are titled "Trial RMSE for each Angle Interval" and plot trial RMSE (°) against the angle interval, in 30° bins from 0° to 360°, for GP, SinCos-GP and Angle-GP.]

In order to explore the behavior of the described methods in more detail, their prediction performances over different subintervals of the angular response space were measured. For this purpose, the average prediction error was calculated for different subsets of the test sets. Each subset corresponds to a subinterval of the angular response space. Figure 2 shows the prediction RMSE through the whole experiment for each angle interval, with 5 and 100 training samples.

From Figure 2, it is possible to infer that the poor performance of the GP on the angle regression task is mainly due to the discontinuity at 180°. No apparent correlation between the target angle and the performances of SinCos-GP and Angle-GP is shown. These figures also confirm the consistent superiority of Angle-GP over SinCos-GP, which can be explained by the selection of a more adequate log marginal likelihood, in the sense that it reflects the symmetry of this particular problem.

 B. Car Pose Estimation

The following experiment consists of estimating the orientations of cars in an image dataset [12]. The dataset consists of pictures taken at a car show. Each image contains a single car in its foreground. The bounding box and ground-truth orientation of each image are available.

Fig. 3. Examples of Cropped Pictures from the Car Database [12].

There are 20 different car models rotating by 360 degrees with an image taken every 3 or 4 degrees. Figure 3 shows examples of images in the dataset.

The original dataset is split into two subsets, one for training (10 sequences) and one for testing (10 sequences). Since we are not concerned with the detection problem, the bounding box of the car in each image is provided to the system in advance and thus the image window containing the car is cropped so that the image just fits the car. In order to make the experiment less dependent on the selection of the training set, 10 trials of the experiment were performed and in each of them, a different random selection of the training and testing subsets was made.

In order to extract a feature vector from each image, we follow a very similar procedure to the one used by the authors of the database [12]. The procedure employed is described as follows: For every image window, the DAISY descriptors [18] of each pixel are extracted. A subset of 117 900 DAISY descriptors (100 per training image) is randomly sampled. Then, 100 clusters are extracted from the sampled subset using k-means. The resulting clusters are then used to classify every pixel, through its DAISY descriptor, in every image. Figure 4 shows an example of an image from the dataset and the resulting descriptor-class image. Then, a 4-level spatial pyramid of histograms [19] is created. A spatial pyramid considers frequencies of the different descriptor clusters not only in the whole image but also in different regions of it, yielding richer information about the positions in which these descriptors are found. Figure 5 illustrates a spatial pyramid with the same levels as the one mentioned above but with only 2 clusters of descriptors.

The resulting feature vector has 3 000 dimensions because for each of the 30 cells in the pyramid, there are 100 histograms (one per DAISY descriptor cluster). For the training stage, the training images were subsampled so that one of every ten was considered. Then, differing from [12], we used Principal Component Analysis (PCA) to reduce the dimensionality of the feature vector. Different numbers of principal components were chosen, from 5 to 100, in steps of 5.
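The dimensionality of the feature vector follows from the pyramid layout: 1 + 4 + 9 + 16 = 30 cells, each with 100 cluster counts. The sketch below builds such a vector from a descriptor-class image. It is an illustration under assumptions: the cell boundaries, the use of raw (unnormalized) counts, and the random label image are not taken from the paper.

```python
import numpy as np

def spatial_pyramid_histogram(label_img, n_clusters, levels=4):
    """Concatenate per-cell cluster histograms over 1x1, 2x2, ..., levels x levels grids."""
    h, w = label_img.shape
    feats = []
    for g in range(1, levels + 1):
        row_edges = np.linspace(0, h, g + 1).astype(int)
        col_edges = np.linspace(0, w, g + 1).astype(int)
        for r in range(g):
            for c in range(g):
                cell = label_img[row_edges[r]:row_edges[r + 1],
                                 col_edges[c]:col_edges[c + 1]]
                # Count how many pixels of the cell fall in each descriptor cluster.
                feats.append(np.bincount(cell.ravel(), minlength=n_clusters))
    return np.concatenate(feats)

rng = np.random.default_rng(0)
# Placeholder descriptor-class image: one cluster index per pixel.
label_img = rng.integers(0, 100, size=(48, 64))
features = spatial_pyramid_histogram(label_img, n_clusters=100)
print(features.shape)  # (3000,)
```

A dimensionality reduction step such as PCA would then be applied to the stacked feature vectors of the training images.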

Fig. 4. Exemplary Result: (a) a cropped car image, and (b) the resulting descriptor-class image.

Fig. 5. Spatial Pyramids with 4 Levels and 2 Descriptors. The pyramid levels have: (a) 1x1, (b) 2x2, (c) 3x3, and (d) 4x4 cells. The histogram of each descriptor is calculated in each pyramid cell.

The performance in terms of RMSE of the prediction was compared for each number of principal components. Figure 6 shows the RMSE of the rotation estimation as a function of the number of principal components used. Independent of the number of principal components, SinCos-GP and Angle-GP show a clearly lower RMSE than a GP.

Additionally, the three tested methods were able to reduce their RMSE as the number of components increased up to its maximum number (120). With a few exceptions, Angle-GP has a slightly lower RMSE than SinCos-GP. However, all the tested methods have a high RMSE (over 60 degrees). This is probably due to the lack of representative training data (only 120 images per trial) in the experiment.

Fig. 6. Mean Squared Error in the Rotation Angle Estimation for the Test Set. [Plot titled "RMSE for each Number of Principal Components": RMSE (°) against the number of principal components, from 0 to 100, for GP, SinCos-GP and Angle-GP.]

VI. CONCLUSION

Two novel methods for nonlinear and nonparametric regression of functions with a 2D angular response are presented, SinCos-GP and Angle-GP. Both methods are based on Gaussian Processes and replace the linear combination of targets by a linear combination of the sine and cosine of the angles.

SinCos-GP and Angle-GP were compared to a GP in solving simulated and real circular regression problems. In both simulated and real data experiments, SinCos-GP and Angle-GP performed significantly better than a GP. The two experiments presented show the ability of the proposed methods to fit problems of very different natures where the response is an angular variable.

Angle-GP seemed to work better than SinCos-GP in most of the tested situations. However, this superiority does not appear to be consistent enough to differentiate clearly the performance of both methods. Thus, it is advisable to evaluate both alternatives for each particular application. It seems a reasonable prediction that in those applications where some of the covariate components bring more useful information for the estimation of the sine, and others for the estimation of the cosine, SinCos-GP will show better results. In contrast, in applications in which all the covariate components bring as much information for the estimation of the sine as for the cosine, it would be reasonable to expect Angle-GP to work better. For future work, it would be useful to develop a method for predicting the variance of the angular response. Moreover, the authors would like to investigate the extension of the proposed method to the case of 3D angular data. Additionally, it would be interesting to investigate the application of the ideas presented in this paper to the regression of angular variables in the presence of heteroscedasticity. Finally, some experiments should be carried out in the future in order to determine the sensitivity of the method to the training size in regression problems.

ACKNOWLEDGMENT

This work was partially funded by FONDECYT Postdoctoral Grant 3130729 and by FONDECYT Grant 1130153.

REFERENCES

[1] K. Mardia and P. Jupp, Directional Statistics, ser. Wiley Series in Probability and Statistics. Wiley, 2009.

[2] A. Lee, “Circular data,” WIREs Comp Stat, vol. 2, no. 4, pp. 477–486, 2010.

[3] N. I. Fisher and A. J. Lee, “Regression Models for an Angular Response,” Biometrics, vol. 48, no. 3, pp. 665–677, 1992.

[4] G. Nuñez-Antonio, E. Gutiérrez-Peña, and G. Escarela, “A Bayesian regression model for circular data based on the projected normal distribution,” Statistical Modelling, vol. 11, no. 3, pp. 185–201, Jun. 2011.

[5] N. I. Fisher, Statistical Analysis of Circular Data. Cambridge University Press, Jan. 1996.

[6] S. R. Jammalamadaka and A. Sengupta, Topics in Circular Statistics, har/dskt ed. World Scientific Pub Co Inc, 2001.

[7] E. Batschelet, Circular Statistics in Biology. New York: Academic Press, 1981.

[8] J. R. Curray, “The analysis of two-dimensional orientation data,” The Journal of Geology, vol. 64, no. 2, pp. 117–131, 1956.

[9] H. J. Pincus, “The analysis of aggregates of orientation data in the earth sciences,” The Journal of Geology, vol. 61, no. 6, pp. 482–509, 1953.

[10] J. A. Carta, C. Bueno, and P. Ramírez, “Statistical modelling of directional wind speeds using mixtures of von Mises distributions: Case study,” Energy Conversion and Management, vol. 49, no. 5, pp. 897–907, 2008.

[11] J. Bowers, I. Morton, and G. Mould, “Directional statistics of the wind and waves,” Applied Ocean Research, vol. 22, no. 1, pp. 13–30, 2000.

[12] M. Özuysal, V. Lepetit, and P. Fua, “Pose estimation for category specific multiview object localization,” in CVPR. IEEE, 2009, pp. 778–785.

[13] A. Blake and C. Marinos, “Shape from texture: Estimation, isotropy and moments,” Artif. Intell., vol. 45, no. 3, pp. 323–380, 1990.

[14] L. A. Gould, “A Regression Technique for Angular Variates,” Biometrics, vol. 25, no. 4, pp. 683–700, 1969.

[15] R. A. Johnson and T. E. Wehrly, “Some Angular-Linear Distributions and Related Regression Models,” Journal of the American Statistical Association, vol. 73, no. 363, pp. 602–606, 1978.

[16] B. Presnell, S. Morrison, and R. Littell, “Projected multivariate linear models for directional data,” Journal of the American Statistical Association, vol. 93, no. 443, pp. 1068–1077, 1998.

[17] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press, 2005.

[18] E. Tola, V. Lepetit, and P. Fua, “A fast local descriptor for dense matching,” in CVPR. IEEE Computer Society, 2008.

[19] S. Lazebnik, C. Schmid, and J. Ponce, “Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories,” in Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2, ser. CVPR ’06. Washington, DC, USA: IEEE Computer Society, 2006, pp. 2169–2178.
