Robust Estimation for Multivariate Wrapped ModelsRobust Estimation for Multivariate Wrapped Models Giovanni Saraceno 1, Claudio Agostinelli 1, and Luca Greco 2 1 Department of Mathematics,

Robust Estimation for Multivariate WrappedModels

Giovanni Saraceno1, Claudio Agostinelli1, and Luca Greco2

1Department of Mathematics, University of Trento, Trento, [email protected], [email protected]

2Department DEMM, University of Sannio, Benevento, [email protected]

Abstract

A weighted likelihood technique for robust estimation of a multi-variate Wrapped Normal distribution for data points scattered on ap−dimensional torus is proposed. The occurrence of outliers in thesample at hand can badly compromise inference for standard tech-niques such as maximum likelihood method. Therefore, there is theneed to handle such model inadequacies in the fitting process by arobust technique and an effective downweighting of observations notfollowing the assumed model. Furthermore, the employ of a robustmethod could help in situations of hidden and unexpected substruc-tures in the data. Here, it is suggested to build a set of data-dependentweights based on the Pearson residuals and solve the correspondingweighted likelihood estimating equations. In particular, robust esti-mation is carried out by using a Classification EM algorithm whoseM-step is enhanced by the computation of weights based on current pa-rameters’ values. The finite sample behavior of the proposed methodhas been investigated by a Monte Carlo numerical studies and realdata examples.Keywords: CEM Algorithm, Multivariate Wrapped Distributions,Pearson residuals, Robust Estimators, Torus, Weighted Likelihood.

1

arX

iv:2

010.

0844

4v1

[st

at.M

E]

16

Oct

202

0

1 Introduction

Multivariate circular observations arise commonly in all those fields wherea quantity of interest is measured as a direction or when instruments suchas compasses, protractors, weather vanes, sextants or theodolites are used[Mardia, 1972]. Circular (or directional) data can be seen as points on theunit circle and represented by angles, provided that an initial direction andorientation of the circle have been chosen.

These data might be successfully modeled by using appropriate wrappeddistributions such as the Wrapped Normal on the unit circle. The reader ispointed to Mardia and Jupp [2000], Jammalamadaka and SenGupta [2001]and Batschelet [1981] for modeling and inferential issues on circular data.

When data come in a multivariate setting, we might extend the uni-variate wrapped distribution by using a component-wise wrapping of multi-variate distributions. The multivariate Wrapped Normal (WNp) is obtainedby component-wise wrapping of a p-variate Normal distribution (Np) on ap−dimensional torus [Johnson and Wehrly, 1978, Baba, 1981]. Wrappingcan be explained as the geometric translation of a distribution with supporton R to a space defined on a circular object, e.g., a unit circle [Mardia andJupp, 2000].

Let X ∼ Np(µ,Σ), where µ is the vector mean and Σ is the variance-covariance matrix. Then, the distribution of Y = X mod 2π is denoted asWNp(µ,Σ) with distribution function

F (y) =∑

j∈Zp

[Φp(y + 2πj; Ω)− Φp(2πj; Ω)] ,

and density function

f(y) =∑

j∈Zp

φp(y + 2πj; Ω) ,

with y ∈ (0, 2π]p, j ∈ Zp, Ω = (µ,Σ), where Φp(·) and φp(·) are the distribu-tion and density function of X, respectively, and the modulus operator modis applied component-wise. An appealing property of the wrapped Normaldistribution is its closure with respect to convolution [Baba, 1981, Jammala-madaka and SenGupta, 2001].

Likelihood based inference about the parameters of the WNp(µ,Σ) dis-tribution can be trapped in numerical and computational hindrances since

2

the log-likelihood function

`(Ω) =n∑

i=1

log

[∑

j∈Zp

φp(yi + 2πj; Ω)

]

involves the evaluation of an infinite series. Agostinelli [2007] proposed an It-erative Reweighted Maximum Likelihood Estimating Equations algorithm inthe univariate setting, that is available in the R package circular [Agostinelliand Lund, 2017]. Algorithms bases on the Expectation-Maximization (EM)method have been used by Fisher and Lee [1994] for parameter estimation forautoregressive models of Wrapped Normal distributions and by Coles [1998],Ravindran and Ghosh [2011] and Ferrari [2009] in a Bayesian framework ac-cording to a data augmentation approach to estimate the missing unobservedwrapping coefficients. An innovative estimation strategy based on EM andClassification EM algorithms has been discussed in Nodehi et al. [2020]. Inorder to perform maximum likelihood estimation, the wrapping coefficientsare treated as latent variables.

Let y1, . . . ,yn be a i.i.d. sample from a multivariate Wrapped Normal dis-tribution Y ∼ WNp(µ,Σ) on the p-torus with mean vector µ and variance-covariance matrix Σ. We can think of yi = xi mod 2π where xi is a samplefrom X i ∼ Np(µ,Σ). The EM algorithm works with the complete log-likelihood function given by

`C(Ω) =n∑

i=1

log

[∑

j∈Zp

vijφ(yi + 2πj; Ω)

], (1)

that is characterized by the missing unobserved wrapping coefficients j andvij is an indicator of the ith unit having the j vector as wrapping coeffi-cients. The EM algorithm iterates between an Expectation (E) step and aMaximization (M) step. In the E-step, the conditional expectation of (1) isobtained by estimating the vij with the posterior probability that yi has jas wrapping coefficients based on current parameters’ values, i.e.

vij =φ(yi + 2πj; Ω)∑

h∈Zp φ(yi + 2πh; Ω), j ∈ Zp, i = 1, . . . , n .

In the M-step, the conditional expectation of (1) is maximized with respectto Ω. The reader is pointed to Nodehi et al. [2020] for computational detailsabout such maximization problem and updating formulas for Ω.

3

An alternative estimation strategy is based on the CEM-type algorithm.The substantial difference is that the E-step is followed by a C-step (where Cstands for classification) in which vij is estimated as either 0 or 1 and so thateach observation yi is associated to the most likely wrapping coefficients jiwith ji = arg maxh∈Zp vih.

When the sample data is contaminated by the occurrence of outliers, itis well known that maximum likelihood estimation, also achieved throughthe implementation of the EM or CEM algorithm, is likely to lead to un-reliable results [Farcomeni and Greco, 2016]. Then, there is the need fora suitable robust procedure providing protection against those unexpectedanomalous values. An attractive solution would be to modify the likelihoodequations in the M-step by introducing a set of weights aimed to bound theeffect of outliers. Here, it is suggested to evaluate weights according to theweighted likelihood methodology [Markatou et al., 1998]. Weighted likelihoodis an appealing robust technique for estimation and testing [Agostinelli andMarkatou, 2001]. The methodology leads to a robust fit and gives the chanceto detect possible substructures in the data. Furthermore, the weighted like-lihood methodology works in a very satisfactory fashion when combined withthe EM and CEM algorithms, as in the case of mixture models [Greco andAgostinelli, 2020, Greco et al., 2020].

The remainder of the paper is organized as follows. Section 2 gives briefbut necessary preliminaries on weighted likelihood. The weighted CEM algo-rithm for robust fitting of the multivariate Wrapped Normal model on dataon a p−dimensional torus is discussed in Section 3. Section 4 reports theresults of some numerical studies, whereas a real data example is discussedin Section 5. Concluding remarks end the paper.

2 Preliminaries on weighted likelihood

Let y1, · · · ,yn be a random sample of size n drawn from a r.v. Y withdistribution function F and probability (density) function f . Let M =M(y;θ),θ ∈ Θ ⊆ Rd, d ≥ 1,y ∈ Y be the assumed parametric model, withcorresponding density m(y;θ), and Fn the empirical distribution function.Assume that the support of F is the same as that of M and independentof θ. A measure of the agreement between the true and assumed modelis provided by the Pearson residual function δ(y), with δ(y) ∈ [−1,+∞),

4

[Lindsay, 1994, Markatou et al., 1998], defined as

δ(y) = δ(y;θ, F ) =f(y)

m(y;θ)− 1 . (2)

The finite sample counterpart of (2) can be obtained as

δn(y) = δ(y;θ, Fn) =fn(y)

m(y;θ)− 1 , (3)

where fn(y) is a consistent estimate of the true density f(y). In discretefamilies of distributions, fn(y) can be driven by the observed relative fre-quencies [Lindsay, 1994], whereas in continuous models one could consider anon parametric density estimate based on the kernel function k(y; t, h), thatis

fn(y) =

∫

Yk(y; t, h)dFn(t) . (4)

Moreover, in the continuous case, the model density in (3) can be replacedby a smoothed model density, obtained by using the same kernel involved innon-parametric density estimation [Basu and Lindsay, 1994, Markatou et al.,1998], that is

m(y;θ) =

∫

Yk(y; t, h)m(t;θ) dt

leading to

δn(y) = δ(y;θ, Fn) =fn(y)

m(y;θ)− 1 . (5)

By smoothing the model, the Pearson residuals in (5) converge to zero withprobability one for every y under the assumed model and it is not requiredthat the kernel bandwidth h goes to zero as the sample size n increases.Large values of the Pearson residual function correspond to regions of thesupport Y where the model fits the data poorly, meaning that the observationis unlikely to occur under the assumed model. The reader is pointed to Basuand Lindsay [1994], Markatou et al. [1998], Agostinelli and Greco [2019] andreferences therein for more details.

Observations leading to large Pearson residuals in (5) are supposed to bedown-weighted. Then, a weight in the interval [0, 1] is attached to each datapoint, that is computed accordingly to the following weight function

w(δ(y)) =[A(δ(y)) + 1]+

δ(y) + 1, (6)

5

where w(δ) ∈ [0, 1], [·]+ denotes the positive part and A(δ) is the Residual Ad-justment Function (RAF, Basu and Lindsay [1994]). The weights w(δn(y))are meant to be small for those data points that are in disagreement withthe assumed model. Actually, the RAF plays the role to bound the effectof large Pearson residuals on the fitting procedure. By using a RAF suchthat |A(δ)| ≤ |δ| outliers are expected to be properly downweighted. Theweight function (6) can be based on the families of RAF stemming from theSymmetric Chi-squared divergence [Markatou et al., 1998], the GeneralizedKullback-Leibler divergence [Park and Basu, 2003]

Agkl(δ, τ) =log(τδ + 1)

τ, 0 ≤ τ ≤ 1; (7)

or the Power Divergence Measure [Cressie and Read, 1984, 1988]

Apdm(δ, τ) =

τ((δ + 1)1/τ − 1

)τ <∞

log(δ + 1) τ →∞ .

The resulting weight function is unimodal and declines smoothly to zero asδ(y)→ −1 or δ(y)→∞.

Then, robust estimation can be based on a Weighted Likelihood Estimat-ing Equation (WLEE), defined as

n∑

i=1

w(δn(yi);θ, Fn)s(yi;θ) = 0 , (8)

where s(yi;θ) is the individual contribution to the score function. Therefore,weighted likelihood estimation can be thought as a root solving problem.Finding the solution of (8) requires an iterative weighting algorithm.

The corresponding weighted likelihood estimator θw

(WLE) is consistent,asymptotically normal and fully efficient at the assumed model, under somegeneral regularity conditions pertaining the model, the kernel and the weightfunction [Markatou et al., 1998, Agostinelli and Markatou, 2001, Agostinelliand Greco, 2019]. Its robustness properties have been established in Lindsay[1994] in connection with minimum disparity problems. It is worth to remarkthat under very standard conditions, one can build a simple WLEE match-ing a minimum disparity objective function, hence inheriting its robustnessproperties.

In finite samples, the robustness/efficiency trade-off of weighted likelihoodestimation can be tuned by varying the smoothing parameter h in equation

6

(4). Large values of h lead to Pearson residuals all close to zero and weightsall close to one and, hence, large efficiency, since fn(y) is stochastically closeto the postulated model. On the other hand, small values of h make fn(y)more sensitive to the occurrence of outliers and the Pearson residuals becomelarge for those data points that are in disagreement with the model. On thecontrary, the shape of the kernel function k(y; t, h) has a very limited effect.

For what concerns the tasks of testing and setting confidence regions, aweighted likelihood counterparts of the classical likelihood ratio test, and itsasymptotically equivalent Wald and Score versions, can be established. Notethat, all share the standard asymptotic distribution at the model, accordingto the results stated in [Agostinelli and Markatou, 2001], that is

Λ(θ) = 2n∑

i=1

wi

[`(θ

w;yi)− `(θ;yi)

]p→ χ2

p ,

with wi = w(δn(yi); θw, Fn). Profile tests can be obtained as well.

3 A weighted CEM algorithm

As previously stated in the Introduction, Nodehi et al. [2020] provided effec-tive iterative algorithms to fit a multivariate Wrapped normal distribution onthe p−dimensional torus. Here, robust estimation is achieved by a suitablemodification of their CEM algorithm, consisting in a weighting step beforeperforming the M-step, in which data-dependent weights are evaluated ac-cording to (6) yielding a WLEE (8) to be solved in the M-step.

The construction of Pearson residuals in (5) involves a multivariate WrappedNormal kernel with covariance matrix hΛ. Since the family of multivariateWrapped Normal is closed under convolution, then the smoothed model den-sity is still Wrapped Normal with covariance matrix Σ + hΛ. Here, we setΛ = Σ so that h can be a constant independent of the variance-covariancestructure of the data.

The weighted CEM algorithm is structured as follows:

0 Initialization, Starting values can be obtained by maximum likelihoodestimation evaluated over a randomly chosen subset. The subsamplesize is expected to be as small as possible in order to increase theprobability to get an outliers’ free initial subset but large enough toguarantee estimation of the unknown parameters. A starting solution

7

for µ can be obtained by the circular means, whereas the diagonalentries of Σ can be initialized as−2 log(ρr), where ρr is the sample mean

resultant length and the off-diagonal elements by ρc(yr,ys)σ(0)rr σ

(0)ss (r 6=

s), where ρc(yr,ys) is the circular correlation coefficient, r = 1, 2, . . . , pand s = 1, 2, . . . , p, see Jammalamadaka and SenGupta [2001, pag. 176,equation 8.2.2]. In order to avoid the algorithm to be dependent oninitial values, a simple and common strategy is to run the algorithmfrom a number of starting values using the bootstrap root searchingapproach as in Markatou et al. [1998]. A criterion to choose amongdifferent solutions will be illustrated in Section 5.

1 E-step. Based on current parameters’ values, first evaluate posteriorprobabilities

vij =φ(yi + 2πj; Ω)∑

h∈Zp φ(yi + 2πh; Ω), j ∈ Zp, i = 1, . . . , n ,

2 C-step Set ji = arg maxh∈Zp vih, where vij = 1 for j = ji, and vij = 0otherwise.

3 W-step (Weighting step) Based on current parameters’ values, com-pute Pearson residuals according to (5) based on a multivariate WrappedNormal kernel with bandwidth h and evaluate the weights as

wi = w(δn(yi),Ω, Fn).

4 M-step Update parameters’ values by computing a weighted mean andvariance-covariance matrix with weights wi, used to compute estimates,by maximizing the classification log-likelihood conditionally on ji (i =1, . . . , n), given by

µi =

∑ni=1wixi∑ni=1wi

,

Σij =

∑ni=1(xi − µi)>(xj − µj)wi∑n

i=1 wi.

Note that, at each iteration the classification algorithm provides also anestimate of the original unobserved sample obtained as xi = yi + 2πji,i = 1, . . . , n.

8

4 Numerical studies

The finite sample behavior of the proposed weighted CEM has been inves-tigated by some numerical studies based on 500 Monte Carlo trials each.Data have been drawn from a WNp(µ,Σ). We set µ = 0, whereas inorder to account for the lack of affine equivariance of the Wrapped Nor-mal model [Nodehi et al., 2020], we consider different covariance structuresΣ as in Agostinelli et al. [2015]. In particular, for fixed condition numberCN = 20, we construct a random correlation matrix R. Then, we convertthe correlation matrix R into the covariance matrix Σ = D1/2RD1/2, withD = diag(σ1p), where σ is a chosen constant and 1p is a p-dimensional vectorof ones. Outliers have been obtained by shifting a proportion ε of randomlychosen data points by an amount kε in the direction of the smallest eigen-value of Σ. We consider sample sizes n = 50, 100, 500, dimensions p = 2, 5,contamination level ε = 0, 5%, 10%, 20%, contamination size kε = π/4, π/2, πand σ = π/8, π/4, π/2.

For each combination of the simulation parameters, we are going to com-pare the performance of CEM and weighted CEM algorithms. The weightsused in the W-step are computed using the Generalized Kullback-LeiblerRAF in equation (7) with τ = 0.1. According to the strategy described inAgostinelli and Markatou [2001], the bandwidth h has been selected by set-ting Λ = Σ, so that h is a constant independent of the scale of the model.Here, h is obtained so that any outlying observation located at least threestandard deviations away from the mean in a componentwise fashion, is at-tached a weight not larger than 0.12 when the rate of contamination in thedata has been fixed equal to 20%. The algorithm has been initialized accord-ing to the root search approach described in Markatou et al. [1998] based on15 subsamples of size 10.

The weighted CEM is assumed to have reached convergence when at the(k + 1)th iteration

max

(√2(1− cos(µ(k) − µ(k+1))),max |Σ(k) − Σ(k+1)|

)< 10−6.

The algorithm has been implemented so that Zp is replaced by the Cartesianproduct ×ps=1J where J = (−J,−J + 1, . . . , 0, . . . , J − 1, J) for some Jproviding a good approximation. Here we set J = 3. The algorithm runs onR code [R Core Team, 2020] available from the authors upon request.

Fitting accuracy has been evaluated according to

9

(i) the average angle separation

AS(µ) =1

p

p∑

i=1

(1− cos(µi − µi)) ,

which ranges in [0, 2], for the mean vector;

(ii) the divergence

∆(Σ) = trace(ΣΣ−1)− log(|ΣΣ−1|)− p ,

for the variance-covariance matrix. Here, we only report the results stemmingfrom the challenging situation with n = 100 and p = 5. Figure 1 displays theaverage angle separation whereas Figure 2 gives the divergence to measurethe accuracy in estimating the variance-covariance matrix for the weightedCEM (in green) and CEM (in red). The weighted CEM exhibits a fairlysatisfactory fitting accuracy both under the assumed model (i.e. when thesample at hand is not corrupted by the occurrence of outliers) and under con-tamination. The robust method outperforms the CEM method, especially inthe estimation of the variance-covariance components. The algorithm resultsin biased estimates for both the mean vector and the variance-covariancematrix only for the large contamination rate ε = 20%, with small contami-nation size and a large σ. Actually, in this data constellation outliers are notwell separated from the group of genuine observations. A similar behaviorhas been observed for the other sample sizes. Complete results are madeavailable in the Supplementary Material.

5 Real data example: Protein data

The data under consideration [Najibi et al., 2017] contain bivariate infor-mation about 63 protein domains that were randomly selected from threeremote Protein classes in the Structural Classification of Proteins (SCOP).In the following, we consider the data set corresponding to the 39-th proteindomain. A bivariate Wrapped Normal has been fitted to the data at hand byusing the weighted CEM algorithm, based on a Generalized Kullback-LeiblerRAF with τ = 0.25 and J = 6. The tasks of bandwidth selection and initial-ization have been resolved according to the same strategy described abovein Section 4.

10

0.000

0.005

0.010

0.015

0.020

0 5 10 20ε

AS

(µ)

σ : π 8 kε : π 4

0.00

0.02

0.04

0.06

0 5 10 20ε

AS

(µ)

σ : π 8 kε : π 2

0.00

0.05

0.10

0.15

0.20

0 5 10 20ε

AS

(µ)

σ : π 8 kε : π

0.00

0.01

0.02

0.03

0 5 10 20ε

AS

(µ)

σ : π 4 kε : π 4

0.000

0.025

0.050

0.075

0 5 10 20ε

AS

(µ)

σ : π 4 kε : π 2

0.00

0.05

0.10

0.15

0.20

0 5 10 20ε

AS

(µ)

σ : π 4 kε : π

0.00

0.05

0.10

0.15

0 5 10 20ε

AS

(µ)

σ : π 2 kε : π 4

0.0

0.1

0.2

0.3

0 5 10 20ε

AS

(µ)

σ : π 2 kε : π 2

0.0

0.2

0.4

0.6

0 5 10 20ε

AS

(µ)

σ : π 2 kε : π

Figure 1: Distribution of average angle separation for n = 100 and p = 5using weighted CEM (in green) and the CEM (in red). The contaminationrate ε is given on the horizontal axis. Increasing contamination size kε fromleft to right, increasing σ from top to bottom.11

0

10

20

30

0 5 10 20ε

∆(Σ

)σ : π 8 kε : π 4

0

50

100

0 5 10 20ε

∆(Σ

)

σ : π 8 kε : π 2

0

25

50

75

0 5 10 20ε

∆(Σ

)

σ : π 8 kε : π

0

2

4

6

0 5 10 20ε

∆(Σ

)

σ : π 4 kε : π 4

0

10

20

30

0 5 10 20ε

∆(Σ

)

σ : π 4 kε : π 2

0

5

10

15

0 5 10 20ε

∆(Σ

)

σ : π 4 kε : π

0

1

2

3

0 5 10 20ε

∆(Σ

)

σ : π 2 kε : π 4

0.0

2.5

5.0

7.5

0 5 10 20ε

∆(Σ

)

σ : π 2 kε : π 2

0

5

10

15

0 5 10 20ε

∆(Σ

)

σ : π 2 kε : π

Figure 2: Distribution of the divergence measure for n = 100 and p = 5 usingthe weighted CEM (in green) and the CEM (in red). The contamination rateε is given on the horizontal axis. Increasing contamination size kε from leftto right, increasing σ from top to bottom.12

The inspection of the data suggests the presence of at least a coupleof clusters that make the data non homogeneous. Figure 3 displays thedata on a flat torus together with fitted means and 95% confidence regionscorresponding to three different roots of the WLEE (that are illustratedby different colors): one root gives location estimate µ1 = (1.85, 2.34) anda positive correlation ρ1 = 0.79; the second root gives location estimateµ2 = (1.85, 5.86) and a negative correlation ρ2 = −0.80; the third root giveslocation estimate µ3 = (1.61, 0.88) and correlation ρ3 = −0.46. The firstand second roots are very close to maximum likelihood estimates obtainedin different directions when unwrapping the data: this is evident from theshift in the second coordinate of the mean vector and the change in thesign of the correlation. In both cases the data exhibit weights larger than0.5, except in few cases, corresponding to the most extreme observations, asdisplayed in the first two panels of Figure 4. In none of the two cases thebulk of the data corresponds to an homogeneous sub-group. On the contrary,the third root is able to detect an homogeneous substructure in the sample,corresponding to the most dense region in the data configuration. Almosthalf of the data points is attached a weight close to zero, as shown in thethird panel of Figure 4. This findings still confirm the ability of the weightedlikelihood methodology to tackle such uneven patterns as a diagnostic ofhidden substructures in the data. In order to select one of the three roots wehave found, we consider the strategy discussed in Agostinelli [2006], that is,we select the root leading to the lowest fitted probability

ProbΩ

(δn(y; Ω, Fn) < −0.95

).

This probability has been obtained by drawing 5000 samples from the fittedbivariate Wrapped Normal distribution for each of the three roots. Thecriterion correctly leads to choose the third root, for which an almost nullprobability is obtained, wheres the fitted probabilities for the first and secondroot are 0.204 and 0.280, respectively.

6 Conclusions

In this paper an effective strategy for robust estimation of a multivariateWrapped normal on a p−dimensional torus has been presented. The methodinherits the good computational properties of a CEM algorithm developed

13

−4 −2 0 2 4 6 8

−1

0−

50

51

0

y1

y2

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

+

Figure 3: Protein data. Fitted means (+) and 95% confidence regions corre-sponding to three different roots from weighted CEM (J = 6).

in Nodehi et al. [2020] jointly with the robustness properties stemming fromthe employ of Pearson residuals and the weighted likelihood methodology.In this respect, it is particularly appealing the opportunity to work with afamily of distribution that is close under convolution and allows to parallel theprocedure one would have developed on the real line by using the multivariatenormal distribution. The proposed weighted CEM works satisfactory at leastin small to moderate dimensions, both on synthetic and real data. It isworth to stress that the method can be easily extended to other multivariatewrapped models.

14

0.25

0.50

0.75

1.00

0 50 100 150 200 250Observation Index

Root 1

0.25

0.50

0.75

1.00


Root 2

0.000.250.500.751.00


Root 3

Figure 4: Protein data. Weights corresponding to three different roots fromweighted CEM.

References

C. Agostinelli. Notes on pearson residuals and weighted likelihood estimatingequations. Statistics & Probability Letters, 76(17):1930–1934, 2006.

C. Agostinelli. Robust estimation for circular data. Computational Statisticsand Data Analysis, 51(12):5867–5875, 2007.

C. Agostinelli and L. Greco. Weighted likelihood estimation of multivariatelocation and scatter. Test, 28(3):756–784, 2019.

C. Agostinelli and U. Lund. R package circular: Circular Statistics (ver-sion 0.4-93), 2017. URL https://r-forge.r-project.org/projects/

circular/.

15

C. Agostinelli and M. Markatou. Test of hypotheses based on the weightedlikelihood methodology. Statistica Sinica, pages 499–514, 2001.

C. Agostinelli, A. Leung, V.J. Yohai, and R.H. Zamar. Robust estimation ofmultivariate location and scatter in the presence of cellwise and casewisecontamination. TEST, 24(3):441–461, 2015.

Y. Baba. Statistics of angular data: wrapped normal distribution model.Proceedings of the Institute of Statistical Mathematics, 28:41–54, 1981. (inJapanese).

A. Basu and B.G. Lindsay. Minimum disparity estimation for continuousmodels: efficiency, distributions and robustness. Annals of the Institute ofStatistical Mathematics, 46(4):683–705, 1994.

E. Batschelet. Circular Statistics in Biology. Academic Press, NewYork,1981.

S. Coles. Inference for circular distributions and processes. Statistics andComputing, 8:105–113, 1998.

N. Cressie and T.R.C. Read. Multinomial goodness-of-fit tests. Journal ofthe Royal Statistical Society, Series B (statistical methodology), 46:440–464, 1984.

N. Cressie and T.R.C. Read. Cressie–Read Statistic, pages 37–39. Wiley,1988. In: Encyclopedia of Statistical Sciences, Supplementary Volume,edited by S. Kotz and N.L. Johnson.

A. Farcomeni and L. Greco. Robust methods for data reduction. CRC press,2016.

C. Ferrari. The Wrapping Approach for Circular Data Bayesian Modeling.Phd thesis, Alma Mater Studiorum Universita di Bologna. Dottorato diricerca in Metodologia statistica per la ricerca scientifica, 21 Ciclo., 2009.

N.I. Fisher and A.J. Lee. Time series analysis of circular data. Journal ofthe Royal Statistical Society. Series B, 56:327–339, 1994.

L. Greco and C. Agostinelli. Weighted likelihood mixture modeling andmodel-based clustering. Statistics and Computing, 30(2):255–277, 2020.

16

L. Greco, A. Lucadamo, and C. Agostinelli. Weighted likelihood latent classlinear regression. Statistical Methods and Applications, 2020. doi: 10.1007/s10260-020-00540-8.

S.R. Jammalamadaka and A. SenGupta. Topics in Circular Statistics, vol-ume 5 of Multivariate Analysis. World Scientific, Singapore, 2001.

R.A. Johnson and T. Wehrly. Some angular-linear distributions and relatedregression models. Journal of the American Statistical Association, 73:602–606, 1978.

B.G. Lindsay. Efficiency versus robustness: The case for minimum hellingerdistance and related methods. The Annals of Statistics, 22:1018–1114,1994.

K.V. Mardia. Statistics of Directional Data. Academic Press, London, 1972.

K.V. Mardia and P.E. Jupp. Directional Statistics. Wiley, New York, 2000.

M. Markatou, A. Basu, and B.G. Lindsay. Weighted likelihood equations withbootstrap root search. Journal of the American Statistical Association, 93(442):740–750, 1998.

S.M. Najibi, M. Maadooliat, L. Zhou, J.Z. Huang, and X. Gao. Proteinstructure classification and loop modeling using multiple ramachandrandistributions. Computational and Structutal Biotechnology Journal, 15:243–254, 2017. doi: 10.1016/j.csbj.2017.01.011.

A. Nodehi, M. Golalizadeh, M. Maadooliat, and C. Agostinelli. Estimationof parameters in multivariate wrapped models for data on a p-torus. Com-putational Statistics, 2020. doi: 10.1007/s00180-020-01006-x.

C. Park and A. Basu. The generalized kullback-leibler divergence and robustinference. Journal of Statistical Computation and Simulation, 73(5):311–332, 2003.

R Core Team. R: A Language and Environment for Statistical Computing.R Foundation for Statistical Computing, Vienna, Austria, 2020. URLhttps://www.R-project.org/.

17

P. Ravindran and S.K. Ghosh. Bayesian analysis of circular data usingwrapped distributions. Journal of Statistical Theory and Practice, 5:547–561, 2011.

18

Supplemental Material: Robust Estimation forMultivariate Wrapped Models

Giovanni Saraceno1, Claudio Agostinelli1, and Luca Greco2

1Department of Mathematics, University of Trento, Trento, [email protected], [email protected]

2Department DEMM, University of Sannio, Benevento, [email protected]

October 19, 2020

The Supplemental Material contains complete results of the numericalstudy described in Section 4 of the manuscript.

Additional Numerical results

Figure 1, Figure 3 and Figure 5 show the angle separation whereas Figure2, Figure 4 and Figure 6 display the measure of accuracy in estimating thevariance-covariance components for p = 2 and n = 50, 100, 500, respectively.

Figure 7 show the angle separation whereas Figure 8 display the measureof accuracy in estimating the variance-covariance components for p = 5 andn = 500, respectively.

1

arX

iv:2

010.

0844

4v1

[st

at.M

E]

16

Oct

202

0

0.00

0.01

0.02

0.03

0.04

0 5 10 20ε

AS

(µ)

σ : π 8 kε : π 4

0.000

0.025

0.050

0.075

0.100

0 5 10 20ε

AS

(µ)

σ : π 8 kε : π 2

0.0

0.1

0.2

0.3

0 5 10 20ε

AS

(µ)

σ : π 8 kε : π

0.000

0.025

0.050

0.075

0 5 10 20ε

AS

(µ)

σ : π 4 kε : π 4

0.00

0.05

0.10

0.15

0 5 10 20ε

AS

(µ)

σ : π 4 kε : π 2

0.0

0.1

0.2

0.3

0.4

0 5 10 20ε

AS

(µ)

σ : π 4 kε : π

0.0

0.2

0.4

0.6

0 5 10 20ε

AS

(µ)

σ : π 2 kε : π 4

0.0

0.3

0.6

0.9

1.2

0 5 10 20ε

AS

(µ)

σ : π 2 kε : π 2

0.0

0.5

1.0

1.5

0 5 10 20ε

AS

(µ)

σ : π 2 kε : π

Figure 1: Distribution of angle separation for n = 50 and p = 2 usingweighted CEM (in green) and the CEM (in red). The contamination rate εis given on the horizontal axis. Increasing contamination size kε from left toright, increasing σ from top to bottom.2

0

5

10

0 5 10 20ε

∆(Σ

)σ : π 8 kε : π 4

0

20

40

0 5 10 20ε

∆(Σ

)

σ : π 8 kε : π 2

0.0

2.5

5.0

7.5

10.0

0 5 10 20ε

∆(Σ

)

σ : π 8 kε : π

0

1

2

3

0 5 10 20ε

∆(Σ

)

σ : π 4 kε : π 4

0

5

10

15

0 5 10 20ε

∆(Σ

)

σ : π 4 kε : π 2

0.0

0.5

1.0

1.5

0 5 10 20ε

∆(Σ

)

σ : π 4 kε : π

0.00

0.25

0.50

0.75

1.00

0 5 10 20ε

∆(Σ

)

σ : π 2 kε : π 4

0

5

10

0 5 10 20ε

∆(Σ

)

σ : π 2 kε : π 2

0

5

10

15

0 5 10 20ε

∆(Σ

)

σ : π 2 kε : π


0.00

0.01

0.02

0.03

0.04

0 5 10 20ε

AS

(µ)

σ : π 8 kε : π 4

0.000

0.025

0.050

0.075

0.100

0 5 10 20ε

AS

(µ)

σ : π 8 kε : π 2

0.00

0.05

0.10

0.15

0.20

0 5 10 20ε

AS

(µ)

σ : π 8 kε : π

0.000

0.025

0.050

0.075

0.100

0 5 10 20ε

AS

(µ)

σ : π 4 kε : π 4

0.00

0.05

0.10

0.15

0 5 10 20ε

AS

(µ)

σ : π 4 kε : π 2

0.00

0.05

0.10

0.15

0.20

0 5 10 20ε

AS

(µ)

σ : π 4 kε : π

0.0

0.1

0.2

0.3

0.4

0 5 10 20ε

AS

(µ)

σ : π 2 kε : π 4

0.0

0.2

0.4

0.6

0 5 10 20ε

AS

(µ)

σ : π 2 kε : π 2

0.00

0.25

0.50

0.75

1.00

0 5 10 20ε

AS

(µ)

σ : π 2 kε : π


0

5

10

0 5 10 20ε

∆(Σ

)σ : π 8 kε : π 4

0

20

40

0 5 10 20ε

∆(Σ

)

σ : π 8 kε : π 2

0.0

2.5

5.0

7.5

10.0

0 5 10 20ε

∆(Σ

)

σ : π 8 kε : π

0

1

2

3

0 5 10 20ε

∆(Σ

)

σ : π 4 kε : π 4

0

5

10

0 5 10 20ε

∆(Σ

)

σ : π 4 kε : π 2

0.0

0.5

1.0

1.5

0 5 10 20ε

∆(Σ

)

σ : π 4 kε : π

0.0

0.2

0.4

0.6

0 5 10 20ε

∆(Σ

)

σ : π 2 kε : π 4

0.0

2.5

5.0

7.5

0 5 10 20ε

∆(Σ

)

σ : π 2 kε : π 2

0

5

10

15

0 5 10 20ε

∆(Σ

)

σ : π 2 kε : π


0.000

0.005

0.010

0.015

0.020

0 5 10 20ε

AS

(µ)

σ : π 8 kε : π 4

0.00

0.02

0.04

0.06

0 5 10 20ε

AS

(µ)

σ : π 8 kε : π 2

0.00

0.05

0.10

0.15

0.20

0 5 10 20ε

AS

(µ)

σ : π 8 kε : π

0.00

0.01

0.02

0.03

0 5 10 20ε

AS

(µ)

σ : π 4 kε : π 4

0.00

0.02

0.04

0.06

0.08

0 5 10 20ε

AS

(µ)

σ : π 4 kε : π 2

0.00

0.03

0.06

0.09

0 5 10 20ε

AS

(µ)

σ : π 4 kε : π

0.000

0.025

0.050

0.075

0 5 10 20ε

AS

(µ)

σ : π 2 kε : π 4

0.00

0.05

0.10

0.15

0 5 10 20ε

AS

(µ)

σ : π 2 kε : π 2

0.00

0.05

0.10

0.15

0 5 10 20ε

AS

(µ)

σ : π 2 kε : π


0

3

6

9

12

0 5 10 20ε

∆(Σ

)σ : π 8 kε : π 4

0

10

20

30

40

50

0 5 10 20ε

∆(Σ

)

σ : π 8 kε : π 2

0.0

2.5

5.0

7.5

10.0

0 5 10 20ε

∆(Σ

)

σ : π 8 kε : π

0.0

0.5

1.0

1.5

2.0

0 5 10 20ε

∆(Σ

)

σ : π 4 kε : π 4

0

3

6

9

12

0 5 10 20ε

∆(Σ

)

σ : π 4 kε : π 2

0.0

0.5

1.0

0 5 10 20ε

∆(Σ

)

σ : π 4 kε : π

0.0

0.1

0.2

0.3

0.4

0 5 10 20ε

∆(Σ

)

σ : π 2 kε : π 4

0.0

0.5

1.0

1.5

2.0

0 5 10 20ε

∆(Σ

)

σ : π 2 kε : π 2

0.00

0.05

0.10

0.15

0 5 10 20ε

∆(Σ

)

σ : π 2 kε : π


0.000

0.005

0.010

0.015

0 5 10 20ε

AS

(µ)

σ : π 8 kε : π 4

0.00

0.02

0.04

0.06

0 5 10 20ε

AS

(µ)

σ : π 8 kε : π 2

0.00

0.01

0.02

0.03

0 5 10 20ε

AS

(µ)

σ : π 8 kε : π

0.000

0.005

0.010

0.015

0.020

0.025

0 5 10 20ε

AS

(µ)

σ : π 4 kε : π 4

0.00

0.02

0.04

0.06

0 5 10 20ε

AS

(µ)

σ : π 4 kε : π 2

0.00

0.05

0.10

0.15

0.20

0 5 10 20ε

AS

(µ)

σ : π 4 kε : π

0.0

0.2

0.4

0.6

0.8

0 5 10 20ε

AS

(µ)

σ : π 2 kε : π 4

0.0

0.2

0.4

0.6

0 5 10 20ε

AS

(µ)

σ : π 2 kε : π 2

0.0

0.2

0.4

0.6

0 5 10 20ε

AS

(µ)

σ : π 2 kε : π


0

10

20

30

0 5 10 20ε

∆(Σ

)σ : π 8 kε : π 4

0

50

100

0 5 10 20ε

∆(Σ

)

σ : π 8 kε : π 2

0

25

50

75

100

125

0 5 10 20ε

∆(Σ

)

σ : π 8 kε : π

0.0

2.5

5.0

7.5

0 5 10 20ε

∆(Σ

)

σ : π 4 kε : π 4

0

10

20

30

0 5 10 20ε

∆(Σ

)

σ : π 4 kε : π 2

0

20

40

60

0 5 10 20ε

∆(Σ

)

σ : π 4 kε : π

0

10

20

30

0 5 10 20ε

∆(Σ

)

σ : π 2 kε : π 4

0

10

20

30

0 5 10 20ε

∆(Σ

)

σ : π 2 kε : π 2

0

10

20

30

0 5 10 20ε

∆(Σ

)

σ : π 2 kε : π


Documents

Robust Estimation for Multivariate Wrapped ModelsRobust Estimation for Multivariate Wrapped Models Giovanni Saraceno 1, Claudio Agostinelli 1, and Luca Greco 2 1 Department of Mathematics,