
Additive Model Building for Spatial Regression

Siddhartha Nandy, Chae Young Lim and Tapabrata Maiti
Department of Statistics & Probability
Michigan State University, East Lansing, MI 48824
Email: {nandysid, lim, maiti}@stt.msu.edu

Abstract

Spatial regression is an important predictive tool in many scientific applications, and an additive model provides a flexible regression relationship between predictors and a response variable. Such a model has proven effective in regression-based prediction. In this article, we develop a regularized variable selection technique for building a spatial additive model. We found that approaches developed for independent data do not work well for spatially dependent data. This motivates us to propose a spatially weighted $\ell_2$-error norm with a group LASSO type penalty to select additive components for spatial additive models. We establish the selection consistency of the proposed approach, where the penalty parameter depends on several factors, such as the order of approximation of the additive components, characteristics of the spatial weight, and the rate of decay of the true covariance function. An extensive simulation study provides a vivid picture of the impact of dependent data structures and the choice of a spatial weight, as well as the asymptotic behavior of the estimates. We also investigate the impact of correlated predictor variables. As an illustrative example, the proposed approach is applied to lung cancer mortality data over the period 2000-2005, obtained from the Surveillance, Epidemiology, and End Results Program (SEER, www.seer.cancer.gov), where a number of demographic and socio-economic predictors are available.

Keywords. Additive models; Increasing-domain asymptotics; Group LASSO; LASSO; Spatial regression; Variable selection.

1 Introduction

Identifying important factors that explain phenomena such as climate change, economic volatility, ecological dynamics and disease progression reasonably well is important yet fairly challenging. In such applications, spatially dependent data are often observed, and spatial regression is a natural tool to analyze the data. An additive model provides a flexible regression relationship and has proven effective for regression-based prediction. The models developed for independent data are known to be statistically inefficient in this context. Thus, statistical models for dealing with spatially dependent data have received considerable attention over the last few decades.

A common feature of spatial data is a spatial relation among the sampling sites. In general, we assume that the dependence between two data points decreases as the distance between their sampling sites increases. At each sampling location, we observe a quantity of interest (response variable) together with additional information (covariates or predictors) that could affect the response. A natural approach to identify contributing factors is to select covariates in a regression setup, commonly known as variable selection. Among the many variable selection methods, the regularization technique has received considerable attention following the seminal paper by Tibshirani (1996).

The current literature on variable selection concentrates heavily on regression models built for independent observations. These methods are not expected to work well for spatially dependent data without adaptation. Further, the theoretical justification for spatial data requires special attention, and the same is true for variable selection. There are studies on variable selection for time series data. Typical time series data can be viewed as a special case of spatial lattice data by reducing the dimension of the observation domain to one and assuming the data are observed evenly over time. Wang et al. (2007) studied selection of regression coefficients and autoregressive order via the LASSO when the regression model has autoregressive errors. Nardi and Rinaldo (2011) considered the LASSO for autoregressive process modeling. Hsu et al. (2008) applied the LASSO to select the subset for vector autoregressive processes. Xu et al. (2012) studied variable selection for autoregressive models with infinite variance. While these approaches deal with a specific dependence structure (autoregressive structure), Gupta (2012) investigated variable selection of linear regression models for time series data with a general weak dependence structure.

Model selection or variable selection for spatial data has been investigated relatively recently. Hoeting et al. (2006) derived Akaike's Information Criterion (AIC) for a geostatistical model and used it to select explanatory variables when spatial correlation is considered. Huang and Chen (2007) introduced a model selection criterion with generalized degrees of freedom to select a spatial prediction method for geostatistical data. Huang et al. (2010a) considered a spatial LASSO for selecting covariates and spatial neighborhoods with a known spatial dependence structure, but no theoretical investigation was made. On the other hand, Wang and Zhu (2009) considered penalized least squares for geostatistical data with various penalty functions and investigated theoretical properties.

In likelihood-based approaches, Zhu et al. (2010) considered selection of spatial linear models together with a spatial neighborhood structure for spatial lattice data using a penalized maximum likelihood method under a spatial adaptive LASSO. Chu et al. (2011) investigated variable selection for spatial linear models for geostatistical data using a penalized maximum likelihood method; they considered an approximated penalized maximum likelihood approach with a tapered spatial covariance function. Reyes et al. (2012) extended the approach of Zhu et al. (2010) to spatial-temporal lattice models and applied the method to identify factors that propagate an insect outbreak. For spatial binary data, Fu et al. (2013) considered selection of autologistic regression models using a penalized pseudolikelihood to study land cover in relation to land ownership. To our knowledge, there is no work on spatial additive model selection with theoretical justification. The purpose of this article is to link the rapidly growing variable selection techniques with two widely applicable fields, namely spatial and nonparametric statistics.

For a flexible relationship between the response and the regressor variables, consider an additive model with a spatially dependent error distribution. Let $\{Y(s); s \in \mathbb{R}^d\}$ be a spatial process on $\mathbb{R}^d$ and $\{X(s); s \in \mathbb{R}^d\}$ be a $J$-dimensional vector, which can be stochastic if necessary. We consider the spatial additive model given in (1), with overall mean $\mu$ and $f_j$ an unknown function describing the relation between $Y$ and $X_j$, for $j = 1, \ldots, J$. We assume the error process is a zero-mean stationary Gaussian random field with covariance function $\delta(h)$ for distance $h$. $J$ can be larger than the sample size. Our objective is to select the 'effective' $f_j$'s.

For selection and estimation of the nonlinear components $f_j$, Huang et al. (2010b) proposed an adaptive group LASSO for the additive model (1), but with independent errors. Their work is based on spline approximation [Schumaker (2007)] of the nonlinear components, which allows the mean of (1) to be rewritten as a linear regression model with spline coefficients. Hence, the problem of selecting a component in an additive model is transformed into selecting groups of variables in a linear regression with predefined group members. Meier et al. (2009) also considered variable selection for high-dimensional additive models using spline approximation, but with a sparsity-smoothness penalty that controls both sparsity and smoothness of the spline approximation. Other works on the selection of additive models or additive nonparametric regression include Antoniadis and Fan (2001), Lin and Zhang (2006), Ravikumar et al. (2009) and Lai et al. (2012). These works assumed independent error distributions. For geo-additive regression models, Kneib et al. (2009) considered a variable selection method using a penalized spline approach for breeding bird community data, but no theoretical justification was discussed in their work.

Through a small simulation study, we examined a group LASSO approach developed for independent data to select nonzero components in an additive model when the true errors are spatially dependent. For the additive model (1), we considered $J = 10$ with two true nonzero components, $f_1(x) = \sin(x)$ and $f_2(x) = x$. We considered $m \times m$ unit-square lattices with $m = 6, 12, 24$, and for the spatial errors we used a Gaussian distribution with an exponential covariance function, $\delta(h) = \exp(-\rho|h|)$, $\rho = 0.5$. We generated 400 datasets. Since spatial dependence can also be captured by a function of the location in the mean, we investigated two different intercepts in addition to the additive component model (1): one is a constant and the other is a nonparametric function of the location using spline approximation. For a nonparametric function of location $s = (s_1, s_2)$, we considered an additive structure in terms of $s_1$ and $s_2$.

Table 1: Average and standard deviation of the number of selected covariates over 400 datasets from the exponential covariance function, $\delta(h) = \exp(-\rho|h|)$, with $\rho = 0.5$. The true number of nonzero components is 2; $m$ represents the size of the observation domain.

          Constant                  A function of the location
 m     Average   Std. Dev.        Average   Std. Dev.
 6      5.64       1.74            3.61       1.23
 12     6.61       1.60            4.31       1.62
 24     7.22       1.38            5.91       2.01

Table 1 shows the average and standard deviation of the number of selected components when the independence approach is used. The regularization parameter (or penalty parameter) is chosen as recommended by Huang et al. (2010b). The number of selected covariates is much larger than the true number of nonzero components in both cases at all sample sizes. Although a function of location as an additional mean component helps improve the selection results in this simulation study, it is not satisfactory. The performance did not improve even with a larger sample size, which suggests revisiting the asymptotic properties in the context of spatially dependent data. This leads us to investigate a variable selection method specifically for spatial additive models. The comprehensive simulation study is given in Section 4.

For a spatial additive model, we keep the mean structure of the independent-data group LASSO approach; that is, we use the idea of approximating $f_j$ by a linear combination of spline functions, with a group LASSO penalty for sparsity. We then introduce a spatial weight matrix into the objective function, which plays a major role. We develop the asymptotic theory to prove the selection consistency of nonzero components in the additive model under spatially dependent Gaussian errors. The spatial dependence structures we assume here are common in spatial modeling and valid for a wide range of applications. The proof of selection consistency provided here is under the assumption that the additive components share the same smoothness level. However, we find that the theoretical results can be extended to an additive model with different levels of smoothness.

It is well known that the selection result is sensitive to the penalty parameter. We find that a theoretical lower bound for the penalty parameter depends on the spatial dependence structure of the error as well as on characteristics of the spatial weight matrix. Our proposal for penalty parameter selection is thus different from that for independent data. We demonstrate how the proposed approach selects the penalty parameter and the weight matrix in practice.

The rest of the paper is organized as follows. Section 2 explains the proposed approach for selecting and estimating nonzero components in an additive model with spatially dependent errors. Section 3 discusses the main theoretical results on the asymptotic properties of our proposed estimators. Section 4 contains simulation results along with a real data example for illustration. Finally, we make some concluding remarks in Section 5. Proofs of all theorems and lemmas are provided in the appendix.

2 Method for selecting components in spatial additive models

Let $\{Y(s); s \in \mathbb{R}^d\}$ be a spatial process on $\mathbb{R}^d$ and $\{X(s); s \in \mathbb{R}^d\}$ be a $J$-dimensional vector, which can be stochastic if necessary. Then, we consider the following spatial additive model:
$$Y(s) = \mu + \sum_{j=1}^{J} f_j(X_j(s)) + \varepsilon(s), \quad \forall s \in \mathbb{R}^d, \qquad (1)$$
where $\mu$ is the overall mean, $f_j$ is an unknown function describing the relation between $Y$ and $X_j$, and $\{\varepsilon(s); s \in \mathbb{R}^d\}$ is a zero-mean stationary Gaussian random field with covariance function $\delta(h)$ for distance $h$. $J$ can be larger than the sample size.

In this section, we formulate and explain the proposed approach for selecting components in spatial additive models. Suppose that $(Y(s), X(s))$ are observed at $n$ different locations lying in the sampling region $D_n \subset \mathbb{R}^d$. Let $S$ be the set of sampling locations.

We use a spline approximation of $f_j$ in the additive model:
$$f_j(X_j(s)) \approx f_{nj}(X_j(s)) := \sum_{l=1}^{m_n} \beta_{jl} B_l(X_j(s)), \quad j = 1, \ldots, J, \qquad (2)$$
where the $B_l(\cdot)$'s are normalized B-spline bases and the $\beta_{jl}$'s are called control points. The approximation (2) is based on the theory that every smooth function can be well approximated by a linear combination of B-splines.
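As a concrete illustration (our sketch, not part of the paper's derivation), the centered B-spline design matrix for a single covariate can be built as follows. The cubic degree and quantile-based interior knots are common default choices rather than prescriptions of the paper, and `BSpline.design_matrix` assumes SciPy 1.8 or later.

```python
import numpy as np
from scipy.interpolate import BSpline

def centered_bspline_design(x, m_n, degree=3):
    """Centered B-spline design matrix for one covariate.

    x   : (n,) covariate values on a bounded interval [a, b]
    m_n : number of basis functions (control points per component)
    Returns an (n, m_n) array whose columns are
    B_l(x) - (1/n) sum_s B_l(x(s)), the centered bases used below.
    """
    # quantile-based interior knots; boundary knots repeated degree+1 times
    n_interior = m_n - degree - 1
    interior = np.quantile(x, np.linspace(0, 1, n_interior + 2)[1:-1])
    knots = np.r_[np.repeat(x.min(), degree + 1), interior,
                  np.repeat(x.max(), degree + 1)]
    B = BSpline.design_matrix(x, knots, degree).toarray()  # SciPy >= 1.8
    return B - B.mean(axis=0)
```

Stacking one such block per covariate column-wise gives the centered design matrix used in the objective functions below.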

We then consider the approximated linear model
$$Y(s) = \mu + \sum_{j=1}^{J} \sum_{l=1}^{m_n} \beta_{jl} B_l(X_j(s)) + \pi(s), \quad s \in S, \qquad (3)$$
where $\pi(s) = \varepsilon(s) + \theta(s)$ with $\theta(s) = \sum_{j=1}^{J} \big(f_j(X_j(s)) - f_{nj}(X_j(s))\big)$, and define $\theta = (\theta(s); s \in S)'$. The model (3) can be written in matrix form as $Y = \mu + B\beta + \pi$, where $Y = (Y(s), s \in S)'$, $\beta = (\beta_1', \beta_2', \ldots, \beta_J')'$, $B$ is the design matrix constructed from the spline functions, and $\pi = (\pi(s), s \in S)'$.

For $v = (v_1', v_2', \ldots, v_J')'$, where the $v_j$'s are vectors, and $\psi = (\psi_1, \psi_2, \ldots, \psi_J)'$, we define the weighted $\ell_1/\ell_2$-norm $\|v\|_{2,1,\psi} = \sum_{j=1}^{J} \psi_j \|v_j\|_2$, where $\|\cdot\|_2$ is the $\ell_2$-norm of a vector. This is a weighted $\ell_1$-norm of $(\|v_1\|_2, \cdots, \|v_J\|_2)$ with weight vector $\psi$. Then, we propose the following weighted $\ell_1/\ell_2$-penalized least-squares objective function, weighted by the spatial weight matrix $\Sigma_W$:
$$Q_n(\beta, \lambda_n) = (Y - \mu - B\beta)'\Sigma_W^{-1}(Y - \mu - B\beta) + \lambda_n \|\beta\|_{2,1,\psi_n}, \qquad (4)$$


where $\lambda_n$ is a regularization parameter, $\psi_n = (\psi_{n1}, \psi_{n2}, \ldots, \psi_{nJ})'$ is a suitable weight vector for the penalty term, and $\beta$ is the $(Jm_n \times 1)$-dimensional vector of control points introduced in (2). $\Sigma_W$ is a known positive definite spatial weight matrix in which larger weights are given when two locations are closer, and vice versa. For example, we can construct $\Sigma_W$ using commonly used spatial covariance functions, such as the Gaussian or inverse multiquadratic. Practical issues will be discussed later. Without a spatial weight matrix, (4) is the usual objective function for an independent-data additive model with a group LASSO penalty.

We allow the possibility that the regularization parameter, $\lambda_n$, and the weight vector, $\psi_n$, depend on the sample size, $n$. To avoid an identifiability issue, we assume that $Ef_j(X_j) = 0$, $\forall\, 1 \leq j \leq J$, which leads us to assume
$$\sum_{s \in S}\sum_{l=1}^{m_n} \beta_{jl} B_l(X_j(s)) = 0, \quad \forall\, 1 \leq j \leq J. \qquad (5)$$
Combining (4) and (5), we have the unconstrained objective function
$$Q_n(\beta, \lambda_n) = (Y_c - B_c\beta)'\Sigma_W^{-1}(Y_c - B_c\beta) + \lambda_n \|\beta\|_{2,1,\psi_n}, \qquad (6)$$
where $Y_c = (Y_c(s), s \in S)' = (Y(s) - \bar{Y}, s \in S)'$ with $\bar{Y} = \frac{1}{n}\sum_{s \in S} Y(s)$, and $B_c = (B_{c1}, B_{c2}, \ldots, B_{cJ})$ is the design matrix with $n \times m_n$ blocks $B_{cj}$. Each row of $B_{cj}$ is $(B^c_1(X_j(s)), \ldots, B^c_{m_n}(X_j(s)))$ with $B^c_l(X_j(s)) = B_l(X_j(s)) - \frac{1}{n}\sum_{s' \in S} B_l(X_j(s'))$. Minimizing (4) subject to (5) is equivalent to minimizing (6).

As the first step, we obtain an estimate of $\beta$ by minimizing the objective function $Q_{n1}(\beta, \lambda_{n1}) := Q_n(\beta, \lambda_{n1})$ with $\psi_{nj} = 1$, $\forall j = 1, \ldots, J$. We call this estimate the group LASSO (gL) estimate, $\hat{\beta}_{gL}(\lambda_{n1})$, and the corresponding objective function the gL objective function. Note that $\|\beta\|_{2,1,1} = \sum_{j=1}^{J} \|\beta_j\|_2$. To improve the selection, we use the following weights updated from $\hat{\beta}_{gL}$,
$$\psi_{nj} = \begin{cases} 1/\|\hat{\beta}_{gL,j}\|_2 & \text{if } \|\hat{\beta}_{gL,j}\|_2 > 0, \\ \infty & \text{if } \|\hat{\beta}_{gL,j}\|_2 = 0, \end{cases} \qquad (7)$$
in the objective function. We call this the adaptive group LASSO (AgL) objective function. That is, we define the AgL objective function $Q_{n2}(\beta, \lambda_{n2}) := Q_n(\beta, \lambda_{n2})$ with $\psi_n$ given in (7). The estimate from this updated objective function, $\hat{\beta}_{AgL}(\lambda_{n2}) = \arg\min_\beta Q_{n2}(\beta, \lambda_{n2})$, as a function of $\lambda_{n2}$, is referred to as the adaptive group LASSO estimate.
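In code, the weight update (7) is a one-liner. The following minimal sketch (ours, not the paper's implementation) assumes the stacked layout of $\beta$ used above, with $J$ blocks of length $m_n$:

```python
import numpy as np

def adaptive_weights(beta_gL, m_n):
    """Adaptive group weights (7) from the first-stage gL estimate.

    Groups with a zero gL estimate receive weight inf, so they are
    excluded from the AgL step (using the convention inf * 0 = 0).
    """
    norms = np.linalg.norm(beta_gL.reshape(-1, m_n), axis=1)  # ||beta_gL,j||_2
    return np.where(norms > 0.0, 1.0 / np.maximum(norms, 1e-300), np.inf)
```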

We define $\infty \times 0 = 0$, so components not selected by the gL method are not included in the AgL method. Also, due to the nature of the weights on the penalty term, the AgL objective function puts a higher penalty on components with a smaller $\ell_2$-norm of the gL estimates and a lower penalty on components with a larger $\ell_2$-norm of the gL estimates. Hence, components with larger gL estimates have a higher chance of being selected in the final model. Finally, the AgL estimates for $\mu$ and $f_j$ are given by
$$\hat{\mu} = \bar{Y} = \frac{1}{n}\sum_{s \in S} Y(s) \quad \text{and} \quad \hat{f}_{AgL,j}(X_j(s)) = \sum_{l=1}^{m_n} \hat{\beta}_{AgL,jl} B_l(X_j(s))$$
for all $j = 1, \ldots, J$, respectively.

Computationally efficient objective function: The objective function (6) involves $\Sigma_W^{-1}$, which may cause computational complications, particularly for large data. To avoid this, we reformulate (6) using the Cholesky decomposition of $\Sigma_W^{-1}$. Let $\Sigma_W^{-1} = LL'$, where $L$ is a lower triangular matrix. Then, (6) can be rewritten as
$$Q_n(\beta, \lambda_n) = (Z_c - D_c\beta)'(Z_c - D_c\beta) + \lambda_n \|\beta\|_{2,1,\psi_n}, \qquad (8)$$
where $Z_c = L'Y_c$ and $D_c = L'B_c$ with $D_{cj} = L'B_{cj}$. The objective function then becomes one with no spatial weight matrix and a new response variable $Z_c$, so we can adopt any available algorithm for a group LASSO method with independent errors. Note that we use a known spatial weight matrix, so the Cholesky decomposition to obtain $L$ is done only once.
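To illustrate how (8) can be exploited in practice, here is a minimal proximal-gradient sketch in Python. It is a bare-bones stand-in for a production group-LASSO solver, not the algorithm used in the paper: the whitening step follows the Cholesky reformulation above, while the solver loop, step size, and iteration count are generic choices.

```python
import numpy as np

def group_soft_threshold(v, tau):
    """Proximal operator of tau * ||v||_2 (block soft-thresholding)."""
    nv = np.linalg.norm(v)
    return np.zeros_like(v) if nv <= tau else (1.0 - tau / nv) * v

def spatial_group_lasso(Yc, Bc, Sigma_W, lam, psi, m_n, n_iter=1000):
    """Minimize (6) via its whitened form (8) with proximal gradient steps.

    psi holds the group weights: all ones for the gL step, or the
    adaptive weights (7) for the AgL step (inf drops a group entirely).
    """
    # Cholesky reformulation: Sigma_W^{-1} = L L', computed once
    L = np.linalg.cholesky(np.linalg.inv(Sigma_W))
    Zc, Dc = L.T @ Yc, L.T @ Bc
    J = Dc.shape[1] // m_n
    beta = np.zeros(Dc.shape[1])
    step = 0.5 / np.linalg.norm(Dc, 2) ** 2   # 1 / Lipschitz const. of the gradient
    for _ in range(n_iter):
        b = beta - step * 2.0 * Dc.T @ (Dc @ beta - Zc)
        for j in range(J):                    # group-wise proximal update
            blk = slice(j * m_n, (j + 1) * m_n)
            if np.isinf(psi[j]):
                beta[blk] = 0.0               # excluded by the gL stage
            else:
                beta[blk] = group_soft_threshold(b[blk], step * lam * psi[j])
    return beta
```

Accelerated (FISTA) or block coordinate descent variants would be the usual production choices; the point here is only that, after the one-time Cholesky step, any independent-error group-LASSO routine applies.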

3 Main Theoretical Results

In this section, we present results on the asymptotic properties of the gL and AgL estimators introduced in Section 2. To do so, we need to introduce some notation. Let $A_0$ and $A_*$ be the sets of zero and nonzero components, respectively. Without loss of generality, we consider $A_* = \{1, 2, \ldots, q\}$ and $A_0 = \{q+1, \ldots, J\}$. Also, we assume that there exists $A_0$ that satisfies $\sum_{j \in A_0} \|\beta_j\|_2 \leq \eta_1$ for some $\eta_1 \geq 0$, and let $A_* = \{1, 2, \ldots, J\} \setminus A_0$. The existence of $A_0$ is referred to as the generalized sparsity condition (GSC) [Zhang and Huang (2008)].

First, we consider the selection and estimation properties of a gL estimator. Let $A_\beta$ be the index set of nonzero gL estimates of $\beta_j$ and $A_f$ be the index set of nonzero gL estimators of $f_j$. The necessary assumptions to study the asymptotic properties are given in Assumption 1.

Assumption 1.

(H 1) Among the $J$ covariates, the number of nonzero components, $q$, is fixed, and there exists a constant $k_f > 0$ such that $\min_{1 \leq j \leq q} \|f_j\|_2 \geq k_f$.

(H 2) There exists $v > 0$ such that $\min_{s \neq s' \in S} \|s - s'\|_2 > v$, where $\|\cdot\|_2$, for a vector, is the $\ell_2$-norm, so that $\|s - s'\|_2$ is the Euclidean distance between $s$ and $s'$.

(H 3) The random vector $\varepsilon = \{\varepsilon(s), s \in S\} \sim \mathrm{Gaussian}(0, \Sigma_T)$, where $\Sigma_T = ((\sigma_{s,s'}))_{s,s' \in S}$ with $\sigma_{s,s'} = \delta(s - s')$ and $\delta(h)$ is a covariance function such that $\int_{D_n} \delta(h)\,dh = O(n^\alpha)$ for some $\alpha \in [0, 1)$. $D_n \subset \mathbb{R}^d$ is the sampling region that contains the sampling locations $S$. Without loss of generality, we assume that the origin of $\mathbb{R}^d$ is in the interior of $D_n$ and that $D_n$ increases with $n$.

(H 4) $f_j \in \mathcal{F}$ and $Ef_j(X_j) = 0$ for $j = 1, \ldots, J$, where
$$\mathcal{F} = \big\{ f : |f^{(k)}(s) - f^{(k)}(t)| \leq C|s - t|^\nu, \; \forall s, t \in [a, b] \big\}$$
for some nonnegative integer $k$ and $\nu \in (0, 1]$. Also suppose that $\tau = k + \nu > 1$.

(H 5) The covariate vector $X$ has a bounded continuous density function $g_j(x)$ of $X_j$ on a bounded domain $[a, b]$ for $1 \leq j \leq J$.

(H 6) $m_n = O(n^\gamma)$ with $1/6 \leq \gamma = 1/(2\tau + 1) < (1 - \alpha)/3$.

(H 7) There exists $M < \infty$ such that $\kappa(\Sigma_W) = \kappa(\Sigma_W^{-1}) \leq M$, where $\kappa$ is the condition number of a matrix.

The assumption (H 1) indicates that we need strong signals from the nonzero components to distinguish them from the noise. The assumption (H 2) implies that we consider increasing-domain asymptotics for our large-sample properties, which is a common sampling assumption in the asymptotic theory of spatial statistics.

The assumption (H 3) is related to the level of spatial dependence. Any integrable stationary spatial covariance function implies $\alpha = 0$, and commonly available stationary spatial covariance models fall within this category; for example, the covariance functions introduced in Table 2 are all integrable. By allowing $0 < \alpha < 1$, we include covariance models of long-memory processes. Thus, the theorems under this assumption cover a broader class of spatial covariance functions. We assume that $D_n$ contains the origin so that the spatial lag $h$ and the observation locations live on the same spatial domain. The stationary spatial covariance function $\delta(h)$ gives the marginal variance at $h = 0$ (i.e., $s = s'$) and decreases toward zero as $\|h\| \rightarrow \infty$. The condition on $\delta(h)$ in the assumption (H 3) becomes meaningful by assuming $D_n$ contains the origin; for example, the condition does not guarantee that $\delta(h)$ is integrable (which corresponds to $\alpha = 0$) if we do not assume that $D_n$ contains the origin.

The assumption (H 4) states that $f_j$ is in the class of functions defined on $[a, b]$ whose $k$th derivative satisfies a Lipschitz condition of order $\nu$; the zero expectation condition is needed to avoid an identifiability issue. A nonzero constant mean can be captured by the overall mean $\mu$, so we assume the mean of each additive component is zero.

The assumption (H 5) is needed for the spline approximation of the additive components. The assumption (H 6) is related to the number of B-spline bases used to approximate the additive components. The parameter $\gamma$ in assumption (H 6) controls the smoothness of the additive components; imposing upper and lower bounds on $\gamma$ implies that those functions can be neither too smooth nor too wiggly. If the functions are too smooth, it would be hard to distinguish them from the overall mean, whereas if they are too wiggly, it would be hard to distinguish them from the random error. Thus, this is a necessary assumption to identify nonzero components in additive models.

The assumption (H 7) implies that the spatial weight matrix, $\Sigma_W$, is a well-conditioned matrix. Note that we consider a spatial covariance function to construct the spatial weight matrix. In fact, one can see that, under the assumption (H 2), the smallest eigenvalue of the covariance matrix constructed from several spatial covariance functions is bounded away from zero. Those spatial covariance functions include the Gaussian and inverse multiquadratic covariance functions [see, e.g., Wendland (2005)], which are given in Table 2. On the other hand, the largest eigenvalue of a covariance matrix is related to the norm of the covariance matrix. By Geršgorin's theorem [Horn and Johnson (1985)], we can show that $\rho_{\max}(\Sigma_W) \leq \max_j \sum_k |\Sigma_{W,jk}|$, where $\Sigma_{W,jk}$ is the $(j, k)$-th entry of $\Sigma_W$. When we use a stationary spatial covariance function, $\delta(h)$, to construct the spatial weight matrix, $\rho_{\max}(\Sigma_W) \leq \max_j \sum_k \delta(s_j - s_k)$, and this is bounded by a finite constant independent of $n$ for most stationary spatial covariance functions.

Now, we introduce the selection consistency result for the gL estimator.

Theorem 1. Suppose that the conditions in Assumption 1 hold and that $\lambda_{n1} > C\rho_{\max}(L)\sqrt{n^{1+\alpha} m_n \log(Jm_n)}$ for a sufficiently large constant $C$. Then, we have

(a) $\displaystyle \sum_{j=1}^{J} \|\hat{\beta}_{gL,j} - \beta_j\|_2^2 = O_p\Big( \frac{\rho_{\max}^2(L)\, m_n^3 \log(Jm_n)}{n^{1-\alpha}} + \frac{m_n}{n^{1-\alpha}} + \frac{1}{m_n^{2\tau-1}} + \frac{4 m_n^2 \lambda_{n1}^2}{n^2} \Big)$,

(b) If $\dfrac{m_n^2 \lambda_{n1}^2}{n^2} \longrightarrow 0$ as $n \longrightarrow \infty$, all the nonzero components $\beta_j$, $1 \leq j \leq q$, are selected with probability (w.p.) converging to 1.

The spatial dependence structure of the data and the spatial weight matrix enter the theoretical lower bound of the penalty parameter and the convergence rate via the maximum eigenvalue of $L$ and via $\alpha$. $L$ is the Cholesky factor of $\Sigma_W^{-1}$, and $\alpha$ characterizes the spatial process. This implies that the spatial dependence of the data and the spatial weight matrix affect the convergence rate and the selection of components through the choice of the penalty parameter. This might tempt one to estimate these additional parameters, namely $\rho_{\max}(L)$ and $\alpha$, immediately. However, we argue from both theoretical and numerical standpoints that one does not need to do this painful job in most practical situations. First, the choice of a weight matrix only requires satisfying the assumption (H 7), meaning the weight matrix need not be the true covariance matrix. As discussed before, we use a parameterized spatial covariance function to construct a spatial weight matrix; the selection of the parameter for a spatial weight matrix is discussed in the next section. Second, for any integrable stationary spatial covariance model, $\alpha = 0$, and this is the case in most practical situations. If $0 < \alpha < 1$, which corresponds to long-range dependence, one might estimate it to calculate the lower bound of the penalty parameter. There is some literature on how to estimate long-range parameters for random fields [e.g., Anh and Lunney (1995), Boissy et al. (2005)], but these methods are limited since a specific class of random fields is assumed. Here we argue that the choice of $\alpha$ may not be critical in practice, as we select the penalty parameter (discussed in more depth later) guided by the theoretical lower bound to provide a plausible range of $\lambda_{n1}$, thus avoiding the estimation of $\alpha$.

Besides the parameters from the spatial weight matrix and the true covariance matrix, the lower bound of $\lambda_{n1}$ contains an extra factor of $m_n$, the number of basis functions approximating the $f_j$'s, in contrast to the independent data situation. This additional $m_n$ comes from the bound on the expected value of a function of the spatially dependent error (see Lemma 3). All these additional quantities make the lower bound of $\lambda_{n1}$ for spatial additive models larger than for independent error models. The larger lower bound reduces overestimation even in the case $\Sigma_W = I$. This is also observed in the simulation study.

To prove Theorem 1 (details given in Appendix B), we need to control the spline approximation of $f_j$ as well as the spatial dependence in the error terms. The boundedness of the number of selected components is critical to prove the theorem. These results are established as lemmas in Appendix B and used in the proof of the theorem. Although the approach to proving the asymptotic properties stated above for spatial additive models is similar to that for additive models with independent errors, the details differ due to the spatial dependence as well as the spatial weight matrix in the proposed method.

The next theorem provides selection consistency in terms of the estimated nonlinear components $f_j$.

Theorem 2. Suppose that the conditions in Assumption 1 hold and that $\lambda_{n1} > C\rho_{\max}(L)\sqrt{n^{1+\alpha} m_n \log(Jm_n)}$ for a sufficiently large constant $C$. Then,

(a) $\displaystyle \|\hat{f}_{gL,j} - f_j\|_2^2 = O_p\Big( \frac{\rho_{\max}^2(L)\, m_n^2 \log(Jm_n)}{n^{1-\alpha}} + \frac{1}{n^{1-\alpha}} + \frac{1}{m_n^{2\tau}} + \frac{4 m_n \lambda_{n1}^2}{n^2} \Big)$ for $j \in A_\beta \cup A_*$, where $A_\beta$ is the index set of nonzero gL estimates of $\beta_j$,

(b) If $\dfrac{m_n \lambda_{n1}^2}{n^2} \longrightarrow 0$ as $n \longrightarrow \infty$, all the nonzero components $f_j$, $1 \leq j \leq q$, are selected w.p. converging to 1.

Note that by (a) and (b) of Theorem 2 with $\lambda_{n1} = O(\rho_{\max}(L)\sqrt{n^{1+\alpha} m_n \log(Jm_n)})$ and $m_n = O(n^\gamma)$ with $1/6 \leq \gamma = 1/(2\tau + 1) < (1 - \alpha)/3$ (assumption (H 6)), we have

(i) $\displaystyle \|\hat{f}_{gL,j} - f_j\|_2^2 = O_p\Big( \frac{\rho_{\max}^2(L)\, n^{2\gamma} \log(Jm_n)}{n^{1-\alpha}} \Big)$ for $j \in A_\beta \cup A_*$, and

(ii) if $\dfrac{\rho_{\max}^2(L) \log(J)}{n^{1-\alpha-2\gamma}} \longrightarrow 0$ as $n \longrightarrow \infty$, then, with probability converging to 1, all the nonzero components $f_j$, $1 \leq j \leq q$, are selected.

Hence, we can infer that the number of additive components, $J$, can be as large as $\exp\big(o(\rho_{\min}^2(L)\, n^{1-\alpha-2\gamma})\big)$. In particular, if we need second-order differentiability of the functions $f_j$, then $\tau = 2$, which implies $\gamma = 1/5$, in which case $J$ can be as large as $\exp\big(o(\rho_{\min}^2(L)\, n^{3/5-\alpha})\big)$. Keeping $\alpha$ and $L$ fixed, we see that $J$ can increase exponentially in $n$.


Next, we state the additional assumptions for the asymptotic properties of the AgL estimator.

Assumption 2.

(K 1) The initial estimators, $\hat{\beta}_{gL,j}$, are $r_n$-consistent:
$$r_n \max_{1 \leq j \leq J} \|\hat{\beta}_{gL,j} - \beta_j\|_2 = O_p(1), \quad \text{as } r_n \longrightarrow \infty,$$
and there exists a constant $k_b > 0$ such that
$$P\Big(\min_{j \in A_*} \|\hat{\beta}_{gL,j}\|_2 \geq k_b b_{n1}\Big) \longrightarrow 1,$$
where $b_{n1} = \min_{j \in A_*} \|\beta_j\|_2 \asymp m_n^{1/2}$; here, for two positive sequences $a_n$ and $b_n$, $a_n \asymp b_n$ if there exist $a_1$ and $a_2$ such that $0 < a_1 < a_n/b_n < a_2 < \infty$.

(K 2)
$$\frac{\sqrt{\rho_{\max}^2(L)\, n^{1+\alpha} m_n \log(s_n m_n)}}{\lambda_{n2} r_n} + \frac{n^2}{\lambda_{n2}^2 r_n^2 m_n} + \frac{\lambda_{n2} m_n}{n} = o(1),$$
where $s_n = J - |A_{**}|$, and $A_{**}$ is the set of indices corresponding to the components in the additive model that are correctly selected by the AgL approach. The mathematical definition is given in the proof of Theorem 3.

Note that (K 1) ensures the availability of an $r_n$-consistent estimator provided certain regularity conditions are satisfied. Also, one can observe that under assumptions (H 1) to (H 7), a suitable choice of $r_n$ is $\Big(\frac{\rho_{\max}^2(L)\, m_n^3 \log(Jm_n)}{n^{1-\alpha}}\Big)^{-1/2}$. Then, (K 2) can be replaced by (K 2)$'$, given as
$$\text{(K 2)}' \quad \frac{\lambda_{n1}\sqrt{m_n}}{\lambda_{n2}} + \frac{\lambda_{n2} m_n}{n} = o(1).$$
The detailed derivation is given in Appendix A.

For the selection consistency of the AgL estimator, we introduce "$\hat{\beta}_{AgL} =_0 \beta$", which means that $\mathrm{sgn}_0(\|\hat{\beta}_{AgL,j}\|_2) = \mathrm{sgn}_0(\|\beta_j\|_2)$ for all $j$, where $\mathrm{sgn}_0(\|x\|_2) = 1$ if $\|x\|_2 > 0$ and $= 0$ if $\|x\|_2 = 0$. Define $J_0 = |A_* \cup \{j : \|\hat{\beta}_{AgL,j}\|_2 > 0\}|$. Note that $J_0$ is bounded by a finite number w.p. converging to 1 by Theorem 1 and because $|A_*| = q$.

Theorem 3. Suppose that the conditions in Assumptions 1 and 2 are satisfied. Then,

(a) $P(\hat{\beta}_{AgL} =_0 \beta) \longrightarrow 1$,

(b) $\displaystyle \sum_{j=1}^{q} \|\hat{\beta}_{AgL,j} - \beta_j\|_2^2 = O_p\Big( \frac{\rho_{\max}^2(L)\, m_n^3 \log(J_0 m_n)}{n^{1-\alpha}} + \frac{m_n}{n^{1-\alpha}} + \frac{1}{m_n^{2\tau-1}} + \frac{4 m_n^2 \lambda_{n2}^2}{n^2} \Big)$.


By Theorem 3, we can show that the proposed AgL estimator of $\beta$ can separate out the true zero components of the spatial additive model. Similar to Theorem 1, we have additional quantities on the right-hand side of the expression in Theorem 3(b), which are due to the spatial dependence in the errors as well as the use of a spatial weight matrix. Since $J_0$ is bounded by a finite number w.p. converging to 1, the convergence rate of the AgL estimator of $\beta$ is faster than that of the gL estimator. The next theorem shows that the estimated components of the $f_j$'s in the spatial additive model using the AgL estimator of $\beta$ identify zero components consistently. The theorem also provides the convergence rate for the estimated components.

Theorem 4. Suppose that the conditions in Assumptions 1 and 2 are satisfied. Then,

(a) $P(\|\hat{f}_{AgL,j}\|_2 > 0,\ j \in A_*\ \text{and}\ \|\hat{f}_{AgL,j}\|_2 = 0,\ j \notin A_*) \longrightarrow 1$,

(b) $\displaystyle \sum_{j=1}^{q} \|\hat{f}_{AgL,j} - f_j\|_2^2 = O_p\Big( \frac{\rho_{\max}^2(L)\, m_n^2 \log(J_0 m_n)}{n^{1-\alpha}} + \frac{1}{n^{1-\alpha}} + \frac{1}{m_n^{2\tau}} + \frac{4 m_n \lambda_{n2}^2}{n^2} \Big)$.

The upper bounds on the convergence rates in Theorems 1-4 show that the convergence rates are slower than those for independent data, which is not surprising given that we are dealing with dependent data. Also, our theoretical results show that we need an improved lower bound for the penalty parameter for spatial additive models, which is critical since the selection results are sensitive to the penalty parameter in practice. This is supported by the simulation study in the next section, where we clearly see worse performance when we blindly apply the approach developed for independent data to select nonzero components of spatial additive models.

We assumed that each additive component shares the same smoothness via the assumption (H 4) in Assumption 1. We can extend our results to allow different levels of smoothness for the additive components without much difficulty. The necessary changes in Assumption 1 are:

(H 4)$'$ $f_j \in \mathcal{F}_j$ and $Ef_j(X_j) = 0$ for $j = 1, \ldots, J$, where
$$\mathcal{F}_j = \big\{ f : |f^{(k_j)}(s) - f^{(k_j)}(t)| \leq C|s - t|^{\nu_j}, \; \forall s, t \in [a, b] \big\}$$
for some nonnegative integer $k_j$ and $\nu_j \in (0, 1]$. Also suppose that $\tau_j = k_j + \nu_j > 1$; and

(H 6)$'$ $m_{nj} = O(n^{\gamma_j})$ with $1/6 \leq \gamma_j = 1/(2\tau_j + 1) < (1 - \alpha)/3$, $\forall j = 1, 2, \ldots, J$, where $m_{nj}$ is the number of B-spline bases (or knots) used to approximate the $j$th additive component.

The revised theorems and lemmas are provided in Appendix C, where $m_n = \max_{j=1,\ldots,J} m_{nj}$. The results are similar except for a few changes due to the introduction of $m_{nj}$. In practice, we standardize the range and variability of all the additive components and use the same number of knot points, $m_n$, for each component. Note that the largest number of knot points $m_n$ can cover all smooth additive components. Since the order of $m_n$ also has to lie in $[n^{1/6}, n^{(1-\alpha)/3})$ according to the assumption (H 6)$'$, we use this bound as a guideline to choose $m_n$. Using the same number of knot points for each component in practice is also suggested by Hastie and Tibshirani (1990).


3.1 Selection of a regularization parameter and a spatial weight matrix

The selection result is sensitive to the choice of the regularization parameter. In addition to the regularization parameter, the proposed approach for a spatial additive model requires choosing a spatial weight matrix as well. The theoretical results only provide a lower bound for the regularization parameter, which involves the information of the spatial weight matrix through $\rho_{\max}(L)$. There is a large literature on how to select the regularization parameter in penalized covariate selection methods, but we cannot apply such approaches directly to our setting due to the spatial weight matrix. A complete theoretical investigation of finding an optimal value could be interesting but, we believe, is beyond the scope of this article. Alternatively, we demonstrate how the theoretical results derived in this paper can be used to select the regularization parameter in practice without introducing much complication.

Since there are many choices for constructing a spatial weight matrix, we first assume, for simplicity, that the spatial weight matrix is constructed from a class of stationary spatial covariance functions controlled by a parameter $\rho$. We call $\rho$ a spatial weight parameter. Example classes of spatial covariance functions that satisfy the assumption (H 7) are the Gaussian covariance function and the inverse multiquadratic function; their explicit expressions are given in Table 2. The selection problem is then reduced to selecting a spatial weight parameter. To choose the regularization parameter jointly, we take the theoretical lower bound of the regularization parameter as our regularization parameter value at a given value of $\rho$, so that one parameter (the spatial weight parameter) controls both the regularization level and the spatial weight. Finally, we adopt the generalized information criterion (GIC) [Fan and Tang (2013)] as a measure to choose the spatial weight parameter. That is, we find $\rho$ that minimizes
$$\mathrm{GIC}(\lambda_n(\rho)) = \log(\mathrm{RSS}) + df_{\lambda_n}\,\frac{\log\log(n)\,\log(J)}{n}.$$
In practice, we consider a sequence of $\rho$ values and choose the one that minimizes GIC. While experimenting with different information criteria and comparing with the existing cross-validation criterion suggested by Yuan and Lin (2006), we noticed that we cannot obtain an initial least squares estimator when $J > n$. Thus, we define the degrees of freedom ($df_{\lambda_n}$) as the total number of estimated nonzero coefficients, i.e., $df_{\lambda_n} = q_{\lambda_n} m_n$, where $q_{\lambda_n}$ is the number of selected components.
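As a hedged sketch of how this selection loop might look in code, the following uses a hypothetical helper `fit_weighted_gl` (not defined in the paper) standing in for fitting the weighted gL objective at the $\lambda$ implied by the theoretical lower bound for a given $\rho$:

```python
import numpy as np

def gic(rss, df, n, J):
    """GIC of Fan and Tang (2013) as used in Section 3.1."""
    return np.log(rss) + df * np.log(np.log(n)) * np.log(J) / n

def select_rho(rho_grid, fit_weighted_gl, n, J, m_n):
    """Pick the spatial weight parameter (and hence lambda_n) by GIC.

    fit_weighted_gl(rho) is assumed to return (rss, q_selected) after
    fitting with Sigma_W(rho) and lambda at its theoretical lower
    bound; df = q_selected * m_n as in the text.
    """
    scores = []
    for rho in rho_grid:
        rss, q_sel = fit_weighted_gl(rho)
        scores.append(gic(rss, q_sel * m_n, n, J))
    return rho_grid[int(np.argmin(scores))]
```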

For the independent data case, Huang et al. (2010b) suggested the extended Bayesian information criterion (EBIC) to choose the regularization parameter, which requires choosing an additional parameter ($\nu$ in their paper). A smaller value than the one they suggested for this additional parameter worked better in our simulation study. Given the sensitivity of EBIC to this additional parameter, we instead recommend GIC, which has no additional parameter.

There are two places where a spatial covariance model is considered: one for modeling the spatial dependence structure of the data, and one for constructing the spatial weight matrix in the objective function. One might think that using the true spatial covariance matrix as the spatial weight matrix would work better than a known spatial weight matrix. In fact, using the true spatial covariance matrix as the spatial weight matrix is not validated by the theory for some covariance models. Our theory shows that the method is valid for the class of weight matrices satisfying the condition (H 7) and the class of underlying spatial distributions satisfying the condition (H 3). Some spatial covariance models that satisfy (H 3) may not satisfy (H 7). Thus, we claim that our method is more general, since it includes the case where the true spatial covariance matrix is used as the spatial weight matrix as long as the conditions (H 3) and (H 7) hold. Also, our objective is not to estimate the true covariance matrix, and our method does not require estimating the true covariance matrix to select additive components.

 Name                                 Parameters   δ(h)
 Exponential, Exp(ρ)                  ρ            σ² exp(−ρ|h|)
 Matérn, Mat_ν(ρ)                     ν, ρ         σ² [2^{1−ν}/Γ(ν)] (√(2ν)|h|/ρ)^ν K_ν(√(2ν)|h|/ρ)
 Inverse Multiquadratic, InvMQ(ρ)     ρ            σ² (1 + |h|²)^{−ρ}
 Gaussian, Gauss(ρ)                   ρ            σ² exp(−ρ²|h|²)

Table 2: Spatial covariance functions, δ(h), considered in the simulation study for a covariance model and a spatial weight matrix.
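For concreteness, a minimal sketch (ours, not the paper's code) that builds the two weight-matrix classes from Table 2 that satisfy (H 7):

```python
import numpy as np
from scipy.spatial.distance import cdist

def spatial_weight_matrix(S, kind, rho, sigma2=1.0):
    """Spatial weight matrix Sigma_W from the Gaussian or inverse
    multiquadratic classes of Table 2 (both satisfy (H 7)).

    S : (n, d) array of sampling locations.
    """
    H = cdist(S, S)                                # pairwise distances |h|
    if kind == "gaussian":
        return sigma2 * np.exp(-(rho ** 2) * H ** 2)
    if kind == "invmq":
        return sigma2 * (1.0 + H ** 2) ** (-rho)
    raise ValueError(f"unknown weight class: {kind}")
```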

4 Numerical investigation

4.1 Simulation study

In this section, we present a simulation study to illustrate our theoretical findings. We consider $S = \{(s_1, s_2) : s_1, s_2 = 1, \cdots, m\}$ with $m = 6, 12, 24$, which makes the sample sizes $n = 36, 144, 576$, respectively, and we consider $J = 15, 25, 35$. We have $q = 4$ nonzero components:
$$f_1(x) = 5x, \quad f_2(x) = 3(2x - 1)^2, \quad f_3(x) = \frac{4\sin(2\pi x)}{2 - \sin(2\pi x)},$$
$$f_4(x) = 0.6\sin(2\pi x) + 1.2\cos(2\pi x) + 1.8\sin^2(2\pi x) + 2.4\cos^3(2\pi x) + 3.0\sin^3(2\pi x),$$
and the remaining components are set to zero; that is, $f_j(x) \equiv 0$ for $j = 5, \cdots, J$. The covariates are $X_j = (W_j + tU)/(1 + t)$ for $j = 1, \cdots, J$, where the $W_j$ and $U$ are i.i.d. Uniform$[0, 1]$. The correlation between $X_j$ and $X_k$ is $t^2/(1 + t^2)$; therefore, as $t$ increases, the dependence among covariates increases. In the simulation study, we consider $t = 1$ and $t = 3$. The nonzero components $f_1, \cdots, f_4$ given above are taken from Huang et al. (2010b) and were originally introduced by Lin and Zhang (2006).

We assume that the error process follows a stationary mean-zero Gaussian process with a spatial covariance function $\delta(h)$. To investigate the selection performance of the proposed method by the type of spatial dependence of the process, we consider three different covariance models: the exponential, Matérn and Gaussian covariance functions listed in Table 2. For simplicity, we set the variance $\sigma^2 = 1$. For the exponential covariance function, we consider $\rho = 0.5$ and $1$. For the Matérn covariance function, we consider $\nu = 3/2$ and $5/2$, and for each of these cases we choose $\rho$ to be $1.5$ and $2.5$. For the Gaussian covariance function, we consider $\rho = 1.5$ and $2.5$. The parameter $\rho$ controls how fast the covariance function decays as the distance $\|h\|$ increases and thus the level of spatial dependence within the covariance model. Given the specified $f_j$'s and the spatial dependence structure of the error process described above, we generate 100 datasets for each case.

The three covariance functions are characterized by the mean square differentiability of the process. A Gaussian process with an exponential covariance function is continuous in the mean squared sense, while a Gaussian process with a Gaussian covariance function is infinitely differentiable. As an intermediate, a Gaussian process with a Matérn covariance function is $\lceil \nu - 1 \rceil$ times differentiable in the mean squared sense, where $\lceil x \rceil$ is the smallest integer larger than or equal to $x$ [Stein (1999)]. The mean square differentiability is related to the smoothness, that is, the local behavior, of the processes; thus, we can also investigate selection performance in view of the local properties of the processes by considering different types of covariance functions.
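To make the design concrete, the following minimal Python sketch (an illustration of the stated design, not the authors' code) generates one replicate under the exponential covariance model; the other covariance models of Table 2 change only the line building the covariance matrix.

```python
import numpy as np
from scipy.spatial.distance import cdist

def simulate_dataset(m=12, J=15, t=1.0, rho=0.5, seed=0):
    """One replicate of the Section 4.1 design with Exp(rho) errors, sigma^2 = 1."""
    rng = np.random.default_rng(seed)
    # m x m unit-square lattice of sampling locations
    g = np.arange(1, m + 1)
    S = np.array([(a, b) for a in g for b in g], dtype=float)
    n = m * m
    # covariates X_j = (W_j + t*U)/(1 + t); corr(X_j, X_k) = t^2/(1 + t^2)
    W = rng.uniform(size=(n, J))
    U = rng.uniform(size=(n, 1))
    X = (W + t * U) / (1 + t)
    # stationary Gaussian errors with delta(h) = exp(-rho * |h|)
    eps = np.linalg.cholesky(np.exp(-rho * cdist(S, S))) @ rng.standard_normal(n)
    # the four nonzero components; f_j = 0 for j > 4
    z = 2 * np.pi * X[:, 3]
    f = (5 * X[:, 0]
         + 3 * (2 * X[:, 1] - 1) ** 2
         + 4 * np.sin(2 * np.pi * X[:, 2]) / (2 - np.sin(2 * np.pi * X[:, 2]))
         + 0.6 * np.sin(z) + 1.2 * np.cos(z) + 1.8 * np.sin(z) ** 2
         + 2.4 * np.cos(z) ** 3 + 3.0 * np.sin(z) ** 3)
    return S, X, f + eps
```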

Once the data are generated, the selection performance of the proposed method is examined under several choices of spatial weight matrix. In particular, we considered two classes: Gaussian and inverse multiquadratic functions. In addition to these two classes, we also considered the identity matrix and the true covariance function of the underlying process as spatial weight matrices for comparison. When a spatial weight matrix is controlled by a spatial weight parameter, we applied the approach introduced in Section 3.1 to select both the spatial weight parameter and the regularization parameter. When we consider no spatial weight matrix, there is no spatial weight parameter to control; in this case, we considered a sequence of regularization parameter values around the theoretical lower bound of $\lambda_{n1}$ and chose the one that minimizes GIC. Note that the true spatial processes we considered are stationary, so $\alpha = 0$.

We also implemented the method by Huang et al. (2010b), which was developed for independent data, for comparison; we refer to it as the 'independent approach'. Following standard practice, we computed the average and standard deviation of the True Positives (TP) and False Positives (FP). TP is the number of additive components that are correctly selected; FP is the number of additive components that are falsely selected. In our simulation setting, the desired values of TP and FP are $4\,(= q)$ and $0$, respectively.

Table 3 shows the selection results for the case with $t = 1$ when generating $X_j$, $m = 6, 24$, and a selected set of true correlation parameter values. Complete simulation results are given in Appendix D. To illustrate the applicability of our method when the dependence between covariates increases, we include results corresponding to $t = 3$ in Appendix D as well.

The first row of each covariance model block in Table 3 (where the spatial weight is 'None(Indep)') corresponds to the independent approach. This gives a clear picture of overestimation as far as FP components are concerned. Looking at TP alone, one might think the result of the independent approach is good, but with a large FP the method selects many more components than the truth. The trend remains even with increased sample size. This is expected, as discussed before.

The following rows are results from various choices of spatial weight matrix, including $\Sigma_W = I_n$. Note that the spatial weight parameter involved here was chosen as introduced before.


                                          m = 6                        m = 24
 J   Cov Model     Spatial Weight   True Pos.    False Pos.     True Pos.   False Pos.
 15  Exp(0.5)      None(Indep)      3.74(0.48)   3.63(1.83)     4(0)        0.55(0.77)
                   I                3.17(0.82)   2.02(1.34)     4(0)        0(0)
                   Gauss            3.01(0.85)   1.76(1.24)     4(0)        0(0)
                   InvMQ            2.94(0.93)   1.58(1.44)     4(0)        0(0)
                   True             1.66(1.02)   0.29(0.56)     4(0)        0(0)
     Mat3/2(2.5)   None(Indep)      3.8(0.45)    3.63(1.86)     4(0)        0.34(0.57)
                   I                3.26(0.77)   1.87(1.5)      4(0)        0.05(0.22)
                   Gauss            2.98(0.85)   1.64(1.34)     4(0)        0.01(0.1)
                   InvMQ            2.74(0.86)   1.5(1.18)      4(0)        0(0)
                   True             0.8(0.68)    0.09(0.32)     4(0)        0(0)
     Mat5/2(2.5)   None(Indep)      3.82(0.41)   3.59(2.08)     4(0)        0.58(0.88)
                   I                3.22(0.76)   2.06(1.55)     4(0)        0.04(0.2)
                   Gauss            2.91(0.89)   1.72(1.35)     4(0)        0(0)
                   InvMQ            2.97(0.89)   1.67(1.33)     4(0)        0(0)
                   True             0.37(0.56)   0.04(0.24)     4(0)        0(0)
     Gauss(1.5)    None(Indep)      3.76(0.49)   3.97(2.06)     4(0)        0.59(0.81)
                   I                3.23(0.75)   2.07(1.24)     4(0)        0.03(0.17)
                   Gauss            3.02(0.82)   1.86(1.25)     4(0)        0.01(0.1)
                   InvMQ            2.76(0.79)   1.58(1.17)     4(0)        0(0)
                   True             3.3(0.73)    2.32(1.55)     4(0)        0.12(0.33)
 35  Exp(0.5)      None(Indep)      3.38(0.74)   6.22(2.39)     4(0)        1.49(1.55)
                   I                2.62(0.94)   3.15(2.14)     4(0)        0.04(0.2)
                   Gauss            2.39(0.98)   2.59(1.84)     4(0)        0.02(0.14)
                   InvMQ            2.22(1.04)   2.38(1.79)     4(0)        0(0)
                   True             1.21(0.95)   0.48(0.78)     4(0)        0(0)
     Mat3/2(2.5)   None(Indep)      3.48(0.67)   5.79(2.32)     4(0)        1.25(1.37)
                   I                2.71(0.9)    2.94(1.75)     4(0)        0.1(0.39)
                   Gauss            2.57(0.92)   2.59(1.6)      4(0)        0.01(0.1)
                   InvMQ            2.25(0.9)    2.24(1.68)     4(0)        0(0)
                   True             0.52(0.58)   0.12(0.41)     4(0)        0(0)
     Mat5/2(2.5)   None(Indep)      3.33(0.74)   5.59(2.19)     4(0)        1.34(1.51)
                   I                2.72(1)      3.06(1.97)     4(0)        0.1(0.33)
                   Gauss            2.53(1.04)   2.57(1.81)     4(0)        0.01(0.1)
                   InvMQ            2.25(0.97)   2.17(1.68)     4(0)        0(0)
                   True             0.34(0.57)   0.05(0.26)     4(0)        0(0)
     Gauss(1.5)    None(Indep)      3.15(0.88)   6.37(1.95)     4(0)        1.78(1.54)
                   I                2.44(1.02)   3.35(1.98)     4(0)        0.08(0.27)
                   Gauss            2.24(0.95)   2.83(1.61)     4(0)        0.02(0.14)
                   InvMQ            2.18(0.9)    2.2(1.41)      4(0)        0(0)
                   True             2.78(0.87)   3.66(1.74)     4(0)        0.17(0.45)

Table 3: Monte Carlo mean (standard deviation) of the number of selected nonzero covariates over 100 datasets under both independent and dependent setups, using the spatially weighted group LASSO algorithm.


Our method successfully reduced the overestimation. Even when the spatial weight matrix is the identity matrix under dependent error models, our method still reduces the overestimation of selected components compared to the independent approach. The result persists for all sample sizes ($m$), all covariance models we considered, all choices of spatial weight matrices, and, of course, with a large number of covariates ($J$).

When the true process is exponential or Matérn and we used the true covariance model as the spatial weight matrix, the proposed method did not perform well in terms of TP for small sample sizes. One possible reason is that the exponential and Matérn covariance functions in the spatial weight matrices produce larger maximum eigenvalues of $L$ for small samples. For example, when Mat5/2(2.5) is used for the spatial weight, the corresponding maximum eigenvalue of $L$ is 20.32 for $m = 6$, while the maximum eigenvalue of $L$ is 1.23 for Gauss(1.5) as the spatial weight. A larger maximum eigenvalue of $L$ produces a larger penalty parameter, so fewer components are selected. However, the performance improves as $m$ increases. For the small size ($m = 6$), inverse multiquadratic spatial weights tend to underestimate, so their TP is lower compared to other spatial weight matrix choices, but it improves as $m$ increases. The Gaussian spatial weight matrix maintains a similar level of TP while FP is reduced compared to the independent approach. Thus, we recommend using a Gaussian spatial weight matrix in practice, in particular for small sample sizes.

The results in Tables 9-11 in Appendix D show a clear increase (overestimation) in FP with an increase in dependence among covariates. This is somewhat expected, since strong dependence between covariates may hinder the selection power of variable selection approaches and, in turn, result in the selection of more components. However, we would like to point out that our approach performs comparatively better than the independent approach.

4.2 Real data example

We considered lung cancer mortality data over the period 2000-2005, obtained from the Surveillance, Epidemiology, and End Results Program (SEER, www.seer.cancer.gov), as an illustrative example. The SEER data include incidence and associated variables for US counties. In this analysis, we considered the southern part of Michigan, that is, counties falling south of the 45 latitude line or east of the −86.5 longitude line. This constitutes 68 counties in the lower part of Michigan.

We applied Tukey's transformation (e.g., Cressie and Chan (1989)) to the age-adjusted lung cancer mortality rates to obtain the response variable. We included 20 covariates obtained from the SEER database, which come from the U.S. Census Bureau. We also added PM2.5 (particulate matter smaller than 2.5 micrometers), obtained from the U.S. EPA's National Emission Inventory (NEI) database (www.epa.gov/air/data/emisdist.html?st~MI~Michigan). Since the emission data on this website are available for the years 2001 and 2002, we considered the average of the 2001-2002 emission data. The unit is tons per county.

For the numerical analysis, we scaled each of our variables to $[0, 1]$. We then computed a B-spline matrix for each of our predictor variables and finally applied our method of variable selection. We considered a Gaussian covariance function for the spatial weight matrix and a sequence of $\rho$ values around the $\rho$ estimated by fitting an empirical variogram. Then, we applied the selection approach introduced in Section 3.1. The variables selected by both the group LASSO and adaptive group LASSO algorithms under independent and dependent errors are presented in Table 4.

Our method selects variables in a stricter sense, in that it drops more variables. For this data example, our approach (adaptive group LASSO case) dropped two more variables among those selected by the independent approach. The selected components, 'Poverty' and 'Move different State', do not seem to be related to lung cancer mortality rates directly, but one may view 'Poverty' as a proxy for a more relevant covariate. For example, a study shows that smoking is more prevalent in lower socio-economic groups of society [Haustein (2006)]; thus, one can think of 'Poverty' as a proxy for tobacco use. Although the variable 'Move different State' was kept, our approach at least dropped a few more irrelevant variables compared to the independent approach.

To explore the covariates dropped by the proposed approach but selected by the independent approach, we fit a multiple linear regression. Although we considered nonlinear relationships in spatial additive models in this paper, a simple linear regression can provide an initial assessment of the result. We present the output of the linear regression with five covariates in Table 5. It shows that our approach selected the two most significant variables based on p-values, while the independent approach also selected some insignificant variables.

                                           Independent              Dependent
 ID  Variable description               GLASSO    AGLASSO       GLASSO    AGLASSO
  1  Population Mortality                 ×         ×             ×         ×
  2  Poverty                              ✓         ✓             ✓         ✓
  3  PM25                                 ×         ×             ×         ×
  4  Urban                                ×         ×             ×         ×
  5  Nonwhite                             ×         ×             ×         ×
  6  Never Married                        ×         ×             ×         ×
  7  Agriculture                          ×         ×             ×         ×
  8  Unemployment                         ✓         ✓             ×         ×
  9  White Collar                         ×         ×             ×         ×
 10  Higher Highschool                    ×         ×             ×         ×
 11  Age more than 65                     ×         ×             ×         ×
 12  Age less than 18                     ×         ×             ×         ×
 13  Crowding                             ×         ×             ×         ×
 14  Foreign born                         ×         ×             ×         ×
 15  Language isolation                   ×         ×             ×         ×
 16  Median household income              ×         ×             ×         ×
 17  Same house no migration              ×         ×             ×         ×
 18  Move same County                     ×         ×             ×         ×
 19  Move same State                      ×         ×             ×         ×
 20  Move different State                 ✓         ✓             ✓         ✓
 21  Normalized cost of living            ✓         ✓             ✓         ×

Table 4: Comparison of the two methods for the real data example (Tukey-transformed response) using the group LASSO and adaptive group LASSO algorithms; ✓ indicates a selected variable.

 Variable                     Estimate    Std. Error   t value    p-value
 Intercept                     0.43958     0.07400       5.940    1.41e-07
 Poverty                       0.42467     0.12887       3.295    0.00163
 Unemployment                  0.01086     0.14638       0.074    0.94107
 Move same State              -0.01837     0.10624      -0.173    0.86331
 Move diff State              -0.15474     0.10379      -1.491    0.14106
 Normalized cost of living     0.02715     0.07346       0.370    0.71298

Table 5: Coefficient estimates, standard errors, and corresponding p-values from a linear regression (fit in R) of the variables selected under the independent error assumption.

5 DiscussionWe established a method of selecting non-zero components in spatial additive models usinga two-step adaptive group LASSO type approach with a spatial weight in the `2- error term.Through the simulation study, we found that a variable selection method developed for additivemodels with independent errors does not perform well for spatial additive models which is themain motivation for developing the proposed method in this paper.

The condition for the selection consistency infers that the number of additive componentsincreases exponentially in the number of sample size. The selection result is sensitive to thechoice of a regularization parameter λn. Our theorems showed that a lower bound of λn in-volves a spatial weight matrix. Thus, we considered an approach to choose a spatial weightmatrix together with a regularization parameter that works well in practice. Our theoreticalfindings showed that the proposed variable selection method can still be used even if the spatialinformation is missing by considering an identity matrix as a spatial weight matrix. Indeed, thesimulation results showed that our approach works better in this case compared to the indepen-dent data approach.

In the gL objective function, we introduced a spatial weight matrix in the `2 error term,which looks like the generalized least square. Indeed, if we use the true covariance matrixof the process, it is the form of the generalized least square. Noted, the problem becomessimultaneous estimation of mean and variance, which is beyond the scope of this paper. Our


ID  Variable description           Independent          Dependent
                                   GLASSO   AGLASSO     GLASSO   AGLASSO
 1  Population Mortality             ×        ×           ×        ×
 2  Poverty                          X        X           X        X
 3  PM25                             ×        ×           ×        ×
 4  Urban                            ×        ×           ×        ×
 5  Nonwhite                         ×        ×           ×        ×
 6  Never Married                    ×        ×           ×        ×
 7  Agriculture                      ×        ×           ×        ×
 8  Unemployment                     X        X           ×        ×
 9  White Collar                     ×        ×           ×        ×
10  Higher Highschool                ×        ×           ×        ×
11  Age more than 65                 ×        ×           ×        ×
12  Age less than 18                 ×        ×           ×        ×
13  Crowding                         ×        ×           ×        ×
14  Foreign born                     ×        ×           ×        ×
15  Language isolation               ×        ×           ×        ×
16  Median household income          ×        ×           ×        ×
17  Same house no migration          ×        ×           ×        ×
18  Move same County                 ×        ×           ×        ×
19  Move same State                  ×        ×           ×        ×
20  Move different State             X        X           X        X
21  Normalized cost of living        X        X           X        ×

Table 4: Comparing the two methods for the real data example (variables after Tukey's transformation) using the group LASSO and adaptive group LASSO algorithms (X = selected, × = not selected).

Variable                    Estimate    Std. Error   t value    p-value
Intercept                    0.43958     0.07400      5.940     1.41e-07
Poverty                      0.42467     0.12887      3.295     0.00163
Unemployment                 0.01086     0.14638      0.074     0.94107
Move same State             -0.01837     0.10624     -0.173     0.86331
Move diff State             -0.15474     0.10379     -1.491     0.14106
Normalized cost of living    0.02715     0.07346      0.370     0.71298

Table 5: Coefficient estimates, standard errors and corresponding p-values from the linear regression (in R) of the variables selected under the independent error assumption.


Our goal was mean selection, not covariance estimation. Therefore, our approach, which uses a working correlation matrix rather than generalized least squares, is more robust.
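For reference, the weighted criterion under discussion can be written (consistently with the notation of the appendix, where $D^c = L'B^c$ and $\Sigma_W^{-1} = LL'$) as
\[
Q_{n1}(\beta, \lambda_{n1}) = \|L'(Z^c - B^c\beta)\|_2^2 + \lambda_{n1}\sum_{j=1}^{J}\|\beta_j\|_2
= (Z^c - B^c\beta)'\,\Sigma_W^{-1}\,(Z^c - B^c\beta) + \lambda_{n1}\sum_{j=1}^{J}\|\beta_j\|_2,
\]
so that taking $\Sigma_W$ equal to the true covariance of the process makes the first term exactly the generalized least squares criterion.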

Acknowledgements. The authors gratefully acknowledge the comments from the reviewers and editors that substantially improved the paper.


Appendices

A Derivation of the assumption (K 2)′

Using the given choice of $r_n$,
\[
\frac{n^2}{\lambda_{n2}^2 r_n^2 m_n}
\ \ge\ \frac{n^2}{\lambda_{n2}^2 m_n}\,\frac{\rho_{\max}^2(L)\, m_n^3 \log(Jm_n)}{n^{1-\alpha}}
= \frac{n^{1+\alpha}\rho_{\max}^2(L)\, m_n^2 \log(Jm_n)}{\lambda_{n2}^2}
= \frac{n^{1+\alpha}\rho_{\max}^2(L)\, m_n \log(s_n m_n)}{\lambda_{n2}^2 r_n^2}\,
\frac{\log(Jm_n)}{\log(s_n m_n)}\, r_n^2 m_n. \tag{9}
\]
Since $\frac{\log(Jm_n)}{\log(s_n m_n)}\, r_n^2 m_n \longrightarrow \infty$, if the L.H.S. of (9) goes to 0, then
\[
\frac{\sqrt{\rho_{\max}^2(L)\, n^{1+\alpha} m_n \log(s_n m_n)}}{\lambda_{n2} r_n} \longrightarrow 0.
\]

Also, by the same choice of $r_n$ and using the fact that $m_n \asymp n^{1/(2\tau+1)}$, we have
\[
\frac{n^2}{\lambda_{n2}^2 r_n^2 m_n}
= \frac{n^2 \rho_{\max}^2(L)\, m_n^3 \log(Jm_n)}{\lambda_{n2}^2\, n^{1-\alpha} m_n}
= \frac{m_n^2\, n^{1+\alpha} \rho_{\max}^2(L) \log(Jm_n)}{\lambda_{n2}^2}
= \frac{m_n \lambda_{n1}^2}{\lambda_{n2}^2}.
\]
Therefore, under (H 1) to (H 7), (K 2) can be replaced by (K 2)′.

B Proofs of theorems

Before proving the theorems stated in Section 3, we introduce some notation and lemmas. Note that for $f \in \mathcal F$, there exists $f_n \in \mathcal S_n$ such that $\|f - f_n\|_2 = O(m_n^{-\tau})$, where $\|\cdot\|_2$ for a function is defined as $\|f\|_2 = \sqrt{\int_a^b f^2(x)\,dx}$ and $\mathcal S_n$ is the space spanned by B-spline bases [e.g. see Huang et al. (2010b)]. Then, define the centered $\mathcal S^0_{nj}$ by
\[
\mathcal S^0_{nj} = \Big\{ f_{nj} : f_{nj} = \sum_{l=1}^{m_n} b_{jl} B^c_l(x),\ (b_{j1},\cdots,b_{jm_n}) \in \mathbb R^{m_n} \Big\}, \quad 1 \le j \le J. \tag{10}
\]
Recall that $B^c_l(x) = B_l(x) - \frac{1}{n}\sum_{s' \in S} B_l(X_j(s'))$, which depends on $X_j$. Also, to emphasize the dependence on the sample size $n$, we will attach a suffix $n$ to each of $\pi$, $\varepsilon$ and $\theta$ in our proofs.

Recall that $A_0 = \{j : f_j(x) \equiv 0,\ 1 \le j \le J\}$ and $A_* = \{1, 2, \ldots, J\} \setminus A_0$, so that $A_0$ and $A_*$ are the sets of zero and nonzero components, respectively. We introduced $\tilde A_0$ as the index set that satisfies $\sum_{j \in \tilde A_0} \|\beta_j\|_2 \le \eta_1$ for some $\eta_1 \ge 0$, and let $\tilde A_* = \{1, 2, \ldots, J\} \setminus \tilde A_0$. Without loss of generality, we can assume $A_0 \subseteq \tilde A_0$, so that $\tilde A_* \subseteq A_*$ and $q^* := |\tilde A_*| \le q$. For any subset $A \subset \{1, 2, \ldots, J\}$, define $\beta_A = (\beta_j, j \in A)'$ and $\Omega_A = D^{c\,\prime}_A D^c_A / n$, where $D^c_A = L' B^c_A$ and $B^c_A = (B^c_j, j \in A)$ is the sub-design matrix formed by the columns indexed in the set $A$. Note that $\beta_{A_0} \equiv 0$. Also, denote the minimum and maximum eigenvalues and the condition number of a matrix $M$ by $\rho_{\min}(M)$, $\rho_{\max}(M)$ and $\kappa(M)$, respectively.

Lemma 1 (Lemma 1 in Huang et al. (2010b)). Suppose that $f \in \mathcal F$ and $Ef(X_j) = 0$. Then, under (H 4) and (H 5) in Assumption 1, there exists an $f_n \in \mathcal S^0_{nj}$ such that
\[
\|f_n - f\|_2 = O_p\Big( m_n^{-\tau} + \sqrt{\frac{m_n}{n}} \Big). \tag{11}
\]
In particular, under the choice $m_n = O(n^{\frac{1}{2\tau+1}})$, we have
\[
\|f_n - f\|_2 = O_p\big( m_n^{-\tau} \big) = O_p\big( n^{-\frac{\tau}{2\tau+1}} \big). \tag{12}
\]

Lemma 2. Suppose that $|A|$ is bounded by a fixed constant independent of $n$ and $J$, and let $h_n \asymp m_n^{-1}$. Then, under (H 4) and (H 5) in Assumption 1, with probability converging to one,
\[
\rho_{\min}(\Sigma_W^{-1})\, d_1 h_n \le \rho_{\min}(\Omega_A) \le \rho_{\max}(\Omega_A) \le \rho_{\max}(\Sigma_W^{-1})\, d_2 h_n. \tag{13}
\]
Additionally, under (H 7), (13) becomes
\[
c_1 h_n \le \rho_{\min}(\Omega_A) \le \rho_{\max}(\Omega_A) \le c_2 h_n, \tag{14}
\]
where $d_1, d_2, c_1$ and $c_2$ are some positive constants.

Proof. One can follow the proof of Lemma 3 in Huang et al. (2010b) after observing that
\[
\rho_{\min}(\Sigma_W^{-1})\Big( \frac{B^{c\,\prime}_A B^c_A}{n} \Big) \ \preceq\ \Omega_A = \frac{D^{c\,\prime}_A D^c_A}{n} \ \preceq\ \rho_{\max}(\Sigma_W^{-1})\Big( \frac{B^{c\,\prime}_A B^c_A}{n} \Big),
\]
which gives (13). By (H 7), the well-conditioned property of $\Sigma_W^{-1}$, we have (14).
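A quick numeric check of the matrix sandwich used in this proof (a toy illustration with random matrices; here $S$ plays the role of $\Sigma_W^{-1}$, and the Loewner ordering is verified through eigenvalues):

import numpy as np

rng = np.random.default_rng(0)
n, p = 60, 5
B = rng.standard_normal((n, p))                  # plays the role of B^c_A
A = rng.standard_normal((n, n))
S = A @ A.T + np.eye(n)                          # plays the role of Sigma_W^{-1} = L L'
Omega = B.T @ S @ B / n                          # = D^{c'}_A D^c_A / n with D^c_A = L' B^c_A
lo, hi = np.linalg.eigvalsh(S)[[0, -1]]          # rho_min and rho_max of Sigma_W^{-1}
G = B.T @ B / n
assert np.all(np.linalg.eigvalsh(hi * G - Omega) > -1e-8)   # Omega <= rho_max(S) * B'B/n
assert np.all(np.linalg.eigvalsh(Omega - lo * G) > -1e-8)   # Omega >= rho_min(S) * B'B/n
print("eigenvalue sandwich of (13) verified on a random instance")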

Lemma 3. Let $M_n$ be a non-negative definite matrix of order $n$ and
\[
T_{jl} = \Big( \frac{m_n}{n} \Big)^{1/2} a'_{jl} M_n\varepsilon \quad \forall\, 1 \le j \le J,\ 1 \le l \le m_n, \tag{15}
\]
where $a_{jl} = (B^c_l(X_j(s)),\, s \in S)'$ and $T_n = \max_{1\le j\le J,\,1\le l\le m_n} |T_{jl}|$. Then, under assumptions (H 2) to (H 5) in Assumption 1,
\[
E(T_n) \le C_1\rho_{\max}(M_n)\sqrt{m_n\log(Jm_n)\, O(n^\alpha)} \tag{16}
\]
for some $C_1 > 0$.

Proof. Since $\varepsilon \sim \mathrm{Gaussian}(0, \Sigma_T)$, $T_{jl} \sim \mathrm{Gaussian}\big(0,\ \frac{m_n}{n} a'_{jl} M_n\Sigma_T M'_n a_{jl}\big)$. Therefore we can use maximal inequalities for sub-Gaussian random variables [van der Vaart and Wellner (1996), Lemmas 2.2.1 and 2.2.2]. Let $\|\cdot\|_\phi$ be the Orlicz norm, defined by $\|X\|_\phi = \inf\{k \in (0,\infty) : E(\phi(|X|/k)) \le 1\}$. Then, conditional on $\{X_j(s),\, s \in S,\, 1 \le j \le J\}$, we have
\[
\begin{aligned}
\Big\| \max_{1\le j\le J,\,1\le l\le m_n} |T_{jl}|\ \Big|\ X_j(s), s \in S, 1 \le j \le J \Big\|_{\phi_2}
&\le K\sqrt{\tfrac{m_n}{n}\log(1 + Jm_n)}\ \max_{1\le j\le J,\,1\le l\le m_n}\big\|\, |a'_{jl} M_n\varepsilon|\ \big|\ X_j(s), s \in S, 1 \le j \le J \big\|_{\phi_2} \\
&\le K\sqrt{\tfrac{m_n}{n}\log(Jm_n)}\ \max_{1\le j\le J,\,1\le l\le m_n}\sqrt{a'_{jl} M_n\Sigma_T M'_n a_{jl}},
\end{aligned}
\]
where $K > 0$ is a generic constant and $\phi_p(x) = e^{x^p} - 1$. Now, taking expectations with respect to $\{X_j(s),\, s \in S,\, 1 \le j \le J\}$ on both sides of the above inequality,
\[
\begin{aligned}
\Big\| \max_{1\le j\le J,\,1\le l\le m_n} |T_{jl}| \Big\|_{\phi_2}
&\le K\sqrt{\tfrac{m_n}{n}\log(Jm_n)}\ E\Big( \max_{1\le j\le J,\,1\le l\le m_n}\sqrt{a'_{jl} M_n\Sigma_T M'_n a_{jl}} \Big) \\
&= K\sqrt{\tfrac{m_n}{n}\log(Jm_n)}\ E\Big( \sqrt{\max_{1\le j\le J,\,1\le l\le m_n} a'_{jl} M_n\Sigma_T M'_n a_{jl}} \Big) \\
&\le K\rho_{\max}(M_n)\sqrt{\tfrac{m_n}{n}\log(Jm_n)}\ \sqrt{E\Big( \max_{1\le j\le J,\,1\le l\le m_n} a'_{jl}\Sigma_T a_{jl} \Big)}.
\end{aligned}\tag{17}
\]
Since the $B^c_l(x)$ are normalized B-splines, we have
\[
E\Big\{ \max_{1\le j\le J,\,1\le l\le m_n}\sum_{s\in S}\sum_{s'\in S} B^c_l(X_j(s))\,\delta(s - s')\,B^c_l(X_j(s')) \Big\}
\le 4\sum_{s\in S}\sum_{s'\in S}\delta(s - s')
\le K\sum_{s\in S}\int_{h\in CD_n}\delta(h)\,dh
\le K n\int_{h\in CD_n}\delta(h)\,dh \tag{18}
\]
for some $K, C > 0$. From (17) and (18),
\[
\Big\| \max_{1\le j\le J,\,1\le l\le m_n} |T_{jl}| \Big\|_{\phi_2}
\le K\rho_{\max}(M_n)\sqrt{m_n\log(Jm_n)}\,\sqrt{\int_{h\in CD_n}\delta(h)\,dh}
\le K\rho_{\max}(M_n)\sqrt{m_n\log(Jm_n)\, O(n^\alpha)}.
\]
Finally, (16) follows from $\|X\|_{L_1} \le C\|X\|_{L_2} \le \|X\|_{\phi_2}$, where $\|X\|_{L_p} = (E(|X|^p))^{1/p}$.
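The sub-Gaussian maximal inequality driving (16) can be illustrated numerically (a toy Monte Carlo; N plays the role of $Jm_n$, and the rows of A play the role of the vectors $a'_{jl}$):

import numpy as np

rng = np.random.default_rng(1)
n, reps = 200, 2000
M = np.diag(rng.uniform(0.5, 2.0, n))               # a toy choice for M_n
Sigma_T = 0.9 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
C = np.linalg.cholesky(Sigma_T)                     # correlated Gaussian errors
for N in (10, 100, 1000):
    A = rng.standard_normal((N, n)) / np.sqrt(n)
    T = np.abs(A @ M @ (C @ rng.standard_normal((n, reps))))
    # E[max_i |T_i|] grows like sqrt(log N); the printed ratio stays bounded
    print(N, T.max(axis=0).mean() / np.sqrt(np.log(N + 1)))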


Before we delve into the proofs of the theorems, let us define and summarize some of the index sets we will be using. Recall that $A_0 = \{j : f_j \equiv 0,\ 1 \le j \le J\}$. For an index set $A_1$ that satisfies $A_\beta = \{j : \|\hat\beta_{gL,j}\|_2 > 0\} \subseteq A_1 \subseteq A_\beta \cup \tilde A_*$, we consider the following sets:

                 "Large" $\|\beta_j\|_2$ (i.e. $\tilde A_*$)    "Small" $\|\beta_j\|_2$ (i.e. $\tilde A_0$)
$A_1$                        $A_3$                                        $A_4$
$A_2 = A_1^c$                $A_5$                                        $A_6$

We can deduce some relations from the above table: $A_3 = A_1 \cap \tilde A_*$, $A_4 = A_1 \cap \tilde A_0$, $A_5 = A_1^c \cap \tilde A_*$, $A_6 = A_2 \cap \tilde A_0$, and hence $A_3 \cup A_4 = A_1$, $A_5 \cup A_6 = A_2$, and $A_3 \cap A_4 = A_5 \cap A_6 = \emptyset$. Also, let $|A_1| = q_1$. For an index set $A_1$ that satisfies $A_f = \{j : \|\hat f_{gL,j}\|_2 > 0\} \subseteq A_1 \subseteq A_f \cup A_*$, the same table applies with $A_*$ and $A_0$ in place of $\tilde A_*$ and $\tilde A_0$: $A_3 = A_1 \cap A_*$, $A_4 = A_1 \cap A_0$, $A_5 = A_1^c \cap A_*$, $A_6 = A_2 \cap A_0$, and again $A_3 \cup A_4 = A_1$, $A_5 \cup A_6 = A_2$, and $A_3 \cap A_4 = A_5 \cap A_6 = \emptyset$.

To prove Theorem 1, we need the boundedness of $|A_\beta|$, which is given in the following lemma.

Lemma 4. Under Assumption 1 with $\lambda_{n1} > C\rho_{\max}(L)\sqrt{n^{1+\alpha} m_n \log(Jm_n)}$ for a sufficiently large constant $C$, we have $|A_\beta| \le M_1 |\tilde A_*|$ for a finite constant $M_1 > 1$ with probability converging to 1.

Proof. Along with the approximation error for the spline regression, we also have to take care of the dependence structure of the Gaussian random vector $\varepsilon$ according to (H 3). To emphasize the dependence on $n$, we write $\varepsilon_n$ instead of $\varepsilon$, and similarly for the other quantities. Recall $\pi_n = \varepsilon_n + \theta_n$, where $\theta_n = (\theta_n(s);\, s \in S)'$ with $\theta_n(s) = \sum_{j=1}^J\big( f_j(X_j(s)) - f_{nj}(X_j(s)) \big)$. Note that $\|\theta_n\|_2 = O(n^{1/2} q^{1/2} m_n^{-\tau}) = O(q^{1/2} n^{1/(4\tau+2)})$ by Lemma 1, since $m_n = O(n^{1/(2\tau+1)})$. Define $\lambda_{n,J} = 2\sqrt{K\rho^2_{\max}(L)\, m_n n^{1+\alpha}\log(Jm_n)}$ for some $K > 0$ and $\lambda_{n1} \ge \max\{\lambda_0, \lambda_{n,J}\}$, where $\lambda_0 = \inf\{\lambda : M_1(\lambda)q^* + 1 \le q_0\}$ for some finite $q_0 > 0$ and $M_1(\lambda) > 1$, which will be specified later in the proof. Without loss of generality, we take the infimum of an empty set to be $\infty$. That is, if $\{\lambda : M_1(\lambda)q^* + 1 \le q_0\}$ is empty, then $\lambda_{n1} = \lambda_0 = \infty$, which in turn implies that we drop all the components in our additive model, i.e. $|A_\beta| = 0$; the claimed bound is trivial in this case, and hence for the rest of the proof we assume $\{\lambda : M_1(\lambda)q^* + 1 \le q_0\}$ is non-empty.


First, define a new vector $U_k$ such that $U_k = D^{c\,\prime}_k (Z^c - D^c\hat\beta_{gL})/\lambda_{n1}$ for $k = 1, \cdots, J$. By the Karush-Kuhn-Tucker (KKT) conditions of the optimization problem for (4) with the solution $\hat\beta_{gL}$, we have
\[
U_k = \frac{\hat\beta_{gL,k}}{\|\hat\beta_{gL,k}\|_2} \ \text{ if } \|\hat\beta_{gL,k}\|_2 > 0, \qquad \|U_k\|_2 \le 1 \ \text{ if } \|\hat\beta_{gL,k}\|_2 = 0. \tag{19}
\]
Then, the norm of $U_k$ is
\[
\|U_k\|_2 = 1 \ \text{ if } \|\hat\beta_{gL,k}\|_2 > 0, \qquad \|U_k\|_2 \le 1 \ \text{ if } \|\hat\beta_{gL,k}\|_2 = 0. \tag{20}
\]
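The conditions (19)–(20) are easy to verify numerically when the group LASSO solution is available in closed form, e.g. for an orthonormal design, where the solution is block-wise soft-thresholding (a toy check under the $\frac12\|\cdot\|_2^2$ scaling of the loss; names are illustrative):

import numpy as np

rng = np.random.default_rng(2)
n, J, m, lam = 50, 4, 3, 2.0
Q, _ = np.linalg.qr(rng.standard_normal((n, J * m)))    # orthonormal design D
Z = rng.standard_normal(n)
z = (Q.T @ Z).reshape(J, m)
b = np.array([max(0.0, 1.0 - lam / np.linalg.norm(zj)) * zj for zj in z])
U = ((Q.T @ (Z - Q @ b.ravel())) / lam).reshape(J, m)   # the U_k of (19)
for k in range(J):
    if np.linalg.norm(b[k]) > 0:                        # active group
        assert np.allclose(U[k], b[k] / np.linalg.norm(b[k]))
    else:                                               # inactive group
        assert np.linalg.norm(U[k]) <= 1 + 1e-12
print("KKT conditions (19)-(20) hold at the closed-form solution")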

Now we introduce the following quantities:
\[
x_r = \max_{|A|=r}\ \max_{\|U_k\|_2 = 1,\, k\in A,\, B\subset A}\big| \pi'_n w_{A|B} \big|, \qquad
x^*_r = \max_{|A|=r}\ \max_{\|U_k\|_2 = 1,\, k\in A,\, B\subset A}\big| \varepsilon'_n w_{A|B} \big|,
\]
where $w_{A|B} = W_{A|B}/\|W_{A|B}\|_2$ with
\[
W_{A|B} = D^c_A\big( D^{c\,\prime}_A D^c_A \big)^{-1}\lambda_{n1} Q'_{BA} Q_{BA} U_A - (I - P_A) D^c\beta.
\]
For $B \subset A$, $Q_{BA}$ is the matrix corresponding to the selection of the variables in $B$ from $A$, i.e. $Q_{BA}\beta_A = \beta_B$; $P_A = D^c_A\Omega_A^{-1} D^{c\,\prime}_A/n$ and $U_A = (U_k;\ k \in A)'$. By the triangle inequality and the Cauchy-Schwarz inequality, for any set $A$ with $|A| = r > 0$, we have
\[
\big| \pi'_n w_{A|B} \big| \le \big| \varepsilon'_n w_{A|B} \big| + \|\theta_n\|_2
\le \big| \varepsilon'_n w_{A|B} \big| + K_2\, q^{1/2} n^{1/(4\tau+2)}
\le \big| \varepsilon'_n w_{A|B} \big| + K_1\sqrt{\frac{(rm_n \vee m_n)\,\rho_{\max}^2(L)\, n^\alpha m_n \log(Jm_n)}{\rho_{\max}(\Omega_{A_1})}}, \tag{21}
\]

where the last inequality holds for sufficiently large $n$ and some $K_1 > 0$. By introducing the sets
\[
\Omega_{r_0} = \Big\{ (D^c, \pi_n) :\ x_r \le 2K_1\sqrt{\tfrac{(rm_n\vee m_n)\,\rho^2_{\max}(L)\, n^\alpha m_n \log(Jm_n)}{\rho_{\max}(\Omega_{A_1})}},\ \forall r \ge r_0 \Big\}
\]
and
\[
\Omega^*_{r_0} = \Big\{ (D^c, \varepsilon_n) :\ x^*_r \le K_1\sqrt{\tfrac{(rm_n\vee m_n)\,\rho^2_{\max}(L)\, n^\alpha m_n \log(Jm_n)}{\rho_{\max}(\Omega_{A_1})}},\ \forall r \ge r_0 \Big\},
\]
we can show that $P\{(D^c, \pi_n) \in \Omega_{r_0}\} \ge P\{(D^c, \varepsilon_n) \in \Omega^*_{r_0}\}$ for any $r_0 > 0$, since
\[
x_r \le x^*_r + \|\theta_n\|_2 \le x^*_r + K_1\sqrt{\tfrac{(rm_n\vee m_n)\,\rho^2_{\max}(L)\, n^\alpha m_n \log(Jm_n)}{\rho_{\max}(\Omega_{A_1})}} \tag{22}
\]
by recalling the definitions of $x_r$ and $x^*_r$ and (21).


Now, we want to show that $P\{(D^c, \pi_n) \in \Omega_{q_1}\} \to 1$ implies $|A_\beta| \le M_1|\tilde A_*| = M_1 q^*$ for some finite $M_1 > 1$, which completes the proof since $q^* \le q$. Before proving this claim, we first show $P\{(D^c, \varepsilon_n) \in \Omega^*_{q_1}\} \to 1$, which implies $P\{(D^c, \pi_n) \in \Omega_{q_1}\} \to 1$. We start with the following:
\[
1 - P\big\{ (D^c, \varepsilon_n) \in \Omega^*_{q_1} \big\} \tag{23}
\]
\[
\begin{aligned}
&\le \sum_{r=0}^{\infty} P\Big( x^*_r > K_1\sqrt{\tfrac{(rm_n\vee m_n)\,\rho^2_{\max}(L)\, n^\alpha m_n \log(Jm_n)}{\rho_{\max}(\Omega_{A_1})}} \Big) \\
&\le \sum_{r=0}^{\infty}\binom{J}{r} P\Big( |w'_{A|B}\varepsilon_n| > K_1\sqrt{\tfrac{(rm_n\vee m_n)\,\rho^2_{\max}(L)\, n^\alpha m_n \log(Jm_n)}{\rho_{\max}(\Omega_{A_1})}} \Big),
\end{aligned}\tag{24}
\]

where $|A| = r$. Since $w'_{A|B}\varepsilon_n \sim \mathrm{Gaussian}(0,\ w'_{A|B}\Sigma_T w_{A|B})$, (24) becomes
\[
\begin{aligned}
&\le 2\sum_{r=0}^{\infty}\binom{J}{r}\exp\Big( -0.5K_1^2\,\frac{(rm_n\vee m_n)\,\rho^2_{\max}(L)\, n^\alpha m_n \log(Jm_n)}{(w'_{A|B}\Sigma_T w_{A|B})\,\rho_{\max}(\Omega_{A_1})} \Big) \\
&\le 2\sum_{r=0}^{\infty}\binom{J}{r}\exp\Big( -0.5K_1^2\,\frac{(rm_n\vee m_n)\,\rho^2_{\max}(L)\, n^\alpha m_n \log(Jm_n)}{\rho_{\max}(\Sigma_T)\,\rho_{\max}(\Omega_{A_1})} \Big) \\
&= 2\sum_{r=0}^{\infty}\binom{J}{r}(Jm_n)^{-0.5K_1^2 (rm_n\vee m_n)\rho^2_{\max}(L) n^\alpha m_n/(\rho_{\max}(\Sigma_T)\rho_{\max}(\Omega_{A_1}))}.
\end{aligned}\tag{25}
\]

Let $K_n = 0.5K_1^2 m_n^2\rho^2_{\max}(L)\, n^\alpha/\big( \rho_{\max}(\Sigma_T)\,\rho_{\max}(\Omega_{A_1}) \big)$. Then (25) becomes
\[
2(Jm_n)^{-K_n} + 2\sum_{r=1}^{\infty}\binom{J}{r}(Jm_n)^{-rK_n}
\le 2(Jm_n)^{-K_n} + 2\sum_{r=1}^{\infty}\frac{1}{r!}\Big( \frac{J}{(Jm_n)^{K_n}} \Big)^r
= 2(Jm_n)^{-K_n} + 2\exp\Big( \frac{J}{(Jm_n)^{K_n}} \Big) - 2. \tag{26}
\]

Define $\|\Sigma_T\|_1 = \max_{s\in S}\sum_{s'\in S}\sigma_{s,s'}$ and note that $\|\Sigma_T\|_1 \lesssim \int_{h\in D_n}\delta(h)\,dh = O(n^\alpha)$. Therefore, using the facts $\frac{1}{\sqrt n}\|\Sigma_T\|_1 \le \rho_{\max}(\Sigma_T) \le \sqrt n\,\|\Sigma_T\|_1$ and $\rho_{\max}(\Sigma_W^{-1}) \le \rho_{\max}(L)\rho_{\max}(L') = \rho^2_{\max}(L)$, we have
\[
K_n \ \ge\ c_1\,\frac{0.5K_1^2 m_n^3 n^\alpha}{\sqrt n\,\|\Sigma_T\|_1} \ \gtrsim\ 0.5\,c_1 K_1^2\sqrt{n^{6\gamma-1}},
\]
and $K_n \to \infty$ by (H 6). This shows that (26) goes to zero as $n \to \infty$.

To show that $P\{(D^c,\pi_n)\in\Omega_{q_1}\}\to 1$ implies $|A_\beta| \le M_1|\tilde A_*| = M_1 q^*$, let
\[
V_{1j} = \frac{\Omega_{A_1}^{-1/2} Q'_{j1} U_{A_j}\,\lambda_{n1}}{\sqrt n}, \qquad j = 1, 3, 4,
\]


and
\[
u = \frac{D^c_{A_1}\Omega_{A_1}^{-1/2} V_{14}/\sqrt n - \omega_2}{\big\| D^c_{A_1}\Omega_{A_1}^{-1/2} V_{14}/\sqrt n - \omega_2 \big\|_2},
\]
where, for simplicity of notation, $Q_{kj} = Q_{A_k A_j}$ is the matrix corresponding to the selection of the variables in $A_k$ from $A_j$ and $\omega_2 = (I - P_{A_1}) D^c_{A_2}\beta_{A_2} = (I - P_{A_1}) D^c\beta$. We can show that $V_{11} = V_{14} + V_{13}$ and $Q'_{31}Q_{31} + Q'_{41}Q_{41} = I_{m_n|A_1|}$, due to the facts that $A_3 \cup A_4 = A_1$ and $A_3 \cap A_4 = \emptyset$, and hence $\beta'_{A_3}Q_{31} + \beta'_{A_4}Q_{41} = \beta'_{A_1}$. Since $q_1 = |A_1| = |A_3| + |A_4|$ and $|A_3| \le q^*$, we have $|A_4| \ge (q_1 - q^*)$. Then, we have the following lower bound for the $L_2$-norm of $V_{14}$:
\[
\|V_{14}\|_2^2 \ \ge\ \frac{\lambda_{n1}^2 \|Q'_{41} U_{A_4}\|_2^2}{n\,\rho_{\max}(\Omega_{A_1})}
= \frac{\lambda_{n1}^2 \|Q'_{41} Q_{41} U_{A_1}\|_2^2}{n\,\rho_{\max}(\Omega_{A_1})}
= \frac{\lambda_{n1}^2 m_n |A_4|}{n\,\rho_{\max}(\Omega_{A_1})}
\ \ge\ B_1\,\frac{(q_1 - q^*)_+}{q^*}, \tag{27}
\]
with $B_1 = \frac{\lambda_{n1}^2 m_n q^*}{n\,\rho_{\max}(\Omega_{A_1})}$. From (27), we have
\[
|A_\beta| \le |A_1| = q_1 \le (q_1 - q^*)_+ + q^* \le q^*\frac{\|V_{14}\|_2^2}{B_1} + q^* = \Big(\frac{\|V_{14}\|_2^2}{B_1} + 1\Big) q^*. \tag{28}
\]
Thus, to show $|A_\beta| \le M_1 q^*$ for some finite $M_1 > 1$, we need to show $\frac{\|V_{14}\|_2^2}{B_1} + 1 \le M_1$ for some finite $M_1 > 1$.

We start with an upper bound of $\|V_{14}\|_2^2 + \|\omega_2\|_2^2$. Since
\[
\|V_{14}\|_2^2 + \|\omega_2\|_2^2 = V'_{14}V_{14} + \|\omega_2\|_2^2 = V'_{14}(V_{11} - V_{13}) + \|\omega_2\|_2^2 \le V'_{14}V_{11} + \|V_{14}\|_2\|V_{13}\|_2 + \|\omega_2\|_2^2, \tag{29}
\]
we find upper bounds for $V'_{14}V_{11}$, $\|V_{14}\|_2\|V_{13}\|_2$ and $\|\omega_2\|_2^2$, respectively. First,

\[
\begin{aligned}
V'_{14}V_{11} &= \frac{\lambda_{n1}^2}{n}\, U'_{A_4} Q_{41}\Omega_{A_1}^{-1} U_{A_1}
= \frac{\lambda_{n1}^2}{n}\, U'_{A_4} Q_{41}\Omega_{A_1}^{-1}\Big( D^{c\,\prime}_{A_1}\big(Z^c - D^c\hat\beta_{gL}\big)/\lambda_{n1} \Big) \\
&= \frac{\lambda_{n1}}{n}\, U'_{A_4} Q_{41}\Omega_{A_1}^{-1} D^{c\,\prime}_{A_1}\big( D^c_{A_1}\beta_{A_1} + D^c_{A_2}\beta_{A_2} + \pi_n - D^c_{A_1}\hat\beta_{gL,A_1} \big) \\
&= \lambda_{n1}\, U'_{A_4} Q_{41}\Big( (\beta_{A_1} - \hat\beta_{gL,A_1}) + \frac{\Omega_{A_1}^{-1}}{n}\big(D^{c\,\prime}_{A_1} D^c_{A_2}\big)\beta_{A_2} + \frac{\Omega_{A_1}^{-1}}{n} D^{c\,\prime}_{A_1}\pi_n \Big) \\
&= \lambda_{n1}\, U'_{A_4}(\beta_{A_4} - \hat\beta_{gL,A_4}) + \lambda_{n1}\, U'_{A_4} Q_{41}\Big( \frac{\Omega_{A_1}^{-1}}{n}\big(D^{c\,\prime}_{A_1} D^c_{A_2}\big)\beta_{A_2} + \frac{\Omega_{A_1}^{-1}}{n} D^{c\,\prime}_{A_1}\pi_n \Big) \\
&\le \lambda_{n1}\sum_{k \in A_4}\|\beta_k\|_2 + \frac{\lambda_{n1}\, U'_{A_4} Q_{41}\Omega_{A_1}^{-1}\big(D^{c\,\prime}_{A_1} D^c_{A_2}\big)\beta_{A_2}}{n} + \frac{\lambda_{n1}\, U'_{A_4} Q_{41}\Omega_{A_1}^{-1} D^{c\,\prime}_{A_1}\pi_n}{n},
\end{aligned}\tag{30}
\]


where the last inequality is based on $\big|U'_{A_4}\beta_{A_4}\big| \le \sum_{k \in A_4} |U'_k\beta_k| \le \sum_{k \in A_4}\|\beta_k\|_2$ and $U'_{A_4}\hat\beta_{gL,A_4} = \sum_{k \in A_4 \cap A_\beta} U'_k\hat\beta_{gL,k} \ge 0$. For $\|V_{14}\|_2\|V_{13}\|_2$, we have
\[
\|V_{14}\|_2\|V_{13}\|_2 \le \|V_{14}\|_2\,\lambda_{n1}\sqrt{\frac{m_n |A_3|}{n\,\rho_{\min}(\Omega_{A_1})}} \tag{31}
\]

from the definition of $V_{13}$. For $\|\omega_2\|_2^2$,
\[
\begin{aligned}
\|\omega_2\|_2^2 &= \|(I - P_{A_1}) D^c_{A_2}\beta_{A_2}\|_2^2 = \beta'_{A_2} D^{c\,\prime}_{A_2}(I - P_{A_1}) D^c_{A_2}\beta_{A_2} \\
&= \beta'_{A_2}\Big( n\,\Omega_{A_2}\beta_{A_2} - \frac{1}{n}\, D^{c\,\prime}_{A_2} D^c_{A_1}\Omega_{A_1}^{-1} D^{c\,\prime}_{A_1} D^c_{A_2}\beta_{A_2} \Big) \\
&\le \beta'_{A_2}\Big( \lambda_{n1}\mathcal D_{A_2} - D^{c\,\prime}_{A_2}\pi_n - D^{c\,\prime}_{A_2} D^c_{A_1}(\beta_{A_1} - \hat\beta_{gL,A_1}) - \frac{1}{n}\, D^{c\,\prime}_{A_2} D^c_{A_1}\Omega_{A_1}^{-1} D^{c\,\prime}_{A_1} D^c_{A_2}\beta_{A_2} \Big) \\
&= \beta'_{A_2}\Big( \lambda_{n1}\mathcal D_{A_2} - D^{c\,\prime}_{A_2}\pi_n - \frac{\lambda_{n1}}{n}\, D^{c\,\prime}_{A_2} D^c_{A_1}\Omega_{A_1}^{-1} U_{A_1} + \frac{1}{n}\, D^{c\,\prime}_{A_2} D^c_{A_1}\Omega_{A_1}^{-1} D^{c\,\prime}_{A_1}\pi_n \Big) \\
&= \lambda_{n1}\beta'_{A_2}\mathcal D_{A_2} - \omega'_2\pi_n - \frac{\lambda_{n1}}{n}\, U'_{A_1}\Omega_{A_1}^{-1} D^{c\,\prime}_{A_1} D^c_{A_2}\beta_{A_2},
\end{aligned}
\]
where the inequality is from
\[
D^{c\,\prime}_{A_2} D^c_{A_1}(\beta_{A_1} - \hat\beta_{gL,A_1}) + n\,\Omega_{A_2}\beta_{A_2} + D^{c\,\prime}_{A_2}\pi_n \ \le\ \lambda_{n1}\mathcal D_{A_2}. \tag{32}
\]
In (32), $\mathcal D_A$ is a $0$–$1$ vector whose $k$th entry is $I(\|\hat\beta_{gL,k}\|_2 = 0)$, where $I(A)$ is the indicator function of a set $A$, and the inequality between vectors is defined entry-wise. Note that (32) holds due to (20).

Since $V_{14} \perp\!\!\!\perp \omega_2$, we have
\[
\big\| D^c_{A_1}\Omega_{A_1}^{-1} Q'_{41} U_{A_4}\lambda_{n1}/n - \omega_2 \big\|_2^2 = \big\| D^c_{A_1}\Omega_{A_1}^{-1/2} V_{14}/\sqrt n - \omega_2 \big\|_2^2 = \|V_{14}\|_2^2 + \|\omega_2\|_2^2,
\]
so that
\[
\Big( \frac{\lambda_{n1}}{n}\, U'_{A_4} Q_{41}\Omega_{A_1}^{-1} D^{c\,\prime}_{A_1} - \omega'_2 \Big)\pi_n = \big( \|V_{14}\|_2^2 + \|\omega_2\|_2^2 \big)^{1/2} (u'\pi_n)
\]
using the definition of $u$. Then, this implies
\[
\|\omega_2\|_2^2 \le \lambda_{n1}\beta'_{A_2}\mathcal D_{A_2} - \frac{\lambda_{n1}}{n}\, U'_{A_1}\Omega_{A_1}^{-1} D^{c\,\prime}_{A_1} D^c_{A_2}\beta_{A_2} + \big( \|V_{14}\|_2^2 + \|\omega_2\|_2^2 \big)^{1/2}(u'\pi_n) - \frac{\lambda_{n1}}{n}\, U'_{A_4} Q_{41}\Omega_{A_1}^{-1} D^{c\,\prime}_{A_1}\pi_n. \tag{33}
\]


Combining (30) and (33), we have
\[
\begin{aligned}
V'_{14}&V_{11} + \|\omega_2\|_2^2 \\
&\le \lambda_{n1}\sum_{k \in A_4}\|\beta_k\|_2 - \frac{\lambda_{n1}\, U'_{A_3} Q_{31}\Omega_{A_1}^{-1}\big( D^{c\,\prime}_{A_1} D^c_{A_2}\big)\beta_{A_2}}{n} + \lambda_{n1}\beta'_{A_2}\mathcal D_{A_2} + \big( \|V_{14}\|_2^2 + \|\omega_2\|_2^2 \big)^{1/2}|u'\pi_n| \\
&= \lambda_{n1}\sum_{k \in A_4}\|\beta_k\|_2 - V'_{13}\Omega_{A_1}^{-1/2}\big( D^{c\,\prime}_{A_1} D^c_{A_2}\big)\beta_{A_2}/\sqrt n + \lambda_{n1}\beta'_{A_2}\mathcal D_{A_2} + 2\Big( \big( \|V_{14}\|_2^2 + \|\omega_2\|_2^2 \big)^{1/2}/2 \Big)|u'\pi_n| \\
&\le \lambda_{n1}\sum_{k \in A_4}\|\beta_k\|_2 + \|V_{13}\|_2\,\big\|\Omega_{A_1}^{-1/2}\big( D^{c\,\prime}_{A_1} D^c_{A_2}\big)\beta_{A_2}\big\|_2/\sqrt n + \lambda_{n1}\sum_{k \in A_2}\|\beta_k\|_2 \\
&\qquad + \frac{\|V_{14}\|_2^2 + \|\omega_2\|_2^2}{4} + |u'\pi_n|^2,
\end{aligned}\tag{34}
\]

where the inequality is by the Cauchy-Schwarz inequality, the triangle inequality and $2ab \le a^2 + b^2$. Then, by (31) and (34),
\[
\begin{aligned}
\|V_{14}\|_2^2 + \|\omega_2\|_2^2 &\le V'_{14}V_{11} + \|V_{14}\|_2\|V_{13}\|_2 + \|\omega_2\|_2^2 \\
&\le \lambda_{n1}\sum_{k \in A_4}\|\beta_k\|_2 + \|V_{13}\|_2\,\big\|\Omega_{A_1}^{-1/2}\big( D^{c\,\prime}_{A_1} D^c_{A_2}\big)\beta_{A_2}\big\|_2/\sqrt n + \lambda_{n1}\sum_{k \in A_2}\|\beta_k\|_2 \\
&\qquad + \frac{\|V_{14}\|_2^2 + \|\omega_2\|_2^2}{4} + |u'\pi_n|^2 + \|V_{14}\|_2\,\lambda_{n1}\sqrt{\frac{m_n|A_3|}{n\,\rho_{\min}(\Omega_{A_1})}}.
\end{aligned}\tag{35}
\]

Since $(D^c, \pi_n) \in \Omega_{q_1}$ implies
\[
|u'\pi_n|^2 \le (x_{q_1})^2 \le \frac{(q_1 m_n \vee m_n)\lambda_{n1}^2}{4n\,\rho_{\max}(\Omega_{A_1})} = \frac{q_1 m_n \lambda_{n1}^2}{4n\,\rho_{\max}(\Omega_{A_1})} = \frac{1}{4}\frac{q_1}{q^*}\, B_1,
\]
we can show $|u'\pi_n|^2 \le \frac{1}{4}\frac{q_1}{q^*} B_1 \le \frac{1}{4}\big( \|V_{14}\|_2^2 + B_1 \big)$ by using (27). Along with $\|V_{13}\|_2 \le \lambda_{n1}\sqrt{\frac{m_n|A_3|}{n\rho_{\min}(\Omega_{A_1})}}$, (35) becomes

\[
\begin{aligned}
&\le \lambda_{n1}\sum_{k \in A_4}\|\beta_k\|_2 + \lambda_{n1}\sqrt{\frac{m_n|A_3|}{n\rho_{\min}(\Omega_{A_1})}}\,\big\|\Omega_{A_1}^{-1/2} D^{c\,\prime}_{A_1} D^c_{A_2}\beta_{A_2}\big\|_2/\sqrt n + \lambda_{n1}\sum_{k \in A_2}\|\beta_k\|_2 \\
&\qquad + \frac{\|V_{14}\|_2^2 + \|\omega_2\|_2^2}{4} + \frac{\|V_{14}\|_2^2 + B_1}{4} + \|V_{14}\|_2\,\lambda_{n1}\sqrt{\frac{m_n|A_3|}{n\rho_{\min}(\Omega_{A_1})}} \\
&= \lambda_{n1}\sum_{k \in A_5}\|\beta_k\|_2 + \lambda_{n1}\sqrt{\frac{m_n|A_3|}{n\rho_{\min}(\Omega_{A_1})}}\,\big\| P_{A_1}^{1/2} D^c_{A_2}\beta_{A_2}\big\|_2 + \lambda_{n1}\sum_{k \in A_4 \cup A_6}\|\beta_k\|_2 \\
&\qquad + \frac{\|V_{14}\|_2^2}{2} + \frac{\|\omega_2\|_2^2}{4} + \frac{B_1}{4} + \|V_{14}\|_2\,\lambda_{n1}\sqrt{\frac{m_n|A_3|}{n\rho_{\min}(\Omega_{A_1})}} \\
&\le \lambda_{n1}\sum_{k \in A_5}\|\beta_k\|_2 + \lambda_{n1}\sqrt{\frac{m_n|A_3|}{n\rho_{\min}(\Omega_{A_1})}}\,\big\| P_{A_1}^{1/2} D^c_{A_2}\beta_{A_2}\big\|_2 + \lambda_{n1}\eta_1 \\
&\qquad + \frac{\|V_{14}\|_2^2}{2} + \frac{\|\omega_2\|_2^2}{4} + \frac{B_1}{4} + \|V_{14}\|_2\,\lambda_{n1}\sqrt{\frac{m_n|A_3|}{n\rho_{\min}(\Omega_{A_1})}},
\end{aligned}
\]
where the equality comes from $A_2 = A_5 \cup A_6$ with $P_{A_1}^{1/2} = \Omega_{A_1}^{-1/2} D^{c\,\prime}_{A_1}/\sqrt n$, and the last inequality is due to the GSC. This result gives

\[
\frac{1}{2}\|V_{14}\|_2^2 + \frac{3}{4}\|\omega_2\|_2^2 \le \lambda_{n1}\sum_{k \in A_5}\|\beta_k\|_2 + \lambda_{n1}\sqrt{\frac{m_n|A_3|}{n\rho_{\min}(\Omega_{A_1})}}\,\big\| P_{A_1}^{1/2} D^c_{A_2}\beta_{A_2}\big\|_2 + \lambda_{n1}\eta_1 + \frac{B_1}{4} + \|V_{14}\|_2\,\lambda_{n1}\sqrt{\frac{m_n|A_3|}{n\rho_{\min}(\Omega_{A_1})}}, \tag{36}
\]

from which we have
\[
\begin{aligned}
\|V_{14}\|_2^2 &\le 2\Big( \frac{1}{2}\|V_{14}\|_2^2 + \frac{3}{4}\|\omega_2\|_2^2 \Big) \\
&\le 2\lambda_{n1}\sum_{k \in A_5}\|\beta_k\|_2 + 2\lambda_{n1}\sqrt{\frac{m_n|A_3|}{n\rho_{\min}(\Omega_{A_1})}}\,\big\| P_{A_1}^{1/2} D^c_{A_2}\beta_{A_2}\big\|_2 + 2\lambda_{n1}\eta_1 + \frac{B_1}{2} + 2\|V_{14}\|_2\,\lambda_{n1}\sqrt{\frac{m_n|A_3|}{n\rho_{\min}(\Omega_{A_1})}}.
\end{aligned}
\]

Note that the largest possible $A_1$ contains all the "large" $\|\beta_j\|_2$; then $A_5 = \emptyset$, so that $A_2 = A_6$, $\sum_{k \in A_5}\|\beta_k\|_2 = 0$ and $P_{A_1}^{1/2} D^c_{A_2}\beta_{A_2} = P_{A_1}^{1/2} D^c_{A_6}\beta_{A_6}$. Finally, using $\big\| P_{A_1}^{1/2} D^c_{A_6}\beta_{A_6}\big\|_2 \le \max_{A \subset \tilde A_0}\big\| \sum_{k \in A} D^c_k\beta_k \big\|_2$ since $A_6 \subset \tilde A_0$, we have the following inequality:

\[
\|V_{14}\|_2^2 \le 2\eta_2\sqrt{B_2} + 2\lambda_{n1}\eta_1 + \frac{B_1}{2} + 2\sqrt{B_2}\,\|V_{14}\|_2, \tag{37}
\]
where $\eta_2 = \max_{A \subset \tilde A_0}\big\| \sum_{k \in A} D^c_k\beta_k \big\|_2$ and $B_2 = \frac{\lambda_{n1}^2 m_n q^*}{n\,\rho_{\min}(\Omega_{A_1})}$. Hence, after using the fact that $2\sqrt{B_2}\|V_{14}\|_2 = 2\big(\sqrt{2B_2}\big)\big(\|V_{14}\|_2/\sqrt 2\big) \le 2B_2 + \|V_{14}\|_2^2/2$, we obtain an upper bound for $\|V_{14}\|_2^2$ from (37),
\[
\|V_{14}\|_2^2 \le B_1 + 4\lambda_{n1}\eta_1 + 4\eta_2\sqrt{B_2} + 4B_2,
\]
which, when combined with (28), implies
\[
|A_\beta| \le M_1 q^*,
\]
where $M_1 = M_1(\lambda_{n1}) = 2 + 4r_1 + 4r_2\sqrt{C_{12}} + 4C_{12}$ with
\[
r_1 = r_1(\lambda_{n1}) = \frac{c_2\eta_1 n}{q^* m_n\lambda_{n1}}, \qquad r_2 = r_2(\lambda_{n1}) = \Big( \frac{c_2\eta_2^2 n}{q^* m_n\lambda_{n1}^2} \Big)^{1/2} \qquad \text{and} \qquad C_{12} = \frac{c_2}{c_1}.
\]
Note that $M_1(\lambda_{n1})$ is a decreasing function of $\lambda_{n1}$.

If $\eta_1 = 0$, which is referred to as a narrow-sense sparsity condition, then $r_1 = r_2 = 0$ and hence $M_1(\lambda_{n1}) = 2 + 4C_{12} < \infty$. Note that since we are assuming $\lambda_0 < \infty$, we implicitly assume that $(2 + 4C_{12})q^* + 1 \le q_0$ holds. In general, as long as $\eta_1$ and $\eta_2$ satisfy $\eta_1 \le \big( \frac{C_1 q^*}{c_2} \big)\big( \frac{m_n\lambda_{n1}}{n} \big)$ and $\eta_2^2 \le \big( \frac{C_2 q^*}{c_2} \big)\big( \frac{m_n\lambda_{n1}^2}{n} \big)$ for some finite $C_1$ and $C_2$, we will have $r_1 \le C_1$ and $r_2 \le C_2$, which gives $M_1(\lambda_{n1}) \le (2 + 4C_1 + 4C_2\sqrt{C_{12}} + 4C_{12}) < \infty$. Thus, we complete the proof.

PROOF OF THEOREM 1

Part (a): Since $\hat\beta_{gL}$ is the group LASSO estimate minimizing $Q_{n1}(\beta, \lambda_{n1})$, for any $\beta$,
\[
\|Z^c - D^c\hat\beta_{gL}\|_2^2 + \lambda_{n1}\|\hat\beta_{gL}\|_{2,1,1} \le \|Z^c - D^c\beta\|_2^2 + \lambda_{n1}\|\beta\|_{2,1,1}. \tag{38}
\]
Let $A_2 = \{j : \|\beta_j\|_2 > 0 \text{ or } \|\hat\beta_{gL,j}\|_2 > 0\}$, where $\beta_{A_2} = (\beta_j, j \in A_2)$ and $\hat\beta_{gL,A_2} = (\hat\beta_{gL,j}, j \in A_2)$. Recall that $\|\beta_{A_2}\|_{2,1,1} = \sum_{j \in A_2}\|\beta_j\|_2$ and $\|\hat\beta_{gL,A_2}\|_{2,1,1} = \sum_{j \in A_2}\|\hat\beta_{gL,j}\|_2$. By (38), we have
\[
\|Z^c - D^c_{A_2}\hat\beta_{gL,A_2}\|_2^2 + \lambda_{n1}\|\hat\beta_{gL,A_2}\|_{2,1,1} \le \|Z^c - D^c_{A_2}\beta_{A_2}\|_2^2 + \lambda_{n1}\|\beta_{A_2}\|_{2,1,1}. \tag{39}
\]


Let $\vartheta_{A_2} = Z^c - D^c_{A_2}\beta_{A_2}$ and $\zeta_{A_2} = D^c_{A_2}(\hat\beta_{gL,A_2} - \beta_{A_2})$. Using the fact that $\|a-b\|_2^2 = \|a\|_2^2 - 2a'b + \|b\|_2^2$ and $Z^c - D^c_{A_2}\hat\beta_{gL,A_2} = Z^c - D^c_{A_2}\beta_{A_2} - D^c_{A_2}(\hat\beta_{gL,A_2} - \beta_{A_2})$, we can rewrite (39) as
\[
\|\zeta_{A_2}\|_2^2 - 2\vartheta'_{A_2}\zeta_{A_2} \le \lambda_{n1}\|\beta_{A_2}\|_{2,1,1} - \lambda_{n1}\|\hat\beta_{gL,A_2}\|_{2,1,1}
\le \lambda_{n1}\Big| \|\beta_{A_2}\|_{2,1,1} - \|\hat\beta_{gL,A_2}\|_{2,1,1} \Big|
\le \lambda_{n1}\sqrt{|A_2|}\ \|\hat\beta_{gL,A_2} - \beta_{A_2}\|_2. \tag{40}
\]

The subsequent steps bound $\|\hat\beta_{gL,A_2} - \beta_{A_2}\|_2^2$. First, we have
\[
2|\vartheta'_{A_2}\zeta_{A_2}| \le 2\|\vartheta^*_{A_2}\|_2\|\zeta_{A_2}\|_2 = 2\big( \sqrt 2\,\|\vartheta^*_{A_2}\|_2 \big)\Big( \frac{\|\zeta_{A_2}\|_2}{\sqrt 2} \Big) \le 2\|\vartheta^*_{A_2}\|_2^2 + \frac{\|\zeta_{A_2}\|_2^2}{2}, \tag{41}
\]
where the last inequality is based on the fact that $2ab \le a^2 + b^2$, and $\vartheta^*_{A_2}$ is the projection of $\vartheta_{A_2}$ onto the span of $D^c_{A_2}$. Now, by combining (40) and (41), we have
\[
\|\zeta_{A_2}\|_2^2 \le 4\|\vartheta^*_{A_2}\|_2^2 + 2\lambda_{n1}\sqrt{|A_2|}\ \|\hat\beta_{gL,A_2} - \beta_{A_2}\|_2. \tag{42}
\]

On the other hand, we have
\[
\begin{aligned}
\|\zeta_{A_2}\|_2^2 &= (\hat\beta_{gL,A_2} - \beta_{A_2})' D^{c\,\prime}_{A_2} D^c_{A_2}(\hat\beta_{gL,A_2} - \beta_{A_2})
= n(\hat\beta_{gL,A_2} - \beta_{A_2})'\,\frac{D^{c\,\prime}_{A_2} D^c_{A_2}}{n}\,(\hat\beta_{gL,A_2} - \beta_{A_2}) \\
&\ge n\,\rho_{\min}\Big( \frac{D^{c\,\prime}_{A_2} D^c_{A_2}}{n} \Big)\|\hat\beta_{gL,A_2} - \beta_{A_2}\|_2^2
= n\,\rho_{\min}(\Omega_{A_2})\|\hat\beta_{gL,A_2} - \beta_{A_2}\|_2^2
\ge n\,c_1 h_n\|\hat\beta_{gL,A_2} - \beta_{A_2}\|_2^2,
\end{aligned}\tag{43}
\]
where the last inequality is from Lemma 2. Combining (42) and (43), we have
\[
\begin{aligned}
n c_1 h_n\|\hat\beta_{gL,A_2} - \beta_{A_2}\|_2^2
&\le 4\|\vartheta^*_{A_2}\|_2^2 + 2\lambda_{n1}\sqrt{|A_2|}\,\|\beta_{A_2} - \hat\beta_{gL,A_2}\|_2 \\
&= 4\|\vartheta^*_{A_2}\|_2^2 + 2\Big( \frac{\lambda_{n1}\sqrt{2|A_2|}}{\sqrt{n c_1 h_n}} \Big)\Big( \|\beta_{A_2} - \hat\beta_{gL,A_2}\|_2\,\frac{\sqrt{n c_1 h_n}}{\sqrt 2} \Big) \\
&\le 4\|\vartheta^*_{A_2}\|_2^2 + \frac{2\lambda_{n1}^2|A_2|}{n c_1 h_n} + \|\beta_{A_2} - \hat\beta_{gL,A_2}\|_2^2\,\frac{n c_1 h_n}{2}.
\end{aligned}
\]
Thus,
\[
\|\hat\beta_{gL,A_2} - \beta_{A_2}\|_2^2 \le \frac{8\|\vartheta^*_{A_2}\|_2^2}{n c_1 h_n} + \frac{4\lambda_{n1}^2|A_2|}{n^2 c_1^2 h_n^2}. \tag{44}
\]


Let $\vartheta(s)$ denote the entry of $\vartheta_{A_2}$ at location $s$, so that we can write
\[
\vartheta(s) = \varepsilon(s) + (\mu - \bar Y) + f^0(X(s)) - f_{nA_2}(X(s)), \tag{45}
\]
where $f^0(X(s)) = \sum_{j=1}^J f_j(X_j(s))$ and $f_{nA_2}(X(s)) = \sum_{j\in A_2} f_{nj}(X_j(s))$. Note that $|\mu - \bar Y|^2 = O_p(n^{-(1-\alpha)})$ due to (18) and (H 3).

Let $\varepsilon^*_{A_2}$ be the projection of $\varepsilon_n$ onto the span of $D^c_{A_2}$, that is, $\varepsilon^*_{A_2} = (D^{c\,\prime}_{A_2} D^c_{A_2})^{-1/2} D^{c\,\prime}_{A_2}\varepsilon_n$. Then, we have
\[
\|\vartheta^*_{A_2}\|_2^2 \le \|\varepsilon^*_{A_2}\|_2^2 + O_p(n^\alpha + |A_2| n m_n^{-2\tau}) \tag{46}
\]
and, continuing,
\[
\begin{aligned}
&= \|(D^{c\,\prime}_{A_2} D^c_{A_2})^{-1/2} D^{c\,\prime}_{A_2}\varepsilon_n\|_2^2 + O_p(n^\alpha + |A_2| n m_n^{-2\tau}) \\
&\le \frac{\|D^{c\,\prime}_{A_2}\varepsilon_n\|_2^2}{n\,\rho_{\min}(\Omega_{A_2})} + O_p(n^\alpha + |A_2| n m_n^{-2\tau}) \\
&\le \frac{\|D^{c\,\prime}_{A_2}\varepsilon_n\|_2^2}{n c_1 h_n} + O_p(n^\alpha + |A_2| n m_n^{-2\tau}) \quad \text{(by Lemma 2)} \\
&\le \frac{1}{n c_1 h_n}\max_{A : |A| \le |A_2|}\|D^{c\,\prime}_A\varepsilon_n\|_2^2 + O_p(n^\alpha + |A_2| n m_n^{-2\tau}) \\
&= \frac{1}{n c_1 h_n}\max_{A : |A| \le |A_2|}\sum_{j\in A}\|D^{c\,\prime}_j\varepsilon_n\|_2^2 + O_p(n^\alpha + |A_2| n m_n^{-2\tau}) \\
&\le \frac{1}{n c_1 h_n}|A_2| m_n\max_{1\le j\le J,\,1\le l\le m_n}\big| D^{c\,\prime}_{jl}\varepsilon_n \big|^2 + O_p(n^\alpha + |A_2| n m_n^{-2\tau}) \\
&= \frac{1}{c_1 h_n}|A_2|\max_{1\le j\le J,\,1\le l\le m_n}\Big| \Big(\frac{m_n}{n}\Big)^{1/2} a'_{jl} L\varepsilon_n \Big|^2 + O_p(n^\alpha + |A_2| n m_n^{-2\tau}) \\
&= O_p\Big( \frac{|A_2|\,\rho^2_{\max}(L)\, n^\alpha m_n \log(Jm_n)}{c_1 h_n} \Big) + O_p(n^\alpha + |A_2| n m_n^{-2\tau}),
\end{aligned}\tag{47}
\]
where the last equality is by Lemma 3 with $M_n = L$. Part (a) follows by combining (44) and (47), since $|A_2|$ is bounded by Lemma 4.

Part (b): If $\frac{m_n^2\lambda_{n1}^2}{n^2} \to 0$, then, by the condition $\lambda_{n1} > C\rho_{\max}(L)\sqrt{n^{1+\alpha} m_n\log(Jm_n)}$, we have $\frac{\rho^2_{\max}(L) m_n^3\log(Jm_n)}{n^{1-\alpha}} \to 0$. Also note that $1 = \rho_{\max}(I) = \rho_{\max}(\Sigma_W^{-1}\Sigma_W) \le \rho_{\max}(\Sigma_W^{-1})\rho_{\max}(\Sigma_W) \le \rho^2_{\max}(L)\rho_{\max}(\Sigma_W)$; therefore $\rho^2_{\max}(L) \ge \rho_{\min}(\Sigma_W^{-1})$, and hence
\[
\frac{\rho^2_{\max}(L)\, m_n^3\log(Jm_n)}{n^{1-\alpha}}
= \rho^2_{\max}(L)\, m_n^2\log(Jm_n)\,\frac{m_n}{n^{1-\alpha}}
\ge \rho_{\min}(\Sigma_W^{-1})\, m_n^2\log(Jm_n)\,\frac{m_n}{n^{1-\alpha}}
\ge C m_n^2\log(Jm_n)\,\frac{m_n}{n^{1-\alpha}}, \tag{48}
\]


where $C$ is a generic constant. Since the L.H.S. of (48) goes to 0 and $m_n^2\log(Jm_n) \to \infty$, we get $\frac{m_n}{n^{1-\alpha}} \to 0$.

Similarly,
\[
\frac{\rho^2_{\max}(L)\, m_n^3\log(Jm_n)}{n^{1-\alpha}}
= \frac{\rho^2_{\max}(L)\, m_n^{2\tau+2}\log(Jm_n)}{n^{1-\alpha}}\,\frac{1}{m_n^{2\tau-1}}
\ge \frac{\rho_{\min}(\Sigma_W^{-1})\, n^{(2\tau+2)/(2\tau+1)}\log(Jm_n)}{n^{1-\alpha}}\,\frac{1}{m_n^{2\tau-1}}
\ge \rho_{\min}(\Sigma_W^{-1})\, n^\alpha\log(Jm_n)\,\frac{1}{m_n^{2\tau-1}}
\ge C n^\alpha\log(Jm_n)\,\frac{1}{m_n^{2\tau-1}}. \tag{49}
\]
Since the L.H.S. of (49) goes to 0 and $n^\alpha\log(Jm_n) \to \infty$, we get $\frac{1}{m_n^{2\tau-1}} \to 0$, and hence part (b) of Theorem 1 follows.

PROOF OF THEOREM 2

Part (a) follows from the fact that
\[
c_*\, m_n^{-1}\|\hat\beta_{gL,j} - \beta_j\|_2^2 \ \le\ \|\hat f_{gL,j} - f_j\|_2^2 \ \le\ c^*\, m_n^{-1}\|\hat\beta_{gL,j} - \beta_j\|_2^2
\]
for some $c_*, c^* > 0$. Part (b) follows from part (a).
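The norm equivalence behind this proof, $\|f\|_2^2 \asymp m_n^{-1}\|\beta\|_2^2$ for B-spline coefficients, can be checked numerically (a toy sketch; the printed ratio stays bounded away from 0 and infinity as the coefficients vary):

import numpy as np
from scipy.interpolate import BSpline

m, k = 12, 3
inner = np.linspace(0.0, 1.0, m - k + 1)
knots = np.r_[np.zeros(k), inner, np.ones(k)]
x = np.linspace(0.0, 1.0, 20001, endpoint=False)   # avoid the right knot boundary
B = BSpline.design_matrix(x, knots, k).toarray()
rng = np.random.default_rng(3)
for _ in range(5):
    b = rng.standard_normal(m)
    f_sq = np.mean((B @ b) ** 2)                   # Riemann approximation of ||f||_2^2 on [0, 1)
    print(f_sq / (np.linalg.norm(b) ** 2 / m))     # ratio stays roughly constant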

PROOF OF THEOREM 3

Part (a): Recall that, by the KKT conditions, a necessary and sufficient condition for $\hat\beta_{AgL}$ is
\[
\begin{cases}
D^{c\,\prime}_j(Z^c - D^c\hat\beta_{AgL}) = \lambda_{n2}\eta_{nj}\,\dfrac{\hat\beta_{AgL,j}}{2\|\hat\beta_{AgL,j}\|_2}, & \text{when } \|\hat\beta_{AgL,j}\|_2 > 0, \\[4pt]
\|D^{c\,\prime}_j(Z^c - D^c\hat\beta_{AgL})\|_2 \le \lambda_{n2}\eta_{nj}/2, & \text{when } \|\hat\beta_{AgL,j}\|_2 = 0.
\end{cases}\tag{50}
\]
Let $A_{**} = A_* \cap \{j : \|\hat\beta_{AgL,j}\|_2 > 0\}$. Define
\[
\bar\beta_{A_{**}} = (D^{c\,\prime}_{A_{**}} D^c_{A_{**}})^{-1}(D^{c\,\prime}_{A_{**}} Z^c - \lambda_{n2} v_n), \tag{51}
\]
where $v_n = (v_{nj},\, j \in A_{**})$ with $v_{nj} = \eta_{nj}\bar\beta_j/(2\|\bar\beta_j\|_2)$. Then, we have $D^{c\,\prime}_j(Z^c - D^c_{A_{**}}\bar\beta_{A_{**}}) = \lambda_{n2}\eta_{nj}\frac{\bar\beta_j}{2\|\bar\beta_j\|_2}$ for $j \in A_{**}$. If we assume $\|D^{c\,\prime}_j(Z^c - D^c_{A_{**}}\bar\beta_{A_{**}})\|_2 \le \lambda_{n2}\eta_{nj}/2$ for all $j \notin A_{**}$, then (50) holds for $(\bar\beta'_{A_{**}}, 0')'$, so that $\hat\beta_{AgL} = (\bar\beta'_{A_{**}}, 0')'$, since $D^c\bar\beta = D^c_{A_{**}}\bar\beta_{A_{**}}$.


If $\big| \|\bar\beta_j\|_2 - \|\beta_j\|_2 \big| < \|\beta_j\|_2$ for all $j \in A_{**}$, then $\bar\beta_{A_{**}} =_0 \beta_{A_{**}}$, so that $\hat\beta_{AgL} =_0 \beta$. Therefore, we have the following inequalities:
\[
\begin{aligned}
P(\hat\beta_{AgL} \ne_0 \beta) &\le P\big( \big| \|\bar\beta_j\|_2 - \|\beta_j\|_2 \big| \ge \|\beta_j\|_2,\ \exists j \in A_{**} \big) + P\big( \|D^{c\,\prime}_j(Z^c - D^c_{A_{**}}\bar\beta_{A_{**}})\|_2 > \lambda_{n2}\eta_{nj}/2,\ \exists j \notin A_{**} \big) \\
&\le P\big( \|\bar\beta_j - \beta_j\|_2 \ge \|\beta_j\|_2,\ \exists j \in A_{**} \big) + P\big( \|D^{c\,\prime}_j(Z^c - D^c_{A_{**}}\bar\beta_{A_{**}})\|_2 > \lambda_{n2}\eta_{nj}/2,\ \exists j \notin A_{**} \big),
\end{aligned}\tag{52}
\]
where the last inequality is from $\|\beta_j\|_2 > 0$ for $j \in A_{**}$.

where the last inequality is from ‖βj‖2 > 0 for j ∈ A∗∗.First we show

P(‖βj − βj‖2 ≥ ‖βj‖2,∃j ∈ A∗∗)→ 0. (53)

To show (53), it is sufficient to show that maxj∈A∗∗ ‖βj − βj‖2 → 0 in probability since‖βj‖2 > 0 for j ∈ A∗∗. Define T nj = (Omn , . . . ,Omn , Imn ,Omn , . . . ,Omn) be a mn × qmn

matrix with Imn is in the jth block, where Omn be mn × mn matrix of zeros and Imn be anmn × mn identity matrix. From (51), βA∗∗ − βA∗∗ = n−1Ω−1A∗∗(D

c′A∗∗εn + Dc′

A∗∗θn − λn2vn).Thus, if j ∈ A∗∗, we have βj − βj = n−1T njΩ

−1A∗∗

(Dc′A∗∗εn + Dc′

A∗∗θn − λn2vn). By triangleinequality,

‖βj − βj‖2 ≤ n−1‖T njΩ−1A∗∗

Dc′A∗∗εn‖2 + n−1‖T njΩ

−1A∗∗

Dc′A∗∗θn‖2

+ n−1λn2‖T njΩ−1A∗∗vn‖2 (54)

Now, we show that each term on the right-hand side of (54) goes to zero in probability. For the first term,
\[
\begin{aligned}
\max_{j\in A_{**}} n^{-1}\|T_{nj}\Omega^{-1}_{A_{**}} D^{c\,\prime}_{A_{**}}\varepsilon_n\|_2
&\le \frac{1}{n\,\rho_{\max}(\Omega_{A_{**}})}\|D^{c\,\prime}_{A_{**}}\varepsilon_n\|_2
= \frac{1}{n\,\rho_{\max}(\Omega_{A_{**}})}\sqrt{\sum_{j\in A_{**}}\|D^{c\,\prime}_j\varepsilon_n\|_2^2} \\
&\le \frac{\sqrt{|A_{**}|}}{n^{1/2}\rho_{\max}(\Omega_{A_{**}})}\sqrt{\max_{j\in A_{**},\,1\le l\le m_n}\frac{m_n}{n}\big| D^{c\,\prime}_{jl}\varepsilon_n \big|^2} \\
&= O_p\Bigg( \sqrt{\frac{\rho^2_{\max}(L)\, m_n^3\log(|A_{**}|m_n)}{n^{1-\alpha}}} \Bigg) \longrightarrow 0,
\end{aligned}\tag{55}
\]

where the last equality holds by Lemma 3 and the convergence in (55) holds by assumptions (H 6) and (H 7). For the second term,
\[
\begin{aligned}
\max_{j\in A_{**}} n^{-1}\|T_{nj}\Omega^{-1}_{A_{**}} D^{c\,\prime}_{A_{**}}\theta_n\|_2
&\le n^{-1/2}\|\Omega^{-1}_{A_{**}}\|_2\,\|n^{-1} D^{c\,\prime}_{A_{**}} D^c_{A_{**}}\|_2^{1/2}\,\|\theta_n\|_2 \\
&\le n^{-1/2}\rho^{-1}_{\min}(\Omega_{A_{**}})\rho^{1/2}_{\max}(\Omega_{A_{**}})\, O_p(n^{1/(4\tau+2)})
= O_p\big( n^{1/(2\tau+1) - 1/2} \big) \longrightarrow 0,
\end{aligned}\tag{56}
\]


where (56) holds by assumption (H 6). For the third term, we first find an upper bound for $\|v_n\|_2^2$:
\[
\begin{aligned}
\|v_n\|_2^2 &= \frac{1}{2}\sum_{j\in A_{**}}\eta_{nj}^2 = \frac{1}{2}\sum_{j\in A_{**}}\|\tilde\beta_{gL,j}\|_2^{-2}
= \frac{1}{2}\sum_{j\in A_{**}}\frac{\|\beta_j\|_2^2 - \|\tilde\beta_{gL,j}\|_2^2 + \|\tilde\beta_{gL,j}\|_2^2}{\|\tilde\beta_{gL,j}\|_2^2\,\|\beta_j\|_2^2} \\
&= \frac{1}{2}\sum_{j\in A_{**}}\frac{\|\beta_j\|_2^2 - \|\tilde\beta_{gL,j}\|_2^2}{\|\tilde\beta_{gL,j}\|_2^2\,\|\beta_j\|_2^2} + \frac{1}{2}\sum_{j\in A_{**}}\|\beta_j\|_2^{-2} \\
&\le C k_b^{-2} b_{n1}^{-4}\,\|\tilde\beta_{gL,A_{**}} - \beta_{A_{**}}\|_2^2 + q\, b_{n1}^{-2}
= O_p\big( k_b^{-2} b_{n1}^{-4} r_n^{-1} + q\, b_{n1}^{-2} \big) = O_p(k_n^2),
\end{aligned}\tag{57}
\]

where $C$ is a generic constant. Then, we have
\[
\begin{aligned}
\max_{j\in A_{**}} n^{-1}\lambda_{n2}\|T_{nj}\Omega^{-1}_{A_{**}} v_n\|_2
&\le n^{-1}\lambda_{n2}\rho^{-1}_{\min}(\Omega_{A_{**}})\|v_n\|_2
= O_p\big( n^{-1}\lambda_{n2}\rho^{-1}_{\min}(\Omega_{A_{**}})k_n \big) \\
&= O_p\big( n^{-1}\lambda_{n2}( r_n^{-1/2} + m_n^{1/2}) \big)
= O_p\Big( \frac{\lambda_{n2}\, m_n^{1/2}}{n} \Big) \longrightarrow 0,
\end{aligned}\tag{58}
\]
where (58) is implied by assumption (K 2). Therefore, by combining (55), (56) and (58), we have (53).

Now, we show
\[
P\big( \|D^{c\,\prime}_j(Z^c - D^c_{A_{**}}\bar\beta_{A_{**}})\|_2 > \lambda_{n2}\eta_{nj}/2,\ \exists j \notin A_{**} \big) \to 0. \tag{59}
\]
As $\eta_{nj} = \|\tilde\beta_{gL,j}\|_2^{-1} = O_p(r_n)$ for $j \notin A_{**}$, instead of (59) it is sufficient to show
\[
P\big( \|D^{c\,\prime}_j(Z^c - D^c_{A_{**}}\bar\beta_{A_{**}})\|_2 > \lambda_{n2} r_n/2,\ \exists j \notin A_{**} \big) \to 0. \tag{60}
\]

For $j \notin A_{**}$,
\[
\begin{aligned}
D^{c\,\prime}_j&(Z^c - D^c_{A_{**}}\bar\beta_{A_{**}}) \\
&= D^{c\,\prime}_j\big( Z^c - D^c_{A_{**}}(D^{c\,\prime}_{A_{**}} D^c_{A_{**}})^{-1} D^{c\,\prime}_{A_{**}} Z^c + \lambda_{n2} n^{-1} D^c_{A_{**}}\Omega^{-1}_{A_{**}} v_n \big) \\
&= D^{c\,\prime}_j H_n Z^c + \lambda_{n2} n^{-1} D^{c\,\prime}_j D^c_{A_{**}}\Omega^{-1}_{A_{**}} v_n \\
&= D^{c\,\prime}_j H_n D^c\beta + D^{c\,\prime}_j H_n\varepsilon_n + D^{c\,\prime}_j H_n\theta_n + \lambda_{n2} n^{-1} D^{c\,\prime}_j D^c_{A_{**}}\Omega^{-1}_{A_{**}} v_n \\
&= D^{c\,\prime}_j D^c_{A^c_{**}}\beta_{A^c_{**}} + D^{c\,\prime}_j H_n\varepsilon_n + D^{c\,\prime}_j H_n\theta_n + \lambda_{n2} n^{-1} D^{c\,\prime}_j D^c_{A_{**}}\Omega^{-1}_{A_{**}} v_n \\
&= D^{c\,\prime}_j D^c_{A_*\cap A^c_{**}}\beta_{A_*\cap A^c_{**}} + D^{c\,\prime}_j H_n\varepsilon_n + D^{c\,\prime}_j H_n\theta_n + \lambda_{n2} n^{-1} D^{c\,\prime}_j D^c_{A_{**}}\Omega^{-1}_{A_{**}} v_n,
\end{aligned}\tag{61}
\]


where $H_n = I - P_{A_{**}}$; the first equality is by replacing $\bar\beta_{A_{**}}$ with the expression in (51), and the fourth equality holds because $H_n$ is the projection matrix onto the orthogonal complement of the span of $D^c_{A_{**}}$.

By (61), the left-hand side of (60) can be bounded above by
\[
\begin{aligned}
P\big( \|D^{c\,\prime}_j(Z^c &- D^c_{A_{**}}\bar\beta_{A_{**}})\|_2 > \lambda_{n2} r_n/2,\ \exists j \notin A_{**} \big) \\
&\le P\big( \|D^{c\,\prime}_j D^c_{A_*\cap A^c_{**}}\beta_{A_*\cap A^c_{**}}\|_2 > \lambda_{n2} r_n/8,\ \exists j \notin A_{**} \big) \\
&\quad + P\big( \|D^{c\,\prime}_j H_n\varepsilon_n\|_2 > \lambda_{n2} r_n/8,\ \exists j \notin A_{**} \big) \\
&\quad + P\big( \|D^{c\,\prime}_j H_n\theta_n\|_2 > \lambda_{n2} r_n/8,\ \exists j \notin A_{**} \big) \\
&\quad + P\big( \|\lambda_{n2} n^{-1} D^{c\,\prime}_j D^c_{A_{**}}\Omega^{-1}_{A_{**}} v_n\|_2 > \lambda_{n2} r_n/8,\ \exists j \notin A_{**} \big),
\end{aligned}\tag{62}
\]

so we find upper bounds for the four terms in (62). For the first term, we have
\[
\max_{j\notin A_{**}}\|D^{c\,\prime}_j D^c_{A_*\cap A^c_{**}}\beta_{A_*\cap A^c_{**}}\|_2
\le n\max_{j\notin A_{**}}\|n^{-1/2} D^{c\,\prime}_j\|_2\,\|n^{-1/2} D^c_{A_*\cap A^c_{**}}\|_2\,\|\beta_{A_*\cap A^c_{**}}\|_2
= O_p\big( n\,\rho^{1/2}_{\max}(\Omega_{A^c_{**}})\rho^{1/2}_{\max}(\Omega_{A_{**}})\, m_n^{1/2} \big)
\le O_p\big( n\, m_n^{-1/2} \big).
\]
Then, for some generic constant $C$,
\[
\begin{aligned}
P\big( \|D^{c\,\prime}_j D^c_{A_*\cap A^c_{**}}\beta_{A_*\cap A^c_{**}}\|_2 > \lambda_{n2} r_n/8,\ \exists j\notin A_{**} \big)
&\le P\Big( \max_{j\notin A_{**}}\|D^{c\,\prime}_j D^c_{A_*\cap A^c_{**}}\beta_{A_*\cap A^c_{**}}\|_2 > C\lambda_{n2} r_n/8 \Big) \\
&\le P\big( n m_n^{-1/2} > C\lambda_{n2} r_n/8 \big)
= P\Big( \frac{n m_n^{-1/2}}{\lambda_{n2} r_n} > C/8 \Big) \longrightarrow 0,
\end{aligned}\tag{63}
\]
where (63) holds by assumption (K 2). For the second term, let $s_n = J - |A_{**}|$. Since $\rho_{\max}(H_n) = \rho_{\max}(I - P_{A_{**}}) = 1 - \rho_{\min}(P_{A_{**}})$ and $P_{A_{**}}$ is non-negative definite, $\rho_{\max}(H_n) \le 1$. By Lemma 3 with $M_n = LH_n$, and using the fact that $\rho_{\max}(LH_n) \le \rho_{\max}(L)\rho_{\max}(H_n)$, we have

\[
\begin{aligned}
E\Big( \max_{j\notin A_{**}} n^{-1/2}\|D^{c\,\prime}_j H_n\varepsilon_n\|_2 \Big)
&= E\Bigg( \max_{j\notin A_{**}} n^{-1/2}\sqrt{\sum_{l=1}^{m_n}\big| D^{c\,\prime}_{jl} H_n\varepsilon_n \big|^2} \Bigg) \\
&\le E\Big( \max_{j\notin A_{**},\,1\le l\le m_n}\Big( \frac{m_n}{n} \Big)^{1/2}\big| a'_{jl} L H_n\varepsilon_n \big| \Big)
= O\big( \sqrt{\rho^2_{\max}(L)\, n^\alpha m_n\log(s_n m_n)} \big);
\end{aligned}\tag{64}
\]


thus, by Markov's inequality, we have
\[
\begin{aligned}
P\big( \|D^{c\,\prime}_j H_n\varepsilon_n\|_2 > \lambda_{n2} r_n/8,\ \exists j\notin A_{**} \big)
&\le P\Big( \max_{j\notin A_{**}} n^{-1/2}\|D^{c\,\prime}_j H_n\varepsilon_n\|_2 > C n^{-1/2}\lambda_{n2} r_n/8 \Big) \\
&\le O\Bigg( \frac{\sqrt{\rho^2_{\max}(L)\, n^{1+\alpha} m_n\log(s_n m_n)}}{C\lambda_{n2} r_n} \Bigg) \longrightarrow 0,
\end{aligned}\tag{65}
\]

where $C$ is a generic constant and (65) holds by assumption (K 2). For the third term,
\[
\max_{j\notin A_{**}}\|D^{c\,\prime}_j H_n\theta_n\|_2
\le n^{1/2}\max_{j\notin A_{**}}\|n^{-1} D^{c\,\prime}_j D^c_j\|_2^{1/2}\,\|H_n\|_2\,\|\theta_n\|_2
= O\big( n\,\rho^{1/2}_{\max}(\Omega_{A^c_{**}})\, m_n^{-\tau} \big)
= O\big( n\, m_n^{-\tau - 1/2} \big).
\]

Therefore, for some generic constant $C$,
\[
\begin{aligned}
P\big( \|D^{c\,\prime}_j H_n\theta_n\|_2 > \lambda_{n2} r_n/8,\ \exists j\notin A_{**} \big)
&\le P\Big( \max_{j\notin A_{**}}\|D^{c\,\prime}_j H_n\theta_n\|_2 > C\lambda_{n2} r_n/8 \Big) \\
&\le P\big( n m_n^{-\tau-1/2} > C\lambda_{n2} r_n/8 \big)
= P\Big( \frac{n}{\lambda_{n2} r_n m_n^{(2\tau+1)/2}} > C/8 \Big) \longrightarrow 0,
\end{aligned}\tag{66}
\]

where (66) is implied by assumption (K 2). Finally, using (57), we have
\[
\begin{aligned}
\max_{j\notin A_{**}}\|\lambda_{n2} n^{-1} D^{c\,\prime}_j D^c_{A_{**}}\Omega^{-1}_{A_{**}} v_n\|_2
&\le \lambda_{n2}\max_{j\notin A_{**}}\|n^{-1/2} D^{c\,\prime}_j\|_2\,\|n^{-1/2} D^c_{A_{**}}\Omega^{-1/2}_{A_{**}}\|_2\,\|\Omega^{-1/2}_{A_{**}}\|_2\,\|v_n\|_2 \\
&= O_p\big( \lambda_{n2}\,\rho^{1/2}_{\max}(\Omega_{A^c_{**}})\rho^{-1/2}_{\min}(\Omega_{A_{**}})\, k_n \big)
= O_p\big( \lambda_{n2}( m_n^{-1} r_n^{-1/2} + m_n^{-1/2}) \big).
\end{aligned}
\]

Then, for some generic constant $C$,
\[
\begin{aligned}
P\big( \|\lambda_{n2} n^{-1} D^{c\,\prime}_j D^c_{A_{**}}\Omega^{-1}_{A_{**}} v_n\|_2 > \lambda_{n2} r_n/8,\ \exists j\notin A_{**} \big)
&\le P\Big( \max_{j\notin A_{**}}\|\lambda_{n2} n^{-1} D^{c\,\prime}_j D^c_{A_{**}}\Omega^{-1}_{A_{**}} v_n\|_2 > C\lambda_{n2} r_n/8 \Big) \\
&\le P\big( \lambda_{n2}( m_n^{-1} r_n^{-1/2} + m_n^{-1/2}) > C\lambda_{n2} r_n/8 \big) \\
&= P\Big( \frac{m_n^{-1} r_n^{-1/2} + m_n^{-1/2}}{r_n} > C/8 \Big) \longrightarrow 0,
\end{aligned}\tag{67}
\]


where (67) holds since $r_n, m_n \to \infty$. Hence, by combining (63), (65), (66) and (67), (59) follows.

Part (b): Denote $\eta^* = \max_{j\in A_*} 1/\|\beta_j\|_2$ and let $A_{***} = A_* \cup \{j : \|\hat\beta_{AgL,j}\|_2 > 0\}$. Note that $J_0 = |A_{***}|$. Define $\vartheta_{A_{***}} = Z^c - D^c_{A_{***}}\beta_{A_{***}}$, and let $\vartheta^*_{A_{***}}$ and $\varepsilon^*_{A_{***}}$ be the projections of $\vartheta_{A_{***}}$ and $\varepsilon_n$ onto the span of $D^c_{A_{***}}$. Then, in a similar way as in part (a) of Theorem 1, we can show
\[
\begin{aligned}
\|\vartheta^*_{A_{***}}\|_2^2 &\le \|\varepsilon^*_{A_{***}}\|_2^2 + O_p(n^\alpha + |A_{***}| n m_n^{-2\tau}) \\
&= \|(D^{c\,\prime}_{A_{***}} D^c_{A_{***}})^{-1/2} D^{c\,\prime}_{A_{***}}\varepsilon_n\|_2^2 + O_p(n^\alpha + |A_{***}| n m_n^{-2\tau}) \\
&= O_p\Big( \frac{|A_{***}|\, n^\alpha\rho^2_{\max}(L)\, m_n\log(J_0 m_n)}{c_1 h_n} + n^\alpha + |A_{***}| n m_n^{-2\tau} \Big).
\end{aligned}\tag{68}
\]

In a similar way to (44), we can also show
\[
\|\hat\beta_{AgL,A_{***}} - \beta_{A_{***}}\|_2^2 \le \frac{8\|\vartheta^*_{A_{***}}\|_2^2}{n c_1 h_n} + \frac{4\lambda_{n2}^2 |A_{***}|\,\eta^*}{n^2 c_1^2 h_n^2}; \tag{69}
\]
thus, by (68) and (69), we obtain part (b) of Theorem 3.

PROOF OF THEOREM 4

The proof proceeds similarly to that of Theorem 2.

C Lemmas and Theorems under the assumptions (H 4)′ and (H 6)′

In this appendix, we provide revised lemmas and theorems for the case where we allow different levels of smoothness for the additive components. We first extend the definition of $\mathcal S^0_{nj}$ as follows:
\[
\mathcal S^0_{nj} = \Big\{ f_{nj} : f_{nj} = \sum_{l=1}^{m_{nj}} b_{jl} B^c_l(x),\ (b_{j1},\cdots,b_{jm_{nj}}) \in \mathbb R^{m_{nj}} \Big\}, \quad 1 \le j \le J. \tag{70}
\]

With the new assumptions, Lemmas 1 and 3 and Theorems 2 and 4 change as follows.

Lemma 1′. Suppose that $f \in \mathcal F_j$ and $Ef(X_j) = 0$. Then, under (H 4)′ and (H 5), there exists an $f_n \in \mathcal S^0_{nj}$ such that
\[
\|f_n - f\|_2 = O_p\Big( m_{nj}^{-\tau_j} + \sqrt{\frac{m_{nj}}{n}} \Big). \tag{71}
\]
In particular, under the choice $m_{nj} = O(n^{\frac{1}{2\tau_j+1}})$, we have
\[
\|f_n - f\|_2 = O_p\big( m_{nj}^{-\tau_j} \big) = O_p\big( n^{-\frac{\tau_j}{2\tau_j+1}} \big). \tag{72}
\]

Lemma 3′. Let $M_n$ be a non-negative definite matrix of order $n$ and
\[
T_{jl} = \Big( \frac{m_{nj}}{n} \Big)^{1/2} a'_{jl} M_n\varepsilon \quad \forall\, 1 \le j \le J,\ 1 \le l \le m_{nj}, \tag{73}
\]
where $a_{jl} = (B^c_l(X_j(s)),\, s \in S)'$ and $T_n = \max_{1\le j\le J,\,1\le l\le m_{nj}} |T_{jl}|$. With the new Assumption 1,
\[
E(T_n) \le C_1\rho_{\max}(M_n)\sqrt{m_n\log(Jm_n)\, O(n^\alpha)} \tag{74}
\]
for some $C_1 > 0$ and $m_n = \max_{j=1,2,\ldots,J} m_{nj}$.

Theorem 2′. With the new Assumption 1 and $\lambda_{n1} > C\rho_{\max}(L)\sqrt{n^{1+\alpha} m_n\log(Jm_n)}$ for a sufficiently large constant $C$:

(a) $\|\hat f_{gL,j} - f_j\|_2^2 = O_p\Big( \Big( \frac{\rho^2_{\max}(L)\, m_n^3\log(Jm_n)}{n^{1-\alpha}} + \frac{m_n}{n^{1-\alpha}} + \frac{1}{m_n^{2\tau-1}} + \frac{4 m_n^2\lambda_{n1}^2}{n^2} \Big)\big/ m_{nj} \Big)$ for $j \in A_\beta \cup \tilde A_*$, where $A_\beta$ is the index set of the nonzero gL estimates of the $\beta_j$;

(b) if $\frac{m_n^2\lambda_{n1}^2}{m_{nj}\, n^2} \to 0$ as $n \to \infty$ for $1 \le j \le q$, then all the nonzero components $f_j$, $1 \le j \le q$, are selected with probability converging to 1.

Theorem 4′. With the new Assumptions 1 and 2:

(a) $P\big( \|\hat f_{AgL,j}\|_2 > 0,\ j \in A_*\ \text{ and }\ \|\hat f_{AgL,j}\|_2 = 0,\ j \notin A_* \big) \longrightarrow 1$;

(b) $\|\hat f_{AgL,j} - f_j\|_2^2 = O_p\Big( \Big( \frac{\rho^2_{\max}(L)\, m_n^3\log(J_0 m_n)}{n^{1-\alpha}} + \frac{m_n}{n^{1-\alpha}} + \frac{1}{m_n^{2\tau-1}} + \frac{4 m_n^2\lambda_{n2}^2}{n^2} \Big)\big/ m_{nj} \Big)$ $\forall j \in A_*$.

D Additional simulation results

In this appendix, we provide complete simulation results, including the simulated data with t = 3 in generating the covariates $X_j$. Compared to Table 3, Tables 6–8 include the results from both the group LASSO and adaptive group LASSO algorithms, m = 6, 12, 24, and more choices of correlation parameter values.

As discussed in Section 4, the correlation between $X_j$ and $X_k$ is higher with t = 3 than with t = 1. Tables 9–11 show the results for the t = 3 case. The selection results are worse for both our approach and the independent approach compared to the t = 1 cases in Tables 6–8 and in Section 4, but our approach still performs better than the independent approach.


Table 6: Monte Carlo Mean (Standard dev.) for the selected number of nonzero covariates using 100 data sets under both the Independent and Dependent setups, using the spatially weighted group LASSO and adaptive Group LASSO algorithms, when J = 15 (t = 1). [Table body not recoverable from the extraction. Layout: rows are the covariance models Exp(0.5), Exp(1), Mat_{3/2}(2.5), Mat_{3/2}(1.5), Mat_{5/2}(2.5), Mat_{5/2}(1.5), Gauss(1.5), Gauss(2.5), each crossed with the spatial weights None (Indep), I, Gauss, InvMQ, True; columns report True Positive and False Positive counts for GLASSO and Adapt. GLASSO at m = 6, 12, 24.]


Table 7: Monte Carlo Mean (Standard dev.) for the selected number of nonzero covariates using 100 data sets under both the Independent and Dependent setups, using the spatially weighted group LASSO and adaptive Group LASSO algorithms, when J = 25 (t = 1). [Table body not recoverable from the extraction; same layout as Table 6.]


Table 8: Monte Carlo Mean (Standard dev.) for the selected number of nonzero covariates using 100 data sets under both the Independent and Dependent setups, using the spatially weighted group LASSO and adaptive Group LASSO algorithms, when J = 35 (t = 1). [Table body not recoverable from the extraction; same layout as Table 6.]


J = 15 (t = 3). Each cell reports True Positive / False Positive as Monte Carlo mean (standard dev.).

Cov Model | Spatial Weight | m=6 GLASSO | m=6 Adapt. GLASSO | m=12 GLASSO | m=12 Adapt. GLASSO | m=24 GLASSO | m=24 Adapt. GLASSO

Exp(0.5) | None(Indep) | 3.74(0.48)/3.63(1.83) | 3.74(0.48)/3.62(1.84) | 4(0)/1.6(1.3) | 4(0)/1.6(1.3) | 4(0)/0.55(0.77) | 4(0)/0.55(0.77)
 | I | 3.17(0.82)/2.02(1.34) | 3.15(0.82)/2(1.29) | 4(0)/0.38(0.65) | 4(0)/0.38(0.65) | 4(0)/0(0) | 4(0)/0(0)
 | Gauss | 3.01(0.85)/1.76(1.24) | 3.01(0.85)/1.74(1.23) | 4(0)/0.18(0.48) | 4(0)/0.18(0.48) | 4(0)/0(0) | 4(0)/0(0)
 | InvMQ | 2.94(0.93)/1.58(1.44) | 2.94(0.93)/1.58(1.44) | 4(0)/0.1(0.3) | 4(0)/0.1(0.3) | 4(0)/0(0) | 4(0)/0(0)
 | True | 1.66(1.02)/0.29(0.56) | 1.66(1.02)/0.29(0.56) | 4(0)/0.07(0.29) | 4(0)/0.07(0.29) | 4(0)/0(0) | 4(0)/0(0)
Exp(1) | None(Indep) | 3.82(0.41)/3.77(2.09) | 3.82(0.41)/3.67(1.99) | 4(0)/1.51(1.24) | 4(0)/1.51(1.24) | 4(0)/0.53(0.69) | 4(0)/0.53(0.69)
 | I | 3.11(0.82)/2.31(1.35) | 3.11(0.82)/2.31(1.35) | 4(0)/0.43(0.64) | 4(0)/0.43(0.64) | 4(0)/0.03(0.22) | 4(0)/0.03(0.22)
 | Gauss | 3.01(0.8)/1.93(1.27) | 3.01(0.8)/1.93(1.27) | 4(0)/0.24(0.51) | 4(0)/0.24(0.51) | 4(0)/0.01(0.1) | 4(0)/0.01(0.1)
 | InvMQ | 2.99(0.88)/1.56(1.17) | 2.99(0.88)/1.54(1.13) | 4(0)/0.14(0.47) | 4(0)/0.14(0.47) | 4(0)/0(0) | 4(0)/0(0)
 | True | 2.33(1.05)/1.1(0.96) | 2.33(1.05)/1.1(0.96) | 4(0)/0.12(0.38) | 4(0)/0.12(0.38) | 4(0)/0(0) | 4(0)/0(0)
Mat3/2(2.5) | None(Indep) | 3.8(0.45)/3.63(1.86) | 3.79(0.46)/3.57(1.81) | 4(0)/1.37(1.39) | 4(0)/1.37(1.39) | 4(0)/0.34(0.57) | 4(0)/0.34(0.57)
 | I | 3.26(0.77)/1.87(1.5) | 3.26(0.77)/1.87(1.5) | 4(0)/0.46(0.77) | 4(0)/0.46(0.77) | 4(0)/0.05(0.22) | 4(0)/0.05(0.22)
 | Gauss | 2.98(0.85)/1.64(1.34) | 2.98(0.85)/1.63(1.33) | 4(0)/0.25(0.58) | 4(0)/0.25(0.58) | 4(0)/0.01(0.1) | 4(0)/0.01(0.1)
 | InvMQ | 2.74(0.86)/1.5(1.18) | 2.74(0.86)/1.5(1.18) | 4(0)/0.1(0.33) | 4(0)/0.1(0.33) | 4(0)/0(0) | 4(0)/0(0)
 | True | 0.8(0.68)/0.09(0.32) | 0.8(0.68)/0.09(0.32) | 3.36(0.72)/0.01(0.1) | 3.36(0.72)/0.01(0.1) | 4(0)/0(0) | 4(0)/0(0)
Mat3/2(1.5) | None(Indep) | 3.75(0.46)/4.07(2.04) | 3.75(0.46)/4.06(2.04) | 4(0)/1.54(1.42) | 4(0)/1.54(1.42) | 4(0)/0.47(0.67) | 4(0)/0.47(0.67)
 | I | 3.23(0.71)/2.05(1.3) | 3.22(0.7)/2.01(1.26) | 4(0)/0.46(0.78) | 4(0)/0.46(0.78) | 4(0)/0.02(0.14) | 4(0)/0.02(0.14)
 | Gauss | 3.03(0.76)/1.83(1.3) | 3.03(0.76)/1.8(1.31) | 4(0)/0.28(0.6) | 4(0)/0.28(0.6) | 4(0)/0(0) | 4(0)/0(0)
 | InvMQ | 2.84(0.9)/1.51(1.06) | 2.84(0.9)/1.48(1.01) | 4(0)/0.14(0.35) | 4(0)/0.14(0.35) | 4(0)/0(0) | 4(0)/0(0)
 | True | 1.01(0.88)/0.15(0.36) | 1.01(0.88)/0.15(0.36) | 3.82(0.46)/0.07(0.26) | 3.82(0.46)/0.07(0.26) | 4(0)/0(0) | 4(0)/0(0)
Mat5/2(2.5) | None(Indep) | 3.82(0.41)/3.59(2.08) | 3.81(0.42)/3.53(2.04) | 4(0)/1.34(1.2) | 4(0)/1.34(1.2) | 4(0)/0.58(0.88) | 4(0)/0.58(0.88)
 | I | 3.22(0.76)/2.06(1.55) | 3.22(0.76)/2.06(1.55) | 4(0)/0.43(0.86) | 4(0)/0.43(0.86) | 4(0)/0.04(0.2) | 4(0)/0.04(0.2)
 | Gauss | 2.91(0.89)/1.72(1.35) | 2.91(0.89)/1.69(1.35) | 4(0)/0.26(0.66) | 4(0)/0.26(0.66) | 4(0)/0(0) | 4(0)/0(0)
 | InvMQ | 2.97(0.89)/1.67(1.33) | 2.97(0.89)/1.67(1.33) | 4(0)/0.1(0.3) | 4(0)/0.1(0.3) | 4(0)/0(0) | 4(0)/0(0)
 | True | 0.37(0.56)/0.04(0.24) | 0.37(0.56)/0.04(0.24) | 2.2(0.9)/0.06(0.24) | 2.2(0.9)/0.06(0.24) | 4(0)/0(0) | 4(0)/0(0)
Mat5/2(1.5) | None(Indep) | 3.69(0.51)/3.85(1.92) | 3.69(0.51)/3.84(1.93) | 4(0)/1.74(1.52) | 4(0)/1.74(1.52) | 4(0)/0.58(0.75) | 4(0)/0.58(0.75)
 | I | 3.29(0.74)/1.9(1.43) | 3.27(0.75)/1.88(1.39) | 4(0)/0.47(0.77) | 4(0)/0.47(0.77) | 4(0)/0.01(0.1) | 4(0)/0.01(0.1)
 | Gauss | 3.21(0.74)/1.63(1.23) | 3.21(0.74)/1.61(1.21) | 4(0)/0.25(0.5) | 4(0)/0.25(0.5) | 4(0)/0.01(0.1) | 4(0)/0.01(0.1)
 | InvMQ | 2.84(0.88)/1.53(1.1) | 2.84(0.88)/1.52(1.1) | 4(0)/0.19(0.42) | 4(0)/0.19(0.42) | 4(0)/0(0) | 4(0)/0(0)
 | True | 0.59(0.65)/0.06(0.24) | 0.59(0.65)/0.06(0.24) | 2.79(0.99)/0.06(0.24) | 2.79(0.99)/0.06(0.24) | 4(0)/0(0) | 4(0)/0(0)
Gauss(1.5) | None(Indep) | 3.76(0.49)/3.97(2.06) | 3.76(0.49)/3.92(2.02) | 4(0)/1.63(1.3) | 4(0)/1.63(1.3) | 4(0)/0.59(0.81) | 4(0)/0.59(0.81)
 | I | 3.23(0.75)/2.07(1.24) | 3.23(0.75)/2.07(1.24) | 4(0)/0.64(0.92) | 4(0)/0.64(0.92) | 4(0)/0.03(0.17) | 4(0)/0.03(0.17)
 | Gauss | 3.02(0.82)/1.86(1.25) | 3.02(0.82)/1.86(1.25) | 4(0)/0.28(0.59) | 4(0)/0.28(0.59) | 4(0)/0.01(0.1) | 4(0)/0.01(0.1)
 | InvMQ | 2.76(0.79)/1.58(1.17) | 2.76(0.79)/1.58(1.17) | 3.99(0.1)/0.23(0.47) | 3.99(0.1)/0.23(0.47) | 4(0)/0(0) | 4(0)/0(0)
 | True | 3.3(0.73)/2.32(1.55) | 3.3(0.73)/2.32(1.55) | 4(0)/0.98(1.05) | 4(0)/0.98(1.05) | 4(0)/0.12(0.33) | 4(0)/0.12(0.33)
Gauss(2.5) | None(Indep) | 3.67(0.57)/3.97(1.89) | 3.67(0.57)/3.83(1.81) | 4(0)/1.77(1.46) | 4(0)/1.77(1.46) | 4(0)/0.55(0.78) | 4(0)/0.55(0.78)
 | I | 3.19(0.77)/2.1(1.19) | 3.19(0.77)/2.09(1.19) | 4(0)/0.49(0.69) | 4(0)/0.49(0.69) | 4(0)/0.01(0.1) | 4(0)/0.01(0.1)
 | Gauss | 2.97(0.83)/1.85(1.34) | 2.97(0.83)/1.83(1.33) | 4(0)/0.24(0.47) | 4(0)/0.24(0.47) | 4(0)/0(0) | 4(0)/0(0)
 | InvMQ | 2.75(0.77)/1.45(1.12) | 2.75(0.77)/1.44(1.09) | 4(0)/0.24(0.47) | 4(0)/0.24(0.47) | 4(0)/0(0) | 4(0)/0(0)
 | True | 3.56(0.57)/3.11(1.38) | 3.56(0.57)/3.11(1.38) | 4(0)/1.73(1.3) | 4(0)/1.73(1.3) | 4(0)/0.69(0.8) | 4(0)/0.69(0.8)

Table 9: Monte Carlo Mean (Standard dev.) for the selected number of nonzero covariates using 100 data sets, under both the Independent and Dependent setups, using the spatially weighted group LASSO and adaptive group LASSO algorithms, when J = 15 (t = 3).
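For readers reproducing tables of this kind, the bookkeeping behind each cell is simple to script. The sketch below is illustrative only (not the authors' code): `selected` is a hypothetical 100-by-J boolean matrix of selection indicators from one method/weight combination, `true_active` indexes the four truly nonzero additive components, and the function returns the Monte Carlo mean and standard deviation of the true- and false-positive counts. Whether the reported standard deviations use the n or n-1 normalization is not stated in this excerpt; ddof=1 is one common choice.

    import numpy as np

    def tp_fp_summary(selected, true_active):
        """Mean (sd) of true/false positive counts across simulated data sets."""
        selected = np.asarray(selected, dtype=bool)    # shape: (n_datasets, J)
        truth = np.zeros(selected.shape[1], dtype=bool)
        truth[np.asarray(true_active)] = True
        tp = (selected & truth).sum(axis=1)            # truly nonzero and selected
        fp = (selected & ~truth).sum(axis=1)           # truly zero but selected
        return (tp.mean(), tp.std(ddof=1)), (fp.mean(), fp.std(ddof=1))

    # Example with random selections: 100 replicates, J = 15 candidate covariates
    rng = np.random.default_rng(1)
    (tp_mean, tp_sd), (fp_mean, fp_sd) = tp_fp_summary(
        rng.random((100, 15)) < 0.3, true_active=[0, 1, 2, 3])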


J = 25 (t = 3). Each cell reports True Positive / False Positive as Monte Carlo mean (standard dev.).

Cov Model | Spatial Weight | m=6 GLASSO | m=6 Adapt. GLASSO | m=12 GLASSO | m=12 Adapt. GLASSO | m=24 GLASSO | m=24 Adapt. GLASSO

Exp(0.5) | None(Indep) | 3.53(0.67)/5.01(2.31) | 3.53(0.67)/4.89(2.26) | 4(0)/2.54(1.73) | 4(0)/2.54(1.73) | 4(0)/0.88(0.99) | 4(0)/0.88(0.99)
 | I | 2.84(0.95)/2.76(1.78) | 2.84(0.95)/2.76(1.78) | 4(0)/0.89(1.14) | 4(0)/0.89(1.14) | 4(0)/0.04(0.2) | 4(0)/0.04(0.2)
 | Gauss | 2.72(0.91)/2.49(1.56) | 2.72(0.91)/2.48(1.56) | 4(0)/0.57(0.89) | 4(0)/0.57(0.89) | 4(0)/0.02(0.14) | 4(0)/0.02(0.14)
 | InvMQ | 2.4(0.85)/2(1.36) | 2.4(0.85)/2(1.36) | 4(0)/0.29(0.61) | 4(0)/0.29(0.61) | 4(0)/0(0) | 4(0)/0(0)
 | True | 1.36(0.99)/0.44(0.61) | 1.36(0.99)/0.44(0.61) | 3.99(0.1)/0.07(0.26) | 3.99(0.1)/0.07(0.26) | 4(0)/0(0) | 4(0)/0(0)
Exp(1) | None(Indep) | 3.49(0.63)/5.25(1.9) | 3.49(0.63)/5.14(1.8) | 4(0)/2.98(2.07) | 4(0)/2.98(2.07) | 4(0)/0.83(1.04) | 4(0)/0.83(1.04)
 | I | 2.78(0.77)/2.77(1.62) | 2.78(0.77)/2.77(1.62) | 4(0)/1.06(1.08) | 4(0)/1.06(1.08) | 4(0)/0.01(0.1) | 4(0)/0.01(0.1)
 | Gauss | 2.7(0.85)/2.18(1.35) | 2.7(0.85)/2.16(1.33) | 4(0)/0.55(0.78) | 4(0)/0.55(0.78) | 4(0)/0(0) | 4(0)/0(0)
 | InvMQ | 2.57(0.9)/1.8(1.32) | 2.57(0.9)/1.8(1.32) | 4(0)/0.37(0.63) | 4(0)/0.37(0.63) | 4(0)/0(0) | 4(0)/0(0)
 | True | 2.05(1.09)/1.02(1.08) | 2.05(1.09)/1.02(1.08) | 4(0)/0.31(0.54) | 4(0)/0.31(0.54) | 4(0)/0(0) | 4(0)/0(0)
Mat3/2(2.5) | None(Indep) | 3.57(0.59)/5.09(2.16) | 3.57(0.59)/5.07(2.16) | 4(0)/2.29(1.82) | 4(0)/2.29(1.82) | 4(0)/0.83(1.1) | 4(0)/0.83(1.1)
 | I | 2.99(0.9)/2.52(1.51) | 2.99(0.9)/2.52(1.51) | 4(0)/0.64(0.93) | 4(0)/0.64(0.93) | 4(0)/0.08(0.27) | 4(0)/0.08(0.27)
 | Gauss | 2.85(0.77)/2.38(1.49) | 2.85(0.77)/2.36(1.48) | 4(0)/0.39(0.71) | 4(0)/0.39(0.71) | 4(0)/0.01(0.1) | 4(0)/0.01(0.1)
 | InvMQ | 2.55(0.87)/1.91(1.44) | 2.55(0.87)/1.89(1.44) | 4(0)/0.25(0.56) | 4(0)/0.25(0.56) | 4(0)/0(0) | 4(0)/0(0)
 | True | 0.66(0.65)/0.01(0.1) | 0.66(0.65)/0.01(0.1) | 3.15(0.93)/0.01(0.1) | 3.15(0.93)/0.01(0.1) | 4(0)/0(0) | 4(0)/0(0)
Mat3/2(1.5) | None(Indep) | 3.63(0.56)/4.91(2.17) | 3.63(0.56)/4.87(2.15) | 4(0)/2.59(1.5) | 4(0)/2.59(1.5) | 4(0)/0.91(1.06) | 4(0)/0.91(1.06)
 | I | 2.97(0.97)/2.78(1.85) | 2.97(0.97)/2.78(1.85) | 4(0)/0.9(1.01) | 4(0)/0.9(1.01) | 4(0)/0.02(0.14) | 4(0)/0.02(0.14)
 | Gauss | 2.68(1)/2.18(1.64) | 2.68(1)/2.16(1.61) | 4(0)/0.53(0.74) | 4(0)/0.53(0.74) | 4(0)/0(0) | 4(0)/0(0)
 | InvMQ | 2.56(0.89)/1.88(1.4) | 2.56(0.89)/1.87(1.4) | 4(0)/0.18(0.48) | 4(0)/0.18(0.48) | 4(0)/0(0) | 4(0)/0(0)
 | True | 0.8(0.74)/0.06(0.24) | 0.8(0.74)/0.06(0.24) | 3.78(0.48)/0.06(0.24) | 3.78(0.48)/0.06(0.24) | 4(0)/0(0) | 4(0)/0(0)
Mat5/2(2.5) | None(Indep) | 3.67(0.53)/4.99(2.2) | 3.67(0.53)/4.97(2.2) | 4(0)/1.95(1.9) | 4(0)/1.95(1.9) | 4(0)/0.71(0.94) | 4(0)/0.71(0.94)
 | I | 3.07(0.86)/2.59(1.78) | 3.07(0.86)/2.57(1.77) | 4(0)/0.7(1.11) | 4(0)/0.7(1.11) | 4(0)/0.06(0.24) | 4(0)/0.06(0.24)
 | Gauss | 2.87(0.91)/2.28(1.53) | 2.87(0.91)/2.26(1.5) | 4(0)/0.41(0.75) | 4(0)/0.41(0.75) | 4(0)/0.01(0.1) | 4(0)/0.01(0.1)
 | InvMQ | 2.56(0.83)/1.98(1.48) | 2.56(0.83)/1.97(1.47) | 4(0)/0.26(0.63) | 4(0)/0.26(0.63) | 4(0)/0(0) | 4(0)/0(0)
 | True | 0.22(0.44)/0.03(0.17) | 0.22(0.44)/0.03(0.17) | 1.75(0.89)/0.07(0.29) | 1.75(0.89)/0.07(0.29) | 4(0)/0(0) | 4(0)/0(0)
Mat5/2(1.5) | None(Indep) | 3.49(0.61)/5.31(2.15) | 3.49(0.61)/5.17(2.11) | 4(0)/2.45(1.77) | 4(0)/2.45(1.77) | 4(0)/0.76(0.97) | 4(0)/0.76(0.97)
 | I | 2.82(0.9)/2.63(1.53) | 2.81(0.92)/2.62(1.51) | 4(0)/0.84(1) | 4(0)/0.84(1) | 4(0)/0.03(0.17) | 4(0)/0.03(0.17)
 | Gauss | 2.67(0.88)/2.26(1.38) | 2.67(0.88)/2.26(1.38) | 4(0)/0.44(0.76) | 4(0)/0.44(0.76) | 4(0)/0(0) | 4(0)/0(0)
 | InvMQ | 2.46(0.82)/1.77(1.26) | 2.46(0.82)/1.76(1.27) | 4(0)/0.24(0.51) | 4(0)/0.24(0.51) | 4(0)/0(0) | 4(0)/0(0)
 | True | 0.4(0.67)/0.04(0.2) | 0.4(0.67)/0.04(0.2) | 2.46(1)/0.07(0.26) | 2.46(1)/0.07(0.26) | 4(0)/0(0) | 4(0)/0(0)
Gauss(1.5) | None(Indep) | 3.5(0.63)/5.17(1.93) | 3.48(0.66)/5.02(1.79) | 4(0)/2.89(1.53) | 4(0)/2.89(1.53) | 4(0)/0.8(0.88) | 4(0)/0.8(0.88)
 | I | 2.84(0.9)/3.23(1.6) | 2.84(0.9)/3.23(1.6) | 4(0)/0.8(0.84) | 4(0)/0.8(0.84) | 4(0)/0.04(0.2) | 4(0)/0.04(0.2)
 | Gauss | 2.74(0.87)/2.9(1.46) | 2.74(0.87)/2.87(1.43) | 4(0)/0.38(0.69) | 4(0)/0.38(0.69) | 4(0)/0(0) | 4(0)/0(0)
 | InvMQ | 2.47(0.96)/2.11(1.52) | 2.47(0.96)/2.1(1.51) | 4(0)/0.27(0.51) | 4(0)/0.27(0.51) | 4(0)/0(0) | 4(0)/0(0)
 | True | 2.77(0.99)/3.32(1.73) | 2.76(1)/3.32(1.73) | 4(0)/1.4(1.11) | 4(0)/1.4(1.11) | 4(0)/0.21(0.46) | 4(0)/0.21(0.46)
Gauss(2.5) | None(Indep) | 3.42(0.74)/5.47(2.27) | 3.4(0.75)/5.32(2.15) | 4(0)/2.89(1.85) | 4(0)/2.89(1.85) | 4(0)/0.95(1.06) | 4(0)/0.95(1.06)
 | I | 2.77(0.89)/2.82(1.56) | 2.77(0.89)/2.82(1.56) | 4(0)/0.96(1.08) | 4(0)/0.96(1.08) | 4(0)/0(0) | 4(0)/0(0)
 | Gauss | 2.7(0.88)/2.5(1.33) | 2.7(0.88)/2.49(1.33) | 4(0)/0.4(0.68) | 4(0)/0.4(0.68) | 4(0)/0(0) | 4(0)/0(0)
 | InvMQ | 2.65(0.78)/1.99(1.31) | 2.65(0.78)/1.99(1.31) | 4(0)/0.37(0.65) | 4(0)/0.37(0.65) | 4(0)/0(0) | 4(0)/0(0)
 | True | 3.36(0.75)/3.99(1.8) | 3.36(0.75)/3.97(1.79) | 4(0)/2.52(1.46) | 4(0)/2.52(1.46) | 4(0)/1.21(1.15) | 4(0)/1.21(1.15)

Table 10: Monte Carlo Mean (Standard dev.) for the selected number of nonzero covariates using 100 data sets, under both the Independent and Dependent setups, using the spatially weighted group LASSO and adaptive group LASSO algorithms, when J = 25 (t = 3).
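The Spatial Weight labels in these tables refer to the matrix plugged into the weighted error norm: None(Indep) appears to be the independent-data benchmark with no spatial weighting, I the identity (unweighted) choice, Gauss and InvMQ kernel-based weights, and True presumably a weight derived from the true covariance model that generated the data. A minimal sketch of the first three choices is given below, assuming weights built from pairwise site distances with a hypothetical range parameter `theta`; the exact parameterizations used in the paper are not shown in this excerpt. The Gaussian and inverse multiquadric forms are the standard radial basis functions (see Wendland, 2005).

    import numpy as np

    def spatial_weight(coords, kind="I", theta=1.0):
        """Candidate spatial weight matrices built from site coordinates."""
        coords = np.asarray(coords, dtype=float)
        # pairwise Euclidean distances between the n observation sites
        d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
        if kind == "I":       # identity: reduces to the unweighted l2 error norm
            return np.eye(len(coords))
        if kind == "Gauss":   # Gaussian radial basis function
            return np.exp(-(d / theta) ** 2)
        if kind == "InvMQ":   # inverse multiquadric radial basis function
            return 1.0 / np.sqrt(1.0 + (d / theta) ** 2)
        raise ValueError(f"unknown weight kind: {kind}")

    # Example: a 5 x 5 grid of sites and a Gaussian weight with range theta = 2
    xx, yy = np.meshgrid(np.arange(5.0), np.arange(5.0))
    W = spatial_weight(np.column_stack([xx.ravel(), yy.ravel()]), "Gauss", theta=2.0)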


J = 35 (t = 3). Each cell reports True Positive / False Positive as Monte Carlo mean (standard dev.).

Cov Model | Spatial Weight | m=6 GLASSO | m=6 Adapt. GLASSO | m=12 GLASSO | m=12 Adapt. GLASSO | m=24 GLASSO | m=24 Adapt. GLASSO

Exp(0.5) | None(Indep) | 3.38(0.74)/6.22(2.39) | 3.36(0.76)/6.1(2.33) | 4(0)/3.74(2.68) | 4(0)/3.74(2.68) | 4(0)/1.49(1.55) | 4(0)/1.49(1.55)
 | I | 2.62(0.94)/3.15(2.14) | 2.6(0.95)/3.11(2.13) | 4(0)/1.5(1.45) | 4(0)/1.5(1.45) | 4(0)/0.04(0.2) | 4(0)/0.04(0.2)
 | Gauss | 2.39(0.98)/2.59(1.84) | 2.39(0.98)/2.56(1.81) | 4(0)/0.89(1.08) | 4(0)/0.89(1.08) | 4(0)/0.02(0.14) | 4(0)/0.02(0.14)
 | InvMQ | 2.22(1.04)/2.38(1.79) | 2.22(1.04)/2.37(1.78) | 4(0)/0.41(0.75) | 4(0)/0.41(0.75) | 4(0)/0(0) | 4(0)/0(0)
 | True | 1.21(0.95)/0.48(0.78) | 1.21(0.95)/0.48(0.78) | 3.99(0.1)/0.1(0.3) | 3.99(0.1)/0.1(0.3) | 4(0)/0(0) | 4(0)/0(0)
Exp(1) | None(Indep) | 3.36(0.7)/5.77(2.3) | 3.33(0.73)/5.54(1.98) | 4(0)/3.68(1.93) | 4(0)/3.68(1.93) | 4(0)/1.31(1.18) | 4(0)/1.31(1.18)
 | I | 2.59(0.98)/3.3(1.8) | 2.59(0.98)/3.3(1.8) | 4(0)/1.39(1.36) | 4(0)/1.39(1.36) | 4(0)/0.09(0.29) | 4(0)/0.09(0.29)
 | Gauss | 2.36(0.97)/2.98(1.69) | 2.36(0.97)/2.95(1.67) | 4(0)/0.67(0.95) | 4(0)/0.67(0.95) | 4(0)/0(0) | 4(0)/0(0)
 | InvMQ | 2.07(0.99)/2.25(1.61) | 2.07(0.99)/2.24(1.59) | 4(0)/0.71(1.04) | 4(0)/0.71(1.04) | 4(0)/0(0) | 4(0)/0(0)
 | True | 1.56(1.05)/1.32(1.38) | 1.56(1.05)/1.32(1.38) | 4(0)/0.51(0.7) | 4(0)/0.51(0.7) | 4(0)/0(0) | 4(0)/0(0)
Mat3/2(2.5) | None(Indep) | 3.48(0.67)/5.79(2.32) | 3.46(0.67)/5.59(2.25) | 4(0)/2.71(2.13) | 4(0)/2.71(2.13) | 4(0)/1.25(1.37) | 4(0)/1.25(1.37)
 | I | 2.71(0.9)/2.94(1.75) | 2.71(0.9)/2.94(1.75) | 4(0)/1.22(1.28) | 4(0)/1.22(1.28) | 4(0)/0.1(0.39) | 4(0)/0.1(0.39)
 | Gauss | 2.57(0.92)/2.59(1.6) | 2.57(0.92)/2.57(1.58) | 4(0)/0.69(1) | 4(0)/0.69(1) | 4(0)/0.01(0.1) | 4(0)/0.01(0.1)
 | InvMQ | 2.25(0.9)/2.24(1.68) | 2.25(0.9)/2.23(1.68) | 4(0)/0.62(0.97) | 4(0)/0.62(0.97) | 4(0)/0(0) | 4(0)/0(0)
 | True | 0.52(0.58)/0.12(0.41) | 0.52(0.58)/0.12(0.41) | 2.81(1.01)/0.04(0.2) | 2.81(1.01)/0.04(0.2) | 4(0)/0(0) | 4(0)/0(0)
Mat3/2(1.5) | None(Indep) | 3.24(0.74)/6.39(2.65) | 3.23(0.74)/6.21(2.5) | 4(0)/3.4(2.23) | 4(0)/3.4(2.23) | 4(0)/1.47(1.42) | 4(0)/1.47(1.42)
 | I | 2.63(0.91)/3.15(1.93) | 2.63(0.91)/3.15(1.93) | 4(0)/1.47(1.51) | 4(0)/1.47(1.51) | 4(0)/0.05(0.22) | 4(0)/0.05(0.22)
 | Gauss | 2.39(0.84)/2.93(1.93) | 2.39(0.84)/2.9(1.88) | 4(0)/0.68(0.99) | 4(0)/0.68(0.99) | 4(0)/0(0) | 4(0)/0(0)
 | InvMQ | 2.03(1)/2.26(1.57) | 2.03(1)/2.25(1.55) | 4(0)/0.43(0.82) | 4(0)/0.43(0.82) | 4(0)/0(0) | 4(0)/0(0)
 | True | 0.69(0.73)/0.2(0.53) | 0.69(0.73)/0.2(0.53) | 3.56(0.61)/0.17(0.4) | 3.56(0.61)/0.17(0.4) | 4(0)/0(0) | 4(0)/0(0)
Mat5/2(2.5) | None(Indep) | 3.33(0.74)/5.59(2.19) | 3.33(0.74)/5.41(2.08) | 4(0)/3.2(2.56) | 4(0)/3.2(2.56) | 4(0)/1.34(1.51) | 4(0)/1.34(1.51)
 | I | 2.72(1)/3.06(1.97) | 2.72(1)/3.06(1.97) | 4(0)/0.75(1.19) | 4(0)/0.75(1.19) | 4(0)/0.1(0.33) | 4(0)/0.1(0.33)
 | Gauss | 2.53(1.04)/2.57(1.81) | 2.53(1.04)/2.57(1.81) | 4(0)/0.41(0.81) | 4(0)/0.41(0.81) | 4(0)/0.01(0.1) | 4(0)/0.01(0.1)
 | InvMQ | 2.25(0.97)/2.17(1.68) | 2.25(0.97)/2.16(1.67) | 4(0)/0.38(0.91) | 4(0)/0.38(0.91) | 4(0)/0(0) | 4(0)/0(0)
 | True | 0.34(0.57)/0.05(0.26) | 0.34(0.57)/0.05(0.26) | 1.82(0.83)/0.05(0.22) | 1.82(0.83)/0.05(0.22) | 4(0)/0(0) | 4(0)/0(0)
Mat5/2(1.5) | None(Indep) | 3.2(0.71)/5.66(2.16) | 3.19(0.71)/5.56(2.08) | 4(0)/3.57(2.38) | 4(0)/3.56(2.37) | 4(0)/1.3(1.32) | 4(0)/1.3(1.32)
 | I | 2.39(1.03)/2.86(1.99) | 2.39(1.03)/2.86(1.99) | 4(0)/1.39(1.41) | 4(0)/1.39(1.41) | 4(0)/0.07(0.26) | 4(0)/0.07(0.26)
 | Gauss | 2.35(0.91)/2.84(1.76) | 2.35(0.91)/2.82(1.75) | 4(0)/0.88(1.11) | 4(0)/0.88(1.11) | 4(0)/0(0) | 4(0)/0(0)
 | InvMQ | 2.2(0.89)/2.09(1.52) | 2.2(0.89)/2.09(1.52) | 4(0)/0.36(0.72) | 4(0)/0.36(0.72) | 4(0)/0(0) | 4(0)/0(0)
 | True | 0.28(0.51)/0(0) | 0.28(0.51)/0(0) | 2.38(0.91)/0.04(0.2) | 2.38(0.91)/0.04(0.2) | 4(0)/0(0) | 4(0)/0(0)
Gauss(1.5) | None(Indep) | 3.15(0.88)/6.37(1.95) | 3.14(0.88)/6.24(1.83) | 4(0)/3.94(1.92) | 4(0)/3.94(1.92) | 4(0)/1.78(1.54) | 4(0)/1.78(1.54)
 | I | 2.44(1.02)/3.35(1.98) | 2.44(1.02)/3.34(1.96) | 4(0)/1.25(1.14) | 4(0)/1.25(1.14) | 4(0)/0.08(0.27) | 4(0)/0.08(0.27)
 | Gauss | 2.24(0.95)/2.83(1.61) | 2.24(0.95)/2.82(1.57) | 4(0)/0.63(0.73) | 4(0)/0.63(0.73) | 4(0)/0.02(0.14) | 4(0)/0.02(0.14)
 | InvMQ | 2.18(0.9)/2.2(1.41) | 2.18(0.9)/2.19(1.39) | 4(0)/0.59(0.83) | 4(0)/0.59(0.83) | 4(0)/0(0) | 4(0)/0(0)
 | True | 2.78(0.87)/3.66(1.74) | 2.75(0.87)/3.61(1.69) | 4(0)/1.84(1.54) | 4(0)/1.84(1.54) | 4(0)/0.17(0.45) | 4(0)/0.17(0.45)
Gauss(2.5) | None(Indep) | 3.13(0.75)/6.64(2.37) | 3.12(0.76)/6.45(2.24) | 4(0)/4.38(2.06) | 4(0)/4.38(2.06) | 4(0)/1.14(1.07) | 4(0)/1.14(1.07)
 | I | 2.6(0.96)/3.62(1.66) | 2.6(0.96)/3.62(1.66) | 4(0)/1.42(1.36) | 4(0)/1.42(1.36) | 4(0)/0.02(0.14) | 4(0)/0.02(0.14)
 | Gauss | 2.44(0.86)/2.9(1.47) | 2.44(0.86)/2.86(1.44) | 4(0)/0.73(0.92) | 4(0)/0.73(0.92) | 4(0)/0.01(0.1) | 4(0)/0.01(0.1)
 | InvMQ | 2.17(0.94)/2.36(1.52) | 2.17(0.94)/2.36(1.52) | 4(0)/0.63(0.85) | 4(0)/0.63(0.85) | 4(0)/0(0) | 4(0)/0(0)
 | True | 3(0.83)/4.62(1.85) | 3(0.83)/4.62(1.85) | 4(0)/3.53(2.23) | 4(0)/3.53(2.23) | 4(0)/1.28(1.05) | 4(0)/1.28(1.05)

Table 11: Monte Carlo Mean (Standard dev.) for the selected number of nonzero covariates using 100 data sets, under both the Independent and Dependent setups, using the spatially weighted group LASSO and adaptive group LASSO algorithms, when J = 35 (t = 3).


References

[1] Anh, V. V. and Lunney, K. E. (1995), “Parameter estimation of random fields with long-range dependence,” Mathematical and Computer Modelling, 21(9), 67-77.

[2] Antoniadis, A. and Fan, J. (2001), “Regularization of wavelet approximations (with discussion),” Journal of the American Statistical Association, 96, 939-967.

[3] Chu, T., Zhu, J. and Wang, H. (2011), “Penalized maximum likelihood estimation and variable selection in geostatistics,” The Annals of Statistics, 39(5), 2607-2625.

[4] Cressie, N. and Chan, N. H. (1989), “Spatial modeling of regional variables,” Journal of the American Statistical Association, 84(406), 393-401.

[5] Fan, Y. and Tang, C. Y. (2013), “Tuning parameter selection in high dimensional penalized likelihood,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(3), 531-552.

[6] Fu, R., Thurman, A. L., Chu, T., Steen-Adams, M. M. and Zhu, J. (2013), “On Estimation and Selection of Autologistic Regression Models via Penalized Pseudolikelihood,” Journal of Agricultural, Biological, and Environmental Statistics, 18, 429-449.

[7] Gupta, S. (2012), “A note on the asymptotic distribution of LASSO estimator for correlated data,” Sankhya A, 74, 10-28.

[8] Hastie, T. J. and Tibshirani, R. J. (1990), “Generalized Additive Models,” CRC Press.

[9] Haustein, K. O. (2006), “Smoking and poverty,” European Journal of Preventive Cardiology, 13(3), 312-318.

[10] Hoeting, J. A., Davis, R. A., Merton, A. A. and Thompson, S. E. (2006), “Model selection for geostatistical models,” Ecological Applications, 16(1), 87-98.

[11] Horn, R. A. and Johnson, C. R. (1985), “Matrix Analysis,” Cambridge University Press.

[12] Hsu, N.-J., Hung, H.-L. and Chang, Y.-M. (2008), “Subset selection for vector autoregressive processes using Lasso,” Computational Statistics and Data Analysis, 52, 3645-3657.

[13] Huang, H. C. and Chen, C. S. (2007), “Optimal geostatistical model selection,” Journal of the American Statistical Association, 102(479), 1009-1024.

[14] Huang, H. C., Hsu, N. J., Theobald, D. M. and Breidt, F. J. (2010a), “Spatial Lasso with applications to GIS model selection,” Journal of Computational and Graphical Statistics, 19(4), 963-983.

[15] Huang, J., Horowitz, J. L. and Wei, F. (2010b), “Variable selection in nonparametric additive models,” The Annals of Statistics, 38, 2282-2313.

[16] Kneib, T., Hothorn, T. and Tutz, G. (2009), “Variable selection and model choice in geoadditive regression models,” Biometrics, 65(2), 626-634.

[17] Lai, R. C. S., Huang, H. and Lee, T. C. M. (2012), “Fixed and random effects selection in nonparametric additive mixed models,” Electronic Journal of Statistics, 6, 810-842.

[18] Lin, Y. and Zhang, H. (2006), “Component selection and smoothing in multivariate nonparametric regression,” The Annals of Statistics, 34, 2272-2297.

[19] Meier, L., van de Geer, S. and Bühlmann, P. (2009), “High-dimensional additive modeling,” The Annals of Statistics, 37, 3779-3821.

[20] Nardi, Y. and Rinaldo, A. (2011), “Autoregressive process modeling via the Lasso procedure,” Journal of Multivariate Analysis, 102, 528-549.

[21] Ravikumar, P., Lafferty, J., Liu, H. and Wasserman, L. (2009), “Sparse additive models,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(5), 1009-1030.

[22] Reyes, P. E., Zhu, J. and Aukema, B. H. (2012), “Selection of Spatial-Temporal Lattice Models: Assessing the Impact of Climate Conditions on a Mountain Pine Beetle Outbreak,” Journal of Agricultural, Biological, and Environmental Statistics, 17, 508-525.

[23] Schumaker, L. (2007), “Spline Functions: Basic Theory,” Cambridge University Press.

[24] SEER (Census 2000), “Surveillance, Epidemiology, and End Results,” SEER, www.seer.cancer.gov.

[25] Stein, M. L. (1999), “Interpolation of Spatial Data,” Springer.

[26] Tibshirani, R. (1996), “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society: Series B (Methodological), 58, 267-288.

[27] van der Vaart, A. W. and Wellner, J. A. (1996), “Weak Convergence and Empirical Processes: With Applications to Statistics,” Springer, New York. MR1385671.

[28] Wang, H. and Zhu, J. (2009), “Variable selection in spatial regression via penalized least squares,” The Canadian Journal of Statistics, 37, 607-624.

[29] Wang, H., Li, G. and Tsai, C.-L. (2007), “Regression coefficient and autoregressive order shrinkage and selection via the lasso,” Journal of the Royal Statistical Society: Series B (Methodological), 69, 63-78.

[30] Wei, F. and Huang, J. (2010), “Consistent group selection in high-dimensional linear regression,” Bernoulli, 16(4), 1369-1384.

[31] Wendland, H. (2005), “Scattered Data Approximation” (Vol. 17), Cambridge University Press.

[32] Yuan, M. and Lin, Y. (2006), “Model selection and estimation in regression with grouped variables,” Journal of the Royal Statistical Society: Series B (Methodological), 68, 49-67.

[33] Zhang, C. and Huang, J. (2008), “The sparsity and bias of the Lasso selection in high-dimensional linear regression,” The Annals of Statistics, 36, 1567-1594.

[34] Zhu, J., Huang, H.-C. and Reyes, P. E. (2010), “On selection of spatial linear models for lattice data,” Journal of the Royal Statistical Society: Series B, 72, 389-402.