
Envelope Methods

Dennis Cook

School of Statistics, University of Minnesota

September 15, 2019


I. Overarching ideas
II. Progress to date
III. Envelopes and PLS regression

Context: Regression of $Y \in \mathbb{R}^r$ on $X \in \mathbb{R}^p$.



Predictor reduction via sufficient dimension reduction

Central subspace $\mathcal{S}_{Y|X}$: the intersection of all subspaces $\mathcal{S} \subseteq \mathbb{R}^p$ such that

$Y \perp\!\!\!\perp X \mid P_{\mathcal{S}} X$

Strengths: model-free, graphics, many extensions, specializations and adaptations (B. Li, 2019).

Limitations:
1 Predictor collinearity: $P_{\mathcal{S}_{Y|X}} X$ confused with $Q_{\mathcal{S}_{Y|X}} X$
2 No advantage in model-based analyses
3 Post-reduction inference is difficult (for now)



Predictor reduction via envelopes

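Predictor envelope $\mathcal{E}$: the intersection of all subspaces $\mathcal{S} \subseteq \mathbb{R}^p$ such that

(a) $Y \perp\!\!\!\perp X \mid P_{\mathcal{S}} X$ and (b) $P_{\mathcal{S}} X \perp\!\!\!\perp Q_{\mathcal{S}} X$ $\iff$ (c) $(Y, P_{\mathcal{S}} X) \perp\!\!\!\perp Q_{\mathcal{S}} X$.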

Strengths:

1 Serviceable in model-based analyses
2 Collinearity managed
3 Post-reduction inference is doable; can reduce estimative and predictive variation relative to standard methods

Limitations:
1 $\mathcal{S}_{Y|X} \subseteq \mathcal{E}$, so $\mathcal{E}$ 'envelopes' $\mathcal{S}_{Y|X}$
2 Under-developed for model-free analysis (for now)



Response reduction via envelopes

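Response envelope $\mathcal{E}$: the intersection of all subspaces $\mathcal{S} \subseteq \mathbb{R}^r$ such that

$(P_{\mathcal{S}} Y, X) \perp\!\!\!\perp Q_{\mathcal{S}} Y$ $\iff$ $X \perp\!\!\!\perp Q_{\mathcal{S}} Y$ and $P_{\mathcal{S}} Y \perp\!\!\!\perp Q_{\mathcal{S}} Y \mid X$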

Strengths and limitations parallel those for predictor reduction.



Response envelopes – $(P_{\mathcal{E}} Y, X) \perp\!\!\!\perp Q_{\mathcal{E}} Y$ – in multi-response linear regression,

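$Y_i = \alpha + \beta X_i + \varepsilon_i, \quad i = 1, \ldots, n$

$Y \in \mathbb{R}^r$. $X \in \mathbb{R}^p$, non-stochastic. $\varepsilon \sim N(0, \Sigma_{Y|X})$.
Goal: estimate $\beta \in \mathbb{R}^{r \times p}$; prediction.

The MLE $B$ of $\beta$ is obtained by doing $r$ univariate linear regressions, one for each response on $X$.

The response envelope is the smallest reducing subspace of $\Sigma_{Y|X}$ that contains $\mathcal{B} = \mathrm{span}(\beta)$. In expanded notation, $\mathcal{E} = \mathcal{E}_{\Sigma_{Y|X}}(\mathcal{B})$, with $u = \dim(\mathcal{E}_{\Sigma_{Y|X}}(\mathcal{B}))$.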
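A minimal NumPy sketch of the standard estimator on simulated data. The sample size, dimensions, seed and coefficient values are assumptions chosen only for illustration. It fits each response separately and checks that the stacked univariate fits reproduce the joint least-squares estimate $B$; it also forms the residual covariance $S_{Y|X}$ with divisor $n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, p = 200, 4, 3                      # illustrative sizes (assumed)
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=(r, p))      # hypothetical coefficient matrix
Y = 1.0 + X @ beta_true.T + rng.normal(size=(n, r))

# Design matrix with an intercept column.
D = np.column_stack([np.ones(n), X])

# One univariate regression per response; row j holds the slopes for response j.
B_rows = []
for j in range(r):
    coef_j, *_ = np.linalg.lstsq(D, Y[:, j], rcond=None)
    B_rows.append(coef_j[1:])            # drop the intercept
B = np.vstack(B_rows)                    # r x p

# Fitting all responses in one call gives the same matrix.
coef_all, *_ = np.linalg.lstsq(D, Y, rcond=None)
B_joint = coef_all[1:].T                 # r x p

# Residual covariance S_{Y|X} from the standard fit (divisor n).
resid = Y - D @ coef_all
S_res = resid.T @ resid / n
```
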



Parameterizing the model in terms of $\mathcal{E}_{\Sigma_{Y|X}}(\mathcal{B})$

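Semi-orthogonal basis matrices $\Gamma \in \mathbb{R}^{r \times u}$ and $\Gamma_0 \in \mathbb{R}^{r \times (r-u)}$ for $\mathcal{E}_{\Sigma_{Y|X}}(\mathcal{B})$ and $\mathcal{E}^{\perp}_{\Sigma_{Y|X}}(\mathcal{B})$.

Envelope model: $Y = \alpha + \Gamma \eta X + \varepsilon$, $\Sigma_{Y|X} = \Gamma \Omega \Gamma^T + \Gamma_0 \Omega_0 \Gamma_0^T$.

$0 \le u \le r$. If $u = r$, the envelope model reduces to the standard model.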

MLE with $u$ determined by AIC, BIC, likelihood ratio testing, cross validation, or avoided by model averaging.

We are still interested in $\beta = \Gamma\eta$ and $\Sigma_{Y|X}$, which depend on the envelope $\mathcal{E}_{\Sigma_{Y|X}}(\mathcal{B})$, but not on the particular basis $\Gamma$ selected to represent them.
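The structure of the envelope model can be checked numerically. The sketch below builds $\beta$ and $\Sigma_{Y|X}$ from randomly generated $\Gamma$, $\Gamma_0$, $\eta$, $\Omega$, $\Omega_0$ (all values, dimensions and the helper `random_spd` are assumptions for illustration), confirms that $\mathrm{span}(\Gamma)$ reduces $\Sigma_{Y|X}$, and confirms that rotating the basis leaves $\beta$ and $\Sigma_{Y|X}$ unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
r, u, p = 5, 2, 3                         # illustrative dimensions (assumed)

# Random orthogonal matrix split into Gamma (r x u) and Gamma0 (r x (r-u)).
Qfull, _ = np.linalg.qr(rng.normal(size=(r, r)))
Gamma, Gamma0 = Qfull[:, :u], Qfull[:, u:]

def random_spd(k):
    """Hypothetical helper: a random symmetric positive-definite k x k matrix."""
    A = rng.normal(size=(k, k))
    return A @ A.T + k * np.eye(k)

Omega, Omega0 = random_spd(u), random_spd(r - u)
eta = rng.normal(size=(u, p))

beta = Gamma @ eta
Sigma = Gamma @ Omega @ Gamma.T + Gamma0 @ Omega0 @ Gamma0.T

# span(Gamma) reduces Sigma: the off-diagonal block Gamma0^T Sigma Gamma vanishes.
reduces = np.allclose(Gamma0.T @ Sigma @ Gamma, 0.0)

# Change of basis: Gamma -> Gamma O, eta -> O^T eta, Omega -> O^T Omega O.
O, _ = np.linalg.qr(rng.normal(size=(u, u)))
Gamma_rot = Gamma @ O
beta_rot = Gamma_rot @ (O.T @ eta)
Sigma_rot = Gamma_rot @ (O.T @ Omega @ O) @ Gamma_rot.T + Gamma0 @ Omega0 @ Gamma0.T
```

Only the subspace matters here: `beta_rot` and `Sigma_rot` agree with `beta` and `Sigma` for any orthogonal `O`.
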



Maximum likelihood estimation

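Envelope estimator $\hat{\mathcal{E}}_{\Sigma_{Y|X}}(\mathcal{B})$ with given dimension $u$:

$$
\hat{\mathcal{E}}_{\Sigma_{Y|X}}(\mathcal{B}) = \arg\min_{\mathcal{S} \in \mathcal{G}_{u,r}} \left( \log |P_{\mathcal{S}} S_{Y|X} P_{\mathcal{S}}|_0 + \log |Q_{\mathcal{S}} S_Y Q_{\mathcal{S}}|_0 \right),
$$

where
$|\cdot|_0$ means the product of the non-zero eigenvalues
$\mathcal{G}_{u,r}$ = Grassmannian of $u$-dimensional subspaces of $\mathbb{R}^r$
$S_{Y|X}$ = sample residual covariance matrix from the standard fit
$S_Y$ = sample covariance of $Y$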
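As a sketch of the objective only (not of the optimization over the Grassmannian), the function below evaluates $\log|P_{\mathcal{S}} S_{Y|X} P_{\mathcal{S}}|_0 + \log|Q_{\mathcal{S}} S_Y Q_{\mathcal{S}}|_0$ for a subspace given by a semi-orthogonal basis. The function names, the eigenvalue tolerance and the simulated data are assumptions for illustration.

```python
import numpy as np

def log_pdet(M, tol=1e-10):
    """Log of the product of the non-zero eigenvalues of a symmetric PSD matrix."""
    vals = np.linalg.eigvalsh(M)
    vals = vals[vals > tol * max(vals.max(), 1.0)]
    return float(np.sum(np.log(vals)))

def envelope_objective(Gamma, S_res, S_Y):
    """log|P S_res P|_0 + log|Q S_Y Q|_0 for the subspace spanned by Gamma."""
    r = Gamma.shape[0]
    P = Gamma @ Gamma.T                # projection onto span(Gamma)
    Q = np.eye(r) - P                  # projection onto the orthogonal complement
    return log_pdet(P @ S_res @ P) + log_pdet(Q @ S_Y @ Q)

# Simulated two-group data, used only to obtain S_Y and S_{Y|X}.
rng = np.random.default_rng(2)
n, r, u = 100, 4, 2
Y = rng.normal(size=(n, r))
x = rng.integers(0, 2, size=n).astype(float)
Yc, xc = Y - Y.mean(axis=0), x - x.mean()
S_Y = Yc.T @ Yc / n
b = Yc.T @ xc / (xc @ xc)              # slope of each response on x
resid = Yc - np.outer(xc, b)
S_res = resid.T @ resid / n

# Evaluate the objective at one candidate subspace.
Qfull, _ = np.linalg.qr(rng.normal(size=(r, r)))
Gamma, Gamma0 = Qfull[:, :u], Qfull[:, u:]
value = envelope_objective(Gamma, S_res, S_Y)

# The same value written with ordinary determinants of the reduced matrices.
same = (np.linalg.slogdet(Gamma.T @ S_res @ Gamma)[1]
        + np.linalg.slogdet(Gamma0.T @ S_Y @ Gamma0)[1])
```

In practice the minimization over subspaces needs a dedicated manifold optimizer, which this sketch does not attempt.
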



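Let $\hat{\mathcal{E}}$ be short-hand for $\hat{\mathcal{E}}_{\Sigma_{Y|X}}(\mathcal{B})$. Then

$$
\hat{\beta} = P_{\hat{\mathcal{E}}} B, \qquad
\hat{\Sigma}_{Y|X} = P_{\hat{\mathcal{E}}} S_{Y|X} P_{\hat{\mathcal{E}}} + Q_{\hat{\mathcal{E}}} S_{Y|X} Q_{\hat{\mathcal{E}}}
$$

Also,

$$
\mathrm{avar}(\sqrt{n}\,\mathrm{vec}[\hat{\beta}]) \le \mathrm{avar}(\sqrt{n}\,\mathrm{vec}[B])
$$

with massive gains when

$$
\|\mathrm{var}(P_{\mathcal{E}} Y \mid X)\| \ll \|\mathrm{var}(Q_{\mathcal{E}} Y \mid X)\| = \|\mathrm{var}(Q_{\mathcal{E}} Y)\|
$$

Estimators are $\sqrt{n}$-consistent without normality but with finite fourth moments.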
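Given an estimated basis, the two estimators above are plain projections. The sketch below applies the formulas as written on this slide; the function name `envelope_estimates` and the random inputs are assumptions for illustration, and the basis here is random rather than the output of the likelihood optimization.

```python
import numpy as np

def envelope_estimates(Gamma_hat, B, S_res):
    """Return (beta_hat, Sigma_hat) for a given semi-orthogonal basis estimate."""
    r = Gamma_hat.shape[0]
    P = Gamma_hat @ Gamma_hat.T        # projection onto the estimated envelope
    Q = np.eye(r) - P                  # projection onto its orthogonal complement
    beta_hat = P @ B
    Sigma_hat = P @ S_res @ P + Q @ S_res @ Q
    return beta_hat, Sigma_hat

rng = np.random.default_rng(3)
r, p, u = 4, 2, 1                      # illustrative dimensions (assumed)
B = rng.normal(size=(r, p))            # stand-in for the standard estimator
A = rng.normal(size=(r, r))
S_res = A @ A.T + np.eye(r)            # stand-in for S_{Y|X}
G, _ = np.linalg.qr(rng.normal(size=(r, u)))   # stand-in basis estimate

beta_hat, Sigma_hat = envelope_estimates(G, B, S_res)
```

By construction the columns of `beta_hat` lie in the estimated envelope, and `Sigma_hat` has no cross term between the envelope and its complement.
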



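How envelopes work

Two responses, $Y_1$ and $Y_2$, and a single predictor, $X = 0$ or $1$, to indicate two populations.

$$
Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}
  = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}
  + \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} X
  + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \end{pmatrix}
$$

$\alpha_1 = E(Y_1 \mid X = 0)$, $\beta_1 = E(Y_1 \mid X = 1) - E(Y_1 \mid X = 0)$,
$\alpha_2 = E(Y_2 \mid X = 0)$, $\beta_2 = E(Y_2 \mid X = 1) - E(Y_2 \mid X = 0)$.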

Standard estimators are obtained by substituting sample moments.
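A short sketch of the standard two-population estimators on simulated data. The group sizes, means and seed are assumptions for illustration. It substitutes sample means for the expectations, attaches pooled-variance standard errors, and checks the result against least squares on the indicator $X$.

```python
import numpy as np

rng = np.random.default_rng(4)
n0, n1 = 30, 30                                   # group sizes (assumed)
Y0 = rng.normal(loc=[0.0, 0.0], size=(n0, 2))     # population X = 0
Y1 = rng.normal(loc=[0.5, 1.0], size=(n1, 2))     # population X = 1

# Sample-moment estimators of alpha and beta.
alpha_hat = Y0.mean(axis=0)
beta_hat = Y1.mean(axis=0) - Y0.mean(axis=0)

# Pooled residual covariance and standard errors of the two differences.
R0, R1 = Y0 - Y0.mean(axis=0), Y1 - Y1.mean(axis=0)
S_pooled = (R0.T @ R0 + R1.T @ R1) / (n0 + n1 - 2)
se = np.sqrt(np.diag(S_pooled) * (1.0 / n0 + 1.0 / n1))
ratio = np.abs(beta_hat) / se                     # |estimate| / SE per response

# Least squares on the indicator X gives the same alpha and beta.
Y = np.vstack([Y0, Y1])
X = np.concatenate([np.zeros(n0), np.ones(n1)])
D = np.column_stack([np.ones(n0 + n1), X])
coef, *_ = np.linalg.lstsq(D, Y, rcond=None)
```

Each response is analyzed on its own here; the standard estimator of $\beta_2$ makes no use of $Y_1$.
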



Possible configurations for uncomplicated study

[Figure: two panels with axes $Y_1$ and $Y_2$.]



Inference for $\beta_2 = E(Y_2 \mid X = 1) - E(Y_2 \mid X = 0)$









Schematic representation of an envelope analysis

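[Figure: three-panel schematic with directions labeled $\mathcal{E}_{\Sigma}(\mathcal{B})$ and $\mathcal{E}^{\perp}_{\Sigma}(\mathcal{B})$.]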



Response envelopes & cattle data

Experiment: Two treatments, each assigned randomly to 30 cows. Weight (kg) measured at weeks 2, 4, 6, . . . , 16, 18, 19.
Do the treatments have a differential effect and, if so, about when is it first apparent?



Profile plot of cattle data

[Figure: profile plot of weight against week; horizontal axis 0 to 20, vertical axis 190 to 370.]

$Y_i = \alpha + \beta X_i + \varepsilon_i$, $X = 0, 1$, $\varepsilon \sim N_{10}(0, \Sigma)$
$B = \bar{Y}_{\mathrm{trt1}} - \bar{Y}_{\mathrm{trt2}}$, the MLE of $\beta$



Mean profile plot of cattle data

$\max_{i=1,\ldots,10} |(B)_i| / \mathrm{SE}(B)_i \approx 1.3$.



Fitted profiles from envelope analysis

Fitted profiles with $u = 5$. $|\hat{\beta}_i| / \mathrm{SE}(\hat{\beta}_i) > 4.1$, $i > 10$



Cattle weight, week 12 vs week 14

[Figure: scatter plot of weight on week 14 against weight on week 12 (both axes roughly 240 to 360), annotated with the envelope direction $\Gamma^T Y$ and the direction $\Gamma_0^T Y$.]



Cattle weight, mean of $\Gamma_0^T Y$ by treatment and time, where $\Gamma_0$ is a basis for $\mathcal{E}^{\perp}_{\Sigma}(\mathcal{B})$



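II. Response envelopes in constrained multi-response linear regression, $\mathrm{span}(\beta) \subseteq \mathrm{span}(U)$

$Y_i = \alpha_0 + U \alpha X_i + \varepsilon_i, \quad i = 1, \ldots, n$

$U \in \mathbb{R}^{r \times k}$, known.
$\alpha \in \mathbb{R}^{k \times p}$, unknown.
$\beta = U\alpha$.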

In longitudinal data, $\Sigma_{Y|X}$ is often modeled as well, perhaps using compound symmetry or an AR structure.



II. Other Advances

1 Predictor envelopes & PLS (Cook, Helland & Su, 2013)
2 Response-predictor envelopes (Cook, et al. 2015)
3 Envelope foundations (Cook & Zhang 2015)
4 Bayesian response envelopes (Khare, et al. 2016)
5 Sparse response & predictor envelopes (Su, et al. 2017)
6 Tensor envelopes & neuroimaging (L. Li & Zhang, 2015)
7 Supervised SVD (G. Li, et al. 2015)
8 Imaging genetic analysis (Park, et al. 2017)
9 Spatial analyses (Rekabdarkolaee & Wang, 2017)
10 Estimating genetic fitness via Aster models (Eck, et al. 2017)
11 Envelope Quantile Regression (Ding, et al. 2019)



III. Predictor envelopes – $(Y, P_{\mathcal{E}} X) \perp\!\!\!\perp Q_{\mathcal{E}} X$ – and PLS regression

Developed around 1980 by the Scandinavian chemometrics community (S. Wold, H. Martens & H. Wold), PLS regression consists of algorithms for fitting high-dimensional ($n < p$) linear regressions

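$Y_i = \alpha + \beta^T X_i + \varepsilon_i, \quad i = 1, \ldots, n,$

$Y$ univariate, $X \in \mathbb{R}^p \sim N(0, \Sigma_X)$, normal errors, $\varepsilon \perp\!\!\!\perp X$.

Envelope: $\mathcal{E} = \mathcal{E}_{\Sigma_X}(\mathcal{B})$, the smallest reducing subspace of $\Sigma_X$ that contains $\mathcal{B} = \mathrm{span}(\beta)$.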



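Let $\Phi \in \mathbb{R}^{p \times q}$ be a semi-orthogonal basis matrix for $\mathcal{E} = \mathcal{E}_{\Sigma_X}(\mathcal{B})$.

$Y_i = \alpha + \eta^T \Phi^T X_i + \varepsilon_i, \quad i = 1, \ldots, n,$
$\Sigma_X = \Phi \Delta \Phi^T + \Phi_0 \Delta_0 \Phi_0^T$

Two choices for analysis:
Likelihood-based estimation.
PLS regression algorithms are envelope methods. SIMPLS and NIPALS yield moment-based estimators of a basis $\Phi$ for $\mathcal{E}_{\Sigma_X}(\mathcal{B})$.

With $p$ fixed and without requiring normality, the resulting estimator of $\beta = \Phi\eta$ is $\sqrt{n}$-consistent.

In either case, $\hat{\beta} = P_{\hat{\mathcal{E}}(S_X)} B$, where $\hat{\mathcal{E}}$ = likelihood or PLS estimator of $\mathcal{E}_{\Sigma_X}(\mathcal{B})$ and $B$ = usual estimator of $\beta$.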
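A hedged sketch of the moment-based route for a univariate response. It relies on the standard Krylov-subspace characterization of PLS with $q$ components, $\mathrm{span}\{s, S_X s, \ldots, S_X^{q-1} s\}$ with $s$ the sample covariance between $X$ and $Y$, rather than on a step-by-step NIPALS or SIMPLS implementation. It also assumes that $P_{W(S_X)}$ denotes the projection onto $\mathrm{span}(W)$ in the $S_X$ inner product, $W (W^T S_X W)^{-1} W^T S_X$. The function name `pls_krylov` and the simulated data are assumptions for illustration.

```python
import numpy as np

def pls_krylov(X, y, q):
    """PLS-type estimator for univariate y built from a q-dimensional Krylov basis."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    S = Xc.T @ Xc / n                  # S_X
    s = Xc.T @ yc / n                  # sample covariance between X and y
    K = np.empty((X.shape[1], q))
    v = s / np.linalg.norm(s)
    for j in range(q):                 # columns span {s, S s, ..., S^(q-1) s}
        K[:, j] = v
        v = S @ v
        v = v / np.linalg.norm(v)      # rescaling keeps the span, avoids overflow
    W, _ = np.linalg.qr(K)             # orthonormal basis of the Krylov subspace
    beta_pls = W @ np.linalg.solve(W.T @ S @ W, W.T @ s)
    return beta_pls, W, S, s

rng = np.random.default_rng(5)
n, p, q = 300, 5, 2                    # illustrative sizes, n > p (assumed)
X = rng.normal(size=(n, p)) * np.arange(1, p + 1)
y = X @ rng.normal(size=p) + rng.normal(size=n)

beta_pls, W, S, s = pls_krylov(X, y, q)

# Same estimator written as a projection of the usual estimator B = S_X^{-1} s.
B = np.linalg.solve(S, s)
P_WS = W @ np.linalg.solve(W.T @ S @ W, W.T @ S)
beta_proj = P_WS @ B
```

With $q = p$ the basis spans all of $\mathbb{R}^p$ and the estimator reduces to $B$. The projection form requires $S_X$ to be invertible, whereas the direct form `beta_pls` does not.
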



Why might PLS regression be of interest?

Core prediction method in chemometrics.
Used across the applied sciences: micro-array data, fMRI data, biomedical analyses, tumor classification, bioprocesses, forecasting, characteristics of craft beer, . . . .
Serviceable in suitable big, high-dimensional problems: scalable PLS algorithms have been proposed by Schwartz, et al. (2010), Zeng & Li (2014) and Tabei, et al. (2016).

And yet . . .
Relatively little interest from the Statistics community, perhaps because PLS regression is formulated as an algorithm.
Chun and Keles (2010): PLS regression can be consistent for $\beta$ only if $p/n \to 0$, using the previous regression model and standard regularity conditions, such as the eigenvalues of $\Sigma_X$ being bounded as $p \to \infty$.



PLS $n, p$ Asymptotics (Cook & Forzani, 2018, 2019)

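Same context with
$q = \dim(\mathcal{E}_{\Sigma_X}(\mathcal{B}))$ fixed.
$\mathrm{var}(Y \mid X)$ bounded away from 0 as $p \to \infty$. Unnecessary if a sparse $\beta$ is assumed, which PLS does not.
$\beta \ne 0$, so $q \ge 1$.

Gauges:
Fitted value $\hat{Y}_N$ at a new independent $X_N$: $\hat{Y}_N - E(Y \mid X_N)$
Estimation: $\|\hat{\beta}_{\mathrm{pls}} - \beta\|_{\Sigma_X}$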



Governing Quantities

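Let $\mathcal{E} = \mathcal{E}_{\Sigma_X}(\mathcal{B})$ for subscripts.
Collinearity: $\rho(p)$ = sum of population VIFs over a reduced regression.
Signal: $\eta(p) \asymp \mathrm{trace}(P_{\mathcal{E}} \Sigma_X P_{\mathcal{E}}) = \mathrm{trace}\{\mathrm{var}(P_{\mathcal{E}} X)\}$
Noise: $\kappa(p) \asymp \mathrm{trace}(Q_{\mathcal{E}} \Sigma_X Q_{\mathcal{E}}) = \mathrm{trace}\{\mathrm{var}(Q_{\mathcal{E}} X)\}$

Special cases for reference:
Abundance: $\eta(p) \to \infty$. If $\|\Sigma_{XY}\|^2 \to \infty$ as $p \to \infty$, then $\eta \to \infty$.
Sparsity: $\eta(p)$ bounded. If the regression is sparse (e.g. only $q$ predictors are active) then $\eta$ is bounded.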
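The two traces can be computed directly for a given $\Sigma_X$ and envelope basis. The sketch below does this for two hypothetical covariance structures with $q = 1$, chosen only to illustrate the abundant and sparse cases; the function name `signal_noise` and both structures are assumptions, and in each case $\beta$ is taken to lie along the stated basis vector.

```python
import numpy as np

def signal_noise(Sigma_X, Phi):
    """Return (trace(P Sigma_X P), trace(Q Sigma_X Q)) for the subspace span(Phi)."""
    p = Sigma_X.shape[0]
    P = Phi @ Phi.T                    # Phi is semi-orthogonal
    Q = np.eye(p) - P
    return np.trace(P @ Sigma_X @ P), np.trace(Q @ Sigma_X @ Q)

results = {}
for p in (10, 100, 400):
    # Abundant case (assumed): every predictor loads on one common factor.
    Sigma_abundant = np.ones((p, p)) + np.eye(p)
    Phi_abundant = np.ones((p, 1)) / np.sqrt(p)
    # Sparse case (assumed): uncorrelated predictors, only the first is active.
    Sigma_sparse = np.eye(p)
    Phi_sparse = np.zeros((p, 1))
    Phi_sparse[0, 0] = 1.0
    results[p] = {
        "abundant": signal_noise(Sigma_abundant, Phi_abundant),
        "sparse": signal_noise(Sigma_sparse, Phi_sparse),
    }
```

In the first structure the signal trace equals $p + 1$ and grows with $p$; in the second it stays at 1. The noise trace is $p - 1$ in both.
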



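Assume $\rho(p)$ bounded.
$\eta(p) \asymp \mathrm{trace}\{\mathrm{var}(P_{\mathcal{E}} X)\}$
$\kappa(p) \asymp \mathrm{trace}\{\mathrm{var}(Q_{\mathcal{E}} X)\}$

Table: Orders of $\hat{Y}_N - E(Y \mid X_N)$ and $\|\hat{\beta}_{\mathrm{pls}} - \beta\|_{\Sigma_X}$ as $n, p \to \infty$.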

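Conditions                                  $O_p(\cdot)$
$\kappa \asymp p$                           $p/(n\eta)^{1/2}$
$\kappa \asymp \eta$                        $1/\sqrt{n}$
Chun-Keles –
$\Sigma_X$ bdd                              $p/n^{1/2}$
$\kappa \asymp p$ & $\eta \asymp 1$         $p/n^{1/2}$

$\kappa \asymp p$ when finitely many eigenvalues $\lambda\{\mathrm{var}(Q_{\mathcal{E}} X)\} \asymp p$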


Informal conclusions

Sparsity: If the signal $\eta$ is bounded then PLS may not work well in high-dimensional regressions. (But see Zhu & Su (2019), Envelope-based sparse PLS, Annals, to appear.)

Abundance: If the signal $\eta$ is unbounded then PLS may work well in high-dimensional regressions, as argued by H. Wold, S. Wold et al. in the mid 90's.



Finis: Comparison of PLS and Envelopes

Rimal, Almøy and Sæbø (2019). Comparison of Multi-response Prediction Methods. Chemometrics and Intelligent Laboratory Systems, 190, 10–21.

Analysis using both simulated data and real data has shown that the envelope methods are more stable, less influenced by . . . [predictor collinearity] and in general, performed better than PCR and PLS methods. These methods are also found to be less dependent on the number of components.

When $n < p$ they used PCA to reduce the predictors prior to applying likelihood-based envelope methodology.

Computing: z.umn.edu/envelopes

Papers: www.stat.umn.edu/~dennis

Thank you

