
Another Look at the Lasso

Ronald Christensen

Department of Math and Statistics, University of New Mexico

October 15, 2015

Seymour

Figure: Seymour with Anne

Seymour

Figure: Seymour: In disguise as Fisher??

Seymour

Figure: Seymour at ease.

Seymour

Figure: Seymour with pipes.

Seymour

Figure: Seymour with hair.

Seymour

Figure: Seymour as I knew him.

Minnesota

First half of my life in Minnesota.

Second quarter at U of M.

Carefree and callow?

Seymour.

Grad school

Figure: Our home.

Grad school

Figure: Our other home.

Grad school

Figure: Friday night without lights.

Grad school

My son Fletcher is in the background of some of these.

Figure: Chris.

Grad school

Figure: Norton with Erik and Chris in background.

Grad school

Figure: Gary and me.

Another look at the lasso

Other talks I could have given.

Linear model lack-of-fit tests

Partial sums of residuals.

Nonnormal asymptotics.

Linear model version of the law of the iterated logarithm.

Slow convergence to asymptotic null distribution.

Remarkably successful simulations of most null distributions.

Generalized Split Plot Models

Testing the whole plot variance.

Equivalence of F and GLRT.

Constraints on parameter space complicate things.

When different, F is better.

Quite lovely application of linear model theory.

Favorite results in prediction

Obviously, in honor of Seymour.

Bayes better than frequentist, by frequentist standards.

R² = [corr(yi, ŷi)]².

BP residuals uncorrelated with everything.

BLP residuals uncorrelated with everything linear.

Leave-1-out CV overestimates prediction error worse than the naive estimate underestimates it.

Lasso

Instead, I’ve chosen to discuss something I don’t know much about!

Hot topic; HTW (2015).

20 years of work; Tibshirani (1996).

The talk consists of:

Lots of pictures;

Lots of speculation.

Outline of Presentation

1 Defining the problem.

2 Geometry for unique least squares estimates.

3 Estimates changing signs. (Lasso trace/profile.)

4 Geometry for nonunique least squares estimates.

5 Uniqueness of lasso estimates.

6 LARS.

Penalized estimation

Linear model

Y = Xβ + e,   E(e) = 0,   Cov(e) = σ²I.   (1)

Minimize
‖Y − Xβ‖² + λp(β).   (2)

Tuning parameter λ ≥ 0.
Nonnegative penalty function p(β).

Equivalent: Restricted estimation

Minimize
‖Y − Xβ‖²

subject to
p(β) ≤ δ.

Unless the least squares estimate β̂ already satisfies p(β̂) < δ, the minimizing value has p(β̂(δ)) = δ.

We focus on restricted estimation. HTW focus on penalized estimation.
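To see the equivalence in action, here is a minimal numerical sketch (mine, not from the talk): it solves the restricted form with a generic optimizer and checks that the constraint is active when δ binds. The data, δ, and the use of scipy.optimize.minimize are illustrative assumptions.

```python
# Sketch: restricted lasso, min ||Y - Xb||^2 subject to sum_j |b_j| <= delta.
# Illustrative data and solver choice; not code from the talk.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.standard_normal((n, p))
Y = X @ np.array([2.0, -1.0, 0.0]) + rng.standard_normal(n)

def rss(b):
    r = Y - X @ b
    return r @ r                      # ||Y - Xb||^2

delta = 1.0
res = minimize(rss, np.zeros(p), method="SLSQP",
               constraints=[{"type": "ineq",
                             "fun": lambda b: delta - np.abs(b).sum()}])

b_ls = np.linalg.lstsq(X, Y, rcond=None)[0]     # unconstrained least squares
print("LS:   ", b_ls, "  p(b) =", np.abs(b_ls).sum())
print("lasso:", res.x, "  p(b) =", np.abs(res.x).sum())  # ~ delta when binding
```

Since p(β̂) > δ here, the printed penalty for the restricted solution sits at δ, as the slide says.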

Lasso and Ridge

yi = β0 + β1xi1 + · · · + βpxip + εi,   i = 1, . . . , n,   (3)

The lasso penalty:

pL(β) = ∑_{j=1}^{p} |βj|.

Ridge regression penalty:

pR(β) = ∑_{j=1}^{p} βj².
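For the ridge penalty the penalized criterion has the familiar closed form β̂R = (X′X + λI)⁻¹X′Y. A quick numerical check (a sketch with illustrative data, not code from the talk):

```python
# Sketch: ridge regression, min ||Y - Xb||^2 + lam * sum_j b_j^2,
# has the closed form b_R = (X'X + lam*I)^{-1} X'Y.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.standard_normal((n, p))
Y = X @ np.array([2.0, -1.0, 0.0]) + rng.standard_normal(n)

lam = 0.5
b_closed = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

def obj(b):
    return np.sum((Y - X @ b) ** 2) + lam * np.sum(b ** 2)

b_numeric = minimize(obj, np.zeros(p)).x        # direct minimization
print(np.allclose(b_closed, b_numeric, atol=1e-4))  # True, up to solver tolerance
```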

Geometry. Not Statistics.

Least squares is a geometric procedure, not a statistical one. Under certain model assumptions, least squares has very nice statistical properties.

Similarly, lasso is a geometric procedure, not a statistical one. Inference is largely based on the bootstrap, hence is asymptotic.

Bayesian lasso largely defeats the stated purpose.

Two standard pictures

Figure: Shrinkage without selection.

Two standard pictures

Figure: Shrinkage and selection.

Two standard pictures

If you don’t believe me

Figure: Shrinkage and selection: Closeup.

Grow the ellipse

Figure: Repeat: Lasso with δ = 1. Estimation is trivial.

Grow the diamond

Figure: Lasso with δ = 1.7; the estimate is the vertex (1.7, 0). Estimation is easy.

Grow the diamond

Figure: Lasso with δ = 2.7; the estimate is the vertex (0, 2.7). Estimation is easy.

Grow the diamond

Figure: Lasso with δ = 3.1. Estimation is trivial.

Two dimensions. Too easy.

If you know you are hitting a vertex, estimation is trivial.

If you know you are hitting a line, estimation is easy.

The problem is knowing what you are hitting.

Much harder in higher dimensions.

Three dimensions

Figure: Octahedron. (Copied from Wikipedia.)

Hit it with a football. Because a football is round, you can hit a surface, an edge, or a vertex.

Three dimensions

Figure: Octahedron with ellipsoid. (Copied from HTW.)

Hit it with a football. Because a football is round, you can hit a surface, an edge, or a vertex.

Three and higher dimensions.

If you know you are hitting a vertex, estimation is trivial.

If you know you are hitting an edge, estimation is still pretty easy.

If you know you are hitting a surface, estimation is easy.

Even in very high dimensions, estimation is pretty easy if you know what you are hitting.

The problem is knowing what you are hitting.

Three and higher dimensions.

A vertex is the intersection of 3 one-dimensional linear constraints. Dimension 0.

An edge is the intersection of 2 one-dimensional linear constraints. Dimension 1.

A surface is 1 one-dimensional linear constraint. Dimension 2.

Ideas scale up to higher dimensions.

Three and higher dimensions.

It is easy to fit a linear model subject to linear constraints.

(Just figure out the new linear model.)

The problem is knowing what constraints are appropriate.

With δ large enough, knowing the (one) constraint is easy.

Wanted to reconstruct the lasso based on this fact.

Three and higher dimensions.

First idea, fit the model subject to the constraint. Zero out any estimates that changed sign, refit. (Not even close.)

Second idea, backward elimination as δ decreases. Zero out estimates one at a time. (Not the lasso, but not necessarily bad.)

Note, LARS with lasso modification is a version of forward selection with stepwise.

Latest conjecture, backward elimination with stepwise might be lasso.

With predictor standardization, why bother to shrink at all? (Carroll and colleagues, relaxed.)

Backward elimination, forward selection, or stepwise based on |β̂j|? Out with the small, in with the big.

Changing signs: Lasso

My first error was thinking that Lasso estimates could not change sign.

Based on the two standard pictures plus a lack of imagination onmy part.

I thought that the signs of β determined the surface. Just had to worry about its edges and vertices.

Only true when δ is large.

Changing signs: Ridge regression

Figure: Ridge regression with sign change.

I knew ridge regression estimates could change sign from Hoerl and Kennard (and the Ph.D. applied class).

Changing signs: Lasso

Figure: Lasso with sign change.

A lasso trace (profile)

Figure: δ = 1.8.

A lasso trace

Figure: δ = 1.6.

A lasso trace

Figure: Closeup: δ = 1.6.

A lasso trace

Figure: δ = 1.52.

A lasso trace

Figure: δ = 1.477.

A lasso trace

Figure: δ = 1.46.

A lasso trace

Figure: δ = 1.43.

A lasso trace

Figure: δ = 1.39.

A lasso trace

Figure: δ = 1.3.

No changing signs with standardization?

Figure: Standardized lasso. No sign change? ρ = 0.9, δ = 1.

No changing signs with standardization?

Figure: Standardized lasso. No sign change? ρ = 0.999, δ = 1.935.

No changing signs with standardization?

Figure: Standardized lasso. No sign change? ρ = −0.999, δ = 1.935.

Increasing correlation (Towards nonunique estimation)

Figure: Repeat: Lasso with ρ = 0.4/√0.8 ≈ 0.45.

Increasing correlation

Figure: Lasso with ρ = 0.999.

Increasing correlation

Figure: Lasso with ρ = 0.9999.

Increasing correlation

Figure: Lasso with ρ = −0.9999.

Nonunique estimates

Figure: Nonunique estimates. δ = 1.

Nonunique estimates

Figure: Least squares Lasso estimate. δ = 7/3. Grow the diamond to hit the line. δ > 7/3?

Nonunique estimates

Figure: Nonunique estimates with shortest estimate. δ = 1.

Nonunique estimates

Figure: Nonunique estimates with shortest estimate and lasso estimate. δ = 1.

Nonunique estimates

Figure: Octahedron. (Copied from Wikipedia.)

If you throw it against a wall, likely to hit a vertex. If you throw it into a wire, likely to hit an edge.

Overparameterized one-way ANOVA

Figure: Shrinkage and selection. (Axes α1 and α2; constraint line α1 − α2 = 2; β̂0 = (1, −1).)

Computations

Y = Xβ + e

Mean adjusted.

X′X is a correlation matrix.

Geometry versus Lagrange (penalized likelihood)

δ → ∞ ⇒ β̂(δ) = β̂
δ → 0 ⇒ β̂(δ) → 0
λ → ∞ ⇒ β̂(λ) = 0
λ → 0 ⇒ β̂(λ) → β̂

Soft threshold for one variable:

β̂j(λ) = sign(β̂j)[|β̂j| − λ]+ ≡ Sλ(β̂j)

This is 0 unless λ < maxj |X′jY| because, with X′X a correlation matrix, X′jY = β̂j.

Cyclic coordinate descent: β̂j ← Sλ(β̂j + X′j[Y − Xβ̂]); keep running through the j's.
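A minimal sketch of the soft threshold and the coordinate descent sweep, assuming (as on the Computations slide) mean-adjusted Y and columns of X scaled so X′jXj = 1; the data are illustrative.

```python
# Sketch: cyclic coordinate descent for the lasso with standardized predictors.
# Assumes columns of X are centered with X_j'X_j = 1 and Y is mean adjusted.
import numpy as np

def soft_threshold(z, lam):
    # S_lam(z) = sign(z) * [|z| - lam]_+
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cd(X, Y, lam, n_sweeps=200):
    b = np.zeros(X.shape[1])
    for _ in range(n_sweeps):                 # keep running through the j's
        for j in range(X.shape[1]):
            r = Y - X @ b                     # current residual
            b[j] = soft_threshold(b[j] + X[:, j] @ r, lam)
    return b

rng = np.random.default_rng(2)
n, p = 100, 5
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)
X /= np.sqrt((X ** 2).sum(axis=0))            # X_j'X_j = 1
Y = X @ np.array([3.0, -2.0, 0.0, 0.0, 1.0]) + 0.1 * rng.standard_normal(n)
Y -= Y.mean()
print(lasso_cd(X, Y, lam=0.3))                # extraneous coefficients hit exactly 0
```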

LARS

Focus on update steps; worry about starting later.
There exist knots λ0 ≥ λ1 ≥ · · · ≥ λp∧n ≥ 0.
For knot λk there is an active set Ak of predictor variables.
β̂Ak are the least squares estimates for the active set, with 0s for inactive variables.
β̂(λk) is the current estimate.
Define, for λ ≤ λk,

β̂(λ) = (λ/λk)β̂(λk) + [(λk − λ)/λk]β̂Ak.
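A sketch of the interpolation step alone (start-up is covered in the notes that follow); the names and numbers are my own illustration, not code from the talk.

```python
# Sketch: between knots the LARS path is linear in lam:
#   b(lam) = (lam/lam_k) * b_k + ((lam_k - lam)/lam_k) * b_Ak,
# where b_k is the estimate at knot lam_k and b_Ak is the least squares
# estimate for the active set (zeros for inactive variables).
import numpy as np

def lars_segment(lam, lam_k, b_k, b_Ak):
    return (lam / lam_k) * b_k + ((lam_k - lam) / lam_k) * b_Ak

b_k  = np.array([1.0, 0.0, 0.0])    # estimate at the knot (illustrative)
b_Ak = np.array([1.4, 0.6, 0.0])    # LS fit on the current active set
print(lars_segment(1.0, 1.0, b_k, b_Ak))    # lam = lam_k recovers b_k
print(lars_segment(0.8, 1.0, b_k, b_Ak))    # part way toward b_Ak
```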

LARS—will see later

For active Xj,
|X′j[Y − Xβ̂(λ)]| = λ.

For inactive Xi,
|X′i[Y − Xβ̂(λ)]| ≤ λ.

Decreasing λ, the first inactive Xi with
|X′i[Y − Xβ̂(λ)]| = λ
is added to the active set, which defines λk+1.

Repeat.

LARS Notes

1 The two definitions of β̂(λk+1) agree:

β̂(λk+1) = (λk+1/λk)β̂(λk) + [(λk − λk+1)/λk]β̂Ak
         = (λk+1/λk+1)β̂(λk+1) + [(λk+1 − λk+1)/λk+1]β̂Ak+1.

When you add a new active variable, you initially use very little of it.

2 λ0 = maxj |X′jY|. A0 is the maximizing Xj. For λ ≥ λ0, β̂(λ) ≡ 0.

LARS into Lasso

If any coefficient in

β̂(λ) = (λ/λk)β̂(λk) + [(λk − λ)/λk]β̂Ak

changes sign, throw out that variable and refit.
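A sketch of locating the sign change on a segment, so the offending variable can be dropped there; the crossing formula just solves the segment equation for λ, and the example values are illustrative.

```python
# Sketch: lasso modification of LARS.  On the segment
#   b_j(lam) = (lam/lam_k) * b_kj + ((lam_k - lam)/lam_k) * b_Akj,
# coordinate j crosses zero at lam = lam_k * b_Akj / (b_Akj - b_kj).
# If that lam lies in (0, lam_k), drop variable j there and refit.
import numpy as np

def first_sign_change(lam_k, b_k, b_Ak):
    with np.errstate(divide="ignore", invalid="ignore"):
        lam_cross = lam_k * b_Ak / (b_Ak - b_k)
    ok = (lam_cross > 0) & (lam_cross < lam_k)         # crossings on this segment
    return lam_cross[ok].max() if ok.any() else None   # first one as lam decreases

print(first_sign_change(1.0, np.array([0.2, 0.5]), np.array([-0.3, 0.8])))  # 0.6
```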

Constant residual covariance for LARS

Induction: Suppose that at knot λk we have

0 ≤ X′j[Y − Xβ̂(λk)] = λk

for any Xj in the active set. Then, for λ < λk,

X′j[Y − Xβ̂(λ)] = (λ/λk)X′j[Y − Xβ̂(λk)] + [(λk − λ)/λk]X′j[Y − Xβ̂Ak]
              = (λ/λk)λk + [(λk − λ)/λk]0
              = λ.

Recall: Xβ̂Ak = MAkY, where MAk is the ppo (perpendicular projection operator) onto the space of the active variables.

A similar result holds if the inner product is negative.

LARS—Fletch

Additions to the active variables are driven by knot behavior. For an inactive variable Xi, compute pi defined by

X′i[Y − Xβ̂(λk)] ≡ piλk.

Then

X′i[Y − Xβ̂(λ)] = (λ/λk)X′i[Y − Xβ̂(λk)] + [(λk − λ)/λk]X′i[Y − Xβ̂Ak].

Add Xi when

(1 − λ/λk)X′i[Y − Xβ̂Ak] = (1 − pi)λ.

LARS—Fletch

For λk ≥ λ ≥ λk+1, unrolling the updates gives

β̂(λ) = ∑_{j=0}^{k−1} [λ/λ_{j+1} − λ/λ_j]β̂Aj + (1 − λ/λk)β̂Ak.

Degrees of freedom

Dimensions of vector spaces.

Leery of other definitions. (DIC?)

HTW: yi = m(xi) + εi, m̂i(Y) = m̂(xi),

df[m̂] = ∑i Cov[m̂i(Y), yi] / σ².

Lasso: df estimated by the number of nonzero coefficients.

Ridge: df = p/(1 + λ) for an orthogonal design.

m̂i(Y) = yi/(nλ) gives df = 1/λ.
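The covariance definition is easy to check by simulation. A sketch (my own, with illustrative data) that recovers df = p for least squares, where m̂(Y) = HY with H the hat matrix:

```python
# Sketch: Monte Carlo check of df[m] = sum_i Cov[m_i(Y), y_i] / sigma^2.
# For least squares, m(Y) = HY and df = trace(H) = p.
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, 4
X = rng.standard_normal((n, p))
H = X @ np.linalg.solve(X.T @ X, X.T)          # hat (projection) matrix
mu, sigma = X @ rng.standard_normal(p), 1.0    # true mean and error sd

reps = 20000
Ys = mu + sigma * rng.standard_normal((reps, n))
fits = Ys @ H.T                                # fitted values, one row per replicate
cov = ((Ys - mu) * (fits - fits.mean(axis=0))).mean(axis=0)
print(cov.sum() / sigma**2, "vs p =", p)       # approximately 4
```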

Sparsity

Fitting:
Y = Xβ + e.

Truth:
Y = X0γ + e,   C(X0) = span{Xj : j ∈ S},

where S ⊂ {1, 2, . . . , p} has r elements.

Ideally,
Xk ⊥ C(X0), k ∉ S,

or some asymptotic version thereof.

Clearly, if the extraneous variables are highly correlated with the important ones, you have a mess.

References

Hastie, T., Tibshirani, R., and Wainwright, M. (2015). Statistical Learning with Sparsity: The Lasso and Generalizations. Chapman and Hall, Boca Raton, FL.

Murdoch, D. and Chow, E. D. (2015). Package 'ellipse'. https://cran.r-project.org/web/packages/ellipse/ellipse.pdf

Qi, X., Luo, R., Carroll, R. J., and Zhao, H. (2015). Sparse regression by projection and sparse discriminant analysis. Journal of Computational and Graphical Statistics, 24, 416-438.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58, 267-288.