Seymour Other possible talks Introduction Changing the sign of the estimator A lasso trace (profile) Standardization and sign changes? Towards nonunique estimation Geometry of r(X ) < p + 1 LARS References
Another Look at the Lasso
Ronald Christensen
Department of Math and Statistics, University of New Mexico
October 15, 2015
Seymour
Figure: Seymour with Anne
Seymour
Figure: Seymour: In disguise as Fisher??
Seymour
Figure: Seymour at ease.
Seymour
Figure: Seymour with pipes.
Seymour
Figure: Seymour with hair.
Seymour
Figure: Seymour as I knew him.
Minnesota
First half of my life in Minnesota.
Second quarter at U of M.
Carefree and callow?
Seymour.
Grad school
Figure: Our home.
Grad school
Figure: Our other home.
Grad school
Figure: Friday night without lights.
Grad school
My son Fletcher is in the background of some of these.
Figure: Chris.
Grad school
Figure: Norton with Erik and Chris in background.
Grad school
Figure: Gary and me.
Another look at the lasso
Other talks I could have given.
Linear model lack-of-fit tests
Partial sums of residuals.
Nonnormal asymptotics.
Linear model version of law of the iterated logarithm.
Slow convergence to asymptotic null distribution.
Remarkably successful simulations of most null distributions.
Generalized Split Plot Models
Testing the whole plot variance.
Equivalence of F and GLRT.
Constraints on parameter space complicate things.
When different, F is better.
Quite lovely application of linear model theory.
Favorite results in prediction
Obviously, in honor of Seymour.
Bayes better than frequentist, by frequentist standards.
R² = [corr(ŷᵢ, yᵢ)]².
BP residuals uncorrelated with everything.
BLP residuals uncorrelated with everything linear.
Leave-1-out CV overestimates prediction error worse than naive underestimates.
Lasso
Instead, I’ve chosen to discuss something I don’t know much about!
Hot topic; HTW (2015).
20 years of work; Tibshirani (1996).
The talk consists of:
Lots of pictures;
Lots of speculation.
Outline of Presentation
1 Defining the problem.
2 Geometry for unique least squares estimates.
3 Estimates changing signs. (Lasso trace/profile.)
4 Geometry for nonunique least squares estimates.
5 Uniqueness of lasso estimates.
6 LARS
Penalized estimation
Linear model
Y = Xβ + e, E(e) = 0, Cov(e) = σ²I. (1)
Minimize
‖Y − Xβ‖² + λ p(β). (2)
Tuning parameter λ ≥ 0. Nonnegative penalty function p(β).
Equivalent: Restricted estimation
Minimize
‖Y − Xβ‖²
subject to
p(β) ≤ δ.
Unless the least squares estimate already satisfies p(β̂) < δ, the minimizing value has p(β) = δ.
We focus on restricted estimation. HTW focus on penalized estimation.
Lasso and Ridge
yᵢ = β₀ + β₁xᵢ₁ + · · · + βₚxᵢₚ + εᵢ, i = 1, . . . , n, (3)
The lasso penalty:
pL(β) = ∑_{j=1}^{p} |βⱼ|.
Ridge regression penalty:
pR(β) = ∑_{j=1}^{p} βⱼ².
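Stated as code, the two penalties are one-liners; a minimal sketch (the function names are mine, not the talk's):

```python
# The two penalty functions above; names are illustrative.
def lasso_penalty(beta):
    """p_L(beta) = sum_j |beta_j|, the l1 norm of the slopes."""
    return sum(abs(b) for b in beta)

def ridge_penalty(beta):
    """p_R(beta) = sum_j beta_j**2, the squared l2 norm of the slopes."""
    return sum(b * b for b in beta)

print(lasso_penalty([3.0, -4.0]))  # 7.0
print(ridge_penalty([3.0, -4.0]))  # 25.0
```

Both penalties apply to the slopes β₁, . . . , βₚ, not the intercept β₀.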
Geometry. Not Statistics.
Least squares is a geometric procedure, not a statistical one. Under certain model assumptions, least squares has very nice statistical properties.
Similarly, the lasso is a geometric procedure, not a statistical one. Inference is largely based on the bootstrap, hence is asymptotic.
The Bayesian lasso largely defeats the stated purpose.
Two standard pictures
Figure: Shrinkage without selection.
Two standard pictures
Figure: Shrinkage and selection.
Two standard pictures
If you don’t believe me
Figure: Shrinkage and selection: Closeup.
Grow the ellipse
Figure: Repeat: Lasso with δ = 1. Estimation is trivial.
Grow the diamond
Figure: Lasso with δ = 1.7; the point (1.7, 0) is marked. Estimation is easy.
Grow the diamond
Figure: Lasso with δ = 2.7; the point (0, 2.7) is marked. Estimation is easy.
Grow the diamond
Figure: Lasso with δ = 3.1. Estimation is trivial.
Two dimensions. Too easy.
If you know you are hitting a vertex, estimation is trivial.
If you know you are hitting a line, estimation is easy.
The problem is knowing what you are hitting.
Much harder in higher dimensions.
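The two-dimensional case can in fact be solved by brute force, which makes the "what are you hitting" question concrete: scan the boundary of the diamond |b1| + |b2| = δ for the minimizer of a correlated quadratic loss. All numbers below (center, correlation, δ) are illustrative, not the ones behind the talk's figures.

```python
# Brute-force 2-D lasso sketch: minimize a quadratic centered at the least
# squares estimate (m1, m2) over the diamond |b1| + |b2| <= delta.
def lasso_2d(m1, m2, rho, delta, steps=4000):
    def q(b1, b2):  # quadratic loss with correlation rho between predictors
        d1, d2 = b1 - m1, b2 - m2
        return d1 * d1 + 2.0 * rho * d1 * d2 + d2 * d2

    if abs(m1) + abs(m2) <= delta:      # constraint slack: LS estimate wins
        return m1, m2
    best = None
    for k in range(steps):              # walk all four edges of the diamond
        t = delta * k / steps
        for b1, b2 in ((t, delta - t), (t, t - delta),
                       (-t, delta - t), (-t, t - delta)):
            val = q(b1, b2)
            if best is None or val < best[0]:
                best = (val, b1, b2)
    return best[1], best[2]

b1, b2 = lasso_2d(m1=2.0, m2=1.0, rho=0.45, delta=1.7)
print(round(b1, 2), round(b2, 2))  # 1.35 0.35 -- an edge, not a vertex
```

With these particular numbers the solution lands on an edge, so both coefficients survive; shrinking δ far enough eventually pushes the solution to a vertex, where one coefficient is zeroed.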
Three dimensions
Figure: Octahedron. (Copied from Wikipedia.)
Hit it with a football. Because a football is round, you can hit a surface, an edge, or a vertex.
Three dimensions
Figure: Octahedron with ellipsoid. (Copied from HTW.)
Hit it with a football. Because a football is round, you can hit a surface, an edge, or a vertex.
Three and higher dimensions.
If you know you are hitting a vertex, estimation is trivial.
If you know you are hitting an edge, estimation is still pretty easy.
If you know you are hitting a surface, estimation is easy.
Even in very high dimensions, estimation is pretty easy if you know what you are hitting.
The problem is knowing what you are hitting.
Three and higher dimensions.
A vertex is the intersection of 3 one-dimensional linear constraints. Dimension 0.
An edge is the intersection of 2 one-dimensional linear constraints. Dimension 1.
A surface is 1 one-dimensional linear constraint. Dimension 2.
Ideas scale up to higher dimensions.
Three and higher dimensions.
It is easy to fit a linear model subject to linear constraints.
(Just figure out the new linear model.)
The problem is knowing what constraints are appropriate.
With δ large enough, knowing the (one) constraint is easy.
Wanted to reconstruct the lasso based on this fact.
Three and higher dimensions.
First idea: fit the model subject to the constraint; zero out any estimates that changed sign, refit. (Not even close.)
Second idea: backward elimination as δ decreases; zero out estimates one at a time. (Not the lasso, but not necessarily bad.)
Note: LARS with the lasso modification is a version of forward selection with stepwise.
Latest conjecture: backward elimination with stepwise might be the lasso.
With predictor standardization, why bother to shrink at all? (Carroll and colleagues, relaxed.)
Backward elimination, forward selection, or stepwise based on |β̂ⱼ|? Out with the small, in with the big.
Changing signs: Lasso
My first error was thinking that Lasso estimates could not change sign.
Based on the two standard pictures plus a lack of imagination on my part.
I thought that the signs of β determined the surface. Just had to worry about its edges and vertices.
Only true when δ is large.
Changing signs: Ridge regression
Figure: Ridge regression with sign change.
I knew ridge regression estimates could change sign from Hoerl and Kennard (and the Ph.D. applied class).
Changing signs: Lasso
Figure: Lasso with sign change.
A lasso trace (profile)
Figure: δ = 1.8.
A lasso trace
Figure: δ = 1.6.
A lasso trace
Figure: Closeup: δ = 1.6.
A lasso trace
Figure: δ = 1.52.
A lasso trace
Figure: δ = 1.477.
A lasso trace
Figure: δ = 1.46.
A lasso trace
Figure: δ = 1.43.
A lasso trace
Figure: δ = 1.39.
A lasso trace
Figure: δ = 1.3.
No changing signs with standardization?
Figure: Standardized lasso. No sign change? ρ = 0.9, δ = 1.
No changing signs with standardization?
Figure: Standardized lasso. No sign change? ρ = 0.999, δ = 1.935.
No changing signs with standardization?
Figure: Standardized lasso. No sign change? ρ = −0.999, δ = 1.935.
Increasing correlation (Towards nonunique estimation)
Figure: Repeat: Lasso with ρ = 0.4/√0.8 ≈ 0.45.
Increasing correlation
Figure: Lasso with ρ = 0.999.
Increasing correlation
Figure: Lasso with ρ = 0.9999.
Increasing correlation
Figure: Lasso with ρ = −0.9999.
Nonunique estimates
Figure: Nonunique estimates. δ = 1.
Nonunique estimates
Figure: Least squares lasso estimate. δ = 7/3. Grow the diamond to hit the line. δ > 7/3?
Nonunique estimates
Figure: Nonunique estimates with shortest estimate. δ = 1.
Nonunique estimates
Figure: Nonunique estimates with shortest estimate and lasso estimate. δ = 1.
Nonunique estimates
Figure: Octahedron. (Copied from Wikipedia.)
If you throw it against a wall, it is likely to hit a vertex. If you throw it into a wire, it is likely to hit an edge.
Overparameterized one-way ANOVA
Figure: Shrinkage and selection. (The (α1, α2) plane with the line α1 − α2 = 2 and the point β0 = (1, −1).)
Computations
Y = Xβ + e
Mean adjusted.
X′X is a correlation matrix.
Geometry versus Lagrange (penalized likelihood)
δ → ∞ ⇒ β̂(δ) = β̂.
δ → 0 ⇒ β̂(δ) → 0.
λ → ∞ ⇒ β̂(λ) = 0.
λ → 0 ⇒ β̂(λ) → β̂.
Soft threshold for one variable:
β̂ⱼ(λ) = sign(β̂ⱼ)[|β̂ⱼ| − λ]₊ ≡ Sλ(β̂ⱼ).
This is 0 unless λ < maxⱼ |X′ⱼY| because, with X′X a correlation matrix, X′ⱼY = β̂ⱼ.
Cyclic coordinate descent: β̂ⱼ ← Sλ(β̂ⱼ + X′ⱼ[Y − Xβ̂]); keep running through the j's.
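The cyclic coordinate descent update is easy to sketch in pure Python. The helper names and toy data below are mine; the update assumes mean-adjusted data with each column of X scaled to unit length, so that X′X has ones on its diagonal.

```python
# Soft threshold S_lambda(z) = sign(z) * max(|z| - lambda, 0).
def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

# Cyclic coordinate descent: beta_j <- S_lambda(beta_j + X_j'[Y - X beta]),
# sweeping repeatedly through the coordinates j.
def coordinate_descent_lasso(X, Y, lam, sweeps=100):
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            resid = [Y[i] - sum(X[i][k] * beta[k] for k in range(p))
                     for i in range(n)]
            z = beta[j] + sum(X[i][j] * resid[i] for i in range(n))
            beta[j] = soft_threshold(z, lam)
    return beta

# Toy orthonormal design: the answer is the soft threshold of X'Y.
X = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
Y = [2.0, -0.5, 1.0]
print(coordinate_descent_lasso(X, Y, lam=1.0))  # [1.0, 0.0]
```

With an orthonormal design a single sweep already lands on the soft-thresholded solution; the repeated sweeps only matter for correlated columns.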
LARS
Focus on update steps; worry about starting later.
There exist knots λ₀ ≥ λ₁ ≥ · · · ≥ λ_{p∧n} ≥ 0.
For knot λk there is an active set of predictor variables Ak.
β̂Ak are the least squares estimates for the active set, with 0s for inactive variables.
β̂(λk) is the current estimate.
Define, for λ ≤ λk,
β̂(λ) = (λ/λk) β̂(λk) + [(λk − λ)/λk] β̂Ak.
LARS—will see later
For active Xj,
|X′ⱼ[Y − Xβ̂(λ)]| = λ.
For inactive Xi,
|X′ᵢ[Y − Xβ̂(λ)]| ≤ λ.
Decreasing λ, the first inactive Xi with
|X′ᵢ[Y − Xβ̂(λ)]| = λ
is added to the active set, which defines λk+1.
Repeat.
LARS Notes
1 The two definitions of β̂(λk+1) agree:
β̂(λk+1) = (λk+1/λk) β̂(λk) + [(λk − λk+1)/λk] β̂Ak
= (λk+1/λk+1) β̂(λk+1) + [(λk+1 − λk+1)/λk+1] β̂Ak+1.
When you add a new active variable, you initially use very little of it.
2 λ₀ = maxⱼ |X′ⱼY|. A₀ is the maximizing Xⱼ. For λ ≥ λ₀, β̂(λ) ≡ 0.
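Note 2 is easiest to see in the orthonormal case X′X = I, where the whole lasso path is explicit, β̂ⱼ(λ) = Sλ(X′ⱼY), and the knots are just the sorted |X′ⱼY|. A sketch with made-up inner products:

```python
# Soft threshold; with X'X = I the lasso path is beta_hat_j = soft(X_j'Y, lam).
def soft(z, lam):
    return (abs(z) - lam) * (1.0 if z > 0 else -1.0) if abs(z) > lam else 0.0

xty = [2.5, -1.4, 0.6]                 # hypothetical X_j'Y values
knots = sorted((abs(v) for v in xty), reverse=True)
lam0 = knots[0]                        # lambda_0 = max_j |X_j'Y| = 2.5

# For lambda >= lambda_0 every coefficient is zero ...
assert all(soft(v, lam0) == 0.0 for v in xty)

# ... and just below the second knot exactly one variable is active.
lam = knots[1] + 1e-9
active = [j for j, v in enumerate(xty) if soft(v, lam) != 0.0]
print(active)  # [0]
```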
LARS into Lasso
If any coefficient in
β̂(λ) = (λ/λk) β̂(λk) + [(λk − λ)/λk] β̂Ak
changes sign, throw out that variable and refit.
Constant residual covariance for LARS
Induction. Suppose that at knot λk we have
0 ≤ X′ⱼ[Y − Xβ̂(λk)] = λk
for any Xⱼ in the active set. Then, for λ < λk,
X′ⱼ[Y − Xβ̂(λ)] = (λ/λk) X′ⱼ[Y − Xβ̂(λk)] + [(λk − λ)/λk] X′ⱼ[Y − Xβ̂Ak]
= (λ/λk) λk + [(λk − λ)/λk] 0
= λ.
Recall: Xβ̂Ak = MAk Y, where MAk is the ppo onto the space of active variables.
A similar result holds if the inner product is negative.
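This constant-correlation property can be spot-checked in the simplest setting, an orthonormal design, where the residual inner product for an active Xⱼ is X′ⱼY − β̂ⱼ(λ) = ±λ exactly. The X′ⱼY values below are made up.

```python
# Soft threshold, as on the computations slide.
def soft(z, lam):
    return (abs(z) - lam) * (1.0 if z > 0 else -1.0) if abs(z) > lam else 0.0

xty = [2.2, -1.8, 0.4]        # hypothetical X_j'Y values (orthonormal design)
lam = 1.0
for v in xty:
    b = soft(v, lam)
    if b != 0.0:              # active: |inner product with residual| = lambda
        assert abs(abs(v - b) - lam) < 1e-12
    else:                     # inactive: inner product is at most lambda
        assert abs(v - b) <= lam
print("checked at lambda =", lam)
```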
LARS—Fletch
Additions to the active set are driven by knot behavior. For an inactive variable Xi, compute pi defined by
X′ᵢ[Y − Xβ̂(λk)] ≡ pi λk.
Then
X′ᵢ[Y − Xβ̂(λ)] = (λ/λk) X′ᵢ[Y − Xβ̂(λk)] + [(λk − λ)/λk] X′ᵢ[Y − Xβ̂Ak].
Add Xi if
(1 − λ/λk) X′ᵢ[Y − Xβ̂Ak] = (1 − pi) λ.
LARS—Fletch
For λk ≥ λ ≥ λk+1,
β̂(λ) = ∑_{j=0}^{k} (λ/λⱼ) β̂Aj.
Degrees of freedom
Dimensions of vector spaces.
Leery of other definitions. (DIC?)
HTW: yᵢ = m(xᵢ) + εᵢ, m̂ᵢ(Y) = m̂(xᵢ),
df[m̂] = ∑_{i=1}^{n} Cov[m̂ᵢ(Y), yᵢ] / σ².
Lasso: df estimated by the number of nonzero coefficients.
Ridge: df = p/(1 + λ) for an orthogonal design.
ŷᵢ = yᵢ/(nλ) gives df = 1/λ.
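For a linear smoother ŷ = HY, the covariance definition reduces to df = tr(H), since Cov(ŷᵢ, yᵢ) = σ²Hᵢᵢ. A sketch checking the two closing examples (taking λ = 1 and writing each smoother as a diagonal matrix, the ridge case in its eigenbasis):

```python
# df of a linear smoother y_hat = H y is the trace of H.
def trace_df(H):
    return sum(H[i][i] for i in range(len(H)))

n = p = 4
lam = 1.0

# Ridge, orthonormal design: H has eigenvalues 1/(1 + lam), so df = p/(1 + lam).
H_ridge = [[1.0 / (1.0 + lam) if i == j else 0.0 for j in range(p)]
           for i in range(p)]
print(trace_df(H_ridge))  # p / (1 + lam) = 2.0

# Toy smoother y_hat_i = y_i / (n * lam): H = I/(n lam), so df = 1/lam.
H_toy = [[1.0 / (n * lam) if i == j else 0.0 for j in range(n)]
         for i in range(n)]
print(trace_df(H_toy))  # 1 / lam = 1.0
```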
Sparsity
Fitting:
Y = Xβ + e.
Truth:
Y = X₀γ + e, C(X₀) = span{Xⱼ : j ∈ S}.
S ⊂ {1, 2, . . . , p} has r elements.
Ideally,
Xk ⊥ C(X₀), k ∉ S,
or some asymptotic version thereof.
Clearly, if the extraneous variables are highly correlated with the important ones, you have a mess.
References
Hastie, T., Tibshirani, R., and Wainwright, M. (2015). Statistical Learning with Sparsity: The Lasso and Generalizations. Chapman and Hall, Boca Raton, FL.
Murdoch, D. and Chow, E. D. (2015). Package ‘ellipse.’https://cran.r-project.org/web/packages/ellipse/ellipse.pdf
Qi, X., Luo, R., Carroll, R. J., and Zhao, H. (2015). Sparse regression by projection and sparse discriminant analysis. Journal of Computational and Graphical Statistics, 24, 416-438.
Tibshirani, R. (1996). Regression shrinkage and selection via thelasso. Journal of the Royal Statistical Society: Series B, 58,267-288.