Chapter 2 - Linear methods
Linear regression, Logistic regression
Geir Storvik
January 25, 2021
Lectures
Course web-page: 3 hours of lectures
Schedule: 4 hours of lectures
New plan: only lectures 14.15-15.00 on Wednesdays
We might use the extra hour later!
Linear regression
What is linear regression? Some repetition from STK1110, see chap. 12 in Devore & Berk
Properties: what can be done with the linear model?
Challenges/weaknesses; many of these are shared with other methods
Prediction - Advertising data
Response: sales of a product in 200 different markets (sales)
Explanatory variables:
Advertisement budget for TV (TV)
Advertisement budget for radio (radio)
Advertisement budget for newspapers (newspaper)
[Figure: scatter plots of Sales against the TV, Radio and Newspaper advertising budgets]
Some of the figures are taken from "An Introduction to Statistical Learning, with applications in R" (Springer, 2013) with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani.
Questions:
Is there a relationship between advertisement and sales?
How strong is this relationship?
Which medium has the strongest influence?
How precisely can we estimate the effect?
How precisely can we predict future sales?
Is there a linear relationship?
Is there some synergy/interaction between different media?
Linear regression
Data: (x_1, y_1), ..., (x_n, y_n)
Model: Assume
\[ Y_i = \beta_0 + \beta_1 x_{i,1} + \cdots + \beta_p x_{i,p} + \varepsilon_i, \qquad \varepsilon_i \overset{\text{ind}}{\sim} (0, \sigma^2) \tag{*} \]
Matrix form:
\[
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
=
\begin{pmatrix}
1 & x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\
1 & x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\
 & & \vdots & & \\
1 & x_{n,1} & x_{n,2} & \cdots & x_{n,p}
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix},
\qquad Y = X\beta + \varepsilon
\]
Least squares estimate (also ML if $\varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)$):
\[ \hat\beta = (X^T X)^{-1} X^T Y \]
Prediction at a new point $x^* = (x_1^*, \ldots, x_p^*)$:
\[ \hat y^* = \hat\beta_0 + \hat\beta_1 x_1^* + \cdots + \hat\beta_p x_p^* \]
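The estimate can be computed directly from this formula. A minimal sketch in R on simulated data (all object names here are invented for illustration):

# Sketch: least squares "by hand", compared with lm()
set.seed(1)
n <- 100
x1 <- runif(n); x2 <- runif(n)
y <- 1 + 2 * x1 - x2 + rnorm(n, sd = 0.5)
X <- cbind(1, x1, x2)                      # design matrix with intercept column
beta.hat <- solve(t(X) %*% X, t(X) %*% y)  # (X^T X)^{-1} X^T y
drop(beta.hat)
coef(lm(y ~ x1 + x2))                      # agrees with beta.hat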
Random vectors
If z_1, ..., z_p are random variables, we say that z = (z_1, ..., z_p) is a random vector.
We define its expectation and covariance matrix by
\[
E[z] = \begin{pmatrix} E[z_1] \\ E[z_2] \\ \vdots \\ E[z_p] \end{pmatrix}, \qquad
V[z] = \begin{pmatrix}
\mathrm{Var}[z_1] & \mathrm{Cov}[z_1, z_2] & \cdots & \mathrm{Cov}[z_1, z_p] \\
\mathrm{Cov}[z_2, z_1] & \mathrm{Var}[z_2] & \cdots & \mathrm{Cov}[z_2, z_p] \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}[z_p, z_1] & \mathrm{Cov}[z_p, z_2] & \cdots & \mathrm{Var}[z_p]
\end{pmatrix}
\]
Rules:
\[ E[Az + b] = A\,E[z] + b, \qquad V[Az + b] = A\,V[z]\,A^T \]
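A quick way to convince yourself of these rules is simulation. A sketch, with A and b chosen arbitrarily:

# Sketch: checking E[Az+b] = A E[z] + b and V[Az+b] = A V[z] A^T by simulation
set.seed(1)
z <- matrix(rnorm(2 * 1e5), ncol = 2)      # 10^5 draws of a 2-dim random vector
A <- matrix(c(1, 2, 0, 1), 2, 2); b <- c(1, -1)
w <- t(A %*% t(z) + b)                     # transformed draws
colMeans(w)                                # close to A %*% E[z] + b = b here
cov(w)                                     # close to A %*% cov(z) %*% t(A)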
Properties - linear regression
Estimate: $\hat\beta = (X^T X)^{-1} X^T Y$
If (*) is true,
\[ E[\hat\beta] = \beta, \qquad V[\hat\beta] = \sigma^2 (X^T X)^{-1} \]
If also $\varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)$:
Test of $H_0: \beta_j = 0$:
\[ T = \frac{\hat\beta_j}{SE(\hat\beta_j)} \sim t_{n-p-1} \text{ under } H_0 \]
Test of $H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$:
\[ F = \frac{(TSS - RSS)/p}{RSS/(n-p-1)} \sim F_{p, n-p-1} \text{ under } H_0 \]
where
\[ RSS = \sum_{i=1}^n (y_i - \hat y_i)^2 \le TSS = \sum_{i=1}^n (y_i - \bar y)^2 \]
Approximately true also without normality if $n \gg p$.
Geometric interpretation
$\hat\beta = (X^T X)^{-1} X^T Y$, $\hat y_i = x_i^T \hat\beta$, $\hat Y = (\hat y_1, ..., \hat y_n)$
\[ \hat Y = X\hat\beta = \underbrace{X (X^T X)^{-1} X^T}_{P}\, Y = PY, \qquad P \text{ symmetric} \]
\[ P^2 = X (X^T X)^{-1} X^T X (X^T X)^{-1} X^T = P \qquad \text{projection matrix} \]
\[ Y - \hat Y = (I - P) Y \]
\[ (Y - \hat Y)^T \hat Y = Y^T (I - P) P Y = 0 \qquad \text{orthogonality} \]
[Figure: Y projected onto the column space C(X); the residual Y − Ŷ is orthogonal to Ŷ]
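These properties are easy to verify numerically. A small sketch on a random design matrix (names invented):

# Sketch: P is symmetric and idempotent, residuals orthogonal to fitted values
set.seed(2)
X <- cbind(1, matrix(rnorm(100), ncol = 2))   # 50 x 3 design matrix
y <- rnorm(50)
P <- X %*% solve(t(X) %*% X) %*% t(X)
max(abs(P - t(P)))                            # ~ 0: symmetric
max(abs(P %*% P - P))                         # ~ 0: idempotent
y.hat <- P %*% y
sum((y - y.hat) * y.hat)                      # ~ 0: orthogonality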
Advertising data
> fit.lm <- lm(Sales ~ TV + Radio + Newspaper, data = Advertising)
> summary(fit.lm)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.938889   0.311908   9.422  < 2e-16 ***
...
Advertising data
It is clearly significant that at least one of the explanatory variables is useful for predicting the response. Are all explanatory variables important?
Newspaper seems to be less important:
> fit2.lm <- lm(Sales ~ TV + Radio, data = Advertising)
> summary(fit2.lm)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.92110    0.29449   9.919  < 2e-16 ***
...
Comparison of models
Topic within chapter 3; here: some simple approaches.
Assume we want to test
\[ H_0: \beta_{i_1} = \beta_{i_2} = \cdots = \beta_{i_q} = 0 \]
Let $RSS_0 = \sum_{i=1}^n (y_i - \hat y_i)^2$ where $\hat y_i$ is computed under $H_0$, and define RSS similarly for the full model. Then
\[ F = \frac{(RSS_0 - RSS)/q}{RSS/(n-p-1)} \overset{H_0}{\sim} F_{q, n-p-1} \]
Example: $H_0: \beta_3 = 0$, $q = 1$:
> RSS <- sum(residuals(fit.lm)^2)
> RSS0 <- sum(residuals(fit2.lm)^2)
> Fobs <- ((RSS0 - RSS)/1)/(RSS/196)
> Fobs
[1] 0.03122805
> 1 - pf(Fobs, 1, 196)
[1] 0.8599151
> anova(fit.lm, fit2.lm)
Analysis of Variance Table

Model 1: Sales ~ TV + Radio + Newspaper
Model 2: Sales ~ TV + Radio
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    196 556.83
2    197 556.91 -1 -0.088717 0.0312 0.8599
Two tests when q = 1
Test of $H_0: \beta_j = 0$:
\[ T = \frac{\hat\beta_j}{SE(\hat\beta_j)} \overset{H_0}{\sim} t_{n-p-1} \]
\[ F = \frac{(RSS_0 - RSS)/q}{RSS/(n-p-q)} \overset{H_0}{\sim} F_{1, n-p-1} \]
Same test, since $F = T^2$ and
\[ T \sim t_{n-p-1} \;\Rightarrow\; T^2 \sim F_{1, n-p-1} \]
Example: F = 0.03122805
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.938889   0.311908   9.422  < 2e-16 ***
...
Interactions
Alternative model for the Advertising data:
Y = β0 + β1x1 + β2x2 + β3x1x2 + ε
> fit3.lm <- lm(Sales ~ TV * Radio, data = Advertising)
> summary(fit3.lm)
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.750e+00  2.479e-01  27.233  < 2e-16 ***
...
What is linearity?
Model with interactions
Y = β0 + β1x1 + β2x2 + β3x1x2 + ε
The model is not linear in the x ’s
The model is linear in the β’s
The theory of linear regression requires linearity in the β's
What if p is large?
Example: Hitters
Response: Salary, p = 19 explanatory variables
> fit.lm <- lm(Salary ~ ., data = Hitters)
> summary(fit.lm)
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  163.10359   90.77854   1.797 0.073622 .
AtBat         -1.97987    0.63398  -3.123 0.002008 **
Hits           7.50077    2.37753   3.155 0.001808 **
HmRun          4.33088    6.20145   0.698 0.485616
Runs          -2.37621    2.98076  -0.797 0.426122
RBI           -1.04496    2.60088  -0.402 0.688204
Walks          6.23129    1.82850   3.408 0.000766 ***
Years         -3.48905   12.41219  -0.281 0.778874
CAtBat        -0.17134    0.13524  -1.267 0.206380
CHits          0.13399    0.67455   0.199 0.842713
CHmRun        -0.17286    1.61724  -0.107 0.914967
CRuns          1.45430    0.75046   1.938 0.053795 .
CRBI           0.80771    0.69262   1.166 0.244691
CWalks        -0.81157    0.32808  -2.474 0.014057 *
LeagueN       62.59942   79.26140   0.790 0.430424
DivisionW   -116.84925   40.36695  -2.895 0.004141 **
PutOuts        0.28189    0.07744   3.640 0.000333 ***
Assists        0.37107    0.22120   1.678 0.094723 .
Errors        -3.36076    4.39163  -0.765 0.444857
NewLeagueN   -24.76233   79.00263  -0.313 0.754218
---
Residual standard error: 315.6 on 243 degrees of freedom
  (59 observations deleted due to missingness)
Multiple R-squared: 0.5461, Adjusted R-squared: 0.5106
F-statistic: 15.39 on 19 and 243 DF, p-value: < 2.2e-16
How to choose explanatory variables?
Variable selection
The number of possibilities grows fast with p:
p = 3 gives 2^3 = 8 possible models
p = 30 gives 2^30 = 1 073 741 824 possible models!
Forward selection (see the sketch below):
Start with the null model Y = β0 + ε
Add the variable that gives the best improvement
Continue as long as you obtain a significant improvement
Backward selection:
Start with the full model Y = β0 + β1x1 + · · · + βpxp + ε
Remove the variable that gives the smallest deterioration
Continue until you get a non-significant deterioration
Mixed selection:
Combination of forward and backward selection
We will come back to this in ch 3
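As a concrete illustration, R's built-in step() performs such searches, using AIC rather than significance tests as its criterion. A sketch, assuming the Hitters data from the ISLR package:

# Sketch: stepwise selection with step() (AIC-based)
library(ISLR)                       # assumed source of the Hitters data
Hitters2 <- na.omit(Hitters)        # a fixed set of complete cases
null.lm <- lm(Salary ~ 1, data = Hitters2)
full.lm <- lm(Salary ~ ., data = Hitters2)
fwd <- step(null.lm, scope = formula(full.lm), direction = "forward")
# direction = "backward" (starting from full.lm) or "both" gives the other schemes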
Measure of performance
Common choices:
\[ s^2 = \frac{1}{n-p-1} \sum_{i=1}^n (y_i - \hat y_i)^2 = \frac{1}{n-p-1} RSS, \qquad RSS = D(\hat\beta) \]
\[ R^2 = \frac{TSS - RSS}{TSS} = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum_{i=1}^n (y_i - \hat y_i)^2}{\sum_{i=1}^n (y_i - \bar y)^2}
= \left( \frac{\sum_{i=1}^n (y_i - \bar y)(\hat y_i - \bar{\hat y})}{\sqrt{\sum_{i=1}^n (y_i - \bar y)^2 \sum_{i=1}^n (\hat y_i - \bar{\hat y})^2}} \right)^2 \]
Can show: 0 ≤ R² ≤ 1; R² close to 1 indicates good performance.
Does not take overfitting into account; we will look at this later.
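A quick check of these identities in R, assuming the Advertising data and fit2.lm from earlier (column names as in the slides):

# Sketch: the different expressions for R^2 agree
y <- Advertising$Sales
y.hat <- fitted(fit2.lm)
RSS <- sum((y - y.hat)^2); TSS <- sum((y - mean(y))^2)
1 - RSS/TSS                     # from the definition
cor(y, y.hat)^2                 # squared correlation form
summary(fit2.lm)$r.squared      # what lm() reports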
Prediction
$\hat\beta_0, ..., \hat\beta_p$ give the prediction
Ŷ = β̂0 + β̂1x1 + · · · + β̂pxp
an approximation to the assumed model
f(x) = β0 + β1x1 + · · · + βpxp
β0 + β1x1 + · · · + βpxp is in turn only an approximation to the true model f(x) = E[Y|x]. Most theoretical results rely on assuming that f is linear.
Confidence and prediction intervals
Confidence interval for $\beta_j$: $\hat\beta_j \pm t_{\alpha/2; n-p-1}\, SE(\hat\beta_j)$
> confint(fit2.lm)
                 2.5 %     97.5 %
(Intercept) 2.34034299 3.50185683
TV          0.04301292 0.04849671
Radio       0.17213877 0.20384969
Confidence interval for $E[Y|x] = x^T\beta$: $x^T\hat\beta \pm t_{\alpha/2; n-p-1}\, SE(x^T\hat\beta)$; with interval = "predict", predict() instead gives a prediction interval for a new observation:
> newdata <- data.frame(TV = ..., Radio = ...)
> predict(fit2.lm, newdata, interval = "predict")
       fit      lwr      upr
1 11.25647 7.929616 14.58332
Based on the assumption that the fitted model is the true model.
Qualitative explanatory variables
So far we have assumed the explanatory variables are quantitative.
Example: Credit data set
[Figure: boxplots of balance for males and females]
How to do regression with qualitative data?
Assume first one explanatory variable with two categories, and define
\[ x_i = \begin{cases} 1 & \text{if individual } i \text{ is female} \\ 0 & \text{if individual } i \text{ is male} \end{cases} \]
Assume the model
\[ Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i = \begin{cases} \beta_0 + \beta_1 + \varepsilon_i & \text{if } i \text{ is female} \\ \beta_0 + \varepsilon_i & \text{if } i \text{ is male} \end{cases} \]
Qualitative explanatory variables - cont.
Example: Credit data set
[Figure: boxplots of balance for African American, Asian and Caucasian]
Explanatory variable with three categories. Define
\[ x_{i1} = \begin{cases} 1 & \text{if individual } i \text{ is Asian} \\ 0 & \text{otherwise} \end{cases} \qquad
x_{i2} = \begin{cases} 1 & \text{if individual } i \text{ is Caucasian} \\ 0 & \text{otherwise} \end{cases} \]
Assume the model
\[ Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i = \begin{cases} \beta_0 + \beta_1 + \varepsilon_i & \text{if } i \text{ is Asian} \\ \beta_0 + \beta_2 + \varepsilon_i & \text{if } i \text{ is Caucasian} \\ \beta_0 + \varepsilon_i & \text{if } i \text{ is African American} \end{cases} \]
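R builds exactly these dummy variables automatically when a factor enters a model formula. A sketch, assuming the Credit data from the ISLR package (as used on the next slide):

# Sketch: inspecting the dummy coding R generates for a factor
library(ISLR)                               # assumed source of the Credit data
contrasts(Credit$Ethnicity)                 # baseline level and dummy columns
head(model.matrix(~ Ethnicity, data = Credit))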
Regression with qualitative variable in R
> class(Credit$Student)
[1] "factor"
> class(Credit$Ethnicity)
[1] "factor"
> fit.lm <- lm(Balance ~ Student + Ethnicity, data = Credit)
> summary(fit.lm)
Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)         490.776     45.411  10.807  < 2e-16 ***
StudentYes          398.221     74.391   5.353 1.47e-07 ***
EthnicityAsian      -29.216     62.899  -0.464    0.643
EthnicityCaucasian   -6.297     54.817  -0.115    0.909

Residual standard error: 445.6 on 396 degrees of freedom
Multiple R-squared: 0.06768, Adjusted R-squared: 0.06062
F-statistic: 9.583 on 3 and 396 DF, p-value: 4.025e-06
Quantitative and qualitative variables in R
> fit.lm2 <- lm(Balance ~ Age + Cards + Education + Income + Limit + Rating + Student + Ethnicity, data = Credit)
> summary(fit.lm2)
Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)        -487.65045   35.22809 -13.843  < 2e-16 ***
Age                  -0.59933    0.29304  -2.045   0.0415 *
Cards                18.06541    4.33008   4.172 3.72e-05 ***
Education            -1.16552    1.59422  -0.731   0.4652
Income               -7.79950    0.23395 -33.338  < 2e-16 ***
Limit                 0.19394    0.03258   5.953 5.86e-09 ***
Rating                1.08888    0.48785   2.232   0.0262 *
StudentYes          426.10483   16.61371  25.648  < 2e-16 ***
EthnicityAsian       15.01876   14.00721   1.072   0.2843
EthnicityCaucasian    9.24342   12.17138   0.759   0.4480

Residual standard error: 98.77 on 390 degrees of freedom
Multiple R-squared: 0.9549, Adjusted R-squared: 0.9538
F-statistic: 917.2 on 9 and 390 DF, p-value: < 2.2e-16
Extensions of the linear model
We saw interactions earlier:
\begin{align*}
Y &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon \\
  &= \beta_0 + (\beta_1 + \beta_3 x_2) x_1 + \beta_2 x_2 + \varepsilon \\
  &= \beta_0 + \beta_1 x_1 + (\beta_2 + \beta_3 x_1) x_2 + \varepsilon
\end{align*}
Non-linear in x, linear in β!
Variable selection with interactions:
Hierarchical principle: if an interaction term is included, also include the corresponding main effects, even if they are not significant.
Gives easier interpretation of the model.
Interaction between qualitative and quantitative variables
Credit data: we want to predict balance from Income (quantitative) and Student (qualitative):
> fit.lm3 <- lm(Balance ~ Income * Student, data = Credit)
Potential problems
Non-linearities
Correlations between noise terms
Non-constant variance of noise terms (heteroscedasticity)
Outliers
Observations with high influence
Collinearity
Non-linear relations
Auto data set
[Figure: scatter plot of miles per gallon against horsepower]
Is a linear model reasonable?
Alternative: Y = β0 + β1x + β2x^2 + · · · + βqx^q + ε
[Figure: fitted curves for q = 1 (linear), q = 2 and q = 5]
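Such polynomial models are still linear in the β's, so lm() fits them directly. A sketch, assuming the Auto data from the ISLR package:

# Sketch: polynomial regression of mpg on horsepower
library(ISLR)                                         # assumed source of Auto
fit.q1 <- lm(mpg ~ horsepower, data = Auto)           # q = 1
fit.q2 <- lm(mpg ~ poly(horsepower, 2), data = Auto)  # q = 2
fit.q5 <- lm(mpg ~ poly(horsepower, 5), data = Auto)  # q = 5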
Correlation between noise terms
Standard assumption:
Yi = β0 + β1xi1 + · · ·+ βpxip + εi , ε1, ..., εn independent
What if there is dependence in the noise terms?
For β̂ = (X^T X)^{-1} X^T y we still have E[Y] = Xβ and thereby E[β̂] = β.
However, V[β̂] ≠ σ²(X^T X)^{-1}; the variances are typically larger, which will influence inference.
It is then necessary to change the model.
Non-constant variance
We often assume V[ε] = σ², the same for all data.
[Figure: residuals against fitted values; left panel for response Y, right panel for response log(Y)]
Transformations can typically help!
Outliers
An outlier is a y-value far from the predicted value.
Easiest to identify through residual plots (linreg_outlier.R).
[Figure: fitted lines for all data and with the outlier excluded; residual plot where the outlier stands out]
Observations with high influence
Linear model: Yi = β0 + β1xi1 + · · · + βpxip + εi
LS estimate: β̂ = (X^T X)^{-1} X^T y
Prediction: $\hat Y = X\hat\beta = X(X^T X)^{-1} X^T Y = PY$, so that $\hat y_i = \sum_{j=1}^n P_{ij} y_j$
P_ii (the leverage) says how much influence y_i has on ŷ_i.
We do not want this influence to be too large (→ overfitting).
Large influence is typical for x-values that are unusual.
[Figure: data sets where observations 20 and 41 have unusual x-values, and studentized residuals plotted against leverage]
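Leverages and studentized residuals are directly available in R. A minimal sketch, where fit1 stands for any fitted lm object (e.g. the fit from linreg_outlier.R above):

# Sketch: leverage (P_ii) and studentized residuals
lev <- hatvalues(fit1)              # diagonal elements of the hat matrix P
rst <- rstudent(fit1)               # studentized residuals
plot(lev, rst, xlab = "Leverage", ylab = "Studentized residuals")
which(lev > 2 * mean(lev))          # a common rule of thumb for high leverage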
Collinearity - two variables
Credit data: Some x-variables highly correlated
[Figure: scatter plots of Age against Limit and of Rating against Limit]
> fit1.lm <- lm(Balance ~ Age + Limit, data = Credit)
> summary(fit1.lm)
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.928e+02  2.668e+01  -10.97  < 2e-16 ***
...
> fit2.lm <- lm(Balance ~ Limit + Rating, data = Credit)
> summary(fit2.lm)
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -377.53680   45.25418  -8.343 1.21e-15 ***
Limit          0.02451    0.06383   0.384   0.7012
Rating         2.20167    0.95229   2.312   0.0213 *
Can be identified through correlation matrices
Collinearity - several variables
Collinearity between more than two variables is more problematic.
The problem can be identified through the variance inflation factor (VIF), defined by
\[ \mathrm{VIF}(\hat\beta_j) = \frac{V_{\text{full}}(\hat\beta_j)}{V_{\text{single}}(\hat\beta_j)} \]
where $V_{\text{full}}(\hat\beta_j)$ is the variance of the estimate based on the model with all explanatory variables, and $V_{\text{single}}(\hat\beta_j)$ is the variance of the estimate based on $x_j$ alone as explanatory variable.
VIF(β̂j) ≥ 1; a low value indicates little collinearity.
> fit.full <- lm(Balance ~ ., data = Credit)
> library(car)
> vif(fit.full)
                 GVIF Df GVIF^(1/(2*Df))
X            1.030358  1        1.015066
Income       2.787231  1        1.669500
Limit      234.064316  1       15.299161
Rating     235.887178  1       15.358619
Cards        1.449767  1        1.204063
Age          1.054739  1        1.027005
Education    1.019588  1        1.009747
Gender       1.019885  1        1.009894
Student      1.032245  1        1.015994
Married      1.045300  1        1.022399
Ethnicity    1.040571  2        1.009992
Least squares
Data: {(x_1, y_1), ..., (x_n, y_n)}
Least squares estimate:
\[ \hat\beta = \underbrace{(X^T X)^{-1}}_{p \times p} \underbrace{X^T}_{p \times n} Y \]
Calculation of $X^T X = \sum_{i=1}^n x_i x_i^T$ is $O(np^2)$
Inverting $X^T X$ is $O(p^3)$
Calculation of $X^T Y$ is $O(np)$
In practice a Gram-Schmidt (QR) procedure is typically used.
Recursive methods
Define $W_n = \sum_{i=1}^n x_i x_i^T$ and $V_n = W_n^{-1}$. We have
\begin{align*}
W_{n+1} &= W_n + x_{n+1} x_{n+1}^T \\
V_{n+1} &= V_n - h_n V_n x_{n+1} x_{n+1}^T V_n \qquad \text{(Sherman-Morrison)} \\
h_n &= \frac{1}{1 + x_{n+1}^T V_n x_{n+1}}
\end{align*}
and
\begin{align*}
\hat\beta_{n+1} &= \hat\beta_n + k_n \underbrace{(y_{n+1} - x_{n+1}^T \hat\beta_n)}_{\text{prediction error}} \\
k_n &= h_n V_n x_{n+1}
\end{align*}
Sum of squares $Q_n = \sum_{i=1}^n (y_i - \hat y_i)^2$:
\[ Q_{n+1} = Q_n + h_n (y_{n+1} - x_{n+1}^T \hat\beta_n)^2 \]
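These updates are straightforward to implement. A sketch, not from the slides, on simulated data and checked against the batch solution:

# Sketch: recursive least squares via the Sherman-Morrison updates above
set.seed(1)
n <- 200; p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), ncol = p - 1))
y <- drop(X %*% c(1, 2, -1)) + rnorm(n)
n0 <- 10                                   # initialize on the first n0 points
V <- solve(t(X[1:n0, ]) %*% X[1:n0, ])
beta <- V %*% t(X[1:n0, ]) %*% y[1:n0]
for (i in (n0 + 1):n) {
  x.new <- X[i, ]
  h <- 1 / (1 + drop(t(x.new) %*% V %*% x.new))
  beta <- beta + h * (V %*% x.new) * drop(y[i] - t(x.new) %*% beta)  # k_n = h V x
  V <- V - h * (V %*% x.new) %*% (t(x.new) %*% V)                    # Sherman-Morrison
}
cbind(recursive = drop(beta), batch = coef(lm(y ~ X - 1)))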
Least squares and maximum likelihood
Assume now
\[ y_i = x_i^T \beta + \varepsilon_i, \qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2) \]
Likelihood = density of the observations:
\begin{align*}
L(\theta) = f(y \mid \theta) &\overset{\text{ind}}{=} \prod_{i=1}^n f(y_i \mid \theta) \\
&= \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Big( -\frac{1}{2\sigma^2} (y_i - x_i^T \beta)^2 \Big) \\
&= \frac{1}{(2\pi)^{n/2} \sigma^n} \exp\Big( -\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - x_i^T \beta)^2 \Big)
\end{align*}
Maximum likelihood principle: $\hat\theta = \arg\max_\theta L(\theta)$
Maximization with respect to β is equivalent to minimizing
\[ D(\beta) = \sum_{i=1}^n (y_i - x_i^T \beta)^2 \]
that is, least squares is equivalent to maximum likelihood under Gaussian noise.
Bonus: estimate for σ²:
\[ \hat\sigma^2 = \frac{1}{n} \sum_{i=1}^n (y_i - x_i^T \hat\beta)^2 \]
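Note the division by n here, versus n − p − 1 in the unbiased estimate s² used earlier. A short comparison for any fitted lm object (a sketch, reusing the simulated X and y from the recursive least squares sketch):

# Sketch: ML variance estimate vs the unbiased estimate from lm()
fit <- lm(y ~ X - 1)            # X already contains the intercept column
mean(residuals(fit)^2)          # ML estimate: divides by n
summary(fit)$sigma^2            # unbiased s^2: divides by n - p - 1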
Advantages with maximum likelihood
Possible with other models for the noise:
t distribution, allowing for some large noise terms
Possible for other outcomes:
Gamma distribution for positive continuous data
Binomial distribution for binary data
Poisson distribution for count data
General ML theory
Maximum likelihood principle:
\[ \hat\theta = \arg\max_\theta L(\theta) = \arg\max_\theta \log L(\theta) \]
Typically found as the solution of
\[ \frac{\partial}{\partial\theta} \log L(\theta) = 0 \]
Note: L involves products, log L involves sums; derivatives are easier for the latter.
Only guaranteed to give a local maximum.
As n → ∞, log L(θ) will in many cases converge towards a concave function, so the global maximum can be obtained.
Important quantity:
\[ \mathcal{J}(\hat\theta) = -\frac{\partial^2}{\partial\theta\,\partial\theta^T} \log L(\theta) \Big|_{\theta = \hat\theta} \qquad \text{Fisher's observed information matrix} \]
\[ \hat\theta \approx N(\theta, \mathcal{J}(\hat\theta)^{-1}) \]
Note: exact results are available for linear regression!
ML and inference
Confidence intervals: $\hat\theta_r \pm z_{\alpha/2}\, \mathrm{std.err}(\hat\theta_r)$
Testing $H_0: \theta_r = a$:
\[ t = \frac{\hat\theta_r - a}{\mathrm{std.err}(\hat\theta_r)} \overset{H_0}{\approx} N(0, 1), \qquad \text{P-value} \approx 2\Phi(-|t|) \]
Testing $H_0: g_j(\theta) = 0, \; j = 1, ..., q$:
Let $\hat\theta_0$ be the ML estimate under $H_0$. Then
\[ w = D = 2[\log L(\hat\theta) - \log L(\hat\theta_0)] \overset{H_0}{\approx} \chi^2_q \qquad \text{likelihood ratio statistic} \]
\[ \text{P-value} \approx \Pr(\chi^2_q > w) \]
Binary variables
Assume $y \sim \mathrm{Binom}(n, \pi)$. Then
\begin{align*}
\log L(\pi) &= \text{constant} + y \log(\pi) + (n - y) \log(1 - \pi) \\
\frac{\partial}{\partial\pi} \log L(\pi) &= \frac{y}{\pi} - \frac{n - y}{1 - \pi} \quad\Rightarrow\quad \hat\pi_{ML} = \frac{y}{n} \\
\frac{\partial^2}{\partial\pi^2} \log L(\pi) &= -\frac{y}{\pi^2} - \frac{n - y}{(1 - \pi)^2} \\
\frac{\partial^2}{\partial\pi^2} \log L(\hat\pi) &= -\frac{n}{\hat\pi} - \frac{n}{1 - \hat\pi} = -\frac{n}{\hat\pi(1 - \hat\pi)} \\
\mathrm{Var}[\hat\pi] &\approx \frac{\hat\pi(1 - \hat\pi)}{n}
\end{align*}
Binary variables - two groups
Assume $y_j \sim \mathrm{Binom}(n_j, \pi_j), \; j = 1, 2$
Test $H_0: \pi_1 = \pi_2 = \pi$
Under $H_0$: $\hat\pi = (y_1 + y_2)/(n_1 + n_2)$
Under the alternative: $\hat\pi_j = y_j / n_j$
\[ \log L(\pi_1, \pi_2) = \text{constant} + \sum_{j=1}^{2} [y_j \log(\pi_j) + (n_j - y_j) \log(1 - \pi_j)] \]
\[ D = 2[\log L(\hat\pi_1, \hat\pi_2) - \log L(\hat\pi, \hat\pi)] = D_0 - D_1 \qquad \text{deviance} \]
\[ D_0 = -2 \log L(\hat\pi, \hat\pi), \qquad D_1 = -2 \log L(\hat\pi_1, \hat\pi_2) \]
Under $H_0$: $D \approx \chi^2_1$
Example
Data from a Brazilian bank
Response: satisfaction, low/high
Groups: young/old

satisfaction\group   young    old   total
low                     84     34     118
high                   225    157     382
total                  309    191     500
π̂                    0.729  0.822   0.764
std.err(π̂)           0.025  0.026   0.019

D = 5.96, P-value 0.015
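These numbers can be reproduced directly from the table; a sketch, not from the slides:

# Sketch: likelihood ratio (deviance) test of pi_young = pi_old
y <- c(225, 157); n <- c(309, 191)     # "high" counts and group totals
pi.hat <- y / n                        # estimates under the alternative
pi0 <- sum(y) / sum(n)                 # pooled estimate under H0
loglik <- function(p) sum(y * log(p) + (n - y) * log(1 - p))
D <- 2 * (loglik(pi.hat) - loglik(pi0))
D                                      # approximately 5.96
1 - pchisq(D, df = 1)                  # P-value, approximately 0.015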
Logistic regression
Assume $y_i \sim \mathrm{Binom}(1, \pi_i), \; i = 1, ..., n$
We want to model $\pi_i = \pi(x_i)$ for some explanatory variables $x_i$
Linear relation:
\[ \eta(x) = \beta_0 + \beta_1 x_1 + \cdots + \beta_{p-1} x_{p-1} \]
Logistic regression:
\begin{align*}
\pi(x) &= \frac{\exp(\eta(x))}{1 + \exp(\eta(x))} \qquad \text{logistic function} \\
&= \frac{1}{1 + \exp(-\eta(x))} \qquad \text{sigmoid function} \\
&= \mathrm{sigmoid}(\eta(x))
\end{align*}
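In R such models are fitted with glm(). A minimal sketch with invented variable names (the Brazilian example follows two slides below):

# Sketch: logistic regression with glm(); y is 0/1, x numeric (invented names)
fit.glm <- glm(y ~ x, family = binomial, data = mydata)
summary(fit.glm)
predict(fit.glm, type = "response")    # pi(x) on the probability scale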
Logistic function
[Figure: the logistic function plotted against x for η(x) = 2 − 2x, η(x) = 2 + 3x and η(x) = x]
Brazilian data
Earlier: compared two groups, young and old.
Age is a numeric variable and could be used as an explanatory variable directly.
Brazilian_logist_reg.R
Generalized linear models
Logistic regression:
\begin{align*}
Y &\sim \mathrm{Binom}(n, \pi(x)) \\
E[Y \mid x] &= \pi(x) = \frac{e^{\eta(x)}}{1 + e^{\eta(x)}} \\
g(\pi) &= \log\Big( \frac{\pi}{1 - \pi} \Big) \\
g(E[Y \mid x]) &= \eta(x) = x^T \beta
\end{align*}
Special case of
\begin{align*}
Y &\sim f(\mu(x)), \qquad f \text{ in the exponential family} \\
\eta(x) &= g(\mu(x)) = x^T \beta \\
E[Y \mid x] &= \mu(x) = g^{-1}(\eta(x))
\end{align*}
Generalized linear models include linear regression, logistic regression, Poisson regression, ...
Theme in STK3100
K-nearest neighbor regression
Linear regression:
Simple to fit
Simple interpretation
Simple to perform different tests
Strong assumptions on the model
Non-parametric methods (machine learning):
Do not assume any explicit form
The K-nearest neighbor method (a sketch follows below):
\[ \hat f(x_0) = \frac{1}{K} \sum_{x_i \in N_0} y_i \]
where $N_0 \subset \{x_1, ..., x_n\}$ contains the K points nearest to $x_0$.
[Figure: KNN fits over two explanatory variables (x1, x2) for two values of K]
Choice of K: trade-off between bias and variance
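Written directly from this formula, KNN regression is a few lines of base R. A sketch, not from the slides, with one explanatory variable:

# Sketch: K-nearest neighbor regression from the definition
knn.reg <- function(x0, x, y, K) {
  nbrs <- order(abs(x - x0))[1:K]      # indices of the K nearest points
  mean(y[nbrs])                        # average their responses
}
set.seed(1)
x <- runif(100); y <- sin(2 * pi * x) + rnorm(100, sd = 0.2)
x.grid <- seq(0, 1, length.out = 200)
f.hat <- sapply(x.grid, knn.reg, x = x, y = y, K = 9)
plot(x, y); lines(x.grid, f.hat, col = "red")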
Parametric or non-parametric?
If the parametric model is close to the truth: the parametric method is better (smaller variance, small bias).
[Figure: nearly linear data with linear and KNN fits; mean squared error against 1/K]
Parametric or non-parametric?
If the parametric model is very wrong: a non-parametric method is better (smaller bias).
[Figure: two strongly non-linear examples with linear and KNN fits; mean squared error against 1/K]
Parametric or non-parametric - high dimension
For one explanatory variable: many observations can give good results for non-parametric methods.
For many explanatory variables: the K nearest xi's to x0 will typically be far away from x0, which gives large bias!
This is because the method relies on f(x0) ≈ f(xi) for xi close to x0.
[Figure: mean squared error against 1/K for p = 1, 2, 3, 4, 10 and 20 explanatory variables]