Regression Diagnostics Using R - Yiqing Xu, yiqingxu.org/teaching/17800/handout_17800diagnostics.pdf


Regression Diagnostics Using R

Yiqing Xu

Based on Jens Hainmueller’s MIT Lecture 17800.10-1/2

Data used in this handout is based on Eggers-Hainmueller (2009)

1 Overview

1. Scatterplot: car and lattice

2. Regression Diagnostics

• Hat-values: identifying leverage points

• Studentized residuals: identifying outliers

• QQ plot: evaluating model fit and normality

• DFBetas: evaluating influence for each coefficient

• Cook’s distance: summarizing influence across coefficients

• Automatic regression diagnosis

3. Standard Error Adjustment

• Breusch-Pagan test

• Robust standard errors


2 Scatterplot

[Figure: scatterplot matrix of wealth, runage, and served]

R Code

library(foreign) # read.dta() reads Stata files
d <- read.dta("bp.dta") # loading the data
library(car) # scatterplotMatrix() and scatterplot()
scatterplotMatrix(d[,c("wealth","runage","served")])


[Figure: scatterplot of wealth against runage, with each point labeled by name]

R Code

scatterplot(d$wealth~d$runage,xlab="runage",ylab="wealth")
text(y=d$wealth,x=d$runage,labels=d$name,pos=3,cex=.6,col=4) # label each point with its name


[Figure: lattice plot of wealth against runage, in separate panels by served, with linear and loess fits]

R Code

library(lattice)
# custom panel: points, least-squares line, and loess smooth
mypanel <- function(x,y,...) {
  panel.xyplot(x,y,...)
  panel.lmline(x,y)
  panel.loess(x,y)
}
xyplot(wealth~runage|served,data=d,panel=mypanel)


3 Regression Diagnostics

3.1 Testing Two Models

----------------------------------------------------------
t test of coefficients:

            Estimate Std. Error t value  Pr(>|t|)
(Intercept)  2349006     702638  3.3431 0.0009808 ***
runage        -40120      16862 -2.3794 0.0182380 *
served        691895     290422  2.3824 0.0180933 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
----------------------------------------------------------
----------------------------------------------------------
t test of coefficients:

             Estimate Std. Error t value  Pr(>|t|)
(Intercept) 13.599589   0.451991 30.0882 < 2.2e-16 ***
runage      -0.025181   0.010847 -2.3215 0.0212179 *
served       0.698336   0.186822  3.7380 0.0002392 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
----------------------------------------------------------

R Code

library(lmtest)
d <- na.omit(d) # drop rows with missing values
mod1 <- lm(wealth~runage+served,data=d) # Model 1: raw wealth
mod2 <- lm(log(wealth)~runage+served,data=d) # Model 2: log of wealth
coeftest(mod1)
coeftest(mod2)


3.2 Hat-values

[Figure: dot plot of hat-values (hatv) by name for observations with hatv > 0.02, with the cutoff 2k/n drawn as a vertical line]

R Code

d$hatv <- hatvalues(mod1)
ds <- d[order(d$hatv),] # sorted copy; d keeps the row order used to fit mod1
ds$name <- factor(ds$name,levels=ds$name,ordered=TRUE)
n <- mod1$df.residual + mod1$rank # num of obs
k <- mod1$rank # num of coefficients, including the intercept
cutoffhatv <- 2*k/n
mypanel <- function(x,y,...){
  panel.dotplot(x,y,...)
  panel.abline(v=cutoffhatv,col="green")
}
dotplot(name~hatv,data=ds[ds$hatv>.02,],panel=mypanel)
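As a minimal sketch of what hatvalues() returns, the block below rebuilds the hat matrix H = X(X'X)^{-1}X' by hand and applies the same 2k/n cutoff. It uses R's built-in mtcars data as a stand-in, since bp.dta is not distributed with this handout; the model mpg ~ wt + hp and the names fit, h_manual, and cutoff are illustrative assumptions, not part of the original analysis.

```r
# Illustrative stand-in model on built-in data (not the handout's bp.dta)
fit <- lm(mpg ~ wt + hp, data = mtcars)

X <- model.matrix(fit)                    # n x k design matrix, intercept included
H <- X %*% solve(crossprod(X)) %*% t(X)   # hat matrix H = X (X'X)^{-1} X'
h_manual <- diag(H)                       # leverage of each observation

n <- nrow(X)
k <- ncol(X)
cutoff <- 2 * k / n                       # same rule of thumb as cutoffhatv

# The manual diagonal should agree with hatvalues() up to rounding error
all.equal(unname(h_manual), unname(hatvalues(fit)))

# Hat-values sum to the number of coefficients k
sum(h_manual)

# Observations whose leverage exceeds the cutoff
names(h_manual)[h_manual > cutoff]
```

Because the hat-values sum to k, their average is k/n, so the 2k/n rule flags observations with more than twice the average leverage.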


3.3 Studentized Residuals

[Figure: index plot of studentized residuals for Model 1, with the 95% cutoff and the adjusted cutoff drawn as horizontal lines; labeled points: Thomas,Beattie-Edwards; Marcus,Fox; Ian,Percival]

R Code

d$studresid <- rstudent(mod1)
cutoffstud <- qt(.025, n-k, lower.tail=FALSE) # two-sided 95% cutoff
cutoffstudadj <- qt(.025/(n-k), n-k, lower.tail=FALSE) # cutoff adjusted for multiple comparisons
plot(d$studresid, ylab="Studentized Residuals", pch=19)
abline(h=cutoffstud, col="blue")
abline(h=cutoffstudadj, col="red")
legend("topleft", legend=c("Cutoff 95%","Adjusted Cutoff"),
       lty=1, col=c("blue","red"), cex=.6)
# label the observations above the adjusted cutoff
text(y=d$studresid[d$studresid>cutoffstudadj],
     x=(1:length(mod1$fitted.values))[d$studresid>cutoffstudadj],
     labels=d$name[d$studresid>cutoffstudadj], pos=1, cex=.6)
graphics.off() # closes all graphics devices; only needed when plotting to a file
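The following sketch shows how the values returned by rstudent() can be reproduced by hand, again on the built-in mtcars data as a stand-in for bp.dta (the model and all variable names are illustrative assumptions). Each residual is scaled by a leave-one-out estimate of the error variance. Under the usual model assumptions each studentized residual follows a t distribution with n-k-1 degrees of freedom, so the sketch uses that reference distribution and divides by n for the Bonferroni adjustment; the handout's code uses n-k in both places, which gives nearly identical cutoffs when n is large.

```r
# Illustrative stand-in model on built-in data (not the handout's bp.dta)
fit <- lm(mpg ~ wt + hp, data = mtcars)

e <- residuals(fit)
h <- hatvalues(fit)
n <- length(e)
k <- length(coef(fit))

s2     <- sum(e^2) / (n - k)                              # usual error variance
s2_del <- ((n - k) * s2 - e^2 / (1 - h)) / (n - k - 1)    # variance with obs i left out
t_manual <- e / sqrt(s2_del * (1 - h))                    # externally studentized residuals

# Should agree with rstudent() up to rounding error
all.equal(t_manual, rstudent(fit))

# Two-sided 95% cutoff and Bonferroni-adjusted cutoff
cutoff     <- qt(.025, n - k - 1, lower.tail = FALSE)
cutoff_adj <- qt(.025 / n, n - k - 1, lower.tail = FALSE)

# Observations beyond each cutoff (in absolute value)
names(t_manual)[abs(t_manual) > cutoff]
names(t_manual)[abs(t_manual) > cutoff_adj]
```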


3.4 Q-Q Plot

[Figure: Q-Q plots of studentized residuals against t quantiles. (a) Model 1: Raw data. (b) Model 2: After log transformation.]

R Code

qqPlot(mod1,distribution="t",envelope=TRUE) # name the argument; a bare "t" is taken as the x-axis label
qqPlot(mod2,distribution="t",envelope=TRUE)
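For readers without car installed, a rough base-R equivalent of the plot above can be sketched as follows. It omits the confidence envelope that qqPlot() draws, and it uses the built-in mtcars data with an illustrative model in place of bp.dta; all names are assumptions made for the example.

```r
# Illustrative stand-in model on built-in data (not the handout's bp.dta)
fit <- lm(mpg ~ wt + hp, data = mtcars)

r  <- sort(rstudent(fit))          # ordered studentized residuals
n  <- length(r)
df <- df.residual(fit) - 1         # n - k - 1 degrees of freedom
q  <- qt(ppoints(n), df = df)      # matching theoretical t quantiles

plot(q, r, xlab = "t Quantiles", ylab = "Studentized Residuals")
abline(a = 0, b = 1, lty = 2)      # points near this line match the t reference distribution
```

Points that bend away from the dashed line in the tails, as in panel (a) for the raw-wealth model, suggest heavy-tailed or skewed errors.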


3.5 DFBetas

[Figure: scatterplot of DFBETAS for runage (vertical axis) against DFBETAS for served (horizontal axis), with each point labeled by name]

R Code

dfb <- dfbetas(mod1) # one row per observation, one column per coefficient
2/sqrt(n) # conventional cutoff for |DFBETAS|
plot(dfb[,3],dfb[,2],xlab="DFBETAS served",ylab="DFBETAS runage")
text(dfb[,3],dfb[,2],labels=d$name,pos=1,cex=.6,col=4)
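The 2/sqrt(n) value printed above can also be used to flag observations directly. The sketch below does that on the built-in mtcars data (a stand-in for bp.dta; the model and names are illustrative assumptions) and checks one row of dfbetas() against an explicit leave-one-out refit, which is the definition the measure is based on.

```r
# Illustrative stand-in model on built-in data (not the handout's bp.dta)
fit <- lm(mpg ~ wt + hp, data = mtcars)

dfb    <- dfbetas(fit)             # n x k matrix of standardized coefficient changes
n      <- nrow(dfb)
cutoff <- 2 / sqrt(n)

flagged <- abs(dfb) > cutoff       # TRUE where an observation moves a coefficient a lot
colSums(flagged)                   # number of flagged observations per coefficient
round(dfb[rowSums(flagged) > 0, , drop = FALSE], 2)

# Check the definition for one observation by refitting without it
i       <- 1
fit_i   <- lm(mpg ~ wt + hp, data = mtcars[-i, ])
xtx_inv <- solve(crossprod(model.matrix(fit)))
manual  <- (coef(fit) - coef(fit_i)) /
           (summary(fit_i)$sigma * sqrt(diag(xtx_inv)))
all.equal(unname(manual), unname(dfb[i, ]))
```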


3.6 Influence Plots

[Figure: influence plot of studentized residuals against hat values, with circles sized by Cook's distance; labeled points: Donald,Thompson; Thomas,Beattie-Edwards; Marcus,Fox; William,Lowe; Ian,Percival; Cyril,Lipman]

R Code

# circle radius is proportional to the square root of Cook's distance
symbols(y=rstudent(mod1), x=hatvalues(mod1), circles=sqrt(cooks.distance(mod1)),
        ylab="studentized residuals",xlab="hat values",
        ylim=c(-1,9.5),xlim=c(0,.26))
abline(h=cutoffstud,col="blue")
abline(h=cutoffstudadj,col="red")
abline(v=cutoffhatv,col="green")
# label outliers and high-leverage observations
filter <- rstudent(mod1) > cutoffstudadj | hatvalues(mod1) > cutoffhatv
text(y=rstudent(mod1)[filter], x=hatvalues(mod1)[filter], labels=d$name[filter],pos=3)
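Cook's distance combines the two quantities on the axes of the influence plot: for observation i it equals (r_i^2 / k) * h_i / (1 - h_i), where r_i is the standardized residual and h_i the hat-value. The sketch below verifies this identity on the built-in mtcars data (a stand-in for bp.dta; the model and names are illustrative assumptions).

```r
# Illustrative stand-in model on built-in data (not the handout's bp.dta)
fit <- lm(mpg ~ wt + hp, data = mtcars)

h <- hatvalues(fit)                # leverage
r <- rstandard(fit)                # internally standardized residuals
k <- length(coef(fit))             # number of coefficients, including the intercept

d_manual <- r^2 / k * h / (1 - h)  # Cook's distance from residual size and leverage

# Should agree with cooks.distance() up to rounding error
all.equal(d_manual, cooks.distance(fit))

# The three most influential observations
head(sort(d_manual, decreasing = TRUE), 3)
```

An observation therefore needs both a large residual and high leverage to obtain a large Cook's distance, which is why the largest circles in the influence plot sit away from both axes.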


3.7 Automatic Regression Diagnostics

[Figure: default lm diagnostic plots (Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage with Cook's distance contours). (c) Model 1: Raw data. (d) Model 2: After log transformation.]

R Code

par(mfrow=c(2,2)) # 2 x 2 grid for the four default diagnostic plots
plot(mod1)
par(mfrow=c(2,2))
plot(mod2)


4 SE Adjustment

4.1 Breusch-Pagan Test

----------------------------------------------------------
Breusch-Pagan test

data: mod1
BP = 83.4522, df = 2, p-value < 2.2e-16
----------------------------------------------------------

R Code

library(lmtest)
bptest(mod1, studentize = FALSE) # original (non-studentized) Breusch-Pagan test
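The non-studentized Breusch-Pagan statistic can be reproduced with an auxiliary regression. The sketch below is one way to do so in base R, using the built-in mtcars data as a stand-in for bp.dta (the model and variable names are illustrative assumptions). It regresses the scaled squared residuals on the original regressors and takes half the explained sum of squares; the comparison with lmtest::bptest() runs only if lmtest is installed.

```r
# Illustrative stand-in model on built-in data (not the handout's bp.dta)
fit <- lm(mpg ~ wt + hp, data = mtcars)

e      <- residuals(fit)
n      <- length(e)
sigma2 <- sum(e^2) / n                 # ML estimate of the error variance
f      <- e^2 / sigma2 - 1             # scaled squared residuals, mean zero

aux <- lm(f ~ wt + hp, data = mtcars)  # auxiliary regression on the same regressors
bp  <- 0.5 * sum(fitted(aux)^2)        # half the explained sum of squares
df  <- length(coef(aux)) - 1           # one degree of freedom per regressor
p   <- pchisq(bp, df, lower.tail = FALSE)

c(BP = bp, df = df, p.value = p)

# Optional cross-check against lmtest, if it is available
if (requireNamespace("lmtest", quietly = TRUE)) {
  print(lmtest::bptest(fit, studentize = FALSE))
}
```

A small p-value, as in the output for mod1 above, is evidence against constant error variance and motivates the robust standard errors in the next subsection.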

4.2 Robust SE

----------------------------------------------------------
t test of coefficients:

            Estimate Std. Error t value  Pr(>|t|)
(Intercept)  2349006     702638  3.3431 0.0009808 ***
runage        -40120      16862 -2.3794 0.0182380 *
served        691895     290422  2.3824 0.0180933 *
----------------------------------------------------------
t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2349006     813865  2.8862 0.004306 **
runage        -40120      19900 -2.0161 0.045065 *
served        691895     285120  2.4267 0.016081 *
----------------------------------------------------------
t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2349006     819658  2.8658 0.004582 **
runage        -40120      20042 -2.0018 0.046589 *
served        691895     287149  2.4095 0.016835 *
----------------------------------------------------------
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R Code

library(sandwich)
library(lmtest)
coeftest(mod1) # homoskedasticity
coeftest(mod1,vcov=vcovHC(mod1,type="HC0")) # classic White
coeftest(mod1,vcov=vcovHC(mod1,type="HC1")) # small sample correction
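To make the two estimators concrete, the sketch below computes them by hand in base R. HC0 is the sandwich (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}, and HC1 multiplies it by n/(n-k). The example uses the built-in mtcars data as a stand-in for bp.dta; the model and variable names are illustrative assumptions, and the comparison with sandwich::vcovHC() runs only if sandwich is installed.

```r
# Illustrative stand-in model on built-in data (not the handout's bp.dta)
fit <- lm(mpg ~ wt + hp, data = mtcars)

X <- model.matrix(fit)
e <- residuals(fit)
n <- nrow(X)
k <- ncol(X)

bread  <- solve(crossprod(X))          # (X'X)^{-1}
meat   <- crossprod(X * e)             # X' diag(e^2) X; row i of X is scaled by e_i
vc_hc0 <- bread %*% meat %*% bread     # classic White estimator
vc_hc1 <- vc_hc0 * n / (n - k)         # small-sample correction

se <- cbind(classical = sqrt(diag(vcov(fit))),
            HC0       = sqrt(diag(vc_hc0)),
            HC1       = sqrt(diag(vc_hc1)))
round(se, 4)

# Optional cross-check against sandwich, if it is available
if (requireNamespace("sandwich", quietly = TRUE)) {
  print(all.equal(unname(vc_hc0), unname(sandwich::vcovHC(fit, type = "HC0"))))
}
```

Because n/(n-k) is greater than one, the HC1 standard errors are always slightly larger than the HC0 ones, which matches the pattern in the output above.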
