Linear Modelling: Simple Regression
10th of May 2018
R. Nicholls / D.-L. Couturier / M. Fernandes


Introduction:

ANOVA
• Used for testing hypotheses regarding differences between groups
• Considers the variation within and between groups

Regression
• Used for revealing and investigating relationships between input and output variables
• Model data, and extrapolate as much information as possible

Correlation:

How to measure the strength of a linear relationship between variables?

[Figure: scatter plot of y against x]


[Figure: scatter plots of y against x illustrating data that are positively correlated, negatively correlated, and uncorrelated]

Pearson's product-moment correlation coefficient:

$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$$

Coefficient of determination ($R^2$ value):

$$R^2 = r^2$$
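The two quantities above can be checked numerically. The following is a minimal sketch on simulated data (the data, seed and variable names are assumptions for illustration, not the data behind the slides); it writes out Pearson's r term by term and compares it with R's built-in `cor()` and with the R-squared reported by `lm()`.

```r
# Simulated data: an assumption for illustration only
set.seed(1)
x <- runif(50, 0, 50)
y <- 5 + 0.8 * x + rnorm(50, sd = 5)

# Pearson's product-moment correlation coefficient, term by term
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# Built-in equivalent
r_builtin <- cor(x, y)

# In simple regression the R-squared from lm() equals the squared correlation
r_squared <- summary(lm(y ~ x))$r.squared

print(c(manual = r_manual, builtin = r_builtin, r_squared = r_squared))
```

The sign of r carries the direction of the relationship, while R-squared discards it, which is why a strongly negative r still gives a large R-squared.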

[Figure: two scatter plots of y against x, one positively and one negatively correlated]

r = 0.931, R² = 0.866

r = -0.949, R² = 0.901

[Figure: two scatter plots of y against x with no clear linear relationship]

r = -0.060, R² = 0.004

r = 0.106, R² = 0.011

Can I say whether my data are correlated? Is an observed correlation significant?

[Figure: two scatter plots of y against x, each shown with its correlation test output]

data:  x and y
t = 17.613, df = 48, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.8802556 0.9602168
sample estimates:
      cor
0.9305923

data:  x and y
t = 1.5609, df = 48, p-value = 0.1251
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.06238066  0.46941403
sample estimates:
      cor
0.2197833
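Output of this kind comes from `cor.test()`. A minimal sketch on simulated data follows (the data and object names are assumptions, chosen only so that n = 50 and df = 48 as in the output above); it also recomputes the t statistic from r to show where that number comes from.

```r
# Simulated data: an assumption for illustration only
set.seed(2)
x <- runif(50, 0, 50)
y_strong <- x + rnorm(50, sd = 5)   # clear linear relationship
y_noise  <- rnorm(50, sd = 100)     # unrelated to x by construction

ct_strong <- cor.test(x, y_strong)  # Pearson test of H0: true correlation = 0
ct_noise  <- cor.test(x, y_noise)

# The test statistic is t = r * sqrt(n - 2) / sqrt(1 - r^2) on n - 2 df
r <- unname(ct_strong$estimate)
n <- length(x)
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)

print(ct_strong)
print(c(t_manual = t_manual, p_noise = ct_noise$p.value))
```

Applying the same formula to the first output above (r = 0.9305923, df = 48) reproduces t = 17.613.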

Simple Regression:

Aims:
• To investigate linear correlation between two variables in more detail
• Be able to predict response given a knowledge of the independent variable

[Figure: scatter plot of y against x. The x-axis is the predictor (independent) variable; the y-axis is the response (dependent) variable.]


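[Figure annotation: for the i-th observation $(x_i, y_i)$, $\varepsilon_i$ is the vertical distance between the point and the regression line]

$\varepsilon_i$ = errors, residuals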

Simple Regression:

So how do we fit the regression line?

Suppose we know parameter estimates $\hat{\alpha}$ and $\hat{\beta}$.

Observations: $y_i$

Fitted values: $\hat{y}_i = E(\alpha + \beta x_i + \varepsilon_i \mid \hat{\alpha}, \hat{\beta}) = \hat{\alpha} + \hat{\beta} x_i$

Residuals: $\varepsilon_i = y_i - \hat{y}_i$, so that $y_i = \hat{y}_i + \varepsilon_i$

Gaussian error model: $\varepsilon \sim N(0, \sigma^2)$

$$f(y \mid x; \alpha, \beta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - \hat{y})^2}{2\sigma^2}\right), \qquad \hat{y} = \alpha + \beta x$$

[Figure: scatter plot of y against x with the fitted line; for the point $(x_i, y_i)$ the residual $\varepsilon_i$ is the vertical distance from $y_i$ to the fitted value $\hat{y}_i$]

So how do we fit the regression line?

Obtain estimates $\hat{\alpha}$ and $\hat{\beta}$: maximise the likelihood of the parameters given the data.

Residuals: $\varepsilon_i = y_i - \hat{y}_i$, with $\hat{y}_i = \alpha + \beta x_i$

$$f(y \mid x; \alpha, \beta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - \hat{y})^2}{2\sigma^2}\right)$$

$$L(\alpha, \beta \mid x, y) = \prod_i f(y_i \mid x_i; \alpha, \beta) = \prod_i \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - \hat{y}_i)^2}{2\sigma^2}\right)$$

$$\ln L(\alpha, \beta \mid x, y) = \sum_i \left[ -\frac{1}{2} \ln(2\pi\sigma^2) - \frac{(y_i - \hat{y}_i)^2}{2\sigma^2} \right] = -\frac{n}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_i (y_i - \hat{y}_i)^2$$
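The log-likelihood above can be maximised numerically as a check on the derivation. The sketch below is one possible way to do it, not the method used in the slides: it uses R's built-in `trees` data, centres the predictor to keep the optimisation well conditioned, and parameterises sigma on the log scale. The function name `neg_log_lik` and these choices are assumptions made for illustration.

```r
# Negative Gaussian log-likelihood of a simple regression model.
# par = c(alpha, beta, log(sigma)); sigma is kept positive via the log scale.
neg_log_lik <- function(par, x, y) {
  y_hat <- par[1] + par[2] * x
  sigma <- exp(par[3])
  -sum(dnorm(y, mean = y_hat, sd = sigma, log = TRUE))
}

y  <- trees$Volume
xc <- trees$Girth - mean(trees$Girth)   # centred predictor (illustrative choice)

# Numerical maximum likelihood (minimise the negative log-likelihood)
fit_ml <- optim(par = c(mean(y), 0, log(sd(y))), fn = neg_log_lik,
                x = xc, y = y, method = "BFGS")

# Least squares fit of the same model
fit_ls <- lm(y ~ xc)

print(rbind(max_likelihood = fit_ml$par[1:2], least_squares = coef(fit_ls)))
```

Up to optimiser tolerance the two rows agree, which is the numerical counterpart of the statement that the sum of squared residuals is the only part of the log-likelihood that depends on $\alpha$ and $\beta$.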

So how do we fit the regression line?

Obtain estimates $\hat{\alpha}$ and $\hat{\beta}$: maximise the likelihood of the parameters given the data, which is the same as minimising the sum of squared residuals $\varepsilon_i = y_i - \hat{y}_i$.

$$(\hat{\alpha}, \hat{\beta}) = \arg\max_{\alpha, \beta} L(\alpha, \beta \mid x, y) = \arg\min_{\alpha, \beta} \sum_i (y_i - \hat{y}_i)^2$$

Optimal parameters: minimise the residual sum of squares. Maximum Likelihood and Least Squares estimates are equivalent (for the Gaussian error model).

[Figure: scatter plot of y against x with the fitted regression line]

Final answer:

$$\hat{\beta} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}$$
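The standard closed-form least squares estimates for simple regression can be compared with `lm()` directly. This is a minimal sketch using R's built-in `trees` data (Volume against Girth); the variable names `alpha_hat` and `beta_hat` are chosen here for illustration.

```r
x <- trees$Girth
y <- trees$Volume

# Closed-form least squares estimates for y = alpha + beta * x
beta_hat  <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
alpha_hat <- mean(y) - beta_hat * mean(x)

# Same model fitted with lm()
m1 <- lm(Volume ~ Girth, data = trees)

print(c(alpha_hat = alpha_hat, beta_hat = beta_hat))
print(coef(m1))
```

Note that the slope can also be written as r times sd(y) / sd(x), which ties the regression line back to the correlation coefficient.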

Simple Regression:

Example: Predicting timber volume of felled black cherry trees

Response: y = Volume
Predictor: x = Girth

[Figure: scatter plot of Volume against Girth]

> cor(trees$Volume,trees$Girth)
[1] 0.9671194

> m1 = lm(Volume~Girth,data=trees)
> summary(m1)

Call:
lm(formula = Volume ~ Girth, data = trees)

Residuals:
   Min     1Q Median     3Q    Max
-8.065 -3.107  0.152  3.495  9.587

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -36.9435     3.3651  -10.98 7.62e-12 ***
Girth         5.0659     0.2474   20.48  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.252 on 29 degrees of freedom
Multiple R-squared:  0.9353,    Adjusted R-squared:  0.9331
F-statistic: 419.4 on 1 and 29 DF,  p-value: < 2.2e-16


[Figure: histogram of the residuals of lm(Volume ~ Girth) (x-axis: Residuals, y-axis: Frequency)]

Residual standard error: $\hat{\sigma} = 4.252$, so $\hat{\sigma}^2 = 18.1$

Under the Gaussian error model, about 95% of residuals are expected to lie within ±8.5 (roughly $2\hat{\sigma}$) of the fitted line.
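These figures can be recovered from the fitted model. A minimal sketch, again on R's built-in `trees` data; the names `sigma_hat` and `within_2sd` are illustrative.

```r
m1 <- lm(Volume ~ Girth, data = trees)
r1 <- residuals(m1)

# Residual standard error: sqrt(RSS / (n - 2)) for a two-parameter model
sigma_hat <- sqrt(sum(r1^2) / df.residual(m1))

# Observed share of residuals within two residual standard errors of zero
within_2sd <- mean(abs(r1) <= 2 * sigma_hat)

print(c(sigma_hat = sigma_hat, sigma_hat_sq = sigma_hat^2,
        two_sigma = 2 * sigma_hat, within_2sd = within_2sd))

hist(r1, xlab = "Residuals", main = "")  # visual check of the error distribution
```

The 95% figure is a property of the assumed Gaussian model; with only 31 observations the observed share can differ from it.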


Linear Regression:

Assumptions:
1. Model is linear in parameters.
2. Gaussian error model.
3. Additive error model.
4. Independence of errors. No autocorrelation (when one observation depends on the last).
5. Homoscedasticity. Homogeneity/stability of variance of the residuals.

Testing Assumptions: diagnostic plots

1. Residuals vs Fitted Values
• Should not be related
• No visible pattern
• Mean residual = zero
• Constant variance

[Figure: "Residuals vs Fitted" plot for lm(Volume ~ Girth); x-axis: Fitted values, y-axis: Residuals; observations 19, 20 and 31 are labelled]

2. Normal Quantile-Quantile Plot
• Visual test for Normality
• No strong trends/departures

[Figure: "Normal Q-Q" plot for lm(Volume ~ Girth); x-axis: Theoretical Quantiles, y-axis: Standardized residuals; observations 19, 20 and 31 are labelled]

3. Scale-Location Plot
• Test for homoscedasticity
• Should be constant, ≈1
• No trend

[Figure: "Scale-Location" plot for lm(Volume ~ Girth); x-axis: Fitted values, y-axis: Standardized residuals; observations 19, 20 and 31 are labelled]

4. Index Plot of Cook's Distance
• Measures the influence of a particular observation
• Extreme x-vals: high leverage
• May inform outlier rejection

[Figure: "Residuals vs Leverage" plot for lm(Volume ~ Girth); x-axis: Leverage, y-axis: Standardized residuals; Cook's distance contours at 0.5 and 1; observation 31 is among the labelled points]
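The four diagnostics can be produced with `plot()` on the fitted model, and each has a numeric counterpart. The sketch below uses the built-in `trees` data; drawing to a null PDF device is an assumption made only so the example also runs non-interactively, and can be dropped in an interactive session.

```r
m1 <- lm(Volume ~ Girth, data = trees)

pdf(NULL)                 # null device: nothing is written to disk
par(mfrow = c(2, 2))
plot(m1)                  # Residuals vs Fitted, Normal Q-Q, Scale-Location,
                          # Residuals vs Leverage
plot(m1, which = 4)       # index plot of Cook's distance
invisible(dev.off())

# Numeric counterparts of what the plots show
mean_resid <- mean(residuals(m1))   # should be (numerically) zero
lev <- hatvalues(m1)                # leverage: large for extreme x values
cook <- cooks.distance(m1)          # influence of each observation

most_influential <- names(which.max(cook))
print(c(mean_resid = mean_resid, max_leverage = max(lev), max_cook = max(cook)))
print(most_influential)
```

Leverage depends only on the x values, whereas Cook's distance combines leverage with the size of the residual, so an observation needs both an extreme x value and a poor fit to be highly influential.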

Modelling Non-Linear Relationships

Linear models can be used to describe non-linear relationships…

Applying transformations to response and/or predictor variables can be useful to:
• Linearise the data, i.e. make the relationship between variables more linear.
• Stabilise the variance of the residuals, so that σ² doesn't depend on the independent variable.
• Normalise the distribution of the residuals.

Example: Stopping distance of cars versus speed (mph)

Response: y = distance
Predictor: x = speed

[Figure: scatter plot of dist against speed, with the "Residuals vs Fitted" plot for lm(dist ~ speed)]
R² = 0.651

[Figure: scatter plot of sqrt(dist) against speed, with the "Residuals vs Fitted" plot for lm(sqrt(dist) ~ speed)]
R² = 0.709

[Figure: scatter plot of log(dist) against log(speed), with the "Residuals vs Fitted" plot for lm(log(dist) ~ log(speed))]
R² = 0.733

Call:
lm(formula = log(dist) ~ log(speed), data = cars)

Residuals:
     Min       1Q   Median       3Q      Max
-1.00215 -0.24578 -0.02898  0.20717  0.88289

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.7297     0.3758  -1.941   0.0581 .
log(speed)    1.6024     0.1395  11.484 2.26e-15 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4053 on 48 degrees of freedom
Multiple R-squared:  0.7331,    Adjusted R-squared:  0.7276
F-statistic: 131.9 on 1 and 48 DF,  p-value: 2.259e-15
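The three fits compared in this example can be reproduced on R's built-in `cars` data. A minimal sketch follows; the list name `fits` and its element names are illustrative.

```r
# Three candidate models for stopping distance as a function of speed
fits <- list(
  linear  = lm(dist ~ speed, data = cars),
  sqrt    = lm(sqrt(dist) ~ speed, data = cars),
  log_log = lm(log(dist) ~ log(speed), data = cars)
)

# R-squared of each fit
r2 <- sapply(fits, function(m) summary(m)$r.squared)
print(round(r2, 3))
```

Because the response is on a different scale in each model, these R-squared values are only a rough guide; the residual plots are the more direct check of whether a transformation has linearised the data and stabilised the variance.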

Modelling Non-Linear Relationships

Can you use simple regression to fit this model?

$$y = \alpha x^{\beta} \varepsilon$$

• Non-linear
• Multiplicative error model

Yes, so long as the error model is log-Normal. Taking logs gives $\log y = \log\alpha + \beta \log x + \log\varepsilon$, which is linear in the parameters with an additive Gaussian error.

For the stopping distance example, the fit of log(dist) on log(speed) has intercept -0.7297 and slope 1.6024, which corresponds to dist ≈ e^{-0.7297} × speed^{1.6024} ≈ 0.48 × speed^{1.6024}.

[Figure: scatter plot of dist against speed]
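Predictions from a log-log fit need to be transformed back to the original scale. The following is a minimal sketch on the built-in `cars` data, assuming the power-law form with multiplicative error discussed here; the names `a`, `b` and `new_speed` are illustrative.

```r
m_log <- lm(log(dist) ~ log(speed), data = cars)

# Power-law parameters on the original scale: dist = a * speed^b
a <- exp(coef(m_log)[["(Intercept)"]])
b <- coef(m_log)[["log(speed)"]]

# Predicted stopping distance at two illustrative speeds (mph)
new_speed <- c(10, 20)
pred_manual <- a * new_speed^b
pred_lm <- exp(predict(m_log, newdata = data.frame(speed = new_speed)))

print(rbind(manual = pred_manual, via_predict = unname(pred_lm)))
```

Exponentiating a prediction made on the log scale gives the median of the log-Normal response rather than its mean, so these values are best read as typical stopping distances.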

Simple Regression in R:

Correlation Coefficients:

R functions:

plot(x,y)
cor(x,y)
cor.test(x,y)

Simple Regression in R:

R functions:

plot(x,y)           # scatter plot of the data
m1 = lm(y~x)        # fit the simple regression model
abline(m1)          # add the fitted line to the scatter plot

summary(m1)         # coefficients, standard errors, R-squared

r1 = residuals(m1)  # extract the residuals from the fitted model
hist(r1)            # histogram of the residuals

plot(m1)            # diagnostic plots

[Figure: the four diagnostic plots produced by plot(m1): Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage with Cook's distance contours]
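One of the stated aims is to predict the response from the independent variable, which in R is done with `predict()`. A minimal sketch on the built-in `trees` data; the girth values in `new_trees` are arbitrary illustrative inputs.

```r
m1 <- lm(Volume ~ Girth, data = trees)

# New predictor values at which to predict the response (illustrative)
new_trees <- data.frame(Girth = c(10, 15))

# Point predictions with 95% prediction intervals for a new observation
p <- predict(m1, newdata = new_trees, interval = "prediction", level = 0.95)
print(p)
```

Using `interval = "confidence"` instead gives the narrower interval for the mean response at each girth; both should be treated with caution outside the range of the observed predictor values.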