Multiple regression

Multiple regression. Problem: to draw a straight line through the points that best explains the variance Regression


Page 1:

Multiple regression

Page 2:

Problem: to draw a straight line through the points that best explains the variance

Regression

[Figure: scatter plot of data points; x-axis 0 to 6, y-axis 0 to 9.]


Page 5:

Test with F, just like ANOVA:

F = (variance explained by x-variable / df) / (variance still unexplained / df)

Regression

[Figure: two scatter plots, each with x-axis 0 to 6 and y-axis 0 to 9. One shows the variance explained (change in line lengths, squared); the other shows the variance unexplained (residual line lengths, squared).]

Page 6:

Regression

In regression, each x-variable will normally have 1 df

Page 7:

Regression

Essentially a cost-benefit analysis:

Is the benefit in variance explained worth the cost in using up degrees of freedom?


Page 9:

Regression example

Total variance for 32 data points is 300 units.

An x-variable is then regressed against the data, accounting for 150 units of variance.

1. What is the R²?

2. What is the F ratio?

R² = 150/300 = 0.5

F1,30 = (150/1) / (150/30) = 30

Why is df error = 30?
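
The arithmetic in this example can be checked in a few lines of Python. This is only a sketch, and the variable names are ours rather than the slides'. It also suggests an answer to the last question, assuming the usual accounting: 32 points give 31 total df, 1 df goes to the x-variable, and 30 are left for error.

```python
# Worked example from the slide: 32 data points, 300 units of total variance,
# of which the single x-variable accounts for 150 units.
n_points = 32
ss_total = 300.0
ss_explained = 150.0

ss_unexplained = ss_total - ss_explained   # variance still unexplained
df_model = 1                               # one x-variable, so 1 df
df_error = n_points - 1 - df_model         # 32 - 1 (mean/intercept) - 1 (slope) = 30

r_squared = ss_explained / ss_total
f_ratio = (ss_explained / df_model) / (ss_unexplained / df_error)

print(f"R^2 = {r_squared}")
print(f"F{df_model},{df_error} = {f_ratio}")
```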

Page 10:

Multiple regression

[Figure: Herbivore damage (y-axis) against Tree age (x-axis), with higher nutrient trees and lower nutrient trees marked as separate groups.]

Damage = m1*age + b


Page 12:

[Figure: two panels. Herbivore damage against Tree age; Residuals of herbivore damage against Tree nutrient concentration.]

Damage = m1*age + m2*nutrient + b
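
The two plots suggest a two-step reading of this model: fit Damage against age, then ask whether nutrient explains what is left over. A minimal Python sketch of that idea follows. The numbers are made up and noise-free, nutrient is coded as two levels (0 and 1), and the helper `simple_fit` is hypothetical; none of this comes from the slides.

```python
def simple_fit(x, y):
    """Least-squares slope and intercept for y = m*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx
    return m, mean_y - m * mean_x

# Hypothetical data: every age appears once at each nutrient level,
# so age and nutrient are uncorrelated in this design.
age      = [1, 2, 3, 4, 1, 2, 3, 4]
nutrient = [0, 0, 0, 0, 1, 1, 1, 1]
damage   = [2 * a + 5 * k + 1 for a, k in zip(age, nutrient)]

# Step 1: Damage = m1*age + b
m1, b1 = simple_fit(age, damage)
residuals = [d - (m1 * a + b1) for a, d in zip(age, damage)]

# Step 2: regress the residuals of damage on nutrient
m2, b2 = simple_fit(nutrient, residuals)

print(f"Damage = {m1:.2f}*age + {m2:.2f}*nutrient + {b1 + b2:.2f}")
```

The two-step slopes match the generating equation here only because age and nutrient are uncorrelated in this made-up design. When x-variables are correlated, the coefficients have to be estimated together.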

Page 13:

Damage = m1*age + m2*nutrient + m3*age*nutrient + b

[Figure: two panels of y against x (x from 1 to 4), titled "No interaction (additive)" and "Interaction (non-additive)".]
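
One way to read the interaction term, sketched in Python under the assumption that nutrient is coded 0 (lower) or 1 (higher): fit a separate straight line to each group of trees. The low-nutrient line gives m1 and b, the gap between the intercepts gives m2, and the difference between the slopes gives m3. The data are made up and noise-free, and `simple_fit` is a hypothetical helper.

```python
def simple_fit(x, y):
    """Least-squares slope and intercept for y = m*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx
    return m, mean_y - m * mean_x

# Hypothetical data generated from
# Damage = 1*age + 2*nutrient + 3*age*nutrient + 4
ages = [1, 2, 3, 4]
low  = [1 * a + 4 for a in ages]                  # nutrient = 0
high = [1 * a + 2 + 3 * a + 4 for a in ages]      # nutrient = 1

slope_low, intercept_low = simple_fit(ages, low)
slope_high, intercept_high = simple_fit(ages, high)

m1 = slope_low                          # age effect when nutrient = 0
b = intercept_low
m2 = intercept_high - intercept_low     # nutrient effect at age = 0
m3 = slope_high - slope_low             # how much nutrient changes the age slope

print(f"Damage = {m1:.1f}*age + {m2:.1f}*nutrient + {m3:.1f}*age*nutrient + {b:.1f}")
```

With no interaction, m3 would be zero and the two lines would be parallel, which is the additive case.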

Page 14:

Non-linear regression?

Just a special case of multiple regression!

Y = m1*x + m2*x^2 + b

X    X^2   Y
1    1     1.1
2    4     2.0
3    9     3.6
4    16    3.1
5    25    5.2
6    36    6.7
7    49    11.3

Treat X as x1 and X^2 as x2:

Y = m1*x1 + m2*x2 + b
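
As a sketch of this point, the table can be fitted in Python by handing x and x squared to an ordinary two-variable least-squares routine. The X and Y values are taken from the slide; the helper `fit_two` is hypothetical and solves the two-variable normal equations directly.

```python
def fit_two(x1, x2, y):
    """Least-squares m1, m2, b for y = m1*x1 + m2*x2 + b."""
    n = len(y)
    mean1, mean2, mean_y = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - mean1 for v in x1]
    d2 = [v - mean2 for v in x2]
    dy = [v - mean_y for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 * s12      # zero if x1 and x2 are perfectly collinear
    m1 = (s22 * s1y - s12 * s2y) / det
    m2 = (s11 * s2y - s12 * s1y) / det
    return m1, m2, mean_y - m1 * mean1 - m2 * mean2

# Data from the slide's table
x = [1, 2, 3, 4, 5, 6, 7]
y = [1.1, 2.0, 3.6, 3.1, 5.2, 6.7, 11.3]

# The "second x-variable" is just x squared
x_squared = [v ** 2 for v in x]

m1, m2, b = fit_two(x, x_squared, y)
residuals = [yi - (m1 * xi + m2 * xi ** 2 + b) for xi, yi in zip(x, y)]

print(f"Y = {m1:.3f}*x + {m2:.3f}*x^2 + {b:.3f}")
```

The curve is non-linear in x, but the model is still linear in the parameters m1, m2 and b, which is why multiple regression can fit it.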

Page 15:

STEPWISE REGRESSION

Page 16:

Jump height (how high ball can be raised off the ground)

[Figure: jump heights plotted along one axis, "Feet off ground", from 8 to 11.]

Total SS = 11.11

Page 17:

[Figure: Jump (ft), 7 to 11, against Height (ft), 4.5 to 8.5.]

X variable          parameter   SS      F1,13   p
Height of player    +0.943      9.96    112     <0.0001

Page 18:

[Figure: Jump (ft), 7 to 11, against Weight (lbs), 105 to 205.]

X variable          parameter   SS      F1,13   p
Weight of player    +0.040      7.92    32      <0.0001

Page 19:

Why do you think weight is positively correlated with jump height?

Page 20:

An idea

Perhaps if we took two people of identical height, the lighter one might actually jump higher? Excess weight may reduce ability to jump high…

Page 21:

How could we test this idea?

Page 22:

[Figure: Jump (ft), 7 to 11, against Height (ft), 4 to 8, with lighter and heavier players marked separately.]

X variable   parameter   SS      F     p
Height       +2.133      9.956   803   <0.0001
Weight       -0.059      1.008   81    <0.0001
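
The F ratios in these tables can be roughly reproduced from the SS values with a short Python sketch. Two things are assumptions rather than statements from the slides: the number of players (15, inferred from F1,13 because 13 = n - 2) and the idea that the two-variable residual is whatever the listed SS values leave over from the Total SS of 11.11.

```python
# Values copied from the slides; n_players is inferred, not stated.
n_players = 15
ss_total = 11.11

def f_ratio(ss_variable, ss_residual, df_error):
    """F for a 1-df x-variable: its SS over the residual mean square."""
    return ss_variable / (ss_residual / df_error)

# Height alone: 1 x-variable, so df error = 15 - 2 = 13
ss_height_alone = 9.96
f_height_alone = f_ratio(ss_height_alone, ss_total - ss_height_alone, n_players - 2)

# Height and weight together: 2 x-variables, so df error = 15 - 3 = 12
ss_height, ss_weight = 9.956, 1.008
ss_residual = ss_total - ss_height - ss_weight   # assumed leftover variance
f_height_joint = f_ratio(ss_height, ss_residual, n_players - 3)
f_weight_joint = f_ratio(ss_weight, ss_residual, n_players - 3)

print(round(f_height_alone), round(f_height_joint), round(f_weight_joint))
```

With these rounded inputs the results land near the slides' 112, 803 and 81, though not exactly on them, presumably because the SS values are rounded. The SS for height barely changes between the two models. What shrinks is the unexplained variance in the denominator, so the F ratio for height rises sharply once weight is in the model.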

Page 23:

Questions:

• Why did the parameter estimates change?

• Why did the F tests change?

Page 24:

Heavy people are often tall (and tall people are often heavy).

Tall people can jump higher.

People who are light for their height can jump a bit higher.

[Diagram: Weight and Height are positively linked (+); Height has a positive effect on Jump (+); Weight has a negative effect on Jump (-).]
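
The three relationships on this slide can be mimicked with a small made-up data set in Python. Everything here is hypothetical: six players, jump generated exactly as 2*height - 0.05*weight + 3, and taller players also heavier. The helpers `simple_slope` and `fit_two` are ours. The point of the sketch is that weight on its own gets a positive slope even though its effect at a fixed height is negative.

```python
def simple_slope(x, y):
    """Slope of the least-squares line of y on a single x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def fit_two(x1, x2, y):
    """Least-squares m1, m2, b for y = m1*x1 + m2*x2 + b."""
    n = len(y)
    mean1, mean2, mean_y = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - mean1 for v in x1]
    d2 = [v - mean2 for v in x2]
    dy = [v - mean_y for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 * s12      # zero if x1 and x2 are perfectly collinear
    m1 = (s22 * s1y - s12 * s2y) / det
    m2 = (s11 * s2y - s12 * s1y) / det
    return m1, m2, mean_y - m1 * mean1 - m2 * mean2

# Hypothetical players: taller ones are heavier, and at each height
# there is one lighter and one heavier player.
height = [5, 5, 6, 6, 7, 7]                  # ft
weight = [120, 140, 140, 160, 160, 180]      # lbs
jump = [2 * h - 0.05 * w + 3 for h, w in zip(height, weight)]

weight_alone = simple_slope(weight, jump)    # weight entered by itself
height_alone = simple_slope(height, jump)    # height entered by itself
m_height, m_weight, b = fit_two(height, weight, jump)   # both together

print(f"alone:    height {height_alone:+.3f}, weight {weight_alone:+.3f}")
print(f"together: height {m_height:+.3f}, weight {m_weight:+.3f}")
```

Entered alone, weight borrows some of height's positive effect because the two are correlated. Entered together, each coefficient describes the effect of one variable with the other held fixed, so the sign for weight flips and the slope for height grows.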

Page 25:

The problem:

The parameter estimate and significance of an x-variable are affected by the x-variables already in the model!

How do we know which variables are significant, and in which order to enter them into the model?

Page 26:

Solutions

1) Use a logical order. For example, in ANCOVA it makes sense to test the interaction first.

2) Stepwise regression: “tries out” various orders of entering and removing variables.

Page 27:

Stepwise regression

Enters or removes variables in order of significance, and checks after each step whether the significance of other variables has changed

Enters one by one: forward stepwise

Enters all, removes one by one: backward stepwise

Page 28:

Forward stepwise regression

• Enter the variable with the highest correlation with the y-variable first (if p < p enter).

• Next, enter the variable that explains the most residual variation (if p < p enter).

• Remove variables that become insignificant (p > p leave) due to other variables being added. And so on…
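
A minimal sketch of the forward stepwise loop described above, in plain Python. Two liberties are taken and should be read as assumptions. First, F-to-enter and F-to-leave thresholds (4.0 and 3.9, arbitrary illustrative values) stand in for p enter and p leave, since a larger F means a smaller p at a given df. Second, the data are a made-up coded design in which x1 and x2 matter and x3 barely does. All function names are hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def residual_ss(data, names, y):
    """Residual SS for y regressed on the named columns plus an intercept."""
    rows = [[1.0] + [data[name][i] for name in names] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    return sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2
               for r, yi in zip(rows, y))

def partial_f(data, smaller, larger, y):
    """F ratio (1 df) for the one extra variable in `larger` versus `smaller`."""
    rss_small = residual_ss(data, smaller, y)
    rss_large = residual_ss(data, larger, y)   # assumed > 0 (noisy data)
    df_error = len(y) - len(larger) - 1
    return (rss_small - rss_large) / (rss_large / df_error)

def forward_stepwise(data, y, f_enter=4.0, f_leave=3.9, max_steps=20):
    selected = []
    for _ in range(max_steps):                 # cap guards against cycling
        candidates = [name for name in data if name not in selected]
        scores = {name: partial_f(data, selected, selected + [name], y)
                  for name in candidates}
        if not scores or max(scores.values()) < f_enter:
            break                              # nothing left is worth its df
        selected.append(max(scores, key=scores.get))
        # Re-check earlier entries now that the model has changed.
        for name in selected[:-1]:
            others = [s for s in selected if s != name]
            if partial_f(data, others, selected, y) < f_leave:
                selected.remove(name)
    return selected

# Hypothetical coded data. The "noise" is a deterministic stand-in:
# the three-way product, which is uncorrelated with x1, x2 and x3.
x1 = [-1, -1, -1, -1, 1, 1, 1, 1]
x2 = [-1, -1, 1, 1, -1, -1, 1, 1]
x3 = [-1, 1, -1, 1, -1, 1, -1, 1]
noise = [0.5 * a * b * c for a, b, c in zip(x1, x2, x3)]
y = [10 + 3 * a + 1 * b + 0.1 * c + e for a, b, c, e in zip(x1, x2, x3, noise)]

data = {"x1": x1, "x2": x2, "x3": x3}
chosen = forward_stepwise(data, y)
print(chosen)
```

In this example x1 enters first (it has the highest correlation with y), x2 enters second, and x3 is left out because its F ratio is far below the threshold. The x-variables here are uncorrelated, so the removal step never fires; with correlated variables such as height and weight it can.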