
Regression This Chapter is on Regression We will learn the difference between dependent and independent variables We will be looking at the line of best


Regression

• This Chapter is on Regression

• We will learn the difference between dependent and independent variables

• We will be looking at the line of best fit

• We are going to see how to calculate the equation of the line of best fit (regression equation), and interpret it


Regression: Variables, and the line of best fit

The equation of a straight line is usually given in the form y = a + bx.

If y = a + bx then a is the y-intercept (where the line cuts the y-axis) and b is the gradient of the line.

You can draw any line like this by choosing values for x and substituting into the equation.

Sketch the equation y = 2x + 3

7A

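Substituting values of x:

x = 0 gives y = 3
x = 1 gives y = 5
x = 2 gives y = 7

[Graph: the line y = 2x + 3 plotted through (0, 3), (1, 5) and (2, 7), with x on the horizontal axis and y on the vertical axis]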
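The substitution method for sketching a line can be written out in Python. This is only a minimal sketch; the function name `line_points` is made up for illustration and does not come from the slides.

```python
def line_points(a, b, xs):
    """Return (x, y) pairs on the line y = a + b*x for the chosen x values."""
    return [(x, a + b * x) for x in xs]

# y = 2x + 3 means a = 3 (the y-intercept) and b = 2 (the gradient)
points = line_points(a=3, b=2, xs=[0, 1, 2])
print(points)  # [(0, 3), (1, 5), (2, 7)]
```

Plotting any two of these points and joining them with a ruler is enough to draw the line; a third point is a useful check.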


Independent variable (explanatory): independent of the other variable. It is plotted on the x-axis.

Dependent variable (response): the one whose values are determined by the independent variable. It is plotted on the y-axis.

For example, suppose we are looking at album sales and the number of stores that stock albums. The album sales will depend on the number of stores selling them, so album sales are the dependent variable and the number of stores is the independent variable.


The formula for the line of best fit will be in the form:

y = a + bx

where

$$b = \frac{S_{xy}}{S_{xx}}, \qquad a = \bar{y} - b\bar{x}$$

So you must always calculate b first!

[Graph: a scatter of points with the regression line drawn through them; the vertical distances from the points to the line are labelled e1 to e5]

The regression line goes through the middle of the points plotted.

Mathematically, each point is a vertical distance ‘e’ from the line.

Each of these distances is known as a residual.

The regression line will minimise the sum of the squares of these residuals: $\sum e^2$ is a minimum.
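The idea that the regression line minimises the sum of the squared residuals can be checked numerically. The sketch below is an illustration only: the data values are hypothetical, the function names are invented here, and Sxx and Sxy are computed in the equivalent "deviations from the mean" form rather than from the totals.

```python
def regression_line(xs, ys):
    """Least-squares line y = a + b*x: b = Sxy / Sxx, a = mean(y) - b * mean(x)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx          # calculate b first ...
    a = y_bar - b * x_bar    # ... because a depends on it
    return a, b

def sum_squared_residuals(a, b, xs, ys):
    """Sum of e^2, where e is the vertical distance of each point from the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, chosen only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = regression_line(xs, ys)
best = sum_squared_residuals(a, b, xs, ys)

# Nudging the intercept or the gradient away from the fitted values
# makes the sum of squared residuals larger.
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05)]:
    assert sum_squared_residuals(a + da, b + db, xs, ys) > best
```

Any other straight line through the same points gives a larger total, which is what "line of best fit" means here.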


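For the following set of data:

a) Calculate Sxx and Sxy.
b) Work out the equation of the regression line.

$$n = 5, \quad \sum x = 300, \quad \sum y = 288.6, \quad \sum x^2 = 22000, \quad \sum y^2 = 16879.14, \quad \sum xy = 18238, \quad \bar{x} = 60, \quad \bar{y} = 57.72$$

$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 22000 - \frac{300^2}{5} = 4000$$

$$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 18238 - \frac{300 \times 288.6}{5} = 922$$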
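The two summary-statistic formulae used in part a) can be sketched as small Python functions. The function and parameter names are assumptions made for this illustration; the totals are the ones given in the question.

```python
def s_xx(sum_x, sum_x2, n):
    """Sxx = sum(x^2) - (sum(x))^2 / n."""
    return sum_x2 - sum_x ** 2 / n

def s_xy(sum_x, sum_y, sum_xy, n):
    """Sxy = sum(xy) - sum(x) * sum(y) / n."""
    return sum_xy - sum_x * sum_y / n

# Summary totals from the question
n = 5
sxx = s_xx(sum_x=300, sum_x2=22000, n=n)
sxy = s_xy(sum_x=300, sum_y=288.6, sum_xy=18238, n=n)
print(round(sxx, 2), round(sxy, 2))  # 4000.0 922.0
```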


For the same set of data, with $S_{xx} = 4000$, $S_{xy} = 922$, $\bar{x} = 60$ and $\bar{y} = 57.72$:

$$b = \frac{S_{xy}}{S_{xx}} = \frac{922}{4000} = 0.2305$$

$$a = \bar{y} - b\bar{x} = 57.72 - (0.2305 \times 60) = 43.89$$

y = 43.89 + 0.2305x

Give answers in full, or if rounded, to 3sf
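The order of the calculation (gradient first, then intercept) can be sketched as follows. The function name `coefficients` is hypothetical; the input values are the ones from this example.

```python
def coefficients(s_xy, s_xx, x_bar, y_bar):
    """Return (a, b) for y = a + b*x, computing the gradient b first."""
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

a, b = coefficients(s_xy=922, s_xx=4000, x_bar=60, y_bar=57.72)
print(f"y = {a:.2f} + {b:.4f}x")  # y = 43.89 + 0.2305x
```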


Regression: Coding and Regression Equations

As with other topics we have looked at, coding can be used to make the numbers easier to work with.

However, the coded regression line will most likely be different from the actual regression line.

To calculate the actual regression line, you must substitute the codes for x and y into the coded regression formula.

7B


The following coding was used to alter a set of data:

$$r = \frac{x - 2}{10}, \qquad t = 5y$$

This is the formula for the coded regression line:

$$t = 5 + 2r$$

Calculate the actual regression line for the original data, x and y.

Substitute the codes for t and r:

$$5y = 5 + 2\left(\frac{x - 2}{10}\right)$$

Multiply all parts by 10 to cancel the divide by 10:

$$50y = 2(x - 2) + 50$$

Expand the bracket:

$$50y = 2x - 4 + 50$$

Simplify by grouping:

$$50y = 2x + 46$$

Divide by 50 to leave y on its own:

$$y = \frac{2x + 46}{50}$$

OR: y = 0.04x + 0.92
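The "substitute and rearrange" steps can be wrapped into one general function. This is a sketch under stated assumptions: it assumes both codes are linear, written as X = (x - x0) / sx and Y = (y - y0) / sy, and it reads the example's codes as r = (x - 2) / 10 and t = 5y with coded line t = 5 + 2r, as the working implies. The name `uncode_line` and its parameters are invented for illustration.

```python
from fractions import Fraction

def uncode_line(A, B, x0, sx, y0, sy):
    """Undo the linear codes X = (x - x0) / sx and Y = (y - y0) / sy.

    Substituting into Y = A + B*X and rearranging for y gives
    y = a + b*x with the intercept a and gradient b returned here.
    """
    b = sy * B / sx
    a = y0 + sy * A - b * x0
    return a, b

# r = (x - 2) / 10, and t = 5y is the same as t = (y - 0) / (1/5)
a, b = uncode_line(A=5, B=2, x0=2, sx=10, y0=0, sy=Fraction(1, 5))
print(float(a), float(b))  # 0.92 0.04
```

Using `Fraction` keeps the arithmetic exact, so the result matches the hand working without rounding error.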


Eight samples of carbon steel were produced with different percentages (c) of carbon in them. Each sample was heated until it melted and the temperature (m) recorded. The results were coded so that:

$$x = 10c, \qquad y = \frac{m - 700}{5}$$

The following table shows the coded results:

Carbon (x)         1   2   3   4   5   6   7   8
Melting Point (y)  35  28  24  16  15  12  8   6

Calculate Sxy and Sxx.

$$\sum x = 36, \quad \sum y = 144, \quad \sum x^2 = 204, \quad \sum xy = 478$$

$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 204 - \frac{36^2}{8} = 42$$

$$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 478 - \frac{36 \times 144}{8} = -170$$
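The totals and the two S values for the coded steel data can be checked directly from the table. This is a minimal sketch; the variable names are chosen here for illustration.

```python
carbon_x = [1, 2, 3, 4, 5, 6, 7, 8]          # coded carbon, x = 10c
melting_y = [35, 28, 24, 16, 15, 12, 8, 6]   # coded melting point, y = (m - 700) / 5

n = len(carbon_x)
sum_x = sum(carbon_x)
sum_y = sum(melting_y)
sum_x2 = sum(x * x for x in carbon_x)
sum_xy = sum(x * y for x, y in zip(carbon_x, melting_y))

s_xx = sum_x2 - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

print(sum_x, sum_y, sum_x2, sum_xy)  # 36 144 204 478
print(s_xx, s_xy)                    # 42.0 -170.0
```

Sxy comes out negative because the melting point falls as the carbon content rises, so the regression line will have a negative gradient.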


Calculate the regression line of y on x.

For the coded steel data, $S_{xx} = 42$, $S_{xy} = -170$, $\sum x = 36$, $\sum y = 144$ and $n = 8$.

$$\bar{x} = \frac{\sum x}{n} = \frac{36}{8}, \qquad \bar{y} = \frac{\sum y}{n} = \frac{144}{8}$$

$$b = \frac{S_{xy}}{S_{xx}} = \frac{-170}{42} = -\frac{85}{21} \approx -4.048$$

$$a = \bar{y} - b\bar{x} = \frac{144}{8} - \left(-4.048 \times \frac{36}{8}\right) = \frac{507}{14} \approx 36.21$$

y = 36.21 - 4.048x
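The exact fractions behind the rounded coefficients can be reproduced with Python's `fractions` module. This is a sketch for checking the working, using the totals of the coded steel data (Sxx = 42, Sxy = -170).

```python
from fractions import Fraction

n = 8
sum_x, sum_y = 36, 144
s_xx, s_xy = 42, -170

b = Fraction(s_xy, s_xx)                         # gradient first
a = Fraction(sum_y, n) - b * Fraction(sum_x, n)  # a = y_bar - b * x_bar

print(b, a)                                    # -85/21 507/14
print(round(float(a), 2), round(float(b), 3))  # 36.21 -4.048
```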


Calculate the regression line of m on c.

The coded regression line is y = 36.21 - 4.048x, where

$$x = 10c, \qquad y = \frac{m - 700}{5}$$

Substitute the codes for y and x:

$$\frac{m - 700}{5} = 36.21 - 4.048(10c)$$

Multiply out the bracket:

$$\frac{m - 700}{5} = 36.21 - 40.48c$$

Multiply by 5 to cancel the division:

$$m - 700 = 181.07 - 202.4c$$

Add 700:

$$m = 881.07 - 202.4c$$

Remember, with longer decimals, make a note of the fraction your calculator gives, so you can get the exact value later on. Here 181.07 comes from 5 × 507/14, not from the rounded 36.21.
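One way to check an uncoded line is a round trip: code a value of c, apply the coded line, decode the result, and compare with the uncoded formula. The sketch below assumes the codes x = 10c and y = (m - 700) / 5 and uses the exact fractions for the coded coefficients; the function name is made up for illustration.

```python
from fractions import Fraction

A, B = Fraction(507, 14), Fraction(-85, 21)  # coded line y = A + B*x

def predict_m_via_codes(c):
    """Code c, apply the coded line, then decode y back to m."""
    x = 10 * c               # x = 10c
    y = A + B * x            # coded regression line
    return 5 * y + 700       # undo y = (m - 700) / 5

# Uncoded line m = a + b*c obtained by substitution
a = 700 + 5 * A              # 12335/14, about 881.07
b = 50 * B                   # -4250/21, about -202.38

for c in [Fraction(1, 10), Fraction(1, 2), Fraction(4, 5)]:
    assert predict_m_via_codes(c) == a + b * c

print(round(float(a), 2), round(float(b), 2))  # 881.07 -202.38
```

The exact values are close to the rounded figures in the hand working, which is why it helps to keep the calculator's fractions until the final step.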


Regression: Applying and Interpreting the Regression Equation

A regression equation can be used to predict the dependent variable, based on a chosen value of the independent variable.

Interpolation: estimating a value that is within the data range you have.

Extrapolation: estimating a value outside the data range you have. As it is outside the data you have, extrapolated values can be unreliable.

Generally, avoid extrapolating values unless asked and even then treat answers ‘with caution’…

7C


The results from an experiment in which different masses were placed on a spring, and the resulting length of the spring measured, are shown below.

Mass, x (kg)    20  40    60    80    100
Length, y (cm)  48  55.1  56.3  61.2  68

The regression line was calculated to be: y = 43.89 + 0.2305x

Estimate the value for y when x = 35kg. Is this Interpolation or Extrapolation?

y = 43.89 + 0.2305x
y = 43.89 + (0.2305 × 35)
y = 51.96cm (include the unit!)

Interpolation, as x = 35 is within the data range we have.


For the same spring experiment (masses from 20kg to 100kg) and regression line, y = 43.89 + 0.2305x:

Estimate the value for y when x = 120kg. Is this Interpolation or Extrapolation?

y = 43.89 + 0.2305x
y = 43.89 + (0.2305 × 120)
y = 71.55cm (include the unit!)

Extrapolation, as x = 120 is outside the data range we have.
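Both estimates, and the interpolation or extrapolation decision, can be sketched in one small function. The function name and its default arguments are assumptions for illustration; the coefficients and the 20kg to 100kg range come from the spring example.

```python
def predict(x, a=43.89, b=0.2305, x_min=20, x_max=100):
    """Return (estimate, kind) for the spring line y = a + b*x.

    kind is 'interpolation' when x lies within the observed masses
    (20 kg to 100 kg here) and 'extrapolation' otherwise.
    """
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return a + b * x, kind

for mass in (35, 120):
    length, kind = predict(mass)
    print(f"{mass} kg -> {length:.2f} cm ({kind})")
# 35 kg -> 51.96 cm (interpolation)
# 120 kg -> 71.55 cm (extrapolation)
```

The function still returns a number for 120kg, but the label is a reminder that the estimate should be treated with caution.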


For the same spring experiment and regression line, y = 43.89 + 0.2305x, where x represents mass and y represents spring length:

Interpret the ‘43.89’ in the equation.

If x = 0, y = 43.89.
If the mass is 0kg, the length of the spring is 43.89cm.
So the 43.89 represents the starting length of the spring!


For the same spring experiment and regression line, y = 43.89 + 0.2305x, where x represents mass and y represents spring length:

Interpret the ‘0.2305’ in the equation.

If we increase x by 1, y increases by 0.2305.
If the mass increases by 1kg, the length of the spring increases by 0.2305cm.
So the 0.2305 represents the increase in the length of the spring for each extra kilogram of mass added.
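The two interpretations (intercept as the length at zero mass, gradient as the change per extra kilogram) can be checked numerically. This is a small sketch using the spring line's coefficients; the function name `length_cm` is hypothetical.

```python
a, b = 43.89, 0.2305  # spring line: length (cm) = a + b * mass (kg)

def length_cm(mass_kg):
    """Predicted spring length for a given mass, using y = a + b*x."""
    return a + b * mass_kg

# Intercept: the predicted length with no mass on the spring
print(length_cm(0))  # 43.89

# Gradient: the change in predicted length per extra kilogram,
# the same wherever on the line it is measured
for mass in (20, 50, 99):
    assert abs((length_cm(mass + 1) - length_cm(mass)) - b) < 1e-9
```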


Summary

• We have learnt how to calculate a line of best fit

• We have used coding and learnt how to ‘undo’ it by substitution

• We have learnt how to interpret a regression equation

• We have looked at Interpolation and Extrapolation