elizabeth-ward
Regression
• This chapter is on regression
• We will learn the difference between dependent and independent variables
• We will be looking at the line of best fit
• We are going to see how to calculate the equation of the line of best fit (regression equation), and interpret it
Regression: Variables and the line of best fit
The equation of a straight line is usually given in the form y = a + bx.
If y = a + bx then a is the y-intercept (where the line cuts the y-axis) and b is the gradient of the line.
You can draw any line like this by choosing values for x and substituting into the equation.
Sketch the graph of y = 2x + 3.
7A
Table of values:
x = 0: y = 3
x = 1: y = 5
x = 2: y = 7
[Figure: the line y = 2x + 3 sketched through these points on x and y axes]
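The table of values can be produced by a few lines of Python. This is a minimal sketch; `line_points` is a hypothetical helper name, not part of any library, and it simply substitutes each chosen x into y = a + bx.

```python
def line_points(a, b, xs):
    """Return (x, y) pairs on the line y = a + bx for each chosen x."""
    return [(x, a + b * x) for x in xs]


# y = 2x + 3 has intercept a = 3 and gradient b = 2
points = line_points(3, 2, [0, 1, 2])
for x, y in points:
    print(f"x = {x}: y = {y}")
```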
The independent variable (explanatory) is the one whose values do not depend on the other variable. It is plotted on the x-axis.
Dependent variable (response) is the one whose values are determined by the independent variable. It is plotted on the y-axis.
For example, if we are looking at album sales and the number of stores that stock the album, the album sales will depend on the number of stores selling it. So album sales are the dependent variable, and the number of stores is the independent variable.
The formula for the line of best fit will be in the form:
y = a + bx
where
$$b = \frac{S_{xy}}{S_{xx}} \qquad \text{and} \qquad a = \bar{y} - b\bar{x}$$
Since a depends on b, you must always calculate b first!
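[Figure: scatter diagram of y against x with the regression line; the vertical distances from the points to the line are labelled e1 to e5]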
The regression line passes through the middle of the plotted points; in fact, it always passes through the mean point $(\bar{x}, \bar{y})$.
Each point lies a vertical distance e from the line.
Each of these distances is known as a residual.
The regression line minimises the sum of the squares of these residuals, so $\sum e^2$ is a minimum.
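The "minimise the sum of squared residuals" property can be checked numerically. This sketch borrows the mass and spring-length data from the 7C examples purely as a convenient illustration; the helper names `fit_line` and `sum_sq_residuals` are hypothetical. It computes b = Sxy/Sxx and a = ȳ - b x̄, then confirms that nudging a or b away from those values only makes the sum of squared residuals larger.

```python
# Mass (kg) and spring length (cm) from the 7C examples
xs = [20, 40, 60, 80, 100]
ys = [48, 55.1, 56.3, 61.2, 68]


def fit_line(xs, ys):
    """Least-squares regression line of y on x: returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Deviation form of S_xx and S_xy; equal to the sum formulas in the notes
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx          # gradient first
    a = y_bar - b * x_bar    # then the intercept
    return a, b


def sum_sq_residuals(a, b, xs, ys):
    """Sum of e^2, where each residual e = y - (a + bx)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


a, b = fit_line(xs, ys)
best = sum_sq_residuals(a, b, xs, ys)
print(f"y = {a:.2f} + {b:.4f}x, sum of e^2 = {best:.4f}")

# Moving the line in any of these directions makes the fit worse
for da, db in [(0.5, 0), (-0.5, 0), (0, 0.01), (0, -0.01)]:
    assert sum_sq_residuals(a + da, b + db, xs, ys) > best
```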
For the following set of data:
a) Calculate $S_{xx}$ and $S_{xy}$.
b) Work out the equation of the regression line.
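$n = 5$, $\sum x = 300$, $\sum y = 288.6$, $\sum x^2 = 22000$, $\sum y^2 = 16879.14$, $\sum xy = 18238$, $\bar{x} = 60$, $\bar{y} = 57.72$
a)
$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 22000 - \frac{300^2}{5} = 4000$$
$$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 18238 - \frac{300 \times 288.6}{5} = 922$$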
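Part a) is plain arithmetic on the summary statistics, so it can be checked with two small functions. This is a minimal sketch; `s_xx` and `s_xy` are hypothetical helper names that implement the sum formulas above.

```python
def s_xx(sum_x2, sum_x, n):
    """S_xx = sum(x^2) - (sum x)^2 / n."""
    return sum_x2 - sum_x ** 2 / n


def s_xy(sum_xy, sum_x, sum_y, n):
    """S_xy = sum(xy) - (sum x)(sum y) / n."""
    return sum_xy - sum_x * sum_y / n


# Summary statistics from the example
n = 5
sxx = s_xx(22000, 300, n)
sxy = s_xy(18238, 300, 288.6, n)
print(f"S_xx = {sxx:g}, S_xy = {sxy:g}")
```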
b) Using $S_{xx} = 4000$ and $S_{xy} = 922$ in y = a + bx:
$$b = \frac{S_{xy}}{S_{xx}} = \frac{922}{4000} = 0.2305$$
$$a = \bar{y} - b\bar{x} = 57.72 - (0.2305 \times 60) = 43.89$$
y = 43.89 + 0.2305x
Give answers in full or, if rounded, to 3 s.f.
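Part b) follows the same "b first, then a" order, which a short sketch makes explicit. `regression_line` is a hypothetical helper name; it takes Sxx, Sxy, x̄ and ȳ and returns the intercept and gradient of y = a + bx.

```python
def regression_line(sxx, sxy, x_bar, y_bar):
    """Return (a, b) for y = a + bx; b must be found first because a uses it."""
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


a, b = regression_line(4000, 922, 60, 57.72)
# Display to 4 significant figures, as in the worked answer
print(f"y = {a:.4g} + {b:.4g}x")
```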
Regression: Coding and Regression Equations
As with other topics we have looked at, coding can be used to make the numbers easier to work with.
However, the coded regression line will usually be different from the actual regression line.
To calculate the actual regression line, you must substitute the codes for x and y into the coded regression equation and rearrange.
7B
The following coding was used to alter a set of data.
This is the formula for the coded regression line:
Calculate the actual regression line for the original data, x and y.
Coding: $r = \frac{x - 2}{10}$, $t = 5y$
Coded regression line: $t = 2r + 5$
$5y = 2\left(\frac{x - 2}{10}\right) + 5$ (substitute the codes for t and r)
$50y = 2(x - 2) + 50$ (multiply all parts by 10 to cancel the divide by 10)
$50y = 2x - 4 + 50$ (expand the bracket)
$50y = 2x + 46$ (simplify by grouping)
$y = \frac{2x + 46}{50}$ (divide by 50 to leave y on its own)
Or: y = 0.04x + 0.92
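One way to check a decoded line is to code some x values, apply the coded line, undo the y-coding, and compare with the decoded equation. This sketch assumes the coding is r = (x - 2)/10 and t = 5y with coded line t = 2r + 5; the function names are hypothetical. It uses Python's `fractions.Fraction` so the comparison is exact rather than subject to rounding.

```python
from fractions import Fraction


def code_x(x):
    """Coding for x: r = (x - 2) / 10."""
    return (Fraction(x) - 2) / 10


def coded_line(r):
    """Coded regression line: t = 2r + 5."""
    return 2 * r + 5


def decode_y(t):
    """Undo the coding t = 5y, so y = t / 5."""
    return Fraction(t) / 5


def actual_line(x):
    """Decoded regression line: y = (2x + 46) / 50, i.e. 0.04x + 0.92."""
    return (2 * Fraction(x) + 46) / 50


# Both routes should give exactly the same y for every x
for x in [0, 2, 7, 12, 100]:
    via_codes = decode_y(coded_line(code_x(x)))
    assert via_codes == actual_line(x)
    print(x, via_codes)
```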
Eight samples of carbon steel were produced with different percentages (c) of carbon in them. Each sample was heated until it melted, and the melting temperature (m) was recorded. The results were coded so that:
$x = 10c$, $y = \frac{m - 700}{5}$
The following table shows the coded results:
Carbon (x):          1   2   3   4   5   6   7   8
Melting point (y):  35  28  24  16  15  12   8   6
Calculate $S_{xy}$ and $S_{xx}$.
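$n = 8$, $\sum x = 36$, $\sum y = 144$, $\sum x^2 = 204$, $\sum xy = 478$
$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 204 - \frac{36^2}{8} = 42$$
$$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 478 - \frac{36 \times 144}{8} = -170$$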
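With the raw coded table available, the sums and both S values can be computed directly. This is a minimal sketch; `summary` is a hypothetical helper that applies the same sum formulas used above.

```python
def summary(xs, ys):
    """Return (S_xx, S_xy) using the sum formulas from the notes."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    s_xx = sum_x2 - sum_x ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xx, s_xy


# Coded carbon (x) and melting point (y) values from the table
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [35, 28, 24, 16, 15, 12, 8, 6]
s_xx, s_xy = summary(xs, ys)
print(f"S_xx = {s_xx:g}, S_xy = {s_xy:g}")
```

The negative S_xy matches the table: as the coded carbon value increases, the coded melting point falls.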
Calculate the regression line of y on x.
y = a + bx
Using $n = 8$, $\sum x = 36$, $\sum y = 144$, $S_{xx} = 42$ and $S_{xy} = -170$:
$$b = \frac{S_{xy}}{S_{xx}} = \frac{-170}{42} = -\frac{85}{21} = -4.048$$
$$a = \bar{y} - b\bar{x} = \frac{\sum y}{n} - b\,\frac{\sum x}{n} = \frac{144}{8} - \left(-\frac{85}{21}\right)\frac{36}{8} = \frac{507}{14} = 36.21$$
y = 36.21 - 4.048x
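Keeping b and a as exact fractions avoids rounding drift when they are reused later. This sketch uses Python's `fractions.Fraction` with the coded carbon totals; the variable names are illustrative only.

```python
from fractions import Fraction

# Totals and S values from the coded carbon data
n = 8
sum_x, sum_y = 36, 144
s_xx, s_xy = Fraction(42), Fraction(-170)

b = s_xy / s_xx                                  # gradient first
a = Fraction(sum_y, n) - b * Fraction(sum_x, n)  # then a = y_bar - b * x_bar
print(f"b = {b} ~ {float(b):.4g}, a = {a} ~ {float(a):.4g}")
```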
y = 36.21 - 4.048x
Calculate the regression line of m on c.
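Coding: $x = 10c$, $y = \frac{m - 700}{5}$
$\frac{m - 700}{5} = 36.21 - 4.048(10c)$ (substitute the codes for y and x)
$\frac{m - 700}{5} = 36.21 - 40.48c$ (multiply out the bracket)
$m - 700 = 181.07 - 202.4c$ (multiply by 5 to cancel the division; with the exact values $a = \frac{507}{14}$ and $b = -\frac{85}{21}$, $5a = 181.07$ and $50b = -202.4$ to the accuracy shown)
$m = 881.07 - 202.4c$ (add 700)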
Remember: with longer decimals, make a note of the fraction your calculator gives, so you can get the exact value later on.
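Following that advice, this sketch keeps a = 507/14 and b = -85/21 as fractions and undoes the coding x = 10c, y = (m - 700)/5 in one step: m = 700 + 5(a + b × 10c) = (700 + 5a) + (50b)c. The variable names are illustrative, not from any library.

```python
from fractions import Fraction

# Exact coefficients of the coded line y = a + bx
a = Fraction(507, 14)
b = Fraction(-85, 21)

# Substituting x = 10c and y = (m - 700) / 5 gives m = (700 + 5a) + (50b)c
intercept = 700 + 5 * a
gradient = 50 * b
print(f"intercept = {float(intercept):.2f}, gradient = {float(gradient):.2f}")
```

Starting from the rounded 36.21 instead gives 700 + 5 × 36.21 = 881.05, which is why keeping the exact fraction is worthwhile.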
Regression: Applying and Interpreting the Regression Equation
A regression equation can be used to predict the dependent variable, based on a chosen value of the independent variable.
Interpolation: estimating a value within the range of the data you have.
Extrapolation: estimating a value outside the range of the data you have. Because it lies outside the data, an extrapolated value can be unreliable.
Generally, avoid extrapolating unless asked to, and even then treat the answer with caution.
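The distinction can be written as a simple range check alongside the prediction. This sketch assumes the data range is given by the smallest and largest x values observed; `predict` is a hypothetical helper, and the numbers are the spring example from this section (y = 43.89 + 0.2305x, masses from 20 kg to 100 kg).

```python
def predict(a, b, x, x_min, x_max):
    """Estimate y = a + bx and label it by whether x lies inside the data range."""
    y = a + b * x
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind


# Spring example: masses in the data run from 20 kg to 100 kg
for mass in [35, 120]:
    est, kind = predict(43.89, 0.2305, mass, 20, 100)
    print(f"x = {mass} kg: y = {est:.4f} cm ({kind})")
```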
7C
The results from an experiment in which different masses were placed on a spring and the resulting length of the spring was measured are shown below.
The regression line was calculated to be: y = 43.89 + 0.2305x
Estimate the value of y when x = 35 kg. Is this interpolation or extrapolation?
Mass, x (kg):    20    40    60    80   100
Length, y (cm):  48   55.1  56.3  61.2   68
$y = 43.89 + 0.2305x$
$y = 43.89 + (0.2305 \times 35)$
$y = 51.96$ cm (include the unit!)
This is interpolation, as x = 35 lies within the range of the data we have (20 kg to 100 kg).
Estimate the value of y when x = 120 kg. Is this interpolation or extrapolation?
$y = 43.89 + 0.2305x$
$y = 43.89 + (0.2305 \times 120)$
$y = 71.55$ cm (include the unit!)
This is extrapolation, as x = 120 lies outside the range of the data we have.
Interpret the 43.89 in the equation.
If x = 0, then y = 43.89: when the mass is 0 kg, the length of the spring is 43.89 cm.
So 43.89 represents the starting length of the spring, with no mass attached!
Here x represents mass and y represents spring length.
Interpret the 0.2305 in the equation.
If x increases by 1, y increases by 0.2305: when the mass increases by 1 kg, the length of the spring increases by 0.2305 cm.
So 0.2305 represents the increase in the length of the spring for each extra kilogram of mass.
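Both interpretations can be checked directly on the fitted spring line: the prediction at x = 0 is exactly a, and each extra kilogram adds exactly b, wherever you start. This is a minimal sketch; `length` is a hypothetical helper name. Note that x = 0 lies outside the 20 kg to 100 kg data range, so the 43.89 cm starting length is itself an extrapolated value.

```python
a, b = 43.89, 0.2305  # y = a + bx for the spring data


def length(mass):
    """Predicted spring length (cm) for a given mass (kg)."""
    return a + b * mass


# At x = 0 the prediction is just the intercept a
assert length(0) == a
# Each extra kilogram adds b centimetres, whatever the starting mass
for mass in [20, 50, 99]:
    assert abs((length(mass + 1) - length(mass)) - b) < 1e-9
print(f"length(0) = {length(0)} cm, increase per kg = {b} cm")
```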
Summary
• We have learnt how to calculate a line of best fit
• We have used coding and learnt how to ‘undo’ it by substitution
• We have learnt how to interpret a regression equation
• We have looked at Interpolation and Extrapolation