LBSRE1021 Data Interpretation Lecture 11 Correlation and Regression


Page 1: LBSRE1021  Data Interpretation Lecture  11

LBSRE1021 Data Interpretation

Lecture 11

Correlation and Regression

Page 2: LBSRE1021  Data Interpretation Lecture  11

Example Data

Day   Output (tons)   Cost (£000)
1     23              58
2     17              50
3     24              54
4     35              64
5     10              40
6     16              43
7     15              42
8     24              50
9     18              53
10    30              62

Page 3: LBSRE1021  Data Interpretation Lecture  11

The scatter diagram of the data would appear as below:

[Figure: scatter diagram of cost (£000, vertical axis, 40 to 70) against output (tons, horizontal axis, 5 to 40)]

Page 4: LBSRE1021  Data Interpretation Lecture  11

Alternatively a negative correlation would appear as below:

[Figure: scatter diagram showing a negative correlation; horizontal axis 5 to 40, vertical axis 0 to 50]

Page 5: LBSRE1021  Data Interpretation Lecture  11

Alternatively data with no correlation may appear as below:

[Figure: scatter diagram showing no correlation; horizontal axis 0 to 40, vertical axis 0 to 60]

Page 6: LBSRE1021  Data Interpretation Lecture  11

Correlation Scale

-1: perfect negative correlation
 0: no correlation
+1: perfect positive correlation
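To make the three points on the scale concrete, here is a minimal sketch that computes r for three small made-up data sets (the numbers are illustrative only and are not from the lecture). The helper `correlation` is a hypothetical name; it uses the deviations-from-the-mean form of the coefficient, which is algebraically equivalent to the usual Pearson formula.

```python
import math

def correlation(x, y):
    """Pearson's r computed from deviations about the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    s_xy = sum(a * b for a, b in zip(dx, dy))
    s_xx = sum(a * a for a in dx)
    s_yy = sum(b * b for b in dy)
    return s_xy / math.sqrt(s_xx * s_yy)

x = [1, 2, 3, 4, 5]                 # illustrative values only
rising = [3, 5, 7, 9, 11]           # exactly y = 1 + 2x
falling = [11, 9, 7, 5, 3]          # exactly y = 13 - 2x
unrelated = [2, 1, 4, 1, 2]         # chosen so the deviations cancel out

print(correlation(x, rising))       # 1.0  (perfect positive)
print(correlation(x, falling))      # -1.0 (perfect negative)
print(correlation(x, unrelated))    # 0.0  (no linear correlation)
```

Real data rarely sits exactly at -1, 0 or +1; values in between indicate weaker or stronger linear association.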

Page 7: LBSRE1021  Data Interpretation Lecture  11

Pearson’s product moment correlation coefficient (r)

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$

        x     y     xy      x²     y²
        23    58    1334    529    3364
        17    50    850     289    2500
        24    54    1296    576    2916
        ...
∑       212   516   11452   5000   27242
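The column totals can be checked with a few lines of Python. This is only a sketch: the lists `output` and `cost` are copied from the Example Data slide, and `column_totals` is a hypothetical helper name, not part of the lecture.

```python
# Output (tons) and cost (£000) for the ten days in the Example Data.
output = [23, 17, 24, 35, 10, 16, 15, 24, 18, 30]
cost = [58, 50, 54, 64, 40, 43, 42, 50, 53, 62]

def column_totals(x, y):
    """Return (n, sum x, sum y, sum xy, sum x^2, sum y^2)."""
    return (
        len(x),
        sum(x),
        sum(y),
        sum(xi * yi for xi, yi in zip(x, y)),
        sum(xi * xi for xi in x),
        sum(yi * yi for yi in y),
    )

print(column_totals(output, cost))
# (10, 212, 516, 11452, 5000, 27242)
```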

Page 8: LBSRE1021  Data Interpretation Lecture  11

Pearson’s product moment correlation coefficient (r) (2)

$$r = \frac{10 \times 11452 - 212 \times 516}{\sqrt{\left[10 \times 5000 - 212^2\right]\left[10 \times 27242 - 516^2\right]}} = \frac{5128}{\sqrt{5056 \times 6164}} = 0.9186$$
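As a check on the arithmetic, the same calculation can be sketched in Python. The function name `pearson_r` is an assumption made for illustration; the data are the ten days of output and cost from the Example Data slide.

```python
import math

output = [23, 17, 24, 35, 10, 16, 15, 24, 18, 30]   # x: output (tons)
cost = [58, 50, 54, 64, 40, 43, 42, 50, 53, 62]     # y: cost (£000)

def pearson_r(x, y):
    """Pearson's product moment correlation coefficient from raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y            # 5128 for this data
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)  # 5056 * 6164
    )
    return numerator / denominator

print(round(pearson_r(output, cost), 4))  # 0.9186
```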

Page 9: LBSRE1021  Data Interpretation Lecture  11

Linear Regression

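We need to establish a ‘line of best fit’.

The ‘freehand method’ (drawing a line through the points by eye) has many drawbacks.

We need the ‘best fit’ to the data in a precise sense, so rather than using crude graphical techniques we identify the ‘least squares line’: the line that minimises the sum of the squared vertical distances between the data points and the line.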

Page 10: LBSRE1021  Data Interpretation Lecture  11

Linear Regression (2)

[Figure: scatter diagram of cost (£000) against output (tons) with the line of best fit drawn through the points]

The equation for this line is Y = 30.10 + 1.014X

Page 11: LBSRE1021  Data Interpretation Lecture  11

Linear Regression (3)

The equation of this line is Y = 30.10 + 1.014X. But how is this obtained?

The scattered points illustrate the actual data, while the least squares line is an estimate of Y for a given value of X. Notice the distance between the scattered points and the line; this will give you some idea of how good a fit the line is.
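The distances mentioned above (usually called residuals) can be listed directly. The sketch below assumes the rounded line Y = 30.10 + 1.014X quoted on the slide; `fitted_cost` and `residuals` are illustrative names.

```python
output = [23, 17, 24, 35, 10, 16, 15, 24, 18, 30]   # X: output (tons)
cost = [58, 50, 54, 64, 40, 43, 42, 50, 53, 62]     # Y: cost (£000)

def fitted_cost(x, a=30.10, b=1.014):
    """Estimated cost (£000) for output x (tons) from the line Y = a + bX."""
    return a + b * x

# Residual = actual cost minus the cost estimated by the line.
residuals = [y - fitted_cost(x) for x, y in zip(output, cost)]

for day, (x, y, e) in enumerate(zip(output, cost, residuals), start=1):
    print(f"day {day}: actual {y}, fitted {fitted_cost(x):.2f}, residual {e:+.2f}")

# The least squares line is the one that makes this total as small as possible.
sum_squared_residuals = sum(e * e for e in residuals)
print(f"sum of squared residuals: {sum_squared_residuals:.2f}")
```

Small residuals relative to the spread of the costs suggest the line fits well; a few large residuals point to days the line describes poorly.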

Page 12: LBSRE1021  Data Interpretation Lecture  11

Linear Regression (4)

How do we determine the least squares line?

We simply need to determine the intercept (a) and the gradient (b).

The formula is therefore Y = a + bX

Minimising the sum of squared distances requires a little calculus (we will omit that process here), which leads to standard equations for a and b.

Page 13: LBSRE1021  Data Interpretation Lecture  11

Linear Regression Equations

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$$

$$b = \frac{10 \times 11452 - 212 \times 516}{10 \times 5000 - 44944} = \frac{5128}{5056}$$

$$b = 1.0142405$$

Page 14: LBSRE1021  Data Interpretation Lecture  11

Linear Regression Equations (2)

And

$$a = \bar{y} - b\bar{x}$$

$$a = 51.6 - 1.0142405 \times 21.2$$

$$a = 30.098101$$

Rounding these values a little: Y = 30.10 + 1.014X
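The two regression equations can be sketched in Python as a check on the hand calculation. `least_squares_line` is a hypothetical helper name, and the 20-ton prediction at the end is an illustrative input, not a figure from the lecture.

```python
output = [23, 17, 24, 35, 10, 16, 15, 24, 18, 30]   # x: output (tons)
cost = [58, 50, 54, 64, 40, 43, 42, 50, 53, 62]     # y: cost (£000)

def least_squares_line(x, y):
    """Return (a, b) for the least squares line Y = a + bX."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # gradient
    a = sum_y / n - b * sum_x / n                                 # intercept
    return a, b

a, b = least_squares_line(output, cost)
print(round(a, 2), round(b, 3))   # 30.1 1.014

# Illustrative use: estimated cost (£000) for an output of 20 tons.
print(round(a + b * 20, 2))       # 50.38
```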

Page 15: LBSRE1021  Data Interpretation Lecture  11

Coefficient of Determination

The coefficient of determination measures the proportion of the variation in the dependent variable (y) explained by the variation in the independent variable (x).

It is reported as r², the square of the product moment correlation coefficient.

Page 16: LBSRE1021  Data Interpretation Lecture  11

Coefficient of Determination (2)

For our previous example:

r² = 0.9186² = 0.844

This means that 84.4% of the variation in cost is explained by the variation in output volume. The remaining 15.6% of the variation is not explained by output.
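The 'proportion of variation explained' reading can be illustrated by computing the coefficient a second way: one minus the unexplained (residual) variation divided by the total variation in cost. For a least squares line this equals the square of r. The sketch below assumes the Example Data; `r_squared` is an illustrative name.

```python
output = [23, 17, 24, 35, 10, 16, 15, 24, 18, 30]   # x: output (tons)
cost = [58, 50, 54, 64, 40, 43, 42, 50, 53, 62]     # y: cost (£000)

def r_squared(x, y):
    """Coefficient of determination for the least squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x
    # Variation left unexplained by the line.
    unexplained = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # Total variation of y about its mean.
    total = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - unexplained / total

print(round(r_squared(output, cost), 3))  # 0.844
```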

Page 17: LBSRE1021  Data Interpretation Lecture  11

Summary

Correlation is measured on a scale from -1 to +1 using Pearson’s product moment correlation coefficient (r).

Linear regression identifies the line of ‘best fit’ using the formula Y = a + bX

The coefficient of determination (r²) measures the extent to which the dependent variable is explained by the independent variable.

Page 18: LBSRE1021  Data Interpretation Lecture  11

Exam Question – May 2008

Q. 7. The data below shows annual company income (£m) against year of trading.

Year   Income (£m)
1      20
2      23
3      26
4      28
5      35

A regression of income on year gives the following results: r = 0.974, r squared = 0.948, intercept = 11.4, slope = 3.5

a. Explain each of the results above (1 mark each).

b. Use the results above to make a forecast for company income for year 6 (4 marks).

c. What assumption is made in making this forecast? (2 marks).