S1 :: Chapter 6 Correlation. www.drfrostmaths.com. Dr J Frost. Last modified: 20th January 2016



Page 1: S1:  Chapter 6 Correlation

S1 :: Chapter 6 Correlation

www.drfrostmaths.com Dr J Frost ([email protected])

Last modified: 20th January 2016

Page 2: S1:  Chapter 6 Correlation

Recap of correlation

Correlation gives the strength of the relationship (and the type of relationship) between two variables.

[Four scatter diagrams, each labelled with the strength and type of its correlation]

• Maths Score against English Score: weak positive correlation
• Weekly time on internet (hours) against Age: weak negative correlation
• Cost of train fare against Distance travelled (km): strong positive correlation
• Crime Rate against Number of people in city called 'Dave': no correlation

Page 3: S1:  Chapter 6 Correlation

Recall that variance is defined as “the average squared distance from the mean”. We could therefore express $\sigma^2$ in terms of $S_{xx}$:

$\sigma^2 = \dfrac{S_{xx}}{n}$

$S_{xx}$ represents the total squared distance from the mean.

Formula based on definition: $S_{xx} = \sum (x - \bar{x})^2$

Simplified formula: $S_{xx} = \sum x^2 - \dfrac{(\sum x)^2}{n}$

Bro Exam Tip: Given in formula booklet, but useful to memorise.
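A short derivation, not part of the original slides, of why the definition $\sum (x - \bar{x})^2$ and the simplified formula for $S_{xx}$ agree. It relies only on $\bar{x} = \frac{\sum x}{n}$:

```latex
\begin{align*}
S_{xx} &= \sum (x - \bar{x})^2 \\
       &= \sum x^2 - 2\bar{x}\sum x + n\bar{x}^2 \\
       &= \sum x^2 - 2\,\frac{(\sum x)^2}{n} + \frac{(\sum x)^2}{n} \\
       &= \sum x^2 - \frac{(\sum x)^2}{n}
\end{align*}
```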

Page 4: S1:  Chapter 6 Correlation

Covariance

We understand variance as ‘how much a variable varies’.

We can extend variance to two variables. We might be interested in how one variable varies with another.

[Scatter diagram: Cost of train fare against Distance travelled (km)]

We can say that as distance (say $x$) increases, the cost (say $y$) increases. Thus the covariance of $x$ and $y$ is positive.

(this won’t be tested in an exam but is intended to provide background)

Page 5: S1:  Chapter 6 Correlation

Covariance

Comment on the covariance between the variables.

[Two scatter diagrams of $y$ against $x$]

As $x$ increases, $y$ doesn’t change very much. So the covariance is small (but positive).


Page 6: S1:  Chapter 6 Correlation

Covariance

Comment on the covariance between the variables.

[Two scatter diagrams of $y$ against $x$, one for each comment below]

As $x$ varies, $y$ doesn’t vary at all. So we say that the variables are independent, and the covariance is 0.

As $x$ increases, $y$ decreases. So the covariance is negative.


Page 7: S1:  Chapter 6 Correlation

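Just as $S_{xx}$ gave a measure of how much a variable varies, $S_{xy}$ gives a measure of how two variables $x$ and $y$ vary with each other.

Formula based on definition: $S_{xy} = \sum (x - \bar{x})(y - \bar{y})$

Simplified formula: $S_{xy} = \sum xy - \dfrac{(\sum x)(\sum y)}{n}$

Interesting things to note (but not examined):
• Just as the variance is $\dfrac{S_{xx}}{n}$, the covariance is $\dfrac{S_{xy}}{n}$.
• How could the variance of $x$ be expressed in terms of covariance? It is the covariance of $x$ with itself, i.e. variance is the extent to which a variable varies with itself!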
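As a rough check of the ideas on this slide, the sketch below computes $S_{xy}$ both from its definition, $\sum (x - \bar{x})(y - \bar{y})$, and from the simplified formula, $\sum xy - \frac{(\sum x)(\sum y)}{n}$, and then pairs a variable with itself to recover the variance. The distance and cost figures are made up for illustration and do not come from the slides; the function names are likewise my own.

```python
def s_xy_definition(xs, ys):
    """S_xy from the definition: sum of (x - mean_x) * (y - mean_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def s_xy_simplified(xs, ys):
    """S_xy from the simplified formula: sum(xy) - sum(x) * sum(y) / n."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n


# Made-up illustrative data (an assumption, not from the slides).
distance = [10, 20, 30, 40, 50]
cost = [4, 9, 13, 16, 23]

# Both routes should give the same positive value.
print(s_xy_definition(distance, cost))
print(s_xy_simplified(distance, cost))

# Pairing a variable with itself gives S_xx, so dividing by n gives the variance.
print(s_xy_simplified(distance, distance) / len(distance))
```

A positive result matches the earlier observation that cost rises with distance.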

Page 8: S1:  Chapter 6 Correlation

Product Moment Correlation Coefficient (PMCC)

We saw that $S_{xy}$ gives a measure of how two variables vary with each other. That sounds like correlation! Wouldn’t it be nice if we could somehow ‘normalise’ it so we end up with just a number between -1 and 1…

(Have an intelligent guess based on the discussion above.)

$r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$

$r$ is known as the Product Moment Correlation Coefficient (PMCC). We’ll interpret what that means in a second.

Page 9: S1:  Chapter 6 Correlation

Interpreting the PMCC

We’ve seen the PMCC varies between -1 and 1.

• $r = 1$ means perfect positive correlation.
• $r = 0$ means no correlation.
• $r = -1$ means perfect negative correlation.
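The three boundary cases can be tried out with a small sketch. The `pmcc` helper below is my own naming and simply applies $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$ with the simplified formulas; the data sets are made up so that the points lie exactly on a rising line, exactly on a falling line, or in a symmetric pattern with no linear trend.

```python
import math


def pmcc(xs, ys):
    """Product moment correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return s_xy / math.sqrt(s_xx * s_yy)


# Made-up illustrative data (an assumption, not from the slides).
xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]     # points lie exactly on y = 2x + 1
falling = [10 - 3 * x for x in xs]   # points lie exactly on y = 10 - 3x
zigzag = [1, -1, 0, -1, 1]           # symmetric pattern with no linear trend

print(pmcc(xs, rising))    # perfect positive correlation
print(pmcc(xs, falling))   # perfect negative correlation
print(pmcc(xs, zigzag))    # no linear correlation
```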

Page 10: S1:  Chapter 6 Correlation

Interpreting the PMCC

[Four scatter diagrams: Maths Score against English Score; Weekly time on internet (hours) against Age; Cost of train fare against Distance travelled (km); Crime Rate against Number of people in city called 'Dave']

Match the $r$ value to each scatter diagram: $r = 0.8$, $r = 0$, $r = -0.4$, $r = 0.96$.

Page 11: S1:  Chapter 6 Correlation

Example

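Baby                        A     B     C     D     E     F
Head Circumference ($x$)    31.1  33.3  30.0  31.5  35.0  30.2
Gestation Period ($y$)      36    37    38    38    40    40

$n = 6$, $\sum x = 191.1$, $\sum y = 229$, $\sum x^2 = 6105.39$, $\sum y^2 = 8753$, $\sum xy = 7296.7$

$S_{xx} = \sum x^2 - \dfrac{(\sum x)^2}{n} = 18.855$

$S_{yy} = \sum y^2 - \dfrac{(\sum y)^2}{n} = 12.833$

$S_{xy} = \sum xy - \dfrac{(\sum x)(\sum y)}{n} = 3.05$

$r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = 0.196$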
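The working in this example can be reproduced with a few lines of Python. This is only a sketch for checking the arithmetic: the variable names are my own, while the data and the formulas are those of the example.

```python
import math

# Data from the example table.
head = [31.1, 33.3, 30.0, 31.5, 35.0, 30.2]   # x: head circumference
gestation = [36, 37, 38, 38, 40, 40]          # y: gestation period
n = len(head)

# Summary statistics.
sum_x = sum(head)
sum_y = sum(gestation)
sum_x2 = sum(x * x for x in head)
sum_y2 = sum(y * y for y in gestation)
sum_xy = sum(x * y for x, y in zip(head, gestation))

# Simplified formulas for S_xx, S_yy and S_xy, then the PMCC.
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n
r = s_xy / math.sqrt(s_xx * s_yy)

print(round(s_xx, 3), round(s_yy, 3), round(s_xy, 2), round(r, 3))
```

The rounded values should agree with the hand calculation, up to the usual floating-point noise in the unrounded figures.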

Page 12: S1:  Chapter 6 Correlation

Let’s do it on our calculators!

Baby                        A     B     C     D     E     F
Head Circumference ($x$)    31.1  33.3  30.0  31.5  35.0  30.2
Gestation Period ($y$)      36    37    38    38    40    40

• Put in Stats mode: MODE […]
• Select […] for […] (i.e. calculations to do with linear relationships).
• Insert the data into your table. Use the arrow keys and ‘=’ to add the values.
• Once done, press the […] button. This goes to normal calculation input.
• We want to insert $r$ into your calculation. Press […], and choose […] for REGRESSION.
• Select […] for $r$. $r$ is now in your calculation, so press =.

Page 13: S1:  Chapter 6 Correlation

Test Your Understanding

June 2013 Q1

Page 14: S1:  Chapter 6 Correlation

Further Practice


Quite often the values are given to you in an exam.

Page 15: S1:  Chapter 6 Correlation

Interpreting the PMCC

“Interpret” vs “State”: In general in Statistics exams, the word ‘interpret’ means “explain in context using non-statistical language”.

Bob wants to establish if there’s a connection between waiting time ($x$) at the post office and customer satisfaction ($y$). He calculates $r$ as -0.81. Interpret this correlation coefficient.

A bad answer (that may or may not be accepted): “Strong negative correlation” (this is stating the correlation, not interpreting it).

A good answer: “As the waiting time increases, the customer satisfaction tends to decrease.”

Page 16: S1:  Chapter 6 Correlation

Exam Questions (on provided sheet)

Q1


Page 17: S1:  Chapter 6 Correlation

(Before you go on to Q2) Effects of coding

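We know that $S_{xy} = \sum (x - \bar{x})(y - \bar{y})$ and $r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$.

Therefore, if all our $x$ data values get $k$ times bigger in size and $y$ values become $m$ times bigger, what happens to…

• (Recap) Variance of $x$: $k^2$ times as big
• $S_{xx}$: $k^2$ times as big
• $S_{yy}$: $m^2$ times as big
• $S_{xy}$: $km$ times as big
• $r$: Unaffected!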

Bro Exam Note: For the purposes of the S1 exam, you just need to remember that:
• Coding affects $S_{xx}$ in the same way that the variance is affected, i.e. if the variance becomes 9 times larger, so does $S_{xx}$.
• PMCC is completely unaffected by (linear) coding.
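The effect of coding can be checked numerically. The sketch below reuses the baby data from the earlier example and applies a coding chosen purely for illustration: $x$ is multiplied by $k = 3$ and shifted by 5, and $y$ is multiplied by $m = 2$ and shifted by -7. The symbols $k$ and $m$, the shifts and the helper names are all assumptions of this sketch. With positive scale factors, $S_{xx}$ should scale by $k^2$, $S_{yy}$ by $m^2$, $S_{xy}$ by $km$, and $r$ should not move.

```python
import math


def s(us, vs):
    """S_uv = sum(uv) - sum(u) * sum(v) / n (this is S_xx when us is vs)."""
    n = len(us)
    return sum(u * v for u, v in zip(us, vs)) - sum(us) * sum(vs) / n


def pmcc(xs, ys):
    """Product moment correlation coefficient."""
    return s(xs, ys) / math.sqrt(s(xs, xs) * s(ys, ys))


# Data from the earlier baby example.
x = [31.1, 33.3, 30.0, 31.5, 35.0, 30.2]
y = [36, 37, 38, 38, 40, 40]

# Hypothetical coding, chosen only for illustration.
k, m = 3, 2
coded_x = [k * xi + 5 for xi in x]
coded_y = [m * yi - 7 for yi in y]

print(s(coded_x, coded_x) / s(x, x))        # close to k**2 = 9
print(s(coded_y, coded_y) / s(y, y))        # close to m**2 = 4
print(s(coded_x, coded_y) / s(x, y))        # close to k * m = 6
print(pmcc(x, y), pmcc(coded_x, coded_y))   # the two values agree
```

The shifts drop out entirely, which is why only the scale factors matter for the $S$ values.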

Page 18: S1:  Chapter 6 Correlation

Example

$x$    1020  1032  1028  1034  1023  1038
$y$    320   335   345   355   360   380

Coding: $p = \dfrac{x - 1020}{1}$, $q = \dfrac{y - 300}{5}$

$p$    0     12    8     14    3     18
$q$    4     7     9     11    12    16

We can now just find the PMCC of this new data set, and no further adjustment is needed.

$r = 0.655$
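To see that no adjustment is needed, the PMCC can be computed for both the original and the coded values. The sketch below assumes the first pair of rows in the example are $x$ and $y$ and the second pair are $p$ and $q$, which is consistent with $p = x - 1020$ and $q = \frac{y - 300}{5}$; the `pmcc` helper is my own naming.

```python
import math


def pmcc(xs, ys):
    """Product moment correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return s_xy / math.sqrt(s_xx * s_yy)


# Original data from the example.
x = [1020, 1032, 1028, 1034, 1023, 1038]
y = [320, 335, 345, 355, 360, 380]

# Coded data: p = (x - 1020) / 1 and q = (y - 300) / 5.
p = [(xi - 1020) / 1 for xi in x]
q = [(yi - 300) / 5 for yi in y]

r_original = pmcc(x, y)
r_coded = pmcc(p, q)
print(round(r_original, 3), round(r_coded, 3))
```

Both values round to the same figure, so working with the smaller coded numbers loses nothing.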

Page 19: S1:  Chapter 6 Correlation

Exam Questions (on provided sheet)

Q2


Page 20: S1:  Chapter 6 Correlation

Exam Questions (on provided sheet)

Q3


Page 21: S1:  Chapter 6 Correlation

Exam Questions (on provided sheet)

Q4


Page 22: S1:  Chapter 6 Correlation

Exam Questions (on provided sheet)

Q5


Page 23: S1:  Chapter 6 Correlation

Exam Questions (on provided sheet)

Q6


Page 24: S1:  Chapter 6 Correlation

Exam Questions (on provided sheet)

Q7


Page 25: S1:  Chapter 6 Correlation

Exam Questions (on provided sheet)

Q8


Page 26: S1:  Chapter 6 Correlation

Exam Questions (on provided sheet)

Q9


Page 27: S1:  Chapter 6 Correlation

Limitations of correlation

Often there’s a 3rd variable that explains two others, but the two variables themselves are not connected.

Q1: The number of cars on the road has increased, and the number of DVD recorders bought has decreased. Is there a correlation between the two variables?

Buying a car does not necessarily mean that you will not buy a DVD recorder, so we cannot say there is a correlation between the two.

Q2: Over the past 10 years the memory capacity of personal computers has increased, and so has the average life expectancy of people in the western world. Is there a correlation between these two variables?

The two are not connected, but both are due to scientific development over time (i.e. a third variable!)

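The third-variable effect in Q2 can be illustrated with invented numbers. In the sketch below, both lists are made up for illustration only (they are not real measurements and do not come from the slides): each simply trends upward over ten years. The PMCC between them comes out high even though neither quantity influences the other, because time drives both.

```python
import math


def pmcc(xs, ys):
    """Product moment correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return s_xy / math.sqrt(s_xx * s_yy)


# Invented figures for ten consecutive years (assumptions, not real data).
memory_gb = [1, 2, 2, 4, 4, 8, 8, 16, 16, 32]
life_expectancy = [76.0, 76.3, 76.4, 76.8, 77.0, 77.1, 77.5, 77.6, 77.9, 78.2]

r = pmcc(memory_gb, life_expectancy)
print(round(r, 2))   # strongly positive, despite no direct connection
```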