
Measure your handspan and foot length in cm, to the nearest mm.

We will record them as bivariate data below:

Now we need to plot them. On what kind of graph?

Go on then... accurately!

correlation 1.xls

Before adding a line of best fit, it is sensible to consider whether there should be one in the first place. At GCSE we just looked at the scattergraph and decided visually whether a correlation existed and whether it was weak or strong. However, this is dangerous.

Consider the graphs below and state their correlation purely from a visual point of view.

corrrelation 2.xls

Because of this, statisticians use a numerical value to assess whether the correlation is strong enough to justify adding a line of best fit.

A popular choice is the Product Moment Correlation Coefficient (PMCC).

This is usually just denoted by "r" and is often quoted squared (although once it is squared you can no longer tell whether the correlation is positive or negative).

It is calculated using the formula below. The calculation is a lot easier on a spreadsheet or graphical calculator, so in exams they often give you some of the "bits".

This is the formula we use in practice, but this link explains where it has come from and how it relates to your scattergraph points.

Once you've calculated the PMCC, it needs to be interpreted.

Open this spreadsheet and use the graph tool to draw a scattergraph for each data set.

Consider which coloured data set has the strongest correlation.

Now add a linear line of best fit and consider how close the points appear to the line. Do you still agree with your previous answers?

Calculate the r value for each set of data.

Now add the r² value to each graph. Were you right?

Square root these values to find the r value, and consider whether it is negative or positive.
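A minimal Python sketch of why the sign has to be considered separately. The data are made up for illustration (they are not the class spreadsheet), and `pmcc` and `r_from_r_squared` are hypothetical helper names. Squaring throws the sign away, so it is taken back from the gradient of the line of best fit.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient r for paired data."""
    n = len(xs)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return sxy / math.sqrt(sxx * syy)

def r_from_r_squared(r_squared, gradient):
    """Square root r^2, then copy the sign of the line's gradient."""
    return math.copysign(math.sqrt(r_squared), gradient)

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 5]
falling = [5, 4, 5, 4, 2]  # the same y values in reverse order

r_up = pmcc(xs, rising)
r_down = pmcc(xs, falling)

print(round(r_up, 3), round(r_up ** 2, 3))
print(round(r_down, 3), round(r_down ** 2, 3))

# The falling data has a line of best fit with gradient -0.6,
# so the square root of its r^2 must be given a negative sign.
print(round(r_from_r_squared(r_down ** 2, gradient=-0.6), 3))
```

Both data sets have the same r² (36/60 = 0.6), yet one slopes up and the other down.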

If r is 1 there is perfect positive correlation (the points form a straight line).

If r is -1 there is perfect negative correlation.

Between -1 and 1 we have strong, weak and no correlation.

The closer r is to 1 or -1, the stronger the correlation. The closer it is to 0, the weaker the correlation, with 0 indicating no linear correlation between your values.

However, the more points you have in your dataset, the further from 1 (or -1) r tends to be, even when the correlation is strong. To interpret r correctly we must also consider how many pieces of data were collected.

From your earlier datasets, which of the turquoise and orange is strongest according to the r value?

They have almost the same PMCC value.

However, the orange dataset gives stronger evidence of a correlation, because it is more difficult to get 10 points near to a straight line than 5 points.
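One way to check this claim is a rough simulation. The sketch below generates completely unrelated x and y values and counts how often they happen to give a large r by luck alone. The threshold of 0.8, the number of trials and the function names are assumptions chosen for illustration, not values from the spreadsheet.

```python
import math
import random

def pmcc(xs, ys):
    """Product moment correlation coefficient r for paired data."""
    n = len(xs)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return sxy / math.sqrt(sxx * syy)

def chance_of_large_r(n, threshold, trials=5000, seed=1):
    """Fraction of random, unrelated datasets of size n with |r| >= threshold."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        ys = [rng.random() for _ in range(n)]  # generated independently of xs
        if abs(pmcc(xs, ys)) >= threshold:
            hits += 1
    return hits / trials

p5 = chance_of_large_r(5, 0.8)
p10 = chance_of_large_r(10, 0.8)

print("5 points: ", p5)
print("10 points:", p10)
```

With only 5 points, pure chance produces a large r far more often than it does with 10 points, which is why the same r value means more for the larger dataset.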

This weblink gives a table of values you should refer to when deciding whether a PMCC value is high enough to assume a correlation exists.

As long as the r value (ignoring its sign) is larger than the one in the table, you can be ....% sure there is a correlation.

Consider the yellow and blue data sets. Only one piece of data has changed.

What is the probability these data sets show a correlation?

Try Ex 6A Q2, 3, 6

and

Ex 6C Q1, 2, 4, 5, 7

correlation 1.xls

Add a line of best fit (visually) to your graph of the foot length and handspan data.

Use what you have learned in C1 to calculate an equation for this line.

Is this line the same as any of your classmates'?

Why do you think this is?

Are you happy you have put your line in the right place?

Could you move it and still be happy? What made you put it where you did?

Was your line equation the same as the Excel one?

Excel calculates this line mathematically rather than by visual judgement.

It calculates the vertical distance between each coordinate and a possible line, adds the squares of these distances together, and then adjusts the line's position to minimise this total.
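The idea can be sketched in Python. This is an illustration of the least-squares principle on made-up data, not a description of Excel's internal code; the function names are hypothetical, and the line is found with the standard closed-form result rather than by trial and error.

```python
def sum_squared_vertical(xs, ys, a, b):
    """Sum of the squared vertical distances from each point to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a + b*x that minimises that sum."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
best = sum_squared_vertical(xs, ys, a, b)

# Nudge the intercept and gradient: every nudged line should score worse.
nudged = [
    sum_squared_vertical(xs, ys, a + da, b + db)
    for da in (-0.5, 0, 0.5)
    for db in (-0.2, 0, 0.2)
    if (da, db) != (0, 0)
]

print(a, b, best)
print(min(nudged) > best)
```

Moving the line in any of the tried directions makes the total of the squared distances larger, which is the sense in which the calculated line is "best".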

Why do you think it squares the value?

This seems complicated, but there are formulae you can use to do it quickly.

To begin with, we will consider how each coordinate differs from the mean.

Above you can see how the formulae can be rewritten into a form that is easier to calculate.
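As a sketch of that rewriting, here is the usual expansion for Sxx, assuming it is defined as the sum of squared differences from the mean and using the fact that the mean is the total divided by n:

$$S_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - 2\bar{x}\sum x + n\bar{x}^2 = \sum x^2 - \frac{(\sum x)^2}{n}, \qquad \bar{x} = \frac{\sum x}{n}$$

The same steps work for Syy, and for Sxy if you start from the sum of (x - mean of x)(y - mean of y).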

Below is how the three parts need to be put together to produce the Product Moment Correlation Coefficient (PMCC), the r from the Excel graphs we considered earlier.

$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$$

$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}$$

$$S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}$$
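A short worked sketch in Python, starting from the kind of "bits" an exam might give you (n and the five totals). It assumes the standard way of combining the three parts, r = Sxy / sqrt(Sxx × Syy). The measurements are hypothetical, not the class data, and the function names are made up for illustration.

```python
import math

def s_values(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Sxx, Syy and Sxy from the summary totals."""
    sxx = sum_x2 - sum_x ** 2 / n
    syy = sum_y2 - sum_y ** 2 / n
    sxy = sum_xy - sum_x * sum_y / n
    return sxx, syy, sxy

def pmcc_from_s(sxx, syy, sxy):
    """r = Sxy / sqrt(Sxx * Syy)."""
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements in cm: x = foot length, y = handspan.
foot = [23.0, 24.1, 24.8, 26.0, 27.5]
handspan = [18.5, 19.0, 20.2, 21.0, 22.4]

# The totals you would build up in a table or on a calculator.
n = len(foot)
sum_x = sum(foot)
sum_y = sum(handspan)
sum_x2 = sum(x * x for x in foot)
sum_y2 = sum(y * y for y in handspan)
sum_xy = sum(x * y for x, y in zip(foot, handspan))

sxx, syy, sxy = s_values(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy)
r = pmcc_from_s(sxx, syy, sxy)

print(round(sxx, 3), round(syy, 3), round(sxy, 3))
print(round(r, 3))
```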

Try Ex 6B Q1, 2, 3, 4, 5, 9

The formulae for Sxx etc. can also be used to find the equation of the line of best fit.

A straight line is in the form y = a + bx

where b is the gradient and found using

$$b = \frac{S_{xy}}{S_{xx}}$$

Given the gradient of the line, and knowing it should pass through the point $(\bar{x}, \bar{y})$, a can easily be calculated:

$$a = \bar{y} - b\bar{x}$$

where $\bar{y}$ is the mean of the y data and $\bar{x}$ is the mean of the x data.
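A minimal Python sketch of these steps, assuming the gradient is b = Sxy / Sxx and that a is chosen so the line passes through the mean point. The measurements are hypothetical, not the class data, and `regression_y_on_x` is a made-up name.

```python
def regression_y_on_x(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    b = sxy / sxx                      # gradient
    a = sum(ys) / n - b * sum(xs) / n  # makes the line pass through (mean x, mean y)
    return a, b

# Hypothetical measurements in cm: x = foot length, y = handspan.
foot = [23.0, 24.1, 24.8, 26.0, 27.5]
handspan = [18.5, 19.0, 20.2, 21.0, 22.4]

a, b = regression_y_on_x(foot, handspan)

# Estimate the handspan for a foot length of 25 cm.
estimate = a + b * 25.0

print(round(a, 3), round(b, 3))
print(round(estimate, 2))
```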

From our class data, estimate the handspan of a Year 12 SAC student who has a foot length of 30 cm.

Estimate the foot length of a student with a handspan of 22 cm.

Redraw the data with handspan on the x axis and draw a line of best fit. Is your answer the same?

Download the data in Excel and swap the data columns over. What happens to the equation of the line of best fit?

Calculate the foot length above using both equations Excel gives you.

Comment on your results.

correlation 1.xls

You will notice that the formula for b uses Sxx but not Syy.

This is because this formula is only used for a line of best fit required for finding y given a specific x coordinate. It minimises the vertical distance of each point from the line of best fit.

If you want to estimate the x value given a specific y value, you should use a different line of best fit, which minimises the horizontal distance from each point to the line. The formula for this line of best fit is only very slightly different:

$$b' = \frac{S_{xy}}{S_{yy}}$$

Use b' and the means of x and y to find a'
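A sketch comparing the two lines in Python. It assumes the second line is written as x = a' + b'y, with a' chosen so that it also passes through the mean point. The measurements are hypothetical, and the variable names are made up for illustration.

```python
def s_values(xs, ys):
    """Sxx, Syy and Sxy for paired data."""
    n = len(xs)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return sxx, syy, sxy

# Hypothetical measurements in cm: x = foot length, y = handspan.
foot = [23.0, 24.1, 24.8, 26.0, 27.5]
handspan = [18.5, 19.0, 20.2, 21.0, 22.4]

sxx, syy, sxy = s_values(foot, handspan)
mean_x = sum(foot) / len(foot)
mean_y = sum(handspan) / len(handspan)

# Line for estimating y from x (vertical distances): y = a + b*x
b = sxy / sxx
a = mean_y - b * mean_x

# Line for estimating x from y (horizontal distances): x = a' + b'*y
b_dash = sxy / syy
a_dash = mean_x - b_dash * mean_y

# Estimate the foot length for a handspan of 22 cm in two ways.
y_known = 22.0
x_from_y_on_x = (y_known - a) / b           # rearranging the "wrong" line
x_from_x_on_y = a_dash + b_dash * y_known   # using the line built for this job

print(round(x_from_y_on_x, 2), round(x_from_x_on_y, 2))
```

The two estimates are close but not equal. They only agree exactly when the points lie on a perfect straight line, because b × b' works out to be r².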

The only time we don't use the x = a' + b'y version for estimating x when we know the y value is when the x data is FIXED.

If you collect data from an experiment where one variable is pre-set, we call that variable FIXED. We must plot it on the x axis and then use the y = a + bx line of best fit for any estimating of values.

An example of this might be timing an ice cube melting at certain temperatures. The temperatures used are decided beforehand (FIXED), so temperature needs to be on the x axis.

Try Ex 7A Q1, 2, 4, 7, 9

A regression line can be used to estimate the value of the dependent variable for any given value of the independent variable.

Interpolation is when you estimate a value within the range of the data using the equation of the regression line (line of best fit).

Extrapolation is when you estimate a value outside the range of the data using the equation of the regression line.
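A small Python sketch that labels each estimate, taking interpolation to mean "inside the range of the observed x values" and extrapolation to mean "outside it". The measurements are hypothetical, and the function names are made up for illustration.

```python
def regression_y_on_x(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    b = sxy / sxx
    a = sum(ys) / n - b * sum(xs) / n
    return a, b

def estimate_with_label(x_new, xs, ys):
    """Estimate y at x_new and say whether it is inside the data range."""
    a, b = regression_y_on_x(xs, ys)
    inside = min(xs) <= x_new <= max(xs)
    kind = "interpolation" if inside else "extrapolation"
    return a + b * x_new, kind

# Hypothetical measurements in cm: x = foot length, y = handspan.
foot = [23.0, 24.1, 24.8, 26.0, 27.5]
handspan = [18.5, 19.0, 20.2, 21.0, 22.4]

for x_new in (25.0, 30.0):
    y_hat, kind = estimate_with_label(x_new, foot, handspan)
    print(x_new, round(y_hat, 2), kind)
```

In this made-up data the feet run from 23.0 cm to 27.5 cm, so a 25 cm foot is labelled interpolation and a 30 cm foot is labelled extrapolation.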

What do you think are the dangers of either of these techniques, and which one would you view most cautiously? Why?

Try Ex 7C Q1, 3, 4, 6, 8