Linear Regression and Correlation Topic 18

Linear Regression

Linear regression describes the link between two factors, where one value depends on the other.

E.g.
Driver's age and risk of accident
Gender and time spent shopping
Car price depends on the age of the car
Sales depend on marketing

Crickets and Temperature

Crickets make their chirping sounds by rapidly sliding one wing over the other.

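The faster they move their wings, the higher the chirping sound that is produced. Crickets move their wings faster as the temperature rises, so the chirp rate can be used to estimate the temperature.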

Analysing the data

First graph the data using the XY (Scatter) option.

Then right-click on one of the data points and select Add Trendline.

Select the Linear regression type.

Now right-click on the trendline and select Format Trendline, then select Options, and finally select Display equation on chart.

We can now predict the temperature.
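The same trendline can be reproduced outside the spreadsheet. The sketch below is a minimal least-squares fit in plain Python, which is the method a linear trendline uses. The function name `fit_line` and the readings are hypothetical and chosen only for illustration; they are not the cricket data from the slides.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y, and variation of x, about their means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical readings for illustration only (not the data from the slides)
chirps_per_second = [14.0, 15.0, 16.0, 17.0, 18.0]
temperature_c = [22.0, 24.5, 26.0, 28.5, 30.0]

slope, intercept = fit_line(chirps_per_second, temperature_c)
predicted = slope * 16.5 + intercept  # predict the temperature at 16.5 chirps per second

print(f"Temperature = {slope:.2f} * X + ({intercept:.2f})")
print(f"Predicted temperature at 16.5 chirps/s: {predicted:.1f} C")
```

The printed equation plays the same role as the one displayed on the chart: substitute a new X value to get a predicted temperature.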

Line of Best Fit

You can see differences between the measured values and the calculated values. Why?

Mean Squared Error (MSE)

The mean squared error or MSE of an estimator is the expected value of the square of the "error."

The error is the amount by which the estimator differs from the quantity to be estimated.

The difference occurs because of randomness, or because the estimator doesn't account for information that could produce a more accurate estimate.
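For a set of measured and calculated values, this definition is usually applied in its sample form. The symbols below are notation introduced here for illustration, not taken from the slides: n is the number of data points, y_i is a measured value and ŷ_i is the corresponding calculated value.

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Squaring each error stops positive and negative errors from cancelling out, and it penalises large errors more heavily than small ones.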

Root Mean Square Error

The root mean square error (RMSE) is a frequently used measure of the difference between the values predicted by a model and the values actually observed from the thing being modelled or estimated.

The lower the value of the RMSE, the better the fit of the calculated data to the observed data.

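RMSE

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$

where $y_i$ are the observed values, $\hat{y}_i$ are the predicted values, and $n$ is the number of observations.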
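As a rough sketch of the calculation, the following Python computes the MSE and then the RMSE as its square root. The function names and the two lists are hypothetical, chosen for illustration; they are not the values from the slides.

```python
import math


def mean_squared_error(measured, calculated):
    """Average of the squared differences between measured and calculated values."""
    squared_errors = [(m - c) ** 2 for m, c in zip(measured, calculated)]
    return sum(squared_errors) / len(squared_errors)


def root_mean_squared_error(measured, calculated):
    """Square root of the MSE, expressed in the same units as the data."""
    return math.sqrt(mean_squared_error(measured, calculated))


# Hypothetical values for illustration only
measured = [22.0, 24.5, 26.0, 28.5, 30.0]
calculated = [22.2, 24.2, 26.2, 28.2, 30.2]

mse = mean_squared_error(measured, calculated)
rmse = root_mean_squared_error(measured, calculated)
print(f"MSE = {mse:.3f}, RMSE = {rmse:.3f}")
```

Because the RMSE is in the same units as the measurements (here degrees Celsius), it is the more natural of the two to quote alongside a prediction.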

Stating the Error

For our crickets we could then say: Temperature Y = 1.8635X − 3.7532, where X is the recorded beats per second of the cricket's wings, accurate to ± 2.07 °C.
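To show how this statement might be used, the sketch below wraps the equation in a small Python function. It assumes the ± 2.07 °C figure is the RMSE of the fit and treats it as a simple band around the estimate; the function name and the input of 15 beats per second are hypothetical.

```python
SLOPE = 1.8635       # from the trendline equation above
INTERCEPT = -3.7532  # from the trendline equation above
ERROR_C = 2.07       # quoted accuracy, assumed here to be the RMSE in degrees Celsius


def predict_temperature(beats_per_second):
    """Return (estimate, lower bound, upper bound) in degrees Celsius."""
    estimate = SLOPE * beats_per_second + INTERCEPT
    return estimate, estimate - ERROR_C, estimate + ERROR_C


# Hypothetical reading: 15 wing beats per second
estimate, low, high = predict_temperature(15.0)
print(f"Estimated temperature: {estimate:.1f} C (between {low:.1f} and {high:.1f} C)")
```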

Correlation Coefficient

The correlation coefficient is a measure of how well trends in the predicted values follow trends in the actual values. 

It is a measure of how well the predicted values from a forecast model "fit" with the real-life data.

The correlation coefficient is a number between −1 and +1.

If there is no relationship between the predicted values and the actual values, the correlation coefficient is 0 or very close to 0 (the predicted values are no better than random numbers).

As the strength of the relationship between the predicted values and actual values increases, the correlation coefficient moves further from 0, towards +1 or −1.

A perfect fit gives a coefficient of +1.0 or −1.0. Thus the closer the correlation coefficient is to +1 or −1, the better.

Correlation

Two main methods of calculating correlations are:

Spearman's Rank Correlation Coefficient and

Pearson's or the Product-Moment Correlation Coefficient.

Spearman’s Rank Correlation Coefficient

In calculating this coefficient, we use the Greek letter 'rho' (ρ) or r. The formula used to calculate this coefficient is:

$$\rho = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$

where d is the difference between the two ranks of each pair of values and n is the number of pairs.
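The rank-based calculation can be sketched in a few lines of Python: rank each list, take the difference d between the two ranks of each pair, and apply 1 − 6Σd² / (n(n² − 1)). The helper names and the data are hypothetical, and this simple version assumes there are no tied values.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward. Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(xs, ys):
    """Spearman's rank correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))


# Hypothetical readings for illustration only
chirps_per_second = [14.2, 15.1, 16.0, 17.3, 18.4]
temperature_c = [22.0, 25.0, 24.5, 28.5, 30.0]

rho = spearman_rho(chirps_per_second, temperature_c)
print(f"Spearman's rho = {rho:.2f}")
```

Only the ordering of the values matters here, so the result is unchanged if, say, the temperatures are converted to another unit.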

Pearson's or Product-Moment Correlation Coefficient

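The Pearson Correlation Coefficient is denoted by the symbol r. Its formula is based on the standard deviations of the x-values and the y-values:

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y}$$

where $\bar{x}$ and $\bar{y}$ are the means, $s_x$ and $s_y$ are the sample standard deviations of the x-values and y-values, and n is the number of data pairs.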
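A minimal Python sketch of the product-moment calculation follows. It divides the co-variation of x and y by the square root of the product of their individual variations, which is algebraically the same as dividing by the standard deviations. The function name and the three small data sets are hypothetical, chosen to show a perfect positive fit, a perfect negative fit and a weak relationship.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for illustration only
xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # perfect positive relationship
falling = [10 - 3 * x for x in xs]  # perfect negative relationship
scattered = [3, 1, 4, 1, 5]         # weak relationship

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
r_scattered = pearson_r(xs, scattered)
print(f"{r_rising:.2f} {r_falling:.2f} {r_scattered:.2f}")
```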

Coefficient of Determination R Squared

Shows the proportion of the variation in y that is explained by x.

The version most common in statistics texts is based on an analysis of variance decomposition as follows:

$$SST = SSR + SSE, \qquad R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

where SST is the total sum of squares, SSR is the explained sum of squares, and SSE is the residual sum of squares.
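The three sums of squares can be computed directly, as in the Python sketch below. The function name and the values are hypothetical; the calculated values are taken to come from a least-squares line fitted to the measured ones, which is the case where SST equals SSR plus SSE.

```python
def sums_of_squares(measured, calculated):
    """Return (SST, SSR, SSE) for measured values and their fitted values."""
    mean_y = sum(measured) / len(measured)
    sst = sum((y - mean_y) ** 2 for y in measured)                 # total
    ssr = sum((p - mean_y) ** 2 for p in calculated)               # explained
    sse = sum((y - p) ** 2 for y, p in zip(measured, calculated))  # residual
    return sst, ssr, sse


# Hypothetical values for illustration only; `calculated` is assumed to be
# the least-squares fit to `measured`
measured = [22.0, 24.5, 26.0, 28.5, 30.0]
calculated = [22.2, 24.2, 26.2, 28.2, 30.2]

sst, ssr, sse = sums_of_squares(measured, calculated)
r_squared = 1 - sse / sst
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}, R squared = {r_squared:.4f}")
```

An R squared close to 1 means that almost all of the variation in the measured values is accounted for by the fitted line.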

Thankfully, Excel calculates this for you: in the trendline options, select Display R-squared value on chart.