Correlation is a statistical technique that describes the degree of relationship between two variables when you have bivariate data.

A bivariate distribution is required for correlation and regression technique.

A bivariate distribution is a joint distribution of two variables, the individual scores of which are paired in some logical way.

A bivariate distribution may show positive correlation, negative correlation, or zero correlation.

EDUCATIONAL STATISTICS EDU5950 WEEK12

In a case of a positive correlation between two variables, high measurements on one variable tend to be associated with high measurements on the other variable, and low measurements on one variable with low measurements on the other.

Table 1 shows such a relationship: tall fathers tend to have sons who grow up to be tall men.

Short fathers tend to have sons who grow up to be short men.

If the relationship were perfect (which, of course, is ridiculous), then it would be possible to predict without error the adult height of an unborn son simply by measuring his father.

In Figure 1, each point represents a pair of scores, the height of a father and the height of his son.

Such an array of points is called a scatterplot.

The line that runs through the points is called a regression line.

It is a line of best fit.

When there is perfect correlation (r = 1.00 or r = -1.00), all points fall exactly on the regression line.

When a correlation is negative, increases in one variable are accompanied by decreases in the other variable (an inverse relationship).

With negative correlation, the regression line goes from the upper left corner of the graph to the lower right corner.

As you may recall, such lines have a negative slope.

Although some correlation coefficients are positive and some are negative, one is not more valuable than the other.

The algebraic sign simply tells you the direction of the relationship (which is important when you are describing how the variables are related).

The absolute size of r, however, tells you the degree of the relationship.

A strong relationship (either positive or negative) is usually more valuable than a weaker one.

A zero correlation means there is no linear relationship between two variables.

High and low scores on the two variables are not associated in any predictable manner.

Figure 2 shows a scatterplot that produces a zero correlation coefficient.

When r = 0, the regression line is horizontal at a height of $\bar{Y}$, the mean of the Y scores.

This makes sense: if r = 0, then your best estimate of Y for any value of X is $\bar{Y}$.
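The horizontal-line claim can be checked numerically. The sketch below is only an illustration: the scores are hypothetical, chosen so that r = 0, and the helper `least_squares_line` is an assumed name that fits a line by the usual least squares formulas.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the least squares line for paired scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sum of squared X deviations.
    sum_products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sum_squares_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum_products / sum_squares_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical scores chosen so that r = 0 (not data from the slides).
x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 1, 3]

intercept, slope = least_squares_line(x, y)
mean_y = sum(y) / len(y)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, mean of Y = {mean_y:.2f}")
```

With a slope of zero, the fitted line predicts the mean of Y for every X, which is the horizontal line described above. These scores follow a U-shaped pattern, a reminder that r = 0 rules out only a linear relationship.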

A correlation coefficient provides a quantitative way to express the degree of relationship that exists between two variables.

The definition formula is

$$ r = \frac{\sum z_X z_Y}{N} $$

where
r = Pearson product-moment correlation coefficient
$z_X$ = a z score for variable X
$z_Y$ = the corresponding z score for variable Y
N = number of pairs of X and Y values
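As a rough illustration of the definition formula, the sketch below converts each score to a z score and averages the products of paired z scores, that is, r = (sum of z_X times z_Y) / N. The data and function names are hypothetical. The standard deviation is assumed to use N rather than N - 1 in its denominator, which this form of the formula needs in order to give exactly 1 or -1 for perfectly linear data.

```python
import math


def z_scores(values):
    """Convert raw scores to z scores using the N-denominator standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def pearson_r_from_z(x, y):
    """r = sum(z_X * z_Y) / N, where N is the number of pairs."""
    zx = z_scores(x)
    zy = z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


# Hypothetical paired scores (not data from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r_from_z(x, y), 4))  # 0.7746
```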

Because researchers often use means and standard deviations when telling the story of the data, this formula (the blanched formula) is used by many:

$$ r = \frac{\dfrac{\sum XY}{N} - \bar{X}\,\bar{Y}}{S_X S_Y} $$

where
X and Y = paired observations
XY = product of each X value multiplied by its paired Y value
$\bar{X}$ = mean of variable X
$\bar{Y}$ = mean of variable Y
$S_X$ = standard deviation of variable X
$S_Y$ = standard deviation of variable Y
N = number of pairs of observations
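The means-and-standard-deviations form might be sketched as follows, assuming it reads r = (mean of XY minus mean of X times mean of Y) / (S_X times S_Y). The data and helper names are hypothetical, and the standard deviations are assumed to be computed with N in the denominator, which this form requires.

```python
import math


def mean(values):
    """Arithmetic mean of a list of scores."""
    return sum(values) / len(values)


def sd_n(values):
    """Standard deviation with N (not N - 1) in the denominator."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def pearson_r_blanched(x, y):
    """r = (sum(XY)/N - mean_X * mean_Y) / (S_X * S_Y)."""
    mean_xy = sum(a * b for a, b in zip(x, y)) / len(x)
    return (mean_xy - mean(x) * mean(y)) / (sd_n(x) * sd_n(y))


# Hypothetical paired scores (not data from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r_blanched(x, y), 4))  # 0.7746
```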

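With the raw score formula, you calculate r from the raw scores without computing means and standard deviations.

The formula is

$$ r = \frac{N\sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}} $$

Remember that N is the number of pairs of values.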
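A minimal sketch of a raw-score computation is shown below. It assumes the usual raw score form, which needs only five sums (X, Y, XY, X squared, Y squared) and N. The data and the function name are hypothetical.

```python
import math


def pearson_r_raw(x, y):
    """Compute r from raw scores using only sums, without means or standard deviations."""
    n = len(x)  # number of pairs
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical paired scores (not data from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r_raw(x, y), 4))  # 0.7746
```

For these scores the sums are 15, 20, 66, 55, and 86, so the numerator is 30 and the denominator is the square root of 1500.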

The basic interpretation of r is probably familiar to you at this point.

A correlation coefficient measures the degree of linear relationship between two variables of a bivariate distribution.

What qualifies as a large correlation coefficient? What is small?

Jacob Cohen proposed that the question be answered by calculating an effect size index (d), and he designated d values of .20, .50, and .80 as small, medium, and large, respectively.

In a similar way, Cohen addressed the question of calculating an effect size index for correlation coefficients.

Small r = .10
Medium r = .30
Large r = .50
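One way to apply these benchmarks is sketched below. Treating each benchmark as a lower cutoff and labeling anything under .10 as "below small" are assumptions made for illustration; Cohen's values are reference points, not strict category boundaries. The function name is hypothetical.

```python
def cohen_label(r):
    """Label the size of a correlation using Cohen's benchmarks as lower cutoffs (an assumption)."""
    size = abs(r)  # the sign gives direction only; the absolute value gives the degree
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "below small"


for r in (0.05, 0.10, -0.35, 0.77):
    print(f"r = {r:+.2f}: {cohen_label(r)}")
```

The absolute value is taken first because a negative correlation of a given size is as strong as a positive one.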

The correlation coefficient is also the basis of the coefficient of determination, which tells the proportion of variance that two variables in a bivariate distribution have in common.

The coefficient of determination is calculated by squaring r; it is never negative and always lies between 0 and 1.

coefficient of determination = $r^2$
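A short sketch of the squaring step, using Cohen's benchmark values for r as inputs; the function name is hypothetical. It also shows that the sign of r disappears once r is squared.

```python
def coefficient_of_determination(r):
    """Proportion of variance the two variables have in common (r squared)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return r ** 2


for r in (0.10, 0.30, 0.50, -0.50):
    shared = coefficient_of_determination(r)
    print(f"r = {r:+.2f}: r squared = {shared:.2f} ({shared:.0%} of variance in common)")
```

Even a "large" correlation of .50 corresponds to only a quarter of the variance being shared.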

Linear regression is a technique that uses the data to write an equation for a straight line and then uses that line to make predictions.

Often predictions are based on an assumption that the relationship between two variables is linear.

The formula for a straight line is Y = mX + b

where
- Y and X are variables representing scores on the Y and X axes
- m = slope of the line
- b = intercept of the line with the Y axis

Least squares method

Consider these data:
Father's height: 64 in
First daughter's height: 66 in
Second daughter's height: 64 in

Draw a regression line using the least squares method.

There is an error for each person on the scatterplot: the difference between that person's actual Y score and the Y score predicted by the line.

The least squares method creates a straight line such that the sum of the squares of the errors is a minimum.

The least squares method produces numerical values for the slope and the intercept. With these values you can write the equation for a straight line; this line is the one that best fits the data.
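The minimizing property can be illustrated numerically. The sketch below fits a line to hypothetical scores with the usual least squares formulas, then compares its sum of squared errors with that of a few nudged lines. All names and data are assumptions made for the example.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the least squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    return mean_y - slope * mean_x, slope


def sum_squared_errors(intercept, slope, x, y):
    """Sum of squared differences between actual and predicted Y scores."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical paired scores (not data from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

intercept, slope = least_squares_line(x, y)
best = sum_squared_errors(intercept, slope, x, y)
print(f"least squares line: SSE = {best:.2f}")

# Nudging the intercept or the slope away from the fitted values increases the error.
nudged = []
for d_intercept, d_slope in [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.2), (0.0, -0.2)]:
    sse = sum_squared_errors(intercept + d_intercept, slope + d_slope, x, y)
    nudged.append(sse)
    print(f"intercept {d_intercept:+.1f}, slope {d_slope:+.1f}: SSE = {sse:.2f}")
```

Four nudges are not a proof, but they show the pattern: every line other than the least squares line leaves a larger sum of squared errors.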

In statistics, the regression equation is

$$ \hat{Y} = a + bX $$

where
$\hat{Y}$ = Y value predicted from a particular X value
a = point at which the regression line intersects the Y axis
b = slope of the regression line
X = X value for which you wish to predict a Y value

Note that b here is the slope, whereas in Y = mX + b it stood for the intercept.

Note: In a correlation problem, the symbol Y can be assigned to either variable, but in a regression equation, Y is assigned to the variable you wish to predict.
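Once a and b are known, prediction is a single line of arithmetic. The sketch below assumes the regression equation has the form predicted Y = a + bX. The coefficient values are hypothetical placeholders, not results from the slides.

```python
def predict_y(a, b, x):
    """Predicted Y for a given X, using predicted Y = a + b * X."""
    return a + b * x


# Hypothetical regression coefficients, for illustration only.
a = 2.2  # intercept: predicted Y when X = 0
b = 0.6  # slope: change in predicted Y for each one-unit increase in X

for x in (0, 3, 6):
    print(f"X = {x}: predicted Y = {predict_y(a, b, x):.1f}")
```

With these assumed coefficients, X = 0 returns the intercept, and each additional unit of X adds 0.6 to the prediction.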

You need the values for a and b, which are called regression coefficients; they can be calculated from any set of bivariate data.

To calculate b,

$$ b = r\,\frac{S_Y}{S_X} $$

To calculate a,

$$ a = \bar{Y} - b\bar{X} $$
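The two regression coefficients might be computed as in the sketch below. It assumes the common forms b = r times S_Y / S_X and a = mean of Y minus b times mean of X, with standard deviations that use N in the denominator. The data and the function name are hypothetical.

```python
import math


def regression_coefficients(x, y):
    """Return (a, b) for the line predicted Y = a + b * X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    # Pearson r: mean cross-product of deviations divided by the two standard deviations.
    r = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n * s_x * s_y)
    b = r * s_y / s_x        # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


# Hypothetical paired scores (not data from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = regression_coefficients(x, y)
print(f"a = {a:.2f}, b = {b:.2f}")
```

Because a is defined through the two means, the fitted line always passes through the point (mean of X, mean of Y).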

Let us use the data from the earlier correlation problem to predict math scores from oral test scores in the SPM.

Entering these regression coefficient values into the regression equation produces a formula that predicts math from oral test:
