Correlation & Regression Chapter 15


Correlation

Correlation is a statistical technique used to measure and describe the relationship between two variables. Usually the two variables are simply observed as they exist; there is no attempt to manipulate or control them in any way.

For example, to study the relationship between age and delinquency, a researcher would record the ages of children and how many delinquent acts each has committed. She is not manipulating either variable; she is only observing.

This gives two scores for each child, one for age (X) and one for delinquency (Y). Plotting these pairs on a scatterplot lets us see the relationship between the two.


The same set of n = 6 pairs of scores (X and Y values) is shown in a table and in a scatterplot. Note that the scatterplot allows you to see the relationship between X and Y.


The Characteristics of a Relationship

Correlation measures three characteristics of a relationship between X and Y:

1. The direction of the relationship, which is either positive or negative.

• In a positive correlation, as X increases Y increases, and as X decreases Y decreases (identified by a + sign).

• In a negative correlation, as X increases Y decreases, and vice versa (identified by a - sign).


2. The form of the relationship: the most common form is linear, but there are other forms as well.

3. The degree of the relationship: a correlation also measures how well the data fit the form being considered. In a linear relationship, this means how well the data points fit a straight line. A perfect correlation is identified by +1 (positive) or -1 (negative). Correlations range from -1 to +1; a value of 0 means there is no fit at all.
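As a quick illustration (a sketch, not part of the original slides), the Python below computes r for three small hypothetical data sets. The helper name `pearson_r` is our own; it uses deviation scores. A perfectly rising line gives r = +1, a perfectly falling line gives r = -1, and loosely scattered points give a value near 0 (about +0.14 here).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation computed from deviation scores (illustrative helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # X and Y varying together
    ss_x = sum((x - mx) ** 2 for x in xs)                  # X varying on its own
    ss_y = sum((y - my) ** 2 for y in ys)                  # Y varying on its own
    return sp / sqrt(ss_x * ss_y)


x = [1, 2, 3, 4, 5]
perfect_pos = [2 * v + 1 for v in x]   # Y rises exactly in step with X
perfect_neg = [10 - 3 * v for v in x]  # Y falls exactly as X rises
scattered = [3, 1, 4, 2, 3]            # hypothetical scores with no clear linear trend

for label, ys in [("perfect positive", perfect_pos),
                  ("perfect negative", perfect_neg),
                  ("scattered", scattered)]:
    # The sign gives the direction; the absolute value gives the degree of fit
    print(f"{label:17s} r = {pearson_r(x, ys):+.2f}")
```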


Examples of positive and negative relationships: beer sales (gallons) are positively related to temperature, and coffee sales (gallons) are negatively related to temperature.


Examples of different values for linear correlations:
(a) a strong positive relationship, approximately +0.90
(b) a relatively weak negative correlation, approximately -0.40
(c) a perfect negative correlation, -1.00
(d) no linear trend, 0.00


Pearson Correlation

Measures the degree and direction of the linear relationship between the two variables.

Pearson Correlation:

$$r = \frac{\text{degree to which } X \text{ and } Y \text{ vary together}}{\text{degree to which } X \text{ and } Y \text{ vary separately}}$$

This is easy to do in SPSS: go to Analyze/Correlate/Bivariate, move the variables of interest into the box on the right, put a check mark against Pearson, and click OK. SPSS gives you output with the value of r and its significance.


Pearson Correlation – Uses and interpretation

1. Prediction: If two variables are known to be related in a systematic way, you can use one to make predictions about the other. Example: SAT scores are used to predict college performance.

2. Validity: To check whether a test or measure is measuring what it is supposed to measure. E.g., if a new IQ test really measures IQ, its scores should correlate strongly and positively with scores on other IQ tests.

3. Reliability: To check whether a measurement procedure provides consistent results. E.g., the same individuals should produce the same (or similar) scores on an IQ test repeated at different times, so you can correlate scores at Time 1 with scores at Time 2 on the same test; a strong positive correlation indicates that the measure is reliable.

4. Theory verification: To test whether the predictions of a theory hold up. E.g., a theory that predicts a relationship between brain size and IQ.


Pearson Correlation – Uses and interpretation

1. Correlation simply describes a relationship between two variables; it does not tell us why they are related. It does not establish cause and effect.

2. The value of a correlation can be greatly affected by the range of scores represented in the data.

3. One or two extreme data points (outliers) can have a dramatic effect on the value of a correlation.

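4. The value of r does not directly express the size of the effect. A correlation of 0.5 does not mean that you can predict with 50% accuracy. For that you use r² (the coefficient of determination), which measures the proportion of variability in one variable that can be determined from its relationship with the other. A correlation of 0.5 produces r² = 0.5 × 0.5 = 0.25, or 25%.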
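The arithmetic is simple enough to sketch in a few lines of Python (the function name is our own). Because r is squared, the sign drops out: r = -0.5 accounts for the same 25% as r = +0.5.

```python
def coefficient_of_determination(r):
    """r squared: proportion of variability in one variable predicted from the other."""
    return r ** 2


for r in (0.5, -0.5, 0.6, 1.0):
    r2 = coefficient_of_determination(r)
    # Report r, then r squared as a decimal and as a percentage
    print(f"r = {r:+.2f} -> r^2 = {r2:.2f} ({r2:.0%})")
```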


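Example: Causation Issues

Hypothetical data showing the relationship between the number of churches and the number of serious crimes for a sample of US cities (large and small). A positive correlation here does not mean that churches cause crime; both numbers tend to rise with the size of the city.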


(a) In this example, the full range of X and Y values shows a strong, positive correlation, but the restricted range of scores produces a correlation near zero.
(b) Here the full range of X and Y values shows a correlation near zero, but the scores in the restricted range produce a strong positive correlation.
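A minimal sketch of panel (a), using made-up data: Y tracks X over the full range, with a small alternating wobble. Across all 12 pairs r is about 0.85, but keeping only the pairs with X between 5 and 8 leaves mostly the wobble, and r drops to about 0.12. The data, the cut-off band, and the `pearson_r` helper are illustrative assumptions.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation computed from deviation scores (illustrative helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)


# Hypothetical data: Y follows X, plus an alternating +2/-2 wobble
x = list(range(1, 13))
y = [xi + (2 if xi % 2 else -2) for xi in x]

full_r = pearson_r(x, y)

# Keep only the pairs whose X falls in a narrow band (5 to 8)
restricted = [(xi, yi) for xi, yi in zip(x, y) if 5 <= xi <= 8]
rx, ry = zip(*restricted)
restricted_r = pearson_r(rx, ry)

print(f"full range:       r = {full_r:.2f}")
print(f"restricted range: r = {restricted_r:.2f}")
```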


A demonstration of how one extreme data point (an outlier) can influence the value of a correlation.
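The same effect can be reproduced with a few hypothetical numbers: five points with almost no linear trend (r ≈ 0.14), then the same five points plus one extreme pair. The single outlier pulls r up to about 0.97. The data and the helper name are assumptions made for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation computed from deviation scores (illustrative helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)


# Hypothetical scores with almost no linear trend
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 3]
r_without = pearson_r(x, y)

# Add one extreme data point (20, 20) far from the rest
r_with = pearson_r(x + [20], y + [20])

print(f"without outlier: r = {r_without:.2f}")
print(f"with outlier:    r = {r_with:.2f}")
```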


Three sets of data showing three different degrees of linear relationship:
r = 0 (r² = 0)
r = 0.60 (r² = 0.36, or 36%)
r = 1.00 (r² = 1, or 100%)


The Hypothesis Test with Pearson’s Correlation

The main purpose of the hypothesis test with a correlation is to determine whether a nonzero sample correlation reflects a real relationship in the population or is simply due to chance (sampling error).

Scatter plot of a population of X and Y values with a near-zero correlation. However, a small sample of n = 3 data points from this population shows a relatively strong, positive correlation. Data points in the sample are circled.
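A short simulation can make the same point. Assuming a population in which X and Y are independent standard-normal scores (so ρ = 0), the sketch draws many samples and records how often |r| exceeds 0.5. With n = 3 this happens in roughly two thirds of samples; with n = 30 it is rare. The seed, sample sizes, threshold, and `pearson_r` helper are arbitrary choices for illustration.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation computed from deviation scores (illustrative helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)


rng = random.Random(42)  # fixed seed so the run is repeatable


def sample_r(n):
    """r for n pairs drawn independently, so the population correlation is 0."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rng.gauss(0, 1) for _ in range(n)]
    return pearson_r(xs, ys)


trials = 2000
shares = {}
for n in (3, 30):
    rs = [sample_r(n) for _ in range(trials)]
    # Proportion of samples that look like a fairly strong correlation by chance
    shares[n] = sum(abs(r) > 0.5 for r in rs) / trials
    print(f"n = {n:2d}: {shares[n]:.1%} of samples have |r| > 0.5")
```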


Hypothesis Testing using Pearson’s Correlation

State the null and alternative hypotheses. The population correlation is represented by the Greek letter ρ (rho):

H0: ρ = 0 (no population correlation)

H1: ρ ≠ 0 (there is a real correlation)

The degrees of freedom for the correlation test are always df = n - 2.

Look up the critical value in the table for the Pearson correlation, based on your df and alpha level (or just do it in SPSS and be happy!).

Reject or fail to reject the null hypothesis, and report the results in APA format (next slide).
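The slides use a table of critical r values. An equivalent route, sketched below, converts r to a t statistic with df = n - 2 and compares it with a critical t value:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

The critical t values are typed in from a standard t table (df = 28: 2.048 for α = .05 and 2.763 for α = .01, two-tailed); solving the same relation for r, r = t / √(t² + df), recovers the critical r values a Pearson table lists (.361 and .463). The example r = .65 with n = 30 and the function names are illustrative.

```python
from math import sqrt


def t_from_r(r, n):
    """t statistic for H0: rho = 0, with df = n - 2."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


def critical_r(t_crit, df):
    """Critical r implied by a critical t value for the same df."""
    return t_crit / sqrt(t_crit ** 2 + df)


r, n = 0.65, 30
df = n - 2
# Two-tailed critical t values for df = 28, copied from a standard t table
T_CRIT = {0.05: 2.048, 0.01: 2.763}

t = t_from_r(r, n)
print(f"r = {r}, n = {n}, df = {df}, t = {t:.2f}")
for alpha, t_crit in T_CRIT.items():
    decision = "reject H0" if abs(t) > t_crit else "fail to reject H0"
    print(f"alpha = {alpha}: critical r = {critical_r(t_crit, df):.3f} -> {decision}")
```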


Reporting Correlations

The correlation coefficient revealed that amount of education and annual income were significantly related, r = +.65, n = 30, p < .01, two tails.

If you have looked at a number of variables in your study, you can present results in a correlation matrix:

TABLE 1. Correlation Matrix for Income, Amount of Education, Age, and Intelligence

             Education   Age      IQ
Income       +.65*       +.41**   +.27
Education                +.11     +.38**
Age                               -.02

n = 30
*p < .01, two tails
**p < .05, two tails
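To build a matrix like this yourself, you compute r for every pair of variables and fill in one triangle. The sketch below does this for three variables of made-up data (the numbers are hypothetical and are not the data behind Table 1); it omits the significance stars.

```python
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation computed from deviation scores (illustrative helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)


# Hypothetical scores for n = 6 people (illustration only, not the data in Table 1)
data = {
    "Income":    [30, 42, 35, 50, 38, 61],
    "Education": [12, 16, 12, 18, 14, 20],
    "Age":       [25, 40, 33, 29, 51, 45],
}

names = list(data)
matrix = {a: {a: 1.0} for a in names}  # every variable correlates +1 with itself
for a, b in combinations(names, 2):
    r = pearson_r(data[a], data[b])
    matrix[a][b] = matrix[b][a] = r    # the matrix is symmetric

# Print the upper triangle, in the same layout as Table 1
print(" " * 10 + "".join(f"{b:>11}" for b in names[1:]))
for i, a in enumerate(names[:-1]):
    cells = ["" if j <= i else f"{matrix[a][b]:+.2f}"
             for j, b in enumerate(names[1:], start=1)]
    print(f"{a:<10}" + "".join(f"{c:>11}" for c in cells))
```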


The Spearman Correlation

Pearson measures the degree of linear relationship between two variables and is used for data measured on an interval or ratio scale.

Spearman is used when:

• the data have been measured on an ordinal scale (the X and Y values are ranks);

• the relationship is consistently in one direction but may not be linear;

• your interval-level data contain outliers that could affect the value of r drastically. Spearman converts the raw scores to ranks and does the calculations on the ranks.

• As for the calculations, leave them to SPSS!
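A sketch of what "converts raw scores to ranks" means, assuming the common convention that tied scores share the average of the ranks they occupy: rank each variable, then compute the Pearson correlation on the ranks. The function names and the curved, always-increasing example data are illustrative assumptions. Pearson r is below 1 for these data (about 0.92), while Spearman's rho is exactly 1 because the orderings of the two variables agree perfectly.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation computed from deviation scores (illustrative helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)


def ranks(values):
    """Rank scores from 1 (smallest) upward; tied scores share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_rho(xs, ys):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


# Hypothetical learning-curve data: always increasing, but curved
practice = [1, 2, 3, 4, 5, 6]
performance = [10, 30, 40, 45, 48, 50]
print(f"Pearson r    = {pearson_r(practice, performance):.3f}")
print(f"Spearman rho = {spearman_rho(practice, performance):.3f}")
```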


Example of a non-linear relationship

Hypothetical data showing the relationship between practice and performance. Although this relationship is not linear, there is a consistent positive relationship: an increase in performance tends to accompany an increase in practice.


How to Do It in SPSS

The SPSS commands are Analyze, Correlate, Bivariate. Put the two variables of interest in the variable box, check both Pearson’s and Spearman’s correlation, and click OK. Interpret the output and report the results as shown in the previous slides.

To get a scatterplot in SPSS, go to Graphs, Scatter, select ‘Simple’, and click ‘Define’. Put the variables on the X and Y axes, add titles if you want by clicking on ‘Titles’, and click OK.

Correlations (Spearman's rho)

(1) How many times in the last year have you hit a teacher?
(2) How many times in the last year have you been beaten up by your parents?

                              (1)        (2)
(1)  Correlation Coefficient  1.000      .201**
     Sig. (2-tailed)          .          .000
     N                        1443       1443
(2)  Correlation Coefficient  .201**     1.000
     Sig. (2-tailed)          .000       .
     N                        1443       1444

**. Correlation is significant at the 0.01 level (2-tailed).

You are interested in the .201** cell (or its mirror image across the diagonal, which holds the same value).
