Chapter 9: Linear Correlation

Correlation

Correlational Research

Correlational research: describes the relationship between two or more naturally occurring variables.

– Is age related to political conservatism?

– Are highly extraverted people less afraid of rejection than less extraverted people?

– Is depression correlated with hypochondriasis?

– Is I.Q. related to reaction time?

Measure two variables and determine whether a relationship is present:

predictor <-> criterion

Correlation does not establish causality because of:

– Direction: there is no way to tell which variable is the cause and which is the effect.

– Third-variable problem: some third variable that was not measured could be responsible for the relationship.

3rd Variable Problem (Dr. Dimwit’s conclusions)

– Red car → speeding ticket: ?

– Midnight basketball → less crime: ?

– Vitamins → healthier: ?

– Larger feet → reading skills: ?

Scales of Measurement and Indicators

Scale of Measurement | Indicator of Central Tendency | Indicator of Variability | Indicator of Association

Nominal/Categorical | Mode | Variation Ratio | Cramér’s Phi (ϕc)

Ordinal | Median | Semi-Interquartile Range (SIQ) | Spearman’s Rho (rs)

Interval/Ratio | Mean | Standard Deviation | Pearson’s r

Correlation Coefficients

Correlation Coefficient | Predictor (X) | Criterion (Y)

Cramér’s Phi (ϕc) | Nominal/Categorical | Nominal/Categorical

Spearman’s Rho (rs) | Ordinal | Ordinal

Pearson’s r | Interval or Ratio | Interval or Ratio

[Figure: four scatterplots showing no relationship, a positive linear relationship, a negative linear relationship, and a curvilinear relationship (linear correlation not appropriate)]

Estimate r for each case.

• Correlation coefficient (r)

– +1.00: perfect positive correlation

– -1.00: perfect negative correlation

– 0: no linear correlation

– |r| (absolute value): magnitude (strength) of the relationship

– Sign of r: direction of the relationship

– r²: proportion (%) of the variance in Y explained by X

Types of correlation coefficients

Pearson’s correlation coefficient: linear relationship between two interval/ratio variables.

Spearman’s rank-order correlation: linear relationship between the ranks of two variables measured using ordinal (ranked) scores.

Point-biserial correlation: linear relationship between the scores from one continuous variable and one dichotomous (0 or 1) variable.
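Spearman’s rho and the point-biserial correlation can both be understood as Pearson’s r applied to transformed scores: ranks in the first case, a 0/1 code in the second. The sketch below shows that relationship in plain Python. The function names are hypothetical, ties are handled by average ranks (an assumption; the slides do not discuss ties), and the example data are made up.

```python
# Sketch (stdlib only): Spearman's rho and the point-biserial r computed as
# Pearson's r on transformed scores. Example data are hypothetical.
from math import sqrt


def pearson_r(x, y):
    """Pearson's r from deviation scores: SP / sqrt(SS_X * SS_Y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)


def ranks(values):
    """Convert scores to ranks 1..n; tied scores share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of a run of tied scores
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of X and Y."""
    return pearson_r(ranks(x), ranks(y))


def point_biserial_r(group, scores):
    """Point-biserial r: Pearson's r with a dichotomous variable coded 0/1."""
    return pearson_r(group, scores)


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]             # hypothetical: rises steadily but not linearly
print(round(pearson_r(x, y), 3))  # 0.981
print(spearman_rho(x, y))         # 1.0, because the ranks agree perfectly
```

Because the relationship is perfectly monotonic but curved, the ranks line up exactly (rho = 1.0) while the raw-score Pearson r falls just short of 1.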

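Conceptual Formula for Pearson’s Correlation:

Positive r: the z-scores for X and Y tend to have the same sign (both positive or both negative).

Negative r: the z-scores for X and Y tend to have different signs.

[Figure: two quadrant plots with zx on the horizontal axis (negative to the left, positive to the right) and zy on the vertical axis (negative below, positive above), one for positive r and one for negative r]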

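$$r_{\text{population}} = \frac{\sum z_X z_Y}{N}$$

$$z_X = \frac{X_i - \bar{X}}{s_X} \qquad z_Y = \frac{Y_i - \bar{Y}}{s_Y}$$

(In the population formula, the standard deviations are computed with N in the denominator.)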
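The conceptual formula says r is the average cross-product of z-scores. A minimal sketch, assuming made-up scores and a hypothetical function name, shows that the population version (N-denominator standard deviations, divide by N) and the sample version (s with n - 1, divide by n - 1) give the same r, because the n versus n - 1 factors cancel.

```python
# Sketch of the conceptual formula: r as the average cross-product of z-scores.
# Population version: z-scores use the N-denominator SD and the sum is divided by N.
# Sample version: z-scores use s (n - 1 denominator) and the sum is divided by n - 1.
from statistics import mean, pstdev, stdev


def r_from_z(x, y, population=False):
    """Average cross-product of z-scores for paired scores x and y."""
    sd = pstdev if population else stdev
    mx, my = mean(x), mean(y)
    sx, sy = sd(x), sd(y)
    zx = [(xi - mx) / sx for xi in x]
    zy = [(yi - my) / sy for yi in y]
    divisor = len(x) if population else len(x) - 1
    return sum(a * b for a, b in zip(zx, zy)) / divisor


x = [2, 4, 6, 8]   # hypothetical scores
y = [1, 3, 2, 5]
print(round(r_from_z(x, y), 4), round(r_from_z(x, y, population=True), 4))  # 0.8315 0.8315
```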

Values for the example below: X̄ = 31.5, Ȳ = 11.1, s_X = 6.22, s_Y = 3.41

Research question: Is education about other ethnicities correlated with tolerant attitudes towards others?

Education Score (X) | Tolerance Score (Y) | Zx | Zy | ZxZy

25 | 3 | -1.05 | -2.38 | 2.50

25 | 9 | -1.05 | -.62 | .65

33 | 14 | .24 | .85 | .20

35 | 11 | .56 | -.03 | -.02

38 | 13 | 1.05 | .56 | .59

36 | 14 | .72 | .85 | .61

31 | 12 | -.08 | .26 | -.02

29 | 12 | -.40 | .26 | -.10

22 | 9 | -1.53 | -.62 | .95

41 | 14 | 1.53 | .85 | 1.30

Σ = 315 | 111 | | | 6.66

$$s_X = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$$

$$r = \frac{\sum z_X z_Y}{n - 1} = \frac{6.66}{10 - 1} = .74$$
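The hand calculation above can be checked with a short script. This sketch uses the table’s ten pairs of scores and Python’s `statistics` module; the variable names are illustrative. It does not round intermediate values, so its sum of cross-products (about 6.61) and r (about .735) differ slightly from the table’s 6.66 and .74, which come from z-scores rounded to two decimals.

```python
# Worked example sketch (stdlib only): reproduces the z-score table for the
# education/tolerance data without rounding intermediate values.
from statistics import mean, stdev

education = [25, 25, 33, 35, 38, 36, 31, 29, 22, 41]  # X: education scores
tolerance = [3, 9, 14, 11, 13, 14, 12, 12, 9, 14]     # Y: tolerance scores

n = len(education)
mx, my = mean(education), mean(tolerance)    # means of X and Y
sx, sy = stdev(education), stdev(tolerance)  # sample SDs (n - 1 denominator)

zx = [(x - mx) / sx for x in education]
zy = [(y - my) / sy for y in tolerance]
sum_products = sum(a * b for a, b in zip(zx, zy))

r = sum_products / (n - 1)
r_squared = r ** 2  # coefficient of determination

print(f"mean X = {mx}, mean Y = {my}")              # mean X = 31.5, mean Y = 11.1
print(f"s_X = {sx:.2f}, s_Y = {sy:.2f}")            # s_X = 6.22, s_Y = 3.41
print(f"sum of zX*zY = {sum_products:.2f}")         # sum of zX*zY = 6.61
print(f"r = {r:.3f}, r squared = {r_squared:.2f}")  # r = 0.735, r squared = 0.54
```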

Could an r of .74 in a sample of 10 pairs be (1) due to chance, such as random sampling error, or (2) very UNLIKELY to occur by chance alone (< 5%)?

Answering this requires inferential statistics.

[Figure: scatterplot of Tolerance Score (vertical axis, 0 to 16) against Education Score (horizontal axis, 0 to 50)]

Testing Pearson’s r for significance

H0: ρ = 0 x<->y association does not exist

Ha: ρ ≠ 0 x<->y association exists (non-directional)

Using the t distribution:

$$t = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}}$$

Using the table of critical values:

df = N – 2 (N is the number of pairs of scores) = 10 – 2 = 8

Two-tailed critical values of t

df | p = 0.05 | p = 0.01

1 | 12.71 | 63.66

2 | 4.30 | 9.92

3 | 3.18 | 5.84

4 | 2.78 | 4.60

5 | 2.57 | 4.03

6 | 2.45 | 3.71

7 | 2.36 | 3.50

8 | 2.31 | 3.36

9 | 2.26 | 3.25

10 | 2.23 | 3.17

11 | 2.20 | 3.11

12 | 2.18 | 3.05

13 | 2.16 | 3.01

14 | 2.14 | 2.98

Using a t-table

Ha: an association exists between education and tolerance (two-tailed)

α = .05

df = N – 2 = 10 – 2 = 8

If |t| > 2.31, reject H0, leaving Ha.

If |t| ≤ 2.31, retain H0.
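The decision rule can be written as a small lookup over the t table above. In this sketch the table is copied into a dictionary and `decide` is a hypothetical helper; only the two tabled α levels (.05 and .01) and df = 1 to 14 are supported, and the comparison uses |t| because the test is two-tailed.

```python
# Sketch: the decision rule above, using the two-tailed critical values
# from the t table (df: (p = .05, p = .01)). Function name is hypothetical.
CRITICAL_T = {
    1: (12.71, 63.66), 2: (4.30, 9.92), 3: (3.18, 5.84), 4: (2.78, 4.60),
    5: (2.57, 4.03), 6: (2.45, 3.71), 7: (2.36, 3.50), 8: (2.31, 3.36),
    9: (2.26, 3.25), 10: (2.23, 3.17), 11: (2.20, 3.11), 12: (2.18, 3.05),
    13: (2.16, 3.01), 14: (2.14, 2.98),
}
ALPHA_COLUMN = {0.05: 0, 0.01: 1}


def decide(t_obtained, n_pairs, alpha=0.05):
    """Two-tailed test of H0: rho = 0 with df = N - 2 (N = number of pairs)."""
    if alpha not in ALPHA_COLUMN:
        raise ValueError("the table only lists alpha = .05 and .01")
    df = n_pairs - 2
    if df not in CRITICAL_T:
        raise ValueError("the table only covers df = 1 to 14")
    t_crit = CRITICAL_T[df][ALPHA_COLUMN[alpha]]
    # compare |t| because the test is non-directional
    return "reject H0" if abs(t_obtained) > t_crit else "retain H0"


print(decide(2.0, 10))   # retain H0  (|2.0| <= 2.31)
print(decide(-2.5, 10))  # reject H0  (|-2.5| > 2.31)
```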

Hypotheses

Directional hypothesis – Ha states whether the correlation is expected to be positive or negative (one-tailed test appropriate).

Nondirectional hypothesis – Ha states that there is an association, but does not specify the direction (two-tailed test appropriate).

[Figure: t distribution with df = 8 and α level = .05, with example values t = -2.0, t = +2.0, and t = -1.67 marked]

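Our example

$$t = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}} = \frac{.74\sqrt{10 - 2}}{\sqrt{1 - .74^2}} = 3.11$$

t obtained = 3.11, df = 10 – 2 = 8

t critical (df = 8, α = .05, two-tailed) = 2.31

Because 3.11 > 2.31, reject H0. Because 3.11 is also below the p = .01 critical value of 3.36, .01 < p < .05.

APA Style: r(df) = value obtained, p = .## (report the exact p when software provides it)

r(8) = .74, p < .05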
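The same test can be computed directly. This sketch evaluates t for r = .74 with 10 pairs and, as an assumption not covered in the slides, obtains an exact two-tailed p value by numerically integrating the Student’s t density with Simpson’s rule (stdlib only; the helper names are hypothetical). It gives t of about 3.11 and a p value between .01 and .02, consistent with the table-based conclusion.

```python
# Sketch (stdlib only): the t statistic for r, plus a two-tailed p value from
# numerically integrating the t density with Simpson's rule.
from math import gamma, pi, sqrt


def t_for_r(r, n_pairs):
    """t = r * sqrt(N - 2) / sqrt(1 - r^2), with df = N - 2."""
    return r * sqrt(n_pairs - 2) / sqrt(1 - r ** 2)


def t_pdf(t, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|) = 1 - 2 * (area under the density from 0 to |t|)."""
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3


t = t_for_r(0.74, 10)
p = two_tailed_p(t, df=8)
print(f"t(8) = {t:.2f}, two-tailed p = {p:.3f}")
```

Statistical software (for example SPSS, below) reports this exact p, which is what APA style asks for.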

Hypothesis Testing

Rejecting the null hypothesis – concluding that the null hypothesis is wrong, which leaves us with the alternative hypothesis (Ha) that there is an association between predictor and criterion.

Failing to reject the null hypothesis – concluding that the null hypothesis (no association) remains a plausible possibility.

We do not “accept” the null hypothesis (H0), because the null hypothesis can never be proven.

Errors

• Type I error – a researcher rejects the null hypothesis when it is true (a false positive).

– Alpha (α) – the probability of a Type I error (most commonly α = .05).

• Type II error – a researcher fails to reject the null hypothesis when it is false (a false negative).

– Beta (β) – the probability of a Type II error (most commonly β = .20).

Statistical Decisions and Outcomes

Statistical Decision | Reality (unknown): Null hypothesis false | Reality (unknown): Null hypothesis true

Reject null hypothesis | Correct: correlation exists | Type I Error (α): incorrectly conclude there is a correlation

Fail to reject (we retain) null hypothesis | Type II Error (β): incorrectly conclude no correlation | Correct: correlation does not exist

Power

• Power is the probability that a study will detect an effect that is really present (correctly reject a false null hypothesis).

• Power = 1 – β. It is typically set at .80, an 80% chance of detecting an effect when it is present.

• Power analysis is used to decide how many participants are needed to detect a significant effect; increasing the number of participants increases power.
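The slides do not give a formula for power analysis. A common approximation, assumed here, uses Fisher’s z transformation of r: N ≈ ((z for 1 – α/2 + z for power) / z_r)² + 3. The sketch below (hypothetical function name, Python’s `statistics.NormalDist` for the normal quantiles) estimates the number of pairs needed for a two-tailed test at α = .05 with power = .80. Dedicated power software may differ by a participant or so.

```python
# Sketch: approximate sample size for detecting a population correlation r,
# assuming the Fisher z approximation (not given in the slides).
from math import ceil, log
from statistics import NormalDist


def required_n(r, alpha=0.05, power=0.80):
    """Approximate number of pairs for a two-tailed test of H0: rho = 0."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = .05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = .80
    fisher_z = 0.5 * log((1 + r) / (1 - r))        # Fisher's z transform of r
    return ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)


for r in (0.10, 0.30, 0.50):
    print(f"r = {r:.2f}: N = {required_n(r)}")
# r = 0.10: N = 783
# r = 0.30: N = 85
# r = 0.50: N = 30
```

Smaller expected correlations demand far larger samples: detecting r = .10 takes roughly nine times as many pairs as r = .30.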

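Power Table: power for sample size n (rows) and population correlation r (columns)

n | .10 | .20 | .30 | .40 | .50 | .60 | .70 | .80 | .90

15 | .06 | .11 | .19 | .32 | .50 | .70 | .88 | .98 | >.995

30 | .08 | .16 | .37 | .61 | .83 | .95 | >.995 | >.995 | >.995

50 | .11 | .29 | .57 | .83 | .97 | >.995 | >.995 | >.995 | >.995

100 | .17 | .52 | .86 | .99 | >.995 | >.995 | >.995 | >.995 | >.995

200 | .29 | .81 | .99 | >.995 | >.995 | >.995 | >.995 | >.995 | >.995

1000 | .89 | >.995 | >.995 | >.995 | >.995 | >.995 | >.995 | >.995 | >.995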
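Values like those in the power table can be approximated with the same Fisher z approach. The table does not state its α level; this sketch assumes a two-tailed test at α = .05. In the cells checked, the approximation lands within about .02 of the table (for example, n = 50 and r = .40 gives about .83); the function name is hypothetical.

```python
# Sketch: approximate power for N pairs and population correlation r,
# assuming a two-tailed test at alpha = .05 and the Fisher z approximation.
from math import log, sqrt
from statistics import NormalDist


def approx_power(n, r, alpha=0.05):
    """Approximate probability of rejecting H0: rho = 0 when the true correlation is r."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = 0.5 * log((1 + r) / (1 - r)) * sqrt(n - 3)  # expected test statistic
    # probability of landing beyond either critical value
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


for n, r in [(50, 0.40), (100, 0.30)]:
    print(f"n = {n}, r = {r:.2f}: power = {approx_power(n, r):.2f}")
# n = 50, r = 0.40: power = 0.83
# n = 100, r = 0.30: power = 0.86
```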

Power directly affects a study’s likelihood of success, and a power analysis is often required for master’s thesis and dissertation proposals and for fellowship and grant applications.

Know your power, use your power!

Effect Size

Effect size: how strongly variables are related to each other.

Coefficient of determination (r²): the proportion of variability in the criterion that is accounted for by the predictor (range: .00 to 1.00). One indicator of effect size.

r² = .74² = .55

55% of the variance in the criterion (tolerance) is explained by the predictor (education).

Limitations

• Pearson’s r only measures the degree of linear correlation.

• Problems in generalizing from sample correlations:

– Restricted or truncated ranges (result in a smaller-magnitude correlation)

– Bivariate outliers

RESTRICTION OF RANGE

[Figure: the same data shown over the full range (r = .60) and over a restricted range (r = .20)]

Restriction of range often decreases r.
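A small simulation illustrates the effect. This sketch assumes bivariate normal data with a population correlation of .60 (chosen to mirror the full-range panel) and restricts the range by keeping only cases more than 1 SD above the mean of X. The cutoff and seed are arbitrary, so the restricted r will not reproduce the slide’s .20 exactly; it only shows the direction of the effect. The data are simulated, not from any study.

```python
# Sketch (stdlib only): simulated bivariate normal data showing that
# restricting the range of X lowers r. Seed and cutoff are arbitrary.
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson's r from deviation scores: SP / sqrt(SS_X * SS_Y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)


random.seed(2024)   # arbitrary seed so the run is repeatable
RHO = 0.60          # assumed population correlation over the full range
N = 10_000          # simulated cases

# X and Y standard normal with correlation RHO: Y = RHO*X + sqrt(1 - RHO^2)*noise
x = [random.gauss(0, 1) for _ in range(N)]
y = [RHO * xi + sqrt(1 - RHO ** 2) * random.gauss(0, 1) for xi in x]
full_r = pearson_r(x, y)

# Restrict the range: keep only cases more than 1 SD above the mean of X
kept = [(xi, yi) for xi, yi in zip(x, y) if xi > 1.0]
restricted_r = pearson_r([xi for xi, _ in kept], [yi for _, yi in kept])

print(f"full range r = {full_r:.2f}, restricted range r = {restricted_r:.2f} (n = {len(kept)})")
```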

Marital Satisfaction Over Time

[Figure: marital satisfaction of wives and husbands plotted against years of marriage, years 1–10 only]

Marital Satisfaction Over Time

[Figure: marital satisfaction plotted across years of marriage by family stage: No Child, Infant, Preschool, School, Adolescent, Young Adult, Empty Nest, Retirement]

The first plot (years 1–10 only) showed restriction of range!

Outliers

• An outlier is a score so deviant from the rest of the data that one can question whether it belongs in the data set.

• A common criterion: more than ±3 SD from the mean.

• On-line outliers fall in the same pattern as the rest of the data, artificially inflating r.

• Off-line outliers fall outside the pattern of the rest of the data, artificially deflating r.
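To see both effects with the chapter’s education/tolerance data, this sketch adds one hypothetical case at X = 55 (about 3.8 SD above the mean of X). Placed near the trend (Y = 20), it inflates r; placed against the trend (Y = 3), it wipes out the correlation. Both added cases are made up for illustration.

```python
# Sketch: effect of one hypothetical outlier on r, using the education/tolerance
# data plus one made-up case at X = 55 (about 3.8 SD above the mean of X).
from math import sqrt


def pearson_r(x, y):
    """Pearson's r from deviation scores: SP / sqrt(SS_X * SS_Y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)


education = [25, 25, 33, 35, 38, 36, 31, 29, 22, 41]
tolerance = [3, 9, 14, 11, 13, 14, 12, 12, 9, 14]

base_r = pearson_r(education, tolerance)
on_line_r = pearson_r(education + [55], tolerance + [20])   # continues the trend
off_line_r = pearson_r(education + [55], tolerance + [3])   # breaks the trend

print(f"no outlier: r = {base_r:.2f}")      # no outlier: r = 0.73
print(f"on-line:    r = {on_line_r:.2f}")   # on-line:    r = 0.85
print(f"off-line:   r = {off_line_r:.2f}")  # off-line:   r = -0.09
```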

IMPACT OF OUTLIERS ON CORRELATION

[Figure: scatterplots illustrating an on-line outlier and an off-line outlier]

Assumptions of the significance test

Independent random sampling

Normal distribution (and bivariate normal distribution)

Interval or ratio scale variables

SPSS

Pearson’s r:

Analyze → Correlate → Bivariate

Select the variables you wish to correlate and move them into the Variables box

Make sure Pearson is checked

Choose one-tailed or two-tailed

Click OK

SPSS

Scatter Plot:

Graphs → Legacy Dialogs → Scatter/Dot

Select Simple Scatter

Select variables for the X and Y axes

Click OK

Note: Select 3-D Scatter to look at the bivariate normal assumption for r.
