Correlation
Correlational Research
Correlational research: describes the relationship between
two or more naturally occurring variables.
– Is age related to political conservatism?
– Are highly extraverted people less afraid of rejection
than less extraverted people?
– Is depression correlated with hypochondriasis?
– Is I.Q. related to reaction time?
Measure two variables and determine whether
a relationship is present.
predictor <-> criterion
No causality because:
direction: there is no way to tell which is the cause or
the effect
third variable problem: some third variable that was not
measured could be responsible for the relationship.
3rd variable problem (the "?" marks the unmeasured third variable):
– red car <-> speeding ticket
– midnight basketball <-> less crime
– vitamins <-> healthier
– larger feet <-> reading skills
[Figure label: Dr. Dimwit]
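A small simulation can make the third variable problem concrete. The sketch below assumes, purely for illustration, that age is the unmeasured variable behind the feet and reading example; the data are simulated, and the names `age`, `foot_size`, `reading` and `residuals` are hypothetical.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def residuals(ys, zs):
    """What is left of ys after removing a least-squares line on zs."""
    n = len(ys)
    my, mz = sum(ys) / n, sum(zs) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [y - (my + slope * (z - mz)) for y, z in zip(ys, zs)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed third variable: age in years (simulated, not data from the slides).
age = [rng.uniform(5, 12) for _ in range(500)]
# Both measures depend on age plus independent noise, never on each other.
foot_size = [15 + 1.0 * a + rng.gauss(0, 1) for a in age]
reading = [10 + 5.0 * a + rng.gauss(0, 5) for a in age]

r_raw = pearson_r(foot_size, reading)
# Correlation once age has been removed from both measures.
r_age_removed = pearson_r(residuals(foot_size, age), residuals(reading, age))

print(f"raw r = {r_raw:.2f}, r with age removed = {r_age_removed:.2f}")
```

The raw correlation is strong even though neither measure influences the other; once age is removed from both, little association should remain.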
Scales of Measurement and Indicators
Scale of Measurement    Indicator of Central Tendency    Indicator of Variability          Indicator of Association
Nominal/Categorical     Mode                             Variation Ratio                   Cramér’s Phi (ϕc)
Ordinal                 Median                           Semi-Interquartile Range (SIQ)    Spearman’s Rho (rs)
Interval/Ratio          Mean                             Standard Deviation                Pearson’s (r)
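The central tendency and variability indicators in the table can be sketched with Python's standard library. The example data are hypothetical, and the quartiles use the default method of `statistics.quantiles`; other quartile conventions can give slightly different SIQ values.

```python
import statistics
from collections import Counter


def variation_ratio(values):
    """1 minus the proportion of cases that fall in the modal category."""
    counts = Counter(values)
    return 1 - max(counts.values()) / len(values)


def semi_interquartile_range(values):
    """Half the distance between the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return (q3 - q1) / 2


majors = ["psych", "psych", "bio", "art"]  # nominal (hypothetical data)
ranks = [1, 2, 3, 4, 5, 6, 7, 8]           # ordinal (hypothetical data)
scores = [25, 25, 33, 35, 38]              # interval/ratio (hypothetical data)

nominal = (statistics.mode(majors), variation_ratio(majors))
ordinal = (statistics.median(ranks), semi_interquartile_range(ranks))
interval = (statistics.mean(scores), statistics.stdev(scores))

print("nominal:  mode, variation ratio =", nominal)
print("ordinal:  median, SIQ =", ordinal)
print("interval: mean, SD =", interval)
```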
Correlation Coefficients
Correlation Coefficient    Predictor (X)          Criterion (Y)
Cramér’s Phi (ϕc)          Nominal/Categorical    Nominal/Categorical
Spearman’s Rho (rs)        Ordinal                Ordinal
Pearson’s (r)              Interval or Ratio      Interval or Ratio
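For two nominal variables, Cramér's phi is usually computed from the chi-square statistic of the contingency table. The sketch below uses that standard definition, sqrt(chi-square / (n × (smaller table dimension − 1))), and assumes every row and column has at least one case; the function name and example tables are hypothetical.

```python
import math


def cramers_phi(table):
    """Cramér's phi for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))


perfect = cramers_phi([[10, 0], [0, 10]])  # categories line up exactly
none = cramers_phi([[5, 5], [5, 5]])       # categories are unrelated
print(perfect, none)
```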
Estimate r for each case:
[Figure: four scatterplots: no relationship; positive linear relationship; negative linear relationship; curvilinear relationship (linear correlation not appropriate).]
• Correlation coefficient (r)
+1.00 = perfect positive correlation
-1.00 = perfect negative correlation
0 = no linear correlation
|r| = magnitude of the relationship
sign of r = direction of the relationship
r² = proportion of variance in Y explained by X (× 100 for a percentage)
Types of correlation coefficients
Pearson’s correlation coefficient: linear
relationship between two interval / ratio
variables.
Spearman’s rank-order correlation: monotonic
relationship between two variables measured
using ordinal (ranked) scores (a linear
relationship between the ranks).
Point-biserial correlation: linear relationship
between the scores from one continuous
variable and one dichotomous (0 or 1) variable.
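Both Spearman's rho and the point-biserial correlation can be obtained by applying Pearson's formula to suitably coded scores: ranks for Spearman, a 0/1 code for the dichotomous variable. A minimal sketch with hypothetical data and helper names:

```python
def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


def point_biserial(group, scores):
    """Point-biserial r: Pearson's r with one variable coded 0 or 1."""
    return pearson_r(group, scores)


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]          # rises with x, but not in a straight line
r_xy = pearson_r(x, y)
rho_xy = spearman_rho(x, y)

group = [0, 0, 0, 1, 1, 1]     # dichotomous variable (hypothetical)
scores = [2, 3, 4, 6, 7, 8]    # continuous variable (hypothetical)
r_pb = point_biserial(group, scores)

print(f"Pearson r = {r_xy:.3f}, Spearman rho = {rho_xy:.3f}, point-biserial r = {r_pb:.3f}")
```

Because y always increases with x, the ranks agree perfectly and rho reaches 1, while Pearson's r stays below 1 for the curved relationship.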
Conceptual Formula for Pearson’s correlation:
$r = \frac{\sum z_x z_y}{N}$ (population)
$z_x = \frac{X_i - \bar{X}}{s_x}$, $z_y = \frac{Y_i - \bar{Y}}{s_y}$
Positive r: z’s from x and y have the same sign.
Negative r: z’s from x and y have different signs.
[Figure: two scatterplots, each split into quadrants by negative/positive zx (horizontal axis) and negative/positive zy (vertical axis).]
$\bar{X} = 31.5$, $\bar{Y} = 11.1$
Sx = 6.22 Sy = 3.41
Research question: Is education about other ethnicities
correlated with tolerant attitudes towards others?
Education Score (X)   Tolerance Score (Y)   Zx      Zy      ZxZy
25                    3                     -1.05   -2.38   2.50
25                    9                     -1.05   -.62    .65
33                    14                    .24     .85     .20
35                    11                    .56     -.03    -.02
38                    13                    1.05    .56     .59
36                    14                    .72     .85     .61
31                    12                    -.08    .26     -.02
29                    12                    -.40    .26     -.10
22                    9                     -1.53   -.62    .95
41                    14                    1.53    .85     1.30
Sum: 315              111                                   6.66
$s_x = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$
$r = \frac{\sum z_x z_y}{n - 1} = \frac{6.66}{10 - 1} = .74$
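The hand calculation can be checked with a short script. This sketch uses the ten education and tolerance scores from the table and the sample (n − 1) formulas; because it does not round the z-scores along the way, its result can differ from the hand calculation in the second decimal.

```python
import math

# Scores from the education / tolerance example.
education = [25, 25, 33, 35, 38, 36, 31, 29, 22, 41]
tolerance = [3, 9, 14, 11, 13, 14, 12, 12, 9, 14]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Standard deviation with n - 1 in the denominator."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def z_scores(values):
    m, s = mean(values), sample_sd(values)
    return [(v - m) / s for v in values]


zx = z_scores(education)
zy = z_scores(tolerance)
products = [a * b for a, b in zip(zx, zy)]

# r is the sum of the z-score products divided by n - 1.
r = sum(products) / (len(education) - 1)

print(f"mean X = {mean(education):.1f}, mean Y = {mean(tolerance):.1f}")
print(f"sx = {sample_sd(education):.2f}, sy = {sample_sd(tolerance):.2f}")
print(f"r = {r:.3f}")
```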
Could this be (1) due to chance, such as random error, or
(2) very UNLIKELY to occur due to chance (< 5%)?
Inferential statistics are needed.
[Figure: scatterplot of Tolerance Score (y-axis, 0.00 to 16.00) against Education Score (x-axis, 0.00 to 50.00).]
Testing Pearson’s r for significance
H0: ρ = 0 x<->y association does not exist
Ha: ρ ≠ 0 x<->y association exists (non-directional)
Using the t distribution:
$t = \frac{r\sqrt{N-2}}{\sqrt{1-r^2}}$
Using the table of critical values:
df = N – 2 (N is the number of pairs of scores)
= 10 – 2 = 8
df p = 0.05 p = 0.01
1 12.71 63.66
2 4.30 9.92
3 3.18 5.84
4 2.78 4.60
5 2.57 4.03
6 2.45 3.71
7 2.36 3.50
8 2.31 3.36
9 2.26 3.25
10 2.23 3.17
11 2.20 3.11
12 2.18 3.05
13 2.16 3.01
14 2.14 2.98
Using a t-table
Ha: an association exists between education & tolerance (two-tailed)
alpha = .05
df = N – 2 = 10 – 2 = 8
If |t| > 2.31, reject H0 and conclude in favor of Ha.
If |t| <= 2.31, retain H0.
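The decision rule can be sketched as a lookup against the table of critical values. The dictionary below copies the two-tailed values from that table; `t_from_r` and `is_significant` are hypothetical helper names, and the lookup only covers df 1 through 14.

```python
import math

# Two-tailed critical t values from the table of critical values, keyed by df.
T_CRIT = {
    0.05: {1: 12.71, 2: 4.30, 3: 3.18, 4: 2.78, 5: 2.57, 6: 2.45, 7: 2.36,
           8: 2.31, 9: 2.26, 10: 2.23, 11: 2.20, 12: 2.18, 13: 2.16, 14: 2.14},
    0.01: {1: 63.66, 2: 9.92, 3: 5.84, 4: 4.60, 5: 4.03, 6: 3.71, 7: 3.50,
           8: 3.36, 9: 3.25, 10: 3.17, 11: 3.11, 12: 3.05, 13: 3.01, 14: 2.98},
}


def t_from_r(r, n):
    """t statistic for testing rho = 0, where n is the number of pairs."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def is_significant(r, n, alpha=0.05):
    """Two-tailed decision: reject H0 when |t| exceeds the critical value."""
    df = n - 2
    return abs(t_from_r(r, n)) > T_CRIT[alpha][df]


t_example = t_from_r(0.74, 10)
print(f"t = {t_example:.2f}, reject H0 at alpha = .05: {is_significant(0.74, 10)}")
```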
Hypotheses
Directional hypothesis – Ha states whether the correlation is expected to be positive or negative (one-tailed test appropriate).
Nondirectional hypothesis – Ha states that there is an association, but does not specify the direction (two-tailed test appropriate).
[Figure: t distributions with cutoffs marked at t = -2.0, t = +2.0, and t = -1.67; df = 8, α level = .05.]
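Our example
$t = \frac{r\sqrt{N-2}}{\sqrt{1-r^2}} = \frac{.74\sqrt{10-2}}{\sqrt{1-.74^2}} = 3.11$
df = 10 – 2 = 8
tcrit (df = 8, α = .05, two-tailed) = 2.31; since 3.11 > 2.31, reject H0.
APA Style: r(df) = value obtained, p = .##
r(8) = .74, p = .014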
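Statistical software normally supplies the p value for the APA-style report. As a rough standard-library sketch, the two-tailed p can be approximated by numerically integrating the t density with Simpson's rule; the function names are hypothetical and the method is chosen here only to avoid extra dependencies.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|) via Simpson's rule between 0 and |t| (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1 - 2 * area_zero_to_t


def t_from_r(r, n):
    """t statistic for testing rho = 0, where n is the number of pairs."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r, n = 0.74, 10
df = n - 2
p = two_tailed_p(t_from_r(r, n), df)

# APA style drops the leading zero for statistics that cannot exceed 1.
print(f"r({df}) = {f'{r:.2f}'.lstrip('0')}, p = {f'{p:.3f}'.lstrip('0')}")
```

A quick sanity check is that a t equal to a critical value from a t-table returns that table's alpha, for example about .05 for t = 2.306 with df = 8.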
Hypothesis Testing
Rejecting the null hypothesis – concluding that the null
hypothesis is wrong.
This leaves us with the alternative hypothesis (Ha) that there
is an association between predictor and criterion.
Failing to reject the null hypothesis – concluding that the null
hypothesis (no association) remains a plausible explanation of the data.
We do not “accept” the null hypothesis (H0), because
the null hypothesis can never be proven.
Errors
• Type I error – a researcher rejects the null hypothesis when it is true (a false positive)
– Alpha – the probability of a Type I error (most commonly α = .05).
• Type II error – a researcher fails to reject the null hypothesis when it is false (a false negative)
– Beta – the probability of a Type II error (most commonly β = .20).
Statistical Decisions and Outcomes
                                   Reality (unknown)
Statistical Decision               Null hypothesis false                                     Null hypothesis true
Reject null hypothesis             Correct: correlation exists                               Type I Error (α): incorrectly conclude there is a correlation
Fail to reject (we retain) null    Type II Error (β): incorrectly conclude no correlation    Correct: correlation does not exist
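The meaning of alpha can be illustrated by simulation: when the null hypothesis is true, a test at alpha = .05 should reject in roughly 5% of samples. The sketch below draws pairs of unrelated normal scores; the sample size, trial count and seed are arbitrary choices, and 2.31 is the two-tailed critical t for df = 8.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic for testing rho = 0, where n is the number of pairs."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


rng = random.Random(42)               # fixed seed so the run is repeatable
N, TRIALS, T_CRIT = 10, 2000, 2.31    # df = 8, alpha = .05, two-tailed

false_positives = 0
for _ in range(TRIALS):
    # X and Y are generated independently, so the null hypothesis is true.
    x = [rng.gauss(0, 1) for _ in range(N)]
    y = [rng.gauss(0, 1) for _ in range(N)]
    if abs(t_from_r(pearson_r(x, y), N)) > T_CRIT:
        false_positives += 1          # a Type I error

type_i_rate = false_positives / TRIALS
print(f"Type I error rate over {TRIALS} samples: {type_i_rate:.3f}")
```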
Power
• Power is the probability that a study will detect effects that are really present (correctly reject the null hypothesis).
• Power = 1-beta. Typically set at .80, or 80% chance of observing an effect when present.
• Power analysis is used to decide how many participants are needed to detect a significant effect, since increasing participants increases power.
Power table: power for each sample size n (rows) and correlation r (columns)
n .10 .20 .30 .40 .50 .60 .70 .80 .90
15 .06 .11 .19 .32 .50 .70 .88 .98 >.995
30 .08 .16 .37 .61 .83 .95 >.995
50 .11 .29 .57 .83 .97 >.995
100 .17 .52 .86 .99 >.995
200 .29 .81 .99 >.995
1000 .89 >.995
Power has a direct impact on likelihood of success and is often required for
master's and dissertation proposals and fellowship and grant applications.
Know your power, use your power!
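Power for a correlation can be approximated with the Fisher z transformation. This is a sketch, not the method used to build the power table: it assumes a two-tailed test at alpha = .05 (z critical = 1.96), and the function names are hypothetical, so its values can differ slightly from tabled ones, especially for small n.

```python
import math


def normal_cdf(z):
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def approx_power(r, n, z_crit=1.96):
    """Approximate power to detect a population correlation r with n pairs."""
    # Fisher z of r divided by its standard error, 1 / sqrt(n - 3).
    shift = math.atanh(abs(r)) * math.sqrt(n - 3)
    return normal_cdf(shift - z_crit) + normal_cdf(-shift - z_crit)


def required_n(r, target=0.80):
    """Smallest n whose approximate power reaches the target."""
    n = 4
    while approx_power(r, n) < target:
        n += 1
    return n


print(f"power for r = .30, n = 100: {approx_power(0.30, 100):.2f}")
print(f"n needed for 80% power at r = .30: {required_n(0.30)}")
```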
Effect Size
Effect size: how strongly variables are related to each other.
Coefficient of determination (r²): the proportion of variability in the criterion that is accounted for by the predictor.
(Range: .00 to 1.00). One indicator of effect size.
r² = .74² = .55
55% of the variance in the criterion (tolerance) is explained by the predictor (education).
Limitations
• Pearson’s r only measures the degree of linear correlation.
• Problems in generalizing from sample correlations
– Restricted or truncated ranges (results in smaller magnitude correlation)
– Bivariate outliers
RESTRICTION OF RANGE
Full Range. r = .60
Restricted range, r = .20
Restriction of range
often decreases r
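Restriction of range can be reproduced with simulated data. The sketch below assumes a population correlation of about .60 and keeps only cases near the middle of the X range; the cutoff, sample size and seed are arbitrary illustrative choices.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)  # fixed seed so the run is repeatable

# Simulated standardized scores with a population correlation of .60.
x = [rng.gauss(0, 1) for _ in range(5000)]
y = [0.6 * xi + 0.8 * rng.gauss(0, 1) for xi in x]
r_full = pearson_r(x, y)

# Restrict the range: keep only cases within half an SD of the X mean.
kept = [(xi, yi) for xi, yi in zip(x, y) if abs(xi) < 0.5]
r_restricted = pearson_r([xi for xi, _ in kept], [yi for _, yi in kept])

print(f"full range r = {r_full:.2f}, restricted range r = {r_restricted:.2f}")
```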
Marital Satisfaction Over Time
[Figure: marital satisfaction (y-axis) by years of marriage, 1 to 10 (x-axis), with separate lines for wife and husband.]
Marital Satisfaction Over Time
[Figure: marital satisfaction (y-axis) across years of marriage (x-axis), labeled by life stage: No Child, Infant, Preschool, School, Adolescent, Young Adult, Empty Nest, Retirement.]
Previous slide data showed
restriction of range!
Outliers
• An outlier is a score that is so deviant from
the rest of the data that one can question whether it
belongs in the data set.
• Common criterion: more than ±3 SD from the mean.
• On-line outliers fall in the same pattern as the
rest of the data, artificially inflating r.
• Off-line outliers fall outside of the pattern of
the rest of the data, artificially deflating r.
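The two kinds of outlier can be illustrated by adding a single point to a small data set. The scores below are hypothetical and chosen only so that the effect is easy to see.

```python
def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical scores with a fairly strong positive trend.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]
r_base = pearson_r(x, y)

# On-line outlier: far from the other scores but along the same trend.
r_online = pearson_r(x + [20], y + [20])

# Off-line outlier: a typical X value paired with a Y value far off the trend.
r_offline = pearson_r(x + [3.5], y + [12])

print(f"base r = {r_base:.2f}")
print(f"with on-line outlier r = {r_online:.2f}")
print(f"with off-line outlier r = {r_offline:.2f}")
```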
IMPACT OF OUTLIERS ON CORRELATION
[Figure: two scatterplots, one with an on-line outlier and one with an off-line outlier.]
Assumptions of the significance test
Independent random sampling
Normal distribution (and bivariate normal
distribution)
Interval or ratio scale variables
SPSS
Pearson’s r:
Analyze → Correlate → Bivariate
Select the variables you wish to correlate and
place them in the box
Make sure Pearson’s is checked
Choose a one- or two-tailed test
OK
SPSS
Scatter Plot:
Graphs → Legacy Dialogs → Scatter/Dot
Select Simple Scatter
Select variables for X and Y axis
OK
Note: Select 3D Scatter to look at bivariate
normal assumption for r.