Correlation
A statistical method for measuring the
relationship between two variables
Three characteristics
Direction of the relationship
Form of the relationship
Strength/Consistency
Direction of the Relationship
Positive correlation
Variables moving in the same direction
Negative correlation
Variables moving in opposite directions
Form of the Relationship
Linear or non-linear
Predicting data
Strength/Consistency
How well do data fit the specific form?
Measured by the distance between actual data
and the predicted data
The absolute value of a correlation measures the fit
1: Perfect fit
0: No fit at all
Correlation Measures
The Pearson correlation
Linear relationship
The sign of the correlation: direction
The numerical value: the degree of the relationship
The Spearman correlation
For ordinal scale of measurement
Both the X values and the Y values are ranks
Measures the consistency of the relationship
Not necessarily linear
The point-biserial correlation
Measures the correlation between a regular variable and a dichotomous (two-valued) variable
The Pearson Correlation
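$r = \frac{SP}{\sqrt{SS_X SS_Y}}$
$r = \frac{\text{covariability of X and Y}}{\text{variability of X and Y separately}} = \frac{\text{degree to which X and Y vary together}}{\text{degree to which X and Y vary separately}}$
$SP = \sum (X - M_X)(Y - M_Y) = \sum XY - \frac{\sum X \sum Y}{n}$
$SS_X = \sum (X - M_X)^2 = \sum X^2 - \frac{(\sum X)^2}{n}$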
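A minimal Python sketch, not part of the original slides, that computes r from SP and SS using both the definitional and the computational formulas. The data are the five (X, Y) pairs listed on the "Check the Result" slide; the function names are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson r = SP / sqrt(SS_X * SS_Y), using the definitional formulas."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # sum of (X - M_X)(Y - M_Y)
    ss_x = sum((x - mx) ** 2 for x in xs)                   # sum of (X - M_X)^2
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


def sp_computational(xs, ys):
    """Computational SP = sum(XY) - (sum X)(sum Y) / n."""
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / len(xs)


def ss_computational(xs):
    """Computational SS = sum(X^2) - (sum X)^2 / n."""
    return sum(x * x for x in xs) - sum(xs) ** 2 / len(xs)


# The five (X, Y) pairs from the "Check the Result" slide
X = [0, 10, 4, 8, 8]
Y = [1, 3, 1, 2, 3]

r = pearson_r(X, Y)
r_comp = sp_computational(X, Y) / math.sqrt(ss_computational(X) * ss_computational(Y))
print(round(r, 3), round(r_comp, 3))  # 0.875 0.875
```

Both routes give SP = 14, SS_X = 64 and SS_Y = 4, so r = 14/16 = 0.875: a strong positive relationship, consistent with an upward-sloping envelope around the points.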
Check the Result
Using the scatterplot of data
Drawing the envelope around all data points
Checking the direction and shape of the
envelope
[Figure: scatterplot of the X, Y data below]
X Y
0 1
10 3
4 1
8 2
8 3
Interpreting Correlations
Prediction
Correlation describes only the relationship between two variables; it does not necessarily imply causation!
The value can be greatly affected by the range of the data.
Outliers can dramatically affect the value.
Data Range and Correlation
Outlier and Correlation
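To make the outlier point concrete, here is a small Python sketch on hypothetical data (not from the slides): five points on a perfect rising line, then the same points plus one extreme value.

```python
import math


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical data: five points lying exactly on a rising line (r = 1)
x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]
r_clean = pearson_r(x, y)

# The same data plus one hypothetical outlier at (6, -10)
r_outlier = pearson_r(x + [6], y + [-10])

print(f"without outlier: r = {r_clean:.3f}")
print(f"with outlier:    r = {r_outlier:.3f}")
```

A single extreme point turns a perfect positive correlation into a negative one (about −0.44), which is why the scatterplot should always be checked before r is interpreted.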
The Strength of Relationship
The coefficient of determination
Computed by squaring the correlation: r²
Measures how much of the variance in the dependent variable
is accounted for by the independent variable
Similar to the effect-size measures used with z- and t-tests
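A short arithmetic sketch of r² in Python, using the two r values that appear elsewhere in these notes (0.35 and 0.65); the helper name is illustrative.

```python
def coefficient_of_determination(r):
    """r^2: proportion of the variance in Y accounted for by X."""
    return r ** 2


for r in (0.35, 0.65):
    r2 = coefficient_of_determination(r)
    print(f"r = {r:.2f} -> r^2 = {r2:.4f} ({r2:.2%} of the variance)")
```

Squaring discards the sign, so r = −0.35 and r = 0.35 both account for 12.25% of the variance.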
Hypothesis Tests with the
Pearson Correlation
Pearson correlation is usually computed for
sample data, but used to test hypotheses
about the relationship in the population
The population correlation is denoted by the Greek letter
rho (ρ)
Non-directional: H0: ρ = 0 and H1: ρ ≠ 0
Directional: H0: ρ ≤ 0 and H1: ρ > 0 or
Directional: H0: ρ ≥ 0 and H1: ρ < 0
Population vs. Sample
Correlation Hypothesis Test
Sample correlation r used to test population ρ
Hypothesis test can be computed using
either t or F
Use t table to find critical value
$t = \frac{r - \rho}{s_r}$, where $s_r = \sqrt{\frac{1 - r^2}{df}}$
About df
What should the df be?
Suppose the sample size is n
df = n − 2, so under H0 (ρ = 0):
$t = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}}$
Example
α = .05
n = 30
r = 0.35
$t = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}} = \frac{0.35}{\sqrt{(1 - 0.35^2)/28}} \approx 1.98$
Two-tailed test: critical value ±2.048
Fail to reject the null hypothesis
One-tailed test: critical value 1.701
Reject the null hypothesis
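The example can be checked with a few lines of Python. This sketch computes t for H0: ρ = 0 and compares it with the critical values quoted on the slide (α = .05, df = 28), which are hard-coded rather than looked up.

```python
import math


def t_for_r(r, n):
    """t statistic for H0: rho = 0, with df = n - 2."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))


t = t_for_r(0.35, 30)

# Critical values for alpha = .05 and df = 28, as given on the slide
two_tailed_reject = abs(t) > 2.048
one_tailed_reject = t > 1.701

print(f"t(28) = {t:.3f}; two-tailed reject: {two_tailed_reject}; "
      f"one-tailed reject: {one_tailed_reject}")
```

The same t value (about 1.977) falls short of the two-tailed cutoff but exceeds the one-tailed one, which is exactly the split decision shown above.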
Using r Directly …
Report Correlations
A correlation for the data revealed a
significant relationship between amount of
education and annual income, r(28) = 0.65,
p < .01, two-tailed.
Usually, Multiple Variables
Involved in Correlation Tests
Partial Correlation
Involvement of
other factors in
correlation?
Partial Correlation
A partial correlation measures the
relationship between two variables while
mathematically controlling the influence of a
third variable by holding it constant
$r_{xy \cdot z} = \frac{r_{xy} - r_{xz} r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}$
Example
Number of Churches (X)   Number of Crimes (Y)   Population (Z)
1    4    1
2    3    1
3    1    1
4    2    1
5    5    1
7    8    2
8    11   2
9    9    2
10   7    2
11   10   2
13   15   3
14   14   3
15   16   3
16   17   3
17   13   3
$r_{xy \cdot z} = 0$
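A Python sketch of the partial-correlation formula applied to the churches/crimes/population table. The helper names are illustrative; the data are the fifteen rows of the example.

```python
import math


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


def partial_r(x, y, z):
    """r_xy.z: correlation of X and Y with Z held constant."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


churches = [1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17]   # X
crimes = [4, 3, 1, 2, 5, 8, 11, 9, 7, 10, 15, 14, 16, 17, 13]     # Y
population = [1] * 5 + [2] * 5 + [3] * 5                          # Z

r_xy = pearson_r(churches, crimes)
r_xy_z = partial_r(churches, crimes, population)
print(f"r_xy = {r_xy:.3f}, r_xy.z = {r_xy_z:.6f}")
```

The strong raw correlation (about 0.923) drops to essentially 0 once population is held constant, matching the slide: both counts simply grow with population.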
What if the relationship looks
like this?
The Spearman Correlation
To measure the degree of consistency of direction
Not necessarily linear
One extra step before calculating the Pearson correlation: rank the X values and the Y values
Then analyze the correlation of the ranks
X Y (values) X Y (Ranks)
1 3 2 2
6 4 4 3
2 5 3 4
0 2 1 1
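The "rank first, then Pearson" recipe can be sketched in Python on the four pairs in the table. The simple ranking helper assumes there are no tied values; ties are handled on the next slide.

```python
import math


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


def simple_ranks(values):
    """Rank 1 = smallest value. Assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


X = [1, 6, 2, 0]
Y = [3, 4, 5, 2]
rank_x, rank_y = simple_ranks(X), simple_ranks(Y)
r_s = pearson_r(rank_x, rank_y)   # Spearman = Pearson on the ranks
print(rank_x, rank_y, r_s)  # [2, 4, 3, 1] [2, 3, 4, 1] 0.8
```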
Ranking Tied Scores
Tied scores receive the same rank
Rank all scores, including the tied ones
Give each tied score the mean of the ranked positions the tied scores occupy
X Y (values) X Y (Ranks)
1 3 2 2.5 (Y positions 2 and 3 averaged)
6 3 4 2.5
2 5 3 4
0 2 1 1
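A small Python sketch of the tie rule: each tied score gets the mean of the positions the tied group occupies. It uses a simple quadratic-time approach for readability; the function name is illustrative.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their positions."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1       # first position occupied by v
        count = ordered.count(v)           # how many scores are tied at v
        # mean of the positions first, first + 1, ..., first + count - 1
        ranks.append(first + (count - 1) / 2)
    return ranks


print(average_ranks([3, 3, 5, 2]))  # [2.5, 2.5, 4.0, 1.0]
```

Applied to the Y column of the table, the two 3s occupy positions 2 and 3 and therefore both receive rank 2.5.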
Special Formula for
Spearman Correlation
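For the ranks 1 to n (no ties): $SS = \frac{n(n^2 - 1)}{12}$
$r_S = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$
where D is the difference between the X rank and the Y rank for each individual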
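A Python sketch of the shortcut formula, applied to the untied ranks from the earlier Spearman example. Function names are illustrative.

```python
def ss_of_ranks(n):
    """SS of the ranks 1..n (no ties): n(n^2 - 1) / 12."""
    return n * (n ** 2 - 1) / 12


def spearman_from_d(rank_x, rank_y):
    """r_S = 1 - 6 * sum(D^2) / (n(n^2 - 1)); D = X rank - Y rank; exact only without ties."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Ranks from the Spearman example (no ties)
rank_x = [2, 4, 3, 1]
rank_y = [2, 3, 4, 1]
print(ss_of_ranks(4), spearman_from_d(rank_x, rank_y))  # 5.0 0.8
```

The result matches the Pearson correlation of the same ranks (0.8). With tied ranks the shortcut is only approximate, so computing the Pearson correlation on the averaged ranks is the safer route.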
The Point-Biserial Correlation
Just like the Pearson correlation
One variable has only two values
Gender, success/failure, college education or not, …
The magnitude of the correlation does not depend on the two values
used to code the groups (1/0, 1/−1, etc.)
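A Python sketch showing that the point-biserial correlation is just the Pearson correlation with a two-valued X, and that recoding the groups leaves its size unchanged. The scores and group labels are hypothetical.

```python
import math


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical scores for two groups (e.g., failure vs. success)
scores = [3, 5, 4, 6, 8, 9, 7, 10]
coded_01 = [0, 0, 0, 0, 1, 1, 1, 1]
coded_pm = [2 * g - 1 for g in coded_01]   # same groups coded -1 / 1
coded_rev = [1 - g for g in coded_01]      # same groups with the codes swapped

r_01 = pearson_r(coded_01, scores)
r_pm = pearson_r(coded_pm, scores)
r_rev = pearson_r(coded_rev, scores)
print(f"{r_01:.3f} {r_pm:.3f} {r_rev:.3f}")
```

0/1 and −1/1 coding give the same r; swapping which group gets the larger code flips only the sign.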
Point-Biserial Correlation vs. t Test
t test: t = 4, df = 18, p < .001
Point-biserial: r = 0.686
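The two results on this slide agree through the standard relation between an independent-measures t and the point-biserial correlation, r² = t²/(t² + df). A minimal Python check:

```python
import math


def r_from_t(t, df):
    """Point-biserial r from an independent-measures t: r^2 = t^2 / (t^2 + df)."""
    return math.sqrt(t ** 2 / (t ** 2 + df))


r = r_from_t(4, 18)
print(round(r, 3))  # 0.686
```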
If we know two variables are
linearly related, how can we
describe such a relationship?
Using a linear equation
y = bx + a
Regression
Goal of Regression
Determining two constants for a linear
equation: y=bx+a
b: slope
a: intercept
Methods
The least-squares solution
Distance = $Y - \hat{Y}$
Minimize $\sum (Y - \hat{Y})^2$
Formula
$b = \frac{SP}{SS_X}$
$a = M_Y - b M_X$
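A minimal Python sketch of the least-squares formulas, applied to the five pairs from the "Check the Result" slide. The function name is illustrative.

```python
def least_squares(xs, ys):
    """Return (b, a) for Y^ = bX + a with b = SP / SS_X and a = M_Y - b * M_X."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    b = sp / ss_x
    a = my - b * mx
    return b, a


# Five (X, Y) pairs from the "Check the Result" slide
b, a = least_squares([0, 10, 4, 8, 8], [1, 3, 1, 2, 3])
print(b, a)  # 0.21875 0.6875
```

Here b = 14/64 and a = 2 − 0.21875 × 6. Excel's linear trendline is also a least-squares fit, so it should report the same line for these points.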
Regression in Excel
Draw a scatterplot
Show the trendline
Linear Equations and
Regression
The Pearson correlation
measures a linear relationship
between two variables
A straight line drawn through the data points
Makes the relationship easier to
see
Shows the central tendency of
the relationship
Can be used for prediction
Linear Equations
General equation for a line
Equation: Y = bX + a
X and Y are variables
a and b are fixed constants
Regression
Regression is a method of finding an
equation describing the best-fitting line for a
set of data
Least squares
Minimizes the errors for the known data,
i.e., the errors of prediction
Error of Prediction
With the linear function from regression, we
can calculate the predicted value Ŷ for a
given X
Error of prediction: Y − Ŷ
Usually squared
Standard Error of Estimate
Regression equation makes a prediction
Precision of the estimate is measured by the
standard error of estimate (SEoE)
$SEoE = \sqrt{\frac{SS_{residual}}{df}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}$
where $SS_{residual} = \sum (Y - \hat{Y})^2$
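A Python sketch of the standard error of estimate, computed directly from the residuals of the least-squares line for the five-point example. Helper names are illustrative.

```python
import math


def least_squares(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    b = sp / ss_x
    return b, my - b * mx


def standard_error_of_estimate(xs, ys):
    """SEoE = sqrt(SS_residual / df), with df = n - 2."""
    b, a = least_squares(xs, ys)
    predicted = [b * x + a for x in xs]                    # Y^ for each X
    ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, predicted))
    return math.sqrt(ss_residual / (len(xs) - 2)), ss_residual


seoe, ss_residual = standard_error_of_estimate([0, 10, 4, 8, 8], [1, 3, 1, 2, 3])
print(ss_residual, round(seoe, 3))  # 0.9375 0.559
```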
Relationship Between
Correlation and Standard Error
of Estimate
$SS_{regression} = r^2 SS_Y$
$SS_{residual} = (1 - r^2) SS_Y$
$SEoE = \sqrt{\frac{SS_{residual}}{df}} = \sqrt{\frac{(1 - r^2) SS_Y}{n - 2}}$
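A Python check, on the five-point example, that splitting SS_Y with r² gives the same SS_residual as summing the squared residuals of the fitted line.

```python
import math

X = [0, 10, 4, 8, 8]
Y = [1, 3, 1, 2, 3]
n = len(X)
mx, my = sum(X) / n, sum(Y) / n
sp = sum((x - mx) * (y - my) for x, y in zip(X, Y))
ss_x = sum((x - mx) ** 2 for x in X)
ss_y = sum((y - my) ** 2 for y in Y)
r = sp / math.sqrt(ss_x * ss_y)

# Direct route: fit the line and sum the squared residuals
b = sp / ss_x
a = my - b * mx
ss_residual_direct = sum((y - (b * x + a)) ** 2 for x, y in zip(X, Y))

# Shortcut route: split SS_Y using r^2
ss_regression = r ** 2 * ss_y
ss_residual = (1 - r ** 2) * ss_y
seoe = math.sqrt(ss_residual / (n - 2))

print(ss_regression, ss_residual, ss_residual_direct, round(seoe, 3))  # 3.0625 0.9375 0.9375 0.559
```

With r = 0.875, r² = 0.765625 of SS_Y = 4 is explained by the line, and the remaining 0.9375 is exactly the residual sum of squares.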
Testing Regression
Significance
Analysis of Regression
Similar to Analysis of Variance
Uses an F-ratio of two Mean Square values
Each MS is an SS divided by its df
H0: the slope of the regression line (b or beta)
is zero
no regression
Mean Squares and F-ratio
$MS_{residual} = \frac{SS_{residual}}{df_{residual}}$
$MS_{regression} = \frac{SS_{regression}}{df_{regression}}$
$F = \frac{MS_{regression}}{MS_{residual}}$
SS and df in
Regression Analysis
SPSS Output Example
In Excel
X   Y
5   10
1   4
4   5
7   11
6   15
4   6
3   5
2   0
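A Python sketch of the analysis of regression for the eight (X, Y) pairs above, using SS_regression = r² SS_Y, df_regression = 1 (one predictor) and df_residual = n − 2. It is plain arithmetic, not the SPSS or Excel procedure itself.

```python
X = [5, 1, 4, 7, 6, 4, 3, 2]
Y = [10, 4, 5, 11, 15, 6, 5, 0]
n = len(X)
mx, my = sum(X) / n, sum(Y) / n
sp = sum((x - mx) * (y - my) for x, y in zip(X, Y))
ss_x = sum((x - mx) ** 2 for x in X)
ss_y = sum((y - my) ** 2 for y in Y)

b = sp / ss_x                              # slope
a = my - b * mx                            # intercept
r_squared = sp ** 2 / (ss_x * ss_y)

ss_regression = r_squared * ss_y
ss_residual = (1 - r_squared) * ss_y
df_regression, df_residual = 1, n - 2      # simple regression: one predictor

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
F = ms_regression / ms_residual
print(f"Y^ = {b:g}X + ({a:g}); F({df_regression}, {df_residual}) = {F:.2f}")
# Y^ = 2X + (-1); F(1, 6) = 15.27
```

The output of a statistics package for the same data should show the same SS, df and F values in its regression ANOVA table.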
ANOVA and Regression
Basically the same method, viewed from different perspectives
Main effect in ANOVA == a variable in
regression
Interaction between two factors ==
the product of two variables in regression
Regression not only tells whether there is a difference, but also
predicts how large it is.
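To illustrate "main effect == a variable in regression", this Python sketch (hypothetical data for two conditions) computes a one-way ANOVA F and the regression F with the factor coded as a 0/1 variable; the two F values coincide.

```python
def one_way_anova_f(groups):
    """F = MS_between / MS_within for a one-way ANOVA."""
    scores = [s for g in groups for s in g]
    grand_mean = sum(scores) / len(scores)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((s - m) ** 2 for g, m in zip(groups, means) for s in g)
    df_between, df_within = len(groups) - 1, len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def regression_f(xs, ys):
    """F = MS_regression / MS_residual for simple linear regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    ss_regression = sp ** 2 / ss_x
    ss_residual = ss_y - ss_regression
    return (ss_regression / 1) / (ss_residual / (n - 2))


# Hypothetical scores for two treatment conditions
group_a = [2, 4, 6]
group_b = [5, 7, 9]

f_anova = one_way_anova_f([group_a, group_b])
dummy = [0] * len(group_a) + [1] * len(group_b)   # the factor coded as a 0/1 variable
f_regression = regression_f(dummy, group_a + group_b)
print(f_anova, f_regression)  # 3.375 3.375
```

The regression slope on the 0/1 variable equals the difference between the group means (7 − 4 = 3), which is the "by how much" that the ANOVA F alone does not report.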
Multivariate regression
Linear or Non-Linear
Regression?
Linear models are usually good enough for
most research in IST.
If non-linear relationships may be involved, how can you tell
that your linear model is not appropriate?
Look at the distribution of the residuals
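A Python sketch of the residual check on hypothetical curved data (Y = X²): a straight line is fitted by least squares and the residuals are listed in X order.

```python
def least_squares(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    b = sp / ss_x
    return b, my - b * mx


# Hypothetical curved data: Y = X^2
X = [0, 1, 2, 3, 4]
Y = [x ** 2 for x in X]

b, a = least_squares(X, Y)
residuals = [y - (b * x + a) for x, y in zip(X, Y)]
print(residuals)  # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The residuals run positive, negative, then positive again instead of scattering randomly around 0; that systematic U shape signals that a straight line is the wrong form for these data.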
In Summary
Correlation: the relationship between two
variables
Direction, form, degree
Three methods
For different purposes
Regression
Determining the linear equation that best fits the data
Slope and intercept
Homework
Three problems to solve.