PSY 307 – Statistics for the Behavioral Sciences
Chapter 6 – Correlation
Midterm Results
Top score = 45; top score for curve = 45
Score range   Grade   n
40-53         A       7
36-39         B       4
31-35         C       2
27-30         D       8
0-26          F       3
Total                 24
Aleks/Holcomb Hint
To Find the Cutoff Scores
If you know the mean and standard deviation, you can find the x values that cut off given percentages of the distribution. Solve for k (the z value that marks the cutoff), multiply k by the SD, then add that product to and subtract it from the mean to get the cutoff scores.
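A minimal Python sketch of this hint, assuming the problem asks for the two scores that enclose the middle percentage of a normal distribution. The function name `cutoff_scores` and the example mean of 70 and SD of 10 are hypothetical.

```python
from statistics import NormalDist


def cutoff_scores(mean, sd, middle_proportion):
    """Return the (lower, upper) x values that enclose the middle
    proportion of a normal distribution with the given mean and SD."""
    tail = (1 - middle_proportion) / 2
    # k is the z value that leaves `tail` of the area in the upper tail
    k = NormalDist().inv_cdf(1 - tail)
    # Multiply k by the SD, then subtract from and add to the mean
    return mean - k * sd, mean + k * sd


lower, upper = cutoff_scores(mean=70, sd=10, middle_proportion=0.95)
print(round(lower, 2), round(upper, 2))  # 50.4 89.6
```

For the middle 95%, k is about 1.96, so the cutoffs are 70 ± 1.96 × 10.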
Does Aleks Quiz 1 Predict Midterm Scores?
Adding a Prediction (Regression) Line Provides More Information
r = .56
Does Time Spent on Aleks Predict Quiz Grades?
r = .16
Sometimes the Relationship is Not Linear
r = .16 (linear)
r = .47 (quadratic)
This is the graph as published in a Wall Street Journal editorial (7/13), where they claimed that reducing corporate taxes results in greater revenue.
Treating Norway as an outlier, the data instead shows that as taxes increase, so do revenues – the opposite conclusion.
Which is right? The correct graph is the one with the best fit – where most of the data points are close to the line drawn (right).
Lying With Statistics
Describing Relationships
Positive relationship – high values tend to go with high values, low with low.
Negative relationship – high values tend to go with low values, low with high.
No relationship – no regularity appears between pairs of scores in two distributions.
Relationship Does Not Imply Causality
A relationship can exist without being a CAUSAL relationship. Correlation does not imply causation.
Third variable problem – a third variable causes both of the variables you are measuring to change, e.g., popsicle sales and drownings both rise in hot weather.
The direction of causality cannot be determined from the r statistic.
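A small Python sketch of the third variable problem, using hypothetical numbers: temperature drives both popsicle sales and drownings, neither outcome is computed from the other, and yet the two correlate strongly. `pearson_r` is a helper written for this illustration, not a library function.

```python
def pearson_r(x, y):
    """Pearson's r from the sum of products and sums of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / (ss_x * ss_y) ** 0.5


# Hypothetical daily temperatures: the third variable
temps = [20, 24, 28, 32, 36]
# Each outcome depends only on temperature plus a fixed bit of "noise";
# drownings are never computed from popsicle sales, or vice versa.
popsicles = [10 * t + e for t, e in zip(temps, [5, -3, 4, -2, 1])]
drownings = [0.25 * t + e for t, e in zip(temps, [0.5, -0.5, 0, 0.5, -0.5])]

print(round(pearson_r(popsicles, drownings), 2))  # 0.95
```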
Chocolate and Nobel Prizes
http://www.nejm.org/doi/full/10.1056/NEJMon1211064
Scatterplots
One variable is measured on the x-axis, the other on the y-axis.
Positive relationship – a cluster of dots sloping upward from the lower left to the upper right.
Negative relationship – a cluster of dots sloping down from upper left to lower right.
No relationship – no apparent slope.
Example Positive Correlations
[Scatterplots with r = 1.0, .85, .39, and .17]
Example Negative Correlations
[Scatterplots with r = -.54, -.94, and -.33]
Note that the line slopes in the opposite direction, from upper left to lower right.
Strength of Relationship
The more closely the dots approximate a straight line, the stronger the relationship.
A perfect relationship forms a straight line.
Dots forming a line reflect a linear relationship.
Dots forming a curved or bent line reflect a curvilinear relationship.
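To see why Pearson's r can miss a curvilinear relationship, here is a minimal Python sketch with hypothetical data in which y is exactly x squared. The `pearson_r` helper is written for this example.

```python
def pearson_r(x, y):
    """Pearson's r from the sum of products and sums of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / (ss_x * ss_y) ** 0.5


x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]  # a perfect U-shaped (curvilinear) relationship

# No linear relationship at all, even though y is fully determined by x
print(pearson_r(x, y))  # 0.0
# y is a straight-line function of x squared, so r on x**2 is perfect
print(pearson_r([v ** 2 for v in x], y))  # 1.0
```

An r near zero therefore rules out only a linear relationship, which is one more reason to look at the scatterplot first.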
More Examples
http://www.stat.uiuc.edu/courses/stat100/java/GCApplet/GCAppletFrame.html
Correlation Coefficient
Pearson’s r – a measure of how well a straight line describes the cluster of dots in a scatterplot.
Ranges from -1 to 1.
The sign indicates a positive or negative relationship.
The absolute value of r indicates the strength of the relationship.
Pearson’s r is independent of units of measure.
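A minimal Python check of the unit-independence claim, using hypothetical heights and weights for five people: converting inches to centimeters and pounds to kilograms rescales every value but leaves r unchanged. `pearson_r` is a helper written for this sketch.

```python
def pearson_r(x, y):
    """Pearson's r from the sum of products and sums of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / (ss_x * ss_y) ** 0.5


# Hypothetical heights (inches) and weights (pounds)
height_in = [62, 65, 68, 70, 74]
weight_lb = [120, 140, 150, 165, 190]

# The same people measured in centimeters and kilograms
height_cm = [h * 2.54 for h in height_in]
weight_kg = [w * 0.45359237 for w in weight_lb]

r_imperial = pearson_r(height_in, weight_lb)
r_metric = pearson_r(height_cm, weight_kg)
print(round(r_imperial, 4), round(r_metric, 4))  # 0.9948 0.9948
```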
Interpreting Pearson’s r
The value of r needed to assert a strong relationship depends on the size of n and on what is being measured.
Pearson’s r is NOT the percent or proportion of a perfect relationship.
Correlation is not causation. Experimentation is used to confirm a suspected causal relationship.
Calculating Pearson’s r
$$ r = \frac{\sum z_X z_Y}{n - 1} $$
This formula is most useful when the scores are already z-scores.
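The z-score formula can be sketched in Python as follows. This assumes the z-scores are computed with the sample standard deviation (n − 1 in the denominator), which is what makes the result agree with the SP and SS formulas; the data and the name `r_from_z_scores` are illustrative.

```python
import statistics


def r_from_z_scores(x, y):
    """Pearson's r as the sum of z-score products divided by n - 1."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    # statistics.stdev uses the n - 1 (sample) denominator
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    zx = [(a - mx) / sx for a in x]
    zy = [(b - my) / sy for b in y]
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(r_from_z_scores(x, y), 4))  # 0.7746
```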
Computational formulas – use whichever is most convenient for the data at hand.
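Computational Formulas
Sum of the Products (SP)
$$ r = \frac{SP}{\sqrt{SS_X \, SS_Y}} $$
where
$$ SP = \sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - \frac{(\sum X)(\sum Y)}{n} $$
and
$$ SS_X = \sum (X - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{n} \qquad SS_Y = \sum (Y - \bar{Y})^2 = \sum Y^2 - \frac{(\sum Y)^2}{n} $$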
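A sketch of the raw-score (computational) route in Python on a small hypothetical data set; `r_computational` is an illustrative name. For these numbers the intermediate values are SS_X = 10, SS_Y = 6, and SP = 6.

```python
def r_computational(x, y):
    """Pearson's r = SP / sqrt(SS_X * SS_Y) using raw-score formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_x = sum(a * a for a in x) - sum_x ** 2 / n  # sum X^2 - (sum X)^2 / n
    ss_y = sum(b * b for b in y) - sum_y ** 2 / n  # sum Y^2 - (sum Y)^2 / n
    sp = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n  # sum XY - (sum X)(sum Y) / n
    return sp / (ss_x * ss_y) ** 0.5


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(r_computational(x, y), 4))  # 0.7746
```

This gives the same r as the z-score formula, since dividing SP by n − 1 and by both sample SDs reduces to SP over the square root of SS_X times SS_Y.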
Outliers
An outlier that lies near where the regression line would normally fall increases the r value.
r=.457
An outlier away from the regression line decreases the r value.
r=.336
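Both outlier effects can be reproduced with a minimal Python sketch on hypothetical data. The first added point lies far to the right but on the regression line of the original five points (slope 0.6, intercept 2.2); the second sits well above that line. `pearson_r` is a helper written for this example.

```python
def pearson_r(x, y):
    """Pearson's r from the sum of products and sums of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / (ss_x * ss_y) ** 0.5


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Outlier near where the regression line would go: y = 2.2 + 0.6 * 15 = 11.2
x_on, y_on = x + [15], y + [11.2]
# Outlier far from the regression line
x_off, y_off = x + [3], y + [15]

print(round(pearson_r(x, y), 2))          # 0.77
print(round(pearson_r(x_on, y_on), 2))    # 0.98
print(round(pearson_r(x_off, y_off), 2))  # 0.18
```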
Dealing with Outliers
Outliers can dramatically change the value of the r correlation coefficient.
Always produce a scatterplot and inspect for outliers before calculating r.
Sometimes outliers can be omitted. Sometimes r cannot be used.
http://www.stat.sc.edu/~west/javahtml/Regression.html
Other Correlation Coefficients
Spearman’s rho (ρ) – based on ranks rather than values. Used with ordinal data (qualitative data that can be ordered from least to most).
Point biserial correlation – correlation between quantitative data and two coded categories.
Cramer’s phi – correlation between two qualitative (nominal) variables.
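Spearman's rho and the point biserial correlation can both be sketched as Pearson's r applied to transformed data: rho is r computed on ranks (tied scores share their average rank), and the point biserial is r with the two categories coded 0 and 1. The helper names and data below are hypothetical.

```python
def pearson_r(x, y):
    """Pearson's r from the sum of products and sums of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / (ss_x * ss_y) ** 0.5


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


def point_biserial(groups, scores):
    """Point biserial r: Pearson's r with the two categories coded 0 and 1."""
    return pearson_r(groups, scores)


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]             # monotonic but curved
print(spearman_rho(x, y))         # 1.0
print(round(pearson_r(x, y), 3))  # 0.981

group = [0, 0, 0, 1, 1, 1]        # hypothetical coding: 0 = one category, 1 = the other
score = [2, 3, 4, 6, 7, 8]
print(round(point_biserial(group, score), 3))  # 0.926
```

Because rho only uses the order of the scores, any relationship in which Y always rises with X gives rho = 1, even when Pearson's r on the raw values is below 1.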