34
Ed Psy 641 Correlation and Regression

Presentation 8 Chap. 7 (1) (2)

Embed Size (px)

Citation preview

Ed Psy 641

Ed Psy 641Correlation and RegressionReviewSo far we have discussed and begun to use examples of the properties of a variable and how it can be displayed and communicated to others.This has led us to understand the importance of the mean and standard deviation in explaining key features to others.Review (2)We have also discussed the importance of understanding the shape of a distribution of a sample and how this information can be important in helping us understand how different people within a sample relate to each other.We have used percentile ranks, standard scores and z-scores to begin to understand how probable any single score in a distribution is as well.ReviewWe have talked about the fact that we can begin to understand the difference in group means by the substitution of a mean from one sample with a score from a second sample.

AssociationOne question that we often want to answer in research is whether two variables are related. In addition, we may want to know how strongly two variables are related.In order to answer this question we will need to have a new approach to understanding our data and new tools as well.CorrelationCorrelation is a bivariate statistical procedure and it measures the degree of linear association between two quantitative variables. Scatter plots are a quick and easy way to visualize dataA scatter plot has two axes of equal length one for each variable.Each axis is marked off according to that variables scale with the low scores converging where the two axes intersectTwo VariablesStudentVerbalPerformance1110982100105389934701105115105685907951208102809120115Graphing Two Variables

Visual InspectionAssociationDirectionOutliersNonlinearityVisual InspectionAssociation- the stronger the association or relationship of two variables the more the data points cluster along an imaginary straight lineIf all data points fall on a straight line then you have a perfect relationship (which truly never happens in research)If there is no association or relationship then the data points spread out like a shot gun blast. Visual InspectionDirectionIf positive relationship then the ellipse or straight line goes from the lower left corner to the upper right (see figure 7.2 page 109)If negative relationship ellipse or straight line goes from upper left corner to lower rightVisual InspectionScatter plots tell us about outliersThese are data points that do not fit with all the other data points. Outliers affect your correlation coefficient and changes the magnitude. In other words outliers causes you to get misleading correlation Scatter plots help us identify those outliers.Visual InspectionNon-linearity The whole premise behind correlation is linearity A relationship is said to be linear if a straight line accurately represents the constellation of data pointsSo scatter plot can help us determine if data points can match a straight line Again see figure 7.2 page 109 for different types of relationships. Figure 7.2

CovarianceCov = ( (Xi Xbar)( Yi Ybar))/ n

Sum of Verbal = 886Mean = 98.4 Sum of Performance = 916 Mean = 101.7Go ahead and calculate your covariance estimate.EstimateCovariance estimate should be approximately 69.76. What does this tell us?The positive value suggests that there is a positive relationship to the data.The magnitude of the covariance estimate indicates the strength of the relationship.Limitations of CovarianceThe estimation of magnitude is dependent upon the scale of the distribution. Fixing the ProblemIn order to adjust for the problem of scale we can use a trick similar to that of the z score.If we place each of the two variables on to an adjusted scale using their respective standard deviations we eliminate the scale problem. Pearsons r( (Xi Xbar)( Yi Ybar))/ n______________________(SDX)(SDY )SD Verbal = 15.73 SD Performance = 12.75Go ahead and calculate Pearsons r for the data.

Pearsons r69.76 / (15.73*12.75) .3478

So what does this tell us. We retain the knowledge that the correlation is positive. If there is no association then the value would equal 0. The stronger the relationship the closer to 1 our value should be.Pearsons rThe book also offers a second formula, referred to as the calculating formula. It will yield the same result. I find the calculating formula to be easier to deal with because we do not have to calculate the standard deviations. For homework you can use either formula. CorrelationHow to interpret your correlation coefficientCohen suggests the following values for rIf r = .10 to .29 then a small correlation .30 to .49 then a medium correlation .50 and up then a strong correlationCorrelation and CausationCorrelation does NOT mean causation. There is always the possibility that a third variable is correlated with the other two variables that are causal in relationship.Limitation of Pearson rPearsons r looks to see how strong a linear relationship between two variables may be. It is possible that our data give us the false impression of moderate linearity and are really curvilinear in nature. This is one reason why scatterplots are helpful.Second, a strong curvlinear relationship may give us a really low r value because any linear relationship will not fit well.OutliersOutliers can cause distortions in the dataset. The further someone is off the line the more they will reduce the r value. However, people who are further from the mean and along the axis may do pull us toward a higher r value. Restriction of RangeBy the same token if we choose a narrow sample of our total group we may loose our ability to correctly detect a significant relationship.Pearsons r and FitAnother way of thinking about Pearsons r is to think about it as a fit statistic. If we go back to earlier lectures we introduced the idea that a Model = Observation (Measurement) + ErrorHere, we are testing the hypothesis of a linear relationship and seeing how well it fits. The closer to 1.0 or -1.0 we get the better a linear relationship explains our measurement and its associated error. The closer to 0 the less well it explains it.Strength of the RelationshipCovariancerprevious research and reasonr2

r2r2 is the amount of variance that is shared between the two variables. This is called the coefficient of determination1- r2 is the coefficient of nondetermination and tells us how much variance our model leaves out. Both of these numbers are important as they help guide us to understand whether more research in an area is needed with the tools we have. Exampleour estimated r = .34. r2 =.12 So, the relationship between Verbal and Performance accounts for approximately 12% of the total variance seen in both sets of the scores.However, our coefficient of nondetermination = .88 (1.0 - .12). Clearly, we may be able to use other tools to better understand any relationship between the verbal and performance measures.Effect Sizer2 is often used as a measure of effect size because of its ability to tell us how much we do not know or have yet to explain about a research area.Other Correlation CoefficientsThere are times when you may need to use other correlation coefficients. SPSS provides a sample of these that you can look at as well. An example would be the use of Spearmans correlation coefficient which can be used when our variables are not parametric in nature. Table 7.2

I think it is very important and helpful to create tables like this one when calculating pearson r; This is example of the defining formula (which I think is more difficult). 33Table 7.5

This table is for the calculating formula. Setting up these tables makes it easy to plug numbers into the formulas. 34