Multiple Regression and Correlation
Dr. Carlo Magno
Y = a + bX   (bivariate correlation)
y = b1x1 + b2x2 + ... + bnxn + c   (multiple correlation)
Multiple Regression – the association between a criterion variable and two or more predictor variables (Aron & Aron, 2003).
Multiple correlation coefficient = R
Using two or more variables to predict a criterion variable.
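The multiple-regression equation above can be fit by ordinary least squares. A minimal sketch in Python (numpy only; the data and coefficients below are simulated for illustration, not taken from the slides):

```python
import numpy as np

# Simulated data for y = b1*x1 + b2*x2 + c plus noise (illustrative values).
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 * x1 - 1.0 * x2 + 0.5 + rng.normal(scale=0.1, size=n)

# Design matrix with a column of ones for the intercept c.
X = np.column_stack([x1, x2, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b1, b2, c = coef

# Multiple correlation R = correlation between y and its predicted values.
y_hat = X @ coef
R = np.corrcoef(y, y_hat)[0, 1]
```

Note that R here is computed exactly as defined later in the slides: the correlation between the criterion and the prediction from all predictors taken together.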
Onwuegbuzie, A. J., Bailey, P., & Daley, C. E. (2000). Cognitive, affective, personality, and demographic predictors of foreign-language achievement. The Journal of Educational Research, 94, 3-15.
Foreign Language Achievement
Cognitive
Academic Ach.
Study Habits
Expectation
Affective
Perception
Anxiety
Personality
Cooperativeness
Competitiveness
Demographic
Gender
Age
Espin, C., Shin, J., Deno, S. L., Skare, S., Robinson, S., & Brenner, B. (2000). Identifying indicators of written expression proficiency for middle school students. The Journal of Special Education, 34, 140-153.
Words written
Words correct
Characters
Sentences
Characters per word
Words per sentence
Correct word sequences
Incorrect word sequences
Correct minus incorrect word sequences
Mean length of correct word sequences
Written Expression Proficiency
Results
Regression coefficient (b) / Beta weight (β) – the distinct contribution of a variable, excluding any overlap with other predictor variables. b is the unstandardized simple regression coefficient.
Standardized regression coefficient (β) – obtained by converting all variables (independent and dependent) to z-scores before doing the regression. Indicates which independent variable has the most effect on the dependent variable.
Results
Multiple correlation coefficient (R) – the correlation between the criterion variable and all the predictor variables taken together.
Squared multiple correlation coefficient (R²) – the percent of variance in the dependent variable explained collectively by all of the independent variables.
R²adj – assesses the goodness of fit of a regression equation: how well the predictors (regressors), taken together, explain the variation in the dependent variable.
R²adj = 1 - (1 - R²)(N - 1)/(N - n - 1), where N = number of cases and n = number of predictors.
As a rough guide, R²adj above 75% is very good; 50-75% good; 25-50% fair; below 25% poor and perhaps unacceptable. R²adj values above 90% are rare in psychological data.
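The shrinkage applied by the standard adjusted-R² formula, R²adj = 1 - (1 - R²)(N - 1)/(N - n - 1), is easy to see with illustrative numbers (these are made-up values, not from any study):

```python
def adjusted_r2(r2, N, n):
    """Adjusted R-square: N cases, n predictors."""
    return 1 - (1 - r2) * (N - 1) / (N - n - 1)

# With few cases and many predictors the penalty is large;
# with many cases it nearly vanishes.
print(round(adjusted_r2(0.50, 20, 5), 3))   # → 0.321
print(round(adjusted_r2(0.50, 200, 5), 3))  # → 0.487
```

The same R² of .50 thus shrinks to .32 with 20 cases but stays near .49 with 200, which is why small-sample R² values overstate fit.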
Residual - The deviation of a particular point from the regression line (its predicted value).
t-tests - used to assess the significance of individual b coefficients.
F test – used to test the significance of R: F = [R²/k] / [(1 - R²)/(n - k - 1)], where k = number of predictors and n = sample size.
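The overall F test can be computed directly from R². A short sketch (the R², k, and n values are illustrative, and the p-value uses scipy's F distribution):

```python
from scipy import stats

def overall_f(r2, k, n):
    """Overall F test of R: F = [R²/k] / [(1-R²)/(n-k-1)]."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

F = overall_f(0.40, 3, 16)            # k = 3 predictors, n = 16 cases
p = stats.f.sf(F, 3, 16 - 3 - 1)      # p-value on (k, n-k-1) df
```

The degrees of freedom (k, n - k - 1) are the same pair reported alongside F in write-ups such as the example later in these slides.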
Considerations in using multiple regression:
The units (usually people) observed should be a random sample from some well defined population.
The dependent variable should be measured on an interval, continuous scale.
The independent variables should be measured on interval scales.
The distributions of all the variables should be normal
The relationships between the dependent variable and the independent variable should be linear.
Although the independent variables can be correlated, there must be no perfect (or near-perfect) correlations among them, a situation called multicollinearity.
There must be no interactions (in the ANOVA sense) between independent variables.
A rule of thumb for testing b coefficients is to have N >= 104 + m, where m = number of independent variables.
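The multicollinearity condition above is commonly checked with the variance inflation factor, VIF_j = 1/(1 - R²_j), where R²_j comes from regressing predictor j on the remaining predictors. A hedged sketch with simulated predictors (VIF is a standard diagnostic, though the slides do not name it):

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor of column j of predictor matrix X."""
    others = np.delete(X, j, axis=1)
    D = np.column_stack([np.ones(len(X)), others])
    b, *_ = np.linalg.lstsq(D, X[:, j], rcond=None)
    resid = X[:, j] - D @ b
    r2_j = 1 - (resid @ resid) / ((X[:, j] - X[:, j].mean()) ** 2).sum()
    return 1 / (1 - r2_j)

# Independent predictors give VIFs near 1; a near-duplicate predictor
# (near-perfect correlation) blows the VIF up, signalling multicollinearity.
rng = np.random.default_rng(3)
a = rng.normal(size=200)
u = rng.normal(size=200)
X_ok = np.column_stack([a, u])
X_bad = np.column_stack([a, a + 0.01 * u])
```

A common rule of thumb treats VIFs above about 10 as a multicollinearity warning.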
Reporting regression results: "The data were analyzed by multiple regression, using as regressors age, income and gender. The regression was a rather poor fit (R²adj = 40%), but the overall relationship was significant (F(3,12) = 4.32, p < 0.05). With other variables held constant, depression scores were negatively related to age and income, decreasing by 0.16 for every extra year of age, and by 0.09 for every extra pound per week income. Women tended to have higher scores than men, by 3.3 units. Only the effect of income was significant (t(12) = 3.18, p < 0.01)."
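All of the reported quantities (R², R²adj, overall F, and per-coefficient t tests) can be computed by hand from one least-squares fit. A sketch on simulated data (none of these numbers reproduce the quoted depression study):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
N, k = 40, 2
X = rng.normal(size=(N, k))
y = 1.5 * X[:, 0] + rng.normal(size=N)   # only the first predictor matters

D = np.column_stack([np.ones(N), X])     # add intercept column
b, *_ = np.linalg.lstsq(D, y, rcond=None)
resid = y - D @ b
df_resid = N - k - 1

ss_res = resid @ resid
ss_tot = ((y - y.mean()) ** 2).sum()
r2 = 1 - ss_res / ss_tot
r2_adj = 1 - (1 - r2) * (N - 1) / df_resid

F = (r2 / k) / ((1 - r2) / df_resid)     # overall significance of R
# Standard errors from the classical OLS covariance matrix.
se = np.sqrt(ss_res / df_resid * np.diag(np.linalg.inv(D.T @ D)))
t = b / se                               # t test for each b coefficient
p_t = 2 * stats.t.sf(np.abs(t), df_resid)
```

In practice a regression routine reports all of these at once; the point of the sketch is that each statistic in the quoted write-up traces back to the residuals of a single fit.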
Partial Correlation
In its squared form, it is the proportion of variance in the dependent variable attributable to a given independent variable when the other variables in the equation are controlled.
Stepwise Regression
y = β0 + β1x1 + β2x2 + ... + β14x14
Goal: choose a subset of the independent variables which "best" explains the dependent variable.
Stepwise Regression
1) Forward Selection: Start by choosing the independent variable which explains the most variation in the dependent variable. Choose a second variable which explains the most residual variation, then recalculate the regression coefficients. Continue until no variables "significantly" explain residual variation.
Stepwise Regression
2) Backward Selection: Start with all the variables in the model, and drop the least "significant", one at a time, until you are left with only "significant" variables.
3) Mixture of the two: Perform a forward selection, but drop variables which become no longer "significant" after the introduction of new variables.
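The forward-selection strategy above can be sketched as a greedy loop. This toy version ranks candidates by residual sum of squares and stops after a fixed number of picks; real implementations instead stop when no remaining variable is "significant":

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of a least-squares fit (with intercept)
    using only the predictor columns listed in cols."""
    D = np.column_stack([np.ones(len(y)), X[:, cols]])
    b, *_ = np.linalg.lstsq(D, y, rcond=None)
    r = y - D @ b
    return r @ r

def forward_select(X, y, n_keep):
    chosen, remaining = [], list(range(X.shape[1]))
    while len(chosen) < n_keep:
        # Add whichever remaining predictor most reduces residual variation.
        best = min(remaining, key=lambda j: rss(X, y, chosen + [j]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy data: only columns 0 and 2 actually drive y.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] + 2.0 * X[:, 2] + rng.normal(scale=0.5, size=100)
```

On this data the loop picks column 0 first (largest effect) and column 2 second, mirroring the "most variation, then most residual variation" order described above.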
Hierarchical Regression
The researcher determines the order of entry of the variables.
F-tests are used to assess the significance of each added variable (or set of variables) to the explanation reflected in R².
An alternative to comparing beta weights for the purpose of assessing the importance of the independent variables.
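The F-test of each added block is the standard R²-change test: with m new predictors entered at a step, F = [(R²full - R²reduced)/m] / [(1 - R²full)/(n - kfull - 1)]. A sketch with illustrative numbers (not from any study):

```python
from scipy import stats

def f_change(r2_full, r2_reduced, n, k_full, m):
    """F test for the R² increase from adding m predictors;
    k_full = total predictors in the larger model, n = sample size."""
    df2 = n - k_full - 1
    F = ((r2_full - r2_reduced) / m) / ((1 - r2_full) / df2)
    return F, stats.f.sf(F, m, df2)

# Did step 2 (2 added predictors) improve on step 1's R² of .25?
F, p = f_change(r2_full=0.40, r2_reduced=0.25, n=100, k_full=5, m=2)
```

A significant p here means the block entered at that step explains variance beyond the variables the researcher entered earlier.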
Categorical Regression
Used when there is a combination of nominal, ordinal, and interval-level independent variables.