Uploaded by gayora
MULTIPLE REGRESSION ANALYSIS (MRA)
Design requirements
Multiple regression model
R²
Comparing standardized regression coefficients
STEPS IN DATA ANALYSIS
Look first at each variable separately.
Then look at the relationships among the variables.
Examine the distribution of each variable to be used in the multiple regression to determine whether there are any unusual patterns that may be important in building the regression model.
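The per-variable check described above can be sketched with Python's standard library. The function name `describe`, the choice of summaries (including the adjusted skewness coefficient G1), and the data values are illustrative assumptions, not taken from the slides.

```python
import statistics

def describe(values):
    """Summarize one variable: n, mean, SD, min, max and skewness.
    Needs at least 3 values that are not all equal."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    # Adjusted Fisher-Pearson skewness (G1); far from 0 suggests asymmetry.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    g1 = m3 / m2 ** 1.5
    skew = g1 * (n * (n - 1)) ** 0.5 / (n - 2)
    return {"n": n, "mean": mean, "sd": sd,
            "min": min(values), "max": max(values), "skew": skew}

# Hypothetical data for illustration only (not from the slides).
data = {
    "education": [12, 16, 12, 14, 18, 10, 12, 16],
    "children": [3, 1, 2, 2, 0, 5, 3, 1],
}
for name, values in data.items():
    print(name, describe(values))
```

A large skewness or an extreme minimum/maximum is the kind of "unusual pattern" worth inspecting before fitting the model.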
DISTRIBUTION OF VARIABLES
CORRELATION ANALYSIS
If you are interested only in determining whether a relationship exists, use correlation analysis. Example: students' height and weight.
[Figure: four scatter plots, each titled "Plot of Height vs Weight", with Weight (100 to 260) on the horizontal axis and Height on the vertical axis]
CORRELATION ANALYSIS
Correlation coefficient close to +1: strong positive (linear) relationship.
Correlation coefficient close to -1: strong negative (linear) relationship.
Correlation coefficient close to 0: no linear relationship.
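As a minimal sketch, the Pearson correlation coefficient can be computed directly from its definition, r = Sxy / sqrt(Sxx · Syy). The function name `pearson_r` and the height/weight values are hypothetical, chosen only to illustrate a strong positive relationship.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical weight (lb) and height values, for illustration only.
weight = [120, 140, 160, 180, 200, 220]
height = [5.1, 5.4, 5.6, 5.9, 6.0, 6.3]
print(round(pearson_r(weight, height), 3))
```

Because r only measures linear association, a value near 0 does not rule out a curved relationship; plotting the data, as in the scatter plots above, guards against that.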
EXAMPLE: SELF CONCEPT AND ACADEMIC ACHIEVEMENT (N=103)
CORRELATION
MULTIPLE REGRESSION ANALYSIS (MRA)
Method for studying the relationship between a dependent variable and two or more independent variables.
Purposes: prediction, explanation, theory building
DESIGN REQUIREMENTS
One dependent variable (criterion)
Two or more independent variables (predictor variables).
Sample size: at least 50 cases, and at least 10 times as many cases as independent variables
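The sample-size rule of thumb can be read as "whichever is larger: 50 cases or 10 cases per independent variable." The helper below encodes that reading; the function name and the `max` interpretation are assumptions, not wording from the slide.

```python
def minimum_sample_size(num_predictors, floor=50, cases_per_predictor=10):
    """Rule of thumb: at least `floor` cases and at least
    `cases_per_predictor` cases per independent variable."""
    return max(floor, cases_per_predictor * num_predictors)

print(minimum_sample_size(2))  # 50
print(minimum_sample_size(8))  # 80
```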
ASSUMPTIONS
Independence: The scores of any particular subject are independent of the scores of all other subjects
Normality: In the population, the scores on the dependent variable are normally distributed for each of the possible combinations of the levels of the X variables; each of the variables is normally distributed
ASSUMPTIONS
Homoscedasticity: In the population, the variances of the dependent variable for each of the possible combinations of the levels of the X variables are equal.
Linearity: In the population, the relation between the dependent variable and each independent variable is linear when all the other independent variables are held constant.
HOMOSCEDASTICITY (HOMOGENEITY OF VARIANCE)
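One informal way to look for unequal variances, offered here as an assumption rather than a method from the slides, is to sort cases by their fitted values and compare the residual variance in the upper half with that in the lower half. This is the idea behind the Goldfeld-Quandt test, without its formal F test. The function name `spread_ratio` and the data are hypothetical.

```python
import statistics

def spread_ratio(fitted, residuals):
    """Residual variance in the upper half of fitted values divided by the
    residual variance in the lower half (needs at least 4 cases).
    Near 1: consistent with equal variances; far from 1: spread changes."""
    pairs = sorted(zip(fitted, residuals))  # order cases by fitted value
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]
    return statistics.variance(high) / statistics.variance(low)

# Hypothetical fitted values and residuals whose spread grows with the fit.
fitted = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
residuals = [0.2, -0.1, 0.3, -0.2, 0.9, -1.1, 1.4, -1.2]
print(round(spread_ratio(fitted, residuals), 2))
```

In practice a residuals-versus-fitted plot usually shows the same thing more directly; the ratio just puts a number on it.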
LINEAR REGRESSION
In simple linear regression, we model the relationship between one explanatory variable (IV) and one response variable (DV).
In multiple regression, several explanatory variables work together to explain the dependent variable.
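For the simple (one-predictor) case, the least-squares slope and intercept have closed forms: b1 = Sxy / Sxx and b0 = mean(y) - b1 · mean(x). The sketch below assumes that standard ordinary-least-squares fit; the function name `simple_ols` is hypothetical.

```python
def simple_ols(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx          # slope
    b0 = my - b1 * mx       # line passes through (mean x, mean y)
    return b0, b1

print(simple_ols([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```

Multiple regression generalizes this to one coefficient per explanatory variable, estimated jointly.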
MODELS
WHAT IS A MODEL?
Representation of Some Phenomenon
(Non-Math/Stats Model)
WHAT IS A MATH/STATS MODEL?
Describes the relationship between variables
Types:
- Deterministic models (no randomness)
- Probabilistic models (with randomness)
DETERMINISTIC MODELS
1. Hypothesize Exact Relationships
2. Suitable When Prediction Error is Negligible
3. Example: Body mass index (BMI) is a measure of body fat based on weight and height, given by this formula.
Non-metric formula: $\text{BMI} = \dfrac{703 \times \text{weight (pounds)}}{(\text{height in inches})^2}$
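Because the BMI formula is deterministic, the same inputs always give the same output; there is no error term. A minimal sketch of the non-metric formula follows; the function name `bmi_imperial` and the example person are hypothetical.

```python
def bmi_imperial(weight_lb, height_in):
    """Deterministic model: BMI = 703 * weight (lb) / height (in) squared."""
    return 703 * weight_lb / height_in ** 2

print(round(bmi_imperial(150, 65), 1))  # 25.0 for a 150 lb, 65 in person
```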
PROBABILISTIC MODELS
1. Hypothesize two components: a deterministic component and a random error
2. Example: The systolic blood pressure (SBP) of newborns is 6 times the age in days plus a random error:
$\text{SBP} = 6 \times \text{age (days)} + \varepsilon$
The random error may be due to factors other than age in days (e.g., birth weight).
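The contrast with the deterministic BMI model can be shown by simulating the SBP model: the deterministic part 6 × age is fixed, while the error term varies from case to case. The normally distributed error and its standard deviation of 3.0 are illustrative assumptions; the slide does not specify the error distribution.

```python
import random

def simulate_sbp(age_days, error_sd=3.0, rng=None):
    """One simulated SBP value: deterministic part 6 * age plus random error.
    Normal error with SD 3.0 is an assumption, not a value from the slide."""
    rng = rng or random.Random()
    return 6 * age_days + rng.gauss(0.0, error_sd)

rng = random.Random(42)  # seeded so the run is reproducible
for age in (2, 3, 4):
    print(age, round(simulate_sbp(age, rng=rng), 1))
```

Setting `error_sd=0.0` collapses the model back to a deterministic one.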
TYPES OF PROBABILISTIC MODELS
Probabilistic models
- Regression models
- Correlation models
- Other models
REGRESSION MODELS
Relationship between one dependent variable and explanatory variable(s)
Use an equation to set up the relationship between:
- a numerical dependent (response) variable
- one or more numerical or categorical independent (explanatory) variables
Used mainly for prediction and estimation
REGRESSION MODELING STEPS
1. Hypothesize the deterministic component; estimate the unknown parameters
2. Specify the probability distribution of the random error term; estimate the standard deviation of the error
3. Evaluate the fitted model
4. Use the model for prediction and estimation
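The four steps can be sketched in plain Python as ordinary least squares solved through the normal equations (X'X)b = X'y. The names `solve`, `fit_ols` and `predict` and the data are hypothetical; the sketch estimates the error standard deviation but does not test the distributional assumption in step 2, and it assumes the predictors are not perfectly collinear (otherwise the elimination divides by zero). A real analysis would normally use a statistics package.

```python
import math

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(rows, y):
    """Steps 1-3: coefficients (intercept first), error SD, and R^2.
    rows: one list of predictor values per case."""
    X = [[1.0] + list(r) for r in rows]  # add intercept column
    p = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(p)] for i in range(p)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(p)]
    b = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(b, xi)) for xi in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    s = math.sqrt(sse / (len(y) - p))  # standard error of the estimate
    return b, s, 1 - sse / sst

def predict(b, row):
    """Step 4: predicted Y for one new case."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], row))

# Hypothetical (education, income) pairs and number of children.
rows = [(10, 2), (12, 3), (12, 5), (14, 4), (16, 6), (16, 8), (18, 7), (20, 9)]
children = [5, 4, 3, 3, 2, 1, 1, 0]
b, s, r2 = fit_ols(rows, children)
print([round(v, 3) for v in b], round(s, 3), round(r2, 3))
print(round(predict(b, (15, 5)), 2))
```

Because the data are made up, the printed coefficients say nothing about real families; the point is the sequence of estimating, evaluating, and predicting.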
MULTIPLE REGRESSION
Very popular among social scientists. Most social phenomena have more than one cause.
Very difficult to manipulate just one social variable through experimentation.
Social scientists must attempt to model complex social realities to explain them.
MULTIPLE REGRESSION
Allows us to:
Use several variables at once to explain the variation in a continuous dependent variable.
Isolate the unique effect of one variable on the continuous dependent variable while taking into consideration that other variables are affecting it too.
Write a mathematical equation that tells us the overall effects of several variables together and the unique effects of each on a continuous dependent variable.
Control for other variables to demonstrate whether bivariate relationships are spurious
MULTIPLE REGRESSION
For example:
A researcher may be interested in the relationship of Education and Family Income to the Number of Children in a family.
Independent Variables
Education
Family Income
Dependent Variable
Number of Children
MULTIPLE REGRESSION
For example:
Research Hypothesis: As education of respondents increases, the number of children in families will decline (negative relationship).
Research Hypothesis: As family income of respondents increases, the number of children in families will decline (negative relationship).
MULTIPLE REGRESSION
For example:
Null Hypothesis: There is no relationship between education of respondents and the number of children in families.
Null Hypothesis: There is no relationship between family income and the number of children in families.
MULTIPLE REGRESSION
Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .757a | .573 | .534 | 2.33785
a. Predictors: (Constant), Income, Education
ANOVA(b)
Model 1 | Sum of Squares | df | Mean Square | F | Sig.
Regression | 161.518 | 2 | 80.759 | 14.776 | .000a
Residual | 120.242 | 22 | 5.466 | |
Total | 281.760 | 24 | | |
a. Predictors: (Constant), Income, Education
b. Dependent Variable: Children
Coefficients(a)
Model 1 | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig.
(Constant) | 11.770 | 1.734 | | 6.787 | .000
Education | -.364 | .173 | -.412 | -2.105 | .047
Income | -.403 | .194 | -.408 | -2.084 | .049
a. Dependent Variable: Children
57% of the variation in number of children is explained by education and income!
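The summary statistics in the output above follow from the ANOVA sums of squares, which makes them easy to check by hand. The sketch below recomputes them; the variable names and the helper `predicted_children` (the fitted equation built from the unstandardized B coefficients) are illustrative, and the slides do not state the units of education or income.

```python
import math

# Values copied from the ANOVA table above.
ss_reg, ss_res, ss_tot = 161.518, 120.242, 281.760
df_reg, df_res = 2, 22
k = df_reg                 # number of predictors
n = df_reg + df_res + 1    # number of cases (total df + 1)

r2 = ss_reg / ss_tot                            # R Square
r = math.sqrt(r2)                               # R (multiple correlation)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # Adjusted R Square
f = (ss_reg / df_reg) / (ss_res / df_res)       # F = MS regression / MS residual
see = math.sqrt(ss_res / df_res)                # Std. Error of the Estimate

def predicted_children(education, income):
    """Fitted equation from the unstandardized coefficients (B)."""
    return 11.770 - 0.364 * education - 0.403 * income

print(n, round(r, 3), round(r2, 3), round(adj_r2, 3), round(f, 3), round(see, 5))
```

The recomputed values agree with the printed output to the displayed precision (n = 25, R = .757, R Square = .573, Adjusted R Square = .534, F = 14.776, standard error 2.33785).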
EXPLAINING VARIATION: HOW MUCH?
[Figure: total variation in Y, divided into predictable variation (by the combination of independent variables) and unpredictable variation]
PROPORTION OF PREDICTABLE AND UNPREDICTABLE VARIATION
[Figure: diagram of Y, X1 and X2]
Where: Y = # Children, X1 = Education, X2 = Income
R² = Predictable (explained) variation in Y
(1 - R²) = Unpredictable (unexplained) variation in Y
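The split of total variation into predictable and unpredictable parts can be written with the sums of squares from the education and income example's ANOVA table. The SS notation is an assumption added for this worked check; the numbers come from that table.

```latex
\begin{aligned}
SS_{\text{total}} &= SS_{\text{regression}} + SS_{\text{residual}}, & 281.760 &= 161.518 + 120.242\\
R^2 &= \frac{SS_{\text{regression}}}{SS_{\text{total}}}, & \frac{161.518}{281.760} &\approx 0.573\\
1 - R^2 &= \frac{SS_{\text{residual}}}{SS_{\text{total}}}, & \frac{120.242}{281.760} &\approx 0.427
\end{aligned}
```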
MULTIPLE REGRESSION
Now… More Variables!
The social world is very complex. What happens when you have even more variables?
For example:
A researcher may be interested in the effects of Education, Income, Sex, and Gender Attitudes on Number of Children in a family.
Independent Variables
Education
Family Income
Sex
Gender Attitudes
Dependent Variable
Number of Children
SIMPLE VS. MULTIPLE REGRESSION
Simple regression:
One dependent variable Y predicted from one independent variable X
One regression coefficient
r²: proportion of variation in dependent variable Y predictable from X
Multiple regression:
One dependent variable Y predicted from a set of independent variables (X1, X2, …, Xk)
One regression coefficient for each independent variable
R²: proportion of variation in dependent variable Y predictable by the set of independent variables (X's)
DIFFERENT WAYS OF BUILDING REGRESSION MODELS
Simultaneous (Enter): All independent variables entered together
Stepwise: Independent variables entered according to some order: determined by the researcher, by size of correlation with the dependent variable, or in order of significance (theory)
Hierarchical (Forward, Backward): Independent variables entered in stages
MULTIPLE REGRESSION: BLUE CRITERIA
Regression forces a best-fitting model onto data. If the model is appropriate for the data, regression should be used.
How do we know that our model is appropriate for the data?
Criteria for determining whether a regression model is appropriate for the data are nicknamed "BLUE", for best linear unbiased estimator.
MULTIPLE REGRESSION: BLUE CRITERIA
Violating the BLUE assumptions may result in biased estimates or incorrect significance tests. (However, OLS, or ordinary least squares, is robust to most violations.)
Data (constellation) should meet these criteria:
- The relationship between the dependent variable and its predictors is linear.
- No relevant variables are omitted from, and no irrelevant variables are included in, the equation. (Good luck!)
- All variables are measured without error. (Good luck!)