© Willett, Harvard University Graduate School of Education, 04/21/23 S052/I.1(b) – Slide 1
S052/§I.1(b): Applied Data Analysis
Roadmap of the Course – What Is Today’s Topic?
More details can be found in the “Course Objectives and Content” handout on the course webpage.
Multiple Regression Analysis (MRA): $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
Do your residuals meet the required assumptions?
• Test for residual normality.
• Use influence statistics to detect atypical datapoints.
If your residuals are not independent, replace OLS by GLS regression analysis:
• Use individual growth modeling.
• Specify a multi-level model.
If your sole predictor is continuous, MRA is identical to correlational analysis.
If your sole predictor is dichotomous, MRA is identical to a t-test.
If your several predictors are categorical, MRA is identical to ANOVA.
If time is a predictor, you need discrete-time survival analysis…
If your outcome is categorical, you need to use…
• Binomial logistic regression analysis (dichotomous outcome).
• Multinomial logistic regression analysis (polytomous outcome).
If you have more predictors than you can deal with:
• Create taxonomies of fitted models and compare them.
• Form composites of the indicators of any common construct.
• Conduct a Principal Components Analysis.
• Use Cluster Analysis.
If your outcome vs. predictor relationship is non-linear:
• Use non-linear regression analysis.
• Transform the outcome or predictor.
How do you deal with missing data?
Today’s Topic Area
© Willett, Harvard University Graduate School of Education, 04/21/23 S052/I.1(b) – Slide 2
S052/§I.1(b): Applied Data Analysis
Where Does Today’s Topic Appear in the Printed Syllabus?
Don’t forget to check the inter-connections among the Roadmap, the Daily Topic Area, the Printed Syllabus, and the content of the day’s class when you download and pre-read the day’s materials.
Syllabus Section I.1(b), on Testing Complex Hypotheses About Regression Parameters, includes:
• Sometimes you may need to test more complex hypotheses (Slide 3).
• Framing a joint hypothesis on several regression parameters simultaneously (Slide 4).
• Using the GLH strategy to test a joint hypothesis (Slides 5-7).
• Using the GLH test at critical decision points in taxonomy-building (Slide 8).
• The statistical underpinning of the GLH test (Slide 9).
• Conducting a GLH test by hand (Slide 10).
• Fascinating addition to the ILLCAUSE taxonomy of fitted models (Slides 11-13).
• Appendix 1: Why is SSModel a reasonable summary of model goodness-of-fit?
© Willett, Harvard University Graduate School of Education, 04/21/23 S052/I.1(b) – Slide 3
S052/§I.1(b): Testing Complex Hypotheses About Regression Parameters
Sometimes You Need Tests That Are A Little More Complicated!!!
In creating the taxonomy of fitted regression models featured at the end of the last class, I made some decisions that were complex, particularly when I retained, deleted or modified predictors later in the taxonomy … for instance:
As an example, in M5, there were several two-way and three-way interactions, none having a separately statistically significant impact on outcome ILLCAUSE (at α = .05).
I eliminated them as a group and “fell back” on Model M4 before continuing my journey!
These terms represent all possible interactions among subsidiary control predictor SES and all earlier predictors included in the model.
They were therefore less important to me, substantively, as a group.
So, for efficiency, I dropped them as a group. But, before I did this, I also checked that they did not make a difference as a group.
How? I used a GENERAL LINEAR HYPOTHESIS (GLH) TEST to assess whether their joint impact on the outcome was simultaneously statistically significant.
This is a useful strategy because it helps me “preserve” my Type I Error: one joint test at α = .05 replaces several separate tests, each of which would carry its own chance of a false rejection.
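To make the Type I Error point concrete, here is a minimal sketch of how the chance of at least one false rejection grows when several tests are run separately. It assumes the tests are statistically independent, which is a simplification made only for illustration (tests on correlated interaction terms in one model are not independent), and the function name `familywise_error_rate` is hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false rejection across n_tests tests.

    Assumes every null hypothesis is true and the tests are independent
    (an illustrative simplification, not a property of the M5 tests).
    """
    return 1 - (1 - alpha) ** n_tests


# Five separate tests (one per SES interaction) versus one joint GLH test.
separate_tests_rate = familywise_error_rate(alpha=0.05, n_tests=5)
single_joint_test_rate = familywise_error_rate(alpha=0.05, n_tests=1)

print(f"Five separate tests at .05: {separate_tests_rate:.3f}")
print(f"One joint test at .05:      {single_joint_test_rate:.3f}")
```

Under that independence assumption, five separate tests carry a much larger overall risk of a false rejection than a single joint test conducted at the same α-level.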
© Willett, Harvard University Graduate School of Education, 04/21/23 S052/I.1(b) – Slide 4
S052/§I.1(b): Testing Complex Hypotheses About Regression Parameters
Framing a Joint Hypothesis About the Simultaneous Impact of Several Predictors
Here’s a “formal” specification of the null hypothesis that I tested by GLH in M5 … you start with the model itself:
$$\begin{aligned}
\mathit{ILLCAUSE}_i = {} & \beta_0 + \beta_1 D_i + \beta_2 A_i + \beta_3 \mathit{AGE}_i + \beta_4 (D_i \times \mathit{AGE}_i) + \beta_5 (A_i \times \mathit{AGE}_i) + \beta_6 \mathit{SES}_i \\
& + \beta_7 (D_i \times \mathit{SES}_i) + \beta_8 (A_i \times \mathit{SES}_i) + \beta_9 (\mathit{AGE}_i \times \mathit{SES}_i) \\
& + \beta_{10} (D_i \times \mathit{AGE}_i \times \mathit{SES}_i) + \beta_{11} (A_i \times \mathit{AGE}_i \times \mathit{SES}_i) + \varepsilon_i
\end{aligned}$$
To simultaneously eliminate all these interactions involving SES from the model, I must confirm that all their slope parameters – that’s β7, β8, β9, β10, β11 – are zero concurrently, in the population.
In other words, in Model M5, I must test (and, hopefully, here, fail to reject) the following joint or simultaneous null hypothesis:
$$H_0: \beta_7 = 0;\ \beta_8 = 0;\ \beta_9 = 0;\ \beta_{10} = 0;\ \beta_{11} = 0$$
This is the same as testing, in words:
H0: In the population, ILLCAUSE is not related to the two-way interaction of HEALTH and SES, the two-way interaction of AGE and SES, and the three-way interaction of HEALTH by AGE by SES, controlling for ...
PROC REG DATA=ILLCAUSE;
  VAR ILLCAUSE D A H AGE SES;
  * Estimating the total main effect of health status;
  M1: MODEL ILLCAUSE = D A;
  T1: TEST D=0, A=0;
  * Accounting for important issues of research design;
  * Controlling for the presence of multiple age-cohorts of children;
  * Checking the main effect of AGE;
  M2: MODEL ILLCAUSE = D A AGE;
  * Checking the two-way interaction of health status and AGE;
  M3: MODEL ILLCAUSE = D A AGE DxAGE AxAGE;
  T3: TEST DxAGE=0, AxAGE=0;
  * Controlling for additional substantive effects;
  * Checking the main effect of socioeconomic status;
  M4: MODEL ILLCAUSE = D A AGE DxAGE AxAGE SES;
  * Checking that all interactions with SES are not needed;
  M5: MODEL ILLCAUSE = D A AGE DxAGE AxAGE SES
                       DxSES AxSES AGExSES DxAGExSES AxAGExSES;
  T5: TEST DxSES=0, AxSES=0, AGExSES=0, DxAGExSES=0, AxAGExSES=0;
© Willett, Harvard University Graduate School of Education, 04/21/23 S052/I.1(b) – Slide 5
S052/§I.1(b): Testing Complex Hypotheses About Regression Parameters
It’s Easy To Test a Joint Hypothesis About the Impact of Several Predictors
Doing the test is easy ... look at Data-Analytic Handout I.1(b).1:
I just added a “TEST” command to Model M5.
CAUTION! The phrasing of the TEST command in SAS is misleading: it says “Test that the predictors are zero”!!!! This, of course, is wacko – we actually want to test that the regression parameters associated with those predictors are zero.
The predictors are certainly NOT “zero”; after all, each person has her own value on each, and none of them are zero!!!
This is just a poor choice of programming language. Do not be misled!
Regression Parameters
Associated Predictor
Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              11         139.43769       12.67615      37.15    <.0001
Error             182          62.10945        0.34126
Corrected Total   193         201.54714

Parameter Estimates

Variable      DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept      1               1.48021           0.49622       2.98      0.0032
D              1              -2.09934           1.39143      -1.51      0.1331
A              1              -0.76832           1.06673      -0.72      0.4723
AGE            1               0.02492           0.00361       6.90      <.0001
DxAGE          1               0.00728           0.00942       0.77      0.4404
AxAGE          1            0.00025202           0.00766       0.03      0.9738
SES            1               0.26339           0.25308       1.04      0.2994
DxSES          1               0.72592           0.53338       1.36      0.1752
AxSES          1               0.11595           0.40164       0.29      0.7731
AGExSES        1              -0.00255           0.00179      -1.42      0.1573
DxAGExSES      1              -0.00460           0.00353      -1.30      0.1938
AxAGExSES      1              -0.00121           0.00286      -0.42      0.6722

Test T5 Results for Dependent Variable ILLCAUSE

Source          DF    Mean Square    F Value    Pr > F
Numerator        5        0.72216       2.12    0.0654
Denominator    182        0.34126
© Willett, Harvard University Graduate School of Education, 04/21/23 S052/I.1(b) – Slide 6
S052/§I.1(b): Testing Complex Hypotheses About Regression Parameters
Standard PC-SAS Regression Output from the “TEST” Command
Here are the results of the regression analysis for Model M5, and the accompanying GLH test …
Here’s the usual regression “ANOVA” table, which summarizes variability in the ILLCAUSE outcome:
• Some of it was predicted successfully (“Model”).
• Some became residual variability (“Error”).
Here are the usual regression parameter estimates, standard errors, t-statistics, and p-values, etc.
Here is the General Linear Hypothesis Test. We’ll decode its pieces in a moment, but notice the interesting connections with the regression ANOVA table!
© Willett, Harvard University Graduate School of Education, 04/21/23 S052/I.1(b) – Slide 7
S052/§I.1(b): Testing Complex Hypotheses About Regression Parameters
Interpreting the Statistics Provided by the GLH Test

Test T5 Results for Dependent Variable ILLCAUSE

Source          DF    Mean Square    F Value    Pr > F
Numerator        5        0.72216       2.12    0.0654
Denominator    182        0.34126

Like any test, you reject H0 if the observed value of the test statistic is larger than the corresponding critical value.
Here, $F_{observed} = 2.12$ and $F_{critical} = F_{5,182}(\alpha = .05) = 2.26$.
Because $F_{observed} < F_{critical}$, we cannot reject:
$$H_0: \beta_{D \times \mathit{SES}} = \beta_{A \times \mathit{SES}} = \beta_{\mathit{AGE} \times \mathit{SES}} = \beta_{D \times \mathit{AGE} \times \mathit{SES}} = \beta_{A \times \mathit{AGE} \times \mathit{SES}} = 0$$
In practice, you can compare the p-value to an α-level. Here, observed p-value = 0.0654 and chosen α-level = .05, say.
Because p > .05, you cannot reject H0.
Either way, we cannot reject the hypothesis that the group of regression parameters associated with all interactions between subsidiary control predictor SES and the other predictors in Model M5 are jointly zero in the population.
Thus, we can remove them simultaneously, returning to the more parsimonious Model M4 before continuing.
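The pieces of the Test T5 output fit together by simple arithmetic: the F value is the “Numerator” mean square divided by the “Denominator” mean square. The sketch below recomputes it from the printed values and applies both decision rules. The critical value and p-value are copied from the slide rather than computed, and the function name is hypothetical.

```python
def f_from_mean_squares(numerator_mean_square, denominator_mean_square):
    """Observed F-statistic as a ratio of two mean squares."""
    return numerator_mean_square / denominator_mean_square


# Values printed in the Test T5 output for Model M5.
f_observed = f_from_mean_squares(0.72216, 0.34126)

# Copied from the slide, not computed here.
f_critical = 2.26   # F with 5 and 182 df at alpha = .05
p_value = 0.0654
alpha = 0.05

# The two decision rules agree.
reject_by_f = f_observed > f_critical
reject_by_p = p_value < alpha

print(f"F observed = {f_observed:.2f}")
print(f"Reject H0 (F rule)? {reject_by_f}")
print(f"Reject H0 (p rule)? {reject_by_p}")
```

Note that the denominator mean square, 0.34126, is the same “Error” mean square that appears in the regression ANOVA table for M5.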
Working with the results of a GLH test is typical …
PROC REG DATA=ILLCAUSE;
  VAR ILLCAUSE D A H AGE SES;
  * Estimating the total main effect of health status;
  M1: MODEL ILLCAUSE = D A;
  T1: TEST D=0, A=0;
  * Accounting for important issues of research design;
  * Controlling for the presence of multiple age-cohorts of children;
  * Checking the main effect of AGE;
  M2: MODEL ILLCAUSE = D A AGE;
  * Checking the two-way interaction of health status and AGE;
  M3: MODEL ILLCAUSE = D A AGE DxAGE AxAGE;
  T3: TEST DxAGE=0, AxAGE=0;
  * Controlling for additional substantive effects;
  * Checking the main effect of socioeconomic status;
  M4: MODEL ILLCAUSE = D A AGE DxAGE AxAGE SES;
  * Checking that all interactions with SES are not needed;
  M5: MODEL ILLCAUSE = D A AGE DxAGE AxAGE SES
                       DxSES AxSES AGExSES DxAGExSES AxAGExSES;
  T5: TEST DxSES=0, AxSES=0, AGExSES=0, DxAGExSES=0, AxAGExSES=0;
© Willett, Harvard University Graduate School of Education, 04/21/23 S052/I.1(b) – Slide 8
S052/§I.1(b): Testing Complex Hypotheses About Regression Parameters
You Can Also Use the GLH Test To Support Other Kinds of Conclusions & Decisions
Back to Data-Analytic Handout I.1(b).1:
Test T1, in Model M1: $H_0: \beta_D = 0;\ \beta_A = 0$
Test T3, in Model M3: $H_0: \beta_{D \times \mathit{AGE}} = 0;\ \beta_{A \times \mathit{AGE}} = 0$
There are other options I could have exercised, but I wanted to be analytically and substantively efficient, and to conserve my Type I error.
© Willett, Harvard University Graduate School of Education, 04/21/23 S052/I.1(b) – Slide 9
S052/§I.1(b): Testing Complex Hypotheses About Regression Parameters
It’s a Good Idea To Collect It All Together in Your APA-Style Exhibit!
Here are the results of the three GLH tests that we have conducted and discussed, so far.
Notice that I have included the key statistics:
• Null hypothesis.
• F-statistic.
• “Numerator” and “denominator” degrees of freedom.
• p-value.
• Testing decision.
Note that I have had to eliminate the table caption, in order to fit the exhibit onto this slide and leave it intelligible – see the handout for a complete table.
The real question, of course, is: On What Statistical Principles Are These GLH Tests Based????
© Willett, Harvard University Graduate School of Education, 04/21/23 S052/I.1(b) – Slide 10
S052/§I.1(b): Testing Complex Hypotheses About Regression Parameters
What Is the Statistical Rationale That Underpins the GLH Test?
It’s all about comparing the fit of competing models … think about Test T3 in Model M3 …
$$M3: \mathit{ILLCAUSE}_i = \beta_0 + \beta_1 D_i + \beta_2 A_i + \beta_3 \mathit{AGE}_i + \beta_4 (D_i \times \mathit{AGE}_i) + \beta_5 (A_i \times \mathit{AGE}_i) + \varepsilon_i$$
In M3, I used the GLH strategy to test:
$$H_0: \beta_{D \times \mathit{AGE}} = 0;\ \beta_{A \times \mathit{AGE}} = 0$$
If I were to NOT reject H0, then I would prefer a model in which:
• $\beta_{D \times \mathit{AGE}}$ and $\beta_{A \times \mathit{AGE}}$ were jointly zero.
• Such a model would not contain the D×AGE and A×AGE interactions (i.e. it would have no two-way HEALTH×AGE interaction).
• This latter model is, of course, Model M2.
$$M2: \mathit{ILLCAUSE}_i = \beta_0 + \beta_1 D_i + \beta_2 A_i + \beta_3 \mathit{AGE}_i + \varepsilon_i$$
This means that comparing the fit of “full model” M3 to the fit of “reduced model” M2 provides the following logic:
• If predictors D×AGE and A×AGE were really needed in M3, then removing them would undermine the “success” of the prediction.
• And, then, in M2, ILLCAUSE would be predicted markedly less well than in M3.
• So, clearly, we can test our joint H0 by checking whether “reduced model” M2 fits “less well” than “full model” M3.
This is the GLH testing strategy, and it uses the “SSModel” statistic as a summary of model fit.
© Willett, Harvard University Graduate School of Education, 04/21/23 S052/I.1(b) – Slide 11
S052/§I.1(b): Testing Complex Hypotheses About Regression Parameters
Conducting a GLH Test By Hand
The GLH testing strategy compares the fits of selected full and reduced models, as follows …

Model   Role of Model   Predictors in Model        SSModel   Change in SSModel   dfModel   Change in dfModel
M3      Full            D, A, AGE, D×AGE, A×AGE    134.532                       5
M2      Reduced         D, A, AGE                  131.180   3.352               3         2

Constraint imposed to force the full model to become the reduced model: $\beta_{D \times \mathit{AGE}} = 0$ and $\beta_{A \times \mathit{AGE}} = 0$.
The constraint that was imposed to make the full model become the reduced model is actually a statement of the null hypothesis being tested.
Key Question: Is losing 3.352 units of fit from SSModel worth gaining 2 extra degrees of freedom?
We can check whether the difference is “statistically significant” by converting these differences in SSModel and dfModel into an F-statistic:
$$F_{observed} = \frac{(\text{Change in SSModel}) / (\text{Change in dfModel})}{\text{Residual Variance in Full Model}} = \frac{3.352 / 2}{0.356} = 4.70$$
This is the observed F-statistic provided by the GLH test.
And because $F_{observed}$ is larger than the critical value, $F_{critical} = F_{2,188}(.05) = 3.044$, we can reject $H_0: \beta_{D \times \mathit{AGE}} = 0;\ \beta_{A \times \mathit{AGE}} = 0$.
This is the critical F-statistic implicit in the GLH test. The “denominator” df are those of the residual variance in the full model.
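The by-hand calculation can be checked with a few lines of arithmetic. This is a minimal sketch using the SSModel, dfModel, and residual-variance values printed on the slide. The function name is hypothetical, and the critical value 3.044 is copied from the slide rather than computed.

```python
def glh_f_by_hand(ss_model_full, ss_model_reduced,
                  df_model_full, df_model_reduced,
                  residual_variance_full):
    """F-statistic comparing a full model with a nested reduced model."""
    change_in_ss_model = ss_model_full - ss_model_reduced
    change_in_df_model = df_model_full - df_model_reduced
    return (change_in_ss_model / change_in_df_model) / residual_variance_full


# Full model M3 versus reduced model M2, values as printed on the slide.
f_observed = glh_f_by_hand(
    ss_model_full=134.532, ss_model_reduced=131.180,
    df_model_full=5, df_model_reduced=3,
    residual_variance_full=0.356,
)

f_critical = 3.044  # F with 2 and 188 df at alpha = .05, from the slide

print(f"F observed = {f_observed:.2f}")
print(f"Reject H0? {f_observed > f_critical}")
```

Because the inputs are rounded to three decimals, the recomputed value can differ from the slide’s 4.70 in the second decimal place; the testing decision is unaffected.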
© Willett, Harvard University Graduate School of Education, 04/21/23 S052/I.1(b) – Slide 12
S052/§I.1(b): Testing Complex Hypotheses About Regression Parameters
Afterthoughts: Simplifying the “Last” Model in the Taxonomy Further?
When only main effects of chronic illness are present in the models, the estimated impacts of predictors D and A – which jointly represent the child’s HEALTH status – are very similar in magnitude:
• Perhaps the main effect on ILLCAUSE of being diabetic is really no different from the effect of being asthmatic?
• In terms of the main effects, perhaps the real difference is simply that it matters whether the child is chronically ill or not?
You reach a similar conclusion when you examine the corresponding two-way interactions with AGE. Notice, again, that the estimated impacts of predictors DxAGE and AxAGE – which jointly represent the two-way interaction of the child’s HEALTH status and their AGE – are also very similar in magnitude:
• Perhaps the effect on ILLCAUSE of the interaction between diabetic and AGE is really no different from the effect of the interaction of asthmatic and AGE?
• Perhaps the real difference here is simply that it matters whether we include the interaction of AGE with whether the child is chronically ill or not?
You can use the GLH strategy to test this hunch, and I have done this in my preliminary “final model” M4.
© Willett, Harvard University Graduate School of Education, 04/21/23 S052/I.1(b) – Slide 13
S052/§I.1(b): Testing Complex Hypotheses About Regression Parameters
And What Do We Find?
First, I added the extra GLH test to a refit of Model M4, at the end of the I.1(b).1 handout, as follows …

PROC REG DATA=ILLCAUSE;
  VAR ILLCAUSE D A H AGE SES;
  * Checking whether we can collapse the separate health effects;
  M4: MODEL ILLCAUSE = D A AGE DxAGE AxAGE SES;
  T4: TEST D=A, DxAGE=AxAGE;

Notice the interesting, and different, nature of the null hypothesis:
$$H_0: \beta_D = \beta_A \ \text{and}\ \beta_{D \times \mathit{AGE}} = \beta_{A \times \mathit{AGE}}$$
(It turns out that the GLH strategy can be used to test any hypothesis that can be framed as a linear weighted combination of parameters, or linear contrast. I will return to this, and explain how, in a week or so!)

Test T4 Results for Dependent Variable ILLCAUSE

Source          DF    Mean Square    F Value    Pr > F
Numerator        2        0.02868       0.08    0.9217
Denominator    187        0.35145

Notice that Fobserved is very small and p > .05, so we do not reject H0.
So, it doesn’t matter whether the child is diabetic or asthmatic; all that matters is whether he or she is chronically ill or not!!
This means that we can simplify Model M4 still further!
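A short derivation may help show why equal parameters allow the two health predictors to be collapsed into one. It is a sketch under two labeled assumptions: no child is coded as both diabetic and asthmatic, so that a combined indicator equals $D_i + A_i$, and the symbols $\mathit{ILL}_i$, $\beta_{\mathit{ILL}}$ and $\beta_{\mathit{ILL} \times \mathit{AGE}}$ are names introduced here for the common values.

```latex
% Assumption: ILL_i = D_i + A_i (D and A are mutually exclusive 0/1 indicators).
% Under H_0, write \beta_D = \beta_A = \beta_{ILL} and
% \beta_{D \times AGE} = \beta_{A \times AGE} = \beta_{ILL \times AGE}.
\begin{aligned}
& \beta_D D_i + \beta_A A_i
  + \beta_{D \times AGE} (D_i \times AGE_i)
  + \beta_{A \times AGE} (A_i \times AGE_i) \\
&\quad = \beta_{ILL} (D_i + A_i)
  + \beta_{ILL \times AGE} \, (D_i + A_i) \times AGE_i \\
&\quad = \beta_{ILL} \, ILL_i
  + \beta_{ILL \times AGE} \, (ILL_i \times AGE_i)
\end{aligned}
```

So, when the equality constraints hold, the four health-related terms reduce to one main effect and one interaction with AGE.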
© Willett, Harvard University Graduate School of Education, 04/21/23 S052/I.1(b) – Slide 14
S052/§I.1(b): Testing Complex Hypotheses About Regression Parameters
This Leads to Model M6 – The Model We Will Eventually Interpret!
Replace predictors D and A in M4, with predictor ILL as both a main effect and an interaction with AGE, to provide Model M6…

DATA ILLCAUSE;
  SET ILLCAUSE;
  * Creating a new question predictor to identify ill children;
  IF D=1 OR A=1 THEN ILL=1; ELSE ILL=0;
  * Creating a new two-way interaction of ILL and AGE;
  ILLxAGE = ILL*AGE;

PROC REG DATA=ILLCAUSE;
  VAR ILLCAUSE D A H AGE SES;
  M6: MODEL ILLCAUSE = ILL AGE ILLxAGE SES;

This is the “final model” we will interpret later!
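For readers who do not use SAS, the recoding logic of the DATA step can be sketched in plain Python. The three records below are made-up illustrative values, not rows from the ILLCAUSE dataset, and the function name is hypothetical.

```python
def add_ill_predictors(record):
    """Return a copy of one child's record with ILL and ILLxAGE added.

    Mirrors the DATA step: ILL is 1 if the child is diabetic (D=1)
    or asthmatic (A=1), and 0 otherwise.
    """
    ill = 1 if record["D"] == 1 or record["A"] == 1 else 0
    return {**record, "ILL": ill, "ILLxAGE": ill * record["AGE"]}


# Made-up records for illustration only.
children = [
    {"D": 1, "A": 0, "AGE": 120},  # diabetic child
    {"D": 0, "A": 1, "AGE": 96},   # asthmatic child
    {"D": 0, "A": 0, "AGE": 150},  # healthy child
]

recoded = [add_ill_predictors(child) for child in children]
for row in recoded:
    print(row)
```

The interaction column is zero for healthy children and equal to AGE for chronically ill children, which is what lets Model M6 fit a separate AGE slope for the ill group.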
© Willett, Harvard University Graduate School of Education, 04/21/23 S052/I.1(b) – Slide 15
S052/§I.1(b): Testing Complex Hypotheses About Regression Parameters
Appendix 1: Why Is SSModel A Decent Summary of Model Goodness of Fit?
[Figure: plot of Y against X, with datapoints marked “+”.]
Re-centering the vertical axis on the average value of Y …
$$\underbrace{(Y_i - \bar{Y})}_{\text{Total}} = \underbrace{(\hat{Y}_i - \bar{Y})}_{\text{Predicted (Model)}} + \underbrace{(Y_i - \hat{Y}_i)}_{\text{Residual (Error)}}$$
You can square and add these deviations across everyone in the sample to summarize the state of the model’s prediction:
$$\text{SSModel} = \sum_i (\hat{Y}_i - \bar{Y})^2 \qquad \text{SSError} = \sum_i (Y_i - \hat{Y}_i)^2 \qquad \text{SSTotal} = \sum_i (Y_i - \bar{Y})^2$$
When the model fits the data well, SSModel is big compared to SSError.
When the model fits the data poorly, SSModel is small compared to SSError.
The R2 statistic summarizes all this, because:
$$R^2 = \frac{\text{SSModel}}{\text{SSTotal}} = 1 - \frac{\text{SSError}}{\text{SSTotal}}$$
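The sums of squares can be computed directly to see how they relate. The sketch below fits a one-predictor OLS line to a tiny made-up dataset (the x and y values are illustrative only), checks that SSModel and SSError add up to SSTotal, and then recomputes R2 for Model M5 from the sums of squares printed in its ANOVA table. Function and variable names are hypothetical.

```python
def sums_of_squares(x, y):
    """Fit y = b0 + b1*x by OLS and return (SSModel, SSError, SSTotal)."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    y_hat = [intercept + slope * xi for xi in x]
    ss_model = sum((yh - y_bar) ** 2 for yh in y_hat)
    ss_error = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    return ss_model, ss_error, ss_total


# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
ss_model, ss_error, ss_total = sums_of_squares(x, y)
r_squared = ss_model / ss_total

# Model M5, using the sums of squares printed in its ANOVA table.
r_squared_m5 = 139.43769 / 201.54714

print(f"SSModel + SSError = {ss_model + ss_error:.4f}, SSTotal = {ss_total:.4f}")
print(f"R2 (toy data) = {r_squared:.4f}")
print(f"R2 (Model M5) = {r_squared_m5:.3f}")
```

Both routes to R2 agree for M5 as well, because the printed Model and Error sums of squares add up to the Corrected Total.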