EPI809/Spring 2008
Testing Individual Coefficients
Test of Slope Coefficient $\beta_p$
1. Tests if there is a Linear Relationship Between one X ($X_p$) & Y, given the other X variables in the model
2. Involves one single population Slope $\beta_p$
3. Hypotheses: $H_0: \beta_p = 0$ vs. $H_a: \beta_p \neq 0$
Test of Slope Coefficient $\beta_p$
Test Statistic
$$t = \frac{\hat{\beta}_p - 0}{S_{\hat{\beta}_p}} = \frac{\hat{\beta}_p}{S_{\hat{\beta}_p}} \overset{H_0}{\sim} t(n-k-1)$$
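The test statistic is easy to check by hand. The Python sketch below recomputes the t Value column of the cow-data SAS output in these notes (n = 6, k = 2) from the Parameter Estimate and Standard Error columns; the helper name `t_statistic` is ours, not a SAS feature.

```python
# Estimates and standard errors copied from the SAS "Parameter Estimates" table
# for the cow data (model milk = food weight).
ESTIMATES = {
    "Intercept": (0.06397, 0.25986),
    "Food": (0.20492, 0.05882),
    "weight": (0.28049, 0.06860),
}

N_OBS = 6  # number of observations
K = 2      # number of X variables (excluding the intercept)


def t_statistic(beta_hat, se, beta_null=0.0):
    """t = (beta_hat - beta_null) / S_beta_hat for one coefficient."""
    return (beta_hat - beta_null) / se


df = N_OBS - K - 1  # error degrees of freedom, n - k - 1
t_values = {name: t_statistic(b, se) for name, (b, se) in ESTIMATES.items()}

for name, t in t_values.items():
    print(f"{name:9s} t = {t:6.2f}  (df = {df})")
```

Rounded to two decimals these match the SAS t Values (0.25, 3.48, 4.09), each on 3 degrees of freedom.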
Test of Slope Coefficient: Rejection Rule
Reject $H_0$ in favor of $H_a$ if t falls in the colored area, i.e. if $|t| > t_{1-\alpha/2}(n-k-1)$.
Equivalently, reject $H_0$ in favor of $H_a$ if P-value $= 2P(T > |t|) < \alpha$, where $T \sim t(n-k-1)$.
[Figure: density of $T \sim t(n-k-1)$ centered at 0; the colored rejection regions (area $\alpha/2$ each) lie below $-t_{1-\alpha/2}(n-k-1)$ and above $t_{1-\alpha/2}(n-k-1)$]
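A minimal Python sketch of the rejection rule for the Food coefficient (t = 0.20492 / 0.05882 from the SAS output, df = n - k - 1 = 3). To stay dependency-free it uses the closed-form CDF of Student's t for exactly 3 degrees of freedom and finds $t_{1-\alpha/2}(3)$ by bisection; both helpers are our own and are valid only for df = 3. For other df, a library routine such as `scipy.stats.t.sf` or `scipy.stats.t.ppf` would be the usual choice.

```python
import math

ALPHA = 0.05
DF = 3  # n - k - 1 for the cow data: 6 - 2 - 1


def t_cdf_df3(t):
    """CDF of Student's t with exactly 3 degrees of freedom (closed form)."""
    u = t / math.sqrt(3.0)
    return 0.5 + (math.atan(u) + u / (1.0 + u * u)) / math.pi


def two_sided_p_value(t):
    """P-value = 2 * P(T > |t|) for T ~ t(3)."""
    return 2.0 * (1.0 - t_cdf_df3(abs(t)))


def critical_value(alpha, lo=0.0, hi=100.0, tol=1e-10):
    """Solve t_cdf_df3(c) = 1 - alpha/2 by bisection; returns t_{1-alpha/2}(3)."""
    target = 1.0 - alpha / 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if t_cdf_df3(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


t_food = 0.20492 / 0.05882          # t Value for Food from the SAS output
p_food = two_sided_p_value(t_food)
t_crit = critical_value(ALPHA)
reject = p_food < ALPHA             # equivalently: abs(t_food) > t_crit

print(f"t = {t_food:.2f}, p = {p_food:.4f}, critical value = {t_crit:.3f}, reject H0: {reject}")
```

Both forms of the rule agree here: |t| of about 3.48 exceeds $t_{0.975}(3) \approx 3.182$, and the p-value (about 0.0399, as SAS reports) is below $\alpha = 0.05$.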
Individual Coefficients: SAS Output
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 0.06397 0.25986 0.25 0.8214
Food 1 0.20492 0.05882 3.48 0.0399
weight 1 0.28049 0.06860 4.09 0.0264
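(Annotations: the Parameter Estimate column gives $\hat{\beta}_0$, $\hat{\beta}_1$, $\hat{\beta}_2$; t Value $= \hat{\beta}_p / S_{\hat{\beta}_p}$; Pr > |t| is the P-value.)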
Testing Model Portions
1. Tests the Contribution of a Set of X Variables to the Relationship With Y
2. Null Hypothesis $H_0: \beta_{g+1} = \cdots = \beta_k = 0$: Variables in the Set Do Not Significantly Improve the Model When All Other Variables Are Included
3. Used in Selecting X Variables or Models
Testing Model Portions: Nested Models
$H_0$: Reduced model ($\beta_{g+1} = \cdots = \beta_k = 0$)
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_g X_{gi} + \varepsilon_i'$$
$H_a$: Full model
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$
F-Test for Nested Models
Numerator: reduction in SSE from the additional parameters; df = k - g = number of additional parameters
Denominator: SSE of the full model; df = n - (k + 1) = error df of the full model
$$F = \frac{(SSE_R - SSE_F)/(k-g)}{SSE_F/(n-k-1)} = \frac{\text{DROP in SSE}/(k-g)}{SSE_F/(n-k-1)}$$
Under $H_0$, F follows an $F(k-g,\, n-k-1)$ distribution; large values of F lead to rejecting $H_0$.
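The partial F statistic is a direct transcription of the formula above. In this Python sketch the SSE values, n, k and g are hypothetical, chosen only to show the arithmetic; `partial_f` is an illustrative helper name.

```python
def partial_f(sse_reduced, sse_full, n, k, g):
    """F statistic for H0: beta_{g+1} = ... = beta_k = 0 (nested models).

    sse_reduced: SSE of the reduced model (g X variables)
    sse_full:    SSE of the full model (k X variables)
    n:           number of observations
    """
    df_num = k - g            # number of additional parameters
    df_den = n - (k + 1)      # error df of the full model
    drop_in_sse = sse_reduced - sse_full
    f = (drop_in_sse / df_num) / (sse_full / df_den)
    return f, df_num, df_den


# Hypothetical numbers, for illustration only: n = 20 observations,
# full model with k = 4 X variables, reduced model keeping g = 2 of them.
f, df1, df2 = partial_f(sse_reduced=10.0, sse_full=4.0, n=20, k=4, g=2)
print(f"F = {f:.2f} on ({df1}, {df2}) df")
```

Here F = (6/2)/(4/15) = 11.25 on (2, 15) df; it would then be compared with $F_{1-\alpha}(k-g, n-k-1)$, or converted to a p-value with a routine such as `scipy.stats.f.sf`.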
Selecting Variables in Model Building
Model Building with Computer Searches
1. Rule: Use as Few X Variables as Possible
2. Stepwise Regression
- Computer Selects the X Variable Most Highly Correlated With Y
- Continues to Add or Remove Variables Depending on SSE
3. Best Subset Approach
- Computer Examines All Possible Sets
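The two computer searches can be sketched on a small made-up data set (DATA, NOISE and Y below are invented for illustration). The first step of stepwise regression enters the X most highly correlated with Y; best subset fits every non-empty set of X variables and reports its SSE. Least squares is solved with plain normal equations to avoid dependencies. Real stepwise procedures (for example SELECTION=STEPWISE on the MODEL statement of SAS PROC REG) use entry and stay significance levels at later steps, which this sketch does not reproduce.

```python
from itertools import combinations
import math

# Hypothetical toy data (not from these notes): Y is roughly 2 * X1.
DATA = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [2, 1, 4, 3, 6, 5, 8, 7],
    "x3": [5, 3, 6, 2, 7, 1, 8, 4],
}
NOISE = [0.1, -0.2, 0.0, 0.15, -0.1, 0.05, -0.05, 0.1]
Y = [2 * a + e for a, e in zip(DATA["x1"], NOISE)]


def corr(u, v):
    """Pearson correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def sse(names):
    """SSE of the least-squares fit of Y on an intercept plus the named X's."""
    rows = [[1.0] + [DATA[nm][i] for nm in names] for i in range(len(Y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yv for r, yv in zip(rows, Y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yv - sum(b * x for b, x in zip(beta, r))) ** 2 for r, yv in zip(rows, Y))


# Stepwise, first step: enter the X most highly correlated with Y.
first_in = max(DATA, key=lambda nm: abs(corr(DATA[nm], Y)))

# Best subset: examine every non-empty set of X variables.
results = {
    combo: sse(combo)
    for size in range(1, len(DATA) + 1)
    for combo in combinations(DATA, size)
}
for combo, s in sorted(results.items(), key=lambda kv: (len(kv[0]), kv[1])):
    print(f"{' + '.join(combo):12s} SSE = {s:.4f}")
print("stepwise enters first:", first_in)
```

Because the models are nested, the full three-variable model always has the smallest SSE; that is why the rule of using as few X variables as possible needs a criterion that penalizes extra variables, not raw SSE alone.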
Residual Analysis for Goodness of Fit
$$\text{residual} = e = y - \hat{y}$$
Residual (Estimated Errors) Analysis
1. Graphical Analysis of Residuals
- Plot Estimated Errors vs. $X_i$ Values (or Predicted Values)
- Plot Histogram or Stem-&-Leaf of Residuals
2. Purposes
- Examine Functional Form (Linear vs. Non-Linear Model)
- Evaluate Violations of Assumptions (to ensure the validity of the statistical tests on the $\beta$'s)
Recall the Linear Regression Assumptions
1. Mean of Distribution of Error Is 0
2. Distribution of Error Has Constant Variance
3. Distribution of Error Is Normal
4. Errors Are Independent
Residual Plot for Functional Form
[Figure: two plots of residuals e vs. X; left panel: Nonlinear Pattern; right panel: Correct Specification]
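To see what a nonlinear pattern looks like numerically, this Python sketch fits a straight line to hypothetical data generated from $Y = X^2$ and prints the residuals $e = y - \hat{y}$.

```python
# Hypothetical data with a curved (quadratic) relationship: Y = X^2.
x = [0, 1, 2, 3, 4, 5, 6]
y = [xi ** 2 for xi in x]

# Fit the (mis-specified) first-order model Y = b0 + b1*X by least squares.
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residuals e = y - y_hat
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(f"y_hat = {b0:.1f} + {b1:.1f} X")
print("residuals:", [round(e, 6) for e in residuals])
```

The fitted line is $\hat{y} = -5 + 6X$ and the residuals are 5, 0, -3, -4, -3, 0, 5: positive at both ends and negative in the middle, the U shape of the "Nonlinear Pattern" panel. Adding an $X^2$ term (a second-order model) would remove it.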
Residual Plot for Equal Variance
[Figure: two plots of standardized residuals (SR) vs. X; left panel: Unequal Variance (fan-shaped); right panel: Correct Specification]
Standardized residuals are typically used: each residual is divided by its estimated standard error, $s\sqrt{1-h_{ii}}$, where $s$ is the root MSE and $h_{ii}$ is the leverage (hat diagonal) of observation i.
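Standardized residuals can be computed from quantities SAS already prints. This Python sketch takes the Residual and Hat Diag H columns of the influence output for the cow model in these notes, plus the model's RMSE (0.2888); because those inputs are rounded to four digits, the results are approximate.

```python
import math

# Residuals and leverages (Hat Diag H) from the SAS influence output for the
# cow data, and the root MSE s = 0.2888 from the model fit.
RESIDUALS = [0.1701, 0.0527, 0.0408, -0.0520, -0.4155, 0.2039]
HAT = [0.5473, 0.4552, 0.5271, 0.5678, 0.2260, 0.6766]
RMSE = 0.2888


def standardized_residuals(resid, hat, s):
    """Internally studentized residuals: e_i / (s * sqrt(1 - h_ii))."""
    return [e / (s * math.sqrt(1.0 - h)) for e, h in zip(resid, hat)]


sr = standardized_residuals(RESIDUALS, HAT, RMSE)
for obs, (e, r) in enumerate(zip(RESIDUALS, sr), start=1):
    print(f"obs {obs}: residual = {e:7.4f}  standardized = {r:6.3f}")
```

Dividing by $\sqrt{1-h_{ii}}$ matters: observation 6 has a modest raw residual (0.2039) but high leverage, so its standardized value is noticeably larger than the raw residual divided by s.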
Residual Plot for Independence
[Figure: two plots of standardized residuals (SR) vs. X; left panel: Not Independent; right panel: Correct Specification]
Residual Diagnostics in SAS
symbol v=dot h=2 c=green;            /* plot symbol: green dots */
PROC REG data=Cow;
   model milk = food weight;
   plot residual.*predicted.         /* residuals vs. predicted values */
        / cHREF=red cframe=ligr;
run;
[Figure: residuals vs. predicted values for the fitted model Milk = 0.064 + 0.2049 Food + 0.2805 weight (N = 6, R-square = 0.9737, Adj R-square = 0.9561, RMSE = 0.2888); x-axis: Predicted Value]
Check for Outlying Observations and Influence Analysis
symbol v=dot h=2 c=green;
proc reg data=cow;
   model milk = food weight / influence;            /* print influence statistics */
   plot rstudent.*obs. / vref=-2 2 cvref=blue lvref=2   /* reference lines at RStudent = -2, 2 */
        HREF=0 to 7 by 1 cHREF=red cframe=ligr;
run;
[Figure: studentized residuals (RStudent) vs. Observation Number for Milk = 0.064 + 0.2049 Food + 0.2805 weight (N = 6, R-square = 0.9737, Adj R-square = 0.9561, RMSE = 0.2888)]
Influence Analysis of Each Observation
The REG Procedure, Model: MODEL1, Dependent Variable: Milk
Output Statistics

                          Hat Diag     Cov              ----------DFBETAS----------
Obs  Residual  RStudent       H      Ratio    DFFITS   Intercept     Food    weight
  1    0.1701    0.8283    0.5473   3.0770    0.9108     0.8436   -0.5503    0.0565
  2    0.0527    0.2040    0.4552   5.8235    0.1865    -0.0632   -0.0215    0.1145
  3    0.0408    0.1688    0.5271   6.8398    0.1782     0.1530    0.0335   -0.1211
  4   -0.0520   -0.2266    0.5678   7.2379   -0.2597    -0.0164    0.1767   -0.2170
  5   -0.4155   -4.0459    0.2260   0.0056   -2.1863    -0.9217   -1.0080    1.0753
  6    0.2039    1.4531    0.6766   1.2013    2.1019    -0.5540    1.7420   -0.9265
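The RStudent and DFFITS columns can be reproduced from Residual, Hat Diag H and the RMSE with the leave-one-out formulas sketched below in Python (n = 6 observations, p = 3 parameters). The |RStudent| > 2 screen mirrors the reference lines at -2 and 2 (`vref=-2 2`) in the SAS plot code; small mismatches with SAS come from the four-digit rounding of the inputs.

```python
import math

N, P = 6, 3          # observations; parameters (intercept, Food, weight)
RMSE = 0.2888        # root MSE of the full fit
RESIDUALS = [0.1701, 0.0527, 0.0408, -0.0520, -0.4155, 0.2039]
HAT = [0.5473, 0.4552, 0.5271, 0.5678, 0.2260, 0.6766]


def rstudent(e, h, s=RMSE, n=N, p=P):
    """Externally studentized residual: e / (s_(i) * sqrt(1 - h)),
    where s_(i) is the root MSE with observation i deleted."""
    sse = (n - p) * s ** 2
    s_del = math.sqrt((sse - e ** 2 / (1.0 - h)) / (n - p - 1))
    return e / (s_del * math.sqrt(1.0 - h))


def dffits(t, h):
    """DFFITS = RStudent * sqrt(h / (1 - h))."""
    return t * math.sqrt(h / (1.0 - h))


rs = [rstudent(e, h) for e, h in zip(RESIDUALS, HAT)]
df_fits = [dffits(t, h) for t, h in zip(rs, HAT)]
flagged = [i + 1 for i, t in enumerate(rs) if abs(t) > 2]   # |RStudent| > 2

for i, (t, d) in enumerate(zip(rs, df_fits), start=1):
    print(f"obs {i}: RStudent = {t:7.3f}  DFFITS = {d:7.3f}")
print("observations with |RStudent| > 2:", flagged)
```

Only observation 5 (RStudent about -4.05) falls outside the -2 to 2 band, consistent with the plot; observation 6 has the largest leverage (h = 0.6766) and a large DFFITS without being flagged as an outlier.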
Multicollinearity
1. High Correlation Between X Variables
2. Coefficients Measure Combined Effect
3. Leads to Unstable Coefficients Depending on X Variables in Model
4. Always Exists (to Some Degree)
5. Example: Using Both Age & Height of Children as Independent Variables in the Same Model
Detecting Multicollinearity
1. Examine Correlation Matrix
- Correlations Between Pairs of X Variables Are Larger than Their Correlations With the Y Variable
2. Examine Variance Inflation Factor (VIF), $\text{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing $X_j$ on the other X variables
- If $\text{VIF}_j > 5$ (or > 10, according to most references), Multicollinearity Exists
3. A Few Remedies
- Obtain New Sample Data
- Eliminate One Correlated X Variable
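With only two X variables, the $R_j^2$ from regressing one X on the other equals $r_{12}^2$, so both VIFs equal $1/(1 - r_{12}^2)$. The Python sketch below applies both detection rules to the cow-data correlations reported by PROC CORR in these notes; the reading of rule 1 used here (the X-X correlation exceeds at least one X-Y correlation) is one interpretation of the slide.

```python
# Pairwise correlations from the PROC CORR output for the cow data.
R_Y1 = 0.90932   # Milk vs. Food
R_Y2 = 0.93117   # Milk vs. weight
R_12 = 0.74118   # Food vs. weight


def vif_two_predictors(r12):
    """VIF_j = 1 / (1 - R_j^2); with only two X's, R_j^2 = r12^2 for both."""
    return 1.0 / (1.0 - r12 ** 2)


vif = vif_two_predictors(R_12)

# Rule 1 (one reading): the X-X correlation exceeds an X-Y correlation.
corr_flag = abs(R_12) > min(abs(R_Y1), abs(R_Y2))
# Rule 2: VIF above 5 (10 in many references).
vif_flag = vif > 5

print(f"VIF (Food) = VIF (weight) = {vif:.5f}")
print("multicollinearity flagged:", corr_flag or vif_flag)
```

The result, about 2.219, matches the Variance Inflation column (2.21898) of the SAS output, and neither rule flags multicollinearity for these data.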
SAS Code: Vet Example
PROC CORR data=vet;
   VAR milk food weight;
run;
Correlation Matrix: SAS Computer Output
Pearson Correlation Coefficients, N = 6
Prob > |r| under H0: Rho=0

              Milk       Food     weight
Milk       1.00000    0.90932    0.93117
                       0.0120     0.0069
Food       0.90932    1.00000    0.74118
            0.0120                0.0918
weight     0.93117    0.74118    1.00000
            0.0069     0.0918

(Annotations: $r_{Y1}$ = 0.90932 and $r_{Y2}$ = 0.93117 are the correlations of Milk with Food and weight; $r_{12}$ = 0.74118 is the correlation between the two X variables; the diagonal is all 1's.)
Variance Inflation Factors: SAS Code
/* VIF measures the inflation in the variances of the parameter estimates
   due to collinearity among the regressor (independent) variables */
PROC REG data=Cow;
   model milk = food weight / VIF;
run;
Variance Inflation Factors: Computer Output
Parameter Estimates
Parameter Standard Variance
Variable DF Estimate Error t Value Pr > |t| Inflation
Intercept 1 0.06397 0.25986 0.25 0.8214 0
Food 1 0.20492 0.05882 3.48 0.0399 2.21898
weight 1 0.28049 0.06860 4.09 0.0264 2.21898
(Annotation: $\text{VIF}_1$ = 2.21898 < 5)
Types of Regression Models Viewed from the Explanatory Variables Standpoint
[Figure: tree of regression model types by explanatory variable]
- 1 Quantitative Variable: 1st Order Model, 2nd Order Model, 3rd Order Model
- 2 or More Quantitative Variables: 1st Order Model, 2nd Order Model, Inter-Action Model
- 1 Qualitative Variable: Dummy Variable Model
Regression Models Based on a Single Quantitative Explanatory Variable
First-Order Model With 1 Independent Variable
1. Relationship Between 1 Dependent & 1 Independent Variable Is Linear
2. Used When Expected Rate of Change in Y Per Unit Change in X Is Stable
$$E(Y) = \beta_0 + \beta_1 X_{1i}$$
First-Order Model Relationships
[Figure: two plots of Y vs. X1, one for $\beta_1 < 0$ and one for $\beta_1 > 0$]
$$E(Y) = \beta_0 + \beta_1 X_{1i}$$
First-Order Model Worksheet
Run regression with Y, X1.
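The worksheet step amounts to a least-squares fit of Y on X1. This Python sketch uses the closed-form estimates $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1\bar{x}$; as toy input it borrows the four cases printed in the second- and third-order worksheets of these notes (the rest of that data set is elided), so it only illustrates the mechanics.

```python
# Toy input: the four cases listed in the model worksheets of these notes.
y = [1, 4, 1, 3]
x1 = [1, 8, 3, 5]

n = len(y)
x_bar = sum(x1) / n
y_bar = sum(y) / n
# Least-squares estimates for E(Y) = b0 + b1 * X1
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x1, y))
sxx = sum((a - x_bar) ** 2 for a in x1)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
print(f"Y_hat = {b0:.4f} + {b1:.4f} X1")
```

Here $b_1 = 12.75/26.75 \approx 0.4766$ is the estimated (constant) change in Y per unit change in X1, which is exactly the "stable rate of change" the first-order model assumes.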
Second-Order Model With 1 Independent Variable
1. Relationship Between 1 Dependent & 1 Independent Variable Is a Quadratic Function
2. Useful First Model If a Non-Linear Relationship Is Suspected
3. Model
$$E(Y) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2$$
where $\beta_1 X_{1i}$ is the linear effect and $\beta_2 X_{1i}^2$ is the curvilinear effect.
Second-Order Model Relationships
[Figure: plots of Y vs. X1 for the quadratic model; panels for $\beta_2 > 0$ (curve opens upward) and $\beta_2 < 0$ (curve opens downward)]
Second-Order Model Worksheet
Case, i   Yi   X1i   X1i²
1          1    1      1
2          4    8     64
3          1    3      9
4          3    5     25
:          :    :      :
Create X1² column.
Run regression with Y, X1, X1².
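A sketch of the second-order worksheet in Python: build the X1² column (reproducing 1, 64, 9, 25 from X1 = 1, 8, 3, 5 in the worksheet), then regress Y on X1 and X1². Because the full data set is not shown, the regression uses hypothetical data generated exactly from $E(Y) = 1 + 2X_1 - 0.5X_1^2$, so the fit should return those coefficients; `solve` and `fit` are our own helpers, not a library API.

```python
def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit(columns, y):
    """Least-squares coefficients (intercept first) via the normal equations X'X b = X'y."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yv for r, yv in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Worksheet step: create the X1^2 column from the worksheet's X1 values.
x1_sheet = [1, 8, 3, 5]
x1_sq_sheet = [v * v for v in x1_sheet]

# Hypothetical data that follow E(Y) = 1 + 2*X1 - 0.5*X1^2 exactly.
x1 = [1, 2, 3, 4, 5, 6]
y = [1 + 2 * v - 0.5 * v * v for v in x1]
x1_sq = [v * v for v in x1]
b0, b1, b2 = fit([x1, x1_sq], y)
print("X1^2 column:", x1_sq_sheet)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

With $b_2 < 0$ the fitted curve opens downward, peaking at $X_1 = -b_1/(2b_2) = 2$.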
Third-Order Model With 1 Independent Variable
1. Relationship Between 1 Dependent & 1 Independent Variable Has a 'Wave'
2. Used If There Is 1 Reversal in Curvature
3. Model
$$E(Y) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \beta_3 X_{1i}^3$$
where $\beta_1 X_{1i}$ is the linear effect and $\beta_2 X_{1i}^2$, $\beta_3 X_{1i}^3$ are the curvilinear effects.
Third-Order Model Relationships
[Figure: plots of Y vs. X1 for the cubic model; panels for $\beta_3 < 0$ and $\beta_3 > 0$]
$$E(Y) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \beta_3 X_{1i}^3$$
Third-Order Model Worksheet
Case, i   Yi   X1i   X1i²   X1i³
1          1    1      1      1
2          4    8     64    512
3          1    3      9     27
4          3    5     25    125
:          :    :      :      :
Multiply X1 by X1 to get X1².
Multiply X1 by X1 by X1 to get X1³.
Run regression with Y, X1, X1², X1³.
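The same approach extends to the third-order worksheet. This Python sketch builds the X1² and X1³ columns by multiplication, as the worksheet describes, then regresses Y on X1, X1² and X1³. The full data set is not shown, so the fit uses hypothetical data generated exactly from $E(Y) = 1 + 3X_1 - 1.2X_1^2 + 0.1X_1^3$; `solve` and `fit` are our own helpers.

```python
def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit(columns, y):
    """Least-squares coefficients (intercept first) via the normal equations X'X b = X'y."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yv for r, yv in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Worksheet steps: X1^2 = X1 * X1 and X1^3 = X1 * X1 * X1.
x1_sheet = [1, 8, 3, 5]
x1_sq_sheet = [v * v for v in x1_sheet]
x1_cu_sheet = [v * v * v for v in x1_sheet]

# Hypothetical data that follow E(Y) = 1 + 3*X1 - 1.2*X1^2 + 0.1*X1^3 exactly.
TRUE_COEFS = [1.0, 3.0, -1.2, 0.1]
x1 = [0, 1, 2, 3, 4, 5, 6]
y = [sum(c * v ** p for p, c in enumerate(TRUE_COEFS)) for v in x1]
coefs = fit([x1, [v * v for v in x1], [v * v * v for v in x1]], y)
print("X1^2 column:", x1_sq_sheet)
print("X1^3 column:", x1_cu_sheet)
print("b0..b3 =", [round(b, 4) for b in coefs])
```

For these made-up coefficients the curvature reverses once, at $X_1 = -b_2/(3b_3) = 4$, which is the single "wave" the third-order model is meant to capture.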