EPI809/Spring 2008

Testing Individual Coefficients


Test of Slope Coefficient $\beta_p$

1. Tests whether there is a linear relationship between one X variable ($X_p$) and Y, given the other X variables in the model
2. Involves a single population slope, $\beta_p$
3. Hypotheses: $H_0: \beta_p = 0$ vs. $H_a: \beta_p \neq 0$


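Test Statistic

$$t = \frac{\hat{\beta}_p - 0}{S_{\hat{\beta}_p}} = \frac{\hat{\beta}_p}{S_{\hat{\beta}_p}} \sim t(n-k-1) \text{ under } H_0$$

where $S_{\hat{\beta}_p}$ is the estimated standard error of $\hat{\beta}_p$, n is the sample size, and k is the number of X variables.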


Test of Slope Coefficient: Rejection Rule

Reject $H_0$ in favor of $H_a$ if t falls in either rejection region: $t < -t_{1-\alpha/2}(n-k-1)$ or $t > t_{1-\alpha/2}(n-k-1)$.

Equivalently, reject $H_0$ in favor of $H_a$ if P-value $= 2P(T > |t|) < \alpha$, where $T \sim t(n-k-1)$.

[Figure: density of $T \sim t(n-k-1)$ centered at 0, with "Reject $H_0$" regions of area $\alpha/2$ below $-t_{1-\alpha/2}(n-k-1)$ and above $t_{1-\alpha/2}(n-k-1)$.]


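Individual Coefficients: SAS Output

Parameter Estimates

                 Parameter   Standard
Variable    DF    Estimate      Error   t Value   Pr > |t|
Intercept    1     0.06397    0.25986      0.25     0.8214
Food         1     0.20492    0.05882      3.48     0.0399
weight       1     0.28049    0.06860      4.09     0.0264

Annotations: the Parameter Estimate column gives $\hat{\beta}_0$, $\hat{\beta}_1$, $\hat{\beta}_2$; the Standard Error column gives $S_{\hat{\beta}_p}$; t Value $= \hat{\beta}_p / S_{\hat{\beta}_p}$; Pr > |t| is the P-value.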
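The test statistic and rejection rule can be checked against this output with a short Python sketch. It recomputes t = estimate / standard error for each row and compares |t| with the two-sided critical value $t_{0.975}(3) \approx 3.182$ ($\alpha = 0.05$, df = n - k - 1 = 6 - 2 - 1 = 3). The critical value is a standard table value hard-coded here rather than computed, and the constant and function names are illustrative only.

```python
# Estimate and standard error for each row of the SAS "Parameter Estimates" table.
ESTIMATES = {
    "Intercept": (0.06397, 0.25986),
    "Food": (0.20492, 0.05882),
    "weight": (0.28049, 0.06860),
}

N_OBS, K = 6, 2            # N = 6 observations, k = 2 X variables (Food, weight)
DF = N_OBS - K - 1         # error degrees of freedom, n - k - 1 = 3
T_CRIT = 3.182446          # t_{0.975}(3): two-sided test at alpha = 0.05 (table value)


def slope_t(estimate: float, std_error: float) -> float:
    """t statistic for H0: beta_p = 0, i.e. (estimate - 0) / standard error."""
    return estimate / std_error


results = {}
for name, (b, se) in ESTIMATES.items():
    t = slope_t(b, se)
    reject = abs(t) > T_CRIT          # t falls in either rejection tail
    results[name] = (t, reject)
    print(f"{name:<9} t = {t:5.2f}  reject H0: {reject}")
```

The decisions agree with the P-values in the table: 0.0399 and 0.0264 are below 0.05, while 0.8214 is not.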

Testing Model Portions

1. Tests the contribution of a set of X variables to the relationship with Y
2. Null hypothesis $H_0: \beta_{g+1} = \cdots = \beta_k = 0$: the variables in the set do not significantly improve the model when all other variables are included
3. Used in selecting X variables or models


Testing Model Portions: Nested Models

$H_0$: Reduced model ($\beta_{g+1} = \cdots = \beta_k = 0$)

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_g X_{gi} + \varepsilon_i'$$

$H_a$: Full model

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$


F-Test for Nested Models

Numerator: reduction in SSE from the additional parameters; df = k - g = number of additional parameters.

Denominator: SSE of the full model; df = n - (k + 1) = error df of the full model.

$$F = \frac{(SSE_R - SSE_F)/(k-g)}{SSE_F/(n-k-1)} = \frac{\text{DROP in SSE}/(k-g)}{SSE_F/(n-k-1)}$$

where $SSE_R$ and $SSE_F$ are the error sums of squares of the reduced and full models. Under $H_0$, $F \sim F(k-g,\, n-k-1)$.
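A minimal Python sketch of the nested-model F statistic follows. The function name and the SSE values in the first call are hypothetical, chosen only to show the arithmetic. The full-model SSE for the cow example is recovered from the RMSE of 0.2888 reported with the cow-data plots in this section, via $SSE_F = RMSE^2 \cdot (n-k-1)$, since $RMSE^2$ is the full-model MSE. Turning F into a P-value would additionally require the $F(k-g, n-k-1)$ distribution (from tables or a statistics library), which is not shown.

```python
def nested_f(sse_reduced: float, sse_full: float, n: int, k: int, g: int) -> float:
    """Partial F statistic: reduced model with g X's vs. full model with k X's."""
    if not 0 <= g < k:
        raise ValueError("need 0 <= g < k")
    numerator = (sse_reduced - sse_full) / (k - g)     # drop in SSE per added parameter
    denominator = sse_full / (n - (k + 1))             # MSE of the full model
    return numerator / denominator


# Hypothetical SSE values, chosen only to illustrate the arithmetic.
f_stat = nested_f(sse_reduced=10.0, sse_full=4.0, n=20, k=3, g=1)
print(f_stat)  # 12.0

# Full-model SSE for the cow example, recovered from RMSE = 0.2888 with n = 6, k = 2.
sse_full_cow = 0.2888 ** 2 * (6 - 2 - 1)
print(round(sse_full_cow, 4))  # 0.2502
```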

Selecting Variables in Model Building

Model Building with Computer Searches

1. Rule: use as few X variables as possible
2. Stepwise regression: the computer first selects the X variable most highly correlated with Y, then continues to add or remove variables depending on the change in SSE
3. Best subset approach: the computer examines all possible sets of X variables
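To see why the best subset approach becomes expensive, a short Python sketch can list the candidate models: with k candidate X variables there are $2^k - 1$ non-empty subsets, each of which would have to be fitted and compared. Only the enumeration is shown; fitting and comparing the models (for example by SSE or adjusted $R^2$) is left out, and the helper name is hypothetical.

```python
from itertools import combinations


def all_subsets(variables):
    """All non-empty subsets of the candidate X variables, smallest subsets first."""
    subsets = []
    for size in range(1, len(variables) + 1):
        subsets.extend(combinations(variables, size))
    return subsets


cow_subsets = all_subsets(["food", "weight"])
print(cow_subsets)  # [('food',), ('weight',), ('food', 'weight')]

# The count grows as 2**k - 1: 15 candidate models for k = 4 variables.
print(len(all_subsets(["x1", "x2", "x3", "x4"])))  # 15
```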

Residual Analysis for Goodness of Fit

$$\text{residual} = e = y - \hat{y}$$

Residual (Estimated Errors) Analysis

1. Graphical analysis of residuals
   - Plot estimated errors vs. $X_i$ values (or predicted values)
   - Plot a histogram or stem-and-leaf display of the residuals
2. Purposes
   - Examine the functional form (linear vs. non-linear model)
   - Evaluate violations of assumptions (to ensure the validity of the statistical tests on the β's)

Recall the Linear Regression Assumptions

1. The mean of the distribution of the error is 0
2. The distribution of the error has constant variance
3. The distribution of the error is normal
4. The errors are independent

Residual Plot for Functional Form

[Figure: two plots of residuals (e) vs. X. Left: nonlinear pattern. Right: correct specification.]

Residual Plot for Equal Variance

[Figure: two plots of standardized residuals (SR) vs. X. Left: unequal variance (fan-shaped). Right: correct specification.]

Standardized residuals are typically used (the residual divided by its estimated standard error).
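The residual and its standardized versions can be written out in a few lines of Python. This sketch assumes the usual internally standardized residual $e_i / (s\sqrt{1 - h_{ii}})$, where s is the RMSE and $h_{ii}$ is the leverage (the Hat Diag H column of the SAS influence output in this section), and the standard identity for the deleted variance behind RStudent; the function names and the observed/predicted pairs in the first call are hypothetical. For observation 1 of the cow data (residual 0.1701, H = 0.5473, RMSE = 0.2888) the standardized residual is about 0.875, slightly larger than the RStudent value 0.8283, which re-estimates s with observation 1 deleted.

```python
import math


def residuals(observed, predicted):
    """Residuals e_i = y_i - yhat_i."""
    return [y - y_hat for y, y_hat in zip(observed, predicted)]


def standardized_residual(e, s, h):
    """Residual divided by its estimated standard error, s * sqrt(1 - h_ii)."""
    return e / (s * math.sqrt(1.0 - h))


def studentized_deleted_residual(e, s, h, n, p):
    """RStudent: as above, but s is re-estimated with observation i deleted."""
    s2_deleted = ((n - p) * s ** 2 - e ** 2 / (1.0 - h)) / (n - p - 1)
    return e / (math.sqrt(s2_deleted) * math.sqrt(1.0 - h))


# Hypothetical observed and predicted values, just to show the definition.
print(residuals([3.0, 2.5], [2.8, 2.7]))  # approximately [0.2, -0.2]

# Observation 1 of the cow data: residual 0.1701, leverage 0.5473, RMSE 0.2888,
# n = 6 observations, p = 3 parameters (intercept, food, weight).
sr1 = standardized_residual(0.1701, 0.2888, 0.5473)
rs1 = studentized_deleted_residual(0.1701, 0.2888, 0.5473, n=6, p=3)
print(round(sr1, 3), round(rs1, 3))  # about 0.875 and 0.828
```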

Residual Plot for Independence

[Figure: two plots of standardized residuals (SR) vs. X. Left: not independent. Right: correct specification.]

Residuals Diagnostics in SAS

symbol v=dot h=2 c=green;
PROC REG data=Cow;
   model milk = food weight;
   /* plot residuals against predicted values */
   plot residual.*predicted. / cHREF=red cframe=ligr;
run;

[Figure: residual vs. predicted value for the fit Milk = 0.064 + 0.2049 Food + 0.2805 weight (N = 6, R-square = 0.9737, Adj R-square = 0.9561, RMSE = 0.2888).]

Check for Outlying Observations and Influence Analysis

symbol v=dot h=2 c=green;
proc reg data=cow;
   /* INFLUENCE prints the observation-level influence statistics */
   model milk = food weight / influence;
   /* studentized residuals by observation number, with reference lines at -2 and 2 */
   plot rstudent.*obs. / vref=-2 2 cvref=blue lvref=2 HREF=0 to 7 by 1 cHREF=red cframe=ligr;
run;

[Figure: studentized residual (RStudent) vs. observation number (1 to 6) for the fit Milk = 0.064 + 0.2049 Food + 0.2805 weight (N = 6, R-square = 0.9737, Adj R-square = 0.9561, RMSE = 0.2888).]

Influence Analysis of Each Observation

The REG Procedure
Model: MODEL1
Dependent Variable: Milk

Output Statistics

                          Hat Diag      Cov              -----------DFBETAS-----------
Obs  Residual  RStudent        H     Ratio    DFFITS   Intercept      Food    weight
  1    0.1701    0.8283   0.5473    3.0770    0.9108      0.8436   -0.5503    0.0565
  2    0.0527    0.2040   0.4552    5.8235    0.1865     -0.0632   -0.0215    0.1145
  3    0.0408    0.1688   0.5271    6.8398    0.1782      0.1530    0.0335   -0.1211
  4   -0.0520   -0.2266   0.5678    7.2379   -0.2597     -0.0164    0.1767   -0.2170
  5   -0.4155   -4.0459   0.2260    0.0056   -2.1863     -0.9217   -1.0080    1.0753
  6    0.2039    1.4531   0.6766    1.2013    2.1019     -0.5540    1.7420   -0.9265
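The reference lines at -2 and 2 in the SAS plot suggest a simple screening rule, which a short Python sketch can apply to the RStudent column of this table. The DFFITS cutoff $2\sqrt{p/n}$ (p = k + 1 parameters) is an added assumption: it is a common rule of thumb from the regression-diagnostics literature, not something stated on these slides. With these values, observation 5 is flagged by both rules and observation 6 by the DFFITS rule.

```python
import math

# RStudent and DFFITS columns for observations 1-6, copied from the table above.
RSTUDENT = [0.8283, 0.2040, 0.1688, -0.2266, -4.0459, 1.4531]
DFFITS = [0.9108, 0.1865, 0.1782, -0.2597, -2.1863, 2.1019]

N_OBS, N_PARAMS = 6, 3                             # p = k + 1 = 3 (intercept, food, weight)
DFFITS_CUTOFF = 2 * math.sqrt(N_PARAMS / N_OBS)    # rule-of-thumb cutoff, about 1.414


def flag(values, cutoff):
    """1-based observation numbers whose absolute value exceeds the cutoff."""
    return [i + 1 for i, v in enumerate(values) if abs(v) > cutoff]


outlying = flag(RSTUDENT, 2.0)
influential = flag(DFFITS, DFFITS_CUTOFF)
print("|RStudent| > 2:", outlying)        # [5]
print("|DFFITS| > cutoff:", influential)  # [5, 6]
```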

Multicollinearity

1. High correlation between X variables
2. Coefficients measure a combined effect
3. Leads to unstable coefficients, depending on which X variables are in the model
4. Always exists to some degree
5. Example: using both age and height of children as independent variables in the same model

Detecting Multicollinearity

1. Examine the correlation matrix
   - Correlations between pairs of X variables are higher than their correlations with the Y variable
2. Examine the variance inflation factor (VIF)
   - If $VIF_j > 5$ (or > 10, according to most references), multicollinearity exists
3. Few remedies
   - Obtain new sample data
   - Eliminate one of the correlated X variables

SAS Code: Vet Example

PROC CORR data=vet;
   VAR milk food weight;
run;

Correlation Matrix: SAS Computer Output

Pearson Correlation Coefficients, N = 6
Prob > |r| under H0: Rho=0

              Milk        Food      weight
Milk       1.00000     0.90932     0.93117
                        0.0120      0.0069
Food       0.90932     1.00000     0.74118
            0.0120                  0.0918
weight     0.93117     0.74118     1.00000
            0.0069      0.0918

Annotations: $r_{Y1} = 0.90932$ (Milk, Food) and $r_{Y2} = 0.93117$ (Milk, weight); $r_{12} = 0.74118$ (Food, weight); the diagonal is all 1's.

Variance Inflation Factors: SAS Code

/* VIF measures the inflation in the variances of the parameter estimates due to collinearity among the regressor (independent) variables */
PROC REG data=Cow;
   model milk = food weight / VIF;
run;

Variance Inflation Factors: Computer Output

Parameter Estimates

                 Parameter   Standard                        Variance
Variable    DF    Estimate      Error   t Value   Pr > |t|   Inflation
Intercept    1     0.06397    0.25986      0.25     0.8214           0
Food         1     0.20492    0.05882      3.48     0.0399     2.21898
weight       1     0.28049    0.06860      4.09     0.0264     2.21898

Annotation: $VIF_1 < 5$
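With only two X variables, regressing Food on weight (or weight on Food) gives $R_j^2 = r_{12}^2$, so both VIFs equal $1/(1 - r_{12}^2)$, which is why Food and weight share the same value above. The Python sketch below, whose function name is hypothetical, reproduces the SAS number from the correlation $r_{12} = 0.74118$ in the PROC CORR output, up to rounding of $r_{12}$.

```python
def vif_from_r_squared(r_squared: float) -> float:
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing X_j on the other X's."""
    return 1.0 / (1.0 - r_squared)


R12 = 0.74118                      # correlation between Food and weight (PROC CORR)
vif_food = vif_from_r_squared(R12 ** 2)
print(round(vif_food, 3))          # 2.219, compared with 2.21898 from PROC REG
print(vif_food > 5)                # False: below the VIF > 5 threshold
```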

Types of Regression Models Viewed from the Explanatory-Variable Standpoint

Explanatory Variable
   1 Quantitative Variable
      - 1st Order Model
      - 2nd Order Model
      - 3rd Order Model
   2 or More Quantitative Variables
      - 1st Order Model
      - 2nd Order Model
      - Inter-Action Model
   1 Qualitative Variable
      - Dummy Variable Model

Regression Models Based on a Single Quantitative Explanatory Variable


First-Order Model With 1 Independent Variable


1. The relationship between 1 dependent and 1 independent variable is linear
2. Used when the expected rate of change in Y per unit change in X is stable

$$E(Y_i) = \beta_0 + \beta_1 X_{1i}$$

First-Order Model Relationships

[Figure: two plots of Y vs. X1 for $E(Y_i) = \beta_0 + \beta_1 X_{1i}$: a line sloping downward ($\beta_1 < 0$) and a line sloping upward ($\beta_1 > 0$).]

First-Order Model Worksheet

Run regression with Y, X1.
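Fitting the first-order model amounts to ordinary least squares with one X. A minimal Python sketch of the closed-form estimates $\hat{\beta}_1 = S_{xy}/S_{xx}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$ is shown below; the function name is hypothetical, and the data are made-up points lying exactly on Y = 1 + 2 X1, so the fit should recover those coefficients.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for E(Y) = beta0 + beta1 * X1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx                 # slope: S_xy / S_xx
    b0 = y_bar - b1 * x_bar        # intercept: line passes through (x_bar, y_bar)
    return b0, b1


# Hypothetical data lying exactly on Y = 1 + 2 * X1.
b0, b1 = simple_ols([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)  # 1.0 2.0
```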


Second-Order Model With 1 Independent Variable

1. The relationship between 1 dependent and 1 independent variable is a quadratic function
2. Useful 1st model if a non-linear relationship is suspected
3. Model

$$E(Y_i) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2$$

where $\beta_1 X_{1i}$ is the linear effect and $\beta_2 X_{1i}^2$ is the curvilinear effect.

Second-Order Model Relationships

[Figure: plots of Y vs. X1 showing parabolic curves; the curves open upward when $\beta_2 > 0$ and downward when $\beta_2 < 0$.]

Second-Order Model Worksheet

Case, i   Y_i   X_1i   X_1i^2
1         1     1      1
2         4     8      64
3         1     3      9
4         3     5      25
:         :     :      :

Create the X_1i^2 column.

Run regression with Y, X1, X1^2.


Third-Order Model With 1 Independent Variable

1. The relationship between 1 dependent and 1 independent variable has a 'wave'
2. Used if there is 1 reversal in curvature
3. Model

$$E(Y_i) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \beta_3 X_{1i}^3$$

where $\beta_1 X_{1i}$ is the linear effect and $\beta_2 X_{1i}^2$ and $\beta_3 X_{1i}^3$ are the curvilinear effects.

Third-Order Model Relationships

[Figure: plots of Y vs. X1 for $E(Y_i) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \beta_3 X_{1i}^3$, for $\beta_3 < 0$ and $\beta_3 > 0$.]

Third-Order Model Worksheet

Case, i   Y_i   X_1i   X_1i^2   X_1i^3
1         1     1      1        1
2         4     8      64       512
3         1     3      9        27
4         3     5      25       125
:         :     :      :        :

Multiply X1 by X1 to get X1^2.

Multiply X1 by X1 by X1 to get X1^3.

Run regression with Y, X1, X1^2, X1^3.
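Building the worksheet columns is mechanical, so a short Python sketch can generate $X_1, X_1^2, \dots, X_1^d$ for any degree d; the function name is hypothetical. Using the first four $X_{1i}$ values from the worksheets reproduces both the second-order ($X_{1i}^2$) and third-order ($X_{1i}^3$) columns. The resulting columns are then used as ordinary X variables in a multiple regression.

```python
def polynomial_columns(x, degree):
    """Columns X1, X1^2, ..., X1^degree for a polynomial regression worksheet."""
    return [[xi ** power for xi in x] for power in range(1, degree + 1)]


x1 = [1, 8, 3, 5]                      # first four X1 values in the worksheet
x_lin, x_sq, x_cube = polynomial_columns(x1, 3)
print(x_sq)    # [1, 64, 9, 25]
print(x_cube)  # [1, 512, 27, 125]
```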