Lecture 3: Inference in SLR - Purdue Universityghobbs/STAT_512/Lecture_Notes/... · Review of hypothesis testing Inference about C 1 Inference about C 0 Confidence Intervals Prediction

3-1

Lecture 3: Inference in SLR

STAT 512

Spring 2011

Background Reading

KNNL: 2.1 – 2.6

3-2

Topic Overview

This topic will cover:

Review of hypothesis testing

Inference about $\beta_1$

Inference about $\beta_0$

Confidence Intervals

Prediction Intervals

3-3

Review: Significance Tests

One Sample T-test

Take a sample of size n from some

(normal) population:

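$$H_0: \mu = \mu_0 \quad \text{vs.} \quad H_a: \mu \neq \mu_0, \qquad t = \frac{\bar{Y} - \mu_0}{s\{\bar{Y}\}}$$

Compare t to a critical value from the Student's t distribution (Table B.2) with (typically) $\alpha = 0.05$.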

3-4

Review: Significance Tests (2)

One Sample T-test: Can turn the test statistic into a confidence interval for $\mu$:

$$\bar{Y} \pm t_{1-\alpha/2,\, n-1} \, s\{\bar{Y}\}$$

Generally a confidence interval takes the

form

Point Est. ± Crit. Value * SE

Two Sample T-test: Compares the means of

two samples.
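The one-sample test statistic and its confidence interval can be sketched in a few lines. The sketch below uses Python rather than SAS so it can be run anywhere; the sample values and the hypothesized mean `mu0` are made up for illustration, and the critical value 2.262 is t(0.975; 9) read from a t table.

```python
import math
import statistics

# Hypothetical sample of n = 10 measurements (not from the lecture).
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 4.7, 5.5, 5.3, 5.0, 5.9]
mu0 = 5.0          # hypothesized mean under H0 (assumed for illustration)
t_crit = 2.262     # t(0.975; 9), taken from a t table

n = len(sample)
ybar = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(n)   # s{Ybar} = s / sqrt(n)

t_stat = (ybar - mu0) / se                      # test statistic
ci = (ybar - t_crit * se, ybar + t_crit * se)   # Point Est. +/- Crit. Value * SE
reject = abs(t_stat) >= t_crit

print(f"t = {t_stat:.3f}, reject H0: {reject}")
print(f"95% CI for mu: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The test and the interval agree: H0 is rejected at the 5% level exactly when the 95% interval excludes `mu0`.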

3-5

Significance Levels

The significance level $\alpha$ is the probability of making a Type I error: rejecting the null hypothesis when it is in fact true (a false positive).

The most common significance level that we will use is $\alpha = 0.05$.

The corresponding confidence level is $1 - \alpha$. So for $\alpha = 0.05$ our confidence level will be 95%.

3-6

P-Values

The p-value for a test is the probability (under the null hypothesis) of observing a test statistic that is at least as extreme as the one that is actually observed. We reject the null if P-value < $\alpha$.

Mathematically, the p-value is

$$\Pr_{H_0}\left(|T| \geq |t|\right), \quad \text{where } T \sim t_{n-1}$$

Graphically, the p-value is twice the area in the upper tail of the $t_{n-1}$ distribution (above the observed $|t|$).
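The "twice the upper-tail area" description can be checked numerically. The sketch below (Python, standard library only) integrates the Student's t density with Simpson's rule; the function names are illustrative, and a statistics package would normally do this for you.

```python
import math

def t_pdf(x, df):
    """Density of the Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_obs, df, steps=2000):
    """Pr(|T| >= |t_obs|), using Simpson's rule on [0, |t_obs|]."""
    b = abs(t_obs)
    if b == 0:
        return 1.0
    h = b / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3       # Pr(0 <= T <= |t_obs|)
    return 2 * (0.5 - central_area)    # twice the upper-tail area

# 2.262 is the tabled critical value t(0.975; 9), so the p-value should be near 0.05.
p = two_sided_p_value(2.262, df=9)
print(f"p-value = {p:.4f}")
```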

3-7

Conclusions

“Conclude Ha” means “there is sufficient

evidence in the data to conclude that H0 is

false, and hence we can assume Ha is true”.

“Fail to Reject H0” means “there is

insufficient evidence in the data to

conclude that either H0 or Ha is true or

false, so we default to assuming that H0 is

true”. Unless prepared to make further

justification (power) it is not appropriate to

“conclude H0”.

3-8

Power of a Test

The probability of a Type II error (failing to reject H0 when Ha is in fact true, a false negative) is often denoted $\beta$ (not to be confused with regression coefficients).

The power of a test is $1 - \beta$. This is the probability that H0 will be rejected given that Ha is true.

Power calculations involve the non-central t-distribution (generally use a computer).

3-9

$\beta_1$ Inference

Recall that

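$$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{SS_{XY}}{SS_X}$$

The X's are constant and the Y's are normally distributed. Using probability theory it can thus be shown (pages 42-43) that

$$b_1 \sim Normal\left(\beta_1, \sigma^2\{b_1\}\right) \quad \text{where} \quad \sigma^2\{b_1\} = \frac{\sigma^2}{SS_X}$$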

3-10

Test for $H_0: \beta_1 = 0$

As in the case of the one-sample t-test, we can develop the test statistic for testing H0: $\beta_1 = 0$ vs. Ha: $\beta_1 \neq 0$:

$$t = \frac{b_1 - 0}{s\{b_1\}} \quad \text{where} \quad s\{b_1\} = \sqrt{\frac{MSE}{SS_X}}$$

This statistic has a t-distribution with n – 2 degrees of freedom (not n – 1 because we are also estimating $\beta_0$).
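To make the pieces of the slope test concrete, here is a minimal Python sketch that computes $b_1$, MSE, $s\{b_1\}$ and the t statistic from raw data. The five data points are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
import math

# Hypothetical data (not from the lecture).
X = [1, 2, 3, 4, 5]
Y = [3, 5, 8, 9, 12]
n = len(X)

xbar = sum(X) / n
ybar = sum(Y) / n
ss_x = sum((x - xbar) ** 2 for x in X)
ss_xy = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))

b1 = ss_xy / ss_x               # slope estimate
b0 = ybar - b1 * xbar           # intercept estimate

residuals = [y - (b0 + b1 * x) for x, y in zip(X, Y)]
mse = sum(e ** 2 for e in residuals) / (n - 2)   # SSE / (n - 2)

s_b1 = math.sqrt(mse / ss_x)    # s{b1}
t_stat = (b1 - 0) / s_b1        # compare with t(1 - alpha/2; n - 2)

print(f"b1 = {b1:.3f}, s{{b1}} = {s_b1:.4f}, t = {t_stat:.2f} on {n - 2} df")
```

Note the divisor n - 2 in MSE: two parameters are estimated, which is where the degrees of freedom of the test come from.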

3-11

Test for $H_0: \beta_1 = 0$

Reject H0 if $|t| \geq t_{crit}$, where $t_{crit} = t(1 - \alpha/2;\, n - 2)$.

SAS will give us both the value of the t-statistic and the P-value. If the P-value is smaller than $\alpha$, reject in favor of $H_a: \beta_1 \neq 0$.

3-12

Confidence Interval for $\beta_1$

The $100(1 - \alpha)\%$ CI for $\beta_1$ is

$$b_1 \pm t_{crit} \, s\{b_1\}$$

where $t_{crit} = t(1 - \alpha/2;\, n - 2)$.

In terms of hypothesis testing, if the CI does not contain 0, then we reject $H_0: \beta_1 = 0$ and conclude that $H_a: \beta_1 \neq 0$ is true.

3-13

Power

In cases where we fail to reject, it is important to know the power of the test for $H_0: \beta_1 = 0$. There are two important questions we must answer before we can determine power:

1. What size difference is important?

2. Guess for the variance $\sigma^2$?

Note that power calculations should be done

prior to collection of data if possible.

3-14

Power (2)

The power to detect a difference of size d is calculated using the non-central t distribution. In addition to $\alpha$ and the degrees of freedom, we need the non-centrality parameter:

$$\delta = \frac{|\beta_1|}{\sigma\{b_1\}} \quad \text{where} \quad \sigma^2\{b_1\} = \frac{\sigma^2}{SS_X}$$

Power for some values of $\delta$ and $\alpha$ can be looked up in Table B.5. SAS also has a procedure for computing power (for any values).
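When neither the table nor a power procedure is at hand, power can be approximated by simulation instead of the non-central t distribution. The Python sketch below is that swapped-in approach, not the method in the text. The design points, the guessed $\sigma$, the difference `d` and the seed are all hypothetical, and 2.306 is the tabled value t(0.975; 8).

```python
import math
import random

def slope_t_stat(X, Y):
    """t statistic for H0: beta1 = 0 in simple linear regression."""
    n = len(X)
    xbar, ybar = sum(X) / n, sum(Y) / n
    ss_x = sum((x - xbar) ** 2 for x in X)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / ss_x
    b0 = ybar - b1 * xbar
    mse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(X, Y)) / (n - 2)
    return b1 / math.sqrt(mse / ss_x)

def simulated_power(beta1, sigma, X, t_crit, reps=2000, seed=512):
    """Fraction of simulated data sets in which H0: beta1 = 0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # beta0 is set to 0; the slope test does not depend on it.
        Y = [beta1 * x + rng.gauss(0, sigma) for x in X]
        if abs(slope_t_stat(X, Y)) >= t_crit:
            rejections += 1
    return rejections / reps

X = list(range(1, 11))                      # hypothetical design, n = 10
ss_x = sum((x - 5.5) ** 2 for x in X)       # SS_X for this design
sigma, d = 1.0, 0.5                         # guessed sigma, difference of interest
delta = abs(d) / (sigma / math.sqrt(ss_x))  # non-centrality parameter
t_crit = 2.306                              # t(0.975; 8) from a t table

print(f"delta = {delta:.2f}")
print(f"power at beta1 = {d}: {simulated_power(d, sigma, X, t_crit):.3f}")
print(f"rejection rate at beta1 = 0: {simulated_power(0.0, sigma, X, t_crit):.3f}")
```

The last line is a sanity check: when $\beta_1 = 0$ the rejection rate should be close to $\alpha = 0.05$.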

3-15

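$\beta_0$ Inference

Similar to inference for $\beta_1$:

$$b_0 \sim Normal\left(\beta_0, \sigma^2\{b_0\}\right) \quad \text{where} \quad \sigma^2\{b_0\} = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{X}^2}{SS_X} \right]$$

To test $\beta_0 = k$:

$$t = \frac{b_0 - k}{s\{b_0\}} \quad \text{where} \quad s\{b_0\} = \sqrt{MSE \left[ \frac{1}{n} + \frac{\bar{X}^2}{SS_X} \right]}$$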

3-16

Test for $H_0: \beta_0 = k$

The statistic has a t-distribution with n – 2 degrees of freedom; compare it with the appropriate t-critical value.

SAS gives both statistic and p-value for testing $\beta_0 = 0$; to test $\beta_0 = k$, obtain and use a confidence interval.

The $100(1 - \alpha)\%$ CI for $\beta_0$ is

$$b_0 \pm t_{crit} \, s\{b_0\}$$

Remember: If X = 0 is not within the scope of the model, inference may be meaningless!!
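Using a confidence interval to test $\beta_0 = k$ amounts to checking whether k falls inside the interval. A small Python sketch, using the 95% limits for the intercept reported in the SAS output for the diamonds regression in this lecture; the value k = -250 is hypothetical and the helper name is illustrative.

```python
# 95% confidence limits for the intercept, copied from the SAS output
# for the diamonds regression shown in this lecture.
ci_b0 = (-294.48696, -224.76486)

def reject_at_5pct(k, ci):
    """Two-sided test of H0: beta0 = k at alpha = 0.05 using the 95% CI."""
    lower, upper = ci
    return not (lower <= k <= upper)

for k in (0, -250):   # -250 is a hypothetical value chosen for illustration
    decision = "reject H0" if reject_at_5pct(k, ci_b0) else "fail to reject H0"
    print(f"H0: beta0 = {k}: {decision}")
```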

3-17

Robustness

In cases where the errors are not quite normal, the CIs and significance tests for $\beta_1$ and $\beta_0$ are still generally reasonable approximations.

We say that these tests are robust with

respect to minor violations of the normality

assumption.

3-18

SAS Coding

PROC REG data=diamonds;

model price=weight /clb;

RUN;

The 'clb' option in PROC REG requests the confidence limits for $\beta_1$ and $\beta_0$.

You can also specify alpha=0.xxx to change the significance level (default = 0.05).

3-19

SAS Output

                    Parameter   Standard
Variable    DF       Estimate      Error   t Value   Pr > |t|
Intercept    1       -259.625     17.318    -14.99     <.0001
weight       1       3721.024     81.785     45.50     <.0001

Variable    DF      95% Confidence Limits
Intercept    1     -294.48696    -224.76486
weight       1     3556.39841    3885.65129
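The t values and confidence limits in this output can be reproduced by hand from the estimates and standard errors. The Python sketch below does so. One assumption is labeled in the code: the error degrees of freedom are taken to be 46, because the reported limits imply a critical value of about 2.0129, which matches the tabled t(0.975; 46).

```python
# Estimates and standard errors copied from the SAS output above.
estimates = {
    "Intercept": (-259.625, 17.318),
    "weight": (3721.024, 81.785),
}
# Assumption: n - 2 = 46 error df, so t_crit = t(0.975; 46) = 2.0129 (t table).
t_crit = 2.0129

results = {}
for name, (b, se) in estimates.items():
    t_stat = (b - 0) / se                        # t = (estimate - 0) / s{estimate}
    lower, upper = b - t_crit * se, b + t_crit * se
    results[name] = (t_stat, lower, upper)
    print(f"{name:9s} t = {t_stat:7.2f}  95% CI = ({lower:.2f}, {upper:.2f})")
```

Small differences in the last digits relative to SAS come from rounding in the printed estimates and in the critical value.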

3-20

Summary of Inference

SLR Model

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

$\varepsilon_i \sim Normal(0, \sigma^2)$ are independent, random errors

$$Y_i \sim Normal(\beta_0 + \beta_1 X_i, \sigma^2)$$

3-21


Parameter Estimates

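For $\beta_1$:

$$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{SS_{XY}}{SS_X}$$

For $\beta_0$: $b_0 = \bar{Y} - b_1 \bar{X}$

For $\sigma^2$:

$$s^2 = MSE = \frac{SSE}{df_E} = \frac{\sum e_i^2}{n - 2}$$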

3-22


$100(1 - \alpha)\%$ Confidence Intervals

$$b_1 \pm t_{crit} \, s\{b_1\}$$

$$b_0 \pm t_{crit} \, s\{b_0\}$$

where $t_{crit} = t(1 - \alpha/2;\, n - 2)$.

3-23


Significance tests

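H0: $\beta_1 = 0$ vs. Ha: $\beta_1 \neq 0$:

$$t = \frac{b_1 - 0}{s\{b_1\}} \sim t(n - 2) \quad \text{under } H_0$$

H0: $\beta_0 = 0$ vs. Ha: $\beta_0 \neq 0$:

$$t = \frac{b_0 - 0}{s\{b_0\}} \sim t(n - 2) \quad \text{under } H_0$$

Reject H0 if the P-value is small (< $\alpha$)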

3-24

CI for the Mean Response

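The estimated mean response when $X = X_h$ is

$$\hat{Y}_h = b_0 + b_1 X_h$$

$\hat{Y}_h$ is a normal random variable (since the parameter estimates are linear combinations of the $Y_i$ and these are normal).

To develop a confidence interval we can obtain a formula for the standard error from $\sigma^2\{b_0\}$ and $\sigma^2\{b_1\}$, together with the covariance between $b_0$ and $b_1$.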

3-25

Standard Error

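The variance associated to $\hat{Y}_h$ is

$$Var(\hat{Y}_h) = Var(b_0) + X_h^2 \, Var(b_1) + 2 X_h \, Cov(b_0, b_1) = \sigma^2 \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{SS_X} \right]$$

Substitute MSE for $\sigma^2$ to get the estimated variance. Take the square root to get the standard error $s\{\hat{Y}_h\}$.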
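One way to see where the bracketed term comes from is to rewrite the fitted value around $\bar{X}$. The steps below assume the standard results that $\bar{Y}$ and $b_1$ are uncorrelated, that $Var(\bar{Y}) = \sigma^2/n$, and that $Var(b_1) = \sigma^2/SS_X$.

```latex
\begin{aligned}
\hat{Y}_h &= b_0 + b_1 X_h = \bar{Y} + b_1 (X_h - \bar{X})
  && \text{since } b_0 = \bar{Y} - b_1 \bar{X} \\
Var(\hat{Y}_h) &= Var(\bar{Y}) + (X_h - \bar{X})^2 \, Var(b_1)
  && \text{since } Cov(\bar{Y}, b_1) = 0 \\
&= \frac{\sigma^2}{n} + (X_h - \bar{X})^2 \, \frac{\sigma^2}{SS_X}
 = \sigma^2 \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{SS_X} \right]
\end{aligned}
```

The variance is smallest at $X_h = \bar{X}$ and grows as $X_h$ moves away from the center of the data.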

3-26

Confidence Interval for $E\{Y_h\}$

Recall: Point Est. ± Crit. Value * SE

Confidence Limits are

$$\hat{Y}_h \pm t_{crit} \, s\{\hat{Y}_h\}$$
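As a worked instance of these limits, the Python sketch below recomputes the 95% interval for the mean price at weight 0.12, using values printed in the SAS output for the diamonds regression in this lecture. The critical value t(0.975; 46) = 2.0129 assumes 46 error degrees of freedom.

```python
# Values copied from the SAS output for the diamonds regression in this lecture.
b0, b1 = -259.625, 3721.024   # parameter estimates
x_h = 0.12                    # weight of observation 1
s_yhat = 8.2768               # "Std Error Mean Predict" for observation 1
# Assumption: n - 2 = 46 error df, so t_crit = t(0.975; 46) = 2.0129 (t table).
t_crit = 2.0129

y_hat = b0 + b1 * x_h                 # point estimate of E{Y_h}
lower = y_hat - t_crit * s_yhat       # Point Est. - Crit. Value * SE
upper = y_hat + t_crit * s_yhat       # Point Est. + Crit. Value * SE

print(f"Y_hat = {y_hat:.3f}, 95% CI for E{{Y_h}}: ({lower:.3f}, {upper:.3f})")
```

The result should agree with the "95% CL Mean" columns for observation 1 up to rounding of the printed inputs.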


3-27

Prediction Intervals

Predicting a new observation for $X = X_h$ is different from estimating the mean response in that there is additional variation associated to the normal curve that is centered at $E\{Y_h\}$.

Hence two components to $s\{\hat{Y}_{h,new}\}$:

Variance associated to the estimated mean response.

Variance associated to the new obs.

3-28

Prediction Intervals (2)

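The variance associated to $\hat{Y}_{h,new}$ is

$$Var(\hat{Y}_{h,new}) = \sigma^2 + Var(\hat{Y}_h) = \sigma^2 \left[ 1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{SS_X} \right]$$

As before, substitute MSE for $\sigma^2$ and take the square root to get $s\{\hat{Y}_{h,new}\}$, or equivalently, $s\{pred\}$.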

3-29

Prediction Intervals (3)

The $100(1 - \alpha)\%$ prediction interval for a new observation at $X = X_h$ is given by

$$\hat{Y}_h \pm t_{crit} \, s\{pred\}$$


3-30

CI’s and PI’s in SAS

PROC REG data=diamonds;

model price=weight /clm cli;

RUN;

'clm' produces CIs for the mean response

'cli' produces prediction intervals

Intervals are produced for each data point, including those with a missing response value

3-31

SAS Output

                      Predicted      Std Error
Obs    Wt     Price       Value   Mean Predict       95% CL Mean
  1  0.12    223.00     186.897         8.2768    170.237   203.558
  2  0.15    323.00     298.528         6.3833    285.679   311.377
 49  0.43         .        1340         19.033       1302      1379

Obs    Wt      95% CL Predict       Residual
  1  0.12    120.6754   253.1187     36.1029
  2  0.15    233.1609   363.8947     23.4722
 49  0.43        1266       1415           .
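The two variance components of a prediction can be checked against this output. Since $s^2\{pred\} = MSE + s^2\{\hat{Y}_h\}$, each row's prediction limits imply a value of MSE, and every row should give the same one. The Python sketch below backs it out from observations 1 and 2, assuming 46 error degrees of freedom and therefore t(0.975; 46) = 2.0129.

```python
# Rows copied from the SAS output above:
# (std error of mean prediction, lower 95% CL predict, upper 95% CL predict)
rows = {
    1: (8.2768, 120.6754, 253.1187),
    2: (6.3833, 233.1609, 363.8947),
}
# Assumption: 46 error df, so t_crit = t(0.975; 46) = 2.0129 (t table).
t_crit = 2.0129

implied_mse = {}
for obs, (s_yhat, lo, hi) in rows.items():
    s_pred = (hi - lo) / 2 / t_crit                # half-width = t_crit * s{pred}
    implied_mse[obs] = s_pred ** 2 - s_yhat ** 2   # s^2{pred} = MSE + s^2{Yhat_h}
    print(f"obs {obs}: s{{pred}} = {s_pred:.3f}, implied MSE = {implied_mse[obs]:.1f}")
```

The prediction intervals are much wider than the mean-response intervals because the MSE term dominates $s^2\{\hat{Y}_h\}$ here.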

3-32

Comparing Standard Errors

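$$s\{b_1\} = \sqrt{\frac{MSE}{SS_X}}$$

$$s\{b_0\} = \sqrt{MSE \left[ \frac{1}{n} + \frac{\bar{X}^2}{SS_X} \right]}$$

$$s\{\hat{Y}_h\} = \sqrt{MSE \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{SS_X} \right]}$$

$$s\{pred\} = \sqrt{MSE \left[ 1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{SS_X} \right]}$$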
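The four formulas share the same ingredients, so they fit in one small function. The Python sketch below uses hypothetical summary statistics; the function name and the numbers are for illustration only.

```python
import math

def standard_errors(mse, n, xbar, ss_x, x_h):
    """Return s{b1}, s{b0}, s{Yhat_h}, s{pred} for simple linear regression."""
    s_b1 = math.sqrt(mse / ss_x)
    s_b0 = math.sqrt(mse * (1 / n + xbar ** 2 / ss_x))
    s_yhat = math.sqrt(mse * (1 / n + (x_h - xbar) ** 2 / ss_x))
    s_pred = math.sqrt(mse * (1 + 1 / n + (x_h - xbar) ** 2 / ss_x))
    return s_b1, s_b0, s_yhat, s_pred

# Hypothetical summary statistics, for illustration only.
mse, n, xbar, ss_x = 4.0, 25, 10.0, 200.0

for x_h in (0.0, 10.0, 15.0):
    s_b1, s_b0, s_yhat, s_pred = standard_errors(mse, n, xbar, ss_x, x_h)
    print(f"X_h = {x_h:5.1f}: s{{Yhat_h}} = {s_yhat:.3f}, s{{pred}} = {s_pred:.3f}")
```

Two relationships are visible in the formulas: at $X_h = 0$ the mean-response standard error equals $s\{b_0\}$, and $s^2\{pred\}$ always exceeds $s^2\{\hat{Y}_h\}$ by exactly MSE.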

3-33

Minimizing Standard Errors

Can sometimes design experiments to minimize standard errors

Increase sample size

Increase $SS_X$ by spreading out the values of the predictor variable

Arrange for the predictor of interest to be $X_h = \bar{X}$
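The effect of spreading out the predictor values can be seen by comparing $s\{b_1\}$ across designs with the same sample size. In the Python sketch below, the three designs and the fixed MSE are hypothetical.

```python
import math

def s_b1_for_design(X, mse):
    """s{b1} = sqrt(MSE / SS_X) for a given set of predictor values."""
    xbar = sum(X) / len(X)
    ss_x = sum((x - xbar) ** 2 for x in X)
    return math.sqrt(mse / ss_x)

mse = 1.0   # hypothetical error variance estimate, held fixed across designs
designs = {
    "clustered": [4, 5, 5, 5, 5, 6],
    "evenly spaced": [0, 2, 4, 6, 8, 10],
    "at the extremes": [0, 0, 0, 10, 10, 10],
}
for name, X in designs.items():
    print(f"{name:16s} s{{b1}} = {s_b1_for_design(X, mse):.3f}")
```

Placing every point at the two extremes gives the smallest $s\{b_1\}$, but it leaves no way to check whether the relationship is actually linear, so in practice some intermediate values are usually kept.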

3-34

Upcoming in Lecture 4...

We will look at one more example

illustrating the use of SAS.

We'll discuss the Working-Hotelling Confidence Band (2.6), details of the ANOVA table (2.7 – 2.9) and clean up a few details in 2.10.
