Topic 6: Estimation and Prediction of Yh


Outline

• Estimation and inference of E(Yh)

• Prediction of a new observation

• Construction of a confidence band for the entire regression line

Estimation of E(Yh)

• E(Yh) = μh = β0 + β1Xh, the mean value of Y for the subpopulation with X=Xh

• We will estimate E(Yh) by $\hat{\mu}_h = b_0 + b_1 X_h$

• KNNL use $\hat{Y}_h$ for this estimate; see equation (2.28) on p. 52

Theory for Estimation of E(Yh)

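• $\hat{\mu}_h$ is Normal with mean μh and variance

$\sigma^2(\hat{\mu}_h) = \sigma^2 \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_i (X_i - \bar{X})^2} \right]$

• The Normality is a consequence of the fact that b0 + b1Xh is a linear combination of the Yi's

• See KNNL pp. 52-54 for details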

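Application of the Theory

• We estimate $\sigma^2(\hat{\mu}_h)$ by

$s^2(\hat{\mu}_h) = s^2 \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_i (X_i - \bar{X})^2} \right]$

• It then follows that

$\frac{\hat{\mu}_h - E(Y_h)}{s(\hat{\mu}_h)} \sim t(n-2)$

• Details for confidence intervals and significance tests are consequences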

95% Confidence Interval for E(Yh)

• $\hat{\mu}_h \pm t_c \, s(\hat{\mu}_h)$, where tc = t(.975, n-2)

• NOTE: significance tests can be constructed but they are rarely used in practice
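The estimate, its standard error, and the confidence interval can be sketched in a few lines of Python. This is only an illustration of the formulas on these slides, not the course's SAS code. The data set and the function names `fit_line`, `mean_response`, and `conf_interval` are hypothetical, and the critical value is passed in from a t table rather than computed.

```python
import math

# Hypothetical data (not from the slides), used only to exercise the formulas.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]

def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x; returns the quantities used below."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)  # s^2, the estimate of sigma^2
    return {"n": n, "xbar": xbar, "sxx": sxx, "b0": b0, "b1": b1, "mse": mse}

def mean_response(fit, xh):
    """Return (mu_hat_h, s(mu_hat_h)) at X = xh."""
    mu_hat = fit["b0"] + fit["b1"] * xh
    var = fit["mse"] * (1.0 / fit["n"] + (xh - fit["xbar"]) ** 2 / fit["sxx"])
    return mu_hat, math.sqrt(var)

def conf_interval(fit, xh, t_crit):
    """Interval mu_hat_h +/- t_crit * s(mu_hat_h) for E(Yh)."""
    mu_hat, se = mean_response(fit, xh)
    return mu_hat - t_crit * se, mu_hat + t_crit * se

fit = fit_line(X, Y)
T_CRIT = 3.1824  # assumed t(.975, 3) from a t table, since n - 2 = 3 here
for xh in (3.0, 5.0):
    lo, hi = conf_interval(fit, xh, T_CRIT)
    print(f"Xh={xh}: 95% CI for E(Yh) = ({lo:.3f}, {hi:.3f})")
```

The interval is narrowest at Xh equal to the sample mean of X, where the second term of the variance vanishes, and widens as Xh moves away from it.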

Toluca Company Example (pg 19)

• Manufactures refrigeration equipment

• One replacement part manufactured in lots of varying sizes

• Company wants to determine the optimum lot size

• To do this, company needs to first describe the relationship between work hours and lot size

[Figure: scatterplot of hours vs. lotsize with regression line]

***Generating the data set***;
data toluca;
  infile '../data/CH01TA01.txt';
  input lotsize hours;
data other;
  lotsize=65; output;
  lotsize=100; output;
data toluca1;
  set toluca other;
proc print data=toluca1;
run;

SAS CODE

***Generating the confidence intervals for all values of X in the data set***;
proc reg data=toluca1;
  model hours=lotsize/clm;
  id lotsize;
run;

SAS CODE

The clm option generates confidence intervals for the mean, E(Yh)

Parameter Estimates

Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept   1  62.36586            26.17743         2.38    0.0259
lotsize     1   3.57020             0.34697        10.29    <.0001

Output Statistics

Obs  lotsize  Dependent Variable  Predicted Value  Std Error Mean Predict  95% CL Mean
  1       80            399.0000         347.9820                 10.3628  326.5449  369.4191
...
 25       70            323.0000         312.2800                  9.7647  292.0803  332.4797
 26       65                   .         294.4290                  9.9176  273.9129  314.9451
 27      100                   .         419.3861                 14.2723  389.8615  448.9106

• Standard error is affected by how far Xh is from $\bar{X}$ (see Figure 2.6)

• Recall the teeter-totter idea: a change in the slope has a bigger impact on Y as you move away from $\bar{X}$
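As a quick check, the 95% CL Mean limits for the two added lot sizes can be reproduced by hand from the printed estimates and standard errors. The Python sketch below copies b0, b1, and Std Error Mean Predict from the SAS output above. The critical value 2.0687 is an assumed t-table value for t(.975, 23), taking n = 25 from the 25 observed lots. The helper name `ci_mean` is hypothetical.

```python
# Values copied from the SAS output above.
b0, b1 = 62.36586, 3.57020
# lotsize -> (Std Error Mean Predict, SAS lower limit, SAS upper limit)
rows = {
    65: (9.9176, 273.9129, 314.9451),
    100: (14.2723, 389.8615, 448.9106),
}
T_CRIT = 2.0687  # assumed t(.975, 23) from a t table; not printed in the output

def ci_mean(xh, se):
    """mu_hat_h +/- t_c * s(mu_hat_h) using the printed estimates."""
    mu_hat = b0 + b1 * xh
    return mu_hat - T_CRIT * se, mu_hat + T_CRIT * se

results = {}
for xh, (se, sas_lo, sas_hi) in rows.items():
    lo, hi = ci_mean(xh, se)
    results[xh] = (lo, hi)
    print(f"lotsize={xh}: hand ({lo:.4f}, {hi:.4f})  SAS ({sas_lo}, {sas_hi})")
```

The hand-computed limits should agree with the SAS limits up to rounding in the printed inputs.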

Prediction of Yh(new)

• Want to predict the value of a new observation at X=Xh

• Model: Yh(new) = β0 + β1Xh + ε

• Since E(ε)=0, the predicted value is the same as the estimate of E(Yh)

• Note: the prediction interval, however, relies heavily on the assumption that the errors ε are Normally distributed

Prediction of Yh(new)

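• $\mathrm{Var}(Y_{h(new)} - \hat{\mu}_h) = \mathrm{Var}(\hat{\mu}_h) + \mathrm{Var}(\varepsilon)$, since the new observation is independent of the data used to fit the line

• We estimate this variance by

$s^2(\mathrm{pred}) = s^2 \left[ 1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_i (X_i - \bar{X})^2} \right]$

• It then follows that

$\frac{Y_{h(new)} - \hat{\mu}_h}{s(\mathrm{pred})} \sim t(n-2)$

• Procedure can be modified for the mean of m observations at X=Xh (see 2.39a and 2.39b on page 60)

• Standard error is affected by how far Xh is from $\bar{X}$ (see Figure 2.6)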
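The prediction standard error, including the variant for the mean of m new observations, can be sketched in Python as follows. The data are hypothetical and the function names are illustrative. The m-observation form replaces the leading 1 in the bracket with 1/m, which is how KNNL (2.39a) modifies the formula.

```python
import math

# Hypothetical data (not from the slides).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(X)
xbar = sum(X) / n
ybar = sum(Y) / n
sxx = sum((x - xbar) ** 2 for x in X)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / sxx
b0 = ybar - b1 * xbar
mse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y)) / (n - 2)

def se_mean(xh):
    """s(mu_hat_h): standard error of the estimated mean at xh."""
    return math.sqrt(mse * (1.0 / n + (xh - xbar) ** 2 / sxx))

def se_pred(xh, m=1):
    """s(pred) for the mean of m new observations at xh (m=1: one new observation)."""
    return math.sqrt(mse / m + se_mean(xh) ** 2)

def pred_interval(xh, t_crit, m=1):
    """Prediction interval mu_hat_h +/- t_crit * s(pred)."""
    center = b0 + b1 * xh
    half = t_crit * se_pred(xh, m)
    return center - half, center + half

T_CRIT = 3.1824  # assumed t(.975, 3) from a t table
print(pred_interval(4.0, T_CRIT))        # one new observation
print(pred_interval(4.0, T_CRIT, m=4))   # mean of 4 new observations
```

Averaging over more new observations shrinks the MSE/m term, so the interval approaches the confidence interval for E(Yh) as m grows.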

***Generating the prediction intervals for all values of X in data set***;
proc reg data=toluca1;
  model hours=lotsize/cli;
  id lotsize;
run;

SAS CODE

The cli option generates a prediction interval for a new observation

Output Statistics

Obs  lotsize  Dependent Variable  Predicted Value  Std Error Mean Predict  95% CL Predict
  1       80            399.0000         347.9820                 10.3628  244.7333  451.2307
...
 25       70            323.0000         312.2800                  9.7647  209.2811  415.2789
 26       65                   .         294.4290                  9.9176  191.3676  397.4904
 27      100                   .         419.3861                 14.2723  314.1604  524.6117

Note: the Std Error Mean Predict values are the same as in the clm output. They are not the standard errors for prediction, because they do not include the variability about the regression line.

• The standard error (Std Error Mean Predict) given in this output is the standard error of $\hat{\mu}_h$, not s(pred)

• The prediction interval is correct and wider than the previous confidence interval

• To get the correct standard error, add the variance about the regression line (MSE):

$s(\mathrm{pred}) = \sqrt{s^2(\hat{\mu}_h) + MSE}$
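One way to see this in the printed numbers is to back out s(pred) from the interval widths. The Python sketch below uses only values copied from the two SAS outputs (Std Error Mean Predict, 95% CL Mean, 95% CL Predict) for lot sizes 65 and 100. The function name `implied_quantities` is hypothetical. If s²(pred) = s²(μ̂h) + MSE, both rows should imply the same MSE up to rounding.

```python
import math

# lotsize -> (Std Error Mean Predict, CL Mean limits, CL Predict limits), from the SAS output
rows = {
    65: (9.9176, (273.9129, 314.9451), (191.3676, 397.4904)),
    100: (14.2723, (389.8615, 448.9106), (314.1604, 524.6117)),
}

def implied_quantities(se_mean, cl_mean, cl_pred):
    """Recover t_c, s(pred), and the implied MSE from printed interval widths."""
    t_c = (cl_mean[1] - cl_mean[0]) / 2.0 / se_mean   # CI half-width = t_c * s(mu_hat_h)
    s_pred = (cl_pred[1] - cl_pred[0]) / 2.0 / t_c    # PI half-width = t_c * s(pred)
    mse = s_pred ** 2 - se_mean ** 2                  # s^2(pred) = s^2(mu_hat_h) + MSE
    return t_c, s_pred, mse

implied = {xh: implied_quantities(*vals) for xh, vals in rows.items()}
for xh, (t_c, s_pred, mse) in implied.items():
    print(f"lotsize={xh}: t_c={t_c:.4f}  s(pred)={s_pred:.3f}  implied MSE={mse:.1f}")
```

The recovered s(pred) values are several times larger than the printed Std Error Mean Predict values, which is why the prediction intervals are so much wider than the confidence intervals.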

Confidence band for regression line

• $\hat{\mu}_h \pm W \, s(\hat{\mu}_h)$, where $W^2 = 2F(1-\alpha;\, 2,\, n-2)$

• This gives combined "confidence intervals" for all Xh

• Boundary values of confidence bands define a hyperbola

• At each Xh the band will be wider than the single CI

Confidence band for regression line

• Theory comes from the joint confidence region for (β0, β1 ) which is an ellipse (Stat 524)

• We can find an alpha for tc that gives the same results

• We find W² and then find the alpha for tc that will give W = tc

data a1;
  n=25; alpha=.10;
  dfn=2; dfd=n-2;
  tsingle=tinv(1-alpha/2,dfd);
  w2=2*finv(1-alpha,dfn,dfd);
  w=sqrt(w2);
  alphat=2*(1-probt(w,dfd));
  t_c=tinv(1-alphat/2,dfd);
  output;
proc print data=a1;
run;

SAS CODE

n   alpha  dfn  dfd  tsingle  w2       w        alphat    t_c
25  0.1    2    23   1.71387  5.09858  2.25800  0.033740  2.25800

SAS OUTPUT

• tsingle: used for a single 90% CI

• w: used for the 90% confidence band
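The band multiplier can also be checked without an F table. When the numerator degrees of freedom equal 2, the F quantile has a closed form, F(1-α; 2, ν) = (ν/2)(α^(-2/ν) - 1). This is a standard special case, not something stated on the slides. The Python sketch below uses it to reproduce w2 and w from the SAS output. The function name `band_multiplier` is hypothetical, and tsingle is copied from the output rather than computed.

```python
import math

def band_multiplier(alpha, n):
    """Return (W^2, W) with W^2 = 2 * F(1 - alpha; 2, n - 2)."""
    dfd = n - 2
    # Closed-form F quantile, valid only for numerator df = 2.
    f_quantile = (dfd / 2.0) * (alpha ** (-2.0 / dfd) - 1.0)
    w2 = 2.0 * f_quantile
    return w2, math.sqrt(w2)

w2, w = band_multiplier(alpha=0.10, n=25)
T_SINGLE = 1.71387  # t(.95, 23), copied from the SAS output above

print(f"w2={w2:.5f}  w={w:.5f}  ratio W/t={w / T_SINGLE:.3f}")
```

Because W exceeds the single-interval t value, the 90% band is wider at every Xh than the pointwise 90% confidence interval.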

symbol1 v=circle i=rlclm97;

proc gplot data=toluca; plot hours*lotsize; run;

SAS CODE

[Figure: hours vs. lotsize with regression line and confidence band]

Estimation of E(Yh) and Prediction of Yh

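• $\hat{\mu}_h = b_0 + b_1 X_h$

• $s^2(\hat{\mu}_h) = s^2 \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_i (X_i - \bar{X})^2} \right]$

• $s^2(\mathrm{pred}) = s^2 \left[ 1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_i (X_i - \bar{X})^2} \right]$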

* Confidence intervals;
symbol1 v=circle i=rlclm95;
proc gplot data=toluca;
  plot hours*lotsize;
run;

* Prediction intervals;
symbol1 v=circle i=rlcli95;
proc gplot data=toluca;
  plot hours*lotsize;
run;

SAS CODE

[Figure: hours vs. lotsize, confidence band]

[Figure: hours vs. lotsize, confidence intervals]

[Figure: hours vs. lotsize, prediction intervals]

Background Reading

• Program topic6.sas has the code for the various plots and calculations

• Sections 2.7, 2.8, and 2.9
