Additional Topics in Prediction Methodology


Introduction

• The predictive distribution of the random variable $Y_0$ is meant to capture all the information about $Y_0$ that is contained in $Y^n$.

• It does not completely specify $Y_0$, but it does provide a probability distribution that indicates more likely and less likely values of $Y_0$.

• $E\{Y_0 \mid Y^n\}$ is the best MSPE predictor of $Y_0$.
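
The optimality of the conditional mean can be seen from a standard decomposition of the mean squared prediction error. The sketch below assumes $g(Y^n)$ is any competing predictor with finite second moment; the symbol $g$ is introduced here only for illustration.

```latex
\begin{align*}
E\{(Y_0 - g(Y^n))^2\}
  &= E\{(Y_0 - E\{Y_0 \mid Y^n\})^2\}
   + E\{(E\{Y_0 \mid Y^n\} - g(Y^n))^2\} \\
  &\ge E\{(Y_0 - E\{Y_0 \mid Y^n\})^2\}.
\end{align*}
% The cross term vanishes after conditioning on Y^n, because
% E\{Y_0 - E\{Y_0 \mid Y^n\} \mid Y^n\} = 0.
```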


Hierarchical models have two stages

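• $X \subset \mathbb{R}^d$

• $f_0 = f(x_0)$: known $p \times 1$ vector

• $F = (f_j(x_i))$: known $n \times p$ matrix

• $\beta$: unknown $p \times 1$ vector of regression coefficients

• $R = (R(x_i - x_j))$: known $n \times n$ matrix of correlations among the training data $Y^n$

• $r_0 = (R(x_i - x_0))$: known $n \times 1$ vector of correlations of $Y_0$ with $Y^n$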
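
As a rough illustration of these quantities, the sketch below builds $f_0$, $F$, $R$, and $r_0$ with NumPy. The Gaussian correlation function, the value of `theta`, and the linear regressors $f(x) = (1, x_1, \ldots, x_d)$ are assumptions made for the example; the slides do not specify them.

```python
import numpy as np

def gauss_corr(h, theta=10.0):
    """Assumed correlation R(h) = exp(-theta * ||h||^2); h has shape (..., d)."""
    h = np.asarray(h, dtype=float)
    return np.exp(-theta * np.sum(h ** 2, axis=-1))

def regressors(x):
    """Assumed regression functions f(x) = (1, x_1, ..., x_d), so p = d + 1."""
    return np.concatenate(([1.0], np.atleast_1d(x)))

def build_model_matrices(X, x0, theta=10.0):
    """Return f0 (p,), F (n, p), R (n, n), r0 (n,) for training inputs X (n, d)."""
    X = np.asarray(X, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    F = np.array([regressors(x) for x in X])   # row i holds f(x_i)
    f0 = regressors(x0)
    diffs = X[:, None, :] - X[None, :, :]      # (n, n, d) array of x_i - x_j
    R = gauss_corr(diffs, theta)               # correlations among Y^n
    r0 = gauss_corr(X - x0, theta)             # correlations of Y_0 with Y^n
    return f0, F, R, r0
```

For example, `build_model_matrices(np.array([[0.0], [0.5], [1.0]]), [0.25])` returns a $3 \times 2$ matrix `F` and a $3 \times 3$ matrix `R` with ones on the diagonal.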


Predictive Distributions when $\sigma_Z^2$, $R$, and $r_0$ are known


Interesting features of (a) and (b)

• The non-informative prior is the limit of the normal prior as $\tau^2 \to \infty$.

• While the prior is non-informative, it is not a proper distribution. The corresponding predictive distribution is proper.

• The same conditioning argument can be applied to derive the posterior mean for both the non-informative prior and the normal prior.


The mean and variance of the predictive distribution (mean)

• $\mu_{0|n}(x_0)$ and $\sigma^2_{0|n}(x_0)$ depend on $x_0$ only through the regression functions $f_0$ and the correlation vector $r_0$.

• $\mu_{0|n}(x_0)$ is a linear unbiased predictor of $Y(x_0)$.

• The continuity and other smoothness properties of $\mu_{0|n}(x_0)$ are inherited from the correlation function $R(\cdot)$ and the regressors $\{f_j(\cdot)\}_{j=1}^{p}$.


• $\mu_{0|n}(x_0)$ depends on the parameters $\sigma_Z^2$ and $\tau^2$ only through their ratio.

• $\mu_{0|n}(x_0)$ interpolates the training data: when $x_0 = x_i$, $f_0 = f(x_i)$ and $r_0^T R^{-1} = e_i^T$, the $i$th unit vector.
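
The interpolation property can be checked in two lines. The sketch assumes the predictive mean has the usual form $\mu_{0|n}(x_0) = f_0^T \mu_{\beta|n} + r_0^T R^{-1}(Y^n - F\mu_{\beta|n})$, where $\mu_{\beta|n}$ denotes the posterior mean of $\beta$ (a symbol introduced here for illustration).

```latex
% When x_0 = x_i, r_0 is the i-th column of R, so R^{-1} r_0 = e_i.
\begin{align*}
\mu_{0|n}(x_i)
  &= f(x_i)^T \mu_{\beta|n} + e_i^T \left( Y^n - F \mu_{\beta|n} \right) \\
  &= f(x_i)^T \mu_{\beta|n} + Y_i - f(x_i)^T \mu_{\beta|n}
   = Y_i .
\end{align*}
```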


Example: $y(x) = e^{-1.4x} \cos(7\pi x/2)$

[Equation: $\mu_{0|n}(x_0)$ expressed in terms of the prior mean $b_0$]


The mean and variance of the predictive distribution (Variance)

• $\mathrm{MSPE}(\mu_{0|n}(x_0)) = \sigma^2_{0|n}(x_0)$

• The variance of the posterior of $Y(x_0)$ given $Y^n$ should be 0 whenever $x_0 = x_i$: $\sigma^2_{0|n}(x_i) = 0$.
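
The two properties above (interpolation of the mean and zero variance at the training inputs) can be checked numerically. The following is a minimal sketch for the non-informative prior on $\beta$, assuming a constant mean ($p = 1$), one-dimensional inputs, and a Gaussian correlation function with a hypothetical parameter `theta`; none of these choices come from the slides.

```python
import numpy as np

def corr_matrix(a, b, theta=50.0):
    """Assumed Gaussian correlation exp(-theta * (a_i - b_j)^2) for 1-d inputs."""
    return np.exp(-theta * (a[:, None] - b[None, :]) ** 2)

def blup(x_train, y_train, x_new, sigma2_z=1.0, theta=50.0):
    """Predictive mean and variance at x_new under a flat prior on beta,
    with a constant regression function f(x) = 1."""
    n = len(x_train)
    F = np.ones((n, 1))
    R = corr_matrix(x_train, x_train, theta)
    r0 = corr_matrix(x_train, x_new, theta)       # (n, m): one column per x0
    f0 = np.ones((1, len(x_new)))
    Ri_F = np.linalg.solve(R, F)
    Ri_y = np.linalg.solve(R, y_train)
    Ri_r0 = np.linalg.solve(R, r0)
    A = F.T @ Ri_F                                # F^T R^{-1} F
    beta_hat = np.linalg.solve(A, F.T @ Ri_y)     # generalized least squares
    mean = f0.T @ beta_hat + r0.T @ (Ri_y - Ri_F @ beta_hat)
    h = f0 - F.T @ Ri_r0                          # (p, m)
    var = sigma2_z * (1.0 - np.sum(r0 * Ri_r0, axis=0)
                      + np.sum(h * np.linalg.solve(A, h), axis=0))
    return mean, np.maximum(var, 0.0)             # clip tiny negative round-off

# Illustrative training data from an assumed test function
x_train = np.linspace(0.0, 1.0, 7)
y_train = np.exp(-1.4 * x_train) * np.cos(3.5 * np.pi * x_train)
mean_at_train, var_at_train = blup(x_train, y_train, x_train)
```

Here `mean_at_train` reproduces `y_train` and `var_at_train` is zero up to round-off, while the variance at a point between training inputs is strictly positive.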


Most important use of Theorem 4.1.1


Predictive Distributions when R and r0 are known

The posterior is a location-shifted and scaled univariate t distribution whose degrees of freedom are increased when there is informative prior information for either $\beta$ or $\sigma_Z^2$.


Degrees of freedom

• Base value for the degrees of freedom: $\nu_i = n - p$

• $p$ additional degrees of freedom when the prior for $\beta$ is informative

• $\nu_0$ additional degrees of freedom when the prior for $\sigma_Z^2$ is informative


Location shift

• The same centering value as in Theorem 4.1.1 (known $\sigma_Z^2$)

• The non-informative prior gives the BLUP


Scale factor $\sigma_i^2(x_0)$

(compare 4.1.15 with 4.1.6)

• $\sigma_i^2(x_0)$ is an estimate of the scale factor $\sigma^2_{0|n}(x_0)$.

• $Q_i^2/\nu_i$: estimate of $\sigma_Z^2$

• $Q_i^2$: combines information about $\sigma_Z^2$ from the conditional distribution of $Y^n$ given $\sigma_Z^2$ with information from the prior of $\sigma_Z^2$

• $\sigma_i^2(x_i) = 0$ when $x_i$ is any of the training data points.
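
To make the pieces of the t predictive distribution concrete, here is a minimal sketch for the case where both priors are non-informative, so that $\nu = n - p$, the center is the BLUP, and $Q^2$ is the generalized least squares residual quadratic form. The function name `t_predictive` and the small data set are hypothetical.

```python
import numpy as np

def t_predictive(f0, F, R, r0, y):
    """Degrees of freedom, center, and squared scale of the predictive t
    distribution, assuming flat priors on beta and on log(sigma_Z^2)."""
    n, p = F.shape
    Ri_F = np.linalg.solve(R, F)
    Ri_r0 = np.linalg.solve(R, r0)
    A = F.T @ Ri_F                                  # F^T R^{-1} F
    beta_hat = np.linalg.solve(A, Ri_F.T @ y)       # GLS estimate of beta
    resid = y - F @ beta_hat
    nu = n - p                                      # base degrees of freedom
    Q2 = resid @ np.linalg.solve(R, resid)          # residual quadratic form
    center = f0 @ beta_hat + Ri_r0 @ resid          # BLUP
    h = f0 - F.T @ Ri_r0
    shape_term = 1.0 - r0 @ Ri_r0 + h @ np.linalg.solve(A, h)
    scale2 = (Q2 / nu) * max(shape_term, 0.0)       # Q2/nu estimates sigma_Z^2
    return nu, center, scale2

# Hypothetical data: constant mean, Gaussian correlation with theta = 20
x = np.array([0.0, 0.3, 0.6, 1.0])
y = np.array([1.0, 0.2, -0.5, 0.1])
R = np.exp(-20.0 * (x[:, None] - x[None, :]) ** 2)
F = np.ones((4, 1))
nu, center, scale2 = t_predictive(np.ones(1), F, R, R[:, 1], y)  # x0 = x[1]
```

At the training input `x[1]` the center equals `y[1]` and the squared scale is zero up to round-off, in line with the last bullet above.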


Prediction Distributions when Correlation parameters are unknown

• What if the correlations among the observations are unknown ($R$ and $r_0$ are unknown)?
– Assume $y(\cdot)$ has a Gaussian prior with correlation function $R(\cdot \mid \psi)$, where $\psi$ is an unknown vector of parameters.

• Two issues:
– The standard error of the plug-in predictor $\mu_{0|n}(x_0 \mid \hat{\psi})$, obtained by substituting an estimate $\hat{\psi}$ that comes from MLE or REML
– The Bayesian approach to uncertainty in $\psi$, which is to model it by a prior distribution
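
A plug-in estimate of the correlation parameter can be obtained by maximizing the likelihood with $\beta$ and $\sigma_Z^2$ profiled out. The sketch below does this by grid search for a single Gaussian correlation parameter `theta` and a constant mean; the grid, the small nugget added for numerical stability, and the test function are all assumptions for illustration, not part of the slides.

```python
import numpy as np

def neg2_profile_loglik(theta, x, y, nugget=1e-8):
    """-2 * log-likelihood, up to an additive constant, after replacing beta
    and sigma_Z^2 by their maximum likelihood estimates for this theta."""
    n = len(x)
    R = np.exp(-theta * (x[:, None] - x[None, :]) ** 2) + nugget * np.eye(n)
    F = np.ones((n, 1))                              # constant mean (assumed)
    Ri_F = np.linalg.solve(R, F)
    beta_hat = np.linalg.solve(F.T @ Ri_F, Ri_F.T @ y)
    resid = y - F @ beta_hat
    sigma2_hat = resid @ np.linalg.solve(R, resid) / n
    _, logdet = np.linalg.slogdet(R)
    return n * np.log(sigma2_hat) + logdet

def mle_theta(x, y, grid):
    """Grid value of theta that minimizes -2 * profile log-likelihood."""
    values = [neg2_profile_loglik(t, x, y) for t in grid]
    return grid[int(np.argmin(values))]

# Hypothetical training data and search grid
x = np.linspace(0.0, 1.0, 10)
y = np.exp(-1.4 * x) * np.cos(3.5 * np.pi * x)
grid = np.logspace(1, 3, 30)
theta_hat = mle_theta(x, y, grid)
```

The plug-in predictor then uses `theta_hat` as if it were known, which is why its nominal standard error ignores the uncertainty in the estimated parameter.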


Prediction of Multiple Response Models

• Several outputs are available from a computer experiment

• Several codes are available for computing the same response (fast and slow code)

• Competing responses

• Several stochastic models for the joint response

• These models are used to describe the optimal predictor for one of the several computed responses.


Modeling Multiple Outputs

• $Z_i(\cdot)$: marginally mean-zero stationary Gaussian stochastic processes with unknown variance $\sigma_i^2$ and correlation function $R_i(\cdot)$

• Stationarity of $Z_i(\cdot)$ implies that the correlation between $Z_i(x_1)$ and $Z_i(x_2)$ depends only on $x_1 - x_2$

• Assume $\mathrm{Cov}(Z_i(x_1), Z_j(x_2)) = \sigma_i \sigma_j R_{ij}(x_1 - x_2)$

• $R_{ij}(\cdot)$: cross-correlation function of $Z_i(\cdot)$ and $Z_j(\cdot)$

• Linear model: the global mean of the $Y_i$ process; $f_i(\cdot)$: known regression functions; $\beta_i$: unknown regression parameters


Selecting correlation and cross-correlation functions is complicated

• Reason: for any input sites $x_l^i$, the multivariate normal random vector $(Z_1(x_1^1), \ldots)^T$ must have a nonnegative definite covariance matrix

• Solution: construct the $Z_i(\cdot)$ from a set of elementary processes (usually these processes are mutually independent)


Example by Kennedy and O’Hagan

• $Y_i(x)$: prior for the $i$th code level ($i = m$ is the top-level code). The autoregressive model:
– $Y_i(x) = \rho_{i-1} Y_{i-1}(x) + \delta_i(x)$, $i = 2, \ldots, m$

• The output of each successively higher-level code $i$ at $x$ is related to the output of the less precise code $i-1$ at $x$ plus the refinement $\delta_i(x)$
– $\mathrm{Cov}(Y_i(x), Y_{i-1}(w) \mid Y_{i-1}(x)) = 0$ for all $w \ne x$

• No additional second-order knowledge of code $i$ at $x$ can be obtained from the lower-level code $i-1$ if the value of code $i-1$ at $x$ is known (Markov property on the hierarchy of codes)

• Since there is no natural hierarchy of computer codes in such applications, we need to find something better.
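
The autoregressive construction can be illustrated for two code levels. The sketch below assumes $Y_2(x) = \rho\, Y_1(x) + \delta(x)$ with $\delta$ independent of $Y_1$, Gaussian correlation functions, and hypothetical parameter values; it builds the joint covariance matrix, which is nonnegative definite by construction, and evaluates the conditional covariance that appears in the Markov property.

```python
import numpy as np

def corr(a, b, theta):
    """Assumed Gaussian correlation exp(-theta * (a_i - b_j)^2) for 1-d inputs."""
    return np.exp(-theta * (a[:, None] - b[None, :]) ** 2)

def two_level_cov(x, rho=0.8, s1=1.0, sd=0.3, theta1=5.0, thetad=20.0):
    """Joint covariance of (Y1(x_1..n), Y2(x_1..n)) when Y2 = rho*Y1 + delta
    and delta is a Gaussian process independent of Y1."""
    C11 = s1 ** 2 * corr(x, x, theta1)                  # Cov(Y1, Y1)
    C21 = rho * C11                                     # Cov(Y2, Y1)
    C22 = rho ** 2 * C11 + sd ** 2 * corr(x, x, thetad) # Cov(Y2, Y2)
    return np.block([[C11, C21.T], [C21, C22]])

def markov_residual(C, n, i, j):
    """Cov(Y2(x_i), Y1(x_j) | Y1(x_i)) from the joint covariance C."""
    C11, C21 = C[:n, :n], C[n:, :n]
    return C21[i, j] - C21[i, i] * C11[i, j] / C11[i, i]

x = np.linspace(0.0, 1.0, 6)
C = two_level_cov(x)
min_eig = np.linalg.eigvalsh(C).min()      # nonnegative up to round-off
resid = markov_residual(C, len(x), 1, 4)   # zero up to round-off
```

Because the joint process is built from independent pieces, no separate check of nonnegative definiteness is needed when choosing the correlation functions.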


A More Reasonable Model

• Each constraint function is associated with the objective function plus a refinement
– $Y_i(x) = \rho_i Y_1(x) + \delta_i(x)$, $i = 2, \ldots, m+1$

• Ver Hoef and Barry
– Form models in the environmental sciences
– Include an unknown smooth surface plus a random measurement error
– Moving averages over white noise processes


Morris and Mitchell model

• Prior information about $y(x)$ is specified by a Gaussian process $Y(\cdot)$

• Prior information about the partial derivatives $y^{(j)}(x)$ is obtained by considering the "derivative" processes of $Y(\cdot)$
– $Y_1(\cdot) = Y(\cdot)$, $Y_2(\cdot) = Y^{(1)}(\cdot)$, $\ldots$, $Y_{1+m}(\cdot) = Y^{(m)}(\cdot)$

• Natural prior for $y^{(j)}(x)$:

• The covariances between $Y(x_1)$, $Y^{(j)}(x_2)$ and $Y^{(i)}(x_1)$, $Y^{(j)}(x_2)$ are:
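
As a sketch of the standard results, assume $Y(\cdot)$ is stationary with variance $\sigma_Z^2$ and a twice-differentiable correlation function $R$, and take $Y^{(j)}$ to be the mean-square partial derivative of $Y$ with respect to the $j$th input. Writing $h = x_1 - x_2$ (a symbol introduced here for illustration), the prior and covariances take the form:

```latex
\begin{align*}
Y^{(j)}(x) &= \frac{\partial Y(x)}{\partial x_j}, \\
\mathrm{Cov}\{Y(x_1),\, Y^{(j)}(x_2)\}
  &= -\,\sigma_Z^2 \, \frac{\partial R(h)}{\partial h_j}, \\
\mathrm{Cov}\{Y^{(i)}(x_1),\, Y^{(j)}(x_2)\}
  &= -\,\sigma_Z^2 \, \frac{\partial^2 R(h)}{\partial h_i \, \partial h_j}.
\end{align*}
% Both follow from differentiating Cov{Y(x_1), Y(x_2)} = sigma_Z^2 R(x_1 - x_2)
% with respect to the components of x_1 and x_2.
```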


Optimal Predictors for Multiple Outputs

• The best MSPE predictor based on the training data is:

• where $Y_0 = Y_1(x_0)$, $Y_i^{n_i} = (Y_i(x_1^i), \ldots)$, and $y_i^{n_i}$ is its observed value, for $i = 1, \ldots, m$


The joint distribution is the multivariate normal distribution


Conditional expectation

• In practice, this is not directly usable: it requires knowledge of the marginal correlation functions, the joint correlation functions, and the ratios of all the process variances.

• Empirical versions are of practical use:
– Assume that each of the correlation matrices $R_i$ and cross-correlation matrices $R_{ij}$ is known up to a vector of parameters.
– Estimate the parameters using MLE or REML.
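
Because the joint distribution is multivariate normal, the conditional expectation has the familiar partitioned-normal form. In the sketch below, $Y$ stacks all training vectors $Y_1^{n_1}, \ldots, Y_m^{n_m}$, and $\mu_0$, $\mu_Y$, $\Sigma_{0Y}$, $\Sigma_{YY}$ are generic names, introduced here for illustration, for the means and covariance blocks implied by the model.

```latex
\begin{align*}
\begin{pmatrix} Y_0 \\ Y \end{pmatrix}
  &\sim N\!\left(
      \begin{pmatrix} \mu_0 \\ \mu_Y \end{pmatrix},
      \begin{pmatrix} \Sigma_{00} & \Sigma_{0Y} \\
                      \Sigma_{0Y}^T & \Sigma_{YY} \end{pmatrix}
    \right), \\
E\{Y_0 \mid Y = y\}
  &= \mu_0 + \Sigma_{0Y} \, \Sigma_{YY}^{-1} \, (y - \mu_Y).
\end{align*}
% The empirical version replaces the unknown parameters in these blocks
% with their MLE or REML estimates.
```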


Example 1

• The 14-point training data set is space-filling, which allows us to learn about the response over the entire input space

• Compare two models:
– Using the predictor of $y(\cdot)$ based on $y(\cdot)$ alone
– Using the predictor of $y(\cdot)$ based on $(y(\cdot), y^{(1)}(\cdot), y^{(2)}(\cdot))$

• The second predictor both fits better visually and has a 24% smaller ERMSPE
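
For reference, the ERMSPE comparison can be computed as follows. This is a minimal sketch assuming ERMSPE is the root of the average squared prediction error over a set of test inputs; the function names are hypothetical and no data from the example are reproduced.

```python
import numpy as np

def ermspe(y_true, y_pred):
    """Empirical root mean squared prediction error over a set of test points."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def relative_reduction(e_base, e_new):
    """Fractional reduction in ERMSPE of a new predictor relative to a baseline."""
    return (e_base - e_new) / e_base
```

A value of `relative_reduction(e_base, e_new)` near 0.24 corresponds to the 24% improvement quoted above.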


Thank you!