Sampling plans for linear regression

Slide 2: Sampling plans for linear regression
• Given a domain, we can reduce the prediction error by a good choice of the sampling points.
• The choice of sampling locations is called "design of experiments" or DOE.
• In this lecture we consider DOEs for linear regression with linear and quadratic polynomials, where the errors are due to noise in the data.
• For a given number of points, the best DOE is the one that minimizes the prediction variance (reviewed in the next few slides).
• The simplest DOE is the full factorial design, where we sample each variable (factor) at a fixed number of values (levels).
• Example: with four factors and three levels each we sample 3⁴ = 81 points (see the sketch below).
• Full factorial design is not practical except in low dimensions.
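As an illustration of how quickly full factorial sampling grows, here is a minimal MATLAB sketch (it assumes the Statistics and Machine Learning Toolbox for fullfact; the variable names are illustrative) that generates the three-level, four-factor design mentioned above and codes it to the [-1, 1] box.

% Three-level full factorial design in four factors: 3^4 = 81 points.
% fullfact returns level indices 1..3; shift them to coded values -1, 0, +1.
levels = fullfact([3 3 3 3]);   % 81-by-4 matrix of level indices
design = levels - 2;            % coded design on the [-1,1] box
disp(size(design))              % 81 points for only 4 factors
% The point count grows as (levels)^(factors), e.g. 3^8 = 6561 points for
% 8 factors, which is why full factorial designs are impractical in
% higher dimensions.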

Slide 3: Linear regression

Slide 4: Model-based error for linear regression

Slide 5: Prediction variance
• Linear regression model: ŷ(x) = Σᵢ bᵢ ξᵢ(x), where the ξᵢ are the basis functions (e.g., monomials) and the bᵢ are the fitted coefficients.
• Define the vector xₘ by (xₘ)ᵢ = ξᵢ(x); then ŷ(x) = xₘᵀb.
• With some algebra, the prediction variance is Var[ŷ(x)] = σ² xₘᵀ(XᵀX)⁻¹xₘ, where σ² is the noise variance and X is the matrix of basis functions evaluated at the data points.
• Standard error: s_ŷ = σ √(xₘᵀ(XᵀX)⁻¹xₘ).

Slide 6: Prediction variance for full factorial design
• Recall that the standard error (square root of the prediction variance) is σ √(xₘᵀ(XᵀX)⁻¹xₘ).
• For full factorial design the domain is normally a box, with the variables coded to [-1, 1].
• Cheapest full factorial design: two levels (not good for quadratic polynomials).
• For a linear polynomial the standard error is then σ √((1 + Σᵢ xᵢ²)/n_y).
• Maximum error is at the vertices, where it equals σ √((1 + n)/n_y) = σ √(n_β/n_y).
• What does the ratio in the square root represent?

Slide 7: Designs for linear polynomials
• Traditionally use only two levels.
• A design is orthogonal when XᵀX is diagonal.
• Full factorial design is orthogonal; it is not so easy to produce other orthogonal designs with fewer points.
• It is beneficial to place the points at the edges of the design domain.
• Stability: small variation of the prediction variance over the domain is also a desirable property.

Slide 8: Example

Slide 9: Comparison

Slide 10: Quadratic polynomial
• A quadratic polynomial in n variables has (n+1)(n+2)/2 coefficients, so we need at least that many points.
• We also need at least three different values of each variable, so the simplest DOE is the three-level full factorial design.
• It is impractical for n > 5, and the ratio between the number of points and the number of coefficients is also unreasonable: for n = 8 we get 3⁸ = 6561 samples for only 45 coefficients.
• My rule of thumb is that you want twice as many points as coefficients.

Slide 11: Central composite design

Slide 12: Repeated observations at origin
• Unlike linear designs, the prediction variance is high at the origin.
• Repetition at the origin decreases the variance there and improves stability.
• What other rationale is there for choosing the origin for repetition?
• Repetition also gives an independent measure of the magnitude of the noise, which can also be used for lack-of-fit tests.

Slide 13: Without repetition (9 points)
• Contours of prediction variance for the spherical CCD design.
• How come it is rotatable?

Slide 14: Center repeated 5 times (13 points)
• With five repetitions we reduce the maximum prediction variance and greatly improve the uniformity.
• Five points is the optimum for uniformity.

Slide 15: Variance-optimal designs
• Full factorial and CCD designs are not flexible in the number of points.
• Standard error: s_ŷ = σ √(xₘᵀ(XᵀX)⁻¹xₘ).
• A key to most optimal-DOE methods is the moment matrix M = XᵀX/n_y.
• A good design of experiments will maximize the terms in this matrix, especially the diagonal elements.
• D-optimal designs maximize the determinant of the moment matrix; the determinant is inversely proportional to the square of the volume of the confidence region on the coefficients (see the sketch below).
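To make the moment-matrix criterion concrete, here is a small MATLAB sketch (the design and variable names are illustrative, not taken from the slides) that builds the moment matrix for a two-level full factorial with a linear polynomial and evaluates the normalized determinant det(XᵀX)/n_y^n_β used in the Matlab example two slides below.

% Moment matrix M = X'X/ny for a 2-level full factorial and a linear model.
pts   = [-1 -1; -1 1; 1 -1; 1 1];   % 2^2 full factorial on the [-1,1] box
X     = [ones(4,1) pts];            % model matrix for y = b0 + b1*x1 + b2*x2
ny    = size(X,1);                  % number of data points
nbeta = size(X,2);                  % number of coefficients
M = (X'*X)/ny                       % moment matrix: the identity for this design
det(X'*X)/ny^nbeta                  % normalized determinant = 1 here (the maximum on this box)

Because the design is orthogonal and every point sits at a corner of the box, M is the identity matrix, which is consistent with the earlier observation that it pays to push points to the edges of the design domain.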
Slide 16: Example
• Given the model y = b₁x₁ + b₂x₂ and the two data points (0,0) and (1,0), find the optimum third data point (p,q) in the unit square.
• We have XᵀX = [1+p²  pq; pq  q²], so det(XᵀX) = (1+p²)q² − p²q² = q².
• The determinant is maximized by q = 1, so the third point is (p, 1), for any value of p.
• Finding a D-optimal design in higher dimensions is a difficult optimization problem, often solved heuristically.

Slide 17: Matlab example
>> ny=6; nbeta=6;
>> [dce,x]=cordexch(2,ny,'quadratic');
>> dce'
     1     1    -1    -1     0     1
    -1     1     1    -1    -1     0
>> scatter(dce(:,1),dce(:,2),200,'filled')
>> det(x'*x)/ny^nbeta
ans = 0.0055

With 12 points:
>> ny=12;
>> [dce,x]=cordexch(2,ny,'quadratic');
>> dce'
    -1     1    -1     0     1     0     1    -1     1     0    -1     1
     1    -1    -1    -1     1     1    -1    -1     0     0     0     1
>> scatter(dce(:,1),dce(:,2),200,'filled')
>> det(x'*x)/ny^nbeta
ans = 0.0102

Slide 18: Other criteria
• A-optimal designs minimize the trace of the inverse of the moment matrix; this minimizes the sum of the variances of the coefficients.
• G-optimal designs minimize the maximum of the prediction variance.

Slide 19: Example
• For the previous example, find the A-optimal design.
• Here trace[(XᵀX)⁻¹] = (q² + 1 + p²)/q² = 1 + (1 + p²)/q², which has its minimum at (0, 1), so this point is both A-optimal and D-optimal.

Slide 20: Problems
• Create a 13-point D-optimal design in two-dimensional space and compare its prediction variance to that of the CCD design shown on Slide 13 (one possible setup is sketched below).
• Generate noisy data for the function y = (x₁ + x₂)², fit it using the two designs, and compare the accuracy of the coefficients.
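The MATLAB sketch below is one possible starting point for the first problem, not a prescribed solution. It assumes the Statistics and Machine Learning Toolbox (cordexch, ccdesign, x2fx). For the comparison it uses a circumscribed CCD with five center points so that both designs have 13 points, as on Slide 14; drop the extra center points if the 9-point design of Slide 13 is intended. The D-optimal design varies from run to run because cordexch starts from a random design.

% Problem 1 sketch: 13-point D-optimal design vs. a 13-point CCD
% (4 cube + 4 axial + 5 center points).
rng(1);                                    % make the random start reproducible
dOpt = cordexch(2, 13, 'quadratic');       % 13-point D-optimal candidate design
dCCD = ccdesign(2, 'center', 5);           % circumscribed CCD with 5 center points
XOpt = x2fx(dOpt, 'quadratic');            % quadratic model matrices (6 columns)
XCCD = x2fx(dCCD, 'quadratic');
[g1, g2] = meshgrid(linspace(-1, 1, 41));  % grid over the coded design box
G = x2fx([g1(:) g2(:)], 'quadratic');
vOpt = sum((G / (XOpt'*XOpt)) .* G, 2);    % x_m'(X'X)^-1 x_m at every grid point
vCCD = sum((G / (XCCD'*XCCD)) .* G, 2);
fprintf('Max prediction variance / sigma^2: D-optimal %.3f, CCD %.3f\n', ...
        max(vOpt), max(vCCD));

The same two designs can then be reused for the second problem by evaluating the noisy function at their points and fitting the quadratic coefficients, for example with regress on the x2fx model matrices.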