
DX9-05-3-HistRSM-V8.docx Rev. 5/2/14

Design-Expert 8 User’s Guide Historical Data RSM Tutorial – Part 1 1

Historical Data RSM Tutorial

Part 1 – The Basics

Introduction

In this tutorial you will see how the regression tool in Design-Expert® software, intended for response surface methods (RSM), is applied to historical data. We don’t recommend you work with such happenstance variables if there’s any possibility of performing a designed experiment. However, if you must, take advantage of how easy Design-Expert makes it to develop predictive models and graph responses, as you will see by doing this tutorial. It is assumed that at this stage you’ve mastered many program features by completing preceding tutorials. At the very least you ought to first do the one-factor RSM tutorials, both basic and advanced, prior to starting this one.

The historical data for this tutorial, shown below, comes from the U.S. Bureau of Labor Statistics via James Longley (An Appraisal of Least Squares Programs for the Electronic Computer from the Point of View of the User, Journal of the American Statistical Association, 62 (1967): 819-841). As discussed in RSM Simplified (Mark J. Anderson and Patrick J. Whitcomb, Productivity, Inc., New York, 2005: Chapter 2), it presents some interesting challenges for regression modeling.

Run  A: Prices     B: GNP   C: Unemp.  D: Military    E: Pop.      F: Time  Employ.
#    (1954 = 100)                      Armed Forces   People >14   Year     Total
1    83            234289   2356       1590           107608       1947     60323
2    88.5          259426   2325       1456           108632       1948     61122
3    88.2          258054   3682       1616           109773       1949     60171
4    89.5          284599   3351       1650           110929       1950     61187
5    96.2          328975   2099       3099           112075       1951     63221
6    98.1          346999   1932       3594           113270       1952     63639
7    99            365385   1870       3547           115094       1953     64989
8    100           363112   3578       3350           116219       1954     63761
9    101.2         397469   2904       3048           117388       1955     66019
10   104.6         419180   2822       2857           118734       1956     67857
11   108.4         442769   2936       2798           120445       1957     68169
12   110.8         444546   4681       2637           121950       1958     66513
13   112.6         482704   3813       2552           123366       1959     68655
14   114.2         502601   3931       2514           125368       1960     69564
15   115.7         518173   4806       2572           127852       1961     69331
16   116.9         554894   4007       2827           130081       1962     70551

Longley data on U.S. economy from 1947-1962

Assume the objective for analyzing this data is to predict future employment as a function of leading economic indicators – factors labeled A through F in the table above. Longley’s goal was different: He wanted to test regression software circa 1967 for round-off error due to highly correlated inputs. Will Design-Expert be up to the challenge? We will see!

Let’s begin by setting up this “experiment” (quotes added to emphasize this is not really an experiment but rather an after-the-fact analysis of happenstance data).

Design the “Experiment”

Click the Design-Expert icon that may appear on your desktop. You will see our handy new easy-start opening page. (To save you typing time, we will re-build a previously saved design rather than entering it from scratch.) Click Open Design as highlighted below.

New easy-start page – Open Design option highlighted in red

The file name is Longley.dxp. Double-click to open.

Opening the Longley data

The data table appears on your screen. To re-build this design (and thus see how it was created), press the blank-sheet icon at the left of the toolbar (or select File, New Design).

New Design icon

Click Yes when Design-Expert queries “Use previous design info?”

Re-using previous design

Now you see how this design was created via the Response Surface tab and Historical Data option.

Setting up historical data design

Note that for each of the 6 numeric factors we entered a name, units, and range from minimum (“Min”) to maximum (“Max”). Before moving ahead, you must tell Design-Expert how many rows of data you want to key in or copy/paste into the design layout. In this case there are 16 rows.

Entry for rows

Press Continue to accept all entries on your screen. You now see response details – in this case only one response.

Response entry

Press Continue to see the resulting design layout in run order. (Ignore the column labeled “Std” because there is no standard order for happenstance data).

A Peculiarity on Pasting Data

You could now type in all data for factor levels and resulting responses, row-by-row. (Don’t worry: We won’t make you do this!) However, in most cases data is already available via a Microsoft Windows-based spreadsheet. If so, simply click/drag these data, copy to the Windows clipboard, and Edit, Paste (or right-click and Paste as shown below) into the design layout within Design-Expert. (Be sure, as shown below, to first click/drag the top row of all your destination cells.)

Correct way to paste data into Design-Expert (top-row of cells pre-selected)

If you simply click the upper left cell in the empty run sheet, the program only pastes one value.

Analyze the Results

Normally you’d save your work at this stage, but because we already did this, simply re-open our file: Press the Open Design icon and double-click Longley.dxp. Click No to pass up the opportunity to save what you did previously.

Last chance to save (say “No” in this case)

Before we get started, be forewarned you will encounter many statistics related to least squares regression and analysis of variance (ANOVA). If you are coming into this without previous knowledge, pick up a copy of RSM Simplified and keep it handy. For a good guided tour of statistics for RSM analysis, attend our Stat-Ease workshop titled RSM for Process Optimization. Details about this computer-intensive, hands-on class – including prerequisites – are at www.statease.com.

Under the Analysis branch, click the Employment branch. Design-Expert displays a screen for transforming response. However, as noted by the program, the response range in this case is so small that there is little advantage to applying any transformation.

Information about the response shown on the Transformation screen

Press Fit Summary. Design-Expert evaluates each degree of the model from the mean on up. In this case, the best that can be done is linear. Anything higher is aliased.

Fit Summary – only the linear model is possible here

Move on by pressing Model.

Linear model is chosen

The model is set up just as Design-Expert suggested. Notice that many two-factor interactions can’t be estimated due to aliasing – symbolized by a red tilde (~). Hold on to your hats (because this upcoming data is really a lot of hot air!) and press ANOVA (analysis of variance).

Analysis of variance (ANOVA)

Notice although the overall model is significant, some terms are not.

Some statistical details on how Design-Expert does analysis of variance

You may have noticed this ANOVA is labeled “[Partial sum of squares - Type III]”. This approach to ANOVA, done by default, causes the sums of squares (SS) for the individual terms to come up short of the overall model SS when analyzing data from a non-orthogonal array, such as historical data. If you want the term SS to add up to the model SS, go to Edit, Preferences for Math and change the default to Sequential (Type I). However, we do not recommend this approach because it favors the first term put into the model. For example, in this case, ANOVA by partial SS (Type III, the default in Design-Expert) for the response (employment total) calculates the prob>F p-value for A as 0.8631 (F=0.031) as seen above, which is not significant. Recalculating ANOVA by sequential sum of squares (Type I) changes the p-value to <0.0001 (F=1876), which looks highly significant, but only because this term (main effect of factor A) is fit first. This simply is not correct.
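For readers who want to check the partial versus sequential arithmetic outside Design-Expert, here is a minimal sketch in Python with NumPy. It is not how the program computes its ANOVA internally; the helper `sse` and the coding of each factor to the -1 to +1 range are assumptions made for this illustration. Both F-values for factor A use the residual mean square of the full linear model as the denominator.

```python
import numpy as np

# Longley data from the table above: factors A..F, then Employment.
LONGLEY = """
83.0 234289 2356 1590 107608 1947 60323
88.5 259426 2325 1456 108632 1948 61122
88.2 258054 3682 1616 109773 1949 60171
89.5 284599 3351 1650 110929 1950 61187
96.2 328975 2099 3099 112075 1951 63221
98.1 346999 1932 3594 113270 1952 63639
99.0 365385 1870 3547 115094 1953 64989
100.0 363112 3578 3350 116219 1954 63761
101.2 397469 2904 3048 117388 1955 66019
104.6 419180 2822 2857 118734 1956 67857
108.4 442769 2936 2798 120445 1957 68169
110.8 444546 4681 2637 121950 1958 66513
112.6 482704 3813 2552 123366 1959 68655
114.2 502601 3931 2514 125368 1960 69564
115.7 518173 4806 2572 127852 1961 69331
116.9 554894 4007 2827 130081 1962 70551
"""
data = np.array([r.split() for r in LONGLEY.strip().splitlines()], dtype=float)
factors, y = data[:, :6], data[:, 6]

# Assumption: code each factor to -1..+1 to keep the arithmetic well behaved.
lo, hi = factors.min(axis=0), factors.max(axis=0)
X = 2 * (factors - lo) / (hi - lo) - 1


def sse(cols):
    """Residual sum of squares for an intercept plus the factors in `cols`."""
    design = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


all_terms = [0, 1, 2, 3, 4, 5]            # A, B, C, D, E, F
df_resid = len(y) - len(all_terms) - 1    # 16 runs - 6 terms - intercept
mse_full = sse(all_terms) / df_resid

# Partial (Type III): SS explained by A after all other terms are in the model.
ss_partial_A = sse([1, 2, 3, 4, 5]) - sse(all_terms)
# Sequential (Type I): SS explained by A when it is entered first.
ss_sequential_A = sse([]) - sse([0])

F_partial_A = ss_partial_A / mse_full
F_sequential_A = ss_sequential_A / mse_full
print(f"Partial F for A:    {F_partial_A:.3f}")
print(f"Sequential F for A: {F_sequential_A:.0f}")
```

The two F-values differ by orders of magnitude because A is highly correlated with the other factors: entered first, it takes credit for variation that the remaining factors explain just as well.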

Assuming Factor A (prices) is least significant of all as indicated by default ANOVA (partial SS), let’s see what happens with it removed. However, before we do, on the Bookmarks, click R-Squared and view statistics (shown below) to help us compare what happens before and after reducing the model.

Model statistics

Also bookmark the Coefficients estimates.

Coefficient estimates for linear model

Notice the huge VIF (variance inflation factor) values. A value of 1 is ideal (orthogonal), but a VIF below 10 is generally accepted. A VIF above 1000, such as that for factor B (GNP), indicates severe multicollinearity in the model coefficients. (That’s bad!) In the follow-up tutorial (Part 2) based on this same Longley data, we delve more into this and other statistics generated by Design-Expert for purposes of design evaluation. For now, right-click any VIF result to access context-sensitive Help, or go to Help on the main menu and search on this statistic. You will find some details there.
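As a rough cross-check of these VIF values, the sketch below uses a standard identity: for a linear model with an intercept, the VIFs are the diagonal elements of the inverse of the factor correlation matrix. This is an illustration in Python with NumPy, not Design-Expert's own routine, and the variable names are chosen for this example only.

```python
import numpy as np

# Longley factors from the table above (A..F); the response is not needed here.
LONGLEY = """
83.0 234289 2356 1590 107608 1947
88.5 259426 2325 1456 108632 1948
88.2 258054 3682 1616 109773 1949
89.5 284599 3351 1650 110929 1950
96.2 328975 2099 3099 112075 1951
98.1 346999 1932 3594 113270 1952
99.0 365385 1870 3547 115094 1953
100.0 363112 3578 3350 116219 1954
101.2 397469 2904 3048 117388 1955
104.6 419180 2822 2857 118734 1956
108.4 442769 2936 2798 120445 1957
110.8 444546 4681 2637 121950 1958
112.6 482704 3813 2552 123366 1959
114.2 502601 3931 2514 125368 1960
115.7 518173 4806 2572 127852 1961
116.9 554894 4007 2827 130081 1962
"""
factors = np.array([r.split() for r in LONGLEY.strip().splitlines()], dtype=float)
names = ["A: Prices", "B: GNP", "C: Unemp.", "D: Military", "E: Pop.", "F: Time"]

# Correlation matrix of the six factors (columns are variables).
R = np.corrcoef(factors, rowvar=False)

# VIF_j = 1 / (1 - R_j^2), which equals the j-th diagonal element of inv(R).
vif = np.diag(np.linalg.inv(R))

for name, v in zip(names, vif):
    print(f"{name:12s} VIF = {v:10.1f}")
```

A VIF of 1 would mean a factor is uncorrelated with all the others; here only factor D comes in under the rule-of-thumb limit of 10.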

Press Model again. Right-click A-Prices and Exclude it, or simply double-click this term to remove the “M” (model) designation.

Excluding an insignificant term

You could now go back to ANOVA, look for the next least significant term, exclude it, and so on. However, this backward-elimination process can be performed automatically in Design-Expert. Here’s how. First, reset Process Order to Linear.

Resetting model to linear

Now change Selection to Backward.

Specifying backward regression

Notice a new field called “Alpha out” appears. By default the program removes the least significant term, step-by-step, as long as its p-value exceeds the risk level (symbolized by statisticians with the Greek letter alpha) of 0.1. Let’s be a bit more conservative by changing Alpha out to 0.05.

Changing risk-level alpha for removing model terms via backward selection

Now press ANOVA to see what happens.

Backward regression results

Not surprisingly, the program first removed A and then E – that’s it. Look at the following ANOVA table and note all other terms come out significant. (Note: If you do not see the report of the model being “significant” change your View to Annotated ANOVA.)
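The backward procedure can be sketched in a few lines of Python with NumPy. This is an illustration of the general technique under stated assumptions, not Design-Expert's algorithm: the helper names (`sse`, `t_two_sided_p`, `partial_F`, `backward`) are hypothetical, factors are coded to -1 to +1, and the p-value comes from a simple numerical integration of the t distribution so that no statistics library is required.

```python
import math
import numpy as np

LONGLEY = """
83.0 234289 2356 1590 107608 1947 60323
88.5 259426 2325 1456 108632 1948 61122
88.2 258054 3682 1616 109773 1949 60171
89.5 284599 3351 1650 110929 1950 61187
96.2 328975 2099 3099 112075 1951 63221
98.1 346999 1932 3594 113270 1952 63639
99.0 365385 1870 3547 115094 1953 64989
100.0 363112 3578 3350 116219 1954 63761
101.2 397469 2904 3048 117388 1955 66019
104.6 419180 2822 2857 118734 1956 67857
108.4 442769 2936 2798 120445 1957 68169
110.8 444546 4681 2637 121950 1958 66513
112.6 482704 3813 2552 123366 1959 68655
114.2 502601 3931 2514 125368 1960 69564
115.7 518173 4806 2572 127852 1961 69331
116.9 554894 4007 2827 130081 1962 70551
"""
data = np.array([r.split() for r in LONGLEY.strip().splitlines()], dtype=float)
factors, y = data[:, :6], data[:, 6]
names = "ABCDEF"
lo, hi = factors.min(axis=0), factors.max(axis=0)
X = 2 * (factors - lo) / (hi - lo) - 1    # assumption: code factors to -1..+1


def sse(cols):
    """Residual sum of squares for an intercept plus the factors in `cols`."""
    design = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def t_two_sided_p(t, df, steps=4000):
    """Two-sided p-value for Student's t, by Simpson's rule on the density."""
    t = abs(t)
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    pdf = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    h = t / steps
    s = pdf(0) + pdf(t) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return max(0.0, 1 - 2 * s * h / 3)


def partial_F(cols):
    """Partial (Type III) F for every term in the model made of `cols`."""
    full = sse(cols)
    df_resid = len(y) - len(cols) - 1
    F = {c: (sse([k for k in cols if k != c]) - full) / (full / df_resid) for c in cols}
    return F, df_resid


def backward(alpha_out=0.05):
    kept, removed = list(range(6)), []
    while kept:
        F, df_resid = partial_F(kept)
        worst = min(F, key=F.get)         # smallest F has the largest p-value
        p = t_two_sided_p(math.sqrt(max(F[worst], 0.0)), df_resid)
        if p <= alpha_out:                # every remaining term passes
            break
        kept.remove(worst)
        removed.append(worst)
    return [names[c] for c in kept], [names[c] for c in removed]


kept, removed = backward(alpha_out=0.05)
print("removed in order:", removed)
print("kept:", kept)
```

Because each term has one degree of freedom, its F-test is equivalent to a two-sided t-test with t equal to the square root of F, which is why a t-distribution p-value suffices here.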

ANOVA for backward-reduced model

You may have noticed that in the full model, factor B had a much higher p-value than what’s shown above. This instability is typical of models based on historical data. Scroll down the ANOVA table to view model statistics and coefficients.

Backward-reduced model statistics and coefficients

Now let’s try a different regression approach – building the model from the ground (mean) up, rather than tearing terms down from the top (all terms in chosen polynomial). Press Model, then re-set Process Order to Linear. This time choose Forward stepwise regression in Selection. To provide a fair comparison of this forward approach with that done earlier going backward, change Alpha in to 0.05.

Forward selection (remember to re-set model to the original process order first!)

Heed the text displayed by the program (When reducing your model…) because this approach may not work as well for this highly collinear set of factors. See what happens now in ANOVA.

Results of forward regression

Surprisingly, factor B now comes in first as the single most significant factor. Then comes factor C. That’s it! The next most significant factor evidently does not achieve the alpha-in significance threshold of p<0.05.
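To see the forward logic spelled out, consider the following sketch in Python with NumPy. It illustrates generic forward selection and is not Design-Expert's implementation; the helpers `sse`, `t_two_sided_p`, and `forward` are hypothetical names, factors are coded to -1 to +1, and the p-value is obtained by numerically integrating the t density. At each step the candidate with the largest F-to-enter is added, provided its p-value is below alpha in.

```python
import math
import numpy as np

LONGLEY = """
83.0 234289 2356 1590 107608 1947 60323
88.5 259426 2325 1456 108632 1948 61122
88.2 258054 3682 1616 109773 1949 60171
89.5 284599 3351 1650 110929 1950 61187
96.2 328975 2099 3099 112075 1951 63221
98.1 346999 1932 3594 113270 1952 63639
99.0 365385 1870 3547 115094 1953 64989
100.0 363112 3578 3350 116219 1954 63761
101.2 397469 2904 3048 117388 1955 66019
104.6 419180 2822 2857 118734 1956 67857
108.4 442769 2936 2798 120445 1957 68169
110.8 444546 4681 2637 121950 1958 66513
112.6 482704 3813 2552 123366 1959 68655
114.2 502601 3931 2514 125368 1960 69564
115.7 518173 4806 2572 127852 1961 69331
116.9 554894 4007 2827 130081 1962 70551
"""
data = np.array([r.split() for r in LONGLEY.strip().splitlines()], dtype=float)
factors, y = data[:, :6], data[:, 6]
names = "ABCDEF"
lo, hi = factors.min(axis=0), factors.max(axis=0)
X = 2 * (factors - lo) / (hi - lo) - 1    # assumption: code factors to -1..+1


def sse(cols):
    """Residual sum of squares for an intercept plus the factors in `cols`."""
    design = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def t_two_sided_p(t, df, steps=4000):
    """Two-sided p-value for Student's t, by Simpson's rule on the density."""
    t = abs(t)
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    pdf = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    h = t / steps
    s = pdf(0) + pdf(t) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return max(0.0, 1 - 2 * s * h / 3)


def forward(alpha_in=0.05):
    chosen = []
    while len(chosen) < 6:
        base = sse(chosen)
        df_resid = len(y) - len(chosen) - 2   # residual df once one term is added
        F = {}
        for c in range(6):
            if c not in chosen:
                trial = sse(chosen + [c])
                F[c] = (base - trial) / (trial / df_resid)
        best = max(F, key=F.get)              # largest F has the smallest p-value
        p = t_two_sided_p(math.sqrt(max(F[best], 0.0)), df_resid)
        if p >= alpha_in:                     # best candidate fails the entry test
            break
        chosen.append(best)
    return [names[c] for c in chosen]


selected = forward(alpha_in=0.05)
print("entered in order:", selected)
```

Candidates are ranked by F rather than by p-value because, within one step, all candidates share the same degrees of freedom, so the ordering is identical and avoids comparing extremely small p-values.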

Look at your ANOVA to see that both terms in this model come out significant.

ANOVA for forward-reduced model

On Bookmarks, click R-Squared.

Forward-reduced model statistics and coefficients

This simpler model scores very high on all measures of R-squared, but it falls a bit short of what was achieved in the model derived from the backward regression.

Finally, go back to Model, re-set Process Order to Linear and try the last model Selection option offered by Design-Expert software: Stepwise.

Specifying stepwise regression

As you might infer from seeing both Alpha in and Alpha out now displayed, stepwise algorithms involve elements of forward selection with bits of backward added in for good measure. For details, search program Help, but consider this – terms that pass the alpha test in (via forward regression) may later (after further terms are added) become disposable according to the alpha test out (via backward selection). If this seems odd, look back at how factor B’s p-value changed depending on which other factors were chosen with it for modeling. To see what happens with this stepwise method, press ANOVA again. Results depend on what you do with Alpha in and Alpha out – both of which default back to 0.1000.

As you see in the message displayed for both forward and stepwise (in essence an enhancement of forward) approaches, we favor the backward approach if you decide to make use of an automated selection method. Ideally, an analyst is also a subject-matter expert, or such a person is readily accessible. Then they could do model reduction via the manual method filtered not only by the statistics, but also by simple common sense from someone with profound system knowledge.

This concludes Part 1 of our Longley data-set exploration. In Part 2 we mine deeper into Design-Expert to see interesting residual analysis aspects within Diagnostics, and we also see what can be gleaned from its sophisticated tools within Design, Evaluation.

Part 2 – Advanced Topics

Design Evaluation

If you still have the Longley data active in Design-Expert software from Part 1 of this tutorial, continue on. If you exited the program, re-start it and use Open Design to open your data file (Longley.dxp). Under the Design branch of the program, click Evaluation. The software brings up a quadratic polynomial model by default, but, as you will see, the order must be downgraded to linear (we will get to the reason momentarily). The screen shot shows the Response field set at “Design Only” as opposed to the Employment response. In other words, it will evaluate the entire matrix of factors, regardless of whether response data are present. The other option (response by response) comes in handy when experimenters end up with missing data, thus degrading the “designed-for” model.

Design evaluation (design only)

Press the Results button.

Results of evaluation for quadratic polynomial

This model is badly aliased. For example, the effect of A is confounded with -24.5 CD, etc. Go back to Model and reduce the Order to Linear.

Re-setting order to linear

Press Results again and note “No aliases found…” Much better!

Results of evaluation for linear model

On Bookmarks click the DF option to bring up the accounting for degrees of freedom.

Bookmarking to evaluate degrees of freedom (DF)

Looking over the annotations provided by the software (activated via View, Annotated Evaluation), notice this design flunks the recommendation for pure error df. Of course this really is not a designed experiment, but rather historical data collected at happenstance.

Annotations for degrees of freedom

Study the next section of the evaluation by Design-Expert. Do any of the statistics pass the tests suggested for a good design? No!

Details on model terms, including power

Scroll down or bookmark to the leverage report. These statistics come out surprisingly good – none exceeds twice the average.
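The leverage check can be reproduced approximately with the sketch below, written in Python with NumPy. It assumes the usual definition of leverage as the diagonal of the hat matrix for the linear model (intercept plus six factors), with factors coded to -1 to +1; whether Design-Expert computes it in exactly this way is not confirmed here.

```python
import numpy as np

# Longley factors from the table in Part 1 (A..F); leverage ignores the response.
LONGLEY = """
83.0 234289 2356 1590 107608 1947
88.5 259426 2325 1456 108632 1948
88.2 258054 3682 1616 109773 1949
89.5 284599 3351 1650 110929 1950
96.2 328975 2099 3099 112075 1951
98.1 346999 1932 3594 113270 1952
99.0 365385 1870 3547 115094 1953
100.0 363112 3578 3350 116219 1954
101.2 397469 2904 3048 117388 1955
104.6 419180 2822 2857 118734 1956
108.4 442769 2936 2798 120445 1957
110.8 444546 4681 2637 121950 1958
112.6 482704 3813 2552 123366 1959
114.2 502601 3931 2514 125368 1960
115.7 518173 4806 2572 127852 1961
116.9 554894 4007 2827 130081 1962
"""
factors = np.array([r.split() for r in LONGLEY.strip().splitlines()], dtype=float)
n_runs = factors.shape[0]

# Assumption: code each factor to -1..+1 before building the model matrix.
lo, hi = factors.min(axis=0), factors.max(axis=0)
coded = 2 * (factors - lo) / (hi - lo) - 1
design = np.column_stack([np.ones(n_runs), coded])   # intercept + 6 linear terms

# Hat matrix H = Q Q' for a thin QR factorization, so leverage = row sums of Q^2.
Q, _ = np.linalg.qr(design)
leverage = (Q ** 2).sum(axis=1)

average = design.shape[1] / n_runs    # p / n = 7 / 16
for run, h in enumerate(leverage, start=1):
    flag = "  <-- exceeds 2x average" if h > 2 * average else ""
    print(f"run {run:2d}  leverage = {h:.4f}{flag}")
print(f"average = {average:.4f}, twice average = {2 * average:.4f}")
```

Leverages always sum to the number of model coefficients, which gives a quick sanity check on the calculation.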

More statistics are available by going back to Model, selecting Options, and turning on (checkmarks) Matrix Measure and Correlation Matrices.

Turning on more options for report

Click OK and view the Results. On Bookmarks choose Matrix to see new statistics.

Matrix measures for design evaluation

Notice the condition number (12,220) far exceeds the level considered to represent severe multicollinearity for a design matrix (above 1000). Viewing specific correlations reveals why.

Correlation matrices

Notice many values are highlighted in blue for being unacceptably high. No wonder Longley picked this data to test regression software!
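For a rough feel of where such numbers come from, the sketch below (Python with NumPy) computes the factor correlation matrix and one common definition of condition number: the ratio of the largest to the smallest eigenvalue of that matrix. That definition and the 0.9 reporting threshold are assumptions for this illustration, so the result may not match the Design-Expert report exactly.

```python
import numpy as np

# Longley factors from the table in Part 1 (A..F).
LONGLEY = """
83.0 234289 2356 1590 107608 1947
88.5 259426 2325 1456 108632 1948
88.2 258054 3682 1616 109773 1949
89.5 284599 3351 1650 110929 1950
96.2 328975 2099 3099 112075 1951
98.1 346999 1932 3594 113270 1952
99.0 365385 1870 3547 115094 1953
100.0 363112 3578 3350 116219 1954
101.2 397469 2904 3048 117388 1955
104.6 419180 2822 2857 118734 1956
108.4 442769 2936 2798 120445 1957
110.8 444546 4681 2637 121950 1958
112.6 482704 3813 2552 123366 1959
114.2 502601 3931 2514 125368 1960
115.7 518173 4806 2572 127852 1961
116.9 554894 4007 2827 130081 1962
"""
factors = np.array([r.split() for r in LONGLEY.strip().splitlines()], dtype=float)
names = "ABCDEF"

# Pairwise Pearson correlations among the six factors.
R = np.corrcoef(factors, rowvar=False)

# Assumed definition: condition number = largest / smallest eigenvalue of R.
eigenvalues = np.linalg.eigvalsh(R)
cond = eigenvalues.max() / eigenvalues.min()
print(f"condition number of correlation matrix: {cond:,.0f}")

# List factor pairs whose correlation is very high (threshold chosen arbitrarily).
for i in range(6):
    for j in range(i + 1, 6):
        if abs(R[i, j]) > 0.9:
            print(f"r({names[i]},{names[j]}) = {R[i, j]:.4f}")
```

With an orthogonal design every off-diagonal correlation would be zero and the condition number would equal 1.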

Now, just for fun, press the Graphs button and select View, Perturbation (or press this option on the floating Graphs Tool).

Perturbation plot for standard error

Notice factors B and F exhibit the most dramatic tracks for standard error. On the floating Graphs Tool select 3D Surface. On the Factors Tool, right-click factor F:Time and change it to X1 axis.

3D view of standard error for factors B and F

There’s no sense doing anything more. By now it’s clear that this ‘design’ fails all the tests for a good experiment, but that’s generally the nature of the beast for happenstance data.