02 regresion

MEET MTB UGUIDE 1 SC QREFUGUIDE 2INDEXCONTENTS HOW TO USE

STREGRSN.MK5 Page 1 Friday, December 17, 1999 12:23 PM

2Regression

Regression Overview, 2-2

■ Regression, 2-3

■ Stepwise Regression, 2-14

■ Best Subsets Regression, 2-20

■ Fitted Line Plot, 2-24

■ Residual Plots, 2-27

Logistic Regression Overview, 2-29

■ Binary Logistic Regression, 2-33

■ Ordinal Logistic Regression, 2-44

■ Nominal Logistic Regression, 2-51

See also,

■ Resistant Line, Chapter 8

■ Regression with Life Data, Chapter 16

MINITAB User’s Guide 2 2-1


Chapter 2 Regression Overview



Regression OverviewRegression analysis is used to investigate and model the relationship between a response variable and one or more predictors. MINITAB provides various least-squares and logistic regression procedures.

■ Use least squares regression when your response variable is continuous.

■ Use logistic regression when your response variable is categorical.

Both least squares and logistic regression methods estimate parameters in the model so that the fit of the model is optimized. Least squares minimizes the sum of squared errors to obtain parameter estimates, whereas MINITAB’s logistic regression commands obtain maximum likelihood estimates of the parameters. See Logistic Regression Overview on page 2-29 for more information about logistic regression.

Use the table below to assist in selecting a procedure:

Use… to…response type

estimation method

Regression(page 2-3)

perform simple or multiple regression: fit a model, store regression statistics, examine residual diagnostics, generate point estimates, generate prediction and confidence intervals, and perform lack-of-fit tests

continuous least squares

Stepwise (page 2-14)

perform stepwise, forward selection, or backward elimination which add or remove variables from a model in order to identify a useful subset of predictors


Best Subsets (page 2-20)

identify subsets the predictors based on the maximum R2 criterion


Fitted Line Plot (page 2-24)

perform linear and polynomial regression with a single predictor and plot a regression line through the data; on the actual or log10 scale


Residual Plots (page 2-27)

generate a set of residual plots to use for residual analysis: normal score plot, a chart of individual residuals, a histogram of residuals, and a plot of fits versus residuals


Binary Logistic (page 2-33)

perform logistic regression on a response with only two possible values, such as presence or absence

categorical maximum likelihood

Ordinal Logistic (page 2-44)

perform logistic regression on a response with three or more possible values that have a natural order, such as none, mild, or severe


Nominal Logistic (page 2-51)

perform logistic regression on a response with three or more possible values that have no natural order, such as sweet, salty, or sour


2-2 MINITAB User’s Guide 2


Regression Regression



RegressionYou can use Regression to perform simple and multiple regression using the method of least squares. Use this procedure for fitting general least squares models, storing regression statistics, examining residual diagnostics, generating point estimates, generating prediction and confidence intervals, and performing lack-of-fit tests.

You can also use this command to fit polynomial regression models. However, if you want to fit a polynomial regression model with a single predictor, you may find it more advantageous to use Fitted Line Plot (page 2-24).

Data

Enter response and predictor variables in numeric columns of equal length so that each row in your worksheet contains measurements on one observation or subject.

MINITAB omits all observations that contain missing values in the response or in the predictors, from calculations of the regression equation and the ANOVA table items.

h To do a linear regression

1 Choose Stat ➤ Regression ➤ Regression.

2 In Response, enter the column containing the response (Y) variable.

3 In Predictors, enter the columns containing the predictor (X) variables.

4 If you like, use one or more of the options listed below, then click OK.

Options

Graphs subdialog box

■ draw five different residual plots for regular, standardized, or deleted residuals—see Choosing a residual type on page 2-5. Available residual plots include a:



Chapter 2 Regression



– histogram.– normal probability plot.– plot of residuals versus the fitted values ( ).– plot of residuals versus data order. The row number for each data point is shown

on the x-axis (for example, 1 2 3 4… n).– separate plot for the residuals versus each specified column.

For a discussion, see Residual plots on page 2-6.

Results subdialog box

■ display the following in the Session window:– no output– the estimated regression equation, table of coefficients, s, R2, and the analysis of

variance table– the default output, which includes the above output plus the sequential sums of

squares and the fits and residuals of unusual observations– the default output, plus the full table of fits and residuals

Options subdialog box

■ perform weighted regression—see Weighted regression on page 2-6

■ exclude the intercept term from the regression by unchecking Fit Intercept—see Regression through the origin on page 2-7

■ display the variance inflation factor (VIF—a measure of multicollinearity effect) associated with each predictor—see Variance inflation factor on page 2-7

■ display the Durbin-Watson statistic which detects autocorrelation in the residuals—see Detecting autocorrelation in residuals on page 2-7

■ display the PRESS statistic and adjusted R-squared

■ perform a pure error lack-of-fit test for testing model adequacy when there are predictor replicates—see Testing lack-of-fit on page 2-8

■ perform a data subsetting lack-of-fit test to test the model adequacy—see Testing lack-of-fit on page 2-8

■ predict the response, confidence interval, and prediction interval for new observations—see Prediction of new observations on page 2-9

Storage subdialog box

■ store the coefficients, fits, and regular, standardized, and deleted residuals—see Choosing a residual type on page 2-5.

■ store the leverages, Cook’s distances, and DFITS, for identifying outliers—see Identifying outliers on page 2-9.

Y






■ store the mean square error, the (X′X)-1 matrix, and the R matrix of the QR or Cholesky decomposition. (The variance-covariance matrix of the coefficients is MSE*(XX)-1.) See Help for information on these matrices.

Residual analysis and regression diagnostics

Regression analysis usually does not end when a regression model has been fit. You also can examine residual plots and other regression diagnostics to assess if the residuals appear random and normally distributed. MINITAB provides a number of residual plots through the Graphs subdialog box. Alternatively, after fits and residuals are stored, you can use Stat ➤ Regression ➤ Residual Plots to obtain four plots within a single graph window.

MINITAB also produces regression diagnostics for identifying outliers or unusual observations. These observations may have a significant influence upon the regression results. See Identifying outliers on page 2-9. You might check unusual observations to see if they are correct. If so, you can try to determine why they are unusual and consider what effect they have on the regression equation. You might wish to examine how sensitive the regression results are to the outliers being present. Outliers can suggest inadequacies in the model or a need for additional information.

Choosing a residual type

You can calculate three types of residuals. Use the table below to help you choose which type you would like to plot:

Residual type… Choose when you want to… Calculation

regular examine residuals in the original scale of the data response − fit

standardized use a rule of thumb for identifying observations that are not fit well by the model. A standardized residual greater than 2, in absolute value, might be considered to be large. MINITAB displays these observations in a table of unusual observations, labeled with an R.

(residual) / (standard deviation of the residual)

Studentized identify observations that are not fit well by the model. Removing observations can affect the variance estimate and also can affect parameter estimates. A large absolute Studentized residual may indicate that including the observation in the model increases the error variance or that it has a large affect upon the parameter estimates, or both.

(residual) / (standard deviation of the residual). The ith studentized residual is computed with the ith observation removed.






Residual plots

MINITAB generates residual plots that you can use to examine the goodness of model fit. You can choose the following residual plots:

■ Normal plot of residuals. The points in this plot should generally form a straight line if the residuals are normally distributed. If the points on the plot depart from a straight line, the normality assumption may be invalid. To perform a statistical test for normality, use Stat ➤ Basic Statistics ➤ Normality Test (page 1-43).

■ Histogram of residuals. This plot should resemble a normal (bell-shaped) distribution with a mean of zero. Substantial clusters of points away from zero may indicate that factors other than those in the model may be influencing your results.

■ Residuals versus fits. This plot should show a random pattern of residuals on both sides of 0. There should not be any recognizable patterns in the residual plot. The following may indicate error that is not random:

– a series of increasing or decreasing points– a predominance of positive residuals, or a predominance of negative residuals– patterns such as increasing residuals with increasing fits

■ Residuals versus order. This is a plot of all residuals in the order that the data was collected and can be used to find non-random error, especially of time-related effects.

■ Residuals versus other variables. This is a plot of all residuals versus another variable. Commonly, you might use a predictor or a variable left out of the model and see if there is a pattern that you may wish to fit.

If certain residual values are of concern, you can brush your graph to identify these values. See the Brushing Graphs chapter in MINITAB User’s Guide 1 for more information.

Weighted regression

Weighted least squares regression is a method for dealing with observations that have nonconstant variances. If the variances are not constant, observations with

■ large variances should be given relatively small weights

■ small variances should be given relatively large weights

The usual choice of weights is the inverse of pure error variance in the response.

h To perform weighted regression

1 Choose Stat ➤ Regression ➤ Regression ➤ Options.

2 In Weights, enter the column containing the weights. The weights must be greater than or equal to zero. Click OK in each dialog box.






If there are n observations in the data set, MINITAB forms an n × n matrix W with the column of weights as its diagonal and zeros elsewhere. MINITAB calculates the regression coefficients by (X′WX)-1 (X′WY). This is equivalent to minimizing a weighted error sum of squares,

, where wi is the weight.

Regression through the origin

By default, the y-intercept term (also called the constant) is included in equation. Thus, MINITAB fits the model

However, if the response at X = 0 is naturally zero, a model without an intercept can make sense. If so, Uncheck Fit Intercept in the Options subdialog box, and the β0 term will be omitted. Thus, MINITAB fits the model

Because it is difficult to interpret the R2 when the constant is omitted, the R2 is not printed. If you wish to compare fits of models with and without intercepts, compare mean square errors and examine residual plots.

Variance inflation factor

The variance inflation factor (VIF) is used to detect whether one predictor has a strong linear association with the remaining predictors (the presence of multicollinearity among the predictors). VIF measures how much the variance of an estimated regression coefficient increases if your predictors are correlated (multicollinear).

VIF = 1 indicates no relation; VIF > 1, otherwise. The largest VIF among all predictors is often used as an indicator of severe multicollinearity. Montgomery and Peck [21] suggest that when VIF is greater than 5-10, then the regression coefficients are poorly estimated. You should consider the options to break up the multicollinearity: collecting additional data, deleting predictors, using different predictors, or an alternative to least square regression. For additional information, see [3], [21].

Detecting autocorrelation in residuals

In linear regression, it is assumed that the residuals are independent (that is, they are not autocorrelated) of each other. If the independence assumption were violated, some model fitting results would be questionable. For example, positive correlation between

wi y fit–( )2[ ]∑

Y β0 β1+ X1 β2X2 … βkXk ε+ + + +=

Y β1X1 β2X2 … βkXk ε+ + + +=






error terms tends to inflate the t-values for coefficients. Because of that, checking model assumptions after fitting a model is an important part of regression analysis.

MINITAB provides two methods to check this assumption:

■ A graph of residuals versus data order (1 2 3 4… n) can provide a means to visually inspect residuals for autocorrelation.

■ The Durbin-Watson statistic tests for the presence of autocorrelation in regression residuals by determining whether or not the correlation between two adjacent error terms is zero. The test is based upon an assumption that errors are generated by a first-order autoregressive process. If there are missing observations, these are omitted from the calculations, and only the nonmissing observations are used.

To reach a conclusion from the test, you will need to compare the displayed statistic with lower and upper bounds in a table. If D > upper bound, no correlation; if D < lower bound, positive correlation; if D is in between the two bounds, the test is inconclusive. For additional information, see [4], [22].

Testing lack-of-fit

MINITAB provides two lack-of-fit tests so you can determine whether or not the regression model adequately fits your data. The pure error lack-of-fit test requires replicates; the data subsetting lack-of-fit test does not require replicates.

■ Pure error lack-of-fit test—If your predictors contain replicates (repeated x values with one predictor or repeated combinations of x values with multiple predictors), MINITAB can calculate a pure error test for lack-of-fit. The error term will be partitioned into pure error (error within replicates) and a lack-of-fit error. The F-test can be used to test if you have chosen an adequate regression model. For additional information, see [9], [22], [29].

■ Data subsetting lack-of-fit test—MINITAB also performs a lack-of-fit test that does not require replicates but involves subsetting the data, and attempts to identify the nature of any lack-of-fit. This test is nonstandard, but it can provide information about the lack-of-fit relative to each variable. See [6] and Help for more information.

MINITAB performs 2k+1 hypothesis tests, where k is the number of predictors, and then combines them using Bonferroni inequalities to give an overall significance level of 0.1. A message is printed out for each test for which there is evidence of lack-of-fit. For each predictor, a curvature test and an interaction test are performed by comparing the fit above and below the predictor mean using indicator variables. A test can also be performed by fitting the model to the “central” portion of the data and then comparing the error sums of squares of that central data portion to the error sums of squares of all the data.




MINITAB

C



Prediction of new observations

If you have new predictor (X) values and you wish to know what the response would be using the regression equation, then use Prediction intervals for new observations in the Options subdialog box. Enter constants or columns containing the new x values, one for each predictor. Columns must be of equal length. If you enter a constant and a column(s), MINITAB will assume that you want predicted values for all combinations of constant and column values. You can change the confidence level from the default 95%, and you can also store the printed values: fits, standard errors of fits, confidence limits, and prediction limits. If you use prediction with weights, see Help for obtaining correct results.

Identifying outliers

In addition to graphs, you can store three additional measures for the purpose of identifying outliers, or unusual observations that can have a significant influence upon the regression. The measures are leverages, Cook’s distance, and DFITS:

■ Leverages are the diagonals of the “hat” matrix, H = X (X′X)-1 X′, where X is the design matrix. Note that hi depends only on the predictors; it does not involve the response Y. Many people consider hi to be large enough to merit checking if it is more than 2p/n or 3p/n, where p is the number of predictors (including one for the constant). MINITAB displays these in a table of unusual observations with high leverage. Those with leverage over 3p/n or 0.99, whichever is smallest, are marked with an X and those with leverage greater than 5p/n are marked with XX.

■ Cook’s distance combines leverages and Studentized residuals into one overall measure of how unusual the predictor values and response are for each observation. Large values signify unusual observations. Geometrically, Cook’s distance is a measure of the distance between coefficients calculated with and without the ith observation. Cook [7] and Weisberg [29] suggest checking observations with Cook’s distance > F (.50, p, n−p), where F is a value from an F-distribution.

■ DFITS, like Cook’s distance, combines the leverage and the Studentized residual into one overall measure of how unusual an observation is. DFITS (also called DFFITS) is the difference between the fitted values calculated with and without the ith observation, and scaled by stdev ( i). Belseley, Kuh, and Welsch [3] suggest that observations with DFITS > 2 should be considered as unusual. See Help for more details on these measures.

e Example of performing a simple linear regression

You are a manufacturer who wishes to easily obtain a quality measure on a product, but the procedure is expensive. However, there is a quick-and-dirty way of doing the same thing that is much less expensive but also is slightly less precise. You examine the relationship between the two scores to see if you can

Yp n⁄

User’s Guide 2 2-9

MEET MTB UGUIDE 1 SC QREFUGUIDE 2INDEXONTENTS HOW TO USE




predict the desired score (Score2) from the score that is easy to obtain (Score1). You also obtain a prediction interval for an observation with Score1 being 8.2.

1 Open the worksheet EXH_REGR.MTW.


3 In Response, enter Score2. In Predictors, enter Score1.

4 Click Options.

5 In Predict intervals for new observations, type 8.2. Click OK in each dialog box.

Sessionwindowoutput

Regression Analysis: Score2 versus Score1 The regression equation isScore2 = 1.12 + 0.218 Score1

Predictor Coef SE Coef T PConstant 1.1177 0.1093 10.23 0.000Score1 0.21767 0.01740 12.51 0.000

S = 0.1274 R-Sq = 95.7% R-Sq(adj) = 95.1%

Analysis of Variance

Source DF SS MS F PRegression 1 2.5419 2.5419 156.56 0.000Residual Error 7 0.1136 0.0162Total 8 2.6556

Unusual ObservationsObs Score1 Score2 Fit SE Fit Residual St Resid 9 7.50 2.5000 2.7502 0.0519 -0.2502 -2.15R

R denotes an observation with a large standardized residual

Predicted Values for New Observations

New Obs Fit SE Fit 95.0% CI 95.0% PI1 2.9026 0.0597 ( 2.7614, 3.0439) ( 2.5697, 3.2356)

Values of Predictors for New Observations

New Obs Score11 8.20

Interpreting the results

The regression procedure fits the model

Y β0 β1X ε+ +=






where Y is the response, X is the predictor, β0 and β1 are the regression coefficients, and ε is an error term having a normal distribution with mean of zero and standard deviation σ. MINITAB estimates β0 by b0, β1 by b1, and σ by s. The fitted equation is then

where is called the predicted or fitted value. In this example, b0 is 1.12 and b1 is 0.218.

Table of Coefficients. The first table in the output gives the estimated coefficients, b0 and b1, along with their standard errors. In addition, a t-value that tests whether the null hypothesis of the coefficient is equal to zero and the corresponding p-value is given. In this example, the p-values that are used to test whether the constant and slope are equal to zero are printed as 0.000, because MINITAB rounds these values to three decimal points. These p-values are actually less than 0.0005. These values indicate that there is sufficient evidence that the coefficients are not zero for likely Type I error rates (α levels).

S = 0.1274. This is an estimate of σ, the estimated standard deviation about the regression line. Note that

R-Sq = 95.7%. This is R2, also called the coefficient of determination. Note that R2 = Correlation (Y, )2. Also,

R2 = (SS Regression) / (SS Total)

The R2 value is the proportion of variability in the Y variable (in this example, Score2) accounted for by the predictors (in this example, Score1).

R-Sq(adj) = 95.1%. This is R2 adjusted for degrees of freedom. If a variable is added to an equation, R2 will get larger even if the added variable is of no real value. To compensate for this, MINITAB also prints R-Sq (adj), which is an approximately unbiased estimate of the population R2 that is calculated by the formula

converted to a percent, where p is the number of coefficients fit in the regression equation (2 in our example). In the same notation, the usual R2 is

Analysis of Variance. This table contains sums of squares (abbreviated SS). SS Regression is sometimes written SS (Regression | b0) and sometimes called SS Model. SS Error is sometimes written as SS Residual, SSE, or RSS. MS Error is often written as MSE. SS Total is the total sum of squares corrected for the mean. Use the analysis of variance

Y b0 b1X+=

Y

s2

MSError=

Y

R2

adj( ) 1 SS Error n p–( )⁄SS Total n 1–( )⁄--------------------------------------------–=

R2

1 SS Error

SS Total--------------------–=






table to assess the overall fit. The F-test is a test of the hypothesis H0: All regression coefficients, excepting β0, are zero.

Unusual Observations. Unusual observations are marked with an X if the predictor is unusual (large leverage), and they are marked with an R if the response is unusual (large standardized residual). See Choosing a residual type on page 2-5 and Identifying outliers on page 2-9. The default is to print only unusual observations. You can choose to print a full table of fitted values by selecting this option in the Results subdialog box.

The Fit or fitted Y value is sometimes called predicted Y value or . SE Fit is the (estimated) standard error of the fitted value. St Resid is the standardized residual.

Predicted Values. The interval displayed under 95% CI is the confidence interval for the population mean of all responses (Score2) that correspond to the given value of the predictor (Score1 = 8.2). The interval displayed under 95% PI is the prediction interval for an individual observation taken at Score1 = 8.2. The confidence interval is appropriate for the data used in the regression. If you have new observations, use the prediction interval. See Prediction of new observations on page 2-9.

Regression analysis would not be complete without examining residual patterns. The following multiple regression example and residual plots procedure provide additional information about regression analysis.

e Example of a multiple regression

As part of a test of solar thermal energy, you measure the total heat flux from homes. You wish to examine whether total heat flux (Heatflux) can be predicted by insulation, by the position of the focal points in the east, south, and north directions, and by the time of day. Data are from [21], page 486. You found, using best subsets regression on page 2-23, that the best two-predictor model included the variables North and South and the best three-predictor added the variable East. You would like to evaluate the three-predictor model using multiple regression.



3 In Response, enter Heatflux.

4 In Predictors, enter North South East. Click OK.

Y






Sessionwindowoutput

Regression Analysis: HeatFlux versus North, South, East The regression equation isHeatFlux = 389 - 24.1 North + 5.32 South + 2.12 East

Predictor Coef SE Coef T PConstant 389.17 66.09 5.89 0.000North -24.132 1.869 -12.92 0.000South 5.3185 0.9629 5.52 0.000East 2.125 1.214 1.75 0.092

S = 8.598 R-Sq = 87.4% R-Sq(adj) = 85.9%


Source DF SS MS F PRegression 3 12833.9 4278.0 57.87 0.000Residual Error 25 1848.1 73.9Total 28 14681.9

Source DF Seq SSNorth 1 10578.7South 1 2028.9East 1 226.3

Unusual ObservationsObs North HeatFlux Fit SE Fit Residual St Resid 4 17.5 230.70 210.20 5.03 20.50 2.94R 22 17.6 254.50 237.16 4.24 17.34 2.32R

R denotes an observation with a large standardized residual


MINITAB fits the regression model

where Y is the response, X1, X2, and X3 are the predictors, β0, β1, β2, and β3 are the regression coefficients, and ε is an error term having a normal distribution with mean of 0 and standard deviation σ.

The multiple regression output is similar to the simple regression output, but it also includes the sequential sums of squares. Sequential sums of squares differ from t-statistics. T-statistics test the null hypothesis that each coefficient is zero, given that all other variables are present in the model. The sequential sums of squares are the unique sums of squares of the current variable, given the sums of squares of any previously entered variables.

For example, in the sequential sums of squares column of the Analysis of Variance table, the value for North (10578.7) is the sums of squares for North; the value for South (2028.9) is the unique sums of squares for South given the sums of squares for North;

Y β0 β1X1 β2X2 β3X3 e+ + + +=



Chapter 2 Stepwise Regression



and the value for East (226.3) is the unique sums of squares for East given the sums of squares of North and South.

The first line in the sequential sums of squares table gives SS (b1 | b0), or the reduction in SS Error due to fitting the b1 term (an equivalent is to use X1 as a predictor), assuming that you have already fit b0. The next line gives SS (b2 | b0, b1), or the reduction in SS Error due to fitting the b2 term, assuming that you have already fit the terms b0 and b1. The next line is SS (b3 | b0, b1, b2), and so on. If you want a different sequence, say SS (b2 | b0, b3), then repeat the regression procedure and enter X3 first, then X2. MINITAB does not print p-values for the sequential sums of squares. Except for the last sequential sums of squares, the mean square error should not be used to test the significance of these terms.

In this example, t-test p-values of less than 0.0005 indicate that there is significant evidence that the coefficients of variables North and South are not zero. The coefficient of the variable East, however, has an t-test p-value of 0.092. If the evidence for the coefficient not being zero appears insufficient and if it adds little to the prediction, you may choose the more parsimonious model with predictors North and South. Make this decision only after examining the residuals. In the residual plots example on page 2-28, you examine the residuals from the model with predictors North and South. (Alternatively, you could have used the graphs available in the Graphs subdialog box.)

Stepwise RegressionStepwise regression removes and adds variables to the regression model for the purpose of identifying a useful subset of the predictors. MINITAB provides three commonly used procedures: standard stepwise regression (adds and removes variables), forward selection (adds variables), and backward elimination (removes variables).

Data

Enter response and predictor variables in the worksheet in numeric columns of equal length so that each row in your worksheet contains measurements on one observation or subject. MINITAB automatically omits rows with missing values from the calculations.



Stepwise Regression Regression



h To do a stepwise regression

1 Choose Stat ➤ Regression ➤ Stepwise.

2 In Response, enter the numeric column containing the response (Y) data.

3 In Predictors, enter the numeric columns containing the predictor (X) variables.


Options

Stepwise dialog box

■ By entering variables in Predictors to include in every model, you can designate a set of predictor variables that cannot be removed from the model, even when their p-values are less than the Alpha to enter value.

Method subdialog box

■ perform standard stepwise regression (adds and removes variables), forward selection (adds variables), or backward elimination (removes variables).

■ when you choose the Stepwise method, you can enter a starting set of predictor variables in Enter. These variables are removed if their p-values are greater than the Alpha to enter value. If you want keep variables in the model regardless of their p-values, enter them in Predictors to include in every model in the main dialog box. See Stepwise regression (default) on page 2-16.

■ when you choose the Stepwise or Forward selection method, you can set the value of the α for entering a new variable in the model in Alpha to enter. See Stepwise regression (default) and Forward selection on page 2-17.

■ when you choose the Stepwise or Backward elimination method, you can set the value of α for removing a variable from the model in Alpha to remove. See Stepwise regression (default) and Backward elimination on page 2-17.







■ display the next best alternate predictors up to the number requested. If a new predictor is entered into the model, MINITAB displays the predictor which was the second best choice, the third best choice, and so on, up to the requested number.

■ set the number of steps between pauses. See User intervention on page 2-17.

■ exclude the intercept term from the regression by unchecking Fit Intercept. See Regression through the origin on page 2-7.

Method

MINITAB provides three commonly used procedures: standard stepwise regression (page 2-16), forward selection (page 2-17), and backward elimination (page 2-17)

Stepwise regression (default)

The basic method of stepwise regression is to calculate an F-statistic for each variable in the model. Suppose the model contains X1, …, Xj. Then the F-statistic for Xi is

with 1 and n − j − 1 degrees of freedom.

MINITAB then determines the corresponding p-value. If the p-value for any variable is greater than Alpha to remove, the variable with the largest p-value is removed from the model. The regression equation is calculated for this smaller model, the results are printed, and the procedure proceeds to a new step.

If no variable can be removed, the procedure attempts to add a variable. An F-statistic is calculated for each variable not yet in the model. Suppose the model, at this stage, contains X1, …, Xp. Then the F-statistic for a new variable, Xj + 1 is

MINITAB then determines the corresponding p-value. The variable with the smallest p-value is then added, provided its p-value is smaller than Alpha to enter. Adding this variable is equivalent to choosing the variable with the largest partial correlation or to choosing the variable that most effectively reduces the error SS. The regression equation is then calculated, results are displayed, and the procedure goes to a new step.

When no more variables can be entered into or removed from the model, the stepwise procedure ends.

SSE X1 … X i 1–( ) X i 1+( ) … Xj, ,,, ,[ ] SSE X1 … Xj, ,[ ]–MSE X1 … Xj, ,[ ]

-------------------------------------------------------------------------------------------------------------------------------------------------

SSE X1 … Xj, ,[ ] SSE X1 … Xj Xj 1+,, ,[ ]–

MSE X1 … Xj X j 1+,, ,[ ]-----------------------------------------------------------------------------------------------------------






Forward selection

This procedure adds variables to the model using the same method as the stepwise procedure. Once added, however, a variable is never removed. The forward selection procedure ends when none of the candidate variables have a p-value smaller than Alpha to enter.

Backward elimination

This procedure starts with the model that contains all the predictors and then removes variables, one at a time, using the same method as the stepwise procedure. No variable, however, can re-enter the model. The backward elimination procedure ends when none of the variables included the model have a p-value greater than Alpha to remove.

User intervention

Stepwise proceeds automatically by steps and then pauses. You can set the number of steps between pauses in the Options subdialog box.

The number of steps can start at one with the default and maximum determined by the output width. Set a smaller value if you wish to intervene more often. You must check Editor ➤ Enable Commands in order to intervene and use the procedure interactively. If you do not, the procedure will run to completion without pausing.

At the pause, MINITAB displays a MORE? prompt. At this prompt, you can continue the display of steps, terminate the procedure, or intervene by typing a subcommand.

To … Type

display another “page” of steps (or until no more predictors can enter or leave the model)

YES

terminate the procedure NO

enter a set of variables ENTER C…C

remove a set of variables REMOVE C…C

force a set of variables to be in model FORCE C…C

display the next best alternate predictors BEST K

set the number of steps between pauses STEPS K

change F to enter FENTER K

change F to remove FREMOVE K

change α to enter AENTER K

change α to remove AREMOVE K






Use of variable selection procedures

Variable selection procedures can be a valuable tool in data analysis, particularly in the early stages of building a model. At the same time, these procedures present certain dangers. Here are some considerations:

■ Since the procedures automatically “snoop” through many models, the model selected may fit the data “too well.” That is, the procedure can look at many variables and select ones which, by pure chance, happen to fit well.

■ The three automatic procedures are heuristic algorithms, which often work very well but which may not select the model with the highest R2 value (for a given number of predictors).

■ Automatic procedures cannot take into account special knowledge the analyst may have about the data. Therefore, the model selected may not be the best from a practical point of view.

e Example of a stepwise regression

Students in an introductory statistics course participated in a simple experiment. Each student recorded his or her height, weight, gender, smoking preference, usual activity level, and resting pulse. They all flipped coins, and those whose coins came up heads ran in place for one minute. Afterward, the entire class recorded their pulses once more. You wish to find the best predictors for the second pulse rate.

1 Open the worksheet PULSE.MTW.

2 Press c+M to make the Session window active.

3 Check Editor ➤ Enable Commands.

4 Choose Stat ➤ Regression ➤ Stepwise.

5 In Response, enter Pulse2.

6 In Predictors, enter Pulse1 Ran–Weight.

7 Click Options.

8 In Number of steps between pauses, enter 2. Click OK in each dialog box.

9 At the first More? prompt, type Yes.

10 At the second More? prompt, type No.






Sessionwindowoutput

Stepwise Regression: Pulse2 versus Pulse1, Ran, ... F-to-Enter: 4 F-to-Remove: 4

Response is Pulse2 on 6 predictors, with N = 92

Step 1 2Constant 10.28 44.48

Pulse1 0.957 0.912T-Value 7.42 9.74P-Value 0.000 0.000

Ran -19.1T-Value -9.05P-Value 0.000

S 13.5 9.82R-Sq 37.97 67.71R-Sq(adj) 37.28 66.98C-p 103.2 13.5 More? (Yes, No, Subcommand, or Help)

Step 3Constant 42.62

Pulse1 0.812T-Value 8.88P-Value 0.000

Ran -20.1T-Value -10.09P-Value 0.000

Sex 7.8T-Value 3.74P-Value 0.000

S 9.18R-Sq 72.14R-Sq(adj) 71.19C-p 1.9 More? (Yes, No, Subcommand, or Help)


This example uses six predictors. You requested that MINITAB do two steps of the automatic stepwise procedure, display the results, and allow you to intervene.

The first “page” of output gives results for the first two steps. In step 1, the variable Pulse1 entered the model; in step 2, the variable Ran entered. No variables were removed on either of the first two steps. For each model, MINITAB displays the constant



Chapter 2 Best Subsets Regression



term, the coefficient and its t-value for each variable in the model, S (square root of MSE), and R2.

Because you answered YES at the MORE? prompt, the automatic procedure continued for one more step, adding the variable Sex. At this point, no more variables could enter or leave, so the automatic procedure stopped and again allowed you to intervene. Because you do not want to intervene, you typed NO.

The stepwise output is designed to present a concise summary of a number of fitted models. If you want more information on any of the models, you can use the regression procedure (page 2-3).

Best Subsets RegressionBest subsets regression generates regression models using the maximum R2 criterion by first examining all one-predictor regression models and then selecting the two models giving the largest R2. MINITAB displays information on these models, examines all two-predictor models, selects the two models with the largest R2, and displays information on these two models. This process continues until the model contains all predictors.

Data

Enter response and predictor variables in the worksheet in numeric columns of equal length so that each row in your worksheet contains measurements on one unit or subject. MINITAB automatically omits rows with missing values from all models.

When there are a large number of predictors, there are many subsets. There is a limit to the number of free predictors (variables which may optionally enter the model). If you have m free predictors and force q predictors to be in the model, then (m − q) must be less than or equal to 31.

In general, best subsets regression can take a long time when there are 15 or more free predictors. The length of time, however, varies with the data set. If you have to analyze a very large data set, consider forcing certain predictors to be in the model, thereby decreasing the number of free variables.



Best Subsets Regression Regression



h To do a best subsets regression

1 Choose Stat ➤ Regression ➤ Best Subsets.

2 In Response, enter the numeric column containing the response (Y) data.

3 In Free predictors, enter up to 31 numeric columns containing the predictor (X) variables. MINITAB can include any of these free predictors if they meet the specified criteria (for example, greater than F to enter).


Options

Best Subsets Regression dialog box

■ force a set of predictors to be in the model by entering these variables in Predictors in all models. These variables cannot be removed, even when their F-statistics are less than the F to Enter value.


■ display information for a specified number of free variables in the models. You can give a minimum and maximum. For example, if you specify 3 as the minimum and 6 as the maximum, MINITAB will display the best 3, 4, 5, and 6 variable models (this number does not include variables forced into the model).

■ display information about the “best” models of each variable number. You can specify the number of “best” models to display. For example, if you specify 3, MINITAB will display the best, second best, and third best models for each number of variables. You can enter a value from 1 to 5 (the default is 2).

■ exclude the intercept term from the regression by unchecking Fit Intercept—see Regression through the origin on page 2-7.



Chapter 2 Best Subsets Regression



Using the best subsets regression procedure

The best subsets regression procedure can be used to select a group of likely models for further analysis. The general method is to select the smallest subset that fulfills certain statistical criteria. The reason that you would use a subset of variables rather than a full set is because the subset model may actually estimate the regression coefficients and predict future responses with smaller variance than the full model using all predictors [15].

The statistics R2, adjusted R2, Cp, and s (square root of MSE) are calculated by the best subsets procedure and can be used as comparison criteria.

Typically, you would only consider subsets that provide the largest R2 value. However, R2 always increases with the size of the subset. For example, the best 5-predictor model will always have a higher R2 than the best 4-predictor model. Therefore, R2 is most useful when comparing models of the same size. When comparing models with the same number of predictors, choosing the model with the highest R2 is equivalent to choosing the model with the smallest SSE.

Use adjusted R2 and Cp to compare models with different numbers of predictors. In this case, choosing the model with the highest adjusted R2 is equivalent to choosing the model with the smallest mean square error (MSE). If adjusted R2 is negative (usually when there is a large number of predictors and small R2) then MINITAB sets the adjusted R2 to zero.

The Cp statistic is given by the formula

where SSEp is SSE for the best model with p parameters (including the intercept, if it is in the equation), and MSEm is the mean square error for the model with all m predictors.

In general, look for models where Cp is small and close to p. If the model is adequate (i.e., fits the data well), then the expected value of Cp is approximately equal to p (the number of parameters in the model). A small value of Cp indicates that the model is relatively precise (has small variance) in estimating the true regression coefficients and predicting future responses. This precision will not improve much by adding more predictors. Models with considerable lack-of-fit have values of Cp larger than p. See [15] for additional information on Cp.

Exercise caution when using variable selection procedures such as best subsets (and stepwise regression). These procedures are automatic and therefore do not consider the practical importance of any of the predictors. In addition, anytime you fit a model to data, the goodness of the fit comes from two basic sources:

■ fitting the underlying structure of the data (a structure that will appear in other data sets gathered in the same way)

■ fitting the peculiarities of the one particular data set you analyze

CpSSEp

MSEm

----------------- n 2p–( )–=



Best Subsets Regression Regression



Unfortunately, when you search through many models to find the “best,” as you do in best subsets regression, a good fit is often chosen largely for the second reason. There are two ways that you can verify a model obtained by a variable selection procedure. You can

■ verify the model using a new set of data.

■ take the original data set and randomly divide it into two parts. Then use the variable selection procedure on one part to select a model and verify the fit using the second part.

e Example of best subsets regression

Total heat flux is measured as part of a solar thermal energy test. You wish to see how total heat flux is predicted by other variables: insolation, the position of the focal points in the east, south, and north directions, and the time of day. Data are from Montgomery and Peck [21], page 486.

1 Open the worksheet EXH_REGR.

2 Choose Stat ➤ Regression ➤ Best Subsets.


4 In Free Predictors, enter Insolation-Time. Click OK.

Sessionwindowoutput

Best Subsets Regression: HeatFlux versus Insolation, East, ... Response is HeatFlux

I n s o S N l E o o T a a u r i t s t t m Vars R-Sq R-Sq(adj) C-p S i t h h e

1 72.1 71.0 38.5 12.328 X 1 39.4 37.1 112.7 18.154 X 2 85.9 84.8 9.1 8.9321 X X 2 82.0 80.6 17.8 10.076 X X 3 87.4 85.9 7.6 8.5978 X X X 3 86.5 84.9 9.7 8.9110 X X X 4 89.1 87.3 5.8 8.1698 X X X X 4 88.0 86.0 8.2 8.5550 X X X X 5 89.9 87.7 6.0 8.0390 X X X X X


Each line of the output represents a different model. Vars is the number of variables or predictors in the model. The statistics R2, adjusted R2, Cp, and s are displayed next (R2



Chapter 2 Fitted Line Plot



and adjusted R2 are converted to percentages). Predictors that are present in the model are indicated by an X.

In this example, the best one-predictor model uses North (R2 adj = 71.0) and the second-best one-predictor model uses Insolation (R2 adj = 37.1). Moving from the best one-predictor model to the best two-predictor model increased the adjusted R2 from 71.0 to 84.8. R2 usually increases slightly as more predictors are added even when the new predictors do not improve the model. The best two-predictor model might be considered as the minimum fit. The multiple regression example on page 2-12 and the residual plots example on page 2-28 indicate that adding the variable East does not improve the fit of the model.

Fitted Line PlotThis procedure performs regression with linear and polynomial (second or third order) terms, if requested, of a single predictor variable and plots a regression line through the data, on the actual or log10 scale. Polynomial regression is one method for modeling curvature in the relationship between a response variable (Y) and a predictor variable (X) by extending the simple linear regression model to include X2 and X3 as predictors.

Data

Enter your response and single predictor variables in the worksheet in numeric columns of equal length so that each row in your worksheet contains measurements on one unit or subject. MINITAB automatically omits rows with missing values from the calculations.

h To do a fitted line plot

1 Choose Stat ➤ Regression ➤ Fitted Line Plot.

2 In Response (Y), enter the numeric column containing the response data.

3 In Predictor (X), enter the numeric column containing the predictor variable.




Fitted Line Plot Regression



Options

Fitted Line Plot dialog box

■ choose a linear (default), quadratic, or cubic regression model to automatically include all lower order terms. See Polynomial regression model choices on page 2-25.


■ transform the y-variable by log10Y. You can also choose to display the y-scale in the log10 scale.

■ transform the x-variable by log10X. You can also choose to display the plot x scale in the log10 scale. If you use this option with polynomials of order greater than one, then the polynomial regression will be based on powers of the log10X.

■ display confidence bands and prediction bands about the regression line. You can also change the confidence level from the default of 95%.

■ replace the default title with your own title.


■ store the residuals, fits, and regression model coefficients (b0, b1, b2, up to b3 down the column, where bi is the coefficient of the ith power of the predictor or transformed predictor).

■ store the scaled residuals and scaled fits when using the y-variable transformation, log10Y.

Polynomial regression model choices

You can fit the following linear, quadratic, or cubic regression models:

Another way of modeling curvature is to generate additional models by using the log10 of X and/or Y for linear, quadratic, and cubic models. In addition, taking the log10 of Y may be used to reduce right-skewness or nonconstant variance of residuals.

Model type Order Statistical model

linear first

quadratic second

cubic third

Y β0 β1+ X ε+=

Y β0 β1+ X β2X2 ε+ +=

Y β0 β1+ X β2X2 β3X

3 ε+ + +=



Chapter 2 Fitted Line Plot



e Example of plotting a fitted regression line

You are studying the relationship between a particular machine setting and the amount of energy consumed. This relationship is known to have considerable curvature, and you believe that a log transformation of the response variable will produce a more symmetric error distribution. You choose to model the relationship between the machine setting and the amount of energy consumed with a quadratic model.


2 Choose Stat ➤ Regression ➤ Fitted Line Plot.

3 In Response (Y), enter EnergyConsumption.

4 In Predictor (X), enter MachineSetting.

5 Under Type of Regression Model, choose Quadratic.

6 Click Options. Check Logten of Y, Display logscale for Y variable, Display confidence bands, and Display prediction bands. Click OK in each dialog box.

Sessionwindowoutput

Polynomial Regression Analysis: EnergyConsum versus MachineSetti

The regression equation is log(EnergyConsum) = 7.06962 - 0.698628 MachineSetti + 0.0173974 MachineSetti**2 S = 0.167696 R-Sq = 93.1 % R-Sq(adj) = 91.1 %


Source DF SS MS F PRegression 2 2.65326 1.32663 47.1743 0.000Error 7 0.19685 0.02812 Total 9 2.85012

Source DF Seq SS F PLinear 1 0.03688 0.1049 0.754Quadratic 1 2.61638 93.0370 0.000

Graphwindowoutput



Residual Plots Regression




The quadratic model (p-value = 0.000, or actually p-value < 0.0005) appears to provide a good fit to the data. The R2 indicates that machine setting accounts for 93.1% of the variation in log10 of the energy consumed. A visual inspection of the plot reveals that the data are evenly spread about the regression line, implying no systematic lack-of-fit. The lines labeled CI are the 95% confidence limits for the log10 of energy consumed. The lines labeled PI are the 95% prediction limits for new observations.

Residual PlotsYou can generate a set of plots to use for residual analysis by storing fits and residuals using another procedure, such as regression, and then using the Residual Plots procedure to produce a normal score plot, a chart of individual residuals, a histogram of residuals, and a plot of fits versus residuals, all on the same graph.

Data

You must save a column of residuals and a column of fits from another MINITAB procedure. MINITAB automatically omits rows with missing values from the calculations.

h To display the residual plots

1 Choose Stat ➤ Regression ➤ Residual Plots.

2 In Fits, enter the column containing stored fits.

3 In Residuals, enter the column containing the stored residuals.

4 If you like, use the option listed below, then click OK.

Options

You can replace the default title with your own title.



Chapter 2 Residual Plots



e Example of residual plots

You examine the residuals from the best two-predictor model of the best subsets regression example on page 2-23. You determined in the multiple regression example on page 2-12 that adding the third variable from the best three-predictor model may not add appreciably to the fit. You now examine residual patterns from the best two-predictor model to further examine goodness-of-fit.

Step 1: Store the residuals and fits from a regression analysis




4 In Predictors, enter South North.

5 Click Storage. Check Fits and Standardized residuals.

6 Click OK in each dialog box.

Step 2: Generate the residual plots

1 Choose Stat ➤ Regression ➤ Residual Plots.

2 In Fits, enter the column containing the stored fits.

3 In Residuals, enter the column containing the stored residuals. Click OK.

Sessionwindowoutput

Residual Plots

TEST 1. One point more than 3.00 sigmas from center line.Test Failed at points: 22

TEST 2. 9 points in a row on same side of center line.Test Failed at points: 16

Graphwindowoutput



Logistic Regression Overview Regression




The residuals plots procedure generates four plots in one graph window. The normal plot shows an approximately linear pattern that is consistent with a normal distribution. Similarly, the histogram exhibits a pattern that is consistent with a sample from a normal distribution. However, the I Chart (a control chart of individual observations) reveals that one point labeled with a 1—the twenty-second value—is outside the three sigma limits, and another point labeled with a 2 is flagged because it is the ninth in a row on the same side of the mean.

The plot of residuals versus fits shows that the fit tends to be better for higher predicted values. Investigation shows that the highest residual coincides with the highest value of the variable East. Including East in the model and repeating the residual plots procedure showed that no points are flagged as unusual (not shown). The contribution to the fit by the variable East may warrant further investigation.

Logistic Regression OverviewBoth logistic regression and least squares regression investigate the relationship between a response variable and one or more predictors. A practical difference between them is that logistic regression techniques are used with categorical response variables, and linear regression techniques are used with continuous response variables.

MINITAB provides three logistic regression procedures that you can use to assess the relationship between one or more predictor variables and a categorical response variable of the following types:

Both logistic and least squares regression methods estimate parameters in the model so that the fit of the model is optimized. Least squares minimizes the sum of squared errors to obtain parameter estimates, whereas logistic regression obtains maximum likelihood estimates of the parameters using an iterative-reweighted least squares algorithm [19].

Tip You can identify points in the plots using the brushing capabilities. See the Brushing Graphs chapter in MINITAB User’s Guide 1.

Variabletype

Number ofcategories Characteristics Examples

Binary 2 two levels success, failureyes, no

Ordinal 3 or more natural orderingof the levels

none, mild, severefine, medium, coarse

Nominal 3 or more no natural ordering of the levels

blue, black, red, yellowsunny, rainy, cloudy



Chapter 2 Logistic Regression Overview



How to specify the model terms

The logistic regression procedures can fit models with:

■ up to 9 factors and up to 50 covariates

■ crossed or nested factors—see Crossed vs. nested factors on page 3-19

■ covariates that are crossed with each other or with factors, or nested within factors

Model continuous predictors as covariates and categorical predictors as factors. Here are some examples. A is a factor and X is a covariate.

The model for logistic regression is a generalization of the model used in MINITAB’s general linear model (GLM) procedure. Any model fit by GLM can also be fit by the logistic regression procedures. For a discussion of specifying models in general, see Specifying the model terms on page 3-20 and Specifying reduced models on page 3-21. In the logistic regression commands, MINITAB assumes any variable in the model is a covariate unless the variable is specified as a factor. In contrast, GLM assumes that any variable in the model is a factor unless the variable is specified as a covariate.

Model restrictions

Logistic regression models in MINITAB have the restrictions as GLM models:

■ There must be enough data to estimate all the terms in your model, so that the model is full rank. MINITAB will automatically determine if your model is full rank and display a message. In most cases, eliminating some unimportant high order interactions in your model should solve your problem.

■ The model must be hierarchical. In a hierarchical model, if an interaction term is included, all lower order interactions and main effects that comprise the interaction term must appear in the model.

Model terms

A X A∗X fits a full model with a covariate crossed with a factor

A | X an alternative way to specify the previous model

A X X∗X fits a model with a covariate crossed with itself making a squared term

A X(A) fits a model with a covariate nested within a factor



Logistic Regression Overview Regression



Reference levels for factors

MINITAB needs to assign one factor level as the reference level, meaning that the interpretation of the estimated coefficients is relative to this level. MINITAB designates the reference level based on the data type:

■ For numeric factors, the reference level is the level with the least numeric value.

■ For date/time factors, the reference level is the level with the earliest date/time.

■ For text factors, the reference level is the level that is first in alphabetical order.

You can change the default reference level in the Options subdialog box.

For more information, Interpreting the parameter estimates relative to the event and the reference levels on page 2-39.

Logistic regression creates a set of design variables for each factor in the model. If there are k levels, there will be k−1 design variables and the reference level will be coded as 0. Here are two examples of the default coding scheme:

Reference event for the response variable

MINITAB needs to designate one of the response values as the reference event. MINITAB defines the reference event based on the data type:

■ For numeric factors, the reference event is the greatest numeric value.

■ For date/time factors, the reference event is the most recent date/time.

■ For text factors, the reference event is the last in alphabetical order.

You can change the default reference event in the Options subdialog box.

For more information, Interpreting the parameter estimates relative to the event and the reference levels on page 2-39.

Note If you have defined a value order for a text factor, the default rule above does not apply. MINITAB designates the first value in the defined order as the reference value. See Ordering Text Categories in the Manipulating Data chapter in MINITAB User’s Guide 1.

Factor A with 4 levels(1 2 3 4)

Factor B with 3 levels(Temp Pressure Humidity)

reference level

A1 A2 A3 reference level

B1 B21 0 0 0 Humidity 0 02 1 0 0 Pressure 1 03 0 1 0 Temp 0 14 0 0 1

Note If you have defined a value order for a text factor, the default rule above does not apply. MINITAB designates the last value in the defined order as the reference event. See Ordering Text Categories in the Manipulating Data chapter in MINITAB User’s Guide 1.



Chapter 2 Logistic Regression Overview



Worksheet structure

Data used for input to the logistic regression procedures may be arranged in two different ways in your worksheet: as raw (categorical) data, or as frequency (collapsed) data. For binary logistic regression, there are three additional ways to arrange the data in your worksheet: as successes and trials, as successes and failures, or as failures and trials. These ways are illustrated here for the same data.

The response entered as raw data or as frequency data

C1 C2 C3 C4Response Factor Covar

0 1 121 1 121 1 12...

......

1 1 120 2 121 2 12...

......

1 2 12...

......

C1 C2 C3 C4Response Count Factor Covar

0 1 1 121 19 1 120 1 2 121 19 2 120 5 1 241 15 1 240 4 2 241 16 2 240 7 1 501 13 1 500 8 2 501 12 2 500 11 1 1251 2 1 1250 9 2 1251 11 2 1250 19 1 2001 1 1 2000 18 2 2001 2 2 200

Raw Data: one row for each observation

Frequency Data: one row for each combination of factor and

1

19

1

19



Binary Logistic Regression Regression



The binary response entered as the number of successes, failures, or trials

Enter one row for each combination of factor and covariate.

Use caution when viewing large regression coefficients

If the absolute value of the regression coefficient is large, exercise caution in judging the p-value of the test. When the absolute regression coefficients are large, their calculated standard errors can be too large, leading you to conclude that they are not significant [13]. If you have one or more large absolute regression coefficients for the factor(s) and/or covariate(s), the best test is to perform logistic regression both with and without these terms and make a conclusion based upon the change in the log-likelihood.

If you do test the significance of model terms in this way, your test statistic will be −2∗(log-likelihood from reduced model − log-likelihood from full model). To compute the p-value for this test, choose Calc ➤ Probability Distributions ➤ Chi-square. In Degrees of freedom, enter the model degrees of freedom from full model − model degrees of freedom from reduced model, where the model degrees of freedom are the number of estimated coefficients. Check Input constant, and enter the test statistic from above. Store the answer in a constant, say k1, and then calculate the p-value as 1 − k1 using Calc ➤ Calculator.

Binary Logistic RegressionUse binary logistic regression to perform logistic regression on a binary response variable. A binary variable only has two possible values, such as presence or absence of a particular disease. A model with one or more predictors is fit using an iterative-reweighted least squares algorithm to obtain maximum likelihood estimates of the parameters [19].

Successes and TrialsC1 C2 C3 C4S T Factor Covar19 20 1 1219 20 2 1215 20 1 2416 20 2 2413 20 1 5012 20 2 509 20 1 12511 20 2 1251 20 1 2002 20 2 200

Successes and FailuresC1 C2 C3 C4S F Factor Covar

19 1 1 1219 1 2 1215 5 1 2416 4 2 2413 7 1 5012 8 2 509 11 1 125

11 9 2 1251 19 1 2002 18 2 200

Failures and TrialsC1 C2 C3 C4F T Factor Covar1 20 1 121 20 2 125 20 1 244 20 2 247 20 1 508 20 2 5011 20 1 1259 20 2 12519 20 1 20018 20 2 200



Chapter 2 Binary Logistic Regression



Binary logistic regression has also been used to classify observations into one of two categories, and it may give fewer classification errors than discriminant analysis for some cases [10], [23].

Data

Your data must be arranged in your worksheet in one of five ways: as raw data, as frequency data, as successes and trials, as successes and failures, or as failures and trials. See Worksheet structure on page 2-32.

Factors, covariates, and response data can be numeric, text, or date/time. The reference level and the reference event depend on the data type. See Reference levels for factors on page 2-31 and Reference event for the response variable on page 2-31 for details.

The predictors may either be factors (nominal variables) or covariates (continuous variables). Factors may be crossed or nested. Covariates may be crossed with each other or with factors, or nested within factors.

The model can include up to 9 factors and 50 covariates. Unless you specify a predictor in the model as a factor, the predictor is assumed to be a covariate. Model continuous predictors as covariates and categorical predictors as factors. See How to specify the model terms on page 2-30 for more information.

MINITAB automatically omits observations with missing values from all calculations.

h To do a binary logistic regression

1 Choose Stat ➤ Regression ➤ Binary Logistic Regression.

2 Do one of the following:

■ If your data is in raw form, choose Response and enter the column containing the response variable.






■ If your data is in frequency form, choose Response and enter the column containing the response variable. In Frequency, enter the column containing the count or frequency variable.

■ If your data is in success-trial, success-failure, or failure-trial form, choose Success with Trial, Success with Failure, or Failure with Trial, and enter the respective columns in the accompanying boxes.

See Worksheet structure on page 2-32.

3 In Model, enter the model terms. See How to specify the model terms on page 2-30.


Options

Binary Logistic Regression dialog box

■ include categorical variables (factors) in the model

Graphics subdialog box

■ plot delta Pearson χ2, delta deviance, delta β based on standardized Pearson residuals, and delta β based on Pearson residuals versus:– the estimated event probability for each distinct factor/covariate pattern– the leverage for each distinct factor/covariate pattern

See Regression diagnostics and residual analysis on page 2-38.


■ specify the link function: logit (the default), normit (also called probit), or gompit (also called complementary log-log)—see Link functions on page 2-46

■ change the reference event of the response or the reference levels for the factors—see Interpreting the parameter estimates relative to the event and the reference levels on page 2-39

■ specify initial values for model parameters or parameter estimates for a validation model—see Entering initial values for parameter estimates on page 2-37

■ change the maximum number of iterations for reaching convergence (the default is 20)

■ change the number of groups for the Hosmer-Lemeshow goodness-of-fit test from the default of 10—see Groups for the Hosmer-Lemeshow goodness-of-fit test on page 2-38







■ display the following in the Session window:– no output.– basic information on response, parameter estimates, the log-likelihood, and the

test for all slopes being zero.– the default output, which includes the above output plus three goodness-of-fit

tests (Pearson, deviance, and Hosmer-Lemeshow), a table of observed and expected frequencies, and measures of association.

– the default, along with factor level values, and tests for terms with more than 1 degree of freedom. If you choose the logit link function, MINITAB also prints two Brown goodness-of-fit tests.

■ display the log-likelihood at each iteration of the parameter estimation process.


■ store the following diagnostic measures:– Pearson, standardized Pearson, and deviance residuals– changes (delta) in: Pearson χ2, the deviance statistic, and the estimated regression

coefficients based on either standardized Pearson or Pearson residuals when the respective factor/covariate patterns are removed

– leverages, the diagonals of the hat matrix

See Regression diagnostics and residual analysis on page 2-38.

■ store the following characteristics of the estimated regression equation:– predicted probabilities of success– estimated model coefficients, their standard errors, and the variance-covariance

matrix of the estimated coefficients– the log-likelihood for the last maximum likelihood iteration

■ store the following aggregated data:– the number of occurrences for each factor/covariate pattern– the number of trials for each factor/covariate pattern

Link functions

MINITAB provides three link functions—logit (the default), normit (also called probit), and gompit (also called complementary log-log)—allowing you to fit a broad class of binary response models. These are the inverse of the cumulative logistic distribution function (logit), the inverse of the cumulative standard normal distribution function (normit), and the inverse of the Gompertz distribution function (gompit). This class of models is defined by:






g(πj) = , where

The link function is the inverse of a distribution function. The link functions and their corresponding distributions are summarized below (pi in the variance is 3.14159):

You want to choose a link function that results in a good fit to your data. Goodness-of-fit statistics can be used to compare fits using different link functions. Certain link functions may be used for historical reasons or because they have a special meaning in a discipline.

An advantage of the logit link function is that it provides an estimate of the odds ratios. For a comparison of link functions, see [19].

Entering initial values for parameter estimates

There are several scenarios for which you might enter values for parameter estimates. For example, you may wish to give starting estimates so that the algorithm converges to a solution, or you may wish to validate a model with an independent sample.

■ Convergence—The maximum likelihood solution may not converge if the starting estimates are not in the neighborhood of the true solution. If the algorithm does not converge to a solution, you can specify what you think are good starting values for parameter estimates in Starting estimates for algorithm in the Options subdialog box.

■ Validation—You may also wish to validate the model with an independent sample. Typically, this is done by splitting the data into two subsets. Use the first set to estimate and store the coefficients. If you enter these estimates in Estimates for validation model in the Options subdialog box, MINITAB will use these values as the parameter estimates rather than calculating new parameter estimates. Then, you can assess the model fit for the independent sample.

πj = the probability of a response for the jth factor/covariate pattern

g(πj) = the link function (described below)

β0 = the intercept

= a vector of predictor variables associated with the jth factor/covariate pattern

β = a vector of unknown coefficients associated with the predictors

Name Link function Distribution Mean Variance

logit g(πj) = loge(πj / (1− πj)) logistic 0 pi2 / 3

normit g(πj) = Φ -1(πj) normal 0 1

gompit g(πj) = loge(−loge(1− πj)) Gompertz −γ (Euler constant) pi2 / 6

β0 x′j+ β

x′j






In both cases, enter a column with the first entry being the constant estimate, and the remaining entries corresponding to the model terms in the order in which they appear in the Model box or the output.

Groups for the Hosmer-Lemeshow goodness-of-fit test

The Hosmer-Lemeshow statistic is the chi-square goodness-of-fit statistic from a 2 × (the number of groups) table. The default number of groups is 10. This may work for a large number of problems but if the number of distinct factor/covariate patterns is small or large you may wish to adjust the number of groups. Hosmer and Lemeshow suggest using a minimum of six groups. See [16] for details.

Regression diagnostics and residual analysis

Following any modeling procedure, it is a good idea to assess the validity of your model. Logistic regression has a collection of diagnostic plots, goodness-of-fits tests, and other diagnostic measures to do this. These residuals and diagnostic statistics allow you to identify factor/covariate patterns that are either poorly fit by the model, have a strong influence upon the estimated parameters, or which have a high leverage. MINITAB provides different options for each of these, as listed in the following table (See Help for computational details). Hosmer and Lemeshow [16] suggest that you interpret these diagnostics jointly to understand any potential problems with the model.

To identify… Use... Which measures...

poorly fit factor/covariate patterns

Pearson residual the difference between the actual and predicted observation

standardized Pearson residual

the difference between the actual and predicted observation, but standardized to have σ = 1

deviance residual deviance residuals, a component of deviance χ2

delta chi-square changes in the Pearson χ2 when the jth factor/covariate pattern is removed

delta deviance changes in the deviance when the jth factor/covariate pattern is removed

factor/covariate patterns with a strong influence on parameter estimates

delta beta changes in the coefficients when the jth factor/covariate pattern is removed—based on Pearson residuals

delta beta based on standardized Pearson residuals

changes in the coefficients when the jth factor/covariate pattern is removed—based on standardized Pearson residuals

factor/covariate patterns with a large leverage

leverage (Hi) leverages of the jth factor/covariate pattern, a measure of how unusual predictor values are






The graphs available in the Graphs subdialog box allow you to visualize some of these diagnostics jointly; you can plot a measure useful for identifying poorly fit factor/covariate patterns (delta chi-square or delta deviance) or a measure useful for identifying a factor/covariate pattern with a strong influence on parameter estimates (one of the delta beta statistics) versus either the estimated event probability or leverage. The estimated event probability is the probability of the event, given the data and model. Leverages are used to assess how unusual the predictor values are (see Identifying outliers on page 2-9). See [16] for a further discussion of diagnostic plots. You can use MINITAB’s graph brushing capabilities to identify points. See the Brushing Graphs chapter in MINITAB User’s Guide 1 for more information.

Interpreting the parameter estimates relative to the event and the reference levels

The interpretation of the parameter estimates depends on: the link function (see Link functions on page 2-36), reference event (see Reference event for the response variable on page 2-31), and reference factor levels (see Reference levels for factors on page 2-31). A parameter estimate associated with a predictor (factor or covariate) represents the change in the link function for each unit change in the predictor, while all other predictors are held constant. A unit change in a factor refers to a comparison of a certain level to the reference level.

The logit link provides the most natural interpretation of the parameter estimates and is therefore the default link in MINITAB. A summary of the interpretation follows:

■ The odds of a reference event is the ratio of P(event) to P(not event). The parameter estimate of a predictor (factor or covariate) is the estimated change in the log of P(event)/P(not event) for each unit change in the predictor, assuming the other predictors remain constant.

■ The parameter estimates can also be used to calculate the odds ratio, or the ratio between two odds. Exponentiating the parameter estimate of a factor yields the ratio of P(event)/P(not event) for a certain factor level compared to the reference level. The odds ratios at different values of the covariate can be constructed relative to zero. In the covariate case, it may be more meaningful to interpret the odds and not the odds ratio. Note that a parameter estimate of zero or an odds ratio of one both imply the same thing—the factor or covariate has no effect.

You can change the event or reference levels in the Options subdialog box if you wish to change how you view the parameter estimates. To change the event, specify the new event value in the Event box. To change the reference level for a factor, specify the factor variable followed by the new reference level in the Reference factor level box. You can specify reference levels for more than one factor at the same time. If the levels are text or date/time, enclose them in double quotes.






e Example of a binary logistic regression

You are a researcher who is interested in understanding the effect of smoking and weight upon resting pulse rate. Because you have categorized the response—pulse rate—into low and high, a binary logistic regression analysis is appropriate to investigate the effects of smoking and weight upon pulse rate.


2 Choose Stat ➤ Regression ➤ Binary Logistic Regression.

3 In Response, enter RestingPulse. In Model, enter Smokes Weight. In Factors (optional), enter Smokes.

4 Click Graphs. Check Delta chi-square vs probability and Delta chi-square vs leverage. Click OK.

5 Click Results. Choose In addition, list of factor level values, tests for terms with more than 1 degree of freedom, and 2 additional goodness-of-fit tests. Click OK in each dialog box.

Sessionwindowoutput

Binary Logistic Regression: RestingPulse versus Smokes, Weight Link Function: Logit

Response Information

Variable Value CountRestingP Low 70 (Event) High 22 Total 92

Factor Information

Factor Levels Values Smokes 2 No Yes

Logistic Regression Table Odds 95% CIPredictor Coef SE Coef Z P Ratio Lower UpperConstant -1.987 1.679 -1.18 0.237Smokes Yes -1.1930 0.5530 -2.16 0.031 0.30 0.10 0.90Weight 0.02502 0.01226 2.04 0.041 1.03 1.00 1.05

Log-Likelihood = -46.820Test that all slopes are zero: G = 7.574, DF = 2, P-Value = 0.023

Goodness-of-Fit Tests

Method Chi-Square DF PPearson 40.848 47 0.724Deviance 51.201 47 0.312Hosmer-Lemeshow 4.745 8 0.784Brown:General Alternative 0.905 2 0.636Symmetric Alternative 0.463 1 0.496

A

B C

E

D






Graphwindowoutput

Table of Observed and Expected Frequencies:(See Hosmer-Lemeshow Test for the Pearson Chi-Square Statistic)

GroupValue 1 2 3 4 5 6 7 8 9 10 TotalLow Obs 4 6 6 8 8 6 8 12 10 2 70 Exp 4.4 6.4 6.3 6.6 6.9 7.2 8.3 12.9 9.1 1.9 High Obs 5 4 3 1 1 3 2 3 0 0 22 Exp 4.6 3.6 2.7 2.4 2.1 1.8 1.7 2.1 0.9 0.1

Total 9 10 9 9 9 9 10 15 10 2 92

Measures of Association:(Between the Response Variable and Predicted Probabilities)

Pairs Number Percent Summary MeasuresConcordant 1045 67.9% Somers’ D 0.38Discordant 461 29.9% Goodman-Kruskal Gamma 0.39Ties 34 2.2% Kendall’s Tau-a 0.14Total 1540 100.0%

G

F






Interpretation of results

The Session window output contains the following seven parts:

a. Response Information—displays the number of missing observations and the number of observations that fall into each of the two response categories. The response value that has been designated as the reference event is the first entry under Value and labeled as the event. In this case, the reference event is low pulse rate (see Reference event for the response variable on page 2-31).

B Factor Information—displays all the factors in the model, the number of levels for each factor, and the factor level values. The factor level that has been designated as the reference level is first entry under Values, the subject does not smoke (see Reference levels for factors on page 2-31).

C Logistic Regression Table—shows the estimated coefficients (parameter estimates), standard error of the coefficients, z-values, and p-values. When you use the logit link function, you also see the odds ratio and a 95% confidence interval for the odds ratio.

■ From the output, you can see that both Smokes (z = −2.16, p = 0.031) and Weight (z = 2.04, p = 0.041) have p-values less than 0.05, indicating that there is sufficient evidence that the parameters are not zero using a significance level of α = 0.05.

■ The coefficient of -1.193 for Smokes represents the estimated change in the log of P(low pulse)/P(high pulse) when the subject smokes compared to when he/she does not smoke, with the covariate Weight held constant. The coefficient of 0.0250 for Weight is the estimated change in the log of P(low pulse)/P(high pulse) with a 1 unit (lb) increase in Weight, with the factor Smokes held constant.

■ Although there is evidence that the parameter of Weight is not zero, the odds ratio is very close to one (1.03), indicating that a one pound increase in weight minimally effects a person’s resting pulse rate. A more meaningful difference would be found if you compared subjects with a larger weight difference (for example, if the weight unit is 10 pounds, the odds ratio becomes 1.28, indicating that the odds of a subject having a low pulse increases by 1.28 times with each 10 pound increase in weight).

■ For Smokes, the negative coefficient of -1.193 and the odds ratio of 0.30 indicate that subjects who smoke tend to have a higher resting pulse rate than subjects who do not smoke. Given that subjects have the same weight, the odds ratio can be interpreted as the odds of non-smokers in the sample having a low pulse being 30% of the odds of smokers having a low pulse.

D Next, the last Log-Likelihood from the maximum likelihood iterations is displayed along with the statistic G. This statistic tests the null hypothesis that all the coefficients associated with predictors equal zero versus these coefficients not all being equal to zero. In this example, G = 7.574, with a p-value of 0.023, indicating that there is sufficient evidence that at least one of the coefficients is different from zero, given that your accepted α level is greater than 0.023.






■ Note that for factors with more than 1 degree of freedom, MINITAB performs a multiple degrees of freedom test with a null hypothesis that all the coefficients associated with the factor are equal to 0 versus them not all being equal to 0. This example does not have a factor with more than 1 degree of freedom.

E Goodness-of-Fit Tests—displays Pearson, deviance, and Hosmer-Lemeshow goodness-of-fit tests. In addition, two Brown tests—general alternative and symmetric alternative—are displayed because you have chosen the logit link function and the selected option in the Results subdialog box. The goodness-of-fit tests, with p-values ranging from 0.312 to 0.724, indicate that there is insufficient evidence to claim that the model does not fit the data adequately. If the p-value is less than your accepted α level, the test would reject the null hypothesis of an adequate fit.

F Table of Observed and Expected Frequencies—allows you to see how well the model fits the data by comparing the observed and expected frequencies. There is insufficient evidence that the model does not fit the data well, as the observed and expected frequencies are similar. This supports the conclusions made by the Goodness of Fit Tests.

G Measures of Association—display a table of the number and percentage of concordant, discordant, and tied pairs, as well as common rank correlation statistics. These values measure the association between the observed responses and the predicted probabilities.

■ The table of concordant, discordant, and tied pairs is calculated by pairing the observations with different response values. Here, you have 70 individuals with a low pulse and 22 with a high pulse, resulting in 70 ∗ 22 = 1540 pairs with different response values. Based on the model, a pair is concordant if the individual with a low pulse rate has a higher probability of having a low pulse, discordant if the opposite is true, and tied if the probabilities are equal. In this example, 67.9% of pairs are concordant and 29.9% are discordant. You can use these values as a comparative measure of prediction, for example in comparing fits with different sets of predictors or with different link functions.

■ Somers’ D, Goodman-Kruskal Gamma, and Kendall’s Tau-a are summaries of the table of concordant and discordant pairs. These measures most likely lie between 0 and 1 where larger values indicate that the model has a better predictive ability. In this example, the measure range from 0.14 to 0.39 which implies less than desirable predictive ability.

Plots—In the example, you chose two diagnostic plots—delta Pearson χ2 versus the estimated event probability and delta Pearson χ2 versus the leverage. Delta Pearson χ2 for the jth factor/covariate pattern is the change in the Pearson χ2 when all observations with that factor/covariate pattern are omitted. These two graphs indicate that two observations are not well fit by the model (high delta χ2). A high delta χ2 can be caused by a high leverage and/or a high Pearson residual. In this case, a high Pearson residual caused the large delta χ2, because the leverages are less than 0.1. Hosmer and Lemeshow indicate that delta χ2 or delta deviance greater than 3.84 is large.

If you choose Editor ➤ Brush, brush these points, and then click on them, they will be identified as data values 31 and 66. These are individuals with a high resting pulse, who



Chapter 2 Ordinal Logistic Regression



do not smoke, and who have smaller than average weights (Weight = 116, 136 pounds). You might further investigate these cases to see why the model did not fit them well.

Ordinal Logistic RegressionUse ordinal logistic regression to perform logistic regression on an ordinal response variable. Ordinal variables are categorical variables that have three or more possible levels with a natural ordering, such as strongly disagree, disagree, neutral, agree, and strongly agree. A model with one or more predictors is fit using an iterative-reweighted least squares algorithm to obtain maximum likelihood estimates of the parameters [19].

Parallel regression lines are assumed, and therefore, a single slope is calculated for each covariate. In situations where this assumption is not valid, nominal logistic regression, which generates separate logit functions, is more appropriate.

Data

Your data may be arranged in one of two ways: as raw data or as frequency data. See Worksheet structure on page 2-32.







Ordinal Logistic Regression Regression



h To do an ordinal logistic regression

1 Choose Stat ➤ Regression ➤ Ordinal Logistic Regression.


■ If you have raw response data, in Response, enter the numeric column containing the response data.

■ If you have frequency data, in Response, enter the numeric column containing the response values. Then, in Frequency, enter the variable containing the counts.




Options

Ordinal Logistic Regression dialog box

■ include categorical variables (factors) in the model


■ specify the link function: logit (the default), normit (also called probit), or gompit (also called complementary log-log)—see Link functions on page 2-46

■ specify the order of the response values or change the reference levels for the factors—see Interpreting the parameter estimates relative to the order of response values and the reference levels on page 2-47


■ change the maximum number of iterations for reaching convergence (default is 20)







■ display the following in the Session window:– no output– information on response, parameter estimates, the log-likelihood, and the test for

all slopes being zero– the default output, which includes the above output plus two goodness-of-fit tests

(Pearson and deviance), and measures of association– the default output plus factor level values, and tests for terms with more than one

degree of freedom

■ display the log-likelihood at each iteration of the parameter estimation process


■ store the following characteristics of the estimated regression equation:– estimated model coefficients, their standard errors, and the variance-covariance

matrix of the estimated coefficients.– the log-likelihood for the last maximum likelihood iteration.

■ store the following event probabilities and/or aggregated data:– the number of trials for each factor/covariate pattern.– event probabilities, cumulative event probabilities, and number of occurrences for

each factor/covariate pattern. Note that the number of events (distinct response values) must be specified to store these values.

Link functions

MINITAB provides three link functions—logit (the default), normit (also called probit), and gompit (also called complementary log-log)—allowing you to fit a broad class of ordinal response models. These are the inverse of the cumulative logistic distribution function (logit), the inverse of the cumulative standard normal distribution function (normit), and the inverse of the Gompertz distribution function (gompit). This class of models is defined by:

g(γij) =

where

k = the number of distinct values of the response or the number of possible events

γij = the cumulative probability up to and including event i for the jth factor/covariate pattern

g(γij) = the link function (described below)

= the constant associated with the ith event

θi x′j+ β, i=1, ..., k-1

θi






The link function is the inverse of a distribution function. The link functions and their corresponding distributions are summarized below (π in the variance is 3.14159):

You want to choose a link function that results in a good fit to your data. Goodness-of-fit statistics can be used to compare the fits using different link functions. Certain link functions may be used for historical reasons or because they have a special meaning in a discipline.

An advantage of the logit link function is that it provides an estimate of the odds ratios. The logit link function is the default. For a comparison of link functions, see [19].

Interpreting the parameter estimates relative to the order of response values and the reference levels

The interpretation of the parameter estimates depends on: the link function (see Link functions on page 2-36), reference event (see Reference event for the response variable on page 2-31), and reference factor levels (see Reference levels for factors on page 2-31). A parameter estimate associated with a predictor (factor or covariate) represents the change in the link function for each unit change in the predictor, while all other predictors are held constant. A unit change in a factor refers to a comparison of a certain level to the reference level.

The logit link provides the most natural interpretation of the estimated coefficients and is therefore the default link in MINITAB. A summary of the interpretation follows:

■ The odds of a reference event is the ratio of P(event) to P(not event). The estimated coefficient of a predictor (factor or covariate) is the estimated change in the log of P(event)/P(not event) for each unit change in the predictor, assuming the other predictors remain constant.

■ The estimated coefficient can also be used to calculate the odds ratio, or the ratio between two odds. Exponentiating the parameter estimate of a factor yields the ratio of P(event)/P(not event) for a certain factor level compared to the reference level. The odds ratios at different values of the covariate can be constructed relative to zero. In the covariate case, it may be more meaningful to interpret the odds and not

= a vector of predictor variables associated with the jth factor/covariate pattern

β = a vector of coefficients associated with the predictors

Name Link function Distribution Mean Variance

logit g(γij) = loge(γij / (1− γij)) logistic 0 pi2 / 3

normit g(γij) = Φ -1(γij) normal 0 1

gompit g(γij) = loge(−loge(1− γij)) Gompertz −γ (Euler constant) pi2 / 6

x′j






the odds ratio. Note that a coefficient of zero or an odds ratio of one both imply the same thing—the factor or covariate has no effect.

You can change the order of response values or the reference level in the Options subdialog box if you wish to change how you view the parameter estimates. If your responses were coded Low, Medium, and High, rather than 1, 2, 3, the default alphabetical ordering of the responses would be improper. To change the order of response values, specify the new order in the Order of the response values box. To order as Low, Medium, and High, enter these values in this order, each enclosed in double quotes. To change the reference level for a factor, specify the factor variable and the new reference level in the Reference factor level box. You can specify reference levels for more than one factor at the same time. If the levels are text or date/time, enclose them in double quotes.

e Example of an ordinal logistic regression

Suppose you are a field biologist and you believe that the adult population of salamanders in the Northeast has gotten smaller over the past few years. You would like to determine whether any association exists between the length of time a hatched salamander survives and level of water toxicity, as well as whether there is a regional effect. Survival time is coded as 1 if it is less than 10 days, 2 if it is equal to 10 to 30 days, and 3 if it is equal to 31 to 60 days.


2 Choose Stat ➤ Regression ➤ Ordinal Logistic Regression.

3 In Response, enter Survival. In Model, enter Region ToxicLevel. In Factors (optional), enter Region.

4 Click Results. Choose In addition, list of factor level values, and tests for terms with more than 1 degree of freedom. Click OK in each dialog box.






Sessionwindowoutput


The Session window contains the following five parts:

a. Response Information—displays the number of observations that fall into each of the response categories and the number of missing observations. The ordered response values, from lowest to highest, are shown. Here, you use the default coding scheme, which orders

Ordinal Logistic Regression: Survival versus Region, ToxicLevel Link Function: Logit

Response Information

Variable Value CountSurvival 1 15 2 46 3 12 Total 73

Factor Information

Factor Levels Values Region 2 1 2

Logistic Regression Table Odds 95% CIPredictor Coef SE Coef Z P Ratio Lower UpperConst(1) -7.043 1.680 -4.19 0.000Const(2) -3.523 1.471 -2.39 0.017Region 2 0.2015 0.4962 0.41 0.685 1.22 0.46 3.23ToxicLev 0.12129 0.03405 3.56 0.000 1.13 1.06 1.21

Log-likelihood = -59.290Test that all slopes are zero: G = 14.713, DF = 2, P-Value = 0.001


Method Chi-Square DF PPearson 122.799 122 0.463Deviance 100.898 122 0.918

Measures of Association:(Between the Response Variable and Predicted Probabilities)

Pairs Number Percent Summary MeasuresConcordant 1127 79.3% Somers’ D 0.59Discordant 288 20.3% Goodman-Kruskal Gamma 0.59Ties 7 0.5% Kendall’s Tau-a 0.32Total 1422 100.0%

D

A

B C

E






the values from lowest to highest: 1 is less than 10 days, 2 is equal to 10 to 30 days, and 3 is equal to 31 to 60 days (see Reference event for the response variable on page 2-31).

B Factor Information—displays all the factors in the model, the number of levels for each factor, and the factor level values. The factor level that has been designated as the reference level is the first entry under Values, region 1 (see Reference levels for factors on page 2-31).

C Logistic Regression Table—shows the estimated coefficients (parameter estimates), standard error of the coefficients, z-values, and p-values. When you use the logit link function, MINITAB displays the calculated odds ratio and a 95% confidence interval for the odds ratio.

■ The values labeled Const(1) and Const(2) are estimated intercepts for the logits of the cumulative probabilities of survival for less than 10 days, and for 10 to 30 days, respectively. Because the cumulative probability for the last response value is 1, there is no need to estimate an intercept for 31 to 60 days.

■ The coefficient of 0.2015 for Region is the estimated change in the logit of the cumulative survival time probability when the region is 2 compared to region being 1, with the covariate ToxicLevel held constant. Because the p-value for this parameter estimate is 0.685, there is insufficient evidence to conclude that Region has an effect upon survival time.

■ There is one parameter estimated for each covariate, which gives parallel lines for the factor levels. Here, the estimated coefficient for the single covariate, ToxicLevel, is 0.121, with a p-value of less than 0.0005. The p-value indicates that there is sufficient evidence to conclude that the toxic level affects survival. The positive coefficient, and an odds ratio that is greater than one, indicates that higher toxic levels tend to be associated with lower values of survival.

■ Next, MINITAB displays the last Log-Likelihood from the maximum likelihood iterations along with the statistic G. This statistics tests the null hypothesis that all the coefficients associated with predictors equal to 0 versus them not all being equal to 0. In this example, G = 14.713 with a p-value of 0.001, indicating that there is sufficient evidence to conclude that at least one of the coefficients is different from zero.

D Goodness-of-Fit Tests—displays both Pearson and deviance goodness-of-fit tests. In our example, the p-value for the Pearson test is 0.463, and the p-value for the deviance test is 0.918, indicating that there is insufficient evidence to claim that the model does not fit the data adequately. If the p-value is less than your selected α level, the tests rejects the null hypothesis of an adequate fit.

E Measures of Association—displays a table of the number and percentage of concordant, discordant, and tied pairs, as well as common rank correlation statistics. These values measure the association between the observed responses and the predicted probabilities.

■ The table of concordant, discordant, and tied pairs is calculated by pairing the observations with different response values. Here, you have fifteen 1’s, fourty-six 2’s, and twelve 3’s, resulting in (15 ∗ 46) + (15 ∗ 12) + (46 ∗ 12) = 1422 pairs of different response values. Pairs involving the lowest coded response value (the 1-2



Nominal Logistic Regression Regression



and 1-3 value pairs in the example) are concordant if the cumulative probability up to the lowest response value (here 1) is greater for the observation with the lowest value. This pattern applies to other value pairs. Pairs involving responses coded as 2 and 3 in this example are concordant if the cumulative probability up to 2 is greater for the observation coded as 2. The pair is discordant if the opposite is true and tied if the cumulative probabilities are equal. In this example, 79.3% of pairs are concordant, 20.3% are discordant, and 0.5% are ties. You can use these values as a comparative measure of prediction (for example, when evaluating predictors and different link functions).

■ Somers’ D, Goodman-Kruskal Gamma, and Kendall’s Tau-a are summaries of the table of concordant and discordant pairs. The numbers have the same numerator (the number of concordant pairs minus the number of discordant pairs). The denominators are the total number of pairs with Somers’ D, the total number of pairs excepting ties with Goodman-Kruskal Gamma, and the number of all possible observation pairs for Kendall’s Tau-a. These measures most likely lie between 0 and 1, where larger values indicate that the model has a better predictive ability.

Nominal Logistic RegressionUse nominal logistic regression to perform logistic regression on a nominal response variable using an iterative-reweighted least squares algorithm to obtain maximum likelihood estimates of the parameters [19]. Nominal variables are categorical variables that have three or more possible levels with no natural ordering. For example, the levels in a food tasting study may include crunchy, mushy, and crispy.

Data

Your data may be arranged in one of two ways: as raw data or as frequency data. See Worksheet structure on page 2-32.






Chapter 2 Nominal Logistic Regression




h To do a nominal logistic regression

1 Choose Stat ➤ Regression ➤ Nominal Logistic Regression.


■ If you have raw response data, in Response, enter the numeric column containing the response data.

■ If you have frequency data, in Response, enter the numeric column containing the response values. Then, in Frequency, enter the variable containing the counts.




Options

Nominal Logistic Regression dialog box

■ define the factors (categorical variables) in the model


■ change the reference event of the response or the reference levels for the factors—see Interpreting the parameter estimates relative to the reference event and reference levels on page 2-54


■ change the maximum number of iterations for reaching convergence from the default of 20







■ display the following in the Session window:– no output– basic information on response, parameter estimates, the log-likelihood, and the

test for all slopes being zero– the default output, which includes the above output plus two goodness-of-fit tests

(Pearson and deviance)– the default output, plus factor level values and tests for terms with more than one

degree of freedom

■ display the log-likelihood at each iteration of the parameter estimation process


■ store characteristics of the estimated regression equation:– estimated model coefficients, their standard errors, and the variance-covariance

matrix of the estimated coefficients.– the log-likelihood for the last maximum likelihood iteration.

■ store event probabilities and/or aggregated data:– the number of trials for each factor/covariate pattern.– event probabilities and number of occurrences for each factor/covariate pattern.

Note that the number of events (distinct response values) must be specified to store these values.






Nominal logistic regression model

The model fit by nominal logistic regression is defined by:

, where

See [16] for additional discussion.

Interpreting the parameter estimates relative to the reference event and reference levels

The interpretation of the parameter estimates depends upon the designated reference event (see Reference event for the response variable on page 2-31) and reference factor levels (see Reference levels for factors on page 2-31). A parameter estimate associated with a predictor represents the change in the particular logit for each unit change in the predictor, assuming that all other factors and covariates are held constant. A one unit change in a factor refers to a comparison of a certain level to the reference level.

If there are k distinct response values, MINITAB estimates k−1 sets of parameter estimates. These are the estimated differences in log odds or logits of levels of the response variable relative to the reference event. Each set contains a constant and coefficients for the factors and the covariates. Note that these sets of parameter estimates give nonparallel lines for the response value. The interpretation of the parameter estimates is as follows:

■ The coefficient of a predictor (factor or covariate) is the estimated change in the log of P(response level)/P(reference event) for each unit change in the predictor, assuming the other predictors remain constant.

■ The coefficient can also be used to calculate the odds ratio, or the ratio between two odds. Exponentiating the parameter estimate of a factor yields the ratio of P(response level)/P(reference event) for a certain factor level compared to the reference level. The odds ratios at different values of the covariate can be constructed relative to zero. In the covariate case, it may be more meaningful to interpret the odds and not

k = the number of distinct values of the response or the number of possible events

πij = the probability of the ith event for the jth factor/covariate pattern (π1j is the probability of the reference event for the jth factor/covariate pattern)

bi0 = the intercept for the (i−1)st logit function= a vector of predictor variables for the jth factor/covariate pattern

bi = a vector of unknown coefficients associated with the predictors for the (i−1)st logit function

logeπi jπ1j-------

βi0 x′j+ βi , i=2, ..., k=

x′j






the odds ratio. Note that a coefficient of zero or an odds ratio of one both imply the same thing—the factor or covariate has no effect.

You can change the reference event or reference levels in the Options subdialog box if you wish to change how you view the parameter estimates. To change the event, specify the new event value in the Reference event box. To change the reference event, give the new level. To change the reference level for a factor, specify the factor variable followed by the new reference level in the Reference factor level box. You can specify reference levels for more than one factor at the same time. If the levels are text or date/time, enclose them in double quotes.

e Example of a nominal logistic regression

Suppose you are a grade school curriculum director interested in what children identify as their favorite subject and how this subject is associated with their age or the teaching method employed. Thirty children, 10 to 13 years old, had classroom instruction in science, math, and language arts that employed either lecture or discussion techniques. At the end of the school year, they were asked to identify their favorite subject. You use nominal logistic regression because the response is categorical but possesses no implicit categorical ordering.


2 Choose Stat ➤ Regression ➤ Nominal Logistic Regression.

3 In Response, enter Subject. In Model, enter TeachingMethod Age. In Factors (optional), enter TeachingMethod.

4 Click Results. Choose In addition, list of factor level values, tests for terms with more than 1 degree of freedom. Click OK in each dialog box.






Sessionwindowoutput


The Session window output contains the following five parts:

a. Response Information—displays the number of observations that fall into each of the response categories (science, math, and language arts), as well as the number of missing observations. The response value that has been designated as the reference event is the first entry under Value and labeled as the reference event. Here, the default coding scheme defines the reference event as science using reverse alphabetical order.

B Factor Information—displays all the factors in the model, the number of levels for each factor, and the factor level values. The factor level that has been designated as the reference

Nominal Logistic Regression: Subject versus TeachingMethod, Age Response Information

Variable Value CountSubject science 10 (Reference Event) math 11 arts 9 Total 30

Factor Information

Factor Levels Values Teaching 2 discuss lecture

Logistic Regression Table Odds 95% CIPredictor Coef SE Coef Z P Ratio Lower UpperLogit 1: (math/science)Constant -1.123 4.564 -0.25 0.806Teaching lecture -0.5631 0.9376 -0.60 0.548 0.57 0.09 3.58Age 0.1247 0.4011 0.31 0.756 1.13 0.52 2.49

Logit 2: (arts/science)Constant -13.848 7.243 -1.91 0.056Teaching lecture 2.770 1.372 2.02 0.044 15.96 1.08 234.91Age 1.0135 0.5845 1.73 0.083 2.76 0.88 8.66

Log-likelihood = -26.446Test that all slopes are zero: G = 12.825, DF = 4, P-Value = 0.012


Method Chi-Square DF PPearson 6.953 10 0.730Deviance 7.886 10 0.640

A

B

C

DE






level is the first entry under Values. Here, the default coding scheme defines the reference level as “discussion” using alphabetical order.

C Logistic Regression Table—shows the estimated coefficients (parameter estimates), standard error of the coefficients, z-values, and p-values. MINITAB also displays the odds ratio and a 95% confidence interval for the odds ratio. The coefficient associated with a predictor is the estimated change in the logit with a one unit change in the predictor, assuming that all other factors and covariates are the same.

■ If there are k distinct response values, MINITAB estimates k−1 sets of parameter estimates, here labeled as Logit(1) and Logit(2). These are the estimated differences in log odds or logits of math and language arts, respectively, compared to science as the reference event. Each set contains a constant and coefficients for the factors (here, teaching method) and the covariates (here, age). The TeachingMethod coefficient is the estimated change in the logit when TeachingMethod is lecture compared to the teaching method being discussion, with Age held constant. The Age coefficient is the estimated change in the logit with a one year increase in age with teaching method held constant. These sets of parameter estimates give nonparallel lines for the response values.

■ The first set of estimated logits, labeled Logit(1), are the parameter estimates of the change in logits of math relative to the reference event, science. The p-values of 0.548 and 0.756 for TeachingMethod and Age, respectively, indicate that there is insufficient evidence to conclude that a change in teaching method from discussion to lecture or a change in age affected the choice of math as favorite subject compared to science.

■ The second set of estimated logits, labeled Logit(2), are the parameter estimates of the change in logits of language arts relative to the reference event, science. The p-values of 0.044 and 0.083 for TeachingMethod and Age, respectively, indicate that there is sufficient evidence, if the p-values are less than your acceptable α level, to conclude that a change in teaching method from discussion to lecture or a change in age affected the choice of language arts as the favorite subject compared to science. The positive coefficient for teaching method indicates students given a lecture style of teaching tend to prefer language arts over science compared to students given a discussion style of teaching. The estimated odds ratio of 15.96 implies that the odds of choosing language arts over science is about 16 times higher for these students when the teaching method changes from discussion to lecture. The positive coefficient associated with age indicates that students tend to like language arts over science as they become older.

D Next, MINITAB displays the last Log-Likelihood from the maximum likelihood iterations along with the statistic G. G is the difference in −2 log-likelihood for a model that only has the constant terms and the fitted model shown in the Logistic Regression Table. G is the test statistic for testing the null hypothesis that all the coefficients associated with predictors being equal to 0 versus them not all being equal to 0. G = 12.825 with a p-value of 0.012, which indicates that at α = 0.05 there is sufficient evidence for at least one coefficient being different from 0.



Chapter 2 References



E Goodness-of-Fit Tests—displays Pearson and deviance goodness-of-fit tests. The p-value for the Pearson test is 0.730 and the p-value for the deviance test is 0.640, indicating that there is insufficient evidence for the model not fitting the data adequately. If the p-value is less than your selected α level, the test would indicate sufficient evidence for an inadequate fit.

References[1] A. Agresti (1984). Analysis of Ordinal Categorical Data, John Wiley & Sons, Inc.

[2] A. Agresti (1990). Categorical Data Analysis, John Wiley & Sons, Inc.

[3] D.A. Belsley, E. Kuh, and R.E. Welsch (1980). Regression Diagnostics, John Wiley & Sons, Inc.

[4] A. Bhargava (1989). “Missing Observations and the Use of the Durbin-Watson Statistic,” Biometrika 76, 4, pp.828–831.

[5] C.C. Brown (1982). “On a Goodness of fit Test for the Logistic Model Based on Score Statistics,” Communications in Statistics, 11, pp.1087–1105.

[6] D.A. Burn and T.A. Ryan, Jr. (1983). “A Diagnostic Test for Lack of Fit in Regression Models,” ASA 1983 Proceedings of the Statistical Computing Section, pp.286–290.

[7] R.D. Cook (1977). “Detection of Influential Observations in Linear Regression,” Technometrics 19, pp.15–18.

[8] R.D. Cook and S. Weisberg (1982). Residuals and Influence in Regression, Chapman and Hall.

[9] N.R. Draper and H. Smith (1981). Applied Regression Analysis, Second Edition, John Wiley & Sons, Inc.

[10] S.E. Fienberg (1987). The Analysis of Cross-Classified Categorical Data. The MIT Press.

[11] M.J. Garside (1971). “Some Computational Procedures for the Best Subset Problem,” Applied Statistics 20, pp.8–15.

[12] James H. Goodnight (1979). “A Tutorial on the Sweep Operator,” The American Statistician 33, pp.149–158.

[13] W.W. Hauck and A. Donner (1977). “Wald’s test as applied to hypotheses in logit analysis. Journal of the American Statistical Association 72, 851-853.

[14] D.C. Hoaglin and R.E. Welsch (1978). “The Hat Matrix in Regression and ANOVA,” The American Statistician 32, pp.17–22, and Corrigenda 32, p.146.

[15] R.R. Hocking (1976). “A Biometrics Invited Paper: The Analysis and Selection of Variables in Linear Regression,” Biometrics 32, pp.1–49.



References Regression



[16] D.W. Hosmer and S. Lemeshow (1989). Applied Logistic Regression, John Wiley & Sons, Inc.

[17] LINPACK (1979). Linpack User’s Guide by J.J. Dongarra, J.R. Bunch, C.B. Moler, and G.W. Stewart, Society for Industrial and Applied Mathematics, Philadelphia, PA.

[18] J.H. Maindonald (1984). Statistical Computation, John Wiley & Sons, Inc.

[19] P. McCullagh and J.A. Nelder (1992). Generalized Linear Models, Chapman & Hall.

[20] W. Miller (1978). “Performing Armchair Roundoff Analysis of Statistical Algorithms,” Communications in Statistics, pp.243–255.

[21] D.C. Montgomery and E.A. Peck (1982). Introduction to Linear Regression Analysis. John Wiley & Sons.

[22] J. Neter, W. Wasserman, and M. Kutner (1985). Applied Linear Statistical Models, Richard D. Irwin, Inc.

[23] S.J. Press and S. Wilson (1978). “Choosing Between Logistic Regression and Discriminant Analysis,” Journal of the American Statistical Association 73, 699-705.

[24] M. Schatzoff, R. Tsao, and S. Fienberg (1968). “Efficient Calculation of All Possible Regressions,” Technometrics 10, pp.769–779.

[25] G.W. Stewart (1973). Introduction to Matrix Computations, Academic Press.

[26] R.A. Thisted (1988). Elements of Statistical Computing: Numerical Computation, Chapman & Hall.

[27] P. Velleman and R. Welsch (1981). “Efficient Computation of Regression Diagnostics,” The American Statistician 35, pp.234–242.

[28] P.F. Velleman, J. Seaman, and I.E. Allen (1977). “Evaluating Package Regression Routines,” ASA 1977 Proceedings of the Statistical Computing Section.

[29] S. Weisberg (1980). Applied Linear Regression, John Wiley & Sons, Inc.

Acknowledgments

We are very grateful for help in the design of the regression algorithm from W. Miller and from G.W. Stewart and for useful suggestions from. P.F. Velleman and S. Weisberg and many others.



Chapter 2 References





Documents

02 regresion