
Subset Selection in Regression (A. J. Miller)



SIAM REVIEW, Vol. 34, No. 2, pp. 325-349, June 1992
© 1992 Society for Industrial and Applied Mathematics

BOOK REVIEWS

EDITED BY NICHOLAS D. KAZARINOFF

Subset Selection in Regression. By A. J. Miller. Chapman & Hall, New York, 1990. x + 229 pp. $47.50. ISBN 0-412-35380-6. Monographs on Statistics and Applied Probability, Vol. 40.

This monograph is unique in its coverage. There is no other book on the subject, although there are some survey articles, most notably Miller's article [1].

After an introductory chapter there is a chapter on least squares computations. Miller emphasizes orthogonalization procedures, especially planar rotations, and not the Gauss-Jordan approach that is prominent in American statistical packages. The Gauss-Jordan algorithm is often given in the form called the sweep algorithm, which is described by Miller, but the name "sweep" is not used by him. He gives an excellent explanation of why the orthogonalization method gives better accuracy, but gives no evidence of the need for the increased accuracy over what is obtainable in double precision from Gauss-Jordan. This chapter also has algorithms for computing the regressions on all subsets of a set of predictors.
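The accuracy point can be made concrete with a minimal numerical sketch (not code from the book; it uses Householder QR via numpy rather than the planar rotations Miller favors). Forming the normal equations, as a Gauss-Jordan or sweep implementation effectively does, squares the condition number of the design, while orthogonalization works with the design directly:

    # Least squares by the normal equations versus by orthogonalization.
    # Illustrative sketch only; numpy's QR uses Householder reflections,
    # not the planar (Givens) rotations that Miller emphasizes.
    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 50, 8
    t = np.linspace(0.0, 1.0, n)
    X = np.vander(t, p, increasing=True)     # nearly collinear polynomial design
    beta_true = np.ones(p)
    y = X @ beta_true + 1e-8 * rng.standard_normal(n)

    # Normal equations: form X'X explicitly, squaring the condition number.
    beta_ne = np.linalg.solve(X.T @ X, X.T @ y)

    # Orthogonalization: X = QR, then solve the triangular system R b = Q'y.
    Q, R = np.linalg.qr(X)
    beta_qr = np.linalg.solve(R, Q.T @ y)

    print("cond(X)                 :", np.linalg.cond(X))
    print("normal-equations error  :", np.max(np.abs(beta_ne - beta_true)))
    print("orthogonalization error :", np.max(np.abs(beta_qr - beta_true)))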

Chapter 3 begins with some of the standard procedures for variable selection, such as forward selection, backward elimination, and the combination of these two called stepwise regression. Then Miller describes more sophisticated procedures for generating the subsets of predictors that are "best" by some criterion. He also describes ridge regression and its use in subset selection. The final section consists of four examples, including one with 20 variables and only 14 cases and another example with 11 variables and 13 cases. Based on these and other examples, Miller makes the general observation that the best subset procedures are advantageous over the sequential procedures mainly in these examples with few cases compared to the number of variables. On the other hand, the reader might wonder if the apparent advantage is usually just a matter of random error. Do 14 cases have sufficient information to enable choice from all subsets of 20 predictors?
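For readers unfamiliar with the first of these procedures, a bare-bones forward selection by residual sum of squares takes only a few lines (a generic sketch, not an algorithm taken from the book):

    # Forward selection: at each step add the predictor that most reduces the
    # residual sum of squares of the least-squares fit.  Generic sketch only.
    import numpy as np

    def rss(X, y, cols):
        """Residual sum of squares after regressing y on the listed columns."""
        Xs = X[:, list(cols)]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ beta
        return float(resid @ resid)

    def forward_selection(X, y, max_vars):
        chosen, remaining = [], set(range(X.shape[1]))
        while remaining and len(chosen) < max_vars:
            best = min(remaining, key=lambda j: rss(X, y, chosen + [j]))
            chosen.append(best)
            remaining.discard(best)
            print(f"added column {best}, RSS = {rss(X, y, chosen):.3f}")
        return chosen

    rng = np.random.default_rng(1)
    X = rng.standard_normal((60, 10))
    y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + rng.standard_normal(60)
    forward_selection(X, y, max_vars=4)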


Next is a chapter on significance testing. Miller suggests including entirely random predictors and seeing how well they compete with the variables in the data. If the random predictors are chosen, then too many predictors are being retained. After mentioning that the usual F-test is not valid, he presents other methods, including some based on simulation and random permutation, and some for simultaneous confidence intervals based on the F-distribution.
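The random-predictor idea is easy to try out. In the sketch below (an illustration of the idea only, not Miller's procedure), five columns of pure noise are appended to the design; any decoy column that survives the greedy search is a warning that the selection is too aggressive:

    # Append purely random "decoy" predictors and see whether the selection
    # procedure ever prefers them to the real variables.  Illustration only.
    import numpy as np

    rng = np.random.default_rng(2)
    n, p_real, p_noise = 40, 5, 5
    X = rng.standard_normal((n, p_real))
    y = 1.5 * X[:, 0] - X[:, 2] + rng.standard_normal(n)
    X_aug = np.hstack([X, rng.standard_normal((n, p_noise))])   # decoy columns

    def rss(cols):
        Xs = X_aug[:, cols]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ beta
        return float(resid @ resid)

    chosen = []
    for _ in range(4):                              # greedy forward steps
        rest = [j for j in range(X_aug.shape[1]) if j not in chosen]
        chosen.append(min(rest, key=lambda j: rss(chosen + [j])))

    decoys = [j for j in chosen if j >= p_real]
    print("selected columns       :", chosen)
    print("decoy columns selected :", decoys if decoys else "none")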

Chapter 5 emphasizes the bias in regression coefficients that results when predictor subsets are selected based on error sums of squares. Several cures are given for this disease, including Miller's new maximum likelihood method. Unfortunately, this method is complicated, it is limited to special cases, and it may increase the variance while decreasing the bias.
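The bias itself is easy to demonstrate: in the toy simulation below (an illustration of the phenomenon, not of Miller's correction), the response is pure noise, yet the fitted slope of whichever single predictor happens to minimize the residual sum of squares is systematically larger in magnitude than that of a predictor fixed in advance:

    # Selection bias in miniature: the same data are used both to pick the
    # "best" single predictor and to estimate its coefficient, so the reported
    # coefficient is inflated even though no predictor is related to y.
    import numpy as np

    rng = np.random.default_rng(3)
    n, p, reps = 30, 20, 2000
    picked, baseline = [], []
    for _ in range(reps):
        X = rng.standard_normal((n, p))
        y = rng.standard_normal(n)                  # y is unrelated to every column
        slopes = X.T @ y / np.sum(X * X, axis=0)    # univariate least-squares slopes
        resid_ss = np.sum((y[:, None] - X * slopes) ** 2, axis=0)
        picked.append(abs(slopes[np.argmin(resid_ss)]))
        baseline.append(np.mean(np.abs(slopes)))

    print("mean |slope| of the selected predictor :", np.mean(picked))
    print("mean |slope| of an arbitrary predictor :", np.mean(baseline))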

The following chapter gives a variety of stopping rules, criteria for deciding how many predictors to retain. It concludes with a very good discussion of the relationships among the various criteria.
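The specific criteria are best read in the book itself, but as one widely used example of such a stopping rule (not necessarily presented this way by Miller), Mallows' Cp compares the residual sum of squares of a k-variable subset with the residual variance of the full model, Cp = RSS_k/s^2 - n + 2k, and favors subsets for which Cp is close to k:

    # Mallows' Cp, one common criterion for deciding how many predictors to
    # keep: Cp = RSS_k / s^2 - n + 2k, with s^2 the residual variance of the
    # full model.  Subsets with Cp close to k are the usual candidates.
    import numpy as np

    def mallows_cp(X, y, cols):
        n, p_full = X.shape
        beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
        s2 = np.sum((y - X @ beta_full) ** 2) / (n - p_full)
        Xs = X[:, list(cols)]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss_k = np.sum((y - Xs @ beta) ** 2)
        return rss_k / s2 - n + 2 * len(cols)

    rng = np.random.default_rng(4)
    X = rng.standard_normal((80, 8))
    y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.standard_normal(80)
    for k in range(1, 6):
        cols = list(range(k))          # nested subsets, added in the "true" order
        print(f"k = {k}: Cp = {mallows_cp(X, y, cols):7.2f}")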

Finally, Chapter 7 is a summary in the form of seven questions that Miller poses and answers. Question 6 asks whether subset regression can be justified, and Miller's answer mentions the case in which the predictors are expensive to measure. The reader who has worked through the book to this point may be disappointed that the author does not make a stronger case for subset selection.

Even though the bibliography runs to around 200 entries, it does not include all research on subset selection. One could wonder if there were space constraints, because the data for only one of the examples are given, and there are not many figures. On page 83 Miller advocates the use of graphical and other methods of residual analysis, but there is not much of this in the book, perhaps again because of space limitations. Another reason might be that, as he points out on page 14, the properties of residuals are unknown when predictors have been selected based on these same residuals.



Miller does a fine job of summarizing most of the research in this field, and he gives clear explanations of some complicated procedures. Those who use subset selection and those who teach it should read at least Chapters 3 and 6. Miller's survey [1] is useful, too, because it has material not in the book, and the accompanying discussion gives the views of others.

REFERENCE

[1] A. J. MILLER, Selection of subsets of regression variables, J. Roy. Statist. Soc. Ser. A, 147 (1984), pp. 389-425.

KENNETH N. BERK
Illinois State University

Computer Networks and Systems: Queueing Theory and Performance Evaluation. By Thomas G. Robertazzi. Springer-Verlag, New York, 1990. xii + 306 pp. $49.50. ISBN 0-387-97393-1.

Various models of computer communication networks capture different aspects of network performance. For example, a graph theoretic representation provides information on connectivities and path capacities, whereas a queueing model deals with message delays, buffer occupancies and overflows, characteristics of traffic among work stations, and the like. Because it has been generally recognized that queueing models are essential to the effective analysis of networks, electrical and computer engineers have become familiar with queueing theory, while queueing theorists have applied their skills to networks.

Accordingly, it comes as no surprise that several recent texts on networks devote a large fraction of their contents to applicable aspects of queueing theory. The book under review goes a step further; queueing analysis of networks is its central theme. Since the book is intended for beginning graduate students, it (intentionally) contains no original material, and should be judged on the basis of its pedagogic quality.

This text is not directed at a mathematical audience. The level of rigor is perhaps well illustrated by the statement (page 18):

It turns out that random splits of a Poisson process are Poisson and that a joining of independent Poisson processes is also Poisson. A little thought with the coin flipping analogy will show that this is true.

This statement appears in reference to the introduction of the M/M/1 queue, which with variations such as the M/M/m/m queue, occupies sixty-five pages early in the book. The arguments on equilibrium probabilities for such queues are based on deriving the usual differential equations, and setting the derivative to zero; one finds no reference to the transition matrix, recurrent states, reducibility, Kolmogorov equations, or any of the other terms dear to those familiar with Markov processes.
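For reference, the end point of that derivation is simple: setting the derivatives to zero gives the balance equations lambda p(n-1) = mu p(n), so with rho = lambda/mu < 1 the equilibrium probabilities form the geometric distribution p(n) = (1 - rho) rho^n. A small numerical check against a truncated birth-death generator (a sketch for illustration, not from the book):

    # M/M/1 equilibrium check: the geometric solution p(n) = (1 - rho) * rho**n
    # against the stationary vector of a truncated birth-death generator.
    # Sketch only; truncating the state space at N introduces a tiny error.
    import numpy as np

    lam, mu, N = 0.6, 1.0, 200        # arrival rate, service rate, truncation
    rho = lam / mu

    Q = np.zeros((N, N))              # generator of the birth-death chain
    for i in range(N - 1):
        Q[i, i + 1] = lam             # arrival:  n -> n + 1
        Q[i + 1, i] = mu              # service:  n + 1 -> n
    np.fill_diagonal(Q, -Q.sum(axis=1))

    # Stationary distribution: solve pi Q = 0 subject to sum(pi) = 1.
    A = np.vstack([Q.T, np.ones(N)])
    b = np.zeros(N + 1); b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)

    geometric = (1.0 - rho) * rho ** np.arange(N)
    print("max |pi(n) - (1-rho) rho^n| :", np.max(np.abs(pi - geometric)))
    print("mean number in system       :", pi @ np.arange(N),
          "vs rho/(1-rho) =", rho / (1 - rho))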

The same chapter contains a simplified treatment of the embedded M/G/1 queue, a reference to reversibility and its implications, and Little's formula. None of the three(!) proofs or ensuing exposition of Little's formula give any indication that this result is not merely applicable to queues, but is actually a systems theorem of broad interest. Indeed, the whole chapter suffers from a narrow focus, and is almost entirely derivative of (but less lucid than) the standard texts [1]-[4] repeatedly referenced by the author.
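Little's formula L = lambda W (time-average number in system equals arrival rate times mean time in system) indeed holds far beyond Markovian queues; the short simulation sketch below checks it for a FIFO single-server queue, comparing a grid-sampled estimate of L with lambda times the mean sojourn time:

    # A check of Little's formula L = lambda * W for a FIFO single-server queue
    # with Poisson arrivals and exponential service.  The formula itself needs
    # neither assumption; this is just a convenient test case.
    import numpy as np

    rng = np.random.default_rng(5)
    lam, mu, n_cust = 0.7, 1.0, 100_000

    arrivals = np.cumsum(rng.exponential(1.0 / lam, n_cust))
    service = rng.exponential(1.0 / mu, n_cust)

    # FIFO departures: D_i = max(A_i, D_{i-1}) + S_i.
    departures = np.empty(n_cust)
    prev = 0.0
    for i in range(n_cust):
        prev = max(arrivals[i], prev) + service[i]
        departures[i] = prev

    horizon = departures[-1]
    W = np.mean(departures - arrivals)            # mean time in system
    lam_hat = n_cust / horizon                    # observed arrival rate

    # Independent estimate of L: sample the number in system on a time grid.
    grid = np.linspace(0.0, horizon, 5001)
    in_system = (np.searchsorted(arrivals, grid, side="right")
                 - np.searchsorted(departures, grid, side="right"))
    L = np.mean(in_system)

    print("L (time average)        :", L)
    print("lambda * W              :", lam_hat * W)
    print("M/M/1 theory rho/(1-rho):", (lam / mu) / (1.0 - lam / mu))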

Next is a chapter dealing with queueing networks. The emphasis is on Markov systems having a product form equilibrium solution. The author attempts to explain global and local balance intuitively in terms of probability flows, rather than in the context of the infinitesimal generator. His exposition is less successful than that found in earlier books [5], which yield similar results with considerable elegance but no greater mathematical complexity. There are also examples, at least one of which is borrowed directly from a well-established text [6].
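For orientation, "product form" means that the joint equilibrium distribution of the network factors into independent single-queue marginals; for a two-node tandem Jackson network with arrival rate lambda and service rates mu1 and mu2, it is P(n1, n2) = (1 - rho1) rho1^n1 (1 - rho2) rho2^n2 with rho_i = lambda/mu_i. A brute-force numerical check of that factorization on a truncated state space (a sketch for illustration, not an example from the book):

    # Product-form check for a two-node tandem (Jackson) network: arrivals at
    # rate lam join node 1 (service rate mu1), then move to node 2 (rate mu2),
    # then leave.  The state space is truncated at N customers per node, which
    # introduces a small error.
    import numpy as np

    lam, mu1, mu2, N = 0.5, 1.0, 0.8, 25

    def idx(n1, n2):
        return n1 * N + n2

    Q = np.zeros((N * N, N * N))
    for n1 in range(N):
        for n2 in range(N):
            s = idx(n1, n2)
            if n1 + 1 < N:                    # external arrival to node 1
                Q[s, idx(n1 + 1, n2)] += lam
            if n1 > 0 and n2 + 1 < N:         # completion at node 1, move to node 2
                Q[s, idx(n1 - 1, n2 + 1)] += mu1
            if n2 > 0:                        # completion at node 2, departure
                Q[s, idx(n1, n2 - 1)] += mu2
    np.fill_diagonal(Q, -Q.sum(axis=1))

    # Stationary distribution: pi Q = 0, sum(pi) = 1.
    A = np.vstack([Q.T, np.ones(N * N)])
    b = np.zeros(N * N + 1); b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)

    def geom(r):
        return (1.0 - r) * r ** np.arange(N)

    product_form = np.outer(geom(lam / mu1), geom(lam / mu2)).ravel()
    print("max |pi - product form| :", np.max(np.abs(pi - product_form)))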

Following this chapter on networks of queues is another long chapter (about sixty pages), covering numerical methods. These relate almost entirely to the calculation of product form solutions. Algorithms are described, and some numerical examples offered. There is virtually no error analysis, nor is attention given to relative computational requirements of alternative methods. In the latter portion of the chapter, "guest authors" discuss simulation, but without any specifics whatsoever. In the opinion of the reviewer, the chapter's focus on product form is excessively restrictive (especially for a first course
