many Bayesians will be delighted to read this chapter. It is obvious that Rao has taken great care in writing the final chapter of this monograph. Because HB estimation, when applied properly, has an advantage over the EB and other approaches in incorporating all the uncertainties associated with an estimator, I have a personal preference for this approach. Although one needs to be clever with the other approaches, the HB methods can be routinely applied to a wide variety of problems. However, one difficulty with the HB approach is that it is computer intensive. Realizing the importance of computing, Rao has carefully presented computational issues in detail. The first two sections of the chapter are written in the general framework of Bayesian inference. Section 10.2 has an extensive and lucid discussion of Markov chain Monte Carlo (MCMC) methods. These methods play an important role in HB applications. Many practical issues of Bayesian computing and model determination using MCMC methods have been so carefully discussed that even practitioners of small area estimation with little or no exposure to Bayesian statistics will be able to implement the HB approach in their own applications. After discussing the main ingredients of Bayesian methods, Rao has spelled them out for many useful small area models, for continuous as well as for binary and count data. Many of the topics in Chapter 9 are revisited in an HB framework.

In summary, this book will be considered a masterpiece in small area estimation. I believe that it has the potential to turn small area estimation, which is already quite popular in survey sampling, into a larger area of importance to both researchers and practitioners.

Professor Rao communicated to me some time ago the following list of errata. I am including this list for the readers’ benefit. (a) Page 6, line 13: “constrained HB” should read “constrained.” (b) Second line after (9.3.9): “bias of order $m^{-1}$” should be “bias of lower order than $m^{-1}$.” (c) Reference 316 is the same as reference 317, but has the wrong year and should be deleted. (d) Ghosh and Rao (1994) and Rao (1992, 1999), which are cited in the text, are not listed in the references.

Gauri Sankar DATTA

University of Georgia

REFERENCES

Datta, G. S., Ghosh, M., Smith, D., and Lahiri, P. (2002), “Asymptotic Theory of Conditional and Unconditional Coverage Probabilities of Empirical Bayes Confidence Intervals,” Scandinavian Journal of Statistics, 29, 139–152.

Ghosh, M., and Rao, J. N. K. (1994), “Small Area Estimation: An Appraisal” (with discussion), Statistical Science, 9, 55–93.

Nandram, B. (1999), “An Empirical Bayes Prediction Interval for the Finite Population Mean of a Small Area,” Statistica Sinica, 9, 325–343.

Rao, J. N. K. (1992), “Estimating Totals and Distribution Functions Using Auxiliary Information at the Estimation Stage,” Proceedings of the Workshop on Uses of Auxiliary Information in Surveys, Orebro: Statistics Sweden.

(1999), “Some Recent Advances in Model-Based Small Area Estimation,” Survey Methodology, 25, 175–186.

Stochastic Approximation and Its Applications.

Han-Fu CHEN. Dordrecht, Netherlands: Kluwer, 2002. ISBN 1-4020-0806-6. xv + 357 pp. $130.00 (H).

The book under review is written by a leading expert in the field. It offers a thorough treatment of the fundamentals of stochastic approximation and a wide range of applications, with clear exposition and superb presentation.

The original motivation of stochastic approximation stems from finding zeros (roots) of an unknown function $f(\cdot)\colon \mathbb{R}^r \to \mathbb{R}^r$ whose values cannot be measured or observed directly; only noisy measurements or observations are available. As a consequence, the usual numerical methods for locating the zeros of a nonlinear function cannot be used. In the early 1950s, Robbins and Monro (1951) suggested a procedure now known as the stochastic approximation method. The idea is that at each step $k$, the observation

$$y_k = f(x_k) + \varepsilon_k$$

is taken, where $x_k$ is the design point and $\varepsilon_k$ is the observation noise. One then constructs the next estimate

$$x_{k+1} = x_k + a_k y_k,$$

where $\{a_k\}$ is a sequence of step sizes (real numbers satisfying $a_k \ge 0$, $a_k \to 0$, and $\sum_k a_k = \infty$). Later, it was demonstrated that stochastic optimization problems in which only noisy measurements are available can also be recast in a stochastic approximation setting (see Kiefer and Wolfowitz 1952). A half century has passed since then. Because of its wide applicability, stochastic approximation has been the focus of an enormous amount of research over the past several decades. Much effort has been devoted to analyzing the associated stochastic dynamic systems; their asymptotic properties, such as convergence and rates of convergence; and their variants and improvements.
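
To make the recursion concrete, here is a minimal Python sketch of a Robbins–Monro iteration for a scalar root-finding problem. The target function, noise model, and step sizes $a_k = 1/(k+1)$ are illustrative assumptions, not examples taken from the book.

```python
import numpy as np

def robbins_monro(noisy_f, x0, n_steps=5000, seed=0):
    """Robbins-Monro iteration x_{k+1} = x_k + a_k * y_k with a_k = 1/(k+1)."""
    rng = np.random.default_rng(seed)
    x = x0
    for k in range(n_steps):
        a_k = 1.0 / (k + 1)      # step sizes: a_k >= 0, a_k -> 0, sum a_k = inf
        y_k = noisy_f(x, rng)    # noisy observation y_k = f(x_k) + eps_k
        x = x + a_k * y_k
    return x

# Hypothetical example: find the root of f(x) = 2 - x (true root at x = 2),
# observed through additive Gaussian noise.
noisy_f = lambda x, rng: (2.0 - x) + rng.normal(scale=0.5)
print(robbins_monro(noisy_f, x0=0.0))  # converges to roughly 2
```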

There is a vast literature, including many excellent books, on the subject of stochastic approximation. One might ask: What is special about the book under review, and how is it distinct from the other books available? Two of its distinctive features are its coverage of randomly expanding truncation methods, initiated by the author and his co-workers in the late 1980s (see Chen and Zhu 1986), and the TS (trajectory-subsequence) method for proving convergence of the algorithm. The classical treatment of stochastic approximation often requires that the function $f(\cdot)$ under consideration grow at certain rates, and the modification of such an approach using projections or truncations normally requires that the projection or truncation region be known a priori. The randomly generated expanding truncation procedure suggested by the author and co-workers specifies that, in lieu of a fixed truncation region, one generates expanding truncation regions. At each iteration, one compares the iterate with the current truncation bound: if the iterate is inside the bounded truncation region, do nothing; if it is outside, project it back to the region and enlarge the truncation bound. It has been shown that, after a finite number of steps, the truncation terminates and the iterates remain in a bounded region with probability 1. One can then proceed to obtain convergence of the algorithm. The trajectory-subsequence method developed by the author examines the convergence of the algorithms by concentrating on the trajectories of the dynamics of the iterates and by verifying the noise conditions along a subsequence of the iterates. This facilitates the treatment of state-dependent noise. One of its advantages is that a convergent subsequence $\{x_{n_k}\}$ is always bounded, whereas the boundedness of the full sequence of iterates $\{x_n\}$ is not guaranteed a priori.
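
As a rough illustration of the expanding-truncation idea, here is a sketch in the same vein as the code above. The spherical truncation regions with radii $M_j = j + 1$ and the projection rule are simplifying assumptions; the book's actual construction and convergence conditions are more general.

```python
import numpy as np

def rm_expanding_truncation(noisy_f, x0, n_steps=5000, seed=0):
    """Robbins-Monro with randomly expanding truncations (illustrative sketch).

    Truncation regions are balls of radius M_j = j + 1 centered at the origin;
    an iterate that escapes the current ball is projected back onto it and the
    region index j is advanced, enlarging the bound.
    """
    rng = np.random.default_rng(seed)
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    j = 0                                    # index of current truncation region
    for k in range(n_steps):
        a_k = 1.0 / (k + 1)
        candidate = x + a_k * noisy_f(x, rng)
        radius = np.linalg.norm(candidate)
        if radius <= j + 1:
            x = candidate                    # inside the region: accept the step
        else:
            x = candidate * (j + 1) / radius # outside: project back ...
            j += 1                           # ... and expand the bound
    return x
```

In a run where the iterates eventually settle into some fixed ball, the truncation triggers only finitely often, mirroring the property quoted above.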

The entire book contains six chapters and two appendixes. At the end of each chapter there is a section titled “Notes and References,” which gives historical notes and an update on the latest research progress.

The book begins with an introduction to the Robbins–Monro algorithm. After a brief description of the algorithm, the author discusses the probability method used in the early development of stochastic approximation and the ODE (ordinary differential equation) method, which combines probabilistic arguments with analytic techniques that use compactness in an essential way. An account of the TS approach is then presented, and the weak convergence method is also surveyed in this chapter.

The second chapter is devoted to stochastic approximation with randomly varying truncation bounds. Following the motivational discussion, general convergence via the TS method is presented. Other issues discussed in this chapter include state-dependent noise, nonadditive noise, necessary conditions for convergence, the connection between trajectory convergence and limit points, robustness, and dynamic stochastic approximation.

Chapter 3 proceeds with the study of further asymptotic properties of stochastic approximation. Here convergence rates are studied, including both nondegenerate and degenerate cases. The chapter also deals with asymptotic normality and asymptotic efficiency via averaging.

Chapter 4 turns to stochastic optimization algorithms. Random truncation algorithms are developed for Kiefer–Wolfowitz (KW) type procedures and then analyzed. The KW algorithms normally converge only to local optima, so global optimization algorithms are also studied in this chapter, together with an application to model reduction.
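
For readers unfamiliar with the KW scheme, here is a sketch of the classical one-dimensional Kiefer–Wolfowitz recursion, without the book's truncation machinery. It replaces the unavailable gradient with a noisy central finite-difference estimate; the objective, noise model, and gain sequences $a_k = 1/k$, $c_k = k^{-1/3}$ are illustrative assumptions.

```python
import numpy as np

def kiefer_wolfowitz(noisy_loss, x0, n_steps=5000, seed=0):
    """Classical 1-D Kiefer-Wolfowitz: x_{k+1} = x_k - a_k * g_k, where g_k is
    a noisy central finite-difference estimate of the loss gradient."""
    rng = np.random.default_rng(seed)
    x = x0
    for k in range(1, n_steps + 1):
        a_k = 1.0 / k                # step sizes: a_k -> 0, sum a_k = inf
        c_k = 1.0 / k ** (1 / 3)     # perturbation widths shrink more slowly
        g_k = (noisy_loss(x + c_k, rng) - noisy_loss(x - c_k, rng)) / (2 * c_k)
        x = x - a_k * g_k            # descend toward a (local) minimizer
    return x

# Hypothetical example: minimize f(x) = (x - 3)^2 observed with Gaussian noise.
noisy_loss = lambda x, rng: (x - 3.0) ** 2 + rng.normal(scale=0.1)
print(kiefer_wolfowitz(noisy_loss, x0=0.0))  # roughly 3
```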

Chapters 5 and 6 concentrate on applications to signal processing and to systems theory and control. In these two chapters, in addition to presenting some well-known applications in adaptive control and system identification, adaptive filtering with constraints, and asynchronous estimation, the author provides several newly worked-out examples, such as recursive blind identification, principal components analysis and its recursion via stochastic approximation, adaptive filtering with a hard limiter, pole assignment for systems with unknown coefficients, and adaptive regulation. Because of the TS approach, the conditions used in the application examples are weaker than those in existing results.

The two appendixes give background material, including the basics of probability theory and stochastic processes.

This book can be used by researchers in stochastic approximation, recursive estimation, adaptive control, signal processing, and Monte Carlo optimization. It may also be used as an advanced graduate-level textbook for a topics course on stochastic processes, recursive estimation, or asymptotic statistics.

In conclusion, this book presents a treatise on stochastic approximation and recursive estimation, with an illuminating introduction to the fields and various theoretical and practical issues. It is a welcome addition to the literature pertaining to systems theory and control, signal processing, and other related fields. It is conceivable that this book will have a significant impact on these fields.

George YIN

Wayne State University

REFERENCES

Chen, H.-F., and Zhu, Y. M. (1986), “Stochastic Approximation Procedures With Randomly Varying Truncations,” Scientia Sinica, Ser. A, 29, 914–926.

Kiefer, J., and Wolfowitz, J. (1952), “Stochastic Estimation of the Maximum of a Regression Function,” Annals of Mathematical Statistics, 23, 462–466.

Robbins, H., and Monro, S. (1951), “A Stochastic Approximation Method,” Annals of Mathematical Statistics, 22, 400–407.

Optimal Design of Blocked and Split-Plot Experiments.

Peter GOOS. New York: Springer-Verlag, 2002. ISBN 0-387-95515-1. xiii + 244 pp. $59.95 (P).

It could be argued that most industrial experiments are run as block designs or split-plot designs. However, until the last decade, most of the optimal design literature focused on completely randomized designs. Industrial split-plot and blocked designs have received a great deal of attention in recent years, both in the literature and at national conferences. The attention has focused on their analysis and on optimal design principles, as well as on educating practitioners about the importance and benefits of using restricted randomization in experimentation and the “how tos” of analyzing these designs. This text offers almost everything that one could hope for in a review of this topic and is a must-have resource for those doing research in this area. The target audience is applied and theoretical statisticians and those in academia. The text is more theoretical than intuitive, as a great deal of attention is devoted to derivations of D-optimality criteria.

Practically speaking, the types of designs used for split-plot and blocked experiments are the standard optimal designs used when one has the luxury of completely randomizing the experiment. The problem, as the author points out in numerous settings, is that these designs perform poorly when the data are correlated. Optimal design theory is important because one can tailor a design to a specific situation instead of forcing a situation into a standard design. The text by Goos is devoted solely to optimal design theory when there are restrictions on randomization. It is considerably easier to read than the classic optimal design text by Silvey (1980), but not as intuitive as the response surface methods text by Myers and Montgomery (2002) or the experimental design text by Montgomery (2001). Silvey’s text presents optimal design theory for continuous designs and thus takes a measure-theoretic approach. Goos differentiates between discrete and continuous designs in the first chapter, and the treatment of continuous designs is fairly light throughout the text. Myers and Montgomery (2002) discussed design optimality in their text, but only for completely randomized designs. The appealing thing about Myers and Montgomery’s approach is that they illustrated most points with examples. Goos does some illustration, but the book would be more appealing if more of the concepts and algorithms were illustrated with detailed examples. I highly recommend the text by Goos as a supplement to the response-surface methods texts of Myers and Montgomery (2002) or Khuri and Cornell (1996).

The first chapter in the book is the longest, and the author does a great job of giving a broad overview of the design of experiments. In this chapter, he differentiates between discrete and continuous designs, presents the standard response-surface designs, discusses blocking, and presents the various alphabetic optimality criteria. In developing the D-optimality criterion, Goos demonstrates that the information matrix can be written as a sum of outer products of vectors. The rationale is that when the information matrix is expressed in this manner, one has a simple expression for adding or deleting design points. The addition and deletion of design points is the mechanism by which exchange algorithms operate when determining the optimal design for a given setting. Throughout the text, as Goos presents various types of experimental situations, he begins by writing the information matrix in this manner and then describes exchange algorithms that can be used to find the optimal design. The first chapter is carefully written (as is the rest of the text) and serves as the cornerstone for the advanced topics discussed later in the book. The D-optimality criterion is the criterion used in the algorithms presented throughout the text.
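
As a brief illustration of why the outer-product form is convenient (a sketch, not taken from the book): with information matrix $M = X'X = \sum_i x_i x_i'$, adding or deleting a design point $x$ changes $M$ by a rank-one term, so the determinant that D-optimal exchange algorithms track updates cheaply via the matrix determinant lemma, $\det(M \pm x x') = \det(M)\,(1 \pm x' M^{-1} x)$.

```python
import numpy as np

def det_after_adding(M, x):
    """Determinant of the information matrix after adding design point x,
    via the matrix determinant lemma: det(M + xx') = det(M) * (1 + x'M^{-1}x)."""
    return np.linalg.det(M) * (1.0 + x @ np.linalg.solve(M, x))

# Hypothetical 3-run design for a two-parameter model (intercept + slope).
X = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 1.0]])
M = X.T @ X                        # information matrix: sum of outer products
x_new = np.array([1.0, 0.5])       # candidate point to exchange in
print(det_after_adding(M, x_new))  # equals np.linalg.det(M + np.outer(x_new, x_new))
```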

In many experimental situations, the assumptions of homogeneous variance and uncorrelated observations are not satisfied. An overview of these situations is presented in Chapter 2. Perhaps the most interesting component of this chapter is the discussion of orthogonal blocking as a D-optimal strategy. However, as the author points out, in most practical situations the number of blocks and the block sizes make it impossible to block a design orthogonally. It would have been useful if the author had taken a set of design points, blocked them both orthogonally and nonorthogonally, and then illustrated the extent of the advantage that orthogonal blocking offers. As I read through this chapter, I found myself wondering whether there might be an advantage to blocking “almost orthogonally.”

Chapter 3 introduces birandomized designs, which include split-plot and block designs. The remainder of the text, approximately 150 pages, is devoted to discussions of specific cases of birandomized designs. In birandomized designs, the number of groups and the group sizes are often fixed by practical considerations prior to running an experiment. Chapters 4–6 and part of Chapter 9 are devoted to these types of birandomized designs. Chapter 4 is devoted to optimal designs in this setting when the block effects are random. In Chapter 5, optimal design strategy is discussed for quadratic regression on one variable when the blocks are of size 2, a problem motivated by an optometry experiment originally discussed by Chasalow (1992). In Chapter 6, the author provides an overview of industrial split-plot designs and gives a general algorithm for the efficient design of split-plot experiments for any number of observations, number of whole plots, and whole-plot sizes specified by the researcher. A consequence of split-plot experimentation is that whole-plot factors are estimated less precisely, and subplot factors more precisely, than they would be in a completely randomized design. A nice illustration is provided.

In some experimental situations, the researcher has the flexibility to choose the number of groups as well as the group sizes, and wishes to choose these parameters optimally. These situations are the topics of Chapters 7 and 8 and part of Chapter 4. Industrial split-plot designs generally are used when one or more factors are difficult or costly to change or control. As a result, once the levels of the difficult or costly factors are set, the combinations of the levels of the other factors are run in random order before the levels of the difficult or costly factors are changed. In this situation, the experimenter may wish to know the optimal number of whole-plot settings to run and the optimal number of observations to take on a given whole plot. This problem is detailed, and a design-construction algorithm provided, in Chapter 7. In Chapter 8, interesting comparisons are made regarding the D-efficiency of split-plot designs relative to completely randomized designs. The chapter also provides interesting insight into the improvement of split-plot designs when the number of whole plots is increased.

In Chapter 9, the author gives an overview of two-level factorial and fractional factorial blocked and split-plot designs and discusses the concept of aberration. The text is summarized in Chapter 10, where Goos suggests areas of future research.

Overall, the book is very well organized and is a terrific resource on design optimality for blocked and split-plot experimentation. I highly recommend this text as a supplement to a text such as the one by Myers and Montgomery (2002) for an advanced design course. The text contains no exercises. The literature review provided on each topic is expansive. Although optimal design construction algorithms are outlined in each chapter, none of them is illustrated; the text would be improved immensely if one or two of these algorithms were illustrated point by point. An example of such a point-by-point illustration was provided by Borkowski (2003) in his manuscript outlining the use of genetic algorithms to generate optimal response surface designs, complete with an example and the various steps used in the genetic algorithm. In summary, this text has a great deal to offer anyone doing research in design optimality, and the author should be commended for his solid treatment of this important area of industrial experimentation.

Timothy J. ROBINSON

University of Wyoming