
CMAT ©

Details Manual

Extension of C Language:

Matrix Algebra, Statistics,

Nonlinear Optimization and Estimation

Release 9 (September 2016)

Wolfgang M. Hartmann 1

Friedrich-Ebert-Anlage 46 / I, D-69117 Heidelberg, Germany

Copyright © 1997 by Wolfgang M. Hartmann. All Rights Reserved.

Reproduction, translation, or transmission of any part of this work without the written permission of the copyright owner is unlawful.

December 4, 2016

1 Thanks to my dear wife Walee and my old brave Apollo 4500.


Contents

1 Some Details 9

1.1 Examples for glim, glmixd, and glmod . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.1.1 Examples for glim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.1.2 Stratified Two-Phase Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

1.1.3 Examples for glmixd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

1.1.4 Examples for glmod . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

1.2 SEM: Structural Equation Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

1.2.1 Model Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

1.2.2 A Structural Equation Example: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

1.2.3 Model Specification of Confirmatory Factor Analysis . . . . . . . . . . . . . . . . . . . . . . 157

1.2.4 Assessment of Fit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

1.2.5 Residuals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

1.2.6 Goodness-of-Fit Indices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

1.2.7 Measures of Multivariate Kurtosis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167

1.2.8 Initial Estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

1.2.9 Automatic Variable Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

1.2.10 Exogenous Manifest Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

1.2.11 List of Test Examples for SEM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

1.3 Details on Optimization: LP and QP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192

1.3.1 Quadratic Optimization Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192

1.4 Details on Optimization: NLP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193

1.4.1 Choosing an Optimization Technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193

1.4.2 List of Options in Alphabetical Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196


1.4.3 Summary of Optimization Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

1.4.4 Types of Derivatives Used in Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . 218

1.4.5 Finite Difference Approximations of Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . 219

1.4.6 Hessian and CRP Jacobian Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221

1.4.7 Criteria for Optimality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221

1.4.8 Remarks in LINCOA (Powell, 2014) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246

1.4.9 Computational Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257

1.5 Some Examples for NLP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260

1.5.1 Two Examples for Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260

1.5.2 Exploiting Sparsity: Extended Rosenbrock . . . . . . . . . . . . . . . . . . . . . . . . . . . 263

1.5.3 ML Estimation: Examples by Clarke (1987) . . . . . . . . . . . . . . . . . . . . . . . . . . . 268

1.5.4 ML Estimation: Error Rates of Two Medical Diagnostic Tests . . . . . . . . . . . . . . . . . 274

1.5.5 Diffusion of Tetracycline Hydrochloride . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279

1.5.6 Using QP and NLP for Computing Efficient Frontier . . . . . . . . . . . . . . . . . . . . . . 296

1.5.7 Fitting Quantal Response Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317

1.5.8 Neural Nets with One Hidden Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330

1.6 Details on Nonlinear L1 and L∞ Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334

1.6.1 Nonlinear L1 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334

1.6.2 Nonlinear L∞ (Chebyshev) Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348

1.7 Details on Support Vector Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353

1.7.1 Different Model Formulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353

1.7.2 Binary Classification Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354

1.7.3 SVM Regression Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356

1.7.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358

1.8 Details on Robust Regression: LMS and LTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 406

1.8.1 Estimation Principles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 406

1.8.2 Syntax for LMS and LTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407

1.8.3 Some Computational Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410

1.8.4 Flow Chart for LMS and LTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 412

1.8.5 Examples for LMS and LTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 412

1.9 Details on Outlier Detection: MVE and MCD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421

1.9.1 Estimation Principles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421


1.9.2 Syntax for MVE and MCD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422

1.9.3 Examples for MVE and MCD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424

1.9.4 Examples Combining Robust Residuals and Robust Distances . . . . . . . . . . . . . . . . . 434

2 Some Useful CMAT Modules 437

2.1 Module EVTST for Testing the EV Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . 437

2.2 Module GEVTST for Testing Generalized EV Decomposition . . . . . . . . . . . . . . . . . . . . . 438

2.3 Module GSCHURT for Testing Generalized Schur Decomposition Examples . . . . . . . . . . . . . 439

2.4 Module GSVDTST for Testing Generalized SVD Decomposition . . . . . . . . . . . . . . . . . . . 440

2.5 Module GLSTST for Testing the GLS and GLM Functions . . . . . . . . . . . . . . . . . . . . . . 441

2.6 Modules for Testing the LLS and PINV Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 444

2.7 Some Modules for Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 450

3 Some Important Data Sets 457

3.1 Iris Data: Fisher [253] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457

3.2 Brain Data: Weisberg [840] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458

3.3 Hertzsprung-Russell Star Data: Rousseeuw & Leroy [715] . . . . . . . . . . . . . . . . . . . . . . . 459

3.4 Brainlog Data: Rousseeuw & Leroy [715] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459

3.5 Kootenay Data: Rousseeuw & Leroy [715] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460

3.6 Heart Data: Weisberg [840] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460

3.7 Real Estate Example by Narula and Wellington (1977) . . . . . . . . . . . . . . . . . . . . . . . . . 461

3.8 Mail Order Catalog Data: Spaeth [776] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461

3.9 Stackloss Data: Brownlee [127] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462

3.10 Hald Data: see Draper and Smith [225] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462

3.11 Longley Data: Weisberg [840] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463

3.12 Census Data: SAS Institute [734] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463

3.13 Fitness Data: SAS Institute [734] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464

3.14 Skin Data: SAS Institute [734] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465

3.15 SAT Data: SAS Institute [734] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465

3.16 Data by Chatterjee and Price [157] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466

3.17 Wampler Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467

3.18 Freshman Data: Campbell & McCabe (1984) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467

3.19 Orheim Data: Campbell & McCabe (1984) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467


3.20 Thurstone Box Data (Jennrich & Sampson, 1966, p.320) . . . . . . . . . . . . . . . . . . . . . . . . 468

3.21 Twenty Four Psychological Tests (Holtzinger & Harman) . . . . . . . . . . . . . . . . . . . . . . . 469

3.22 Birth and Death Rates per 1000 Persons (Hartigan, p.197) . . . . . . . . . . . . . . . . . . . . . . . 471

4 The Bibliography 473

5 Index 509


List of Figures

1.1 Path Diagram of Stability and Alienation Example . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

1.2 Specific Examples of RAM nomography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154

1.3 Path Diagram of Second-Order Factor Analysis Model . . . . . . . . . . . . . . . . . . . . . . . . . 158

1.4 Endogenous Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

1.5 Flow Chart for LMS and LTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413


Chapter 1

Some Details

1.1 Examples for glim, glmixd, and glmod

1.1.1 Examples for glim

1. Multiordinal logistic regression: Cheese data, McCullagh and Nelder (1989, p. 175):

a = [ 1 0 0 1  0,   1 0 0 2  0,
      1 0 0 3  1,   1 0 0 4  7,
      1 0 0 5  8,   1 0 0 6  8,
      1 0 0 7 19,   1 0 0 8  8,
      1 0 0 9  1,   0 1 0 1  6,
      0 1 0 2  9,   0 1 0 3 12,
      0 1 0 4 11,   0 1 0 5  7,
      0 1 0 6  6,   0 1 0 7  1,
      0 1 0 8  0,   0 1 0 9  0,
      0 0 1 1  1,   0 0 1 2  1,
      0 0 1 3  6,   0 0 1 4  8,
      0 0 1 5 23,   0 0 1 6  7,
      0 0 1 7  5,   0 0 1 8  1,
      0 0 1 9  0,   0 0 0 1  0,
      0 0 0 2  0,   0 0 0 3  0,
      0 0 0 4  1,   0 0 0 5  3,
      0 0 0 6  7,   0 0 0 7 14,
      0 0 0 8 16,   0 0 0 9 11 ];

/* use WGT variable */
clas = 4;
optn = [ "print" 5,
         "link"  "logit",
         "dist"  "binom",
         "wgt"   5,
         "tech"  "trureg" ];

gof = glim(a,"4= 1 2 3",optn,clas);

*****************
Model Information
*****************

Number Valid Observations          36
Response Variable                Y[4]
N Independend Variables             3
Error Distribution           BINOMIAL
Link Function                   LOGIT
Weight Variable Column              5
Significance Level:         0.0500000
Design Coding:              Full-Rank
Hessian Matx. for Optimization
Hessian Matx. for Covariance M
No Variable Selection Process

*************
Model Effects
*************

Intercept + X1 + X2 + X3

***********************
Class Level Information
***********************

Class    Level   Value

Y[1]         9   1 2 3 4 5
                 6 7 8 9

*****************
Simple Statistics
*****************

Column   Nobs        Mean     Std Dev    Skewness    Kurtosis
X[1]       36   0.2500000   0.4391550   1.2055236  -0.5822935
X[2]       36   0.2500000   0.4391550   1.2055236  -0.5822935
X[3]       36   0.2500000   0.4391550   1.2055236  -0.5822935

*********************
Parameter Information
*********************

Parameter | Meaning
----------------------------------------
No Param. | Threshold 0
        1 | Threshold 1
        2 | Threshold 2
        3 | Threshold 3
        4 | Threshold 4
        5 | Threshold 5
        6 | Threshold 6
        7 | Threshold 7
        8 | X1
        9 | X2
       10 | X3

************************************************
Sums of Weights and Frequencies for Class Levels
************************************************

Total Weights = 208

Variable   Value      Weight   Freq   Weight*Freq
Y[4]           1   7.0000000      4    7.00000000
               2   10.000000      4   10.00000000
               3   19.000000      4   19.00000000
               4   27.000000      4   27.00000000
               5   41.000000      4   41.00000000
               6   28.000000      4   28.00000000
               7   39.000000      4   39.00000000
               8   25.000000      4   25.00000000
               9   12.000000      4   12.00000000

8 Observations w. Zero Weight or Frequency

*********************
Goodness of Model Fit
*********************

Log Likelihood        -355.67395   Degrees of Freedom          17
Deviance                       .   Pearson ChiSquare            .

AIC (Intercept)        875.80180   AIC (All Param.)     733.34790
SBC (Intercept)        886.45944   SBC (All Param.)     748.00215
-2logL (Intercept)     859.80180   -2logL (All Param.)  711.34790
-2logL (ChiSqu.)       148.45390   Pvalue (df= 3)       0.0000000
Score ChiSqu. Test     143.66374   Pvalue (df= 3)       0.0000000

Score Test for Proportional Odds Assumption:
ChiSquare 17.286815   Pvalue (df= 21) 0.6935801
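
As a check on the fit statistics above: the information criteria follow the usual definitions AIC = -2 log L + 2p and SBC = -2 log L + p log n, with p the number of free parameters and n the number of observations with nonzero weight. Here p = 11 (eight free thresholds plus three slopes) and n = 36 - 8 = 28, so 711.34790 + 2*11 = 733.348 and 711.34790 + 11*log(28) = 748.002, in agreement with the table.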

*******************************************
Analysis of Effects and Parameter Estimates
*******************************************

Parameter     DF    Estimate   Std_Error   WaldChiSq   Pr>ChiSq

Threshold 1    0    0.000000           .           .          .
Threshold 2    1   -7.080166    0.564010   157.58438   0.000000
Threshold 3    1   -6.024980    0.476431   159.92303   0.000000
Threshold 4    1   -4.925416    0.425651   133.89917   0.000000
Threshold 5    1   -3.856801    0.388022   98.796787   0.000000
Threshold 6    1   -2.520552    0.345268   53.293965   0.000000
Threshold 7    1   -1.568538    0.312208   25.240787   0.000001
Threshold 8    1   -0.066875    0.273819   0.0596489   0.807052
Threshold 9    1    1.492974    0.335696   19.779412   0.000009
X1             1    1.612791    0.380544   17.961686   0.000023
X2             1    4.964640    0.476721   108.45455   0.000000
X3             1    3.322683    0.421830   62.044388   0.000000
Scale          0           .    0.000000           .          .

*******************************
Confidence Limits of Parameters
*******************************

Parameter       Estimate      LowWaldCL     UppWaldCL

Intercept 1   -7.0801661   -8.18560555   -5.97472665
Intercept 2   -6.0249800   -6.95876784   -5.09119223
Intercept 3   -4.9254155   -5.75967671   -4.09115436
Intercept 4   -3.8568014   -4.61730966   -3.09629308
Intercept 5   -2.5205516   -3.19726486   -1.84383839
Intercept 6   -1.5685382   -2.18045410   -0.95662228
Intercept 7   -0.0668752   -0.60355079    0.46980039
Intercept 8    1.4929743    0.83502305    2.15092564
X1             1.6127909    0.86693900    2.35864281
X2             4.9646400    4.03028487    5.89899505
X3             3.3226828    2.49591085    4.14945470

*************************************
Odds Ratio and Standardized Estimates
*************************************

Parameter   OddsRatio    StndEst

X1          5.0167931   0.390487
X2          143.25696   1.202033
X3          27.734657   0.804484
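
The odds ratios are the exponentiated parameter estimates, for example exp(1.612791) = 5.0168 for X1. The standardized estimates appear to be the slopes rescaled by the standard deviation of the regressor relative to the standard deviation pi/sqrt(3) of the logistic distribution, e.g. 1.612791 * 0.4391550 / 1.8138 = 0.3905 for X1.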

*******************************
Evaluation of Training Data Fit
*******************************

Index                               Value        StdErr
Absolute Classification Error          24             .
Concordant Pairs              53.04347826             .
Discordant Pairs              22.31884058             .
Tied Pairs                    24.63768116             .
Classification Accuracy       14.28571429             .
Goodman-Kruskal Gamma         0.407692308   0.137866946
Kendall Tau_b                 0.333397629   0.114924996
Stuart Tau_c                  0.304209184   0.106180684
Somers D C|R                  0.307246377   0.105153193

Classification Table
--------------------

         |                      Predicted
Observed |     1     2     3     4     5     6     7
---------|---------------------------------------------------------
       1 |     0     0     1     0     1     0     0
       2 |     0     0     1     0     1     0     0
       3 |     0     0     1     0     1     0     1
       4 |     0     0     1     0     1     0     1
       5 |     0     0     1     0     1     0     1
       6 |     0     0     1     0     1     0     1
       7 |     0     0     1     0     1     0     1
       8 |     0     0     0     0     1     0     1
       9 |     0     0     0     0     0     0     1

         |    Predicted
Observed |     8     9
---------|-----------------
       1 |     0     0
       2 |     0     0
       3 |     0     0
       4 |     1     0
       5 |     1     0
       6 |     1     0
       7 |     1     0
       8 |     1     0
       9 |     1     0

Estimated Covariance Matrix of Estimates
----------------------------------------

 1|  0.3181
 2|  0.2198   0.227
 3|  0.1747   0.1763   0.1812
 4|  0.1459   0.1462   0.1472   0.1506
 5|  0.1155   0.1155   0.1155   0.116    0.1192
 6|  0.09295  0.09294  0.0929   0.09294  0.0937
 7|  0.06087  0.06087  0.06087  0.06083  0.06072
 8|  0.04271  0.04271  0.04271  0.04268  0.04252
 9| -0.09436 -0.09441 -0.09453 -0.09431 -0.09187
10| -0.1884  -0.1819  -0.1668  -0.1448  -0.1155
11| -0.1347  -0.1349  -0.1345  -0.1316  -0.1145

 6|  0.09747
 7|  0.0616   0.07498
 8|  0.04293  0.05105  0.1127
 9| -0.08615 -0.06439 -0.04644  0.1448
10| -0.09302 -0.06087 -0.0427   0.09415  0.2273
11| -0.09319 -0.06102 -0.04278  0.09238  0.1322

11|  0.1779

Estimated Correlation Matrix of Estimates
-----------------------------------------

 1|  1
 2|  0.8182  1
 3|  0.7277  0.8694  1
 4|  0.6666  0.7906  0.891   1
 5|  0.5931  0.7022  0.7862  0.866   1
 6|  0.5279  0.6248  0.6991  0.7672  0.8692
 7|  0.3942  0.4666  0.5222  0.5725  0.6422
 8|  0.2256  0.2671  0.2989  0.3276  0.3669
 9| -0.4397 -0.5207 -0.5836 -0.6387 -0.6992
10| -0.7007 -0.8007 -0.8221 -0.7829 -0.7019
11| -0.5661 -0.6711 -0.7493 -0.8039 -0.7862

 6|  1
 7|  0.7206  1
 8|  0.4096  0.5554  1
 9| -0.7251 -0.6179 -0.3636  1
10| -0.625  -0.4663 -0.2668  0.519   1
11| -0.7076 -0.5283 -0.3021  0.5755  0.6573

11|  1

********************************************
 Nobs        Ypred      LowerCL      UpperCL
********************************************

    1   0.00420455   0.00151068   0.01164610
    2   0.01198326   0.00521714   0.02728368
    3   0.03514062   0.01732919   0.06995586
    4   0.09586736   0.05293003   0.16747645
    5   0.28745828   0.18799250   0.41279717
    6   0.51106137   0.38360247   0.63709802
    7   0.82432305   0.72204664   0.89446612
    8   0.95712993   0.90974630   0.98017892
    9   0.96178570   0.92270854   0.98150222
   10   0.10759690   0.05116063   0.21235585
   11   0.25724447   0.16110570   0.38446097
   12   0.50980485   0.37827136   0.63999461
   13   0.75172594   0.62851481   0.84420135
   14   0.92012807   0.85547532   0.95730212
   15   0.96758248   0.93500141   0.98410958
   16   0.99259204   0.98312769   0.99676493
   17   0.99843392   0.99580122   0.99941685
   18   0.99861051   0.99400160   0.99967928
   19   0.02280997   0.00909698   0.05602549
   20   0.06283794   0.03158777   0.12113701
   21   0.16760002   0.10057683   0.26607399
   22   0.36955680   0.26210463   0.49170602
   23   0.69043017   0.57212596   0.78813698
   24   0.85247479   0.76302641   0.91205151
   25   0.96288124   0.92735662   0.98138211
   26   0.99196322   0.98069873   0.99667588
   27   0.99286423   0.97178259   0.99822425
   28   8.409e-004   2.786e-004   0.00253575
   29   0.00241177   9.494e-004   0.00611308
   30   0.00720739   0.00314223   0.01644496
   31   0.02069803   0.00978269   0.04326042
   32   0.07442993   0.03926878   0.13659797
   33   0.17242488   0.10151950   0.27755498
   34   0.48328743   0.35353175   0.61533651
   35   0.81652429   0.69741598   0.89575524
   36   0.83379851   0.70410837   0.91361876

*********************************
LR Statistics for Type 1 Analysis
*********************************

Effect      Deviance   DF    ChiSquare   Pr>Chi

Intercept          .    0            .        .
X1                 .    1   8.64383735   0.0033
X2                 .    1   67.3990766   0.0000
X3                 .    1   72.4109852   0.0000

*********************************
LR Statistics for Type 3 Analysis
*********************************

Effect   DF    ChiSquare   Pr>Chi

X1        1   19.1326383   0.0000
X2        1   136.064397   0.0000
X3        1   72.4109852   0.0000

2. Multinomial Response Logistic Regression Example (see PROC LOGISTIC in SAS/STAT®):

school = [ 1 "regular"   "self"  10 ,   1 "regular"   "team"  17 ,
           1 "regular"   "class" 26 ,   1 "afternoon" "self"   5 ,
           1 "afternoon" "team"  12 ,   1 "afternoon" "class" 50 ,
           2 "regular"   "self"  21 ,   2 "regular"   "team"  17 ,
           2 "regular"   "class" 26 ,   2 "afternoon" "self"  16 ,
           2 "afternoon" "team"  12 ,   2 "afternoon" "class" 36 ,
           3 "regular"   "self"  15 ,   3 "regular"   "team"  15 ,
           3 "regular"   "class" 16 ,   3 "afternoon" "self"  12 ,
           3 "afternoon" "team"  12 ,   3 "afternoon" "class" 20 ];

cnam = [ "School" "Program" "Style" "Count" ];
school = cname(school,cnam);
print "Data=", school;
nr = nrow(school); nc = ncol(school); print "nr,nc=", nr,nc;

clas = [ 1 2 3 ];
modl = "3 = 1 2 ";
optn = [ "print" 5 ,
         "link"  "logit" ,
         "dist"  "binom" ,
         "ynom"  ,
         "freq"  4 ,
         "seed"  123 ,
         "tech"  "trureg" ];

gof = glim(school,modl,optn,clas);
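
With the "ynom" option the three-level response Style is treated as nominal rather than ordinal, so a generalized (baseline-category) logit model is fitted: for response levels j = 1, ..., k with reference level k,

    log [ P(Y = j | x) / P(Y = k | x) ] = alpha_j + x'beta_j ,

i.e. each non-reference level gets its own intercept and slope vector, in contrast to the single slope vector of the cumulative model in the first example.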

3. Poisson Regression Example: Using Offset Variable (see PROC GENMOD: Example in text)

ins0 = [  500  42 "small" 1,
         1200  37 "medum" 1,
          100   1 "large" 1,
          400 101 "small" 2,
          500  73 "medum" 2,
          300  14 "large" 2 ];

v1 = log(ins0[,1]);
ins1 = ins0 -> v1;

clas = [ 3 4 ];
optn = [ "print"  5,
         "link"   "log",
         "dist"   "poisson",
         "design" "rankdef",
         "offset" 5,
         "order"  "asc",
         "cl"     "plik",
         "tech"   "trureg" ];

model = "2 = 3 4";gof = glim(ins1,model,optn,clas);

*****************
Model Information
*****************

Number Valid Observations           6
Response Variable                Y[2]
N Independend Variables             2
Error Distribution            POISSON
Link Function                     LOG
Offset Variable Column              5
Significance Level:         0.0500000
Design Coding:         Rank-Deficient
Hessian Matx. for Optimization
Hessian Matx. for Covariance M
No Variable Selection Process

*************
Model Effects
*************

Intercept + C3 + C4

***********************
Class Level Information
***********************

Class    Level   Value

C[3]         3   large medum small
C[4]         2   1 2

*********************
Parameter Information
*********************

Parameter | Meaning
----------------------------------------
        1 | Intercept
        2 | C3 large
        3 | C3 medum
        4 | C3 small
        5 | C4 1
        6 | C4 2

***************************************
Number of Observations for Class Levels
***************************************

Variable   Value   Nobs   Proportion
C[3]       large      2    33.333333
           medum      2    33.333333
           small      2    33.333333
C[4]       1          3    50.000000
           2          3    50.000000

Number Optimizations=68 Number Iterations=138

*********************
Goodness of Model Fit
*********************

Log Likelihood      837.45327   Degrees of Freedom           2
Deviance            2.8206651   Scaled Deviance      2.8206651
Pearson ChiSquare   2.8416089   Scaled Pearson CS    2.8416089

*******************************************
Analysis of Effects and Parameter Estimates
*******************************************

Parameter    DF    Estimate   Std_Error   WaldChiSq   Pr>ChiSq

Intercept     1   -1.316758    0.090280   212.73212   0.000000
C3 large      1   -1.764281    0.272368   41.958749   0.000000
C3 medum      1   -0.692778    0.128248   29.179963   0.000000
C3 small      0    0.000000           .           .          .
C4 1          1   -1.319933    0.135896   94.338801   0.000000
C4 2          0    0.000000           .           .          .
Scale         0    1.000000    0.000000           .          .


*******************************
Confidence Limits of Parameters
*******************************

Parameter    Estimate   LowWaldCL   UppWaldCL     LowPLCL     UppPLCL

Intercept  -1.3167581   -1.493703   -1.139813   -1.498003   -1.144507
C3 large   -1.7642810   -2.298113   -1.230449   -2.338911   -1.265364
C3 medum   -0.6927777   -0.944140   -0.441416   -0.945318   -0.442494
C3 small    0.0000000           .           .           .           .
C4 1       -1.3199328   -1.586284   -1.053582   -1.590453   -1.057755
C4 2        0.0000000           .           .           .           .

Estimated Covariance Matrix of Estimates
----------------------------------------

 1|  0.00815
 2| -0.007772  0.07418
 3| -0.006344  0.006556  0.01645
 4|  .         .         .         .
 5| -0.004623  0.003113 -0.002592  .         0.01847
 6|  .         .         .         .         .

 6|  .

Estimated Correlation Matrix of Estimates
-----------------------------------------

 1|  1
 2| -0.3161    1
 3| -0.5479    0.1877    1
 4|  .         .         .        .
 5| -0.3768    0.08411  -0.1487   .        1
 6|  .         .         .        .        .

 6|  .

********************************************************
 Nobs        Ypred      LowerCL      UpperCL      Diag(H)
********************************************************

    1   35.7989022   27.6490421   46.3510234   0.62188524
    2   42.9745564   33.5520312   55.0432398   0.68533611
    3   1.22654141   0.69916905   2.15170257   0.10086792
    4   107.201098   89.8158801   127.951487   0.87373177
    5   67.0254436   54.1186609   83.0103706   0.79824765
    6   13.7734586   8.29965942   22.8573429   0.91993131


Pearson ChiSquare=2.84161 Deviance=2.82067 (df=2)

*********************************
LR Statistics for Type 1 Analysis
*********************************

Effect        Deviance   DF    ChiSquare   Pr>Chi

Intercept  175.1535596    0            .        .
C3         107.4620294    2   67.6915301   0.0000
C4         2.820665105    1   104.641364   0.0000

*********************************
LR Statistics for Type 3 Analysis
*********************************

Effect   DF    ChiSquare   Pr>Chi

C3        2   72.8180614   0.0000
C4        1   104.641364   0.0000

4. Logistic regression, Events/Trial (events in column 3, trials in column 4; see PROC GENMOD, Example 1):

a = [ ’a’ .1   1 10,   ’a’ .23  2 12,   ’a’ .67  1  9,
      ’b’ .2   3 13,   ’b’ .3   4 15,   ’b’ .45  5 16,   ’b’ .78  5 13,
      ’c’ .04  0 10,   ’c’ .15  0 11,   ’c’ .56  1 12,   ’c’ .7   2 12,
      ’d’ .34  5 10,   ’d’ .6   5  9,   ’d’ .7   8 10,
      ’e’ .2  12 20,   ’e’ .34 15 20,   ’e’ .56 13 15,   ’e’ .8  17 20 ];

clas = 1;
optn = [ "print"  5,
         "link"   "logit",
         "dist"   "binom",
         "design" "rankdef",
         "trial"  4,
         "cl"     "plik",
         "tech"   "trureg" ];

gof = glim(a,"3/4 = 1 2",optn,clas);

*****************
Model Information
*****************

Number Valid Observations          18
Events/Trial Resp             [3]/[4]
N Independend Variables             2
Error Distribution           BINOMIAL
Link Function                   LOGIT
Trial Variable Column               4
Significance Level:         0.0500000
Design Coding:         Rank-Deficient
Hessian Matx. for Optimization
Hessian Matx. for Covariance M
No Variable Selection Process

*************
Model Effects
*************

Intercept + C1 + X2

***********************
Class Level Information
***********************

Class    Level   Value

C[1]         5   a b c d e

*****************
Simple Statistics
*****************

Column   Nobs        Mean     Std Dev    Skewness    Kurtosis
Y[3]       18   0.3746973   0.2992524   0.4545768  -1.2341128
X[2]       18   0.4288889   0.2478351   0.0274600  -1.4378591

*********************
Parameter Information
*********************

Parameter | Meaning
----------------------------------------
        1 | Intercept
        2 | C1 a
        3 | C1 b
        4 | C1 c
        5 | C1 d
        6 | C1 e
        2 | X2

*******************************
Events/Trials Response Variable
*******************************

Value      Nobs   Proportion
Event        99    41.772152
NonEvent    138    58.227848

***************************************
Number of Observations for Class Levels
***************************************

Variable   Value   Nobs   Proportion
C[1]       a          3    16.666667
           b          4    22.222222
           c          4    22.222222
           d          3    16.666667
           e          4    22.222222

Number Optimizations=100 Number Iterations=211

*********************
Goodness of Model Fit
*********************

Log Likelihood       -114.77315   Degrees of Freedom           12
Deviance              5.2751265   Pearson ChiSquare     4.5132642

AIC (Intercept)       324.10476   AIC (All Param.)      241.54630
SBC (Intercept)       327.57282   SBC (All Param.)      262.35466
-2logL (Intercept)    322.10476   -2logL (All Param.)   229.54630
-2logL (ChiSqu.)      92.558457   Pvalue (df= 5)        0.0000000
Score ChiSqu. Test    82.028739   Pvalue (df= 5)        0.0000000

*******************************************
Analysis of Effects and Parameter Estimates
*******************************************

Parameter    DF    Estimate   Std_Error   WaldChiSq   Pr>ChiSq

Intercept     1    0.279241    0.419563   0.4429610   0.505697
C1 a          1   -2.895540    0.609224   22.589416   0.000002
C1 b          1   -2.016170    0.405161   24.762787   0.000001
C1 c          1   -3.795214    0.665460   32.525839   0.000000
C1 d          1   -0.854839    0.483816   3.1218157   0.077251
C1 e          0    0.000000           .           .          .
X2            1    1.979395    0.766021   6.6770288   0.009766
Scale         0    1.000000    0.000000           .          .

*******************************
Confidence Limits of Parameters
*******************************

Parameter    Estimate   LowWaldCL   UppWaldCL     LowPLCL     UppPLCL

Intercept   0.2792412   -0.543086    1.101569   -0.533186    1.117081
C1 a       -2.8955403   -4.089598   -1.701483   -4.226861   -1.791785
C1 b       -2.0161704   -2.810271   -1.222070   -2.835468   -1.244268
C1 c       -3.7952137   -5.099491   -2.490936   -5.310122   -2.626765
C1 d       -0.8548389   -1.803102    0.093424   -1.803701    0.101024
C1 e        0.0000000           .           .           .           .
X2          1.9793946    0.478021    3.480768    0.504821    3.516931

*************************************
Odds Ratio and Standardized Estimates
*************************************

Parameter   OddsRatio    StndEst

C1 a        0.0552692          .
C1 b        0.1331645          .
C1 c        0.0224781          .
C1 d        0.4253517          .
C1 e        1.0000000          .
X2          7.2383596   0.270462

*******************************
Evaluation of Training Data Fit
*******************************

Index                               Value        StdErr
Absolute Classification Error         104             .
Concordant Pairs              19.14800176             .
Discordant Pairs              15.92007027             .
Tied Pairs                    64.93192798             .
Classification Accuracy       56.11814346             .
Goodman-Kruskal Gamma         0.092047589   0.155717980
Kendall Tau_b                 0.038206267   0.065391230
Stuart Tau_c                  0.031405224   0.053805000
Somers D C|R                  0.032279315   0.055298112

Classification Table
--------------------

         |      Predicted
Observed |   Event   NoEvt
---------|-----------------
Event    |      24      75
NoEvt    |      29     109

Estimated Covariance Matrix of Estimates
----------------------------------------

 1|  0.176
 2| -0.07918   0.3712
 3| -0.06438   0.07529   0.1642
 4| -0.05461   0.07494   0.07805   0.4428
 5| -0.04626   0.07465   0.07899   0.08186   0.2341
 6|  .         .         .         .         .
 7| -0.2427    0.008465 -0.02733  -0.05096  -0.07115

 6|  .
 7|  .         0.5868

Estimated Correlation Matrix of Estimates
-----------------------------------------

 1|  1
 2| -0.3098    1
 3| -0.3787    0.305     1
 4| -0.1956    0.1849    0.2895    1
 5| -0.2279    0.2533    0.403     0.2543    1
 6|  .         .         .         .         .
 7| -0.755     0.01814  -0.08805  -0.09997  -0.192

 6|  .
 7|  .         1

********************************************************
 Nobs        Ypred      LowerCL      UpperCL      Diag(H)
********************************************************

    1   0.08178335   0.02726973   0.22056278   0.26122030
    2   0.10330423   0.03711003   0.25615979   0.34696627
    3   0.21583508   0.08089329   0.46258554   0.51548072
    4   0.20733739   0.11514186   0.34460523   0.27113642
    5   0.24174966   0.14556368   0.37369596   0.28113645
    6   0.30023025   0.19385073   0.43358900   0.29333152
    7   0.45189886   0.27987278   0.63624192   0.47412213
    8   0.03116506   0.00812448   0.11215879   0.14707628
    9   0.03845460   0.01086362   0.12711474   0.17683756
   10   0.08260169   0.02682237   0.22728723   0.33174345
   11   0.10617737   0.03395168   0.28648500   0.43972049
   12   0.52432987   0.32837457   0.71306946   0.42907550
   13   0.64840579   0.46188261   0.79848365   0.31241820
   14   0.69210492   0.50298517   0.83313577   0.35328507
   15   0.66264868   0.51195104   0.78624314   0.45799379
   16   0.72156450   0.59913422   0.81796367   0.31684955
   17   0.80022515   0.69112953   0.87761101   0.21165914
   18   0.86561796   0.74472842   0.93430758   0.37994716

Pearson ChiSquare=4.51326 Deviance=5.27513 (df=12)

*********************************
LR Statistics for Type 1 Analysis
*********************************

Effect        Deviance   DF    ChiSquare   Pr>Chi

Intercept  97.83358387    0            .        .
C1         12.24049041    4   85.5930935   0.0000
X2         5.275126475    1   6.96536393   0.0083

*********************************
LR Statistics for Type 3 Analysis
*********************************

Effect   DF    ChiSquare   Pr>Chi

C1        4   77.0556642   0.0000
X2        1   6.96536393   0.0083

5. Normal Regression, Log Link (see PROC GENMOD document):


a = [ 0  5,   0  7,   0  9,
      1  7,   1 10,   1  8,
      2 11,   2  9,
      3 16,   3 13,   3 14,
      4 25,   4 24,
      5 34,   5 32,   5 30 ];

optn = [ "print"  5,
         "link"   "log",
         "dist"   "normal",
         "hesalg" "fscm",
         "tech"   "trureg" ];

gof = glim(a,"2=1",optn);

*****************
Model Information
*****************

Number Valid Observations          16
Response Variable                Y[2]
N Independend Variables             1
Error Distribution             NORMAL
Link Function                     LOG
Significance Level:         0.0500000
Design Coding:              Full-Rank
Fisher Sc.Mx. for Optimization
Hessian Matx. for Covariance M
No Variable Selection Process

*************
Model Effects
*************

Intercept + X1

*****************
Simple Statistics
*****************

Column   Nobs        Mean     Std Dev    Skewness    Kurtosis
Y[2]       16   15.875000   9.7971084   0.8175144  -0.8626427
X[1]       16   2.4375000   1.8246004   0.0767272  -1.3682991

*********************
Parameter Information
*********************

Parameter | Meaning
----------------------------------------
        1 | Intercept
        2 | X1
        3 | Weight Parameter

*********************
Goodness of Model Fit
*********************

Log Likelihood     -32.178277   Degrees of Freedom          14
Deviance            52.299994   Scaled Deviance      16.000000
Pearson ChiSquare   52.299994   Scaled Pearson CS    16.000000

*******************************************
Analysis of Effects and Parameter Estimates
*******************************************

Parameter    DF   Estimate   Std_Error   WaldChiSq   Pr>ChiSq

Intercept     1   1.721357    0.089397   370.76077   0.000000
X1            1   0.349602    0.020649   286.64049   0.000000
Scale         1   1.807968    0.319607           .          .

*******************************
Confidence Limits of Parameters
*******************************

Parameter    Estimate    LowWaldCL    UppWaldCL

Intercept   1.7213575   1.54614206   1.89657286
X1          0.3496015   0.30912971   0.39007333
Scale       1.8079684   1.18155080   2.43438594

Estimated Covariance Matrix of Estimates
----------------------------------------

 1|  0.007992
 2| -0.001775     0.0004264
 3| -2.2e-008     5.285e-009   0.1021

Estimated Correlation Matrix of Estimates
-----------------------------------------

 1|  1
 2| -0.9615       1
 3| -7.699e-007   8.008e-007   1

********************************************************
 Nobs        Ypred      LowerCL      UpperCL      Diag(H)
********************************************************

    1   5.59211440   4.69332865   6.66302018   0.08455286
    2   5.59211440   4.69332865   6.66302018   0.05720819
    3   5.59211440   4.69332865   6.66302018   0.02986351
    4   7.93242650   6.91854568   9.09488686   0.10473118
    5   7.93242650   6.91854568   9.09488686   0.06928857
    6   7.93242650   6.91854568   9.09488686   0.09291697
    7   11.2521643   10.1824755   12.4342261   0.10286885
    8   11.2521643   10.1824755   12.4342261   0.12075236
    9   15.9612197   14.9221444   17.0726491   0.09171294
   10   15.9612197   14.9221444   17.0726491   0.10899289
   11   15.9612197   14.9221444   17.0726491   0.10323290
   12   22.6410251   21.5673585   23.7681409   0.08631584
   13   22.6410251   21.5673585   23.7681409   0.09057162
   14   32.1163435   30.2803900   34.0636141   0.26793868
   15   32.1163435   30.2803900   34.0636141   0.28566377
   16   32.1163435   30.2803900   34.0636141   0.30338887

Pearson ChiSquare=52.3 Deviance=52.3 (df=14)

*********************************
LR Statistics for Type 1 Analysis
*********************************

Effect        Deviance   DF    ChiSquare   Pr>Chi

Intercept  1439.750000    0            .        .
X1         52.29999415    1   53.0436561   0.0000

*********************************
LR Statistics for Type 3 Analysis
*********************************

Effect DF ChiSquare Pr>Chi

X1 1 53.0436561 0.0000

6. Gamma Distribution Applied to Life Data (see PROC GENMOD document):


a = [  620  470  260   89  388
       242  103  100   39  460
       284 1285  218  393  106
       158  152  477  403  103
        69  158  818  947  399
      1274   32   12  134  660
       548  381  203  871  193
       531  317   85 1410  250
        41 1101   32  421   32
       343  376 1512 1792   47
        95   76  515   72 1585
       253    6  860   89 1055
       537  101  385  176   11
       565  164   16 1267  352
       160  195 1279  356  751
       500  803  560  151   24
       689 1119 1733 2194  763
       555   14   45  776    1 ];

b = [ 1747  945   12 1453   14
       150   20   41   35   69
       195   89 1090 1868  294
        96  618   44  142  892
      1307  310  230   30  403
       860   23  406 1054 1935
       561  348  130   13  230
       250  317  304   79 1793
       536   12    9  256  201
       733  510  660  122   27
       273 1231  182  289  667
       761 1096   43   44   87
       405  998 1409   61  278
       407  113   25  940   28
       848   41  646  575  219
       303  304   38  195 1061
       174  377  388   10  246
       323  198  234   39  308
        55  729  813 1216 1618
       539    6 1566  459  946
       764  794   35  181  147
       116  141   19  380  609
       546 ];

nca = ncol(a); av = cons(nca,1,’a’);
ncb = ncol(b); bv = cons(ncb,1,’b’);
ad = av -> a‘; bd = bv -> b‘;
lifdat = ad |> bd;
print lifdat;

optn = [ "print"  5,
         "link"   "log",
         "dist"   "gamma",
         "design" "rankdef",
         "hesalg" "hess",
         "cov"    "fscm",
         "cl"     "plik",
         "tech"   "trureg" ];

clas = 1;
model = "2=1";
gof = glim(lifdat,model,optn,clas);
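
The gamma model with log link assumes E(y_i) = exp(x_i'beta) with constant coefficient of variation. Since the only effect is the two-level class variable C1, the fitted means are simply the average lifetimes of the two groups (about 468.74 for group a and 459.51 for group b), which is what the predicted values in the output below show.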

*****************
Model Information
*****************

Number Valid Observations         201
Response Variable                Y[2]
N Independend Variables             1
Error Distribution              GAMMA
Link Function                     LOG
Significance Level:         0.0500000
Design Coding:         Rank-Deficient
Hessian Matx. for Optimization
Fisher Sc.Mx. for Covariance M
No Variable Selection Process

*************
Model Effects
*************

Intercept + C1

***********************
Class Level Information
***********************

Class    Level   Value

C[1]         2   a b

*********************
Parameter Information
*********************

Parameter | Meaning
----------------------------------------
        1 | Intercept
        2 | C1 a
        3 | C1 b
        2 | Weight Parameter

***************************************
Number of Observations for Class Levels
***************************************

Variable   Value   Nobs   Proportion
C[1]       a         90    44.776119
           b        111    55.223881

Number Optimizations=52 Number Iterations=75


*********************
Goodness of Model Fit
*********************

Log Likelihood     -1432.4177   Degrees of Freedom         199
Deviance            287.05911   Scaled Deviance      237.53348
Pearson ChiSquare   211.68666   Scaled Pearson CS    175.16486

*******************************************
Analysis of Effects and Parameter Estimates
*******************************************

Parameter    DF   Estimate   Std_Error   WaldChiSq   Pr>ChiSq

Intercept     1   6.130168    0.104343   3451.6064   0.000000
C1 a          1   0.019889    0.155933   0.0162692   0.898505
C1 b          0   0.000000           .           .          .
Scale         1   0.827472    0.071440           .          .

*******************************
Confidence Limits of Parameters
*******************************

Parameter    Estimate   LowWaldCL   UppWaldCL    LowPLCL    UppPLCL

Intercept   6.1301683    5.925661    6.334676   5.931925   6.342147
C1 a        0.0198894   -0.285734    0.325513  -0.284546   0.328243
C1 b        0.0000000           .           .          .          .
Scale       0.8274723    0.687453    0.967492   0.696083   0.975914

Estimated Covariance Matrix of Estimates
----------------------------------------

 1|  0.01089
 2| -0.01089       0.02432
 3|  .             .            .
 4|  2.07e-011     1.153e-011   .   0.005104

Estimated Correlation Matrix of Estimates
-----------------------------------------

 1|  1
 2| -0.6691        1
 3|  .             .            .
 4|  2.777e-009    1.035e-009   .   1

********************************************************
 Nobs        Ypred      LowerCL      UpperCL      Diag(H)
********************************************************

    1   468.744442   373.508396   588.263488   0.01111111
    2   468.744442   373.508396   588.263488   0.01111111
    3   468.744442   373.508396   588.263488   0.01111111
    4   468.744442   373.508396   588.263488   0.01111111
    5   468.744442   373.508396   588.263488   0.01111111
    6   468.744442   373.508396   588.263488   0.01111111
    7   468.744442   373.508396   588.263488   0.01111111
    8   468.744442   373.508396   588.263488   0.01111111
    9   468.744442   373.508396   588.263488   0.01111111
   10   468.744442   373.508396   588.263488   0.01111111

  198   459.513512   374.525749   563.786785   0.00900901
  199   459.513512   374.525749   563.786785   0.00900901
  200   459.513512   374.525749   563.786785   0.00900901
  201   459.513512   374.525749   563.786785   0.00900901

Pearson ChiSquare=211.687 Deviance=287.059 (df=199)

*********************************
LR Statistics for Type 1 Analysis
*********************************

Effect        Deviance   DF    ChiSquare   Pr>Chi

Intercept  287.0787882    0            .        .
C1         287.0591137    1   0.01627968   0.8985

*********************************
LR Statistics for Type 3 Analysis
*********************************

Effect DF ChiSquare Pr>Chi

C1 1 0.01627968 0.8985

7. Poisson Polynomial Regression (see PROC GENMOD document):


a = [  1   0,
       2  10,
       3  50,
       4  80,
       5 110,
       6 116,
       7  82,
       8  78,
       9  66,
      10 207,
      11 900 ];

v = [ -.5 : .1 : .5 ]‘;
aa = a[ ,2] -> v;

optn = [ "print"  5,
         "nomis"  ,
         "link"   "log",
         "dist"   "poi",
         "design" "rankdef",
         "tech"   "trureg" ];

model = "1= 2 2*2 2*2*2 2*2*2*2";gof = glim(aa,model,optn);

*****************
Model Information
*****************

Number Valid Observations          11
Response Variable                Y[1]
N Independend Variables             1
Error Distribution            POISSON
Link Function                     LOG
Skip Cases with Missing Values
Significance Level:         0.0500000
Design Coding:         Rank-Deficient
Hessian Matx. for Optimization
Hessian Matx. for Covariance M
No Variable Selection Process

*************
Model Effects
*************

Intercept + X2 + X2 * X2 + X2 * X2 * X2 + X2 * X2 * X2 * X2

*****************
Simple Statistics
*****************

Column   Nobs         Mean     Std Dev    Skewness    Kurtosis
Y[1]       11    154.45455   253.40693   3.0391701   9.6268476
X[2]       11   -3.03e-017   0.3316625   3.35e-016  -1.2000000

*********************
Parameter Information
*********************

Parameter | Meaning
----------------------------------------
        1 | Intercept
        2 | X2
        3 | X2*X2
        4 | X2*X2*X2
        5 | X2*X2*X2*X2

*********************
Goodness of Model Fit
*********************

Log Likelihood      8135.1566   Degrees of Freedom           6
Deviance            14.442870   Scaled Deviance      14.442870
Pearson ChiSquare   13.150284   Scaled Pearson CS    13.150284

*******************************************
Analysis of Effects and Parameter Estimates
*******************************************

Parameter      DF    Estimate   Std_Error   WaldChiSq   Pr>ChiSq

Intercept       1    4.639941    0.059460   6089.3861   0.000000
X2              1   -1.989841    0.374649   28.209052   0.000000
X2*X2           1   -6.203628    1.479680   17.577451   0.000028
X2*X2*X2        1    35.68272    3.486159   104.76626   0.000000
X2*X2*X2*X2     1    4.052269    8.075194   0.2518202   0.615796
Scale           0    1.000000    0.000000           .          .

*******************************
Confidence Limits of Parameters
*******************************

Parameter       Estimate     LowWaldCL     UppWaldCL

Intercept      4.6399406    4.52340090    4.75648023
X2            -1.9898407   -2.72413841   -1.25554305
X2*X2         -6.2036282   -9.10374773   -3.30350859
X2*X2*X2       35.682716    28.8499699    42.5154619
X2*X2*X2*X2    4.0522690   -11.7748197    19.8793577

Estimated Covariance Matrix of Estimates
----------------------------------------

 1|  0.003536
 2|  0.004564   0.1404
 3| -0.06303   -0.2722    2.189
 4| -0.02863   -1.122     2.051   12.15
 5|  0.2176     2.133    -9.766  -23.04    65.21

Estimated Correlation Matrix of Estimates
-----------------------------------------

 1|  1
 2|  0.2049   1
 3| -0.7164  -0.491    1
 4| -0.1381  -0.8594   0.3975   1
 5|  0.4532   0.7052  -0.8173  -0.8185   1

********************************************************
 Nobs        Ypred      LowerCL      UpperCL      Diag(H)
********************************************************

    1   0.88415863   0.28221082   2.77004435   0.30015986
    2   9.61515618   6.13026519   15.0811140   0.50708788
    3   42.4342704   34.8642839   51.6479074   0.42649741
    4   90.9932647   78.8829199   104.962826   0.48316533
    5   114.618689   102.379491   128.321050   0.38048303
    6   103.538194   92.1484538   116.335729   0.36605969
    7   82.6825151   72.9308377   93.7381020   0.33898568
    8   72.6586648   64.6934467   81.6045805   0.25500635
    9   88.3157553   78.5863773   99.2496778   0.31320861
   10   188.454325   168.142930   211.219304   0.63801970
   11   904.805007   847.969123   965.450366   0.99132646

Pearson ChiSquare=13.1503 Deviance=14.4429 (df=6)

*********************************
LR Statistics for Type 1 Analysis
*********************************

Effect                Deviance   DF    ChiSquare   Pr>Chi

Intercept          2557.176296    0            .        .
X2                 690.3605569    1   1866.81574   0.0000
X2 * X2            522.8352860    1   167.525271   0.0000
X2 * X2 * X2       14.69092338    1   508.144363   0.0000
X2 * X2 * X2 ...   14.44287007    1   0.24805332   0.6184

*********************************
LR Statistics for Type 3 Analysis
*********************************

Effect             DF    ChiSquare   Pr>Chi

X2                  1   29.9873509   0.0000
X2 * X2             1   17.5887602   0.0000
X2 * X2 * X2        1   233.343743   0.0000
X2 * X2 * X2 ...    1   0.24805332   0.6184

optn = [ "print" 5,"nomis" ,"link" "log","dist" "poi","design" "rankdef","scale" "pear","tech" "trureg" ];

model = "1= 2 2*2 2*2*2";gof = glim(aa,model,optn);

*****************
Model Information
*****************

Number Valid Observations          11
Response Variable                Y[1]
N Independend Variables             1
Error Distribution            POISSON
Link Function                     LOG
Skip Cases with Missing Values
Significance Level:         0.0500000
Design Coding:         Rank-Deficient
Hessian Matx. for Optimization
Hessian Matx. for Covariance M
No Variable Selection Process

*************
Model Effects
*************

Intercept + X2 + X2 * X2 + X2 * X2 * X2

*****************
Simple Statistics
*****************

Column   Nobs         Mean     Std Dev    Skewness    Kurtosis
Y[1]       11    154.45455   253.40693   3.0391701   9.6268476
X[2]       11   -3.03e-017   0.3316625   3.35e-016  -1.2000000

*********************
Parameter Information
*********************

Parameter | Meaning
----------------------------------------
        1 | Intercept
        2 | X2
        3 | X2*X2
        4 | X2*X2*X2

*********************
Goodness of Model Fit
*********************

Log Likelihood      4216.7107   Degrees of Freedom           7
Deviance            14.690923   Scaled Deviance      7.6148895
Pearson ChiSquare   13.504656   Scaled Pearson CS    7.0000000

*******************************************
Analysis of Effects and Parameter Estimates
*******************************************

Parameter    DF    Estimate   Std_Error   WaldChiSq   Pr>ChiSq

Intercept     1    4.626600    0.074145   3893.7163   0.000000
X2            1   -2.123067    0.367596   33.356969   0.000000
X2*X2         1   -5.606160    1.209750   21.475317   0.000004
X2*X2*X2      1    37.13480    2.816679   173.81507   0.000000
Scale         0    1.388970    0.000000           .          .

*******************************
Confidence Limits of Parameters
*******************************

Parameter     Estimate     LowWaldCL     UppWaldCL

Intercept    4.6266002    4.48127939    4.77192108
X2          -2.1230672   -2.84354162   -1.40259284
X2*X2       -5.6061598   -7.97722651   -3.23509307
X2*X2*X2     37.134797    31.6142078    42.6553861

Estimated Covariance Matrix of Estimates
----------------------------------------

 1|  0.005497
 2| -0.004889   0.1351
 3| -0.06086    0.09244   1.463
 4|  0.09662   -0.71     -2.813   7.934

Estimated Correlation Matrix of Estimates
-----------------------------------------

 1|  1
 2| -0.1794   1
 3| -0.6785   0.2079   1
 4|  0.4627  -0.6857  -0.8256   1

********************************************************
 Nobs        Ypred      LowerCL      UpperCL      Diag(H)
********************************************************

    1   0.70099416   0.25365604   1.93724074   0.09773833
    2   9.04502204   5.22153362   15.6682748   0.36841921
    3   42.7909446   32.6056525   56.1578989   0.42667738
    4   92.7482574   78.4096835   109.708889   0.35297182
    5   115.089555   98.5089243   134.460971   0.37581074
    6   102.166132   88.3476314   118.145992   0.29112608
    7   81.0743561   70.5939974   93.1106249   0.20960880
    8   71.8671138   61.8386007   83.5219746   0.21903547
    9   88.9224997   75.9776349   104.072876   0.29698328
   10   191.907748   169.733765   216.978536   0.39038397
   11   902.687377   825.574376   987.003139   0.97124493

Pearson ChiSquare=13.5047 Deviance=14.6909 (df=7)

*********************************
LR Statistics for Type 1 Analysis
*********************************

Effect         Deviance   NDF   DDF         F     Pr>F    ChiSqu   Pr>Chi

Intercept       2557.18     0     7         .        .         .        .
X2              690.361     1     7   967.645   0.0000   967.645   0.0000
X2 * X2         522.835     1     7   86.8350   0.0000   86.8350   0.0000
X2 * X2 * X2    14.6909     1     7   263.391   0.0000   263.391   0.0000

*********************************
LR Statistics for Type 3 Analysis
*********************************

Effect         NDF   DDF           F     Pr>F   ChiSquare   Pr>Chi

X2               1     7   33.966367   0.0006   33.966367   0.0000
X2 * X2          1     7   27.620368   0.0012   27.620368   0.0000
X2 * X2 * X2     1     7   263.39142   0.0000   263.39142   0.0000

1.1.2 Stratified Two-Phase Logistic Regression

Example 1: Data by Carroll, Gail & Lubin (1993)

Note that this code is available in the files cmat/test/twophase.inp and twop_macro.inp. The call cmat twophase.inp will generate the two output files twophase.txt and twophase.log.

1. Include the file with the functions for the estimation:

This file should be available in the cmat/test directory.

%inc "twop_macro.inp";

2. Two small data sets (Carroll, Gail & Lubin, 1993):

The first data set is small but with large counts in column 3. The first column contains the strata numbers and column 2 the binary response (disease, case-control):

print "Test1: Carroll Data";carr1 = [ 1 0 750 0 ,

2 0 562 1 ,1 1 336 0 ,2 1 396 1 ];

cnam1 = [" Strata Dy count Z "];carr1 = cname(carr1,cnam1);

The larger data set for phase 2 contains the strata number in the first column, the response in the second column, and the smaller count numbers in the third column. Column 4 is unused and corresponds to the strata; column 5 contains the only predictor variable, which here is binary.

carr2 = [ 1 0 33 0 0 ,
          1 0 16 0 1 ,
          2 0 11 1 0 ,
          2 0 16 1 1 ,
          1 1 13 0 0 ,
          1 1  5 0 1 ,
          2 1  3 1 0 ,
          2 1 18 1 1 ];
cnam2 = [" Strata Dy count Z X "];
carr2 = cname(carr2,cnam2);

3. Setting the column numbers:

par = cons(20,1,.);
par[1] = cs = 1; par[2] = cy = 2;  /* column number strata, disease= y */
par[3] = cc = 3;                   /* column name count */
par[4] = cx = 5; par[5] = cx2 = 5; /* first and last column number of cat. X vars */
nx = cx2 - cx + 1; nxp = nx + 1;   /* number covars=regr */

par[6]  = ipri = 0;
par[7]  = maxiter = 1000;  /* like in macro */
par[8]  = comp = 1;        /* run comparison macro */
par[9]  = eps = 1.e-10;    /* termination, usually 1.e-10 */
par[10] = prosp = 0;       /* never set different? */

4. Preparing the data for the parameter estimation:

ns = carr1[<>,cs]; ns2 = ns + ns;  /* ns: number strata */
nr1 = nrow(carr1); nr2 = nrow(carr2);
< par,nz01,m01,mz01,strat,vars > = twop_inp(par,carr1,carr2);
mres = inp2 = 0; free meth,pase,pase2,covms;

5. EM5 ML Estimation:

< par,parm,ase,cov > = twop_em5(par,nz01,m01,mz01,strat,vars,carr1,carr2);
if (par[20] >= 0) {
   mres++; str = "EM5 ML Estimation";
   if (mres == 1) meth = str; else meth = meth |> str;
   pase  = pase -> (parm -> ase);
   pase2 = pase2 -> cons(ns,2,.);
   covms = covms |> cov;
}

6. Breslow-Cain Pseudo Likelihood:

< par,parm,ase,cov > = twop_bc(par,nz01,m01,mz01,strat,vars,carr1,carr2);
if (par[20] >= 0) {
   /* note: that BC method yields ns more parms for strata */
   mres++; str = "Breslow-Cain ML Estimation";
   if (mres == 1) meth = str; else meth = meth |> str;
   inp2 = 1;
   lo = ns + 1; up = ns + nxp;


   pase  = pase -> (parm[lo:up] -> ase[lo:up]);
   pase2 = pase2 -> (parm[1:ns] -> ase[1:ns]);
   covms = covms |> cov[lo:up,lo:up];
}

7. Pseudo Likelihood Schill:

< par,parm,ase,cov > = twop_zlr(par,nz01,m01,mz01,strat,vars,carr1,carr2);
if (par[20] >= 0) {
   /* note: that ZLR method yields ns more parms for strata */
   mres++; str = "Schill Pseudo ML Estimation";
   if (mres == 1) meth = str; else meth = meth |> str;
   inp2 = 1;
   lo = ns + 1; up = ns + nxp;
   pase  = pase -> (parm[lo:up] -> ase[lo:up]);
   pase2 = pase2 -> (parm[1:ns] -> ase[1:ns]);
   covms = covms |> cov[lo:up,lo:up];
}

8. WLR: Weighted Likelihood Estimation:

< par,parm,ase,cov > = twop_wlr(par,nz01,m01,mz01,strat,vars,carr1,carr2);
if (par[20] >= 0) {
   mres++; str = "Weighted Likelihood Estimation";
   if (mres == 1) meth = str; else meth = meth |> str;
   pase  = pase -> (parm -> ase);
   pase2 = pase2 -> cons(ns,2,.);
   covms = covms |> cov;
}

9. Sample 2 ML Estimation:

< par,parm,ase,cov > = twop_s2(par,nz01,m01,mz01,strat,vars,carr1,carr2);
if (par[20] >= 0) {
   mres++; str = "Sample 2 ML Estimation";
   if (mres == 1) meth = str; else meth = meth |> str;
   pase  = pase -> (parm -> ase);
   pase2 = pase2 -> cons(ns,2,.);
   covms = covms |> cov;
}

10. Code for printing the results:

/* print meth; print pase; print pase2; */
cnam = cn = [" Parameters AsymStdErr "];
for (ic = 2; ic <= mres; ic++) cnam = cnam -> cn;


pase = cname(pase,cnam);

options ls=120;
ic1 = 1; ic2 = ic1 + 2; if (ic2 > mres) ic2 = mres;

newprt1:
for (ic = ic1; ic <= ic2; ic++) print meth[ic];
jc1 = 2 * ic1 - 1; jc2 = 2 * ic2;

if (inp2) print "Estimates for Predictors";
print pase[,jc1:jc2];
if (inp2) {
   pase2 = cname(pase2,cnam);
   rnam = prefname("S_",ns);
   pase2 = rname(pase2,rnam);
   print "Additional Estimates for Strata";
   print pase2[,jc1:jc2];
}
if (ic2 < mres) {
   ic1 = ic2 + 1; ic2 = ic1 + 2;
   if (ic2 > mres) ic2 = mres;
   goto newprt1;
}

11. Output of the results:

(a) EM5 ML Estimation

(b) Breslow-Cain ML Estimation

(c) Schill Pseudo ML Estimation

Estimates for Predictors

      | Parameters AsymStdErr  Parameters AsymStdErr  Parameters AsymStdErr
---------------------------------------------------------------------------
alpha |   -0.51352    0.14142    -0.30256    0.19631    -0.29962    0.19826
X     |    0.95792    0.23659     0.56875    0.37337     0.56422    0.37600

Additional Estimates for Strata

      | Parameters AsymStdErr  Parameters AsymStdErr  Parameters AsymStdErr
---------------------------------------------------------------------------
S_1   |          .          .    -0.21943    0.04670    -0.21338    0.04472
S_2   |          .          .     0.23344    0.04665     0.22738    0.04482

(a) Weighted Likelihood Estimation

(b) Sample 2 ML Estimation


Estimates for Predictors

      | Parameters AsymStdErr  Parameters AsymStdErr
-----------------------------------------------------
alpha |   -0.31383    0.18685    -0.34443    0.21547
X     |    0.60808    0.35034     0.68136    0.39994

Example 2: Data by Pohlabeln et al. (2002)

1. Two larger data sets:

The first data set is small but with large counts in column 3. The first column contains the strata and column 2 the binary response (disease):

print "Test2: HdA Data";HdA1 = [ 1 0 347 ,

2 0 62 ,3 0 42 ,4 0 42 ,5 0 210 ,6 0 48 ,7 0 46 ,8 0 42 ,1 1 135 ,2 1 19 ,3 1 32 ,4 1 28 ,5 1 360 ,6 1 80 ,7 1 89 ,8 1 96 ];

cnam = [" Strata Dy count "];HdA1 = cname(HdA1,cnam); /* print "HdA1=",HdA1; */

The second data set is so large that we are storing it in the cmat/tdata directory:

options NOECHO;
%inc "..\\tdata\\HdA2.dat";

options ECHO;

We permute the columns of the large data set so that the strata column is first, followed by that of the binary response, the count variable, and then all of the predictor variables. The predictor variables SMOKE1, SMOKE2, SMOKE3 are binary and FY is interval scaled:

cnam = [" Obs Dy Strata SMOKE FY countSMOKE1 SMOKE2 SMOKE3 "];

HdA2 = cname(HdA2,cnam);


ind = [ 3 2 6 7 8 9 5 4 ];
HdA2 = HdA2[,ind];
cnam = cnam[ind]; HdA2 = cname(HdA2,cnam);
/* print "[1] HdA2=", HdA2[1:10,]; */

The number of different values in the columns of the predictor variables of data set 2 is important for estimating the amount of computer resources some applications require. (For interval-scaled predictor variables some of the estimation techniques require the storage of large (sparse) matrices, which may become close to n2 × n2, where n2 is the number of rows of the phase-2 data matrix.)
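To get a feel for the order of magnitude, a rough back-of-the-envelope computation (illustrative Python, not CMAT; the value of n2 is hypothetical and dense double-precision storage is assumed for simplicity):

n2 = 1000                    # hypothetical number of phase-2 rows
bytes_dense = n2 * n2 * 8    # a dense n2 x n2 matrix of doubles
print(bytes_dense / 2**20)   # about 7.6 MB, growing quadratically with n2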

/* All predictor vars must be ordinal starting with level 0:
   Sort FY and replace by levels 0,1,2,... */
fy = HdA2[,7];
fy = ordinal(fy); print "Nrank=", nrnk = fy[<>] + 1;
/* HdA2[,7] = fy; HdA2 = cname(HdA2,cnam); */
print "[2] HdA2=", HdA2[1:10,];
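The effect of the ordinal() recoding used above can be pictured with a small Python sketch. It assumes that ordinal() maps the sorted distinct values of a column onto the consecutive levels 0, 1, 2, ...; the fy values below are made up for illustration:

def to_levels(values):
    # rank the distinct values and replace each value by its rank 0,1,2,...
    levels = {v: i for i, v in enumerate(sorted(set(values)))}
    return [levels[v] for v in values]

fy = [0.5, 2.0, 0.5, 3.5, 2.0]     # hypothetical interval-scaled values
print(to_levels(fy))               # [0, 1, 0, 2, 1]
print(max(to_levels(fy)) + 1)      # number of distinct levels ("Nrank")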

2. Setting the column numbers:

par = cons(20,1,.);
par[1] = cs = 1; par[2] = cy = 2;   /* column number strata, disease= y */
par[3] = cc = 3;                    /* column name count */
/* response is Dy, predictors are SMOKE1,2,3,FY */
par[4] = cx = 4; par[5] = cx2 = 7;  /* first and last column number of cat. X vars */
nx = cx2 - cx + 1; nxp = nx + 1;    /* number covars=regr */

par[6] = ipri = 0;
par[7] = maxiter = 1000;            /* like in macro */
par[8] = comp = 1;                  /* run comparison macro */
par[9] = eps = 1.e-10;              /* termination, usually 1.e-10 */
par[10] = prosp = 0;                /* never set different? */

3. Preparing the data for the parameter estimation:

ns = HdA1[<>,cs]; ns2 = ns + ns;    /* ns: number strata */
nr1 = nrow(HdA1); nr2 = nrow(HdA2);
print "nrow(HdA1,HdA2)=",nr1,nr2;
< par,nz01,m01,mz01,strat,vars > = twop_inp(par,HdA1,HdA2);
mres = inp2 = 0; free meth,pase,pase2,covms;

4. The code for calling the functions which obtain the parameter estimates and standard errors is very similar to that shown above for example 1.

5. Output of the results:

(a) EM5 ML Estimation


(b) Breslow-Cain ML Estimation

(c) Schill Pseudo ML Estimation

Estimates for Predictors

       | Parameters AsymStdErr  Parameters AsymStdErr  Parameters AsymStdErr
-----------------------------------------------------------------------------
alpha  |    -1.6159    0.45468     -1.6263    0.45578     -1.6272    0.45624
SMOKE1 |    0.84504    0.54383     0.94049    0.54140     0.94200    0.54176
SMOKE2 |     1.9399    0.47946      1.9808    0.47861      1.9839    0.47849
SMOKE3 |     2.4028    0.50382      2.4197    0.50837      2.4225    0.50865
FY     |    0.16389    0.05739     0.13211    0.07039     0.12783    0.07378

Additional Estimates for Strata

       | Parameters AsymStdErr  Parameters AsymStdErr  Parameters AsymStdErr
-----------------------------------------------------------------------------
S_1    |          .          .    -0.94405    0.08891    -0.93738    0.08456
S_2    |          .          .     -1.1827    0.25764     -1.1463    0.22919
S_3    |          .          .    -0.27193    0.22951    -0.33047    0.20098
S_4    |          .          .    -0.40547    0.23904    -0.41292    0.20688
S_5    |          .          .     0.53900    0.07180     0.54064    0.06628
S_6    |          .          .     0.51083    0.17592     0.51708    0.15275
S_7    |          .          .     0.65999    0.17490     0.65840    0.14713
S_8    |          .          .     0.82668    0.17844     0.81449    0.14987

(a) Weighted Likelihood Estimation

(b) Sample 2 ML Estimation

Estimates for Predictors

       | Parameters AsymStdErr  Parameters AsymStdErr
------------------------------------------------------
alpha  |    -1.6296    0.44913     -1.7106    0.47797
SMOKE1 |    0.90879    0.53278     0.89373    0.53473
SMOKE2 |     1.9789    0.47023      2.0175    0.52330
SMOKE3 |     2.4498    0.49940      2.4417    0.55246
FY     |    0.11708    0.06081     0.14560    0.08239

Listing of the file twop_macro.inp

/********************************************************************/

function twop_inp(par,dat1,dat2){

/* Input part for every estimation : see macro prep */


par[20] = 0;cs = par[1]; cy = par[2]; cc = par[3];cx = par[4]; cx2 = par[5]; /* first and last column number of cat. X vars */nx = cx2 - cx + 1; /* number covars=regr */

ns = dat1[<>,cs]; ns2 = ns + ns; /* ns: number strata *//* nr1,nr2: number rows of data */nr1 = nrow(dat1); nr2 = nrow(dat2);ipri = par[6];if (ipri) print "nx,ns,nr1,nr2=",nx,ns,nr1,nr2;cnam1 = cname(dat1); cnam2 = cname(dat2);xnam = (nx == 1) ? cnam2[cx] : cnam2[cx:cx2];

ns = dat1[<>,cs]; ns2 = ns + ns; /* ns: number strata */nz01 = m01 = mz01 = strat = .;

/*--- Phase 1 preparation -------------------------------*//* n0: number controls, n1: number events *//* nz0[ns]: number controls in strata, nz1[ns]: in events */n0 = n1 = 0.; nz0 = nz1 = cons(ns,1,0.);for (ir = 1; ir <= nr1; ir++) {

if (dat1[ir,cc] == 0)print "Warning: zero count data in sample 1: row=",ir;if (dat1[ir,cy] == 0) {

n0 += dat1[ir,cc];nz0[dat1[ir,cs]] += dat1[ir,cc];

} else {n1 += dat1[ir,cc];nz1[dat1[ir,cs]] += dat1[ir,cc];

} }par[12] = n0; par[13] = n1;nz01 = nz0 -> nz1;if (ipri) print "N0,N1=", n0,n1;if (ipri) print "nz0,nz1=",nz01;

/* this is what the (PROC MEANS BY &strat &caco &covar) call is doing:stored into vectors mz0,mz1[nsx] and not into tensortmp2[nr2,4+nx]: strata,covar,count1,count2 */

nd = 1 + nx;nc = ic1 = 3 + nx; ic0 = 2 + nx;tmp2 = cons(nr2,nc,0.);for (ir = 1; ir <= nr2; ir++) {

if (dat2[ir,cc] == 0)print "Warning: zero count data in sample 2: row=",ir;iy = dat2[ir,cy];tmp2[ir,1] = dat2[ir,cs]; /* strata starts at 1 */if (nx == 1) tmp2[ir,2] = dat2[ir,cx];else tmp2[ir,2:nd] = dat2[ir,cx:cx2];/* count in last column: no sort */if (iy == 0) tmp2[ir,ic0] = dat2[ir,cc];


else tmp2[ir,ic1] = dat2[ir,cc];}if (ipri > 3) {

cnam = "strata" -> xnam -> [" count0 count1 "];tmp2 = cname(tmp2,cnam);print "Before Sort Tmp2=",tmp2;

}

/* sort tmp2 wrt. temp2[,1] and compress for nsx */ind = [ 1:nd ];< ord,tmp2,rank > = sortrow(tmp2,ind);if (ipri > 3) print "After Sort Tmp2=",tmp2;nrnk = rank[nr2];if (ipri > 1) print "nrnk,rank=",nrnk,rank;

nsx = 1; i1 = rank[1];for (ir = 2; ir <= nr2; ir++) {

i2 = rank[ir];if (i1 == i2) {

/* print "same key: row and ind=", ir,i1; */tmp2[nsx,ic0] += tmp2[ir,ic0];tmp2[nsx,ic1] += tmp2[ir,ic1];

} else {nsx++;tmp2[nsx,] = tmp2[ir,];i1 = rank[ir];

} }if (ipri) print "nsx=",nsx;tmp2 = tmp2[1:nsx,];

par[11] = nsx; nsx2 = nsx + nsx;strat = tmp2[,1];mz0 = tmp2[,ic0]; mz1 = tmp2[,ic1];mz01 = mz0 -> mz1;if (ipri) print "strat,mz0,mz1=",strat -> mz01;vars = (nx == 1) ? tmp2[,2] : tmp2[,2:nd];if (ipri > 1) print "vars=",vars;

/* this is what the (PROC MEANS BY &strat &caco) call is doing *//* get m0[ns] and m1[ns] */m0 = m1 = cons(ns,1,0.);for (ir = 1; ir <= nr2; ir++) {

j = dat2[ir,cs]; iy = dat2[ir,cy];if (iy == 0) m0[j] += dat2[ir,cc];else m1[j] += dat2[ir,cc];

}m01 = m0 -> m1;if (ipri) print "m0,m1=",m01;

leave:return(par,nz01,m01,mz01,strat,vars);


}

/********************************************************************//************** EM5 ML Estimation ***********************************//********************************************************************/

function twop_em5(par,nz01,m01,mz01,strat,vars,dat1,dat2){

print "***************** Into EM5 ML Estimation ***************";ipri = par[6];if (ipri > 1) {

print "nz01,m01=",nz01,m01;print "mz01,strat,vars=",mz01 -> strat -> vars; /* [nsx],[nsx,nx] */

}parm = cov = ase = .;par[20] = 0;

cs = par[1]; cy = par[2]; cc = par[3];cx = par[4]; cx2 = par[5]; /* first and last column number of cat. X vars */nx = cx2 - cx + 1; /* number covars=regr */nxp = nx + 1; /* number parms */

maxiter = par[7]; comp = par[8];eps = par[9]; prosp = par[10];

ns = dat1[<>,cs]; ns2 = ns + ns; /* ns: number strata */nr1 = nrow(dat1); nr2 = nrow(dat2);cnam1 = cname(dat1); cnam2 = cname(dat2);xnam = (nx == 1) ? cnam2[cx] : cnam2[cx:cx2];nsx = par[11]; nsx2 = nsx + nsx;n0 = par[12]; n1 = par[13];if (ipri > 1) print "ns,nsx,n0,n1=",ns,nsx,n0,n1;

nz0 = nz01[,1]; nz1 = nz01[,2]; /* [ns] */m0 = m01[,1]; m1 = m01[,2]; /* [ns] */mz0 = mz01[,1]; mz1 = mz01[,2]; /* [nsx] */

/*-----------------------------------------------------------------*/

/* get nm0[ns]= nz0[ns] - m0[ns] and nm1[ns]= nz1[ns] - m1[ns] */nm0 = nm1 = cons(ns);for (i = 1; i <= ns; i++) {

nm0[i] = nz0[i] - m0[i];nm1[i] = nz1[i] - m1[i];

}if (ipri > 2) print "nm0,nm1=",nm0 -> nm1;

/* m_com[nsx2] */m_com = mz0 |> mz1;/* get m_inc[nsx2], nz0[nsx2] */


m_inc = nz10 = cons(nsx2);l = 1;for (j = 1; j <= 2; j++)for (i = 1; i <= nsx; i++, l++) {

is = strat[i];m_inc[l] = (j == 1) ? nm0[is] : nm1[is];nz10[l] = (j == 1) ? nz0[is] : nz1[is];

}if (ipri > 2) print "m_com,m_inc,nz10=",m_com -> m_inc -> nz10;

/* get ss2[nsx2], sy, sz2[nsx2], vars[nsx2,nx] */sy = cons(nsx,1,0.) |> cons(nsx,1,1.);sz = ss2 = cons(nsx2);l = 1;for (j = 0; j < 2; j++)for (i = 1; i <= nsx; i++, l++) {

is = strat[i]; sz[l] = is;ss2[l] = 2 * is - 1 + j;

}if (ipri > 2) print "sy,sz,ss2=",sy -> sz -> ss2;

/* parm[1+nx+nsx] */np = 1 + nx + nsx;/* make new function names = prefname(pref,n) */parmc = prefname("delta_",nsx);pnam = "alpha" -> xnam -> parmc‘;v1 = (nx == 1) ? 0. : cons(nx,1,0.);t = log(n0 / nsx); v2 = cons(nsx,1,t);parm = log(n1 / n0) |> v1 |> v2;parm = rname(parm,pnam);if (ipri > 1) print "parm=",parm;

/* get large sparse design matrix:xdes[nsx2,np]= (sy[nsx2],sy[nsx2]#vars[nsx2,nx],ide[nsx2,nsx]) */

hh = cons(nsx,nx,0.) |> vars;ndes = [ 1:nsx 1:nsx ];xdes = sy -> hh -> design(ndes);if (ipri > 3) print "Xdesign=",xdes;

/* get: mu[nsx2], B[nsx2,ns2] */mu = exp(xdes * parm);if (ipri > 2) print "mu=",mu;B = design(ss2); BB = B * B‘;if (ipri > 3) print "Bdesign=",B;

for (iter = 1; iter <= maxiter; iter++) {/* Update weight vector */wgt = mu ./ (BB * mu);/* Update pseudo-counts mv[nsx2] */mv = m_com + diag(wgt) * m_inc;


mlogm = cons(nsx2,1,0.);for (i = 1; i <= nsx; i++)if (mv[i] > 0.) mlogm[i] = mv[i] * log(mv[i]);v1 = mlogm - mv .* log(mu) - mv + mu;devold = 2. * v1[+]; thold = parm;rhs = xdes‘* (mv - mu);if (iter < 10) {

sym = xdes‘ * diag(mu) * xdes;} else {

WW = (wgt * wgt‘) .* BB;sym = xdes‘ * (diag(mu) - diag(m_inc) * (diag(wgt) - WW)) * xdes;

}/* if (ipri > 3)

print "Solving linear system with", sym; */

parm += sym \ rhs;for (jter = 1; jter <= 3; jter++) {

mu = exp(xdes * parm);v1 = mlogm - mv .* log(mu) - mv + mu;dev = 2. * v1[+];if (dev < devold+eps) break;parm = .5 * (parm + thold);

}v2 = abs(parm - thold); crit = v2[+];if (crit < eps) break;

}par[19] = crit;if (iter == maxiter) {

print "Warning maxiter=",maxiter," iterations without convergence: crit=", crit;par[20] = -1; goto leave;

}if (ipri)print "Convergence with crit=",crit," after",iter," iterations.";

/* compute COV matrix */ww = (wgt * wgt‘) .* BB;cov = inv(xdes‘ * (diag(mu) - diag(m_inc) * (diag(wgt) - ww)) * xdes);if (ipri > 1) print "EM5: parm=",parm;if (ipri > 2) print "EM5: Cov=",cov;

/* modification of intercept due to retrospective phase one sample */if (prosp == 0) {

parm[1] -= log(n1 / n0);cov[1,1] -= 1. / n0 + 1. / n1;

}

/* Wrapping it up */ind = [ 1:nxp ];parm = parm[ind];pnam = "alpha" -> xnam; parm = rname(parm,pnam);


cov = cov[ind,ind];ase = dia2vec(cov);for (j = 1; j <= nxp; j++)ase[j] = (ase[j] <= 0.) ? 0. : sqrt(ase[j]);ase = rname(ase,pnam);

leave:if (ipri > 1) print "Result EM5: parm, ase=", parm -> ase;return(par,parm,ase,cov);

}

/********************************************************************//************** Pseudo Likelihood Breslow-Cain **********************//********************************************************************/

function twop_bc(par,nz01,m01,mz01,strat,vars,dat1,dat2){

print "*** Into Breslow-Cain Pseudo Likelihood Estimation ************";ipri = par[6];if (ipri > 1) {

print "nz01,m01=",nz01,m01;print "mz01,strat,vars=",mz01 -> strat -> vars; /* [nsx],[nsx,nx] */

}parm = cov = ase = .;par[20] = 0;

cs = par[1]; cy = par[2]; cc = par[3];cx = par[4]; cx2 = par[5]; /* first and last column number of cat. X vars */nx = cx2 - cx + 1; /* number covars=regr */nxp = 1 + nx; /* number parms */

maxiter = par[7]; comp = par[8];eps = par[9]; prosp = par[10];

ns = dat1[<>,cs]; ns2 = ns + ns; /* ns: number strata */nr1 = nrow(dat1); nr2 = nrow(dat2);cnam1 = cname(dat1); cnam2 = cname(dat2);xnam = (nx == 1) ? cnam2[cx] : cnam2[cx:cx2];nsx = par[11]; nsx2 = nsx + nsx;n0 = par[12]; n1 = par[13];ntot = (real)n0 + n1;if (ipri > 2) print "ns,nxp,n0,n1=",ns,nxp,n0,n1;

nz0 = nz01[,1]; nz1 = nz01[,2]; /* [ns] */m0 = m01[,1]; m1 = m01[,2]; /* [ns] */mz0 = mz01[,1]; mz1 = mz01[,2]; /* [nsx] */

/*-----------------------------------------------------------------*/

offs1 = anz1 = cons(ns,1,0.);st1 = cons(ns,ns,0.);


tt = log(n1 / n0);for (is = 1; is <= ns; is++) {

offs1[is] = (1. - prosp) * tt;anz1[is] = nz0[is] + nz1[is];for (i = 1; i <= ns; i++) st1[is,i] = (i == is) ? 1. : 0.;

}

vnam = prefname("s_",ns)‘; if (ipri > 2) print "vnam=", vnam;cnam = [" offs1 nz1 anz1 "] -> vnam;tmp1 = offs1 -> nz1 -> anz1 -> st1;tmp1 = cname(tmp1,cnam);if (ipri > 2) print "Tmp1=",tmp1;

pri = (ipri > 1) ? 5 : 0;optn = [ "print" pri ,

"noint" ,"offset" 1 ,"link" "logit" ,"dist" "binom" ,"trial" 3 ,"tech" "trureg" ];

modl1 = sprintf("2/3 = %i : %i",4,3+ns);if (ipri > 2) print "Modl=",modl1;< gof1,parm1,sterr1,conf1,cov1 > = glim(tmp1,modl1,optn);parm1 = rname(parm1,vnam);if (ipri > 1) print "Parm1=",parm1;

offs2 = anz2 = alfa = cons(nsx,1,0.);st2 = cons(nsx,ns,0.);for (i = 1; i <= nsx; i++) {

is = strat[i];offs2[i] = log((real)m1[is] / m0[is]) - parm1[is];anz2[i] = mz0[i] + mz1[i];st2[i,is] = alfa[i] = 1.;

}

pnam = "alpha" -> xnam;cnam = [" offs2 mz1 anz2 "] -> pnam;

pred = alfa -> vars;tmp2 = offs2 -> mz1 -> anz2 -> pred;tmp2 = cname(tmp2,cnam);if (ipri > 2) print "Tmp2=",tmp2;

pri = (ipri > 1) ? 5 : 0;optn = [ "print" pri ,

"noint" ,"offset" 1 ,"link" "logit" ,"dist" "binom" ,


"trial" 3 ,"tech" "trureg" ];

modl2 = sprintf("2/3 = %i : %i",4,4+nx);if (ipri > 2) print "Modl=",modl2;< gof2,parm2,sterr2,conf2,cov2 > = glim(tmp2,modl2,optn);parm2 = rname(parm2,pnam);if (ipri > 1) print "Parm2=",parm2;

/* Stage 1: dim=ns */w1 = nz1 |> nz0; z1 = st1 |> st1;xd1 = st1 |> st1; of1 = offs1 |> offs1;eta1 = of1 + xd1 * parm1;p1 = 1. / (1. + exp(-eta1)); f1 = p1 .* (1. - p1);if (ipri > 2) print "eta1,p1,f1=", eta1 -> p1 -> f1;xpxi = z1‘ * (w1 .* f1 .* z1);if (ipri > 3) print "XPXi=",xpxi;

/* Stage 2: dim=nxp */w2 = mz1 |> mz0; z2 = st2 |> st2; x2 = pred |> pred;xd2 = pred |> pred; of2 = offs2 |> offs2;eta2 = of2 + xd2 * parm2;p2 = 1. / (1. + exp(-eta2)); f2 = p2 .* (1. - p2);if (ipri > 2) print "eta2,p2,f2=", eta2 -> p2 -> f2;

/* Stage 1: dim=ns2 */rho = ntot / n0 + ntot / n1;vmat = ntot / (z1‘* (f1 .* w1)); if (ipri > 3) print "Vmat=", vmat;wmat = diag(vmat) - (1. - prosp) * rho * cons(ns,ns,1.);cov1 = wmat / ntot;ase1 = dia2vec(cov1);for (j = 1; j <= ns; j++)ase1[j] = (ase1[j] <= 0.) ? 0. : sqrt(ase1[j]);if (ipri > 2) print "ASE1=",ase1;

/* Stage 2: dim=ns+nxp */ndim = ns + nxp;x12 = cons(ns2,nxp,0.);xdes = (z1 -> x12) |> (-z2 -> x2);wgt = w1 |> w2; fmat = f1 |> f2;hmat = (xdes‘ * (wgt .* fmat .* xdes)) / ntot;if (ipri > 3) print "Hmat=",hmat;

/* modification of variance due to 2-stage setup */m20 = st2‘ * mz0; m21 = st2‘ * mz1;vec = ntot / m20 + ntot / m21;qmat = diag(vec); if (ipri > 3) print "Qmat=", qmat;nsp = ns + 1;ind1 = [ 1 : ns ]; ind2 = [ nsp : ndim ];h12 = hmat[ind1,ind2]; h22 = hmat[ind2,ind2]; h22i = inv(h22);cov2 = (h22i * (h22 - h12‘* (qmat - wmat) * h12) * h22i) / ntot;


if (ipri > 3) print "COV2=", cov2;

/* Wrapping it up */ase2 = dia2vec(cov2);for (j = 1; j <= nxp; j++)ase2[j] = (ase2[j] <= 0.) ? 0. : sqrt(ase2[j]);/* print "ASE2=",ase2; */

cov = cov2 \> xpxi;parm = parm1 |> parm2; ase = ase1 |> ase2;

leave:if (ipri > 1) print "Result BC: parm, ase=", parm -> ase;return(par,parm,ase,cov);

}

/********************************************************************//************** Pseudo Likelihood Schill ****************************//********************************************************************/

function twop_zlr(par,nz01,m01,mz01,strat,vars,dat1,dat2){

print "********** Into Schill Pseudo Likelihood Estimation *************";ipri = par[6];if (ipri > 1) {

print "nz01,m01=",nz01,m01;print "mz01,strat,vars=",mz01 -> strat -> vars; /* [nsx],[nsx,nx] */

}parm = cov = ase = .;par[20] = 0;

cs = par[1]; cy = par[2]; cc = par[3];cx = par[4]; cx2 = par[5]; /* first and last column number of cat. X vars */nx = cx2 - cx + 1; /* number covars=regr */nxp = 1 + nx; /* number parms */

maxiter = par[7]; comp = par[8];eps = par[9]; prosp = par[10];

ns = dat1[<>,cs]; ns2 = ns + ns; /* ns: number strata */nr1 = nrow(dat1); nr2 = nrow(dat2);cnam1 = cname(dat1); cnam2 = cname(dat2);xnam = (nx == 1) ? cnam2[cx] : cnam2[cx:cx2];

nsx = par[11]; nsx2 = nsx + nsx;n0 = par[12]; n1 = par[13];ntot = (real)n0 + n1;if (ipri > 2) print "ns,nxp,n0,n1=",ns,nxp,n0,n1;

nz0 = nz01[,1]; nz1 = nz01[,2]; /* [ns] */m0 = m01[,1]; m1 = m01[,2]; /* [ns] */


mz0 = mz01[,1]; mz1 = mz01[,2]; /* [nsx] */

/*-----------------------------------------------------------------*/

nssx = ns + nsx; nss2 = nssx + nssx;offst = event = trial = alpha = cons(nssx,1,0.);st = cons(nssx,ns,0.);tt = log(n1 / n0);for (is = 1; is <= ns; is++) {

offst[is] = (1. - prosp) * tt;event[is] = nz1[is];trial[is] = nz0[is] + nz1[is];st[is,is] = 1.; alpha[is] = 0.;

}l = ns + 1;for (i = 1; i <= nsx; i++, l++) {

is = strat[i];offst[l] = log((real)m1[is] / m0[is]);event[l] = mz1[i];trial[l] = mz0[i] + mz1[i];st[l,is] = -1.; alpha[l] = 1.;

}if (ipri > 2) print "offs,event,trial=",offst -> event -> trial;if (ipri > 3) print "st=",st;

vnam = prefname("s_",ns)‘; /* print "vnam=", vnam; */pnam = vnam -> "alpha" -> xnam;cnam = [" offst event trial "] -> pnam;

xvar = cons(ns,nx,0.) |> vars;pred = alpha -> xvar;tmp1 = offst -> event -> trial -> st -> pred;tmp1 = cname(tmp1,cnam);if (ipri > 3) print "Tmp1=",tmp1;

np = ns + nxp;pri = (ipri > 1) ? 5 : 0;optn = [ "print" pri ,

"noint" ,"offset" 1 ,"link" "logit" ,"dist" "binom" ,"trial" 3 ,"tech" "trureg" ];

modl = sprintf("2/3 = %i : %i",4,4+ns+nx);if (ipri > 2) print "Modl=",modl;< gof,parm,sterr,conf,cov > = glim(tmp1,modl,optn);parm = rname(parm,pnam);if (ipri > 1) print "Parm=",parm;


ns1 = ns + 1;y1 = cons(ns,1,1.) |> cons(ns,1,0.);y2 = cons(nsx,1,1.) |> cons(nsx,1,0.);yv = y1 |> y2;wv = nz1 |> nz0 |> mz1 |> mz0;x1 = st[1:ns,]‘ -> alpha[1:ns] -> cons(ns,nx,0.);x2 = st[ns1:nssx,] -> alpha[ns1:nssx] -> vars;xv = x1 |> x1 |> x2 |> x2;if (ipri > 2) print "Xv=", xv;

of1 = offst[1:ns]; of2 = offst[ns1:nssx];offs = of1 |> of1 |> of2 |> of2;if (ipri > 2) print "offs=",offs;

x1 = st[1:ns,]‘; x2 = -st[ns1:nssx,]‘;if (ipri > 2) print "X1,X2=", x1,x2;m20 = x2 * mz0; m21 = x2 * mz1;if (ipri > 2) print "m20,m21=",m20,m21;

/* PCL-variance 2-phase logistic regression by Schill */eta = offs + xv * parm;pv = 1. / (1. + exp(-eta));fv = pv .* (1. - pv); if (ipri > 2) print "eta,fv=",eta -> fv;covp = inv(xv‘ * (wv .* fv .* xv));dia = dia2vec(covp);for (j = 1; j <= np; j++)dia[j] = (dia[j] <= 0.) ? 0. : 1. / sqrt(dia[j]);corp = diag(dia) * covp * diag(dia);hmat = (xv‘* (wv .* fv .* xv)) / ntot;if (ipri > 3) print "Hmat=",hmat;

/* Modification due to 2-phase setup, Diss. p29,30 */f1 = fv[1:ns] .* (nz1 + nz0);vv = diag((x1 * f1) / ntot); if (ipri > 2) print "vv=",vv;vv = diag(vv);ta = hmat[1:ns,1:ns] - vv; tb = hmat[1:ns,ns1:np];qmat = ntot / diag(m20) + ntot / diag(m21);if (ipri > 2) print "qmat=",qmat;

/* nxp = np - ns: (ta,tb)[ns,np] */up = ta -> tb; lo = cons(nxp,np,0.);ri = up |> lo;rho = ntot / n0 + ntot / n1; if (ipri > 2) print "rho=",rho;vvt = vv[,+] @ vv[+,]; if (ipri > 3) print "VVT=",vvt;

ds1 = (1. - prosp) * (((rho * vvt) -> cons(ns,nxp,0.)) |> lo);ds2 = ri‘ * ((qmat -> cons(ns,nxp,0.)) |> lo) * ri;if (ipri > 2) print "ds1,ds2=",ds1,ds2;

hi = inv(hmat);


cov = (hi * (hmat - ds1 - ds2) * hi) / ntot;if (ipri > 2) print "COV=", cov;

/* Wrapping it up */ase = dia2vec(cov);for (j = 1; j <= np; j++)ase[j] = (ase[j] <= 0.) ? 0. : sqrt(ase[j]);ase = rname(ase,pnam);

leave:if (ipri > 1) print "Result ZLR: parm, ase=", parm -> ase;return(par,parm,ase,cov);

}

/********************************************************************//************ WLR: Weighted Likelihood Estimation *******************//********************************************************************/

function twop_wlr(par,nz01,m01,mz01,strat,vars,dat1,dat2){

print "************ Into WLR Weighted Likelihood Estimation ***************";ipri = par[6];if (ipri > 1) {

print "nz01,m01=",nz01,m01;print "mz01,strat,vars=",mz01 -> strat -> vars; /* [nsx],[nsx,nx] */

}parm = cov = ase = .;par[20] = 0;

cs = par[1]; cy = par[2]; cc = par[3];cx = par[4]; cx2 = par[5]; /* first and last column number of cat. X vars */nx = cx2 - cx + 1; /* number covars=regr */nxp = 1 + nx; /* number parms */

maxiter = par[7]; comp = par[8];eps = par[9]; prosp = par[10];

ns = dat1[<>,cs]; ns2 = ns + ns; /* ns: number strata */nr1 = nrow(dat1); nr2 = nrow(dat2);cnam1 = cname(dat1); cnam2 = cname(dat2);xnam = (nx == 1) ? cnam2[cx] : cnam2[cx:cx2];

nsx = par[11]; nsx2 = nsx + nsx;n0 = par[12]; n1 = par[13];if (ipri > 2) print "ns,nsx,n0,n1=",ns,nsx,n0,n1;

nz0 = nz01[,1]; nz1 = nz01[,2]; /* [ns] */m0 = m01[,1]; m1 = m01[,2]; /* [ns] */mz0 = mz01[,1]; mz1 = mz01[,2]; /* [nsx] */


/*-----------------------------------------------------------------*/

nm0 = nm1 = cons(ns);for (i = 1; i <= ns; i++) {

nm0[i] = nz0[i] - m0[i];nm1[i] = nz1[i] - m1[i];

}if (ipri > 2) print "nm0,nm1=",nm0 -> nm1;

/* get m_com[nsx2],m_inc[nsx2], nz0[nsx2] */m_com = mz0 |> mz1;m_inc = nz10 = cons(nsx2);l = 1L;for (j = 0; j <= 1; j++)for (i = 1; i <= nsx; i++, l++) {

is = strat[i];m_inc[l] = (j) ? nm1[is] : nm0[is];nz10[l] = (j) ? nz1[is] : nz0[is];

}if (ipri > 2) print "m_com,m_inc,nz10=",m_com,m_inc -> nz10;

/*-----------------------------------------------------------------*/

/* Logistic regression needs only sample 2 data [nsx] */w0 = w1 = anz = cons(nsx,1,0.);for (i = 1; i <= nsx; i++) {

is = strat[i];w0[i] = (real)nz0[is] / (real)m0[is] * mz0[i];w1[i] = (real)nz1[is] / (real)m1[is] * mz1[i];anz[i] = w0[i] + w1[i];

}if (ipri > 2) print "w0,w1,anz=",w0 -> w1 -> anz;tmp1 = w1 -> anz -> vars;if (ipri > 3) print "Tmp1=",tmp1;

pri = (ipri > 1) ? 5 : 0;optn = [ "print" pri ,

"link" "logit" ,"dist" "binom" ,"trial" 2 ,"tech" "trureg" ];

modl = (nx == 1) ? "1/2 = 3": sprintf("1/2= %i:%i",3,2+nx);

if (ipri > 2) print "modl=",modl;< gof,parm,sterr,conf,cov > = glim(tmp1,modl,optn);pnam = "alpha" -> xnam;parm = rname(parm,pnam);if (ipri > 1) print "pnam,parm=",pnam,parm;


/*-----------------------------------------------------------------*/

/* Sort (strat,yv,) w.r.t key = (strat=1,...,ns, caco=1,0) */key = cons(nsx2); tmp2 = cons(nsx2,3);l = 1;for (j = 0; j <= 1; j++)for (i = 1; i <= nsx; i++, l++) {

is = strat[i];key[l] = (is - 1.) * 2. + (1. - j);tmp2[l,1] = (j) ? nz1[is] : nz0[is];tmp2[l,2] = (j) ? m1[is] : m0[is];tmp2[l,3] = (j) ? mz1[i] : mz0[i];

}

/* tmp1 = (key,caco,strat) (nz01,m01,mz01,m_com,vars) */caco = cons(nsx,1,0.) |> cons(nsx,1,1.);vrs = vars |> vars;nc = 6 + nx; str = strat |> strat;tmp1 = key -> caco -> str -> tmp2 -> m_com -> vrs;cnam = [" key caco strat nz01 m01 mz01 m_com "] -> xnam;tmp1 = cname(tmp1,cnam);if (ipri > 3) print "For sorting:", tmp1;< ord,tmp1,rank > = sortrow(tmp1,1);if (ipri > 3) print "Sort rank=", rank;if (ipri > 3) print "After sorting:", tmp1;caco = tmp1[,2]; strat = tmp1[,3];m_com = tmp1[,7];if (nx == 1) vars = tmp1[,8];else { ux = 7+nx; vars = tmp1[,8:ux]; }if (ipri > 2) print "VARS=", vars;

w0 = w1 = ww = cons(nsx,1,0.);mm = mcsd = misd = cons(nsx2);ll = 1;for (is = 1; is <= ns; is++) {

ca = c0 = 0.;for (i = 1; i <= nsx2; i++)if (strat[i] == is) {

iy = caco[i];if (iy) ca += m_com[i]; else c0 += m_com[i];

}if (ipri > 2) print "ca,c0=",ca,c0;

l = ll;for (i = k = 1; i <= nsx2; i++)if (strat[i] == is) {

iy = caco[i];if (iy) w1[k] = m_com[i] / ca; else w0[k] = m_com[i] / c0;ww[k] = w1[k] + w0[k];k++; l++;


}if (ipri > 2) print "strata,N=",is,l;if (ipri > 5) print "w1,w0,ww=", w1 -> w0 -> ww;

for (i = k = 1; i <= nsx2; i++)if (strat[i] == is) {

iy = caco[i];mcsd[ll] = (iy) ? ca : c0;misd[ll] = (iy) ? nm1[is] : nm0[is];mm[ll] = m_com[i] + misd[ll] * ww[k];k++; ll++;

} }if (ipri > 2) print "mm,mcom,mcomsd,mincsd=",mm -> m_com -> mcsd -> misd;

/* Design matrix for logistic model */xdes = cons(nsx2,1,1.) -> vars;

/* variance for weighted logistic regression according to Reilly and Pepe () */pv = 1. / (1. + exp(-xdes * parm));fv = pv .* (1. - pv);/* ww = mm ./ fv; xx = fv .* xdes; */

qv = (mcsd + misd) ./ mcsd;ww = (caco - pv) .* (caco - pv);a1 = xdes‘ * (m_com .* qv .* (qv - 1.) .* ww .* xdes);if (ipri > 2) print "A1=",a1;

ns2 = ns + ns;m01 = q01 = cons(ns2);u01 = cons(ns2,nxp);for (is = k = 1; is <= ns; is++) {

m0i = m1i = q0i = q1i = cons(nsx2);u0i = u1i = cons(nsx2,nxp);for (i = 1; i <= nsx2; i++)if (strat[i] == is) {

iy = caco[i];if (iy) m1i[i] = mcsd[i]; else m0i[i] = mcsd[i];if (iy) q1i[i] = qv[i]; else q0i[i] = qv[i];if (iy) u1i[i] = (iy - pv[i]) .* m_com[i] .* xdes[i,];

else u0i[i] = (iy - pv[i]) .* m_com[i] .* xdes[i,];}/* if (ipri > 2) print "u01,u1i=",u0i,u1i; */m01[k] = m1i[<>]; m01[k+1] = m0i[<>];q01[k] = q1i[<>]; q01[k+1] = q0i[<>];u01[k,] = u1i[+,] ./ m01[k];u01[k+1,] = u0i[+,] ./ m01[k+1];k += 2;

}if (ipri > 2) print "m01,q01,u01=",m01 -> q01 -> u01;


a2 = u01‘* (q01 .* (q01 - 1.) .* m01 .* u01);if (ipri > 3) print "A2=",a2;hm = inv(xdes‘ * (m_com .* qv .* fv .* xdes));if (ipri > 3) print "HM=",hm;cov = hm + hm * (a1 - a2) * hm;if (ipri > 2) print "COV=", cov;

/* modification of intercept due to retrospective phase one sample */if (prosp == 0) {

parm[1] -= log(n1 / n0);cov[1,1] -= 1. / n0 + 1. / n1;

}

/* Wrapping it up */ase = dia2vec(cov);for (j = 1; j <= nxp; j++)ase[j] = (ase[j] <= 0.) ? 0. : sqrt(ase[j]);ase = rname(ase,pnam);

leave:if (ipri > 1) print "Result WLR: parm, ase=", parm -> ase;return(par,parm,ase,cov);

}

/********************************************************************//************** S2 Complete Case ML *********************************//********************************************************************/

function twop_s2(par,nz01,m01,mz01,strat,vars,dat1,dat2){

print "************ Into S2 Likelihood Estimation ***************";ipri = par[6];if (ipri > 1) {

print "nz01,m01=",nz01,m01;print "mz01,strat,vars=",mz01 -> strat -> vars; /* [nsx],[nsx,nx] */

}parm = cov = ase = .;par[20] = 0;

cs = par[1]; cy = par[2]; cc = par[3];cx = par[4]; cx2 = par[5]; /* first and last column number of cat. X vars */nx = cx2 - cx + 1; /* number covars=regr */nxp = 1 + nx; /* number parms */

maxiter = par[7]; comp = par[8];eps = par[9]; prosp = par[10];

ns = dat1[<>,cs]; ns2 = ns + ns; /* ns: number strata */nr1 = nrow(dat1); nr2 = nrow(dat2);cnam1 = cname(dat1); cnam2 = cname(dat2);xnam = (nx == 1) ? cnam2[cx] : cnam2[cx:cx2];


nsx = par[11]; nsx2 = nsx + nsx;n0 = par[12]; n1 = par[13];if (ipri > 2) print "ns,nsx,n0,n1=",ns,nsx,n0,n1;

nz0 = nz01[,1]; nz1 = nz01[,2]; /* [ns] */m0 = m01[,1]; m1 = m01[,2]; /* [ns] */mz0 = mz01[,1]; mz1 = mz01[,2]; /* [nsx] */

/*-----------------------------------------------------------------*/

offst = event = trial = alpha = cons(nsx,1,0.);st = cons(nsx,ns,0.);sm20 = m0[+]; sm21 = m1[+]; tt = log((real)sm21 / sm20);if (ipri > 2) print "sm20,sm21=", sm20,sm21;for (i = 1; i <= nsx; i++) {

is = strat[i];offst[i] = tt;event[i] = mz1[i];trial[i] = mz0[i] + mz1[i];st[i,is] = alpha[i] = 1.;

}if (ipri > 2) print "offs,event,trial,alpha=",offst -> event -> trial -> alpha;if (ipri > 3) print "st=",st;

pnam = "alpha" -> xnam;cnam = [" offst event trial "] -> pnam;

pred = alpha -> vars;tmp1 = offst -> event -> trial -> pred;tmp1 = cname(tmp1,cnam);if (ipri > 3) print "Tmp1=",tmp1;

np = nxp;pri = (ipri > 1) ? 5 : 0;optn = [ "print" pri ,

"noint" ,"offset" 1 ,"link" "logit" ,"dist" "binom" ,"trial" 3 ,"tech" "trureg" ];

modl = sprintf("2/3 = %i : %i",4,4+nx);if (ipri > 2) print "Modl=",modl;< gof,parm,sterr,conf,cov > = glim(tmp1,modl,optn);pnam = prefname("est_",np)‘; /* print "pnam=", pnam; */parm = rname(parm,pnam);

yv = cons(nsx,1,1.) |> cons(nsx,1,0.);wgt = mz1 |> mz0;


off = offst |> offst;pred = alpha -> vars;xdes = pred |> pred; if (ipri > 3) print "Xdes=",xdes;

eta = off + xdes * parm;pv = 1. / (1 + exp(eta)); fv = pv .* (1. - pv);if (ipri > 2) print "eta,fv=",eta -> fv;cov = inv(xdes‘ * (wgt .* fv .* xdes));

/* Sample 2 is always retrospective : correction */cov[1,1] = cov[1,1] - (1. / sm20 + 1. / sm21);if (ipri > 2) print "COV=", cov;

/* Wrapping it up */pnam = "alpha" -> xnam; parm = rname(parm,pnam);ase = dia2vec(cov);for (j = 1; j <= np; j++)ase[j] = (ase[j] <= 0.) ? 0. : sqrt(ase[j]);ase = rname(ase,pnam);

leave:if (ipri > 1) print "Result S2: parm, ase=", parm -> ase;return(par,parm,ase,cov);

}

/********************************************************************/

1.1.3 Examples for glmixd

1. The following three examples are taken from the MIXNO manual by Hedeker (1999, [383]). The observations of the data set tvsfpors.dat are students who are clustered within classrooms.

#include "tvsfpors.dat"print " Nobs=", nrow(tvsfp),

" Nvar=", ncol(tvsfp);

In the first example all effects are considered fixed and the clustering of subjects in the data is ignored for the parameter estimation.
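For orientation, the model fitted here is the nominal (multinomial) logistic model that contrasts each response category k = 1, 2, 3 with the reference category 0. In the usual notation (a standard formulation, not a quotation from the MIXNO manual),

\[
  \log\frac{P(Y_{ij}=k)}{P(Y_{ij}=0)} \;=\; \alpha_k + x_{ij}'\beta_k , \qquad k = 1, 2, 3 ,
\]

so that in the output below Alph_1_k is the fixed intercept of the contrast "Resp 0 vs. k" and Alph_2_k through Alph_5_k are the coefficients of the predictors X6 through X9 for that contrast.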

optn = [ "print" 4,"link" "logit","efdis" "normal","yscale" "n","idvar" 2,"ctvar" 6,"cl" "wald","nquad" 10,"tech" "trureg" ];


model = "3 = 6 7 8 9";class = [ 2 3 ];< gof,par,ase,ci,cov> = glmixd(tvsfp,model,optn,class);

*****************Model Information*****************

Number Valid Observations       1600
Response Variable               Y[3]
N Independend Variables            4
Error Distribution            NORMAL
Link Function                  LOGIT
Effect Distribution           NORMAL
Nominal Response with 4 Categ.
Subject Variable Column            2
Significance Level:        0.0500000
Design Coding: Full-Rank

*************Model Effects*************

Intercept + X6 + X7 + X8 + X9

***********************Class Level Information***********************

Class Level Value

Y[1]      4    1 2 3 4
I[2]    135    193101 194101 194102 194103 194104

194105 194106 196101 196102 197101197102 197103 197104 198101 198102198103 199101 199102 199103 199104199105 199106 401101 401102 402101402102 403101 403102 404101 404102404103 405101 405102 405103 407101407102 407103 408101 408102 408103408104 409101 409102 409103 409104410101 410102 410103 410104 411101411102 412101 412102 412103 412104412105 412106 414101 414102 414104414105 414106 415101 415102 415103415104 415105 505101 505102 505103505104 505105 505106 505107 506102


506103 506105 506107 506110 506111506112 506113 506114 506115 506116507101 507102 507103 507104 507105507106 507107 508101 508102 508103508104 509101 509102 509103 509104509105 509106 509107 509108 510101510102 510103 510104 510105 510106510107 513101 513102 513103 513104514101 514102 514103 514104 514105514106 514107 515101 515102 515103515104 515105 515106 515107 515108515109 515110 515111 515112 515113

T[6]      7    0 1 2 3 4 5 6

*****************Simple Statistics*****************

Column   Nobs        Mean     Std Dev    Skewness    Kurtosis
X[6]     1600   2.0693750   1.2601804   0.3854767  -0.2616621
X[7]     1600   0.4768750   0.4996211   0.0926860  -1.9939032
X[8]     1600   0.4993750   0.5001559   0.0025023  -2.0024984
X[9]     1600   0.2393750   0.4268354   1.2227251  -0.5055769

*********************Parameter Information*********************

Number | Coefficient | Effect-------------------------------------------------------------

 1 | Alph_1_1 | Resp 0 vs. 1 Fixed Intercept
 2 | Alph_2_1 | Resp 0 vs. 1 X6
 3 | Alph_3_1 | Resp 0 vs. 1 X7
 4 | Alph_4_1 | Resp 0 vs. 1 X8
 5 | Alph_5_1 | Resp 0 vs. 1 X9
 6 | Alph_1_2 | Resp 0 vs. 2 Fixed Intercept
 7 | Alph_2_2 | Resp 0 vs. 2 X6
 8 | Alph_3_2 | Resp 0 vs. 2 X7
 9 | Alph_4_2 | Resp 0 vs. 2 X8
10 | Alph_5_2 | Resp 0 vs. 2 X9
11 | Alph_1_3 | Resp 0 vs. 3 Fixed Intercept
12 | Alph_2_3 | Resp 0 vs. 3 X6
13 | Alph_3_3 | Resp 0 vs. 3 X7
14 | Alph_4_3 | Resp 0 vs. 3 X8
15 | Alph_5_3 | Resp 0 vs. 3 X9

-------------------------------------------------------------


Data are sorted in groups of subject ID variable.

***************************************Number of Observations for Class Levels***************************************

Variable Value Nobs ProportionY[3] 1 355 22.187500

2 398 24.8750003 400 25.0000004 447 27.937500

I[2] 193101 26 1.625000194101 11 0.687500194102 10 0.625000..........................515111 15 0.937500515112 11 0.687500515113 10 0.625000

T[6] 0 150 9.3750001 419 26.1875002 481 30.0625003 332 20.7500004 163 10.1875005 48 3.0000006 7 0.437500

*****************************************Crosstabulation of T[6] vs. Response Y[3]*****************************************

1 2 3 4 Total----------------------------------------------------------

0 55 41 35 19 1500.366667 0.273333 0.233333 0.126667

1 117 126 89 87 4190.279236 0.300716 0.212411 0.207637

2 106 119 135 121 4810.220374 0.247401 0.280665 0.251559

3 54 73 96 109 3320.162651 0.219880 0.289157 0.328313

4 15 34 38 76 1630.092025 0.208589 0.233129 0.466258

5 8 4 7 29 480.166667 0.083333 0.145833 0.604167

6 0 1 0 6 70.000000 0.142857 0.000000 0.857143


----------------------------------------------------------Total 355 398 400 447 1600

***********************************************Data for Subject 403101 who has 1 Observations***********************************************

Y Vector--------

1 : 2

W Vector--------

1 : 1 2 1 0 0

***************Starting Values***************

Covariates----------

1|  -1.255       0       0       0       0
2| -0.1176       0       0       0       0
3|  0.9476       0       0       0       0

*************************************Maximum Marginal Likelihood Estimates*************************************

Total Iterations             7    N Function Calls             10
N Gradient Calls             4    N Hessian Calls              11
Log Likelihood    -2122.808437    Deviance (-2logL)   4245.616873
AIC               -4215.616873    SBC                -4134.950490

-------- ------------ ------------ ------------ ------------Variable Estimate Stand. Error Z Value p-Value-------- ------------ ------------ ------------ ------------

RESPONSE CODE 1 vs CODE 0----------------------------

Fixed Effects
Alph_1     -0.219036381    0.177652597   -1.232947816   0.217595 (2)


Alph_2      0.164254474    0.064137054    2.560991875   0.010437 (2)
Alph_3      0.178630725    0.213968578    0.834845596   0.403805 (2)
Alph_4     -0.118959930    0.187567200   -0.634225655   0.525934 (2)
Alph_5      0.159968921    0.301765627    0.530109814   0.596036 (2)

RESPONSE CODE 2 vs CODE 0----------------------------

Fixed Effects
Alph_1     -0.961286263    0.192973639   -4.981438243   6.3e-007 (2)
Alph_2      0.337251249    0.064070080    5.263786958   1.4e-007 (2)
Alph_3      0.901755503    0.215772561    4.179194504   2.9e-005 (2)
Alph_4      0.130869553    0.201744637    0.648689130   0.516539 (2)
Alph_5     -0.106344003    0.301611904   -0.352585564   0.724399 (2)

RESPONSE CODE 3 vs CODE 0----------------------------

Fixed Effects
Alph_1     -1.721515496    0.204571423   -8.415229620   3.9e-017 (2)
Alph_2      0.632252334    0.063955645    9.885794037   4.8e-023 (2)
Alph_3      1.233279731    0.217255996    5.676620002   1.4e-008 (2)
Alph_4      0.374432991    0.203141192    1.843215490   0.065298 (2)
Alph_5     -0.544781960    0.302771760   -1.799315632   0.071969 (2)

Note: (1) = 1-tailed p-value
      (2) = 2-tailed p-value

Variance-Covariance Matrix of Estimates---------------------------------------

1| 0.031562| -0.007898 0.0041143| -0.01866 0.00119 0.045784| -0.01757 0.0006206 0.01656 0.035185| 0.01729 -0.0004738 -0.04557 -0.03516 0.091066| 0.01656 -0.004268 -0.01015 -0.009448 0.0093277| -0.004286 0.002321 0.00077 0.0003936 -0.00032628| -0.01003 0.0007165 0.0253 0.008841 -0.025169| -0.009365 0.0003531 0.008836 0.01806 -0.01805

10| 0.009229 -0.0002789 -0.02515 -0.01804 0.0497111| 0.01691 -0.004382 -0.0104 -0.00966 0.00957312| -0.004377 0.002363 0.0008014 0.0004173 -0.000362513| -0.01023 0.0007506 0.02563 0.008994 -0.0254914| -0.009469 0.0003336 0.008972 0.01836 -0.0183515| 0.009307 -0.0002467 -0.02545 -0.01834 0.05034

6| 0.03724


7| -0.008357 0.0041058| -0.02279 0.001309 0.046569| -0.02135 0.0006036 0.02033 0.0407

10| 0.02101 -0.0004379 -0.04628 -0.04068 0.0909711| 0.01853 -0.004944 -0.01119 -0.01019 0.010112| -0.004828 0.002541 0.0009741 0.0004787 -0.000423913| -0.01132 0.001082 0.02654 0.009398 -0.0263114| -0.0101 0.0004562 0.009371 0.01913 -0.0191215| 0.009889 -0.0003511 -0.02627 -0.01912 0.05195

11| 0.0418512| -0.009055 0.0040913| -0.02474 0.001478 0.047214| -0.02277 0.0005933 0.02169 0.0412715| 0.02254 -0.0004853 -0.04683 -0.04125 0.09167

Correlation of Maximum Marginal Likelihood Estimates----------------------------------------------------

1| 12| -0.6932 13| -0.4908 0.08674 14| -0.5272 0.05159 0.4127 15| 0.3224 -0.02448 -0.7058 -0.6212 16| 0.4829 -0.3448 -0.2459 -0.261 0.16027| -0.3766 0.5648 0.05616 0.03275 -0.016878| -0.2617 0.05177 0.5479 0.2185 -0.38649| -0.2613 0.02729 0.2047 0.4771 -0.2964

10| 0.1722 -0.01442 -0.3897 -0.3189 0.546111| 0.4652 -0.334 -0.2375 -0.2517 0.155112| -0.3852 0.576 0.05856 0.03479 -0.0187913| -0.265 0.05387 0.5513 0.2207 -0.388714| -0.2624 0.0256 0.2064 0.4818 -0.299315| 0.173 -0.0127 -0.3929 -0.323 0.551

6| 17| -0.6759 18| -0.5473 0.09471 19| -0.5484 0.0467 0.4671 1

10| 0.3609 -0.02266 -0.7111 -0.6685 111| 0.4694 -0.3772 -0.2535 -0.2469 0.163712| -0.3912 0.6201 0.07058 0.0371 -0.0219813| -0.2699 0.07772 0.5662 0.2144 -0.401514| -0.2575 0.03505 0.2138 0.4668 -0.312115| 0.1692 -0.0181 -0.4021 -0.3129 0.5689

11| 1


12| -0.6921 113| -0.5566 0.1064 114| -0.5479 0.04567 0.4915 115| 0.3639 -0.02506 -0.7119 -0.6707 1

**********************Wald Confidence Limits**********************

Parameter     Estimate     LowWaldCL     UppWaldCL
Alph_1_1    -0.2190364   -0.56722907    0.12915631
Alph_2_1     0.1642545    0.03854816    0.28996079
Alph_3_1     0.1786307   -0.24073998    0.59800143
Alph_4_1    -0.1189599   -0.48658489    0.24866503
Alph_5_1     0.1599689   -0.43148084    0.75141868
Alph_1_2    -0.9612863   -1.33950764   -0.58306488
Alph_2_2     0.3372512    0.21167620    0.46282630
Alph_3_2     0.9017555    0.47884905    1.32466195
Alph_4_2     0.1308696   -0.26454267    0.52628178
Alph_5_2    -0.1063440   -0.69749247    0.48480447
Alph_1_3    -1.7215155   -2.12246812   -1.32056287
Alph_2_3     0.6322523    0.50690157    0.75760309
Alph_3_3     1.2332797    0.80746580    1.65909366
Alph_4_3     0.3744330   -0.02371643    0.77258241
Alph_5_3    -0.5447820   -1.13820370    0.04863978

*************************************Maximum Marginal Likelihood Estimates*************************************

[ 1]: Alph_1_1 -0.2190364[ 2]: Alph_2_1 0.1642545[ 3]: Alph_3_1 0.1786307[ 4]: Alph_4_1 -0.1189599[ 5]: Alph_5_1 0.1599689[ 6]: Alph_1_2 -0.9612863[ 7]: Alph_2_2 0.3372512[ 8]: Alph_3_2 0.9017555[ 9]: Alph_4_2 0.1308696[10]: Alph_5_2 -0.1063440[11]: Alph_1_3 -1.7215155[12]: Alph_2_3 0.6322523[13]: Alph_3_3 1.2332797[14]: Alph_4_3 0.3744330[15]: Alph_5_3 -0.5447820


The second example treats the intercept as a random effect, which accounts for the clustering of students in the classrooms.
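Compared with the first run, every classroom i now receives its own random intercept. Roughly, in a MIXNO-style parameterization with a standardized random effect (a sketch, not a verbatim quotation of the manual),

\[
  \log\frac{P(Y_{ij}=k)}{P(Y_{ij}=0)} \;=\; \mu_k + x_{ij}'\beta_k + \sigma\,\theta_i ,
  \qquad \theta_i \sim N(0,1) ,
\]

where the contrast-specific mean of the random intercept corresponds to Mean_1_k in the output and the common random-effect standard deviation to the single variance term. The intracluster correlation reported at the end of the output is

\[
  \rho \;=\; \frac{\sigma^2}{\sigma^2 + \pi^2/3} ,
\]

which with the estimate 0.51123 for the standard deviation gives 0.26136 / (0.26136 + 3.2899) = 0.0736, the value printed in the Intracluster Correlation block below.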

optn = [ "print" 4,"link" "logit","efdis" "normal","yscale" "n","idvar" 2,"ctvar" 6,"cl" "wald","nquad" 10,"tech" "trureg" ];

model = "3 = 6 7 8 9";class = [ 2 3 ];random = 0;< gof,par,ase,ci,cov> = glmixd(tvsfp,model,optn,class,random);

*****************Model Information*****************

Number Valid Observations       1600
Response Variable               Y[3]
N Independend Variables            4
Error Distribution            NORMAL
Link Function                  LOGIT
Effect Distribution           NORMAL
Nominal Response with 4 Categ.
Number Random Effects              1
Number Fixed Effects               4
Quad Points per Dimension         10
Subject Variable Column            2
Significance Level:        0.0500000
Design Coding: Full-Rank

*************Model Effects*************

Intercept + X6 + X7 + X8 + X9

***********************Class Level Information***********************

Class Level Value


Y[1]      4    1 2 3 4
I[2]    135    193101 194101 194102 194103 194104

194105 194106 196101 196102 197101197102 197103 197104 198101 198102198103 199101 199102 199103 199104199105 199106 401101 401102 402101402102 403101 403102 404101 404102404103 405101 405102 405103 407101407102 407103 408101 408102 408103408104 409101 409102 409103 409104410101 410102 410103 410104 411101411102 412101 412102 412103 412104412105 412106 414101 414102 414104414105 414106 415101 415102 415103415104 415105 505101 505102 505103505104 505105 505106 505107 506102506103 506105 506107 506110 506111506112 506113 506114 506115 506116507101 507102 507103 507104 507105507106 507107 508101 508102 508103508104 509101 509102 509103 509104509105 509106 509107 509108 510101510102 510103 510104 510105 510106510107 513101 513102 513103 513104514101 514102 514103 514104 514105514106 514107 515101 515102 515103515104 515105 515106 515107 515108515109 515110 515111 515112 515113

T[6]      7    0 1 2 3 4 5 6

*****************Simple Statistics*****************

Column   Nobs        Mean     Std Dev    Skewness    Kurtosis
X[6]     1600   2.0693750   1.2601804   0.3854767  -0.2616621
X[7]     1600   0.4768750   0.4996211   0.0926860  -1.9939032
X[8]     1600   0.4993750   0.5001559   0.0025023  -2.0024984
X[9]     1600   0.2393750   0.4268354   1.2227251  -0.5055769

*********************Parameter Information*********************

Number | Coefficient | Effect-------------------------------------------------------------

1 | Alph_1_1 | Resp 0 vs. 1 X6


 2 | Alph_2_1 | Resp 0 vs. 1 X7
 3 | Alph_3_1 | Resp 0 vs. 1 X8
 4 | Alph_4_1 | Resp 0 vs. 1 X9
 5 | Alph_1_2 | Resp 0 vs. 2 X6
 6 | Alph_2_2 | Resp 0 vs. 2 X7
 7 | Alph_3_2 | Resp 0 vs. 2 X8
 8 | Alph_4_2 | Resp 0 vs. 2 X9
 9 | Alph_1_3 | Resp 0 vs. 3 X6

10 | Alph_2_3 | Resp 0 vs. 3 X7
11 | Alph_3_3 | Resp 0 vs. 3 X8
12 | Alph_4_3 | Resp 0 vs. 3 X9
13 | Mean_1_1 | Resp 0 vs. 1 Random Intercept
14 | Mean_1_2 | Resp 0 vs. 2 Random Intercept
15 | Mean_1_3 | Resp 0 vs. 3 Random Intercept
16 | COV_1_1  | Random Intercept Random Intercept

-------------------------------------------------------------

Data are sorted in groups of subject ID variable.

***************************************Number of Observations for Class Levels***************************************

Variable Value Nobs ProportionY[3] 1 355 22.187500

2 398 24.8750003 400 25.0000004 447 27.937500

I[2] 193101 26 1.625000194101 11 0.687500194102 10 0.625000..........................515111 15 0.937500515112 11 0.687500515113 10 0.625000

T[6] 0 150 9.3750001 419 26.1875002 481 30.0625003 332 20.7500004 163 10.1875005 48 3.0000006 7 0.437500

*****************************************Crosstabulation of T[6] vs. Response Y[3]*****************************************


1 2 3 4 Total----------------------------------------------------------

0 55 41 35 19 1500.366667 0.273333 0.233333 0.126667

1 117 126 89 87 4190.279236 0.300716 0.212411 0.207637

2 106 119 135 121 4810.220374 0.247401 0.280665 0.251559

3 54 73 96 109 3320.162651 0.219880 0.289157 0.328313

4 15 34 38 76 1630.092025 0.208589 0.233129 0.466258

5 8 4 7 29 480.166667 0.083333 0.145833 0.604167

6 0 1 0 6 70.000000 0.142857 0.000000 0.857143

----------------------------------------------------------Total 355 398 400 447 1600

************************************************Data for Subject 403101 who has 20 Observations************************************************

Y Vector--------

 1 : 2 3 2 3 3
 6 : 2 1 3 3 3
11 : 2 3 2 3 1
16 : 3 3 2 2 3

X Vector--------

 1 : 1 1 1 1 1
 6 : 1 1 1 1 1
11 : 1 1 1 1 1
16 : 1 1 1 1 1

W Matrix--------

1| 2 1 0 02| 4 1 0 03| 4 1 0 04| 3 1 0 05| 3 1 0 06| 4 1 0 07| 2 1 0 0


8| 4 1 0 09| 5 1 0 0

10| 3 1 0 011| 3 1 0 012| 3 1 0 013| 1 1 0 014| 2 1 0 015| 2 1 0 016| 1 1 0 017| 4 1 0 018| 3 1 0 019| 0 1 0 020| 3 1 0 0

***************Starting Values***************

Covariates----------

1| -0.2115 -0.4508 -0.1403 0.2012
2| -0.2327 -0.4958 -0.1543 0.2214
3| -0.2538 -0.5409 -0.1684 0.2415

Means-----

1 : -3.481 -2.344 -1.278

Homogeneous COV Parameter Matrix--------------------------------

1| 0.5736

*************************************Maximum Marginal Likelihood Estimates*************************************

Total Iterations            24    N Function Calls             31
N Gradient Calls             4    N Hessian Calls              28
Log Likelihood    -2117.341950    Deviance (-2logL)   4234.683899
AIC               -4202.683899    SBC                -4116.639757

-------- ------------ ------------ ------------ ------------Variable Estimate Stand. Error Z Value p-Value-------- ------------ ------------ ------------ ------------


RESPONSE CODE 1 vs CODE 0----------------------------

Fixed Effects
Alph_1      0.164111368    0.065938155    2.488868065   0.012815 (2)
Alph_2      0.194322479    0.256490646    0.757620138   0.448678 (2)
Alph_3     -0.188874132    0.234326062   -0.806031261   0.420225 (2)
Alph_4      0.232402265    0.362413631    0.641262484   0.521352 (2)

Random EffectsMean_1 -0.171664370 0.205329372 -0.836043906 0.403130 (2)

RESPONSE CODE 2 vs CODE 0----------------------------

Fixed EffectsAlph_1 0.337222643 0.065820668 5.123354911 3.0e-007 (2)Alph_2 0.917228803 0.258001671 3.555127368 3.8e-004 (2)Alph_3 0.060417271 0.245851621 0.245746887 0.805878 (2)Alph_4 -0.033208335 0.362296583 -0.091660634 0.926968 (2)

Random EffectsMean_1 -0.913912495 0.218663733 -4.179533950 2.9e-005 (2)

RESPONSE CODE 3 vs CODE 0----------------------------

Fixed EffectsAlph_1 0.632203035 0.065674946 9.626243723 6.2e-022 (2)Alph_2 1.248215088 0.259287053 4.814027833 1.5e-006 (2)Alph_3 0.303474580 0.247047335 1.228406614 0.219294 (2)Alph_4 -0.470239794 0.363300876 -1.294353591 0.195543 (2)

Random EffectsMean_1 -1.673970948 0.228985117 -7.310391926 2.7e-013 (2)

Random Effect Variance Term: Expressed as Standard DeviationMean_1 0.511231771 0.109828178 4.654832474 1.6e-006 (1)

Note: (1) = 1-tailed p-value
      (2) = 2-tailed p-value

***************************************
Intracluster Correlation

Residual Variance = pi*pi / 3 (assumed)
***************************************


1  cluster variance = ( 0.51123 * 0.51123 ) = 0.26136
   intracluster corr = 0.26136 / ( 0.26136 + (pi*pi/3)) = 0.07360
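These two lines can be reproduced with a few CMAT statements. The following is a minimal sketch (the standard deviation is copied from the estimate above, and pi is written as a literal so that no built-in constant is assumed):

/* sketch: intracluster correlation of the random intercept model */
sd   = 0.511231771;               /* estimated std. deviation of Mean_1 */
pp   = 3.141592653589793;         /* pi written as a literal            */
cvar = sd * sd;                   /* cluster variance  = 0.26136        */
icc  = cvar / (cvar + pp*pp/3);   /* intracluster corr = 0.07360        */
print " Cluster Variance=", cvar, " Intracluster Corr=", icc;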

Variance-Covariance Matrix of Estimates---------------------------------------

1| 0.0043482| 0.001298 0.065793| 0.0006493 0.02585 0.054914| -0.0005472 -0.06535 -0.0549 0.13135| 0.002551 0.0008787 0.0004211 -0.0004048 0.0043326| 0.0008267 0.0453 0.01813 -0.04493 0.0014187| 0.0003911 0.01812 0.03779 -0.03779 0.0006318| -0.000368 -0.04492 -0.03779 0.08999 -0.00051859| 0.002589 0.0009098 0.0004393 -0.0004357 0.002765

10| 0.0008643 0.04563 0.01828 -0.04526 0.001211| 0.0003784 0.01825 0.03809 -0.03809 0.000495812| -0.0003506 -0.04522 -0.03809 0.09061 -0.000455613| -0.008337 -0.02818 -0.02751 0.02731 -0.00471714| -0.004699 -0.01967 -0.01938 0.01937 -0.00878315| -0.004806 -0.01991 -0.01958 0.0196 -0.00536716| 0.0001265 0.0008014 -0.002625 0.002645 0.0001275

6| 0.066567| 0.02962 0.060448| -0.06605 -0.06043 0.13139| 0.001086 0.0005052 -0.0005068 0.004313

10| 0.04656 0.01869 -0.0461 0.001595 0.0672311| 0.01866 0.03888 -0.03888 0.0006274 0.0309812| -0.04606 -0.03888 0.09224 -0.0005788 -0.0666313| -0.01955 -0.01932 0.01929 -0.0048 -0.0197514| -0.0323 -0.03129 0.03105 -0.005249 -0.0208515| -0.02072 -0.02012 0.02016 -0.009477 -0.0342716| 0.0007941 -0.002647 0.002669 0.000128 0.0007733

11| 0.0610312| -0.06103 0.13213| -0.01944 0.01939 0.0421614| -0.02006 0.01999 0.02714 0.0478115| -0.03272 0.03262 0.02748 0.02911 0.0524316| -0.002673 0.002716 0.001778 0.001785 0.001796

16| 0.01206

Correlation of Maximum Marginal Likelihood Estimates----------------------------------------------------


1| 12| 0.07677 13| 0.04202 0.4302 14| -0.0229 -0.703 -0.6465 15| 0.5877 0.05205 0.0273 -0.01697 16| 0.04859 0.6846 0.2999 -0.4806 0.083527| 0.02413 0.2874 0.6559 -0.4241 0.038998| -0.01541 -0.4834 -0.4451 0.6854 -0.021749| 0.5978 0.05401 0.02854 -0.01831 0.6396

10| 0.05055 0.6861 0.3009 -0.4817 0.070311| 0.02323 0.2881 0.6579 -0.4254 0.0304912| -0.01464 -0.4853 -0.4474 0.6882 -0.0190513| -0.6157 -0.535 -0.5717 0.367 -0.34914| -0.3259 -0.3507 -0.3783 0.2444 -0.610215| -0.3183 -0.339 -0.3649 0.2362 -0.356116| 0.01747 0.02845 -0.102 0.06646 0.01763

6| 17| 0.467 18| -0.7067 -0.6785 19| 0.06408 0.03129 -0.0213 1

10| 0.696 0.2932 -0.4907 0.09364 111| 0.2927 0.6402 -0.4344 0.03867 0.483712| -0.4914 -0.4353 0.7008 -0.02426 -0.707413| -0.369 -0.3828 0.2592 -0.356 -0.37114| -0.5726 -0.582 0.3919 -0.3655 -0.367815| -0.3507 -0.3575 0.243 -0.6302 -0.577316| 0.02803 -0.09804 0.06707 0.01774 0.02716

11| 112| -0.68 113| -0.3832 0.26 114| -0.3713 0.2516 0.6046 115| -0.5785 0.3922 0.5844 0.5813 116| -0.09853 0.06806 0.07885 0.07432 0.07141

16| 1

**********************Wald Confidence Limits**********************

Parameter     Estimate      LowWaldCL      UppWaldCL
Alph_1_1     0.1641114     0.03487496     0.29334778
Alph_2_1     0.1943225    -0.30838995     0.69703491
Alph_3_1    -0.1888741    -0.64814477     0.27039651
Alph_4_1     0.2324023    -0.47791540     0.94271993
Alph_1_2     0.3372226     0.20821650     0.46622878
Alph_2_2     0.9172288     0.41155482     1.42290279
Alph_3_2     0.0604173    -0.42144305     0.54227759
Alph_4_2    -0.0332083    -0.74329659     0.67687992
Alph_1_3     0.6322030     0.50348251     0.76092356
Alph_2_3     1.2482151     0.74002180     1.75640837
Alph_3_3     0.3034746    -0.18072930     0.78767846
Alph_4_3    -0.4702398    -1.18229643     0.24181684
Mean_1_1    -0.1716644    -0.57410254     0.23077380
Mean_1_2    -0.9139125    -1.34248554    -0.48533945
Mean_1_3    -1.6739709    -2.12277353    -1.22516837
COV_1_1      0.5112318     0.29597250     0.72649104

*************************************Maximum Marginal Likelihood Estimates*************************************

[ 1]: Alph_1_1      0.1641114
[ 2]: Alph_2_1      0.1943225
[ 3]: Alph_3_1     -0.1888741
[ 4]: Alph_4_1      0.2324023
[ 5]: Alph_1_2      0.3372226
[ 6]: Alph_2_2      0.9172288
[ 7]: Alph_3_2      0.0604173
[ 8]: Alph_4_2     -0.0332083
[ 9]: Alph_1_3      0.6322030
[10]: Alph_2_3      1.2482151
[11]: Alph_3_3      0.3034746
[12]: Alph_4_3     -0.4702398
[13]: Mean_1_1     -0.1716644
[14]: Mean_1_2     -0.9139125
[15]: Mean_1_3     -1.6739709
[16]: COV_1_1       0.5112318

******************************************Empirical Bayes Estimates of Random Effect******************************************

N ID_Var Nobs P_Mean P_StdDev

  1   403101    20    0.874105249   0.849207168
  2   403102     3    0.253027885   0.954571351
  3   404101    11    0.370691609   0.857427702
...........................................
133   515111    15    1.294318719   0.810793820
134   515112    11    0.353389008   0.814552941
135   515113    10   -1.091915917   0.794790108


The third example estimates model covariates for the random intercept model for both clustering levels, the classroom level and the student level.

optn = [ "print" 4,"link" "logit","efdis" "normal","yscale" "n","idvar" 2,"ctvar" 6,"cl" "wald","vcat" ,"nquad" 10,"tech" "trureg" ];

model = "3 = 6 7 8 9";class = [ 2 3 ];random = 0;< gof,par,ase,ci,cov> = glmixd(tvsfp,model,optn,class,random);

*****************Model Information*****************

Number Valid Observations       1600
Response Variable               Y[3]
N Independend Variables            4
Error Distribution            NORMAL
Link Function                  LOGIT
Effect Distribution           NORMAL
Nominal Response with 4 Categ.
Number Random Effects              1
Number Fixed Effects               4
Quad Points per Dimension         10
Subject Variable Column            2
Significance Level: 0.0500000
Design Coding: Full-Rank

*********************Parameter Information*********************

Number | Coefficient | Effect
-------------------------------------------------------------
 1 | Alph_1_1 | Resp 0 vs. 1 X6
 2 | Alph_2_1 | Resp 0 vs. 1 X7
 3 | Alph_3_1 | Resp 0 vs. 1 X8
 4 | Alph_4_1 | Resp 0 vs. 1 X9
 5 | Alph_1_2 | Resp 0 vs. 2 X6
 6 | Alph_2_2 | Resp 0 vs. 2 X7


 7 | Alph_3_2 | Resp 0 vs. 2 X8
 8 | Alph_4_2 | Resp 0 vs. 2 X9
 9 | Alph_1_3 | Resp 0 vs. 3 X6
10 | Alph_2_3 | Resp 0 vs. 3 X7
11 | Alph_3_3 | Resp 0 vs. 3 X8
12 | Alph_4_3 | Resp 0 vs. 3 X9
13 | Mean_1_1 | Resp 0 vs. 1 Random Intercept
14 | Mean_1_2 | Resp 0 vs. 2 Random Intercept
15 | Mean_1_3 | Resp 0 vs. 3 Random Intercept
-------------------------------------------------------------
Variance for Response Level 1
16 | COV_1_1_1 | Random Intercept Random Intercept
-------------------------------------------------------------
Variance for Response Level 2
17 | COV_1_1_2 | Random Intercept Random Intercept
-------------------------------------------------------------
Variance for Response Level 3
18 | COV_1_1_3 | Random Intercept Random Intercept
-------------------------------------------------------------

***************Starting Values***************

Covariates
----------

1|  -0.2115  -0.4508  -0.1403   0.2012
2|  -0.2327  -0.4958  -0.1543   0.2214
3|  -0.2538  -0.5409  -0.1684   0.2415

Means
-----

1 :   -3.481   -2.344   -1.278

Covariance Parameter Matrix for Y=1
-----------------------------------

1|   0.5736

Covariance Parameter Matrix for Y=2
-----------------------------------

1|   0.6309

Covariance Parameter Matrix for Y=3
-----------------------------------

1|   0.6883


*************************************Maximum Marginal Likelihood Estimates*************************************

Total Iterations            28   N Function Calls             31
N Gradient Calls             4   N Hessian Calls              32
Log Likelihood    -2111.510233   Deviance (-2logL)   4223.020466
AIC               -4187.020466   SBC                -4090.220806

--------     ------------   ------------   ------------   ------------
Variable     Estimate       Stand. Error   Z Value        p-Value
--------     ------------   ------------   ------------   ------------

RESPONSE CODE 1 vs CODE 0
----------------------------

Fixed Effects
Alph_1     0.165460976   0.064754213   2.555215636   0.010612 (2)
Alph_2     0.200542364   0.229533515   0.873695345   0.382284 (2)
Alph_3    -0.142517911   0.204872931  -0.695640514   0.486654 (2)
Alph_4     0.185226602   0.324147930   0.571426146   0.567711 (2)

Random Effects
Mean_1    -0.178401586   0.187892547  -0.949487294   0.342373 (2)

Random Effect Variance Term: Expressed as Standard Deviation
Mean_1     0.295307697   0.120497240   2.450742408   7.1e-003 (1)

RESPONSE CODE 2 vs CODE 0
----------------------------

Fixed Effects
Alph_1     0.340619446   0.066327862   5.135390107   2.8e-007 (2)
Alph_2     0.935539357   0.271065779   3.451337011   5.6e-004 (2)
Alph_3     0.069475707   0.259654477   0.267569843   0.789030 (2)
Alph_4    -0.034981279   0.381465402  -0.091702364   0.926935 (2)

Random Effects
Mean_1    -0.958717883   0.227505709  -4.214038795   2.5e-005 (2)

Random Effect Variance Term: Expressed as Standard Deviation
Mean_1     0.602721355   0.129844776   4.641860639   1.7e-006 (1)

RESPONSE CODE 3 vs CODE 0
----------------------------

Fixed Effects
Alph_1     0.636124248   0.066750881   9.529825550   1.6e-021 (2)
Alph_2     1.266739199   0.286445501   4.422269494   9.8e-006 (2)
Alph_3     0.302084691   0.275460802   1.096652188   0.272793 (2)
Alph_4    -0.455949703   0.402356675  -1.133197810   0.257131 (2)

Random Effects
Mean_1    -1.743279123   0.247207578  -7.051883822   1.8e-012 (2)

Random Effect Variance Term: Expressed as Standard Deviation
Mean_1     0.686596332   0.128155911   5.357508096   4.2e-008 (1)

Note: (1) = 1-tailed p-value
      (2) = 2-tailed p-value

***************************************
Intracluster Correlation

Residual Variance = pi*pi / 3 (assumed)
***************************************

1  cluster variance = ( 0.29531 * 0.29531 ) = 0.08721
   intracluster corr = 0.08721 / ( 0.08721 + (pi*pi/3)) = 0.02582

2  cluster variance = ( 0.60272 * 0.60272 ) = 0.36327
   intracluster corr = 0.36327 / ( 0.36327 + (pi*pi/3)) = 0.09944

3  cluster variance = ( 0.68660 * 0.68660 ) = 0.47141
   intracluster corr = 0.47141 / ( 0.47141 + (pi*pi/3)) = 0.12533

Variance-Covariance Matrix of Estimates---------------------------------------

1| 0.0041932| 0.001275 0.052693| 0.0006001 0.01962 0.041974| -0.0005262 -0.05226 -0.04199 0.10515| 0.002467 0.0008634 0.0003888 -0.000375 0.0043996| 0.0008146 0.03866 0.01507 -0.03838 0.0014817| 0.0003546 0.01495 0.03119 -0.0312 0.000618| -0.0003451 -0.03823 -0.03119 0.07668 -0.00052619| 0.002517 0.0008878 0.0004079 -0.0003982 0.002862

10| 0.0008531 0.04063 0.01603 -0.04034 0.00128211| 0.0003559 0.01591 0.03314 -0.03318 0.000493112| -0.0003467 -0.04021 -0.03315 0.08065 -0.000503113| -0.008023 -0.02183 -0.02102 0.02091 -0.00453414| -0.004539 -0.01652 -0.01608 0.0161 -0.00893815| -0.004659 -0.01748 -0.01708 0.01708 -0.00557916| 0.000141 0.0009208 -0.00207 0.002257 0.000136517| 0.0001619 0.0009899 -0.001038 0.001065 0.000281818| 0.0001266 0.0007619 -0.001078 0.001257 0.0002472


6| 0.073487| 0.03301 0.067428| -0.07292 -0.0675 0.14559| 0.001152 0.0004812 -0.0005116 0.004456

10| 0.0571 0.02388 -0.05664 0.00169 0.0820511| 0.02379 0.0493 -0.0493 0.0006314 0.0381712| -0.05653 -0.04936 0.1134 -0.0006133 -0.0813413| -0.01638 -0.01604 0.01606 -0.00464 -0.0173914| -0.03588 -0.03433 0.03408 -0.005453 -0.0261815| -0.0261 -0.02529 0.02549 -0.009818 -0.0418716| 0.0005368 -0.002163 0.002249 0.0001274 0.000528317| 0.0009841 -0.002736 0.00336 0.0002371 0.000704518| 0.001086 -0.00137 0.001346 0.0003198 0.001277

11| 0.0758812| -0.07592 0.161913| -0.017 0.01701 0.035314| -0.02516 0.02516 0.02363 0.0517615| -0.03962 0.03958 0.02484 0.03446 0.0611116| -0.002314 0.002387 0.002038 0.002056 0.00224217| -0.001655 0.002056 0.001477 -0.0005108 0.00191918| -0.002692 0.002986 0.001406 0.00124 -0.00134

16| 0.0145217| 0.008656 0.0168618| 0.008407 0.008097 0.01642

Correlation of Maximum Marginal Likelihood Estimates----------------------------------------------------

1| 12| 0.08576 13| 0.04524 0.4172 14| -0.02507 -0.7024 -0.6323 15| 0.5745 0.05671 0.02861 -0.01744 16| 0.04641 0.6214 0.2715 -0.4368 0.082397| 0.02109 0.2508 0.5863 -0.3707 0.035428| -0.01397 -0.4367 -0.3991 0.6201 -0.020799| 0.5824 0.05794 0.02983 -0.01841 0.6464

10| 0.04599 0.6179 0.2731 -0.4344 0.0674811| 0.01996 0.2516 0.5873 -0.3716 0.0269912| -0.01331 -0.4354 -0.4021 0.6183 -0.0188513| -0.6594 -0.5062 -0.5462 0.3433 -0.363814| -0.3081 -0.3163 -0.3449 0.2183 -0.592315| -0.2911 -0.308 -0.3372 0.2132 -0.340316| 0.01808 0.03329 -0.08386 0.05779 0.0170817| 0.01926 0.03321 -0.03904 0.0253 0.03272


18| 0.01525 0.0259 -0.04106 0.03026 0.02908

6| 17| 0.469 18| -0.7052 -0.6815 19| 0.06369 0.02777 -0.02009 1

10| 0.7353 0.321 -0.5183 0.0884 111| 0.3187 0.6893 -0.4692 0.03434 0.483712| -0.5183 -0.4724 0.7391 -0.02283 -0.705713| -0.3216 -0.3289 0.224 -0.37 -0.323214| -0.5818 -0.5811 0.3926 -0.359 -0.401715| -0.3896 -0.394 0.2703 -0.595 -0.591416| 0.01644 -0.06915 0.04894 0.01584 0.0153117| 0.02796 -0.08116 0.06784 0.02736 0.0189418| 0.03127 -0.04118 0.02753 0.03738 0.03479

11| 112| -0.685 113| -0.3285 0.225 114| -0.4014 0.2749 0.5529 115| -0.5818 0.3979 0.5349 0.6128 116| -0.06973 0.04924 0.09001 0.07501 0.0752517| -0.04626 0.03936 0.06055 -0.01729 0.0597818| -0.07626 0.05791 0.05839 0.04252 -0.04231

16| 117| 0.5533 118| 0.5444 0.4866 1

**********************Wald Confidence Limits**********************

Parameter     Estimate      LowWaldCL      UppWaldCL
Alph_1_1     0.1654610     0.03854505     0.29237690
Alph_2_1     0.2005424    -0.24933506     0.65041979
Alph_3_1    -0.1425179    -0.54406148     0.25902566
Alph_4_1     0.1852266    -0.45009167     0.82054487
Alph_1_2     0.3406194     0.21061923     0.47061967
Alph_2_2     0.9355394     0.40426019     1.46681852
Alph_3_2     0.0694757    -0.43943772     0.57838913
Alph_4_2    -0.0349813    -0.78263973     0.71267717
Alph_1_3     0.6361242     0.50529493     0.76695357
Alph_2_3     1.2667392     0.70531633     1.82816206
Alph_3_3     0.3020847    -0.23780856     0.84197794
Alph_4_3    -0.4559497    -1.24455430     0.33265489
Mean_1_1    -0.1784016    -0.54666421     0.18986104
Mean_1_2    -0.9587179    -1.40462088    -0.51281489
Mean_1_3    -1.7432791    -2.22779707    -1.25876117
COV_1_1_1    0.2953077     0.05913745     0.53147795
COV_1_1_2    0.6027214     0.34823027     0.85721244
COV_1_1_3    0.6865963     0.43541536     0.93777730

*************************************Maximum Marginal Likelihood Estimates*************************************

[ 1]: Alph_1_1      0.1654610
[ 2]: Alph_2_1      0.2005424
[ 3]: Alph_3_1     -0.1425179
[ 4]: Alph_4_1      0.1852266
[ 5]: Alph_1_2      0.3406194
[ 6]: Alph_2_2      0.9355394
[ 7]: Alph_3_2      0.0694757
[ 8]: Alph_4_2     -0.0349813
[ 9]: Alph_1_3      0.6361242
[10]: Alph_2_3      1.2667392
[11]: Alph_3_3      0.3020847
[12]: Alph_4_3     -0.4559497
[13]: Mean_1_1     -0.1784016
[14]: Mean_1_2     -0.9587179
[15]: Mean_1_3     -1.7432791
[16]: COV_1_1_1     0.2953077
[17]: COV_1_1_2     0.6027214
[18]: COV_1_1_3     0.6865963

******************************************Empirical Bayes Estimates of Random Effect******************************************

N ID_Var Nobs P_Mean P_StdDev

  1   403101    20    1.194621760   0.758230849
  2   403102     3    0.182880768   0.928392330
  3   404101    11    0.292604492   0.783294404
...........................................
133   515111    15    1.260508525   0.728980322
134   515112    11    0.463341017   0.763379453
135   515113    10   -1.266446297   0.780378734

2. The following example is taken from the MIXNO manual by Hedeker (1999, [383]). The observations of the data set sdhouse.dat are homeless people participating in the San Diego Homeless Project, clustered by supportive case management and levels of access to independent housing. Among other model types with this data set, Hedeker shows how to model a trend (longitudinal) analysis using multiple random subject effects (see page 40).

The missing value coding of 999 is replaced by missing values of CMAT type (externally periods, which internally are coded as NaN).

/* San Diego Homeless Research Project (Mc Kinney):
   Housing outcome across time
   362 low income individuals; missing value coding is 999
   Variables: [1]    subject ID
              [2]    housing status (=1: street, ...)
              [3]    intercept
              [4]    section 8 certificate (=0: no, =1: yes)
              [5-7]  dummy codes for time effects
              [8-10] section 8 x time effects
              [11]   non section 8 group (=0: no, =1: yes)
              [12]   linear time contrast
              [13]   section 8 x linear time */

sdhous0 = [   1   1  1 1 0 0 0 0 0 0 0 0 0
              1   2  1 1 1 0 0 1 0 0 0 1 1
              1   2  1 1 0 1 0 0 1 0 0 2 2
              1   2  1 1 0 0 1 0 0 1 0 3 3
              2   1  1 1 0 0 0 0 0 0 0 0 0
              . . .
            361 999  1 0 0 0 1 0 0 0 1 3 0
            362   1  1 0 0 0 0 0 0 0 1 0 0
            362   1  1 0 1 0 0 0 0 0 1 1 0
            362   1  1 0 0 1 0 0 0 0 1 2 0
            362   1  1 0 0 0 1 0 0 0 1 3 0 ];

sdhous1 = shape(sdhous0,.,13);
sdhous  = replace(sdhous1,999,.);

The first model treats only the intercept as random:

#include "sdhouse.dat"print " Nobs=", nrow(sdhous),

" Nvar=", ncol(sdhous);optn = [ "print" 4,

"nomis" ,"link" "logit","efdis" "normal","yscale" "n","idvar" 1,"ctvar" 4,"cl" "wald","vcat" ,

Page 88: CMAT · 2018. 7. 24. · 1 Some Details 9 1.1 Examples for glim, glmixd, and glmod ... 3.18 Freshman Data: Campbell & McCabe (1984) ... Classification Accuracy 14.28571429 . Goodman-Kruskal

88 Details: Examples for glim, glmixd, and glmod

"nquad" 20,"tech" "dbldog" ];

*****************Model Information*****************

Number Valid Observations       1289
Response Variable               Y[2]
N Independend Variables            7
Error Distribution            NORMAL
Link Function                  LOGIT
Effect Distribution           NORMAL
Nominal Response with 3 Categ.
Number Random Effects              1
Number Fixed Effects               7
Quad Points per Dimension         20
Skip Cases with Missing Values
Subject Variable Column            1
Significance Level: 0.0500000
Design Coding: Full-Rank

*************Model Effects*************

Intercept + C4 + C5 + C6 + C7 + C8 + C9 + C10

***********************Class Level Information***********************

Class Level Value

Y[ 1]      3    0 1 2
C[ 4]      2    0 1
C[ 5]      2    0 1
C[ 6]      2    0 1
C[ 7]      2    0 1
C[ 8]      2    0 1
C[ 9]      2    0 1
C[10]      2    0 1
I[ 1]    362    1  2  3  4  5
                6  7  8  9 10
               11 12 13 14 15
               16 17 18 19 20
               ...................................
              351 352 353 354 355
              356 357 358 359 360
              361 362

T[ 4] 2 0 1

*********************Parameter Information*********************

Number | Coefficient | Effect
-------------------------------------------------------------
 1 | Alph_1_1 | Resp 0 vs. 1 C4_0
 2 | Alph_2_1 | Resp 0 vs. 1 C5_0
 3 | Alph_3_1 | Resp 0 vs. 1 C6_0
 4 | Alph_4_1 | Resp 0 vs. 1 C7_0
 5 | Alph_5_1 | Resp 0 vs. 1 C8_0
 6 | Alph_6_1 | Resp 0 vs. 1 C9_0
 7 | Alph_7_1 | Resp 0 vs. 1 C10_0
 8 | Alph_1_2 | Resp 0 vs. 2 C4_0
 9 | Alph_2_2 | Resp 0 vs. 2 C5_0
10 | Alph_3_2 | Resp 0 vs. 2 C6_0
11 | Alph_4_2 | Resp 0 vs. 2 C7_0
12 | Alph_5_2 | Resp 0 vs. 2 C8_0
13 | Alph_6_2 | Resp 0 vs. 2 C9_0
14 | Alph_7_2 | Resp 0 vs. 2 C10_0
15 | Mean_1_1 | Resp 0 vs. 1 Random Intercept
16 | Mean_1_2 | Resp 0 vs. 2 Random Intercept
-------------------------------------------------------------
Variance for Response Level 1
17 | COV_1_1_1 | Random Intercept Random Intercept
-------------------------------------------------------------
Variance for Response Level 2
18 | COV_1_1_2 | Random Intercept Random Intercept

-------------------------------------------------------------

Data are sorted in groups of subject ID variable.

***************************************Number of Observations for Class Levels***************************************

Variable   Value    Nobs   Proportion
Y[ 2]          0     294    22.808379
               1     484    37.548487
               2     511    39.643134

I[ 1]          1       4     0.310318
               2       4     0.310318
               3       4     0.310318
           .......................
             360       4     0.310318
             361       2     0.155159
             362       4     0.310318

T[ 4]          0     632    49.030256
               1     657    50.969744

C[ 4]          0     632    49.030256
               1     657    50.969744

C[ 5]          0     967    75.019395
               1     322    24.980605

C[ 6]          0     986    76.493406
               1     303    23.506594

C[ 7]          0     986    76.493406
               1     303    23.506594

C[ 8]          0    1128    87.509697
               1     161    12.490303

C[ 9]          0    1132    87.820016
               1     157    12.179984

C[10]          0    1131    87.742436
               1     158    12.257564

*****************************************Crosstabulation of T[4] vs. Response Y[2]*****************************************

            0          1          2      Total
------------------------------------------------
 0        161        305        166        632
     0.254747   0.482595   0.262658

 1        133        179        345        657
     0.202435   0.272451   0.525114

------------------------------------------------
Total     294        484        511       1289

******************************************Data for Subject 1 who has 4 Observations******************************************

Y Vector
--------

1 :   1  2  2  2

X Vector
--------

1 :   1  1  1  1

W Matrix
--------

1|  1  0  0  0  0
2|  1  1  0  0  1
3|  1  0  1  0  0
4|  1  0  0  1  0

1|  0  0
2|  0  0
3|  1  0
4|  0  1

***************Starting Values***************

Covariates
----------

1|  -0.1963  -0.6469  -0.8929  -0.9665  -0.4309
2|  -0.216   -0.7116  -0.9822  -1.063   -0.474

1|  -0.2849  -0.108
2|  -0.3134  -0.1188

Means
-----

1 :    -2.9   -1.261

Covariance Parameter Matrix for Y=1
-----------------------------------

1|   0.5736

Covariance Parameter Matrix for Y=2
-----------------------------------

1| 0.6309


*************************************Maximum Marginal Likelihood Estimates*************************************

Total Iterations            48   N Function Calls             52
N Gradient Calls            50   N Hessian Calls               2
Log Likelihood    -1109.362721   Deviance (-2logL)   2218.725441
AIC               -2182.725441   SBC                -2089.816245

--------     ------------   ------------   ------------   ------------
Variable     Estimate       Stand. Error   Z Value        p-Value
--------     ------------   ------------   ------------   ------------

RESPONSE CODE 1 vs CODE 0
----------------------------

Fixed Effects
Alph_1     0.522066110   0.261883604   1.993504377   0.046206 (2)
Alph_2     1.942128162   0.300308574   6.467108593   1.0e-010 (2)
Alph_3     2.822471500   0.384005238   7.350085944   2.0e-013 (2)
Alph_4     2.259364063   0.354805805   6.367889223   1.9e-010 (2)
Alph_5    -0.136930594   0.461666704  -0.296600542   0.766771 (2)
Alph_6    -1.921742165   0.516678686  -3.719414442   2.0e-004 (2)
Alph_7    -0.952860567   0.480762057  -1.981979552   0.047482 (2)

Random Effects
Mean_1    -0.452335458   0.184616370  -2.450137322   0.014280 (2)

Random Effect Variance Term: Expressed as Standard Deviation
Mean_1     0.870744793   0.188096430   4.629246771   1.8e-006 (1)

RESPONSE CODE 2 vs CODE 0
----------------------------

Fixed Effects
Alph_1     0.779384366   0.480766108   1.621130012   0.104990 (2)
Alph_2     2.680644318   0.435922079   6.149365778   7.8e-010 (2)
Alph_3     4.088686089   0.497444715   8.219377891   2.0e-016 (2)
Alph_4     4.096575811   0.469613016   8.723301259   2.7e-018 (2)
Alph_5     2.003754827   0.619140371   3.236349816   1.2e-003 (2)
Alph_6     0.546877226   0.650796383   0.840320015   0.400729 (2)
Alph_7     0.306437511   0.619569823   0.494597218   0.620884 (2)

Random Effects
Mean_1    -2.673683855   0.378044832  -7.072398895   1.5e-012 (2)

Random Effect Variance Term: Expressed as Standard Deviation
Mean_1     2.333627592   0.218545034  10.67801702    6.4e-027 (1)

Note: (1) = 1-tailed p-value
      (2) = 2-tailed p-value


***************************************
Intracluster Correlation

Residual Variance = pi*pi / 3 (assumed)
***************************************

1  cluster variance = ( 0.87074 * 0.87074 ) = 0.75820
   intracluster corr = 0.75820 / ( 0.75820 + (pi*pi/3)) = 0.18730

2  cluster variance = ( 2.33363 * 2.33363 ) = 5.44582
   intracluster corr = 5.44582 / ( 5.44582 + (pi*pi/3)) = 0.62340

Variance-Covariance Matrix of Estimates---------------------------------------

1| 0.068582| 0.0327 0.090193| 0.03387 0.04512 0.14754| 0.0335 0.04383 0.04958 0.12595| -0.05748 -0.0744 -0.02347 -0.02406 0.21316| -0.05925 -0.03155 -0.1288 -0.03253 0.063627| -0.05877 -0.0299 -0.03047 -0.1084 0.064078| 0.05936 0.02159 0.02321 0.02302 -0.039519| 0.01998 0.07222 0.0341 0.03302 -0.05525

10| 0.02073 0.03292 0.1286 0.03812 -0.0116511| 0.01976 0.02977 0.0351 0.1023 -0.0132712| -0.03499 -0.06076 -0.01805 -0.01773 0.169513| -0.03689 -0.02593 -0.1186 -0.02825 0.0428414| -0.03541 -0.02073 -0.02237 -0.09004 0.0446615| -0.0342 -0.03037 -0.03067 -0.03056 0.0295116| -0.02789 -0.01704 -0.01723 -0.01809 0.0186717| 0.005428 0.01898 0.02602 0.02373 0.0104818| 0.003571 0.01172 0.01647 0.01581 0.008397

6| 0.2677| 0.06464 0.23118| -0.0415 -0.04121 0.23119| -0.0191 -0.01795 0.0956 0.19

10| -0.1096 -0.0192 0.09877 0.1121 0.247511| -0.02009 -0.08757 0.09906 0.1103 0.119912| 0.04629 0.04497 -0.1696 -0.1699 -0.0842913| 0.207 0.04645 -0.1737 -0.09544 -0.223414| 0.04733 0.1813 -0.1734 -0.09277 -0.0951715| 0.02996 0.0298 -0.02836 -0.01731 -0.0173216| 0.01757 0.01928 -0.1259 -0.097 -0.102717| -0.0006374 0.002281 0.004887 0.02007 0.0250418| 0.00226 0.002275 0.006738 0.02188 0.0306


11| 0.220512| -0.08255 0.383313| -0.09391 0.1935 0.423514| -0.1951 0.1881 0.1937 0.383915| -0.01697 0.01722 0.01757 0.01702 0.0340816| -0.1071 0.07776 0.07932 0.0874 0.0276817| 0.01915 0.002255 -0.01087 -0.001329 -0.00106418| 0.03117 0.02018 0.01069 0.008133 1.704e-006

16| 0.142917| 0.002848 0.0353818| -0.0232 0.023 0.04776

Correlation of Maximum Marginal Likelihood Estimates----------------------------------------------------

1| 12| 0.4158 13| 0.3368 0.3913 14| 0.3605 0.4114 0.3639 15| -0.4754 -0.5366 -0.1324 -0.1469 16| -0.4379 -0.2034 -0.6493 -0.1774 0.26677| -0.4668 -0.2071 -0.165 -0.6357 0.28878| 0.4714 0.1495 0.1257 0.135 -0.1789| 0.175 0.5517 0.2037 0.2135 -0.2745

10| 0.1591 0.2204 0.6731 0.216 -0.0507311| 0.1606 0.2111 0.1946 0.6141 -0.0611912| -0.2158 -0.3268 -0.07593 -0.08069 0.593113| -0.2165 -0.1327 -0.4745 -0.1223 0.142614| -0.2182 -0.1114 -0.09401 -0.4096 0.156115| -0.7073 -0.5478 -0.4327 -0.4666 0.346216| -0.2817 -0.1501 -0.1187 -0.1348 0.10717| 0.1102 0.336 0.3602 0.3556 0.120718| 0.06239 0.1786 0.1963 0.2038 0.08323

6| 17| 0.2602 18| -0.1671 -0.1783 19| -0.08481 -0.08564 0.4561 1

10| -0.4266 -0.08028 0.413 0.517 111| -0.08279 -0.3879 0.4387 0.539 0.513212| 0.1447 0.1511 -0.5697 -0.6297 -0.273713| 0.6156 0.1485 -0.555 -0.3364 -0.690114| 0.1478 0.6087 -0.5821 -0.3435 -0.308815| 0.3141 0.3358 -0.3196 -0.2151 -0.188616| 0.08995 0.1061 -0.6926 -0.5886 -0.54617| -0.006558 0.02522 0.05404 0.2447 0.267618| 0.02001 0.02165 0.06413 0.2297 0.2815


11| 112| -0.2839 113| -0.3073 0.4802 114| -0.6705 0.4903 0.4803 115| -0.1958 0.1507 0.1463 0.1488 116| -0.6034 0.3322 0.3224 0.3732 0.396617| 0.2168 0.01936 -0.08884 -0.0114 -0.0306318| 0.3037 0.1491 0.07517 0.06007 4.223e-005

16| 117| 0.04005 118| -0.2808 0.5594 1

**********************Wald Confidence Limits**********************

Parameter     Estimate      LowWaldCL      UppWaldCL
Alph_1_1     0.5220661     0.00878368     1.03534854
Alph_2_1     1.9421282     1.35353417     2.53072215
Alph_3_1     2.8224715     2.06983506     3.57510794
Alph_4_1     2.2593641     1.56395746     2.95477066
Alph_5_1    -0.1369306    -1.04178071     0.76791952
Alph_6_1    -1.9217422    -2.93441378    -0.90907055
Alph_7_1    -0.9528606    -1.89513688    -0.01058425
Alph_1_2     0.7793844    -0.16289989     1.72166862
Alph_2_2     2.6806443     1.82625274     3.53503589
Alph_3_2     4.0886861     3.11371236     5.06365981
Alph_4_2     4.0965758     3.17615121     5.01700041
Alph_5_2     2.0037548     0.79026200     3.21724766
Alph_6_2     0.5468772    -0.72866025     1.82241470
Alph_7_2     0.3064375    -0.90789703     1.52077205
Mean_1_1    -0.4523355    -0.81417689    -0.09049402
Mean_1_2    -2.6736839    -3.41463811    -1.93272960
COV_1_1_1    0.8707448     0.50208256     1.23940702
COV_1_1_2    2.3336276     1.90528720     2.76196799

*************************************Maximum Marginal Likelihood Estimates*************************************

[ 1]: Alph_1_1      0.5220661
[ 2]: Alph_2_1      1.9421282
[ 3]: Alph_3_1      2.8224715
[ 4]: Alph_4_1      2.2593641
[ 5]: Alph_5_1     -0.1369306
[ 6]: Alph_6_1     -1.9217422
[ 7]: Alph_7_1     -0.9528606
[ 8]: Alph_1_2      0.7793844
[ 9]: Alph_2_2      2.6806443
[10]: Alph_3_2      4.0886861
[11]: Alph_4_2      4.0965758
[12]: Alph_5_2      2.0037548
[13]: Alph_6_2      0.5468772
[14]: Alph_7_2      0.3064375
[15]: Mean_1_1     -0.4523355
[16]: Mean_1_2     -2.6736839
[17]: COV_1_1_1     0.8707448
[18]: COV_1_1_2     2.3336276

******************************************Empirical Bayes Estimates of Random Effect******************************************

N ID_Var Nobs P_Mean P_StdDev

  1     1    4    0.648199421   0.638550676
  2     2    4    0.099507753   0.587832720
  3     3    4    0.310741513   0.607006120
...........................................
360   360    4   -0.546369897   0.640654078
361   361    2   -0.930767828   0.820574217
362   362    4   -0.205458196   0.611499462

The next model uses the section 8 variable in column 4 both as a fixed and as a random effect. To perform this analysis in CMAT we have to append a copy of column 4 as column 14:

sdhous2 = sdhous -> sdhous[.,4];

optn = [ "print" 24,"nomis" ,"link" "logit","efdis" "normal","yscale" "n","nomu" ,"idvar" 1,"ctvar" 4,"cl" "wald","vcat" ,"vgrp" ,"nquad" 20,"tech" "trureg" ];

model = "2 = 4 5 6 7 8 9 10 11 14";class = [ 2 4:11 14 ]; /* these are var numbers in data set */


random = [ 8 9 ];        /* effect numbers in model statement */
< gof,par,ase,ci,cov> = glmixd(sdhous2,model,optn,class,random);

*****************Model Information*****************

Number Valid Observations       1289
Response Variable               Y[2]
N Independend Variables            9
Error Distribution            NORMAL
Link Function                  LOGIT
Effect Distribution           NORMAL
Nominal Response with 3 Categ.
Number Random Effects              2
Number Fixed Effects               8
Quad Points per Dimension         20
Skip Cases with Missing Values
Subject Variable Column            1
Significance Level: 0.0500000
Design Coding: Full-Rank

*************Model Effects*************

Intercept + C4 + C5 + C6 + C7 + C8 + C9 + C10 + C11 + C14

***********************Class Level Information***********************

Class Level Value

Y[ 1]      3    0 1 2
C[ 4]      2    0 1
C[ 5]      2    0 1
C[ 6]      2    0 1
C[ 7]      2    0 1
C[ 8]      2    0 1
C[ 9]      2    0 1
C[10]      2    0 1
C[11]      2    0 1
C[14]      2    0 1
I[ 1]    362    1  2  3  4  5
                6  7  8  9 10
               11 12 13 14 15
               16 17 18 19 20
               ...................................
              351 352 353 354 355
              356 357 358 359 360
              361 362

T[ 4] 2 0 1

*********************Parameter Information*********************

Number | Coefficient | Effect
-------------------------------------------------------------
 1 | Alph_1_1 | Resp 0 vs. 1 Fixed Intercept
 2 | Alph_2_1 | Resp 0 vs. 1 C4_0
 3 | Alph_3_1 | Resp 0 vs. 1 C5_0
 4 | Alph_4_1 | Resp 0 vs. 1 C6_0
 5 | Alph_5_1 | Resp 0 vs. 1 C7_0
 6 | Alph_6_1 | Resp 0 vs. 1 C8_0
 7 | Alph_7_1 | Resp 0 vs. 1 C9_0
 8 | Alph_8_1 | Resp 0 vs. 1 C10_0
 9 | Alph_1_2 | Resp 0 vs. 2 Fixed Intercept
10 | Alph_2_2 | Resp 0 vs. 2 C4_0
11 | Alph_3_2 | Resp 0 vs. 2 C5_0
12 | Alph_4_2 | Resp 0 vs. 2 C6_0
13 | Alph_5_2 | Resp 0 vs. 2 C7_0
14 | Alph_6_2 | Resp 0 vs. 2 C8_0
15 | Alph_7_2 | Resp 0 vs. 2 C9_0
16 | Alph_8_2 | Resp 0 vs. 2 C10_0
-------------------------------------------------------------
Variances for Response Level 1
17 | COV_1_1_1 | C11_0 C11_0
18 | COV_2_2_1 | C14_0 C14_0
-------------------------------------------------------------
Variances for Response Level 2
19 | COV_1_1_2 | C11_0 C11_0
20 | COV_2_2_2 | C14_0 C14_0

-------------------------------------------------------------

Data are sorted in groups of subject ID variable.

***************************************Number of Observations for Class Levels***************************************

Variable   Value    Nobs   Proportion
Y[ 2]          0     294    22.808379
               1     484    37.548487
               2     511    39.643134

I[ 1]          1       4     0.310318
               2       4     0.310318
               3       4     0.310318
           .......................
             360       4     0.310318
             361       2     0.155159
             362       4     0.310318

T[ 4]          0     632    49.030256
               1     657    50.969744

C[ 4]          0     632    49.030256
               1     657    50.969744

C[ 5]          0     967    75.019395
               1     322    24.980605

C[ 6]          0     986    76.493406
               1     303    23.506594

C[ 7]          0     986    76.493406
               1     303    23.506594

C[ 8]          0    1128    87.509697
               1     161    12.490303

C[ 9]          0    1132    87.820016
               1     157    12.179984

C[10]          0    1131    87.742436
               1     158    12.257564

C[11]          0     657    50.969744
               1     632    49.030256

C[14]          0     632    49.030256

*****************************************Crosstabulation of T[4] vs. Response Y[2]*****************************************

            0          1          2      Total
------------------------------------------------
 0        161        305        166        632
     0.254747   0.482595   0.262658

 1        133        179        345        657
     0.202435   0.272451   0.525114

------------------------------------------------
Total     294        484        511       1289

******************************************Data for Subject 1 who has 4 Observations******************************************

Y Vector
--------

1 :   1  2  2  2

X Matrix
--------

1|  0  1
2|  0  1
3|  0  1
4|  0  1

W Matrix
--------

1|  1  1  0  0  0
2|  1  1  1  0  0
3|  1  1  0  1  0
4|  1  1  0  0  1

1|  0  0  0
2|  1  0  0
3|  0  1  0
4|  0  0  1

***************Starting Values***************

Covariates
----------

1|  0  0  0  0  0
2|  0  0  0  0  0

1|  0  0  0
2|  0  0  0

Diag Covariance Parameter Matrix for Y=1
----------------------------------------

1 :    1.814   0.9069

Diag Covariance Parameter Matrix for Y=2
----------------------------------------

1 :    1.995   1.088

Trust Region Optimization
Without Parameter Scaling
User Specified Gradient
User Specified Hessian (dense)

Iteration Start:
N. Variables               20
Criterion       -1405.705027   Max Grad Entry    96.75097727
TR Radius          1.000000000

Iter rest nfun act optcrit difcrit maxgrad lambda radius

1 0 2 0 -1297.718 107.9873 66.9155 33.3182 1.000002 0 3 0 -1186.757 110.9605 45.1086 3.36055 2.757553 0 4 0 -1131.655 55.10201 21.2839 1.19378 2.786784 0 5 0 -1114.440 17.21482 10.7945 0.00000 2.967515 0 6 0 -1111.389 3.051897 8.46259 0.00000 2.863266 0 7 0 -1110.267 1.121825 6.32037 0.00000 0.707277 0 8 0 -1109.716 0.550224 4.46051-0.46987 0.421718 0 9 0 -1109.459 0.257941 3.29085 0.00000 0.709819 0 10 0 -1109.341 0.117389 2.27865 0.00000 0.41835

10 0 11 0 -1109.287 0.054385 1.67808 0.00000 0.2031111 0 12 0 -1109.254 0.033164 1.11302-0.33159 0.1776712 0 13 0 -1109.241 0.013011 0.88514 0.00000 0.1514013 0 14 0 -1109.230 0.010195 0.56322-0.46662 0.1131314 0 15 0 -1109.228 2.7e-003 0.47704 0.00000 0.0975015 0 16 0 -1109.224 3.7e-003 0.33652 0.66068 0.0733016 0 18 0 -1109.217 6.7e-003 0.23828 6.55796 0.0301117 0 19 0 -1109.217 7.5e-004 0.18759 0.00000 0.0301518 0 20 0 -1109.216 3.3e-004 0.13566 0.00000 0.0197819 0 21 0 -1109.216 2.5e-004 0.09193 0.79675 0.0127820 0 22 0 -1109.216 6.1e-005 0.08539 0.00000 0.0120621 0 23 0 -1109.216 1.2e-004 0.07472 0.92720 0.0110622 0 25 0 -1109.216 2.2e-004 0.04022 6.84048 5e-00323 0 26 0 -1109.216 2.6e-005 0.03125 0.00000 5e-00324 0 27 0 -1109.216 1.2e-005 0.02700 0.00000 4e-00325 0 28 0 -1109.216 8.4e-006 0.02329 0.46649 3e-00326 0 29 0 -1109.216 1.5e-006 0.02230 0.00000 3e-00327 0 30 0 -1109.216 1.5e-005 0.01114 6.74649 1e-00328 0 31 0 -1109.216 2.5e-006 8e-003 0.00000 1e-003

Successful Termination After 28 Iterations


GCONV convergence criterion satisfied.
Criterion       -1109.215521   Max Grad Entry     0.007849886
Ridge (lambda)     0.000000000 TR Radius          0.001409232
Act.dF/Pred.dF     0.432393760
N. Function Calls         32   N. Gradient Calls            2
N. Hessian Calls          30   Preproces. Time              1
Time for Method           24   Effective Time              26

*************************************Maximum Marginal Likelihood Estimates*************************************

Total Iterations            28   N Function Calls             33
N Gradient Calls             2   N Hessian Calls              32
Log Likelihood    -1109.215521   Deviance (-2logL)   2218.431043
AIC               -2178.431043   SBC                -2075.198603

--------     ------------   ------------   ------------   ------------
Variable     Estimate       Stand. Error   Z Value        p-Value
--------     ------------   ------------   ------------   ------------

RESPONSE CODE 1 vs CODE 0
----------------------------

Fixed Effects
Alph_1    -0.450315431   0.180332722  -2.497136552   0.012520 (2)
Alph_2     0.531316222   0.262491114   2.024130316   0.042957 (2)
Alph_3     1.889644218   0.314352992   6.011217532   1.8e-009 (2)
Alph_4     2.748375264   0.405122028   6.784067692   1.2e-011 (2)
Alph_5     2.192529235   0.375564441   5.837957473   5.3e-009 (2)
Alph_6    -0.007298538   0.531096240  -0.013742401   0.989035 (2)
Alph_7    -1.778781872   0.584551518  -3.042985634   2.3e-003 (2)
Alph_8    -0.818155787   0.550799514  -1.485396712   0.137439 (2)

Random Effect Var & Cov Terms: Cholesky of Var-Cov Matrix
Mean_1     0.770926528   0.280242943   2.750922180   5.9e-003 (2)
Mean_2     0.960791082   0.268060997   3.584225575   3.4e-004 (2)

RESPONSE CODE 2 vs CODE 0
----------------------------

Fixed Effects
Alph_1    -2.650839188   0.400692568  -6.615643505   3.7e-011 (2)
Alph_2     0.751291575   0.548416939   1.369927733   0.170709 (2)
Alph_3     2.614348166   0.450433039   5.804077274   6.5e-009 (2)
Alph_4     3.999616394   0.519625656   7.697111081   1.4e-014 (2)
Alph_5     4.021035065   0.490664203   8.195085444   2.5e-016 (2)
Alph_6     2.159698205   0.690969869   3.125604028   1.8e-003 (2)
Alph_7     0.710390052   0.730067248   0.973047419   0.330530 (2)
Alph_8     0.459917980   0.700152153   0.656882905   0.511256 (2)

Random Effect Var & Cov Terms: Cholesky of Var-Cov Matrix
Mean_1     2.228009157   0.316565657   7.038063380   1.9e-012 (2)
Mean_2     2.431725672   0.309520474   7.856429156   4.0e-015 (2)

Note: (1) = 1-tailed p-value
      (2) = 2-tailed p-value

Variance-Covariance Matrix of Estimates---------------------------------------

1| 0.032522| -0.03252 0.06893| -0.02988 0.02988 0.098824| -0.03019 0.03019 0.0586 0.16415| -0.03008 0.03008 0.05663 0.06716 0.1416| 0.02988 -0.05252 -0.09882 -0.0586 -0.056637| 0.03019 -0.05385 -0.0586 -0.1641 -0.067168| 0.03008 -0.05357 -0.05663 -0.06716 -0.1419| 0.02446 -0.02446 -0.01429 -0.01409 -0.01561

10| -0.02446 0.05821 0.01429 0.01409 0.0156111| -0.01641 0.01641 0.08133 0.04889 0.0472212| -0.01632 0.01632 0.04583 0.1444 0.0555313| -0.01585 0.01585 0.03947 0.04881 0.113814| 0.01641 -0.0288 -0.08133 -0.04889 -0.0472215| 0.01632 -0.03042 -0.04583 -0.1444 -0.0555316| 0.01585 -0.02929 -0.03947 -0.04881 -0.113817| -0.001425 0.001425 0.04042 0.0553 0.0515418| 4.471e-010 0.008701 -1.051e-010 -2.456e-009 -3.722e-00919| 0.0004577 -0.0004577 0.02457 0.03437 0.0335420| 2.49e-009 0.007374 -5.083e-009 -8.386e-009 -7.534e-009

6| 0.28217| 0.1349 0.34178| 0.1341 0.1382 0.30349| 0.01429 0.01409 0.01561 0.1606

10| -0.02568 -0.02909 -0.02868 -0.1606 0.300811| -0.08133 -0.04889 -0.04722 -0.1018 0.101812| -0.04583 -0.1444 -0.05553 -0.1119 0.111913| -0.03947 -0.04881 -0.1138 -0.1205 0.120514| 0.2317 0.1124 0.1096 0.1018 -0.207315| 0.09939 0.2684 0.1066 0.1119 -0.226816| 0.09832 0.1056 0.2382 0.1205 -0.227917| -0.04042 -0.0553 -0.05154 0.005389 -0.00538918| 0.06001 0.05074 0.05254 2.517e-009 0.0155219| -0.02457 -0.03437 -0.03354 -0.04942 0.0494220| 0.04259 0.03888 0.03795 9.317e-009 -0.0302


11| 0.202912| 0.131 0.2713| 0.1271 0.1427 0.240814| -0.2029 -0.131 -0.1271 0.477415| -0.131 -0.27 -0.1427 0.2932 0.53316| -0.1271 -0.1427 -0.2408 0.2857 0.301617| 0.04402 0.05433 0.04202 -0.04402 -0.0543318| -2.26e-009 -5.394e-009 -6.568e-009 0.0464 0.0296119| 0.04662 0.06483 0.06636 -0.04662 -0.0648320| -1.106e-008 -1.598e-008 -1.479e-008 0.08372 0.08083

16| 0.490217| -0.04202 0.0785418| 0.03706 -1.268e-009 0.0718619| -0.06636 0.04962 -1.413e-009 0.100220| 0.07767 -3.912e-009 0.049 -4.842e-009 0.0958

Correlation of Maximum Marginal Likelihood Estimates----------------------------------------------------

1| 12| -0.687 13| -0.527 0.3621 14| -0.4133 0.2839 0.4601 15| -0.4442 0.3052 0.4797 0.4414 16| 0.3119 -0.3768 -0.5919 -0.2723 -0.28397| 0.2864 -0.351 -0.3189 -0.693 -0.30598| 0.3029 -0.3705 -0.3271 -0.301 -0.68199| 0.3385 -0.2326 -0.1134 -0.08682 -0.1037

10| -0.2473 0.4044 0.08287 0.06343 0.0757911| -0.202 0.1388 0.5744 0.2679 0.279212| -0.1742 0.1196 0.2806 0.6859 0.284513| -0.1792 0.1231 0.2559 0.2456 0.617414| 0.1317 -0.1588 -0.3744 -0.1746 -0.18215| 0.124 -0.1587 -0.1997 -0.4882 -0.202516| 0.1255 -0.1594 -0.1793 -0.1721 -0.432717| -0.02819 0.01937 0.4588 0.4871 0.489718| 9.249e-009 0.1237 -1.247e-009 -2.261e-008 -3.697e-00819| 0.008017 -0.005508 0.2469 0.268 0.282120| 4.462e-008 0.09076 -5.225e-008 -6.688e-008 -6.481e-008

6| 17| 0.4346 18| 0.4585 0.4291 19| 0.06713 0.06017 0.07073 1

10| -0.08816 -0.09074 -0.09494 -0.7306 111| -0.34 -0.1857 -0.1903 -0.5641 0.4121


12| -0.1661 -0.4754 -0.194 -0.5373 0.392513| -0.1515 -0.1702 -0.421 -0.6131 0.447914| 0.6315 0.2782 0.288 0.3677 -0.547115| 0.2563 0.6289 0.265 0.3824 -0.566316| 0.2644 0.2581 0.6177 0.4296 -0.593617| -0.2715 -0.3376 -0.3339 0.04799 -0.0350718| 0.4215 0.3238 0.3559 2.344e-008 0.105619| -0.1462 -0.1857 -0.1923 -0.3896 0.284720| 0.2591 0.2149 0.2226 7.512e-008 -0.1779

11| 112| 0.5598 113| 0.575 0.5599 114| -0.6519 -0.3649 -0.3748 115| -0.3985 -0.7118 -0.3985 0.5812 116| -0.4029 -0.3923 -0.7008 0.5906 0.5917| 0.3487 0.3731 0.3056 -0.2273 -0.265618| -1.871e-008 -3.873e-008 -4.994e-008 0.2505 0.151319| 0.327 0.3941 0.4272 -0.2131 -0.280520| -7.932e-008 -9.937e-008 -9.738e-008 0.3915 0.3577

16| 117| -0.2142 118| 0.1975 -1.687e-008 119| -0.2994 0.5593 -1.665e-008 120| 0.3584 -4.51e-008 0.5906 -4.941e-008 1

**********************Wald Confidence Limits**********************

Parameter     Estimate      LowWaldCL      UppWaldCL
Alph_1_1    -0.4503154    -0.80376107    -0.09686979
Alph_2_1     0.5313162     0.01684309     1.04578935
Alph_3_1     1.8896442     1.27352367     2.50576476
Alph_4_1     2.7483753     1.95435068     3.54239985
Alph_5_1     2.1925292     1.45643646     2.92862201
Alph_6_1    -0.0072985    -1.04822804     1.03363097
Alph_7_1    -1.7787819    -2.92448180    -0.63308195
Alph_8_1    -0.8181558    -1.89770300     0.26139142
Alph_1_2    -2.6508392    -3.43618219    -1.86549618
Alph_2_2     0.7512916    -0.32358588     1.82616902
Alph_3_2     2.6143482     1.73151563     3.49718070
Alph_4_2     3.9996164     2.98116882     5.01806397
Alph_5_2     4.0210351     3.05935090     4.98271923
Alph_6_2     2.1596982     0.80542215     3.51397426
Alph_7_2     0.7103901    -0.72051546     2.14129556
Alph_8_2     0.4599180    -0.91235502     1.83219098
COV_1_1_1    0.7709265     0.22166045     1.32019260
COV_2_2_1    0.9607911     0.43540118     1.48618098
COV_1_1_2    2.2280092     1.60755187     2.84846644
COV_2_2_2    2.4317257     1.82507669     3.03837465

*************************************Maximum Marginal Likelihood Estimates*************************************

[ 1]: Alph_1_1     -0.4503154
[ 2]: Alph_2_1      0.5313162
[ 3]: Alph_3_1      1.8896442
[ 4]: Alph_4_1      2.7483753
[ 5]: Alph_5_1      2.1925292
[ 6]: Alph_6_1     -0.0072985
[ 7]: Alph_7_1     -1.7787819
[ 8]: Alph_8_1     -0.8181558
[ 9]: Alph_1_2     -2.6508392
[10]: Alph_2_2      0.7512916
[11]: Alph_3_2      2.6143482
[12]: Alph_4_2      3.9996164
[13]: Alph_5_2      4.0210351
[14]: Alph_6_2      2.1596982
[15]: Alph_7_2      0.7103901
[16]: Alph_8_2      0.4599180
[17]: COV_1_1_1     0.7709265
[18]: COV_2_2_1     0.9607911
[19]: COV_1_1_2     2.2280092
[20]: COV_2_2_2     2.4317257

******************************************Empirical Bayes Estimates of Random Effect******************************************

N ID_Var Nobs P_Mean P_StdDev

  1     1    4    0.652430286   0.633427134
  2     2    4    0.111514214   0.580863943
  3     3    4    0.288639364   0.597830669
...........................................
360   360    4   -0.566834685   0.658898734
361   361    2   -0.883025712   0.833461576
362   362    4   -0.247644747   0.628687767

Since the response variable has 3 levels and we wish to estimate separate parameters for levels 1 vs. 0 and 2 vs. 0 by setting vcat, there will be 2 sets of parameters (fixed covariates, random means, and random covariance matrices):


options ls=68 ps=2000;
#include "sdhouse.dat"
optn = [ "print"  34,
         "nomis"  ,
         "link"   "logit",
         "efdis"  "normal",
         "yscale" "n",
         "idvar"  1,
         "ctvar"  12,
         "cl"     "wald",
         "vcat"   ,
         "nquad"  20,
         "tech"   "trureg" ];

This example uses 2 random effects and specifies 6 contrasts among the fixed effects (covariates):

model = "2 = 4 12 13";class = [ 2 4 ]; /* these are var numbers in data set */random = [ 0 2 ]; /* effect numbers in model statement *//* order or parms: 4 alfa, 2 mu, 2 covmat(2 by 2) */const = [ 1 1 0 0 0 0 0 0 0 0 0 0 0 0,

0 0 1 1 0 0 0 0 0 0 0 0 0 0,1 2 0 0 0 0 0 0 0 0 0 0 0 0,0 0 1 2 0 0 0 0 0 0 0 0 0 0,1 3 0 0 0 0 0 0 0 0 0 0 0 0,0 0 1 3 0 0 0 0 0 0 0 0 0 0 ];

< gof,par,ase,ci,cov > = glmixd(sdhous,model,optn,class,random,const);

Since there are 2 random effects, the numerical quadrature must be done in two dimensions using a grid of 100 points (10 in each dimension, specified by optn[17]=10). Since the quadrature must be done during each function call, the computer time needed for the optimization process increases compared with that needed for only one random effect.
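The size of the grid grows exponentially with the number of random effects: with n_q quadrature points per dimension and r random effects the marginal likelihood is summed over

    n_q^r  =  10^2  =  100

nodes per subject and function call, whereas a one-dimensional quadrature with the same n_q needs only 10 nodes.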

*****************Model Information*****************

Number Valid Observations       1289
Response Variable               Y[2]
N Independend Variables            3
Error Distribution            NORMAL
Link Function                  LOGIT
Effect Distribution           NORMAL
Nominal Response with 3 Categ.
Number Random Effects              2
Number Fixed Effects               2
Quad Points per Dimension         10
Skip Cases with Missing Values
Subject Variable Column            1
Significance Level: 0.0500000
Design Coding: Full-Rank

*************Model Effects*************

Intercept + C4 + X12 + X13

***********************Class Level Information***********************

Class Level Value

Y[ 1]      3    0 1 2
C[ 4]      2    0 1
I[ 1]    362    1  2  3  4  5
                6  7  8  9 10
               11 12 13 14 15
               ...................................
              351 352 353 354 355
              356 357 358 359 360
              361 362

T[12] 4 0 1 2 3

*****************Simple Statistics*****************

Column    Nobs        Mean     Std Dev    Skewness    Kurtosis
X[12]     1448   1.5000000   1.1184203   0.0000000  -1.3605527
X[13]     1448   0.7500000   1.0901012   1.0878333  -0.3683209

*********************Parameter Information*********************

Number | Coefficient | Effect
-------------------------------------------------------------
 1 | Alph_1_1 | Resp 0 vs. 1 C4_0
 2 | Alph_2_1 | Resp 0 vs. 1 X13
 3 | Alph_1_2 | Resp 0 vs. 2 C4_0
 4 | Alph_2_2 | Resp 0 vs. 2 X13
 5 | Mean_1_1 | Resp 0 vs. 1 Random Intercept


 6 | Mean_2_1 | Resp 0 vs. 1 X12
 7 | Mean_1_2 | Resp 0 vs. 2 Random Intercept
 8 | Mean_2_2 | Resp 0 vs. 2 X12
-------------------------------------------------------------
Covariances for Response Level 1
 9 | COV_1_1_1 | Random Intercept Random Intercept
10 | COV_1_2_1 | Random Intercept X12
11 | COV_2_2_1 | X12 X12
-------------------------------------------------------------
Covariances for Response Level 2
12 | COV_1_1_2 | Random Intercept Random Intercept
13 | COV_1_2_2 | Random Intercept X12
14 | COV_2_2_2 | X12 X12

-------------------------------------------------------------

Data are sorted in groups of subject ID variable.

***************************************Number of Observations for Class Levels***************************************

Variable   Value    Nobs   Proportion
Y[ 2]          0     294    22.808379
               1     484    37.548487
               2     511    39.643134

I[ 1]          1       4     0.310318
               2       4     0.310318
               3       4     0.310318
           .......................
             360       4     0.310318
             361       2     0.155159
             362       4     0.310318

T[12]          0     361    28.006206
               1     322    24.980605
               2     303    23.506594
               3     303    23.506594

C[ 4]          0     632    49.030256
               1     657    50.969744

******************************************Crosstabulation of T[12] vs. Response Y[2]******************************************

            0          1          2      Total
------------------------------------------------
 0        180        136         45        361
     0.498615   0.376731   0.124654

 1         45        138        139        322
     0.139752   0.428571   0.431677

 2         32        108        163        303
     0.105611   0.356436   0.537954

 3         37        102        164        303
     0.122112   0.336634   0.541254

------------------------------------------------
Total     294        484        511       1289

******************************************Data for Subject 1 who has 4 Observations******************************************

Y Vector
--------

1 :   1  2  2  2

X Matrix
--------

1|  1  0
2|  1  1
3|  1  2
4|  1  3

W Matrix
--------

1|  1  0
2|  1  1
3|  1  2
4|  1  3

***************Starting Values***************

Covariates
----------

1|  -0.3629  -0.02211
2|  -0.3991  -0.02433

Means
-----

1|  -2.607   -0.355
2|  -0.968    0

Covariance Parameter Matrix for Y=1
-----------------------------------

1|   1.814
2|   0        0.9069

Covariance Parameter Matrix for Y=2
-----------------------------------

1|   1.995
2|   0        1.088

Trust Region Optimization
Without Parameter Scaling
User Specified Gradient
User Specified Hessian (dense)

Iteration Start:
N. Variables               14
Criterion       -2080.874221   Max Grad Entry    587.2391981
TR Radius          1.000000000

Iter rest nfun act optcrit difcrit maxgrad lambda radius

1 0 2 0 -1802.919 277.9557 555.198 18.3251 1.000002 0 3 0 -1525.733 277.1853 498.468 0.00000 3.749663 0 4 0 -1373.228 152.5053 421.984 0.00000 9.021974 0 5 0 -1265.240 107.9879 328.878 0.00000 5.278195 0 6 0 -1191.263 73.97735 218.534 0.00000 6.155716 0 7 0 -1128.009 63.25418 79.0500 0.00000 3.998457 0 8 0 -1101.049 26.95981 27.1235 0.00000 3.937268 0 9 0 -1094.935 6.113478 6.30824 0.00000 1.279479 0 10 0 -1094.165 0.770494 2.64173 1.06019 0.56009

10 0 12 0 -1093.927 0.237737 0.67726 2.15223 0.2291411 0 13 0 -1093.916 0.011406 1.04615 1.31678 0.2459812 0 14 0 -1093.866 0.049892 0.37932 3.37100 0.1234013 0 16 0 -1093.862 3.8e-003 0.20643 2.19802 0.0374414 0 18 0 -1093.861 1.4e-003 0.20594 3.26697 0.0198415 0 19 0 -1093.860 2.1e-004 0.12267 0.86089 0.0207816 0 20 0 -1093.860 3.9e-005 0.12338 1.44258 0.0224517 0 21 0 -1093.860 4.5e-004 0.08590 4.77684 0.0112718 0 22 0 -1093.860 4.8e-005 0.07471 0.00000 0.0113619 0 23 0 -1093.860 4.7e-006 0.05427 0.96955 4e-00320 0 24 0 -1093.860 3.0e-005 0.03362 5.46950 2e-00321 0 25 0 -1093.860 4.1e-006 0.02352 0.00000 2e-003


22 0 26 0 -1093.860 1.7e-006 0.01693 1.65083 9e-004

Successful Termination After 22 Iterations
GCONV convergence criterion satisfied.
Criterion       -1093.859823   Max Grad Entry     0.016929996
Ridge (lambda)     1.650831471 TR Radius          0.000862830
Act.dF/Pred.dF     0.231500503
N. Function Calls         27   N. Gradient Calls            2
N. Hessian Calls          24   Preproces. Time              5
Time for Method          137   Effective Time             148

********************Optimization Results********************

Parameter Estimates
-------------------

Parameter Estimate Gradient

 1   X1      0.33477527    0.0016544
 2   X2     -0.47548057   -0.0023392
 3   X3      1.11420912   -0.0025848
 4   X4      0.59326953   -0.0016815
 5   X5     -0.38329711    6.48e-004
 6   X6      2.38167019   -0.0095643
 7   X7     -1.78229020   -0.0027793
 8   X8      2.54554267   -0.0022068
 9   X9      0.28581704    0.0031852
10   X10     1.58886200    0.0169300
11   X11    -0.26009339   -0.0035193
12   X12     0.32257032    0.0013926
13   X13     2.23934924    2.87e-004
14   X14     0.90659539    5.62e-004

Value of Objective Function = -1093.86

*************************************Maximum Marginal Likelihood Estimates*************************************

Total Iterations          22   N Function Calls           28
N Gradient Calls           2   N Hessian Calls            26
Log Likelihood  -1093.859823   Deviance (-2logL) 2187.719647
AIC             -2159.719647   SBC              -2087.456939
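From the printed values, the information criteria appear to be reported on the log-likelihood scale, so that larger values indicate a better fit; this reading is an inference from the numbers, not a documented definition. With p = 14 parameters and N = 1289 observations,

\[
  \mathrm{AIC} = 2\log L + 2p = 2(-1093.8598) + 28 = -2159.72 , \qquad
  \mathrm{SBC} = 2\log L + p\,\ln N = 2(-1093.8598) + 14\,\ln 1289 \approx -2087.46 ,
\]

which reproduces the output above.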

--------  ------------  ------------  ------------  ------------
Variable      Estimate  Stand. Error       Z Value       p-Value


-------- ------------ ------------ ------------ ------------

RESPONSE CODE 1 vs CODE 0
----------------------------

Fixed Effects
Alph_1    0.334775272   0.220753275    1.516513280   0.129390 (2)
Alph_2   -0.475480567   0.287286947   -1.655071946   0.097910 (2)

Random Effects
Mean_1   -0.383297110   0.153441091   -2.498008241   0.012489 (2)
Mean_2    2.381670186   0.420154909    5.668552559   1.4e-008 (2)

Random Effect Var & Cov Terms: Cholesky of Var-Cov Matrix
Mean_1    0.285817041   0.162347132    1.760530274   0.039159 (1)
covar.    1.588862000   0.283944879    5.595670557   2.2e-008 (2)
Mean_2   -0.260093390   0.291184586   -0.893225131   0.185868 (1)

RESPONSE CODE 2 vs CODE 0
----------------------------

Fixed Effects
Alph_1    1.114209122   0.296375613    3.759449404   1.7e-004 (2)
Alph_2    0.593269532   0.368057588    1.611893224   0.106985 (2)

Random Effects
Mean_1   -1.782290203   0.233606580   -7.629452069   2.4e-014 (2)
Mean_2    2.545542670   0.474876879    5.360426635   8.3e-008 (2)

Random Effect Var & Cov Terms: Cholesky of Var-Cov Matrix
Mean_1    0.322570324   0.218452215    1.476617323   0.069889 (1)
covar.    2.239349244   0.332747538    6.729874711   1.7e-011 (2)
Mean_2    0.906595389   0.386744213    2.344173121   9.5e-003 (1)

Note: (1) = 1-tailed p-value
      (2) = 2-tailed p-value

*****************************************Random Effects Variance-Covariance Matrix*****************************************

1  Mean_1  var = ( 0.29 * 0.29 ) = 0.08169
            cov = ( 0.29 * 1.59 ) = 0.45412

   Mean_2  var = ( 1.59 * 1.59 ) + ( -0.26 * -0.26 ) = 2.59213
   Covariance expressed as a correlation = 0.986865

2  Mean_1  var = ( 0.32 * 0.32 ) = 0.10405
            cov = ( 0.32 * 2.24 ) = 0.72235

Mean_2 var = ( 2.24 * 2.24 ) + ( 0.91 * 0.91 ) = 5.83660


Covariance expressed as a correlation = 0.926919
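As the header "Cholesky of Var-Cov Matrix" above indicates, the COV_* estimates are the elements of the lower-triangular Cholesky factor L of each random-effects covariance matrix, so the variances and covariances printed here follow from Sigma = L L^T. A worked check for response code 1, using the estimates from the table above:

\[
  L = \begin{pmatrix} 0.2858 & 0 \\ 1.5889 & -0.2601 \end{pmatrix}, \qquad
  \Sigma = L L^T =
  \begin{pmatrix} 0.2858^2 & 0.2858 \cdot 1.5889 \\ 0.2858 \cdot 1.5889 & 1.5889^2 + 0.2601^2 \end{pmatrix}
  \approx \begin{pmatrix} 0.0817 & 0.4541 \\ 0.4541 & 2.5921 \end{pmatrix},
\]
\[
  \rho = \frac{0.4541}{\sqrt{0.0817 \cdot 2.5921}} \approx 0.9869 ,
\]

matching the printed variance, covariance, and correlation.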

Variance-Covariance Matrix of Estimates---------------------------------------

1| 0.048732| -0.02162 0.082533| 0.02851 -0.01189 0.087844| -0.009309 0.07446 -0.03942 0.13555| -0.02285 0.009589 -0.01344 0.004814 0.023546| 0.01268 -0.02882 0.008792 -0.03361 -0.017257| -0.01382 0.005669 -0.05327 0.02271 0.013988| 0.007297 -0.02872 0.02693 -0.06773 -0.012219| 0.00108 0.000109 -0.0007555 0.002414 0.001876

10| 0.001533 0.003672 0.00224 -0.0009207 -0.00601111| -0.0003057 0.01052 -0.0007071 0.01247 0.000769812| 0.002044 -0.001803 0.007015 -0.006449 0.00269913| 0.0008218 0.01279 -0.001508 0.01962 -0.00601914| 0.0003883 0.004576 -0.002072 0.0247 0.001871

6| 0.17657| -0.01436 0.054578| 0.1847 -0.03252 0.22559| 0.001079 0.004871 -0.0009164 0.02636

10| 0.1033 -0.007842 0.1104 -0.009411 0.0806211| -0.02197 0.002671 -0.03229 0.006786 -0.014912| 0.005487 -0.004873 0.009515 0.01037 -0.000330313| 0.1019 -0.006093 0.1045 -0.005022 0.073414| -0.03134 0.002913 -0.04875 0.01084 -0.01974

11| 0.0847912| -0.001141 0.0477213| -0.02542 -0.01878 0.110714| 0.1005 -0.004261 -0.03166 0.1496

Correlation of Maximum Marginal Likelihood Estimates----------------------------------------------------

1| 12| -0.3409 13| 0.4358 -0.1396 14| -0.1146 0.7042 -0.3614 15| -0.6747 0.2175 -0.2956 0.08524 16| 0.1367 -0.2387 0.0706 -0.2174 -0.26757| -0.268 0.08447 -0.7694 0.2641 0.38998| 0.06961 -0.2105 0.1914 -0.3875 -0.1676


9| 0.03013 0.002338 -0.0157 0.0404 0.075310| 0.02445 0.04502 0.02661 -0.00881 -0.13811| -0.004755 0.1258 -0.008193 0.1164 0.0172312| 0.04239 -0.02873 0.1083 -0.08021 0.0805113| 0.01119 0.1338 -0.01529 0.1602 -0.117914| 0.004548 0.04118 -0.01807 0.1735 0.03154

6| 17| -0.1463 18| 0.9258 -0.2932 19| 0.01581 0.1284 -0.01189 1

10| 0.8657 -0.1182 0.8188 -0.2041 111| -0.1796 0.03926 -0.2335 0.1435 -0.180212| 0.05978 -0.0955 0.09173 0.2923 -0.00532513| 0.7291 -0.07838 0.6612 -0.09297 0.776914| -0.1929 0.03225 -0.2654 0.1726 -0.1798

11| 112| -0.01793 113| -0.2624 -0.2583 114| 0.8923 -0.05043 -0.246 1

**********************Wald Confidence Limits**********************

Parameter     Estimate     LowWaldCL     UppWaldCL
Alph_1_1     0.3347753   -0.09789320    0.76744374
Alph_2_1    -0.4754806   -1.03855264    0.08759150
Alph_1_2     1.1142091    0.53332359    1.69509465
Alph_2_2     0.5932695   -0.12811008    1.31464915
Mean_1_1    -0.3832971   -0.68403612   -0.08255810
Mean_2_1     2.3816702    1.55818170    3.20515868
Mean_1_2    -1.7822902   -2.24015069   -1.32442972
Mean_2_2     2.5455427    1.61480109    3.47628425
COV_1_1_1    0.2858170   -0.03237749    0.60401157
COV_1_2_1    1.5888620    1.03234026    2.14538374
COV_2_2_1   -0.2600934   -0.83080469    0.31061791
COV_1_1_2    0.3225703   -0.10558815    0.75072880
COV_1_2_2    2.2393492    1.58717605    2.89152244
COV_2_2_2    0.9065954    0.14859066    1.66460012
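These are the usual two-sided normal-theory (Wald) intervals at the 0.05 level, computed from the estimates and standard errors above. For the first parameter,

\[
  \hat\theta \pm z_{0.975}\,\mathrm{SE}(\hat\theta) :
  \quad 0.334775 \pm 1.95996 \times 0.220753 = (-0.09789,\; 0.76744) ,
\]

which reproduces the first row of the table.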

*************************************Maximum Marginal Likelihood Estimates*************************************


[ 1]: Alph_1_1     0.3347753
[ 2]: Alph_2_1    -0.4754806
[ 3]: Alph_1_2     1.1142091
[ 4]: Alph_2_2     0.5932695
[ 5]: Mean_1_1    -0.3832971
[ 6]: Mean_2_1     2.3816702
[ 7]: Mean_1_2    -1.7822902
[ 8]: Mean_2_2     2.5455427
[ 9]: COV_1_1_1    0.2858170
[10]: COV_1_2_1    1.5888620
[11]: COV_2_2_1   -0.2600934
[12]: COV_1_1_2    0.3225703
[13]: COV_1_2_2    2.2393492
[14]: COV_2_2_2    0.9065954

Transform Matrix****************

Matrix: Sparse Storage

1 |          1          2
     1.0000000  1.0000000

2 |          3          4
     1.0000000  1.0000000

3 |          1          2
     1.0000000  2.0000000

4 |          3          4
     1.0000000  2.0000000

5 |          1          2
     1.0000000  3.0000000

6 |          3          4
     1.0000000  3.0000000

*********************************Transforms of Parameter Estimates*********************************

---------  ------------  ------------  ------------  ------------
Transform      Estimate  Stand. Error       Z Value       p-Value
---------  ------------  ------------  ------------  ------------

1   -0.140705295   0.296684760   -0.474258586   0.635315520
2    1.707478654   0.380076398    4.492461680   7.0405e-006


3   -0.616185862   0.540721061   -1.139563273   0.254468294
4    2.300748185   0.687030136    3.348831534   0.000811531
5   -1.091666430   0.813513531   -1.341915516   0.179623390
6    2.894017717   1.034647536    2.797104923   0.005156279

Note: p-values are 2-tailed
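Reading each sparse row of the transform matrix as a list of column indices followed by the corresponding coefficients, the six transforms are linear combinations L of the parameter estimates, with estimates given by L times the estimate vector and standard errors taken from the diagonal of L Sigma L^T, where Sigma is the variance-covariance matrix of the estimates printed above. A worked check for the first transform (parameters 1 and 2 with coefficients 1 and 1):

\[
  \hat t_1 = \hat\alpha_{1,1} + \hat\alpha_{2,1} = 0.334775 - 0.475481 = -0.140705 ,
\]
\[
  \operatorname{Var}(\hat t_1) = \Sigma_{11} + 2\Sigma_{12} + \Sigma_{22}
   = 0.04873 - 2(0.02162) + 0.08253 = 0.08802 , \qquad
  \operatorname{SE}(\hat t_1) = \sqrt{0.08802} \approx 0.2967 ,
\]

matching the first row of the transform table.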

Correlation of MML Transformed Estimates----------------------------------------

1|        1
2|   0.7252        1
3|   0.9284   0.7024        1
4|   0.7208    0.921   0.7644        1
5|   0.8694   0.6692   0.9908   0.7532        1
6|   0.6909   0.8558   0.7571   0.9897   0.7545

6| 1

3. Example from the MIXOR manual by D. Hedeker: NIMH Schiz Data - Two Groups - Seven Timepoints; IMPS79 (ordinal) across SQRT Week - one random effect; 3059 observations (some with missing Y).

/* NIMH Schiz Data - Two Groups - Seven Timepoints
   IMPS79 (ordinal) across SQRT Week - one random effect:
   Missing values are coded with -9
   [1] : subject ID var for cross tabulation
   [2] : Imps79 score
   [3] : dichotomous Imps79 score
   [4] : ordinal Imps79 score: Y Variable
   [5] : intercept
   [6] : treatment group
   [7] : week
   [8] : sqrt of week
   [9] : [6] * [8]                              */

schizo = [ 1103  5.500  1  4  1  1  0  0.0000  0.0000
           1103  3.000  0  2  1  1  1  1.0000  1.0000
           1103 -9.000 -9 -9  1  1  2  1.4142  1.4142
           ................................................
           9316 -9.000 -9 -9  1  0  5  2.2361  0.0000
           9316  6.000  1  4  1  0  6  2.4495  0.0000 ];

schizo0 = replace(schizo,-9.,.);
schizo1 = shape(schizo0,.,9);

The first model treats the intercept as a random effect:


#include "schizx1.dat"optn = [ "print" 4,

"nomis" ,"link" "probit","efdis" "normal","yscale" "o","idvar" 1,"ctvar" 7,"nquad" 20,"cl" "wald","tech" "trureg" ];

model = "4 = 6 8 9";class = [ 1 4 6 7 ];random = 0;< gof,par,ase,ci,cov > = glmixd(schizo1,model,optn,class,random);

*****************Model Information*****************

Number Valid Observations    1603
Response Variable            Y[4]
N Independend Variables         3
Error Distribution         NORMAL
Link Function              PROBIT
Effect Distribution        NORMAL
Ordinal Response with 4 Categ.
Number Random Effects           1
Number Fixed Effects            3
Quad Points per Dimension      20
Skip Cases with Missing Values
Subject Variable Column         1
Significance Level: 0.0500000
Design Coding: Full-Rank

Covariates and Random Effects Mean Subtracted from Thresholds
Positive Value: Positive Association bw. Regressor and Response

*************Model Effects*************

Intercept + C6 + X8 + X9

***********************Class Level Information***********************


Class Level Value

Y[1] 4 1 2 3 4C[6] 2 0 1I[1] 437 1103 1104 1105 1106 1107

1108 1109 1110 1111 11121113 1114 1115 1118 1119....................................9310 9311 9312 9313 93149315 9316

T[7] 7 0 1 2 3 45 6

*****************Simple Statistics*****************

Column   Nobs        Mean     Std Dev     Skewness     Kurtosis
X[8]     3059   1.5474143   0.7783017   -0.8225575   -0.3674431
X[9]     3059   1.1649870   0.9495875   -0.1371005   -1.5833737

*********************Parameter Information*********************

Number  |  Coefficient  |  Effect
-------------------------------------------------------------

    1   |  Gamm_1       |  Threshold1
    2   |  Gamm_2       |  Threshold2
    3   |  Alph_1       |  C6_0
    4   |  Alph_2       |  X8
    5   |  Alph_3       |  X9
    6   |  Mean_1       |  Random Intercept
    7   |  COV_1_1      |  Random Intercept  Random Intercept

-------------------------------------------------------------

Data are sorted in groups of subject ID variable.

***************************************Number of Observations for Class Levels***************************************

Variable Value Nobs ProportionY[4] 1 190 11.852776

2 474 29.5695573 412 25.701809


4 527 32.875858

I[1] 1103 4 0.2495321104 4 0.2495321105 3 0.187149........................9314 4 0.2495329315 4 0.2495329316 4 0.249532

T[7] 0 434 27.0742361 426 26.5751722 14 0.8733623 374 23.3312544 11 0.6862135 9 0.5614476 335 20.898316

C[6] 0 378 23.5807861 1225 76.419214

*****************************************Crosstabulation of T[7] vs. Response Y[4]*****************************************

1 2 3 4 Total----------------------------------------------------------

0 1 54 122 257 4340.002304 0.124424 0.281106 0.592166

1 23 135 124 144 4260.053991 0.316901 0.291080 0.338028

2 3 4 2 5 140.214286 0.285714 0.142857 0.357143

3 54 132 113 75 3740.144385 0.352941 0.302139 0.200535

4 5 3 2 1 110.454545 0.272727 0.181818 0.090909

5 3 4 0 2 90.333333 0.444444 0.000000 0.222222

6 101 142 49 43 3350.301493 0.423881 0.146269 0.128358

----------------------------------------------------------Total 190 474 412 527 1603

*********************************************Data for Subject 1103 who has 4 Observations*********************************************


Y Vector--------

1 : 3 1 1 1

X Vector--------

1 : 1 1 1 1

W Matrix--------

1|   1   0       0
2|   1   1       1
3|   1   1.732   1.732
4|   1   2.45    2.45

***************Starting Values***************

Category Thresholds-------------------

1 : 1.037 1.7

Covariates----------

1 : 0.02841 0.2642 0.3449

Means-----

1 : 0.8354

Covariance Parameter Matrix---------------------------

1| 0.3162

*************************************Maximum Marginal Likelihood Estimates*************************************

Total Iterations    15   N Function Calls    18
N Gradient Calls     4   N Hessian Calls     19


Log Likelihood  -1699.739475   Deviance (-2logL)  3399.478950
AIC             -3385.478950   SBC               -3347.821525

--------  ------------  ------------  ------------  ------------
Variable      Estimate  Stand. Error       Z Value       p-Value
--------  ------------  ------------  ------------  ------------

Thresholds
1    0.000000000
2    1.729318581   0.074717088   23.14488720   1.6e-118 (1)
3    2.939688904   0.095460595   30.79478935   3.1e-208 (1)

Fixed Effects
Alph_1   -0.051255113   0.179121593   -0.286147039   0.774766 (2)
Alph_2   -0.459113850   0.074706919   -6.145533226   8.0e-010 (2)
Alph_3   -0.672200147   0.086110426   -7.806257360   5.9e-015 (2)

Random Effects
Mean_1    3.366089474   0.183953951   18.29854404    8.5e-075 (2)

Random Effect Variance Term: Expressed as Standard Deviation
Mean_1    1.107704076   0.065757680   16.84524260    5.7e-064 (1)

Note: (1) = 1-tailed p-value
      (2) = 2-tailed p-value

*******************************
Intracluster Correlation
Residual Variance = 1 (assumed)
*******************************

1  cluster variance  = ( 1.10770 * 1.10770 ) = 1.22701
   intracluster corr = 1.22701 / ( 1.22701 + 1.0 ) = 0.55097
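With the probit link the level-1 residual variance is fixed at 1, so the intracluster (intraclass) correlation printed above is

\[
  \rho = \frac{\sigma_\upsilon^2}{\sigma_\upsilon^2 + 1}
       = \frac{1.10770^2}{1.10770^2 + 1}
       = \frac{1.22701}{2.22701} \approx 0.55097 ,
\]

where the cluster standard deviation 1.10770 is the Mean_1 variance term reported above.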

Variance-Covariance Matrix of Estimates---------------------------------------

1|    0.005583
2|    0.005755    0.009113
3|     0.00012  3.014e-006    0.03208
4|  -0.0005741   -0.001009   0.006694   0.005581
5|   -0.001064   -0.001659  -0.008755  -0.005272   0.007415
6|    0.006229    0.009075   -0.02444    -0.0077   0.004944
7|    0.001792    0.003151 -0.0002729 -0.0004552 -0.0008492

6|     0.03384
7|    0.004032    0.004324

Correlation of Maximum Marginal Likelihood Estimates


----------------------------------------------------

1|         1
2|    0.8068         1
3|  0.008967 0.0001763         1
4|   -0.1028   -0.1415    0.5002         1
5|   -0.1654   -0.2019   -0.5676   -0.8196         1
6|    0.4532    0.5168   -0.7417   -0.5603    0.3121
7|    0.3648     0.502  -0.02317  -0.09267     -0.15

6|         1
7|    0.3334         1

**********************Wald Confidence Limits**********************

Parameter    Estimate     LowWaldCL     UppWaldCL
Gamm_1      1.7293186    1.58287578    1.87576138
Gamm_2      2.9396889    2.75258958    3.12678823
Alph_1     -0.0512551   -0.40232698    0.29981676
Alph_2     -0.4591139   -0.60553672   -0.31269098
Alph_3     -0.6722001   -0.84097348   -0.50342681
Mean_1      3.3660895    3.00554636    3.72663259
COV_1_1     1.1077041    0.97882139    1.23658676

*************************************Maximum Marginal Likelihood Estimates*************************************

[1]: Gamm_1     1.7293186
[2]: Gamm_2     2.9396889
[3]: Alph_1    -0.0512551
[4]: Alph_2    -0.4591139
[5]: Alph_3    -0.6722001
[6]: Mean_1     3.3660895
[7]: COV_1_1    1.1077041

******************************************Empirical Bayes Estimates of Random Effect******************************************

N ID_Var Nobs P_Mean P_StdDev

1   1103   4   -0.138022204   0.465135568
2   1104   4   -0.477353047   0.460896551
3   1105   3   -1.394397634   0.523649832


...........................................
436   9315   4    1.449327551   0.521356396
437   9316   4    1.282741684   0.642068392

The second model treats the intercept and the square root of week as random:

optn = [ "print" 4,"nomis" ,"link" "probit","efdis" "normal","yscale" "o","idvar" 1,"ctvar" 7,"nquad" 10,"cl" "wald","add2dia" 1.,"tech" "trureg" ];

model = "4 = 6 8 9";class = [ 1 4 6 7 ];random = [ 0 2 ];< gof,par,ase,ci,cov > = glmixd(schizo1,model,optn,class,random);

*****************Model Information*****************

Number Valid Observations    1603
Response Variable            Y[4]
N Independend Variables         3
Error Distribution         NORMAL
Link Function              PROBIT
Effect Distribution        NORMAL
Ordinal Response with 4 Categ.
Number Random Effects           2
Number Fixed Effects            2
Quad Points per Dimension      10
Skip Cases with Missing Values
Subject Variable Column         1
Significance Level: 0.0500000
Design Coding: Full-Rank

Covariates and Random Effects Mean Subtracted from Thresholds
Positive Value: Positive Association bw. Regressor and Response

*************Model Effects*************


Intercept + C6 + X8 + X9

***********************Class Level Information***********************

Class Level Value

Y[1] 4 1 2 3 4C[6] 2 0 1I[1] 437 1103 1104 1105 1106 1107

1108 1109 1110 1111 11121113 1114 1115 1118 1119....................................9310 9311 9312 9313 93149315 9316

T[7] 7 0 1 2 3 45 6

*****************Simple Statistics*****************

Column   Nobs        Mean     Std Dev     Skewness     Kurtosis
X[8]     3059   1.5474143   0.7783017   -0.8225575   -0.3674431
X[9]     3059   1.1649870   0.9495875   -0.1371005   -1.5833737

*********************Parameter Information*********************

Number  |  Coefficient  |  Effect
-------------------------------------------------------------

    1   |  Gamm_1       |  Threshold1
    2   |  Gamm_2       |  Threshold2
    3   |  Alph_1       |  C6_0
    4   |  Alph_2       |  X9
    5   |  Mean_1       |  Random Intercept
    6   |  Mean_2       |  X8
    7   |  COV_1_1      |  Random Intercept  Random Intercept
    8   |  COV_1_2      |  Random Intercept  X8
    9   |  COV_2_2      |  X8  X8

-------------------------------------------------------------

Data are sorted in groups of subject ID variable.


***************************************Number of Observations for Class Levels***************************************

Variable Value Nobs ProportionY[4] 1 190 11.852776

2 474 29.5695573 412 25.7018094 527 32.875858

I[1] 1103 4 0.2495321104 4 0.2495321105 3 0.187149........................9315 4 0.2495329316 4 0.249532

T[7] 0 434 27.0742361 426 26.5751722 14 0.8733623 374 23.3312544 11 0.6862135 9 0.5614476 335 20.898316

C[6] 0 378 23.5807861 1225 76.419214

*****************************************Crosstabulation of T[7] vs. Response Y[4]*****************************************

1 2 3 4 Total----------------------------------------------------------

0 1 54 122 257 4340.002304 0.124424 0.281106 0.592166

1 23 135 124 144 4260.053991 0.316901 0.291080 0.338028

2 3 4 2 5 140.214286 0.285714 0.142857 0.357143

3 54 132 113 75 3740.144385 0.352941 0.302139 0.200535

4 5 3 2 1 110.454545 0.272727 0.181818 0.090909

5 3 4 0 2 90.333333 0.444444 0.000000 0.222222

6 101 142 49 43 3350.301493 0.423881 0.146269 0.128358

----------------------------------------------------------Total 190 474 412 527 1603


*********************************************Data for Subject 1103 who has 4 Observations*********************************************

Y Vector--------

1 : 3 1 1 1

X Matrix--------

1|   1   0
2|   1   1
3|   1   1.732
4|   1   2.45

W Matrix--------

1|   1   0
2|   1   1
3|   1   1.732
4|   1   2.45

***************Starting Values***************

Category Thresholds-------------------

1 : 1.037 1.7

Covariates----------

1 : 0.02841 0.3449

Means-----

1 : 0.8354 0.2642

Covariance Parameter Matrix---------------------------


1|   2
2|   0   1.5

*************************************Maximum Marginal Likelihood Estimates*************************************

Total Iterations          17   N Function Calls           21
N Gradient Calls           2   N Hessian Calls            21
Log Likelihood  -1663.320856   Deviance (-2logL) 3326.641712
AIC             -3308.641712   SBC              -3260.225023

--------  ------------  ------------  ------------  ------------
Variable      Estimate  Stand. Error       Z Value       p-Value
--------  ------------  ------------  ------------  ------------

Thresholds
1    0.000000000
2    2.183331012   0.112825771   19.35135025   2.0e-083 (1)
3    3.651343104   0.154784854   23.58979589   4.9e-123 (1)

Fixed Effects
Alph_1    0.044290640   0.218433389    0.202764971   0.839319 (2)
Alph_2   -0.950915098   0.141406339   -6.724699228   1.8e-011 (2)

Random Effects
Mean_1    4.104942623   0.254275913   16.14365503    1.3e-058 (2)
Mean_2   -0.504713269   0.121963905   -4.138218334   3.5e-005 (2)

Random Effect Var & Cov Terms: Cholesky of Var-Cov Matrix
Mean_1    1.473277353   0.144468449   10.19791774    1.0e-024 (1)
covar.   -0.208190783   0.080291161   -2.592947721   9.5e-003 (2)
Mean_2    0.764902542   0.075775388   10.09434015    2.9e-024 (1)

Note: (1) = 1-tailed p-value
      (2) = 2-tailed p-value

*****************************************Random Effects Variance-Covariance Matrix*****************************************

1  Mean_1  var = ( 1.47 * 1.47 ) = 2.17055
            cov = ( 1.47 * -0.21 ) = -0.30672

   Mean_2  var = ( -0.21 * -0.21 ) + ( 0.76 * 0.76 ) = 0.62842
   Covariance expressed as a correlation = -0.262625

Variance-Covariance Matrix of Estimates---------------------------------------


1|     0.01273
2|     0.01518     0.02396
3|    0.001153    0.001519     0.04771
4|   -0.004236   -0.006512    -0.01778        0.02
5|     0.01633     0.02536    -0.03435    0.006412     0.06466
6|    -0.00136    -0.00285     0.01299    -0.01326    -0.01718
7|    0.006084     0.01231   0.0008589   -0.002969     0.01776
8|   -0.001952   -0.004408  -0.0004308   0.0007421   -0.007017
9|    0.004698    0.007566   0.0009612   -0.003027    0.008508

6|     0.01488
7|   -0.004577     0.02087
8|    0.002456    -0.00908    0.006447
9|  -0.0008342    0.005691   -0.003411    0.005742

Correlation of Maximum Marginal Likelihood Estimates----------------------------------------------------

1|        1
2|    0.869        1
3|   0.0468  0.04492        1
4|  -0.2655  -0.2975  -0.5755        1
5|   0.5693   0.6444  -0.6184   0.1783        1
6| -0.09886   -0.151   0.4877   -0.769  -0.5541
7|   0.3733   0.5506  0.02722  -0.1453   0.4834
8|  -0.2155  -0.3547 -0.02457  0.06536  -0.3437
9|   0.5495   0.6451  0.05807  -0.2825   0.4416

6|        1
7|  -0.2598        1
8|   0.2508  -0.7828        1
9| -0.09026   0.5199  -0.5607        1

**********************Wald Confidence Limits**********************

Parameter    Estimate     LowWaldCL     UppWaldCL
Gamm_1      2.1833310    1.96219656    2.40446546
Gamm_2      3.6513431    3.34797037    3.95471584
Alph_1      0.0442906   -0.38383094    0.47241222
Alph_2     -0.9509151   -1.22806643   -0.67376377
Mean_1      4.1049426    3.60657099    4.60331425
Mean_2     -0.5047133   -0.74375813   -0.26566841
COV_1_1     1.4732774    1.19012440    1.75643031
COV_1_2    -0.2081908   -0.36555857   -0.05082300
COV_2_2     0.7649025    0.61638551    0.91341957


*************************************Maximum Marginal Likelihood Estimates*************************************

[ 1]: Gamm_1     2.1833310
[ 2]: Gamm_2     3.6513431
[ 3]: Alph_1     0.0442906
[ 4]: Alph_2    -0.9509151
[ 5]: Mean_1     4.1049426
[ 6]: Mean_2    -0.5047133
[ 7]: COV_1_1    1.4732774
[ 8]: COV_1_2   -0.2081908
[ 9]: COV_2_2    0.7649025

1.1.4 Examples for glmod

1. Analysis of Unbalanced 2-by-2 Factorial: (see [730], PROC GLM, p. 898) This is an ANOVA design which has only one observation in cell (a2, b2), while all other cells have two observations:

a = [ 'a1' 'b1' 12,
      'a1' 'b1' 14,
      'a1' 'b2' 11,
      'a1' 'b2'  9,
      'a2' 'b1' 20,
      'a2' 'b1' 18,
      'a2' 'b2' 17 ];

clas = [ 1 2 ];
model = "3 = 1 2 1*2";
optn = cons(10,1,.);
optn[1] = 5;
< parm,ase,conf > = glmod(a,model,optn,clas);
print "\n GLMOD: Parameters=",parm,
      "\n Asymptotic Standard Errors", ase,
      "\n Confidence Limits", conf;

Some information about the data matrix is printed first:

*****************Model Information*****************

Number Valid Observations      7
Response Variable           Y[3]
N Independend Variables        2


*************Model Effects*************

Intercept + C1 + C2 + C1 * C2

***********************Class Level Information***********************

Class Level Value

[1]   2   a1 a2
[2]   2   b1 b2

Number of Observations for Class Levels

Variable Value NobsC[1] a1 4

a2 3

C[2] b1 4b2 3

********************Analysis of Variance

Full Analysis********************

Source DF SSQ MeanSquare F_Value Prob>F

Model      3   91.71428571   30.57142857   15.285714   0.0253
Error      3   6.000000000   2.000000000
C Total    6   97.71428571

Goodness of Model Fit

R-square   0.938596491   Adj R-sq   0.877192982
AIC        6.920945241   BIC        16.03205635
SBC        6.704585837   C(p)       4.000000000

*******************************************Analysis of Effects and Parameter Estimates*******************************************


Variable DF Estimate Std_Error TypeII_SS F_Value Prob>F

Intercept     1   14.75000   0.559017   1392.4000   696.200   0.0001
Effect 1      1          .          .   67.600000   33.8000   0.0101
C1 a1         1  -3.250000   0.559017   67.600000   33.8000   0.0101
C1 a2         0   3.250000          .           .         .        .
Effect 2      1          .          .   10.000000   5.00000   0.1114
C2 b1         1   1.250000   0.559017   10.000000   5.00000   0.1114
C2 b2         0  -1.250000          .           .         .        .
Effect 3      1          .          .   0.4000000   0.20000   0.6850
C1_a1*C2_b1   1   0.250000   0.559017   0.4000000   0.20000   0.6850
C1_a1*C2_b2   0  -0.250000          .           .         .        .
C1_a2*C2_b1   0  -0.250000          .           .         .        .
C1_a2*C2_b2   0   0.250000          .           .         .        .

Effect Type_I_SS MeanSquare F_Value DF Prob>F

C1        80.04761905   80.04761905   40.023810   1   0.0080
C2        11.26666667   11.26666667   5.6333333   1   0.0982
C1 * C2   0.400000000   0.400000000   0.2000000   1   0.6850

Effect Type_III_SS MeanSquare F_Value DF Prob>F

C1        67.60000000   67.60000000   33.800000   1   0.0101
C2        10.00000000   10.00000000   5.0000000   1   0.1114
C1 * C2   0.400000000   0.400000000   0.2000000   1   0.6850

The model can be expressed in a shorter form using the bar operator yielding the same results:

clas = [ 1 2 ];
model = "3 = 1|2";
optn = cons(10,1,.);
optn[1] = 5;
< parm,ase,conf > = glmod(a,model,optn,clas);
print "\n GLMOD: Parameters=",parm,
      "\n Asymptotic Standard Errors", ase,
      "\n Confidence Limits", conf;

2. 3 by 2 Design with Interactions: (see [730], PROC GLM, p. 932)

aby = [ 1 1 23.5,
        1 1 23.7,
        1 2 28.7,
        2 1  8.9,
        2 2  5.6,
        2 2  8.9,


        3 1 10.3,
        3 1 12.5,
        3 2 13.6,
        3 2 14.6 ];

clas = [ 1 2 ];
model = "3 = 1|2";
optn = cons(10,1,.);
optn[1] = 5;
parm = glmod(aby,model,optn,clas);
print "GLMOD: Parameters=",parm;

*****************Model Information*****************

Number Valid Observations     10
Response Variable           Y[3]
N Independend Variables        2

*************Model Effects*************

Intercept + C1 + C2 + C1 * C2

***********************Class Level Information***********************

Class Level Value

[1]   3   1 2 3
[2]   2   1 2

Number of Observations for Class Levels

Variable Value NobsC[1] 1 3

2 33 4

C[2] 1 52 5


********************Analysis of Variance

Full Analysis********************

Source DF SSQ MeanSquare F_Value Prob>F

Model      5   520.4760000   104.0952000   49.657818   0.0011
Error      4   8.385000000   2.096250000
C Total    9   528.8610000

Goodness of Model Fit

R-square   0.984145172   Adj R-sq   0.964326638
AIC        10.23859302   BIC        25.73859302
SBC        12.05410358   C(p)       6.000000000

*******************************************Analysis of Effects and Parameter Estimates*******************************************

Variable DF Estimate Std_Error TypeII_SS F_Value Prob>F

Intercept 1 15.65833 0.482614 2206.6506 1052.67 0.0000Effect 1 2 . . 479.10786 114.277 0.0003C1 1 1 10.49167 0.703525 466.20029 222.397 0.0001C1 2 1 -7.583333 0.703525 243.55882 116.188 0.0004C1 3 0 -2.908333 . . . .Effect 2 1 . . 9.4556250 4.51073 0.1009C2 1 1 -1.025000 0.482614 9.4556250 4.51073 0.1009C2 2 0 1.025000 . . . .Effect 3 2 . . 15.730714 3.75211 0.1209C1_1*C2_1 1 -1.525000 0.703525 9.8497059 4.69873 0.0961C1_1*C2_2 0 1.525000 . . . .C1_2*C2_1 1 1.850000 0.703525 14.495294 6.91487 0.0582C1_2*C2_2 0 -1.850000 . . . .C1_3*C2_1 0 -0.325000 . . . .C1_3*C2_2 0 0.325000 . . . .

Effect Type_I_SS MeanSquare F_Value DF Prob>F

C1        494.0310000   247.0155000   117.83685   2   0.0003
C2        10.71428571   10.71428571   5.1111679   1   0.0866
C1 * C2   15.73071429   7.865357143   3.7521084   2   0.1209

Effect Type_III_SS MeanSquare F_Value DF Prob>F


C1        479.1078571   239.5539286   114.27737   2   0.0003
C2        9.455625000   9.455625000   4.5107335   1   0.1009
C1 * C2   15.73071429   7.865357143   3.7521084   2   0.1209

3. Snapdragon Experiment: (see [730], PROC GLM: Ex. 1, pp. 965)

x = [ "clarion" 1 32.7, "clarion" 2 32.3,"clarion" 3 31.5, "clinton" 1 32.1,"clinton" 2 29.7, "clinton" 3 29.1,"knox" 1 35.7, "knox" 2 35.9,"knox" 3 33.1, "o\’neill" 1 36.0,"o\’neill" 2 34.2, "o\’neill" 3 31.2,"compost" 1 31.8, "compost" 2 28.0,"compost" 3 29.2, "wabash" 1 38.2,"wabash" 2 37.8, "wabash" 3 31.9,"webster" 1 32.5, "webster" 2 31.1,"webster" 3 29.7 ];

model = "3 = 1 2";clas = [ 1 2 ];/* GLM Design Coding */optn = cons(10,1,.);optn[1] = 5; optn[10] = 1;parm = glmod(x,"3 = 1 2",optn,clas);print "GLMOD: Parameters=",parm;

*****************Model Information*****************

Number Valid Observations     21
Response Variable           Y[3]
N Independend Variables        2

*************Model Effects*************

Intercept + C1 + C2

***********************Class Level Information***********************

Class Level Value


[1]   7   clarion  clinton  knox  compost  o'neill  wabash  webster

[2] 3 1 2 3

Number of Observations for Class Levels

Variable Value NobsC[1] clarion 3

clinton 3knox 3compost 3o’neill 3wabash 3webster 3

C[2] 1 72 73 7

********************Analysis of Variance

Full Analysis********************

Source DF SSQ MeanSquare F_Value Prob>F

Model      8   142.1885714   17.77357143   10.803039   0.0002
Error     12   19.74285714   1.645238095
C Total   20   161.9314286

Goodness of Model Fit

R-square   0.878078905   Adj R-sq   0.796798174
AIC        16.70365582   BIC        31.07865582
SBC        26.10435776   C(p)       9.000000000

*******************************************Analysis of Effects and Parameter Estimates*******************************************

Variable DF Estimate Std_Error TypeII_SS F_Value Prob>F

Intercept 1 29.35714 0.839704 2010.9643 1222.29 0.0000Effect 1 6 . . 103.15143 10.4495 0.0004C1 clarion 1 1.066667 1.047294 1.7066667 1.03734 0.3285


C1 clinton 1 -0.800000 1.047294 0.9600000 0.58350 0.4597C1 knox 1 3.800000 1.047294 21.660000 13.1653 0.0035C1 compost 1 -1.433333 1.047294 3.0816667 1.87308 0.1962C1 o’neill 1 2.700000 1.047294 10.935000 6.64645 0.0242C1 wabash 1 4.866667 1.047294 35.526667 21.5936 0.0006C1 webster 0 0.000000 . . . .Effect 2 2 . . 39.037143 11.8637 0.0014C2 1 1 3.328571 0.685615 38.777857 23.5698 0.0004C2 2 1 1.900000 0.685615 12.635000 7.67974 0.0169C2 3 0 0.000000 . . . .

Effect Type_I_SS MeanSquare F_Value DF Prob>F

C1   103.1514286   17.19190476   10.449493   6   0.0004
C2   39.03714286   19.51857143   11.863676   2   0.0014

Effect Type_III_SS MeanSquare F_Value DF Prob>F

C1   103.1514286   17.19190476   10.449493   6   0.0004
C2   39.03714286   19.51857143   11.863676   2   0.0014

4. Regression with Mileage Data: (see [730], PROC GLM: Ex. 2, p. 969) Here, a quadratic model is used to determine at which speed a car achieves the highest mileage:

x = [ 20 15.4,
      30 20.2,
      40 25.7,
      50 26.2,
      50 26.6,
      50 27.4,
      55  .  ,
      60 24.8 ];

model = "2 = 1 1*1";optn = cons(4,1,.);optn[1] = 5;parm = glmod(x,model,optn);print "GLMOD: Parameters=",parm;

*****************Model Information*****************

Number Valid Observations      7
Response Variable           Y[2]
N Independend Variables        1


*************Model Effects*************

Intercept + X1 + X1

*****************Simple Statistics*****************

Column   Nobs        Mean     Std Dev     Skewness     Kurtosis
Y[2]        7   23.757143   4.3718254   -1.4920839    1.4095657
X[1]        8   44.375000   13.479482   -0.9323480   -0.0041244

********************Analysis of Variance

Full Analysis********************

Source DF SSQ MeanSquare F_Value Prob>F

Model     1   85.64464286   85.64464286   14.749788   0.0121
Error     5   29.03250000   5.806500000
C Total   6   114.6771429

Goodness of Model Fit

R-square   0.746832723   Adj R-sq   0.696199268
AIC        13.95754020   BIC        17.23754020
SBC        13.84936050   C(p)       2.000000000

*******************************************Analysis of Effects and Parameter Estimates*******************************************

Variable DF Estimate Std_Error TypeII_SS F_Value Prob>F

Intercept   1   12.02500   3.187691   82.628929   14.2304   0.0130
X1          1   0.273750   0.071279   85.644643   14.7498   0.0121
X1          0          .          .           .         .        .

Effect Type_I_SS MeanSquare F_Value DF Prob>F


X1   85.64464286   85.64464286   14.749788   1   0.0121
X1             .             .           .   0        .

Effect Type_III_SS MeanSquare F_Value DF Prob>F

X1   85.64464286   85.64464286   14.749788   1   0.0121
X1   0.000000000   0.000000000   0.0000000   1   1.0000

5. Unbalanced ANOVA for Two-Way Design with Interaction: (see [730], PROC GLM: Ex. 3, p. 972) The original data source is Afifi & Azen (1972). Each row of a holds the two factor levels followed by six response values; the loop below stacks these columns into a long-format matrix x with one response per row.

a = [ 1 1 42 44 36 13 19 22,
      1 2 33  . 26  . 33 21,
      1 3 31 -3  . 25 25 24,
      2 1 28  . 23 34 42 13,
      2 2  . 34 33 31  . 36,
      2 3  3 26 28 32  4 16,
      3 1  .  .  1 29  . 19,
      3 2  . 11  9  7  1 -6,
      3 3 21  1  .  9  3  .,
      4 1 24  .  9 22 -2 15,
      4 2 27 12 12 -5 16 15,
      4 3 22  7 25  5 12  . ];

x = a[,1:3]; aa = a[ ,1:2];
for (j = 4; j <= 8; j++) x = x |> (aa -> a[,j]);

First we report the results for the full-rank design coding:

/* use bar operator */
clas = [ 1 2 ];
model = "3 = 1|2";
optn = cons(3,1,.);
optn[1] = 5;
parm = glmod(x,model,optn,clas);
print "GLMOD: Parameters=",parm;

*****************Model Information*****************

Number Valid Observations     58
Response Variable           Y[3]
N Independend Variables        2

*************Model Effects*************

Intercept + C1 + C2 + C1 * C2

***********************Class Level Information***********************

Class Level Value

[1]   4   1 2 3 4
[2]   3   1 2 3

Number of Observations for Class Levels

Variable Value NobsC[1] 1 15

2 153 124 16

C[2] 1 192 193 20

********************Analysis of Variance

Full Analysis********************

Source DF SSQ MeanSquare F_Value Prob>F

Model     11   4259.338506   387.2125914   3.5056922   0.0013
Error     46   5080.816667   110.4525362
C Total   57   9340.155172

Goodness of Model Fit

R-square   0.456024384   Adj R-sq   0.325943258
AIC        283.4214881   BIC        291.5462518
SBC        308.1468043   C(p)       12.00000000

*******************************************Analysis of Effects and Parameter Estimates*******************************************


Variable DF Estimate Std_Error TypeII_SS F_Value Prob>F

Intercept 1 18.95972 1.407657 20037.613 181.414 0.0000Effect 1 3 . . 2997.4719 9.04603 0.0001C1 1 1 7.034722 2.401150 948.04970 8.58332 0.0053C1 2 1 7.595833 2.401150 1105.3201 10.0072 0.0028C1 3 1 -9.215278 2.605423 1381.7710 12.5101 0.0009C1 4 0 -5.415278 . . . .Effect 2 2 . . 415.87305 1.88259 0.1637C2 1 1 2.856944 2.013078 222.46314 2.01411 0.1626C2 2 1 0.786111 1.993936 17.168060 0.15543 0.6952C2 3 0 -3.643056 . . . .Effect 3 6 . . 707.26626 1.06723 0.3958C1_1*C2_1 1 0.481944 3.302216 2.3526573 0.02130 0.8846C1_1*C2_2 1 1.469444 3.515962 19.292727 0.17467 0.6779C1_1*C2_3 0 -1.951389 . . . .C1_2*C2_1 1 -1.412500 3.393856 19.132242 0.17322 0.6792C1_2*C2_2 1 6.158333 3.515962 338.85481 3.06788 0.0865C1_2*C2_3 0 -4.745833 . . . .C1_3*C2_1 1 3.731944 3.872398 102.58552 0.92877 0.3402C1_3*C2_2 1 -6.130556 3.530476 333.04950 3.01532 0.0892C1_3*C2_3 0 2.398611 . . . .C1_4*C2_1 0 -2.801389 . . . .C1_4*C2_2 0 -1.497222 . . . .C1_4*C2_3 0 4.298611 . . . .

Effect Type_I_SS MeanSquare F_Value DF Prob>F

C1        3133.238506   1044.412835   9.4557615   3   0.0001
C2        418.8337407   209.4168703   1.8959897   2   0.1617
C1 * C2   707.2662593   117.8777099   1.0672250   6   0.3958

Effect Type_III_SS MeanSquare F_Value DF Prob>F

C1        2997.471860   999.1572868   9.0460330   3   0.0001
C2        415.8730463   207.9365232   1.8825871   2   0.1637
C1 * C2   707.2662593   117.8777099   1.0672250   6   0.3958

Now we report the results for the rank-deficient (GLM) design coding:

/* GLM Design Coding */
clas = [ 1 2 ];
model = "3 = 1 2 1*2";
optn = cons(10,1,.);
optn[1] = 5; optn[10] = 1;
parm = glmod(x,model,optn,clas);
print "GLMOD: Parameters=",parm;


********************Analysis of Variance

Full Analysis********************

Source DF SSQ MeanSquare F_Value Prob>F

Model     11   4259.338506   387.2125914   3.5056922   0.0013
Error     46   5080.816667   110.4525362
C Total   57   9340.155172

Goodness of Model Fit

R-square   0.456024384   Adj R-sq   0.325943258
AIC        283.4214881   BIC        291.5462518
SBC        308.1468043   C(p)       12.00000000

*******************************************Analysis of Effects and Parameter Estimates*******************************************

Variable DF Estimate Std_Error TypeII_SS F_Value Prob>F

Intercept 1 18.95972 1.407657 20037.613 181.414 0.0000Effect 1 3 . . 2997.4719 9.04603 0.0001C1 1 1 7.034722 2.401150 948.04970 8.58332 0.0053C1 2 1 7.595833 2.401150 1105.3201 10.0072 0.0028C1 3 1 -9.215278 2.605423 1381.7710 12.5101 0.0009C1 4 0 -5.415278 . . . .Effect 2 2 . . 415.87305 1.88259 0.1637C2 1 1 2.856944 2.013078 222.46314 2.01411 0.1626C2 2 1 0.786111 1.993936 17.168060 0.15543 0.6952C2 3 0 -3.643056 . . . .Effect 3 6 . . 707.26626 1.06723 0.3958C1_1*C2_1 1 0.481944 3.302216 2.3526573 0.02130 0.8846C1_1*C2_2 1 1.469444 3.515962 19.292727 0.17467 0.6779C1_1*C2_3 0 -1.951389 . . . .C1_2*C2_1 1 -1.412500 3.393856 19.132242 0.17322 0.6792C1_2*C2_2 1 6.158333 3.515962 338.85481 3.06788 0.0865C1_2*C2_3 0 -4.745833 . . . .C1_3*C2_1 1 3.731944 3.872398 102.58552 0.92877 0.3402C1_3*C2_2 1 -6.130556 3.530476 333.04950 3.01532 0.0892C1_3*C2_3 0 2.398611 . . . .C1_4*C2_1 0 -2.801389 . . . .C1_4*C2_2 0 -1.497222 . . . .C1_4*C2_3 0 4.298611 . . . .

Effect Type_I_SS MeanSquare F_Value DF Prob>F


C1        3133.238506   1044.412835   9.4557615   3   0.0001
C2        418.8337407   209.4168703   1.8959897   2   0.1617
C1 * C2   707.2662593   117.8777099   1.0672250   6   0.3958

Effect Type_III_SS MeanSquare F_Value DF Prob>F

C1        2997.471860   999.1572868   9.0460330   3   0.0001
C2        415.8730463   207.9365232   1.8825871   2   0.1637
C1 * C2   707.2662593   117.8777099   1.0672250   6   0.3958

6. Analysis of Covariance: (see [730], PROC GLM: Ex. 4, p. 974) Here the features of regression and ANOVA are combined by using both categorical variables and interval-scaled covariates:

x = [ 'a' 11  6, 'a'  8  0, 'a'  5  2, 'a' 14  8, 'a' 19 11,
      'a'  6  4, 'a' 10 13, 'a'  6  1, 'a' 11  8, 'a'  3  0,
      'd'  6  0, 'd'  6  2, 'd'  7  3, 'd'  8  1, 'd' 18 18,
      'd'  8  4, 'd' 19 14, 'd'  8  9, 'd'  5  1, 'd' 15  9,
      'f' 16 13, 'f' 13 10, 'f' 11 18, 'f'  9  5, 'f' 21 23,
      'f' 16 12, 'f' 12  5, 'f' 12 16, 'f'  7  1, 'f' 12 20 ];

We first report the results for the rank-deficient (GLM) design coding:

/* GLM Design Coding */
clas = 1;
model = "3 = 1 2";
optn = cons(10,1,.);
optn[1] = 5; optn[10] = 1;
parm = glmod(x,model,optn,clas);
print "GLMOD: Parameters=",parm;

*****************Model Information*****************

Number Valid Observations     30
Response Variable           Y[3]
N Independend Variables        2

*************Model Effects*************

Intercept + C1 + X2


***********************Class Level Information***********************

Class Level Value

[1] 3 a d f

*****************Simple Statistics*****************

Column   Nobs        Mean     Std Dev    Skewness     Kurtosis
Y[3]       30   7.9000000   6.6661781   0.6128269   -0.6297199
X[2]       30   10.733333   4.7917554   0.5249442   -0.6319484

Number of Observations for Class Levels

Variable Value NobsC[1] a 10

d 10f 10

********************Analysis of Variance

Full Analysis********************

Source DF SSQ MeanSquare F_Value Prob>F

Model      3   871.4974030   290.4991343   18.103860   0.0000
Error     26   417.2025970   16.04625373
C Total   29   1288.700000

Goodness of Model Fit

R-square   0.676260885   Adj R-sq   0.638906372
AIC        86.97123699   BIC        90.15466894
SBC        92.57602651   C(p)       4.000000000

*******************************************Analysis of Effects and Parameter Estimates*******************************************

Variable DF Estimate Std_Error TypeII_SS F_Value Prob>F


Intercept   1  -0.434671   2.471354   0.4963929   0.03094   0.8617
Effect 1    2          .          .   68.553711   2.13613   0.1384
C1 a        1  -3.446138   1.886781   53.529875   3.33597   0.0793
C1 d        1  -3.337167   1.853866   51.996324   3.24040   0.0835
C1 f        0   0.000000          .           .         .        .
X2          1   0.987184   0.164498   577.89740   36.0145   0.0000

Effect Type_I_SS MeanSquare F_Value DF Prob>F

C1   293.6000000   146.8000000   9.1485528   2   0.0010
X2   577.8974030   577.8974030   36.014475   1   0.0000

Effect Type_III_SS MeanSquare F_Value DF Prob>F

C1   68.55371060   34.27685530   2.1361282   2   0.1384
X2   577.8974030   577.8974030   36.014475   1   0.0000

Now we solve the same problem using the full-rank design coding:

/* DMREG Design Coding */
clas = 1;
model = "3 = 1 2";
optn = cons(3,1,.);
optn[1] = 5;
parm = glmod(x,model,optn,clas);
print "GLMOD: Parameters=",parm;

The first part of the output is the same and is not repeated here:

********************Analysis of Variance

Full Analysis********************

Source DF SSQ MeanSquare F_Value Prob>F

Model      3   871.4974030   290.4991343   18.103860   0.0000
Error     26   417.2025970   16.04625373
C Total   29   1288.700000

Goodness of Model Fit

R-square   0.676260885   Adj R-sq   0.638906372
AIC        86.97123699   BIC        90.15466894
SBC        92.57602651   C(p)       4.000000000


*******************************************Analysis of Effects and Parameter Estimates*******************************************

Variable DF Estimate Std_Error TypeII_SS F_Value Prob>F

Intercept   1  -2.695773   1.911085   31.928644   1.98979   0.1702
Effect 1    2          .          .   68.553711   2.13613   0.1384
C1 a        1  -1.185037   1.060822   20.024075   1.24790   0.2742
C1 d        1  -1.076065   1.041298   17.135646   1.06789   0.3109
C1 f        0   2.261102          .           .         .        .
X2          1   0.987184   0.164498   577.89740   36.0145   0.0000

Effect Type_I_SS MeanSquare F_Value DF Prob>F

C1   293.6000000   146.8000000   9.1485528   2   0.0010
X2   577.8974030   577.8974030   36.014475   1   0.0000

Effect Type_III_SS MeanSquare F_Value DF Prob>F

C1   68.55371060   34.27685530   2.1361282   2   0.1384
X2   577.8974030   577.8974030   36.014475   1   0.0000

1.2 SEM: Structural Equation Modeling

1.2.1 Model Definitions

The Original COSAN Model

\[
  C = F_1 \cdots F_n \, P \, F_n^T \cdots F_1^T ,
\]

C : given symmetric correlation or covariance matrix,
F_j , j = 1, ..., n : rectangular coefficient matrices,
P : symmetric coefficient matrix,

\[
  F_j = \begin{cases} G_j \\ G_j^{-1} \end{cases}, \quad j = 1, \ldots , n ,
  \qquad\text{and}\qquad
  P = \begin{cases} Q \\ Q^{-1} . \end{cases}
\]

The Generalized COSAN Model

\[
  C = F_1 P_1 F_1^T + \cdots + F_m P_m F_m^T ,
\]

C : given symmetric correlation or covariance matrix,
F_k , k = 1, ..., m : product of n(k) matrices F_{k1}, ..., F_{k n(k)} ,
P_k : symmetric coefficient matrices,

\[
  F_k = F_{k1} \cdots F_{k\,n(k)} , \qquad P_k = P_k^T , \quad k = 1, \ldots , m ,
\]
\[
  F_{kj} = \begin{cases} G_{kj} \\ G_{kj}^{-1} \\ (I - G_{kj})^{-1} \end{cases},
  \quad j = 1, \ldots , n(k) ,
  \qquad\text{and}\qquad
  P_k = \begin{cases} Q_k \\ Q_k^{-1} . \end{cases}
\]

The EQS Model

\[
  \eta = \beta\eta + \gamma\xi ,
\]

η : vector of endogenous variables,
ξ : vector of exogenous and error variables,
β : nonsingular coefficient matrix among endogenous variables,
γ : coefficients between endogenous and exogenous variables.

Variables of η and ξ can be manifest (observed) or latent (not observed) variables.

The covariance matrix C of the manifest variables in η and ξ is then

\[
  C = J (I - B)^{-1} \Gamma \Phi \Gamma^T (I - B)^{-T} J^T ,
\]

J : selection matrix, Φ = E{ξξ^T},

\[
  B = \begin{pmatrix} \beta & 0 \\ 0 & 0 \end{pmatrix}
  \qquad\text{and}\qquad
  \Gamma = \begin{pmatrix} \gamma \\ I \end{pmatrix} .
\]

The RAM Model

\[
  v = A v + u ,
\]

v : vector of endogenous random variables,
u : vector of exogenous random variables,
A : nonsingular matrix of coefficients.

Variables in u and v can be manifest (observed) or latent (unobservable) variables.

The covariance matrix C of the manifest variables in u and v is then

\[
  C = J (I - A)^{-1} P (I - A)^{-T} J^T ,
\]

J : selection matrix (filter of manifest variables),

\[
  C = E\{ J v v^T J^T \} , \qquad P = E\{ u u^T \} .
\]
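To make the step from the structural equation to the covariance structure explicit: solving v = Av + u for v gives v = (I - A)^{-1} u, so the covariance of the filtered vector Jv follows directly,

\[
  C = E\{ J v v^T J^T \}
    = J (I - A)^{-1} E\{ u u^T \} (I - A)^{-T} J^T
    = J (I - A)^{-1} P (I - A)^{-T} J^T .
\]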


The LISREL Model

The general nonrecursive LISREL model assumes a linear relationship among the latent variables,

\[
  \eta = B \eta + \Gamma \xi + \zeta ,
\]

where η_1, ..., η_m : endogenous (latent) variables,
      ξ_1, ..., ξ_n : predetermined (latent) variables,
      ζ_1, ..., ζ_m : disturbance terms,

and where B is an m × m and Γ is an m × n coefficient matrix; it also assumes

\[
  E\{\xi\zeta^T\} = 0 , \qquad E\{\xi\} = 0 .
\]

It further assumes a linear relationship between manifest indicators and latent constructs,

\[
  y = \Lambda_y \eta + \varepsilon , \qquad x = \Lambda_x \xi + \delta ,
\]

where y_1, ..., y_p : manifest indicators of endogenous variables,
      x_1, ..., x_q : manifest indicators of predetermined variables,
      ε_1, ..., ε_p : disturbance terms,
      δ_1, ..., δ_q : disturbance terms,

and where Λ_y is a p × m and Λ_x is a q × n coefficient matrix, assuming (as in factor analysis)

\[
  E\{\varepsilon\} = 0 , \; E\{\delta\} = 0 , \;
  E\{\eta\varepsilon^T\} = 0 , \; E\{\xi\delta^T\} = 0 , \;
  E\{\eta\} = 0 , \; E\{\xi\} = 0 .
\]

The covariance matrix C of the p + q manifest variables y and x is then

\[
  C = J (I - A)^{-1} P (I - A)^{-T} J^T ,
\]
\[
  A = \begin{pmatrix}
        0 & 0 & \Lambda_y & 0 \\
        0 & 0 & 0 & \Lambda_x \\
        0 & 0 & B & \Gamma \\
        0 & 0 & 0 & 0
      \end{pmatrix}
  \qquad\text{and}\qquad
  P = \begin{pmatrix}
        \Theta_\varepsilon & & & \\
        & \Theta_\delta & & \\
        & & \Psi & \\
        & & & \Phi
      \end{pmatrix} ,
\]

with selection matrix J, Φ = E{ξξ^T}, Ψ = E{ζζ^T}, Θ_δ = E{δδ^T}, and Θ_ε = E{εε^T}.


The Hierarchical Factor Model

First Order Factor Model

\[
  C = F_1 P_1 F_1^T + U_1^2 ,
\]

C : given symmetric correlation matrix,
F_1 : first order factor loadings,
P_1 : correlations among first order factors,
U_1^2 : unique first order variances (disturbances).

Second Order Factor Model

\[
  C = F_1 F_2 P_2 F_2^T F_1^T + F_1 U_2^2 F_1^T + U_1^2 .
\]

C : given symmetric correlation matrix,
F_1 : first order factor loadings,
F_2 : second order factor loadings,
P_2 : correlations among second order factors,
U_1^2 : first order unique variances,
U_2^2 : second order unique variances.

Example of McDonald (1980):
k = 3 : Occasions of Measurement
n = 3 : Variables (Tests)
m = 2 : Common Factors

First-Order Autoregressive Longitudinal Factor Model

\[
  C = F_1 F_2 F_3 L F_3^{-1} F_2^{-1} P F_2^{-T} F_3^{-T} L^T F_3^T F_2^T F_1^T + U^2 .
\]

\[
  F_1 = \begin{pmatrix} B_1 & & \\ & B_2 & \\ & & B_3 \end{pmatrix} , \quad
  F_2 = \begin{pmatrix} I_2 & & \\ & D_2 & \\ & & D_2 \end{pmatrix} , \quad
  F_3 = \begin{pmatrix} I_2 & & \\ & I_2 & \\ & & D_3 \end{pmatrix} ,
\]
\[
  L = \begin{pmatrix} I_2 & o & o \\ I_2 & I_2 & o \\ I_2 & I_2 & I_2 \end{pmatrix} , \quad
  P = \begin{pmatrix} I_2 & & \\ & S_2 & \\ & & S_3 \end{pmatrix} , \quad
  U = \begin{pmatrix} U_{11} & U_{12} & U_{13} \\ U_{21} & U_{22} & U_{23} \\ U_{31} & U_{32} & U_{33} \end{pmatrix} ,
\]
\[
  S_2 = I_2 - D_2^2 , \qquad S_3 = I_2 - D_3^2 .
\]

1.2.2 A Structural Equation Example:

To illustrate the relationships among the RAM, EQS, and LISREL models, a widely known example from Wheaton et al. (1977) is used. Different structural models for these data are given in the LISREL 7 manual (see Joreskog & Sorbom, 1988, [439]) and in the EQS 3.0 manual (see Bentler, 1989, [57]). The data set contains covariances among six (manifest) variables collected from 932 people in rural regions of Illinois:

Variable 1: V 1, y1 : Anomia 1967


Variable 2: V 2, y2 : Powerlessness 1967

Variable 3: V 3, y3 : Anomia 1971

Variable 4: V 4, y4 : Powerlessness 1971

Variable 5: V 5, x1 : Education (years of schooling)

Variable 6: V 6, x2 : Duncan’s Socioeconomic Index (SEI).

It is assumed that anomia and powerlessness are indicators of an alienation factor, and that education and SEIare indicators for a socioeconomic status (SES) factor. Hence the analysis contains three latent variables:

Variable 7: F1, η1 : Alienation 1967

Variable 8: F2, η2 : Alienation 1971

Variable 9: F3, ξ1 : Socioeconomic Status (SES)

The following path diagram shows the structural model used in Bentler (1985, p. 29) and slightly modified in Joreskog & Sorbom (1985, p. 56). In this notation for the path diagram, regression coefficients between the variables are indicated as one-headed arrows. Variances and covariances among the variables are indicated as two-headed arrows. Indicating error variances and covariances as two-headed arrows with the same source and destination (McArdle, 1988, see [586]; McDonald, 1985, see [595]) is helpful in transforming the path diagram to RAM model list input for the sem function.


[Path diagram not reproducible in this text rendering; see the RAM equations and matrices below.]

Figure 1.1: Path Diagram of Stability and Alienation Example


RAM Model

The vector v contains the six manifest variables v1 = V1, ..., v6 = V6 and the three latent variables v7 = F1, v8 = F2, v9 = F3. The vector u contains the corresponding error variables u1 = E1, ..., u6 = E6 and u7 = D1, u8 = D2, u9 = D3. The path diagram corresponds to the following set of structural equations of the RAM model:

v1 = 1.0v7 + u1

v2 = .833v7 + u2

v3 = 1.0v8 + u3

v4 = .833v8 + u4

v5 = 1.0v9 + u5

v6 = λv9 + u6

v7 = γ1v9 + u7

v8 = βv7 + γ2v9 + u8

v9 = u9

This gives the matrices A and P in the RAM model:

\[
  A = \begin{pmatrix}
    o & o & o & o & o & o & 1.    & o    & o        \\
    o & o & o & o & o & o & .833  & o    & o        \\
    o & o & o & o & o & o & o     & 1.   & o        \\
    o & o & o & o & o & o & o     & .833 & o        \\
    o & o & o & o & o & o & o     & o    & 1.       \\
    o & o & o & o & o & o & o     & o    & \lambda  \\
    o & o & o & o & o & o & o     & o    & \gamma_1 \\
    o & o & o & o & o & o & \beta & o    & \gamma_2 \\
    o & o & o & o & o & o & o     & o    & o
  \end{pmatrix} ,
\]
\[
  P = \begin{pmatrix}
    \theta_1 & o        & \theta_5 & o        & o        & o        & o      & o      & o    \\
    o        & \theta_2 & o        & \theta_5 & o        & o        & o      & o      & o    \\
    \theta_5 & o        & \theta_1 & o        & o        & o        & o      & o      & o    \\
    o        & \theta_5 & o        & \theta_2 & o        & o        & o      & o      & o    \\
    o        & o        & o        & o        & \theta_3 & o        & o      & o      & o    \\
    o        & o        & o        & o        & o        & \theta_4 & o      & o      & o    \\
    o        & o        & o        & o        & o        & o        & \psi_1 & o      & o    \\
    o        & o        & o        & o        & o        & o        & o      & \psi_2 & o    \\
    o        & o        & o        & o        & o        & o        & o      & o      & \phi
  \end{pmatrix} .
\]

The following statement defines the structural model of the alienation example as a RAM model:

ram = [ 1 1 7 1. , 1 2 7 .833 ,

1 3 8 1. , 1 4 8 .833 ,

1 5 9 1. , 1 6 9 "LAMB" ,

1 7 9 "GAM1" , 1 8 7 "BETA" ,

1 8 9 "GAM2" ,

1 7 . "F1" , 1 8 . "F2" ,

1 9 . "F3" ,


2 1 1 "THE1" , 2 2 2 "THE2" ,

2 3 3 "THE1" , 2 4 4 "THE2" ,

2 5 5 "THE3" , 2 6 6 "THE4" ,

2 1 3 "THE5" , 2 2 4 "THE5" ,

2 7 7 "PSI1" , 2 8 8 "PSI2" ,

2 9 9 "PHI" ,

2 1 . "E1" , 2 2 . "E2" ,

2 3 . "E3" , 2 4 . "E4" ,

2 5 . "E5" , 2 6 . "E6" ,

2 7 . "D1" , 2 8 . "D2" ,

2 9 . "D3" ];

There is a very close relationship between the RAM model algebra and the specification of structural linear models by path diagrams.

See McArdle (1980, [585]) for the interpretation of the models shown in Figure 1.2 below.

EQS Model

The vector η contains the six endogenous manifest variables V1, ..., V6 and the two endogenous latent variables F1 and F2. The vector ξ contains the exogenous error variables E1, ..., E6, D1, and D2, and the exogenous latent variable F3. The path diagram corresponds to the following set of structural equations of the EQS model:

V1 = 1.0 F1 + E1
V2 = .833 F1 + E2
V3 = 1.0 F2 + E3
V4 = .833 F2 + E4
V5 = 1.0 F3 + E5
V6 = λ F3 + E6
F1 = γ1 F3 + D1
F2 = β F1 + γ2 F3 + D2

This gives the matrices β, γ and Φ in the EQS model:

\[
  \beta = \begin{pmatrix}
    o & o & o & o & o & o & 1.    & o    \\
    o & o & o & o & o & o & .833  & o    \\
    o & o & o & o & o & o & o     & 1.   \\
    o & o & o & o & o & o & o     & .833 \\
    o & o & o & o & o & o & o     & o    \\
    o & o & o & o & o & o & o     & o    \\
    o & o & o & o & o & o & o     & o    \\
    o & o & o & o & o & o & \beta & o
  \end{pmatrix} , \qquad
  \gamma = \begin{pmatrix}
    1 & o & o & o & o & o & o & o & o        \\
    o & 1 & o & o & o & o & o & o & o        \\
    o & o & 1 & o & o & o & o & o & o        \\
    o & o & o & 1 & o & o & o & o & o        \\
    o & o & o & o & 1 & o & o & o & 1.       \\
    o & o & o & o & o & 1 & o & o & \lambda  \\
    o & o & o & o & o & o & 1 & o & \gamma_1 \\
    o & o & o & o & o & o & o & 1 & \gamma_2
  \end{pmatrix} ,
\]
\[
  \Phi = \begin{pmatrix}
    \theta_1 & o        & \theta_5 & o        & o        & o        & o      & o      & o    \\
    o        & \theta_2 & o        & \theta_5 & o        & o        & o      & o      & o    \\
    \theta_5 & o        & \theta_1 & o        & o        & o        & o      & o      & o    \\
    o        & \theta_5 & o        & \theta_2 & o        & o        & o      & o      & o    \\
    o        & o        & o        & o        & \theta_3 & o        & o      & o      & o    \\
    o        & o        & o        & o        & o        & \theta_4 & o      & o      & o    \\
    o        & o        & o        & o        & o        & o        & \psi_1 & o      & o    \\
    o        & o        & o        & o        & o        & o        & o      & \psi_2 & o    \\
    o        & o        & o        & o        & o        & o        & o      & o      & \phi
  \end{pmatrix} .
\]


[Path diagrams not reproduced. The four panels show the RAM representation of 1. Multiple Regression, 2. Chain Simplex, 3. First-Order Factor Analysis, and 4. Second-Order Factor Analysis.]

Figure 1.2: Specific Examples of RAM nomography


The following statements define the structural model of the alienation example as an EQS-type model:

semeqs V1 = F1 + E1 ,

V2 = .833 F1 + E2 ,

V3 = F2 + E3 ,

V4 = .833 F2 + E4 ,

V5 = F3 + E5 ,

V6 = LAMB F3 + E6 ,

F1 = GAM1 F3 + D1 ,

F2 = BETA F1 + GAM2 F3 + D2 ;

semvar E1:E6 = THE1:THE2 THE1:THE4,

D1:D2 = PSI1:PSI2 ,

F3 = PHI ;

semcov E1 E3 = THE5 ,

E4 E2 = THE5 ;

COSAN Model

The following is the COSAN model specification for this example:

_J = ide(6) -> cons(6,3,0.);

_A = cons(9,9);

_A[,7] = [ 1. .833 4#0. 0. "beta" 0. ]‘;

_A[,8] = [ 2#0. 1.0 .833 3#0. 0. 0. ]‘;

_A[,9] = [ 4#0. 1.0 "lamb" "gam1" "gam2" 0. ]‘;

t2 = [ "the1" "the2" "the1":"the4" "psi1" "psi2" "phi" ];

t3 = diag(t2);

t3[3,1] = t3[4,2] = "the5" ;

_P = (tri2sym)t3;

semcos _J * _A_imi * _P;


LISREL Model

The vector y contains the four endogenous manifest variables y1 = V1, . . . , y4 = V4, and the vector x contains the exogenous manifest variables x1 = V5 and x2 = V6. The vector ε contains the error variables ε1 = E1, . . . , ε4 = E4 corresponding to y, and the vector δ contains the error variables δ1 = E5 and δ2 = E6 corresponding to x. The vector η contains the endogenous latent variables (factors) η1 = F1 and η2 = F2, while the vector ξ contains the exogenous latent variable (factor) ξ1 = F3. The vector ζ contains the errors ζ1 = D1 and ζ2 = D2 in the equations (disturbance terms) corresponding to η. The path diagram corresponds to the following set of structural equations of the LISREL model:

y1 = 1.0η1 + ε1,

y2 = .833η1 + ε2,

y3 = 1.0η2 + ε3,

y4 = .833η2 + ε4,

x1 = 1.0ξ1 + δ1,

x2 = λξ1 + δ2,

η1 = γ1ξ1 + ζ1,

η2 = βη1 + γ2ξ1 + ζ2.

This gives the matrices Λy, Λx, B, Γ, and Φ in the LISREL model:

\Lambda_y = \begin{pmatrix} 1. & o \\ .833 & o \\ o & 1. \\ o & .833 \end{pmatrix} ,
\quad
\Lambda_x = \begin{pmatrix} 1. \\ \lambda \end{pmatrix} ,
\quad
B = \begin{pmatrix} o & o \\ \beta & o \end{pmatrix} ,
\quad
\Gamma = \begin{pmatrix} \gamma_1 \\ \gamma_2 \end{pmatrix} ,

\Theta^2_\epsilon = \begin{pmatrix} \theta_1 & o & \theta_5 & o \\ o & \theta_2 & o & \theta_5 \\ \theta_5 & o & \theta_1 & o \\ o & \theta_5 & o & \theta_2 \end{pmatrix} ,
\quad
\Theta^2_\delta = \begin{pmatrix} \theta_3 & o \\ o & \theta_4 \end{pmatrix} ,

\Psi = \begin{pmatrix} \psi_1 & o \\ o & \psi_2 \end{pmatrix} ,
\quad
\Phi = ( \phi ) .

The sem function does not provide a LISREL model input specification. However, any model that can be specified by the LISREL model can also be specified by using the COSAN, EQS, or RAM model specifications.


1.2.3 Model Specification of Confirmatory Factor Analysis

For example, to specify a confirmatory second-order factor analysis model

S = F_1 F_2 P_2 F_2^T F_1^T + F_1 U_2^2 F_1^T + U_1^2 ,

with m1 = 3 first-order factors, m2 = 2 second-order factors, and n = 9 variables and the following matrix pattern,

F_1 = \begin{pmatrix}
X_1 & 0 & 0 \\
X_2 & 0 & 0 \\
X_3 & 0 & 0 \\
0 & X_4 & 0 \\
0 & X_5 & 0 \\
0 & X_6 & 0 \\
0 & 0 & X_7 \\
0 & 0 & X_8 \\
0 & 0 & X_9
\end{pmatrix} ,
\quad
U_1 = \mathrm{diag}(U_1, U_2, \ldots, U_9) ,

F_2 = \begin{pmatrix}
Y_1 & 0 \\
Y_1 & Y_2 \\
0 & Y_2
\end{pmatrix} ,
\quad
P_2 = \begin{pmatrix} P & 0 \\ 0 & P \end{pmatrix} ,
\quad
U_2 = \mathrm{diag}(V_1, V_2, V_3) ,

you can use the following semcos statement:

f1 = [ "x1" 0. 0. ,

"x2" 0. 0. ,

"x3" 0. 0. ,

0. "x4" 0. ,

0. "x5" 0. ,

0. "x6" 0. ,

0. 0. "x7" ,

0. 0. "x8" ,

0. 0. "x9" ];

f2 = [ "x10", "x11", "x12" ];

p = 1.;

uv1 = [ "u11" : "u19" ];

u1 = diag(uv1);

u2 = ide(3);

semcos f1 * f2 * p + f1 * u2 + u1;

The matrix pattern includes several equality constraints. Two loadings in the first and second factor of F2 (parameter names Y1 and Y2) and the two factor correlations in the diagonal of matrix P2 (parameter name P) are constrained to be equal. There are many other ways to specify the same model.

For example, you can specify the confirmatory second-order factor analysis model

S = F_1 F_2 P_2 F_2^T F_1^T + F_1 U_2^2 F_1^T + U_1^2 ,

using the following RAM model statement:

ram = [ 1 1 10 "X1", 1 2 10 "X2",

1 3 10 "X3", 1 4 11 "X4",

1 5 11 "X5", 1 6 11 "X6",

1 7 12 "X7", 1 8 12 "X8",


[Path diagram not reproduced. The nine manifest variables V1–V9 (unique variances U1–U9) load on three first-order factors via the loadings X1–X9; the first-order factors load on two second-order factors via the loadings Y1 and Y2, with disturbances V1–V3 and second-order factor variances P.]

Figure 1.3: Path Diagram of Second-Order Factor Analysis Model

1 9 12 "X9", 1 10 13 "X10",

1 11 13 "X11", 1 12 13 "X12",

2 1 1 "U11", 2 2 2 "U12",

2 3 3 "U13", 2 4 4 "U14",

2 5 5 "U15", 2 6 6 "U16",

2 7 7 "U17", 2 8 8 "U18",

2 9 9 "U19", 2 10 10 1. ,

2 11 11 1. , 2 12 12 1. ,

2 13 13 1. ];

optns = [ "data" "cov",

"nobs" 213,

"pall" ];

gof = sem(thur,ram,optns);

The confirmatory second-order factor analysis model corresponds to the path diagram in Figure 1.3.


1.2.4 Assessment of Fit

This section contains a collection of formulas used in computing indices to assess the goodness of fit by the sem function. The following notation is used:

• N for the sample size

• n for the number of manifest variables

• t for the number of parameters to estimate

• NM := \begin{cases} N - 1, & \text{if a corrected CORR or COV matrix is analyzed or the intercept variable is not used in the model;} \\ N, & \text{if an uncorrected UCORR or UCOV matrix is analyzed and the intercept variable is not used in the model.} \end{cases}

• df for the number of degrees of freedom

• γ = X for the t vector of optimal parameter estimates

• S = (s_ij) for the n × n input COV, CORR, UCOV, or UCORR matrix

• C = (c_ij) = Σ = Σ(γ) for the predicted model matrix, i.e. the best fit of the model to the sample covariance matrix S based on the estimated parameters γ

• W for the weight matrix (W = I for ULS, W = S for default GLS, W = C for ML estimates)

• U for the n² × n² asymptotic covariance matrix of sample covariances

• Φ(x|λ, df) for the cumulative distribution function of the noncentral chi-squared distribution with noncentrality parameter λ

The following notation is for indices that allow testing hierarchical models by a χ2 difference test:

• f0 for the function value of a baseline or null model

• df0 for the degrees of freedom of a baseline or null model

• fmin = F for the function value of the fitted model

• dfmin = df for the degrees of freedom of the fitted model

1.2.5 Residuals

The sem function computes four types of residuals and writes them to the OUTSTAT= data set:

• Simple Unnormed Residuals:

Res = S−C , Resij = sij − cij

The unnormed residuals are printed whenever the ALL, the PRINT, or the RESIDUAL option is specified.


• Variance Standardized Residuals:
The variance standardized residuals

VSRes_{ij} = \frac{s_{ij} - c_{ij}}{\sqrt{s_{ii} s_{jj}}}

are printed when

– the PALL, the PRINT, or the RESIDUAL option is specified and METHOD= NONE, ULS, or DWLS

– RESIDUAL= VARSTAND is specified

The variance standardized residuals are equal to those computed by the EQS 3 program (Bentler 1989).

• Asymptotically Standardized Residuals:

ASRes_{ij} = \frac{s_{ij} - c_{ij}}{\sqrt{c_{ij,ij}}} ,

where c_{ij,ij} is the diagonal element of the n² × n² covariance matrix of residuals,

c_{ij,ij} = \mathrm{diag}(U - J \mathrm{Cov}(\gamma) J^T)_{ij} .

U is the asymptotic covariance matrix of sample covariances, J is the n² × t Jacobian matrix dΣ/dγ, and Cov(γ) is the t × t asymptotic covariance matrix of parameter estimates (the inverse of the information matrix). Asymptotically standardized residuals are not computed if the information matrix is singular or not positive definite. Asymptotically standardized residuals are printed when one of the following conditions is met:

– The information matrix is positive definite and not singular.

– The PALL, the PRINT, or the RESIDUAL option is specified, METHOD= ML, GLS, or WLS is specified, and the expensive information and Jacobian matrices are computed for some other reason (see below).

– RESIDUAL= ASYSTAND is specified.

The asymptotically standardized residuals are equal to those computed by the LISREL 7 program (Joreskog & Sorbom 1988) except for the denominator NM in the definition of matrix U.

• Normalized Residuals:

NRes_{ij} = \frac{s_{ij} - c_{ij}}{\sqrt{u_{ij,ij}}} ,

where u_{ij,ij} is the diagonal element of the n² × n² asymptotic covariance matrix U of sample covariances, which is defined for method

– GLS: u_{ij,ij} = \frac{1}{NM} (s_{ii} s_{jj} + s_{ij}^2)

– ML: u_{ij,ij} = \frac{1}{NM} (c_{ii} c_{jj} + c_{ij}^2)

– WLS: u_{ij,ij} = w_{ij,ij}

Normalized residuals are printed when one of the following conditions is met:

– The PALL, the PRINT, or the RESIDUAL option is specified, METHOD= ML, GLS, or WLS is specified, and the expensive information and Jacobian matrices are not computed for some other reason (see below).

– RESIDUAL= NORM is specified.


The normalized residuals are equal to those computed by the LISREL VI program (Joreskog & Sorbom 1985) except for the definition of the denominator NM in matrix U.
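The following lines are only a minimal illustrative sketch (Python with numpy, not CMAT code) of the variance standardized and normalized residuals defined above; the function and variable names are hypothetical, and S, C, and NM are assumed to be given.

import numpy as np

def standardized_residuals(S, C, NM, method="GLS"):
    # simple unnormed residuals
    R = S - C
    # variance standardized residuals: (s_ij - c_ij) / sqrt(s_ii * s_jj)
    vs = R / np.sqrt(np.outer(np.diag(S), np.diag(S)))
    # normalized residuals with the GLS or ML denominator u_ij,ij
    if method == "GLS":
        U = (np.outer(np.diag(S), np.diag(S)) + S**2) / NM
    else:  # "ML"
        U = (np.outer(np.diag(C), np.diag(C)) + C**2) / NM
    nr = R / np.sqrt(U)
    return R, vs, nr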

For estimation methods that are not BGLS estimation methods (Browne 1982, 1984), such as METHOD= ULS or DWLS, the assumption of an asymptotic covariance matrix U of sample covariances doesn't seem to be appropriate. In this case, the normalized residuals should be replaced by the more relaxed variance standardized residuals. Computation of asymptotically standardized residuals requires computing the Jacobian and information matrices. This is computationally very expensive and is done only if the Jacobian matrix has to be computed for some other reason, that is, if at least one of the following items is true:

• either default, PRINT, or ALL printed output is requested, and neither the NOMOD nor NOSTDERR option is specified

• either the MODIFICATION (included in ALL), PCOVES, STDERR (included in default, PRINT, and PALL output) option is requested or RESIDUAL= ASYSTAND is specified

• either the LEVMAR, NRRIDG, NEWRAP, or TRUREG optimization technique is used

• an OUTRAM= data set is specified without using the NOSTDERR option

• an OUTEST= data set is specified without using the NOSTDERR option

Since normalized residuals use an overestimate of the asymptotic covariance matrix of residuals (diagonal of U), the normalized residuals cannot be larger than the asymptotically standardized residuals (which use the diagonal of U - J Cov(γ) J^T).

Together with the residual matrices, the values of the average residual, the average off-diagonal residual, and the rank order of the largest values are printed. The distribution of the normalized and standardized residuals is printed also.

1.2.6 Goodness-of-Fit Indices

The sem function computes three groups of goodness-of-fit indices:

• The following four items are computed for all five kinds of estimation: ULS, GLS, ML, WLS, and DWLS. The GFI, AGFI, and RMR are computed as in the LISREL 7 program of Joreskog & Sorbom (1988, [439]). The GFI and AGFI should be between zero and one. If the GFI or AGFI are negative or much larger than one, this probably indicates that the data do not fit the model.

1. Goodness-of-Fit Index:
The goodness-of-fit index is for ULS, GLS, and ML estimation

GFI = 1 - \frac{\mathrm{Tr}\left[ (W^{-1}(S - C))^2 \right]}{\mathrm{Tr}\left[ (W^{-1} S)^2 \right]} ,

but for WLS and DWLS estimation, it is

GFI = 1 - \frac{\mathrm{Vec}(s_{ij} - c_{ij})^T W^{-1} \mathrm{Vec}(s_{ij} - c_{ij})}{\mathrm{Vec}(s_{ij})^T W^{-1} \mathrm{Vec}(s_{ij})} ,

where W = diag for DWLS estimation. For a constant weight matrix W, the goodness-of-fit index is 1 minus the ratio of the minimum function value and the function value before any model has been fitted.


2. Adjusted Goodness-of-Fit Index:
The AGFI is the GFI adjusted for the degrees of freedom of the model,

AGFI = 1 - \frac{n(n + 1)}{2\,df} (1 - GFI) .

The AGFI corresponds to the GFI in replacing the total sum of squares by the mean sum of squares.
Caution:

(a) For large n and small df the AGFI can take arbitrarily large negative values. E.g., for GFI = .90, n = 20, df = 2:

AGFI = 1 - \frac{20 \cdot 21}{4} \cdot 0.1 = 1 - 10.5 = -9.5

(b) AGFI is not defined for the saturated model, due to division by df = 0.

(c) AGFI is not sensitive enough for losses in df.

For more information see Mulaik et al. (1989).

3. Root Mean Square Residual:
The RMR is the root of the mean of the squared residuals:

RMR = \sqrt{ \frac{2}{n(n + 1)} \sum_{i}^{n} \sum_{j}^{i} (s_{ij} - c_{ij})^2 } .

4. Parsimonious Goodness-of-Fit Index:
The PGFI (Mulaik et al., 1989, [634]) is a modification of the GFI that takes the parsimony of the model into account,

PGFI = \frac{df_{min}}{df_0} \, GFI .

The PGFI uses the same parsimonious factor as the parsimonious normed Bentler-Bonett index (James, Mulaik, & Brett, 1982, [421]).

• The following nineteen items are transformations of the overall χ² value and in general depend on the sample size N. Some indices, e.g. the Bentler & Bonett indices, allow you to test hierarchical models by a χ² difference test. These indices are not computed for unweighted least-squares estimates or diagonally weighted least-squares estimates. A short illustrative computation of several of these indices is given after the list.

1. Uncorrected χ²:
The overall χ² measure is the optimum function value F multiplied by N - 1 if a CORR or COV matrix is analyzed or multiplied by N if a UCORR or UCOV matrix is analyzed. This gives the likelihood ratio test statistic for the null hypothesis that the predicted matrix C has the specified model structure against the alternative that C is unconstrained. The χ² test is valid only if the observations are independent and identically distributed, the analysis is based on the nonstandardized sample covariance matrix S, and the sample size N is sufficiently large (Browne, 1982, [120]; Bollen, 1989b, [85]; Joreskog & Sorbom, 1985, [439]). For ML and GLS estimation, the variables must also have an approximately multivariate normal distribution. The notation Prob>Chi**2 means "the probability under the null hypothesis of obtaining a greater χ² statistic than that observed."

χ² = NM ∗ F

where F is the function value at the minimum.
SAS Code:
chisq=fit*(n-1); pchisq=1-probchi(chisq,df);


2. χ² Value of the Null Model:
The χ²_0 value of the baseline (or null) model,

χ²_0 = NM ∗ f_0 ,

and the corresponding degrees of freedom df_0 can be used (in large samples) to evaluate the gain of explanation by fitting the specific model (Bentler, 1989).

3. RMSEA Index:
The Steiger & Lind (1980, [782]) RMSEA (root mean squared error of approximation) coefficient is

ε_α = \sqrt{ \max\left( \frac{F}{df} - \frac{1}{NM} ,\ 0 \right) } .

The lower and upper limits of the confidence interval are computed using the cumulative distribution function Φ(x|λ, df) of the noncentral chi-squared distribution, with x = NM ∗ F, λ_L satisfying Φ(x|λ_L, df) = 1 - α/2, and λ_U satisfying Φ(x|λ_U, df) = α/2:

(ε_{αL} ; ε_{αU}) = \left( \sqrt{ \frac{λ_L}{NM \cdot df} } \ ;\ \sqrt{ \frac{λ_U}{NM \cdot df} } \right) .

See Browne & Du Toit (1992) for more details. The size of the confidence interval is defined by the option ALPHARMS= α, 0 ≤ α ≤ 1. The default is α = .1, which corresponds to a 90% confidence interval [.05, .95].

4. Probability for Test of Close Fit:
The traditional exact χ² test hypothesis H_0: ε_α = 0 is replaced by the null hypothesis of close fit (Browne & Cudeck, 1993, [124]) H_0: ε_α ≤ 0.05, and the exceedance probability P is computed:

P = 1 - Φ(x|λ*, df) ,

where x = NM ∗ F and λ* = 0.05² ∗ NM ∗ df. The null hypothesis of close fit is rejected if P is smaller than a prespecified level (P < .05 or P < .01).

5. Expected Cross Validation Index:
For GLS and WLS, the estimator c of the ECVI (see [124]) is linearly related to AIC (Browne & Cudeck, 1993),

c = F(S, C) + \frac{2t}{NM} .

For ML estimation, c_{ML} is used,

c_{ML} = F_{ML}(S, C) + \frac{2t}{NM - n - 1} .

The confidence interval (c_L ; c_U) for c is computed using the cumulative distribution function Φ(x|λ, df) of the noncentral chi-squared distribution,

(c_L ; c_U) = \left( \frac{λ_L + nnt}{NM} \ ;\ \frac{λ_U + nnt}{NM} \right)

with nnt = n(n + 1)/2 + t, x = NM ∗ F, Φ(x|λ_L, df) = 1 - α/2, and Φ(x|λ_U, df) = α/2.
The confidence interval (c*_L ; c*_U) for c_{ML} is

(c*_L ; c*_U) = \left( \frac{λ*_L + nnt}{NM - n - 1} \ ;\ \frac{λ*_U + nnt}{NM - n - 1} \right)

with nnt = n(n + 1)/2 + t, x = (NM - n - 1) ∗ F, Φ(x|λ*_L, df) = 1 - α/2, and Φ(x|λ*_U, df) = α/2.
See also Browne & Cudeck (1993). The size of the confidence interval is defined by the option ALPHAECV= α, 0 ≤ α ≤ 1. The default is α = .1, which corresponds to a 90% confidence interval [.05, .95].

6. Comparative Fit Index:

CFI = 1 - \frac{ \max(NM ∗ f_{min} - df_{min},\ 0) }{ \max(NM ∗ f_0 - df_0,\ 0) } .

See Bentler (1989, [57]).

7. Adjusted χ² Value:
If the variables are n-variate elliptic rather than normally distributed and have significant amounts of multivariate kurtosis (leptokurtic or platykurtic), the χ² value can be adjusted to

χ²_{ell} = \frac{χ²}{η_2} ,

where η_2 is the multivariate relative kurtosis coefficient. See Browne (1982, (1.7.6), [120]).

8. Normal Theory Reweighted LS χ² Value:
This index is printed only if METHOD= ML. Instead of the function value F_{ML}, the reweighted goodness-of-fit criterion F_{GWLS} is used,

χ²_{GWLS} = NM ∗ F_{GWLS} ,

where F_{GWLS} is the value of the criterion at the minimum.

9. Akaike's Information Criterion:
This is a general criterion for estimating the best number of parameters to include in a model (Akaike, 1974; Akaike, 1987). The number of parameters that yields the smallest value of AIC is considered best.

AIC = χ² - 2\,df .

Note: Some authors (e.g. Mulaik et al., 1989) replace -df by t, i.e. do not subtract n(n + 1)/2. This always results in nonnegative AIC values.

10. Consistent Akaike's Information Criterion (CAIC):
This is another criterion, similar to AIC, for determining the best number of parameters (Bozdogan, 1987). The number of parameters that yields the smallest value of CAIC is considered best. CAIC is preferred by some people to AIC or the χ² test,

CAIC = χ² - (\ln(N) + 1)\,df .

11. Schwarz's Bayesian Criterion (SBC):
This is another criterion, similar to AIC, for determining the best number of parameters (Schwarz, 1978; Sclove, 1987). The number of parameters that yields the smallest value of SBC is considered best. SBC is preferred by some people to AIC or the χ² test,

SBC = χ² - \ln(N)\,df .

12. McDonald's Measure of Centrality:

CENT = \exp\left( - \frac{χ² - df}{2N} \right) .

See McDonald (1989, [596]).


13. Parsimonious Normed Fit Index:
The PNFI (James, Mulaik, & Brett, 1982, [421]) is a modification of Bentler-Bonett's normed fit index that takes parsimony of the model into account,

PNFI = \frac{df_{min}}{df_0} \cdot \frac{f_0 - f_{min}}{f_0} .

The PNFI uses the same parsimonious factor as the parsimonious GFI of Mulaik et al. (1989).

14. Z-Test of Wilson & Hilferty (1931):
The Z-Test of Wilson & Hilferty assumes an n-variate normal distribution:

Z = \frac{ \sqrt[3]{χ²/df} - \left( 1 - \frac{2}{9\,df} \right) }{ \sqrt{ \frac{2}{9\,df} } } .

See McArdle (1988, [586]) for an application of the Z-Test. See also Bishop, Fienberg, & Holland (1977, p. 527, [72]).
SAS Code:
z=((chisq/df)##(1/3) - (1-2/(9*df)))/sqrt(2/(9*df));

15. Bentler & Bonett's (1980) Nonnormed Coefficient:

ρ = \frac{ f_0/df_0 - f_{min}/df_{min} }{ f_0/df_0 - 1/NM } .

See also Tucker & Lewis (1973).

16. Bentler & Bonett's (1980) Normed Coefficient:

∆ = \frac{ f_0 - f_{min} }{ f_0 } .

There is always 0 ≤ ∆ ≤ 1; if f_{min} = 0 then ∆ = 1. ∆ measures the "proportionate reduction in the fitting function or chi-square values when moving from the baseline to the maintained model" (Bollen, 1989b, p. 270). Adding parameters reduces f_{min} and thereby increases ∆. If H_0 is true, function values grow smaller as N gets larger, so ∆ looks better as N increases. Mulaik et al. (1989) recommend the parsimonious weighted form PNFI.

17. Bollen's (1986) Normed Index ρ_1:

ρ_1 = \frac{ \frac{f_0}{df_0} - \frac{f_{min}}{df_{min}} }{ \frac{f_0}{df_0} } .

There is always ρ_1 ≤ 1, but ρ_1 < 0 is unlikely in practice. See the discussion in Bollen (1989a).

18. Bollen's (1989a) Nonnormed Index ∆_2:

∆_2 = \frac{ f_0 - f_{min} }{ f_0 - \frac{df}{NM} } .

This is a modification of Bentler & Bonett's ∆ that uses df and "lessens the dependence" on N. See the discussion in Bollen (1989b). ∆_2 is identical to the IFI2 index of Mulaik et al. (1989).


19. Hoelter's (1983) Critical N Index:

CN = \frac{χ²_{crit}}{F} + 1 ,

where χ²_{crit} is the critical chi-square value for the given df degrees of freedom and probability α = .05, and F is the value of the estimation criterion (minimization function); see Bollen (1989b, p. 277). Hoelter (1983) suggests that CN should be at least 200. However, Bollen (1989b) notes that the CN value may lead to an overly pessimistic assessment of fit for small samples.
SAS Code:
cnhoelt=cinv(.95,df)/fit + 1;

• The following four items are measures of the squared multiple correlation for manifest and endogenous variables and are computed for all five estimation methods: ULS, GLS, ML, WLS, and DWLS. These coefficients are computed as in the LISREL VI program of Joreskog & Sorbom (1988, [439]). The DETAE, DETSE, and DETMV determination coefficients are intended to be global means of the squared multiple correlations for different subsets of model equations and variables. These coefficients are printed only when the PDETERM option is specified and a RAM or EQS model is specified.

1. R² Values Corresponding to Endogenous Variables:

R²_i = 1 - \frac{ \mathrm{var}(ζ_i) }{ \mathrm{var}(η_i) } .

2. Total Determination of All Equations:

DETAE = 1 - \frac{ \det(Θ, Ψ) }{ \det(\mathrm{Cov}(y, x, η)) } .

3. Total Determination of the Structural Equations:

DETSE = 1 - \frac{ \det(Ψ) }{ \det(\mathrm{Cov}(η)) } .

4. Total Determination of the Manifest Variables:

DETMV = 1 - \frac{ \det(Θ) }{ \det(S) } .

Caution: In the LISREL program, the structural equations are defined by specifying the BETA matrix. With the sem function, a structural equation has a dependent left-hand-side variable that appears at least once on the right-hand side of another equation, or the equation has at least one right-hand-side variable that is the left-hand-side variable of another equation. Therefore, the sem function sometimes identifies more equations as structural equations than the LISREL program does.


1.2.7 Measures of Multivariate Kurtosis

In many applications, the manifest variables are not even approximately multivariate normal. If this happens to be the case with your data set, the default generalized least-squares and maximum-likelihood estimation methods are not appropriate, and you should compute the parameter estimates and their standard errors by an asymptotically distribution-free method, such as the WLS estimation method. If your manifest variables are multivariate normal, then they have a zero relative multivariate kurtosis and all marginal distributions have zero kurtosis (Browne, 1982, [120]). If your DATA= data set contains raw data, the sem function computes univariate skewness and kurtosis and a set of multivariate kurtosis values. By default, the values of univariate skewness and kurtosis are corrected for bias, but using the BIASKUR option allows you to compute the uncorrected values also.

Corrected Variance for Variable z_j:

σ²_j = \frac{1}{N - 1} \sum_{i}^{N} (z_{ij} - \bar z_j)^2

Corrected Univariate Skewness for Variable z_j:

γ_{1(j)} = \frac{N}{(N - 1)(N - 2)} \, \frac{ \sum_{i}^{N} (z_{ij} - \bar z_j)^3 }{ σ_j^3 }

Uncorrected Univariate Skewness for Variable z_j:

γ_{1(j)} = \frac{ N \sum_{i}^{N} (z_{ij} - \bar z_j)^3 }{ \sqrt{ N \left[ \sum_{i}^{N} (z_{ij} - \bar z_j)^2 \right]^3 } }

Corrected Univariate Kurtosis for Variable z_j:

γ_{2(j)} = \frac{ N(N + 1) }{ (N - 1)(N - 2)(N - 3) } \, \frac{ \sum_{i}^{N} (z_{ij} - \bar z_j)^4 }{ σ_j^4 } - \frac{ 3(N - 1)^2 }{ (N - 2)(N - 3) }

Uncorrected Univariate Kurtosis for Variable z_j:

γ_{2(j)} = \frac{ N \sum_{i}^{N} (z_{ij} - \bar z_j)^4 }{ \left[ \sum_{i}^{N} (z_{ij} - \bar z_j)^2 \right]^2 } - 3
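The following lines are only a minimal illustrative sketch (Python with numpy, not CMAT code) of the corrected and uncorrected univariate measures above for one data column z; the function and variable names are hypothetical.

import numpy as np

def univariate_skew_kurt(z):
    N = len(z)
    d = z - z.mean()
    s2 = np.sum(d ** 2) / (N - 1)                       # corrected variance
    skew_c = N / ((N - 1) * (N - 2)) * np.sum(d ** 3) / s2 ** 1.5
    skew_u = N * np.sum(d ** 3) / np.sqrt(N * np.sum(d ** 2) ** 3)
    kurt_c = (N * (N + 1) / ((N - 1) * (N - 2) * (N - 3)) * np.sum(d ** 4) / s2 ** 2
              - 3.0 * (N - 1) ** 2 / ((N - 2) * (N - 3)))
    kurt_u = N * np.sum(d ** 4) / np.sum(d ** 2) ** 2 - 3.0
    return s2, skew_c, skew_u, kurt_c, kurt_u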

Mardia's Multivariate Kurtosis:

γ_2 = \frac{1}{N} \sum_{i}^{N} \left[ (z_i - \bar z)^T S^{-1} (z_i - \bar z) \right]^2 - n(n + 2)

Relative Multivariate Kurtosis:

η_2 = \frac{ γ_2 + n(n + 2) }{ n(n + 2) }

Normalized Multivariate Kurtosis:

κ_0 = \frac{ γ_2 }{ \sqrt{ 8 n (n + 2) / N } }


Mardia Based Kappa:

κ_1 = \frac{ γ_2 }{ n(n + 2) }

Mean Scaled Univariate Kurtosis:

κ_2 = \frac{1}{3n} \sum_{j}^{n} γ_{2(j)}

Adjusted Mean Scaled Univariate Kurtosis:

κ_3 = \frac{1}{3n} \sum_{j}^{n} γ^*_{2(j)}

with

γ^*_{2(j)} = \begin{cases} γ_{2(j)} , & \text{if } γ_{2(j)} > \frac{-6}{n + 2} \\ \frac{-6}{n + 2} , & \text{otherwise} \end{cases}

If variable Z_j is normally distributed, the uncorrected univariate kurtosis γ_{2(j)} is zero. If Z has an n-variate normal distribution, Mardia's multivariate kurtosis γ_2 is zero too. A variable Z_j is called leptokurtic if it has a significant positive value of γ_{2(j)} and is called platykurtic if it has a significant negative value of γ_{2(j)}. The values of κ_1, κ_2, and κ_3 should not be smaller than a lower bound (Bentler, 1989, [57]):

κ ≥ \frac{-2}{n + 2} .

The sem function prints a message if this happens.
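The following lines are only a minimal illustrative sketch (Python with numpy, not CMAT code) of Mardia's multivariate kurtosis and the derived relative, normalized, and kappa measures for an N × n raw data matrix Z; the biased (1/N) sample covariance matrix is assumed here, and all names are hypothetical.

import numpy as np

def multivariate_kurtosis(Z):
    N, n = Z.shape
    d = Z - Z.mean(axis=0)
    S = d.T @ d / N                                # sample covariance matrix (1/N form)
    q = np.einsum('ij,jk,ik->i', d, np.linalg.inv(S), d)   # squared Mahalanobis distances
    gamma2 = np.mean(q ** 2) - n * (n + 2)         # Mardia's multivariate kurtosis
    eta2 = (gamma2 + n * (n + 2)) / (n * (n + 2))  # relative multivariate kurtosis
    kappa0 = gamma2 / np.sqrt(8.0 * n * (n + 2) / N)   # normalized multivariate kurtosis
    kappa1 = gamma2 / (n * (n + 2))                # Mardia based kappa
    return gamma2, eta2, kappa0, kappa1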

If weighted least-squares estimates (METHOD=WLS or ADF) are specified and the weight matrix is computed from an input raw data set, the sem function computes two further measures of multivariate kurtosis:

Multivariate Mean Kappa:

κ_4 = \frac{1}{m} \sum_{i}^{n} \sum_{j}^{i} \sum_{k}^{j} \sum_{l}^{k} κ_{ij,kl} - 1 ,

where

κ_{ij,kl} = \frac{ s_{ij,kl} }{ s_{ij} s_{kl} + s_{ik} s_{jl} + s_{il} s_{jk} } ,

and m = n(n + 1)(n + 2)(n + 3)/24 is the number of elements in these vectors (Bentler, 1989, [57]).

Multivariate Least-Squares Kappa:

κ_5 = \frac{ s_4^T s_2 }{ s_2^T s_2 } - 1 ,

where

s_{ij,kl} = \frac{1}{N} \sum_{r=1}^{N} (z_{ri} - \bar z_i)(z_{rj} - \bar z_j)(z_{rk} - \bar z_k)(z_{rl} - \bar z_l) ,

s_4 is the vector of the s_{ij,kl}, and s_2 is the vector of the elements in the denominator of κ (Bentler, 1985).

The occurrence of significant nonzero values of Mardia's multivariate kurtosis γ_2 and significant amounts of some of the univariate kurtosis values γ_{2(j)} indicate that your variables are not multivariate normally distributed. Violating the multivariate normality assumption in (default) generalized least-squares and maximum-likelihood estimation usually leads to wrong approximate standard errors and incorrect fit statistics based on the χ² value. In general, the parameter estimates are more stable against violations of the normal distribution assumption. For more details see Browne (1974, 1982, 1984).


1.2.8 Initial Estimates

Each optimization technique requires a set of initial values X_0 for the parameters. To avoid local optima, the initial values should be as close as possible to the globally optimal solution. You should check for local optima by running the analysis with several different sets of initial values; the RANDOM= option in the sem function statement is useful in this regard.

RAM and EQS There are different default estimation methods available with the sem function for initial values of parameters in a linear structural equation model specified by a RAM or EQS model statement, depending on the form of the specified model:

• two-stage least-squares estimation;

• instrumental variable method (Hagglund, 1982; Jennrich, 1987, [428]);

• approximative factor analysis method;

• ordinary least-squares estimation;

• method by McDonald & Hartmann (1992,[598])

FACTOR For default (exploratory) factor analysis, the sem function computes initial estimates for factor loadings and unique variances by an algebraic method of approximate factor analysis. If you use a MATRIX statement together with a FACTOR model specification, initial values will be computed with the method by McDonald & Hartmann (1992) if possible. This method of computing initial values works better if you scale the factors by setting the factor correlations to one rather than setting the loadings of the reference variables equal to one. If neither of the two methods seems to be appropriate, the initial values are set by the START= option.

COSAN For the more general COSAN model, there is no default estimation method for the initial values. In this case, the START= or RANDOM= option can be used to set otherwise unassigned initial values.

Poor initial values can cause convergence problems, especially with maximum-likelihood estimation. You should not use a constant initial value for all parameters since this would produce a singular predicted model matrix in the first iteration. Sufficiently large positive diagonal elements in the central matrices of each model matrix term will provide a nonnegative definite initial predicted model matrix. If maximum-likelihood estimation fails to converge, it may help to use METHOD=LSML, which uses the final estimates from an unweighted least-squares analysis as initial estimates for maximum-likelihood. Or you can fit a slightly different but better-behaved model and produce an OUTRAM= data set, which can then be modified in accordance with the original model and used as an INRAM= data set to provide initial values for another analysis.

If you are analyzing a covariance or scalar product matrix, be sure to take into account the scales of the variables. The default initial values may be inappropriate when some variables have extremely large or small variances.

1.2.9 Automatic Variable Selection

If a linear structural equation model using the RAM or EQS model specification (or an INRAM= data set specifying a RAM or EQS model) does not use all of the manifest variables given in the input data set, the sem function automatically deletes those manifest variables not used in the model. In some special circumstances, the automatic variable selection performed for the RAM and EQS statements may be inappropriate, for example, if you are interested in modification indices connected to some of the variables that are not used in the model. You can include such manifest variables as exogenous variables in the analysis by specifying constant zero coefficients.


The following example illustrates the EQS specification of such constant zero coefficients. For example, the first three steps in a stepwise regression analysis of the Werner Blood Chemistry data (Joreskog & Sorbom, 1988, p. 111, [439]) can be performed as follows:

semeqs Y = 0. X1 + 0. X2 + 0. X3 + 0. X4 + 0. X5 + 0. X6 + 0. X7 + E;

semvar E = VAR;

free semcov;

optns = [ "data" "cov",

"anal" "cor",

"meth" "gls",

"nobs" 180,

"primat" ,

"pall" ];

< gof,est > = sem(blood,"semeqs",optns);

semeqs Y = G1 X1 + 0. X2 + 0. X3 + 0. X4 + 0. X5 + 0. X6 + 0. X7 + E;

semvar E = VAR;

optns = [ "data" "cov",

"anal" "cor",

"meth" "gls",

"nobs" 180,

"pall" ];

< gof,est > = sem(blood,"semeqs",optns);

semeqs Y = G1 X1 + 0. X2 + 0. X3 + 0. X4 + 0. X5 + G6 X6 + 0. X7 + E;

semvar E = VAR;

optns = [ "data" "cov",

"anal" "cor",

"meth" "gls",

"nobs" 180,

"pall" ];

< gof,est > = sem(blood,"semeqs",optns);

semeqs Y = G1 X1 + 0. X2 + 0. X3 + 0. X4 + 0. X5 + G6 X6 + G7 X7 + E;

semvar E = VAR;

optns = [ "data" "cov",

"anal" "cor",

"meth" "gls",

"nobs" 180,

"pall" ];

< gof,est > = sem(blood,"semeqs",optns);

The RAM model statement requires that the first n variable numbers in the path diagram correspond to the numbers of the n manifest variables in the correlation or covariance matrix after it has been reduced or reordered by the VAR statement. A model specified by a RAM statement will use automatic variable reduction if it does not contain the numbers of all manifest variables. If you specify the model by reading an INRAM= data set, you can also use the VAR statement or automatic variable reduction. In this case automatic variable reduction will be done in two steps.

1. The names of the manifest variables in the DATA= data set and the names used in the TYPE = 'VARNAME' observations of the INRAM= data set are compared, and those that are not listed in the INRAM= data set are deleted from the analysis.


2. Those manifest variables listed in TYPE = 'VARNAME' observations in the INRAM= data set that are not used in the model specified by TYPE = 'ESTIM' observations are also deleted from the analysis.

Using the COSAN statement does not automatically delete those variables from the analysis that are not used in the model. You can use the output of the predetermined values in the predicted model matrix (PREDET option) to detect unused variables. Variables that are not used in the model are indicated by zero rows and columns in the predetermined predicted model matrix.

1.2.10 Exogenous Manifest Variables

If there are exogenous manifest variables in the linear structural equation model, then there is a one-to-one relationship between the given sample covariances and corresponding estimates in the central model matrix (P or Φ). In general, using exogenous manifest variables reduces the degrees of freedom since the corresponding sample correlations or covariances are not part of the exogenous information provided for the parameter estimation. See Counting the Degrees of Freedom for more information.

If you use a RAM or EQS model statement, or such a model is recognized in an INRAM= data set, those elements in the central model matrices that correspond to the exogenous manifest variables are reset to the given sample values in S after computing covariances or correlations within the current BY group. The COSAN statement does not automatically set the covariances in the central model matrices that correspond to manifest exogenous variables.

You can use the output of the predetermined values in the predicted model matrix (PREDET option), which indicates the differences between constant values in the predicted model matrix and the data matrix that is analyzed. This output shows which of the manifest variables are exogenous variables and helps you to set the corresponding locations of the central model matrices with their covariances. If the analyzed matrix is a correlation matrix (containing constant elements of 1 in the diagonal) and the model generates a predicted model matrix with q constant (rather than variable) elements in the diagonal, the degrees of freedom are reduced by q. The output generated by the PREDET option shows those constant diagonal positions. You can overwrite this correction of the degrees of freedom by using the DFREDUCE= option.

The following two simple examples show how different the results of the sem function can be if manifest variables are considered either as endogenous or as exogenous variables. In both examples a correlation matrix S is tested against an identity model matrix C, i.e. no parameter is estimated. The three runs of the first example (specified by the COSAN, EQS, and RAM models) consider the two variables y and x as endogenous variables:

print "Data: FULLER (1987, p.18)";corn = [ 86 70,

115 97,90 53,86 64,

110 95,91 64,99 50,96 70,99 94,

104 69,96 51 ];

cnam = [ "Y" "X"];corn = cname(corn,cnam);

print "Endogenous Y and X";corr = ide(2);semcos corr;

semeqs Y = EY,X = EX;

semvar EY EX = 2 * 1;

ram = [ 1 1 3 1.,1 2 4 1.,2 3 3 1.,2 4 4 1.];

The two runs of the second example (specified by the EQS and RAM statements) consider y and x as exogenous variables:


[Path diagrams not reproduced. The two panels contrast x and y treated as exogenous variables with x and y treated as endogenous variables with unit error terms ex and ey.]

Figure 1.4: Endogenous Variables

print "Exogenous Y and X";

semvar Y X = 2 * 1;

ram = [ 2 1 1 1. ,

2 2 2 1. ];

Using the EQS and the RAM model specification will set the covariances (i.e. correlations) of exogenous manifest variables in the estimated model matrix and automatically reduce the degrees of freedom.


1.2.11 List of Test Examples for SEM

Mixed Test Examples

SEM Eight Documentation Examples:

1. Stability of Alienation (see SEM10); COV Data, nobs=932, nvar=6

2. Supply and Demand Food Example of Kmenta (see SEM01); Raw Data, nobs=20, nvar=4

3. Second Order Confirmatory Factor Analysis (see SEM09); CORR Data, nobs=213, nvar=9

4. Linear Relations Among Factor Loadings (see SEM22); CORR Data, nobs=326, nvar=6

5. Ordinal Relations Among Factor Loadings (see SEM23); CORR Data, nobs=326, nvar=6

6. Longitudinal Factor Analysis (see SEM24); CORR Data, nobs=100, nvar=9

7. Jackknifing with SEM: An Illustration (see SEM79); Raw Data, nobs=50, nvar=6

8. Anderson's Circular Stochastic Process (see SEM78); CORR Data, nobs=710, nvar=6

SEM01 Supply and Demand Food Example

• DATA: J. KMENTA: Elements of Econometrics (1971, p.565, 582); CORR Data, nobs=25, nvar=4

• TEST: Apply SEM to analyze uncorrected covariance matrix; Options UCOV and AUG; Estimate mean structure; Compare SEM and SYSLIN results

SEM40 Consumer Reaction to Dissatisfaction, FORNELL (1982, pp. 299)

• DATA: BEST & ANDREASEN (1977); CORR Data, nobs=213, nvar=7

• TEST: Different identified and not identified models; Singular and indefinite central model matrices

SEM41 Subjective Class Identification, BOLLEN (1989)

• DATA: KLUEGEL et al. (1977), BOLLEN (1989, p.116); COV Data, nobs=432, nvar=5

• TEST: Different structural equation models

SEM42 Democratic Political Structures in 113 Countries, BOLLEN (1980)


• DATA: BOLLEN (1980); COV Data, nobs=113, nvar=6

• TEST: Different first order confirmatory FA models

SEM43 Instrumental FA by HAEGGLUND (1982)

• DATA: LAWLEY & MAXWELL (1971, p. 96), HOLZINGER & SWINEFORD; CORR Data, nobs=73, nvar=9

• TEST: First order confirmatory FA model; Initial Values

SEM44 Various Examples of W. A. FULLER (1987)

• DATA:

1. REILLY & PATINO-LEAL (1981); COV Data, nobs=19, nvar=4

2. Corn data (p. 19)

• TEST: Singular central model matrices occur

1. Only manifest variables
2. Simple first order confirmatory FA
3. Using UCOV and AUG options
4. Convergence problems

SEM45 Simulated Examples of JAMES, MULAIK, & BRETT (1982)

• DATA: Constructed Data by JAMES, MULAIK, & BRETT (1982); COV Data, nobs=800, nvar=13

• TEST: Exogenous Manifest Variables; Problems with initial estimates

SEM46 Exploratory FA by JOERESKOG in ENSLEIN, RALSTON, & WILF (1977): Statistical Methods for Digital Computers

• DATA: THURSTONE (1940); CORR Data, nobs=286, nvar=9

• TEST: χ² and Tucker & Lewis (1973) Reliability

SEM47 Eight Physical Variables: HARMAN (1967, p.80), 2nd ed.

• DATA: MULLEN (1939); CORR Data, nobs=305, nvar=8

• TEST: Exploratory FA; Singular central model matrices occur; Akaike's and Schwarz's Information Criterion


SEM48 Exploratory FA: W. SARLE's Car Example (FACTOR3.SAS)

• DATA: FACTOR3.SAS; Raw Data, nobs=251, nvar=10

• TEST: BY Group Processing; Singular central model matrices occur; Compare with PROC FACTOR; Test WLS and DWLS estimation

SEM49 Confirmatory FA: Twin Example of LOEHLIN (1987, p.120)

• DATA: VANDENBERG et al. (1968); CORR + STDev Data, nobs=63, nvar=6

• TEST: First order confirmatory FA model; Simple example for Video; Equality Constraints; Local minimum occurs for start LS method; Indefinite central model matrices occur

SEM52 Synthetic Examples of Browne (1982)

• DATA: BROWNE (1982, 1984); COV Data, nobs=932, nvar=6

• TEST: Simple one factor model; Equality Constraints; Compare GLS and ML results with Browne (1982, 1984)

SEM53 Corn Example of W. A. FULLER (1987)

• DATA: FULLER (1987, p. 19); Raw Data, nobs=11, nvar=2

• TEST: Simple models without parameters to fit; Illustrates the differences in handling exogenous and endogenous variables

SEM54 Corn Example of W. A. FULLER (1987)

• DATA: FULLER (1987, p. 19); Raw Data, nobs=11, nvar=2

• TEST: Some errors-in-variables examples; Contains estimation of INTERCEP variable; Test different initial estimate settings; Test WLS and DWLS estimation; Convergence problems with Levenberg-Marquardt and Newton-Raphson algorithm

SEM56 Fitness Data used in PROC REG document

• DATA: see STAT User's Guide, PROC REG chapter; Raw Data, nobs=31, nvar=7


• TEST: Some errors-in-variables examples; Equality Constraints; Contains estimation of INTERCEP variable

SEM75 Responses to Job Attitudes

• DATA: Data from MacCALLUM (1991); COV Data, nobs=1000, nvar=12; Joint Meeting of Psychometric and Classification Societies, Rutgers, June 1991

• TEST: Structural Equation model

SEM77 Examples for Browne & Cudeck (1993) Fit Indices

• DATA: Data of Naglieri & Jensen, Intelligence 1992; CORR Data, nobs=86, nvar=24; Data from Guttman (1954, p.330), see SEM36; Examples for AMA Meeting (Feb 1992, San Antonio): Print Ad Recognition Readership Scores, JMR 1988; Output Sector Munificence Effects..., JMR 1987

• TEST: Test Browne & Cudeck (1993) Fit Indices; RMSEA, ECVI, and Hypothesis of Close Fit

SEM78 Anderson's Circular Stochastic Process

• DATA: Data from Guttman (1954, p.330), see SEM36; CORR Data, nobs=710, nvar=6; see Example 6.4 of Browne & DuToit (1992)

• TEST: Test extensive CMP application; Anderson's model includes many local optima that are prevented by BOUNDS statements and good initial values

SEM79 Jackknifing with SEM: An Illustration

• DATA: Data of Bentler (1985, p.105), see SEM12; Raw Data, nobs=50, nvar=6; Observation 50 is an outlier; Model: Simulated Confirmatory Factor Analysis

• TEST: Test the SAS Macro SIMULATE; Show the distribution of the values of the ML discrepancy function and of Schwarz' Bayesian coefficient for different subsamples of the data set

SEM80 Compare VARCLUS parameters with SEM

• DATA: Eight Physical Variables (Harman); CORR Data, nobs=305, nvar=8; Crime Rates by State (see Proc Princomp); Fisher Iris Data

• TEST: First order confirmatory FA model; SEM obtains VARCLUS LS estimates; SEM also fits other (better) models; Relationship between LS and DWLS method; SEM obtains other than LS estimates: test statistics and standard errors with ML and ADF methods


COSAN Manual Test Examples

SEM02 Latent Growth Model

• DATA: Jack McARDLE; COV Data, nobs=204, nvar=5

• TEST: Different Print Options; Equality Constraints; Different Minimization Algorithms; Good and bad start values preset; Serious problems generating start values

SEM03 Latent Growth Model

• DATA: Jack McARDLE; COV Data, nobs=204, nvar=5

• TEST: Different Jacobi Algorithms; Equality Constraints; Problems in generating start values

SEM04 Single Factor Model

• DATA: McDONALD and McARDLE; CORR Data, nobs=200, nvar=6

• TEST: Data Set Input and Output; Generating good start values

SEM05 Confirmatory Factor Analysis (COSAN Manual)

• DATA: McDONALD and McARDLE; CORR Data, nobs=100, nvar=8

• TEST: First Order Confirmatory FA models; Different Minimization Algorithms; Generating good start values

SEM06 Confirmatory Factor Model (COSAN Manual)

• DATA: LAWLEY & MAXWELL (1971); CORR Data, nobs=100, nvar=8

• TEST: First Order Confirmatory FA models; Generating good start values

SEM07 Artificial Data to illustrate LISREL (COSAN Manual)

• DATA: Artificial Data; COV Data, nobs=100, nvar=10

• TEST: Equality Constraints


SEM08 MEYER & BENDIG Longitudinal Factor Model (COSAN Manual)

• DATA: MEYER & BENDIG, McDONALD (1985, p.185); COV Data, nobs=100, nvar=10

• TEST: Different Minimization Algorithms; Different Jacobi Algorithms; Equality Constraints; Indefinite central model matrices occur

SEM09 Different Exploratory and Confirmatory Factor Models

• DATA: THURSTONE, see McDONALD (1985, p.57); CORR Data, nobs=213, nvar=9

• TEST: Some examples of McDonald's book Factor Analysis and Related Methods (1985); Identified and not identified models

Test Examples Using Linear and Nonlinear Constraints

SEM20 Some NLIN examples, MORE et al. (1981)

• DATA: Rosenbrock Function; Bard Function; Helical Valley Function

• TEST: CMP Tests; Minimization Algorithms

SEM21 Meyer & Bendig Multimode FA, McDONALD (1980)

• DATA: MEYER & BENDIG, McDONALD (1980); COV Data, nobs=100, nvar=10

• TEST: CMP Tests & Active Bounds; Equality Constraints; Indefinite central model matrices

SEM22 Linear Related FA, McDONALD (1980)

• DATA: KINZER & KINZER, GUTTMAN (1957), McDONALD (1980); CORR Data, nobs=326, nvar=6

• TEST: CMP Tests; Not identified models

SEM23 Ordinal Related FA, McDONALD (1980)

• DATA: KINZER & KINZER, GUTTMAN (1957), McDONALD (1980); CORR Data, nobs=326, nvar=6


• TEST: CMP Tests; Test of LINCON Statement; Correction of degrees of freedom; Not identified models

SEM24 SWAMINATHAN Longitudinal FA, McDONALD (1980)

• DATA: Constructed by McDONALD (1980); CORR Data, nobs=100, nvar=9

• TEST: CMP Tests; Singular correlation matrix; Equality Constraints; Test the RIDGE option; Singular and indefinite central model matrices

SEM25 SWAMINATHAN Longitudinal FA, McARDLE (1988)

• DATA: McARDLE (1988); CORR + STDev and Mean Data, nobs=204, nvar=32

• TEST: CMP Tests; Singular and indefinite central model matrices

SEM26 Different Second Order Confirmatory Factor Models

• DATA: THURSTONE, see McDONALD (1985, p.57); CORR Data, nobs=213, nvar=9

• TEST: CMP Tests with Aitchison-Silvey hypothesis; Some examples of McDonald's book Factor Analysis and Related Methods (1985); Test of NLINCON Statement; Identified and not identified models

SEM27 Analysis of a Correlation Structure, LEE (1985)

• DATA: Data of ANDERSON & MAIER (1963), HILTON (1969); CORR + STDev Data, nobs=383, nvar=12; same as in SEM59, LISREL 7, p.177; see also: Change in Verbal and Quantitative Ability

• TEST: Constrains the diagonal elements to 1. by using CMP code; Equality Constraints

SEM28 Image Factor Analysis with Normal Varimax Rotation (M.W. Browne & S.H.C. Du Toit, 1992)

• DATA: Twenty Four Psychological Tests, HARMAN 3rd edition (1976, p.123); CORR Data, nobs=145, nvar=24; see Example 6.2 of Browne & DuToit (1992); Example 11 in Extended Users Guide

• TEST: Test of NLINCON Statement; Nonlinear Equality Constraints; Standard Errors for Rotated Factor Loadings


SEM29 Exploratory Second Order FA with Direct Quartimin Rotation (M.W. Browne & S.H.C. Du Toit, 1992)

• DATA: Data from BOLLEN (1989, p.318), original from Marsh & Hocevar (1985); CORR + STDev Data, nobs=251, nvar=16; see Example 6.3 of Browne & DuToit (1992); Example 12 in Extended Users Guide

• TEST: Test of NLINCON Statement; Nonlinear Equality and Inequality Constraints; Standard Errors for Rotated Factor Loadings

SEM76 Analysis of Correlation Structures (Krane & McDonald, 1978); Completely Standardized Recursive Path Model (McDonald, Parker, & Ishizuka, 1993); Examples 8, 9, and 10 in Extended Users Guide

• DATA: 1. THURSTONE, see McDONALD (1985, p.57) (see SEM26); CORR Data, nobs=213, nvar=9; 2. Bollen (1989, p. 334, see SEM71): Political Democracy in 75 Developing Countries; COV and Mean Data, nobs=75, nvar=11

• TEST: Test CMP applications: 1. Analysis of Correlation Structures, see Krane & McDonald (1976), Browne (1982); the FA model SIGMA = FPF'+U is replaced by SIGMA = D[FPF' + Diag(I-FPF')]D, where the matrices D, F, and P are LS and ML estimated; 2. Completely Standardized Recursive Path Model, see McDonald, Parker, and Ishizuka (1993)

SEM78 Anderson's Circular Stochastic Process

• DATA: Data from Guttman (1954, p.330), see SEM36; see Example 6.4 of Browne & DuToit (1992); CORR Data, nobs=710, nvar=6; Example 13 in Extended Users Guide

• TEST: Test extensive CMP application; Anderson's model includes many local optima that are prevented by BOUNDS statements and good initial values

EQS 2.0 Manual Test Examples

SEM10 Stability and Alienation, EQS (1985, p.31); see also SEM30 and SEM50

• DATA: WHEATON, MUTHEN, ALWIN, & SUMMERS (1977); COV Data, nobs=932, nvar=6

• TEST: The structural equation models from the EQS Guide, Bentler (1985); Equality Constraints; Some tests for Lagrange Multipliers; Example of all Documentations (see also SEM30); Compare results in EQS 2 Manual (1985, p.31-32) and Appendix p. 21

SEM11 Performance and Satisfaction, EQS (1985, p.101)


• DATA: BAGOZZI (1980), Model studied by DILLON & GOLDSTEIN (1984); CORR and STDev Data, nobs=122, nvar=8

• TEST: Structural equation model from the EQS Guide, BENTLER (1985); Compare results in EQS 2 Manual (1985, p.101-104)

SEM12 Simulated Confirmatory FA, EQS (1985, p.105)

• DATA: Simulated by BENTLER (1985); Raw Data, nobs=50, nvar=6

• TEST: Structural equation model from the EQS Guide, Bentler (1985); First order confirmatory FA model; Eliminating Outlier; Some tests for Lagrange Multipliers; Compare results in EQS 2 Manual (1985, pp. 105); Elliptic and arbitrary estimation; Test WLS and DWLS estimation

SEM50 Stability and Alienation, EQS (1985, p.20); see also SEM10 and SEM30

• DATA: WHEATON, MUTHEN, ALWIN, & SUMMERS (1977); COV Data, nobs=932, nvar=6

• TEST: Path model from the EQS Guide, Bentler (1985, p.20); Compare results in EQS 2 Manual (1985, pp. 20); Test automatic variable selection

More Test Examples of Bentler

SEM51 Generalized Multimode Latent Variable Models

• DATA: BENTLER, POON & LEE (1988); CORR Data, nobs=68, nvar=12

• TEST: Three Mode Models in Factor Analysis; Compare GLS and ML results with BENTLER et al. (1988); Serious problems in estimating initial values; Singular central model matrices occur

SEM55 Two Mean Structure Models of EQS 3 Manual

• DATA: BENTLER (1989) and McARDLE & EPSTEIN (1987); Raw Data, nobs=4, nvar=2; COV and Mean Data, nobs=204, nvar=4

• TEST: 1. Regression Example with INTERCEP variable: Standard Error for INTERCEP parameter differs; Total Effects are okay; 2. Growth in WISC Scores:


Serious problems in estimating initial values; Goodness-of-fit statistics agree for 3 decimals; Standard Errors agree (N=204); Total Effects are okay (EQS prints some more, redundant)

SEM79 Illustration of Bootstrap and Jackknife Strategy

• MODEL: First Order Confirmatory FA, EQS (1985, p.105)

• DATA: Simulated by BENTLER (1985); Raw Data, nobs=50, nvar=6

• TEST: Generate sets of fit criteria by deleting observations; Observation 50 in the data set is an outlier; Use PROC APPEND to collect fit indices; Use PROC CHART for vertical bar charts

JOERESKOG (1978) Test Examples

SEM30 Stability and Alienation, JOERESKOG (1978, pp.466); see also SEM10 and SEM50; see also LISREL VI Manual (1985, p. III.54); see also LISREL 7 Manual (1987, p. 169, Ex. 6.4)

• DATA: WHEATON, MUTHEN, ALWIN, & SUMMERS (1977); COV Data, nobs=932, nvar=6

• TEST: Some structural equation models; Example of all Documentations (see also SEM10)

• OUTP: Long output contains (p. III.62-66): Modification indices for model A; Initial estimates; Maximum-likelihood estimates; Total effects; Variances and covariances

SEM31 Twenty Four Psychological Tests, JOERESKOG (1978, pp.457)

• DATA: HARMAN 3rd edition (1976, p.123); CORR Data, nobs=145, nvar=24

• TEST: Exploratory and First Order Confirmatory FA

• OUTP: Output contains (p. 458-460): Maximum-likelihood estimates; χ² values

SEM32 Vocabulary Tests, JOERESKOG (1978, pp.451); see also LISREL 7 Manual (1987, p. 93, Ex. 3.3)

• DATA: LORD (1957); COV Data, nobs=649, nvar=4


• TEST: First Order Confirmatory FA; Singular central model matrices; Equality Constraints; Testing various models by means of χ² values

• OUTP: ML estimates and χ² values

SEM33 Speed Factor Data, JOERESKOG (1978, p.453-455)

• DATA: LORD (1956); CORR Data, nobs=649, nvar=18

• TEST: First Order Confirmatory FA

• OUTP: ML estimates and χ² values

SEM34 Assess Teacher's Judgments, JOERESKOG (1978, p.462-464)

• DATA: MILLER & LUTZ, WILEY, SCHMIDT, & BRAMBLE (1973); COV Data, nobs=51, nvar=8

• TEST: First order confirmatory FA model; Analysis of Factorial Design; Equality Constraints; Given loadings, factor correlations to estimate; Testing various models by means of χ² values

• OUTP: ML estimates and χ² values

SEM35 Role Behavior of Farm Managers, JOERESKOG (1978, pp.465); see also LISREL VI Manual (1985, p. III.67); see also LISREL 7 Manual (1987, p. 135, Ex. 5.2)

• DATA: WARREN, WHITE & FULLER (1978); COV Data, nobs=98, nvar=9

• TEST: Second Order Confirmatory FA model; Singular central model matrices

• OUTP: ML estimates and χ² values

SEM36 Six Different Kinds of Abilities, JOERESKOG (1978, pp.475); see also LISREL VI Manual (1985, p. III.24)

• DATA: WARREN, WHITE & FULLER (1978); CORR Data, nobs=710, nvar=6

• TEST: Two Circumplex Models; Equality Constraints

LISREL VI Manual Test Examples

SEM13 Ability and Aspiration, LISREL VI (1985, p.III.5); see also LISREL 7 Manual (1987, p. 83, Ex. 3.2)


• DATA: CALSYN & KENNY (1977); CORR Data, nobs=556, nvar=6

• TEST: First Order Confirmatory FA model.

• OUTP: Long output contains (p. III.5-20): CORR matrix with determinant, parameter specification, initial estimates, maximum-likelihood estimates, R-squared values, coefficient of determination, goodness-of-fit measures, standard errors and t-values, absolute and standardized residuals, variances and covariances, modification indices, factor scores regressions

SEM14 Prediction of Grade Averages, LISREL VI (1985, p. III.26); see also LISREL 7 Manual (1987, p. 116, Ex. 4.4)

• DATA: FINN (1974); Raw Data, nobs=15, nvar=5

• TEST: Model contains manifest variables only.

• OUTP: Output contains (p. III.29-31): initial estimates, maximum-likelihood estimates, R-squared values, coefficient of determination, goodness-of-fit measures, standard errors and t-values. Tests also: WLS and DWLS estimates, WRIDGE option; nobs == n(n+1)/2: singular weight matrix

SEM15 Ambition and Attainment, LISREL VI (1985, p. III.33); see also LISREL 7 Manual (1987, p. 119, Ex. 4.5)

• DATA: KERCHOFF (1974); CORR Data, nobs=767, nvar=7

• TEST: Model contains manifest variables only.

• OUTP: Output contains (p. III.38-39): maximum-likelihood estimates, R-squared values, coefficient of determination, goodness-of-fit measures, total effects

SEM16 Nine Psychological Variables, LISREL VI (1985, p.III.106)


• DATA: HOLZINGER & SWINEFORD (1937); CORR Data, nobs=145, nvar=9

• TEST: First Order Confirmatory FA models; full output given in LISREL VI Manual; different models illustrate the use of modification indices.

• OUTP: Long output contains (p. III.108-122): CORR matrix with determinant, parameter specification, initial estimates, maximum-likelihood estimates, R-squared values, coefficient of determination, goodness-of-fit measures, modification indices, standard errors and t-values, absolute and standardized residuals

SEM17 KLEIN’s Model I, LISREL VI (1985, p. III.42); see also LISREL 7 Manual (1987, p. 123, Ex. 4.6)

• DATA: Time Series Data of THEIL (1975); Raw Data, nobs=21, nvar=15

• TEST: LS Model: contains manifest variables only; singular correlation matrix; singular predicted model matrix; RIDGE option. ML Model: contains exogenous manifest variables; singular and indefinite central model matrices

• OUTP: Output contains (p. III.49): initial estimates, least-squares estimates, (full information) maximum-likelihood estimates

SEM18 Peer Influences on Ambition, LISREL VI (1985, p. III.81); see also LISREL 7 Manual (1987, p. 145, Ex. 5.5)

• DATA: DUNCAN, HALLER & PORTES (1971); CORR Data, nobs=329, nvar=10

• TEST: Structural equation example (path diagram p. III.83); discussion of identification of various models

• OUTP: Output contains (p. III.90-91): initial estimates, equality constraints, maximum-likelihood estimates, standardized maximum-likelihood estimates,


χ² values (in text), total effects

SEM19 Simplex Models for Academic Performance, LISREL VI (1985, p. III.70); see also LISREL 7 Manual (1987, p. 186, Ex. 6.6)

• DATA: HUMPHREYS (1968); CORR Data, nobs=1560, nvar=10

• TEST: Identified and not identified models

• OUTP: Output contains (p. III.78): maximum-likelihood estimates, equality constraints, χ² values (in text), problems with FABIN estimates, singular central model matrices

SEM37 Hypothetical Structural Model, LISREL VI (1985, pp. III.94); see also LISREL 7 Manual (1987, p. 7-9, 73, Ex. 1)

• DATA: Constructed Data of JORESKOG & SORBOM; COV Data, nobs=100, nvar=11

• TEST: Structural equation model (path model p. III.95); second order confirmatory FA model; singular predicted model matrix.

• OUTP: Output contains (p. III.101): true parameter values, initial estimates, least-squares estimates, maximum-likelihood estimates, χ² values (in text)

SEM38 Principal Components of Fowl Bone Measurements, LISREL VI (1985, pp. III.102)

• DATA: WRIGHT (1954); CORR Data, nobs=276, nvar=6

• TEST: Different PC and first order FA Models

LISREL 7 Manual: Test Examples

SEM39 Performance and Satisfaction, LISREL 7 Manual (1987, p. 151, Ex. 5.6)

• DATA: BAGOZZI (1980), example studied by DILLON & GOLDSTEIN (1984); CORR, Mean, STDev Data, nobs=122, nvar=8


• TEST: Structural equation model from the paper New Developments in LISREL (Joereskog & Soerbom, 1987); compare results to those reported in the paper.

SEM57 1. Regression of GNP, LISREL 7, p. 109: Ex. 4.1; 2. Stepwise Regression with GLS, LISREL 7, p. 111: Ex. 4.2; 3. ANOVA and ANCOVA, LISREL 7, p. 114: Ex. 4.3

• 1. DATA: GNP Data of GOLDBERGER (1964, p. 187); RAW Data, nobs=23, nvar=4

• 1. TEST: Simple Regression Analysis

• 2. DATA: Werner Blood Chemistry Data, see DIXON (1981) COV Data, nobs=180, nvar=8

• 2. TEST: Stepwise Regression with modification indices; interesting features

• 3. DATA: HUITEMA (1980): Academic Achievement Raw Data, nobs=30, nvar=5

• 3. TEST: Simple ANOVA and ANCOVA Models

SEM58 1. Second Order Factor Analysis, LISREL 7, p. 160: Ex. 6.2; 2. Social Status and Participation, LISREL 7, p. 143: Ex. 5.4

• 1. DATA: JORESKOG & SORBOM (1988); CORR and STDev Data, nobs=267, nvar=10

• 1. TEST: Two Second Order Confirmatory FA models

• 1. OUTP: No Problems

• 2. DATA: HODGE & TREIMAN (1968); CORR Data, nobs=267, nvar=6

• 2. TEST: MIMIC model

• 2. OUTP: Parameters, test indices, determination; SEM gets a much better chi-square

SEM59 Change in Verbal and Quantitative Ability, LISREL 7, p. 177: Ex. 6.5

• DATA: Data of ANDERSON & MAIER (1963), HILTON (1969); CORR and STDev Data, nobs=383, nvar=12

• TEST: Different models with different df

• OUTP: No Problems

SEM60 KRISTOF Model, Three Subtests of SAT, LISREL 7, p.159: Ex. 6.1

• DATA: Data and Model of KRISTOF (1971); COV Data, nobs=900, nvar=3


• TEST: Different models with CMP applications

• OUTP: COSAN Model works fine; no problems with CMP applications; no problems without CMP

SEM61 Attitudes of Morality and Equality, LISREL 7, p. 194: Ex. 7.1

• DATA: HASSELROTH & LERNBERG (1980); CORR Data, nobs=200, nvar=10

• TEST: First Order Confirmatory FA model.

• OUTP: Test WLS and DWLS estimation; asymptotic (polychoric) correlations

SEM62 Two Wave Panel Model for Political Efficacy, LISREL 7, p.196: Ex. 7.2

• DATA: AISH & JORESKOG (1988); CORR Data, nobs=410, nvar=12

• TEST: Second Order Confirmatory FA model.

• OUTP: Test WLS and DWLS estimation, test WLS modification indices, asymptotic (polychoric) correlations

SEM63 Law School Admissions Test, LISREL 7, p. 203: Ex. 7.3

• DATA: BOCK & LIEBERMAN (1970); CORR Data, nobs=1000, nvar=5

• TEST: First Order Exploratory FA model.

• OUTP: Test WLS and DWLS estimation, asymptotic (tetrachoric) correlations

SEM64 Generated Nonnormal Data, LISREL 7, p. 206: Ex. 7.4

• DATA: Generated by JORESKOG (1988); Raw Data, nobs=200, nvar=4

• TEST: First Order Exploratory FA model; compute asymptotic covariances; WLS, DWLS, ML, GLS, ULS estimation; variable selection in INWGT data set; optimization methods; BIASKUR option

• OUTP: Test WLS and DWLS estimation, asymptotic (tetrachoric) correlations

SEM65 Estimating and Testing a Correlation Structure, LISREL 7, p. 208: Ex. 7.5


• DATA: Generated by JORESKOG (1988); CORR Data, nobs=200, nvar=5

• TEST: Estimate matrix PHI with equality constraints

• OUTP: Test WLS and DWLS estimation, test WLS criterion and standard errors, test WLS modification indices, modification indices for equality constraints

SEM66 Goesta’s Bad Example, LISREL 7, p. 212: Ex. 8.1

• DATA: HAGGLUND (1982); CORR Data, nobs=200, nvar=6

• TEST: BOUNDS statement necessary for convergence

• OUTP: Maximum likelihood estimation; data noise leads to negative variances; no convergence without BOUNDS statement; no problems with BOUNDS statement

BOLLEN (1989) Test Examples

SEM41 Objective and Subjective Social Status, BOLLEN (1989, p.116)

• DATA: KLUEGEL et al. (1977); COV Data, nobs=432, nvar=5

• TEST: Different structural equation models; compare results to LISREL VI

SEM42 Democratic Political Structures in 113 Countries, BOLLEN (1980)

• DATA: BOLLEN (1980); COV Data, nobs=113, nvar=6

• TEST: Different first order confirmatory FA models

SEM70 Union Sentiment, BOLLEN (1989, p.120, 130)

• DATA: McDONALD & CLELLAND (1984); COV and Mean Data, nobs=173, nvar=5

• TEST: Different first order confirmatory FA models; model with intercepts

SEM71 Political Democracy in 75 Developing Countries

• DATA: Data from BOLLEN (1989, p. 229, p. 334); Raw, COV, and STDev Data, nobs=75, nvar=8


• TEST: First order confirmatory FA models; structural equation models; with and without intercepts

SEM72 Sympathy and Anger, BOLLEN (1989, p. 260)

• DATA: Data from REISENZEIN (1986); COV Data, nobs=138, nvar=6

• TEST: First order confirmatory FA model

SEM73 Line Length Estimation, see BOLLEN (1989, p. 310,316)

• DATA: Data from BOLLEN (1989, p. 310); COV and Mean Data, nobs=60, nvar=5

• TEST: First order confirmatory FA model; latent variable means and equation intercept

SEM74 Self Concept, see BOLLEN (1989, p. 318)

• DATA: Data from BOLLEN (1989, p. 318); CORR and STDev Data, nobs=251, nvar=16

• TEST: Second order confirmatory FA model

Mean Structure and Multiple Sample Modeling

SEM101 EQS Manual (1992), chapter 7, pp. 158:

• Equal Loadings and equal Factor Correlation

• All Parms are equal, p. 162

SEM102 LISREL 7 Manual (1992, pp.230):

• Test Equality of COV Matrices; LISREL: chisqu=38.31, df=10, p=.0

• Full CFA Model, no group constraints; LISREL: chisqu=1.52, df=2, p=.468

• Test Equality of Factor Loadings; LISREL: chisqu=8.77, df=4, p=.067

• Equality of Factor Loadings and Unique Var.; LISREL: chisqu=21.55, df=8, p=.006

• Test All Parms Equal; LISREL: chisqu=38.22, df=11, p=.000

SEM103 EQS Manual, 1995, Chapter 8:

• Regression with Mean: Bentler (1995), p.170

• Growth in WISC Scores: Bentler (1995), p. 175


SEM104 EQS Manual, 1995, Chapter 9:

• Head Start 1 Group Model: Bentler (1995), p.182

• Head Start 2 Group Model: Bentler (1995), p.186

• Olsson’s Experimental Data: Bentler (1995), p. 193

SEM105 LISREL 7 Manual, 1992, Chapter 9:

• Testing Equality of Factor Corr. Matrices, p. 233

• Son’s and Parent’s Reports of Parental..., p. 235

• Subjective and Objective Social Class, p. 242


1.3 Details on Optimization: LP and QP

1.3.1 Quadratic Optimization Method

The qp() function can be used to minimize or maximize a quadratic objective function,

f(x) = (1/2) x^T G x + g^T x + c,   with G^T = G,   (1.1)

with linear or boundary constraints

Ax ≥ b   or   lb_j ≤ x_j ≤ ub_j,   (1.2)

where x = (x_1, . . . , x_n)^T, g = (g_1, . . . , g_n)^T, G is an n × n symmetric matrix, A is an m × n matrix of general linear constraints, and b = (b_1, . . . , b_m)^T. The value of c only modifies the value of the objective function, not its derivatives, and the location of the optimizer x* does not depend on the value of the constant term c. Of course, derivatives do not have to be specified when using a quadratic objective function. For the quadratic objective function, the gradient vector

∇f(x) = Gx + g   (1.3)

and the n × n Hessian matrix

∇²f(x) = G   (1.4)

are specified by the data input.
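
As a small illustration (plain C, not CMAT code; all names are chosen only for this example), the following sketch evaluates the quadratic objective (1.1) and its gradient (1.3) for a dense symmetric G, i.e. the information that the text above says is specified by the data input.

#include <stdio.h>

#define N 2

/* f(x) = 0.5*x'Gx + g'x + c for a dense symmetric G (illustration only) */
static double qp_objective(const double G[N][N], const double g[N],
                           double c, const double x[N])
{
    double f = c;
    for (int i = 0; i < N; i++) {
        f += g[i] * x[i];
        for (int j = 0; j < N; j++)
            f += 0.5 * x[i] * G[i][j] * x[j];
    }
    return f;
}

/* gradient (1.3): grad = G x + g */
static void qp_gradient(const double G[N][N], const double g[N],
                        const double x[N], double grad[N])
{
    for (int i = 0; i < N; i++) {
        grad[i] = g[i];
        for (int j = 0; j < N; j++)
            grad[i] += G[i][j] * x[j];
    }
}

int main(void)
{
    double G[N][N] = { {2.0, 0.5}, {0.5, 1.0} };  /* symmetric matrix G */
    double g[N] = { -1.0, 1.0 };
    double x[N] = { 0.5, -0.5 };
    double grad[N];

    qp_gradient(G, g, x, grad);
    printf("f = %g, grad = (%g, %g)\n",
           qp_objective(G, g, 0.0, x), grad[0], grad[1]);
    return 0;
}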

QP Algorithms: The matrix G must be symmetric but not necessarily positive definite (for minimization) or negative definite (for maximization). The qp() function implements three versions of an active set method (Gill, Murray, Saunders, & Wright, 1984):

null space method ”qpnusp”: The Cholesky factor of the projected Hessian matrix Z_k^T G Z_k and the QT decomposition of the matrix A_k of active linear constraints are updated simultaneously when the active set changes. Here Q is an nfree × nfree orthogonal matrix containing the null space Z in its first nfree − nalc columns and the range space Y in its last nalc columns, and T is an nalc × nalc triangular matrix of special form, t_ij = 0 for i < n − j, where nfree is the number of free parameters (n minus the number of active boundary constraints) and nalc is the number of active linear constraints.

range space method ”qprasp”: Only the nfree × nalc range space matrix Y is stored; the null space matrix Z is not even computed. This means that the range space method can save a significant amount of memory if the optimization problem is defined in such a way that only a few linear constraints can be active at each stage of the optimization process.

Powell’s Goldfarb and Idnani method ”qppogi”: Very similar to the null space method ”qpnusp”, but clearly slower than that method for large n.

The null space method is more suitable when many active constraints are expected at some stage of the optimization process. It can also be numerically more stable than the range space method. If only boundary constraints and no general linear constraints are specified, both methods should behave identically. For both methods, the update of active boundary and linear constraints is done separately to save core memory.

In addition, three more QP algorithms are implemented which solve problems with large n and only boundary constraints:


dual algorithm by Madsen, Nielsen, and Pinar (1995) ”qpmanp”: This algorithm works well for sufficiently positive eigenvalues of G and can be very fast. The method exploits the sparsity or band structure of G.

TRON algorithm by Lin and More (1999) ”qptron”: The method exploits the sparsity or band structure of G and can be very fast for very large n.

interior point algorithm with barrier function ”qpbarr”: The method exploits the sparsity or band structure of G and can be very fast for very large n.

1.4 Details on Optimization: NLP

1.4.1 Choosing an Optimization Technique

1. Nonlinearly Constrained Optimization

(a) Small Problems: COBYLA. This is a modification of Powell’s (1992) original COBYLA algorithm which can also solve nonsmooth problems. The original version of Powell’s code can also be run by setting the version option "vers" to one. Since the COBYLA (Constrained Optimization BY Linear Approximations) algorithm uses only function calls to generate linear approximations of first-order derivatives, it is not very suitable for highly nonlinear problems or for problems with more than 20 parameters.

(b) Medium Problems: DQNSQP. The nonlinearly constrained version of the quasi-Newton algorithm is a modification of Powell’s (1982) VMCWD algorithm, which is a sequential quadratic programming (SQP) method.

(c) Medium Problems: VMCWD. This is the original version of Powell’s VMCWD code which he mailed me when I was still employed at SAS.

(d) Very Large Problems: QADPEN. The quadratic penalty method by Conn & Gould (1984) is able to solve large and sparse nonlinearly constrained problems.

2. Unconstrained or Linearly Constrained Optimization

(a) Small Problems: TRUREG (NEWRAP, NRRIDG). Use these techniques for small and medium-sized optimization problems (up to 40 parameters) where the Hessian matrix is not expensive to compute. Sometimes NRRIDG can be faster than TRUREG, but TRUREG can be more stable. NRRIDG needs only one matrix with n(n + 1)/2 double words. TRUREG and NEWRAP need two such matrices for dense problems; however, both are able to exploit sparsity for the storage of the Hessian and the computation of the Cholesky factor.

(b) Medium Problems: QUANEW (DBLDOG). Use these techniques for medium-sized and moderately large optimization problems (up to 200 parameters) where the objective function and the gradient are much faster to evaluate than the Hessian. QUANEW and DBLDOG in general need more iterations than TRUREG, NRRIDG, and NEWRAP, but each iteration can be much faster. QUANEW and DBLDOG need only the gradient to update an approximate Hessian. QUANEW and DBLDOG need slightly less memory than TRUREG or NEWRAP (essentially one matrix with n(n + 1)/2 double words). Usually the BFGS update of the approximate Hessian works very well.


(c) Large Problems with sparse Hessian: NEWRAP or TRUREG. Use these techniques for unconstrained or only boundary-constrained problems where the Hessian is not expensive to compute and contains a significant number of zeros. In general, the trust region method is numerically more stable, but the amount of computation during each iteration is smaller for the line-search Newton-Raphson method. Currently only the null space method is implemented for general linear constraints; since the projected Hessian becomes dense for active linear constraints, NEWRAP and TRUREG are then not recommended.

(d) Large Problems with only boundary constraints: LMBFGS. The limited memory method (Byrd, Nocedal, & Zhu, 1995) needs only O(n) memory for an approximate Hessian, by storing a number of gradient and point differences of the last iterations. That positions LMBFGS between the quasi-Newton methods (which need O(n²) memory for the approximate Hessian) and the conjugate gradient methods, which only need a very small number of vectors of size n.

(e) Large Problems with dense Hessian: CONGRA. Use this technique for very large optimization problems (more than 200 parameters) where the objective function and the gradient can be computed much faster than the Hessian and where too much memory would be needed to store the (approximate) Hessian. CONGRA in general needs more iterations than QUANEW or DBLDOG, but each iteration can be much faster. Since CONGRA needs only a factor of n double words of memory, many large applications can be solved only by CONGRA. The following updates are available:

”PB” for CONGRA: Powell-Beale update (default)
”FR” for CONGRA: Fletcher-Reeves update
”PR” for CONGRA: Polak-Ribiere update
”BM” for CONGRA: Birgin-Martinez restart update
”SFR” for CONGRA: scaled Fletcher-Reeves update
”SPR” for CONGRA: scaled Polak-Ribiere update
”CD” for CONGRA: conjugate directions update (not recommended)

The Powell-Beale and the Birgin-Martinez restart updates are recommended most.

3. Least-Squares Minimization:

(L2)   F(x) = Σ_{i=1}^m f_i²(x)  →  min_x

(a) Small Problems: LEVMAR (HYQUAN). Use these techniques for small and medium-sized least-squares minimization problems (up to 60 parameters) where the cross-product Jacobian matrix is easy and inexpensive to compute. In general, LEVMAR is more reliable, but there are problems with high residuals where HYQUAN can be faster than LEVMAR.

(b) Medium Problems: QUANEW (DBLDOG). No specific algorithm is available, but the least-squares problem can be reformulated as a general minimization problem and solved by one of the more general quasi-Newton algorithms.

(c) Large Problems: CONGRA (GAUNEW). No specific algorithm is available, but the least-squares problem can be reformulated as a general minimization problem (as sketched below) and solved by one of the more general conjugate gradient algorithms. For the future, a sparse Gauss-Newton algorithm is planned for large-scale least-squares problems.
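
The reformulation mentioned in (b) and (c) can be sketched in plain C (not CMAT code; the residual functions and all names are invented for this example): the user supplies only the residuals f_i(x), and a general minimizer works on F(x) = Σ f_i²(x) with gradient ∇F(x) = 2 J(x)^T f(x).

#include <stdio.h>

#define M 3   /* number of residual functions f_i */
#define N 2   /* number of parameters             */

/* Example residuals of a tiny data-fitting problem:
   f_i(x) = x1 + x2*t_i - y_i  (linear in x, so the Jacobian is constant). */
static const double t[M] = { 0.0, 1.0, 2.0 };
static const double y[M] = { 1.0, 2.1, 2.9 };

static void residuals(const double x[N], double f[M])
{
    for (int i = 0; i < M; i++)
        f[i] = x[0] + x[1] * t[i] - y[i];
}

/* General-minimization form: F(x) = sum_i f_i(x)^2, grad F = 2 * J' f. */
static double sum_of_squares(const double x[N], double grad[N])
{
    double f[M], F = 0.0;
    residuals(x, f);
    grad[0] = grad[1] = 0.0;
    for (int i = 0; i < M; i++) {
        F += f[i] * f[i];
        grad[0] += 2.0 * f[i] * 1.0;   /* df_i/dx1 = 1   */
        grad[1] += 2.0 * f[i] * t[i];  /* df_i/dx2 = t_i */
    }
    return F;
}

int main(void)
{
    double x[N] = { 1.0, 1.0 }, grad[N];
    double F = sum_of_squares(x, grad);
    printf("F = %g, grad = (%g, %g)\n", F, grad[0], grad[1]);
    return 0;
}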

4. Nonsmooth Optimization:

Nonsmooth optimization tries to optimize objective functions which are continuous but not continuously differentiable. An example is the absolute value function |f(x)|.


(a) Bundle Trust Region Methods: NONDIF. For general nondifferentiable optimization with boundary and linear constraints, a set of bundle trust region methods was implemented (Outrata, Schramm, & Zowe, 1991). Below we give more details and applications.

(b) L1 Minimization: NL1REG. There are two different algorithms available to solve the specific nonsmooth nonlinear L1 regression problem:

(L1)   F(x) = Σ_{i=1}^m |f_i(x)|  →  min_x

• The algorithm of Hald & Madsen (1985) is able to solve linearly constrained L1 minimization problems.

• The algorithm by Pinar & Hartmann (1999) is a modification of the Conn & Gould (1984) quadratic penalty algorithm for large-scale nonlinearly constrained optimization. Currently, this algorithm is only able to solve unconstrained L1 problems.

Note: The reg(.) function provides a number of algorithms to solve the linear L1 regression problem

|Ax − y|  →  min_x

(c) L∞ Minimization: NLIREG, MINMAX. The algorithm of Hald & Madsen (1984) is able to solve the linearly constrained MinMax problem

(MinMax)   F(x) = max_{i=1,...,m} f_i(x)  →  min_x

which is also used to solve the more specific linearly constrained L∞ minimization problem

(L∞)   F(x) = max_{i=1,...,m} |f_i(x)|  →  min_x

Solving the (L∞) problem of m functions f_i(x) is equivalent to solving the MinMax problem with the 2m functions f_i(x) and −f_i(x); see the sketch at the end of this item. Note: The reg(.) function provides a number of algorithms to solve the linear L∞ regression problem

max_{i=1,...,m} |Σ_{j=1}^n a_ij x_j − y_i|  →  min_x

(d) Direct Search Methods: NMSIMP, COBYLA. There are two derivative-free search methods available for more general nonsmooth objective functions:

• A classic Nelder-Mead algorithm, NMSIMP, which is able to solve unconstrained or boundary-constrained general nonsmooth problems.

• The COBYLA (Constrained Optimization BY Linear Approximation) algorithm (Powell, 1992), which is able to solve linearly and nonlinearly constrained nonsmooth problems.
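
A minimal C sketch (not CMAT code; the functions are chosen arbitrarily) of the equivalence used for (L∞): the maximum of |f_i(x)| equals the MinMax objective over the 2m functions f_i(x) and −f_i(x).

#include <math.h>
#include <stdio.h>

#define M 3

/* Some smooth example functions f_i(x) of a single parameter. */
static void fvec(double x, double f[M])
{
    f[0] = x - 1.0;
    f[1] = 0.5 * x + 0.25;
    f[2] = sin(x);
}

int main(void)
{
    double x = 0.3, f[M];
    fvec(x, f);

    /* (L-inf) objective: max_i |f_i(x)| ... */
    double linf = 0.0;
    for (int i = 0; i < M; i++)
        if (fabs(f[i]) > linf) linf = fabs(f[i]);

    /* ... equals the MinMax objective over the 2m functions f_i and -f_i. */
    double minmax = -HUGE_VAL;
    for (int i = 0; i < M; i++) {
        if ( f[i] > minmax) minmax =  f[i];
        if (-f[i] > minmax) minmax = -f[i];
    }
    printf("L-inf = %g, MinMax over {f_i,-f_i} = %g\n", linf, minmax);
    return 0;
}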

5. Global Optimization:

(a) Simulated Annealing: SIMANN. This is the algorithm by Goffe, Ferrier, & Rogers (1994). It is highly recommended to restrict the size of the search area by specifying some intelligent lower and upper bounds.

(b) Genetic Algorithm: GENALG. The current implementation of the genetic algorithm assumes the maximization of an objective function g(x) with only positive function values. Therefore, for minimizing a positive-valued function f(x), the function definition must be changed to some g(x) like

g(x) = sup(f(x)) − f(x)


where sup() denotes the largest possible function value inside the feasible region (see the sketch following this list). It is highly recommended to restrict the size of the search area by specifying some intelligent lower and upper bounds.

(c) Multilevel Coordinate Search: MCS
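
The transformation required for the genetic algorithm can be illustrated with a small C sketch (not CMAT code; FSUP is a user-chosen bound on f over the feasible region, assumed here only for the example):

#include <stdio.h>

/* Objective to be minimized; positive on the assumed feasible region 0 <= x <= 4. */
static double f(double x) { return (x - 2.0) * (x - 2.0) + 1.0; }

/* FSUP: an upper bound on f over the feasible region (f <= 5 there). */
#define FSUP 5.5

/* Transformed objective for a maximizer needing positive values:
   minimizing f is equivalent to maximizing g(x) = FSUP - f(x) > 0. */
static double g(double x) { return FSUP - f(x); }

int main(void)
{
    for (double x = 0.0; x <= 4.0; x += 1.0)
        printf("x=%g  f=%g  g=%g\n", x, f(x), g(x));
    return 0;
}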

1.4.2 List of Options in Alphabetical Order

ABSCONV | ABSTOL = r ≥ 0 : specifies the absolute function convergence criterion:

• For minimization, termination requires

f^(k) = f(x^(k)) ≤ r.

• For maximization, termination requires

f^(k) = f(x^(k)) ≥ r.

The default value of r is:

• for minimization: the negative square root of the largest double precision value, i.e. -sqrt(macon("mbig"));

• for maximization: the positive square root of the largest double precision value, i.e. sqrt(macon("mbig")).

ABSFCONV | ABSFTOL = r ≥ 0 : specifies the absolute function convergence criterion.

TRUREG, NEWRAP, NRRIDG, QUANEW, DBLDOG, CONGRA, LEVMAR, HYQUAN: Termination requires a small change of the function value in successive iterations:

|f(x^(k−1)) − f(x^(k))| ≤ r.

The default value is r = 0.

NMSIMP: The same formula is used, but x^(k) is defined as the vertex with the lowest function value, and x^(k−1) is defined as the vertex with the highest function value in the simplex. The default value is r = 0.

QADPEN, DQNSQP, COBYLA, NL1REG, NLIREG Currently not used.

ABSGCONV | ABSGTOL = r ≥ 0 : specifies the absolute gradient convergence criterion.

TRUREG, NEWRAP, NRRIDG, QUANEW, DBLDOG, CONGRA, LEVMAR, HYQUAN, DQNSQP: Termination requires the maximum absolute gradient element to be small:

max_j |g_j(x^(k))| ≤ r.

The default value is r=1E-5.

QADPEN, COBYLA, NMSIMP, NL1REG, NLIREG Currently not used.

ABSXCONV | ABSXTOL =r ≥ 0 :specifies the absolute parameter convergence criterion.


TRUREG, NEWRAP, NRRIDG, QUANEW, DBLDOG, CONGRA, LEVMAR, HYQUAN, DQNSQP: Termination requires a small Euclidean distance between successive parameter vectors,

‖x^(k) − x^(k−1)‖_2 ≤ r.

The default value is r = 0.

COBYLA, NMSIMP: Termination requires either a small length α^(k) of the vertices of a restart simplex,

α^(k) ≤ r,

or a small simplex size,

δ^(k) ≤ r,

where the simplex size δ^(k) is defined as the L1 distance of the simplex vertex y^(k) with the smallest function value to the other n simplex points x_l^(k) ≠ y^(k):

δ^(k) = Σ_{x_l ≠ y} ‖x_l^(k) − y^(k)‖_1 .

The default value is r=1E-4 for the COBYLA technique, r=1E-8 for the standard NMSIMP technique.

QADPEN, NL1REG, NLIREG Currently not used.
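
The three absolute termination tests above (ABSFCONV, ABSGCONV, ABSXCONV) can be illustrated with a small C sketch; the iterate values are made up, and the tolerances correspond to the defaults stated above.

#include <math.h>
#include <stdio.h>

#define N 2

int main(void)
{
    double x_old[N] = { 1.000020, -0.499990 };
    double x_new[N] = { 1.000010, -0.499995 };
    double f_old = 3.1400021, f_new = 3.1400019;
    double grad[N] = { 4.0e-6, -7.0e-6 };      /* gradient at x_new */

    double absftol = 0.0, absgtol = 1.0e-5, absxtol = 0.0;

    /* ABSFCONV: |f(x^(k-1)) - f(x^(k))| <= r */
    int f_ok = fabs(f_old - f_new) <= absftol;

    /* ABSGCONV: max_j |g_j(x^(k))| <= r */
    double gmax = 0.0;
    for (int j = 0; j < N; j++)
        if (fabs(grad[j]) > gmax) gmax = fabs(grad[j]);
    int g_ok = gmax <= absgtol;

    /* ABSXCONV: ||x^(k) - x^(k-1)||_2 <= r */
    double dist = 0.0;
    for (int j = 0; j < N; j++)
        dist += (x_new[j] - x_old[j]) * (x_new[j] - x_old[j]);
    dist = sqrt(dist);
    int x_ok = dist <= absxtol;

    printf("ABSFCONV: %d  ABSGCONV: %d (gmax=%g)  ABSXCONV: %d (dist=%g)\n",
           f_ok, g_ok, gmax, x_ok, dist);
    return 0;
}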

ASINGULAR | ASING = r ≥ 0 : specifies an absolute singularity criterion for the computation of the inertia (number of positive, negative, and zero eigenvalues) of the Hessian and cross-product Jacobian and their projected forms. The following singularity criterion is used,

|d_j,j| ≤ max(ASING, VSING ∗ |A_j,j|, MSING ∗ max(|A_1,1|, . . . , |A_n,n|))

where d_j,j is the diagonal pivot of the matrix A, and VSING and MSING are the specified values of the VSINGULAR= and MSINGULAR= options. The default value for ASING is the square root of the smallest positive double precision value.

BOUNDS = string : specifies the name of an n × 2 matrix which contains lower bounds b_j^lb in its first column and upper bounds b_j^ub in its second column, where the rows correspond to the n variables,

b_j^lb ≤ x_j ≤ b_j^ub,   j = 1, . . . , n

The matrix can contain missing values if there is no bound for the variable in that direction.

CDIGITS = r ≥ 0 : specifies the number of accurate digits in nonlinear constraint evaluations. Fractional values such as CDIGITS=4.7 are allowed. The default is r=-log10(ε), where ε is the machine precision. The value of r is used to compute the interval size h for the computation of finite-difference approximations of the Jacobian matrix of nonlinear constraints.

DAMPSTEP = r ≥ 0 : specifies that the initial step size value α^(0) for each line search (used by QUANEW, HYQUAN, CONGRA, or NEWRAP) cannot be larger than r times the step size value used in the former iteration. The DAMPSTEP option can prevent the line-search algorithm from repeatedly stepping into regions where some objective functions are difficult to compute or where they could lead to floating point overflows during the computation of objective functions and their derivatives. The DAMPSTEP option can save time-costly function calls during line searches of objective functions that result in very small step sizes α. However, for the most common cases, specifying this option slows down the convergence speed.


FDA = string | r ≥ 0 : The FDA option specifies whether forward or central finite difference formulas are used for computing all derivatives. The following specifications are permitted:

• ”F”: Always use forward differences.

• ”C”: Always use central differences.

• r ≥ 0: Use central differences for the initial and final evaluations of the gradient, Jacobian, and/or Hessian. During iteration, start with forward differences and switch to a corresponding central-difference formula when one of the following two criteria is satisfied:

– The absolute maximum gradient element is less than or equal to r times the ABSGTOL threshold.
– The term on the left of the GTOL criterion is less than or equal to max(1.e-6, r * GTOL threshold).

The default value of r = 1.e-6 ensures that the switch is done even if the user sets the GTOL threshold to zero.

FDG = string | r ≥ 0 : The FDG option specifies whether forward or central finite difference formulas are used for computing the gradient. For more details refer to the FDA option.

FDJ = string | r ≥ 0 : The FDJ option specifies whether forward or central finite difference formulas are used for computing the Jacobian of the objective function. For more details refer to the FDA option.

FDH = string | r ≥ 0 : The FDH option specifies whether forward or central finite difference formulas are used for computing the Hessian. For more details refer to the FDA option.

FDC = string | r ≥ 0 : The FDC option specifies whether forward or central finite difference formulas are used for computing the Jacobian of the nonlinear constraints. For more details refer to the FDA option.
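
The difference between the ”F” and ”C” settings of the FDA family can be illustrated with a small C sketch (not CMAT code): the forward formula costs one extra function call per variable, the central formula two, but the central formula has higher-order accuracy.

#include <math.h>
#include <stdio.h>

static double f(double x) { return exp(x) * sin(x); }

int main(void)
{
    double x = 1.0, h = 1.0e-4;

    /* forward difference ("F"): one extra function call per variable */
    double d_fwd = (f(x + h) - f(x)) / h;

    /* central difference ("C"): two extra calls, but O(h^2) accuracy */
    double d_ctr = (f(x + h) - f(x - h)) / (2.0 * h);

    double exact = exp(x) * (sin(x) + cos(x));
    printf("forward: %.10f  central: %.10f  exact: %.10f\n",
           d_fwd, d_ctr, exact);
    return 0;
}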

FCONV | FTOL = r ≥ 0 : specifies the relative function convergence criterion.

TRUREG, NEWRAP, NRRIDG, QUANEW, DBLDOG, CONGRA, LEVMAR, HYQUAN, DQNSQP: Termination requires a small relative change of the function value in successive iterations,

|f(x^(k)) − f(x^(k−1))| / max(|f(x^(k−1))|, FSIZE) ≤ r,

where FSIZE is defined by the FSIZE option. The default value is r = 10^(−FDIGITS), where FDIGITS is either specified or is set by default to −log10(ε), where ε is the machine precision.

NMSIMP, COBYLA: The same formula is used, but x^(k) is defined as the vertex with the lowest function value, and x^(k−1) is defined as the vertex with the highest function value in the simplex. The default value is r = 10^(−FDIGITS), where FDIGITS is either specified or is set by default to −log10(ε), where ε is the machine precision.

QADPEN, NL1REG, NLIREG Currently not used.

FCONV2 | FTOL2 =r ≥ 0 :specifies another function convergence criterion for least-squares problems.


TRUREG, NEWRAP, NRRIDG, QUANEW, DBLDOG, CONGRA, LEVMAR, HYQUAN, DQNSQP: Termination requires a small predicted reduction

df^(k) ≈ f(x^(k)) − f(x^(k) + s^(k))

of the objective function. The predicted reduction

df^(k) = −g^(k)T s^(k) − (1/2) s^(k)T G^(k) s^(k) = −(1/2) s^(k)T g^(k) ≤ r

is computed by approximating the objective function f by the first two terms of the Taylor series and substituting the Newton step

s^(k) = −[G^(k)]^(−1) g^(k)

(see the sketch at the end of this option description). The default value is r=1E-6 for DQNSQP and r = 0 otherwise.

NMSIMP, COBYLA: Termination requires a small standard deviation of the function values of the n + 1 simplex vertices x_l^(k), l = 0, . . . , n,

sqrt( (1/(n+1)) Σ_l (f(x_l^(k)) − f̄(x^(k)))² ) ≤ r

where f̄(x^(k)) = (1/(n+1)) Σ_l f(x_l^(k)). If there are nact boundary constraints active at x^(k), the mean and standard deviation are computed only for the n + 1 − nact unconstrained vertices. The default value is r=1E-6.

QADPEN, NL1REG, NLIREG Currently not used.
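
A small C sketch (not CMAT code; the numbers are made up) of the FCONV2 predicted-reduction test: for a diagonal model Hessian the Newton step is trivial, and the two forms of df^(k) given above agree.

#include <stdio.h>

#define N 2

int main(void)
{
    /* Diagonal Hessian keeps the Newton step trivial: s_j = -g_j / G_jj. */
    double Gdiag[N] = { 4.0, 2.0 };
    double g[N]     = { 0.02, -0.01 };
    double s[N];

    for (int j = 0; j < N; j++)
        s[j] = -g[j] / Gdiag[j];

    /* df = -g's - 0.5 s'Gs  (predicted reduction of the quadratic model) */
    double df = 0.0, df_short = 0.0;
    for (int j = 0; j < N; j++) {
        df       += -g[j] * s[j] - 0.5 * s[j] * Gdiag[j] * s[j];
        df_short += -0.5 * s[j] * g[j];   /* simplified form for the Newton step */
    }
    printf("df = %g, -0.5*s'g = %g\n", df, df_short);
    /* Termination would be signalled when df <= r (FCONV2). */
    return 0;
}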

FDIGITS = r ≥ 0 : specifies the number of accurate digits in evaluations of the objective function. Fractional values such as FDIGITS=4.7 are allowed. The default is r=-log10(ε), where ε is the machine precision. The value of r is used to compute the interval size h for the computation of finite-difference approximations of the derivatives of the objective function and for the default value of the FCONV option.

FDINT = OBJ | CON | ALL : specifies whether the finite difference intervals h should be computed using an algorithm of Gill, Murray, Saunders, & Wright (1983) or are based only on the information of the FDIGITS= and CDIGITS= options. For FDINT=OBJ, the interval h is based on the behavior of the objective function; for FDINT=CON, the interval h is based on the behavior of the nonlinear constraint functions; and for FDINT=ALL, the interval h is based on both the behavior of the objective function and the nonlinear constraint functions. The algorithm of Gill, Murray, Saunders, & Wright (1983) can be very expensive in the number of function calls. If FDINT= is specified, it is currently performed only twice: the first time before the optimization process starts and the second time after the optimization terminates (for precise result output).

FSIZE = r ≥ 0 : specifies the FSIZE parameter of the relative function and relative gradient termination criteria. The default value is r = 0. For more details, see the FCONV and GCONV options.

GCONV | GTOL =r ≥ 0 :specifies the relative gradient convergence criterion.


TRUREG, NEWRAP, NRRIDG, QUANEW, DBLDOG, LEVMAR, HYQUAN, DQNSQP: Termination requires that the normalized predicted function reduction is small,

g(x^(k))^T [G^(k)]^(−1) g(x^(k)) / max(|f(x^(k))|, FSIZE) ≤ r,

where FSIZE is defined by the FSIZE option. The default value is r=1E-8.

CONGRA: Here a reliable Hessian estimate G is not available and therefore

( ‖g(x^(k))‖_2² ‖s(x^(k))‖_2 ) / ( ‖g(x^(k)) − g(x^(k−1))‖_2 max(|f(x^(k))|, FSIZE) ) ≤ r

is used. The default value is r=1e-8.

QADPEN, NL1REG (version=1): Termination requires that the penalty parameter µ is smaller than r. The default value is r=1e-8.

NMSIMP, COBYLA, NL1REG (version=2), NLIREG: Currently not used.

GCONV2 | GTOL2 =r ≥ 0 :specifies another relative gradient convergence criterion for least-squares problems.

TRUREG, NEWRAP, NRRIDG, LEVMAR: The criterion of Browne (1982) is used,

max_j |g_j(x^(k))| / sqrt( f(x^(k)) G_j,j^(k) ) ≤ r.

The default value is r = 0.

QUANEW, DBLDOG, HYQUAN, QADPEN, DQNSQP, NMSIMP, COBYLA, NL1REG, NLIREG: Currently not used.

GRADCHECK | GC [= NONE | FAST | DETAIL] : Specifying GRADCHECK=DETAIL computes a test vector and test matrix (see Wolfe, 1982) to check whether the gradient g = g(x) specified by a grad() module argument is appropriate for the function f = f(x) computed by the program statements. If the specification of the first derivatives is correct, the elements of the test vector and test matrix should be relatively small. For very large optimization problems, the algorithm can be too expensive in terms of computer time and memory. If you do not specify the GRADCHECK option, a fast derivative test identical to the GRADCHECK=FAST specification is done by default. You can suppress the default derivative test by specifying GRADCHECK=NONE.

HESCAL | HS = 0|1|2|3 : specifies the scaling version of the Hessian or cross-product Jacobian matrix used in NRRIDG, TRUREG, LEVMAR, NEWRAP, or DBLDOG optimization. If HS is not equal to zero, the first iteration and each restart iteration sets the diagonal scaling matrix D^(0) = diag(d_i^(0)):

d_i^(0) = sqrt( max(|G_i,i^(0)|, ε) )

where the G_i,i^(0) are the diagonal elements of the Hessian or cross-product Jacobian matrix. In every other iteration, the diagonal scaling matrix D^(0) = diag(d_i^(0)) is updated depending on the HS option:

HS=0 specifies that no scaling is done;

HS=1 specifies the More (1978) scaling update:

d_i^(k+1) = max( d_i^(k), sqrt( max(|G_i,i^(k)|, ε) ) )


HS=2 specifies the Dennis, Gay, & Welsch (1981) scaling update:

d_i^(k+1) = max( 0.6 ∗ d_i^(k), sqrt( max(|G_i,i^(k)|, ε) ) )

HS=3 specifies that d_i is reset in each iteration:

d_i^(k+1) = sqrt( max(|G_i,i^(k)|, ε) )

where ε is the relative machine precision. The default is HS=1 for LEVMAR minimization and HS=0 otherwise. Scaling of the Hessian or cross-product Jacobian can be time-consuming in the case where general linear constraints are active.
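
A minimal C sketch (not CMAT code) of the HS=1 (More, 1978) scaling update applied to made-up Hessian diagonals:

#include <math.h>
#include <stdio.h>

#define N 3
#define EPS 2.220446049250313e-16   /* relative machine precision, IEEE double */

/* HS=1 update of the diagonal scaling values d_i from the diagonal
   elements G_ii of the current Hessian or cross-product Jacobian. */
static void hescal_more(double d[N], const double Gdiag[N])
{
    for (int i = 0; i < N; i++) {
        double di = sqrt(fmax(fabs(Gdiag[i]), EPS));
        if (di > d[i]) d[i] = di;   /* d_i^(k+1) = max(d_i^(k), ...) */
    }
}

int main(void)
{
    double Gdiag0[N] = { 9.0, 0.25, 0.0 };
    double Gdiag1[N] = { 4.0, 1.00, 1.0e-3 };
    double d[N];

    for (int i = 0; i < N; i++)                 /* start value d_i^(0) */
        d[i] = sqrt(fmax(fabs(Gdiag0[i]), EPS));
    hescal_more(d, Gdiag1);                     /* one HS=1 update     */

    printf("d = (%g, %g, %g)\n", d[0], d[1], d[2]);
    return 0;
}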

INSTEP = r > 0 : For highly nonlinear objective functions, such as the exp function, the default initial radius of the trust-region algorithms TRUREG, DBLDOG, or LEVMAR or the default step length of the line-search algorithms can result in arithmetic overflows. If this occurs, you should specify decreasing values of 0 < r < 1 such as INSTEP=1E-1, INSTEP=1E-2, INSTEP=1E-4, and so on, until the iteration starts successfully.

TRUREG, DBLDOG, LEVMAR: For trust-region algorithms, INSTEP specifies a factor r > 0 for the initial radius ∆^(0) of the trust region. The default initial trust-region radius is the length of the scaled gradient. This step corresponds to the default radius factor of r = 1.

NEWRAP, CONGRA, QUANEW, HYQUAN: For line-search algorithms, INSTEP specifies an upper bound for the initial step length for the line search during the first five iterations. The default initial step length is r = 1.

NMSIMP, COBYLA: Nelder-Mead simplex algorithm: the INSTEP=r option defines the size of the start simplex.

For more details see the section Computational Problems in the appendix.

HSPAT = string : specifies the name of an nz × 2 matrix containing pairs of row (first column) and column (second column) indices of nonzero entries of the Hessian matrix. If the Hessian matrix contains a large number of structural zero entries, i.e. is sparse, the definition of a sparsity pattern can save computer resources (memory and computation time). This can be very significant especially when the Hessian matrix must be computed by finite differences and variables can be clustered into separate groups taking advantage of the Coleman, Garbow, & More (1985, [172]) algorithm. Besides the speedier computation of derivatives, the trust-region (TRUREG) and Newton-Raphson (NEWRAP) algorithms can take advantage of the sparsity pattern defined by the HSPAT= specification.

INHESS = r ≥ 0 : specifies how the initial estimate of the approximate Hessian is defined for the quasi-Newton techniques QUANEW, DBLDOG, and HYQUAN. There are two alternatives:

• The = r specification is not used: the initial estimate of the approximate Hessian is set to the Hessian or cross-product Jacobian at x^(0);

• The = r specification is used: the initial estimate of the approximate Hessian is set to the multiple of the identity matrix rI.

By default, the initial estimate of the approximate Hessian is set to the multiple of the identity matrix rI, where the scalar r is computed from the magnitude of the initial gradient.


JSPAT = string : specifies the name of an nz × 2 matrix containing pairs of row (first column) and column (second column) indices of nonzero entries of the Jacobian matrix. If the Jacobian matrix contains a large number of structural zero entries, i.e. is sparse, the definition of a sparsity pattern can save computer resources (memory and computation time). This can be very significant especially when the Jacobian matrix must be computed by finite differences and columns (variables) can be clustered into separate groups taking advantage of the Coleman, Garbow, & More (1984, [171]) algorithm.

LAV : specifies that the objective function is defined as the sum of absolute function values returned by the user-supplied function module,

F(x) = Σ_{i=1}^m |f_i(x)| .

This option is similar to the LSQ, LPOW, and LINF options. Since the objective function is nonsmooth, only the NL1REG optimization technique or one of the more general nonsmooth methods can be used for optimization.

LCDEACT | LCD = r : specifies a threshold r for the Lagrange multiplier that decides whether an active inequality constraint remains active or can be deactivated. For a maximization (minimization), an active inequality constraint can be deactivated only if its Lagrange multiplier is greater (less) than the threshold value r. For maximization, r must be greater than zero; for minimization, r must be smaller than zero. The default is

r = ± min(0.01, max(0.1 ∗ ABSGCONV, 0.001 ∗ gmax^(k))),

where the + stands for maximization, the − for minimization, ABSGCONV is the value of the absolute gradient criterion, and gmax^(k) is the maximum absolute element of the (projected) gradient g^(k) or Z^T g^(k).

LCEPS | LCE = r ≥ 0 : specifies the range for active and violated boundary and linear constraints. If the point x^(k) satisfies the condition

|Σ_{j=1}^n a_ij x_j^(k) − b_i| ≤ r ∗ (|b_i| + 1),   (1.5)

the constraint i is recognized as an active constraint. Otherwise, the constraint i is either an inactive inequality or a violated inequality or equality constraint. The default value is r=1E-8. During the optimization process, the introduction of rounding errors can force increasing values of r by a factor of 10, 100, ...
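
A small C sketch (not CMAT code; the constraint data are invented for the example) of test (1.5) for a single linear constraint:

#include <math.h>
#include <stdio.h>

#define N 2

/* Check one linear constraint a'x >= b (or = b) against criterion (1.5):
   it counts as active when |sum_j a_j x_j - b| <= r * (|b| + 1). */
static int is_active(const double a[N], double b, const double x[N], double r)
{
    double ax = 0.0;
    for (int j = 0; j < N; j++)
        ax += a[j] * x[j];
    return fabs(ax - b) <= r * (fabs(b) + 1.0);
}

int main(void)
{
    double a[N] = { 1.0, 2.0 };
    double b = 3.0;
    double x[N] = { 1.0, 1.0 + 4.0e-9 };   /* a'x = 3 + 8e-9, nearly on the boundary */
    double r = 1.0e-8;                      /* LCEPS default */

    printf("active: %d\n", is_active(a, b, x, r));
    return 0;
}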

LCSING | LCS = r ≥ 0 : specifies a criterion r used in the update of the QR decomposition that decides whether an active constraint is linearly dependent on a set of other active constraints. The default is r=1E-8. The larger r becomes, the more the active constraints are recognized as being linearly dependent. If the value of r is larger than .1, it is reset to .1.

LINCON = string : specifies the name of an nlc × (n + 2) matrix which defines linear constraints,

b_i^lb ≤ Σ_{j=1}^n a_ij x_j ≤ b_i^ub,   i = 1, . . . , nlc

where A = (a_ij) contains given coefficients and b^lb = (b_i^lb) and b^ub = (b_i^ub) are given lower and upper bounds. The rows i = 1, . . . , nlc of the matrix correspond to the nlc constraints, the first column to the lower bounds b_i^lb, columns 2 until n + 1 contain the n coefficients a_ij, j = 1, . . . , n, and column n + 2 contains the upper bounds b_i^ub. The matrix can contain missing values only in its first and last columns, indicating that there is no bound in that direction.
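
A small C sketch (not CMAT code) of the row layout expected in a LINCON matrix; NAN is used here only to stand in for a missing bound, and the constraints are invented for the example.

#include <math.h>
#include <stdio.h>

#define NLC 2          /* number of linear constraints */
#define NP  3          /* number of parameters n       */

int main(void)
{
    /* Each row: [ lower bound | a_1 ... a_n | upper bound ].
       A missing value (represented here by NAN) in the first or last
       column means no bound in that direction:
         row 1:  1 <= x1 + 2*x2 - x3 <= 4
         row 2:       x1 -   x2      >= 0   (no upper bound)          */
    double lincon[NLC][NP + 2] = {
        { 1.0,  1.0,  2.0, -1.0, 4.0 },
        { 0.0,  1.0, -1.0,  0.0, NAN }
    };

    for (int i = 0; i < NLC; i++)
        printf("constraint %d: lb=%g coeffs=(%g,%g,%g) ub=%s\n",
               i + 1, lincon[i][0], lincon[i][1], lincon[i][2], lincon[i][3],
               isnan(lincon[i][NP + 1]) ? "none" : "finite");
    return 0;
}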

LINF : specifies that the objective function is defined as the maximum of absolute function values returned by the user-supplied function module,

F(x) = max_{i=1,...,m} |f_i(x)| .

This option is similar to the LSQ, LAV, and LPOW options. Since the objective function is nonsmooth, only the NLIREG optimization technique or one of the more general nonsmooth methods (NMSIMP, COBYLA) can be used for optimization.

LINESEARCH | LIS = i : specifies the line-search method for the CONGRA, QUANEW, HYQUAN, and NEWRAP optimization techniques. See Fletcher (1987) for an introduction to line-search techniques. The value of i can be 1, . . . , 8. For CONGRA, QUANEW, and NEWRAP, the default is i = 2. A special line-search method is the default for the least-squares technique HYQUAN; it is based on an algorithm developed by Lindstrom & Wedin (1984). Although it needs more memory, this default line-search method sometimes works better with large least-squares problems. However, by specifying LIS=i, i = 1, . . . , 8, you can also use one of the standard techniques with HYQUAN.

LIS=1 specifies a line-search method that needs the same number of function and gradient calls for cubic interpolation and cubic extrapolation; this method is similar to one used by the Harwell subroutine library.

LIS=2 specifies a line-search method that needs more function than gradient calls for quadratic and cubic interpolation and cubic extrapolation; this method is implemented as shown in Fletcher (1987) and can be modified to an exact line search by using the LSP option.

LIS=3 specifies a line-search method that needs the same number of function and gradient calls for cubic interpolation and cubic extrapolation; this method is implemented as shown in Fletcher (1987) and can be modified to an exact line search by using the LSP option.

LIS=4 specifies a line-search method that needs the same number of function and gradient calls for stepwise extrapolation and cubic interpolation; this method is similar to one described by Hamilton & Boothroyd (TOMS, 1969).

LIS=5 specifies a line-search method that is a modified version of LIS=4.

LIS=6 specifies golden section line search (Polak, 1971), which uses only function values for linear approximation.

LIS=7 specifies bisection line search (Polak, 1971), which uses only function values for linear approximation.

LIS=8 specifies the Armijo line-search technique (Polak, 1971), which uses only function values for linear approximation.

LPOW = r ≥ 1 : specifies that the objective function is defined as the sum of absolute function values raised to the power r,

F(x) = Σ_{i=1}^m |f_i(x)|^r .

Note that large values of r, e.g. r ≥ 10, may be numerically unstable due to rounding errors in the computation of residuals raised to large powers. This option is similar to the LAV, LSQ, and LINF options.


LSP = r ≥ 0 : specifies the degree of accuracy that should be obtained by the line-search algorithms LIS=2 and LIS=3. Usually an imprecise line search is inexpensive and successful. For more difficult optimization problems, a more precise and expensive line search may be necessary (Fletcher, 1987). The second (default for NEWRAP, QUANEW, and CONGRA) and third line-search methods approach exact line search for small LSPRECISION= values. If you have numerical problems, you should try to decrease the LSPRECISION= value to obtain a more precise line search. The default values are:

TECH=     UPD=          LSP default
QUANEW    DBFGS, BFGS   r = 0.4
QUANEW    DDFP, DFP     r = 0.06
HYQUAN    DBFGS         r = 0.1
HYQUAN    DDFP          r = 0.06
CONGRA    all           r = 0.1
NEWRAP    no update     r = 0.9

For more details see Fletcher (1987).

LSQ : specifies that the objective function is defined as the sum of squares of function values returned by the user-supplied function module,

F(x) = Σ_{i=1}^m f_i²(x) .

This option is similar to the LAV, LPOW, and LINF options. For almost all applications the preferred optimization technique is the Levenberg-Marquardt algorithm (LEVMAR).

MAXFU =i ≥ 0 :specifies the maximum number i of function calls in the optimization process. The default values are:

• TRUREG, LEVMAR, NRRIDG, NEWRAP: 125

• DQNSQP, QUANEW, HYQUAN, DBLDOG, NL1REG, NLIREG: 500

• CONGRA: 1000

• COBYLA, NMSIMP: 3000

Note that the optimization can be terminated only after completing a full iteration. Therefore, the number of function calls which is actually performed can exceed the number which is specified by the MAXFUNC option.

MAXIT =i ≥ 0 :specifies the maximum number i of iterations in the optimization process. The default values are:

• TRUREG, LEVMAR, NRRIDG, NEWRAP: 50

• DQNSQP, QUANEW, HYQUAN, DBLDOG, NL1REG, NLIREG: 200

• CONGRA: 400

• COBYLA, NMSIMP: 1000

This default value is valid also when i is specified as missing value. The optional second value n is validonly for TECH= QUANEW with nonlinear constraints.


MAXSTEP = r > 0 : specifies an upper bound for the step length of the line-search algorithms. By default, r is the largest double precision value available. Setting this option can reduce the speed of convergence for TECH=CONGRA, QUANEW, HYQUAN, and NEWRAP.

MAXTIM = r ≥ 0 : specifies an upper limit of r seconds of CPU time for the optimization process. The default is the largest floating point double representation of your computer. Note that the time specified by the MAXTIME option is checked only once at the end of each iteration. Therefore, the actual running time of the NLP() job can be much longer than that specified by the MAXTIME option. The actual running time includes the rest of the time needed to finish the iteration and the time for the printed output.

MINIT = i ≥ 0 : specifies the minimum number of iterations. The default value is zero. If you request more iterations than are necessary to achieve convergence to a stationary point, the optimization algorithms can behave differently. In such cases the effect of rounding errors (for example, no descent direction is found for minimization) can prevent the algorithm from continuing for the required number of iterations.

MSINGULAR | MSING = r > 0 : specifies a relative singularity criterion for the computation of the inertia (number of positive, negative, and zero eigenvalues) of the Hessian and cross-product Jacobian and their projected forms.

ASING and VSING are the specified values of the ASINGULAR= and VSINGULAR= options. The default value for MSING is 1E-12 if the SINGULAR option is not specified and max(10 ∗ ε, 1E−4 ∗ SINGULAR) otherwise.

NLCB = string : specifies the name of an nnlc × 2 matrix which contains lower bounds b_i^lb in its first column and upper bounds b_i^ub in its second column, where the rows i correspond to the nnlc constraints,

b_i^lb ≤ c_i(x) ≤ b_i^ub,   i = 1, . . . , nnlc.

The nnlc rows must correspond to the same order of the constraint functions specified in the module argument nlcon. The matrix can contain missing values if there is no bound for that constraint in that direction.

NOP :suppresses the printed output.

NLCRMIT = i ≥ 0 : specifies an upper bound of iterations for a preliminary minimization process designed to reduce the amount of violation of nonlinear constraints at the starting point. The optimization of merit functions like those used in DQNSQP and COBYLA may be sensitive toward large violations of nonlinear constraints at the starting point. Specifying NLCRMIT= could be helpful in moving the starting point toward the feasible region and so creating a better starting point for the main optimization algorithm.

NOCSCALE :This option will suppress the scaling of linear and nonlinear constraints which would be done by defaultotherwise.

NOSPARSE :This option will suppress the search for a sparsity pattern in Hessian or Jacobian which would be done bydefault otherwise.


OPTCHECK = r ≥ 0 : specifies the use of a simple algorithm that computes the function values f(x_l) of a grid of points x_l in a small neighborhood of the final parameter estimates x*. The value of r specifies the radius of an n-dimensional sphere within which the trial points are located. If OPTCHECK is specified but not r, the default is r = .1 if the algorithm is performed at the starting point and r = .01 if the algorithm is performed after some iterations were successful. These default values may not be a good choice for many problems. This algorithm is started only if some condition for termination is satisfied. If a point x_l^* is found with a better function value than f(x*), the optimization process is restarted by using the point x_l^* as the initial point x^(0). This option can be useful in some minimization (maximization) examples where the specified starting point x^(0) is a maximum (minimum) point or where the optimization process terminates at a saddle point. In such cases the first-order termination criteria are satisfied, and the optimization process would otherwise terminate at the starting point x^(0). A better alternative circumventing such problems is to specify a grid of initial points.

PALL :prints all optional output.

PFUNCTV :prints the values of all functions specified in the func() module. The PALL option sets the PFUNCTVoption automatically.

PHIST :prints the optimization history. This printed output is included in both the default output and the outputspecified by the PALL option.

PINIT :prints the initial values and derivatives (if available) evaluated at the starting point. This printed output isincluded in both the default output and the output specified by the PALL option.

PRINT [= i ≥ 0 ] : specifies the amount of printed output. Larger values of i will increase the amount of printed output. PRINT=0 is equivalent to the NOPRINT option. The integer specification may be suppressed, which defaults to PRINT=3.

PSHORT :restricts the amount of default printed output. If PSHORT is specified, then

• the initial values are not printed;

• the listing of constraints is not printed;

• if m, the number of functions specified in the funct() module, is larger than one, the values fi(x) ofthe m functions are not printed;

• if the GRADCHECK=DETAIL option is used, only the test vector is printed.

PSUM :restricts the amount of default printed output to a short form of iteration history and NOTEs, WARNINGs,and ERRORs.

PTIME :The PTIME option specifies the output of the total running time.

REST =i > 0 :specifies that the QUANEW, HYQUAN, or CONGRA algorithm is restarted with a steepest descent/ascentsearch direction after at most i iterations. Default values are as follows:


• CONGRA: UPD=PB: restart is done automatically so specification of i is not used.

• CONGRA: UPD≠PB: i = min(10n, 80), where n is the number of parameters.

• QUANEW, HYQUAN: i is the largest integer available.

SING = r > 0 : specifies the singularity criterion for the inversion of the Hessian matrix and cross-product Jacobian. The default value is 1E-8. If the MSINGULAR option is not specified, SINGULAR defines MSINGULAR = max(10 ∗ ε, 1E−4 ∗ SINGULAR). If the VSINGULAR option is not specified, SINGULAR sets VSINGULAR=SINGULAR.

TECH =name :specifies the optimization technique. Valid values for name are as follows:

• COBYLA | COB performs Powell’s modification (Constrained Optimization BY Linear Approximations) of the classic Nelder-Mead simplex (NMSIMP) optimization method for linear or nonlinear constraints which does not need derivatives. The COBYLA optimization technique can require many function calls and may not be able to solve problems with many parameters or with highly nonlinear constraints efficiently.

• CONGRA | CG chooses one of four different conjugate-gradient optimization algorithms which can be more precisely defined with the UPD option and modified with the LINESEARCH option. The conjugate-gradient techniques need only O(n) memory, compared to the O(n²) memory for the other optimization techniques, where n is the number of parameters. On the other hand, the conjugate gradient techniques can be significantly slower than other optimization techniques and should be used only when insufficient memory is available for more efficient techniques. When you choose this option, UPD=PB by default. For n ≥ 400, this is the default optimization technique. The algorithm uses only first-order derivatives.

• DBLDOG | DD performs a version of double dogleg optimization, which uses the gradient to update an approximation of the Cholesky factor of the Hessian. This technique is in many aspects very similar to the dual quasi-Newton method but does not use line search. The implementation is based on Dennis & Mei (1979) and Gay (1983).

• DQNSQP | SQP chooses a modification of Powell’s (1982) quasi-Newton algorithm VMCWD for nonlinearly constrained optimization, which is a sequential quadratic programming (SQP) method. This algorithm can be modified by specifying VERSION=1, which replaces the update of the Lagrange multiplier estimate vector µ by the original update of Powell (1978) used in VF02AD. This can be helpful for applications with linearly dependent active constraints. DQNSQP is the default optimization technique if there are nonlinear constraints specified. Similar to QUANEW, the DQNSQP algorithm uses only first-order derivatives of the objective function and of the nonlinear constraint functions.

• GAUNEW | GAN

• GENALG | GA

• HYQUAN | HQNchooses one of three different hybrid quasi-Newton optimization algorithms that can be more pre-cisely defined with the VERSION option and modified with the LINESEARCH= option. This tech-nique is used only for least-squares problems. When you choose this option, VERSION=2 and UP-DATE=DBFGS by default. The three versions of this algorithm correspond to the algorithms HY1,HY2, and HY3 of Fletcher & Xu (1987).


• LEVMAR | LM
performs a highly stable but, for large problems, memory- and time-consuming Levenberg-Marquardt minimization technique, a slightly improved variant of the More (1978) implementation. This technique is useful only for least-squares minimization. For n < 40 this is the default minimization technique for least-squares problems.

• NMSIMP | NMS
performs the classic Nelder-Mead simplex optimization method, which does not need derivatives. If the parameters are subjected to linear or nonlinear constraints, Powell's (1992) COBYLA algorithm should be selected. The NMSIMP optimization technique can require many function calls and is not able to solve problems with many parameters efficiently.

• NONE | NO
does not perform any optimization. This option can be used

– to do a grid search without optimization;
– to compute and print derivatives and/or covariance matrices that cannot be obtained efficiently with any of the optimization techniques.

• NEWRAP | NRA
performs a usually stable but, for large dense problems, memory- and time-consuming Newton-Raphson optimization technique. The algorithm combines a line-search algorithm with ridging. The line-search algorithm LIS=2 is the default. If this algorithm is used for least-squares problems, it performs a modified Gauss-Newton minimization. For general optimization, this algorithm uses second-order derivatives. For unconstrained or only boundary constrained problems, this algorithm exploits the sparsity structure in Hessian matrices.

• NRRIDG | NRR
performs a usually stable but, for large problems, memory- and time-consuming Newton-Raphson optimization technique. For fewer than 40 parameters, this is the default technique for general optimization. If this algorithm is used for least-squares problems, it performs a ridged Gauss-Newton minimization. For general optimization, the algorithm uses second-order derivatives. Since TECH=NRRIDG uses an orthogonal decomposition of the approximate Hessian, each iteration of TECH=NRRIDG can be slower than that of TECH=NEWRAP, which works with a Cholesky decomposition. However, TECH=NRRIDG usually needs fewer iterations than TECH=NEWRAP.

• QUADPEN | QPN
performs the quadratic penalty method by Conn & Gould (1984). This method should be selected only if the variables are subjected to nonlinear constraints.

• QUANEW | QN
chooses one of four quasi-Newton optimization algorithms that can be defined more precisely with the UPD option and modified with the LINESEARCH option. When you choose this option, UPD=DBFGS by default. The QUANEW technique is the default optimization technique if there are more than 40 and fewer than 400 parameters to estimate. The QUANEW algorithm uses only first-order derivatives of the objective function.

• TRUREG | TR
performs a usually very stable but, for large problems, memory- and time-consuming trust region optimization technique. The algorithm is implemented similar to Gay (1983) and More & Sorensen (1983). Except for least-squares problems, the algorithm uses second-order derivatives. For unconstrained or only boundary constrained problems, this algorithm exploits the sparsity structure in Hessian matrices.
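As a minimal sketch of how a technique is selected, the lowercase keyword "tech" can be given in the option matrix that is passed to the nlp() function; the Rosenbrock module frosen() below and the use of "." placeholders for the unused trailing arguments are illustrative assumptions based on the examples at the end of this section:

   /* sketch only: select TECH=TRUREG for a small smooth problem */
   function frosen(x) {
      t1 = x[2] - x[1]*x[1];
      t2 = 1. - x[1];
      f  = 100.*t1*t1 + t2*t2;
      return(f);
   }
   x0   = [ -1.2 1. ];
   mopt = [ "tech"  "trureg" ,   /* or "nrridg", "quanew", "congra", ... */
            "print" 3        ];
   < xr,rp,der1,der2 > = nlp(frosen,x0,mopt,.,.,.,.);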

UPD =name : specifies the update method for the (dual) quasi-Newton, double dogleg, hybrid quasi-Newton, or conjugate-gradient optimization technique.


For TECHNIQUE=QUANEW, the following updates can be used (the default is DBFGS):

• BFGS : performs the original BFGS (Broyden, Fletcher, Goldfarb, & Shanno) update of the inverse Hessian matrix.

• DBFGS : performs the dual BFGS (Broyden, Fletcher, Goldfarb, & Shanno) update of the Cholesky factor of the Hessian matrix. This is the default.

• DDFP : performs the dual DFP (Davidon, Fletcher, & Powell) update of the Cholesky factor of the Hessian matrix.

• DFP : performs the original DFP (Davidon, Fletcher, & Powell) update of the inverse Hessian matrix.

For TECHNIQUE=DQNSQP, DBLDOG, HYQUAN, the following updates can be used (the default is DBFGS):

• DBFGS : performs the dual BFGS (Broyden, Fletcher, Goldfarb, & Shanno) update of the Cholesky factor of the Hessian matrix. This is the default.

• DDFP : performs the dual DFP (Davidon, Fletcher, & Powell) update of the Cholesky factor of the Hessian matrix.

For TECHNIQUE=CONGRA, the following updates can be used (the default is PB):

• PB : performs the automatic restart update method of Powell (1977) and Beale (1972). This is the default.

• FR : performs the Fletcher-Reeves update (Fletcher 1987).

• PR : performs the Polak-Ribiere update (Fletcher 1987).

• CD : performs a conjugate-descent update of Fletcher (1987).

• BM: Birgin & Martinez (2001) update.

• SFR: performs the scaled Fletcher-Reeves update.

• SPR : performs the scaled Polak-Ribiere update.

VERSION =1|2|3 : specifies the version of the HYQUAN, DQNSQP, or NL1REG optimization technique.

For HYQUAN:

VS=1 specifies version HY1 of Fletcher & Xu (1987);

VS=2 specifies version HY2 of Fletcher & Xu (1987);

VS=3 specifies version HY3 of Fletcher & Xu (1987).

For DQNSQP:

VS=1 specifies the update of the µ vector like Powell (1978) (update like VF02AD);

VS=2 specifies the update of the µ vector like Powell (1982) (update like VMCWD).

For NL1REG:

VS=1 specifies the method by Pinar & Hartmann (1999)

VS=2 specifies the method by Hald & Madsen (1985)

In all cases the default is VS=2.


VSINGULAR | VSING =r ≥ 0 : specifies a relative singularity criterion for the computation of the inertia (number of positive, negative, and zero eigenvalues) of the Hessian and cross-product Jacobian and their projected forms. The following singularity criterion is used:

    |d_j,j| ≤ max( ASING, VSING ∗ |A_j,j|, MSING ∗ max(|A_1,1|, . . . , |A_n,n|) )

where d_j,j is the diagonal pivot of the matrix A, and ASING and MSING are the specified values of the ASINGULAR and MSINGULAR options. The default value for VSING is 1E-8 if the SINGULAR option is not specified and the value of SINGULAR otherwise.

XCONV | XTOL =r ≥ 0 :specifies the relative parameter convergence criterion.

TRUREG, NEWRAP, NRRIDG, QUANEW, DBLDOG, CONGRA, LEVMAR, HYQUAN: Termination requires a small relative parameter change in subsequent iterations,

    max_j |x_j^(k) − x_j^(k−1)| / max(|x_j^(k)|, |x_j^(k−1)|, XSIZE) ≤ r.

The default value is r = 0.

NMSIMP: The same formula is used, but x_j^(k) is defined as the vertex with the lowest function value and x_j^(k−1) is defined as the vertex with the highest function value in the simplex. The default value is r = 1E-8.

NL1REG(version=2), NLIREG, MINMAX: Termination requires a small relative parameter change in subsequent iterations,

    ‖x^(k+1) − x^(k)‖_2 ≤ r ‖x^(k)‖_2.

The default value is r = 1E-8.

QUADPEN, DQNSQP, COBYLA, NL1REG(version=1): Currently not used.

XSIZE =r ≥ 0 : specifies the XSIZE parameter of the relative parameter termination criterion. The default is r = 0. For more detail see the XCONV option.

A completely different set of options can be specified for the genetic algorithm GENALG:

Options for Genetic Algorithm

    Column 1   Column 2     Description
    elite      missing      invoke elitism (replicate best individual)
    microga    missing      perform micro GA (normally with popsize=5)
    niche      missing      perform niching (recommended)
    creep      missing      perform creep mutations
    uniform    missing      perform uniform crossover (vs. single point)
    maxgen     i > 0        generate i generations (iterations)
    popsize    i > 0        size of population at each generation
    pmutate    0 ≤ r < 1    jump mutation probability
    pcross     0 ≤ r < 1    crossover probability
    pcreep     0 ≤ r < 1    creep mutation probability
    nchild     i = 1, 2     define 1 or 2 children
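As an illustration only (not taken from the examples in this manual), such options could be collected in the option matrix together with the "tech" keyword; the particular values below are arbitrary assumptions:

   mopt = [ "tech"    "genalg" ,
            "niche"            ,   /* niching, recommended */
            "elite"            ,   /* replicate best individual */
            "maxgen"  200      ,   /* number of generations */
            "popsize" 50       ,
            "pmutate" .02      ,
            "pcross"  .5       ,
            "print"   2        ];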

For simulated annealing we may specify the following options:


Options for Simulated Annealing

    Column 1   Column 2   Description
    temp       r > 0      initial temperature (default=1.)
    redt       r > 0      temperature reduction factor (default=.85)
    cycl       i > 0      number of cycles (default=20)
    it2tr      i > 0      number of iterations until temperature reduction (default=max(100, 5*n))
    maxf       i > 0      maximum number of function calls (default=100000)
    final      i > 0      final number of function values deciding termination (default=4)
    absxtol    r > 0      termination criterion (with "final" option) (default=1.e-6)
    inist      r > 0      radius of neighborhood around starting point (default=1.)
    stpadj     r > 0      step length adjustment (default=2.)

The termination tolerance is described in Goffe et al. (1994): if the final function values from the last neps temperatures differ from the corresponding value at the current temperature by less than absxtol, and the final function value at the current temperature differs from the current optimal function value by less than absxtol, execution terminates with convergence.
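A sketch of a corresponding option matrix; that simulated annealing is selected with "tech" "simann" and that the table keywords can be given in the option matrix like other options are assumptions here, and the values are arbitrary:

   mopt = [ "tech"  "simann" ,
            "temp"   10.     ,   /* initial temperature */
            "redt"   .85     ,   /* temperature reduction factor */
            "maxf"   20000   ,   /* limit on function calls */
            "print"  2       ];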

1.4.3 Summary of Optimization Techniques

DQNSQP: Sequential Quadratic Programming For nonlinearly constrained optimization problems this SQP algorithm is a modification of Powell's (1978, 1982) Variable Metric Constrained WatchDog (VMCWD) algorithm. A similar but older algorithm (VF02AD) is part of the Harwell library. Both VMCWD and VF02AD use Fletcher's VE02AD algorithm (part of the Harwell library) for positive definite quadratic programming. The DQNSQP implementation uses a quadratic programming subroutine (same as in qp) that updates and downdates the approximation of the Cholesky factor when the active set changes. DQNSQP is not a feasible point algorithm, and the value of the objective function need not decrease (minimization) or increase (maximization) monotonically. Instead, the algorithm tries to reduce a linear combination of the objective function and constraint violations, called the merit function.

The following are similarities and differences between this algorithm and VMCWD:

• A modification of this algorithm can be performed by specifying VERSION=1, which replaces the update of the Lagrange vector µ with the original update of Powell (1978) which is used in VF02AD. This can be helpful for some applications with linearly dependent active constraints.

• If VERSION is not specified or VERSION=2 is specified, the evaluation of the Lagrange vector µ is performed in the same way as Powell (1982) describes.

• Instead of updating an approximate Hessian matrix, this algorithm uses the dual BFGS (or DFP) update that updates the Cholesky factor of an approximate Hessian. If the condition of the updated matrix gets too bad, a restart is done with a positive diagonal matrix. At the end of the first iteration after each restart the Cholesky factor is scaled.

• The Cholesky factor is loaded into the quadratic programming subroutine, automatically ensuring positive definiteness of the problem. During the quadratic programming step, the Cholesky factor of the projected Hessian matrix Z_k^T G Z_k and the QT decomposition are updated simultaneously when the active set changes. See Gill, Murray, Saunders, & Wright (1984) for more information.

• The line-search strategy is very similar to that of Powell (1982). However, this algorithm does not call for derivatives during the line search. That is the reason why this algorithm generally needs fewer derivative calls than function calls. VMCWD always requires the same number of derivative and function calls.


• The watchdog strategy is also similar to that of Powell (1982). However, this algorithm does not return automatically after a fixed number of iterations to a former better point. A return here is further delayed if the observed function reduction is close to the expected function reduction of the quadratic model.

• Although Powell's termination criterion is still used (as FTOL2), the DQNSQP implementation uses two additional termination criteria (GTOL and ABSGTOL).

The DQNSQP algorithm needs the Jacobian matrix of the first-order derivatives (constraint normals) of the constraints,

    (∇c_i) = (∂c_i/∂x_j),  i = 1, . . . , nc,  j = 1, . . . , n,

where nc is the number of nonlinear constraints, for a given point x.

You can specify two update formulas with the UPD option:

• UPD=DBFGS performs the dual BFGS update of the Cholesky factor of the Hessian matrix. This is the default.

• UPD=DDFP performs the dual DFP update of the Cholesky factor of the Hessian matrix.

This algorithm uses its own line-search technique. All options and parameters (except the INSTEP option) controlling the line search in the other algorithms do not apply here. In several applications, large steps in the first iterations were troublesome. You can use the INSTEP option to impose an upper bound for the step size α during the first five iterations. You may also use the INHESS option to specify a different starting approximation for the Hessian. Choosing simply the INHESS option will use the Cholesky factor of a (possibly ridged) finite difference approximation of the Hessian to initialize the quasi-Newton update process. The values of the LCSING, LCEPS, and LCDEACT options, which control the processing of linear and boundary constraints, are valid only for the quadratic programming subroutine used in each iteration of the DQNSQP algorithm.

In some applications the COBYLA algorithm may behave more stably than DQNSQP, but it generally needs many more function evaluations.
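A minimal sketch of selecting this algorithm with the alternative multiplier update; the "vers" keyword appears in the examples later in this section, while writing the UPD= option as the lowercase keyword "upd" is an assumption:

   mopt = [ "tech"  "dqnsqp" ,
            "vers"  1        ,   /* Powell (1978) update of mu, as in VF02AD */
            "upd"   "ddfp"   ,   /* dual DFP instead of the default dual BFGS */
            "print" 3        ];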

VMCWD: Original MCWD Code by Powell

QUADPEN: Quadratic Penalty Method

COBYLA: COnstrained Optimization BY Linear Approximation A slightly modified version of Powell's (1992) COBYLA implementation is used for the optimization of (nonsmooth) objective functions with nonlinear or general linear constraints. This method does not need the specification of any derivatives. Powell's COBYLA algorithm is a sequential trust-region algorithm (originally with a monotonically decreasing radius ρ of a spheric trust region) that tries to maintain a regular-shaped simplex over the iterations. A small modification was made to the original algorithm, which permits an increase of the trust-region radius ρ in special situations. A sequence of iterations is performed with a constant trust-region radius ρ until the computed objective function reduction is much less than the predicted reduction. Then, the trust-region radius ρ is reduced. The trust-region radius is increased only if the computed function reduction is relatively close to the predicted reduction and the simplex is well-shaped. The start radius ρ_beg and the final radius ρ_end can be specified using ρ_beg=INSTEP and ρ_end=ABSXTOL. The convergence to small values of ρ_end (high precision) may take many calls of the function and constraint modules and may result in numerical problems. There are two main reasons for the sometimes slow convergence of the COBYLA algorithm:


• Only linear approximations of the objective and constraint functions are used locally.

• Maintaining the regular-shaped simplex and not adapting its shape to nonlinearities yields very small simplexes for highly nonlinear functions (for example, fourth-order polynomials).

TRUREG: Trust-Region Optimization The trust-region method uses the gradient g^(k) = ∇f(x^(k)) and Hessian matrix G^(k) = ∇²f(x^(k)) and requires that the objective function f = f(x) have continuous first- and second-order derivatives inside the feasible region.

The n × n Hessian matrix G contains the second derivatives of the objective function f with respect to the parameters x1, . . . , xn, as follows:

    G(x) = ∇²f(x) = ( ∂²f / ∂x_j ∂x_k )

In the unconstrained or only boundary constrained case, the TRUREG algorithm can take advantage of diagonal or sparse Hessian matrices. Specifying the sparsity pattern of the Hessian matrix using the HSPAT= option may save considerable memory and computer time, especially when an analytic Hessian is not specified and numeric second order derivatives must be computed.

The trust-region method iteratively optimizes a quadratic approximation to the nonlinear objective function within a hyperelliptic trust region with radius ∆ that constrains the step size corresponding to the quality of the quadratic approximation. The trust-region method is implemented using Dennis, Gay, & Welsch (1981), Gay (1983), and More & Sorensen (1983).

The trust region method performs well for small- to medium-sized problems and does not need many function, gradient, and Hessian calls. However, if the computation of the Hessian matrix is computationally expensive, one of the (dual) quasi-Newton or conjugate gradient algorithms may be more efficient.

NEWRAP: Newton-Raphson Optimization With Line Search This algorithm uses a pure Newton step when the Hessian is positive definite and when the Newton step reduces the value of the objective function successfully. Otherwise a combination of ridging and line search is done to compute successful steps. If the Hessian is not positive definite, a multiple of the identity matrix is added to the Hessian matrix to make it positive definite (Eskow & Schnabel, 1991).

The NEWRAP technique uses the gradient g^(k) = ∇f(x^(k)) and Hessian matrix G^(k) = ∇²f(x^(k)) and requires that the objective function have continuous first- and second-order derivatives inside the feasible region. If second-order derivatives are computed efficiently and precisely, the NEWRAP method may perform well for medium-sized to large problems and does not need many function, gradient, and Hessian calls.

In the unconstrained or only boundary constrained case, the NEWRAP algorithm can take advantage of diagonal or sparse Hessian matrices. Specifying the sparsity pattern of the Hessian matrix using the HSPAT= option may save considerable memory and computer time, especially when an analytic Hessian is not specified and numeric second order derivatives must be computed.

In each iteration a line search is done along the search direction to find an approximate optimum of the objective function. The default line-search method uses quadratic interpolation and cubic extrapolation (LIS=2). Other line-search algorithms can be specified with the LIS= option.

NRRIDG: Newton-Raphson Ridge Optimization This algorithm uses a pure Newton step when the Hessian is positive definite and when the Newton step reduces the value of the objective function successfully.


If at least one of these two conditions is not satisfied, a multiple of the identity matrix is added to the Hessian matrix.

The NRRIDG technique uses the gradient g^(k) = ∇f(x^(k)) and Hessian matrix G^(k) = ∇²f(x^(k)) and requires that the objective function have continuous first- and second-order derivatives inside the feasible region.

The NRRIDG method performs well for small to medium-sized problems and does not need many function, gradient, and Hessian calls. However, if the computation of the Hessian matrix is computationally expensive, one of the (dual) quasi-Newton or conjugate gradient algorithms may be more efficient.

QUANEW: Quasi-Newton Optimization The (dual) quasi-Newton optimization techniques work well for medium to moderately large optimization problems where the objective function and the gradient are much faster to compute than the Hessian. QUANEW does not need the computation of second-order derivatives, but in general requires more iterations than the techniques (TRUREG, NEWRAP, and NRRIDG) which compute second-order derivatives.

Using the UPD option you can choose between two different algorithms:

• the original quasi-Newton algorithm, that updates an approximation of the inverse Hessian.

• the dual quasi-Newton algorithm, that updates the Cholesky factor of an approximate Hessian.

For problems with general linear inequality constraints the dual quasi-Newton methods can be more efficient than the original ones.

Four update formulas can be specified with the UPD option:

DBFGS performs the dual BFGS (Broyden, Fletcher, Goldfarb, & Shanno) update of the Cholesky factor of the Hessian matrix. This is the default.

DDFP performs the dual DFP (Davidon, Fletcher, & Powell) update of the Cholesky factor of the Hessian matrix.

BFGS performs the original BFGS (Broyden, Fletcher, Goldfarb, & Shanno) update of the inverse Hessian matrix.

DFP performs the original DFP (Davidon, Fletcher, & Powell) update of the inverse Hessian matrix.

In each iteration a line search is done along the search direction to find an approximate optimum of the objective function. The default line-search method uses quadratic interpolation and cubic extrapolation to obtain a step size α satisfying the Goldstein conditions. One of the Goldstein conditions can be violated if the feasible region defines an upper limit of the step size. Violating the left side Goldstein condition can affect the positive definiteness of the quasi-Newton update. In those cases either the update is skipped or the iterations are restarted with an identity matrix, resulting in the steepest descent or ascent search direction. Line-search algorithms other than the default one can be specified with the LIS option.
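A corresponding option matrix might look as follows (a sketch only; the lowercase keywords "upd" and "lis" are assumed to correspond to the UPD= and LIS= options described above):

   mopt = [ "tech"  "quanew" ,
            "upd"   "dfp"    ,   /* original DFP update of the inverse Hessian */
            "lis"   2        ,   /* default line search, shown explicitly */
            "print" 3        ];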

DBLDOG: Double Dogleg Optimization The double dogleg optimization method combines the ideas of quasi-Newton and trust region methods. In each iteration the double dogleg algorithm computes the step s^(k) as the linear combination of the steepest descent or ascent search direction s1^(k) and a quasi-Newton search direction s2^(k),

    s^(k) = α1 s1^(k) + α2 s2^(k).    (1.6)


The step is requested to remain within a prespecified trust region radius, see Fletcher (1987, p. 107). Thus the DBLDOG subroutine uses the dual quasi-Newton update but does not perform a line search. Two update formulas can be specified with the UPD option:

DBFGS performs the dual BFGS (Broyden, Fletcher, Goldfarb, & Shanno) update of the Cholesky factor of the Hessian matrix. This is the default.

DDFP performs the dual DFP (Davidon, Fletcher, & Powell) update of the Cholesky factor of the Hessian matrix.

The double dogleg optimization technique works well for medium to moderately large optimization problems where the objective function and the gradient are much faster to compute than the Hessian. The implementation is based on Dennis & Mei (1979) and Gay (1983) but is extended for dealing with boundary and linear constraints. DBLDOG generally needs more iterations than the techniques (TRUREG, NEWRAP, or NRRIDG) which need second-order derivatives, but each of the DBLDOG iterations is computationally cheap. Furthermore, DBLDOG needs only gradient calls for the update of the Cholesky factor of an approximate Hessian.

POWBLC: Powell's Robust BFGS Method for Linear Constraints This algorithm implements Powell's (1988) BFGS method for boundary and general linear constraints. It seems to be a reasonably fast and robust method for linearly constrained problems with rather large n.

LMEMQN: Limited Memory Quasi Newton (BFGS Update) This algorithm implements the Byrd, Nocedal, & Zhu (1995) limited memory BFGS method for optimizing an unconstrained or boundary constrained general nonlinear function. It cannot be used for general linear constraints. If n is very large and there is not enough space for storing the Hessian matrix, which requires O(n²) memory, only the LMEMQN and CONGRA methods can be used. The performance of the LMEMQN method typically lies between that of the quasi-Newton and the conjugate gradient methods: its memory use is closer to conjugate gradient, while its speed is closer to the quasi-Newton methods.

CONGRA: Conjugate Gradient Optimization The CONGRA subroutine needs only function and gradient calls. The gradient vector contains the first derivatives of the objective function f with respect to the parameters x1, . . . , xn, as follows:

    g(x) = ∇f(x) = ( ∂f/∂x_j )

If the grad module is not specified, the gradient is approximated by finite difference formulas using only function calls. Second-order derivatives are not needed by CONGRA and not even approximated. The CONGRA algorithm can be expensive in function and gradient calls but needs only O(n) memory for unconstrained optimization. In general, many iterations are needed to obtain a precise solution, but each of the CONGRA iterations is computationally cheap. Four different update formulas for generating the conjugate directions can be specified using the UPD option:

PB : performs the automatic restart update method of Powell (1977) and Beale (1972). This is the default.

FR : performs the Fletcher-Reeves update (Fletcher 1987).

PR : performs the Polak-Ribiere update (Fletcher 1987).

CD : performs a conjugate-descent update of Fletcher (1987).


The default is UPD=PB, since it behaved best in most test examples. You are advised to avoid the option UPD=CD, which behaved worst in most test examples.

The CONGRA subroutine should be used for optimization problems with large n. For the unconstrained or boundary constrained case, CONGRA needs only O(n) bytes of working memory, whereas all other optimization methods require O(n²) bytes of working memory. During n successive iterations, uninterrupted by restarts or changes in the working set, the conjugate gradient algorithm computes a cycle of n conjugate search directions. In each iteration a line search is done along the search direction to find an approximate optimum of the objective function. The default line-search method uses quadratic interpolation and cubic extrapolation to obtain a step size α satisfying the Goldstein conditions. One of the Goldstein conditions can be violated if the feasible region defines an upper limit for the step size. Other line-search algorithms can be specified with the LIS option.
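A sketch for a large unconstrained problem (assuming the UPD= value can be given with a lowercase "upd" keyword as in the other option sketches above):

   mopt = [ "tech"  "congra" ,
            "upd"   "pr"     ,   /* Polak-Ribiere instead of the default PB */
            "print" 2        ];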

NMSIMP: Nelder-Mead Simplex Optimization The Nelder-Mead simplex method does not use any derivatives and does not assume that the objective function has continuous derivatives. The objective function itself needs to be continuous. This technique is quite expensive in the number of function calls and may be unable to generate precise results for n ≫ 40. The original Nelder-Mead simplex algorithm is extended to boundary constraints. This algorithm does not compute the objective for infeasible points. The original Nelder-Mead algorithm cannot be used for general linear or nonlinear constraints, but it can be faster than COBYLA for the unconstrained or boundary constrained case. The original Nelder-Mead algorithm changes the shape of the simplex, adapting to the nonlinearities of the objective function, which contributes to an increased speed of convergence.

SIMANN: Simulated Annealing Algorithm The algorithm is based on a Fortran implementation by Goffe, Ferrier, & Rogers (1994). The algorithm seems to be more efficient for solving global optimization problems than the current form of the genetic algorithm GENALG. This algorithm uses only function values and is therefore also applicable for the optimization of nonsmooth functions. Only boundary constraints may be specified, and it is highly recommended to restrict the size of the search area by specifying some intelligent lower and upper bounds.

GENALG: Global Optimization by Genetic Algorithm The algorithm is a modified version of the genetic algorithm implemented by David Carroll (based on [316]). Currently this algorithm is designed only for maximizing a nonnegative valued objective function. That means for minimizing a positive valued function f(x) the function definition must be changed to some g(x) like

    g(x) = sup(f(x)) − f(x)

where sup() denotes the largest possible function value inside the feasible region.

Only boundary constraints may be specified, and it is highly recommended to restrict the size of the search area by specifying some intelligent lower and upper bounds.

If the "max" option is not specified by the user, it will be specified automatically. If the function evaluation returns negative values of the objective function, a warning is printed into the log output. The user must take care of those specific facts when specifying the objective function. In some applications the objective function f(x) must be modified for

maximization to g(x) = c + f(x) with a sufficiently large constant c > 0 to avoid negative values of f(x);

minimization to g(x) = c − f(x) with a sufficiently large constant c > 0 to avoid negative values of −f(x).

An example below illustrates how this can be done. Also, the GENALG algorithm assumes that boundary constraints are specified. If the user does not specify boundary constraints, the unit hypercube

0 ≤ xj ≤ 1, for j = 1, . . . , n


is specified automatically which may be too small for most applications.
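A minimal sketch of such a modification: a positive valued function is minimized by maximizing g(x) = c − f(x) with a constant c that is known to dominate f inside the bounds. The module name, the constant, and the bounds below are illustrative assumptions:

   function gmax(x) {
      /* minimize f(x) = ssq(x - (1,2)) by maximizing g(x) = 1000 - f(x) */
      f = ssq(x - [ 1. 2. ]);
      g = 1000. - f;
      return(g);
   }
   x0   = [ .5 .5 ];
   lbc  = [ 2#-5. ];  ubc = [ 2#5. ];
   bc   = lbc‘ -> ubc‘;
   mopt = [ "tech" "genalg" , "popsize" 50 , "maxgen" 200 , "print" 2 ];
   < xr,rp,der1,der2 > = nlp(gmax,x0,mopt,bc,.,.,.);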

LEVMAR: Levenberg-Marquardt Least-Squares Method The Levenberg-Marquardt method is a modification of the trust-region method for nonlinear least-squares problems and is implemented as in More (1978). This is the recommended algorithm for small- to medium-sized least-squares problems. Large least-squares problems can be transformed into minimization problems, which can be processed with conjugate gradient or (dual) quasi-Newton techniques. In each iteration LEVMAR solves a quadratically constrained quadratic minimization problem that restricts the step to stay at the surface of or inside an n-dimensional elliptical (or spherical) trust region. In each iteration, LEVMAR computes the cross-product matrix of the m × n Jacobian matrix J, which contains the first-order derivatives of the m functions fi = fi(x) with respect to the parameters x1, . . . , xn, as follows:

    J(x) = (∇f_1, . . . , ∇f_m) = ( ∂f_i/∂x_j )

The cross-product Jacobian J^T J,

    J^T J = ( ∑_{i=1}^{m} (∂f_i/∂x_j)(∂f_i/∂x_k) )

is used as an approximate Hessian matrix.
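A minimal least-squares sketch, writing the Rosenbrock problem with the two residuals f1 = 10(x2 − x1²) and f2 = 1 − x1. That the funct() module may return the vector of the m residuals, and that no further specification of the number of functions is needed here, are assumptions:

   function frosls(x) {
      f = cons(2);
      f[1] = 10. * (x[2] - x[1]*x[1]);
      f[2] = 1. - x[1];
      return(f);
   }
   x0   = [ -1.2 1. ];
   mopt = [ "tech" "levmar" , "print" 3 ];
   < xr,rp,der1,der2 > = nlp(frosls,x0,mopt,.,.,.,.);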

HYQUAN: Hybrid Quasi-Newton Least-Squares Methods In each iteration of one of the Fletcher & Xu (1987) (refer also to AlBaali & Fletcher, 1985, 1986) hybrid quasi-Newton methods, a criterion is used to decide whether a Gauss-Newton or a dual quasi-Newton search direction is appropriate. The VERSION option can be used to choose one of three criteria (HY1, HY2, HY3) proposed by Fletcher & Xu (1987). The default is VERSION=2; that is, HY2. In each iteration, HYQUAN computes the cross-product Jacobian (used for the Gauss-Newton step), updates the Cholesky factor of an approximate Hessian (used for the quasi-Newton step), and does a line search to compute an approximate minimum along the search direction. The default line-search technique used by HYQUAN is especially designed for least-squares problems (refer to Lindstrom & Wedin, 1984, and AlBaali & Fletcher, 1986). Using the LIS option you can choose a different line-search algorithm than the default one.

Two update formulas can be specified with the UPD option:

DBFGS performs the dual BFGS (Broyden, Fletcher, Goldfarb, & Shanno) update of the Cholesky factor of the Hessian matrix. This is the default.

DDFP performs the dual DFP (Davidon, Fletcher, & Powell) update of the Cholesky factor of the Hessian matrix.

The HYQUAN subroutine needs about the same amount of working memory as the LEVMAR algorithm. In most applications LEVMAR seems to be superior to HYQUAN, and using HYQUAN is recommended only when problems are experienced with the performance of LEVMAR.

NONDIF: General Nondifferential Optimization This algorithm is based on the work by Outrata, Schramm, & Zowe (1991), as well as Einarsson (1998) and Einarsson & Madsen (1999). For the default version (=2), the Bundle Trust Region method by Outrata et al. is used, and for version=1 the method by Einarsson. These methods require the specification of a function and a (sub)gradient module. When there are boundary or general linear constraints specified, only the Bundle Trust Region method can be used.


NL1REG: Nonlinear L1 Minimization There are two different algorithms available to solve the specific nonsmooth nonlinear L1 regression problem:

    (L1)   F(x) = ∑_{i=1}^{m} |f_i(x)| −→ min_x

• The algorithm of Hald & Madsen (1985) is able to solve linearly constrained L1 minimization problems.

• The algorithm by Pinar & Hartmann (1999) is a modification of the Conn & Gould (1984) quadratic penalty algorithm for large-scale nonlinearly constrained optimization. Currently, this algorithm is only able to solve unconstrained L1 problems.

Note: The reg(.) function provides a number of algorithms to solve the linear L1 regression problem

    |Ax − y| −→ min_x

NLIREG: Nonlinear L∞ Minimization The algorithm of Hald & Madsen (1984) is able to solve the linearly constrained MinMax problem

    (MinMax)   F(x) = max_{i=1,...,m} f_i(x) −→ min_x

which is also used to solve the more specific linearly constrained L∞ minimization problem

    (L∞)   F(x) = max_{i=1,...,m} |f_i(x)| −→ min_x

Solving the (L∞) problem of m functions f_i(x) is equivalent to solving the MinMax problem with 2m functions f_i(x) and −f_i(x).

Note: The reg(.) function provides a number of algorithms to solve the linear L∞ regression problem

    max_{i=1,...,m} | ∑_{j=1}^{n} a_ij x_j − y_i | −→ min_x

1.4.4 Types of Derivatives Used in Optimization

The gradient vector contains the first derivatives of one objective function f with respect to the parameters x1, . . . , xn, as follows:

    g(x) = ∇f(x) = ( ∂f/∂x_j ).    (1.7)

The n × n Hessian matrix contains the second derivatives of one objective function f with respect to the parameters x1, . . . , xn, as follows:

    G(x) = ∇²f(x) = ( ∂²f / ∂x_j ∂x_k ).    (1.8)

The m × n Jacobian matrix contains the first-order derivatives of m objective functions fi(x) with respect to the parameters x1, . . . , xn, as follows:

    J(x) = (∇f_1, . . . , ∇f_m) = ( ∂f_i/∂x_j ).    (1.9)


In case of least-squares problems the cross-product Jacobian J^T J,

    J^T J = ( ∑_{i=1}^{m} (∂f_i/∂x_j)(∂f_i/∂x_k) )    (1.10)

is used as an approximate Hessian matrix. Using the m-vector f = f(x) of function values f(x) = (f_1(x), . . . , f_m(x))^T, the gradient g = g(x) is computed by

    g(x) = J^T(x) f(x).    (1.11)

The mc × n Jacobian matrix contains the first-order derivatives of the mc nonlinear constraint functions c_i(x), i = 1, . . . , mc, with respect to the parameters x1, . . . , xn, as follows:

    CJ(x) = (∇c_1, . . . , ∇c_mc) = ( ∂c_i/∂x_j ).    (1.12)

1.4.5 Finite Difference Approximations of Derivatives

If the optimization technique needs at least first-order derivatives and the grad module is not specified, the gradient or Jacobian is computed using finite difference formulas.

If the optimization technique needs second-order derivatives and the hess module is not specified, the Hessian is computed using finite difference formulas, by using

• function values if the grad module argument is not specified

• gradient values if the grad module argument is specified.

In general, Hessian matrices which are based on the analytical gradient are significantly more precise than those which are based on function values only.

Forward Difference Approximations

• First-order derivatives: n additional function calls are needed:

    g_i = ∂f/∂x_i = ( f(x + h_i e_i) − f(x) ) / h_i    (1.13)

• Second-order derivatives based on function calls only (Dennis & Schnabel, 1983, p. 80, 104): for a dense Hessian, n + n²/2 additional function calls are needed:

    ∂²f/∂x_i ∂x_j = ( f(x + h_i e_i + h_j e_j) − f(x + h_i e_i) − f(x + h_j e_j) + f(x) ) / (h_i h_j)    (1.14)

• Second-order derivatives based on gradient calls (Dennis & Schnabel, 1983, p. 103): n additional gradient calls are needed:

    ∂²f/∂x_i ∂x_j = ( g_i(x + h_j e_j) − g_i(x) ) / (2 h_j) + ( g_j(x + h_i e_i) − g_j(x) ) / (2 h_i)    (1.15)
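As a sketch, such a forward-difference formula can also be coded directly in a user-written grad module and passed as the gradient argument of nlp(), as in the examples at the end of this section. The module below differentiates the Rosenbrock function; the fixed step h and the module name are illustrative assumptions (in practice the step should depend on the accuracy of the function, see the FDIGITS option):

   function gfd(x) {
      /* forward-difference gradient of f(x) = 100(x2 - x1^2)^2 + (1 - x1)^2, eq. (1.13) */
      n = ncol(x);  h = 1.e-7;
      g = cons(n);
      t1 = x[2] - x[1]*x[1];  t2 = 1. - x[1];
      f0 = 100.*t1*t1 + t2*t2;
      for (j = 1; j <= n; j++) {
         xh = x;  xh[j] = x[j] + h;
         t1 = xh[2] - xh[1]*xh[1];  t2 = 1. - xh[1];
         g[j] = (100.*t1*t1 + t2*t2 - f0) / h;
      }
      return(g);
   }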


Central Difference Approximations

• First-order derivatives: 2n additional function calls are needed:

    g_i = ∂f/∂x_i = ( f(x + h_i e_i) − f(x − h_i e_i) ) / (2 h_i)    (1.16)

• Second-order derivatives based on function calls only (Abramowitz & Stegun, 1972, p. 884): for a dense Hessian, 2n + 4n²/2 additional function calls are needed:

    ∂²f/∂x_i² = ( −f(x + 2h_i e_i) + 16 f(x + h_i e_i) − 30 f(x) + 16 f(x − h_i e_i) − f(x − 2h_i e_i) ) / (12 h_i²)    (1.17)

    ∂²f/∂x_i ∂x_j = ( f(x + h_i e_i + h_j e_j) − f(x + h_i e_i − h_j e_j) − f(x − h_i e_i + h_j e_j) + f(x − h_i e_i − h_j e_j) ) / (4 h_i h_j)

• Second-order derivatives based on gradient calls: 2n additional gradient calls are needed:

    ∂²f/∂x_i ∂x_j = ( g_i(x + h_j e_j) − g_i(x − h_j e_j) ) / (4 h_j) + ( g_j(x + h_i e_i) − g_j(x − h_i e_i) ) / (4 h_i)    (1.18)

The FDINT option specifies whether the finite difference intervals h should be computed using an algorithm of Gill, Murray, Saunders, and Wright (1983) or are based only on the information of the FDIGITS= and CDIGITS= options. For FDINT=OBJ, the interval h is based on the behavior of the objective function; for FDINT=CON, the interval h is based on the behavior of the nonlinear constraint functions; and for FDINT=ALL, the interval h is based on both the behavior of the objective function and the nonlinear constraint functions. Note that the algorithm of Gill, Murray, Saunders, and Wright (1983) for computing the finite difference intervals hj can be very expensive in the number of function calls. If FDINT= is specified, it is currently performed twice, the first time before the optimization process starts and the second time after the optimization terminates.

If FDINT is not specified, the step sizes hj , j = 1, . . . , n, are defined as follows:

• for the forward-difference approximation of first-order derivatives using function calls and second-order derivatives using gradient calls: h_j = η_j^{1/2} (1 + |x_j|),

• for the forward-difference approximation of second-order derivatives which use only function calls and all central-difference formulas: h_j = η_j^{1/3} (1 + |x_j|),

where η is defined using the FDIGITS specification:

• if the number of accurate digits is specified with FDIGITS=r: η is set to 10^−r;

• if FDIGITS is not specified: η is set to the machine precision ε.

The FDIGITS option specifies the number of accurate digits in evaluations of the objective function. Fractional values such as FDIGITS=4.7 are allowed. The default is r = −log10(ε), where ε is the machine precision. The value of r is used to compute the interval size h for the computation of finite-difference approximations of the derivatives of the objective function and for the default value of the FCONV option. For FDINT=OBJ and FDINT=ALL the FDIGITS specification is used in computing the forward and central finite-difference intervals.

If you know of better finite difference formulas, you may supply them using the grad, hess, or jcon arguments.


1.4.6 Hessian and CRP Jacobian Scaling

The rows and columns of the Hessian and cross-product Jacobian matrix can be scaled when using the trust-region, Newton-Raphson, and Levenberg-Marquardt optimization techniques. Each element G_i,j, i, j = 1, . . . , n, is divided by the scaling factor d_i ∗ d_j, where the scaling vector d = (d_1, . . . , d_n) is iteratively updated in a way specified by the HESCAL=i option, as follows:

i = 0 : No scaling is done (equivalent to d_i = 1).

i ≠ 0 : First iteration and each restart iteration sets:

    d_i^(0) = √max(|G_i,i^(0)|, ε)    (1.19)

i = 1 : see More (1978):

    d_i^(k+1) = max( d_i^(k), √max(|G_i,i^(k)|, ε) )    (1.20)

i = 2 : see Dennis, Gay, & Welsch (1981):

    d_i^(k+1) = max( 0.6 ∗ d_i^(k), √max(|G_i,i^(k)|, ε) )    (1.21)

i = 3 : d_i is reset in each iteration:

    d_i^(k+1) = √max(|G_i,i^(k)|, ε)    (1.22)

where ε is the relative machine precision or, equivalently, the largest double precision value that, when added to 1, results in 1.
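For illustration, the HESCAL=1 update (1.20) amounts to the following elementwise recursion (a sketch only; G stands for the current Hessian approximation, d for the scaling vector, and macheps for the machine precision; the sqrt() and abs() intrinsics are assumed):

   macheps = 1.e-16;                    /* stands in for the machine precision */
   n = nrow(G);
   for (i = 1; i <= n; i++) {
      t = abs(G[i,i]);
      if (t < macheps) t = macheps;     /* max(|G_ii|, eps) */
      t = sqrt(t);
      if (t > d[i]) d[i] = t;           /* d_i = max(d_i, sqrt(max(|G_ii|, eps))) */
   }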

1.4.7 Criteria for Optimality

All currently implemented optimization algorithms converge toward local rather than global optima. The smallest local minimum (largest local maximum) of an objective function is called the global minimum (global maximum). One way to find out whether the objective function has more than one local optimum is to run various optimizations with a pattern of different starting points x^(0).

For a more mathematical definition of the optimality term, refer to the Kuhn-Tucker theorem in standard optimization literature. Using a rather nonmathematical language, a local minimizer x∗ satisfies the following local optimality conditions:

1. There exists a small (feasible) neighborhood of x∗ that does not contain any point x with

• smaller function value f(x) < f(x∗) for minimization;

• larger function value f(x) > f(x∗) for maximization.

2. The vector of first derivatives (gradient) g(x∗) = ∇f(x∗) of the objective function f (projected toward the feasible region) at the point x∗ is zero.

3. The matrix of second derivatives G(x∗) = ∇²f(x∗) (Hessian matrix) of the objective function f (projected toward the feasible region) at the point x∗

• is positive definite for minimization (convex);


• is negative definite for maximization (concave).

The iterative optimization algorithm terminates at the point x_t, which should be in a close neighborhood (in terms of a user-specified termination criterion, see the following) of a local optimizer x∗. If the point x_t is located on one or more active boundary or general linear constraints, the local optimization conditions are valid only for the feasible region; that means

• the projected gradient Z^T g(x_t) (rather than g(x_t)) must be sufficiently small,

• the projected Hessian Z^T G(x_t) Z must be positive (minimization) or negative (maximization) definite.

If there are n active constraints at the point x_t, the nullspace Z has zero columns and the projected Hessian has zero rows and columns. A matrix with zero rows and columns is considered positive as well as negative definite.

Kuhn-Tucker Conditions

We define the nonlinear programming (NLP) problem (see Hock & Schittkowski, 1981) with one objective function f and m constraint functions c_i which are continuously differentiable,

    min f(x), x ∈ R^n,
    s.t. c_i(x) = 0,  i = 1, . . . , m_e,
         c_i(x) ≥ 0,  i = m_e + 1, . . . , m.    (1.23)

The linear combination of objective and constraint functions

    L(x, λ) = f(x) − ∑_{i=1}^{m} λ_i c_i(x)    (1.24)

is the Lagrange function and the coefficients λ_i are called Lagrange multipliers.

Assuming the functions f and c_i are twice continuously differentiable, the point x∗ is an isolated local minimizer of the NLP problem if there exists a vector λ∗ = (λ∗_1, . . . , λ∗_m) that meets the following conditions:

1. Kuhn-Tucker conditions:

    c_i(x∗) = 0,  i = 1, . . . , m_e,
    c_i(x∗) ≥ 0,  λ∗_i ≥ 0,  λ∗_i c_i(x∗) = 0,  i = m_e + 1, . . . , m,
    ∇_x L(x∗, λ∗) = 0    (1.25)

2. Second-order condition: Each nonzero vector y ∈ R^n with

    y^T ∇_x c_i(x∗) = 0   for all i = 1, . . . , m_e and all i ∈ {m_e + 1, . . . , m : λ∗_i > 0}    (1.26)

satisfies

    y^T ∇²_x L(x∗, λ∗) y > 0.    (1.27)

In practice we cannot expect that the constraint functions c_i(x∗) vanish within machine precision, and determining the set of active constraints at the solution x∗ may not be simple.
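As a small worked illustration (not tied to any particular CMAT function), consider min x1² + x2² subject to x1 + x2 − 1 = 0: at x∗ = (.5, .5) with λ∗ = 1 the gradient of the Lagrange function vanishes, which can be checked directly:

   xs  = [ .5 .5 ];
   lam = 1.;
   gradf = 2. * xs;              /* gradient of f(x) = x1^2 + x2^2 */
   gradc = [ 1. 1. ];            /* gradient of c(x) = x1 + x2 - 1 */
   gradL = gradf - lam * gradc;  /* should be the zero vector */
   print "Gradient of Lagrange function at x*", gradL;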


NONDIF Techniques

The new NONDIF optimization technique implements a set of nonsmooth subgradient techniques developed in the early 1990's by a team from the University of Bayreuth (Outrata, Schramm, and Zowe, 1991) called Bundle-Trust-Region methods. (BTNCLC was mailed to me in August 2008.) The Fortran code of all routines (except BTNCLC) was mailed to me in February and May 1995 when I was still working for SAS. Due to a very tragic accident of Prof. Zowe the work on this software was not continued. The original Fortran package contains the following programs:

BT the unconstrained problem with convex function

BTNC the unconstrained problem with nonconvex function

BTNCBCL the bound constrained problem with nonconvex function

BTCLC the bound and linear constrained problem with convex function

BTNCLC the bound and linear constrained problem with nonconvex function

All methods use a QP solver written by K. Schittkowski which is based on software by M.J.D. Powell. Slightly modified versions of these algorithms were implemented in CMAT.

For unconstrained optimization, Einarsson (1998) and Madsen & Einarsson (1999) developed a different method based on stepwise LP which sometimes can compete with the BT and BTNC methods. The following additional options are relevant for the NONDIF algorithms:

"vers" : this should be 1 for the Einarsson & Madsen algorithm (only unconstrained) or 2 for the BT algorithms (default is 2);

"corrs" : the number of gradients in the bundle (the larger the better), default is 5;

"fconvex" : the objective function is convex.

Here are a few examples:

1. Subgradient specification of example by Shor:

a = [ 0.0 2.0 1.0 1.0 3.0 0.0 1.0 1.0 0.0 1.0 ,
      0.0 1.0 2.0 4.0 2.0 2.0 1.0 0.0 0.0 1.0 ,
      0.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 2.0 2.0 ,
      0.0 1.0 1.0 2.0 0.0 0.0 1.0 2.0 1.0 0.0 ,
      0.0 3.0 2.0 2.0 1.0 1.0 1.0 1.0 0.0 0.0 ]‘;

b = [ 1. 5. 10. 2. 4. 3. 1.7 2.5 6.0 4.5 ];
m = nrow(a); n = ncol(a);
c = cons(m);

function fshor5(x) global(a,b,c) {
   m = nrow(a); n = ncol(a);
   for (i = 1; i <= m; i++)
      c[i] = abs(b[i] * ssq(x - a[i,]));


   /* get index k1 of max c[m] */
   k = c[>!]; k1 = k[1];
   crit = c[k1];
   return(crit);
}

function gshor5(x) global(a,b,c) {
   /* get index k1 of max c[m]: c[] should still be stored */
   k = c[>!]; k1 = k[1];
   grad = 2. * b[k1] * (x - a[k1,]);
   return(grad);
}

x0 = [ 4#0. 1. ];
f0 = fshor5(x0);
print "Function at starting point",f0;
g0 = gshor5(x0);
print "Gradient at starting point",g0;

Function at starting point 80.000

Gradient at starting point
  |         1         2         3         4         5
------------------------------------------------------
1 |   -20.000   -40.000   -20.000   -20.000   -20.000

2. The Einarsson-Madsen algorithm:

x0 = [ 4#0. 1. ];

mopt = [ "tech"  "nondif" ,
         "vers"  1        ,   /* NONDIF version Madsen */
         "maxit" 1000     ,
         "maxfu" 5000     ,
         "print" 5        ];

< xr,rp,der1,der2 > = nlp(fshor5,x0,mopt,.,.,.,gshor5);

NonDifferentiable Method (Einarsson and Madsen, 1998)
Convexity of Objective Function NOT Assumed
User Specified Gradient

Iteration Start:
N. Variables                5
Criterion        110.0000000   Max Grad Entry   40.00000000


Iter nfun act   optcrit   norm(hk)     lambda        pred       rho
   1    2   1  25.00000   1.000000  1.0000000   0.6071429  1.000000
   2   10   4  25.00000   0.200000  2.0000000 -10.000000   0.200000
   3   19   5  23.51537   0.500000  0.5000000   0.4017618  0.040833
   4   27   6  23.51537   0.032465  0.5000000 -10.000000   0.032465
   5   35   3  22.79268   0.125000  0.1250000   0.7954681  0.032465
   6   44   6  22.67422   0.137706  0.2500000   0.3154945  5.0e-003
   7   52   6  22.67422   5.0e-003  0.2500000 -10.000000   5.0e-003
   8   59   3  22.63708   0.062500  0.0625000   0.3134346  1.3e-003
   9   65   3  22.62049   0.062500  0.0625000   0.2157089  1.3e-003
  10   74   3  22.60181   0.015625  0.0156250   0.7830801  3.1e-004
  11   82   5  22.60181   6.3e-004  0.0312500 -10.000000   6.3e-004
  12   89   3  22.60055   7.8e-003  0.0078125   0.5707412  1.6e-004
  13   98   5  22.60028   7.8e-003  0.0078125   0.3947248  1.6e-004
  14  105   0  22.60028   1.6e-004  0.0078125   1.00e-006  1.6e-004
  15  112   0  22.60028   3.9e-005  0.0019531 -10.000000   3.9e-005
  16  120   5  22.60018   4.9e-004  4.88e-004   0.2371260  9.8e-006
  17  127   3  22.60018   2.4e-006  1.22e-004  -1.1439683  2.4e-006

Successful Termination After 17 Iterations
ABSGCONV convergence criterion satisfied.
Criterion        22.60017694   Max Grad Entry   2.4414e-006
N. Grad Storage             5
N. Function Calls         128   N. Gradient Calls        118
Preproces. Time             0   Time for Method            1
Effective Time              1

********************
Optimization Results
********************

Parameter Estimates
-------------------

Parameter       Estimate       Gradient

 1  X_1       1.12388195      13.486583
 2  X_2       0.97931517      11.751782
 3  X_3       1.47603969      -6.2875237
 4  X_4       0.92000741      -0.9599111
 5  X_5       1.12409702      13.489164

Value of Objective Function = 22.6002

3. The unconstrained and convex case BT:

x0 = [ 4#0. 1. ];


mopt = [ "tech" "nondif" ,"vers" 2 , /* NONDIF version BTR */"fconvex" , /* f assumed convex */"corrs" 10 , /* number of corrections */"maxit" 1000 ,"maxfu" 5000 ,"print" 5 ];

< xr,rp,der1,der2 > = nlp(fshor5,x0,mopt,.,.,.,gshor5);/* Fopt = 22.6002

X = [ 1.1243 0.97965 1.4786 0.91989 1.1245 ]; */print "Bundle TR MIN: XR=",xr;print "Bundle TR MIN: RP=",rp;

******************
Optimization Start
******************

Parameter Estimates
-------------------

Parameter       Estimate       Gradient

 1  X_1       0.00000000     -20.000000
 2  X_2       0.00000000     -40.000000
 3  X_3       0.00000000     -20.000000
 4  X_4       0.00000000     -20.000000
 5  X_5       1.00000000     -20.000000

Value of Objective Function = 80

Bundle Trust Region Method (Outrata-Schramm-Zowe, 1991)
Convex Objective Function Assumed
User Specified Gradient

Iteration Start:
N. Variables                5
Criterion        80.00000000   Max Grad Entry   40.00000000

Iter nfun act   optcrit    maxgrad    gradnrm      alpha        rho
   2    3   2  80.00000   40.00000  28.284271  226.50044   32.06403
   2    4   1  80.00000   67.95013  56.568542  0.0000000   1.282561
   3    5   1  37.80071   18.79501  30.120043  0.0000000   0.363613
   4    6   2  37.80071   21.00906  16.420987  4.2711930   0.108075
   4    7   2  37.80071   14.92391  14.303712  5.8060317   2.050056
   4    8   2  37.80071   33.40431  14.471175  5.5502253   0.755402
   5    9   3  29.00440   22.13278  10.438652  6.0458126   0.393061


   6   10   4  29.00440   26.60915  7.7923853  6.9131913   0.219034
   7   11   4  26.10894   15.32694  2.6369991  4.8226013   0.025084
   8   12   4  24.34863   15.15838  1.8864657  2.3266175   0.012837
   9   13   4  24.34863   12.39751  2.1901058  1.8572140   0.017302
  10   14   4  23.89188   19.32857  1.2446275  1.2657621   5.6e-003
  11   15   4  22.70130   13.74280  0.5903855  0.1312849   0.031433
  12   16   5  22.70130   18.90823  0.2002678  0.1639584   3.6e-003
  13   17   5  22.66608   15.15163  0.0861074  0.1161131   6.7e-004
  14   18   4  22.66608   18.73240  0.0244589  0.0857739   5.4e-005
  15   19   4  22.64333   12.10321  0.0792175  0.0496663   5.7e-004
  16   20   5  22.61557   13.50355  0.0921757  0.0240104   7.7e-004
  17   21   5  22.61096   18.71096  0.0280888  0.0177295   1.8e-003
  18   22   6  22.61096   15.09778  0.0020142  0.0176058   9.1e-006
  19   23   6  22.61096   14.99445  0.0268033  0.0146066   1.6e-003
  20   24   6  22.61096   18.87746  0.0134045  0.0148242   4.1e-004
  21   25   6  22.60278   12.09313  2.23e-004  0.0055238   1.1e-007
  22   26   6  22.60278   13.56868  0.0078431  0.0045218   1.4e-004
  23   27   6  22.60278   18.77953  0.0102646  0.0037998   2.4e-004
  24   28   6  22.60160   13.52349  0.0077556  0.0021372   1.4e-004
  25   29   6  22.60160   15.01907  0.0052699  0.0020531   6.3e-005
  26   30   5  22.60072   18.75832  0.0032959  9.50e-004   2.4e-005
  27   31   6  22.60058   13.49656  0.0033100  7.17e-004   2.5e-005
  28   32   6  22.60058   15.00512  0.0032581  6.63e-004   2.4e-005
  29   33   6  22.60058   12.07966  0.0032579  5.15e-004   2.4e-005
  30   34   6  22.60058   18.74864  0.0029156  4.98e-004   1.9e-005
  31   35   6  22.60029   13.50192  0.0014579  2.08e-004   4.8e-006
  32   36   6  22.60023   18.75746  0.0013410  1.30e-004   4.1e-006
  33   37   6  22.60023   12.08307  6.37e-004  1.07e-004   9.1e-007
  34   38   6  22.60022   15.01196  0.0017627  8.47e-005   7.0e-006
  35   39   6  22.60022   13.50390  2.60e-004  7.77e-005   1.5e-007
  36   40   6  22.60020   13.49200  9.53e-004  5.23e-005   2.0e-006
  37   41   6  22.60017   15.00538  6.15e-007  1.98e-005   8.5e-013

Successful Termination After 37 Iterations
GCONV convergence criterion satisfied.
Criterion        22.60017281   Max Grad Entry   15.00538265
N. Grad Storage            10
N. Function Calls          42   N. Gradient Calls         42
Preproces. Time             1   Time for Method            0
Effective Time              1

Objective function seems to be convex.

********************
Optimization Results
********************

Parameter Estimates
-------------------

Parameter       Estimate       Gradient


 1  X_1       1.12432717     -15.005383
 2  X_2       0.97965293      -8.1627765
 3  X_3       1.47861688       3.8289350
 4  X_4       0.91989151       7.3591321
 5  X_5       1.12454714       0.9963771

Value of Objective Function = 22.6002

4. The unconstrained and nonconvex case BTNC:

x0 = [ 4#0. 1. ];

print "Bundle Trust-Region: MIN: NONConvex Algorithm";mopt = [ "tech" "nondif" ,

"vers" 2 , /* NONDIF version BTR */"corrs" 10 , /* number of corrections */"maxit" 1000 ,"maxfu" 5000 ,"print" 5 ];

< xr,rp,der1,der2 > = nlp(fshor5,x0,mopt,.,.,.,gshor5);

Bundle Trust Region Method (Outrata-Schramm-Zowe, 1991)Convexity of Objective Function NOT Assumed

User Specified Gradient

Iteration Start:N. Variables 5Criterion 80.00000000 Max Grad Entry 40.00000000

Iter nfun act   optcrit    maxgrad    gradnrm      alpha        rho
   2    3   2  80.00000   40.00000  28.284271  226.50044   32.06403
   2    4   1  80.00000   67.95013  56.568542  0.0000000   1.282561
   3    5   1  37.80071   18.79501  30.120043  0.0000000   0.363613
   4    6   2  37.80071   21.00906  16.420987  4.2711930   0.108075
   5    7   2  27.19197   14.92391  11.981370  0.6595729   0.230145
   6    8   3  27.19197   21.79488  8.0233448  1.5248242   0.103205
   7    9   4  24.38932   12.48165  6.3630811  1.2120491   0.064912
   8   10   4  23.67483   14.12845  3.2516058  1.6198609   0.016951
   9   11   4  23.67483   15.78242  3.0180048  0.9353277   0.014603
  10   12   4  23.32867   19.11006  1.5084959  0.8052569   3.6e-003
  11   13   4  22.77759   12.13325  0.8644411  0.2729511   1.2e-003
  12   14   4  22.77759   13.74625  0.3939206  0.1930694   2.5e-004
  13   15   4  22.64137   15.02575  0.0853155  0.0523139   1.2e-005
  14   16   4  22.61430   18.76364  0.1898998  0.0197479   5.8e-005
  15   17   4  22.61430   13.50106  0.0750897  0.0144816   9.0e-006
  16   18   4  22.60080   13.49219  0.0073361  8.30e-004   3.5e-007


17 19 4 22.60071 12.08238 0.0102082 5.90e-004 6.7e-00718 20 4 22.60030 18.75782 0.0035766 1.46e-004 8.2e-00819 21 4 22.60021 14.99990 0.0070167 4.81e-005 1.3e-00620 22 4 22.60018 18.75706 0.0024376 1.53e-005 1.5e-00721 23 4 22.60018 15.00747 9.35e-004 1.36e-005 2.2e-00822 24 4 22.60016 15.00476 2.68e-004 1.88e-006 7.4e-00923 25 4 22.60016 12.08215 3.71e-004 8.86e-007 1.4e-00824 26 5 22.60016 13.49310 2.83e-004 2.99e-007 8.2e-00925 27 4 22.60016 13.49242 1.52e-004 6.31e-008 9.5e-00926 28 5 22.60016 15.00554 3.11e-005 6.65e-008 4.0e-010

Successful Termination After 26 IterationsGCONV convergence criterion satisfied.Criterion 22.60016225 Max Grad Entry 15.00553962N. Grad Storage 10N. Function Calls 29 N. Gradient Calls 29Preproces. Time 0 Time for Method 0Effective Time 0

Objective function seems to be convex.

********************Optimization Results********************

Parameter Estimates-------------------

Parameter Estimate Gradient

1 X_1 1.12430755 -15.0055402 X_2 0.97946424 -8.16428613 X_3 1.47763403 3.82107224 X_4 0.92018564 7.36148515 X_5 1.12429511 0.9943609

Value of Objective Function = 22.6002

5. The boundary constrained case BTNCBC:

x0 = [ 4#0. 1. ];
lbc = [ 5#0. ]; ubc = [ 4#1. 2. ];
bc = lbc‘ -> ubc‘;

print "Bundle Trust-Region: MIN: NONConvex Algorithm";
mopt = [ "tech"  "nondif" ,
         "vers"  2 ,      /* NONDIF version BTR      */
         "corrs" 10 ,     /* number of corrections   */
         "maxit" 1000 ,
         "maxfu" 5000 ,
         "print" 5 ];

< xr,rp,der1,der2 > = nlp(fshor5,x0,mopt,bc,.,.,gshor5);

******************Optimization Start******************

Parameter Estimates-------------------

Parameter Estimate Gradient Lower BC Upper BC

1 X_1 0.00000000 -20.000000 0.0000000 1.00000002 X_2 0.00000000 -40.000000 0.0000000 1.00000003 X_3 0.00000000 -20.000000 0.0000000 1.00000004 X_4 0.00000000 -20.000000 0.0000000 1.00000005 X_5 1.00000000 -20.000000 0.0000000 2.0000000

Value of Objective Function = 80

Bundle Trust Region Method (Outrata-Schramm-Zowe, 1991)Convexity of Objective Function NOT Assumed

User Specified Gradient

Iteration Start:N. Variables 5N. Bound. Constr. 10 N. Mask Constr. 0Criterion 80.00000000 Max Grad Entry 40.00000000N. Active Constraints 4

Iter nfun act optcrit maxgrad gradnrm alpha rho2 1 1 36.00000 40.00000 29.393877 0.0000000 0.2700003 2 2 36.00000 21.20000 15.767152 2.7485825 0.0776884 3 2 28.02512 18.77268 8.0318882 0.2014390 0.1814385 4 3 28.02512 21.46389 6.1145740 1.3382217 0.1051546 5 3 25.66330 13.83168 2.3641517 2.3610623 0.0157207 6 4 24.99637 12.73736 1.0889734 2.0756864 0.0300178 7 4 24.75232 19.86490 0.4504446 1.4993444 5.1e-0039 8 4 24.75232 16.00000 0.7311917 0.9890784 0.013533

10 9 4 23.87084 12.85575 0.0134521 0.1386117 4.1e-00511 10 4 23.87084 19.32959 0.0167079 0.1074100 6.4e-00512 11 4 23.82708 16.00000 0.0085580 0.0475927 1.7e-00513 12 4 23.79611 12.44946 7.64e-004 0.0145165 1.2e-00614 13 4 23.78211 12.86109 3.41e-006 3.15e-004 2.2e-01015 14 4 23.78205 19.28252 2.22e-006 2.00e-004 9.1e-011


16 15 4 23.78194 16.00000 4.50e-007 6.66e-005 3.4e-011

Successful Termination After 16 IterationsGCONV convergence criterion satisfied.Criterion 23.78194219 Max Grad Entry 16.00000000N. Active Constraints 2 N. Grad Storage 10N. Function Calls 16 N. Gradient Calls 16Preproces. Time 0 Time for Method 0Effective Time 0

Objective function seems to be convex.

********************Optimization Results********************

Parameter Estimates-------------------

Parameter Estimate Gradient Active BC

1 X_1 1.00000000 -16.000000 Upper BC2 X_2 0.88835984 -8.89312133 X_3 1.00000000 0.0000000 Upper BC4 X_4 0.83940002 6.71520015 X_5 1.07175865 0.5740692

Value of Objective Function = 23.7819

6. The linear constrained convex case BTCLC:

x0 = [ 5#1. ];
lbc = [ 5#0. ]; ubc = [ 5#2. ];
bc = lbc‘ -> ubc‘;
lc = [ . 1. 0. 0. 0. 1. 2. ]; /* IC */

print "Bundle Trust-Region: MIN: NONConvex Algorithm";
mopt = [ "tech"    "nondif" ,
         "vers"    2 ,     /* NONDIF version BTR      */
         "fconvex" ,       /* f assumed convex        */
         "corrs"   10 ,    /* number of corrections   */
         "maxit"   30 ,
         "maxfu"   5000 ,
         "print"   5 ];

< xr,rp,der1,der2 > = nlp(fshor5,x0,mopt,bc,lc,.,gshor5);

******************Optimization Start******************

Parameter Estimates-------------------

Parameter Estimate Gradient Lower BC Upper BC

1 X_1 1.00000000 -10.000000 0.0000000 2.00000002 X_2 1.00000000 0.0000000 0.0000000 2.00000003 X_3 1.00000000 0.0000000 0.0000000 2.00000004 X_4 1.00000000 0.0000000 0.0000000 2.00000005 X_5 1.00000000 -20.000000 0.0000000 2.0000000

Value of Objective Function = 25

Linear Constraints------------------

[ 1]ACT-2.0000000 <= - 1.00000 * X_1 - 1.00000 * X_5( 0.00000 )

Bundle Trust Region Method (Outrata-Schramm-Zowe, 1991)Convex Objective Function Assumed

User Specified Gradient

Iteration Start:N. Variables 5N. Bound. Constr. 10 N. Mask Constr. 0N. Linear Constr. 1 Lin. Equ. Constr. 0Criterion 25.00000000 Max Grad Entry 20.00000000N. Active Constraints 1

Iter nfun act optcrit maxgrad gradnrm alpha rho1 0 1 25.00000 20.00000 6.32e-004 0.0000000 1.0000001 1 1 25.00000 24.00000 0.0063182 0.0000000 1.0000001 2 1 25.00000 24.00000 0.0631824 0.0000000 1.0000001 3 1 25.00000 24.00000 0.6318237 0.0000000 1.0000001 4 1 25.00000 24.00000 6.3182371 0.0000000 1.0000001 5 1 25.00000 24.00000 7.0710678 0.0000000 0.0125252 6 2 25.00000 16.89532 4.1408339 0.1921540 4.3e-0033 7 2 24.46701 19.42406 2.9375024 0.0106646 0.0194544 8 3 24.46701 13.78642 2.1750176 0.0348509 0.0106655 9 3 24.22694 16.81854 1.8779548 0.0833776 8.0e-0036 10 3 24.22694 18.69935 0.1232769 0.0659432 3.4e-0057 11 3 24.22694 13.23451 0.1358220 0.0575472 4.2e-0058 12 3 24.22694 18.97823 0.8717513 0.0066635 1.7e-0039 13 3 24.18514 13.44354 0.0541816 0.0212196 6.0e-005


10 14 4 24.18258 16.98728 0.0101920 0.0145210 2.1e-00611 15 3 24.18258 18.77699 0.2847616 0.0034695 1.6e-00312 16 3 24.18258 13.69354 0.2050968 0.0053245 8.5e-00413 17 3 24.17809 18.63972 0.0769498 0.0032218 1.2e-00414 18 3 24.17809 13.57858 0.0402430 0.0032119 3.3e-00515 19 3 24.17809 17.07058 0.1063510 0.0015768 2.3e-00416 20 3 24.17653 18.70437 0.0337468 7.94e-004 2.3e-00517 21 2 24.17574 18.68292 1.7878440 5.92e-005 0.58370717 22 2 24.17574 21.99353 1.7878440 5.92e-005 5.8e-00318 23 3 24.17574 13.92120 0.0599509 0.0062178 6.6e-00619 24 3 24.17574 13.58818 0.0109461 8.71e-005 2.2e-00720 25 3 24.17574 17.05213 0.0034029 4.49e-005 2.1e-00821 26 3 24.17574 13.58131 0.0023912 3.74e-005 1.0e-00822 27 3 24.17571 17.05405 7.78e-004 1.95e-006 1.0e-00823 28 4 24.17570 18.68201 2.63e-004 2.58e-007 1.0e-00824 29 4 24.17570 18.68244 0.0065299 0.0063248 6.3e-00625 30 3 24.17570 13.59134 1.63e-004 7.36e-006 3.9e-00926 31 3 24.17570 13.58076 2.38e-005 3.75e-008 8.4e-011

Successful Termination After 26 IterationsGCONV convergence criterion satisfied.Criterion 24.17577254 Max Grad Entry 13.58075529N. Active Constraints 1 N. Grad Storage 10N. Function Calls 32 N. Gradient Calls 32Preproces. Time 0 Time for Method 0Effective Time 0

Objective function seems to be convex.

********************Optimization Results********************

Parameter Estimates-------------------

Parameter Estimate Gradient Active BC

1 X_1 0.86827039 10.4192452 X_2 1.08538160 13.0245793 X_3 1.12003319 -10.5596024 X_4 0.79458520 -2.46497765 X_5 1.13172961 13.580755

Value of Objective Function = 24.1758

Linear Constraints Evaluated at Solution----------------------------------------


[ 1]ACT -1.00000 * X_1 - 1.00000 * X_5 + 2.00000= -2.2204e-016

7. The linear constrained nonconvex case BTNCLC: Since the SHOR function is convex, the same result is obtained as with the BTCLC algorithm.

UOBYQA, NEWUOA, and BOBYQA Techniques

Three new algorithms were added which perform Powell's Unconstrained and Bound constrained Optimization BY Quadratic Approximation (UOBYQA and BOBYQA):

UOBYQA the original algorithm by Powell (2000): Report DAMTP 2000/NA14, University of Cambridge. This algorithm uses very much memory for large n but can be faster than the other.

NEWUOA the updated algorithm by Powell (2003): Report DAMTP 2003/NA03, University of Cambridge. This algorithm uses much less memory for large n but is usually slower than the other.

BOBYQA this is Powell's (2008) modification of the NEWUOA algorithm which permits the specification of inequality boundary constraints (masks, i.e. bounds where the lower bound equals the upper bound, are not permitted with BOBYQA).

The UOBYQA and NEWUOA algorithms are available by setting "tech" to UOBYQA. NEWUOA is chosen by default (setting "vers" to 0); for selecting the UOBYQA algorithm, "vers" must be set to 1. For NEWUOA the number of interpolation points may be specified using the "intpoi" option, which expects an integer in the interval [n+2, (n+1)(n+2)/2]. For UOBYQA the number of interpolation points is fixed at (n*n + 3*n + 2)/2. BOBYQA is an extension of NEWUOA which constrains the feasible region to a hypercube with edges of finite length.
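
As a minimal sketch (using only option names that also appear in the examples below; the objective module fobj and the starting point x0 are placeholders), the two versions can be requested as follows:

/* NEWUOA (the default, "vers" 0) with a user-chosen number of
   interpolation points                                         */
mopt = [ "tech"   "uobyqa" ,
         "intpoi" 32 ,
         "print"  3 ];
< xr,rp > = nlp(fobj,x0,mopt);

/* the original UOBYQA algorithm is selected with "vers" 1 */
mopt = [ "tech" "uobyqa" ,
         "vers" 1 ,
         "print" 3 ];
< xr,rp > = nlp(fobj,x0,mopt);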

At the end of the optimization we added some code for computing an approximate gradient by central finite differences, to give an idea of how well the result satisfies the optimality conditions. With BOBYQA we also print the maximum constraint violation and the maximum gradient of the Lagrange function, i.e. the maximum of the gradient values w.r.t. inactive variables (those not at one of their bounds).

Testing confirms that the number of interpolation points has an impact on the memory allocation and the numerical performance of NEWUOA and BOBYQA.

Memory requirements:

UOBYQA (n^4 + 8*n^3 + 23*n^2 + 42*n + max[2*n^2 + 4, 18*n])/4 for the fixed number npt = (n^2 + 3*n + 2)/2 of interpolation points

NEWUOA (npt + 11) * (npt + n) + n * (3*n + 11)/2, where npt is the specified number of points, by default npt = 2*n + 1

BOBYQA (npt + 5) * (npt + n) + 3*n * (n + 5)/2, where npt is the specified number of points, by default npt = 2*n + 1
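
The following lines are only a small arithmetic sketch (not part of the original test set) that evaluates the three memory formulas above for a hypothetical problem size n, assuming the default npt = 2*n + 1 for NEWUOA and BOBYQA:

n    = 20;                      /* hypothetical number of variables        */
npt  = 2*n + 1;                 /* default number of interpolation points  */
nptu = (n*n + 3*n + 2) / 2;     /* fixed value used internally by UOBYQA   */
muob = (n**4 + 8*n**3 + 23*n*n + 42*n + max(2*n*n + 4, 18*n)) / 4;
mnew = (npt + 11) * (npt + n) + n * (3*n + 11) / 2;
mbob = (npt + 5) * (npt + n) + 3 * n * (n + 5) / 2;
print "npt(UOBYQA)=", nptu;
print "Memory: UOBYQA=", muob, " NEWUOA=", mnew, " BOBYQA=", mbob;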

These new algorithms are designed for problems where derivatives are not easily available or the function is not smooth (discontinuous first-order derivatives). They are related to the COBYLA (Constrained Optimization BY Linear Approximation) algorithm which was developed by Powell in 1992.


The new algorithms compete especially with the Nelder-Mead algorithm. However, when comparing results we should keep in mind that the results obtained from the Nelder-Mead implementation are not nearly as precise as those from UOBYQA, NEWUOA, and BOBYQA.

The first example features the Chebyquad function for n = 8:

print "\n *** Test NLPUOB: Chebyquad Function: m=n=8 ***\n";n = 8; u1 = cons(n);for (i = 2; i <= n; i+=2) u1[i] = 1. / (i * i - 1);

function fcheby81(x) global(u1) {
  f = cons(8);
  for (k = 1; k <= 8; k++) {
    t1 = 1.; t2 = 2. * x[k] - 1.; s = t2 + t2;
    for (i = 1; i <= 8; i++) {
      f[i] += t2;
      t = t2 * s - t1; t1 = t2; t2 = t;
  } }
  tt = 1. / 8.;
  f = tt * f + u1;
  crit = .5 * f[**];
  return(crit);
}

x0 = [ 1.:8. ] * .1111111111;

For comparison, we run the Nelder-Mead algorithm first:

mopt = [ "tech" "nmsimp" ,"print" 3 ];

< xr,rp > = nlp(fcheby81,x0,mopt);
print "rp=",rp; print "xr=",xr;

Nelder-Mead Simplex Optimization

Iteration Start:N. Variables 8Criterion 0.019308849

Iter rest nfun act optcrit difcrit std delta size1 0 14 0 0.01931 94665.5 29279 1.0000 2.35901...........................................................

65 0 508 0 2e-003 1e-005 4e-006 1.0000 4e-00366 0 516 0 2e-003 1e-005 3e-006 1.0000 4e-00367 0 522 0 2e-003 5e-006 2e-006 1.0000 4e-00368 0 531 0 2e-003 2e-006 7e-007 1.0000 5e-003


Successful Termination After 68 IterationsFCONV2 convergence criterion satisfied.Criterion 0.001763053N. Function Calls 533 Preproces. Time 0Time for Method 1 Effective Time 1

********************Optimization Results********************

Parameter Estimates-------------------

Parameter Estimate

1 X_1 0.042986962 X_2 0.192454583 X_3 0.265519964 X_4 0.498981345 X_5 0.498794376 X_6 0.731789867 X_7 0.807183998 X_8 0.95685882

Value of Objective Function = 0.00176305

The default NEWUOA algorithm has a problem between iterations 3 and 4, which obviously results from the minimum number 2n + 1 = 17 of interpolation points:

mopt = [ "tech" "uobyqa" ,"print" 3 ];

< xr,rp > = nlp(fcheby81,x0,mopt);print "rp=",rp; print "xr=",xr;

NEWUOA Algorithm by M.J.D. Powell (2004)

Iter nfun optcrit difcrit rho1 18 0.01930885 0.00000000 0.050000002 37 0.01161401 0.00769483 0.005000003 98 0.00671276 0.00490125 5.000e-0044 1249 0.00175929 0.00495347 5.000e-0055 1349 0.00175844 8.473e-007 7.071e-0066 1391 0.00175844 1.744e-009 1.000e-0067 1426 0.00175844 5.344e-011 1.000e-006


Successful Termination After 7 IterationsCriterion 0.001758437 Max Grad Entry 1.3873e-006N. Function Calls 1427 N. Gradient Calls 1Preproces. Time 0 Time for Method 4Effective Time 4

********************Optimization Results********************

Parameter Estimates-------------------

Parameter Estimate

1 X_1 0.043152802 X_2 0.193090983 X_3 0.266328894 X_4 0.500000375 X_5 0.500000056 X_6 0.733671287 X_7 0.806909318 X_8 0.95684718

Value of Objective Function = 0.00175844

This problem looks much milder when we specify some ineffective boundary constraints and use the BOBYQA technique:

/* Bound Constr. Opt. BY Quadrat. Approx.: BOBYQA */
n = 8;
bc = cons(n,1,-100000.) -> cons(n,1,100000.);

print "Specified npt=17, minimum number of int points";
mopt = [ "tech"  "bobyqa" ,
         "print" 4 ];
< xr,rp > = nlp(fcheby81,x0,mopt,bc);

BOBYQA Algorithm by M.J.D. Powell (2008)

Iter nfun optcrit difcrit rho1 18 0.01930885 0.00000000 0.050000002 22 0.01930885 0.00000000 0.005000003 47 0.01358139 0.00572746 5.000e-0044 226 0.00180071 0.01178068 5.000e-0055 319 0.00175844 4.227e-005 7.071e-006


6 329 0.00175844 7.007e-010 1.000e-0067 355 0.00175844 6.694e-011 1.000e-006

Successful Termination After 7 IterationsCriterion 0.001758437 Max Grad Entry 9.6120e-007Max Const Viol. 0.000000000 Max Grad LagF. 9.6120e-007N. Active Constraints 0N. Function Calls 356 N. Gradient Calls 1Preproces. Time 0 Time for Method 1Effective Time 1

Increasing the number of interpolation points to 32 shows a much better iteration history of NEWUOA:

print "specified number interpolation points: 32";mopt = [ "tech" "uobyqa" ,

"intpoi" 32 ,"print" 3 ];

< xr,rp > = nlp(fcheby81,x0,mopt);print "rp=",rp; print "xr=;

NEWUOA Algorithm by M.J.D. Powell (2004)

Iter nfun optcrit difcrit rho1 33 0.01930885 0.00000000 0.050000002 80 0.00725903 0.01204982 0.005000003 167 0.00325651 0.00400251 5.000e-0044 625 0.00175847 0.00149805 5.000e-0055 662 0.00175844 2.769e-008 7.071e-0066 668 0.00175844 3.407e-010 1.000e-0067 692 0.00175844 2.359e-010 1.000e-006

Successful Termination After 7 IterationsCriterion 0.001758437 Max Grad Entry 7.2660e-007N. Function Calls 693 N. Gradient Calls 1Preproces. Time 0 Time for Method 2Effective Time 2

********************Optimization Results********************

Parameter Estimates-------------------

Parameter Estimate


1 X_1 0.043152602 X_2 0.193090563 X_3 0.266328624 X_4 0.499999535 X_5 0.500000026 X_6 0.733671047 X_7 0.806909018 X_8 0.95684714

Value of Objective Function = 0.00175844

print "specified number interpolation points: 44";mopt = [ "tech" "uobyqa" ,

"intpoi" 44 ,"print" 3 ];

< xr,rp > = nlp(fcheby81,x0,mopt);print "rp=",rp; print "xr=",xr;

NEWUOA Algorithm by M.J.D. Powell (2004)

Iter nfun optcrit difcrit rho1 45 0.01930885 0.00000000 0.050000002 112 0.00541564 0.01389321 0.005000003 276 0.00176228 0.00365336 5.000e-0044 321 0.00175844 3.837e-006 5.000e-0055 342 0.00175844 9.477e-010 7.071e-0066 345 0.00175844 2.608e-013 1.000e-0067 362 0.00175844 1.498e-011 1.000e-006

Successful Termination After 7 IterationsCriterion 0.001758437 Max Grad Entry 5.5522e-008N. Function Calls 363 N. Gradient Calls 1Preproces. Time 0 Time for Method 1Effective Time 1

********************Optimization Results********************

Parameter Estimates-------------------

Parameter Estimate

1 X_1 0.043152742 X_2 0.193090793 X_3 0.26632868


4 X_4 0.499999965 X_5 0.499999946 X_6 0.733671257 X_7 0.806909128 X_8 0.95684722

Value of Objective Function = 0.00175844

The old UOBYQA algorithm performs excellently with its large number of interpolation points:

mopt = [ "tech" "uobyqa" ,"vers" 1 ,"print" 5 ];

< xr,rp > = nlp(fcheby81,x0,mopt);print "rp=",rp; print "xr=",xr;

UOBYQA Algorithm by M.J.D. Powell (2002)

Iter nfun optcrit difcrit rho1 46 0.01930885 0.00000000 0.050000002 136 0.00358936 0.01571949 0.005000003 236 0.00175990 0.00182946 5.000e-0044 282 0.00175847 1.428e-006 5.000e-0055 327 0.00175844 3.429e-008 7.071e-0066 372 0.00175844 1.616e-010 1.000e-0067 417 0.00175844 0.00000000 1.000e-006

Successful Termination After 7 IterationsCriterion 0.001758437 Max Grad Entry 2.8806e-008N. Function Calls 418 N. Gradient Calls 1Preproces. Time 0 Time for Method 1Effective Time 1

********************Optimization Results********************

Parameter Estimates-------------------

Parameter Estimate

1 X_1 0.043152762 X_2 0.193090843 X_3 0.266328714 X_4 0.50000000


5 X_5 0.500000006 X_6 0.733671297 X_7 0.806909168 X_8 0.95684724

Value of Objective Function = 0.00175844

The second example shows results for the Rosenbrock function for n = 2:

function frosbr1(x) {
  /* crit = .5 * f' * f */
  r1 = 10. * (x[2] - x[1] * x[1]);
  r2 = 1. - x[1];
  crit = .5 * (r1 * r1 + r2 * r2);
  return(crit);
}

x0 = [ -1.2 1.];

mopt = [ "tech" "nmsimp" ,"print" 3 ];

< xr,rp > = nlp(frosbr1,x0,mopt);print "rp=",rp; print "xr=",xr;

Nelder-Mead Simplex Optimization

Iteration Start:N. Variables 2Criterion 12.10000000

Iter rest nfun act optcrit difcrit std delta size1 0 12 0 2.34371 2.65588 1.0873 1.0000 0.386722 0 22 0 1.90558 0.14397 6e-002 1.0000 0.113653 0 32 0 1.47515 0.21064 9e-002 1.0000 0.371034 0 41 0 1.07103 0.17427 8e-002 1.0000 0.218315 0 51 0 0.78360 0.12162 5e-002 1.0000 0.098126 0 60 0 0.57881 0.05905 2e-002 1.0000 0.132497 0 69 0 0.33466 0.06830 3e-002 1.0000 0.148698 0 79 0 0.28183 5e-003 2e-003 1.0000 0.027949 0 88 0 0.21045 0.04213 2e-002 1.0000 0.06359

10 0 98 0 0.08917 0.06314 3e-002 1.0000 0.2086011 0 108 0 0.02495 0.02190 9e-003 1.0000 0.1589712 0 117 0 1e-002 0.01181 5e-003 1.0000 0.1403813 0 125 0 5e-003 2e-003 9e-004 1.0000 0.1051814 0 134 0 3e-004 4e-004 2e-004 1.0000 0.05466


15 0 143 0 1e-005 5e-005 2e-005 1.0000 0.0154316 0 153 0 6e-007 9e-007 4e-007 1.0000 4e-003

Successful Termination After 16 IterationsFCONV2 convergence criterion satisfied.Criterion 6.1951e-007N. Function Calls 155 Preproces. Time 0Time for Method 0 Effective Time 0

********************Optimization Results********************

Parameter Estimates-------------------

Parameter Estimate

1 X_1 1.000985152 X_2 1.00191945

Value of Objective Function = 6.19514e-007

mopt = [ "tech" "uobyqa" ,"print" 5 ];

< xr,rp > = nlp(frosbr1,x0,mopt);print "rp=",rp; print "xr=",xr;

NEWUOA Algorithm by M.J.D. Powell (2004)

Iter nfun optcrit difcrit rho1 6 2.60000000 9.50000000 0.050000002 24 1.85569005 0.74430995 0.005000003 136 0.00343971 1.85225034 5.000e-0044 163 5.532e-009 0.00343971 5.000e-0055 171 1.077e-010 5.424e-009 7.071e-0066 175 3.668e-014 1.077e-010 1.000e-0067 180 4.339e-017 3.663e-014 1.000e-006

Successful Termination After 7 IterationsCriterion 5.9691e-019 Max Grad Entry 1.0871e-008N. Function Calls 181 N. Gradient Calls 1Preproces. Time 0 Time for Method 0Effective Time 0

********************Optimization Results********************

Parameter Estimates-------------------

Parameter Estimate

1 X_1 1.000000002 X_2 1.00000000

Value of Objective Function = 5.96912e-019

mopt = [ "tech" "uobyqa" ,"vers" 1 ,"print" 5 ];

< xr,rp > = nlp(frosbr1,x0,mopt);print "rp=",rp; print "xr=",xr;

UOBYQA Algorithm by M.J.D. Powell (2002)

Iter nfun optcrit difcrit rho1 8 2.19261331 9.90738669 0.050000002 52 0.08467533 2.10793798 0.005000003 98 4.327e-007 0.08467490 5.000e-0044 105 3.972e-009 4.288e-007 5.000e-0055 110 8.791e-012 3.964e-009 7.071e-0066 111 8.501e-014 8.706e-012 1.000e-0067 115 7.764e-017 8.493e-014 1.000e-006

Successful Termination After 7 IterationsCriterion 3.9129e-021 Max Grad Entry 2.7737e-008N. Function Calls 116 N. Gradient Calls 1Preproces. Time 0 Time for Method 0Effective Time 0

********************Optimization Results********************

Parameter Estimates-------------------

Parameter Estimate

1 X_1 1.000000002 X_2 1.00000000


Value of Objective Function = 3.91288e-021

We conclude:

1. The Nelder-Mead algorithm converges fast to rough precision, but takes a long time for high precision. Nelder-Mead does not need much memory and may also perform better for nonsmooth functions.

2. For high-precision unconstrained optimization, UOBYQA and NEWUOA are preferred to NMSIMP. For small n UOBYQA is preferred to NEWUOA; for large n UOBYQA may run out of memory.

For testing BOBYQA we ran the example which is attached to the software. First we specify the module for the objective function:

function fsrecip(x) {
  n = ncol(x);
  crit = 0.;
  for (i = 4; i <= n; i+=2)
  for (j = 2; j <= i-2; j+=2) {
    t1 = x[i-1] - x[j-1]; t2 = x[i] - x[j];
    tt = t1 * t1 + t2 * t2;
    if (tt < 1.e-6) tt = 1.e-6;
    crit += 1. / sqrt(tt);
  }
  return(crit);
}

The following is the CMAT specification for a simple run for n=10:

m = 5; n = 2 * m;
pi2 = 2. * macon("pi");
bc = cons(n,1,-1.) -> cons(n,1,1.);
x0 = cons(1,n,.);
for (j = k = 1; j <= m; j++, k+=2) {
  xin = (pi2 / (real)m) * j;
  x0[k] = cos(xin); x0[k+1] = sin(xin);
}

crit = fsrecip(x0);
print "F(x0)=",crit;

npt = 2*n + 1;
/* rhobeg = "instep"  = 1.e-1;
   rhoend = "absxtol" = 1.e-6; */

mopt = [ "tech"    "bobyqa" ,
         "intpoi"  npt ,
         "instep"  .1 ,
         "absxtol" 1.e-6 ,
         "maxfun"  5000 ,
         "print"   4 ];

< xr,rp > = nlp(fsrecip,x0,mopt,bc);
print "m=",m," n=",n," npt=",npt;
print "rp=",rp; print "xr=",xr;

The first test run, for m=5, i.e. n=10, gives the same results as those reported by M. Powell (2008):

BOBYQA Algorithm by M.J.D. Powell (2008)

Iter nfun optcrit difcrit rho1 44 5.60889786 1.27301174 0.010000002 59 5.60156025 0.00733762 0.001000003 73 5.60153398 2.627e-005 1.000e-0044 79 5.60153397 6.357e-009 1.000e-0055 94 5.60153397 0.00000000 1.000e-0066 106 5.60153397 1.902e-011 1.000e-006

Successful Termination After 6 IterationsCriterion 5.601533972 Max Grad Entry 1.338420835Max Const Viol. 0.000000000 Max Grad LagF. 1.3362e-007N. Active Constraints 9N. Function Calls 107 N. Gradient Calls 1Preproces. Time 0 Time for Method 0Effective Time 0

********************Optimization Results********************

Parameter Estimates-------------------

Parameter Estimate Active BC

1 X_1 1.00000000 Upper BC2 X_2 1.00000000 Upper BC3 X_3 -1.00000000 Lower BC4 X_4 1.00000000 Upper BC5 X_5 -1.00000000 Lower BC6 X_6 -1.00000000 Lower BC7 X_7 1.00000000 Upper BC8 X_8 -1.00000000 Lower BC9 X_9 1.00000000 Upper BC


10 X_10 3.401e-008

Value of Objective Function = 5.60153

The following CMAT input illustrates the entire test run for the example supplied with the software:

pi2 = 2. * macon("pi");
for (m = 5; m <= 10; m++) {
  n = 2 * m; x0 = cons(1,n,.);
  bc = cons(n,1,-1.) -> cons(n,1,1.);
  for (j = k = 1; j <= m; j++, k+=2) {
    xin = (pi2 / (real)m) * j;
    x0[k] = cos(xin); x0[k+1] = sin(xin);
  }

  crit = fsrecip(x0);
  print "m=",m," F(x0)=",crit;

  for (jc = 1; jc <= 2; jc++) {
    npt = (jc == 1) ? n + 6 : 2*n + 1;
    print "m=",m," jc=",jc;
    mopt = [ "tech"    "bobyqa" ,
             "intpoi"  npt ,
             "instep"  .1 ,
             "absxtol" 1.e-6 ,
             "maxfun"  5000 ,
             "print"   4 ];

    < xr,rp > = nlp(fsrecip,x0,mopt,bc);
    print "m=",m," jc=",jc," n=",n," npt=",npt;
    print "rp=",rp; print "xr=",xr;
} }

We obtain the same results as are reported by Powell (2008).

1.4.8 Remarks on LINCOA (Powell, 2014)

Professor Mike Powell's optimization algorithm LINCOA was added to the list of available algorithms of the nlp() function. With the help of the Intel Parallel Studio XE 2013 Fortran/Fortran90 compiler, an easy interface with the calling C code in CMAT was implemented which directly calls Prof. Powell's Fortran code. Not much had to be added to the options of nlp(), since most were already available for the UOBYQA (NEWUOA) and BOBYQA techniques:

• the "tech" option now permits "lincoa" (and the shorter alias "lnc")

• the number of interpolation points can be specified with the "intpoi" option inside the range [n+1, (n+1)(n+2)/2] and is otherwise set by default to 2n+1, the same as for BOBYQA

• the "rhobeg" (and as an alias the "instep") option can be used for specifying RHOBEG, the size of the initial trust region, which is by default = .5, the same as for BOBYQA

• the "rhoend" (and as an alias the "absxtol") option can be used for specifying RHOEND, the size of the final trust region, which is by default = 1.e-6, the same as for BOBYQA

• the maximum number of function values for LINCOA is set by default to 3000, the same as for BOBYQA, but of course you may just use the "maxfun" option for setting a larger limit.

In my way of thinking about the methods UOBYQA, NEWUOA, BOBYQA, and LINCOA I differ slightly from Prof. Powell: when I talk of iterations I mean all computations which are performed for a constant value of ρ, the size of the trust region. Therefore the number of changes of ρ when stepping from "RHOBEG" to "RHOEND" is for me the number of iterations, which is visible in the printout of the iteration history.

The new "rhobeg" and "rhoend" option names were added for an easier understanding of COBYLA, UOBYQA,NEWUOA, BOBYQA, and LINCOA. The old alias names of nstep" and "absxtol" can still be used.

Some extensive testing was done with about 60 examples. They all converge to the correct optimum, and if only boundary constraints are specified, sometimes with many fewer function calls than are needed by the older BOBYQA algorithm. At this point there is no reference available that I could cite.

Two examples illustrate the usefulness of LINCOA:

1. Example coming with the Fortran code of LINCOA:

For explanation Mike Powell writes: ”Calculate the tetrahedron of least volume that encloses the points (XP(J),YP(J),ZP(J)), J=1,2,...,NP. Our method requires the origin to be strictly inside the convex hull of these points. There are twelve variables that define the four faces of each tetrahedron that is considered. Each face has the form ALPHA*X + BETA*Y + GAMMA*Z = 1, the variables X(3K-2), X(3K-1) and X(3K) being the values of ALPHA, BETA and GAMMA for the K-th face, K=1,2,3,4. Let the set T contain all points in three dimensions that can be reached from the origin without crossing a face. Because the volume of T may be infinite, the objective function is the smaller of FMAX and the volume of T, where FMAX is set initially to an upper bound on the final volume. There are 4*NP linear constraints on the variables, namely that each of the given points (XP(J),YP(J),ZP(J)) shall be in T. Let XS = min XP(J), YS = min YP(J), ZS = min ZP(J) and SS = max XP(J)+YP(J)+ZP(J), where J runs from 1 to NP. The initial values of the variables are X(1)=1/XS, X(5)=1/YS, X(9)=1/ZS, X(2)=X(3)=X(4)=X(6)=X(7)=X(8)=0 and X(10)=X(11)=X(12)=1/SS, which satisfy the linear constraints, and which provide the bound FMAX = (SS-XS-YS-ZS)**3/6. Other details of the test calculation are given below, including the choice of the data points (XP(J),YP(J),ZP(J)), J=1,2,...,NP. The smaller final value of the objective function in the case NPT=35 shows that the problem has local minima.”

n = 12; np = 50;
pi = macon("pi");
xp = yp = zp = cons(np);

/*--- Set the data points ---*/
sx = sy = sz = 0.;
for (i = 1; i <= np; i++) {
  theta = (i - 1.) * pi / (np - 1.);
  xp[i] = cos(theta) * cos(2. * theta);
  sx += xp[i];
  yp[i] = sin(theta) * cos(2. * theta);
  sy += yp[i];
  zp[i] = sin(2. * theta);
  sz += zp[i];
}
sx /= np; sy /= np; sz /= np;
for (i = 1; i <= np; i++) {
  xp[i] -= sx; yp[i] -= sy; zp[i] -= sz;
}
/* print "XYZ=", xyz = xp -> yp -> zp; */

/*--- Set the linear constraints ---*/
nlc = 4 * np;
blc = cons(nlc,1,1.);
alc = cons(nlc,n,0.);
k = 1;
for (j = 1; j <= np; j++)
for (i = 1; i <= 4; i++, k++) {
  alc[k,3*i-2] = xp[j];
  alc[k,3*i-1] = yp[j];
  alc[k,3*i ]  = zp[j];
}
mis = cons(nlc,1,.);
lc = mis -> alc -> blc; /* print "LC=", lc; */

/* Set the initial vector of variables. The JCASE=1,6 loop gives six
   different choices of NPT when LINCOA is called. */
sx = sy = sz = ss = 0.;
for (i = 1; i <= np; i++) {
  sx = min(sx,xp[i]);
  sy = min(sy,yp[i]);
  sz = min(sz,zp[i]);
  tt = xp[i] + yp[i] + zp[i];
  ss = max(ss,tt);
}
fmax = pow(ss - sx - sy - sz,3.) / 6.;
print "Fmax=", fmax," sx,...=",sx,sy,sz,ss;

Fmax= 16.521   sx,...= -1.0000 -0.7893 -0.9995 1.8393

The following is the objective function:

function ftetrah(x) global(fmax) {
  n = ncol(x); /* print "X=",x; */
  crit = fmax;
  v12 = x[1]*x[ 5] - x[ 4]*x[2];
  v13 = x[1]*x[ 8] - x[ 7]*x[2];
  v14 = x[1]*x[11] - x[10]*x[2];
  v23 = x[4]*x[ 8] - x[ 7]*x[5];
  v24 = x[4]*x[11] - x[10]*x[5];
  v34 = x[7]*x[11] - x[10]*x[8];

  del1 = v23*x[12] - v24*x[9] + v34*x[6];
  /* print "del1=",del1; */
  if (del1 > 0.) {
    del2 = -v34*x[3] - v13*x[12] + v14*x[9];
    /* print "del2=",del2; */
    if (del2 > 0.) {
      del3 = -v14*x[6] + v24*x[3] + v12*x[12];
      /* print "del3=",del3; */
      if (del3 > 0.) {
        del4 = -v12*x[9] + v13*x[6] - v23*x[3];
        if (del4 > 0.) {
          t1 = del1 + del2 + del3 + del4;
          t2 = del1 * del2 * del3 * del4;
          tt = pow(t1,3.) / t2;
          crit = min(tt / 6.,fmax);
  } } } }
  return(crit);
}

We must specify a starting point for the optimization:

x0 = cons(1,n,0.);
x0[1] = 1. / sx; x0[5] = 1. / sy; x0[9] = 1. / sz;
x0[10] = x0[11] = x0[12] = 1. / ss;
crit = ftetrah(x0); print "F(x0)=",crit;

F(x0)= 16.521

npt = 15;
mopt = [ "tech"   "lincoa" ,
         "intpoi" npt ,
         "rhobeg" 1.0 ,
         "rhoend" 1.e-6 ,
         "maxfun" 10000 ,
         "print"  4 ];

< xr,rp > = nlp(ftetrah,x0,mopt,.,lc);
crit = ftetrah(xr); print "F(xr)=",crit;

******************Optimization Start******************

Parameter Estimates-------------------

Parameter Estimate

1 X_1 -1.000000002 X_2 03 X_3 04 X_4 05 X_5 -1.267015546 X_6 07 X_7 08 X_8 09 X_9 -1.00051405

10 X_10 0.5436904911 X_11 0.5436904912 X_12 0.54369049

Value of Objective Function = 16.521

Linear Constraints------------------

[ 1] -1.0000000 <= - 1.00000 * X_1 - 0.20818 * X_2- 9e-018 * X_3 ( 2.00000 )

[ 2] -1.0000000 <= - 1.00000 * X_4 - 0.20818 * X_5- 9e-018 * X_6 ( 1.26376 )

[ 3] -1.0000000 <= - 1.00000 * X_7 - 0.20818 * X_8- 9e-018 * X_9 ( 1.00000 )

[ 4] -1.0000000 <= - 1.00000 * X_10 - 0.20818 * X_11- 9e-018 * X_12 ( 0.34313 )

[ 5] -1.0000000 <= - 0.98975 * X_1 - 0.27172 * X_2- 0.12788 * X_3 ( 1.98975 )

..............................................................

[ 197]ACT-1.0000000 <= + 1.00000 * X_1 - 0.20818 * X_2+ 2e-016 * X_3 ( 0 )

[ 198] -1.0000000 <= + 1.00000 * X_4 - 0.20818 * X_5+ 2e-016 * X_6 ( 1.26376 )

[ 199] -1.0000000 <= + 1.00000 * X_7 - 0.20818 * X_8+ 2e-016 * X_9 ( 1.00000 )

[ 200] -1.0000000 <= + 1.00000 * X_10 - 0.20818 * X_11+ 2e-016 * X_12 ( 1.43051 )

LINCOA Algorithm by M.J.D. Powell (2013)


*** Termination Criteria ***Minimum Iterations . . . . . . . . . . . . . . . 0Maximum Iterations . . . . . . . . . . . . . . . 1000Maximum Function Calls. . . . . . . . . . . . . . 10000ABSFCONV Function Criterion . . . . . . . . . . . 0FCONV Function Criterion . . . . . . . . . . . . 2.22e-016FCONV2 Function Criterion . . . . . . . . . . . . 1e-006FSIZE Parameter . . . . . . . . . . . . . . . . . 0ABSXCONV Parameter Change Criterion . . . . . . . 1e-006XCONV Parameter Change Criterion . . . . . . . . 1e-008XSIZE Parameter . . . . . . . . . . . . . . . . . 0ABSCONV Function Criterion . . . . . . . . . . . -1.34e+154

*** Other Control Parameters ***Version of Algorithm. . . . . . . . . . . . . . . 0Initial Simplex Size (INSTEP) . . . . . . . . . . 1Final Simplex Size (ABSXTOL). . . . . . . . . . . 1e-006Number Interpolation Points . . . . . . . . . . . 15

We print an iteration history whenever the size RHO of the trust region is reduced:

LINCOA Algorithm by M.J.D. Powell (2013)

Iter nfun optcrit difcrit rho1 17 12.7252293 3.79574440 0.100000002 69 2.79438753 9.93084172 0.010000003 82 2.78016283 0.01422471 0.001000004 119 2.76133365 0.01882918 1.000e-0045 128 2.76131424 1.940e-005 1.000e-0056 137 2.76131265 1.590e-006 1.000e-0067 144 2.76131253 1.233e-007 1.000e-006

Successful Termination After 7 IterationsCriterion 2.761312531 Max Grad Entry 0Max Const Viol. 2.2204e-016 N. Active Constraints 11N. Function Calls 145 Preproces. Time 0Time for Method 0 Effective Time 0

********************Optimization Results********************

Parameter Estimates-------------------

Parameter Estimate

1 X_1 -1.233791122 X_2 -1.23134771


3 X_3 -0.887061654 X_4 1.129888165 X_5 -1.128285076 X_6 1.084033657 X_7 0.843130738 X_8 0.753540259 X_9 -0.84617569

10 X_10 -0.6585847611 X_11 1.5568052512 X_12 0.58534120

Value of Objective Function = 2.76131

The solution is a feasible point, and the constraints 3, 14, 18, 44, 86, 105, 151, 189, 192, 193, and 196 are active at the solution:

Linear Constraints Evaluated at Solution----------------------------------------

[ 1] -1.00000 * X_1 - 0.20818 * X_2- 9e-018 * X_3 + 1.00000 = 2.490128612

[ 2] -1.00000 * X_4 - 0.20818 * X_5- 9e-018 * X_6 + 1.00000 = 0.104994121

[ 3]ACT -1.00000 * X_7 - 0.20818 * X_8- 9e-018 * X_9 + 1.00000 = 7.9028e-018

[ 4] -1.00000 * X_10 - 0.20818 * X_11- 9e-018 * X_12 + 1.00000 = 1.334494707

[ 5] -0.98975 * X_1 - 0.27172 * X_2- 0.12788 * X_3 + 1.00000 = 2.669164996

.........................................................

[ 196]ACT 0.98975 * X_10 - 0.27172 * X_11+ 0.12788 * X_12 + 1.00000 = -9.7145e-017

[ 197] 1.00000 * X_1 - 0.20818 * X_2+ 2e-016 * X_3 + 1.00000 = 0.022546363

[ 198] 1.00000 * X_4 - 0.20818 * X_5+ 2e-016 * X_6 + 1.00000 = 2.364770444

[ 199] 1.00000 * X_7 - 0.20818 * X_8+ 2e-016 * X_9 + 1.00000 = 1.686261463

[ 200] 1.00000 * X_10 - 0.20818 * X_11+ 2e-016 * X_12 + 1.00000 = 0.017325184

2. Example coming with the Fortran code of BOBYQA:

Since boundary constraints are special linear constraints, all examples which are run with BOBYQA can also be run by LINCOA.

print "\n *** Compare M. Powell’s BOBYQA with LINCOA: ***\n";


function fsrecip(x) {
  n = ncol(x);
  crit = 0.;
  for (i = 4; i <= n; i+=2)
  for (j = 2; j <= i-2; j+=2) {
    t1 = x[i-1] - x[j-1]; t2 = x[i] - x[j];
    tt = t1 * t1 + t2 * t2;
    if (tt < 1.e-6) tt = 1.e-6;
    crit += 1. / sqrt(tt);
  }
  return(crit);
}

m = 5; n = 2 * m;
pi2 = 2. * macon("pi");
bc = cons(n,1,-1.) -> cons(n,1,1.);
x0 = cons(1,n,.);
for (j = k = 1; j <= m; j++, k+=2) {
  xin = (pi2 / (real)m) * j;
  x0[k] = cos(xin); x0[k+1] = sin(xin);
}
crit = fsrecip(x0); print "F(x0)=",crit;

F(x0)= 6.8819

npt = 2*n + 1;
mopt = [ "tech"   "lincoa" ,
         "intpoi" npt ,
         "rhobeg" 1.0 ,
         "rhoend" 1.e-6 ,
         "maxfun" 10000 ,
         "print"  4 ];

< xr,rp > = nlp(fsrecip,x0,mopt,bc);

******************Optimization Start******************

Parameter Estimates-------------------

Parameter Estimate Lower BC Upper BC

1 X_1 0.30901699 -1.0000000 1.00000002 X_2 0.95105652 -1.0000000 1.0000000


3 X_3 -0.80901699 -1.0000000 1.00000004 X_4 0.58778525 -1.0000000 1.00000005 X_5 -0.80901699 -1.0000000 1.00000006 X_6 -0.58778525 -1.0000000 1.00000007 X_7 0.30901699 -1.0000000 1.00000008 X_8 -0.95105652 -1.0000000 1.00000009 X_9 1.00000000 -1.0000000 1.0000000

10 X_10 -2.449e-016 -1.0000000 1.0000000

Value of Objective Function = 6.88191

LINCOA Algorithm by M.J.D. Powell (2013)*** Termination Criteria ***

Minimum Iterations . . . . . . . . . . . . . . . 0Maximum Iterations . . . . . . . . . . . . . . . 1000Maximum Function Calls. . . . . . . . . . . . . . 10000ABSFCONV Function Criterion . . . . . . . . . . . 0FCONV Function Criterion . . . . . . . . . . . . 2.22e-016FCONV2 Function Criterion . . . . . . . . . . . . 1e-006FSIZE Parameter . . . . . . . . . . . . . . . . . 0ABSXCONV Parameter Change Criterion . . . . . . . 1e-006XCONV Parameter Change Criterion . . . . . . . . 1e-008XSIZE Parameter . . . . . . . . . . . . . . . . . 0ABSCONV Function Criterion . . . . . . . . . . . -1.34e+154

*** Other Control Parameters ***Version of Algorithm. . . . . . . . . . . . . . . 0Initial Simplex Size (INSTEP) . . . . . . . . . . 1Final Simplex Size (ABSXTOL). . . . . . . . . . . 1e-006Number Interpolation Points . . . . . . . . . . . 21

LINCOA Algorithm by M.J.D. Powell (2013)

Iter nfun optcrit difcrit rho1 23 5.69374981 1.18815979 0.100000002 40 5.60154297 0.09220684 0.010000003 46 5.60154297 0 0.001000004 54 5.60153418 8.785e-006 1.000e-0045 60 5.60153397 2.112e-007 1.000e-0056 64 5.60153397 0 1.000e-0067 69 5.60153397 0 1.000e-006

Successful Termination After 7 IterationsCriterion 5.601533972 Max Grad Entry 0Max Const Viol. 0 N. Active Constraints 9N. Function Calls 70 N. Gradient Calls 1Preproces. Time 0 Time for Method 0Effective Time 0


********************Optimization Results********************

Parameter Estimates-------------------

Parameter Estimate Active BC

1 X_1 1.00000000 Upper BC2 X_2 1.00000000 Upper BC3 X_3 -1.00000000 Lower BC4 X_4 1.00000000 Upper BC5 X_5 -1.00000000 Lower BC6 X_6 -1.00000000 Lower BC7 X_7 1.00000000 Upper BC8 X_8 -1.00000000 Lower BC9 X_9 1.00000000 Upper BC

10 X_10 -1.236e-009

Value of Objective Function = 5.60153

The same example can be run with the BOBYQA algorithm:

mopt = [ "tech" "bobyqa" ,"intpoi" npt ,"rhobeg" .1 ,"rhoend" 1.e-6 ,"maxfun" 5000 ,"print" 4 ];

< xr,rp > = nlp(fsrecip,x0,mopt,bc);

******************Optimization Start******************

Parameter Estimates-------------------

Parameter Estimate Lower BC Upper BC

1 X_1 0.30901699 -1.0000000 1.00000002 X_2 0.95105652 -1.0000000 1.00000003 X_3 -0.80901699 -1.0000000 1.00000004 X_4 0.58778525 -1.0000000 1.00000005 X_5 -0.80901699 -1.0000000 1.00000006 X_6 -0.58778525 -1.0000000 1.00000007 X_7 0.30901699 -1.0000000 1.0000000


8 X_8 -0.95105652 -1.0000000 1.00000009 X_9 1.00000000 -1.0000000 1.0000000

10 X_10 -2.449e-016 -1.0000000 1.0000000

Value of Objective Function = 6.88191

BOBYQA Algorithm by M.J.D. Powell (2008)*** Termination Criteria ***

Minimum Iterations . . . . . . . . . . . . . . . 0Maximum Iterations . . . . . . . . . . . . . . . 1000Maximum Function Calls. . . . . . . . . . . . . . 5000ABSFCONV Function Criterion . . . . . . . . . . . 0FCONV Function Criterion . . . . . . . . . . . . 2.22e-016FCONV2 Function Criterion . . . . . . . . . . . . 1e-006FSIZE Parameter . . . . . . . . . . . . . . . . . 0ABSXCONV Parameter Change Criterion . . . . . . . 1e-006XCONV Parameter Change Criterion . . . . . . . . 1e-008XSIZE Parameter . . . . . . . . . . . . . . . . . 0ABSCONV Function Criterion . . . . . . . . . . . -1.34e+154

*** Other Control Parameters ***Version of Algorithm. . . . . . . . . . . . . . . 0Initial Simplex Size (INSTEP) . . . . . . . . . . 0.1Final Simplex Size (ABSXTOL). . . . . . . . . . . 1e-006Number Interpolation Points . . . . . . . . . . . 21

BOBYQA Algorithm by M.J.D. Powell (2008)

Iter nfun optcrit difcrit rho1 44 5.60889786 1.27301174 0.010000002 59 5.60156025 0.00733762 0.001000003 73 5.60153398 2.627e-005 1.000e-0044 79 5.60153397 6.357e-009 1.000e-0055 94 5.60153397 0 1.000e-0066 106 5.60153397 1.902e-011 1.000e-006

Successful Termination After 6 IterationsCriterion 5.601533972 Max Grad Entry 1.338420835Max Const Viol. 0 Max Grad LagF. 1.3362e-007N. Active Constraints 9N. Function Calls 107 N. Gradient Calls 1Preproces. Time 0 Time for Method 0Effective Time 0

********************Optimization Results********************


Parameter Estimates-------------------

Parameter Estimate Active BC

1 X_1 1.00000000 Upper BC2 X_2 1.00000000 Upper BC3 X_3 -1.00000000 Lower BC4 X_4 1.00000000 Upper BC5 X_5 -1.00000000 Lower BC6 X_6 -1.00000000 Lower BC7 X_7 1.00000000 Upper BC8 X_8 -1.00000000 Lower BC9 X_9 1.00000000 Upper BC

10 X_10 3.401e-008

Value of Objective Function = 5.60153

1.4.9 Computational Problems

First Iteration Overflows

If you use default or bad initial values for the parameters, the computation of the value of the objective function (and its derivatives) can lead to arithmetic overflows in the first iteration of each optimization algorithm. The line-search algorithms that work with cubic extrapolation are especially sensitive to arithmetic overflows. If an overflow occurs with an optimization technique that uses a line search, you can use the INSTEP option to reduce the length of the first trial step during the line search of the first five iterations, or use the DAMPSTEP or MAXSTEP options to restrict the step length of the initial α in subsequent iterations. If an arithmetic overflow occurs in the first iteration of the trust-region, double dogleg, or Levenberg-Marquardt algorithm, you can use the INSTEP option to reduce the default trust-region radius of the first iteration. You can also change the minimization technique or the line-search method. If none of these methods helps, consider the following actions (a small INSTEP sketch is given after the list):

• scale the parameters,

• provide better initial values,

• use boundary constraints to avoid the region where overflows may happen,

• change the algorithm (specified in program statements) that computes the objective function.
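
The following lines are only a sketch of the INSTEP suggestion above; the objective module fobj and the starting point x0 are placeholders, and the option strings are the lowercase names used by the other nlp() examples in this chapter:

/* reduce the first trial step (line-search techniques) or the
   initial trust-region radius (trust-region type techniques)  */
mopt = [ "tech"   "trureg" ,
         "instep" 1.e-2 ,
         "print"  2 ];
< xr,rp > = nlp(fobj,x0,mopt);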

Problems Evaluating Code for Objective Function

The starting point x(0) must be a point for which the program statements can be evaluated. If the default starting point is not such a point, you must use the X0 option to specify a better start. Even during the iteration, points x may occur at which it is not possible to evaluate the objective or nonlinear constraint functions and/or their derivatives. In some cases, the specification of boundary or linear constraints for parameters can avoid such situations. In many other cases the user can indicate that the point x is a bad point simply by returning a missing value from the funct module, or by returning an extremely large (for minimization) or small (for maximization) value for the objective function. In these cases the optimization algorithm reduces the step length and stays closer to the point which was evaluated successfully in the former iteration.
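
As a minimal sketch of the "missing value" suggestion (the function below is made up purely for illustration, and it assumes that the missing value can be returned directly with return(.)):

function fsafe(x) {
  t = 1. - x[1]*x[1] - x[2]*x[2];
  /* flag x as a bad point when the criterion cannot be evaluated */
  if (t <= 0.) return(.);
  crit = -log(t) + x[1];
  return(crit);
}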

Problems with QUANEW and Nonlinear Constraints

The sequential quadratic programming algorithm in QUANEW, which is used for solving nonlinearly constrained problems, can have problems updating the Lagrange multiplier vector µ. This usually results in very high values of the Lagrange function and in watchdog restarts indicated by signs in the iteration history. If this happens, you can try the following actions to achieve successful convergence (a small sketch follows the list):

1. By default, the evaluation of the Lagrange vector µ is performed in the same way as Powell (1982) describes. This corresponds to VERSION=2. A modification of this algorithm can be performed by specifying VERSION=1, which replaces the update of the Lagrange vector µ with the original update of Powell (1978) which is used in VF02AD.

2. In several applications, large steps in the first iterations were troublesome. You can use the INSTEP option to impose an upper bound for the step size α during the first five iterations.

3. You may also use the INHESS option to specify a different starting approximation for the Hessian. Simply choosing the INHESS option will use the Cholesky factor of a (possibly ridged) finite difference approximation of the Hessian to initialize the quasi-Newton update process.
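
As a sketch only, the first two suggestions could be combined as follows; fobj and x0 are placeholders, the constraint specification is omitted, and the lowercase string "quanew" is assumed to be the option value corresponding to TECH=QUANEW:

mopt = [ "tech"   "quanew" ,  /* SQP method for nonlinearly constrained problems */
         "vers"   1 ,         /* Powell (1978) update of the Lagrange vector     */
         "instep" 1.e-2 ,     /* bound the step size in the first iterations     */
         "print"  3 ];
< xr,rp > = nlp(fobj,x0,mopt);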

No Convergence of Minimization Process

1. Check the derivative specification: If derivatives are specified by using the grad, hess, and jcon module arguments, you can compare the specified derivatives with those computed by finite-difference approximations. Using the GRADCHECK option you can check whether the gradient or Jacobian specified by a grad or jcon module is correct for the function f = f(x) computed by the program statements.

2. Forward-difference derivatives specified with the FD... options may not be precise enough for highly nonlinear functions to satisfy strong gradient termination criteria. You may need to specify the more expensive central-difference formulas or may have to specify analytical derivatives. If you did not already specify the FDINT option, the finite difference intervals may be too small or too big and the finite difference derivatives may be erroneous. You can specify the FDINT option to compute better finite difference intervals with the algorithm of Gill, Murray, Saunders, and Wright (1983). Note that this algorithm can be very expensive in the number of function calls.

3. Print the parameter estimates and gradient (not for NMSIMP) during selected iterations.

4. Change the optimization technique: For example, if you use the default TECH=LEVMAR, you can:

• change to TECH=QUANEW or to TECH=NRRIDG,

• run some iterations with TECH=CONGRA, save the results, and use them as initial values specified by the X0 option in a second run with a different TECH technique (see the sketch after this list).

5. Change or modify the update technique and/or the line-search algorithm: This method applies only to TECH=QUANEW, HYQUAN, or CONGRA. For example, if you use the default update formula and the default line-search algorithm, you can


• change the update formula with the UPD option,

• change the line-search algorithm with the LIS option,

• specify a more precise line search with the LSPRECISION option, if you use LIS=2 or LIS=3.

6. Change the initial values by using a grid search specification to obtain a set of good feasible starting values.
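
The two-stage strategy of item 4 might look like the following sketch; fobj and x0 are placeholders, and the lowercase strings "congra" and "quanew" are assumed to correspond to TECH=CONGRA and TECH=QUANEW:

/* stage 1: a few cheap conjugate gradient iterations */
mopt1 = [ "tech"  "congra" ,
          "maxit" 20 ,
          "print" 2 ];
< x1,rp1 > = nlp(fobj,x0,mopt1);

/* stage 2: restart a different technique from the stage 1 result */
mopt2 = [ "tech"  "quanew" ,
          "print" 3 ];
< xr,rp > = nlp(fobj,x1,mopt2);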

Convergence to Stationary Point

The gradient at a stationary point is the null vector, which always leads to a zero search direction. This point satisfies the first-order termination criterion. Search directions which are based on the gradient are zero, and each algorithm except Nelder-Mead terminates. There are two special cases:

1. The starting point for the optimization is a stationary point which is not the solution:

• starting point for minimization is a local maximum point,

• starting point for maximization is a local minimum point,

• starting point is a saddle point.

There are two ways to avoid this situation:

• Use the X0 option to specify a valid feasible starting point x(0).

• You may use the OPTCHECK option to avoid terminating at the stationary point.

2. The optimization algorithm approaches a saddle point and terminates. You may use the OPTCHECK option to avoid terminating at a saddle point.

Note: Do not specify the OPTCHECK option if there is no need, because the algorithm which is requested with this option may be very time-consuming.

The signs of the eigenvalues of the (projected) Hessian matrix correspond to the following cases:

• if all eigenvalues are positive: the Hessian matrix is positive definite, the point is a minimum point;

• some of the eigenvalues are positive and all remaining eigenvalues are zero: the Hessian matrix is positive semidefinite, the point is a minimum or saddle point;

• all eigenvalues are negative: the Hessian matrix is negative definite, the point is a maximum point;

• some of the eigenvalues are negative and all remaining eigenvalues are zero: the Hessian matrix is negative semidefinite, the point is a maximum or saddle point;

• all eigenvalues are zero: the point can be a minimum, maximum, or saddle point.

Precision of Solution

In some applications NLP(.) may result in parameter estimates which are not precise enough. Usually this means that the optimization process terminated too early at a point which is too far from the exact optimal point. The termination criteria define the size of the termination region around the optimal point. Any point inside this region can be accepted for terminating the optimization process. The default values of the termination criteria are set to satisfy a reasonable compromise between the computational effort (computer time) and the precision of the computed estimates for the most common applications. However, there are a number of circumstances where the default values of the termination criteria specify a region that is either too large or too small. If the termination region is too large then it can contain points with low precision, and your application may terminate at such a point too early. In such cases, you should inspect your log or list output to find the message stating which termination criterion terminated the optimization process. In many applications you can obtain a solution with higher precision by simply using the old parameter estimates as starting values in a subsequent run where you specify a smaller value for the termination criterion which was satisfied at the former run.
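
The restart strategy just described might look like this sketch; fobj and x0 are placeholders, and the option name "gconv" is an assumption derived from the GCONV criterion reported in the iteration summaries:

/* first run with the default termination criteria */
mopt = [ "tech"  "trureg" ,
         "print" 2 ];
< xr1,rp1 > = nlp(fobj,x0,mopt);

/* restart from the first solution with a tighter GCONV criterion */
mopt = [ "tech"  "trureg" ,
         "gconv" 1.e-10 ,
         "print" 2 ];
< xr2,rp2 > = nlp(fobj,xr1,mopt);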

If the termination region is too small then the optimization process either needs too much computer time to find a point inside such a region or cannot even find such a point due to rounding errors in function values and derivatives. This can easily happen in applications where finite difference approximations of derivatives are used and the GCONV and ABSGCONV termination criteria are too small to respect rounding errors in the gradient values.

Computer Resources

As the number n of variables in the objective function increases, you will confront problems with computer timeand memory resources of your computer. Some ways to avoid such trouble are:

• Try to use an optimization technique that uses only first-order derivatives (e.g., TECH=QUANEW, DBLDOG,or CONGRA). Computing the Jacobian matrix or even second-order derivatives for the Hessian matrix canbe too expensive, both in terms of computer time and memory need.

• Use the grad, hess, and jcon module arguments to specify the derivatives of the objective function (a minimal sketch follows this list).

• Try to specify a good initial estimate using the X0 option. This could save you iterations and computer time.

• There may be applications where the Hessian matrix is diagonal but the method used by the algorithm does not recognize the diagonality by itself. Use the DIAHES option to force the optimization algorithm (TRUREG, NEWRAP, NRRIDG, or LEVMAR) to take advantage of the diagonality.
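The following minimal sketch illustrates the use of a specified derivative module for a least-squares problem (assumptions: fres and jres are hypothetical residual and Jacobian modules, and the derivative module is passed as the fifth nlp argument, as in the extended Rosenbrock example of the next section):

mopt = [ "tech"  "levmar" ,
         "print" 2 ];
/* Jacobian by finite differences (many extra function calls): */
< xr, rp > = nlp(fres,x0,mopt);
/* user-specified Jacobian module (cheaper per iteration):     */
< xr, rp > = nlp(fres,x0,mopt,.,jres);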

1.5 Some Examples for NLP

1.5.1 Two Examples for Genetic Algorithm

1. Example by Goldberg and Richardson (1987):

This function of two variables x1, x2 has 25 isolated local maxima in 0 ≤ xj ≤ 1, but only one of them is the global maximum with function value f(x) = 1.
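Written out explicitly (this is just the formula implemented in the fgoldrich module below), the maximized function is

   f(x1, x2) = ∏_{j=1,2} sin^6(5.1 π xj + 0.5) exp( −(4 log 2 / 0.64) (xj − 0.0667)^2 ).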

real function fgoldrich(x) {
   pi = macon("pi");
   t1 = 5.1 * pi; t2 = -4. * log(2.) / .64;
   f11 = pow(sin(t1 * x[1] + .5),6.);
   f12 = exp(t2 * (x[1] - 0.0667)** 2);
   f21 = pow(sin(5.1 * pi * x[2] + .5),6.);
   f22 = exp(t2 * (x[2] - 0.0667)** 2);
   f = f11 * f12 * f21 * f22;
   return(f);
}

After 100 generations (iterations) the uniform crossover micro-GA has found the best point at x = [0.06247139, 0.06247139], which is very close to the global optimum. The convergence would be slightly slower if the "uniform" crossover option is not specified (i.e. for single-point crossover).

bc = [ 0. 1. ,
       0. 1. ];

x0 = [ .5 .5 ];
mopt = [ "tech"    "genalg",
         "bounds"  "bc",
         "print"   4,
         "popsize" 5,
         "maxgen"  100,
         "nchild"  1,
         "microga" ,
         "elite"   ,
         "uniform" ,
         "pcross"  .5,
         "pmutate" .02,
         "pcreep"  .04 ];

< xr, rp> = nlp(fgoldrich,x0,mopt);

When we feed the result of GENALG as starting point into the trust region algorithm (TRUREG), we obtain with only 3 iterations a highly precise global maximum:

x0 = [ 0.06247139 0.06247139 ];
mopt = [ "max"    ,
         "tech"   "trureg" ,
         "bounds" "bc" ,
         "print"  4 ];

< xr, rp> = nlp(fgoldrich,x0,mopt);

Using the trivial starting point of x0 = (.5, .5) the trust region algorithm would converge to the closest local maximum with a function value of 0.265562.

x0 = [ 0.5 0.5 ];
mopt = [ "max"    ,
         "tech"   "trureg" ,
         "bounds" "bc" ,
         "print"  4 ];

< xr, rp> = nlp(fgoldrich,x0,mopt);

2. Goldstein & Price (1971) Function:

The Goldstein & Price function (see Floudas and Pardalos, 1990, p.27)

min f(x1, x2)   with   f(x1, x2) = f1(x1, x2) f2(x1, x2)

with

   f1(x1, x2) = 1 + (x1 + x2 + 1)^2 (19 − 14 x1 + 3 x1^2 − 14 x2 + 6 x1 x2 + 3 x2^2)
   f2(x1, x2) = 30 + (2 x1 − 3 x2)^2 (18 − 32 x1 + 12 x1^2 + 48 x2 − 36 x1 x2 + 27 x2^2)

is to be minimized in −2 ≤ xj ≤ 2. The global minimum f(x*) = 3 is attained at x* = (0, −1). To replace minimization with maximization and to avoid negative function values we maximize the function g(x) = 10^6 − f(x):

real function fgolpri(x) {
   a = (x[1] + x[2] + 1.)**2;
   b = 19. - 14*x[1] + 3*x[1]**2 - 14.*x[2] + 6.*x[1]*x[2] + 3.*x[2]**2;
   c = (2.*x[1] - 3.*x[2])**2;
   d = 18. - 32.*x[1] + 12.*x[1]**2 + 48*x[2] - 36.*x[1]*x[2] + 27.*x[2]**2;
   f = 1.e6 - (1. + a * b)*(30. + c * d);
   return(f);
}

After 100 iterations we obtain x = (0.09906308, −0.86593219) with f(x) = 12, which is reasonably close to the global minimum:

bc = [ -2. 2. ,
       -2. 2. ];

x0 = [ .0 .0 ];
mopt = [ "max"     ,
         "tech"    "genalg",
         "bounds"  "bc",
         "print"   4,
         "popsize" 5,
         "maxgen"  100,
         "nchild"  1,
         "microga" ,
         "elite"   ,
         "uniform" ,
         "pcross"  .5,
         "pmutate" .02,
         "pcreep"  .04 ];

< xr, rp> = nlp(fgolpri,x0,mopt);

When we feed the result of GENALG as starting point into the trust region algorithm (TRUREG), we obtain with only 4 iterations a highly precise global minimum of f(x*) = 3 at x* = (6.604e−007, −0.99999589):

x0 = [ 0.09906308 -0.86593219 ];
mopt = [ "max"    ,
         "tech"   "trureg" ,
         "bounds" "bc" ,
         "print"  4 ];

< xr, rp> = nlp(fgolpri,x0,mopt);

Using the trivial starting point of x0 = (0, 0), the trust region algorithm would converge to the closest local minimum x* = (−0.59999735, −0.40000284) with a function value of f(x*) = 30:


x0 = [ 0. 0. ];
mopt = [ "max"    ,
         "tech"   "trureg" ,
         "bounds" "bc" ,
         "print"  4 ];

< xr, rp> = nlp(fgolpri,x0,mopt);

1.5.2 Exploiting Sparsity: Extended Rosenbrock

The following two sets of modules specify the objective function, gradient, and Jacobian for the extended Rosenbrock function (see [626]). The first set (fexros3, fexros4, jexros2, gexros2) uses scalar notation, and the second set (fexros1, fexros2, jexros, gexros) uses matrix algebra. Since CMAT is an interpreter language, the matrix notation is expected to execute faster than the scalar notation. We can also see what effect the specification of the nonzero pattern has on the speed of computing finite difference derivatives.
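Written out, the criterion implemented by the modules below is

   f(x) = 0.5 Σ_{i=1,3,...,49} [ (10 (x_{i+1} − x_i^2))^2 + (1 − x_i)^2 ],   n = 50,

i.e. one half of the sum of the squared residuals f_i returned by the least-squares modules fexros2 and fexros4.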


function fexros3(x) {
   f = cons(50,1);
   for (i = 1; i <= 50; i += 2) {
      t1 = x[i];
      f[i]   = 10. * (x[i+1] - t1 * t1);
      f[i+1] = 1. - t1;
   }
   crit = .5 * f[**];
   return(crit);
}

function fexros4(x) {
   f = cons(50,1);
   for (i = 1; i <= 50; i += 2) {
      t1 = x[i];
      f[i]   = 10. * (x[i+1] - t1 * t1);
      f[i+1] = 1. - t1;
   }
   return(f);
}

function jexros2(x) {
   jac = cons(50,50,0.);
   for (i = 1; i <= 50; i += 2) {
      jac[i,i]   = -20 * x[i];
      jac[i,i+1] = 10.; jac[i+1,i] = -1.;
   }
   return(jac);
}

function gexros2(x) {
   f = cons(50,1);
   jac = cons(50,50,0.);
   for (i = 1; i <= 50; i += 2) {
      t1 = x[i];
      f[i]   = 10. * (x[i+1] - t1 * t1);
      f[i+1] = 1. - t1;
      jac[i,i]   = -20 * t1;
      jac[i,i+1] = 10.; jac[i+1,i] = -1.;
   }
   g = jac‘ * f;
   return(g);
}

j1= [ 1:2:49 ]; j2= [ 2:2:50 ];

function fexros1(x) global(j1,j2) {
   f = cons(50,1);
   f[j1] = 10. * (x[j2] - x[j1] .* x[j1]);
   f[j2] = 1. - x[j1];
   crit = .5 * f[**];
   return(crit);
}

function fexros2(x) global(j1,j2) {
   f = cons(50,1);
   f[j1] = 10. * (x[j2] - x[j1] .* x[j1]);
   f[j2] = 1. - x[j1];
   return(f);
}

function jexros(x) global(j1,j2) {
   w1 = cons(50,1,0);
   w2 = w3 = cons(49,1,0.);
   w1[j1] = -20. * x[j1];
   w2[j1] = 10.; w3[j1] = -1.;
   jac = diag(w1) + diag(w2,1) + diag(w3,-1);
   return(jac);
}

function gexros(x) global(j1,j2) {
   f = cons(50,1);
   f[j1] = 10. * (x[j2] - x[j1] .* x[j1]);
   f[j2] = 1. - x[j1];
   w1 = cons(50,1,0);
   w2 = w3 = cons(49,1,0.);
   w1[j1] = -20. * x[j1];
   w2[j1] = 10.; w3[j1] = -1.;
   jac = diag(w1) + diag(w2,1) + diag(w3,-1);
   g = jac‘ * f;
   return(g);
}

Specifying initial point x(0) and the nonzero pattern for Hessian and Jacobian:

x0 = cons(50,1,1.);
ind = [ 1:6:49 3:6:45 5:6:47 ];
x0[ind] = -1.2;
i0 = [ 1:50 ]‘; i1 = [ 1:2:49 ]‘; i2 = [ 2:2:50 ]‘;
jpat = (i1 -> i1) |> (i1 -> i2) |> (i2 -> i1);
hpat = (i0 -> i0) |> (i1 -> i2);

The first four runs execute the fast matrix notation modules trying to solve the least-squares problem with the Levenberg-Marquardt algorithm:

mopt2 = [ "tech" "levmar" ,"print" "pshort" ];

madd1 = [ "jspat" "jpat" ];madd2 = [ "jspat" "jpat" ,

"hspat" "hpat" ];mopt4 = mopt2 |> madd2;< xr, rp> = nlp(fexros2,x0,mopt2);< xr, rp> = nlp(fexros2,x0,mopt4);< xr, rp> = nlp(fexros2,x0,mopt2,.,jexros);< xr, rp> = nlp(fexros2,x0,mopt4,.,jexros);

The first run uses neither specified derivatives (the Jacobian is evaluated by finite differences) nor a nonzero pattern of the Jacobian; it therefore needs more than the default limit of 50 iterations and an effective time of 27 seconds:

Levenberg-Marquardt Optimization
Scaling Update of More (1978)

Gradient Computed by Finite Differences
CRP Jacobian Computed by Finite Differences

Iteration Start:
N. Variables          50   N. Equations          50
Criterion    302.5000000   Max Grad Entry   107.7999992
TR Radius    3397.826433

Iter rest nfun act optcrit difcrit maxgrad lambda rho

 1*   0   14   0  149.5817  152.9183  107.800  5e+018  0.16713
 2*   0   15   0  149.5816  9.9e-005  107.800  8e+008  1.00000
 3*   0   16   0  149.5814  2.0e-004  107.800  4e+008  1.00000
 4*   0   17   0  149.5810  3.9e-004  107.799  2e+008  1.00000
 5*   0   18   0  149.5802  7.9e-004  107.799  1e+008  1.00000
 6*   0   19   0  149.5778  2.4e-003  107.796  4e+007  1.00000
 7*   0   20   0  149.5731  4.7e-003  107.792  2e+007  0.99999
 8*   0   21   0  149.5636  9.5e-003  107.782  8886762 0.99998
 9*   0   22   0  149.5353  0.028364  107.755  2962029 0.99995
10*   0   23   0  149.4786  0.056716  107.699  1480743 0.99991
11*   0   24   0  149.3652  0.113385  107.588  740100  0.99981
12*   0   25   0  149.0254  0.339776  107.256  246475  0.99943
13*   0   26   0  148.3475  0.677851  106.596  122967  0.99886
14*   0   27   0  146.9986  1.348905  105.289  61213.8 0.99771
15*   0   28   0  143.0062  3.992469  101.483  20185.4 0.99300
16*   0   29   0  135.2643  7.741887  94.3083  9833.06 0.98540
17*   0   30   0  120.7476  14.51664  81.5359  4667.96 0.96828
18*   0   31   0  84.96611  35.78153  54.2505  1381.82 0.88413
19*   0   32   0  49.75645  35.20966  23.7772  482.272 0.61587
20*   0   34   0  48.55866  1.197791  11.2292  494.067 0.68777
21*   0   35   0  48.39509  0.163577  2.49534  27.1534 0.55898
22*   0   36   0  48.32235  0.072732  1.98663  220.151 0.10575
23*   0   37   0  48.29900  0.023354  2.80191  200.039 0.03477
24*   0   38   0  48.07131  0.227692  0.78252  494.398 0.58424
25*   0   39   0  47.98249  0.088823  1.26080  126.614 0.80265
26*   0   41   0  47.91459  0.067893  0.79661  349.244 0.74064
27*   0   42   0  47.85136  0.063230  0.94061  255.102 1.00087
28*   0   43   0  47.71542  0.135942  0.80612  77.3363 1.30355
29*   0   44   0  47.69583  0.019593  2.13184  41.7384 0.10265
30*   0   45   0  47.64272  0.053105  2.84021  205.459 0.06621
31*   0   46   0  47.40042  0.242307  0.79805  463.261 0.56592
32*   0   47   0  47.29741  0.103008  1.25672  104.995 0.90833
33*   0   49   0  47.22911  0.068297  0.79929  346.831 0.75911
34*   0   50   0  47.13292  0.096188  1.28349  113.340 0.82998
35*   0   52   0  47.06216  0.070765  0.79983  350.426 0.74298
36*   0   53   0  46.99670  0.065457  0.93713  247.113 1.02156
37*   0   54   0  46.85760  0.139096  0.85031  77.0370 1.30858
38*   0   55   0  46.78995  0.067658  2.01916  35.8278 0.36834
39*   0   57   0  46.58012  0.209826  0.79407  396.270 0.55721
40*   0   58   0  46.47698  0.103144  1.25452  108.240 0.89833
41*   0   60   0  46.40658  0.070401  0.80087  327.132 0.76599
42*   0   61   0  46.30952  0.097055  1.30686  112.345 0.80173
43*   0   63   0  46.23521  0.074312  0.80177  344.860 0.73994
44*   0   64   0  46.16700  0.068206  0.94152  237.836 1.02814
45*   0   65   0  46.02254  0.144466  0.86909  73.9772 1.31505
46*   0   66   0  45.93904  0.083499  2.00620  32.9591 0.44794
47*   0   68   0  45.73194  0.207097  0.79410  383.767 0.56245
48*   0   69   0  45.62603  0.105906  1.24322  108.623 0.91454
49*   0   71   0  45.55353  0.072501  0.80258  302.086 0.77879
50*   0   72   0  45.45633  0.097205  1.33288  112.484 0.76411

[warning] file tnlp21.inp, line 125 near ")": LEVMAR
Optimization cannot be completed.

[warning] file tnlp21.inp, line 125 near ")": LEVMAR needs more
than 50 iterations or 125 function calls.

Criterion          45.45632875   Max Grad Entry   1.332881795
Ridge (lambda)     112.4838500   TR Radius        0.063257361
Act.dF/Pred.dF     0.764107225
N Function Calls            73   N Gradient Calls           2
N Hessian Calls             52   Effective Time            27

The second run (specified sparsity pattern), and the third and fourth run (specified derivatives) all take only 11 iterations and no more than 1 second. Here we show only the history for the second run:

Levenberg-Marquardt Optimization
Scaling Update of More (1978)

Gradient Computed by Finite Differences
CRP Jacobian Computed by Finite Differences


Iteration Start:
N. Variables          50   N. Equations          50
Criterion    302.5000000   Max Grad Entry   107.7999992
TR Radius    13132.80670

Iter rest nfun act optcrit difcrit maxgrad lambda rho

 1   0    4   0  54.54630  247.9537  17.4704  8e-003  0.96389
 2   0    6   0  39.84260  14.70370  3.70145  0.01900 0.98783
 3   0    7   0  33.21206  6.630533  7.08434  8e-003  0.67822
 4   0    8   0  25.97277  7.239293  6.30924  8e-003  0.59283
 5   0    9   0  19.73582  6.236949  7.26173  6e-003  0.48620
 6   0   10   0  14.70955  5.026270  7.88368  5e-003  0.39282
 7   0   11   0  8.555955  6.153597  6.68152  3e-003  0.52445
 8   0   12   0  4.907519  3.648436  8.38572  1e-003  0.46894
 9   0   13   0  2.906522  2.000996  9.30857  2e-004  0.40867
10   0   14   0  9.9e-004  2.905532  0.17806  0.00000 0.99966
11   0   15   0  6.2e-029  9.9e-004  4e-014   0.00000 1.00000

Successful Termination After 11 Iterations
ABSGCONV convergence criterion satisfied.
Criterion        0.000000000   Max Grad Entry   0.000000000
Ridge (lambda)   0.000000000   TR Radius        0.089031236
Act.dF/Pred.dF   1.000000000
N Function Calls          16   N Gradient Calls           2
N Hessian Calls           13   Effective Time             1

Because of the highly sparse Jacobian, only 2 function calls are needed to compute finite difference approximations for the Jacobian matrix (and gradient).
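The following sketch illustrates how such a grouped finite-difference Jacobian can work (an illustrative reconstruction under assumptions, not the internal algorithm of nlp; the step size h and the odd/even column grouping are chosen because no two columns inside a group share a nonzero row):

function fdjac2(x) global(j1,j2) {
   /* sketch of a "grouped" forward-difference Jacobian for fexros2() */
   h  = 1.e-7;                       /* illustrative step size        */
   f0 = fexros2(x);
   /* group 1: all odd-indexed columns, perturbed simultaneously      */
   x1 = x; x1[j1] = x1[j1] + h;
   d1 = (fexros2(x1) - f0) / h;
   /* group 2: all even-indexed columns, perturbed simultaneously     */
   x2 = x; x2[j2] = x2[j2] + h;
   d2 = (fexros2(x2) - f0) / h;
   jac = cons(50,50,0.);
   for (i = 1; i <= 50; i += 2) {
      jac[i,i]   = d1[i];            /* dF_i/dx_i                     */
      jac[i+1,i] = d1[i+1];          /* dF_{i+1}/dx_i                 */
      jac[i,i+1] = d2[i];            /* dF_i/dx_{i+1}                 */
   }
   return(jac);
}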


The next three runs execute the slower scalar notations of function and derivative modules:

< xr, rp> = nlp(fexros4,x0,mopt4);< xr, rp> = nlp(fexros4,x0,mopt2,.,jexros2);< xr, rp> = nlp(fexros4,x0,mopt4,.,jexros2);

The first run takes 1 second, the second run takes 3, and the third run takes 4 seconds. We only show the history of the third run:

Levenberg-Marquardt Optimization
Scaling Update of More (1978)

User Specified Gradient
User Specified Jacobian (sparse)

Iteration Start:
N. Variables          50   N. Equations          50
Criterion    302.5000000   Max Grad Entry   107.8000000
TR Radius    13132.80690

Iter rest nfun act optcrit difcrit maxgrad lambda rho

 1   0    4   0  54.54630  247.9537  17.4704  8e-003  0.96389
 2   0    6   0  39.84260  14.70370  3.70145  0.01900 0.98783
 3   0    7   0  33.21206  6.630533  7.08434  8e-003  0.67822
 4   0    8   0  25.97277  7.239293  6.30924  8e-003  0.59283
 5   0    9   0  19.73582  6.236948  7.26173  6e-003  0.48620
 6   0   10   0  14.70955  5.026270  7.88368  5e-003  0.39282
 7   0   11   0  8.555956  6.153598  6.68152  3e-003  0.52445
 8   0   12   0  4.907519  3.648436  8.38572  1e-003  0.46894
 9   0   13   0  2.906523  2.000996  9.30857  2e-004  0.40867
10   0   14   0  9.9e-004  2.905532  0.17806  0.00000 0.99966
11   0   15   0  6.2e-029  9.9e-004  4e-014   0.00000 1.00000

Successful Termination After 11 Iterations
ABSGCONV convergence criterion satisfied.
Criterion        0.000000000   Max Grad Entry   0.000000000
Ridge (lambda)   0.000000000   TR Radius        0.089031073
Act.dF/Pred.dF   1.000000000
N Function Calls          16   N Gradient Calls           2
N Jacobian Calls           2   N Hessian Calls           13
Effective Time             4

1.5.3 ML Estimation: Examples by Clarke (1987)

Mitcherlitz Equation on Pattinson Data

The model in Clarke (1987, p.224) is

   f(xi, θ) = θ3 + θ2 exp(θ1 xi)


The following are the Pattinson (1981) data used in Clarke (1987):

print "G.P.Y. Clarke (1987): Mitcherlitz Equation";

mitch = [ 1 3.183, 2 3.059, 3 2.871, 4 2.622,

5 2.541, 6 2.184, 7 2.110, 8 2.075,

9 2.018, 10 1.903, 11 1.770, 12 1.762,

13 1.550 ];

cnam = [" x y "];

mitch = cname(mitch,cnam);

We specify the negative log-likelihood function:

   l(θ) = log(σ) + (1/2) ((x − µ(θ)) / σ)^2,   with   µ(θ) = θ3 + θ2 exp(θ1 xi)

The nlp specification follows:

function fmitchml(x) global(mitch) {

mu = x[4] + x[3] * exp(x[2] * mitch[,1]);

loglik = log(x[1]) + .5*((mitch[,2] - mu)/x[1]).**2;

f = loglik[+];

return(f);

}

bc = cons(4,2,.); bc[1,1] = 1.e-12;

x0 = [ .1 -.1 1. 1. ];

pnam = [" sigma th1 th2 th3 "];

x0 = cname(x0,pnam);

print "Trust-Region Algorithm: ML";

mopt = [ "tech" "trureg" ,

"print" 3 ];

< xr,rp,der1,der2 > = nlp(fmitchml,x0,mopt,bc);

print "Xopt=",xr; print "RP=",rp;

print "DER1=", der1; print "DER2=", der2;

The iteration history follows:

Trust Region Optimization

Without Parameter Scaling

Gradient Computed by Finite Differences

Hessian Computed by Finite Differences (dense)

(Using Only Function Calls)

Iteration Start:

N. Variables 4

Criterion 397.0676620 Max Grad Entry 8410.023279

TR Radius 1.000000000

Iter rest nfun act optcrit difcrit maxgrad lambda radius

1* 0 3 0 142.6687 254.3989 3019.21 25105.7 0.10003


2* 0 4 0 60.46836 82.20037 1183.90 1243.25 0.10512

3* 0 5 0 19.98902 40.47934 452.177 656.187 0.11171

4* 0 6 0 0.827850 19.16117 164.037 376.798 0.12132

5* 0 7 0 -7.978590 8.806441 52.1365 243.386 0.12797

6* 0 11 0 -17.97489 9.996298 66.2129 23.9794 1.52490

7* 0 12 0 -20.18709 2.212202 378.072 67.2497 0.38122

8* 0 14 0 -24.45118 4.264089 133.827 598.295 0.03849

9 0 15 0 -25.89750 1.446317 37.2613 314.506 0.04031

10 0 19 0 -27.32614 1.428646 65.0396 8.13024 0.31280

11 0 20 0 -28.27502 0.948878 47.5394 33.5743 0.12968

12 0 21 0 -28.80040 0.525380 253.589 0.00000 0.28200

13 0 22 0 -29.14097 0.340567 8.83469 0.00000 0.19969

14 0 23 0 -29.19834 0.057372 33.5668 3.27322 0.07217

15 0 24 0 -29.20972 0.011385 5.04417 0.00000 0.07252

16 0 25 0 -29.21029 5.7e-004 0.56264 0.00000 0.03503

17 0 26 0 -29.21029 2.4e-006 2e-003 0.00000 0.01066

18 0 27 0 -29.21029 3.0e-010 3e-006 0.00000 6e-004

Successful Termination After 18 Iterations

GCONV convergence criterion satisfied.

Criterion -29.21029458 Max Grad Entry 2.9127e-006

Ridge (lambda) 0.000000000 TR Radius 0.000624645

Act.dF/Pred.dF 4.302549508

N. Function Calls 28 N. Gradient Calls 2

N. Hessian Calls 20 Preproces. Time 0

Time for Method 0 Effective Time 0

The tables of parameter estimates and the Hessian matrix follow:

********************

Optimization Results

********************

Parameter Estimates

-------------------

Parameter Estimate Gradient

1 sigma 0.06412340 2.91e-006

2 th1 -0.10305545 -2.16e-006

3 th2 2.51899317 -6.78e-008

4 th3 0.96312800 -1.21e-007

Value of Objective Function = -29.2103

Hessian Matrix

**************

Symmetric Matrix: Dense Storage

S | 1 2 3 4

------------------------------------------------

1 | 6323.2499 -19.282826 -0.3279992 -0.5845209

2 | -19.282826 185589.64 10998.017 23322.116


3 | -0.3279992 10998.017 989.63839 1653.5970

4 | -0.5845209 23322.116 1653.5970 3161.6238

The following code computes approximate standard errors and the approximate correlation matrix of the parameter estimates:

mle = xr; sopt = [ "grad" "hess" ];

< gopt,hopt > = fider(fmitchml,mle,sopt);

print "gopt=", gopt;

print "hopt=", hopt;

v = inv(hopt);

ase = sqrt(dia2vec(v)); tab = mle‘ -> ase;

tab = rname(tab,pnam); tab = cname(tab,[" Estimate STDERR "]);

d = diag(1. / ase);

corr = d * v * d;

corr = cname(corr,pnam); corr = rname(corr,pnam);

print "Estimates:",mle;

print "Standard Errors:",tab;

print "Correlations:",corr;

Estimates:

| sigma th1 th2 th3

--------------------------------------------

1 | 0.06412 -0.10306 2.5190 0.96313

Standard Errors:

| Estimate STDERR

-------------------------------

sigma | 0.06412 0.01258

th1 | -0.10306 0.02230

th2 | 2.5190 0.23240

th3 | 0.96313 0.28116

Correlations:

SYM | sigma th1 th2 th3

-------------------------------------------------

sigma | 1.00000

th1 | 0.00426 1.00000

th2 | 0.00394 0.92282 1.00000

th3 | -0.00419 -0.98400 -0.97222 1.0000

Richards Function on Nelder Data

The Richards function is given in Clarke (1987, p.226)

   f(xi, θ) = θ4 − θ3 log( 1 + exp(−θ2/θ3 − (θ1/θ3) xi) )

The following data are from Nelder (1961) but are given in Clarke (1987) too:


print "G.P.Y. Clarke (1987): Richards Function";

nelder = [ -2.15 2.5451, -1.50 3.6652, -.85 4.5110,

-.08 5.6556, .52 6.3982, 1.10 7.0397,

2.28 7.8241, 3.23 8.2558, 4.00 8.4784,

4.65 8.4121, 5.00 8.4784 ];

pnam = [" x y "];

nelder = cname(nelder,pnam);

The nlp specification for the negative log-likelihood function follows:

print "Maximum Likelihood Estimation";

function frichml(x) global(nelder) {

w = 1. + exp(-x[3]/x[4] - x[2]*nelder[,1] / x[4]);

mu = x[5] - x[4] * log(w);

v = log(x[1]) + .5 *((nelder[,2] - mu) / x[1]) .** 2;

f = v[+];

return(f);

}

bc = cons(5,2,.);

bc[1,1] = 1.e-12; bc[4,1] = 1.e-12;

x0 = [ .1 1. -1. 1. 1. ];

pnam = [" sigma th1 th2 th3 th4 "];

x0 = cname(x0,pnam);

print "Trust-Region Algorithm: ML";

mopt = [ "tech" "trureg" ,

"print" 3 ];

< xr,rp,der1,der2 > = nlp(frichml,x0,mopt,bc);

print "Xopt=",xr; print "RP=",rp;

print "DER1=", der1; print "DER2=", der2;

The iteration history follows:

Trust Region Optimization

Without Parameter Scaling

Gradient Computed by Finite Differences

Hessian Computed by Finite Differences (dense)

(Using Only Function Calls)

Iteration Start:

N. Variables 5

N. Bound. Constr. 2 N. Mask Constr. 0

Criterion 23675.63075 Max Grad Entry 473909.0671

N. Active Constraints 0 TR Radius 1.000000000

Iter rest nfun act optcrit difcrit maxgrad lambda radius

1* 0 2 0 11002.90 12672.73 180742 2968.97 1.00000

2* 0 3 0 4937.381 6065.517 67663.4 1584.82 1.00012

3* 0 4 0 2111.973 2825.407 24703.2 821.053 1.00872


4* 0 5 0 842.0891 1269.884 8691.91 409.876 1.02435

5* 0 6 0 298.4665 543.6226 2874.55 199.587 1.05268

6* 0 7 0 83.00837 215.4581 854.195 94.0314 1.09124

7* 0 8 0 7.357272 75.65110 199.751 37.1669 1.13019

8* 0 11 0 -19.08662 26.44389 60.7628 17.6754 3.63103

9* 0 13 0 -20.60162 1.515000 407.233 697.064 0.12355

10* 0 14 0 -26.07121 5.469591 173.536 147.840 0.12550

11 0 15 0 -27.56660 1.495383 49.1202 29.2072 0.12629

12 0 16 0 -28.08124 0.514645 11.8963 12.5331 0.12632

13 0 17 0 -28.23798 0.156734 5.25356 0.00000 0.12641

14 0 18 0 -28.24741 9.4e-003 0.38698 0.00000 0.11886

15 0 19 0 -28.24748 7.1e-005 2e-003 0.00000 0.02629

16 0 20 0 -28.24748 4.9e-009 3e-005 0.00000 3e-003

Successful Termination After 16 Iterations

GCONV convergence criterion satisfied.

Criterion -28.24747843 Max Grad Entry 2.8933e-005

N. Active Constraints 0 Ridge (lambda) 0.000000000

TR Radius 0.002808353 Act.dF/Pred.dF 1.528095205

N. Function Calls 21 N. Gradient Calls 2

N. Hessian Calls 18 Preproces. Time 1

Time for Method 0 Effective Time 1

The tables of parameter estimates and the Hessian matrix follow:

********************

Optimization Results

********************

Parameter Estimates

-------------------

Parameter Estimate Gradient Active BC

1 sigma 0.04651629 2.89e-005

2 th1 1.64098753 2.71e-007

3 th2 -2.39220641 4.92e-007

4 th3 1.76818791 8.61e-008

5 th4 8.55448140 -4.49e-007

Value of Objective Function = -28.2475

Hessian Matrix

**************

Symmetric Matrix: Dense Storage

S | 1 2 3 4 5

-----------------------------------------------------------

1 | 10167.465 -1.3368009 -0.8633067 -0.3318032 -6.3062497

2 | -1.3368009 3898.8341 -1418.6074 -279.59754 -602.24648

3 | -0.8633067 -1418.6074 1960.1710 -1011.8422 2562.8372

4 | -0.3318032 -279.59754 -1011.8422 923.41257 -1942.4388

5 | -6.3062497 -602.24648 2562.8372 -1942.4388 5083.7307


The following code computes approximate standard errors and the approximate correlation matrix of the parameter estimates:

mle = xr; sopt = [ "grad" "hess" ];

< gopt,hopt > = fider(frichml,mle,sopt);

print "gopt=", gopt;

print "hopt=", hopt;

v = inv(hopt);

ase = sqrt(dia2vec(v)); tab = mle‘ -> ase;

tab = rname(tab,pnam); tab = cname(tab,[" Estimate STDERR "]);

d = diag(1. / ase);

corr = d * v * d;

corr = cname(corr,pnam); corr = rname(corr,pnam);

print "Estimates:",mle;

print "Standard Errors:",tab;

print "Correlations:",corr;

The tables of approximate standard errors and approximate correlation matrix of parameter estimates follow:

Estimates:

| sigma th1 th2 th3 th4

------------------------------------------------------

1 | 0.04652 1.6410 -2.3922 1.7682 8.5545

Standard Errors:

| Estimate STDERR

-------------------------------

sigma | 0.04652 0.00992

th1 | 1.6410 0.07156

th2 | -2.3922 0.13604

th3 | 1.7682 0.23678

th4 | 8.5545 0.04590

Correlations:

SYM | sigma th1 th2 th3 th4

-----------------------------------------------------------

sigma | 1.00000

th1 | 0.00459 1.00000

th2 | 0.00420 0.95834 1.00000

th3 | 0.00497 0.94905 0.91795 1.00000

th4 | 0.00464 0.62332 0.49209 0.77466 1.00000

1.5.4 ML Estimation: Error Rates of Two Medical Diagnostic Tests

This example shows how to use the nlp function to evaluate the accuracy of two diagnostic tests. The accuracy of any test is determined by its sensitivity (the ability of the test to correctly diagnose the presence of disease) and specificity (the ability of the test to correctly diagnose the absence of disease). The false positive rate α is defined as the proportion of non-diseased people who have positive outcomes of the test. Likewise, the false negative rate β is defined as the proportion of diseased people who have negative outcomes of the test. Therefore, sensitivity is 1 − β and specificity is 1 − α.

Hui and Walter (1980) showed that, if data are available from two populations with different disease prevalences, you can use maximum likelihood to estimate the error rates of both tests and also the prevalences in both populations. In this example, the Tine test (a new test) and the Mantoux test (the standard test) are compared for detection of tuberculosis in low and high prevalence populations. Both skin tests are applied simultaneously to the arm of each person, with the evaluation made after 48 hours. The results were:

                        Population 1                 Population 2
                        Tine test                    Tine test
Mantoux test      Positive  Negative  Total    Positive  Negative  Total
Positive                14         4     18         887        31    918
Negative                 9       528    537          37       367    404
Total                   23       532    555         924       398   1322

Let

• Ni be the sample size from the ith population, i = 1, 2

• αj be the false positive rate of the jth test, j = 1, 2

• βj be the false negative rate of the jth test, j = 1, 2

• θi be the prevalence rate for the ith population, i = 1, 2, with θ1 ≠ θ2

Under the assumption of conditional independence between the tests, the likelihood function is the product of two independent multinomials and is given by

   L = ∏_{i=1,2}  { θi (1 − β1)(1 − β2) + (1 − θi) α1 α2 }^(Ni Pi,1,1)
              ×  { θi (1 − β1) β2 + (1 − θi) α1 (1 − α2) }^(Ni Pi,1,2)
              ×  { θi β1 (1 − β2) + (1 − θi) (1 − α1) α2 }^(Ni Pi,2,1)
              ×  { θi β1 β2 + (1 − θi) (1 − α1)(1 − α2) }^(Ni Pi,2,2)

where Pi,j,j′ is the observed proportion of the sample i with test outcomes j and j′ in tests 1 and 2, respectively:

• j = 1(2) if test 1 is positive (negative) and

• j′ = 1(2) if test 2 is positive (negative).

The likelihood can be maximized by numerical optimization methods with the solution restricted to the unit hypercube. In practice, it is common to approach this problem by minimizing the negative log of the likelihood. This is the approach taken here.

The negative log likelihood function can be minimized using nlp. For this example, the trust region technique is used with nlp to calculate the maximum likelihood parameter estimates, and fider is used to compute the second-order derivative.


function fhuiwal(z) global(data) {

/* pieces of the log likelihood */

p = cons(8,1);

p[1] = log(z[5]*(1-z[2])*(1-z[4]) + (1-z[5])*z[1]*z[3]);

p[2] = log(z[5]*(1-z[2])*z[4] + (1-z[5])*z[1]*(1-z[3]));

p[3] = log(z[5]*z[2]*(1-z[4]) + (1-z[5])*(1-z[1])*z[3]);

p[4] = log(z[5]*z[2]*z[4] + (1-z[5])*(1-z[1])*(1-z[3]));

p[5] = log(z[6]*(1-z[2])*(1-z[4]) + (1-z[6])*z[1]*z[3]);

p[6] = log(z[6]*(1-z[2])*z[4] + (1-z[6])*z[1]*(1-z[3]));

p[7] = log(z[6]*z[2]*(1-z[4]) + (1-z[6])*(1-z[1])*z[3]);

p[8] = log(z[6]*z[2]*z[4] + (1-z[6])*(1-z[1])*(1-z[3]));

/* F is the negative log likelihood function */

f = -data * p;

return (f);

}

/* Supply the data values for test1 and test2: */

/* first four values correspond to pop1: (+,+), (+,-), (-,+), (-,-) */

/* second four values correspond to pop2: (+,+), (+,-), (-,+), (-,-) */

data = [ 14.0 4.0 9.0 528.0 887.0 31.0 37.0 367.0 ];

/* Constrain the parameters to the unit hypercube */

bc = cons(6,1,.001) -> cons(6,1,.999);

/* Provide a starting point x0. The parameters are: */

x0 = [ .01 .05 .01 .05 .05 .70 ];

pnam = [" alpha1 beta1 alpha2 beta2 theta1 theta2 "];

x0 = cname(x0,pnam);

print "Trust-Region Algorithm: ML";

mopt = [ "tech" "trureg" ,

"print" 3 ];

< xr,rp,der1,der2 > = nlp(fhuiwal,x0,mopt,bc);

print "Xopt=",xr; print "RP=",rp;

print "DER1=", der1; print "DER2=", der2;

The output shows there were no convergence problems and prints the estimates of the error rates, the prevalences, and the estimated variance-covariance matrix. Note that you need to subtract the estimated error rates from 1 to obtain the respective estimates for sensitivity and specificity.

Trust Region Optimization

Without Parameter Scaling

Gradient Computed by Finite Differences

Hessian Computed by Finite Differences (dense)

(Using Only Function Calls)

Iteration Start:

N. Variables 6

N. Bound. Constr. 12 N. Mask Constr. 0

Criterion 1218.723512 Max Grad Entry 355.8748373

N. Active Constraints 0 TR Radius 1.000000000


Iter rest nfun act optcrit difcrit maxgrad lambda radius

1 0 3 0 1208.864 9.859474 259.907 8941.93 0.02968

2 0 4 0 1207.675 1.188848 64.2961 0.00000 0.02783

3 0 5 0 1207.617 0.058087 5.73393 0.00000 0.01422

4 0 6 0 1207.617 3.5e-004 0.05911 0.00000 2e-003

5 0 7 0 1207.617 2.7e-008 2e-005 0.00000 1e-004

Successful Termination After 5 Iterations

GCONV convergence criterion satisfied.

Criterion 1207.616749 Max Grad Entry 1.5158e-005

N. Active Constraints 0 Ridge (lambda) 0.000000000

TR Radius 0.000139531 Act.dF/Pred.dF 1.016132951

N. Function Calls 8 N. Gradient Calls 2

N. Hessian Calls 7 Preproces. Time 0

Time for Method 0 Effective Time 0

********************

Optimization Results

********************

Parameter Estimates

-------------------

Parameter Estimate Gradient Active BC

1 alpha1 0.00668137 1.52e-005

2 beta1 0.03387585 0.0000000

3 alpha2 0.01586428 0.0000000

4 beta2 0.03117227 -1.48e-005

5 theta1 0.02683959 0.0000000

6 theta2 0.71679214 -8.89e-006

Value of Objective Function = 1207.62

Hessian Matrix

**************

Symmetric Matrix: Dense Storage

S | 1 2 3 4 5

-----------------------------------------------------------

1 | 75931.582 -618.91465 -164.14334 12747.427 2588.6667

2 | -618.91465 23798.136 10051.775 -706.38428 -522.24166

3 | -164.14334 10051.775 36645.312 -478.88769 1236.7929

4 | 12747.427 -706.38428 -478.88769 28078.882 -487.97031

5 | 2588.6667 -522.24166 1236.7929 -487.97031 19880.571

6 | 1670.5061 -854.37704 1535.7543 -490.36164 0.0000000

S | 6

---------------

1 | 1670.5061

2 | -854.37704


3 | 1535.7543

4 | -490.36164

5 | 0.0000000

6 | 6311.7641

The following is the CMAT code for computing asymptotic standard errors and the correlation matrix of parameterestimates under normality assumptions:

mle = xr; sopt = [ "grad" "hess" ];

< gopt,hopt > = fider(fhuiwal,mle,sopt);

print "gopt=", gopt;

print "hopt=", hopt;

v = inv(hopt);

ase = sqrt(dia2vec(v)); tab = mle‘ -> ase;

tab = rname(tab,pnam); tab = cname(tab,[" Estimate STDERR "]);

d = diag(1. / ase);

corr = d * v * d;

corr = cname(corr,pnam); corr = rname(corr,pnam);

print "Estimates:",mle;

print "Standard Errors:",tab;

print "Correlations:",corr;

Estimates:

| alpha1 beta1 alpha2 beta2 theta1 theta2

----------------------------------------------------------------

1 | 0.00668 0.03388 0.01586 0.03117 0.02684 0.71679

Standard Errors:

| Estimate STDERR

--------------------------------

alpha1 | 0.00668 0.00380

beta1 | 0.03388 0.00695

alpha2 | 0.01586 0.00562

beta2 | 0.03117 0.00623

theta1 | 0.02684 0.00713

theta2 | 0.71679 0.01279

Correlations:

SYM | alpha1 beta1 alpha2

----------------------------------------

alpha1 | 1.00000

beta1 | -0.00535 1.00000

alpha2 | 0.01273 -0.35154 1.00000

beta2 | -0.28189 0.02904 -0.00399

theta1 | -0.07626 0.04382 -0.05897

theta2 | -0.09107 0.11255 -0.13378

SYM | beta2 theta1 theta2

----------------------------------------

beta2 | 1.0000


theta1 | 0.04196 1.00000

theta2 | 0.06255 0.01710 1.0000

Using the estimated variance-covariance matrix to obtain the standard deviations for the estimates, the maximum likelihood estimates (± one standard error) are:

• α1 = .0067± .0038

• β1 = .0339± .0069

• α2 = .0159± .0056

• β2 = .0312± .0062

• θ1 = .0268± .0071

• θ2 = .7168± .0128

These results match those given by Hui and Walter.
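As a small follow-up (a sketch using the estimate vector xr computed above), the sensitivities 1 − βj and specificities 1 − αj of the two tests can be obtained directly:

jb = [ 2 4 ];  ja = [ 1 3 ];
sens = 1. - xr[jb];    /* sensitivities 1 - beta1, 1 - beta2   */
spec = 1. - xr[ja];    /* specificities 1 - alpha1, 1 - alpha2 */
print "Sensitivity (test 1, test 2):", sens;
print "Specificity (test 1, test 2):", spec;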

1.5.5 Diffusion of Tetracycline Hydrochloride

The CMAT code for this example can be found in files cmat/tnlp/tnlp160.inp and cmat/tnlp/tnlp164.inp.

Description of the Problem

This example shows how to use the nlp and ode functions to estimate the parameters of a two-compartment model for the diffusion of the drug Tetracycline Hydrochloride. In a system of compartments, the rates of flow of the drug between the compartments follow first-order differential equations.

A tetracycline compound was administered to a subject orally, and measurements of the concentration of tetracycline hydrochloride in the blood serum were taken over the next 16 hours. The data consist of two variables: time in hours since the administration of the drug and concentration in micrograms per milliliter.

The input is a bolus in the gut, and the drug is then absorbed into the blood. Let γ1(t) and γ2(t) be the concentrations at time t in the gut and in the serum, respectively. Let θ1 and θ2 be the transfer parameters. The model is depicted as follows:

[Figure: Two-Compartment Model for Tetracycline Diffusion. The chemical is introduced into the gut compartment (source) with concentration γ1(t), transferred to the blood compartment (sink) with concentration γ2(t) at rate θ1, and eliminated from the blood at rate θ2.]


The rates of flow of the drug are described by the following pair of ordinary differential equations:

   ∂γ1(t)/∂t = −θ1 γ1(t)
   ∂γ2(t)/∂t = θ1 γ1(t) − θ2 γ2(t)

The initial concentration of the drug in the gut is unknown, but the concentration in the serum is assumed to be zero. Let θ3 be the unknown initial concentration in the gut. The initial conditions are

γ1(0) = θ3 and γ2(0) = 0

Suppose yi is the observed serum concentration at time ti. The parameters θ1, θ2, and θ3 are estimated by minimizing the sum of squares of the differences between the observed and predicted serum concentrations,

   Σi { yi − γ2(ti) }^2

Since CMAT is a matrix language, it is more convenient to express the system of differential equations in matrix notation. Let γ(t) be the column vector with components γ1(t) and γ2(t); then

   ∂γ(t)/∂t = A γ(t)

where the transfer matrix A and the initial condition are

   A = [ −θ1    0  ,
          θ1  −θ2 ]     and     γ(0) = [ θ3 , 0 ]

For fixed θ1, θ2, and θ3, the given system of differential equations can be solved by specifying

par = [ "print" 2 ,

"reltol" 1.e-9 ,

"met" "rk56" ];

< res, tout > = ode(der,t,c,par,.,.,jac);

The input arguments of the ODE subroutine are as follows:

der name of the function returning Aγ(t)

t vector of time points specifying the limits of integration over connected subintervals. The first component must contain the initial time of the integration.

c initial values γ(0)

par two-column matrix specifying options

jac name of the function for the Jacobian of the function "der"

Among others we can specify the following options:


"print" 2        controls printed output (2 = all output)
"reltol" 1.e-8   relative accuracy criterion for termination
"met" "adam"     specifies Adams method
"met" "gear"     specifies Gear method
"met" "rk56"     specifies Verner's (5,6) order Runge-Kutta method

The output arguments of the ODE subroutine are as follows:

res name of the vector that contains the solutions

tout vector of time points which the solution res approximates closer than the input vector t.

See the Reference Manual of CMAT for more information.

Solving the Basic Problem

Using the data in Bates & Watts (1988, p.281), the parameters of the compartment models are estimated by the specifications given in the following steps:

1. Create the input data matrix.

print "LS Estimation with ODE: Bates and Watts (1988):";print "Example: Tetracycline : Data by Wagner (1967)";options ps=60 ls=68;wagner = [ 1 0.7, 2 1.2, 3 1.4, 4 1.4, 6 1.1,

8 0.8, 10 0.6, 12 0.5, 16 0.3 ];wagner = cname(wagner, [ "time" "conc" ]);m = nrow(wagner);

2. Function and Jacobian Modules needed by ODE call:

The derode function evaluates the system of ODEs, where y corresponds to γ(t) (one row per equation):

function derode(t,y) global(A) {
   z = A * y;
   return(z);
}

The jacode function evaluates the Jacobian of derode():

function jacode(t,y) global(A) {
   return(A);
}


3. Define the objective function F0 for the LS estimation. The serum concentrations are computed in the function F0 by calling the ODE subroutine.

function f0(theta,time,ipri) global(A) {
   /* transfer matrix A needed by derode() and jacode() */
   th1 = theta[1]; t1m = -th1; t2m = -theta[2]; th3 = theta[3];
   A = [ t1m 0. ,
         th1 t2m ];
   /* vector of initial values y(0) = gamma(0) */
   c = [ th3 , 0];
   /* vector of increasing time points
      used for the integration limits */
   t = [0 , time ];
   par = [ "print"  ipri ,
           "task"   "exa" ,
           "minval" 1. ,
           "reltol" 1.e-4 ,
           "met"    "gear" ];
   res = ode(derode,t,c,par,.,.,jacode);
   /* res = ode(derode,t,c,par); */
   return(res[2,]‘);
}

4. Make some test runs of F0:

print "Test function F0 call";time = wagner[,1];t = [0., time]; th0 = [ .1 .3 10 ];A = [ -.1 0. , .1 -.3 ];fun = f0(th0,wagner[,1],2);print "Fun=",fun;

Solve Initial Value Problem dY_i/dT = F(Y_i,T)
Use Adams Method for Non-Stiff Systems
Start Iteration at T=0 Rel. Precision=0.0001

Iter Nstep Nfunc Njac CurrentTime AvStpSiz LStpSiz AvOrd LOrd
   1     9    18    0  1.00000000 0.111111 0.255994  1.67    3
   2    13    23    0  2.00000000 0.153846 0.232017  2.08    3
   3    18    28    0  3.00000000 0.166667 0.071933  2.39    4
   4    24    34    0  4.00000000 0.166667 0.640334  2.79    4
   5    28    39    0  6.00000000 0.214286 0.078999  2.96    4
   6    36    48    0  8.00000000 0.222222 0.110070  3.11    3
   7    42    55    0  10.0000000 0.238095 0.717648  3.10    3
   8    45    60    0  12.0000000 0.266667 0.564703  3.09    3
   9    51    67    0  16.0000000 0.313725 0.507446  3.08    3
Sucessful Termination at T=16 Rel. Precision=0.0001


Fun=
    |        1
--------------
  1 |  0.00000
  2 |  0.81998
  3 |   1.3495
  4 |   1.6712
  5 |   1.8456
  6 |   1.9176
  7 |   1.7930
  8 |   1.5903
  9 |   1.3693
 10 |  0.96828

5. LS Objective Function needed by the LEVMAR call:

Function F1 is a slightly modified version of F0 used to specify the least squares fit problem and to compute the difference between the observed and predicted serum concentrations.

function f1(theta) global(wagner,A,ipri,eps,met) {
   /* note: we fit the concentration in the serum,
      i.e. we compare the gamma_2 values in res[2,]
      with the concentration measurements in wagner[,2] */
   /* transfer matrix A needed by derode() and jacode() */
   th1 = theta[1]; t1m = -th1; t2m = -theta[2]; th3 = theta[3];
   A = [ t1m 0. ,
         th1 t2m ];
   /* vector of initial values y(0) = gamma(0) */
   c = [ th3 , 0];
   /* vector of increasing time points
      used for the integration limits */
   time = wagner[,1]; t = [ 0 , time ];
   par = [ "print"  ipri ,
           "reltol" eps ,
           "task"   "int" ,
           "minval" 1. ,
           "met"    met ,
           "nolog"  ];
   res = ode(derode,t,c,par,.,.,jacode);
   /* res = ode(derode,t,c,par); */
   dif = res[2,2:10]‘ - wagner[,2];
   return(dif);
}

print "Test run of function F1(): RG56";ipri = 3; eps = 1.e-8; met = "rk56";th0 = [ .1 .3 10 ];f1ini = f1(th0);print "F1_ini=",f1ini;


Solve Initial Value Problem dY_i/dT = F(Y_i,T)
Use Runge-Kutta (5,6) Verner Method (Jackson)
Start Iteration at T=0 Rel. Precision=1e-008

Iter Nstep Nfunc  Curr.Time  Rel.Error
   0     0     1  0.00000000 1.000e-008
   1     1    33  1.00000000 1.000e-008
   2     1    65  2.00000000 1.000e-008
   3     1    97  3.00000000 1.000e-008
   4     1   129  4.00000000 1.000e-008
   5     1   177  6.00000000 1.000e-008
   6     1   225  8.00000000 1.000e-008
   7     1   273  10.0000000 1.000e-008
   8     1   313  12.0000000 1.000e-008
   9     1   377  16.0000000 1.000e-008

Sucessful Termination at T=16 Rel. Precision=1e-008

F1_ini=
    |        1
--------------
  1 |  0.12010
  2 |  0.14960
  3 |  0.27124
  4 |  0.44563
  5 |  0.81756
  6 |  0.99306
  7 |  0.99046
  8 |  0.86935
  9 |  0.66833

6. Try solving the problem by general minimization:

Instead of using the Levenberg-Marquardt algorithm, you can use any of the other optimization algorithms for solving the least squares problem specified with function F1. However, you may also solve the related general minimization problem using the objective function specified in module F2.

The following function F2 is the same as F1 except that it returns a scalar which is the sum of squares error between fitted and observed data:

function f2(theta) global(wagner,A,ipri,eps,met) {
   /* transfer matrix A needed by derode() and jacode() */
   th1 = theta[1]; t1m = -th1; t2m = -theta[2]; th3 = theta[3];
   A = [ t1m 0. ,
         th1 t2m ];
   /* vector of initial values y(0) = gamma(0) */
   c = [ th3 , 0];
   /* vector of increasing time points
      used for the integration limits */
   time = wagner[,1]; t = [ 0 , time ];
   par = [ "print"  ipri ,
           "reltol" eps ,
           "task"   "int" ,
           "minval" 1. ,
           "met"    met ,
           "nolog"  ];
   res = ode(derode,t,c,par,.,.,jacode);
   /* res = ode(derode,t,c,par); */
   dif = ssq(res[2,2:10]‘ - wagner[,2]);
   return(dif);
}
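For instance, a minimal sketch of such a run (assuming the same starting point th0, bound constraints bc, and globals ipri, eps, and met as prepared for f1 in step 7 below):

mopt = [ "tech"  "trureg" ,
         "print" 3 ];
< xr, rp > = nlp(f2,th0,mopt,bc);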

Note that for the special case of the 2 × 2 transfer matrix A, the system of ordinary differential equations can be solved analytically by computing the eigenvalue decomposition of A, and the nlp function can then be used for the optimization. Refer to Bates & Watts (1988, pp.173-175) or Hartmann (1992) for more details.
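A sketch of this closed-form route (an assumption only in the sense that it bypasses the ode() call; for θ1 ≠ θ2 the analytic serum concentration is γ2(t) = θ3 θ1/(θ2 − θ1) (exp(−θ1 t) − exp(−θ2 t))):

function f2cf(theta) global(wagner) {
   /* closed-form serum concentration of the two-compartment model,
      assuming theta1 != theta2 */
   th1 = theta[1]; th2 = theta[2]; th3 = theta[3];
   t   = wagner[,1];
   g2  = th3 * th1 / (th2 - th1) * (exp(-th1 * t) - exp(-th2 * t));
   return(ssq(g2 - wagner[,2]));
}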

7. Prepare and Perform nlp and ode calls:

With θ1 = .1, θ2 = .3, and θ3 = 10 as starting values, the Levenberg-Marquardt algorithm converges quickly. We specify boundary constraints, requiring that all parameters θ ≥ 0.

print "start values: th0 = [ .1 .3 10 ]";th0 = [ .1 .3 10 ];tim = time("clock");ipri = 0; eps = 1.e-8; met = "adam";bc = cons(3,1,0.) -> cons(3,1,.);mopt = [ "tech" "levmar" ,

"print" 3 ];< xr, rp > = nlp(f1,th0,mopt,bc);tim = time("clock") - tim;print "Time=",tim;

******************
Optimization Start
******************

Parameter Estimates
-------------------

 Parameter      Estimate      Gradient      Lower BC   Upper BC

 1 X_1        0.10000000     38.271683     0.0000000          .
 2 X_2        0.30000000    -24.093935     0.0000000          .
 3 X_3        10.0000000     0.8337407     0.0000000          .

Value of Objective Function = 2.07349


Hessian Matrix
**************

Symmetric Matrix: Dense Storage

   S |          1          2          3
   -------------------------------------
   1 |  873.95662 -235.97285  12.240310
   2 | -235.97285  141.47402 -5.0027356
   3 |  12.240310 -5.0027356  0.2092783

The iteration history is shown as follows:

Levenberg-Marquardt Optimization
Scaling Update of More (1978)

Gradient Computed by Finite Differences
CRP Jacobian Computed by Finite Differences

Iteration Start:
N. Variables              3   N. Equations               9
N. Bound. Constr.         3   N. Mask Constr.            0
Criterion       2.073493759   Max Grad Entry   38.27168224
N. Active Constraints     0   TR Radius        1167.146748

Iter rest nfun act  optcrit  difcrit  maxgrad  lambda     rho
   1    0    2   0  0.284283 1.789211  5.00033 0.00000 0.86844
   2    0    4   0  0.065578 0.218705  4.15760 0.01296 0.87960
   3    0    6   0  0.034129 0.031449  0.75303 0.01725 0.98411
   4    0    9   0  0.025409 8.7e-003  0.72110 4e-003  0.88141
   5    0   11   0  0.021930 3.5e-003  0.16632 9e-003  0.99150
   6    0   14   0  0.019378 2.6e-003  0.33477 3e-003  0.83349
   7    0   16   0  0.018333 1.0e-003  0.31083 8e-004  0.64902
   8    0   17   0  0.017829 5.0e-004  0.03131 0.00000 1.00163
   9    0   18   0  0.017823 5.2e-006  2e-004  0.00000 0.98225
  10    0   19   0  0.017823 3.6e-008  2e-004  0.00000 0.78709
  11    0   20   0  0.017823 2.0e-009  2e-005  0.00000 0.94816
  12    0   23   0  0.017823 7.2e-012  6e-006  5e-003  0.30174

Successful Termination After 12 Iterations
GCONV convergence criterion satisfied.
Criterion       0.017823385   Max Grad Entry   6.4271e-006
N. Active Constraints     0   Ridge (lambda)   0.004617837
TR Radius       4.1687e-005   Act.dF/Pred.dF   0.301741460
N. Function Calls        24   N. Gradient Calls          2
N. Hessian Calls         14   Preproces. Time            1
Time for Method           5   Effective Time             7


The parameter estimates match those of Bates & Watts (1988, p. 175):

********************
Optimization Results
********************

Parameter Estimates
-------------------

 Parameter      Estimate      Gradient   Active BC

 1 X_1        0.18302024     6.43e-006
 2 X_2        0.43449513     1.29e-006
 3 X_3        5.99545670     4.19e-007

Value of Objective Function = 0.0178234

Hessian Matrix
**************

Symmetric Matrix: Dense Storage

   S |          1          2          3
   -------------------------------------
   1 |  95.585789 -25.452984  3.7577047
   2 | -25.452984  28.829443 -2.3563995
   3 |  3.7577047 -2.3563995  0.2327152

8. LS Fit of Tetracycline: Studentized Residuals

A plot of the Tetracycline data and the fitted response versus time (Figure 16) shows a poor fit, since the fitted curve is too squat. A plot of Studentized residuals versus time (Figure 17) also reveals that the model may be inadequate for the data.
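The residual vector fopt and the Jacobian jopt used below are assumed to have been evaluated at the solution xr of the basic fit, for example:

fopt = f1(xr);
jopt = fider(f1,xr,"jaco");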

jtj = jopt‘ * jopt;
hat = jopt * inv(jtj) * jopt‘;
stures = sqrt(ssq(fopt) .* (dia2vec(ide(m) - hat)));
stures = fopt ./ stures;
print "Studentized Residuals", fopt, stures;


    |        1
--------------
  1 |  0.10792
  2 | -0.00399
  3 | -0.06519
  4 | -0.06902
  5 |  0.03335
  6 |  0.07417
  7 |  0.04321
  8 | -0.03844
  9 | -0.07080

    |        1
--------------
  1 |  0.74849
  2 | -0.02681
  3 | -0.40523
  4 | -0.43811
  5 |  0.22077
  6 |  0.45976
  7 |  0.26579
  8 | -0.24993
  9 | -0.48519

Solving the Problem with Dead Time Parameter

1. Introducing an additional parameter for dead time:

If a system does not respond immediately to the input, the addition of a delay or dead time parameter can improve the fit of the model. Let θ4 be the dead time parameter. Consider the modified model:

   ∂γ(τ)/∂τ = A γ(τ),   γ(0) = γ0

with

   τ = t − θ4  for t > θ4,   τ = 0  for t ≤ θ4

2. To incorporate the dead time parameter, the F1 function for the Levenberg-Marquardt algorithm is modified by changing the specification of the t vector to include the dead time parameter (instead of 0) as the lower limit for the numerical integration of the differential equations.

function f10(theta,wagner,ipri) global(A) {

/* 1. prepare input for ODE */

/* transfer matrix A needed by derode() and jacode() */

th1 = theta[1]; t1m = -th1; t2m = -theta[2];

th3 = theta[3]; th4 = theta[4];

A = [ t1m 0. ,

th1 t2m ];

/* vector of initial values y(0) = gamma(0) */

c = [ th3 , 0. ];

/* vector of increasing time points

used for the integration limits */

time = wagner[,1]; t = [ th4 , time ];

par = [ "print" ipri ,

"task" "int" ,

"minval" 1. ,

"reltol" 1.e-4 ,

"met" "gear" ];

res = ode(derode,t,c,par,.,.,jacode);

return(res[2,]‘);

}


function f11(theta) global(wagner,A,ipri,eps,met) {

/* dif = f10(theta,wagner[,1]) - wagner[,2]; */

/* transfer matrix A needed by der() and jac() */

th1 = theta[1]; t1m = -th1; t2m = -theta[2];

th3 = theta[3]; th4 = theta[4];

A = [ t1m 0. ,

th1 t2m ];

/* vector of initial values y(0) = gamma(0) */

c = [ th3 , 0. ];

/* vector of increasing time points

used for the integration limits */

time = wagner[,1]; t = [ th4 , time ];

par = [ "print" ipri ,

"task" "int" ,

"minval" 1. ,

"reltol" eps ,

"met" met ,

"nolog" ];

res = ode(derode,t,c,par,.,.,jacode);

dif = res[2,2:10]‘ - wagner[,2];

return(dif);

}

3. The arguments for the nlp call with the LEVMAR technique have to be modified to reflect the additional parameter θ4. A constraint is imposed on θ4 to restrict it to the range between zero and the first time measurement in the data, which is 1. A starting value of .1 for θ4 is used.

m = nrow(wagner);

th0 = [ .1 .3 10 .1 ];

/* to perform ODE, dead time parameter theta[4] must be

smaller than the first time measurement = .7 */

bc = cons(4,1,0.) -> cons(4,1,.);

bc[4,2] = .7; /* print "BC=",bc; */

ipri = 0; eps = 1.e-8; met = "adam";

mopt = [ "tech" "levmar" ,

"print" 3 ];

< xr, rp > = nlp(f11,th0,mopt,bc);

******************

Optimization Start

******************

Parameter Estimates

-------------------

Parameter Estimate Gradient Lower BC Upper BC

1 X_1 0.10000000 37.448785 0.0000000 .

2 X_2 0.30000000 -24.030149 0.0000000 .

3 X_3 10.0000000 0.8240406 0.0000000 .

4 X_4 0.10000000 0.1701123 0.0000000 0.7000000

Value of Objective Function = 2.08764


Hessian Matrix

**************

Symmetric Matrix: Dense Storage

S | 1 2 3 4

------------------------------------------------

1 | 864.48305 -233.61791 12.116656 -13.568670

2 | -233.61791 139.96633 -4.9481496 -0.4288472

3 | 12.116656 -4.9481496 0.2070553 -0.1126972

4 | -13.568670 -0.4288472 -0.1126972 0.7832142

4. The iteration history does not show any convergence problems in the optimization.

Levenberg-Marquardt Optimization

Scaling Update of More (1978)

Gradient Computed by Finite Differences

CRP Jacobian Computed by Finite Differences

Iteration Start:

N. Variables 4 N. Equations 9

N. Bound. Constr. 5 N. Mask Constr. 0

Criterion 2.087641605 Max Grad Entry 37.44878512

N. Active Constraints 0 TR Radius 1137.182678

Iter rest nfun act optcrit difcrit maxgrad lambda rho

1 0 2 0 0.514668 1.572974 25.1347 0.00000 0.75432

2 0 4 1 0.212573 0.302095 13.2358 0.01630 0.82072

3 0 6 0’ 0.079012 0.133560 3.34982 0.01669 0.97253

4 0 8 0 0.032153 0.046859 2.84165 0.01321 0.74765

5 0 12 0 0.015512 0.016641 1.80673 2e-004 0.61678

6 0 13 0 5.0e-003 0.010481 9e-003 0.00000 0.99904

7 0 14 0 5.0e-003 8.0e-006 2e-003 0.00000 0.74891

8 0 18 0 5.0e-003 2.2e-009 8e-004 0.02396 0.05076

9 0 19 0 5.0e-003 1.2e-008 9e-004 0.05416 1.49553

10 0 20 0 5.0e-003 1.9e-008 1e-003 0.11063 0.27026

11 0 21 0 5.0e-003 2.3e-008 3e-004 0.02050 1.35776

12 0 29 0 5.0e-003 2.5e-012 3e-004 1292.10 5.27292

13 0 30 0 5.0e-003 4.0e-012 2e-004 994.914 2.60020

14 0 31 0 5.0e-003 3.5e-012 2e-004 410.124 1.34910

15 0 64 0 5.0e-003 0.000000 2e-004 3e+017 1.00000

Successful Termination After 15 Iterations

XCONV convergence criterion satisfied.

Criterion 0.005023148 Max Grad Entry 0.000217209

N. Active Constraints 0 Ridge (lambda) 3.3599e+017

TR Radius 1.4733e-017 Act.dF/Pred.dF 1.000000000

N. Function Calls 65 N. Gradient Calls 2

N. Hessian Calls 17 Preproces. Time 0

Time for Method 12 Effective Time 13

********************

Optimization Results

********************


Parameter Estimates

-------------------

Parameter Estimate Gradient Active BC

1 X_1 0.14854360 1.39e-004

2 X_2 0.71824401 2.17e-004

3 X_3 10.1351892 1.20e-005

4 X_4 0.41371618 6.56e-007

Value of Objective Function = 0.00502315

Hessian Matrix

**************

Symmetric Matrix: Dense Storage

S | 1 2 3 4

------------------------------------------------

1 | 172.45466 -29.388540 3.1611804 -5.5203780

2 | -29.388540 12.141842 -0.9393271 -0.2548978

3 | 3.1611804 -0.9393271 0.0816783 -0.0541781

4 | -5.5203780 -0.2548978 -0.0541781 0.9293510

We save some of the optimal results and compute the Jacobian:

thopt = xr;

fopt = f11(thopt); print "Fopt=",fopt;

jopt = fider(f11,thopt,"jaco"); print "Jopt=",jopt;

jtj = jopt‘ * jopt;

5. LS Fit of Tetracycline with Dead Time: Data and Fit with Inference Band

The Tetracycline data and the fitted model with dead time are plotted in Figure 18. The approximate inference bands for the expected response are also given in the same plot.

n = ncol(thopt); m = nrow(wagner);
fres = fopt + wagner[,2]; print "fres=",fres;
sigma = ssq(fopt) / (m-n);
sigm2 = sqrt(sigma); print "m,n,sigm2=",m,n,sigm2;

/* Inference Band */
function f12(theta) global(xtim,A,ipri,eps,met) {
   /* ycon = f10(theta,xtim); */
   /* transfer matrix A needed by der() and jac() */
   th1 = theta[1]; t1m = -th1; t2m = -theta[2];
   th3 = theta[3]; th4 = theta[4];
   A = [ t1m 0. ,
         th1 t2m ];
   /* vector of initial values y(0) = gamma(0) */
   c = [ th3 , 0. ];
   /* vector of increasing time points
      used for the integration limits */
   t = [ th4 , xtim ];
   par = [ "print"  ipri ,
           "reltol" eps ,
           "task"   "int" ,
           "minval" 1. ,
           "met"    met ,
           "nolog"  ];
   ycon = ode(derode,t,c,par,.,.,jacode);
   nc = ncol(ycon);
   return(ycon[2,2:nc]‘);
}

ipri = 0; eps = 1.e-10; met = "rk56";
a = thopt[4] + .1;
xtim = [ a : .2 : 16. ]‘; /* print "XTIM=",xtim; */
yvelo = f12(thopt);
jac = fider(f12,thopt,"jaco");
jtjopt = jopt‘ * jopt;
dhat = sqrt(dia2vec(jac * inv(jtjopt) * jac‘));
/* correct bug: here n=2 */
quan = fmis(.95, . ,n,m-n);
fact = sigm2 * sqrt(2. * quan);
yupp = yvelo + fact * dhat;
ylow = yvelo - fact * dhat;

pmat = xtim -> yvelo -> ylow -> yupp;
cnam = [ "x" "y" "y_low" "y_upp" ];
pmat = cname(pmat,cnam);
print "Inference Band=", pmat;

| x y y_low y_upp--------------------------------------------1 | 0.51372 0.14419 -0.23597 0.524342 | 0.71372 0.39707 0.14546 0.648683 | 0.91372 0.60815 0.44124 0.775064 | 1.1137 0.78326 0.66133 0.905185 | 1.3137 0.92742 0.81920 1.03566 | 1.5137 1.0450 0.93544 1.15467 | 1.7137 1.1398 1.0265 1.25318 | 1.9137 1.2150 1.1005 1.32959 | 2.1137 1.2735 1.1609 1.3862

10 | 2.3137 1.3177 1.2094 1.426011 | 2.5137 1.3497 1.2471 1.452312 | 2.7137 1.3713 1.2747 1.468013 | 2.9137 1.3841 1.2929 1.475414 | 3.1137 1.3895 1.3023 1.476715 | 3.3137 1.3885 1.3038 1.473316 | 3.5137 1.3823 1.2984 1.466217 | 3.7137 1.3717 1.2872 1.4561


18 | 3.9137 1.3573 1.2713 1.443319 | 4.1137 1.3400 1.2519 1.428020 | 4.3137 1.3201 1.2298 1.410421 | 4.5137 1.2982 1.2059 1.390622 | 4.7137 1.2748 1.1806 1.368923 | 4.9137 1.2501 1.1546 1.345524 | 5.1137 1.2244 1.1281 1.320725 | 5.3137 1.1980 1.1014 1.294626 | 5.5137 1.1711 1.0747 1.267527 | 5.7137 1.1439 1.0482 1.239628 | 5.9137 1.1165 1.0219 1.211229 | 6.1137 1.0892 0.99598 1.182430 | 6.3137 1.0619 0.97043 1.153431 | 6.5137 1.0348 0.94527 1.124432 | 6.7137 1.0080 0.92051 1.095433 | 6.9137 0.98146 0.89614 1.066834 | 7.1137 0.95533 0.87216 1.038535 | 7.3137 0.92961 0.84854 1.010736 | 7.5137 0.90434 0.82527 0.9834237 | 7.7137 0.87956 0.80231 0.9568038 | 7.9137 0.85527 0.77965 0.9308939 | 8.1137 0.83150 0.75727 0.9057340 | 8.3137 0.80826 0.73516 0.88135

41 | 8.5137 0.78554 0.71331 0.8577842 | 8.7137 0.76337 0.69171 0.8350343 | 8.9137 0.74174 0.67039 0.8130944 | 9.1137 0.72064 0.64934 0.7919545 | 9.3137 0.70008 0.62858 0.7715946 | 9.5137 0.68005 0.60814 0.7519747 | 9.7137 0.66055 0.58803 0.7330648 | 9.9137 0.64156 0.56828 0.7148449 | 10.114 0.62308 0.54890 0.6972650 | 10.314 0.60510 0.52995 0.6802551 | 10.514 0.58761 0.51140 0.6638352 | 10.714 0.57061 0.49326 0.6479653 | 10.914 0.55408 0.47557 0.6325854 | 11.114 0.53800 0.45833 0.6176855 | 11.314 0.52238 0.44153 0.6032356 | 11.514 0.50720 0.42523 0.5891757 | 11.714 0.49245 0.40938 0.5755258 | 11.914 0.47812 0.39394 0.5623059 | 12.114 0.46419 0.37910 0.5492960 | 12.314 0.45067 0.36481 0.5365261 | 12.514 0.43753 0.35080 0.5242662 | 12.714 0.42477 0.33713 0.5124163 | 12.914 0.41238 0.32400 0.5007564 | 13.114 0.40034 0.31135 0.4893365 | 13.314 0.38865 0.29915 0.4781566 | 13.514 0.37730 0.28736 0.4672467 | 13.714 0.36628 0.27588 0.45667


68 | 13.914 0.35558 0.26484 0.4463269 | 14.114 0.34519 0.25434 0.4360470 | 14.314 0.33510 0.24419 0.4260071 | 14.514 0.32530 0.23421 0.4163972 | 14.714 0.31579 0.22465 0.4069373 | 14.914 0.30655 0.21580 0.3973174 | 15.114 0.29759 0.20723 0.3879575 | 15.314 0.28889 0.19849 0.3792876 | 15.514 0.28044 0.19023 0.3706477 | 15.714 0.27223 0.18312 0.3613578 | 15.914 0.26427 0.17621 0.35233

6. LS Fit of Tetracycline with Dead Time: Studentized Residuals:

The studentized residuals are plotted in Figure 19. The modified model fits the data better than the model without the dead-time parameter. The distribution of the studentized residuals appears to be more uniform than that of the previous model.

dhat = dia2vec(jopt * inv(jtjopt) * jopt‘);
print "Dhat=", dhat;
stures = sigm2 * sqrt(cons(m,1,1.) - dhat);
stures = fopt ./ stures;
print "Studentized Residuals=", fopt, stures;

   |        1
--------------
 1 | -0.01220
 2 |  0.04215
 3 | -0.01271
 4 | -0.04983
 5 |  0.00474
 6 |  0.04495
 7 |  0.03352
 8 | -0.02794
 9 | -0.03910

   |        1
--------------
 1 | -1.8704
 2 |  1.5331
 3 | -0.36085
 4 | -1.3912
 5 |  0.13928
 6 |  1.1733
 7 |  0.86941
 8 | -0.76904
 9 | -1.0875
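In other words, each raw residual is scaled by $s \sqrt{1 - h_{ii}}$, where the leverages $h_{ii}$ are the diagonal of $J (J^T J)^{-1} J^T$. A minimal Python sketch of this definition (illustration only, not the CMAT code above):

import numpy as np

def studentized_residuals(res, J, s):
    # res: raw residuals, J: Jacobian at the optimum, s: residual standard error
    h = np.einsum('ij,ij->i', J @ np.linalg.inv(J.T @ J), J)   # leverages h_ii
    return res / (s * np.sqrt(1.0 - h))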

Comparing Different Optimization Methods

The following CMAT input

ipri = 0; eps = 1.e-8;
vmet = [ "adam" "gear" "dyna" "rk45" "rk56" ];
bc = cons(3,1,0.) -> cons(3,1,.);
vtech = [ "levmar" "trureg" "nrridg" "dbldog" ];

for (it = 1; it <= 4; it++)
for (im = 1; im <= 5; im++) {
   tech = vtech[it]; met = vmet[im];
   tim = time("clock");
   mopt = [ "tech" tech ,
            "maxit" 1000 ,
            "maxfu" 3000 ,
            "print" 3 ];
   < xr, rp > = nlp(f1,th0,mopt,bc);
   tim = time("clock") - tim;
   print "NLPtech=",tech," ODEmeth=",met," Time=",tim;
}

generates a table showing some computing times when optimizing the least squares objective function F1:

Table 1.1: Least Squares Approach: Time, Iterations, Function Calls

NLP Technique     Adams          Gear           RK(4,5)        RK(5,6)
LEVMAR            5.750,12,24    6.735,13,22    19.360,13,22   18.187,13,22
TRUREG            4.438, 9,11    5.484,10,12    16.344,10,12   15.656,10,12
NRRIDG            5.937,12,18    5.828,10,16    17.313,10,16   15.531,10,16
DBLDOG            9.250,29,36    10.703,30,38   30.906,30,38   29.156,30,38

The following CMAT input

ipri = 0; eps = 1.e-8;
vmet = [ "adam" "gear" "dyna" "rk45" "rk56" ];
bc = cons(3,1,0.) -> cons(3,1,.);
vtech = [ "trureg" "nrridg" "dbldog" ];

for (it = 1; it <= 3; it++)
for (im = 1; im <= 5; im++) {
   tech = vtech[it]; met = vmet[im];
   tim = time("clock");
   mopt = [ "tech" tech ,
            "maxit" 1000 ,
            "maxfu" 3000 ,
            "print" 3 ];
   < xr, rp > = nlp(f2,th0,mopt,bc);
   tim = time("clock") - tim;
   print "NLPtech=",tech," ODEmeth=",met," Time=",tim;
}

generates a table showing some computing times when optimizing the general minimization function F2:

The results for the "dyna" ODE method are basically the same as those for the "adam" method. Obviously the trust region optimization technique using the least squares specification is the most recommendable for this kind of problem. Since the double dogleg algorithm uses only the gradient but not the Jacobian, the computing times for the two runs are very similar.


Table 1.2: Minimization Approach: Time, Iterations, Function Calls

NLP Technique     Adams          Gear           RK(4,5)        RK(5,6)
TRUREG            12.891,11,21   15.656,12,23   46.172,12,23   42.687,12,23
NRRIDG            30.094,21,25   20.672,15,18   61.266,15,18   57.437,15,18
DBLDOG            9.094,29,35    10.562,30,38   30.547,30,38   29.156,30,38

1.5.6 Using QP and NLP for Computing Efficient Frontier

The CMAT code for this example can be found in files cmat/tnlp/tnlp161.inp and cmat/tnlp/tnlp162.inp.

Introduction to the Markowitz Model

The Markowitz model can be described by two equivalent approaches:

• Select a portfolio of stocks that offers the greatest return Rp for a given level of risk σp. This is a linear optimization problem with a quadratic constraint.

• Select a portfolio of stocks with lowest risk σp for a given level of return Rp. This is a quadratic optimization problem with a linear constraint.

This section describes the second approach.

We want to select up to $n$ stocks into a portfolio where each of the stocks is to be chosen with a weight $x_j$, $0 \le x_j \le 1$, $j = 1, \ldots, n$, and $\sum_{j=1}^n x_j = 1$. The basic information needed for our analysis is a data set containing the stock returns over a time period in the past. We describe the data with $r_{ij}$, where $i = 1, \ldots, N$ refers to the time points and $j = 1, \ldots, n$ refers to the stocks. Based on this data set, we can compute

• the stock return vector $r = (r_j)$, that is, the vector of the arithmetic means
$$ r_j = \frac{1}{N} \sum_{i=1}^N r_{ij} $$

• the covariance matrix $S = (\sigma_{jk})$ of the stock return values
$$ \sigma_{jk} = \frac{1}{N-1} \sum_{i=1}^N (r_{ij} - r_j)(r_{ik} - r_k) , \quad j, k = 1, \ldots, n. $$

Two important functions depend on the weights xj of the portfolio stocks:

• the expected portfolio return $R_p(x)$
$$ R_p(x) = \sum_{j=1}^n x_j r_j $$

• the expected portfolio risk $\sigma_p(x)$
$$ \sigma_p(x) = \sqrt{ \sum_{j=1}^n \sum_{k=1}^n \sigma_{jk} x_j x_k } $$

The Basic Minimum Risk Model

Here we only want to compute the optimal weights $x_j$ that minimize the portfolio risk
$$ \min_x \; \sigma_p^2(x) $$
subject to the basic set of constraints
$$ 0 \le x_j \le 1, \; j = 1, \ldots, n \qquad \mbox{lower and upper boundary constraints} $$
$$ \sum_{j=1}^n x_j = 1 \qquad \mbox{linear equality constraint} $$

This is a quadratic optimization problem $\min_x \sigma_p^2(x) = \min_x x^T S x$ with a positive (semi)definite (covariance) matrix $S$ subject to a set of linear constraints, which can be solved with the optimization algorithms available in the qp and nlp functions. We first illustrate the use of the quadratic optimization technique with the qp function. Later we will also use the nlp function for the nonlinear optimization of the tangential portfolio. Note that the objective function for the quadratic optimization techniques is $\frac{1}{2} x^T S x$, which is half of the squared portfolio risk. That means we have to compute the square root of two times the value of the objective function to obtain the portfolio risk $\sigma_p(x)$.
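Readers who want to reproduce the minimum risk solution outside CMAT can use any QP or NLP solver; the following short Python fragment is an illustrative sketch (assuming NumPy/SciPy; it is not part of the CMAT example) that minimizes $\frac{1}{2} x^T S x$ under the sum-to-one and box constraints and recovers the portfolio risk as the square root of twice the objective value.

import numpy as np
from scipy.optimize import minimize

def min_risk_weights(S):
    # minimize (1/2) x'Sx subject to sum(x) = 1 and 0 <= x_j <= 1
    n = S.shape[0]
    x0 = np.full(n, 1.0 / n)                      # feasible start x_j = 1/n
    cons = [{'type': 'eq', 'fun': lambda x: x.sum() - 1.0}]
    res = minimize(lambda x: 0.5 * x @ S @ x, x0, jac=lambda x: S @ x,
                   bounds=[(0.0, 1.0)] * n, constraints=cons, method='SLSQP')
    return res.x, np.sqrt(2.0 * res.fun)          # weights and sigma_p(x)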

The following data set contains the monthly returns of ten selected stocks for the period January 1978 until December 1987.

print "Stock Returns: CRSP Monthly Data, Jan 78 - Dec 87";options NOECHO;%inc "c:\\cmat\\tdata\\stock.dat";options ECHO;

vlab = [ "Gerber Corporation" "Tandy Corporation" "General Mills""Con Edison" "Weyerhauser" "IBM""DEC" "Mobil Corporation" "Texaco""CPL" ];

vnam = [ "Gerber" "TandyC" "GenMills""CEdison" "Weyerh" "IBM""DEC" "Mobil" "Texaco" "CPL" ];

print "VName=",vnam;stock = cname(stock,vnam);nr = nrow(stock); nc = nvar= ncol(stock);print "nr,nc=",nr,nc;

The data contain 120 observations (rows, i.e. ten years with twelve months each) for 10 stocks (columns).

The univar and bivar functions are used to compute the mean and covariance matrix of the data:


print "Compute COV matrix and MEAN vector";sopt = [ "ari" "std" "med" ];univ = univar(stock,sopt);print "Univar=",univ;mean = univ[1,];print "Mean=",mean;cov = bivar(stock,"cov");print "COV=",cov;

The following is the result of the univar call:

Univar=
         |   Gerber   TandyC GenMills  CEdison
--------------------------------------------------------
Ari_Mean |  0.01543  0.02501  0.01523  0.01851
Std_Dev  |  0.08711  0.12757  0.06328  0.05027
Median   |  0.01500  0.02250  0.01150  0.01950

         |   Weyerh      IBM      DEC    Mobil
--------------------------------------------------------
Ari_Mean |  0.00963  0.00962  0.01975  0.01619
Std_Dev  |  0.08507  0.05902  0.09914  0.08031
Median   | -0.00200  0.00200  0.02400  0.01250

         |   Texaco      CPL
----------------------------------
Ari_Mean |  0.01194  0.01270
Std_Dev  |  0.07970  0.05395
Median   |  0.01050  0.01350

For space reasons we do not show the output of the covariance matrix.

The following CMAT code shows how to create a feasible starting point $x_j^0 = 1/n$ and how to specify the lower and upper bounds bc, $0 \le x_j \le 1$, as well as the equality constraint lc, $\sum_{j=1}^n x_j = 1$, for the estimates:

/* Creating initial values and constraints */
/* 1. parms observation with initial values */
r = 1. / nc;
x0 = cons(1,nc,r);
/* 2. lower bounds: weights must be nonnegative */
/* 3. upper bounds: weights must be smaller than 1 */
bc = cons(nc,1,0.) -> cons(nc,1,1.);
/* 4. linear equality constraint: weights must sum up to 1 */
lc = cons(1,nc+2,1.);

Using the null space QP algorithm we run the constrained optimization:


print "Compute Minimum Risk Portfolio";print "Without Additional Constraints on Minimum Return";optn = [ "qpnusp" ,

"print" 3 ];< xr, rp > = qp(cov,.,lc,optn,bc,x0);print "XR=",xr;print "RP=",rp;if (rp[1] < 0) print "Error NLP:", rp[1];

The following output shows the definition of the QP:

******************Optimization Start******************

Parameter Estimates-------------------

Parameter Estimate Gradient Lower BC Upper BC

1 X1 0.10000000 0.0022481 0.0000000 1.00000002 X2 0.10000000 0.0041447 0.0000000 1.00000003 X3 0.10000000 0.0014994 0.0000000 1.00000004 X4 0.10000000 7.78e-004 0.0000000 1.00000005 X5 0.10000000 0.0029964 0.0000000 1.00000006 X6 0.10000000 0.0016441 0.0000000 1.00000007 X7 0.10000000 0.0033524 0.0000000 1.00000008 X8 0.10000000 0.0023685 0.0000000 1.00000009 X9 0.10000000 0.0019405 0.0000000 1.000000010 X10 0.10000000 0.0010951 0.0000000 1.0000000

Value of Objective Function = 0.00110338

Linear Constraints------------------

[ 1]ACT 1.0000000 == + 1.00000 * X1 + 1.00000 * X2+ 1.00000 * X3 + 1.00000 * X4+ 1.00000 * X5 + 1.00000 * X6+ 1.00000 * X7 + 1.00000 * X8+ 1.00000 * X9 + 1.00000 * X10( -1e-016 )

The iteration history does not show any problems:

Null Space Active Set Method of Quadratic Problem


Using Dense Hessian

Iteration Start:N. Variables 10N. Bound. Constr. 20 N. Mask Constr. 0N. Linear Constr. 1 Lin. Equ. Constr. 1Criterion 0.001103375 Max Grad Entry 0.001604353N. Active Constraints 1

Iter rest nfun act optcrit difcrit maxgrad alpha slope1 0 2 2 6.3e-004 4.7e-004 5e-004 0.68599 -1e-0032 0 3 3 6.2e-004 1.7e-005 4e-004 0.19315 -1e-0043 0 4 4 6.0e-004 1.1e-005 2e-004 0.28289 -5e-0054 0 5 5 6.0e-004 1.9e-006 1e-004 0.15203 -1e-0055 0 6 6 6.0e-004 3.9e-006 2e-005 0.79126 -8e-0066 0 7 6 6.0e-004 1.2e-007 2e-019 1.00000 -2e-007

Successful Termination After 6 IterationsABSGCONV convergence criterion satisfied.Criterion 0.000598330 Max Grad Entry 6.5052e-019N. Active Constraints 6 Slope SDirect. -2.3209e-007N. Function Calls 9 N. Gradient Calls 7Preproces. Time 0 Time for Method 0Effective Time 0

We obtain the following optimal stock weights $x_j$, which correspond to the minimum portfolio risk of $\sigma_p = \sqrt{2 \cdot 0.0005983304}$:

********************Optimization Results********************

Parameter Estimates-------------------

Parameter Estimate Gradient Active BC

1 X1 0.00000000 0.0012921 Lower BC2 X2 0.00000000 0.0017323 Lower BC3 X3 0.13962066 0.00119674 X4 0.33107360 0.00119675 X5 0.00000000 0.0014926 Lower BC6 X6 0.23000302 0.00119677 X7 0.00000000 0.0014646 Lower BC8 X8 0.00000000 0.0012159 Lower BC9 X9 0.17172687 0.001196710 X10 0.12757585 0.0011967


Value of Objective Function = 0.00059833

Linear Constraints Evaluated at Solution----------------------------------------

[ 1]ACT 1.00000 * X1 + 1.00000 * X2+ 1.00000 * X3 + 1.00000 * X4+ 1.00000 * X5 + 1.00000 * X6+ 1.00000 * X7 + 1.00000 * X8+ 1.00000 * X9 + 1.00000 * X10 - 1.00000= 2.2204e-016

XR=| 1 2 3 4 5

------------------------------------------------------1 | 0.00000 0.00000 0.13962 0.33107 0.00000

| 6 7 8 9 10------------------------------------------------------1 | 0.23000 0.00000 0.00000 0.17173 0.12758

RP=| 1

---------------------Failure | 3.0000F_Crit | 0.00060N_Iter | 6.0000N_Func | 9.0000N_Grad | 7.0000N_BCact | 5.0000N_LCact | 1.00000unused | .G_Max | 7e-019

O_Time | 0.00000FG_Time | 0.00000unused | 0.00000

To compute the expected portfolio return $R_p(x)$ for the minimum risk solution $x$ we multiply the vector of optimal stock weights $x_j$ with the average stock returns $r_j$
$$ R_p(x) = \sum_{j=1}^n x_j r_j $$

We assume that the average returns of the past are the expected future (monthly) returns. (In reality, this can be questionable and points to the limits of the Markowitz model.)


print "Compute Expected Portfolio Return";_rmax = mean[<>];_retn = retn0 = mean * xr‘;_risk = risk0 = sqrt(2. * rp[2]);xr0 = xr; res0 = [ 0. retn0 retn0 risk0 ];print "Maximum Possible Expected Return=",_rmax;print "Expected Return for Minimum Risk Solution=",_retn;print "Optimal Risk Value for Minimum Risk Solution=",_risk;

For our example we obtain Rp(x) = 0.014137.

Compute Expected Portfolio Return
Maximum Possible Expected Return= 0.02501
Expected Return for Minimum Risk Solution= 0.01414
Optimal Risk Value for Minimum Risk Solution= 0.03459

Since all stock weights $x_j$ are between 0 and 1 and add up to 1, the expected portfolio return $R_p(x)$ for the minimum risk solution must be somewhere between $\min_j r_j$ and $\max_j r_j$
$$ \min_j r_j \le R_p(x) \le \max_j r_j $$

For our example we find $\min_j r_j = 0.00962$ and $\max_j r_j = 0.02501$ in the vector of arithmetic means computed by univar.

Lower Bound on the Expected Portfolio Return

In our first application, we computed optimal weights $x_j$ minimizing the portfolio risk without requiring a minimum acceptable level of expected portfolio return. We can, however, subject the minimization process to an additional linear inequality constraint that requires the expected portfolio return $R_p(x, pr)$ to be larger than or equal to a specified lower bound $pr$
$$ R_p(x, pr) = \sum_{j=1}^n x_j r_j \ge pr $$

If the value of $pr$ is chosen to be larger than $\max_j r_j = 0.02501$, no feasible solution to the problem exists. It also makes no sense to permit the expected portfolio return $R_p(x, pr)$ to be smaller than the one obtained by the minimum risk solution, $R_p(x)$. We therefore select

$$ pr(perc) = R_p(x) + perc \cdot (\max_j r_j - R_p(x)) \quad \mbox{with } 0 < perc < 1 $$

We specify an additional linear (inequality) constraint, which requires a minimum risk solution for a return corresponding to perc = .5. As starting point x0 we use the former optimal point xr, which is of course not feasible w.r.t. the linear inequality constraint; however, the qp call will generate a feasible starting point:

x0 = xr;
rhs = _retn + .5 *(_rmax - _retn);
lc2 = cons(2,nc+2); lc2[1,] = lc;
lc2[2,] = rhs -> mean -> .;
print "LC2=",lc2;

LC2=| 1 2 3 4 5 6

----------------------------------------------------------------1 | 1.00000 1.00000 1.00000 1.00000 1.00000 1.000002 | 0.01957 0.01543 0.02501 0.01523 0.01851 0.00963

| 7 8 9 10 11 12----------------------------------------------------------------1 | 1.00000 1.00000 1.00000 1.00000 1.00000 1.000002 | 0.00962 0.01975 0.01619 0.01194 0.01270 .

optn = [ "qpnusp" ,"print" 3 ];

< xr, rp > = qp(cov,.,lc2,optn,bc,x0);print "XR=",xr;print "RP=",rp;if (rp[1] < 0) print "Error NLP:", rp[1];

For perc = .5 (corresponding to $R_p = 0.01957$, which is half of the interval between $R_p(x) = 0.014137$ and $\max_j r_j = 0.02501$), we obtain a solution with a slightly higher risk value $\sigma_p = \sqrt{2 \cdot 0.001107}$ than in the first application ($\sigma_p = \sqrt{2 \cdot 0.000598}$):

******************Optimization Start******************

Parameter Estimates-------------------

Parameter Estimate Gradient Lower BC Upper BC

1 X1 0.15904963 0.0028156 0.0000000 1.00000002 X2 0.33093189 0.0079084 0.0000000 1.00000003 X3 0.00000000 0.0017789 0.0000000 1.00000004 X4 0.04542447 6.13e-004 0.0000000 1.00000005 X5 0.05517296 0.0040675 0.0000000 1.00000006 X6 0.00000000 0.0021135 0.0000000 1.00000007 X7 0.23662084 0.0054871 0.0000000 1.00000008 X8 0.17280021 0.0031743 0.0000000 1.00000009 X9 0.00000000 0.0023885 0.0000000 1.000000010 X10 0.00000000 9.77e-004 0.0000000 1.0000000

Value of Objective Function = 0.00258204


Linear Constraints------------------

[ 1]ACT 1.0000000 == + 1.00000 * X1 + 1.00000 * X2+ 1.00000 * X3 + 1.00000 * X4+ 1.00000 * X5 + 1.00000 * X6+ 1.00000 * X7 + 1.00000 * X8+ 1.00000 * X9 + 1.00000 * X10( 1e-016 )

[ 2]ACT 0.0195728 <= + 0.01543 * X1 + 0.02501 * X2+ 0.01523 * X3 + 0.01851 * X4+ 1e-002 * X5 + 1e-002 * X6+ 0.01975 * X7 + 0.01619 * X8+ 0.01194 * X9 + 0.01270 * X10( -2e-018 )

Null Space Active Set Method of Quadratic ProblemUsing Dense Hessian

Iteration Start:N. Variables 10N. Bound. Constr. 20 N. Mask Constr. 0N. Linear Constr. 2 Lin. Equ. Constr. 1Criterion 0.002582039 Max Grad Entry 0.003524371N. Active Constraints 5+

Iter rest nfun act optcrit difcrit maxgrad alpha slope1 0 2 4’ 2.6e-003 2.6e-005 3e-003 1.00000 -3e-0052 0 3 5 2.5e-003 7.5e-005 3e-003 0.02271 -3e-0033 0 4 6 1.9e-003 5.9e-004 2e-003 0.20806 -3e-0034 0 5 7 1.4e-003 4.6e-004 2e-003 0.34514 -2e-0035 0 6 8 1.1e-003 3.1e-004 3e-004 0.74544 -7e-0046 0 7 8 1.1e-003 1.8e-005 4e-019 1.00000 -4e-005

Successful Termination After 6 IterationsABSGCONV convergence criterion satisfied.Criterion 0.001106983 Max Grad Entry 1.8431e-018N. Active Constraints 8 Slope SDirect. -3.5179e-005N. Function Calls 9 N. Gradient Calls 7Preproces. Time 0 Time for Method 0Effective Time 0

********************Optimization Results********************

Parameter Estimates


-------------------

Parameter Estimate Gradient Active BC

1 X1 0.00000000 0.0011477 Lower BC2 X2 0.18393296 0.00394293 X3 0.00000000 0.0013548 Lower BC4 X4 0.68348036 0.00187545 X5 0.00000000 0.0018504 Lower BC6 X6 0.00000000 9.02e-004 Lower BC7 X7 0.04948184 0.00227038 X8 0.08310484 0.00113859 X9 0.00000000 5.93e-004 Lower BC10 X10 0.00000000 0.0014942 Lower BC

Value of Objective Function = 0.00110698

Linear Constraints Evaluated at Solution----------------------------------------

[ 1]ACT 1.00000 * X1 + 1.00000 * X2+ 1.00000 * X3 + 1.00000 * X4+ 1.00000 * X5 + 1.00000 * X6+ 1.00000 * X7 + 1.00000 * X8+ 1.00000 * X9 + 1.00000 * X10 - 1.00000= 1.1102e-016

[ 2]ACT 0.01543 * X1 + 0.02501 * X2+ 0.01523 * X3 + 0.01851 * X4+ 1e-002 * X5 + 1e-002 * X6+ 0.01975 * X7 + 0.01619 * X8+ 0.01194 * X9 + 0.01270 * X10 - 0.01957= -3.2526e-018

XR=| 1 2 3 4 5

------------------------------------------------------1 | 0.00000 0.18393 0.00000 0.68348 0.00000

| 6 7 8 9 10------------------------------------------------------1 | 0.00000 0.04948 0.08310 0.00000 0.00000

RP=| 1

---------------------Failure | 3.0000F_Crit | 0.00111


N_Iter | 6.0000N_Func | 9.0000N_Grad | 7.0000N_BCact | 6.0000N_LCact | 2.0000unused | .G_Max | 2e-018

O_Time | 0.00000FG_Time | 0.00000unused | 0.00000

print "Compute Expected Portfolio Return";_retn = mean * xr‘;_risk = sqrt(2. * rp[2]);print "RETN=",_retn," RISK=",_risk;

Compute Expected Portfolio Return
RETN= 0.01957 RISK= 0.04705

The following CMAT module effront computes a so-called efficient frontier, which is a series of minimum risk solutions for a list of different perc values defining the interval of the expected portfolio return $R_p(x, pr)$. The arguments of the module define:

cov covariance matrix of the data,

mean mean vector of the data,

lst list of perc values, which should be between 0 and 1,

tech string scalar specifying the QP method,

prit int value specifying the amount of printed output.

In applications of the module, the values of the number list lst should be sorted in increasing order since the module uses the optimal parameter weights from one qp run as the starting values of the subsequent qp run:

function effront(cov,mean,lst,tech,prit)
{
   /*--- Compute Efficient Frontier ---*/
   nc = ncol(cov); nlst = ncol(lst);

   /* Creating initial values and constraints */
   /* 1. parms observation with initial values */
   r = 1. / nc;
   x0 = cons(1,nc,r);
   /* 2. lower bounds: weights must be nonnegative */
   /* 3. upper bounds: weights must be smaller than 1 */
   bc = cons(nc,1,0.) -> cons(nc,1,1.);
   /* 4. linear equality constraint: weights must sum up to 1 */
   lc = cons(1,nc+2,1.);

   print "Compute Minimum Risk Portfolio";
   print "Without Constraints on Minimum Return";
   optn = [ tech ,
            "print" prit ];
   < xr, rp > = qp(cov,.,lc,optn,bc,x0);
   print "XR=",xr;
   print "RP=",rp;
   if (rp[1] < 0) print "Error NLP:", rp[1];

   /* Use OCOV (MEAN) and EST (PARMS) data sets */
   print "Compute Expected Portfolio Return";
   _imax = mean[<:>]; _rmax = mean[_imax];
   _retn = retn0 = mean * xr‘;
   _risk = risk0 = sqrt(2. * rp[2]);
   rhs0 = _retn + .5 *(_rmax - _retn);
   xr0 = xr; res0 = [ 0. retn0 retn0 risk0 ];
   print "Maximum Possible Expected Return=",_rmax;
   print "Expected Return for Minimum Risk Solution=",_retn;
   print "Optimal Risk Value for Minimum Risk Solution=",_risk;

   /*------------------------------------------------------------*/

   lc2 = cons(2,nc+2); lc2[1,] = lc;
   lc2[2,] = rhs0 -> mean -> .;
   pret = cons(nlst+1,nc+4);
   pret[1,] = res0 -> xr0;
   for (ilst = 1; ilst <= nlst; ilst++) {
      x0 = xr; perc = lst[ilst];
      /* This is feasible starting point:
         x0 = cons(1,nc,0.); x0[_imax] = 1.;
         print "New X0=",x0; */
      rhs = retn0 + perc * (_rmax - retn0);
      lc2[2,1] = rhs;
      /* print "Perc=",perc," RHS=",rhs," Lc2=",lc2; */
      print "Lower Bound=",rhs," for Portfolio Return, Perc.=",perc;
      optn = [ tech ,
               "lce" 1.e-6 ,
               "print" 0 ];
      < xr, rp > = qp(cov,.,lc2,optn,bc,x0);
      /* print "XR=",xr; print "RP=",rp; */
      if (xr == . || rp[1] < 0) {
         print "Error NLP:", rp[1];
         print "Old X0=",x0;
         res = [ perc rhs . . ];
         pret[ilst+1,] = res -> cons(1,nc,.);
         xr = cons(1,nc,0.); xr[_imax] = 1.;
         continue;
      }
      _retn = mean * xr‘;
      _risk = sqrt(2. * rp[2]);
      print "Restricted Minimum Risk Value=", _risk;
      print "Expected Return of Portfolio=", _retn;
      res = [ perc rhs _retn _risk ];
      pret[ilst+1,] = res -> xr;
   }
   return(pret);
}

We now want to plot the portfolio returns versus the risk levels for 21 optimal solutions, starting with the minimum risk solution for perc = 0, followed by 20 optimizations for the different values of required expected portfolio returns corresponding to the lst values perc = 0.05, 0.10, 0.15, ..., 1.0:

lst = [ .05 : .05 : 1. ]; nlst = ncol(lst);
tech = "qpnusp"; prit = 3;
eff1 = effront(cov,mean,lst,tech,prit);
eff1 = rname(eff1,rnam);
eff1 = cname(eff1,cnam);
print "Efficient Frontier=",eff1;

Efficient Frontier=| Percent RHS Return Risk Gerber

------------------------------------------------------------------MINRisk | 0.00000 0.01414 0.01414 0.03459 0.00000Risk1 | 0.05000 0.01468 0.01468 0.03473 0.00000Risk2 | 0.10000 0.01522 0.01522 0.03513 0.00000Risk3 | 0.15000 0.01577 0.01577 0.03575 0.00000Risk4 | 0.20000 0.01631 0.01631 0.03656 0.00290Risk5 | 0.25000 0.01686 0.01686 0.03757 0.00879Risk6 | 0.30000 0.01740 0.01740 0.03876 0.01469Risk7 | 0.35000 0.01794 0.01794 0.04014 0.01842Risk8 | 0.40000 0.01849 0.01849 0.04186 0.01860Risk9 | 0.45000 0.01903 0.01903 0.04400 0.01030

Risk10 | 0.50000 0.01957 0.01957 0.04705 0.00000Risk11 | 0.55000 0.02012 0.02012 0.05127 0.00000Risk12 | 0.60000 0.02066 0.02066 0.05655 0.00000Risk13 | 0.65000 0.02120 0.02120 0.06310 0.00000Risk14 | 0.70000 0.02175 0.02175 0.07073 0.00000Risk15 | 0.75000 0.02229 0.02229 0.07918 0.00000Risk16 | 0.80000 0.02283 0.02283 0.08821 0.00000Risk17 | 0.85000 0.02338 0.02338 0.09766 0.00000Risk18 | 0.90000 0.02392 0.02392 0.10741 0.00000Risk19 | 0.95000 0.02446 0.02446 0.11740 0.00000


Risk20 | 1.0000 0.02501 0.02501 0.12757 0.00000

| TandyC GenMills CEdison Weyerh IBM------------------------------------------------------------------

MINRisk | 0.00000 0.13962 0.33107 0.00000 0.23000Risk1 | 0.00000 0.14510 0.39391 0.00000 0.20065Risk2 | 0.00000 0.14984 0.45189 0.00000 0.17127Risk3 | 0.00233 0.14967 0.49822 0.00000 0.14038Risk4 | 0.01608 0.13800 0.52448 0.00000 0.10190Risk5 | 0.02967 0.12547 0.55006 0.00000 0.06304Risk6 | 0.04327 0.11295 0.57563 0.00000 0.02418Risk7 | 0.06179 0.09117 0.60366 0.00000 0.00000Risk8 | 0.08843 0.05414 0.63573 0.00000 0.00000Risk9 | 0.12514 0.00000 0.67885 0.00000 0.00000

Risk10 | 0.18393 0.00000 0.68348 0.00000 0.00000Risk11 | 0.24489 0.00000 0.68294 0.00000 0.00000Risk12 | 0.32603 0.00000 0.64789 0.00000 0.00000Risk13 | 0.41464 0.00000 0.58536 0.00000 0.00000Risk14 | 0.49826 0.00000 0.50174 0.00000 0.00000Risk15 | 0.58188 0.00000 0.41812 0.00000 0.00000Risk16 | 0.66551 0.00000 0.33449 0.00000 0.00000Risk17 | 0.74913 0.00000 0.25087 0.00000 0.00000Risk18 | 0.83275 0.00000 0.16725 0.00000 0.00000Risk19 | 0.91638 0.00000 0.08362 0.00000 0.00000Risk20 | 1.00000 0.00000 0.00000 0.00000 0.00000

| DEC Mobil Texaco CPL-------------------------------------------------------

MINRisk | 0.00000 0.00000 0.17173 0.12758Risk1 | 0.00000 0.01904 0.16154 0.07977Risk2 | 0.00226 0.04147 0.14849 0.03477Risk3 | 0.01640 0.05408 0.13892 0.00000Risk4 | 0.02424 0.07033 0.12206 0.00000Risk5 | 0.03222 0.08587 0.10487 0.00000Risk6 | 0.04019 0.10142 0.08768 0.00000Risk7 | 0.04509 0.12063 0.05924 0.00000Risk8 | 0.04495 0.14588 0.01228 0.00000Risk9 | 0.04763 0.13808 0.00000 0.00000

Risk10 | 0.04948 0.08310 0.00000 0.00000Risk11 | 0.05155 0.02062 0.00000 0.00000Risk12 | 0.02608 0.00000 0.00000 0.00000Risk13 | 0.00000 0.00000 0.00000 0.00000Risk14 | 0.00000 0.00000 0.00000 0.00000Risk15 | 0.00000 0.00000 0.00000 0.00000Risk16 | 0.00000 0.00000 0.00000 0.00000Risk17 | 0.00000 0.00000 0.00000 0.00000Risk18 | 0.00000 0.00000 0.00000 0.00000Risk19 | 0.00000 0.00000 0.00000 0.00000


Risk20 | 0.00000 0.00000 0.00000 0.00000

Column 3 shows the expected portfolio return, column 4 the risk of the portfolio, and columns 5 to 14 show the optimal weights for the 21 optimizations. This summary of results also indicates how the required increase in the portfolio return increases the portfolio risk value and affects the diversity of the stock selection.
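The same frontier sweep can be sketched outside CMAT with a loop over the required return levels $pr(perc)$; the Python fragment below is an illustration only (assuming NumPy/SciPy; the helper names are invented for the sketch) and, like effront, warm-starts each solve with the previous optimal weights.

import numpy as np
from scipy.optimize import minimize

def frontier(S, rbar, percs):
    # one minimum risk solve per required return level pr(perc)
    n = S.shape[0]
    sum1 = [{'type': 'eq', 'fun': lambda x: x.sum() - 1.0}]
    obj, jac = (lambda x: 0.5 * x @ S @ x), (lambda x: S @ x)
    sol = minimize(obj, np.full(n, 1.0 / n), jac=jac, bounds=[(0, 1)] * n,
                   constraints=sum1, method='SLSQP')
    x, r0 = sol.x, rbar @ sol.x
    rows = [(0.0, r0, np.sqrt(2.0 * sol.fun), x.copy())]
    for perc in percs:
        pr = r0 + perc * (rbar.max() - r0)          # required expected return
        cons = sum1 + [{'type': 'ineq', 'fun': lambda x, pr=pr: rbar @ x - pr}]
        sol = minimize(obj, x, jac=jac, bounds=[(0, 1)] * n,
                       constraints=cons, method='SLSQP')
        x = sol.x                                   # warm start for the next level
        rows.append((perc, rbar @ x, np.sqrt(2.0 * sol.fun), x.copy()))
    return rows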


Efficient Frontier of Portfolios: Using PLOT

0.026 +

|

|

| 0.0250083333 *

|

| * 0.0244647811

0.024 + * 0.0239212289

|

| * 0.0233776767

| * 0.0228341245

|

| * 0.0222905723

0.022 +

| * 0.0217470201

| * 0.0212034679

RETN |

| * 0.0206599157

|

0.020 + * 0.0201163635

| * 0.0195728113

|

| * 0.0190292591

|

| * 0.0184857069

0.018 + * 0.0179421547

|

| * 0.0173986025

| * 0.0168550503

|

| * 0.0163114981

0.016 +

| * 0.0157679459

| * 0.0152243937

|

| * 0.0146808415

|

0.014 + * 0.0141372893

-+-----------+-----------+-----------+-----------+-----------+-----------+

0.02 0.04 0.06 0.08 0.10 0.12 0.14

RISK

CRSP Monthly Data, Jan 78 - Dec 87


Compute Tangential Portfolio

To compute the Capital Market Line and the Tangential Portfolio we need to specify the expected return $R_f$ on the risk-free asset. For this example we use the mean value of past risk-free monthly returns, which is $R_f = .007161$. There are many other ways to estimate the expected return $R_f$ on the risk-free asset (Elton & Gruber, 1981). The Capital Market Line is defined as the line through the point of the risk-free asset which is tangent to the Efficient Frontier. The point where the capital market line touches the curve of the efficient frontier defines the Tangential Portfolio. We can use the nlp function to compute the Tangential Portfolio point by maximizing the following (nonlinear) objective function

$$ \max_x \; \frac{R_p(x) - R_f}{\sigma_p(x)} $$
where $R_p(x)$ is the expected portfolio return
$$ R_p(x) = \sum_{j=1}^n x_j r_j $$
and $\sigma_p(x)$ is the expected portfolio risk
$$ \sigma_p(x) = \sqrt{ \sum_{j=1}^n \sum_{k=1}^n \sigma_{jk} x_j x_k } $$
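This ratio can also be maximized directly with a general nonlinear optimizer; the Python fragment below is a small illustrative sketch (assuming NumPy/SciPy; the inputs and names are placeholders, and the CMAT module tanport follows) of maximizing $(R_p(x) - R_f)/\sigma_p(x)$ under the same weight constraints as before.

import numpy as np
from scipy.optimize import minimize

def tangential_portfolio(S, rbar, r_f, x0):
    # maximize (R_p(x) - R_f) / sigma_p(x) by minimizing its negative
    neg_ratio = lambda x: -(rbar @ x - r_f) / np.sqrt(x @ S @ x)
    cons = [{'type': 'eq', 'fun': lambda x: x.sum() - 1.0}]
    res = minimize(neg_ratio, x0, bounds=[(0.0, 1.0)] * len(x0),
                   constraints=cons, method='SLSQP')
    return res.x, rbar @ res.x, np.sqrt(res.x @ S @ res.x)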

We have written the following module tanport which uses the earlier generated data and creates a return object tanp containing the two points of the risk-free and tangential portfolios.

The following input arguments must be specified:

cov covariance matrix of the data,

mean mean vector of the data,

pret the return table from the effront call.

R_f the expected risk-free return $R_f$,

tech string scalar specifying the QP method,

prit int value specifying the amount of printed output.

Note that the nlp function in tanport uses finite difference (numerical) first and second order derivatives and may be computationally expensive for very large applications.

function tanport(pret,tech,prit) global(cov,mean,R_f)

{

/*--- Compute Tangential Portfolio ---*/

nc = ncol(cov); nlst = nrow(pret);

tanp = cons(2,nc+3); nc4 = nc + 4;

ind = [ 1 3 4 ] -> [ 5 : nc4 ];

tanp[1,] = pret[1,ind];

/* Creating initial values and constraints */

nint = nlst / 2; nint = (int)nint;

nc4 = nc+4; ind = [ 5 : nc4 ];

x0 = pret[nint,ind];

bc = cons(nc,1,0.) -> cons(nc,1,1.);

lc = cons(1,nc+2,1.);

function ftang(x) global(cov,mean,R_f)

{


sigm = sqrt(x * cov * x‘);

R_p = x * mean‘;

tangp = (R_p - R_f) / sigm;

return(tangp);

}

mopt = [ "tech" tech ,

"max" ,

"print" prit ];

< xr,rp > = nlp(ftang,x0,mopt,bc,lc);

print "XR=",xr; print "RP=",rp;

if (rp[1] < 0) {

print "Error NLP:", rp[1];

tanp = .;

} else {

_risk = sqrt(xr * cov * xr‘);

_retn = mean * xr‘;

print "Tangential Portfolio Risk= ", _risk;

print "Tangential Portfolio Return= ", _retn;

tanp[2,] = [ R_f _retn _risk ] -> xr;

}

return(tanp);

}

Note also that the global arguments of the inner function ftang must be global arguments of the outer function. The following call of the module tanport generates the tangential portfolio:

pret = eff1; R_f = .007161;

tech = "trureg"; prit = 3;

tanp = tanport(pret,tech,prit);

The following shows the output of the tanport call:

******************

Optimization Start

******************

Parameter Estimates

-------------------

Parameter Estimate Gradient Lower BC Upper BC

1 X_1 0.00000000 0.1910605 0.0000000 1.0000000

2 X_2 0.18393296 0.0617060 0.0000000 1.0000000

3 X_3 0.00000000 0.1623224 0.0000000 1.0000000

4 X_4 0.68348036 0.1699034 0.0000000 1.0000000

5 X_5 0.00000000 -0.0157392 0.0000000 1.0000000

6 X_6 0.00000000 0.0968653 0.0000000 1.0000000

7 X_7 0.04948184 0.1492296 0.0000000 1.0000000

8 X_8 0.08310484 0.2084576 0.0000000 1.0000000


9 X_9 0.00000000 0.1831224 0.0000000 1.0000000

10 X_10 0.00000000 0.0918813 0.0000000 1.0000000

Value of Objective Function = 0.263785

Linear Constraints

------------------

[ 1]ACT 1.0000000 == + 1.00000 * X_1 + 1.00000 * X_2

+ 1.00000 * X_3 + 1.00000 * X_4

+ 1.00000 * X_5 + 1.00000 * X_6

+ 1.00000 * X_7 + 1.00000 * X_8

+ 1.00000 * X_9 + 1.00000 * X_10

( 4e-016 )

Trust Region Optimization

Without Parameter Scaling

Gradient Computed by Finite Differences

Hessian Computed by Finite Differences (dense)

(Using Only Function Calls)

Iteration Start:

N. Variables 10

N. Bound. Constr. 20 N. Mask Constr. 0

N. Linear Constr. 1 Lin. Equ. Constr. 1

Criterion 0.263784800 Max Grad Entry 0.076507120

N. Active Constraints 6+ TR Radius 1.000000000

Iter rest nfun act optcrit difcrit maxgrad lambda radius

1 0 2 5’ 0.268550 4.8e-003 0.04861 0.00000 1.00000

2 0 3 5 0.270668 2.1e-003 2e-003 0.00000 0.15782

3 0 4 5 0.270673 4.6e-006 1e-005 0.00000 0.06824

Successful Termination After 3 Iterations

ABSGCONV convergence criterion satisfied.

Criterion 0.270672968 Max Grad Entry 1.0801e-005

N. Active Constraints 5 Ridge (lambda) 0.000000000

TR Radius 0.068237469 Act.dF/Pred.dF 1.004309568

N. Function Calls 5 N. Gradient Calls 2

N. Hessian Calls 5 Preproces. Time 0

Time for Method 1 Effective Time 1

********************

Optimization Results

********************

Parameter Estimates

-------------------

Parameter Estimate Gradient Active BC


1 X_1 0.01836736 0.1687852

2 X_2 0.09657008 0.1688005

3 X_3 0.04253182 0.1687808

4 X_4 0.64558689 0.1687909

5 X_5 0.00000000 -0.0233571 Lower BC

6 X_6 0.00000000 0.1031017 Lower BC

7 X_7 0.04502064 0.1687927

8 X_8 0.15192322 0.1687875

9 X_9 0.00000000 0.1676175 Lower BC

10 X_10 0.00000000 0.0864063 Lower BC

Value of Objective Function = 0.270673

Linear Constraints Evaluated at Solution

----------------------------------------

[ 1]ACT 1.00000 * X_1 + 1.00000 * X_2

+ 1.00000 * X_3 + 1.00000 * X_4

+ 1.00000 * X_5 + 1.00000 * X_6

+ 1.00000 * X_7 + 1.00000 * X_8

+ 1.00000 * X_9 + 1.00000 * X_10 - 1.00000

= 4.7184e-016

XR=

| 1 2 3 4 5

------------------------------------------------------

1 | 0.01837 0.09657 0.04253 0.64559 0.00000

| 6 7 8 9 10

------------------------------------------------------

1 | 0.00000 0.04502 0.15192 0.00000 0.00000

RP=

| 1

---------------------

Failure | 3.0000

F_Crit | 0.27067

N_Iter | 3.0000

N_Func | 5.0000

N_Grad | 2.0000

N_Jacm | 0.00000

N_Hess | 5.0000

N_Fnlc | 0.00000

N_Jnlc | 0.00000

O_Time | 1.00000

FG_Tim | 2.0000

unused | 0.00000


Tangential Portfolio Risk= 0.04242

Tangential Portfolio Return= 0.01864

rnam2 = [ "MINRisk" "TangPort" ];

cnam2 = [" Percent Return Risk "] -> vnam;

tanp = rname(tanp,rnam2);

tanp = cname(tanp,cnam2);

print "Tangential Portfolio", tanp;

Tangential Portfolio

| Percent Return Risk X1 X2

--------------------------------------------------------------

MINRisk | 0.00000 0.01146 0.22606 0.20301 0.13301

TangPort | 0.00716 0.28513 0.34513 0.27895 0.00000

| X3 X4 X5 X6 X7

--------------------------------------------------------------

MINRisk | 0.05669 0.14032 0.17784 0.00000 0.09447

TangPort | 0.00000 0.28305 0.01474 0.03295 0.23051

| X8 X9 X10

------------------------------------------

MINRisk | 0.07781 0.00000 0.11685

TangPort | 0.01371 0.14609 0.00000


1.5.7 Fitting Quantal Response Models

This section is based on the work of Ying So and it was presented at our tutorial at the 1995 ASA joint meeting in Orlando. This example can be found in tnlp212.inp.

General Remarks

Quantal response data (sometimes called dose-response data, or quantal assay data) may arise in a variety of different areas. For example, in efficacy studies of a drug, subjects are observed as to whether they have a response (that is, a positive outcome) from using the drug.

A two-parameter model such as the logit model is often used for quantal response data. However, low-dose extrapolation based on the logit model is often unsatisfactory, and models with extra parameters are often preferred. The Aranda-Ordaz (1981) asymmetric model and the quantit model of Copenhaver and Mielke (1977) are three-parameter models that contain the logit model as a special case. The probability $P(x)$ of response to dose $x$ is given by

• Logit Model
$$ P(x) = \frac{1}{1 + e^{-(\alpha + \beta x)}} $$

• Aranda-Ordaz Asymmetric Model
$$ P(x) = \left\{ \begin{array}{ll} 1 - (1 + \lambda e^{\alpha + \beta x})^{-1/\lambda} & \mbox{if } \lambda e^{\alpha + \beta x} > -1 \\ 1 & \mbox{otherwise} \end{array} \right. $$
$$ \lambda = 1 \;\Longrightarrow\; P(x) = \frac{1}{1 + e^{-(\alpha + \beta x)}} $$

• Quantit Model
$$ \int_{0.5}^{P(x)} \frac{dz}{1 - |2z - 1|^{\nu+1}} = \alpha + \beta x , \quad \nu > -1 $$
$$ \nu = 1 \;\Longrightarrow\; P(x) = \frac{1}{1 + e^{-4(\alpha + \beta x)}} $$

Here, $\alpha$ and $\beta$ are the location and scale parameters. The Aranda-Ordaz model has an extra parameter $\lambda$, and the model treats responses and nonresponses asymmetrically. The quantit model has an extra shape parameter $\nu$. The underlying distribution for the quantit model is called the omega distribution, which is a symmetric distribution and includes the double-exponential ($\nu = 0$), logistic ($\nu = 1$), and uniform ($\nu = \infty$) distributions as special cases (Morgan 1992).

For quantal response data, each observation corresponds to a test dose; that is, for the $i$th dose, $x_i$ is given to $n_i$ subjects of whom $r_i$ respond. Parameters in the quantal response models are estimated by maximizing the log likelihood for the data
$$ \mbox{log likelihood} = \sum_i \{ r_i \log(P(x_i)) + (n_i - r_i) \log(1 - P(x_i)) \} $$

The following data (Martin, 1942) are used to illustrate the fitting of these models using CMAT.

Log            Number of   Number
Concentration  Subjects    Responded
(x_i)          (n_i)       (r_i)
0.71           49          16
1.00           48          18
1.31           48          34
1.48           49          47
1.61           50          47
1.70           48          48


The nlp function can be used for fitting:

1. the logit model: unconstrained fit

2. the Aranda-Ordaz model with a nonlinear constraint specification

3. the quantit model: boundary constrained fit where the objective function applies the quad function in a simple iterative method for solving an inverse problem.

Fitting the Logit Model

The parameters α and β are estimated by maximizing the log likelihood

$$ l_1(\alpha, \beta) = \sum_i \{ r_i \log(P_i) + (n_i - r_i) \log(1 - P_i) \} $$
where
$$ P_i = \frac{1}{1 + e^{-\alpha - \beta x_i}} $$

The log likelihood can be written as
$$ l_1(\alpha, \beta) = \sum_{i=1}^k \{ r_i(\alpha + \beta x_i) - n_i \log(1 + e^{\alpha + \beta x_i}) \} $$

Note that there are no boundary, linear, or nonlinear constraints in the optimization. Let $(\alpha_0, \beta_0) = (0, 0)$ be the starting value. The trust region algorithm is used for the optimization. The covariance matrix is obtained as the inverse of the negative Hessian evaluated at the maximum likelihood estimates.
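For readers who want to check the fit independently, the following Python sketch (illustration only, assuming NumPy/SciPy; the CMAT code is given next) maximizes $l_1$ by minimizing its negative and obtains the covariance matrix as the inverse of the negative Hessian, approximated here by central finite differences.

import numpy as np
from scipy.optimize import minimize

# columns: x_i, n_i, r_i (Martin, 1942)
assay = np.array([[0.71, 49, 16], [1.00, 48, 18], [1.31, 48, 34],
                  [1.48, 49, 47], [1.61, 50, 47], [1.70, 48, 48]])
x, n, r = assay[:, 0], assay[:, 1], assay[:, 2]

def negloglik(p):
    eta = p[0] + p[1] * x
    return -np.sum(r * eta - n * np.log1p(np.exp(eta)))

res = minimize(negloglik, [0.0, 0.0], method='BFGS')

def hessian(f, p, h=1e-5):
    # central finite-difference Hessian
    p = np.asarray(p, float); k = len(p); H = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            def fs(si, sj):
                q = p.copy(); q[i] += si * h; q[j] += sj * h
                return f(q)
            H[i, j] = (fs(1, 1) - fs(1, -1) - fs(-1, 1) + fs(-1, -1)) / (4 * h * h)
    return H

cov = np.linalg.inv(hessian(negloglik, res.x))     # = inv(-Hessian of loglik)
stderr = np.sqrt(np.diag(cov))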

The CMAT code for fitting the logit model is the following:

assay = [ 1 .71 49 16 ,

2 1.00 48 18 ,

3 1.31 48 34 ,

4 1.48 49 47 ,

5 1.61 50 47 ,

6 1.70 48 48 ];

cnam = [" obs x n r "];

assay = cname(assay,cnam);

function f212_3(x) global(assay) {

t = x[1] + x[2] * assay[,2];

t1 = assay[,4] .* t;

t2 = assay[,3] .* log(1. + exp(t));

loglik = t1 - t2;

f = loglik[+];

return(f);

}

print "Call NLP to maximize the log likelihood";

x0 = [ 0. 0. ];

mopt = [ "tech" "trureg" ,

"max" ,

"print" 4 ];

< mle,rp > = nlp(f212_3,x0,mopt);


pnam = [" Alpha Beta "];

mle = cname(mle,pnam);

print "Xopt=", mle; print "RP=", rp;

Results of the model fit are

• table of starting values:

******************

Optimization Start

******************

Parameter Estimates

-------------------

Parameter Estimate Gradient

1 X_1 0.00000000 64.000000

2 X_2 0.00000000 110.58500

Value of Objective Function = -202.399

• optimization history:

Trust Region Optimization

Without Parameter Scaling

Gradient Computed by Finite Differences

Hessian Computed by Finite Differences (dense)

(Using Only Function Calls)

Iteration Start:

N. Variables 2

Criterion -202.3989767 Max Grad Entry 110.5850010

TR Radius 1.000000000

Iter rest nfun act optcrit difcrit maxgrad lambda radius

1 0 2 0 -150.5319 51.86712 22.2403 13.5561 1.00000

2 0 4 0 -123.5601 26.97172 9.33134 0.93133 3.20091

3 0 5 0 -119.2234 4.336748 2.18167 0.00000 3.26565

4 0 6 0 -119.0934 0.129987 0.11348 0.00000 1.79683

5 0 7 0 -119.0931 2.6e-004 3e-004 0.00000 0.33736

6 0 8 0 -119.0931 1.7e-009 9e-007 0.00000 0.01459

Successful Termination After 6 Iterations

GCONV convergence criterion satisfied.

Criterion -119.0931376 Max Grad Entry 8.7478e-007

Ridge (lambda) 0.000000000 TR Radius 0.014589254

Act.dF/Pred.dF 1.010888714

N. Function Calls 9 N. Gradient Calls 2

N. Hessian Calls 8 Preproces. Time 0

Time for Method 0 Effective Time 0


• The maximum likelihood estimates:

********************

Optimization Results

********************

Parameter Estimates

-------------------

Parameter Estimate Gradient

1 X_1 -4.45093726 -8.75e-007

2 X_2 4.46023238 0.0000000

Value of Objective Function = -119.093

Hessian Matrix

**************

Symmetric Matrix: Dense Storage

S | 1 2

--------------------------

1 | -37.350524 -42.606529

2 | -42.606529 -52.365980

• Approximate standard errors and correlations:

sopt = [ "grad" "hess" ];

< gopt,hopt > = fider(f212_3,mle,sopt);

print "gopt=", gopt;

print "hopt=", hopt;

v = inv(-hopt);

ase = sqrt(dia2vec(v)); tab = mle‘ -> ase;

tab = rname(tab,pnam); tab = cname(tab,[" Estimate STDERR "]);

d = diag(1. / ase);

corr = d * v * d;

print "*** Logit Model ***";

print "Estimates:",mle;

print "Standard Errors:",tab;

print "Correlations:",corr;

Standard Errors:

| Estimate STDERR

-------------------------------

Alpha | -4.4509 0.61033

Beta | 4.4602 0.51545

Correlations:

S | 1 2

------------------------

1 | 1.00000

2 | -0.96339 1.0000


Fitting the Aranda-Ordaz Asymmetric Model

The parameters α, β, and λ are estimated by maximizing the log likelihood

$$ l_2(\alpha, \beta, \lambda) = \sum_i \{ r_i \log(P_i) + (n_i - r_i) \log(1 - P_i) \} $$
where
$$ P_i = \left\{ \begin{array}{ll} 1 - (1 + \lambda e^{\alpha + \beta x_i})^{-1/\lambda} & \mbox{if } \lambda e^{\alpha + \beta x_i} > -1 \\ 1 & \mbox{otherwise} \end{array} \right. $$

For k doses (observations), there are 3k nonlinear constraints:

$$ \lambda e^{\alpha + \beta x_i} > -1 $$
$$ P_i > .000001 $$
$$ 1 - P_i > .000001 , \quad i = 1, \ldots, k $$

Nonlinear constraints are defined as a multi-valued function of the parameters such that the nonlinear constraints are equivalent to having function values greater than or equal to zero. The nonlinear constraints can be expressed as
$$ \left\{ \begin{array}{l} 1 + \lambda e^{\alpha + \beta x_i} > 0. \\ P_i - .000001 > 0. \\ .999999 - P_i > 0. \end{array} \right. $$

and the nonlinear constraints function contains the expression to the left of the inequality sign.
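To make this concrete, the following Python sketch (an illustration of the same definitions, not the CMAT module below) evaluates the response probabilities $P_i$ for a parameter vector $(\alpha, \beta, \lambda)$ and stacks the $3k$ constraint values that a feasible point must keep positive.

import numpy as np

def aranda_ordaz_p(theta, x):
    # P_i = 1 - (1 + lam*exp(alpha + beta*x))**(-1/lam) where defined, else 1
    # x: array of doses; assumes lam != 0
    alpha, beta, lam = theta
    c = 1.0 + lam * np.exp(alpha + beta * x)
    p = np.ones_like(c)
    ok = c > 0.0
    p[ok] = 1.0 - c[ok] ** (-1.0 / lam)
    return p

def constraint_values(theta, x, eps=1e-6):
    # 3k values, all required to be > 0
    alpha, beta, lam = theta
    c = 1.0 + lam * np.exp(alpha + beta * x)
    p = aranda_ordaz_p(theta, x)
    return np.concatenate([c, p - eps, (1.0 - eps) - p])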

Let $(\alpha_0, \beta_0, \lambda_0) = (-4.5, 4.5, 1)$ be the starting value. Note that this corresponds to the maximum likelihood estimates for the logit model. Here, the quadratic penalty algorithm is used for the optimization.

The CMAT code for fitting the Aranda-Ordaz model is the following:

function f212_4(x) global(assay) {

c = 1. + x[3] .* exp(x[1] + x[2] * assay[,2]);

f = .;

if (c > 1.e-8) {

tt = -1. / x[3]; q = c .** tt;

if (q > 1.e-6 && q < .999999) {

t1 = assay[,4] .* log(1. - q);

t2 = (assay[,3] - assay[,4]) .* log(q);

loglik = t1 + t2;

f = loglik[+];

} }

return(f);

}

/* define nonlinear constraints */

function c212_4(x) global(assay) {

c = 1. + x[3] .* exp(x[1] + x[2] * assay[,2]);

if (c < 1.e-8) c = 1.e-8;

tt = -1. / x[3]; q = c .** tt;

if (q < 1.e-6) q = 1.e-6;

if (q > .999999) q = .999999;

c0 = c - 1.e-8;

c1 = q - 1.e-6;

c2 = .999999 - q;

con = c0 |> c1 |> c2;

return(con);

}


print "Call NLP to maximize the log likelihood";

x0 = [ -4.5 4.5 1. ];

mopt = [ "tech" "qadpen" ,

"max" ,

"print" 4 ];

< xr,rp > = nlp(f212_4,x0,mopt,.,.,c212_4);

print "Xopt=", xr; print "RP=", rp;

Results of the model fit are

• Table of starting values:

******************

Optimization Start

******************

Parameter Estimates

-------------------

Parameter Estimate Gradient

1 X_1 -4.50000000 0.1378881

2 X_2 4.50000000 0.0084796

3 X_3 1.00000000 -3.1560483

Value of Objective Function = -119.096

Values of Nonlinear Constraints

-------------------------------

Constraint Residual

1 1 1.2711725

2 2 2.0000000

3 3 5.0349746

4 4 9.6711376

5 5 16.564614

6 6 24.336065

7 7 0.7866743

8 8 0.4999990

9 9 0.1986097

10 10 0.1033995

11 11 0.0603687

12 12 0.0410903

13 13 0.2133237

14 14 0.4999990

15 15 0.8013883

16 16 0.8965985

17 17 0.9396293

18 18 0.9589077

• The optimization history:

Quadratic Penalty Method (Gould, 1989)

Gradient Computed by Finite Differences

Hessian Computed by Finite Differences (dense)

(Using Only Function Calls)


Jacobian Nonlinear Constraints (dense)

(Computed by Finite Differences)

Iteration Start:

N. Variables 3

N. NonLin Constr. 18 NonLin EquConstr. 0

Criterion -119.0963570 Max Grad Entry 0.000000000

N. Active Constraints 0 Max Const Viol. 0.000000000

Iter act nfun mu optcrit difcrit conmax maxgrad alpha

0 0 2 0.1000000-119.096 0.00000 0.00000 3.15605 0.0000

0 7 ridge -117.344 1.75189 0.00000 5.22057 0.2476

0 9 ridge -115.629 1.71590 0.00000 19.0414 0.1000

0 10 ridge -115.032 0.59684 0.00000 14.1900 1.0000

0 11 ridge -114.959 0.07227 0.00000 8.00062 2.0000

0 14 ridge -114.928 0.03144 0.00000 4.76793 1.4797

0 16 ridge -114.912 0.01634 0.00000 0.22325 1.1159

0 18 ridge -114.912 7e-005 0.00000 2e-004 1.0024

1 0* 19 0.0010000-114.912 4.18474 0.00000 1e-006 1.0000

2 0* 20 1.00e-005-114.912 -9e-012 0.00000 2e-006 1.0000

3 0* 21 1.00e-007-114.912 2e-012 0.00000 3e-006 1.0000

0 37 -114.912 1e-014 0.00000 3e-006 6e-004

0- 66 -114.912 0.00000 0.00000 3e-006 0.0000

4 0* 67 1.00e-008-114.912 -7e-012 0.00000 1e-006 1.0000

0 76 -114.912 7e-012 0.00000 2e-006 1.3173

0- 111 -114.912 0.00000 0.00000 2e-006 0.0000

Successful Termination After 4 Iterations

Criterion -114.9116125 Max Grad Entry 0.000000000

N. Active Constraints 0

N. Function Calls 112 N. Gradient Calls 23

N. Hessian Calls 17 N. Line Searches 13

Preproces. Time 0 Time for Method 1

Effective Time 1

• maximum likelihood estimates:

********************

Optimization Results

********************

Parameter Estimates

-------------------

Parameter Estimate Gradient

1 X_1 -2.63984895 -7.86e-007

2 X_2 2.02708665 -1.58e-006

3 X_3 -0.40770962 -6.77e-007

Value of Objective Function = -114.912

Values of Nonlinear Constraints

-------------------------------


Constraint Residual

1 1 0.8772760

2 2 0.7790817

3 3 0.5858659

4 4 0.4154779

5 5 0.2392416

6 6 0.0869838

7 7 0.7253176

8 8 0.5421033

9 9 0.2694456

10 10 0.1159842

11 11 0.0299531

12 12 0.0025036

13 13 0.2746804

14 14 0.4578947

15 15 0.7305524

16 16 0.8840138

17 17 0.9700449

18 18 0.9974944

Hessian Matrix

**************

Symmetric Matrix: Dense Storage

S | 1 2 3

-------------------------------------

1 | -335.04088 -488.86776 362.57947

2 | -488.86776 -734.79185 567.90034

3 | 362.57947 567.90034 -472.60286

Since none of the nonlinear constraints is active at the maximum likelihood estimates, the covariance matrix can be approximated by the inverse of the negative Hessian evaluated at the maximum likelihood estimates. The CMAT code for computing the standard errors and correlations is similar to that for the logit model and is given as follows:

print "Compute Standard Errors and COV Matrix";

sopt = [ "grad" "hess" ];

< gopt,hopt > = fider(f212_4,mle,sopt);

print "gopt=", gopt;

print "hopt=", hopt;

v = inv(-hopt); ase = sqrt(dia2vec(v));

ase = sqrt(dia2vec(v)); tab = mle‘ -> ase;

tab = rname(tab,pnam); tab = cname(tab,[" Estimate STDERR "]);

d = diag(1. / ase);

corr = d * v * d;

print "*** Aranda-Ordaz Model ***";

print "Estimates:",mle;

print "Standard Errors:",tab;

print "Correlations:",corr;

The approximate standard errors and correlations are:


Standard Errors:

| Estimate STDERR

--------------------------------

Alpha | -2.6398 0.58885

Beta | 2.0271 0.61358

Lambda | -0.40771 0.31743

Correlations:

S | 1 2 3

----------------------------------

1 | 1.00000

2 | -0.97431 1.00000

3 | -0.83989 0.93610 1.00000

Fitting the Quantit Model

The parameters α, β, and ν are estimated by maximizing the log likelihood

$$ l_3(\alpha, \beta, \nu) = \sum_i \{ r_i \log(P_i) + (n_i - r_i) \log(1 - P_i) \} $$
where
$$ \int_{0.5}^{P_i} \frac{dz}{1 - |2z - 1|^{\nu+1}} = \alpha + \beta x_i , \quad \nu > -1 $$

Note that there is no closed form for $P_i$. The evaluation of $P_i$ is carried out by an iterative Newton scheme, which is nested within the global iteration of $(\alpha, \beta, \nu)$.

Suppose $\alpha$, $\beta$, and $\nu$ are known. By a change of variable $t = 2z - 1$,

$$ P_i = \left\{ \begin{array}{ll} .5 + .5u & \mbox{if } \alpha + \beta x_i \ge 0 \\ .5 - .5u & \mbox{if } \alpha + \beta x_i < 0 \end{array} \right. $$
where $u$ is the solution of the equation
$$ \int_0^u \frac{.5 \, dt}{1 - t^{\nu+1}} = |\alpha + \beta x_i| $$

Let
$$ g(u) = \int_0^u \frac{.5 \, dt}{1 - t^{\nu+1}} \quad \mbox{and} \quad c_i = |\alpha + \beta x_i| $$

The derivative of $g(u)$ is $g'(u) = \frac{.5}{1 - u^{\nu+1}}$. CMAT has a function quad that computes definite integrals. With quad you can solve $g(u) = c_i$ by an iterative Newton scheme. With a starting value $u_0$, the solution $u$ is obtained iteratively:

$$ u_{m+1} = u_m - \frac{g(u_m) - c_i}{g'(u_m)} $$

Iteration stops when $u_{m+1}$ is sufficiently close to $u_m$ (for example, $|u_{m+1} - u_m| < 10^{-6}$). The solution of $g(u) = c_i$ is $u = u_{m+1}$.
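The inner solve can be written in a few lines; the Python fragment below is illustrative only (it uses scipy.integrate.quad in place of the CMAT quad function and the same starting value as the code below) and applies exactly this Newton iteration to find $u$ for a given $c_i = |\alpha + \beta x_i|$ and shape $\nu$.

import numpy as np
from scipy.integrate import quad

def solve_u(c, nu, u0=0.99999999, tol=1e-6):
    # Newton iteration for g(u) = c with g(u) = int_0^u 0.5/(1 - t**(nu+1)) dt
    integrand = lambda t: 0.5 / (1.0 - t ** (nu + 1.0))
    u, err = u0, 1.0
    while abs(err) > tol:
        g, _ = quad(integrand, 0.0, u, limit=200)
        err = g - c
        u = u - err / integrand(u)        # g'(u) = 0.5 / (1 - u**(nu+1))
    return u

def quantit_p(alpha, beta, nu, x):
    # response probability P(x) of the quantit model
    c = alpha + beta * x
    if abs(c) < 1e-10:
        return 0.5
    u = solve_u(abs(c), nu)
    return 0.5 + 0.5 * u if c >= 0 else 0.5 - 0.5 * u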

The starting value $(\alpha_0, \beta_0, \nu_0) = (-1, 1, 1)$ is used, which corresponds roughly to the maximum likelihood estimates for the logit model. Here, the trust region algorithm is used for the optimization. The boundary constraint $\nu > -1$ is imposed in the estimation.

The CMAT code for fitting the Quantit model is the following:

/* integrand function */


function qf0(y) global(nup1) {

v = .5 / (1. - y .** nup1);

return(v);

}

The response probability P(x) in the quantit model does not have a closed form. The function module for the quantit model is:

function fquant(x) global(ipr,assay,nup1) {

nobs = nrow(assay);

logl = 0.; nup1 = x[3] + 1;

for (i = 1; i <= nobs; i++) {

/* g(u)=int(0 to u)dt/(1-t**(nu+1)) */

/* solve u when g(u)=alpha + beta*y */

hx = assay[i,2]; hn = assay[i,3]; hr = assay[i,4];

c = x[1] + x[2] * hx;

if (ipr) print "next obs: i,c=",i,c;

flag = 0;

if (c < 0) { c = -c; flag = 1; }

if (c < 1.e-10) p = .5;

else {

/* Newton method to solve u iteratively:

adjust u so that area z matches c */

err = 1;

u = 0.99999999;

while (abs(err) > 1e-6) {

/* call quad(z,"qf0",{0.}||u) msg=’no’; */

ab = [ 0. u ];

optn = cons(6,1,.);

optn[1] = 1.e-7; optn[2] = 1.e-3;

z = quad(qf0,ab,optn);

/* optn = [ 1.e-8 1.e-6 . 1 ];

zv = quad(qf0,ab,optn);

z = zv[1]; aer = zv[2]; nfun = zv[3]; */

err = z - c;

der = qf0(u);

u = u - err / der;

if (ipr) print "err,area,der,u=",err,z,der,u;

}

p= .5 *(1 + u);

if (flag) p = 1. - p;

}

t1 = hr * log(p);

t2 = (hn - hr) * log(1.-p);

if (ipr) print "result: i,p,t1,t2=",i,p,t1,t2;

logl += t1 + t2;

}

return(logl);

}

print "Call NLP to maximize the log likelihood";

x0 = [ -1. 1. 1. ]; nup1 = x0[3] + 1.;

bc = cons(3,2,.); bc[3,1] = -1.;


ipr = 1;

f0 = fquant(x0);

print "F0=",f0;

ipr = 0;

mopt = [ "tech" "trureg" ,

"max" ,

"print" 4 ];

< mle,rp > = nlp(fquant,x0,mopt,bc);

mle = cname(mle,pnam);

print "Xopt=", mle; print "RP=", rp;

Results of the quantit model fit are

• Table of starting values:

******************

Optimization Start

******************

Parameter Estimates

-------------------

Parameter Estimate Gradient Lower BC Upper BC

1 X_1 -1.00000000 12.414603 . .

2 X_2 1.00000000 21.773557 . .

3 X_3 1.00000000 3.0910907 -1.0000000 .

Value of Objective Function = -119.628

• Optimization history: trust region algorithm:

Trust Region Optimization

Without Parameter Scaling

Gradient Computed by Finite Differences

Hessian Computed by Finite Differences (dense)

(Using Only Function Calls)

Iteration Start:

N. Variables 3

N. Bound. Constr. 1 N. Mask Constr. 0

Criterion -119.6281315 Max Grad Entry 21.77355862

N. Active Constraints 0 TR Radius 1.000000000

Iter rest nfun act optcrit difcrit maxgrad lambda radius

1 0 2 0 -118.1159 1.512216 1.36995 0.00000 1.00000

2 0 5 0 -117.4469 0.669010 5.40781 0.00000 2.00775

3* 0 7 0 -116.6419 0.804961 14.4504 1.06281 1.88212

4 0 8 0 -116.2851 0.356869 0.36654 0.00000 1.88344

5 0 9 0 -116.1738 0.111307 2.88726 0.00000 1.45282

6 0 10 0 -116.1438 0.029981 0.75527 0.00000 1.58655

7 0 11 0 -116.1377 6.1e-003 0.36460 0.00000 1.17163

8 0 12 0 -116.1370 6.8e-004 0.07157 0.00000 0.80234


9 0 13 0 -116.1370 1.9e-005 3e-003 0.00000 0.36866

10 0 14 0 -116.1370 2.5e-008 3e-006 0.00000 0.07258

Successful Termination After 10 Iterations

GCONV convergence criterion satisfied.

Criterion -116.1369996 Max Grad Entry 3.1598e-006

N. Active Constraints 0 Ridge (lambda) 0.000000000

TR Radius 0.072584002 Act.dF/Pred.dF 0.972311948

N. Function Calls 15 N. Gradient Calls 2

N. Hessian Calls 12 Preproces. Time 3

Time for Method 27 Effective Time 33

• Optimization history: quasi-Newton algorithm: This run is about twice as fast as trust region, since quasi-Newton does not need to compute the Hessian matrix. The computing time, however, could be reduced even more by slightly changing the CMAT code (a small sketch follows the estimation results below).

Dual Quasi-Newton Optimization

Dual Broyden - Fletcher - Goldfarb - Shanno Update (DBFGS)

Gradient Computed by Finite Differences

Iteration Start:

N. Variables 3

N. Bound. Constr. 1 N. Mask Constr. 0

Criterion -119.6281315 Max Grad Entry 21.77355719

N. Active Constraints 0

Iter rest nfun act optcrit difcrit maxgrad alpha slope

1 0 4 0 -119.4155 0.212638 4.44892 3e-003 -146.454

2 0 5 0 -119.0164 0.399091 1.71149 0.10000 -5.90599

3 0 7 0 -117.4285 1.587894 21.9755 4.73012 -0.60501

4 0 8 0 -116.9244 0.504062 11.8815 1.00000 -0.98625

5 0 10 0 -116.7344 0.190039 12.0442 0.51747 -0.73094

6 0 12 0 -116.3257 0.408689 18.4741 2.61262 -0.36370

7 0 14 0 -116.2117 0.114066 3.83224 1.46279 -0.15546

8 0 16 0 -116.1691 0.042518 1.93490 1.42499 -0.04983

9 0 17 0 -116.1505 0.018651 3.78431 3.16786 -0.03355

10 0 18 0 -116.1440 6.4e-003 2.20529 1.99705 -0.02335

11 0 19 0 -116.1372 6.9e-003 0.19143 1.15460 -0.01393

12 0 21 0 -116.1370 1.5e-004 0.06318 1.00000 -3e-004

13 0 23 0 -116.1370 9.8e-007 0.03517 1.00000 -3e-006

Successful Termination After 13 Iterations

GCONV convergence criterion satisfied.

Criterion -116.1369999 Max Grad Entry 0.035167085

N. Active Constraints 0 Slope SDirect. -2.7794e-006

N. Function Calls 24 N. Gradient Calls 17

N. Line Searches 13 Preproces. Time 1

Time for Method 13 Effective Time 15

• maximum likelihood estimates:

********************

Optimization Results

********************


Parameter Estimates

-------------------

Parameter Estimate Gradient Active BC

1 X_1 -0.81175994 4.74e-006

2 X_2 0.81088337 6.85e-006

3 X_3 9.88180051 -8.76e-008

Value of Objective Function = -116.137

Hessian Matrix

**************

Symmetric Matrix: Dense Storage

S | 1 2 3

-------------------------------------

1 | -1689.9310 -2239.0267 -4.4075941

2 | -2239.0267 -3166.7312 -7.2265755

3 | -4.4075941 -7.2265755 -0.0271515

• approximate standard errors and correlations:

sopt = [ "grad" "hess" ];

< gopt,hopt > = fider(fquant,mle,sopt);

print "gopt=", gopt;

print "hopt=", hopt;

v = inv(-hopt);

ase = sqrt(dia2vec(v)); tab = mle‘ -> ase;

tab = rname(tab,pnam); tab = cname(tab,[" Estimate STDERR "]);

d = diag(1. / ase);

corr = d * v * d;

print "*** Quantit Model ***";

print "Estimates:",mle;

print "Standard Errors:",tab;

print "Correlations:",corr;

Standard Errors:

| Estimate STDERR

-------------------------------

Alpha | -0.81176 0.12832

Beta | 0.81088 0.11356

Nu | 9.8818 12.830

Correlations:

S | 1 2 3

----------------------------------

1 | 1.00000

2 | -0.96836 1.00000

3 | 0.65695 -0.78273 1.00000
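One change of the kind alluded to above is to build the constant option vector for quad() only once, outside the nested Newton loop, instead of recreating it for every inner iteration. The following lines are only a sketch under that assumption (they reuse the qf0 integrand and the assay data from above and omit the ipr printing); they are not part of the distributed example:

/* Sketch: hoist the constant quad() options out of the Newton loop */
qopt = cons(6,1,.);
qopt[1] = 1.e-7; qopt[2] = 1.e-3;

function fquant2(x) global(assay,nup1,qopt) {
   nobs = nrow(assay); logl = 0.; nup1 = x[3] + 1;
   for (i = 1; i <= nobs; i++) {
      hx = assay[i,2]; hn = assay[i,3]; hr = assay[i,4];
      c = x[1] + x[2] * hx; flag = 0;
      if (c < 0) { c = -c; flag = 1; }
      if (c < 1.e-10) p = .5;
      else {
         err = 1; u = 0.99999999;
         while (abs(err) > 1e-6) {
            ab = [ 0. u ];
            z = quad(qf0,ab,qopt);   /* same options, built only once */
            err = z - c;
            u = u - err / qf0(u);
         }
         p = .5 * (1. + u);
         if (flag) p = 1. - p;
      }
      logl += hr * log(p) + (hn - hr) * log(1.-p);
   }
   return(logl);
}

The call < mle,rp > = nlp(fquant2,x0,mopt,bc); would then replace the fquant() call shown earlier.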

The observed data and the fitted Logit, Aranda-Ordaz Asymmetric, and Quantit models are shown in the following figure.


1.5.8 Neural Nets with One Hidden Layer

This example can be found in tnlp90.inp. The following module computes the function vector of residuals for the least-squares problem.
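In matrix notation, reading the module off directly, the network applies the logistic function σ(t) = 1/(1 + e^{−t}) elementwise in two layers,

\[ H = \sigma(XA + 1c), \qquad \hat Y = \sigma(HB + 1d), \]

where A is the nx × nh weight matrix, B the nh × ny weight matrix, c and d the 1 × nh and 1 × ny bias rows unpacked from the parameter vector x, and 1 a column of nr ones. The module returns the residual matrix Y − Ŷ reshaped into a single column of nr · ny equations: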

function fneur902(x) global (data,indx,indy,nh) {

nr = nrow(data);

xdat = data[,indx]; nx = size(indx);

ydat = data[,indy]; ny = size(indy);

npar = size(x); /* print "npar=",npar; */

na2 = nx * nh;

nb1 = na2 + 1; nb2 = na2 + nh * ny;

nc1 = nb2 + 1; nc2 = nb2 + nh;

nd1 = nc2 + 1; nd2 = nc2 + ny;

/* mequ= nr * ny = 49 * 3 */

/* nx * nh + nh * ny + nh + ny =

= 40 + 30 + 10 + 3 = 83 parms */

a = x[ 1:na2 ]; a = shape(a,nx,nh);

b = x[ nb1:nb2 ]; b = shape(b,nh,ny);

c = x[ nc1:nc2 ]; c1 = cons(nr,1,1.) @ c;

d = x[ nd1:nd2 ]; d1 = cons(nr,1,1.) @ d;

sum = c1 + xdat * a;

h = 1. / (1. + exp(-sum));

sum = d1 + h * b;

yhat = 1. / (1. + exp(-sum));

res = ydat - yhat;

fun = shape(res,.,1);

/* print "Fun=",fun; */

return(fun);

}
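The Jacobian entries used next follow from the chain rule for the two logistic layers. With ŷ_k = σ(d_k + Σ_h b_{hk} h_h) and h_h = σ(c_h + Σ_j a_{jh} x_j) for one observation, the derivatives of ŷ_k with respect to the four parameter blocks are

\[ \frac{\partial \hat y_k}{\partial d_k} = \hat y_k(1-\hat y_k), \quad \frac{\partial \hat y_k}{\partial b_{hk}} = \hat y_k(1-\hat y_k)\, h_h, \quad \frac{\partial \hat y_k}{\partial c_h} = \hat y_k(1-\hat y_k)\, b_{hk}\, h_h(1-h_h), \quad \frac{\partial \hat y_k}{\partial a_{jh}} = \hat y_k(1-\hat y_k)\, b_{hk}\, h_h(1-h_h)\, x_j, \]

and the residual derivatives are the negatives of these. In the module below the arrays vyy and uhh hold −ŷ(1 − ŷ) and h(1 − h); they are computed as −v ŷ² and u h² with v = e^{−(d+hB)} and u = e^{−(c+xA)}.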

The following module computes the Jacobian matrix of the least-squares problem:

function jneur902(x) global (data,indx,indy,nh) {

nr = nrow(data);

xdat = data[,indx]; nx = size(indx);

ydat = data[,indy]; ny = size(indy);

npar = size(x);

/* print "JNEUR2: nr=",nr," npar=",npar," nx=",nx," ny=",ny; */

na2 = nx * nh;

nb1 = na2 + 1; nb2 = na2 + nh * ny;

nc1 = nb2 + 1; nc2 = nb2 + nh;

nd1 = nc2 + 1; nd2 = nc2 + ny;

/* nx * nh + nh * ny + nh + ny =

= 40 + 30 + 10 + 3 = 83 parms */

a = x[ 1:na2 ]; a = shape(a,nx,nh);

b = x[ nb1:nb2 ]; b = shape(b,nh,ny);

c = x[ nc1:nc2 ]; c1 = cons(nr,1,1.) @ c;


d = x[ nd1:nd2 ]; d1 = cons(nr,1,1.) @ d;

fun = cons(nr*ny,1); /* mequ= nr * ny = 49 * 3 */

jacm = cons(nr*ny,npar);

h = u = cons(nr,nh);

yhat = v = cons(nr,ny);

sum = c1 + xdat * a;

u = exp(-sum);

h = 1. / (1. + u);

sum = d1 + h * b;

v = exp(-sum);

yhat = 1. / (1. + v);

res = ydat - yhat;

fun = shape(res,.,1);

nry = nr * ny;

v = shape(v,nry,1);

yhat = shape(yhat,nry,1);

yind = [ 1:ny:nry ];

/* print "H=",h; print "Yhat=", yhat; */

vyy = -v .* (yhat .* yhat);

uhh = u .* (h .* h);

/* [1] Deriv w.r.t. xd[ny]: cols 81-83 */

if (ny == 1) {

jacm[ ,nd1] = vyy;

} else {

j = nd1;

for (iy = 1; iy <= ny; iy++, j++) {

ind = [ iy:ny:nry ];

jacm[ind,j] = vyy[ind];

} }

/* print "Derivs w.r.t. d[ny] are computed"; */

/* [2] Deriv w.r.t. xb[nh,ny]: cols 41-70 */

if (ny == 1) {

j = nb1;

for (ih = 1; ih <= nh; ih++, j++)

jacm[,j] = vyy .* h[,ih];

} else {

for (iy = 1; iy <= ny; iy++) {

ind = [ iy:ny:nry ];

j = nb1 + iy - 1;

for (ih = 1; ih <= nh; ih++, j+=ny)

jacm[ind,j] = vyy[ind] .* h[,ih];

} }

/* print "Derivs w.r.t. b[nh,ny] are computed"; */

/* [3] Deriv w.r.t. D(yhat[iy]) / D(xc[ih]): cols 71-80 */

j = nc1;

for (ih = 1; ih <= nh; ih++, j++) {

tt = uhh[,ih] @ b[ih,]‘;

jacm[,j] = vyy .* tt;

}


/* print "Derivs w.r.t. c[nh] are computed"; */

/* [4] Deriv w.r.t. xa[nx,nh]: col 1-40 */

for (ih = 1; ih <= nh; ih++) {

tt = uhh[,ih] @ b[ih,]‘;

vv = vyy .* tt;

for (iy = 1; iy <= ny; iy++) {

ind = [ iy:ny:nry ]; ww = vv[ind];

j = ih;

for (ix = 1; ix <= nx; ix++, j+=nh)

jacm[ind,j] = ww .* xdat[,ix];

} }

/* print "Derivs w.r.t. a[nx,nh] are computed"; */

/* print " * Leaving JNEUR902"; */

return(jacm);

}

The complete data can be found in tnlp90.inp:

print "TNLP90: Neural Network Problem, n=83";

cont= [ 1.00 0.00 0.00 0.00 95.29 -.75 1.30 ,

0.75 0.25 0.00 0.00 68.28 -.85 -2.54 ,

.................................................................

0.34 0.00 0.33 0.33 33.98 33.25 -29.88 ,

0.34 0.33 0.00 0.33 37.46 27.13 -15.02 ];

cont[,5] = cont[,5] / 100.;

cont[,6] = (cont[,6] + 1.) / 60.;

cont[,7] = (cont[,7] + 42.) / 55.;

data = con2 = cont[1:49,];

cnam = [ "x1":"x4" "y1":"y3" ];

data = con2 = cname(con2,cnam);

print "CON2=", con2;

nr = nrow(con2);

indx = [ 1:4 ]; indy = [ 5:7 ];

nx = size(indx); ny = size(indy);

nh = 10; /* np = 83; */

np = nx * nh + nh * ny + nh + ny;

print "NX,NY=",nx,ny," NP=",np;

mequ = nr * ny;

lbc = cons(np,1,-30.);

ubc = cons(np,1, 30.);

bc = lbc -> ubc; /* print "BC=",bc; */

x0 = cons(1,np,1.);


fun2 = fneur902(x0);

/* print "Fun2=",fun2; */

jac21 = jneur902(x0);

/* print "JAC21=",jac21; */

jac22 = fider(fneur902,x0);

/* print "JAC22=", jac22; */

jres = jac21 - jac22;

jres = ssq(jres);

print "Squared sum of Difference=",jres;

NX,NY= 4 3 NP= 83

Squared sum of Difference= 7.754e-014

The optimization function creates a singular Hessian and a large number of local minima. Any fit with a nonzero objective function represents a local, not the global, minimum (a simple multistart sketch follows this run):

print "Levenberg-Marquardt LS Algorithm";

mopt = [ "tech" "levmar" ,

"print" 3 ,

"maxit" 3000 ,

"maxfu" 8000 ];

< xr, rp > = nlp(fneur902,x0,mopt,bc,.,.,jneur902);

******************

Optimization Start

******************

Parameter Estimates

-------------------

Parameter Estimate Gradient Lower BC Upper BC

1 Par_1 1.00000000 1.18e-004 -30.000000 30.000000

2 Par_2 1.00000000 1.18e-004 -30.000000 30.000000

3 Par_3 1.00000000 1.18e-004 -30.000000 30.000000

.......................................................

81 Par_81 1.00000000 0.0019650 -30.000000 30.000000

82 Par_82 1.00000000 0.0018040 -30.000000 30.000000

83 Par_83 1.00000000 0.0012727 -30.000000 30.000000

Value of Objective Function = 33.2668

Levenberg-Marquardt Optimization

Scaling Update of More (1978)

User Specified Jacobian (dense)


Iteration Start:

N. Variables 83 N. Equations 147

N. Bound. Constr. 166 N. Mask Constr. 0

Criterion 33.26677150 Max Grad Entry 0.001964988

N. Active Constraints 0 TR Radius 1.000000000

Iter rest nfun act optcrit difcrit maxgrad lambda rho

1* 0 3 82 23.40551 9.861258 2e-038 4e-014 0.32744

Successful Termination After 1 Iterations

ABSGCONV convergence criterion satisfied.

Criterion 23.40551370 Max Grad Entry 2.1190e-038

N. Active Constraints 82 Ridge (lambda) 4.4409e-014

TR Radius 2.000000000 Act.dF/Pred.dF 0.327441722

N. Function Calls 4 N. Gradient Calls 2

N. Hessian Calls 3 Preproces. Time 3

Time for Method 1 Effective Time 5
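Because of these local minima one may want to repeat the fit from several starting vectors and keep the best local solution. The following lines are only a rough sketch of such a multistart loop; the scaling values in scal and the use of ssq() to recompute the criterion are ad hoc choices, not part of the distributed example:

/* Sketch: restart the Levenberg-Marquardt fit from several constant
   starting vectors and keep the solution with the smallest SSQ */
best = 1.e30; xbest = .;
scal = [ .5 1. 1.5 2. ];
for (k = 1; k <= size(scal); k++) {
   x0 = cons(1,np,scal[k]);
   < xr, rp > = nlp(fneur902,x0,mopt,bc,.,.,jneur902);
   crit = ssq(fneur902(xr));
   if (crit < best) { best = crit; xbest = xr; }
}
print "Best local criterion=",best;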

1.6 Details on Nonlinear L1 and L∞ Estimation

1.6.1 Nonlinear L1 Estimation

For linear L1 regression see the reg() function. For nonlinear regression the objective function is defined as a sum of the absolute values of other (nonlinear) functions:

\[ f(x) = |f_1(x)| + \cdots + |f_m(x)| \longrightarrow \min_x \]

This function is nonsmooth, and you cannot use one of the smooth general optimization algorithms. It may be more efficient to use the NL1REG algorithm than to use the Nelder-Mead or genetic algorithms. There are two algorithms available:

• An algorithm by Hald and Madsen ([345]). This algorithm is selected by the "vs"=2 option and is the default method. This algorithm permits the specification of simple bounds or linear constraints.

• An algorithm developed by Pinar and Hartmann ([674]), which uses a dual formulation of the nonlinear L1 problem that can be solved by a modification of the Conn and Gould algorithm ([173]). This algorithm is selected by the "vs"=1 option. Currently this algorithm solves only unconstrained problems.

Additionally we can always use the Nelder-Mead algorithm for nonsmooth problems.

The following example is taken from Madsen et al. (1990). We first report the results of the unconstrained case comparing the performances of the Hald & Madsen (1985), Pinar & Hartmann (1999), and the unconstrained Nelder-Mead algorithms:

function fmadsn1(x) {

f = cons(3); r = x[2];

f[1] = 1.5 - x[1] * (1. - r); r *= x[2];

f[2] = 2.25 - x[1] * (1. - r); r *= x[2];

f[3] = 2.625 - x[1] * (1. - r);

return(f);

}


function jmadsn1(x) {

jac = cons(3,2); r = x[2];

jac[1,1] = r - 1.; jac[1,2] = x[1]; r *= x[2];

jac[2,1] = r - 1.; jac[2,2] = 2. * x[1] * x[2]; r *= x[2];

jac[3,1] = r - 1.; jac[3,2] = 3. * x[1] * x[2] * x[2];

return(jac);

}

We first show the results of the Pinar and Hartmann algorithm specified by version 1:

x0 = [ 1. 1. ];

mopt = [ "tech" "nl1reg",

"vs" 1,

"print" 4,

"xtol" 1.e-10 ];

< xr, rp> = nlp(fmadsn1,x0,mopt,.,jmadsn1);

******************

Optimization Start

******************

Parameter Estimates

-------------------

Parameter Estimate Gradient

1 X1 1.00000000 0.0000000

2 X2 1.00000000 6.0000000

Value of Objective Function = 6.375

Function Values

---------------

Function Value

1 F1 1.50000000

2 F2 2.25000000

3 F3 2.62500000

Nonlinear L1 Regression (Gould, 1989; Pinar, 1998)

User Specified Gradient

User Specified Jacobian (dense)

Iteration Start:

N. Variables 2 N. Equations 3

Criterion 6.375000000 Max Grad Entry 5.285714286

Iter nact nfun mu optcrit difcrit maxgrad alpha

0 3 1 2.62500000 2.705357 0.000000 5.285714 0.00000


3 5 donc 0.363418 2.341940 1.582348 4.43799

1 0 6 0.26250000 1.864483 0.840875 4.295234 0.00000

1 7 ridge 0.580973 1.283509 4.459205 0.10000

3 9 0.137872 0.443101 4.343026 0.37712

3 10 donc 0.061460 0.076412 2.244040 0.28709

3 13 0.017698 0.043761 1.730377 1.06053

3 15 4.9e-003 0.012820 0.907127 2.00000

3 16 1.2e-004 4.8e-003 0.058984 1.00000

2 3* 17 0.00632954 4.5e-003 4.1e-004 1.684852 1.00000

3 19 4.3e-006 4.5e-003 0.162871 1.00215

3 20 4.7e-013 4.3e-006 1.3e-005 1.00000

3 3* 21 5.666e-008 5.3e-008 4.2e-006 1.455511 1.00000

3 24 nsac 6.3e-021 5.3e-008 1.9e-006 1.00000

3 27 nsac 3.5e-024 6.3e-021 4.7e-008 1.00000

4 3* 28 1.000e-008 3.2e-023 6.3e-021 3.3e-007 1.00000

3 31 nsac 0.000000 3.2e-023 0.000000 1.00000

Successful Termination After 4 Iterations

Criterion 0.000000000 Max Grad Entry 0.000000000

N Function Calls 32 N Gradient Calls 2

N Jacobian Calls 24 N Hessian Calls 17

N Line Searches 9 Effective Time 1

********************

Optimization Results

********************

Parameter Estimates

-------------------

Parameter Estimate Gradient

1 X1 3.00000000 -2.1250000

2 X2 0.50000000 8.2500000

Value of Objective Function = 0

First Order Lagrange Multipliers

--------------------------------

Constraint Lag. Mult.

Nonlin EC [1] 0.000000000

Nonlin EC [2] 0.000000000

Nonlin EC [3] 0.000000000

Function Values

---------------

Function Value

1 F1 0.00000000

2 F2 0.00000000

3 F3 0.00000000


Jacobian Matrix

***************

| 1 2

--------------------------

1 | -0.5000000 3.0000000

2 | -0.7500000 3.0000000

3 | -0.8750000 2.2500000

The following are the results of the Hald and Madsen (default) algorithm which is more efficient in this application:

x0 = [ 1. 1. ];

lcon = [ -2. -1. 1. . ];

mopt = [ "tech" "nl1reg",

"print" 4,

"xtol" 1.e-10 ];

< xr, rp> = nlp(fmadsn1,x0,mopt,.,jmadsn1);

We only report the iteration history and results:

Nonlinear L1 Regression (Hald and Madsen, 1985)

User Specified Gradient

User Specified Jacobian (dense)

Iteration Start:

N. Variables 2 N. Equations 3

Criterion 6.375000000 Max Grad Entry 0.000000000

Iter nact nfun stag optcrit difcrit deltax alpha funcres

1 0 2 1 5.814000 0.561000 0.10000 0.01667 .

2 0 3 1 4.614600 1.199400 0.20000 0.31827 .

3 0 4 1 2.242200 2.372400 0.40000 0.18653 .

4 1 5 1 0.741830 1.500370 0.80000 0.87139 .

5 2 6 1 0.184977 0.556852 0.55169 0.06786 .

6 2 7 1 8.5e-004 0.184126 0.04736 1e-003 .

7 2 8 1 1.5e-006 8.5e-004 9e-004 9e-007 .

8 2 9 1 1.6e-013 1.5e-006 9e-007 3e-014 .

9 2 10 1 1.1e-015 1.6e-013 3e-014 2e-015 .

Successful Termination After 9 Iterations

Completely constrained search space.

Criterion 0.000000000 Max Grad Entry 0.000000000

N Function Calls 12 N Gradient Calls 1

N Jacobian Calls 12 Effective Time 0

********************

Optimization Results

********************


Parameter Estimates

-------------------

Parameter Estimate Gradient

1 X1 3.00000000 -2.1250000

2 X2 0.50000000 8.2500000

Value of Objective Function = 1.11022e-015

First Order Lagrange Multipliers

--------------------------------

Constraint Lag. Mult.

Nonlin EC [7] -0.500000000

Nonlin EC [5] 3.000000000

Function Values

---------------

Function Value

1 F1 2.220e-016

2 F2 4.441e-016

3 F3 4.441e-016

Jacobian Matrix

***************

| 1 2

--------------------------

1 | -0.5000000 3.0000000

2 | -0.7500000 3.0000000

3 | -0.8750000 2.2500000

The following are the results of the Nelder-Mead simplex algorithm:

x0 = [ 1. 1. ];

mopt = [ "lav" . ,

"tech" "nmsimp",

"print" 4 ];

< xr, rp> = nlp(fmadsn1,x0,mopt);

******************

Optimization Start

******************

Parameter Estimates

-------------------

Parameter Estimate


1 X1 1.00000000

2 X2 1.00000000

Value of Objective Function = 6.375

Function Values

---------------

Function Value

1 F1 1.50000000

2 F2 2.25000000

3 F3 2.62500000

Nelder-Mead Simplex Optimization

Iteration Start:

N. Variables 2 N. Equations 3

Criterion 6.375000000

Iter rest nfun act optcrit difcrit std delta size

1 0 12 0 0.70179 0.67321 0.2756 1.0000 0.5000

2 0 20 0 0.27418 0.22515 1e-001 1.0000 0.2988

3 0 29 0 0.04709 0.05306 2e-002 1.0000 0.1010

4 0 38 0 0.01458 0.01101 5e-003 1.0000 2e-002

5 0 47 0 2e-003 3e-003 1e-003 1.0000 8e-003

6 0 56 0 3e-004 2e-003 8e-004 1.0000 1e-003

7 0 65 0 3e-004 9e-005 4e-005 1.0000 4e-004

8 0 74 0 5e-005 3e-005 1e-005 1.0000 1e-004

9 0 84 0 5e-006 1e-005 5e-006 1.0000 2e-005

10 0 94 0 1e-006 3e-006 1e-006 1.0000 3e-006

11 0 104 0 8e-008 5e-007 2e-007 1.0000 5e-007

Successful Termination After 11 Iterations

FCONV2 convergence criterion satisfied.

Criterion 0.000000083

N Function Calls 106 Effective Time 0

********************

Optimization Results

********************

Parameter Estimates

-------------------

Parameter Estimate

1 X1 3.00000009

2 X2 0.50000003

Value of Objective Function = 8.26846e-008


Function Values

---------------

Function Value

1 F1 4.836e-008

2 F2 2.612e-008

3 F3 -8.208e-009

Finally we report the results of the linearly constrained case comparing the performances of the Hald & Madsen (1985) and the constrained Nelder-Mead (COBYLA) algorithms:

x0 = [ 1. 1. ];

mopt = [ "tech" "nl1reg",

"lincon" "lcon",

"print" 4,

"xtol" 1.e-10 ];

< xr, rp> = nlp(fmadsn1,x0,mopt,.,jmadsn1);

The following are the results of the Hald-Madsen algorithm:

******************

Optimization Start

******************

Parameter Estimates

-------------------

Parameter Estimate Gradient

1 X1 1.00000000 0.0000000

2 X2 1.00000000 6.0000000

Value of Objective Function = 6.375

Linear Constraints

------------------

[ 1] 2.0000000 >= + 1.00000 * X1 - 1.00000 * X2 ( 2.00000 )

Function Values

---------------

Function Value

1 F1 1.50000000

2 F2 2.25000000

3 F3 2.62500000

Jacobian Matrix

***************


| 1 2

--------------------------

1 | 0.0000000 1.0000000

2 | 0.0000000 2.0000000

3 | 0.0000000 3.0000000

Nonlinear L1 Regression (Hald and Madsen, 1985)

User Specified Gradient

User Specified Jacobian (dense)

Iteration Start:

N. Variables 2 N. Equations 3

N. Linear Constr. 1 Lin. Equ. Constr. 0

Criterion 6.375000000 Max Grad Entry 6.000000000

N_Active Constraints 0

Iter nact nfun stag optcrit difcrit deltax alpha funcres

1 0 2 1 5.814000 0.561000 0.10000 0.01667 .

2 0 3 1 4.614600 1.199400 0.20000 0.31827 .

3 0 4 1 2.242200 2.372400 0.40000 0.18653 .

4 1 5 1 0.741830 1.500370 0.80000 0.06010 .

5 2 6 1 0.578231 0.163599 0.04560 0.02259 .

6 2 7 1 0.575963 2.3e-003 9e-004 7e-004 .

7 2 8 1 0.575962 1.1e-006 4e-007 3e-007 .

8 2 9 2 0.575962 2.9e-013 2e-013 1.00000 4e-016

Successful Termination After 8 Iterations

Completely constrained search space.

Criterion 0.575961894 Max Grad Entry 0.612372436

N_Active Constraints 1

N Function Calls 11 N Gradient Calls 1

N Jacobian Calls 11 Effective Time 0

At least one element of the (projected) gradient

is greater than 1e-3.

********************

Optimization Results

********************

Parameter Estimates

-------------------

Parameter Estimate Gradient

1 X1 2.36602540 -1.1830127

2 X2 0.36602540 0.3169873

Value of Objective Function = 0.575962

Linear Constraints Evaluated at Solution


----------------------------------------

[ 1]ACT -1.00000 * X1 + 1.00000 * X2 + 2.00000 = 0.000000000

First Order Lagrange Multipliers

--------------------------------

Constraint Lag. Mult.

Linear IC [1] 0.750000000

Function Values

---------------

Function Value

1 F1 -2.220e-016

2 F2 0.20096189

3 F3 0.37500000

Jacobian Matrix

***************

| 1 2

--------------------------

1 | -0.6339746 2.3660254

2 | -0.8660254 1.7320508

3 | -0.9509619 0.9509619

The following are the results of Powell’s COBYLA algorithm:

x0 = [ 1. 1. ];

lcon = [ -2. -1. 1. . ];

mopt = [ "lav" . ,

"tech" "cobyla",

"lincon" "lcon",

"print" 4 ];

< xr, rp> = nlp(fmadsn1,x0,mopt);

******************

Optimization Start

******************

Parameter Estimates

-------------------

Parameter Estimate

1 X1 1.00000000

2 X2 1.00000000


Value of Objective Function = 6.375

Linear Constraints

------------------

[ 1] 2.0000000 >= + 1.00000 * X1 - 1.00000 * X2 ( 2.00000 )

Function Values

---------------

Function Value

1 F1 1.50000000

2 F2 2.25000000

3 F3 2.62500000

COBYLA Algorithm by M.J.D. Powell (1992)

Iteration Start:

N. Variables 2 N. Equations 3

N. Linear Constr. 1 Lin. Equ. Constr. 0

Criterion 6.375000000 N_Active Constraints 0

Iter rest nfun objfunc conmax meritf difmerit rho

1 0 8 2.26578 0.00000 2.265782 4.10922 0.50000

2 0 16 0.61213 1e-015 0.612128 1.65365 0.28125

3 0 19 0.58603 0.04972 0.586033 0.02610 0.07031

4 0 23 0.57969 6e-017 0.579695 6e-003 0.01758

5 0 27 0.57692 1e-016 0.576920 3e-003 4e-003

6 0 31 0.57624 0.00000 0.576242 7e-004 1e-003

7 0 34 0.57607 0.00000 0.576073 2e-004 4e-004

8 0 38 0.57601 0.00000 0.576010 6e-005 2e-004

9 0 44 0.57598 0.00000 0.575976 3e-005 6e-005

10 0 47 0.57597 6e-017 0.575967 9e-006 2e-005

11 0 51 0.57596 6e-017 0.575964 3e-006 8e-006

12 0 57 0.57596 0.00000 0.575962 1e-006 3e-006

13 0 60 0.57596 0.00000 0.575962 7e-008 8e-007

14 0 66 0.57596 0.00000 0.575962 3e-007 2e-007

15 0 71 0.57596 6e-017 0.575962 4e-008 5e-008

16 0 75 0.57596 1e-016 0.575962 7e-009 1e-008

17 0 76 0.57596 1e-016 0.575962 0.00000 1e-008

Successful Termination After 17 Iterations

ABSXCONV convergence criterion satisfied.

Criterion 0.575961899 Max Const Viol. 0.000000000

N_Active Constraints 1 TR Radius 0.000000010

N Function Calls 77 Effective Time 1

********************

Optimization Results

********************


Parameter Estimates

-------------------

Parameter Estimate

1 X1 2.36602540

2 X2 0.36602540

Value of Objective Function = 0.575962

Linear Constraints Evaluated at Solution

----------------------------------------

[ 1]ACT -1.00000 * X1 + 1.00000 * X2 + 2.00000 = 0.000000000

Function Values

---------------

Function Value

1 F1 -9.024e-009

2 F2 0.20096189

3 F3 0.37500000

For comparison we also show the results of the linearly constrained minimax (L∞) formulation of the same problem, using the NLIREG technique of Hald and Madsen (1981):

x0 = [ 1. 1. ];

lcon = [ -2. -1. 1. . ];

mopt = [ "tech" "nlireg",

"lincon" "lcon",

"print" 4,

"xtol" 1.e-10 ];

< xr, rp> = nlp(fmadsn1,x0,mopt,.,jmadsn1);

******************

Optimization Start

******************

Parameter Estimates

-------------------

Parameter Estimate Gradient

1 X1 1.00000000 0.0000000

2 X2 1.00000000 3.0000000

Value of Objective Function = 2.625

Linear Constraints

------------------

[ 1] 2.0000000 >= + 1.00000 * X1 - 1.00000 * X2 ( 2.00000 )


Function Values

---------------

Function Value

1 F1 1.50000000

2 F2 2.25000000

3 F3 2.62500000

Jacobian Matrix

***************

| 1 2

--------------------------

1 | 0.0000000 1.0000000

2 | 0.0000000 2.0000000

3 | 0.0000000 3.0000000

Nonlinear Linf Regression (Hald and Madsen, 1981)

User Specified Gradient

User Specified Jacobian (dense)

Iteration Start:

N. Variables 2 N. Equations 3

N. Linear Constr. 1 Lin. Equ. Constr. 0

Criterion 2.625000000 Max Grad Entry 3.000000000

N_Active Constraints 0

Iter nact nfun stag optcrit difcrit deltax alpha funcres

1 1 2 1 2.354000 0.271000 0.10000 0.03333 .

2 1 3 1 1.955291 0.398709 0.20000 0.08180 .

3 1 4 1 1.487598 0.467693 0.40000 0.24330 .

4 1 5 2 0.734892 0.752706 0.80000 0.65357 1.00008

5 2 7 1 0.437671 0.297221 0.40000 0.49327 .

5 3 8 1 0.437671 0.000000 0.45862 0.91631 .

6 2 9 1 0.391681 0.045991 0.11466 0.22908 .

7 2 10 1 0.375027 0.016654 0.11466 0.39834 .

8 2 11 2 0.375001 2.6e-005 5e-003 1.00000 2e-003

9 2 12 2 0.375000 6.0e-007 6e-004 1.00000 1e-005

10 2 13 2 0.375000 1.6e-011 3e-006 1.00000 1e-008

11 2 14 2 0.375000-4.4e-016 3e-009 1.00000 4e-014

12 2 15 2 0.375000 4.4e-016 1e-014 1.00000 3e-016

Successful Termination After 12 Iterations

Completely constrained search space.

Criterion 0.375000000 Max Grad Entry 0.000000000

N_Active Constraints 1

N Function Calls 17 N Gradient Calls 1

N Jacobian Calls 17 Effective Time 0

********************


Optimization Results

********************

Parameter Estimates

-------------------

Parameter Estimate Gradient

1 X1 2.36602540 -0.9509619

2 X2 0.36602540 0.9509619

Value of Objective Function = 0.375

Linear Constraints Evaluated at Solution

----------------------------------------

[ 1]ACT -1.00000 * X1 + 1.00000 * X2 + 2.00000 = 0.000000000

First Order Lagrange Multipliers

--------------------------------

Constraint Lag. Mult.

Linear IC [1] 0.950961894

Projected Gradient

------------------

Free Dim. Proj. Grd.

1 0.00000000

Function Values

---------------

Function Value

1 F1 -2.220e-016

2 F2 0.20096189

3 F3 0.37500000

Jacobian Matrix

***************

| 1 2

--------------------------

1 | -0.6339746 2.3660254

2 | -0.8660254 1.7320508

3 | -0.9509619 0.9509619

Of course, we could also use the MINMAX technique to obtain the same results. The COBYLA algorithm solves more general nonsmooth problems and is of course not as fast as the more specific Hald-Madsen algorithm:

x0 = [ 1. 1. ];


lcon = [ -2. -1. 1. . ];

mopt = [ "linf" . ,

"tech" "cobyla",

"lincon" "lcon",

"print" 4 ];

< xr, rp> = nlp(fmadsn1,x0,mopt);

COBYLA Algorithm by M.J.D. Powell (1992)

Iteration Start:

N. Variables 2 N. Equations 3

N. Linear Constr. 1 Lin. Equ. Constr. 0

Criterion 2.625000000 N_Active Constraints 0

Iter rest nfun objfunc conmax meritf difmerit rho

1 0 10 0.46048 0.00000 0.460482 2.16452 1.12500

2 0 14 0.29195 0.19887 0.291952 0.16853 0.28125

3 0 16 0.37526 2e-016 0.375264 -0.08331 0.07031

4 0 19 0.37503 2e-016 0.375029 2e-004 0.01758

5 0 23 0.37500 2e-016 0.375000 3e-005 4e-003

6 0 28 0.37500 2e-016 0.375000 -2e-016 3e-004

7 0 30 0.37500 2e-016 0.375000 2e-016 7e-005

8 0 34 0.37500 6e-017 0.375000 1e-010 2e-005

9 0 38 0.37500 0.00000 0.375000 6e-011 4e-006

10 0 44 0.37500 6e-017 0.375000 6e-012 1e-006

11 0 50 0.37500 1e-016 0.375000 1e-014 7e-008

12 0 54 0.37500 0.00000 0.375000 7e-016 2e-008

13 0 55 0.37500 0.00000 0.375000 0.00000 1e-008

Successful Termination After 13 Iterations

ABSXCONV convergence criterion satisfied.

Criterion 0.375000000 Max Const Viol. 0.000000000

N_Active Constraints 1 TR Radius 0.000000010

N Function Calls 56 Effective Time 0

********************

Optimization Results

********************

Parameter Estimates

-------------------

Parameter Estimate

1 X1 2.36602540

2 X2 0.36602540

Value of Objective Function = 0.375

Linear Constraints Evaluated at Solution

----------------------------------------


[ 1]ACT -1.00000 * X1 + 1.00000 * X2 + 2.00000 = 0.000000000

Function Values

---------------

Function Value

1 F1 -5.955e-009

2 F2 0.20096189

3 F3 0.37500000

1.6.2 Nonlinear L∞ (Chebyshev) Estimation

For linear L∞ regression see the reg() function. For nonlinear regression the objective function is defined as the maximum of the absolute values of other (nonlinear) functions:

\[ f(x) = \max(|f_1(x)|, \ldots, |f_m(x)|) \longrightarrow \min_x \]

where x = (x_1, …, x_n) and the m functions f_i(x), i = 1, …, m, are twice continuously differentiable. This function is nonsmooth, and you should not use one of the smooth general optimization algorithms. Only the MINCIN algorithm by Hald and Madsen ([344]) is implemented here. This algorithm is designed to solve the following slightly more general problem

\[ \max_{i=1}^{M} F_i(x) \longrightarrow \min_x \]

where x = (x_1, …, x_n) and the M functions F_i(x), i = 1, …, M, are twice continuously differentiable. This problem is solved when specifying the MINMAX technique. For solving the L∞ regression problem

\[ \max_{i=1}^{m} |f_i(x)| \longrightarrow \min_x \]

with the MINMAX technique we must specify M = 2m equations

\[ F_i(x) = f_i(x) \;\text{ for } i = 1,\ldots,m, \qquad F_i(x) = -f_{i-m}(x) \;\text{ for } i = m+1,\ldots,2m \]

However, when specifying the NLIREG technique, this substitution is performed automatically.

The following example is taken from Madsen et al. (1990). We first report the results of the unconstrained case comparing the performances of the Hald & Madsen (1985) and the unconstrained Nelder-Mead algorithms. The problem is specified with function and Jacobian:

function fmadsn1(x) {

f = cons(3); r = x[2];

f[1] = 1.5 - x[1] * (1. - r); r *= x[2];

f[2] = 2.25 - x[1] * (1. - r); r *= x[2];

f[3] = 2.625 - x[1] * (1. - r);

return(f);

}

function jmadsn1(x) {

jac = cons(3,2); r = x[2];

jac[1,1] = r - 1.; jac[1,2] = x[1]; r *= x[2];

jac[2,1] = r - 1.; jac[2,2] = 2. * x[1] * x[2]; r *= x[2];

jac[3,1] = r - 1.; jac[3,2] = 3. * x[1] * x[2] * x[2];

return(jac);

}


The following are the results of the Hald & Madsen (1981) algorithm:

x0 = [ 1. 1. ];

mopt = [ "tech" "nlireg",

"print" 4,

"xtol" 1.e-10 ];

< xr, rp> = nlp(fmadsn1,x0,mopt,.,jmadsn1);

******************

Optimization Start

******************

Parameter Estimates

-------------------

Parameter Estimate Gradient

1 X1 1.00000000 0.0000000

2 X2 1.00000000 3.0000000

Value of Objective Function = 2.625

Function Values

---------------

Function Value

1 F1 1.50000000

2 F2 2.25000000

3 F3 2.62500000

Jacobian Matrix

***************

| 1 2

--------------------------

1 | 0.0000000 1.0000000

2 | 0.0000000 2.0000000

3 | 0.0000000 3.0000000

Nonlinear Linf Regression (Hald and Madsen, 1981)

User Specified Gradient

User Specified Jacobian (dense)

Iteration Start:

N. Variables 2 N. Equations 3

Criterion 2.625000000 Max Grad Entry 0.000000000

Iter nact nfun stag optcrit difcrit deltax alpha funcres

1 1 2 1 2.354000 0.271000 0.10000 0.03333 .

2 1 3 1 1.955291 0.398709 0.20000 0.08180 .


3 1 4 1 1.487598 0.467693 0.40000 0.24330 .

4 1 5 2 0.734892 0.752706 0.80000 0.65357 1.00008

5 2 7 1 0.411440 0.323452 0.40000 0.43984 .

6 3 8 1 0.321166 0.090274 0.54110 0.31668 .

7 3 9 1 3.7e-003 0.317456 0.34435 2e-003 .

8 3 10 1 1.0e-005 3.7e-003 2e-003 3e-005 .

9 3 11 2 5.5e-011 1.0e-005 2e-005 1.00000 5e-005

10 3 12 2 2.2e-016 5.5e-011 5e-011 1.00000 7e-011

Successful Termination After 10 Iterations

XCONV convergence criterion satisfied.

Criterion 0.000000000 Max Grad Entry 0.000000000

N Function Calls 14 N Gradient Calls 1

N Jacobian Calls 14 Effective Time 0

********************

Optimization Results

********************

Parameter Estimates

-------------------

Parameter Estimate Gradient

1 X1 3.00000000 -0.5000000

2 X2 0.50000000 3.0000000

Value of Objective Function = 2.22045e-016

Function Values

---------------

Function Value

1 F1 2.220e-016

2 F2 0.00000000

3 F3 0.00000000

Jacobian Matrix

***************

| 1 2

--------------------------

1 | -0.5000000 3.0000000

2 | -0.7500000 3.0000000

3 | -0.8750000 2.2500000

The following are the results of the Nelder-Mead simplex algorithm.

x0 = [ 1. 1. ];

mopt = [ "linf" . ,

"tech" "nmsimp",


"print" 4 ];

< xr, rp> = nlp(fmadsn1,x0,mopt);

******************

Optimization Start

******************

Parameter Estimates

-------------------

Parameter Estimate

1 X1 1.00000000

2 X2 1.00000000

Value of Objective Function = 2.625

Function Values

---------------

Function Value

1 F1 1.50000000

2 F2 2.25000000

3 F3 2.62500000

Nelder-Mead Simplex Optimization

Iteration Start:

N. Variables 2 N. Equations 3

Criterion 2.625000000

Iter rest nfun act optcrit difcrit std delta size

1 0 12 0 0.40430 0.25195 0.1121 1.0000 0.4688

2 0 22 0 0.17969 0.05663 2e-002 1.0000 0.2136

3 0 32 0 0.01629 0.06527 3e-002 1.0000 0.1966

4 0 42 0 8e-003 9e-003 4e-003 1.0000 4e-002

5 0 52 0 2e-003 8e-004 3e-004 1.0000 7e-003

6 0 61 0 3e-004 5e-004 2e-004 1.0000 2e-003

7 0 71 0 1e-004 4e-005 2e-005 1.0000 4e-004

8 0 81 0 2e-005 3e-005 1e-005 1.0000 1e-004

9 0 91 0 3e-006 5e-006 2e-006 1.0000 2e-005

10 0 101 0 9e-007 1e-006 4e-007 1.0000 5e-006

Successful Termination After 10 Iterations

FCONV2 convergence criterion satisfied.

Criterion 0.000000905

N Function Calls 103 Effective Time 0

********************

Optimization Results

********************


Parameter Estimates

-------------------

Parameter Estimate

1 X1 2.99999933

2 X2 0.49999959

Value of Objective Function = 9.0495e-007

Function Values

---------------

Function Value

1 F1 -9.049e-007

2 F2 -7.375e-007

3 F3 -3.438e-007

We can use the MINMAX technique to solve the same nonlinear L∞ regression problem. However, we must then augment (by vertical concatenation) the function vector and the Jacobian matrix with their negative parts:

function fmadsn2(x) {

f0 = cons(3); r = x[2];

f0[1] = 1.5 - x[1] * (1. - r); r *= x[2];

f0[2] = 2.25 - x[1] * (1. - r); r *= x[2];

f0[3] = 2.625 - x[1] * (1. - r);

f = f0 |> (-f0);

return(f);

}

function jmadsn2(x) {

jac0 = cons(3,2); r = x[2];

jac0[1,1] = r - 1.; jac0[1,2] = x[1]; r *= x[2];

jac0[2,1] = r - 1.; jac0[2,2] = 2. * x[1] * x[2]; r *= x[2];

jac0[3,1] = r - 1.; jac0[3,2] = 3. * x[1] * x[2] * x[2];

jac = jac0 |> (-jac0);

return(jac);

}

The following input specification yields the same results as the NLIREG technique:

x0 = [ 1. 1. ];

mopt = [ "tech" "minmax",

"print" 4,

"xtol" 1.e-10 ];

< xr, rp> = nlp(fmadsn2,x0,mopt,.,jmadsn2);


1.7 Details on Support Vector Machines

1.7.1 Different Model Formulations

C Classification: Different versions are implemented:

1. The original dual model which is a QP with one linear equality constraint and lower and upper bounds.

2. The LS SVM modification of the classification problem by Suykens and Vandewalle (1999).

3. The Mangasarian & Musicant modification of the dual model which is a QP with only lower and upper bounds.

4. A few further modifications by Mangasarian which even get rid of the upper bounds, like PSVM, LSVM, and ASVM.

5. Another modification by Fung and Mangasarian which is very useful for variable selection when Nobs < nvar.

1. Primal:
\[ \min_{w,b,\xi} \;\; \tfrac{1}{2} w^T w + C e^T \xi \quad \text{subject to } y_i(w^T K_i + b) \ge 1 - \xi_i,\; \xi_i \ge 0,\; i = 1,\ldots,N \]

2. Dual:
\[ \min_{\alpha} \;\; \tfrac{1}{2} \alpha^T Q \alpha - e^T \alpha \quad \text{subject to } y^T\alpha = 0,\; 0 \le \alpha_i \le C,\; i = 1,\ldots,N \]

3. Mangasarian-Musicant modification of the Dual:
\[ \min_{\alpha} \;\; \tfrac{1}{2} \alpha^T (Q + y y^T) \alpha - e^T \alpha \quad \text{subject to } 0 \le \alpha_i \le C,\; i = 1,\ldots,N \]

ν Classification: Two versions are implemented:

1. The original dual model, which is a QP with two linear equality constraints and lower and upper bounds.

2. The Mangasarian & Musicant modification of the dual model, which is a QP with one linear equality constraint and lower and upper bounds.

1. Primal:
\[ \min \;\; \tfrac{1}{2} w^T w - \nu\rho + e^T \xi / N \quad \text{subject to } y_i(w^T K_i + b) \ge \rho - \xi_i,\; \xi_i \ge 0,\; i = 1,\ldots,N \]

2. Dual (scaled):
\[ \min \;\; \tfrac{1}{2} \alpha^T Q \alpha \quad \text{subject to } y^T\alpha = 0,\; e^T\alpha = N\nu,\; 0 \le \alpha_i \le 1,\; i = 1,\ldots,N \]

3. Mangasarian-Musicant modification of the Dual (scaled):
\[ \min \;\; \tfrac{1}{2} \alpha^T (Q + y y^T)\alpha \quad \text{subject to } e^T\alpha = N\nu,\; 0 \le \alpha_i \le 1,\; i = 1,\ldots,N \]


L2 Regression: The dual model which is a QP with one linear equality constraint and lower and upper bounds.

1. Primal:

2. Dual:

\[ \min_{\alpha} \;\; \tfrac{1}{2} \alpha^T Q \alpha - y^T \alpha \quad \text{subject to } e^T\alpha = 0,\; -C \le \alpha_i \le C,\; i = 1,\ldots,N \]

ε Regression: Two versions are implemented:

1. The dual model, which is a QP with one linear equality constraint and lower and upper bounds, requiring twice the number of parameters.

2. The primal model, which is an LP with a large number of linear and boundary constraints, requiring about four times the number of parameters.

1. Primal:
\[ \min \;\; \tfrac{1}{2} w^T w + C e^T \xi + C e^T \xi^* \quad \text{subject to } (w^T K_i + b) - y_i \le \varepsilon + \xi_i,\; y_i - (w^T K_i + b) \le \varepsilon + \xi_i^*,\; i = 1,\ldots,N \]

2. Dual:
\[ \min \;\; \tfrac{1}{2}(\alpha - \alpha^*)^T Q (\alpha - \alpha^*) + \varepsilon e^T(\alpha + \alpha^*) + y^T(\alpha + \alpha^*) \quad \text{subject to } e^T(\alpha - \alpha^*) = 0,\; 0 \le \alpha_i, \alpha_i^* \le C,\; i = 1,\ldots,N \]

ν Regression: Due to the sparsity of the resulting parameter vector, this model is very useful for variable selection when Nobs < nvar. The dual model is a QP with two linear equality constraints and lower and upper bounds, requiring twice the number of parameters.

1. Primal:
\[ \min \;\; \tfrac{1}{2} w^T w + C\Big[\nu\varepsilon + \tfrac{1}{N} e^T(\xi + \xi^*)\Big] \quad \text{subject to } (w^T K_i + b) - y_i \le \varepsilon + \xi_i,\; y_i - (w^T K_i + b) \le \varepsilon + \xi_i^*,\; i = 1,\ldots,N \]

2. Dual (scaled):
\[ \min \;\; \tfrac{1}{2}(\alpha - \alpha^*)^T Q(\alpha - \alpha^*) + y^T(\alpha - \alpha^*) \quad \text{subject to } e^T(\alpha - \alpha^*) = 0,\; e^T(\alpha + \alpha^*) = NC\nu,\; 0 \le \alpha_i, \alpha_i^* \le C,\; i = 1,\ldots,N \]

1.7.2 Binary Classification Problem

Find a parametric function, linear or nonlinear, for a hyperplane separating two sets of points in Rm.

Separable Classification Problem for Linear Kernel

The data set has N observations with predictors x_i = (x_{i1}, …, x_{im}) and binary responses (target values) y_i ∈ {−1, +1} which express cluster (group) membership. (Note that for mathematical convenience the target values are defined here with −1 and +1 rather than with 0 and 1.) If the two clusters are linearly separable then there are three planes:

1. Separating Hyperplane: H_0: y = w^T x − b = 0 separates the points of the two groups with target values +1 and −1


2. Right Hyperplane: H_1: y = w^T x − b = +1 contains at least one point x with target value +1 closest to H_0

3. Left Hyperplane: H_2: y = w^T x − b = −1 contains at least one point x with target value −1 closest to H_0

The points x on the left and right hyperplanes are called "support vectors". The orientation of the plane is defined so that the distance between H_1 and H_2 is maximal. Also, there should be no points in between H_1 and H_2.

Mathematical derivation of the optimization problem:

The distance from a point on H1 to H0 is

\[ \frac{w^T x - b}{\sqrt{w^T w}} = \frac{1}{\sqrt{w^T w}} \]

Maximizing the distance between H1 and H2 therefore means minimizing w^T w with respect to the m + 1 unknowns w = (w_1, …, w_m) and the scalar b, subject to the constraints that no point lies between H1 and H2:

\[ w^T x - b \ge +1 \;\text{ for } y = +1, \qquad w^T x - b \le -1 \;\text{ for } y = -1 \]

The two constraints can be combined into one:

\[ y(w^T x - b) \ge 1 \]

Therefore the primal separable problem contains m + 1 variables to estimate:

\[ (LP1) \quad \min_{w,b} \; \tfrac{1}{2} w^T w \quad \text{s.t. } y(w^T x - b) \ge 1 \]

Note that the parameter estimates w and b are determined only by the points on H1 and H2. The other points may be moved rather freely without changing the optimization result.
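As a quick numerical check, the geometric margin, i.e. the distance 2/√(wᵀw) between H1 and H2, can be computed directly from a given weight vector. The two lines below are only a sketch (w is an assumed name for the fitted weight vector); they agree with the "Geometric Margin" line reported by svm() for the heart data later in this section, where |w|² = 33.0166 gives about 0.348:

/* Sketch: geometric margin between H1 and H2 for weight vector w */
margin = 2. / sqrt(ssq(w));
print "Geometric margin=",margin;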

Inseparable Problem for Linear Kernel

In practical applications data are seldom geometrically separable, and for inseparable data the problem (LP1) has no feasible solution. Therefore the optimization problem must be modified to permit points between H1 and H2 and even on the wrong side of H0. But points that cross the boundaries should be penalized.

We introduce N slack variables ξ_i ≥ 0, i = 1, …, N, modify the constraints to

\[ w^T x_i - b \ge 1 - \xi_i \;\text{ for } y_i = +1, \qquad w^T x_i - b \le -1 + \xi_i \;\text{ for } y_i = -1, \]

and add a penalty term to the objective function. The new problem contains N + m + 1 unknowns (ξ, w, b) to estimate:

\[ (LP2) \quad \min_{w,b,\xi} \; \tfrac{1}{2} w^T w + C \Big( \sum_{i=1}^{N} \xi_i \Big)^{m} \quad \text{s.t. } y_i(w^T x_i - b) \ge 1 - \xi_i,\; \xi_i \ge 0,\; i = 1,\ldots,N, \]

for a given penalty constant C > 0 and for m = 1 or m = 2. From what I know, only O. Mangasarian (Univ. of Wisconsin, Madison, and Microsoft) uses the primal formulation among many other approaches.

Dual Estimation Problem

Using the Kuhn-Tucker conditions, the primal (LP2) for m = 1 can be expressed as a dual optimization problem, which is a QP:

\[ (QP) \quad \min_\alpha \; \tfrac{1}{2}\alpha^T Q\alpha - e^T\alpha \quad \text{s.t. } 0 \le \alpha_i \le C,\; i = 1,\ldots,N, \text{ and } y^T\alpha = 0, \]

where e = (1, …, 1), Q is an N × N positive semidefinite matrix with Q_{ij} = y_i y_j K(x_i, x_j), K(x_i, x_j) = Φ(x_i)^T Φ(x_j) is the kernel, and C > 0 is a given scalar.

For a linear kernel y_i = w^T x_i + b there is K(x_i, x_j) = x_i^T x_j.
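For the linear kernel the dual matrix Q can be written as an elementwise product. The following two lines are only a sketch with assumed names (xdat holds the N × m predictors, y the ±1 targets) and are not how svm() builds Q internally:

/* Sketch: Q[i,j] = y[i]*y[j]*x_i'x_j for the linear kernel */
kern = xdat * xdat‘;       /* N x N inner products x_i'x_j  */
q    = (y * y‘) .* kern;   /* elementwise scaling by y_i*y_j */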

Properties of the QP:

1. Matrix Q is positive semidefinite (usually with very low rank) and the constraints are convex.

2. Matrix Q is dense and very large (N × N), i.e., it cannot be kept in core; special optimization methods are necessary.

Three Types of Points

From the KKT (Karush-Kuhn-Tucker) conditions we derive that there are 3 types of points:

1. α_i = 0 and therefore ξ_i = 0, y_i(w^T x_i − b) = y_i f_i ≥ 1: these are good points without influence on the separating planes

2. 0 < α_i < C, ξ_i = 0, y_i(w^T x_i − b) = y_i f_i = 1: these are points on the separating planes (support vectors)

3. α_i = C, ξ_i > 0, y_i(w^T x_i − b) = y_i f_i = 1 − ξ_i: these are bad points on the wrong side of the separating planes
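A small sketch of how these three groups could be counted from a dual solution (alfa, the penalty constant c, and the tolerance eps are assumed names; compare the "Number Support Vectors" lines printed by svm() later in this section):

/* Sketch: classify points by the size of their dual coefficient */
eps = 1.e-8; ngood = 0; nsv = 0; nbad = 0;
for (i = 1; i <= nrow(alfa); i++) {
   if (alfa[i] <= eps)          ngood += 1;   /* type 1: no influence   */
   else if (alfa[i] >= c - eps) nbad += 1;    /* type 3: alpha_i = C    */
   else                         nsv += 1;     /* type 2: support vector */
}
print "N good, on margin, at bound=",ngood,nsv,nbad;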

1.7.3 SVM Regression Problem

The SVM regression problem is the equivalent of the binary classification problem, but for an interval-scaled response.

Important Loss Functions

For linear regression with f(x) = w^T x + b, different choices of loss function can be used:

1. Quadratic (Least Squares, L2)

The criterion to optimize is

\[ L_2 = (f(x) - y)^2 \]

This corresponds to a QP

\[ \min_\alpha \; \frac{1}{2} \sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j (x_i^T x_j) - \sum_{i=1}^{N} \alpha_i y_i + \frac{1}{2C} \sum_{i=1}^{N} \alpha_i^2 \]

subject to the constraint

\[ \sum_{i=1}^{N} \alpha_i = 0 \]

From the N estimates α_i we obtain the regression parameters with

\[ w = \sum_{i=1}^{N} \alpha_i x_i , \qquad b = -\frac{1}{2} w^T (x_r + x_s) , \]

where x_r is any support vector with y_r > 0 and x_s is any support vector with y_s < 0.
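A minimal sketch of this back-transformation for the linear kernel (alfa, xdat, xr, and xs are assumed names; xr and xs are 1 × m rows of xdat chosen as described above):

/* Sketch: w = sum_i alfa_i*x_i and b = -(1/2)*w'(xr + xs) */
w = xdat‘ * alfa;               /* m x 1 weight vector */
b = -.5 * (w‘ * (xr + xs)‘);    /* scalar bias         */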


2. Least Modulus (LAV, L1):

The criterion to optimize is nonsmooth,

\[ L_1 = |f(x) - y| \]

This corresponds to a QP

\[ \min_{\alpha,\beta} \; \frac{1}{2} \sum_{i=1}^{N}\sum_{j=1}^{N} (\alpha_i - \beta_i)(\alpha_j - \beta_j)(x_i^T x_j) - \sum_{i=1}^{N} y_i(\alpha_i + \beta_i) \]

subject to 2N + 1 constraints

\[ 0 \le \alpha_i, \beta_i \le C \quad \text{and} \quad \sum_{i=1}^{N} (\alpha_i - \beta_i) = 0 \]

This is the special case of the ε-insensitive loss function for ε = 0.

3. ε Insensitive:

For a given parameter ε > 0 the criterion to optimize is nonsmooth,

\[ L_\varepsilon = \begin{cases} 0 & \text{for } |f(x) - y| < \varepsilon \\ |f(x) - y| - \varepsilon & \text{otherwise} \end{cases} \]

This corresponds to a QP

\[ \min_{\alpha,\beta} \; \frac{1}{2} \sum_{i=1}^{N}\sum_{j=1}^{N} (\alpha_i - \beta_i)(\alpha_j - \beta_j)(x_i^T x_j) - \sum_{i=1}^{N} [\alpha_i(y_i - \varepsilon) + \beta_i(y_i + \varepsilon)] \]

subject to 2N + 1 constraints

\[ 0 \le \alpha_i, \beta_i \le C \quad \text{and} \quad \sum_{i=1}^{N} (\alpha_i - \beta_i) = 0 \]

From the 2N estimates α_i and β_i we obtain the regression parameters with

\[ w = \sum_{i=1}^{N} (\alpha_i - \beta_i) x_i , \qquad b = -\frac{1}{2} w^T (x_r + x_s) , \]

where x_r is any support vector with y_r > 0 and x_s is any support vector with y_s < 0.

We can draw a ±ε tube around the (nonlinear) regression function. Points inside the tube do not contribute to the fit. Points touching the tube have inactive parameter values, and points outside the tube have parameter values with α_i − β_i = ±C.

4. Robust Huber Regression

For a given parameter µ > 0 the criterion to optimize is nonsmooth,

\[ L_{\mathrm{Huber}} = \begin{cases} \frac{1}{2}(f(x) - y)^2 & \text{for } |f(x) - y| < \mu \\ \mu |f(x) - y| - \frac{\mu^2}{2} & \text{otherwise} \end{cases} \]

This corresponds to a QP

\[ \min_\alpha \; \frac{1}{2} \sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j (x_i^T x_j) - \sum_{i=1}^{N} \alpha_i y_i + \frac{1}{2C} \sum_{i=1}^{N} \mu \alpha_i^2 \]

subject to the constraints

\[ -C \le \alpha_i \le C \quad \text{and} \quad \sum_{i=1}^{N} \alpha_i = 0 \]


1.7.4 Examples

The heart scale data set contains 13 predictor variables and 270 observations, creating a QP problem with a 270 × 270 positive semidefinite (rank 13) kernel Hessian matrix. Here we specify a radial basis function kernel.

The first set of results shows a comparison of the four QP techniques (QPNUSP, QPRASP, QPMANP, and QPTRON) on the full 270 × 270 optimization problem with only bound constraints. This is the modification of the objective function proposed by Mangasarian and Musicant (1999), which does not impose the original linear constraint

\[ \sum_{i=1}^{N} \alpha_i y_i = 0 \]

We show the complete printed output only for the first run (null space technique). For the other optimization techniques we only show the output which differs and which is related to the computer resources needed by each technique.

1. Binary SVM Classification: Heart Data RBF2 Kernel

(a) SVM Classification: with LC

i. FQP: C is estimated

data = rspfile("heart_scale.dat");

modl = "1 = 2 : 14";

class = 1;

optn = [ "print" 2,

"ppred" ,

"popt" 1,

"meth" "fqp",

"kern" "rbf2",

"kfp1" .076923076 ];

< alfa,sres,vres,yptr > = svm(data,modl,optn,class);

Here we show the data set information output, which we will not repeat for the remaining runs.

*****************

Model Information

*****************

Number Valid Observations 270

Response Variable Y[1]

N Independend Variables 13

Support Vector Classification

Estimation Method FQP

Optimization Method QPNUSP

Kernel Function RBF2

Gamma of RBF 0.076923076

Activation Function Hard

*************

Model Effects

*************

X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10 + X11 + X12 +

X13 + X14

***********************

Class Level Information


***********************

Class Level Value

Y[ 1] 2 1 -1

*****************

Simple Statistics

*****************

Column Nobs Mean Std Dev Skewness Kurtosis

X[ 2] 270 0.0597222 0.3795444 -0.1636155 -0.5448153

X[ 3] 270 0.3555556 0.9363908 -0.7650843 -1.4252589

X[ 4] 270 0.4493827 0.6333933 -0.8787668 -0.2965479

X[ 5] 270 -0.2953878 0.3370115 0.7226183 0.9230968

X[ 6] 270 -0.4353459 0.2360102 1.1837208 4.8956003

X[ 7] 270 -0.7037037 0.7118130 1.9919706 1.9825778

X[ 8] 270 0.0222222 0.9978912 -0.0447032 -2.0053577

X[ 9] 270 0.2011875 0.3536751 -0.5277367 -0.1030720

X[10] 270 -0.3407407 0.9419032 0.7289148 -1.4796993

X[11] 270 -0.6612903 0.3694225 1.2628932 1.7593171

X[12] 270 -0.4148148 0.6143898 0.5431510 -0.6065718

X[13] 270 -0.5530864 0.6292642 1.2098901 0.2982373

X[14] 270 -0.1518519 0.9703295 0.2872680 -1.9006183

***************************************

Number of Observations for Class Levels

***************************************

Variable Value Nobs Proportion

Y[ 1] 1 120 44.444444

-1 150 55.555556

We show only a short form of the optimization history:

Null Space Active Set Method of Quadratic Problem

Using Dense Hessian

Successful Termination After 126 Iterations

ABSGCONV convergence criterion satisfied.

Criterion -107.7979538 Max Grad Entry 3.0531e-015

N. Active Constraints 247 Slope SDirect. -7.5682e-005

N. Function Calls 129 N. Gradient Calls 127

Preproces. Time 4 Time for Method 5

Effective Time 12

*******************************

Evaluation of Training Data Fit

*******************************

Index Value StdErr

Absolute Classification Error 38 .

Concordant Pairs 72.95555556 .


Discordant Pairs 1.955555556 .

Tied Pairs 25.08888889 .

Classification Accuracy 85.92592593 .

Goodman-Kruskal Gamma 0.947789973 0.018021781

Kendall Tau_b 0.714298727 0.042853057

Stuart Tau_c 0.701234568 0.043906720

Somers D C|R 0.710000000 0.043392908

Classification Table

--------------------

| Predicted

Observed | 1 -1

---------|-----------------

1 | 98 22

-1 | 16 134

Regularization Parameter C . . . . . . . . . . . . . . 1.08152

Norm of Longest Vector . . . . . . . . . . . . . . . . . . . 1

Number Misclassifactions (Training Data) . . . . . . . . . 38

Total Number of Kernel Calls . . . . . . . . . . . . . 110552

Time for Optimization. . . . . . . . . . . . . . . . . . . 12

Total Processing Time. . . . . . . . . . . . . . . . . . . 13

Optimization Criterion . . . . . . . . . . . . . . . -107.798

Infinity Norm of Gradient. . . . . . . . . . . . 3.05311e-015

Geometric Margin . . . . . . . . . . 0.348068 (|w|^2= 33.0166)

Number Support Vectors . . . . . . . . . . . . 128 ( 47.41 %)

Number Support Vectors on Margin . . . . . . . . . . . . . 104

Bias . . . . . . . . . . . . . . . . . . . . . . . . 0.305383

Radius of Sphere Around SV . . . . . . . . . . . . . . 1.05172

Estimated VCdim of Classifier. . . . . . . . . . . . . . 37.52

KT Threshold . . . . . . . . . . . . . . . . . . . . 0.443593

Total Number of Kernel Calls: 110552

Time for Optimization: 12

Total Processing Time: 13

For space reasons we do not report predicted values and residuals.

ii. FQP: C is specified

Now we specify C as a vector.

data = rspfile("heart_scale.dat");

modl = "1 = 2 : 14";

class = 1;

cup = [ .01 .1 1. 10. 100. ];

optn = [ "print" 2,

"popt" 1,

"c" "cup",

"meth" "fqp",

"kern" "rbf2",

"kfp1" .076923076 ];

< alfa,sres,vres,yptr > = svm(data,modl,optn,class);

There are five optimizations for the different C values. Predicted values and results are returned only for the run with the fewest misclassifications.
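Conceptually, the loop over the C grid simply refits the model for each value and keeps the best run. A minimal sketch in Python (fit_svm is a hypothetical helper standing in for one svm() call; this is not CMAT code):

# Schematic model selection over the regularization parameter C.
# fit_svm(X, y, C) is a hypothetical trainer that returns the number of
# training misclassifications and the fitted model for one value of C.
def select_best_c(X, y, c_grid, fit_svm):
    best = None
    for c in c_grid:                      # e.g. [0.01, 0.1, 1.0, 10.0, 100.0]
        n_mis, model = fit_svm(X, y, c)
        if best is None or n_mis < best[0]:
            best = (n_mis, c, model)      # keep the run with fewest errors
    return best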


****************************************

**********-Fitting for C=0.01-**********

****************************************

Successful Termination After 121 Iterations

ABSGCONV convergence criterion satisfied.

Criterion -2.287406517 Max Grad Entry 9.9920e-016

N. Active Constraints 269 Slope SDirect. -7.7316e-008

N. Function Calls 124 N. Gradient Calls 122

Preproces. Time 3 Time for Method 6

Effective Time 12

| Predicted

Observed | 1 -1

---------|-----------------

1 | 93 27

-1 | 17 133

***************************************

**********-Fitting for C=0.1-**********

***************************************

Successful Termination After 30 Iterations

ABSGCONV convergence criterion satisfied.

Criterion -15.70547607 Max Grad Entry 1.0773e-015

N. Active Constraints 264 Slope SDirect. -5.7830e-005

N. Function Calls 33 N. Gradient Calls 31

Preproces. Time 4 Time for Method 3

Effective Time 10

Classification Table

--------------------

| Predicted

Observed | 1 -1

---------|-----------------

1 | 98 22

-1 | 22 128

*************************************

**********-Fitting for C=1-**********

*************************************

Successful Termination After 71 Iterations

ABSGCONV convergence criterion satisfied.

Criterion -100.8772911 Max Grad Entry 1.5543e-015

N. Active Constraints 246 Slope SDirect. -0.000214503

N. Function Calls 74 N. Gradient Calls 72

Preproces. Time 3 Time for Method 5

Effective Time 11

Classification Table

--------------------


| Predicted

Observed | 1 -1

---------|-----------------

1 | 98 22

-1 | 16 134

**************************************

**********-Fitting for C=10-**********

**************************************

Successful Termination After 147 Iterations

GCONV convergence criterion satisfied.

Criterion -660.4285631 Max Grad Entry 2.2871e-014

N. Active Constraints 211 Slope SDirect. -2.3224e-007

N. Function Calls 150 N. Gradient Calls 148

Preproces. Time 3 Time for Method 10

Effective Time 15

Classification Table

--------------------

| Predicted

Observed | 1 -1

---------|-----------------

1 | 107 13

-1 | 8 142

***************************************

**********-Fitting for C=100-**********

***************************************

Successful Termination After 145 Iterations

ABSGCONV convergence criterion satisfied.

Criterion -2526.924371 Max Grad Entry 2.5890e-013

N. Active Constraints 173 Slope SDirect. -0.005600855

N. Function Calls 148 N. Gradient Calls 146

Preproces. Time 3 Time for Method 24

Effective Time 30

Classification Table

--------------------

| Predicted

Observed | 1 -1

---------|-----------------

1 | 116 4

-1 | 1 149

***********************************************

**********-Evaluation for Best C=100-**********

***********************************************


Regularization Parameter C . . . . . . . . . . . . . . . . 100

Norm of Longest Vector . . . . . . . . . . . . . . . . . . . 1

Number Misclassifactions (Training Data) . . . . . . . . . . 5

Total Number of Kernel Calls . . . . . . . . . . . . . 441359

Time for Optimization. . . . . . . . . . . . . . . . . . . 30

Total Processing Time. . . . . . . . . . . . . . . . . . . 79

| N_Sup_Vec N_SV_Marg OptimCrit L1_Loss Grad_Norm

-----------------------------------------------------------

c0.01 | 241.00000 239.00000 -2.2874065 . 9.99e-016

c0.1 | 197.00000 190.00000 -15.705476 . 1.08e-015

c1 | 132.00000 107.00000 -100.87729 . 1.55e-015

c10 | 115.00000 55.000000 -660.42856 . 2.29e-014

c100 | 107.00000 9.0000000 -2526.9244 . 2.59e-013

| Beta(PCE) SV_Radius Bias WGT_Norm Margin

-----------------------------------------------------------

c0.01 | 0.7994501 1.0625912 0.0035426 0.2251870 4.2146195

c0.1 | 0.0159088 1.0517169 -0.0432388 7.1369127 0.7486431

c1 | 0.4245077 1.0517169 0.2882442 31.002196 0.3591979

c10 | 0.8566082 1.0517169 0.7801614 342.34947 0.1080924

c100 | 1.1455139 1.0517169 1.0862709 2940.5750 0.0368819

| VC_Dim Mis_Train

--------------------------

c0.01 | 1.2542586 44.000000

c0.1 | 8.8941990 44.000000

c1 | 35.291789 38.000000

c10 | 379.67561 21.000000

c100 | 3253.5946 5.0000000

iii. DQP: C is specified

data = rspfile("heart_scale.dat");

modl = "1 = 2 : 14";

class = 1;

cup = [ 1. 10. 100. ];

optn = [ "print" 2,

"c" "cup",

"meth" "dqp",

"kern" "rbf2",

"kfp1" .076923076 ];

< alfa,sres,vres,yptr > = svm(data,modl,optn,class);

*************************************

**********-Fitting for C=1-**********

*************************************

Iter Nfun Term Size NSV BSV Xdif Function GradNorm

1 23 AGTol 30 24 16 4.45e+000 -1.462e+001 2.320e+000

2 24 AGTol 30 51 32 4.46e+000 -3.550e+001 3.667e+000

3 20 AGTol 30 66 38 3.07e+000 -4.749e+001 1.722e+000

4 40 AGTol 30 84 54 3.86e+000 -6.170e+001 2.233e+000

5 34 AGTol 30 103 67 3.47e+000 -7.384e+001 2.287e+000


6 31 AGTol 30 113 79 3.50e+000 -8.526e+001 1.766e+000

7 30 AGTol 30 119 84 2.67e+000 -9.068e+001 1.418e+000

8 37 AGTol 30 124 96 3.28e+000 -9.545e+001 1.410e+000

9 31 AGTol 30 131 100 2.49e+000 -9.825e+001 6.169e-001

10 22 AGTol 30 133 101 2.36e+000 -9.982e+001 7.412e-001

11 13 AGTol 30 133 104 9.57e-001 -1.003e+002 3.991e-001

12 15 AGTol 30 130 108 1.49e+000 -1.008e+002 2.249e-001

13 7 AGTol 30 132 107 7.07e-001 -1.009e+002 2.109e-015

Number Successful Optimizations: 13 : 100.00 %

Residual of Linear Equality Constraint: -6.21725e-015

Number Kernel Evaluations: 111345

Number Misclassifactions: 0

Classification Table

--------------------

| Predicted

Observed | 1 -1

---------|-----------------

1 | 98 22

-1 | 16 134

**************************************

**********-Fitting for C=10-**********

**************************************

Iter Nfun Term Size NSV BSV Xdif Function GradNorm

1 32 AGTol 30 112 103 2.37e+001 -1.707e+002 8.241e+000

2 38 AGTol 30 107 80 3.12e+001 -5.235e+002 5.705e+000

3 24 AGTol 30 108 67 1.73e+001 -5.866e+002 2.712e+000

...................................................................

95 5 RGTol 30 114 55 4.55e-003 -6.604e+002 1.286e-003

96 5 AGTol 30 114 55 5.79e-003 -6.604e+002 1.139e-003

97 5 RGTol 30 114 55 6.00e-003 -6.604e+002 1.207e-003

98 5 AGTol 30 114 55 5.70e-003 -6.604e+002 1.059e-003

99 5 RGTol 30 114 55 7.50e-003 -6.604e+002 1.141e-003

100 5 AGTol 30 114 55 4.68e-003 -6.604e+002 8.537e-004

Expand: -120 N(act)= 179 N(inact)= 91

101 5 RGTol 30 114 55 4.36e-003 -6.604e+002 9.492e-004

Expand: -91 N(act)= 270 N(inact)= 0

102 5 AGTol 30 115 55 3.03e-003 -6.604e+002 7.081e-004

Number Successful Optimizations: 102 : 100.00 %

Residual of Linear Equality Constraint: -9.59233e-014

Number Kernel Evaluations: 864930

Number Misclassifactions: 0

Classification Table

--------------------

| Predicted

Observed | 1 -1


---------|-----------------

1 | 107 13

-1 | 8 142

***************************************

**********-Fitting for C=100-**********

***************************************

The DQP algorithm shows some convergence problems when the C value is so high that many parameters are free rather than at the upper bound. For smaller C values the strategy of shrinking and expanding the working set can save much computer time; with many free parameters this strategy has little effect.
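The shrink/expand strategy can be pictured as the schematic decomposition loop below (NumPy; solve_subqp is a hypothetical small dense-QP solver for the working set, and the linear equality constraint is omitted for brevity — this is not CMAT's DQP implementation):

import numpy as np

# Schematic working-set decomposition for the dual
#   minimize 0.5*a'Qa - sum(a)  subject to  0 <= a <= C,
# where Q[i,j] = y_i*y_j*K(x_i,x_j).  solve_subqp(Qww, lin, C) is a
# hypothetical solver for  min 0.5*x'Qww*x - lin'x,  0 <= x <= C.
def dqp_schematic(Q, C, solve_subqp, qpsize=30, tol=1e-3, max_outer=500):
    n = Q.shape[0]
    alpha = np.zeros(n)
    for _ in range(max_outer):
        grad = Q @ alpha - 1.0
        # KKT violation: grad >= 0 at the lower bound, <= 0 at the upper
        # bound, and approximately 0 for free variables
        viol = np.where(alpha <= 0.0, np.minimum(grad, 0.0),
               np.where(alpha >= C, np.maximum(grad, 0.0), grad))
        if np.max(np.abs(viol)) < tol:
            break                                   # all conditions satisfied
        work = np.argsort(-np.abs(viol))[:qpsize]   # most violating variables
        rest = np.setdiff1d(np.arange(n), work)
        lin = 1.0 - Q[np.ix_(work, rest)] @ alpha[rest]
        alpha[work] = solve_subqp(Q[np.ix_(work, work)], lin, C)
    return alpha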

Iter Nfun Term Size NSV BSV Xdif Function GradNorm

1 9 AGTol 30 113 57 1.67e+002 6.138e+003 2.923e+001

2 26 AGTol 30 112 40 1.67e+002 2.936e+003 1.449e+001

3 14 AGTol 30 109 35 6.78e+001 2.079e+003 1.557e+001

...................................................................

2409 5 RGTol 30 107 9 4.53e-003 -2.527e+003 1.339e-003

2410 5 RGTol 30 107 9 3.05e-003 -2.527e+003 9.988e-004

Expand: -101 N(act)= 199 N(inact)= 71

2411 5 RGTol 30 107 9 2.73e-003 -2.527e+003 1.266e-003

2412 5 RGTol 30 107 9 4.34e-003 -2.527e+003 9.624e-004

Expand: -71 N(act)= 270 N(inact)= 0

2413 5 RGTol 30 107 9 3.15e-003 -2.527e+003 1.182e-003

2414 5 RGTol 30 107 9 3.48e-003 -2.527e+003 9.496e-004

Number Successful Optimizations: 2414 : 100.00 %

Residual of Linear Equality Constraint: -1.54898e-012

Number Kernel Evaluations: 8.96001e+006

Number Misclassifactions: 0

Classification Table

--------------------

| Predicted

Observed | 1 -1

---------|-----------------

1 | 116 4

-1 | 1 149

Regularization Parameter C . . . . . . . . . . . . . . . . 100

Norm of Longest Vector . . . . . . . . . . . . . . . . . . . 1

Number Misclassifactions (Training Data) . . . . . . . . . . 5

Total Number of Kernel Calls . . . . . . . . . . 1.01938e+007

Time for Optimization. . . . . . . . . . . . . . . . . . . 202

Total Processing Time. . . . . . . . . . . . . . . . . . . 219

iv. SMO: C is set: always QPsize=2

To illustrate SMO, a smaller data set of only 60 cases is selected.

data = rspfile("heart_60.dat");

modl = "1 = 2 : 14";

class = 1;

optn = [ "print" 2,


"c" 100.,

"meth" "smo",

"kern" "rbf2",

"kfp1" .076923076 ];

< alfa,sres,vres,yptr,yptt > = svm(data,modl,optn,class);

*************************************

Sequential Minimal Optimization (SMO)

Original Algorithm by Platt (1997)

*************************************

Case Crit Nkrn Nchg Time

1 18.01434386 7992 37 0

2 202.4037535 954365 3158 9

1 202.4037535 961073 0 9

Classification Table

--------------------

| Predicted

Observed | 1 -1

---------|-----------------

1 | 27 0

-1 | 0 33

Regularization Parameter C . . . . . . . . . . . . . . . . 100

Norm of Longest Vector . . . . . . . . . . . . . . . . . . . 1

Number Misclassifactions (Training Data) . . . . . . . . . . 0

Total Number of Kernel Calls . . . . . . . . . . . . . 962957

Time for Optimization. . . . . . . . . . . . . . . . . . . . 9

Total Processing Time. . . . . . . . . . . . . . . . . . . . 9

Optimization Criterion . . . . . . . . . . . . . . . . 202.404

Geometric Margin . . . . . . . . . . 0.120794 (|w|^2= 274.139)

Number Support Vectors . . . . . . . . . . . . . 27 ( 45.00 %)

Number Support Vectors on Margin . . . . . . . . . . . . . . 0

Bias . . . . . . . . . . . . . . . . . . . . . . . . 0.122728

Radius of Sphere Around SV . . . . . . . . . . . . . . 1.04119

Estimated VCdim of Classifier. . . . . . . . . . . . . 298.185

KT Threshold . . . . . . . . . . . . . . . . . . . . . 0.12272

Total Number of Kernel Calls: 962957

Time for Optimization: 9

Total Processing Time: 9
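SMO never solves a QP larger than two variables; the two selected multipliers are updated in closed form and clipped to the feasible box. A minimal sketch of one such update (standard Platt-style formulas; the working-pair selection and error caching of the actual implementation are omitted):

# One clipped two-variable SMO step (Platt-style formulas).  E1 and E2 are
# the current prediction errors f(x_i) - y_i; K11, K22, K12 are kernel values.
def smo_pair_update(a1, a2, y1, y2, E1, E2, K11, K22, K12, C):
    if y1 == y2:
        lo, hi = max(0.0, a1 + a2 - C), min(C, a1 + a2)
    else:
        lo, hi = max(0.0, a2 - a1), min(C, C + a2 - a1)
    eta = K11 + K22 - 2.0 * K12              # curvature along the update line
    if lo >= hi or eta <= 0.0:
        return a1, a2                         # degenerate pair: skip in sketch
    a2_new = min(hi, max(lo, a2 + y2 * (E1 - E2) / eta))
    a1_new = a1 + y1 * y2 * (a2 - a2_new)     # keeps sum(alpha_i*y_i) unchanged
    return a1_new, a2_new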

v. SMO2(Keerthi): C is set: always QPsize=2

data = rspfile("heart_60.dat");

modl = "1 = 2 : 14";

class = 1;

optn = [ "print" 2,

"c" 100.,

"meth" "smo2",

"kern" "rbf2",

"kfp1" .076923076 ];

< alfa,sres,vres,yptr,yptt > = svm(data,modl,optn,class);


*************************************

Sequential Minimal Optimization (SMO)

Mod. 1 by Keerthi et.al. (1999)

*************************************

Case Crit Nkrn Nchg Time

1 2.446625198 4912 20 0

2 -9.163004322 108689 654 1

1 -3.641965784 110787 12 1

2 -13.31799185 258790 920 3

1 -13.46247172 260033 7 3

2 -14.26023967 320164 351 3

1 -14.25084083 320451 1 3

2 -14.41799857 326963 36 3

1 -14.41799857 327086 0 3

Classification Table

--------------------

| Predicted

Observed | 1 -1

---------|-----------------

1 | 21 6

-1 | 0 33

Regularization Parameter C . . . . . . . . . . . . . . . . 100

Norm of Longest Vector . . . . . . . . . . . . . . . . . . . 1

Number Misclassifactions (Training Data) . . . . . . . . . . 6

Total Number of Kernel Calls . . . . . . . . . . . . . 328960

Time for Optimization. . . . . . . . . . . . . . . . . . . . 3

Total Processing Time. . . . . . . . . . . . . . . . . . . . 3

Optimization Criterion . . . . . . . . . . . . . . . . -14.418

Geometric Margin . . . . . . . . . . 0.213365 (|w|^2= 87.8649)

Number Support Vectors . . . . . . . . . . . . . 22 ( 36.67 %)

Number Support Vectors on Margin . . . . . . . . . . . . . . 0

Bias . . . . . . . . . . . . . . . . . . . . . . . . 0.927987

Radius of Sphere Around SV . . . . . . . . . . . . . . 1.03111

Estimated VCdim of Classifier. . . . . . . . . . . . . 94.4168

KT Threshold . . . . . . . . . . . . . . . . . . . . . . . . 0

Total Number of Kernel Calls: 328960

Time for Optimization: 3

Total Processing Time: 3

(b) SVM Classification: w/o LC

i. FQP: C is estimated

data = rspfile("heart_scale.dat");

modl = "1 = 2 : 14";

class = 1;

optn = [ "print" 2,

"ppred" ,

"popt" 1,


"bconly" ,

"meth" "fqp",

"kern" "rbf2",

"kfp1" .076923076 ];

< alfa,sres,vres,yptr > = svm(data,modl,optn,class);

Successful Termination After 21 Iterations

FCONV convergence criterion satisfied.

Criterion -107.8705093 Max Grad Entry 5.1070e-015

N. Active Constraints 246

N. Function Calls 45 N. Gradient Calls 44

N. Line Searches 465 Preproces. Time 0

Time for Method 4 Effective Time 4

Classification Table

--------------------

| Predicted

Observed | 1 -1

---------|-----------------

1 | 98 22

-1 | 16 134

Regularization Parameter C . . . . . . . . . . . . . . 1.08152

Norm of Longest Vector . . . . . . . . . . . . . . . . 1.41421

Number Misclassifactions (Training Data) . . . . . . . . . 38

Total Number of Kernel Calls . . . . . . . . . . . . . 110554

Time for Optimization. . . . . . . . . . . . . . . . . . . . 4

Total Processing Time. . . . . . . . . . . . . . . . . . . . 5

Optimization Criterion . . . . . . . . . . . . . . . -107.871

Infinity Norm of Gradient. . . . . . . . . . . . 5.10703e-015

Geometric Margin . . . . . . . . . . 0.348002 (|w|^2= 33.0291)

Number Support Vectors . . . . . . . . . . . . 129 ( 47.78 %)

Number Support Vectors on Margin . . . . . . . . . . . . . 105

Bias . . . . . . . . . . . . . . . . . . . . . . . . -0.134973

Radius of Sphere Around SV . . . . . . . . . . . . . . 1.05172

Estimated VCdim of Classifier. . . . . . . . . . . . . 37.5338

KT Threshold . . . . . . . . . . . . . . . . . . . . -1.41667

Total Number of Kernel Calls: 110554

Time for Optimization: 4

Total Processing Time: 5

ii. FQP: C is specified

data = rspfile("heart_scale.dat");

modl = "1 = 2 : 14";

class = 1;

cup = [ .01 .1 1. 10. 100. ];

optn = [ "print" 2,

"ppred" ,

"popt" 1,

"bconly" ,

"c" "cup",

"meth" "fqp",


"kern" "rbf2",

"kfp1" .076923076 ];

< alfa,sres,vres,yptr > = svm(data,modl,optn,class);

****************************************

**********-Fitting for C=0.01-**********

****************************************

Successful Termination After 2 Iterations

FCONV convergence criterion satisfied.

Criterion -2.443959528 Max Grad Entry 0.000000000

N. Active Constraints 270

N. Function Calls 7 N. Gradient Calls 6

N. Line Searches 7 Preproces. Time 0

Time for Method 1 Effective Time 1

Classification Table

--------------------

| Predicted

Observed | 1 -1

---------|-----------------

1 | 84 36

-1 | 9 141

***************************************

**********-Fitting for C=0.1-**********

***************************************

Successful Termination After 30 Iterations

FCONV convergence criterion satisfied.

Criterion -15.70556366 Max Grad Entry 4.4409e-016

N. Active Constraints 263

N. Function Calls 63 N. Gradient Calls 62

N. Line Searches 427 Preproces. Time 0

Time for Method 4 Effective Time 4

Classification Table

--------------------

| Predicted

Observed | 1 -1

---------|-----------------

1 | 98 22

-1 | 22 128

*************************************

**********-Fitting for C=1-**********

*************************************

Successful Termination After 67 Iterations

FCONV convergence criterion satisfied.

Criterion -100.9441107 Max Grad Entry 6.6613e-015


N. Active Constraints 247

N. Function Calls 137 N. Gradient Calls 136

N. Line Searches 1236 Preproces. Time 0

Time for Method 11 Effective Time 11

Classification Table

--------------------

| Predicted

Observed | 1 -1

---------|-----------------

1 | 98 22

-1 | 16 134

**************************************

**********-Fitting for C=10-**********

**************************************

Successful Termination After 9 Iterations

FCONV convergence criterion satisfied.

Criterion -660.7233660 Max Grad Entry 6.9278e-014

N. Active Constraints 211

N. Function Calls 21 N. Gradient Calls 20

N. Line Searches 195 Preproces. Time 0

Time for Method 2 Effective Time 2

Classification Table

--------------------

| Predicted

Observed | 1 -1

---------|-----------------

1 | 107 13

-1 | 8 142

***************************************

**********-Fitting for C=100-**********

***************************************

Successful Termination After 7 Iterations

FCONV convergence criterion satisfied.

Criterion -2527.481829 Max Grad Entry 6.3949e-013

N. Active Constraints 172

N. Function Calls 17 N. Gradient Calls 16

N. Line Searches 130 Preproces. Time 0

Time for Method 1 Effective Time 1

Classification Table

--------------------

| Predicted

Observed | 1 -1

---------|-----------------

1 | 116 4


-1 | 1 149

Regularization Parameter C . . . . . . . . . . . . . . . . 100

Norm of Longest Vector . . . . . . . . . . . . . . . . 1.41421

Number Misclassifactions (Training Data) . . . . . . . . . . 5

Total Number of Kernel Calls . . . . . . . . . . . . . 441413

Time for Optimization. . . . . . . . . . . . . . . . . . . . 1

Total Processing Time. . . . . . . . . . . . . . . . . . . 20

| N_Sup_Vec N_SV_Marg OptimCrit L1_Loss Grad_Norm

-----------------------------------------------------------

c0.01 | 270.00000 270.00000 -2.4439595 . 0.0000000

c0.1 | 197.00000 190.00000 -15.705564 . 4.44e-016

c1 | 131.00000 107.00000 -100.94411 . 6.66e-015

c10 | 114.00000 55.000000 -660.72337 . 6.93e-014

c100 | 107.00000 9.0000000 -2527.4818 . 6.39e-013

| Beta(PCE) SV_Radius Bias WGT_Norm Margin

-----------------------------------------------------------

c0.01 | 0.0000000 1.0625912 -0.3529093 0.5120809 2.7948641

c0.1 | -1.1428571 1.0517169 -0.0586292 7.1374103 0.7486170

c1 | -1.3333333 1.0517169 -0.1360942 30.952372 0.3594869

c10 | -1.1864407 1.0517169 -0.0773459 342.60369 0.1080523

c100 | -1.0612245 1.0517169 -0.0592445 2943.1600 0.0368657

| VC_Dim Mis_Train

--------------------------

c0.01 | 1.5781906 45.000000

c0.1 | 8.8947494 44.000000

c1 | 35.236678 38.000000

c10 | 379.95681 21.000000

c100 | 3256.4540 5.0000000

iii. DQP: C is specified

data = rspfile("heart_scale.dat");

modl = "1 = 2 : 14";

class = 1;

cup = [ 1. 10. 100. ];

optn = [ "print" 2,

"bconly" ,

"c" "cup",

"meth" "dqp",

"kern" "rbf2",

"kfp1" .076923076 ];

< alfa,sres,vres,yptr > = svm(data,modl,optn,class);

*************************************

**********-Fitting for C=1-**********

*************************************

Iter Nfun Term Size NSV BSV Xdif Function GradNorm

1 14 RFTol 30 25 16 4.45e+000 -1.465e+001 2.317e+000


2 16 RFTol 30 35 25 3.29e+000 -2.760e+001 2.383e+000

3 12 RFTol 30 45 37 2.57e+000 -3.745e+001 2.214e+000

4 12 RFTol 30 58 48 2.65e+000 -4.748e+001 2.623e+000

5 16 RFTol 30 70 61 3.59e+000 -6.182e+001 2.132e+000

6 14 RFTol 30 80 71 2.26e+000 -6.889e+001 2.517e+000

7 12 RFTol 30 89 83 3.09e+000 -7.890e+001 2.013e+000

8 10 RFTol 30 96 90 2.03e+000 -8.359e+001 2.453e+000

9 12 RFTol 30 105 97 2.46e+000 -8.989e+001 1.308e+000

10 12 RFTol 30 113 105 2.60e+000 -9.435e+001 1.078e+000

11 10 RFTol 30 120 108 2.47e+000 -9.712e+001 8.256e-001

12 12 RFTol 30 122 106 2.88e+000 -9.975e+001 5.194e-001

13 12 RFTol 30 125 107 2.35e+000 -1.008e+002 1.240e-001

14 8 RFTol 30 130 107 1.61e+000 -1.009e+002 3.848e-002

15 8 RFTol 30 130 107 1.25e-001 -1.009e+002 3.071e-004

Number Successful Optimizations: 15 : 100.00 %

Number Kernel Evaluations: 128475

Number Misclassifactions: 0

Classification Table

--------------------

| Predicted

Observed | 1 -1

---------|-----------------

1 | 98 22

-1 | 16 134

**************************************

**********-Fitting for C=10-**********

**************************************

***************************************

Decomposition QP Using QPTRON Technique

***************************************

Iter Nfun Term Size NSV BSV Xdif Function GradNorm

1 10 RFTol 30 114 104 2.49e+001 -1.948e+002 8.944e+000

2 10 RFTol 30 107 92 1.77e+001 -3.853e+002 7.402e+000

3 8 RFTol 30 102 81 2.60e+001 -5.548e+002 3.963e+000

...................................................................

30 6 RFTol 30 114 55 1.73e-002 -6.607e+002 9.315e-004

Expand: -211 N(act)= 270 N(inact)= 0

31 8 RFTol 30 113 55 4.26e-002 -6.607e+002 1.096e-003

32 8 RFTol 30 113 55 2.88e-002 -6.607e+002 8.193e-004

Number Successful Optimizations: 32 : 100.00 %

Number Kernel Evaluations: 278925

Number Misclassifactions: 0

Classification Table

--------------------


| Predicted

Observed | 1 -1

---------|-----------------

1 | 107 13

-1 | 8 142

***************************************

**********-Fitting for C=100-**********

***************************************

Iter Nfun Term Size NSV BSV Xdif Function GradNorm

1 10 RFTol 30 109 55 1.44e+002 6.079e+003 1.793e+001

2 10 RFTol 30 103 44 2.12e+002 2.704e+003 1.601e+001

3 10 RFTol 30 101 40 1.44e+002 8.636e+002 1.273e+001

...................................................................

200 6 RFTol 30 107 9 2.37e-002 -2.527e+003 9.744e-004

Expand: -172 N(act)= 270 N(inact)= 0

201 6 RFTol 30 107 9 3.54e-002 -2.527e+003 1.231e-003

202 6 RFTol 30 107 9 3.08e-002 -2.527e+003 1.291e-003

203 6 RFTol 30 107 9 6.63e-002 -2.527e+003 1.262e-003

204 6 RFTol 30 107 9 1.91e-002 -2.527e+003 1.020e-003

205 6 RFTol 30 107 9 3.69e-002 -2.527e+003 1.494e-003

206 6 RFTol 30 107 9 5.12e-002 -2.527e+003 1.295e-003

207 6 RFTol 30 107 9 6.64e-002 -2.527e+003 9.240e-004

Number Successful Optimizations: 207 : 100.00 %

Number Kernel Evaluations: 1.19813e+006

Number Misclassifactions: 0

Classification Table

--------------------

| Predicted

Observed | 1 -1

---------|-----------------

1 | 116 4

-1 | 1 149

Compared to the original objective function with the linear equality constraint, we obtain almost the same results at much lower cost:

Regularization Parameter C . . . . . . . . . . . . . . . . 100

Norm of Longest Vector . . . . . . . . . . . . . . . . 1.41421

Number Misclassifactions (Training Data) . . . . . . . . . . 5

Total Number of Kernel Calls . . . . . . . . . . 1.86308e+006

Time for Optimization. . . . . . . . . . . . . . . . . . . 13

Total Processing Time. . . . . . . . . . . . . . . . . . . 19

iv. PSVM: C is specified

data = rspfile("heart_scale.dat");

modl = "1 = 2 : 14";

class = 1;

cup = [ 1. 10. 100. ];


optn = [ "print" 2,

"bconly" ,

"c" "cup",

"meth" "psv",

"kern" "rbf2",

"kfp1" .076923076 ];

< alfa,sres,vres,yptr > = svm(data,modl,optn,class);

*************************************

**********-Fitting for C=1-**********

*************************************

Classification Table

--------------------

| Predicted

Observed | 1 -1

---------|-----------------

1 | 100 20

-1 | 12 138

**************************************

**********-Fitting for C=10-**********

**************************************

Classification Table

--------------------

| Predicted

Observed | 1 -1

---------|-----------------

1 | 110 10

-1 | 4 146

***************************************

**********-Fitting for C=100-**********

***************************************

Classification Table

--------------------

| Predicted

Observed | 1 -1

---------|-----------------

1 | 117 3

-1 | 0 150

Regularization Parameter C . . . . . . . . . . . . . . . . 100

Norm of Longest Vector . . . . . . . . . . . . . . . . 1.41421

Number Misclassifactions (Training Data) . . . . . . . . . . 3

Total Number of Kernel Calls . . . . . . . . . . . . . 295381

Time for Optimization. . . . . . . . . . . . . . . . . . . . 0

Total Processing Time. . . . . . . . . . . . . . . . . . . . 4


| N_Sup_Vec N_SV_Marg OptimCrit L1_Loss Grad_Norm

-----------------------------------------------------------

c1 | 270.00000 32.000000 . . .

c10 | 270.00000 14.000000 . . .

c100 | 270.00000 3.0000000 . . .

| Beta(PCE) SV_Radius Bias WGT_Norm Margin

-----------------------------------------------------------

c1 | 0.0575286 1.0625912 -5.30e-004 15.520000 0.5076731

c10 | 0.0575286 1.0625912 -0.0014095 178.22138 0.1498132

c100 | 0.0575286 1.0625912 -1.80e-004 1578.8435 0.0503339

| VC_Dim Mis_Train

--------------------------

c1 | 18.523633 32.000000

c10 | 202.22977 14.000000

c100 | 1783.6723 3.0000000

v. LSVM: C is specified

data = rspfile("heart_scale.dat");

modl = "1 = 2 : 14";

class = 1;

cup = [ 1. 10. 100. ];

optn = [ "print" 2,

"bconly" ,

"c" "cup",

"meth" "lsv",

"kern" "rbf2",

"kfp1" .076923076 ];

< alfa,sres,vres,yptr > = svm(data,modl,optn,class);

*************************************

**********-Fitting for C=1-**********

*************************************

LSVM Iteration History

iter absxdif relxdif xnorm

1 0.34497814 0.19440537 9.98582531

2 0.20157311 0.11231360 9.97333800

3 0.15325082 0.08536120 9.96589865

........................................

75 2.220e-005 1.235e-005 9.96362550

76 1.978e-005 1.100e-005 9.96362554

77 1.763e-005 9.804e-006 9.96362551

Classification Table

--------------------

| Predicted

Observed | 1 -1

---------|-----------------

1 | 101 19

-1 | 12 138


**************************************

**********-Fitting for C=10-**********

**************************************

LSVM Iteration History

iter absxdif relxdif xnorm

1 3.69092412 0.21995867 76.8366800

2 2.06771120 0.12237905 76.3216285

3 1.72032268 0.10134956 75.8603620

........................................

74 1.939e-004 1.126e-005 74.5874926

75 1.794e-004 1.042e-005 74.5874149

76 1.662e-004 9.654e-006 74.5873437

Classification Table

--------------------

| Predicted

Observed | 1 -1

---------|-----------------

1 | 110 10

-1 | 4 146

***************************************

**********-Fitting for C=100-**********

***************************************

LSVM Iteration History

iter absxdif relxdif xnorm

1 45.4704242 0.30178448 458.231990

2 20.7747646 0.13722435 449.616520

3 16.8225661 0.11033620 442.844975

........................................

388 0.00157536 1.010e-005 400.775944

389 0.00156160 1.002e-005 400.774860

390 0.00154797 9.928e-006 400.773787

Classification Table

--------------------

| Predicted

Observed | 1 -1

---------|-----------------

1 | 117 3

-1 | 0 150

Regularization Parameter C . . . . . . . . . . . . . . . . 100

Norm of Longest Vector . . . . . . . . . . . . . . . . 1.41421

Number Misclassifactions (Training Data) . . . . . . . . . . 3

Total Number of Kernel Calls . . . . . . . . . . 2.01603e+007

Time for Optimization. . . . . . . . . . . . . . . . . . . 101


Total Processing Time. . . . . . . . . . . . . . . . . . . 143

| N_Sup_Vec N_SV_Marg OptimCrit L1_Loss Grad_Norm

-----------------------------------------------------------

c1 | 240.00000 31.000000 66.686508 . 27.819577

c10 | 171.00000 14.000000 107.30063 . 42.899379

c100 | 149.00000 2.0000000 -112.87028 . 68.388039

| Beta(PCE) SV_Radius Bias WGT_Norm Margin

-----------------------------------------------------------

c1 | 0.0508394 1.0625912 0.0013686 16.618952 0.4906007

c10 | 0.0508394 1.0517169 -0.0259779 198.54833 0.1419374

c100 | 0.0508394 1.0625912 -0.8424655 1569.0985 0.0504899

| VC_Dim Mis_Train

--------------------------

c1 | 19.764460 31.000000

c10 | 220.61598 14.000000

c100 | 1772.6692 3.0000000

(c) SVM with LC: FQP: Comparison of techniques

Null Space Technique

data = rspfile("heart_scale.dat");

modl = "1 = 2 : 14";

class = 1;

optn = [ "print" 2,

"popt" 1,

"kern" "rbf2",

"meth" "fqp",

"tech" "qpnusp",

"kfp1" .076923076 ];

< alfa,sres,vres,yptr > = svm(data,modl,optn,class);

Successful Termination After 126 Iterations

ABSGCONV convergence criterion satisfied.

Criterion -107.7979538 Max Grad Entry 3.0531e-015

N. Active Constraints 247 Slope SDirect. -7.5682e-005

N. Function Calls 129 N. Gradient Calls 127

Preproces. Time 3 Time for Method 6

Effective Time 12

Total Number of Kernel Calls: 110552

Time for Optimization: 12

Total Processing Time: 12

Range Space Technique

data = rspfile("heart_scale.dat");

modl = "1 = 2 : 14";

class = 1;

optn = [ "print" 2,


"popt" 1,

"kern" "rbf2",

"meth" "fqp",

"tech" "qprasp",

"kfp1" .076923076 ];

< alfa,sres,vres,yptr > = svm(data,modl,optn,class);

Successful Termination After 121 Iterations

ABSXCONV convergence criterion satisfied.

Criterion -107.7979538 N. Active Constraints 247

Slope SDirect. -7.5682e-005

N. Function Calls 124 N. Gradient Calls 123

Preproces. Time 0 Time for Method 3

Effective Time 3

Total Number of Kernel Calls: 110552

Time for Optimization: 3

Total Processing Time: 4

Madsen-Nielsen-Pinar Technique

data = rspfile("heart_scale.dat");

modl = "1 = 2 : 14";

class = 1;

optn = [ "print" 2,

"popt" 1,

"kern" "rbf2",

"meth" "fqp",

"tech" "qpmanp",

"kfp1" .076923076 ];

< alfa,sres,vres,yptr > = svm(data,modl,optn,class);

Successful Termination After 93 Iterations

GCONV convergence criterion satisfied.

Criterion -107.7978914 N. Active Constraints 247

Duality Gap -6.4028e-010 Gamma 2.5751e-006

N. Function Calls 95 N. Gradient Calls 94

N. Refactorizatns. 6 Preproces. Time 0

Time for Method 5 Effective Time 5

Total Number of Kernel Calls: 110552

Time for Optimization: 5

Total Processing Time: 7

Lin-More TRON Technique

data = rspfile("heart_scale.dat");

modl = "1 = 2 : 14";

class = 1;

optn = [ "print" 2,

"popt" 1,


"kern" "rbf2",

"meth" "fqp",

"tech" "qptron",

"kfp1" .076923076 ];

< alfa,sres,vres,yptr > = svm(data,modl,optn,class);

Successful Termination After 7 Iterations

FCONV convergence criterion satisfied.

Criterion -107.7979554 Max Grad Entry 1.2986e-011

N. Active Constraints 247

N. Function Calls 17 N. Gradient Calls 16

N. Line Searches 1225 Preproces. Time 1

Time for Method 11 Effective Time 12

Total Number of Kernel Calls: 110552

Time for Optimization: 12

Total Processing Time: 12

(d) SVM with LC: FQP: User Specified Kernel Function

The computation time increases by about a factor of 10 when a user-specified kernel function is used. Therefore, only the first 60 observations of heart_scale.dat are used, stored in heart_60.dat.

/* radial basis kernel function */

function krbf(vi,vj) {

w = vi - vj;

t = exp(-.076923076 * w * w`);

return(t);

}
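For reference, the same kernel written in NumPy (only a cross-check of what the CMAT function above computes, where the product of w with its transpose is the squared Euclidean norm of w):

import numpy as np

def krbf(vi, vj, gamma=0.076923076):
    # radial basis kernel exp(-gamma * ||vi - vj||^2)
    w = np.asarray(vi, dtype=float) - np.asarray(vj, dtype=float)
    return np.exp(-gamma * np.dot(w, w))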

data = rspfile("heart_60.dat");

modl = "1 = 2 : 14";

class = 1;

optn = [ "print" 1,

"meth" "fqp" ];

< alfa,sres,vres,yptr > = svm(data,modl,optn,class,.,krbf);

Regularization Parameter C . . . . . . . . . . . . . . 1.07968

Norm of Longest Vector . . . . . . . . . . . . . . . . . . . 1

Number Misclassifactions (Training Data) . . . . . . . . . . 8

Total Number of Kernel Calls . . . . . . . . . . . . . . 5694

Time for Optimization. . . . . . . . . . . . . . . . . . . 10

Total Processing Time. . . . . . . . . . . . . . . . . . . 31

Optimization Criterion . . . . . . . . . . . . . . . -27.8239

Infinity Norm of Gradient. . . . . . . . . . . . 1.80411e-015

Geometric Margin . . . . . . . . . . 0.484795 (|w|^2= 17.0194)

Number Support Vectors . . . . . . . . . . . . . 42 ( 70.00 %)

Number Support Vectors on Margin . . . . . . . . . . . . . 28

Bias . . . . . . . . . . . . . . . . . . . . . . . . 0.179318

Radius of Sphere Around SV . . . . . . . . . . . . . . 1.04119

Estimated VCdim of Classifier. . . . . . . . . . . . . 19.4502

KT Threshold . . . . . . . . . . . . . . . . . . . . 0.257047


Total Number of Kernel Calls: 5694

Time for Optimization: 10

Total Processing Time: 31

data = rspfile("heart_60.dat");

modl = "1 = 2 : 14";

class = 1;

optn = [ "print" 2,

"meth" "dqp" ];

< alfa,sres,vres,yptr > = svm(data,modl,optn,class,.,krbf);

Regularization Parameter C . . . . . . . . . . . . . . 1.07968

Norm of Longest Vector . . . . . . . . . . . . . . . . . . . 1

Number Misclassifactions (Training Data) . . . . . . . . . . 8

Total Number of Kernel Calls . . . . . . . . . . . . . . 20034

Time for Optimization. . . . . . . . . . . . . . . . . . . 88

Total Processing Time. . . . . . . . . . . . . . . . . . . 109

Optimization Criterion . . . . . . . . . . . . . . . -27.8239

L1 Loss. . . . . . . . . . . . . . . . . . . . . . 0.000100471

Infinity Norm of Gradient. . . . . . . . . . . . 9.93773e-005

Geometric Margin . . . . . . . . . . 0.484791 (|w|^2= 17.0196)

Number Support Vectors . . . . . . . . . . . . . 42 ( 70.00 %)

Number Support Vectors on Margin . . . . . . . . . . . . . 28

Bias . . . . . . . . . . . . . . . . . . . . . . . . 0.179327

Radius of Sphere Around SV . . . . . . . . . . . . . . 1.04119

Estimated VCdim of Classifier. . . . . . . . . . . . . 19.4504

KT Threshold . . . . . . . . . . . . . . . . . . . . -0.257057

Total Number of Kernel Calls: 20034

Time for Optimization: 88

Total Processing Time: 109

2. Multinomial SVM Classification: Fisher’s Iris Data; RBF2 Kernel

(a) SVM Classification: with LC

#include "iris.txt"

nr = nrow(iris); irs5 = iris[,5];

count = cons(3);

for (i = 1; i <= nr; i++) count[irs5[i]] += 1;

print "Member Count=", count;

i. FQP: C is specified

class = 5;

modl = "5 = 1 : 4";

optn = [ "print" 3,

"ppred" ,

"popt" 1,

"c" 1.,

"meth" "fqp",

"kern" "rbf2",

"kfp1" .1 ];


alfa = svm(iris,modl,optn,class);

*******************************************

---Fitting Response Category 1 vs. Rest---

*******************************************

Successful Termination After 133 Iterations

ABSGCONV convergence criterion satisfied.

Criterion -20.96642388 Max Grad Entry 2.4425e-015

N. Active Constraints 51 Slope SDirect. -3.4942e-005

N. Function Calls 136 N. Gradient Calls 134

Preproces. Time 1 Time for Method 11

Effective Time 12

*******************************************

---Fitting Response Category 3 vs. Rest---

*******************************************

Successful Termination After 151 Iterations

ABSGCONV convergence criterion satisfied.

Criterion -31.95281909 Max Grad Entry 1.2212e-015

N. Active Constraints 51 Slope SDirect. -7.3322e-007

N. Function Calls 154 N. Gradient Calls 152

Preproces. Time 1 Time for Method 12

Effective Time 13

*******************************************

---Fitting Response Category 2 vs. Rest---

*******************************************

Successful Termination After 136 Iterations

ABSGCONV convergence criterion satisfied.

Criterion -29.42740443 Max Grad Entry 1.3878e-015

N. Active Constraints 48 Slope SDirect. -1.4714e-005

N. Function Calls 139 N. Gradient Calls 137

Preproces. Time 1 Time for Method 12

Effective Time 13

*****************************************************

---Evaluating Multinomial Response (Training Data)---

*****************************************************

Classification Table

--------------------

| Predicted

Observed | 1 3 2

---------|-------------------------

1 | 50 0 0

3 | 0 50 0

2 | 0 0 50

Regularization Parameter C . . . . . . . . . . . . . . . . . 1


Norm of Longest Vector . . . . . . . . . . . . . . . . . . . 1

Number Misclassifactions (Training Data) . . . . . . . . . . 0

Total Number of Kernel Calls . . . . . . . . . . . . . . 80254

Time for Optimization. . . . . . . . . . . . . . . . . . . 13

Total Processing Time. . . . . . . . . . . . . . . . . . . 39

| N_Sup_Vec N_SV_Marg OptimCrit L1_Loss Grad_Norm

-----------------------------------------------------------

1 | 109.00000 8.0000000 -20.966424 . 2.44e-015

3 | 114.00000 14.000000 -31.952819 . 1.22e-015

2 | 116.00000 13.000000 -29.427404 . 1.39e-015

| Beta(PCE) SV_Radius Bias WGT_Norm Margin

-----------------------------------------------------------

1 | -0.3546215 1.4142136 -0.5177542 38.284068 0.3232369

3 | -0.3546215 1.4142136 -0.1763737 56.776691 0.2654269

2 | -0.3546215 1.4142136 -0.3278630 51.080415 0.2798355

| VC_Dim Mis_Train

--------------------------

1 | 77.568136 0.0000000

3 | 114.55338 0.0000000

2 | 103.16083 0.0000000

ii. DQP: C is specified

#include "iris.txt"

class = 5;

modl = "5 = 1 : 4";

optn = [ "print" 3,

"c" 1.,

"meth" "dqp",

"kern" "rbf2",

"kfp1" .1 ];

alfa = svm(iris,modl,optn,class);

*******************************************

---Fitting Response Category 1 vs. Rest---

*******************************************

Iter Nfun Term Size NSV BSV Xdif Function GradNorm

1 34 AGTol 20 20 2 3.19e+000 -6.837e+000 1.185e+000

2 31 AGTol 20 39 3 3.26e+000 -1.343e+001 8.629e-001

3 23 AGTol 20 57 3 2.20e+000 -1.648e+001 9.454e-001

...................................................................

33 5 AGTol 20 109 9 2.28e-003 -2.097e+001 1.187e-003

34 5 AGTol 20 109 9 2.05e-003 -2.097e+001 7.947e-004

Expand: -21 N(act)= 150 N(inact)= 0

35 6 AGTol 20 110 9 1.18e-003 -2.097e+001 8.095e-004

Number Successful Optimizations: 35 : 100.00 %

Residual of Linear Equality Constraint: 4.16334e-015


Number Kernel Evaluations: 141650

Number Misclassifactions: 0

*******************************************

---Fitting Response Category 3 vs. Rest---

*******************************************

Iter Nfun Term Size NSV BSV Xdif Function GradNorm

1 30 AGTol 20 20 5 3.89e+000 -9.264e+000 1.673e+000

2 37 AGTol 20 38 9 3.69e+000 -1.770e+001 1.128e+000

3 39 AGTol 20 58 10 3.47e+000 -2.492e+001 9.979e-001

...................................................................

49 5 AGTol 20 114 14 2.41e-003 -3.195e+001 1.066e-003

50 5 AGTol 20 114 14 1.59e-003 -3.195e+001 5.593e-004

Expand: -7 N(act)= 150 N(inact)= 0

51 5 AGTol 20 114 14 1.31e-003 -3.195e+001 5.313e-004

Number Successful Optimizations: 51 : 100.00 %

Residual of Linear Equality Constraint: 1e-008

Number Kernel Evaluations: 210060

Number Misclassifactions: 0

*******************************************

---Fitting Response Category 2 vs. Rest---

*******************************************

Iter Nfun Term Size NSV BSV Xdif Function GradNorm

1 32 AGTol 20 20 5 3.78e+000 -8.888e+000 1.661e+000

2 31 AGTol 20 38 9 3.85e+000 -1.777e+001 9.876e-001

3 26 AGTol 20 56 10 2.80e+000 -2.253e+001 9.854e-001

...................................................................

39 5 AGTol 20 116 13 6.24e-003 -2.943e+001 1.064e-003

40 6 AGTol 20 117 13 3.57e-003 -2.943e+001 6.275e-004

Expand: -8 N(act)= 150 N(inact)= 0

41 5 AGTol 20 117 13 1.40e-003 -2.943e+001 3.966e-004

Number Successful Optimizations: 41 : 100.00 %

Residual of Linear Equality Constraint: 4.996e-015

Number Kernel Evaluations: 151745

Number Misclassifactions: 0

*****************************************************

---Evaluating Multinomial Response (Training Data)---

*****************************************************

Classification Table

--------------------


| Predicted

Observed | 1 3 2

---------|-------------------------

1 | 50 0 0

3 | 0 50 0

2 | 0 0 50

Regularization Parameter C . . . . . . . . . . . . . . . . . 1

Norm of Longest Vector . . . . . . . . . . . . . . . . . . . 1

Number Misclassifactions (Training Data) . . . . . . . . . . 0

Total Number of Kernel Calls . . . . . . . . . . . . . 549738

Time for Optimization. . . . . . . . . . . . . . . . . . . . 3

Total Processing Time. . . . . . . . . . . . . . . . . . . . 9

| N_Sup_Vec N_SV_Marg OptimCrit L1_Loss Grad_Norm

-----------------------------------------------------------

1 | 110.00000 9.0000000 -20.966423 7.29e-004 8.10e-004

3 | 114.00000 14.000000 -31.952818 3.69e-004 5.31e-004

2 | 117.00000 13.000000 -29.427404 4.34e-004 3.97e-004

| Beta(PCE) SV_Radius Bias WGT_Norm Margin

-----------------------------------------------------------

1 | 0.3546185 1.4142136 -0.5179214 38.285199 0.3232321

3 | 0.3546185 1.4142136 -0.1763776 56.780079 0.2654190

2 | 0.3546185 1.4142136 -0.3280904 51.081525 0.2798324

| VC_Dim Mis_Train

--------------------------

1 | 77.570398 0.0000000

3 | 114.56016 0.0000000

2 | 103.16305 0.0000000

(b) SVM Classification: w/o LC

i. FQP: C is specified

class = 5;

modl = "5 = 1 : 4";

optn = [ "print" 3,

"ppred" ,

"popt" 1,

"bconly" ,

"c" 1.,

"meth" "fqp",

"kern" "rbf2",

"kfp1" .1 ];

alfa = svm(iris,modl,optn,class);

*******************************************

---Fitting Response Category 1 vs. Rest---

*******************************************


Successful Termination After 4 Iterations

FCONV convergence criterion satisfied.

Criterion -21.10657025 Max Grad Entry 3.5527e-015

N. Active Constraints 49

N. Function Calls 11 N. Gradient Calls 10

N. Line Searches 41 Preproces. Time 0

Time for Method 1 Effective Time 1

*******************************************

---Fitting Response Category 3 vs. Rest---

*******************************************

Successful Termination After 4 Iterations

FCONV convergence criterion satisfied.

Criterion -31.96850185 Max Grad Entry 3.5527e-015

N. Active Constraints 49

N. Function Calls 11 N. Gradient Calls 10

N. Line Searches 43 Preproces. Time 0

Time for Method 0 Effective Time 0

*******************************************

---Fitting Response Category 2 vs. Rest---

*******************************************

Successful Termination After 5 Iterations

FCONV convergence criterion satisfied.

Criterion -29.48908925 Max Grad Entry 6.2172e-015

N. Active Constraints 46

N. Function Calls 13 N. Gradient Calls 12

N. Line Searches 48 Preproces. Time 0

Time for Method 1 Effective Time 1

*****************************************************

---Evaluating Multinomial Response (Training Data)---

*****************************************************

Classification Table

--------------------

| Predicted

Observed | 1 3 2

---------|-------------------------

1 | 50 0 0

3 | 0 50 0

2 | 0 0 50

Regularization Parameter C . . . . . . . . . . . . . . . . . 1

Norm of Longest Vector . . . . . . . . . . . . . . . . 1.41421

Number Misclassifactions (Training Data) . . . . . . . . . . 0

Total Number of Kernel Calls . . . . . . . . . . . . . . 80260

Time for Optimization. . . . . . . . . . . . . . . . . . . . 1

Total Processing Time. . . . . . . . . . . . . . . . . . . . 2


| N_Sup_Vec N_SV_Marg OptimCrit L1_Loss Grad_Norm

-----------------------------------------------------------

1 | 110.00000 9.0000000 -21.106570 . 3.55e-015

3 | 115.00000 14.000000 -31.968502 . 3.55e-015

2 | 117.00000 13.000000 -29.489089 . 6.22e-015

| Beta(PCE) SV_Radius Bias WGT_Norm Margin

-----------------------------------------------------------

1 | -0.5384615 1.4142136 0.0160043 38.692185 0.3215277

3 | -0.5384615 1.4142136 0.0021846 56.851415 0.2652524

2 | -0.5384615 1.4142136 0.0260752 51.289223 0.2792653

| VC_Dim Mis_Train

--------------------------

1 | 78.384370 0.0000000

3 | 114.70283 0.0000000

2 | 103.57845 0.0000000

ii. DQP: C is specified

#include "iris.txt"

class = 5;

modl = "5 = 1 : 4";

optn = [ "print" 3,

"bconly" ,

"c" 1.,

"meth" "dqp",

"kern" "rbf2",

"kfp1" .1 ];

alfa = svm(iris,modl,optn,class);

*******************************************

---Fitting Response Category 1 vs. Rest---

*******************************************

Summary Decomposition QP

------------------------

Optimization Technique QPTRON

Number Sub QP Iterations 57

Maximum Size of Sub QP 20

Number Support Vector 110

Number Bounded S.V. 9

Misclassified Observations 0

Objective Function -2.1107e+001

Inf. Norm of Gradient 8.8300e-004

Number Function Calls 468

*******************************************

---Fitting Response Category 3 vs. Rest---

*******************************************

Summary Decomposition QP

------------------------


Optimization Technique QPTRON

Number Sub QP Iterations 86

Maximum Size of Sub QP 20

Number Support Vector 114

Number Bounded S.V. 14

Misclassified Observations 0

Objective Function -3.1968e+001

Inf. Norm of Gradient 7.4054e-004

Number Function Calls 704

*******************************************

---Fitting Response Category 2 vs. Rest---

*******************************************

Summary Decomposition QP

------------------------

Optimization Technique QPTRON

Number Sub QP Iterations 81

Maximum Size of Sub QP 20

Number Support Vector 116

Number Bounded S.V. 13

Misclassified Observations 0

Objective Function -2.9489e+001

Inf. Norm of Gradient 7.0078e-004

Number Function Calls 662

*****************************************************

---Evaluating Multinomial Response (Training Data)---

*****************************************************

Classification Table

--------------------

| Predicted

Observed | 1 3 2

---------|-------------------------

1 | 50 0 0

3 | 0 50 0

2 | 0 0 50

Regularization Parameter C . . . . . . . . . . . . . . . . . 1

Norm of Longest Vector . . . . . . . . . . . . . . . . 1.41421

Number Misclassifactions (Training Data) . . . . . . . . . . 0

Total Number of Kernel Calls . . . . . . . . . . . . . 781941

Time for Optimization. . . . . . . . . . . . . . . . . . . . 3

Total Processing Time. . . . . . . . . . . . . . . . . . . . 8

| N_Sup_Vec N_SV_Marg OptimCrit L1_Loss Grad_Norm

-----------------------------------------------------------

1 | 110.00000 9.0000000 -21.106565 0.0167706 8.83e-004

3 | 114.00000 14.000000 -31.968493 0.0029244 7.41e-004


2 | 116.00000 13.000000 -29.489084 0.0267606 7.01e-004

| Beta(PCE) SV_Radius Bias WGT_Norm Margin

-----------------------------------------------------------

1 | -0.0264365 1.4142136 0.0158875 38.695265 0.3215149

3 | -0.0264365 1.4142136 0.0021920 56.854706 0.2652447

2 | -0.0264365 1.4142136 0.0264365 51.292716 0.2792558

| VC_Dim Mis_Train

--------------------------

1 | 78.390529 0.0000000

3 | 114.70941 0.0000000

2 | 103.58543 0.0000000

iii. PSVM: C is specified

#include "iris.txt"

class = 5;

modl = "5 = 1 : 4";

optn = [ "print" 3,

"bconly" ,

"c" 1.,

"meth" "psv",

"kern" "rbf2",

"kfp1" .1 ];

alfa = svm(iris,modl,optn,class);

*****************************************************

---Evaluating Multinomial Response (Training Data)---

*****************************************************

Classification Table

--------------------

| Predicted

Observed | 1 3 2

---------|-------------------------

1 | 50 0 0

3 | 0 50 0

2 | 0 0 50

Regularization Parameter C . . . . . . . . . . . . . . . . . 1

Norm of Longest Vector . . . . . . . . . . . . . . . . 1.41421

Number Misclassifactions (Training Data) . . . . . . . . . . 0

Total Number of Kernel Calls . . . . . . . . . . . . . . 80476

Time for Optimization. . . . . . . . . . . . . . . . . . . . 1

Total Processing Time. . . . . . . . . . . . . . . . . . . . 1

| N_Sup_Vec N_SV_Marg OptimCrit L1_Loss Grad_Norm

-----------------------------------------------------------

1 | 150.00000 0.0000000 . . .

3 | 150.00000 0.0000000 . . .

2 | 150.00000 0.0000000 . . .


| Beta(PCE) SV_Radius Bias WGT_Norm Margin

-----------------------------------------------------------

1 | -5.70e-005 1.4142136 -0.0027939 19.448579 0.4535092

3 | -5.70e-005 1.4142136 -0.0014588 24.760054 0.4019335

2 | -5.70e-005 1.4142136 -0.0022457 23.669070 0.4110924

| VC_Dim Mis_Train

--------------------------

1 | 39.897159 0.0000000

3 | 50.520108 0.0000000

2 | 48.338141 0.0000000

iv. LSVM: C is specified

#include "iris.txt"

class = 5;

modl = "5 = 1 : 4";

optn = [ "print" 3,

"bconly" ,

"c" 1.,

"meth" "lsv",

"kern" "rbf2",

"kfp1" .1 ];

alfa = svm(iris,modl,optn,class);

*******************************************

---Fitting Response Category 1 vs. Rest---

*******************************************

LSVM Iteration History

iter absxdif relxdif xnorm

1 0.05234040 0.07101304 2.89885537

2 0.02665072 0.03615660 2.90044171

3 0.01448347 0.01964997 2.89966495

........................................

14 1.795e-005 2.435e-005 2.90000757

15 1.037e-005 1.407e-005 2.90000706

16 6.056e-006 8.216e-006 2.90000736

*******************************************

---Fitting Response Category 3 vs. Rest---

*******************************************

LSVM Iteration History

iter absxdif relxdif xnorm

1 0.02881250 0.03764798 3.76816274

2 0.01467077 0.01916941 3.76853416

3 0.00797290 0.01041777 3.76835222

........................................

12 3.061e-005 4.000e-005 3.76843255

13 1.729e-005 2.259e-005 3.76843219

14 9.880e-006 1.291e-005 3.76843240


*******************************************

---Fitting Response Category 2 vs. Rest---

*******************************************

LSVM Iteration History

iter absxdif relxdif xnorm

1 0.02445890 0.02785593 3.60747862

2 0.01245400 0.01418381 3.60775442

3 0.00676819 0.00770823 3.60761945

........................................

12 2.599e-005 2.960e-005 3.60767921

13 1.468e-005 1.672e-005 3.60767895

14 8.387e-006 9.552e-006 3.60767910

*****************************************************

---Evaluating Multinomial Response (Training Data)---

*****************************************************

Classification Table

--------------------

| Predicted

Observed | 1 3 2

---------|-------------------------

1 | 50 0 0

3 | 0 50 0

2 | 0 0 50

Regularization Parameter C . . . . . . . . . . . . . . . . . 1

Norm of Longest Vector . . . . . . . . . . . . . . . . 1.41421

Number Misclassifactions (Training Data) . . . . . . . . . . 0

Total Number of Kernel Calls . . . . . . . . . . . . . 578764

Time for Optimization. . . . . . . . . . . . . . . . . . . . 1

Total Processing Time. . . . . . . . . . . . . . . . . . . . 3

| N_Sup_Vec N_SV_Marg OptimCrit L1_Loss Grad_Norm

-----------------------------------------------------------

1 | 148.00000 0.0000000 173832.01 . 53700.003

3 | 148.00000 0.0000000 141358.79 . 49878.187

2 | 148.00000 0.0000000 4289.2218 . 5724.2969

| Beta(PCE) SV_Radius Bias WGT_Norm Margin

-----------------------------------------------------------

1 | -2.33e-017 1.4142136 -0.0035232 19.463969 0.4533298

3 | -2.33e-017 1.4142136 -0.0010948 24.764741 0.4018955

2 | -2.33e-017 1.4142136 -0.0019503 23.672410 0.4110634

| VC_Dim Mis_Train

--------------------------

1 | 39.927938 0.0000000

3 | 50.529483 0.0000000

2 | 48.344820 0.0000000


3. XiAlpha Estimates, Loo, and k-fold Cross Validation

To illustrate the approach we use the first 60 observations of the Heart data set. The following input specifies XiA estimates of depth 0 and Loo (leave-one-out, jackknife) estimation:

data = rspfile("heart_60.dat");

modl = "1 = 2 : 14";

class = 1;

optn = [ "print" 2,

"cloo" ,

"cxia" ,

"xiad" 0,

"c" 1.,

"meth" "fqp",

"kern" "rbf2",

"kfp1" .076923076 ];

< alfa,sres,vres,yptr > = svm(data,modl,optn,class);

*****************

Model Information

*****************

Number Valid Observations 60

Response Variable Y[1]

N Independend Variables 13

Support Vector Classification

Estimation Method FQP

Optimization Method QPNUSP

Kernel Function RBF2

Gamma of RBF 0.076923076

Activation Function Hard

Use Unscaled Predictor

Classification Table

--------------------

| Predicted

Observed | -1 1

---------|-----------------

-1 | 30 3

1 | 5 22

The following output indicates that the actual Loo computation must be run for only 20 of the 60 observations. For the remaining 40 cases the result can be predicted entirely by judging the gradient at the optimal point of the full sample.

******************************

Leave-One-Out Cross Validation

******************************

Case Ytrain Ycv Yobs Err

1 1.142337115 > 0 1.000000000 0

2 -0.447272786 0.117529535 -1.000000000 1

3 -0.313484448 < 0 1.000000000 1


4 0.608518215 > 0 -1.000000000 1

5 -0.925584949 < 0 -1.000000000 0

6 -0.388177238 0.155855351 -1.000000000 1

7 0.930604416 0.905097947 1.000000000 0

8 1.214806149 > 0 1.000000000 0

9 1.397924346 > 0 1.000000000 0

10 1.074415036 > 0 1.000000000 0

11 -0.142606407 0.400607077 -1.000000000 1

12 0.504623225 > 0 -1.000000000 1

13 -0.925584946 < 0 -1.000000000 0

14 0.391029292 0.162314399 1.000000000 0

15 -0.943409409 < 0 -1.000000000 0

16 -0.925584954 < 0 -1.000000000 0

17 1.088140639 > 0 1.000000000 0

18 1.074415045 > 0 1.000000000 0

19 -0.499879746 0.066522902 -1.000000000 1

20 -0.405541984 0.209165774 -1.000000000 1

21 1.700706697 > 0 1.000000000 0

22 -0.742728716 -0.315264645 -1.000000000 0

23 -0.910991132 -0.497163956 -1.000000000 0

24 -1.026390091 < 0 -1.000000000 0

25 -0.925584951 < 0 -1.000000000 0

26 -1.331928779 < 0 -1.000000000 0

27 -0.844889126 -0.316229293 -1.000000000 0

28 -1.388251331 < 0 -1.000000000 0

29 1.052201994 1.038273703 1.000000000 0

30 -0.925584953 < 0 -1.000000000 0

Case Ytrain Ycv Yobs Err

31 0.978618794 0.997241460 1.000000000 0

32 0.159650962 > 0 -1.000000000 1

33 -1.422550908 < 0 -1.000000000 0

34 1.106122934 > 0 1.000000000 0

35 0.897838171 0.823766377 1.000000000 0

36 1.074415043 > 0 1.000000000 0

37 1.044883347 0.913912868 1.000000000 0

38 -0.556337037 < 0 1.000000000 1

39 -1.012700790 < 0 -1.000000000 0

40 -0.925584949 < 0 -1.000000000 0

41 -0.371846530 < 0 1.000000000 1

42 -1.385525143 < 0 -1.000000000 0

43 -0.925584948 < 0 -1.000000000 0

44 -0.493726877 0.121368377 -1.000000000 1

45 0.382032254 0.110618273 1.000000000 0

46 -0.919374149 < 0 -1.000000000 0

47 0.887485622 0.764218992 1.000000000 0

48 -0.375768456 < 0 1.000000000 1

49 1.074415038 > 0 1.000000000 0

50 1.074415044 > 0 1.000000000 0

51 0.275527827 0.285130520 1.000000000 0

52 -1.086660662 < 0 -1.000000000 0

53 -0.941396888 < 0 -1.000000000 0

54 -0.799789970 -0.284259306 -1.000000000 0

55 -1.232646176 < 0 -1.000000000 0


56 -1.166907692 < 0 -1.000000000 0

57 0.686888665 0.454845236 1.000000000 0

58 -1.032684067 < 0 -1.000000000 0

59 -0.554883781 < 0 1.000000000 1

60 1.074415045 > 0 1.000000000 0

Runtime for Leave-one-out in seconds: 3

Regularization Parameter C . . . . . . . . . . . . . . . . . 1

Norm of Longest Vector . . . . . . . . . . . . . . . . . . . 1

Number Misclassifactions (Training Data) . . . . . . . . . . 8

Total Number of Kernel Calls . . . . . . . . . . . . . . 93361

Time for Optimization. . . . . . . . . . . . . . . . . . . . 0

Total Processing Time. . . . . . . . . . . . . . . . . . . . 4

Optimization Criterion . . . . . . . . . . . . . . . -25.7976

Infinity Norm of Gradient. . . . . . . . . . . . 1.08247e-015

Geometric Margin . . . . . . . . . . 0.500595 (|w|^2= 15.962)

Number Support Vectors . . . . . . . . . . . . . 44 ( 74.58 %)

Number Support Vectors on Margin . . . . . . . . . . . . . 27

Bias . . . . . . . . . . . . . . . . . . . . . . . . -0.160795

Radius of Sphere Around SV . . . . . . . . . . . . . . 1.04119

Estimated VCdim of Classifier. . . . . . . . . . . . . 18.3039

KT Threshold . . . . . . . . . . . . . . . . . . . . -0.200453

Depth of XiA Estimates (rho=1) . . . . . . . . . . . . . . . 0

XiA Estimate of Error (Nmis=30). . . . . . . . . . . . . 50 %

XiA Estimate of Recall . . . . . . . . . . . . . . . 44.4444 %

XiA Estimate of Precision. . . . . . . . . . . . . . 44.4444 %

Number Loo Computations (rho=1). . . . . . . . . . . . . . 20

Loo Estimate of Error (Nmis=14). . . . . . . . . . . 23.3333 %

Loo Estimate of Recall . . . . . . . . . . . . . . . 81.4815 %

Loo Estimate of Precision. . . . . . . . . . . . . . 70.9677 %

Total Number of Kernel Calls: 93361

Time for Optimization: 0

Total Processing Time: 4

The following input specifies XiA estimates of depth 0 and 4-fold cross validation:

/* FQP: CV with fold=4 */

data = rspfile("heart_60.dat");

modl = "1 = 2 : 14";

class = 1;

optn = [ "print" 2,

"cloo" ,

"fold" 4,

"cxia" ,

"xiad" 0,

"c" 1.,

"meth" "fqp",

"kern" "rbf2",

"kfp1" .076923076 ];


< alfa,sres,vres,yptr > = svm(data,modl,optn,class);

*****************

Model Information

*****************

Number Valid Observations 60

Response Variable Y[1]

N Independend Variables 13

Support Vector Classification

Estimation Method FQP

Optimization Method QPNUSP

Kernel Function RBF2

Gamma of RBF 0.076923076

Activation Function Hard

Use Unscaled Predictor

Classification Table

--------------------

| Predicted

Observed | -1 1

---------|-----------------

-1 | 30 3

1 | 5 22

*************************

K=4 Fold Cross Validation

*************************

Fold From To Nmiss +Ch -Ch

1 1 15 6 1 5

2 16 30 2 0 2

3 31 45 3 2 1

4 46 60 2 2 0

Runtime for K=4 fold cross validation in seconds: 1

Regularization Parameter C . . . . . . . . . . . . . . . . . 1

Norm of Longest Vector . . . . . . . . . . . . . . . . . . . 1

Number Misclassifactions (Training Data) . . . . . . . . . . 6

Total Number of Kernel Calls . . . . . . . . . . . . . . 18329

Time for Optimization. . . . . . . . . . . . . . . . . . . . 1

Total Processing Time. . . . . . . . . . . . . . . . . . . . 1

Optimization Criterion . . . . . . . . . . . . . . . -20.6829

Infinity Norm of Gradient. . . . . . . . . . . . 6.38378e-016

Geometric Margin . . . . . . . . . . 0.500595 (|w|^2= 15.962)

Number Support Vectors . . . . . . . . . . . . . 34 ( 75.56 %)

Number Support Vectors on Margin . . . . . . . . . . . . . 23

Bias . . . . . . . . . . . . . . . . . . . . . . . . -0.160795

Radius of Sphere Around SV . . . . . . . . . . . . . . 1.03853

Estimated VCdim of Classifier. . . . . . . . . . . . . 18.3039


KT Threshold . . . . . . . . . . . . . . . . . . . . -0.238733

Depth of XiA Estimates (rho=1) . . . . . . . . . . . . . . . 0

XiA Estimate of Error (Nmis=29). . . . . . . . . . . 48.3333 %

XiA Estimate of Recall . . . . . . . . . . . . . . . 44.4444 %

XiA Estimate of Precision. . . . . . . . . . . . . . 46.1538 %

Number CV Computations . . . . . . . . . . . . . . . . . . . 4

CV Estimate of Error (Nmis=13) . . . . . . . . . . . 21.6667 %

CV Estimate of Recall. . . . . . . . . . . . . . . . 81.4815 %

CV Estimate of Precision . . . . . . . . . . . . . . 73.3333 %

Total Number of Kernel Calls: 18329

Time for Optimization: 1

Total Processing Time: 1

4. Parameter Tuning

As an example, we use the first 40 observations of the Heart data set and the mean misclassification error of 4-fold cross validation:

data = rspfile("..\\tdata\\heart_40.dat");

modl = "1 = 2 : 14";

class = 1;

Since we specify 5 grid values for C and 13 grid values for the parameter of the RBF kernel function, 65 cross validations have to be evaluated, each of them requiring 5 optimizations:

/* tuning by grid search: specify values */

c = [ .001 .1 1. 10. 100. ];

kp1 = [ .2 : .05 : .8 ];

tun_g = c |> kp1 ;

/*--- FQP: grid search tuning ---*/

optn = [ "print" 3 ,

"popt" 1 ,

"tun" "gsrch" ,

"fold" 4 ,

"kern" "rbf2" ,

"kfp1" .076923076,

"meth" "fqp" ];

< alfa,sres,vres,yptr > = svm(data,modl,optn,class,tun_g);

The function values are the mean misclassification across the 4 cross validations:


***************************************

Results of Grid Search for 2 Parameters

***************************************

N F C KF1

1 12.7500000 0.0010000 0.2000000

2 12.7500000 0.0010000 0.2500000

3 12.7500000 0.0010000 0.3000000

4 12.7500000 0.0010000 0.3500000

5 12.7500000 0.0010000 0.4000000

6 12.7500000 0.0010000 0.4500000

7 12.7500000 0.0010000 0.5000000

8 12.7500000 0.0010000 0.5500000

9 12.7500000 0.0010000 0.6000000

10 12.7500000 0.0010000 0.6500000

11 12.7500000 0.0010000 0.7000000

12 12.7500000 0.0010000 0.7500000

13 9.00000000 0.0010000 0.8000000

14 4.00000000 0.1000000 0.2000000

15 4.50000000 0.1000000 0.2500000

16 5.00000000 0.1000000 0.3000000

17 5.25000000 0.1000000 0.3500000

18 6.25000000 0.1000000 0.4000000

19 6.25000000 0.1000000 0.4500000

20 6.00000000 0.1000000 0.5000000

21 6.00000000 0.1000000 0.5500000

22 6.00000000 0.1000000 0.6000000

23 5.75000000 0.1000000 0.6500000

24 5.50000000 0.1000000 0.7000000

25 5.50000000 0.1000000 0.7500000

26 5.50000000 0.1000000 0.8000000

27 1.75000000 1.0000000 0.2000000

28 1.25000000 1.0000000 0.2500000

29 1.00000000 1.0000000 0.3000000

30 1.00000000 1.0000000 0.3500000

N F C KF1

31 0.75000000 1.0000000 0.4000000

32 0.75000000 1.0000000 0.4500000

33 0.75000000 1.0000000 0.5000000

34 0.50000000 1.0000000 0.5500000

35 0.25000000 1.0000000 0.6000000

36 0.25000000 1.0000000 0.6500000

37 0.00000000 1.0000000 0.7000000

38 0.00000000 1.0000000 0.7500000

39 0.00000000 1.0000000 0.8000000

40 4.75000000 10.000000 0.2000000

41 4.50000000 10.000000 0.2500000

42 3.75000000 10.000000 0.3000000

43 2.50000000 10.000000 0.3500000

44 4.25000000 10.000000 0.4000000

45 4.50000000 10.000000 0.4500000

46 4.25000000 10.000000 0.5000000

47 4.25000000 10.000000 0.5500000

48 3.25000000 10.000000 0.6000000

49 2.75000000 10.000000 0.6500000

50 2.25000000 10.000000 0.7000000

51 2.75000000 10.000000 0.7500000

52 2.75000000 10.000000 0.8000000

53 4.75000000 100.00000 0.2000000

54 5.25000000 100.00000 0.2500000

55 5.50000000 100.00000 0.3000000

56 4.75000000 100.00000 0.3500000

57 4.25000000 100.00000 0.4000000

58 6.00000000 100.00000 0.4500000

59 5.75000000 100.00000 0.5000000

60 7.50000000 100.00000 0.5500000

61 7.75000000 100.00000 0.6000000

62 7.25000000 100.00000 0.6500000

63 6.50000000 100.00000 0.7000000

64 9.00000000 100.00000 0.7500000

65 8.50000000 100.00000 0.8000000

Best 10 Results of Grid Search

------------------------------

N F C KF1

37 0.00000000 1.0000000 0.7000000

38 0.00000000 1.0000000 0.7500000

39 0.00000000 1.0000000 0.8000000

36 0.25000000 1.0000000 0.6500000

35 0.25000000 1.0000000 0.6000000

34 0.50000000 1.0000000 0.5500000

33 0.75000000 1.0000000 0.5000000

32 0.75000000 1.0000000 0.4500000

31 0.75000000 1.0000000 0.4000000

30 1.00000000 1.0000000 0.3500000

The following is the best solution we found:

Parameter Estimates


-------------------

1 : 1 0.789 1 1 0.7079

6 : 0.729 0.8606 0.02609 0.7443 1

11 : 0.5914 1 0.2668 1 0.484

16 : 0.6924 1 1 0.7629 0.7904

21 : 0.7006 0.7577 0.5493 0.2186 0.7315

26 : 0.2942 0.7267 0.4106 0.6511 0.727

31 : 0.5767 0.9406 0.4672 0.9472 1

36 : 1 1 1 0.4551 0.4144

We verify the result by running a single optimization with the best parameters:

/*--- FQP: optimal parameters (C,KFP1) ---*/

optn = [ "print" 3 ,

"popt" 1 ,

"fold" 4 ,

"c" 1. ,

"kern" "rbf2" ,

"kfp1" .8 ,

"meth" "fqp" ];

< alfa,sres,vres,yptr > = svm(data,modl,optn,class);

The following input specifies the automatic search:

/* tuning by automatic NMS: specify bounds */

tun_s = [ 1.e-6 100. ,

1.e-6 .5 ];

optn = [ "print" 3 ,

"popt" 1 ,

"tun" "bound" ,

"fold" 4 ,

"kern" "rbf2" ,

"kfp1" .076923076,

"meth" "fqp" ];

< alfa,sres,vres,yptr > = svm(data,modl,optn,class,tun_s);

***************************************

Results of Grid Search for 2 Parameters

***************************************

N F C KF1

1 12.7500000 1.00e-006 1.00e-006

2 12.7500000 1.00e-006 0.0050010

3 12.7500000 1.00e-006 0.0500009

4 12.7500000 1.00e-006 0.5000000

5 12.7500000 1.0000010 1.00e-006

6 9.25000000 1.0000010 0.0050010

7 4.25000000 1.0000010 0.0500009

8 0.75000000 1.0000010 0.5000000

9 12.7500000 10.000001 1.00e-006

10 5.25000000 10.000001 0.0050010


11 5.50000000 10.000001 0.0500009

12 4.25000000 10.000001 0.5000000

13 12.7500000 100.00000 1.00e-006

14 6.50000000 100.00000 0.0050010

15 7.75000000 100.00000 0.0500009

16 5.75000000 100.00000 0.5000000

After that small grid search the Nelder-Mead algorithm is started using the best grid point as starting point.

N. Variables 2

Criterion 0.750000000

Iter rest nfun act optcrit difcrit std delta size

1 0 15 1 0.50000 0.25000 0.1179 10.000 0.55469

The improvement from F=.75 to F=.5 is disappointing since we know already that F=0 exists:

Tuning Result by Optimization F=0.5

***********************************

Regularization C 0.7500010

First Kernel Par 0.5000000

5. SVM Classification: The case nobs > nvar

For an illustration we use the data set Caco2 from Kristin Bennett’s and Jinbo Bi’s ([69]) website. It has 27 observations and 714 variables. We read two files, caco_nam.txt and caco.dat, containing the 714 variable names and all the data.

/* Caco: nobs= 27, nvar= 714 (Bi & Bennett) */

fo10 = " %16s %16s %16s %16s %16s %16s %16s %16s %16s %16s";

form = fo10;

for (j = 2; j <= 71; j++) form = strcat(form,fo10);

fo4 = " %16s %16s %16s %16s";

form = strcat(form,fo4);

fid = fopen("..\\tdata\\caco_nam.txt","r");

/* fscanf() is faster when nr and nc are specified: */

c_nam = fscanf(fid,form,.,1);

/* print c_nam; */

/*--- read data: 715 cols, first column is obs number ---*/

fid = fopen("..\\tdata\\caco.dat","r");

form = fo1 = " %g %g %g %g %g %g %g %g %g %g";

for (j = 2; j <= 71; j++) form = strcat(form,fo1);

fo5 = " %g %g %g %g %g";

form = strcat(form,fo5);

c_dat = fscanf(fid,form,27,.);

nr = nrow(c_dat); nc = ncol(c_dat);

print "Observations of c_dat.dat:",nr;

print "Columns of c_dat.dat:",nc;

cdat = c_dat[,2:715]; /* cut out col 1 */

Using the median of the response as cutoff value, we create a binary response appropriate for our classification problem:


sopt = [ "ari" "std" "med" ];

mom = univar(cdat,sopt);

/* print mom; */

cutof = mom[3,714];

y = cdat[,714]; y = y .< cutof; cdat[,714] = y;

We now have 14 observations with zero and 13 observations with unity response. By specifying delt=2 we perform an inner grid search of the method-specific parameter δ over the points δ = .01, .1, 1., 10., 100., which is a safer way to obtain a convergence region. The following executes the algorithm without line search:

/* FSM: without line search */

modl = "714 = 1 : 713";

class = 714;

optn = [ "print" 2 ,

"pplan" ,

"meth" "fsm" ,

"delt" 2 ,

"kern" "line" ];

alfa = svm(cdat,modl,optn,class);

***************************************

Number of Observations for Class Levels

***************************************

Variable Value Nobs Proportion

Y[714] 0 14 51.851852

1 13 48.148148

The results indicate that delta = .1 would be best and that delta = 100 does not converge within the maximum number of 400 iterations:

[note] Set Regularization Parameter C=0.00157787

---Start Training Cycle: Technique= FSM---

FSM: delta= 0.01: niter= 39 gnorm=0.000162 crit= -0.122911

FSM: delta= 0.1: niter= 10 gnorm=0.000477 crit= -0.122911

FSM: delta= 1: niter= 13 gnorm=0.000954 crit= -0.122911

FSM: delta= 10: niter= 63 gnorm=0.000932 crit= -0.122911

[warning] A suboptimal solution with a relaxed convergence

criterion 0.006463 is accepted for delta=100.

FSM: delta= 100: niter= 400 gnorm=0.006463 crit= -0.12289

Solution for delta=0.01 is selected as best.

The model shows 2 misclassifications, which is unusual since these data can easily be fit without any misclassification:

Classification Table

--------------------

| Predicted

Observed | 0 1

---------|-----------------

0 | 13 1

1 | 1 12


The linear plane with γ = gamma = 0.0647744 has only 6 nonzero coefficients, corresponding to the following variables:

Largest 6 Values

----------------

1 75 -0.199383340

2 687 -0.107564833

3 632 0.041827466

4 633 0.041827466

5 630 -0.041827289

6 1 -0.005628912

This is an excellent result of feature selection.

Total Number of Kernel Calls: 487

Time for Optimization: 34

Total Processing Time: 56

Using the lines option we execute the algorithm with Armijo line search:

/* FSM: with line search */

modl = "714 = 1 : 713";

class = 714;

optn = [ "print" 2 ,

"pplan" ,

"meth" "fsm" ,

"delt" 2 ,

"lines" ,

"kern" "line" ];

alfa = svm(cdat,modl,optn,class);

The results indicate that delta = .1 would be best and that delta = 100 does not converge within the maximum number of 400 iterations:

[note] Set Regularization Parameter C=0.00157787

---Start Training Cycle: Technique= FSM---

FSM: delta= 0.01: niter= 14 gnorm=0.000162 crit= -0.122911

FSM: delta= 0.1: niter= 12 gnorm=0.000244 crit= -0.122911

FSM: delta= 1: niter= 13 gnorm=0.000954 crit= -0.122911

FSM: delta= 10: niter= 63 gnorm=0.000932 crit= -0.122911

[warning] A suboptimal solution with a relaxed convergence

criterion 0.006463 is accepted for delta=100.

FSM: delta= 100: niter= 400 gnorm=0.006463 crit= -0.12289

Solution for delta=0.01 is selected as best.

The linear plane has only 6 nonzero coefficients corresponding to the following variables:

Largest 6 Values

----------------

1 75 -0.199383115


2 687 -0.107565035

3 632 0.041827348

4 633 0.041827348

5 630 -0.041827172

6 1 -0.005629567

Total Number of Kernel Calls: 487

Time for Optimization: 33

Total Processing Time: 55

The L2 Newton method NSVM converges fast but results in dense coefficient vectors:

/* NSVM: */

modl = "714 = 1 : 713";

class = 714;

optn = [ "print" 2 ,

"pplan" ,

"meth" "nsvm" ,

"kern" "line" ];

alfa = svm(cdat,modl,optn,class);

The algorithm converges after one iteration:

NSVM Iteration History

iter crit cdif gnorm xdif

1 -0.01142344 0.00985263 2.161e-015 0.00476242

with a perfect fit:

Classification Table

--------------------

| Predicted

Observed | 0 1

---------|-----------------

0 | 14 0

1 | 0 13

The computer time is obviously spent only on the long printed output (the linear plane with 713 values):

Total Number of Kernel Calls: 487

Time for Optimization: 8

Total Processing Time: 30

6. SVM Regression: MYSVM Data

(a) SVM Regression: FQP with QUAD

#include "mysvm.dat"

nobs = nrow(data);

x = data[.,1:11]; y = data[,12];

/* print nobs,y; */


modl = "12 = 1 : 11";

cup = [ 1. 10. 100. 1.e3 1.e4 ];

optn = [ "print" 3,

"popt" 1,

"ppred" ,

"c" "cup",

"meth" "fqp",

"kern" "line",

"loss" "quadr" ];

alfa = svm(data,modl,optn);

*************************************

**********-Fitting for C=1-**********

*************************************

Successful Termination After 27 Iterations

ABSGCONV convergence criterion satisfied.

Criterion -192.4999986 Max Grad Entry 2.8422e-014

N. Active Constraints 27 Slope SDirect. -0.116037903

N. Function Calls 30 N. Gradient Calls 28

Preproces. Time 0 Time for Method 0

Effective Time 0

**************************************

**********-Fitting for C=10-**********

**************************************

Successful Termination After 74 Iterations

ABSGCONV convergence criterion satisfied.

Criterion -192.4997246 Max Grad Entry 8.5265e-014

N. Active Constraints 89 Slope SDirect. -0.011264734

N. Function Calls 77 N. Gradient Calls 75

Preproces. Time 1 Time for Method 0

Effective Time 1

***************************************

**********-Fitting for C=100-**********

***************************************

Successful Termination After 4 Iterations

ABSGCONV convergence criterion satisfied.

Criterion -192.4728487 Max Grad Entry 7.2475e-013

N. Active Constraints 89 Slope SDirect. -182.1657453

N. Function Calls 7 N. Gradient Calls 5

Preproces. Time 1 Time for Method 0

Effective Time 1

****************************************

**********-Fitting for C=1000-**********


****************************************

Successful Termination After 1 Iterations

ABSGCONV convergence criterion satisfied.

Criterion -189.7925460 Max Grad Entry 5.8087e-012

N. Active Constraints 88 Slope SDirect. -31185.00919

N. Function Calls 4 N. Gradient Calls 2

Preproces. Time 0 Time for Method 0

Effective Time 1

| N_Sup_Vec N_SV_Marg OptimCrit L1_Loss Grad_Norm

-----------------------------------------------------------

c1 | 100.00000 26.000000 -192.50000 . 2.84e-014

c10 | 100.00000 88.000000 -192.49972 . 8.53e-014

c100 | 100.00000 88.000000 -192.47285 . 7.25e-013

c1000 | 100.00000 87.000000 -189.79255 . 5.81e-012

c10000 | 100.00000 0.0000000 -192.50000 . 2.36e-011

| Beta(PCE) SV_Radius Bias WGT_Norm Margin

-----------------------------------------------------------

c1 | -11.000000 2.4484904 2.37e-015 384.99999 0.1019294

c10 | -10.999999 2.4484904 2.01e-015 384.99998 0.1019294

c100 | -11.000008 2.4484904 2.05e-015 384.99993 0.1019294

c1000 | -11.000096 2.4484904 -4.31e-015 384.99955 0.1019295

c10000 | -11.000000 2.4484904 1.02e-015 385.00000 0.1019294

| VC_Dim MSE_Train

--------------------------

c1 | 2309.1156 3.71e-015

c10 | 2309.1155 2.05e-013

c100 | 2309.1152 4.25e-011

c1000 | 2309.1129 4.68e-009

c10000 | 2309.1156 1.50e-015

(b) SVM Regression: DQP with QUAD

#include "mysvm.dat"

modl = "12 = 1 : 11";

cup = [ 1. 10. 100. 1.e3 1.e4 ];

optn = [ "print" 3,

"ppred" ,

"c" "cup",

"meth" "dqp",

"kern" "line",

"loss" "quadr" ];

alfa = svm(data,modl,optn);

Regularization Parameter C . . . . . . . . . . . . . . . 1000

Norm of Longest Vector . . . . . . . . . . . . . . . . 2.44849

Mean Squared Error (MSE, Training Data). . . . . 1.64816e-008

Total Number of Kernel Calls . . . . . . . . . . . . . . 96624


Time for Optimization. . . . . . . . . . . . . . . . . . . . 0

Total Processing Time. . . . . . . . . . . . . . . . . . . . 2

| N_Sup_Vec N_SV_Marg OptimCrit L1_Loss Grad_Norm

-----------------------------------------------------------

c1 | 95.000000 49.000000 -192.50000 . 2.01e-004

c10 | 96.000000 23.000000 -192.50000 . 8.74e-004

c100 | 96.000000 17.000000 -192.50000 . 4.09e-004

c1000 | 97.000000 15.000000 -192.50000 . 3.28e-004

c10000 | 98.000000 13.000000 -192.50000 . 4.69e-004

| Beta(PCE) SV_Radius Bias WGT_Norm Margin

-----------------------------------------------------------

c1 | -10.999467 2.4484904 1.44e-015 384.98594 0.1019313

c10 | -11.001622 2.4484904 -1.43e-016 385.02890 0.1019256

c100 | -10.999963 2.4484904 1.19e-015 384.99941 0.1019295

c1000 | -10.999672 2.4484904 1.29e-015 384.99100 0.1019306

c10000 | -11.000052 2.4484904 2.04e-016 385.00861 0.1019283

| VC_Dim MSE_Train

--------------------------

c1 | 2309.0313 1.78e-008

c10 | 2309.2889 1.24e-007

c100 | 2309.1121 1.92e-008

c1000 | 2309.0617 1.65e-008

c10000 | 2309.1672 3.46e-008

7. Some Useful Modules

(a) CMAT Formulation of LSVM Algorithm

data = rspfile("heart_scale.dat");

x = data[,2:14]; y = data[,1];

m = nrow(x); n = ncol(x); np = n+1;

nu = 1.; rnu = 1. / nu; alf = 1.9 / nu;

D = diag(y);

A = x -> cons(m,1,-1.);

H = D * A; HH = H * H‘; /* HH is a BIG m by m matrix */

Q = H‘ * H + cons(np,np,rnu,’d’);

/* print "Starting Q=", Q; */

S = H * inv(Q);

/* print "Starting S=", S; */

u = nu * (1. - S * H‘ * cons(m,1,1.));

/* print "Starting u:", u; */

ou = u + 1.;

it = 0; itmax = 50; tol = 1.e-6;

while (it < itmax) {

res = norm(ou - u,"inf");

print "Iteration", it, " Residual=", res;

if (res < tol) break;

w = (rnu - alf) * u + HH * u - 1.;

w = .5 * (abs(w) + w);

z = 1. + w; ou = u;


u = nu * (z - S * H‘ * z);

it++;

}

w = x‘ * D * u; v = D * u; gamma = -v[+];

print "LSVM: gamma=", gamma;

print "LSVM solution w=", w;

Iteration 0 Residual= 1.0000

Iteration 1 Residual= 0.9308

Iteration 2 Residual= 0.5574

Iteration 3 Residual= 0.5419

............................

Iteration 47 Residual= 0.004883

Iteration 48 Residual= 0.004395

Iteration 49 Residual= 0.003955

LSVM: gamma=-0.6070

LSVM solution w=

| 1

--------------

1 | -0.10436

2 | 0.21681

3 | 0.35047

4 | 0.35256

5 | 0.38100

6 | -0.11840

7 | 0.10668

8 | -0.41184

9 | 0.13739

10 | 0.33916

11 | 0.12675

12 | 0.55031

13 | 0.25316
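For reuse, the iteration above can be wrapped into a module in the style of the psvm() function shown next. The following is a minimal sketch assembled entirely from the code above; the module name lsvm() and its argument list are our own choice for illustration and not a built-in CMAT routine.

function lsvm(x,y,nu,itmax,tol) {
   m = nrow(x); n = ncol(x); np = n + 1;
   rnu = 1. / nu; alf = 1.9 / nu;
   D = diag(y);
   A = x -> cons(m,1,-1.);            /* append column of -1 for the bias term */
   H = D * A; HH = H * H‘;            /* HH is a BIG m by m matrix */
   Q = H‘ * H + cons(np,np,rnu,’d’);
   S = H * inv(Q);
   u = nu * (1. - S * H‘ * cons(m,1,1.));
   ou = u + 1.; it = 0;
   while (it < itmax) {
      if (norm(ou - u,"inf") < tol) break;
      w = (rnu - alf) * u + HH * u - 1.;
      w = .5 * (abs(w) + w);          /* plus function */
      z = 1. + w; ou = u;
      u = nu * (z - S * H‘ * z);
      it++;
   }
   v = D * u; w = x‘ * v; gamma = -v[+];
   return(w,gamma);
}

< w,gamma > = lsvm(x,y,1.,50,1.e-6);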

(b) CMAT Formulation of PSVM Algorithm

function psvm(A,D,nu) {

m = nrow(A); n = ncol(A); np = n + 1;

e = cons(m,1,-1.);

H = D * [ A e ]; r = H[+,]‘;

rnu = 1. / nu;

Q = H‘ * H + cons(np,np,rnu,’d’);

x = Q \ r;

u = nu * (1. - H * x); s = D * u;

w = A‘ * s; gamma = -s[+];

return(w,gamma);

}

data = rspfile("heart_scale.dat");

modl = "1 = 2 : 14";

class = 1;

x = data[,2:14]; y = data[,1];


d = diag(y); nu = 1.;

< w,gamma > = psvm(x,d,nu);

print "PSVM Results: w=", w;

PSVM gamma=-0.3867

PSVM Results: w=

| 1

--------------

1 | -0.07006

2 | 0.15839

3 | 0.28357

4 | 0.20754

5 | 0.23266

6 | -0.08271

7 | 0.08012

8 | -0.33638

9 | 0.11754

10 | 0.25561

11 | 0.09985

12 | 0.40073

13 | 0.23962

1.8 Details on Robust Regression: LMS and LTS

1.8.1 Estimation Principles

The LMS (Least Median of Squares) and LTS (Least Trimmed Squares) subroutines perform robust regression (sometimes called resistant regression). They are able to detect outliers and perform a least-squares regression on the remaining observations.

LMS, LTS, and other robust methods (e.g. MVE and MCD) were developed in Rousseeuw (1984) and Rousseeuw and Leroy (1987).

Whereas robust regression methods like L1 (see the reg function) or Huber M-estimators only reduce the influence of outliers (compared to least-squares or L2 regression), resistant regression methods like LMS and LTS can completely disregard influential outliers, sometimes called leverage points, when fitting the model.

The algorithms used in the LMS and LTS subroutines are based on the program PROGRESS by Rousseeuw and Leroy (1987). That reference also provides much additional information. Rousseeuw and Hubert (1997) prepared a new version of PROGRESS incorporating several recent developments. Among other things, the new version of PROGRESS now yields the exact LMS for simple regression and uses a new definition of the robust coefficient of determination (R2). Therefore, the outputs may differ slightly from those given in Rousseeuw and Leroy (1987) or those obtained from software based on the older version of PROGRESS.

For a given m (response) vector $y = (y_1,\dots,y_m)$ and $m \times n$ (design) matrix $X = (x_{ij})$, the linear regression problem consists of estimating the intercept c and the n slope parameters (coefficients) $\beta = (b_1,\dots,b_n)^T$ following the model

$$y_i = c + \sum_{j=1}^{n} b_j x_{ij} + \varepsilon_i, \qquad i = 1,\dots,m$$


Various kinds of linear regression methods differ in the norm they apply in measuring the residual vector $r = (r_i)$,

$$r_i = y_i - \Bigl(c + \sum_{j=1}^{n} b_j x_{ij}\Bigr), \qquad i = 1,\dots,m$$

i.e. the form of objective function used to minimize the residuals $r_i$. We then denote the ordered squared residuals as

$$(r^2)_{1:m} \le \ldots \le (r^2)_{m:m}$$

The objective functions for the LMS and LTS optimization problems are defined as follows:

• LMS: Minimize the h-th ordered squared residual: find $b = (b_1,\dots,b_n)$ such that

$$F_{LMS} = (r^2)_{h:m} \longrightarrow \min$$

Note that for $h = [m/2] + 1$ the h-th quantile is the median of the squared residuals. The default h in PROGRESS is $h = \left[\frac{m+n+1}{2}\right]$, which yields the highest possible breakdown value. (Here $[k]$ denotes the integer part of $k$.)

• LTS: Minimize the sum of the h smallest squared residuals: find $b = (b_1,\dots,b_n)$ such that

$$F_{LTS} = \sqrt{\frac{1}{h}\sum_{i=1}^{h} (r^2)_{i:m}} \longrightarrow \min$$

The value of h should be chosen in the range

$$\frac{m}{2} + 1 \;\le\; h \;\le\; \frac{3m}{4} + \frac{n+1}{4}$$

where m is the number of observations and n is the number of regressors.

The value of h determines the breakdown point, which is “the smallest fraction of contamination that can cause the estimator T to take on values arbitrarily far from T(Z)” (see Rousseeuw and Leroy, 1987, p. 10). Here, T denotes an estimator and T(Z) applies T to a sample Z. The value of h may be specified (see the Syntax section below), but in most applications the default value works just fine and the results seem to be quite stable toward different choices of h.

More illustratively, Rousseeuw and Leroy (1987, p. 262) state: The LMS tries to find the narrowest strip covering h of the data points. The LTS tries to find the h data points for which the sum of the squared residuals is smallest.

Because of the nonsmooth form of these objective functions, the estimates cannot be obtained with traditional optimization algorithms. For LMS and LTS our algorithm, like PROGRESS, selects a number of subsets of n observations out of the m given observations, evaluates the objective function, and saves the subset with the lowest objective function. If computer time does not permit us to evaluate all different subsets, a random collection of subsets is evaluated.
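The subset-sampling step itself can be sketched along the following lines, reusing the hypothetical lmsltscrit() module from the sketch above. The helper drawsubset() (returning n distinct case numbers out of m), the int() integer-part function, and row indexing by an index vector are assumptions for illustration only; the actual subroutines control the number of subsets and the exchange strategy through the options described in the Syntax section below.

/* sketch: random subsampling for LMS (drawsubset() and int() are hypothetical) */
m  = nrow(x); n = ncol(x) + 1;    /* number of coefficients including intercept */
h  = int((m + n + 1) / 2);        /* default quantile */
xx = x -> cons(m,1,1.);
nrep = 1000; best = 1.e300; bbest = cons(n,1,0.);
for (k = 1; k <= nrep; k++) {
   idx = drawsubset(m,n);         /* n distinct case numbers */
   b   = xx[idx,] \ y[idx];       /* exact fit through the n selected cases */
   < flms,flts > = lmsltscrit(y,x,b,h);
   if (flms < best) { best = flms; bbest = b; }
}
print "Best LMS criterion=", best;
print "Coefficients of best subset=", bbest;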

Note that the LMS and LTS subroutines are only executed when the number m of observations is over twice the number n of explanatory variables $x_j$ (including the intercept), i.e. if m > 2n.

1.8.2 Syntax for LMS and LTS

CALL LMS(sc, coef,wgt, opt, y <,< x ><, sorb >>);

CALL LTS(sc, coef,wgt, opt, y <,< x ><, sorb >>);

where

sc is a returned column vector containing a variety of scalar results.

coef is a returned matrix with n columns, where the first row contains the optimal regression coefficients. Depending on the option settings, more rows may be attached.


wgt is a returned matrix with m columns, where the first two rows contain the weights and residuals and a third row may be attached.

opt is an input vector specifying options.

y is an m vector with the response data.

x is an m × n matrix with the regressor data (not needed for the location problem).

sorb is an optional n input vector (almost never needed).

The following are input arguments:

opt refers to an options vector with the following components (missing values are treated as default values). The options vector may be a null vector.

1. opt[1] specifies whether an intercept is used in the model (opt[1]=0) or not (opt[1] ≠ 0). If opt[1]=0, then a column of 1’s is added as the last column to the input matrix X, i.e. the user does not need to add this column of 1’s himself. Default is opt[1]=0.

2. opt[2] specifies the amount of printed output. Higher values request additional output and include the output of lower values.

• opt[2]=0: no output is printed except error messages

• opt[2]=1: print most output except the following:

– arrays of O(m) like weights, residuals, and diagnostic

– history of the optimization process

– subsets that result in singular linear systems

• opt[2]=2: additionally print arrays of O(m) like weights, residuals, and diagnostic; also print case numbers of observations in the best subset and some basic history of the optimization process

• opt[2]=3: additionally print some detail history of the exchange algorithm

• opt[2]=4: additionally print subsets that result in singular linear systems

Default is opt[2]=0.

3. opt[3] specifies whether only LMS or LTS is computed or additionally least-squares (LS) and weighted least-squares (WLS) regression are computed:

(a) opt[3]=0: only LMS or LTS is computed

(b) opt[3]=1: additionally to LMS or LTS compute weighted least-squares regression on the observations with small LMS or LTS residuals (where small is defined by opt[8])

(c) opt[3]=2: additionally to LMS or LTS compute unweighted least-squares regression

(d) opt[3]=3: add both unweighted and weighted least-squares regression to LMS and LTS regression.

Default is opt[3]=0.

4. opt[4] specifies the quantile h to be minimized. This is the number used in the objective function. Default is opt[4] = $h = \left[\frac{m+n+1}{2}\right]$, which corresponds to the highest possible breakdown value. This is also the default of the PROGRESS program. The value of h should be in the range $\frac{m}{2} + 1 \le h \le \frac{3m}{4} + \frac{n+1}{4}$.

5. opt[5] specifies the number NRep of generated subsets. Each subset consists of n observations $(k_1,\dots,k_n)$, where $1 \le k_i \le m$. The total number of subsets consisting of n observations out of m observations is

$$N_{tot} = \binom{m}{n} = \frac{\prod_{j=1}^{n}(m-j+1)}{\prod_{j=1}^{n} j}$$

where n is the number of parameters including the intercept. Due to computer time restrictions not all subset combinations of n observations out of m can be inspected for larger values of m and n. Specifying a value of NRep < Ntot permits the user to save computer time at the expense of computing a suboptimal solution. If opt[5] is zero or missing, the default number of subsets is taken from the following table:


n        1        2     3     4     5     6     7     8     9     10
Nlower   500      50    22    17    15    14    0     0     0     0
Nupper   1000000  1414  182   71    43    32    27    24    23    22
NRep     500      1000  1500  2000  2500  3000  3000  3000  3000  3000

n        11    12    13    14    15
Nlower   0     0     0     0     0
Nupper   22    22    22    23    23
NRep     3000  3000  3000  3000  3000

If the number of cases (observations) m is smaller than Nlower, then all possible subsets are used; otherwise NRep subsets are drawn randomly. That means, for opt[5]=-1 an exhaustive search is done. If m is larger than Nupper, a note is printed in the log file telling how many subsets exist.

6. opt[6] specifies an algorithm which may be run following the random subsampling algorithm and will try to improve a solution by pairwise exchange of one observation from the current p-subset (swap-out) with an observation from its complementary subset (swap-in). The algorithm may be costly in computer time and is not guaranteed to improve the solution. The following three algorithms are available:

(a) opt[6]=1: complete pairwise exchange: looking for the best possible pair to exchange (most costly).

(b) opt[6]=2: partial pairwise exchange: looking for the first exchange pair with a somewhat significant improvement.

(c) opt[6]=3: partial pairwise exchange: consider only exchanges with a promising set of observations.

Default is opt[6]=0, i.e. no pairwise exchange algorithm is executed. In a small number of numerical simulations the exchange algorithm opt[6]=2 performed best and algorithm opt[6]=3 performed worst.

7. opt[7] specifies that the last argument sorb contains a given parameter vector b rather than a given subset for which the objective function should be evaluated.

8. opt[8] is relevant only for LS and WLS regression (opt[3] > 0). It specifies whether the covariance matrix of parameter estimates and approximate standard errors (ASE) are computed and printed:

(a) opt[8]=0: do not compute covariance matrix and ASE’s
(b) opt[8]=1: compute covariance matrix and ASE’s but only print the ASE’s
(c) opt[8]=3: compute and print both covariance matrix and ASE’s

Default is opt[8]=0.

y refers to an m response vector y

x refers to an m × n matrix X of regressors. If opt[1] is zero or missing, an intercept $x_{n+1} \equiv 1$ is added by default as the last column of X. If the matrix X is not specified, y is analyzed as a univariate data set (location problem, see the Barnett & Lewis example).

sorb refers to an n vector containing either

• n observation numbers of a subset for which the objective function should be evaluated and which may be the start for a pairwise exchange algorithm if opt[6] is specified

• n given parameters $b = (b_1,\dots,b_n)$ (including intercept if necessary) for which the objective function should be evaluated.

Missing values are not permitted in x or y. Missing values in opt cause the default value to be used.

The LMS and LTS subroutines return the following values:


sc is a column vector containing the following scalar information, where rows 1 to 9 correspond to LMS or LTS regression and rows 11 to 14 correspond either to LS or WLS:

1. sc[1] the quantile h used in the objective function

2. sc[2] number of subsets generated

3. sc[3] number of subsets with singular linear systems

4. sc[4] number of nonzero weights wi

5. sc[5] lowest value of the objective function $F_{LMS}$ or $F_{LTS}$ attained

6. sc[6] preliminary LMS or LTS scale estimate $S_P$

7. sc[7] final LMS or LTS scale estimate $S_F$

8. sc[8] robust R2 (coefficient of determination)

9. sc[9] asymptotic consistency factor

If opt[3] > 0 then the following can also be set:

1. sc[11] LS or WLS objective function (sum of squared residuals)

2. sc[12] LS or WLS scale estimate

3. sc[13] R2 value for LS or WLS

4. sc[14] F value for LS or WLS

For opt[3]=1 or opt[3]=3 these rows correspond to WLS estimates, for opt[3]=2 to LS estimates.

coef is a matrix with n columns containing the following results in its rows:

1. coef[1,] LMS or LTS parameter estimates

2. coef[2,] indices of observations in the best subset

If opt[3] > 0 then the following can also be set:

1. coef[3,] LS or WLS parameter estimates

2. coef[4,] approximate standard errors of LS or WLS estimates

3. coef[5,] t values

4. coef[6,] p values

5. coef[7,] lower boundary of Wald confidence intervals

6. coef[8,] upper boundary of Wald confidence intervals

For opt[3]=1 or opt[3]=3 these rows correspond to WLS estimates, for opt[3]=2 to LS estimates.

wgt is a matrix with m columns containing the following results in its rows:

1. wgt[1,] weights (=1 for small, =0 for large residuals)

2. wgt[2,] residuals $r_i = y_i - x_i b$

3. wgt[3,] resistant diagnostic $u_i$ (note that the resistant diagnostic cannot be computed for a perfect fit where the objective function is zero or nearly zero)
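Putting the pieces together, a call following the CALL statement above might look like the following sketch. The option settings are illustrative only, and whether the routine is invoked through this CALL interface or through the reg() interface used in the examples below may depend on the installation.

/* sketch: LMS with additional unweighted LS and weighted LS regression */
opt    = cons(8,1,.);     /* options vector; missing entries mean defaults      */
opt[2] = 2;               /* print weights, residuals, and diagnostics          */
opt[3] = 3;               /* add unweighted LS and weighted LS (WLS) regression */
opt[8] = 1;               /* compute covariance matrix, print only the ASE's    */
call lms(sc,coef,wgt,opt,y,x);
print "LMS criterion=", sc[5], " final scale=", sc[7], " robust R2=", sc[8];
print "LMS coefficients=", coef[1,];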

1.8.3 Some Computational Details

The breakdown value $\varepsilon^*_m$ of LMS and LTS computes to

$$\varepsilon^*_m = \begin{cases} \dfrac{h-n+1}{m} & \text{if } h < \dfrac{m+n+1}{2} \\[1ex] \dfrac{m-h+1}{m} & \text{if } h \ge \dfrac{m+n+1}{2} \end{cases}$$


and the value $h = \left[\frac{m+n+1}{2}\right]$ corresponds to the highest possible breakdown value.
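For the Hertzsprung-Russell star data analyzed below (m = 47 observations, n = 2 coefficients, default quantile h = 25), the second case applies and yields the breakdown value reported in the output:

$$h = \left[\frac{m+n+1}{2}\right] = \left[\frac{47+2+1}{2}\right] = 25, \qquad \varepsilon^*_m = \frac{m-h+1}{m} = \frac{47-25+1}{47} = \frac{23}{47} \approx 48.94\,\%$$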

When run with the print option opt[2] > 0, some summary statistics are printed, including the median and the dispersion (MAD). The MAD is defined as

$$\mathrm{MAD}(y_1,\dots,y_m) = \mathrm{median}_{i=1}^{m}\,\bigl|\,y_i - \mathrm{median}_{j=1}^{m}\, y_j\,\bigr|$$

If no intercept is included in the model, the dispersions $s_j$ are computed as (Rousseeuw & Leroy, 1987, p. 63)

$$s_j = 1.4826 \cdot \mathrm{median}_{i=1}^{m}\,|x_{ij}|$$

since we now consider deviations from zero (rather than a central statistic).
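As a minimal CMAT sketch of these dispersion measures, assume that univar() with the "med" option (as used in the Caco2 example earlier) returns the median of its argument and that the usual scalar broadcasting applies; xj denotes a single regressor column:

/* sketch: MAD and dispersions (median via univar() assumed) */
sopt = [ "med" ];
med  = univar(y,sopt);                   /* median of the response                */
madv = univar(abs(y - med),sopt);        /* MAD = median_i |y_i - median_j y_j|   */
disp = 1.4826 * madv;                    /* dispersion printed with an intercept  */
sj   = 1.4826 * univar(abs(xj),sopt);    /* dispersion s_j without an intercept   */

For the star data below this gives, for example, a response dispersion of 1.4826 · 0.45 ≈ 0.6672, matching the printed summary table.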

The preliminary LMS and LTS scale estimate $S_P$ is defined as the product of the asymptotic consistency factor ACF and the objective function $F(x_i,y_i)$:

$$S_P = s(x_i,y_i) = ACF \cdot F(x_i,y_i)$$

The asymptotic consistency factors are defined as

$$ACF_{LMS} = c_\alpha = \frac{1}{\Phi^{-1}\!\left(\frac{\alpha+1}{2}\right)}$$

and

$$ACF_{LTS} = \frac{1}{\sqrt{1 - \frac{2}{\alpha\, c_\alpha}\,\varphi\!\left(\frac{1}{c_\alpha}\right)}}$$

where $\alpha = h/m$ and $\Phi^{-1}(p)$ denotes the quantile of the standard normal distribution (see the PROBIT function in the SAS Language) and

$$\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$$

The final LMS and LTS scale estimate is defined as

$$S_F = \sqrt{\frac{\sum_{i=1}^{m} w_i r_i^2}{\sum_{i=1}^{m} w_i - n}}$$
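In CMAT notation, given the objective function value fobj, the consistency factor acf, the 0/1 weights wgt, the residuals res, and the number n of coefficients, the two scale estimates reduce to the following sketch (elementwise multiplication .* assumed):

/* sketch: preliminary and final LMS/LTS scale estimates */
sp = acf * fobj;                                      /* S_P = ACF * F(x_i,y_i)      */
sf = sqrt( (wgt .* res .* res)[+] / (wgt[+] - n) );   /* S_F from weighted residuals */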

The robust $R^2$ (coefficient of determination) is defined as

$$R^2 = 1 - \left(\frac{s(x_i,y_i)}{s(1,y_i)}\right)^2$$

where $s(1,y_i) = ACF \cdot F(1,y_i)$ is computed like the objective function, substituting the regressors $x_i$ by the constant 1. In case of regression through the origin, we define

$$R^2 = 1 - \left(\frac{s(x_i,y_i)}{s(0,y_i)}\right)^2$$

The “outlyingness” $u_i$ (see Rousseeuw and Leroy, 1987, pp. 238) is computed using the residuals $r_i$ and the value of the objective function and obtaining the maximum

$$u_i = \max_{b^{(\tau)}} \frac{|r_i^{(\tau)}|}{s^{(\tau)}(x_i,y_i)}$$

across all n-subsets used in the computation of LMS and LTS. The resistant diagnostic is the ratio of the “outlyingness” values $u_i$ and their median,

$$\mathrm{diagnostic}_i = \frac{u_i}{\mathrm{median}_{j=1}^{m}\, u_j}$$

Comparing LMS and LTS, Rousseeuw and Leroy (1987) state that LTS is more efficient asymptotically than LMS.


1.8.4 Flow Chart for LMS and LTS

Figure 1.5 (Flow Chart for LMS and LTS) shows the processing sequence: starting with or without an initial point, LS estimation is followed by LMS or LTS estimation (by complete enumeration or random subsampling of the objective function, optionally improved by a pairwise exchange step), which in turn is followed by WLS estimation using the weights from LMS or LTS. The separate LMS or LTS part inside the dashed box of the chart corresponds to MVE.

1.8.5 Examples for LMS and LTS

Hertzsprung-Russell Star Data: LMS Regression

These data are listed in chapter 7. They are reported in Rousseeuw & Leroy (1987, p. 27) and are based on Humphreys (1978) and Vansina & De Greve (1982). The 47 observations correspond to 47 stars of the CYG OB1 cluster in the direction of Cygnus. The regressor variable (column 2) x is the logarithm of the effective temperature at the surface of the star (Te) and the response variable (column 3) y is the logarithm of its light intensity (L/L0).

• Cases (rows): measurements of 47 stars in cluster CYG OB1

• column 1: number of star

• Column 2: logarithm of effective temperature on star surface

• Column 3: logarithm of light intensity

The results for LS and LMS on page 28 of Rousseeuw & Leroy are based on a more precise (5 decimal places) version of the data set. This data set is remarkable since it contains four substantial leverage points (giant stars) corresponding to observations 11, 20, 30, and 34, which greatly affect the results of L2 and even L1 regression.

x = xy[,2]; y = xy[,3];

optn = cons(10,1,.);

optn[1]= 3;

optn[3]= 3;

optn[7]= 1;

optn[8]= 1;

print "*** Hertzsprung-Russell Star Data: LMS Regression ***";

< parm,ase,conf,covm,res> = reg(y,x,"lms",optn);

First, some simple statistics are printed:

Number of Variables 2 Number of Observations 47

Specified Quantile 25 Default Quantile 25

Breakdown Value 48.94 Highest Breakdown 48.94

Total Number Subsets 1081 N Subset (Specified) 0

Variable MinVal 1st Qu. Median Mean 3rd Qu. MaxVal

X1 3.480000 4.26000 4.420000 4.310000 4.45000 4.620000

Intercep 1.000000 1.00000 1.000000 1.000000 1.00000 1.000000

Response 3.940000 4.50000 5.100000 5.012128 5.42000 6.290000

Variable StdDev MAD Dispersion S_n

X1 0.29082342 0.11000000 0.16308624 0.14590594

Intercep 0.00000000 0.00000000 0.00000000 0.00000000

Response 0.57124934 0.45000000 0.66717100 0.60794143

Pearson Correlation Matrix

--------------------------


1| 1

2| . 1

3| -0.2104 . 1

Spearman Rank Correlation Matrix

--------------------------------

1| 1

2| 0.5022 1

3| 0.2969 0.5002 1

Classical Covariance Matrix

---------------------------

1| 0.08458

2| 0 0

3| -0.03496 0 0.3263

The LS solution does not show any large residuals; however, the R2 value is very small:

***********************************

Unweighted Least-Squares Estimation

***********************************

LS Parameter Estimates

Var Estimate AStdErr T Value Prob Low W_CI Upp W_CI

X1 -0.4133039 0.286257 -1.4438 0.1557 -0.974358 0.147750

Intercep 6.7934673 1.236516 5.4940 2e-006 4.369941 9.216993

Sum of Squares 14.346395 Degrees of Freedom 45

LS Scale Estimate 0.5646315 R-squared 0.0442737

F(1,45) Statistic 2.0846121 Probability 0.1557164

COV Matrix of Parameter Estimates

---------------------------------

1| 0.08194

2| -0.3532 1.529

LS Residuals

Case Observed Estimated Residual Res / S Weight

1 5.2300000 4.9873294 0.2426706 0.4297857 1.00000

2 5.7400000 4.9088017 0.8311983 1.4721075 1.00000

3 4.9300000 5.0327929 -0.1027929 -0.1820530 1.00000

4 5.7400000 4.9088017 0.8311983 1.4721075 1.00000

5 5.1900000 5.0162607 0.1737393 0.3077039 1.00000

6 5.4600000 4.9501321 0.5098679 0.9030100 1.00000

7 4.6500000 5.2063805 -0.5563805 -0.9853868 1.00000

8 5.2700000 4.9046687 0.3653313 0.6470261 1.00000

9 5.5700000 5.0327929 0.5372071 0.9514296 1.00000


10 5.1200000 4.9873294 0.1326706 0.2349684 1.00000

11 5.7300000 5.3510368 0.3789632 0.6711690 1.00000

12 5.4500000 4.9625312 0.4874688 0.8633397 1.00000

13 5.4200000 4.9418660 0.4781340 0.8468071 1.00000

14 4.0500000 5.1361188 -1.0861188 -1.9235887 1.00000

15 4.2600000 5.0203937 -0.7603937 -1.3467079 1.00000

16 4.5800000 4.9666642 -0.3866642 -0.6848081 1.00000

17 3.9400000 5.0451920 -1.1051920 -1.9573685 1.00000

18 4.1800000 4.9666642 -0.7866642 -1.3932347 1.00000

19 4.1800000 5.0451920 -0.8651920 -1.5323125 1.00000

20 5.8900000 5.3510368 0.5389632 0.9545396 1.00000

21 4.3800000 5.0203937 -0.6403937 -1.1341799 1.00000

22 4.2200000 5.0203937 -0.8003937 -1.4175505 1.00000

23 4.4200000 4.9666642 -0.5466642 -0.9681787 1.00000

24 4.8500000 4.9377330 -0.0877330 -0.1553809 1.00000

25 5.0200000 4.9831964 0.0368036 0.0651816 1.00000

26 4.6600000 4.9666642 -0.3066642 -0.5431228 1.00000

27 4.6600000 5.0203937 -0.3603937 -0.6382813 1.00000

28 4.9000000 4.9831964 -0.0831964 -0.1473463 1.00000

29 4.3900000 5.0493250 -0.6593250 -1.1677084 1.00000

30 6.0500000 5.3551699 0.6948301 1.2305904 1.00000

31 4.4200000 4.9831964 -0.5631964 -0.9974583 1.00000

32 5.1000000 4.9088017 0.1911983 0.3386249 1.00000

33 5.2200000 4.9542651 0.2657349 0.4706341 1.00000

34 6.2900000 5.3510368 0.9389632 1.6629662 1.00000

35 4.3400000 5.0451920 -0.7051920 -1.2489419 1.00000

36 5.6200000 4.8840035 0.7359965 1.3034988 1.00000

37 5.1000000 4.9212008 0.1787992 0.3166653 1.00000

38 5.2200000 4.9542651 0.2657349 0.4706341 1.00000

39 5.1800000 4.9212008 0.2587992 0.4583506 1.00000

40 5.5700000 4.9625312 0.6074688 1.0758677 1.00000

41 4.6200000 4.9831964 -0.3631964 -0.6432450 1.00000

42 5.0600000 4.9542651 0.1057349 0.1872635 1.00000

43 5.3400000 4.9335999 0.4064001 0.7197616 1.00000

44 5.3400000 4.9542651 0.3857349 0.6831621 1.00000

45 5.5400000 4.9129347 0.6270653 1.1105743 1.00000

46 4.9800000 4.9542651 0.0257349 0.0455782 1.00000

47 4.5000000 4.9666642 -0.4666642 -0.8264934 1.00000

Distribution of Residuals

MinRes 1st Qu. Median Mean 3rd Qu. MaxRes

-1.1 -0.55 0.133 -1.8e-016 0.41 0.94

Looking at the Best Crit column of the iteration history table, we see that with complete enumeration the optimal solution was found very early here:

There are 1081 subsets of 2 cases out of 47 cases.

All 1081 subsets will be considered.

***********************************************

*** Complete Enumeration for LMS ***


***********************************************

Subset Singular Best Crit Pct

271 5 0.3928 25%

541 8 0.3928 50%

811 27 0.3928 75%

1081 45 0.3928 100%

Minimum Criterion = 0.392791

****************************************

Least Median of Squares (LMS) Regression

****************************************

Minimizing 25th Ordered Squared Residual

Specified Quantile 25 Default Quantile 25

Breakdown Value 48.94 Highest Breakdown 48.94

All Subset Selection 1081 N Singular Subsets 45

Objective Function 0.2620588 Preliminary Scale 0.3987302

Robust R Squared 0.5813149 Final Scale Estimate 0.3645644

Observations of Best Subset

---------------------------

1 : 2 29

Estimated Coefficients

----------------------

1 : 3.971 -12.63

LMS Residuals

Case Observed Estimated Residual Res / S Weight

1 5.2300000 4.7235294 0.5064706 1.3892484 1.00000

2 5.7400000 5.4779412 0.2620588 0.7188271 1.00000

3 4.9300000 4.2867647 0.6432353 1.7643939 1.00000

4 5.7400000 5.4779412 0.2620588 0.7188271 1.00000

5 5.1900000 4.4455882 0.7444118 2.0419209 1.00000

6 5.4600000 5.0808824 0.3791176 1.0399194 1.00000

7 4.6500000 2.6191176 2.0308824 5.5707087 0.00000

8 5.2700000 5.5176471 -0.2476471 -0.6792957 1.00000

9 5.5700000 4.2867647 1.2832353 3.5199134 0.00000

10 5.1200000 4.7235294 0.3964706 1.0875185 1.00000

11 5.7300000 1.2294118 4.5005882 12.345110 0.00000

12 5.4500000 4.9617647 0.4882353 1.3392290 1.00000

13 5.4200000 5.1602941 0.2597059 0.7123730 1.00000

14 4.0500000 3.2941176 0.7558824 2.0733847 1.00000

15 4.2600000 4.4058824 -0.1458824 -0.4001552 1.00000

16 4.5800000 4.9220588 -0.3420588 -0.9382671 1.00000

17 3.9400000 4.1676471 -0.2276471 -0.6244357 1.00000


18 4.1800000 4.9220588 -0.7420588 -2.0354668 1.00000

19 4.1800000 4.1676471 0.0123529 0.0338841 1.00000

20 5.8900000 1.2294118 4.6605882 12.783990 0.00000

21 4.3800000 4.4058824 -0.0258824 -0.0709953 1.00000

22 4.2200000 4.4058824 -0.1858824 -0.5098751 1.00000

23 4.4200000 4.9220588 -0.5020588 -1.3771470 1.00000

24 4.8500000 5.2000000 -0.3500000 -0.9600497 1.00000

25 5.0200000 4.7632353 0.2567647 0.7043054 1.00000

26 4.6600000 4.9220588 -0.2620588 -0.7188271 1.00000

27 4.6600000 4.4058824 0.2541176 0.6970445 1.00000

28 4.9000000 4.7632353 0.1367647 0.3751455 1.00000

29 4.3900000 4.1279412 0.2620588 0.7188271 1.00000

30 6.0500000 1.1897059 4.8602941 13.331783 0.00000

31 4.4200000 4.7632353 -0.3432353 -0.9414941 1.00000

32 5.1000000 5.4779412 -0.3779412 -1.0366924 1.00000

33 5.2200000 5.0411765 0.1788235 0.4905128 1.00000

34 6.2900000 1.2294118 5.0605882 13.881190 0.00000

35 4.3400000 4.1676471 0.1723529 0.4727640 1.00000

36 5.6200000 5.7161765 -0.0961765 -0.2638120 1.00000

37 5.1000000 5.3588235 -0.2588235 -0.7099527 1.00000

38 5.2200000 5.0411765 0.1788235 0.4905128 1.00000

39 5.1800000 5.3588235 -0.1788235 -0.4905128 1.00000

40 5.5700000 4.9617647 0.6082353 1.6683889 1.00000

41 4.6200000 4.7632353 -0.1432353 -0.3928943 1.00000

42 5.0600000 5.0411765 0.0188235 0.0516329 1.00000

43 5.3400000 5.2397059 0.1002941 0.2751067 1.00000

44 5.3400000 5.0411765 0.2988235 0.8196727 1.00000

45 5.5400000 5.4382353 0.1017647 0.2791405 1.00000

46 4.9800000 5.0411765 -0.0611765 -0.1678070 1.00000

47 4.5000000 4.9220588 -0.4220588 -1.1577070 1.00000

Distribution of Residuals

MinRes 1st Qu. Median Mean 3rd Qu. MaxRes

-0.74 -0.19 0.172 0.527 0.4 5.1

The resistant diagnostic is an indicator for the outlyingness of each observation:

Resistant Diagnostic

Case U Diagnostic

1 2.14267352 0.92129969

2 2.04278783 0.87835117

3 3.00943396 1.29398649

4 2.04278783 0.87835117

5 3.12982005 1.34574970

6 1.73873874 0.74761715

7 9.02702703 3.88141131

8 2.11341991 0.90872132

9 4.98843188 2.14490949


10 1.78920308 0.76931564

11 17.7750643 7.64286351

12 2.08108108 0.89481638

13 1.36111111 0.58524607

14 5.43803419 2.33822801

15 3.40836237 1.46551641

16 2.32570741 1.00000000

17 4.36445993 1.87661609

18 3.38888889 1.45714327

19 3.72891986 1.60334866

20 18.2892031 7.86393123

21 3.09059233 1.32888270

22 3.51428571 1.51106098

23 2.74954846 1.18224178

24 2.07532468 0.89234126

25 1.34008097 0.57620360

26 2.11378688 0.90887911

27 2.34912892 1.01007071

28 1.55052265 0.66668861

29 3.19094077 1.37203019

30 18.9627249 8.15352993

31 2.82179410 1.21330572

32 2.45454545 1.05539736

33 1.00000000 0.42997670

34 19.5745501 8.41660051

35 3.30522648 1.42117038

36 2.31914894 0.99718001

37 1.97662338 0.84990200

38 1.00000000 0.42997670

39 1.74112554 0.74864342

40 2.51351351 1.08075225

41 2.29199278 0.98550349

42 1.00000000 0.42997670

43 1.00000000 0.42997670

44 1.43243243 0.61591257

45 1.64516129 0.70738103

46 1.21192053 0.52109759

47 2.53762793 1.09112089

Median(U)= 2.32571

The output for WLS regression follows. Due to the size of the scaled residuals, six observations (numbers 7, 9, 11, 20, 30, 34) were assigned zero weights in the following WLS analysis. The R-squared value of the WLS fit is much better:

*********************************

Weighted Least-Squares Estimation

*********************************

RLS Parameter Estimates Based on LMS

Var Estimate AStdErr T Value Prob Low W_CI Upp W_CI

X1 3.0461569 0.437339 6.9652 2e-008 2.188988 3.903326

Intercep -8.5000549 1.926308 -4.4126 8e-005 -12.27555 -4.724561


Weighted SSQ 4.5281945 Degrees of Freedom 39

RLS Scale Estimate 0.3407456 Weighted R-squared 0.5543574

F(1,39) Statistic 48.514066 Probability 2.39e-008

Average Weight 0.8723404 N Nonzero Weights 41

COV Matrix of Parameter Estimates

---------------------------------

1| 0.1913

2| -0.8421 3.711

Weighted LS Residuals

Case Observed Estimated Residual Res / S Weight

1 5.2300000 4.8116509 0.4183491 1.2277461 1.00000

2 5.7400000 5.3904207 0.3495793 1.0259245 1.00000

3 4.9300000 4.4765737 0.4534263 1.3306888 1.00000

4 5.7400000 5.3904207 0.3495793 1.0259245 1.00000

5 5.1900000 4.5984199 0.5915801 1.7361342 1.00000

6 5.4600000 5.0858051 0.3741949 1.0981652 1.00000

7 4.6500000 3.1971878 1.4528122 4.2636275 0.00000

8 5.2700000 5.4208823 -0.1508823 -0.4428005 1.00000

9 5.5700000 4.4765737 1.0934263 3.2089230 0.00000

10 5.1200000 4.8116509 0.3083491 0.9049246 1.00000

11 5.7300000 2.1310328 3.5989672 10.562036 0.00000

12 5.4500000 4.9944203 0.4555797 1.3370082 1.00000

13 5.4200000 5.1467282 0.2732718 0.8019820 1.00000

14 4.0500000 3.7150344 0.3349656 0.9830372 1.00000

15 4.2600000 4.5679584 -0.3079584 -0.9037780 1.00000

16 4.5800000 4.9639588 -0.3839588 -1.1268195 1.00000

17 3.9400000 4.3851890 -0.4451890 -1.3065143 1.00000

18 4.1800000 4.9639588 -0.7839588 -2.3007159 1.00000

19 4.1800000 4.3851890 -0.2051890 -0.6021764 1.00000

20 5.8900000 2.1310328 3.7589672 11.031595 0.00000

21 4.3800000 4.5679584 -0.1879584 -0.5516091 1.00000

22 4.2200000 4.5679584 -0.3479584 -1.0211677 1.00000

23 4.4200000 4.9639588 -0.5439588 -1.5963781 1.00000

24 4.8500000 5.1771898 -0.3271898 -0.9602172 1.00000

25 5.0200000 4.8421125 0.1778875 0.5220537 1.00000

26 4.6600000 4.9639588 -0.3039588 -0.8920403 1.00000

27 4.6600000 4.5679584 0.0920416 0.2701183 1.00000

28 4.9000000 4.8421125 0.0578875 0.1698848 1.00000

29 4.3900000 4.3547274 0.0352726 0.1035160 1.00000

30 6.0500000 2.1005713 3.9494287 11.590550 0.00000

31 4.4200000 4.8421125 -0.4221125 -1.2387908 1.00000

32 5.1000000 5.3904207 -0.2904207 -0.8523097 1.00000

33 5.2200000 5.0553435 0.1646565 0.4832242 1.00000

34 6.2900000 2.1310328 4.1589672 12.205491 0.00000

35 4.3400000 4.3851890 -0.0451890 -0.1326179 1.00000

36 5.6200000 5.5731902 0.0468098 0.1373747 1.00000

37 5.1000000 5.2990360 -0.1990360 -0.5841192 1.00000

38 5.2200000 5.0553435 0.1646565 0.4832242 1.00000

39 5.1800000 5.2990360 -0.1190360 -0.3493399 1.00000


40 5.5700000 4.9944203 0.5755797 1.6891772 1.00000

41 4.6200000 4.8421125 -0.2221125 -0.6518426 1.00000

42 5.0600000 5.0553435 0.0046565 0.0136657 1.00000

43 5.3400000 5.2076513 0.1323487 0.3884091 1.00000

44 5.3400000 5.0553435 0.2846565 0.8353931 1.00000

45 5.5400000 5.3599592 0.1800408 0.5283732 1.00000

46 4.9800000 5.0553435 -0.0753435 -0.2211136 1.00000

47 4.5000000 4.9639588 -0.4639588 -1.3615988 1.00000

Distribution of Residuals

MinRes 1st Qu. Median Mean 3rd Qu. MaxRes

-0.78 -0.22 0.092 0.383 0.35 4.2


1.9 Details on Outlier Detection: MVE and MCD

1.9.1 Estimation Principles

• MVE: robust (resistant) estimation of multivariate location and scatter, defined by minimizing the volume of an ellipsoid containing h points.

• MCD: robust (resistant) estimation of multivariate location and scatter, defined by minimizing the determinant of the covariance matrix computed from h points.

Here h is defined in the range

m/2 + 1 ≤ h ≤ 3m/4 + (n + 1)/4

where m is the number of observations and n is the number of regressors.

The value of h determines the breakdown point, which is "the smallest fraction of contamination that can cause the estimator T to take on values arbitrarily far from T(Z)" (see Rousseeuw and Leroy, 1987, p. 10). Here, T denotes an estimator and T(Z) applies T to a sample Z. The value of h may be specified (see the Syntax section below), but in most applications the default value works just fine, and the results seem to be quite stable toward different choices of h.
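As a worked check added here (using the default h = (m + n + 1)/2, rounded down, given under opt[4] in the Syntax section below; the printed breakdown percentage appears to be computed as (m − h)/m), consider the Brainlog data of the examples below with m = 28 observations and n = 2 variables:

h = ⌊(28 + 2 + 1)/2⌋ = 15,   (m − h)/m = 13/28 ≈ 46.4%

which matches the lines "Consider Ellipsoids Containing 15 Cases" and "Breakdown Value 46.43" in the MVE output of Section 1.9.3.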

The MVE and MCD subroutines compute the minimum volume ellipsoid estimator and the minimum covariance determinant estimator. These robust locations and covariance matrices can be used to detect multivariate outliers and leverage points. For this purpose, the MVE and MCD subroutines provide a table of robust distances.

The MVE and MCD and other robust methods (e.g. LMS and LTS) were developed in Rousseeuw (1984) and Rousseeuw and Leroy (1987). Some applications and properties of the MVE can be found in Rousseeuw and Van Zomeren (1990), Lopuhaa and Rousseeuw (1991), and Davies (1992).

The MVE algorithm is based on the algorithm used in the MINVOL program by P.J. Rousseeuw. The MCD algorithm is based on the FAST-MCD program developed by P.J. Rousseeuw and K. Van Driessen (1997). In their paper Rousseeuw and Van Driessen recommend the use of MCD over that of MVE for the following reasons:

1. The MCD has a higher statistical efficiency because it is asymptotically normal (Butler, Davies, and Jhun, 1993). Therefore the MCD yields more precise results.

2. The MCD algorithm is faster than the MVE algorithm and can deal with a much larger number of observations.

3. The MCD algorithm is able to detect an exact fit situation, i.e. when at least h observations lie on a hyperplane.

The objective functions for the MVE and MCD are:

• MVE: find T and C such that

F_MVE = √(det(C)) → min

subject to d_{h:m} = √(χ²_{n,0.5}), where the Mahalanobis-type distances are computed as

d_i = √( (x_i − T)' C^{-1} (x_i − T) )

and T is a location estimate vector and C is a scatter matrix estimate. Here, d_{h:m} is the h-th quantile of the Mahalanobis-type distances d_i.

• MCD: find a subset J of h points such that

F_MCD = √(det(C_J)) → min

where C_J is the usual covariance matrix of J. Then put C = C_J and T = mean(J).

More illustratively, Rousseeuw and Leroy (1987, p. 262) state: the MVE tries to find the smallest ellipse to cover h of the data points, whereas the MCD tries to find the h data points for which the determinant of the covariance matrix is minimal.


Because of the nonsmooth form of these objective functions, the estimates cannot be obtained with traditional optimization algorithms. For MVE and MCD our algorithm, like MINVOL and FAST-MCD, selects a number of subsets of n observations out of the m given observations, evaluates the objective function, and saves the subset with the lowest objective function. If computer time does not permit us to evaluate all different subsets, a random collection of subsets is evaluated.

Note that the MVE and MCD subroutines are only executed when the number m of observations is over twice the number n of explanatory variables x_j (including the intercept), i.e. if m > 2n.

1.9.2 Syntax for MVE and MCD

CALL MVE(sc, coef, dist, opt, x <, s>);

CALL MCD(sc, coef, dist, opt, x <, s>);

where

sc is a returned column vector containing a variety of scalar results.

coef is a returned matrix with n columns, where the first row contains the location of the ellipsoid center.

dist is a returned matrix with m columns containing a variety of distance measures.

opt is an input vector specifying options.

x is an m × n data matrix.

s is an optional n input vector (almost never needed).

The arguments opt and x are input arguments:

opt (optional) refers to an options vector with the following components (missing values are treated as default values):

1. opt[1] specifies the amount of printed output. Higher option values request additional output and include the output of lower values.

• opt[1]=0: no output is printed except error messages

• opt[1]=1: print most of the output

• opt[1]=2: additionally print case numbers of observations in the best subset and some basic history of the optimization process

• opt[1]=3: for MVE with opt[6] specified: additionally print some detailed history of the exchange algorithm; for MCD: print information about the iteration progress

• opt[1]=4: for MVE: additionally print how many subsets result in singular linear systems; for MCD: print more information about the iteration progress

Default is opt[1]=0.

2. opt[2] specifies whether the classical, initial, and final robust covariance matrices are printed. Default is opt[2]=0. Note that the final robust covariance matrix is always returned in coef.

3. opt[3] specifies whether the classical, initial, and final robust correlation matrices are printed or returned:

(a) opt[3]=0: do not return or print

(b) opt[3]=1: print the robust correlation matrix

(c) opt[3]=2: return the final robust correlation matrix in coef

(d) opt[3]=3: print and return the final robust correlation matrix

4. opt[4] specifies the quantile h used in the objective function. Default is opt[4] = h = (m + n + 1)/2 (rounded down to an integer). If the value of h is specified outside the range

m/2 + 1 ≤ h ≤ 3m/4 + (n + 1)/4

it is reset to the closest boundary of this region.

5. opt[5] specifies the number NRep of subset generations and is the same as described above for LMS and LTS. Due to computer time restrictions, not all subset combinations can be inspected for larger values of m and n. If opt[5] is zero or missing, the default number of subsets is taken from the following table:


n           1      2     3     4     5     6     7     8     9    10
Nlower    500     50    22    17    15    14     0     0     0     0
Nupper 1000000  1414   182    71    43    32    27    24    23    22
NRep      500   1000  1500  2000  2500  3000  3000  3000  3000  3000

n          11     12    13    14    15
Nlower      0      0     0     0     0
Nupper     22     22    22    23    23
NRep     3000   3000  3000  3000  3000

If the number of cases (observations) m is smaller than Nlower, then all possible subsets are used; otherwise NRep subsets are drawn randomly. That means that for opt[5]=-1 an exhaustive search is done. If m is larger than Nupper, a note is printed in the log file telling how many subsets exist.

6. opt[6] is valid only for MVE and specifies an algorithm which may be run following the random subsampling algorithm and which tries to improve a solution by pairwise exchange of one observation from the current p-subset (swap-out) with an observation from its complementary subset (swap-in). The algorithm may be costly in computer time and is not guaranteed to improve the solution. The following three algorithms are available:

(a) opt[6]=1: complete pairwise exchange: looking for the best possible pair to exchange (most costly);

(b) opt[6]=2: partial pairwise exchange: looking for the first exchange pair with a somewhat significant improvement;

(c) opt[6]=3: partial pairwise exchange: consider only exchanges with a promising set of observations.

Default is opt[6]=0, i.e. no pairwise exchange algorithm is executed. In a small number of numerical simulations the exchange algorithm opt[6]=2 performed best and algorithm opt[6]=3 performed worst.

x refers to an m × n matrix X of regressors.

s refers to an n-vector containing n observation numbers of a subset for which the objective function should be evaluated and which may serve as the start for a pairwise exchange algorithm if opt[6] is specified.

Missing values are not permitted in x. Missing values in opt cause the default value to be used.

The MVE subroutine returns the following values:

sc is a column vector containing the following scalar information:

1. sc[1] the quantile h used in the objective function

2. sc[2] number of subsets generated

3. sc[3] number of subsets with singular linear systems

4. sc[4] number of nonzero weights wi

5. sc[5] lowest value of the objective function F_MVE attained (volume of the smallest ellipsoid found)

6. sc[6] Mahalanobis-like distance used in the computation of the lowest value of the objective function F_MVE

7. sc[7] the cutoff value used for the outlier decision

coef is a matrix with n columns containing the following results in its rows:

1. coef[1,]: location of ellipsoid center

2. coef[2,]: eigenvalues of final robust scatter matrix

3. coef[3:2+n,]: contains the final robust scatter matrix for opt[2]=1 or opt[2]=3

dist is a matrix with m columns containing the following results in its rows:

1. dist[1,]: Mahalanobis distances

2. dist[2,]: robust distances based on the final estimates

3. dist[3,]: weights (=1 for small, =0 for large robust distances)
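Putting the syntax together, the following minimal sketch (added for illustration; x stands for any m × n data matrix already in memory, and the option settings are arbitrary) calls the MVE estimator via outlmd() and prints the robust distances and outlier weights returned in dist:

optn = cons(5,1,.);   /* options vector; missing entries take their defaults */
optn[1] = 1;          /* print most of the output */
<loc,scat,dist> = outlmd(x,"mve",optn);
print "Robust (Rousseeuw) distances=", dist[2,];
print "Outlier weights (0 = flagged)=", dist[3,];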


1.9.3 Examples for MVE and MCD

MVE of Brainlog Data

The data (listed in chapter 7) consist of the body weights (in kilograms) and brain weights (in grams) of m=28 animals and were reported by Jerison (1973); they can also be found in Rousseeuw & Leroy (1987, p. 57). Instead of the original data, here the logarithms of the measurements of the two variables are used:

• Cases (rows): 28 animals

• Column 1: logarithm of body weight (kg)

• Column 2: logarithm of brain weight (g)

By default, the MVE function (like the MINVOL program) uses only 1500 randomly selected subsets rather than all subsets. Specifying optn[5]=-1 generates and evaluates all 3276 subsets of 3 cases out of 28 cases.
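The subset counts reported in these logs are plain binomial coefficients; for example, with elemental subsets of 3 cases out of 28,

C(28,3) = (28 · 27 · 26) / 3! = 3276,

and likewise C(47,2) = 1081 in the LMS example above and C(13,2) = 78 in the Kootenay examples below.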

optn = cons(5,1,.);

optn[1]= 3; /* ipri */

optn[2]= 1; /* pcov: print COV */

optn[3]= 1; /* pcor: print CORR */

optn[5]=-1; /* all subsets */

<loc,scat,dist> = outlmd(aa,"mve",optn);

Our specification of optn[1]=3, optn[2]=1, and optn[3]=1 asks for complete printed output. Therefore, the first part of the output shows the classical scatter and correlation matrix:

*****************************************

Minimum Volume Ellipsoid (MVE) Estimation

*****************************************

Consider Ellipsoids Containing 15 Cases

Number of Variables 2 Number of Observations 28

Specified Quantile 15 Default Quantile 15

Breakdown Value 46.43 Highest Breakdown 46.43

Total Number Subsets 3276 N Subset (Specified) 2147483647

Variable MinVal 1st Qu. Median Mean 3rd Qu. MaxVal

X1 -1.638272 0.45823 1.730816 1.637857 2.69215 4.939519

X2 -0.397940 1.24551 2.133148 1.921947 2.62428 3.756788

Variable StdDev MAD Dispersion S_n

X1 1.63757480 0.98933150 1.46678508 1.56942582

X2 1.04199508 0.50174850 0.74389344 0.92046180

Pearson Correlation Matrix

--------------------------

1| 1


2| 0.7795 1

Spearman Rank Correlation Matrix

--------------------------------

1| 1

2| 0.7163 1

Classical Covariance Matrix

---------------------------

1| 2.682

2| 1.33 1.086

The second part of the output shows the results of the combinatoric optimization (complete subset sampling):

There are 3276 subsets of 3 cases out of 28 cases.

All 3276 subsets will be considered.

***********************************************

*** Complete Enumeration for MVE ***

***********************************************

Subset Singular Best Crit Pct

819 0 0.4397 25%

1638 0 0.4397 50%

2457 0 0.4397 75%

3276 0 0.4397 100%

Minimum Criterion = 0.439709

All Subset Selection 3276 N Singular Subsets 0

Observations of Best Subset

---------------------------

1 : 1 22 28

Initial MVE Location Estimates

------------------------------

1 : 1.386 1.802

Initial MVE Scatter Matrix

--------------------------

1| 4.902

2| 3.294 2.34

The third part of the output shows the optimization results after local improvement:

*********************************************


Final MVE Estimates (Using Local Improvement)

*********************************************

Number of Points with Nonzero Weight = 24

Robust MVE Location Estimates

-----------------------------

1 : 1.295 1.873

Robust MVE Scatter Matrix

-------------------------

1| 2.057

2| 1.529 1.204

Eigenvalues of Robust Scatter Matrix

------------------------------------

1 : 0.04307 3.218

Robust Correlation Matrix

-------------------------

1| 1

2| 0.9716 1

The final output presents a table containing the classical Mahalanobis distances, the robust distances, and the weights identifying the outlier observations:

Classical Distances and Rousseeuw (Robust) Distances

Unsquared Mahalanobis Distance and

Unsquared Rousseeuw Distance of each Observation

Mahalanobis Rousseeuw

Case Distances Distances Weight

--------------------------------------------

1 1.00659058375 0.89707567976 1

2 0.69526110135 1.40530208726 1

3 0.30083139925 0.18672628324 1

4 0.38081717939 0.31870104279 1

5 1.14648513752 1.13569711562 1

6 2.64417609074 8.82803603143 0

7 1.70833441356 1.69923276790 1

8 0.70652175912 0.68667960666 1

9 0.85840384932 1.08416250086 1

10 0.79869763395 1.58083510506 1

11 0.68648527489 0.69334584197 1

12 0.87434943134 1.07149195604 1

13 0.67779106095 0.71754535578 1

14 1.72152603576 3.39869849718 0

15 1.76194689647 1.76270342739 1

16 2.36947279658 7.99947198806 0

17 1.22225281754 2.80595427819 0

18 0.20317795821 1.20733198035 1


19 1.85520094037 1.77331728985 1

20 2.26626754755 2.07497074813 1

21 0.83141618903 0.78595410941 1

22 0.41615797778 0.34219962687 1

23 0.26418241099 0.91838315803 1

24 1.04611979565 1.78233385527 1

25 2.91110122672 9.56544317804 0

26 1.58645789445 1.54374767782 1

27 1.58212399087 1.80842303726 1

28 0.39466372486 1.52323494449 1

--------------------------------------------

Distribution of Rousseeuw (Robust) Distances

MinRes 1st Qu. Median Mean 3rd Qu. MaxRes

0.187 0.842 1.46 2.13 1.8 9.57

Default Cutoff Value= 2.7162

The cutoff value is the square root of the 0.975 quantile

of the chi square distribution with 2 degrees of freedom.

N= 5 points have robust distances larger than cutoff.

Only points whose Rousseeuw distance is substantially larger

than the cutoff should be considered outliers.
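The printed cutoff can be reproduced directly from the chi-square quantile (a worked check, not part of the program output): for n = 2 variables,

√(χ²_{2,0.975}) = √7.3778 ≈ 2.7162,

and for the univariate Kootenay examples below (n = 1), √(χ²_{1,0.975}) = √5.0239 ≈ 2.2414.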

MCD of Brainlog Data

The same data are used for computing the minimum covariance determinant estimator.

print "***MCD for BrainLog Data***";

< loc,scat,dist> = outlmd(aa,"mcd",optn);

print "MCD: BrainLog Data: Location and EValues=",loc,

"MCD: BrainLog Data: Scatter Matrix=",scat,

"MCD: BrainLog Data: Robust Distances=",dist;

**************************************

Fast MCD by Rousseeuw and Van Driessen

**************************************

Number of Variables 2 Number of Observations 28

Specified Quantile 15 Default Quantile 15

Breakdown Value 46.43 Highest Breakdown 46.43

Total Number Subsets 3276 N Subset (Specified) 3276

Variable MinVal 1st Qu. Median Mean 3rd Qu. MaxVal

X1 -1.638272 0.45823 1.730816 1.637857 2.69215 4.939519

X2 -0.397940 1.24551 2.133148 1.921947 2.62428 3.756788


Variable StdDev MAD Dispersion S_n

X1 1.63757480 0.98933150 1.46678508 1.56942582

X2 1.04199508 0.50174850 0.74389344 0.92046180

Pearson Correlation Matrix

--------------------------

1| 1

2| 0.7795 1

Spearman Rank Correlation Matrix

--------------------------------

1| 1

2| 0.7163 1

Classical Covariance Matrix

---------------------------

1| 2.682

2| 1.33 1.086

*****************************************************

MCD Estimates (Obtained by Subsampling and Iteration)

*****************************************************

The total number of iterations was: 9868

The best halve of the entire data set obtained after full iteration

consists of the cases:

1 2 3 4 5 8 9 11

12 13 18 21 22 23 28

MCD Location Estimate

---------------------

1 : 1.622 2.015

Average of 15 Selected Points

MCD Scatter Matrix Estimate

---------------------------

1| 0.8974

2| 0.6424 0.4794

Determinant = 0.0174302

Covariance matrix of 15 Selected Points

MCD Correlation Matrix

----------------------


1| 1

2| 0.9795 1

The MCD scatter matrix multiplied by 2.36872 making it consistent

when all the data come from a single Gaussian distribution.

Consistent Scatter Matrix

-------------------------

1| 2.126

2| 1.522 1.135

Determinant = 0.0412872

***************************

Reweighted Robust Estimates

***************************

Reweighted Location Estimate

----------------------------

1 : 1.315 1.857

Reweighted Scatter Matrix

-------------------------

1| 2.14

2| 1.607 1.252

Determinant = 0.0973598

Eigenvalues

-----------

1 : 3.363 0.02895

Reweighted Correlation Matrix

-----------------------------

1| 1

2| 0.9817 1

Classical Distances and Rousseeuw (Robust) Distances

Unsquared Mahalanobis Distance and

Unsquared Rousseeuw Distance of each Observation

Mahalanobis Rousseeuw

Case Distances Distances Weight

--------------------------------------------

1 1.00659058375 0.85534680233 1

2 0.69526110135 1.47704990416 1

3 0.30083139925 0.23982808567 1

4 0.38081717939 0.51771927451 1


5 1.14648513752 1.10836197158 1

6 2.64417609074 10.5993409398 0

7 1.70833441356 1.80845496922 1

8 0.70652175912 0.69009899480 1

9 0.85840384932 1.05242333305 1

10 0.79869763395 2.07713090007 1

11 0.68648527489 0.88854485822 1

12 0.87434943134 1.03582389209 1

13 0.67779106095 0.68397779811 1

14 1.72152603576 4.25796347222 0

15 1.76194689647 1.71606524508 1

16 2.36947279658 9.58499244378 0

17 1.22225281754 3.57169985927 0

18 0.20317795821 1.32378289986 1

19 1.85520094037 1.74106395182 1

20 2.26626754755 2.02652755258 1

21 0.83141618903 0.74354485971 1

22 0.41615797778 0.41992280943 1

23 0.26418241099 0.94461006821 1

24 1.04611979565 2.28933413379 1

25 2.91110122672 11.4719526178 0

26 1.58645789445 1.51872133205 1

27 1.58212399087 2.05459305495 1

28 0.39466372486 1.67565126177 1

--------------------------------------------

Distribution of Rousseeuw (Robust) Distances

MinRes 1st Qu. Median Mean 3rd Qu. MaxRes

0.24 0.872 1.5 2.44 2.07 11.5

Default Cutoff Value= 2.7162

The cutoff value is the square root of the 0.975 quantile

of the chi square distribution with 2 degrees of freedom.

N= 5 points have robust distances larger than cutoff.

Only points whose Rousseeuw distance is substantially larger

than the cutoff should be considered outliers.

MVE of Kootenay Data: Location Problem

The data (listed in chapter 7) are water flow measurements in Libby and Newgate on the Kootenay river in January for the years 1931 to 1943. The original data from Ezekiel & Fox (1959) were changed in observation 4 from (77.6, 44.9) to (77.6, 15.7), which now defines a serious outlier in observation 4. The following are the data reported by Rousseeuw & Leroy (1987, p. 64).

This is a univariate (location) problem. Our specification of optn[1]=3 and optn[2]=1 asks for complete printed output.

print "***MVE for Kootenay Data***";

a = aa[,1];

optn = j(5,1,.);

optn[1]= 3; /* ipri */

optn[2]= 1; /* pcov */


< loc,scat,dist> = outlmd(a,"mve",optn);

The first part of the output shows some univariate statistics and the classical scatter matrix:

*****************************************

Minimum Volume Ellipsoid (MVE) Estimation

*****************************************

Consider Ellipsoids Containing 7 Cases

Number of Variables 1 Number of Observations 13

Specified Quantile 7 Default Quantile 7

Breakdown Value 46.15 Highest Breakdown 46.15

Total Number Subsets 78 N Subset (Specified) 0

Variable MinVal 1st Qu. Median Mean 3rd Qu. MaxVal

X1 17.60000 23.8000 27.80000 32.53846 34.2500 77.60000

Variable StdDev MAD Dispersion S_n

X1 14.9809623 6.20000000 9.19213375 8.45661818

Classical Covariance Matrix

---------------------------

1| 224.4

The second part of the output shows the results of the combinatoric optimization (complete subset sampling):

There are 78 subsets of 2 cases out of 13 cases.

All 78 subsets will be considered.

***********************************************

*** Complete Enumeration for MVE ***

***********************************************

Subset Singular Best Crit Pct

20 0 4.85 25%

39 0 4.6 50%

59 1 4.6 75%

78 1 4.55 100%

Minimum Criterion = 4.55

All Subset Selection 78 N Singular Subsets 1

Observations of Best Subset

---------------------------

1 : 8 10


Initial MVE Location Estimates

------------------------------

1 : 30.55

Initial MVE Scatter Matrix

--------------------------

1| 230.4

The third part of the output shows the optimization results after local improvement:

*********************************************

Final MVE Estimates (Using Local Improvement)

*********************************************

Number of Points with Nonzero Weight = 12

Robust MVE Location Estimate 28.7833

Robust MVE Scatter Matrix Estimate 44.85

- Square Root Yields Scale Estimate -

The final output presents a table containing the classical Mahalanobis distances, the robust distances, and the weights identifying the outlier observations:

Classical Distances and Rousseeuw (Robust) Distances

Unsquared Mahalanobis Distance and

Unsquared Rousseeuw Distance of each Observation

Mahalanobis Rousseeuw

Case Distances Distances Weight

--------------------------------------------

1 0.36302484698 0.25134390628 1

2 0.77688344197 1.17708581852 1

3 0.05750888665 0.68932932713 1

4 3.00792016066 7.28897328215 0

5 0.29781387729 1.22685688907 1

6 0.73015747157 1.07256657037 1

7 0.99716301673 1.66981941698 1

8 0.17098624334 0.94316178694 1

9 0.00410777762 0.56987875781 1

10 0.43645137190 0.41558843910 1

11 0.32964915383 0.17668730045 1

12 0.41129123399 1.48068934888 1

13 0.31629887657 0.14682465812 1

--------------------------------------------

Distribution of Rousseeuw (Robust) Distances

MinRes 1st Qu. Median Mean 3rd Qu. MaxRes

0.147 0.333 0.943 1.32 1.2 7.29

Default Cutoff Value= 2.2414

The cutoff value is the square root of the 0.975 quantile


of the chi square distribution with 1 degrees of freedom.

N= 1 points have robust distances larger than cutoff.

Only points whose Rousseeuw distance is substantially larger

than the cutoff should be considered outliers.

MCD of Kootenay Data: Location Problem

The data are the same as in the former example, but now we estimate the minimum covariance determinant.

< loc,scat,dist> = outlmd(a,"mcd",optn);

We simply list the output without comments:

**************************************

Fast MCD by Rousseeuw and Van Driessen

**************************************

Number of Variables 1 Number of Observations 13

Specified Quantile 7 Default Quantile 7

Breakdown Value 46.15 Highest Breakdown 46.15

Total Number Subsets 78 N Subset (Specified) 0

Variable MinVal 1st Qu. Median Mean 3rd Qu. MaxVal

X1 17.60000 23.8000 27.80000 32.53846 34.2500 77.60000

Variable StdDev MAD Dispersion S_n

X1 14.9809623 6.20000000 9.19213375 8.45661818

Pearson Correlation Matrix

--------------------------

1| 1

Spearman Rank Correlation Matrix

--------------------------------

1| 1

Classical Covariance Matrix

---------------------------

1| 224.4

***********************************************

Minimizing Sum of 7 Smallest Squared Residuals.

***********************************************

Robust MCD Location Estimate 29.9429


Robust MCD Scatter Matrix Estimate 67.44

- Square Root Yields Scale Estimate -

Classical Distances and Rousseeuw (Robust) Distances

Unsquared Mahalanobis Distance and

Unsquared Rousseeuw Distance of each Observation

Mahalanobis Rousseeuw

Case Distances Distances Weight

--------------------------------------------

1 0.36302484698 0.34616367923 1

2 0.77688344197 1.10111361283 1

3 0.05750888665 0.42096286620 1

4 3.00792016066 5.80302529603 0

5 0.29781387729 0.85932089216 1

6 0.73015747157 1.01587733000 1

7 0.99716301673 1.50294180329 1

8 0.17098624334 0.62796526735 1

9 0.00410777762 0.32354997154 1

10 0.43645137190 0.48010640938 1

11 0.32964915383 0.28528062007 1

12 0.41129123399 1.06632329331 1

13 0.31629887657 0.26092739640 1

--------------------------------------------

Distribution of Rousseeuw (Robust) Distances

MinRes 1st Qu. Median Mean 3rd Qu. MaxRes

0.261 0.335 0.628 1.08 1.04 5.8

Default Cutoff Value= 2.2414

The cutoff value is the square root of the 0.975 quantile

of the chi square distribution with 1 degrees of freedom.

N= 1 points have robust distances larger than cutoff.

Only points whose Rousseeuw distance is substantially larger

than the cutoff should be considered outliers.

1.9.4 Examples Combining Robust Residuals and Robust Distances

This section is based entirely on Rousseeuw & Van Zomeren (1990). Observations (x_i, y_i) for which x_i lies far away from most of the other observations are called leverage points. One classical method of detecting them is inspecting the Mahalanobis distances MD_i,

MD_i = √( (x_i − x̄) C^{-1} (x_i − x̄)' )

where x̄ is the sample mean and C is the classical sample covariance matrix. Note that the MVE subroutine prints the classical Mahalanobis distances MD_i together with the robust (Rousseeuw) distances RD_i. In classical linear regression the diagonal elements h_ii of the hat matrix

H = X (X'X)^{-1} X'

are used to identify leverage points. Rousseeuw & Van Zomeren (1990) report the following monotone relationship between the h_ii and the MD_i:

h_ii = (MD_i)² / (m − 1) + 1/n


and point out that neither the MD_i nor the h_ii are safe for detecting leverage points reliably. Multiple outliers do not necessarily have large MD_i values because of the masking effect. They propose to use the robust distances RD_i instead.

The definition of a leverage point is based entirely on the outlyingness of x_i and is not related to the response value y_i. By including the y_i value in the definition, Rousseeuw & Van Zomeren (1990) distinguish between

• good leverage points, when the point (x_i, y_i) is close to the regression plane, i.e. a good leverage point improves the precision of the regression coefficients;

• bad leverage points, when the point (x_i, y_i) is far from the regression plane, i.e. a bad leverage point reduces the precision of the regression coefficients.

Rousseeuw & Van Zomeren (1990) propose to plot the standardized residuals of a robust regression versus the robust distances RD_i obtained from the MVE (or MCD). Two horizontal lines corresponding to residual values of +2.5 and -2.5 are useful to distinguish between small and large residuals, and one vertical line corresponding to the √(χ²_{n,0.975}) quantile is used to distinguish between small and large distances.

Hawkins-Bradu-Kass Data

The first 14 observations of this data set (see Hawkins, Bradu, and Kass, 1984) are leverage points. However, only observations 12, 13, and 14 have large h_ii and only observations 12 and 14 have large MD_i values. Executing outlmd() with the "mvk" option gives the multivariate kurtosis, the Mahalanobis-like distances, and the diagonal of the hat matrix:

optn = cons(5,1,.);

optn[1]= 3; /* ipri */

optn[2]= 1; /* idis */

optn[3]= 1; /* ihat */

< mkur,madi > = outlmd(a,"mvk",3);

print "Mardias Multivariate Kurtosis ",mkur[1,1];

print "Relative Multivariate Kurtosis ",mkur[1,2];

print "Normalized Multivariate Kurtosis",mkur[2,1];

print "Mardia Based Kappa ",mkur[2,2];

for (k = 1; k <= 5; k++)

print k,": Contribution to MVK=",mkur[2+k,1]," Case=",mkur[2+k,2];

print "\n Mahalanobis Distances ",madi[1,],

"\n Diagonal of Hat Matrix",madi[2,];

Hawkins, Bradu, Kass (1984) Data

Mardias Multivariate Kurtosis 16.937

Relative Multivariate Kurtosis 2.1291

Normalized Multivariate Kurtosis 13.390

Mardia Based Kappa 1.1291

1: Contribution to MVK= 1335.01 Case= 14.000

2: Contribution to MVK= 63.949 Case= 12.000

3: Contribution to MVK= 28.943 Case= 13.000

4: Contribution to MVK= 17.236 Case= 11.000

5: Contribution to MVK= 12.222 Case= 10.000

As in Rousseeuw & Leroy (1987, p. 223), we only list the first entries of the second result:


Mahalanobis Distances

| 1 2 3 4 5 6

----------------------------------------------------------------

1 | 3.6742 3.4438 5.3530 4.9714 4.4105 4.6060

| 7 8 9 10 11 12

----------------------------------------------------------------

1 | 4.0422 3.6836 4.9339 5.4454 5.9856 9.6617

| 13 14 15 16 17 18

----------------------------------------------------------------

1 | 7.0883 40.7251 3.2960 4.6283 1.9180 0.7194

Diagonal of Hat Matrix

| 1 2 3 4 5 6

----------------------------------------------------------------

1 | 0.06081 0.05986 0.08500 0.07920 0.07291 0.06846

| 7 8 9 10 11 12

----------------------------------------------------------------

1 | 0.06292 0.06278 0.07823 0.08264 0.09407 0.14388

| 13 14 15 16 17 18

----------------------------------------------------------------

1 | 0.09619 0.54452 0.05566 0.07171 0.01012 0.02074

The following two graphs show

1. the plot of standardized LMS residuals vs. Rousseeuw (robust) distances RDi

2. the plot of standardized LS residuals vs. Mahalanobis distances MDi

The first graph impressively identifies the four good leverage points 11, 12, 13, 14 with small standardized LMS residuals but large robust distances, and the ten bad leverage points 1,...,10 with large standardized LMS residuals and large robust distances. (The second graph, based on nonrobust estimates, would lead to erroneous conclusions.)


Chapter 2

Some Useful CMAT Modules

2.1 Module EVTST for Testing the EV Decomposition

int function evtst(titl,side,val,vec,a)

{

option noname;

nvec = (side == ’l’) ? nrow(vec) : ncol(vec);

n = (side == ’l’) ? ncol(vec) : nrow(vec);

if (nrow(a) != n || ncol(a) != n) {

print "Dimension of data matrix not compatible",

n, nrow(a),ncol(a);

return(1);

}

if (nrow(val) != nvec || ncol(val) != nvec) {

print "Dimensions of eigenvalue and eigenvector matrices",

" are not compatible.",nvec,nrow(a),ncol(a);

return(2);

}

/* Symmetric case: side must be ’s’ */

/* UNsymmetric case: side must be either ’r’ or ’l’ */

ss = 0.; id = ide(nvec);

if (side == ’s’) {

ss0 = ssq(a * vec - vec * val);

ss1 = ssq(id - vec‘ * vec);

print " Symmetric Case: Error Residuals: ", titl;

print "Definition: A * Vec - Vec * Val=", ss0;

print "Orthogonality 1: I - Vec‘ * Vec=", ss1;

ss2 = 0.;

if (nvec == n) {

ss2 = ssq(id - vec * vec‘);

print "Orthogonality 2: I - Vec * Vec‘=", ss2;

}

ss = ss0 + ss1 + ss2;

} else

if (side == ’r’) {

ss0 = ssq(a * vec - vec * val);


ss1 = ssq(id - diag(vec‘ * vec));

print "UNSymmetric Case: Right Eigenvectors: ", titl;

print "Definition: A * RVec - RVec * Val =", ss0;

print "Unit Length: I - diag(RVec‘ * RVec)=", ss1;

ss = ss0 + ss1;

} else

if (side == ’l’) {

ss0 = ssq(vec * a - val * vec);

ss2 = ssq(id - diag(vec * vec‘));

print "UNSymmetric Case: Left Eigenvectors: ", titl;

print "Definition: LVec * A - Val * LVec =", ss0;

print "Unit Length: I - diag(LVec * LVec‘)=", ss2;

ss = ss0 + ss2;

}

print "\n";

return((ss > 1e-4) ? 3 : 0);

}
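A minimal usage sketch (added as an illustration, not part of the original module): a diagonal matrix has a trivially known eigendecomposition, so evtst() should report negligible residuals and return 0.

n = 5;
v = 1. + rand(n);        /* n positive eigenvalues */
a = diag(v);             /* diagonal test matrix */
vec = ide(n);            /* its eigenvectors: the identity matrix */
val = diag(v);           /* matching eigenvalue matrix */
rc = evtst("diagonal test",'s',val,vec,a);
print "evtst return code (0 = ok)=", rc;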

2.2 Module GEVTST for Testing Generalized EV Decomposition

int function gevtst(titl,ityp,side,val,vec,a,b)

{

option noname;

nvec = (side == ’l’) ? nrow(vec) : ncol(vec);

n = (side == ’l’) ? ncol(vec) : nrow(vec);

if (nrow(a) != n || ncol(a) != n) {

print "GEIG: Dimension of data matrix A not compatible",

n, nrow(a),ncol(a);

return(1);

}

if (nrow(b) != n || ncol(b) != n) {

print "GEIG: Dimension of data matrix B not compatible",

n, nrow(b),ncol(b);

return(1);

}

if (nrow(val) != nvec || ncol(val) != nvec) {

print "Dimensions of eigenvalue and eigenvector matrices",

" are not compatible.",nvec,nrow(a),ncol(a);

return(2);

}

/* Symmetric case: side must be ’s’ */

/* UNsymmetric case: side must be either ’r’ or ’l’ */

ss = 0.; id = ide(nvec);

if (side == ’s’) {

print " Symmetric Case: Error Residuals: ", titl;

if (ityp == 2) {

res0 = inv(vec) * a * inv(vec)‘ - val;

ssq0 = ssq(res0);

print "Type 2: inv(Vec) * A * inv(Vec)‘ - Val =", ssq0;

res1 = vec‘ * b * vec - id;

ssq1 = ssq(res1);


print "Type 2: Vec‘ * B * Vec - I =", ssq1;

} else if (ityp == 3) {

res0 = vec‘ * a * vec - val;

ssq0 = ssq(res0);

print "Type 3: Vec‘ * A * Vec - Val =", ssq0;

res1 = vec‘ * inv(b) * vec - id;

ssq1 = ssq(res1);

print "Type 3: Vec‘ * inv(B) * Vec - I =", ssq1;

} else {

res0 = vec‘ * a * vec - val;

ssq0 = ssq(res0);

print "Type 1: Vec‘ * A * Vec - Val =", ssq0;

res1 = vec‘ * b * vec - id;

ssq1 = ssq(res1);

print "Type 1: Vec‘ * B * Vec - I =", ssq1;

}

if (ssq0 > 1e-3) print "GEIG: Residual Matrix", res0;

if (ssq1 > 1e-3) print "GEIG: Residual Matrix", res1;

if (ssq0 < 1e-3 && ssq1 < 1e-3)

print "Symmetric Eigenvector Problem: res=", res0+res1;

ss = ssq0 + ssq1;

} else

if (side == ’r’) {

res = cons(n,n);

for (i = 0; i < nvec; i++)

res[1,i] = a * vec[,i] - val[i,i] * b * vec[,i];

ss = ssq(res);

print "GEIG: UNSymmetric Case: Right Eigenvectors: ", titl;

print "GEIG: Definition: A * RVec[i] - Val[i] * B * RVec[i] =", ss;

if (ss > 1e-3) print "Residual Matrix", res;

else print "Right Eigenvectors: res=", res;

} else

if (side == ’l’) {

res = cons(n,n);

for (i = 0; i < nvec; i++)

res[i,1] = vec[,i]‘ * a - val[i,i] * vec[,i]‘ * b;

ss = ssq(res);

print "GEIG: UNSymmetric Case: Left Eigenvectors: ", titl;

print "GEIG: Definition: LVec[i] * A - Val[i] * LVec[i] * B =", ss;

if (ss > 1e-3) print "Residual Matrix", res;

else print "Left Eigenvectors: res=", res;

}

print "\n";

return((ss > 1e-4) ? 3 : 0);

}

2.3 Module GSCHURT for Testing Generalized Schur Decomposition Examples

int function gschurt(titl,af,bf,u,v,a,b)

{


option noname;

n = nrow(a);

if (ncol(a) != n) {

print "Dimension of data matrix A not compatible",

nrow(a),ncol(a);

return(1);

}

if (nrow(b) != n || ncol(b) != n) {

print "Dimension of data matrix B not compatible",

n, nrow(b),ncol(b);

return(1);

}

if (nrow(af) != n || ncol(af) != n ||

nrow(bf) != n || ncol(bf) != n ||

nrow(u) != n || ncol(u) != n ||

nrow(v) != n || ncol(v) != n) {

print "Dimension of data matrices not compatible",

n, nrow(af),ncol(af),nrow(bf),ncol(bf),

nrow(u), ncol(u), nrow(v), ncol(v);

return(1);

}

at = u * af * v‘;

ares = a - at; ssa = ssq(ares);

print "GSCHUR: SSQ Residual A: || A - U * S * V‘ ||=", ssa;

bt = u * bf * v‘;

bres = b - bt; ssb = ssq(bres);

print "GSCHUR: SSQ Residual B: || B - U * P * V‘ ||=", ssb;

/* U and V must be orthogonal */

ures1 = ide(n) - u‘ * u; ures2 = ide(n) - u * u‘;

ssu = ssq(ures1) + ssq(ures2);

print "GSCHUR: Residual U Orthogonality", ssu;

if (ssu > 1e-3) print " U Columnwise", ures1,

" U Rowwise", ures2;

vres1 = ide(n) - v‘ * v; vres2 = ide(n) - v * v‘;

ssv = ssq(vres1) + ssq(vres2);

print "GSCHUR: Residual V Orthogonality", ssv;

if (ssv > 1e-3) print " V Columnwise", vres1,

" V Rowwise", vres2;

print "\n";

ss = ssu + ssv;

return((ss > 1e-4) ? 3 : 0);

}

2.4 Module GSVDTST for Testing Generalized SVD Decomposition

/* Generalized SVD Examples */

/* <siga,sigb,q,r,u,v,erbd> = GSVD(a,b) */

/* A[m,n] = U[m,m] Sig_A [0 R] Q’[n,n] */

/* B[p,n] = V[p,p] Sig_B [0 R] Q’[n,n] */

int function gsvdtst(titl,siga,sigb,r,q,u,v,a,b)

{


option noname;

m = nrow(a); n = ncol(a); p = nrow(b);

if (ncol(b) != n) {

print "Dimension of data matrices A and B are not compatible",

nrow(a),ncol(a),nrow(b),ncol(b);

return(1);

}

at = u * siga * r * q‘;

ares = a - at; ssa = ssq(ares);

print "GSVD: SSQ Residual A: || A - U * Sig_A * R * Q‘ ||=", ssa;

bt = v * sigb * r * q‘;

bres = b - bt; ssb = ssq(bres);

print "GSVD: SSQ Residual B: || B - V * Sig_B * R * Q‘ ||=", ssb;

/* U, V, and Q must be orthogonal */

ures1 = ide(m) - u‘ * u; ures2 = ide(m) - u * u‘;

ssu = ssq(ures1) + ssq(ures2);

print "GSVD: Residual U Orthogonality", ssu;

if (ssu > 1e-3) print " U Columnwise", ures1,

" U Rowwise", ures2;

vres1 = ide(p) - v‘ * v; vres2 = ide(p) - v * v‘;

ssv = ssq(vres1) + ssq(vres2);

print "GSVD: Residual V Orthogonality", ssv;

if (ssv > 1e-3) print " V Columnwise", vres1,

" V Rowwise", vres2;

qres1 = ide(n) - q‘ * q; qres2 = ide(n) - q * q‘;

ss0 = ssq(qres1) + ssq(qres2);

print "GSVD: Residual Q Orthogonality", ss0;

if (ss0 > 1e-3) print " Q Columnwise", qres1,

" Q Rowwise", qres2;

print "\n";

ss = ssa + ssb + ssu + ssv + ss0;

return((ss > 1e-4) ? 3 : 0);

}
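A usage sketch (added as an illustration; the generalized SVD call follows the form noted in the header comment above, and the lower-case spelling gsvd is an assumption, so substitute the actual CMAT function name if it differs):

m = 6; n = 4; p = 5;                    /* arbitrary test dimensions */
a = rand(m,n); b = rand(p,n);           /* random test matrices */
<siga,sigb,q,r,u,v,erbd> = gsvd(a,b);   /* call form from the header comment */
rc = gsvdtst("random A,B",siga,sigb,r,q,u,v,a,b);
print "gsvdtst return code (0 = ok)=", rc;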

2.5 Module GLSTST for Testing the GLS and GLM Functions

function glstest(ilse,a,b,c,d,x,y)

{

if (ilse) {

/* LSE: min || Ax - c|| subject to Bx = d */

/* A[m,n], B[p,n], c[m], d[p], p <= n <= m+p,

[A’ , B’]’ has full column rank n */

m = nrow(a); n = ncol(a); p = nrow(b);

if (n != ncol(b))

print "LSE: A and B are not compatible:",n,ncol(b);

if (m != nrow(c))

print "LSE: A and c are not compatible:",m,nrow(c);

if (p != nrow(d))

print "LSE: B and d are not compatible:",p,nrow(d);


if (p > n || n > m+p)

print "LSE: Dimensions [m,n,p]=",m,n,p,

" do not satisfy: p <= n <= m+p.";

/* Compute Lagrangian Matrix M:

n p

n | a’a b’ | | x | | a’c | n

| | | | = | |

p | b 0 | | lamb | | d | p

*/

M = ((a‘ * a) -> b‘) |> b;

rhs = a‘ * c |> d;

xlam = lls(M,rhs,"svd");

n1 = n - 1; np = n + p; npm = np - 1;

xx = xlam[0:n1]; lam = xlam[n:npm];

ss0 = ssq(b * xx - d);

ssx = ssq(x - xx);

ss1 = ssq(a * x - c);

ss2 = ssq(a * xx - c);

print "SSQ(x-xx)=", ssx, " Constraint: SSQ(b*xx-d)=", ss0;

print "Input: SSQ(a*x-c)=", ss1, " Check: SSQ(a*x-c)=", ss2;

if (ssx > 1.e-3) print "Input x=", x, " Check: x=",xx;

else print "LSE Result found fine: ssq(x)=", ssx;

} else {

/* GLM: min || y || subject to Ax + By = d */

/* A[n,m], B[n,p], d[n], m <= n <= m+p */

n = nrow(a); m = ncol(a); p = ncol(b);

m1 = m - 1; mpp = m + p; mpm = mpp - 1;

if (n != nrow(b))

print "GLM: A and b are not compatible:",n,nrow(b);

if (n != nrow(d))

print "GLM: A and d are not compatible:",n,nrow(d);

if (m > n || n > m+p)

print "GLM: Dimensions [m,n,p]=",m,n,p,

" do not satisfy: m <= n <= m+p.";

/* Compute Lagrangian Matrix M:

m p n

m | 0 0 a‘ | | x | | 0 | m

p | 0 2*I b‘ | | y | = | 0 | p

n | a b 0 | | lam | | d | n

*/

M1 = cons(m,m,0) -> cons(m,p,0) -> a‘;

M2 = cons(p,m,0) -> (2.*ide(p)) -> b‘;

M3 = a -> b -> cons(n,n,0);

M = M1 |> M2 |> M3;

rhs = cons(m,1,0) |> cons(p,1,0) |> d;

options second;

xyl = lls(M,rhs,"svd");

xx = xyl[0:m1]; yy = xyl[m:mpm];

ss0 = ssq(a * xx + b * yy - d);


ssx = ssq(x - xx); ssy = ssq(y - yy);

ss1 = ssq(y); ss2 = ssq(yy);

print "SSQ(x-xx)=", ssx, " SSQ(y-yy)=", ssy;

print "Constraint: SSQ(a*xx + b * yy - c)=", ss0;

print "Input: SSQ(y)=", ss1, " Check: SSQ(y)=", ss2;

if (ssx > 1.e-3) print "Input x=", x, " Check: x=", xx;

if (ssy > 1.e-3) print "Input y=", y, " Check: y=", yy;

if (ssx < 1.e-3 && ssy < 1.e-3)

print "LSE Result found fine: ssq(y) ssq(x)=", ssy,ssx;

}

return;

}


2.6 Modules for Testing the LLS and PINV Functions

function ranbid(scal,null,n)

{

/*--- Create bidiagonal Matrix ---*/

v1 = 1. + scal*.5*(1. + rand(n)); v2 = scal*.5*(1. + rand(n-1));

vmax = v1[<>]; vmin = v1[><];

a = diag(v1) + diag(v2,-1);

/* print vmin, vmax, v1, v2, a; */

/* Note: the eigenvalues of A are the square roots

of those of A’A */

/* print "Eigenvalues of A", eig(a); */

if (null > 0) {

m = n / null; m = ceil(m);

if (m < 2) print "Error number singularities",null,m;

else {

k = -m / 2;

/* preference rule requires parenthesis: */

while ((k += m) < n) a[k,k] = a[k,k-1] = 0;

}

vmin = macon("mbig"); vmax = -vmin;

for (k = 1; k <= n; k++)

if (a[k,k] != 0) {

if (a[k,k] > vmax) vmax = a[k,k];

if (a[k,k] < vmin) vmin = a[k,k];

} }

print "\n RANBID : n=", n," null=", null,

"\n Smallest Nonzero Eigenvalue=",vmin,

"\n Largest Nonzero Eigenvalue=",vmax,

"\n Reciprocal Condition of A =", vmin / vmax;

return(a);

}

/*------------------ Bi- and Tridiagonal Matrices ------------------*/

/* Regular case: A is lower bidiagonal, ATA is psd tridiagonal

The following three methods should lead to the regular solution:

1. X1 = lls(A’A,A’b) : uses PSD3 assuming psd matrix

2. X2 = lls(A’A,A’b,"evd") : uses TQL and V inv(D) V’

3. X3 = A’A \ A’B : MATINV: solve band diagonal system: LAPACK code

DPTTRF and proceeds with DPTTRS if A is psd;

if A is not psd it proceeds with DGTTRF and DGTTRS */

function trills(scal,null,n) {

a = ranbid(scal,null,n);

b = rand(n,1);

ata = a‘ * a; atb = a‘ * b;

print "--- TQL: Tridiagonal LLS ---";

/* Tridiagonal LS Solve of A’Ax = A’b uses:

TQL and matrix product V inv(D) V’ */

t1 = time("clo");

x2 = lls(ata,atb,"evd");

dt2 = time("clo") - t1;


res2 = ssq(ata * x2 - atb);

print "RES=", res2," Time=", dt2;

print "--- PSD3: Tridiagonal LLS ---";

/* Tridiagonal LS Solve of A’Ax = A’b uses:

PSD3 assuming PSD matrix A’A */

t1 = time("clo");

x1 = lls(ata,atb);

dt1 = time("clo") - t1;

res1 = ssq(ata * x1 - atb);

print "RES=", res1," Time=", dt1;

if (null) {

print " ---------- n=", n, " null=", null," ----------",

"\n PSD3: Tridiag. LSSol: Res=", res1," Time=", dt1,

"\n TQL : Tridiag. LSSol: Res=", res2," Time=", dt2;

return;

}

print "--- MATINV: Tridiagonal LAPACK Solver ---";

/* Regular tridiagonal linear system:

MATINV: use LAPACK: DPTTRF and DPTTRS */

t1 = time("clo");

x3 = ata \ atb;

dt3 = time("clo") - t1;

res3 = ssq(ata * x3 - atb);

print "RES=", res3," Time=", dt3;

print " ---------- n=", n, " null=", null," ----------",

"\n PSD3: Tridiag. LSSol: Res=", res1," Time=", dt1,

"\n TQL : Tridiag. LSSol: Res=", res2," Time=", dt2,

"\n LAPACK: Tridiag. Solve: Res=", res3," Time=", dt3;

return;

}

scal = 10.; null = 0; n = 100;

trills(scal,null,n);

null = 10;

trills(scal,null,n);

null = 20;

trills(scal,null,n);

/*------------------ Bi- and Tridiagonal Matrices ------------------*/

/* Regular case: A is lower bidiagonal, ATA is psd tridiagonal

The following four methods should lead to the regular inverse:

1. X1 = pinv(A’A) : uses PSD3 assuming psd matrix

2. X2 = pinv(A’A,"evd") : uses TQL and V inv(D) V’

3. X3 = A’A \ A’B : MATINV: solve band diagonal system: LAPACK code

DPTTRF and proceeds with DPTTRS if A is psd;

if A is not psd it proceeds with DGTTRF and DGTTRS

4. X4 = inv(A’A) : MATINV: sparse inverse gets dense: Harwell

uses sparse LDL’ Decomposition */


function tripin(scal,null,n) {

a = ranbid(scal,null,n);

b = ide(n);

ata = a‘ * a; atb = a‘ * b;

print "--- PSD3: Tridiagonal PseudoInverse ---";

/* Pseudoinverse of tridiagonal A’A uses:

PSD3 assuming PSD matrix A’A */

t1 = time("clo");

x1 = pinv(ata);

dt1 = time("clo") - t1;

res1 = ssq(ide(n) - x1*ata);

print "RES=", res1," Time=", dt1;

print "--- TQL: Tridiagonal PseudoInverse ---";

/* Pseudoinverse of tridiagonal A’A uses:

TQL and matrix product V inv(D) V’ */

t1 = time("clo");

x2 = pinv(ata,"evd");

dt2 = time("clo") - t1;

res2 = ssq(ide(n) - x2*ata);

print "RES=", res2," Time=", dt2;

if (null) {

print " ---------- n=", n, " null=", null," ----------",

"\n PSD3: Pseudo_Inverse: Res=", res1," Time=", dt1,

"\n TQL : Pseudo_Inverse: Res=", res2," Time=", dt2;

return;

}

print "--- MATINV: Tridiagonal LAPACK Solver ---";

/* Regular tridiagonal linear system:

MATINV: use LAPACK: DPTTRF and DPTTRS */

t1 = time("clo");

x3 = ata \ atb;

dt3 = time("clo") - t1;

res3 = ssq(ata * x3 - atb);

print "RES=", res3," Time=", dt3;

print "--- LUSPAS: Sparse Symmetric Solver ---";

/* Compute inverse of regular tridiagonal matrix A’A:

MATINV, LUSPAS: using sparse LDL’ decomposition */

t1 = time("clo");

x4 = inv(ata);

dt4 = time("clo") - t1;

res4 = ssq(ide(n) - x4*ata);

print "RES=", res4," Time=", dt4;

print " ---------- n=", n, " null=", null," ----------",

"\n PSD3: Pseudo_Inverse: Res=", res1," Time=", dt1,

"\n TQL : Pseudo_Inverse: Res=", res2," Time=", dt2,

"\n LAPACK: Tridiag. Solve: Res=", res3," Time=", dt3,

"\n LUSPAS: Sparse Sym.Inv: Res=", res4," Time=", dt4;

return;

}


scal = 10.; null = 0; n = 100;

tripin(scal,null,n);

null = 10;

tripin(scal,null,n);

null = 20;

tripin(scal,null,n);

function ransym(scal,null,n)

{

/*--- Create A = A * sqrt(D) ---*/

/* set equally distant values of D to zero */

a = rand(n,n,’o’);

u = scal*.5*(1. + rand(n));

v = u[<];

vmax = v[<>]; vmin = v[><];

/* print vmin, vmax, a; */

if (null > 0) {

m = n / null; m = ceil(m);

if (m < 2) print "Error number singularities",null,m;

else {

k = -m / 2;

/* preference rule requires parenthesis: */

while ((k += m) < n) v[k] = 0;

}

vmin = macon("mbig"); vmax = -vmin;

for (i = 1; i <= n; i++)

if (v[i] != 0) {

if (v[i] > vmax) vmax = v[i];

if (v[i] < vmin) vmin = v[i];

} }

v = sqrt(v);

print "\n RANSYM : n=", n," null=", null,

"\n Smallest Nonzero Eigenvalue=",vmin,

"\n Largest Nonzero Eigenvalue=",vmax,

"\n Reciprocal Condition of A =", vmin / vmax;

a = diag(v) * a;
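
/* assuming rand(n,n,'o') returns an orthogonal matrix A0, A'A = A0' D A0
   has exactly the nonzero eigenvalues printed above plus "null" zero
   eigenvalues, where D contains the v values before the sqrt() above */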

return(a);

}

/*-------------------- Dense Symmetric PSD Matrices --------------------*/

/* Regular case: A is symmetric psd

The following four methods should lead to the least-squares solution:

1. X1 = lls(A’A,A’b) : uses (TRED3,PSD3,TRBAK) assuming psd matrix

2. X2 = lls(A’A,A’b,"evd") : uses eigenvalue dec. (TRED2,IMTQL2)

3. X3 = lls(A’A,A’b,"svd") : uses singular value decomposition

4. X4 = lls(A’A,A’b,"cod") : uses complete orthogonal decomposition */

function symlls(scal,null,n) {

a = ransym(scal,null,n);

b = rand(n,1);

ata = a` * a; atb = a` * b;

print "--- PSD3: (TRED3,PSD3) LS Solution ---";

t1 = time("clo");

x1 = lls(ata,atb);

dt1 = time("clo") - t1;

res1 = ssq(ata * x1 - atb);

print "RES=", res1," Time=", dt1;

print "--- EVD: (TRED2,IMTQL2) LS Solution ---";

t1 = time("clo");

x2 = lls(ata,atb,"evd");

dt2 = time("clo") - t1;

res2 = ssq(ata * x2 - atb);

print "RES=", res2," Time=", dt2;

print "--- SVD: Singular Value LS Solution ---";

t1 = time("clo");

x3 = lls(ata,atb,"svd");

dt3 = time("clo") - t1;

res3 = ssq(ata * x3 - atb);

print "RES=", res3," Time=", dt3;

print "--- COD: Comp.Orth.Dec. LS Solution ---";

t1 = time("clo");

x4 = lls(ata,atb,"cod");

dt4 = time("clo") - t1;

res4 = ssq(ata * x4 - atb);

print "RES=", res4," Time=", dt4;

print " ---------- n=", n, " null=", null," ----------",

"\n PSD3: LS Solution: Res=", res1," Time=", dt1,

"\n EVD : LS Solution: Res=", res2," Time=", dt2,

"\n SVD : LS Solution: Res=", res3," Time=", dt3,

"\n COD : LS Solution: Res=", res4," Time=", dt4;

return;

}

scal = 10.; null = 0; n = 100;

symlls(scal,null,n);

null = 10;

symlls(scal,null,n);

null = 20;

symlls(scal,null,n);

/*-------------------- Dense Symmetric PSD Matrices --------------------*/

/* Regular case: A is symmetric psd

The following four methods should lead to the pseudoinverse:

1. X1 = pinv(A’A) : uses (TRED3,PSD3,TRBAK) assuming psd matrix

2. X2 = pinv(A’A,"evd") : uses eigenvalue dec. (TRED2,IMTQL2)

3. X3 = pinv(A’A,"svd") : uses singular value decomposition

4. X4 = pinv(A’A,"cod") : uses complete orthogonal decomposition */

function sympin(scal,null,n) {

a = ransym(scal,null,n);

b = ide(n);

ata = a` * a; atb = a` * b;

print "--- PSD3: (TRED3,PSD3) PseudoInverse ---";

t1 = time("clo");

x1 = pinv(ata);

dt1 = time("clo") - t1;

res1 = ssq(ata * x1 - b);

print "RES=", res1," Time=", dt1;

print "--- EVD: (TRED2,IMTQL2) PseudoInverse ---";

t1 = time("clo");

x2 = pinv(ata,"evd");

dt2 = time("clo") - t1;

res2 = ssq(ata * x2 - b);

print "RES=", res2," Time=", dt2;

print "--- SVD: Singular Value PseudoInverse ---";

t1 = time("clo");

x3 = pinv(ata,"svd");

dt3 = time("clo") - t1;

res3 = ssq(ata * x3 - b);

print "RES=", res3," Time=", dt3;

print "--- COD: Comp.Orth.Dec. PseudoInverse ---";

t1 = time("clo");

x4 = pinv(ata,"cod");

dt4 = time("clo") - t1;

res4 = ssq(ata * x4 - b);

print "RES=", res4," Time=", dt4;

print " ---------- n=", n, " null=", null," ----------",

"\n PSD3: Pseudo_Inverse: Res=", res1," Time=", dt1,

"\n EVD : Pseudo_Inverse: Res=", res2," Time=", dt2,

"\n SVD : Pseudo_Inverse: Res=", res3," Time=", dt3,

"\n COD : Pseudo_Inverse: Res=", res4," Time=", dt4;

return;

}

scal = 10.; null = 0; n = 100;

sympin(scal,null,n);

null = 10;

sympin(scal,null,n);

null = 20;

sympin(scal,null,n);

2.7 Some Modules for Statistics

The following modules implement tests for asymmetry, non-independence, and skew-symmetry of square contingency tables; a minimal calling sketch is given after the list below.

1. Fit Asymmetric Models

(a) O Model: Goodman (1985): Null Asymmetry Model

(b) F Model: Agresti (1983): Linear DiagonalsParameter Symmetry Model

(c) T Model: Goodman (1985): Triangle Asymmetry Model

(d) D Model: Agresti (1983): DiagonalsParameter Symmetry Model

(e) OS1 Model: Tomizawa (1985): Odds Symmetry Model I

(f) OS2 Model: Tomizawa (1985): Odds Symmetry Model II

(g) 2RPA Model: Tomizawa (1987): 2-Ratio Parameter Symmetry Model

(h) LDPS Model: Agresti (1983): Linear DiagonalsParameter Model

2. Fit Skew-Symmetric Models

(a) O Model: Goodman (1985): Null Skew Asymmetry Model

(b) T Model: Goodman (1985): Triangle Skew Symmetry Model

(c) D Model: Agresti (1983): DiagonalsParameter Symmetry Model

(d) QOS Model: Tomizawa (1985): Quasi Odds Symmetry Model

3. Fit Non-Independence Models

(a) O Model: Goodman (1985): Null or Independence Model

(b) F Model: Haberman (1978): Fixed Distance Model

(c) U Model:

(d) V Model: Haberman (1978): Variable Distance Model

(e) L Model: Upton (1985): Loyalty Model

(f) T Model: Upton (1985): Non-Independence Triangle Model

(g) Q Model: Upton (1985): Quasi Independence Model

(h) DAT Model: Diagonals-Absolute Triangle Model

(i) DA Model: Goodman (1972): Non-Independence absolute DiagonalsParameter Model

(j) D Model: Goodman (1972): Non-Independence DiagonalsParameter Model

(k) UF Model:

4. Fit Non-symmetry and Independence Models

(a) O Model: Goodman (1985): Null Non-Symmetry+Independence Model

(b) F Model: Agresti (1983): Linear DiagonalsParameter Symmetry Model

(c) U Model:

(d) T Model: Goodman (1985): Triangle Non-Symmetry+Independence Model

(e) DAT Model: Diagonals Absolute Triangles Model

(f) DA Model: Diagonals Absolute Model

(g) D Model: DiagonalsParameter Non-Symmetry+Independence Model
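
A minimal calling sketch (assuming the ttable module and the 5 x 5 count table listed further below): the D model of group 1, for example, can be fit directly with glim by combining the cell counts with the design table; columns 5 and 10 of the combined matrix are the S and D design variables:

n    = nrow(count);
tabl = ttable(0,n);               /* design variables for an n x n table */
ct   = shape(count,.,1);          /* cell counts as a single column */
comb = [ ct tabl ];               /* column 1: counts, columns 2,...: design */
clas = [ 5 10 ];                  /* S and D treated as CLASS variables */
modl = "1 = 5 10";                /* counts modeled by S and D (c = s d) */
optn = [ "print" 1,
         "dist" "poiss" ];
gof  = glim(comb,modl,optn,clas);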

function ttable(ipr,n) {

/* Create tabl[nn,12+n+nm1]=[uv,...,ha,va] data set */

nn = n * n; nm1 = n - 1;

real rv[nn],cv[nn],uv[nn],sv[nn],lv[nn],tv[nn],qv[nn];

real da[nn],dv[nn],fv[nn],os1[nn],os2[nn];

real ha[nn,n],va[nn,nm1];

ij = 1;

for (i = 1; i <= n; i++)

for (j = 1; j <= n; j++, ij++) {

rv[ij] = i; cv[ij] = j;

uv[ij] = i * j;

k = abs(i-j);

l = (i < j) ? i : j;
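
/* symmetry factor s: the formula below assigns cell (i,j) and cell (j,i)
   the same value, numbering the n*(n+1)/2 unordered cell pairs
   consecutively from 1 */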

sv[ij] = k+1 - (l+1)*(.5*l+1) + (n+3)*(l+1) - 3 - 2*n;

lv[ij] = (i == j) ? 2 : 1;

tv[ij] = (i > j) ? 2 : (i < j) ? 1 : 3;

qv[ij] = (i == j) ? i : n+1;

da[ij] = (i == j) ? n : k;

/* factor variable d */

dv[ij] = (i < j) ? k : (i > j) ? k+n-1 : 2*n-1;

/* fixed distance regression variable */

fv[ij] = (i < j) ? k+1 : 1;

os1[ij] = (i < j) ? i : (i > j) ? n-1+j : 2*n-1;

os2[ij] = (i < j) ? 2*n-j : (i > j) ? n+1-i : 2*n-1;
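
/* ha[,k]: indicator regressors h_k = (i==k) + (j==k), used as h1..h4 in
   the non-symmetry+independence models; va[,k]: variable-distance
   indicators for the V model, 2 if cells i and j lie on the same side
   of the cut between categories k and k+1, 1 otherwise */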

for (k = 1; k <= n; k++) {

ff = (k == i) ? 1 : 0;

ss = (k == j) ? 1 : 0;

ha[ij,k] = ff + ss;

if (k < n) {

l = k + 1;

va[ij,k] = (i <= k && j <= k) ? 2

: (i >= l && j >= l) ? 2 : 1;

} } }

tabl = [ rv cv uv sv lv tv qv da dv fv os1 os2 ha va ];

kha = 12; kva = kha + n;

if (ncol(tabl) != kha+n+nm1) print "Invalid column number.";

if (ipr) {

print "S=", svm = shape(sv,n,n);

print "D=", dvm = shape(dv,n,n);

print "Da=", dam = shape(da,n,n);

print "T=", tvm = shape(tv,n,n);

print "OS1=", os1m = shape(os1,n,n);

print "OS2=", os2m = shape(os2,n,n);

print "F=", fvm = shape(fv,n,n);

print "Q=", qvm = shape(qv,n,n);

print "U=", uvm = shape(uv,n,n);

print "L=", lvm = shape(lv,n,n);

hv4 = ha[,4]; vv4 = va[,4];

print "HV4=", h4 = shape(hv4,n,n);

print "VV4=", v4 = shape(vv4,n,n);

}

if (ipr > 1) print "Full Matrix", tabl;

return(tabl);

}

function analyz(ipr,data,tabl,fmodel,test) {

n = nrow(data); nm1 = n - 1;

ct = shape(data,.,1);

comb = [ ct tabl ];

fmod = (fmodel[1] == "a") ? 1

: (fmodel[1] == "s") ? 2

: (fmodel[1] == "i") ? 3

: (fmodel[1] == "n") ? 4 : 0;

kha = 13; kva = kha + n;

if (ncol(comb) != kha+n+nm1) print "Invalid column number.";

st1 = test[1]; slen = strlen(test);

st2 = (slen > 1) ? test[1:2] : "**";

st3 = (slen > 2) ? test[1:3] : "***";

gof = cons(20,1,.);

switch(fmod) {

case 1:

/*--- Fit asymmetric models: O, F, T, D, OS1, OS2, 2RPA, LDPS ---*/

if (st3 == "OS1") {

if (ipr) print "1.5 OS1 Model: Tomizawa (1985): Odds Symmetry Model I";

clas = [ 5 12 ]; modl = "1 = 5 12"; /* c = s os1; */

} else if (st3 == "OS2") {

if (ipr) print "1.6 OS2 Model: Tomizawa (1985): Odds Symmetry Model II";

clas = [ 5 13 ]; modl = "1 = 5 13"; /* c = s os2; */

} else if (st1 == "O") {

if (ipr) print "1.1 O Model: Goodman (1985): Null Asymmetry Model";

clas = 5; modl = "1 = 5"; /* c = s; */

} else if (st1 == "F") {

if (ipr) print "1.2 F Model: Agresti (1983): Linear DiagonalsParameter Symmetry Model";

clas = 5; modl = "1 = 5 11"; /* c = s f; */

} else if (st1 == "T") {

if (ipr) print "1.3 T Model: Goodman (1985): Triangle Asymmetry Model";

clas = [ 5 7 ]; modl = "1 = 5 7"; /* c = s t; */

} else if (st1 == "D") {

if (ipr) print "1.4 D Model: Agresti (1983): DiagonalsParameter Symmetry Model";

clas = [ 5 10 ]; modl = "1 = 5 10"; /* c = s d; */

} else if (st3 == "2RP") {

if (ipr) print "1.7 2RPA Model: Tomizawa (1987): 2-Ratio Parameter Asymmetry Model";

clas = [ 5 7 ]; modl = "1 = 5 7 11"; /* c = s t f; */

} else if (st3 == "LDP") {

if (ipr) print "1.8 LDPS Model: Model";

clas = [ 5 10 ]; modl = "1 = 5 10 11"; /* c = s d f; */

} else {

print "Unknown method", test; break;

}

optn = [ "print" ipr,

"dist" "poiss" ];

gof = glim(comb,modl,optn,clas);

break;

case 2:

/*--- Fit skew-symmetric models: O, T, D, QOS ---*/

if (st1 == "O") {

if (ipr) print "2.1. O Model: Goodman (1985): Null Skew Asymmetry Model";

clas = [ 2 3 5 ]; modl = "1 = 2 3 5"; /* c = r c s; */

} else if (st1 == "T") {

if (ipr) print "2.2 T Model: Goodman (1985): Triangle Skew Symmetry Model";

clas = [ 2 3 5 7 ]; modl = "1 = 2 3 5 7"; /* c = r c s t; */

} else if (st1 == "D") {

if (ipr) print "2.3 D Model: Agresti (1983): DiagonalsParameter Symmetry Model";

clas = [ 2 3 5 10 ]; modl = "1 = 2 3 5 10"; /* c = r c s d; */

} else if (st3 == "QOS") {

if (ipr) print "2.4 QOS Model: Tomizawa (1985): Quasi Odds Symmetry Model";

clas = [ 2 3 5 12 ]; modl = "1 = 2 3 5 12"; /* c = r c s os1; */

} else {

print "Unknown method", test; break;

}

optn = [ "print" ipr,

"dist" "poiss" ];

gof = glim(comb,modl,optn,clas);

break;

case 3:

/* Fit non-independence models: O, F, U, V, L, T, Q, D, DA, DAT, UF */

if (st1 == "O") {

if (ipr) print "3.1. O Model: Goodman (1985): Null or Independence Model";

clas = [ 2 3 ]; modl = "1 = 2 3"; /* c = r c; */

} else if (st1 == "F") {

if (ipr) print "3.2 F Model: Haberman (1978): Fixed Distance Model";

clas = [ 2 3 ]; modl = "1 = 2 3 11"; /* c = r c f; */

} else if (st2 == "UF") {

if (ipr) print "3.11 UF Model: ";

clas = [ 2 3 4 11 ]; modl = "1 = 2 3 4 11"; /* c = r c u f; */

} else if (st1 == "U") {

if (ipr) print "3.3 U Model: ";

clas = [ 2 3 ]; modl = "1 = 2:4"; /* c = r c u; */

} else if (st1 == "V") {

if (ipr) print "3.4 V Model: Haberman (1978): Variable Distance Model";

clas = [ 2 3 19:22 ]; modl = "1 = 2 3 19:22"; /* c = r c v1..v4; */

} else if (st1 == "L") {

if (ipr) print "3.5 L Model: Upton (1985): Loyalty Model";

clas = [ 2 3 ]; modl = "1 = 2 3 6"; /* c = r c l; */

} else if (st1 == "T") {

if (ipr) print "3.6 T Model: Upton (1985): Non-Independence Triangle Model";

clas = [ 2 3 7 ]; modl = "1 = 2 3 7"; /* c = r c t; */

} else if (st1 == "Q") {

print "3.7 Q Model: Upton (1985): Quasi Independence Model";

clas = [ 2 3 8 ]; modl = "1 = 2 3 8"; /* c = r c q; */

} else if (st3 == "DAT") {

if (ipr) print "3.10 DAT Model: Diagonals-Absolute Triangle Model";

clas = [ 2 3 9 7 ]; modl = "1 = 2 3 9 7"; /* c = r c da t; */

} else if (st2 == "DA") {

if (ipr)

print "3.9 DA Model: Goodman (1972): Non-Independence absolute DiagonalsParameter Model";

clas = [ 2 3 9 ]; modl = "1 = 2 3 9"; /* c = r c da; */

} else if (st1 == "D") {

if (ipr) print "3.8 D Model: Goodman (1972): Non-Independence DiagonalsParameter Model";

clas = [ 2 3 10 ]; modl = "1 = 2 3 10"; /* c = r c d; */

} else {

print "Unknown method", test; break;

}

optn = [ "print" ipr,

"dist" "poiss" ];

gof = glim(comb,modl,optn,clas);

break;

case 4:

/* Fit non-symmetry and independence models: O, F, U, T, D, DA, DAT */

if (st1 == "O") {

if (ipr) print "4.1. O Model: Goodman (1985): Null Non-Symmetry+Independence Model";

clas = .; modl = "1 = 14:17"; /* c = h1..h4; */

} else if (st1 == "F") {

if (ipr) print "4.2 F Model: Agresti (1983): Linear DiagonalsParameter Symmetry Model";

clas = .; modl = "1 = 14:17 11"; /* c = h1..h4 f; */

} else if (st1 == "U") {

if (ipr) print "4.3 U Model: ";

clas = .; modl = "1 = 14:17 4"; /* c = h1..h4 u; */

} else if (st1 == "T") {

if (ipr) print "4.4 T Model: Goodman (1985): Triangle Non-Symmetry+Independence Model";

clas = 7; modl = "1 = 14:17 7"; /* c = h1..h4 t; */

} else if (st3 == "DAT") {

if (ipr) print "4.7 DAT Model: Diagonals Absolute Triangles Model";

clas = [ 9 7 ]; modl = "1 = 14:17 9 7"; /* c = h1..h4 da t; */

} else if (st2 == "DA") {

if (ipr) print "4.6 DA Model: Diagonals Absolute Model";

clas = 9; modl = "1 = 14:17 9"; /* c = h1..h4 da; */

} else if (st1 == "D") {

if (ipr) print "4.5 D Model: DiagonalsParameter Non-Symmetry+Indepedence Model";

clas = 10; modl = "1 = 14:17 10"; /* c = h1..h4 d; */

} else {

print "Unknown method", test; break;

}

optn = [ "print" ipr,

"dist" "poiss" ];

gof = glim(comb,modl,optn,clas);

break;

default:

print "Unknown fit model", fmodel;

break;

}
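
/* keep three entries of the goodness-of-fit vector returned by glim:
   the model degrees of freedom and the two chi-square fit statistics
   tabulated below (presumably the likelihood-ratio and Pearson
   statistics) */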

ind = [ 5 9:10 ];

res = gof[ ind ];

return(res);

}

count = [ 18 17 16 4 2 ,

24 105 109 59 21 ,

23 84 289 217 95 ,

8 49 175 348 198 ,

6 8 69 201 246 ];

n = nrow(count);

tabl = ttable(0,n);

ipr = 1;

free rest;

a_all = [ "O" "F" "T" "D" "OS1" "OS2" "2RPA" "LDPS" ];

for (i = 1; i <= ncol(a_all); i++) {

res = analyz(ipr,count,tabl,"a",a_all[i]);

rest = rest -> res;

}

cname(rest,a_all);

print "1. Fit asymmetric models", rest;

1. Fit asymmetric models

| O F T D

--------------------------------------------

1 | 10.0000 9.0000 9.0000 6.0000

2 | 24.8021 19.0543 18.8188 14.8378

3 | 24.4212 18.9374 18.5221 14.8837

| OS1 OS2 2RPA LDPS

--------------------------------------------

1 | 6.0000 6.0000 8.0000 6.0000

2 | 4.5557 15.9891 18.6809 14.8378

3 | 4.3781 15.6910 18.4421 14.8837

free rest;

s_all = [ "O" "T" "D" "QOS" ];

for (i = 1; i <= ncol(s_all); i++) {

res = analyz(ipr,count,tabl,"s",s_all[i]);

rest = rest -> res;

}

cname(rest,s_all);

print "2. Fit skew-symmetric models", rest;

2. Fit skew-symmetric models

| O T D QOS

--------------------------------------------

1 | 6.00000 5.00000 3.00000 3.00000

2 | 6.46805 6.40012 4.44013 3.22800

3 | 6.30705 6.22872 4.39706 3.10165

/* Q model does not work due to small class vars */

free rest;

i_all = [ "O" "F" "U" "V" "L" "T" "Q" "DAT" "DA" "D" "UF" ];

i_all = [ "O" "F" "U" "V" "L" "T" "DAT" "DA" "D" "UF" ];

for (i = 1; i <= ncol(i_all); i++) {

res = analyz(ipr,count,tabl,"i",i_all[i]);

rest = rest -> res;

}

cname(rest,i_all);

print "3. Fit non-independence models", rest;

3. Fit non-independence models

| O F U V L

------------------------------------------------------

1 | 16.000 15.000 15.000 12.000 15.000

2 | 654.207 41.889 47.992 32.830 350.595

3 | 754.104 38.737 68.456 29.934 332.676

| T DAT DA D UF

------------------------------------------------------

1 | 14.000 11.000 12.000 9.000 3.000

2 | 349.762 12.329 12.414 10.228 4.440

3 | 331.191 12.161 12.208 10.140 4.397

free rest;

n_all = [ "O" "F" "U" "T" "DAT" "DA" "D" ];

for (i = 1; i <= ncol(n_all); i++) {

res = analyz(ipr,count,tabl,"n",n_all[i]);

rest = rest -> res;

}

cname(rest,n_all);

print "4. Fit non-symmetry and independence", rest;

4. Fit non-symmetry and independence

| O F U T

--------------------------------------------

1 | 20.000 19.000 19.000 18.000

2 | 667.383 555.625 64.730 360.185

3 | 758.944 546.116 85.249 342.116

| DAT DA D

----------------------------------

1 | 15.000 16.000 12.000

2 | 24.121 30.105 20.140

3 | 23.687 29.534 19.789

Chapter 3

Some Important Data Sets

3.1 Iris Data: Fisher [253]

• Cases (rows): 150 Iris plants

• Column 1: Sepal Length

• Column 2: Sepal Width

• Column 3: Petal Length

• Column 4: Petal Width

• Column 5: Species (coded 1, 2, 3)

50 33 14 02 1 64 28 56 22 3 65 28 46 15 2 67 31 56 24 3

63 28 51 15 3 46 34 14 03 1 69 31 51 23 3 62 22 45 15 2

59 32 48 18 2 46 36 10 02 1 61 30 46 14 2 60 27 51 16 2

65 30 52 20 3 56 25 39 11 2 65 30 55 18 3 58 27 51 19 3

68 32 59 23 3 51 33 17 05 1 57 28 45 13 2 62 34 54 23 3

77 38 67 22 3 63 33 47 16 2 67 33 57 25 3 76 30 66 21 3

49 25 45 17 3 55 35 13 02 1 67 30 52 23 3 70 32 47 14 2

64 32 45 15 2 61 28 40 13 2 48 31 16 02 1 59 30 51 18 3

55 24 38 11 2 63 25 50 19 3 64 32 53 23 3 52 34 14 02 1

49 36 14 01 1 54 30 45 15 2 79 38 64 20 3 44 32 13 02 1

67 33 57 21 3 50 35 16 06 1 58 26 40 12 2 44 30 13 02 1

77 28 67 20 3 63 27 49 18 3 47 32 16 02 1 55 26 44 12 2

50 23 33 10 2 72 32 60 18 3 48 30 14 03 1 51 38 16 02 1

61 30 49 18 3 48 34 19 02 1 50 30 16 02 1 50 32 12 02 1

61 26 56 14 3 64 28 56 21 3 43 30 11 01 1 58 40 12 02 1

51 38 19 04 1 67 31 44 14 2 62 28 48 18 3 49 30 14 02 1

51 35 14 02 1 56 30 45 15 2 58 27 41 10 2 50 34 16 04 1

46 32 14 02 1 60 29 45 15 2 57 26 35 10 2 57 44 15 04 1

50 36 14 02 1 77 30 61 23 3 63 34 56 24 3 58 27 51 19 3

57 29 42 13 2 72 30 58 16 3 54 34 15 04 1 52 41 15 01 1

71 30 59 21 3 64 31 55 18 3 60 30 48 18 3 63 29 56 18 3

49 24 33 10 2 56 27 42 13 2 57 30 42 12 2 55 42 14 02 1

49 31 15 02 1 77 26 69 23 3 60 22 50 15 3 54 39 17 04 1

66 29 46 13 2 52 27 39 14 2 60 34 45 16 2 50 34 15 02 1

44 29 14 02 1 50 20 35 10 2 55 24 37 10 2 58 27 39 12 2

47 32 13 02 1 46 31 15 02 1 69 32 57 23 3 62 29 43 13 2

74 28 61 19 3 59 30 42 15 2 51 34 15 02 1 50 35 13 03 1

56 28 49 20 3 60 22 40 10 2 73 29 63 18 3 67 25 58 18 3

49 31 15 01 1 67 31 47 15 2 63 23 44 13 2 54 37 15 02 1

56 30 41 13 2 63 25 49 15 2 61 28 47 12 2 64 29 43 13 2

51 25 30 11 2 57 28 41 13 2 65 30 58 22 3 69 31 54 21 3

54 39 13 04 1 51 35 14 03 1 72 36 61 25 3 65 32 51 20 3

61 29 47 14 2 56 29 36 13 2 69 31 49 15 2 64 27 53 19 3

68 30 55 21 3 55 25 40 13 2 48 34 16 02 1 48 30 14 01 1

45 23 13 03 1 57 25 50 20 3 57 38 17 03 1 51 38 15 03 1

55 23 40 13 2 66 30 44 14 2 68 28 48 14 2 54 34 17 02 1

51 37 15 04 1 52 35 15 02 1 58 28 51 24 3 67 30 50 17 2

63 33 60 25 3 53 37 15 02 1

3.2 Brain Data: Weisberg [840]

• Cases (rows): animals

• Column 2: average body weight (kg)

• Column 3: average brain weight (g)

a = [ 1 3.385 44.500, 2 0.480 15.500,

3 1.360 8.100, 4 465.000 423.000,

5 36.330 119.500, 6 27.660 115.000,

7 14.830 98.200, 8 1.040 5.500,

9 4.190 58.000, 10 0.425 6.400,

11 0.101 4.000, 12 0.920 5.700,

13 1.000 6.600, 14 0.005 0.140,

15 0.060 1.000, 16 3.500 10.800,

17 2.000 12.300, 18 1.700 6.300,

19 2547.000 4603.000, 20 0.023 0.300,

21 187.100 419.000, 22 521.000 655.000,

23 0.785 3.500, 24 10.000 115.000,

25 3.300 25.600, 26 0.200 5.000,

27 1.410 17.500, 28 529.000 680.000,

29 207.000 406.000, 30 85.000 325.000,

31 0.750 12.300, 32 62.000 1320.000,

33 6654.000 5712.000, 34 3.500 3.900,

35 6.800 179.000, 36 35.000 56.000,

37 4.050 17.000, 38 0.120 1.000,

39 0.023 0.400, 40 0.010 0.250,

41 1.400 12.500, 42 250.000 490.000,

43 2.500 12.100, 44 55.500 175.000,

45 100.000 157.000, 46 52.160 440.000,

47 10.550 179.500, 48 0.550 2.400,

49 60.000 81.000, 50 3.600 21.000,

51 4.288 39.200, 52 0.280 1.900,

53 0.075 1.200, 54 0.122 3.000,

55 0.048 0.330, 56 192.000 180.000,

57 3.000 25.000, 58 160.000 169.000,

59 0.900 2.600, 60 1.620 11.400,

61 0.104 2.500, 62 4.235 50.400 ];

3.3 Hertzsprung-Russell Star Data: Rousseeuw & Leroy [715]

• Cases (rows): measurements of 47 stars in cluster CYG OB1

• Column 1: number of star

• Column 2: logarithm of effective temperature on star surface

• Column 3: logarithm of light intensity

xy = [ 1 4.37 5.23, 2 4.56 5.74, 3 4.26 4.93,

4 4.56 5.74, 5 4.30 5.19, 6 4.46 5.46,

7 3.84 4.65, 8 4.57 5.27, 9 4.26 5.57,

10 4.37 5.12, 11 3.49 5.73, 12 4.43 5.45,

13 4.48 5.42, 14 4.01 4.05, 15 4.29 4.26,

16 4.42 4.58, 17 4.23 3.94, 18 4.42 4.18,

19 4.23 4.18, 20 3.49 5.89, 21 4.29 4.38,

22 4.29 4.22, 23 4.42 4.42, 24 4.49 4.85,

25 4.38 5.02, 26 4.42 4.66, 27 4.29 4.66,

28 4.38 4.90, 29 4.22 4.39, 30 3.48 6.05,

31 4.38 4.42, 32 4.56 5.10, 33 4.45 5.22,

34 3.49 6.29, 35 4.23 4.34, 36 4.62 5.62,

37 4.53 5.10, 38 4.45 5.22, 39 4.53 5.18,

40 4.43 5.57, 41 4.38 4.62, 42 4.45 5.06,

43 4.50 5.34, 44 4.45 5.34, 45 4.55 5.54,

46 4.45 4.98, 47 4.42 4.50 ];

3.4 Brainlog Data: Rousseeuw & Leroy [715]

• Cases (rows): 28 animals

• Column 1: logarithm of body weight (kg)

• Column 2: logarithm of brain weight (g)

aa=[1.303338E-001 9.084851E-001 ,

2.6674530 2.6263400 ,

1.5602650 2.0773680 ,

1.4418520 2.0606980 ,

1.703332E-002 7.403627E-001 ,

4.0681860 1.6989700 ,

3.4060290 3.6630410 ,

2.2720740 2.6222140 ,

2.7168380 2.8162410 ,

1.0000000 2.0606980 ,

5.185139E-001 1.4082400 ,

2.7234560 2.8325090 ,

2.3159700 2.6085260 ,

1.7923920 3.1205740 ,

3.8230830 3.7567880 ,

3.9731280 1.8450980 ,

8.325089E-001 2.2528530 ,

1.5440680 1.7481880 ,

-9.208187E-001 .0000000 ,

-1.6382720 -3.979400E-001 ,

3.979400E-001 1.0827850 ,

1.7442930 2.2430380 ,

2.0000000 2.1959000 ,

1.7173380 2.6434530 ,

4.9395190 2.1889280 ,

-5.528420E-001 2.787536E-001 ,

-9.136401E-001 4.771213E-001 ,

2.2833010 2.2552720 ];

3.5 Kootenay Data: Rousseeuw & Leroy [715]

• Cases (rows): days on which the speed of the Kootenay river was measured

• Column 1: speed measurement in Libby

• Column 2: speed measurement in Newgate

aa = [ 27.1 19.7 , 20.9 18.0 ,

33.4 26.1 , 77.6 15.7 ,

37.0 26.1 , 21.6 19.9 ,

17.6 15.7 , 35.1 27.6 ,

32.6 24.9 , 26.0 23.4 ,

27.6 23.1 , 38.7 31.3 ,

27.8 23.8 ];

3.6 Heart Data: Weisberg [840]

see also Hawkins [377], Spaeth (1992)

• Cases (rows): 12 young patients

• Column 2: x1 height in inches

• Column 3: x2 weight in lbs.

• Column 4: y catheter length in cm

a= [ 1 42.8 40.0 37,

2 63.5 93.5 50,

3 37.5 35.5 34,

4 39.5 30.0 36,

5 45.5 52.0 43,

6 38.5 17.0 28,

7 43.0 38.5 37,

8 22.5 8.5 20,

9 37.0 33.0 34,

10 23.5 9.5 30,

11 33.0 21.0 38,

12 58.0 79.0 47 ];

3.7 Real Estate Example by Narula and Wellington (1977)

dat= [ 4.9176 1.0 3.4720 0.9980 25.9,

5.0208 1.0 3.5310 1.5000 29.5,

4.5429 1.0 2.2750 1.1750 27.9,

4.5573 1.0 4.0500 1.2320 25.9,

5.0597 1.0 4.4550 1.1210 29.9,

3.8910 1.0 4.4550 0.9880 29.9,

5.8980 1.0 5.8500 1.2400 30.9,

5.6039 1.0 9.5200 1.5010 28.9,

15.4202 2.5 9.8000 3.4200 84.9,

14.4598 2.5 12.8000 3.0000 82.9,

5.8282 1.0 6.4350 1.2250 35.9,

5.3003 1.0 4.9883 1.5520 31.5,

6.2712 1.0 5.5200 0.9750 31.0,

5.9592 1.0 6.6660 1.1210 30.9,

5.0500 1.0 5.0000 1.0200 30.0,

5.6039 1.0 9.5200 1.5010 28.9,

8.2464 1.5 5.1500 1.6640 36.9,

6.6969 1.5 6.9020 1.4880 41.9,

7.7841 1.5 7.1020 1.3760 40.5,

9.0384 1.0 7.8000 1.5000 43.9,

5.9894 1.0 5.5200 1.2560 37.5,

7.5422 1.5 4.0000 1.6900 37.9,

8.7951 1.5 9.8900 1.8200 44.5,

6.0931 1.5 6.7265 1.6520 37.9,

8.3607 1.5 9.1500 1.7770 38.9,

8.1400 1.0 8.0000 1.5040 36.9,

9.1416 1.5 7.3262 1.8310 45.8,

12.0000 1.5 5.0000 1.2000 41.0 ];

3.8 Mail Order Catalog Data: Spaeth [776]

• Cases (rows): Catalog editions

• Column 2: x1 number of print runs

• Column 3: x2 number of pages

• Column 4: y number of orders

a = [ 1 2800 22 437, 2 2670 14 204,

3 2800 37 725, 4 2784 15 279,

5 2800 38 474, 6 2620 172 1587,

7 2620 249 2630, 8 2470 84 798,

9 2620 242 2509, 10 2475 100 1192,

11 2620 114 882, 12 2620 37 511,

13 2448 96 896, 14 2648 116 1297,

15 2525 94 857, 16 1000 47 388,

17 980 48 462, 18 1000 15 67,

19 1112 45 326, 20 1000 23 145,

21 1000 44 298, 22 2188 23 179,

23 1028 47 289, 24 2200 31 200,

25 1000 48 461, 26 980 48 223,

27 728 47 235, 28 2510 26 235,

29 1500 58 594, 30 2500 128 1800,

31 2620 120 1457, 32 2528 120 1710,

33 2630 121 1715, 34 2550 122 1615,

35 1150 61 196, 36 1150 50 309,

37 1150 50 263, 38 1147 60 332 ];

3.9 Stackloss Data: Brownlee [127]

• Cases (rows): 21 days of operation of an ammonia oxidation plant

• Column 1: constant (value 1)

• Column 2: x1 air flow (rate of operation)

• Column 3: x2 temperature of the cooling water

• Column 4: x3 acid concentration

• Column 5: y stack loss (proportion of unprocessed ammonia)

aa = [ 1 80 27 89 42, 1 80 27 88 37,

1 75 25 90 37, 1 62 24 87 28,

1 62 22 87 18, 1 62 23 87 18,

1 62 24 93 19, 1 62 24 93 20,

1 58 23 87 15, 1 58 18 80 14,

1 58 18 89 14, 1 58 17 88 13,

1 58 18 82 11, 1 58 19 93 12,

1 50 18 89 8, 1 50 18 86 7,

1 50 19 72 8, 1 50 19 79 8,

1 50 20 80 9, 1 56 20 82 15,

1 70 20 91 15 ];

3.10 Hald Data: see Draper and Smith [225]

See also at: http://www.pharma.ethz.ch/qsar/datasets/hald.html

• Column 1: y

• Column 2: constant (value 1)

• Column 3: X1

• Column 4: X2

• Column 5: X3

• Column 6: X4

a = [ 78.5 1 7 26 6 60,

74.3 1 1 29 15 52,

104.3 1 11 56 8 20,

87.6 1 11 31 8 47,

95.9 1 7 52 6 33,

109.2 1 11 55 9 22,

102.7 1 3 71 17 6,

72.5 1 1 31 22 44,

93.1 1 2 54 18 22,

115.9 1 21 47 4 26,

83.8 1 1 40 23 34,

113.3 1 11 66 9 12,

109.4 1 10 68 8 12 ];

3.11 Longley Data: Weisberg [840]

• Column 2: y: number of employees in the US

• Column 3: x1 price deflation

• Column 4: x2 gross national product

• Column 5: x3 number of unemployed

• Column 6: x4 number of employees in armed forces

• Column 7: x5 number of persons older than 14

• Column 8: x6 year

a = [ 1 60323 83.0 234289 2356 1590 107608 1947,

2 61122 88.5 259426 2325 1456 108632 1948,

3 60171 88.2 258054 3682 1616 109773 1949,

4 61187 89.5 284599 3351 1650 110929 1950,

5 63221 96.2 328975 2099 3099 112075 1951,

6 63639 98.1 346999 1932 3594 113270 1952,

7 64989 99.0 365385 1870 3547 115094 1953,

8 63761 100.0 363112 3578 3350 116219 1954,

9 66019 101.2 397469 2904 3048 117388 1955,

10 67857 104.6 419180 2822 2857 118734 1956,

11 68169 108.4 442769 2936 2798 120445 1957,

12 66513 110.8 444546 4681 2637 121950 1958,

13 68655 112.6 482704 3813 2552 123366 1959,

14 69564 114.2 502601 3931 2514 125368 1960,

15 69331 115.7 518173 4806 2572 127852 1961,

16 70551 116.9 554894 4007 2827 130081 1962 ];

3.12 Census Data: SAS Institute [734]

Census populations in 50 states of the US ([734], p. 632):

• Column 1: state

• Column 2: female 1970

• Column 3: male 1970

• Column 4: female 1980

• Column 5: male 1980

a= [ "ALA" 1.78 1.66 2.02 1.87, "ALASKA" 0.14 0.16 0.19 0.21,

"ARIZ" 0.90 0.87 1.38 1.34, "ARK" 0.99 0.93 1.18 1.10,

"CALIF" 10.14 9.82 12.00 11.67, "COLO" 1.12 1.09 1.46 1.43,

"CONN" 1.56 1.47 1.61 1.50, "DEL" 0.28 0.27 0.31 0.29,

"FLA" 3.51 3.28 5.07 4.68, "GA" 2.36 2.23 2.82 2.64,

"HAW" 0.37 0.40 0.47 0.49, "IDAHO" 0.36 0.36 0.47 0.47,

"ILL" 5.72 5.39 5.89 5.54, "IND" 2.66 2.53 2.82 2.64,

"IOWA" 1.45 1.37 1.50 1.41, "KAN" 1.15 1.02 1.21 1.16,

"KY" 1.64 1.58 1.87 1.79, "LA" 1.87 1.77 2.17 2.04,

"ME" 0.51 0.48 0.58 0.55, "MD" 2.01 1.92 2.17 2.04,

"MASS" 2.97 2.72 3.01 2.73, "MICH" 4.53 4.39 4.75 4.52,

"MINN" 1.94 1.86 2.08 2.00, "MISS" 1.14 1.07 1.31 1.21,

"MO" 2.42 2.26 2.55 2.37, "MONT" 0.35 0.35 0.39 0.39,

"NEB" 0.76 0.72 0.80 0.77, "NEV" 0.24 0.25 0.40 0.41,

"NH" 0.38 0.36 0.47 0.45, "NJ" 3.70 3.47 3.83 3.53,

"NM" 0.52 0.50 0.66 0.64, "NY" 9.52 8.72 9.22 8.34,

"NC" 2.59 2.49 3.03 2.86, "ND" 0.31 0.31 0.32 0.33,

"OHIO" 5.49 5.16 5.58 5.22, "OKLA" 1.31 1.25 1.55 1.48,

"ORE" 1.07 1.02 1.34 1.30, "PA" 6.13 5.67 6.18 5.68,

"RI" 0.48 0.46 0.50 0.45, "SC" 1.32 1.27 1.60 1.52,

"SD" 0.34 0.33 0.35 0.34, "TENN" 2.03 1.90 2.37 2.62,

"TEXAS" 5.72 5.48 7.23 7.00, "UTAH" 0.54 0.52 0.74 0.72,

"VT" 0.23 0.22 0.26 0.25, "VA" 2.35 2.30 2.73 2.62,

"WASH" 1.72 1.69 2.08 2.05, "W.VA" 0.90 0.84 1.00 0.95,

"WIS" 2.25 2.17 2.40 2.31, "WYO" 0.16 0.17 0.23 0.24 ];

3.13 Fitness Data: SAS Institute [734]

Original author: A.C. Linnerud, NC State University

• Column 1: age (years)

• Column 2: weight (kg)

• Column 3: oxygen uptake (ml per kg body weight per minute)

• Column 4: runtime (time to run 1.5 miles)

• Column 5: rstpulse (heartrate while resting)

• Column 6: runpulse (heartrate while running)

• Column 7: maxpulse (maximum heartrate at running)

a= [ 44 89.47 44.609 11.37 62 178 182 ,

40 75.07 45.313 10.07 62 185 185 ,

44 85.84 54.297 8.65 45 156 168 ,

42 68.15 59.571 8.17 40 166 172 ,

38 89.02 49.874 9.22 55 178 180 ,

47 77.45 44.811 11.63 58 176 176 ,

40 75.98 45.681 11.95 70 176 180 ,

43 81.19 49.091 10.85 64 162 170 ,

44 81.42 39.442 13.08 63 174 176 ,

38 81.87 60.055 8.63 48 170 186 ,

44 73.03 50.541 10.13 45 168 168 ,

45 87.66 37.388 14.03 56 186 192 ,

45 66.45 44.754 11.12 51 176 176 ,

47 79.15 47.273 10.60 47 162 164 ,

54 83.12 51.855 10.33 50 166 170 ,

49 81.42 49.156 8.95 44 180 185 ,

51 69.63 40.836 10.95 57 168 172 ,

51 77.91 46.672 10.00 48 162 168 ,

48 91.63 46.774 10.25 48 162 164 ,

49 73.37 50.388 10.08 67 168 168 ,

57 73.37 39.407 12.63 58 174 176 ,

54 79.38 46.080 11.17 62 156 165 ,

52 76.32 45.441 9.63 48 164 166 ,

50 70.87 54.625 8.92 48 146 155 ,

51 67.25 45.118 11.08 48 172 172 ,

54 91.63 39.203 12.88 44 168 172 ,

51 73.71 45.790 10.47 59 186 188 ,

57 59.08 50.545 9.93 49 148 155 ,

49 76.32 48.673 9.40 56 186 188 ,

48 61.24 47.920 11.50 52 170 176 ,

52 82.78 47.467 10.50 53 170 172 ];

3.14 Skin Data: SAS Institute [734]

Original author: A.C. Linnerud, NC State University ([734], p. 227)

• Column 1: chest

• Column 2: abdomen

• Column 3: arm

a= [ 09.0 12.0 3.0, 20.0 20.0 7.5, 10.0 23.0 6.0, 12.0 6.0 5.0,

08.5 15.0 3.0, 12.0 17.0 4.0, 11.0 13.0 6.0, 05.0 14.0 3.0,

13.0 19.0 3.0, 22.0 20.0 6.0, 10.5 12.0 3.5, 17.0 15.0 4.5,

10.0 7.0 4.0, 17.0 28.0 5.5, 15.0 15.5 3.0, 16.0 11.0 3.0,

07.0 13.0 2.5, 16.0 18.0 3.0, 09.0 12.5 5.0, 17.5 18.0 3.0,

15.5 28.5 5.0, 21.0 27.5 6.0, 23.0 24.0 6.5, 11.5 15.0 3.0,

22.5 20.0 4.5, 13.0 14.0 4.0, 14.0 21.0 2.5, 04.0 3.0 2.0,

05.5 8.5 3.0, 21.0 13.0 9.0, 16.0 11.0 3.0, 17.5 15.0 4.5,

25.0 35.0 6.5, 21.0 6.0 3.5, 16.5 17.0 4.0, 09.5 11.5 2.5,

15.0 19.0 4.0, 13.5 6.5 3.5, 16.0 15.0 3.0, 26.0 38.0 4.0,

12.5 20.0 3.0, 05.0 7.5 3.5, 12.0 15.0 3.5, 15.0 13.0 4.5,

17.0 19.5 5.0, 16.0 20.0 5.5, 09.0 4.0 2.0, 19.0 12.0 3.0,

16.0 17.5 6.0, 14.5 14.5 4.0 ];

3.15 SAT Data: SAS Institute [734]

• Column 1: ppvt (PICTURE VOCABULARY TEST)

• Column 2: rpmt (PROGRESSIVE MATRICES TEST)

• Column 3: n (NAMED)

• Column 4: s (STILL)

• Column 5: ns (NAMED STILL)

• Column 6: na (NAMED ACTION)

• Column 7: ss (SENTENCE STILL)

a = [ 49 48 08 01 02 06 12 16

47 76 13 05 14 14 30 27

11 40 13 00 10 21 16 16

09 52 09 00 02 05 17 08

69 63 15 02 07 11 26 17

35 82 14 02 15 21 34 25

06 71 21 00 01 20 23 18

08 68 08 00 00 10 19 14

49 74 11 00 00 07 16 13

08 70 15 03 02 21 26 25

47 70 15 08 16 15 35 24

06 61 11 05 04 07 15 14

14 54 12 01 12 13 27 21

30 55 13 02 01 12 20 17

04 54 10 03 12 20 26 22

24 40 14 00 02 05 14 08

19 66 13 07 12 21 35 27

45 54 10 00 06 06 14 16

22 64 14 12 08 19 27 26

16 47 16 03 09 15 18 10

32 48 16 00 07 09 14 18

37 52 14 04 06 20 26 26

47 74 19 04 09 14 23 23

05 57 12 00 02 04 11 08

06 57 10 00 01 16 15 17

60 80 11 03 08 18 28 21

58 78 13 01 18 19 34 23

06 70 16 02 11 09 23 11

16 47 14 00 10 07 12 08

45 94 19 08 10 28 32 32

09 63 11 02 12 05 25 14

69 76 16 07 11 18 29 21

35 59 11 02 05 10 23 24

19 55 08 00 01 14 19 12

58 74 14 01 00 10 18 18

58 71 17 06 04 23 31 26

79 54 14 00 06 06 15 14 ];

3.16 Data by Chatterjee and Price [157]

• Column 1: year

• Column 2: import

• Column 3: doprod

• Column 4: stock

• Column 5: consum

a = [ 49 15.9 149.3 4.2 108.1

50 16.4 161.2 4.1 114.8

51 19.0 171.5 3.1 123.2

52 19.1 175.5 3.1 126.9

53 18.8 180.8 1.1 132.1

54 20.4 190.7 2.2 137.7

55 22.7 202.1 2.1 146.0

56 26.5 212.4 5.6 154.1

57 28.1 226.1 5.0 162.3

58 27.6 231.9 5.1 164.3

59 26.3 239.0 0.7 167.6

60 31.1 258.0 5.6 176.8

61 33.3 269.8 3.9 186.6

62 37.0 288.4 3.1 199.7

63 43.3 304.5 4.6 213.9

64 49.0 323.4 7.0 223.8

65 50.3 336.8 1.2 232.0

66 56.6 353.9 4.5 242.9 ];

3.17 Wampler Data

a = [ 759 -2048 2048 -2048 2523 -2048 2048 -2048 1838 -2048 2048

-2048 1838 -2048 2048 -2048 2523 -2048 2048 -2048 759 ];

3.18 Freshman Data: Campbell & McCabe (1984)

c = [ 1.00 ,

.376 1.00 ,

.174 .217 1.00 ,

.048 -.010 .015 1.00 ,

.268 .119 .078 -.018 1.00 ,

.352 .138 .582 -.033 .187 1.00 ,

.216 .097 .032 -.069 .095 .094 1.00 ,

.179 .243 .599 -.066 .110 .531 .150 1.00 ,

-.036 .070 -.039 .063 -.009 .047 .035 .063 1.00 ,

.073 .279 .682 -.114 .132 .452 -.006 .564 .028 1.00 ,

-.183 .026 .287 .089 -.023 .194 -.266 .122 .084 .354 1.00 ];

3.19 Orheim Data: Campbell & McCabe (1984)

Constituent Elements in Coal Samples, Orheim (1981):

• Column 1: Aluminium

• Column 2: Silicon

• Column 3: Sulfur

• Column 4: Calcium

• Column 5: Titanium

• Column 6: Iron

• Column 7: Selenium

• Column 8: Strontium

• Column 9: Barium

a = [ 1.00 ,

.961 1.00 ,

.419 .454 1.00 ,

-.010 -.071 -.058 1.00 ,

.926 .879 .425 -.050 1.00 ,

.373 .370 .675 .195 .336 1.00 ,

.328 .280 .465 .005 .416 .424 1.00 ,

.030 -.032 .061 .629 .024 .093 .113 1.00 ,

.304 .269 .225 .103 .272 .185 .261 .489 1.00 ];

3.20 Thurstone Box Data (Jennrich & Sampson, 1966, p.320)

/* Thurstone’s (1947, p.371) centroid pattern for 26 empirical

box variables. The definition of variable i is a certain

function of the boxes’ Height (x), Width (y), and Depth (z). */

vnam = [ "x" "y" "z" "xy" "xz" "yz"

"xy’y" "xyy’" "xy’z" "xzy’" "yy’z" "yzy’"

"x/y" "y/x" "x/z" "z/x" "y/z" "z/y"

"x+y" "x+z" "y+z" "sqrt(xy’+yy’)" "sqrt(xy’+zy’)"

"sqrt(yy’+zy’)" "xyz" "sqrt(xy’+yy’+zy’)" ];

a = [ 650 -669 330 ,

740 530 370 ,

750 60 -639 ,

870 -39 480 ,

880 -399 -239 ,

890 410 -199 ,

840 -349 430 ,

860 220 430 ,

830 -549 -29 ,

850 -259 -439 ,

860 490 -9 ,

870 290 -379 ,

-69 -979 -89 ,

70 980 90 ,

-49 -549 800 ,

50 550 -799 ,

0 490 850 ,

0 -489 -849 ,

860 50 480 ,

870 -389 -319 ,

900 400 -189 ,

850 50 470 ,

860 -339 -319 ,

890 390 -159 ,

990 -9 10 ,

960 100 -19 ];

3.21 Twenty-Four Psychological Tests (Holzinger & Harman)

• Column 1: Visual Perception

• Column 2: Cubes

• Column 3: Paper Form Board

• Column 4: Flags

• Column 5: General Information

• Column 6: Paragraph Comprehension

• Column 7: Sentence Completion

• Column 8: Word Classification

• Column 9: Word Meaning

• Column 10: Addition

• Column 11: Code

• Column 12: Counting Dots

• Column 13: Straight-Curved Capitals

• Column 14: Word Recognition

• Column 15: Number Recognition

• Column 16: Figure Recognition

• Column 17: Object - Number

• Column 18: Number - Figure

• Column 19: Figure - Word

• Column 20: Deduction

• Column 21: Numerical Puzzles

• Column 22: Problem Reasoning

• Column 23: Series Completion

• Column 24: Arithmetic Problems

/* Original Correlation Matrix: */

corr = [ 1.000 0.318 0.403 0.468 0.321 0.335 0.304 0.332 0.326 0.116

0.308 0.314 0.489 0.125 0.238 0.414 0.176 0.368 0.270 0.365

0.369 0.413 0.474 0.282,

0.318 1.000 0.317 0.230 0.285 0.234 0.157 0.157 0.195 0.057

0.150 0.145 0.239 0.103 0.131 0.272 0.005 0.255 0.112 0.292

0.306 0.232 0.348 0.211,

0.403 0.317 1.000 0.305 0.247 0.268 0.223 0.382 0.184 -0.075

0.091 0.140 0.321 0.177 0.065 0.263 0.177 0.211 0.312 0.297

0.165 0.250 0.383 0.203,

0.468 0.230 0.305 1.000 0.227 0.327 0.335 0.391 0.325 0.099

0.110 0.160 0.327 0.066 0.127 0.322 0.187 0.251 0.137 0.339

0.349 0.380 0.335 0.248,

0.321 0.285 0.247 0.227 1.000 0.622 0.656 0.578 0.723 0.311

0.344 0.215 0.344 0.280 0.229 0.187 0.208 0.263 0.190 0.398

0.318 0.441 0.435 0.420,

0.335 0.234 0.268 0.327 0.622 1.000 0.722 0.527 0.714 0.203

0.353 0.095 0.309 0.292 0.251 0.291 0.273 0.167 0.251 0.435

0.263 0.386 0.431 0.433,

0.304 0.157 0.223 0.335 0.656 0.722 1.000 0.619 0.685 0.246

0.232 0.181 0.345 0.236 0.172 0.180 0.228 0.159 0.226 0.451

0.314 0.396 0.405 0.437,

0.332 0.157 0.382 0.391 0.578 0.527 0.619 1.000 0.532 0.285

0.300 0.271 0.395 0.252 0.175 0.296 0.255 0.250 0.274 0.427

0.362 0.357 0.501 0.388,

0.326 0.195 0.184 0.325 0.723 0.714 0.685 0.532 1.000 0.170

0.280 0.113 0.280 0.260 0.248 0.242 0.274 0.208 0.274 0.446

0.266 0.483 0.504 0.424,

0.116 0.057 -0.075 0.099 0.311 0.203 0.246 0.285 0.170 1.000

0.484 0.585 0.408 0.172 0.154 0.124 0.289 0.317 0.190 0.173

0.405 0.160 0.262 0.531,

0.308 0.150 0.091 0.110 0.344 0.353 0.232 0.300 0.280 0.484

1.000 0.428 0.535 0.350 0.240 0.314 0.362 0.350 0.290 0.202

0.399 0.304 0.251 0.412,

0.314 0.145 0.140 0.160 0.215 0.095 0.181 0.271 0.113 0.585

0.428 1.000 0.512 0.131 0.173 0.119 0.278 0.349 0.110 0.246

0.355 0.193 0.350 0.414,

0.489 0.239 0.321 0.327 0.344 0.309 0.345 0.395 0.280 0.408

0.535 0.512 1.000 0.195 0.139 0.281 0.194 0.323 0.263 0.241

0.425 0.279 0.382 0.358,

0.125 0.103 0.177 0.066 0.280 0.292 0.236 0.252 0.260 0.172

0.350 0.131 0.195 1.000 0.370 0.412 0.341 0.201 0.206 0.302

0.183 0.243 0.242 0.304,

0.238 0.131 0.065 0.127 0.229 0.251 0.172 0.175 0.248 0.154

0.240 0.173 0.139 0.370 1.000 0.325 0.345 0.334 0.192 0.272

0.232 0.246 0.256 0.165,

0.414 0.272 0.263 0.322 0.187 0.291 0.180 0.296 0.242 0.124

0.314 0.119 0.281 0.412 0.325 1.000 0.324 0.344 0.258 0.388

0.348 0.283 0.360 0.262,

0.176 0.005 0.177 0.187 0.208 0.273 0.228 0.255 0.274 0.289

0.362 0.278 0.194 0.341 0.345 0.324 1.000 0.448 0.324 0.262

0.173 0.273 0.287 0.326,

0.368 0.255 0.211 0.251 0.263 0.167 0.159 0.250 0.208 0.317

0.350 0.349 0.323 0.201 0.334 0.344 0.448 1.000 0.358 0.301

0.357 0.317 0.272 0.405,

0.270 0.112 0.312 0.137 0.190 0.251 0.226 0.274 0.274 0.190

0.290 0.110 0.263 0.206 0.192 0.258 0.324 0.358 1.0 0.167

0.331 0.342 0.303 0.374,

0.365 0.292 0.297 0.339 0.398 0.435 0.451 0.427 0.446 0.173

0.202 0.246 0.241 0.302 0.272 0.388 0.262 0.301 0.167 1.000

0.413 0.463 0.509 0.366,

0.369 0.306 0.165 0.349 0.318 0.263 0.314 0.362 0.266 0.405

0.399 0.355 0.425 0.183 0.232 0.348 0.173 0.357 0.331 0.413

1.000 0.374 0.451 0.448,

0.413 0.232 0.250 0.380 0.441 0.386 0.396 0.357 0.483 0.160

0.304 0.193 0.279 0.243 0.246 0.283 0.273 0.317 0.342 0.463

0.374 1.000 0.503 0.375,

0.474 0.348 0.383 0.335 0.435 0.431 0.405 0.501 0.504 0.262

0.251 0.350 0.382 0.242 0.256 0.360 0.287 0.272 0.303 0.509

0.451 0.503 1.0 0.434,

0.282 0.211 0.203 0.248 0.420 0.433 0.437 0.388 0.424 0.531

0.412 0.414 0.358 0.304 0.165 0.262 0.326 0.405 0.374 0.366

0.448 0.375 0.434 1.000 ] ;

/* Holzinger centroid pattern reported by Harman (1960, p. 211). */

a= [ 1 608 -116 300 -250 ,

2 372 -119 207 -135 ,

3 427 -220 262 -155 ,

4 477 -211 206 -184 ,

5 668 -306 -344 108 ,

6 661 -337 -258 216 ,

7 652 -396 -384 124 ,

8 662 -225 -153 -60 ,

9 664 -394 -240 308 ,

10 462 455 -365 -136 ,

11 569 397 -208 -63 ,

12 484 360 -149 -388 ,

13 608 130 -99 -402 ,

14 442 199 -13 293 ,

15 407 170 146 266 ,

16 523 77 300 76 ,

17 492 317 82 338 ,

18 547 307 248 72 ,

19 452 125 129 111 ,

20 612 -174 128 4 ,

21 601 114 80 -171 ,

22 608 -144 145 136 ,

23 691 -164 129 -116 ,

24 654 151 -150 -3 ];

3.22 Birth and Death Rates per 1000 Persons (Hartigan, p.197)

Original source: Reader's Digest Almanac, 1966.

Copied from: Hartigan: Clustering Algorithms, page 197.

• Column 1: case number

• Column 2: country

• Column 3: birth rate

• Column 4: death rate

data = [ 1 "algeria" 36.4 14.6, 2 "congo" 37.3 8.0,

3 "egypt" 42.1 15.3, 4 "ghana" 55.8 25.6,

5 "ivory coast" 56.1 33.1, 6 "malagasy" 41.8 15.8,

7 "morocco" 46.1 18.7, 8 "tunisia" 41.7 10.1,

9 "cambodia" 41.4 19.7, 10 "ceylon" 35.8 8.5,

11 "china" 34.0 11.0, 12 "taiwan" 36.3 6.1,

13 "hongkong" 32.1 5.5, 14 "india" 20.9 8.8,

15 "indonesia" 27.7 10.2, 16 "iraq" 20.5 3.9,

17 "israel" 25.0 6.2, 18 "japan" 17.3 7.0,

19 "jordan" 46.3 6.4, 20 "korea" 14.8 5.7,

21 "malaysia" 33.5 6.4, 22 "mongolia" 39.2 11.2,

23 "philippines" 28.4 7.1, 24 "syria" 26.2 4.3,

25 "thailand" 34.8 7.9, 26 "vietnam" 23.4 5.1,

27 "canada" 24.8 7.8, 28 "costa rica" 49.9 8.5,

29 "dominican r" 33.0 8.4, 30 "guatemala" 47.7 17.3,

31 "honduras" 46.6 9.7, 32 "mexico" 45.1 10.5,

33 "nicaragua" 42.9 7.1, 34 "panama" 40.1 8.0,

35 "united states" 21.7 9.6, 36 "argentina" 21.8 8.1,

37 "bolivia" 17.4 5.8, 38 "brazil" 45.0 13.5,

39 "chile" 33.6 11.8, 40 "colombia" 44.0 11.7,

41 "ecuador" 44.2 13.5, 42 "peru" 27.7 8.2,

43 "uruguay" 22.5 7.8, 44 "venezuela" 42.8 6.7,

45 "austria" 18.8 12.8, 46 "belgium" 17.1 12.7,

47 "britain" 18.2 12.2, 48 "bulgaria" 16.4 8.2,

49 "czechoslovakia" 16.9 9.5, 50 "denmark" 17.6 19.8,

51 "finland" 18.1 9.2, 52 "france" 18.2 11.7,

53 "e. germany" 17.5 13.7, 54 "w. germany" 18.5 11.4,

55 "greece" 17.4 7.8, 56 "hungary" 13.1 9.9,

57 "ireland" 22.3 11.9, 58 "italy" 19.0 10.2,

59 "netherlands" 20.9 8.0, 60 "norway" 17.5 10.0,

61 "poland" 19.0 7.5, 62 "portugal" 23.5 10.8,

63 "romania" 15.7 8.3, 64 "spain" 21.5 9.1,

65 "sweden" 14.8 10.1, 66 "switzerland" 18.9 9.6,

67 "u.s.s.r." 21.2 7.2, 68 "yugoslavia" 21.4 8.9,

69 "australia" 21.6 8.7, 70 "new zealand" 25.5 8.8 ];

Chapter 4

The Bibliography

Bibliography

[1] Abbasi, S. & Shaheen, F. (2008), “Faster generation of normal and exponential variates using the ziggurat method”, Manuscript submitted to JSS.

[2] Abebe, A., Daniels, J., McKean, J.W., & Kapenga, J.A. (2001), Statistics and Data Analysis, http://www.stat.wmich.edu/s160/book/

[3] Abramowitz, M. & Stegun, I.A. (1972), Handbook of Mathematical Functions, Dover Publications, Inc., New York.

[4] Adlers, M. (1998), Sparse Least Squares Problems with Box Constraints, Linkoping: Linkoping University, Sweden.

[5] Agrawal, R., Imielski, T., & Swami, A. (1993), “Mining association rules between sets of items in large databases”; Proceedings of the ACM SIGMOD Conference on Management of Data, p. 207-216.

[6] Agresti, A. (1996), An Introduction to Categorical Data Analysis, New York: John Wiley & Sons.

[7] Agresti, A. (2002), Categorical Data Analysis, Second Edition, New York: John Wiley & Sons.

[8] Ahn, H., Moon, H., Kim, S., & Kodell, R.L. (2002), “A Newton-based approach for attributing tumor lethality in animal carcinogenicity studies”, Computational Statistics and Data Analysis, 38, 263-283.

[9] Akcin, H. & Zhang, X. (2010), “A SAS Macro for direct adjusted survival curves based on Aalen’s model”; JSS.

[10] Al-Baali, M. & Fletcher, R. (1985), “Variational Methods for Nonlinear Least Squares”, J. Oper. Res. Soc., 36, 405-421.

[11] Al-Baali, M. & Fletcher, R. (1986), “An Efficient Line Search for Nonlinear Least Squares”, J. Optimiz. Theory Appl., 48, 359-377.

[12] Al-Subaihi, A.A. (2002), “Variable Selection in Multivariable Regression Using SAS/IML”, JSS, 2002.

[13] Amestoy, P.R., Davis, T.A., & Duff, I.S. (1996), “An approximate minimum degree ordering algorithm”, SIAM J. Matrix Analysis and Applic., 17, 886-905.

[14] Anderberg, M.R. (1973), Cluster Analysis for Applications, New York: Academic Press, Inc.

[15] Anderson, E., Bai, Z., Bischof, C., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A., Ostrouchov, S., & Sorensen, D. (1995), LAPACK User’s Guide, SIAM, Philadelphia, PA.

[16] Anderson, T. W. & Darling, D. A. (1954), “A test of goodness of fit”, Journal of the American Statistical Association, 49, 765-769.

[17] Andrei, N. (2007), “Scaled memoryless BFGS preconditioned conjugate gradient algorithm for unconstrained optimization”; Optimization Methods and Software, 22, 561-571.

[18] Andrews, D.F., Bickel, P.J., Hampel, F.R., Huber, P.J., Rogers, W.H., Tukey, J.W. (1972), Robust Estimation of Location: Survey and Advances, Princeton, NJ: Princeton University Press.

[19] Andrews, D.W.K. & Fair, R.C. (1988), “Inference in Nonlinear Econometric Models with Structural Change”, Review of Economic Studies, 55, 615-640.

[20] Anraku, K. (1999), “An Information Criterion for Parameters under a simple order restriction”, Biometrika, 86, 141-152.

[21] Applegate, D., Bixby, R., Chvatal, V. & Cook, W. (2006), Concorde TSP Solver, http://www.tsp.gatech.edu/concorde.

[22] Applegate, D., Bixby, R., Chvatal, V. & Cook, W. (2000), “TSP cuts which do not conform to the template paradigm” in M. Junger & D. Naddef (eds.): Computational Combinatorial Optimization, Optimal or Provably Near-Optimal Solutions, Lecture Notes in Computer Science, Vol. 2241, pp. 261-304, London: Springer Verlag.

[23] Applegate, D., Cook, W. & Rohe, A. (2003), “Chained Lin-Kernighan for large traveling salesman problems”, INFORMS Journal on Computing, 15, 82-92.

[24] Aranda-Ordaz, F.J. (1981), “On two families of transformations to additivity for binary response data”, Biometrika, 68, 357-364.

[25] Archer, C.O. & Jennrich, R.I. (1974), “Standard errors for rotated factor loadings”; Psychometrika, 38, 581-592.

[26] Armstrong, R.D. & Kung, D.S. (1979), “Algorithm AS 135: Min-Max Estimates for a Linear Multiple Regression Problem”, Appl. Statist., 28, 93-100.

[27] Axelsson, O. (1996), Iterative Solution Methods, Cambridge University Press, Cambridge.

[28] Azzalini, A. & Capitanio, A. (1999), “Statistical applications of the multivariate skew-normal distribution”, Journal Roy. Statist. Soc. B, 61, part 3.

[29] Azzalini, A. & Dalla Valle, A. (1996), “The multivariate skew-normal distribution”, Biometrika, 83, 715-726.

[30] Baker, F.B. (1992), Item Response Theory: Parameter Estimation Techniques, Marcel Dekker.

[31] Ballabio, D., Consonni, V., & Todeschini, R. (2009), “The Kohonen and CP-ANN toolbox: a collection of MATLAB modules of Self Organizing Maps and Counterpropagation Artificial Neural Networks”; Chemometrics and Intelligent Laboratory Systems, 98, 115-122.

[32] Ballabio, D., Consonni, V., Vasighi, M., & Kompany-Zareh, M. (2011), “Genetic algorithm for architecture optimisation of counter-propagation artificial neural networks”; Chemometrics and Intelligent Laboratory Systems, 105, 56-64.

[33] Ballabio, D. & Vasighi, M. (2011), “A MATLAB toolbox for self organizing maps and derived supervised neural network learning strategies”; JSS, 2011.

[34] Bamber, D. (1975), “The area above the ordinal dominance graph and the area below the receiver operating graph”, J. Math. Psych., 12, 387-415.

[35] Bao, X. (2007), “Mining Transaction/Order Data using SAS Enterprise Miner Association Mode”; SAS Global Forum, Paper 132-2007.

[36] Barber, C.B., Dobkin, D.P., & Huhdanpaa, H.T. (1996), “The quickhull algorithm for convex hulls”; ACM Trans. on Mathematical Software.

[37] Barnett, V. & Lewis, T. (1978), Outliers in Statistical Data, New York: John Wiley & Sons.

[38] Barrodale, I. & Philips, F.D.K. (1975), “An improved algorithm for discrete Chebyshev linear approximation”, in Hartnell, B.L., Williams, H.C. (Eds.): Proc. Fourth Manitoba Conf. on Numerical Mathematics, Winnipeg 1975, 177-190.

[39] Barrodale, I. & Roberts, F.D.K. (1973), “An improved algorithm for discrete l1 linear approximation”, SIAM Journal Numerical Analysis, 10, 839-848.

[40] Barrodale, I. & Roberts, F.D.K. (1974), “Algorithm 478: Solution of an overdetermined system of equations in the l1-norm”,

[41] Bartels, R.H. & Stewart, G.W. (1972), “Solution of the equation AX + XB = C”, Comm. ACM, 15, 820-826.

[42] Bartolucci, F. (2005), “Clustering univariate observations via mixtures of unimodal normal mixtures”; Journal of Classification, 22, 203-219.

[43] Bates, D.M. & Watts, D.G. (1988), Nonlinear Regression Analysis and Its Applications, New York: John Wiley & Sons.

[44] Baxter, M. and King, R.G. (1999), “Measuring business cycles: Approximate band-pass filters for economic time series”; The Review of Economics and Statistics, 81, 575-593.

[45] Beale, E.M.L. (1972), ”A Derivation of Conjugate Gradients”, in Numerical Methods for Nonlinear Optimization,F. A. Lootsma (ed.), London: Academic Press.

[46] Bellman, R. (1995), Introduction to Matrix Analysis, SIAM, Philadelphia, PA.

[47] Ben-Israel, A. & Greville, T.N. (1974), Generalized Inverses: Theory and Applications, New York: John Wiley & Sons.

[48] Benjamini, Y. & Hochberg, Y. (1995), “Controlling the false discovery rate: a practical and powerful approach to multiple testing”, Journal of the Royal Statistical Society B, 57, 289-300.

[49] Benjamini, Y. & Liu, W. (1999), “A step-down multiple hypotheses testing procedure that controls the false discovery rate under interdependence”, Journal of Statistical Planning and Inference, 82, 163-170.

[50] Benjamini, Y. & Yekutieli, D. (2001), “The control of the false discovery rate in multiple testing under dependency”, Annals of Statistics, 29, 1165-1188.

[51] Benner, A., Ittrich, C., & Mansmann, U. (2006), “Predicting survival using microarray gene expression data”, Technical Report, German Cancer Research Institute, Heidelberg, Germany.

[52] Benner, A. (2006), “Survival analysis in high dimensions”, Technical Report, German Cancer Research Institute, Heidelberg, Germany.

[53] Bennett, K.P. (1999), “Combining support vector and mathematical programming methods for classification”; in: B. Scholkopf, C. Burges, & A. Smola (eds.): Advances in Kernel Methods - Support Vector Machines, pp. 307-326; Cambridge, MA: MIT Press.

[54] Bennett, K. P. & Embrechts, M. J. (2003), “An optimization perspective on kernel partial least squares regression”.

[55] Bennett, K.P. & Mangasarian, O.L. (1992), “Robust linear programming discrimination of two linearly inseparable sets”, Optimization Methods and Software, 1, 23-34.

[56] Bentler, P.M. (1983), “Some Contributions to Efficient Statistics in Structural Models: Specification and Estimation of Moment Structures”, Psychometrika, 48, 493-517.

[57] Bentler, P.M. (1989): EQS, Structural Equations, Program Manual, Program Version 3.0, Los Angeles: BMDP Statistical Software, Inc.

[58] Bentler, P.M. & Bonett, D.G. (1980), “Significance Tests and Goodness of Fit in the Analysis of Covariance Structures”, Psychological Bulletin, 88, 588-606.

[59] Bentler, P.M. & Weeks, D.G. (1980), “Linear Structural Equations with Latent Variables”, Psychometrika, 45, 289-308.

[60] Bentler, P.M. & Weeks, D.G. (1982), “Multivariate Analysis with Latent Variables”, in Handbook of Statistics, Vol. 2, eds. P.R. Krishnaiah and L.N. Kanal, North Holland Publishing Company.

[61] Berkelaar, M., Eikland, K., Notebaert, P. (2004), lp solve (alternatively lpsolve), Open source (Mixed-Integer) Linear Programming system, Version 5.5.

[62] Berkowitz, J. (2001), “Testing density forecasts, with applications to risk management”; Journal of Business and Economic Statistics, 19, 465-474.

[63] Bernstein, P. L. (1992), Capital Ideas: The Improbable Origins of Modern Wall Street, New York: The Free Press.

[64] Berry, M. B. & Browne, M. (1999), Understanding Search Engines, Philadelphia: SIAM.

[65] Berry, M. B. & Liang, M. (1992), “Large scale singular value computations”, International Journal of Supercomputer Applications, 6, pp. 13-49.

[66] Berry, M.J.A. & Linoff, G. (1997), Data Mining for Marketing, Sales, and Customer Support, New York: J. Wiley and Sons, Inc.

[67] Bernaards, C.A. & Jennrich, R. A. (2004), “Gradient projection algorithms and software for arbitrary rotation criteria in factor analysis”; http://www.stat.ucla.edu/research; submitted for publication.

[68] Betts, J. T. (1977), “An accelerated multiplier method for nonlinear programming”, Journal of Optimization Theory and Applications, 21, 137-174.

[69] Bi, J., Bennett, K.P., Embrechts, M., Breneman, C.M., & Song, M. (2002), “Dimensionality reduction via sparse support vector machines”; Journal of Machine Learning, 1, 1-48.

[70] Billingsley, P. (1986), Probability and Measure, Second Edition, New York: J. Wiley.

[71] Birgin, E.G. & Martinez, J.M. (2001), “A spectral conjugate gradient method for unconstrained optimization”; Appl. Math. Optim., 43, 117-128.

[72] Bishop, Y.M., Fienberg, S. & Holland, P.W. (1975), Discrete Multivariate Analysis, Cambridge: MIT Press.

[73] Bjorck, A. (1996), Numerical Methods for Least Squares Problems, SIAM, Philadelphia, PA.

[74] Blaker, H. (2000), “Confidence curves and improved exact confidence intervals for discrete distributions”; Canadian Journal of Statistics, 28, 783-798.

[75] Blanchard, G. & Roquain, E. (2008), “Two simple sufficient conditions for FDR control”, Electronic Journal of Statistics, 2, 963-992.

[76] Bock, R.D. (1972), “Estimating item parameters and latent ability when responses are scored in two or more nominal categories”; Psychometrika, 37, 29-51.

[77] Bock, R.D. & Aitkin, M. (1981), “Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm”; Psychometrika, 46, 443-459.

[78] Boggs, P.T., Byrd, R.H., & Schnabel, R.B. (1987), “A stable and efficient algorithm for nonlinear orthogonal distance regression”, SIAM J. Sci. Stat. Comput., 8, 1052-1078.

[79] Boggs, P.T., Byrd, R.H., Donaldson, J.R. & Schnabel, R.B. (1989), “Algorithm 676 - ODRPACK: Software for Weighted Orthogonal Distance Regression”, ACM TOMS, 15, 348-364.

[80] Boggs, P.T., Byrd, R.H., Rogers, J.E., & Schnabel, R.B. (1992), Users Reference Guide for ODRPACK Version 2.01, Technical Report NISTIR 92-4834, National Institute of Standards and Technology, Gaithersburg, MD.

[81] Boik, R. J. & Robison-Cox, J.F. (1997), “Derivatives of the Incomplete Beta Function”, paper submitted to Journal of Statistical Software.

[82] Bollen, K.A. (1986), “Sample Size and Bentler and Bonett’s Nonormed Fit Index”, Psychometrika, 51, 375-377.

[83] Bollen, K.A. (1987), “Total, direct, and indirect effects in structural equation models”, in: Sociological Methodology, C. C. Clogg (ed.), Washington, DC: American Sociological Association.

[84] Bollen, K.A. (1989a), “A New Incremental Fit Index for General Structural Equation Models”, Sociological Methods and Research, 17, 303-316.

[85] Bollen, K.A. (1989b), Structural Equations with Latent Variables, New York: John Wiley & Sons, Inc.

[86] Bollen, K.A. & Stine, R.A. (1990), “Direct and indirect effects: classical and bootstrap estimates of variability”, in: Sociological Methodology, C. Clogg (ed.), Washington, DC: American Sociological Association.

[87] Bollen, K.A. & Stine, R.A. (1992), “Bootstrapping goodness-of-fit measures in structural equation models”, in: Testing Structural Equation Models, Bollen, K.A. & J.S. Long (eds.), Newbury Park: Sage.

[88] Bollerslev, T. (1986), “Generalized Autoregressive Conditional Heteroskedasticity,” Journal of Econometrics, 31, 307-327.

[89] Bolstad, B.M., Irizarry, R.A., Astrand, M., and Speed, T.P. (2003), “A Comparison of Normalization Methods for High Density Oligonucleotide Array Data Based on Bias and Variance”; Bioinformatics, 19, 2, 185-193.

[90] Bonnett, D.G., Woodward, J.A., & Randall, R.L. (2002), “Estimating p-values for Mardia’s coefficients of multivariate skewness and kurtosis”, Computational Statistics, 17, 117-122.

[91] de Boor, C. (1978), A Practical Guide to Splines, Berlin: Springer Verlag.

[92] Borchers, H. W. (2013), Package adagio, in CRAN.

[93] Borg, I. & Groenen, P. (2005), Modern Multidimensional Scaling: Theory and Applications; Berlin: Springer Verlag.

[94] Botev, Z. I., Grotowski, J. F., and Kroese, D. P. (2010), “Kernel Density Estimation Via Diffusion”, Annals of Statistics, 38, 2916-2957.

[95] Bouaricha, A. & More, J.J. (1995), “Impact of Partial Separability on Large-scale Optimization”, Technical Report MCS-P487-0195, Argonne National Laboratory, Argonne.

[96] Bowman, K.O. & Shenton, L.R. (1979), “Approximate percentage points of Pearson distributions”; Biometrika, 66, 147-151.

[97] Bracken, J. & McCormick, G.P. (1968), Selected Applications of Nonlinear Programming, New York: John Wiley & Sons.

[98] Bradley, P.S. & Mangasarian, O.L. (1998), “Feature selection via concave minimization and support vector machines,” in: J. Shavlik (ed.), Machine Learning Proceedings of the Fifteenth International Conference, 82-90, San Francisco: Morgan Kaufmann.

[99] Bradley, P.S. & Mangasarian, O.L. (1998), “Massive Data Discrimination via Linear Support Vector Machines”, Technical Report 98-05, Data Mining Institute, University of Wisconsin, Madison, Wisconsin.

[100] Bradley, P.S., Mangasarian, O.L., & Musicant, D.R. (1999), “Optimization methods in massive datasets,” in: J. Abello, P.M. Pardalos, and M.G.C. Resende (eds), Handbook of Massive Datasets, Dordrecht, NL: Kluwer Academic Press.

[101] Brand, E.M. (2002), “Incremental singular value decomposition of uncertain data with missing values”; European Conference on Computer Vision (ECCV), 2350: 707-720.

[102] Breiman, L. (1993), “Better subset selection using the non-negative garotte”; Technical Report, Univ. of California, Berkeley.

[103] Breiman, L. (2001), “Random Forests”, Machine Learning, 45, 5-32.

[104] Breiman, L. & Cutler, A. (2001), “Random Forests”, Technical Report, www.stat.berkeley.edu/~breiman.

[105] Breiman, L. & Cutler, A. (2002), “How to use Survival Forests (SPDV1)”, Technical Report, www.stat.berkeley.edu/~breiman.

[106] Breiman, L., Friedman, J.H., Olshen, R.A., & Stone, C.H. (1984), Classification and Regression Trees, Wadsworth, Belmont, CA.

[107] Brent, R. (1973), Algorithms for Minimization Without Derivatives, Prentice Hall, Inc.

[108] Breslow, N. E. & Day, N. E. (1980), Statistical Methods in Cancer Research; Vol. I: The Analysis of Case-Control Studies, IARC Scientific Publications, IARC Lyon.

[109] Breslow, N. (2005), “Case-Control Study, Two-phase”, in P. Armitage (ed.), Encyclopedia of Biostatistics, pp. 734-741, New York: J. Wiley & Sons.

[110] Bretz, F. (1999), Powerful Modifications of William’s Test on Trend, Dissertation, Dept. of Bioinformatics, University of Hannover.

[111] Bretz, F., Hsu, J.C., Pinheiro, J.C. & Liu, Y. (2008), “Dose finding - a challenge in statistics”, Biometrical Journal, 50, 480-504.

[112] Bretz, F., Hothorn, T. & Westfall, P. (2011), Multiple Comparisons Using R, Boca Raton, London, New York: CRC Press.

[113] Bretz, F. & Hothorn, L.A. (1999), “Testing dose-response relationships with a priori unknown, possibly non-monotonic shapes”, Technical Report, Dept. of Bioinformatics, University of Hannover.

[114] Brockwell, P.J., Dahlhaus, R., & Trindade, A.A. (2002), “Modified Burg algorithms for multivariate subset autoregression”, Technical Report 2002-015, Dept. of Statistics, University of Florida.

[115] Bronstein, I.N. & Semendjajew, K.A. (1966), Taschenbuch der Mathematik, B.G. Teubner, Leipzig.

[116] Brown, B.W., Lovato, J., & Russell, K. (1997), DCDFLIB: Library of Fortran Routines for Cumulative Distribution Functions, Inverses, and Other Parameters, Dept. of Biomathematics, University of Texas, Houston.

[117] Brown, B.W., Lovato, J., Russell, K., & Venier, J. (1997), RANDLIB: Library of Fortran Routines for Random Number Generation, Dept. of Biomathematics, University of Texas, Houston.

[118] Brown, R. G. (2009), “Dieharder: A random number test suite”; www.phy.duke.edu/~rgb/General/dieharder.php

[119] Browne, M.W. (1974), “Generalized Least Squares Estimators in the Analysis of Covariance Structures”, South African Statistical Journal, 8, 1-24.

[120] Browne, M.W. (1982), “Covariance Structures”, in Topics in Multivariate Analyses, ed. D.M. Hawkins, Cambridge University Press.

[121] Browne, M. W. (1984), “Asymptotically Distribution-free Methods for the Analysis of Covariance Structures”, Br. J. math. statist. Psychol., 37, 62-83.

[122] Browne, M. W. (1992), “Circumplex Models for Correlation Matrices”, Psychometrika, 57, 469-497.

[123] Browne, M. W. (2001), “An overview of analytic rotation in exploratory factor analysis”; Multivariate Behavioral Research, 36, 111-150.

[124] Browne, M. W. & Cudeck, R. (1993), “Alternative Ways of Assessing Model Fit”, in: Testing Structural Equation Models, eds. K. A. Bollen & S. Long, Newbury Park: SAGE Publications.

[125] Browne, M. W. & Du Toit, S.H.C. (1992), “Automated Fitting of Nonstandard Models”, Multivariate Behavioral Research, 27, 269-300.

[126] Browne, M.W. & Shapiro, A. (1986), “The Asymptotic Covariance Matrix of Sample Correlation Coefficients under General Conditions”, Linear Algebra and its Applications, 82, 169-176.

[127] Brownlee, K.A. (1965), Statistical Theory and Methodology in Science and Engineering; New York: John Wiley & Sons.

[128] BSD UNIX Programmer’s Manual (1989), Hewlett-Packard Comp., Palo Alto, CA.

[129] Bunch, J.R. & Kaufman, L. (1977), “Some stable methods for calculating inertia and solving symmetric linear systems”, Math. Comp., 31, 163-179.

[130] Burg, J. P. (1968), “A new analysis technique for time series data”; NATO Advanced Study on Signal Processing with Emphasis on Underwater Acoustics, Enschede, Netherlands, August, 1968.

[131] Burnham, K. (1989), “Numerical Survival Rate Estimation for Capture-Recapture Models Using SAS PROC NLIN”, in: L. McDonald, B. Manly, J. Lockwood, & J. Logan (Eds.): Estimation and Analysis of Insect Populations, Lecture Notes in Statistics 55, Springer Verlag, Berlin-Heidelberg-New York.

[132] Bus, J.C.P. & Dekker, T.J. (1975), “Two efficient algorithms with guaranteed convergence for finding a zero of a function”, ACM TOMS, 1, 330-345.

[133] Butler, R.W., Davies, P.L., & Jhun, M. (1993), “Asymptotics for the Minimum Covariance Determinant Estimator”, The Annals of Statistics, 21, 1385-1400.

[134] Byrd, R.H., Lu, P., Nocedal, J., & Zhu, C. (1995), “A limited memory algorithm for bound constrained optimization”, SIAM Journal Scientific Computation, 16, 1190-1208.

[135] Byrd, R., Nocedal, J., & Schnabel, R. (1994), “Representations of Quasi-Newton Matrices and their use in Limited Memory Methods”, Mathematical Programming, 63, no. 4, 129-156.

[136] Cai, L., Maydeu-Olivares, A., Coffman, D.L. & Thissen, D. (2006), “Limited information goodness of fit testing of item response theory models for sparse 2^p tables”; British Journal of Mathematical and Statistical Psychology, 59, 173-194.

[137] Cain, K.C. & Breslow, N.E. (1988), “Logistic regression analysis and efficient design for two-stage studies”, American Journal of Epidemiology, 128, 1198-1206.

[138] Carbon, C.C. (2011), “BiDimRegression: A Matlab toolbox for calculating bidimensional regressions”; JSS, 2011.

[139] Carey, V., Zeger, S.L., & Diggle, K.Y. (1993), “Modelling multivariate binary data with alternating logistic regressions”, Biometrika, 80, 517-526.

[140] Carlson, B.C. & Notis, E.M. (1979), “Algorithm 577: Algorithms for Incomplete Elliptic Integrals”, ACM TOMS, 7, 398-403.

[141] Carpaneto, G., Dell’Amico, M., & Toth, P. (1995), “A branch-and-bound algorithm for large scale asymmetric traveling salesman problems”, Algorithm 750, Transactions on Mathematical Software, 21, 410-415.

[142] Carroll, R., Gail, M., Lubin, J. (1993), “Case-control studies with errors in covariates”, Journal of the American Stat. Association, 88, 185-199.

[143] Carroll, R.A., Ruppert, D. & Stefanski, L.A. (1995), Measurement Error in Nonlinear Models, London: Chapman and Hall.

[144] Carroll, R.A., Kuchenhoff, H., Lombard, F., & Stefanski, L.A. (1996), “Asymptotics for the SIMEX estimator in nonlinear measurement error models”, JASA, 91, 242-250.

[145] Carroll, J.D. & Chang, J.J. (1970), “Analysis of individual differences in multidimensional scaling via an N-way generalization of “Eckart-Young” decomposition”; Psychometrika, 35, 283-320.

[146] Carroll, R., Gail, M., Lubin, J. (1993), “Case-control studies with errors in covariates”, Journal of the American Stat. Association, 88, 185-199.

[147] Catral, M., Han, L., Neumann, M. & Plemmons, R. J. (2004), “On reduced rank nonnegative matrix factorization for symmetric nonnegative matrices”, Linear Algebra and its Applications, 393, 107-127.

[148] Cattell, R.B. (1966), “The scree test for the number of factors”, Multivariate Behavioral Research, 1, 245-276.

[149] Cattell, R.B. & Vogelmann, S. (1977), “A comprehensive trial of the scree and KG criteria for determining the number of factors”, Multivariate Behavioral Research, 12, 289-325.

[150] Chamberlain, R.M., Powell, M.J.D., Lemarechal, C., & Pedersen, H.C. (1982), “The watchdog technique for forcing convergence in algorithms for constrained optimization”, Mathematical Programming, 16, 1-17.

[151] Chan, T. F. (1982a), “An improved algorithm for computing the singular value decomposition”, ACM TOMS, 8, 72-83.

[152] Chan, T. F. (1982b), “Algorithm 581: An improved algorithm for computing the singular value decomposition”, ACM TOMS, 8, 84-88.

[153] Chan, T. F. (1987), “Rank revealing QR factorizations”, Linear Algebra and Its Applic., 88/89, 67-82.

[154] Chang, C.M. (2003), “MinPROMEP: Generation of partially replicated minimal orthogonal main-effect plans using a novel algorithm”, JSS, 2003.

[155] Chang, C.M. (2003), “Construction of partially replicated minimal orthogonal main-effect plans using a general procedure”, Utilitas Mathematica, 2003.

[156] Chasalow, S. (2005), “The combinat Package”; Technical Report for R functions. See CRAN website.

[157] Chatterjee, S. & Price, B. (1977), Regression Analysis by Example, New York: John Wiley & Sons.

[158] Chauvenet, W. (1863), A Manual of Spherical and Practical Astronomy, Dover N.Y. 1960, 474-566.

[159] Chen, R.-B., Chu, C.-H., and Weng, J.-Z. (2010), “A stochastic matching pursuit (SMP) MATLAB toolbox for Bayesian variable selection”; JSS, 2010.

[160] Cheung, T. Y. (1980), “Multifacility location problem with rectilinear distance by the minimum cut approach”,ACM Trans. Math. Software, 6, 387-390.

[161] Christensen, R., Pearson, L.M., & Johnson, W. (1992), “Case deletion diagnostics for mixed models”, Technometrics, 34, 38-45.

[162] Chu, M. T. & Funderlik, R. E. (2002), “The centroid decomposition: Relationships between discrete variational decompositions and SVD’s”; SIAM Journal Matrix Anal. Appl., 23, 1025-1044.

[163] Chung, J.K., Kannappan, P.L., Ng, C.T. & Sahoo, P.K. (1989), “Measure of distance between probability distributions”; Journal of Mathem. Analysis and Applications, 138, 280-292.

[164] Clarke, G.P.Y. (1987), “Approximate Confidence Limits for a Parameter Function in Nonlinear Regression”, JASA, 82, 221-230.

[165] Clauß, G. & Ebner, H. (1970), Grundlagen der Statistik fur Psychologen, Padagogen und Soziologen, Frankfurt a.M. und Zurich, Verlag Harri Deutsch.

[166] Cleveland, W.S. (1979), “Robust locally weighted regression and smoothing scatterplots”; JASA, 74, 829-836.

[167] Cleveland, W.S., Grosse, E. & Shyu, W.M. (1992), “Local regression models”; in: Statistical Models in S, eds. J.M. Chambers and T.J. Hastie; New York: Chapman & Hall.

[168] Cody, W.J. (1969), “Rational Chebyshev approximation for the error function”, Mathematics of Computation, 23, 631-637.

[169] Cody, W.J. (1993), “Algorithm 715”, ACM TOMS, 19, 22-32.

[170] Cody, W.J. & Waite, W. (1980), Software Manual for the Elementary Functions, Prentice Hall, Inc., Englewood Cliffs, NJ.

[171] Coleman, T.F., Garbow, B.S., & More, J.J. (1984), “Software for Estimating Sparse Jacobian Matrices”, ACM TOMS, 10, 329-345.

[172] Coleman, T.F., Garbow, B.S., & More, J.J. (1985), “Software for Estimating Sparse Jacobian Matrices”, ACM TOMS, 11, 363-377.

[173] Conn, A. R. & Gould, N.I.M. (1984), “On the location of directions of finite descent for nonlinear programming algorithms”, SIAM Journal Numerical Analysis, 21, 1162-1179.

[174] Cook, R.D. (1998), Regression Graphics, New York: Wiley & Sons.

[175] Cook, J. R. & Stefanski, L.A. (1994), “Simulation Extrapolation in Parametric Measurement Error Models”, JASA, 89, 1314-1328.

[176] Cook, R.D. & Weisberg, S. (1990), “Confidence curves for nonlinear regression”, JASA, 85, 544-551.

[177] Cook, R.D. & Weisberg, S. (1994), An Introduction to Regression Graphics, New York: John Wiley & Sons.

[178] Cook, R.D. & Weisberg, S. (1999), Applied Regression Including Computing and Graphics, New York: Wiley & Sons.

[179] Copenhaver, T.W. and Mielke, P.W. (1977), ”Quantit analysis: a quantal assay refinement,” Biometrics, 33, 175-187.

[180] Cormen, T.H., Leiserson, C.E., & Rivest, R.L. (1997), Introduction to Algorithms, Cambridge: MIT Press.

[181] Cornuejols, G. & Tutuncu, R. (2006), Optimization Methods in Finance, Pittsburgh, PA: Carnegie Mellon University.

[182] Cox, C. (1998), “Delta Method”, Encyclopedia of Biostatistics, eds. Armitage, P. & Colton, T., 1125-1127, New York: J. Wiley.

[183] Cox, D.R. & Hinkley, D.V. (1974), Theoretical Statistics, London: Chapman and Hall.

[184] Chamberlain, R.M., Powell, M.J.D., Lemarechal, C., & Pedersen, H.C. (1982), “The watchdog technique for forcing convergence in algorithms for constrained optimization”, Mathematical Programming, 16, 1-17.

[185] Cooley, W.W. & Lohnes, P.R. (1971), Multivariate Data Analysis, New York: John Wiley & Sons, Inc.

[186] Cramer, J.S. (1986), Econometric Applications of Maximum Likelihood Methods, Cambridge: Cambridge University Press.

[187] Cristianini, N. & Shawe-Taylor, J. (2000), Support Vector Machines and other kernel-based learning methods, Cambridge University Press, Cambridge.

[188] Croes, G. A. (1958), “A method for solving traveling-salesman problems”, Operations Research, 6, 791-812.

[189] Cronbach, L.J. (1951), “Coefficient alpha and the internal structure of tests”; Psychometrika, 16, 297-334.

[190] Croux, C., Filzmoser, P., Pison, G. & Rousseeuw (2004), “Fitting multiplicative models by robust alternating regressions”; Technical Report.

[191] Cudeck, R. & Browne, M.W. (1984), “Cross-validation of covariance structures”, Multivariate Behavioral Research, 18, 62-83.

[192] Czyzyk, J., Mehrotra, S., Wagner, M. & Wright, S.J. (1997), “PCx User Guide (Version 1.1)”, Technical Report OTC 96/01, Office of Computational and Technology Research, US Dept. of Energy.

[193] Dale, J.C. (1985), “A Bivariate Discrete Model of Changing Colour in Blackbirds”, in Statistics in Ornithology, Morgan, B.J.T. and North, P.M. (eds.), Berlin: Springer-Verlag.

[194] Dale, J.C. (1986), “Global Cross-Ratio Models for Bivariate, Discrete, Ordered Responses”, Biometrics, 42, 909-917.

[195] Dalenius, T. (1957), Sampling in Sweden. Contributions to Methods and Theories of Sample Survey Practice; Stockholm: Almqvist & Wiksells.

[196] Daniel, J.W., Gragg, W.B., Kaufman, L. & Stewart, G.W. (1976), “Reorthogonalization and Stable Algorithms for Updating the Gram-Schmidt QR Factorization”, Math. Comp., 30, 772-795.

[197] Davies, L. (1992), “The Asymptotics of Rousseeuw’s Minimum Volume Ellipsoid Estimator”, The Annals of Statistics, 20, 1828-1843.

[198] Davison, A.C. & Hinkley, D.V. (1997), Bootstrap Methods and their Application, Cambridge: Cambridge University Press.

[199] Davison, A.C., Hinkley, D.V., & Schechtman, E. (1986), “Efficient bootstrap simulation”, Biometrika, 73, 555-566.

[200] Dayal, B.S. & McGregor, J.F. (1997), “Improved PLS Algorithms”, Journal of Chemometrics, 11, 73-85.

[201] Deak, I. (1980), “Three Digit Accurate Multiple Normal Probabilities”, Numer. Math., 35, 369-380.

[202] De Berg, M., Cheong, O., & van Kreveld, M. (2008), Computational Geometry: Algorithms and Applications, New York: Springer.

[203] DeJong, S. (1993), “SIMPLS: An alternative approach to partial least squares regression”, Chemometrics and Intelligent Laboratory Systems, 18, 251-263.

[204] DeJong, S. & ter Braak, C.J.F. (1994), Journal of Chemometrics, 8, 169-174.

[205] de Leeuw, J. (1977), “Applications of convex analysis to multidimensional scaling”; in: Barra, Brodeau, Romier, & van Cutsem (eds.): Recent Developments in Statistics, Amsterdam: North Holland Publishing Company.

[206] de Leeuw, J. (1983), “Models and Methods for the Analysis of Correlation Coefficients”, Journal of Econometrics, 22, 113-137.

[207] de Leeuw, J. (1984), “Canonical Analysis of Categorical Data”; Leiden: DSWO Press.

[208] de Leeuw, J. (1994), “Block relaxation methods in statistics”; in Bock, Lenski, & Richter (ed.): Information Systems and Data Analysis, Berlin: Springer Verlag.

[209] de Leeuw, J. (2007), see the following website for a set of R programs: http://gifi.stat.ucla.edu/psychoR/

[210] de Leeuw, J. & Heiser, W. (1977), “Convergence of correction-matrix algorithms for multidimensional scaling”; in J. C. Lingoes (ed.): Geometric Representations of Relational Data, Ann Arbor, MI: Mathesis Press.

[211] de Leeuw, J. & Pruzansky, S. (1978), “A new computational method to fit the weighted Euclidean distance model”; Psychometrika, 43, 479-490.

[212] de Leeuw, J., Young, F. W. & Takane, Y. (1976), “Additive structure in qualitative data: An alternating least squares method with optimal scaling features”; Psychometrika, 41, 471-503.

[213] DeLong, E.R., DeLong, D.M., & Clarke-Pearson, D.L. (1988), “Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach,” Biometrics, 44, pp. 837-845.

[214] Dennis, J.E., Gay, D.M., & Welsch, R.E. (1981), “An Adaptive Nonlinear Least-Squares Algorithm”, ACM Trans. Math. Software, 7, 348-368.

[215] Dennis, J.E. & Mei, H.H.W. (1979), “Two new unconstrained optimization algorithms which use function and gradient values”, J. Optim. Theory Appl., 28, 453-482.

[216] Dennis, J.E. & Schnabel, R.B. (1983), Numerical Methods for Unconstrained Optimization and Nonlinear Equations, New Jersey: Prentice-Hall.

[217] Diggle, P.J., Liang, K.Y., & Zeger, S.L. (1994), Analysis of Longitudinal Data, Oxford: Clarendon Press.

[218] Ding, C., He, X., & Simon, H. D. (2005), “On the equivalence of nonnegative matrix factorization and spectral clustering”, Proc. SIAM Data Mining Conf., 2005.

[219] Ding, C., Peng, T.L., & Park, H. (2006), “Orthogonal nonnegative matrix tri-factorizations for clustering”.

[220] Dixon, W. J. (1950), “Analysis of extreme values”; Annals of Mathematical Statistics, 21, 488-506.

[221] Dixon, W. J. (1951), “Ratios involving extreme values”; Annals of Mathematical Statistics, 22, 68-78.

[222] DOMAIN C Library (CLIB) Reference (1985), Apollo Computer Inc., Chelmsford, MA.

[223] Dongarra, J.J., Bunch, J.R., Moler, C.B., & Stewart, G.W. (1979), LINPACK User’s Guide, SIAM, Philadelphia, PA.

[224] Doornik, J. A. & Hansen, H. (2008): “An omnibus test for univariate and multivariate normality”, Oxford Bulletin of Economics and Statistics, 70, 927-939.

[225] Draper, N.R. & Smith, H. (1981), Applied Regression Analysis, New York: J. Wiley.

[226] Du, D.-Z. & Pardalos, P. M. (1993), Network Optimization Problems - Algorithms, Applications, and Complexity, Singapore: World Scientific.

[227] Duff, I.S. & Reid, J.K. (1978), An implementation of Tarjan’s algorithm for the block triangularization of a matrix, ACM TOMS, 4, 137-147.

[228] Duncan, G.T. (1978), “An Empirical Study of jackknife-constructed confidence regions in nonlinear regression”, Technometrics, 20, 123-129.

[229] Dunnett, C. W. (1980), “Pairwise multiple comparisons in the unequal variance case”, Journal of the American Statistical Association, 75, 796-800.

[230] Ecker, J. G. & Kupferschmid, M. (1988), Introduction to Operations Research, (Reprint of 1991), Malabar, FL: Krieger Publishing Company.

[231] Eckert, R. E. (1994), Purdue University, W. Lafayette, Personal Communication.

[232] Edlund, O. (1999), Solution of Linear Programming and Non-Linear Regression Problems Using Linear M-Estimation Methods, Lulea University of Technology, Sweden.

[233] Efron, B. (1994), The Jackknife, the Bootstrap, and Other Resampling Methods, Philadelphia: SIAM.

[234] Efron, B. & Tibshirani, R.J. (1993), An Introduction to the Bootstrap, New York: Chapman & Hall.

[235] Efron, B., Hastie, T., Johnstone, I. & Tibshirani, R. (2002), “Least Angle Regression”, The Annals of Statistics, 32, 407-499.

[236] Einarsson, H. (1998), “Algorithms for General non-differentiable Optimization Problems”; Master of Science Thesis, IMM, Technical University of Denmark.

[237] Einarsson, H. & Madsen, K. (1999), “Sequential linear programming for non-differentiable optimization”; Paper presented at the SIAM Meeting on Optimization, Atlanta 1999.

[238] Elton, E. and Gruber, M. (1981), Modern Portfolio Theory and Investment Analysis, New York: John Wiley & Sons, Inc.

[239] Emerson, P.L. (1968), “Numerical construction of orthogonal polynomials from a general recurrence formula”, Biometrics, 24, 695-701.

[240] Engle, R.F. (1982), “Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of UK Inflation”, Econometrica, 50, 987-1008.

[241] Eskow, E. & Schnabel, R.B. (1991), “Algorithm 695: Software for a New Modified Cholesky Factorization”, ACM Trans. Math. Software, 17, 306-312.

[242] Esmailzadeh, N. (2013), “Two level search designs in Matlab”, paper and software submitted to JSS.

[243] Everitt, B.S. (1984), An Introduction to Latent Variable Methods, London: Chapman & Hall.

[244] Everitt, B.S. (1996), A Handbook of Statistical Analyses Using SAS, London: Chapman & Hall.

[245] Fan, J., & Li, R. (2001), “Variable selection via nonconcave penalized likelihood and its oracle properties,” JASA, 96, pp. 1348-1360.

[246] Fan, J., & Li, R. (2002), “Variable selection for Cox’s proportional hazards model and frailty model,” The Annals of Statistics, 30, pp. 74-99.

[247] Fay, M. P. (2010), “Two-sided exact tests and matching confidence intervals for discrete data”, R Journal, 2, 53-58.

[248] Ferguson, T. (1996), A Course in Large Sample Theory, London: Chapman & Hall.

[249] Ferson, W.E. and Harvey, C.R. (1992), “Seasonality and Consumption-Based Asset Pricing,” Journal of Finance, 47, 511-552.

[250] Fig, M. (2010), Matlab toolbox for permutations and combinations.

[251] Finney, D.J. (1947), “The estimation from individual records of the relationship between dose and quantal response”, Biometrika, 34, 320-334.

[252] Fisher, R. A. (1935), “The logic of inductive inference”, Journal of the Royal Statistical Society, Series A, 39-54.

[253] Fisher, R.A. (1936), “The use of multiple measurements in taxonomic problems”, Annals of Eugenics, 7, part II, pp. 179-188.

[254] Fisher, R. A. (1962), “Confidence limits for a cross-product ratio”, Australian Journal of Statistics, 4, 41.

[255] Fisher, R. A. (1970), Statistical Methods for Research Workers, Oliver & Boyd.

[256] Fishman, G.S. (1996), Monte Carlo: Concepts, Algorithms, and Applications, New York: Springer Verlag.

[257] Fishman, G.S. & Moore, L.R. (1986), “An exhaustive analysis of multiplicative congruential random number generators with modulus 2**31-1”, SIAM Journal on Scientific and Statistical Computation, 7, pp. 24-45.

[258] Fletcher, R. (1987), Practical Methods of Optimization, 2nd Ed., Chichester: John Wiley & Sons.

[259] Fletcher, R. & Powell, M.J.D. (1963), “A Rapidly Convergent Descent Method for Minimization”, Computer Journal, 6, 163-168.

[260] Fletcher, R. & Xu, C. (1987), “Hybrid Methods for Nonlinear Least Squares”, Journal of Numerical Analysis, 7, 371-389.

[261] Floudas, C.A. & Pardalos, P.M. (1990), A Collection of Test Problems for Constrained Global Optimization Algorithms; Lecture Notes in Computer Science, 455, Berlin: Springer Verlag.

[262] Forrest, J. J. & Lougee-Helmer, R. (2014), Cbc, [email protected], [email protected].

[263] Forrest, J. J., de la Nuez, D., & Lougee-Helmer, R. (2014), Clp, http://www.tsp.gatech.edu/concorde.

[264] Forsythe, G.E., Malcolm, M.A., & Moler, C.B. (1977), Computer Methods for Mathematical Computations, Englewood Cliffs, NJ: Prentice Hall.

[265] Foster, D.P. & Stine, R.A. (2004), “Variable selection in Data Mining: Building a predictive model for bankruptcy”; JASA, 99, 303-313.

[266] Fox, T., Hinkley, D. & Larntz, K. (1980), “Jackknifing in Nonlinear Regression”, Technometrics, 22, 29-33.

[267] Frank, I. & Friedman, J. (1993), “A statistical view of some chemometrics regression tools”; Technometrics, 35, 109-148.

[268] Fraser, C. (1980), COSAN User’s Guide, Toronto: The Ontario Institute for Studies in Education.

[269] Fraser, C. & McDonald, R.P. (1988), “NOHARM: Least squares item factor analysis”, Multivariate Behavioral Research, 23, 267-269.

[270] Fredman, M.L., Johnson, D.S., McGeoch, L.A. & Ostheimer, G. (1995), “Data structures for Traveling Salesmen”, J. Algorithms, 18, 432-479.

[271] Friedman, A., & Kohler, B. (2003), “Bidimensional regression: Assessing the configural similarity and accuracy of cognitive maps and other two-dimensional data sets”, Psychological Methods, 8, 468-491.

[272] Friedman, J.H., Bentley, J.L., Finkel, R.A. (1977), “An algorithm for finding best matches in logarithmic expected time”, ACM Transactions Math. Softw., pp. 209-226, http://doi.acm.org/10.1145/355744.355745

[273] Friedman, J.H., & Tukey, J.W. (1974), “A projection pursuit algorithm for exploratory data analysis”, J. Amer. Stat. Assoc., 62, 1159-1178.

[274] Friedman, J. H. & Stuetzle, W. (1981), “Projection pursuit regression”, JASA, 76, 817-823.

[275] Fritsch, F.N. & Carlson, R.E. (1980), “Monotone piecewise cubic interpolation”, SIAM Journal Numerical Analysis, 17, 238-246.

[276] Froehlich, H., & Zell, A., (2004), “Feature subset selection for support vector machines by incremental regularized risk minimization”, in: The International Joint Conference on Neural Networks (IJCNN), 3, 2041-2046.

[277] Fuller, W.A. (1987), Measurement Error Models, New York: J. Wiley & Sons.

[278] Fung, G. & Mangasarian, O.L. (2000), “Proximal Support Vector Machines”, Technical Report, Data Mining Institute, University of Wisconsin, Madison, Wisconsin.

[279] Fung, G. & Mangasarian, O.L. (2002), “Finite Newton method for Lagrangian support vector machine classification”; Technical Report 02-01, Data Mining Institute, Computer Sciences Dep., Univ. of Wisconsin, Madison, Wisconsin, 2002.

[280] Fung, G. & Mangasarian, O.L. (2003), “A Feature Selection Newton Method for Support Vector Machine Classification”, Computational Optimization and Applications, 1-18.

[281] Gaevert, H., Hurri, J., Saerelae, J. & Hyvaerinen, A. (2005), see http://www.cis.hut.fi/projects/ica/fastica/ for version 5.

[282] Gallant, A. R. (1987), Nonlinear Statistical Models, New York: John Wiley & Sons, Inc.

[283] Garbow, B.S., Boyle, J.M., Dongarra, J.J., & Moler, C.B. (1977), Matrix Eigensystem Routines - EISPACK Guide Extension, Lecture Notes in Computer Science, vol. 51, Springer Verlag, Berlin.

[284] Gay, D.M. (1983), “Subroutines for Unconstrained Minimization”, ACM Trans. Math. Software, 9, 503-524.

[285] Genton, M. (2001), “Classes of kernels for machine learning: a statistics perspective”, Journal of Machine Learning Research, 2, 299-312.

[286] Gao, X., et al. (2008), “Nonparametric multiple comparison procedures for unbalanced one-way factorial designs”, Journal of Statistical Planning and Inference, 77, 2574-2591.

[287] Genz, A. (1986), “Fully Symmetric Interpolatory Rules for multiple Integrals”, SIAM Journal Numer. Analysis, 23, 1273-1283.

[288] Genz, A. (1991), Computing in the 90s, Lecture Notes in Computer Science, vol. 507, New York: Springer Verlag.

[289] Genz, A. (1992), “Numerical Computation of Multivariate Normal Probabilities”, J. of Computational and Graphical Stat., 1, 141-149.

[290] Genz, A. (1999), “Numerical computation of critical values for multiple comparison problems”, Technical Report.

[291] Genz, A. (2000), “Numerical computation of bivariate and trivariate normal probabilities”, Technical Report.

[292] Genz, A. (2004), “Numerical computation of rectangular bivariate and trivariate normal and t-probabilities”, Statistics and Computing, 14, 251-260.

[293] Genz, A. & Bretz, F. (1999), “Methods for the computation of multivariate t-probabilities”, Technical Report.

[294] Genz, A. & Bretz, F. (1999), “Numerical computation of multivariate t-probabilities with application to power calculation of multiple contrasts”, Technical Report.

[295] Genz, A. & Bretz, F. (2002), “Methods for the computation of multivariate t-probabilities”, Journal of Computational and Graphical Statistics, 11, 950-971.

[296] Genz, A. & Bretz, F. (2009), Computation of multivariate normal and t-probabilities, Lecture Notes in Statistics, Heidelberg: Springer Verlag.

[297] Genz, A. & Kass, R. (1996), “Subregion adaptive integration of functions having a dominant peak”, J. of Computational and Graphical Stat.

[298] Genz, A. & Kwong, K.S. (1999), “Numerical evaluation of singular multivariate normal distributions”, Technical Report.

[299] Genz, A. & Monahan, J. (1996), “Stochastic integration rules for infinite regions”, SIAM Journal Scientific Computation.

[300] George, J.A. & Liu, J.W. (1981), Computer Solution of Large Sparse Positive Definite Systems, New Jersey: Prentice-Hall.

[301] George, J.A., Gilbert, J.R., & Liu, J.W.H. (1993), Graph Theory and Sparse Computations, Springer Verlag, New York.

[302] Ghali, W.A., Quan, H., Brant, R., van Melle, G., Norris, C.M., Faris, P.D., Galbraith, P.D. & Knudtson, M.L. (2001), “Comparison of 2 methods for calculating adjusted survival curves from proportional hazards model”, Journal of American Medical Association, 286, 1494-1497.

[303] Ghosh, S. & Teschmacher, L. (2002), “Comparisons of search designs using search probabilities”, Journal of Statistical Planning and Inference, 104, 439-458.

[304] Ghysels, E. and Hall, A. (1990), “A Test for Structural Stability of Euler Conditions Parameters Estimated via the Generalized Method of Moments Estimator,” International Economic Review, 31, 355-364.

[305] Gifi, A. (1990), Nonlinear Multivariate Analysis, Chichester: Wiley.

[306] Gilbert, J.R., Ng, E., & Peyton, B.W. (1994), “An efficient algorithm to compute row and column counts for sparse Cholesky factorization”, SIAM J. Matrix Anal. Appl., 15, 1075-1091.

[307] Gill, P.E., Murray, W., Ponceleon, D.B., & Saunders, M.A. (1992), “Preconditioners for indefinite systems arising in optimization”, SIAM J. on Matrix Analysis and Applications, 13, 292-311.

[308] Gill, P.E., Murray, W., Saunders, M.A., & Wright, M.H. (1983), “Computing Forward-Difference Intervals for Numerical Optimization”, SIAM Journal on Scientific and Statistical Computation, 4, 310-321.

[309] Gill, P.E., Murray, W., Saunders, M.A., & Wright, M.H. (1984), “Procedures for Optimization Problems with a Mixture of Bounds and General Linear Constraints”, ACM Trans. Math. Software, 10, 282-298.

[310] Gill, P.E., Murray, W., & Wright, M.H. (1981), Practical Optimization, Academic Press, New York.

[311] Gini, C. (1912), “Variabilita e mutabilita, contributo allo studio delle distribuzioni e delle relazioni statistiche”, Studi Economico - Giuridici della R. Universita di Cagliari, 3, 3-159.

[312] Gleason, J.R. (1988), “Algorithms for balanced bootstrap simulations”, American Statistician, 42, 263-266.

[313] Goano, M. (1995), “Algorithm 745: Computation of the Complete and Incomplete Fermi-Dirac Integral”, ACM TOMS, 21, 221-232.

[314] Goffe, W.L., Ferrier, G.D., & Rogers, J. (1994): “Global optimization of statistical functions with simulated annealing”; Journal of Econometrics, 60, 65-99.

[315] Gold, C., Holub, A. & Sollich, P., (2005), “Bayesian approach to feature selection and parameter tuning for supportvector machine classifiers”, Neural Networks, 18(5-6), 693-701.

[316] Goldberg, D. E. (1989), Genetic Algorithms in Search, Optimization and Machine Learning, Reading: Addison-Wesley.

[317] Goldberg, D. E., & Richardson, J. (1987), “Genetic Algorithms with Sharing for Multimodal Function Optimization”, in: Genetic Algorithms and their Applications, Proceedings of the Second International Conference on Genetic Algorithms, 41-49.

[318] Goldfarb, D. & Idnani, A. (1983): “A numerically stable dual method for solving strictly convex quadratic programs”; Mathematical Programming, 27, 1-33.

[319] Goldfeld, S.M., Quandt, R.E., & Trotter, H.F. (1966), “Maximisation by quadratic hill-climbing”, Econometrica, 34, 541-551.

[320] Golub, G., & Reinsch, C. (1970), “Singular value decomposition and least squares solution”, Numerische Mathematik, 14, 403-420.

[321] Golub, G., & Van Loan, C.F. (1980), “An analysis of the total least squares problem”, SIAM Journal Numerical Analysis, 883-893.

[322] Golub, G., & Van Loan, C.F. (1989), Matrix Computations, Johns Hopkins University Press, 2nd ed., Baltimore, MD.

[323] Gonin, R. & Money, A.H. (1989), Nonlinear Lp-norm Estimation, New York: M. Dekker, Inc.

[324] Goodall, C. (1983), “M-estimators of location: An outline of theory”; in: Hoaglin, D.C., Mosteller, F., & Tukey, J.W.: “Understanding Robust and Exploratory Data Analysis”, New York: John Wiley & Sons.

[325] Goodman, L.A. (1965), “On simultaneous confidence intervals for multinomial proportions”, Technometrics, 7, 247-254.

[326] Goodman, L.A. (1985), “The analysis of cross-classified data having ordered and/or unordered categories: Association models, correlation models, and asymmetry models for contingency tables with or without missing entries”, The Annals of Statistics, 13, 10-69.

[327] Goodnight, J.H. (1979), “A Tutorial on the SWEEP Operator”, The American Statistician, 33, 149-158.

[328] Gould, N.I.M. (1986), “On the accurate determination of search directions for simple differentiable penalty functions”, IMA Journal of Numerical Analysis, 6, 357-372.

[329] Gould, N.I.M. (1989), “On the convergence of a sequential penalty function method for constrained minimization”,SIAM Journal Numerical Analysis, 26, 107-128.

[330] Gower, J.C. (1971), “A general coefficient of similarity and some of its properties”, Biometrics, 27, 857-871.

[331] Grassman, R., Gramacy, R.B., & Sterratt, D.C. (2011), Package ‘geometry’; http://geometry.r-forge.r-project.org/.

[332] Graybill, F.A. (1969), Introduction to Matrices with Applications in Statistics, Belmont, CA: Wadsworth, Inc.

[333] Green, D. & Swets, J. (1996), Signal Detection Theory and Psychophysics, New York: John Wiley, 45-49.

[334] Greenbaum, A. (1997), Iterative Methods for Solving Linear Systems, Philadelphia: SIAM.

[335] Grubbs, F. E. (1969), “Procedures for detecting outlying observations in samples”, Technometrics, 11, 1-21.

[336] Gupta, A. & Avron, H. (2000, 2013), WSMP: Watson Sparse Matrix Package, Part I - direct solution of symmetric systems, Part II - direct solution of general systems, Part III - iterative solution of sparse systems, version 13.11, IBM Research Division, 1101 Kitchawan Road, Yorktown Heights, NY 10598, http://www.research.ibm.com/projects/wsmp

[337] Guttman, L. (1953), ”Image theory for the structure of quantitative variates”, Psychometrika, 18, 277-296.

[338] Guttman, L. (1957), “Empirical Verification of the Radex Structure of Mental Abilities and Personality Traits”, Educational and Psychological Measurement, 17, 391-407.

[339] Guyon, I., Weston, J., Barnhill, S., & Vapnik, V. (2002), “Gene selection for cancer classification using support vector machines”, Machine Learning, 46, 389-422.

[340] Guyon, I., & Elisseeff, A. (2003), “An introduction to variable and feature selection”, Journal of Machine Learning Research, 3, 1157-1182.

[341] Hagglund, G. (1982), ”Factor Analysis by Instrumental Variable Methods”, Psychometrika, 47, 209-222.

[342] Hahsler, M. & Hornik, K. (2013), “TSP - Infrastructure for the traveling salesperson problem”, Technical Report, CRAN.

[343] Hald, A. (1952), Statistical Theory with Engineering Applications, New York: J. Wiley.

[344] Hald, J. & Madsen, K. (1981), “Combined LP and Quasi-Newton Methods for Minimax Optimization”, Mathematical Programming, 20, 49-62.

[345] Hald, J. & Madsen, K. (1985), “Combined LP and Quasi-Newton Methods for Nonlinear l1 Optimization”, SIAM Journal Numerical Analysis, 20, 68-80.

[346] Hall, P. (1992), The Bootstrap and Edgeworth Expansion, New York: Springer-Verlag.

[347] Hamers, B., Suykens, J.A.K. & Vandevalle, J. (2002), “Compactly supported RBF kernels for sparsifying the Gram matrix in LS-SVM regression models”, Proceedings ICANN 2002, Madrid, Spain; 720-726. Technical Report, Kath. University of Leuven.

[348] Hamilton, J.D. (1994) Time Series Analysis, Princeton: Princeton University Press.

[349] Hanley, J.A. & McNeil, B.J. (1982), “The meaning and use of the area under a receiver operating characteristic (ROC) curve”, Radiology, 143, 29-36.

[350] Hansen, L.P. (1982), “Large Sample Properties of Generalized Method of Moments Estimators,” Econometrica, 50, 1029-1054.

[351] Hansen, L.P. & Singleton, K.J. (1982), “Generalized Instrumental Variables Estimation of Nonlinear Rational Expectations Models,” Econometrica, 50, 1269-1280.

[352] Hansen, P. R. (2005), “A test for superior predictive ability”; Journal of Business and Economic Statistics, 23, 365-380.

[353] Harbison, S.P., & Steele, G.L. (1984), A C Reference Manual, Prentice Hall, Englewood Cliffs, NJ.

[354] Hardouin, J.-B. (2007), “Non parametric Item Response Theory with SAS and Stata”; JSS, 2007.

[355] Hardouin, J.-B. & Mesbah, M. (2007), “The SAS macro-program Item Response Theory Models”; Communications in Statistics - Simulation and Computation, 36, 437-453.

[356] Harman, H.H. (1976), Modern Factor Analysis, Chicago: University of Chicago Press.

[357] Hartigan, J. A. (1975), Clustering Algorithms, New York: John Wiley & Sons.

[358] Hartmann, W. (1979), Geometrische Modelle zur Analyse empirischer Daten, Berlin: Akademie Verlag.

[359] Hartmann, W. (1991), The NLP Procedure: Extended User’s Guide, Releases 6.08 and 6.10; SAS Institute Inc., Cary, N.C.

[360] Hartmann, W. (1992), Nonlinear Optimization in IML, Releases 6.08, 6.09, 6.10; Technical Report, Cary, N.C.: SAS Institute Inc.

[361] Hartmann, W. (1994), “Using PROC NLP for Risk Minimization in Stock Portfolios”, Technical Report, Cary, N.C.: SAS Institute Inc.

[362] Hartmann, W. (1995), L1 Norm Minimization and Robust Regression - New in IML Release 6.11, Technical Report, Cary, N.C.: SAS Institute Inc.

[363] Hartmann, W. (1995), The CALIS Procedure: Extended User’s Guide, Release 6.11; SAS Institute Inc., Cary, N.C.

[364] Hartmann, W. (2003), “User Guide for PROC DMNEURL”; Technical Report, Cary: SAS Institute.

[365] Hartmann, W. (2005), “Resampling methods in structural equations”, in: A. Maydeu-Olivares & J.J. McArdle: Contemporary Psychometrics (Festschrift for Roderick P. McDonald), Mahwah, NJ: Lawrence Erlbaum.

[366] Hartmann, W. (2007), “Tensor and List Operations in CMAT”; Technical Report, CMAT, Heidelberg.

[367] Hartmann, W. (2007), “Olvi’s Matlab Algorithms in CMAT”; Technical Report, CMAT, Heidelberg.

[368] Hartmann, W. (2011), “Automatic Model Improvement using the cfa Function in CMAT”; Technical Report, CMAT, Heidelberg.

[369] Hartmann, W. (2015), “Difference in p values by PROC LOGISTIC and elrm in R”; Technical Report, CMAT, Heidelberg.

[370] Hartmann, W. (2015), “Exact Logistic Regression for Small Samples”; Technical Report, CMAT, Heidelberg.

[371] Hartmann, W. (2015), “Data Objects in CMAT”; Technical Report, CMAT, Heidelberg.

[372] Hartmann, W.M., & R.E. Hartwig (1996), “Computing the Moore-Penrose Inverse for the Covariance Matrix in Constrained Nonlinear Estimation”, SIAM Journal on Optimization, 6, pp. 727-747.

[373] Hartmann, W. M., & So, Y. (1995), “Nonlinear Least-Squares and Maximum-Likelihood Estimation Using PROC NLP and SAS/IML”, Computer Technology Workshop, American Statistical Association, Joint Statistical Meeting, Orlando, 1995.

[374] Harwell Subroutine Library: Specifications, Vol. 1 and 2 (Release 11), ed. 2, Harwell Laboratory, Oxfordshire, UK.

[375] Hastie, T., & Tibshirani, R., (2004), “Efficient quadratic regularization for expression arrays”, Biostatistics, 5, 329-340.

[376] Hawkins, D.M. (1994), “The feasible solution algorithm for least trimmed squares regression”, Computational Statistics and Data Analysis, 17, pp. 185-196.

[377] Hawkins, D.M. (1994), “The feasible solution algorithm for the minimum covariance determinant estimator in multivariate data”, Computational Statistics and Data Analysis, 17, pp. 197-210.

[378] Hawkins, D.M. & Kass, G.V. (1982), “Automatic Interaction Detection”, in Hawkins, D.M. (ed.): Topics in Applied Multivariate Analysis, 267-302, Cambridge Univ Press: Cambridge.

[379] Hawkins, D.M., Bradu, D., & Kass, G.V. (1984), “Location of several outliers in multiple regression data using elemental sets”, Technometrics, 26, 197-208.

[380] Haymond, R.E., Jarvis, J.P. & Shier, D.R. (2013), “Algorithm 613: Minimum spanning tree for moderate integer weights”, ACM TOMS, 10, 108-110.

[381] Hedeker, D. (2012), “MIXREGLS: a Fortran program for mixed-effects location scale analysis”, JSS.

[382] Hedeker, D. & Gibbons, R.D. (1996), “MIXOR: a computer program for mixed-effects ordinal regression analysis”; Computer Methods and Programs in Biomedicine, 49, 157-176.

[383] Hedeker, D. (1999), “MIXNO: a computer program for mixed-effects nominal regression”; Journal of Statistical Software, 4.

[384] Hedeker, D., Demirtas, H., & Mermelstein, R.J. (2009), “A mixed ordinal location scale model for analysis of Ecological Momentary Assessment (EMA) data”; Statistics and its Interface, 2, 391-401.

[385] Hedeker, D., Mermelstein, R.J., & Demirtas, H. (2008), “An application of a mixed-effects location scale model for analysis of Ecological Momentary Assessment (EMA) data”; Biometrics, 64, 627-634.

[386] Hedeker, D., Mermelstein, R.J., & Demirtas, H. (?), “Modeling between- and within-subject variance in Ecological Momentary Assessment (EMA) data using mixed-effects location scale models”; submitted.

[387] Hegger, R., Kantz, H. & Schreiber, T. (1999), “Practical implementation of nonlinear time series methods: The TISEAN package”, CHAOS, 9, 413.

[388] Hemker, B.T., Sijtsma, K., & Molenaar, I.W. (1995), “Selection of unidimensional scales from a multidimensional item bank in the polytomous Mokken IRT model”; Applied Psychological Measurement, 19, 337-352.

[389] Heiser, W. J. (1981), Unfolding Analysis of Proximity Data; Leiden: Karsten Druckers, PhD Thesis.

[390] Hellakalek, P. & Wegenkittel, S. (2003), “Empirical evidence concerning AES”; ACM Transactions on Modeling and Computer Simulation, 13, 322-333.

[391] Helsgaun, K. (2000), “An effective implementation of the Lin-Kernighan Traveling Salesman Heuristic”, European Journal of Operational Research, 126, 106-130.

[392] Helsgaun, K. (2006), “An effective implementation of K-opt moves for the Lin-Kernighan TSP Heuristic”, Datalogiske Skrifter, 109, Roskilde University, Denmark.

[393] Henze, N. & Zirkler, B. (1990), “A class of invariant consistent tests for multivariate normality”, Communications in Statistics: Theory and Methods, 19, 3595-3617.

[394] Herbrich, R. (2002), Learning Kernel Classifiers: Theory and Algorithms, Cambridge and London: MIT Press.

[395] Higham, N.J. (1988), “FORTRAN codes for estimating the one-norm of a real or complex matrix with applications to condition estimation”, ACM Trans. Math. Soft., 14, pp. 381-396.

[396] Hirji, K.F. (2006), Exact Analysis of Discrete Data, New York: Chapman and Hall, CRC.

[397] Hjorth, J.S.U. (1994), Computer Intensive Statistical Methods, London: Chapman & Hall.

[398] Hoaglin, D.C., Mosteller, F., & Tukey, J.W. (1983), “Understanding Robust and Exploratory Data Analysis”, NewYork: John Wiley & Sons.

[399] Hochberg, Y. (1988), “A sharper Bonferroni procedure for multiple tests of significance”, Biometrika, 75, 800-803.

[400] Hochberg, Y. & Tamhane, A. C. (1987), Multiple Comparison Procedures, New York: Wiley.

[401] Hochreiter, S., Clevert, D.-A., & Obermayer, K. (2006), “A new summarization method for Affymetrix probe level data”; Bioinformatics, 22, 943-949.

[402] Hock, W. & Schittkowski, K. (1981), Test Examples for Nonlinear Programming Codes, Lecture Notes in Economics and Mathematical Systems 187, Springer Verlag, Berlin-Heidelberg-New York.

[403] Hodrick, R. J. & Prescott, E. C. (1997), “Postwar US business cycles: An empirical investigation”; Journal of Money, Credit, and Banking, 29, 1-16.

[404] Hoegaerts, L., Suykens, J.A.K., Vandewalle, J., & De Moor, B., (2003), “Kernel PLS variants for regression”, in Proc. of the 11th European Symposium on Artificial Neural Networks, Bruges, Belgium; 203-208. Technical Report, Kath. University of Leuven.

[405] Holm, S. (1979), “A simple sequentially rejective multiple test procedure”, Scandinavian Journal of Statistics, 6, 65-70.

[406] Hommel, G. (1988), “A stagewise rejective multiple test procedure based on a modified Bonferroni test”, Biometrika, 75, 383-386.

[407] Horan, C. B. (1969), “Multidimensional scaling: Combining observations when individuals have different perceptual structures”; Psychometrika, 34, 139-165.

[408] Horn, J. L. & Engstrom, R. (1979), “Cattell’s scree test in relation to Bartlett’s Chi-square test and other observations on the number of factors problem”; Multivariate Behavioral Research, 14, 283-300.

[409] Horn, R.A. & Johnson, C.R. (1985, 1996), Matrix Analysis, Cambridge University Press, Cambridge.

[410] Hoyer, P. O. (2004), “Non-negative matrix factorization with sparseness constraints”; Journal of Machine Learning Research, 5, 1457-1469.

[411] Hsu, C.-W. & Lin, C.-J. (1999), “A Simple Decomposition Method for Support Vector Machines”, Technical Report, Dept. of Computer Science and Information Engineering; National University of Taiwan.

[412] Huang, C. (2011), “SAS vs. R in data mining”, http://www.sasanalysis.com/2011/01/why-sas-is-more-useful-than-r.html

[413] Huber, P. (1981), Robust Statistics, New York: John Wiley & Sons.

[414] Huber, P. J. (1985), “Projection pursuit”, The Annals of Statistics, 13, 435-475.

[415] Huber, W., Heydebreck, A.v., Sueltmann, H., Poustka, A., & Vingron, M. (2002), “Variance stabilization applied to microarray data calibration and to quantification of differential expression”; Bioinformatics, 18, S96-S106.

[416] Huber, W., Heydebreck, A.v., Sueltmann, H., Poustka, A., & Vingron, M. (2003), “Parameter estimation for the calibration and variance stabilization of microarray data”; Statistical Applications in Genetics and Molecular Biology, 2, No. 1, Article 3.

[417] Huddleston, H.F., Claypool, P.L., & Hocking, R.R. (1970), “Optimal sample allocation to strata using convex programming”; Applied Statistics, 19, 273-278.

[418] Hyvaerinen, A., Karhunen, J. & Oja, E. (2001) Independent Component Analysis, New York: J. Wiley & Sons.

[419] Hwang, Y.T. & Wang, C.C. (2009), “Assessing multivariate normality based on Shapiro-Wilk test”, Journal of the Chinese Statistical Association, 47, 143-158.

[420] Iglewicz, B. (1983), “Robust scale estimators and confidence intervals for location”; in: Hoaglin, D.C., Mosteller, F., & Tukey, J.W.: “Understanding Robust and Exploratory Data Analysis”, New York: John Wiley & Sons.

[421] James, L.R., Mulaik, S.A., & Brett, J.M. (1982), Causal Analysis: Assumptions, Models, and Data, Beverly Hills: Sage Publications, Inc.

[422] Janert, P. K. (2009), Gnuplot in Action: Understanding data with graphs, Greenwich, CT: Manning Publications Co.

[423] Jarque, C. M. & Bera, A. K. (1987), “A test for normality of observations and regression residuals”, International Statistical Review, 55, 163-172.

[424] Jenkins, M.A. & Traub, J.F. (1972), “Zeros of a complex polynomial”, ACM.

[425] Jenkins, M.A. & Traub, J.F. (1975), “Principles for testing polynomial zerofinding programs”, ACM TOMS, 1, 26-34.

[426] Jennrich, R. I. (1973), “Standard errors for obliquely rotated factor loadings”; Psychometrika, 38, 593-604.

[427] Jennrich, R. I. (1974), “Simplified formulae for standard errors in maximum likelihood factor analysis”; British Journal of Math. and Statist. Psychology, 27, 122-131.

[428] Jennrich, R.I. (1987), “Tableau Algorithms for Factor Analysis by Instrumental Variable Methods”, Psychometrika, 52, 469-476.

[429] Jennrich, R.I. & Schluchter, M.D. (1986), “Unbalanced repeated-measures models with structured covariance matrices”, Biometrics, 42, 805-820.

[430] Jensen, D.R. & Ramirez, D.E. (1998), “Detecting outliers with Cook’s DI statistics”, Computing Science and Statistics, 29(1), 581-586.

[431] Joachims, T. (1999), “Making large-scale SVM learning practical”, in B. Scholkopf, C.J.C. Burges, and A.J. Smola (eds.), Advances in Kernel Methods: Support Vector Learning, Cambridge: MIT Press.

[432] Joe, H. & Xu, J. (2007), “The estimation method of inference for margins for multivariate models”; Technical Report.

[433] Joreskog, K.G. (1963), Statistical Estimation in Factor Analysis, Stockholm: Almqvist & Wicksell.

[434] Joreskog, K.G. (1969), ”Efficient estimation in image factor analysis”, Psychometrika, 34, 51-75.

[435] Joreskog, K.G. (1973), “A general method for estimating a linear structural equation system”, in Structural Equation Models in the Social Sciences, eds. A.S. Goldberger & O.D. Duncan, New York: Academic Press.

[436] Joreskog, K.G. (1978), ”Structural Analysis of Covariance and Correlation Matrices”, Psychometrika, 43, 443-477.

[437] Joreskog, K.G. (1982), ”Analysis of Covariance Structures”, in A Second Generation of Multivariate Analysis, ed.C. Fornell, New York: Praeger Publishers.

[438] Joreskog, K.G. & Sorbom, D. (1979), Advances in Factor Analysis and Structural Equation Modeling, CambridgeMA: Abt Books.

[439] Joreskog, K.G. & Sorbom, D. (1988), LISREL 7: A Guide to the Program and Applications, SPSS Inc., Chicago,Illinois.

[440] Johnson, K., Mandal, A. & Ding, T. (2007), “Software for implementing the sequential elimination algorithm oflevel combination algorithm”; JSS 2007.

[441] Johnson, M. E. (1987), Multivariate Statistical Simulation, New York: John Wiley & Sons.

[442] Jonker, R. & Volgenant, A. (1983), “Transforming asymmetric into symmetric traveling salesman problems”, Operations Research Letters, 2, 161-163.

[443] Jonker, R. & Volgenant, A. (1987), “A shortest augmenting path algorithm for dense and sparse linear assignmentproblems”, Computing, 38, 325-340.

[444] Kahaner, D., Moler, C. & Nash, S. (1989), Numerical Methods and Software, Prentice Hall, Englewood Cliffs, NJ.

[445] Kantz, H. (1994), “A robust method to estimate the maximal Lyapunov exponent of a time series”, Phys. Lett., A185, 77

[446] Kantz, H. & Schreiber, T. (2004), Nonlinear Time Series Analysis, 2nd edition, Cambridge: Cambridge University Press.

[447] Kass, G.V. (1980), “An exploratory technique for investigating large quantities of categorical data”, Applied Statistics, 29, 119-127.

[448] Kaufman, L. & Rousseeuw, P.J. (1990), Finding Groups in Data, New York: John Wiley & Sons.

[449] Kay, S. M. & Marple, S. L. Jr. (1981), “Spectrum analysis - a modern perspective”; Proceedings of the IEEE, 69, 1380-1419.

[450] Keerthi, S.S., Shevade, S.K., Bhattacharyya, C., & Murthy, K.R.K. (1999a), “A fast iterative nearest point algorithm for support vector machine classifier design”, Technical Report TR-ISL-99-03, Department of CSA, Bangalore, India.

[451] Keerthi, S.S., Shevade, S.K., Bhattacharyya, C., & Murthy, K.R.K. (1999b), “Improvements of Platt’s SMO algorithm for SVM classifier design”, Technical Report CD-99-14, Dep. of Mechanical Production and Engineering, National University of Singapore.

[452] Kellerer, H., Pferschy, U., & Pisinger, D. (2004), Knapsack Problems, Berlin, Heidelberg: Springer Verlag.

[453] Kennedy, W.J. & Gentle, J.E. (1988), Statistical Computing, New York: Marcel Dekker.

[454] Kernighan, B.W., & Ritchie, D.M. (1978), The C Programming Language, Prentice Hall, Englewood Cliffs, NJ.

[455] Keuls, M. (1952), “The use of the studentized range in connection with an analysis of variance”, Euphytica, 37, 112-122.

[456] Kim, M. (1993), “Forecasting the Volatility of Financial Markets: ARCH/GARCH Models and the AUTOREG Procedure”, in Proceedings of the Eighteenth Annual SAS Users Group International Conference, pp. 304-312, Cary: SAS Institute Inc.

[457] Kleinbaum, D.G., Kupper, L.L., Muller, K.E., & Nizam, A. (1998), Applied Regression Analysis and Other Multivariate Methods, North Scituate, MA: Duxbury Press.

[458] Klugkist, I., Laudy, O., & Hoijtink, H. (2005), “Inequality constrained analysis of variance, A Bayesian Approach”,Psychological Methods, 10, 477-493.

[459] Klugkist, I. & Hoijtink, H. (2007), “The Bayes factor for inequality and about equality constrained models”, Computational Statistics and Data Analysis, 51, 6367-6379.

[460] Knuth, D.E. (1986), The TEXbook, Seventh Printing, Addison-Wesley, Reading, MA.

[461] Knuth, D.E. (1973), The Art of Computer Programming; Vol.1: Fundamental Algorithms, Vol.2: Seminumerical Algorithms, Vol.3: Sorting and Searching, Addison-Wesley, Reading, MA.

[462] Kohonen, T. (2001), Self-Organizing Maps, Springer Verlag, Berlin.

[463] Kolda, T.G. & O’Leary, D.P. (1998), “A semidiscrete matrix decomposition for latent semantic indexing in information retrieval”; ACM Trans. Inf. Syst., 16, 322-346.

[464] Kolda, T.G. & O’Leary, D.P. (1999a), “Latent semantic indexing via a semi-discrete matrix decomposition”, in The Mathematics of Information Coding. Extraction and Distribution, Vol 107 of the IMA Volumes in Mathematics and Its Applications, pp. 73-80, Springer Verlag.

[465] Kolda, T.G. & O’Leary, D.P. (1999b), Computation and Uses of the Semidiscrete Matrix Decomposition, Technical Report, Sandia National Laboratories, Livermore, CA.

[466] Kolmogorov, A. (1933), “Sulla determinazione empirica di una legge di distributione”, Giornale dell’ Instituto Italiano degli Attuari, 4, 83-91.

[467] Krane, W.R. & McDonald, R.P. (1978), “Scale invariance and the factor analysis of correlation matrices”, British Journal of Mathematical and Statistical Psychology, 31, 218-228.

[468] Kreft, I. & de Leeuw, J. (1998), Introducing Multilevel Modeling, Beverly Hills, CA: SAGE Publications, Inc.

[469] Kruskal, J.B. (1964a), “Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis”, Psychometrika, 29, 1-27.

[470] Kruskal, J.B. (1964b), “Nonmetric multidimensional scaling: A numerical method”, Psychometrika, 29, 115-129.

[471] Kruskal, W.H. & Wallis, W.A. (1952), “Use of ranks in one-criterion variance analysis”, JASA, 47, 583-621.

[472] Kruskal, W.H. (1957), “Historical Note on the Wilcoxon unpaired two-sample test”, JASA, 52, 356-360.

[473] Kugiumtzis, D. (2002), “Surrogate Data Test for Nonlinearity using Statically Transformed Autoregressive Process”,Physical Review E, 66, 025201.

[474] Kugiumtzis, D. (2009), “Measures of analysis of time series toolkit (MATS)”, paper submitted to JSS.

[475] Kuiper, R.M., Klugkist, I. & Hoijtink, H. (2007), “A Fortran 90 Program for Confirmatory Analysis of Variance”, JSS 422.

[476] Kursa, M.B. & Rudnicki, W.R. (2010), “Feature Selection with the Boruta Package”; JSS, 2010.

[477] Lambert, Z.V., Wildt, A.R. & Durand, R.M. (1990), “Assessing sampling variation relative to number-of-factors criteria”; Educational and Psychological Measurement, 50, 253-257.

[478] Lambert, Z.V., Wildt, A.R. & Durand, R.M. (1991), “Approximating confidence intervals for factor loadings”;Multivariate Behavioral Research, 26, 412-434.

[479] Lamport, L. (1986), LaTeX - A Document Preparation System - User’s Guide and Reference Manual, Fifth Printing,Addison-Wesley, Reading, MA.

[480] “Using SAS GENMOD procedure to fit diagonal class models to square contingency tables having ordered categories”; Proceedings of the Midwest SAS Users Group, 149-160.

[481] Lawal, H.B. & Sundheim, R.A. (2002), “Generating factor variables for asymmetry, non-independence, and skew-symmetry models in square contingency tables using SAS”, JSS, 2002.

[482] Lawless, J.F. (1982), Statistical Models and Methods for Lifetime Data, New York: John Wiley & Sons, Inc.

[483] Lawley, D.N. & Maxwell, A.E. (1971), Factor Analysis as a Statistical Method, New York: American Elsevier Publishing Company.

[484] Lawson, C.L. & Hanson, R.J. (1974), Solving Least Squares Problems, Englewood Cliffs, NJ: Prentice Hall.

[485] Lawson, C.L. & Hanson, R.J. (1995), Solving Least Squares Problems, Philadelphia, PA: SIAM.

[486] Lawson, C.L., Hanson, R.J., Kincaid, D.R., & Krogh, F.T. (1979), “Basic linear algebra subprograms for Fortran usage”, ACM Transactions on Mathematical Software, 5, pp. 308-323 and 324-325.

[487] L’Ecuyer, P. (1996a), “Combined Multiple Recursive Random Number Generators”, Operations Research, 44, 816-822.

[488] L’Ecuyer, P. (1996b), “Maximally equidistributed combined Tausworthe generators”, Math. of Comput., 65, 203-213.

[489] L’Ecuyer, P. (1999), “Tables of maximally equidistributed combined LFSR generators”, Math. of Comput., 68, 261-269.

[490] Ledoit, O. & Wolf, M. (2003a), “Improved estimation of the covariance matrix of stock returns with an application to portfolio selection”; Journal of Empirical Finance, 10, 603-621.

[491] Ledoit, O. & Wolf, M. (2003b), “Honey I shrank the sample covariance matrix”; Technical Report.

[492] Lee, W. and Gentle, J.E. (1986), “The LAV Procedure”, SUGI Supplemental Library User’s Guide, Cary: SAS Institute, Chapter 21, pp. 257-260.

[493] Lee, D.D. & Seung, S. (1999), “Learning the parts of objects by non-negative matrix factorization”; Nature, 401, 788-791.

[494] Lee, D.D. & Seung, S. (2001), “Algorithms for non-negative matrix factorization”; Advances in Neural Information Processing Systems, 13, 556-562.

[495] Lee, J.Y., Soong, S.J., Shaw, H.M., Balch, C.M., McCarthy, W.H. & Urist, M.M. (1992), “Predicting survival andrecurrence in localized melanoma: A multivariate approach”; World Journal of Surgery, 16, 191-195.

[496] Lee, J., Yoshizawa, C., Wilkens, L., & Lee, H.P. (1992), “Covariance adjustment of survival curves based on Cox’s proportional hazards regression model”, Comput. Appl. Biosc., 8, 23-27.

[497] Lee, S.Y. (1985), “On Testing Functional Constraints in Structural Equation Models”, Biometrika, 72, 125-131.

[498] Lee, S.Y. & Jennrich, R.I. (1979), “A Study of Algorithms for Covariance Structure Analysis with Specific Comparisons Using Factor Analysis”, Psychometrika, 44, 99-113.

[499] Lee, Y., Lin, Y., & Wahba, G. (2003), Multicategory Support Vector Machines, Theory and Application to the Classification of Microarray Data and Satellite Radiance Data, Technical Report 1064, Dep. of Statistics, Univ. of Wisconsin, 2003.

[500] Lee, Y., Kim, Y., Lee, S. & Koo, J.Y. (2005), Structured Multicategory Support Vector Machines with Analysis of Variance Decomposition, Technical Report.

[501] Lee, Y.-Y. & Mangasarian, O. (1999), SSVM: A smooth support vector machine, Technical Report 99-03, Computer Sciences Dept., University of Wisconsin, Madison.

[502] Lee, Y.-Y. & Mangasarian, O. (2000), RSVM: Reduced support vector machines, Technical Report 00-07, Computer Sciences Dept., University of Wisconsin, Madison.

[503] Lehoucq, R.B., Sorensen, D.C., Yang, C. (1998), ARPACK Users’ Guide, Philadelphia: SIAM.

[504] Leong, P.H.W., Zhang, G., Lee, D.U., Luk, W., & Villasenor, J.D. (2005), “A comment on the implementation of the ziggurat method”; JSS 12.

[505] Lewis, J.G. (1977), “Algorithms for Sparse Matrix Eigenvalue Problems”, Report STAN-CS-77-595, Computer Science Department, Stanford University, Stanford, California, March 1977.

[506] Li, K.C. (1991), “Sliced inverse regression for dimension reduction”, JASA, 86, 316-342.

[507] Li, K.C. (1992), “On principal Hessian directions for data visualization and dimension reduction: Another application of Stein’s lemma”, JASA, 87, 1025-1034.

[508] Liaw, A. (2009), randomForest package in R.

[509] Liaw, A. & Wiener, M. (2002), “Classification and Regression by randomForest”, R News, 2, 3.

[510] Liang, K.Y. & Zeger, S.L. (1986) Longitudinal data analysis using generalized linear models, Biometrika, 73, 13-22.

[511] Liebman, J.; Lasdon, L.; Schrage, L.; and Waren, A. (1986), Modeling and Optimization with GINO, California:The Scientific Press.

[512] Lin, C.-J. & More, J.J. (1999), “Newton’s method for large bound constrained problems”, SIAM Journal on Optimization, 9, 1100-1127.

[513] Lin, S. (1965), “Computer solutions of the traveling-salesman problem”, Bell System Technical Journal, 44, 2245-2269.

[514] Lin, S. & Kernighan, B. (1973), “An effective heuristic algorithm for the traveling-salesman problem”, Operations Research.

[515] Lin, Y., Lee, Y., & Wahba, G. (2000), Support Vector Machines for Classification in Nonstandard Situations, Technical Report 1016, Dep. of Statistics, Univ. of Wisconsin, 2000.

[516] Lindstrom, P. & Wedin, P.A. (1984), “A new linesearch algorithm for nonlinear least-squares problems”, Mathematical Programming, 29, 268-296.

[517] Liu, H. & Wu, T. (2003), “Estimating the Area under a Receiver Operating Characteristic (ROC) Curve for Repeated Measures Design”, JSS, 2003.

[518] Liu, J. W-H. (1985), “Modification of the minimum degree algorithm by multiple elimination”, ACM Trans. Math. Software, 11, 141-153.

[519] Liu, L., Hawkins, D.M., Ghosh, S., & Young, S.S. (2003), “Robust Singular Value Decomposition Analysis of Microarray Data”; PNAS, 100, 13167-13172 (Nov. 11, 2003).

[520] Liu, W., Jamshidian, M., Zhang, Y., Bretz, F., & Han, X. (2007), “Some new methods for the comparison of two regression models”, Journal of Statistical Planning and Inference, 137, 57-67.

[521] Loevinger, J. (1948), “The technic of homogeneous tests compared with some aspects of scale analysis and factor analysis”; Psychological Bulletin, 45, 507-529.

[522] Long, J.S. (1983), Covariance Structure Models, an Introduction to LISREL, Beverly Hills, CA: SAGE Publications, Inc.

[523] Long, J.S. (1997), Regression Models for Categorical and Limited Dependent Variables; Beverly Hills, CA: SAGE Publications, Inc.

[524] Lopuhaa, H.P. & Rousseeuw, P.J. (1991), “Breakdown Points of Affine Equivariant Estimators of Multivariate Location and Covariance Estimators”, The Annals of Statistics, 19, 229-248.

[525] Lord, F.M., & Novick, M.R. (1968), Statistical theories of mental test scores; Reading, MA: Addison-Wesley.

[526] Lucas, J. W. (2003), “Status processes and the institutionalization of women as leaders”; American Sociological Review, 68, 464-480.

[527] Luscher, M. (1994), “A portable high-quality random number generator for lattice field theory simulations”; Computer Physics Communications, 79, 100-110.

[528] MacKinnon, D.P. (1992), “Statistical simulation in CALIS”, Proceedings of the 17th Annual SAS Users Group International Conference, 1199-1203, Cary NC: SAS Institute Inc.

[529] Madsen, K. (1975), “Minimax Solution of Non-Linear Equations Without Calculating Derivatives”, Mathematical Programming Study, 3, 110-126.

[530] Madsen, K. & Nielsen, H.B. (1990), “Finite algorithms for robust linear regression”, BIT, 30, 682-699.

[531] Madsen, K. & Nielsen, H.B. (1993), “A finite smoothing algorithm for linear l1 estimation”, SIAM Journal on Optimization, 3, 223-235.

[532] Madsen, K., Nielsen, H.B., & Pinar, M.C. (1995), “A finite continuation algorithm for bound constrained quadratic programming”, Technical Report IMM-REP-1995-22, Technical University of Denmark.

[533] Madsen, K., Nielsen, H.B., & Pinar, M.C. (1996), “A new finite continuation algorithm for linear programming”, SIAM Journal on Optimization, 6, 600-616.

[534] Madsen, K., Tingleff, O., Hansen, P.C., Owczarz, W. (1990), “Robust Subroutines for Non-Linear Optimization”,Technical Report NI-90-06, Technical University of Denmark.

[535] Maiti, S. I. (2009), “Seasonal adjustment in SAS/BASE software - An alternative to PROC X11/X12”; JSS 442, 2009.

[536] Malinowski, E. R. (1991), Factor Analysis in Chemistry, New York: John Wiley & Sons.

[537] Malkiel, B. C. (1985), A Random Walk Down Wall Street, New York: Norton, Fourth Edition.

[538] Mandal, A., Wu, C.F.J., & Johnson, K. (2006), “SELC: Sequential elimination of level combinations by means of modified genetic algorithms”; Technometrics, 48(2), 273-283.

[539] Mandal, A., Johnson, K., Wu, C.F.J., & Bornemeier, D. (2007), “Identifying promising compounds in drug discovery: Genetic algorithms and some new statistical techniques”; Journal of Chemical Information and Modeling, 47(3), 981-988.

[540] Mangasarian, O.L. (1995), Nonlinear Programming, Philadelphia: SIAM.

[541] Mangasarian, O.L. & Musicant, D.R. (1999), “Successive overrelaxation for support vector machines”, IEEE Transactions on Neural Networks, 10, 1032-1037.

[542] Mangasarian, O.L. & Musicant, D.R. (2000a), “Active Support Vector Machine Classification”, Technical Report 00-04, Data Mining Institute, University of Wisconsin, Madison, Wisconsin.

[543] Mangasarian, O.L. & Musicant, D.R. (2000b), “Lagrangian Support Vector Machines”, Technical Report 00-06, Data Mining Institute, University of Wisconsin, Madison, Wisconsin.

[544] Mangasarian, O.L. & Thompson, M.E. (2006), “Massive data classification via unconstrained support vector machines”, Journal of Optimization Theory and Applications. Technical Report 06-07, Data Mining Institute, University of Wisconsin, Madison, Wisconsin.

[545] Mangasarian, O.L. & Thompson, M.E. (2006), “Chunking for massive nonlinear kernel classification”, Technical Report 06-07, Data Mining Institute, University of Wisconsin, Madison, Wisconsin.

[546] Mangasarian, O.L. & Wild, E.W. (2004), “Feature Selection in k-Median Clustering”, Technical Report 04-01, Data Mining Institute, University of Wisconsin, Madison, Wisconsin.

[547] Mangasarian, O.L. & Wild, E.W. (2006), “Feature Selection for Nonlinear Kernel Support Vector Machines”,Technical Report 06-03, Data Mining Institute, University of Wisconsin, Madison, Wisconsin.

[548] Mann, H. & Whitney, D. (1947), “On a test whether one of two random variables is stochastically larger than theother”; Annals of Mathematical Statistics, 18, 50-60.

[549] Mardia, K. V. (1970), “Measures of multivariate skewness and kurtosis with applications”, Biometrika, 57, 519-530.

[550] Mardia, K. V. (1974), “Applications of some measures of multivariate skewness and kurtosis in testing normality and robustness studies”, Sankhya: The Indian Journal of Statistics, Series B, 36, 115-128.

[551] Mardia, K. V. & Foster, K. (1983), “Omnibus tests of multinormality based on skewness and kurtosis”, Communications in Statistics, Part A: Theory and Methods, 12, 207-221.

[552] Mardia, K. V., Kent, J.T., & Bibby, J.M. (1979), Multivariate Analysis, London: Academic Press.

[553] Markowitz, H. (1952), “Portfolio Selection”, Journal of Finance, 7, 77-91.

[554] Markowitz, H. (1959), Portfolio Selection: Efficient Diversification of Investments, New York: John Wiley & Sons, Inc.

[555] Markus, M.T. (1994), Bootstrap Confidence Regions in Nonlinear Multivariate Analysis, Leiden: DSWO Press.

[556] Marple, S. L. Jr. (1991), “A fast computational algorithm for the modified covariance method of linear prediction”;Digital Signal Processing, 1, 124-133.

[557] Marsaglia, G., “The Marsaglia Random Number CDROM”, with “The Diehard Battery of Tests of Randomness”,www.stat.fsu.edu/pub/diehard

[558] Marsaglia, G. (1965), “Ratios of normal variables and ratios of sums of uniform variables”; JASA, 60, 193-204.

[559] Marsaglia, G. (2003), “Random number generators”, Journal of Modern Applied Statistical Methods, 3.

[560] Marsaglia, G. (2003), “Xorshift RNGs”, JSS, 2003.

[561] Marsaglia, G. (2004), “Evaluating the normal distribution function”, JSS, 2004.

[562] Marsaglia, G. (2004), “Fast generation of discrete random variables”, JSS, 2004.

[563] Marsaglia, G. (2006), “Ratios of normal variables”; JSS, 2006.

[564] Marsaglia, G., & Tsang, W.W. (1998), “The Monte Python method for generating Gamma variables”, JSS 3, 1998.

[565] Marsaglia, G., & Tsang, W.W. (2000), “The Ziggurat method for generating random variables”, JSS 5, 2000.

[566] Marsaglia, G. & Tsang, W.W. (2002), “Some difficult to pass tests of randomness”, JSS, 2002.

[567] Marsaglia, G., Tsang, W.W., & Wang, J. (2003), “Evaluating Kolmogorov’s Distribution”, JSS, 8, 2003.

[568] Marsaglia, G., Tsang, W.W., & Wang, J. (2004), “Fast generation of discrete random variables”; JSS, 2004.

[569] Marsaglia, G. & Tsay, L.H. (1985), “Matrices and the structure of random number sequences”, Linear Algebra and its Applications, 67, 147-156.

[570] Marsh, H.W., Balla, J.R., & McDonald, R.P. (1988), “Goodness-of-fit indices in confirmatory factor analysis. The effect of sample size”, Psychological Bulletin, 103, 391-410.

[571] Martello, S. (1983), “Algorithm 595: An enumerative algorithm for finding Hamiltonian circuits in a directed graph”, ACM TOMS, 9, 131-138.

[572] Martello, S. & Toth, P. (1990), Knapsack Problems: Algorithms and Computer Implementations, New York: Wiley& Sons.

[573] Mashtare, T. L., Jr. & Hutson, A. D. (2010), “SAS Macro for estimating the standard error of area under the curve with an application to receiver operating curves”; paper submitted to JSS.

[574] Masters, G.N. (1982), “A Rasch model for partial credit scoring”; Psychometrika, 47, 149-174.

[575] MATLAB Reference Guide (1992), The MathWorks, Inc., Natick MA.

[576] Matsumoto, M. & Nishimura, T. (1998), “Mersenne-Twister: A 623-dimensionally equidistributed uniform pseudo-random number generator”; ACM Transactions on Modeling and Computer Simulation, 8, 3-30.

[577] May, W.L. & Johnson, W.D. (1997), “A SAS macro for constructing simultaneous confidence intervals for multinomial proportions”, Computer Methods and Programs in Biomedicine, 53, 153-162.

[578] May, W.L. & Johnson, W.D. (2000), “Constructing two-sided simultaneous confidence intervals for multinomial proportions for small counts in a large number of cells”, JSS, 2000.

[579] Maydeu-Olivares, A. (2001), “Multidimensional IRT Modeling of Binary Data: Large Sample Properties of NOHARM Estimates”, J. of Educational and Behavioral Statistics, 26, 49-69.

[580] Maydeu-Olivares, A. (2005), “Further empirical results on parametric vs. non-parametric IRT modeling of Likert-type personality data”; Multivariate Behavioral Research, 40, 275-293.

[581] Maydeu-Olivares, A. (2006), “Limited information estimation and testing of discretized multivariate normal structural models”; Psychometrika, 71, 57-77.

[582] Maydeu-Olivares, A., Coffman, D.L. & Hartmann, W. (2007), “Asymptotically distribution-free (ADF) interval estimation of coefficient alpha”, Psychological Methods, 12, 157-176.

[583] Maydeu-Olivares, A., & Joe, H. (2005), “Limited and full information estimation and goodness-of-fit testing in 2^n contingency tables: A unified framework”; JASA, 100, 1009-1020.

[584] Maydeu-Olivares, A., & Joe, H. (2006), “Limited information goodness-of-fit testing in multidimensional contingency tables”; Psychometrika, 71, 713-732.

[585] McArdle, J.J. (1980), “Causal Modeling Applied to Psychonomic Systems Simulation”, Behavior Research Methods & Instrumentation, 12, 193-209.

[586] McArdle, J.J. (1988), “Dynamic but Structural Equation Modeling of Repeated Measures Data”, in The Handbook of Multivariate Experimental Psychology, eds. J.R. Nesselroade and R.B. Cattell, New York: Plenum Press.

[587] McArdle, J.J. & Boker, S.M. (1986), RAMpath - Path Diagram Software, Denver: DATA Transforms, Inc.

[588] McArdle, J.J. & McDonald, R.P. (1984), “Some Algebraic Properties of the Reticular Action Model”, Br. J. math. statist. Psychol., 37, 234-251.

[589] MacCallum, R. (1986), “Specification Searches in Covariance Structure Modeling”, Psychological Bulletin, 100, 107-120.

[590] McBane, G.C. (2006), “Programs to compute distribution functions and critical values for extreme value ratios for outlier detection”; JSS, 2006.

[591] McKean, J.W. and Schrader, R.M. (1987), “Least absolute errors analysis of variance”, In: Y. Dodge, ed. Statistical Data Analysis - Based on L1 Norm and Related Methods, Amsterdam: North Holland, 297-305.

[592] McDonald, R.P. (1978), “A Simple Comprehensive Model for the Analysis of Covariance Structures”, Br. J. math. statist. Psychol., 31, 59-72.

[593] McDonald, R.P. (1980), “A Simple Comprehensive Model for the Analysis of Covariance Structures: Some Remarks on Applications”, Br. J. math. statist. Psychol., 33, 161-183.

[594] McDonald, R.P. (1984), “Confirmatory Models for Nonlinear Structural Analysis”, in Data Analysis and Informatics, III, eds. E. Diday et al., North Holland: Elsevier Publishers.

[595] McDonald, R.P. (1985), Factor Analysis and Related Methods, Hillsdale NJ and London: Lawrence Erlbaum Associates.

[596] McDonald, R.P. (1989), ”An Index of Goodness-of-Fit Based on Noncentrality”, Journal of Classification, 6, 97-103.

[597] McDonald, R.P. (1997), “Normal ogive multidimensional model”, in W.J. van der Linden and R. K. Hambleton (Eds.): Handbook of Modern Item Response Theory, pp. 257-269, New York: Springer.

[598] McDonald, R.P., & Hartmann, W. (1992), “A Procedure for Obtaining Initial Values of Parameters in the RAM Model”, Multivariate Behavioral Research, 27, 57-76.

[599] McDonald, R.P. & Marsh, H.W. (1988), “Choosing a Multivariate Model: Noncentrality and Goodness of Fit”,Distributed Paper.

[600] McDonald, R.P. & Mok, M.C. (1995), “Goodness of fit in item response models”, Multivariate Behavioral Research,54, 483-495.

[601] McDonald, R.P., Parker, P.M., & Ishizuka, T. (1993), “A scale-invariant treatment of recursive path models”,Psychometrika, 58, 431-443.

[602] McKean, J.W. & Schrader, R.M. (1987), “Least absolute errors analysis of variance”, In: Y. Dodge, ed. Statistical Data Analysis - Based on L1 Norm and Related Methods, Amsterdam: North Holland, 297-305.

[603] McLachlan, G.J. (1987), “On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture”, Applied Statistics, 36, 318-324.

[604] McLachlan, G.J. & Krishnan, T. (1997), The EM Algorithm and Extensions, New York: John Wiley & Sons.

[605] McMillan, G.P. & Hanson, T. (2005), “SAS Macro BDM for fitting the Dale (1984) regression model to bivariate ordinal response data”; JSS, 2005.

[606] McMillan, G.P., Hanson, T., Bedrick, E. & Lapham, S.C. (2005), “Application of the Bivariate Dale Model to Estimating Risk Factors for Alcohol Consumption Patterns”, submitted.

[607] Meeker, W.Q. & Escobar, L.A. (1995), “Teaching about approximate confidence regions based on maximum likelihood estimation”, American Statistician, 49, 48-53.

[608] Mehta, C. R. & Patel, N. R. (1986), “Algorithm 643: FEXACT: A Fortran subroutine for Fisher’s exact test on unordered r ∗ c contingency tables”, ACM Transactions on Mathematical Software, 12, 154-161.

[609] Meulman, J. (1982), Homogeneity Analysis of Incomplete Data, Leiden: DSWO Press, 1982.

[610] Miha, A., Hayter, J., & Kuriki, S. (2003), “The evaluation of general non-centred orthant probabilities”, Journal of the Royal Statistical Society, Ser. B, 65, 223-234.

[611] Milan, L. & Whittaker, J. (1995), “Application of the parametric bootstrap to models that incorporate a singular value decomposition”; Applied Statistics, 44, 31-49.

[612] Miles, R.E. (1959), “The complete amalgamation into blocks, by weighted means, of a finite set of real numbers”,Biometrika, 46, 317-327.

[613] Miller, A. (2002), Subset Selection in Regression, CRC Press, Chapman & Hall, 2nd Edition.

[614] Miller, A.J. (1990), Subset Selection in Regression, New York: Chapman and Hall.

[615] Mills, T. C. (1990), Time Series Techniques for Economists, Cambridge: Cambridge University Press.

[616] MKS Lex & Yacc, (1993), Mortice Kern Systems Inc., Waterloo, Ontario CA.

[617] Mokken, R.J. (1971), A Theory and Procedure of Scale Analysis; DeGruyter.

[618] Mokken, R.J. (1997), “Nonparametric models for dichotomous responses”; in: W. J. van der Linden and R. K. Hambleton (eds): Handbook of Modern Item Response Theory, New York: Springer.

[619] Molenaar, I.W. (1997), “Nonparametric models for polytomous responses”; in: W. J. van der Linden and R. K. Hambleton (eds): Handbook of Modern Item Response Theory, New York: Springer.

[620] Molenberghs, G. & Lesaffre, E. (1994), “Marginal Modeling of Correlated Ordinal Data Using a Multivariate Plackett Distribution”, JASA, 89, 633-644.

[621] Momma, M. & Bennett, K.P. (2004), “Constructing orthogonal latent features for arbitrary loss”; Technical Report;Troy: Rensselaer Polytechnic Institute.

[622] Monahan, J.F. (2005), “Some algorithms for the conditional mean vector and covariance matrix”; JSS 2005.

[623] Moon, H., Lee, J.J., Ahn, H., & Nikolova, R.G. (2002), “Web-based simulator for sample size and power estimationin animal carcinogenicity studies”, JSS, 2002.

[624] More, J.J. (1978), “The Levenberg-Marquardt Algorithm: Implementation and Theory”, in Lecture Notes in Mathematics 630, ed. G.A. Watson, Springer Verlag, Berlin-Heidelberg-New York, 105-116.

[625] More, J.J., Garbow, B.S., & Hillstrom, K.E. (1980), User Guide for MINPACK-1, Argonne National Laboratory, Argonne, IL.

[626] More, J.J., Garbow, B.S., & Hillstrom, K.E. (1981), “Fortran Subroutines for Testing Unconstrained Optimization Software”, ACM Trans. Math. Software, 7, 17-41.

[627] More, J.J. & Sorensen, D.C. (1983), “Computing a Trust-Region Step”, SIAM Journal on Scientific and Statistical Computation, 4, 553-572.

[628] More, J.J. and Wright, S.J. (1993), Optimization Software Guide, Philadelphia: SIAM.

[629] Morgan, B.J.T. (1992), Analysis of Quantal Response Data, London: Chapman & Hall.

[630] Mosteller, F. & Tukey, J.W. (1977), Data Analysis and Regression: A Second Course in Statistics, Reading: Addison-Wesley.

[631] Mottonen, J. & Oja, H. (1995), “Multivariate spatial sign and rank methods”; Journal of Nonparametric Statistics, 5, 201-213.

[632] Mudholkar, G. S., McDermott, M., & Srivastava, D. K. (1992), “A test of p-variate normality”, Biometrika, 79, 850-854.

[633] Mulaik, S.A. (1972) The Foundations of Factor Analysis, New York: McGraw Hill Comp.

[634] Mulaik, S.A., James, L.R., Van Alstine, J., Bennett, N., Lind, S., & Stilwell, C.D. (1989), ”Evaluation of Goodness-of-Fit Indices for Structural Equation Models”, Psychological Bulletin, 105, 430-445.

[635] Murtagh, B. A. (1981), “Advanced Linear Programming: Computation and Practice”, McGraw - Hill, New York.

[636] Murtagh, B.A. & Saunders, M.A. (1983), MINOS 5.0 User’s Guide; Technical Report SOL 83-20, Stanford University.

[637] Muthen, B.O. (1987), LISCOMP: Analysis of Linear Structural Relations Using a Comprehensive Measurement Model, Mooresville IN: Scientific Software, Inc.

[638] Nakaya, T. (1997), “Statistical inferences in bidimensional regression models”, Geographical Analysis, 29, 169-186.

[639] Narula, S.C. & Wellington, J.F. (1977), “Prediction, linear regression, and minimum sum of relative error”, Technometrics, 19, 185-190.

[640] Nazareth, J. L. (1987), “Computer Solutions of Linear Programs”, Oxford University Press, New York - Oxford.

[641] Nelder, J.A. & Mead, R. (1965), “A Simplex Method for Function Minimization”, Computer Journal, 7, 308-313.

[642] Nemhauser, G. L. & Wolsey, L. A. (1988), Integer and Combinatorial Optimization, New York: John Wiley & Sons.

[643] Neter, J., Wasserman, W. & Kutner, M.H. (1990), Applied Linear Statistical Models, 3rd edition, Burr Ridge IL: R.D. Irwin.

[644] Nevalainen, J. & Oja, H. (2006), “SAS/IML macros for multivariate analysis of variance based on spatial signs”; JSS, 2004.

[645] Ng, E. & Peyton, B.W. (1993), “Block sparse Cholesky algorithms on advanced uniprocessor computers”, SIAM J.Sci. Comput., 14, 1034-1056.

[646] Nollau, V. & Hahnewald-Busch, A. (1975), Statistische Analysen; Mathematische Methoden der Planung und Auswertung von Versuchen, Leipzig: VEB Fachbuchverlag.

[647] Oberdiek, H. (1999), “PDF information and navigation elements with hyperref, pdfTeX, and thumbpdf”, Slides.

[648] Ogasawara, H. (1998), “Standard errors for rotation matrices with an application to the promax solution”; British Journal of Math. and Statist. Psychology, 51, 163-178.

[649] Ogasawara, H. (1999), “Standard errors for Procrustes solutions”; Japanese Psychological Research, 41, 121-130.

[650] Ogasawara, H. (1999), “Standard errors for the direct oblimin solution with Kaiser’s normalization”; Japanese Journal of Psychology, 70, 333-338.

[651] Ogasawara, H. (2000), ROSEF: Version 2 User’s Guide, Technical Report: Otaru University of Commerce, Otaru, Japan.

[652] Ogasawara, H. (2000), “Standard errors for the principal component loadings for unstandardized and standardized variables”; British Journal of Math. and Statist. Psychology, 53, 155-174.

[653] Ogasawara, H. (2002), “Concise formulas for the standard errors of component loading estimates”; Psychometrika, 67, 289-297.

[654] Osborne, M.R. (1985), Finite Algorithms in Optimization and Data Analysis, New York: John Wiley & Sons.

[655] Oja, H. & Randles, R.H. (2004), “Multivariate nonparametric tests”; Statistical Science, to appear.

[656] Osborne, M.R. (1985), Finite Algorithms in Optimization and Data Analysis, New York: J. Wiley.

[657] Outrata, J., Schramm, H., & Zowe, J. (1991), “Bundle Trust Region Methods: Fortran Codes for NondifferentiableOptimization; User’s Guide”; Technical Report No. 269, Mathematisches Institut der Universitat Bayreuth, 1991.

[658] Owen, D.B. (1956), “Tables for computing bivariate normal probabilities”, Ann. Math. Statist., 27, 1075-1090.

[659] Parlett, B.N. (1980), The Symmetric Eigenvalue Problem, Prentice Hall, Englewood Cliffs, NJ.

[660] Paige, C.C., & Saunders, M.A. (1975), “Solution of Sparse Indefinite Systems of Linear Equations”, SIAM J. Numer. Anal., 12, pp. 617-629.

[661] Paige, C.C., & Saunders, M.A. (1982), “LSQR: An algorithm for sparse linear equations and sparse least squares”, ACM Transactions on Mathematical Software, 8, pp. 43-71.

[662] Paige, C.C., & Saunders, M.A. (1982), “Algorithm 583, LSQR: Sparse linear equations and least-squares problems”, ACM Transactions on Mathematical Software, 8, pp. 195-209.

[663] Pan, W. (2009), “A SAS/IML Macro for Computing Percentage Points of Pearson Distribution”; JSS 457, 2009.

[664] Pape, U. (1980), “Algorithm 562: Shortest Path Lengths”, ACM TOMS, 6, 450-455.

[665] Patefield, W.M. (1981), “Algorithm AS 159. An efficient method for generating r ∗ c tables with given row and column totals”, Applied Statistics, 30, 91-97.

[666] Patefield, M. (2000), “Fast and Accurate Calculation of Owen’s T Function”, JSS, 2000.

[667] Pavlov, I. (2010), Online Documentation for the 7-zip program, http://7-zip.org/7z.html.

[668] PCTEX32: User Manual, Personal TEX, Inc., Mill Valley, CA.

[669] Pearson, E.S. & Hartley, H.O. (1972), Biometrika Tables for Statisticians, Vol II, New York: Cambridge University Press.

[670] Persson, P.O. & Strang, G. (2004), “A simple mesh generator in Matlab”; SIAM Review, 46, 329-345.

[671] Piessens, R., de Doncker, E., Uberhuber, C.W., & Kahaner, D.K. (1983), QUADPACK: A Subroutine Package for Automatic Integration, Berlin: Springer Verlag.

[672] Pinar, M.C. & Elhedhli, S. (1998), “A penalty continuation method for the l∞ solution of overdetermined linear systems”; BIT.

[673] Pinar, M.C. & Hartmann, W. (1999), “A Fast Huber Approximation Algorithm for Nonlinear ℓ1 estimation”, Sixth SIAM Conference on Optimization, Atlanta.

[674] Pinar, M.C. & Hartmann, W. (2006), “Huber Approximation Algorithm for Nonlinear ℓ1 estimation”, European Journal of Operational Research, 109, 1096-1107.

[675] Pison, G., Rousseeuw, P.J., Filzmoser, P., & Croux, C. (2003), “Robust Factor Analysis”; Journal of Multivariate Analysis, 84, 145-172.

[676] Platt, J. (1999), “Sequential Minimal Optimization: A fast algorithm for training support vector machines”, in B. Scholkopf, C.J.C. Burges, and A.J. Smola (eds), Advances in Kernel Methods: Support Vector Learning, Cambridge: MIT Press.

[677] Pohlabeln, H., Wild, P., Schill, W., Ahrens, W., Jahn, I., Bohn-Audorff, U., Jockel, K.H. (2002), “Asbestos fibre years and lung cancer: A two-phase case-control study with expert exposure assessment”, Occupational and Environmental Medicine, 59, 410-414.

[678] Polak, E. (1971), Computational Methods in Optimization, New York, San Francisco, London: Academic Press, Inc.

[679] Powell, M.J.D. (1970), “A hybrid method for nonlinear equations”, in Numerical Methods for Nonlinear Algebraic Equations, ed. by Rabinowitz; Gordon and Breach.

[680] Powell, M.J.D. (1977), “Restart Procedures for the Conjugate Gradient Method”, Mathematical Programming, 12, 241-254.

[681] Powell, M.J.D. (1978a), “A fast algorithm for nonlinearly constrained optimization calculations”, in Numerical Analysis, Dundee 1977, Lecture Notes in Mathematics 630, ed. G.A. Watson, Springer Verlag, Berlin, 144-175.

[682] Powell, M.J.D. (1978b), “Algorithms for nonlinear constraints that use Lagrangian functions”, Mathematical Programming, 14, 224-248.

[683] Powell, M.J.D. (1982a), “Extensions to subroutine VF02AD”, in Systems Modeling and Optimization, Lecture Notes in Control and Information Sciences 38, in: R.F. Drenick and F. Kozin (eds.), Springer Verlag, Berlin, 529-538.

[684] Powell, M.J.D. (1982b), “VMCWD: A Fortran subroutine for constrained optimization”, DAMTP 1982/NA4, Cambridge, England.

[685] Powell, M.J.D. (1988), “A tolerant algorithm for linearly constrained optimization calculations”, Report DAMTP/1988/NA17.

[686] Powell, M.J.D. (1992), “A Direct search optimization method that models the objective and constraint functions by linear interpolation”, DAMTP/NA5, Cambridge, England.

[687] Powell, M.J.D. (2000), “UOBYQA: Unconstrained Optimization By Quadratic Approximation”, Report DAMTP 2000/NA14, University of Cambridge.

[688] Powell, M.J.D. (2003), “On the use of quadratic models in unconstrained minimization without derivatives”, Report DAMTP 2003/NA03, University of Cambridge.

[689] Powell, M.J.D. (2014), “On fast trust region methods for quadratic models with linear constraints”, Report DAMTP 2014/NA02, University of Cambridge.

[690] Pregibon, D. (1981), “Logistic Regression Diagnostics”, Annals of Statistics, 9, 705-724.

[691] Prescott, P. (1975), “An approximate test for outliers in linear models”, Technometrics, 17, 129-132.

[692] Quinlan, J.R. (1993), C4.5: Programs for Machine Learning, San Mateo, CA: Morgan Kaufmann.

[693] Rahtz, S. (1998), “Hypertext marks in LaTeX: the hyperref package”, http://www.tug.org

[694] Ramirez, D. E. (2000), ”The Generalized F Distribution”, JSS, 2000.

[695] Ramirez, D.E. & Jensen, D.R. (1991), “Misspecified T^2 Tests. II Series Expansions”, Commun. Statist. - Simula., 20, 97-108.

[696] Ramsay, J. O. (1969), “Some statistical considerations in multidimensional scaling”; Psychometrika, 34, 167-182.

[697] Ramsay, J. O. (1977), “Maximum likelihood estimation in multidimensional scaling”; Psychometrika, 42, 241-266.

[698] Ranner, S., Lindgren, F., Geladi, P. & Wold, S. (1994), “A PLS algorithm for data sets with many variables and few objects”, Journal of Chemometrics, 8, 111-125.

[699] Rao, C.R. & Mitra, S.K. (1971), Generalized Inverse of Matrices and Its Applications, New York: John Wiley & Sons.

[700] Rasch, G. (1960), Probabilistic models for some intelligence and attainment tests; Danmarks Pedagogiske Institut,Copenhagen.

[701] Reed, B. C. (1989), “Linear least-squares fits with errors in both coordinates”; American Journal of Physics, 57,642-646.

[702] Reed, B. C. (1992), “Linear least-squares fits with errors in both coordinates II: Comments on parameter variances”;American Journal of Physics, 61, 59-62.

[703] Reinelt, G. (1991), “TSPLIB - A Traveling Salesman Problem Library”, ORSA J. Comput., 3-4, 376-385.

[704] Richardson, J.D. (2011), “Nonlinear pseudo random number generation using digit isolation and entropy buffers”;submitted to JSS, 2011.

[705] Rohlf, F. J. (1978), “A probabilistic minimum spanning tree algorithm”, Information Processing Letters, 7, 44-48.

[706] Rorabacher, D.B. (1991), “Statistical treatment for rejection of deviant values: Critical values of Dixon Q parameterand related subrange ratios at the 95 percent confidence level”, Analytical Chemistry, 63, 139-146.

[707] Rosenbrock, H.H. (1960), “An Automatic Method for Finding the Greatest or Least Value of a Function”, Computer Journal, 3, 175-184.

[708] Rosenkrantz, D. J., Stearns, R.E. & Lewis, P. M. (1977), “An analysis of several heuristics for the traveling salesman problem”, SIAM Journal on Computing, 6, 563-581.

[709] Rosenstein, M.T., Collins, J. J., De Luca, C. J. (1993), “A practical method for calculating largest Lyapunov exponents from small data sets”, Physica, D 65, 117.

[710] Rosipal, R. & Trejo, L.J. (2001), “Kernel partial least squares regression in reproducing kernel Hilbert space”,Journal of Machine Learning, 2, 97-123.

[711] Rousseeuw, P.J. (1984), “Least Median of Squares Regression”, Journal of the American Statistical Association,79, 871-880.

[712] Rousseeuw, P.J. (1985), “Multivariate Estimation with High Breakdown Point”, in Mathematical Statistics and Applications, Dordrecht: Reidel Publishing Company, pp. 283-297.

[713] Rousseeuw, P.J. & Croux, C. (1993), “Alternatives to the Median Absolute Deviation”, Journal of the American Statistical Association, 88, 1273-1283.

[714] Rousseeuw, P.J. & Hubert, M. (1997), “Recent developments in PROGRESS”, in: Y. Dodge (ed.): L1-Statistical Procedures and Related Topics, Institute of Mathematical Statistics, Lecture Notes, Vol. 31.

[715] Rousseeuw, P.J. & Leroy, A.M. (1987), Robust Regression and Outlier Detection, New York: John Wiley & Sons.

[716] Rousseeuw, P.J. & Van Driessen, K. (1997), “A fast Algorithm for the Minimum Covariance Determinant Estimator”, paper presented at the Third International Conference on the L1 Norm and Related Methods, Neuchatel, Switzerland.

[717] Rousseeuw, P.J. & Van Zomeren, B.C. (1990), “Unmasking Multivariate Outliers and Leverage Points”, Journal of the American Statistical Association, 85, 633-639.

[718] Royston, J. P. (1983), “Some techniques for assessing multivariate normality based on the Shapiro-Wilk W”, Applied Statistics, 32, 121-133.

[719] Royston, J. P. (1992), “Approximating the Shapiro-Wilk W test for non-normality”, Statistics and Computing, 2,117-119.

[720] Rubin, H. & Johnson, B.C. (), “Efficient generation of exponential and normal deviates”; Journal of Statistical Computation and Simulation, 76, 509-518.

[721] Rudd, A. and Rosenberg, B. (1979), ”Realistic Portfolio Optimization”, TIMS Studies in Management Sciences,11, 21-46.

[722] Saad, Y. (1996), Iterative Methods for Sparse Linear Systems, Boston: PWS Publishing Company.

[723] Sachs, L. (1974), Angewandte Statistik, Berlin, Heidelberg, New York: Springer Verlag.

[724] Samejima, F. (1969), “Calibration of latent ability using a response pattern of graded scores”; Psychometrika Monograph Supplement, No. 17.

[725] Sarle, W. S. (1983), “Cubic Clustering Criterion”; SAS Technical Report A-108, Cary NC: SAS Institute Inc.

[726] Sarle, W. S. (1994), “TREEDISC Macro”, Cary NC: SAS Institute Inc.

[727] Sarle, W. S. (1995), The STDIZE SAS Macro, Cary NC: SAS Institute Inc.

[728] Sarle, W.S. (2000), The Jackboot Macro, Cary NC: SAS Institute Inc.

[729] SAS/IML® Software, (1989), Version 6, First Ed., SAS Institute Inc., Cary, NC.

[730] SAS/STAT® User’s Guide, (1990), Version 6, Second Printing, SAS Institute Inc., Cary, NC.

[731] The SAS® System (2000), Version 8, SAS Institute Inc., Cary, NC.

[732] SAS Enterprise Miner documentations for PROC ASSOC, SEQUENCE, RULEGEN, etc. version 9.1.3 and 4.3:http://support.sas.com/documentation/onlinedoc/miner/emtmsas913/TW10113.pdf

http://support.sas.com/documentation/onlinedoc/miner/em43/assoc.pdf

http://support.sas.com/documentation/onlinedoc/miner/em43/rulegen.pdf

http://support.sas.com/documentation/onlinedoc/miner/em43/sequence.pdf

http://support.sas.com/documentation/onlinedoc/miner/em43/dmvq.pdf

http://support.sas.com/documentation/onlinedoc/miner/em43/neural.pdf

http://support.sas.com/documentation/onlinedoc/miner/em43/split.pdf

[733] SAS Institute Inc. (1990), SAS Language: Reference, Version 6, First Edition, Cary, NC: SAS Institute Inc.

[734] SAS Procedures Guide, (1990), Version 6, Third Ed., SAS Institute Inc., Cary, NC.

[735] SAS Institute Inc. (1995), SAS Technical Report P-250, SAS/IML® Software: Changes and Enhancements, through Release 6.11; SAS Institute Inc., Cary, N.C.

[736] Sasieni, M., Yaspan, A., & Friedman, L. (1968), Methoden und Probleme der Unternehmensforschung, Berlin: Verlag Die Wirtschaft.

[737] Satorra, A. & Bentler, P.M. (1994), “Corrections to test statistics and standard errors in covariance structure analysis”, in: Latent Variables Analysis, A. von Eye & C. C. Clogg (ed.), Thousand Oaks: Sage Publications.

[738] Satterthwaite, F. W. (1946), “An approximate distribution of estimates of variance components”; Biometrics Bulletin, 2, 110-114.

[739] Schill, W., Enders, D., & Drescher, K. (2013), “sas-twophase-package: A SAS Package for logistic two-phase studies”, paper and software submitted to JSS. Software can be downloaded from: www.bips.uni-bremen.de/sastwophase

[740] Schittkowski, K. (1978), “An adaptive precision method for the numerical solution of constrained optimization problems applied to a time-optimal heating process”, in Proceedings of the 8th IFIP Conference on Optimization Techniques, Springer Verlag, Heidelberg, New York.

[741] Schill, W., Wild, P., & Pigeot, I. (2007), “A planning tool for two-phase case-control studies”, Computer Methods and Programs in Biomedicine, 88, 175-181.

[742] Schittkowski, K. (1987), More Test Examples for Nonlinear Programming Codes, Lecture Notes in Economics and Mathematical Systems 282, Springer Verlag, Berlin-Heidelberg-New York.

[743] Schittkowski, K. & Stoer, J. (1979), “A Factorization Method for the Solution of Constrained Linear Least Squares Problems Allowing Subsequent Data Changes”, Numer. Math., 31, 431-463.

[744] Schmid, J. & Leiman, J.M. (1957), ”The development of hierarchical factor solutions”, Psychometrika, 22, 53-61.

[745] Schnabel, R.B. & Eskow, E.A. (1990), “A new modified Cholesky factorization”, SIAM Journal on Scientific and Statistical Computation, 11, 1136-1158.

[746] Scholkopf, B. (2000), “Statistical Learning and Kernel Methods”, Technical Report MSR-TR-2000-23, Microsoft Research Limited, Cambridge UK.

[747] Schoenemann, P. H. (1972), “An algebraic solution for a class of subjective metrics models”; Psychometrika, 37, 441-451.

[748] Schrage, L. (1979), “A more portable random number generator”, ACM TOMS, 5, 132-138.

[749] Schreiber, T. and Schmitz, A. (1996), “Improved Surrogate Data for Nonlinearity Tests”, Physical Review Letters,77, 635-638.

[750] Schrepp, M. (1999), “On the empirical construction of implications on bi-valued test items”; Mathematical Social Sciences, 38, 361-375.

[751] Schrepp, M. (2003), “A method for the analysis of hierarchical dependencies between items of a questionnaire”, Methods of Psychological Research, 19, 43-79.

[752] Schrepp, M. (2006), ITA 2.0: A program for classical and inductive item tree analysis, JSS 2006.

[753] Searle, S.R. (1971), Linear Models, New York: John Wiley & Sons.

[754] Serneels, S., Filzmoser, P., Croux, C., & Van Espen, P.J. (2004), “Robust continuum regression”; Technical Report.

[755] Shaffer, J.P. (1995), “Multiple hypothesis testing”, Annual Review of Psychology, 46, 561-576.

[756] Shanaz, F., Berry, M.W., Pauca, V.P. & Plemmons, R.J. (2004), “Document clustering using nonnegative matrix factorization”; J. of Information Processing and Management, to appear.

[757] Shao, J. & Tu, D. (1995), The Jackknife and Bootstrap, New York: Springer Verlag.

[758] Shapiro, S. S. & Wilk, M. B. (1965), “An analysis of variance test for normality (Complete Samples)”, Biometrika,52, 591-611.

[759] Sharpe, W.F. (1987), “An algorithm for portfolio improvement”, in: K.D. Lawrence, J.B. Guerard Jr, and G.R. Reeves (ed.), Advances in Mathematical Programming and Financial Planning, London: JAI Press Inc., 155-169.

[760] Shechter, G. (2004), Matlab package kDtree

[761] Shepard, R. N. (1962), “Analysis of proximities: Multidimensional scaling with an unknown distance function”;Psychometrika, 27, 125-140, 219-246.

[762] Sheppard, K. (2009), Oxford MFE Toolbox, Oxford.

[763] Silvapulle, M. J. & Sen, P. K. (2005): Constrained Statistical Inference, New Jersey: Wiley.

[764] Simard, R. & L’Ecuyer, P. (2010), “Computing the Two-Sided Kolmogorov-Smirnov Distribution”, Journal of Statistical Software.

[765] Simonetti, N. (1998), “Subroutines for dynamic program for the Traveling Salesman Problem”,www.contrib.andrew.cmu.edu/ neils/tsp/index.html

[766] Sison, C.P. & Glaz, J. (1995), “Simultaneous confidence intervals and sample size determination for multinomial proportions”, JASA, 90, 366-369.

[767] Skibicki, M. & Wywial, J. (2000), “On optimal sample allocation in strata”; Technical Report, Dept. of Statistics,University of Economics, Katowice.

[768] Small, N. (1980), “Marginal skewness and kurtosis in testing multivariate normality”, Applied Statistics, 29, 85-87.

[769] Small, N. (1985), “Multivariate normality, testing for”, in Kotz, S., Johnson, N.L., & Read, C.B. (eds.), Encyclopedia of Statistical Sciences, 6, Amsterdam: North Holland.

[770] Smith, B.T., Boyle, J.M., Dongarra, J.J., Garbow, B.S., Ikebe, Y., Klema, V.C. & Moler, C.B. (1976), Matrix Eigensystem Routines - EISPACK Guide, Lecture Notes in Computer Science, Vol. 6, 2nd ed., Springer Verlag, Berlin.

[771] Sobel, R.A. (1982), “Asymptotic confidence intervals for indirect effects in structural equations”, in: Sociological Methodology, S. Leinhardt (ed.), Washington, DC: American Sociological Association.

[772] Sobel, R.A. (1986), “Some new results on indirect effects and their standard errors in covariance structure models”,in: Sociological Methodology, N.B. Tuma (ed.), Washington, DC: American Sociological Association.

[773] Somerville, P.N. (1997), “Multiple testing and simultaneous confidence intervals: calculation of constants”, Computational Statistics and Data Analysis, 25, 217-233.

[774] Somerville, P.N. (1998), “Numerical computation of multivariate normal and multivariate-t probabilities over convex regions”, Journal of Computational and Graphical Statistics.

[775] Somerville, P.N. & Bretz, F. (2001), “FORTRAN 90 and SAS-IML programs for computation of critical values for multiple testing and simultaneous confidence intervals”, Journal of Statistical Software.

[776] Spaeth, H. (1987), Mathematische Software zur Linear Regression, Munchen: R. Oldenbourg Verlag GmbH.

[777] Srivastava, J. (1975), “Designs for searching non-negligible effects”, in: A Survey of Statistical Design and Linear Models, 507-519. Amsterdam: Elsevier, North Holland.

[778] Stadlober, E. (1989), “Ratio of uniforms as a convenient method for sampling from classical discrete distribution”,Proceedings of the 21st conference on Winter simulation.

[779] Stadlober, E. & Zechner, H. (1999), “The patchwork rejection technique for sampling from unimodal distributions”,ACM Transactions on Modeling and Computer Simulation, 9, 59-80.

[780] Stefanski, L.A. & Cook, J. R. (1994), “Simulation-Extrapolation: The Measurement Error Jackknife”, JASA 90,1247-1256.

[781] Spelucci, P., & Hartmann, W. (1999), “A QR decomposition for matrix pencils”, BIT, 40, 183-189.

[782] Steiger, J.H. & Lind, J.C. (1980), ”Statistically Based Tests for the Number of Common Factors”, paper presentedat the annual meeting of the Psychometric Society, Iowa City, IA.

[783] Stiefel, E.L. (1963), An Introduction to Numerical Mathematics, New York: Academic Press.

[784] Stine, R. (1989), “An introduction to the bootstrap methods: Examples and ideas”, Sociological Methods and Research, 18, 243-291.

[785] Stoppiglia, H., Dreyfus, G., Dubois, R., & Oussar, Y. (2003), “Ranking a random feature for variable and feature selection”, Journal of Machine Learning Research, 3, 1399-1414.

[786] Stromberg, A.J. (1993), “Computation of high breakdown nonlinear regression parameters”, JASA, 88, 237-244.

[787] Suykens, J.A.K. & Vandevalle, J. (1999), “Least squares support vector classifiers”, Neural Processing Letters, 9, 293-300.

[788] Suykens, J.A.K., Lukas, L., Van Dooren, P., De Moor, B., & Vandewalle J. (1999), “Least squares support vector machine classifiers: a large scale algorithm”, in Proc. of the European Conference on Circuit Theory and Design (ECCTD’99), Stresa, Italy; 839-842. Technical Report, Kath. University of Leeuven.

[789] Svetnik, V., Liaw, A., Tong, C. & Wang, T. (2004), “Application of Breiman’s random forest to modeling structure-activity relationships of pharmaceutical molecules”; in F. Roll, J. Kittler, & T. Windeatt (eds.): MCS 2004, LNCS 3077, 334-343.

[790] Swaminathan, H. (1974), “A General Factor Model for the Description of Change”, Report LR-74-9, Laboratory of Psychometric and Evaluative Research, University of Massachusetts.

[791] Swarztrauber, P.N. (1982), “Vectorizing the FFT’s”, in Rodriguez (ed.) Parallel Computations, 51-83, New York: Academic Press.

[792] Szekely, G. J. & Rizzo, M. (2005), “A new test for multivariate normality,” Journal of Multivariate Analysis, 93,58-80.

[793] Tagliasacchi, A. (2008), Matlab package kDtree

[794] Takane, Y. (1977), “On the relations among four methods of multidimensional scaling”; Behaviormetrika, 4, 29-42.

[795] Takane, Y., Young, F. W. & de Leeuw, J. (1977), “Nonmetric individual differences multidimensional scaling: An alternating least squares method with optimal scaling features”; Psychometrika, 42, 7-67.

[796] Takane, Y., Young, F. W. & de Leeuw, J. (1980), “An individual differences additive model: An alternating least squares method with optimal scaling features”; Psychometrika, 45, 183-209.

[797] Talebi, H. & Esmailzadeh, N. (2011a), “Using Kullback-Leibler distance for performance evaluation of search designs”, Bulletin of the Iranian Mathematical Society, 37, 269-279.

[798] Talebi, H. & Esmailzadeh, N. (2011b), “Weighted searching probability for classes of equivalent search design comparison”, Communication in Statistics: Theory and Methods, 40, 635-647.

[799] Tarjan, R.E. (1972), “Depth first search and linear graph algorithms”, SIAM Journal on Computing, 1, 146-160.

[800] Thayananthan, A. (2005), “Template based pose estimation and tracking of 3D hand motion”, PhD Thesis, Dept. of Engineering, University of Cambridge.

[801] Theiler, J., Eubank, S., Longtin, A., Galdrikian, B. (1992a), “Testing for Nonlinearity in Time Series: the Method of Surrogate Data”, Physica D, 58, 77-94.

[802] Theiler, J. et al (1992b), “Using Surrogate Data to Detect Nonlinearity in Time Series”; in “Nonlinear Modeling and Forecasting”, edt. Casdagli, M. & Eubank, S., Addison-Wesley, Reading, MA, 163-188.

[803] Therneau, T.M. & Grambsch, P.M. (2000), Modeling Survival Data, Extending the Cox Model, New York, Springer.

[804] Thissen, D. & Steinberg, L. (1986), “A taxonomy of item response models”; Psychometrika, 51, 567-577.

[805] Thompson, R. (1985), “A note on restricted maximum likelihood estimation with an alternative outlier model”; Journal of the Royal Statistical Society, Ser. B, 47, 53-55.

[806] Thurstone, L.L. (1931), “Multiple factor analysis”; Psych. Rev., 38, 406-427.

[807] Tibshirani, R. (1996), “Regression shrinkage and selection via the Lasso”, J. Royal Stat. Soc., Ser. B, 58, 267-288.

[808] Tipping, M. E. (2001), “Sparse Bayesian learning and the relevance vector machine”, The Journal of Machine Learning Research, 1, 211-244.

[809] Tobler, W.R. (1965), “Computation of the correspondence of geographical patterns”, Papers of the Regional Science Association, 15, 131-139.

[810] Tobler, W. R. (1966), “Medieval distortions: Projections of ancient maps”, Annals of the Association of American Geographers, 56, 351-360.

[811] Tobler, W. R. (1994), “Bidimensional regression”, Geographical Analysis, 26, 187-212.

[812] Tomizawa, S. (1987), “Decomposition for 2-ratio-parameter symmetry model in square contingency tables with ordered categories”; Biom. Journal, 1, 45-55.

[813] Torgerson, W. S. (1958), Theory and Methods of Scaling, New York: Wiley.

[814] Trindade, A. A. (2003), “Implementing modified Burg algorithms in multivariate subset autoregressive modeling”, JSS, 2003.

[815] Trujillo-Ortiz, A., Hernandez-Walls, R., Barba-Rojo, K., & Castro-Perez, A. (2007a), “AnDartest: Anderson-Darling test for assessing normality of sample data”, http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectID=14807.

[816] Trujillo-Ortiz, A., Hernandez-Walls, R., Barba-Rojo, K., & Cupul-Magana, L. (2007b), “HXmvntest: Henze-Zirkler’s multivariate normality test”, http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectID=17931.


[817] Tucker, L.R. & Lewis, C. (1973), “A reliability coefficient for maximum likelihood factor analysis”, Psychometrika, 38, 1-10.

[818] Tukey, J.W. (1977a), Data Analysis and Regression, Reading: Addison-Wesley.

[819] Tukey, J.W. (1977b), Exploratory Data Analysis, Reading: Addison-Wesley.

[820] Tyler, D.E. (1987), “A distribution-free M-estimator of multivariate scatter”, The Annals of Statistics, 15, 234-251.

[821] Van der Voet, H. (1994), “Comparing the predictive accuracy of models using a simple randomization test”, Chemometrics and Intelligent Laboratory Systems, 25, 313-323.

[822] van der Vorst, H. (2000), Iterative Methods for Large Linear Systems, Utrecht: Utrecht University.

[823] Van Gestel, T., Suykens, J., De Brabanter, J., De Moor, B., & Vandewalle, J. (2001), “Kernel Canonical Correlation Analysis and Least Squares Support Vector Machines”, in Proc. of the International Conference on Artificial Neural Networks (ICANN 2001), Vienna, Austria; 381-386.

[824] Van Gestel, T., Suykens, J., Lanckriet, G., Lambrechts, A., De Moor, B., & Vandewalle, J. (2002), “Multiclass LS-SVMs: Moderated outputs and coding-decoding schemes”, Neural Processing Letters, 15, 45-48. Technical Report, Kath. University of Leuven.

[825] Van Huffel, S. & Vandewalle, J. (1991), The Total Least Squares Problem, SIAM Philadelphia, PA.

[826] van Leeuwe, J. F. J. (1974), “Item Tree analysis”, Nederlands Tijdschrift voor de Psychologie, 29, 475-484.

[827] Vansina, F. & De Greve, J.P. (1982), “Close binary systems before and after mass transfer”, Astrophys. Space Sci., 87, 377-401.

[828] Vapnik, V.N. (1995), The Nature of Statistical Learning Theory, New York: Springer.

[829] Venables, W. N., & Ripley, B. D. (1994), “Modern Applied Statistics with S-Plus”; New York: Springer.

[830] Venzon, D.J. & Moolgavkar, S.H. (1988), “A Method for Computing Profile-Likelihood-Based Confidence Intervals”,Applied Statistics, 37, 87-94.

[831] Vesanto, J., Himberg, J., Alhoniemi, E., & Parhankangas, J. (2000), “SOM Toolbox for Matlab 5”; Technical Report A57, Helsinki University of Technology.

[832] Volgenant, A. & van den Hout, W. B. (1990), “TSP1 and TSP2 - Symmetric traveling salesman problem for personal computers”, Technical Report with Borland Pascal implementation.

[833] Wahba, G. (1990), Spline Models for Observational Data, Series in Applied Mathematics, Vol. 59, SIAM Philadelphia.

[834] Wang, C.C. (2011), “TMVN: A Matlab package for multivariate normality test”, JSS 2011.

[835] Wang, C.C. & Hwang, Y.T. (2011), “A new functional statistic for multivariate normality”, Statistics and Computing, 21, 501-509.

[836] Weber, E. (1972), Grundriss der biologischen Statistik, Jena: VEB Gustav Fischer Verlag.

[837] Weber, E. (1974), Einführung in die Faktorenanalyse, Jena: VEB Gustav Fischer Verlag.

[838] Wedin, P.A. & Lindstrom, P. (1987), Methods and Software for Nonlinear Least Squares Problems, University of Umea, Report No. UMINF 133.87.

[839] Wehrens, R. & Buydens, L. M. C. (2007), “Self and super-organizing maps in R: The Kohonen package”; JSS, 21,2007.

[840] Weisberg, S. (1980), Applied Linear Regression, New York: John Wiley & Sons.

[841] Weisberg, S. (2002), “Dimension reduction regression with R”, JSS, 7, 2002.

[842] Weiss, A.A. (1984), “ARMA Models with ARCH Errors”, Journal of Time Series Analysis, 5, 129-143.

[843] Welch, B. L. (1947), “The generalization of ‘Student’s’ problem when several different population variances are involved”; Biometrika, 34, 28-35.

[844] Welch, B. L. (1951), “On the comparison of several mean values: an alternative approach”; Biometrika, 38, 330-336.

[845] Welch, P. D. (1967), “The use of fast Fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms”; IEEE Transactions on Audio Electroacoustics, AU-15(6), 70-73.


[846] Weston, J., Mukherjee, S., Chapelle, O., Pontil, M., Poggio, T., & Vapnik, V. (2000), “Feature Selection for SVMs”, Neural Information Processing Systems, 13, 668-674.

[847] Wheaton, B., Muthen, B., Alwin, D.F., & Summers, G.F. (1977), “Assessing Reliability and Stability in Panel Models”, in Sociological Methodology, ed. D.R. Heise, San Francisco: Jossey Bass.

[848] White, H. (2000), “A reality check for data snooping”; Econometrica, 68, 1097-1126.

[849] Wiley, D.E. (1973), “The Identification Problem for Structural Equation Models with Unmeasured Variables”, in Structural Equation Models in the Social Sciences, eds. A.S. Goldberger and O.D. Duncan, New York: Academic Press.

[850] Wilkinson, J.H. (1963), Rounding Errors in Algebraic Processes, Prentice Hall, Englewood Cliffs, NJ.

[851] Wilkinson, J.H. (1965), The Algebraic Eigenvalue Problem, Oxford University Press, Oxford.

[852] Wilkinson, J.H. & Reinsch, C. (1971), Handbook for Automatic Computation, Springer Verlag, Heidelberg.

[853] Willems, G., Pison, G., Rousseeuw, P.J. & Van Aelst, S. (2001), “A robust Hotelling test”, Technical Report,University of Antwerp, http://win-www.uia.ac.be/u/statis.

[854] Williams, E. (2014), “Aviation Formulary V1.46”, http://williams.best.vwh.net/avform.htm

[855] Wilson, E.B. & Hilferty, M.M. (1931), “The Distribution of Chi-square”, Proc. Nat. Acad. Sci., 17, 694.

[856] Wold, S. (1994), “PLS for multivariate linear modeling”, in: QSAR: Chemometric Methods in Molecular Design. Methods and Principles in Medicinal Chemistry, ed. H. van de Waterbeemd, Weinheim: Verlag Chemie.

[857] Wold, S. (1996), “Estimation of principal components and related models by iterative least squares”, in: Multivariate Analysis, ed. P.R. Krishnaiah, New York: Academic Press, 391-420.

[858] Wright, S. J. (1997), Primal-Dual Interior Point Methods, SIAM, Philadelphia.

[859] Wright, S. P. (1992), “Adjusted p-values for simultaneous inference”, Biometrics, 48, 1005-1013.

[860] Xie, X-J., Pendergast, J., & Clarke, W. (2008), “Increasing the power: A practical approach to goodness-of-fit test for logistic regression models with continuous predictors”; Computational Statistics and Data Analysis, 52, 2703-2713.

[861] Xie, X-J. & Bian, A. (2009), “A SAS Package for evaluating logistic and proportional odds model fit”; Submission to JSS.

[862] Yang, J. & Honavar, V. (1997), “Feature selection using a genetic algorithm”, Technical Report, Iowa State University.

[863] Yates, F. (1960), Sampling Methods for Censuses and Surveys, London: Griffin & Company Ltd.

[864] Young, F. W. (1970), “Nonmetric multidimensional scaling: Recovery of metric information”; Psychometrika, 35, 455-473.

[865] Young, F. W. (1975), “Methods of describing ordinal data with cardinal models”, Journal of Mathematical Psychology, 12, 416-436.

[866] Young, F. W. (1987), Multidimensional scaling: History, Theory, and Applications; Hillsdale, NJ: Lawrence Erlbaum.

[867] Yuan, K.-H., Guarnaccia, C.A. & Hayslip Jr., B. (2003), “A study of the distribution of sample coefficient alpha with the Hopkins symptom checklist: Bootstrap versus asymptotics”; Educational and Psychological Measurement, 63, 5-23.

[868] Zamar, D., Graham, J., & McNeney, B. (2007), “elrm: Software implementing exact-like inference for logistic regression models”; JSS, 21.

[869] Zamar, D., Graham, J., & McNeney, B. (2013), “Package elrm”, in CRAN.

[870] Zhang, H.H. (2006), “Variable selection for support vector machines via smoothing spline ANOVA”, Statistica Sinica, 16(2), 659-674.

[871] Zhang, H.H., Ahn, J., Lin, X., & Park, C. (2006), “Gene selection using support vector machines with nonconvex penalty”, Bioinformatics, 22, 88-95.

[872] Zhang, X., Loberiza, F.R., Zhang, M.J. & Klein, J.P. (2007), “A SAS Macro for estimation of direct adjusted survival curves based on a stratified Cox regression model”; Computer Methods and Programs in Biomedicine, 88, 95-101.


[873] Zhu, C., Byrd, R.H., Lu, P., & Nocedal, J. (1994), “L-BFGS-B: FORTRAN Subroutines for Large Scale Bound Constrained Optimization”, Tech. Report NAM-11, EECS Department, Northwestern University.

[874] Zhu, J., Rosset, S., Hastie, T., & Tibshirani, R. (2003), “1-norm support vector machines”, Neural Information Processing Systems, 16, 49-56.

[875] Ziff, R. M. (1998), “Four-tap shift-register-sequence random-number generators”, Computers in Physics, 12, 385-392.

[876] Zou, H., Hastie, T., & Tibshirani, R. (2004), “Sparse principal component analysis”; Technical Report, StanfordUniversity.


Chapter 5

Index


AlBaali, 217
asymmetry, 450

Beale, 216
Birgin and Martinez, 209
Broyden-Fletcher-Goldfarb-Shanno, 209

COBYLA, 193, 207
Computational Problems, 257
Computer Resources, 260
conjugate descent, 209
conjugate gradient, 207
Conn, 193
contingency table tests, 450
Convergence Problems, 258

Davidon-Fletcher-Powell, 209
Dennis, 201, 207, 213, 215, 221
double dogleg method, 207
DQNSQP, 193

FDINT option, 220
Fletcher, 203, 204, 207, 216, 217
Fletcher-Reeves, 209

Gay, 201, 208, 213, 215, 221
Goldfarb & Idnani, 192
Gould, 193
gradient, 218

Hald, 195, 209, 218
Hartmann, 195, 209, 218
Hessian matrix, 218
Hessian Scaling, 221
hybrid quasi-Newton, 207

Jacobian matrix, 218

Kappa, Mardia Based, 168
kernel regression, 353
Kurtosis, Adjusted Univariate, 168
Kurtosis, Corrected Univariate, 167
Kurtosis, Mardia’s Multivariate, 167
Kurtosis, Mean Scaled Univariate, 168
Kurtosis, Normalized Multivariate, 167
Kurtosis, Relative Multivariate, 167

Kurtosis, Uncorrected Univariate, 167

Levenberg-Marquardt, 208
Lindstrom, 217

Madsen, 195, 209, 218
Mei, 207, 215
More, 201, 208, 213, 221
multivariate kurtosis, 167

Nelder-Mead Simplex, 208
Newton-Raphson, 208
NLP options, 196
non-independence, 450
null space option, 192

Optimization Techniques, Summary, 211
Overflow, 257

Pinar, 195, 209, 218
Polak-Ribiere, 209
Powell, 192, 193, 207, 209, 211, 212, 214, 216, 258
Powell-Beale, 209
precision, 259

QUADPEN, 193
quasi-Newton, 208

range space option, 192

Scaling, 221
SEM Example 1, 149
sem: AGFI index, 162
sem: AIC index, 164
sem: alphaecv, 164
sem: alpharms, 163
sem: assessment of fit, 159
sem: automatic variable selection, 169
sem: Bentler-Bonett index, 162, 165
sem: Bollen index, 165
sem: CAIC index, 164
sem: CENT index, 164
sem: Centrality index, 164
sem: CFA, 157
sem: CFI index, 164
sem: chi-square, 162, 163


sem: chi-square, adjusted, 164
sem: chi-square, reweighted, 164
sem: Close Fit Probability, 163
sem: comparative index, 164
sem: confirmatory factor analysis, 157
sem: COSAN model, 155
sem: degrees of freedom, 171
sem: determination index, 166
sem: df, 162
sem: ECV Index, 163
sem: endogenous variables, 172
sem: EQS model, 153
sem: exogenous manifest variables, 171
sem: factor analysis, 157
sem: GFI index, 161
sem: goodness of fit, 159
sem: Hoelter index, 166
sem: initial values, 169
sem: instrumental FA, 169
sem: Kappa, Multivariate Least-Squares, 168
sem: Kappa, Multivariate Mean, 168
sem: likelihood ratio test, 162
sem: LISREL model, 156
sem: LISREL program, 161, 166, 170
sem: Path Diagram, 153, 158
sem: path diagram, 151
sem: PGFI index, 162
sem: PNFI, 165
sem: predetermined moments, 171
sem: predicted model matrix, 171
sem: R-squared, 166
sem: RAM Model, 152
sem: RMR index, 162
sem: RMSEA index, 163
sem: SBC index, 164
sem: two-stage LS, 169
sem: weighted least-squares, 168
sem: Z-Test, 165
Skewness, Corrected Univariate, 167
Skewness, Uncorrected Univariate, 167
Sorensen, 208, 213
Stationary Point, 259
support vector machines, 353

trust region, 208

VMCWD, 193

Wedin, 217
Welsch, 201, 213, 221

Xu, 207, 217