Introduction to Nonsmooth Optimization


Adil Bagirov • Napsu Karmitsa • Marko M. Mäkelä

Introduction to Nonsmooth Optimization
Theory, Practice and Software

Adil Bagirov
School of Information Technology and Mathematical Sciences,
Centre for Informatics and Applied Optimization
University of Ballarat
Ballarat, VIC, Australia

Napsu Karmitsa
Marko M. Mäkelä
Department of Mathematics and Statistics
University of Turku
Turku, Finland

ISBN 978-3-319-08113-7
ISBN 978-3-319-08114-4 (eBook)
DOI 10.1007/978-3-319-08114-4

Library of Congress Control Number: 2014943114

Springer Cham Heidelberg New York Dordrecht London

© Springer International Publishing Switzerland 2014

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Preface

Nonsmooth optimization refers to the general problem of minimizing (or maximizing) functions that are typically not differentiable at their minimizers (maximizers). Such functions arise in many applied fields, for example in image denoising, optimal control, neural network training, data mining, economics, and computational chemistry and physics. Since the classical theory of optimization presumes certain differentiability and strong regularity assumptions on the functions to be optimized, it cannot be utilized directly. The aim of this book is to provide an easy-to-read introduction to the theory of nonsmooth optimization and to present the current state of numerical nonsmooth optimization. In addition, the most common cases where nonsmoothness is involved in practical computations are introduced. In preparing this book, all efforts have been made to ensure that it is self-contained.
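
Even the simplest example shows why differentiability fails exactly where it matters: the absolute value function is convex and continuous everywhere, yet its unique minimizer is the one point where its derivative does not exist. The following display is an elementary illustration of this standard fact (the subdifferential used here is developed in Chapter 2); it is not an excerpt from the book's text:

```latex
% f is minimized precisely where it is not differentiable.
\[
  f(x) = |x|, \qquad
  f'(x) =
  \begin{cases}
    -1, & x < 0, \\
    \hphantom{-}1, & x > 0,
  \end{cases}
\]
% f'(0) does not exist, yet x^* = 0 is the unique minimizer. The subdifferential
% \partial f(0) = [-1, 1] replaces the missing derivative, and the nonsmooth
% optimality condition 0 \in \partial f(0) certifies the minimum.
```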

The book is organized into three parts: Part I deals with nonsmooth optimization theory. We first provide an easy-to-read introduction to convex and nonconvex analysis with many numerical examples and illustrative figures. Then we discuss nonsmooth optimality conditions from both analytical and geometrical viewpoints. We also generalize the concept of convexity for nonsmooth functions. At the end of the part, we give brief surveys of different generalizations of subdifferentials and approximations to subdifferentials.

In Part II, we consider nonsmooth optimization problems. First, we introduce some real-life nonsmooth optimization problems, for instance, the molecular distance geometry problem, protein structural alignment, data mining, hemivariational inequalities, the power unit-commitment problem, image restoration, and the nonlinear income tax problem. Then we discuss some formulations which lead to nonsmooth optimization problems even though the original problem is smooth (continuously differentiable); examples here include exact penalty formulations. We also present the maximum eigenvalue problem, which is an important component of many engineering design problems and graph-theoretical applications. We refer to these problems as semi-academic problems. Finally, a comprehensive list of test problems (that is, academic problems) used in nonsmooth optimization is given.


Part III is a guide to nonsmooth optimization software. First, we give short descriptions and the pseudo-codes of the most commonly used methods for nonsmooth optimization. These include different subgradient methods, cutting plane methods, bundle methods, and the gradient sampling method, as well as some hybrid methods and discrete gradient methods. In addition, we introduce some common ways of dealing with constrained nonsmooth optimization problems. We also compare implementations of different nonsmooth optimization methods for solving unconstrained problems. At the end of the part, we provide a table enabling the quick selection of suitable software for different types of nonsmooth optimization problems.

The book is ideal for anyone teaching or attending courses in nonsmooth optimization. As a comprehensible introduction to the field, it is also well suited for self-access learning for practitioners who know the basics of optimization. Furthermore, it can serve as a reference text for anyone, including experts, dealing with nonsmooth optimization.

Acknowledgments: First of all, we would like to thank Prof. Herskovits for giving us the reason to write a book on nonsmooth analysis and optimization: he once asked why the subject remains so elusive in all the books and articles dealing with it, and pointed out the lack of an extensive elementary book.

In addition, we would like to acknowledge Profs. Kuntsevich and Kappel for providing Shor's r-algorithm on their website, as well as Profs. Lukšan and Vlček for providing the bundle-Newton algorithm.

We are also grateful to the following colleagues and students, all of whom have influenced the content of the book: Annabella Astorino, Ville-Pekka Eronen, Antonio Fuduli, Manlio Gaudioso, Kaisa Joki, Sami Kankaanpää, Refail Kasimbeyli, Yury Nikulin, Gurkan Ozturk, Rami Rakkolainen, Julien Ugon, Dean Webb, and Outi Wilppu.

The work was financially supported by the University of Turku (Finland), the Magnus Ehrnrooth Foundation, the Turku University Foundation, Federation University Australia, and the Australian Research Council.

Ballarat and Turku, April 2014

Adil Bagirov
Napsu Karmitsa
Marko M. Mäkelä


Contents

Part I  Nonsmooth Analysis and Optimization

1  Theoretical Background
   1.1  Notations and Definitions
   1.2  Matrix Calculus
   1.3  Hausdorff Metrics
   1.4  Functions and Derivatives

2  Convex Analysis
   2.1  Convex Sets
        2.1.1  Convex Hulls
        2.1.2  Separating and Supporting Hyperplanes
        2.1.3  Convex Cones
        2.1.4  Contingent and Normal Cones
   2.2  Convex Functions
        2.2.1  Level Sets and Epigraphs
        2.2.2  Subgradients and Directional Derivatives
        2.2.3  ε-Subdifferentials
   2.3  Links Between Geometry and Analysis
        2.3.1  Epigraphs
        2.3.2  Level Sets
        2.3.3  Distance Function
   2.4  Summary
   Exercises

3  Nonconvex Analysis
   3.1  Generalization of Derivatives
        3.1.1  Generalized Directional Derivative
        3.1.2  Generalized Subgradients
        3.1.3  ε-Subdifferentials
        3.1.4  Generalized Jacobians
   3.2  Subdifferential Calculus
        3.2.1  Subdifferential Regularity
        3.2.2  Subderivation Rules
   3.3  Nonconvex Geometry
        3.3.1  Tangent and Normal Cones
        3.3.2  Epigraphs and Level Sets
        3.3.3  Cones of Feasible Directions
   3.4  Other Generalized Subdifferentials
        3.4.1  Quasidifferentials
        3.4.2  Relationship Between Quasidifferential and Clarke Subdifferential
        3.4.3  Codifferentials
        3.4.4  Basic and Singular Subdifferentials
   3.5  Summary
   Exercises

4  Optimality Conditions
   4.1  Unconstrained Optimization
        4.1.1  Analytical Optimality Conditions
        4.1.2  Descent Directions
   4.2  Geometrical Constraints
        4.2.1  Geometrical Optimality Conditions
        4.2.2  Mixed Optimality Conditions
   4.3  Analytical Constraints
        4.3.1  Geometrical Optimality Conditions
        4.3.2  Fritz John Optimality Conditions
        4.3.3  Karush-Kuhn-Tucker Optimality Conditions
   4.4  Optimality Conditions Using Quasidifferentials
   4.5  Summary
   Exercises

5  Generalized Convexities
   5.1  Generalized Pseudoconvexity
   5.2  Generalized Quasiconvexity
   5.3  Relaxed Optimality Conditions
        5.3.1  Unconstrained Optimization
        5.3.2  Geometrical Constraints
        5.3.3  Analytical Constraints
   5.4  Summary
   Exercises

6  Approximations of Subdifferentials
   6.1  Continuous Approximations of Subdifferential
   6.2  Discrete Gradient and Approximation of Subgradients
   6.3  Piecewise Partially Separable Functions and Computation of Discrete Gradients
        6.3.1  Piecewise Partially Separable Functions
        6.3.2  Chained and Piecewise Chained Functions
        6.3.3  Properties of Piecewise Partially Separable Functions
        6.3.4  Calculation of the Discrete Gradients
   6.4  Summary
   Exercises

Notes and References

Part II  Nonsmooth Problems

7  Practical Problems
   7.1  Computational Chemistry and Biology
        7.1.1  Polyatomic Clustering Problem
        7.1.2  Molecular Distance Geometry Problem
        7.1.3  Protein Structural Alignment
        7.1.4  Molecular Docking
   7.2  Data Analysis
        7.2.1  Cluster Analysis via NSO
        7.2.2  Piecewise Linear Separability in Supervised Data Classification
        7.2.3  Piecewise Linear Approximations in Regression Analysis
        7.2.4  Clusterwise Linear Regression Problems
   7.3  Optimal Control Problems
        7.3.1  Optimal Shape Design
        7.3.2  Distributed Parameter Control Problems
        7.3.3  Hemivariational Inequalities
   7.4  Engineering and Industrial Applications
        7.4.1  Power Unit-Commitment Problem
        7.4.2  Continuous Casting of Steel
   7.5  Other Applications
        7.5.1  Image Restoration
        7.5.2  Nonlinear Income Tax Problem

8  Semi-Academic Problems
   8.1  Exact Penalty Formulation
   8.2  Integer Programming with Lagrange Relaxation
        8.2.1  Traveling Salesman Problem
   8.3  Maximum Eigenvalue Problem

9  Academic Problems
   9.1  Small Unconstrained Problems
   9.2  Bound Constrained Problems
   9.3  Linearly Constrained Problems
   9.4  Large Problems
   9.5  Inequality Constrained Problems

Notes and References

Part III  Nonsmooth Optimization Methods

10  Subgradient Methods
    10.1  Standard Subgradient Method
    10.2  Shor's r-Algorithm (Space Dilation Method)

11  Cutting Plane Methods
    11.1  Standard Cutting Plane Method
    11.2  Cutting Plane Method with Proximity Control

12  Bundle Methods
    12.1  Proximal Bundle and Bundle Trust Methods
    12.2  Bundle Newton Method

13  Gradient Sampling Methods
    13.1  Gradient Sampling Method

14  Hybrid Methods
    14.1  Variable Metric Bundle Method
    14.2  Limited Memory Bundle Method
    14.3  Quasi-Secant Method
    14.4  Non-Euclidean Restricted Memory Level Method

15  Discrete Gradient Methods
    15.1  Discrete Gradient Method
    15.2  Limited Memory Discrete Gradient Bundle Method

16  Constraint Handling
    16.1  Exact Penalty
    16.2  Linearization

17  Numerical Comparison of NSO Softwares
    17.1  Solvers
    17.2  Problems
    17.3  Termination, Parameters, and Acceptance of Results
    17.4  Results
          17.4.1  Extra-Small Problems
          17.4.2  Small-Scale Problems
          17.4.3  Medium-Scale Problems
          17.4.4  Large Problems
          17.4.5  Extra-Large Problems
          17.4.6  Convergence Speed and Iteration Path
    17.5  Conclusions

References

Index


Acronyms and Symbols

R^n                     n-dimensional Euclidean space
N                       Set of natural numbers
x, y, z                 (column) Vectors
x^T                     Transposed vector
x^T y                   Inner product of x and y
||x||                   Norm of x in R^n, ||x|| = (x^T x)^{1/2}
x_i                     ith component of vector x
(x_k)                   Sequence of vectors
0                       Zero vector
a, b, c, α, ε, λ        Scalars
t ↓ 0                   t → 0+
A, B                    Matrices
(A)_{ij}                Element of matrix A in row i and column j
A^T                     Transposed matrix
A^{-1}                  Inverse of matrix A
tr A                    Trace of matrix A
||A||_{m×n}             Matrix norm ||A||_{m×n} = (Σ_{i=1}^m ||A_i||^2)^{1/2}
I                       Identity matrix
e_i                     ith column of the identity matrix
diag[θ_1, ..., θ_n]     Diagonal matrix with diagonal elements θ_1, ..., θ_n
B(x; r)                 Open ball with radius r and central point x
B̄(x; r)                 Closed ball with radius r and central point x
S_1                     Sphere of the unit ball
(a, b)                  Open interval
[a, b]                  Closed interval
[a, b), (a, b]          Half-open intervals
H(p, α)                 Hyperplane
H+(p, α), H-(p, α)      Halfspaces
S, U                    Sets
cl S                    Closure of set S
int S                   Interior of set S
bd S                    Boundary of set S
P(S)                    Power set of S
∩_{i=1}^m S_i           Intersection of sets S_i, i = 1, ..., m
S ∸ U                   Demyanov difference of sets S and U
conv S                  Convex hull of set S
cone S                  Conic hull of set S
ray S                   Ray of the set S
S°                      Polar cone of the set S
K_S(x)                  Contingent cone of set S at x
T_S(x)                  Tangent cone of set S at x
N_S(x)                  Normal cone of set S at x
G_S(x)                  Cone of globally feasible directions of set S at x
F_S(x)                  Cone of locally feasible directions of set S at x
D_S(x)                  Cone of descent directions at x ∈ S
D°_S(x)                 Cone of polar subgradient directions at x ∈ S
F°_S(x)                 Cone of polar constraint subgradient directions at x ∈ S
lev_α f                 Level set of f with parameter α
epi f                   Epigraph of f
I, J, K                 Sets of indices
|I|                     Number of elements in set I
f(x)                    Objective function value at x
arg min f(x)            Point where function f attains its minimum value
∇f(x)                   Gradient of function f at x
∂f(x)/∂x_i              Partial derivative of function f with respect to x_i
∇²f(x)                  Hessian matrix of function f at x
∂²f(x)/∂x_i∂x_j         Second partial derivative of function f with respect to x_i and x_j
C^m(R^n)                Space of functions f: R^n → R with continuous partial derivatives up to order m
L(R^n, R)               Space of linear mappings from R^n to R
D_k                     (generalized) Variable metric approximation of the inverse of the Hessian matrix
f'(x; d)                Directional derivative of function f at x in the direction d
f'_ε(x; d)              ε-directional derivative of function f at x in the direction d
f°(x; d)                Generalized directional derivative of function f at x in the direction d
d_H(A, B)               Hausdorff distance between sets A and B
d_S(x)                  Distance function (distance of x to the set S)
d(x, y)                 Distance function (distance between x and y)
∂_c f(x)                Subdifferential of convex function f at x
∂f(x)                   Subdifferential of function f at x
ξ ∈ ∂f(x)               Subgradient of function f at x
∂_ε f(x)                ε-subdifferential of convex function f at x
∂^G_ε f(x)              Goldstein ε-subdifferential of function f at x
∂̲f(x)                   Subdifferential of quasidifferentiable function f at x
∂̄f(x)                   Superdifferential of quasidifferentiable function f at x
Df(x)                   Quasidifferential of function f at x, Df(x) = [∂̲f(x), ∂̄f(x)]
d̲f(x)                   Hypodifferential of codifferentiable function f at x
d̄f(x)                   Hyperdifferential of codifferentiable function f at x
Df(x)                   Codifferential of function f at x, Df(x) = [d̲f(x), d̄f(x)]
∂_b f(x)                Basic (limiting) subdifferential of f at x
∂^∞ f(x)                Singular subdifferential of f at x
v = Γ(x, g, e, z, ζ, α) Discrete gradient of function f at x in direction g
D_0(x, λ)               Set of discrete gradients
v(x, g, h)              Quasi-secant of function f at x
QSec(x, h)              Set of quasi-secants
QSL(x)                  Set of limit points of quasi-secants as h ↓ 0
P                       Set of univariate positive infinitesimal functions
G                       Set of all vertices of the unit hypercube in R^n
Ω_f                     Set in R^n where function f is not differentiable
f_k(x)                  Piecewise linear cutting plane model of function f at x
f̃_k(x)                  Piecewise quadratic model of function f at x
∇h(x)                   Jacobian matrix of function h: R^n → R^m at x
∂h(x)                   Generalized Jacobian matrix of function h: R^n → R^m at x
A(x)                    Real symmetric matrix-valued affine function of x
λ_i(A(x))               ith eigenvalue of A(x)
λ_max(A(x))             Eigenvalue of A(x) with the largest absolute value
max                     Maximum
min                     Minimum
sup                     Supremum
inf                     Infimum
div(i, j)               Integer division for positive integers i and j
mod(i, j)               Remainder after integer division, mod(i, j) = j(i/j - div(i, j))
ln                      Natural logarithm
DC                      Difference of convex functions
FJ                      Fritz John optimality conditions
KKT                     Karush-Kuhn-Tucker optimality conditions
LOVO                    Low order value optimization
MDGP                    Molecular distance geometry problem
MINLP                   Mixed integer nonlinear programming
NC                      Nonconstancy
NSO                     Nonsmooth optimization
PLP                     Piecewise linear potential
LC, LNC                 Large-scale convex and nonconvex problems, n = 1000
MC, MNC                 Medium-scale convex and nonconvex problems, n = 200
SC, SNC                 Small-scale convex and nonconvex problems, n = 50
XLC, XLNC               Extra-large convex and nonconvex problems, n = 4000
XSC, XSNC               Extra-small convex and nonconvex problems, n ≤ 20
BNEW                    Bundle-Newton method
BT                      Bundle trust method
CP                      (standard) Cutting plane method
CPPC                    Cutting plane method with proximity control
DGM                     Discrete gradient method
GS                      Gradient sampling method
LMBM                    Limited memory bundle method
LDGB                    Limited memory discrete gradient bundle method
NERML                   Non-Euclidean restricted memory level method
PBM                     Proximal bundle method
QSM                     Quasi-secant method
VMBM                    Variable metric bundle method


Introduction

Nonsmooth optimization is among the most difficult tasks in optimization. It deals with optimization problems where the objective and/or constraint functions have discontinuous gradients. Nonsmooth optimization dates back to the early 1960s, when the concept of the subdifferential was introduced by R.T. Rockafellar and W. Fenchel, and the first nonsmooth optimization method, the subgradient method, was developed by N. Shor, Y. Ermolyev, and their colleagues in Kyiv, Ukraine (then part of the Soviet Union). In the 1960s and early 1970s, nonsmooth optimization was mainly applied to solve minimax and large linear problems using decomposition. Such problems can also be solved using other optimization techniques.
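
To give a first impression of how such a method operates, the following is a minimal, illustrative sketch of the classical subgradient iteration x_{k+1} = x_k - t_k ξ_k with ξ_k ∈ ∂f(x_k), applied to the convex but nonsmooth function f(x) = ||x||_1. The function names and the step-size rule are our own illustrative choices; they are not the pseudo-code presented in Chapter 10:

```python
import numpy as np

def subgradient_method(f, subgrad, x0, steps=500):
    """Classical subgradient iteration: x_{k+1} = x_k - t_k * g_k, g_k in df(x_k).

    Uses the divergent-series step size t_k = 1/(k+1) and tracks the best
    point found, since f(x_k) need not decrease monotonically.
    """
    x = np.asarray(x0, dtype=float)
    best_x, best_f = x.copy(), f(x)
    for k in range(steps):
        g = subgrad(x)                  # any element of the subdifferential
        x = x - (1.0 / (k + 1)) * g     # t_k -> 0 while the sum of t_k diverges
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x.copy(), fx
    return best_x, best_f

# f(x) = ||x||_1 is minimized at the origin, where it is not differentiable.
f = lambda x: float(np.sum(np.abs(x)))
subgrad = lambda x: np.sign(x)          # a valid subgradient of the l1-norm
x_best, f_best = subgradient_method(f, subgrad, x0=[2.0, -3.0])
print(x_best, f_best)                   # approaches (0, 0), slowly
```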

The most important developments in nonsmooth optimization started with the introduction of bundle methods in the mid-1970s by C. Lemaréchal (and also by P. Wolfe and R. Mifflin). In its original form, the bundle method was introduced to solve nonsmooth convex problems. The 1970s and early 1980s were an important period for new developments in nonsmooth analysis. Various generalizations of the subdifferential were introduced, including the Clarke subdifferential and the Demyanov-Rubinov quasidifferential. The use of the Clarke subdifferential allowed the extension of bundle methods to solve nonconvex nonsmooth optimization problems.

Since the early 1990s, nonsmooth optimization has been widely applied to solve many practical problems, for example in mechanics, economics, computational chemistry, engineering, machine learning, and data mining. In most of these applications, nonsmooth optimization approaches allow a significant reduction in the number of decision variables compared with other approaches, and thus facilitate the design of efficient algorithms for their solution. In these applications, therefore, the optimization problems cannot be solved by other techniques as efficiently as they can be solved using nonsmooth optimization techniques. Undoubtedly, nonsmooth optimization has become an indispensable tool for solving problems in diverse fields.

Nonsmoothness appears in the modeling of many practical problems in a very natural way. The sources of nonsmoothness can be divided into four classes: inherent, technological, methodological, and numerical nonsmoothness. In inherent nonsmoothness, the original phenomenon under consideration itself contains various discontinuities and irregularities. Typical examples of inherent nonsmoothness are the phase changes of materials in the continuous casting of steel, piecewise linear tax models in economics, and cluster analysis, supervised data classification, and clusterwise linear regression in data mining and machine learning. Technological nonsmoothness in a model is usually caused by extra technological constraints. These constraints may cause a nonsmooth dependence between variables and functions, even though the functions were originally continuously differentiable. Examples of this include so-called obstacle problems in optimal shape design and discrete feasible sets in product planning. On the other hand, some solution algorithms for constrained optimization may also lead to a nonsmooth problem. Examples of methodological nonsmoothness are the exact penalty function method and the Lagrange decomposition method. Finally, problems may be analytically smooth but numerically nonsmooth. This is the case with, for instance, noisy input data or so-called "stiff problems," which are numerically unstable and behave like nonsmooth problems.
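
To make the methodological class concrete, recall how an exact penalty formulation (treated in Chapters 8 and 16) trades a smooth constrained problem for a nonsmooth unconstrained one. The display below is the standard textbook construction, with the penalty parameter c introduced here for illustration:

```latex
% Smooth data f, g yield a nonsmooth penalized objective.
\[
  \min_{x \in \mathbb{R}^n} f(x) \quad \text{s.t.} \quad g(x) \le 0
  \qquad \longrightarrow \qquad
  \min_{x \in \mathbb{R}^n} \; f(x) + c \max\{0,\, g(x)\}
\]
% Under suitable assumptions, the two problems share minimizers for all
% sufficiently large c > 0, but the term max{0, g(x)} is not differentiable
% at any point where g(x) = 0.
```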

Despite the huge developments in nonsmooth optimization in recent decades and the wide application of its techniques, only a few books have been written specifically about it. Some of these are out of date and do not contain the most recent developments in the area; moreover, all of them were written in a way that demands from the audience a high level of prior knowledge of the subject. Our aim in writing this book is to give an overview of the current state of numerical nonsmooth optimization to a much wider audience, including practitioners.

The book is divided into three major parts dealing, respectively, with the theory of nonsmooth optimization (convex and nonsmooth analysis, optimality conditions), practical nonsmooth optimization problems (including applications to real-world problems and descriptions of academic test problems), and methods of nonsmooth optimization (descriptions of the methods and their pseudo-codes, as well as a comparison of different implementations). In preparing this book, all efforts have been made to ensure that it is self-contained.

Within each chapter of the first part, exercises, numerical examples andgraphical illustrations have been provided to help the reader to understand theconcepts, practical problems, and methods discussed. At the end of each part, notesand references are presented to aid the reader in their further study. In addition, thebook contains an extensive bibliography.
