
INFORMS OS Today
The Newsletter of the INFORMS Optimization Society

Volume 5 Number 1 May 2015

Contents

Chair's Column ................................ 1
Nominations for OS Prizes ..................... 25
Nominations for OS Officers ................... 26

Featured Articles

A Journey through Optimization
Dimitri P. Bertsekas .......................... 3

Optimizing in the Third World
Clovis C. Gonzaga ............................. 5

From Lovász θ Function and Karmarkar's Algorithm to Semidefinite Programming
Farid Alizadeh ................................ 9

Local Versus Global Conditions in Polynomial Optimization
Jiawang Nie ................................... 15

Convergence Rate Analysis of Several Splitting Schemes
Damek Davis ................................... 20

Send your comments and feedback to the Editor:

Shabbir Ahmed
School of Industrial & Systems Engineering
Georgia Tech, Atlanta, GA 30332
[email protected]

Chair's Column
Suvrajeet Sen

University of Southern California

[email protected]

Dear Fellow IOS Members:

It is my pleasure to assume the role of Chair of the IOS, one of the largest, and most scholarly, groups within INFORMS. This year marks the Twentieth Anniversary of the formation of the INFORMS Optimization Section, which started with 50 signatures petitioning the Board to form a section. As it stands today, the INFORMS Optimization Society has matured to become one of the larger Societies within INFORMS. With seven active special interest groups, four highly coveted awards, and of course, its strong presence in every INFORMS Annual Meeting, this Society is on a roll. In large part, this vibrant community owes a lot to many of the past office bearers, without whose hard work this Society would not be as strong as it is today. For the current health of our Society, I wish to thank my predecessors, and in particular, my immediate predecessor Sanjay Mehrotra, under whose leadership we gathered the momentum for our conferences, a possible new journal, and several other initiatives. I also take this opportunity to express my gratitude to Jim Luedtke, Secretary/Treasurer, whose soft-spoken assurance and attention to detail keep the society humming. Another person whose support is instrumental to many communications is our Web Editor, Pietro Belotti; thanks Pietro.

The major agenda item in the upcoming years is the possibility of a new journal, sponsored by the Society. Because of the growth in a variety of applications in Communications and Control, Data Science, and Machine Learning, there has been an explosive growth in Optimization. This field has always attracted advanced applications such as Energy, Finance, Health Care and others. What is remarkable today is that as the area matures, many advanced applications are motivating new optimization concepts, including fundamental mathematical breakthroughs (e.g., the mathematics of risk), algorithmic speed-ups and robust software (e.g., convex and mixed-integer optimization), as well as new modeling paradigms (e.g., multi-agent and data-driven optimization). These are but a few of the promising directions today, and there are too many more to mention in this column.

Given the above backdrop, it is no surprise that our Society has shown overwhelming support for a new INFORMS journal devoted to Optimization. Exactly what the focus of such a journal should be is not clear at this point. There are other open questions, and we hope to resolve them as the year progresses. Our plan is to start a dialogue using INFORMS Connect as a platform for discussion, and we may even hold some meetings at affiliated conferences, such as at the International Symposium on Mathematical Programming in Pittsburgh this summer. Please be on the lookout for our calls at conferences of these affiliated groups.

Our Society is at the heart of INFORMS, and it draws tremendous strength from this sense of leadership. There is another source of strength: the caliber of scholars that the IOS attracts. In order to celebrate the accomplishments of our researchers, the IOS awards four highly coveted prizes: the Khachiyan Prize, the Farkas Prize, the Young Researcher Prize, and the Student Paper Prize. This newsletter provides a glimpse of the personal stories of the award winners, and their unique perspectives on optimization. The awards committees for these prizes showed impeccable taste in their choices, and I wish to thank all members of these committees for their diligence with the process. The winners of the 2014 Khachiyan Prize were Dimitri Bertsekas (MIT) and Clovis Gonzaga (Federal University of Santa Catarina, Florianopolis, Brazil). These two titans of optimization have made multi-faceted contributions.

• Bertsekas was awarded for his pioneering role in dynamic programming (with uncountable state spaces, approximate, neuro-dynamic and approximate dynamic programming), Lagrangean methods, dual-based bounds for nonconvex problems, network optimization, and applications in communications, power, transportation and others. Moreover, his books on many of these topics have been adopted at many universities worldwide.

• Gonzaga for his leadership in interior point methods (IPM), augmented Lagrangean methods, on filter methods for constrained non-linear optimization, and on fast steepest descent methods. He has recently developed steepest descent algorithms for the minimization of convex quadratic functions with the same performance bound as Krylov space methods. He is also the author of several influential reviews which have introduced IPMs to scores of young researchers.

The 2014 Farkas Prize was awarded to Farid Alizadeh (Rutgers), who laid the groundwork for the field of semidefinite programming (SDP), which has grown quickly and steadily over the past 25 years. The citation continues the description of his contributions via connections between SDP and combinatorial optimization, complementarity, non-degeneracy and algorithmic methodology.

The 2014 Prize for Young Researchers was awarded to Jiawang Nie (UC San Diego) for his paper "Optimality conditions and finite convergence of Lasserre's hierarchy," Mathematical Programming, Ser. A (2014) 146:97-121, DOI: 10.1007/s10107-013-0680-x. Prior to this proof of finite convergence, the existing theory could only establish the convergence of this family of relaxations in the limit. No proof of finite convergence had previously been found, in spite of the significant numerical evidence for finite convergence that had been collected over a decade or so.

The winner of the 2014 Student Paper Prize was Damek Davis (UCLA) for his paper "Convergence rate analysis of several splitting schemes," arXiv preprint arXiv:1406.4834 (2014), with Wotao Yin.


This paper "makes a very significant contribution to the theory of optimization by using a rigorous and unified convergence rate analysis of important splitting schemes."

The awards committees for this year have been appointed, and they are listed elsewhere in this newsletter. I encourage the optimization community to identify strong candidates for these awards by sending forth nominations in each of the categories mentioned above.

Finally, I wish to welcome our newly elected Vice Chairs: Aida Khajavirad (Global Optimization), Warren Powell (Optimization Under Uncertainty), and Daniel Robinson (Nonlinear Optimization). They are joining the continuing team of Vice Chairs: Vladimir Boginski (Network Optimization), John Mitchell (Linear and Conic Optimization), Imre Polik (Computational Optimization and Software), and Juan Pablo Vielma (Integer and Discrete Optimization). These individuals are truly the reason for the strong presence of IOS at the INFORMS Annual Conferences, and I thank them for their leadership.

Before I close, I wish to also express my sincere appreciation for the work and leadership of Shabbir Ahmed. He has taken our newsletter to a whole new level, and provided it continuity for the past four years. Such dedication to the Society is what makes IOS tick. Thank you Shabbir!

I hope my enthusiasm for our society is palpable in this column, and hope that you will all get involved in the most vital society within INFORMS. Have a great summer, and see you all in either Pittsburgh or Philadelphia.

p.s.: For those interested in reading about some new areas for INFORMS researchers, follow this link, which needs you to login at INFORMS Connect: http://connect.informs.org/communities/community-home/librarydocuments/viewdocument/?DocumentKey=d7e454e3-1872-4826-900b-7871063a5980

A Journey through Optimization

Dimitri P. Bertsekas

Lab. for Information and Decision Sciences
Massachusetts Institute of Tech., Cambridge, MA 02139

[email protected]

I feel honored and grateful for being awarded the Khachiyan Prize. It is customary in such circumstances to thank one's institutions, mentors, and collaborators, and I have many to thank. I was fortunate to be surrounded by first class students and colleagues, at high quality institutions, which gave me space and freedom to work in any direction I wished to go. It is also customary to chart one's intellectual roots and journey, and I will not depart from this tradition.

We commonly advise scholarly Ph.D. students in optimization to take the time to get a broad many-course education, with substantial mathematical content, and special depth in their research area. Then upon graduation, to use their Ph.D. research area as the basis and focus for further research, while gradually branching out into neighboring fields, and networking within the profession. This is good advice, which I often give, but this is not how it worked for me!

I came from Greece with an undergraduate degree in mechanical engineering, got my MS in control theory at George Washington University in three semesters, while holding a full-time job in an unrelated field, and finished two years later my Ph.D. thesis on control under set membership uncertainty at MIT in 1971. I benefited from the stimulating intellectual atmosphere of MIT's Electronic Systems Laboratory (later LIDS), nurtured by Mike Athans and Sanjoy Mitter, but because of my short stay there, I graduated with little knowledge beyond Kalman filtering and linear-quadratic control theory. Then I went to teach at Stanford in a department that combined mathematical engineering and operations research (in which my background was rather limited) with economics (in which I had no exposure at all). In my department there was little interest in control theory, and none at all in my thesis work. Never having completed a first course in analysis, my first assignment was to teach to unsuspecting students optimization by functional analytic methods from David Luenberger's wonderful book. The optimism and energy of youth carried me through, and I found inspiration in what I saw as an exquisite connection between elegant mathematics and interesting practical problems. Studying David Luenberger's other works (including his Nonlinear Programming book) and working next door to him had a lasting effect on me.

Two more formative experiences at Stanford were studying Terry Rockafellar's Convex Analysis book (and teaching a seminar course from it), and most importantly teaching a new course on dynamic programming, for which I studied Bellman's books in great detail. My department valued rigorous mathematical analysis that could be broadly applied, and provided a stimulating environment where both could thrive. Accordingly, my course aimed to combine Bellman's vision of wide practical applicability with the emerging mathematical theory of Markov Decision Processes. The course was an encouraging success at Stanford, and set me on a good track. It has survived to the present day at MIT, enriched by subsequent developments in theoretical and approximation methodologies.

After three years at Stanford, I taught for five years in the quiet and scholarly environment of the University of Illinois. There I finally had a chance to consolidate my mathematics and optimization background, through research to a great extent. In particular, it helped a lot that with the spirit of youth, I took the plunge into the world of the measure-theoretic foundations of stochastic optimal control, aiming to expand the pioneering Borel space framework of David Blackwell, in the company of my then Ph.D. student Steven Shreve.

I changed direction again by moving back to MIT in 1979, to work in the then emerging field of data networks and the related field of distributed computation. There I had the good fortune to meet two colleagues with whom I interacted closely over many years: Bob Gallager, who coauthored with me a book on data networks in the mid-80s, and John Tsitsiklis, who worked with me first while a doctoral student and then as a colleague, and over time coauthored with me two research monographs on distributed algorithms and neuro-dynamic programming, and a probability textbook. Working with Bob and John, and writing books with them, was exciting and rewarding, and made MIT a special place for me.

Nonetheless, at the same time I was getting distracted by many side activities, such as books in nonlinear programming, dynamic programming, and network optimization, getting involved in applications of queueing theory and power systems, and personally writing several network optimization codes. By that time, however, I realized that simultaneous engagement in multiple, diverse, and frequently changing intellectual activities (while not recommended broadly) was a natural and exciting mode of operation that worked well for me, and also had some considerable benefits. It stimulated the cross-fertilization of ideas, and provided perspective for more broadly integrated courses and books. Over time I settled in a pattern of closely connected activities that reinforced each other: focus on an interesting subject to generate content and fill in missing pieces, teaching to provide a synthesis and to write class notes, and writing a book to understand the subject in depth and put together a story that would make sense to others. Writing books and looking for missing pieces was an effective way to generate new research, and working on a broad range of subjects was a good way to discover interconnections, broaden my perspective, and fight stagnation.

In retrospect I was very fortunate to get into methodologies that eventually prospered. Dynamic programming developed perhaps beyond Bellman's own expectation. He correctly emphasized the curse of dimensionality as a formidable impediment in its use, but probably could not have foreseen the transformational impact of the advances brought about by reinforcement learning, neuro-dynamic programming, and other approximation methodologies. When I got into convex analysis and optimization, it was an emerging theoretical subject, overshadowed by linear, nonlinear, and integer programming. Now, however, it has taken center stage thanks to the explosive growth of machine learning and large scale computation, and it has become the lynchpin that holds together most of the popular optimization methodologies. Data networks and distributed computation were thought promising when I got involved, but it was hard to imagine the profound impact they had on engineering, as well as the world around us today. Even set membership description of uncertainty, my Ph.D. thesis subject, which was totally overlooked for nearly fifteen years, eventually came to the mainstream, and has connected with the popular areas of robust optimization, robust control, and model predictive control. Was it good judgement or fortunate accident that steered me towards these fields? I honestly cannot say. Albert Einstein wisely told us that "Luck is when opportunity meets preparation." In my case, I also think it helped that I resisted overly lengthy distractions in practical directions that were too specialized, as well as in mathematical directions that had little visible connection to the practical world.

An academic journey must have companions to learn from and share with, and for me these were my students and collaborators. In fact it is hard to draw a distinction, because I always viewed my Ph.D. students as my collaborators. On more than one occasion, a Ph.D. thesis evolved into a book, as in the cases of Angelia Nedic and Asuman Ozdaglar, or into a long multi-year series of research papers after graduation, as in the cases of Paul Tseng and Janey Yu. My collaborators were many and I cannot mention them all, but they were special to me and I was fortunate to have met them. Thank you all for sharing this exciting journey with me.

Tamas Terlaky, Dimitri Bertsekas, Clovis Gonzaga and Sanjay Mehrotra

Optimizing in the Third World

Clovis C. Gonzaga

Department of Mathematics, Federal University of Santa Catarina, Florianopolis, SC, Brazil

[email protected]

Introduction. In the year of my seventieth birthday I was amazed to be awarded the Khachiyan Prize, for which I thank the committee composed of Tamas Terlaky (chair), Daniel Bienstock, Immanuel Bomze and John Birge.

I was born in South Brazil, and raised in Joinville, Santa Catarina, a small industrial town, formerly a German settlement. I decided very soon to be an engineer (I had the knack). A local company instituted a prize: the student with the best grades in Mathematics, Physics, Chemistry and Drawing during the three years of high school who was accepted in an Engineering school got a five year scholarship. I won this prize, and passed the entry exam of the best Engineering school in the country, the Technological Institute of Aeronautics. ITA is a civilian school belonging to the Air Force, with courses in Electronic, Mechanical, Aeronautical and (nowadays) Computational Engineering. I studied Electronics, with emphasis in control and servomechanisms. ITA was founded in the fifties, organized by a team from MIT. We lived on the campus supported by the Air Force, in a very demanding system: with one grade D you were expelled, and a grade C had to be replaced by a B during the vacations (a maximum of five times). Plenty of jobs were offered to graduates.

I graduated in 1967, and went immediately to the graduate school of Engineering (COPPE) of the Federal University of Rio de Janeiro, for an MSc in Electrical Engineering. COPPE and the Catholic University of Rio de Janeiro were the first Brazilian graduate schools with MSc and DSc titles in the American system. It was founded in the sixties with substantial support from the Brazilian government. We were pioneers, mostly isolated from the world, in a time when slide rules were still used and the internet had not been imagined even by science fiction writers. The school had an IBM 1130 computer with 32 Kbytes of memory and no floating point processor, which allowed you to process your Fortran program in punched cards once a day.

I was an MSc student with a full time job, taking care of a TR48 analog computer (young people do not know what that is), doing maintenance, operating and teaching analog computation. I started research on adaptive control and read a text on identification of a control system, based on a method called Levenberg-Marquardt (!). I got hooked. By this time an IBM researcher and Berkeley part time professor, Jean-Paul Jacob, came to Brazil for a year. Jean-Paul had a PhD from UC Berkeley, advised by Lucien Polak, and gave us a short course on non-linear programming in 1969, the first to be taught in Brazil. I wrote a dissertation with him, on "reduction of optimal control problems to non-linear programming". We had by then a group of nine quite talented students. Jean-Paul proposed to the University to start a PhD program on Systems Engineering in Brazil, and brought Lotfi Zadeh from Berkeley to organize it. We made an agreement with UC, and American professors came to Brazil to teach us advanced courses. Most of us finished the PhD in Berkeley, but I refused to go abroad.

My doctoral research was done mostly alone. I worked on two unrelated topics.

Graph search methods. Stephen Coles from Stanford visited us and taught a course on artificial intelligence, with emphasis on graph search methods, especially the A* algorithm for computing minimum cost paths using look-ahead heuristics. I used search methods for the solution of sequential decision problems, for which at this time electrical engineers knew only dynamic programming. I devised an algorithm for planning the long term expansion of power transmission systems, using a partial ordering of nodes (a preference relation) for pruning less promising paths through a graph of system configurations. This was done in a contract with the Brazilian governmental power systems agency, using a half-million-dollar IBM 370 computer with 256K of memory. The program took two hours of CPU, and for several years it was used in the actual planning of the Brazilian interconnected system.

Convex optimization. My main interest was in continuous optimization. I was the first Brazilian to teach a full course on non-linear programming, based on the fresh-from-press book by Olvi Mangasarian. I read several texts on convex optimization and point to set mappings, by authors like Rockafellar, Berge, Bertsekas, Geoffrion. I especially liked the idea of epsilon-subgradients, and had the idea of using this concept to assemble subgradients of a function and devise a general method for minimizing convex functions. I travelled to the US, where Jean-Paul scheduled interviews with two of my "idols", Polak in Berkeley and Geoffrion in UCLA. I told them about my ideas, and they asked the obvious question, about my adviser. I had none, and they agreed that the idea was great, but unattainable by a young man isolated in Brazil. Then Philip Wolfe came to Brazil, and I was in charge of showing him some of Rio de Janeiro. I took him to the Sugarloaf, to Corcovado, to my home, and eventually told him about what I was doing. He said that he was doing precisely the same thing. I was delighted – Wolfe and I had the same ideas! The following day I abandoned the topic. The end of this tale was also frustrating for Wolfe: Claude Lemarechal finished the bundle method before him, and certainly much before I would have done it, if ever.

I wrote a thesis on graph search methods for solving sequential decision problems, with application to the Brazilian transmission system. The results were nice, but I never tried to publish any journal paper. No tradition, no advisor, no need to publish.

Berkeley. In 1975 I went to Berkeley for a post-doc with my "grandfather" Lucien Polak, working on semi-infinite problems, using cutting sets for solving control systems design problems. We wrote a nice paper [10], and I had my first contact with a real research environment. In my first technical interview with him, he was in his office with another professor, unknown to me. Extremely nervous, I stuttered "I read your paper with Prof. Mayne, and... and...". "Did you find an error?" I said "y-y-yes." And he said "Show us. By the way, meet David Mayne, from Imperial College". Resisting the temptation to faint, I went to the blackboard and showed the mistake. He asked if I knew how to do it right, I did, and this was the beginning of a life-time friendship.

Back in Brazil, I worked on the long term planning of the operation of hydro-thermal power systems, using stochastic dynamic programming, and on the beautiful theory of water value. I went to many congresses, met many researchers in discrete systems, combinatorics and in power planning, but again published no journal papers. The political situation in Brazil deteriorated, under a violent military dictatorship. Our salaries went down and research activities became irrelevant.

Interior points. In 1985 Lucien Polak invited me to replace him at UC during a sabbatical year. I got a visiting professor position, and stayed in Berkeley for two and a half years, teaching courses on automatic control. Karmarkar's paper had just appeared, and I got very interested. He had studied with Richard Karp, and was an expert in combinatorics, complexity theory and linear programming, with no background in NLP, I believe.

After much effort in rephrasing his quite obscure method, I saw that it could be reduced to a scaled steepest descent method applied to his potential function, and wrote a paper on this. Then I realized that he had not used the full potential of Newton's method, and that it should be possible to improve his result. I worked day and night, found dozens of ways of not improving his complexity results, got very depressed, went to a Chinese restaurant, and the fortune cookie said "You will solve a problem that will be very important in your life." I had already decided on the logarithmic barrier function (instead of the potential function). I went home and started dealing with the third derivatives, leading to the large quadratic convergence region needed to build the path following method represented by the inchworm in the figure. Properties of path following methods had already been obtained by Megiddo and by Renegar. I presented my result in a workshop organized by Nimrod Megiddo in Asilomar, California, in the beginning of 1987, and the paper was published in the workshop proceedings. All references about these early results are in [4].

Figure 1: The inchworm

My life changed instantly. I met all the main brains in the field, and saw myself in the middle of the interior point revolution. A bunch of extremely clever guys like Anstreicher, Adler, Goldfarb, Kojima, McCormick, Megiddo, Mizuno, Monteiro, Nemirovsky, Nesterov, Renegar, Roos, Todd, Vial, Ye (forgive me if I forgot your name) travelled around the world like rodeo riders, mounting a new horse in each congress, with results that became obsolete in a matter of months. In 1992 I published the state of the art paper [4] in SIAM Review, later endowed with the "citation classics award" for the most cited paper in Mathematics and Computer Sciences written by a Brazilian in the 90's decade.

Back in Rio de Janeiro in 1987 I kept working on interior point methods, alone in Brazil, sending and receiving technical reports by regular mail and travelling frequently abroad. Brazil was again learning how to be a democracy. When computer communications arrived, science in the third world became feasible.

I spent 1993 at INRIA in Paris, working with Frederic Bonnans [2, 5] and Jean-Charles Gilbert [3], and worked for a semester at TU Delft, in the company of Roos and Terlaky.

In 1994 I moved back to South Brazil, transferred (with a group of DSc students) to the Math Department of the Federal University of Santa Catarina in Florianopolis, a city on a beautiful island surrounded by 40 beaches, where we have organized two international workshops (clovisfest 60 and 70, in 2004 and 2014, with a plan for another one in 2024, maybe...). I worked on linear complementarity problems, augmented Lagrangian methods, Nesterov optimal descent methods, filter methods, and lately on the complexity of descent methods for quadratic minimization. I wrote papers with my former students Hugo Lara [13], Luis C. Matioli [14], Romulo Castillo [6], Ademir Ribeiro [12, 16], Diane Rossetto [8], Marcia Vanti [9], Roger Behling [1], Fernanda Raupp [15], Marli Cardia, and have a long lasting collaboration with Elizabeth Karas [7, 8, 9] from the Federal University of Parana in Brazil.

I am a member of the Brazilian Academy of Sciences and of TWAS. I am a SIAM fellow, I received the Great Cross of the Brazilian Order of Scientific Merit, and now this very prestigious prize in honour of Leonid Khachiyan.

Now I am retired, leisurely working in Florianopolis as a voluntary researcher. I thank my students, old and new [11], for the joy they brought to my life in these many years.

REFERENCES

[1] R. Behling, C. C. Gonzaga, and G. Haeser. Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization. Journal of Optimization Theory and Applications, 162:705–717, 2013.

[2] F. Bonnans and C. C. Gonzaga. Convergence of interior point algorithms for monotone linear complementarity problems. Mathematics of Operations Research, 21:1–25, 1996.

[3] J. C. Gilbert, C. C. Gonzaga, and E. Karas. Examples of ill-behaved central paths in convex optimization. Mathematical Programming, 103(1):63–94, 2005.

[4] C. C. Gonzaga. Path following methods for linear programming. SIAM Review, 34(2):167–227, 1992.

[5] C. C. Gonzaga and F. Bonnans. Fast convergence of the simplified largest step path following algorithm. Mathematical Programming, 76:95–116, 1997.

[6] C. C. Gonzaga and R. Castillo. A nonlinear programming algorithm based on non-coercive penalty functions. Mathematical Programming, 96:87–101, 2003.

[7] C. C. Gonzaga and E. W. Karas. Fine tuning Nesterov's steepest descent algorithm for differentiable convex programming. Mathematical Programming, 138(1–2):141–166, 2013.

[8] C. C. Gonzaga, E. W. Karas, and D. R. Rossetto. An optimal algorithm for constrained differentiable convex optimization. SIAM Journal on Optimization, 2013. To appear.

[9] C. C. Gonzaga, E. W. Karas, and M. Vanti. A globally convergent filter method for nonlinear programming. SIAM J. Optimization, 14(3):646–669, 2003.

[10] C. C. Gonzaga and E. Polak. On constraint dropping schemes and optimality functions for a class of outer approximation algorithms. SIAM Journal on Control and Optimization, 17, 1979.

[11] C. C. Gonzaga and R. Schneider. Improving the efficiency of steepest descent algorithms for minimizing convex quadratics. Technical report, Federal University of Santa Catarina, Brazil, February 2015.

[12] E. W. Karas, C. C. Gonzaga, and A. A. Ribeiro. Local convergence of filter methods for nonlinear programming. Optimization, 59:1153–1171, 2010.

[13] H. Lara and C. C. Gonzaga. A note on properties of condition numbers. Linear Algebra and its Applications, 261:269–273, 1997.

[14] L. C. Matioli and C. C. Gonzaga. A new family of penalties for augmented Lagrangian methods. Numerical Linear Algebra with Applications, 15:925–944, 2008.

[15] F. M. Raupp and C. C. Gonzaga. A center cutting plane algorithm for a likelihood estimate problem. Computational Optimization and Applications, 21:277–300, 2000.

[16] A. A. Ribeiro, E. W. Karas, and C. C. Gonzaga. Global convergence of filter methods for nonlinear programming. SIAM Journal on Optimization, 19:1231–1249, 2008.


From Lovász θ Function and Karmarkar's Algorithm to Semidefinite Programming

Farid Alizadeh
School of Business, Rutgers University, Piscataway, NJ 08854

[email protected]

It is truly an honor, and quite an unexpected one, to be awarded the Farkas Prize, especially considering the remarkable people who have received this award in the past.

Below I will give an account of how I was steered into the study of semidefinite programming from a brilliant construction by Lovász.

1. The marvelous θ function

While studying as a graduate masters student at the University of Nebraska–Lincoln in 1987, I came across the little book of Lovász [27] (see also the earlier [26]). In it there was a derivation of a graph invariant which was so simple and yet astonishingly beautiful. Here is the idea: Suppose a graph has a clique (a subset of vertices all connected to each other) of size $k$. Construct a symmetric matrix $A(x)$ whose $(i,j)$ entry is 1 if vertices $i$ and $j$ are connected in the graph or $i = j$; otherwise set $[A(x)]_{ij} = [A(x)]_{ji} = x_{ij}$, a real-valued free variable. Consider this example:

[Figure: a five-vertex example graph on vertices 1–5, whose non-adjacent pairs are {1,4}, {1,5}, {2,4} and {3,5}.]

The matrix of this graph as described above, with $x = (x_{14}, x_{15}, x_{24}, x_{35})$, is given by:

$$A(x) \stackrel{\mathrm{def}}{=} \begin{pmatrix} 1 & 1 & 1 & x_{14} & x_{15} \\ 1 & 1 & 1 & x_{24} & 1 \\ 1 & 1 & 1 & 1 & x_{35} \\ x_{14} & x_{24} & 1 & 1 & 1 \\ x_{15} & 1 & x_{35} & 1 & 1 \end{pmatrix}.$$

From elementary linear algebra, we know that the largest eigenvalue of a principal submatrix is bounded by the largest eigenvalue of the matrix itself. So, since the submatrix corresponding to the vertices of the clique is a $k \times k$ matrix of all ones, and since the sole nonzero eigenvalue of such a matrix is equal to $k$, it follows that $k \le \lambda_{[1]}(A(x))$. In particular, the size of the largest clique satisfies

$$\omega(G) \;\le\; \min_{x}\, \lambda_{[1]}\bigl(A(x)\bigr).$$

What is amazing is that the right hand side of the inequality, known as the Lovász θ function, is the solution of a convex optimization problem, even though there are no "neat" formulas for eigenvalues of a matrix. As a result, the θ invariant of a graph can be estimated in polynomial time by the ellipsoid method.
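As a quick illustration of this point (my own sketch, not from the original article), the minimization above is small enough to hand to a modern modeling package; the snippet below assumes the cvxpy library and encodes the five-vertex example, with the non-adjacent pairs {1,4}, {1,5}, {2,4}, {3,5} carrying the free variables.

```python
# Hedged sketch: compute min_x lambda_max(A(x)) for the example graph with cvxpy.
# Entries on edges and on the diagonal are fixed to 1; the four non-edge entries
# play the role of the free variables x_ij.
import cvxpy as cp

n = 5
non_edges = [(0, 3), (0, 4), (1, 3), (2, 4)]     # 0-based versions of {1,4},{1,5},{2,4},{3,5}

A = cp.Variable((n, n), symmetric=True)
fixed_to_one = [A[i, j] == 1
                for i in range(n) for j in range(i, n)
                if (i, j) not in non_edges]       # edges and the diagonal

prob = cp.Problem(cp.Minimize(cp.lambda_max(A)), fixed_to_one)
prob.solve()                                      # solved as an SDP under the hood
print("min_x lambda_max(A(x)) ~", prob.value)
```

Since {1, 2, 3} is a clique of the example graph, the printed value should be at least 3.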

2. Karmarkar’s algorithms

Completely independent of the θ function I also got interested in the (then) new Karmarkar's algorithm for linear programming (LP). While the ellipsoid method was known to solve LPs in polynomial time, it had not proved to be useful in practice. Karmarkar's algorithm was both polynomial-time and practical. I read Karmarkar's paper [24], and subsequently attended the SIAM Optimization Conference in Houston in 1987. Many major players in interior point methods, including Ilan Adler, Don Goldfarb, Clovis Gonzaga, Narendra Karmarkar, Michael J. Todd and Yinyu Ye, were presenting talks.

Yinyu Ye, Farid Alizadeh and Sanjay Mehrotra


Right brain meets left brain: Can we extend Karmarkar's algorithm to compute the θ function?

Driving back from Houston to Lincoln, and juggling in my mind what I had learned at the SIAM conference, I thought of a crazy idea: Can we apply interior point methods to compute the θ function?

After transferring to Minnesota in 1988 to study with the late Ben Rosen, I worked on this idea and realized that it was not so crazy after all. Applying the logarithmic barrier to the optimization problem characterizing the θ function we get:

$$\min_{z,x}\; z - \mu \sum_i \ln\bigl(z - \lambda_i(A(x))\bigr) \;=\; \min_{z,x}\; z - \mu \ln \prod_i \bigl(z - \lambda_i(A(x))\bigr) \;=\; \min_{z,x}\; z - \mu \ln P_{A(x)}(z),$$

where $P_{A(x)}(z)$ is the characteristic polynomial of $A(x)$. Now, the last line can be easily computed using unconstrained optimization techniques based on Newton's, or quasi/truncated Newton methods. Using the array of supercomputers available at Minnesota, especially the Cray-2, and the NAG library optimization software, I could use the above formulation and compute θ of graphs with hundreds of vertices and a few thousand edges.
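A minimal sketch of such a computation, assuming NumPy and SciPy in place of the Cray-2 and the NAG library, is given below: it evaluates the smooth objective $z - \mu \ln P_{A(x)}(z) = z - \mu \ln \det(zI - A(x))$ for the example graph and hands it to a general-purpose unconstrained minimizer (a derivative-free one here, rather than the Newton-type methods mentioned above).

```python
# Hedged sketch (not the original code): minimize z - mu*ln det(zI - A(x)) for the
# five-vertex example graph, as a smooth surrogate for min_x lambda_max(A(x)).
import numpy as np
from scipy.optimize import minimize

n, mu = 5, 0.1
non_edges = [(0, 3), (0, 4), (1, 3), (2, 4)]      # pairs carrying the free variables x_ij

def A_of(x):
    A = np.ones((n, n))                            # ones on edges and on the diagonal
    for (i, j), xij in zip(non_edges, x):
        A[i, j] = A[j, i] = xij
    return A

def barrier(v):
    z, x = v[0], v[1:]
    eigs = np.linalg.eigvalsh(z * np.eye(n) - A_of(x))
    if eigs.min() <= 1e-9:                         # outside the domain z > lambda_max(A(x))
        return 1e8
    return z - mu * np.sum(np.log(eigs))           # ln det(zI - A(x)) = sum of log-eigenvalues

x0 = np.zeros(len(non_edges))
z0 = np.linalg.eigvalsh(A_of(x0)).max() + 1.0      # start strictly inside the domain
res = minimize(barrier, np.concatenate(([z0], x0)), method="Nelder-Mead")
print("barrier minimizer z (close to min_x lambda_max for small mu):", res.x[0])
```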

3. Onward to semidefinite programming

In the Spring of 1990 I got hold of the wonderful book by Grötschel, Lovász and Schrijver [21]. To my delight an entire chapter was devoted to the θ function. A particular "dual" formulation caught my eye:

$$\theta(G) = \max\; J \bullet Y \quad \text{s.t.} \quad Y_{ij} = 0 \text{ for all edges } ij \text{ in } G,\; \operatorname{tr} Y = 1,\; Y \succeq 0,$$

where $A \bullet B \stackrel{\mathrm{def}}{=} \sum_{ij} A_{ij} B_{ij}$, and $Y \succeq 0$ means $Y$ is a symmetric positive semidefinite matrix.
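For the same five-vertex example graph, this dual characterization can be solved directly as well; the sketch below (again mine, assuming the cvxpy package) pins the edge entries of $Y$ to zero and maximizes $J \bullet Y$.

```python
# Hedged sketch: the dual SDP characterization of theta for the example graph.
import cvxpy as cp

n = 5
edges = [(0, 1), (0, 2), (1, 2), (1, 4), (2, 3), (3, 4)]   # 0-based edges of the example graph

Y = cp.Variable((n, n), symmetric=True)
constraints = [Y >> 0, cp.trace(Y) == 1] + [Y[i, j] == 0 for (i, j) in edges]
prob = cp.Problem(cp.Maximize(cp.sum(Y)), constraints)     # J . Y is the sum of all entries of Y
prob.solve()
print("theta value ~", prob.value)
```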

Looking at this characterization, I realized that it resembled the so-called "standard-form" LP: $\min\{c^{\top}x \mid Ax = b,\; x \ge 0\}$; only the nonnegative orthant was replaced by the positive semidefinite cone. Based on this observation I came up with a strategy to extend design and analysis of interior point methods:

1. First, I would focus my attention not on the θ function per se, but on a more general optimization problem and its dual:

$$\begin{array}{ll} \min & C \bullet X \\ \text{s.t.} & A_i \bullet X = b_i \\ & X \succeq 0 \end{array} \qquad\qquad \begin{array}{ll} \max & b^{\top} y \\ \text{s.t.} & \sum_i y_i A_i + S = C \\ & S \succeq 0 \end{array}$$

(This was the time I thought of the term semidefinite programming (SDP) for this problem¹).

2. Next, I would take one of literally hundreds of papers on the complexity of various interior point methods for LP, and try to extend it, word for word, to SDP.

¹The term "positive (semi)-definite programming" had been used much earlier by Dantzig and Cottle to describe a different problem: the linear complementarity problem with a semidefinite matrix [14]. I was not aware of this at the time, and in any case the term had not caught on.

My intuition was based on the belief that Karmarkar's, and other interior point methods, did not rely in significant ways on the combinatorial nature of linear programming (the way the simplex methods did, for instance). So techniques for their analysis should be extensible to the more general setting above.

In the meantime, around May of 1990, Yinyu Ye, who was at the University of Iowa at the time, visited Minnesota. We had an hour long conversation and I discussed my project with him. Yinyu was very interested and encouraged me to pursue the project. He also suggested that I use one of the algorithms he had developed, which were based on the Todd-Ye-Mizuno potential function.

I spent the next few weeks studying Ye's two papers [48, 47] which presented two potential reduction algorithms for linear programming with iteration complexity $O(\sqrt{n}\,L)$. While reading these papers, I realized that in almost every place that the vector $x_k$ (the current estimate for the optimal solution) occurs as a parameter of a nonlinear function, that function is symmetric in $x_k$.

For instance, in the potential function given by $c^{\top}x - q\sum_i \ln x_i$, the term $\sum_i \ln x_i$ is symmetric. So are expressions like $\|x - \mathbf{1}\|$, where $\mathbf{1}$ is the vector of all ones, and $\|\cdot\|$ corresponds to either the Euclidean norm, the maximum ($\infty$) norm or the 1-norm. So it was straightforward to translate statements involving such functions to SDP. All I had to do was to replace $x_k$ (in LP) with the eigenvalues of $X_k$ (in SDP). For instance, an inequality used by Ye, and earlier by Karmarkar, was the following: for $0 < x < \mathbf{1}$,

$$\sum_i \ln x_i \;\ge\; \sum_i x_i - n - \frac{\|x - \mathbf{1}\|_2^2}{2\bigl(1 - \|x - \mathbf{1}\|_\infty\bigr)}.$$

This inequality is easily proved using the Taylor expansion of $\ln(1 - x)$. Now let $X$ be a symmetric matrix and let $\lambda_i(X)$ be its eigenvalues. Then $0 < \lambda_i(X) < 1$ for all $i$ is equivalent to $0 \prec X \prec I$. And the inequality above, upon substituting $\lambda_i(X)$ for $x_i$, becomes

$$\ln \operatorname{Det}(X) \;\ge\; \operatorname{trace}(X) - n - \frac{\|X - I\|_F^2}{2\bigl(1 - \rho(X - I)\bigr)},$$

where $\rho(A)$ is the spectral radius of $A$ (the largest eigenvalue in absolute value, $\max_i |\lambda_i(A)|$, for symmetric matrices).
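The matrix inequality can be spot-checked numerically in a few lines; the snippet below is an illustrative check of mine (not from the article) on a random symmetric matrix with eigenvalues strictly between 0 and 1.

```python
# Hedged numerical spot-check of the matrix inequality above.
import numpy as np

rng = np.random.default_rng(0)
n = 6
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal matrix
lam = rng.uniform(0.05, 0.95, size=n)              # eigenvalues strictly inside (0, 1)
X = Q @ np.diag(lam) @ Q.T                         # so that 0 < X < I

lhs = np.log(np.linalg.det(X))
D = X - np.eye(n)
rho = np.max(np.abs(np.linalg.eigvalsh(D)))        # spectral radius of X - I
rhs = np.trace(X) - n - np.linalg.norm(D, "fro") ** 2 / (2 * (1 - rho))
print(lhs >= rhs, lhs, rhs)                        # expect True
```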

Another key ingredient of LP interior point methods was some form of linear or projective transformation that mapped the nonnegative orthant back to itself and brought the current interior feasible point $x_k$ to $\mathbf{1}$. For instance, a linear transformation achieving this would be $x \mapsto \operatorname{Diag}(x_k)^{-1}x$. Typically a number of identities and inequalities are proved for this transformed vector, ultimately resulting in a fixed reduction in a potential function. The analogous transformation for SDP seemed to be $X \mapsto X_k^{-1}X$, which maps the current interior feasible point $X_k$ to $I$, the identity matrix. As in the inequality above, I could extend all the needed identities and inequalities involving $\operatorname{Diag}(x_k)^{-1}x$ in Ye's papers to the spectrum of $X_k^{-1}X$. As a result I could prove a fixed reduction to an analogous matrix potential function. So my strategy of "word-for-word" turning LP interior point algorithms and their analyses to SDP seemed to work.

Of course there was a major flaw in all this which made the whole thing somewhat absurd: while in LP the transformation $x \mapsto \operatorname{Diag}(x_k)^{-1}x$ mapped the nonnegative orthant back to itself, the corresponding $X \mapsto X_k^{-1}X$ did not map the semidefinite cone to itself; the product of two symmetric matrices is not even a symmetric matrix in general.

I was stuck on this issue for a few weeks, until I decided to pay a visit to Yinyu and seek his advice. In summer of 1990, I drove five hours from Minneapolis to Iowa City. Yinyu was gracious to treat me to pizza, and even more gracious afterwards to listen to my construction and the dilemma about non-symmetry of $X_k^{-1}X$. As soon as I described the problem, and without hesitation, Yinyu proposed that I use instead the transformation $X \mapsto X_k^{-1/2} X X_k^{-1/2}$. The moment he uttered these words I knew that the problem was solved. $X \mapsto X_k^{-1/2} X X_k^{-1/2}$ is an automorphism of the semidefinite cone. At the same time it has the same spectrum as $X_k^{-1}X$, so all the identities and inequalities I had already proved for its spectrum remained valid for this transformation as well.
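Both properties Yinyu pointed out are easy to confirm numerically; the sketch below (illustrative only) checks, for random positive definite $X_k$ and $X$, that $X_k^{-1/2} X X_k^{-1/2}$ is symmetric positive definite and has the same eigenvalues as $X_k^{-1}X$.

```python
# Hedged numerical check: X -> Xk^{-1/2} X Xk^{-1/2} stays in the semidefinite cone
# and has the same spectrum as Xk^{-1} X.
import numpy as np

rng = np.random.default_rng(1)
n = 5

def random_spd(n):
    B = rng.standard_normal((n, n))
    return B @ B.T + n * np.eye(n)                 # well-conditioned symmetric positive definite

Xk, X = random_spd(n), random_spd(n)

w, V = np.linalg.eigh(Xk)                          # eigendecomposition of Xk
Xk_inv_half = V @ np.diag(1.0 / np.sqrt(w)) @ V.T  # symmetric inverse square root of Xk

scaled = Xk_inv_half @ X @ Xk_inv_half             # the transformation proposed by Ye
similar = np.linalg.solve(Xk, X)                   # Xk^{-1} X, in general not symmetric

print(np.allclose(scaled, scaled.T))                                    # symmetric: True
print(np.all(np.linalg.eigvalsh(scaled) > 0))                           # positive definite: True
print(np.allclose(np.sort(np.linalg.eigvals(similar).real),
                  np.linalg.eigvalsh(scaled)))                          # same spectrum: True
```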

I should mention here that Yinyu, in his earlier visit to Minnesota, also brought to my attention a preliminary version of Nesterov and Nemirovski's monumental work [32] (as an e-book on a floppy, in 1990!). I did not quite understand the book, but fortunately there was a workshop in early summer of 1990 in Madison, Wisconsin, which both Yuri and Arkadi also attended. I talked to them about my work on the θ function, and they reassured me that their self-concordance theory implied that θ can be estimated in $O(\sqrt{n})$ iterations. Based on this discussion I quickly wrote a paper [2] on randomized parallel computation of θ, and became confident that my "word-for-word" approach had a strong chance of success.

4. Further research on applications of SDP

After writing my dissertation [1] and a few papers based on the research above [3, 4], I spent two years (1992-1994) as an NSF postdoctoral associate at the International Computer Science Institute at the University of California–Berkeley. I was extremely privileged to work with Dick Karp on applications of combinatorial optimization to molecular biology. I should also mention my joyful collaboration with Andrew Goldberg in implementing the push-relabel method for the maximum-flow problem on the CM-2 architecture [6].

At this time I also got to know Ilan Adler and through interactions with him got interested both in primal-dual algorithms for SDP and also in second order cone programming. My interest in SOCP arose after reading a preliminary version of a paper of Nemirovski and Scheinberg [31] which essentially made a "word-for-word" extension of Karmarkar's algorithm from LP to SOCP.

In 1994 I moved to RUTCOR at Rutgers University. Below I briefly outline my activities in the past twenty years:

1994-1998: Primal-Dual algorithms. I started my collaboration with Jean-Pierre Haeberly of (then) Fordham, and Michael Overton of New York University. After studying the primal-dual method of Helmberg et al. [23] we developed our own method, which computed Newton's direction for primal and dual feasibility and the complementarity relation $\frac{XS + SX}{2} = \mu I$ [11]; this direction was termed by the community the AHO direction. We also developed SDPpack, the first publicly available software package that could solve optimization problems with any combination of linear, second order and semidefinite constraints [8]. Later, in a seminal work, Monteiro and Zhang extended the AHO method by adding an automorphic scaling of the semidefinite cone [30]. Renato Monteiro showed that under all such scalings, a "short-step" approach yields $O(\sqrt{n}\,|\ln \epsilon|)$ iteration complexity [29]. The Monteiro-Zhang family includes the fundamental Nesterov-Todd methods [33, 34] and the Helmberg et al. method mentioned above.

1995-1997: Degeneracy in SDP. With Jean-Pierre and Michael, we also studied the notion of (non)degeneracy and strict complementarity in SDP [10]. This work was further expanded by Pataki [43] and Faybusovich [17].

1998-2003: Jordan algebras. Through the work of Guler [22], Faybusovich [17, 16, 18], and the brilliant texts of Faraut and Koranyi [15] and Koecher [28], Euclidean Jordan algebras became accessible to the optimization community. With Stefan Schmieta we showed that just about anything said about the Monteiro-Zhang family can be extended verbatim to optimization over symmetric cones [44, 45]. We also showed that the word-for-word extension of Ye's algorithm in LP can be extended all the way to symmetric cones [12]. With Yu Xia we extended the Q method—developed earlier with Haeberly and Overton [9]—to symmetric cones as well [13].

2008-2014: Nonnegative and SOS polynomials. As a result of the work of N. Z. Shor [46], and later Lasserre [25], Parrilo [42, 41] and Nesterov [35], it became evident that the set of polynomials which are expressible as sums of squares (SOS) of other polynomials is SD-representable. With David Papp, we studied the notion of sum-of-squares in abstract algebras and, building on the work of Nesterov [35] and Faybusovich [19], showed SDP-representability of SOS cones in the most general setting possible [40]. With RUTCOR students Gabor Rudolf, Nilay Noyan, and David Papp, we showed that the complementarity conditions for the cone of univariate nonnegative polynomials and its dual, the moment cone, cannot be expressed by more than four independent bilinear relations (unlike the nonnegative orthant, and other symmetric cones, where complementarity can be expressed by n bilinear relations). In the process we introduced the notion of the bilinearity rank of cones. Later, Gowda and Tao showed that the bilinearity rank of a cone is exactly the dimension of the Lie algebra of its automorphism group [20].

2003-2011: Shape-constrained statistical estimation. With my colleague at Rutgers, Jonathan Eckstein, and former students Gabor Rudolf and Nilay Noyan, we applied SDP to estimate the (nonnegative) arrival rate of nonhomogeneous Poisson processes, and used our approach to estimate the arrival rate of e-mails [5]. And with David Papp we estimated the two-dimensional arrival rate of accidents on the New Jersey Turnpike as a function of both time and milepost [39]. Also with David we used similar techniques for shape-constrained regression and probability density estimation [40].


5. Final word

I am fortunate to have been associated with great mentors, colleagues, and students throughout the years. Early in my career, my advisor at Nebraska, David Klarner, sparked the love of algorithms in me. And Ben Rosen at Minnesota had the confidence to agree to support me after just a ten minute conversation.

Michael Overton, who had already extensively studied eigenvalue optimization [36, 38, 37], did substantial hand-holding in the early stages of my investigation. I also had stimulating discussions both in person and by e-mail with Arkadi Nemirovski and Yuri Nesterov (who read the first drafts of my dissertation and pointed out several sloppy errors). Laci Lovász was gracious and encouraged me to pursue my ideas about the θ function and its efficient computation. Gene Golub was kind in allowing me to use Stanford student offices, and in having many discussions about the numerical stability of eigenvalue optimization algorithms.

Over the years, discussions with Stephen Boyd, Gabor Pataki, Mutakuri Ramanna, Michel Goemans, David Williamson and Garud Iyengar were very fruitful. In addition, I had a great sabbatical leave at the Columbia IEOR department during the calendar year 2001. A result of this visit was a very satisfying collaboration with Don Goldfarb which resulted in our paper on SOCP [7].

Finally, I should acknowledge the great pleasure I have had working with (former) graduate students Reuben Sattergren, Stefan Schmieta, Yu Xia, Gabor Rudolf, Nilay Noyan, David Papp, and RUTCOR and Rutgers MSIS (current) graduate students Marta Cavaleiro, Mohammad Ranjbar and Deniz Seyed Eskandani.

REFERENCES

[1] F. Alizadeh. Combinatorial Optimization with Interior Point Methods and Semi-Definite Matrices. PhD thesis, Computer Science Department, University of Minnesota, Minneapolis, Minnesota, 1991.

[2] F. Alizadeh. A sublinear-time randomized parallel algorithm for the maximum clique problem in perfect graphs. In Proc. 2nd ACM-SIAM Symposium on Discrete Algorithms, 1991.

[3] F. Alizadeh. Optimization Over Positive Semi-Definite Cone; Interior-Point Methods and Combinatorial Applications. In P. Pardalos, editor, Advances in Optimization and Parallel Computing. North-Holland, Amsterdam, 1992.

[4] F. Alizadeh. Interior point methods in semidefinite programming with applications to combinatorial optimization. SIAM J. Optim., 5(1):13–51, 1995.

[5] F. Alizadeh, J. Eckstein, N. Noyan, and G. Rudolf. Arrival Rate Approximation by Nonnegative Cubic Splines. Operations Research, 56(1):140–156, 2008.

[6] F. Alizadeh and A. V. Goldberg. Implementing the Push-Relabel Method for the Maximum Flow Problem on a Connection Machine. In David S. Johnson and Catherine C. McGeoch, editors, Network Flows and Matching, First DIMACS Implementation Challenge, volume 12 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science. American Mathematical Society, 1993.

[7] F. Alizadeh and D. Goldfarb. Second-Order Cone Programming. Math. Programming Series B, 95:3–51, 2003.

[8] F. Alizadeh, J.-P. A. Haeberly, V. Nayakkankuppam, M. L. Overton, and S. A. Schmieta. SDPpack User Guide (Version 0.9 Beta). Technical Report 737, Courant Institute of Mathematical Sciences, New York University, New York, NY, 1997. (Available at http://www.cs.nyu.edu/faculty/overton/sdppack).

[9] F. Alizadeh, J.-P. Haeberly, and M. L. Overton. A new primal-dual interior point method for semidefinite programming. In Proc. 5th SIAM Conference on Appl. Linear Algebra, Snowbird, Utah, 1994.

[10] F. Alizadeh, J.-P. Haeberly, and M. L. Overton. Complementarity and nondegeneracy in semidefinite programming. Math. Programming, 77(2), 1997.

[11] F. Alizadeh, J.-P. Haeberly, and M. L. Overton. Primal-dual interior-point methods for semidefinite programming: Convergence rates, stability and numerical results. SIAM J. Optim., 8(3):746–768, 1998.

[12] F. Alizadeh and S. H. Schmieta. Symmetric Cones, Potential Reduction Methods and Word-By-Word Extensions. In R. Saigal, L. Vandenberghe, and H. Wolkowicz, editors, Handbook of Semidefinite Programming, Theory, Algorithms and Applications, pages 195–233. Kluwer Academic Publishers, 2000.

[13] F. Alizadeh and Y. Xia. The Q Method for Symmetric Cone Programming. Journal of Optimization Theory and Applications, 149(1):102–137, 2010.

[14] G. B. Dantzig and R. W. Cottle. Positive (semi-)definite programming. In J. Abadie, editor, Nonlinear Programming, pages 55–73. North Holland, Amsterdam, 1967.


[15] J. Faraut and A. Koranyi. Analysis on Symmetric Cones. Oxford University Press, Oxford, UK, 1994.

[16] L. Faybusovich. Euclidean Jordan algebras and interior-point algorithms. Positivity, 1(4):331–357, 1997.

[17] L. Faybusovich. Linear systems in Jordan algebras and primal-dual interior point algorithms. Journal of Computational and Applied Mathematics, 86:149–175, 1997.

[18] L. Faybusovich. A Jordan-algebraic approach to potential-reduction algorithms. Math. Z., 239:117–129, 2002.

[19] L. Faybusovich. On Nesterov's approach to semi-infinite programming. Acta Appl. Math., 74(2):195–215, 2002.

[20] S. Gowda and J. Tao. On the bilinearity rank of a proper cone and Lyapunov-like transformations. Math. Programming Series A, 147:155–170, 2014.

[21] M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization. Springer Verlag, 1988.

[22] O. Guler. Barrier Functions in Interior Point Methods. Math. of Oper. Res., 21:860–885, 1996.

[23] C. Helmberg, F. Rendl, R. J. Vanderbei, and H. Wolkowicz. An interior-point method for semidefinite programming. SIAM J. Optim., 6:342–361, 1996.

[24] N. Karmarkar. A new polynomial-time algorithm for linear programming. Combinatorica, 4:373–395, 1984.

[25] J. B. Lasserre. Global Optimization with Polynomials and The Problem of Moments. SIAM J. Optim., 11(8):708–817, 2001.

[26] L. Lovász. On the Shannon capacity of a graph. IEEE Trans. Inform. Theory, 25(1):1–7, January 1979.

[27] L. Lovász. An Algorithmic Theory of Numbers, Graphs and Convexity. CBMS-NSF 50. SIAM, 1986.

[28] M. Koecher. The Minnesota Notes on Jordan Algebras and Their Applications. Springer-Verlag, 1999. Edited by A. Krieg and S. Walcher, based on lectures given at the University of Minnesota, 1960.

[29] R. D. C. Monteiro. Polynomial convergence of primal-dual algorithms for semidefinite programming based on the Monteiro and Zhang family of directions. SIAM J. Optim., 8:797–812, 1998.

[30] R. D. C. Monteiro and Y. Zhang. A unified analysis for a class of path-following primal-dual interior-point algorithms for semidefinite programming. Mathematical Programming, 81:281–299, 1998.

[31] A. Nemirovski and K. Scheinberg. Extension of Karmarkar's algorithm onto convex quadratically constrained quadratic programming. Math. Programming, 72:273–289, 1996.

[32] Y. Nesterov and A. Nemirovski. Self-concordant functions and polynomial time methods in convex programming. Central Economical and Mathematical Institute, U.S.S.R. Academy of Science, Moscow, 1990.

[33] Y. E. Nesterov and M. J. Todd. Self-scaled barriers and interior-point methods for convex programming. Math. of Oper. Res., 22:1–42, 1997.

[34] Y. E. Nesterov and M. J. Todd. Primal-dual interior-point methods for self-scaled cones. SIAM J. Optim., 8:324–364, 1998.

[35] Yu. Nesterov. Squared functional systems and optimization problems. In H. Frenk, K. Roos, T. Terlaky, and S. Zhang, editors, High Performance Optimization, Appl. Optim., pages 405–440. Kluwer, Dordrecht, Netherlands, 2000.

[36] M. L. Overton. On minimizing the maximum eigenvalue of a symmetric matrix. SIAM J. Matrix Anal. Appl., 9(2):256–268, 1988.

[37] M. L. Overton and R. S. Womersley. On the sum of the largest eigenvalues of a symmetric matrix. SIAM J. Matrix Anal. Appl., 13:41–45, 1992.

[38] M. L. Overton and R. S. Womersley. Optimality conditions and duality theory for minimizing sums of the largest eigenvalues of symmetric matrices. Math. Programming, 62(2):321–357, 1993.

[39] D. Papp and F. Alizadeh. Multivariate Arrival Rate Estimation by Sum-Of-Squares Polynomial Splines and Semidefinite Programming. In Proc. 2011 Winter Simulation Conf., 2011.

[40] D. Papp and F. Alizadeh. Shape-Constrained Estimation Using Nonnegative Splines. Comput. Graph. Statist., 23(1):211–231, 2014.

[41] P. A. Parrilo. Semidefinite programming relaxations for semialgebraic problems. Mathematical Programming, 96(2):293–320, 2003.

[42] P. A. Parrilo. Structured Semidefinite Programs and Semialgebraic Methods. PhD thesis, The California Institute of Technology, 2000.

[43] G. Pataki. On the rank of extreme matrices in semidefinite programming and the multiplicity of optimal eigenvalues. Math. of Oper. Res., 23(2):339–358, 1998.


[44] S. H. Schmieta and F. Alizadeh. Associative and Jordan Algebras, and Polynomial Time Interior-Point Algorithms for Symmetric Cones. Math. of Oper. Res., 26(3):543–564, 2001.

[45] S. H. Schmieta and F. Alizadeh. Extension of Commutative Class of Primal-Dual Interior Point Algorithms to Symmetric Cones. Math. Programming, 96:409–438, 2003.

[46] N. Z. Shor. A class of global minimum bounds of polynomial functions. Cybernetics, 23(6):731–734, 1987.

[47] Y. Ye. A class of projective transformations for linear programming. SIAM J. Comput., 19(3):457–466, 1990.

[48] Y. Ye. An O(n^3 L) Potential Reduction Algorithm for Linear Programming. Math. Programming, 50(2):239–258, 1991.

Local Versus Global Conditions in Polynomial Optimization

Jiawang Nie
Department of Mathematics
University of California San Diego, La Jolla, CA 92093, USA
[email protected]

This paper briefly reviews the relationship between local and global optimality conditions in [15]. Consider the polynomial optimization problem

\[
\left\{
\begin{array}{rl}
\min & f(x) \\
\mathrm{s.t.} & h_i(x) = 0 \quad (i = 1, \ldots, m_1), \\
& g_j(x) \geq 0 \quad (j = 1, \ldots, m_2),
\end{array}
\right.
\tag{1}
\]

where $f, h_1, \ldots, h_{m_1}, g_1, \ldots, g_{m_2}$ are real polynomials in $x := (x_1, \ldots, x_n)$. For convenience, denote

\[
h := (h_1, \ldots, h_{m_1}), \qquad g := (g_1, \ldots, g_{m_2}),
\]

and $g_0 := 1$. Let $K$ be the feasible set of (1). When there are no equality (resp., inequality) constraints, the tuple $h = \emptyset$ and $m_1 = 0$ (resp., $g = \emptyset$ and $m_2 = 0$).

The problem (1) can be treated as a general nonlinear program. By classical nonlinear optimization methods, we can typically get a Karush-Kuhn-Tucker (KKT) point of (1). Theoretically, it is NP-hard to check whether a KKT point is a local minimizer or not. However, it is often not too hard to do that in practice, because there exist standard conditions ensuring local optimality. On the other hand, it is often much harder to get a global minimizer. In practice, we may sometimes be able to get a global optimizer, but it is typically hard to verify its global optimality. A major reason for this is the lack of easily checkable global optimality conditions in nonlinear programming theory.

Local and global optimality conditions are presumably very different, except in special cases like convex optimization. For general nonconvex optimization, little is known about global conditions. However, for polynomial optimization, such conditions are possible by using representations of nonnegative polynomials. Interestingly, global optimality conditions are closely related to the local ones, as was discovered in the paper [15].

(Photo: Sanjay Mehrotra, Jiawang Nie and Andrzej Ruszczynski)

1. Local Optimality Conditions

Let $u$ be a local minimizer of (1) and let

\[
J(u) := \{j_1, \ldots, j_r\}
\]

be the index set of active inequality constraints. If the constraint qualification condition (CQC) holds at $u$, i.e., the gradient vectors

\[
\nabla h_1(u), \ldots, \nabla h_{m_1}(u), \nabla g_{j_1}(u), \ldots, \nabla g_{j_r}(u)
\]

are linearly independent, then there exist Lagrange multipliers $\lambda_1, \ldots, \lambda_{m_1}$ and $\mu_1, \ldots, \mu_{m_2}$ satisfying

\[
\nabla f(u) = \sum_{i=1}^{m_1} \lambda_i \nabla h_i(u) + \sum_{j=1}^{m_2} \mu_j \nabla g_j(u), \tag{2}
\]

\[
\mu_1 g_1(u) = \cdots = \mu_{m_2} g_{m_2}(u) = 0, \qquad \mu_1 \geq 0, \ldots, \mu_{m_2} \geq 0. \tag{3}
\]

The equation (2) is called the first order optimality condition (FOOC), and (3) is called the complementarity condition. If it further holds that

\[
\mu_1 + g_1(u) > 0, \; \ldots, \; \mu_{m_2} + g_{m_2}(u) > 0, \tag{4}
\]

then the strict complementarity condition (SCC) holds at $u$. Strict complementarity is equivalent to $\mu_j > 0$ for every $j \in J(u)$. Let $L(x)$ be the Lagrange function

\[
L(x) := f(x) - \sum_{i=1}^{m_1} \lambda_i h_i(x) - \sum_{j \in J(u)} \mu_j g_j(x).
\]

Clearly, (2) implies that the gradient $\nabla_x L(u) = 0$. The polynomials $f, h_i, g_j$ are smooth functions. Thus, under the constraint qualification condition, the second order necessity condition (SONC) holds:

\[
v^T \nabla_x^2 L(u)\, v \geq 0 \quad \forall\, v \in G(u)^{\perp}. \tag{5}
\]

In the above, $G(u)$ denotes the Jacobian of the active constraining polynomials,

\[
G(u) = \mathrm{Jacobian}\big(h_1, \ldots, h_{m_1}, g_{j_1}, \ldots, g_{j_r}\big)\Big|_{x=u},
\]

and $G(u)^{\perp}$ denotes the null space of $G(u)$. If it holds that

\[
v^T \nabla_x^2 L(u)\, v > 0 \quad \text{for all } 0 \neq v \in G(u)^{\perp}, \tag{6}
\]

then the second order sufficiency condition (SOSC) holds at $u$. The relations among the above conditions can be summarized as follows: if CQC holds at $u$, then (2), (3) and (5) are necessary conditions for $u$ to be a local minimizer, but they may not be sufficient; if (2), (3), (4) and (6) hold at $u \in K$, then $u$ is a strict local minimizer of (1). We refer to [1, Section 3.3] for such classical results.
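As a small illustration (a worked instance added here for this review, not an example taken from [15]), consider minimizing $f(x) = x_1 + x_2$ over the unit disk:

\[
\begin{aligned}
&\min\ x_1 + x_2 \quad \text{s.t.}\quad g_1(x) = 1 - x_1^2 - x_2^2 \geq 0,
\qquad u = \Big(-\tfrac{1}{\sqrt{2}},\, -\tfrac{1}{\sqrt{2}}\Big).\\[2pt]
&\text{CQC: } \nabla g_1(u) = (\sqrt{2}, \sqrt{2}) \neq 0; \qquad
\text{FOOC: } \nabla f(u) = (1, 1) = \mu_1 \nabla g_1(u) \text{ with } \mu_1 = \tfrac{1}{\sqrt{2}};\\[2pt]
&\text{SCC: } \mu_1 = \tfrac{1}{\sqrt{2}} > 0; \qquad
\text{SOSC: } \nabla_x^2 L(u) = 2\mu_1 I = \sqrt{2}\, I \succ 0 \text{ on } G(u)^{\perp}.
\end{aligned}
\]

All three conditions hold at $u$, so $u$ is (as expected) a strict local, and in fact global, minimizer.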

Mathematically, CQC, SCC and SOSC are sufficient for local optimality, but may not be necessary. However, for generic cases, they are sufficient and necessary conditions. This is a major conclusion of [15]. Denote by $\mathbb{R}[x]_d$ the set of real polynomials in $x$ with degree at most $d$. Let $[m] := \{1, \ldots, m\}$. The following theorem is from [15].

Theorem 1. Let $d_0, d_1, \ldots, d_{m_1}, d'_1, \ldots, d'_{m_2}$ be positive integers. Then there exists a finite set of nonzero polynomials $\varphi_1, \ldots, \varphi_L$, which are in the coefficients of the polynomials $f \in \mathbb{R}[x]_{d_0}$, $h_i \in \mathbb{R}[x]_{d_i}$ ($i \in [m_1]$), $g_j \in \mathbb{R}[x]_{d'_j}$ ($j \in [m_2]$), such that if

\[
\varphi_1(f, h_1, \ldots, h_{m_1}, g_1, \ldots, g_{m_2}) \neq 0,\ \ldots,\ \varphi_L(f, h_1, \ldots, h_{m_1}, g_1, \ldots, g_{m_2}) \neq 0,
\]

then CQC, SCC and SOSC hold at every local minimizer of (1).

Theorem 1 implies that the local conditions CQC, SCC and SOSC hold at every local minimizer, in the space of input polynomials with given degrees, except on a union of finitely many hypersurfaces. So they hold in an open dense set in the space of input polynomials. Therefore, CQC, SCC and SOSC can be used as sufficient and necessary conditions for checking local optimality, for almost all polynomial optimization problems. This fact was observed in nonlinear programming.

2. A global optimality condition

Let $u$ be a feasible point for (1). By definition, $u$ is a global minimizer if and only if

\[
f(x) - f(u) \geq 0 \quad \forall\, x \in K. \tag{7}
\]

Typically, it is quite difficult to check (7) directly. In practice, people are interested in easily checkable conditions ensuring (7). For polynomial optimization, this is possible by using sum-of-squares type representations.

Let $\mathbb{R}[x]$ be the ring of real polynomials in $x := (x_1, \ldots, x_n)$. A polynomial $p \in \mathbb{R}[x]$ is said to be a sum of squares (SOS) if $p = p_1^2 + \cdots + p_k^2$ for some $p_1, \ldots, p_k \in \mathbb{R}[x]$. A sufficient condition for (7) is that there exist polynomials $\lambda_1, \ldots, \lambda_{m_1} \in \mathbb{R}[x]$ and SOS polynomials $\sigma_0, \sigma_1, \ldots, \sigma_{m_2} \in \mathbb{R}[x]$ such that

\[
f(x) - f(u) = \sum_{i=1}^{m_1} \lambda_i(x) h_i(x) + \sum_{j=0}^{m_2} \sigma_j(x) g_j(x). \tag{8}
\]

The equality in (8) is a polynomial identity in the variables $x$. Note that for every feasible point $x$ of (1), the right hand side of (8) is always nonnegative. This is why (8) ensures that $u$ is a global minimizer. The condition (8) was investigated by Lasserre [6]. It was a major tool for solving the optimization problem (1) globally. We call (8) a global optimality condition for (1).

People wonder when the global optimality condition holds. The representation of $f(x) - f(u)$ in (8) was motivated by Putinar's Positivstellensatz [16], which gives SOS type certificates for positive or nonnegative polynomials on the set $K$. Denote

\[
\langle h \rangle := h_1 \cdot \mathbb{R}[x] + \cdots + h_{m_1} \cdot \mathbb{R}[x],
\]

which is the ideal generated by the polynomial tuple $h$. Let $\Sigma[x]$ be the set of all SOS polynomials in $\mathbb{R}[x]$. The polynomial tuple $g$ generates the quadratic module

\[
Q(g) := \Sigma[x] + g_1 \Sigma[x] + \cdots + g_{m_2} \Sigma[x].
\]

If there exists a polynomial $p \in \langle h \rangle + Q(g)$ such that the set $\{x \in \mathbb{R}^n : p(x) \geq 0\}$ is compact, then $\langle h \rangle + Q(g)$ is said to be archimedean. The archimedeanness of $\langle h \rangle + Q(g)$ implies the compactness of $K$, while the reverse is not necessarily true. However, when $K$ is compact, we can always add a redundant constraint like $R - \|x\|_2^2 \geq 0$ to the tuple $g$ so that $\langle h \rangle + Q(g)$ is archimedean. Hence, archimedeanness of $\langle h \rangle + Q(g)$ is almost equivalent to the compactness of $K$. Putinar's Positivstellensatz [16] says that if $\langle h \rangle + Q(g)$ is archimedean, then every polynomial which is strictly positive on $K$ belongs to $\langle h \rangle + Q(g)$.

The global optimality condition (8) is equivalent to the membership

\[
f(x) - f(u) \in \langle h \rangle + Q(g).
\]

When $u$ is a global minimizer of (1), the polynomial

\[
\hat{f}(x) := f(x) - f(u)
\]

is nonnegative on $K$, but not strictly positive on $K$, because $u$ is always a zero point of $\hat{f}$ on $K$. So, Putinar's Positivstellensatz itself does not imply the global optimality condition (8). Indeed, there are counterexamples for which (8) does not hold. For instance, when $f$ is the Motzkin polynomial $x_1^2 x_2^2 (x_1^2 + x_2^2 - 3x_3^2) + x_3^6$ and $K$ is the unit ball, then (8) fails to hold.

However, the global optimality condition (8) holds for almost all polynomials $f, h_i, g_j$, i.e., it holds in an open dense set in the space of input polynomials. This is a major conclusion of [15]. The ideal $\langle h \rangle$ is said to be real if every polynomial in $\mathbb{R}[x]$ vanishing on the set $\{x \in \mathbb{R}^n : h(x) = 0\}$ belongs to $\langle h \rangle$ (cf. [2]). This is a general condition. For instance, if $\langle h \rangle$ is a prime ideal and $h$ has a nonsingular real zero, then $\langle h \rangle$ is real (cf. [2]). As pointed out earlier, when the feasible set $K$ is compact, we can generally assume that $\langle h \rangle + Q(g)$ is archimedean. Interestingly, the local conditions CQC, SCC and SOSC imply the global optimality condition (8), under the archimedeanness of $\langle h \rangle + Q(g)$. The following theorem is a consequence of the results in [15].

Theorem 2. Assume that the ideal $\langle h \rangle$ is real and the set $\langle h \rangle + Q(g)$ is archimedean. If the constraint qualification condition, strict complementarity condition, and second order sufficiency condition hold at every global minimizer of (1), then the global optimality condition (8) holds.

Proof. At every global minimizer $u$ of $f$ on $K$, the CQC, SCC and SOSC conditions imply that the boundary Hessian condition holds at $u$, by Theorem 3.1 of [15]. The boundary Hessian condition was introduced by Marshall [11] (see Condition 2.3 of [15]). Let $f_{\min}$ be the global minimum value of (1). Denote $V = \{x \in \mathbb{R}^n : h(x) = 0\}$ and let $I(V)$ be the set of all polynomials vanishing on $V$. By Theorem 9.5.3 of [10] (also see Theorem 2.4 of [15]), we have

\[
f(x) - f_{\min} \in I(V) + Q(g).
\]

Because $\langle h \rangle$ is real, $\langle h \rangle = I(V)$ and

\[
f(x) - f(u) \in \langle h \rangle + Q(g).
\]

So, the global optimality condition (8) holds.
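To make (8) concrete, here is a certificate for the toy disk instance from the earlier illustration (again a worked example added for this review, not taken from [15]): with $f = x_1 + x_2$, $g_1 = 1 - x_1^2 - x_2^2$ and $u = (-1/\sqrt{2}, -1/\sqrt{2})$, one can check the polynomial identity

\[
f(x) - f(u) \;=\; x_1 + x_2 + \sqrt{2}
\;=\; \underbrace{\tfrac{1}{\sqrt{2}}\Big[\big(x_1 + \tfrac{1}{\sqrt{2}}\big)^2 + \big(x_2 + \tfrac{1}{\sqrt{2}}\big)^2\Big]}_{\sigma_0 \ (\text{SOS})}
\;+\; \underbrace{\tfrac{1}{\sqrt{2}}}_{\sigma_1}\, g_1(x),
\]

so (8) holds here with no $\lambda_i$ terms, $\sigma_0$ an SOS polynomial, and $\sigma_1$ a nonnegative constant.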

By Theorem 1, the local conditions CQC, SCC and SOSC hold generically, i.e., in an open dense set in the space of input polynomials. Therefore, the global optimality condition (8) also holds generically, when $\langle h \rangle$ is real and $\langle h \rangle + Q(g)$ is archimedean.

3. Lasserre’s hierarchy

Lasserre [6] introduced a sequence of semidefinite relaxations for solving (1) globally, which is now called Lasserre's hierarchy in the literature. It can be described in two equivalent versions. One version uses SOS type representations, while the other one uses moment and localizing matrices. They are dual to each other, as shown in [6]. For convenience of description, we present the SOS version here. For each $k \in \mathbb{N}$ (the set of nonnegative integers), denote the sets of polynomials (note $g_0 = 1$)

\[
\langle h \rangle_{2k} := \left\{ \sum_{i=1}^{m_1} \lambda_i h_i \;\middle|\; \text{each } \lambda_i \in \mathbb{R}[x] \text{ and } \deg(\lambda_i h_i) \leq 2k \right\},
\]

\[
Q_k(g) := \left\{ \sum_{j=0}^{m_2} \sigma_j g_j \;\middle|\; \text{each } \sigma_j \in \Sigma[x] \text{ and } \deg(\sigma_j g_j) \leq 2k \right\}.
\]

Note that $\langle h \rangle_{2k}$ is a truncation of $\langle h \rangle$ and $Q_k(g)$ is a truncation of $Q(g)$. The SOS version of Lasserre's hierarchy is the sequence of relaxations

\[
\max \ \gamma \quad \text{s.t.} \quad f - \gamma \in \langle h \rangle_{2k} + Q_k(g) \tag{9}
\]

for $k = 1, 2, \ldots$. The problem (9) is equivalent to a semidefinite program (SDP), so it can be solved as an SDP by numerical methods. For instance, the software packages GloptiPoly 3 [3] and SeDuMi [18] can be used to solve it. We refer to [7, 9, 10] for recent work in polynomial optimization.
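To illustrate what (9) looks like as an SDP, here is a minimal sketch in Python (our own illustration, not the GloptiPoly/SeDuMi workflow cited above; the CVXPY modeling package, the toy instance, and all variable names are assumptions made for this example). It builds the order $k = 1$ relaxation for the univariate problem $\min\, x$ subject to $1 - x^2 \geq 0$, whose true minimum is $-1$.

```python
# A minimal sketch (for illustration only) of the order-k SOS relaxation (9)
# for the toy instance:  minimize f(x) = x  subject to  g(x) = 1 - x^2 >= 0.
# sigma0 is parametrized by a PSD Gram matrix; since deg(sigma1 * g) <= 2k with
# k = 1, sigma1 reduces to a nonnegative constant.  Requires the cvxpy package.
import cvxpy as cp

k = 1
f = [0.0, 1.0, 0.0]    # coefficients of f(x) = x, ordered [1, x, x^2]
g = [1.0, 0.0, -1.0]   # coefficients of g(x) = 1 - x^2

gamma = cp.Variable()                         # the lower bound being maximized
Q0 = cp.Variable((k + 1, k + 1), PSD=True)    # Gram matrix of sigma0 w.r.t. (1, x)
s1 = cp.Variable(nonneg=True)                 # sigma1 (a nonnegative constant here)

def gram_coeffs(Q, deg):
    """Coefficients of m(x)^T Q m(x), with monomial vector m(x) = (1, x, ..., x^deg)."""
    return [sum(Q[i, j] for i in range(deg + 1) for j in range(deg + 1) if i + j == d)
            for d in range(2 * deg + 1)]

sigma0 = gram_coeffs(Q0, k)                   # coefficients of sigma0, degrees 0..2k
# Match coefficients of f - gamma and sigma0 + sigma1 * g, degree by degree.
lhs = [f[0] - gamma, f[1], f[2]]
rhs = [sigma0[d] + s1 * g[d] for d in range(2 * k + 1)]
constraints = [rhs[d] - lhs[d] == 0 for d in range(2 * k + 1)]

prob = cp.Problem(cp.Maximize(gamma), constraints)
prob.solve()
print("order-1 lower bound f_k =", gamma.value)   # expect about -1.0, the true minimum
```

Carrying out this bookkeeping by hand quickly becomes tedious for larger degrees and more variables; automating it on top of an SDP solver is exactly what packages such as GloptiPoly 3 provide.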

Let $f_{\min}$ be the minimum value of (1) and let $f_k$ denote the optimal value of (9). It was shown that (cf. [6])

\[
\cdots \leq f_k \leq f_{k+1} \leq \cdots \leq f_{\min}.
\]

When $\langle h \rangle + Q(g)$ is archimedean, Lasserre [6] proved the asymptotic convergence

\[
f_k \to f_{\min} \quad \text{as } k \to \infty.
\]

If $f_k = f_{\min}$ for some $k$, Lasserre's hierarchy is said to have finite convergence. It is possible that the sequence $\{f_k\}$ has only asymptotic, but not finite, convergence. For instance, this is the case when $f$ is the Motzkin polynomial $x_1^2 x_2^2 (x_1^2 + x_2^2 - 3x_3^2) + x_3^6$ and $K$ is the unit ball [12, Example 5.3]. Indeed, such an $f$ always exists whenever $\dim(K) \geq 3$, which can be implied by [17, Prop. 6.1]. However, such cases do not happen very often. Lasserre's hierarchy often has finite convergence in practice, which was demonstrated by extensive numerical experiments in polynomial optimization (cf. [4, 5]).

A major conclusion of [15] is that Lasserre's hierarchy almost always has finite convergence. Specifically, it was shown that Lasserre's hierarchy has finite convergence when the local conditions CQC, SCC and SOSC are satisfied, under the archimedeanness. The following theorem is shown in [15].

Theorem 3. Assume that $\langle h \rangle + Q(g)$ is archimedean. If the constraint qualification, strict complementarity and second order sufficiency conditions hold at every global minimizer of (1), then Lasserre's hierarchy of (9) has finite convergence.

By Theorem 1, the local conditions CQC, SCC and SOSC hold at every local minimizer, in an open dense set in the space of input polynomials. This implies that, under the archimedeanness of $\langle h \rangle + Q(g)$, Lasserre's hierarchy has finite convergence in an open dense set in the space of input polynomials. That is, Lasserre's hierarchy almost always (i.e., generically) has finite convergence. This is a major conclusion of [15].

If one of the assumptions in Theorem 3 does not hold, then $\{f_k\}$ may fail to have finite convergence. Counterexamples are shown in §3 of [15]. On the other hand, there exist other, non-generic conditions that ensure finite convergence of $\{f_k\}$. For instance, if $h$ has finitely many real or complex zeros, then $\{f_k\}$ has finite convergence (cf. [8, 14]).

Since the minimum value $f_{\min}$ is typically not known, a practical concern is how to check $f_k = f_{\min}$ in computation. This issue was addressed in [13]. Flat truncation is generally a sufficient and necessary condition for checking finite convergence.

For non-generic polynomial optimization problems, it is possible that the sequence $\{f_k\}$ does not have finite convergence to $f_{\min}$. People are interested in methods that have finite convergence for minimizing all polynomials over a given set $K$. The Jacobian SDP relaxation proposed in [12] can be applied for this purpose. It gives a sequence of lower bounds that converge finitely to $f_{\min}$, for every polynomial $f$ that has a global minimizer over a general set $K$.

Acknowledgement. The research was partially supported by the NSF grants DMS-0844775 and DMS-1417985.

REFERENCES

[1] D. Bertsekas. Nonlinear Programming, second edition. Athena Scientific, 1995.

[2] J. Bochnak, M. Coste and M-F. Roy. Real Algebraic Geometry, Springer, 1998.

[3] D. Henrion, J. Lasserre and J. Loefberg. GloptiPoly 3: moments, optimization and semidefinite programming. http://homepages.laas.fr/henrion/software/gloptipoly3/

[4] D. Henrion and J.B. Lasserre. GloptiPoly: Global Optimization over Polynomials with Matlab and SeDuMi. ACM Trans. Math. Soft. 29, pp. 165–194, 2003.

[5] D. Henrion and J.B. Lasserre. Detecting global optimality and extracting solutions in GloptiPoly. Positive polynomials in control, 293–310, Lecture Notes in Control and Inform. Sci., 312, Springer, Berlin, 2005.

[6] J.B. Lasserre. Global optimization with polynomials and the problem of moments. SIAM J. Optim., 11(3):796–817, 2001.

[7] J.B. Lasserre. Moments, Positive Polynomials and Their Applications, Imperial College Press, 2009.

[8] M. Laurent. Semidefinite representations for finite varieties. Mathematical Programming, Vol. 109, pp. 1–26, 2007.

[9] M. Laurent. Sums of squares, moment matrices and optimization over polynomials. Emerging Applications of Algebraic Geometry, Vol. 149 of IMA Volumes in Mathematics and its Applications (Eds. M. Putinar and S. Sullivant), Springer, pages 157–270, 2009.

[10] M. Marshall. Positive Polynomials and Sums of Squares. Mathematical Surveys and Monographs, 146. American Mathematical Society, Providence, RI, 2008.

[11] M. Marshall. Representation of non-negative polynomials, degree bounds and applications to optimization. Canad. J. Math., 61 (2009), pp. 205–221.

[12] J. Nie. An exact Jacobian SDP relaxation for polynomial optimization. Mathematical Programming, Ser. A, Vol. 137, pp. 225–255, 2013.

[13] J. Nie. Certifying convergence of Lasserre's hierarchy via flat truncation. Mathematical Programming, Ser. A, Vol. 142, No. 1-2, pp. 485–510, 2013.

[14] J. Nie. Polynomial optimization with real varieties. SIAM Journal On Optimization, Vol. 23, No. 3, pp. 1634–1646, 2013.

[15] J. Nie. Optimality Conditions and Finite Convergence of Lasserre's Hierarchy. Mathematical Programming, Ser. A, Vol. 146, No. 1-2, pp. 97–121, 2014.

[16] M. Putinar. Positive polynomials on compact semi-algebraic sets, Ind. Univ. Math. J. 42 (1993), 969–984.

[17] C. Scheiderer. Sums of squares of regular functions on real algebraic varieties. Trans. Am. Math. Soc., 352, 1039–1069 (1999).

[18] J.F. Sturm. SeDuMi 1.02: a MATLAB toolbox for optimization over symmetric cones. Optimization Methods and Software, 11&12 (1999), 625–653. http://sedumi.ie.lehigh.edu


Convergence Rate Analysis of Several Splitting Schemes

Damek Davis
Department of Mathematics
University of California, Los Angeles, CA
[email protected]

Our paper [12] is concerned with the following unconstrained minimization problem:

\[
\operatorname*{minimize}_{x \in \mathcal{H}} \;\; f(x) + g(x) \tag{1}
\]

where $\mathcal{H}$ is a Hilbert space and $f, g : \mathcal{H} \to (-\infty, \infty]$ are closed, proper, and convex functions. Through duality, the results generalize to the problem

\[
\operatorname*{minimize}_{x \in \mathcal{H}_1,\, y \in \mathcal{H}_2} \;\; f(x) + g(y) \quad \text{subject to } Ax + By = 0, \tag{2}
\]

where $A : \mathcal{H}_1 \to \mathcal{G}$ and $B : \mathcal{H}_2 \to \mathcal{G}$ are linear mappings and $\mathcal{H}_1, \mathcal{H}_2$, and $\mathcal{G}$ are Hilbert spaces. At first this decomposition of the objective into two functions may seem superfluous, but it captures the motivation of operator-splitting methods: the whole convex problem may be as complicated as you wish, but it can be decomposed into two or more (possibly) simpler terms.

The above decomposition suggests that we try to minimize the complicated objective $f + g$ by alternately solving a series of subproblems related to minimizing $f$ and $g$ separately. The following strongly convex optimization problem is one of the main subproblems we use:

\[
\operatorname{prox}_f(x) := \operatorname*{argmin}_{y \in \mathcal{H}} \; f(y) + \frac{1}{2}\|y - x\|^2,
\]

where the input $x$ will encode some information about $f + g$, so the subproblem will not just minimize $f$ alone. It turns out that many functions arising in machine learning, signal processing, and imaging have simple or closed form proximal operators [7, 20, 4, 16], which is why operator-splitting algorithms are so effective for large-scale problems.
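As a small illustration of such closed forms (our own sketch, not code from [12]), the proximal operators of the $\ell_1$ norm and of the indicator function of an affine set $\{y : Ay = b\}$ are soft-thresholding and affine projection, respectively:

```python
# Two standard closed-form proximal operators (a sketch for illustration only).
import numpy as np

def prox_l1(x, t):
    """prox of t*||.||_1 at x: componentwise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_affine_indicator(x, A, b):
    """prox of the indicator of {y : Ay = b}: Euclidean projection onto that set."""
    # x - A^T (A A^T)^{-1} (A x - b); assumes A has full row rank
    return x - A.T @ np.linalg.solve(A @ A.T, A @ x - b)

# quick check on random data
rng = np.random.default_rng(0)
A, b, x = rng.standard_normal((3, 8)), rng.standard_normal(3), rng.standard_normal(8)
print(prox_l1(x, 0.5))
print(np.allclose(A @ prox_affine_indicator(x, A, b), b))   # projected point is feasible
```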

Since von Neumann devised the method of alternating projections [24], several operator splitting methods have been proposed [18, 21, 23] to solve a variety of decomposable problems [6, 14, 8, 9, 25, 5]. Although splitting methods are observed to perform well in practice, their convergence rates were unknown for many years; this is a big limitation for practitioners. Without convergence rates, practitioners cannot predict practical performance, compare competing methods without implementing both, or understand why methods perform well on nice problems but are slow on other problems of the same size. The award winning manuscript [12] and follow up work [13, 10, 11] resolve these issues for a variety of splitting algorithms.

Analyzing convergence rates of splitting methods is challenging because the classical objective error analysis breaks down. Here, the objective no longer monotonically decreases; in some cases, objective values can be completely useless. For example, finding a point in the intersection of sets $C_1$ and $C_2$ can be formulated as Problem (1) with $f = \iota_{C_1}$ and $g = \iota_{C_2}$. Notice that the objective function $f + g$ is finite (and equal to 0) only at a solution $x \in C_1 \cap C_2$. The Douglas-Rachford splitting method (which is a special case of the Peaceman-Rachford splitting method; see Equation (3)) produces a sequence of points converging to a solution. In general, all points in the sequence have an infinite objective value, but the limit point has a zero objective value! Thus, we must take a different approach to analyze operator-splitting methods: we first prove convergence rates for the fixed-point residual associated to a nonexpansive mapping, which implies bounds for certain subgradients that we use to bound the objective function.

(Photo: Sanjay Mehrotra, Damek Davis and Serhat Aybat)


A sketch of the upper complexity proof.

A key property used in our analysis is the firm nonexpansiveness of $\operatorname{prox}_f$, which means that $2\operatorname{prox}_f - I_{\mathcal{H}}$ is nonexpansive, i.e., 1-Lipschitz continuous [1, Proposition 4.2]. Because the composition of two nonexpansive operators remains nonexpansive, this property implies that the Peaceman-Rachford operator $T_{\mathrm{PRS}} := (2\operatorname{prox}_f - I) \circ (2\operatorname{prox}_g - I)$ is nonexpansive [18]. Thus, the Krasnosel'skii-Mann (KM) [17, 19] iteration with damping by $\lambda \in (0, 1)$,

\[
z^{k+1} = (1 - \lambda) z^k + \lambda\, T_{\mathrm{PRS}}(z^k), \qquad k = 0, 1, \ldots, \tag{3}
\]

will weakly converge to a fixed point of $T_{\mathrm{PRS}}$ whenever one exists [1, Theorem 5.14]. We are interested in fixed points $z^*$ of $T_{\mathrm{PRS}}$ because they immediately yield minimizers through the relation $\operatorname{prox}_g(z^*) \in \operatorname{argmin}(f + g)$ [1, Theorem 25.6]. Furthermore, the sequence $(\operatorname{prox}_g(z^j))_{j \geq 0}$ weakly converges to a minimizer of $f + g$ when one exists [22].

For any nonexpansive mapping $T : \mathcal{H} \to \mathcal{H}$, the fixed-point residual (FPR) $\|Tz - z\|$ measures how "close" $z$ is to being a fixed point. A key result in our paper establishes the convergence rate of this quantity:

Theorem 4. Let $\lambda \in (0, 1)$. For any nonexpansive map $T : \mathcal{H} \to \mathcal{H}$ with nonempty fixed-point set, we have

\[
\|T z^k - z^k\| = o\!\left(\frac{1}{\sqrt{k + 1}}\right),
\]

where $z^0 \in \mathcal{H}$ and, for all $k \geq 0$, $z^{k+1} = (1 - \lambda) z^k + \lambda T z^k$.

Roughly, this theorem follows by showing that $(\|z^{j+1} - z^j\|^2)_{j \geq 0}$ is monotonic and summable. The convergence rate then follows from the following simple lemma:

Lemma 5. Let $(a_j)_{j \geq 0} \subseteq \mathbb{R}_{++}$ be a monotonically nonincreasing and summable sequence. Then $a_k = o(1/(k + 1))$.

Proof. $(k + 1)\, a_{2k} \leq a_k + \cdots + a_{2k} \xrightarrow{\,k \to \infty\,} 0$.

We already mentioned that the convergence rate of the fixed-point residual would play a role in bounding subgradients associated to the relaxed PRS algorithm. Using some basic convex analysis, Figure 1 "expands" a single iteration of Algorithm (3) into a series of "subgradient steps."

[Figure 1: A single iteration of the relaxed PRS algorithm, expanded into subgradient steps. In the figure, $x_g = \operatorname{prox}_g(z)$, $z' = 2x_g - z$, $x_f = \operatorname{prox}_f(z')$, and $z^+ = (1/2) z + (1/2) T_{\mathrm{PRS}}(z)$; moreover $\widetilde{\nabla} g(x_g) \in \partial g(x_g)$ and $\widetilde{\nabla} f(x_f) \in \partial f(x_f)$.]

A few identities immediately follow from Figure 1:

\[
z^+ - z = x_f - x_g = -\big(\widetilde{\nabla} g(x_g) + \widetilde{\nabla} f(x_f)\big). \tag{4}
\]

Note that whenever $T_{\mathrm{PRS}}(z) = z$, we have $x^* := x_f = x_g$ and $\widetilde{\nabla} g(x^*) + \widetilde{\nabla} f(x^*) = 0$, so $x^* = \operatorname{prox}_g(z)$ is optimal for Problem (1). With this identity and Theorem 4, we get the following result:

Theorem 6. Suppose that $(z^j)_{j \geq 0}$ is generated by Algorithm (3). For all $k \geq 0$, with $x_g^k$, $\widetilde{\nabla} g(x_g^k)$, $x_f^k$, and $\widetilde{\nabla} f(x_f^k)$ defined as in Figure 1, we have

\[
\|x_f^k - x_g^k\|^2 = \big\|\widetilde{\nabla} g(x_g^k) + \widetilde{\nabla} f(x_f^k)\big\|^2 = o\!\left(\frac{1}{k + 1}\right).
\]

By applying the basic subgradient inequality $h(y) \geq h(x) + \langle \widetilde{\nabla} h(x), y - x \rangle$ to $f$ and $g$, we obtain the next theorem.

Theorem 7. Suppose that $(z^j)_{j \geq 0}$ is generated by Algorithm (3). Let $x^*$ be a minimizer of $f + g$. For all $k \geq 0$, let $x_g^k$ and $x_f^k$ be defined as in Figure 1. Then

\[
|f(x_f^k) + g(x_g^k) - f(x^*) - g(x^*)| = o\!\left(\frac{1}{\sqrt{k + 1}}\right).
\]

Furthermore, when $f$ is Lipschitz continuous, we have

\[
0 \leq (f + g)(x_g^k) - (f + g)(x^*) = o\!\left(\frac{1}{\sqrt{k + 1}}\right).
\]


Theorem 7 shows that the relaxed PRS algorithm is strictly faster than the slowest performing first-order (subgradient) methods. Splitting methods are known to perform well in practice, so it is natural to suspect that this is not the best we can do. The following section shows that this intuition is partially true.

Acceleration through averaging.

Can we improve upon the worst-case $o(1/\sqrt{k + 1})$ complexity in Theorem 7? The following example gives us some insight into how we might improve.

[Figure 2: The relaxed PRS algorithm ($\lambda = 1/2$) applied to the intersection of two lines $L_1$ and $L_2$ (the horizontal axis and the line through the origin with slope $0.2$). Formally, $g = \iota_{L_1}$ and $f = \iota_{L_2}$ are the indicator functions of $L_1$ and $L_2$. The left side (red) is the actual trajectory of the DRS algorithm (i.e., the sequence $(z^j)_{j \geq 0}$). The right side (blue) is obtained through averaging: $\bar{z}^k := \tfrac{1}{k+1}\sum_{i=0}^{k} z^i$.]

Figure 2 replaces the $k$-th iterate $z^k$ generated by Algorithm (3) with the averaged, or ergodic, iterate $\bar{z}^k := \tfrac{1}{k+1}\sum_{i=0}^{k} z^i$. It is clear that, at least initially, the ergodic iterate outperforms the standard, or nonergodic, iterate because of the cancellation induced by averaging the spiraling sequence. In actuality, the nonergodic iterate eventually surpasses the ergodic iterate and converges linearly with rate $\approx 0.9806$ [2], but this example gives us an idea of why averaging can help.

Inspired by Figure 2, we introduced the following averaging scheme for relaxed PRS to obtain a faster $O(1/(k + 1))$ worst-case convergence rate.

Theorem 8. Suppose that $(z^j)_{j \geq 0}$ is generated by Algorithm (3). Let $x^*$ be a minimizer of $f + g$. For all $k \geq 0$, let $x_g^k$ and $x_f^k$ be defined as in Figure 1. Furthermore, let $\bar{x}_g^k := \tfrac{1}{k+1}\sum_{i=0}^{k} x_g^i$ and $\bar{x}_f^k := \tfrac{1}{k+1}\sum_{i=0}^{k} x_f^i$. Then

\[
|f(\bar{x}_f^k) + g(\bar{x}_g^k) - f(x^*) - g(x^*)| = O\!\left(\frac{1}{k + 1}\right).
\]

Furthermore, when $f$ is Lipschitz continuous, we have

\[
0 \leq (f + g)(\bar{x}_g^k) - (f + g)(x^*) = O\!\left(\frac{1}{k + 1}\right).
\]

The rate $O(1/(k + 1))$ is a big improvement over the $o(1/\sqrt{k + 1})$ rate from Theorem 7. However, it is not clear that averaging will always result in faster practical performance. Indeed, consider the following basis pursuit problem:

\[
\operatorname*{minimize}_{x \in \mathbb{R}^d} \; \|x\|_1 \quad \text{subject to: } Ax = b \tag{5}
\]

where $A \in \mathbb{R}^{m \times d}$ is a matrix and $b \in \mathbb{R}^m$ is a vector. We can apply the relaxed PRS algorithm to this problem by setting $g(x) = \|x\|_1$ and $f(x) = \iota_{\{y \,\mid\, Ay = b\}}(x)$, where $\iota_C$ denotes the $\{0, +\infty\}$-valued indicator function of a set $C$.
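The following is a minimal sketch (our own illustration, not the experiment behind Figure 3) of iteration (3) on problem (5), with $\lambda = 1/2$ (i.e., Douglas-Rachford), together with the ergodic average of Theorem 8. The random data, iteration count, and all names are assumptions made for the example.

```python
# Relaxed PRS (lambda = 1/2, i.e. Douglas-Rachford) on basis pursuit (5):
#   minimize ||x||_1  subject to  Ax = b,   with g = ||.||_1 and f = indicator of {Ay = b}.
# A sketch for illustration; not the code used to produce Figure 3.
import numpy as np

rng = np.random.default_rng(1)
m, d = 10, 30
A = rng.standard_normal((m, d))
b = A @ rng.standard_normal(d)

prox_g = lambda z: np.sign(z) * np.maximum(np.abs(z) - 1.0, 0.0)   # soft-thresholding
AAt = A @ A.T
prox_f = lambda z: z - A.T @ np.linalg.solve(AAt, A @ z - b)        # projection onto {Ay = b}

lam = 0.5
z = np.zeros(d)
xg_avg = np.zeros(d)
for k in range(2000):
    xg = prox_g(z)                         # x_g^k
    xf = prox_f(2 * xg - z)                # x_f^k
    T = z + 2 * (xf - xg)                  # T_PRS(z) = (2 prox_f - I)(2 prox_g - I) z
    z = (1 - lam) * z + lam * T            # iteration (3)
    xg_avg = (k * xg_avg + xg) / (k + 1)   # ergodic iterate of Theorem 8

print("nonergodic:", np.linalg.norm(A @ xg - b), np.linalg.norm(xg, 1))
print("ergodic:   ", np.linalg.norm(A @ xg_avg - b), np.linalg.norm(xg_avg, 1))
```

Here the ergodic average costs only one extra vector update per iteration, which matches the remark below that averaging is nearly costless.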

[Figure 3: The objective error of the basis pursuit problem (5).]

Figure 3 shows a numerical experiment on the basis pursuit problem. Three things are clear: (i) the nonergodic iterate is highly oscillatory, as suggested earlier; (ii) the performance of the nonergodic iterate eventually surpasses that of the ergodic iterate; (iii) the ergodic iterate eventually converges with rate exactly $O(1/(k + 1))$ and no faster. This situation is typical, and suggests that averaging is not necessarily a good idea in all scenarios. Nevertheless, obtaining $\bar{x}_g^k$ is a nearly costless procedure, so it never hurts to average.


[Figure 4: This graph shows 5 iterations of relaxed PRS applied to an infinite direct sum of examples similar to the one shown in Figure 2. The angles $(\theta_j)_{j \geq 0}$ are converging to 0. The black and blue dots coincide in the first component. The infinite vectors of black (on the tip of the spiral), blue (on the horizontal axis), and magenta (on the non-horizontal line) dots are the vectors $z^5$, $x_g^5$, and the projection of $x_g^5$ onto the non-horizontal line, respectively. The objective functions in Theorem 9 are as follows: $g$ is the indicator of the horizontal axis and $f$ is the distance function to the non-horizontal line. Thus, $g(x_g^5) = 0$ and $f(x_g^5)$ is the distance between the blue and magenta dots.]

The worst-case nonergodic rate is sharp.

Although averaging can improve the rate of convergence of relaxed PRS, we observed in Figure 3 that the nonergodic iterate can perform better than the ergodic iterate. Does this mean that our convergence rate in Theorem 7 can be improved? It turns out that the answer is no, at least for infinite dimensional problems:

Theorem 9. There is a Hilbert space $\mathcal{H}$ and two closed, proper, and convex functions $f, g : \mathcal{H} \to (-\infty, \infty]$ such that $f$ is Lipschitz continuous and the following holds: for all $\varepsilon > 0$, there is an initial point $z^0 \in \mathcal{H}$ and a stepsize $\gamma \in \mathbb{R}_{++}$ with

\[
(f + g)(x_g^k) - (f + g)(x^*) = \Omega\!\left(\frac{1}{(k + 1)^{1/2 + \varepsilon}}\right),
\]

where $(z^j)_{j \geq 0}$ is generated by Algorithm (3) with $\lambda = 1/2$ and $x^*$ is the unique minimizer of $f + g$.

The example that proves the sharpness of our $o(1/\sqrt{k + 1})$ rate is shown in Figure 4. The proof of Theorem 9 is technical and involves estimating an oscillatory integral, so we cannot get into the details in this brief article.

Follow up work.

In a later paper [13], we show that relaxed PRS and ADMM automatically adapt to the regularity of Problems (1) and (2). For example, we show that for all minimizers $x^*$ of $f + g$, the "best" objective error satisfies

\[
\min_{i = 0, \ldots, k} \big((f + g)(x_g^i) - (f + g)(x^*)\big) = o\!\left(\frac{1}{k + 1}\right)
\]

whenever $f$ is differentiable and $\nabla f$ is Lipschitz continuous. Thus, relaxed PRS obtains the same rate of convergence as the forward-backward splitting algorithm (also called ISTA) [3], which requires knowledge of the Lipschitz constant of $\nabla f$ to ensure convergence (or a suitable line search procedure). The paper also includes general conditions for linear convergence, linear convergence for feasibility problems with nice intersections, and convergence rates under strong convexity assumptions. Finally, the papers [10, 11] prove convergence rates for a wide class of algorithms that are significantly more powerful than relaxed PRS and ADMM.

Conclusion.

The results of our paper show that, in the worst case, the relaxed PRS method is nearly as slow as the subgradient method (Theorem 9). This is unexpected because splitting methods tend to perform well in practice. We can get around this issue by computing the ergodic sequence (Theorem 8) and testing whether it has a smaller objective value than the nonergodic sequence. This procedure is essentially costless, but it tends to perform worse in practice (Figure 3). In the paper, we include several other complexity results, including the first example of arbitrarily slow convergence for the Douglas-Rachford splitting method. We also analyze extensions of Problem (1) and get similar results for the alternating direction method of multipliers (ADMM) [15].

Acknowledgements.

I would like to say thank you to the prize committee for this honor. I am also grateful to Professor Wotao Yin for taking me on as a student and collaborating with me on this project.

REFERENCES

[1] Heinz H. Bauschke and Patrick L. Combettes. Convex analysis and monotone operator theory in Hilbert spaces. Springer, 2011.

[2] Heinz H. Bauschke, J.Y. Bello Cruz, Tran T.A. Nghia, Hung M. Phan, and Xianfu Wang. The rate of linear convergence of the Douglas-Rachford algorithm for subspaces is the cosine of the Friedrichs angle. Journal of Approximation Theory, 185(0):63–79, 2014.

[3] Amir Beck and Marc Teboulle. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.

[4] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.

[5] Luis M. Briceno-Arias and Patrick L. Combettes. A Monotone+Skew Splitting Model for Composite Monotone Inclusions in Duality. SIAM Journal on Optimization, 21(4):1230–1250, 2011.

[6] Antonin Chambolle and Thomas Pock. A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging. Journal of Mathematical Imaging and Vision, 40(1):120–145, 2011.

[7] Patrick L. Combettes and Jean-Christophe Pesquet. Proximal splitting methods in signal processing. In Fixed-Point Algorithms for Inverse Problems in Science and Engineering, Springer Optimization and Its Applications, pages 185–212. Springer New York, 2011.

[8] Patrick L. Combettes and Jean-Christophe Pesquet. Primal-Dual Splitting Algorithm for Solving Inclusions with Mixtures of Composite, Lipschitzian, and Parallel-Sum Type Monotone Operators. Set-Valued and Variational Analysis, 20(2):307–330, 2012.

[9] Laurent Condat. A Primal-Dual Splitting Method for Convex Optimization Involving Lipschitzian, Proximable and Linear Composite Terms. Journal of Optimization Theory and Applications, 158(2):460–479, 2013.

[10] Damek Davis. Convergence rate analysis of primal-dual splitting schemes. arXiv preprint arXiv:1408.4419v2, 2014.

[11] Damek Davis. Convergence rate analysis of the forward-Douglas-Rachford splitting scheme. arXiv preprint arXiv:1410.2654v3, 2014.

[12] Damek Davis and Wotao Yin. Convergence rate analysis of several splitting schemes. arXiv preprint arXiv:1406.4834v2, 2014.

[13] Damek Davis and Wotao Yin. Faster convergence rates of relaxed Peaceman-Rachford and ADMM under regularity assumptions. arXiv preprint arXiv:1407.5210v2, 2014.

[14] Ernie Esser, Xiaoqun Zhang, and Tony Chan. A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science. SIAM Journal on Imaging Sciences, 3(4):1015–1046, 2010.

[15] Roland Glowinski and Americo Marrocco. Sur l'approximation, par elements finis d'ordre un, et la resolution, par penalisation-dualite d'une classe de problemes de Dirichlet nonlineaires. Rev. Francaise d'Aut. Inf. Rech. Oper., R-2:41–76, 1975.

[16] Thomas Goldstein and Stanley Osher. The split Bregman method for L1-regularized problems. SIAM Journal on Imaging Sciences, 2(2):323–343, 2009.

[17] Mark Aleksandrovich Krasnosel'skii. Two remarks on the method of successive approximations. Uspekhi Matematicheskikh Nauk, 10(1):123–127, 1955.

[18] Pierre-Louis Lions and Bertrand Mercier. Splitting Algorithms for the Sum of Two Nonlinear Operators. SIAM Journal on Numerical Analysis, 16(6):964–979, 1979.

[19] W. Robert Mann. Mean Value Methods in Iteration. Proceedings of the American Mathematical Society, 4(3):506–510, 1953.

[20] Neal Parikh and Stephen Boyd. Proximal algorithms. Foundations and Trends in Optimization, 1(3):127–239, 2014.

[21] Gregory B. Passty. Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. Journal of Mathematical Analysis and Applications, 72(2):383–390, 1979.


[22] Benar F. Svaiter. On Weak Convergence of the Douglas-Rachford Method. SIAM Journal on Control and Optimization, 49(1):280–287, 2011.

[23] Paul Tseng. A Modified Forward-Backward Splitting Method for Maximal Monotone Mappings. SIAM Journal on Control and Optimization, 38(2):431–446, 2000.

[24] John von Neumann. Functional operators: The geometry of orthogonal spaces. Princeton University Press, 1951. (Reprint of mimeographed lecture notes first distributed in 1933.)

[25] Bang Cong Vu. A splitting algorithm for dual monotone inclusions involving cocoercive operators. Advances in Computational Mathematics, 38(3):667–681, 2013.

Nominations for Society Prizes Sought

The Society awards four prizes annually at the INFORMS annual meeting. We seek nominations (including self-nominations) for each of them, due by July 15, 2015. Details for each of the prizes, including eligibility rules and past winners, can be found by following the links from http://www.informs.org/Community/Optimization-Society/Prizes.

Each of the four awards includes a cash amount of US$1,000 and a citation plaque. The award winners will be invited to give a presentation in a special session sponsored by the Optimization Society during the INFORMS annual meeting in Philadelphia, PA in November 2015 (the winners will be responsible for their own travel expenses to the meeting). Award winners are also asked to contribute an article about their award-winning work to the annual Optimization Society newsletter.

Nominations, applications, and inquiries for each of the prizes should be made via email to the corresponding prize committee chair.

The Khachiyan Prize is awarded for outstanding lifetime contributions to the field of optimization by an individual or team. The topic of the contribution must belong to the field of optimization in its broadest sense. Recipients of the INFORMS John von Neumann Theory Prize or the MPS/SIAM Dantzig Prize in prior years are not eligible for the Khachiyan Prize. The prize committee for this year's Khachiyan Prize is as follows:

• Ilan Adler (Chair), [email protected]

• Michael Ball

• Don Goldfarb

• Werner Romisch

The Farkas Prize is awarded for outstanding contributions by a mid-career researcher to the field of optimization, over the course of their career. Such contributions could include papers (published or submitted and accepted), books, monographs, and software. The awardee will be within 25 years of their terminal degree as of January 1 of the year of the award. The prize may be awarded at most once in their lifetime to any person. The prize committee for this year's Farkas Prize is as follows:

• Ariela Sofer (Chair), [email protected]

• Jon Lee

• Sanjay Mehrotra

• Zelda Zabinski

The Prize for Young Researchers is awarded to one or more young researcher(s) for an outstanding paper in optimization. The paper must be published in, or submitted to and accepted by, a refereed professional journal within the four calendar years preceding the year of the award. All authors must have been awarded their terminal degree within the eight calendar years preceding the year of the award. The prize committee for this year's Prize for Young Researchers is as follows:

• Nick Sahinidis (Chair), [email protected]

• Dan Bienstock

• Sam Burer

• Andrew Schaefer

The Student Paper Prize is awarded to one or more student(s) for an outstanding paper in optimization that is submitted to and accepted by, or published in, a refereed professional journal within the three calendar years preceding the year of the award. Every nominee/applicant must be a student on the first of January of the year of the award. All coauthor(s) not nominated for the award must send a letter indicating that the majority of the nominated work was performed by the nominee(s). The prize committee for this year's Student Paper Prize is as follows:

• Mohit Tawarmalani (Chair), [email protected]

• Fatma Kilinc-Karzan

• Warren Powell

• Uday Shanbhag

Nominations of Candidates for Society Officers Sought

Sanjay Mehrotra will complete his term as Most-Recent Past-Chair of the Society at the conclusion of the 2015 INFORMS annual meeting. Suvrajeet Sen is continuing as Chair through 2016. Jim Luedtke will also complete his term as Secretary/Treasurer at the conclusion of the INFORMS meeting. The Society is indebted to Sanjay and Jim for their work.

We would also like to thank four Society Vice-Chairs who will be completing their two-year terms at the conclusion of the INFORMS meeting: Imre Polik, Juan Pablo Vielma, John Mitchell, and Vladimir Boginski.

Finally, we thank the Newsletter Editor, Shabbir Ahmed, who is also stepping down this year.

We are currently seeking nominations of candidates for the following positions:

• Chair-Elect

• Secretary/Treasurer

• Vice-Chair for Computational Optimization and Software

• Vice-Chair for Integer and Discrete Optimization

• Vice-Chair for Linear and Conic Optimization

• Vice-Chair for Network Optimization

• Newsletter Editor

Self-nominations for all of these positions are encouraged.

To ensure a smooth transition of the chairmanship of the Society, the Chair-Elect serves a one-year term before assuming a two-year position as Chair; thus this is a three-year commitment. As stated in the Society Bylaws, "The Chair shall be the chief administrative officer of the OS and shall be responsible for the development and execution of the Society's program. He/she shall (a) call and organize meetings of the OS, (b) appoint ad hoc committees as required, (c) appoint chairs and members of standing committees, (d) manage the affairs of the OS between meetings, and (e) preside at OS Council meetings and Society membership meetings."


The Secretary/Treasurer serves a two-year term. According to Society Bylaws, "The Secretary/Treasurer shall conduct the correspondence of the OS, keep the minutes and records of the Society, maintain contact with INFORMS, receive reports of activities from those Society Committees that may be established, conduct the election of officers and Members of Council for the OS, make arrangements for the regular meetings of the Council and the membership meetings of the OS. As treasurer, he/she shall also be responsible for disbursement of the Society funds as directed by the OS Council, prepare and distribute reports of the financial condition of the OS, help prepare the annual budget of the Society for submission to INFORMS. It will be the responsibility of the outgoing Secretary/Treasurer to make arrangements for the orderly transfer of all the Society's records to the person succeeding him/her." The Secretary/Treasurer is allowed to serve at most two consecutive terms; as Jim Luedtke is now completing his second term, the OS must elect a new Secretary/Treasurer this cycle.

Vice-Chairs also serve a two-year term. According to Society Bylaws, "The main responsibility of the Vice Chairs will be to help INFORMS Local Organizing committees identify cluster chairs and/or session chairs for the annual meetings. In general, the Vice Chairs shall serve as the point of contact with their sub-disciplines."

The Newsletter Editor is appointed by the Society Chair. The Newsletter Editor is responsible for editing this newsletter, which is produced in the spring each year. Although this is not an elected position, nominations for this position are being sought at this time for consideration by the chair.

Please send your nominations or self-nominations to Jim Luedtke ([email protected]), including contact information for the nominee, by June 1, 2015. Online elections will begin in mid-June, with new officers taking up their duties at the conclusion of the 2015 INFORMS annual meeting.

Optimization groups at Princeton and Rutgers will be hosting the 2016 IOS Conference, and Warren Powell will serve as the General Chair. Stay tuned for more information.