1 Mathematical Methods in Economics and...1 Schofield Springer Texts in Business and Economics Mathematical Methods in Economics and Mathematical Methods in Social Choice Economics

1

Schofield

Springer Texts in Business and Economics

Mathematical Methods in Economics and Social Choice

Mathem

atical Methods in

Economics and Social Choice

Second Edition

In recent years, the usual optimization techniques, which have proved so useful in microeconomic theory, have been extended to incorporate more powerful topological and differential methods, and these methods have led to new results on the qualitative behavior of general economic and political systems. These developments have neces-sarily resulted in an increase in the degree of formalism in the publications in the academic journals. This formalism can often deter graduate students. The progression of ideas presented in this book will familiarize the student with the geometric concepts underlying these topological methods, and, as a result, make mathematical economics, general equilibrium theory, and social choice theory more accessible.

2nd Ed.

Norman Schofield


Norman Schofield

Mathematical Methods in Economics and Social ChoiceSecond Edition

Business / Economics

9 7 8 3 6 4 2 3 9 8 1 7 9

isbn 978-3-642-39817-9

Metadata of the book and chapters that will be visualized online Book series name Springer Texts in Business and Economics

Book title Mathematical Methods in Economics and Social Choice

Book copyright year 2014

Book copyright holder Springer-Verlag Berlin Heidelberg

Corresponding Author Family name Schofield

Particle

Given Name Norman

Suffix

Division Center in Political Economy

Organization Washington University in Saint Louis

Address Saint Louis, MO, USA

Chapter 1

Chapter title Sets, Relations, and Preferences

Abstract Chapter 1 introduces elementary set theory and the notation to be used throughout the book. We also define the notions of a binary relation, of a function, as well as the axioms of a group and field. Finally we discuss the idea of an individual and social preference relation, and mention some of the concepts of social choice and welfare economics.

Chapter 2

Chapter title Linear Spaces and Transformations

Abstract Chapter 2 surveys material on linear or vector spaces, and introduces the idea of a linear transformation between vector spaces, and shows how a linear transformation can be represented by a matrix. We also prove the dimension theorem, which shows how a linear transformation can be represented in a simple form, given by its kernel and image. We also introduce the notion of an eigenvalue and eigenvector of a matrix.

Chapter 3

Chapter title Topology and Convex Optimisation

Abstract Chapter 3 covers Topology and convex optimization. In the Chap. 2 we introduced the notion of the scalar product of two vectors in ℜn. More generally if a scalar product is defined on some space, then this permits the definition of a norm, or length, associated with a vector, and this in turn allows us to define the distance between two vectors. A distance function or metr ic may be defined on a space, X, even when

X admits no norm. More general than the notion of a metric is that of a topology. This notion allows us to define the idea of continuity of a function as well as analogous ideas for a correspondence. We then introduce three powerful theorems, the Brouwer Fixed Point Theorem for a function, Michael’s Selection Theorem, and the Browder Fixed Point Theorem for a correspondence.

Chapter 4

Chapter title Differential Calculus and Smooth Optimisation

Abstract In this chapter we develop the ideas of the differential calculus. Under certain conditions a continuous function f :ℜn→ℜm can be approximated at each point x in ℜn

by a linear function df (x):ℜn→ℜm, known as the di f ferent ial of f at x. In the same way the differential df may be approximated by a bilinear map d2f(x). When all differentials are continuous then f is called smooth. For a smooth function f , Taylor’s Theorem gives a relationship between the differentials at a point x and the value of f in a neighbourhood of a point. This in turn allows us to characterise maximum points of the function by features of the first and second differential. For a real-valued function whose preference correspondence is convex we can virtually identify critical points (where df (x)=0) with the maxima of the function. We use calculus to derive important results in economic theory, namely conditions for existence of a price equilibrium for an economy, and the Welfare Theorem for an exchange economy.

Chapter 5

Chapter title Singularity Theory and General Equilibrium

Abstract In Chap. 5 we introduce the fundamental result in singularity theory, that the set of singularity points of a smooth preference profile almost always has a particular geometric structure. We then go on to use this result to discuss the Debreu-Smale Theorem on the generic existence of regular economies. Section <InternalRef RefID="Sec6">5.4</InternalRef> uses an example of Scarf (Int. Econ. Rev. 1:157–172, 1960) to illustrate the idea of an excess demand function for an exchange economy. The example provides a general way to analyse a smooth adjustment process leading to a Walrasian equilibrium. Sections <InternalRef RefID="Sec7">5.5</InternalRef> and <InternalRef RefID="Sec8">5.6</InternalRef> introduce the more abstract topological ideas of structural stability and chaos in dynamical systems.

Chapter 6

Chapter title Topology and Social Choice

Abstract In this chapter we apply earlier results to the study of social choice and modelling elections. In Chap. 3 we showed the Nakamura Theorem. that a social choice could

be guaranteed as long as the dimension of the space did not exceed k(σ)=2. We now consider what can happen in dimension above k(σ)=1. We then go on to consider “probabilistic” social choice, where there is some uncertainty over voters’ preferences, by constructing an empirical model of the 2008 U.S. presidential election.

Chapter 7

Chapter title Review Exercises

Abstract ??????

PDF-OUTPUT

AU

TH

OR

’S P

RO

OF

Book ID: 74899_2_En, Date: 2013-09-30, Proof No: 1

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46


For further volumes:www.springer.com/series/10099

http://www.springer.com/series/10099

AU

TH

OR

’S P

RO

OF

Book ID: 74899_2_En, Date: 2013-09-30, Proof No: 1, UNCORRECTED PROOF

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67

68

69

70

71

72

73

74

75

76

77

78

79

80

81

82

83

84

85

86

87

88

89

90

91

92

AU

TH

OR

’S P

RO

OF


93

94

95

96

97

98

99

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

Norman Schofield

Mathematical Methodsin Economics and SocialChoice

Second Edition

AU

TH

OR

’S P

RO

OF


139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

175

176

177

178

179

180

181

182

183

184

Norman SchofieldCenter in Political EconomyWashington University in Saint LouisSaint Louis, MO, USA

ISSN 2192-4333 ISSN 2192-4341 (electronic)Springer Texts in Business and EconomicsISBN 978-3-642-39817-9 ISBN 978-3-642-39818-6 (eBook)DOI 10.1007/978-3-642-39818-6Springer Heidelberg New York Dordrecht London

Library of Congress Control Number: ???

© Springer-Verlag Berlin Heidelberg 2004, 2014This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part ofthe material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,broadcasting, reproduction on microfilms or in any other physical way, and transmission or informationstorage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodologynow known or hereafter developed. Exempted from this legal reservation are brief excerpts in connectionwith reviews or scholarly analysis or material supplied specifically for the purpose of being enteredand executed on a computer system, for exclusive use by the purchaser of the work. Duplication ofthis publication or parts thereof is permitted only under the provisions of the Copyright Law of thePublisher’s location, in its current version, and permission for use must always be obtained from Springer.Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violationsare liable to prosecution under the respective Copyright Law.The use of general descriptive names, registered names, trademarks, service marks, etc. in this publicationdoes not imply, even in the absence of a specific statement, that such names are exempt from the relevantprotective laws and regulations and therefore free for general use.While the advice and information in this book are believed to be true and accurate at the date of pub-lication, neither the authors nor the editors nor the publisher can accept any legal responsibility for anyerrors or omissions that may be made. The publisher makes no warranty, express or implied, with respectto the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

http://www.springer.com

http://www.springer.com/mycopy

AU

TH

OR

’S P

RO

OF


185

186

187

188

189

190

191

192

193

194

195

196

197

198

199

200

201

202

203

204

205

206

207

208

209

210

211

212

213

214

215

216

217

218

219

220

221

222

223

224

225

226

227

228

229

230

Dedicated to the memory of Jeffrey Banksand Richard McKelvey

AU

TH

OR

’S P

RO

OF


AU

TH

OR

’S P

RO

OF


231

232

233

234

235

236

237

238

239

240

241

242

243

244

245

246

247

248

249

250

251

252

253

254

255

256

257

258

259

260

261

262

263

264

265

266

267

268

269

270

271

272

273

274

275

276

Foreword

The use of mathematics in the social sciences is expanding both in breadth anddepth at an increasing rate. It has made its way from economics into the other socialsciences, often accompanied by the same controversy that raged in economics in the1950s. And its use has deepened from calculus to topology and measure theory tothe methods of differential topology and functional analysis.

The reasons for this expansion are several. First, and perhaps foremost, mathe-matics makes communication between researchers succinct and precise. Second, ithelps make assumptions and models clear; this bypasses arguments in the field thatare a result of different implicit assumptions. Third, proofs are rigorous, so math-ematics helps avoid mistakes in the literature. Fourth, its use often provides moreinsights into the models. And finally, the models can be applied to different contextswithout repeating the analysis, simply by renaming the symbols.

Of course, the formulation of social science questions must precede the construc-tion of models and the distillation of these models down to mathematical problems,for otherwise the assumptions might be inappropriate.

A consequence of the pervasive use of mathematics in our research is a changein the level of mathematics training required of our graduate students. We need ref-erence and graduate text books that address applications of advanced mathematicsto a widening range of social sciences. This book fills that need.

Many years ago, Bill Riker introduced me to Norman Schofield’s work and thento Norman. He is unique in his ability to span the social sciences and apply integra-tive mathematical reasoning to them all. The emphasis on his work and his book ison smooth models and techniques, while the motivating examples for presentationof the mathematics are drawn primarily from economics and political science. Thereader is taken from basic set theory to the mathematics used to solve problems atthe cutting edge of research. Students in every social science will find exposure tothis mode of analysis useful; it elucidates the common threads in different fields.Speculations at the end of Chap. 5 provide students and researchers with many openresearch questions related to the content of the first four chapters. The answers arein these chapters. When the first edition appeared in 2002, I wrote in my Forewordthat a goal of the reader should be to write Chap. 6. For the second edition of thebook, Norman himself has accomplished this open task.

Marcus BerliantSt. Louis, Missouri, USA2013

vii

njschofi

Cross-Out

njschofi

Sticky Note

4

AU

TH

OR

’S P

RO

OF


AU

TH

OR

’S P

RO

OF


277

278

279

280

281

282

283

284

285

286

287

288

289

290

291

292

293

294

295

296

297

298

299

300

301

302

303

304

305

306

307

308

309

310

311

312

313

314

315

316

317

318

319

320

321

322

Preface to the Second Edition

For the second edition, I have added a new chapter six. This chapter continues withthe model presented in Chap. 3 by developing the idea of dynamical social choice.In particular the chapter considers the possibility of cycles enveloping the set ofsocial alternatives.

A theorem of Saari (1997) shows that for any non-collegial set, D, of decisive orwinning coalitions, if the dimension of the policy space is sufficiently large, then thechoice is empty under D for all smooth profiles in a residual subspace of Cr(W,�n).In other words the choice is generically empty.

However, we then define a social solution concept, known as the heart. Whenregarded as a correspondence, the heart is lower hemi-continuous. In general theheart is centrally located with respect to the distribution of voter preferences, andis guaranteed to be non-empty. Two examples are given to show how the heart isdetermined by the symmetry of the voter distribution.

Finally, to be able to use survey data of voter preferences, the chapter introducesthe idea of stochastic social choice. In situations where voter choice is given by aprobability vector, we can model the choice by assuming that candidates choosepolicies to maximise their vote shares. In general the equilibrium vote maximisingpositions can be shown to be at the electoral mean. The necessary and sufficient con-dition for this is given by the negative definiteness of the candidate vote Hessians. Inan empirical example, a multinomial logit model of the 2008 Presidential election ispresented, based on the American National Election Survey, and the parameters ofthis model used to calculate the Hessians of the vote functions for both candidates.According to this example both candidates should have converged to the electoralmean.

Norman SchofieldSaint Louis, Missouri, USAJune 13, 2013

ix

njschofi

Cross-Out

njschofi

Sticky Note

can

AU

TH

OR

’S P

RO

OF


AU

TH

OR

’S P

RO

OF


323

324

325

326

327

328

329

330

331

332

333

334

335

336

337

338

339

340

341

342

343

344

345

346

347

348

349

350

351

352

353

354

355

356

357

358

359

360

361

362

363

364

365

366

367

368

Author’s Preface

In recent years, the optimisation techniques, which have proved so useful in mi-croeconomic theory, have been extended to incorporate more powerful topologicaland differential methods. These methods have led to new results on the qualitativebehaviour of general economic and political systems. However, these developmentshave also led to an increase in the degree of formalism in published work. This for-malism can often deter graduate students. My hope is that the progression of ideaspresented in these lecture notes will familiarise the student with the geometric con-cepts underlying these topological methods, and, as a result, make mathematicaleconomics, general equilibrium theory, and social choice theory more accessible.

The first chapter of the book introduces the general idea of mathematical struc-ture and representation, while the second chapter analyses linear systems and therepresentation of transformations of linear systems by matrices. In the third chapter,topological ideas and continuity are introduced and used to solve convex optimi-sation problems. These techniques are also used to examine existence of a “socialequilibrium.” Chapter four then goes on to study calculus techniques using a linearapproximation, the differential, of a function to study its “local” behaviour.

The book is not intended to cover the full extent of mathematical economicsor general equilibrium theory. However, in the last sections of the third and fourthchapters I have introduced some of the standard tools of economic theory, namelythe Kuhn Tucker Theorem, together with some elements of convex analysis andprocedures using the Lagrangian. Chapter four provides examples of consumer andproducer optimisation. The final section of the chapter also discusses, in a heuristicfashion, the smooth or critical Pareto set and the idea of a regular economy. The fifthand final chapter is somewhat more advanced, and extends the differential calculusof a real valued function to the analysis of a smooth function between “local” vectorspaces, or manifolds. Modem singularity theory is the study and classification of allsuch smooth functions, and the purpose of the final chapter to use this perspective toobtain a generic or typical picture of the Pareto set and the set of Walrasian equilibriaof an exchange economy.

Since the underlying mathematics of this final section are rather difficult, I havenot attempted rigorous proofs, but rather have sought to lay out the natural path ofdevelopment from elementary differential calculus to the powerful tools of singular-ity theory. In the text I have referred to work of Debreu, Balasko, Smale, and Saari,among others who, in the last few years, have used the tools of singularity theory to

xi

njschofi

Sticky Note

AU

TH

OR

’S P

RO

OF


xii Author’s Preface

369

370

371

372

373

374

375

376

377

378

379

380

381

382

383

384

385

386

387

388

389

390

391

392

393

394

395

396

397

398

399

400

401

402

403

404

405

406

407

408

409

410

411

412

413

414

develop a deeper insight into the geometric structure of both the economy and thepolity. These ideas are at the heart of recent notions of “chaos.” Some speculationson this profound way of thinking about the world are offered in Sect. 5.6. Reviewexercises are provided at the end of the book.

I thank Annette Milford for typing the manuscript and Diana Ivanov for thepreparation of the figures.

I am also indebted to my graduate students for the pertinent questions they askedduring the courses on mathematical methods in economics and social choice, whichI have given at Essex University, the California Institute of Technology, and Wash-ington University in St. Louis.

In particular, while I was at the California Institute of Technology I had the priv-ilege of working with Richard McKelvey and of discussing ideas in social choicetheory with Jeff Banks. It is a great loss that they have both passed away. This bookis dedicated to their memory.

Norman SchofieldSaint Louis, Missouri, USA

AU

TH

OR

’S P

RO

OF


415

416

417

418

419

420

421

422

423

424

425

426

427

428

429

430

431

432

433

434

435

436

437

438

439

440

441

442

443

444

445

446

447

448

449

450

451

452

453

454

455

456

457

458

459

460

Contents

1 Sets, Relations, and Preferences . . . . . . . . . . . . . . . . . . . . 11.1 Elements of Set Theory . . . . . . . . . . . . . . . . . . . . . . 1

1.1.1 A Set Theory . . . . . . . . . . . . . . . . . . . . . . . . 21.1.2 A Propositional Calculus . . . . . . . . . . . . . . . . . 41.1.3 Partitions and Covers . . . . . . . . . . . . . . . . . . . 61.1.4 The Universal and Existential Quantifiers . . . . . . . . . 7

1.2 Relations, Functions and Operations . . . . . . . . . . . . . . . 71.2.1 Relations . . . . . . . . . . . . . . . . . . . . . . . . . . 71.2.2 Mappings . . . . . . . . . . . . . . . . . . . . . . . . . 81.2.3 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.3 Groups and Morphisms . . . . . . . . . . . . . . . . . . . . . . 121.4 Preferences and Choices . . . . . . . . . . . . . . . . . . . . . . 24

1.4.1 Preference Relations . . . . . . . . . . . . . . . . . . . . 241.4.2 Rationality . . . . . . . . . . . . . . . . . . . . . . . . . 251.4.3 Choices . . . . . . . . . . . . . . . . . . . . . . . . . . 27

1.5 Social Choice and Arrow’s Impossibility Theorem . . . . . . . . 321.5.1 Oligarchies and Filters . . . . . . . . . . . . . . . . . . . 331.5.2 Acyclicity and the Collegium . . . . . . . . . . . . . . . 35

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

2 Linear Spaces and Transformations . . . . . . . . . . . . . . . . . 392.1 Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 392.2 Linear Transformations . . . . . . . . . . . . . . . . . . . . . . 45

2.2.1 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 462.2.2 The Dimension Theorem . . . . . . . . . . . . . . . . . 492.2.3 The General Linear Group . . . . . . . . . . . . . . . . . 532.2.4 Change of Basis . . . . . . . . . . . . . . . . . . . . . . 552.2.5 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 59

2.3 Canonical Representation . . . . . . . . . . . . . . . . . . . . . 622.3.1 Eigenvectors and Eigenvalues . . . . . . . . . . . . . . . 632.3.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 662.3.3 Symmetric Matrices and Quadratic Forms . . . . . . . . 672.3.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 71

2.4 Geometric Interpretation of a Linear Transformation . . . . . . . 73

xiii

AU

TH

OR

’S P

RO

OF


xiv Contents

461

462

463

464

465

466

467

468

469

470

471

472

473

474

475

476

477

478

479

480

481

482

483

484

485

486

487

488

489

490

491

492

493

494

495

496

497

498

499

500

501

502

503

504

505

506

3 Topology and Convex Optimisation . . . . . . . . . . . . . . . . . . 773.1 A Topological Space . . . . . . . . . . . . . . . . . . . . . . . . 77

3.1.1 Scalar Product and Norms . . . . . . . . . . . . . . . . . 773.1.2 A Topology on a Set . . . . . . . . . . . . . . . . . . . . 82

3.2 Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 883.3 Compactness . . . . . . . . . . . . . . . . . . . . . . . . . . . . 933.4 Convexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

3.4.1 A Convex Set . . . . . . . . . . . . . . . . . . . . . . . 993.4.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 1003.4.3 Separation Properties of Convex Sets . . . . . . . . . . . 104

3.5 Optimisation on Convex Sets . . . . . . . . . . . . . . . . . . . 1103.5.1 Optimisation of a Convex Preference Correspondence . . 110

3.6 Kuhn-Tucker Theorem . . . . . . . . . . . . . . . . . . . . . . . 1153.7 Choice on Compact Sets . . . . . . . . . . . . . . . . . . . . . . 1183.8 Political and Economic Choice . . . . . . . . . . . . . . . . . . 125References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

4 Differential Calculus and Smooth Optimisation . . . . . . . . . . . 1354.1 Differential of a Function . . . . . . . . . . . . . . . . . . . . . 1354.2 Cr -Differentiable Functions . . . . . . . . . . . . . . . . . . . . 142

4.2.1 The Hessian . . . . . . . . . . . . . . . . . . . . . . . . 1424.2.2 Taylor’s Theorem . . . . . . . . . . . . . . . . . . . . . 1454.2.3 Critical Points of a Function . . . . . . . . . . . . . . . . 149

4.3 Constrained Optimisation . . . . . . . . . . . . . . . . . . . . . 1554.3.1 Concave and Quasi-concave Functions . . . . . . . . . . 1554.3.2 Economic Optimisation with Exogenous Prices . . . . . . 162

4.4 The Pareto Set and Price Equilibria . . . . . . . . . . . . . . . . 1714.4.1 The Welfare and Core Theorems . . . . . . . . . . . . . 1714.4.2 Equilibria in an Exchange Economy . . . . . . . . . . . 180

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187

5 Singularity Theory and General Equilibrium . . . . . . . . . . . . 1895.1 Singularity Theory . . . . . . . . . . . . . . . . . . . . . . . . . 189

5.1.1 Regular Points: The Inverse and Implicit FunctionTheorem . . . . . . . . . . . . . . . . . . . . . . . . . . 189

5.1.2 Singular Points and Morse Functions . . . . . . . . . . . 1965.2 Transversality . . . . . . . . . . . . . . . . . . . . . . . . . . . 2005.3 Generic Existence of Regular Economies . . . . . . . . . . . . . 2035.4 Economic Adjustment and Excess Demand . . . . . . . . . . . . 2075.5 Structural Stability of a Vector Field . . . . . . . . . . . . . . . 2105.6 Speculations on Chaos . . . . . . . . . . . . . . . . . . . . . . . 221References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227

6 Topology and Social Choice . . . . . . . . . . . . . . . . . . . . . . 2316.1 Existence of a Choice . . . . . . . . . . . . . . . . . . . . . . . 2316.2 Dynamical Choice Functions . . . . . . . . . . . . . . . . . . . 233

AU

TH

OR

’S P

RO

OF


Contents xv

507

508

509

510

511

512

513

514

515

516

517

518

519

520

521

522

523

524

525

526

527

528

529

530

531

532

533

534

535

536

537

538

539

540

541

542

543

544

545

546

547

548

549

550

551

552

6.3 Stochastic Choice . . . . . . . . . . . . . . . . . . . . . . . . . 2386.3.1 The Model Without Activist Valence Functions . . . . . . 244

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248

7 Review Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2517.1 Exercises to Chap. 1 . . . . . . . . . . . . . . . . . . . . . . . . 2517.2 Exercises to Chap. 2 . . . . . . . . . . . . . . . . . . . . . . . . 2517.3 Exercises to Chap. 3 . . . . . . . . . . . . . . . . . . . . . . . . 2537.4 Exercises to Chap. 4 . . . . . . . . . . . . . . . . . . . . . . . . 2557.5 Exercises to Chap. 5 . . . . . . . . . . . . . . . . . . . . . . . . 255

Subject Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261

AU

TH

OR

’S P

RO

OF


1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

1Sets, Relations, and Preferences

In this chapter we introduce elementary set theory and the notation to be usedthroughout the book. We also define the notions of a binary relation, of a function, aswell as the axioms of a group and field. Finally we discuss the idea of an individualand social preference relation, and mention some of the concepts of social choiceand welfare economics.

1.1 Elements of Set Theory

Let U be a collection of objects, which we shall call the domain of discourse, theuniversal set, or universe. A set B in this universe (namely a subset of U ) is a sub-collection of objects from U . B may be defined either explicitly by enumerating theobjects, for example by writing

B = {Tom,Dick,Harry}, or

B = {x1, x2, x3, . . .}.

Alternatively B may be defined implicitly by reference to some property P(B),which characterises the elements of B , thus

B = {x : x satisfies P(B)}.

For example:

B = {x : x is an integer satisfying 1≤ x ≤ 5}

is a satisfactory definition of the set B , where the universal set could be the collec-tion of all integers. If B is a set, write x ∈ B to mean that the element x is a memberof B . Write {x} for the set which contains only one element, x.

N. Schofield, Mathematical Methods in Economics and Social Choice,Springer Texts in Business and Economics, DOI 10.1007/978-3-642-39818-6_1,© Springer-Verlag Berlin Heidelberg 2014

1

http://dx.doi.org/10.1007/978-3-642-39818-6_1

AU

TH

OR

’S P

RO

OF


2 1 Sets, Relations, and Preferences

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67

68

69

70

71

72

73

74

75

76

77

78

79

80

81

82

83

84

85

86

87

88

89

90

91

92

If A, B are two sets write A∩B for the intersection: that is the set which containsonly those elements which are both in A and B . Write A ∪ B for the union: that isthe set whose elements are either in A or B . The null set or empty set Φ , is thatsubset of U which contains no elements in U .

Finally if A is a subset of U , define the negation of A, or complement of A in Uto be the set U\A=A= {x : x is in U but not in A}.

1.1.1 A Set Theory

Now let Γ be a family of subsets of U , where Γ includes both U and Φ , i.e., Γ ={U ,Φ,A,B, . . .}.

If A is a member of Γ , then write A ∈ Γ . Note that in this case Γ is a collectionor family of sets.

Suppose that Γ satisfies the following properties:1. for any A ∈ Γ,A ∈ Γ ,2. for any A,B in Γ,A∪B is in Γ ,3. for any A,B in Γ,A∩B is in Γ .Then we say that Γ satisfies closure with respect to (−,∪,∩), and we call Γ a settheory.

For example let 2U be the set of all subsets of U , including both U and Φ . Clearly2U satisfies closure with respect to (−,∪,∩).

We shall call a set theory Γ that satisfies the following axioms aBoolean algebra.

AxiomsS1. Zero element A∪Φ =A, A∩Φ =Φ

S2. Identity element A∪ U = U , A∩U =A

S3. Idempotency A∪A=A, A∩A=A

S4. Negativity A∪A= U , A∩A=Φ

A=A

S5. Commutativity A∪B = B ∪A

A∩B = B ∩A

S6. De Morgan Rule A∪B =A∩B

A∩B =A∪B

S7. Associativity A∪ (B ∪C)= (A∪B)∪C

A∩ (B ∩C)= (A∩B)∩C

S8. Distributivity A∪ (B ∩C)= (A∪B)∩ (A∪C)

A∩ (B ∪C)= (A∩B)∪ (A∩C).

We can illustrate each of the axioms by Venn diagrams in the following way.Let the square on the page represent the universal set U . A subset B of points

within U can then represent the set B . Given two subsets A,B the union is thehatched area, while the intersection is the double hatched area.

We shall use ⊂ to mean “included in”. Thus “A⊂ B” means that every elementin A is also an element of B . Thus:

njschofi

Sticky Note

See Figure 1.1.

AU

TH

OR

’S P

RO

OF


1.1 Elements of Set Theory 3

93

94

95

96

97

98

99

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

Fig. 1.1 ???

Fig. 1.2 ???

Suppose now that P(A) is the property that characterizes A, or that

A= {x : x satisfies P(A)}.

The symbol ≡ means “identical to”, so that [x ∈A] ≡ “x satisfies P(A)”.Associated with any set theory is a propositional calculus which satisfies prop-

erties analogous with a Boolean algebra, except that we use ∧ and ∨ instead of thesymbols ∩ and ∪ for “and” and “or”.

For example:

A∪B = {x : “x satisfies P(A)”∨ “x satisfies P(B)”}

A∩B = {x : “x satisfies P(A)”∧ “x satisfies P(B)”}.

The analogue of “⊂” is “if. . .then” or “implies”, which is written⇒.Thus A⊂ B ≡ [“x satisfies P(A)”⇒ “x satisfies P(B)”].The analogue of “=” in set theory is the symbol “⇐⇒” which means “if and

only if”, generally written “iff”. For example,

njschofi

Cross-Out

njschofi

Sticky Note

union

njschofi

Cross-Out

njschofi

Sticky Note

Inclusion

AU

TH

OR

’S P

RO

OF



139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

175

176

177

178

179

180

181

182

183

184

[A= B] = [“x satisfies P(A)”⇐⇒ “x satisfies P(B)”].

Hence

[A= B] ≡ [“x ∈A”⇐⇒ “x ∈ B”]≡ [A⊂ B andB ⊂A].

1.1.2 A Propositional Calculus

Let {U ,Φ,P1, . . . ,Pi, . . .} be a family of simple propositions. U is the universalproposition and always true, whereas Φ is the null proposition and always false.Two propositions P1,P2 can be combined to give a proposition P1 ∧ P2 (i.e., P1

and P2) which is true iff both P1 and P2 are true, and a proposition P1∨P2 (i.e., P1

or P2) which is true if either P1 or P2 is true. For a proposition P , the complementP in U is true iff P is false, and is false iff P is true.

Now extend the family of simple propositions to a family P , by including in Pany propositional sentence S(P1, . . . ,Pi, . . .) which is made up of simple proposi-tions combined under −,∨,∧. Then P satisfies closure with respect to (−,∨,∧)

and is called a propositional calculus.Let T be the truth function, which assigns to any simple proposition, Pi , the

value 0 if Pi is false and 1 if Pi is true. Then T extends to sentences in the obviousway, following the rules of logic, to give a truth function T : P→{0,1}. If T (S1)=T (S2) for all truth values of the constituent simple propositions of the sentences S1

and S2, then S1 = S2 (i.e., S1 and S2 are identical propositions).For example the truth values of the proposition P1 ∨ P2 and P2 ∨ P1 are given

by the table:

T (P1) T (P2) T (P1 ∨ P2) T (P2 ∨ P1)

0 0 0 00 1 1 11 0 1 11 1 1 1

Since T (P1∨P2)= T (P2∨P1) for all truth values it must be the case that P1∨P2 =P2 ∨ P1.

Similarly, the truth tables for P1 ∧ P2 and P2 ∧ P1 are:

T (P1) T (P2) T (P1 ∧ P2) T (P2 ∧ P1)

0 0 0 00 1 0 01 0 0 01 1 1 1

Thus P1 ∧ P2 = P2 ∧ P1.The propositional calculus satisfies commutativity of ∧ and ∨. Using these truth

tables the other properties of a Boolean algebra can be shown to be true.

AU

TH

OR

’S P

RO

OF


1.1 Elements of Set Theory 5

185

186

187

188

189

190

191

192

193

194

195

196

197

198

199

200

201

202

203

204

205

206

207

208

209

210

211

212

213

214

215

216

217

218

219

220

221

222

223

224

225

226

227

228

229

230

For example:(i) P ∨Φ = P , P ∧Φ =Φ .

T (P ) T (Φ) T (P ∨ Φ) T (P ∧Φ)

0 0 0 01 0 1 0

(ii) P ∨ U = U , P ∧ U = P .

T (P ) T (U) T (P ∨ U) T (P ∧ U)

0 1 1 01 1 1 1

(iii) Negation is given by reversing the truth value. Hence=P= P .

T (P ) T (P ) T (=P)

0 1 01 0 1

(iv) P ∨ P = U , P ∧ P =Φ .

T (P ) T (P ) T (P ∨ P ) T (P ∧ P )

0 1 1 01 0 1 0

Example 1.1 Truth tables can be used to show that a propositional calculus P =(U ,Φ,P1,P2, . . .) with the operators (−,∨,∧) is a Boolean algebra.

Suppose now that S1(A1, . . . ,An) is a compound set (or sentence) which is madeup of the sets A1, . . . ,An together with the operators {∪,∩,− }.

For example suppose that

S1(A1,A2,A3)=A1 ∪ (A2 ∩A3),

and let P(A1),P (A2),P (A3) be the propositions that characterise A1,A2,A3.Then

S1(A1,A2,A3)={x : x satisfies “S1

(P(A1),P (A2),P (A3)

)”}

S1(P (A1),P (A2),P (A3)) has precisely the same form as S1(A1,A2,A3) exceptthat P(A1) is substituted for Ai , and (∧,∨) are substituted for (∩,∪).

In the example

S1(P(A1),P (A2),P (A3)

)= P(A1)∨(P(A2)∧ P(A3)

).

Since P is a Boolean algebra, we know [by associativity] that P(A1)∨ (P (A2)∧P(A3))= (P (A1)∨P(A2))∧ (P (A1)∨P(A3))= S2(P (A1),P (A2),P (A3)), say.

Hence the propositions S1(P (A1), P (A2), P (A3)) and S2(P (A1),P (A2),

P (A3)), are identical, and the sentence

AU

TH

OR

’S P

RO

OF



231

232

233

234

235

236

237

238

239

240

241

242

243

244

245

246

247

248

249

250

251

252

253

254

255

256

257

258

259

260

261

262

263

264

265

266

267

268

269

270

271

272

273

274

275

276

S1(A2,A2,A3)={x : x satisfies P

((A1)∨ P(A2)

)∧ (P(A1)∨ P(A3))}

= (A1 ∪A2)∩ (A1 ∪A3)

= S2(A1,A2,A3).

Consequently if Γ = (U ,Φ,A1,A2, . . .) is a set theory, then by exactly this proce-dure Γ can be shown to be a Boolean algebra.

Suppose now that Γ is a set theory with universal set U , and X is a subset of U .Let ΓX = (X,Φ,A1 ∩ X,A2 ∩X, . . .). Since Γ is a set theory on U ,ΓX must be aset theory on X, and thus there will exist a Boolean algebra in ΓX .

To see this consider the following:1. Since A ∈ Γ , then A ∈ Γ . Now let AX = A ∩ X. To define the com-

plement or negation (let us call it AX) of A in ΓX we have AX = {x :x is in X but not in A} = X ∩ A. As we noted previously this is also oftenwritten X−A, or X\A. But this must be the same as the complement or A∩X

in X, i.e., (A∩X)∩X = (A∪X)∩X = (A∩X)∪ (X ∩X)=A∩X.2. If A,B ∈ Γ then (A∩B)∩X = (A∩X)∩(B∩X). (The reader should examine

the behaviour of union.)A notion that is very close to that of a set theory is that of a topology.Say that a family Γ = (U ,Φ,A1,A2, . . .) is a topology on U iff

T1. when A1,A2 ∈ Γ then A1 ∩A2 ∈ Γ ;T2. If Aj ∈ Γ for all j belonging to some index set J (possibly infinite) then⋃

j∈J Aj ∈ Γ .T3. Both U and Φ belong to Γ .

Axioms T1 and T2 may be interpreted as saying that Γ is closed under finiteintersection and (infinite) union.

Let X be any subset of U . Then the relative topology ΓX induced from the topol-ogy Γ on U is defined by

ΓX = (X,Φ,A1 ∩X, . . .)

where any set of the form A∩X, for A ∈ Γ , belongs to ΓX .

Example 1.2 We can show that ΓX is a topology. If U1,U2 ∈ ΓX then there mustexist sets A1,A2 ∈ Γ such that Ui =Ai ∩X, for i = 1,2. But then

U1 ∩U2 = (A1 ∩X)∩ (A2 ∩X)

= (A1 ∩A2)∩X.

Since Γ is a topology, A1 ∩A2 ∈ Γ . Thus U1 ∩U2 ∈ ΓX . Union follows similarly.

1.1.3 Partitions and Covers

If X is a set, a cover for X is a family Γ = (A1,A2, . . . ,Aj , . . .) where j belongsto an index set J (possibly infinite) such that

X =∪{Aj : j ∈ J }.

AU

TH

OR

’S P

RO

OF


1.2 Relations, Functions and Operations 7

277

278

279

280

281

282

283

284

285

286

287

288

289

290

291

292

293

294

295

296

297

298

299

300

301

302

303

304

305

306

307

308

309

310

311

312

313

314

315

316

317

318

319

320

321

322

A partition for X is a cover which is disjoint, i.e., Aj ∩Ak =Φ for any distinctj, k ∈ J .

If ΓX is a cover for X, and Y is a subset of X then ΓY = {Aj ∩ Y : j ∈ J } is theinduced cover on Y .

1.1.4 The Universal and Existential Quantifiers

Two operators which may be used to construct propositions are the universal andexistential quantifiers.

For example, “for all x in A it is the case that x satisfies P(A).” The term “forall” is the universal quantifier, and generally written as ∀.

On the other hand we may say “there exists some x in A such that x satisfiesP(A).” The term “there exists” is the existential quantifier, generally written ∃.

Note that these have negations as follows:

not[∃ x s.t. x satisfies P ] ≡ [∀ x : x does not satisfyP ]not[∀ x : x satisfiesP ] ≡ [∃ x s.t. x does not satisfy P ].

We use s.t . to mean “such that”.

1.2 Relations, Functions and Operations

1.2.1 Relations

If X,Y are two sets, the Cartesian product set X × Y is the set of ordered pairs(x, y) such that x ∈X and y ∈ Y .

For example if we let � be the set of real numbers, then �×� or �2 is the set

{(x, y) : x ∈ �, y ∈ �},

namely the plane. Similarly �n =�×· · ·×� (n times) is the set of n-tuples of realnumbers, defined by induction, i.e., �n =� × (� × (� × · · · , . . .)).

A subset of the Cartesian product Y × X is called a relation, P , on Y × X. If(y, x) ∈ P then we sometimes write yPx and say that y stands in relation P to x.If it is not the case that (y, x) ∈ P then write (y, x) /∈ P or not (yPx). X is calledthe domain of P , and Y is called the target or codomain of P .

If V is a relation on Y × X and W is a relation on Z × Y , then define the re-lation W ◦ V to be the relation on Z × X given by (z, x) ∈ W ◦ V iff for somey ∈ Y , (z, y) ∈W and (y, x) ∈ V . The new relation W ◦ V on Z ×X is called thecomposition of W and V .

AU

TH

OR

’S P

RO

OF



323

324

325

326

327

328

329

330

331

332

333

334

335

336

337

338

339

340

341

342

343

344

345

346

347

348

349

350

351

352

353

354

355

356

357

358

359

360

361

362

363

364

365

366

367

368

The identity relation (or diagonal) eX on X×X is

eX ={(x, x) : x ∈X

}.

If P is a relation on Y ×X, its inverse, P−1, is the relation on X× Y defined by

P−1 = {(x, y) ∈X× Y : (y, x) ∈ P}.

Note that:

P−1 ◦ P = {(z, x) ∈X×X : ∃ y ∈ Y s.t. (z, y) ∈ P−1 and (y, x) ∈ P}.

Suppose that the domain of P is X, i.e., for every x ∈ X there is some y ∈ Y s.t.(y, x) ∈ P . In this case for every x ∈ X, there exists y ∈ Y such that (x, y) ∈ P−1

and so (x, x) ∈ P−1 ◦ P for any x ∈X. Hence eX ⊂ P−1 ◦ P . In the same way

P ◦ P−1 = {(t, y) ∈ Y × Y : ∃x ∈Xs.t. (t, x) ∈ P and (x, y) ∈ P−1}

and so eY ⊂ P ◦ P−1.

1.2.2 Mappings

A relation P on Y × X defines an assignment or mapping from X to Y , which iscalled φP and is given by

φP (x)= {y : (y, x) ∈ P}.

In general we write φ :X→ Y for a mapping which assigns to each element ofX the set, φ(x), of elements in Y . As above, the set Y is called the co-domain of φ.

The domain of a mapping, φ, is the set {x ∈X : ∃ y ∈ Y s.t. y ∈ φ(x)}, and theimage of φ is {y ∈ Y : ∃ x ∈X s.t. y ∈ φ(x)}.

Suppose now that V,W are relations on Y × X,Z × Y respectively. We havedefined the composite relation W ◦ V on Z × X. This defines a mapping φW◦V :X→ Z by z ∈ φW◦V (x) iff ∃y ∈ Y such that (y, x) ∈ V and (z, y) ∈W . This inturn means that y ∈ φV (x) and z ∈ φW(y).

If φ : X→ Y and ψ : Y → Z are two mappings then define their compositionψ ◦ φ :X→ Z by

(ψ ◦ φ)(x)=ψ[φ(x)]=∪{ψ(y) : y ∈ φ(x)

}.

Clearly z ∈ φW◦V (x) iff z ∈ φW [φV (x)].Thus φW◦V (x) = φW [φV (x)] = [(φW ◦ φV )(x)], ∀x ∈ X. We therefore write

φW◦V = φW ◦ φV .For example suppose V and W are given by

AU

TH

OR

’S P

RO

OF



369

370

371

372

373

374

375

376

377

378

379

380

381

382

383

384

385

386

387

388

389

390

391

392

393

394

395

396

397

398

399

400

401

402

403

404

405

406

407

408

409

410

411

412

413

414

V : {(2,3), (3,2), (1,2), (4,4), (4,1)}

and

W : {(1,4), (4,4), (4,1), (2,1), (2,3), (3,2)}

with mappings

φV φW

1 −→ 4 −→ 1↗ ↘

4 1 −→ 4↗ ↘

2 −→ 3 −→ 23 −→ 2 −→ 3

then the composite mapping φW ◦ φV = φW◦V is

1 −→ 1↗↘

4 −→ 4↘

2 −→ 23 −→ 3

with relation

W ◦ V = {(3,3), (2,2), (4,2), (1,4), (4,4), (1,1), (4,1)}.

Given a mapping φ : X → Y then the reverse procedure to the above gives arelation, called the graph of φ, or graph (φ), where

graph(φ)=⋃

x∈X

(φ(x)× {x})⊂ Y ×X.

In the obvious way if φ :X→ Y and ψ : Y → Z, are mappings, with compositionψ ◦ φ :X→ Z, then graph (ψ ◦ φ)= graph(ψ) ◦ graph(φ).

Suppose now that P is a relation on Y × X, with inverse P−1 on X × Y , andlet φP :X→ Y be the mapping defined by P . Then the mapping φP−1 : Y →X isdefined as follows:

φP−1(y)= {x : (x, y) ∈ P−1}

= {x : (y, x) ∈ P}

= {x : y ∈ φP (x)}.

More generally if φ :X→ Y is a mapping then the inverse mapping φ−1 : Y →X

is given by

φ−1(y)= {x : y ∈ φ(x)}.

AU

TH

OR

’S P

RO

OF



415

416

417

418

419

420

421

422

423

424

425

426

427

428

429

430

431

432

433

434

435

436

437

438

439

440

441

442

443

444

445

446

447

448

449

450

451

452

453

454

455

456

457

458

459

460

Thus

φP−1 = (φP )−1 : Y →X.

For example let Z4 be the first four positive integers and let P be the relation onZ4 ×Z4 given by

P = {(2,3), (3,2), (1,2), (4,4), (4,1)}.

Then the mapping φP and inverse φP−1 are given by:

φP : 1 −→ 4 φP−1 : 4 −→ 1↗ 1 1 ↘

4 ↗ ↘ 42 −→ 3 3 −→ 23 −→ 2 2 −→ 3

If we compose P−1 and P as above then we obtain

P−1 ◦ P = {(1,1), (1,4), (4,1), (4,4), (2,2), (3,3)},

with mapping

φP−1 ◦ φP

1 −→ 1↗↘

4 −→ 42 −→ 23 −→ 3

Note that P−1◦P contains the identity or diagonal relation e= {(1,1), (2,2), (3,3),

(4,4)} on Z4 = {1,2,3,4}. Moreover φP−1 ◦ φP = φ(P−1◦P).The mapping idX :X→X defined by idX(x)= x is called the identity mapping

on X. Clearly if eX is the identity relation, then φeX= idX and graph (idX)= ex .

If φ,ψ are two mappings X→ Y then write ψ ⊂ φ iff for each x ∈X, ψ(x)⊂φ(x).

As we have seen eX ⊂ P−1 ◦ P and so

φeX= idX ⊂ φ(P−1◦P) = φP−1 ◦ φP = (φP )−1 ◦ φP .

(This is only precisely true when X is the domain of P , i.e., when for every x ∈X

there exists some y ∈ Y such that (y, x) ∈ P .)

AU

TH

OR

’S P

RO

OF



461

462

463

464

465

466

467

468

469

470

471

472

473

474

475

476

477

478

479

480

481

482

483

484

485

486

487

488

489

490

491

492

493

494

495

496

497

498

499

500

501

502

503

504

505

506

1.2.3 Functions

If for all x in the domain of φ, there is exactly one y such that y ∈ φ(x) then φ is

called a function. In this case we generally write f :X→ Y , and sometimes xf→ y

to indicate that f (x)= y. Consider the function f and its inverse f−1 given by

f f−1 f−1 ◦ f

1 −→ 4 −→ 1 1 −→ 1↗ ↘ ↗↘

4 1 4 4 −→ 42 −→ 3 −→ 2 2 −→ 23 −→ 2 −→ 3 3 −→ 3

Clearly f−1 is not a function since it maps 4 to both 1 and 4, i.e., the graph off−1 is {(1,4), (4,4), (2,3), (3,2)}. In this case idX is contained in f−1 ◦ f but isnot identical to f−1 ◦ f . Suppose that f−1 is in fact a function. Then it is necessarythat for each y in the image there be at most one x such that f (x)= y. Alternativelyif f (x1)= f (x2) then it must be the case that x1 = x2. In this case f is called 1− 1or injective. Then f−1 is a function and

idX = f−1 ◦ f on the domain X of f

idY = f ◦ f−1 on the image Y of f.

A mapping φ : X→ Y is said to be surjective (or called a surjection) iff everyy ∈ Y belongs to the image of φ; that is, ∃ x ∈X s.t. y ∈ φ(x).

A function f :X→ Y which is both injective and surjective is said to be bijec-tive.

Example 1.3 Consider

π π−1

1 −→ 4 −→ 14 −→ 2 −→ 42 −→ 3 −→ 23 −→ 1 −→ 3.

In this case the domain and image of π coincide and π is known as a permutation.Consider the possibilities where φ is a mapping � → �, with graph (φ) ⊂ �2.(Remember � is the set of real numbers.) There are three cases:

AU

TH

OR

’S P

RO

OF



507

508

509

510

511

512

513

514

515

516

517

518

519

520

521

522

523

524

525

526

527

528

529

530

531

532

533

534

535

536

537

538

539

540

541

542

543

544

545

546

547

548

549

550

551

552

(i) φ is a mapping:

(ii) φ is a non injective function:

(iii) φ is an injective function:

1.3 Groups and Morphisms

We earlier defined the composition of two mappings φ :X→ Y and ψ : Y →X tobe ψ ◦ φ : X→ Z given by (ψ ◦ φ)(x) = ψ[φ(x)] = ∪ {ψ(y) : y ∈ φ(x)}. In thecase of functions f :X→ Y and g : Y → Z this translates to

(g ◦ f )(x)= g[f (x)

]= {g(y) : y = f (x)}.

Since both f,g are functions the set on the right is a singleton set, and so g ◦ f is afunction. Write F(A,B) for the set of functions from A to B . Thus the compositionoperator, ◦, may be regarded as a function:

◦ : F(X,Y )×F(Y,Z) → F(X,Z)

(f, g) → g ◦ f.

AU

TH

OR

’S P

RO

OF


1.3 Groups and Morphisms 13

553

554

555

556

557

558

559

560

561

562

563

564

565

566

567

568

569

570

571

572

573

574

575

576

577

578

579

580

581

582

583

584

585

586

587

588

589

590

591

592

593

594

595

596

597

598

Example 1.4 To illustrate consider the function (or matrix) F given by

(a b

c d

)(x1x2

)=(

ax1 + bx2cx1 + dx2

).

This can be regarded as a function F : �2→�2 since it maps (x1, x2) → (ax1+bx2, cx1 + dx2) ∈ �2.

Now let

F =(

a b

c d

), H =

(e f

g h

).

F ◦H is represented by

(x1x2

)F→(

ax1 + bx2cx1 + dx2

)H→(

e(ax1 + bx2)+ f (cx1 + dx2)

g(ax1 + bx2)+ h(cx1 + dx2)

).

Thus

(H ◦ F)

(x1x2

)=(

ea + f c | eb+ f d

ga + hc | gb+ hd

)(x1x2

)

or

(H ◦ F)=(

e f

g h

)◦(

a b

c d

)=(

ea + f c | eb+ f d

ga + hc | gb+ hd

).

The identity E is the function

E

(x1x2

)=(

a b

c d

)(x1x2

)=(

x1x2

).

Since this must be true for all x1, x2, it follows that a = d = 1 and c= b= 0.

Thus E = ( 1 00 1

).

Suppose that the mapping F−1 : �2 →�2 is actually a matrix. Then it is cer-tainly a function, and by Sect. 1.2.3, F−1 ◦F must be equal to the identity functionon �2, which here we call E. To determine F−1, proceed as follows:

Let F−1 = ( e f

g h

). We know F−1 ◦ F = ( 1 0

0 1

).

Thus

ea+ fc= 1 | eb+ fd = 0

ga+ hc= 0 | gb+ hd = 1.

If a �= 0 and b �= 0 then e=− f db= 1−f c

a.

AU

TH

OR

’S P

RO

OF



599

600

601

602

603

604

605

606

607

608

609

610

611

612

613

614

615

616

617

618

619

620

621

622

623

624

625

626

627

628

629

630

631

632

633

634

635

636

637

638

639

640

641

642

643

644

Now let |F | = (ad − bc), where |F | is called the determinant of F . Clearlyif |F | �= 0, then f = −b/|F |. More generally, if |F | �= 0 then we can solve theequations to obtain:

F−1 =(

e f

g h

)= 1

|F |(

d −b

−c a

).

If |F | = 0, then what we have called F−1 is not defined. This suggests that when|F | = 0, the inverse F−1 cannot be represented by a matrix, and in particular thatF−1 is not a function. In this case we shall call F singular. When |F | �= 0 then weshall call F non-singular, and in this case F−1 can be represented by a matrix, andthus a function. Let M(2) stand for the set of 2× 2 matrices, and let M∗(2) be thesubset of M(2) consisting of non-singular matrices.

We have here defined a composition operation:

◦ : M(2)×M(2) → M(2)

(H,F ) → H ◦ F.

Suppose we compose E with F then

E ◦ F =(

1 00 1

)(a b

c d

)=(

a b

c d

)= F.

Finally for any F ∈M∗(2) it is the case that there exists a unique matrix F−1 ∈M(2) such that

F−1 ◦ F =E.

Indeed if we compute the inverse (F−1)−1 of F−1 then we see that (F−1)−1 = F .Thus F−1 itself belongs to M∗(2).

M∗(2) is an example of what is called a group.More generally a binary operation, ◦, on a set G is a function

◦ : G×G → G,

(x, y) → x ◦ y.

Definition 1.1 A group G is a set G together with a binary operation, ◦ : G ×G→G which1. is associative: (x ◦ y) ◦ z= x ◦ (y ◦ z) for all x, y, z in G;2. has an identity e : e ◦ x = x ◦ e= x ∀ x ∈G;3. has for each x ∈G an inverse x−1 ∈G such that x ◦ x−1 = x−1 ◦ x = e.

When G is a group with operation, ◦, write (G,◦) to signify this.Associativity simply means that the order of composition in a sequence of com-

positions is irrelevant. For example consider the integers, Z , under addition. Clearlya + (b+ c)= (a + b)+ c, where the left hand side means add b to c, and then add

AU

TH

OR

’S P

RO

OF



645

646

647

648

649

650

651

652

653

654

655

656

657

658

659

660

661

662

663

664

665

666

667

668

669

670

671

672

673

674

675

676

677

678

679

680

681

682

683

684

685

686

687

688

689

690

a to this, while the right hand side is obtained by adding a to b, and then adding c

to this. Under addition, the identity is that element e ∈Z such that a + e = a. Thisis usually written 0. Finally the additive inverse of an integer a ∈ Z is (−a) sincea + (−a)= 0. Thus (Z,+) is a group.

However consider the integers under multiplication, which we shall write as “·”.Again we have associativity since

a · (b · c) = (a · b) · c.

Clearly 1 is the identity since 1 · a = a. However the inverse of a is that objecta−1 such that a ·a−1 = 1. Of course if a = 0, then no such inverse exists. For a �= 0,a−1 is more commonly written 1

a. When a is non-zero, and different from ±1, then

1a

is not an integer. Thus (Z, ·) is not a group. Consider the set Q of rationals, i.e.,a ∈Q iff a = p

q, where both p and q are integers. Clearly 1 ∈Q. Moreover, if a = p

q

then a−1 = qp

and so belongs to Q. Although zero does not have an inverse, we canregard (Q\{0}, ·) as a group.

Lemma 1.1 If (G,◦) is a group, then the identity e is unique and for each x ∈G

the inverse x−1 is unique. By definition e−1 = e. Also (x−1)−1 = x for any x ∈G.

Proof1. Suppose there exist two distinct identities, e,f . Then e ◦ x = f ◦ x for some x.

Thus (e ◦ x) ◦ x−1 = (f ◦ x) ◦ x−1. This is true because the composition oper-ation

((e ◦ x), x−1)→ (e ◦ x) ◦ x−1

gives a unique answer.By associativity (e ◦ x) ◦ x−1 = e ◦ (x ◦ x−1), etc.Thus e ◦ (x ◦ x−1)= f ◦ (x ◦ x−1). But x ◦ x−1 = e, say.

Since e is an identity, e ◦ e= f ◦ e and so e= f . Since e ◦ e= e it must bethe case that e−1 = e.

2. In the same way suppose x has two distinct inverses, y, z, so x ◦ y = x ◦ z= e.Then

y ◦ (x ◦ y)= y ◦ (x ◦ z)

(y ◦ x) ◦ y = (y ◦ x) ◦ z

e ◦ y = e ◦ z

y = z.

3. Finally consider the inverse of x−1. Since x ◦ (x−1) = e and by definition(x−1)−1 ◦ (x−1)= e by part (2), it must be the case that (x−1)−1 = x. �

We can now construct some interesting groups.

AU

TH

OR

’S P

RO

OF



691

692

693

694

695

696

697

698

699

700

701

702

703

704

705

706

707

708

709

710

711

712

713

714

715

716

717

718

719

720

721

722

723

724

725

726

727

728

729

730

731

732

733

734

735

736

Lemma 1.2 The set M∗(2) of 2 × 2 non-singular matrices form a group undermatrix composition, ◦.

Proof We have already shown that there exists an identity matrix E in M∗(2).Clearly |E| = 1 and so E has inverse E.

As we saw in Example 1.4, when we solved H ◦ F =E we found that

H = F−1 = 1

|F |(

d −b

−c a

).

By Lemma 1.1, (F−1)−1 = F and so F−1 must have an inverse, i.e., |F−1| �= 0,and so F−1 is non-singular. Suppose now that the two matrices H,F belong toM∗(2). Let

F =(

a b

c d

)

and

H =(

e f

g h

).

As in Example 1.4,

|H ◦ F | =∣∣∣∣

(ea + f c eb+ f d

ga + hc gb+ hd

)∣∣∣∣= (ea + f c)(gb+ hd)− (ga + hc)(eb+ f d)

= (eh− gf )(ad − bc)= |H ||F |.

Since both H and F are non-singular, |H | �= 0 and |F | �= 0 and so |H ◦ F | �= 0.Thus H ◦ F belongs to M∗(2), and so matrix composition is a binary operationM∗(2)×M∗(2)→M∗(2).

Finally the reader may like to verify that matrix composition on M∗(2) is asso-ciative. That is to say if F,G,H are non-singular 2× 2 matrices then

H ◦ (G ◦ F)= (H ◦G) ◦ F.

As a consequence (M∗(2),◦) is a group. �

Example 1.5 For a second example consider the addition operation on M(2) de-fined by

(e f

g h

)+(

a b

c d

)=(

a + e f + b

g+ c h+ d

).

AU

TH

OR

’S P

RO

OF



737

738

739

740

741

742

743

744

745

746

747

748

749

750

751

752

753

754

755

756

757

758

759

760

761

762

763

764

765

766

767

768

769

770

771

772

773

774

775

776

777

778

779

780

781

782

Fig. 1.3 ???

Clearly the identity matrix is( 0 0

0 0

)and the inverse of F is

−F =−(

a b

c d

)=(−a −b

−c −d

).

Thus (M(2),+) is a group.Finally consider those matrices which represent rotations in �2.If we rotate the point (1,0) in the plane through an angle θ in the anticlockwise

direction then the result is the point (cos θ, sin θ ), while the point (0,1) is trans-formed to (− sin θ, cos θ ). As we shall see later, this rotation can be represented bythe matrix

(cos θ − sin θ

sin θ cos θ

)

which we will call eiθ .Let Θ be the set of all matrices of this form, where θ can be any angle between 0

and 360◦. If eiθ and eiψ are rotations by θ,ψ respectively, and we rotate by θ firstand then by ψ , then the result should be identical to a rotation by ψ + θ . To see this:

(cosψ − sinψ

sinψ cosψ

)(cos θ − sin θ

sin θ cos θ

)

=(

cosψ cos θ − sinψ sin θ | − cosψ sin θ − sinψ cos θ

sinψ cos θ + cosψ sin θ | − sinψ sin θ + cosψ cos θ

)

=(

cos(ψ + θ) − sin(ψ + θ)

sin(ψ + θ) cos(ψ + θ)

)= ei(θ+ψ).

Note that |eiθ | = cos2 θ + sin2 θ = 1. Thus

(eiθ)−1 =

(cos θ sin θ

− sin θ cos θ

)=(

cos θ − sin(−θ)

sin(−θ) cos θ

)= ei(−θ).

njschofi

Cross-Out

njschofi

Sticky Note

Rotation

njschofi

Sticky Note

See Figure 1.3.

AU

TH

OR

’S P

RO

OF



783

784

785

786

787

788

789

790

791

792

793

794

795

796

797

798

799

800

801

802

803

804

805

806

807

808

809

810

811

812

813

814

815

816

817

818

819

820

821

822

823

824

825

826

827

828

Hence the inverse to eiθ is a rotation by (−θ), that is to say by θ but in theopposite direction. Clearly E = ei0, a rotation through a zero angle. Thus (Θ,◦) isa group. Moreover Θ is a subset of M∗(2), since each rotation has a non-singularmatrix. Thus Θ is a subgroup of M∗(2).

A subset Θ of a group (G,◦) is a subgroup of G iff the composition operation,◦, restricted to Θ is “closed”, and Θ is a group in its own right. That is to say (i) ifx, y ∈Θ then x ◦ y ∈Θ , (ii) the identity e belongs to Θ and (iii) for each x in Θ

the inverse, x−1, also belongs to Θ .

Definition 1.2 Let (X,◦) and (Y, ·) be two sets with binary operations, ◦, ·, re-spectively. A function f : X→ Y is called a morphism (with respect to (◦, ·)) ifff (x ◦ y) = f (x) · f (y), for all x, y ∈ X. If moreover f is bijective as a function,then it is called an isomorphism. If (X,◦), (Y, ·) are groups then f is called a homo-morphism.

A binary operation on a set X is one form of mathematical structure that the setmay possess. When an isomorphism exists between two sets X and Y then mathe-matically speaking their structures are identical.

For example let Rot be the set of all rotations in the plane. If rot(θ) and rot(ψ)

are rotations by θ , ψ respectively then we can combine them to give a rotationrot(ψ + θ), i.e.,

rot(ψ) ◦ rot(θ)= rot(ψ + θ).

Here ◦ means do one rotation then the other. To the rotation, rot(θ) let f assignthe 2× 2 matrix, called eiθ as above. Thus

f : (rot,◦)→ (Θ,◦),

where f (rot(θ))= eiθ .Moreover

f(rot(ψ) ◦ rot(θ)

)= f(rot(ψ + θ)

)

eiψ ◦ eiθ = ei(ψ+θ)

Clearly the identity rotation is rot(0) which corresponds to the zero matrix ei◦,while the inverse rotation to rot(θ) is rot(−θ) corresponding to e−iθ . Thus f is amorphism.

Here we have a collection of geometric objects, called rotations, with their ownstructure and we have found another set of “mathematical” objects namely 2 × 2matrices of a certain type, which has an identical structure.

Lemma 1.3 If f : (X,◦)→ (Y, ·) is a morphism between groups then

AU

TH

OR

’S P

RO

OF



829

830

831

832

833

834

835

836

837

838

839

840

841

842

843

844

845

846

847

848

849

850

851

852

853

854

855

856

857

858

859

860

861

862

863

864

865

866

867

868

869

870

871

872

873

874

(1) f (eX)= eY where eX , eY are the identities in X,Y .(2) for each x in X, f (x−1)= [f (x)]−1.

Proof1. Since f is a morphism f (x ◦ eX)= f (x) · f (eX)= f (x). By Lemma 1.2, eY

is unique and so f (eX)= eY .2. f (x ◦ x−1) = f (x) · f (x−1) = f (eX) = eY . By Lemma 1.2, [f (x)]−1 is

unique, and so f (x−1)= [f (x)]−1. �

As an example, consider the determinant function det :M(2)→�.From the proof of Lemma 1.3, we know that for any 2× 2 matrices,H and F ,

it is the case that |H ◦ F | = |H | |F |. Thus det : (M(2),◦)→ (�, ·) is a morphismwith respect to matrix composition, ◦, in M(2) and multiplication, ·, in �.

Note also that if F is non-singular then det(F )= |F | �= 0, and so det :M∗(2)→�\{0}.

It should be clear that (�\{0}, ·) is a group.Hence det : (M∗(2),◦) → (�\{0}, ·) is a homomorphism between these two

groups. This should indicate why those matrices in M(2) which have zero deter-minant are those without an inverse in M(2).

From Example 1.4, the identity in M∗(2) is E, while the multiplicative identityin � is 1. By Lemma 1.3, det(E)= 1.

Moreover |F |−1 = 1|F | and so, by Lemma 1.3, |F−1| = 1

|F | . This is easy to checksince

∣∣F−1∣∣=∣∣∣∣

1

|F |(

d −b

−c a

)∣∣∣∣=da − bc

|F |2 = |F ||F |2 =

1

|F | .

However the determinant det :M∗(2)→�\0 is not injective, since it is clearlypossible to find two matrices, H,F such that |H | = |F | although H and F aredifferent.

Example 1.6 It is clear that the real numbers form a group (�,+) under additionwith identity 0, and inverse (to a) equal to −a. Similarly the reals form a group(�\{0}, ·) under multiplication, as long as we exclude 0.

Now let Z2 be the numbers {0,1} and define “addition modulo 2,” written +,on Z2, by 0+ 0= 0, 0+ 1 = 1, 1+ 0= 1, 1+ 1, and “multiplication modulo 2,”written ·, on Z2, by 0 · 0= 0, 0 · 1= 1 · 0= 0, 1 · 1= 1.

Under “addition modulo 2,” 0 is the identity, and 1 has inverse 1. Associativityis clearly satisfied, and so (Z2,+) is a group. Under multiplication, 1 is the identityand inverse to itself, but 0 has no inverse. Thus (Z2, ·) is not a group. Note that(Z2\{0}, ·) is a group, namely the trivial group containing only one element. Let Zbe the integers, and consider the function

f :Z→Z2,

defined by f (x)= 0 if x is even, 1 if x is odd.

AU

TH

OR

’S P

RO

OF



875

876

877

878

879

880

881

882

883

884

885

886

887

888

889

890

891

892

893

894

895

896

897

898

899

900

901

902

903

904

905

906

907

908

909

910

911

912

913

914

915

916

917

918

919

920

We see that this is a morphism f : (Z,+)→ (Z2,+);1. if x and y are both even then f (x)= f (y)= 0; since x+y is even, f (x+y)=

0.2. if x is even and y odd, f (x)= 0, f (y)= 1 and f (x)+ f (y)= 1. But x + y is

odd, so f (x + y)= 1.3. if x and y are both odd, then f (x)= f (y)= 1, and so f (x)+ f (y)= 0. But

x + y is even, so f (x + y)= 0.Since (Z,+) and (Z2,+) are both groups, f is a homomorphism. Thus f (−a)=

f (a).On the other hand consider

f : (Z, ·)→ (Z2, ·);

1. if x and y are both even then f (x)= f (y)= 0 and so f (x) ·f (y)= 0= f (xy).2. if x is even and y odd, then f (x)= 0, f (y)= 1 and f (x) · f (y)= 0. But xy

is even so f (xy)= 0.3. if x and y are both odd, f (x)= f (y)= 1 and so f (x)f (y)= 1. But xy is odd,

and f (xy)= 1.Hence f is a morphism. However, neither (Z, ·) nor (Z2, ·) is a group, and so f

is not a homomorphism.A computer, since it is essentially a “finite” machine, is able to compute in binary

arithmetic, using the two groups (Z2,+), (Z2\{0}, ·) rather than with the groups(�,+), (�\{0}, ·).

This is essentially because the additive and multiplicative groups based on Z2form what is called a field.

Definition 1.31. A group (G,◦) is commutative or abelian iff for all a, b ∈G, a ◦ b= b ◦ a.2. A field (F,+, ·) is a set together with two operations called addition (+) and

multiplication (·) such that (F,+) is an abelian group with zero, or identity 0,and (F\{0}, ·) is an abelian group with identity 1. For convenience the additiveinverse of an element a ∈F is written (−a) and the multiplicative inverse of anon zero a ∈F is written a−1 or 1

a.

Moreover, multiplication is distributive over addition, i.e., for all a, b, c in F ,a · (b+ c)= a · b+ a · c.

To give an indication of the notion of abelian group, consider M∗(2) again. Aswe have seen

H ◦ F =(

e f

g h

)◦(

a b

c d

)=(

ea + f c eb+ f d

ga + hc gb+ hd

).

However,

F ◦H =(

a b

c d

)◦(

e f

g h

)=(

ea + bg af + bh

ce+ dg cf + dh

).

AU

TH

OR

’S P

RO

OF



921

922

923

924

925

926

927

928

929

930

931

932

933

934

935

936

937

938

939

940

941

942

943

944

945

946

947

948

949

950

951

952

953

954

955

956

957

958

959

960

961

962

963

964

965

966

Thus H ◦ F �= F ◦H in general and so M∗(2) is non abelian. However, if weconsider two rotations eiθ , eiψ then eiψ ◦ eiθ = ei(ψ+θ) = eiθ ◦ eiψ . Thus the group(Θ,◦) is abelian.

Lemma 1.4 Both (�,+, ·) and (Z2,+, ·) are fields.

Proof Consider (Z2,+, ·) first of all. As we have seen (Z2,+) and (Z2\{0}, ·) aregroups. (Z2,+) is obviously abelian since 0+ 1= 1+ 0= 1, while (Z2\{0},◦) isabelian since it has one element.

To check for distributivity, note that

1 · (1+ 1)= 1 · 0= 0= 1 · 1+ 1 · 1= 1+ 1.

Finally to see that (�,+, ·) is a field, we note that for any real numbers, a, b, c,�, (b+ c)= ab+ ac. �

Given a field (F,+, ·) we define a new object called Fn where n is a positiveinteger as follows. Any element x ∈Fn is of the form

⎛

⎝x1·

xn

⎞

⎠

where x1, . . . , xn all belong to F .F1. If a ∈F , and x ∈Fn define αx ∈Fn by

α

⎛

⎝x1·

xn

⎞

⎠=⎛

⎝αx1·

αxn

⎞

⎠ .

F2. Define addition in Fn by

x + y =⎛

⎝x1·

xn

⎞

⎠+⎛

⎝y1·

yn

⎞

⎠=(

x1 + y1xn + yn

).

Since F , by definition, is an abelian additive group, it follows that

x + y =⎛

⎝x1 + y1·

xn + yn

⎞

⎠=⎛

⎝y1 + x1·

yn + xn

⎞

⎠= y + x.

AU

TH

OR

’S P

RO

OF



967

968

969

970

971

972

973

974

975

976

977

978

979

980

981

982

983

984

985

986

987

988

989

990

991

992

993

994

995

996

997

998

999

1000

1001

1002

1003

1004

1005

1006

1007

1008

1009

1010

1011

1012

Now let

0=⎛

⎝0·0

⎞

⎠ .

Clearly

x + 0=⎛

⎝x1·

xn

⎞

⎠+⎛

⎝0·0

⎞

⎠= x.

Hence 0 belongs to Fn and is an additive identity in Fn.Suppose we define

(−x)=−⎛

⎝x1·

xn

⎞

⎠=⎛

⎝−x1·−xn

⎞

⎠ .

Clearly

x + (−x)=⎛

⎝x1·

xn

⎞

⎠+⎛

⎝−x1·−xn

⎞

⎠=(

x1 − x1xn − xn

)= 0.

Thus for each x ∈Fn there is an inverse, (−x), in Fn.Finally, since F is an additive group

x + (y + z)=⎛

⎝x1·

xn

⎞

⎠+⎛

⎝y1 + z1·

yn + zn

⎞

⎠=⎛

⎝x1 + y1·

xn + yn

⎞

⎠+⎛

⎝z1·zn

⎞

⎠

= (x + y)+ z.

Thus (Fn,+) is an abelian group, with zero 0.The fact that it is possible to multiply an element x ∈ Fn by a scalar a ∈ F

endows Fn with further structure. To see this consider the example of �2.1. If a ∈ � and both x, y belong to �2, then

α

[(x1x2

)+(

y1y2

)]= α

(x1 + y1x2 + y2

)=(

αx1 + αy1αx2 + αy2

)

by distribution, and

=(

αx1αx2

)+(

αy1αy2

)= α

(x1x2

)+ α

(y1y2

)

by F1. Thus α(x + y)= αx + αy.

AU

TH

OR

’S P

RO

OF



1013

1014

1015

1016

1017

1018

1019

1020

1021

1022

1023

1024

1025

1026

1027

1028

1029

1030

1031

1032

1033

1034

1035

1036

1037

1038

1039

1040

1041

1042

1043

1044

1045

1046

1047

1048

1049

1050

1051

1052

1053

1054

1055

1056

1057

1058

2.

(α + β)

(x1x2

)=(

(α + β)x1(α + β)x2

)

by F1

=(

αx1 + βx1αx2 + βx2

)=(

αx1αx2

)+(

βx1βx2

)

by F2

= α

(x1x2

)+ β

(x1x2

)

by F1. Therefore, (α + β)x = αx + βx.3.

(αβ)

(x1x2

)=(

(αβ)x1(αβ)x2

)

by F1 = α(

βx1βx2

)by associativity and F1, and = α(βx) by F1.

Thus (αβ)x = α(βx).

4.

(x1x2

)=(

1 · x11 · x2

)=(

x1x2

).

Therefore 1(x)= x.These four properties characterise what is know as a vector space.

Finally, consider the operation of a matrix F on the set of elements in �2. Bydefinition

F(x + y)=(

a b

c d

)[(x1x2

)+(

y1y2

)]=(

a b

c d

) (x1 + y1x2 + y2

)

=(

a(x1 + y1)+ b(x2 + y2)

c(x1 + y1)+ d(x2 + y2)

)=(

ax1 + bx2cx1 + dx2

)+(

ay1 + by2cy1 + dy2

)

by F2

=(

a b

c d

)(x1x2

)+(

a b

c d

) (y1y2

)= F(x)+ F(y).

Hence F : (�2,+) → (�2,+) is a morphism from the abelian group (�2,+)

into itself.By Lemma 1.3, we know that F(0)= 0, and for any element x ∈ �2,

F(−x)= F(−1(x)

)=−F(x)=−1F(x).

AU

TH

OR

’S P

RO

OF



1059

1060

1061

1062

1063

1064

1065

1066

1067

1068

1069

1070

1071

1072

1073

1074

1075

1076

1077

1078

1079

1080

1081

1082

1083

1084

1085

1086

1087

1088

1089

1090

1091

1092

1093

1094

1095

1096

1097

1098

1099

1100

1101

1102

1103

1104

Fig. 1.4 ???

A morphism between vector spaces is called a linear transformation. Vectorspaces and linear transformations are discussed in Chap. 2.

1.4 Preferences and Choices

1.4.1 Preference Relations

A binary relation P on X is a subset of X×X; more simply P is called a relation onX. For example let X ≡� (the real line) and let P be “>” meaning strictly greaterthan. The relation “>” clearly satisfies the following properties:1. it is never the case that x > x

2. it is never the case that x > y and y > x

3. it is always the case that x > y and y > z implies x > z.These properties can be considered more abstractly. A relation P on X is:1. symmetric iff xPy⇒ yPx

asymmetric iff xPy⇒ not (yPx)

antisymmetric iff xPy and yPx⇒ x = y

2. reflexive iff (xPx) ∀ x ∈X

irreflexive iff not (xPx) ∀x ∈X

3. transitive iff xPy and yPz⇒ xPz

4. connected iff for any x, y ∈X either xPy or yPx.By analogy with the relation “>” a relation P , which is both irreflexive and

asymmetric is called a strict preference relation.Given a strict preference relation P on X, we can define two new relations called

I , for indifference, and R for weak preference as follows.1. xIy iff not (xPy) and not (yPx)

2. xRy iff xPy or xIy.By de Morgan’s rule xIy iff not (xPy ∨ yPx). Thus for any x, y ∈X either xIy

or xPy or yPx. Since P is asymmetric it cannot be the case that both xPy andyPx are true. Thus the propositions “xPy”, “yPx”, “xIy” are disjoint, and henceform a partition of the universal proposition, U .

Note that (xPy ∨ xIy)≡ not (yPx) since these three propositions form a (dis-joint) partition. Thus (xRy) iff not (yPx).

In the case that P is the strict preference relation “>”, it should be clear that in-difference is identical to “=” and weak preference to “> or =” usually written “≥”.

njschofi

Cross-Out

njschofi

Sticky Note

Indifference and Strict Preference

AU

TH

OR

’S P

RO

OF


1.4 Preferences and Choices 25

1105

1106

1107

1108

1109

1110

1111

1112

1113

1114

1115

1116

1117

1118

1119

1120

1121

1122

1123

1124

1125

1126

1127

1128

1129

1130

1131

1132

1133

1134

1135

1136

1137

1138

1139

1140

1141

1142

1143

1144

1145

1146

1147

1148

1149

1150

Lemma 1.5 If P is a strict preference relation then indifference (I ) is reflexive andsymmetric, while weak preference (R) is reflexive and connected. Moreover if xRy

and yRx then xIy.

Proof1. Since not (xPx) this must imply xIx, so I is reflexive2.

xIy ⇐⇒ not (xPy)∧ not (yPx)

⇐⇒ not (yPx)∧ not (xPy)

⇐⇒ yIx.

Hence I is symmetric.3. xRy⇐⇒ xPy or xIy. Thus xIx⇒ xRx, so R is reflexive.4. xRy⇐⇒ xPy or yIx and yRx⇐⇒ yPx or yIx.

‘ Not (xRy ∨ yRx)⇐⇒ not (xPy ∨ yPx ∨ xIy). But xPy ∨ yPx ∨ xIy isalways true since these three propositions form a partition of the universal set.Thus not (xRy ∨ yRx) is always false, and so xRy ∨ yRx is always true. ThusR is connected.

5. Clearly

xRy and yRx ⇐⇒ (xPy ∧ yPx)∨ xIy

⇐⇒ xIy

since xPy ∧ yPx is always false by asymmetry. �

In the case that P corresponds to “>” then x ≥ y and y ≥ x⇒ x = y, so “≥” isantisymmetric.

Suppose that P is a strict preference relation on X, and there exists a functionu :X→�, called a utility function, such that xPy iff u(x) > u(y). Therefore

xRy iff u(x)≥ u(y)

xIy iff u(x)= u(y).

The order relation “>” on the real line is transitive (since x > y > z⇒ x > z).Therefore P must be transitive when it is representable by a utility function.

We therefore have reason to consider “rationality” properties, such as transitivity,of a strict preference relation.

1.4.2 Rationality

Lemma 1.6 If P is irreflexive and transitive on X then it is asymmetric.

AU

TH

OR

’S P

RO

OF



1151

1152

1153

1154

1155

1156

1157

1158

1159

1160

1161

1162

1163

1164

1165

1166

1167

1168

1169

1170

1171

1172

1173

1174

1175

1176

1177

1178

1179

1180

1181

1182

1183

1184

1185

1186

1187

1188

1189

1190

1191

1192

1193

1194

1195

1196

Proof To show that A∧B⇒ C we need only show that B ∧ not (C)⇒ not (A).Therefore suppose that P is transitive but fails asymmetry. By the latter assump-

tion there exists x, y ∈ X such that xPy and yPx. By transitivity this gives xPx,which violates irreflexivity. �

Call a strict preference relation, P , on X negatively transitive iff it is the casethat, for all x, y, z ∈X, not (xPy) ∧ not (yP z)⇒ not (xP z). Note that xRy⇐⇒not (yPx). Thus the negative transitivity of P is equivalent to the property

yRx ∧ zRy⇒ zRx.

Hence R must be transitive.

Lemma 1.7 If P is a strict preference relation that is negatively transitive then P ,I , R are all transitive.

Proof1. By the previous observation, R is transitive.2. To prove P is transitive, suppose otherwise, i.e., that there exist x, y, z such

that xPy, yPz but not (xP z). By definition not (xP z)⇐⇒ zRx. MoreoveryPz or yIz⇐⇒ yRz. Thus yPz�⇒ yRz. By transitivity of R, zRx and yRz

gives yRx, or not (xPy). But we assumed xPy. By contradiction we musthave xPz.

3. To show I is transitive, suppose xIy, yIz but not (xIz). Suppose xPz, say.But then xRz. Because of the two indifferences we may write zRy and yRx.By transitivity of R, zRx. But zRx and xRz imply xIz, a contradiction. In thesame way if zPx, then zRx, and again xIz. Thus I must be transitive. �

Note that this lemma also implies that P , I and R combine transitively. Forexample, if xRy and yPz then xPz.

To show this, suppose, in contradiction, that not (xP z).This is equivalent to zRx. If xRy, then by transitivity of R, we obtain zRy and so

not (yP z). Thus xRy and not (xP z)⇒ not (yP z). But yPz and not (yP z) cannotboth hold. Thus xRy and yPz⇒ xPz. Clearly we also obtain xIy and yPz⇒ xPz

for example.When P is a negatively transitive strict preference relation on X, then we call it

a weak order on X. Let O(X) be the set of weak orders on X. If P is a transitivestrict preference relation on X, then we call it a strict partial order. Let T (X) be theset of strict partial orders on X. By Lemma 1.7, O(X)⊂ T (X).

Finally call a preference relation acyclic if it is the case that for any finite se-quence x1, . . . , xr of points in X if xjPxj+1 for j = 1, . . . , r − 1 then it cannot bethe case that xrPx1.

Let A(X) be the set of acyclic strict preference relations on X. To see that T (x)⊂A(X), suppose that P is transitive, but cyclic, i.e., that there exists a finite cyclex1Px2 . . . P xrPx1. By transitivity xr−1PxrPx1 gives xr−1Px1, and by repetitionwe obtain x2Px1. But we also have x1Px2, which violates asymmetry.

AU

TH

OR

’S P

RO

OF



1197

1198

1199

1200

1201

1202

1203

1204

1205

1206

1207

1208

1209

1210

1211

1212

1213

1214

1215

1216

1217

1218

1219

1220

1221

1222

1223

1224

1225

1226

1227

1228

1229

1230

1231

1232

1233

1234

1235

1236

1237

1238

1239

1240

1241

1242

1.4.3 Choices

As we noted previously, if P is a strict preference relation on a set X, then a maximalelement, or choice, on X is an element x such that for no y ∈ X is it the case thatyPx. We can express this another way. Since P ⊂X×X, there is a mapping

φP :X→ X where φP (x)= {y : yPx}.We shall call φP the preference correspondence of P . The choice of P on X is theset Cp(X) = {x : φP (X) = Φ}. Suppose now that P is a strict preference relationon X. For each subset Y of X, let

CP (Y )= {x ∈ Y : φP (X)∩ Y =Φ}.

This defines a choice correspondence CP : 2X → 2X from 2X , the set of allsubsets of X, into itself.

An important question in social choice and welfare economics concerns the exis-tence of a “social” choice correspondence, CP , which guarantees the non-emptinessof the social choice CP (Y ) for each feasible set, Y , in X, and an appropriate socialpreference, P .

Lemma 1.8 If P is an acyclic strict preference relation on a finite set X, thenCP (Y ) is non-empty for each subset Y of X.

Proof Suppose that X = {x1, . . . , xr}. If all elements in X are indifferent thenclearly CP (X)=X.

So we can assume that if the cardinality |Y | of Y is at least 2, then x2Px1 forsome x2, x1. We proceed by induction on the cardinality of Y .

If Y = {x1} then obviously x1 = CP (Y ).If Y = {x1, x2} then either x1Px2, x2Px1, or x1Ix2 in which case CP (Y ) =

{x1}, {x2} or {x1, x2} respectively. Suppose CP (Y ) �= Φ whenever the cardinality|Y | of Y is 2, and consider Y ′ = {x1, x2, x3}.

Without loss of generality suppose that x2 ∈CP ({x1, x2}), but that neither x1 norx2 ∈ CP (Y ′). There are two possibilities (i) If x2Px1 then by asymmetry of P , not(x1Px2). Since x2 /∈ CP (Y ′) then x3Px2, so not (x2Px3). Suppose that CP (Y ′)=Φ . Then x1Px3, and we obtain a cycle x1Px3Px2Px1. This contradicts acyclicity,so x3 ∈ CP (Y ′). (ii) If x2Ix1 and x3 /∈ CP (Y ′) then either x1Px3 or x2Px3. Butneither x1 nor x2 ∈ CP (Y ′) so x3Px1 and x3Px2. This contradicts asymmetry of P .Consequently x3 ∈ CP (Y ′).

It is clear that this argument can be generalised to the case when |Y | = k and Y ′is a superset of Y (i.e., Y ⊂ Y ′ with |Y ′| = k+ 1.) So suppose CP (Y ) �=Φ . To showCP (Y ′) �=Φ when Y ′ = Y ∪{xk+1}, suppose [CP (Y )∪{xk+1}]∩CP (Y ′) �=Φ . Thenthere must exist some x ∈ Y such that xPxk+1.

If x ∈ CP (Y ) then xk+1Px, since x /∈ CP (Y ′) and zPx for no z ∈ Y . Hence weobtain the asymmetry xPxk+1Px. On the other hand if x ∈ Y\CP (Y ), then theremust exist a chain xrPxr−1P . . . x1Px with r < k, such that xr ∈CP (Y ). Since xr /∈

AU

TH

OR

’S P

RO

OF



1243

1244

1245

1246

1247

1248

1249

1250

1251

1252

1253

1254

1255

1256

1257

1258

1259

1260

1261

1262

1263

1264

1265

1266

1267

1268

1269

1270

1271

1272

1273

1274

1275

1276

1277

1278

1279

1280

1281

1282

1283

1284

1285

1286

1287

1288

CP (Y ′) it must be the case that xk+1Pxr . This gives a cycle xPxk+1PxrP . . . x. Bycontradiction, CP (Y ′) �=Φ .

By induction if CP (Y ) �= Φ then CP (Y ′) �= Φ for any superset Y ′ of Y . SinceCP (Y ) �=Φ whenever |Y | = 2, it is evident that CP (Y ) �=Φ for any finite subset Y

of X. �

If P is a strict preference relation on X and P is representable by a utility func-tion u :X→� then it must be the case that all of P, I,R are transitive. To see this,we note the following:1. xRy and yRz iff u(x) ≥ u(y) ≥ u(z). Since “≥” on � is transitive it follows

that u(x)≥ u(z) and so xRz.2. xIy and yIz iff u(x)= u(y)= u(z), and thus xIz.

In this case indifference, I , is reflexive, symmetric and transitive. Such a relationon X is called an equivalence relation.

For any point x in X, let [x] be the equivalence class of x in X, i.e., [x] = {y :yIx}.

Every point in X belongs to exactly one equivalence class. To see this supposethat x ∈ [y] and x ∈ [z], then xIy and xIz. By symmetry zIx, and by transitivityzIy. Thus [y] = [z].

The set of equivalence classes in X under an equivalence relation, I , is writtenX/I . Clearly if u : X→� is a utility function then an equivalence class [x] is ofthe form

[x] = {y ∈X : u(x)= u(y)},

which we may also write as u−1[u(x)].If X is a finite set, and P is representable by a utility function then

CP (X)= {x ∈X : u(x) = s}

where s is max [u(y) : y ∈X], the maximum value of u on X.Social choice theory is concerned with the existence of a choice under a social

preference relation P which in some sense aggregates individual preferences for allmembers of a society M = {1, . . . , i, . . . ,m}. Typically the social preference relationcannot be representable by a “social” utility function. For example suppose a societyconsists of n individuals, each one of whom has a preference relation Pi on thefeasible set X.

Define a social preference relation P on X by xPy iff xPiy for all i ∈M (P iscalled the strict Pareto rule).

It is clear that if each Pi is transitive, then so must be P . As a result, P must beacyclic. If X is a finite set, then by Lemma 1.8, there exists a choice CP (X) on X.

The same conclusion follows if we define xQy iff xRjy ∀ j ∈M , and xPiy forsome i ∈M .

If we assume that each individual has negatively transitive preferences, then Q

will be transitive, and will again have a choice. Q is called the weak Pareto rule.Note that a point x belongs to CQ(X) iff it is impossible to move to another point y

AU

TH

OR

’S P

RO

OF



1289

1290

1291

1292

1293

1294

1295

1296

1297

1298

1299

1300

1301

1302

1303

1304

1305

1306

1307

1308

1309

1310

1311

1312

1313

1314

1315

1316

1317

1318

1319

1320

1321

1322

1323

1324

1325

1326

1327

1328

1329

1330

1331

1332

1333

1334

Fig. 1.5 ???

which makes nobody “worse off”, but makes some members of the society “betteroff”. The set CQ(X) is called the Pareto set. Although the social preference rela-tion Q has a choice, there is no social utility function which represents Q. To seethis suppose the society consists of two individuals 1,2 with transitive preferencesxP1yP1z and zP2xP2y.

By the definition xQy, since both individuals prefer x to y. However conflict ofpreference between y and z, and between x and z gives yIz and xIz, where I isthe social indifference rule associated with Q. Consequently I is not transitive andthere is no “social utility function” which represents Q. Moreover the elements of X

cannot be partitioned into disjoint indifference equivalence classes.To see the same phenomenon geometrically define a preference relation P on �2

by

(x1, x2)P (y1, y2) ⇐⇒ x1 > y1 ∧ x2 > y2.

From Fig. 1.5 (x1, x2) P (y1, y2). However (x1, x2) I (z1, z2) and (y1, y2) I

(z1, z2). Again there is no social utility function representing the preference relationQ. Intuitively it should be clear that when the feasible set is “bounded” in someway in �2, then the preference relation Q has a choice. We shall show this moregenerally in a later chapter. (See Lemma 3.9 below.)

In Fig. 1.5, we have represented the preference Q in �2 by drawing the set pre-ferred to the point (y1, y2), say, as a subset of �2.

An alternative way to describe the preference is by the graph of φP . For example,suppose X is the unit interval [0,1] in �, and let the horizontal axis be the domainof φP , and the vertical axis be the co-domain of φP . In Fig. 1.6, the preference P isidentical to the relation > on the interval (so yPx iff y > x). The graph of φP is thenthe shaded set in the figure. Note that yPx iff xP−1y. Because P is irreflexive, thediagonal eX = {(x, x) : x ∈ X} cannot belong to graph (φP ). To find graph (φ−1

P )

we simply “reflect” graph (φP ) in the diagonal.The shaded set in Fig. 1.7 represents graph (φ−1

P ). Because P is asymmetric,it is impossible for both yPx and xPy to be true. This means that graph(φP ) ∩graph(φ−1

P ) = Φ (the empty set). This can be seen by superimposing Figs. 1.6and 1.7. A preference of the kind illustrated in Fig. 1.6 is often called monotonicsince increasing values of x are preferred.

njschofi

Cross-Out

njschofi

Sticky Note

The preference relation Q

AU

TH

OR

’S P

RO

OF



1335

1336

1337

1338

1339

1340

1341

1342

1343

1344

1345

1346

1347

1348

1349

1350

1351

1352

1353

1354

1355

1356

1357

1358

1359

1360

1361

1362

1363

1364

1365

1366

1367

1368

1369

1370

1371

1372

1373

1374

1375

1376

1377

1378

1379

1380

Fig. 1.6 ???

Fig. 1.7 ???

Fig. 1.8 ???

To illustrate a transitive, but non-monotonic strict preference, consider Fig. 1.8which represents the preference yPx iff x < y < 1−x, for x ≤ 1

2 , or 1−x < y < x,for x > 1

2 . For example if x = 14 , then φP (x) is the interval ( 1

4 , 34 ), namely all points

between 14 , and 3

4 , excluding the end points.It is obvious that P represents a utility function

u(x)= x if x ≤ 1

2

njschofi

Cross-Out

njschofi

Sticky Note

The graph of the Preference P

njschofi

Cross-Out

njschofi

Sticky Note

The graph of the inverse preference

njschofi

Cross-Out

njschofi

Sticky Note

Representation of the preference P

AU

TH

OR

’S P

RO

OF



1381

1382

1383

1384

1385

1386

1387

1388

1389

1390

1391

1392

1393

1394

1395

1396

1397

1398

1399

1400

1401

1402

1403

1404

1405

1406

1407

1408

1409

1410

1411

1412

1413

1414

1415

1416

1417

1418

1419

1420

1421

1422

1423

1424

1425

1426

Fig. 1.9 ???

u(x)= 1− x if1

2< x ≤ 1.

Clearly the choice of P is CP (x)= 12 . Such a most preferred point is often called

a “bliss point” for the preference. Indeed a preference of this kind is usually called“Euclidean”, since the preference is induced from the distance from the bliss point.In other words, yPx iff |y− 1

2 |< |x− 12 |. Note again that this preference is transitive

and of course acyclic. The fact that the P is asymmetric can be seen from notingthat the shaded set in Fig. 1.8 (graph φP ) and the shaded set in Fig. 1.9 (graph φ−1

P )do not intersect.

Figure 1.10 represents a more complicated asymmetric preference. Here

φP (x)=(

x, x + 1

2

)if x ≤ 1

2

=(

1

2, x

)∪(

0, x − 1

2

)if x >

1

2.

Clearly there is a cycle, say 14 P 1

8 P 1116 P 1

4 . Moreover the choice CP (X) isempty. This example illustrates that when acyclicity fails, then it is possible for thechoice to be empty.

To give an example where P is both acyclic on the interval, yet no choice exists,consider Fig. 1.11. Define φP (x)= (x, x+ 1

2 ) if x ≤ 12 and φP (x)= ( 1

2 , x) if x > 12 .

P is still asymmetric, but we cannot construct a cycle. For example, if x = 14

then yPx for y ∈ ( 14 , 3

4 ) but if zPy then z > 14 . Note however that φP ( 1

2 )= ( 12 ,1)

so CP (x)=Φ .This example shows that Lemma 1.8 cannot be extended directly to the case that

X is the interval. In Chap. 3 below we show that we have to impose “continuity” onP to obtain an analogous result to Lemma 1.8.

njschofi

Cross-Out

njschofi

Sticky Note

A Euclidean Preference

AU

TH

OR

’S P

RO

OF



1427

1428

1429

1430

1431

1432

1433

1434

1435

1436

1437

1438

1439

1440

1441

1442

1443

1444

1445

1446

1447

1448

1449

1450

1451

1452

1453

1454

1455

1456

1457

1458

1459

1460

1461

1462

1463

1464

1465

1466

1467

1468

1469

1470

1471

1472

Fig. 1.10 ???

Fig. 1.11 ???

1.5 Social Choice and Arrow’s Impossibility Theorem

The discussion following Lemma 1.8 showed that even the weak Pareto rule, Q,did not give rise to transitive indifference, and thus could not be represented by a“social utility function”. However Q does give rise to transitive strict preference. Weshall show that any rule that gives transitive strict preference must be “oligarchic”in the same way that Q is oligarchic. In other words any rule that gives transitivestrict preference must be based on the Pareto (or unanimity) choice of some subset,say Θ of the society, M . Arrow’s Theorem (Arrow 1951) shows that if it is desiredthat indifference be transitive, then the rule must be “dictatorial”, in the sense that itobeys the preference of a single individual.

njschofi

Cross-Out

njschofi

Sticky Note

An empty Choice Set.

njschofi

Cross-Out

njschofi

Sticky Note

An acyclic Preference with empty choice.

AU

TH

OR

’S P

RO

OF


1.5 Social Choice and Arrow’s Impossibility Theorem 33

1473

1474

1475

1476

1477

1478

1479

1480

1481

1482

1483

1484

1485

1486

1487

1488

1489

1490

1491

1492

1493

1494

1495

1496

1497

1498

1499

1500

1501

1502

1503

1504

1505

1506

1507

1508

1509

1510

1511

1512

1513

1514

1515

1516

1517

1518

1.5.1 Oligarchies and Filters

The literature on social choice theory is very extensive and technical, and this sec-tion will not attempt to address its many subtleties. The general idea to examine thepossible rationality properties of a “social choice rule”

σ : S(X)M −→ S(X).

Here S(X) stands for the set of strict preference relations on the set X andM = {1, . . . , i, . . .} is a society. Usually M is a finite set of cardinality m. Some-times however M will be identified with the set of integers Z . S(X)M is the setof strict preference profiles for this society. For example if |M| = m, then a pro-file π = {P1, . . . ,Pm} is a list of preferences for the members of the society. Weuse A(X)M,T (X)M,O(X)M for profiles whose individual preferences are acyclic,strict partial orders or weak orders, respectively. Social choice theory is based onbinary comparisons. This means that if two profiles π1 and π2 agree on a pair ofalternatives {x, y}, say, then the social preferences σ(π1) and σ(π2) also agree on{x, y}. A key idea is that of a decisive coalition. Say a subset A ⊂M is decisiveunder the rule σ iff for any profile π = (P1, . . . ,Pm) such that xPiy for all i ∈ A

then x(σ (π))y. That is to say whenever A is decisive, and its members agree thatx is preferred to y then the social preference chooses x over y. The set of decisivecoalitions under the rule is written Dσ , or more simply, D. To illustrate this idea,suppose M = {1,2,3} and Dσ comprises any coalition with at least two members.It is easy to construct a profile π ∈ A(x)M such that σ is not even acyclic. Forexample, choose a profile π on the alternatives {x, y, z} such that

xP1yP1z, yP2zP2x, zP3xP3y.

Since both 1 and 2 prefer y to z we must have yσ(π)z. But in the same waywe find that zσ (π)x and xσ(π)y, giving a social preference cycle on {x, y, z}. Ingeneral restricting the image of σ so that it lies in A(X),T (X), and O(X) imposesconstraints on Dσ . We now examine these constraints.

Lemma 1.9 If σ : T (X)M −→ T (X), and M,A,B all belong to Dσ , then A ∩B ∈Dσ .

Outline of Proof Partition M into the four sets V1 =A∩B,

V2 =A\B,V3 = B\A,V4 =M\(A∪B) and suppose that each individual has pref-erences on the alternatives {x, y, z} as follows:

i ∈ V1: zPixPiy

i ∈ V2: xPiy, with preferences for z unspecified

i ∈ V3: zPix, with preferences for y unspecified

i ∈ V4: completely unspecified.

AU

TH

OR

’S P

RO

OF



1519

1520

1521

1522

1523

1524

1525

1526

1527

1528

1529

1530

1531

1532

1533

1534

1535

1536

1537

1538

1539

1540

1541

1542

1543

1544

1545

1546

1547

1548

1549

1550

1551

1552

1553

1554

1555

1556

1557

1558

1559

1560

1561

1562

1563

1564

Now A\B = {i ∈ A, but i /∈ B}, so V1 ∪ V2 = A. Since A is decisive and everyindividual in A prefers x to y we obtain xσ(π)y. In the same way V1∪V3 = B , andB is decisive, so zσ (π)x. Since we require σ(π) to be transitive, it is necessary thatzσ (π)y. Since individual preferences are assumed to belong to T (X), we requirethat zPiy for all i ∈ V1. We have not however specified the preferences for the restof the society. Thus V1 =A∩B must be decisive for {x, z} in the sense that V1 canchoose between x and z, independently of the rest of the society. But this must betrue for every pair of alternatives. Thus A∩B ∈Dσ . �

In general it could be possible for Dσ to be empty. However it is usual to assumethat σ satisfies the strict Pareto rule. That is to say for any x, y if xPiy, for all i ∈M

then xσ(π)y. This simply means that M ∈Dσ . Moreover, this implies that Φ /∈Dσ .To see this, suppose that Φ ∈Dσ and consider a profile with xPiy ∀ i ∈M . Sincenobody prefers y to x, and the empty set is decisive, we obtain yσ(π)x. But bythe Pareto rule, we have xσ(π)y. We assume however that σ(π) is always a strictpreference relation, and so yσ(π)x cannot occur. Finally if A ∈Dσ then any set B

which contains A must also be decisive. Thus Lemma 1.9 can be interpreted in thefollowing way.

Lemma 1.10 If σ : T (X)M −→ T (X) and σ satisfies the strict Pareto rule, thenDσ satisfies the following conditions:D1. (monotonicity) A⊂ B and A ∈Dσ implies B ∈Dσ .D2. (identity) A ∈Dσ and Φ /∈Dσ .D3. (closed under intersection) A,B ∈Dσ implies A∩B ∈Dσ .

A collection D of subsets of M which satisfy D1, D2, and D3 is called a filter.Note also that when M is finite, then Dσ must also be finite. By repeating

Lemma 1.9 for each pair of coalitions, we find that Θ = ∩{Ai : Ai ∈ Dσ } mustbe non-empty and also decisive. This set Θ is usually called the oligarchy. In thecase that σ is simply the strict Pareto rule, then the oligarchy is the whole society,M .

However, any rule that gives a transitive strict preference relation must be equiv-alent to the Pareto rule based on some oligarchy, possibly a strict subset of M .

For example, majority rule for the society M = {1,2,3} defines the decisivecoalitions {1,2}, {1,3}, {2,3}. These three coalitions contain no oligarchy. We canimmediately infer that this rule cannot be transitive. In fact, as we have seen, it isnot even acyclic. Below, we explore this further. If we require the rule always to sat-isfy the condition of negative transitivity then the oligarchy will consist of a singleindividual, when M is finite.

Lemma 1.11 If σ : T (X)M −→O(X) and M ∈Dσ and A⊂M and A /∈Dσ , thenM\A ∈Dσ .

Proof Since A /∈Dσ , we can find a profile π and a pair {x, y} such that yPix for alli ∈ A, yet not (yσ (π)x). Let us write this latter condition as xRy, where R standsfor weak social preference.

AU

TH

OR

’S P

RO

OF



1565

1566

1567

1568

1569

1570

1571

1572

1573

1574

1575

1576

1577

1578

1579

1580

1581

1582

1583

1584

1585

1586

1587

1588

1589

1590

1591

1592

1593

1594

1595

1596

1597

1598

1599

1600

1601

1602

1603

1604

1605

1606

1607

1608

1609

1610

Suppose now there is an alternative z such that xPiz for all i ∈M\A, and thatyPiz for all i ∈M . By the Pareto condition (M ∈ Dσ ) we obtain yPz (where P

stands for σ(π)). By negative transitivity of P , xRy and yPz implies xPz (seeLemma 1.7). However we have not specified the preferences of A on {x, z}. But wehave shown that if the members of M\A prefer x to z, then so does the society. Itthen follows that M\A must be decisive. �

It follows from this lemma that if σ : T (X)M −→ O(X) and M ∈ Dσ , thenwhenever A ∈ Dσ , there is some proper subset B (such that B ⊂ A yet B �= A)with B ∈ Dσ . To see this consider any proper subset C of A with C /∈ Dσ . ByLemma 1.11, M\C ∈Dσ . But since O(X) belongs to T (X), we can use the prop-erty D3 of Lemma 1.10 to infer that A ∩ (M\C) ∈ Dσ . But A ∩ (M\C) = A\C,and since C is a proper subset of A, A\C �= Φ . Hence A\C ∈ Dσ . In the case M

has finite cardinality, we can repeat this argument to show that there must be someindividual i such that {i} ∈Dσ . But then i is a dictator in the sense that, for any x, y

if π is a profile with xPiy then xσ(π)y.

Arrow’s Impossibility Theorem If σ :O(X)M −→O(X) and M ∈Dσ , with |M|finite, then there is a dictator {i}, say, such that {i} ∈Dσ .

It is obvious that if Dσ is non-empty, then all the coalitions in Dσ must containthe dictator {i}. In particular {i} = ∩{Mi : i ∈Dσ }, and {i} ∈Dσ .

A somewhat similar result holds when M is a “space” rather than a finite set.In this case there need not be a dictator in M . However in this case, the filter Dσ

defines an “invisible dictator”. That is to say, we can imagine the coalitions in Dσ

becoming smaller and smaller, so that they define the invisible dictator in the limit.

1.5.2 Acyclicity and the Collegium

As Lemma 1.8 demonstrated, if P is an acyclic preference relation on a finite set,then the choice for P will be non-empty. Given Arrow’s Theorem, it is thereforeuseful to examine the properties of a social choice rule that are compatible withacyclicity. To this end we introduce the notion of the Nakamura number for a rule,σ (Nakamura 1979).

Definition 1.4 Let D be a family of subsets of the finite set M . The collegiumK(D) is the intersection

∩{Ai :Ai ∈D}.That is to say K(D) is the largest set in M such that K(D)⊂A for all A ∈D.If K(D) is empty then D is said to be non-collegial. Otherwise D is collegial.If σ is a social choice rule, and Dσ its family of decisive coalitions, then

K(σ)=K(Dσ ) is the collegium for σ . Again σ is called collegial or non-collegialdepending on whether K(σ) is non-empty or empty. The Nakamura number of

njschofi

Sticky Note

See Kirman and Sondermann (1972).

AU

TH

OR

’S P

RO

OF



1611

1612

1613

1614

1615

1616

1617

1618

1619

1620

1621

1622

1623

1624

1625

1626

1627

1628

1629

1630

1631

1632

1633

1634

1635

1636

1637

1638

1639

1640

1641

1642

1643

1644

1645

1646

1647

1648

1649

1650

1651

1652

1653

1654

1655

1656

a non-collegial family D is written k(D) and is the cardinality of the smallestnon-collegial subfamily of D. That is, there exists some subfamily D′ of D with|D′| = k(D) such that K(D′) = Φ . Moreover if D′′ is a subfamily of D with|D′′| ≤ k(D)− 1 then K(D′′) �=Φ .

In the case D is collegial define k(D)=∞. For a social choice rule define k(σ )=k(Dσ ), where Dσ is the family of decisive coalitions for σ .

Example 1.7(i) To illustrate this definition, suppose D consists of the four coalitions,

{A1,A2,A3,A4} where A1 = {2,3,4}, A2 = {1,3,4}, A3 = {1,2,4,5} andA4 = {1,2,3,5}. Of course D will be monotonic, so supersets of thesecoalitions will be decisive. It is evident that A1 ∩ A2 ∩ A3 = {4} and soif D′ = {A1,A2,A3} then K(D′) �= Φ . However K(D′) ∩ A4 = Φ and soK(D)=Φ . Thus k(D)= 4.

(ii) An especially interesting case is of a q-majority rule where each individualhas one vote, and any coalition with at least q voters (out of m) is decisive.In this case it is easy to show then that k(σ )= 2+ [ q

m−q] where [ q

m−q] is the

greatest integer strictly less than qm−q

. In the case that m= 4 and q = 3, then

we find that [ 31 ] = 2, so k(σ )= 4.

On the other hand for all other simple majority rules where m= 2s + 1 or2s and q = s + 1 (and s is integer) then

[q

m− q

]=[s + 1

s

]or

[s + 1

s − 1

]

depending on whether m is odd or even. In both cases [ qm−q

] = 1. Thusk(σ )= 3 for any simple majority rule with m �= 4.

(iii) Finally, observe that for any simple majoritarian rule, if M1,M2 both belongto D, then A1 ∩ A2 �= Φ . So in general, any non-collegial subfamily of Dmust include at least three coalitions. Consequently any majoritarian rule, σ ,has k(σ )≥ 3.

The Nakamura number allows us to construct social preference cycles.

Nakamura Lemma Suppose that σ is a non-collegial voting rule, with Nakamuranumber k(σ )= k. Then there exists an acyclic profile π = (P1, . . . ,Pn) for the so-ciety M , on a set X = {x1, . . . , xk} of cardinality k, such that σ(π) is cyclic on W ,and the choice of σ(π) is empty.

Proof We wish to construct a cycle by considering k different decisive coalitions,A1, . . . ,Ak and assigning preferences to the members of each coalition such that

AU

TH

OR

’S P

RO

OF



1657

1658

1659

1660

1661

1662

1663

1664

1665

1666

1667

1668

1669

1670

1671

1672

1673

1674

1675

1676

1677

1678

1679

1680

1681

1682

1683

1684

1685

1686

1687

1688

1689

1690

1691

1692

1693

1694

1695

1696

1697

1698

1699

1700

1701

1702

xiPix2 for all i ∈A1

...

xk−1Pixk for all i ∈Ak−1

xkPix1 for all i ∈Ak.

We now construct such a profile, π . Let Dk = {A1, . . . ,Ak−1}. By the definitionof the Nakamura number, this must be collegial. Hence there exists some individual{k}, say, with k ∈ A1 ∩ · · · ∩ Ak−1. We can assign k the acyclic preference profilex1Pkx2Pk . . .Pkxk .

In the same way, for each subfamily Dj = {A, . . . ,Aj−1,Aj+1, . . . ,Ak} thereexists a collegium containing individual j , to whom we assign the preference

xj+1Pjxj+2 . . . xkPjx1 . . . xjPjxj .

We may continue this process to assign acyclic preferences to each member ofthe various collegia of subfamilies of D, so as to give the required cyclic socialpreference. �

Lemma 1.12 A necessary condition for a social choice rule σ to be acyclic on thefinite set X of cardinality at least m= |M|, for each acyclic profile π on X, is thatσ be collegial.

Proof Suppose σ is not collegial. It is easy to show that the Nakamura number k(σ )

will not exceed m. By the Nakamura Theorem there is an acyclic profile π on a setX of cardinality m, such that σ(π) is cyclic on X. Thus acyclicity implies that σ

must be collegial. �

Note that this lemma emphasizes the size of the set of alternatives. It is worthobserving here that in the previous proofs of Arrow’s Theorem, the cardinality ofthe set of alternatives was implicitly assumed to be at least 3.

These techniques using the Nakamura number can be used to show that a simplesocial rule will be acyclic whenever it is collegial. Say a social choice rule is simpleiff whenever xσ(π)y for the profile π , then xPiy for all i in some coalition that isdecisive for σ .

Note that a social choice rule need not, in general, be simple. If σ is simple thenall the information necessary to analyse the rule is contained in Dσ .

Lemma 1.13 Let σ be a simple choice rule on a finite set X:(i) If σ is dictatorial, then σ(π)⊂O(X) for all π ∈O(X)M

(ii) If σ is oligarchic, then σ(π)⊂ T (X) for all π ∈ T (X)M

(iii) If σ is collegial, then σ(π)⊂A(X) for all π ∈A(X)M .

AU

TH

OR

’S P

RO

OF



1703

1704

1705

1706

1707

1708

1709

1710

1711

1712

1713

1714

1715

1716

1717

1718

1719

1720

1721

1722

1723

1724

1725

1726

1727

1728

1729

1730

1731

1732

1733

1734

1735

1736

1737

1738

1739

1740

1741

1742

1743

1744

1745

1746

1747

1748

Proof(i) If i is a dictator, then xIiy implies that x and y are socially indifferent. Be-

cause Pi belongs to O(X) so must σ(π).(ii) In the same way, if Θ is the oligarchy and xσ(π)y then xPiy for all i in Θ .

Thus σ(π) must be transitive.(iii) If there is a cycle x1σ(π) . . . , xkσ (π)x1 then each of these social preferences

must be supported by a decisive coalition. Since the collegium is non-empty,there is some individual, i, say, who has such a cyclic preference. This con-tradicts the assumption that π ∈A(X)M .

�

Another way of expressing part (iii) of this lemma is to introduce the idea ofa prefilter. Say D is a prefilter if and only if it satisfies D1 (monotonicity) andD2 (identity) introduced earlier, and also non-empty intersection (so K(D) �= Φ).Clearly if σ is simple, and Dσ is a prefilter, then σ is acyclic and consistent with thePareto rule.

In Chap. 3 we shall further develop the notion of social choice using the notionof the Nakamura number, in the situation where X has a geometric structure.

References

The first version of the impossibility theorem can be found in

Arrow, K. J. (1951). Social choice and individual values. New York: Wiley.

The ideas of the filter and Nakamura number are in

<unc> Kirman, A. P., & Sondermann, D. (1972). Arrow’s Impossibility Theorem, many agents and invis-ible dictators. Journal of Economic Theory, 5, 267–278.

Nakamura, K. (1979). The vetoers in a simple game with ordinal preferences. International Journalof Game Theory, 8, 55–61.

njschofi

Cross-Out

njschofi

Sticky Note

Further Reading.

AU

TH

OR

’S P

RO

OF


1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

2Linear Spaces and Transformations

2.1 Vector Spaces

We showed in Sect. 1.3 that when F was a field, the n-fold product set Fn had anadditional operation defined on it, which was induced from addition in F , so that(Fn,+) became an abelian group with zero 0. Moreover we were able to define aproduct · : F ×Fn → Fn which takes (α, x) to a new element of Fn called (αx).Elements of Fn are known as vectors, and elements of F as scalars. The propertiesthat we discovered in Fn characterise a vector space. A vector space is also knownas a linear space.

Definition 2.1 A vector space (V ,+) is an abelian additive group with zero 0,together with a field (F,+, ·) with zero 0 and identity 1. An element of V is calleda vector and an element of F a scalar. Moreover for any α ∈ F, v ∈ V there is ascalar multiplication (α, v)→ αv ∈ V which satisfies the following properties:V1. α(v1 + v2)= αv1 + αv2, for any α ∈F, v1, v2 ∈ V .V2. (α + β)v = αv + βv, for any α,β ∈F, v ∈ V .V3. (αβ)v = α(βv), for any α,β ∈F, v ∈ V .V4. 1 · v = v, for 1 ∈F , and for any v ∈ V .Call V a vector space over the field F . From the previous discussion the set �n

becomes an abelian group (�n,+) under addition. We shall frequently

write

x =⎛

⎜⎝

x1...

xn

⎞

⎟⎠


39

http://dx.doi.org/10.1007/978-3-642-39818-6_2

AU

TH

OR

’S P

RO

OF


40 2 Linear Spaces and Transformations

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67

68

69

70

71

72

73

74

75

76

77

78

79

80

81

82

83

84

85

86

87

88

89

90

91

92

Fig. 2.1 ???

for a vector in �n, where x1, . . . , xn are called the coordinates of x. Vector additionis then defined by

x + y =⎛

⎜⎝

x1...

xn

⎞

⎟⎠+⎛

⎜⎝

y1...

yn

⎞

⎟⎠=

⎛

⎜⎝

x1 + y1...

xn + yn

⎞

⎟⎠ .

A vector space over � is called a real vector space.

For example (Z2,+, ·) is a field and so (Z2)n is a vector space over the field Z2.

It may not be possible to represent each vector in a vector space by a list of coordi-nates. For example, consider the set of all functions with domain X and image in �.Call this set �X . If f,g ∈ �X , define f + g to be that function which maps x ∈X

to f (x)+ g(x). Clearly there is a zero function 0 defined by 0(x)= 0, and each f

has an inverse (−f ) defined by (−f )(x) = −(f (x)). Finally for α ∈ �, f ∈ �X ,define αf :X→� by (αf )(x)= α(f (x)). Thus �X is a vector space over �.

Definition 2.2 Let (V ,+) be a vector space over a field, F . A subset V ′ of V iscalled a vector subspace of V if and only if1. v1, v2 ∈ V ′ ⇒ v1 + v2 ∈ V ′, and2. if α ∈ F and v ∈ V ′ then αv′ ∈ V ′.

Lemma 2.1 If (V ,+) is a vector space with zero 0 and V ′ is a vector subspace,then, for each v ∈ V ′, the inverse (−v) ∈ V ′, and 0 ∈ V ′, so (V ′,+) is a subgroupof (V ,+).

Proof Suppose v ∈ V ′. Since F is a field, there is an identity 1, with additive inverse−1. But by V2, (1− 1)v = 1 · v + (−1)v = 0 · v, since 1− 1= 0. Now (1+ 0)v =1 · v + 0 · v, and so 0 · v = 0. Thus (−1)v = (−v). Since V ′ is a vector subspace,(−1)v ∈ V ′, and so (−v) ∈ V ′. But then v + (−v)= 0, and so 0 ∈ V ′. �

From now on we shall simply write V for a vector space, and refer to the fieldonly on occasion.

njschofi

Cross-Out

njschofi

Sticky Note

Vector addition

njschofi

Sticky Note

, as in Figure 2.1

AU

TH

OR

’S P

RO

OF


2.1 Vector Spaces 41

93

94

95

96

97

98

99

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

Definition 2.3 Let V ′ = {v1, . . . , vr} be a set of vectors in the vector space V .A vector v is called a linear combination of the set V ′ iff v can be written in theform

v =r∑

i=1

λivi

where each λi, i = 1, . . . , r belongs to the field F . The span of V ′, written Span(V ′)is the set of vectors which are linear combinations of the set V ′. If V ′′ = Span(V ′),then V ′ is said to span V ′′.

For example, suppose

V ′ ={(

12

)(21

)}.

Since we can solve the equation

(x

y

)= α

(12

)+ β

(21

)

for any (x, y) ∈ �2, by setting α = 13 (2y − x) and β = 1

3 (2x − y), it is clear that V ′is a span for �2.

Lemma 2.2 If V ′ is a finite set of vectors in the vector space, V , then Span(V ′) isa vector subspace of V .

Proof We seek to show that for any α,β ∈ F and any u,w ∈ Span(V ′), then αu+βw ∈ Span(V ′). By definition, if V ′ = {v1, . . . , vr}, then u =∑r

i=1 ηivi and w =∑ri=1 μivi , where ηi,μi ∈F for i = 1, . . . , r . But then αu+ βw = α

∑ri=1 ηivi +

β∑r

i=1 μivi =∑ri=1 λivi , where λi = αηi + βμi ∈ F , for i = 1, . . . , r . Thus

αu+ βw ∈ Span(V ′).Note that, by this lemma, the zero vector 0 belongs to Span(V ′). �

Definition 2.4 Let V ′ = {v1, . . . , vr} be a set of vectors in V . V ′ is called a frameiff∑r

i=1 αivi = 0 implies that αi = 0 for i = 1, . . . , r . (Here each αi belongs to thefield F .) In this case the set V ′ is called a linearly independent set. If V ′ is not aframe, the vectors in V ′ are said to be linearly dependent. Say a vector is linearlydependent on V ′ = {v1, . . . , vr} iff v ∈ Span(V ′).

Note that if V ′ is a frame, then1. 0 /∈ V ′ since α0= 0 for every non-zero α ∈ F .2. If v ∈ V ′ then (−v) /∈ V ′, otherwise 1 · v + 1(−v) = 0 would belong to V ′,

contradicting (1).

AU

TH

OR

’S P

RO

OF



139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

175

176

177

178

179

180

181

182

183

184

Lemma 2.31. V ′ is not a frame iff there is some vector v ∈ V ′ which is linearly dependent on

V ′\{v}.2. If V ′ is a frame, then any subset of V ′ is a frame.3. If V ′ spans V ′′, but V ′ is not a frame, then there exists some vector v ∈ V ′ such

that V ′′′ = V ′\{v} spans V ′′.

Proof Let V ′ = {v1, . . . , vr} be the set of vectors in the vector space V .1. Suppose V ′ is not a frame. Then there exists an equation

∑rj=1 αjvj =

0, where, for at least one k, it is the case that αk �= 0. But then vk =− 1

αk(∑

j �=k αj vj ). Let vk = v. Then v is linearly dependent on V ′\{v}. Onthe other hand suppose that v1, say, is linearly dependent on {v2, . . . , vr}. Thenv1 =∑r

j=2 αjvj , and so 0=−v1+∑rj=2 αjvj =∑r

j=1 αjvj where α1 =−1.Since α1 �= 0,V ′ cannot be a frame.

2. Suppose V ′′ is a subset of V ′, but that V ′′ is not a frame. For convenience letV ′′ = {v1, . . . , vk} where k ≤ r . Then there is a non-zero solution

0 �=k∑

j=1

αjvj .

Since V ′′ is a subset of V ′, this implies that V ′ cannot be a frame. Thus if V ′is a frame, so is any subset V ′′.

3. Suppose that V ′ is not a frame, but that it spans V ′′. By part (1), there ex-ists a vector v1, say, in V ′ such that v1 belongs to Span(V ′\{v1}). Thusv1 =∑r

j=2 αjvj . Since V ′ is a span for V ′′, any vector v in V ′′ can be written

v =r∑

j=1

βjvj

= β1

(r∑

j=2

αjvj

)

+r∑

j=2

βjvj .

Thus v is a linear combination of V ′\{v1} and so V ′′ = Span(V ′\{v1}).Let V ′′′ = V ′\{v1} to complete the proof. �

Definition 2.5 A basis for a vector space V is a frame V ′ which spans V .For example, we previously considered

V ′ ={(

12

),

(21

)}

AU

TH

OR

’S P

RO

OF


2.1 Vector Spaces 43

185

186

187

188

189

190

191

192

193

194

195

196

197

198

199

200

201

202

203

204

205

206

207

208

209

210

211

212

213

214

215

216

217

218

219

220

221

222

223

224

225

226

227

228

229

230

and showed that any vector in �2 could be written as(

x

y

)=(

2y − x

3

)(12

)+(

2x − y

3

)(21

)= λ1

(12

)+ λ2

(21

).

Thus V ′ is a span for �2. Moreover if (x, y) = (0,0) then λ1 = λ2 = 0 and so V ′is a frame. Hence V ′ is a basis for �2. If V ′ = {v1, . . . , vn} is a basis for a vectorspace V then any vector v ∈ V can be written

v =n∑

j=1

αjvj

and the elements (α1, . . . , αn) are known as the coordinates of the vector v, withrespect to the basis V ′.

For example the natural basis for �n is the set V ′ = {e1, . . . , en} where ei =(0, . . . ,1, . . . ,0} with a 1 in the ith position.

Lemma 2.4 {e1, . . . , en} is a basis for �n.

Proof We can write any vector x in �n as {x1, . . . , xn}. Clearly

x =⎛

⎜⎝

x1...

xn

⎞

⎟⎠= x1

⎛

⎝10·

⎞

⎠+ · · ·xn

⎛

⎜⎝

0...

1

⎞

⎟⎠ .

If x = 0 then x1 = · · · = xn = 0 and so {e1, . . . , en} is a frame, as well as a span, andthus a basis for �n. �

However a single vector x will have different coordinates depending on the basischosen. For example the vector (x, y) has coordinates (x, y) in the basis {e1, e2} but

coordinates (2y−x

3 ,2x−y

3 ) with respect to the basis{( 1

2

),( 2

1

)}.

Once the basis is chosen, the coordinates of any vector with respect to that basisare unique. �

Lemma 2.5 Suppose V ′ = {v1, . . . , vn} is a basis for V . Let v =∑ni=1 αivi . Then

the coordinates (α1, . . . , αn), with respect to the basis, are unique.

Proof If the coordinates were not unique then it would be possible to write v =∑ni=1 βivi =∑n

i=1 αivi with βi �= αi for some i.But 0= v − v =∑n

i=1 αivi −∑ni=1 βivi =∑n

i=1 (αi − βi)vi .Since V ′ is a frame, αi − βi = 0 for i = 1, . . . , n. Thus αi = βi for all i, and so

the coordinates are unique. �

Note in particular that with respect to any basis {v1, . . . , vn} for V , the uniquezero vector 0 always has coordinates (0, . . . ,0).

AU

TH

OR

’S P

RO

OF



231

232

233

234

235

236

237

238

239

240

241

242

243

244

245

246

247

248

249

250

251

252

253

254

255

256

257

258

259

260

261

262

263

264

265

266

267

268

269

270

271

272

273

274

275

276

Definition 2.6 A space V is finitely generated iff there exists a span V ′, for V ,which has a finite number of elements.

Lemma 2.6 If V is a finitely generated vector space, then it has a basis with afinite number of elements.

Proof Since V is finitely generated, there is a finite set V1 = {v1, . . . , vn} whichspans V . If V1 is a frame, then it is a basis. If V1 is linearly dependent, thenby Lemma 2.3(3) there is a vector v ∈ V1, such that Span(V2) = V , whereV2 = V1\{v}. Again if V2 is a frame, then it is a basis. If there were no subsetVr = {v1, . . . , vn−r+1} of V1 which was a frame, then V1 would have to be the emptyset, implying that V was an empty set. But this contradicts 0 ∈ V . �

Lemma 2.7 If V is a finitely generated vector space, and V1 is a frame, then thereis a basis V2 for V which includes V1.

Proof Let V1 = {v1, . . . , vr}. If Span(V1) = V then V1 is a basis. So supposethat Span(V1) �= V . Then there exists an element vr+1 ∈ V which does not be-long to Span(V1). We seek to show that V2 = V1 ∪ {vr+1} is a frame. Consider0= αr+1vr+1 +∑r

i=1 αivi .If αr+1 = 0, then the linear independence of V1 implies that αi = 0, for i =

1, . . . , r . Thus V2 is a frame. If αr+1 �= 0, then

vr+1 =− 1

αr+1

(r∑

i=1

αivi

)

.

But this implies that vr+1 belongs to Span(V1) and therefore that V = Span(V1).Thus V2 is a frame. If V2 is a span for V , then it is a basis. If V2 is not a span, reiteratethis process. Since V is finitely generated, there must be some frame Vn−r+1 ={v1, . . . , vr , vr+1, . . . , vn} which is a span, and thus a basis for V . �

These two lemmas show that if V is a finitely generated vector space, and{v1, . . . , vm} is a span then some subset {v1, . . . , vn}, with n≤m, is a basis. A basisis a minimal span.

On the other hand if X = {v1, . . . , vr} is a frame, but not a span, then elementsmay be added to X in such a way as to preserve linear independence, until this“superset” of X becomes a basis. Consequently a basis is a maximal frame. Thesetwo results can be combined into one theorem.

Exchange Theorem Suppose that V is a finitely generated vector space. Let X ={x1, . . . , xn} be a frame and Y = {y1, . . . , yn} a span. Then there is some subset Y ′of Y such that X ∪ Y ′ is a basis for V .

Proof By induction, let XS = {x1, . . . , xs}, for each s = 1, . . . , n, and let X0 = φ. �

AU

TH

OR

’S P

RO

OF


2.2 Linear Transformations 45

277

278

279

280

281

282

283

284

285

286

287

288

289

290

291

292

293

294

295

296

297

298

299

300

301

302

303

304

305

306

307

308

309

310

311

312

313

314

315

316

317

318

319

320

321

322

We know already from Lemma 2.6 that there is some subset Y0 of Y such thatX0 ∪ Y0 is a basis for V . Suppose for some s < m, there is a subset Ys of Y suchthat Xs ∪ Ys is a basis.

Let Ys = {y1, . . . , yt }. Now xs+1 /∈ Span(Xs ∪ Ys) since Xs ∪ Ys is a basis. Thusxs+1 =∑s

1 α1x1 +∑t1 βiyi . But Xs+1 = {x1, . . . , xs+1} is a frame, since it is a

subset of X.Thus at least one βj �= 0. Let Ys+1 = Ys\{yj }, so Yj /∈ Span(Xs+1 ∪ Ys+1) and

so Xs+1 ∪ Ys+1 = {x1, . . . , xs+1} ∪ {y1, . . . , yj−1, yj+1, . . . , yt } is a basis for V .Thus if there is some subset Ys of Y such that Xs ∪ Ys is a basis, there is a subset

Ys+1 of Y such that Xs+1 ∪ Ys+1 is a basis.By induction, there is a subset Ym = Y ′ of Y such that Xm ∪ Ym = X ∪ Y ′ is a

basis. �

Corollary 2.8 If X = {x1, . . . , xm} is a frame in a vector space V , and Y ={y1, . . . , yn} is a span for V , then m≤ n.

Lemma 2.9 If V is a finitely generated vector space, then any two bases have thesame number of vectors, where this number is called the dimension of V , and writtendim(V ).

Proof Let X,Y be two bases with m,n number of elements. Consider X as a frameand Y as a span. Thus m≤ n. However Y is also a frame and X a span. Thus n≤m.Hence m= n. �

If V ′ is a vector subspace of a finitely generated vector space V , then any basisfor V ′ can be extended to give a basis for V . To see this, there must exist some finiteset V ′′ = {v1, . . . , vr} of vectors all belonging to V ′ such that Span(V ′′)= V ′. Oth-erwise V could not be finitely generated. As before eliminate members of V ′′ until aframe is obtained. This gives a basis for V ′. Clearly dim(V ′)≤ dim(V ). Moreover ifV ′ has a basis V ′′′ = {v1, . . . , vr} then further linear independent vectors belongingto V \V ′ can be added to V ′′′ to give a basis for V .

As we showed in Lemma 2.3, the vector space �n has a basis {e1, . . . , en} con-sisting of n elements. Thus dim (�n)= n.

If V m is a vector subspace of �n of dimension m, where of course m ≤ n, thenin a certain sense V m is identical to a copy of �m through the origin 0. We makethis more explicit below.

2.2 Linear Transformations

In Chap. 1 we considered a morphism from the abelian group (�2,+) to itself.A morphism between vector spaces is called a linear transformation.

Definition 2.7 Let V,U be two vector spaces of dimension n,m respectively, overthe same field F . Then a linear transformation T : V → U is a function from V toU with domain V , such that

AU

TH

OR

’S P

RO

OF



323

324

325

326

327

328

329

330

331

332

333

334

335

336

337

338

339

340

341

342

343

344

345

346

347

348

349

350

351

352

353

354

355

356

357

358

359

360

361

362

363

364

365

366

367

368

1. for any a ∈F , any v ∈ V,T (αv)= α(T (v))

2. for any v1, v2 ∈ V, T (v1 + v2)= T (v1)+ T (v2).Note that a linear transformation is simply a morphism between (V ,+) and

(U,+) which respects the operation of the field F . We shall show that any lineartransformation T can be represented by an array of the form

M(T )=⎛

⎜⎝

a11 a1n

......

...

am1 amn

⎞

⎟⎠

consisting of n× m elements in F . An array such as this is called an n by m (orn×m) matrix. The set of n×m matrices we shall write as M(n,m).

2.2.1 Matrices

For convenience we shall consider finitely generated vector spaces over �, so thatwe restrict attention to linear transformations between �n and �m, for any integersn and m. Now let V = {v1, . . . , vn} be a basis for �n and U = {u1, . . . , um} a basisfor �m.

Since V is a basis for �n, any vector x ∈ �n can be written as x =∑nj=1 xjvj ,

with coordinates (x1, . . . , xn).If T is a linear transformation, then T (αv1 + βv2) = T (αv1) + T (βv2) =

αT (v1)+ βT (v2). Therefore

T (x)= T

(n∑

j=1

xjvj

)

=n∑

j=1

xjT (vj ).

Since each T (vj ) lies in �m we can write T (vj )=∑mi=1 aijui , where (a1j , a2j ,

. . . , amj ) are the coordinates of T (vj ) with respect to the basis U for �m.Thus

T (x)=n∑

j=1

xj

m∑

i=1

aijui =m∑

i=1

yiui

where the ith coordinate, yi , of T (x) is equal to∑n

j=1 aij xj .We obtain a set of linear equations:

y1 = a11x1 + a12x2 + · · ·a1j xj + · · ·a1nxn

......

yi = ai1x1 + ai2x2 + · · ·aij xj + · · ·ainxn

......

ym = am1x1 + am2x2 + · · ·amjxj + · · ·amnxn.

AU

TH

OR

’S P

RO

OF



369

370

371

372

373

374

375

376

377

378

379

380

381

382

383

384

385

386

387

388

389

390

391

392

393

394

395

396

397

398

399

400

401

402

403

404

405

406

407

408

409

410

411

412

413

414

This set of equations is more conveniently written

row i

⎛

⎜⎜⎜⎜⎜⎜⎝

a11 . . . aij . . . a1n

...

ai1 aij ain

...

am1 . . . amj . . . amn

⎞

⎟⎟⎟⎟⎟⎟⎠

j th column

⎛

⎜⎜⎜⎜⎜⎜⎝

x1...

xj

...

xn

⎞

⎟⎟⎟⎟⎟⎟⎠=

⎛

⎜⎜⎜⎜⎜⎜⎝

y1...

yi

...

ym

⎞

⎟⎟⎟⎟⎟⎟⎠

.

or as M(T )x = y, where M(T ) is the n×m array whose ith row is (ai1, . . . , ain)

and whose j th column is (a1j , . . . , amj ). This matrix is commonly written as (aij )

where it is understood that i = 1, . . . ,m and j = 1, . . . , n.Note that the operation of M(T ) on x is as follows: to obtain the ith coor-

dinate, yi , take the ith row vector (a11, . . . , a1n) and form the scalar product ofthis with the column vector (x1, . . . , xn), where this scalar product is defined to be∑n

j=1 aij xj .The coefficients of T (vj ) with respect to the basis (u1, . . . , um) are (a1j , . . . , amj )

and these turn up as the j th column of the matrix. Thus we could write the matrixas

M(T )= (T (v1) . . . T (vj ) . . . T (vn))

where T (vj ) is the column of coordinates in �m. Suppose now that W ={w1, . . . ,wp} is a basis for �p and S : �m →�p is a linear transformation. Thento represent S as a matrix with respect to the two sets of bases, U and W , for eachi = 1, . . . ,m, we need to know

S(ui)=p∑

k=1

bkiwk.

Then as before S is represented by the matrix

M(S)=⎛

⎜⎝

b11 . . . b1i . . . b1m

... bki

...

bp1 . . . bpi . . . bpm

⎞

⎟⎠

where the ith column is the column of coordinates of S(ui) in �p .We can compute the composition

(S ◦ T ) : �n T→�m S→�p.

The question is how should we compose the two matrices M(S) and M(T ) sothat the result “corresponds” to the matrix M(S ◦ T ) which represents S ◦ T .

First of all we show that S ◦ T : �n→�p is a linear transformation, so that weknow that it can be represented by an (n× p) matrix.

AU

TH

OR

’S P

RO

OF



415

416

417

418

419

420

421

422

423

424

425

426

427

428

429

430

431

432

433

434

435

436

437

438

439

440

441

442

443

444

445

446

447

448

449

450

451

452

453

454

455

456

457

458

459

460

Lemma 2.10 If T : �n→�m and S : �m→�p are linear transformations, thenS ◦ T : �n→�p is a linear transformation.

Proof Consider α,β ∈ �, v1, v2 ∈ �n. Then

(S ◦ T )(αv1 + βv2)= S[T (αv1 + βv2)

]

= S(αT (v1)+ βT (v2)

)since T is linear

= αS(T (v1)

)+ βS(T (v2)

)since S is linear

= α(S ◦ T )(v1)+ β(S ◦ T )(v2).

Thus S ◦ T is linear.By the previous analysis, (S ◦T ) can be represented by an (n×p) matrix whose

j th column is (S ◦ T )(vj ). Thus

(S ◦ T )(vj )= S

(m∑

i=1

aijui

)

=m∑

i=1

aijS(ui)

=m∑

i=1

aij

p∑

k=1

bkiwk

=p∑

k=1

(m∑

i=1

aij bki

)

wk.

Thus the kth entry in the j th column of M(S ◦ T ) is∑m

i=1 bkiaij .Thus (S ◦ T ) can be represented by the matrix

M(S ◦ T )= kth row

⎛

⎜⎜⎝

←− n −→. . .

∑mi=1 bkiaij . . .

j th column

⎞

⎟⎟⎠p

The j th column in this matrix can be obtained more simply by operating thematrix M(S) on the j th column vector T (vj ) in the matrix M(T ).

Thus M(S ◦ T )= (M(S)(T (v1)) . . .M(S)(T (vn)))=M(S) ◦M(T ).

AU

TH

OR

’S P

RO

OF



461

462

463

464

465

466

467

468

469

470

471

472

473

474

475

476

477

478

479

480

481

482

483

484

485

486

487

488

489

490

491

492

493

494

495

496

497

498

499

500

501

502

503

504

505

506

kth row ofp rows

(bk1 . . . bki . . . bkm

←− m columns −→)

⎛

⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

←− n−→aij

...

aij

...

amj

j th column

⎞

⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

m rows

=M(S) ◦M(T ). �

Thus the “natural” method of matrix composition corresponds to the compositionof linear transformations.

Now let L(�n,�n) stand for the set of linear transformations from �n to �n. Aswe have shown, if S,T belong to this set then S ◦ T is also a linear transformationfrom�n to�n. Thus composition of functions (◦) is a binary operation L(�n,�n)×L(�n,�n)→L(�n,�n).

Let M :L(�n,�n)→M(n,n) be the mapping which assigns to any linear trans-formation T : �n →�n the matrix M(T ) as above. Note that M is dependent onthe choice of bases {v1, . . . , vn} and {u1, . . . , un} for the domain and codomain, �n.There is in general no reason why these two bases should be the same.

Now let ◦ be the method of matrix composition which we have just defined. Thusthe mapping M satisfies

M(S ◦ T )=M(S) ◦M(T )

for any two linear transformations, S and T . Suppose now that we are given a lineartransformation, T ∈ L(�n,�n). Clearly the matrix M(T ) which represents T withrespect to the two bases is unique, and so M is a function.

On the other hand suppose that T ,S are both represented by the same matrixA= (aij ).

By definition T (vj )= S(vj )=∑mi=1 aijui for each j = 1, . . . , n.

But then T (x)= S(x) for any x ∈ �n, and so T = S. Thus M is injective.Moreover if A is any matrix, then it represents a linear transformation, and so M

is surjective. Thus we have a bijective morphism

M : (L(�n,�n),◦)→ (M(n,n),◦).

As we saw in the case of 2× 2 matrices, the subset of non-singular matrices inM(n,n) forms a group. We repeat the procedure for the more general case.

2.2.2 The Dimension Theorem

Let T : V → U be a linear transformation between the vector spaces V,U of di-mension n,m respectively over a field F . The transformation is characterised bytwo subspaces, of V and U .

AU

TH

OR

’S P

RO

OF



507

508

509

510

511

512

513

514

515

516

517

518

519

520

521

522

523

524

525

526

527

528

529

530

531

532

533

534

535

536

537

538

539

540

541

542

543

544

545

546

547

548

549

550

551

552

Definition 2.81. the kernel of a transformation T : V →U is the set Ker(T )= {x ∈ V : T (x)=

0} in V .2. The image of the transformation is the set Im(T ) = {y ∈ U : ∃ x ∈ V s.t.

T (x)= y}.Both these sets are vector subspaces of U,V respectively. To see this supposev1, v2 ∈ Ker(T ), and α,β ∈ F . Then T (αv1 + βv2) = αT (v1) + βT (v2) = 0 +0= 0. Hence αv1 + βv2 ∈Ker(T ).

If u1, u2 ∈ Im(T ) then there exists v1, v2 ∈ V such that T (v1)= u1, T (v2)= u2.But then

αμ1 + βμ2 = αT (v1)+ βT (v2)

= T (αv1 + βv2).

Since V is a vector space, αv1 + βv2 ∈ V and so αu1 + βu2 ∈ Im(T ).By the exchange theorem there exists a basis k1, . . . , kp for Ker(T ), where p =

dim Ker(T ) and a basis u1, . . . , us for Im(T ) where s = dim(Im(T )). Here p iscalled the kernel rank of T , often written kr(T ), and s is the rank of T , or rk(T ).

The Dimension Theorem If T : V →U is a linear transformation between vectorspaces over a field F , where dimension (V ) �= n, then the dimension of the kerneland image of T satisfy the relation

dim(Im(T )

)+ dim(Ker(T )

)= n.

Proof Let {u1, . . . , us} be a basis for Im(T ) and for each i = 1, . . . , s, let vi be thevector in V n such that T (vi)= ui .

Let v be any vector in V . Then

T (v)=s∑

i=1

αiui, for T (v) ∈ Im(T ).

So

T (v)=s∑

i=1

αiT (vi)

= T

(s∑

i=1

αivi

)

, and

T

(

v−s∑

i=1

αivi

)

= 0,

the zero vector in U , i.e., v −∑si=1 αivi ∈ kernel T . Let {k1, . . . , kp} be the basis

for Ker(T ).

AU

TH

OR

’S P

RO

OF



553

554

555

556

557

558

559

560

561

562

563

564

565

566

567

568

569

570

571

572

573

574

575

576

577

578

579

580

581

582

583

584

585

586

587

588

589

590

591

592

593

594

595

596

597

598

Then v − ∑si=1 αivi = ∑p

j=1 βjkj , or v = ∑si=1 αivi + ∑p

j=1 βjkj . Thus(v1, . . . , vs, k1, . . . , kp) is a span for V .

Suppose we consider

s∑

i=1

αivi +p∑

j=1

βjkj = 0. (*)

Then, since T (kj )= 0 for j = 1, . . . , p,

T

(s∑

i=1

αivi +p∑

j=1

βjkj

)

=s∑

i=1

αiT (vi)+p∑

j=1

βjT (kj )

=s∑

i=1

αiT (vi)=s∑

i=1

αiui = 0.

Now {ui, . . . , us} is a basis for Im(T ), and hence these vectors are linearly inde-pendent. So αi = 0, i = 1, . . . , s. Therefore (*) gives

∑p

j=1 βjkj = 0.However {k1, . . . , kp} is a basis for Ker(T ) and therefore a frame, so βj = 0 for

j = 1, . . . , p. Hence {v1, . . . , vs, k1, . . . , kp} is a frame, and therefore a basis for V .By the exchange theorem the dimension of V is the unique number of vectors in abasis. Therefore s + p = n. �

Note that this theorem is true for general vector spaces. We specialise now tovector spaces �n and �m.

Suppose {v1, . . . , vn} is a basis for �n. The coordinates of vj with respect to thisbasis are (0, . . . ,1, . . . ,0) with 1 in the j th place. As we have noted the image of vj

under the transformation T can be represented by the j th column (aij , . . . , amj ) inthe matrix M(T ), with respect to the original basis (e1, . . . , em), say, for �m. Callthe n different column vectors of this matrix a1, . . . , aj , . . . , an.

Then the equation M(T )(x) = y is identical to the equation∑n

j=1 xjaj = y

where x = (x1, . . . , xn).Clearly any vector y in the image of M(T ) can be written as a linear com-

bination of the columns A = {a1, . . . , an}. Thus Span(A) = Im(M(T )). Supposenow that A is not a frame. In this case an, say, can be written as a linear com-bination of {a1, . . . , an−1}, i.e.,

∑nj=1 k1j aj = 0 and k1n �= 0. Then the vector

k1 = (k11, . . . , k1n) satisfies M(T )(k1)= 0. Thus k1 belongs to Ker(M(T )).Eliminate an, say, and proceed in this way. After p iterations we will have ob-

tained p kernel vectors {k1, . . . , kp} and the remaining column vectors {a1, . . . ,

an−p} will form a frame, and thus a basis for the image of M(T ).Consequently dim(Im(M(T ))) = n − p = n − dim(Ker(M(T ))). The number

of linearly independent columns in the matrix M(T ) is called the rank of M(T ),and is clearly the dimension of the image of M(T ). In particular if M1(T ) andM2(T ) are two matrix representations with respect to different bases, of the lineartransformation T , then rankM1(T )= rankM2(T )= rank(T ).

AU

TH

OR

’S P

RO

OF



599

600

601

602

603

604

605

606

607

608

609

610

611

612

613

614

615

616

617

618

619

620

621

622

623

624

625

626

627

628

629

630

631

632

633

634

635

636

637

638

639

640

641

642

643

644

Thus rank(T ) is an invariant, in the sense of being independent of the particularbases chosen for �n and �m.

In the same way the kernel rank of T is an invariant; that is, for any matrixrepresentation M(T ) of T we have ker rank(M(T ))= ker rank(T ).

In general if y ∈ Im(T ), x0 satisfies T (x0)= y, and k belongs to the kernel, then

T (x0 + k)= T (x0)+ T (k)= y + 0= y.

Thus if x0 is a solution to the equation T (x0) = y, the point x0 + k is also asolution. More generally x0+Ker(T )= {x0+ k : k ∈Ker(T )} will also be the set ofsolutions. Thus for a particular y ∈ Im(T ), T −1(y)= {x : T (x)= y} = x0+Ker(T ).

By the dimension theorem dim Ker(T ) = n − rank(T ). Thus T −1(y) is a geo-metric object of “dimension” dim Ker(T )= n− rank(T ).

We defined T to be injective iff T (x0)= T (x) implies x0 = x. Thus T is injectiveiff Ker(T )= {0}. In this case, if there is a solution to the equation T (x0)= y, thenthis solution is unique.

Suppose that n ≤ m, and that the n different column vectors of the matrix arelinearly independent. In this case rank(T )= n and so dim Ker(T )= 0. Thus T mustbe injective. In particular if n < m then not every y ∈ �m belongs to the image ofT , and so not every equation T (x) = y has a solution. Suppose on the other handthat n > m. In this case the maximum possible rank is m (since n vectors cannot belinearly independent in �m when n > m). If rank(T ) = m, then there must exist akernel of dimension (n−m).

Moreover Im(T ) = �m, and so for every y ∈ �m there exists a solution to thisequation T (x)= y. Thus T is surjective. However the solution is not unique, sinceT −1(y)= x +Ker(T ) is of dimension (n−m) as before.

Suppose now that n=m, and that T : �n→�n has maximal rank n. Then T isboth injective and surjective and thus an isomorphism. Indeed T will have an inversefunction T −1 : �n → �n. Moreover T −1 is linear. To see this note that if x1 =T −1(y1) and x2 = T −1(y2) then T (x1)= y1 and T (x2)= y2 so T (x1 + x2)= y1 +y2. Thus T −1(y1+ y2)= x1+ x2 = T −1(y1)+T −1(y2). Moreover if x = T −1(αy)

then T (x)= αy. If α �= 0, then 1αT (x)= T ( 1

αx)= y or 1

αx = T −1(y). Hence x =

αT −1(y). Thus T −1(αy)= αT −1(y). Since T −1 is linear it can be represented bya matrix M(T −1). As we know M : (L(�n,�n),◦)→ (M(n,n),◦) is a bijectivemorphism, so M maps the identity linear transformation, Id, to the identity matrix

M(Id)= I =⎛

⎜⎝

1 . . . 0...

...

0 . . . 1

⎞

⎟⎠ .

When T is an isomorphism with inverse T −1, then the representation M(T −1)

of T −1 is [M(T )]−1. We now show how to compute the inverse matrix [M(T )]−1

of an isomorphism.

AU

TH

OR

’S P

RO

OF



645

646

647

648

649

650

651

652

653

654

655

656

657

658

659

660

661

662

663

664

665

666

667

668

669

670

671

672

673

674

675

676

677

678

679

680

681

682

683

684

685

686

687

688

689

690

2.2.3 The General Linear Group

To compute the inverse of an n× n matrix A, we define, by induction, the determi-nant of A. For a 1× 1 matrix (a11) define det (A11)= a11, and for a 2× 2 matrix

A= ( a11 a12a21 a22

)define det A= a11a22 − a21a12.

For an n× n matrix A define the (i, j)th cofactor to be the determinant of the(n− 1)× (n− 1) matrix A(i, j) obtained from A by removing the ith row and j thcolumn, then multiplying by (−1)i+j . Write this cofactor as Aij . For example inthe 3× 3 matrix, the cofactor in the (1,1) position is

A11 = det

(a22 a23a32 a33

)= a22a33 − a32a23.

The n× n matrix (Aij ) is called the cofactor matrix.The determinant of the n× n matrix A is then

∑nj=1 a1jA1j . The determinant is

also often written as |A|.This procedure allows us to define the determinant of an n× n matrix. For ex-

ample if A= (aij ) is a 3× 3 matrix, then

|A| = a11

∣∣∣∣a22 a23a32 a33

∣∣∣∣− a12

∣∣∣∣a21 a23a31 a33

∣∣∣∣+ a13

∣∣∣∣a21 a22a31 a32

∣∣∣∣

= a11(a22a33 − a32a23)− a12(a21a33 − a31a23)+ a13(a21a32 − a31a22).

An alternative way of defining the determinant is as follows. A permutation of n

is a bijection s : {1, . . . , n}→ {1, . . . , n}, with degree d(s) the number of exchangesneeded to give the permutation.

Then |A| =∑s(−1)d(s)Πni=1ais(i) = a11a22a33 · · · + · · · where the summation is

over all permutations. The two definitions are equivalent, and it can be shown that

|A| =n∑

j=1

aijAij (for any i = 1, . . . , n)

=n∑

i=1

aijAij (for any j = 1, . . . , n)

while

0=n∑

i=1

aijAik if j �= k

=n∑

j=1

aijAkj if i �= k.

AU

TH

OR

’S P

RO

OF



691

692

693

694

695

696

697

698

699

700

701

702

703

704

705

706

707

708

709

710

711

712

713

714

715

716

717

718

719

720

721

722

723

724

725

726

727

728

729

730

731

732

733

734

735

736

Thus

(aij ) (Ajk)t =(

n∑

j=1

aijAkj

)

=⎛

⎜⎝

|A| . . . 0...

...

0 . . . |A|

⎞

⎟⎠= |A|I.

Here (Ajk)t is the n× n matrix obtained by transposing the rows and columns

of (Ajk). Now the matrix A−1 satisfies A ◦ A−1 = I , and if A−1 exists then it isunique. Thus A−1 = 1

|A| (Aij )t .

Suppose that the matrix A is non-singular, so |A| �= 0. Then we can construct aninverse matrix A−1.

Moreover if A(x) = y then y = A−1(x) which implies that A is both injectiveand surjective. Thus rank(A) = n and the column vectors of A must be linearlyindependent.

As we have noted, however, if A is not injective, with Ker(A) �= {0}, thenrank(A) < n, and the column vectors of A must be linearly dependent. In this casethe inverse A−1 is not a function and cannot therefore be represented by a matrixand so we would expect |A| to be zero.

Lemma 2.11 If A is an n× n matrix with rank(A) < n then |A| = 0.

Proof Let A′ be the matrix obtained from A by adding a multiple (α) of the kthcolumn of A to the j th column of A. The j th column of A′ is therefore aj + αak .This operation leaves the j th column of the cofactor matrix unchanged. Thus

∣∣A′∣∣=

n∑

i=1

a′ijAij

=n∑

i=1

(aij + αaik)Aij

=n∑

i=1

aijAij + α

n∑

i=1

aikAij

= |A| + 0= |A|.

Suppose now that the columns of A are linearly dependent, and that aj =∑k �=j αkak for example. Let A′ be the matrix obtained from A by substituting

a′j = 0= aj −∑k �=j αkak for the j th column.By the above |A′| =∑n

i=1 a′ijAij = 0= |A|. �

AU

TH

OR

’S P

RO

OF



737

738

739

740

741

742

743

744

745

746

747

748

749

750

751

752

753

754

755

756

757

758

759

760

761

762

763

764

765

766

767

768

769

770

771

772

773

774

775

776

777

778

779

780

781

782

Suppose now that A,B are two non-singular matrices (aij ), (bki). The composi-tion is then B ◦A= (

∑mi=1 bkiaij ) with determinant

|B ◦A| =∑

s

(−1)d(s)

n∏

k=1

◦(

m∑

i=1

bkiais(k)

)

.

This expression can be shown to be equal to

∑

s

(−1)d(s)n∏

i=1

ais(i)

∑

s

(−1)d(s)n∏

i=1

bks(k) = |B||A| �= 0.

Hence the composition (B ◦A) has an inverse (B ◦A)−1 given by A−1 ◦B−1.Now let (GL(�n,�n),◦) be the set of invertible linear transformations, with ◦

composition of functions, and let M∗(n,n) be the set of non-singular n×n matrices.Choice of bases {v1, . . . , vn}, {u1, . . . , un} for the domain and codomain defines amorphism

M : (GL(�n,�n

),◦)→ (M∗(n,n),◦).

Suppose now that T belongs to GL(�n,�n). As we have seen this is equivalentto |M(T )| �= 0, so the image of M is precisely M∗(n,n). Moreover if |M(T )| �= 0then |M(T −1)| = 1

|M(T )| and M(T −1) belongs to M∗(n,n). On the other hand if

S,T ∈ GL(�n,�n) then S ◦ T also has rankn, and has inverse T −1 ◦ S−1 withrankn.

The matrix M(S ◦ T ) representing T ◦ S has inverse

M(T −1 ◦ S−1)=M

(T −1) ◦M

(S−1)

= [M(T )]−1 ◦ [M(S)

]−1.

Thus M is an isomorphism between the two groups (GL(�n,�n,◦)) and(M∗(n,n),◦).

The group of invertible linear transformations is also called the general lineargroup.

2.2.4 Change of Basis

Let L(�n,�m) stand for the set of linear transformations from �n to �m, and letM(n,m) stand for the set of n×m matrices. We have seen that the choice of basesfor �n,�m defines a function.

M :L(�n,�m)→M(n,m)

AU

TH

OR

’S P

RO

OF



783

784

785

786

787

788

789

790

791

792

793

794

795

796

797

798

799

800

801

802

803

804

805

806

807

808

809

810

811

812

813

814

815

816

817

818

819

820

821

822

823

824

825

826

827

828

which take a linear transformation T to its representation M(T ). We now exam-ine the relationship between two representations M1(T ), M2(T ) of a single lineartransformation.

Basis Change Theorem Let {v1, . . . , vn} and {u1, . . . , um} be bases for �n,�m

respectively.Let T be a linear transformation which is represented by a matrix A = (aij )

with respect to these bases. If V ′ = {v′1, . . . , v′n},U ′ = {u′1, . . . , u′m} are new basesfor �n,�m then T is represented by the matrix B =Q−1 ◦A ◦ P , where P,Q arerespectively (n× n) and (m×m) invertible matrices.

Proof For each v′k ∈ V ′ = {v′1, . . . , v′n} let v′k =∑n

i=1 bikvi and bk = (b′1k, . . . , bnk).Let P = (b1, . . . , bn) where the kth column of P is the column of coordinates

of bk . With respect to the new basis V ′, v′k has coordinates ek = (0, . . . ,1, . . . ,0)

with a 1 in the kth place.But then P(ek)= bk the coordinates of v′k with respect to V .Thus P is the matrix that transforms coordinates with respect to V ′ into coordi-

nates with respect to V . Since V is a basis, the columns of P are linearly indepen-dent, and so rankP = n, and P is invertible.

In the same way let u′k =∑m

i=1 cikui, ck = (c1k, . . . , cmk) and Q= (c1, . . . , cm)

the matrix with columns of these coordinates.Hence Q represents change of basis from U ′ to U . Since Q is an invertible m×m

matrix it has inverse Q−1 which represents change of basis from U to U ′.Thus we have the diagram

{v1, . . . , vn} A−→ {u1, . . . , um}

P ↑ Q−1 ↓ ↑Q

{v′1, . . . , v′n

} B−→ {u′1, . . . , u′m

}

from which we see that the matrix B , representing the linear transformation T :�n→�m with respect to the new bases is given by B =Q−1 ◦A ◦ P . �

Isomorphism Theorem Any linear transformation T : �n→�m of rank r can berepresented, by suitable choice of bases for �n and �m, by an n×m matrix

(Ir 00 0

)where Ir =

(1 00 1

)is the (r × r) identity matrix.

In particular1. if n < m and T is injective then there is an isomorphism S : �m →�m such

that S ◦ T (x1, . . . , xn) = (x1, . . . , xn,0, . . . ,0) with (n − m) zero entries, forany vector (x1, . . . , xn) in �n

AU

TH

OR

’S P

RO

OF



829

830

831

832

833

834

835

836

837

838

839

840

841

842

843

844

845

846

847

848

849

850

851

852

853

854

855

856

857

858

859

860

861

862

863

864

865

866

867

868

869

870

871

872

873

874

2. if n ≥ m and T is surjective then there are isomorphisms � : �n →�n, S :�m→�m such that S ◦ T ◦ R(x1, . . . , xn)= (x1, . . . , xm). If n=m, then S ◦T ◦R is the identity isomorphism.

Proof Of necessity rank(T )= r ≤min(n,m). If r < n, let p = n− r and choose abasis k1, . . . , kp for Ker(T ). Let V = {v1, . . . , vn} be the original basis for �n. Bythe exchange theorem there exists r = (n− p) different members {v1, . . . , vr} sayof V such that V ′ = {v1, . . . , vr , k1, . . . , kp} is a basis for �n.

Choose V ′ as the new basis for �n, and let P be the basis change matrixwhose columns are the column vectors in V ′. As in the proof of the dimen-sion theorem the image of the vectors v1, . . . , vn−p under T provide a basis forthe image of T . Let U = {u1, . . . , um} be the original basis of �m. By the ex-change theorem there exists some subset U ′ = {u1, . . . , um−r} of U such that U ′′ ={T (v1), . . . , T (vr), u1, . . . , um−r} form a basis for �m. Note that T (v1), . . . , T (vr)

are represented by the r linearly independent columns of the original matrix A rep-resenting T . Now let Q be the matrix whose columns are the members of U ′′. Bythe basis change theorem, B =Q−1 ◦A ◦ P , where B is the matrix representing T

with respect to these new bases. Thus we obtain

{v1, . . . , vn} A−→ {u1, . . . , um}

P ↑ Q−1 ↓

{v1, . . . , vr , k1 . . .} B−→ {T (v1) . . . T (vr), u1, . . . , um−r

}.

With respect to these new bases, the matrix B representing T has the required form:(

Ir 00 0

).

1. If n < m and T is injective then r = n. Hence P is the identity matrix, and soB =Q−1 ◦A.

But Q−1 is an m×m invertible matrix, and thus represents an isomorphism�n −→�n, while

B

(x1xn

)=(

In

0

)(x1xn

)=

⎛

⎜⎜⎜⎜⎜⎜⎝

x1...

xn

...

0

⎞

⎟⎟⎟⎟⎟⎟⎠

.

Write a vector x =∑ni=1 xivi as (x1, . . . , xn), and let S be the linear transfor-

mation �m →�m represented by the matrix Q−1. Then S ◦ T (x1, . . . , xn) =(x1, . . . , xn,0, . . . ,0).

2. If n≥m and T is surjective then rank(T )=m, and dim Ker(T )= n−m. ThusB = (Im 0)=Q−1 ◦A ◦ P . Let S,R be the linear transformations representedby Q−1 and P respectively.

AU

TH

OR

’S P

RO

OF



875

876

877

878

879

880

881

882

883

884

885

886

887

888

889

890

891

892

893

894

895

896

897

898

899

900

901

902

903

904

905

906

907

908

909

910

911

912

913

914

915

916

917

918

919

920

Then S ◦ T ◦ R(x1, . . . , xn) = (x1, . . . , xm). If n = m then S ◦ T ◦ R is theidentity transformation. �

Suppose now that V,U are the two bases for �n,�m as in the basis theorem.A linear transformation T : �n −→ �m is represented by a matrix M1(T ) withrespect to these bases. If V ′,U ′ are two new bases, then T will be represented bythe matrix M2(T ), and by the basis theorem

M2(T )=Q−1 ◦M1(T ) ◦ P

where Q,P are non-singular (m × m) and (n × n) matrices respectively. SinceM1(T ) and M2(T ) represent the same linear transformation, they are in some senseequivalent. We show this more formally.

Say the two matrices A,B ∈ M(n,m) are similar iff there exist non singularsquare matrices P ∈M∗(n,n) and Q ∈M∗(m,m) such that B =Q−1 ◦A ◦ P , andin this case write B ∼A.

Lemma 2.12 The similarity relation (∼) on M(n,m) is an equivalence relation.

Proof1. To show that ∼ is reflexive note that A= I−1

m ◦A ◦ In where Im, In are respec-tively the (m×m) and (n× n) identity matrices.

2. To show that ∼ is symmetric we need to show that B ∼A implies that A∼ B .Suppose therefore that B =Q−1 ◦A ◦ P .Since Q ∈M∗(m,m) it has inverse Q−1 ∈M∗(m,m).Moreover (Q−1)−1 ◦Q−1 = Im, and thus Q= (Q−1)−1. Thus

Q ◦B ◦ P−1 = (Q ◦Q−1) ◦A ◦ (P ◦ P−1)

=A

= (Q−1)−1 ◦B ◦ (P−1).

Thus A∼ B .3. To show ∼ is transitive, we seek to show that C ∼ B ∼ A implies C ∼ A.

Suppose therefore that C =R−1 ◦B ◦ S and B =Q−1 ◦A ◦ P , where R,Q ∈M∗(m,m) and S,P ∈M∗(n,n). Then

C = (R−1 ◦Q−1) ◦A ◦ P ◦ S

= (Q ◦R)−1 ◦A ◦ (P ◦ S).

Now (M∗(m,m),◦), (M∗(n,n),◦) are both groups and so Q◦R ∈M∗(m,m),P ◦ S ∈M∗(n,n). Thus C ∼A. �

AU

TH

OR

’S P

RO

OF



921

922

923

924

925

926

927

928

929

930

931

932

933

934

935

936

937

938

939

940

941

942

943

944

945

946

947

948

949

950

951

952

953

954

955

956

957

958

959

960

961

962

963

964

965

966

The isomorphism theorem shows that if there is a linear transformation T :�n −→ �m of rank r , then the (n × m) matrix M1(T ) which represents T , withrespect to some pair of the bases, is similar to an n×m matrix

B =(

Ir 00 0

)i.e., M(T )∼ B.

If S is a second linear transformation of rank r then M1(S)∼ B .By Lemma 2.12, M1(S)∼M1(T ).Suppose now that U ′,V ′ are a second pair of bases for �n,�m and let

M2(S),M2(T ) represent S and T . Clearly M2(S)∼M2(T ).Thus if S,T are linear transformations �m −→ �n we may say that S,T are

equivalent iff for any choice of bases the matrices M(S),M(T ) which representS,T are similar.

For any linear transformation T ∈ L(�n,�m) let [T ] be the equivalence class{S ∈ L(�n,�m) : S ∼ T }. Alternatively a linear transformation S belongs to [T ]iff rank(S)= rank(T ). Consequently the equivalence relation partitions L(�n,�m)

into a finite number of distinct equivalence classes where each class is classified byits rank, and the rank runs from 0 to min(n,m).

2.2.5 Examples

Example 2.1 To illustrate the use of these procedures in the solution of linear equa-tions, consider the case with n < m and the equation A(x)= y where

A=

⎛

⎜⎜⎝

1 −1 25 0 3−1 −4 53 2 −1

⎞

⎟⎟⎠ and y1 =

⎛

⎜⎜⎝

−11−11

⎞

⎟⎟⎠ , y2 =

⎛

⎜⎜⎝

05−55

⎞

⎟⎟⎠ .

To find Im(A), we first of all find Ker(A). The equation A(x) = 0 gives fourequations

x1 − x2 + 2x3 = 0

5x1 + 0+ 3x3 = 0

−x1 − 4x2 + 5x3 = 0

3x1 + 2x2 − x3 = 0

with solution k = (x1, x2, x3)= (−3,7,5).Thus Ker(A) ⊃ {λk ∈ �3 : λ ∈ �}. Hence dim Im(A) ≤ 2. Clearly the first two

columns (a1, a2) of A are linearly independent and so dim Im(A) = 2. Howevery2 = a1 + a2. Thus a particular solution to the equation A(x)= y2 is x0 = (1,1,0).

AU

TH

OR

’S P

RO

OF



967

968

969

970

971

972

973

974

975

976

977

978

979

980

981

982

983

984

985

986

987

988

989

990

991

992

993

994

995

996

997

998

999

1000

1001

1002

1003

1004

1005

1006

1007

1008

1009

1010

1011

1012

The full set of solutions to the equation is

x0 +Ker(A)= {(1,1,0)+ λ(−3,7,5) : λ ∈ �}.

To see whether y1 ∈ Im(A) we need only attempt to solve the equation y1 = αa1 +βa2. This gives

−1= α − β

1= 5α

−1=−α − 4β

1= 3α + 2β.

From the first two equations α = 15 , β = 6

5 , which is incompatible with the fourthequation. Thus y1 cannot belong to Im(A).

Example 2.2 Consider now an example of the case n > m, where

A=(

2 1 1 1 11 2 −1 1 1

): �5 −→�2.

Obviously the first two columns are linearly independent and so dim Im(A)≥ 2.Let {ai : i = 1, . . . ,5} be the five column vectors of the matrix and consider theequation

(21

)−(

12

)−(

1−1

)=(

00

).

Thus k1 = (1,−1,−1,0,0) belongs to Ker(A). On the other hand

(21

)+(

12

)− 3

(11

)=(

00

).

Thus k2 = (1,1,0,−3,0) and k3 = (1,1,0,0,−3) both belong to Ker(A).Consequently the rank of A has its maximal value of 2, while the kernel is three-

dimensional. Hence for any y ∈ �2 there is a set of solutions of the form x0 +Span{k1, k2, k3} to the equation A(x)= y.

Change the bases of �5 and �2 to

⎛

⎜⎜⎜⎜⎝

10000

⎞

⎟⎟⎟⎟⎠

,

⎛

⎜⎜⎜⎜⎝

01000

⎞

⎟⎟⎟⎟⎠

,

⎛

⎜⎜⎜⎜⎝

1−1−100

⎞

⎟⎟⎟⎟⎠

,

⎛

⎜⎜⎜⎜⎝

110−30

⎞

⎟⎟⎟⎟⎠

,

⎛

⎜⎜⎜⎜⎝

1100−3

⎞

⎟⎟⎟⎟⎠

AU

TH

OR

’S P

RO

OF



1013

1014

1015

1016

1017

1018

1019

1020

1021

1022

1023

1024

1025

1026

1027

1028

1029

1030

1031

1032

1033

1034

1035

1036

1037

1038

1039

1040

1041

1042

1043

1044

1045

1046

1047

1048

1049

1050

1051

1052

1053

1054

1055

1056

1057

1058

and( 2

1

),( 1

2

)respectively, then

B =(

2 11 2

)−1(2 1 1 1 11 2 −1 1 1

)

⎛

⎜⎜⎜⎜⎝

1 0 1 1 10 1 −1 1 10 0 −1 0 00 0 0 −3 00 0 0 0 −3

⎞

⎟⎟⎟⎟⎠

= 1

3

(2 −1−1 2

) (2 1 0 0 01 2 0 0 0

)=(

1 0 0 0 00 1 0 0 0

).

Example 2.3 Consider the matrix

Q=

⎛

⎜⎜⎝

1 −1 0 05 0 0 0−1 −4 1 03 2 0 1

⎞

⎟⎟⎠ .

Since |Q| = 5 we can compute its inverse. The cofactor matrix (Qij ) of Q is

⎛

⎜⎜⎝

0 −5 −20 101 1 5 −50 0 5 00 0 0 5

⎞

⎟⎟⎠

and thus

Q−1 = 1

|Q| (Qij )t =

⎛

⎜⎜⎝

0 15 0 0

−1 15 0 0

−4 1 1 02 −1 0 1

⎞

⎟⎟⎠ .

Example 2.4 Let T : �3→�4 be the linear transformation represented by the ma-trix A of Example 2.1, with respect to the standard bases for �3,�4. We seek tochange the bases so as to represent T by a diagonal matrix

B =(

Ir

0

).

By Example 2.1, the kernel is spanned by (−3,7,5), and so we choose a newbasis

e1 =⎛

⎝100

⎞

⎠ , e2 =⎛

⎝010

⎞

⎠ , k =⎛

⎝−375

⎞

⎠

AU

TH

OR

’S P

RO

OF



1059

1060

1061

1062

1063

1064

1065

1066

1067

1068

1069

1070

1071

1072

1073

1074

1075

1076

1077

1078

1079

1080

1081

1082

1083

1084

1085

1086

1087

1088

1089

1090

1091

1092

1093

1094

1095

1096

1097

1098

1099

1100

1101

1102

1103

1104

with basis change matrix P = (e1, e2, k). Note that |P | = 5 and P is non-singular.Thus {e1, e2, k} form a basis for �3. Now Im(A) is spanned by the first two columnsa1, a2, of A. Moreover A(e1)= a1 and A(e2)= a2. Thus choose

a1 =

⎛

⎜⎜⎝

15−13

⎞

⎟⎟⎠ , a2 =

⎛

⎜⎜⎝

−10−42

⎞

⎟⎟⎠ , e′3 =

⎛

⎜⎜⎝

0010

⎞

⎟⎟⎠ , e′4 =

⎛

⎜⎜⎝

0001

⎞

⎟⎟⎠

as the new basis for �4. Let Q = (a1, a2, e′3, e

′4) be the basis change matrix. The

inverse Q−1 is computed in Example 2.3. Thus we have (B)=Q−1 ◦A ◦ P .To check that this is indeed the case we compute:

Q−1 ◦A ◦ P =

⎛

⎜⎜⎝

0 15 0 0

−1 15 0 0

−4 1 1 02 −1 0 1

⎞

⎟⎟⎠

⎛

⎜⎜⎝

1 1 25 0 3−1 −4 53 2 −1

⎞

⎟⎟⎠

⎛

⎝1 0 −30 1 70 0 5

⎞

⎠

=

⎛

⎜⎜⎝

1 0 00 1 00 0 00 0 0

⎞

⎟⎟⎠

as required.

2.3 Canonical Representation

When considering a linear transformation T : �n −→ �n it is frequently conve-nient to change the basis of �n to a new basis V = {v1, . . . , vn} such that T is nowrepresented by a matrix

M2(T )= P−1 ◦M1(T ) ◦ P.

In this case it is generally not possible to obtain M2(T ) in the form(

Ir 00 0

)as before.

Under certain conditions however M2(T ) can be written in a diagonal form

⎛

⎝λ1 0

·0 λn

⎞

⎠ ,

where λ1, . . . , λn are known as the eigenvalues.More explicitly, a vector x is called an eigenvector of the matrix A iff there is a

solution to the equation A(x)= λx where λ is a real number. In this case, λ is calledthe eigenvalue associated with the eigenvector x. (Note that we assume x �= 0.)

AU

TH

OR

’S P

RO

OF


2.3 Canonical Representation 63

1105

1106

1107

1108

1109

1110

1111

1112

1113

1114

1115

1116

1117

1118

1119

1120

1121

1122

1123

1124

1125

1126

1127

1128

1129

1130

1131

1132

1133

1134

1135

1136

1137

1138

1139

1140

1141

1142

1143

1144

1145

1146

1147

1148

1149

1150

2.3.1 Eigenvectors and Eigenvalues

Suppose that there are n linearly independent eigenvectors {x1, . . . , xn} for A, where(for each i = 1, . . . , n) λi is the eigenvalue associated with xi . Clearly the eigen-vector xi belongs to Ker(A) iff λi = 0. If rank(A) = r then there would a subset{x1, . . . , xr} of eigenvectors which form a basis for Im(A), while {x1, . . . , xn} forma basis for �n. Now let Q be the (n× n) matrix representing a basis change fromthe new basis to the original basis. That is to say the ith column, vi , of Q is thecoordinate of xi with respect to the original basis.

After transforming, the original becomes

Q−1 ◦A ◦Q=⎛

⎜⎝

λ1 . . . 0... λr

...

0 0

⎞

⎟⎠=∧,

where rank∧= rankA= r .In general we can perform this diagonalisation only if there are enough eigen-

vectors, as the following lemma indicates.

Lemma 2.13 If A is an n× n matrix, then there exists a non-singular matrix Q,and a diagonal matrix ∧ such that ∧ =Q−1AQ iff the eigenvectors of A form abasis for �n.

Proof1. Suppose the eigenvectors form a basis, and let Q be the eigenvector matrix. By

definition, if vi is the ith column of Q, then A(vi)= λivi , where λi is real. ThusAQ=Q∧. But since {v1, . . . , vn} is a basis, Q−1 exists and so ∧=Q−1AQ.

2. On the other hand if ∧ =Q−1AQ, where Q is non-singular then AQ =Q∧.But this is equivalent to A(v1)= λivi for i = 1, . . . , n where λi is the ith diag-onal entry in ∧, and vi is the ith column of Q.Since Q is non-singular, the columns {v1, . . . , vn} are linearly independent, andthus the eigenvectors form a basis for �n. �

If there are n distinct (real) eigenvalues then this gives a basis, and thus a diago-nalisation.

Lemma 2.14 If {v1, . . . , vm} are eigenvectors corresponding to distinct eigenval-ues {λ1, . . . , λm}, of a linear transformation T : �n →�n, then {v1, . . . , vm} arelinearly independent.

Proof Since v1 is assumed to be an eigenvector, it is non-zero, and thus {v1} is alinearly independent set. Proceed by induction.

AU

TH

OR

’S P

RO

OF



1151

1152

1153

1154

1155

1156

1157

1158

1159

1160

1161

1162

1163

1164

1165

1166

1167

1168

1169

1170

1171

1172

1173

1174

1175

1176

1177

1178

1179

1180

1181

1182

1183

1184

1185

1186

1187

1188

1189

1190

1191

1192

1193

1194

1195

1196

Suppose Vk = {v1, . . . , vk}, with k < m, are linearly independent. Let vk+1 beanother eigenvector and suppose

v =k+1∑

r=1

arvr = 0.

Then 0= T (v)=∑k+1r=1 arT (vr)=∑k+1

r=1 arλrvr .If λk+1 = 0, then λi �= 0 for i = 1, . . . , k and by the linear independence of

Vk, arλr = 0, and thus ar = 0 for r = 1, . . . , k.Suppose λk+1 �= 0. Then

λk+1v =k+1∑

r=1

λk+1arvr =k+1∑

r=1

arλrvr = 0.

Thus∑k

r=1(λk+1 − λr)arvr = 0.By the linear independence of Vk, (λk+1 − λr)ar = 0 for r = 1, . . . , k.But the eigenvalues are distinct and so ar = 0, for r = 1, . . . , k.Thus ak+1vk+1 = 0 and so ar = 0, r = 1, . . . , k + 1. Hence

Vk+1 = {v1, . . . , vk+1}, k < m,

is linearly independent.By induction Vm is a linearly independent set. �

Having shown how the determination of the eigenvectors gives a diagonalisation,we proceed to compute eigenvalues.

Consider again the equation A(x) = λx. This is equivalent to the equationA′(x)= 0, where

A′ =

⎛

⎜⎜⎜⎝

a11 − λ a12 . . . a1n

a21 a22 − λ ·...

......

...

an1 ann − λ

⎞

⎟⎟⎟⎠

.

For this equation to have a non zero solution it is necessary and sufficient that|A′| = 0. Thus we obtain a polynomial equation (called the characteristic equation)of degree n in λ, with n roots λ1, . . . , λn not necessarily all real. In the 2× 2 casefor example this equation is λ2− λ(a11+ a22)+ (a11a22− a21a12)= 0. If the rootsof this equation are λ1, λ2 then we obtain

(λ− λ1)(λ− λ2)= λ2 − λ(λ1 + λ2) + λ1λ2.

AU

TH

OR

’S P

RO

OF



1197

1198

1199

1200

1201

1202

1203

1204

1205

1206

1207

1208

1209

1210

1211

1212

1213

1214

1215

1216

1217

1218

1219

1220

1221

1222

1223

1224

1225

1226

1227

1228

1229

1230

1231

1232

1233

1234

1235

1236

1237

1238

1239

1240

1241

1242

Hence

λ1λ2 = (a11a22 − a21a22)= |A|λ1 + λ2 = a11 + a22.

The sum of the diagonal elements of a matrix is called the trace of A. In the 2×2case therefore

λ1λ2 = |A|, λ1 + λ2 = a11 + a22 = trace(A).

In the 3× 3 case we find

(λ− λ1)(λ− λ2)(λ− λ3)

= λ3 − λ2(λ1 + λ2 + λ3)+ λ(λ1λ2 + λ1λ3 + λ2λ3)− λ1λ2λ3

= λ3 − λ2(traceA)+ λ(A11 +A22 +A33)− |A| = 0,

where Aii is the ith diagonal cofactor of A. Suppose all the roots are non-zero (thisis equivalent to the non-singularity of the matrix A). Let

∧=⎛

⎝λ1 0 00 λ2 00 0 λ3

⎞

⎠

be the diagonal eigenvalue matrix, with | ∧ | = λ1λ2λ3.The cofactor matrix of ∧ is then

⎛

⎝λ2λ3 0 0

0 λ1λ3 00 0 λ1λ2

⎞

⎠ .

Thus we see that the sum of the diagonal cofactors of A and ∧ are identical.Moreover trace (A)= trace (∧) and | ∧ | = |A|.

Now let ∼ be the equivalence relation defined on L(�n,�n) by B ∼A iff thereexist basis change matrices P,Q and a diagonal matrix ∧ such that

∧= P−1AP =Q−1BQ.

On the set of matrices which can be diagonalised, ∼ is an equivalence relation,and each class is characterised by n invariants, namely the trace, the determinant,and (n− 2) other numbers involving the cofactors.

AU

TH

OR

’S P

RO

OF



1243

1244

1245

1246

1247

1248

1249

1250

1251

1252

1253

1254

1255

1256

1257

1258

1259

1260

1261

1262

1263

1264

1265

1266

1267

1268

1269

1270

1271

1272

1273

1274

1275

1276

1277

1278

1279

1280

1281

1282

1283

1284

1285

1286

1287

1288

2.3.2 Examples

Example 2.5 Let

A=⎛

⎝2 1 −10 1 12 0 −2

⎞

⎠ .

The characteristic equation is

(2− λ)[(1− λ)(−2− λ)

]− 1(−2)− (−2(1− λ)=−λ(λ2 − λ− 2

)

=−λ(λ− 2)(λ+ 1)

= 0.

Hence (λ1, λ2, λ3)= (0,2,−1). Note that λ1 + λ2 + λ3 = trace(A)= 1 and

λ2λ3 =−2=A11 +A22 +A33.

Eigenvectors corresponding to these eigenvalues are

x1 =⎛

⎝1−11

⎞

⎠ , x2 =⎛

⎝211

⎞

⎠ , x3 =⎛

⎝1−1

2

⎞

⎠ .

Let P be the basis change matrix given by these three column vectors. The in-verse can be readily computed, to give

P−1AP =⎛

⎝1 −1 −113

13 0

− 23

13 1

⎞

⎠

⎛

⎝2 1 −10 1 12 0 2

⎞

⎠

⎛

⎝1 2 1−1 1 −11 1 2

⎞

⎠=⎛

⎝0 0 00 2 00 0 −1

⎞

⎠ .

Suppose we now compute A2 =A ◦A : �3→�3. This can easily be seen to be

⎛

⎝2 3 12 1 −10 2 2

⎞

⎠ .

The characteristic function of A2 is (λ3 − 5λ2 + 4λ) with roots μ1 = 0, μ2 = 4,μ3 = 1.

In fact the eigenvectors of A2 are x1, x2, x3, the same as A, but with eigenval-ues λ2

1, λ22, λ

23. In this case Im(a) = Im(A2) is spanned by {x2, x3} and Ker(A) =

Ker(A2) has basis {x1}.More generally consider a linear transformation A : �n →�n. Then if x is an

eigenvector with a non-zero eigenvalue λ,A2(x) = A ◦ A(x) = A[λx] = λA(x) =λ2x, and so x ∈ Im(A)∩ Im(A2).

AU

TH

OR

’S P

RO

OF



1289

1290

1291

1292

1293

1294

1295

1296

1297

1298

1299

1300

1301

1302

1303

1304

1305

1306

1307

1308

1309

1310

1311

1312

1313

1314

1315

1316

1317

1318

1319

1320

1321

1322

1323

1324

1325

1326

1327

1328

1329

1330

1331

1332

1333

1334

If there exist n distinct real roots to the characteristic equation of A, then a basisconsisting of eigenvectors can be found. Then A can be diagonalized, and Im(A)=Im(A2), Ker(A)=Ker(A2).

Example 2.6 Let

A=⎛

⎝3 −1 −11 3 −75 −3 1

⎞

⎠

Then Ker(A) has basis {(1,2,1)}, and Im(A) has basis {(3,1,5), (−1,3,−3)}. Theeigenvalues of A are 0,0,7. Since we cannot find three linearly independent eigen-vectors, A cannot be diagonalised. Now

A2 =⎛

⎝3 −3 3−29 29 −2917 −17 17

⎞

⎠

and thus Im(A2) has basis {(3,−29,17)}. Note that

⎛

⎝3−2917

⎞

⎠=−2

⎛

⎝315

⎞

⎠− 9

⎛

⎝−13−3

⎞

⎠ ∈ Im(A)

and so Im(A2) is a subspace of Im(A).Moreover Ker(A2) has basis {(1,2,1), (1,−1,0)} and so Ker(A) is a subspace

of Ker(A2).This can be seen more generally. Suppose f : �n→�n is linear, and x ∈Ker(f ).

Then f 2(x)= f (f (x))= 0, and so x ∈ Ker(f 2). Thus Ker(f )⊂ Ker(f 2). On theother hand if v ∈ Im(f 2) then there exists w ∈ �n such that f 2(w)= v. But f (w) ∈�n and so f (f (w))= v ∈ Im(f ). Thus Im(f 2)⊂ Im(f ).

2.3.3 Symmetric Matrices and Quadratic Forms

Given two vectors x = (x1, . . . , xn) and y = (y1, . . . , yn) in �n, let 〈x, y〉 =∑ni=1 xiyi ∈ � be the scalar product of x and y. Note that 〈λx,y〉 = λ〈x, y〉 =

〈x,λy〉 for any real λ. (We use 〈−,−〉 to distinguish the scalar product from a vec-tor in �2. However the notations (x, y) or x · y are often used for scalar product.)

An n× n matrix A= (aij ) may be regarded as a map A∗ : �n×�n→�, whereA∗(x, y)= 〈x,A(y)〉.

A∗ is linear in both x and y and is called bilinear. By definition 〈x,A(y)〉 =∑ni=1∑n

j=1 xiaij yj .Call an n× n matrix A symmetric iff A=At where At = (aji) is obtained from

A by exchanging rows and columns.

AU

TH

OR

’S P

RO

OF



1335

1336

1337

1338

1339

1340

1341

1342

1343

1344

1345

1346

1347

1348

1349

1350

1351

1352

1353

1354

1355

1356

1357

1358

1359

1360

1361

1362

1363

1364

1365

1366

1367

1368

1369

1370

1371

1372

1373

1374

1375

1376

1377

1378

1379

1380

In this case 〈A(x), y〉 =∑ni=1(∑n

j=1 aji xi)yj =∑ni=1∑n

j=1 xi aij yj , sinceaij = aji for all i, j .

Hence 〈A(x), y〉 = 〈x,A(y)〉 for any x, y ∈ �n whenever A is symmetric.

Lemma 2.15 If A is a symmetric n × n matrix, and x, y are eigenvectors of A

corresponding to distinct eigenvalues then 〈x, y〉 = 0, i.e., x and y are orthogonal.

Proof Let λ1 �= λ2 be the eigenvalues corresponding to the distinct eigenvectorsx, y. Now

⟨A(x), y

⟩= ⟨x,A(y)⟩

= 〈λ1x, y〉 = 〈x,λ2y〉= λ1〈x, y〉 = λ2〈x, y〉.

Here 〈A(x), y〉 = 〈x,A(y)〉 since A is symmetric. Moreover 〈x,λy〉 =∑ni=1 xi(λyi)= λ〈x, y〉. Thus (λ1 − λ2)〈x, y〉 = 0. If λ1 �= λ2 then 〈x, y〉 = 0. �

Lemma 2.16 If there exist n distinct eigenvalues to a symmetric n× n matrix A,then the eigenvectors X = {x1, . . . , xn} form an orthogonal basis for �n.

Proof Directly by Lemmas 2.14 and 2.15.We may also give a brief direct proof of Lemma 2.16 by supposing that∑ni=1 αixi = 0. But then for each j = i, . . . , n,

0= 〈xj ,0〉 =n∑

i=1

αi〈xj , xi〉 = αj 〈xj , xj 〉.

But since xj �= 0, 〈xj , xj 〉> 0 and so αj = 0 for each j . Thus X is a frame. Sincethe vectors in X are mutually orthogonal, X is an orthogonal basis for �n.

For a symmetric matrix the roots of the characteristic equation will all be real.To see this in the 2× 2 case, consider the characteristic equation

(λ− λ1)(λ− λ2)= λ2 − λ(a11 + a22)= (a11a22 − a21a12).

The roots of this equation are −b+√

b2−4c2 with real roots iff b2 − 4c ≥ 0.

But this is equivalent to

(a11 + a22)2 − 4(a11a22 − a21a12)= (a11 − a22)

2 + 4(a12)2 ≥ 0,

since a12 = a21.Both terms in this expression are non-negative, and so λ1, λ2 are real.In the case of a symmetric matrix, A, let Eλ be the set of eigenvectors associ-

ated with a particular eigenvalue, λ, of A together with the zero vector. Supposex1, x2 belong to Eλ. Clearly A(x1+ x2)=A(x1)+A(x2)= λ(x1+ x2) and so x1+

AU

TH

OR

’S P

RO

OF



1381

1382

1383

1384

1385

1386

1387

1388

1389

1390

1391

1392

1393

1394

1395

1396

1397

1398

1399

1400

1401

1402

1403

1404

1405

1406

1407

1408

1409

1410

1411

1412

1413

1414

1415

1416

1417

1418

1419

1420

1421

1422

1423

1424

1425

1426

x2 ∈ Eλ. If x ∈ Eλ, then A(αx) = αA(x) = α(λx) = λ(αx) and αx ∈ Eλ for eachnon-zero real number, α.

Since we also now suppose that for each eigenvalue, λ, the eigenspace Eλ con-tains 0, then Eλ will be a vector subspace of �n. If λ= λ1 = · · · = λr are repeatedroots of the characteristic equation, then, in fact, the eigenspace, Eλ, will be of di-mension r , and we can find r mutually orthogonal vectors in Eλ, forming a basisfor Eλ.

Suppose now that A is a symmetric n×n matrix. As we shall show we may write∧ = P−1AP where P is the n× n basis change matrix whose columns are the n

linearly independent eigenvectors of A.Now normalise each eigenvector xj by defining zj = 1

‖xj ‖ (x1j , . . . , xnj ) where

‖xj‖ =√∑

(xkj )2 =√〈xj , xj 〉 is called the norm of xj .Let Q = (z1, . . . , zn) be the n× n matrix whose columns consist of z1, . . . , zn.

Now

QtQ=⎛

⎝z11 z21 zn1z1j znj

z1n znn

⎞

⎠

⎛

⎜⎜⎜⎝

z11 z1j z1n

z21 z2j z2n

......

...

zn1 znj znn

⎞

⎟⎟⎟⎠

=⎛

⎜⎝

〈z1, z1〉 . . . 〈z1, zn〉〈z2, z1〉

... . . . 〈zn, zn〉

⎞

⎟⎠

since the (i, k)th entry in QtQ is∑n

r=1 zrizrk = 〈zi, zk〉.But (zi, zk) = 〈 xi‖xi‖ ,

xk‖xk‖ 〉 = 1‖xi‖‖xk‖ 〈xi, xk〉 = 0 if i �= k. On the other hand

〈zi, zi〉 = 1‖xi‖2 〈xi, xi〉 = 1, and QtQ = In, the n × n identity matrix. Thus Qt =

Q−1.Since {z1, . . . , zn} are eigenvectors of A with real eigenvalues {λ1, . . . , λn} we

obtain

∧=⎛

⎝λ1 0

λr

0 0

⎞

⎠=QtAQ

where the last (n− r) columns of Q correspond to the kernel vectors of A.When A is a symmetric n× n matrix the function A∗ : �n ×�n →� given by

A∗(x, y)= 〈x,A(y)〉 is called a quadratic form, and in matrix notation is given by

(x1, . . . , xn)

⎛

⎝aij

⎞

⎠

⎛

⎝y1

yn

⎞

⎠

AU

TH

OR

’S P

RO

OF



1427

1428

1429

1430

1431

1432

1433

1434

1435

1436

1437

1438

1439

1440

1441

1442

1443

1444

1445

1446

1447

1448

1449

1450

1451

1452

1453

1454

1455

1456

1457

1458

1459

1460

1461

1462

1463

1464

1465

1466

1467

1468

1469

1470

1471

1472

Consider

A∗(x, x)= ⟨x,A(x)⟩

= ⟨x,Q∧At(x)⟩

= ⟨Qt(x),∧Qt(x)⟩.

Now Qt(x) = (x′1 . . . , x′n) is the coordinate representation of the vector x withrespect to the new basis {z1, . . . , zn} for �n. Thus

A∗(x, x)= (x′1, . . . , x′n)⎛

⎝λ1

λr

0

⎞

⎠(

x′1x′n

)=

r∑

i=1

λi

(x′i)2

.

Suppose that rankA= r and all eigenvalues of A are non-negative. In this case,A∗(x, x)=∑n

i=1 |λi |(xi)2 ≥ 0. Moreover if x is a non-zero vector then Qt(x) �= 0,

since Qt must have rank n.Define the nullity of A∗ to be {x :A∗(x, x)= 0}. Clearly if x is a non-zero vector

in Ker(A) then it is an eigenvector with eigenvalue 0. Thus the nullity of A∗ is avector subspace of �n of dimension at least n− r , where r = rank(A). If the nullityof A∗ is {0} then call A∗ non-degenerate. If all eigenvalues of A are strictly positive(so that A∗ is non-degenerate) then A∗(x, x) > 0 for all non-zero x ∈ �n. In thiscase A∗ is called positive definite. If all eigenvalues of A are non-negative but someare zero, then A∗ is called positive semi-definite, and in this case A∗(x, x) > 0,for all x in a subspace of dimension r in �n. Conversely if A∗ is non-degenerateand all eigenvalues are strictly negative, then A∗ is called negative definite. If theeigenvalues are non-positive, but some are zero, then A∗ is called negative semi-definite.

The index of the quadratic form A∗ is the maximal dimension of the subspaceon which A∗ is negative definite. Therefore index (A∗) is the number of strictlynegative eigenvalues of A.

When A has some eigenvalues which are strictly positive and some which arestrictly negative, then we call A∗ a saddle.

We have not as yet shown that a symmetric n× n matrix has n real roots to itscharacteristic equation. We can show however that any (symmetric) quadratic formcan be diagonalised.

Let A= (aij ) and 〈x,A(x)〉 =∑ni=1∑n

j=1 aij xixj . If aii = 0 for all i = 1, . . . , n

then it is possible to make a linear transformation of coordinates such that aij �= 0 forsome j . After relabelling coordinates we can take a11 �= 0. In this case the quadraticform can be written

⟨x,A(x)

⟩= a11x21 + 2a12x1x2, . . .

= a11

(x1 + a12

a11x2 · · ·

)2

+(

a22 − a212

a11

2)(x2 + · · ·)2 + · · ·

=n∑

i=1

αiy2i .

AU

TH

OR

’S P

RO

OF



1473

1474

1475

1476

1477

1478

1479

1480

1481

1482

1483

1484

1485

1486

1487

1488

1489

1490

1491

1492

1493

1494

1495

1496

1497

1498

1499

1500

1501

1502

1503

1504

1505

1506

1507

1508

1509

1510

1511

1512

1513

1514

1515

1516

1517

1518

Here each yi is a linear combination of {x1, . . . , xn}. Thus the transformation x→P(x)= y is non-singular and has inverse Q say.

Letting x =Q(y) we see the quadratic form becomes

⟨x,A(x)

⟩= ⟨Q(y),A ◦Q(y)⟩

= ⟨y,QtAQ(y)⟩

= ⟨y,D(y)⟩,

where D is a diagonal matrix with real diagonal entries (α1, . . . , αn). Note thatD =QtAQ and so rank(D) = rank(A) = r , say. Thus only r of the diagonal en-tries may be non zero. Since the symmetric matrix, A, can be diagonalised, not onlyare all its eigenvalues real, but its eigenvectors form a basis for �n. Consequently∧ = P−1AP where P is the n× n basis change matrix whose columns are theseeigenvectors. Moreover, if λ is an eigenvalue with multiplicity r (i.e., λ occurs asa root of the characteristic equation r times) then the eigenspace, Eλ, has dimen-sion r . �

2.3.4 Examples

Example 2.7 To give an illustration of this procedure consider a matrix

A=⎛

⎝0 0 10 1 01 0 0

⎞

⎠

representing the quadratic form x22 + 2x1x3. Let

P1(x)=⎛

⎝0 1 01 0 01 0 1

⎞

⎠

⎛

⎝x1x2x3

⎞

⎠=⎛

⎝z1z2z3

⎞

⎠

giving the quadratic form z21 − 2(z2 − 1

2z3)2 + 1

2z23 and

P2(z)=⎛

⎝1 0 00 1 − 1

20 0 1

⎞

⎠

⎛

⎝z1z2z3

⎞

⎠=⎛

⎝y1y2y3

⎞

⎠ .

Then 〈x,A(x)〉 = 〈y,D(y)〉, where

D =⎛

⎝1 0 00 −2 00 0 1

2

⎞

⎠=⎛

⎝α1 0 00 α2 00 0 α3

⎞

⎠ , and A= P t1P t

2DP2P1.

AU

TH

OR

’S P

RO

OF



1519

1520

1521

1522

1523

1524

1525

1526

1527

1528

1529

1530

1531

1532

1533

1534

1535

1536

1537

1538

1539

1540

1541

1542

1543

1544

1545

1546

1547

1548

1549

1550

1551

1552

1553

1554

1555

1556

1557

1558

1559

1560

1561

1562

1563

1564

Consequently the matrix A can be diagonalised. A has characteristic equation(1− λ)(λ2 − 1) with eigenvalues 1, 1, −1.

Then normalized eigenvectors of A are

1√2

⎛

⎝101

⎞

⎠ ,

⎛

⎝010

⎞

⎠ ,1√2

⎛

⎝10−1

⎞

⎠ ,

corresponding to the eigenvalues 1, 1, −1.Thus A∗ is a non-degenerate saddle of index 1. Let Q be the basis change matrix

1√2

⎛

⎝1 0 10 2 01 0 −1

⎞

⎠ .

Then

QtAQ= 1

2

⎛

⎝1 0 10 2 01 0 −1

⎞

⎠

⎛

⎝0 0 10 1 01 0 0

⎞

⎠

⎛

⎝1 0 10 2 01 0 −1

⎞

⎠=⎛

⎝1 0 00 1 00 0 −1

⎞

⎠ .

As a quadratic form

(x1, x2, x3)A

⎛

⎝x1x2x3

⎞

⎠= (x1, x2, x3)

⎛

⎝x3x2x1

⎞

⎠= x1x3 + x22 .

We can also write this as

(x1, x2, x3)1

2

⎛

⎝1 0 10√

2 01 0 −1

⎞

⎠

⎛

⎝1 0 00 1 00 0 −1

⎞

⎠

⎛

⎝1 0 10√

2 01 0 −1

⎞

⎠

⎛

⎝x1x2x3

⎞

⎠

= 1

2

(x1 + x3,

√2x2, x1 − x3

)⎛

⎝1 0 00 1 00 0 −1

⎞

⎠

⎛

⎝x1 + x3√

2x2x1 − x3

⎞

⎠

= 1

2(x1 + x3)

2 + 2x22 − (x1 − x3)

2.

Note that A is positive definite on the subspace {(x1, x2, x3) ∈ �3 : (x1 = x3)}spanned by the first two eigenvectors.

We can give a geometric interpretation of the behaviour of a matrix A with bothpositive and negative eigenvalues. For example

A=(

1 22 1

)

AU

TH

OR

’S P

RO

OF


2.4 Geometric Interpretation of a Linear Transformation 73

1565

1566

1567

1568

1569

1570

1571

1572

1573

1574

1575

1576

1577

1578

1579

1580

1581

1582

1583

1584

1585

1586

1587

1588

1589

1590

1591

1592

1593

1594

1595

1596

1597

1598

1599

1600

1601

1602

1603

1604

1605

1606

1607

1608

1609

1610

Fig. 2.2 ???

has eigenvectors

z1 =(

11

)z2 =

(1−1

)

corresponding to the eigenvalues 3, −1 respectively. Thus A maps the vector z1

to 3z1 and z2 to −z2. The second operation can be regarded as a reflection of thevector z2 in the line {(x, y) : x − y = 0}, associated with the first eigenvalue. Thefirst operation z1→ 3z1 is a translation of z1 to 3z1. Consider now any point x ∈ �2.We can write x = αz1 + βz2. Thus A(x)= 3αz1 − βz2. In other words A may bedecomposed into two operations: a translation in the direction z1, followed by areflection about z1.

2.4 Geometric Interpretation of a Linear Transformation

More generally suppose A has real roots to the characteristic equation and has eigen-vectors {x1, . . . , xs, z1, . . . , zt , k1, . . . , kp}.

The first s vectors correspond to positive eigenvalues, the next t vectors to nega-tive eigenvalues, and the final p vectors belong to the kernel, with zero eigenvalues.

Then A may be described in the following way:1. collapse the kernel vectors on to the image spanned by {x1, . . . , xs, z1, . . . , zt }.2. translate each xi to λixi .3. reflect each zj to −zj , and then translate to − | μj | zj (where μj is the nega-

tive eigenvalue associated with zj ).These operations completely describe a symmetric matrix or a matrix, A, which

is diagonalisable. When A is non-symmetric then it is possible for A to have com-plex roots.

For example consider the matrix

(cos θ − sin θ

sin θ cos θ

).

njschofi

Sticky Note

njschofi

Cross-Out

njschofi

Sticky Note

A Translation followed by a reflection.

njschofi

Sticky Note

, as in Figure2.2.

AU

TH

OR

’S P

RO

OF



1611

1612

1613

1614

1615

1616

1617

1618

1619

1620

1621

1622

1623

1624

1625

1626

1627

1628

1629

1630

1631

1632

1633

1634

1635

1636

1637

1638

1639

1640

1641

1642

1643

1644

1645

1646

1647

1648

1649

1650

1651

1652

1653

1654

1655

1656

Fig. 2.3 ???

As we have seen this corresponds to a rotation by θ in an anticlockwise directionin the plane �2. To determine the eigenvalues, the characteristic equation is (cos θ−λ)2 + sin2 θ = λ2 − 2λ cos θ + (cos2 θ + sin2 θ)= 0. But cos2 θ + sin2 θ = 1. Thus

λ= 2 cos θ ± 2√

cos2 θ−12 = cos θ ± i sin θ .

More generally a 2× 2 matrix with complex roots may be regarded as a trans-formation λeiθ where λ corresponds to a translation by λ and eiθ corresponds torotation by θ .

Example 2.8 Consider A= ( 2 −22 2

)with trace (A)= tr(A)= 4 and |A| = 8.

As we have seen the characteristic equation for A is (λ2 − (traceA))+ |A| = 0,

with roots trace(A)±√

(traceA)2−4|A|2 . Thus the roots are 2 ± 1

2

√16− 32 = 2± 2i =

2√

2[ 1√2+ i√

2] where cos θ = sin θ = 1√

2and so θ = 45◦. Thus

A :(

x

y

)→ 2

√2

(x cos 45 −y sin 45x sin 45 +y cos 45

).

Consequently A first sends (x, y) by a translation to (2√

2x,2√

2y) and thenrotates this vector through an angle 45◦.

More abstractly if A is an n× n matrix with two complex conjugate eigenvalues(cos θ + i sin θ), (cos θ − i sin θ), then there exists a two dimensional eigenspace Eθ

such that A(x)= λeiθ (x) for all x ∈Eθ , where λeiθ (x) means rotate x by θ withinEθ and then translate by λ.

In some cases a linear transformation, A, can be given a canonical form in termsof rotations, translations and reflections, together with a collapse onto the kernel.What this means is that there exists a number of distinct eigenspaces

{E1, . . . ,Ep,X1, . . . ,Xs,K}

where A maps1. Ej to Ej by rotating any vector in Ej through an angle θj ;2. Xj to Xj by translating a vector x in Xj to λjx, for some non-zero real number

λj ;

njschofi

Cross-Out

njschofi

Sticky Note

A translation followed by a rotation.

AU

TH

OR

’S P

RO

OF


2.4 Geometric Interpretation of a Linear Transformation 75

1657

1658

1659

1660

1661

1662

1663

1664

1665

1666

1667

1668

1669

1670

1671

1672

1673

1674

1675

1676

1677

1678

1679

1680

1681

1682

1683

1684

1685

1686

1687

1688

1689

1690

1691

1692

1693

1694

1695

1696

1697

1698

1699

1700

1701

1702

3. the kernel K to {0}.In the case that the dimensions of these eigenspaces sum to n, then the canonical

form of the matrix A is⎛

⎝eiθ 0 00 ∧ 00 0 0

⎞

⎠

where eiθ consists of p different 2× 2 matrices, and ∧ is a diagonal s × s matrix,while 0 is an (n− r)× (n− r) zero matrix, where r = rank(A)= 2p+ s.

However, even when all the roots of the characteristic equation are real, it neednot be possible to obtain a diagonal, canonical form of the matrix.

To illustrate, in Example 2.6 it is easy to show that the eigenvalue λ = 0 occurstwice as a root of the characteristic equation for the non-symmetric matrix A, eventhough the kernel is of dimension 1. The eigenvalue λ = 7 occurs once. Moreoverthe vector (3,−29,17) clearly must be an eigenvector for λ= 7, and thus span theimage of A2. However it is also clear that the vector (3,−29,17) does not span theimage of A. Thus the eigenspace E7 does not provide a basis for the image of A,and so the matrix A cannot be diagonalised.

However, as we have shown, for any symmetric matrix the dimensions of theeigenspaces sum to n, and the matrix can be expressed in canonical, diagonal, form.

In Chap. 4 below we consider smooth functions and show that “locally” such afunction can be analysed in terms of the canonical form of a particular symmetricmatrix, known as the Hessian.

AU

TH

OR

’S P

RO

OF


1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

3Topology and Convex Optimisation

3.1 A Topological Space

In the previous chapter we introduced the notion of the scalar product of two vectorsin �n. More generally if a scalar product is defined on some space, then this permitsthe definition of a norm, or length, associated with a vector, and this in turn allowsus to define the distance between two vectors. A distance function or metric maybe defined on a space, X, even when X admits no norm. For example let X be thesurface of the earth. Clearly it is possible to say what is the shortest distance, d(x, y),between two points, x, y, on the earth’s surface, although it is not meaningful to talkof the “length” of a point on the surface. More general than the notion of a metricis that of a topology. Essentially a topology on a space is a mathematical notionfor making more precise the idea of “nearness”. The notion of topology can thenbe used to precisely define the property of continuity of a function between twotopological spaces. Finally continuity of a preference gives proof of existence of asocial choice and of an economic equilibrium in a world that is bounded.

3.1.1 Scalar Product and Norms

In Sect. 2.3 we defined the Euclidean scalar product of two vectors

x =n∑

i=1

xiei, and y =n∑

i=1

yiei in �n,

where {e1, . . . , en} is the standard basis, to be

〈x, y〉 =n∑

i=1

xiyi .


77

http://dx.doi.org/10.1007/978-3-642-39818-6_3

AU

TH

OR

’S P

RO

OF


78 3 Topology and Convex Optimisation

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67

68

69

70

71

72

73

74

75

76

77

78

79

80

81

82

83

84

85

86

87

88

89

90

91

92

More generally suppose that {v1, . . . , vn} is a basis for �n, and (x1, . . . , xn),(y1, . . . , yn) are the coordinates of x, y with respect to this basis. Then

〈x, y〉 =n∑

i=1

n∑

j=1

xiyj 〈vi, vj 〉,

where 〈vi, vj 〉 is the scalar product of vi and vj . Thus

〈x, y〉 = (x1, . . . , xn)

⎛

⎜⎜⎝

...

aij

...

⎞

⎟⎟⎠

⎛

⎜⎝

y1...

yn

⎞

⎟⎠

where A = (aij ) = 〈vi, vj 〉i=1,...,n;j=1,...,n. If we let vi =∑nk=1 vikek , then clearly

〈vi, vi〉 = ∑nk=1(vik)

2 > 0. Moreover 〈vi, vj 〉 = 〈vj , vi〉. Thus the matrix A issymmetric. Since A must be of rank n, it can be diagonalized to give a matrix∧ =QtAQ, all of whose diagonal entries are positive. Here Q represents the or-thogonal basis change matrix and Qt(x1, . . . , xn) = (x′1, . . . , x′n) gives the coordi-nates of x with respect to the basis of eigenvectors of A. Hence

〈x, y〉 = ⟨x,A(y)⟩= ⟨x,Q∧Qt(y)

⟩

= ⟨Qt(x),∧Qt(y)⟩

=n∑

i=1

λix′iy′i .

Thus a scalar product is a non-degenerate positive definite quadratic form. Notethat the scalar product is bilinear since

〈x1 + x2, y〉 = 〈x1, y〉 + 〈x2, y〉 and 〈x, y1 + y2〉 = 〈x, y1〉 + 〈x, y2〉,

and symmetric since

〈x, y〉 = ⟨x,A(y)⟩= ⟨y,A(x)

⟩= 〈y, x〉.

We shall call the scalar product given by 〈x, y〉 =∑ni=1 xiyi the Euclidean scalar

product.

We define the Euclidean norm, ‖ ‖E , by ‖x‖E = √〈x, x〉 =√∑n

i=1 x2i . Note

that ‖x‖E ≥ 0 if and only if x = (0, . . . ,0). Moreover, if a ∈ �, then

‖ax‖E =√√√√

n∑

i=1

a2x2i = |a| ‖x‖E.

AU

TH

OR

’S P

RO

OF


3.1 A Topological Space 79

93

94

95

96

97

98

99

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

Fig. 3.1 ???

Lemma 3.1 If x, y ∈ �n, then ‖x + y‖E ≤ ‖x‖E + ‖y‖E .

Proof For convenience write ‖ x‖E as ‖x‖. Now

‖x + y‖2 = 〈x + y, x + y〉 = 〈x, x〉 + 〈x, y〉 + 〈y, x〉 + 〈y, y〉.

But the scalar product is symmetric. Therefore

‖x + y‖2 = ‖x‖2 + ‖y‖2 + 2〈x, y〉.

Furthermore (‖x‖+ ‖y‖)2 = ‖x‖2 +‖y‖2 + 2‖x‖‖y‖. Thus ‖x + y‖ ≤ ‖x‖+ ‖y‖iff 〈x, y〉 ≤ ‖x‖‖y‖. To show this note that

∑i<j (xiyj − xjyi)

2 ≥ 0. Thus∑

i<j (x2i y2

j + x2j y2

i ) ≥ 2∑

i<j xiyixj yj . Add∑n

i=1 x2i y2

i to both sides. This

gives (∑n

i=1 x2i )(∑n

i=1 y2i ) ≥ (

∑ni=1 xiyi)

2. Therefore ‖x‖2 ‖y‖2 ≥ 〈x, y〉2 and so(x,y)‖x‖‖y‖ ≤ 1, or ‖x + y‖ ≤ ‖x‖ + ‖y‖. �

In this lemma we have shown that

−1≤ 〈x, y〉‖x‖ ‖y‖ ≤ 1.

This ratio can be identified with cos θ , where θ is the angle between the two vectorsx, y. In the case of unit vectors, 〈x, y〉 can be identified with the perpendicularprojection of y onto x as in Fig. 3.1.

The property ‖x + y‖ ≤ ‖x‖ + ‖y‖ is known as the triangle inequality (seeFig. 3.2).

Definition 3.1 Let X be a vector space over the field �. A norm, ‖ ‖, on X is amapping ‖ ‖ :X→� which satisfies the following three properties:N1. ‖x‖ ≥ 0 for all x ∈X, and ‖x‖ = 0 iff x = 0.N2. ‖ax‖ = |a|‖x‖ for all x ∈X, and a ∈ �.N3. ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈X.

njschofi

Cross-Out

njschofi

Sticky Note

Projection

AU

TH

OR

’S P

RO

OF



139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

175

176

177

178

179

180

181

182

183

184

Fig. 3.2 ???

There are many different norms on a vector space. For example if A is a non-degenerate positive definite symmetric matrix, we could define ‖ ‖A by ‖x‖A =√〈x,Ax〉.

The Cartesian norm is ‖x‖c = ‖(x1, . . . , xn)‖c =max{|x1|, . . . , |xn|}.Clearly ‖x‖c ≥ 0 and ‖x‖c = 0 iff xi = 0 for all i = 1, . . . , n. Moreover ‖ax‖c =

max{|ax1|, . . . , |axn|} =max{|a| |x1|, . . . , |a| |xn|} = |a| |xi |, for some i.Thus ‖ax‖c = |a|max{|x1|, . . . , |xn|} = |a| ‖x‖c.Finally, ‖x + y‖c = |xi + yi | for some i ≤ |xi | + |yi | ≤ max{|x1|, . . . , |xn|} +

max{|y1|, . . . , |yn|} = ‖x‖c + ‖y‖c. Define the city block norm ‖x‖B to be ‖x‖B =∑ni=1 |xi |. Clearly ‖x+y‖B =∑n

i=1 |xi+yi | ≤∑ni=1(|xi |+ |yi |)= ‖x‖B +‖y‖B .

If ‖ ‖ is a norm on the vector space X, the distance function or metric d on X

induced by ‖ ‖ is the function d :X×X→� : d(x, y)= ‖x − y‖.Note that d(x, y) ≥ 0 for all x, y ∈ X and that d(x, y) = 0 iff x − y = 0, i.e.,

x = y. Moreover,

d(x, y)+ d(y, z)= ‖x − y‖ + ‖y − z‖ ≥ ∥∥(x − y)+ (y − z)∥∥

= ‖x − z‖ = d(x, z).

Hence d(x, z) ≤ d(x, y)+ d(y, z).

Definition 3.2 A metric on a set X is a function d :X×X→� such thatD1. d(x, y)≥ 0 for all x, y ∈X and d(x, y)= 0 iff x = y

D2. d(x, z)≤ d(x, y)+ d(y, z) for all x, y, z ∈X.Note that a metric d may be defined on a set X even when X is not a vector

space. In particular d may be defined without reference to a particular norm. At thebeginning of the chapter for example we mentioned that the surface of the earth,S2, admits a metric d , where the distance between two points x, y on the surface ismeasured along a great circle through x, y. A second metric on S2 is obtained bydefining d(x, y) to be the angle, θ , subtended at the centre of the earth by the tworadii to x, y (see Fig. 3.3).

Any set X which admits a metric, d , we shall call metrisable, or a metric space.To draw attention to the metric, d , we shall sometimes write (X,d) for a metricspace.

In a metric space (X,d) the open ball at x of radius r in X is

Bd(x, r)= {y ∈X : d(x, y) < r},

njschofi

Cross-Out

njschofi

Sticky Note

Triangle Inequality

AU

TH

OR

’S P

RO

OF



185

186

187

188

189

190

191

192

193

194

195

196

197

198

199

200

201

202

203

204

205

206

207

208

209

210

211

212

213

214

215

216

217

218

219

220

221

222

223

224

225

226

227

228

229

230

Fig. 3.3 ???

and the closed ball centre x of radius r is

ClosBd(x, r)= {y ∈X : d(x, y)≤ r}.

The sphere of radius r at x is

Sd(x, r)= {y ∈X : d(x, y)= r}.

In �n, the Euclidean sphere of radius r is therefore

S(x, r)={

y :n∑

i=1

(xi − yi)2 = r2

i

}

.

For convenience a sphere in �n is often written as Sn−1. Here the superfix isn− 1 because as we shall see the sphere in �n is (n− 1)-dimensional, even thoughit is not a vector space. If (X,d) is a metric space, say a set V in X is d-open iff forany x ∈ V there is some radius rx (dependent on x) such that

Bd(x, rx)⊂ V.

Lemma 3.2 Let Γd be the family of all sets in X which are d-open. Then Γd satis-fies the following properties:T1. If U,V ∈ Γd , then U ∩ V ∈ Γd .T2. If Uj ∈ Γd for all j belonging to an index set J (which is possibly infinite),

then⋃

j∈J Uj ∈ Γd .T3. Both X and the empty set, Φ , belong to Γd .

Proof Clearly X and Φ are d-open. If U and V ∈ Γd , but U ∩ V =Φ then U ∩ V

is d-open. Suppose on the other hand that x ∈U ∩V . Since both U and V are open,there exist r1, r2 such that Bd(x, r1)⊂ U and Bd(x1, r2)⊂ V . Let r =min(r1, r2).By definition

Bd(x, r)= Bd(x, r1)∩Bd(x, r2)⊂U ∩ V.

Thus there is an open ball, centre x, of radius r contained in U ∩ V .

njschofi

Cross-Out

njschofi

Sticky Note

Spherical Metric.

AU

TH

OR

’S P

RO

OF



231

232

233

234

235

236

237

238

239

240

241

242

243

244

245

246

247

248

249

250

251

252

253

254

255

256

257

258

259

260

261

262

263

264

265

266

267

268

269

270

271

272

273

274

275

276

Finally suppose x ∈⋃j∈J Uj = U . Since x belongs to at least one Uj , say U1,there is an open ball B = B(x, r1) contained in U1. Since U1 is open so is U . �

Note that by T1 the finite intersection of open sets is an open set, but infiniteintersection of open sets need not be open. To see this consider a set of the form

I = (a, b)= {x ∈ � : a < x < b}.

For any x ∈ I it is possible to find an ε such that a + ε < x < b− ε. Hence theopen ball B(x, ε)= {y : x − ε < y < x + ε} belongs to I , and so I is open.

Now consider the family {Ur : r = 1, . . . ,∞} of sets of the form Ur = (− 1r, 1

r).

Clearly the origin, 0, belongs to each Ur , and so 0 ∈ ∩ Ur = U . Suppose thatU is open. Since 0 ∈ U , there must be some open ball B(0, ε) belonging to U . Letr0 be an integer such that r0 > 1

ε, so 1

r0< ε. But then Ur0 = (− 1

r0, 1

r0) is strictly

contained in (−ε, ε).Therefore the ball B(0, ε)= {y ∈ � : |y|< ε} = (−ε, ε) is not contained in Ur0 ,

and so cannot be contained in U . Hence U is not open.

3.1.2 A Topology on a Set

We may define a topology on a set X to be any collection of sets in X which satisfiesthe three properties T1′, T2′, T3′.

Definition 3.3 A topology Γ on a set X is a collection of sets in X which satisfiesthe following properties:T1′ If U,V ∈ Γ then U ∩ V ∈ Γ .T2′ If J is any index set and Uj ∈ Γ for each j ∈ J , then

⋃J Uj ∈ Γ .

T3′ Both X and the empty set belong to Γ .

A set X which has a topology Γ is called a topological space, and written (X,Γ ).The sets in Γ are called Γ -open, or simply open. An open set in Γ which containsa point x is called a neighbourhood (or nbd.) of x.

A base B for a topology Γ is a collection of Γ -open sets such that any member U

of Γ can be written as a union of members of B. Equivalently if x belongs to a Γ -open set U , then there is a member V of the base such that x ∈ V ⊂U .

By Lemma 3.2, the collection Γd of sets which are open with respect to the metricd comprise a topology called the metric topology Γd on X. We also said that a set U

belonged to Γd iff each point x ∈U had a neighbourhood Bd(x, r) which belongedto U . Thus the family of sets

B = {Bd(x, r) : x ∈X,r > 0}

forms a base for the metric topology Γd on X.

AU

TH

OR

’S P

RO

OF



277

278

279

280

281

282

283

284

285

286

287

288

289

290

291

292

293

294

295

296

297

298

299

300

301

302

303

304

305

306

307

308

309

310

311

312

313

314

315

316

317

318

319

320

321

322

Fig. 3.4 ???

Consider again the metric topology on �. As we have shown any set of the form(a, b) is open. Indeed a set of the form (−∞, a) or (b,∞) is also open.

In general if U is an open set (in the topology Γ for the topological space X),then the complement UX =X\U of U in X is called closed.

Thus in �, the set [a, b] = {x ∈ � : a ≤ x ≤ b} is closed since it is the com-plement of the open set (−∞, a) ∪ (b,∞). Note that the sets (−∞, a] and [b,∞)

are also closed since they are complements of the open sets (a,∞) and (−∞, b)

respectively.If A is any set in a topological space, (X,Γ ), then define the open set, Int(A),

called the interior of A, by x ∈ Int(A) iff x is in A and there exists an open set G

containing x such that G⊂A.Conversely, define the closed set, Clos(A), or closure of A, by x ∈ Clos(A) iff x

is in X and for any open set G containing x, G∩A is non empty.Clearly Int(A) ⊂ A ⊂ Clos(A). (See Exercise ?? at the end of the book.) The

boundary of A, written δA, is Clos(A)∩Clos(AX), where AX =X\A.For example, if A is an open set, the complement (AX) of A in X is a closed

set containing δA (i.e., Clos(AX)= AX). The closure, Clos(A), on the other handis the closed set which intersects (AX) precisely in δA. Clearly if x belongs to theboundary of A, then any neighbourhood of x intersects both A and its complement.

A point x is an accumulation or limit point of a set A if any open set U containingx also contains points of A different from x. If A is closed then it contains its limitpoints. If A is a subset of X and Clos(A) = X then call A dense in X. Note thatthis means that any point x ∈X either belongs to A or, if it belongs to X\A, has theproperty that for any neighborhood U of x, U ∩ A �= Φ . Thus if x ∈ X\A it is anaccumulation point of A. If A is dense in X a point outside A may be ‘approximatedarbitrarily closely’ by points inside A.

For example the set of non-integer real numbers Z is dense (as well as open) in�.The set of rational numbers Q = {p

q: p,q ∈ Z} is also dense in �, but not open,

since any neighbourhood of a rational number must contain irrational numbers. Foreach rational q ∈Q, note that �\{q} is open, and dense. Thus

�\Q=⋂

q∈Q

{�\{q}} :

the set of irrationals is the “countable” intersection of open dense sets. Moreover�\Q is itself dense. Such a set is called a residual set, and is in a certain sense“more dense” than a dense set.

njschofi

Cross-Out

njschofi

Sticky Note

Accumulation point.

njschofi

Cross-Out

njschofi

Sticky Note

3.1.

njschofi

Sticky Note

See Figure 3.4.

AU

TH

OR

’S P

RO

OF



323

324

325

326

327

328

329

330

331

332

333

334

335

336

337

338

339

340

341

342

343

344

345

346

347

348

349

350

351

352

353

354

355

356

357

358

359

360

361

362

363

364

365

366

367

368

Fig. 3.5 ???

We now consider different topologies on a set X.

Definition 3.41. If (X,Γ ) is a topological space, and Y is a subset of X, the relative topology

ΓY on Y is the topology whose open sets are {U ∩ Y :U ∈ Γ }.2. If (X,Γ ) and (Y,S) are topological spaces the product topology Γ ×S on the

set X× Y is the topology whose base consists of all sets of the form {U × V :U ∈ Γ,V ∈ S}.

We already introduced the relative topology in Sect. 1.1.2, and showed that thisformed a topology. To show that Γ × S is a topology for X × Y we need to showthat the union and intersection properties are satisfied. Suppose that

Wi =Ui × Vi

for i = 1,2, where Ui ∈ Γ,Vi ∈ S . Now W1 ∩ W2 = (U1 × V1) ∩ (U2 × V2) =(U1∩U2) × (V1∩V2). But U1∩U2 ∈ Γ,V1∩V2 ∈ S , since S and Γ are topologies.Thus W1 ∩W2 ∈ Γ × S . Suppose now that x ∈W1 ∪W2. Then x belongs either toU1 × V1 or U2 × V2 (or both). In either case x belongs to a member of the base forΓ × S .

Another way of expressing the product topology is that W is open in the producttopology iff for any x ∈W there exist open sets, U ∈ Γ and V ∈ S , such that x ∈U × V and U × V ⊂W .

For example consider the metric topology Γ induced by the norm ‖ ‖ on �.This gives the product topology Γ n on �n, where U is open in Γ n iff for eachx = (x1, . . . , xn) ∈ U there exists an open interval B(xi, ri) of radius ri about theith coordinate xi such that

B(x1, r1)× · · · ×B(xn, rn)⊂U.

Consider now the Cartesian norm on �n, where

‖x‖C =max{|x1|, . . . , |xn|

}.

njschofi

Cross-Out

njschofi

Sticky Note

Open Sets in the Product Topology.

AU

TH

OR

’S P

RO

OF



369

370

371

372

373

374

375

376

377

378

379

380

381

382

383

384

385

386

387

388

389

390

391

392

393

394

395

396

397

398

399

400

401

402

403

404

405

406

407

408

409

410

411

412

413

414

Fig. 3.6 The producttopology in �2

Fig. 3.7 A Cartesian openball of radius r in �2

This induces a Cartesian metric

dC(x, y)=max{|x1 − y1|, . . . , |xn − yn|

}.

A Cartesian open ball of radius r about x is then the set

BC(x, r)= {y ∈ �n : |yi − xi |< r ∀i = 1, . . . , n}.

A set U is open in the Cartesian topology ΓC for �n iff for every x ∈ U thereexists some r > 0 such that the ball BC(x, r)⊂U .

Suppose now that U is an open set in the product topology Γ n for �n. At anypoint x ∈U , there exist r1 . . . rn all > 0 such that

B = B(x1, r1)×, . . . ,B(xn, rn)⊂U.

Now let r =min(r1, . . . , rn). Then clearly the Cartesian ball BC(x, r) belongs tothe product ball B , and hence to U . Thus U is open in the Cartesian topology.

On the other hand if U is open in the Cartesian topology, for each point x

in U , the Cartesian ball BC(x, r) belongs to U . But this means the product ballB(x1, r)×, . . . ,B(xn, r) also belongs to U . Hence U is open in the product topol-ogy.

njschofi

Sticky Note

See Figure 3.7.

AU

TH

OR

’S P

RO

OF



415

416

417

418

419

420

421

422

423

424

425

426

427

428

429

430

431

432

433

434

435

436

437

438

439

440

441

442

443

444

445

446

447

448

449

450

451

452

453

454

455

456

457

458

459

460

Fig. 3.8 ???

We have therefore shown that a set U is open in the product topology Γ n on�n iff it is open in the Cartesian topology ΓC on �n. Thus the two topologies areidentical.

We have also introduced the Euclidean and city block norms on �n. These normsinduce metrics and thus the Euclidean topology ΓE and city block topology ΓB

on �n. As before a set U is open in ΓE (resp. ΓB ) iff for any point x ∈ U , there isan open neighborhood

BE(x, r)={

y ∈ �n :n∑

i=1

(yi − xi)2 < r2

}

,

resp. BB(x, r)= {y ∈ �n :∑ |yi − xi |< r} of x which belongs to U .(The reason we use the term “city block” should be obvious from the nature of a

ball under this metric, so displayed in Fig. 3.8.)In fact these three topologies ΓC , ΓE and ΓB on �n are all identical. We shall

show this in the case n= 2.

Lemma 3.3 The Cartesian, Euclidean and city block topologies on �2 are identi-cal.

Proof Suppose that U is an open set in the Euclidean topology ΓE for �2. Thus atx ∈U , there is an r > 0, such that the set

BE(x, r)={

y ∈ �2 :2∑

i=1

(yi − xi)2 < r2

}

⊂U.

From Fig. 3.9, it is obvious that the city block ball BB(x, r) also belongs toBE(x, r) and thus U . Thus U is open in ΓB .

On the other hand the Cartesian ball BC(x, r2 ) belongs to BB(x, r) and thus to U .

Hence U is open in ΓC .Finally the Euclidean ball BE(x, r

2 ) belongs to BC(x, r2 ). Hence if U is open

in ΓC it is open in ΓE .

njschofi

Cross-Out

njschofi

Sticky Note

Balls under different metrics

AU

TH

OR

’S P

RO

OF



461

462

463

464

465

466

467

468

469

470

471

472

473

474

475

476

477

478

479

480

481

482

483

484

485

486

487

488

489

490

491

492

493

494

495

496

497

498

499

500

501

502

503

504

505

506

Fig. 3.9 ???

Thus U open in ΓE ⇒U open in ΓB ⇒U open in ΓC ⇒U open in ΓE .Consequently all three topologies are identical in �2. �

Suppose that Γ1 and Γ2 are two topologies on a space X. If any open set U inΓ1 is also an open set in Γ2 then say that Γ2 is as fine as Γ1 and write Γ1 ⊂ Γ2. IfΓ1 ⊂ Γ2 and Γ2 ⊂ Γ1 then Γ1 and Γ2 are identical, and we write Γ1 = Γ2. If Γ1 ⊂ Γ2

but Γ2 contains an open set that is not open in Γ1 then say Γ2 is finer than Γ1. Wealso say Γ1 is coarser than Γ2.

If d1 and d2 are two metrics on a space X, then under some conditions thetopologies Γ1 and Γ2 induced by d1 and d2 are identical. Say the metrics d1

and d2 are equivalent iff for each ε > 0 there exist η1 > 0 and η2 > 0 such thatd1(x, y) < η1 ⇒ d2(x, y) < ε, and d2(x, y) < η2 ⇒ d1(x, y) < ε. Another wayof expressing this is that B1(x, η1) ⊂ B2(x, ε), and B2(x, η2) ⊂ B2(x, ε), whereBi(x, r)= {y : di(x, y) < r} for i = 1 or 2.

Just as in Lemma 3.3, the Cartesian, Euclidean and city block metrics on �n areequivalent. We can use this to show that the induced topologies are identical.

We now show that equivalent metrics induce identical topologies.If f :X→� is a function, and V is a set in X, define sup(f,V ), the supremum

(from the Latin, supremus) of f on V to be the smallest number M ∈ � such thatf (x)≤M for all x ∈ V .

Similarly define inf(f,V ), the infimum (again from the Latin, infimus) of f on V

to be the largest number m ∈ � such that f (x)≥m for all x ∈ V .Let d : X × X→� be a metric on X. Consider a point, x, in X, and a subset

of X. Then define the distance from x to V to be d(x,V )= inf(d(x,−),V ), whered(x,−) : V →� is the function d(x,−)(y)= d(x, y).

Suppose now that U is an open set in the topology Γ1 induced by the metric d1.For any point x ∈ U there exists r > 0 such that B1(x, r) ⊂ U , where B1(x, r) ={y ∈X : d1(x, y) < r}. Since we assume the metrics d1 and d2 are equivalent, theremust exist s > 0, say, such that

B2(x, s)⊂ B1(x, r)

njschofi

Cross-Out

njschofi

Sticky Note

Balls under different metrics.

AU

TH

OR

’S P

RO

OF



507

508

509

510

511

512

513

514

515

516

517

518

519

520

521

522

523

524

525

526

527

528

529

530

531

532

533

534

535

536

537

538

539

540

541

542

543

544

545

546

547

548

549

550

551

552

Fig. 3.10 ???

where B2(x, s)= {y ∈X : d2(x, y) < s}. Indeed one may choose s = d2(x,B1(x, r))

where B1(x, r) is the complement of B1(x, r) in X (see Fig. 3.10). Clearly theset U must be open in Γ2 and so Γ2 is as fine as Γ1. In the same way, how-ever, there exists t > 0 such that B1(x, t)⊂ B2(x, s), where again one may chooset = d1(x,B2(x, s)). Hence if U is open in Γ2 it is open in Γ1. Thus Γ1 is as fineas Γ2. As a consequence Γ1 and Γ2 are identical.

Thus we obtain the following lemma.

Lemma 3.4 The product topology, Γ n, Euclidean topology ΓE , Cartesian topol-ogy ΓC and city block topology ΓB are all identical on �n.

As a consequence we may use, as convenient, any one of these three metrics, orany other equivalent metric, on �n knowing that topological results are unchanged.

3.2 Continuity

Suppose that (X,Γ ) and (Y,S) are two topological spaces, and f : X → Y is afunction between X and Y . Say that f is continuous (with respect to the topologiesΓ and S) iff for any set U in S (i.e., U is S-open) the set f−1(U)= {x ∈X : f (x) ∈U} is Γ -open.

This definition can be given an alternative form in the case when X and Y aremetric spaces, with metrics d1, d2, say.

Consider a point x0 in the domain of f . For any ε > 0 the ball

B2(f (x0), ε

)= {y ∈ Y : d2(f (x0), y

)< ε}

is open. For continuity, we require that the inverse image of this ball is open. Thatis to say there exists some δ, such that the ball

B1(x0, δ)={x ∈X : d1(x0, x) < δ

}

belongs to f−1(B2(f (x0), ε)). Thus x ∈ B1(x0, δ)⇒ f (x) ∈ B2(f (x0), ε).Therefore say f is continuous at x0 ∈X iff for any ε > 0,∃ δ > 0 such that

f(B1(x0, δ)

)⊂ B2(f (x0), ε

).

njschofi

Cross-Out

njschofi

Sticky Note

Balls centered at x

AU

TH

OR

’S P

RO

OF


3.2 Continuity 89

553

554

555

556

557

558

559

560

561

562

563

564

565

566

567

568

569

570

571

572

573

574

575

576

577

578

579

580

581

582

583

584

585

586

587

588

589

590

591

592

593

594

595

596

597

598

Fig. 3.11 ???

In the case that X and Y have norms ‖ ‖X and ‖ ‖Y , then we may say that f iscontinuous at x0 iff for any ε > 0,∃ δ > 0 such that

‖x − x0‖x < δ⇒ ∥∥f (x)− f (x0)∥∥

y< ε.

Then f is continuous on X iff f is continuous at every point x in its domain.If X,Y are vector spaces then we may check the continuity of a function f :

X→ Y by using the metric or norm form of the definition.For example suppose f : �→� has the graph given in Fig. 3.11. Clearly f is not

continuous. To see this formally let f (x0)= y0 and choose ε such that |y1−y0|> ε.If x ∈ (x0 − δ, x0) then f (x) ∈ (y0 − ε, y0).

However for any δ > 0 it is the case that x ∈ (x0, x0 + δ) implies f (x) > y0 + ε.Thus there exists no δ > 0 such that

x ∈ (x0 − δ, x0 + δ) ⇒ f (x) ∈ (y0 − ε, y0 + ε).

Hence f is not continuous at x0.We can give an alternative definition of continuity. A sequence of points in a

space X is a collection of points {xk : k ∈ Z}, indexed by the positive integers Z .The sequence is written (xk). The sequence (xk) in the metric space, X, has a limit,x, iff ∀ε > 0 ∃k0 ∈Z such that k > k0 implies ‖xk − x‖X < ε.

In this case write xk → x as k→∞, or Limk→∞xk = x.Note that x is then an accumulation point of the sequence {x1, . . .}.More generally (xn)→ x iff for any open set G containing x, all but a finite

number of points in the sequence (xk) belong to G.Thus say f is continuous at x0 iff

xk → x0 implies f (xk)→ f (x0).

Example 3.1 Consider the function f : �→� given by

f : x→{

x sin 1x

if x �= 0

0 if x = 0.

njschofi

Cross-Out

njschofi

Sticky Note

A discontinuity

AU

TH

OR

’S P

RO

OF



599

600

601

602

603

604

605

606

607

608

609

610

611

612

613

614

615

616

617

618

619

620

621

622

623

624

625

626

627

628

629

630

631

632

633

634

635

636

637

638

639

640

641

642

643

644

Now x sin 1x= siny

ywhere y = 1

x. Consider a sequence (xk) where

Limk→∞ xk = 0. We shall write this limit as x → 0. Limx→0 x sin 1x= siny

y= 0

since | siny| is bounded above by 1, and Limy→∞ 1y= 0. Thus Limx→0 f (x)= 0.

But f (0)= 0, and so xk → 0 implies f (xk)→ 0. Hence f is continuous at 0. Onthe other hand suppose g(x) = sin 1

x. Clearly g(x) has no limit as x → 0. To see

this observe that for any sequence (xk)→ 0 it is impossible to find a neighborhoodG of some point y ∈ [−1,1] such that g(xk) ∈G whenever k > ko.

Any linear function between finite-dimensional vector spaces is continuous. Thusthe set of continuous functions contains the set of linear functions, when the domainis finite-dimensional. To see this suppose that f : V →W is a linear transformationbetween normed vector spaces. (Note that V and W may be infinite-dimensional.)Let ‖ ‖v and ‖ ‖w be the norms on V,W respectively.

Say that f is bounded iff ∃ B > 0 such that ‖f (x)‖w ≤ B‖x‖v for all x ∈ V .Suppose now that

∥∥f (x)− f (x0)∥∥

w< ε.

Now∥∥f (x)− f (x0)

∥∥w= ∥∥f (x − x0)

∥∥w≤ B‖x − x0‖v,

since f is linear and bounded. Choose δ = εB

. Then

‖x − x0‖v < δ⇒ ∥∥f (x)− f (x0)∥∥

w≤ B‖x − x0‖v.

Thus if f is linear and bounded it is continuous.

Lemma 3.5 Any linear transformation f : V →W is bounded and thus continuousif V is finite-dimensional (of dimension n).

Proof Use the Cartesian norm ‖ ‖c on V , and let ‖ ‖w be the norm on W . Lete1 . . . en be a basis for V ,

‖x‖c = sup{|xi | : i = 1, . . . , n

}, and

e= supn

{∥∥f (ei)∥∥

w: i = 1, . . . , n

}.

Now f (x) =∑i=1 xif (ei). Thus ‖f (x)‖w ≤∑ni ‖f (xiei)‖w , by the triangle

inequality, and n≤∑ni=1 |xi |‖f (ei)‖w , since ‖ay‖w = |a| ‖y‖w ≤ n e‖x‖c. Thus f

is bounded, and hence continuous with respect to the norms ‖ ‖c, ‖ ‖w . But for anyother norm ‖ ‖v it can be shown that there exists B ′ > 0 such that ‖x‖c ≤ B ′‖x‖v .Thus f is bounded, and hence continuous, with respect to the norms ‖ ‖v , ‖ ‖w . �

Consider now the set L(�n,�m) of linear functions from �n to �m. Clearly iff,g ∈ L(�n,�m) then the sum f + g defined by (f + g)(x)= f (x)+ g(x) is alsolinear, and for any α ∈ �, αf , defined by (αf )(x)= α(f (x)) is linear.

AU

TH

OR

’S P

RO

OF


3.2 Continuity 91

645

646

647

648

649

650

651

652

653

654

655

656

657

658

659

660

661

662

663

664

665

666

667

668

669

670

671

672

673

674

675

676

677

678

679

680

681

682

683

684

685

686

687

688

689

690

Hence L(�n,�m) is a vector space over �. Since �n is finite dimensional,by Lemma 3.5 any member of L(�n,�m) is bounded. Therefore for any f ∈L(�n,�m) we may define

‖f ‖ = supx∈�n

{‖f (x)‖‖x‖ : ‖x‖ �= 0

}.

Since f is bounded this is defined. Moreover ‖ ‖ = 0 only if f is the zerofunction. By definition ‖f ‖ is the real number such that ‖f ‖ ≤ B for all B suchthat ‖f (x)‖ ≤ B‖x‖. In particular ‖f (x)‖ ≤ ‖f ‖ ‖x‖. If f,g ∈ L(�n,�m), then‖(f + g)(x)‖ = ‖f (x) + g(x)‖ ≤ ‖f (x)‖ + ‖g(x)‖ ≤ ‖f ‖ ‖x‖ + ‖g‖ ‖x‖ =(‖f ‖ + ‖g‖)‖x‖. Thus ‖f + g‖ ≤ ‖f ‖ + ‖g‖.

Hence ‖ ‖ on L(�n,�m) satisfies the triangle inequality, and so L(�n,�m) is anormed vector space. This in turn implies that L(�n,�m) has a metric and thus atopology. It is often useful to use the metrics

d1(f, g)= sup{∥∥f (x)− g(x)

∥∥ : x ∈ �n}, or

d2(f, g)= sup{∣∣fi(x)− gi(x)

∣∣ : i = 1, . . . ,m,x ∈ �n}

where f = (f1, . . . , fm), g = (g1, . . . , gm). We write L1(�n,�m) and L2(�n,�m)

for the set L(�n,�m) with the topologies induced by d1 and d2 respectively. Clearlythese two topologies are identical.

Alternatively, choose bases for�n and�m and consider the matrix representationfunction

M : (L(�n,�m),+)→ (M(n,m),+).

On the right hand side we add matrices element by element under the rule (aij )+(bij )= (aij + bij ). With this operation M(n,m) is also a vector space. Clearly wemay choose a basis for M(n,m) to be {Eij : i = 1, . . . , n; j = 1, . . . ,m} where Eij

is the elementary matrix with 1 in the ith column and j th row.Thus M(n,m) is a vector space of dimension nm. Since M is a bijection,

L(�n,�m) is also of dimension nm.A norm on M(n,m) is given by

‖A‖ = sup{|aij | : i = 1, . . . , n; j = 1, . . . ,m

},

where A= (aij ).This in turn defines a metric and thus a topology on M(n,m). Finally this de-

fines a topology on L(�n,�m) as follows. For any open set U in M(n,m), letV =M−1(U) and call V open. The base for the topology on L(�n,�m) then con-sists of all sets of this form. One can show that the topology induced on L(�n,�m)

in this way is independent of the choice of bases. We call this the induced topologyon L(�n,�m) and write L(�n,�m) for this topological space. If we consider the

AU

TH

OR

’S P

RO

OF



691

692

693

694

695

696

697

698

699

700

701

702

703

704

705

706

707

708

709

710

711

712

713

714

715

716

717

718

719

720

721

722

723

724

725

726

727

728

729

730

731

732

733

734

735

736

norm topology on M(n,m) and the induced topology L(�n,�m) then the represen-tation map is continuous. Moreover the two topologies induced by the metrics d1and d2 on L(�n,�m) are identical to the induced topology L(�n,�m). Thus M isalso continuous when these metric topologies are used for its domain. (Exercise ??at the end of the book is devoted to this observation.)

If V is an infinite dimensional vector space and f ∈ L(V,W), then f need notbe continuous or bounded. However the subset of L(V,W) consisting of bounded,and thus continuous, maps in L(V,W) admits a norm and thus a topology. So wemay write this subset as L(V ,W).

Now let Co(�n,�m) be the set of continuous functions from �n to �m. We nowshow that Co(�n,�m) is a vector space.

Lemma 3.6 Co(�n,�m) is a vector space over �.

Proof Suppose that f,g are both continuous maps. At any x0 ∈ �n, and any ε > 0,∃ δ1, δ2 > 0 such that

‖x − x0‖< δ1 ⇒∥∥f (x)− f (x0)

∥∥<

(1

2

)ε

‖x − x0‖< δ2 ⇒∥∥g(x)− g(x0)

∥∥<

(1

2

)ε.

Let δ = min (δ1, δ2). Then ‖x−x0‖< δ⇒‖(f +g)(x)− (f +g)(x0)‖ = ‖f (x)−f (x0) + g(x) − g(x0)‖ ≤ ‖f (x) − f (x0)‖ + ‖g(x) − g(x0)‖ < ( 1

2 )ε + ( 12 )ε = ε.

Thus f + g ∈ Co(�n,�m).Also for any α ∈ �, any ε > 0 ∃ δ > 0 such that

‖x − x0‖< δ⇒ ∥∥f (x)− f (x0)∥∥<

ε

|α| .

Therefore ‖(αf )(x) − (αf )(x0)‖ = |α| ‖f (x) − f (x0)‖ < ε. Thus αf ∈ Co(�n,

�m). Hence Co(�n,�m) is a vector space. �

Since [L(�n,�m)] is closed under addition and scalar multiplication, it is a vec-tor subspace of dimension nm of the vector space Co(�n,�m). Note however thatCo(�n,�m) is an infinite-dimensional vector space (a function space).

Lemma 3.7 If (X,Γ ), (Y,S) and (Z,R) are topological spaces and Co((X,Γ ),

(Y,S)),Co((Y,S), (Z,R)) and Co((X,Γ ), (Z,R)) are the sets of functions whichare continuous with respect to these topologies, then the composition operator, ◦,maps Co((X,Γ ), (Y,S))×Co((Y,S), (Z,R)) to Co((X,Γ ), (Z,R)).

Proof Suppose f : (X,Γ )→ (Y,S) and g : (Y,S)→ (Z,R) are continuous. Weseek to show that g ◦ f : (X,Γ )→ (Z,R) is continuous. Choose any open set U

in Z. By continuity of g,g−1(U) is an S-open set in Y . But by the continuity of

AU

TH

OR

’S P

RO

OF


3.3 Compactness 93

737

738

739

740

741

742

743

744

745

746

747

748

749

750

751

752

753

754

755

756

757

758

759

760

761

762

763

764

765

766

767

768

769

770

771

772

773

774

775

776

777

778

779

780

781

782

f,f−1(g−1(U)) is a Γ -open set in X. However f−1g−1(U)= (g ◦ f )−1(U). Thusg ◦ f is continuous. �

Note therefore that if f ∈ L(�n,�m) and g ∈L(�m,�k) then g◦f ∈ L(�n,�k)

will also be continuous.

3.3 Compactness

Let (X,Γ ) be a topological space. An open cover for X is a family {Uj : j ∈ J } ofΓ -open sets such that X =⋃j∈J Uj .

If U = {Uj : j ∈ J } is an open cover for X,a subcover of U is an open cover U ′of X where U ′ = {Uj : j ∈ J ′} and the index set J ′ is a subset of J . The subcoveris called finite if J ′ is a finite set (i.e., |J ′|, the number of elements of J ′, is finite).

Definition 3.5 A topological space (X,Γ ) is called compact iff any open coverof X has a finite subcover. If Y is a subset of the topological space (X,Γ ) then Y

is compact iff the topological space (Y,ΓY ) is compact. Here ΓY is the topologyinduced on Y by Γ . (See Sect. 1.1.3.)

Say that a family CJ = {Cj : j ⊂ J } of closed sets in a topological space (X,Γ )

has the finite intersection property (FIP) iff whenever J ′ is a finite subset of J then⋂j∈J ′ Cj is non-empty.

Lemma 3.8 A topological space (X,Γ ) is compact iff whenever CJ is a family ofclosed sets with the finite intersection property then

⋂j∈J Cj is non-empty.

Proof We establish first of all that UJ = {Uj : j ∈ J } is an open cover of X iff thefamily CJ = {Cj =X\Uj : j ∈ J } of closed sets has empty intersection. Now

⋃

J

Uj =⋃

J

(X\Cj )=⋃

J

(X ∩Cj )

=X ∩(⋃

j

Cj

)=X ∩

(⋂

J

Cj

).

Thus

⋃

J

Uj =X iff⋂

J

Cj =Φ.

To establish necessity, suppose that X is compact and that the family CJ hasthe FIP. If CJ in fact has empty intersection then UJ = {X\Cj : j ∈ J } must bean open cover. Since X is compact there exists a finite set J ′ ⊂ J such that UJ ′ is

AU

TH

OR

’S P

RO

OF



783

784

785

786

787

788

789

790

791

792

793

794

795

796

797

798

799

800

801

802

803

804

805

806

807

808

809

810

811

812

813

814

815

816

817

818

819

820

821

822

823

824

825

826

827

828

a cover. But then CJ ′ has empty intersection contradicting FIP. Thus CJ has non-empty intersection. To establish sufficiency, suppose that any family CJ , satisfyingFIP, has non-empty intersection. Let UJ = {Uj : j ∈ J } be an open cover for X.If UJ has no finite subcover, then for any finite J ′, the family CJ ′ = {X\Uj : j ∈J ′} must have non-empty intersection. By the FIP, the family CJ must have non-empty intersection, contradicting the assumption that UJ is a cover. Thus (X,Γ ) iscompact. �

This lemma allows us to establish conditions under which a preference relation P

on X has a non-empty choice CP (X). (See Lemma 1.8 for the case with X finite.)Say the preference relation P on the topological space X is lower demi-continuous(LDC) iff the inverse correspondence φ−1

P : X→ X : x → {y ∈ X : xPy} is openfor all x in X.

Lemma 3.9 If X is a compact topological space and P is an acyclic and lowerdemi-continuous preference on X, then there exists a choice x of P in X.

Proof Suppose on the contrary that there is no choice. Thus for all x ∈ X thereexists some y ∈X such that yPx. Thus x ∈ φ−1

P (y). Hence U = {φ−1P (y) : y ∈X}

is a cover for X. Moreover for each y ∈ X, φ−1P (y) is open. Since X is compact,

there exists a finite subcover of U . That is to say there exists a finite set A in X

such that U ′ = {φ−1P (y) : y ∈ A} is a cover for X. In particular this implies that

for each x ∈ A, there is some y ∈ A such that x ∈ P−1P (y), or that yPx. But then

CP (A)= {x ∈A : φP (x)=Φ} =Φ . Now P is acyclic on X, and thus acyclic on A.Hence, by the acyclicity of P and Lemma 1.8, CP (A) �= Φ . By the contradictionU = {φ−1

P (y) : y ∈ X} cannot be a cover. That is to say there is some x ∈ X suchthat x ∈ φ−1

P (y) for no y ∈X. But then yPx for no y ∈X, and x ∈ CP (X), or x isthe choice on X. �

This lemma can be used to show that a continuous function f : X→� from acompact topological space X into the reals attains its bounds. Remember that wedefined the supremum and infimum of f on a set Y to be1. sup(f,Y ) = M such that f (x) ≤ M for all x ∈ Y and if f (x) ≤ M ′ for all

x ∈ Y then M ′ ≥M

2. inf(f,Y )=m such that f (x)≥m for all x ∈ Y and if f (x)≥m′ for all x ∈ Y

then m≥m′.Say f attains its upper bound on Y iff there is some xs in Y such that f (xs) =sup(f,Y ). Similarly say f attains its lower bound on Y iff there is some xi in Y

such that f (xi)= inf(f,Y ).Given the function f : X → �, define a preference P on X × X by xPy iff

f (x) > f (y). Clearly P is acyclic, since > on� is acyclic. Moreover for any x ∈X,

φ−1P (x)= {y : f (y) < f (x)

}

is open, when f is continuous.

AU

TH

OR

’S P

RO

OF


3.3 Compactness 95

829

830

831

832

833

834

835

836

837

838

839

840

841

842

843

844

845

846

847

848

849

850

851

852

853

854

855

856

857

858

859

860

861

862

863

864

865

866

867

868

869

870

871

872

873

874

To see this let U = (−∞, f (x)). Clearly U is an open set in �. Moreover f (y)

belongs to the open interval (−∞, f (x)) iff y ∈ φ−1P (x). But f is continuous, and

so f−1(U) is open in X. Since y ∈ f−1(U) iff y ∈ φ−1P (x),φ−1

P (x) is open for anyx ∈X.

Weierstrass Theorem Let (X,Γ ) be a topological space and f :X→� a contin-uous real-valued function. If Y is a compact subset of X, then f attains its boundson Y .

Proof As above, for each x ∈ Y , define Ux = (−∞, f (x)). Then φ−1P (x) = {y ∈

Y : f (y) < f (x)} = f−1(Ux) ∩ Y is an open set in the induced topology on Y . ByLemma 3.9 there exists a choice x in Y such that φ−1

P (x) = Φ . But then f (y) >

f (x) for no y ∈ Y . Hence f (y)≤ f (x) for all y ∈ Y . Thus f (x)= sup(f,Y ).In the same way let Q be the relation on X given by xQy iff f (x) < f (y). Then

there is a choice x ∈ Y such that f (y) < f (x) for no y ∈ Y . Hence f (y)≥ f (x) forall y ∈ Y , and so f (x)= inf(f,Y ). Thus f attains its bounds on Y . �

We can develop this result further.

Lemma 3.10 If f : (X,Γ )→ (Z,S) is a continuous function between topologicalspaces, and Y is a compact subset of X, then

f (Y )= {f (y) ∈Z : y ∈ Y}

is compact.

Proof Let {Wα} be an open cover for f (Y ). Then each member Wα of this covermay be expressed in the form Wα = Uα ∩ f (Y ) where Uα is an open set in Z. Foreach α, let Vα = f−1(Uα)∩ Y . Now each Vα is open in the induced topology on Y .Moreover, for each y ∈ Y , there exists some Wα such that f (y) ∈Wα . Thus {Vα} isan open cover for Y . Since Y is compact, {Vα} has a finite subcover {Vα : α ∈ J },and so {f (Vα) : α ∈ J } is a finite subcover of {Wα}. Thus f (Y ) is compact. �

Now a real-valued continuous function f is bounded on a compact set, Y (bythe Weierstrass Theorem). So f (Y ) will be contained in [f (x,f (x))] say, for somex ∈ Y . Since f (Y ) must also be compact, this suggests that a closed set of the form[a, b] must be compact.

For a set Y ⊂ � define sup(Y ) = sup(id, Y ), the supremum of Y , and inf(Y ) =inf(id, Y ), the infimum of Y . Here id : �→� is the identity on �. The set Y ⊂� isbounded above (or below) iff its supremum (or infimum) is finite. The set is boundediff it is both bounded above and below. Thus a set of the form [a, b], say with−∞< a < b <+∞ is bounded.

Heine Borel Theorem A closed bounded interval, [a, b], of the real line is compact.

AU

TH

OR

’S P

RO

OF



875

876

877

878

879

880

881

882

883

884

885

886

887

888

889

890

891

892

893

894

895

896

897

898

899

900

901

902

903

904

905

906

907

908

909

910

911

912

913

914

915

916

917

918

919

920

Sketch of Proof Consider a family C = {[a, ci], [dj , b] : i ∈ I, j ∈ J } of subsets of[a, b] with the finite intersection property. Suppose that neither I nor J is empty.Let d = sup({dj : j ∈ J }) and suppose that for some k ∈ I , ck < d . Then there existsi ∈ I and j ∈ J such that [a, ci] ∩ [dj , b] =Φ , contradicting the finite intersectionproperty. Thus ci ≥ d , and so [a, ci] ∩ [d, b] �=Φ , for all i ∈ I . Hence the family C

has non empty intersection. By Lemma 3.8, [a, b] is compact �

Definition 3.6 A topological space (X,Γ ) is called Hausdorff iff any two distinctpoints x, y ∈X have Γ -open neighbourhoods Ux,Uy such that Ux ∩Uy =Φ .

Lemma 3.11 If (X,d) is a metric space then (X,Γd) is Hausdorff, where Γd is thetopology on X induced by the metric d .

Proof For two points x �= y, let ε = d(x, y) �= 0. Define Ux = B(x, ε3 ) and Ux =

B(y, ε3 ).

Clearly, by the triangle inequality, B(x, ε3 )∩B(y, ε

3 )=Φ . Otherwise there wouldexist a point z such that d(x, z) < ε

3 , d(z, y) < ε3 , which would imply that d(x, y) <

d(x, z)+ d(z, y)= 2ε3 . By contradiction the open balls of radius ε

3 do not intersect.Thus (x,Γd) is Hausdorff. �

A Hausdorff topological space is therefore a natural generalisation of a metricspace.

Lemma 3.12 If (X,Γ ) is a Hausdorff topological space, then any compact subsetY of X is closed.

Proof We seek to show that X\Y is open, by showing that for any x ∈X\Y , thereexists a neighbourhood G of x and an open set H containing Y such that G∩H =Φ . Let x ∈ X\Y , and consider any y ∈ Y . Since X is Hausdorff, there exists aneighbourhood V (y) of y and a neighbourhood U(y), say, of x such that V (y) ∩U(y)=Φ . Since the family {V (y) : y ∈ Y } is an open cover of Y , and Y is compact,there exists a finite subcover {V (y) : y ∈A}, where A is a finite subset of Y .

Let H =⋃y∈A V (y) and G=⋂y∈A U(y).Suppose that G∩H �=Φ . Then this implies there exists y ∈A such that V (y)∩

U(y) is non-empty. Thus G ∩H =Φ . Since A is finite, G is open. Moreover Y iscontained in H . Thus X\Y is open and Y is closed. �

Lemma 3.13 If (X,Γ ) is a compact topological space and Y is a closed subsetof X, then Y is compact.

Proof Let {Uα} be an open cover for Y , where each Uα ⊂X. Then {Vα = Uα ∩ Y }is also an open cover for Y .

Since X\Y is open, {X\Y } ∪ {Vα} is an open cover for X.

AU

TH

OR

’S P

RO

OF


3.3 Compactness 97

921

922

923

924

925

926

927

928

929

930

931

932

933

934

935

936

937

938

939

940

941

942

943

944

945

946

947

948

949

950

951

952

953

954

955

956

957

958

959

960

961

962

963

964

965

966

Since X is compact there is a finite subcover, and since each Vα ⊂ Y , X\Ymust be a member of this subcover. Hence the subcover is of the form {X\Y } ∪{Vj : j ∈ J }. But then {Vj : j ∈ J } is a finite subcover of {Vα} for Y . Hence Y iscompact. �

Tychonoff’s Theorem If (X,Γ ) and (Y,S) are two compact topological spacesthen (X× Y,Γ × S), with the product topology, is compact.

Proof To see this we need only consider a cover for X × Y of the form {Uα × Vβ}for {Uα} an open cover for X and {Vβ} an open cover for Y . Since both X and Y

are compact, both {Uα} and {Vα} have finite subcovers {Uj }j∈J and {Vk}k∈K , andso {Uj × Vk : (j, k) ∈ J ×K} is a finite subcover for X× Y . �

As a corollary of this theorem, let Ik = [ak, bk] for k = 1, . . . , n be a family ofclosed bounded intervals in �. Each interval is compact by the Heine Borel Theo-rem. Thus the closed cube In = I1 × I2, . . . , In in �n is compact, by Tychonoff’sTheorem. Say that a set Y ⊂�n is bounded iff for each y ∈ Y there exists some finitenumber K(y) such that ‖x − y‖< K(y) for all x ∈ Y . Here ‖ ‖ is any convenientnorm on �n. If Y is bounded then clearly there exists some closed cube In ⊂ �n

such that Y ⊂ In. Thus we obtain:

Lemma 3.14 If Y is a closed bounded subset of �n then Y is compact.

Proof By the above, there is a closed cube In such that Y ⊂ In. But In is com-pact by Tychonoff’s Theorem. Since Y is a closed subset of In, Y is compact, byLemma 3.13. �

In �n a compact set Y is one that is both closed and bounded. To see this notethat �n is certainly a metric space, and therefore Hausdorff. By Lemma 3.12, ifY is compact, then it must be closed. To see that it must be bounded, consider theunbounded closed interval A= [0,∞) in�, and an open cover {Uk = (k−2, k); k =1, . . . ,∞}. Clearly {(−1,1), (0,2), (1,3), . . .} cover [0,∞). A finite subcover mustbe bounded above by K , say, and so the point K does not belong to the subcover.Hence [0,∞) is non-compact.

Lemma 3.15 A compact subset Y of � contains its bounds.

Proof Let s = sup(id, Y ) and i = inf(id, Y ) be the supremum and infimum of Y .Here id : � → � is the identity function. By the discussion above, Y must bebounded, and so i and s must be finite. We seek to show that Y contains thesebounds, i.e., that i ∈ Y and s ∈ Y . Suppose for example that s /∈ Y . By Lemma 3.12,Y must be closed and hence �\Y must be open. But then there exists a neighbour-hood (s − ε, s + ε) of s in �\Y , and so s − ε

2 /∈ Y . But this implies that y ≤ s − ε2

for all y ∈ Y , which contradicts the assumption that s = sup(id, Y ). Hence s ∈ Y .A similar argument shows that i ∈ Y . Thus Y = [i, y1] ∪, . . . ,∪ [yr , s] say, and soY contains its bounds. �

AU

TH

OR

’S P

RO

OF



967

968

969

970

971

972

973

974

975

976

977

978

979

980

981

982

983

984

985

986

987

988

989

990

991

992

993

994

995

996

997

998

999

1000

1001

1002

1003

1004

1005

1006

1007

1008

1009

1010

1011

1012

Fig. 3.12 ???

Lemma 3.16 Let (X,Γ ) be a topological space and f :X→� a continuous real-valued function. If Y ⊂X is compact then there exist points x0 and x1 in Y such thatf (x0)≤ f (y)≤ f (x1) for all y ∈ Y .

Proof By Lemma 3.10, f (Y ) is compact. By Lemma 3.15, f (Y ) contains its infi-mum and supremum. Thus there exists x0, x1 ∈ Y such that

f (x0)≤ f (y)≤ f (x1) for all y ∈ Y.

Note that f (Y ) must be bounded, and so f (x0) and f (x1) must be finite. �

We have here obtained a second proof of the Weierstrass Theorem that a contin-uous real-valued function on a compact set attains its bounds, and shown moreoverthat these bounds are finite. A useful application of this theorem is that if Y is acompact set in �n and x /∈ Y then there is some point y in Y which is nearest to x.Remember that we defined the distance from a point x in a metric space (X,d)

to a subset Y of X to be d(x,Y ) = inf(fx,Y ) where fx : Y → � is defined byfx(y)= d(x,−)(y)= d(x, y).

Lemma 3.17 Suppose Y is a subset of a metric space (X,d) and x ∈X. Then thefunction fx : Y →� given by fx(y)= d(x, y) is continuous.

Proof Consider y1, y2 ∈ Y and suppose that d(x, y1) ≥ d(x, y2). Then |d(x, y1)−d(x, y2)| = d(x, y1)− d(x, y2).

By the triangle inequality d(x, y1) ≤ d(x, y2) + d(y2, y1). Hence |d(x, y1) −d(x, y2)| ≤ d(y1, y2) and so d(y1, y2) < ε ⇒ |d(x, y1) − d(x, y2)| < ε, for anyε > 0.

Thus for any ε > 0, d(y1, y2) < ε⇒ |fx(y1)− fx(y2)|< ε, and so fx is contin-uous. �

njschofi

Cross-Out

njschofi

Sticky Note

Nearest points under different metrics

njschofi

Sticky Note

See Figure 3.12.

AU

TH

OR

’S P

RO

OF


3.4 Convexity 99

1013

1014

1015

1016

1017

1018

1019

1020

1021

1022

1023

1024

1025

1026

1027

1028

1029

1030

1031

1032

1033

1034

1035

1036

1037

1038

1039

1040

1041

1042

1043

1044

1045

1046

1047

1048

1049

1050

1051

1052

1053

1054

1055

1056

1057

1058

Lemma 3.18 If Y is a compact subset of a compact metric space X and x ∈ X,then there exists a point y0 ∈ Y such that d(x,Y )= d(x, y0) <∞.

Proof By Lemma 3.17, the function d(x,−) : Y → � is continuous. By Lem-ma 3.16, this function attains its lower and upper bounds on Y . Thus there existsy0 ∈ Y such that d(x, y0)= inf(d(x,−), Y )= d(x,Y ), where d(x, y0) is finite. �

The point y0 in Y such that d(x, y0)= d(x,Y ) is the nearest point in Y to x.Note of course that if x ∈ Y then d(x,Y )= 0.More importantly, when Y is compact d(x,Y ) = 0 if and only if x ∈ Y . To see

this necessity, suppose that d(x,Y )= 0. Then by Lemma 3.18, there exists y0 ∈ Y

such that d(x, y0)= 0. By the definition of a metric d(x, y0)= 0 iff x = y0 and sox ∈ Y . The point y ∈ Y that is nearest to x is dependent on the metric of course, andmay also not be unique.

3.4 Convexity

3.4.1 A Convex Set

If x, y are two points in a vector space, X, then the arc, [x, y], is the set {z ∈ X :∃ λ ∈ [0,1] s.t. z = λx + (1 − λ)y}. A point in the arc [x, y] is called a convexcombination of x and y. If Y is a subset of X, then the convex hull, con(Y ), of Y isthe smallest set in X that contains, for every pair of points x, y in Y , the arc [x, y].The set Y is called convex iff con(Y ) = Y . The set Y is strictly convex iff for anyx, y ∈ Y the combination λx + (1− λ)y, for λ ∈ (0,1), belongs to the interior of Y .

Note that if Y is a vector subspace of the real vector space X then Y must beconvex. For then if x, y ∈ Y both λ, (1− λ) ∈ � and so λx + (1− λ)y ∈ Y .

Definition 3.7 Let Y be a real vector space, or a convex subset of a real vectorspace, and let f : Y →� be a function. Then f is said to be1. convex iff f (λx + (1− λ)y)≤ λf (x)+ (1− λ)f (y)

2. concave iff f (λx + (1− λ)y)≥ λf (x)+ (1− λ)f (y)

3. quasi-concave iff f (λx + (1− λ)y) ≥ min[f (x), f (y)] for any x, y ∈ Y andany λ ∈ [0,1].

Suppose now that f : Y →� and consider the preference P ⊂ Y × Y inducedby f . For notational convenience from now on we regard P as a correspondenceP : Y → Y . That is define P by

P(x)= {y ∈ Y : f (y) > f (x)}.

If f is quasi-concave then when y1, y2,∈ P(x),

f(λy1 + (1− λ)y2

)≥min[f (y1), f (y2)

]> f (x).

AU

TH

OR

’S P

RO

OF



1059

1060

1061

1062

1063

1064

1065

1066

1067

1068

1069

1070

1071

1072

1073

1074

1075

1076

1077

1078

1079

1080

1081

1082

1083

1084

1085

1086

1087

1088

1089

1090

1091

1092

1093

1094

1095

1096

1097

1098

1099

1100

1101

1102

1103

1104

Hence λy1 + (1− λ)y2 ∈ P(x). Thus for all x ∈ Y,P (x) is convex.We shall call a preference correspondence P : Y → Y convex when Y is convex

and P is such that P(x) is convex for all x ∈ Y .When a function f : Y →� is quasi-concave then the strict preference corre-

spondence P defined by f is convex. Note also that the weak preference R : Y → Y

given by

R(x)= {y ∈ Y : f (y)≥ f (x)}

will also be convex.If f : Y →� is a concave function then it is quasi-concave. To see this consider

x, y ∈ Y , and suppose that f (x)≤ f (y). By concavity,

f(λx + (1− λ)y

)≥ λf (x)+ (1− λ)f (y)

≥ λf (x)+ (1− λ)f (x)

≥min[f (x), f (y)

].

Thus f is quasi-concave. Note however that a quasi-concave function need beneither convex nor concave. However if f is a linear function then it is convex, con-cave and quasi-concave. There is a partial order > on �n given by x > y iff xi > yi

where x = (x1, . . . , xn), y = (y1, . . . , yn). A function f : �n→� is weakly mono-tonically increasing iff f (x)≥ f (y) for any x, y ∈ �n such that x > y. A functionf : �n→� has decreasing returns to scale iff f is weakly monotonically increas-ing and concave. A very standard assumption in economic theory is that feasibleproduction of an output has decreasing returns to scale of inputs, and that con-sumers’ utility or preference has decreasing returns to scale in consumption. Weshall return to this point below.

3.4.2 Examples

Example 3.2 (i) Consider the set X1 = {(x1, x2) ∈ �2 : x2 ≥ x1}. Clearly if x2 ≥ x1and x′2 ≥ x′1 then λx2+ (1−λ)x′2 ≥ λx1+ (1−λ)x′1, for λ ∈ [0,1]. Thus λ(x1, x2)+(1− λ)(x′1, x′2)= λx1 + (1− λ)x′1, λx2 + (1− λ)x′2 ∈X1. Hence X1 is convex.

On the other hand consider the set X2 = {(x1, x2) ∈ �2 : x2 ≥ x12}.

As Fig. 3.13a indicates, this is a strictly convex set. However the set X3 ={(x1, x2) ∈ �2 : |x2| ≥ x1

2} is not convex. To see this suppose x2 < 0. Then(x1, x2) ∈X3 implies that x2 ≤−x1

2. But then −x2 ≥ x12. Clearly (x1,0) belongs

to the convex combination of (x1, x2) and (x1,−x2) yet (x1,0) /∈X3.(ii) Consider now the set X4 = {(x1, x2) : x2 > x1

3}. As Fig. 3.13b shows it ispossible to choose (x1, x2) and (x′1, x′2) with x1 < 0, so that the convex combinationof (x1, x2) and (x′1, x′2) does not belong to X4. However X5 = {(x1, x2) ∈ �2 : x2 ≥x1

3 and x1 ≥ 0} and X6 = {(x1, x2) ∈ �2 : x2 ≤ x13 and x1 ≤ 0} are both convex

sets.

AU

TH

OR

’S P

RO

OF


3.4 Convexity 101

1105

1106

1107

1108

1109

1110

1111

1112

1113

1114

1115

1116

1117

1118

1119

1120

1121

1122

1123

1124

1125

1126

1127

1128

1129

1130

1131

1132

1133

1134

1135

1136

1137

1138

1139

1140

1141

1142

1143

1144

1145

1146

1147

1148

1149

1150

Fig. 3.13a ???

Fig. 3.13b ???

(iii) Now consider the set X7 = {(x1, x2) ∈ �2 : x1x2 ≥ 1}. From Fig. 3.13c itis clear that the restriction of the set X7 to the positive quadrant �2+ = {(x1, x2) ∈�2 : x1 ≥ 0 and x2 ≥ 0} is strictly convex, as is the restriction of x7 to the negativequadrant �2− = {(x1, x2) ∈ �2 : x1 ≤ 0 and x2 ≤ 0}. However if (x1, x2) ∈X7 ∩�2+then (−x1,−x2) ∈X7 ∩�2−. Clearly the origin (0,0) belongs to the convex hull of(x1, x2) and (−x1,−x2), yet does not belong to X7. Thus X7 is not convex.

Finally a set of the form

X8 ={(x1, x2) ∈ �2+ : x2 ≤ x1

α for α ∈ (0,1)}

is also convex. See Fig. 3.13d.

njschofi

Cross-Out

njschofi

Sticky Note

Example 3.2(i)

njschofi

Cross-Out

njschofi

Sticky Note

Example 3.2 (ii)

njschofi

Sticky Note

(iv)

AU

TH

OR

’S P

RO

OF



1151

1152

1153

1154

1155

1156

1157

1158

1159

1160

1161

1162

1163

1164

1165

1166

1167

1168

1169

1170

1171

1172

1173

1174

1175

1176

1177

1178

1179

1180

1181

1182

1183

1184

1185

1186

1187

1188

1189

1190

1191

1192

1193

1194

1195

1196

Fig. 3.13c ???

Fig. 3.13d ???

Fig. 3.13e ???

Example 3.3 (i) Consider the set B = {(x1, x2) ∈ �2 : (x1−a1)2+(x2−a2)

2 ≤ r2}.See Fig. 3.13e. This is the closed ball centered on (a1, a2)= a, of radius r . Supposethat x, y ∈ B and z= λx + (1− λ)y for λ ∈ [0,1].

Let ‖ ‖ stand for the Euclidean norm. Then x, y both satisfy ‖x − a‖ ≤ r,‖y −a‖ ≤ r . But ‖z− a‖ ≤ λ‖x− a‖+ (1− λ)‖y− a‖. Thus ‖z− a‖ ≤ r and so z ∈ B .Hence B is convex. Moreover B is a closed and bounded subset of �2 and is thuscompact. For a general norm on �n, the closed ball B = {x ∈ �n : ‖x − a‖ ≤ r}

njschofi

Cross-Out

njschofi

Sticky Note

Example 3.2 (iii).

njschofi

Cross-Out

njschofi

Sticky Note

Example 3.2(iv)

njschofi

Cross-Out

njschofi

Sticky Note

The Euclidean Ball

AU

TH

OR

’S P

RO

OF


3.4 Convexity 103

1197

1198

1199

1200

1201

1202

1203

1204

1205

1206

1207

1208

1209

1210

1211

1212

1213

1214

1215

1216

1217

1218

1219

1220

1221

1222

1223

1224

1225

1226

1227

1228

1229

1230

1231

1232

1233

1234

1235

1236

1237

1238

1239

1240

1241

1242

Fig. 3.14 ???

will be compact and convex. In particular, if the Euclidean norm is used, then B isstrictly convex.

(ii) In the next section we define the hyperplane H(ρ,α) normal to a vectorρ in �n to be {x ∈ �n : 〈ρ,x〉 = α} where α is some real number. Suppose thatx, y ∈H(ρ,x).

Now

⟨ρ,λx + (1− λ)y

⟩= ⟨λ(ρ, x)+ (1− λ)(ρ, y)⟩

= α, whenever λ ∈ [0,1].

Thus H(ρ,α) is a convex set. We also define the closed half-space H+(ρ,α) byH+(ρ,α) = {x ∈ �n : 〈ρ,x〉 ≥ α}. Clearly if x, y ∈ H+(ρ,α) then 〈ρ,λx + (1 −λ)y〉 = (λ〈ρ,x〉 + (1− λ)〈ρ,y〉)≥ α and so H+(ρ,α) is also convex.

Notice that if B is the compact convex ball in �n then there exists some ρ ∈ �n

and some α ∈ � such that B ⊂H+(ρ,α).If A and B are two convex sets in �n then A∩B must also be a convex set, while

A ∪ B need not be. For example the union of two disjoint convex sets will not beconvex.

We have called a function f : Y → � convex on Y iff f (λx + (1 − λ)y) ≤λf (x)+ (1− λ)f (y) for x, y ∈ Y .

Clearly this is equivalent to the requirement that the set F = {(z, x) ∈ � × Y :z≥ f (x)} is convex. (See Fig. 3.14.)

To see this suppose (z1, x1) and (z2, x2) ∈ F .Then λ(z1, x1)+ (1− λ)(z2, x2) ∈ F iff λz1 + (1− λ)z2 ≥ f (λx1 + (1− λ)x2).

But (f (x1), x1) and (f (x2), x2) ∈ F , and so λz1 + (1 − λ)z2 ≥ λf (x1) + (1 −λ)f (x2)≥ f (λx1 + (1− λ)x2) for λ ∈ [0,1].

In the same way f is concave on Y iff G= {(z, x) ∈ �×Y : z≤ f (x)} is convex.If f : Y →� is concave then the function (−f ) : Y →�, given by (−f )(x) =−f (x), is convex and vice versa.

njschofi

Cross-Out

njschofi

Sticky Note

Convex and Concave Functions

AU

TH

OR

’S P

RO

OF



1243

1244

1245

1246

1247

1248

1249

1250

1251

1252

1253

1254

1255

1256

1257

1258

1259

1260

1261

1262

1263

1264

1265

1266

1267

1268

1269

1270

1271

1272

1273

1274

1275

1276

1277

1278

1279

1280

1281

1282

1283

1284

1285

1286

1287

1288

To see this note that if z≤ f (x) then −z≥−f (x), and so G= {(z, x) ∈ �× Y :z≤ f (x)} is convex implies that F = {(z, x) ∈ �× Y : z≥ (−f )(x)} is convex.

Finally f is quasi-concave on Y iff, for all z ∈ �, the set G(z) = {x ∈ Y : z ≤f (x)} is convex.

Notice that G(x) is the image of G under the projection mapping pz : � × Y →Y : (z, x)→ x. Since the image of a convex set under a projection is convex, clearlyG(z) is convex for any z whenever G is convex. As we know already this meansthat a concave function is quasi-concave. We now apply these observations.

Example 3.4(i) Let f : �→� by x→ x2.

As Example 3.2(i) showed, the set F = {(x, z) ∈ �×� : z≥ f (x)= x2}is convex. Hence f is convex.

(ii) Now let f : � → � by x → x3. Example 3.2(ii) showed that the set F ={(x, z) ∈ �+ × � : z ≥ f (x) = x3} is convex and so f is convex on theconvex set �+ = {x ∈ � : x ≥ 0}.

On the other hand F = {(x, z) ∈ �− ×� : z ≤ f (x)= x3} is convex andso f is concave on the convex set �− = {x ∈ � : x ≤ 0}.

(iii) Let f : �→ � by x→ 1x

. By Example 3.2(iii) the set F = {(x, z) ∈ �+ ×� : z≥ f (x)= 1

x} is convex, and so f is convex on �+ and concave on �−.

(iv) Let f (x)= xα where 0 < α < 1. Then F = {(x, z) ∈ �+ ×� : z ≤ f (x)=xα} is convex, and so f is concave.

(v) Consider the exponential function exp : � → � : x → ex . Figure 3.15ademonstrates that the exponential function is convex. Another way of show-ing this is to note that ex > f (x) for any geometric function f : x→ xr forr > 1, for any x ∈ �+.

Since the geometric functions are convex, so is ex . On the other hand asFig. 3.15b shows the function loge : �+ →�, inverse to exp, is concave.

(vi) Consider now f : �2 →� : (x, y)→ xy. Just as in Example 3.2(iii) the set{(x, y) ∈ �2+ : xy ≥ t} = �2+ ∩ f−1[t,∞)} is convex and so f is a quasi-concave function on �2+. Similarly f is quasi-concave on �2−. However f

is not quasi-concave on �2.(vii) Let f : �2 →� : (x1, x2)→ r2 − (x1 − a1)

2 − (x2 − a2)2. Since the func-

tion g(x)= x2 is convex, (−g)(x)=−x2 is concave, and so clearly f is aconcave function. Moreover it is obvious that f has a supremum in �2 at thepoint (x1, x2)= (a1, a2). On the other hand the functions in Examples 3.4(iv)to (vi) are monotonically increasing.

3.4.3 Separation Properties of Convex Sets

Let X be a vector space of dimension n with a scalar product 〈, 〉. Define H(ρ,α)={x ∈X : 〈ρ,x〉 = α} to be the hyperplane in X normal to the vector ρ ∈ �n. It shouldbe clear that H(ρ,α) is an (n− 1) dimensional plane displaced some distance fromthe origin. To see this suppose that x = λρ belongs to H(ρ,α). Then 〈ρ,λρ〉 =

AU

TH

OR

’S P

RO

OF


3.4 Convexity 105

1289

1290

1291

1292

1293

1294

1295

1296

1297

1298

1299

1300

1301

1302

1303

1304

1305

1306

1307

1308

1309

1310

1311

1312

1313

1314

1315

1316

1317

1318

1319

1320

1321

1322

1323

1324

1325

1326

1327

1328

1329

1330

1331

1332

1333

1334

Fig. 3.15 ???

(a)

(b)

λ‖ρ‖2 = α. Thus λ = α

‖ρ‖2 . Hence the length of x is ‖x‖ = |λ| ‖ρ‖ = |α|‖ρ‖ and so

x =± |α|‖ρ‖2 ρ.

Clearly if y = λρ + y0 belongs to H(ρ,α) then 〈ρ,y〉 = 〈ρ,λρ + y0〉 = α +〈ρ,y0〉 and so 〈ρ,y0〉 = 0.

Thus any vector y in H(ρ,α) can be written in the form y = λρ+ y0 where y0 isorthogonal to ρ. Since there exist (n−1) linearly independent vectors y1, . . . , yn−1,all orthogonal to ρ, any vector y ∈H(ρ,α) can be written in the form

y = λρ +n−1∑

i=1

aiyi,

where λρ is a vector of length |α|‖ρ‖ . Now let {ρ⊥} = {x ∈ �n : 〈ρ,x〉 = 0}. Clearly

{ρ⊥} is a vector subspace of �n, of dimension (n− 1) through the origin.Thus H(ρ,α)= λρ + {ρ⊥} has the form of an (n− 1)-dimensional vector sub-

space displaced a distance |α|‖ρ‖ along the vector ρ. Clearly if ρ1 and ρ2 are colinear

vectors (i.e., ρ2 = aρ1 for some a ∈ �) then {ρ⊥} = {(ρ2)⊥}.

njschofi

Cross-Out

njschofi

Sticky Note

The exponential and logarithmic functions.

AU

TH

OR

’S P

RO

OF



1335

1336

1337

1338

1339

1340

1341

1342

1343

1344

1345

1346

1347

1348

1349

1350

1351

1352

1353

1354

1355

1356

1357

1358

1359

1360

1361

1362

1363

1364

1365

1366

1367

1368

1369

1370

1371

1372

1373

1374

1375

1376

1377

1378

1379

1380

Suppose that α1ρ1‖ρ‖2 = α2ρ2

‖ρ‖2 , then both H(ρ1, α1) and H(ρ2, α2) contain the same

point and are thus identical. Thus H(ρ1, α1)=H(aρ1, aα1)=H(ρ1‖ρ2‖ ,

α1‖ρ1‖ ).The hyperplane H(ρ,α) separates X into two closed half-spaces:

H+(ρ,α)= {x ∈X : 〈ρ,x〉 ≥ α}, and H−(ρ,α)= {x ∈X : 〈ρ,x〉 ≤ α

}.

We shall also write

H 0+(ρ,α)= {x ∈X : 〈ρ,x〉> α}, and H 0−(ρ,α)= {x ∈X : 〈ρ,x〉< α

}

for the open half-spaces formed by taking the interiors of H+(ρ,α) and H−(ρ,α),in the case ρ �= 0.

Lemma 3.19 Let Y be a non-empty compact convex subset of a finite dimensionalreal vector space X, and let x be a point in X\Y . Then there is a hyperplane H(ρ,α)

through a point y0 ∈ Y such that

〈ρ,x〉< α = 〈ρ,y0〉 ≤ 〈ρ,y〉 for all y ∈ Y.

Proof As in Lemma 3.17 let fx : Y →� be the function fx(y)= ‖x − y‖, where‖ ‖ is the norm induced from the scalar product 〈, 〉 in X.

By Lemma 3.18 there exists a point y0 ∈ Y such that ‖x − y0‖ = inf (fx,Y )=d(x,Y ). Thus ‖x − y0‖ ≤ ‖x − y‖ for all y ∈ Y . Now define ρ = y0 − x and α =〈ρ,y0〉. Then

〈ρ,x〉 = 〈ρ,y0〉 −⟨ρ, (y0 − x)

⟩= 〈ρ,y0〉 − ‖ρ‖2 < 〈ρ,y0〉.

Suppose now that there is a point y ∈ Y such that 〈ρ,y0〉> 〈ρ,y〉. By convexity,w = λy + (1− λ)y0 ∈ Y , where λ belongs to the interval (0,1). But

‖x − y0‖2 − ‖x −w‖2 = ‖x − y0‖2 − ‖x − λy − y0 + λy0‖2

= 2λ⟨ρ, (y0 − y)

⟩− λ2‖y − y0‖2.

Now 〈ρ,y0〉> 〈ρ,y〉 and so, for sufficiently small λ, the right hand side is pos-itive. Thus there exists a point w in Y , close to y0, such that ‖x − y0‖> ‖x −w‖.But this contradicts the assumption that y0 is the nearest point in Y to x. Thus〈ρ,y〉 ≥ 〈ρ,y0〉 for all y ∈ Y . Hence 〈ρ,x〉< α = 〈ρ,y0〉 ≤ 〈ρ,y〉 for all y ∈ Y . �

Note that the point y0 belongs to the hyperplane H(ρ,α), the set Y belongs tothe closed half-space H+〈ρ,α〉, while the point x belongs to the open half-space

H 0−(ρ,α)= {z ∈X : 〈ρ, z〉< α}.

Thus the hyperplane separates the point x from the compact convex set Y (seeFig. 3.16).

AU

TH

OR

’S P

RO

OF


3.4 Convexity 107

1381

1382

1383

1384

1385

1386

1387

1388

1389

1390

1391

1392

1393

1394

1395

1396

1397

1398

1399

1400

1401

1402

1403

1404

1405

1406

1407

1408

1409

1410

1411

1412

1413

1414

1415

1416

1417

1418

1419

1420

1421

1422

1423

1424

1425

1426

Fig. 3.16 ???

Fig. 3.17 ???

While convexity is necessary for the proof of this theorem, the compactness re-quirement may be weakened to Y being closed. Suppose however that Y is an openset. Then it is possible to choose a point x outside Y , which is, nonetheless, anaccumulation point of Y such that d(x,Y )= 0.

On the other hand if Y is compact but not convex, then a situation such asFig. 3.17 is possible. Clearly no hyperplane separates x from Y .

If A and B are two sets, and H(ρ,α) = H is a hyperplane such that A ⊆H−(ρ,α) and B ⊆ H+(ρ,α) then say that H weakly separates A and B . If H

is such that A⊂H−(ρ,α) and B ⊂H+(ρ,α) then say H strongly separates A andB . Note in the latter case that it is necessary that A∩B =Φ .

In Lemma 3.19 we found a hyperplane H(ρ,α) such that 〈ρ,x〉< α. Clearly itis possible to find α− < α such that 〈ρ,x〉< α−.

Thus the hyperplane H(ρ,α−) strongly separates x from the compact convexset Y .

If Y is convex but not compact, then it is possible to find a hyperplane H thatweakly separates X from Y .

We now extend this result to the separation of convex sets.

njschofi

Cross-Out

njschofi

Sticky Note

A separating Hyperplane

njschofi

Cross-Out

njschofi

Sticky Note

non-existance of a separating hyperplane.

AU

TH

OR

’S P

RO

OF



1427

1428

1429

1430

1431

1432

1433

1434

1435

1436

1437

1438

1439

1440

1441

1442

1443

1444

1445

1446

1447

1448

1449

1450

1451

1452

1453

1454

1455

1456

1457

1458

1459

1460

1461

1462

1463

1464

1465

1466

1467

1468

1469

1470

1471

1472

Separating Hyperplane Theorem Suppose that A and B are two disjoint non-empty convex sets of a finite dimensional vector space X. Then there exists a hy-perplane H that weakly separates A from B . If both A and B are compact then H

strongly separates A from B .

Proof Since A and B are convex the set A− B = {w ∈ X : w = a − b where a ∈A,b ∈ B} is also convex.

To see this suppose a1−b1 and a2−b2 ∈A−B . Then λ(a1−b1)+ (1−λ)(a2−b2)= [λa1 + (1− λ)a2] + [λb1 + (1− λ)b2] ∈A−B .

Now A ∩ B = Φ . Thus there exists no point in both A and B , and so 0 /∈A− B . By Lemma 3.19, there exists a hyperplane H(−ρ,0) weakly separating 0from A−B , i.e., 〈ρ,0〉 ≤ 〈ρ,w〉 for all w ∈ B −A. But then 〈ρ,a〉 ≤ 〈ρ,b〉 for alla ∈A,b ∈ B .

Choose α ∈ [supa∈A〈ρ,a〉, infb∈B〈ρ,b〉]. In the case that A,B are non-compact,it is possible that

supa∈A

〈ρ,a〉 = infb∈B〈ρ,b〉.

Thus 〈ρ,a〉 ≤ α ≤ 〈ρ,b〉 and so H(ρ,α) weakly separates A and B .Consider now the case when A and B are compact.The function ρ∗ : X → � given by ρ∗(x) = 〈ρ,x〉 is clearly continuous. By

Lemma 3.16, since both A and B are compact, there exist points a ∈ A and b ∈ B

such that

〈ρ,a〉 = supa∈A

〈ρ,a〉 and 〈ρ,b〉 = infb∈B〈ρ,b〉.

If supa∈A〈ρ,a〉 = infb∈B〈ρ,b〉, then 〈ρ,a〉 = 〈ρ,b〉, and so a = b, contradictingA∩B =Φ .

Thus 〈ρ,a〉 < 〈ρ,b〉 and we can choose α such that 〈ρ,a〉 ≤ 〈ρ,a〉 < α <

〈ρ,b〉 ≤ 〈ρ,b〉 for all a, b in A,B .Thus H(ρ,α) strongly separates A and B . (See Fig. 3.18b.) �

Example 3.5 Hildenbrand and Kirman (1976) have applied this theorem to find aprice vector which supports a final allocation of economic resources. Consider asociety M = {1, . . . ,m} in which each individual i has an initial endowment ei =(ei1, . . . , ein) ∈ �n of n commodities. At price vector p = (p1, . . . , pn), the budgetset of individual i is

Bi(p)= {x ∈ �n : 〈p,x〉 ≤ 〈p, ei〉}.

Each individual has a preference relation Pi on �n ×�n, and at the price vectorp the demand Di(p) by i is the set

{x ∈ Bi(p) : yPix for no y ∈ Bi(p)

}.

AU

TH

OR

’S P

RO

OF


3.4 Convexity 109

1473

1474

1475

1476

1477

1478

1479

1480

1481

1482

1483

1484

1485

1486

1487

1488

1489

1490

1491

1492

1493

1494

1495

1496

1497

1498

1499

1500

1501

1502

1503

1504

1505

1506

1507

1508

1509

1510

1511

1512

1513

1514

1515

1516

1517

1518

Fig. 3.18 a Weak separation.b Strong separation

(a)

(b)

Let fi = (fi1, . . . , fin) ∈ �n be the final allocation to individual i, for i =1, . . . ,m. Suppose there exists a price vector p = (p1, . . . , pn) with the property (∗)xPifi ⇒ 〈p,x〉> 〈p, ei〉. Then this would imply that fi ∈ Bi(p)⇒ fi ∈Di(p). Ifproperty (∗) holds at some price vector p, for each i, then fi ∈Di(p) for each i.

To show existence of such a price vector, let

πi = Pi(fi)− ei ∈ �n.

Here as before Pi(fi)= {x ∈ �n : xPifi}. Suppose that there exists a hyperplaneH(p,0) strongly separating 0 from πi . In this case 0 < 〈p,x−ei〉 for all x ∈ Pi(fi).But this is equivalent to 〈p,x〉> 〈p, ei〉 for all x ∈ Pi(fi).

Let π = Con[⋃i∈N πi] be the convex hull of the sets πi, i ∈M . Clearly if 0 /∈ π

and there is a hyperplane H(p,0) strongly separating 0 from π , then p is a pricevector which supports the final allocation f1, . . . , fn.

AU

TH

OR

’S P

RO

OF



1519

1520

1521

1522

1523

1524

1525

1526

1527

1528

1529

1530

1531

1532

1533

1534

1535

1536

1537

1538

1539

1540

1541

1542

1543

1544

1545

1546

1547

1548

1549

1550

1551

1552

1553

1554

1555

1556

1557

1558

1559

1560

1561

1562

1563

1564

3.5 Optimisation on Convex Sets

A key notion underlying economic theory is that of the maximisation of an objectivefunction subject to one or a number of constraints. The most elementary case ofsuch a problem is the one addressed by the Weierstrass Theorem: if f :X→� is acontinuous function, and Y is a compact constraint set then there exists some pointy such that f (y)= sup(f,Y ). Here y is a maximum point of f on Y .

Using the Separating Hyperplane Theorem we can extend this analysis to theoptimisation of a convex preference correspondence on a compact convex constraintset.

3.5.1 Optimisation of a Convex Preference Correspondence

Suppose that Y is a compact, convex constraint set in �n and P : �n →�n is apreference correspondence which is convex (i.e., P(x) is convex for all x ∈ �n).A choice for P on Y is a point y ∈ Y such that P(y)∩ Y =Φ .

We shall say that P is non-satiated in �n iff for no y ∈ �n is it the case thatP(y) = Φ . A sufficient condition to ensure non-satiation for example is the as-sumption of monotonicity, i.e., x > y (where as before this means xi > yi , for eachof the coordinates xi, yi, i = 1, . . . , n) implies that x ∈ P(y).

Say that P is locally non-satiated in �n iff for each y ∈ �n and any neighbour-hood Uy of y in �n, then P(y)∩Uy �=Φ .

Clearly monotonicity implies local non-satiation implies non-satiation.Suppose that y belongs to the interior of the compact constraint set Y . Then there

is a neighbourhood Uy of y within Y . Consequently P(y)∩Uy �=Φ and so y cannotbe a choice from Y . On the other hand, since Y is compact it is closed, and so if y

belongs to the boundary δY of Y , it belongs to Y itself. By definition if y ∈ δY thenany neighbourhood Uy of y intersects �n\Y . Thus when P(y)⊂�n\Y,y will be achoice from Y . Alternatively if y is a choice of P from Y , then y must belong to theboundary of P .

Lemma 3.20 Let Y be a compact, convex constraint set in�n and let P : �n→�n

be a preference correspondence which is locally non-satiated, and is such that, forall x ∈ �n,P (x) is open and convex. Then y is a choice of P from Y iff there is ahyperplane H(p,α) through y in Y which separates Y from P(y) in the sense that

〈p,y〉 ≤ α = 〈p,y〉< 〈p,x〉 for all y ∈ Y and all x ∈ P(y).

Proof Suppose that the hyperplane H(p,α) contains y and separates Y from P(y)

in the above sense. Clearly y must belong to the boundary of Y . Moreover 〈p,y〉<〈p,x〉 for all y ∈ Y,x ∈ P(y). Thus Y ∩ P(y) = Φ and so y is the choice of P

from Y .On the other hand suppose that y is a choice. Then P(y)∩ Y =Φ .

AU

TH

OR

’S P

RO

OF


3.5 Optimisation on Convex Sets 111

1565

1566

1567

1568

1569

1570

1571

1572

1573

1574

1575

1576

1577

1578

1579

1580

1581

1582

1583

1584

1585

1586

1587

1588

1589

1590

1591

1592

1593

1594

1595

1596

1597

1598

1599

1600

1601

1602

1603

1604

1605

1606

1607

1608

1609

1610

Moreover the local non-satiation property, P(y) ∩ Uy �= Φ for Uy a neighbor-hood of y in �n, guarantees that y must belong to the boundary of Y . Since Y andP(y) are both convex, there exists a hyperplane H(p,α) through y such that

〈p,y〉 ≤ α = 〈p,y〉 ≤ 〈p,x〉

for all y ∈ Y , all x ∈ P(y). But P(y) is open, and so the last inequality can bewritten 〈p,y〉< 〈p,x〉. �

Note that if either the constraint set, Y , or the correspondence P is such thatP(y) is strictly convex, for all y ∈ �n, then the choice y is unique.

If f : �n →� is a concave or quasi-concave function then application of thislemma to the preference correspondence P : �n → �, where P(x) = {y ∈ �n :f (y) > f (x)}, characterises the maximum point y of f on Y . Here y is a maximumpoint of f on Y if f (y)= sup(f,Y ). Note that local non-satiation of P requires thatfor any point x in �n, and any neighbourhood Ux of x in �n, there exists y ∈ Ux

such that f (y) > f (x).The vector p = (p1, . . . , pn) which characterises the hyperplane H(p,α) is

called in economic applications the vector of shadow prices. The reason for thiswill become clear in the following example.

Example 3.6 As an example suppose that optimal use of (n− 1) different inputs(x1, . . . , xn−1) gives rise to an output y, say, where y = y(x1, . . . , xn−1). Any n

vector (x1, . . . , xn−1, xn) is feasible as long as xn ≤ y(x1, . . . , xn−1). Here xn isthe output. Write g(x1, . . . , xn−1, xn) = y(x1, . . . , xn−1) − xn. Then a vector x =(x1, . . . , xn−1, xn) is feasible iff g(x)≥ 0.

Suppose now that y = y(x1, . . . , xn−1) is a concave function in x1, . . . , xn−1.Then clearly the set G = {x ∈ �n : g(x) ≥ 0} is a convex set. Now let π(x1, . . . ,

xn−1, xn) = −∑pixi + pnxn be the profit function of the producer, when pricesfor inputs and outputs are given exogenously by (−p1, . . . ,−pn−1,pn). Again letP : �n →�n be the preference correspondence P(x) = {z ∈ �n : π(z) > π(x)}.Since for each x,P (x) is convex, and locally non-satiated, there is a choice x and ahyperplane H(ρ,α) separating P(x) from G.

Indeed it is clear from the construction that P(x) ⊂ H 0+(ρ,α) and G ⊂H−(ρ,α).

Moreover the hyperplane H(ρ,α) must coincide with the set of points {x ∈ �n :π(x)= π(x)}. Thus the hyperplane H(ρ,α) has the form

{x ∈ �n : 〈p,x〉 = π(x)

}

and so may be written H(p,π(x)). Note that the intercept on the xn axis is π(x)pn

while the distance of the hyperplane from the origin is π(x)‖p‖ = π(x)√∑

p2i

. Thus the

intercept gives the profit measured in units of output, while the distance from the

AU

TH

OR

’S P

RO

OF



1611

1612

1613

1614

1615

1616

1617

1618

1619

1620

1621

1622

1623

1624

1625

1626

1627

1628

1629

1630

1631

1632

1633

1634

1635

1636

1637

1638

1639

1640

1641

1642

1643

1644

1645

1646

1647

1648

1649

1650

1651

1652

1653

1654

1655

1656

Fig. 3.19 ???

origin of the profit hyperplane gives the profit in terms of a normalized price vector(‖p‖).

Figure 3.19 illustrates the situation with one input (x1) and one output (x2).Precisely the same analysis can be performed when optimal production is char-

acterised by a general production function F : �n→�.Here x1, . . . , xm are inputs, with prices−p1, . . . ,−pm and xm+1, . . . , xn are out-

puts with prices pm+1, . . . , pn. Let p = (−p1, . . . ,−pm,−pm+1, . . . , pn) ∈ �n.Define F so that a vector x ∈ �n is feasible iff F(x)≥ 0. Note that we also need

to restrict all inputs and outputs to be non-negative. Therefore define

�n+ = {x : xi ≥ 0 for i = 1, . . . , n}.Assume that the feasible set (or production set)

G= {x ∈ �n+ : F(x)≥ 0}

is convex.

As before let P : �n → �n where P(x) = {z ∈ �n : π(z) > π(x)}. Then thepoint x is a choice of P from G iff x maximises the profit function

π(x)=n−m∑

j=1

pm+j xm+j −m∑

j=1

pjxj .

By the previous example x is a choice iff the hyperplane H(p,π(x)) separatesP(x) and G: i.e., P(x)⊂H 0+(p,π(x)) and G⊂H−(p,π(x)).

In the next chapter we shall use this optimality condition to characterise thechoice m more fully in the case when F is “smooth”.

Example 3.7 Consider now the case of a consumer maximising a preference corre-spondence P : �n→�n subject to a budget constraint B(p) which is dependent ona set of exogenous prices p1, . . . , pn.

njschofi

Cross-Out

njschofi

Sticky Note

Separation from a feasible set

AU

TH

OR

’S P

RO

OF


3.5 Optimisation on Convex Sets 113

1657

1658

1659

1660

1661

1662

1663

1664

1665

1666

1667

1668

1669

1670

1671

1672

1673

1674

1675

1676

1677

1678

1679

1680

1681

1682

1683

1684

1685

1686

1687

1688

1689

1690

1691

1692

1693

1694

1695

1696

1697

1698

1699

1700

1701

1702

Fig. 3.20 ???

For example the consumer may hold an initial set of endowments (e1, . . . , en),so let

I =n∑

i=1

piei = 〈p, e〉.

The budget set is then

B(p)= {x ∈ �n+ : 〈p,x〉 ≤ I},

where for convenience we assume the consumer only buys a non-negative amountof each commodity. Suppose that P is monotonic, and P(x) is open, convex forall x ∈ �n. As before x is a choice from B(p) iff there is a hyperplane H(ρ,α)

separating P(x) from B(p).Under these conditions the choice must belong to the upper boundary of B(p)

and so satisfy (p, x)= (p, e)= I . Thus the hyperplane has the form H(p, I), andso the optimality condition is P(x)⊂H 0+(p, I ) and B(p)⊂H−(p, I ); i.e., (p, x)≤I = (p, x) < (p,y) for all x ∈ B(p) and all y ∈ P(x).

Figure 3.20 illustrates the situation with two commodities x1 and x2.In the next chapter we use this optimality condition to characterise a choice when

preference is given by a smooth utility function f : �n→�.In the previous two examples we considered

1. optimisation of a profit function, which is determined by exogenously givenprices, subject to a fixed production constraint, and

2. optimisation of a fixed preference correspondence, subject to a budget con-straint, which is again determined by exogenous prices.

Clearly at a given price vector each producer will “demand” a certain input vec-tor and supply a particular output vector, so that the combination is his choice inthe environment determined by p. In the same way a consumer will respond to aprice vector p by demanding optimal amounts of each commodity, and possiblysupplying other commodities such as labor, or various endowments. In contrast toExample 3.7, regard all prices as positive, and consider a commodity xj demandedby an agent i as an input to be negative, and positive when supplied as an output.

njschofi

Cross-Out

njschofi

Sticky Note

Example 3.7

AU

TH

OR

’S P

RO

OF



1703

1704

1705

1706

1707

1708

1709

1710

1711

1712

1713

1714

1715

1716

1717

1718

1719

1720

1721

1722

1723

1724

1725

1726

1727

1728

1729

1730

1731

1732

1733

1734

1735

1736

1737

1738

1739

1740

1741

1742

1743

1744

1745

1746

1747

1748

Let xij (p) be the optimal demand or supply of commodity j by agent i at the pricevector p, with m agents and n commodities, then market equilibrium of supplyand demand in the economy occurs when

∑mi=1∑n

j=1 xij (p) = 0. A price vectorwhich leads to market equilibrium in demand and supply is called an equilibriumprice vector.

Example 3.8 To give a simple example, consider two agents. The first agent con-trols a machine which makes use of labor, x, to produce a single output y. Regardx ∈ (−∞,0] and consider a price vector p ∈ �2+, where p = (w, r) and w is theprice of labor, and r the price of the output. An output is feasible from Agent Oneiff F(x, y)≥ 0.

Agent Two is the only supplier of labor, but is averse to working. His preferenceis described by a quasi-concave utility function f : �2→� and we restrict attentionto a domain

D = {(x, y) ∈ �2 : x ≤ 0, y ≥ 0}.

Assume that f is monotonic, i.e., if x1 < x2 and y1 < y2 then f (x1, y1) <

f (x2, y2). The budget constraint of Agent Two at (w, r) is therefore

B(w, r)= {(x1, y2) ∈D : ry2 ≤w|x|},where |x| is the quantity of labor supplied, and y2 is the amount of commodity y

consumed. Profit for Agent One is π(x, y)= ry−wx, and we shall assume that thisagent then consumes an amount y1 = π(x,y)

rof commodity y.

For equilibrium of supply and demand at prices (w, r)

1. y = y1 + y2;2. (x, y) maximises π(x, y)= ry −wx subject to F(x, y)≥ 0;3. (x, y2) maximises f (x, y2) subject to ry2 =wx.

At any point (x, y) ∈D, and vector (w, r) define

P(x, y)= {(x′, y′) ∈D : f (x ′, y′ − y1)> f (x, y − y1)

}

where as above y1 = ry−wxr

is the amount of commodity y consumed by the pro-ducer. Thus P(x, y) is the preference correspondence of Agent One displaced bythe distance y1 up the y-axis.

Figure 3.21 illustrates that it is possible to find a vector (w, r) such that H =H((w, r),π(x, y)) separates P(x, y) and the production set

G= {(x, y) ∈D : F(x, y)≥ 0}.

As in Example 3.6, the intersect of H with the y-axis is y1 = π(x,y)r

, the con-sumption of y by Agent One.

The hyperplane H is the set

{(x, y) : ry +wx = ry1

}.

AU

TH

OR

’S P

RO

OF


3.6 Kuhn-Tucker Theorem 115

1749

1750

1751

1752

1753

1754

1755

1756

1757

1758

1759

1760

1761

1762

1763

1764

1765

1766

1767

1768

1769

1770

1771

1772

1773

1774

1775

1776

1777

1778

1779

1780

1781

1782

1783

1784

1785

1786

1787

1788

1789

1790

1791

1792

1793

1794

Fig. 3.21 ???

Hence for all (x, y) ∈H,(x, y− y1) satisfies r(y− y1)+wx = 0. Thus H is theboundary of the second agent’s budget set {(x, y2) : ry2+wx = 0} displaced up they-axis by y1. Consequently y = y1 + y2 and so (x, y) is a market equilibrium.

Note that the hyperplane separation satisfies:

py − rx ≤ π(x, y) < ry′ −wx′

for all (x, y) ∈G, and all (x′, y′) ∈ P(x, y).As above (x′, y′) ∈ P(x, y) iff f (x′, y′ − y1) > f (x, y2). So the right hand side

implies that if (x′, y′) ∈ P(x, y), then ry′ −wx′ > π(x, y).Since y′2 = y′ − y1, ry

′2 − wx′ > 0 or (x′, y′ − y1) ∈ H 0+((w, r),0) and so

(x′, y′ − y1) is infeasible for Agent Two.Finally (x, y) maximises π(x, y) subject to (x, y) ∈G, and so (x, y) ∈G and so

(x, y) results from optimising behaviour by both agents at the price vector (w, r).

As this example illustrates it is possible to show existence of a market equilib-rium in economies characterised by compact convex constraint sets, and convexpreference correspondences. To do this in a general context however requires morepowerful mathematical tools, which we shall introduce in Sect. 3.8 below. Beforethis however we consider one further application of the hyperplane separation the-orem to a situation where we wish to optimise a concave objective function subjectto a number of concave constraints.

3.6 Kuhn-Tucker Theorem

Here we consider a family of constraints in �n. Let g = (g1, . . . , gm) : �n →�m.As before let �m+ = {(y1, . . . , ym) : yi ≥ 0 for i = 1, . . . , n}. A point x is feasible iffx ∈ �n+ and g(x) ∈ �m+ (i.e., gi(x) ≥ 0 for i = 1, . . . ,m). Let f : �n →� be the

njschofi

Cross-Out

njschofi

Sticky Note

Example 3.8

AU

TH

OR

’S P

RO

OF



1795

1796

1797

1798

1799

1800

1801

1802

1803

1804

1805

1806

1807

1808

1809

1810

1811

1812

1813

1814

1815

1816

1817

1818

1819

1820

1821

1822

1823

1824

1825

1826

1827

1828

1829

1830

1831

1832

1833

1834

1835

1836

1837

1838

1839

1840

objective function. Say that x∗ ∈ �n is an optimum of the constrained optimisationproblem (f, g) iff x∗ solves the problem: maximise f (x) subject to the feasibilityconstraint g(x) ∈ �m+, x ∈ �n+.

Call the problem (f, g) solvable iff there is some x ∈ �n+ such that gi(x) > 0 fori = 1, . . . ,m. The Lagrangian to the problem (f, g) is:

L(x,λ)= f (x)+m∑

i=1

λigi(x)= f (x)+ (λ,g(x))

where x ∈ �n+ and λ = (λ1, . . . , λm) ∈ �m+. The pair (x∗, λ∗) ∈ �n+m+ is called aglobal saddle point for the problem (f, g) iff

L(x,λ∗

)≤ L(x∗, λ∗

)≤ L(x∗, λ)

for all x ∈ �n+, λ ∈ �m+.

Kuhn-Tucker Theorem 1 Suppose f,g1, . . . , gm : �n →�m are concave func-tions for all x ∈ �n+. Then if x∗ is an optimum to the solvable problem (f, g) :�n→�m+1 there exists a λ∗ ∈ �m+ such that (x∗, λ∗) is a saddle point for (f, g).

Proof Let A = {y ∈ �m+1 : ∃ x ∈ �n+ : y ≤ (f, g)(x)}. Here (f, g)(x) = (f (x),

g(x), . . . , gm(x)). Thus y = (y1, . . . , ym+1) ∈A iff ∃x ∈ �n+ such that

y1 ≤ f (x)

yj+1 ≤ gj (x) for j = 1, . . . ,m.

Let x∗ be an optimum and

B = {z= (z1, . . . , zm+1) ∈ �m : z1 > f(x∗)

and (z2, . . . , zm+1) > 0}.

Since f,g are concave, A is convex. To see this suppose y1, y2 ∈ A. But sinceboth f and g are concave af (x1)+(1−a)f (x2)≤ f (ax1+(1−a)x2) and similarlyfor g, for any a ∈ [0,1]. Thus

ay1 + (1− a)y2 ≤ a(f,g)(x1)+ (1− a)(f, g)(x2)≤ (f, g)(ax1 + (1− a)x2

).

Since x1, x2 ∈ �n+, ax1 + (1− a)x2 ∈ �n+, and so ay1 + (1− a)y2 ∈ �n+.Clearly B is convex, since az11+ (1− a)z12 > f (x∗) if a ∈ [0,1] and z11, z12 >

f (x∗).To see A ∩ B =Φ , consider x ∈ �n such that g(x) < 0. Then (y2, . . . , ym+1)≤

g(x) < 0≤ (z2, . . . , zm+1).If g(x) ∈ �m+ then x is feasible. In this case y1 ≤ f (x)≤ f (x∗) < z1.

AU

TH

OR

’S P

RO

OF


3.6 Kuhn-Tucker Theorem 117

1841

1842

1843

1844

1845

1846

1847

1848

1849

1850

1851

1852

1853

1854

1855

1856

1857

1858

1859

1860

1861

1862

1863

1864

1865

1866

1867

1868

1869

1870

1871

1872

1873

1874

1875

1876

1877

1878

1879

1880

1881

1882

1883

1884

1885

1886

By the separating hyperplane theorem, there exists (p1, . . . , pm+1) ∈ �m+1 andα ∈ � such that H(p,α)= {w ∈ �m+1 :∑m+1

j=1 wjpj = α} separates A and B , i.e.,∑m+1

j=1 pjyj ≤ α ≤∑m+1j=1 pjzj for any y ∈A and z ∈ B .

Moreover p ∈ �m+1+ . By the definition of A, for any y ∈ A, ∃ x ∈ �n+ such thaty ≤ (f, g)(x).

Thus for any x ∈ �n+,

p1f (x)+m∑

j=2

pjgj (x)≤m+1∑

j=1

pjzj .

Since (f (x∗),0, . . . ,0) belongs to the boundary of B ,

p1f (x)+m∑

j=2

pjgj (x)≤ p1f(x∗).

Suppose p1 = 0. Since p ∈ �m+1+ , there exists pj > 0.Since the problem is solvable, ∃ x ∈ �n+ such that gj (x) > 0. But this gives∑mj=2 pjgj (x) > 0, contradicting p1 = 0. Hence p1 > 0.

Let λ∗j = pj+1p1

for j = 1, . . . ,m.

Then L(x,λ∗) = f (x) +∑mj=1 λ∗j gj (x) ≤ f (x∗) for all x ∈ �n+, where λ∗ =

(λ∗1, . . . , λ∗m) ∈ �m+. Since x∗ is feasible, g(x∗) ∈ �m+, and 〈λ∗, g(x∗)〉 ≥ 0. Butf (x∗) + 〈λ∗, g(x∗)〉 ≤ f (x∗) implying 〈λ∗, g(x∗)〉 ≤ 0. Thus 〈λ∗, g(x∗)〉 = 0.Clearly 〈λ,g(x∗)〉 ≥ 0 if λ ∈ �m+. Thus L(x,λ∗) ≤ L(x∗, λ∗) ≤ L(x∗, λ) for anyx ∈ �n+, λ ∈ �m+. �

Kuhn-Tucker Theorem 2 If the pair (x∗, λ∗) is a global saddle point for the prob-lem (f, g), then x∗ is an optimum.

Proof By the assumption

L(x,λ∗

)≤ L(x∗, λ∗

)≤ L(x∗, λ

)

for all x ∈ �n+, λ ∈ �m+.Choose λ= (λ∗1, . . . ,2λ∗i , . . . , λ∗m).Then L(x∗, λ)≥L(x∗, λ∗) implies gi(x

∗)λ∗i ≥ 0. If λ∗i �= 0 then gi(x∗)≥ 0, and

so 〈λ∗, g(x∗)〉 ≥ 0.On the other hand, L(x∗, λ∗) ≤ L(x∗,0) implies 〈λ∗, g(x∗)〉 ≤ 0. Thus

〈λ∗, g(x∗)〉 = 0. Hence f (x) + 〈λ∗, g(x)〉 ≤ f (x∗) ≤ f (x∗) + 〈λ,g(x∗)〉. If x isfeasible, g(x) ≥ 0 and so 〈λ∗, g(x)〉 ≥ 0. Thus f (x) ≤ f (x)+ 〈λ∗, g(x)〉 ≤ f (x∗)for all x ∈ �n+, whenever g(x) ∈ �m+. Hence x∗ is an optimum for the problem(f, g). Note that for a concave optimisation problem (f, g), x∗ is an optimumfor (f, g) iff (x∗, λ∗) is a global saddle point for the Lagrangian L(x,λ),λ ∈ �m+.

AU

TH

OR

’S P

RO

OF



1887

1888

1889

1890

1891

1892

1893

1894

1895

1896

1897

1898

1899

1900

1901

1902

1903

1904

1905

1906

1907

1908

1909

1910

1911

1912

1913

1914

1915

1916

1917

1918

1919

1920

1921

1922

1923

1924

1925

1926

1927

1928

1929

1930

1931

1932

Moreover (x∗, λ∗) are such that 〈λ∗, g(x∗)〉 minimises 〈λ,g(x)〉 for all λ ∈ �m+, x ∈�n+, g(x) ∈ �m+. Since 〈λ∗, g(x∗)〉 = 0 this implies that if gi(x

∗) > 0 then λ∗i = 0and if λ∗i > 0 then gi(x

∗)= 0.The coefficients (λ∗1, . . . , λ∗m) are called shadow prices. If the optimum is such

that gi(x∗) > 0 then the shadow price λ∗1 = 0. In other words if the optimum does

not lie in the boundary of the ith constraint set ∂Bi = {x : gi(x) = 0}, then thisconstraint is slack, with zero shadow price. If the shadow price is non zero then theconstraint cannot be slack, and the optimum lies on the boundary of the constraintset.

In the case of a single constraint, the assumption of non-satiation was sufficientto guarantee that the constraint was not slack.

In this case

f (x)+ p2

p1g(x)≤ f

(x∗)≤ f

(x∗)+ λg

(x∗)

for any x ∈ �+, and λ ∈ �+, where p2p1

> 0.The Kuhn-Tucker theorem is of particular use when objective and constraint

functions are smooth. In this case the Lagrangean permits computation of the opti-mal points of the problem. We deal with these procedures in Chap. 4. �

3.7 Choice on Compact Sets

In Lemma 3.9 we showed that when a preference relation is acyclic and lower demi-continuous (LDC) on a compact space, X, then P admits a choice. Lemma 3.20gives a different kind of result making use of compactness and convexity of theconstraint set, and convexity and openness of the image P(x) at each point x. Wenow present a class of related results using compactness, convexity and lower demi-continuity. These results are essentially all based on the Brouwer Fixed Point The-orem (Brouwer 1912) and provide the technology for proving existence of a marketequilibrium. We first introduce the notion of retraction and contractibility, and thenshow that any continuous function on the closed ball in �n admits a fixed point.

Definition 3.8 Let X be a topological space.(i) Suppose Y is a subset of X. Say Y has the fixed point property iff whenever

f : Y →X is a continuous function (with respect to the induced topology onY ) such that f (Y )= {f (y) ∈X : y ∈ Y } ⊂ Y , then there exists a point x ∈ Y

such that f (x)= x.(ii) If Y is a topological space, and there exists a bijective continuous function h :

X→ Y such that h−1 is also continuous then h is called a homeomorphismand X,Y are said to be homeomorphic (see Sect. 1.2.3 for the definition ofbijective).

(iii) If Y is a topological space and there exist continuous functions g : Y → X

and h : X→ Y such that h ◦ g : Y → X→ Y is the identity on Y , then h iscalled an r-map (for g).

AU

TH

OR

’S P

RO

OF


3.7 Choice on Compact Sets 119

1933

1934

1935

1936

1937

1938

1939

1940

1941

1942

1943

1944

1945

1946

1947

1948

1949

1950

1951

1952

1953

1954

1955

1956

1957

1958

1959

1960

1961

1962

1963

1964

1965

1966

1967

1968

1969

1970

1971

1972

1973

1974

1975

1976

1977

1978

(iv) If Y ⊂ Z ⊂ X and g = id : Y → Y is the (continuous) identity map and h :Z→ Y is an r-map for g then h is called a retraction (of Z on Y ) and Y iscalled a retract of Z.

(v) If Y ⊂X and there exists a continuous function f : Y × [0,1]→ Y such thatf (y,0) = y (so f ( ,0) is the identity on Y ) and f (y,1) = y0 ∈ Y , for ally ∈ Y , then Y is said to be contractible.

(vi) Suppose that Y ⊂ Z ⊂ X and there exists a continuous function f : Z ×[0,1] → Z such that f (z,0)= z ∀z ∈ Z, f (y, t)= y ∀y ∈ Y and f (z,1)=h(z) where h : Z→ Y is a retraction, then f is called a strong retraction ofZ on Y , and Y is called a deformation retract of Z.

To illustrate the idea of contractibility, observe that the closed Euclidean ballin �n of radius ξ , centered at x, namely

Bn = clos(Bd(x, ξ)

)= {y ∈ �n : d(x, y)≤ ξ},

is obviously strictly convex and compact. Moreover the center, {x}, is clearly a de-formation retract of Bn. To see this let f (y, t)= (1− t)y + tx for y ∈ Bn. Clearlyf (y,1)= h(y)= {x} so h : Bn→{x} and h(x)= x. Since f is continuous, this alsoimplies that Bn is contractible. A continuous function such as f : Bn×[0,1]→ Bn,or more generally f : Z × [0,1] → Z, is often called a homotopy and writtenft : Z→ Z where ft (z) = f (z, t). The homotopy ft between the identity and theretraction f1 of Z on Y means that Z and Y are “topologically” equivalent (in somesense). Thus the ball Bn and the point {x} are topologically equivalent. More gen-erally, if Y is contractible and there is a strong retraction of Z on Y , then Z is alsocontractible.

Lemma 3.21 Let X be a topological space. If Y is contractible and Y ⊂ Z ⊂ X

such that Y is a deformation retract of Z, then Z is contractible.

Proof Let g : Z × [0,1] → Z be the strong retraction of Z on Y , and let f : Y ×[0,1]→ Y be the contraction of Y to y ∈ Y . Define

r :Z×[

0,1

2

]→Z by r(x, t)= g(z,2t)

r ′ :Z×[

1

2,1

]→Z by r ′(z, t)= f

(g(z,1),2t − 1

).

To see this define a contraction s of Z onto y0. Note that r(z, 12 ) = r ′(z, 1

2 ),since g(z,1) = f (g(z,1),0). This follows because g is a strong retraction and sog(z,1) ∈ Y , g(y,1)= f (y,0)= y if y ∈ Y . Clearly s : Z × [0,1]→ Z (defined bys(z, t)= r(z, t) if t < 1

2 , s(z, t)= r ′(z, t) if t ≥ 12 ) is continuous and is the identity

at t = 0. Finally if t = 1, then s(z,1)= f (g(z,1),1)= y0. �

AU

TH

OR

’S P

RO

OF



1979

1980

1981

1982

1983

1984

1985

1986

1987

1988

1989

1990

1991

1992

1993

1994

1995

1996

1997

1998

1999

2000

2001

2002

2003

2004

2005

2006

2007

2008

2009

2010

2011

2012

2013

2014

2015

2016

2017

2018

2019

2020

2021

2022

2023

2024

Lemma 3.22 If Z is a (non-empty) compact, convex set in �n, then it is con-tractible.

Proof For any x in the interior of Z there exists some ξ > 0 such that Bn =clos(Bd(x, ξ)) is contained in Z. (Indeed ξ can be chosen so that Bn is containedin the interior of Z.) As observed in Example 3.3, the closed ball, Bn, is both com-pact and strictly convex. By Lemma 3.17, the distance function d(z,−) : Bn →�is continuous for each z ∈ Z, and so there exists a point y(z) in Bn, say, such thatd(z, y(z)) < d(z, y)) ∀y ∈ Bn. Then d(z, y(z)) = d(z,Bn), the distance betweenz and Bn. Indeed d(z, y(z)) = 0 iff z ∈ Bn. Moreover for each z ∈ Z, y(z) isunique. Define the function f : Z×[0,1]→ Z by f (z, t)= tz+ (1− t)y(z). Sincey(z) ∈ Bn ⊂ Z for each z, and Z is convex, f (z, t) ∈ Z for all t ∈ (0,1]. Clearly ifz ∈ Bn then f (z, t)= z for all t ∈ [0,1] and f (−,1)= h : Z→ Bn is a retraction.Thus f is a strong retraction, and Bn is a deformation retract of Z. By Lemma 3.21,Z is contractible. �

Note that compactness of Z is not strictly necessary for the validity of this lemma.

Lemma 3.23 If Z is contractible to z0 and Y ⊂ Z is a retract of Z by h : Z→ Y

then Y is contractible to h(z0).

Proof Let f : Z × [0,1] → Z be the contraction of Z on z0, and let h : Z → Y

be the retraction. Clearly h ◦ f : Z × [0,1] → Z→ Y . If y ∈ Y , then f (y,0) = y

and h(y)= y (because h is a retraction). Thus h ◦ f (y,0)= y. Moreover f (z,1)=z0 ∀ ∈ Z, so h ◦ f (z,1)= h(z0). �

Clearly being a deformation retract of Z is a much stronger property than being aretract of Z. Both of these properties are useful in proving that any compact convexset has the fixed point property, and that the sphere is neither contractible nor hasthe fixed point property.

Remember that the sphere of radius ξ in �n, with center x, is

Sn−1 = Boundary(clos(Bd(x, ξ)

))= {y ∈ �n : d(x, y)= ξ}.

Now let x0 ∈ Sn−1 be the north pole of the sphere. We shall give an intuitiveargument why D = Sn−1\{x0} is contractible, but Sn−1 is not contractible.

Example 3.9 Let D = Sn−1\{x0} and let Z be a copy of D which is flattened atthe south pole. Let D0 be the flattened disc round the South Pole, xS . Clearly D0is homeomorphic to an (n− 1) dimensional ball Bn−1 centered on xS . Then Z ishomeomorphic to the object D0 × [0,1). There is obviously a strong retraction ofD onto D0. This retraction may be thought of as the function that moves any pointz ∈ Sn−1\{x0} down the lines of longitude to D0. Since D0 is compact, convexit is contractible to xS and thus, by Lemma 3.20, there is a contraction f : D ×[0,1]→D to xS .

AU

TH

OR

’S P

RO

OF



2025

2026

2027

2028

2029

2030

2031

2032

2033

2034

2035

2036

2037

2038

2039

2040

2041

2042

2043

2044

2045

2046

2047

2048

2049

2050

2051

2052

2053

2054

2055

2056

2057

2058

2059

2060

2061

2062

2063

2064

2065

2066

2067

2068

2069

2070

Fig. 3.22 The retraction ofBn on Sn−1

To indicate why Sn−1 cannot be contractible, let us suppose without loss of gen-erality, that g : Sn−1 × [0,1] → Sn−1 is a contraction of Sn−1 to xS , and that g

extends the contraction f :D×[0,1]→D (i.e., g(z, t)= f (z, t) whenever z ∈D).Now f (y, t) maps each point y ∈D to a point further south on the longitudinal linethrough y. If we require g(y, t) = f (y, t) for each y ∈ D, and we require g to becontinuous at (x0,0) then it is necessary for g(x0, t) to be an equatorial circle inSn−1. In other words if g is a function it must fail continuity at (x0,0). While thisis not a formal argument, the underlying idea is clear: The sphere Sn−1 contains ahole, and it is topologically different from the ball Bn.

Brouwer’s Theorem Any compact, convex set in �n has the fixed point property.

Proof We prove first that the ball Bn ≡ B has the fixed point property. Supposeotherwise: that is there exists a continuous function f : B → B with x �= f (x) forall x ∈ B .

Since f (x) �= x, construct the arc from f (x) to x and extend this to the boundaryof B . Now label the point where the arc and the boundary of B intersect as h(x).Since the boundary of B is Sn−1, we have constructed a function h : B→ Sn−1. Itis easy to see that h is continuous (because f is continuous). Moreover if x′ ∈ Sn−1

then h(x′) ∈ Sn−1. Since Sn−1 ⊂ B , it is clear that h : B → Sn−1 is a retraction.(See Fig. 3.22.) By Lemma 3.23, the contractibility of B to its center, x0 say, im-plies that Sn−1 is contractible to h(x0). But Example 3.9 indicates that Sn−1 is notcontractible. The contradiction implies that any continuous function f : B→ B hasa fixed point.

Now let Y be any compact convex set in �n. Then there exists for some ξ

and y0 ∈ Y , a closed ξ -ball, centered at y0, such that Y is contained in B =clos(Bd(y0, ξ)). As in the proof of Lemma 3.22, there exists a strong retractiong : B × [0,1] → B; so Y is a deformation retract of B . (See Fig. 3.23.) In par-ticular g(−,1) = h : B → Y is a retraction. If f : Y → Y is continuous, thenf ◦ h : B → Y → Y ⊂ B is continuous and has a fixed point. Since the image off ◦ h is in Y , this fixed point, y1, belongs to Y . Hence f ◦ h(y1) = y1 for somey1 ∈ Y . But h= id (the identity) on Y , so h(y1)= y1 and thus y1 = f (y1). Conse-quently y1 ∈ Y is a fixed point of f . Thus Y has the fixed point property. �

AU

TH

OR

’S P

RO

OF



2071

2072

2073

2074

2075

2076

2077

2078

2079

2080

2081

2082

2083

2084

2085

2086

2087

2088

2089

2090

2091

2092

2093

2094

2095

2096

2097

2098

2099

2100

2101

2102

2103

2104

2105

2106

2107

2108

2109

2110

2111

2112

2113

2114

2115

2116

Fig. 3.23 The strongretraction of B on Y

Example 3.10 The standard compact, convex set in �n is the (n−1)-simplex Δn−1

defined by

Δ=Δn−1 ={

x = (x1, . . . , xn) ∈ �n :n∑

i=1

xi = 1, and xi ≥ 0 ∀ i

}

.

Δ has n vertices {x0i }, where x0

i = (0, . . . ,1, . . . ,0) with a 1 in the ith entry. Anedge between x0

i and x0j is the arc 〈〈x0

i , x0j 〉〉 or convex set of the form

{x ∈ �n : x = λx0

i + (1− λ)x0j for λ ∈ [0,1]}.

An s-dimensional face of Δ is simply the convex hull of (s+1) different vertices.Note in particular that there are n different (n− 2) dimensional faces. Such a face isopposite the vertex x0

i , so we may label this face Δn−2i . These n different faces have

empty intersection. However any subfamily, F , of this family of n faces (where Fhas cardinality at most (n − 1)), does have a non-empty intersection. In fact if Fhas cardinality (n− 1) then the intersection is a vertex.

Brouwer’s Theorem allows one to derive further results on the existence ofchoice.

Lemma 3.24 Let Q :Δ→�n be an LDC correspondence from the (n−1) dimen-sional simplex to �n, such that Q(x) is both non-empty and convex, for each x ∈Δ.Then there exists a continuous selection, f , for Q, namely a continuous functionf :Δ→�n such that f (x) ∈Q(x) for all x ∈Δ.

Proof Since Q(x) �= Φ , ∀x ∈ Δ, then for each x ∈ Δ,x ∈ Q−1(y) for some y ∈�n. Hence {Q−1(y) : y ∈ �n} is a cover for Δ. Since Q is LDC, Q−1(y) is open,∀y ∈ �n, and thus the cover is an open cover. Δ is compact. As in the proof ofLemma 3.9, there is a finite index set, A = {y1, . . . , yk} of points in �n such that{Q−1(yi) : yi ∈ A} covers Δ. Define αi :Δ→� by αi(x)= d(x,Δ\Q−1(yi)) fori = 1, . . . , k, and let gi :Δ→� be given by gi(x)= αi(x)/

∑kj=1 αj (x). As before,

AU

TH

OR

’S P

RO

OF



2117

2118

2119

2120

2121

2122

2123

2124

2125

2126

2127

2128

2129

2130

2131

2132

2133

2134

2135

2136

2137

2138

2139

2140

2141

2142

2143

2144

2145

2146

2147

2148

2149

2150

2151

2152

2153

2154

2155

2156

2157

2158

2159

2160

2161

2162

d is the distance operator, so αi(x) is the distance from x to Δ\Q−1(yi). {gi} isknown as a partition of unity for Q. Clearly

∑gi(x)= 1 for all x ∈Δ, and gi(x)=

0 iff x ∈Δ\Q−1(yi) (since Δ\Q−1(yi) is closed and thus compact). Finally definef :Δ→�n by f (x)=∑k

i=1 gi(x)yi . By the construction, gi(x)= 0 iff yi /∈Q(x),thus f (x) is a convex combination of points all in Q(x). Since Q(x) is convex,f (x) ∈Q(x). By Lemma 3.17, each αi is continuous. Thus f is continuous. �

Lemma 3.25 Let P : Δ→ Δ be an LDC correspondence such that for all x ∈Δ,P (x) is convex and x /∈ Con P(x), the convex hull of P(x). Then the choiceCP (Δ) is non empty.

Proof Suppose CP (Δ)=Φ . Then P(x) �=Φ ∀x ∈Δ. By Lemma 3.24, there existsa continuous function f : Δ→ Δ such that f (x) ∈ P(x) ∀x ∈ Δ. By Brouwer’sTheorem, there exists a fixed point x0 ∈ Δ such that x0 = f (x0). This contradictsx /∈ ConP(x), ∀x ∈Δ. Thus CP (Δ) �=Φ . �

These two lemmas are stated for correspondences with domain the finite dimen-sional simplex, Δ. Clearly they are valid for correspondences with domain a (finitedimensional) compact convex space. However both results can be extended to (in-finite dimensional) topological vector spaces. The general version of Lemma 3.24is known as Michael’s Selection Theorem (Michael 1956). However it is necessaryto impose conditions on the domain and codomain spaces. In particular it is nec-essary to be able to construct a partition of unity. For this purpose we can use acondition call “paracompactness” rather than compactness. Paracompactness of aspace X requires that there exist, at any point x ∈X, an open set Ux containing x,such that for any open cover {Ui} of X, only finitely many of the open sets of thecover intersect Ux . To construct the continuous selection it is also necessary thatthe codomain Y of the correspondence has a norm, and is complete (essentially thismeans that a limit of a convergent sequences of points is contained in Y ). A com-plete normed topological vector space Y is called a Banach space. We also needY to be “separable” (ie Y contains a countable dense subset.) If Y is a separableBanach space we say it is admissible.

Michael’s Selection Theorem employs a property, called lower hemi-continuity.

Definition 3.9 A correspondence Q : X→ Y between the topological spaces, X

and Y , is lower hemi-continuous (LHC) if whenever U is open in Y , then the set{x ∈X :Q(x)∩U �=Φ} is open in X.

Michael’s Selection Theorem Suppose Q : X → Y is a lower hemi-continuouscorrespondence from a paracompact, Hausdorff topological space X into the ad-missible space Y , such that Q(x) is non-empty closed and convex, for all x ∈ X.Then there exists a continuous selection f :X→ Y for Q.

Lemma 3.24 also provides the proof for a useful intersection theorem known asthe Knaster-Kuratowski-Mazurkiewicz (KKM) Theorem.

AU

TH

OR

’S P

RO

OF



2163

2164

2165

2166

2167

2168

2169

2170

2171

2172

2173

2174

2175

2176

2177

2178

2179

2180

2181

2182

2183

2184

2185

2186

2187

2188

2189

2190

2191

2192

2193

2194

2195

2196

2197

2198

2199

2200

2201

2202

2203

2204

2205

2206

2207

2208

Before stating this theorem, consider an arbitrary collection {x1, . . . , xk} of dis-tinct points in �n. Then clearly the convex hull, Δ, of these points can be identifiedwith a (k− 1)-dimensional simplex. Let S ⊂ {1, . . . , k} be any index set, and let ΔS

be the simplex generated by this collection of s − 1 points (where s = |S|).

KKM Theorem Let R :X→ Y be a correspondence between a convex set X con-tained in a Hausdorff topological vector space Y such that R(x) �=Φ for all x ∈X.Suppose that for at least one point x0 ∈X,R(x0) is compact. Suppose further thatR(x) is closed for all x ∈X. Finally for any set {x1, . . . , xk} of points in X, supposethat

Con{x1, . . . , xk} ⊂k⋃

i=1

R(xi).

Then⋂

x∈X R(x) is non empty.

Proof By Lemma 3.8, since R(x0) is compact,⋂

x∈X R(x) is non-empty iff⋂k

i=1 R(xi) �= Φ for any finite index set. So let K = {1, . . . , k} and let Δ bethe (k − 1)-dimensional simplex spanned by {x1, . . . , xk}. Define P : Δ→ Δ byP(x) = {y ∈ Δ : x ∈ Δ\R(y)} and define Q : Δ→ Δ by Q(x) = ConP(x), theconvex hull of P(x).

But P−1(y) = {x ∈ Δ : y ∈ P(x)} = Δ\R(y) is an open set, in Δ, and so P isLDC. Thus Q is LDC. Now suppose that

⋂i∈K R(xi)=Φ .

Thus for each x ∈ Δ there exists xi(i ∈ K) such that x /∈ R(xi). But then x ∈Δ−R(xi) and so x ∈ P−1(xi). In particular, for each x ∈Δ, P(x), and thus Q(x),is non-empty. Moreover {Q−1(xi) : i ∈K} is an open cover for Δ. As in the proofof Lemma 3.24, there is a partition of unity for Q. (We need Y to be Hausdorff forthis construction.) In particular there exists a continuous selection f : Δ→ Δ forQ. By Brouwer’s Theorem, f has a fixed point x0 ∈Δ. Thus x0 ∈ Con P(x0), andso x0 ∈ Con{y1, . . . , yk} where yi ∈ P(x0) for i ∈ K . But then x0 ∈ Δ\R(yi) fori ∈K , and so x0 /∈R(yi) for i ∈K .

Hence

Con{y1, . . . , yk} �⊂k⋃

i=1

R(yi).

This contradicts the hypothesis of the Theorem. Consequently⋂

i∈K R(xi) �=Φ

for any finite vertex set K . By compactness⋂

x∈X R(x) �=Φ . �

We can immediately use the KKM theorem to prove a fixed point theorem for acorrespondence P from a compact convex set X to a Hausdorff topological vectorspace, Y . In particular X need not be finite dimensional.

AU

TH

OR

’S P

RO

OF


3.8 Political and Economic Choice 125

2209

2210

2211

2212

2213

2214

2215

2216

2217

2218

2219

2220

2221

2222

2223

2224

2225

2226

2227

2228

2229

2230

2231

2232

2233

2234

2235

2236

2237

2238

2239

2240

2241

2242

2243

2244

2245

2246

2247

2248

2249

2250

2251

2252

2253

2254

Browder Fixed Point Theorem Let Q : X → X be a correspondence where X

is a compact convex subset of the Hausdorff topological vector space, Y . Supposefurther that Q is LDC, and that Q(x) is convex and non-empty for all x ∈X. Thenthere exists x0 ∈X such that x0 ∈Q(x0).

Proof Suppose that x /∈Q(x) ∀x ∈ X. Define R : X→ X by R(x) = X\Q−1(x).Since Q is LDC, R(x) is closed and thus compact ∀x ∈X. To use KKM, we seek toshow that Con {x1, . . . , xk} ⊂⋃k

i=1 R(xi) for any finite index set, K = {1, . . . , k}.We proceed by contradiction. That is, suppose that there exists x0 in X with x0 ∈

Con {x1, . . . , xk} but x0 /∈R(xi) for i ∈K . Then x0 ∈Q−1(xi), so xi ∈Q(x0),∀i ∈K . But then x0 ∈ Con Q(x0). Since Q(x) is convex ∀ x ∈ X, this implies thatx0 ∈Q(x0), a contradiction. Consequently x ∈ R(xi) for some i ∈K . By the KKMTheorem,

⋂x∈X R(x) �=Φ .

Thus ∃ x0 ∈ X with x0 ∈ R(x), and so x0 ∈ X\Q−1(x) ∀x ∈ X. Thus x0 /∈Q−1(x) and x /∈ Q(x0) ∀x ∈ X. This contradicts the assumption that Q(x) �=Φ ∀x ∈ X. Hence ∃ x0 ∈X with x0 ∈Q(x0). �

Ky Fan Theorem Let P : X → X be an LDC correspondence where X isa compact convex subset of the Hausdorff topological vector space, Y . If x /∈ConP(x), ∀x ∈ X, then the choice CP (X0) �= Φ for any compact convex subset,X0 of X.

Proof Define Q : X→ X by Q(x) = Con P(x). If Q(x) �= Φ for all x ∈ X, thenby the Browder fixed point theorem, ∃ x0 ∈ X with x0 ∈ Q(x0). This contradictsx /∈ Con P(x) ∀ x ∈ X. Hence Q(x0) = Φ for some x0 ∈ X. Thus CP (X) = {x ∈X : P(x)=Φ} is non-empty. The same inference is valid for any compact, convexsubset X0 of X. �

3.8 Political and Economic Choice

The results outlined in the previous section are based on the intersection propertyof a family of closed sets. With compactness, this result can be extended to the caseof a correspondence R : X→ X to show that

⋂x∈X R(x) �= Φ . If we regard R as

derived from an LDC correspondence P :X→X by R(x)=X\P−1(x) then R(x)

can be interpreted as the set of points “no worse than” or “at last as good as” x.But then the choice CP (X) =⋂x∈X R(x), since such a choice must be at least

as good as any other point. The finite-dimensional version (Lemma 3.25) of theproof that the choice is non-empty is based simply on a fixed point argument usingBrouwer’s Theorem. To extend the result to an infinite dimensional topological vec-tor space we reduced the problem to one on a finite dimensional simplex, Δ, spannedby {x1, . . . , xk : xi ∈X} and then showed essentially that

⋂x∈Δ R(x) is non empty.

By compactness then⋂

x∈X R(x) is non-empty. There is, in fact, a related infinitedimensional version of Brouwer’s Fixed Point Theorem, known as Schauder’s fixedpoint theorem for a continuous function f :X→ Y , where Y is a compact subset ofthe convex Banach space X.

AU

TH

OR

’S P

RO

OF



2255

2256

2257

2258

2259

2260

2261

2262

2263

2264

2265

2266

2267

2268

2269

2270

2271

2272

2273

2274

2275

2276

2277

2278

2279

2280

2281

2282

2283

2284

2285

2286

2287

2288

2289

2290

2291

2292

2293

2294

2295

2296

2297

2298

2299

2300

Fig. 3.24 ???

One technique for proving existence of an equilibrium price vector in an ex-change economy (as discussed in Sect. 3.5) is to construct a continuous functionf :Δ→Δ, where Δ is the price simplex of feasible price vectors, and show that f

has a fixed point (using either the Brouwer or Schauder fixed point theorems).An alternative technique is to use the Ky Fan Theorem to prove existence of an

equilibrium price vector. This technique permits a proof even when preferences arenot representable by utility functions. More importantly, perhaps, it can be used inthe infinite dimensional case.

Example 3.11 To illustrate the Ky Fan Theorem with a simple example, considerFig. 3.24, which reproduces Fig. 1.11 from Chap. 1.

It is evident that the inverse preference, P−1, is not LDC: for example, P−1( 34 )=

( 14 , 1

2 ] ∪ ( 34 ,1] which is not open. As we saw earlier the choice of P on the unit

interval is empty. In fact, to ensure existence of a choice we can require simply thatP be lower hemi-continuous. This we can do by deleting the segment ( 1

2 ,1) abovethe point 1

2 . If the choice were indeed empty, then by Michael’s Selection Theoremwe could find a continuous selection f : [0,1] → [0,1] for P . By Brouwer’s fixedpoint theorem f has a fixed point, x0, say, with x0 ∈ P(x0). By inspection the fixedpoint must be x0 = 1

2 . If we require P to be irreflexive (since it is a strict preferencerelation) then this means that 1

2 /∈ P( 12 ) and so the choice must be CP ([0,1])= { 1

2 }.Notice that the preference displayed in Fig. 3.24 cannot be represented by a utilityfunction. This follows because the implicit indifference relation is intransitive.

The Ky Fan Theorem gives a proof of the existence of a choice for a “spatialvoting game”. Remember a social choice procedure, σ is simple iff it is defined bya family D of decisive coalitions, each a subset of the society M . In this case ifπ = (P1, . . . ,Pm) is a profile on the topological space X, then the social preferenceis given by

xσ(π)y iff x ∈⋃

A∈D

⋂

i∈A

Pi(y)= PD(y).

njschofi

Cross-Out

njschofi

Sticky Note

Illustration of the Ky Fan Theorem

njschofi

Sticky Note

See Figure 3.25

AU

TH

OR

’S P

RO

OF



2301

2302

2303

2304

2305

2306

2307

2308

2309

2310

2311

2312

2313

2314

2315

2316

2317

2318

2319

2320

2321

2322

2323

2324

2325

2326

2327

2328

2329

2330

2331

2332

2333

2334

2335

2336

2337

2338

2339

2340

2341

2342

2343

2344

2345

2346

Fig. 3.25 The graph ofindifference

Here we use Pi :X→X to denote the preference correspondence of individuali. For coalition A, then x ∈ PA(y)=⋂i∈A Pi(y) means every member of A prefersx to y.

Thus x ∈⋃A∈D PA(y) means that for some coalition A ∈ D, all members ofA prefer x to y. Finally we write x ∈ PD(y) for the condition that x is sociallypreferred to y. The choice for σ(π) on X is then

Cσ(π)(X)= {x : PD(x)=Φ}.

Nakamura Theorem Suppose X is a compact convex topological vector spaceof dimension n. Suppose that π = (P1, . . . ,Pm) is a profile on X such that eachpreference Pi :X→X is (i) LDC; and (ii) semi-convex, in the sense that x /∈ ConPi(x) for all x ∈ X. If σ is simple and has Nakamura number k(σ ), and if n ≤k(σ )− 2 then the choice Cσ(π)(X) is non-empty.

Proof For any point x, y ∈ P−1D (x) means x ∈ PD(y) and so x ∈ Pi(y) ∀i ∈ A,

some A ∈D. Thus y ∈ P−1i (x) ∀i ∈A or y ∈⋂i∈A P−1

i (x) or y ∈⋃D⋂

A P−1i (x).

But each Pi is LDC and so P−1i (x) is open, for all x ∈X. Finite intersection of open

sets is open, and so PD is LDC.Now suppose that PD is not semi-convex (that is x ∈ ConPD(x) for some

x ∈X). Since X is n-dimensional and convex, this means it is possible to find a set ofpoints {x1, . . . , xn+1} such that x ∈ Con {x1, . . . , xn+1} and such that xj ∈ PD(x) foreach j = 1, . . . , n+1. Without loss of generality this means there exists a subfamilyD′ = {A1, . . . ,An+1} of D such that xj ∈ PAj

(x). Now n+ 1 ≤ k(σ )− 1, and bythe definition of the Nakamura number, the collegium K(D′) is non-empty. In par-ticular there is some individual i ∈ Aj , for j = 1, . . . , n+ 1. Hence xj ∈ Pi(x) forj = 1, . . . , n+1. But then x ∈ ConPi(x). This contradicts the semi-convexity of in-dividual preference. Consequently PD is semi-convex. The conditions of the Ky FanTheorem are satisfied, and so PD(x)=Φ for some x ∈X. Thus Cσ(π)(x) �=Φ . �

AU

TH

OR

’S P

RO

OF



2347

2348

2349

2350

2351

2352

2353

2354

2355

2356

2357

2358

2359

2360

2361

2362

2363

2364

2365

2366

2367

2368

2369

2370

2371

2372

2373

2374

2375

2376

2377

2378

2379

2380

2381

2382

2383

2384

2385

2386

2387

2388

2389

2390

2391

2392

It is worth observing that in finite dimensional spaces the Ky Fan Theorem isvalid with the continuity property weakened to lower hemi-continuity (LHC). Notefirst that if P is LDC then it is LHC; this follows because {x ∈ X : P(x) ∩ V �=Φ} =⋃y∈V (P−1(y)∩X) is the union of open sets and thus open.

Moreover (as suggested in Example 3.11) if P is LHC and the choice is nonempty, then the correspondence x → ConP(x) has a continuous selection f (byMichael’s Selection Theorem). By the Brouwer Fixed Point Theorem, there is afixed point xo such that xo ∈ ConP(x0). This violates semi-convexity of P . Thusthe Nakamura Theorem is valid when preferences are LHC. The finite dimensionalversion of the Ky Fan Theorem can be used to show existence of a Nash equilibrium(Nash 1950).

Definition 3.10(i) A Game G = {(Pi,X) : i ∈M} for society M consists of a strategy space,

Xi , and a strict preference correspondence Pi :X→X for each i ∈M , whereX =Πi Xi =X1 × · · · ×Xm is the joint strategy space.

(ii) In a game G, the Nash improvement correspondence for individual i is de-fined by

P̂i :X→X where y ∈ P̂i(x) iff y ∈ Pi(x) and

y = (x1, . . . , xi−1, x∗i , . . . xm

),

x = (x1, . . . , xi−1, xi, . . . , xm).

(iii) The overall Nash improvement correspondence is

P̂ =⋃

i∈M

Pi :X→X.

(iv) A point x ∈X is a Nash Equilibrium for the game G iff P̂ (x)=Φ .

Bergstrom (1975, 1992) showed the following.

Bergstrom’s Theorem Let G = {(Pi,X)} be a game, and suppose each Xi ⊂ �n

is a non-empty compact, convex subset of �n. Suppose further that for all i ∈M,P̂i

is both semi-convex and LHC. Then there exists a Nash equilibrium for G.

Proof Since each P̂i is LHC, it follows easily that P̂ :X→X is LHC. To see thatP̂ is semi-convex, suppose that y ∈ Con P̂ (x), then y =∑λiy

i ,∑

i∈N λi = 1,λi ≥ 0, ∀i ∈ N and yi ∈ P̂i(x). By the definition of P̂j , y − x =∑i∈M λiz

i wherezi = yi − x.

This follows because yi and x only differ on the ith coordinate, and so zi =(0, . . . , zi

0, . . .) where zi0 ∈ Xi . Moreover, λi �= 0 iff zi �= 0 because P̂i is semi-

convex.

AU

TH

OR

’S P

RO

OF



2393

2394

2395

2396

2397

2398

2399

2400

2401

2402

2403

2404

2405

2406

2407

2408

2409

2410

2411

2412

2413

2414

2415

2416

2417

2418

2419

2420

2421

2422

2423

2424

2425

2426

2427

2428

2429

2430

2431

2432

2433

2434

2435

2436

2437

2438

Fig. 3.26 Failure ofsemi-convexity

Clearly {zi : i ∈ M,λi �= 0} is a linearly independent set, so y = x iff λi = 0∀i ∈M . But then yi = x ∀i ∈M , which again violates semi-convexity of Pi . Thusy �= x and so P̂ is semi-convex. By the Ky Fan Theorem, for X finite dimensional,the choice of P̂ on X is non-empty. Thus the Nash Equilibrium is non-empty. �

Although the Nakamura Theorem guarantees existence of a social choice for asocial choice rule, σ , for any semi-convex and LDC profile in dimension at mostk(σ )− 2, it is easy to construct situations in higher dimensions with empty choice.The example we now present also describes a game with no Nash equilibrium.

Example 3.12 Consider a simply voting procedure with M = {1,2,3} and let Dconsist of any coalition with at least two voters. Let X be a compact convex setin �2, and construct preferences for each i in M as follows. Each i has a “blisspoint” xi ∈ X and a preference Pi on X such that for y, x ∈ Pi(y) iff ‖x − xi‖ <

‖y − yi‖. The preference is clearly LDC and semi-convex (since Pi(x) is a convexset and x /∈ P(x) for all x ∈ X). Now let Δ = Con{x1, x2, x3} the 2-dimensionalsimplex in X (for convenience suppose all xi are distinct and in the interior of X).For each A⊂M let PA(x)=⋂i∈A Pi(x) as before.

In particular the choice for the strict Pareto rule is CPM(X)= CM(X)=Δ. This

can be seen by noting that if x ∈ CM(X) iff there is no point y ∈ M such that‖y − xi‖< ‖x − xi‖ ∀i ∈M . Clearly this condition holds exactly at those points inΔ. For this reason preferences of this kind are called “Euclidean preferences”.

Now consider a point in the interior of X. At x the preferred sets for the threecoalitions (D′ = {1,2}, {1,3}, {2,3}) do not satisfy the semi-convexity property.Figure 3.26 shows that

x ∈ Con{P12(x),P13(x),P23(x)

}.

While PD′ is LDC, it violates semi-convexity. Thus the Ky Fan Theorem cannotbe used to guarantee existence of a choice. To illustrate the connection with Walker’sTheorem, note also that there is a voting cycle. That is 1 prefers a to b to c, 2 prefersb to c to a, and 3 prefers c to a to b. The reason the cycle can be constructed is that

AU

TH

OR

’S P

RO

OF



2439

2440

2441

2442

2443

2444

2445

2446

2447

2448

2449

2450

2451

2452

2453

2454

2455

2456

2457

2458

2459

2460

2461

2462

2463

2464

2465

2466

2467

2468

2469

2470

2471

2472

2473

2474

2475

2476

2477

2478

2479

2480

2481

2482

2483

2484

Fig. 3.27 Empty Nashequilibrium

the Nakamura number is 3, and dimension (X) = 2, thus n = k(σ ) − 1. In factthere is no social choice in X. This voting cycle also defines a game with no Nashequilibrium.

Let {1,2} be the two parties, and let the “strategy spaces” be X1 = X2 = Δ.Say party i adopts a winning position yi ∈ Xi over party j ′s position, yj ∈ Xj

iff yi ∈ PD′(yj ), so party i gains a majority over party j . This induces a Nashimprovement correspondence P̂j , for j = 1,2.

For example P̂1 : Y → Y where Y =X1×X2 is defined by (y∗1 , y2) ∈ P̂1(y1, y2)

iff y∗1 ∈ PD′(y2), so y∗1 is a winning position over y2 but y1 /∈ PD′(y2), so y1 is nota winning position. In other words party 1 prefers to move from y1 to y∗1 , so as towin. It is evident if we choose y1, y2 and the points {z1, z2, z3} as in Fig. 3.27, thenzi ∈ PD′(y2) so (zi, y2) ∈ P̂1(y1, y2) for i = 1,2,3 and y1 ∈ Con{z1, z2, z3}.

Thus (y1, y2) ∈ Con P̂1(y1, y2). Because of the failure of semi-convexity of bothP̂1 and P̂2, a Nash equilibrium cannot be guaranteed. In fact the Nash equilibriumis empty.

We now briefly indicate how the Ky Fan Theorem can be used to show existenceof an equilibrium price vector in an exchange economy. First of all each individual i

initially holds a vector of endowments ei ∈ �n. A price vector p ∈Δn−1 belongs tothe (n− 1)-dimensional simplex: that is p = (p1, . . . , pn) such that

∑ni=1 pi = 1.

An allocation x ∈ X =∏i∈N Xi ⊂ (�n)m where Xi is i’s consumption set in�n+ (here xi ∈ �n+ iff xij ≥ 0, j = 1, . . . , n). At the price vector p, the ith budget setis

Bi(p)= {xi ∈ �n+ : 〈p,x〉 ≤ 〈p, ei〉}.

At price p, the demand vector x = (x1, . . . , xm) satisfies the optimality conditionP̂i(x)∩ {x ∈X : xi ∈ Bi(p)} =Φ for each i.

As before P̂i : X→ X is the Nash improvement correspondence (as in Defini-tion 3.10).

As we discussed earlier in Sect. 3.5.1, an equilibrium price vector p is a pricevector p = (p1, . . . , pn) such that the demand vector x satisfies the optimality con-dition at p and such that total demand does not exceed supply. This latter condition

AU

TH

OR

’S P

RO

OF



2485

2486

2487

2488

2489

2490

2491

2492

2493

2494

2495

2496

2497

2498

2499

2500

2501

2502

2503

2504

2505

2506

2507

2508

2509

2510

2511

2512

2513

2514

2515

2516

2517

2518

2519

2520

2521

2522

2523

2524

2525

2526

2527

2528

2529

2530

requires that∑

i∈M xi ≤∑i∈M ei (the two terms are both vectors in �n). That is, ifwe use the suffix of xj to denote commodity j , then

∑i∈M(xij − eij )≤ 0 for each

j = 1, . . . , n.Note also that a transformation p > λp, for a real number λ > 0, does not change

the budget set.This follows because Bi(p)= {xi ∈ �n+ : 〈p,x〉 ≤ 〈p, ei〉} = Bi(λp).Consequently if p is an equilibrium price vector, then so is λp. Without loss of

generality, then, we can normalize the price vector, p, so that ‖p‖ = 1 for somenorm on �n. We may do this for example by assuming that

∑nj=1 pj = 1 and that

pj ≥ 0 ∀j .For this reason we let Δn−1 represent the set of all price vectors.A further point is worth making. Since we assume that pj ≥ 0 ∀j it is possible

for commodity j to have zero price. But then the good must be in excess supply.To ensure this, we require the equilibrium price vector p and i’s demand xi at p

to satisfy the condition 〈p,xi〉 = 〈p, ei〉 As we noted in Sect. 3.5.1, this can beensured by assuming that preference is locally non-satiated in X. That is if we letP i(x)⊂Xi be the projection of P̂i onto Xi at x, then for any neighborhood U of xi

in Xi there exists x′i ∈U such that x′i ∈ P i(x).To show existence of a price equilibrium we need to define a price adjustment

mechanism.To this end define:

P̂0 :Δ×X→Δ by

P̂0(p, x)={p′ ∈Δ :

⟨p′ − p

∑

i∈M

(xi − ei)

⟩> 0

}.

(**)

Now let X∗ =Δ×X and define P ∗0 :X∗ →X∗ by (p′, x) ∈ P ∗0 (p, y) iff x = y and

p′ ∈ P̂0(p, x).In the same way for each i ∈M extend P̂i :X→X to P ∗i :X∗ →X∗ by letting

(p′, x) ∈ P ∗i (p, y) iff p′ = p and x ∈ P̂i(y). This defines an exchange game Ge ={(P ∗i ,X∗) : i = 0, . . . ,m}.

Bergstrom (1992) has shown (under the appropriate conditions of semi-convexity,LHC and local monotonicity for each P̂i ) that there is a Nash equilibrium to Ge.Note that e ∈ (�n)m is the initial vector of endowments. We can show that the Nashequilibria comprise a non-empty set {(p, x) ∈Δ×X} where x = (x1, . . . , xm) is avector of final demands for the members of M , and p is an equilibrium price vector.

A Nash equilibrium (p, x) ∈Δ×X satisfies the following properties:(i) Since P ∗i (p, x)=Φ for each i ∈N , it follows that xi is i’s demand. Moreover

by local monotonicity we have 〈p,xi〉 = 〈p, ei〉, ∀i ∈M . Thus

∑

i∈M

〈p,xi − ei〉 = 0. (*)

(ii) Now P ∗0 (p, x)=Φ so 〈p− p,∑

i∈M(xi − ei)〉 ≤ 0 ∀p ∈Δ (See (**)).

AU

TH

OR

’S P

RO

OF



2531

2532

2533

2534

2535

2536

2537

2538

2539

2540

2541

2542

2543

2544

2545

2546

2547

2548

2549

2550

2551

2552

2553

2554

2555

2556

2557

2558

2559

2560

2561

2562

2563

2564

2565

2566

2567

2568

2569

2570

2571

2572

2573

2574

2575

2576

Suppose that∑

i∈M(xi − ei) > 0. Then it is clearly possible to find p ∈Δ suchthat pj > 0, for some j , with 〈p,

∑i∈M(xi−ei)〉> 0. But this violates (*) and (**).

Consequently∑

i∈M(xij − eij )≤ 0 for j = 1, . . . , n.Thus x ∈ (�n)m satisfies the feasibility constraint

∑i∈M xi ≤∑i∈M ei .

Hence (p, x) is a free-disposal equilibrium, in the sense that total demand may beless than total supply. Bergstrom (1992) demonstrates how additional assumptionson individual preference are sufficient to guarantee equality of demand and supply.Section 4.4, below, discusses this more fully.

References

The reference for the Kuhn-Tucker Theorems is:

<unc> Kuhn, H. W., & Tucker, A. W. (1950). Non-linear programming. In Proceedings: 2nd Berke-ley symposium on mathematical statistics and probability, Berkeley: University of CaliforniaPress.

A useful further reference for economic applications is:

<unc> Heal, E. M. (1973). Theory of economic planning. Amsterdam: North Holland.

The classic references on fixed point theorems and the various corollaries of the theorems are:

Brouwer, L. E. J. (1912). Uber Abbildung von Mannigfaltikeiten. Mathematische Annalen, 71,97–115.

<unc> Browder, F. E. (1967). A new generalization of the Schauder fixed point theorem. MathematischeAnnalen, 174, 285–290.

<unc> Browder, F. E. (1968). The fixed point theory of multivalued mappings in topological vector spaces.Mathematische Annalen, 177, 283–301.

Fan, K. (1961). A generalization of Tychonoff’s fixed point theorem. Mathematische Annalen, 142,305–310.

Knaster, B., Kuratowski, K., & Mazerkiewicz, S. (1929). Ein Beweis des Fixpunktsatze fur n-dimensionale Simplexe. Fundamenta Mathematicae, 14, 132–137.

Michael, E. (1956). Continuous selections I. Annals of Mathematics, 63, 361–382.Nash, J. F. (1950). Equilibrium points in n-person games. Proceedings of the National Academy of

Sciences of the United States of America, 36, 48–49.<unc> Schauder, J. (1930). Der Fixpunktsatze in Funktionalräumn. Studia Mathematica, 2, 171–180.

A useful discussion of the relationships between the theorems can be found in:

<unc> Border, K. (1985). Fixed point theorems with applications to economics and game theory. Cam-bridge: Cambridge University Press.

njschofi

Cross-Out

njschofi

Sticky Note

Further Reading

njschofi

Sticky Note

See also Hildenbrand W. and Kirman A.P. (1976) Introduction to Equilibrium Analysis. Amsterdam: Elsevier.

AU

TH

OR

’S P

RO

OF


References 133

2577

2578

2579

2580

2581

2582

2583

2584

2585

2586

2587

2588

2589

2590

2591

2592

2593

2594

2595

2596

2597

2598

2599

2600

2601

2602

2603

2604

2605

2606

2607

2608

2609

2610

2611

2612

2613

2614

2615

2616

2617

2618

2619

2620

2621

2622

The proof of the Browder fixed point theorem by KKM is given in:

<unc> Yannelis, N., & Prabhakar, N. (1983). Existence of maximal elements and equilibria in lineartopological spaces. Journal of Mathematical Economics, 12, 233–245.

Applications of the Fan Theorem to show existence of a price equilibrium can be found in:

<unc> Aliprantis, C., & Brown, D. (1983). Equilibria in markets with a Riesz space of commodities.Journal of Mathematical Economics, 11, 189–207.

Bergstrom, T. (1975). The existence of maximal elements and equilibria in the absence of transi-tivity (Typescript). Washington University in St. Louis.

Bergstrom, T. (1992). When non-transitive relations take maxima and competitive equilibriumcan’t be beat. In W. Neuefeind & R. Riezman (Eds.), Economic theory and international trade,Berlin: Springer.

<unc> Shafer, W. (1976). Equilibrium in economies without ordered preferences or first disposal. Journalof Mathematical Economics, 3, 135–137.

<unc> Shafer, W., & Sonnenschein, H. (1975). Equilibrium in abstract economies without ordered pref-erences. Journal of Mathematical Economics, 2, 345–348.

The application of the idea of the Nakamura number to existence of a voting equilibrium can befound in:

<unc> Greenberg, J. (1979). Consistent majority rules over compact sets of alternatives. Econometrica,41, 285–297.

Schofield, N. (1984). Social equilibrium and cycles on compact sets. Journal of Economic Theory,33, 59–71.

<unc> Strnad, J. (1985). The structure of continuous-valued neutral monotonic social functions. SocialChoice and Welfare, 2, 181–195.

Finally there are results on existence of a joint political economic equilibrium, namely an outcome(p, t, x, y), where (p, x) is a market equilibrium, t is an equilibrium tax schedule voted on undera rule D and y is an allocation of publicly provided goods.

<unc> Konishi, H. (1996). Equilibrium in abstract political economies: with an application to a publicgood economy with voting. Social Choice and Welfare, 13, 43–50.

AU

TH

OR

’S P

RO

OF


1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

4Differential Calculus and Smooth Optimisation

Under certain conditions a continuous function f : �n→�m can be approximatedat each point x in �n by a linear function df (x) : �n→�m, known as the differen-tial of f at x. In the same way the differential df may be approximated by a bilinearmap d2f (x). When all differentials are continuous then f is called smooth. For asmooth function f , Taylor’s Theorem gives a relationship between the differentialsat a point x and the value of f in a neighbourhood of a point. This in turn allows usto characterise maximum points of the function by features of the first and seconddifferential. For a real-valued function whose preference correspondence is convexwe can virtually identify critical points (where df (x)= 0) with maxima.

In the maximisation problem for a smooth function on a “smooth” constraint set,we seek critical points of the Lagrangian, introduced in the previous chapter. In par-ticular in economic situations with exogenous prices we may characterise optimumpoints for consumers and producers to be points where the differential of the utilityor production function is given by the price vector. Finally we use these results toshow that for a society the set of Pareto optimal points belongs to a set of generalisedcritical points of a function which describes the preferences of the society.

4.1 Differential of a Function

A function f : �→� is differentiable at x ∈ � if Limh→0f (x+h)−f (x)

hexists (and

is neither +∞ nor −∞). When this limit exists we shall write it as dfdx|x . Another

way of writing this is that as (xn)→ x then

f (xn)− f (x)

xn − x→ df

dx

∣∣∣∣x

,

the derivative of f at x.This means that there is a real number λ(x) = df

dx|x ∈ � such that f (x) =

λ(x)h+ ε|h|, where ε→ 0 as h→ 0.


135

http://dx.doi.org/10.1007/978-3-642-39818-6_4

AU

TH

OR

’S P

RO

OF


136 4 Differential Calculus and Smooth Optimisation

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67

68

69

70

71

72

73

74

75

76

77

78

79

80

81

82

83

84

85

86

87

88

89

90

91

92

Fig. 4.1 ???

Let df (x) be the linear function �→� given by df (x)(h) = λ(x)h. Then themap �→� given by

h→ f (x)+ df (x)(h)= g(x + h)

is a “first order approximation” to the map

h→ f (x + h).

In other words the maps h→ g(x + h) and h→ f (x + h) are “tangent” to oneanother where “tangent” means that

|f (x + h)− g(x + h)||h|

approaches 0 as h→ 0. Note that the map h→ f (x)+ df (x)(h)= g(x + h) has astraight line graph, and so df (x) is a “linear approximation” to the function f at x.

Example 4.11. Suppose f : � → � : x → x2. Then Limh→0

f (x+h)−f (x)h

=Limh→0

(x+h)2−x2

h= Limh→0

2hx+h2

h= 2x + Limh→0 h = 2x. Similarly if

f : �→� : x→ xr then df (x)= rxr−1.2. Suppose f : �→� : x→ sinx. Then

Limh→0

(sin(x + h)− sinx

h

)

= Limh→0

(sinx(cosh− 1)+ cosx sinh

h

)

= Limh→0

sinx

h

(−h2

2

)+ Lim

h→0

cosx

h(h)

= cosx.

njschofi

Cross-Out

njschofi

Sticky Note

The differential

njschofi

Sticky Note

See Figure 4.1

AU

TH

OR

’S P

RO

OF


4.1 Differential of a Function 137

93

94

95

96

97

98

99

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

3. f : �→� : x→ ex .

Limh→0

ex+h − ex

h= Lim

h→0

ex

h

[1+ h+ h2

2· · · − 1

]= ex.

4. f : �→� : x→ x4 if x ≥ 0, x2 if x < 0. Consider the limit as h approaches0 from above (i.e., h→ 0+). Then

Limh→0+

f (0+ h)− f (0)

h= h4 − 0

h= h3 = 0.

The limit as h approaches 0 from below is

Limh→0−

f (0+ h)− f (0)

h= h3 − 0

h= h2 = 0.

Thus df (0) is defined and equal to 0.5. f : �→�, by

x→−x2 x ≤ 0

x→ (x − 1)2 − 1 0 < x ≤ 1

x→−x x > 1

Limx→0

f (x)= 0

Limx→0+

f (x)= 0.

Thus f is continuous at x = 0.

Limx→1−

f (x)=−1

Limx→1+

f (x)=−1.

Thus f is continuous at x = 1.

Limx→0−

df (x)= Limx→0−

(−2x)= 0

Limx→0+

df (x)= Limx→0+

2(x − 1)=−2

Limx→1−

df (x)= Limx→1−

2(x − 1)= 0

Limx→1+

df (x)= Limx→1+

(−1)=−1.

Hence df (x) is not continuous at x = 0 and x = 1.To extend the definition to the higher dimension case, we proceed as follows:

njschofi

Sticky Note

See Figure 4.2

AU

TH

OR

’S P

RO

OF



139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

175

176

177

178

179

180

181

182

183

184

Fig. 4.2 ???

Definition 4.1 Let X,Y be two normed vector spaces, with norms ‖ ‖X , ‖ ‖Y , andsuppose f,g :X→ Y . Then say f and g are tangent at x ∈X iff

Lim‖h‖X→0

‖f (x + h)− g(x + h)‖Y‖h‖X = 0.

If there exists a linear map df (x) : X → Y such that the function g : X → Y

given by

g(x + h)= f (x)+ df (x)(h)

is tangent to f at x, then f is said to be differentiable at x, and df (x) is called thedifferential of f at x.

In other words df (x) is the differential of f at x iff there is a linear approxima-tion df (x) to f at x, in the sense that

f (x + h)− f (x)= df (x)(h)+ ‖h‖Xμ(h)

where μ :X→ Y and ‖μ(h)‖Y → 0 as ‖h‖X → 0.Note that since df (x) is a linear map from X to Y , then its image is a vector

subspace of Y , and df (x)(0) is the origin, 0, in Y .Suppose now that f is defined on an open ball U in X.For some x ∈ U , consider an open neighborhood V of x in U . The image of the

map

h→ g(x + h) for each h ∈U

will be of the form f (x) + df (x)(h), which is to say a linear subspace of Y , buttranslated by the vector f (x) from the origin.

If f is differentiable at x, then we can regard df (x) as a linear map from X toY , so df (x) ∈ L(X,Y ), the set of linear maps from X to Y . As we have shown inSect. 3.2 of Chap. 3, L(X,Y ) is a normed vector space, when X is finite dimen-sional. For example, for k ∈ L(X,Y ) we can define ‖k‖ by

‖k‖ = sup{∥∥k(x)

∥∥Y: x ∈X s.t. ‖x‖X = 1

}.

Let L(X,Y ) be L(X,Y ) with the topology induced from this norm.

njschofi

Cross-Out

njschofi

Sticky Note

Example 4.1.5

AU

TH

OR

’S P

RO

OF



185

186

187

188

189

190

191

192

193

194

195

196

197

198

199

200

201

202

203

204

205

206

207

208

209

210

211

212

213

214

215

216

217

218

219

220

221

222

223

224

225

226

227

228

229

230

When f : U ⊂X→ Y is continuous we shall call f a C0-map. If f is C0, anddf (x) is defined at x, then df (x) will be linear and thus continuous, in the sensethat df (x) ∈ L(X,Y ).

Hence we can regard df as a map

df :U → L(X,Y ).

It is important to note here that though the map df (x) may be continuous, themap df : U → L(X,Y ) need not be continuous at x. However when f is C0, andthe map

df :U → L(X,Y )

is continuous for all x ∈ U , then we shall say that f is a C1-differentiable mapon U . Let C0(U,Y ) be the set of maps from U to Y which are continuous on U ,and let C1(U,Y ) be the set of maps which are C1-differentiable on U . ClearlyC1(U,Y ) ⊂ C0(U,Y ). If f is a differentiable map, then df (x), since it is lin-ear, can be represented by a matrix. Suppose therefore that f : �n →�, and let{e1, . . . , en} be the standard basis for �n. Then for any h ∈ �n,h=∑n

i=1 hiei andso df (x)(h)=∑n

i=1 hidf (x)(ei)=∑ni=1 hiαi say.

Consider the vector (0, . . . , hi, . . . ,0) ∈ �n.Then by the definitions

αi = df (x)(0, . . . , ei, . . . ,0)

= Limhi→o

{f (x1, . . . , xi + hi, . . . , )− f (x1, . . . , xi, . . . , xn)

hi

}= ∂f

∂xi

∣∣∣∣x

,

where ∂f∂xi|x is called the partial derivative of f at x with respect to the ith coordi-

nate, xi .Thus the linear function df (x) : �→� can be represented by a “row vector” or

matrix

Df (x)=(

∂fj

∂x1

∣∣∣∣x

, . . . ,∂f

∂xn

∣∣∣∣x

).

Note that this representation is dependent on the particular choice of the basis for�n. This matrix Df (x) can also be regarded as a vector in �n, and is then called thedirection gradient of f at x. The ith coordinate of Df (x) is the partial derivative off with respect to xi at x.

If h is a vector in �n with coordinates (h1, . . . , hn) with respect to the standardbasis, then

df (x)(h)=n∑

i=1

hi

∂f

∂xi

∣∣∣∣x

= ⟨Df (x),hi

⟩

where 〈Df (x),h〉 is the scalar product of h and the direction gradient Df (x).

AU

TH

OR

’S P

RO

OF



231

232

233

234

235

236

237

238

239

240

241

242

243

244

245

246

247

248

249

250

251

252

253

254

255

256

257

258

259

260

261

262

263

264

265

266

267

268

269

270

271

272

273

274

275

276

In the same way if f : �n→�m and f is differentiable at x, then df (x) can berepresented by the n×m matrix

Df (x)=(

∂fj

∂xi

)

x

, i = 1, . . . , n; j = 1, . . . ,m

where f (x)= f (x1, . . . , xn)= (f1(x), . . . , fj (x), . . . , fm(x)). This matrix is calledthe Jacobian of f at x. We may define the norm of Df (x) to be

∥∥Df (x)∥∥= sup

{∣∣∣∣∂fj

∂xi

∣∣∣∣x

: i = 1, . . . , n; j = 1, . . . ,m

}.

When f has domain U ⊂ �n, then continuity of Df : U → M(n,m), whereM(n,m) is the set of n×m matrices, implies the continuity of each partial derivative

U →� : x→ ∂fj

∂xi

∣∣∣∣x

.

Note that when f : � → � then ∂f∂x|x is written simply as df

dx|x and is a real

number.Then the linear function df (x) : �→� is given by df (x)(h)= (

dfdx|x)h.

To simplify notation we shall not distinguish between the linear function df (x)

and the real number dfdx|x when f : �→�.

Suppose now that f : U ⊂ X→ Y and g : V ⊂ Y → Z such that g ◦ f : U ⊂X→ Z exists. If f is differentiable at x, and g is differentiable at f (x) then g ◦ f

is differentiable at x and is given by

d(g ◦ f )(x)= dg(f (x)

) ◦ df (x).

In terms of Jacobian matrices this is D(g ◦ f )(x) =Dg(f (x)) ◦Df (x), or ∂gk

∂xi=

∑mj=1

∂gk

∂fj

∂fj

∂xi, i.e.,

kth row(

∂gk

∂f1. . .

∂gk

∂fm

)

⎛

⎜⎜⎜⎜⎝

∂f1∂xi

...∂fm

∂xi

ith column

⎞

⎟⎟⎟⎟⎠

This is also known as the chain-rule.If Id : �n →�n is the identity map then clearly the Jacobian matrix of Id must

be the identity matrix.Suppose now that f : U ⊂ �n → V ⊂ �n is differentiable at x, and has an in-

verse g = f−1 which is differentiable. Then g ◦ f = Id and so Id=D(g ◦ f )(x)=Dg(f (x)) ◦Df (x). Thus D(f−1)(f (x))= [Df (x)]−1.

In particular, for this to be the case Df (x) must be an n× n matrix of rank n.When this is so, f is called a diffeomorphism.

AU

TH

OR

’S P

RO

OF



277

278

279

280

281

282

283

284

285

286

287

288

289

290

291

292

293

294

295

296

297

298

299

300

301

302

303

304

305

306

307

308

309

310

311

312

313

314

315

316

317

318

319

320

321

322

Fig. 4.3 ???

On the other hand suppose f :X→� and g : Y →�, where f is differentiableat x ∈X and g is differentiable at y ∈ Y .

Let fg : X × Y →� : (x, y)→ f (x)g(y). From the chain rule, d(fg)(x, y)×(h, k) = f (x) dg(y)(k) + g(y) df (x)(h), and so d(fg)(x, y) = f (x)dg(y) +g(y)df (x).

Hence fg is differentiable at the point (x, y) ∈X× Y .When X = Y = �, and (fg)(x) = f (x)g(x) then d(fg)(x) = f dg(x) +

gdf (x), where this is called the product rule.

Example 4.2 Consider the function f : �→� given by x→ x2 sin 1x

if x �= 0; 0 ifx = 0.

We first of all verify that f is continuous. Let g(x)= x2, h(x)= sin 1x= ρ[m(x)]

where m(x) = 1x

and ρ(y) = sin(y). Since m is continuous at any non zero point,both h and g are continuous. Thus f is continuous at x �= 0. (Compare with Exam-ple 3.1.)

Now Limx→0 x2 sin 1x= Limy→+∞ siny

y2 . But −1 ≤ siny ≤ 1, and so

Limy→+∞ siny

y2 = 0.Hence xn→ 0 implies f (xn)→ 0= f (0). Thus f is also continuous at x = 0.Consider now the differential of f . By the product rule, since f = gh,

d(gh)(x)= x2dh(x)+(

sin1

x

)dg(x).

Since h(x)= ρ[m(x)], by the chain rule,

dh(x) = dρ(m(x)

) · dm(x)

= cos

[(m(x)

)(− 1

x2

)]

= − 1

x2cos

1

x.

njschofi

Cross-Out

njschofi

Sticky Note

Example 4.2

njschofi

Sticky Note

See Figure 4.2.

AU

TH

OR

’S P

RO

OF



323

324

325

326

327

328

329

330

331

332

333

334

335

336

337

338

339

340

341

342

343

344

345

346

347

348

349

350

351

352

353

354

355

356

357

358

359

360

361

362

363

364

365

366

367

368

Thus

df (x) = d(gh)(x)

= x2[− 1

x2cos

1

x+ 2x sin

1

x

]

= − cos1

x+ 2x sin

1

x,

for any x �= 0.Clearly df (x) is defined and continuous at any x �= 0. To determine if df (0) is

defined, let k = 1h

. Then

Limh→0+

f (0+ h)− f (0)

h= Lim

h→0+

h2 sin 1h

h

= Limh→0+

h sin1

h

= Limk→+∞

sin k

k.

But again −1 ≤ sin k ≤ 1 for all k, and so Limk→+∞ sinkk= 0. In the same

way Limh→0−h2 sin 1

h

h= 0. Thus Limh→0

f (0+h)−f (0)h

= 0, and so df (0)= 0. Hencedf (0) is defined and equal to zero.

On the other hand consider (xn)→ 0+. We show that Limxn→0+ df (xn) doesnot exist. By the above Limx→0+ df (x) = Limx→0+[2x sin 1

x− cos 1

x]. While

Limx→0+ 2x sin 1x= 0, there is no limit for cos 1

xas x → 0+ (see Example 3.1).

Thus the function df : �→ L(�,�) is not continuous at the point x = 0.The reason for the discontinuity of the function df at x = 0 is that in any neigh-

bourhood U of the origin, there exist an “infinite” number of non-zero points, x′,such that df (x′)= 0. We return to this below.

4.2 Cr -Differentiable Functions

4.2.1 The Hessian

Suppose that f :X→ Y is a C1-differentiable map, with domain U ⊂X. Then aswe have seen df : U → L(X,Y ) where L(X,Y ) is the topological vector space oflinear maps from X to Y with the norm

‖k‖ = sup{∥∥k(x)

∥∥Y: x ∈X s.t. ‖x‖X = 1

}.

Since both U and L(X,Y ) are normed vector spaces, and df is continuous, df

may itself be differentiable at a point x ∈ U . If df is differentiable, then its deriva-tive at x is written d2f (x), and will itself be a linear approximation of the map df

from X to L(X,Y ). If df is C1, then df will be continuous, and d2f (x) will alsobe a continuous map. Thus d2f (x) ∈ L(X,L(X,Y )).

AU

TH

OR

’S P

RO

OF


4.2 Cr -Differentiable Functions 143

369

370

371

372

373

374

375

376

377

378

379

380

381

382

383

384

385

386

387

388

389

390

391

392

393

394

395

396

397

398

399

400

401

402

403

404

405

406

407

408

409

410

411

412

413

414

When d2f :U → L(X,L(X,Y )) is itself continuous, and f is C1-differentiable,then say f is C2-differentiable. Let C2(U,Y ) be the set of C2-differentiable mapson U . In precisely the same way say that f is Cr -differentiable iff f is Cr−1-differentiable, and the r th derivative df : U → L(X,L(X,L(X, . . .))) is continu-ous.

The map is called smooth or C∞ if drf is continuous for all r .Now the second derivative d2f (x) satisfies d2f (x)(h)(k) ∈ Y for vectors

h, k ∈X. Moreover d2f (x)(h) is a linear map from X to Y and d2f (x) is a lin-ear map from X to L(X,Y ).

Thus d2f (x) is linear in both factors h and k. Hence d2f (x) may be regarded asa map

H(x) :X×X→ Y

where H(x)(h, k)= d2f (x)(h)(k) ∈ Y .Moreover d2f (x) is linear in both h and k, and so H(x) is linear in both h and k.

As in Sect. 2.3.3, we say H(x) is bilinear.Let L2(X;Y) be the set of bilinear maps X×X → Y . Thus H ∈L2(X;Y) iff

H(α1h1 + α2h2, k) = α1H(h1, k)+ α2H(h2, k)

H(h,β1k1 + β2k2) = β1H(h, k1)+ β2H(h, k2)

for any α1, α2, β1, β2 ∈Re,h,h1, h2, k, k1, k2 ∈X.Since X is a finite-dimensional normed vector space, so is X ×X, and thus the

set of bilinear maps L2(X;Y) has a norm topology. Write L2(X,Y ) when the setof bilinear maps has this topology. The continuity of the second differential d2f :U → L(X,L(X,Y )) is equivalent to the continuity of the map H :U → L2(X;Y),and we may therefore regard d2f as a map d2f : U → L2(X;Y). In the same waywe may regard drf as a map drf : U → Lr (X;Y) where Lr (X;Y) is the set ofmaps Xr → Y which are linear in each component, and is endowed with the normtopology.

Suppose now that f : �n→� is a C2-map, and consider a point x = (x1, . . . , xn)

where the coordinates are chosen with respect to the standard basis. As we have seenthe differential df :U → L(�n,�) can be represented by a continuous function

Df : x→(

∂f

∂x1

∣∣∣∣x

, . . . ,∂f

∂xn

∣∣∣∣x

).

Now let ∂fj : U →� be the continuous function x→ ∂f∂xj|x . Clearly the differ-

ential of ∂fj will be

x→(

∂

∂x1(∂fj )

∣∣∣∣x

, . . . ,∂

∂xn

(∂fj )

∣∣∣∣x

);

write ∂∂xi

(∂fj )|x = ∂fji = ∂∂xi

(∂f∂xj

)|x .

AU

TH

OR

’S P

RO

OF



415

416

417

418

419

420

421

422

423

424

425

426

427

428

429

430

431

432

433

434

435

436

437

438

439

440

441

442

443

444

445

446

447

448

449

450

451

452

453

454

455

456

457

458

459

460

Then the differential d2f (x) can be represented by the matrix array

(∂fji)x =

⎛

⎜⎜⎝

∂∂x1

(∂f∂x1

) . . . ∂∂x1

(∂f∂xn

)

......

∂∂xn

(∂f∂x1

) . . . ∂∂xn

(∂f∂xn

)

⎞

⎟⎟⎠

x

.

This n× n matrix we shall also write as Hf (x) and call the Hessian matrix off at x. Note that Hf (x) is dependent on the particular basis, or coordinate systemfor X.

From elementary calculus it is known that

∂

∂xi

(∂f

∂xj

)∣∣∣∣x

= ∂

∂xj

(∂f

∂xi

)∣∣∣∣x

and so the matrix Hf (x) is symmetric.Consequently, as in Sect. 2.3.3, Hf (x) may be regarded as a quadratic form

given by

D2f (x)(h, k) = ⟨h,Hf (x)(k)⟩

= (h1, . . . , hn)

(∂

∂xi

(∂f

∂xj

)(k1kn

))

=n∑

i=1

n∑

j=1

hi

∂

∂xi

(∂f

∂xj

)kj .

As an illustration if f : �2→� is C2 then D2f (x) : �2 ×�2→� is given by

D2f (x)(h,h) = (h1 h2)⎛

⎝∂2f

∂x12

∂2f

∂x12∂x2

∂2f

∂x12∂x1

∂2f

∂x22

⎞

⎠

⎛

⎝h1

h2

⎞

⎠

=(

h12 ∂2f

∂x12+ 2h1h2

∂2f

∂x1∂x2+ h2

2 ∂2f

∂x22

)∣∣∣∣x

.

In the case that f : �→ � is C2, then ∂2f

∂x2 |x is simply written as d2f

dx2 |x , a real

number. Consequently the second differential D2f (x) is given by

D2f (x)(h,h) = h

(d2f

dx2

∣∣∣∣x

)h

= h2 d2f

dx2

∣∣∣∣x

.

We shall not distinguish in this case between the linear map D2f (x) : �2 →�and the real number d2f

dx2 |x .

AU

TH

OR

’S P

RO

OF



461

462

463

464

465

466

467

468

469

470

471

472

473

474

475

476

477

478

479

480

481

482

483

484

485

486

487

488

489

490

491

492

493

494

495

496

497

498

499

500

501

502

503

504

505

506

Fig. 4.4 ???

4.2.2 Taylor’s Theorem

From the definition of the derivative of a function f : X→ Y , df (x) is the linearapproximation to f in the sense that f (x + h) − f (x) = df (x)(h) + ‖h‖xμ(h)

where the “error” ‖h‖xμ(h) approaches zero as h approaches zero. Taylor’s The-orem is concerned with the “accuracy” of this approximation for a small vector h,by using the higher order derivatives. Our principal tool in this is the following. Iff :U ⊂X→� and the convex hull [x, x + h] of the points x and x + h belongs toU , then there is some point z ∈ [x, x + h] such that f (x + h)= f (x)+ df (z)(h).

To prove this result we proceed as follows.

Lemma 4.1 (Rolle’s Theorem) Let f : U →� where U is an open set in � con-taining the compact interval I = [a, b], and a < b. Suppose that f is continuousand differentiable on U , and that f (a)= f (b). Then there exists a point c ∈ (a, b)

such that df (c)= 0.

Proof From the Weierstrass Theorem (and Lemma 3.16) f attains its upper andlower bounds on the compact interval, I . Thus there exists finite m,M ∈ � suchthat m≤ f (x)≤M for all x ∈ I .

If f is constant on I , so m= f (x)=M,∀x ∈ I , then clearly df (x)= 0 for allx ∈ I .

Then there exists a point c in the interior (a, b) of I such that df (c)= 0. Supposethat f is not constant. Since f is continuous and I is compact, there exist pointsc, e ∈ I such that f (c)=M and f (e)=m. Suppose that neither c nor e belong to(a, b). In this case we obtain a = e and b= c, say. But then M =m and so f is theconstant function. When f is not the constant function either c or e belongs to theinterior (a, b) of I .1. Suppose c ∈ (a, b). Clearly f (c) − f (x) ≥ 0 for all x ∈ I . Since c ∈ (a, b)

there exists x ∈ I s.t. x > c, in which case f (x)−f (c)x−c

≤ 0. On the other hand

there exists x ∈ I s.t. x < c and f (x)−f (c)x−c

≥ 0. By the continuity of df at x,

df (c) = Limx→c+f (x)−f (c)

x−c= Limx→c−

f (x)−f (c)x−c

= 0. Since c ∈ (a, b) anddf (x)= 0 we obtain the result.

2. If e ∈ (a, b) then we proceed in precisely the same way to show df (e) = 0.Thus there exists some point c ∈ (a, b), say, such that df (c)= 0. �

njschofi

Cross-Out

njschofi

Sticky Note

Rolle's Theorem

njschofi

Sticky Note

See Figure 4.4.

AU

TH

OR

’S P

RO

OF



507

508

509

510

511

512

513

514

515

516

517

518

519

520

521

522

523

524

525

526

527

528

529

530

531

532

533

534

535

536

537

538

539

540

541

542

543

544

545

546

547

548

549

550

551

552

Fig. 4.5 ???

Note that when both c and e belong to the interior of I , then these maximum andminimum points for the function f are critical points in the sense that the derivativeis zero.

Lemma 4.2 Let f : �→ � where f is continuous on the interval I = [a, b] anddf is continuous on (a, b). Then there exists a point c ∈ (a, b) such that df (c) =f (b)−f (a)

b−a.

Proof Let g(x) = f (b) − f (x) − k(b − x) and k = f (b)−f (a)b−a

. Clearly, g(a) =g(b)= 0. By Rolle’s Theorem, there exists some point c ∈ (a, b) such that dg(c)=0. But dg(c)= k− df (c). Thus df (c)= f (b)−f (a)

b−a. �

Lemma 4.3 Let f : �→ � be continuous and differentiable on an open set con-taining the interval [a, a + h]. Then there exists a number t ∈ (0,1) such that

f (a + h)= f (a)+ df (a + th)(h).

Proof Put b= a + h. By the previous lemma there exists c ∈ (a, a + h) such that

df (c)= f (b)− f (a)

b− a.

Let t = c−ab−a

. Clearly t ∈ (0,1) and c = a + th. But then df (a + th) : �→� isthe linear map given by df (a + th)(h)= f (b)− f (a), and so f (a + h)= f (a)+df (a + th)(h). �

Mean Value Theorem Let f : U ⊂ X → � be a differentiable function on U ,where U is an open set in the normed vector space X. Suppose that the line seg-ment

[x, x + h] = {z : z= x + th where t ∈ [0,1]}

njschofi

Cross-Out

njschofi

Sticky Note

Lemma 4.2

njschofi

Sticky Note

See Figure 4.5.

AU

TH

OR

’S P

RO

OF



553

554

555

556

557

558

559

560

561

562

563

564

565

566

567

568

569

570

571

572

573

574

575

576

577

578

579

580

581

582

583

584

585

586

587

588

589

590

591

592

593

594

595

596

597

598

belongs to U . Then there is some number t ∈ (0,1) such that

f (x + h)= f (x)+ df (x + th)(h).

Proof Define g : [0,1]→� by g(t)= f (x + th). Now g is the composition of thefunction

ρ : [0,1]→U : t → x + th

with f : [x, x + h]→�.Since both ρ and f are differentiable, so is g. By the chain rule,

dg(t) = df(ρ(t)) ◦ dρ(t)

= df (x + th)(h).

By Lemma 4.3, there exists t ∈ (0,1) such that dg(t) = g(1)−g(0)1−0 . But g(1) =

f (x + h) and g(0)= f (x). Hence df (x + th)(h)= f (x + h)− f (x). �

Lemma 4.4 Suppose g : U →� is a C2-map on an open set U in � containingthe interval [0,1]. Then there exists θ ∈ (0,1) such that

g(1)= g(0)+ dg(0)+ 1

2d2g(θ).

Proof (Note here that we regard dg(t) and d2g(t) as real numbers.) Now definek(t)= g(t)− g(0)− tdg(0)− t2[g(1)− g(0)− dg(0)].

Clearly k(0) = k(1) = 0, and so by Rolle’s Theorem, there exists θ ∈ (0,1)

such that dk(θ)= 0. But dk(t)= dg(t)− dg(0)− 2t[g(1)− g(0)− dg(0)]. Hencedk(0)= 0.

Again by Rolle’s Theorem, there exists θ ′ ∈ (0, θ) such that d2k(θ ′) = 0. Butd2k(θ ′)= d2g(θ ′)− 2[g(1)− g(0)− dg(0)].

Hence g(1)− g(0)− dg(0)= 12d2g(θ ′) for some θ ′ ∈ (0,1). �

Lemma 4.5 Let f :U ⊂X→� be a C2-function on an open set U in the normedvector space X. If the line segment [x, x + h] belongs to U , then there exists z ∈(x, x + h) such that

f (x + h)= f (x)+ df (x)(h)+ 1

2d2f (z)(h,h).

Proof Let g : [0,1] → � by g(t) = f (x + th). As in the mean value theorem,dg(t)= df (x + th)(h). Moreover d2g(t)= d2f (x + th)(h,h).

By Lemma 4.4, g(1)= g(0)+ dg(0)+ 12d2g(θ ′) for some θ ′ ∈ (0,1).

Let z= x + θ ′h. Then f (x + h)= f (x)+ df (x)(h)+ 12d2f (h,h). �

AU

TH

OR

’S P

RO

OF



599

600

601

602

603

604

605

606

607

608

609

610

611

612

613

614

615

616

617

618

619

620

621

622

623

624

625

626

627

628

629

630

631

632

633

634

635

636

637

638

639

640

641

642

643

644

Fig. 4.6 ???

Taylor’s Theorem Let f :U ⊂X→� be a smooth (or C∞-) function on an openset U in the normed vector space X. If the line segment [x, x + h] belongs to U ,then f (x + h) = f (x)+∑n

r=11r!d

rf (x)(h, . . . , h)+ Rn(h) where the error term

Rn(h)= 1(n+1)!d

n+1f (z)(h, . . . , h) and z ∈ (x, x + h).

Proof By induction on Lemma 4.5, using the mean value theorem. �

The Taylor series of f at x to order k is

[f (x)

]k= f (x)+

k∑

r=1

1

r!drf (x)(h, . . . , h).

When f is C∞, then [f (x)]k exists for all k. In the case when X =�n and the errorterm Rk(h) approaches zero as k→∞, then the Taylor series [f (x)]k will convergeto f (x + h).

In general however [f (x)]k need not converge, or if it does converge then it neednot converge to f (x + h).

Example 4.3 To illustrate this, consider the flat function f : �→� given by

f (x) = exp

(− 1

x2

), x �= 0,

= 0 x = 0.

Now Df (x)=− 2x3 exp(− 1

x2 ) for x �= 0. Since y32 e−y → 0 as y→∞, we obtain

Df (x)→ 0 as x→ 0. However

Df (0)= Limh→0

f (0+ h)− f (0)

h= Lim

h→0

1

hexp

(− 1

h2

)= 0.

Thus f is both continuous and C1 at x = 0. In the same way f is Cr andDrf (0)= 0 for all r > 1. Thus the Taylor series is [f (0)]k = 0. However for smallh > 0 it is evident that f (0+ h) �= 0. Hence the Taylor series cannot converge to f .

These remarks lead directly to classification theory in differential topology, andare beyond the scope of this work. The interested reader may refer to Chillingsworth(1976) for further discussion.

njschofi

Cross-Out

njschofi

Sticky Note

The flat function.

AU

TH

OR

’S P

RO

OF



645

646

647

648

649

650

651

652

653

654

655

656

657

658

659

660

661

662

663

664

665

666

667

668

669

670

671

672

673

674

675

676

677

678

679

680

681

682

683

684

685

686

687

688

689

690

4.2.3 Critical Points of a Function

Suppose now that f :U ⊂�n→� is a C2-map. Once a coordinate basis is chosenfor �n, then D2f (x) may be regarded as a quadratic form. In matrix notation thismeans that

D2f (x)(h,h)= htHf (x)h.

As we have seen in Chap. 2 if the Hessian matrix Hf (x) = (∂fji)x has all itseigenvalues positive, then D2f (x)(h,h) > 0 for all h ∈ �n, and so Hf (x) will bepositive definite.

Conversely Hf (x) is negative definite iff all its eigenvalues are strictly negative.

Lemma 4.6 If f :U ⊂�n→� is a C2 map on U , and the Hessian matrix Hf (x)

is positive (negative) definite at x, then there is a neighbourhood V of x in U s. t.Hf (y) is positive (negative) definite for all y ∈ V .

Proof If Hf (x) is positive definite, then as we have seen there are n different al-gebraic relationships between the partial derivatives ∂fji(x) for j = 1, . . . , n andi = 1, . . . , n, which characterise the roots λ1(x), . . . , λn(x) of the characteristicequation

∣∣Hf (x)− λ(x)I

∣∣= 0.

But since f is C2, the map D2f : U → L2(�n;�) is continuous. In particularthis implies that for each i, j the map x→ ∂

∂xi(

∂f∂xj

)|x = ∂fji(x) is continuous. Thusif ∂fji(x) > 0 then there is a neighbourhood V of x in U such that ∂fji(y) > 0 forall y ∈ V . Moreover if

C(x)= C(∂fji(x) : i = 1, . . . , n; j = 1, . . . , n

)

is an algebraic sentence in ∂fji(x) such that C(x) > 0 then again there is a neigh-bourhood V of x in U such that C(y) > 0 for all y ∈ V .

Thus there is a neighborhood V of x in U such that λi(x) > 0 for i = 1, . . . , n

implies λi(y) > 0 for i = 1, . . . , n, and all y ∈ V . Hence Hf (x) is positive definiteat x ∈ U implies that Hf (y) is positive definite for all y in some neighborhood ofx in U . The same argument holds if Hf (x) is negative definite at x. �

Definition 4.2 Let f :U ⊂�n→� where U is an open set in �n. A point x in U

is called1. a local strict maximum of f in U iff there exists a neighbourhood V of x in U

such that f (y) < f (x) for all y ∈ V with y �= x;2. a local strict minimum of f in U iff there exists a neighbourhood V of x in U

such that f (y) > f (x) for all y ∈ V with y �= x;3. a local maximum of f in U iff there exists a neighbourhood V of x in U such

that f (y)≤ f (x) for y ∈ V ;

AU

TH

OR

’S P

RO

OF



691

692

693

694

695

696

697

698

699

700

701

702

703

704

705

706

707

708

709

710

711

712

713

714

715

716

717

718

719

720

721

722

723

724

725

726

727

728

729

730

731

732

733

734

735

736

4. a local minimum of f in U iff there exists a neighbourhood V of x in U suchthat f (y)≥ f (x) for all y ∈ V .

5. Similarly a global (strict) maximum (or minimum) on U is defined by requiringf (y) < (≤,>,≥)f (x) respectively on U .

6. If f is C1-differentiable then x is called a critical point iff df (x)= 0, the zeromap from �n to �.

Lemma 4.7 Suppose that f : U ⊂ �n →� is a C2-function on an open set U

in �n. Then f has a local strict maximum (minimum) at x if1. x is a critical point of f and2. the Hessian Hf (x) is negative (positive) definite.

Proof Suppose that x is a critical point and Hf (x) is negative definite. ByLemma 4.5

f (y)= f (x)+ df (x)(h)+ 1

2d2f (z)(h,h)

whenever the line segment [x, y] ∈U,h= y−x and z= x+θh for some θ ∈ (0,1).Now by the assumption there is a coordinate base for �n such that Hf (x) is

negative definite. By Lemma 4.6, there is a neighbourhood V of x in U such thatHf (y) is negative definite for all y in V . Let Nε(x) = {x + h : ‖h‖ < ε} be anε-neighborhood in V of x. Let S ε

2(0)= {h ∈ �n : ‖h‖ = 1

2ε}.Clearly any vector x + h, where h ∈ S ε

2(0) belongs to Nε(x), and thus V . Hence

Hf (z) is negative definite for any z = x + θh, where h ∈ S ε2(0), and θ ∈ (0,1).

Thus Hf (z)(h,h) < 0, and any z ∈ [x, x + h].But also by assumption df (x) = 0 and so df (x)(h) = 0 for all h ∈ �n. Hence

f (x + h) = f (x) + 12d2f (z)(h,h) and so f (x + h) < f (x) for h ∈ S ε

2(0). But

the same argument is true for any h satisfying ‖h‖ < ε2 . Thus f (y) < f (x) for

all y in the open ball of radius ε2 about x. Hence x is a local strict maximum. The

same argument when Hf (x) is positive definite shows that x must be a local strictminimum. �

In Sect. 2.3 we defined a quadratic form A∗ : �n×�n→� to be non-degenerateiff the nullity of A∗, namely {x : A∗(x, x) = 0}, is {0}. If x is a critical point of aC2-function f : U ⊂�n →� such that d2f (x) is non-degenerate (when regardedas a quadratic form), then call x a non-degenerate critical point.

The dimension (s) of the subspace of �n on which d2f (x) is negative definite iscalled the index of f at x, and x is called a critical point of index s.

If x is a non-degenerate critical point, then when any coordinate system for �n

is chosen, the Hessian Hf (x) will have s eigenvalues which are negative, and n− s

which are positive.For example if f : �→�, then only three cases can occur at a critical point

1. d2f (x) > 0 : x is a local minimum;

AU

TH

OR

’S P

RO

OF



737

738

739

740

741

742

743

744

745

746

747

748

749

750

751

752

753

754

755

756

757

758

759

760

761

762

763

764

765

766

767

768

769

770

771

772

773

774

775

776

777

778

779

780

781

782

2. d2f (x) < 0 : x is a local maximum;3. d2f (x)= 0 : x is a degenerate critical point.

If f : �2 →� then a number of different cases can occur. There are three non-degenerate cases:

1. Hf (x)= [ 1 00 1

], say, with respect to a suitable basis; x is a local minimum since

both eigenvalues are positive. Index= 0.

2. Hf (x) = [−1 00 −1

]; x is a local maximum, both eigenvalues are negative.

Index= 2.

3. Hf (x)= [ 1 00 −1

]; x is a saddle point or index 1 non-degenerate critical point.

In the degenerate cases, one eigenvalue is zero and so det (Hf (x))= 0.

Example 4.4 Let f : �2 →� : (x, y)→ xy. The differential at (x, y) is Df (x, y)

= (y, x). Thus H =Hf (x, y)= ( 0 11 0

). Clearly (0,0) is the critical point. Moreover

|H | = −1 and so (0,0) is non-degenerate. The eigenvalues λ1, λ2 of the Hessiansatisfy λ1 + λ2 = 0, λ1λ2 =−1. Thus λ1 = 1, λ2 =−1. Eigenvectors of Hf (x, y)

are v1 = (1,1) and v2 = (1,−1) respectively. Let P = 1√2

( 1 11 −1

)be the normalized

eigenvector (basis change) matrix, so P−1 = P . Then

∧= 1

2

(1 1−1 1

)(0 11 0

)(1 1−1 0

)=(

1 00 −1

).

Consider a vector h= (h1, h2) ∈ �2. In matrix notation, htHh= htP ∧ P−1h.Now

P(h)= 1√2

(1 11 −1

)(h1h2

)= 1√

2

(h1 + h2h1 − h2

).

Thus htHh= 12 [(h1 + h2)

2 − (h1 − h2)2] = 2h1h2.

It is clear that D3f (0,0)= 0. Hence from Taylor’s Theorem,

f (0+ h1,0+ h2)= f (0)+Df (0)(h)+ 1

2D2f (0)(h,h)

and so f (h1, h2)= 12htHh= h1h2.

Suppose we make the basis change represented by P . Then with respect to thenew basis {v1, v2} the point (x, y) has coordinates ( 1√

2(h1 + h2),

1√2(h1 − h2)).

Thus f can be represented in a neighbourhood of the origin as

(h1, h2)→ 1√2(h1 + h2)

1√2(h1 − h2)= 1

2

(h1

2 − h22).

Notice that with respect to this new coordinate system

AU

TH

OR

’S P

RO

OF



783

784

785

786

787

788

789

790

791

792

793

794

795

796

797

798

799

800

801

802

803

804

805

806

807

808

809

810

811

812

813

814

815

816

817

818

819

820

821

822

823

824

825

826

827

828

Fig. 4.7 ???

Df (h1, h2) = (h1,−h2), and

Hf (h1, h2) =(

1 00 −1

).

In the eigenspace E1 = {(x, y) ∈ �2 : x = y} the Hessian has eigenvalue 1 and so f

has a local minimum at 0, when restricted to E1.Conversely in the eigenspace E−1 = {(x, y) ∈ �2 : x + y = 0}, f has a local

maximum at 0.More generally when f : �2→� is a quadratic function in x, y, then at a critical

point f can be represented, after a suitable coordinate change, either as1. (x, y)→ x2 + y2 an index 0, minimum2. (x, y)→−x2 − y2 an index 2, maximum3. (x, y)→ x2 − y2 an index 1 saddle point.

Example 4.5 Let f : �3 →� : (x, y, z)→ x2 + 2y2 + 3z2 + xy + xz. ThereforeDf (x, y, z)= (2x + y + z,4y + x,6z+ x).

Clearly (0,0,0) is the only critical point.

H =Hf (x, y, z)=⎛

⎝2 1 11 4 01 0 6

⎞

⎠

|H | = 38 and so (0,0,0) is non-degenerate. It can be shown that the eigenvaluesof the matrix are strictly positive, and so H is positive definite and (0,0,0) is aminimum of the function. Thus f can be written in the form (u, v,w)→ au2 +bv2 + cw2, where a, b, c > 0 and (u, v,w) are the new coordinates after a (linear)basis change.

Notice that Lemma 4.7 does not assert that a local strict maximum (or minimum)of a C1-function must be a critical point where the Hessian is negative (respectivelypositive) definite.

For example consider the “flat” function f : � → � given by f (x) =− exp(− 1

x2 ), and f (0)= 0. As we showed in Example 4.3, df (0)= d2f (0)= 0.

Yet clearly 0 >− exp(− 1a2 ) for any a �= 0. Thus 0 is a global strict maximum of

f on �, although d2f is not negative definite.

njschofi

Cross-Out

njschofi

Sticky Note

The flat function

njschofi

Sticky Note

See Figure 4.7

AU

TH

OR

’S P

RO

OF



829

830

831

832

833

834

835

836

837

838

839

840

841

842

843

844

845

846

847

848

849

850

851

852

853

854

855

856

857

858

859

860

861

862

863

864

865

866

867

868

869

870

871

872

873

874

A local maximum or minimum must however be a critical point. If the point isa local maximum for example then the Hessian can have no positive eigenvalues.Consequently D2f (x)(h,h) ≤ 0 for all h, and so the Hessian must be negativesemi-definite. As the flat function indicates, the Hessian may be identically zero atthe local maximum.

Lemma 4.8 Suppose that f : U ⊂ �n → � is a C2-function on an open set u

in �n. Then f has a local maximum or minimum at x only if x is a critical pointof f .

Proof Suppose that df (x) �= 0. Then we seek to show that x can be neither a localmaximum nor minimum at x.

Since df (x) is a linear map from�n to� it is possible to find some vector h ∈ �n

such that df (x)(h) > 0.Choose h sufficiently small so that the line segment [x, x + h] belongs to U .Now f is C1, and so df : U → L(�n,�) is continuous. In particular since

df (x) �= 0 then for some neighbourhood V of x in U , df (y) �= 0 for all y ∈ V .Thus for all y ∈ V,df (y)(h) > 0 (see Lemma 4.18 for more discussion of this phe-nomenon).

By the mean value theorem there exists t ∈ (0,1) such that f (x + h)= f (x)+df (x + th)(h). By choosing h sufficiently small, the vector x + th ∈ V . Hencef (x + h) > f (x). Consequently x cannot be a local maximum.

But in precisely the same way if df (x) �= 0 then it is possible to find h such thatdf (x)(h) < 0. A similar argument can then be used to show that f (x + h) < f (x)

and so x cannot be a local minimum. Hence if x is either a local maximum orminimum of f then it must be a critical point of f . �

Lemma 4.9 Suppose that f : U ⊂ �n →� is a C2-function on an open set U

in �n. If f has a local maximum at x then the Hessian d2f at the critical pointmust be negative semi-definite (i.e., d2f (x)(h,h)≤ 0 for all h ∈ �n).

Proof We may suppose that x is a critical point. Suppose further that for some co-ordinate basis at x, and vector h ∈ �n, d2f (x)(h,h) > 0. From Lemma 4.6, by thecontinuity of d2f there is a neighbourhood V of x in U such that d2f (x′)(h,h) > 0for all x′ ∈ V .

Choose an ε-neighbourhood of x in V , and choose α > 0 such that ‖αh‖ = 12ε.

Clearly x+αh ∈ V . By Taylor’s Theorem, there exists z= x+θαh, θ ∈ (0,1), suchthat f (x + αh)= f (x)+ df (x)(h)+ 1

2d2f (z)(αh,αh). But d2f (z) is bilinear, sod2f (z)(αh,αh)= α2d2f (z)(h,h) > 0 since z ∈ V . Moreover df (x)(h)= 0. Thusf (x + αh) > f (x).

Moreover for any neighbourhood U of x it is possible to choose ε sufficientlysmall so that x + αh belongs to U . Thus x cannot be a local maximum. �

Similarly if f has a local minimum at x then x must be a critical point withpositive semi-definite Hessian.

AU

TH

OR

’S P

RO

OF



875

876

877

878

879

880

881

882

883

884

885

886

887

888

889

890

891

892

893

894

895

896

897

898

899

900

901

902

903

904

905

906

907

908

909

910

911

912

913

914

915

916

917

918

919

920

Example 4.6 Let f : �2→� : (x, y)→ x2y2− 4yx2+ 4x2; Df (x, y)= (2xy2−8xy + 8x,2x2y − 4x2). Thus (x, y) is a critical point when1. x(2y2 − 8y + 8)= 0 and2. 2x2(y − 2)= 0.

Now 2y2− 8y + 8= 2(y − 2)2. Thus (x, y) is a critical point either when y = 2or x = 0.

Let S(f ) be the set of critical points. Then S(f )= V1 ∪ V2 where

V1 ={(x, y) ∈ �2 : x = 0

}

V2 ={(x, y) ∈ �2 : y = 2

}.

Now

Hf (x, y)=(

2(y − 2)2 −4x(2− y)

−4x(2− y) 2x2

)

and so when (x, y) ∈ V1 then Hf (x, y) = (μ2 00 0

), and when (x, y) ∈ V2, then

Hf (x, y)= ( 0 00 τ2

).

For suitable μ and τ , any point in S(f ) is degenerate. On V1\{(0,0)} clearlya critical point is not negative semi-definite, and so such a point cannot be a localmaximum. The same is true for a point on V2\{(0,0)}.

Now (0,0) ∈ V1∩V2, and Hf (0,0)= (0). Lemma 4.9 does not rule out (0,0) asa local maximum. However it should be obvious that the origin is a local minimum.

Unlike Examples 4.4 and 4.5 no linear change of coordinate bases transforms thefunction into a quadratic canonical form.

To find a local maximum point we therefore seek all critical points. Those whichhave negative definite Hessian must be local maxima. Those points remaining whichdo not have a negative semi-definite Hessian cannot be local maxima, and may berejected. The remaining critical points must then be examined.

A C2-function f : �n→� with a non-degenerate Hessian at every critical pointis called a Morse function. Below we shall show that any Morse function can be rep-resented in a canonical form such as we found in Example 4.4. For such a function,local maxima are precisely those critical points of index n. Moreover, any smoothfunction with a degenerate critical point can be “approximated” by a Morse func-tion.

Suppose now that we wish to maximise a C2-function on a compact set K . Aswe know from the Weierstrass theorem, there does exist a maximum. However,Lemmas 4.8 and 4.9 are now no longer valid and it is possible that a point on theboundary of K will be a local or global maximum but not a critical point. However,Lemma 4.7 will still be valid, and a negative definite critical point will certainly bea local maximum.

A further difficulty arises since a local maximum need not be a global maximum.However, for concave functions, local and global maxima coincide. We discuss max-imisation by smooth optimisation on compact convex sets in the next section.

AU

TH

OR

’S P

RO

OF


4.3 Constrained Optimisation 155

921

922

923

924

925

926

927

928

929

930

931

932

933

934

935

936

937

938

939

940

941

942

943

944

945

946

947

948

949

950

951

952

953

954

955

956

957

958

959

960

961

962

963

964

965

966

4.3 Constrained Optimisation

4.3.1 Concave and Quasi-concave Functions

In the previous section we obtained necessary and sufficient conditions for a criticalpoint to be a local maximum on an open set U . When the set is not open, then a localmaximum need not be a critical point. Previously we have defined the differential ofa function only on an open set. Suppose now that Y ⊂�n is compact and thereforeclosed, and has a boundary ∂Y . If df is continuous at each point in the interior, Int(Y ) of Y , then we may extend df over Y by defining df (x), at each point x in theboundary, ∂Y of Y , to be the limit df (xk) for any sequence, (xk), of points in Int(Y ), which converge to x. More generally we shall say a function f : Y ⊂�n→�is C1 on the admissible set Y if df : Y → L(�n,�) is defined and continuous in theabove sense at each x ∈ Y . We now give an alternative definition of the differentialof a C1-function f : Y →�. Suppose that Y is convex and both x and x+h belongto Y . Then the arc [x, x + h] = {z ∈ �n : z= x + λh,λ ∈ [0,1]} belongs to Y .

Now df (x)(h) = limλ→0+f (x+λh)−f (x)

λand thus df (x)(h) is often called the

directional derivative of f at x in the direction h.Finding maxima of a function becomes comparatively simple when f is a con-

cave or quasi-concave function (see Sect. 3.4 for definitions of these terms). Ourfirst result shows that if f is a concave function then we may relate df (x)(y− x) tof (y) and f (x).

Lemma 4.10 If f : Y ⊂�n→� is a concave C1-function on a convex admissibleset Y then

df (x)(y − x)≥ f (y)− f (x).

Proof Since f is concave

f(λy + (1− λ)x

)≥ λf (y)+ (1− λ)f (x)

for any λ ∈ [0,1] whenever x, y ∈ Y . But then f (x+λ(y− x)−f (x))≥ λ[f (y)−f (x)], and so

df (x)(y − x) = Limλ→0+

f (x + λ(y − x))− f (x)

λ

≥ f (y)− f (x). �

This enables us to show that for a concave function, f , a critical point of f mustbe a global maximum when Y is open.

First of all call a function f : Y ⊂�n→� strictly quasi-concave iff Y is convexand for all x, y ∈ Y

f(λy + (1− λ)x

)> min

(f (x), f (y)

)for all λ ∈ (0,1).

AU

TH

OR

’S P

RO

OF



967

968

969

970

971

972

973

974

975

976

977

978

979

980

981

982

983

984

985

986

987

988

989

990

991

992

993

994

995

996

997

998

999

1000

1001

1002

1003

1004

1005

1006

1007

1008

1009

1010

1011

1012

Remember that f is quasi-concave if

f(λy + (1− λ)x

)≥min(f (x), f (y)

)for all λ ∈ [0,1].

As above let P(x;Y) = {y ∈ Y : f (y) > f (x)} be the preferred set of a func-tion f on the set Y . A point x ∈ Y is a global maximum of f on Y iff P(x;Y)=Φ .When there is no chance of misunderstanding we shall write P(x) for P(x;Y). Asshown in Sect. 3.4, if f is (strictly) quasi-concave then, ∀x ∈ Y , the preferred set,P(x), is (strictly) convex.

Lemma 4.111. If f : Y ⊂�n→� is a concave or strictly quasi-concave function on a convex

admissible set, then any point which is a local maximum of f is also a globalmaximum.

2. If f :U ⊂�n→� is a concave C1-function where U is open and convex, thenany critical point of f is a global maximum on U .

Proof1. Suppose that f is concave or strictly quasi-concave, and that x is a local max-

imum but not a global maximum on Y . Then there exists y ∈ Y such thatf (y) > f (x).

Since Y is convex, the line segment [x, y] belongs to Y . For any neighbour-hood U of x in Y there exists some λ∗ ∈ (0,1) such that, for λ ∈ (0, λ∗), z=λy + (1− λ)x ∈U . But by concavity

f (z)≥ λf (y)+ (1− λ)f (x) > f (x).

Hence in any neighbourhood U of x in Y there exists a point z such that f (z) >

f (x). Hence x cannot be a local maximum. Similarly by strict quasi-concavity

f (z) > min(f (x), f (y)

)= f (x),

and so, again, x cannot be a local maximum. By contradiction a local maximummust be a global maximum.

2. If f is C1 and U is open then by Lemma 4.8, a local maximum must be acritical point. By Lemma 4.10, df (x)(y − x)≥ f (y)− f (x). Thus df (x)= 0implies that f (y)− f (x) ≤ 0 for all y ∈ Y . Hence x is a global maximum off on Y . �

Clearly if x were a critical point of a concave function on an open set thenthe Hessian d2f (x) must be negative semi-definite. To see this, note that byLemma 4.11, the critical point must be a global maximum, and thus a local maxi-mum. By Lemma 4.9, d2f (x) must be negative semi-definite. The same is true if f

is quasi-concave.

Lemma 4.12 If f : U ⊂ �n →� is a quasi-concave C2-function on an open set,then at any critical point, x, d2f (x) is negative semi-definite.

AU

TH

OR

’S P

RO

OF



1013

1014

1015

1016

1017

1018

1019

1020

1021

1022

1023

1024

1025

1026

1027

1028

1029

1030

1031

1032

1033

1034

1035

1036

1037

1038

1039

1040

1041

1042

1043

1044

1045

1046

1047

1048

1049

1050

1051

1052

1053

1054

1055

1056

1057

1058

Proof Suppose on the contrary that df (x) = 0 and d2f (x)(h,h) > 0 for someh ∈ �n. As in Lemma 4.6, there is a neighbourhood V of x in U such thatd2f (z)(h,h) > 0 for all z in V .

Thus there is some λ∗ ∈ (0,1) such that, for all λ ∈ (0, λ∗), there is some z in V

such that

f (x + λh) = f (x)+ df (x)(λh)+ λ2d2f (z)(h,h), and

f (x − λh) = f (x)+ df (x)(−λh)+ (−λh)2d2f (z)(h,h),

where [x − λh,x + λh] belongs to U . Then f (x + λh) > f (x) and f (x − λh) >

f (x). Now x ∈ [x − λh,x + λh] and so by quasi-concavity,

f (x)≥min(f (x + λh),f (x − λh)

).

By contradiction d2f (x)(h,h)≤ 0 for all h ∈ �n. �

For a concave function, f , on a convex set Y , d2f (x) is negative semi-definitenot just at critical points, but at every point in the interior of Y .

Lemma 4.131. If f : U ⊂ �n →� is a concave C2-function on an open convex set U , then

d2f (x) is negative semi-definite for all x ∈U .2. If f : Y ⊂�n→� is a C2-function on an admissible convex set Y and d2f (x)

is negative semi-definite for all x ∈ Y , then f is concave.

Proof1. Suppose there exists x ∈U and h ∈ �n such that d2f (x)(h,h) > 0. By the con-

tinuity of d2f , there is a neighbourhood V of x in U such that d2f (z)(h,h) >

0, for z ∈ V . Choose θ ∈ (0,1) such that x+θh ∈ V . Then by Taylor’s theoremthere exists z ∈ (x, x + θh) such that

f (x + θh) = f (x)+ df (x)(θh)+ 1

2d2f (z)(θh, θh)

> f (x)+ df (x)(θh).

But then df (x)(θh) < f (x + θh) − f (x). This contradicts df (x)(y − x) ≥f (y)− f (x), ∀x, y in U . Thus d2f (x) is negative semi-definite.

2. If x, y ∈ Y and Y is convex, then the arc [x, y] ⊂ Y . Hence there is somez= λx + (1− λ)y, where λ ∈ (0,1), such that

f (y) = f (x)+ df (x)(y − x)+ d2f (z)(y − x, y − x)

≤ f (x)+ df (x)(y − x).

But in the same way f (x) − f (z) ≤ df (z)(x − z) and f (y) − f (z) ≤df (z)(y − z). Hence

f (z)≥ λ[f (x)− df (z)(x − z)

]+ (1− λ)[f (y)− df (z)(y − z)

].

AU

TH

OR

’S P

RO

OF



1059

1060

1061

1062

1063

1064

1065

1066

1067

1068

1069

1070

1071

1072

1073

1074

1075

1076

1077

1078

1079

1080

1081

1082

1083

1084

1085

1086

1087

1088

1089

1090

1091

1092

1093

1094

1095

1096

1097

1098

1099

1100

1101

1102

1103

1104

Now λdf (z)(x − z)+ (1− λ)df (z)(y − z)= df (z)[λx + (1− λ)y − z] =df (z)(0)= 0, since df (z) is linear.

Hence f (z)≥ λf (x)+ (1− λ)f (y) for any λ ∈ [0,1] and so f is concave. �

We now extend the analysis to a quasi-concave function and characterise thepreferred set P(x;Y).

Lemma 4.14 Suppose f : Y ⊂ �n → � is a quasi-concave C1-function on theconvex admissible set Y .1. If f (y)≥ f (x) then df (x)(y − x)≥ 0.2. If df (x)(y − x) > 0, then there exists some λ∗ ∈ (0,1) such that f (z) > f (x)

for any

z= λy + (1− λ)x where λ ∈ (0, λ∗).

Proof1. By the definition of quasi-concavity f (λy+ (1−λ)x) > f (x) for all λ ∈ [0,1].

But then, as in the analysis of a concave function,

f(x + λ(y − x)

)− f (x)≥ 0

and so df (x)(y − x)= Limλ→0+f (x+λ(y−x))−f (x)

λ≥ 0.

2. Now suppose f (x)≥ f (z) for all z in the line segment [x, y]. Then df (x)(y−x)= Limλ→0+

f (x+λ)(y−x)−f (x)λ

≤ 0, contradicting df (x)(y − x) > 0.Thus there exists z∗ = λ∗y + (1 − λ∗)x such that f (z∗) > f (x). But then for allz ∈ (x,λ∗y + (1− λ∗)x), f (z) > f (x). �

The property that f (y) ≥ f (x) ⇒ df (x)(y − x) ≥ 0 is often called pseudo-concavity.

We shall also say that f : Y ⊂�n→� is strictly pseudo-concave iff for any x, y

in Y , with y �= x then f (y)≥ f (x) implies that df (x)(y − x) > 0. (Note we do notrequire Y to be convex, but we do require it to be admissible.)

Lemma 4.15 Suppose f : Y ⊂ �n →� is a strictly pseudo-concave function onan admissible set Y .1. Then f is strictly quasi-concave when Y is convex.2. If x is a critical point, then it is a global strict maximum.

Proof1. Suppose that f is not strictly quasi-concave. Then for some x, y ∈ Y,f (λ∗y +

(1 − λ∗)x) ≤ min (f (x), f (y)) for some λ∗ ∈ (0,1). Without loss of gener-ality suppose f (x) ≤ f (y), and f (λy + (1 − λ)x) ≤ f (x) for all λ ∈ (0,1).Then df (x)(y − x) = Limλ→0+

f (λy+(l−λ)x)−f (x)λ

≤ 0. But by strict pseudo-concavity, we require that df (x)(y − x) ≥ 0. Thus f must be strictly quasi-concave.

AU

TH

OR

’S P

RO

OF



1105

1106

1107

1108

1109

1110

1111

1112

1113

1114

1115

1116

1117

1118

1119

1120

1121

1122

1123

1124

1125

1126

1127

1128

1129

1130

1131

1132

1133

1134

1135

1136

1137

1138

1139

1140

1141

1142

1143

1144

1145

1146

1147

1148

1149

1150

2. If df (x)= 0 then df (x)(y − x)= 0 for all y ∈ U . Hence f (y) < f (x) for ally ∈U,y �= x. Thus x is a global strict maximum. �

As we have observed, when f is a quasi-concave function on Y , the preferred setP(x;Y) is a convex set in Y . Clearly when f is continuous then P(x;Y) is openin Y . As we might expect from the Separating Hyperplane Theorem, P(x;Y) willthen belong to an “open half space”. To see this note that Lemma 4.14 establishes(for a quasi-concave C1 function f ) that the weakly preferred set

R(x;Y)= {y ∈ Y : f (y)≥ f (x)}

belongs to the closed half-space

H(x;Y)= {y ∈ Y : df (x)(y − x)≥ 0}.

When Y is open and convex the boundary of H(x;Y) is the hyperplane {y ∈ Y :df (x)(y−x)= 0} and H(x;Y) has relative interior

0H (x;Y)= {y ∈ Y : df (x)(y−

x) > 0}. Write H(x),R(x),P (x) for H(x;Y),R(x;Y),P (x;Y), etc., when Y isunderstood.

Lemma 4.16 Suppose f :U ⊂�n→� is C1, and U is open and convex.

1. If f is quasi-concave, with df (x) �= 0 then P(x)⊂ 0H (x),

2. If f is concave or strictly pseudo-concave then P(x)⊂ 0H(x) for all x ∈U . In

particular if x is a critical point, then P(x)= 0H(x)=Φ .

Proof

1. Suppose that df (x) �= 0 but that P(x) �⊂ 0H(x). However both P(x) and

0H(x)

are open sets in U . By Lemma 4.14, R(x)⊂H(x), and thus the closure of P(x)

belongs to the closure of0H(x) in U . Consequently there must exist a point y

which belongs to P(x) yet df (x)(y − x)= 0, so y belongs to the boundary of0H(x). Since P(x) is open there exists a neighbourhood V of y in P(x), and

thus in R(x). Since y is a boundary point of0H(x) in any neighbourhood V of

y there exists z such that z /∈H(x). But this contradicts R(x)⊂H(x). Hence

P(x)⊂ 0H(x).

2. Since a concave or strictly pseudo-concave function is a quasi-concave func-

tion, (1) establishes that P(x) ⊂ 0H(x) for all x ∈ U such that df (x) �= 0.

By Lemmas 4.11 and 4.15, if df (x) = 0 then x is a global maximum. Hence

P(x)= 0H(x)=Φ . �

Lemma 4.8 shows that if0H(x) �=Φ , then x cannot be a global maximum on the

open set U .

AU

TH

OR

’S P

RO

OF



1151

1152

1153

1154

1155

1156

1157

1158

1159

1160

1161

1162

1163

1164

1165

1166

1167

1168

1169

1170

1171

1172

1173

1174

1175

1176

1177

1178

1179

1180

1181

1182

1183

1184

1185

1186

1187

1188

1189

1190

1191

1192

1193

1194

1195

1196

Fig. 4.8 ???

Moreover for a concave or strictly pseudo-concave function P(x) is empty if0H(x) is empty (i.e., when df (x)= 0).

Figure 4.8 illustrates these observations. Let P(x) be the preferred set of a quasi-concave function f (at a non-critical point x).

Point y1 satisfies f (y1)= f (x) and thus belongs to R(x) and hence H(x). Pointy2 ∈H(x)\P(x) but there exists an open interval (x, z) belonging to [x, y2] and toP(x).

We may identify the linear map df (x) : �n → � with a vector Df (x) ∈ �n

where df (x)(h) = 〈Df (x),h〉 the scalar product of Df (x) with h. Df (x) is thedirection gradient, and is normal to the indifference surface at x, and therefore tothe hyperplane ∂H(x)= {y ∈ Y : df (x)(y − x)= 0}.

To see this intuitively, note that the indifference surface I (x)= {y ∈ Y : f (y)=f (x)} through x, and the hyperplane ∂H(x) are tangent at x. Just as df (x) is anapproximation to the function f , so is the hyperplane ∂H(x) an approximation toI (x), near to x.

As we shall see in Example 4.7, a quasi-concave function, f , may have a critical

point x (so0H(x)=Φ) yet P(x) �=Φ . For example, if Y is the unit interval, then f

may have a degenerate critical point with P(x) �= Φ . Lemma 4.16 establishes thatthis cannot happen when f is concave and Y is an open set. The final Lemma of thissection extends Lemma 4.16 to the case when Y is admissible.

Lemma 4.17 Let f : Y ⊂�n→� be C1, and let Y be a convex admissible set.

1. If f is quasi-concave on Y , then ∀x ∈ Y,0H(x;Y) �=Φ implies P(x;y) �=Φ . If

x is a local maximum of f , then0H(x;Y)=Φ .

2. If f is a strictly pseudo-concave function on an admissible set Y , and x is alocal maximum, then it is a global strict maximum.

njschofi

Cross-Out

njschofi

Sticky Note

Illustration of Lemma 4.16

AU

TH

OR

’S P

RO

OF



1197

1198

1199

1200

1201

1202

1203

1204

1205

1206

1207

1208

1209

1210

1211

1212

1213

1214

1215

1216

1217

1218

1219

1220

1221

1222

1223

1224

1225

1226

1227

1228

1229

1230

1231

1232

1233

1234

1235

1236

1237

1238

1239

1240

1241

1242

Fig. 4.9 ???

3. If f is concave C1 or strictly pseudo-concave on Y , then P(x;Y) ⊂ 0H(x :

Y) ∀x ∈ Y . Hence0H(x;Y)=Φ⇒ P(x;Y)=Φ .

Proof

1. If0H(x;Y) �= Φ then df (x)(y − x) > 0 for some y ∈ Y . Then by Lem-

ma 4.14(2), in any neighbourhood U of x in Y there exists z such thatf (z) > f (x). Hence x cannot be a local maximum, and indeed P(x) �=Φ .

2. By Lemma 4.15, f must be quasi-concave. By (1), if x is a local maximum thendf (x)(y−x)≤ 0 for all y ∈ Y . By definition this implies that f (y) < f (x) forall y ∈ Y such that y �= x. Thus x is a global strict maximum.

3. If f is concave and y ∈ P(x;Y), then f (y) > f (x). By Lemma 4.10,

df (x)(y − x)≥ f (y)− f (x) > 0. Thus y ∈ 0H(x).

4. If f is strictly pseudo-concave then f (y) > f (x) implies df (x)(y − x) > 0,

and so P(x;Y)⊂ 0H(x;Y). �

Example 4.7 These results are illustrated in Fig. 4.9.1. For the general function f1, b is a critical point and local maximum, but not

a global maximum (e). On the compact interval [a, d], d is a local maximumbut not a global maximum.

2. For the quasi-concave function f2, a is a degenerate critical point but neither alocal nor global maximum, while b is a degenerate critical point which is alsoa global maximum.Point c is a critical point which is also a local maximum. However on [b, c], c

is not a global maximum.

njschofi

Cross-Out

njschofi

Sticky Note

Illustration of Lemma 4.17.

AU

TH

OR

’S P

RO

OF



1243

1244

1245

1246

1247

1248

1249

1250

1251

1252

1253

1254

1255

1256

1257

1258

1259

1260

1261

1262

1263

1264

1265

1266

1267

1268

1269

1270

1271

1272

1273

1274

1275

1276

1277

1278

1279

1280

1281

1282

1283

1284

1285

1286

1287

1288

3. For the concave function f3, clearly b is a degenerate (but negative semi-definite) critical point, which is also a local and global maximum. Moreoveron the interval [a, c], c is the local and global maximum, even though it is nota critical point. Note that df3(c)(a − c) < 0.

Lemma 4.17 suggests that we call any point x in an admissible set Y a general-

ized critical point in Y iff0H(x;Y)=Φ , of course if df (x)= 0, then

0H(x;Y)=Φ ,

but the converse is not true when x is a boundary point.Lemma 4.17 shows that (i) for a quasi-concave C1-function, a global maximum

is a local maximum is a generalised critical point; (ii) for a concave C1-or strictlypseudo-concave function a critical point is a generalised critical point is a localmaximum is a global maximum.

4.3.2 Economic Optimisation with Exogenous Prices

Suppose now that we wish to find the maximum of a quasi-concave C1-functionf : Y →� subject to a constraint g(x) ≥ 0 where g is also a quasi-concave C1-function g : Y →�.

As we know from the previous section, when Pf (x) = {y ∈ Y : f (y) > f (x)}and df (x) �= 0, then Pf (x)⊂Hf (x)= {y ∈ Y : df (x)(x − y)≥ 0}.

Suppose now that Hg(x) = {y ∈ Y : dg(x)(x − y) ≥ 0} has the property that

Hg(x)∩ 0Hf (x)=Φ , and x satisfies g(x)= 0.

In this case, there exists no point y such that g(y)≥ 0 and f (y) > f (x).

A condition that is sufficient for the disjointness of the two half-spaces0Hf (x)

and Hg(x) is clearly that λdg(x)+ df (x)= 0 for some λ > 0.In this case if df (x)(v) > 0, then dg(x)(v) < 0, for any v ∈ �n.Now let L= Lλ(f,g) be the Lagrangian f + λg : Y →�. A sufficient condition

for x to be a solution to the optimisation problem is that dL(x)= 0.Note however that this is not a necessary condition. As we know from the pre-

vious section it might well be the case for some point x on the boundary of theadmissible set Y that dL(x) �= 0 yet there exists no y ∈ Y such that

dg(x)(x − y)≥ 0 and df (x)(x − y) > 0.

Figure 4.10 illustrates such a case when

Y = {(x, y) ∈ �2 : x ≥ 0, y ≥ 0}.

We shall refer to this possibility as the boundary problem.As we know we may represent the linear maps df (x), dg(x) by the direction

gradients, or vectors normal to the indifference surfaces, labelled Df (x),Dg(x).When the maps df (x), dg(x) satisfy df (x)+λdg(x)= 0, λ > 0, then the direc-

tion gradients are positively dependent and satisfy Df (x)+ λDg(x)= 0.

AU

TH

OR

’S P

RO

OF



1289

1290

1291

1292

1293

1294

1295

1296

1297

1298

1299

1300

1301

1302

1303

1304

1305

1306

1307

1308

1309

1310

1311

1312

1313

1314

1315

1316

1317

1318

1319

1320

1321

1322

1323

1324

1325

1326

1327

1328

1329

1330

1331

1332

1333

1334

Fig. 4.10 ???

In the example Df (x) and Dg(x) are not positively dependent, yet x is a solutionto the optimisation problem.

It is often common to make some boundary assumption so that the solution doesnot belong to the boundary of the feasible or admissible set Y .

In the more general optimisation problem (f, g) : Y →�m+1, the Kuhn Tuckertheorem implies that a global saddle point (x∗, λ∗) to the Lagrangian Lλ(f,g) =f +∑λigi gives a solution x∗ to the optimisation problem. Aside from the bound-ary problem, we may find the global maxima of Lλ(f,g) by finding the criticalpoints of Lλ(f,g).

Thus we must choose x∗ such that

df(x∗)+

m∑

i=1

λi dgi

(x∗)= 0.

Once a coordinate system is chosen this is equivalent to finding x∗, and coeffi-cients λ1, . . . , λm all non-negative such that

Df(x∗)+

m∑

i=1

λiDgi

(x∗)= 0.

The Kuhn Tucker Theorem also showed that if x∗ is such that gi(x∗) > 0, then

λi = 0 and if gi(x∗)= 0 then λi > 0.

Example 4.8 Maximise the function f : �→�:

x → x2 : x ≥ 0

x → 0 : x < 0

subject to g1(x)= x ≥ 0 and g2(x)= 1− x ≥ 0. Now Lλ(x)= x2 + λ1x + λ2(1−x); ∂L

∂x= 2x + λ1 − λ2 = 0, ∂L

∂λ1= x = ∂L

∂λ2= 1− x = 0.

Clearly these equations have no solution. By inspection the solution cannot sat-isfy g1(x)= 0. Hence choose λ1 = 0 and solve

Lλ(x)= x2 + λ(1− x).

njschofi

Cross-Out

njschofi

Sticky Note

Optimisation of f on a constraint, g

AU

TH

OR

’S P

RO

OF



1335

1336

1337

1338

1339

1340

1341

1342

1343

1344

1345

1346

1347

1348

1349

1350

1351

1352

1353

1354

1355

1356

1357

1358

1359

1360

1361

1362

1363

1364

1365

1366

1367

1368

1369

1370

1371

1372

1373

1374

1375

1376

1377

1378

1379

1380

Fig. 4.11 ???

Then ∂L∂x= 2x − λ, ∂L

∂λ= 1− x = 0. Thus x = 1 and λ= 2 is a solution.

Suppose now that f,g1, . . . , gm are all concave functions on the convex admis-sible set Y = {x ∈ �n : xi ≥ 0, i = 1, . . . , n}. Obviously if z= αy + (1− α)x, then

Lλ(f,g)(z) = f (z)+m∑

i=1

λigi(z)

≥ αf (y)+ (1− α)f (x)+m∑

i=1

λi

[αgi(y)+ (1− α)gi(x)

]

= αLλ(f,g)(y)+ (1− α)Lλ(f,g)(x).

Thus Lλ(f,g) is a concave function. By Lemma 4.11, x∗ is a global maximumof Lλ(f,g) iff dL(f,g)(x∗)= 0 (aside from the boundary problem).

For more general functions, to find the global maximum of the LagrangianLλ(f,g), and thus the optimum to the problem (f, g), we find the critical pointsof Lλ(f,g). Those critical points which have negative definite Hessian will then belocal maxima of Lλ(f,g). However we still have to examine the local maxima whenthe Hessian of the Lagrangian is negative semi-definite to find the global maxima.Even in this general case, any solution x∗ to the problem (f, g) must be a globalmaximum for a suitably chosen Lagrangian Lλ(f,g), and thus must satisfy the firstorder condition

Df(x∗)+

m∑

i=1

λiDgi

(x∗)= 0

(again, subject to the boundary problem).

Example 4.9 Maximise f : �2 → � : (x, y) → xy subject to the constraintg(x, y)= 1− x2 − y2 ≥ 0. We seek a solution to the first order condition:

DL(x,y)=Df (x, y)+ λDg(x, y)= 0.

Thus (y, x)+ λ(−2x,−2y)= 0 or λ= y2x= x

2yso x2 = y2.

njschofi

Cross-Out

njschofi

Sticky Note

The Lagrangian L(f,g).

AU

TH

OR

’S P

RO

OF



1381

1382

1383

1384

1385

1386

1387

1388

1389

1390

1391

1392

1393

1394

1395

1396

1397

1398

1399

1400

1401

1402

1403

1404

1405

1406

1407

1408

1409

1410

1411

1412

1413

1414

1415

1416

1417

1418

1419

1420

1421

1422

1423

1424

1425

1426

Fig. 4.12 ???

For x = −y, λ < 0 and so Df (x, y) = |λ|Dg(x, y), (corresponding to a mini-mum of f on the feasible set g(x, y)≥ 0).

Thus we choose x = y and λ = 12 . For (x, y) on the boundary of the constraint

set we require 1− x2 − y2 = 0. Hence x = y =± 1√2

.

The Lagrangian is therefore L = xy + 12 (1 − x2 − y2) with differential (with

respect to x, y)

DL(x,y) = (y − x, x − y) and Hessian

HL(x,y) =(−1 1

1 −1

).

The eigenvalues of HL are −2, 0 corresponding to eigenvectors (1,−1) and(1,1) respectively. Hence HL is negative semi-definite, and so for example thepoint ( 1√

2, 1√

2) is a local maximum for the Lagrangian.

As we have observed in Example 3.4, the function f (x, y) = xy is not quasi-concave on �2, and hence it is not the case that Pf ⊂ Hf . However on �2+ ={(x, y) ∈ �2 : x ≥ 0, y ≥ 0}, f is quasi-concave, and so the optimality conditionHg(x, y)∩Hf (x, y)=Φ is sufficient for an optimum.

Note also that Df (x, y)= (y, x) and so the origin (0,0) is a critical point of f .However setting DL(x,y) = 0 at (x, y) = (0,0) requires λ = 0. In this case how-ever

HL(x,y)=(

0 11 0

)

njschofi

Cross-Out

njschofi

Sticky Note

Example 4.9.

njschofi

Sticky Note

See Figure 4.12

AU

TH

OR

’S P

RO

OF



1427

1428

1429

1430

1431

1432

1433

1434

1435

1436

1437

1438

1439

1440

1441

1442

1443

1444

1445

1446

1447

1448

1449

1450

1451

1452

1453

1454

1455

1456

1457

1458

1459

1460

1461

1462

1463

1464

1465

1466

1467

1468

1469

1470

1471

1472

Fig. 4.13 ???

and as in Example 4.4, HL is non-degenerate with eigenvalues +1, −1. HenceHL is not negative semi-definite, and so (0,0) cannot be a local maximum for theLagrangian.

However if we were to maximise f (x, y)=−xy on the feasible set �2+ subjectto the same constraint then L would be maximized at (0,0) with λ= 0.

Example 4.10 In Example 3.7 we examined the maximisation of a convex pref-erence correspondence of a consumer subject to a budget constraint of the formB(p)= {x ∈ �n+ : 〈p,x〉 ≤ 〈p, e〉 = I }, given by an exogenous price vector p ∈ �n+,and initial endowment vector e ∈ �n+.

Suppose now that the preference correspondence is given by a utility function:

f : �2+ →� : (x, y)→ β logx + (1− β) logy, 0 < β < 1.

Clearly Df (x, y)= (βx,

1−βy

), and so

Hf (x, y)=( −β

x2 0

0 −(1−β)

y2

)

is negative definite. Thus f is concave on �2. The budget constraint is

g(x, y)= I − p1x − p2y ≥ 0

where p1,p2 are the given prices of commodities x, y. The first order condition onthe Lagrangian is: Df (x, y) + λDg(x, y) = 0, i.e., (

βx,

1−βy

) + λ(−p1,−p2) = 0

and λ > 0. Hence p1p2= β

x· y

1−β. See Fig. 4.13.

Now f is concave and has no critical point within the constraint set. Thus (x, y)

maximises Lλ iff

y = x

(p1

p2

)(1− β

β

)and g(x, y)= 0.

njschofi

Cross-Out

njschofi

Sticky Note

Optimising on a budget constraint

AU

TH

OR

’S P

RO

OF



1473

1474

1475

1476

1477

1478

1479

1480

1481

1482

1483

1484

1485

1486

1487

1488

1489

1490

1491

1492

1493

1494

1495

1496

1497

1498

1499

1500

1501

1502

1503

1504

1505

1506

1507

1508

1509

1510

1511

1512

1513

1514

1515

1516

1517

1518

Thus y = I−p1xp2

and so x = Iβp1

, y = I (1−β)p2

, and λ= 1I

, is the marginal utility ofincome.

Now consider a situation where prices vary. Then optimal consumption(x∗, y∗)= (d1(p1,p2), d2(p1,p2)) where di(p1,p2) is the demand for commodityx or y. As we have just shown, d1(p1,p2)= Iβ

p1, and d2(p1,p2)= I (1−β)

p2.

Suppose that all prices are increased by the same proportion i.e., (p′1,p′2) =α(p1,p2), α > 0.

In this exchange situation I ′ = p′1e1 + p′2e2 = αI = α(p1e1 + p2e2).

Thus x′ = I ′βp′1= αIβ

αp1= x′, and y′ = y.

Hence di(αp1, αp2)= di(p1,p2), for i = 1,2. The demand function is said to behomogeneous in prices.

Suppose now that income is obtained from supplying labor at a wage rate w say.Let the supply of labor by the consumer be e = 1− x3, where x3 is leisure time

and enters into the utility function.Then f : �3 → � : (x1x2x3) → ∑3

i=1 ai logxi and the budget constraint isp1x1 + p2x2 ≤ (1− x3)w, or g(x1, x2, x3) = w − (p1x1 + p2x2 + wx3) ≥ 0. Thefirst order condition is ( a1

x1, a2

x2,

a3x3

)= λ(p1,p2,w),λ > 0.Clearly the demand function will again be homogeneous, since d(p1,p2,w) =

d(αp1, αp2, αw).For the general consumer optimisation problem, we therefore normalise the price

vector. In general, in an n-commodity exchange economy let

Δ= {p ∈ �n+ : ‖p‖ = 1}

be the price simplex. Here ‖ ‖ is a convenient norm on �n.If f : �n+ →� is the utility function, let

D∗f (x)= Df (x)

‖Df (x)‖ ∈Δ.

Suppose then that x∗ ∈ �n is a maximum of f : �n+ →� subject to the budgetconstraint 〈p,x〉 ≤ I .

As we have seen the first order condition is

Df (x)+ λDg(x)= 0,

where Dg(x)=−p = (−p1, . . . ,−pn), p ∈Δ, and

Df (x)=(

∂f

∂x1, . . . ,

∂f

∂xn

).

Thus Df (x)= λ(p1, . . . , pn)= λp ∈ �n+. But then D∗f (x)= p‖p‖ ∈Δ.

Subject to boundary problems, a necessary condition for optimal consumer be-havior is that D∗f (x)= p

‖p‖ .

As we have seen the optimality condition is that ∂f∂xi

/∂f∂xj= pi

pj, for the ith and

j th commodity, where ∂f∂xi

is often called the marginal utility of the ith commodity.

AU

TH

OR

’S P

RO

OF



1519

1520

1521

1522

1523

1524

1525

1526

1527

1528

1529

1530

1531

1532

1533

1534

1535

1536

1537

1538

1539

1540

1541

1542

1543

1544

1545

1546

1547

1548

1549

1550

1551

1552

1553

1554

1555

1556

1557

1558

1559

1560

1561

1562

1563

1564

Fig. 4.14 ???

Now any point y on the boundary of the budget set satisfies

〈p,y〉 = I = 1

λ

⟨Df(x∗), x∗⟩.

Hence y ∈H(p, I), the hyperplane separating the budget set from the preferredset at the optimum x∗, iff 〈Df (x∗), y − x∗〉 = 0.

Consider now the problem of maximisation of a profit function by a producer

π(x1, . . . , xm, xm+1, . . . , xn)=n−m∑

j=1

pm+j xm+j −m∑

j=1

pjxj ,

where (x1, . . . , xm) ∈ � are inputs, (xm+1, . . . , xn) are outputs and p ∈ �n+ is a non-negative price vector.

As in Example ??, the set of feasible input-output combinations is given by theproduction set G= {x ∈ �n+ : F(x)≥ 0} where F : �n+ → � is a smooth functionand F(x)= 0 when x is on the upper boundary or frontier of the production set G.

At a point x on the boundary, the vector which is normal to the surface {x :F(x)= 0} is

DF(x)=(

∂F

∂x1, . . . ,

∂F

∂xn

)

x

.

The first order condition for the Lagrangian is that

Dπ(x)+ λDF(x)= 0

or (−p1, . . . ,−pm,pm+1, . . . , pn)+ λ( λFλx1

, . . . , ∂F∂xn

)= 0.For example with two inputs (x1 and x2) and one output (x3) we might express

maximum possible output y in terms of x1 and x2, i.e., y = g(x1, x2). Then thefeasible set is

{x ∈ �3+ : F(x1, x2, x3)= g(x1, x2)− x3 ≥ 0

}.

njschofi

Cross-Out

njschofi

Sticky Note

Maximising profit on a production set.

AU

TH

OR

’S P

RO

OF



1565

1566

1567

1568

1569

1570

1571

1572

1573

1574

1575

1576

1577

1578

1579

1580

1581

1582

1583

1584

1585

1586

1587

1588

1589

1590

1591

1592

1593

1594

1595

1596

1597

1598

1599

1600

1601

1602

1603

1604

1605

1606

1607

1608

1609

1610

Then (−p1,−p2,p3)+ λ(∂g∂x1

,∂g∂x2

,−1)= 0 and so

p1 = p3∂g

∂x1, p2 = p3

∂g

∂x2or

p1

p2= ∂g

∂x1

/ ∂g

∂x2.

Here ∂g∂xj

is called the marginal product (with respect to commodity j for j =1,2).

For fixed x = (x1, x2) consider the locus of points in �2+ such that y = g(x) is

a constant. If (∂g∂x1

,∂g∂x2

)x �= 0 at x, then by the implicit function theorem (discussedin the next chapter) we can express x2 as a function x2(x1) of x1, only, near x.

In this case ∂g∂x1+ dx2

dx1

∂g∂x2= 0 and so ∂g

∂x1/

∂g∂x2|x = dx2

dx1|x = p1

p2.

The ratio ∂g∂x1

/∂g∂x2|x is called the marginal rate of technical substitution of x2

for x1 at the point (x1, x2).

Example 4.11 There are two inputs K (capital) and L (labor), and one output, Y ,say.

Let g(K,L)= [dK−ρ + (1− d)L−ρ]−1ρ . The feasibility constraint is

F(K,L,Y )= g(K,L)− Y ≥ 0.

Let −v,−w,p be the prices of capital, labor and the output. For optimality weobtain:

(−v,−w,p)+ λ

(∂F

∂K,∂F

∂L,∂F

∂Y

)= 0.

On the production frontier, g(K,L)= Y and so p =−λ∂F∂Y= λ since ∂F

∂L=−1.

Now let X = [dk−ρ + (1− d)L−ρ].Then ∂F

∂K= (− 1

ρ)[−ρdk−ρ−1] X− 1

ρ−1.

Now Y =X− 1

ρ so Y 1+ρ =X− 1

ρ−1. Thus ∂F

∂K= d( Y

K)1+ρ .

Similarly ∂F∂L= (1− d)(Y

L)1+ρ . Thus r

w= ∂F

∂K/∂F

∂L= d

1−d( LK

)1+ρ .In the case just of a single output, where the production frontier is given by a

function

xn+1 = g(x1, . . . , xn) and (x1, . . . , xn) ∈ �n+is the input vector, then clearly the constraint set will be a convex set if and only if g

is a concave function. (See Example 3.3.) In this case the solution to the Lagrangianwill give an optimal solution. However when the constraint set is not convex, thensome solutions to the Lagrangian problem may be local minima. See Fig. 4.15 foran illustration.

As with the consumer, the optimum point on the production frontier is unchangedif all prices are multiplied by a positive number.

For a general consumer let d : �n+ →�n+ be the demand map where

d(p1, . . . , pn)=(x∗1 (p), . . . , x∗n(p)

)= x∗(p)

njschofi

Sticky Note

See Figure 4.14.

AU

TH

OR

’S P

RO

OF



1611

1612

1613

1614

1615

1616

1617

1618

1619

1620

1621

1622

1623

1624

1625

1626

1627

1628

1629

1630

1631

1632

1633

1634

1635

1636

1637

1638

1639

1640

1641

1642

1643

1644

1645

1646

1647

1648

1649

1650

1651

1652

1653

1654

1655

1656

Fig. 4.15 ???

and x∗(p) is any solution to the maximisation problem on the budget set

B(p1, . . . , pn)={x ∈ �n+ : 〈p,x〉 ≤ I

}.

In general it need not be the case that d is single-valued, and so it need not be afunction.

As we have seen, d is homogeneous in prices and so we may regard d as acorrespondence

d :Δ→�n+.

Similarly for a producer let s : Δ→�n+ where s(p1, . . . , pn) = (−x∗1 (p), . . . ,

−x∗m(p), x∗m+1(p) . . .) be the supply correspondence. (Here the first m values arenegative because these are inputs.)

Now consider a society {1, . . . , i, . . . ,m} and commodities named {1 . . . j . . . n}.Let ei ∈ �n+ be the initial endowment vector of agent i, and e =∑m

i=1 ei the totalendowment of the society. Then a price vector p∗ ∈ Δ is a market-clearing priceequilibrium when e +∑m

i=1 si(p∗) =∑m

i=1 di(p∗) where si(p

∗) ∈ �n+ belongs tothe set of optimal input-output vectors at price p∗ for agent i, and di(p

∗) is anoptimal demand vector for consumer i at price vector p∗.

As an illustration, consider a two person, two good exchange economy (withoutproduction) and let eij be the initial endowment of good j to agent i. Let (f1, f2) :�2+ →�2 be the C1-utility functions of the two players.

At (p1,p2) ∈Δ, for optimality we have(

∂fi

∂xi1,

∂fi

∂xi2

)= λi(p1,p2), λi > 0.

But x1j + x2j = e1j + e2j for j = 1 or 2, in market equilibrium. Thus ∂fi

∂xij=− ∂fi

∂xkj

when i �= k. Hence 1λ1

(− ∂f1∂x11

,∂f1∂x12

)= (p1,p2)= 1λ2

(∂f2∂x11

,− ∂f2∂x12

) or (∂f1∂x11

,∂f1∂x12

)+λ(

∂f2∂x11

,∂f2∂x12

)= 0, for some λ > 0. See Fig. 4.16.As we shall see in the next section this implies that the result (x11, x12, x21, x22)

of optimal individual behaviour at the market-clearing price equilibrium is a Paretooptimal outcome under certain conditions.

njschofi

Cross-Out

njschofi

Sticky Note

A non-convex production set.

AU

TH

OR

’S P

RO

OF


4.4 The Pareto Set and Price Equilibria 171

1657

1658

1659

1660

1661

1662

1663

1664

1665

1666

1667

1668

1669

1670

1671

1672

1673

1674

1675

1676

1677

1678

1679

1680

1681

1682

1683

1684

1685

1686

1687

1688

1689

1690

1691

1692

1693

1694

1695

1696

1697

1698

1699

1700

1701

1702

Fig. 4.16 ???

4.4 The Pareto Set and Price Equilibria

4.4.1 The Welfare and Core Theorems

Consider a society M = {1, . . . ,m} of m individuals where the preference of theith individual in M on a convex admissible set Y in �n is given by a C1-functionui : Y ⊂�n→�. Then u= (u1, . . . , um) : Y ⊂�n→�m is called a C1-profile forthe society.

A point y ∈ Y is said to be Pareto preferred (for the society M) to x ∈ Y iffui(y) > ui(x) for all i ∈M . In this case write y ∈ PM(x) and call PM : Y → Y

the Pareto correspondence. The (global) Pareto set for M on Y is the choiceP(u1, . . . , um)= {y ∈ Y : PM(y)=Φ}. We seek to characterise this set.

In the same way as before we shall let

Hi(x)= {y ∈ Y : dui(x)(y − x) > 0}

where Hi : Y → Y for each i = 1, . . . ,m. (Notice that Hi(x) is open ∀x ∈ Y .)Given a correspondence P : Y → Y the inverse correspondence P−1 : Y → Y is

defined by

P−1(x)= {y ∈ Y : x ∈ P(y)}.

In Sect. 3.3 we said a correspondence P : Y → Y (where Y is a topologicalspace) is lower demi-continuous (LDC) iff for all x ∈ Y,P−1(x) is open in Y .

Clearly if ui : W → � is continuous then the preference correspondence Pi :Y → Y given by Pi(x)= {y : ui(y) > ui(x)} is LDC, since

P−1i (y)= {x ∈ Y : ui(y) > u(x)

}

is open.

njschofi

Cross-Out

njschofi

Sticky Note

Two person, two good exchange economy.

AU

TH

OR

’S P

RO

OF



1703

1704

1705

1706

1707

1708

1709

1710

1711

1712

1713

1714

1715

1716

1717

1718

1719

1720

1721

1722

1723

1724

1725

1726

1727

1728

1729

1730

1731

1732

1733

1734

1735

1736

1737

1738

1739

1740

1741

1742

1743

1744

1745

1746

1747

1748

We show now that when ui : Y →� is C1 then Hi : Y → Y is LDC and as aconsequence if Hi(x) �=Φ then Pi(x) �=Φ .

This implies that if x is a global maximum of ui on Y (so Pi(x) = Φ) thenHi(x)=Φ (so x is a generalised critical point).

Lemma 4.18 If ui : Y →� is a C1-function on the convex admissible set Y , thenHi : Y → Y is lower demi-continuous and if Hi(x) �=Φ then Pi(x) �=Φ .

Proof Suppose that Hi(x) �= Φ . Then there exists y ∈ Y such that dui(x)(y −x) > 0. Let h= y − x.

By the continuity of dui : Y → L(�n,�) there exists a neighbourhood U of x

in Y , and a neighbourhood V of h in �n such that dui(z)(h′) > 0 for all z ∈ U , for

all h′ ∈ V .Since y ∈Hi(x), we have x ∈H−1

i (y). Now h= y − x. Let

U ′ = {x′ ∈U : y − x′ ∈ V}.

For all x′ ∈ U ′, dui(x′)(y − x′) > 0. Thus U ′ ⊂ H−1

i (y). Hence H−1i (y) is open.

This is true at each y ∈ Y , and so Hi is LDC.Suppose that Hi(x) �=Φ and h ∈Hi(x). Since Hi is LDC it is possible to choose

λ ∈ (0,1), by Taylor’s Theorem, such that

ui(x + λh)= ui(x)+ dui(z)(λh),

where dui(z)(h) > 0, and z ∈ (x, x + λh). Thus ui(x + λh) > ui(x) and soPi(x) �=Φ . �

When u= (u1, . . . , um) : Y →�m is a C1-profile then define the correspondenceHM : Y → Y by HM(x)=⋂i∈M Hi(x) i.e., y ∈HM(x) iff dui(x)(y − x) > 0 forall i ∈M .

Lemma 4.19 If (u1, . . . , um) : Y →�m is a C1-profile, then HM : Y → Y is lowerdemi-continuous. If HM(x) �=Φ then PM(x) �=Φ .

Proof Suppose that HM(x) �=Φ . Then there exists y ∈Hi(x) for each i ∈M . Thusx ∈ H−1

i (y) for all i ∈M . But each H−1i (y) is open; hence ∃ an open neighbour-

hood Ui of x in H−1i (y); let U =⋂i∈M Ui . Then x′ ∈U implies that x′ ∈H−1

M (y).Thus HM is LDC. As in the proof of Lemma 4.18 it is then possible to chooseh ∈ �n such that, for all i in M ,

ui(x + h)= ui(x)+ dui(z)(h)

where z belongs to U , and dui(z)(h) > 0. Thus x + h ∈ PM(x) and soPM(x) �=Φ . �

AU

TH

OR

’S P

RO

OF



1749

1750

1751

1752

1753

1754

1755

1756

1757

1758

1759

1760

1761

1762

1763

1764

1765

1766

1767

1768

1769

1770

1771

1772

1773

1774

1775

1776

1777

1778

1779

1780

1781

1782

1783

1784

1785

1786

1787

1788

1789

1790

1791

1792

1793

1794

The set {x : HM(x) = Φ} is called the critical Pareto set, and is often writtenas ΘM , or Θ(u1, . . . , um). By Lemma 4.19, Θ(u1, . . . , um) contains the Pareto setP(u1, . . . , um).

Moreover we can see that ΘM must be closed in Y . To see this suppose thatHM(x) �=Θ , and y ∈HM(x) �=Φ . Thus x ∈H−1

M (y). But HM is LDC and so thereis a neighbourhood U of x in Y such that x′ ∈ H−1

M (y) for all x′ ∈ U . Then y ∈HM(x′) for all x′ ∈U , and so HM(x′) �=Φ for all x′ ∈U .

Hence the set {x ∈ Y : HM(x) �= Φ} is open and so the critical Pareto set isclosed.

In the same way, the Pareto correspondence PM : Y → Y is given by PM(x) =⋂i∈M Pi(x) where Pi(x) = {y : ui(y) > ui(x)} for each i ∈M . Since each Pi is

LDC, so must be PM , and thus the Pareto set P(u1, . . . , um) must also be closed.Suppose now that u1, . . . , um are all concave C1-or strictly pseudo-concave func-

tions on the convex set Y .By Lemma 4.17, for each i ∈M,Pi(x)⊂Hi(x) at each x ∈ Y .If x ∈Θ(u1, . . . , um) then

⋂

i∈M

Pi(x)⊂⋂

i∈M

Hi(x)=Φ

and so x must also belong to the (global) Pareto set. Thus if u= (u1, . . . , um) witheach ui concave C1 or strictly pseudo-concave, then the global Pareto set P(u)

and the critical Pareto set Θ(u) coincide. In this case we may more briefly say thepreference profile represented by u is strictly convex.

A point in P(u1, . . . , um) is the precise analogue, in the case of a family of func-tions, of a maximum point for a single function, while a point in Θ(u1, . . . , um) isthe analogue of a critical point of a single function u : Y →�. In the case of a fam-ily or profile of functions, a point x belongs to the critical Pareto set ΘM(u), whena generalised Lagrangian L(u1, . . . , um) has differential dL(x)= 0.

This allows us to define a Hessian for the family and determine which criticalPareto points are global Pareto points.

Suppose then that u = (u1, . . . , um) : Y →�m where Y is a convex admissibleset in �n and each ui : Y →� is a C1-function.

A generalised Lagrangian L(λ,u) for u is a semipositive combination∑m

i=1 λiui

where each λi ≥ 0 but not all λi = 0.For convenience let us write

�m+ ={x ∈ �m : xi ≥ 0 for i ∈M

}

0�m+ ={x ∈ �m : xi > 0 for i ∈M

}, and

�m+ = �m+\{0}.

AU

TH

OR

’S P

RO

OF



1795

1796

1797

1798

1799

1800

1801

1802

1803

1804

1805

1806

1807

1808

1809

1810

1811

1812

1813

1814

1815

1816

1817

1818

1819

1820

1821

1822

1823

1824

1825

1826

1827

1828

1829

1830

1831

1832

1833

1834

1835

1836

1837

1838

1839

1840

Thus λ ∈ �m+ iff each λi ≥ 0 but not all λi = 0. Since each ui : Y →� is a C1-function, the differential at x is a linear map dui(x) : �n →�. Once a coordinatebasis for �n is chosen, dui(x) may be represented by the row vector

Dui(x)=(

∂ui

∂x

∣∣∣∣x

, . . . ,∂ui

∂xn

∣∣∣∣x

).

Similarly the profile u : Y →�m has differential at x represented by the (n×m)Jacobian matrix

Du(x)=⎛

⎜⎝

Du1(x)...

Dum(x)

⎞

⎟⎠ : �n→�m.

Suppose now that λ ∈ �m. Then define λ ·Du(x) : �n→� by

(λ ·Du(x)

)(v)= ⟨λ,Du(x)(v)

⟩

where 〈λ,Du(x)(v)〉 is the scalar product of the two vectors λ,Du(x)(v) in �m.

Lemma 4.20 The gradient vectors {Dui(x) : i ∈M} are linearly dependent andsatisfy the equation

m∑

i=1

λiDui(x)= 0

iff [ImDu(x)]⊥ is the subspace of �m spanned by λ= (λ1, . . . , λm).Here λ ∈ [ImDu(x)]⊥ iff 〈λ,w〉 = 0 for all w ∈ ImDu(x).

Proof

λ ∈ [ImDu(x)]⊥ ⇔ 〈λ,w〉 = 0 ∀ w ∈ ImDu(x)

⇔ ⟨λ,Du(x)(v)

⟩= 0 ∀ v ∈ �n

⇔ (λ ·Du(x)

)(v)= 0 ∀ v ∈ �n

⇔ λ ·Du(x)= 0.

But λ ·Du(x)= 0⇔∑mi=1 λiDui(x)= 0, where λ= (λ1, . . . , λm). �

Theorem 4.21 If u : Y →�m is a C1-profile on an admissible convex set and x

belongs to the interior of Y , then x ∈ Θ(u1, . . . , um) iff there exists λ ∈ �m+ suchthat dL(λ,u)(x)= 0.

If x belongs to the boundary of Y and dL(λ,u)(x) = 0, for λ ∈ �m+, then x ∈Θ(u1, . . . , um).

AU

TH

OR

’S P

RO

OF



1841

1842

1843

1844

1845

1846

1847

1848

1849

1850

1851

1852

1853

1854

1855

1856

1857

1858

1859

1860

1861

1862

1863

1864

1865

1866

1867

1868

1869

1870

1871

1872

1873

1874

1875

1876

1877

1878

1879

1880

1881

1882

1883

1884

1885

1886

Proof Pick a coordinate basis for �n. Suppose that there exists λ ∈ �m+ such that

L(λ,u)(x)=m∑

i=1

λiui(x) ∈ �,

satisfies∑m

i=1 λiDui(x)= 0 (that is to say DL(λ,u)(x)= 0).By Lemma 4.20 this implies that

λ ∈ [Im(Du(x))]⊥

.

However suppose x /∈ Θ(u1, . . . , um). Then there exists v ∈ �n such that

Du(x)(v) = w ∈ 0�m+, i.e., 〈Dui(x), v〉 = wi > 0 for all i ∈ M , where w =(w1, . . . ,wm). But w ∈ ImDu(x) and w ∈ 0�m+.

Moreover λ ∈ �m+ and so 〈λ,w〉> 0 (since not all λi = 0).This contradicts λ ∈ [Im(Du(x))]⊥, since 〈λ,w〉 �= 0. Hence x ∈Θ(u1, . . . , um).

Thus we have shown that for any x ∈ Y , if DL(λ,u)(x) = 0 for some λ ∈ �m+,then x ∈Θ(u1, . . . , um). Clearly DL(λ,u)(x)= 0 iff DL(λ,u)(x)= 0, so we haveproved sufficiency.

To show necessity, suppose that {Dui(x) : i ∈M} are linearly independent. Ifx belongs to the interior of Y then for a vector h ∈ �n there exists a vector y =x + θh, for θ sufficiently small, so that y ∈ Y and ∀i ∈M, 〈Dui(x),h〉 > 0. Thusx /∈Θ(u1, . . . , um).

So suppose that DL(λ,u)(x) = 0 where λ �= 0 but λ /∈ 0�m+. Then for at least

one i, λi < 0. But then there exists a vector w ∈ 0�m+ where w = (w1, . . . ,wm) andwi > 0 for each i ∈M , such that 〈λ,w〉 = 0. By Lemma 4.20, w ∈ Im(Du(x)).Hence there exists a vector h ∈ �n such that Du(x)(h)=w.

But w ∈ 0�m+, and so 〈Dui(x), h〉> 0 for all i ∈M . Since x belongs to the interiorof Y , there exists a point y = x + αh such that y ∈ Hi(x) for all i ∈ M . Hencex /∈Θ(u1, . . . , um).

Consequently if x is an interior point of Y then x ∈Θ(u1, . . . , um) implies thatdL(λ,u)(x)= 0 for some semipositive λ in �m+. �

Example 4.12 To illustrate, we compute the Pareto set in �2 when the utility func-tions are

u1(x1, x2) = x1αx2 where α ∈ (0,1), and

u2(x1, x2) = 1− x12 − x2

2.

We maximise u1 subject to the constraint u2(x1, x2)≥ 0.As in Example 4.9, the first order condition is

(αx1

α−1x2, x1α)+ λ(−2x1,−2x2)= 0.

AU

TH

OR

’S P

RO

OF



1887

1888

1889

1890

1891

1892

1893

1894

1895

1896

1897

1898

1899

1900

1901

1902

1903

1904

1905

1906

1907

1908

1909

1910

1911

1912

1913

1914

1915

1916

1917

1918

1919

1920

1921

1922

1923

1924

1925

1926

1927

1928

1929

1930

1931

1932

Fig. 4.17 ???

Hence λ= αx1α−1x2

2x1= x1

α

2x2, so αx2

2 = x12, or x1 =±√αx2.

If x1 = −√αx2 then λ = x1α√α

2(−x1) < 0, and so such a point does not belong

to the critical Pareto set. Thus (x1, x2) ∈ Θ(u1, u2) iff x1 = x2√

α. Note that ifx1 = x2 = 0 then the Lagrangian may be written as

L(λ,u)(0,0)= λ1u1(0,0)+ λ2u2(0,0)

where λ1 = 0 and λ2 is any positive number. In the positive quadrant�2+, the criticalPareto set and global Pareto set coincide.

Finally to maximise u1 on the set {(x1, x2) : u2(x1, x2)≥ 0} we simply choose λ

such that u2(x1, x2)= 0.Thus x1

2+x22 = αx2

2+x22 = 1 or x2 = 1√

1+αand so (x1, x2)= (

√α√

1+α, 1√

1+α).

In the next chapter we shall examine the critical Pareto set Θ(u1, . . . , um) anddemonstrate that the set belongs to a singularity “manifold” which can be topologi-cally characterised to be of “dimension” m− 1. This allows us then to examine theprice equilibria of an exchange economy.

Note that in the example of a two person exchange economy studied inSect. 4.3.2, we showed that the result of individual optimising behaviour led toan outcome, x, such that

Du1(x)+ λDu2(x)= 0

where λ > 0. As we have shown here this “market clearing equilibrium” must be-long to the critical Pareto set Θ(u1, u2). Moreover, when both u1 and u2 representstrictly convex preferences, then this outcome belongs to the global Pareto set. Wedevelop this in the following theorem.

njschofi

Cross-Out

njschofi

Sticky Note

Example4.12

AU

TH

OR

’S P

RO

OF



1933

1934

1935

1936

1937

1938

1939

1940

1941

1942

1943

1944

1945

1946

1947

1948

1949

1950

1951

1952

1953

1954

1955

1956

1957

1958

1959

1960

1961

1962

1963

1964

1965

1966

1967

1968

1969

1970

1971

1972

1973

1974

1975

1976

1977

1978

The Welfare Theorem for an Exchange Economy Consider an exchange econ-omy where each individual in the society M = {1, . . . , i, . . . ,m} has initial endow-ment ei ∈ �n+ .1. Suppose that the demand x∗i (p) ∈ �n+ of agent i at each price p ∈ Δ is such

that x∗i (p) maximises the C1-utility function ui : �n →� on the budget setBi(p)= {xi ∈ �n+ : 〈p,xi〉 ≤ 〈p, ei〉} and satisfies(a) Dui(x

∗i (p))= λip, λi > 0,

(b) 〈p,x∗i (p)〉 = 〈p, ei〉.2. Suppose further that p∗ is a market clearing price equilibrium in the sense that∑m

i=1 x∗i (p∗)=∑mi=1 ei ∈ �n.

Then x∗ = (x∗1 (p∗), . . . , x∗m(p)) ∈Θ(u1, . . . , um).

Moreover if each ui is either concave, or strictly pseudo-concave then x∗ belongsto the Pareto set, P(u1, . . . , um).

Proof We need to define the set of outcomes first of all. An outcome, x, is a vector

x = (x1, . . . , xi, . . . , xm) ∈ (�n+)m =�nm+ ,

where xi = (xi1, . . . , xin) ∈ �n+ is an allocation for agent i. However there are n

resource constraintsm∑

i=1

xij =m∑

i=1

eij = e·j ,

for j = 1, . . . , n, where ei = (ei1, . . . , ein) ∈ �n+ is the initial endowment of agent i.Thus the set, Y , of feasible allocations to the members of M is a hyperplane of

dimension n(m− 1) in �nm+ through the point (e1, . . . , em).As coordinates for any point x ∈ Y we may choose

x = (x11, . . . , x1n, x21, . . . , x2n, . . . , x(m−1)1, . . . , x(m−1)n)

where it is implicit that the bundle of commodities available to agent m is

xm = (xm1, . . . , xmn)

where xmj = e·j −∑m−1i=1 xij .

Now define u∗i : Y →�, the extended utility function of i on Y by

u∗i (x)= ui(xi1, . . . , xin).

For i ∈M , it is clear that the direction gradient of i on Y is

Du∗i (x)=(

0, . . . ,0,∂ui

∂xi

, . . . ,∂ui

∂xin

,0, . . .

)= (. . . ,0, . . . ,Dui(x), . . . ,0

).

AU

TH

OR

’S P

RO

OF



1979

1980

1981

1982

1983

1984

1985

1986

1987

1988

1989

1990

1991

1992

1993

1994

1995

1996

1997

1998

1999

2000

2001

2002

2003

2004

2005

2006

2007

2008

2009

2010

2011

2012

2013

2014

2015

2016

2017

2018

2019

2020

2021

2022

2023

2024

For agent m, ∂u∗m∂xij=− ∂um

∂xmjfor i = 1, . . . ,m− 1; thus

Du∗m(x) = −(

∂um

∂xm1, . . . ,

∂um

∂xmn

, . . .

)

= −(Dum(x), . . . , . . . ,Dum(x)).

If p∗ is a market-clearing price equilibrium, then by definition

m∑

i=1

x∗i(p∗)=

m∑

i=1

ei .

Thus x∗(p∗) = (x∗1 (p∗), . . . , x∗m−1(p∗)) belongs to Y . But each x∗i is a critical

point of ui : �n+ →� on the budget set Bi(p∗) and Dui(x

∗i (p∗))= λip

∗.Thus the Jacobian for u∗ = (u∗1, . . . , u∗m) : Y →�m at x∗(p∗) is

Du∗(x∗)=

⎡

⎢⎢⎢⎣

λ1p∗ 0 . . . . . . 0

0 λ2p∗

...... λm−1p

∗−λmp∗ −λmp∗ −λmp∗

⎤

⎥⎥⎥⎦

Hence 1λ1

Du∗1(x∗)+ 1λ2

Du∗2(x∗) · · · + 1λm

Du∗m(x∗)= 0. But each λi > 0 for i =1, . . . ,m. Then dL(μ,u∗)(x∗(p∗)) = 0 where L(μ,u∗)(x) =∑m

i=1 μiu∗i (x) and

μi = 1λi

and μ ∈ �m+.By Theorem 4.21, x∗(p∗) belongs to the critical Pareto set.Clearly, if for each i, ui : �n+ → � is concave C1- or strictly pseudo-concave

then u∗1 : Y →� will be also. By previous results, the critical and global Pareto setwill coincide, and so x∗(p∗) will be Pareto optimal. �

One can also show that the competitive allocation, x∗(p∗) ∈ �nm+ , constructedin this theorem is Pareto optimal in a very simple way. By definition x∗(p∗) ischaracterised by the two properties:1.∑m

i=1 x∗i (p∗)=∑mi=1 ei in �n (feasibility)

2. If ui(xi) > ui(x∗i (p∗)) then 〈p∗, xi〉> 〈p∗, ei〉 (by the optimality condition for

agent i).But if x∗i (p∗) is not Pareto optimal, then there exists a vector x = (x1, . . . , xm) ∈

�nm+ such that ui(xi) > ui(x∗i (p∗)) for i = 1, . . . ,m. By (2), 〈p∗, xi〉> 〈p∗, ei〉 for

each i and som∑

i=1

⟨p∗, xi

⟩=⟨

p∗,m∑

i=1

xi

⟩

>

⟨

p∗,m∑

i=1

ei

⟩

.

But if x ∈ �nm+ is feasible then∑m

i=1 xi ≤ ∑mi=1 ei which implies 〈p∗,∑m

i=1 xi〉 ≤ 〈p∗,∑mi=1 ei〉.

By contradiction x∗(p∗) must belong to the Pareto optimal set.

AU

TH

OR

’S P

RO

OF



2025

2026

2027

2028

2029

2030

2031

2032

2033

2034

2035

2036

2037

2038

2039

2040

2041

2042

2043

2044

2045

2046

2047

2048

2049

2050

2051

2052

2053

2054

2055

2056

2057

2058

2059

2060

2061

2062

2063

2064

2065

2066

2067

2068

2069

2070

The observation has an immediate extension to a result on existence of a core ofan economy.

Definition 4.3 Let e = (e1, . . . , em) ∈ �nm+ be an initial endowment vector for asociety M . Let D be any family of subsets of M , and let P = (P1, . . . ,Pm) be aprofile of preferences for society M , where each Pi : �n+ → �n+ is a preferencecorrespondence for i on the ith consumption space Xi ⊂�n+.

An allocation x ∈ �nm+ is S-feasible (for S ⊂ M) iff x = (xij ) ∈ �nm+ and∑i∈S xij =∑i∈S eij for each j = 1, . . . , n.Given e and P , an allocation x ∈ �nm+ belongs to the D-core of (e,P ) iff

x = (x1, . . . , xm) is M-feasible and there exists no coalition S ∈ D and an allo-cation y ∈ �nm+ such that y is S-feasible and of the form y = (y1, . . . , ym) withyi ∈ Pi(xi),∀i ∈ S.

To clarify this definition somewhat, consider the set Y from the proof of the wel-fare theorem. Y = YM is a hyperplane of dimension n(m− 1) through the endow-ment point e ∈ �nm+ . For any coalition S ∈D of cardinality s, there is a hyperplaneYS , say, of dimension n(s − 1) through the endowment point e, consisting of S-feasible trades among the members of S. Clearly YS ⊂ YM . If x ∈ YM but there issome y ∈ YS such that every member of S prefers y to x, then the members of S

could refuse to accept the allocation x. If there is no such point x, then x is “un-beaten”, and belongs to the D-core of the economy described by (e,P ).

Core Theorem Let D be any family of subsets of M . Suppose that p∗ ∈ Δ is amarket clearing price equilibrium for the economy (e,P ) and x∗(B) ∈ �nm+ is thedemand vector, where

x∗(p∗)= (x∗1 (p), . . . , x∗m(p)

) ∈ YM, and Pi

(x∗i(p∗))∩Bi

(p∗)=Φ.

Then x∗(p∗) belongs to the D-core, for the economy (e,P ).

Proof Suppose that x∗(p∗) is not in the core. Then there is some y ∈ YS such thaty = (yi : i ∈ S) and y ∈ Pi(xi) for each i ∈ S. Now x∗i (p∗) is a most preferred pointfor i on Bi(p

∗) so 〈p∗, yi〉> 〈p∗, ei〉 for all i ∈ S.Hence

⟨

p∗,∑

i∈S

yi

⟩

=∑

i∈S

⟨p∗, yi

⟩>∑

i∈S

⟨p∗, ei

⟩.

However if y ∈ YS , then∑

i∈S yi = ∑i∈S ei ∈ �n+, which implies 〈p∗,∑i∈S(yi − ei)〉 = 0. By contradiction, x∗(p∗) must be in the core. �

The Core Theorem shows, even if a price mechanism is not used, that if a marketclearing price equilibrium, p∗ does exist, then the core is non-empty. This means inessence that the competitive allocation, x∗(p∗), is Pareto optimal for every coalitionS ∈D.

AU

TH

OR

’S P

RO

OF



2071

2072

2073

2074

2075

2076

2077

2078

2079

2080

2081

2082

2083

2084

2085

2086

2087

2088

2089

2090

2091

2092

2093

2094

2095

2096

2097

2098

2099

2100

2101

2102

2103

2104

2105

2106

2107

2108

2109

2110

2111

2112

2113

2114

2115

2116

By the results of Sect. 3.8, a market clearing price equilibrium p∗ will exist undercertain conditions on preference. In particular suppose preference is representableby smooth utility functions that are concave or strictly pseudo-concave and mono-tonic in the individual consumption spaces. Then the conditions of the Welfare The-orem will be satisfied, and there will exist a market clearing price equilibrium, p∗,and a competitive allocation, x∗(p∗), at p∗ which belongs to the Pareto set for thesociety M . Indeed, since the Core Theorem is valid when D consists of all sub-sets of N , the two results imply that x∗(p∗) will then belong to the critical Paretoset ΘS , associated with each coalition S ∈M . This in turn suggests for any S, thereis a solution x∗ = x∗(p∗) to the Lagrangian problem dLS(μ,u)(x∗) = 0, whereLS(μ,u∗)(x)=∑i∈S μiu

∗i (x) and μi ≥ 0 ∀i ∈ S.

Here u∗i : YS →� is the extended utility function for i on YS .It is also possible to use the concept of a core in the more general context consid-

ered in Sect. 3.8, where preferences are defined on the full space X =ΠiXi ∈ �nm+ .In this case, however, a price equilibrium may not exist if the induced social prefer-ence violates convexity or continuity. It is then possible for the D-core to be empty.

Note in particular that the model outlined in this section implicitly assumes thateach economic agent chooses their demand so as to optimize a utility function onthe budget set determined by the price vector. Thus prices are treated as exogenousvariables. However, if agents treat prices as strategic variables then it may be rationalfor them to compute their effect on prices, and thus misrepresent their preferences.The economic game then becomes much more complicated than the one analyzedhere.

A second consideration is whether the price equilibria are unique, or even locallyunique. If there exists a continuum of pure equilibria, then prices may move aroundchaotically.

A third consideration concerns the attainment of the price equilibrium. InSect. 3.8 we constructed an abstract preference correspondence for an “auction-eer” so as to adjust the price vector to increase the value of the excess supply of thecommodities. We deal with these considerations in the next section and in Chap. 5.

4.4.2 Equilibria in an Exchange Economy

The Welfare Theorem gives an important insight into the nature of competitive al-locations. The coefficients μi of the Lagrangean L(μ,u) of the social optimisationproblem turn out to be inverse to the coefficients λi in the individual optimisationproblems, where λi is equal to the marginal utility of income for the ith agent.This in turn suggests that it is possible for an agent to transform his utility functionfrom ui to u′i in such a way as to decrease λi and thus increase μi , the “weight”of the ith agent in the social optimisation problem. This is called the problem ofpreference manipulation and is an interesting research problem with applications intrade theory.

Secondly the weights μi can be regarded as functionally dependent on the initialendowment vector (e1, . . . , em) ∈ �nm+ . Thus the question of market equilibriumcould be examined in terms of the functions μi : �nm+ →�, i = 1, . . . ,m.

AU

TH

OR

’S P

RO

OF



2117

2118

2119

2120

2121

2122

2123

2124

2125

2126

2127

2128

2129

2130

2131

2132

2133

2134

2135

2136

2137

2138

2139

2140

2141

2142

2143

2144

2145

2146

2147

2148

2149

2150

2151

2152

2153

2154

2155

2156

2157

2158

2159

2160

2161

2162

It is possible that one or a number of agents could destroy or exchange com-modities so as to increase their weights. This is termed the problem of resourcemanipulation or the transfer paradox (see Gale 1974, and Balasko 1978).

Example 4.13 To illustrate these observations consider a two person (i = 1,2) ex-change economy with two commodities (j = 1,2).

As in Example 4.10, assume the preference of the ith agent is given by a utilityfunction fi : �2+ →� : fi(x, y)= βi logx + (1− βi) logy where 0 < βi < 1.

Let the initial endowment vector of i be ei = (ei1, ei2). At the price vector p =(p1,p2), demand by agent i is di(p1,p2)= (

Iiβi

p1,

Ii (1−βi)p2

) where I = p1ei1+p2ei2

is the value at p of the endowment.Thus agent i “desires” to change his initial endowment from ei to e′i :

(ei1, ei2)→(e′i1, e′i2

)=(

βiei1 + βiei2p2

p1, (1− βi)ei2 + (1− βi)

ei1p1

p2

).

Another way of interpreting this is that i optimally divides expenditure betweenthe first and second commodities in the ratio β : (1− β). Thus agent i offers to sell(1 − β)ei1 units of commodity 1 for (1 − β)ei1p1 monetary units and buy (1 −βi)ei1

p1p2

units of the second commodity, and offers to sell βiei2 units of the second

commodity and buy βiei2p2p1

units of the first commodity.At the price vector (p1,p2) the amount of the first commodity on offer is (1−

βi)e11 + (1− β2)e21 and the amount on request is β1e12p∗2 + β2e22p

∗2 ; where p∗2

is the ratio p2 : p1 of relative prices. For (p1,p2) to be a market-clearing priceequilibrium we require

e11(1− β1)+ e21(1− β2)= p∗2(e12β1 + e22β2).

Clearly if all endowments are increased by a multiple α > 0, then the equilibriumrelative price vector is unchanged. Thus p∗2 is uniquely determined and so the finalallocations (e′11, e

′12), (e

′21, e

′22) can be determined.

As we showed in Example 4.10, the coefficients λi for the individual optimisationproblems satisfy λi = 1

Ii, where λi is the marginal utility of income for agent i.

By the previous analysis, the weights μi in the social optimisation problem sat-isfy μi = Ii . After some manipulation of the price equilibrium equation we find

μi

μk

= ei1(ei2 + βkek2)+ ei2(1− βk)ek1

ek1(ek2 + βiei2)+ ek2(1− βi)ei1.

Clearly if agent i can increase the ratio μi : μk then the relative utility of i vis-à-vis k is increased. However since the relative price equilibrium is uniquely de-termined in this example, it is not possible for agent i, say, to destroy some of theinitial endowments (ei1, ei2) so as to bring about an advantageous final outcome.The interested reader is referred to Balasko (1978).

AU

TH

OR

’S P

RO

OF



2163

2164

2165

2166

2167

2168

2169

2170

2171

2172

2173

2174

2175

2176

2177

2178

2179

2180

2181

2182

2183

2184

2185

2186

2187

2188

2189

2190

2191

2192

2193

2194

2195

2196

2197

2198

2199

2200

2201

2202

2203

2204

2205

2206

2207

2208

Fig. 4.18 ???

In this example the (relative) price equilibrium is unique, but this need not al-ways occur. Consider the two person, two commodity case illustrated below inFig. 4.18. As in the Welfare Theorem, the set of feasible outcomes Y is the sub-set of �4+ = (x11, x12, x21, x22) such that x11+x21 = e·1;x12+x22 = e·2, and this isa two-dimensional hyperplane through the point (e11, e12, e21, e22). Thus Y can berepresented in the usual two-dimensional Edgeworth box where point A, the mostpreferred point for agent 1, satisfies (x11, x12)= (e·1, e·2).

The price ray p̃ is that ray through (e11, e12) where tan α = p1p2

. Clearly (p1,p2)

is an equilibrium price vector if the price ray intersects the critical Pareto setΘ(f1, f2) at a point (x11, x12) in Y where p̃ is tangential to the indifference curvesfor f1 and f2 through (x11, x12). At such a point we then have Df1(x11, x12) +μDf2(x11, x12)= 0.

As Fig. 4.18 indicates there may well be a second price ray p̃′ which satisfiesthe tangency property. Indeed it is possible that there exists a family of such rays, oreven an open set V in the price simplex such that each p in V is a market clearingequilibrium. We now explore the question of local uniqueness of price equilibria.

To be more formal let X = �n+ be the commodity or consumption space. Aninitial endowment is a vector e = (e1, . . . , em) ∈Xm. A Cr utility function is a Cr -function u= (u1, . . . , um) :X→�m.

Let Cr(X,�m) be the set of Cr -profiles, and endow Cr(X,�m) with a topologyin the following way. (See the next chapter for a more complete discussion of theWhitney topology.)

A neighbourhood of f ∈ Cr(X,�m) is a set

{g ∈ Cr

(X,�m

) : ∥∥dkg(x)− dkf (x)∥∥< ε(x)

}for k = 0, . . . , r

where ε(x) > 0 for all x ∈X. (We use the notation that d0g = g and d1g = dg.)

njschofi

Cross-Out

njschofi

Sticky Note

Multiple Price equilibria in Example 4.13

AU

TH

OR

’S P

RO

OF



2209

2210

2211

2212

2213

2214

2215

2216

2217

2218

2219

2220

2221

2222

2223

2224

2225

2226

2227

2228

2229

2230

2231

2232

2233

2234

2235

2236

2237

2238

2239

2240

2241

2242

2243

2244

2245

2246

2247

2248

2249

2250

2251

2252

2253

2254

Write Cr(X,�m) for Cr(X,�m) with this topology. A property K is calledgeneric iff it is true for all profiles which belong to a residual set in Cr(X,�m).Here residual means that the set is the countable intersection of open dense sets inCr(X,�m).

If a property is generic then we may say that almost all profiles in Cr(X,�m)

have that property.A smooth exchange economy is a pair (e, u) ∈Xm × Cr(X,�m). As before the

feasible outcome set is

Y ={

(x1, . . . , xm) ∈Xm :m∑

i=1

xi =m∑

i=1

ei

}

and a price vector p belongs to the simplex Δ= {p ∈X : ‖p‖ = 1}, where Δ is anobject of dimension n− 1.

As in the welfare theorem, the demand by agent i at p ∈ Δ satisfies : x∗i (p)

maximises ui on

B(p)= {xi ∈X : 〈p,xi〉 ≤ 〈p, ei〉}.

As we have observed, under appropriate boundary conditions, we may assume∀ i ∈M,x∗i (p) satisfies1. 〈p,x∗i (p)〉 = 〈p, ei〉2. D∗ui(x

∗i (p))= p ∈Δ. Say (x∗(p∗),p∗)= (x∗1 (p∗), . . . , x∗m(p∗),p∗) ∈Xm×

Δ is a market or Walrasian equilibrium iff x∗(p∗) is the competitive allocationat p∗ and satisfies

m∑

i=1

x∗i(p∗)=

m∑

i=1

ei ∈ �n+.

The economy (e, u) is regular iff (e, u) is such that the set of Walrasian equilibriais finite.

Debreu-Smale Theorem on Generic Existence of Regular Economies There isa residual set U in Cr(X,�m) such that for every profile u ∈U , there is a dense setV ∈Xm with the property that (e, u) is a regular economy whenever (e, u) ∈ V ×U .

The proof of this theorem is discussed in the next chapter (the interested readermight also consult Smale 1976). However we can give a flavor of the proof here.

Consider a point (e, x,p) ∈ Xm ×Xm ×Δ. This space is of dimension 2nm+(n− 1).

Now there are n resource restrictions

m∑

i=1

xij =m∑

i=1

eij

for j = 1, . . . , n, together with (m− 1) budget restrictions 〈p,xi〉 = 〈p, ei〉 for i =1, . . . ,m− 1.

AU

TH

OR

’S P

RO

OF



2255

2256

2257

2258

2259

2260

2261

2262

2263

2264

2265

2266

2267

2268

2269

2270

2271

2272

2273

2274

2275

2276

2277

2278

2279

2280

2281

2282

2283

2284

2285

2286

2287

2288

2289

2290

2291

2292

2293

2294

2295

2296

2297

2298

2299

2300

Fig. 4.19 ???

Note that the budget restriction for the mth agent is redundant.Let Γ = {(e, x,p) ∈Xm×Xm×Δ} be the set of points satisfying these various

restrictions. Then Γ will be of dimension 2nm+ (n−1)−[n+m−1] = 2nm−m.However, we also have m distinct vector equations D∗ui(x)= p, i = 1, . . . ,m.Since these vectors are normalised, each one consists of (n− 1) separate equa-

tions. Chapter 5 shows that singularity theory implies that for every profile u ina residual set, each of these constraints is independent. Together these m(n − 1)

constraints reduce the dimension of Γ by m(n− 1). Hence the set of points in Γ

satisfying the first order optimality conditions is a smooth object Zu of dimension

2nm−m−m(n− 1)= nm.

Now consider the projection

Zu ⊂Xm × (Xm ×Δ)→Xm : (e, x,p)→ e,

and note that both Zu and Xm have dimension nm.A regular economy (e, u) is one such that the projection map proj : Zu → Xm :

(e, x,p)→ e has differential with maximal rank nm. Call e a regular value in thiscase. From singularity theory it is known that for all u in a residual set U , the setof regular values of the projection map is dense in Xm. Thus when u ∈ U , and e isregular, the set of Walrasian equilibria for (e, u) will be finite. Figure 4.19 illustratesthis. At e1 there is only one Walrasian equilibrium, while at e3 there are three. More-over in a neighbourhood of e3 the Walrasian equilibria move continuously with e.At e4 the Walrasian equilibrium set is one-dimensional. As e moves from the rightpast e2 the number of Walrasian equilibria drops suddenly from 3 to 1, and displaysa discontinuity. Note that the points (x,p) satisfying (e, x,p) ∈ (proj)−1(e) neednot be Walrasian equilibria in the classical sense, since we have considered only thefirst order conditions. It is clearly the case that if there is non-convex preference,then the first order conditions are not sufficient for equilibrium. However, Smale’s

njschofi

Cross-Out

njschofi

Sticky Note

Finite Walrasian equilibria.

AU

TH

OR

’S P

RO

OF



2301

2302

2303

2304

2305

2306

2307

2308

2309

2310

2311

2312

2313

2314

2315

2316

2317

2318

2319

2320

2321

2322

2323

2324

2325

2326

2327

2328

2329

2330

2331

2332

2333

2334

2335

2336

2337

2338

2339

2340

2341

2342

2343

2344

2345

2346

Fig. 4.20 ???

theorem shows the existence of extended Walrasian equilibria. The same difficultyoccurs in the proof that a Walrasian equilibrium gives a Pareto optimal point in Y .

Let0Θ(u)= 0

Θ(u1, . . . , um) be the set of points satisfying the first order conditiondL(λ,u)= 0 where λ ∈ �m+. Suppose that we solve this with λ1 �= 0.

Then we may write Du1(x)+∑mi=2

λi

λ1Dui(x)= 0.

Clearly there are (m − 1) degrees of freedom in this solution and indeed0Θ(u1, . . . , um) can be shown to be a geometric object of dimension (m − 1) “al-

most always” (see Chap. 5). However0Θ(u) will contain points that are the “social”

equivalents of the minima of a real-valued function.

Note, by Lemma ??, that0Θ(u) and the critical Pareto set Θ(u) coincide, except

for boundary points. If the boundary of the space is smooth, then it is possible todefine a Lagrangian which characterises the boundary points in Θ(u).

For example consider Fig. 4.20, of a two agent two commodity exchange econ-omy.

Agent 1 has non-convex preference, and the critical Pareto set consists of threecomponents ABC,ADC and EFG.

On ADC although the utilities satisfy the first order condition, there exist nearbypoints that both agents prefer. For example, both agents prefer a nearby point y tox. See Fig. 4.21.

In Fig. 4.22 from an initial endowment such as e = (e11, e12), there exists threeWalrasian extended equilibria, but at least one can be excluded. Note that if e is theinitial endowment vector, then the Walrasian equilibrium B which is accessible byexchange along the price vector may be Pareto inferior to a Walrasian equilibrium,F , which is not readily accessible.

njschofi

Cross-Out

njschofi

Cross-Out

njschofi

Cross-Out

njschofi

Sticky Note

Theorem 4.21

njschofi

Cross-Out

njschofi

Sticky Note

Two agent, two commodity exchange economy

AU

TH

OR

’S P

RO

OF



2347

2348

2349

2350

2351

2352

2353

2354

2355

2356

2357

2358

2359

2360

2361

2362

2363

2364

2365

2366

2367

2368

2369

2370

2371

2372

2373

2374

2375

2376

2377

2378

2379

2380

2381

2382

2383

2384

2385

2386

2387

2388

2389

2390

2391

2392

Fig. 4.21 ???

Fig. 4.22 ???

Fig. 4.23 ???

Example 4.14 Consider the example due to Smale (as in Fig. 4.23). Let Y = �2

and suppose

u1(x, y) = y − x2

u2(x, y) = −y

x2 + 1.

Then

Du1(x, y) = (−2x,1)

Du2(x, y) =(

2xy

(x2 + 1)2,−1

x2 + 1

).

njschofi

Cross-Out

njschofi

Sticky Note

Unstable critical Paret point

njschofi

Cross-Out

njschofi

Sticky Note

Unstable Pareto component.

njschofi

Cross-Out

njschofi

Sticky Note

Example 4.14

AU

TH

OR

’S P

RO

OF


References 187

2393

2394

2395

2396

2397

2398

2399

2400

2401

2402

2403

2404

2405

2406

2407

2408

2409

2410

2411

2412

2413

2414

2415

2416

2417

2418

2419

2420

2421

2422

2423

2424

2425

2426

2427

2428

2429

2430

2431

2432

2433

2434

2435

2436

2437

2438

Let DLλ(x, y)= λ1(−2x,1)+ λ2(2xy

(x2+1)2 , −1x2+1

).Clearly one solution will be x = 0, in which case λ1(0,1) + λ2(0,−1) = 0 or

λ1 = λ2 = 1.The Hessian for L at x = 0 is then

HL(0, y) =D2u1(0, y)+D2u2(0, y)

=(−2 0

0 0

)+(

2y 00 0

)

which is negative semi-definite for 2(y − 1) < 0 or y < 1.

References

For a lucid account of economic equilibrium theory see:

<unc> Arrow, K. J., & Hahn, F. H. (1971). General competitive analysis. Edinburgh: Oliver and Boyd.Hildenbrand, W., & Kirman, A. P. (1976). Introduction to equilibrium analysis. Amsterdam: North

Holland.

For the ideas of preference or resource manipulation, see:

Balasko, Y. (1978). The transfer problem and the theory of regular economies. International Eco-nomic Review, 19, 687–694.

Gale, D. (1974). Exchange equilibrium and coalitions. Journal of Mathematical Economics, 1,63–66.

<unc> Guesnerie, R., & Laffont, J.-J. (1978). Advantageous reallocations of initial resources. Economet-rica, 46, 687–694.

<unc> Safra, Z. (1983). Manipulation by reallocating initial endowments. Journal of Mathematical Eco-nomics, 12, 1–17.

For a general introduction to the application of differential topology to economics see:

Smale, S. (1976). Dynamics in general equilibrium theory. American Economic Review, 66, 288–294. Reprinted in S. Smale (1980) The Mathematics of Time. Springer: Berlin.

njschofi

Cross-Out

njschofi

Sticky Note

Further Reading

AU

TH

OR

’S P

RO

OF


1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

5Singularity Theory and General Equilibrium

In the last section of the previous chapter we introduced the critical Pareto set Θ(u)

of a smooth profile for a society, and the notion of a regular economy. Both ideasrelied on the concept of a singularity of a smooth function f : X→�m, where asingularity is a point analogous to the critical point of a real-valued function.

In this chapter we introduce the fundamental result in singularity theory, that theset of singularity points of a smooth profile almost always has a particular geometricstructure. We then go on to use this result to discuss the Debreu-Smale Theorem onthe generic existence of regular economies. No attempt is made to prove these re-sults in full generality. Instead the aim is to provide a geometric understanding of theideas. Section 5.4 uses an example of Scarf (1960) to illustrate the idea of an excessdemand function for an exchange economy. The example provides a general wayto analyse a smooth adjustment process leading to a Walrasian equilibrium. Sec-tions 5.5 and 5.6 introduce the more abstract topological ideas of structural stabilityand chaos in dynamical systems.

5.1 Singularity Theory

In Chap. 4 we showed that when f :X→� was a C2-function on a normed vectorspace, then knowledge of the first and second differential of f at a critical point, x,gave information about the local behavior (near x) of the function. In this sectionwe discuss the case of a differentiable function f :X→ Y between general normedvector spaces, and consider regular points (where the differential has maximal rank)and singularity points (where the differential has non-maximal rank). For both kindsof points we can locally characterise the behavior of the function.

5.1.1 Regular Points: The Inverse and Implicit Function Theorem

Suppose that f : X→ Y is a function between normed vector spaces and that forall x ′ in a neighbourhood U of the point x the differential df (x′) is defined. Bythe results of Sect. 3.2, if df (x′) is bounded, or if X is finite-dimensional, then


189

http://dx.doi.org/10.1007/978-3-642-39818-6_5

AU

TH

OR

’S P

RO

OF


190 5 Singularity Theory and General Equilibrium

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67

68

69

70

71

72

73

74

75

76

77

78

79

80

81

82

83

84

85

86

87

88

89

90

91

92

df (x′) will be a continuous linear function. Suppose now that X and Y have thesame finite dimension (n) and df (x′) has rank n at all x′ ∈ U . Then we knowthat df (x′)−1 : Y → X is a linear map and thus continuous. We shall call f a C2-diffeomorphism on U in this case.

In general, when Y is an infinite-dimensional vector space, then even if df (x′)−1

exists it need not be continuous. However when X and Y are complete normedvector spaces then the existence of df (x′)−1 is sufficient to guarantee that df (x′)−1

is continuous.Essentially a normed vector space X is complete iff any “convergent” sequence

(xk) does indeed converge to a point in X. More formally, a Cauchy sequence isa sequence (xk) such that for any ε > 0 there exists some number k(ε) such thatr, s > k(ε) implies that ‖xr − xs‖ < ε. If (xk) is a sequence with a limit x0 in X

then clearly (xk) must be a Cauchy sequence. On the other hand a Cauchy sequenceneed not in general converge to a point in the space X. If every Cauchy sequence hasa limit in X, then X is called complete. A complete normed vector space is calleda Banach space. Clearly �n is a Banach space. Suppose now that X,Y are normedvector spaces of the same dimension, and f : U ⊂ X→ Y is a Cr -differentiablefunction on U , such that df (x) has a continuous inverse [df (x)]−1 at x. We call f aCr -diffeomorphism at x. Then we can show that f has an inverse f−1 : f (U)→U

with differential df−1(f (x))= [df (x)]−1. Moreover there exists a neighbourhoodV of x in U such that f is a Cr -diffeomorphism on V . In particular this meansthat f has an inverse f−1 : f (V )→ V with continuous differential df−1(f (x′))=[df (x′)]−1 for all x′ ∈ V , and that f−1 is Cr -differentiable on V . To prove thetheorem we need to ensure that [df (x)]−1 is not only linear but continuous, and itis sufficient to assume X and Y are Banach spaces.

Inverse Function Theorem Suppose f : U ⊂ X→ Y is Cr -differentiable, whereX and Y are Banach spaces of dimension n. Suppose that the linear map df (x) :X→ Y , for x ∈ U , is an isomorphism with inverse [df (x)]−1 : Y →X. Then thereexist open neighbourhoods V of x in U and V ′ of f (x) such that f : V → V ′ isa bijection with inverse f−1 : V ′ → V . Moreover f−1 is itself Cr -differentiableon V ′, and for all x′ ∈ V , df−1(f (x′)) = [df (x′)]−1. f is called a local Cr -dif-feomorphism, at x.

Outline of Proof Let t = df (x) :X→ Y . Since [df (x)]−1 exists and is continuous,t−1 : Y →X is linear and continuous.

It is possible to choose a neighbourhood V ′ of f (x) in f (U) and a closed ball Vx

in U centered at x, such that, for each y ∈ V ′, the function gy : Vx ⊂U ⊂X→ Vx ⊂X defined by gy(x

′)= x′ − t−1[f (x′)− y] is continuous. By Brouwer’s Theorem,each gy has a fixed point.

That is to say for each y ∈ V ′, there exists x′ ∈ Vx such that gy(x′)= x′. But then

t−1[f (x′)− y] = 0. Since, by hypothesis, t−1 is an isomorphism, its kernel = {0},and so f (x′) = y. Thus for each y ∈ V ′ we establish gy(x

′) = x′ is equivalent tof (x′) = y. Define f−1(y) = gy(x

′) = x′, which gives the inverse function on V ′.To show f−1 is differentiable, proceed as follows.

AU

TH

OR

’S P

RO

OF


5.1 Singularity Theory 191

93

94

95

96

97

98

99

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

Note that dgy(x′)= Id−t−1◦df (x′) is independent of y. Now dgy(x

′) is a linearand continuous function from X to X and is thus bounded. Since X is Banach, it ispossible to show that L(X,X), the topological space of linear and continuous mapsfrom X to X, is also Banach. Thus if u ∈ L(X,X), so is (Id−u)−1. This followssince (Id−u)−1 converges to an element of L(X,X).

Now dgy(x′) ∈ L(X,X) and so (Id−dgy(x

′))−1 ∈ L(X,X). But then t−1 ◦df (x′) has a continuous linear inverse. Now t−1 ◦ df (x′) : X→ Y → X and t−1

has a continuous linear inverse. Thus df (x′) has a continuous linear inverse, for allx′ ∈ V . Let V be the interior of Vx . By the construction the inverse of df (x′), forx′ ∈ V , has the required property. �

This is the fundamental theorem of differential calculus. Notice that the theoremasserts that if f : �n →�n and df (x) has rank n at x, then df (x′) has rank n forall x′ in a neighbourhood of x.

Example 5.1 (i) For a simple example, consider the function exp : �→�+ : x→ex . Clearly for any finite x ∈ �, d(exp)(x)= ex �= 0, and so the rank of the differ-ential is 1. The inverse φ : �+ →� must satisfy

dφ(y)= [d(exp)(x)]−1 = 1

ex

where y = exp(x)= ex . Thus dφ(y)= 1y

.Clearly φ must be the function loge : y→ loge y.(ii) Consider sin: (0,2π)→[−1,+1].Now d(sin)(x) �= 0. Hence there exist neighbourhoods V of x and V ′ of sinx

and an inverse φ : V ′ → V such that

dφ(y)= 1

cosx= 1√

1− y2.

This inverse φ is only locally a function. As Fig. 5.1 makes clear, even whensinx = y, there exist distinct values x1, x2 such that sin(x1)= sin(x2)= y. Howeverd(sin)(x1) �= d(sin)(x2).

The figure also shows that there is a neighbourhood V ′ of y such that φ : V ′ → V

is single-valued and differentiable on V ′. Suppose now x = π2 . Then d(sin)(π

2 )= 0.Moreover there is no neighbourhood V of π

2 such that sin : (π2 − ε, π

2 + ε)= V →V ′ has an inverse function.

Note one further consequence of the theorem. For h small, we may write

f (x + h) = f (x)+ df (x) ◦ [df (x)]−1(

f (x + h)− f (x))

= f (x)+ df (x)ψ(h),

where ψ(h)= d[f (x)]−1(f (x+h)−f (x)). Now by a linear change of coordinateswe can diagonalise df (x). So that in the case f = (f1, . . . , fn) : �n →�n we canensure ∂fi

∂xj|x = ∂ij where ∂ij = 1 if i = j and 0 if i �= j . Hence f (x + h)= f (x)+

AU

TH

OR

’S P

RO

OF



139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

175

176

177

178

179

180

181

182

183

184

Fig. 5.1 ???

(ψ1(h), . . . ,ψn(h)). There is therefore a Cr -diffeomorphic change of coordinates φ

near x such that φ(0)= x and

f(φ(h1, . . . , hn)

)= f (x)+ (h1, . . . , hn).

In other words by choosing coordinates appropriately f may be represented byits linear differential.

Suppose now that f : U ⊂�n→�m is a C1-function. The maximal rank of df

at a point x in U is min(n,m). If indeed df (x) has maximal rank then x is called aregular point of f , and f (x) a regular value. In this case we write x ∈ S0(f ). Theinverse function theorem showed that when n=m and x ∈ S0(f ) then f could beregarded as an identity function near x.

In particular this means that there is a neighbourhood U of x such thatf−1[f (x)] ∩U = {x} is an isolated point.

In the case that n �=m we use the inverse function theorem to characterise f atregular points.

Implicit Function Theorem for Vector Spaces1. (Surjective Version). Suppose that f : U ⊂ �n → �m, n ≥ m, and rank

(df (x)) = m, with f (x) = 0 for convenience. If f is Cr -differentiable at x,then there exists a Cr -diffeomorphism φ : �n→�n on a neighbourhood of theorigin such that φ(0)= x, and f ◦ φ(h1, . . . , hn)= (h1, . . . , hm).

2. (Injective Version). If f : U ⊂ �n → �m, n ≤ m, rank (df (x)) = n, withf (0) = y, and f is Cr -differentiable at x then there exists a Cr -diffeomorph-ism ψ : �m→�m such that ψ(y)= 0 and

ψf (h1, . . . , hn)= (h1, . . . , hn,0, . . . ,0).

njschofi

Cross-Out

njschofi

Sticky Note

Example 5.1

AU

TH

OR

’S P

RO

OF



185

186

187

188

189

190

191

192

193

194

195

196

197

198

199

200

201

202

203

204

205

206

207

208

209

210

211

212

213

214

215

216

217

218

219

220

221

222

223

224

225

226

227

228

229

230

Proof1. Now df (x) = [B C], with respect to some coordinate system, where B is an

(m×m) non singular matrix and C is an (n−m)×m matrix. Define F : �n→�n by

F(x1, . . . , xn)=(f (x1, . . . , xn), xm+1, . . . , xn

).

Clearly DF(x) has rank n, and by the inverse function theorem there existsan inverse φ to F near x. Hence F ◦ φ(h1, . . . , hn) = (h1, . . . , hn). But thenf ◦ φ(h1, . . . , hn)= (h1, . . . , hm).

2. Follows similarly. �

As an application of the theorem, suppose f : �m ×�n−m→�m. Write x for avector in �m, and y for a vector in �n−m, and let df (x, y)= [B C] where B is anm×m matrix and C is an (n−m)×m matrix. Suppose that B is invertible, at (x, y),and that f (x, y) = 0. Then the implicit function theorem implies that there existsan open neighbourhood U of y in �n−m and a differentiable function g : U →�m

such that g(y′)= x′ and f (g(y′), y′)= 0 for all y′ ∈ V .To see this define

F : �m ×�n−m→�m ×�n−m

by F(x, y)= (f (x, y), y).

Clearly dF(x, y) = [ B C

O I

]and so dF(x, y) is invertible. Thus there exists a

neighbourhood V of (x, y) in �n on which F has a diffeomorphic inverse G. NowF(x, y)= (0, y). So there is a neighbourhood V ′ of (0, y) and a neighbourhood V

of (x, y) s.t. G : V ⊂�n→ V ′ ⊂ �n is a diffeomorphism.Let g(y′) be the x coordinate of G(0, y′) for all y′ such that (0, y′) ∈ V ′.Clearly g(y′) satisfies G(0, y′)= (g(y′), y′) and so F ◦G(0, y′)= F(g(y′), y)=

(f (g(y′, y′)), y′)= (0, y′).Now if (x′, y′) ∈ V ′ then y′ ∈ U where U is open in �n−m. Hence for all y′ ∈

U,g(y′) satisfies f (g(y′), y′) = 0. Since G is differentiable, so must be g : U ⊂�n−m→�m. Hence x′ = g(y′) solves f (x′, y′)= 0.

Example 5.21. Let f : �3→�2 where

f1(x, y, z) = x2 + y2 + z2 − 3

f2(x, y, z) = x3y3z3 − x + y − z.

At (x, y, z)= (1,1,1), f1 = f2 = 0.We seek a function g : �→�2 such that f (g1(z), g2(z), z)= 0 for all z in

a neighbourhood of 1.Now

df (x, y, z)=(

2x 2y 2z

3x2y3z3 − 1 3x3y2z3 + 1 3x3y3z2 − 1

)

AU

TH

OR

’S P

RO

OF



231

232

233

234

235

236

237

238

239

240

241

242

243

244

245

246

247

248

249

250

251

252

253

254

255

256

257

258

259

260

261

262

263

264

265

266

267

268

269

270

271

272

273

274

275

276

and so

df (1,1,1)=(

2 2 22 4 2

).

The matrix(

2 22 4

)

is non-singular. Hence there exists a diffeomorphism G : �3 →�3 such thatG(0,0,1)= (1,1,1), and G(0,0, z′)= (g1(z

′), g2(z′), z′) for z′ near 1.

2. In a simpler example consider f : �2 →� : (x, y)→ (x − a)2 + (y − b)2 −25= 0, with df (x, y)= (2(x − a),2(y − b)).

Now let F(x, y) = (x, f (x, y)), where F : �2 →�2, and suppose y �= b.Then

dF(x, y) =(

1 0∂f∂x

∂f∂y

)

=(

1 02(x − a) 2(y − b)

)

with inverse

dG(x, y)= 1

2(y − b)

(2(y − b) 0−2(x − a) 1

).

Define g(x′) to be the y-coordinate of G(x′,0). Then F ◦ G(x′,0) =F(x′, g(x′)) = F(x′, f (x′, y′)) = (x′,0), and so y′ = g(x′) for f (x′, y′) = 0and y′ sufficiently close to y. Note also that

dg

dx

∣∣∣∣x′= dG2

∂x

∣∣∣∣(x′,g′)

=− (x′ − a)

(y′ − b).

In Example 5.2(1), the “solution” g(z) = (x, y) to the smooth constraintf (x, y, z)= 0 is, in a topological sense, a one-dimensional object (since it is givenby a single constraint in �2).

In the same way in Example 5.2(2) the solution y = g(x) is a one-dimensionalobject (since it is given by a single constraint in �2).

More specifically say that an open set V in �n is an r-dimensional smooth man-ifold iff there exists a diffeomorphism

φ : V ⊂�n→U ⊂�r .

When f : �n→�m and rank (df (x))=m≤ n then say that f is a submersionat x. If rank (df (x))= n≤m then say f is an immersion at x.

One way to interpret the implicit function theorem is as follows:

AU

TH

OR

’S P

RO

OF



277

278

279

280

281

282

283

284

285

286

287

288

289

290

291

292

293

294

295

296

297

298

299

300

301

302

303

304

305

306

307

308

309

310

311

312

313

314

315

316

317

318

319

320

321

322

(1) When f is a submersion at x, then the inverse f−1(f (x)) of a point f (x)

“looks like” an object of the form {x,hm+1, . . . , hn} and so is a smooth manifold in�n of dimension (n−m).

(2) When f is an immersion at x, then the image of an (n-dimensional) neighbor-hood U of x “looks like” an n-dimensional manifold, f (u), in �m. These observa-tions can be generalized to the case when f :Xn→ Ym is “smooth”, and X,Y arethemselves smooth manifolds of dimension n,m respectively. Without going intothe formal details, X is a smooth manifold of dimension n if it is a paracompacttopological space and for any x ∈ X there is a neighborhood V of x and a smooth“chart”, φ : V ⊂X→ U ⊂�n. In particular if x ∈ Vi ∩ Vj for two open neighbor-hoods, Vi,Vj of x then

φi ◦ φ−1j : φj (Vi ∩ Vj )⊂�n→ φi(Vi ∩ Vj )⊂�n

is a diffeomorphism. A smooth structure on X is an atlas, namely a family {(φi,Vi)}of charts such that {Vi} is an open cover of X. The purpose of this definition is thatif f :Xn→ Ym then there is an induced function near a point x given by

fij =ψi ◦ f ◦ φ−1j : �n→ φ−1

j (Vj )→ Y →�m.

Here (φj ,Vj ) is a chart at x, and (ψi,Vi) is a chart at f (x). If the inducedfunctions {fij } at every point x are differentiable then f is said to be differentiable,and the “induced” differential of f is denoted by df . The charts thus provide aconvenient way of representing the differential df of f at the point x. In particularonce (φj ,Vj ) and (ψi,Vi) are chosen for x and f (x), then df (x) can be representedby the Jacobian matrix Df (x)= (∂fij ). As before Df (x) will consist of n columnsand m rows. Characteristics of the Jacobian, such as rank, will be independent ofthe choices for the charts (and thus coordinates) at x and f (x). (See Chillingsworth(1976), for example, for the details.)

If the differential df of a function f : Xn→ Ym is defined and continuous thenf is called C1. Let C1(X,Y ) be the collection of such C1-maps. Analogous to thecase of functions between real vector spaces, we may also write Cr(X,Y ) for theclass of Cr -differentiable functions between X and Y .

The implicit function theorem also holds for members of C1(X,Y ).

Implicit Function Theorem for Manifolds Suppose that f : Xn → Ym is a C1-function between smooth manifolds of dimension n,m respectively.1. If n ≥ m and f is a submersion at x (i.e., rank df (x) = m) then f−1(f (x))

is (locally) a smooth manifold in X of dimension (n−m). Moreover, if Z is amanifold in Yn of dimension r , and f is a submersion at each point in f−1(Z),then f−1(Z) is a submanifold of X of dimension n−m+ r .

2. If n ≤ m and f is an immersion at x (i.e., rank df (x) = n) then there is aneighbourhood U of x in X such that f (U) is an n-dimensional manifold in Y

and in particular f (U) is open in Y .

AU

TH

OR

’S P

RO

OF



323

324

325

326

327

328

329

330

331

332

333

334

335

336

337

338

339

340

341

342

343

344

345

346

347

348

349

350

351

352

353

354

355

356

357

358

359

360

361

362

363

364

365

366

367

368

Fig. 5.2 ???

The proof of this theorem is considerably beyond the scope of this book, but theinterested reader should look at Golubitsky and Guillemin (1973, p. 9) or Hirsch(1976, p. 22). This theorem is a smooth analogue of the isomorphism theorem forlinear functions given in Sect. 2.2. For a linear function T : �n→�m when n≥m

and T is surjective, then T −1(y) has the form x0 + K where K is the (n − m)-dimensional kernel. Conversely if T : �n →�m and n ≤ m when T is injective,then image (T ) is an n-dimensional subspace of �m. More particularly if U is ann-dimensional open set in �n then T (U) is also an n-dimensional open set in �m.

Example 5.3 To illustrate, consider Example 5.2(2) again. When y �= b, df hasrank 1 and so there exists a “local” solution y′ = g(x′) such that f (x′, g(x′))= 0.In other words

f−1(0)= {(x′, g(x′)) ∈ �2 : x′ ∈U},

which essentially is a copy of U but deformed in �2. Thus f−1(0) is “locally”a one-dimensional manifold. Indeed the set S1 = {(x, y) : f (x, y) = 0} itself is a1-dimensional manifold in �2.

If y �= b, and (x, y) ∈ S1 then there is a neighbourhood U of x and a diffeomor-phism g : S1→� : (x′, y′)→ g(y′) and this parametrises S1 near (x, y).

If y = b, then we can do the same thing through a local solution x′ = h(y′)satisfying f (h(y′), y′)= 0.

5.1.2 Singular Points and Morse Functions

When f : Xn → Ym is a C1-function between smooth manifolds, and rank df (x)

is maximal (=min(n,m)) then as before write x ∈ S0(f ).The set of singular points of f is S(f ) = X\S0(f ). Let z =min(n,m) and say

that x is a corank r singularity, or x ∈ Sr(f ), if rank (df (x))= z− r .Clearly S(f )=⋃r>1 Sr(f ).In the next section we shall examine the corank r singularity sets of a C1-function

and show that they have a nice geometric structure. In this section we consider thecase m= 1.

njschofi

Cross-Out

njschofi

Sticky Note

Example 5.3

AU

TH

OR

’S P

RO

OF



369

370

371

372

373

374

375

376

377

378

379

380

381

382

383

384

385

386

387

388

389

390

391

392

393

394

395

396

397

398

399

400

401

402

403

404

405

406

407

408

409

410

411

412

413

414

In the case of a C2-function f : Xn →�, either x will be regular (in S0(f ))or a critical point (in S1(f )) where df (x) = 0. We call a critical point of f non-degenerate iff d2f (x) is non-singular. A C2-function all of whose critical pointsare non degenerate is called a Morse function. A Morse function, f , near a criticalpoint has a very simple representation.

A local system of coordinates at a point x in X is a smooth assignment

yφ→ (h1, . . . , hn)

for every y in some neighbourhood U of x in X.

Lemma 5.1 (Morse) If f :Xn→� is C2 and x is a non-degenerate critical pointof index k, then there exists a local system of coordinates (or chart (φ,V )) at x suchthat f is given by

yφ→ (h1, . . . , hn)

g→ f (x)−k∑

i=1

hi2 +

n∑

i=k+1

hi2.

As before the index of the critical point is the number of negative eigenvalues ofthe Hessian Hf at x. The C2-function g has Hessian

Hg(0)=

⎛

⎜⎜⎜⎜⎜⎜⎜⎜⎝

−2··−2

2··

⎞

⎟⎟⎟⎟⎟⎟⎟⎟⎠

↑k

↓

with k negative eigenvalues. Essentially the Morse lemma implies that when x is anon-degenerate critical point of f , then f is topologically equivalent to the functiong with a similar Hessian at the point.

By definition, if f is a Morse function then all its critical points are non-degenerate. Moreover if x ∈ S1(f ) then there exists a neighbourhood V of x suchthat x is the only critical point of f in V .

To see this note that for y ∈ V ,

df (y)= dg(h1, . . . , hn)= (−2h1, . . . ,2hn)= 0

iff h1 = · · · = hn = 0, or y = x. Thus each critical point of f is isolated, and soS1(f ) is a set of isolated points and thus a zero-dimensional object.

As we shall see almost any smooth function can be approximated arbitrarilyclosely by a Morse function.

To examine the regular points of a differentiable function f :X→�, we can usethe Sard Lemma.

AU

TH

OR

’S P

RO

OF



415

416

417

418

419

420

421

422

423

424

425

426

427

428

429

430

431

432

433

434

435

436

437

438

439

440

441

442

443

444

445

446

447

448

449

450

451

452

453

454

455

456

457

458

459

460

Fig. 5.3 ???

First of all a set V in a topological space X is called nowhere dense if its closure,clos(V ), contains no non-empty open set. Alternatively X\clos(V ) is dense.

If X is a complete metric space then the union of a countable collection of closednowhere dense sets is nowhere dense. This also means that a residual set (the in-tersection of a countable collection of open dense sets) is dense. (See Sect. 3.1.2).A set V is of measure zero in X iff for any ε > 0 there exists a family of cubes, withvolume less than ε, covering V . If V is closed, of measure zero, then it is nowheredense.

Lemma 5.2 (Sard) If f :Xn→� is a Cr -map where r ≥ n, then the set f (S1(f ))

of critical values of f has measure zero in �. Thus f (S0(f )), the set of regularvalues of f , is the countable intersection of open dense sets and thus is dense.

To illustrate this consider Fig. 5.3. f is a quasi-concave C1-function f : �→�.The set of critical points of f , namely S1(f ), clearly does not have measure zero,since S1(f ) has a non-empty interior. Thus f is not a Morse function. Howeverf (S1(f )) is an isolated point in the image.

Example 5.4 To illustrate the Morse lemma let Z = S1 × S1 be the torus (the skinof a doughnut) and let f :Z→� be the height function.

Point s, at the bottom of the torus, is a minimum of the function, and so the indexof s = 0. Let f (s)= 0.

Then near s, f can be represented by

(h1, h2)→ 0+ h12 + h2

2.

Note that the Hessian of f at s is[ 2 0

0 2

], and so is positive definite.

The next critical point, t , is obviously a saddle, with index 1, and so we can write

(h1, h2)→ f (t)+ h12 − h2

2. Clearly Hf (t)= [ 2 00 −2

].

Suppose now that a ∈ (f (s), f (t)). Clearly a is a regular value, and so any pointx ∈Z satisfying f (x)= a is a regular point, and f is a submersion at x. By the im-plicit function theorem f−1(a) is a one-dimensional manifold. Indeed it is a singlecopy of the circle, S1.

njschofi

Cross-Out

njschofi

Sticky Note

Illustration of Sard's Lemma.

njschofi

Sticky Note

See Figure 5.4.

AU

TH

OR

’S P

RO

OF



461

462

463

464

465

466

467

468

469

470

471

472

473

474

475

476

477

478

479

480

481

482

483

484

485

486

487

488

489

490

491

492

493

494

495

496

497

498

499

500

501

502

503

504

505

506

Fig. 5.4 Critical points on Z

The next critical point is the saddle, u, near which f is represented as

(h1, h2)→ f (u)− h12 + h2

2.

Now for b ∈ (f (t), f (u)), f−1(b) is a one-dimensional manifold, but this timeit is two copies of S1. Finally v is a local maximum and f is represented near v by(h1, h2)→ f (u)− h1

2 − h22. Thus the index of v is 2.

We can also use this example to introduce the idea of the Euler characteristicχ(X) of a manifold X. If X has dimension, n, let ci(X,f ) be the number of criticalpoints of index i, of the function f :X→� and let

χ(X,f )=n∑

i=0

(−1)ici(X,f ).

For example the height function, f , on the torus Z has

(i) c0(Z,f )= 1, since s has index 0(ii) c1(Z,f )= 2, since both t and u have index 1

(iii) c2(Z,f )= 1, since v has index 2.Thus χ(Z,f )= 1− 2+ 1= 0. In fact, it can be shown that χ(X,f ) is indepen-

dent of f , when X is a compact manifold. It is an invariant of the smooth manifoldX, labelled (χ(X)). Example 5.4 illustrates the fact that χ(Z)= 0.

Example 5.5(1) The sphere S1. It is clear that the height function f : S1 →� has an index 0

critical point at the bottom and an index 1 critical point at the top, so χ(S1)=1− 1= 0.

AU

TH

OR

’S P

RO

OF



507

508

509

510

511

512

513

514

515

516

517

518

519

520

521

522

523

524

525

526

527

528

529

530

531

532

533

534

535

536

537

538

539

540

541

542

543

544

545

546

547

548

549

550

551

552

Fig. 5.5 Critical points in S2

(2) The sphere S2 has an index 0 critical point at the bottom and an index 2 criticalpoint at the top, so χ(S2)= c0 + c2 = 1+ 1= 2.

It is possible to deform the sphere, S2, so as to induce a saddle, but thiscreates an index 0 critical point. In this case c0 = 2, c1 = 1, and c2 = 1 as inFig. 5.5. Thus χ(S2)= 2− 1+ 1= 2 again.

(3) More generally, χ(Sn)= 0 if n is odd and = 2 if n is even.(4) To compute χ(Bn) for the closed n-ball, take the sphere Sn and delete the top

hemisphere. The remaining bottom hemisphere is diffeomorphic to Bn. Bythis method we have removed the index n critical point at the top of Sn.

For n= 2k + 1 odd, we know

χ(S2k+1)=

2k∑

i=0

(−1)ici

(Sn)− cn

(Sn)= 0,

so χ(B2k+1)= χ(S2k+1)+ 1= 1. For n= 2k, even we have

2k−1∑

i=0

(−1)ci

(Sn)+ cn

(Sn)= 2, so χ

(B2k)= χ

(S2k)− 1= 1.

5.2 Transversality

To examine the singularity set S(f ) of a smooth function f :X→ Y we introducethe idea of transversality.

A linear manifold V in �n of dimension v is of the form x0 +K , where K is avector subspace of �n of dimension v. Intuitively if V and W are linear manifoldsin �n of dimension v,w then typically they will not intersect if v +w < n.

On the other hand if v + w ≥ n then V ∩ W will typically be of dimensionv +w− n.

For example two lines in �2 will intersect in a point of dimension 1+ 1− 2.

njschofi

Sticky Note

See Figure 5.5.

AU

TH

OR

’S P

RO

OF


5.2 Transversality 201

553

554

555

556

557

558

559

560

561

562

563

564

565

566

567

568

569

570

571

572

573

574

575

576

577

578

579

580

581

582

583

584

585

586

587

588

589

590

591

592

593

594

595

596

597

598

Another way of expressing this is to define the codimension of V in �n to ben− v. Then the codimension of V ∩W in W will typically be w − (v +w − n)=n− v, the same codimension.

Suppose now that f : Xn → Ym where X,Y are vector spaces of dimensionn,m respectively. Let Z be a z-dimensional linear manifold in Y . Say that f istransversal to Z iff for all x ∈ X, either (i) f (x) /∈ Z or (ii) the image of df (x),regarded as a vector subspace of Ym, together with Z span Y . In this case write

fT∩ Z. The same idea can be extended to the case when X,Y,Z, are all manifolds.

Whenever fT∩ Z, then if x ∈ f−1(Z), f will be a submersion at x, and so f−1(Z)

will be a smooth manifold in X of codimension equal to the codimension of Z in Y .Another way of interpreting this is that the number of constraints which determineZ in Y will be equal to the number of constraints which determine f−1(Z) in X.Thus dim(f−1(Z))= n− (m− z).

In the previous chapter we put the Whitney Cs -topology on the set of Cs -differentiable maps Xn→ Ym, and called this Cs(X,Y ). In this topological space aresidual set is dense. The fundamental theorem of singularity theory is that transver-sal intersection is generic.

Thom Transversality Theorem Let Xn,Ym be manifolds and Zz a submanifoldof Y . Then the set

{f ∈Cs(X,Y ) : f T∩Z

}= T∩ (X,Y ;Z)

is a residual (and thus dense) set in the topological space Cs(X,Y ).

Note that if f ∈ T∩ (X,Y ;Z) then f−1(Z) will be a manifold in X of dimensionn−m+ z.

Moreover if g ∈ Cs(X,Y ) but g is not transversal to Z, then there exists someCs -map, as near to g as we please in the Cs topology, which is transversal to Z.Thus transversal intersection is typical or generic.

Suppose now that f : Xn → Ym, and corank df (x) = r , so rank df (x) =min(n,m) − r . In this case we said that x ∈ Sr(f ), the corank r singularity setof f . We seek to show that Sr(f ) is a manifold in X, and compute its dimension.

Suppose Xn,Ym are vector spaces, with dimension n,m respectively. As beforeL(X,Y ) is the normed vector space of linear maps from X to Y . Let Lr (X,Y ) bethe subset of L(X,Y ) consisting of linear maps with corank r .

Lemma 5.3 Lr (X,Y ) is a submanifold of L(X,Y ) of codimension (n−z+r)(m−z+ r) where z=min(n,m).

Proof Choose bases such that a corank r linear map S is represented by a matrix(

A B

C D

)where rank (A)= k = z− r .

AU

TH

OR

’S P

RO

OF



599

600

601

602

603

604

605

606

607

608

609

610

611

612

613

614

615

616

617

618

619

620

621

622

623

624

625

626

627

628

629

630

631

632

633

634

635

636

637

638

639

640

641

642

643

644

Let U be a neighbourhood of S in L(X,Y ), such that for all S′ ∈ U , corank(S′)= r .

Define F :U → L(�n−k,�m−k) by F(S′)=D′ −C(A′)−1B ′.Now S′ ∈ F−1(0) iff rank (S′) = k. The codimension of 0 in L(�n−k,�m−k)

is (n− k)(m− k). Since F is a submersion F−1(0)= Lr (X,Y ) is of codimension(n− z+ r)(m− z+ r). �

Now if f ∈ Cs(X,Y ), then df ∈ Cs−1(X,L(X,Y )). If df (x) ∈ Lr (X,Y )

then x ∈ Sr(f ). By the Thom Transversality Theorem, there is a residual set inCs−1(X,L(X,Y )) such that df (x) is transversal to Lr (X,Y ). But then Sr(f ) =df−1(Lr (X,Y )) is a submanifold of X of codimension (n − z + r)(m − z + r).Thus we have:

The Singularity Theorem There is the residual set V in Cs(X,Y ) such that forall f ∈ V , the corank r singularity set of f is a submanifold in X of codimension(n− z+ r)(m− z+ r).

If f :Xn→� then codim S1(f )= (n−1+1)(1−1+1)= n. Hence genericallythe set of critical points will be zero-dimensional, and so critical points will be iso-lated. Now a Morse function has isolated, non-degenerate critical points. Let Ms(X)

be the set of Morse functions on X with the topology induced from Cs(X,�).

Morse Theorem The set Ms(X) of Cs -differentiable Morse functions (with non-degenerate critical points), is an open dense set in Cs(X,�). Moreover if f ∈Ms(X), then the critical points of f are isolated.

More generally if f :Xn→ Ym with m≤ n then in the generic case, S1(f ) is ofcodimension (n−m+ 1)(n−m+ 1) = n−m+ 1 in X and so S1(f ) will be ofdimension (m− 1).

Suppose now that n > 2m − 4, and n ≥ m. Then 2n − 2m + 4 > n and sor(n−m+ r) > n for r ≥ 2. But codimension (Sr(f )) = r(n−m+ r), and sincecodimension (Sr(f )) > dimension X, Sr(f )=Φ for r ≥ 2.

Submanifold Theorem If Zz is a submanifold of Ym and z < m then Z is nowheredense in Y (here z= dim(Z) and m= dim(Y )).

In the case n≥m, the singularity set S(f ) will generically consist of a union ofthe various co-rank r singularity submanifolds, for r ≥ 1. The highest dimension ofthese is m− 1. We shall call an S(f ) a stratified manifold of dimension (m− 1).Note also that S(f ) will then be nowhere dense in X. We also require the followingtheorem.

Morse Sard Theorem If f : Xn → Ym is a Cs -map, where s > n − m, thenf (S(f )) has measure zero in Y and f (S0(f )) is residual and therefore dense in Y .

AU

TH

OR

’S P

RO

OF


5.3 Generic Existence of Regular Economies 203

645

646

647

648

649

650

651

652

653

654

655

656

657

658

659

660

661

662

663

664

665

666

667

668

669

670

671

672

673

674

675

676

677

678

679

680

681

682

683

684

685

686

687

688

689

690

We are now in a position to apply these results to the critical Pareto set. Sup-pose then that u= (u1, . . . , um) :Xn→�m is a smooth profile on the manifold offeasible states X.

Say x ∈ 0Θ(u1, . . . , um)= 0

Θ(u) iff dL(λ,u)(x)= 0, where L(λ,u)=∑mi=1 λiui

and λ ∈ �m+.

By Lemma ??, the critical Pareto set, Θ(u), contains0Θ(u) but possibly also

contains further points on the boundary of X. However we shall regard0Θ as the

differential analogue of the Pareto set. By Lemma 4.19,0Θ(u) must be closed in X.

Moreover when n≥m and x ∈ 0Θ(u) then the differentials {dui(x) : i ∈M}must be

linearly dependent. Hence0Θ(u) must belong to S(u). But also S(u) will be nowhere

dense in X. Thus we obtain the following theorem (Smale 1973 and Debreu 1970).

Pareto Theorem There exists a residual set U in C1(X,�m), for dim(X) ≥ m,

such that for any profile u ∈ U , the closed critical Pareto set0Θ(u) belongs to the

nowhere dense stratified singularity manifold S(u) of dimension (m− 1). Moreoverif dim(X) > 2m − 4, then Θ(u) is itself a manifold of dimension (m − 1), for allu ∈ C1(X,�m).

As we have already observed this result implies that Θ(u) can generally beregarded as an (m − 1) dimensional object parametrised by (m − 1) coefficients

(λ2λ1

, . . . , λm

λ1) say. Since points in

0Θ(u) are characterised by first order conditions

alone, it is necessary to examine the Hessian of L to find the Pareto optimal points.

5.3 Generic Existence of Regular Economies

In this section we outline a proof of the Debreu-Smale Theorem on the GenericExistence of Regular Economies (see Debreu 1970, and Smale 1974a).

As in Sect. 4.4, let u = (u1, . . . , um) : Xm → �m be a smooth profile, whereX = �n+, the commodity space facing each individual. Let e = (e1, . . . , em) ∈ Xm

be an initial endowment vector. Given u, define the Walras manifold to be the set

Zu ={(e, x,p) ∈Xm ×Xm ×Δ

}

(where Δ is the price simplex) such that (x,p) is a Walrasian equilibrium for theeconomy (e, u). That is, (x,p)= (x1, . . . , xm,p) ∈Xm ×Δ satisfies:1. individual optimality: D∗ui(xi)= p for i ∈M ,2. individual budget constraints: 〈p,xi〉 = 〈p, ei〉 for i ∈M ,3. social resource constraints:

∑mi=1 xij = ∑m

i=1 eij for each commodity j =1, . . . , n.

Note that we implicitly assume that each individual’s utility function, ui , is de-fined on a domain Xi ≡X ⊂�n+, so that the differential dui(xi) at xi ∈Xi can berepresented by a vector Dui(xi) ∈ �n. As we saw in Chap. 4, we may normalize

AU

TH

OR

’S P

RO

OF



691

692

693

694

695

696

697

698

699

700

701

702

703

704

705

706

707

708

709

710

711

712

713

714

715

716

717

718

719

720

721

722

723

724

725

726

727

728

729

730

731

732

733

734

735

736

Dui and p so the optimality condition for i becomes D∗ui(x)= p for p ∈Δ. Forthe space of normalized price vectors, we may identify Δ with {p ∈ �n+ : ‖p‖ = 1}.Observe that dim(Δ)= n− 1.

We seek to show that there is a residual set U in Cs(X,�m) such that the Walrasmanifold is a smooth manifold of dimension mn.

Now define the Debreu projection

π : Zu ⊂Xm ×Xm ×Δ→Xm : (e, x,p)→ e.

Note that both Zu and Xm will then have dimension mn.By the Morse Sard Theorem the set

V = {e ∈Xm : dπ has rank nm at(e, x,p)}

is dense in Xm.Say the economy (e, u) is regular if π(e, x,p) = e is a regular value of π (or

rank dπ =mn) for all (x,p) ∈Xm ×Δ such that ((e, x,p) ∈ Zu).When e is a regular value of π , then by the inverse function theorem,

π−1(e)= {(e, x,p)1, (e, x,p)2, . . . , (e, x,p)k}

is a zero-dimensional object, and thus will consist of a finite number of isolatedpoints. Thus for each e ∈ V , the Walrasian equilibria associated with e will be finitein number. Moreover there will exist a neighbourhood N of e in V such that theWalrasian equilibria move continuously with e in N .

Proof of the Generic Regularity of the Debreu Projection Define ψu : Xm ×Δ→Δm+1 where u ∈ Cr(X,�m) by ψu(x,p) = (D∗u1(x1), . . . ,D

∗um(xm),p) wherex = (x1, . . . , xm) and u= (u1, . . . , um). Let I be the diagonal {(p, . . . ,p)} in Δm+1.If (x,p) ∈ ψ−1

u (I ) then for each i, D∗ui(xi) = p and so the first order individualoptimality conditions are satisfied. By the Thom Transversality Theorem there isa residual set (in fact an open dense set) of profiles U such that ψu is transversalto I for each u ∈ U . But then the codimension of ψ−1

u (I ) in Xm × Δ equals thecodimension of I in Δm+1.

Now Δ and I are both of dimension (n− 1) and so codimension (I ) in Δm+1

is (m + 1)(n − 1) − (n − 1) = m(n − 1). Thus dim(Xm × Δ) − dim(ψ−1u (I )) =

m(n− 1) and dim(ψ−1u (I ))=mn+ (n− 1)−m(n− 1)= n+m− 1, for all u ∈U .

Now let e ∈Xm be the initial endowment vector and

Y(e)={

(x1, . . . , xm) ∈Xm :m∑

i=1

xi =m∑

i=1

ei

}

be the set of feasible outcomes, a hyperplane in �nm+ of dimension n(m − 1).For each i, let Bi(p) = {xi ∈ X : 〈p,xi〉 = 〈p, ei〉}, be the hyperplane through theboundary of the ith budget set at the price vector p.

AU

TH

OR

’S P

RO

OF


5.3 Generic Existence of Regular Economies 205

737

738

739

740

741

742

743

744

745

746

747

748

749

750

751

752

753

754

755

756

757

758

759

760

761

762

763

764

765

766

767

768

769

770

771

772

773

774

775

776

777

778

779

780

781

782

Define

Σ(e)= {(x,p) ∈Xm ×Δ : x ∈ Y(e), xi ∈ Bi(p), ∀i ∈M}, and

Γ = {(e, x,p) : e ∈Xm, (x,p) ∈Σ(e)}.

As discussed in Chap. 4, Y(e) is characterized by n linear equations, while thebudget restrictions induce a further (m−1) linear restraints (the mth budget restraintis redundant). Thus the dimension of Γ is 2mn+ (n−1)−n− (m−1)= 2mn−m.(In fact, if Δ is taken to be the (n − 1) dimensional simplex, then Γ will be alinear manifold of dimension 2mn−m. More generally, Γ will be a submanifoldof Xm ×Xm ×Δ of dimension 2mn−m. At each point the projection is a regularmap (i.e., the rank of the differential of (e, x,p)→ (x,p) is maximal).

To see this define φ :Xm ×Xm ×Δ→�n ×�m−1 by

φ(e, x,p)=(

m∑

i=1

xi −m∑

i=1

ei, 〈p,x1〉 − 〈p, e1〉, . . . , 〈p,xm−1〉 − 〈p, em−1〉)

.

Clearly if φ(e, x,p) = 0 then x ∈ Y(e) and xi ∈ Bi(p) for each i. But 0 is ofcodimension n + m − 1 in �n × �m−1; thus φ−1(0) is of the same codimensionin X2m × Δ. Thus dim(X2m × Δ) − dimφ−1(0) = n + m − 1 and dimφ−1(0) =2nm+ (n− 1)− (n+m− 1)= 2mn−m (giving the dimension of Γ ). In a similarfashion, for (x,p) ∈Σ(e),φ(e, x,p)= 0, and so

dim(Xm ×Δ

)− dimφ−1(0)= n+m− 1.

Thus Σ(e) is a submanifold of Xm ×Δ of dimension

nm+ (n− 1)− (n+m− 1)=mn−m.

Finally define Zu = {(e, x,p) ∈ Γ : ψu(x,p) ∈ I }. For each u ∈ U,Zu is a sub-manifold of X2m ×Δ of dimension mn.

To see this, let fu(e, x,p)=ψu(x,p). Then

fu : Γ →Xm ×Xm ×Δ→Xm ×Δψu→Δm+1.

As we observed for all u ∈ U,ψu is transversal to I in Δm+1. But the codimen-sion of I in Δm+1 is m(n− 1). Since fu will be transversal to I ,

dim(Γ )− dim(f−1

u (I ))=m(n− 1).

Hence dim(f−1u (I ))=mn. Clearly Zu = f−1

u (I ).Thus for all u ∈ U , the Debreu projection π : Zu → Xm will be a C1-map be-

tween manifolds of dimension mn. The Morse Sard Theorem gives the result. �

AU

TH

OR

’S P

RO

OF



783

784

785

786

787

788

789

790

791

792

793

794

795

796

797

798

799

800

801

802

803

804

805

806

807

808

809

810

811

812

813

814

815

816

817

818

819

820

821

822

823

824

825

826

827

828

Fig. 5.6 The Debreu map

Thus we have shown that for each smooth profile u in an open dense set U ,there exists an open dense set V of initial endowments such that (e, u) is a regulareconomy for all e ∈ V .

The result is also related to the existence of a demand function for an economy.A demand function for i (with utility ui ) is the function

fi :Δ×�+→X

where fi(p, I ) is that xi ∈X which maximizes ui on

Bi(p, I )= {x ∈X : 〈p,x〉 = I}.

Now define φi :X→Δ×�+ by φi(x)= (D∗ui(x), 〈D∗ui(x), x〉).But the optimality condition is precisely that D∗ui(x)= p and 〈D∗ui(x), x〉 =

〈p,x〉 = I . Thus when φi has maximal rank, it is locally invertible (by the inversefunction theorem) and so locally defines a demand function.

On the other hand if fi is a C1-function then φi must be locally invertible (byfi ). If this is true for all the agents, then ψu :Xm×Δ→Δm+1 must be transversalto I . Consequently if u = (u1, . . . , um) is such that each ui defines a C1-demandfunction fi :Δ×�+ →X then u ∈ U , the open dense set of the regular economytheorem.

As a final note suppose that u ∈U and e is a critical value of the Debreu projec-tion. Then it is possible that π−1(e)= e×W where W is a continuum of Walrasianequilibria. Another possibility is that there is a continuum of singular or catastrophicendowments C, so that as the endowment vector crosses C the number of Walrasianequilibria changes suddenly. As we discussed in Sect. 4.4, at a “catastrophic” en-dowment, stable and unstable Walrasian equilibria may merge (see Balasko 1975).

Another question is whether for every smooth profile, u, and every endowmentvector, e, there exists a Walrasian equilibrium (x,p). This is equivalent to the re-quirement that for every u the projection Zu→Xm is onto Xm.

In this case for each e ∈Xm there will exist some (e, x,p) ∈ Zu. This is clearlynecessary if there is to exist a market clearing price equilibrium p for the economy(e, u).

AU

TH

OR

’S P

RO

OF


5.4 Economic Adjustment and Excess Demand 207

829

830

831

832

833

834

835

836

837

838

839

840

841

842

843

844

845

846

847

848

849

850

851

852

853

854

855

856

857

858

859

860

861

862

863

864

865

866

867

868

869

870

871

872

873

874

The usual general equilibrium arguments to prove existence of a market clearingprice equilibrium typically rely on convexity properties of preference (see Chap. 3).However weaker assumptions on preference permit the use of topological argumentsto show the existence of an extended Walrasian equilibrium (where only the firstorder conditions are satisfied).

When the market-clearing price equilibrium does exist, it is useful to consider aprice adjustment process (or “auctioneer”) to bring about equilibrium.

5.4 Economic Adjustment and Excess Demand

To further develop the idea of a demand function and price adjustment process, weconsider the following famous example of Scarf (1960).

Example 5.6 There are three individuals i ∈M = {1,2,3} and three commodities.Agent i has utility ui(x

i1, x

i2, x

i3)=min(xi

i , xij ) where xi = (xi

1, xi2, x

33) ∈ �3+ is the

ith commodity space.At income I , and price vector p = (p1,p2,p3), i demands equal amounts of

xii , x

jj and zero of xi

k : thus (pi + pj )x = I , so xii = I (pi + pj )

−1 = xij .

Suppose the initial endowment ei of agent i is 1 unit of the ith good, and nothingof the j th and kth. Then I = pi and so i′s demand function fi has the form fi(p)=(fii(p), fij (p), fik(p))= (

pi

pi+pj,

pi

pi+pj,0) ∈ �3.

The excess demand function by i is ξi(p)= fi(p)− ei .Since ei = (eii , eij , eik)= (1,0,0) this gives

ξi(p)=( −pj

pi + pj

,pi

pi + pj

,0

)= (ξii , ξij , ξik).

Suppose now the other two consumers are described by cyclic permutationof subscripts, e.g., j has 1 unit of the j th good and utility uj (x

jj , x

jk , x

ji ) =

min(xjj , x

jk ), etc., then the total excess demand at p is

ξ(p)=3∑

i=1

ξi(p) ∈ �3.

For example, the excess demand in commodity j is:

ξj = ξ1j + ξ2j + ξ3j = pi

pi + pj

− pk

pj + pk

.

Since each i chooses fi(p) to maximize utility subject to 〈p1, fi(p)〉 = I =〈p, ei〉 we expect

∑3i=1 〈p,fi(p)− ei〉 = 0.

To see this note that

⟨p, ξ(p)

⟩ =⟨p,

(p3

p3 + p1− p2

p1 + p2,

p1

p1 + p2− p3

p2 + p3,

p2

p2 + p3− p1

p1 + p3

)⟩

= 0.

AU

TH

OR

’S P

RO

OF



875

876

877

878

879

880

881

882

883

884

885

886

887

888

889

890

891

892

893

894

895

896

897

898

899

900

901

902

903

904

905

906

907

908

909

910

911

912

913

914

915

916

917

918

919

920

The equation 〈p, ξ(p)〉 = 0 is known as Walras’ Law. To interpret it, suppose welet Δ be the simplex in �3 of price vectors such that ‖p‖ = 1, and pi ≥ 0. Walras’Law says that the excess demand vector ξ(p) is orthogonal to the vector p. In otherwords ξ(p) may be thought of as a tangent vector in Δ. (This is easier to see if weidentify Δ with a quadrant of the sphere, S2.)

We may therefore consider a price adjustment process, which changes the pricevector p(t), at time t by the differential equation

dp(t)

dt= ξ(p) (*)

This adjustment process is a vector field on Δ: that is at every p there exists a rulethat changes p(t) by the equation dp(t)

dt= ξ(p).

If at a vector p∗, the excess demand ξ(p∗)= 0 then dp(t)dt|p∗ = 0, and the price

adjustment process has a stationary point. The flow on Δ can be obtained by inte-grating the differential equation. It is easy to see that if p∗ satisfies p∗1 = p∗2 = p∗3then ξ(p∗)= 0, so there clearly is a price equilibrium where excess demand is zero.

The price adjustment equation (*) does not result in a flow into the price equilib-rium.

To see this, compute the scalar product

⟨(p2p3,p1p3,p1p2), ξ(p)

⟩

=−p3(p1

2 − p22)

p1 + p2+ p2

(p32 − p1

2)

p1 + p3+ p1

(p12 − p3

2)

p2 + p3

= p3(p1 − p2)+ p2(p3 − p1)+ p1(p2 − p1)

= 0.

But if ξ(p)= dpdt

then we obtain p2p3dpdt+ p1p3

dpdt+ p1p2

dpdt= 0.

The obvious solution to this equation is that p1(t)p2(t)p3(t)= constant.In other words when the adjustment process satisfies (*) then the price vector p

(regarded as a function of time, t ,) satisfies the equation p1(t)p2(t)p3(t)= constant.The flow through any point p = (p1,p2,p3), other than the equilibrium price vectorp∗, is then homeomorphic to a circle S1, inside Δ.

Just to illustrate, consider a vector p with p3 = 0.Then ξ(p)= (

−p2p1+p2

,p1

p1+p2,0).

Because we have drawn the flow on the simplex Δ = {p ∈ �3+ :∑

pi = 1} the

flow dpdt

(t)= ξ(p) is discontinuous in p at the three vertices of Δ.However in the interior of Δ the flow is essentially circular (and anti-clockwise).

See Fig. 5.7.To examine the nature of the flow given by the differential equation dp

dt(t) =

ξ(p), define a Lyapunov function at p(t) to be L(p(t)) = 12

∑3i=1(pi(t) − p∗i )2

where p∗ = (p∗1,p∗2,p∗3) is the equilibrium price vector satisfying ξ(p∗)= 0. Since

AU

TH

OR

’S P

RO

OF


5.4 Economic Adjustment and Excess Demand 209

921

922

923

924

925

926

927

928

929

930

931

932

933

934

935

936

937

938

939

940

941

942

943

944

945

946

947

948

949

950

951

952

953

954

955

956

957

958

959

960

961

962

963

964

965

966

Fig. 5.7 Flow on the pricesimplex

p∗ ∈Δ, we may choose p∗ = ( 13 , 1

3 , 13 ). Then

dL

dt=

3∑

i=1

−(pi(t)− p∗i)dpi

dt

=3∑

i=1

ξi

(p(t))pi(t)−

3∑

i=1

p∗i ξi

(p(t))

= −1

3

3∑

i=1

ξi

(p(t)).

(This follows since 〈p, ξ(p)〉 = 0.)If ξi(p(t)) > 0 for i = 1,2,3 then dL

dt< 0 and so the Lyapunov distance L(p(t))

of p(t) from p∗ decreases as t →∞. In other words if δp(t)= p(t)− p∗ then thedistance ‖δp(t)‖ → 0 as t →∞. The equilibrium p∗ is then said to be stable.

If on the contrary ξi(p(t)) < 0 ∀i, then dLdt

> 0 and ‖δp(t)‖ increases as t →∞.In this case p∗ is called unstable.

However it is easy to show that the equilibrium point p∗ is neither stable norunstable. To see this consider the price vector p(t) = ( 2

3 , 16 , 1

6 ). It is then easyto show that ξ = (0, 3

10 , −310 ) so the flow through p (locally) keeps the distance

L(p(t)) constant. To see how L(p(t)) behaves near p(t), consider the pointsp(t − δt) = ( 2

3 , 16 − 1

20 , 16 + 1

10 ) and p(t + δt) = ( 23 , 1

6 + 110 , 1

6 − 110 ). After some

easy arithmetic we find that ξ(t− δt)= (0.195,0.409,−0.814) so that ‖δp(t− δt)‖is increasing at p(t − δt). On the other hand ξ(t + δt)= (−0.195,0.814,−0.409)

so ‖δp(t + δt)‖ is decreasing at p(t + δt). In other words the total excess demand∑3i=1 ξi(p(t)) oscillates about zero as we transcribe one of the closed orbits, so the

distance ‖δp(t)‖ increases then decreases.The Scarf Example gives a way to think more abstractly about the process of

price adjustment in an economy.As we have observed, the differential equation dp

dt= ξ(p) on Δ defines a flow

in Δ. That is if we consider any point p0 ∈Δ and then solve the equation for p, weobtain an “orbit”

{p(t) ∈Δ, t ∈ (−∞,∞); p(0)= p0 and dp = ξ

(p(t))}

AU

TH

OR

’S P

RO

OF



967

968

969

970

971

972

973

974

975

976

977

978

979

980

981

982

983

984

985

986

987

988

989

990

991

992

993

994

995

996

997

998

999

1000

1001

1002

1003

1004

1005

1006

1007

1008

1009

1010

1011

1012

Fig. 5.8 Smoothing the scarfprofile

that commences at the point p(0) = p0, and gives the past and future trajectory.Because the differential equation has a unique solution, any point p0 can belongto only one orbit. As we saw, each orbit in the example satisfies the equationp1(t) · p2(t) · p3(t) = constant. The phase portrait of the differential equation isits collection of orbits.

The differential equation dpdt= ξ(p) assigns to each point p ∈ Δ a vector

ξ(p) ∈ �n, and so ξ may be regarded as a function ξ : Δ→ �n. In fact ξ is acontinuous map except at the boundary of Δ. This discontinuity only occurs be-cause Δ itself is not smooth at its vertices. If we ignore this boundary feature, thenwe may write ξ ∈ C0(Δ,�n), where C0 as before stands for the set of continuousmaps. In fact if we examine ξ as a function of p then it can be seen to be differen-tiable, so ξ ∈ C1(Δ,�n). Obviously C1(Δ,�n) has a natural metric and thereforethe set C1(Δ,�n) can be given the C1-topology. A differential equation dp

dt= ξ(p)

of this kind can thus be treated as an element of C1(Δ,�n) in which case it is calleda vector field. C1(Δ,�n) with the C1-topology is written C1(Δ,�n) or V1(Δ). Weshall also write P(Δ) for the collection of phase portraits on Δ. Obviously, once thevector field, ξ , is specified, then this defines the phase portrait, τ(ξ), of ξ .

In the example, ξ was determined by the utility profile u and endowment vectore ∈ �3×3. As Fig. 5.8 illustrates the profile u can be smoothed by rounding each ui

without changing the essence of the example.

5.5 Structural Stability of a Vector Field

More abstractly then we can view the excess demand function ξ as a map fromCs(X,�m)×Xm to the metric space of vector fields on Δ: that is

ξ :Cs(X,�m

)×Xm −→ V1(Δ).

The genericity theorem given above implies that, in fact, there is an open denseset U in Cs(X,�m) such that ξ is indeed a C1 vector field on Δ. Moreover ξ isan excess demand function obtained from the individual demand functions {fi} asdescribed above.

njschofi

Cross-Out

njschofi

Sticky Note

S, upper case

AU

TH

OR

’S P

RO

OF


5.5 Structural Stability of a Vector Field 211

1013

1014

1015

1016

1017

1018

1019

1020

1021

1022

1023

1024

1025

1026

1027

1028

1029

1030

1031

1032

1033

1034

1035

1036

1037

1038

1039

1040

1041

1042

1043

1044

1045

1046

1047

1048

1049

1050

1051

1052

1053

1054

1055

1056

1057

1058

Fig. 5.9 Dissimilar phase portraits

An obvious question to ask is how ξ “changes” as the parameters u ∈Cs(X,�m)

and e ∈Xm change. One way to do this is to consider small perturbations in a vectorfield ξ and determine how the phase portrait of ξ changes.

It should be clear from the Scarf example that small perturbations in the utilityprofile or in e may be sufficient to change ξ so that the orbits change in a qualitativeway. If two vector fields, ξ1, and ξ2 have phase portraits that are homeomorphic,then τ(ξ1) and τ(ξ2) are qualitatively identical (or similar). Thus we say ξ1 and ξ2are similar vector fields if there is a homeomorphism h : Δ→ Δ such that eachorbit in the phase portrait τ(ξ1) of ξ1 is mapped by h to an orbit in τ(ξ2).

As we saw in the Scarf example, each of the orbits of the excess demand function,ξ1, say, comprises a closed orbit (homeomorphic to S1). Now consider the vectorfield ξ2 whose orbits approach an equilibrium price vector p∗. The phase portraitsof ξ1 and ξ2 are given in Fig. 5.9.

The price equilibrium in Fig. 5.9(b) is stable since limt→∞ p(t)→ p∗. Obvi-ously each of the orbits of ξ2 are homeomorphic to the half open interval (−∞,0].Moreover (−∞,0] and S1 are not homeomorphic, so ξ1 and ξ2 are not similar.

It is intuitively obvious that the vector field, ξ2 can be obtained from ξ1 by a“small perturbation”, in the sense that ‖ξ1 − ξ2‖< δ, for some small δ > 0. Whenthere exists a small perturbation ξ2 of ξ1, such that ξ1 and ξ2 are dissimilar, then ξ1is called structurally unstable. On the other hand, it should be plausible that, for anysmall perturbation ξ3 of ξ2 then ξ3 will have a phase portrait τ(ξ3) homeomorphicto τ(ξ2), so ξ2 and ξ3 will be similar. Then ξ2 is called structurally stable. Noticethat structural stability of ξ2 is a much more general property than stability of theequilibrium point p∗ (where ξ2(p

∗)= 0).All that we have said on Δ can be generalised to the case of a smooth manifold

Y . So let V1(Y ) be the topological space of C1-vector fields on Y and P(Y ) thecollection of phase portraits on Y .

Definition 5.1(1) Let ξ1, ξ2 ∈ V1(Y ). Then ξ1 and ξ2 are said to be similar (written ξ1 ∼ ξ2) iff

there is a homeomorphism h : Y → Y such that an orbit σ is the phase portraitτ(ξ1) of ξ1 iff h(σ ) is in the phase portrait of τ(ξ2).

(2) The vector field ξ is structurally stable iff there exists an open neighborhoodV of ξ in V1(Y ) such that ξ ′ ∼ ξ for all ξ ′ ∈ V .

AU

TH

OR

’S P

RO

OF



1059

1060

1061

1062

1063

1064

1065

1066

1067

1068

1069

1070

1071

1072

1073

1074

1075

1076

1077

1078

1079

1080

1081

1082

1083

1084

1085

1086

1087

1088

1089

1090

1091

1092

1093

1094

1095

1096

1097

1098

1099

1100

1101

1102

1103

1104

Fig. 5.10 A source

(3) A property K of vector fields in V1(2) is generic iff the set {ξ ∈ V1(Y ) :ξ satisfiesK} is residual in V1(Y ).

As before, a residual set, V , is the countable intersection of open dense sets, and,when V1(Y ) is a “Baire” space, V will itself be dense.

It was conjectured that structural stability is a generic property. This is true if thedimension of Y is 2, but is false otherwise (Smale 1966; Peixoto 1962).

Before discussing the Peixoto-Smale Theorems, it will be useful to explore fur-ther how we can qualitatively “classify” the set of phase portraits on a manifold Y .The essential feature of this classification concerns the nature of the critical or singu-larity points of the vector field on Y and how these are constrained by the topologicalnature of Y .

Example 5.7 Let us return to the example of the torus Z = S1 × S1 examined inExample 5.4. We defined a height function f : Z → � and considered the fourcritical points {s, t, u, v} of f . To remind the reader v was an index 2 critical point(a local maximum of f ). Near v, f could be represented as

f (h1, h2)= f (v)− h12 − h2

2.

Now f defines a gradient vector field ξ where

ξ(h1, h2)=−df (h1, h2)

Looking down on v we see the flow near v induced by ξ resembles Fig. 5.10.The field ξ may be interpreted as the law of motion under a potential energy field,

f , so that the system flows from the “source”, v, towards the “sink”, s, at the bottomof Z.

Another way of characterizing the source, v, is by what we can call the “degree”of v. Imagine a small ball B2 around v and consider how the map g : S1 → S1 :(h1, h2)→ ξ(h1,h2)‖ξ(h1,h2)‖ behaves as we consider points (h1, h2) on the boundary S1

of B2.At point 1, ξ(h′1, h′2) points “north” so 1 → 1′. Similarly at 2, (ξ(h1

2, h22))

points east, so 2→ 2′. As we traverse the circle once, so does g. The degree ofg is +1, and the degree of v is also +1. However the saddle, u, is of degree −1.

AU

TH

OR

’S P

RO

OF



1105

1106

1107

1108

1109

1110

1111

1112

1113

1114

1115

1116

1117

1118

1119

1120

1121

1122

1123

1124

1125

1126

1127

1128

1129

1130

1131

1132

1133

1134

1135

1136

1137

1138

1139

1140

1141

1142

1143

1144

1145

1146

1147

1148

1149

1150

Fig. 5.11 ???

Fig. 5.12 A saddle

Fig. 5.13 A sink

At 1, the field points north, but at 2 the field points west, so as we traverse thecircle on the left in a clockwise direction, we traverse the circle on the right inan anti-clockwise direction. Because of this change of orientation, the degree of u

is −1.It can also easily be shown that the sink s has degree +1.The rotation at s induced by g is clockwise. It can be shown that, in general,

the Euler characteristic can be interpreted as the sum of the degrees of the criticalpoints. Thus χ(Z)= 1−1−1+1, since the degree at each of the two saddle pointsis −1, and the degree at the source and sink is +1.

njschofi

Cross-Out

njschofi

Sticky Note

degree of the map g

njschofi

Sticky Note

of degree -1

njschofi

Sticky Note

of degree +1.

AU

TH

OR

’S P

RO

OF



1151

1152

1153

1154

1155

1156

1157

1158

1159

1160

1161

1162

1163

1164

1165

1166

1167

1168

1169

1170

1171

1172

1173

1174

1175

1176

1177

1178

1179

1180

1181

1182

1183

1184

1185

1186

1187

1188

1189

1190

1191

1192

1193

1194

1195

1196

Fig. 5.14 ???

Example 5.8 It is obvious that the flow for the Scarf example is not induced by agradient vector field. If there were a function f :Δ→� satisfying ξ(p)=−df (p),then the orbits of ξ would correspond to decreasing values of f . As we saw however,there are circular orbits for ξ . It is clearly impossible to have a circular flow suchthat f decreases round the circle. However the Euler characteristic still determinesthe nature of the zeros (or singularities) of ξ .

First we compute the degree of the singularity p∗ where ξ(p∗)= 0.As we saw in Example 5.5, the orbits look like smoothed triangles homeomorphic

to S1. Let S1 be a copy of the circle (see Fig. 5.14). At 1, g points west, while at 2,g points northwest. At 3, g points northeast, at 4, g points east. Clearly the degreeis +1 again.

As we showed, the Euler characteristic χ(Δ) of the simplex is+1, and the degreeof the only critical point p∗ of the vector field ξ is+1. This suggests that again thereis a relationship between the Euler characteristic χ(Y ) of a manifold Y and the sumof the degrees of the critical points of any vector field ξ on Y . One technical pointshould be mentioned, concerning the nature of the flow on the boundary of Δ. Inthe Scarf example the flow of ξ was “along” the boundary, ∂Δ, of Δ. In a realeconomy one would expect that as the price vector approaches the boundary ∂Δ (sothat pi → 0 for some price pi ), then excess demand ξi for that commodity wouldrapidly increase as (ξi →∞). This essentially implies that the vector field ξ wouldpoint towards the interior of Δ. So now consider perturbations of the Scarf exampleas in Fig. 5.15.

In Fig. 5.15(a) is a perturbation where one of the circular orbits (called S) areretained; only flow commencing near to the boundary approaches S, as does anyflow starting near to the zero p∗ where ξ(p∗)= 0. In this case the boundary ∂Δ isa repellor; the closed orbit is an attractor, and the singularity point or zero, p∗, is asource (or point repellor). In Fig. 5.15(b) the flow is reversed. The closed orbit S isa repellor; p∗ is a sink (or attractor), while ∂Δ is an attractor. Now consider a copyΔ′ of Δ inside Δ (given by the dotted line in Fig. 5.15(b)). On the boundary of Δ′the flow points outwards. Then χ(Δ′) is still 1 and the degree of p∗ is still 1. Thisillustrates the following theorem.

Poincaré-Hopf Theorem Let Y be a compact smooth manifold with boundary ∂Y .Suppose ξ ∈ V1(Y ) has only a finite number of singularities, and points outwardson ∂Y . Then χ(Y ) is equal to the sum of the degrees of the singularities of ξ .

njschofi

Cross-Out

njschofi

Sticky Note

Computing the degree.

AU

TH

OR

’S P

RO

OF



1197

1198

1199

1200

1201

1202

1203

1204

1205

1206

1207

1208

1209

1210

1211

1212

1213

1214

1215

1216

1217

1218

1219

1220

1221

1222

1223

1224

1225

1226

1227

1228

1229

1230

1231

1232

1233

1234

1235

1236

1237

1238

1239

1240

1241

1242

Fig. 5.15 ???

To apply this theorem, suppose ξ is the vector field given by the excess demandfunction. Suppose that ξ points towards the interior of Δ. Then the vector field (−ξ)

points outward. By the Debreu-Smale Theorem, we can generically assume that ξ

has (at most) a finite number of singularities. Since χ(Δ)= 1, there must be at leastone singularity of (−ξ) and thus of ξ . Unfortunately this theorem does not allow usto infer whether or not there exists a singularity p∗ which is stable (i.e., an attractor).

The Poincaré-Hopf Theorem can also be used to understand singularities of vec-tor fields on manifolds without boundary. As we have suggested, the Euler charac-teristic of a sphere is 2 for the even dimensional case and 0 for the odd dimensionalcase. This gives the following result.

The “Hairy Ball” Theorem Any vector field ξ on S2n (even dimension) must havea singularity. However there exists a vector field ξ on S2n+1 (odd dimension) suchthat ξ(p)= 0 for no p ∈ S2n+1.

To illustrate Fig. 5.16 shows a vector field on S2 where the flow is circular oneach of the circles of latitude, but both north and south poles are singularities. Theflow is evidently non-gradient, since no potential function, f , can increase around acircular orbit.

Example 5.9 As an application, we may consider a more general type of flow, bydefining at each point x ∈ Y a set h(x) of vectors in the tangent space at x. Asdiscussed in Chap. 4, h could be induced by a family of utility functions {ui : Y →�, i ∈M}, such that

h(x)= {v ∈ �n : ⟨dui(x), v⟩> 0, ∀i ∈M

}.

That is to say v ∈ h(x) iff each utility function increases in the direction v. We

can interpret Lemma ?? to assert that h(x) = Φ whenever x ∈ 0Θ(u1, . . . , um), the

critical Pareto set.

Suppose that0Θ = Φ . Then in general we can use a selection theorem to select

from h(x) a non-zero vector ξ(x), at every x ∈ Y , such that ξ is a continuous vectorfield. That is, ξ ∈ V1(Y ), but ξ has no singularities. However if χ(Y ) �= 0 then any

njschofi

Cross-Out

njschofi

Cross-Out

njschofi

Cross-Out

njschofi

Sticky Note

Theorem 4.21

njschofi

Cross-Out

njschofi

Sticky Note

Illustration of the Poincare Hopf Theorem.

AU

TH

OR

’S P

RO

OF



1243

1244

1245

1246

1247

1248

1249

1250

1251

1252

1253

1254

1255

1256

1257

1258

1259

1260

1261

1262

1263

1264

1265

1266

1267

1268

1269

1270

1271

1272

1273

1274

1275

1276

1277

1278

1279

1280

1281

1282

1283

1284

1285

1286

1287

1288

Fig. 5.16 Zeros of a fieldon S2

vector field ξ on Y has critical points whose degrees sum to χ(Y ), subject to theboundary condition of the Poincaré-Hopf Theorem.

Pareto Theorem If χ(Y ) �= 0 then0Θ(u) �=Φ for any smooth profile u on Y .

The Euler characteristic can be interpreted as an obstruction to the non-existenceof equilibria, or of fixed points. For example suppose that χ(Y )= 0. Then it is pos-sible to construct a vector field ξ on Y without zeros. It is then possible to constructa function f : Y → Y which follows the trajectories of ξ a small distance, ε, say.But then the function f is homotopic to the identity.

That is to say, from each point x construct a path cx : [0,1]→ Y with cx(0)= x

and cx(1) = f (x) whose gradient dcdt

(t) at time t is given by the vector field ξ atthe point x′ = cx(t). Say that f is induced by the vector field, ξ . The homotopyF : [0,1] × Y → Y is then given by F(0, x) = x and F(t, x) = cx(t). Since cx

is continuous, so is F . Thus F is a homotopy between f and the identity on Y .A function f : Y → Y which is homotopic to the identity is called a deformationof Y .

If χ(Y )= 0 then it is possible to find a vector field ξ on Y without singularitiesand then construct a deformation f of Y induced by ξ . Since ξ(x)= 0 for no x, f

will not have a fixed point. Conversely if f is a deformation on Y and χ(Y ) �= 0,then the homotopy between f and the identity generates a vector field ξ . Were f tohave no fixed point, then ξ would have no singularity. If f and thus ξ have the rightbehavior on the boundary, then ξ must have at least one singularity. This contradictsthe fixed point free property of f .

Lefschetz Fixed Point Theorem If Y is a manifold with χ(Y )= 0 then there existsa fixed point free deformation of Y . If χ(Y ) �= 0 then any deformation of Y has afixed point.

AU

TH

OR

’S P

RO

OF



1289

1290

1291

1292

1293

1294

1295

1296

1297

1298

1299

1300

1301

1302

1303

1304

1305

1306

1307

1308

1309

1310

1311

1312

1313

1314

1315

1316

1317

1318

1319

1320

1321

1322

1323

1324

1325

1326

1327

1328

1329

1330

1331

1332

1333

1334

Note that the Lefschetz Fixed Point Theorem does not imply that any functionf : S2 → S2 has a fixed point. For example the “antipodal” map f (x) = −x, for‖x‖ = 1, is fixed point free. However f cannot be induced by a continuous vectorfield, and is therefore not a deformation. The Lefschetz fixed point theorem juststated is in fact a corollary of a deeper result: For any continuous map f : Y → Y ,with Y compact, there is an “obstruction” called the Lefschetz number λ(f ). Ifλ(f ) �= 0, then f must have a fixed point. If Y is “homotopy equivalent” to thecompact ball then λ(f ) �= 0 for every continuous function on the ball, so the ballhas the fixed point property. On the other hand if f is homotopic to the identity,Id , on Y then λ(f ) = λ(Id), and it can be shown that λ(Id) = χ(Y ) the Eulercharacteristic of Y . It therefore follows that χ(Y ) is an obstruction to the existenceof a fixed point free deformation of the compact manifold Y .

Example 5.10 To illustrate an application of this theorem in social choice, supposethat Y is a compact manifold of dimension at most k(σ ) − 2, where k(σ ) is theNakamura number of the social choice rule, σ (see Sect. 3.8). Suppose (u1, . . . , um)

is a smooth profile on Y . Then it can be shown (Schofield 1984) that if the choiceCσ(π)(Y ) is empty, then there exists a fixed point free deformation on Y . Conse-quently if χ(X) �= 0, then Cσ(π)(Y ) must be non-empty.

Example 5.11 It would be useful to be able to use the notion of the Lefschetzobstruction to obtain conditions under which the singularities of the excess demandfunction, ξ , of an economy were stable. However, as Scarf’s example showed, it isentirely possible for there to be a single attractor, or a single repellor (as in Fig. 5.15)or even a situation with an infinite number of closed orbits. However, consider amore general price adjustment process as follows. At each p in the interior, Int Δ,of Δ let

ξ∗(p)= {v ∈ �n : ⟨v, ξ(p)⟩> 0}.

A vector field v ∈ V1(Δ) is dual to ξ iff v(p) ∈ ξ∗(p) for all p ∈ Int Δ, and v(p)= 0iff ξ(p)= 0 for p ∈ Int Δ. It may be possible to find a vector field, v, dual to ξ whichhas attractors. Suppose that v is dual to ξ , and that f :Δ→Δ is induced by v. Aswe have seen, the Lefschetz number of f gives information about the singularitiesof v.

Dierker (1972) essentially utilized the following hypothesis: there exists a dualvector field, v, and a function f :Δ→Δ induced by v such that f = f0 is homo-topic to the constant map f1 :Δ→{( 1

n, . . . , 1

n)} such that {p ∈Δ : ft (p)= p for t ∈

[0,1]} is compact. Under the assumption that the economy is regular (so the numberof singularities of ξ is finite), then he showed that the number of such singularitiesmust be odd. Moreover, if it is known that ξ only has stable singularities, then thereis only one. The proof of the first assertion follows by observing that λ(f0)= λ(f1).But f1 is the constant map on Δ so λ(f1)= 1. Moreover λ(f0) is equal to the sumof the degrees of the singularities of v, and Dierker shows that at each singularity ofv, the degree is ±1. Consequently the number of singularities must be odd. Finallyif there are only stable singularities, each has degree +1, so it must be unique.

AU

TH

OR

’S P

RO

OF



1335

1336

1337

1338

1339

1340

1341

1342

1343

1344

1345

1346

1347

1348

1349

1350

1351

1352

1353

1354

1355

1356

1357

1358

1359

1360

1361

1362

1363

1364

1365

1366

1367

1368

1369

1370

1371

1372

1373

1374

1375

1376

1377

1378

1379

1380

Fig. 5.17 ???

Example 5.12 As a further application of the Lefschetz obstruction, suppose, con-trary to the usual assumption that negative prices are forbidden, that p ∈ Sn−1

rather than Δ. It is natural to suppose that ξ(p) = ξ(−p) for any p ∈ Sn−1. Sup-pose now that ξ(p) = 0 for no p ∈ Sn−1. This defines an (even) spherical mapg : Sn−1 → Sn−1 by g(p) = ξ(p)/‖ξ(p)‖. Thus g(p) = g(−p). The degree (deg(g)) of such a g can readily be seen to be an even integer, and it follows that theLefschetz obstruction of g is λ(g)= 1+ (−1)n−1 deg (g).

Clearly λ(g) �= 0 and so g has a fixed point p such that g(p) = p. But thenξ(p)= αp for some α > 0. This violates Walras’ Law, since 〈p, ξ(p)〉 = α‖p‖2 �=0, so ξ(p)= 0 for some p ∈ Sn−1. Keenan (1992) goes on to develop some of theearlier work by Rader (1972) to show, in this extended context, that for generic,regular economies there must be an odd number of singularities.

The above examples have all considered flows on the simplex or the sphere. Toreturn to the idea of structural stability, let us consider once again examples of avector field on the torus.

Example 5.13 (1) For a more interesting deformation of the torus Z = S1 × S1,consider Fig. 5.17.

The closed orbit at the top of the torus is a repellor, R, say. Any flow starting nearto R winds towards the bottom closed orbit, A, an attractor. There are no singulari-ties, and the induced deformation is fixed point free.

(2) Not all flows on the torus Z need have closed orbits. Consider the flow on Z

given in Fig. 5.18. If the tangent of the angle, θ , is rational, then the orbit through x

is closed, and will consist of a specific number of turns round Z. However supposethis flow is perturbed. There will be, in any neighborhood of θ , an irrational angle.The orbits of an irrational flow will not close up. To relate this to the Peixoto The-orem which follows, with rational flow there will be an infinite number of closedorbits. However the phase portrait for rational flow cannot be homeomorphic tothe portrait for irrational flow. Thus any perturbations of rational flow gives a non-

njschofi

Cross-Out

njschofi

Sticky Note

A Flow on the Torus.

AU

TH

OR

’S P

RO

OF



1381

1382

1383

1384

1385

1386

1387

1388

1389

1390

1391

1392

1393

1394

1395

1396

1397

1398

1399

1400

1401

1402

1403

1404

1405

1406

1407

1408

1409

1410

1411

1412

1413

1414

1415

1416

1417

1418

1419

1420

1421

1422

1423

1424

1425

1426

Fig. 5.18 ???

homeomorphic irrational flow. Clearly any vector field on the torus which givesrational flow is structurally unstable.

Structural Stability Theorem1. If dimY = 2 and Y is compact, then structural stability of vector fields on Y is

generic.2. If dimY ≥ 3, then structural stability is non-generic.

Peixoto (1962) proved part (1) by showing that structurally stable vector fieldson compact Y (of dimension 2) must satisfy the following properties:(1) there are a finite number of non-degenerate isolated singularities (that is, crit-

ical points which can be sources, sinks, or saddles)(2) there are a finite number of attracting or repelling closed orbits(3) every orbit (other than closed orbits) starts at a source or saddle, or winds

away from a repellor and finishes at a saddle or sink, or winds towards anattractor

(4) no orbit connects saddle points.Peixoto showed that for any vector field ξ on Y and any neighborhood V of ξ in

V1(Y ) there was a vector field ξ ′ in V that satisfied the above four conditions andthus was structurally stable.

Although we have not carefully defined the terms used above, they should beintuitively clear. To illustrate, Fig. 5.19(a) shows an orbit connecting saddles, whileFig. 5.19(b) shows that after perturbation a qualitatively different phase portrait isobtained.

In Fig. 5.19(a), A and B are connected saddles, C is a repellor (orbits startingnear to C leave it) and D is a closed orbit. A small perturbation disconnects A andB as shown in Fig. 5.19(b), and orbits starting near to D (either inside or outside)approach D, so it is an attractor.

The excess demand function, ξ , of the Scarf example clearly has an infinite num-ber of closed orbits (all homeomorphic to S1). Thus ξ cannot be structurally stable.From Peixoto’s Theorem, small perturbations of ξ will destroy this feature. As wesuggested, a small perturbation may change ξ so that p∗ becomes a stable equilib-rium (an attractor) or an unstable equilibrium (a repellor).

Smale’s (1966) proof that structural stability was non-generic in three or moredimensions was obtained by constructing a diffeomorphism f : Y 3 → Y 3 (with Y =S1 × S1 × S1). This induced a vector field ξ ∈ V1(Y ) that had the property thatfor a neighborhood V of ξ in V1(Y ), no ξ ′ in V was structurally stable. In other

njschofi

Cross-Out

njschofi

Sticky Note

Structurally unstable rational flow on the Torus.

AU

TH

OR

’S P

RO

OF



1427

1428

1429

1430

1431

1432

1433

1434

1435

1436

1437

1438

1439

1440

1441

1442

1443

1444

1445

1446

1447

1448

1449

1450

1451

1452

1453

1454

1455

1456

1457

1458

1459

1460

1461

1462

1463

1464

1465

1466

1467

1468

1469

1470

1471

1472

Fig. 5.19 ???

words every ξ ′ when perturbed led to a qualitatively different phase portrait. Wecould say that ξ was chaotic. Any attempt to model ξ by an approximation ξ ′, say,results in an essentially different vector field. The possibility of chaotic flow and itsramifications will be discussed in general terms in the next section. The consequencefor economic theory is immediate, however. Since it can be shown that any excessdemand function ξ and thus any vector field can result from an economy, (u, e), itis possible that the price adjustment process is chaotic.

As we observed after Example 5.8, an economically realistic excess demandfunction ξ on Δ should point into the price simplex at any price vector on ∂Δ. Thisfollows because if pi → 0 then ξi would be expected to approach∞. Let V1

0 (Δ) be

the topological space of vector fields on Δ, of the form dpdt|p = ξ(p), such that dp

dt|p

points into the interior of Δ for p near ∂Δ.

The Sonnenschein-Mantel-Debreu Theorem The map

ξ :Cs(X,�m

)×Xm→ V10 (Δ)

is onto if m≥ n.

Suppose that there are at least as many economic agents (m) as commodities.Then it is possible to construct a well-behaved economy (u, e) with monotonic,strictly convex preferences induced from smooth utilities, and an endowment vectore ∈ Xm, such that any vector field in V1

0 (Δ) is generated by the excess demandfunction for the economy (u, e).

Versions of the theorem were presented in Sonnenschein (1972), Mantel (1974),and Debreu (1974). A more recent version can be found in Mas-Colell (1985). As wehave discussed in this section, because the simplex Δ has χ(Δ)= 1, then the “ex-cess demand” vector field ξ will always have at least one singularity. In fact, from

njschofi

Cross-Out

njschofi

Sticky Note

(i) shows an orbit connecting saddle points, while (ii) shows that this property is destroyed by a small perturbation.

AU

TH

OR

’S P

RO

OF


5.6 Speculations on Chaos 221

1473

1474

1475

1476

1477

1478

1479

1480

1481

1482

1483

1484

1485

1486

1487

1488

1489

1490

1491

1492

1493

1494

1495

1496

1497

1498

1499

1500

1501

1502

1503

1504

1505

1506

1507

1508

1509

1510

1511

1512

1513

1514

1515

1516

1517

1518

the Debreu-Smale theorem, we expect ξ to generically exhibit only a finite num-ber of singularities. Aside from these restrictions, ξ , is essentially unconstrained. Ifthere are at least four commodities (and four agents) then it is always possible toconstruct (u, e) such that the vector field induced by excess demand is “chaotic”.

As we saw in Sect. 5.4, the vector field ξ of the Scarf example was structurallyunstable, but any perturbation of ξ led to a structurally stable field ξ ′, say, eitherwith an attracting or repelling singularity. The situation with four commodities ispotentially much more difficult to analyze. It is possible to find (u, e) such that theinduced vector field ξ on Δ is chaotic—in some neighborhood V of ξ there is nostructurally stable field. Any attempt to model ξ by ξ ′, say, must necessarily incor-porate some errors, and these errors will multiply in some fashion as we attemptto map the phase portrait. In particular the flow generated by ξ through some pointx ∈ Δ can be very different from the flow generated by ξ ′ through x. This phe-nomenon has been called “sensitive dependence on initial conditions.”

5.6 Speculations on Chaos

It is only in the last twenty years or so that the implications of “chaos” (or failure ofstructural stability in a profound way) have begun to be realized. In a recent bookKauffman commented on the failure of structural stability in the following way.

“One implication of the occurrence or non-occurrence of structural stability isthat, in structurally stable systems, smooth walks in parameter space must (resultin) smooth changes in dynamical behavior. By contrast, chaotic systems, which arenot structurally stable, adapt on uncorrelated landscapes. Very small changes in theparameters pass through many interlaced bifurcation surfaces and so change thebehavior of the system dramatically.”1

The whole point of the Debreu-Smale Theorem is that generically the Debreumap is regular. Thus there are open sets in the parameter space (of utility profilesand endowments) where the number, E(ξ), of singularities of ξ is finite and con-stant. As the Scarf example showed, however, even though E(ξ) may be constant ina neighborhood, the vector field ξ can be structurally unstable. The structurally un-stable circular vector field of the Scarf example is not particularly surprising. Afterall, similar structurally unstable systems are common (the oscillator or pendulumis one example). These have the feature that, when perturbed, they become struc-turally unstable. Thus the dynamical system of a pendulum with friction is struc-turally stable. Its phase portrait shows an attractor, which still persists as the frictionis increased or decreased. Smale’s Structural Instability Theorem together with theSonnenschien-Mantel-Debreu Theorem suggests that the vector field generated byexcess demand can indeed be chaotic when there are at least four commodities andagents. This does not necessarily mean that chaotic price adjustment processes arepervasive. As in Dierker’s example, even though the vector field ξ(p)= dp

dtcan be

1S. Kauffman, The Origins of Order (1993) Oxford University Press: Oxford.

AU

TH

OR

’S P

RO

OF



1519

1520

1521

1522

1523

1524

1525

1526

1527

1528

1529

1530

1531

1532

1533

1534

1535

1536

1537

1538

1539

1540

1541

1542

1543

1544

1545

1546

1547

1548

1549

1550

1551

1552

1553

1554

1555

1556

1557

1558

1559

1560

1561

1562

1563

1564

chaotic, it may be possible to find a structurally stable vector field, v, dual to ξ ,which is structurally stable.

This brief final section will attempt to discuss in an informal fashion, whether ornot it is plausible for economies to exhibit chaotic behavior.

It is worth mentioning that the idea of structural stability is not a new one, thoughthe original discussion was not formalized in quite the way it is today. Newton’sgreat work Philosophiae Naturalis Principia Mathematica (published in 1687) grewout of his work on gravitation and planetary motion. The laws of motion could besolved precisely giving a vector field and the orbits (or phase portrait) in the case ofa planet (a point mass) orbiting the sun. The solution accorded closely with Kepler’s(1571–1630) empirical observations on planetary motion. However, the attempt tocompute the planetary orbits for the solar system had to face the problem of pertur-bations. Would the perturbations induced in each orbit by the other planets cause theorbital computations to converge or diverge? With convergence, computing the orbitof Mars, say, can be done, by approximating the effects of Jupiter, Saturn perhaps,on the Mars orbit. The calculations would give a prediction very close to the actualorbit. Using the approximations, the planetary orbits could be computed far into thefuture, giving predictions as precise as calculating ability permitted. Without con-vergence, it would be impossible to make predictions with any degree of certainty.Laplace in his work “Mécanique Céleste” (published between 1799 and 1825) hadargued that the solar system (viewed as a formal dynamical system) is structurallystable (in our terms). Consistent with his view was the use of successive approxima-tions to predict the perihelion (a point nearest the sun) of Haley’s comet, in 1759,and to infer the existence and location of Neptune in 1846.

Structural stability in the three-body problem (of two planets and a sun) wasthe obvious first step in attempting to prove Laplace’s assertion. In 1885 a prizewas announced to celebrate the King of Sweden’s birthday. Henri-Poincaré submit-ted his entry “Sur le problème des trois corps et les Equations de la Dynamique.”This attempted to prove structural stability in a restricted three body problem. Theprize was won by Poincaré’s entry, although it was later found to contain an error.Poincaré had obtained his doctorate in mathematics in Paris in 1878, had brieflytaught at Caen and later became professor at Paris. His work on differential equa-tions in the 1880s and his later work on Celestial Mechanics in the 1890s developednew qualitative techniques (in what we now call differential topology) to study dy-namical equations.

In passing it is worth mentioning that since there is a natural periodicity to anyrotating celestial system, the state space in some sense can be viewed as productsof circles (that is tori). Many of the examples mentioned in the previous section,such as periodic (rational) or a-periodic (non-rational) flow on the torus came upnaturally in celestial mechanics.

One of the notions implicitly emphasized in the previous sections of this chapteris that of bifurcation: namely a dynamical system on the boundary separating quali-tatively different systems. At such a bifurcation, features of the system separate out

AU

TH

OR

’S P

RO

OF



1565

1566

1567

1568

1569

1570

1571

1572

1573

1574

1575

1576

1577

1578

1579

1580

1581

1582

1583

1584

1585

1586

1587

1588

1589

1590

1591

1592

1593

1594

1595

1596

1597

1598

1599

1600

1601

1602

1603

1604

1605

1606

1607

1608

1609

1610

in pairs. For example, in the Debreu map, a bifurcation occurs when two of the priceequilibria coalesce. This is clearly linked to the situation studied by Dierker, wherethe number of price equilibria (in Δ) is odd. At a bifurcation, two equilibria withopposite degrees coalesce. In a somewhat similar fashion Poincaré showed that, forthe three-body problem, if there is some value μ0 (of total mass, say) such that pe-riodic solutions exist for μ ≤ μ0 but not for μ > μ0, then two periodic solutionsmust have coalesced at μ0. However Poincaré also discovered that the bifurcationcould be associated with the appearance of a new solution with period double that ofthe original. This phenomenon is central to the existence of a period-doubling cas-cade as one of the characteristics of chaos. Near the end of his Celestial Mechanics,Poincaré writes of this phenomenon:

“Neither of the two curves must ever cut across itself, but it must bend backupon itself in a very complex manner an infinite number of times. . . . Nothing ismore suitable for providing us with an idea of the complex nature of the three bodyproblem.”2

Although Poincaré was led to the possibility of chaos in his investigations intothe solar system, it appears that the system is in fact structurally stable. Arnol’dshowed in 1963 that for a system with small planets, there is an open set of initialconditions leading to bounded orbits for all time. Computer simulations of the sys-tem far into time also suggests it is structurally stable.3 Even so, there are events inthe system that affect us and appear to be chaotic (perhaps catastrophic would be amore appropriate term). The impact of large asteroids may have a dramatic effecton the biosphere of the earth, and these have been suggested as a possible cause ofmass extinction. The onset and behavior of the ice ages over the last 100,000 yearsis very possibly chaotic, and it is likely that there is a relationship between theseviolent climatic variations and the recent rapid evolution of human intelligence.4

More generally, evolution itself is often perceived as a gradient dynamical pro-cess, leading to increasing complexity. However Stephen Jay Gould has argued overa number of years that evolution is far from gradient-like: increasing complexity co-exists with simple forms of life, and past life has exhibited an astonishing variety.5

Evolution itself appears to proceed at a very uneven rate.6

2My observations and quotations are taken from D. Goroff’s introduction and the text of a recentedition of Poincaré’s New Methods of Celestial Mechanics, (1993) American Institute of Physics:New York.3See I. Peterson, Newton’s Clock: Chaos in the Solar System (1993) Freeman: New York.4See W. H. Calvin, The Ascent of Mind. Bantam: New York.5S. J. Gould, Full House (1996) Harmony Books: New York; S. J. Gould, Wonderful Life (1989)Norton: New York.6N. Eldredge and S. J. Gould, “Punctuated Equilibria: An Alternative to Phyletic Gradualism,” inModels in Paleobiology (1972), T. J. M. Schopf, ed. Norton: New York.

AU

TH

OR

’S P

RO

OF



1611

1612

1613

1614

1615

1616

1617

1618

1619

1620

1621

1622

1623

1624

1625

1626

1627

1628

1629

1630

1631

1632

1633

1634

1635

1636

1637

1638

1639

1640

1641

1642

1643

1644

1645

1646

1647

1648

1649

1650

1651

1652

1653

1654

1655

1656

Fig. 5.20 The butterfly

“Empirical” chaos was probably first discovered by Edward Lorenz in his ef-forts to numerically solve a system of equations representative of the behavior ofweather.7 A very simple version is the non-linear vector equation

dx

dt=⎛

⎝dx1dx2dx3

⎞

⎠=⎛

⎝−a(x1 − x2)

−x1x3 + a2x1 − x2x1x2 − a3x3

⎞

⎠

which is chaotic for certain ranges of the three constants, a1, a2, a3.The resulting “butterfly” portrait winds a number of times about the left hole

(A in Fig. 5.20), then about the right hole (B), then the left, etc. Thus the phaseportrait can be described by a sequence of winding numbers (w1

l ,w1k ,w

2l ,w

2k , etc.).

Changing the constants a1, a2, a3 slightly changes the winding numbers.Given that chaos can be found in such a simple meteorological system, it is

worthwhile engaging in a thought experiment to see whether “climatic” chaos isa plausible phenomenon. Weather occurs on the surface of the earth, so the spa-tial context is S2 × I , where I is an interval corresponding to the depth of theatmosphere. As we know, χ(S2) = χ(S2 × I ) = 2 so we would expect singulari-ties. Secondly there are temporal periodicities, induced by the distance from the sunand earth’s rotation. Thirdly there are spatial periodicities or closed orbits. Chiefamong these must be the jet stream and the oceanic orbit of water from the southernhemisphere to the North Atlantic (the Gulf Stream) and back. The most interestingsingularities are the hurricanes generated each year off the coast of Africa and chan-neled across the Atlantic to the Caribbean and the coast of the U.S.A. Hurricanes

7E.N. Lorenz, “The Statistical Prediction of Solutions of Dynamical Equations,” Proceedings Int.Symp. Num. Weather Pred (1962) Tokyo; E. N. Lorenz “Deterministic Non Periodic Flow,” J. At-mos. Sci. (1963): 130–141.

AU

TH

OR

’S P

RO

OF



1657

1658

1659

1660

1661

1662

1663

1664

1665

1666

1667

1668

1669

1670

1671

1672

1673

1674

1675

1676

1677

1678

1679

1680

1681

1682

1683

1684

1685

1686

1687

1688

1689

1690

1691

1692

1693

1694

1695

1696

1697

1698

1699

1700

1701

1702

are self-sustaining heat machines that eventually dissipate if they cross land or coolwater. It is fairly clear that their origin and trajectory is chaotic.

Perhaps we can use this thought experiment to consider the global economy. Firstof all there must be local periodicities due to climatic variation. Since hurricanes andmonsoons, etc. effect the economy, one would expect small chaotic occurrences.More importantly, however, some of the behavior of economic agents will be basedon their future expectations about the nature of economic growth, etc. Thus onewould expect long term expectations to affect large scale “decisions” on matterssuch as fertility. The post-war “baby boom” is one such example. Large scale peri-odicities of this kind might very well generate smaller chaotic effects (such as, forexample, the oil crisis of the 1970s), which in turn may trigger responses of variouskinds.

It is evident enough that the general equilibrium (GE) emphasis on the existenceof price equilibria, while important, is probably an incomplete way to understandeconomic development. In particular, GE theory tends to downplay the formationof expectations by agents, and the possibility that this can lead to unsustainable“bubbles”.

Remember, it is a key assumption of GE that agents’ preferences are defined onthe commodity space alone. If, on the contrary, these are defined on commoditiesand prices, then it is not obvious that the assumptions of the Ky Fan Theorem (cf.,Chap. 3) can be employed to show existence of a price equilibrium. Indeed ma-nipulation of the kind described in Chap. 4 may be possible. More generally onecan imagine energy engines (very like hurricanes) being generated in asset markets,and sustained by self-reinforcing beliefs about the trajectory of prices. It is true thatmodern decentralised economies are truly astonishing knowledge or data-processingmechanisms. From the perspective of today, the argument that a central planning au-thority can be as effective as the market in making “rational” investment decisionsappears to have been lost. Hayek’s case, the so-called “calculation” argument, withvon Mises and against Lange and Schumpeter, was based on the observation thatinformation is dispersed throughout the economy and is, in any case, predominantlysubjective. He argued essentially that only a market, based on individual choices,can possibly “aggregate” this information.8

Recently, however, theorists have begun to probe the degree of consistency orconvergence of beliefs in a market when it is viewed as a game. It would seem thatwhen the agents “know enough about each other”, then convergence in beliefs is aconsequence.9

8See F. A. Hayek, “The Use of Knowledge in Society,” American Economic Review (1945) 55:519–530, and the discussion in A. Gamble, Hayek: The Iron Cage of Liberty (1996) Westview:Boulder, Colorado.9See R. J. Aumann, “Agreeing to Disagree,” Annals of Statistics (1976) 1236–1239 and K. J. Arrow“Rationality of Self and Others in an Economic System,” Journal of Business (1986) 59: S385–S399.

AU

TH

OR

’S P

RO

OF



1703

1704

1705

1706

1707

1708

1709

1710

1711

1712

1713

1714

1715

1716

1717

1718

1719

1720

1721

1722

1723

1724

1725

1726

1727

1728

1729

1730

1731

1732

1733

1734

1735

1736

1737

1738

1739

1740

1741

1742

1743

1744

1745

1746

1747

1748

In fact the issue about the “truth-seeking” capability of human institutions is veryold and dates back to the work of Condorcet.10 Nonetheless it is possible for beliefcascades or bubbles to occur under some circumstances.11 It is obvious enoughthat economists writing after the Great Crash of the 1930s might be more willingthan those writing today to consider the possibility of belief cascades and collapse.John Maynard Keynes’ work on The General Theory of Employment, Interest andMoney (1936) was very probably the most influential economic book of the cen-tury. What is interesting about this work is that it does appear to have grown outof work that Keynes did in the period 1906 to 1914 on the foundation of probabil-ity, and that eventually was published as the Treatise on Probability (1921). In theTreatise, Keynes viewed probability as a degree of belief. He also wrote: “The oldassumptions, that all quantity is numerical and that all quantitative characteristicsare additive, can no longer be sustained. Mathematical reasoning now appears as anaid in its symbolic rather than its numerical character. I, at any rate, have not thesame lively hope as Condorcet, or even as Edgeworth, ‘Eclairer le Science moraleset politiques par le flambeau de l’Algèbre.”’12

Macro-economics as it is practiced today tends to put a heavy emphasis on theempirical relationships between economic aggregates. Keynes’ views, as I inferfrom the Treatise, suggest that he was impressed neither by econometric relation-ships nor by algebraic manipulation. Moreover, his ideas on “speculative eupho-ria” and crashes13 would seem to be based on an understanding of the economygrounded not in econometrics or algebra but in the qualitative aspects of its dynam-ics.

Obviously I have in mind a dynamical representation of the economy somewherein between macro-economics and general equilibrium theory. The laws of motion ofsuch an economy would be derived from modeling individuals’ “rational” behavioras they process information, update beliefs and locally optimise. At present it isnot possible to construct such a micro-based macro-economy because the laws ofmotion are unknown. Nonetheless, just as simulation of global weather systems canbe based on local physical laws, so may economic dynamics be built up from local“rationality” of individual agents. In my view, the qualitative theory of dynamicalsystems will have a major rôle in this enterprise. The applications of this theory, asoutlined in the chapter, are intended only to give the reader a taste of how this theorymight be developed.

10See his work on the so-called Jury Theorem in his Essai of 1785. A discussion of Condorcet’swork can be found in I. McLean and F. Hewitt, Condorcet: Foundations of Social Choice andPolitical Theory (1994) Edward Elgar: Aldershot, England.11See S. Bikhchandani, D. Hirschleifer and I. Welsh, “A Theory of Fads, Fashion, Custom, andCultural Change as Information Cascades,” Journal of Political Economy (1992) 100: 992–1026.12John Maynard Keynes, Treatise on Probability (1921) Macmillan: London pp. 349. The twovolumes by Robert Skidelsky on John Maynard Keynes (1986, 1992) are very useful in helping tounderstand Keynes’ thinking in the Treatise and the General Theory.13See, for example, the work of Hyman Minsky John Maynard Keynes (1975) Columbia UniversityPress: New York, and Stabilizing an Unstable Economy (1986) Yale University Press: New Haven.

AU

TH

OR

’S P

RO

OF


References 227

1749

1750

1751

1752

1753

1754

1755

1756

1757

1758

1759

1760

1761

1762

1763

1764

1765

1766

1767

1768

1769

1770

1771

1772

1773

1774

1775

1776

1777

1778

1779

1780

1781

1782

1783

1784

1785

1786

1787

1788

1789

1790

1791

1792

1793

1794

References

A very nice though brief survey of the applications of global analysis (or differential topology) toeconomics is:

<unc> Debreu, G. (1976). The application to economics of differential topology and global analysis:regular differentiable economies. The American Economic Review, 66, 280–287.

An advanced and detailed text on the use of differential topology in economics is:

Mas-Colell, A. (1985). The theory of general economic equilibrium. Cambridge: Cambridge Uni-versity Press.

Background reading on differential topology and the ideas of transversality can be found in:

Chillingsworth, D. R. J. (1976). Differential topology with a view to applications. Pitman: London.Golubitsky, M., & Guillemin, V. (1973). Stable mappings and their singularities. Berlin: Springer.Hirsch, M. (1976). Differential topology. Berlin: Springer.

For the Debreu-Smale Theorem see:

Balasko, Y. (1975). Some results on uniqueness and on stability of equilibrium in general equilib-rium theory. Journal of Mathematical Economics, 2, 95–118.

Debreu, G. (1970). Economies with a finite number of equilibria. Econometrica, 38, 387–392.Smale, S. (1974a). Global analysis of economics IV: finiteness and stability of equilibria with

general consumption sets and production. Journal of Mathematical Economics, 1, 119–127.<unc> Smale, S. (1974b). Global analysis and economics IIA: extension of a theorem of Debreu. Journal

of Mathematical Economics, 1, 1–14.

The Smale-Pareto theorem on the generic structure of the Pareto set is in:

Smale, S. (1973). Global analysis and economics I: Pareto optimum and a generalization of Morsetheory. In M. Peixoto (Ed.), Dynamical systems. New York: Academic Press.

Scarf’s example and the use of the Euler characteristic are discussed in Dierker’s notes on topo-logical methods.

Dierker, E. (1972). Two remarks on the number of equilibria of an economy. Econometrica, 951–953.

<unc> Dierker, E. (1974). Lecture notes in economics and mathematical systems: Vol. 92. Topologicalmethods in Walrasian economics. Berlin: Springer.

Scarf, H. (1960). Some examples of global instability of the competitive equilibrium. InternationalEconomic Review, 1, 157–172.

njschofi

Cross-Out

njschofi

Sticky Note

Further Reading

AU

TH

OR

’S P

RO

OF



1795

1796

1797

1798

1799

1800

1801

1802

1803

1804

1805

1806

1807

1808

1809

1810

1811

1812

1813

1814

1815

1816

1817

1818

1819

1820

1821

1822

1823

1824

1825

1826

1827

1828

1829

1830

1831

1832

1833

1834

1835

1836

1837

1838

1839

1840

The Lefschetz fixed point theorem as an obstruction theory for the existence of fixed point freefunctions and continuous vector fields can be found in:

<unc> Brown, R. (1971). The Lefschetz fixed point theorem. Glenview: Scott and Foresman.

Some applications of these techniques in economics are in:

Keenan, D. (1992). Regular exchange economies with negative prices. In W. Nenuefeind & R.Riezman (Eds.), Economic theory and international trade. Berlin: Springer.

Rader, T. (1972). Theory of general economic equilibrium. New York: Academic Press.

An application of this theorem to existence of a voting equilibrium is in:

Schofield, N. (1984). Existence of equilibrium on a manifold. Journal of Operations Research, 9,545–557.

The results on structural stability are given in:

Peixoto, M. (1962). Structural stability on two-dimensional manifolds. Topology, 1, 101–120.Smale, S. (1966). Structurally stable systems are not dense. American Journal of Mathematics, 88,

491–496.

A very nice and fairly elementary demonstration of Peixoto’s Theorem in two dimensions, togetherwith the much earlier classification results of Andronov and Pontrjagin, is given in:

<unc> Hubbard, J. H., & West, B. H. (1995). Differential equations: a dynamical systems approach.Berlin: Springer.

René Thom’s early book applied these “topological” ideas to development:

<unc> Thom, R. (1975). Structural stability and morphogenesis. Reading: Benjamin.

The result that any excess demand function is possible can be found in:

Debreu, G. (1974). Excess demand functions. Journal of Mathematical Economics, 1, 15–21.Mantel, R. (1974). On the characterization of aggregate excess demand. Journal of Economic The-

ory, 12, 197–201.Sonnenschein, H. (1972). Market excess demand functions. Econometrica, 40, 549–563.

The idea of chaos has become rather fashionable recently. For a discussion see:

<unc> Gleick, J. (1987). Chaos: making a new science. New York: Viking.

AU

TH

OR

’S P

RO

OF


References 229

1841

1842

1843

1844

1845

1846

1847

1848

1849

1850

1851

1852

1853

1854

1855

1856

1857

1858

1859

1860

1861

1862

1863

1864

1865

1866

1867

1868

1869

1870

1871

1872

1873

1874

1875

1876

1877

1878

1879

1880

1881

1882

1883

1884

1885

1886

An excellent background to the work of Poincaré, Birkhoff and Smale, and applications in meteo-rology is:

<unc> Lorenz, E. N. (1993). The essence of chaos. Seattle: University of Washington Press.

For applications of the idea of chaos in various economic and social choice contexts see:

<unc> Saari, D. (1985). Price dynamics, social choice, voting methods, probability and chaos. In D.Aliprantis, O. Burkenshaw, & N. J. Rothman (Eds.), Lecture notes in economics and mathe-matical systems (Vol. 244). Berlin: Springer.

AU

TH

OR

’S P

RO

OF


1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

6Topology and Social Choice

In Chap. 3 we showed the Nakamura Theorem that a social choice could be guaran-teed as long as the dimension of the space did not exceed k(σ )−2. We now considerwhat can happen in dimension above k(σ )− 1. We then go on to consider “proba-bilistic” social choice, where there is some uncertainty over voters’ preferences.

6.1 Existence of a Choice

First we repeat some of the results presented in Chap. 3.As we showed in this chapter, arguments for the existence of an equilibrium or

choice are based on some version of Brouwer’s fixed point theorem, which we canregard as a variant of the Fan Choice Theorem. Brouwer’s theorem asserts that anycontinuous function f : B → B between the finite dimensional ball, B or indeedany compact convex set in �w , has the fixed point property.

This section will consider the use of variants of the Brouwer theorem, to proveexistence of an equilibrium of a general social choice mechanism. We shall arguethat the condition for existence of an equilibrium will be violated if there are cyclesin the underlying mechanism.

Let W ⊂ �w be the set of alternatives and, and let 2W be the set of all subsetsof W . A preference correspondence, P , on W assigns to each point x ∈ W , itspreferred set P(x). Write P : W � W to denote that the image of x under P is aset (possibly empty) in W . For any subset V of W , the restriction of P to V gives acorrespondence PV : V � V . Define P−1

V : V � V such that for each x ∈ V ,

P−1V (x)= {y : x ∈ P(y)

}∩ V.

The sets PV (x),P−1V (x) are sometimes called the upper and lower preference sets

of P on V . When there is no ambiguity we delete the suffix V . The choice of P

from W is the set

C(W,P )= {x ∈W : P(x)=∅}.


231

http://dx.doi.org/10.1007/978-3-642-39818-6_6

njschofi

Cross-Out

njschofi

Cross-Out

njschofi

Sticky Note

Chap. 3

AU

TH

OR

’S P

RO

OF


232 6 Topology and Social Choice

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67

68

69

70

71

72

73

74

75

76

77

78

79

80

81

82

83

84

85

86

87

88

89

90

91

92

Here ∅ is the empty set. The choice of P from a subset, V , of W is the set

C(V,P )= {x ∈ V : PV (x)=∅}.

Call CP a choice function on W if CP (V ) = C(V,P ) �= ∅ for every subset V

of W . We now seek general conditions on W and P which are sufficient for CP tobe a choice function on W . Continuity properties of the preference correspondenceare important and so we require the set of alternatives, W , to be a topological space.

Definition 6.1 Let W,Y be two topological spaces. A correspondence P : W � Y

is(i) Lower hemi-continuous (lhc) iff, for all x ∈W , and any open set U ⊂ Y such

that P(x)∩U �=∅ there exists an open neighborhood V of x in W , such thatP(x ′)∩U �=∅ for all x′ ∈ V .

(ii) Upper hemi-continuous (uhc) iff, for all x ∈W and any open set U ⊂ Y suchthat P(x) ⊂ U , there exists an open neighborhood V of x in W such thatP(x ′)⊂U for all x′ ∈ V .

(iii) Lower demi-continuous (ldc) iff, for all x ∈ Y , the set

P−1(x)= {y ∈W : x ∈ P(y)}

is open (or empty) in W .(iv) Upper demi-continuous (udc) iff, for all x ∈ W , the set P(x) is open (or

empty) in Y .(v) Continuous iff P is both ldc and udc.

(vi) Acyclic if it is impossible to find a cycle xt ∈ P(xt−1), xt−1 ∈ P(xt−2), . . . ,

x1 ∈ P(xt ).

We shall use lower demi-continuity of a preference correspondence to prove ex-istence of a choice. In some cases, however, it is possible to make use of lowerhemi-continuity. Note that if P is ldc then it is lhc.

We shall now show that if W is compact, and P is an acyclic and ldc preferencecorrespondence P : W � W , then C(W,P ) �=∅. First of all, say a preference cor-respondence P : W � W satisfies the finite maximality property (FMP) on W iff forevery finite set V in W , there exists x ∈ V such that P(x)∩ V =∅.

Lemma 6.1 (Walker Wal1977??) If W is a compact, topological space and P is an <ref:??>

ldc preference correspondence that satisfies FMP on W , then C(W,P ) �= ∅. Thisfollows readily, using compactness to find a finite subcover, and then using FMP.

Corollary 6.1 If W is a compact topological space and P is an acyclic, ldc pref-erence correspondence on W , then C(W,P ) �=∅.

As Walker (Wal1977??) noted, when W is compact and P is ldc, then P is <ref:??>

acyclic iff P satisfies FMP on W , and so either property can be used to show exis-tence of a choice. A alternative method of proof to show that CP is a choice functionis to substitute a convexity property for P rather than acyclicity.

njschofi

Cross-Out

njschofi

Cross-Out

njschofi

Cross-Out

njschofi

Cross-Out

njschofi

Cross-Out

njschofi

Cross-Out

AU

TH

OR

’S P

RO

OF


6.2 Dynamical Choice Functions 233

93

94

95

96

97

98

99

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

Definition 6.2(i) If W is a subset of a vector space, then the convex hull of W is the set,

Con[W ], defined by taking all convex combinations of points in W .(ii) W is convex iff W = Con[W ]. (The empty set is also convex.)

(iii) W is admissible iff W is a compact, convex subset of a topological vectorspace.

(iv) A preference correspondence P : W � W on a convex set W is convex iff,for all x ∈W , P(x) is convex.

(v) A preference correspondence P : W � W is semi-convex iff, for all x ∈W ,it is the case that x /∈ Con(P (x)).

As we showed in Chap. 3, Fan (1961) has demonstrated that if W is admissibleand P is ldc and semi-convex, then C(W,P ) is non-empty.

Choice Theorem (Fan 1961; Bergstrom 1975) If W is an admissible subset of aHausdorff topological vector space, and P : W � W a preference correspondenceon W which is ldc and semi-convex then C(W,P ) �=∅.

As demonstrated in Chap. 3, the proof uses the KKM lemma due to Knaster et al.(1929). There is a useful corollary to the Fan Choice theorem. Say a preferencecorrespondence on an admissible space W satisfies the convex maximality property(CMP) iff for any finite set V in W , there exists x ∈ Con(V ) such that P(x) ∩Con(V )=∅.

Corollary 6.2 Let W be admissible and P : W � W be ldc and semi-convex. ThenP satisfies the convex maximality property.

Numerous applications of the procedure have been made to show existence ofsuch an economic equilibrium. Note however, that these results all depend on semi-convexity of the preference correspondences.

6.2 Dynamical Choice Functions

We now consider a generalized preference field H :W � T W , on a smooth mani-fold W .

We use this notation to mean that at any x ∈W , H(x) is a cone in the tangentspace TxW above x. That is, if a vector v ∈ H(x), then λv ∈ H(x) for any λ > 0.If there is a smooth curve, c : [−1,1] →W , such that the differential dc(t)

dt∈H(x),

whenever c(t)= x, then c is called an integral curve of H . An integral curve of H

from x = c(o) to y = limt→1 c(t) is called an H -preference curve from x to y. Thepreference field H is called S-continuous iff, for any v ∈H(x) �=∅ then there is anintegral curve, c, in a neighborhood of x with dc(0)

dt= v. The choice C(W,H) of H

on W is defined by

C(W,H)= {x ∈W :H(x)=∅}.

AU

TH

OR

’S P

RO

OF



139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

175

176

177

178

179

180

181

182

183

184

Say H is half open if at every x ∈W , either H(x) = ∅ or there exists a vectorv′ ∈ TxW such that (v′ �v) > 0 for all v ∈H(x). We can say in this case that there is,at x, a direction gradient d in the cotangent space T ∗x W of linear maps from TxW

to � such that d(v) > 0 for all v ∈H(x). If H is S-continuous and half-open, thenthere will exist such a continuous direction gradient d V → T ∗V on a neighborhoodV of x.

Choice Theorem If H is an S-continuous half open preference field, on a finitedimensional compact manifold, W , then C(W,H) �= ∅. If H is not half openthen there exists an H -preference cycle through {x1, x2, x3, .xr .x1}. For each arc(xs, xs+1) there is an H -preference curve from xs to xs+1, with a final H -preferencecurve from xr to x1.

The Choice Theorem implies the existence of a singularity of the field, H .

Existence of Nash Equilibrium Let {W1, . . . ,Wn} be a family of compact, con-tractible, smooth, strategy spaces with each Wi ⊂ �w . A smooth profile is a func-tion u : WN = W1 ×W2 × · · · ×Wn � �n. Let Hi : Wi � T Wi be the inducedi-preference field in the tangent space over Wi . If each Hi is S-continuous andhalf open in T Wi then there exists a critical Nash equilibrium, z ∈WN such thatHN(z)= (H1 × · · · ×Hn)(z)=∅.

This follows from the previous choice theorem because the product preferencefield, HN , will be half-open and S-continuous Below we consider existence of localNash equilibrium. With smooth utility functions, a local Nash equilibrium can befound by checking the second order conditions on the Hessians. We now repeatExample 3.12. from Chap. 3.

Example 6.1 To illustrate the Choice Theorem, consider the example due toKramer (1973), with N = {1,2,3}. Let the preference relation PD : W � W be gen-erated by a set of decisive coalitions, D= {{1,2}, {1,3}, {2,3}}, so that y ∈ PD(x)

whenever two voters prefer y to x. Suppose further that the preferences of the votersare characterized by the direction gradients

{dui(x) : i = 1,2,3

}

as in Fig. 6.1.As the figure makes evident, it is possible to find three points {a, b, c} in W such

that

u1(a) > u1(b)= u1(x) > u1(c)

u2(b) > u2(c)= u2(x) > u2(a)

u3(c) > u3(a)= u3(x) > u3(b).

That is to say, preferences on {a, b, c} give rise to a Condorcet cycle. Note also thatthe set of points PD(x), preferred to x under the voting rule, are the shaded “winsets” in the figure. Clearly x ∈ ConPD(x), so PD(x) is not semi-convex. Indeed it

AU

TH

OR

’S P

RO

OF



185

186

187

188

189

190

191

192

193

194

195

196

197

198

199

200

201

202

203

204

205

206

207

208

209

210

211

212

213

214

215

216

217

218

219

220

221

222

223

224

225

226

227

228

229

230

Fig. 6.1 Cycles near apoint x

should be clear that in any neighborhood V of x it is possible to find three points{a′, b′, c′} such that there is local voting cycle, with a′ ∈ PD(b′), b′ ∈ PD(c′), c′ ∈PD(a′). We can write this as

a′ → c′ → b′ → a′.

Not only is there a voting cycle, but the Fan theorem fails, and we have no reason tobelieve that C(W,PD) �=∅.

We can translate this example into one on preference fields by writing

HD(u)=∪HM(u) :W � T W

where each M ∈D and

HM(u)(x)= {v ∈ TxW : (dui(x) � v)> 0, ∀i ∈M

}.

Figure 6.2 shows the three difference preference fields {Hi : i = 1,2,3} on W , aswell as the intersections HM , for M = {1,2} etc.

Obviously the joint preference field HD :W � T W fails the half open propertyat x. Although HD is S-continuous, we cannot infer that C(W,H) �=∅. If we define

Cycle(W,H)= {x ∈W :H(x) is not half open},

then at any point in Cycle(W,H) it is possible to construct local cycles in the mannerjust described.

AU

TH

OR

’S P

RO

OF



231

232

233

234

235

236

237

238

239

240

241

242

243

244

245

246

247

248

249

250

251

252

253

254

255

256

257

258

259

260

261

262

263

264

265

266

267

268

269

270

271

272

273

274

275

276

Fig. 6.2 The failure ofhalf-openness of a preferencefield

The choice theorem can then be interpreted to mean that for any S-continuousfield on W , then

Cycle(W,H)∪C(W,H) �=∅.

Chilchinisky (1995) has obtained similar results for markets, where the conditionthat the dual is non-empty was termed market arbitrage, and defined in terms ofglobal market co-cones associated with each player. Such a dual co-cone, [Hi(u)]∗is precisely the set of prices in the cotangent space that lie in the dual of the pre-ferred cone, [Hi(u)], of the agent. By analogy with the above, she identifies thiscondition on non-emptiness of the intersection of the family of co-cones as onewhich is necessary and sufficient to guarantee an equilibrium.

The following Theorem implies that Fig. 6.2 is “generic.” As in Chap. 4, a prop-erty is generic if it is true of all profiles in a residual set in Cr(W,�n), where thisis the Whitney topology on smooth profiles, for a society of size n, on the policyspace W . Now consider a non-collegial voting game, D with Nakamura numberκ(D) Then we have the following Theorem by Saari (1997).

Saari Theorem For any non-collegial D, there exists an integer w(D) > κ(D) suchthat dim(W) > w(D) implies that C(W,HD(u)) = ∅ for all u in a residual sub-space of Cr(W,�n).

This result was essentially proved by Saari (1997), building on earlier resultsby Banks (1995), McKelvey (1976, 1979), Kramer (1973), Plott (Plo1969??), <ref:??>

Schofield (1978, 1983) and McKelvey and Schofield (1987). Although this resultformally applies to voting rules, Schofield (2010) argues that it is applicable to anynon-collegial social mechanism, and as a result can be interpreted to imply that

njschofi

Cross-Out

njschofi

Cross-Out

njschofi

Cross-Out

njschofi

Sticky Note

1967

AU

TH

OR

’S P

RO

OF



277

278

279

280

281

282

283

284

285

286

287

288

289

290

291

292

293

294

295

296

297

298

299

300

301

302

303

304

305

306

307

308

309

310

311

312

313

314

315

316

317

318

319

320

321

322

Fig. 6.3 The heart with auniform electorate on thetriangle

chaos is a generic phenomenon in coalitional systems. Since an equilibrium maynot exist, we now introduce a more general social solution concept, called the heart.

Definition 6.3 (The Heart)(i) If W is a topological space, then x ∈W is locally covered (under a preference

correspondence Q) iff for any neighborhood Y of x in W , there exists y ∈ Y

such that (a) y ∈ Q(x), and (b) there exists a neighborhood Y ′ of y, withY ′ ⊆ Y such that Y ′ ∩Q(y)⊂Q(x).

(ii) The heart of Q, written H(Q), is the set of locally uncovered points in W .This notion can be applied to a preference correspondence PD or to the prefer-

ence field, HD(u), in which case we write H(PD) or H(HD(u)). Schofield (1999)shows that the heart will belong to the Pareto set, and is lower hemi-continuouswhen regarded as a correspondence.

Example 6.2 To illustrate the heart, Fig. 6.3 gives a simple artificial examplewhere the utility profile, u, is one where society has “Euclidean” preferences, basedon distance, and the ideal points are uniformly distributed on the boundary ofthe equilateral triangle. Under majority rule, D, the heart H(HD(u)), is the star-shaped figure inside the equilateral triangle (the Pareto set), and contains the “yolk”McKelvey (1986). The heart is generated by an infinite family of “median lines,”such as {M1,M2, . . .}. The shape of the heart reflects the asymmetry of the distri-bution. Inside the heart, voting trajectories can wander anywhere. Outside the heartthe dual cones intersect, so any trajectory starting outside the heart converges to theheart. Thus the heart is an “attractor” of the voting process. Figure 6.4 gives a simi-lar example, this time where preferences are defined by the pentagon, and the heartis the small centrally located circle.

njschofi

Cross-Out

njschofi

Sticky Note

ball

AU

TH

OR

’S P

RO

OF



323

324

325

326

327

328

329

330

331

332

333

334

335

336

337

338

339

340

341

342

343

344

345

346

347

348

349

350

351

352

353

354

355

356

357

358

359

360

361

362

363

364

365

366

367

368

Fig. 6.4 The heart with auniform electorate on thepentagon

6.3 Stochastic Choice

To construct such a social preference field, we first consider political choice on acompact space, W , of political proposals. This model is an extension of the stan-dard multiparty stochastic model, modified by inducing asymmetries in terms of thepreferences of voters.

We define a stochastic electoral model, M(λ,μ, θ ,α, β), which utilizes socio-demographic variables and voter perceptions of character traits. For this model weassume that voter i utility is given by the expression

uij (xi, zj ) = λj +μj (zj )+ (θj � ηi)+ (αj � τi)− β‖xi − zj‖2 + εj (6.1)

= [u∗ij (xi, zj )]+ εj (6.2)

The points {xi ∈ W : i ∈ N} are the preferred policies of the set, N , of voters inthe political or policy space W , and z = {zj ∈W : j ∈Q} are the positions of theagents/candidates. The term ‖xi − zj‖ is simply the Euclidean distance between xi

and zj . The error vector (ε1, . . . , εj , . . . , εp) is distributed by the iid type I extremevalue distribution, as assumed in empirical multinomial logit estimation (MNL).The symbol θ denotes a set of k-vectors {θj : j ∈ Q} representing the effect ofthe k different sociodemographic parameters (class, domicile, education, income,religious orientation, etc.) on voting for agent j while ηi is a k-vector denotingthe ith individual’s relevant “sociodemographic” characteristics. The compositions{(θj � ηi)} are scalar products, called the sociodemographic valences for j . Thesescalar terms characterize the various types of the voters.

njschofi

Sticky Note

I use the square bold dot for scalar product.

njschofi

Sticky Note

See Lin et al.(1999) for work with the stochastic model

AU

TH

OR

’S P

RO

OF


6.3 Stochastic Choice 239

369

370

371

372

373

374

375

376

377

378

379

380

381

382

383

384

385

386

387

388

389

390

391

392

393

394

395

396

397

398

399

400

401

402

403

404

405

406

407

408

409

410

411

412

413

414

The terms {(αj � τi)} are scalars giving voter i’s perceptions and beliefs. Thesecan include perceptions of the character traits of candidate or agent j , or beliefsabout the state of the economy, etc. We let α = (αq, . . . , α1). A trait score can beobtained by factor analysis from a set of survey questions asking respondents aboutthe traits of the agent, including ‘moral’, ‘caring’, ‘knowledgeable’, ‘strong’, ‘hon-est’, ‘intelligent’, etc. The perception of traits can be augmented with voter percep-tion of the state of the economy, etc. in order to examine how anticipated changesin the economy affect each agent’s electoral support.

Finally the exogenous valence vector λ= (λ1, λ2, . . . , λq) gives the general per-ception of the quality of the various candidates, {1, . . . , q}. This vector satisfiesλq ≥ λq−1 ≥ · · · ≥ λ2 ≥ λ1, where (1, . . . , q) label the candidates, and λj is theexogenous valence of agent or candidate j . In empirical multinomial logit models,the valence vector λ is given by the intercept terms for each agent. Finally {μj (zj )}represent the endogenous valences of the candidates. These valences depend on thepositions {zj ∈W : j ∈Q} of the agents.

In the model, the probability that voter i chooses candidate j , when party posi-tions are given by z is:

ρij (z)= Pr[[

uij (xi, zj ) > uil(xi, zl)], for all l �= j

].

A local Political Nash equilibrium (SLNE) is a vector, z, such that each candi-date, j , has chosen zj to locally strictly maximize the expectation Σiρij (z).

The type I extreme value distribution, Ψ , has a cumulative distribution the closedform

Ψ (h)= exp[− exp[−h]],

while its pdf has variance 16π2.

With this distribution it follows, for each voter i, and candidate, j , that

ρij (z)=exp[u∗ij (xi, zj )]

∑q

k=1 expu∗ik(xi, zk). (6.3)

This game is an example of what is known as a Quantal response gameMcKelvey and Palfrey (1995), Levine and Palfrey (2007). Note that the utility ex-pressions {u∗ik(xi, zk)} can be estimated from surveys that include vote intentions.We can use the American National Election Survey (ANES for 2008) which givesgave individual perceptions of the important political policy questions. As indicatedin Table 6.1, we are able to use factor analysis of these responses to construct a twodimensional policy space. ANES 2008 also gave voter perceptions of the charactertraits of the candidates, in terms of “moral”, “caring”, “knowledgeable”, “strong”and “honest”. We performed a factor analysis of these perceptions as shown in Ta-ble 6.2. Further details of the model can be found in Schofield et al. (2011)). Ta-ble 6.3 gives estimates of the average voter positions for the two parties . We alsoobtained data on those voters who declared they provided support for the candidates.these we designated as activists. Figure 6.5 gives an estimate of the voter estimatedpositions as well as the two presidential candidate positions in 2008.

njschofi

Sticky Note

strict

njschofi

Sticky Note

A local Nash equilibrium (LNE) is one that locally maximizes the expectation.

AU

TH

OR

’S P

RO

OF



415

416

417

418

419

420

421

422

423

424

425

426

427

428

429

430

431

432

433

434

435

436

437

438

439

440

441

442

443

444

445

446

447

448

449

450

451

452

453

454

455

456

457

458

459

460

Table 6.1 Factor loadings for economic and social policy for the 2008 election

Question Economic policy Social policy

Less Government services 0.53 0.12

Oppose Universal health care 0.51 0.22

Oppose Bigger Government 0.50 0.14

Prefer Market to Government 0.56

Decrease Welfare spending 0.24

Less government 0.65

Worry more about Equality 0.14 0.37

Tax Companies Equally 0.28 0.10

Support Abortion 0.55

Decrease Immigration 0.12 0.25

Civil right for gays 0.60

Disagree Traditional values 0.53

Gun access 0.36

Support Afr. Amer 0.14 0.45

Conservative v Liberal 0.30 0.60

Eigenvalue 1.93 1.83

Table 6.2 Factor loadingsfor candidate traits scores2008

Question Obama traits McCain traits

Obama Moral 0.72 −0.01

Obama Caring 0.71 −0.18

Obama Knowledgeable 0.61 −0.07

Obama Strong 0.69 −0.13

Obama Honest 0.68 −0.09

Obama Intelligent 0.61 0.08

Obama Optimistic 0.55 0.00

McCain Moral −0.09 0.67

McCain Cares −0.17 0.63

McCain Knowledgeable −0.02 0.65

McCain Strong −0.10 0.70

McCain Honest −0.03 0.63

McCain Intelligent 0.11 0.68

McCain Optimistic −0.07 0.57

Eigenvalue 3.07 3.00

These survey data allow us to construct a spatial logit model of the election as inTable 6.4. This table has Obama as the baseline candidate.

AU

TH

OR

’S P

RO

OF



461

462

463

464

465

466

467

468

469

470

471

472

473

474

475

476

477

478

479

480

481

482

483

484

485

486

487

488

489

490

491

492

493

494

495

496

497

498

499

500

501

502

503

504

505

506

Table 6.3 Estimated voter, activist and candidate positions in 2008

Econ policy Social policy n

Mean s.e. 95 % C.I Mean s.e. 95 % C.I

Activists

Democrats −0.20 0.09 [−0.38,−0.02] 1.14 0.11 [0.92,1.37] 80

Republicans 1.41 0.13 [1.66,1.16] −0.82 0.09 [−0.99,−0.65] 40

Non-activists

Democrats −0.17 0.03 [−0.24,−0.11] 0.36 0.04 [0.29,0.44] 449

Republicans 0.72 0.06 [0.60,0.84] −0.56 0.05 [−0.65,−0.46] 219

788

Fig. 6.5 Distribution of voter ideal points and candidate positions in the 2008 presidential election

Once the voter probabilities over a given set z= {zj : j ∈Q} are computed, thenestimation procedures allow these probabilities to be computed for each possiblevector z. This allows the determination and proof of existence of local Nash equi-librium (LNE), namely a vector, z∗ = (z∗1, . . . , z∗j , . . . , z∗q) such that each candidate,j , chooses z∗j to locally maximize its expected vote share, given by the expectationVj (z∗)=∑i ρij (z∗).

Schofield (2006) shows that the first order condition for a LNE is that themarginal electoral pull at z∗ = (z∗1, . . . , z∗j , . . . , z∗q) is zero. For candidate j , thisis defined to be

dE∗jdzj

(z∗j)= [zel

j − z∗j]

where zelj ≡

n∑

i=1

�ijxi

AU

TH

OR

’S P

RO

OF



507

508

509

510

511

512

513

514

515

516

517

518

519

520

521

522

523

524

525

526

527

528

529

530

531

532

533

534

535

536

537

538

539

540

541

542

543

544

545

546

547

548

549

550

551

552

Table 6.4 Spatial logit models for USA 2008a

Variable (1) Spatial (2) Sp. & Traits (3) Sp. & Dem. (4) Full

McCain valence λ −0.84∗∗∗ −1.08∗∗∗ −2.60∗∗ −3.58∗∗∗

(7.6) (8.3) (2.8) (3.4)

Spatial β 0.85∗∗∗ 0.78∗∗∗ 0.86∗∗∗ 0.83∗∗∗

(14.1) (10.1) (12.3) (10.3)

McCain traits 1.30∗∗∗ 1.36∗∗∗

(7.6) (7.15)

Obama traits −1.02∗∗∗ −1.16∗∗∗

(6.8) (6.44)

Age −0.01 −0.01

(1.0) (1.0)

Gender (F) 0.29 0.44

(1.26) (0.26)

African American −4.16∗∗∗ −3.79∗∗∗

(3.78) (3.08)

Hispanic −0.55 −0.23

(1.34) (0.51)

Education 0.15∗ 0.22∗∗∗

(2.5) (3.66)

Income 0.03 0.01

(1.5) (0.50)

Working Class −0.54∗ −0.70∗∗

(2.25) (2.59)

South 0.36 −0.02

(1.5) (0.07)

Observations 788

log likelihood (LL) −299 −243 −250 −207

AIC 601 494 521 438

BIC 611 513 567 494

aObama is the baseline for this model

is the weighted electoral mean of candidate j .Here the weights {�ij } are individual specific, and defined at the vector z∗ by:

[�ij ] =[ [ρij (z∗)− ρij (z∗)2]∑

k∈N [ρkj (z∗)− ρkj (z∗)2]]

(6.4)

Because the candidate utility functions {Vj : W → �} are differentiable, thesecond order condition on the Hessian of each Vj at z∗ can then be used to de-termine whether z∗ is indeed an LNE. Proof of existence of such an LNE will

AU

TH

OR

’S P

RO

OF



553

554

555

556

557

558

559

560

561

562

563

564

565

566

567

568

569

570

571

572

573

574

575

576

577

578

579

580

581

582

583

584

585

586

587

588

589

590

591

592

593

594

595

596

597

598

Fig. 6.6 Optimal Republican position

then follow from some version of the Choice Theorem. For example, in the sim-pler model M(λ, β), without activists, all weights are equal to 1

n, so the electoral

mean x0 = 1n

∑xi satisfies the first order condition, as suggested by Hinich (1977).

Schofield (2007) gives the necessary and sufficient second order conditions for anLNE at the mean.

The underlying idea of this model is that each candidate, j , will be attracted to therespective weighted electoral mean, zel

j , but will also be pulled away to a positionpreferred by the party activists. Figure 6.6 suggests the location of the weightedelectoral mean for a Republican candidate. The contract curve in this figure is aheuristic estimated locus of preferred activist positions. See also Schofield et al.(2011).

Because the candidate utility functions {Vj : W → �} are differentiable, thesecond order condition on the Hessian of each Vj at z∗ can then be used to de-termine whether z∗ is indeed an LNE. Proof of existence of such an LNE willthen follow from some version of the Choice Theorem. For example, in the sim-pler model M(λ,β), without activists, all weights are equal to 1

n, so the electoral

mean x0 = 1n

∑xi satisfies the first order condition, as suggested by Hinich (1977).

Schofield (2007) gives the necessary and sufficient second order conditions for anLNE at the mean.

AU

TH

OR

’S P

RO

OF



599

600

601

602

603

604

605

606

607

608

609

610

611

612

613

614

615

616

617

618

619

620

621

622

623

624

625

626

627

628

629

630

631

632

633

634

635

636

637

638

639

640

641

642

643

644

The underlying idea of this model is that each candidate, j , will be attracted to therespective weighted electoral mean, zel

j , but will also be pulled away to a positionpreferred by the party activists. Figure 6.6 suggests the location of the weightedelectoral mean for a Republican candidate. The contract curve in this figure is aheuristic estimated locus of preferred activist positions. See also Schofield et al.(2011).

We can compute the necessary and sufficient second order conditions for an LNEat the electoral mean. In the case that the activist valence functions and sociode-mographic terms are identically zero, we call this the pure spatial model, denotedM(λ, β).

In this case, the first order condition is

dVj (z)dzj

= 1

n

∑

i∈N

dρij

dzj

(6.5)

= 1

n

∑

i∈N

{2β(xi − zj )

}[ρij − ρ2

ij

]= 0. (6.6)

Suppose that all zj are identical. Then all ρij are independent of {xi} and thus of i,and ρij may be written as ρj . Then for each fixed j , the first order condition is

dVj (z)dzj

= 2β[ρj − ρ2

j

]∑

i∈N

[(xi − zj )

]= 0. (6.7)

The second order condition for an LNE at z* depends on the negative definitenessof the Hessian of the activist valence function. If the eigenvalues of these Hessiansare negative at a balance solution, and of sufficient magnitude, then this will guar-antee that a vector z* which satisfies the balance condition will be a SLNE. Indeed,this condition can ensure concavity of the vote share functions, and thus of existenceof a PNE.

6.3.1 The Model Without Activist Valence Functions

We now apply the Theorem to the pure spatial model M(λ, β), by setting μ= θ =α ≡ 0.

As we have shown above, the joint electoral mean z0 satisfies the first ordercondition for a LNE. We now consider the second order condition.

Definition 6.4 (The Convergence Coefficient of the Model M(λ, β)) When thespace W has dimension w.

(i) Define

ρ1 =[

1+p∑

k=2

exp[λk − λ1]]−1

. (6.8)

AU

TH

OR

’S P

RO

OF



645

646

647

648

649

650

651

652

653

654

655

656

657

658

659

660

661

662

663

664

665

666

667

668

669

670

671

672

673

674

675

676

677

678

679

680

681

682

683

684

685

686

687

688

689

690

(ii) Let W be endowed with an orthogonal system of coordinate axes (1, . . . , s,

. . . , t, . . . ,w). For each coordinate axis let ξt = (x1t , x2t , . . . , xnt ) ∈ Rn be

the vector of the t th coordinates of the set of n voter ideal points. Let(ξs, ξt ) ∈ R denote scalar product. The covariance between the sth and t thaxes is denoted (σs, σt ) = 1

n(ξs, ξt ) and σ 2

s = 1n(ξs, ξs) is the electoral vari-

ance on the sth axis. Note that these variances and covariances are takenabout the electoral means on each axis.

(iii) The symmetric w × w electoral covariance matrix ∇0 is defined to be1n[(ξs, ξt )]s=1,...,w

t=1,...,w .(iv) The electoral variance is

σ 2 =w∑

s=1

σ 2s =

1

n

w∑

s=1

(ξs, ξs)= trace(∇0).

(v) The w by w characteristic matrix, of agent 1 is given by

C1 = 2β(1− 2ρ1)∇0 − I. (6.9)

(vi) The convergence coefficient of the model M(λ, β) is

c≡ c(λ, β)= 2β[1− 2ρ1]σ 2. (6.10)

Observe that the β-parameter has dimension L−2, so that c is dimensionless. Wecan therefore use c to compare different models.

Note also that agent 1 is by definition the agent with the lowest valence, and ρ1,as defined above, is the probability that a generic voter will choose this agent whenall agents are located at the origin. The estimate of the probability ρ1 depends onlyon the comparison functions {fkj }, as given above and these can be estimated interms of the valence differences.

The following result is proved in Schofield (2007).

The Mean Voter Valence Theorem(i) The joint mean z0 satisfies the first order condition to be a LNE for the model

M(λ, β).(ii) The necessary and sufficient second order condition for SLNE at z0 is that

C1 has negative eigenvalues.1

(iii) A necessary condition for z0 to be a SLNE for the model M(λ, β) is thatc(λ, β) < w.

(iv) A sufficient condition for convergence to z0 in the two dimensional case isthat c < 1.

Notice that (iii) follows from (ii) since the condition of negative eigenvaluesmeans that

trace(C1)= 2β[1− 2ρ1]σ 2 −w < 0.

1In the usual way, the condition for an LNE is that the eigenvalues are negative semi-definite.

AU

TH

OR

’S P

RO

OF



691

692

693

694

695

696

697

698

699

700

701

702

703

704

705

706

707

708

709

710

711

712

713

714

715

716

717

718

719

720

721

722

723

724

725

726

727

728

729

730

731

732

733

734

735

736

In the case c(λ, β)= w, then trace (C1)= 0, which means either that all eigen-values are zero, or at least one is positive. This degenerate situation requires ex-amination of C1. The additional condition c < 1 is sufficient to guarantee thatdet(C1) > 0, which ensures that both eigenvalues are negative.

The expression for C1 has a simple form because of the assumption of a singledistance parameter β . It is possible to use a model with different coefficients β ={β1, β2, . . . , βw} on each dimension. In this case the characteristic matrix can readilybe shown to be

C1 = 2(1− 2ρ1)β∇0β − β,

We require trace (C1) < 0, or

2(1− 2ρ1) trace(β∇0β) < β1 + β2 + · · · + βw.

The convergence coefficient in this case is

c(λ,β)= 2(1− 2ρ1) trace(β∇0β)

1w

(β1 + β2 + · · · + βw)

again giving the necessary condition of c(λ,β) < w.Note that if C1 has negative eigenvalues, then the Hessians of the vote shares for

all agents are negative definite at the joint mean, z0. When this is true, then the jointmean is a candidate for a PNE, and this property can be verified by simulation.

When the convergence condition c(λ,β) < w is violated the joint origin cannotbe a SPNE.

In the degenerate case c(λ,β)= w it is again necessary to examine the charac-teristic matrix to determine whether the joint mean can be a PNE.

Model (1) in Table 6.4 shows the coefficients in 2008 for the pure spatial model,M(λ, β), to be

(λObama, λMcCain, β)= (0,−0.84,0.85).

Table 6.4 indicates, the loglikelihood, Akaike information criterion (AIC) andBayesian information criterion (BIC) are all quite acceptable, and all coefficientsare significant with probability < 0.001.

Note that these parameters are estimated when the candidates are located at theestimated positions. Again, λMcCain is the relative negative exogenous valence ofMcCain, with respect to Obama, according to the pure spatial model M(λ, β). Weassume that the parameters of the model remain close to these values as we modifythe candidates positions in order to determine the equilibria of the model.

According to the model M(λ, β), the probability that a voter chooses McCain orObama when both are positioned at the electoral mean, z0, are

(ρMcCain, ρObama)=(

e0

e0 + e0.84,

e0.84

e0 + e0.84

)= (0.30,0.70).

AU

TH

OR

’S P

RO

OF



737

738

739

740

741

742

743

744

745

746

747

748

749

750

751

752

753

754

755

756

757

758

759

760

761

762

763

764

765

766

767

768

769

770

771

772

773

774

775

776

777

778

779

780

781

782

Now the covariance matrix can be estimated to be

∇0 =[

0.80 −0.13−0.13 0.83

].

Thus from Table 6.4, we obtain

CMcCain = [2β(1− 2ρMcCain)∇0 = [2× 0.85× 0.4×∇0] − I

= (0.68)∇0 − I

= (0.68)

[0.80 −0.13−0.13 0.83

]− I =

[0.54 −0.09−0.09 0.56

]− I

=[−0.46 −0.09−0.09 −0.44

],

c= 2β(1− 2ρMcCain) trace∇0 = 2(0.85)(0.4)(1.63)= 1.1.

The determinant of CMcCain is positive and the trace negative, so both eigenvaluesare negative, showing that the mean is an LNE. The lower 95 % estimate for ρMcCain

is 0.26, and the upper 95 % estimate for β is 0.97, so a very conservative upperestimate for β(1− 2ρMcCain) is 0.97× 0.48 = 0.47, so the upper estimate for c is1.53, giving an estimate for CMcCain of

(0.94)

[0.80 −0.09−0.09 0.83

]− I

=[

0.75 −0.13−0.13 0.78

]− I

=[−0.25 −0.13−0.13 −0.22

],

which still has negative eigenvalues.We also considered a spatial model where the x and y axes had different coeffi-

cients, β1 = 0.8, β2 = 0.92.Using

c(λ,β)= 2(1− 2ρlib) trace(β∇0β)

1w

(β1 + β2 + · · · + βw)

with 12 (β1 + β2)= 1

2 (0.80+ 0.92)= 0.86 and ρlib = 0.25, we find

c(λ,β) = 2(0.4)

0.86trace

[(0.80)2(0.80) (0.80)(0.92)(−0.13)

(0.80)(0.92)(−0.13) (0.92)2(0.83)

]

= (0.93) trace

[0.51 −0.09−0.09 0.70

]= (0.93)(1.21)= 1.23.

AU

TH

OR

’S P

RO

OF



783

784

785

786

787

788

789

790

791

792

793

794

795

796

797

798

799

800

801

802

803

804

805

806

807

808

809

810

811

812

813

814

815

816

817

818

819

820

821

822

823

824

825

826

827

828

For the characteristic matrix,

CMcCain = 2(1− 2ρMcCain)β∇0β − β

= 2(0.4)

[0.51 −0.09−0.09 0.70

]−[

0.80 00 0.92

]

=[−0.41 −0.07−0.07 −0.56

]−[

0.80 00 0.92

]

=[−0.39 −0.07−0.07 −0.36

].

The analysis showed the Hessian for this case had negative eigenvalues, so againz0 is a LNE. This model is essentially the same as the model with a single β . Sincethese models imply the origin is an LNE but nether candidate is located there we in-fer that activists exert influence to move the candidate positions into opposite quad-rants of the policy space.

References

Banks, J. S. (1995). Singularity theory and core existence in the spatial model. Journal of Mathe-matical Economics, 24, 523–536.

Chichilnisky, G. (1995). Limited arbitrage is necessary and sufficient for the existence of a com-petitive equilibrium with or without short sales. Economic Theory, 5, 79–107.

Hinich, M. J. (1977). Equilibrium in spatial voting: the median voter theorem is an artifact. Journalof Economic Theory, 16, 208–219.

Kramer, G. H. (1973). On a class of equilibrium conditions for majority rule. Econometrica, 41,285–297.

Levine, D., & Palfrey, T. R. (2007). The paradox of voter participation. American Political ScienceReview, 101, 143–158.

<unc> Lin, T., Enelow, M. J., & Dorussen, H. (1999). Equilibrium in multicandidate probabilistic spatialvoting. Public Choice, 98, 59–82.

McKelvey, R. D. (1976). Intransitivities in multidimensional voting models and some implicationsfor agenda control. Journal of Economic Theory, 12, 472–482.

McKelvey, R. D. (1979). General conditions for global intransitivities in formal voting models.Econometrica, 47, 1085–1112.

McKelvey, R. D. (1986). Covering, dominance and institution free properties of social choice.American Journal of Political Science, 30, 283–314.

McKelvey, R. D., & Palfrey, T. R. (1995). Quantal response equilibria in normal form games.Games and Economic Behavior, 10, 6–38.

McKelvey, R. D., & Schofield, N. (1987). Generalized symmetry conditions at a core point. Econo-metrica, 55, 923–933.

<unc> Miller, G., & Schofield, N. (2003). Activists and partisan realignment in the US. American PoliticalScience Review, 97, 245–260.

<unc> Plott, C. R. (1967). A notion of equilibrium and its possibility under majority rule. The AmericanEconomic Review, 57, 787–806.

Saari, D. (1997). The generic existence of a core for q-rules. Economic Theory, 9, 219–260.Schofield, N. (1978). Instability of simple dynamic games. Review of Economic Studies, 45, 575–

594.

njschofi

Sticky Note

Bergstrom T (1975), 'The existence of maximal elements and equilibria in the absence of transitivity', Typescript, University of Michigan.

njschofi

Sticky Note

Knaster B, K. Kuratowski, S.Mazurkiewicz (1929), 'Ein beweis des fixpunktsatzes fur n-dimensionale simplexe'. Fund Math 14, 132-137.

njschofi

Sticky Note

Fan, K. (1961), 'A generalization of Tychonoff's fixed point theorem', Math Annalen 42, 305--310.

njschofi

Sticky Note

See Miller and Schofield (2003) for a devlopment of this observation.

njschofi

Sticky Note

Schofield (2006) Equilibria in the Spatial Stochastic Model of Voting with Party Activists.”The Review of Economic Design (10 (3) : 183-203.

AU

TH

OR

’S P

RO

OF


References 249

829

830

831

832

833

834

835

836

837

838

839

840

841

842

843

844

845

846

847

848

849

850

851

852

853

854

855

856

857

858

859

860

861

862

863

864

865

866

867

868

869

870

871

872

873

874

Schofield, N. (1983). Generic instability of majority rule. Review of Economic Studies, 50, 695–705.

Schofield, N. (1999). The heart and the uncovered set. Journal of Economics. Supplementum, 8,79–113.

Schofield, N. (2006). Equilibria in the spatial stochastic model of voting with party activists. Re-view of Economic Design, 10, 183–203.

Schofield, N. (2007). The Mean Voter Theorem: necessary and sufficient conditions for convergentequilibrium. Review of Economic Studies, 74, 965–980.

Schofield, N. (2010). Social orders. Social Choice and Welfare, 34, 503–536.<unc> Schofield, N. (2013). The probability of a fit choice. Review of Economic Design, 17, 129–150.

Schofield, N., Claassen, C., & Ozdemir, U. (2011). Empirical and formal models of the US presi-dential elections in 2004 and 2008. In N. Schofield & G. Caballero (Eds.), The political economyof institutions, democracy and voting (pp. 217–258). Berlin: Springer.

njschofi

Sticky Note

Walker M. (1977), 'On the existence of maximal elements', Journal of Economic Theory16, 470--474.

AU

TH

OR

’S P

RO

OF


1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

7Review Exercises

7.1 Exercises to Chap. 1

1.1. Consider the relations:

P = {(2,3), (1,4), (2,1), (3,2), (4,4)}

and Q= {(1,3), (4,2), (2,4), (4,1)}.

Compute Q ◦ P , P ◦ Q, (P ◦ Q)−1 and (Q ◦ P)−1. Let φQ and φP be themappings associated with these two relations. Are either φQ and φP functions, andare they surjective and/or injective?

1.2. Suppose that each member i of a society M = {1, . . . ,m} has weak and strictpreferences (Ri,Pi ) on a finite set X of feasible states. Define the weak Pareto rule,Q, on X by xQy iff xRiy∀i ∈M , and xPjy for some j ∈M . Show that if eachRi , i ∈M , is transitive, then Q is transitive. Hence show that the Pareto choice setCQ(X) is non empty.

1.3. Show that the set Θ = {eiθ : 0≤ θ ≤ 2π}, of all 2× 2 matrices representingrotations, is a subgroup of (M∗(2),◦), under matrix composition, ◦.


2.1. With respect to the usual basis for �3, let x1 = (1,1,0), x2 = (0,1,1), x3 =(1,0,1). Show that {x1, x2, x3} are linearly independent.

2.2. Suppose f : �5→�4 is a linear transformation, with a 2-dimensional kernel.Show that there exists some vector z ∈ �4, such that for any vector y ∈ �4 thereexists a vector y0 ∈ Im(f ) with y = y0 + λz for some λ ∈ �.


251

http://dx.doi.org/10.1007/978-3-642-39818-6_7

njschofi

Sticky Note

This section 7 of the book gives review exercises for each of the chapters.

AU

TH

OR

’S P

RO

OF


252 7 Review Exercises

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67

68

69

70

71

72

73

74

75

76

77

78

79

80

81

82

83

84

85

86

87

88

89

90

91

92

2.3. Find all solutions to the equations A(x)= bi , for i = 1,2,3, where

A=⎛

⎝1 4 2 33 1 −1 11 −1 4 6

⎞

⎠

and

b1 =⎛

⎝734

⎞

⎠ , b2 =⎛

⎝111

⎞

⎠ and b3 =⎛

⎝321

⎞

⎠ .

2.4. Find all solutions to the equation A(x)= b where

A=⎛

⎝6 −1 1 41 1 3 −13 4 1 2

⎞

⎠

and

b=⎛

⎝437

⎞

⎠ .

2.5. Let F : �4→�2 be the linear transformation represented by the matrix

(1 5 −1 3−1 0 −4 2

).

Compute the set F−1(y), when y = ( 41

).

2.6. Find the kernel and image of the linear transformation, A, represented by thematrix

⎛

⎝3 7 24 10 21 −2 5

⎞

⎠ .

Find new bases for the domain and codomain of A so that A can be representedas a matrix

(I 00 0

)

with respect to these bases.

AU

TH

OR

’S P

RO

OF


7.3 Exercises to Chap. 3 253

93

94

95

96

97

98

99

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

2.7. Find the kernel of the linear transformation, A, represented by the matrix

⎛

⎝1 3 12 −1 −5−1 1 3

⎞

⎠ .

Use the dimension theorem to compute the image of A. Does the equationA(x)= b have a solution when

b=⎛

⎝111

⎞

⎠?

2.8. Find the eigenvalues and eigenvectors of

(2 −11 4

).

Is this matrix positive or negative definite or neither?

2.9. Diagonalize the matrix

⎛

⎝4 1 11 8 01 10 2

⎞

⎠ .

2.10. Compute the eigenvalues and eigenvectors of

⎛

⎝1 0 00 0 10 1 0

⎞

⎠

and thus diagonalize the matrix.


3.1. Show that if A is a set in a topological space (X,T ) then the interior, Int(A),of A is open and the closure, Clos(A), is closed. Show that Int(A)⊂A⊂ Clos(A).What is the interior and what is the closure of the set [a, b) in �, with the Euclideantopology? What is the boundary of [a, b)? Determine the limit points of [a, b).

3.2. If two metrics d1, d2 on a space X are equivalent write d1 ∼ d2. Show that∼ is an equivalence relation on the set of all metrics on X. Thus show that theCartesian, Euclidean and city block topologies on �n are equivalent.

AU

TH

OR

’S P

RO

OF



139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

175

176

177

178

179

180

181

182

183

184

3.3. Show that the set, L(�n,�m), of linear transformations from �n to �m is anormed vector space with norm

‖f ‖ = supx∈�n

{‖f (x)‖‖x‖ : ‖x‖ �= 0

},

with respect to the Euclidean norms on �n and �m. In particular verify that ‖ ‖Lsatisfies the three norm properties. Describe an open neighbourhood of a member f

of L(�n,�m) with respect to the induced topology on L(�n,�m). Let M(n,m) bethe set of n×m matrices with the natural topology (see page 106), and let

M :L(�n,�m)→M(n,m)

be the matrix representation with respect to bases for �n and �m. Discuss the con-tinuity of M with respect to these topologies for L(�n,�m) and M(n,m).

3.4. Determine, from first principles, whether the following functions are contin-uous on their domain:1. �+→� : x→ loge x;2. �→�+ : x→ x2;3. �→�+ : x→ ex ;4. �→� : x→ cosx;5. �→� : x→ cos 1

x.

3.5. What is the image of the interval [−1,1] under the function x → cos 1x

? Isthe image compact?

3.6. Determine which of the following sets are convex:1. X1 = {(x1, x2) ∈ �2 : 3x2

1 + 2x22 ≤ 6};

2. X2 = {(x1, x2) ∈ �2 : x1 ≤ 2, x2 ≤ 3};3. X3 = {(x1, x2) ∈ �2+ : x1x2 ≤ 1};4. X4 = {(x1, x2) ∈ �2+ : x2 − 3≥−x2

1}.

3.7. In �2, let BC(x, r1) be the Cartesian open ball of radius r1 about x, andBE(y, r2) the Euclidean ball of radius r2 about x. Show that these two sets areconvex. For fixed x, y ∈ �2 obtain necessary and sufficient restrictions on r1, r2 sothat these two open balls may be strongly separated by a hyperplane.

3.8. Determine whether the following functions are convex, quasi-concave, orconcave:1. �→�+ : x→ ex ;2. �→� : x→ x7;3. �→� : (x, y)→ xy;4. �→� : x→ 1

x;

5. �2→� : (x, y)→ x2 − y.

AU

TH

OR

’S P

RO

OF


7.4 Exercises to Chap. 4 255

185

186

187

188

189

190

191

192

193

194

195

196

197

198

199

200

201

202

203

204

205

206

207

208

209

210

211

212

213

214

215

216

217

218

219

220

221

222

223

224

225

226

227

228

229

230


4.1. Suppose that f : �n →�m and g : �m →�k are both Cr -differentiable. Isg ◦ f : �n→�k , a Cr -differentiable function? If so, why?

4.2. Find and classify the critical points of the following functions:1. �2→� : (x, y)→ x2 + xy + 2y2 + 3;2. �2→� : (x, y)→−x2 + xy − y2 + 2x + y;3. �2→� : (x, y)→ e2x − 2x + 2y.

4.3. Determine the critical points, and the Hessian at these points, of the function�2→� : (x, y)→ x2y.

Compute the eigenvalues and eigenvectors of the Hessian at critical points, anduse this to determine the nature of the critical points.

4.4. Show that the origin is a critical point of the function:

�3→� : (x, y, z)→ x2 + 2y2 +Zz2 + xy + xz.

Determine the nature of this critical point by examining the Hessian.

4.5. Determine the set of critical points of the function

�2→� : (x, y)→−x2y2 + 4xy − 4x2.

4.6. Maximise the function �2 →� : (x, y)→ x2y subject to the constraint 1−x2 − y2 = 0.

4.7. Maximise the function �2 →� : (x, y)→ a logx + b logy, subject to theconstraint px + qy ≤ I , where p,q, I ∈ �+.


5.1. Show that if dimension (X) > m, then for almost every smooth profile u =(u1, . . . , um) :X→�m it is the case that Pareto optimal points in the interior of X

can be parametrised by at most (m−1) strictly positive coefficients {λ1, . . . , λm−1}.

5.2. Consider a two agent, two good exchange economy, where the initial endow-ment of good j , by agent i is eij . Suppose that each agent, i, has utility functionui : (xi1, xi2)→ a logxi1 + b logxi2. Compute the critical Pareto set Θ , within thefeasible set

Y = {(x11, x12, x21, x22) ∈ �4+},

AU

TH

OR

’S P

RO

OF



231

232

233

234

235

236

237

238

239

240

241

242

243

244

245

246

247

248

249

250

251

252

253

254

255

256

257

258

259

260

261

262

263

264

265

266

267

268

269

270

271

272

273

274

275

276

Fig. 7.1 The butterflysingularity

where the coordinates of Y satisfy

x11 + x21 = e11 + e21 and x12 + x22 = e12 + e22.

What is the dimension of Y and what is the codimension of Θ in Y ? Computethe market-clearing equilibrium.

5.3. Figure 7.1 shows a “butterfly singularity”, A, in �2. Compute the degreeof this singularity. Show why such a singularity (though it is isolated) cannot beassociated with a generic excess demand function on the two-dimensional pricesimplex.

AU

TH

OR

’S P

RO

OF


1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

Subject Index

AAbelian group, 20Accumulation point, 83Acyclic relation, 26Additive inverse, 15Additive relation, 22Admissible set, 158Antisymmetric relation, 24Arrow’s Impossibility Theorem, 33, 35Associativity in a group, 15Associativity of sets, 2Asymmetric relation, 24Attractor of a vector field, 214

BBaire space, 212Banach space, 123, 190Base for a topology, 82Basis of a vector space, 42Bergstrom’s theorem, 128Bijective function, 11Bilinear map, 67Binary operation, 14Binary relation, 24Bliss point of a preference, 31Boolean algebra, 2Boundary of a set, 83Boundary problem, 162Bounded function, 29Brouwer Fixed Point Theorem, 118, 121Browder Fixed Point Theorem, 125Budget set, 113Butterfly dynamical system, 224

CCalculation argument on economic

information, 225Canonical form of a matrix, 75Cartesian metric, 85Cartesian norm, 80Cartesian open ball, 85

Cartesian product, 7Cartesian topology, 85Chain rule, 141Change of basis, 55Chaos, 221, 224Characteristic equation of a matrix, 68Choice correspondence, 27City block metric, 87City block norm, 80City block topology, 86Closed set, 83Closure of a set, 83Coalition feasibility, 179Codomain of a relation, 7Cofactor matrix, 53Collegial rule, 35Collegium, 35Commutative group, 20Commutativity of sets, 2Compact set, 95Competitive allocation, 178Complement of a set, 2Complete vector space, 123, 190Composition of mappings, 8Composition of matrices, 14Concave function, 100Connected relation, 24Constrained optimisation, 155Consumer optimisation, 113Continuous function, 94Contractible space, 119Convex function, 99Convex preference, 99Convex set, 99Corank, 196Corank r singularity, 196Core Theorem for an exchange economy, 179Cover for a set, 6, 93Critical Pareto set, 173Critical point, 150

N. Schofield, Mathematical Methods in Economics and Social Choice,Springer Texts in Business and Economics, DOI 10.1007/978-3-642-39818-6,© Springer-Verlag Berlin Heidelberg 2014

257

http://dx.doi.org/10.1007/978-3-642-39818-6

njschofi

Sticky Note

Convergence coefficient,244

AU

TH

OR

’S P

RO

OF


258 Subject Index

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67

68

69

70

71

72

73

74

75

76

77

78

79

80

81

82

83

84

85

86

87

88

89

90

91

92

DDebreu projection, 204Debreu-Smale Theorem, 183, 189, 203, 215,

221Decisive coalition, 33Deformation, 216Deformation retract, 119Degree of a singularity, 212Demand, 114Dense set, 83Derivative of a function, 145Determinant of a matrix, 14, 53Diagonalisation of a matrix, 63, 64Dictator, 35Diffeomorphism, 140Differentiable function, 135Differential of a function, 138Dimension of a vector space, 45Dimension theorem, 50Direction gradient, 139Distributivity of a field, 21Distributivity of sets, 2Domain of a mapping, 8Domain of a relation, 8

EEconomic optimisation, 162Edgeworth box, 182Eigenvalue, 62Eigenvector, 62Endowment, 108Equilibrium prices, 114Equivalence relation, 28Euclidean norm, 78Euclidean scalar product, 77Euclidean topology, 86Euler characteristic of simplex, 214Euler characteristic of sphere and torus, 199Excess demand function, 207Exchange theorem, 44Existential quantifier, 7

FFan theorem, 125, 225Feasible input-output vector, 111Field, 21Filter, 34, 38Fine topology, 87Finite intersection property, 93Finitely generated vector space, 44, 45Fixed point property, 118Frame, 41Free-disposal equilibrium, 132

Function, 11Function space, 92

GGame, 128General linear group, 55Generic existence of regular economies, 183,

203Generic property, 183, 201, 212Global maximum (minimum) of a function,

154Global saddlepoint of the Lagrangian, 117Graph of a mapping, 9Group, 14

HHairy Ball theorem, 215Hausdorff space, 96Heine-Borel Theorem, 95Hessian, 142, 144Homeomorphism, 118, 211Homomorphism, 18

IIdentity mapping, 10Identity matrix, 13, 52Identity relation, 8Image of a mapping, 8Image of a transformation, 50Immersion, 194Implicit function theorem, 192, 195Index of a critical point, 197Index of a quadratic form, 70Indifference, 24Infimum of a function, 87Injective function, 12Interior of a set, 83Intersection of sets, 2Inverse element, 15Inverse function, 11Inverse function theorem, 190Inverse matrix, 13, 54Inverse relation, 8Invisible dictator, 35Irrational flow on torus, 218Isomorphism, 18Isomorphism theorem, 56

JJacobian of a function, 140

KKernel of a transformation, 50Kernel rank, 50

njschofi

Sticky Note

Heart, 237

AU

TH

OR

’S P

RO

OF


Subject Index 259

93

94

95

96

97

98

99

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

Knaster-Kuratowski-Mazurkiewicz (KKM)Theorem, 123

Kuhn-Tucker theorems, 115–118

LLagrangian, 116Lefschetz fixed point theorem, 216Lefschetz obstruction, 217Limit of a sequence, 94Limit point, 83Linear combination, 42Linear dependence, 15Linear transformation, 45Linearly independent, 41Local maximum (minimum), 154Locally non-satiated preference, 110Lower demi-continuity, 118Lower hemi-continuity, 128Lyapunov function, 208, 209

MMajority rule, 34Manifold, 194Mapping, 8Marginal rate of technical substitution, 169Market equilibrium, 114Matrix, 13, 46Mean value theorem, 146Measure zero, 198Metric, 80Metric topology, 82Metrisable space, 80Michael’s Selection Theorem, 123Monotonic rule, 36Morphism, 18Morse function, 154, 197Morse lemma, 197Morse Sard theorem, 202Morse theorem, 202

NNakamura Lemma, 36Nakamura number, 35Nakamura Theorem, 127Nash equilibrium, 128Negation of a set, 2Negative definite form, 70Negative of an element, 21Negatively transitive, 26Neighbourhood, 82Non-degenerate critical point, 150Non-degenerate form, 70Non-satiated preference, 110Norm of a vector, 69, 78

Norm of a vector space, 80Normal hyperplane, 106Nowhere dense, 198Null set, 2Nullity of a quadratic form, 70

OOligarchy, 34, 38Open ball, 81Open cover, 93Open set, 82Optimum, 116Orthogonal vectors, 69

PPareto correspondence, 171Pareto set, 29, 171Pareto theorem, 203, 216Partial derivative, 139Partition, 7Peixoto-Smale theorem, 212Permutation, 11Phase portrait, 210Poincaré-Hopf Theorem, 214Positive definite form, 70Preference manipulation, 180Preference relation, 24Prefilter, 38Price adjustment process, 208Price equilibrium, 170, 225Price equilibrium existence, 126Price vector, 108, 112Producer optimisation, 111Product rule, 141Product topology, 84Production set, 114Profit function, 112Propositional calculus, 4Pseudo-concave function, 158

Qq-Majority, 36Quadratic form, 69Quasi-concave function, 100

RRank of a matrix, 51Rank of a transformation, 50Rationality, 25Real vector space, 40Reflexive relation, 24Regular economy, 183, 206Regular point, 192Regular value, 192Relation, 8

njschofi

Sticky Note

Local Nash Equilibrium (LNE), 238

njschofi

Sticky Note

Mean Voter Theorem, 245

njschofi

Sticky Note

Market arvitrage, 236

njschofi

Sticky Note

Quantal Response, 238

AU

TH

OR

’S P

RO

OF


260 Subject Index

139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

175

176

177

178

179

180

181

182

183

184

Relative topology, 6, 84Repellor for a vector field, 214Residual set, 83, 183Resource manipulation, 181Retract, 119Retraction, 119Rolle’s Theorem, 145, 147Rotations, 17, 74

SSaddle, 70Saddle point, 151Sard’s lemma, 198Scalar, 22Scalar product, 47, 77Separating hyperplane, 108Separation of convex sets, 107Set theory, 1–3, 6Shadow prices, 111Shauder’s fixed point theorem, 125Similar matrices, 58Singular matrix, 14Singular point, 196Singularity set of a function, 200Singularity theorem, 202Smooth function, 143Social utility function, 28Sonnenschein-Mantel-Debreu Theorem, 220Stratified manifold, 202Strict Pareto rule, 28, 34Strict partial order, 26, 33Strict preference relation, 24Strictly quasi-concave function, 156, 158Structural stability of a vector field, 219, 222Subgroup, 18Submanifold, 202Submanifold theorem, 202Submersion, 194Supremum of a function, 87Surjective function, 11

Symmetric matrix, 68Symmetric relation, 24

TTangent to a function, 138Taylor’s theorem, 148Thom transversality theorem, 201Topological space, 82Topology, 6Torus, 198Trace of a matrix, 65Transfer paradox, 181Transitive relation, 24Transversality, 200Triangle inequality, 79Truth table, 4Two party competition, 130Tychonoff’s theorem, 97

UUnion of sets, 2Universal quantifier, 7Universal set, 1Utility function, 25

VVector field, 211Vector space, 40Vector subspace, 40Venn diagram, 2

WWalras’ Law, 218Walras manifold, 203Walrasian equilibrium, 183Weak monotone function, 100Weak order, 26Weak Pareto rule, 28Weierstrass theorem, 95Welfare theorem, 177Whitney topology, 182, 236

njschofi

Sticky Note

Valence, 238

njschofi

Sticky Note

Type I extreme value distribution

njschofi

Sticky Note

Stochastic Choice, 238

AU

TH

OR

’S P

RO

OF


185

186

187

188

189

190

191

192

193

194

195

196

197

198

199

200

201

202

203

204

205

206

207

208

209

210

211

212

213

214

215

216

217

218

219

220

221

222

223

224

225

226

227

228

229

230

Author Index

AAliprantis, C., 133Aliprantis, D., 229Arrow, K. J., 32, 35, 38, 187, 225Aumann, R. J., 225

BBalasko, Y., 181, 187, 206, 227Banks, J. S., 236, 248Bergstrom, T., 128, 131, 132, 133Bikhchandani, S., 226Border, K., 132Brouwer, L. E. J., 118, 121, 132Browder, F. E., 125, 132Brown, D., 133Brown, R., 228Burkenshaw, O., 229

CCaballero, G., 239, 243, 244, 249Calvin, W. H., 223Chichilnisky, G., 236, 248Chillingsworth, D. R. J., 195, 227Claassen, C., 239, 243, 244, 249Condorcet, M. J. A. N., 226

DDebreu, G., 183, 203, 204, 206, 215, 220, 223,

227, 228Dierker, E., 217, 221, 227Dorussen, H., 248

EEnelow, M. J., 248Eldredge, N., 223

FFan, K., 125, 132, 225

GGale, D., 181, 187Gamble, A., 225Gleick, J., 228Golubitsky, M., 196, 227Goroff, D., 223Greenberg, J., 133Guesnerie, R., 187Guillemin, V., 196, 227

HHahn, F. H., 187Hayek, F. A., 225Heal, E. M., 132Hewitt, F., 226Hildenbrand, W., 108, 187Hinich, M. J., 243, 248Hirsch, M., 196, 227Hirschleifer, D., 226Hubbard, J. H., 228

KKauffman, S., 221Keenan, D., 218, 228Kepler, J., 222Keynes, J. M., 226Kirman, A. P., 38, 108, 187Knaster, B., 123, 132Konishi, H., 133Kramer, G. H., 234, 236, 248Kuhn, H. W., 115, 132Kuratowski, K., 123, 132

LLaffont, J.-J., 187Lange, O., 225Laplace, P. S., 222Levine, D., 239, 248Lin, T., 248Lorenz, E. N., 224, 229

N. Schofield, Mathematical Methods in Economics and Social Choice,Springer Texts in Business and Economics, DOI 10.1007/978-3-642-39818-6,© Springer-Verlag Berlin Heidelberg 2014

261

http://dx.doi.org/10.1007/978-3-642-39818-6

njschofi

Sticky Note

,233

njschofi

Sticky Note

, 233

njschofi

Sticky Note

, 233

AU

TH

OR

’S P

RO

OF


262 Author Index

231

232

233

234

235

236

237

238

239

240

241

242

243

244

245

246

247

248

249

250

251

252

253

254

255

256

257

258

259

260

261

262

263

264

265

266

267

268

269

270

271

272

273

274

275

276

MMantel, R., 220, 221, 228Mas-Colell, A., 220, 227Mazerkiewicz, S., 132McKelvey, R. D., 236, 237, 239, 248McLean, I., 226Michael, E., 123, 132Miller, G., 248Minsky, H., 226

NNakamura, K., 35, 36, 38, 127Nash, J. F., 128, 132Nenuefeind, W., 218, 228Neuefeind, W., 128, 131, 132, 133Newton, I., 222

OOzdemir, U., 239, 243, 244, 249

PPalfrey, T. R., 239, 248Peixoto, M., 203, 212, 219, 227, 228Peterson, I., 223Plott, C. R., 248Poincaré, H., 214, 222, 223Pontrjagin, L. S., 228Prabhakar, N., 133

RRader, T., 218, 228Riezman, R., 128, 131, 132, 133, 218, 228

Rothman, N. J., 229

SSaari, D., 229, 236, 248Safra, Z., 187Scarf, H., 189, 207, 209, 217, 227Schauder, J., 125, 132Schofield, N., 133, 217, 228, 236, 237, 239,

241, 243–245, 248, 249Schumpeter, J. A., 225Shafer, W., 133Skidelsky, R., 226Smale, S., 183, 187, 203, 212, 215, 219, 221,

227, 228Sondermann, D., 38Sonnenschein, H., 133, 220, 221, 228Strnad, J., 133

TThom, R., 201, 228Tucker, A. W., 115, 132

Vvon Mises, L., 225

WWelsh, I., 226West, B. H., 228

YYannelis, N., 133

njschofi

Sticky Note

, 236

Documents

1 Mathematical Methods in Economics and...1 Schofield Springer Texts in Business and Economics Mathematical Methods in Economics and Mathematical Methods in Social Choice Economics