Advanced Fixed Point Theory for Economics

Andrew McLennan

April 8, 2014

Preface

Over two decades ago now I wrote a rather long survey of the mathematical theory of fixed points entitled Selected Topics in the Theory of Fixed Points. It had no content that could not be found elsewhere in the mathematical literature, but nonetheless some economists found it useful. Almost as long ago, I began work on the project of turning it into a proper book, and finally that project is coming to fruition. Various events over the years have reinforced my belief that the mathematics presented here will continue to influence the development of theoretical economics, and have intensified my regret about not having completed it sooner.

There is a vast literature on this topic, which has influenced me in many ways, and which cannot be described in any useful way here. Even so, I should say something about how the present work stands in relation to three other books on fixed points. Fixed Point Theorems with Applications to Economics and Game Theory by Kim Border (1985) is a complement, not a substitute, explaining various forms of the fixed point principle such as the KKMS theorem and some of the many theorems of Ky Fan, along with the concrete details of how they are actually applied in economic theory. Fixed Point Theory by Dugundji and Granas (2003) is, even more than this book, a comprehensive treatment of the topic. Its fundamental point of view (applications to nonlinear functional analysis), audience (professional mathematicians), and technical base (there is extensive use of algebraic topology) are quite different, but it is still a work with much to offer to economics. Particularly notable is the extensive and meticulous information concerning the literature and history of the subject, which is full of affection for the theory and its creators. The book that was, by far, the most useful to me is The Lefschetz Fixed Point Theorem by Robert Brown (1971). Again, his approach and mine have differences rooted in the nature of our audiences and the overall objectives, but at their cores the two books are quite similar, in large part because I borrowed a great deal.

I would like to thank the many people who, over the years, have commented favorably on Selected Topics. It is a particular pleasure to acknowledge some very detailed and generous written comments by Klaus Ritzberger. This work would not have been possible without the support and affection of my families, both present and past, for which I am forever grateful.

Contents

1 Introduction and Summary
    1.1 The First Fixed Point Theorems
    1.2 “Fixing” Kakutani’s Theorem
    1.3 Essential Sets of Fixed Points
    1.4 Index and Degree
        1.4.1 Manifolds
        1.4.2 The Degree
        1.4.3 The Fixed Point Index
    1.5 Topological Consequences
    1.6 Dynamical Systems

I Topological Methods

2 Planes, Polyhedra, and Polytopes
    2.1 Affine Subspaces
    2.2 Convex Sets and Cones
    2.3 Polyhedra
    2.4 Polytopes
    2.5 Polyhedral Complexes
    2.6 Graphs

3 Computing Fixed Points
    3.1 The Lemke-Howson Algorithm
    3.2 Implementation and Degeneracy Resolution
    3.3 Using Games to Find Fixed Points
    3.4 Sperner’s Lemma
    3.5 The Scarf Algorithm
    3.6 Homotopy
    3.7 Remarks on Computation

4 Topologies on Spaces of Sets
    4.1 Topological Terminology
    4.2 Spaces of Closed and Compact Sets
    4.3 Vietoris’ Theorem
    4.4 Hausdorff Distance
    4.5 Basic Operations on Subsets
        4.5.1 Continuity of Union
        4.5.2 Continuity of Intersection
        4.5.3 Singletons
        4.5.4 Continuity of the Cartesian Product
        4.5.5 The Action of a Function
        4.5.6 The Union of the Elements

5 Topologies on Functions and Correspondences
    5.1 Upper and Lower Semicontinuity
    5.2 The Strong Upper Topology
    5.3 The Weak Upper Topology
    5.4 The Homotopy Principle
    5.5 Continuous Functions

6 Metric Space Theory
    6.1 Paracompactness
    6.2 Partitions of Unity
    6.3 Topological Vector Spaces
    6.4 Banach and Hilbert Spaces
    6.5 Embedding Theorems
    6.6 Dugundji’s Theorem

7 Retracts
    7.1 Kinoshita’s Example
    7.2 Retracts
    7.3 Euclidean Neighborhood Retracts
    7.4 Absolute Neighborhood Retracts
    7.5 Absolute Retracts
    7.6 Domination

8 Essential Sets of Fixed Points
    8.1 The Fan-Glicksberg Theorem
    8.2 Convex Valued Correspondences
    8.3 Kinoshita’s Theorem

9 Approximation of Correspondences
    9.1 The Approximation Result
    9.2 Extending from the Boundary of a Simplex
    9.3 Extending to All of a Simplicial Complex
    9.4 Completing the Argument

II Smooth Methods

10 Differentiable Manifolds
    10.1 Review of Multivariate Calculus
    10.2 Smooth Partitions of Unity
    10.3 Manifolds
    10.4 Smooth Maps
    10.5 Tangent Vectors and Derivatives
    10.6 Submanifolds
    10.7 Tubular Neighborhoods
    10.8 Manifolds with Boundary
    10.9 Classification of Compact 1-Manifolds

11 Sard’s Theorem
    11.1 Sets of Measure Zero
    11.2 A Weak Fubini Theorem
    11.3 Sard’s Theorem
    11.4 Measure Zero Subsets of Manifolds
    11.5 Genericity of Transversality

12 Degree Theory
    12.1 Orientation
    12.2 Induced Orientation
    12.3 The Degree
    12.4 Composition and Cartesian Product

13 The Fixed Point Index
    13.1 Axioms for an Index on a Single Space
    13.2 Multiple Spaces
    13.3 The Index for Euclidean Spaces
    13.4 Extension by Commutativity
    13.5 Extension by Continuity

III Applications and Extensions

14 Topological Consequences
    14.1 Euler, Lefschetz, and Eilenberg-Montgomery
    14.2 The Hopf Theorem
    14.3 More on Maps Between Spheres
    14.4 Invariance of Domain
    14.5 Essential Sets Revisited

15 Vector Fields and their Equilibria
    15.1 Euclidean Dynamical Systems
    15.2 Dynamics on a Manifold
    15.3 The Vector Field Index
    15.4 Dynamic Stability
    15.5 The Converse Lyapunov Problem
    15.6 A Necessary Condition for Stability

Chapter 1

Introduction and Summary

The Brouwer fixed point theorem states that if C is a nonempty compact convex subset of a Euclidean space and f : C → C is continuous, then f has a fixed point, which is to say that there is an x∗ ∈ C such that f(x∗) = x∗. The proof of this by Brouwer (1912) was one of the major events in the history of topology. Since then the study of such results, and the methods used to prove them, has flourished, undergoing radical transformations, becoming increasingly general and sophisticated, and extending its influence to diverse areas of mathematics.

Around 1950, most notably through the work of Nash (1950, 1951) on noncooperative games, and the work of Arrow and Debreu (1954) on general equilibrium theory, it emerged that in economists’ most basic and general models, equilibria are fixed points. The most obvious consequence of this is that fixed point theorems provide proofs that these models are not vacuous. But fixed point theory also informs our understanding of many other issues such as comparative statics, robustness under perturbations, stability of equilibria with respect to dynamic adjustment processes, and the algorithmics and complexity of equilibrium computation. In particular, since the mid-1970s the theory of games has been strongly influenced by refinement concepts defined largely in terms of robustness with respect to certain types of perturbations.

As the range and sophistication of economic modelling have increased, more advanced mathematical tools have become relevant. Unfortunately, the mathematical literature on fixed points is largely inaccessible to economists, because it relies heavily on homology. This subject is part of the standard graduate school curriculum for mathematicians, but for outsiders it is difficult to penetrate, due to its abstract nature and the amount of material that must be absorbed at the beginning before the structure, nature, and goals of the theory begin to come into view. Many researchers in economics learn advanced topics in mathematics as a side product of their research, but unlike infinite dimensional analysis or continuous time stochastic processes, algebraic topology will not gradually achieve popularity among economic theorists through slow diffusion. Consequently economists have been, in effect, shielded from some of the mathematics that is most relevant to their discipline.

This monograph presents an exposition of advanced material from the theory of fixed points that is, in several ways, suitable for graduate students and researchers in mathematical economics and related fields. In part the “fit” with the intended audience is a matter of coverage. Economic models always involve domains that are convex, or at least contractible, so there is little coverage here of topics that only become interesting when the underlying space is more complicated. For the settings of interest, the treatment is comprehensive and maximally general, with issues related to correspondences always in the foreground. The project was originally motivated by a desire to understand the existence proofs in the literature on refinements of Nash equilibrium as applications of preexisting mathematics, and the continuing influence of this will be evident.

The mathematical prerequisites are within the common background of advanced students and researchers in theoretical economics. Specifically, in addition to multivariate calculus and linear algebra, we assume that the reader is familiar with basic aspects of point-set topology. What we need from topics that may be less familiar to some (e.g., simplicial complexes, infinite dimensional linear spaces, the theory of retracts) will be explained in a self-contained manner. There will be no use of homological methods.

The avoidance of homology is a practical necessity, but it can also be seen as a feature rather than a bug. In general, mathematical understanding is enhanced when brute calculations are replaced by logical reasoning based on conceptually meaningful definitions. To say that homology is a calculational machine is a bit simplistic, but it does have that potential in certain contexts. Avoiding it commits us to work with notions that have more direct and intuitive geometric content. (Admittedly there is a slight loss of generality, because there are spaces that are acyclic, that is, homologically trivial, but not contractible. This is unimportant because such spaces are not found “in nature.”) Thus our treatment of fixed point theory can be seen as a mature exposition that presents the theory in a natural and logical manner.

In the remainder of this chapter we give a broad overview of the contents of the book. Unlike in many areas of mathematics, here it is possible to understand the statements of many of the main results with much less preparation than is required to understand the proofs. Needless to say, as usual, not bothering to study the proofs has many dangers. In addition, the material in this book is, of course, closely related to various topics in theoretical economics, and is in many ways quite useful preparation for further study and research.

1.1 The First Fixed Point Theorems

A fixed point of a function f : X → X is an element x∗ ∈ X such that f(x∗) = x∗. If X is a topological space, it is said to have the fixed point property if every continuous function from X to itself has a fixed point. The first and most famous result in our subject is Brouwer’s fixed point theorem:

Theorem 1.1.1 (Brouwer (1912)). If C ⊂ R^m is nonempty, compact, and convex, then it has the fixed point property.

Chapter 3 presents various proofs of this result. Although some are fairly brief, none of them can be described as truly elementary. In general, proofs of Brouwer’s theorem are closely related to algorithmic procedures for finding approximate fixed points. Chapter 3 discusses the best known general algorithm, due to Scarf, a new algorithm due to the author and Rabee Tourky, and homotopy methods, which are the most popular in practice but require differentiability. The last decade has seen major breakthroughs in computer science concerning the computational complexity of computing fixed points, with particular reference to (seemingly) simple games and general equilibrium models. These developments are sketched briefly in Section 3.7.
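In one dimension the connection between existence arguments and computation can be made concrete. The following is a minimal sketch, not one of the algorithms discussed in Chapter 3: it assumes a continuous f mapping [0, 1] into itself and applies bisection to f(x) − x. The name approximate_fixed_point and the choice f = cos are illustrative assumptions.

```python
import math

def approximate_fixed_point(f, lo=0.0, hi=1.0, tol=1e-10):
    """Bisection on f(x) - x for a continuous f mapping [lo, hi] into itself.

    Because f(lo) >= lo and f(hi) <= hi, the intermediate value theorem
    guarantees a fixed point inside the shrinking bracket [a, b].
    """
    a, b = lo, hi
    while b - a > tol:
        mid = 0.5 * (a + b)
        if f(mid) >= mid:
            a = mid  # invariant: f(a) >= a
        else:
            b = mid  # invariant: f(b) <= b
    return 0.5 * (a + b)

# cos maps [0, 1] into [cos(1), 1], which is contained in [0, 1].
x_star = approximate_fixed_point(math.cos)
```

Bisection depends on the order structure of the line, so this sketch does not extend directly to higher dimensions, where more elaborate procedures are needed.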

In economics and game theory fixed point theorems are most commonly used to prove that a model has at least one equilibrium, where an equilibrium is a vector of “endogenous” variables for the model with the property that each individual agent’s predicted behavior is rational, or “utility maximizing,” if that agent regards all the other endogenous variables as fixed. In economics it is natural, and in game theory unavoidable, to consider models in which an agent might have more than one rational choice. Our first generalization of Brouwer’s theorem addresses this concern.

If X and Y are sets, a correspondence F : X → Y is a function from X to the nonempty subsets of Y. (On the rare occasions when they arise, we use the term set valued mapping for a function from X to all the subsets of Y, including the empty set.) We will tend to regard a function as a special type of correspondence, both intuitively and in the technical sense that we will frequently blur the distinction between a function f : X → Y and the associated correspondence x ↦ {f(x)}.

If Y is a topological space, F is compact valued if, for all x ∈ X, F(x) is compact. Similarly, if Y is a subset of a vector space, then F is convex valued if each F(x) is convex.

The extension of Brouwer’s theorem to correspondences requires a notion of continuity for correspondences. If X and Y are topological spaces, a correspondence F : X → Y is upper semicontinuous if it is compact valued and, for each x_0 ∈ X and each neighborhood V ⊂ Y of F(x_0), there is a neighborhood U ⊂ X of x_0 such that F(x) ⊂ V for all x ∈ U. It turns out that if X and Y are metric spaces and Y is compact, then F is upper semicontinuous if and only if its graph

Gr(F) := { (x, y) ∈ X × Y : y ∈ F(x) }

is closed. (Proving this is a suitable exercise, if you are so inclined.) Thinking of upper semicontinuity as a matter of the graph being closed is quite natural, and in economics this condition is commonly taken as the definition, as in Debreu (1959). In Chapter 5 we will develop a topology on the space of nonempty compact subsets of Y such that F is upper semicontinuous if and only if it is a continuous function relative to this topology.

A fixed point of a correspondence F : X → X is a point x∗ ∈ X such that x∗ ∈ F(x∗). Kakutani (1941) was motivated to prove the following theorem by the desire to provide a simple approach to the von Neumann (1928) minimax theorem, which is a fundamental result of game theory. This is the fixed point theorem that is most commonly applied in economic analysis.

Theorem 1.1.2 (Kakutani’s Fixed Point Theorem). If C ⊂ R^m is nonempty, compact, and convex, and F : C → C is an upper semicontinuous convex valued correspondence, then F has a fixed point.
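To see how the hypotheses work in the simplest case, take C = [0, 1], so that each value F(x) is a closed interval. The sketch below is illustrative only: the correspondence F (which jumps at 1/3) and the helper bracket_fixed_point are assumptions introduced here, not constructions from the text. Bisection keeps a left endpoint a with F(a) entirely above a and a right endpoint b with F(b) entirely below b. At the limit point, a closed graph supplies one element of the value on each side, and convexity fills in the point itself.

```python
def F(x):
    """Hypothetical correspondence on [0, 1], returned as an interval (low, high).

    F(x) = {1} for x < 1/3, [0, 1] at x = 1/3, and {0} for x > 1/3.
    Its graph is closed and every value is a nonempty compact interval.
    """
    third = 1.0 / 3.0
    if x < third:
        return (1.0, 1.0)
    if x > third:
        return (0.0, 0.0)
    return (0.0, 1.0)

def bracket_fixed_point(F, tol=1e-9):
    """Return (a, b) with b - a <= tol bracketing a fixed point of F.

    Assumes F is upper semicontinuous with interval values in [0, 1].
    """
    a, b = 0.0, 1.0
    for endpoint in (a, b):
        low, high = F(endpoint)
        if low <= endpoint <= high:
            return endpoint, endpoint
    # From here on F(a) lies strictly above a and F(b) strictly below b.
    while b - a > tol:
        mid = 0.5 * (a + b)
        low, high = F(mid)
        if low <= mid <= high:
            return mid, mid  # mid itself is a fixed point
        if low > mid:
            a = mid          # F(mid) lies strictly above mid
        else:
            b = mid          # F(mid) lies strictly below mid
    return a, b

a, b = bracket_fixed_point(F)
```

If the value at 1/3 were replaced by the nonconvex set {0, 1}, the graph would still be closed but there would be no fixed point, which is one way to see why convexity of the values cannot simply be dropped.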

1.2 “Fixing” Kakutani’s Theorem

Mathematicians strive to craft theorems that maximize the strength of the conclusions while minimizing the strength of the assumptions. One reason for this is obvious: a stronger theorem is a more useful theorem. More important, however, is the desire to attain a proper understanding of the principle the theorem expresses, and to achieve an expression of this principle that is unencumbered by useless clutter. When a theorem that is “too weak” is proved using methods that “happen to work” there is a strong suspicion that attempts to improve the theorem will uncover important new concepts. In the case of Brouwer’s theorem the conclusion, that the space has the fixed point property, is a purely topological assertion. The assumption that the space is convex, and in Kakutani’s theorem the assumption that the correspondence’s values are convex, are geometric conditions that seem out of character and altogether too strong. Suitable generalizations were developed after World War II.

A homotopy is a continuous function h : X × [0, 1] → Y where X and Y are topological spaces. It is psychologically natural to think of the second variable in the domain as representing time, and we let h_t := h(·, t) : X → Y denote “the function at time t,” so that h is a process that continuously deforms a function h_0 into h_1. Another intuitive picture is that h is a continuous path in the space C(X, Y) of continuous functions from X to Y. As we will see in Chapter 5, this intuition can be made completely precise: when X and Y are metric spaces and X is compact, there is a topology on C(X, Y) such that a continuous path h : [0, 1] → C(X, Y) is the same thing as a homotopy.

We say that two functions f, g : X → Y are homotopic if there is a homotopy h with h_0 = f and h_1 = g. This is easily seen to be an equivalence relation: symmetry and reflexivity are obvious, and to establish transitivity we observe that if e is homotopic to f and f is homotopic to g, then there is a homotopy between e and g that follows a homotopy between e and f at twice its original speed, then follows a homotopy between f and g at double the pace. The equivalence classes are called homotopy classes.
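The transitivity argument can be written out directly. The sketch below is an illustration under stated assumptions: homotopies are represented as Python functions of (x, t), the target space is the real line (so that straight-line homotopies are available), and the names concatenate and straight_line, as well as the particular maps e, f, g, are hypothetical.

```python
def concatenate(h_first, h_second):
    """Run h_first on [0, 1/2] and h_second on [1/2, 1], each at double speed.

    The result is continuous at t = 1/2 provided h_first(., 1) == h_second(., 0).
    """
    def h(x, t):
        if t <= 0.5:
            return h_first(x, 2.0 * t)
        return h_second(x, 2.0 * t - 1.0)
    return h

def straight_line(f, g):
    """Straight-line homotopy from f to g (valid because the target is convex)."""
    return lambda x, t: (1.0 - t) * f(x) + t * g(x)

e = lambda x: 0.0
f = lambda x: x
g = lambda x: x * x

# A homotopy from e to g that passes through f at time 1/2.
h = concatenate(straight_line(e, f), straight_line(f, g))
```

Evaluating h at t = 0, 1/2, and 1 recovers e, f, and g respectively.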

A space X is contractible if the identity function Id_X is homotopic to a constant function. That is, there is a homotopy c : X × [0, 1] → X such that c_0 = Id_X and c_1(X) is a singleton; such a homotopy is called a contraction. Convex sets are contractible. More generally, a subset X of a vector space is star-shaped if there is x∗ ∈ X (the star) such that X contains the line segment

{ (1 − t)x + tx∗ : 0 ≤ t ≤ 1 }

between each x ∈ X and x∗. If X is star-shaped, there is a contraction

(x, t) ↦ (1 − t)x + tx∗.
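As a small illustration of a star-shaped set that is not convex, consider the “cross” in R^2 formed by the segments [−1, 1] × {0} and {0} × [−1, 1], with the origin as the star. This example and the names in_cross, STAR, and contract are assumptions made for the sketch, not objects from the text.

```python
STAR = (0.0, 0.0)

def in_cross(p, tol=1e-12):
    """Membership test for the union of [-1, 1] x {0} and {0} x [-1, 1]."""
    x, y = p
    on_horizontal = abs(y) <= tol and abs(x) <= 1.0
    on_vertical = abs(x) <= tol and abs(y) <= 1.0
    return on_horizontal or on_vertical

def contract(p, t):
    """The contraction (x, t) -> (1 - t) x + t x*, with x* = STAR."""
    return tuple((1.0 - t) * pi + t * si for pi, si in zip(p, STAR))

# The set is not convex: the midpoint of two of its points can leave it.
midpoint = (0.5, 0.5)  # midpoint of (1, 0) and (0, 1)
```

At t = 0 the map is the identity, at t = 1 every point is sent to the star, and intermediate points stay on the segment toward the star and hence inside the set.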


It seems natural to guess that a nonempty compact contractible space has the fixed point property. Whether this is the case was an open problem for several years, but it turns out to be false. In Chapter 7 we will see an example due to Kinoshita (1953) of a nonempty compact contractible subset of R^3 that does not have the fixed point property. Fixed point theory requires some additional ingredient.

If X is a topological space, a subset A ⊂ X is a retract if there is a continuous function r : X → A with r(a) = a for all a ∈ A. Here we tend to think of X as a “simple” space, and the hope is that although A might seem to be more complex, or perhaps “crumpled up,” it nonetheless inherits enough of the simplicity of X. A particularly important manifestation of this is that if r : X → A is a retraction and X has the fixed point property, then so does A, because if f : A → A is continuous, then so is f ◦ r : X → A ⊂ X, so f ◦ r has a fixed point, and this fixed point necessarily lies in A and is consequently a fixed point of f. Also, a retract of a contractible space is contractible because if c : X × [0, 1] → X is a contraction of X and r : X → A ⊂ X is a retraction, then

(a, t) 7→ r(c(a, t))

is a contraction of A.

A set A ⊂ R^m is a Euclidean neighborhood retract (ENR) if there is an open superset U ⊂ R^m of A and a retraction r : U → A. If X and Y are metric spaces, an embedding of X in Y is a function e : X → Y that is a homeomorphism between X and e(X). That is, e is a continuous injection[1] whose inverse is also continuous when e(X) has the subspace topology inherited from Y. An absolute neighborhood retract (ANR) is a separable[2] metric space X such that whenever Y is a separable metric space and e : X → Y is an embedding, there is an open superset U ⊂ Y of e(X) and a retraction r : U → e(X). This definition probably seems completely unexpected, and it’s difficult to get any feeling for it right away. In Chapter 7 we’ll see that ANR’s have a simple characterization, and that many of the types of spaces that come up most naturally are ANR’s, so this condition is quite a bit less demanding than one might guess at first sight. In particular, it will turn out that every ENR is an ANR, so that being an ENR is an “intrinsic” property insofar as it depends on the topology of the space and not on how the space is embedded in a Euclidean space.
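A standard concrete instance may help: the closed unit disk in R^2 is an ENR, with U = R^2 and the radial projection as the retraction. The sketch below, whose function names and sample map f are illustrative assumptions, also shows the composition f ◦ r used in the argument that a retract inherits the fixed point property.

```python
import math

def retract_to_disk(p):
    """Radial retraction of R^2 onto the closed unit disk: r(a) = a on the disk."""
    norm = math.hypot(p[0], p[1])
    if norm <= 1.0:
        return p
    return (p[0] / norm, p[1] / norm)

def f(p):
    """A hypothetical continuous self-map of the disk (rotate by 90 degrees, halve)."""
    x, y = p
    return (-0.5 * y, 0.5 * x)

def f_after_r(p):
    """f o r is defined on all of R^2 but takes values in the disk."""
    return f(retract_to_disk(p))

q = retract_to_disk((3.0, 4.0))
```

Because every value of f ◦ r lies in the disk, any fixed point of f ◦ r lies in the disk as well, where r is the identity, so it is a fixed point of f.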

An absolute retract (AR) is a separable metric space X such that whenever Y is a separable metric space and e : X → Y is an embedding, there is a retraction r : Y → e(X). In Chapter 7 we will prove that an ANR is an AR if and only if it is contractible.

Theorem 1.2.1. If C is a nonempty compact AR and F : C → C is an upper semicontinuous contractible valued correspondence, then F has a fixed point.

An important point is that the values of F are not required to be ANR’s.

[1] We will usually use the terms “injective” rather than “one-to-one,” “surjective” rather than “onto,” and “bijective” to indicate that a function is both injective and surjective. An injection is an injective function, a surjection is a surjective function, and a bijection is a bijective function.

[2] A metric space is separable if it has a countable dense subset.

For “practical purposes” this is the maximally general topological fixed point theorem, but for mathematicians there is an additional refinement. There is a concept called acyclicity that is defined in terms of the concepts of algebraic topology. A contractible set is necessarily acyclic, but there are acyclic spaces (including compact ones) that are not contractible. The famous Eilenberg-Montgomery fixed point theorem is:

Theorem 1.2.2 (Eilenberg and Montgomery (1946)). If C is a nonempty compact AR and F : C → C is an upper semicontinuous acyclic valued correspondence, then F has a fixed point.

1.3 Essential Sets of Fixed Points

It might seem like we have already reached a satisfactory and fitting resolution of “The Fixed Point Problem,” but actually (both in pure mathematics and in economics) this is just the beginning. You see, fixed points come in different flavors.

[Figure 1.1: graph of a function on [0, 1] with two marked fixed points, s and t.]

The figure above shows a function f : [0, 1] → [0, 1] with two fixed points, s and t. If we perturb the function slightly by adding a small positive constant, s “disappears” in the sense that the perturbed function does not have a fixed point anywhere near s, but a function close to f has a fixed point near t. More precisely, if X is a topological space and f : X → X is continuous, a fixed point x∗ of f is essential if, for any neighborhood U of x∗, there is a neighborhood V of the graph of f such that any continuous f ′ : X → X whose graph is contained in V has a fixed point in U. If a fixed point is not essential, then we say that it is inessential. These concepts were introduced by Fort (1950).
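The behavior described for s and t can be reproduced numerically. The sketch below uses a hypothetical cubic f, chosen here only because its graph touches the diagonal at s = 0.3 and crosses it at t = 0.8; it is not claimed to be the function drawn in the figure. The helper crosses_diagonal reports whether f(x) − x takes both nonpositive and nonnegative values on a grid, which by the intermediate value theorem implies a fixed point in the interval.

```python
S, T = 0.3, 0.8

def f(x):
    """Hypothetical map of [0, 1] into itself: tangent to the diagonal at S, crossing at T."""
    return x + (x - S) ** 2 * (T - x)

def perturbed(eps):
    """The perturbation of f obtained by adding the constant eps."""
    return lambda x: f(x) + eps

def crosses_diagonal(func, lo, hi, n=2000):
    """True if func(x) - x is <= 0 somewhere and >= 0 somewhere on a grid in [lo, hi].

    True implies a fixed point in [lo, hi]. False only says the grid found none;
    in this example f(x) - x >= 0 near S, so adding eps > 0 leaves a gap of at least eps.
    """
    xs = [lo + (hi - lo) * k / n for k in range(n + 1)]
    gaps = [func(x) - x for x in xs]
    return min(gaps) <= 0.0 <= max(gaps)

near_s = crosses_diagonal(perturbed(0.01), 0.2, 0.4)
near_t = crosses_diagonal(perturbed(0.01), 0.7, 0.9)
```

Here near_s comes out False and near_t comes out True: the fixed point at S does not survive the upward shift, while a fixed point persists near T.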

There need not be an essential fixed point. The function shown in Figure 1.2 has an interval of fixed points. If we shift the function down, there will be a fixed point near the lower endpoint of this interval, and if we shift the function up there will be a fixed point near the upper endpoint, so no single point of the interval is essential.

This example suggests that we might do better to work with sets of fixed points. A set S of fixed points of a function f : X → X is essential if it is closed, it has a neighborhood that contains no other fixed points, and for any neighborhood U of S, there is a neighborhood V of the graph of f such that any continuous f ′ : X → X whose graph is contained in V has a fixed point in U. The problem with this concept is that “large” connected sets are not of much use. For example, if X is compact and has the fixed point property, then the set of all fixed points of f is essential. It seems that we should really be interested in sets of fixed points that are either essential and connected[3] or essential and minimal in the sense of not having a proper subset that is also essential.

[Figure 1.2: graph of a function on [0, 1] with an interval of fixed points.]

In Chapter 8 we will show that any essential set of fixed points contains a minimal essential set, and that minimal essential sets are connected. The theory of refinements of Nash equilibrium (e.g., Selten (1975); Myerson (1978); Kreps and Wilson (1982); Kohlberg and Mertens (1986); Mertens (1989, 1991); Govindan and Wilson (2008)) has many concepts that amount to a weakening of the notion of essential set, insofar as the set is required to be robust with respect to only certain types of perturbations of the function or correspondence. In particular, Jiang (1963) pioneered the application of the concept to game theory, defining an essential Nash equilibrium and an essential set of Nash equilibria in terms of robustness with respect to perturbations of the best response correspondence induced by perturbations of the payoffs. The mathematical foundations of such concepts are treated in Section 8.3.

[3] We recall that a subset S of a topological space X is connected if there do not exist two disjoint open sets U_1 and U_2 with S ∩ U_1 ≠ ∅ ≠ S ∩ U_2 and S ⊂ U_1 ∪ U_2.

1.4 Index and Degree

There are different types of essential fixed points. Figure 1.3 shows a function with three fixed points. At two of them the function starts above the diagonal and goes below it as one goes from left to right, and at the third it is the other way around. For any k it is easy to imagine a function with k fixed points of the first type and k − 1 fixed points of the second type.

This phenomenon generalizes to higher dimensions. Let

D^m = { x ∈ R^m : ‖x‖ ≤ 1 } and S^{m−1} = { x ∈ R^m : ‖x‖ = 1 }

be the m-dimensional unit disk and the (m − 1)-dimensional unit sphere, and suppose that f : D^m → D^m is a C^∞ function. In the best behaved case each fixed point x∗ is in the interior D^m \ S^{m−1} of the disk and regular, which means that Id_{R^m} − Df(x∗) is nonsingular, where Df(x∗) : R^m → R^m is the derivative of f at x∗. We define the index of x∗ to be 1 if the determinant of Id_{R^m} − Df(x∗) is positive and −1 if this determinant is negative. We will see that there is always one more fixed point of index 1 than there are fixed points of index −1, which is to say that the sum of the indices is 1.
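For m = 1 the determinant is just the number 1 − f′(x∗), so the index can be computed by hand. The following sketch uses a hypothetical cubic map of D^1 = [−1, 1] into itself with three regular interior fixed points; the map and the function names are assumptions chosen for illustration.

```python
def f(x):
    """Hypothetical smooth map of [-1, 1] into itself with fixed points -0.5, 0, 0.5."""
    return 1.25 * x - x ** 3

def df(x):
    """Derivative of f."""
    return 1.25 - 3.0 * x ** 2

def index(x_star):
    """Sign of det(Id - Df(x*)), which for m = 1 is the sign of 1 - f'(x*)."""
    det = 1.0 - df(x_star)
    if det == 0.0:
        raise ValueError("fixed point is not regular")
    return 1 if det > 0 else -1

fixed_points = [-0.5, 0.0, 0.5]
indices = [index(x) for x in fixed_points]
```

The two outer fixed points, where the graph passes from above the diagonal to below it, have index 1, the middle one has index −1, and the indices sum to 1.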

What about fixed points on the boundary of the disk, or fixed points that aren’t regular, or nontrivial connected sets of fixed points? What about correspondences? What happens if the domain is a possibly infinite dimensional ANR? The most challenging and significant aspect of our work will be the development of an axiomatic theory of the index that is general enough to encompass all these possibilities. The work proceeds through several stages, and we describe them in some detail now.

[Figure 1.3: graph of a function on [0, 1] with three fixed points.]


1.4.1 Manifolds

First of all, it makes sense to expand our perspective a bit. An m-dimensional manifold is a topological space that resembles R^m in a neighborhood of each of its points. More precisely, for each p ∈ M there is an open U ⊂ R^m and an embedding ϕ : U → M whose image is open and contains p. Such a ϕ is a parameterization and its inverse is a coordinate chart. The most obvious examples are R^m itself and S^m. If, in addition, N is an n-dimensional manifold, then M × N is an (m + n)-dimensional manifold. Thus the torus S^1 × S^1 is a manifold, and this is just the most easily visualized member of a large class of examples. An open subset of an m-dimensional manifold is an m-dimensional manifold. A 0-dimensional manifold is just a set with the discrete topology. The empty set is a manifold of any dimension, including negative dimensions. Of course these special cases are trivial, but they come up in important contexts.

A collection {ϕ_i : U_i → M}_{i∈I} of parameterizations is an atlas if its images cover M. The composition ϕ_j^{-1} ◦ ϕ_i (with the obvious domain of definition) is called a transition function. If, for some 1 ≤ r ≤ ∞, all the transition functions are C^r functions, then the atlas is a C^r atlas. An m-dimensional C^r manifold is an m-dimensional manifold together with a C^r atlas. The basic concepts of differential and integral calculus extend to this setting, leading to a vast range of mathematics.

In our formalities we will always assume that $M$ is a subset of a Euclidean space $\mathbb{R}^k$ called the ambient space, and that the parameterizations $\varphi_i$ and the coordinate charts $\varphi_i^{-1}$ are $C^r$ functions. This is a bit unprincipled (for example, physicists see only the universe, and their discourse is more disciplined if it does not refer to some hypothetical ambient space), but this maneuver is justified by embedding theorems due to Whitney that show that it does not entail any serious loss of generality. The advantages for us are that this approach bypasses certain technical pathologies while allowing for simplified definitions, and in many settings the ambient space will prove quite handy. For example, a function $f : M \to N$ (where $N$ is now contained in some $\mathbb{R}^\ell$) is $C^r$ for our purposes if it is $C^r$ in the standard sense: for any $S \subset \mathbb{R}^k$ a function $h : S \to \mathbb{R}^\ell$ is $C^r$, by definition, if there is an open $W \subset \mathbb{R}^k$ containing $S$ and a $C^r$ function $H : W \to \mathbb{R}^\ell$ such that $h = H|_S$.

Having an ambient space around makes it relatively easy to establish the basic objects and facts of differential calculus. Suppose that $\varphi_i : U_i \to M$ is a $C^r$ parameterization. If $x \in U_i$ and $\varphi_i(x) = p$, the tangent space of $M$ at $p$, which we denote by $T_pM$, is the image of $D\varphi_i(x)$. This is an $m$-dimensional linear subspace of $\mathbb{R}^k$. If $f : M \to N$ is $C^r$, the derivative

\[ Df(p) : T_pM \to T_{f(p)}N \]

of $f$ at $p$ is the restriction to $T_pM$ of the derivative $DF(p)$ of any $C^r$ function $F : W \to \mathbb{R}^\ell$ defined on an open $W \subset \mathbb{R}^k$ containing $M$ whose restriction to $M$ is $f$. (In Chapter 10 we will show that the choice of $F$ doesn't matter.) The chain rule holds: if, in addition, $P$ is a $p$-dimensional $C^r$ manifold and $g : N \to P$ is a $C^r$ function, then $g \circ f$ is $C^r$ and

\[ D(g \circ f)(p) = Dg(f(p)) \circ Df(p) : T_pM \to T_{g(f(p))}P. \]
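To make the definition of the tangent space concrete, the following sketch computes $T_pS^2$ as the image of the derivative of a parameterization of the sphere in the ambient space $\mathbb{R}^3$. The angle parameterization `phi`, the base point, and the finite-difference step are assumptions of the example; the check uses the standard fact that tangent vectors to the unit sphere at $p$ are orthogonal to $p$.

```python
import math

def phi(u, v):
    """Hypothetical parameterization of an open piece of S^2 by angles (u, v)."""
    return (math.cos(u) * math.cos(v), math.sin(u) * math.cos(v), math.sin(v))

def d_phi(u, v, h=1e-6):
    """Columns of the 3x2 matrix D phi(u, v), approximated by central differences."""
    col_u = tuple((a - b) / (2 * h) for a, b in zip(phi(u + h, v), phi(u - h, v)))
    col_v = tuple((a - b) / (2 * h) for a, b in zip(phi(u, v + h), phi(u, v - h)))
    return col_u, col_v

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

u0, v0 = 0.7, 0.3
p = phi(u0, v0)
col_u, col_v = d_phi(u0, v0)

# T_p S^2 is the image of D phi(u0, v0), i.e. the span of the two columns.
normal_components = (dot(col_u, p), dot(col_v, p))  # both approximately zero
n = cross(col_u, col_v)
spanning_area = math.sqrt(dot(n, n))  # nonzero, so the span is 2-dimensional
```

The two columns are orthogonal to $p$ and linearly independent, so their span is a $2$-dimensional linear subspace of $\mathbb{R}^3$, as the definition requires.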


The inverse and implicit function theorems have important generalizations. The point $p$ is a regular point of $f$ if the image of $Df(p)$ is all of $T_{f(p)}N$. We say that $f : M \to N$ is a $C^r$ diffeomorphism if $m = n$, $f$ is a bijection, and both $f$ and $f^{-1}$ are $C^r$. The generalized inverse function theorem asserts that if $m = n$, $f : M \to N$ is $C^r$, and $p$ is a regular point of $f$, then there is an open $U \subset M$ containing $p$ such that $f(U)$ is an open subset of $N$ and $f|_U : U \to f(U)$ is a $C^r$ diffeomorphism. If $0 \le s \le m$, a set $S \subset \mathbb{R}^k$ is an $s$-dimensional $C^r$ submanifold of $M$ if it is an $s$-dimensional $C^r$ manifold that happens to be contained in $M$. We say that $q \in N$ is a regular value of $f$ if every $p \in f^{-1}(q)$ is a regular point. The generalized implicit function theorem, which is known as the regular value theorem, asserts that if $q$ is a regular value of $f$, then $f^{-1}(q)$ is an $(m-n)$-dimensional $C^r$ submanifold of $M$.

1.4.2 The Degree

The degree is closely related to the fixed point index, but it has its own theory, which has independent interest and significance. The approach we take here is to work with the degree up to the point where its theory is more or less complete, then translate what we have learned into the language of the fixed point index.

We now need to introduce the concept of orientation. Two ordered bases $v_1, \dots, v_m$ and $w_1, \dots, w_m$ of an $m$-dimensional vector space have the same orientation if the determinant of the linear transformation taking each $v_i$ to $w_i$ is positive. It is easy to see that this is an equivalence relation with two equivalence classes. An oriented vector space is a finite dimensional vector space with a designated orientation whose elements are said to be positively oriented. If $V$ and $W$ are $m$-dimensional oriented vector spaces, a nonsingular linear transformation $L : V \to W$ is orientation preserving if it maps positively oriented ordered bases of $V$ to positively oriented ordered bases of $W$, and otherwise it is orientation reversing. For an intuitive appreciation of this concept just look in a mirror: the linear map taking each point in the actual world to its position as seen in the mirror is orientation reversing, with right shoes turning into left shoes and such.

In our discussion of degree theory nothing is lost by working with $C^\infty$ objects rather than $C^r$ objects for general $r$, and smooth will be a synonym for $C^\infty$. An orientation for a smooth manifold $M$ is a "continuous" specification of an orientation of each of the tangent spaces $T_pM$. We say that $M$ is orientable if it has an orientation; the most famous examples of unorientable manifolds are the Möbius strip and the Klein bottle. (From a mathematical point of view 2-dimensional projective space is perhaps more fundamental, but it is difficult to visualize.) An oriented manifold is a manifold together with a designated orientation.

If $M$ and $N$ are oriented smooth manifolds of the same dimension, $f : M \to N$ is a smooth map, and $p$ is a regular point of $f$, we say that $f$ is orientation preserving at $p$ if $Df(p) : T_pM \to T_{f(p)}N$ is orientation preserving, and otherwise $f$ is orientation reversing at $p$. If $q$ is a regular value of $f$ and $f^{-1}(q)$ is finite, then the degree of $f$ over $q$, denoted by $\deg^\infty_q(f)$, is the number of points in $f^{-1}(q)$ at which $f$ is orientation preserving minus the number of points in $f^{-1}(q)$ at which $f$ is orientation reversing.


We need to extend the degree to situations in which the target point $q$ is not a regular value of $f$, and to functions that are merely continuous. Instead of being able to define the degree directly, as we did above, we will need to proceed indirectly, showing that the generalized degree is determined by certain of its properties, which we treat as axioms.

The first step is to extend the concept, giving it a "local" character. For a compact $C \subset M$ let $\partial C = C \cap \overline{M \setminus C}$ be the topological boundary of $C$, and let $\operatorname{int} C = C \setminus \partial C$ be its interior. A smooth function $f : C \to N$ with compact domain $C \subset M$ is said to be smoothly degree admissible over $q \in N$ if $f^{-1}(q) \cap \partial C = \emptyset$ and $q$ is a regular value of $f$. As above, for such a pair $(f, q)$ we define $\deg^\infty_q(f)$ to be the number of $p \in f^{-1}(q)$ at which $f$ is orientation preserving minus the number of $p \in f^{-1}(q)$ at which $f$ is orientation reversing. Note that $\deg^\infty_q(f) = \deg^\infty_q(f|_{C'})$ whenever $C'$ is a compact subset of $C$ and $f^{-1}(q)$ has an empty intersection with the closure of $C \setminus C'$. Also, if $C = C_1 \cup C_2$ where $C_1$ and $C_2$ are compact and disjoint, then

\[ \deg^\infty_q(f) = \deg^\infty_q(f|_{C_1}) + \deg^\infty_q(f|_{C_2}). \]
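The definition and the two restriction properties can be checked on a small example. The sketch below assumes $M = N = \mathbb{R}^2$ with the standard orientation, takes $f$ to be the complex squaring map on the closed disk $C$ of radius $2$, and uses $q = (1, 0)$; the map, the sets $C_1$ and $C_2$, and all names are choices made for illustration. Since $C_1 \cup C_2$ is a proper subset of $C$, the final comparison combines the restriction property with the additivity formula.

```python
def f(x, y):
    """Hypothetical example: the squaring map z -> z^2 in real coordinates."""
    return (x * x - y * y, 2.0 * x * y)

def jacobian_det(x, y):
    """Determinant of Df(x, y) = [[2x, -2y], [2y, 2x]]."""
    return 4.0 * (x * x + y * y)

def smooth_degree(preimages):
    """Orientation preserving preimages minus orientation reversing ones."""
    total = 0
    for (x, y) in preimages:
        d = jacobian_det(x, y)
        if d == 0.0:
            raise ValueError("q is not a regular value")
        total += 1 if d > 0 else -1
    return total

q = (1.0, 0.0)
# f^{-1}(q) within C, the closed disk of radius 2; neither point lies on the boundary.
preimages = [(1.0, 0.0), (-1.0, 0.0)]

# C1 and C2: disjoint closed disks of radius 1/2 around the two preimages.
in_C1 = [p for p in preimages if (p[0] - 1.0) ** 2 + p[1] ** 2 <= 0.25]
in_C2 = [p for p in preimages if (p[0] + 1.0) ** 2 + p[1] ** 2 <= 0.25]

deg_C = smooth_degree(preimages)
deg_C1 = smooth_degree(in_C1)
deg_C2 = smooth_degree(in_C2)
```

Here $f$ is orientation preserving at both preimages, so the degree over $q$ is $2$, and it splits as $1 + 1$ over the two small disks.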

From the point of view of topology, what makes the degree important is its invariance under homotopy. If $C \subset M$ is compact, a smooth homotopy $h : C \times [0, 1] \to N$ is smoothly degree admissible over $q$ if $h^{-1}(q) \cap (\partial C \times [0, 1]) = \emptyset$ and $q$ is a regular value of $h_0$ and $h_1$. In this circumstance

\[ \deg^\infty_q(h_0) = \deg^\infty_q(h_1). \qquad (*) \]

Figure 1.4 illustrates the intuitive character of the proof.

Figure 1.4 (slices $t = 0$ and $t = 1$, with the points of $h^{-1}(q)$ on them labeled $+1$ or $-1$)
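Equation $(*)$ can be observed numerically in one dimension. The sketch below assumes $M = N = \mathbb{R}$, $C = [-2, 2]$, $q = 0$, and the homotopy $h_t(x) = x^3 - (3t - 1)x$, all of which are choices made for this illustration. On an interval the degree over a regular value is the number of upward crossings minus the number of downward crossings, which is what the helper counts; it assumes that no grid point is an exact zero.

```python
def h(x, t):
    """Hypothetical smooth homotopy h_t(x) = x^3 - (3t - 1) x on C = [-2, 2]."""
    return x ** 3 - (3.0 * t - 1.0) * x

def degree_and_zero_count(g, a=-2.0, b=2.0, n=4001):
    """Return (degree over 0, number of sign changes) of g on [a, b].

    Assumes 0 is a regular value of g and that no grid point is an exact zero.
    An upward crossing is a zero where g is orientation preserving, and a
    downward crossing is a zero where g is orientation reversing.
    """
    net, crossings = 0, 0
    prev = g(a)
    for i in range(1, n + 1):
        cur = g(a + (b - a) * i / n)
        if prev < 0 < cur:
            net += 1
            crossings += 1
        elif prev > 0 > cur:
            net -= 1
            crossings += 1
        prev = cur
    return net, crossings

# Admissibility: h_t has no zeros on the boundary {-2, 2} for t in [0, 1].
boundary_ok = all(h(-2.0, k / 10) < 0 < h(2.0, k / 10) for k in range(11))

deg0, zeros0 = degree_and_zero_count(lambda x: h(x, 0.0))
deg1, zeros1 = degree_and_zero_count(lambda x: h(x, 1.0))
```

At $t = 0$ there is a single zero and at $t = 1$ there are three, but the two extra zeros carry opposite signs, so the degree is $1$ at both ends.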


The notion of an $m$-dimensional manifold with boundary is a generalization of the manifold concept in which each point in the space has a neighborhood that is homeomorphic to an open subset of the closed half space $\{ x \in \mathbb{R}^m : x_1 \ge 0 \}$. Aside from the half space itself, the closed disk $D^m = \{ x \in \mathbb{R}^m : \|x\| \le 1 \}$ is perhaps the most obvious example, but for us the most important example is $M \times [0, 1]$ where $M$ is an $(m-1)$-dimensional manifold without boundary. Note that any $m$-dimensional manifold without boundary is (automatically and trivially) a manifold with boundary. All elements of our discussion of manifolds generalize to this setting.

In particular, the generalization of the regular value theorem states that if $M$ is an $m$-dimensional smooth manifold with boundary, $N$ is an $n$-dimensional (boundaryless) manifold, $f : M \to N$ is smooth, and $q \in N$ is a regular value of both $f$ and the restriction of $f$ to the boundary of $M$, then $f^{-1}(q)$ is an $(m-n)$-dimensional manifold with boundary, its boundary is its intersection with the boundary of $M$, and at each point in this intersection the tangent space of $f^{-1}(q)$ is not contained in the tangent space of the boundary of $M$. In particular, if the dimension of $M$ is the dimension of $N$ plus one, then $f^{-1}(q)$ is a 1-dimensional manifold with boundary. If, in addition, $f^{-1}(q)$ is compact, then it has finitely many connected components.

Suppose now that $h : C \times [0, 1] \to N$ is smoothly degree admissible over $q$, and that $q$ is a regular value of $h$. The consequences of applying the regular value theorem to the restriction of $h$ to $\operatorname{int} C \times [0, 1]$ are as shown in Figure 1.4: $h^{-1}(q)$ is a 1-dimensional manifold with boundary, its boundary is its intersection with $C \times \{0, 1\}$, and $h^{-1}(q)$ is not tangent to $C \times \{0, 1\}$ at any point in this intersection. In addition $h^{-1}(q)$ is compact, so it has finitely many connected components, each of which is compact. A connected compact 1-dimensional manifold with boundary is either a circle or a line segment. (It will turn out that this obvious fact is surprisingly difficult to prove!) Thus each component of $h^{-1}(q)$ is either a circle or a line segment connecting two points in its boundary. If a line segment connects two points in $C \times \{0\}$, say $(p, 0)$ and $(p', 0)$, then it turns out that $h_0$ is orientation preserving at $p$ if and only if it is orientation reversing at $p'$. Similarly, if a line segment connects two points $(p, 1)$ and $(p', 1)$ in $C \times \{1\}$, then $h_1$ is orientation preserving at $p$ if and only if it is orientation reversing at $p'$. On the other hand, if a line segment connects a point $(p_0, 0)$ in $C \times \{0\}$ to a point $(p_1, 1)$ in $C \times \{1\}$, then $h_0$ is orientation preserving at $p_0$ if and only if $h_1$ is orientation preserving at $p_1$. Equation $(*)$ is obtained by summing these facts over the various components of $h^{-1}(q)$.

This completes our discussion of the proof of $(*)$ except for one detail: if $h : C \times [0, 1] \to N$ is a smooth homotopy that is smoothly degree admissible over $q$, $q$ is not necessarily a regular value of $h$. Nevertheless, Sard's theorem (which is the subject of Chapter 11, and a crucial ingredient of our entire approach) implies that $h$ has regular values in any neighborhood of $q$, and it is also the case that $\deg^\infty_{q'}(h_0) = \deg^\infty_q(h_0)$ and $\deg^\infty_{q'}(h_1) = \deg^\infty_q(h_1)$ when $q'$ is sufficiently close to $q$.

It turns out that the smooth degree is completely characterized by the properties we have seen. That is, if $\mathcal{D}^\infty(M, N)$ is the set of pairs $(f, q)$ in which $f : C \to N$ is smoothly degree admissible over $q$, then $(f, q) \mapsto \deg^\infty_q(f)$ is the unique function from $\mathcal{D}^\infty(M, N)$ to $\mathbb{Z}$ satisfying:

(Δ1) $\deg^\infty_q(f) = 1$ for all $(f, q) \in \mathcal{D}^\infty(M, N)$ such that $f^{-1}(q)$ is a singleton $\{p\}$ and $f$ is orientation preserving at $p$.

(Δ2) $\deg^\infty_q(f) = \sum_{i=1}^r \deg^\infty_q(f|_{C_i})$ whenever $(f, q) \in \mathcal{D}^\infty(M, N)$, the domain of $f$ is $C$, and $C_1, \dots, C_r$ are pairwise disjoint compact subsets of $C$ such that

\[ f^{-1}(q) \subset \operatorname{int} C_1 \cup \dots \cup \operatorname{int} C_r. \]

(Δ3) $\deg^\infty_q(h_0) = \deg^\infty_q(h_1)$ whenever $C \subset M$ is compact and the homotopy $h : C \times [0, 1] \to N$ is smoothly degree admissible over $q$.

We note two additional properties of the smooth degree. The first is that if, in addition to $M$ and $N$, $M'$ and $N'$ are $m'$-dimensional smooth manifolds, $(f, q) \in \mathcal{D}^\infty(M, N)$, and $(f', q') \in \mathcal{D}^\infty(M', N')$, then

\[ (f \times f', (q, q')) \in \mathcal{D}^\infty(M \times M', N \times N') \quad \text{and} \quad \deg^\infty_{(q, q')}(f \times f') = \deg^\infty_q(f) \cdot \deg^\infty_{q'}(f'). \]

Since $(f \times f')^{-1}(q, q') = f^{-1}(q) \times f'^{-1}(q')$, this boils down to a consequence of elementary facts about determinants: if $(p, p') \in (f \times f')^{-1}(q, q')$, then $f \times f'$ is orientation preserving at $(p, p')$ if and only if $f$ and $f'$ are either both orientation preserving or both orientation reversing at $p$ and $p'$ respectively.
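The determinant fact can be written out as follows. This is a sketch that assumes bases of the tangent spaces adapted to the product structure, with $M \times M'$ and $N \times N'$ given the product orientations obtained by concatenating positively oriented bases; these conventions are assumptions of the illustration.

```latex
\[
D(f \times f')(p, p') =
\begin{pmatrix} Df(p) & 0 \\ 0 & Df'(p') \end{pmatrix},
\qquad
\det D(f \times f')(p, p') = \det Df(p) \cdot \det Df'(p').
\]
\[
\sum_{(p, p') \in f^{-1}(q) \times f'^{-1}(q')}
\operatorname{sign} \det Df(p) \cdot \operatorname{sign} \det Df'(p')
= \Bigl( \sum_{p \in f^{-1}(q)} \operatorname{sign} \det Df(p) \Bigr)
\Bigl( \sum_{p' \in f'^{-1}(q')} \operatorname{sign} \det Df'(p') \Bigr).
\]
```

The product of two determinants is positive exactly when they have the same sign, and the second line is the distributive law applied to the two finite sums of signs.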

The second property is a strong form of continuity. A continuous function $f : C \to N$ with compact domain $C \subset M$ is degree admissible over $q \in N$ if $f^{-1}(q) \cap \partial C = \emptyset$. If this is the case, then there is a neighborhood $U \subset C \times N$ of the graph of $f$ and a neighborhood $V \subset N \setminus f(\partial C)$ of $q$ such that

\[ \deg^\infty_{q'}(f') = \deg^\infty_{q''}(f'') \]

whenever $f', f'' : C \to N$ are smooth functions whose graphs are contained in $U$, $q', q'' \in V$, $q'$ is a regular value of $f'$, and $q''$ is a regular value of $f''$.

We can now define $\deg_q(f)$ to be the common value of $\deg^\infty_{q'}(f')$ for such pairs $(f', q')$. Let $\mathcal{D}(M, N)$ be the set of pairs $(f, q)$ in which $f : C \to N$ is a continuous function with compact domain $C \subset M$ that is degree admissible over $q \in N$. The fully general form of degree theory asserts that $(f, q) \mapsto \deg_q(f)$ is the unique function from $\mathcal{D}(M, N)$ to $\mathbb{Z}$ such that:

(D1) $\deg_q(f) = 1$ for all $(f, q) \in \mathcal{D}(M, N)$ such that $f$ is smooth, $f^{-1}(q)$ is a singleton $\{p\}$, and $f$ is orientation preserving at $p$.

(D2) $\deg_q(f) = \sum_{i=1}^r \deg_q(f|_{C_i})$ whenever $(f, q) \in \mathcal{D}(M, N)$, the domain of $f$ is $C$, and $C_1, \dots, C_r$ are pairwise disjoint compact subsets of $C$ such that

\[ f^{-1}(q) \subset (C_1 \cup \dots \cup C_r) \setminus (\partial C_1 \cup \dots \cup \partial C_r). \]

(D3) If $(f, q) \in \mathcal{D}(M, N)$ and $C$ is the domain of $f$, then there is a neighborhood $U \subset C \times N$ of the graph of $f$ and a neighborhood $V \subset N \setminus f(\partial C)$ of $q$ such that

\[ \deg_{q'}(f') = \deg_{q''}(f'') \]

whenever $f', f'' : C \to N$ are continuous functions whose graphs are contained in $U$ and $q', q'' \in V$.


1.4.3 The Fixed Point Index

Although the degree can be applied to continuous functions, and even to convex valued correspondences, it is restricted to finite dimensional manifolds. For such spaces the fixed point index is merely a reformulation of the degree. Its application to general equilibrium theory was initiated by Dierker (1972), and it figures in the analysis of the Lemke-Howson algorithm of Shapley (1974). There is also a third variant of the underlying principle, for vector fields, that is developed in Chapter 15, and which is related to the theory of dynamical systems. Hofbauer (1990) applied the vector field index to dynamic issues in evolutionary stability, and Ritzberger (1994) applies it systematically to normal form game theory.

However, it turns out that the fixed point index can be generalized much further, due to the fact that, when we are discussing fixed points, the domain and the range are the same. The general index is developed in three main stages. In order to encompass these stages in a single system of terminology and notation we take a rather abstract approach. Fix a metric space $X$. An index admissible correspondence for $X$ is an upper semicontinuous correspondence $F : C \to X$, where $C \subset X$ is compact, that has no fixed points in $\partial C$. An index base for $X$ is a set $\mathcal{I}$ of index admissible correspondences such that:

(a) $f \in \mathcal{I}$ whenever $C \subset X$ is compact and $f : C \to X$ is an index admissible continuous function;

(b) $F|_D \in \mathcal{I}$ whenever $F : C \to X$ is an element of $\mathcal{I}$, $D \subset C$ is compact, and $F|_D$ is index admissible.

Definition 1.4.1. Let $\mathcal{I}$ be an index base for $X$. An index for $\mathcal{I}$ is a function $\Lambda_X : \mathcal{I} \to \mathbb{Z}$ satisfying:

(I1) (Normalization) If $c : C \to X$ is a constant function whose value is an element of $\operatorname{int} C$, then $\Lambda_X(c) = 1$.

(I2) (Additivity) If $F : C \to X$ is an element of $\mathcal{I}$, $C_1, \dots, C_r$ are pairwise disjoint compact subsets of $C$, and $FP(F) \subset \operatorname{int} C_1 \cup \dots \cup \operatorname{int} C_r$, then

\[ \Lambda_X(F) = \sum_i \Lambda_X(F|_{C_i}). \]

(I3) (Continuity) For each element $F : C \to X$ of $\mathcal{I}$ there is a neighborhood $U \subset C \times X$ of the graph of $F$ such that $\Lambda_X(\hat{F}) = \Lambda_X(F)$ for every $\hat{F} \in \mathcal{I}$ whose graph is contained in $U$.

For each $m = 0, 1, 2, \dots$ an index base for $\mathbb{R}^m$ is given by letting $\mathcal{I}^m$ be the set of index admissible continuous functions $f : C \to \mathbb{R}^m$. Of course (I1)-(I3) parallel (D1)-(D3), and it is not hard to show that there is a unique index $\Lambda_{\mathbb{R}^m}$ for $\mathcal{I}^m$ given by

\[ \Lambda_{\mathbb{R}^m}(f) = \deg_0(\mathrm{Id}_C - f). \]
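For a linear map $f(x) = Ax$ on a compact $C \subset \mathbb{R}^2$ containing the origin in its interior, with $\mathrm{Id} - A$ nonsingular, the only fixed point is $0$ and the formula reduces to the sign of $\det(\mathrm{Id} - A)$. The sketch below evaluates this for a few matrices; the matrices and helper names are assumptions chosen for illustration.

```python
def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def index_at_origin(A):
    """Sign of det(Id - A): the index of the fixed point 0 of x -> A x,
    assuming Id - A is nonsingular."""
    m = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(2)] for i in range(2)]
    d = det2(m)
    if d == 0.0:
        raise ValueError("0 is not a regular fixed point")
    return 1 if d > 0 else -1

contraction = [[0.5, 0.0], [0.0, 0.5]]   # Id - A = diag(0.5, 0.5)
saddle = [[2.0, 0.0], [0.0, 0.5]]        # Id - A = diag(-1, 0.5)
expansion = [[2.0, 0.0], [0.0, 2.0]]     # Id - A = diag(-1, -1)
rotation = [[0.0, -1.0], [1.0, 0.0]]     # quarter turn; det(Id - A) = 2

indices = [index_at_origin(A) for A in (contraction, saddle, expansion, rotation)]
```

Note that in two dimensions the expansion has index $1$, like the contraction, because the two negative eigenvalues of $\mathrm{Id} - A$ multiply to a positive determinant.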

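We now extend our framework to encompass multiple spaces. An index scope $\mathcal{S}$ consists of a class of metric spaces $\mathcal{S}_{\mathcal{S}}$ and an index base $\mathcal{I}_{\mathcal{S}}(X)$ for each $X \in \mathcal{S}_{\mathcal{S}}$ such that

(a) $\mathcal{S}_{\mathcal{S}}$ contains $X \times X'$ whenever $X, X' \in \mathcal{S}_{\mathcal{S}}$;

(b) $F \times F' \in \mathcal{I}_{\mathcal{S}}(X \times X')$ whenever $X, X' \in \mathcal{S}_{\mathcal{S}}$, $F \in \mathcal{I}_{\mathcal{S}}(X)$, and $F' \in \mathcal{I}_{\mathcal{S}}(X')$.

These conditions are imposed in order to express a property of the index that is inherited from the multiplicative property of the degree for cartesian products.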

The index also has an additional property that has no analogue in degree theory. Suppose that $C \subset \mathbb{R}^m$ and $\hat{C} \subset \mathbb{R}^{\hat{m}}$ are compact, $g : C \to \hat{C}$ and $\hat{g} : \hat{C} \to C$ are continuous, and $\hat{g} \circ g$ and $g \circ \hat{g}$ are index admissible. Then

\[ \Lambda_{\mathbb{R}^m}(\hat{g} \circ g) = \Lambda_{\mathbb{R}^{\hat{m}}}(g \circ \hat{g}). \]

When $g$ and $\hat{g}$ are smooth and the fixed points in question are regular, this boils down to a highly nontrivial fact of linear algebra (Proposition 13.3.2) that was unknown prior to the development of this aspect of index theory.
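At a regular fixed point the two indices are the signs of determinants of matrices of different sizes. The sketch below checks, on one hypothetical pair of integer matrices standing in for the two derivatives, the classical identity that these determinants agree. It is only a consistency check of the displayed equation in the linear case and makes no claim about the content of Proposition 13.3.2.

```python
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def id_minus(P):
    n = len(P)
    return [[(1 if i == j else 0) - P[i][j] for j in range(n)] for i in range(n)]

def det(P):
    """Determinant by cofactor expansion along the first row (fine for tiny matrices)."""
    if len(P) == 1:
        return P[0][0]
    total = 0
    for j in range(len(P)):
        minor = [row[:j] + row[j + 1:] for row in P[1:]]
        total += (-1) ** j * P[0][j] * det(minor)
    return total

# Hypothetical derivatives: G stands in for Dg (R^2 -> R^3),
# G_hat stands in for the derivative of g-hat (R^3 -> R^2).
G = [[1, 2], [0, 1], [3, -1]]
G_hat = [[2, 0, 1], [1, 1, 0]]

det_small = det(id_minus(matmul(G_hat, G)))  # 2x2 determinant for g-hat composed with g
det_large = det(id_minus(matmul(G, G_hat)))  # 3x3 determinant for g composed with g-hat
```

Both determinants equal $5$ here, so the two compositions have the same index at the corresponding fixed points.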

This property turns out to be the key to moving the index up to a much higher level of generality, but before we can explain this we need to extend the setup a bit, allowing for the possibility that the images of $g$ and $\hat{g}$ are not contained in $\hat{C}$ and $C$, but that there are compact sets $D \subset C$ and $\hat{D} \subset \hat{C}$ with $g(D) \subset \hat{C}$ and $\hat{g}(\hat{D}) \subset C$ that contain the relevant sets of fixed points.

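Definition 1.4.2. A commutativity configuration is a tuple

\[ (X, C, D, g, \hat{X}, \hat{C}, \hat{D}, \hat{g}) \]

where $X$ and $\hat{X}$ are metric spaces and:

(a) $D \subset C \subset X$, $\hat{D} \subset \hat{C} \subset \hat{X}$, and $C$, $\hat{C}$, $D$, and $\hat{D}$ are compact;

(b) $g \in C(C, \hat{X})$ and $\hat{g} \in C(\hat{C}, X)$ with $g(D) \subset \operatorname{int} \hat{C}$ and $\hat{g}(\hat{D}) \subset \operatorname{int} C$;

(c) $\hat{g} \circ g|_D$ and $g \circ \hat{g}|_{\hat{D}}$ are index admissible;

(d) $g(FP(\hat{g} \circ g|_D)) = FP(g \circ \hat{g}|_{\hat{D}})$.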

After all these preparations we can finally describe the heart of the matter.

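Definition 1.4.3. An index for an index scope $\mathcal{S}$ is a specification of an index $\Lambda_X$ for each $X \in \mathcal{S}_{\mathcal{S}}$ such that:

(I4) (Commutativity) If $(X, C, D, g, \hat{X}, \hat{C}, \hat{D}, \hat{g})$ is a commutativity configuration with $X, \hat{X} \in \mathcal{S}_{\mathcal{S}}$, $(D, \hat{g} \circ g|_D) \in \mathcal{I}_{\mathcal{S}}(X)$, and $(\hat{D}, g \circ \hat{g}|_{\hat{D}}) \in \mathcal{I}_{\mathcal{S}}(\hat{X})$, then

\[ \Lambda_X(\hat{g} \circ g|_D) = \Lambda_{\hat{X}}(g \circ \hat{g}|_{\hat{D}}). \]

The index is said to be multiplicative if:

(M) (Multiplication) If $X, X' \in \mathcal{S}_{\mathcal{S}}$, $F \in \mathcal{I}_{\mathcal{S}}(X)$, and $F' \in \mathcal{I}_{\mathcal{S}}(X')$, then

\[ \Lambda_{X \times X'}(F \times F') = \Lambda_X(F) \cdot \Lambda_{X'}(F'). \]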


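Let $\mathcal{S}_{\mathcal{S}^{\mathrm{Ctr}}}$ be the class of ANRs, and for each $X \in \mathcal{S}_{\mathcal{S}^{\mathrm{Ctr}}}$ let $\mathcal{I}_{\mathcal{S}^{\mathrm{Ctr}}}(X)$ be the union over compact $C \subset X$ of the sets of index admissible upper semicontinuous contractible valued correspondences $F : C \to X$. The central goal of this book is:

Theorem 1.4.4. There is a unique index $\Lambda^{\mathrm{Ctr}}$ for $\mathcal{S}^{\mathrm{Ctr}}$, which is multiplicative.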

The passage from the indices $\Lambda_{\mathbb{R}^m}$ to $\Lambda^{\mathrm{Ctr}}$ has two stages. The first exploits Commutativity to extend from Euclidean spaces and continuous functions to ANR's and continuous functions. There is a significant result that is the technical basis for this. Let $X$ be a metric space with metric $d$. If $Y$ is a topological space and $\varepsilon > 0$, a homotopy $\eta : Y \times [0, 1] \to X$ is an $\varepsilon$-homotopy if

\[ d(\eta(y, s), \eta(y, t)) < \varepsilon \]

for all $y \in Y$ and all $0 \le s, t \le 1$. We say that $\eta_0$ and $\eta_1$ are $\varepsilon$-homotopic. For $\varepsilon > 0$, a topological space $D$ $\varepsilon$-dominates $C \subset X$ if there are continuous functions $\varphi : C \to D$ and $\psi : D \to X$ such that $\psi \circ \varphi : C \to X$ is $\varepsilon$-homotopic to $\mathrm{Id}_C$. In Section 7.6 we show that:

Theorem 1.4.5. If $X$ is a separable ANR, $C \subset X$ is compact, and $\varepsilon > 0$, then there is an open $U \subset \mathbb{R}^m$, for some $m$, such that $\overline{U}$ is compact and $\varepsilon$-dominates $C$.

The second stage passes from continuous functions to contractible valued correspondences. As in the passage from the smooth degree to the continuous degree, the idea is to use approximation by functions to define the extension. The basis of this is a result of Mas-Colell (1974) that was extended to ANR's by the author (McLennan (1991)) and is the topic of Chapter 9.

Theorem 1.4.6 (Approximation Theorem). Suppose that $X$ is a separable ANR and $C$ and $D$ are compact subsets of $X$ with $C \subset \operatorname{int} D$. Let $F : D \to Y$ be an upper semicontinuous contractible valued correspondence. Then for any neighborhood $U$ of $\mathrm{Gr}(F|_C)$ there are:

(a) a continuous $f : C \to Y$ with $\mathrm{Gr}(f) \subset U$;

(b) a neighborhood $U'$ of $\mathrm{Gr}(F)$ such that, for any two continuous functions $f_0, f_1 : D \to Y$ with $\mathrm{Gr}(f_0), \mathrm{Gr}(f_1) \subset U'$, there is a homotopy $h : C \times [0, 1] \to Y$ with $h_0 = f_0|_C$, $h_1 = f_1|_C$, and $\mathrm{Gr}(h_t) \subset U$ for all $0 \le t \le 1$.

1.5 Topological Consequences

The final section of the book develops applications of the index. Chapter 14 presents a number of classical concepts and results from topology that are usually proved homologically. Let $X$ be a compact ANR. The Euler characteristic of $X$ is the index of $\mathrm{Id}_X$. If $F : X \to X$ is an upper semicontinuous contractible valued correspondence, the index of $F$ is called the Lefschetz number of $F$. Of course Additivity implies that $F$ has a fixed point if its Lefschetz number is not zero. The celebrated Lefschetz fixed point theorem is this assertion (usually restricted to compact manifolds and continuous functions) together with a homological characterization of the Lefschetz number. If $X$ is contractible, then the Lefschetz number of any $F : X \to X$ is equal to the Euler characteristic of $X$, which is one. Thus we arrive at our version of the Eilenberg-Montgomery theorem: if $X$ is a compact AR and $F : X \to X$ is an upper semicontinuous contractible valued correspondence, then $F$ has a fixed point.

Chapter 14 also develops many of the classical theorems concerning maps between spheres. The most basic of these is Hopf's theorem: two continuous functions $f, f' : S^m \to S^m$ are homotopic if and only if they have the same degree, so that the degree is a "complete" homotopy invariant for maps between spheres of the same dimension. There are many other theorems concerning maps between spheres of the same dimension. Of these, one in particular has greater depth: if $f : S^m \to S^m$ is continuous and $f(-p) = -f(p)$ for all $p \in S^m$, then the degree of $f$ is odd. This and its many corollaries constitute the Borsuk-Ulam theorem.

Using these results, we prove the frequently useful theorem known as invariance of domain: if $U \subset \mathbb{R}^m$ is open and $f : U \to \mathbb{R}^m$ is continuous and injective, then $f(U)$ is open and $f$ is a homeomorphism onto its image.

If a connected set of fixed points has nonzero index, then it is essential, by virtue of Continuity. The result in Section 14.5 shows that the converse holds for convex valued correspondences with convex domains, so for the settings most commonly considered in economics the notion of essentiality does not have independent significance. But it is important to understand that this result does not imply that a component of the set of Nash equilibria of a normal form game of index zero is inessential in the sense of Jiang (1963). In fact Hauk and Hurkens (2002) provide a concrete example of an essential component of index zero.

1.6 Dynamical Systems

Dynamic stability is a problematic issue for economic theory. On the one hand, particularly in complex settings, it seems that an equilibrium cannot be a plausible prediction unless it can be understood as the end state of a dynamic adjustment process for which it is dynamically stable. In physics and chemistry there are explicit dynamical systems, and with respect to those stability is a well accepted principle. But in economics, explicit models of dynamic adjustment are systematically inconsistent with the principle of rational expectations: if a model of continuous adjustment of prices, or of mixed strategies, is understood and anticipated by the agents in the model, their behavior will exploit the process, not conform to it.

Early work in general equilibrium theory (e.g., Arrow and Hurwicz (1958); Arrow et al. (1959)) found special cases, such as a single agent or two goods, in which at least one equilibrium is necessarily stable with respect to natural price adjustment processes. But Scarf (1960) produced examples showing that one could not hope for more general positive results, in the sense that "naive" dynamic adjustment processes, such as Walrasian tatonnement, can easily fail to have stable dynamics, even when there is a unique equilibrium and as few as three goods. A later stream of research (Saari and Simon (1978); Saari (1985); Williams (1985); Jordan (1987)) showed that stability is informationally demanding, in the sense that an adjustment process that is guaranteed to return to equilibrium after a small perturbation requires essentially all the information in the matrix of partial derivatives of the aggregate excess demand function. On the whole there seems to be little hope of finding a theoretical basis for an assertion that some equilibrium is stable, or that a stable equilibrium exists.

In his Foundations of Economic Analysis, Samuelson (1947) describes a correspondence principle, according to which the stability of an equilibrium has implications for the qualitative properties of its comparative statics. In this style of reasoning the stability of a given equilibrium is a hypothesis rather than a conclusion, so the problematic state of the existence issue is less relevant. That is, instead of claiming that some dynamical process should result in a stable equilibrium, one argues that equilibria with certain properties are not stable, so if what we observe is an equilibrium, it cannot have these properties.

Proponents of such reasoning still need to wrestle with the fact that there is no canonical dynamical process. (The conceptual foundations of economic dynamics, and in particular the principle of rational expectations, were not well understood in Samuelson's time, and his discussion would be judged today to have various weaknesses.) Here there is the possibility of arguing that although any one dynamical process might be ad hoc, the instability is common to all "reasonable" or "natural" dynamics, for example those in which price adjustment is positively related to excess demand, or that each agent's mixed strategy adjusts in a direction that would improve her expected utility if other mixed strategies were not also adjusting. From a strictly logical point of view, such reasoning might seem suspect, but it seems quite likely that most economists find it intuitively and practically compelling.

In Chapter 15 we present a necessary condition for stability of a component of the set of equilibria that was introduced into game theory by Demichelis and Ritzberger (2003). (See also Demichelis and Germano (2000).) We now give an informal description of this result, with the relevant background, and relate it to Samuelson's correspondence principle.

Let $M$ be an $m$-dimensional $C^r$ manifold, where $r \ge 2$. A vector field $\zeta$ on a set $S \subset M$ is a continuous (in the obvious sense) assignment of a tangent vector $\zeta_p \in T_pM$ to each $p \in S$. Vector fields have many applications, but by far the most important is that if $\zeta$ is defined on an open $U \subset M$ and satisfies a mild technical condition, then it determines an autonomous dynamical system: there is an open $W \subset U \times \mathbb{R}$ such that for each $p \in U$, $\{ t \in \mathbb{R} : (p, t) \in W \}$ is an interval containing $0$, and a unique function $\Phi : W \to U$ such that $\Phi(p, 0) = p$ for all $p$ and, for each $(p, t) \in W$, the time derivative of $\Phi$ at $(p, t)$ is $\zeta_{\Phi(p, t)}$. If $W$ is the maximal domain admitting such a function, then $\Phi$ is the flow of $\zeta$. A point $p$ where $\zeta_p = 0$ is an equilibrium of $\zeta$.

A set $A \subset M$ is invariant if $\Phi(p, t) \in A$ for all $p \in A$ and $t \ge 0$. The $\omega$-limit set of $p \in M$ is

\[ \bigcap_{t_0 \ge 0} \overline{\{ \Phi(p, t) : t \ge t_0 \}}. \]

The domain of attraction of $A$ is

\[ D(A) = \{ p \in M : \text{the $\omega$-limit set of $p$ is nonempty and contained in $A$} \}. \]


A set A ⊂M is asymptotically stable if:

(a) A is compact;

(b) A is invariant;

(c) D(A) is a neighborhood of A;

(d) for every neighborhood $\tilde{U}$ of $A$ there is a neighborhood $U$ such that $\Phi(p, t) \in \tilde{U}$ for all $p \in U$ and $t \ge 0$.

There is a well known sufficient condition for asymptotic stability. A function $f : M \to \mathbb{R}$ is $\zeta$-differentiable if the $\zeta$-derivative

\[ \zeta f(p) = \frac{d}{dt} f(\Phi(p, t)) \Big|_{t=0} \]

is defined for every $p \in M$. A continuous function $L : M \to [0, \infty)$ is a Lyapunov function for $A \subset M$ if:

(a) $L^{-1}(0) = A$;

(b) $L$ is $\zeta$-differentiable with $\zeta L(p) < 0$ for all $p \in M \setminus A$;

(c) for every neighborhood $U$ of $A$ there is an $\varepsilon > 0$ such that $L^{-1}([0, \varepsilon]) \subset U$.

One of the oldest results in the theory of dynamical systems (Theorem 15.4.1), due to Lyapunov, is that if there is a Lyapunov function for $A$, then $A$ is asymptotically stable.
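A standard textbook instance of these conditions, chosen here purely for illustration, is the vector field $\zeta_x = -x$ on $M = \mathbb{R}^2$ with $A = \{0\}$ and $L(x) = \|x\|^2$. The flow is $\Phi(p, t) = e^{-t}p$, so $\zeta L(p) = -2\|p\|^2$. The sketch below approximates the $\zeta$-derivative numerically at a few sample points; the sample points and step size are arbitrary.

```python
import math

def flow(p, t):
    """Flow of the hypothetical vector field zeta(x) = -x on R^2: Phi(p, t) = exp(-t) p."""
    return tuple(math.exp(-t) * x for x in p)

def L(p):
    """Candidate Lyapunov function for A = {0}: the squared Euclidean norm."""
    return sum(x * x for x in p)

def zeta_derivative(func, p, h=1e-6):
    """Central difference approximation of d/dt func(Phi(p, t)) at t = 0."""
    return (func(flow(p, h)) - func(flow(p, -h))) / (2.0 * h)

samples = [(1.0, 0.0), (0.3, -0.4), (-2.0, 1.5)]
derivatives = [zeta_derivative(L, p) for p in samples]  # analytically -2 * L(p)
```

Conditions (a) and (c) hold because the sublevel sets of $L$ are closed disks around the origin, and the computed derivatives are negative away from the origin, as (b) requires.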

A converse Lyapunov theorem is a result asserting that if $A$ is asymptotically stable, then there is a Lyapunov function for $A$. Roughly speaking, this is true, but there is in addition the question of what sort of smoothness conditions one may require of the Lyapunov function. The history of converse Lyapunov theorems is rather involved, and the issue was not fully resolved until the 1960's. We present one such theorem (Theorem 15.5.1) that is sufficient for our purposes.

There is a well established definition of the index of an isolated equilibrium of a vector field. We show that this extends to an axiomatically defined vector field index. The theory of the vector field index is exactly analogous to the theories of the degree and the fixed point index, and it can be characterized in terms of the fixed point index. Specifically, a vector field $\zeta$ defined on a compact $C \subset M$ is index admissible if it does not have any equilibria in the boundary of $C$. It turns out that if $\zeta$ is defined on a neighborhood of $C$, and satisfies the technical condition guaranteeing the existence and uniqueness of the flow, then the vector field index of $\zeta$ is the fixed point index of $\Phi(\cdot, t)|_C$ for small negative $t$. (The characterization is in terms of negative time due to an unfortunate normalization axiom for the vector field index that is now traditional.) One may define the vector field index of a compact connected component of the set of equilibria to be the index of the restriction of the vector field to a small compact neighborhood of the component.

The definition of asymptotic stability, and in particular condition (d), should make us suspect that there is a connection with the Euler characteristic, because for small positive $t$ the flow $\Phi(\cdot, t)$ will map neighborhoods of $A$ into themselves. The Lyapunov function given by the converse Lyapunov theorem is used in Section 15.6 to show that if $A$ is asymptotically stable and an ANR (otherwise the Euler characteristic is undefined) then the vector field index of $A$ is $(-1)^m \chi(A)$. In particular, if $A$ is a singleton, then $A$ can only be stable when the vector field index of $A$ is $(-1)^m$. This is the result of Demichelis and Ritzberger. The special case when $A = \{p_0\}$ is a singleton is a prominent result in the theory of dynamical systems due to Krasnosel'ski and Zabreiko (1984).
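The singleton case can be checked by hand for the simplest stable equilibrium. The following worked computation assumes the linear vector field $\zeta_x = -x$ on $\mathbb{R}^m$ with $A = \{0\}$, an example chosen for illustration, and uses the characterization of the vector field index as the fixed point index of $\Phi(\cdot, t)$ for small negative $t$.

```latex
\[
\Phi(x, t) = e^{-t} x, \qquad
\det\bigl(\mathrm{Id}_{\mathbb{R}^m} - D\Phi(\cdot, t)(0)\bigr) = (1 - e^{-t})^m ,
\]
\[
t < 0 \;\Longrightarrow\; 1 - e^{-t} < 0
\;\Longrightarrow\; \operatorname{sign}\,(1 - e^{-t})^m = (-1)^m = (-1)^m \chi(\{0\}).
\]
```

Since the Euler characteristic of a point is $1$, the index computed this way agrees with $(-1)^m \chi(A)$.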

We now describe the relationship between this result and qualitative properties of an equilibrium's comparative statics. Consider the following stylized example. Let $U$ be an open subset of $\mathbb{R}^m$; an element of $U$ is thought of as a vector of endogenous variables. Let $P$ be an open subset of $\mathbb{R}^n$; an element of $P$ is thought of as a vector of exogenous parameters. Let $z : U \times P \to \mathbb{R}^m$ be a $C^1$ function, and let $\partial_x z(x, \alpha)$ and $\partial_\alpha z(x, \alpha)$ denote the matrices of partial derivatives of the components of $z$ with respect to the components of $x$ and $\alpha$.

We think of $z$ as a parameterized vector field on $U$. An equilibrium for a parameter $\alpha \in P$ is an $x \in U$ such that $z(x, \alpha) = 0$. Suppose that $x_0$ is an equilibrium for $\alpha_0$, and $\partial_x z(x_0, \alpha_0)$ is nonsingular. The implicit function theorem gives a neighborhood $V$ of $\alpha_0$ and a $C^1$ function $\sigma : V \to U$ with $\sigma(\alpha_0) = x_0$ and $z(\sigma(\alpha), \alpha) = 0$ for all $\alpha \in V$. The method of comparative statics is to differentiate this equation with respect to $\alpha$ at $\alpha_0$, then rearrange, obtaining the equation

\[ \frac{d\sigma}{d\alpha}(\alpha_0) = -\partial_x z(x_0, \alpha_0)^{-1} \cdot \partial_\alpha z(x_0, \alpha_0) \]

describing how the endogenous variables adjust, in equilibrium, to changes in the vector of parameters. The Krasnosel'ski-Zabreiko theorem implies that if $\{x_0\}$ is an asymptotically stable set for the dynamical system determined by the vector field $z(\cdot, \alpha_0)$, then the determinant of $-\partial_x z(x_0, \alpha_0)^{-1}$ is positive. This is a precise and general statement of the correspondence principle.
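The comparative statics formula and the sign condition can be evaluated on a small example. The sketch below assumes $m = 2$, $n = 1$, and a linear vector field whose derivative matrices are the hypothetical values `dz_dx` and `dz_dalpha`; the eigenvalues of `dz_dx` are $-2$ and $-1$, so the equilibrium is asymptotically stable for this linear system.

```python
def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def inv2(a):
    """Inverse of a nonsingular 2x2 matrix."""
    d = det2(a)
    if d == 0.0:
        raise ValueError("matrix is singular")
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]

def matvec(a, v):
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]

dz_dx = [[-2.0, 1.0], [0.0, -1.0]]  # hypothetical partial_x z(x0, alpha0)
dz_dalpha = [1.0, 1.0]              # hypothetical partial_alpha z(x0, alpha0), n = 1

neg_inv = [[-e for e in row] for row in inv2(dz_dx)]
dsigma_dalpha = matvec(neg_inv, dz_dalpha)  # -(partial_x z)^{-1} partial_alpha z
stability_sign = det2(neg_inv)              # positive for a stable equilibrium

# Check the differentiated equilibrium condition:
# partial_x z * dsigma/dalpha + partial_alpha z = 0.
residual = [r + b for r, b in zip(matvec(dz_dx, dsigma_dalpha), dz_dalpha)]
```

In this example both endogenous variables increase one for one with the parameter, and the determinant of $-\partial_x z^{-1}$ is $0.5 > 0$, as the stability condition requires.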

Part I

Topological Methods


Chapter 2

Planes, Polyhedra, and Polytopes

This chapter studies basic geometric objects defined by linear equations and inequalities. This serves two purposes, the first of which is simply to introduce basic vocabulary. Beginning with affine subspaces and half spaces, we will proceed to (closed) cones, polyhedra, and polytopes, which are polyhedra that are bounded. A rich class of well behaved spaces is obtained by combining polyhedra to form polyhedral complexes. Although this is foundational, there are nonetheless several interesting and very useful results and techniques, notably the separating hyperplane theorem, Farkas' lemma, and barycentric subdivision.

2.1 Affine Subspaces

Throughout the rest of this chapter we work with a fixed d-dimensional real inner product space V. (Of course we are really talking about R^d, but a more abstract setting emphasizes the geometric nature of the constructions and arguments.) We assume familiarity with the concepts and results of basic linear algebra.

An affine combination of y0, . . . , yr ∈ V is a point of the form

α0y0 + · · ·+ αryr

where α = (α0, . . . , αr) is a vector of real numbers whose components sum to 1. We say that y0, . . . , yr are affinely dependent if it is possible to represent a point as an affine combination of these points in two different ways: that is, if there are α ≠ α′ such that

\[ \sum_j \alpha_j = 1 = \sum_j \alpha'_j \quad\text{and}\quad \sum_j \alpha_j y_j = \sum_j \alpha'_j y_j. \]

If y0, . . . , yr are not affinely dependent, then they are affinely independent.

Lemma 2.1.1. For any y0, . . . , yr ∈ V the following are equivalent:

(a) y0, . . . , yr are affinely independent;

(b) y1 − y0, . . . , yr − y0 are linearly independent;


(c) there do not exist β0, . . . , βr ∈ R, not all of which are zero, with ∑_j βj = 0 and ∑_j βj yj = 0.

Proof. Suppose that y0, . . . , yr are affinely dependent, and let αj and α′j be as above. If we set βj = αj − α′j, then the βj are not all zero, ∑_j βj = 0, and ∑_j βj yj = 0, so (c) implies (a). In turn, if the βj are not all zero, ∑_j βj = 0, and ∑_j βj yj = 0, then

\[ \beta_1(y_1 - y_0) + \cdots + \beta_r(y_r - y_0) = -(\beta_1 + \cdots + \beta_r) y_0 + \beta_1 y_1 + \cdots + \beta_r y_r = 0, \]

so y1 − y0, . . . , yr − y0 are linearly dependent. Thus (b) implies (c). If β1(y1 − y0) + · · · + βr(yr − y0) = 0 with β1, . . . , βr not all zero, then for any α0, . . . , αr with α0 + · · · + αr = 1 we can set β0 = −(β1 + · · · + βr) and α′j = αj + βj for j = 0, . . . , r, thereby showing that y0, . . . , yr are affinely dependent. Thus (a) implies (b).
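Conditions (b) and (c) of Lemma 2.1.1 both reduce to a rank computation, which the following sketch checks on small examples. It is only an illustration under stated assumptions: points are tuples of integers or rationals, and the helper names `rank`, `affinely_independent_b`, and `affinely_independent_c` are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def affinely_independent_b(points):
    """Criterion (b): y1 - y0, ..., yr - y0 are linearly independent."""
    y0, rest = points[0], points[1:]
    diffs = [[a - b for a, b in zip(y, y0)] for y in rest]
    return rank(diffs) == len(diffs)

def affinely_independent_c(points):
    """Criterion (c): the lifted vectors (1, y_j) are linearly independent."""
    lifted = [[1] + list(y) for y in points]
    return rank(lifted) == len(lifted)

triangle = [(0, 0), (1, 0), (0, 1)]
collinear = [(0, 0), (1, 1), (2, 2)]
```

Criterion (c) is phrased here through the lifted vectors (1, yj): a nonzero β with ∑_j βj = 0 and ∑_j βj yj = 0 is exactly a linear dependence among them.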

The affine hull aff(S) of a set S ⊂ V is the set of all affine combinations of elements of S. The affine hull of S contains S as a subset, and we say that S is an affine subspace if the two sets are equal. That is, S is an affine subspace if it contains all affine combinations of its elements. Note that the intersection of two affine subspaces is an affine subspace. If A ⊂ V is an affine subspace and a0 ∈ A, then { a − a0 : a ∈ A } is a linear subspace, and the dimension dim A of A is, by definition, the dimension of this linear subspace. The codimension of A is d − dim A. A hyperplane is an affine subspace of codimension one.

A (closed) half-space is a set of the form

H = { v ∈ V : 〈v, n〉 ≤ β }

where n is a nonzero element of V, called the normal vector of H, and β ∈ R. Of course H determines n and β only up to multiplication by a positive scalar. We say that

I = { v ∈ V : 〈v, n〉 = β }

is the bounding hyperplane of H. Any hyperplane is the intersection of the two half-spaces that it bounds.

2.2 Convex Sets and Cones

A convex combination of y0, . . . , yr ∈ V is a point of the form α0y0 + · · · + αryr where α = (α0, . . . , αr) is a vector of nonnegative numbers whose components sum to 1. A set C ⊂ V is convex if it contains all convex combinations of its elements, so that (1 − t)x0 + tx1 ∈ C for all x0, x1 ∈ C and 0 ≤ t ≤ 1. For any set S ⊂ V the convex hull conv(S) of S is the smallest convex set containing S. Equivalently, it is the set of all convex combinations of elements of S.

The following fact is a basic tool of geometric analysis.

Theorem 2.2.1 (Separating Hyperplane Theorem). If C is a closed convex subset of V and z ∈ V \ C, then there is a half space H with C ⊂ H and z ∉ H.


Proof. The case C = ∅ is trivial. Assuming C ≠ ∅, the intersection of C with a closed ball centered at z is compact, and it is nonempty if the ball is large enough, in which case it must contain a point x0 that minimizes the distance to z over the points in this intersection. By construction this point is as close to z as any other point in C. Let n = z − x0 and β = 〈(x0 + z)/2, n〉. Checking that 〈n, z〉 > β is a simple calculation.

We claim that 〈x, n〉 ≤ 〈x0, n〉 for all x ∈ C, which is enough to imply the desired result because 〈x0, n〉 = β − ½〈n, n〉. Aiming at a contradiction, suppose that x ∈ C and 〈x, n〉 > 〈x0, n〉, so that 〈x − x0, z − x0〉 > 0. For t ∈ R we have

\[ \|(1-t)x_0 + tx - z\|^2 = \|x_0 - z\|^2 + 2t\langle x_0 - z, x - x_0\rangle + t^2\|x - x_0\|^2, \]

and for small positive t this is less than ‖x0 − z‖^2, contradicting the choice of x0.
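The construction in the proof can be traced numerically when the nearest point is easy to compute. The sketch below assumes, purely for illustration, that C is the unit box [0, 1]^d, where the nearest point is obtained by clamping each coordinate. The function names are hypothetical.

```python
def dot(u, v):
    """Standard inner product on R^d."""
    return sum(a * b for a, b in zip(u, v))

def project_onto_box(z, lo=0.0, hi=1.0):
    """Nearest point to z in the box [lo, hi]^d (coordinatewise clamping)."""
    return [min(max(c, lo), hi) for c in z]

def separating_half_space(z):
    """Return (n, beta) as in the proof: n = z - x0, beta = <(x0 + z)/2, n>."""
    x0 = project_onto_box(z)
    n = [a - b for a, b in zip(z, x0)]
    beta = dot([(a + b) / 2 for a, b in zip(x0, z)], n)
    return n, beta

z = [2.0, 3.0]
n, beta = separating_half_space(z)

# A linear function on the box attains its maximum at a corner,
# so checking the corners is enough to confirm that C lies in H.
corners = [[a, b] for a in (0.0, 1.0) for b in (0.0, 1.0)]
box_in_half_space = all(dot(c, n) <= beta for c in corners)
z_outside = dot(z, n) > beta
```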

A convex cone is a convex set C that is nonempty and closed under multiplication by nonnegative scalars, so that αx ∈ C for all x ∈ C and α ≥ 0. Such a cone is closed under addition: if x, y ∈ C, then x + y = 2(½x + ½y) is a positive scalar multiple of a convex combination of x and y. Conversely, if a nonempty set is closed under addition and multiplication by nonnegative scalars, then it is a convex cone.

The dual of a convex set C is

C∗ = {n ∈ V : 〈x, n〉 ≥ 0 for all x ∈ C }.

Clearly C∗ is a convex cone, and it is closed, regardless of whether C is closed, because C∗ is the intersection of the closed half spaces { n ∈ V : 〈x, n〉 ≥ 0 } for x ∈ C.

An intersection of closed half spaces of the form { x ∈ V : 〈x, n〉 ≥ 0 } is a closed convex cone. Farkas’ lemma is the converse of this: a closed convex cone is an intersection of closed half spaces of this form. From a technical point of view, the theory of systems of linear inequalities is dominated by this result because a large fraction of the results about systems of linear inequalities can easily be reduced to applications of it.

Theorem 2.2.2 (Farkas’ Lemma). If C is a closed convex cone, then for any b ∈ V \ C there is n ∈ C∗ such that 〈n, b〉 < 0.

Proof. The separating hyperplane theorem gives n ∈ V and β ∈ R such that 〈n, b〉 < β and 〈n, x〉 > β for all x ∈ C. Since 0 ∈ C, β < 0. There cannot be x ∈ C with 〈n, x〉 < 0 because we would have 〈n, αx〉 < β for sufficiently large α > 0, so n ∈ C∗.

The recession cone of a convex set C is

RC = { y ∈ V : x+ αy ∈ C for all x ∈ C and α ≥ 0 }.

Clearly RC is, in fact, a convex cone.

Lemma 2.2.3. Suppose C is nonempty, closed, and convex. Then RC is the set of y ∈ V such that 〈y, n〉 ≤ 0 whenever H = { v ∈ V : 〈v, n〉 ≤ β } is a half space containing C, so RC is closed because it is an intersection of closed half spaces. In addition, C is bounded if and only if RC = {0}.

Proof. Since C ≠ ∅, if y ∈ RC, then 〈y, n〉 ≤ 0 whenever H = { v ∈ V : 〈v, n〉 ≤ β } is a half space containing C. Suppose that y satisfies the latter condition and x ∈ C. Then for all α ≥ 0, x + αy is contained in every half space containing C, and the separating hyperplane theorem implies that the intersection of all such half spaces is C itself. Thus y is in RC.

If RC has a nonzero element, then of course C is unbounded. Suppose that C is unbounded. Fix a point x ∈ C, and let y1, y2, . . . be a divergent sequence in C. Passing to a subsequence if need be, we can assume that (yj − x)/‖yj − x‖ converges to a unit vector w. To show that w ∈ RC it suffices to observe that if H = { v : 〈v, n〉 ≤ β } is a half space containing C, then 〈w, n〉 ≤ 0 because

\[ \Big\langle \frac{y_j - x}{\|y_j - x\|}, n \Big\rangle \le \frac{\beta - \langle x, n \rangle}{\|y_j - x\|} \to 0. \]

The lineality space of a convex set C is

LC = RC ∩ −RC = { y ∈ V : x+ αy ∈ C for all x ∈ C and α ∈ R }.

The lineality space is closed under addition and scalar multiplication, so it is a linear subspace of V, and in fact it is the largest linear subspace of V contained in RC. Let L⊥C be the orthogonal complement of LC. Clearly C + LC = C, so

C = (C ∩ L⊥C) + LC.

A convex cone is said to be pointed if its lineality space is {0}.

Lemma 2.2.4. If C ≠ V is a closed convex cone, then there is n ∈ C∗ with 〈n, x〉 > 0 for all x ∈ C \ LC.

Proof. For n ∈ C∗ let Zn = { x ∈ C : 〈x, n〉 = 0 }. Let n be a point in C∗ that minimizes the dimension of the span of Zn. Aiming at a contradiction, suppose that 0 ≠ x ∈ Zn \ LC. Then −x ∉ C because x ∉ LC, and Farkas’ lemma, applied to −x, gives an n′ ∈ C∗ with 〈x, n′〉 > 0. Then Zn+n′ ⊂ Zn ∩ Zn′ (this inclusion holds for all n, n′ ∈ C∗) and the span of Zn+n′ does not contain x, so it is a proper subspace of the span of Zn, which contradicts the choice of n.

2.3 Polyhedra

A polyhedron in V is an intersection of finitely many closed half spaces. We adopt the convention that V itself is a polyhedron by virtue of being “the intersection of zero half-spaces.” Any hyperplane is the intersection of the two half-spaces it bounds, and any affine subspace is an intersection of hyperplanes, so any affine subspace is a polyhedron. The dimension of a polyhedron is the dimension of its affine hull. Fix a polyhedron P.

A face of P is either the empty set, P itself, or the intersection of P with the bounding hyperplane of some half-space that contains P. Evidently any face of P is itself a polyhedron. If F and F′ are faces of P with F′ ⊂ F, then F′ is a face of F, because if F′ = P ∩ I′ where I′ is the bounding hyperplane of a half space containing P, then that half space contains F and F′ = F ∩ I′. A face is proper if it is not P itself. A facet of P is a proper face that is not a proper subset of any other proper face. An edge of P is a one dimensional face, and a vertex of P is a zero dimensional face. Properly speaking, a vertex is a singleton, but we will often blur the distinction between such a singleton and its unique element, so when we refer to the vertices of P, usually we will mean the points themselves.

We say that x ∈ P is an initial point of P if there does not exist x′ ∈ P and a nonzero y ∈ RP such that x = x′ + y. If the lineality subspace of P has positive dimension, so that RP is not pointed, then there are no initial points.

Proposition 2.3.1. The set of initial points of P is the union of the bounded faces of P.

Proof. Let F be a face of P, so that F = P ∩ I where I is the bounding hyperplane of a half space H containing P. Let x be a point in F.

We first show that if x is noninitial, then F is unbounded. Let x = x′ + y for some x′ ∈ P and nonzero y ∈ RP. Since x − y and x + y are both in H, they must both be in I, so the ray { x + αy : α ≥ 0 } is contained in I. This ray is also contained in P because y ∈ RP, so it is contained in F, and F is unbounded.

We now know that the union of the bounded faces is contained in the set of initial points, and we must show that if x is not contained in a bounded face, it is noninitial. We may assume that F is the smallest face containing x. Since F is unbounded there is a nonzero y ∈ RF. The ray { x − αy : α ≥ 0 } leaves P at some α ≥ 0. (Otherwise the lineality space of RP has positive dimension and there are no initial points.) If α > 0, then x is noninitial, and α = 0 is impossible because it would imply that x belonged to a proper face of F.

Proposition 2.3.2. If RP is pointed, then every point in P is the sum of an initial point and an element of RP.

Proof. Lemma 2.2.4 gives an n ∈ V such that 〈y, n〉 > 0 for all nonzero y ∈ RP. Fix x ∈ P. Clearly K = (x − RP) ∩ P is convex, and it is bounded because its recession cone is contained in −RP ∩ RP = {0}. Lemma 2.2.3 implies that K is closed, hence compact. Let x′ be a point in K that minimizes 〈x′, n〉. Then x is a sum of x′ and a point in RP, and if x′ were not initial, so that x′ = x′′ + y where x′′ ∈ P and 0 ≠ y ∈ RP, then 〈x′′, n〉 < 〈x′, n〉, which is impossible.

Any polyhedron has a standard representation, which is a representation of the form

\[ P = G \cap \bigcap_{i=1}^{k} H_i \]

where G is the affine hull of P and H1, . . . , Hk are half-spaces. This representation of P is minimal if it is irredundant, so that for each j, G ∩ ⋂_{i≠j} Hi is a proper superset of P. Starting with any standard representation of P, we can reduce it to a minimal representation by repeatedly eliminating redundant half spaces. We now fix a minimal representation, with Hi = { v ∈ V : 〈v, ni〉 ≤ αi } and Ii the bounding hyperplane of Hi.


Lemma 2.3.3. P has a nonempty interior in the relative topology of G.

Proof. For each i we cannot have P ⊂ Ii because that would imply that G ⊂ Ii, making Hi redundant. Therefore P must contain some xi in the interior of each Hi. If x0 is a convex combination of x1, . . . , xk with positive weights, then x0 is contained in the interior of each Hi.

Proposition 2.3.4. For J ⊂ {1, . . . , k} let FJ = P ∩ ⋂_{j∈J} Ij. Then FJ is a face of P, and every nonempty face of P has this form.

Proof. If we choose numbers βj > 0 for all j ∈ J, then

\[ \Big\langle x, \sum_{j \in J} \beta_j n_j \Big\rangle \le \sum_{j \in J} \beta_j \alpha_j \]

for all x ∈ P, with equality if and only if x ∈ FJ. We have displayed FJ as a face.

Now let F = P ∩ I, where I is the bounding hyperplane of a half-space H = { v ∈ V : 〈v, n〉 ≤ α } containing P, and let J = { j : F ⊂ Ij }. Of course F ⊂ FJ. Aiming at a contradiction, suppose there is a point x ∈ FJ \ F. Then 〈x, ni〉 ≤ αi for all i ∉ J and 〈x, nj〉 = αj for all j ∈ J. For each i ∉ J there is a yi ∈ F with 〈yi, ni〉 < αi; let y be a strict convex combination of these. Then 〈y, ni〉 < αi for all i ∉ J and 〈y, nj〉 ≤ αj for all j ∈ J. Since x ∉ I and y ∈ I, the ray emanating from x and passing through y leaves H at y, and consequently it must leave P at y, but continuing along this ray from y does not immediately violate any of the inequalities defining P, so this is a contradiction.

This result has many worthwhile corollaries.

Corollary 2.3.5. P has finitely many faces, and the intersection of any two faces is a face.

Corollary 2.3.6. If F is a face of P and F′ is a face of F, then F′ is a face of P.

Proof. If G0 is the affine hull of F, then F = G0 ∩ ⋂_i Hi is a standard representation of F. The proposition implies that F = P ∩ ⋂_{i∈J} Ii for some J, that F′ = F ∩ ⋂_{i∈J′} Ii for some J′, and that F′ = P ∩ ⋂_{i∈J∪J′} Ii is a face of P.

Corollary 2.3.7. The facets of P are F{1}, . . . , F{k}. The dimension of each F{i} is one less than the dimension of P. The facets are the only faces of P with this dimension.

Proof. Minimality implies that each F{i} is a proper face, and the result above implies that F{i} cannot be a proper subset of another proper face. Thus each F{i} is a facet.

For each i minimality implies that for each j ≠ i there is some xj ∈ F{i} \ F{j}. If x is a convex combination of these with positive weights, then F{i} contains a neighborhood of x in G ∩ Ii, so the dimension of F{i} is the dimension of G ∩ Ii, which is one less than the dimension of P.

A face F that is not a facet is a proper face of some facet, so its dimension is at most the dimension of P minus two.


Now suppose that P is bounded. Any point in P that is not a vertex can be written as a convex combination of points in proper faces of P. Induction on the dimension of P proves that:

Proposition 2.3.8. If P is bounded, then it is the convex hull of its set of vertices.

An extreme point of a convex set is a point that is not a convex combination of other points in the set. This result immediately implies that only vertices of P can be extreme. In fact any vertex v is extreme: if {v} = P ∩ I where I is the bounding hyperplane of a half space H containing P, then v cannot be a convex combination of elements of P \ I.

2.4 Polytopes

A polytope in V is the convex hull of a finite set of points. Polytopes were already studied in antiquity, but the subject continues to be an active area of research; Ziegler (1995) is a very accessible introduction. We have just seen that a bounded polyhedron is a polytope. The most important fact about polytopes is the converse:

Theorem 2.4.1. A polytope is a polyhedron.

Proof. Fix P = conv{q1, . . . , qℓ}. The property of being a polyhedron is invariant under translations: for any x ∈ V, P is a polyhedron if and only if x + P is also a polyhedron. It is also invariant under passage to subspaces: P is a polyhedron in V if and only if it is a polyhedron in the span of P, and in any intermediate subspace. The two invariances imply that we may reduce to a situation where the dimension of P is the same as the dimension of V, and from there we may translate to make the origin of V an interior point of P. Assume this is the case.

Let

\[ P^* = \{\, v \in V : \langle v, p \rangle \le 1 \text{ for all } p \in P \,\} \]

and

\[ P^{**} = \{\, u \in V : \langle u, v \rangle \le 1 \text{ for all } v \in P^* \,\}. \]

Since P is bounded and has the origin as an interior point, P∗ is bounded with the origin in its interior. The formula P∗ = ⋂_j { v ∈ V : 〈v, qj〉 ≤ 1 } displays P∗ as a polyhedron, hence a polytope. This argument with P∗ in place of P implies that P∗∗ is a bounded polyhedron, so it suffices to show that P∗∗ = P. The definitions immediately imply that P ⊂ P∗∗.

Suppose that z ∉ P. The separating hyperplane theorem gives w ∈ V and β ∈ R such that 〈w, z〉 < β and 〈w, p〉 > β for all p ∈ P. Since the origin is in P, β < 0. Dividing by β reverses the inequalities, so 〈w/β, p〉 < 1 for all p ∈ P and 〈w/β, z〉 > 1. Therefore w/β ∈ P∗, and consequently z ∉ P∗∗.

Wrapping things up, there is the following elegant decomposition result:

Proposition 2.4.2. Any polyhedron P is the sum of a linear subspace, a pointed cone, and a polytope.

Proof. Let L be its lineality space, and let K be a linear subspace of V that is complementary to L in the sense that K ∩ L = {0} and K + L = V. Let Q = P ∩ K. Then P = Q + L, and the lineality space of Q is {0}, so RQ is pointed. Let S be the convex hull of the set of initial points of Q. Above we saw that this is the convex hull of the set of vertices of Q, so S is a polytope. Now Proposition 2.3.2 gives

P = L + RQ + S.

2.5 Polyhedral Complexes

A wide variety of spaces can be created by taking the union of a finite collection of polyhedra.

Definition 2.5.1. A polyhedral complex is a finite set 𝒫 = {P1, . . . , Pk} of polyhedra in V such that:

(a) F ∈ 𝒫 whenever P ∈ 𝒫 and F is a nonempty face of P;

(b) for any 1 ≤ i, j ≤ k, Pi ∩ Pj is a common (possibly empty) face of Pi and Pj.

The underlying space of the complex is

\[ |\mathcal{P}| := \bigcup_{P \in \mathcal{P}} P, \]

and we say that 𝒫 is a polyhedral subdivision of |𝒫|. The dimension of 𝒫 is the maximum dimension of any of its elements.

To illustrate this concept we mention a structure that was first studied by Descartes, and that has accumulated a huge literature over the centuries. Let x1, . . . , xn be distinct points in V. The Voronoi diagram determined by these points is

𝒫 = { PJ : ∅ ≠ J ⊂ {1, . . . , n} } ∪ {∅}

where

PJ = { y ∈ V : ‖y − xj‖ ≤ ‖y − xi‖ for all j ∈ J and i = 1, . . . , n }

is the set of points such that the xj for j ∈ J are as close to y as any of the points x1, . . . , xn. From Euclidean geometry we know that the condition ‖y − xj‖ ≤ ‖y − xi‖ determines a half space in V (a quick calculation shows that ‖y − xj‖^2 ≤ ‖y − xi‖^2 if and only if 〈y, xj − xi〉 ≥ ½(‖xj‖^2 − ‖xi‖^2)) so each PJ is a polyhedron, and conditions (a) and (b) are easy consequences of Proposition 2.3.4.

Fix a polyhedral complex 𝒫. A subcomplex of 𝒫 is a subset 𝒬 ⊂ 𝒫 that contains all the faces of its elements, so that 𝒬 is also a polyhedral complex. If this is the case, then |𝒬| is a closed (because it is a finite union of closed subsets) subset of |𝒫|. We say that 𝒫 is a polytopal complex if each Pj is a polytope, in which case 𝒫 is said to be a polytopal subdivision of |𝒫|. Note that |𝒫| is then necessarily compact because it is a finite union of compact sets. A k-dimensional simplex is the convex hull of an affinely independent collection of points x0, . . . , xk. We say that 𝒫 is a simplicial complex, and that 𝒫 is a simplicial subdivision of |𝒫|, or a triangulation, if each Pj is a simplex.
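The half-space form of the Voronoi condition can be checked directly. The sketch below uses integer coordinates so that every comparison is exact; the sites, the grid, and the function names are all assumptions chosen for the example.

```python
def dot(u, v):
    """Standard inner product."""
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    """Componentwise difference u - v."""
    return tuple(a - b for a, b in zip(u, v))

def closer_by_distance(y, xj, xi):
    """True when ||y - xj|| <= ||y - xi||, compared via squared norms."""
    return dot(sub(y, xj), sub(y, xj)) <= dot(sub(y, xi), sub(y, xi))

def closer_by_half_space(y, xj, xi):
    """The same condition as a half space: 2<y, xj - xi> >= ||xj||^2 - ||xi||^2."""
    return 2 * dot(y, sub(xj, xi)) >= dot(xj, xj) - dot(xi, xi)

def nearest_sites(y, sites):
    """Indices j such that site j is as close to y as any other site."""
    return {j for j, xj in enumerate(sites)
            if all(closer_by_distance(y, xj, xi) for xi in sites)}

sites = [(0, 0), (4, 0), (1, 3)]
grid = [(a, b) for a in range(-3, 8) for b in range(-3, 8)]

# The two formulations should agree at every grid point and for every pair of sites.
agree = all(
    closer_by_distance(y, xj, xi) == closer_by_half_space(y, xj, xi)
    for y in grid for xj in sites for xi in sites
)
```

A point y belongs to the cell indexed by J exactly when J is a subset of `nearest_sites(y, sites)`.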


We now describe a general method of subdividing a polytopal complex 𝒫 into a simplicial complex 𝒬. For each P ∈ 𝒫 choose w_P in the relative interior of P. Let 𝒬 be the collection of sets of the form

σ_Q = conv({ w_P : P ∈ Q })

where Q is a subset of 𝒫 that is completely ordered by inclusion. We claim that 𝒬 is a simplicial complex, and that |𝒬| = |𝒫|.

Suppose that Q = {P_0, . . . , P_k} where P_{i−1} is a proper subset of P_i for 1 ≤ i ≤ k. For each i, w_{P_0}, . . . , w_{P_{i−1}} are contained in P_{i−1}, and w_{P_i} is not contained in the affine hull of P_{i−1}, so w_{P_i} − w_{P_0} is not spanned by w_{P_1} − w_{P_0}, . . . , w_{P_{i−1}} − w_{P_0}. By induction, w_{P_1} − w_{P_0}, . . . , w_{P_k} − w_{P_0} are linearly independent. Now Lemma 2.1.1 implies that w_{P_0}, . . . , w_{P_k} are affinely independent, so σ_Q is a simplex.

In addition to Q, suppose that Q′ = {P′_0, . . . , P′_{k′}} where P′_{j−1} is a proper subset of P′_j for 1 ≤ j ≤ k′. Clearly σ_{Q∩Q′} ⊂ σ_Q ∩ σ_{Q′}, and we claim that it is also the case that σ_Q ∩ σ_{Q′} ⊂ σ_{Q∩Q′}. Consider an arbitrary x ∈ σ_Q ∩ σ_{Q′}. It suffices to show the desired inclusion with Q and Q′ replaced by the smallest subsets of Q and Q′ whose simplices both contain x, so we may assume that x is in the interior of P_k and in the interior of P′_{k′}, and it follows that P_k = P′_{k′}. In addition, the ray emanating from w_{P_k} and passing through x leaves P_k at a point y ∈ σ_{{P_0,...,P_{k−1}}} ∩ σ_{{P′_0,...,P′_{k′−1}}}, and the claim follows by induction on max{k, k′}. We have shown that 𝒬 is a simplicial complex.

Evidently |𝒬| ⊂ |𝒫|. Choosing x ∈ |𝒫| arbitrarily, let P be the smallest element of 𝒫 that contains x. If x = w_P, then x ∈ σ_{{P}}, and if P is 0-dimensional then this is the only possibility. Otherwise the ray emanating from w_P and passing through x intersects the boundary of P at a point y, and if y ∈ σ_Q, then x ∈ σ_{Q∪{P}}. By induction on the dimension of P we see that x is contained in some element of 𝒬, so |𝒬| = |𝒫|.


This construction shows that the underlying space of a polytopal complex is also the underlying space of a simplicial complex. In addition, repeating this process can give a triangulation with small simplices. The diameter of a polytope is the maximum distance between any two of its points. The mesh of a polytopal complex is the maximum of the diameters of its polytopes.

Consider an ℓ-dimensional simplex P whose vertices are v0, . . . , vℓ. The barycenter of P is

\[ \beta(P) := \frac{1}{\ell + 1}(v_0 + \cdots + v_\ell). \]

In the construction above, suppose that 𝒫 is a simplicial complex, and that we chose w_P = β(P) for all P. We would like to bound the diameter of the simplices in the subdivision of |𝒫|, which amounts to giving a bound on the maximum distance between the barycenters of any two nested faces. After reindexing, these can be taken to be the faces spanned by v0, . . . , vk and v0, . . . , vℓ where 0 ≤ k < ℓ ≤ m and m is the dimension of 𝒫. The following rather crude inequality, in which D denotes the mesh of 𝒫, is sufficient for our purposes.

\begin{align*}
\Big\| \frac{1}{k+1}(v_0 + \cdots + v_k) - \frac{1}{\ell+1}(v_0 + \cdots + v_\ell) \Big\|
&= \frac{1}{(k+1)(\ell+1)} \Big\| \sum_{0 \le i \le k} \sum_{0 \le j \le \ell} (v_i - v_j) \Big\| \\
&\le \frac{1}{(k+1)(\ell+1)} \sum_{0 \le i \le k} \sum_{0 \le j \le \ell,\, j \ne i} \|v_i - v_j\| \\
&\le \frac{1}{(k+1)(\ell+1)} (k+1)\ell D \le \frac{m}{m+1} D.
\end{align*}

It follows from this that the mesh of the subdivision of |𝒫| is not greater than m/(m + 1) times the mesh of 𝒫. Since we can subdivide repeatedly:

Proposition 2.5.2. The underlying space of a polytopal complex has triangulations of arbitrarily small mesh.
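The mesh bound can be observed on a single simplex. The following sketch builds the maximal simplices of the barycentric subdivision of one triangle, using the fact that each ordering of the vertices gives a maximal chain of faces, and compares the resulting mesh with m/(m + 1) times the diameter. The triangle and the helper names are assumptions made for the example, and `math.dist` requires Python 3.8 or later.

```python
from itertools import permutations
from math import dist

def barycenter(points):
    """Average of a nonempty sequence of points."""
    k = len(points)
    return tuple(sum(c) / k for c in zip(*points))

def barycentric_subdivision(vertices):
    """Maximal simplices of the barycentric subdivision of one simplex.

    Each permutation (p0, p1, ...) of the vertices gives the chain of faces
    {p0}, {p0, p1}, ..., whose barycenters span one simplex of the subdivision.
    """
    return [
        [barycenter(perm[: i + 1]) for i in range(len(perm))]
        for perm in permutations(vertices)
    ]

def diameter(points):
    """Diameter of the convex hull of the points (attained at a pair of them)."""
    return max(dist(p, q) for p in points for q in points)

triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
m = len(triangle) - 1                      # dimension of the simplex
pieces = barycentric_subdivision(triangle)
mesh = max(diameter(piece) for piece in pieces)
bound = m / (m + 1) * diameter(triangle)
```

For this triangle the subdivision has six maximal simplices, and the computed mesh stays below the bound, as the inequality predicts.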

Simplicial complexes can be understood in purely combinatoric terms. An abstract simplicial complex is a pair (V, Σ) where V is a finite set of vertices and Σ is a collection of subsets of V with the property that τ ∈ Σ whenever σ ∈ Σ and τ ⊂ σ. The geometric interpretation is as follows. Let { ev : v ∈ V } be the standard unit basis vectors of R^V: the v-component of ev is 1 and all other coordinates are 0. (Probably most authors would work with R^{|V|}, but our approach is simpler and formally correct insofar as Y^X is the set of functions from X to Y.) For each nonempty σ ∈ Σ let Pσ be the convex hull of { ev : v ∈ σ }, and let P∅ = ∅. The simplicial complex

𝒫(V,Σ) = { Pσ : σ ∈ Σ }

is called the canonical realization of (V, Σ).
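A small sketch may make the combinatorial definition concrete. It checks the closure property of Σ and represents points of R^V as dictionaries from vertices to numbers, in the spirit of treating R^V as a set of functions on V. The function names and the sample complexes are hypothetical.

```python
from itertools import combinations

def is_abstract_simplicial_complex(vertices, sigma):
    """Check that every member of sigma is a subset of the vertex set and
    that sigma contains every subset of each of its members."""
    vertex_set = set(vertices)
    faces = {frozenset(s) for s in sigma}
    if not all(s <= vertex_set for s in faces):
        return False
    return all(
        frozenset(t) in faces
        for s in faces
        for size in range(len(s) + 1)
        for t in combinations(s, size)
    )

def canonical_vertex(v, vertices):
    """The unit vector e_v in R^V, stored as a function (dict) from V to R."""
    return {w: 1.0 if w == v else 0.0 for w in vertices}

def barycenter_of_face(face, vertices):
    """Barycenter of the realized simplex for a nonempty face, as a point of R^V."""
    return {w: (1.0 / len(face) if w in face else 0.0) for w in vertices}

V = ["a", "b", "c"]
hollow_triangle = [(), ("a",), ("b",), ("c",), ("a", "b"), ("b", "c"), ("a", "c")]
not_closed = [(), ("a",), ("a", "b")]      # the face ("b",) is missing
```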

Let 𝒫 be a simplicial complex, and let V be the set of vertices of 𝒫. For each P ∈ 𝒫 let σP = P ∩ V be the set of vertices of P, and let Σ = { σP : P ∈ 𝒫 }. It is easy to see that extending the map v ↦ ev affinely on each simplex induces a homeomorphism between |𝒫| and |𝒫(V,Σ)|. Thus the homeomorphism type of a simplicial complex is entirely determined by its combinatorics, i.e., the “is a face of” relation between the various simplices. Geometric simplicial complexes and abstract simplicial complexes encompass the same class of homeomorphism types of topological spaces.

Simplicial complexes are very important in topology. On the one hand a wide variety of important spaces have simplicial subdivisions, and certain limiting processes can be expressed using repeated barycentric subdivision. On the other hand, the purely combinatoric nature of an abstract simplicial complex allows combinatoric and algebraic methods to be applied. In addition the requirement that a simplicial subdivision exists rules out spaces exhibiting various sorts of pathologies and infinite complexities. A nice example of a space that does not have a simplicial subdivision is the Hawaiian earring, which is the union over all n = 1, 2, 3, . . . of the circle of radius 1/n centered at (1/n, 0) ∈ R^2.

2.6 Graphs

A graph is a one dimensional polytopal complex. That is, it consists of finitely many zero and one dimensional polytopes, with the one dimensional polytopes intersecting at common endpoints, if they intersect at all. A one dimensional polytope is just a line segment, which is a one dimensional simplex, so a graph is necessarily a simplicial complex.

Relative to general simplicial complexes, graphs sound pretty simple, and from the perspective of our work here this is indeed the case, but the reader should be aware that there is much more to graph theory than this. The formal study of graphs in mathematics began around the middle of the 20th century and quickly became an extremely active area of research, with numerous subfields, deep results, and various applications such as the theory of networks in economic theory. Among the numerous excellent texts in this area, Bollobas (1979) can be recommended to the beginner.

This book will use no deep or advanced results about graphs. In fact, almost everything we need to know about them is given in Lemma 2.6.1 below. The main purpose of this section is simply to introduce the basic terminology of the subject, which will be used extensively.

Formally, a graph¹ is a pair G = (V, E) consisting of a finite set V of vertices and a set E of two element subsets of V. An element e = {v, w} of E is called an edge, and v and w are its endpoints. Sometimes one writes vw in place of {v, w}. Two vertices are neighbors if they are the endpoints of an edge. The degree of a vertex is the cardinality of its set of neighbors.

A walk in G is a sequence v0v1 · · · vr of vertices such that vj−1 and vj are neighbors for each j = 1, . . . , r. It is a path if v0, . . . , vr are all distinct. A path is maximal if it is not contained (in the obvious sense) in a longer path. Two vertices are connected if they are the endpoints of a path. This is an equivalence relation, and a component of G is one of the graphs consisting of an equivalence class and the edges in G joining its vertices. We say that G is connected if it has only one component, so that any two vertices are connected. A walk v0v1 · · · vr is a cycle if r ≥ 3, v0, . . . , vr−1 are distinct, and vr = v0. If G has no cycles, then it is said to be acyclic. A connected acyclic graph is a tree.

¹ In the context of graph theory the sorts of graphs we describe here are said to be “simple,” to distinguish them from a more complicated class of graphs in which there can be loops (that is, edges whose two endpoints are the same) and multiple edges connecting a single pair of vertices. They are also said to be “undirected” to distinguish them from so-called directed graphs in which each edge is oriented, with a “source” and “target.”

The following simple fact is the only “result” from graph theory applied in this book. It is sufficiently obvious that there would be little point in including a proof.

Lemma 2.6.1. If the degree of each of the vertices of G is at most two, then the components of G are maximal paths, cycles, and vertices with no neighbors.

This simple principle underlies all the algorithms described in Chapter 3. There are an even number of endpoints of paths in G. If it is known that an odd number represent or embody a situation that is not what we are looking for, then the rest do embody what we are looking for, and in particular the number of “solutions” is odd, hence positive. If it is known that exactly one endpoint embodies what we are not looking for, and that endpoint is easily computed, then we can find a solution by beginning at that point and following the path to its other endpoint.
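The path-following principle can be sketched in a few lines. The code below is an illustration under stated assumptions, not one of the algorithms of Chapter 3: it classifies the components of a graph whose degrees are at most two and walks from one endpoint of a path to the other. The function names and the sample graph are hypothetical.

```python
def adjacency(vertices, edges):
    """Neighbor sets of a simple undirected graph with all degrees at most two."""
    neighbors = {v: set() for v in vertices}
    for v, w in edges:
        neighbors[v].add(w)
        neighbors[w].add(v)
    if any(len(n) > 2 for n in neighbors.values()):
        raise ValueError("Lemma 2.6.1 needs every degree to be at most two")
    return neighbors

def classify_components(vertices, edges):
    """Label each component as a path, a cycle, or an isolated vertex."""
    neighbors = adjacency(vertices, edges)
    seen, kinds = set(), []
    for start in vertices:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in component:
                component.add(v)
                stack.extend(neighbors[v] - component)
        seen |= component
        if len(component) == 1:
            kinds.append("isolated vertex")
        elif all(len(neighbors[v]) == 2 for v in component):
            kinds.append("cycle")
        else:
            kinds.append("path")
    return kinds

def follow_path(neighbors, start):
    """Walk from an endpoint (a vertex of degree one) to the other endpoint."""
    if len(neighbors[start]) != 1:
        raise ValueError("start must be an endpoint of a path")
    previous, current = None, start
    while True:
        onward = [w for w in neighbors[current] if w != previous]
        if not onward:
            return current
        previous, current = current, onward[0]

vertices = [1, 2, 3, 4, 5, 6, 7]
edges = [(1, 2), (2, 3), (4, 5), (5, 6), (6, 4)]
graph = adjacency(vertices, edges)
endpoints = [v for v in vertices if len(graph[v]) == 1]
```

Here `follow_path(graph, 1)` walks the path 1, 2, 3 and stops at 3, and the list of endpoints has even length, as the parity argument requires.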

Chapter 3

Computing Fixed Points

When it was originally proved, Brouwer’s fixed point theorem was a major breakthrough, providing a resolution of several outstanding problems in topology. Since that time the development of mathematical infrastructure has provided access to various useful techniques, and a number of easier demonstrations have emerged, but there are no proofs that are truly simple.

There is an important reason for this. The most common method of proving that some mathematical object exists is to provide an algorithm that constructs it, or some proxy such as an arbitrarily accurate approximation, but for fixed points this is problematic. Naively, one might imagine a computational strategy that tried to find an approximate fixed point by examining the value of the function at various points, eventually halting with a declaration that a certain point was a good approximation of a fixed point. For a continuous function f : [0, 1] → [0, 1] such a strategy is feasible because if f(x) > x and f(x′) < x′ (as is the case if x = 0 and x′ = 1 unless one of these is a fixed point) then the intermediate value theorem implies that there is a fixed point between x and x′. According to the sign of f(x′′) − x′′, where x′′ = (x + x′)/2, we can replace x or x′ with x′′, obtaining an interval with the same property and half the length. Iterating this procedure provides an arbitrarily fine approximation of a fixed point.
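The one-dimensional bisection strategy just described can be written out as follows. This is a minimal sketch assuming that f is continuous and maps [0, 1] into itself; the function name `bisect_fixed_point` and the choice of cosine as a test function are illustrative only.

```python
import math

def bisect_fixed_point(f, tol=1e-10):
    """Approximate a fixed point of a continuous f: [0, 1] -> [0, 1] by bisection."""
    lo, hi = 0.0, 1.0
    if f(lo) == lo:
        return lo
    if f(hi) == hi:
        return hi
    # Invariant from here on: f(lo) > lo and f(hi) < hi.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        value = f(mid)
        if value == mid:
            return mid
        if value > mid:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# cos maps [0, 1] into [cos(1), 1], which is contained in [0, 1].
x = bisect_fixed_point(math.cos)
```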

In higher dimensions such a computational strategy can never provide a guarantee that the output is actually near a fixed point. To explain what we mean by this we need to be a bit more precise. Suppose you set out in search of a fixed point of a continuous function f : X → X (where X is a nonempty, compact, convex subset of a Euclidean space) armed with nothing more than an “oracle” that evaluates f. That is, the only computational resources you can access are the theoretical knowledge that f is continuous, and a “black box” that tells you the value of f at any point in its domain that you submit to it. An algorithm is, by definition, a computational procedure that is guaranteed to halt eventually, so our supposed algorithm for computing a fixed point necessarily halts after sampling the oracle finitely many times, say at x1, . . . , xn, with some declaration that such-and-such is at least an approximation of a fixed point. Provided that the dimension of X is at least two, the Devil could now change the function to one that agrees with the original function at every point that was sampled, is continuous, and has no fixed points anywhere near the point designated by the algorithm. (One way to do this is to replace f with h^{−1} ◦ f ◦ h where h : X → X is a suitable homeomorphism satisfying h(xi) = xi and h(f(xi)) = f(xi) for all i = 1, . . . , n.) The algorithm necessarily processes the new function in the same way, arriving at the same conclusion, but for the new function that conclusion is erroneous.

Our strategy for proving Brouwer’s fixed point theorem will, of necessity, be abit indirect. We will prove the existence of objects that we will describe as “pointsthat are approximately fixed.” (The exact nature of such objects will vary fromone proof to the next.) An infinite sequence of such points, with the “error” ofthe approximation converging to zero, will have the property that each of its limitpoints is a fixed point.

The proof that any sequence in a compact space has an accumulation point usesthe axiom of choice, and in fact Brouwer’s fixed point theorem cannot be provedwithout it. The axiom of choice was rather controversial when it emerged, withconstructivists (Brouwer himself became one late in life) arguing that mathematicsshould only consider objects whose definitions are, in effect, algorithms for comput-ing the object in question, or at least a succession of finer and finer approximations.It turns out that this is quite restrictive, so the ‘should’ of the last sentence be-comes quite puritanical, at least in comparison with the rich mathematics allowedby a broader set of allowed definitions and accepted axioms, and constructivism hasalmost completely faded out in recent decades.

This chapter studies two algorithmic ideas for computing points that are approximately fixed. One of these uses an algorithm for computing a Nash equilibrium of a two person game. The second may be viewed as a matter of approximating the given function or correspondence with an approximation that is piecewise linear in the sense that its graph is a polyhedral complex. In both cases the algorithm traverses a path of edges in a polyhedral complex, and in the final section we explain recent advances in computer science concerning such algorithms and the problems they solve.

3.1 The Lemke-Howson Algorithm

In a two person game each of the two players is required to choose an element from a set of strategies, without being informed of the other player’s choice, and each player’s payoff depends jointly on the pair of strategies chosen. A pair consisting of a strategy for each agent is a Nash equilibrium if neither agent can do better by switching to some other strategy. The “mixed extension” is the derived two person game with the same two players in which each player’s set of strategies is the set of probability measures on that player’s set of strategies in the original game. Payoffs in the mixed extension are computed by taking expectations.

In a sense, our primary concern in this section and the next is to show that when the sets of strategies in the given game are finite, the mixed extension necessarily has a Nash equilibrium. But we will actually do something quite a bit more interesting and significant, by providing an algorithm that computes a Nash equilibrium. We will soon see that the existence result is a special case of the Kakutani fixed point theorem. But actually this case is not so “special” because we will eventually see that two person games can be used to approximate quite general fixed point problems.

Formally, a finite two person game consists of:

(a) nonempty finite sets $S = \{s_1, \ldots, s_m\}$ and $T = \{t_1, \ldots, t_n\}$ of pure strategies for the two agents, who will be called agent 1 and agent 2;

(b) payoff functions $u, v : S \times T \to \mathbb{R}$.

Elements of $S \times T$ are called pure strategy profiles. A pure Nash equilibrium is a pure strategy profile $(s, t)$ such that $u(s', t) \le u(s, t)$ for all $s' \in S$ and $v(s, t') \le v(s, t)$ for all $t' \in T$.

To define the mixed extension we need notational conventions for probability measures on finite sets. For each $k = 0, 1, 2, \ldots$ let

$$\Delta^{k-1} = \{\, \rho \in \mathbb{R}^k_+ : \rho_1 + \cdots + \rho_k = 1 \,\}$$

be the $(k-1)$-dimensional simplex. We will typically think of this as the set of probability measures on a set with $k$ elements indexed by the integers $1, \ldots, k$. In particular, let $\mathcal{S} = \Delta^{m-1}$ and $\mathcal{T} = \Delta^{n-1}$; elements of these sets are called mixed strategies for agents 1 and 2 respectively. Abusing notation, we will frequently identify pure strategies $s_i \in S$ and $t_j \in T$ with the mixed strategies in $\mathcal{S}$ and $\mathcal{T}$ that assign all probability to $i$ and $j$.

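An element of $\mathcal{S} \times \mathcal{T}$ is called a mixed strategy profile. We let $u$ and $v$ also denote the bilinear extensions of the given payoff functions to $\mathcal{S} \times \mathcal{T}$, so the expected payoffs resulting from a mixed strategy profile $(\sigma, \tau) \in \mathcal{S} \times \mathcal{T}$ are

$$u(\sigma, \tau) = \sum_{i=1}^{m} \sum_{j=1}^{n} u(s_i, t_j)\,\sigma_i \tau_j \quad\text{and}\quad v(\sigma, \tau) = \sum_{i=1}^{m} \sum_{j=1}^{n} v(s_i, t_j)\,\sigma_i \tau_j$$

respectively. A (mixed) Nash equilibrium is a mixed strategy profile $(\sigma, \tau) \in \mathcal{S} \times \mathcal{T}$ such that each agent is maximizing her expected payoff, taking the other agent’s mixed strategy as given, so that $u(\sigma', \tau) \le u(\sigma, \tau)$ for all $\sigma' \in \mathcal{S}$ and $v(\sigma, \tau') \le v(\sigma, \tau)$ for all $\tau' \in \mathcal{T}$.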

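The algebraic expressions for expected payoffs given above are rather bulky. There is a way to “lighten” our notation that also allows linear algebra to be applied. Let $A$ and $B$ be the $m \times n$ matrices with entries $a_{ij} = u(s_i, t_j)$ and $b_{ij} = v(s_i, t_j)$. Treating mixed strategies as column vectors, we have

$$u(\sigma, \tau) = \sigma^T A \tau \quad\text{and}\quad v(\sigma, \tau) = \sigma^T B \tau,$$

so that $(\sigma, \tau)$ is a Nash equilibrium if $\sigma'^T A \tau \le \sigma^T A \tau$ for all $\sigma' \in \mathcal{S}$ and $\sigma^T B \tau' \le \sigma^T B \tau$ for all $\tau' \in \mathcal{T}$. The set of Nash equilibria can be viewed as the set of fixed points of an upper semicontinuous convex valued correspondence $\beta : \mathcal{S} \times \mathcal{T} \to \mathcal{S} \times \mathcal{T}$ where $\beta(\sigma, \tau) = \beta_1(\tau) \times \beta_2(\sigma)$ is given by

$$\beta_1(\tau) = \operatorname*{argmax}_{\sigma' \in \mathcal{S}} \sigma'^T A \tau \quad\text{and}\quad \beta_2(\sigma) = \operatorname*{argmax}_{\tau' \in \mathcal{T}} \sigma^T B \tau'.$$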


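A concrete example may help to fix ideas. Suppose that $m = n = 3$, with

$$A = \begin{pmatrix} 3 & 3 & 4 \\ 4 & 3 & 3 \\ 3 & 4 & 3 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 4 & 5 & 2 \\ 4 & 2 & 5 \\ 5 & 4 & 2 \end{pmatrix}.$$

These payoffs determine the divisions of $\mathcal{S}$ and $\mathcal{T}$, according to best responses, shown in Figure 3.1 below.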

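[Figure 3.1: the simplex $\mathcal{S}$ (vertices s1, s2, s3) divided into regions labeled t1, t2, t3, and the simplex $\mathcal{T}$ (vertices t1, t2, t3) divided into regions labeled s1, s2, s3.]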

Specifically, for any $\sigma \in \mathcal{S}$, $\beta_2(\sigma)$ is the set of probability measures that assign all probability to pure strategies whose associated regions in $\mathcal{S}$ contain $\sigma$ in their closure, and similarly for $\beta_1(\tau)$. With a little bit of work you should have no difficulty verifying that the divisions of $\mathcal{S}$ and $\mathcal{T}$ are as pictured, but the discussion uses only the qualitative information shown in the figure, so you can skip this chore if you like.

Because the number of pure strategies is quite small, we can use exhaustive search to find all Nash equilibria. For games in which each pure strategy has a unique best response a relatively quick way to find all pure Nash equilibria is to start with an arbitrary pure strategy and follow the sequence of pure best responses until it visits a pure strategy a second time. The last two strategies on the path constitute a Nash equilibrium if they are best responses to each other, and none of the preceding strategies is part of a pure Nash equilibrium. If there are any pure strategies that were not reached, we can repeat the process starting at one of them, continuing until all pure strategies have been examined. For this example, starting at $s_1$ gives the cycle

$$s_1 \longrightarrow t_2 \longrightarrow s_3 \longrightarrow t_1 \longrightarrow s_2 \longrightarrow t_3 \longrightarrow s_1,$$

so there are no pure Nash equilibria.
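The search just described is easy to mechanize. The following is a minimal sketch for the example matrices only; the function names (`best_reply_of_1`, `best_reply_of_2`, `best_response_path`, `pure_nash_equilibria`) are our own illustrative choices, and the code assumes, as the text does, that each pure strategy has a unique best response.

```python
# Payoff matrices of the running example: rows are s1..s3, columns are t1..t3.
A = [[3, 3, 4], [4, 3, 3], [3, 4, 3]]   # agent 1's payoffs
B = [[4, 5, 2], [4, 2, 5], [5, 4, 2]]   # agent 2's payoffs


def best_reply_of_2(i):
    """Index j of agent 2's best response to the pure strategy s_{i+1}."""
    return max(range(len(B[i])), key=lambda j: B[i][j])


def best_reply_of_1(j):
    """Index i of agent 1's best response to the pure strategy t_{j+1}."""
    return max(range(len(A)), key=lambda i: A[i][j])


def best_response_path(start_i):
    """Follow pure best responses from s_{start_i+1} until a strategy repeats."""
    path, seen = [], set()
    node = ("s", start_i)
    while node not in seen:
        seen.add(node)
        path.append(node)
        kind, k = node
        node = ("t", best_reply_of_2(k)) if kind == "s" else ("s", best_reply_of_1(k))
    path.append(node)  # the first strategy that is visited a second time
    return path


def pure_nash_equilibria():
    """All pure profiles (i, j) in which each strategy is a best response to the other."""
    return [(i, j) for i in range(len(A)) for j in range(len(A[0]))
            if A[i][j] == max(row[j] for row in A) and B[i][j] == max(B[i])]


path = best_response_path(0)
print(" -> ".join(f"{kind}{k + 1}" for kind, k in path))
print(pure_nash_equilibria())
```

Run on the example, the path printed is the six-step cycle displayed above, and the direct check over all nine pure profiles returns an empty list.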


A similar procedure can be used to find Nash equilibria in which each agent mixes over two pure strategies. If we consider $s_1$ and $s_2$, we see that there are two mixtures that allow agent 2 to mix over two pure strategies, and we will need to consider both of them, so things are a bit more complicated than they were for pure strategies because the process “branches.” Suppose that agent 1 mixes over $s_1$ and $s_2$ in the proportion that makes $t_1$ and $t_2$ best responses. Agent 2 has a mixture of $t_1$ and $t_2$ that makes $s_2$ and $s_3$ best responses. There is a mixture of $s_2$ and $s_3$ that makes $t_1$ and $t_3$ best responses, and a certain mixture $\tau^*$ of $t_1$ and $t_3$ makes $s_1$ and $s_2$ best responses. The only hope for continuing this path in a way that might lead to a Nash equilibrium is to now consider the mixture $\sigma^*$ of $s_1$ and $s_2$ that makes $t_1$ and $t_3$ best responses, and indeed, $(\sigma^*, \tau^*)$ is a Nash equilibrium.

We haven’t yet considered the possibility that agent 1 might mix over $s_1$ and $s_3$, nor have we examined what might happen if agent 2 mixes over $t_2$ and $t_3$. There is a mixture of $s_1$ and $s_3$ that allows agent 2 to mix over $t_1$ and $t_2$, which is a possibility we have already considered, and there is a mixture of $t_2$ and $t_3$ that allows agent 1 to mix over $s_1$ and $s_3$, which we also analyzed above. Therefore there are no additional Nash equilibria in which both agents mix over two pure strategies.

Could there be a Nash equilibrium in which one of the agents mixes over all three pure strategies? Agent 2 does have one mixed strategy that allows agent 1 to mix freely, but this mixed strategy assigns positive probability to all pure strategies (such a mixed strategy is said to be totally mixed) so it is not a best response to any of agent 1’s mixed strategies, and we can conclude that there is no Nash equilibrium of this sort. Thus $(\sigma^*, \tau^*)$ is the only Nash equilibrium.
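For readers who want numbers, the equilibrium can be worked out from the indifference conditions. The following is our own computation from the matrices $A$ and $B$ above (the explicit values are not stated in the text), writing $\tau = (p, 0, 1-p)$ and $\sigma = (q, 1-q, 0)$:

$$
\begin{aligned}
A\tau &= (4 - p,\; 3 + p,\; 3), & 4 - p = 3 + p &\;\Longrightarrow\; p = \tfrac12, & \tau^* &= \bigl(\tfrac12, 0, \tfrac12\bigr),\\
B^T\sigma &= (4,\; 2 + 3q,\; 5 - 3q), & 4 = 5 - 3q &\;\Longrightarrow\; q = \tfrac13, & \sigma^* &= \bigl(\tfrac13, \tfrac23, 0\bigr).
\end{aligned}
$$

At these values $s_1$ and $s_2$ each yield agent 1 the payoff $7/2$ while $s_3$ yields $3$, and $t_1$ and $t_3$ each yield agent 2 the payoff $4$ while $t_2$ yields $3$, so each agent assigns probability only to best responses. The same kind of calculation shows that the mixed strategy of agent 2 that makes agent 1 indifferent among all three pure strategies is $(\tfrac13, \tfrac13, \tfrac13)$.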

This sort of analysis quickly becomes extremely tedious as the game becomes larger. In addition, the fact that we are able to find all Nash equilibria in this way does not prove that there is always something to find.

Before continuing we reformulate Nash equilibrium using a simple principle with numerous repercussions, namely that a mixed strategy maximizes expected utility if and only if it assigns all probability to pure strategies that maximize expected utility. To understand this formally it suffices to note that agent 1’s problem is to maximize

$$u(\sigma, \tau) = \sigma^T A \tau = \sum_{i=1}^{m} \sigma_i \Bigl( \sum_{j=1}^{n} a_{ij} \tau_j \Bigr)$$

subject to the constraints $\sigma_i \ge 0$ for all $i$ and $\sum_{i=1}^{m} \sigma_i = 1$, taking $\tau$ as given. From this it follows that:

Lemma 3.1.1. A mixed strategy profile $(\sigma, \tau)$ is a Nash equilibrium if and only if:

(a) for each $i = 1, \ldots, m$, either $\sigma_i = 0$ or $\sum_{j=1}^{n} a_{ij}\tau_j \ge \sum_{j=1}^{n} a_{i'j}\tau_j$ for all $i' = 1, \ldots, m$;

(b) for each $j = 1, \ldots, n$, either $\tau_j = 0$ or $\sum_{i=1}^{m} b_{ij}\sigma_i \ge \sum_{i=1}^{m} b_{ij'}\sigma_i$ for all $j' = 1, \ldots, n$.

For each of the $m + n$ conditions there are two possibilities, so there are $2^{m+n}$ cases. For each of these cases the intuition derived from counting equations and unknowns suggests that the set of solutions of the conditions given in Lemma 3.1.1 will typically be zero dimensional, which is to say that it is a finite set of points. Thus we expect that the set of Nash equilibria will typically be finite.
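The conditions of Lemma 3.1.1 are easy to test mechanically. The sketch below does so in exact rational arithmetic for the running example. The helper names are ours, and the candidate $\sigma^* = (1/3, 2/3, 0)$, $\tau^* = (1/2, 0, 1/2)$ is our own computation from $A$ and $B$ (agent 2 indifferent between $t_1$ and $t_3$, agent 1 indifferent between $s_1$ and $s_2$), not a value stated in the text. The inputs are assumed to be valid mixed strategies.

```python
from fractions import Fraction as F

A = [[3, 3, 4], [4, 3, 3], [3, 4, 3]]
B = [[4, 5, 2], [4, 2, 5], [5, 4, 2]]


def payoffs_to_1(tau):
    """Expected payoff of each pure strategy s_i against tau: the vector A tau."""
    return [sum(a * t for a, t in zip(row, tau)) for row in A]


def payoffs_to_2(sigma):
    """Expected payoff of each pure strategy t_j against sigma: the vector B^T sigma."""
    return [sum(B[i][j] * sigma[i] for i in range(len(B))) for j in range(len(B[0]))]


def is_nash(sigma, tau):
    """Conditions (a) and (b) of Lemma 3.1.1: every pure strategy played with
    positive probability must earn the maximal expected payoff."""
    u, v = payoffs_to_1(tau), payoffs_to_2(sigma)
    ok_1 = all(s == 0 or u_i == max(u) for s, u_i in zip(sigma, u))
    ok_2 = all(t == 0 or v_j == max(v) for t, v_j in zip(tau, v))
    return ok_1 and ok_2


sigma_star = [F(1, 3), F(2, 3), F(0)]
tau_star = [F(1, 2), F(0), F(1, 2)]
print(is_nash(sigma_star, tau_star))                      # the candidate equilibrium
print(is_nash([F(1), F(0), F(0)], [F(0), F(1), F(0)]))    # the pure profile (s1, t2)
```

Exact fractions are used so that the equalities in the lemma can be tested with `==`; with floating point numbers a tolerance would be needed.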

The Lemke-Howson algorithm is based on the hope that if we relax one of the conditions above, say the one saying that either $\sigma_1 = 0$ or agent 1’s first pure strategy is a best response, then we may expect that the resulting set will be one dimensional. Specifically, we let $M$ be the set of pairs $(\sigma, \tau) \in \mathcal{S} \times \mathcal{T}$ satisfying:

(a) for each $i = 2, \ldots, m$, either $\sigma_i = 0$ or $\sum_{j=1}^{n} a_{ij}\tau_j \ge \sum_{j=1}^{n} a_{i'j}\tau_j$ for all $i' = 1, \ldots, m$;

(b) for each $j = 1, \ldots, n$, either $\tau_j = 0$ or $\sum_{i=1}^{m} b_{ij}\sigma_i \ge \sum_{i=1}^{m} b_{ij'}\sigma_i$ for all $j' = 1, \ldots, n$.

For the rest of the section we will assume that $M$ is 1-dimensional, and that it does not contain any point satisfying more than $m + n$ of the $2(m + n - 1)$ conditions “$\sigma_i = 0$,” “strategy $i$ is optimal,” “$\tau_j = 0$,” and “strategy $j$ is optimal,” for $2 \le i \le m$ and $1 \le j \le n$.

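For our example there is a path in $M$ that visits the points

$$(s_1, t_2) \longrightarrow (A, t_2) \longrightarrow (A, B) \longrightarrow (C, B) \longrightarrow (C, t_1) \longrightarrow (D, t_1) \longrightarrow (D, E).$$

This path alternates between the moves in $\mathcal{S}$ and the moves in $\mathcal{T}$ shown in Figure 3.2 below: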

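[Figure 3.2: the simplex $\mathcal{S}$ with the points A, C, and D marked, and the simplex $\mathcal{T}$ with the points B and E marked; the six moves of the path are numbered 1 to 6.]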

Let’s look at this path in detail. The best response to $s_1$ is $t_2$, so $(s_1, t_2) \in M$. The best response to $t_2$ is $s_3$, so there is an edge in $M$ leading away from $(s_1, t_2)$ that increases the probability of $s_3$ until $(A, t_2)$ is reached. We can’t continue further in this direction because $t_2$ would cease to be a best response. However, $t_1$ becomes a best response at $A$, so there is the possibility of holding $A$ fixed and moving away from $t_2$ along the edge of $\mathcal{T}$ between $t_1$ and $t_2$. We can’t continue in this way past $B$ because $s_3$ would no longer be a best response. However, at $B$ both $s_2$ and $s_3$ are best responses, so the conditions defining $M$ place no constraints on agent 1’s mixed strategy. Therefore we can move away from $(A, B)$ by holding $B$ fixed and moving into the interior of $\mathcal{S}$ in a way that obeys the constraints on agent 2’s mixed strategy, which are that $t_1$ and $t_2$ are best responses. This edge bumps into the boundary of $\mathcal{S}$ at $C$. Since the probability of $s_3$ is now zero, we are no longer required to have it be a best response, so we can continue from $B$ along the edge of $\mathcal{T}$ until we arrive at $t_1$. Since the probability of $t_2$ is now zero, we can move away from $C$ along the edge between $s_1$ and $s_2$ until we arrive at $D$. Since $t_3$ is now a best response, we can move away from $t_1$ along the edge between $t_1$ and $t_3$ until we arrive at $E$. As we saw above, $(D, E) = (\sigma^*, \tau^*)$ is a Nash equilibrium.

We now explain how this works in general. If $Y$ is a proper subset of $\{1, \ldots, m\}$ and $D$ is a nonempty subset of $\{1, \ldots, n\}$, let

$$S_Y(D) = \Bigl\{\, \sigma \in \mathcal{S} : \sigma_i = 0 \text{ for all } i \in Y \text{ and } D \subset \operatorname*{argmax}_{j = 1, \ldots, n} \sum_i b_{ij}\sigma_i \,\Bigr\}$$

be the set of mixed strategies for agent 1 that assign zero probability to every pure strategy in $Y$ and make every pure strategy in $D$ a best response. Evidently $S_Y(D)$ is a polytope.

It is now time to say what “typically” means. The matrix $B$ is said to be in Lemke-Howson general position if, for all $Y$ and $D$, $S_Y(D)$ is either empty or $(m - |D| - |Y|)$-dimensional. That is, $S_Y(D)$ has the dimensions one would expect by counting equations and unknowns. In particular, if $m < |D| + |Y|$, then $S_Y(D)$ is certainly empty.

Similarly, if $Z$ is a proper subset of $\{1, \ldots, n\}$ and $C$ is a nonempty subset of $\{1, \ldots, m\}$, let

$$T_Z(C) = \Bigl\{\, \tau \in \mathcal{T} : \tau_j = 0 \text{ for all } j \in Z \text{ and } C \subset \operatorname*{argmax}_{i = 1, \ldots, m} \sum_j a_{ij}\tau_j \,\Bigr\}.$$

The matrix $A$ is said to be in Lemke-Howson general position if, for all $Z$ and $C$, $T_Z(C)$ is either empty or $(n - |C| - |Z|)$-dimensional. Through the remainder of this section we assume that $A$ and $B$ are in Lemke-Howson general position.

The set of Nash equilibria is the union of the cartesian products $S_Y(D) \times T_Z(C)$ over all quadruples $(Y, D, Z, C)$ with $Y \cup C = \{1, \ldots, m\}$ and $Z \cup D = \{1, \ldots, n\}$. The general position assumption implies that if such a product is nonempty, then $|Y| + |C| = m$ and $|Z| + |D| = n$, so that $Y$ and $C$ are disjoint, as are $Z$ and $D$, and $S_Y(D) \times T_Z(C)$ is zero dimensional, i.e., a singleton. Thus the general position assumption implies that there are finitely many equilibria.

In addition, we now have

$$M = \bigcup S_Y(D) \times T_Z(C) \tag{$*$}$$

where the union is over all quadruples $(Y, D, Z, C)$ such that:


(a) Y and Z are proper subsets of {1, . . . , m} and {1, . . . , n};

(b) C and D are nonempty subsets of {1, . . . , m} and {1, . . . , n};

(c) {2, . . . , m} ⊂ Y ∪ C;

(d) {1, . . . , n} = Z ∪D;

(e) SY (D) and TZ(C) are nonempty.

A quadruple $(Y, D, Z, C)$ satisfying these conditions is said to be qualified. A vertex quadruple is a qualified quadruple $(Y, D, Z, C)$ such that $S_Y(D) \times T_Z(C)$ is 0-dimensional. It is the starting point of the algorithm if $Y = \{2, \ldots, m\}$, and it is a Nash equilibrium if $1 \in Y \cup C$.

An edge quadruple is a qualified quadruple $(Y, D, Z, C)$ such that $S_Y(D) \times T_Z(C)$ is 1-dimensional. A vertex quadruple $(Y', D', Z', C')$ is an endpoint of this edge quadruple if $Y \subset Y'$, $D \subset D'$, $Z \subset Z'$, and $C \subset C'$. It is easy to see that the edge quadruple has two endpoints: if $S_Y(D)$ is 1-dimensional, then it has two endpoints $S_{Y'}(D')$ and $S_{Y''}(D'')$, in which case $(Y', D', Z, C)$ and $(Y'', D'', Z, C)$ are the two endpoints of $(Y, D, Z, C)$, and similarly if $T_Z(C)$ is 1-dimensional.

Evidently $M$ is a graph. The picture we would like to establish is that it is a union of loops, paths whose endpoints are the Nash equilibria and the starting point of the algorithm, and possibly an isolated point if the starting point of the algorithm happens to be a Nash equilibrium. If this is the case we can find a Nash equilibrium by following the path leading away from the starting point until we reach its other endpoint, which is necessarily a Nash equilibrium.

Put another way, we would like to show that a vertex quadruple is an endpoint of zero, one, or two edge quadruples, and:

(i) if it is an endpoint of no edge quadruples, then it is both the starting point of the algorithm and a Nash equilibrium;

(ii) if it is an endpoint of one edge quadruple, then it is either the starting point of the algorithm, but not a Nash equilibrium, or a Nash equilibrium, but not the starting point of the algorithm;

(iii) if it is an endpoint of two edge quadruples, then it is neither the starting point of the algorithm nor a Nash equilibrium.

So, suppose that $(Y, D, Z, C)$ is a vertex quadruple. There are two main cases to consider, the first of which is that it is a Nash equilibrium, so that $1 \in Y \cup C$. If $1 \in Y$, then $(Y \setminus \{1\}, D, Z, C)$ is the only quadruple that could be an edge quadruple that has $(Y, D, Z, C)$ as an endpoint, and it is in fact such a quadruple: (a)-(d) hold obviously, and $S_{Y \setminus \{1\}}(D)$ is nonempty because $S_Y(D)$ is a nonempty subset. If $1 \in C$, then $(Y, D, Z, C \setminus \{1\})$ is the only quadruple that could be an edge quadruple that has $(Y, D, Z, C)$ as an endpoint, and the same logic shows that it is except when $C = \{1\}$, in which case $Y = \{2, \ldots, m\}$, i.e., $(Y, D, Z, C)$ is the starting point of the algorithm. Summarizing, if $(Y, D, Z, C)$ is a Nash equilibrium vertex quadruple, it is an endpoint of precisely one edge quadruple except when it is the starting point of the algorithm, in which case it is not an endpoint of any edge quadruple.

Now suppose that $(Y, D, Z, C)$ is not a Nash equilibrium. Since $S_Y(D)$ and $T_Z(C)$ are 0-dimensional, $|D| + |Y| = m$ and $|C| + |Z| = n$, so, in view of (e), one of the two intersections $Y \cap C$ and $Z \cap D$ is a singleton while the other is empty. First suppose that $Z \cap D = \{j\}$. Then $(Y, D, Z \setminus \{j\}, C)$ and $(Y, D \setminus \{j\}, Z, C)$ are the only quadruples that might be edge quadruples that have $(Y, D, Z, C)$ as an endpoint, and in fact both are: again (a)-(d) hold obviously (except that one must note that $|D| \ge 2$ because $|Z \cup D| = n$, $|Z| < n$, and $|Z \cap D| = 1$) and $S_Y(D \setminus \{j\})$ and $T_{Z \setminus \{j\}}(C)$ are both nonempty because $S_Y(D)$ and $T_Z(C)$ are nonempty subsets.

On the other hand, if $Y \cap C = \{i\}$, then $(Y \setminus \{i\}, D, Z, C)$ and $(Y, D, Z, C \setminus \{i\})$ are the only quadruples that might be edge quadruples that have $(Y, D, Z, C)$ as an endpoint. By the logic above, $(Y \setminus \{i\}, D, Z, C)$ certainly is, and $(Y, D, Z, C \setminus \{i\})$ is if $C \ne \{i\}$, and not otherwise. When $C = \{i\}$ we have $Y \cup C = \{2, \ldots, m\}$ and $Y \cap C = \{i\} = C$, so $Y = \{2, \ldots, m\}$, which is to say that $(Y, D, Z, C)$ is the starting point of the algorithm. In sum, if $(Y, D, Z, C)$ is not a Nash equilibrium, it is an endpoint of precisely two edge quadruples except when it is the starting point of the algorithm, in which case it is an endpoint of precisely one edge quadruple.

Taken together, these observations verify (i)-(iii), and complete the formal verification of the main properties of the Lemke-Howson algorithm. Two aspects of the procedure are worth noting. First, when $S_Y(D) \times T_Z(C)$ is a vertex that is an endpoint of two edges, the two edges are either $S_{Y \setminus \{i\}}(D) \times T_Z(C)$ and $S_Y(D) \times T_Z(C \setminus \{i\})$ for some $i$ or $S_Y(D) \times T_{Z \setminus \{j\}}(C)$ and $S_Y(D \setminus \{j\}) \times T_Z(C)$ for some $j$. In both cases one of the edges is the cartesian product of a line segment in $\mathcal{S}$ and a point in $\mathcal{T}$ while the other is the cartesian product of a point in $\mathcal{S}$ and a line segment in $\mathcal{T}$. Geometrically, the algorithm alternates between motion in $\mathcal{S}$ and motion in $\mathcal{T}$.

Second, although our discussion has singled out the first pure strategy of agent 1, this was arbitrary, and any pure strategy of either player could be designated for this role. It is quite possible that different choices will lead to different equilibria. In addition, although the algorithm was described in terms of starting at this pure strategy and its best response, the path following procedure can be started at any endpoint of a path in $M$. In particular, having computed a Nash equilibrium using one designated pure strategy, we can then switch to a different designated pure strategy and follow the path, for the new designated pure strategy, going away from the equilibrium. This path may go to the starting point of the algorithm for the new designated pure strategy, but it is also quite possible that it leads to a Nash equilibrium that cannot be reached directly by the algorithm using any designated pure strategy. Equilibria that can be reached by repeated applications of this maneuver are said to be accessible. A famous example due to Robert Wilson (reported in Shapley (1974)) shows that there can be inaccessible equilibria even in games with a surprisingly small number of pure strategies.


3.2 Implementation and Degeneracy Resolution

We have described the Lemke-Howson algorithm geometrically, in terms that a human can picture, but that is not quite the same thing as providing a description in terms of concrete, fully elaborated, algebraic operations. This section provides such a description. In addition, our discussion to this point has assumed a game in Lemke-Howson general position. In order to prove that any game has a Nash equilibrium it suffices to show that games in general position are dense in the set of pairs $(A, B)$ of $m \times n$ matrices, because it is easy to see that if $(A^r, B^r)$ is a sequence converging to $(A, B)$, and for each $r$ we have a Nash equilibrium $(\sigma^r, \tau^r)$ of (the game with payoff matrices) $(A^r, B^r)$, then along some subsequence we have $(\sigma^r, \tau^r) \to (\sigma, \tau)$, and $(\sigma, \tau)$ is a Nash equilibrium of $(A, B)$. However, we will do something quite a bit more elegant and useful, providing a refinement of the Lemke-Howson algorithm that works even for games that are not in Lemke-Howson general position.

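The formulation of the Nash equilibrium problem we have been working with so far is a matter of finding $u^*, v^* \in \mathbb{R}$, $s', \sigma' \in \mathbb{R}^m$, and $t', \tau' \in \mathbb{R}^n$ such that:

$$A\tau' + s' = u^* e_m, \quad B^T\sigma' + t' = v^* e_n, \quad \langle s', \sigma' \rangle = 0 = \langle t', \tau' \rangle, \quad \langle \sigma', e_m \rangle = 1 = \langle \tau', e_n \rangle,$$

$$s', \sigma' \ge 0 \in \mathbb{R}^m, \quad t', \tau' \ge 0 \in \mathbb{R}^n.$$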

The set of Nash equilibria is unaffected if we add a constant to every entry in a column of $A$, or to every entry of a row of $B$. Therefore we may assume that all the entries of $A$ and $B$ are positive, and will do so henceforth. Now the equilibrium utilities $u^*$ and $v^*$ are necessarily positive, so we can divide in the system above, obtaining the system

$$A\tau + s = e_m, \quad B^T\sigma + t = e_n, \quad \langle s, \sigma \rangle = 0 = \langle t, \tau \rangle, \quad s, \sigma \ge 0 \in \mathbb{R}^m, \quad t, \tau \ge 0 \in \mathbb{R}^n$$

together with the formulas $\langle \sigma, e_m \rangle = 1/v^*$ and $\langle \tau, e_n \rangle = 1/u^*$ for computing equilibrium expected payoffs. The components of $s$ and $t$ are called slack variables.

This new system is not quite equivalent to the one above because the one above in effect requires that $\sigma$ and $\tau$ each have some positive components. The new system has another solution that does not come from a Nash equilibrium, namely $\sigma = 0$, $\tau = 0$, $s = e_m$, and $t = e_n$. It is called the extraneous solution. To see that this is the only new solution consider that if $\sigma = 0$, then $t = e_n$, so that $\langle t, \tau \rangle = 0$ implies $\tau = 0$, and similarly $\tau = 0$ implies that $\sigma = 0$.
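To see the change of variables at work, the sketch below rescales the equilibrium of the running example and checks every condition of the new system in exact arithmetic. The function names `rescale` and `solves_system` are hypothetical, and the equilibrium $\sigma^* = (1/3, 2/3, 0)$, $\tau^* = (1/2, 0, 1/2)$ fed to it is our own computation from the matrices of Section 3.1.

```python
from fractions import Fraction as F

A = [[3, 3, 4], [4, 3, 3], [3, 4, 3]]
B = [[4, 5, 2], [4, 2, 5], [5, 4, 2]]
m, n = len(A), len(A[0])


def rescale(sigma_eq, tau_eq):
    """Map a Nash equilibrium to a solution (sigma, tau, s, t) of the rescaled system."""
    u_star = sum(sigma_eq[i] * A[i][j] * tau_eq[j] for i in range(m) for j in range(n))
    v_star = sum(sigma_eq[i] * B[i][j] * tau_eq[j] for i in range(m) for j in range(n))
    sigma = [p / v_star for p in sigma_eq]   # so that <sigma, e_m> = 1 / v*
    tau = [p / u_star for p in tau_eq]       # so that <tau, e_n> = 1 / u*
    s = [1 - sum(A[i][j] * tau[j] for j in range(n)) for i in range(m)]    # s = e_m - A tau
    t = [1 - sum(B[i][j] * sigma[i] for i in range(m)) for j in range(n)]  # t = e_n - B^T sigma
    return sigma, tau, s, t


def solves_system(sigma, tau, s, t):
    """Check the two linear equations, nonnegativity, and complementary slackness."""
    eq_1 = all(sum(A[i][j] * tau[j] for j in range(n)) + s[i] == 1 for i in range(m))
    eq_2 = all(sum(B[i][j] * sigma[i] for i in range(m)) + t[j] == 1 for j in range(n))
    nonneg = all(z >= 0 for z in sigma + tau + s + t)
    comp = sum(a * b for a, b in zip(s, sigma)) == 0 == sum(a * b for a, b in zip(t, tau))
    return eq_1 and eq_2 and nonneg and comp


sigma, tau, s, t = rescale([F(1, 3), F(2, 3), F(0)], [F(1, 2), F(0), F(1, 2)])
print(sigma, tau, s, t)
print(solves_system(sigma, tau, s, t))
zero, one = [F(0)] * 3, [F(1)] * 3
print(solves_system(zero, zero, one, one))   # the extraneous solution also passes
```

For this example the rescaled point is $\sigma = (1/12, 1/6, 0)$, $\tau = (1/7, 0, 1/7)$ with slacks $s = (0, 0, 1/7)$ and $t = (0, 1/4, 0)$, consistent with $v^* = 4$ and $u^* = 7/2$.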

We now wish to see the geometry of the Lemke-Howson algorithm in the new coordinate system. Let

$$S^* = \{\, \sigma \in \mathbb{R}^m : \sigma \ge 0 \text{ and } B^T\sigma \le e_n \,\} \quad\text{and}\quad T^* = \{\, \tau \in \mathbb{R}^n : \tau \ge 0 \text{ and } A\tau \le e_m \,\}.$$

There is a bijection $\sigma \mapsto \sigma / \sum_i \sigma_i$ between the points on the upper surface of $S^*$, namely those for which some component of $e_n - B^T\sigma$ is zero, and the points of $\mathcal{S}$, and similarly for $T^*$ and $\mathcal{T}$.

For the game studied in the last section the polytopes $S^*$ and $T^*$ are shown in Figure 3.3 below. Note that the best response regions in Figure 3.1 have become facets.

[Figure 3.3: the polytope $S^*$ (axes σ1, σ2, σ3; facets labeled t1, t2, t3) and the polytope $T^*$ (axes τ1, τ2, τ3; facets labeled s1, s2, s3).]

We now transport the Lemke-Howson algorithm to this framework. Let $M^*$ be the set of $(\sigma, \tau) \in S^* \times T^*$ such that, when we set $s = e_m - A\tau$ and $t = e_n - B^T\sigma$, we have

(a) for each $i = 2, \ldots, m$, either $\sigma_i = 0$ or $s_i = 0$;

(b) for each $j = 1, \ldots, n$, either $\tau_j = 0$ or $t_j = 0$.

For our running example we can follow a path in $M^*$ from $(0, 0)$ to the image of the Nash equilibrium, as shown in Figure 3.4. This path has a couple more edges than the one in Figure 3.2, but there is the advantage of starting at $(0, 0)$, which is a bit more canonical.

If we set

$$\ell = m + n, \quad C = \begin{bmatrix} 0 & A \\ B^T & 0 \end{bmatrix}, \quad q = e_\ell, \quad y = (\sigma, \tau), \quad\text{and}\quad x = (s, t),$$

the system above is equivalent to

$$Cy + x = q, \qquad \langle x, y \rangle = 0, \qquad x, y \ge 0 \in \mathbb{R}^\ell. \tag{$*$}$$

This is called the linear complementarity problem. It arises in a variety of other settings, and is very extensively studied. The framework of the linear complementarity problem is simpler conceptually and notationally, and it allows somewhat greater generality, so we will work with it for the remainder of this section.


Let

$$P = \{\, (x, y) \in \mathbb{R}^\ell \times \mathbb{R}^\ell : x \ge 0,\ y \ge 0, \text{ and } Cy + x = q \,\}.$$

We will assume that all the components of $q$ are positive, that all the entries of $C$ are nonnegative, and that each row of $C$ has at least one positive entry, so that $P$ is bounded and thus a polytope. In general a $d$-dimensional polytope is said to be simple if each of its vertices is in exactly $d$ facets. The condition that generalizes the general position assumption on $A$ and $B$ is that $P$ is simple.

Let the projection of $P$ onto the second copy of $\mathbb{R}^\ell$ be

$$Q = \{\, y \in \mathbb{R}^\ell : y \ge 0 \text{ and } Cy \le q \,\}.$$

If $C = \begin{bmatrix} 0 & A \\ B^T & 0 \end{bmatrix}$ and $q = e_\ell$, then $Q = S^* \times T^*$, and each edge of $Q$ is either the cartesian product of a vertex of $S^*$ and an edge of $T^*$ or the cartesian product of an edge of $S^*$ and a vertex of $T^*$.

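[Figure 3.4: the path in $M^*$ for the running example, drawn in $S^*$ (axes σ1, σ2, σ3; facets labeled t1, t2, t3) and $T^*$ (axes τ1, τ2, τ3; facets labeled s1, s2, s3).]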

Our problem is to find an $(x, y) \in P$ with $y \ne 0$ satisfying the “complementary slackness condition” $\langle x, y \rangle = 0$. The algorithm follows the path starting at $(x, y) = (q, 0)$ in

$$M^{**} = \{\, (x, y) \in P : x_2 y_2 + \cdots + x_\ell y_\ell = 0 \,\}.$$

The equation $x_2 y_2 + \cdots + x_\ell y_\ell = 0$ encodes the condition that for each $j = 2, \ldots, \ell$, either $x_j = 0$ or $y_j = 0$. Suppose we are at a vertex $(x, y)$ of $P$ satisfying this condition, but not $x_1 y_1 = 0$. Since $P$ is simple, exactly $\ell$ of the variables $x_2, \ldots, x_\ell, y_2, \ldots, y_\ell$ vanish, so there is some $i$ such that $x_i = 0 = y_i$. The portion of $P$ where $x_i \ge 0$ and the other $\ell - 1$ variables vanish is an edge of $P$ whose other endpoint is the first point where one of the $\ell$ variables that are positive at $(x, y)$ vanishes. Again, since $P$ is simple, precisely one of those variables vanishes there.

How should we describe moving from one vertex to the next algebraically? Consider specifically the move away from $(x, y) = (q, 0)$. Observe that $P$ is the graph of the function $y \mapsto q - Cy$ from $Q$ to $\mathbb{R}^\ell$. We explicitly write out the system of equations describing this function:

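$$
\begin{aligned}
x_1 &= q_1 - c_{11} y_1 - \cdots - c_{1\ell} y_\ell, \\
&\;\;\vdots \\
x_i &= q_i - c_{i1} y_1 - \cdots - c_{i\ell} y_\ell, \\
&\;\;\vdots \\
x_\ell &= q_\ell - c_{\ell 1} y_1 - \cdots - c_{\ell\ell} y_\ell.
\end{aligned}
$$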

As we increase $y_1$, holding $0 = y_2 = \cdots = y_\ell$, the constraint we bump into first is the one requiring $x_i \ge 0$ for the $i$ for which $q_i / c_{i1}$ is minimal (among those $i$ with $c_{i1} > 0$). If $i = 1$, then the point we arrived at is a solution and the algorithm halts, so we may suppose that $i \ge 2$.

We now want to describe $P$ as the graph of a function with domain in the $x_i, y_2, \ldots, y_\ell$ coordinate subspace, and $x_1, \ldots, x_{i-1}, y_1, x_{i+1}, \ldots, x_\ell$ as the variables parameterizing the range. To this end we rewrite the $i$th equation as

$$y_1 = \frac{1}{c_{i1}} q_i - \frac{1}{c_{i1}} x_i - \frac{c_{i2}}{c_{i1}} y_2 - \cdots - \frac{c_{i\ell}}{c_{i1}} y_\ell.$$

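Replacing the $i$th equation above with this, and substituting it into the other equations, gives

$$
\begin{aligned}
x_1 &= \Bigl( q_1 - \frac{c_{11}}{c_{i1}} q_i \Bigr) - \Bigl( -\frac{c_{11}}{c_{i1}} \Bigr) x_i - \Bigl( c_{12} - \frac{c_{11} c_{i2}}{c_{i1}} \Bigr) y_2 - \cdots - \Bigl( c_{1\ell} - \frac{c_{11} c_{i\ell}}{c_{i1}} \Bigr) y_\ell, \\
&\;\;\vdots \\
y_1 &= \frac{1}{c_{i1}} q_i - \frac{1}{c_{i1}} x_i - \frac{c_{i2}}{c_{i1}} y_2 - \cdots - \frac{c_{i\ell}}{c_{i1}} y_\ell, \\
&\;\;\vdots \\
x_\ell &= \Bigl( q_\ell - \frac{c_{\ell 1}}{c_{i1}} q_i \Bigr) - \Bigl( -\frac{c_{\ell 1}}{c_{i1}} \Bigr) x_i - \Bigl( c_{\ell 2} - \frac{c_{\ell 1} c_{i2}}{c_{i1}} \Bigr) y_2 - \cdots - \Bigl( c_{\ell\ell} - \frac{c_{\ell 1} c_{i\ell}}{c_{i1}} \Bigr) y_\ell.
\end{aligned}
$$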

This is not exactly a thing of beauty, but it evidently has the same form as what we started with. The data of the algorithm consists of a tableau $[q', C']$, a list describing how the rows and the last $\ell$ columns of the tableau correspond to the original variables of the problem, and the variable that vanished when we arrived at the corresponding vertex. If this variable is either $x_1$ or $y_1$ we are done. Otherwise the data is updated by letting the variable that is complementary to this one increase, finding the next variable that will vanish when we do so, then updating the list and the tableau appropriately. This process is called pivoting.
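The pivoting procedure can be sketched in a few lines of exact rational arithmetic. This is an illustration under stated assumptions rather than a reference implementation: the function name `lemke_howson` and the tableau layout (the full matrix $[I \mid C \mid q]$, instead of the compact $[q', C']$ with a separate list) are our own choices, all entries of $A$ and $B$ are assumed positive, and the problem is assumed nondegenerate, so ties in the ratio test are not resolved.

```python
from fractions import Fraction as F


def lemke_howson(A, B):
    """Complementary pivoting for  C y + x = q  with C = [[0, A], [B^T, 0]] and q = e.

    Assumes positive payoffs and nondegeneracy. Returns the Nash equilibrium (sigma, tau)
    reached from (x, y) = (q, 0) when y_1 is the first variable allowed to increase.
    """
    m, n = len(A), len(A[0])
    ell = m + n
    # Row r encodes x_r + (C y)_r = q_r. Columns 0..ell-1 are x, ell..2*ell-1 are y, last is rhs.
    T = [[F(0)] * (2 * ell + 1) for _ in range(ell)]
    for r in range(ell):
        T[r][r] = F(1)
        T[r][-1] = F(1)
    for i in range(m):
        for j in range(n):
            T[i][ell + m + j] = F(A[i][j])   # block A multiplies tau
            T[m + j][ell + i] = F(B[i][j])   # block B^T multiplies sigma
    basis = list(range(ell))                 # at (q, 0) the x variables are the positive ones
    entering = ell                           # index of y_1
    while True:
        # Ratio test: the first currently positive variable driven to zero.
        rows = [r for r in range(ell) if T[r][entering] > 0]
        r = min(rows, key=lambda k: T[k][-1] / T[k][entering])
        pivot = T[r][entering]
        T[r] = [v / pivot for v in T[r]]
        for k in range(ell):
            if k != r and T[k][entering] != 0:
                factor = T[k][entering]
                T[k] = [a - factor * b for a, b in zip(T[k], T[r])]
        leaving, basis[r] = basis[r], entering
        if leaving in (0, ell):              # x_1 or y_1 vanished, so <x, y> = 0 holds
            break
        entering = (leaving + ell) % (2 * ell)   # complement of the variable that vanished
    y = [F(0)] * ell
    for r, var in enumerate(basis):
        if var >= ell:
            y[var - ell] = T[r][-1]
    sigma, tau = y[:m], y[m:]
    return [p / sum(sigma) for p in sigma], [p / sum(tau) for p in tau]


A = [[3, 3, 4], [4, 3, 3], [3, 4, 3]]
B = [[4, 5, 2], [4, 2, 5], [5, 4, 2]]
print(lemke_howson(A, B))
```

On the matrices of the running example this takes eight pivots and ends when $x_1$ (the slack $s_1$) vanishes. After normalizing, the result agrees with the equilibrium found by hand in Section 3.1. Keeping the identity columns in the tableau costs some memory but makes the update a plain row reduction.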

We can now describe how the algorithm works in the degenerate case when $P$ is not necessarily simple. From a conceptual point of view, our method of handling degenerate problems is to deform them slightly, so that they become nondegenerate, but in the end we will have only a combinatoric rule for choosing the next pivot variable. Let $L = \{\, (x, y) \in \mathbb{R}^\ell \times \mathbb{R}^\ell : Cy + x = q \,\}$, let $\alpha_1, \ldots, \alpha_\ell, \beta_1, \ldots, \beta_\ell$ be distinct positive integers, and for $\varepsilon > 0$ let

$$P_\varepsilon = \{\, (x, y) \in L : x_i \ge -\varepsilon^{\alpha_i} \text{ and } y_i \ge -\varepsilon^{\beta_i} \text{ for all } i = 1, \ldots, \ell \,\}.$$

If $(x, y)$ is a vertex of $P_\varepsilon$, then there are $\ell$ variables, which we will describe as “free variables,” whose corresponding equations $x_i = -\varepsilon^{\alpha_i}$ and $y_i = -\varepsilon^{\beta_i}$ determine $(x, y)$ as the unique member of $L$ satisfying them. At the point in $L$ where these equations are satisfied, the other variables can be written as linear combinations of the free variables, and thus as polynomial functions of $\varepsilon$. Because the $\alpha_i$ and $\beta_i$ are all different, there are only finitely many values of $\varepsilon$ such that any of the other variables vanish at this vertex. Because there are finitely many $\ell$-element subsets of the $2\ell$ variables, it follows that $P_\varepsilon$ is simple for all but finitely many values of $\varepsilon$.

In particular, for all $\varepsilon$ in some interval $(0, \bar\varepsilon)$ the combinatoric structure of $P_\varepsilon$ will be independent of $\varepsilon$. In addition, we do not actually need to work in $P_\varepsilon$ because the pivoting procedure, applied to the polytope $P_\varepsilon$ for such $\varepsilon$, will follow a well defined path that can be described in terms of a combinatoric procedure for choosing the next pivot variable.

To see what we mean by this consider the problem of finding which $x_i$ first goes below $-\varepsilon^{\alpha_i}$ as we go out the line $y_1 \ge -\varepsilon^{\beta_1}$, $y_2 = -\varepsilon^{\beta_2}$, \ldots, $y_\ell = -\varepsilon^{\beta_\ell}$. This is basically a process of elimination. If $c_{i1} \le 0$, then increasing $y_1$ never leads to a violation of the $i$th constraint, so we can begin by eliminating all those $i$ for which $c_{i1}$ is not positive. Among the remaining $i$, the problem is to find the $i$ for which

$$\frac{1}{c_{i1}} q_i + \frac{1}{c_{i1}} \varepsilon^{\alpha_i} + \frac{c_{i2}}{c_{i1}} \varepsilon^{\beta_2} + \cdots + \frac{c_{i\ell}}{c_{i1}} \varepsilon^{\beta_\ell}$$

is smallest for small $\varepsilon > 0$. The next step is to eliminate all $i$ for which $q_i / c_{i1}$ is not minimal. For each $i$ that remains the expression

$$\frac{1}{c_{i1}} \varepsilon^{\alpha_i} + \frac{c_{i2}}{c_{i1}} \varepsilon^{\beta_2} + \cdots + \frac{c_{i\ell}}{c_{i1}} \varepsilon^{\beta_\ell}$$

has a dominant term, namely the term, among those with nonzero coefficients, whose exponent is smallest. The dominant terms are ordered according to their values for small $\varepsilon > 0$:

(a) terms with positive coefficients are greater than terms with negative coefficients;

(b) among terms with positive coefficients, those with smaller exponents are greater than terms with larger exponents, and if two terms have equal exponents they are ordered according to the coefficients;

(c) among terms with negative coefficients, those with larger exponents are greater than terms with smaller exponents, and if two terms have equal exponents they are ordered according to the coefficients.

We now eliminate all $i$ for which the dominant term is not minimal. All remaining $i$ have the same dominant term, and we continue by subtracting off this term and comparing the resulting expressions in a similar manner, repeating until only one $i$ remains. This process does necessarily continue until only one $i$ remains, because if other terms of the expressions above fail to distinguish between two possibilities, eventually there will be a comparison involving the terms $\varepsilon^{\alpha_i} / c_{i1}$, and the exponents $\alpha_1, \ldots, \alpha_\ell, \beta_1, \ldots, \beta_\ell$ are distinct.
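One way to organize this comparison in code is to store each expression as a map from exponents of $\varepsilon$ to coefficients and to compare two expressions by the sign of the lowest-order nonzero term of their difference, which decides the order for all sufficiently small $\varepsilon > 0$. The sketch below does this for the first pivot only (entering variable $y_1$, zero-based indices). The function names and the small test data are hypothetical, and the repeated elimination of the text is replaced by this equivalent pairwise comparison.

```python
from fractions import Fraction as F


def perturbed_ratio(i, q, C, alpha, beta):
    """Coefficients, keyed by exponent of epsilon, of
    (q_i + eps^alpha_i + sum over j >= 2 of c_ij eps^beta_j) / c_i1."""
    c = F(C[i][0])
    poly = {0: F(q[i]) / c, alpha[i]: 1 / c}
    for j in range(1, len(C[i])):
        if C[i][j] != 0:
            poly[beta[j]] = poly.get(beta[j], F(0)) + F(C[i][j]) / c
    return poly


def smaller_for_small_eps(p, r):
    """True if p(eps) < r(eps) for all sufficiently small eps > 0."""
    for e in sorted(set(p) | set(r)):
        diff = p.get(e, 0) - r.get(e, 0)
        if diff != 0:
            return diff < 0     # the lowest-order nonzero term of p - r decides
    return False


def leaving_row(q, C, alpha, beta):
    """Row i with c_i1 > 0 whose perturbed ratio is smallest for small eps > 0."""
    best = None
    for i in range(len(q)):
        if C[i][0] > 0:
            cand = perturbed_ratio(i, q, C, alpha, beta)
            if best is None or smaller_for_small_eps(cand, best[1]):
                best = (i, cand)
    return best[0]


# A degenerate toy instance: both rows have q_i / c_i1 = 1/2, so only the exponents decide.
q = [1, 1]
C = [[2, 1], [2, 3]]
print(leaving_row(q, C, alpha=[1, 2], beta=[3, 4]))
print(leaving_row(q, C, alpha=[2, 1], beta=[3, 4]))
```

In the first call row 0 carries the term $\tfrac12\varepsilon^{1}$ and row 1 the term $\tfrac12\varepsilon^{2}$, so row 1 is smaller for small $\varepsilon$. Swapping the exponents reverses the choice.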

Let’s review the situation. We have given an algorithm that finds a solution of the linear complementarity problem ($*$) that is different from $(q, 0)$. The assumptions that ensure that the algorithm works are that $q \ge 0$ and that $P$ is a polytope. In particular, these assumptions are satisfied when the linear complementarity problem is derived from a two person game with positive payoffs, in which case any solution other than $(q, 0)$ corresponds to a Nash equilibrium. Therefore any two person game with positive payoffs has a Nash equilibrium, but since the equilibrium conditions are unaffected by adding a constant to a player’s payoffs, in fact we have now shown that any two person game has a Nash equilibrium.

There are additional issues that arise in connection with implementing the algorithm, since computers cannot do exact arithmetic on arbitrary real numbers. One possibility is to require that the entries of $q$ and $C$ lie in a set of numbers for which exact arithmetic is possible (usually the rationals, but there are other possibilities, at least theoretically). Alternatively, one may work with floating point numbers, which is more practical, but also more demanding because there are issues associated with round-off error, and in particular its accumulation as the number of pivots increases. The sort of pivoting we have studied here also underlies the simplex algorithm for linear programming, and the same sorts of ideas are applied to resolve degeneracy. Numerical analysis for linear programming has a huge amount of theory, much of which is applicable to the Lemke-Howson algorithm, but it is far beyond our scope.

3.3 Using Games to Find Fixed Points

It is surprisingly easy to use the existence of equilibrium in two person games to prove Kakutani’s fixed point theorem in full generality. The key idea has a simple description. Fix a nonempty compact convex $X \subset \mathbb{R}^d$, and let $F : X \to X$ be a (not necessarily convex valued or upper semicontinuous) correspondence with compact values. We can define a two person game with strategy sets $S = T = X$ by setting

$$u(s,t) = -\min_{x \in F(t)} \|s - x\|^2 \quad\text{and}\quad v(s,t) = \begin{cases} 0, & s \ne t, \\ 1, & s = t. \end{cases}$$

If $(s, t)$ is a Nash equilibrium, then $s \in F(t)$ and $t = s$, so $s = t$ is a fixed point. Conversely, if $x$ is a fixed point, then $(x, x)$ is a Nash equilibrium.


Of course this observation does not prove anything, but it does point in a useful direction. Let $x_1, \dots, x_n, y_1, \dots, y_n \in X$ be given. We can define a finite two person game with $n \times n$ payoff matrices $A = (a_{ij})$ and $B = (b_{ij})$ by setting

$$a_{ij} = -\|x_i - y_j\|^2 \quad\text{and}\quad b_{ij} = \begin{cases} 0, & i \ne j, \\ 1, & i = j. \end{cases}$$

Let $(\sigma, \tau) \in \Delta^{n-1} \times \Delta^{n-1}$ be a mixed strategy profile. Clearly $\tau$ is a best response to $\sigma$ if and only if it assigns all probability to the strategies that are assigned maximum probability by $\sigma$, which is to say that $\tau_j > 0$ implies that $\sigma_j \ge \sigma_i$ for all $i$.

Understanding when $\sigma$ is a best response to $\tau$ requires a brief calculation. Let $z = \sum_{j=1}^n \tau_j y_j$. For each $i$ we have

$$\begin{aligned}
\sum_j a_{ij}\tau_j &= -\sum_j \tau_j \|x_i - y_j\|^2 = -\sum_j \tau_j \langle x_i - y_j, x_i - y_j \rangle \\
&= -\sum_j \tau_j \langle x_i, x_i \rangle + 2\sum_j \tau_j \langle x_i, y_j \rangle - \sum_j \tau_j \langle y_j, y_j \rangle \\
&= -\langle x_i, x_i \rangle + 2\langle x_i, z \rangle - \langle z, z \rangle + C = -\|x_i - z\|^2 + C
\end{aligned}$$

where $C = \|z\|^2 - \sum_{j=1}^n \tau_j \|y_j\|^2$ is a quantity that does not depend on $i$. Therefore $\sigma$ is a best response to $\tau$ if and only if it assigns all probability to those $i$ with $x_i$ as close to $z$ as possible. If $y_1 \in F(x_1), \dots, y_n \in F(x_n)$, then there is a sense in which a Nash equilibrium may be regarded as a “point that is approximately fixed.”
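The calculation above can be checked numerically. The following is a minimal sketch, assuming random points in the unit square and a random mixed strategy $\tau$; the helper names (`sqdist`, `game_from_points`, `row_payoffs`) are hypothetical and chosen only for this illustration.

```python
import random

def sqdist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def game_from_points(xs, ys):
    """Payoff matrices a_ij = -||x_i - y_j||^2 and b_ij = 1 if i == j else 0."""
    n = len(xs)
    A = [[-sqdist(xs[i], ys[j]) for j in range(n)] for i in range(n)]
    B = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    return A, B

def row_payoffs(A, tau):
    """Expected payoff of each pure row strategy against the mixed strategy tau."""
    return [sum(a * t for a, t in zip(row, tau)) for row in A]

# Illustrative data (an assumption of this sketch, not taken from the text).
random.seed(0)
d, n = 2, 5
xs = [tuple(random.random() for _ in range(d)) for _ in range(n)]
ys = [tuple(random.random() for _ in range(d)) for _ in range(n)]
w = [random.random() for _ in range(n)]
tau = [v / sum(w) for v in w]

A, B = game_from_points(xs, ys)
z = tuple(sum(t * y[k] for t, y in zip(tau, ys)) for k in range(d))
C = sum(c * c for c in z) - sum(t * sum(c * c for c in y) for t, y in zip(tau, ys))

payoffs = row_payoffs(A, tau)
# Largest deviation from the identity sum_j a_ij tau_j = -||x_i - z||^2 + C.
identity_gap = max(abs(payoffs[i] - (-sqdist(xs[i], z) + C)) for i in range(n))
best_row = max(range(n), key=lambda i: payoffs[i])        # pure best response to tau
closest = min(range(n), key=lambda i: sqdist(xs[i], z))   # index of the x_i nearest z
```

Up to rounding, `identity_gap` should vanish, and the pure best response `best_row` should coincide with the index `closest` of the point nearest to $z$.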

We are going to make this precise, thereby proving Kakutani’s fixed point theorem. Assume now that $F$ is upper semicontinuous with convex values. Define sequences $x_1, x_2, \dots$ and $y_1, y_2, \dots$ inductively as follows. Choose $x_1$ arbitrarily, and let $y_1$ be an element of $F(x_1)$. Supposing that $x_1, \dots, x_n$ and $y_1, \dots, y_n$ have already been determined, let $(\sigma^n, \tau^n)$ be a Nash equilibrium of the two person game with payoff matrices $A^n = (a^n_{ij})$ and $B^n = (b^n_{ij})$ where $a^n_{ij} = -\|x_i - y_j\|^2$ and $b^n_{ij}$ is 1 if $i = j$ and 0 otherwise. Let $x_{n+1} = \sum_j \tau^n_j y_j$, and choose $y_{n+1} \in F(x_{n+1})$.

Let $x^*$ be an accumulation point of the sequence $\{x_n\}$. To show that $x^*$ is a fixed point of $F$ it suffices to show that it is an element of the closure of any convex neighborhood $V$ of $F(x^*)$. Choose $\delta > 0$ such that $F(x) \subset V$ for all $x \in U_\delta(x^*)$. Consider an $n$ such that $x_{n+1} = \sum_j \tau^n_j y_j \in U_{\delta/3}(x^*)$ and at least one of $x_1, \dots, x_n$ is also in this ball. Then the points in $x_1, \dots, x_n$ that are closest to $x_{n+1}$ are in $U_{2\delta/3}(x_{n+1}) \subset U_\delta(x^*)$. Since $\tau^n_j > 0$ only if $x_j$ is one of these closest points, each such $y_j \in F(x_j)$ lies in $V$, so $x_{n+1}$ is a convex combination of points in $V$, and is therefore in $V$. Therefore $x^*$ is in the closure of the set of $x_n$ that lie in $V$, and thus in the closure of $V$.

In addition to proving the Kakutani fixed point theorem, we have accumulated all the components of an algorithm for computing approximately fixed points of a continuous function $f : X \to X$. Specifically, for any error tolerance $\varepsilon > 0$ we compute the sequences $x_1, x_2, \dots$ and $y_1, y_2, \dots$ with $f$ in place of $F$, halting when $\|x_{n+1} - f(x_{n+1})\| < \varepsilon$. The argument above shows that this is, in fact, an algorithm, in the sense that it is guaranteed to halt eventually. This algorithm is quite new. Code implementing it exists, and the initial impression is that it performs quite well. But it has not been extensively tested.
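To make the iteration concrete, here is a minimal sketch for the special case $X = [0, 1] \subset \mathbb{R}$. It is not the implementation referred to above: as a simplifying assumption, the Nash equilibrium of each finite game is found by brute-force enumeration of supports of size one and two (which suffices in one dimension when the $x_i$ are distinct) rather than by the Lemke-Howson algorithm. The function names are hypothetical.

```python
import math

def equilibrium_point_1d(xs, ys, tol=1e-12):
    """Return z = sum_j tau_j y_j for some Nash equilibrium (sigma, tau) of the
    game with a_ij = -(x_i - y_j)^2 and b_ij = 1 if i == j else 0 (1-D case)."""
    # Pure equilibria: tau = sigma = e_j, which requires x_j to be closest to z = y_j.
    for xj, yj in zip(xs, ys):
        if abs(xj - yj) <= min(abs(xi - yj) for xi in xs) + tol:
            return yj
    # Two-point supports {j, k}: z is equidistant from adjacent x_j < x_k, so z is
    # their midpoint, and z must be a convex combination of y_j and y_k.
    pts = sorted(zip(xs, ys))
    for (xj, yj), (xk, yk) in zip(pts, pts[1:]):
        z = 0.5 * (xj + xk)
        if min(yj, yk) - tol <= z <= max(yj, yk) + tol:
            return z
    raise RuntimeError("no equilibrium found (numerical trouble)")

def approx_fixed_point(f, x1, eps=1e-6, max_iter=1000):
    """Iterate x_{n+1} = sum_j tau^n_j y_j with y_j = f(x_j) until |x - f(x)| < eps."""
    xs, ys = [x1], [f(x1)]
    while abs(xs[-1] - ys[-1]) >= eps:
        if len(xs) > max_iter:
            raise RuntimeError("iteration budget exhausted")
        z = equilibrium_point_1d(xs, ys)
        xs.append(z)
        ys.append(f(z))
    return xs[-1]

# Usage: cos maps [0, 1] into itself, so it has a fixed point there.
root = approx_fixed_point(math.cos, 0.0)
```

In this one-dimensional setting the equilibria amount to a bracketing scheme around the fixed point; in higher dimensions a genuine equilibrium solver would be needed.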


There is one more idea that may have some algorithmic interest. As before, we consider points $x_1, \dots, x_n, y_1, \dots, y_n \in \mathbb{R}^d$. Define a correspondence $\Phi : \mathbb{R}^d \to \mathbb{R}^d$ by letting $\Phi(z)$ be the convex hull of $\{\, y_j : j \in \operatorname{argmin}_i \|z - x_i\| \,\}$ when $z \in P_J$. (Evidently this construction is closely related to the Voronoi diagram determined by $x_1, \dots, x_n$. Recall that this is the polyhedral decomposition of $\mathbb{R}^d$ whose nonempty polyhedra are the sets $P_J = \{\, z \in V : J \subset \operatorname{argmin}_i \|z - x_i\| \,\}$ where $\emptyset \ne J \subset \{1, \dots, n\}$.) Clearly $\Phi$ is upper semicontinuous and convex valued.

Suppose that $z$ is a fixed point of this correspondence. Then $z$ is a convex combination $\sum_j \tau_j y_j$ with $\tau_j = 0$ if $j \notin \operatorname{argmin}_i \|z - x_i\|$. Let $J = \{\, j : \tau_j > 0 \,\}$. If $\sigma_i = 1/|J|$ when $i \in J$ and $\sigma_i = 0$ when $i \notin J$, then $(\sigma, \tau)$ is a Nash equilibrium of the game derived from $x_1, \dots, x_n, y_1, \dots, y_n$. Conversely, if $(\sigma, \tau)$ is a Nash equilibrium of this game, then $\sum_{j \in J} \tau_j y_j$ is a fixed point of $\Phi$. In a sense, the algorithm described above approximates the given correspondence $F$ with a correspondence of a particularly simple type.

We may project the path of the Lemke-Howson algorithm, in its application to the game derived from $x_1, \dots, x_n, y_1, \dots, y_n$, into this setting. Define $\Phi_1 : \mathbb{R}^d \to \mathbb{R}^d$ by letting $\Phi_1(z)$ be the convex hull of $\{\, y_i : i \in \{1\} \cup \operatorname{argmin}_i \|z - x_i\| \,\}$. Suppose that $(\sigma, \tau)$ is an element of the set $M$ defined in Section 3.1, so that all the conditions of Nash equilibrium are satisfied except that it may be the case that $\sigma_1 > 0$ even if the first pure strategy is not optimal. Let $J = \{\, j : \tau_j > 0 \,\}$, and let $z = \sum_j \tau_j y_j$. Then $J \subset \{\, i : \sigma_i > 0 \,\} \subset \{1\} \cup \operatorname{argmin}_j \|z - x_j\|$, so $z \in \Phi_1(z)$. Conversely, suppose $z$ is a fixed point of $\Phi_1$, and let $J = \operatorname{argmin}_j \|z - x_j\|$. Then $z = \sum_j \tau_j y_j$ for some $\tau \in \Delta^{n-1}$ with $\tau_j = 0$ for all $j \notin \{1\} \cup J$. If we let $\sigma$ be the element of $\Delta^{n-1}$ such that $\sigma_i = 1/|\{1\} \cup J|$ if $i \in \{1\} \cup J$ and $\sigma_i = 0$ if $i \notin \{1\} \cup J$, then $(\sigma, \tau) \in M$.

If $n$ is large one might guess that there is a sense in which operating in $\mathbb{R}^d$ might be less burdensome than working in $\Delta^{n-1} \times \Delta^{n-1}$, but it seems to be difficult to devise algorithms that take concrete advantage of this. Nonetheless this setup does give a picture of what the Lemke-Howson algorithm is doing that has interesting implications. For example, if there is no point in $\mathbb{R}^d$ that is equidistant from more than $d + 1$ points, then there is no point $(\sigma, \tau) \in M$ with $\sigma_i > 0$ for more than $d + 2$ indices. This gives a useful upper bound on the number of pivots of the Lemke-Howson algorithm.

3.4 Sperner’s Lemma

Sperner’s lemma is the traditional method of proving Brouwer’s fixed point theorem without developing the machinery of algebraic topology. It dates from the late 1920’s, which was a period during which the methods developed by Poincaré and Brouwer were being recast in algebraic terms.

Most of our work will take place in $\Delta^{d-1}$. Let $\mathcal{P}$ be a triangulation of $\Delta^{d-1}$. For $k = 0, \dots, d-1$ let $\mathcal{P}_k$ be the set of $k$-dimensional elements of $\mathcal{P}$. Let $V = \mathcal{P}_0$ be the set of vertices of $\mathcal{P}$, and fix a function

$$\ell : V \to \{1, \dots, d\}.$$

We say that $\ell$ is a labelling for $\mathcal{P}$, and we call $\ell(v)$ the label of $v$. If $\ell(v) \ne i$ for all $v \in V$ with $v_i = 0$, then $\ell$ is a Sperner labelling. Let $e_1, \dots, e_d$ be the standard unit basis vectors of $\mathbb{R}^d$. Then $\ell$ is a Sperner labelling if $\ell(v) \in \{ i_1, \dots, i_k \}$ whenever $v$ is contained in the convex hull of $e_{i_1}, \dots, e_{i_k}$. We say that $\sigma \in \mathcal{P}_{d-1}$ with vertex set $\{v_1, \dots, v_d\}$ is completely labelled if

$$\{\ell(v_1), \dots, \ell(v_d)\} = \{1, \dots, d\}.$$

[Figure 3.5: a triangulation of the simplex with corners labelled 1, 2, 3 and a Sperner labelling of its vertices.]

Theorem 3.4.1 (Sperner’s Lemma). If $\ell$ is a Sperner labelling, then the number of completely labelled simplices is odd.

Before proving this, let’s see why it’s important:

Proof of Brouwer’s Theorem. Let $f : \Delta^{d-1} \to \Delta^{d-1}$ be a continuous function. Proposition 2.5.2 implies that there is a sequence $\mathcal{P}^1, \mathcal{P}^2, \dots$ of triangulations whose meshes converge to zero. For each $r = 1, 2, \dots$ let $V^r$ be the set of vertices of $\mathcal{P}^r$. If any of the elements of $V^r$ is a fixed point we are done, and otherwise we can define $\ell^r : V^r \to \{1, \dots, d\}$ by letting $\ell^r(v)$ be the smallest index $i$ such that $v_i > f_i(v)$. (Such an index exists because $v \ne f(v)$ while $\sum_i v_i = 1 = \sum_i f_i(v)$.) Evidently $\ell^r$ is a Sperner labelling, so there is a completely labelled simplex with vertices $v^r_1, \dots, v^r_d$ where $\ell^r(v^r_i) = i$. Passing to a subsequence, we may assume that the sequences $v^1_i, v^2_i, \dots$ have a common limit $x$. For each $i$ we have

$$f_i(x) = \lim_r f_i(v^r_i) \le \lim_r (v^r_i)_i = x_i,$$

where $(v^r_i)_i$ is the $i$-th component of $v^r_i$, and $\sum_i f_i(x) = 1 = \sum_i x_i$, so $f(x) = x$.

We will give two proofs of Sperner’s lemma. The first of these uses facts about volume, and in this sense is less elementary than the second (which is given in the next section) but it quickly gives both an intuition for why the result is true and an important refinement.

We fix an affine isometry¹ $A : H^{d-1} \to \mathbb{R}^{d-1}$ such that

$$D = \det\bigl( A(e_2) - A(e_1), \dots, A(e_d) - A(e_1) \bigr) > 0.$$

¹ If $(X, d_X)$ and $(Y, d_Y)$ are metric spaces, a function $\iota : X \to Y$ is an isometry if $d_Y(\iota(x), \iota(x')) = d_X(x, x')$ for all $x, x' \in X$.


(We regard the determinant as a function of $(d-1)$-tuples of elements of $\mathbb{R}^{d-1}$ by identifying the tuple with the matrix with those columns.) A theorem of Euclid is that the volume of a pyramid is one third of the product of the height and the area of the base. The straightforward² generalization of this to arbitrary dimensions implies that $\frac{1}{d!}D$ is the volume of $\Delta^{d-1}$.

For each $v \in V$ there is an associated function $v : [0, 1] \to \Delta^{d-1}$ given by $v(t) = (1 - t)v + t e_{\ell(v)}$. Consider a simplex $\sigma \in \mathcal{P}_{d-1}$ that is the convex hull of $v_1, \dots, v_d \in V$, where these vertices are indexed in such a way that

$$\det\bigl( A(v_2) - A(v_1), \dots, A(v_d) - A(v_1) \bigr) > 0.$$

We define a function $p_\sigma : [0, 1] \to \mathbb{R}$ by setting

$$p_\sigma(t) = \frac{1}{d!} \det\bigl( A(v_2(t)) - A(v_1(t)), \dots, A(v_d(t)) - A(v_1(t)) \bigr).$$

For $0 \le t \le 1$ let $\sigma(t)$ be the convex hull of $v_1(t), \dots, v_d(t)$. Then $p_\sigma(t)$ is the volume of $\sigma(t)$ when $t$ is small.

We have

$$p_\sigma(1) = \frac{1}{d!} \det\bigl( A(e_{\ell(v_2)}) - A(e_{\ell(v_1)}), \dots, A(e_{\ell(v_d)}) - A(e_{\ell(v_1)}) \bigr).$$

If $\sigma$ is not completely labelled, then $p_\sigma(1) = 0$ because some $A(e_{\ell(v_i)}) - A(e_{\ell(v_1)})$ is zero or two of them are equal. If $\sigma$ is completely labelled, then we say that the labelling is orientation preserving on $\sigma$ if $p_\sigma(1) > 0$, in which case $p_\sigma(1) = \frac{1}{d!}D$, and orientation reversing on $\sigma$ if $p_\sigma(1) < 0$, in which case $p_\sigma(1) = -\frac{1}{d!}D$.

Let $p : [0, 1] \to \mathbb{R}$ be the sum

$$p(t) = \sum_{\sigma \in \mathcal{P}_{d-1}} p_\sigma(t).$$

Elementary properties of the determinant imply that each $p_\sigma$ and $p$ are polynomial functions. For sufficiently small $t$ the simplices $\sigma(t)$ are the $(d-1)$-dimensional simplices of a triangulation of $\Delta^{d-1}$.³ Therefore $p(t)$ is $\frac{1}{d!}D$ for small $t$. Since $p$ is a polynomial function of $t$, it follows that it is constant, and in particular $p(1) = \frac{1}{d!}D$.

² Actually, it is straightforward if you know integration, but Gauss regarded this as “too heavy” a tool, expressing a wish for a more elementary theory of the volume of polytopes. The third of Hilbert’s famous problems asks whether it is possible, for any two polytopes of equal volume, to triangulate the first in such a way that the pieces can be reassembled to give the second. This was resolved negatively by Hilbert’s student Max Dehn within a year of Hilbert’s lecture laying out the problems, and it remains the case today that there is no truly elementary theory of the volumes of polytopes. In line with this, our discussion presumes basic facts about $d$-dimensional measure of polytopes in $\mathbb{R}^d$ that are very well understood by people with no formal mathematical training, but which cannot be justified formally without appealing to relatively advanced theories of measure and integration.

³ This is visually obvious, and a formal proof would be tedious, so we provide only a sketch. Suppose that for each $v \in V$ we have a path connected neighborhood $U_v$ of $v$ in the interior of the smallest face of $\Delta^{d-1}$ containing $v$, and this system of neighborhoods satisfies the condition that for any simplex in $\mathcal{P}$, say with vertices $v_1, \dots, v_k$, if $v'_1 \in U_{v_1}, \dots, v'_k \in U_{v_k}$, then $v'_1, \dots, v'_k$ are affinely independent. We claim that a simplicial complex obtained by replacing each $v$ with some element of $U_v$ is a triangulation of $\Delta^{d-1}$; note that this can be proved by moving one vertex at a time along a path. Finally observe that because $\ell$ is a Sperner labelling, for each $v$ and $0 \le t < 1$, $v(t)$ is contained in the interior of the smallest face of $\Delta^{d-1}$ containing $v$.

We have established the following refinement of Sperner’s lemma:

Theorem 3.4.2. If $\ell$ is a Sperner labelling, then the number of $\sigma \in \mathcal{P}_{d-1}$ such that $\ell$ is orientation preserving on $\sigma$ is one greater than the number of $\sigma \in \mathcal{P}_{d-1}$ such that $\ell$ is orientation reversing on $\sigma$.

One of our major themes is that fixed points where the function or correspondence reverses orientation are different from those where orientation is preserved. Much of what follows is aimed at keeping track of this difference in increasingly general settings.
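The volume argument can be checked by exact computation in the case $d = 3$. The following sketch rests on several assumptions that are not part of the text: it uses the regular grid triangulation of $\Delta^2$ with $N$ subdivisions, a randomly drawn Sperner labelling, and the $3 \times 3$ determinant of barycentric coordinates in place of $\det(A(\cdot) - A(\cdot))$, so that the computed quantity is $p(t)$ divided by the volume of $\Delta^2$. All function names are hypothetical.

```python
from fractions import Fraction
import random

def det3(r1, r2, r3):
    """Determinant of the 3x3 matrix with rows r1, r2, r3."""
    return (r1[0] * (r2[1] * r3[2] - r2[2] * r3[1])
            - r1[1] * (r2[0] * r3[2] - r2[2] * r3[0])
            + r1[2] * (r2[0] * r3[1] - r2[1] * r3[0]))

def grid_triangles(N):
    """Triangles of the grid triangulation; vertices are integer triples summing
    to N, and each triangle is listed in positively oriented order."""
    for a in range(N):
        for b in range(N - a):
            c = N - 1 - a - b
            yield ((a + 1, b, c), (a, b + 1, c), (a, b, c + 1))
    for a in range(N - 1):
        for b in range(N - 1 - a):
            c = N - 2 - a - b
            yield ((a, b + 1, c + 1), (a + 1, b, c + 1), (a + 1, b + 1, c))

def moved(v, lab, t, N):
    """v(t) = (1 - t) v + t e_{l(v)} in barycentric coordinates."""
    return tuple((1 - t) * Fraction(c, N) + (t if i == lab else 0)
                 for i, c in enumerate(v, start=1))

def normalized_p(t, N, label):
    """p(t) divided by the volume of the simplex."""
    return sum(det3(*(moved(v, label[v], t, N) for v in tri))
               for tri in grid_triangles(N))

random.seed(1)
N = 6
vertices = [(a, b, N - a - b) for a in range(N + 1) for b in range(N + 1 - a)]
# A random Sperner labelling: the label i is allowed only where v_i > 0.
label = {v: random.choice([i for i in (1, 2, 3) if v[i - 1] > 0]) for v in vertices}

values = [normalized_p(Fraction(k, 4), N, label) for k in range(5)]
signs = [det3(*(moved(v, label[v], Fraction(1), N) for v in tri))
         for tri in grid_triangles(N)]
preserving = sum(1 for s in signs if s > 0)
reversing = sum(1 for s in signs if s < 0)
```

Because the arithmetic is exact, every entry of `values` should equal 1, and `preserving - reversing` should equal 1, which is the content of the theorem for this triangulation.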

3.5 The Scarf Algorithm

The traditional proof of Sperner’s lemma is an induction on dimension, using path following in a graph with maximal degree two to show that if the result is true in dimension $d - 2$, then it is also true in dimension $d - 1$. In the late 1960’s and early 1970’s Herbert Scarf and his coworkers pointed out that the graphs in the various dimensions can be combined into a single graph with maximal degree two that has an obvious vertex whose degree is either zero or one. If the labelling is derived from a function $f : \Delta^{d-1} \to \Delta^{d-1}$ in the manner described in the proof of Brouwer’s fixed point theorem in Section 3.4, then following the path in this graph from this starting point to the other endpoint amounts to an algorithm for finding a point that is approximately fixed for $f$.

Our exposition will follow this history, first presenting the inductive argument, then combining the graphs in the various dimensions into a single graph that supports the algorithm. As before, we are given a triangulation $\mathcal{P}$ of $\Delta^{d-1}$ and a Sperner labelling $\ell : V \to \{1, \dots, d\}$ where $V = \mathcal{P}_0 = \{v_1, \dots, v_m\}$ is the set of vertices. For each $k = 0, \dots, d-1$ a $k$-dimensional simplex $\sigma \in \mathcal{P}_k$ with vertices $v_{i_1}, \dots, v_{i_{k+1}}$ is said to be $k$-almost completely labelled if

$$\{1, \dots, k\} \subset \{\ell(v_{i_1}), \dots, \ell(v_{i_{k+1}})\},$$

and it is $k$-completely labelled if

$$\{\ell(v_{i_1}), \dots, \ell(v_{i_{k+1}})\} = \{1, \dots, k+1\}.$$

Note that a $k$-completely labelled simplex is $k$-almost completely labelled. What we were calling completely labelled simplices in the last section are now $(d-1)$-completely labelled simplices.

Suppose that $\sigma \in \mathcal{P}_{d-1}$ is $(d-1)$-almost completely labelled. If it is $(d-1)$-completely labelled, then it has precisely one facet that is $(d-2)$-completely labelled, namely the facet that does not include the vertex with label $d$. If $\sigma$ is not $(d-1)$-completely labelled, then it has two vertices with the same label, and the facets opposite these vertices are its $(d-2)$-completely labelled facets, so it has precisely two such facets.
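These definitions and the facet count are simple to state in code. The sketch below represents a $k$-simplex only by the list of its $k + 1$ vertex labels; the function names are hypothetical.

```python
def is_k_almost_completely_labelled(labels, k):
    """True if the labels 1, ..., k all occur among the k + 1 vertex labels."""
    return set(range(1, k + 1)) <= set(labels)

def is_k_completely_labelled(labels, k):
    """True if the k + 1 vertex labels are exactly 1, ..., k + 1."""
    return set(labels) == set(range(1, k + 2))

def completely_labelled_facets(labels, k):
    """Number of facets (obtained by dropping one vertex) that are
    (k - 1)-completely labelled."""
    return sum(
        is_k_completely_labelled(labels[:i] + labels[i + 1:], k - 1)
        for i in range(len(labels))
    )

# A completely labelled triangle has one such facet; an almost completely
# labelled triangle that is not completely labelled has two.
one = completely_labelled_facets([1, 2, 3], 2)
two = completely_labelled_facets([1, 2, 2], 2)
```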

For $k = 0, \dots, d-2$ let $\Delta^k \subset \Delta^{d-1}$ be the convex hull of $e_1, \dots, e_{k+1}$. If one of the $(d-2)$-completely labelled facets of $\sigma$ is contained in the boundary of $\Delta^{d-1}$, then it must be contained in $\Delta^{d-2}$ because the labelling is Sperner. (Every other facet of $\Delta^{d-1}$ lacks one of the labels $1, \dots, d-1$.) When $\sigma$ has two such facets, it is not possible that $\Delta^{d-2}$ contains both of them, of course, because $\sigma$ is the convex hull of these facets.

Suppose now that $\tau \in \mathcal{P}_{d-2}$ is $(d-2)$-completely labelled. Any element of $\mathcal{P}_{d-1}$ that has it as a facet is necessarily $(d-1)$-almost completely labelled. If $\tau$ intersects the interior of $\Delta^{d-1}$, then it is a facet of two elements of $\mathcal{P}_{d-1}$. On the other hand, if it is contained in the boundary of $\Delta^{d-1}$, then it must be contained in $\Delta^{d-2}$ because $\ell$ is a Sperner labelling, and it is a facet of precisely one element of $\mathcal{P}_{d-1}$.

We define a graph $\Gamma_{d-1} = (V_{d-1}, E_{d-1})$, in which $V_{d-1}$ is the set of $(d-1)$-almost completely labelled elements of $\mathcal{P}_{d-1}$, by declaring that two elements of $V_{d-1}$ are the endpoints of an edge in $E_{d-1}$ if their intersection is a $(d-2)$-completely labelled element of $\mathcal{P}_{d-2}$. Let $\sigma$ be an element of $V_{d-1}$. Our remarks above imply that if $\sigma$ is $(d-1)$-completely labelled, then it is an endpoint of no edges if its $(d-2)$-completely labelled facet is contained in $\Delta^{d-2}$, and otherwise it is an endpoint of exactly one edge. On the other hand, if $\sigma$ is not $(d-1)$-completely labelled, then it is an endpoint of precisely one edge if one of its $(d-2)$-completely labelled facets is contained in $\Delta^{d-2}$, and otherwise it is an endpoint of exactly two edges.

Thus $\Gamma_{d-1}$ has maximum degree two, so it is a union of isolated points, paths, and loops. The isolated points are the $(d-1)$-completely labelled simplices whose $(d-2)$-completely labelled facets are contained in $\Delta^{d-2}$. The endpoints of paths are the $(d-1)$-completely labelled simplices whose $(d-2)$-completely labelled facets are not contained in $\Delta^{d-2}$ and the $(d-1)$-almost completely labelled simplices that are not completely labelled and have a $(d-2)$-completely labelled facet in $\Delta^{d-2}$. Combining this information, we find that the sum of the number of $(d-1)$-completely labelled simplices and the number of $(d-2)$-completely labelled simplices contained in $\Delta^{d-2}$ is even, because every isolated point is associated with one element of each set, and every path has two endpoints. If there are an odd number of $(d-2)$-completely labelled simplices contained in $\Delta^{d-2}$, then there are necessarily an odd number of $(d-1)$-completely labelled simplices.

[Figure 3.6: the simplices in $\Gamma_2$ for the labelling of Figure 3.5.]

Of course for each $k = 0, \dots, d-2$ the set of simplices in $\mathcal{P}$ that lie in $\Delta^k$ constitutes a simplicial subdivision of $\Delta^k$, and it is easy to see that the restriction of the labelling to the vertices that lie in $\Delta^k$ is a Sperner labelling for that subdivision. Thus Sperner’s lemma follows from induction if we can establish it when $d - 1 = 0$. In this case $\Delta^{d-1} = \Delta^0$ is a 0-dimensional simplex (i.e., a point) and the elements of the triangulation $\mathcal{P}$ are necessarily this simplex and the empty set. The simplex is 0-completely labelled, because 1 is the only available label, so the number of 0-completely labelled simplices is odd, as desired. Figure 3.6 shows the simplices in $\Gamma_2$ for the labelling of Figure 3.5.

In order to describe the Scarf algorithm we combine the graphs developed at each stage of the inductive process to create a single graph with a path from a known starting point to a $(d-1)$-completely labelled simplex. Let $V_k$ be the set of $k$-almost completely labelled simplices contained in $\Delta^k$. Define a graph $\Gamma_k = (V_k, E_k)$ by specifying that two elements of $V_k$ are the endpoints of an edge in $E_k$ if their intersection is a $(k-1)$-completely labelled element of $\mathcal{P}_{k-1}$. For each $k = 1, \dots, d-1$, let $F_k$ be the set of unordered pairs $\{\tau, \sigma\}$ where $\tau \in V_{k-1}$, $\sigma \in V_k$, and $\tau$ is a facet of $\sigma$. Define a graph $\Gamma = (V, E)$ by setting

$$V = V_0 \cup \dots \cup V_{d-1} \quad\text{and}\quad E = E_0 \cup F_1 \cup E_1 \cup \dots \cup E_{d-2} \cup F_{d-1} \cup E_{d-1}.$$

In our analysis above we saw that the number of neighbors of $\sigma \in V_k$ in $\Gamma_k$ is two except that this number is reduced by one if $\sigma$ has a facet in $V_{k-1}$, and it is also reduced by one if $\sigma$ is $k$-completely labelled. If $1 \le k \le d-2$, then the first of these conditions is precisely the circumstance in which $\sigma$ is an endpoint of an edge in $F_k$, and the second is precisely the circumstance in which $\sigma$ is an endpoint of an edge in $F_{k+1}$. Therefore every element of $V_1 \cup \dots \cup V_{d-2}$ has precisely two neighbors in $\Gamma$.

Provided that $d \ge 2$, every completely labelled simplex in $V_{d-1}$ has precisely one neighbor in $\Gamma$, and every $(d-1)$-almost completely labelled simplex in $V_{d-1}$ that is not completely labelled has two neighbors in $\Gamma$ that are associated with its two $(d-2)$-completely labelled facets. Again provided that $d \ge 2$, the unique element of $V_0$ has exactly one neighbor in $V_1$. Thus the completely labelled elements of $V_{d-1}$ and the unique element of $V_0$ each have one neighbor in $\Gamma$, and every other element of $V$ has exactly two neighbors in $\Gamma$. Consequently the path in $\Gamma$ that begins at the unique element of $V_0$ ends at a completely labelled element of $V_{d-1}$. Figure 3.7 shows the simplices in $\Gamma$ for the labelling of Figure 3.5, which include points and line segments in addition to those shown in Figure 3.6.

Conceptually, the Scarf algorithm is the process of following this path. An actual implementation requires a computational description of a triangulation of $\Delta^{d-1}$. That is, there must be a triangulation and an algorithm such that if we are given a $k$-simplex of this triangulation in $\Delta^k$, the algorithm will compute the $(k+1)$-simplex in $\Delta^{k+1}$ that has the given simplex as a facet (provided that $k < d-1$) and if we are given a vertex of the given simplex, the algorithm will return the other $k$-simplex in $\Delta^k$ that shares the facet of the given simplex opposite the given vertex (provided that this facet is not contained in the boundary of $\Delta^k$). In addition, we need an algorithm that computes the label of a given vertex; typically this would be derived from an algorithm for computing a given function $f : \Delta^{d-1} \to \Delta^{d-1}$, as in the proof of Brouwer’s theorem. Given these resources, if we are at an element of $V$, we can compute the simplices of its neighbors in $\Gamma$ and the labels of the vertices of these simplices. If we remember which of these neighbors we were at prior to arriving at the current element of $V$, then the next step in the algorithm is to go to the other neighbor. Such a step along the path of the algorithm is called a pivot.
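As an illustration, the path-following can be sketched for $d = 3$. Everything specific in the sketch is an assumption made for the example: the triangulation is the regular grid with $N$ subdivisions (vertices are integer triples summing to $N$), the labelling is derived from a hypothetical map `example_map` as in the proof of Brouwer’s theorem, and the neighboring triangle across an edge $\{p, q\}$ is found by the reflection $r' = p + q - r$, which is valid for this particular grid.

```python
def add(p, q):
    return tuple(a + b for a, b in zip(p, q))

def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))

def scarf_path_2d(N, label):
    """Follow the path in Gamma from e_1 = (N, 0, 0) to a completely labelled
    triangle. `label` must be a Sperner labelling with values in {1, 2, 3}.
    Returns the triangle and the number of moves made along the path."""
    p = (N, 0, 0)          # a vertex with label 1 on the edge Delta^1 (third coordinate 0)
    step = (-1, 1, 0)      # current direction of travel along Delta^1
    moves = 0
    while True:
        # Walk along Delta^1 through vertices labelled 1 until label 2 appears.
        q = add(p, step)
        while label(q) == 1:
            p, q = q, add(q, step)
            moves += 1
        # {p, q} carries labels 1 and 2: move up to the triangle above it.
        r = (min(p[0], q[0]), min(p[1], q[1]), 1)
        moves += 1
        while True:
            lab = label(r)
            if lab == 3:
                return (p, q, r), moves
            # Drop the old vertex carrying the same label as r, then cross the
            # new edge {p, q} to the triangle on its other side.
            old = p if lab == 1 else q
            if lab == 1:
                p = r
            else:
                q = r
            r = sub(add(p, q), old)
            moves += 1
            if min(r) < 0:
                break      # the edge lies in the boundary, hence in Delta^1
        step = sub(p, q)   # continue along Delta^1 through p, away from q

def example_map(v):
    """Hypothetical continuous self-map of the 2-simplex (illustration only)."""
    a, b, c = v
    return (0.5 * b + 0.25, 0.5 * c + 0.15, 0.5 * a + 0.10)

N = 16

def label(v):
    """Smallest index i with x_i > f_i(x), where x = v / N."""
    x = tuple(c / N for c in v)
    return next(i for i, (xi, fi) in enumerate(zip(x, example_map(x)), start=1) if xi > fi)

triangle, moves = scarf_path_2d(N, label)
```

The vertices of the returned triangle, divided by $N$, are approximately fixed for `example_map`; refining the grid tightens the approximation at the cost of more moves.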

[Figure 3.7: the simplices in $\Gamma$ for the labelling of Figure 3.5, including points and line segments in addition to those shown in Figure 3.6.]

At this point we remark on a few aspects of the Scarf algorithm, and later we will compare it with various alternatives. The first point is that it necessarily moves through $\Delta^{d-1}$ rather slowly. Consider a $k$-almost completely labelled simplex $\sigma$. Each pivot of the algorithm drops one of the vertices of the current simplex, possibly adding a new vertex, or possibly dropping down to a lower dimensional face. Therefore a minimum of $k$ pivots are required before one can possibly arrive at a simplex that has no vertex in common with $\sigma$. If the grid is fine, the algorithm will certainly require many pivots to arrive at a fixed point far from the algorithm’s starting point.

This suggests the following strategy. We first apply the Scarf algorithm to a coarse given triangulation of $\Delta^{d-1}$, thereby arriving at a completely labelled simplex that is hopefully a rough approximation of a fixed point. We then subdivide the given triangulation of $\Delta^{d-1}$, using barycentric subdivision or some other method. If we could somehow “restart” the algorithm in the fine triangulation, near the completely labelled simplex in the coarse triangulation, it might typically be the case that the algorithm did not have to go very far to find a completely labelled simplex in the fine triangulation. Restart methods do exist (see, e.g., Merrill (1972), Kuhn and MacKinnon (1975), and van der Laan and Talman (1979)) but it remains the case that the Scarf algorithm has not proved to be very useful in practice, perhaps due in part to its difficulties with high dimensional problems.

There is one more feature of the Scarf algorithm that is worth mentioning. In our description of the algorithm the ordering of the vertices plays an explicit role, and can easily make a difference to the outcome. If one wishes to find more than one completely labelled simplex, or perhaps as many as possible, or perhaps even all of them, there is the following strategy. Having followed the algorithm for the given ordering of the indices to its terminus, now proceed from that completely labelled simplex in the graph $\Gamma'$ associated with some different ordering. This might lead back to the starting point of the algorithm in $\Gamma'$, but it is also quite possible that it might lead to some completely labelled simplex that cannot be reached directly by the algorithm under any ordering of the indices. A completely labelled simplex $\sigma$ is accessible if it is reachable by the algorithm in this more general sense: there is a path going to $\sigma$ from the starting point of the algorithm for some ordering of the indices, along a path that is a union of maximal paths of the various graphs $\Gamma'$ for the various orderings of the indices.

3.6 Homotopy

Let $f : X \to X$ be a continuous function, and let $x_0$ be an element of $X$. We let $h : X \times [0, 1] \to X$ be the homotopy

$$h(x, t) = (1 - t)x_0 + t f(x).$$

Here we think of the variable $t$ as time, and let $h_t = h(\cdot, t) : X \to X$ be the function “at time $t$.” In this way we imagine deforming the constant function with value $x_0$ at time zero into the function $f$ at time one.

Let $g : X \times [0, 1] \to X$ be the function $g(x, t) = h(x, t) - x$. The idea of the homotopy method is to follow a path in $Z = g^{-1}(0)$ starting at $(x_0, 0)$ until we reach a point of the form $(x^*, 1)$. As a practical matter it is necessary to assume that $f$ is $C^1$, so that $h$ and $g$ are $C^1$. It is also necessary to assume that the derivative of $g$ has full rank at every point of $Z$, and that the derivative of the map $x \mapsto f(x) - x$ has full rank at each of the fixed points of $f$. As we will see later in the book, there is a sense in which this is typically the case, so that these assumptions are mild. With these assumptions $Z$ will be a union of finitely many curves. Some of these curves will be loops, while others will have two endpoints in $X \times \{0, 1\}$. In particular, the other endpoint of the curve beginning at $(x_0, 0)$ cannot be in $X \times \{0\}$, because there is only one point in $Z \cap (X \times \{0\})$, so it must be $(x^*, 1)$ for some fixed point $x^*$ of $f$.

We now have to tell the computer how to follow this path. The standard computational implementation of curve following is called the predictor-corrector method. Suppose we are at a point $z_0 = (x, t) \in Z$. We first need to compute a vector $v$ that is tangent to $Z$ at $z_0$. Algebraically this amounts to finding a nonzero linear combination of the columns of the matrix of $Dg(z_0)$ that vanishes. For this it suffices to express one of the columns as a linear combination of the others, and, roughly speaking, the Gram-Schmidt process can be used to do this. We can divide any vector we obtain this way by its norm, so that $v$ becomes a unit vector. There is a parameter of the procedure called the step size that is a number $\Delta > 0$, and the “predictor” part of the process is completed by passing to the point $z_1 = z_0 + \Delta v$.

The “corrector” part of the process uses the Newton method to pass from $z_1$ to a new point in $Z$, or at least very close to it. The first step is to find a vector $w_1$ that is orthogonal to $v$ such that $g(z_1) + Dg(z_1)w_1 = 0$. To do this we can use the Gram-Schmidt process to find a basis for the orthogonal complement of $v$, compute the matrix $M$ of the derivative of $g$ with respect to this basis, compute the inverse of $M$, and then set $w_1 = -M^{-1}g(z_1)$. We then set $z_2 = z_1 + w_1$, find a vector $w_2$ orthogonal to $v$ such that $g(z_2) + Dg(z_2)w_2 = 0$, set $z_3 = z_2 + w_2$, and continue in this manner until $g(z_n)$ is acceptably small. The net effect of the predictor followed by the corrector is to move us from one point on $Z$ to another a bit further down. By repeating this one can go from one end of the curve to the other.
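A minimal sketch of the predictor-corrector loop, in the simplest case $X \subset \mathbb{R}$, may help fix ideas. Several simplifications are assumptions of the sketch rather than part of the method as described: with a single equation in two unknowns the tangent is obtained by rotating the gradient of $g$ instead of using Gram-Schmidt, the orientation is chosen to agree with the previous tangent, and a final Newton iteration at $t = 1$ is added to land exactly on a fixed point. The function name `follow_homotopy` is hypothetical.

```python
import math

def follow_homotopy(f, df, x0, step=0.05, tol=1e-12):
    """Trace g(x, t) = (1 - t) x0 + t f(x) - x = 0 from (x0, 0) to t = 1."""
    g = lambda x, t: (1 - t) * x0 + t * f(x) - x
    grad = lambda x, t: (t * df(x) - 1.0, f(x) - x0)   # (dg/dx, dg/dt)
    x, t = x0, 0.0
    prev = (0.0, 1.0)                 # initial orientation: increasing t
    while t < 1.0:
        gx, gt = grad(x, t)
        # Predictor: unit tangent to Z, i.e. a vector orthogonal to the gradient.
        vx, vt = -gt, gx
        norm = math.hypot(vx, vt)
        vx, vt = vx / norm, vt / norm
        if vx * prev[0] + vt * prev[1] < 0:
            vx, vt = -vx, -vt         # keep moving in the same direction
        prev = (vx, vt)
        x1, t1 = x + step * vx, t + step * vt
        # Corrector: Newton steps w = s * n with n orthogonal to v,
        # chosen so that g(z) + Dg(z) w = 0.
        nx, nt = -vt, vx
        for _ in range(50):
            r = g(x1, t1)
            if abs(r) < tol:
                break
            gx, gt = grad(x1, t1)
            s = -r / (gx * nx + gt * nt)
            x1, t1 = x1 + s * nx, t1 + s * nt
        x, t = x1, t1
    # Final polish (an addition of this sketch): Newton's method on f(x) - x at t = 1.
    for _ in range(50):
        r = f(x) - x
        if abs(r) < tol:
            break
        x -= r / (df(x) - 1.0)
    return x

# Usage: cos maps [0, 1] into itself; start the homotopy at x0 = 0.5.
x_star = follow_homotopy(math.cos, lambda x: -math.sin(x), 0.5)
```

For this example the curve $Z$ is a graph over $t$, so none of the difficulties discussed next (such as hopping between components) arise; a step size that is too large could still cause trouble for other functions.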

Probably the reader has sensed that the description above is a high level overview that glides past many issues. In fact it is difficult to regard the homotopy method as an actual algorithm, in the sense of having precisely defined inputs and being guaranteed to eventually halt at an output of the promised sort. One issue is that the procedure might accidentally hop from one component of $Z$ to another, particularly if $\Delta$ is large. There are various things that might be done about this, for instance trying to detect a likely failure and starting over with a smaller $\Delta$, but these issues, and the details of round off error that are common to all numerical software, are really in the realm of engineering rather than computational theory. As a practical matter, the homotopy method is highly successful, and is used to solve systems of equations from a wide variety of application domains.

3.7 Remarks on Computation

We have now seen three algorithms for computing points that are approximately fixed. How good are these, practically and theoretically? The first algorithm we saw, in Section 3.3, is new. It is simple, and can be applied to a wide variety of settings. Code now exists, but there has been little testing or practical experience. The Scarf algorithm has not lived up to the hopes it raised when it was first developed, and is not used in practical computation. Homotopy methods are restricted to problems that are smooth. As we mentioned above, within this domain they have an extensive track record with considerable success.

More generally, what can we reasonably hope for from an algorithm that computes points that are approximately fixed, and what sort of theoretical concepts can we bring to bear on these issues? These questions have been the focus of important recent advances in theoretical computer science, and in this section we give a brief description of these developments. The discussion presumes little in the way of prior background in computer science, and is quite superficial; a full exposition of this material is far beyond our scope. Interested readers can learn much more from the cited references, and from textbooks such as Papadimitriou (1994a) and Arora and Boaz (2007).

Theoretical analyses of algorithms must begin with a formal model of computation. The standard model is the Turing machine, which consists of a processor with finitely many states connected by an input-output device to an unbounded one dimensional storage medium that records data in cells, on each of which one can write an element of a finite alphabet that includes a distinguished character ‘blank.’ At the beginning of the computation the processor is in a particular state, the storage medium has finitely many cells that are not blank, and the input-output device is positioned at a particular cell in storage. In each step of the computation the character at the input-output device’s location is read. The Turing machine is essentially defined by functions that take state-datum pairs as their arguments and compute:

• the next state of the processor,

• a bit that will be written at the current location of the input-output device (overwriting the bit that was just read) and

• a motion (forward, back, stay put) of the input-output device.

The computation ends when it reaches a particular state of the machine called “Halt.” Once that happens, the data in the storage device is regarded as the output of the computation.
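The description above translates almost directly into a simulator. The following is a minimal sketch: the transition table `delta` maps a (state, symbol) pair to the triple (next state, symbol to write, motion). The blank character `_`, the step budget `max_steps`, and the example machine `successor` are assumptions made for the illustration.

```python
def run_turing_machine(delta, tape, state="start", pos=0, blank="_", max_steps=10_000):
    """Run the machine until it reaches the state "Halt" and return the nonblank
    portion of the storage medium as a string."""
    # Only nonblank cells are stored, so the medium is unbounded in both directions.
    cells = {i: ch for i, ch in enumerate(tape) if ch != blank}
    steps = 0
    while state != "Halt":
        if steps >= max_steps:
            raise RuntimeError("step budget exhausted; the machine may not halt")
        symbol = cells.get(pos, blank)                   # read the current cell
        state, write, move = delta[(state, symbol)]      # look up the transition
        if write == blank:
            cells.pop(pos, None)
        else:
            cells[pos] = write
        pos += move                                      # +1 forward, -1 back, 0 stay put
        steps += 1
    if not cells:
        return ""
    return "".join(cells.get(i, blank) for i in range(min(cells), max(cells) + 1))

# Example machine: append a 1 to a string of 1s (the successor in unary notation).
successor = {
    ("start", "1"): ("start", "1", +1),   # skip over the existing 1s
    ("start", "_"): ("Halt", "1", 0),     # write a 1 on the first blank and halt
}
result = run_turing_machine(successor, "111")
```

The step budget is a practical safeguard only; as discussed below, there is no general procedure for deciding in advance whether a given machine halts.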

As you might imagine, an analysis based on a concrete and detailed description of the operation of a Turing machine can be quite tedious. Fortunately, it is rarely necessary. Historically, other models of computation were proposed, but were subsequently found to be equivalent to the Turing model, and the Church-Turing thesis is the hypothesis that all “reasonable” models of computation are equivalent, in the sense that they all yield the same notion of what it means for something to be “computable.” This is a metamathematical assertion: it can never be proved, and a refutation would not be logical, but would instead be primarily a social phenomenon, consisting of researchers shifting their focus to some inequivalent model.

Once we have the notion of a Turing machine, we can define an algorithm to be a Turing machine that eventually halts, for any input state of the storage device. A subtle distinction is possible here: a Turing machine that always halts is not necessarily the same thing as a Turing machine that can be proved to halt, regardless of the input. In fact one of the most important early theorems of computer science is that there is no algorithm that has, as input, a description of a Turing machine and a particular input, and decides whether the Turing machine with that input will eventually halt. As a practical matter, one almost always works with algorithms that can easily be proved to be such, in the sense that it is obvious that they eventually halt.

A computational problem is a rule that associates a nonempty set of outputs with each input, where the set of possible inputs and outputs is the set of pairs consisting of a position of the input-output device and a state of the storage medium in which there are finitely many nonblank cells. (Almost always the inputs of interest are formatted in some way, and this definition implicitly makes checking the validity of the input part of the problem.) A computational problem is computable if there is an algorithm that passes from each input to one of the acceptable outputs. The distinction between computational problems that are computable and those that are not is fundamental, with many interesting and important aspects, but in our discussion here we will focus exclusively on problems that are known to be computable.

For us the most important distinction is between those computable computational problems that are “easy” and those that are “hard,” where the definitions of these terms remain to be specified. In order to be theoretically useful, the easiness/hardness distinction should not depend on the architecture of a particular machine or the technology of a particular era. In addition, it should be robust, at least in the sense that a composition of two easy computational problems, where the output of the first is the input of the second, should also be easy, and possibly in other senses as well. For these reasons, looking at the running time of an algorithm on a particular input is not very useful. Instead, it is more informative to think about how the resources (time and memory) consumed by a computation increase as the size of the input grows. In theoretical computer science, the most useful distinction is between algorithms whose worst case running time is bounded by a polynomial function of the size of the input, and algorithms that do not have this property. The class of computational problems that have polynomial time algorithms is denoted by P. If the set of possible inputs of a computational problem is finite, then the problem is trivially in P, and in fact we will only consider computational problems with infinite sets of inputs.

There are many kinds of computational problems, e.g., sorting, function evaluation, optimization, etc. For us the most important types are decision problems, which require a yes or no answer to a well posed question, and search problems, which require an instance of some sort of object or a verification that no such object exists. An important example of a decision problem is Clique: given a simple undirected graph G and an integer k, determine whether G has a clique with k nodes, where a clique is a collection of vertices such that G has an edge between any two of them. An example of a search problem is to actually find such a clique or to certify that no such clique exists.

There is a particularly important class of decision problems called NP, which stands for “nondeterministic polynomial time.” Originally NP was thought of as the class of decision problems for which a Turing machine that chose its next state randomly has a positive probability of showing that the answer is “Yes” when this is the case. For example, if a graph has a k-clique, an algorithm that simply guesses which elements constitute the clique has a positive probability of stumbling onto some k-clique. The more modern way of thinking about NP is that it is the class of decision problems for which a “Yes” answer has a certificate or witness that can be verified in polynomial time. In the case of Clique an actual k-clique is such a witness. Factorization of integers is another algorithmic issue which easily generates decision problems (for example, does a given number have a prime factor whose first digit is 3?) that are in NP because a prime factorization is a witness for them. (One of the historic recent advances in mathematics is the discovery of a polynomial time algorithm for testing whether a number is prime. Thus it is possible to verify the primality of the elements of a factorization in polynomial time.)
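To illustrate what it means for a witness to be verifiable in polynomial time, here is a minimal sketch of a certificate checker for Clique. The function name and the representation of the graph as a list of edge pairs are assumptions made for illustration. The check inspects each pair of proposed nodes once, so its running time is polynomial in the size of the graph.

```python
from itertools import combinations


def verify_clique_certificate(edges, k, nodes):
    """Return True if `nodes` is a set of k vertices, every two joined by an edge."""
    edge_set = {frozenset(e) for e in edges}
    distinct = set(nodes)
    if len(distinct) != k:
        return False
    return all(frozenset(pair) in edge_set for pair in combinations(distinct, 2))


# Hypothetical example graph: a triangle 1-2-3 with a pendant edge 3-4.
EDGES = [(1, 2), (1, 3), (2, 3), (3, 4)]
print(verify_clique_certificate(EDGES, 3, [1, 2, 3]))  # True
print(verify_clique_certificate(EDGES, 3, [2, 3, 4]))  # False
```

Verifying a proposed clique is easy in this sense; the difficulty of Clique lies in finding a certificate, not in checking one.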

An even larger computational class is EXP, which is the class of computational problems that have algorithms with running times that are bounded above by a function of the form exp(p(s)), where s is the size of the problem and p is a polynomial function. Instead of using time to define a computational class, we can also use space, i.e., memory; PSPACE is the class of computational problems that have algorithms that use an amount of memory that is bounded by a polynomial function of the size of the input. The sizes of the certificates for a problem in NP are necessarily bounded by some polynomial function of the size of the input, and the problem can be solved by trying all possible certificates not exceeding this bound, so any problem in NP is also in PSPACE. In turn, the number of processor state-memory state pairs during the run of a program using polynomially bounded memory is bounded by an exponential function of the polynomial, so any problem in PSPACE is also in EXP. Thus

P ⊂ NP ⊂ PSPACE ⊂ EXP.

Computational classes can also be defined in relation to an oracle which is assumed to perform some computation. The example of interest to us is an oracle that evaluates a continuous function f : X → X. How hard is it to find a point that is approximately fixed using such an oracle? Hirsch et al. (1989) showed that any algorithm that does this has an exponential worst case running time, because some functions require exponentially many calls to the oracle. Once you commit to an algorithm, the Devil can devise a function for which your algorithm will make exponentially many calls to the oracle before finding an approximate fixed point.

An important aspect of this result is that the oracle is assumed to be the only source of information about the function. In practice the function is specified by code, and in principle an algorithm could inspect the code and use what it learned to speed things up. For linear functions, and certain other special classes of functions, this is a useful approach, but it seems quite farfetched to imagine that a fully general algorithm could do this fruitfully. At the same time it is hard to imagine how we might prove that this is impossible, so we arrive at the conclusion that even though we do not quite have a theorem, finding fixed points almost certainly has exponential worst case complexity.

Even if finding fixed points is, in full generality, quite hard, it might still be the case that certain types of fixed point problems are easier. Consider, in particular, finding a Nash equilibrium of a two person game. Savani and von Stengel (2006) (see also McLennan and Tourky (2010)) showed that the Lemke-Howson algorithm has exponential worst case running time, but the algorithm is in many ways similar to the simplex algorithm for linear programming, not least because both algorithms tend to work rather well in practice. The simplex algorithm was shown by Klee and Minty (1972) to have exponential worst case running time, but later polynomial time algorithms for linear programming were developed by Khachian (1979) and Karmarkar (1984). Whether or not finding a Nash equilibrium of a two person game is in P was one of the outstanding open problems of computer science for over a decade. Additional concepts are required in order to explain how this issue was resolved.

A technique called reduction can be used to show that some computational problems are at least as hard as others, in a precise sense. Suppose that A and B are two computational problems, and we have two algorithms, guaranteed to run in polynomial time, the first of which converts the input encoding an instance of problem A into the input encoding an instance of problem B, and the second of which converts the desired output for the derived instance of problem B into the desired output for the given instance of problem A. Then problem B is at least as hard as problem A because one can easily turn an algorithm for problem B into an algorithm for problem A that is “as good,” in any sense that is invariant under these sorts of polynomial time transformations.

A problem is complete for a class of computational problems if it is at least as hard, in this sense, as any other member of the class. One of the reasons that NP is so important is that there are numerous NP-complete problems, many of which arise naturally; Clique is one of them. One of the most famous problems in contemporary mathematics is to determine whether NP is contained in P. This question boils down to deciding whether Clique (or any other NP-complete problem) has a polynomial time algorithm. This is thought to be highly unlikely, both because a lot of effort has gone into designing algorithms for these problems, and because the existence of such an algorithm would have remarkable consequences. It should be mentioned that this problem is, to some extent at least, an emblematic representative of numerous open questions in computer science that have a similar character. In fact, one of the implicit conventions of the discipline is to regard a computational problem as hard if, after some considerable effort, people haven’t been able to figure out whether it is hard or easy.

For any decision problem in NP there is an associated search problem, namely to find a witness for an affirmative answer or verify that the answer is negative. For Clique this means not only showing that a clique of size k exists, but actually producing one. The class of search problems associated with decision problems is called FNP. (The ‘F’ stands for “function.”) For Clique the search problem is not much harder than the decision problem, in the following sense: if we had a polynomial time algorithm for the decision problem, we could apply it to the graph with various vertices removed, repeatedly narrowing the focus until we found the desired clique, thereby solving the search problem in polynomial time.
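The narrowing procedure just described can be sketched as follows. All names are illustrative assumptions, and the decision procedure `has_clique` is a brute force stand-in (it takes exponential time) for the hypothetical polynomial time decision algorithm; the point of the sketch is only that `find_clique` calls the decision procedure at most one more time than the number of vertices.

```python
from itertools import combinations


def has_clique(vertices, edges, k):
    """Stand-in decision oracle: brute force, exponential time."""
    edge_set = {frozenset(e) for e in edges}
    return any(
        all(frozenset(p) in edge_set for p in combinations(c, 2))
        for c in combinations(sorted(vertices), k)
    )


def find_clique(vertices, edges, k, decide=has_clique):
    """Find a k-clique using the decision oracle, or return None."""
    remaining = set(vertices)
    if not decide(remaining, edges, k):
        return None
    for v in sorted(vertices):
        smaller = remaining - {v}
        kept = [e for e in edges if v not in e]
        if decide(smaller, kept, k):
            # A k-clique survives without v, so v can be discarded.
            remaining, edges = smaller, kept
    # Every surviving vertex lies in every remaining k-clique,
    # so the survivors are exactly one k-clique.
    return remaining


VERTICES = [1, 2, 3, 4, 5]
EDGES = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
print(find_clique(VERTICES, EDGES, 3))  # {1, 2, 3}
print(find_clique(VERTICES, EDGES, 4))  # None
```

A vertex is kept only when its removal destroys every clique of size k, which is why the set that remains at the end has exactly k elements.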

However, there is a particular class of problems for which the search problem is potentially quite hard, even though the decision problem is trivial because the answer is known to be yes. This class of search problems is called TFNP. (The ‘T’ stands for “total.”) There are some “trivial” decision problems that give rise to quite famous problems in this class:

• “Does an integer have a prime factorization?” Testing primality can now be done in polynomial time, but there is still no polynomial time algorithm for factoring.

• “Given a set of positive integers $\{a_1, \ldots, a_n\}$ with $a_i < 2^n/n$ for all $i$, do there exist two different subsets with the same sum?” There are $2^n$ different subsets, and the sum of any one of them is less than $2^n - n + 1$, so the pigeonhole principle implies that the answer is certainly yes.

• “Does a two person game have sets of pure strategies for the agents that are the supports⁴ of a Nash equilibrium?” Verifying that a pair of sets are the support of a Nash equilibrium is a computation involving linear algebra and a small number of inequality verifications that can be performed in polynomial time.

Problems involving a function defined on some large space must be specified with a bit more care, because if the function is given by listing its values, then the problem is easy, relative to the size of the input, because the input is huge. Instead, one takes the input to be a Turing machine that computes (in polynomial time) the value of the function at any point in the space.

⁴ The support of a mixed strategy is the set of pure strategies that are assigned positive probability.


• “Given a Turing machine that computes a real valued function at every vertex of a graph, is there a vertex where the function’s value is at least as large as the function’s value at any of the vertex’s neighbors in the graph?” Since the graph is finite, the function has a global maximum and therefore at least one local maximum.

• “Given a Turing machine that computes the value of a Sperner labelling at any vertex in a triangulation of the simplex, does there exist a completely labelled subsimplex?”
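The subset sum example in the list above can be explored with a short brute force search. This is only a sketch: the function name is an assumption, and the search enumerates subsets, so it takes exponential time in the worst case. The pigeonhole principle guarantees that a collision exists; it does not say how to find one quickly.

```python
from itertools import combinations


def equal_sum_subsets(numbers):
    """Return two different subsets of `numbers` that have the same sum.

    Assumes n distinct positive integers, each less than 2**n / n.
    """
    n = len(numbers)
    if len(set(numbers)) != n:
        raise ValueError("the integers must be distinct")
    if any(a <= 0 or a * n >= 2 ** n for a in numbers):
        raise ValueError("each integer must be positive and less than 2**n / n")
    seen = {}  # maps a sum to the first subset found with that sum
    for size in range(n + 1):
        for subset in combinations(numbers, size):
            total = sum(subset)
            if total in seen:
                return seen[total], subset
            seen[total] = subset
    raise AssertionError("unreachable, by the pigeonhole principle")


# Hypothetical instance with n = 6, so every entry must be below 64 / 6.
first, second = equal_sum_subsets([2, 3, 5, 8, 9, 10])
print(first, second, sum(first) == sum(second))
```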

Mainly because the class of problems in NP that always have a positive answer is defined in terms of a property of the outputs, rather than a property of the inputs (but also in part because factoring seems so different from the other problems) experts expect that TFNP does not contain any problems that are complete for the class. In view of this, trying to study the class as a whole is unlikely to be very fruitful. Instead, it makes sense to define and study coherent subclasses, and Papadimitriou (1994b) advocates defining subclasses in terms of the proof that a solution exists. Thus PPP (“polynomial pigeonhole principle”) is (roughly) the class of problems for which existence is guaranteed by the pigeonhole principle, and PLS (“polynomial local search”) is (again roughly) the set of problems requesting a local maximum of a real valued function defined on a graph by a Turing machine.
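The kind of problem that PLS abstracts can be illustrated by a naive local search. In this sketch the functions `neighbors` and `value` are hypothetical stand-ins for the Turing machine that describes the graph and the function; each step moves to a strictly better neighbor, so the search terminates on a finite graph, although the number of steps may be very large.

```python
def local_maximum(start, neighbors, value):
    """Climb to a vertex whose value is at least that of each of its neighbors."""
    v = start
    while True:
        best = max(neighbors(v), key=value, default=v)
        if value(best) <= value(v):
            return v  # no neighbor is strictly better
        v = best


# Hypothetical instance: the path graph on 0, ..., 9 with a single peak at 6.
def path_neighbors(v):
    return [u for u in (v - 1, v + 1) if 0 <= u <= 9]


def peak_at_six(v):
    return -(v - 6) ** 2


print(local_maximum(0, path_neighbors, peak_at_six))  # 6
```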

For us the most important subclass of TFNP is PPAD (“polynomial parity argument directed”) which is defined by abstracting certain features of the algorithms we have seen in this chapter. The computational problem EOTL (“end of the line”) is defined by a Turing machine that defines a directed graph⁵ of maximal degree two in a space that may, without loss of generality, be taken to be the set $\{0,1\}^k$ of bit strings of length $k$, where $k$ is bounded by a polynomial function of the size of the input. For each $v \in \{0,1\}^k$ the Turing machine specifies whether $v$ is a vertex in the graph. If it is, the Turing machine computes its predecessor, if it has one, and its successor, if it has one. When it exists, the predecessor of $v$ must be a vertex, and its successor must be $v$. Similarly, when $v$ has a successor, it must be a vertex, and its predecessor must be $v$. Finally, we require that $(0, \ldots, 0)$ is a vertex that has a successor but no predecessor. The problem is to find another “leaf” of the graph, by which we mean either a vertex with a predecessor but no successor, or a vertex with a successor but no predecessor. Of course the existence of such a leaf follows from Lemma 2.6.1, generalized in the obvious way to handle directed graphs. The class of computational problems that have reductions to EOTL is PPAD (“polynomial parity argument directed”).

The Lemke-Howson algorithm passes from a two person game to an instance of EOTL, then solves it by following the path in the graph to its other endpoint. Similarly, the Scarf algorithm has as input the algorithms for navigating in a triangulation of $\Delta^{d-1}$ and generating the labels of the vertices, and it follows a path in a graph from one endpoint to another. (It would be difficult to describe homotopy in exactly these terms, but there is an obvious sense in which it has this character.)
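Path following on an EOTL instance can be sketched with a toy example. Here the Turing machine is replaced by two lookup functions built from explicit lists of vertices; `make_instance` and `other_end` are illustrative names and not part of any library. In a genuine instance the graph has up to $2^k$ vertices and is available only through the successor and predecessor computations, so the loop below may take exponentially many steps.

```python
def make_instance(paths, cycles=()):
    """Build successor and predecessor lookups from explicit vertex lists."""
    succ, pred = {}, {}
    for p in paths:
        for a, b in zip(p, p[1:]):
            succ[a], pred[b] = b, a
    for c in cycles:
        for a, b in zip(c, c[1:] + c[:1]):
            succ[a], pred[b] = b, a
    return succ.get, pred.get  # each returns None when there is no such neighbor


def other_end(successor, predecessor, k):
    """Follow successors from (0, ..., 0) until a vertex without a successor."""
    v = (0,) * k
    if predecessor(v) is not None or successor(v) is None:
        raise ValueError("(0, ..., 0) must have a successor and no predecessor")
    steps = 0
    while successor(v) is not None:
        v = successor(v)
        steps += 1
    return v, steps


# Hypothetical instance on bit strings of length 3: one path and one cycle.
PATH = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)]
CYCLE = [(1, 0, 0), (1, 0, 1)]
successor, predecessor = make_instance([PATH], [CYCLE])
print(other_end(successor, predecessor, 3))  # ((1, 1, 1), 3)
```

The vertex returned has a predecessor but no successor, so it is a leaf in the sense of the definition of EOTL.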

⁵ A directed graph is a pair G = (V, E) where V is a finite set of vertices and E is a finite set of ordered pairs of elements of V. That is, in a directed graph each edge has a source and a target.


There is a rather subtle point that is worth mentioning here. In our descriptions of Lemke-Howson, Scarf, and homotopy, we implicitly assumed that the algorithm used its memory of where it had been to decide which direction to go in the graph, but the definition of EOTL requires that the graph be directed, which means in effect that if we begin at any point on the path, we can use local information to decide which of the two directions in the graph constitutes forward motion. It turns out that each of our three algorithms has this property; a proper explanation of this would require more information about orientation than we have developed at this point. The class of problems that can be reduced to the computational problem that has the same features as EOTL, except that the graph is undirected, is PPA. Despite the close resemblance to PPAD, the theoretical properties of the two classes differ in important ways.

In a series of rapid developments in 2005 and 2006 (Daskalakis et al. (2006); Chen and Deng (2006b,a)) it was shown that computing a Nash equilibrium of a two player game is PPAD-complete, and also that the two dimensional Sperner problem is PPAD-complete. This means that computing a Nash equilibrium of a two player game is almost certainly hard, in the sense that there is no polynomial time algorithm for the problem, because computing general fixed points is almost certainly hard. Since this breakthrough many other computational problems have been shown to be PPAD-complete, including finding Walrasian equilibria in seemingly quite simple exchange economies. In various senses the difficulty does not go away if we relax the problem, asking for a point that is ε-approximately fixed for an ε that is significantly greater than zero.

The current state of theory presents a contrast between theoretical concepts that classify even quite simple fixed point problems as intractable, and algorithms that often produce useful results in a reasonable amount of time. A recent result presents an even more intense contrast. The computational problem OEOTL has the same given data as EOTL, but now the goal is to find the other end of the path beginning at (0, . . . , 0), and not just any second leaf of the graph. Goldberg et al. (2011) show that OEOTL is PSPACE-complete, even though the Lemke-Howson algorithm, the Scarf algorithm, and many specific instances of homotopy procedures can be recrafted as algorithms for OEOTL.

Recent developments have led to a rich and highly interesting theory explaining why the problem of finding an approximate fixed point is intractable, in the sense that there is almost certainly no algorithm that always finds an approximate fixed point in a small amount of time. What is missing at this point are more tolerant theoretical concepts that give an account of why the algorithms that exist are as useful as they are in fact, and how they might be compared with each other, and with theoretical ideals that have not yet been shown to be far out of reach.

Chapter 4

Topologies on Spaces of Sets

The theories of the degree and the index involve a certain kind of continuity with respect to the function or correspondence in question, so we need to develop topologies on spaces of functions and correspondences. The main idea is that one correspondence is close to another if its graph is close to the graph of the second correspondence, so we need to have topologies on spaces of subsets of a given space. In this chapter we study such spaces of sets, and in the next chapter we apply these results to spaces of functions and correspondences. There are three basic set theoretic operations that are used to construct new functions or correspondences from given ones, namely restriction to a subdomain, cartesian products, and composition, and our agenda here is to develop continuity results for elementary operations on sets that will eventually support continuity results for those operations.

To begin with, Section 4.1 reviews some basic properties of topological spaces that hold automatically in the case of metric spaces. In Section 4.2 we define topologies on spaces of compact and closed subsets of a general topological space. Section 4.3 presents a nice result due to Vietoris which asserts that for one of these topologies the space of nonempty compact subsets of a compact space is compact. Economists commonly encounter this in the context of a metric space, in which case the topology is induced by the Hausdorff distance; Section 4.4 clarifies the connection. In Section 4.5 we study the continuity properties of basic operations for these spaces. Our treatment is largely drawn from Michael (1951), which contains a great deal of additional information about these topologies.

4.1 Topological Terminology

Up to this point the only topological spaces we have encountered have been subsets of Euclidean spaces. Now it will be possible that X lacks some of the properties of metric spaces, in part because we may ultimately be interested in some spaces that are not metrizable, but also in order to clarify the logic underlying our result.

Throughout this chapter we work with a fixed topological space X. We say that X is:

(a) a T1-space if, for each x ∈ X , {x} is closed;

(b) Hausdorff if any two distinct points have disjoint neighborhoods;

(c) regular if every neighborhood of a point contains a closed neighborhood of that point;

(d) normal if, for any two disjoint closed sets C and D, there are disjoint open sets U and V with C ⊂ U and D ⊂ V.

In a Hausdorff space the complement of a point is a neighborhood of every other point, so a Hausdorff space is T1. It is an easy exercise to show that a metric space is normal and T1. Evidently a normal T1 space is Hausdorff and regular.

A collection $B$ of subsets of $X$ is a base of a topology if the open sets are all unions of elements of $B$. Note that $B$ is a base of a topology if and only if all the elements of $B$ are open and the open sets are those $U \subset X$ such that for every $x \in U$ there is a $V \in B$ with $x \in V \subset U$. We say that $B$ is a subbase of the topology if the open sets are the unions of finite intersections of elements of $B$. Equivalently, each element of $B$ is open and for each open $U$ and $x \in U$ there are $V_1, \ldots, V_k \in B$ such that $x \in V_1 \cap \cdots \cap V_k \subset U$.

It is often easy to define or describe a topology by specifying a subbase, in which case we say that the topology of X is generated by B, so we should understand what properties a collection B of subsets of X has to have in order for this to work. Evidently the collection of all unions of finite intersections of elements of B is closed under finite intersection and arbitrary union. We may agree, as a matter of convention if you like, that the empty set is a finite intersection of elements of B. Then the only real requirement is that the union of all elements of B is X, so that X itself is open.
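For a finite set the passage from a subbase to the topology it generates can be computed directly. The following sketch (the function name is an assumption) forms all finite intersections of subbase elements and then all unions of those, with the empty set obtained as the empty union. It illustrates the closing remark: $X$ is open in the result exactly when the subbase covers $X$.

```python
from itertools import combinations


def topology_from_subbase(subbase):
    """Return all unions of finite intersections of the given subbase elements."""
    subbase = [frozenset(b) for b in subbase]
    basic = set()
    for r in range(1, len(subbase) + 1):
        for combo in combinations(subbase, r):
            basic.add(frozenset.intersection(*combo))
    opens = {frozenset()}  # the empty union
    for b in basic:
        opens |= {o | b for o in opens}  # close up under union with b
    return opens


X = frozenset({1, 2, 3})
covering = topology_from_subbase([{1, 2}, {2, 3}])
print(sorted(sorted(o) for o in covering))  # [[], [1, 2], [1, 2, 3], [2], [2, 3]]
print(X in covering)                        # True
print(X in topology_from_subbase([{1, 2}])) # False: the subbase does not cover X
```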

4.2 Spaces of Closed and Compact Sets

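There will be a number of topologies, and in order to define them we need the corresponding subbases. For each open $U \subset X$ let:

• $\tilde{\mathcal U}_U = \{\, K \subset U : K \text{ is compact} \,\}$;

• $\mathcal U_U = \tilde{\mathcal U}_U \setminus \{\emptyset\}$;

• $\mathcal V_U = \{\, K \subset X : K \text{ is compact and } K \cap U \neq \emptyset \,\}$;

• $\tilde{\mathcal U}^0_U = \{\, C \subset U : C \text{ is closed} \,\}$;

• $\mathcal U^0_U = \tilde{\mathcal U}^0_U \setminus \{\emptyset\}$;

• $\mathcal V^0_U = \{\, C \subset X : C \text{ is closed and } C \cap U \neq \emptyset \,\}$.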

We now have the following spaces:

• $\tilde{\mathcal K}(X)$ is the space of compact subsets of $X$ endowed with the topology generated by the subbase $\{\, \tilde{\mathcal U}_U : U \subset X \text{ is open} \,\}$.

• $\mathcal K(X)$ is the space of nonempty compact subsets of $X$ endowed with the subspace topology inherited from $\tilde{\mathcal K}(X)$.

• $\mathcal H(X)$ is the space of nonempty compact subsets of $X$ endowed with the topology generated by the subbase

$$\{\, \mathcal U_U : U \subset X \text{ is open} \,\} \cup \{\, \mathcal V_U : U \subset X \text{ is open} \,\}.$$

• $\tilde{\mathcal K}^0(X)$ is the space of closed subsets of $X$ endowed with the topology generated by the subbase $\{\, \tilde{\mathcal U}^0_U : U \subset X \text{ is open} \,\}$.

• $\mathcal K^0(X)$ is the space of nonempty closed subsets of $X$ endowed with the subspace topology inherited from $\tilde{\mathcal K}^0(X)$.

• $\mathcal H^0(X)$ is the space of nonempty closed subsets of $X$ endowed with the topology generated by the subbase

$$\{\, \mathcal U^0_U : U \subset X \text{ is open} \,\} \cup \{\, \mathcal V^0_U : U \subset X \text{ is open} \,\}.$$

The topologies of $\mathcal H(X)$ and $\mathcal H^0(X)$ are both called the Vietoris topology.

Roughly, a neighborhood of $K$ in $\tilde{\mathcal K}(X)$ or $\mathcal K(X)$ consists of those $K'$ that are close to $K$ in the sense that every point in $K'$ is close to some point of $K$. A neighborhood of $K \in \mathcal H(X)$ consists of those $K'$ that are close in this sense, and also in the sense that every point in $K$ is close to some point of $K'$. Similar remarks pertain to $\tilde{\mathcal K}^0(X)$, $\mathcal K^0(X)$, and $\mathcal H^0(X)$. Section 4.4 develops these intuitions precisely when $X$ is a metric space.

Compact subsets of Hausdorff spaces are closed, so “for practical purposes” (i.e., when $X$ is Hausdorff) every compact set is closed. In this case $\tilde{\mathcal K}(X)$, $\mathcal K(X)$, and $\mathcal H(X)$ have the subspace topologies induced by the topologies of $\tilde{\mathcal K}^0(X)$, $\mathcal K^0(X)$, and $\mathcal H^0(X)$. Of course it is always the case that $\mathcal K(X)$ and $\mathcal K^0(X)$ have the subspace topologies induced by $\tilde{\mathcal K}(X)$ and $\tilde{\mathcal K}^0(X)$ respectively.

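It is easy to see that $\{\, \tilde{\mathcal U}_U : U \subset X \text{ is open} \,\}$ is a base for $\tilde{\mathcal K}(X)$ and $\{\, \tilde{\mathcal U}^0_U : U \subset X \text{ is open} \,\}$ is a base for $\tilde{\mathcal K}^0(X)$. Also, for any open $U_1, \ldots, U_k$ we have

$$\tilde{\mathcal U}_{U_1} \cap \ldots \cap \tilde{\mathcal U}_{U_k} = \tilde{\mathcal U}_{U_1 \cap \ldots \cap U_k},$$

and similarly for $\mathcal U_U$, $\tilde{\mathcal U}^0_U$, and $\mathcal U^0_U$, so the subbases of $\tilde{\mathcal K}(X)$, $\mathcal K(X)$, $\tilde{\mathcal K}^0(X)$, and $\mathcal K^0(X)$ are actually bases.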

4.3 Vietoris’ Theorem

An interesting fact, which was proved already in Vietoris (1923), and which is applied from time to time in mathematical economics, is that $\mathcal H(X)$ is compact whenever $X$ is compact. We begin the argument with a technical lemma.

Lemma 4.3.1. If X has a subbase such that any cover of X by elements of the subbase has a finite subcover, then X is compact.

Proof. Say that a set is basic if it is a finite intersection of elements of the subbasis. Any open cover is refined by the collection of basic sets that are subsets of its elements. If a refinement of an open cover has a finite subcover, then so does the cover, so it suffices to show that any open cover of $X$ by basic sets has a finite subcover.

A collection of open covers is a chain if it is completely ordered by inclusion: for any two covers in the chain, the first is a subset of the second or vice versa. If each open cover in a chain consists of basic sets, and has no finite subcover, then the union of the elements of the chain also has these properties (any finite subset of the union is contained in some member of the chain) so Zorn’s lemma implies that if there is one open cover with these properties, then there is a maximal such cover, say $\{U_\alpha : \alpha \in A\}$.

Suppose, for some $\beta \in A$, that $U_\beta = V_1 \cap \ldots \cap V_n$ where $V_1, \ldots, V_n$ are in the subbasis. If, for each $i = 1, \ldots, n$, $\{U_\alpha : \alpha \in A\} \cup \{V_i\}$ has a finite subcover $C_i$, then each $C_i \setminus \{V_i\}$ covers $X \setminus V_i$, so

$$(C_1 \setminus \{V_1\}) \cup \ldots \cup (C_n \setminus \{V_n\}) \cup \{U_\beta\}$$

is a finite subcover from $\{U_\alpha : \alpha \in A\}$. Therefore there is at least one $i$ such that $\{U_\alpha : \alpha \in A\} \cup \{V_i\}$ has no finite subcover, and maximality implies that $V_i$ is already in the cover. This argument shows that each element $U_\beta$ of the cover is contained in a subbasic set that is also in the cover, so the subbasic sets in $\{U_\alpha : \alpha \in A\}$ cover $X$, and by hypothesis there must be a finite subcover after all.

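Theorem 4.3.2. If $X$ is compact, then $\mathcal H(X)$ is compact.

Proof. Suppose that $\{\mathcal U_{U_\alpha} : \alpha \in A\} \cup \{\mathcal V_{V_\beta} : \beta \in B\}$ is an open cover of $\mathcal H(X)$ by subbasic sets. Let $D := X \setminus \bigcup_\beta V_\beta$; since $D$ is closed and $X$ is compact, $D$ is compact. We may assume that $D$ is nonempty because otherwise $X = V_{\beta_1} \cup \ldots \cup V_{\beta_n}$ for some $\beta_1, \ldots, \beta_n$, in which case $\mathcal H(X) = \mathcal V_{V_{\beta_1}} \cup \ldots \cup \mathcal V_{V_{\beta_n}}$. In addition, $D$ must be contained in some $U_\alpha$ because otherwise $D$ would not be an element of any $\mathcal U_{U_\alpha}$ or any $\mathcal V_{V_\beta}$. But then $\{U_\alpha\} \cup \{V_\beta : \beta \in B\}$ has a finite subcover, so, for some $\beta_1, \ldots, \beta_n$, we have

$$\mathcal H(X) = \mathcal U_{U_\alpha} \cup \mathcal V_{V_{\beta_1}} \cup \ldots \cup \mathcal V_{V_{\beta_n}}.$$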

4.4 Hausdorff Distance

Economists sometimes encounter spaces of compact subsets of a metric space, which are frequently topologized with the Hausdorff metric. In this section we clarify the relationship between that approach and the spaces introduced above. Suppose that $X$ is a metric space with metric $d$. For nonempty compact sets $K, L \subset X$ let

$$\delta_{\mathcal K}(K, L) := \max_{x \in K} \min_{y \in L} d(x, y).$$

Then for any $K$ and $\varepsilon > 0$ we have

$$\{\, L : \delta_{\mathcal K}(L, K) < \varepsilon \,\} = \{\, L : L \subset U_\varepsilon(K) \,\} = \mathcal U_{U_\varepsilon(K)}. \qquad (*)$$


On the other hand, whenever $K \subset U$ with $K$ compact and $U$ open there is some $\varepsilon > 0$ such that $U_\varepsilon(K) \subset U$ (otherwise we could take sequences $x_1, x_2, \ldots$ in $K$ and $y_1, y_2, \ldots$ in $X \setminus U$ with $d(x_i, y_i) \to 0$, then take convergent subsequences) so $\{\, L : \delta_{\mathcal K}(L, K) < \varepsilon \,\} \subset \mathcal U_U$. Thus:

Lemma 4.4.1. When $X$ is a metric space, the sets of the form $\{\, L : \delta_{\mathcal K}(L, K) < \varepsilon \,\}$ constitute a base of the topology of $\mathcal K(X)$.

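The Hausdorff distance between nonempty compact sets $K, L \subset X$ is

$$\delta_{\mathcal H}(K, L) := \max\{\delta_{\mathcal K}(K, L), \delta_{\mathcal K}(L, K)\}.$$

This is a metric. Specifically, it is evident that $\delta_{\mathcal H}(K, L) = \delta_{\mathcal H}(L, K)$, and that $\delta_{\mathcal H}(K, L) = 0$ if and only if $K = L$. If $M$ is a third compact set, then

$$\delta_{\mathcal K}(K, M) \le \delta_{\mathcal K}(K, L) + \delta_{\mathcal K}(L, M),$$

from which it follows easily that the Hausdorff distance satisfies the triangle inequality.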
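For finite subsets of a metric space the two definitions above can be evaluated directly. The sketch below uses the real line with the usual distance as an illustrative assumption; the function names are not from the text. It shows that the one-sided quantity is not symmetric, while the Hausdorff distance is the larger of the two one-sided values.

```python
def delta_K(K, L, d):
    """One-sided distance: the largest distance from a point of K to the set L."""
    return max(min(d(x, y) for y in L) for x in K)


def hausdorff(K, L, d):
    """Hausdorff distance: the larger of the two one-sided distances."""
    return max(delta_K(K, L, d), delta_K(L, K, d))


def dist(x, y):
    return abs(x - y)


K, L, M = {0, 1}, {0, 1, 3}, {5}
print(delta_K(K, L, dist))    # 0: every point of K lies in L
print(delta_K(L, K, dist))    # 2: the point 3 is at distance 2 from K
print(hausdorff(K, L, dist))  # 2
print(hausdorff(K, M, dist) <= hausdorff(K, L, dist) + hausdorff(L, M, dist))  # True
```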

There is now an ambiguity in our notation, insofar as $U_\varepsilon(L)$ might refer either to the union of the $\varepsilon$-balls around the various points of $L$ or to the set of compact sets whose Hausdorff distance from $L$ is less than $\varepsilon$. Unless stated otherwise, we will always interpret it in the first way, as a set of points and not as a set of sets.

Proposition 4.4.2. The Hausdorff distance induces the Vietoris topology on H(X).

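Proof. Fix a nonempty compact $K$. We will show that any neighborhood of $K$ in one topology contains a neighborhood in the other topology.

First consider some $\varepsilon > 0$. Choose $x_1, \ldots, x_n \in K$ such that $K \subset \bigcup_i U_{\varepsilon/2}(x_i)$. If $L \cap U_{\varepsilon/2}(x_i) \neq \emptyset$ for all $i$, then $\delta_{\mathcal K}(K, L) < \varepsilon$, so, in view of $(*)$,

$$K \in \mathcal U_{U_\varepsilon(K)} \cap \mathcal V_{U_{\varepsilon/2}(x_1)} \cap \ldots \cap \mathcal V_{U_{\varepsilon/2}(x_n)} \subset \{\, L : \delta_{\mathcal H}(K, L) < \varepsilon \,\}.$$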

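We now show that any element of our subbasis for the Vietoris topology that contains $K$ also contains $\{\, L : \delta_{\mathcal H}(K, L) < \varepsilon \,\}$ for some $\varepsilon > 0$. If $U$ is an open set containing $K$, then (as we argued above) $U_\varepsilon(K) \subset U$ for some $\varepsilon > 0$, so that

$$K \in \{\, L : \delta_{\mathcal H}(L, K) < \varepsilon \,\} \subset \{\, L : \delta_{\mathcal K}(L, K) < \varepsilon \,\} \subset \mathcal U_U.$$

If $V$ is open with $K \cap V \neq \emptyset$, then we can choose $x \in K \cap V$ and $\varepsilon > 0$ small enough that $U_\varepsilon(x) \subset V$. Then

$$K \in \{\, L : \delta_{\mathcal H}(K, L) < \varepsilon \,\} \subset \{\, L : \delta_{\mathcal K}(K, L) < \varepsilon \,\} \subset \mathcal V_V.$$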

4.5 Basic Operations on Subsets

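In this section we develop certain basic properties of the topologies defined in Section 4.2. To achieve a more unified presentation, it will be useful to let $\mathcal T$ denote a generic element of $\{\tilde{\mathcal K}, \mathcal K, \mathcal H, \tilde{\mathcal K}^0, \mathcal K^0, \mathcal H^0\}$. That is, $\mathcal T(X)$ will denote one of the spaces $\tilde{\mathcal K}(X)$, $\mathcal K(X)$, $\mathcal H(X)$, $\tilde{\mathcal K}^0(X)$, $\mathcal K^0(X)$, and $\mathcal H^0(X)$, with the range of allowed interpretations indicated in each context. Similarly, $\mathcal W$ will denote a generic element of $\{\tilde{\mathcal U}, \mathcal U, \mathcal V, \tilde{\mathcal U}^0, \mathcal U^0, \mathcal V^0\}$.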

We will frequently apply the following simple fact.

Lemma 4.5.1. If $Y$ is a second topological space, $f : Y \to X$ is a function, and $B$ is a subbase for $X$ such that $f^{-1}(V)$ is open for every $V \in B$, then $f$ is continuous.

Proof. For any sets $S_1, \ldots, S_k \subset X$ we have $f^{-1}(\bigcap_i S_i) = \bigcap_i f^{-1}(S_i)$, and for any collection $\{T_i\}_{i \in I}$ of subsets of $X$ we have $f^{-1}(\bigcup_i T_i) = \bigcup_i f^{-1}(T_i)$. Thus the preimage of a union of finite intersections of elements of $B$ is open, because it is a union of finite intersections of open subsets of $Y$.

4.5.1 Continuity of Union

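The function taking a pair of sets to their union is as well behaved as one might hope.

Lemma 4.5.2. For any $\mathcal T \in \{\tilde{\mathcal K}, \mathcal K, \mathcal H, \tilde{\mathcal K}^0, \mathcal K^0, \mathcal H^0\}$ the function $\upsilon : (K_1, K_2) \mapsto K_1 \cup K_2$ is a continuous function from $\mathcal T(X) \times \mathcal T(X)$ to $\mathcal T(X)$.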

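Proof. Applying Lemma 4.5.1, it suffices to show that preimages of subbasic open sets are open. For $\mathcal T \in \{\tilde{\mathcal K}, \mathcal K, \tilde{\mathcal K}^0, \mathcal K^0\}$ it suffices to note that

$$\upsilon^{-1}(\mathcal W_U) = \mathcal W_U \times \mathcal W_U$$

for all four $\mathcal W \in \{\tilde{\mathcal U}, \mathcal U, \tilde{\mathcal U}^0, \mathcal U^0\}$. For $\mathcal T \in \{\mathcal H, \mathcal H^0\}$ we also need to observe that

$$\upsilon^{-1}(\mathcal W_U) = (\mathcal W_U \times \mathcal T(X)) \cup (\mathcal T(X) \times \mathcal W_U)$$

for both $\mathcal W \in \{\mathcal V, \mathcal V^0\}$.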
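When $X$ is a metric space, a quantitative counterpart of the continuity of union is the bound $\delta_{\mathcal H}(K_1 \cup K_2, L_1 \cup L_2) \le \max\{\delta_{\mathcal H}(K_1, L_1), \delta_{\mathcal H}(K_2, L_2)\}$, which follows directly from the definition of the Hausdorff distance in Section 4.4. This bound is an added remark rather than a statement from the text, and the following sketch merely checks it on randomly generated finite sets of integers; the helper names are assumptions.

```python
import random


def delta_K(K, L, d):
    """One-sided distance: the largest distance from a point of K to the set L."""
    return max(min(d(x, y) for y in L) for x in K)


def hausdorff(K, L, d):
    """Hausdorff distance between nonempty finite sets."""
    return max(delta_K(K, L, d), delta_K(L, K, d))


def union_bound_holds(K1, K2, L1, L2, d):
    """Check the bound on the Hausdorff distance between the two unions."""
    lhs = hausdorff(K1 | K2, L1 | L2, d)
    rhs = max(hausdorff(K1, L1, d), hausdorff(K2, L2, d))
    return lhs <= rhs


def random_finite_set(rng):
    """A nonempty set of at most five integers between 0 and 50."""
    return {rng.randint(0, 50) for _ in range(rng.randint(1, 5))}


def dist(x, y):
    return abs(x - y)


rng = random.Random(0)  # fixed seed so that the check is reproducible
checks = [
    union_bound_holds(
        random_finite_set(rng), random_finite_set(rng),
        random_finite_set(rng), random_finite_set(rng), dist,
    )
    for _ in range(1000)
]
print(all(checks))  # True
```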

4.5.2 Continuity of Intersection

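Simple examples show that intersection is not a continuous operation for the topologies $\mathcal H$ and $\mathcal H^0$, so the only issues here concern $\tilde{\mathcal K}$, $\mathcal K$, $\tilde{\mathcal K}^0$, and $\mathcal K^0$.

Lemma 4.5.3. If $A \subset X$ is closed, the function $K \mapsto K \cap A$ from $\tilde{\mathcal K}(X)$ to $\tilde{\mathcal K}(A)$ and the function $C \mapsto C \cap A$ from $\tilde{\mathcal K}^0(X)$ to $\tilde{\mathcal K}^0(A)$ are continuous.

Proof. If $V \subset A$ is open, then the set of compact $K$ such that $K \cap A \subset V$ is $\tilde{\mathcal U}_{V \cup (X \setminus A)}$. This establishes the first asserted continuity, and a similar argument establishes the second.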


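For a nonempty closed set $A \subset X$ let $\mathcal K_A(X)$ and $\mathcal K^0_A(X)$ be the sets of compact and closed subsets of $X$ that have nonempty intersection with $A$. Since the topologies of $\mathcal K(X)$ and $\mathcal K^0(X)$ are the subspace topologies inherited from $\tilde{\mathcal K}(X)$ and $\tilde{\mathcal K}^0(X)$, the last result has the following immediate consequence.

Lemma 4.5.4. The function $K \mapsto K \cap A$ from $\mathcal K_A(X)$ to $\mathcal K(A)$ and the function $C \mapsto C \cap A$ from $\mathcal K^0_A(X)$ to $\mathcal K^0(A)$ are continuous.

Joint continuity of the map $(C, D) \mapsto C \cap D$ requires an additional hypothesis.

Lemma 4.5.5. If $X$ is a normal space, then $\iota : (C, D) \mapsto C \cap D$ is a continuous function from $\tilde{\mathcal K}^0(X) \times \tilde{\mathcal K}^0(X)$ to $\tilde{\mathcal K}^0(X)$. If, in addition, $X$ is a $T_1$ space, then $\iota : \tilde{\mathcal K}(X) \times \tilde{\mathcal K}(X) \to \tilde{\mathcal K}(X)$ is continuous.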

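Proof. By Lemma 4.5.1 it suffices to show that, for any open $U \subset X$, $\iota^{-1}(\tilde{\mathcal U}^0_U)$ is open. For any $(C, D)$ in this set we have $C \cap D \subset U$, so $C \setminus U$ and $D \setminus U$ are disjoint closed sets, and normality implies that there are disjoint open sets $V$ and $W$ containing $C \setminus U$ and $D \setminus U$ respectively. Then $(U \cup V) \cap (U \cup W) = U$, so

$$(C, D) \in \tilde{\mathcal U}^0_{U \cup V} \times \tilde{\mathcal U}^0_{U \cup W} \subset \iota^{-1}(\tilde{\mathcal U}^0_U).$$

If $X$ is also $T_1$, it is a Hausdorff space, so compact sets are closed. Therefore $\iota : \tilde{\mathcal K}(X) \times \tilde{\mathcal K}(X) \to \tilde{\mathcal K}(X)$ is continuous because its domain and range have the subspace topologies inherited from $\tilde{\mathcal K}^0(X) \times \tilde{\mathcal K}^0(X)$ and $\tilde{\mathcal K}^0(X)$.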

Let I(X) (resp. I^0(X)) be the set of pairs (K, L) of compact (resp. closed) subsets of X such that K ∩ L ≠ ∅, endowed with the topology it inherits from the product topology of K(X) × K(X) (resp. K^0(X) × K^0(X)). The relevant topologies are relative topologies obtained from the spaces in the last result, so:

Lemma 4.5.6. If X is a normal space, then ι : (C, D) ↦ C ∩ D is a continuous function from I^0(X) to K^0(X). If, in addition, X is a T1 space, then ι : I(X) → K(X) is continuous.

4.5.3 Singletons

Lemma 4.5.7. The function η : x ↦ {x} is a continuous function from X to T(X) when T ∈ {K, H}. If, in addition, X is a T1-space, then it is continuous when T ∈ {K^0, H^0}.

Proof. Singletons are always compact, so for any open U we have η^{-1}(U_U) = η^{-1}(V_U) = U. If X is T1, then singletons are closed, so η^{-1}(U^0_U) = η^{-1}(V^0_U) = U.

4.5.4 Continuity of the Cartesian Product

In addition to X, we now let Y be another given topological space. A simple example shows that the cartesian product π^0 : (C, D) ↦ C × D is not a continuous function from H^0(X) × H^0(Y) to H^0(X × Y). Suppose X = Y = R, (C, D) = (X, {0}), and

W = { (x, y) : |y| < (1 + x^2)^{-1} }.

It is easy to see that there is no neighborhood V ⊂ H^0(Y) of D such that π^0(C, D′) ∈ U_W (that is, R × D′ ⊂ W) for all D′ ∈ V.
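A small numerical sketch of why the example works: for any y ≠ 0, however small, the horizontal line R × {y} leaves W once x is large enough, so no D′ containing a nonzero point satisfies R × D′ ⊂ W. The function names `in_W` and `escape_point` are hypothetical helpers introduced only for this illustration.

```python
import math


def in_W(x: float, y: float) -> bool:
    """Membership in W = {(x, y) : |y| < 1 / (1 + x^2)}."""
    return abs(y) < 1.0 / (1.0 + x * x)


def escape_point(y: float) -> float:
    """For y != 0, return an x >= 0 with (x, y) not in W.

    (x, y) lies in W exactly when x^2 < 1/|y| - 1, so any larger x works;
    adding 1 to the threshold keeps a safe margin from rounding effects.
    """
    if y == 0:
        raise ValueError("the line R x {0} is contained in W")
    return math.sqrt(max(1.0 / abs(y) - 1.0, 0.0)) + 1.0


for y in (1e-1, 1e-3, 1e-6):
    x = escape_point(y)
    print(y, x, in_W(x, y))   # the last entry is False for every y != 0
```

By contrast, `in_W(x, 0.0)` holds for every x, which is the statement R × {0} ⊂ W.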

For compact sets there are positive results. In preparation for them we recall a basic fact about the product topology.

Lemma 4.5.8. If K ⊂ X and L ⊂ Y are compact, and W ⊂ X × Y is a neighborhood of K × L, then there are neighborhoods U of K and V of L such that U × V ⊂ W.

Proof. By the definition of the product topology, for each (x, y) ∈ K × L there are neighborhoods U_{(x,y)} and V_{(x,y)} of x and y such that U_{(x,y)} × V_{(x,y)} ⊂ W. For each x ∈ K we can find y_1, …, y_n such that L ⊂ V_x := ⋃_j V_{(x,y_j)}, and we can then let U_x := ⋂_j U_{(x,y_j)}. Now choose x_1, …, x_m such that K ⊂ U := ⋃_i U_{x_i}, and let V := ⋂_i V_{x_i}.

Proposition 4.5.9. For T ∈ {K,K,H} the function π : (K, L) ↦ K × L is a continuous function from T(X) × T(Y) to T(X × Y).

Proof. Let K ⊂ X and L ⊂ Y be compact. If W is a neighborhood of K × L and U and V are open neighborhoods of K and L with U × V ⊂ W (these exist by Lemma 4.5.8), then

(K, L) ∈ U_U × U_V ⊂ π^{-1}(U_W).

By Lemma 4.5.1, this establishes the asserted continuity when T ∈ {K,K}. To demonstrate continuity when T = H we must also show that π^{-1}(V_W) is open in H(X) × H(Y) whenever W ⊂ X × Y is open. Suppose that (K × L) ∩ W ≠ ∅. Choose (x, y) ∈ (K × L) ∩ W, and choose open neighborhoods U and V of x and y with U × V ⊂ W. Then

(K, L) ∈ V_U × V_V ⊂ π^{-1}(V_W).

4.5.5 The Action of a Function

Now fix a continuous function f : X → Y. Then f maps compact sets to compact sets while f^{-1}(D) is closed whenever D ⊂ Y is closed. The first of these operations is as well behaved as one might hope.

Lemma 4.5.10. If T ∈ {K,K,H}, then φ_f : K ↦ f(K) is a continuous function from T(X) to T(Y).

Proof. Preimages of subbasic open sets are open: for any open V ⊂ Y we have φ_f^{-1}(W_V) = W_{f^{-1}(V)} for all W ∈ {U,U,V}.

There is the following consequence for closed sets.

Lemma 4.5.11. If X is compact, Y is Hausdorff, and T ∈ {K,K,H}, then φ_f : K ↦ f(K) is a continuous function from T^0(X) to T^0(Y).

Proof. Recall that a closed subset of a compact space X is compact,¹ so that T^0(X) ⊂ T(X). As we mentioned earlier, T^0(X) has the relative topology induced by the topology of T(X), so the last result implies that φ_f is a continuous function from T^0(X) to T(Y). The proof is completed by recalling that a compact subset of a Hausdorff space Y is closed,² so that T(Y) ⊂ T^0(Y).

Since preimages of closed sets are closed, there is a well defined function ψ_f : D ↦ f^{-1}(D) from K^0(Y) to K^0(X). We need an additional hypothesis to guarantee that it is continuous. Recall that a function is closed if it is continuous and maps closed sets to closed sets.

Lemma 4.5.12. If f is a closed map, then ψ_f : D ↦ f^{-1}(D) is a continuous function from K^0(Y) to K^0(X).

Proof. For an open U ⊂ X, we claim that ψ_f^{-1}(U^0_U) = U^0_{Y \ f(X \ U)}. First of all, Y \ f(X \ U) is open because f is a closed map. If D ⊂ Y \ f(X \ U) is closed, then f^{-1}(D) is a closed subset of U. Thus U^0_{Y \ f(X \ U)} ⊂ ψ_f^{-1}(U^0_U). On the other hand, if D ⊂ Y is closed and f^{-1}(D) ⊂ U, then D ∩ f(X \ U) = ∅. Thus ψ_f^{-1}(U^0_U) ⊂ U^0_{Y \ f(X \ U)}.
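The set-theoretic core of this proof is the equivalence f^{-1}(D) ⊂ U if and only if D ⊂ Y \ f(X \ U). As a hedged sanity check (not a substitute for the argument), it can be verified exhaustively for a small map between finite sets; the sets X, Y and the map f below are hypothetical and chosen only so that f is not surjective.

```python
from itertools import chain, combinations


def subsets(points):
    """All subsets (including the empty set) of a finite set."""
    pts = list(points)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(pts, r) for r in range(len(pts) + 1))]


X = frozenset(range(4))
Y = frozenset("abc")
f = {0: "a", 1: "a", 2: "b", 3: "b"}   # hypothetical map X -> Y, misses "c"


def image(S):
    """f(S) for a subset S of X."""
    return frozenset(f[x] for x in S)


def preimage(D):
    """f^{-1}(D) for a subset D of Y."""
    return frozenset(x for x in X if f[x] in D)


def identity_holds(U):
    """Check: f^{-1}(D) is inside U  <=>  D is inside Y minus f(X minus U)."""
    target = Y - image(X - U)
    return all((preimage(D) <= U) == (D <= target) for D in subsets(Y))


print(all(identity_holds(U) for U in subsets(X)))
```

The closedness of f enters only afterwards, to guarantee that Y \ f(X \ U) is open.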

Of course if f is closed and surjective, then ψ_f restricts to a continuous map from K^0(Y) to K^0(X). When X is compact and Y is Hausdorff, any continuous f : X → Y is closed, because any closed subset of X is compact, so its image is compact and consequently closed. Here is an example illustrating how the assumption that f is closed is indispensable.

Example 4.5.13. Suppose 0 < ε < π, let X = (−ε, 2π + ε) and Y = { z ∈ C : |z| = 1 }, and let f : X → Y be the function f(t) := e^{it}. The function ψ_f : D ↦ f^{-1}(D) is discontinuous at D_0 = { e^{it} : ε ≤ t ≤ 2π − ε } because for any open V containing D_0 there are closed D ⊂ V such that f^{-1}(D) includes points far from f^{-1}(D_0) = [ε, 2π − ε].
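A numerical sketch of the example, under the illustrative choice ε = 0.5: adding to D_0 the single point e^{i(ε − δ)}, which lies within arc length δ of D_0, produces a closed set whose preimage contains a point at distance about 2ε − δ from [ε, 2π − ε]. The helper names are hypothetical.

```python
import cmath
import math

EPS = 0.5                                   # any value with 0 < EPS < pi
LO, HI = -EPS, 2 * math.pi + EPS            # X is the open interval (LO, HI)


def f(t: float) -> complex:
    """The map t -> e^{it} from X to the unit circle."""
    return cmath.exp(1j * t)


def preimage_of_point(z: complex) -> list:
    """All t in X with f(t) = z (up to floating point error)."""
    base = cmath.phase(z)                   # principal argument in (-pi, pi]
    candidates = [base + 2 * math.pi * k for k in (-1, 0, 1, 2)]
    return sorted(t for t in candidates if LO < t < HI)


def dist_to_interval(t: float, a: float, b: float) -> float:
    """Distance from t to the closed interval [a, b]."""
    return max(a - t, 0.0, t - b)


def far_distance(delta: float) -> float:
    """Largest distance from [EPS, 2*pi - EPS] to a preimage of e^{i(EPS - delta)}."""
    z = f(EPS - delta)
    return max(dist_to_interval(t, EPS, 2 * math.pi - EPS)
               for t in preimage_of_point(z))


for delta in (0.1, 0.01, 0.001):
    print(delta, far_distance(delta))       # stays close to 2 * EPS, not to 0
```

As δ shrinks, the added point approaches D_0 while the extra preimage point near 2π + ε − δ stays a fixed distance away, which is the discontinuity described in the example.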

4.5.6 The Union of the Elements

Whenever we have a set of subsets of some space, we can take the union of its elements. For any open U ⊂ X we have ⋃_{K ∈ U_U} K = U because for each x ∈ U, {x} is compact. Since the sets U_U are a base for the topology of K(X), it follows that the union of all elements of an open subset of K(X) is open. If U and V_1, …, V_k are open, then U_U ∩ V_{V_1} ∩ ⋯ ∩ V_{V_k} = ∅ if there is some j with U ∩ V_j = ∅, and otherwise

{x, y_1, …, y_k} ∈ U_U ∩ V_{V_1} ∩ ⋯ ∩ V_{V_k}

whenever x ∈ U and y_1 ∈ V_1 ∩ U, …, y_k ∈ V_k ∩ U, so the union of all K ∈ U_U ∩ V_{V_1} ∩ ⋯ ∩ V_{V_k} is again U. Therefore the union of all the elements of an open subset of H(X) is open. If X is either T1 or regular, then similar logic shows that for either T ∈ {K^0, H^0} the union of the elements of an open subset of T(X) is open.

¹ Proof: an open cover of the subset, together with its complement, is an open cover of the space, any finite subcover of which yields a finite subcover of the subset.

² Proof: fixing a point y in the complement of the compact set K, for each x ∈ K there are disjoint neighborhoods U_x of x and V_x of y, {U_x} is an open cover of K, and if U_{x_1}, …, U_{x_n} is a finite subcover, then V_{x_1} ∩ … ∩ V_{x_n} is a neighborhood of y that does not intersect K.

If a subset C of H(X) or H^0(X) is compact, then it is automatically compact in the coarser topology of K(X) or K^0(X). Therefore the following two results imply the analogous claims for H(X) and H^0(X), which are already interesting.

Lemma 4.5.14. If S ⊂ K(X) is compact, then L := ⋃_{K ∈ S} K is compact.

Proof. Let {U_α : α ∈ A} be an open cover of L. For each K ∈ S let V_K be the union of the elements of some finite subcover of K. Then K ∈ U_{V_K}, so { U_{V_K} : K ∈ S } is an open cover of S; let U_{V_{K_1}}, …, U_{V_{K_r}} be a finite subcover. Then L ⊂ ⋃_{i=1}^r V_{K_i}, and the various sets from {U_α} that were united to form the V_{K_i} are the desired finite subcover of L.

Lemma 4.5.15. If X is regular and S ⊂ K^0(X) is compact, then D := ⋃_{C ∈ S} C is closed.

Proof. We will show that X \ D is open; let x be a point in this set. Each element of S is a closed set that does not contain x, so (since X is regular) it is an element of U^0_{X \ N} for some closed neighborhood N of x. Since S is compact we have S ⊂ U^0_{X \ N_1} ∪ … ∪ U^0_{X \ N_k} for some N_1, …, N_k. Then N_1 ∩ … ∩ N_k is a neighborhood of x that does not intersect any element of S, so x is in the interior of X \ D as desired.

Chapter 5

Topologies on Functions and Correspondences

In order to study the robustness of fixed points, or sets of fixed points, with respect to perturbations of the function or correspondence, one must specify topologies on the relevant spaces of functions and correspondences. We do this by identifying a function or correspondence with its graph, so that the topologies from the last chapter can be invoked. The definitions of upper and lower semicontinuity, and their basic properties, are given in Section 5.1. There are two topologies on the space of upper semicontinuous correspondences from X to Y. The strong upper topology, which is defined and discussed in Section 5.2, turns out to be rather poorly behaved, and the weak upper topology, which is usually at least as coarse, is presented in Section 5.3. When X is compact the strong upper topology coincides with the weak upper topology.

We will frequently appeal to a perspective in which a homotopy h : X × [0, 1] → Y is understood as a continuous function t ↦ h_t from [0, 1] to the space of continuous functions from X to Y. Section 5.4 presents the underlying principle in full generality for correspondences. The specializations to functions of the strong and weak upper topologies are known as the strong topology and the weak topology respectively. If X is regular, then the weak topology coincides with the compact-open topology, and when X is compact the strong and weak topologies coincide. Section 5.5 discusses these matters, and presents some results for functions that are not consequences of more general results pertaining to correspondences.

The strong upper topology plays an important role in the development of the topic, and its definition provides an important characterization of the weak upper topology when the domain is compact, but it does not have any independent significance. Throughout the rest of the book, barring an explicit counterindication, the space of upper semicontinuous correspondences from X to Y will be endowed with the weak upper topology, and the space of continuous functions from X to Y will be endowed with the weak topology.


5.1 Upper and Lower Semicontinuity

Let X and Y be topological spaces. Recall that a correspondence F : X → Y maps each x ∈ X to a nonempty F(x) ⊂ Y. The graph of F is

Gr(F ) = { (x, y) ∈ X × Y : y ∈ F (x) }.

If each F(x) is compact (closed, convex, etc.) then F is compact valued (closed valued, convex valued, etc.). We say that F is upper semicontinuous if it is compact valued and, for any x ∈ X and open set V ⊂ Y containing F(x), there is a neighborhood U of x such that F(x′) ⊂ V for all x′ ∈ U. When F is compact valued, it is upper semicontinuous if and only if F^{-1}(U_V) is open whenever V ⊂ Y is open. Thus:

Lemma 5.1.1. A compact valued correspondence F : X → Y is upper semicontinuous if and only if it is continuous when regarded as a function from X to K(Y).

In the economics literature the graph being closed in X × Y is sometimes presented as the definition of upper semicontinuity. Useful intuitions and simple arguments flow from this point of view, so we should understand precisely when it is justified.

Proposition 5.1.2. If F is upper semicontinuous and Y is a Hausdorff space, then Gr(F) is closed.

Proof. We show that the complement of the graph is open. Suppose (x, y) ∉ Gr(F). Since Y is Hausdorff, y and each point z ∈ F(x) have disjoint neighborhoods V_z and W_z. Since F(x) is compact, F(x) ⊂ W_{z_1} ∪ ⋯ ∪ W_{z_k} for some z_1, …, z_k. Then V := V_{z_1} ∩ ⋯ ∩ V_{z_k} and W := W_{z_1} ∪ ⋯ ∪ W_{z_k} are disjoint neighborhoods of y and F(x) respectively. If U is a neighborhood of x with F(x′) ⊂ W for all x′ ∈ U, then U × V is a neighborhood of (x, y) that does not intersect Gr(F).

If Y is not compact, then a compact valued correspondence F : X → Y with a closed graph need not be upper semicontinuous. For example, suppose X = Y = R, F(0) = {0}, and F(t) = {1/t} when t ≠ 0.
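The failure of upper semicontinuity at t = 0 in this example can be made concrete: V = (−1, 1) is an open set containing F(0) = {0}, yet every interval (−δ, δ) contains a point t with F(t) outside V. The sketch below exhibits such a witness; the names `F` and `witness` are illustrative helpers, with `F` returning the single element of the set F(t).

```python
def F(t: float) -> float:
    """The single element of F(t): 0 at t = 0 and 1/t otherwise."""
    return 0.0 if t == 0 else 1.0 / t


def witness(delta: float) -> float:
    """A point t with 0 < t < delta whose value F(t) lies outside V = (-1, 1)."""
    return min(delta, 1.0) / 2.0


for delta in (1.0, 1e-2, 1e-6):
    t = witness(delta)
    print(delta, t, abs(F(t)) < 1.0)   # the last entry is False each time
```

Since t ≤ 1/2, the value 1/t is at least 2, so no neighborhood of 0 is mapped into V even though the graph of F is closed.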

Proposition 5.1.3. If Y is compact and Gr(F) is closed, then F is upper semicontinuous.

Proof. Fix x ∈ X. Since (X × Y) \ Gr(F) is open, for each y ∈ Y \ F(x) we can choose neighborhoods U_y of x and V_y of y such that (U_y × V_y) ∩ Gr(F) = ∅. In particular, Y \ F(x) = ⋃_{y ∈ Y \ F(x)} V_y is open, so F(x) is closed and therefore compact. Thus F is compact valued.

Now fix an open neighborhood V of F(x). Since Y \ V is a closed subset of a compact space, hence compact, there are y_1, …, y_k ∈ Y \ V such that Y \ V ⊂ V_{y_1} ∪ … ∪ V_{y_k}. Then F(x′) ⊂ V for all x′ ∈ U_{y_1} ∩ … ∩ U_{y_k}.

Proposition 5.1.4. If F is upper semicontinuous and X is compact, then Gr(F) is compact.


Proof. We have the following implications of earlier results:

• Lemma 4.5.7 implies that the function x ↦ {x} ∈ K(X) is continuous;

• Lemma 5.1.1 implies that F is continuous, as a function from X to K(Y);

• Proposition 4.5.9 states that (K, L) ↦ K × L is a continuous function from K(X) × K(Y) to K(X × Y).

Together these imply that F̃ : x ↦ {x} × F(x) is continuous, as a function from X to K(X × Y). Since X is compact, it follows that F̃(X) is a compact subset of K(X × Y), so Lemma 4.5.14 implies that Gr(F) = ⋃_{x ∈ X} F̃(x) is compact.

We say that F is lower semicontinuous if, for each x ∈ X, y ∈ F(x), and neighborhood V of y, there is a neighborhood U of x such that F(x′) ∩ V ≠ ∅ for all x′ ∈ U. If F is both upper and lower semicontinuous, then it is said to be continuous. When F is compact valued, it is lower semicontinuous if and only if F^{-1}(V_V) is open whenever V ⊂ Y is open. Combining this with Lemma 5.1.1 gives:

Lemma 5.1.5. A compact valued correspondence F : X → Y is continuous if and only if it is continuous when regarded as a function from X to H(Y).

5.2 The Strong Upper Topology

Let X and Y be topological spaces with Y Hausdorff, and let U(X, Y) be the set of upper semicontinuous correspondences from X to Y. Proposition 5.1.2 ensures that the graph of each F ∈ U(X, Y) is closed, so there is an embedding F ↦ Gr(F) of U(X, Y) in K^0(X × Y). The strong upper topology is the topology induced by this embedding when the image has the subspace topology. Let U_S(X, Y) be U(X, Y) endowed with this topology. Since { U^0_V : V ⊂ X × Y is open } is a subbase for K^0(X × Y), there is a subbase of U_S(X, Y) consisting of the sets of the form { F : Gr(F) ⊂ V }.

Naturally the following result is quite important.

Theorem 5.2.1. If Y is a Hausdorff space and X is a compact subset of Y, then

FP : U_S(X, Y) → K(X)

is continuous.

Proof. Since Y is Hausdorff, X and ∆ = { (x, x) : x ∈ X } are closed subsets of Y and X × Y respectively. For each F ∈ U_S(X, Y), FP(F) is the projection of Gr(F) ∩ ∆ onto the first coordinate. Since Gr(F) is compact (Proposition 5.1.4) so is Gr(F) ∩ ∆, and the projection is continuous, so FP(F) is compact. The definition of the strong upper topology implies that Gr(F) is a continuous function of F. Since ∆ is closed in X × Y, Lemma 4.5.3 implies that Gr(F) ∩ ∆ is a continuous function of F, after which Lemma 4.5.10 implies that FP(F) is a continuous function of F.


The basic operations for combining given correspondences to create new correspondences are restriction to a subset of the domain, cartesian products, and composition. We now study the continuity of these constructions.

Lemma 5.2.2. If A is a closed subset of X, then the map F ↦ F|_A is continuous as a function from U_S(X, Y) to U_S(A, Y).

Proof. Since A × Y is a closed subset of X × Y, continuity as a function from U_S(X, Y) to U_S(A, Y) (that is, continuity of Gr(F) ↦ Gr(F) ∩ (A × Y)) follows immediately from Lemma 4.5.4.

An additional hypothesis is required to obtain continuity of restriction to a compact subset of the domain, but in this case we obtain a kind of joint continuity.

Lemma 5.2.3. If X is regular, then the map (F, K) ↦ Gr(F|_K) is a continuous function from U_S(X, Y) × K(X) to K(X × Y). In particular, for any fixed K the map F ↦ F|_K is a continuous function from U_S(X, Y) to U_S(K, Y).

Proof. Fix F ∈ U_S(X, Y), K ∈ K(X), and an open neighborhood W of Gr(F|_K). For each x ∈ K Lemma 4.5.8 gives neighborhoods U_x of x and V_x of F(x) with U_x × V_x ⊂ W. Choose x_1, …, x_k such that U := U_{x_1} ∪ … ∪ U_{x_k} contains K. Since X is regular, each point in K has a closed neighborhood contained in U, and the interiors of finitely many of these cover K, so K has a closed neighborhood C contained in U. Let

W′ := (U_{x_1} × V_{x_1}) ∪ … ∪ (U_{x_k} × V_{x_k}) ∪ ((X \ C) × Y).

Then (K, Gr(F)) ∈ U_{int C} × U_{W′}, and whenever (K′, Gr(F′)) ∈ U_{int C} × U_{W′} we have

Gr(F′|_{K′}) ⊂ W′ ∩ (C × Y) ⊂ (U_{x_1} × V_{x_1}) ∪ … ∪ (U_{x_k} × V_{x_k}) ⊂ W.

Let X′ and Y′ be two other topological spaces with Y′ Hausdorff. Since the map (C, D) ↦ C × D is not a continuous operation on closed sets, we should not expect the function (F, F′) ↦ F × F′ from U_S(X, Y) × U_S(X′, Y′) to U_S(X × X′, Y × Y′) to be continuous, and indeed, after giving the matter a bit of thought, the reader should be able to construct a neighborhood of the graph of the function (x, x′) ↦ (0, 0) that shows that the map (F, F′) ↦ F × F′ from U_S(R, R) × U_S(R, R) to U_S(R^2, R^2) is not continuous.

We now turn our attention to composition. Suppose that, in addition to X and Y, we have a third topological space Z that is Hausdorff. (We continue to assume that Y is Hausdorff.) We can define a composition operation (F, G) ↦ G ∘ F from U(X, Y) × U(Y, Z) to U(X, Z) by letting

G(F(x)) := ⋃_{y ∈ F(x)} G(y).

That is, G(F(x)) is the projection onto Z of Gr(G|_{F(x)}), which is compact by Proposition 5.1.4, so G(F(x)) is compact. Thus G ∘ F is compact valued. To show that G ∘ F is upper semicontinuous, consider an x ∈ X, and let W be a neighborhood of G(F(x)). For each y ∈ F(x) there is an open neighborhood V_y of y such that G(y′) ⊂ W for all y′ ∈ V_y. Setting V := ⋃_{y ∈ F(x)} V_y, we have G(y) ⊂ W for all y ∈ V. If U is a neighborhood of x such that F(x′) ⊂ V for all x′ ∈ U, then G(F(x′)) ⊂ W for all x′ ∈ U.

We can also define G ◦ F to be the correspondence whose graph is

π_{X×Z}((Gr(F) × Z) ∩ (X × Gr(G)))

where π_{X×Z} : X × Y × Z → X × Z is the projection. This definition involves set operations that are not continuous, so we should suspect that (F, G) ↦ G ∘ F is not a continuous function from U_S(X, Y) × U_S(Y, Z) to U_S(X, Z). For a concrete example let X = Y = Z = R, and let f and g be the constant functions with value zero. If U and V are neighborhoods of the graphs of f and g, there are δ, ε > 0 such that (−δ, δ) × (−ε, ε) ⊂ V, and consequently the set of g′ ∘ f′ with Gr(f′) ⊂ U and Gr(g′) ⊂ V contains the set of all constant functions with values in (−ε, ε), but of course there are neighborhoods of the graph of g ∘ f that do not contain this set of functions for any ε.
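One explicit neighborhood of the graph of g ∘ f with the stated property, offered here only as an illustration (the text does not single one out), is the set already used in Section 4.5.4. The symbol c below denotes the value of an arbitrary nonzero constant function and is introduced just for this computation.

```latex
\[
W = \{\, (x, z) \in \mathbb{R}^2 : |z| < (1 + x^2)^{-1} \,\},
\qquad
\operatorname{Gr}(g \circ f) = \mathbb{R} \times \{0\} \subset W .
\]
% For the constant function with value c \neq 0, the point (x, c) leaves W
% as soon as x is large:
\[
|c| \ge (1 + x^2)^{-1}
\quad\text{whenever}\quad
x^2 \ge |c|^{-1} - 1 ,
\]
% so \mathbb{R} \times \{c\} \not\subset W for every c \neq 0, however small.
```

Thus the subbasic open set { h : Gr(h) ⊂ W } contains g ∘ f but no nonzero constant function, whatever ε is.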

5.3 The Weak Upper Topology

As in the last section, X and Y are topological spaces with Y Hausdorff. There is another topology on U(X, Y) that is in certain ways more natural and better behaved than the strong upper topology. Recall that if {B_i}_{i∈I} is a collection of topological spaces and { f_i : A → B_i }_{i∈I} is a collection of functions, the quotient topology on A induced by this data is the coarsest topology such that each f_i is continuous. (This construction is more commonly called the initial topology.) The weak upper topology on U(X, Y) is the quotient topology induced by the functions F ↦ F|_K ∈ U_S(K, Y) for compact K ⊂ X. Since a function is continuous if and only if the preimage of every subbasic subset of the range is open, a subbase for the weak upper topology is given by the sets of the form { F : Gr(F|_K) ⊂ V } where K ⊂ X is compact and V is a (relatively) open subset of K × Y.

Let U_W(X, Y) be U(X, Y) endowed with the weak upper topology. As in the last section, we study the continuity of basic constructions.

Lemma 5.3.1. If A is a closed subset of X, then the map F ↦ F|_A is continuous as a function from U_W(X, Y) to U_W(A, Y).

Proof. If A has the quotient topology induced by { f_i : A → B_i }_{i∈I}, then a function g : Z → A is continuous if each composition f_i ∘ g is continuous. (The sets of the form f_i^{-1}(V_i), where V_i ⊂ B_i is open, constitute a subbase of the quotient topology, so this follows from Lemma 4.5.1.) To show that the composition F ↦ F|_A ↦ F|_K is continuous whenever K is a compact subset of A we simply observe that K is compact as a subset of X, so this follows directly from the definition of the topology of U_W(X, Y).

Lemma 5.3.2. If every compact set in X is closed (e.g., because X is Hausdorff) then the topology of U_W(X, Y) is at least as coarse as the topology of U_S(X, Y). If, in addition, X is itself compact, then the two topologies coincide.


Proof. We need to show that the identity map from U_S(X, Y) to U_W(X, Y) is continuous, which is to say that for any given compact K ⊂ X, the map Gr(F) ↦ Gr(F|_K) = Gr(F) ∩ (K × Y) is continuous. This follows from Lemma 5.2.2, because K × Y is closed in X × Y whenever K is compact (compact subsets of X being closed by hypothesis).

If X is compact, the continuity of the identity map from U_W(X, Y) to U_S(X, Y) follows directly from the definition of the weak upper topology.

There is a useful variant of Lemma 5.2.3.

Lemma 5.3.3. If X is normal, Hausdorff, and locally compact, then the function (K, F) ↦ Gr(F|_K) is a continuous function from K(X) × U_W(X, Y) to K(X × Y).

Proof. We will demonstrate continuity at a given point (K, F) in the domain. Local compactness implies that there is a compact neighborhood C of K. The map F′ ↦ F′|_C from U_W(X, Y) to U_S(C, Y) is a continuous function by virtue of the definition of the topology of U_W(X, Y). Therefore Lemma 5.2.3 implies that the composition (K′, F′) ↦ (K′, F′|_C) ↦ Gr(F′|_{K′}) is continuous, and of course it agrees with the function in question on a neighborhood of (K, F).

In contrast with the strong upper topology, for the weak upper topology cartesian products and composition are well behaved. Let X′ and Y′ be two other spaces with Y′ Hausdorff.

Lemma 5.3.4. If X and X′ are Hausdorff, then the function (F, F′) ↦ F × F′ from U_W(X, Y) × U_W(X′, Y′) to U_W(X × X′, Y × Y′) is continuous.

Proof. First suppose that X and X′ are compact. Then, by Proposition 5.1.4, the graphs of upper semicontinuous correspondences with these domains are compact, and continuity of the function (F, F′) ↦ F × F′ from U_S(X, Y) × U_S(X′, Y′) to U_S(X × X′, Y × Y′) follows from Proposition 4.5.9.

Because U_W(X × X′, Y × Y′) has the quotient topology, to establish the general case we need to show that (F, F′) ↦ (F × F′)|_C is a continuous function from U_W(X, Y) × U_W(X′, Y′) to U_S(C, Y × Y′) whenever C ⊂ X × X′ is compact. Let K and K′ be the projections of C onto X and X′ respectively; of course these sets are compact. The map in question is the composition

(F, F′) ↦ (F|_K, F′|_{K′}) ↦ F|_K × F′|_{K′} ↦ (F|_K × F′|_{K′})|_C.

The continuity of the second map has already been established, and the continuity of the first and third maps follows from Lemma 5.3.1, because compact subsets of Hausdorff spaces are closed and products of Hausdorff spaces are Hausdorff.¹

Suppose that, in addition to X and Y, we have a third topological space Z that is Hausdorff.

Lemma 5.3.5. If K ⊂ X is compact, Y is normal and locally compact, and X × Y × Z is normal, then

(F, G) ↦ Gr(G ∘ F|_K)

is a continuous function from U_W(X, Y) × U_W(Y, Z) to K(X × Z).

¹ I do not know if the compact subsets of X × X′ are closed when X and X′ are compact spaces whose compact subsets are closed.


Proof. The map F ↦ Gr(F|_K) is a continuous function from U_W(X, Y) to K(X × Y) by virtue of the definition of the weak upper topology, and the natural projection of X × Y onto Y is continuous, so Lemma 4.5.10 implies that im(F|_K) is a continuous function of (K, F). Since Y is normal and locally compact, Lemma 5.3.3 implies that (F, G) ↦ Gr(G|_{im(F|_K)}) is a continuous function from U_W(X, Y) × U_W(Y, Z) to K(Y × Z), and again (F, G) ↦ im(G|_{im(F|_K)}) is also continuous. The continuity of cartesian products of compact sets (Proposition 4.5.9) now implies that

Gr(F|_K) × im(G|_{im(F|_K)}) and K × Gr(G|_{im(F|_K)})

are continuous functions of (K, F, G). Since X is T1 while Y and Z are Hausdorff, X × Y × Z is T1, so Lemma 4.5.6 implies that the intersection

{ (x, y, z) : x ∈ K, y ∈ F(x), and z ∈ G(y) }

of these two sets is a continuous function of (K, F, G), and Gr(G ∘ F|_K) is the projection of this set onto X × Z, so the claim follows from another application of Lemma 4.5.10.

As we explained in the proof of Lemma 5.3.1, the continuity of (F, G) ↦ G ∘ F|_K for each compact K ⊂ X implies that (F, G) ↦ G ∘ F is continuous when the range has the weak upper topology, so:

Proposition 5.3.6. If X is T1, Y is normal and locally compact, and X × Y × Z is normal, then (F, G) ↦ G ∘ F is a continuous function from U_W(X, Y) × U_W(Y, Z) to U_W(X, Z).

5.4 The Homotopy Principle

Let X, Y, and Z be topological spaces with Z Hausdorff, and fix a compact valued correspondence F : X × Y → Z. For each x ∈ X let F_x : Y → Z be the derived correspondence y ↦ F(x, y). Motivated by homotopies, we study the relationship between the following two conditions:

(a) x ↦ F_x is a continuous function from X to U_S(Y, Z);

(b) F is upper semicontinuous.

If F : X × Y → Z is upper semicontinuous, then x ↦ F_x will not necessarily be continuous without some additional hypothesis. For example, let X = Y = Z = R, and suppose that F(0, y) = {0} for all y ∈ Y. Without F being in any sense poorly behaved, it can easily happen that for x arbitrarily close to 0 the graph of F_x is not contained in { (y, z) : |z| < (1 + y^2)^{-1} }.

Lemma 5.4.1. If Y is compact and F is upper semicontinuous, then x ↦ F_x is a continuous function from X to U_S(Y, Z).


Proof. For x ∈ X let F̃_x : Y → Y × Z be the correspondence F̃_x(y) := {y} × F_x(y). Clearly F̃_x is compact valued and continuous as a function from Y to K(Y × Z). Since Y is compact, the image of F̃_x is compact, so Lemma 4.5.14 implies that Gr(F_x) = ⋃_{y ∈ Y} F̃_x(y) is compact, and Lemma 4.5.15 implies that it is closed. Since Z is a Hausdorff space, Proposition 5.1.2 implies that Gr(F) is closed.

Now Proposition 5.1.3 implies that x ↦ Gr(F_x) is upper semicontinuous, which is the same (by Lemma 5.1.1) as it being a continuous function from X to K(Y × Z). But since Gr(F_x) is closed for all x, this is the same as it being a continuous function from X to K^0(Y × Z), and in view of the definition of the topology of U_S(Y, Z), this is the same as x ↦ F_x being continuous.

Lemma 5.4.2. If Y is regular and x ↦ F_x is a continuous function from X to U_S(Y, Z), then F is upper semicontinuous.

Proof. Fix (x, y) ∈ X × Y and a neighborhood W ⊂ Z of F(x, y). Since F_x is upper semicontinuous, there is an open neighborhood V of y such that F(x, y′) ⊂ W for all y′ ∈ V. Applying the regularity of Y, let V′ be a closed neighborhood of y contained in V. Since x ↦ F_x is continuous, there is a neighborhood U ⊂ X of x such that

Gr(F_{x′}) ⊂ (V × W) ∪ ((Y \ V′) × Z)

for all x′ ∈ U. Then F(x′, y′) ⊂ W for all (x′, y′) ∈ U × V′.

For the sake of easier reference we combine the last two results.

Theorem 5.4.3. If Y is regular and compact, then F is upper semicontinuous if and only if x ↦ F_x is a continuous function from X to U_S(Y, Z).

5.5 Continuous Functions

If X and Y are topological spaces with Y Hausdorff, C_S(X, Y) and C_W(X, Y) will denote the space of continuous functions with the topologies induced by the inclusions of C(X, Y) in U_S(X, Y) and U_W(X, Y). In connection with continuous functions, these topologies are known as the strong topology and weak topology respectively. Most of the properties of interest are automatic corollaries of our earlier work; this section contains a few odds and ends that are specific to functions.

If K ⊂ X is compact and V ⊂ Y is open, let C_{K,V} be the set of continuous functions f such that f(K) ⊂ V. The compact-open topology is the topology generated by the subbasis

{ C_{K,V} : K ⊂ X is compact, V ⊂ Y is open },

and C_{CO}(X, Y) will denote the space of continuous functions from X to Y endowed with this topology. The set of correspondences F : X → Y with Gr(F|_K) ⊂ K × V is open in U_W(X, Y), so the compact-open topology is always at least as coarse as the topology inherited from U_W(X, Y).

Proposition 5.5.1. Suppose X is regular. Then the compact-open topology coincides with the weak topology.

Proof. What this means concretely is that whenever we are given a compact K ⊂ X, an open set W ⊂ K × Y, and a continuous f : X → Y with Gr(f|_K) ⊂ W, we can find a compact-open neighborhood of f whose elements f′ satisfy Gr(f′|_K) ⊂ W. For each x ∈ K the definition of the product topology gives open sets U_x ⊂ K and V_x ⊂ Y such that (x, f(x)) ∈ U_x × V_x ⊂ W. Since f is continuous, by replacing U_x with a smaller open neighborhood if necessary, we may assume that f(U_x) ⊂ V_x. Since X is regular, x has a closed neighborhood C_x ⊂ U_x, and C_x is compact because it is a closed subset of a compact set. Then f ∈ C_{C_x,V_x} for each x. We can find x_1, …, x_n such that K = C_{x_1} ∪ … ∪ C_{x_n}, and clearly Gr(f′|_K) ⊂ W whenever

f′ ∈ C_{C_{x_1},V_{x_1}} ∩ … ∩ C_{C_{x_n},V_{x_n}}.

For functions there is a special result concerning continuity of composition.

Proposition 5.5.2. If X is compact and f : X → Y is continuous, then g ↦ g ∘ f is a continuous function from C_S(Y, Z) to C_S(X, Z).

Proof. In view of the subbasis for the strong topology, it suffices to show, for a given continuous g : Y → Z and an open V ⊂ X × Z containing the graph of g ∘ f, that

N = { (y, z) ∈ Y × Z : f^{-1}(y) × {z} ⊂ V }

is a neighborhood of the graph of g. If not, then some point (y, g(y)) is an accumulation point of points of the form (f(x′), z) where (x′, z) ∉ V. Since X is compact, it cannot be the case that for each x ∈ X there are neighborhoods A of x and B of (y, g(y)) such that

{ (x′, z) ∈ (A × Z) \ V : (f(x′), z) ∈ B } = ∅.

Therefore there is some x ∈ X such that for any neighborhoods A of x and B of (y, g(y)) there is some x′ ∈ A and z such that (x′, z) ∉ V and (f(x′), z) ∈ B. Evidently f(x) = y. To obtain a contradiction choose neighborhoods A of x and W of g(y) such that A × W ⊂ V, and set B = Y × W.

The following simple result, which does not depend on any additional assumptions on the spaces, is sometimes just what we need.

Proposition 5.5.3. If g : Y → Z is continuous, then f ↦ g ∘ f is a continuous function from C_S(X, Y) to C_S(X, Z).

Proof. If U ⊂ X × Z is open, then so is (Id_X × g)^{-1}(U).

Chapter 6

Metric Space Theory

In this chapter we develop some advanced results concerning metric spaces. Partitions of unity, an important tool, exist for locally finite open covers of a normal space: this is shown in Section 6.2. But sometimes we will be given an open cover that is not necessarily locally finite, so we need to know that any open cover has a locally finite refinement. A space is paracompact if this is the case. Paracompactness is studied in Section 6.1; the fact that metric spaces are paracompact will be quite important.

Section 6.3 describes most of the rather small amount we will need to know about topological vector spaces. Of these, the most important for us are the locally convex spaces, which have many desirable properties. One of the larger themes of this study is that the concepts and results of fixed point theory extend naturally to this level of generality, but not further.

Two important types of topological vector spaces, Banach spaces and Hilbert spaces, are introduced in Section 6.4. Results showing that metric spaces can be embedded in such linear spaces are given in Section 6.5. Section 6.6 presents an infinite dimensional generalization of the Tietze extension theorem due to Dugundji.

6.1 Paracompactness

Fix a topological space X. A family {S_α}_{α∈A} of subsets of X is locally finite if every x ∈ X has a neighborhood W such that there are only finitely many α with W ∩ S_α ≠ ∅. If {U_α}_{α∈A} is a cover of X, a second cover {V_β}_{β∈B} is a refinement of {U_α}_{α∈A} if each V_β is a subset of some U_α. The space X is paracompact if every open cover is refined by an open cover that is locally finite. This section is devoted to the proof of:

Theorem 6.1.1. A metric space is paracompact.

This result is due to Stone (1948). At first the proofs were rather complex, but eventually Rudin (1969) found a brief and simple argument. A well ordering of a set Z is a complete ordering ≤ such that any nonempty A ⊂ Z has a least element. That any set Z has a well ordering is the assertion of the well ordering theorem, which is a simple consequence of Zorn’s lemma. Let O be the set of all pairs (Z′, ≤′) where Z′ ⊂ Z and ≤′ is a well ordering of Z′. We order O by specifying that (Z′, ≤′) ⪯ (Z′′, ≤′′) if Z′ ⊂ Z′′, ≤′ is the restriction of ≤′′ to Z′ and z′ ≤′′ z′′ for all z′ ∈ Z′ and z′′ ∈ Z′′ \ Z′. Any chain in O has an upper bound in O (just take the union of all the sets and all the orderings) so Zorn’s lemma implies that O has a maximal element (Z∗, ≤∗). If there were a z ∈ Z \ Z∗ we could extend ≤∗ to a well ordering of Z∗ ∪ {z} by specifying that every element of Z∗ is less than z. This would contradict maximality, so we must have Z∗ = Z. (The axiom of choice, Zorn’s lemma, and the well ordering theorem are actually equivalent; cf. Kelley (1955).)

Proof of Theorem 6.1.1. Let $\{U_\alpha\}_{\alpha \in A}$ be an open cover of X where A is a well ordered set. We define sets $V_{\alpha n}$ for α ∈ A and n = 1, 2, . . ., inductively (over n) as follows: let $V_{\alpha n}$ be the union of the balls $U_{2^{-n}}(x)$ for those x such that:

(a) α is the least element of A such that $x \in U_\alpha$;

(b) $x \notin \bigcup_{j<n,\ \beta \in A} V_{\beta j}$;

(c) $U_{3 \cdot 2^{-n}}(x) \subset U_\alpha$.

For each x there is a least α such that $x \in U_\alpha$ and an n large enough that (c) holds, so $x \in V_{\alpha n}$ unless $x \in V_{\beta j}$ for some β and j < n. Thus $\{V_{\alpha n}\}$ is a cover of X, and of course each $V_{\alpha n}$ is open and contained in $U_\alpha$, so it is a refinement of $\{U_\alpha\}$.

To prove that the cover is locally finite we fix x, let α be the least element of A such that $x \in V_{\alpha n}$ for some n, and choose j such that $U_{2^{-j}}(x) \subset V_{\alpha n}$. We claim that $U_{2^{-n-j}}(x)$ intersects only finitely many $V_{\beta i}$.

If i > j and y satisfies (a)-(c) with β and i in place of α and n, then $U_{2^{-n-j}}(x) \cap U_{2^{-i}}(y) = \emptyset$ because $U_{2^{-j}}(x) \subset V_{\alpha n}$, $y \notin V_{\alpha n}$, and n + j, i ≥ j + 1. Therefore $U_{2^{-n-j}}(x) \cap V_{\beta i} = \emptyset$.

For i ≤ j we will show that there is at most one β such that $U_{2^{-n-j}}(x)$ intersects $V_{\beta i}$. Suppose that y and z are points satisfying (a)-(c) for β and γ, with i in place of n. Without loss of generality β precedes γ. Then $U_{3 \cdot 2^{-i}}(y) \subset U_\beta$, $z \notin U_\beta$, and n + j > i, so $U_{2^{-n-j}}(x)$ cannot intersect both $U_{2^{-i}}(y)$ and $U_{2^{-i}}(z)$. Since this is the case for all y and z, $U_{2^{-n-j}}(x)$ cannot intersect both $V_{\beta i}$ and $V_{\gamma i}$.

6.2 Partitions of Unity

We continue to work with a fixed topological space X. This section’s central concept is:

Definition 6.2.1. Let $\{U_\alpha\}_{\alpha \in A}$ be a locally finite open cover of X. A partition of unity subordinate to $\{U_\alpha\}$ is a collection of continuous functions $\{\psi_\alpha : X \to [0,1]\}$ such that $\psi_\alpha(x) = 0$ whenever $x \notin U_\alpha$ and $\sum_{\alpha \in A} \psi_\alpha(x) = 1$ for each x.

The most common use of a partition of unity is to construct a global function or correspondence with particular properties. Typically locally defined functions or correspondences are given or can be shown to exist, and the global object is constructed by taking a “convex combination” of the local objects, with weights that vary continuously. Of course to apply this method one must have results guaranteeing that suitable partitions of unity exist. Our goal in this section is:


Theorem 6.2.2. For any locally finite open cover $\{U_\alpha\}_{\alpha \in A}$ of a normal space X there is a partition of unity subordinate to $\{U_\alpha\}$.

A basic tool used in the constructive proof of this result, and many others, is:

Lemma 6.2.3 (Urysohn’s Lemma). If X is a normal space and C ⊂ U ⊂ X with C closed and U open, then there is a continuous function ϕ : X → [0, 1] with ϕ(x) = 0 for all x ∈ C and ϕ(x) = 1 for all x ∈ X \ U.

Proof. Since X is normal, whenever C′ ⊂ U′, with C′ closed and U′ open, there exist a closed C′′ and an open U′′ such that C′ ⊂ U′′, X \ U′ ⊂ X \ C′′, and U′′ ∩ (X \ C′′) = ∅, which is to say that C′ ⊂ U′′ ⊂ C′′ ⊂ U′. Let $C_0 := C$ and $U_1 := U$. Choose an open $U_{1/2}$ and a closed $C_{1/2}$ with $C_0 \subset U_{1/2} \subset C_{1/2} \subset U_1$. Choose an open $U_{1/4}$ and a closed $C_{1/4}$ with $C_0 \subset U_{1/4} \subset C_{1/4} \subset U_{1/2}$, and choose an open $U_{3/4}$ and a closed $C_{3/4}$ with $C_{1/2} \subset U_{3/4} \subset C_{3/4} \subset U_1$. Continuing in this fashion, we obtain a system of open sets $U_r$ and a system of closed sets $C_r$ for rationals r ∈ [0, 1] of the form $k/2^m$ (except that $C_1$ and $U_0$ are undefined) with $U_r \subset C_r \subset U_s \subset C_s$ whenever r < s.

For x ∈ X let

$$\varphi(x) := \begin{cases} \inf\{\, r : x \in C_r \,\}, & x \in \bigcup_r C_r, \\ 1, & \text{otherwise.} \end{cases}$$

Clearly ϕ(x) = 0 for all x ∈ C and ϕ(x) = 1 for all x ∈ X \ U. Any open subset of [0, 1] is a union of finite intersections of sets of the form [0, a) and (b, 1], where 0 < a, b < 1, and

$$\varphi^{-1}\bigl([0,a)\bigr) = \bigcup_{r<a} U_r \quad\text{and}\quad \varphi^{-1}\bigl((b,1]\bigr) = \bigcup_{r>b} (X \setminus C_r)$$

are open, so ϕ is continuous.
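
When X happens to be a metric space, a function with the properties stated in Lemma 6.2.3 can also be written in closed form, which may help in visualizing what the lemma delivers. The following is a minimal sketch under assumptions that are not in the text: X is the real line, C = [0, 1], U = (−1, 2), and ϕ(x) = d(x, C) / (d(x, C) + d(x, X \ U)). It illustrates the conclusion of the lemma, not the dyadic construction used in the proof.

```python
# Illustrative assumptions: X is the real line, C = [0, 1], U = (-1, 2).

def dist_to_C(x: float) -> float:
    """Distance from x to the closed set C = [0, 1]."""
    return max(0.0, -x, x - 1.0)

def dist_to_complement_of_U(x: float) -> float:
    """Distance from x to the complement of U, i.e. (-inf, -1] or [2, inf)."""
    return max(0.0, min(x + 1.0, 2.0 - x))

def phi(x: float) -> float:
    """Continuous, 0 on C and 1 off U.

    The denominator is never zero because C and the complement of U are
    disjoint closed sets, so the two distances cannot vanish together.
    """
    d_c = dist_to_C(x)
    d_out = dist_to_complement_of_U(x)
    return d_c / (d_c + d_out)
```

Between C and the complement of U the function interpolates continuously, for instance `phi(1.5)` lies halfway between the two prescribed values.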

Below we will apply Urysohn’s lemma to a closed subset of each element of a locally finite open cover. We will need X to be covered by these closed sets, as per the next result.

Proposition 6.2.4. If X is a normal space and $\{U_\alpha\}_{\alpha \in A}$ is a locally finite open cover of X, then there is an open cover $\{V_\alpha\}_{\alpha \in A}$ such that for each α, the closure of $V_\alpha$ is contained in $U_\alpha$.

Proof. A partial thinning of $\{U_\alpha\}_{\alpha \in A}$ is a function F from a subset B of A to the open sets of X such that:

(a) for each β ∈ B, the closure of F(β) is contained in $U_\beta$;

(b) $\bigcup_{\beta \in B} F(\beta) \cup \bigcup_{\alpha \in A \setminus B} U_\alpha = X$.

Our goal is to find such an F with B = A. The partial thinnings can be partially ordered as follows: F < G if the domain of F is a proper subset of the domain of G and F and G agree on this set. We will show that this ordering has maximal elements, and that the domain of a maximal element is all of A.


Let $\{F_\iota\}_{\iota \in I}$ be a chain of partial thinnings. That is, for all distinct ι, ι′ ∈ I, either $F_\iota < F_{\iota'}$ or $F_{\iota'} < F_\iota$. Let the domain of each $F_\iota$ be $B_\iota$, let $B := \bigcup_\iota B_\iota$, and for β ∈ B let F(β) be the common value of $F_\iota(\beta)$ for those ι with $\beta \in B_\iota$. For each x ∈ X there is some ι with $F_\iota(\beta) = F(\beta)$ for all β ∈ B such that $x \in U_\beta$ because there are only finitely many α with $x \in U_\alpha$. Therefore F satisfies (b). We have shown that any chain of partial thinnings has an upper bound, so Zorn’s lemma implies that the set of all partial thinnings has a maximal element.

If F is a partial thinning with domain B and α′ ∈ A \ B, then

$$X \setminus \Bigl( \bigcup_{\beta \in B} F(\beta) \cup \bigcup_{\alpha \in A \setminus B,\ \alpha \neq \alpha'} U_\alpha \Bigr)$$

is a closed subset of $U_{\alpha'}$, so it has an open superset $V_{\alpha'}$ whose closure is contained in $U_{\alpha'}$. We can define a partial thinning G with domain B ∪ {α′} by setting $G(\alpha') := V_{\alpha'}$ and G(β) := F(β) for β ∈ B. Therefore F cannot be maximal unless its domain is all of A.

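Proof of Theorem 6.2.2. The result above gives a closed cover $\{C_\alpha\}_{\alpha \in A}$ of X with $C_\alpha \subset U_\alpha$ for each α. For each α let $\varphi_\alpha : X \to [0,1]$ be continuous with $\varphi_\alpha(x) = 0$ for all $x \in X \setminus U_\alpha$ and $\varphi_\alpha(x) = 1$ for all $x \in C_\alpha$. Then $\sum_\alpha \varphi_\alpha$ is well defined and continuous everywhere since $\{U_\alpha\}$ is locally finite, and it is positive everywhere since $\{C_\alpha\}$ covers X. For each α ∈ A set

$$\psi_\alpha := \frac{\varphi_\alpha}{\sum_{\alpha'} \varphi_{\alpha'}}.$$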
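
The normalization step at the end of this proof can be sketched numerically. The example below rests on assumptions chosen only for illustration: X is the real line, the cover consists of the intervals $U_k = (k-1, k+1)$ for integers k, and the bump functions `phi` are squared tent functions rather than functions obtained from Urysohn’s lemma. The names `phi`, `active_indices` and `psi` are hypothetical.

```python
import math

# Illustrative assumptions: X is the real line and U_k = (k - 1, k + 1)
# for each integer k, which is a locally finite open cover.

def phi(k: int, x: float) -> float:
    """Continuous bump that is positive on U_k and vanishes outside it."""
    return max(0.0, 1.0 - abs(x - k)) ** 2

def active_indices(x: float) -> tuple:
    """The only indices k for which phi(k, x) can be nonzero."""
    base = math.floor(x)
    return (base, base + 1)

def psi(k: int, x: float) -> float:
    """Normalized weight psi_k = phi_k / sum_j phi_j.

    Local finiteness makes the sum finite; it is positive because every
    x is within distance 1/2 of some integer.
    """
    total = sum(phi(j, x) for j in active_indices(x))
    return phi(k, x) / total
```

At any point the weights `psi(k, x)` are nonnegative, vanish outside $U_k$, and sum to one, so a family of local objects indexed by k can be averaged with these weights.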

6.3 Topological Vector Spaces

Since we wish to develop fixed point theory in as much generality as is reasonably possible, infinite dimensional vector spaces will inevitably appear at some point. In addition, these spaces will frequently be employed as tools of analysis. The result in the next section refers to such spaces, so this is a good point at which to cover the basic definitions and elementary results.

A topological vector space V is a vector space over the real numbers¹ that is endowed with a topology that makes addition and scalar multiplication continuous, and makes {0} a closed set. Topological vector spaces, and maps between them, are the objects studied in functional analysis. Over the last few decades functional analysis has grown into a huge body of mathematics; it is fortunate that our work here does not require much more than the most basic definitions and facts.

We now lay out elementary properties of V. For any given w ∈ V the maps v ↦ v + w and v ↦ v − w are continuous, hence inverse homeomorphisms. That is, the topology of V is translation invariant. In particular, the topology of V is completely determined by a neighborhood base of the origin, which simplifies many proofs.

¹Other fields of scalars, in particular the complex numbers, play an important role in functional analysis, but have no applications in this book.

The following facts are basic.

Lemma 6.3.1. If C ⊂ V is convex, then so is its closure $\overline{C}$.

Proof. Aiming at a contradiction, suppose that $v = (1-t)v_0 + tv_1$ is not in $\overline{C}$ even though $v_0, v_1 \in \overline{C}$ and 0 < t < 1. Let U be a neighborhood of v that does not intersect C. The continuity of addition and scalar multiplication implies that there are neighborhoods $U_0$ and $U_1$ of $v_0$ and $v_1$ such that $(1-t)v_0' + tv_1' \in U$ for all $v_0' \in U_0$ and $v_1' \in U_1$. Since $U_0$ and $U_1$ contain points in C, this contradicts the convexity of C.

Lemma 6.3.2. If A is a neighborhood of the origin, then there is a closed neighborhood of the origin U such that U + U ⊂ A.

Proof. Continuity of addition implies that there are neighborhoods of the origin $B_1, B_2$ with $B_1 + B_2 \subset A$, and replacing these with their intersection gives a neighborhood B such that B + B ⊂ A. If $w \in \overline{B}$, then w − B intersects any neighborhood of the origin, and in particular (w − B) ∩ B ≠ ∅. Thus $\overline{B} \subset B + B \subset A$. Applying this argument again gives a closed neighborhood U of the origin with U ⊂ B.

We can now establish the separation properties of V .

Lemma 6.3.3. V is a regular T1 space, and consequently a Hausdorff space.

Proof. Since {0} is closed, translation invariance implies that V is T1. Translation invariance also implies that to prove regularity, it suffices to show that any neighborhood of the origin, say A, contains a closed neighborhood, and this is part of what the last result asserts. As has been pointed out earlier, a simple and obvious argument shows that a regular T1 space is Hausdorff.

We can say slightly more in this direction:

Lemma 6.3.4. If K ⊂ V is compact and U is a neighborhood of K, then there is a closed neighborhood W of the origin such that K + W ⊂ U.

Proof. For each v ∈ K Lemma 6.3.2 gives a closed neighborhood $W_v$ of the origin, which is convex if V is locally convex, such that $v + W_v + W_v \subset U$. Then there are $v_1, \dots, v_n$ such that $v_1 + W_{v_1}, \dots, v_n + W_{v_n}$ is a cover of K. Let $W := W_{v_1} \cap \dots \cap W_{v_n}$. For any v ∈ K there is an i such that $v \in v_i + W_{v_i}$, so that

$$v + W \subset v_i + W_{v_i} + W_{v_i} \subset U.$$

A topological vector space is locally convex if every neighborhood of the origin contains a convex neighborhood. In several ways the theory of fixed points developed in this book depends on local convexity, so for the most part locally convex topological vector spaces represent the outer limits of generality considered here.


Lemma 6.3.5. If V is locally convex and A is a neighborhood of the origin, then there is a closed convex neighborhood of the origin W such that W + W ⊂ A.

Proof. Lemma 6.3.2 gives a closed neighborhood U of the origin such that U + U ⊂ A, and the definition of local convexity gives a convex neighborhood of the origin W that is contained in U. If we replace W with its closure, it will still be convex due to Lemma 6.3.1, and it will still be contained in U because U is closed.

6.4 Banach and Hilbert Spaces

We now describe two important types of locally convex spaces. A norm on V is a function $\|\cdot\| : V \to \mathbb{R}_{\geq}$ such that:

(a) ‖v‖ = 0 if and only if v = 0;

(b) ‖αv‖ = |α| · ‖v‖ for all α ∈ R and v ∈ V ;

(c) ‖v + w‖ ≤ ‖v‖+ ‖w‖ for all v, w ∈ V .

Condition (c) implies that the function (v, w) ↦ ‖v − w‖ is a metric on V, and we endow V with the associated topology. Condition (a) implies that {0} is closed because every other point has a neighborhood that does not contain the origin. Conditions (b) and (c) give the calculations

$$\|\alpha' v' - \alpha v\| \le \|\alpha' v' - \alpha' v\| + \|\alpha' v - \alpha v\| = |\alpha'| \cdot \|v' - v\| + |\alpha' - \alpha| \cdot \|v\|$$

and

$$\|(v' + w') - (v + w)\| \le \|v' - v\| + \|w' - w\|,$$

which are easily seen to imply that scalar multiplication and addition are continuous. A vector space endowed with a norm and the associated metric and topology is called a normed space.

For a normed space the calculation

$$\|(1-\alpha)v + \alpha w\| \le \|(1-\alpha)v\| + \|\alpha w\| = (1-\alpha)\|v\| + \alpha\|w\| \le \max\{\|v\|, \|w\|\}$$

shows that for any ε > 0, the open ball of radius ε centered at the origin is convex. The open ball of radius ε centered at any other point is the translation of this ball, so a normed space is locally convex.

A sequence $\{v_m\}$ in a topological vector space V is a Cauchy sequence if, for each neighborhood A of the origin, there is an integer N such that $v_m - v_n \in A$ for all m, n ≥ N. The space V is complete if its Cauchy sequences are all convergent. A Banach space is a complete normed space.

For the most part there is little reason to consider topological vector spaces that are not complete except insofar as they occur as subspaces of complete spaces. The reason for this is that any topological vector space V can be embedded in a complete space $\overline{V}$ whose elements are equivalence classes of Cauchy sequences, where two Cauchy sequences $\{v_m\}$ and $\{w_n\}$ are equivalent if, for each neighborhood A of the origin, there is an integer N such that $v_m - w_n \in A$ for all m, n ≥ N. (This relation is clearly reflexive and symmetric. To see that it is transitive, suppose $\{u_\ell\}$ is equivalent to $\{v_m\}$ which is in turn equivalent to $\{w_n\}$. For any neighborhood A of the origin the continuity of addition implies that there are neighborhoods B, C of the origin such that B + C ⊂ A. There is N such that $u_\ell - v_m \in B$ and $v_m - w_n \in C$ for all ℓ, m, n ≥ N, whence $u_\ell - w_n \in A$.) Denote the equivalence class of $\{v_m\}$ by $[v_m]$. The vector operations have the obvious definitions: $[v_m] + [w_m] := [v_m + w_m]$ and $\alpha[v_m] := [\alpha v_m]$. The open sets of $\overline{V}$ are the sets of the form

$$\{\, [v_m] : v_m \in A \text{ for all large } m \,\}$$

where A ⊂ V is open. (It is easy to see that the condition “$v_m \in A$ for all large m” does not depend on the choice of representative $\{v_m\}$ of $[v_m]$.) A complete justification of this definition would require verifications of the vector space axioms, the axioms for a topological space, the continuity of addition and scalar multiplication, and that {0} is a closed set. Instead of elaborating, we simply assert that the reader who treats this as an exercise will find it entirely straightforward. A similar construction can be used to embed any metric space in a “completion” in which all Cauchy sequences (in the metric sense) are convergent.

As in the finite dimensional case, the best behaved normed spaces have inner products. An inner product on a vector space V is a function $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$ that is symmetric, bilinear, and positive definite:

(i) 〈v, w〉 = 〈w, v〉 for all v, w ∈ V ;

(ii) 〈αv + v′, w〉 = α〈v, w〉+ 〈v′, w〉 for all v, v′, w ∈ V and α ∈ R;

(iii) 〈v, v〉 ≥ 0 for all v ∈ V , with equality if and only if v = 0.

We would like to define a norm by setting $\|v\| := \langle v, v \rangle^{1/2}$. This evidently satisfies (a) and (b) of the definition of a norm. The verification of (c) begins with the computation

$$0 \le \bigl\langle \langle v,v\rangle w - \langle v,w\rangle v,\ \langle v,v\rangle w - \langle v,w\rangle v \bigr\rangle = \langle v,v\rangle \bigl( \langle v,v\rangle \langle w,w\rangle - \langle v,w\rangle^2 \bigr),$$

which implies the Cauchy-Schwarz inequality: $\langle v, w \rangle \le \|v\| \cdot \|w\|$ for all v, w ∈ V. This holds with equality if v = 0 or $\langle v,v\rangle w - \langle v,w\rangle v = 0$, which is the case if and only if w is a scalar multiple of v, and otherwise the inequality is strict. The Cauchy-Schwarz inequality implies the inequality in the calculation

$$\|v+w\|^2 = \langle v+w, v+w\rangle = \|v\|^2 + 2\langle v,w\rangle + \|w\|^2 \le (\|v\| + \|w\|)^2,$$

which implies (c) and completes the verification that ‖·‖ is a norm. A vector space endowed with an inner product and the associated norm and topology is called an inner product space. A Hilbert space is a complete inner product space.
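
As a quick sanity check on the two inequalities just derived, the following sketch evaluates them on random vectors. The setting is an assumption made for illustration only: V is taken to be five dimensional Euclidean space with the standard dot product, and the helper names are hypothetical.

```python
import math
import random

def inner(v, w):
    """Standard inner product on R^n (an illustrative choice)."""
    return sum(a * b for a, b in zip(v, w))

def norm(v):
    """Norm induced by the inner product."""
    return math.sqrt(inner(v, v))

def check_inequalities(v, w, tol=1e-9):
    """True if Cauchy-Schwarz and the triangle inequality hold up to tol."""
    cauchy_schwarz = inner(v, w) <= norm(v) * norm(w) + tol
    total = [a + b for a, b in zip(v, w)]
    triangle = norm(total) <= norm(v) + norm(w) + tol
    return cauchy_schwarz and triangle

random.seed(0)  # fixed seed so that the sample is reproducible
samples = [([random.uniform(-1, 1) for _ in range(5)],
            [random.uniform(-1, 1) for _ in range(5)]) for _ in range(100)]
all_hold = all(check_inequalities(v, w) for v, w in samples)
```

A numerical check of this kind proves nothing, of course, but it is a cheap way to catch a sign error when adapting the computation to another inner product.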

Up to linear isometry there is only one separable² Hilbert space. Let

$$H := \{\, s = (s_1, s_2, \dots) \in \mathbb{R}^\infty : s_1^2 + s_2^2 + \cdots < \infty \,\}$$

be the Hilbert space of square summable sequences. Let $\langle s, t \rangle := \sum_i s_i t_i$ be the usual inner product; the Cauchy-Schwarz inequality implies that this sum is convergent. For any Cauchy sequence in H and for each i, the sequence of ith components is Cauchy, and the element of $\mathbb{R}^\infty$ whose ith component is the limit of this sequence is easily shown to be the limit in H of the given sequence. Thus H is complete. The set of points with only finitely many nonzero components, all of which are rational, is a countable dense subset, so H is separable.

²Recall that a metric space is separable if it contains a countable set of points whose closure is the entire space.

We wish to show that any separable Hilbert space is linearly isomorphic to H, so let V be a separable Hilbert space, and let $\{v_1, v_2, \dots\}$ be a countable dense subset. The span of this set is also dense, of course. Using the Gram-Schmidt process, we may pass from this set to a countable sequence $w_1, w_2, \dots$ of orthonormal vectors that has the same span. It is now easy to show that $s \mapsto s_1 w_1 + s_2 w_2 + \cdots$ is a linear isometry between H and V.
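
The Gram-Schmidt step can be sketched in finite dimensions. The code below is an illustration under stated assumptions, not the construction in a general Hilbert space: vectors are lists of floats with the standard dot product, each coefficient is computed from the running residual (the variant often called modified Gram-Schmidt), and vectors that lie numerically in the span of earlier ones are dropped using a hypothetical tolerance `tol`.

```python
import math

def inner(v, w):
    """Standard inner product on R^n (an illustrative choice)."""
    return sum(a * b for a, b in zip(v, w))

def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal list with the same span as `vectors`.

    Vectors that are (numerically) in the span of earlier ones are skipped.
    """
    basis = []
    for v in vectors:
        residual = list(v)
        for e in basis:
            # Subtract the component of the residual along e.
            coeff = inner(residual, e)
            residual = [r - coeff * c for r, c in zip(residual, e)]
        length = math.sqrt(inner(residual, residual))
        if length > tol:
            basis.append([r / length for r in residual])
    return basis
```

For example, applied to `[[1, 1, 0], [1, 0, 1], [2, 1, 1], [0, 0, 1]]` the third vector is the sum of the first two and is skipped, so three orthonormal vectors are returned.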

6.5 Embedding Theorems

An important technique is to endow metric spaces with geometric structures by embedding them in normed spaces. Let (X, d) be a metric space, and let C(X) be the space of bounded continuous real valued functions on X. This is, of course, a vector space under pointwise addition and scalar multiplication. We endow C(X) with the norm

$$\|f\|_\infty = \sup_{x \in X} |f(x)|.$$

Lemma 6.5.1. C(X) is a Banach space.

Proof. The verification that $\|\cdot\|_\infty$ is actually a norm is elementary and left to the reader. To prove completeness suppose that $\{f_n\}$ is a Cauchy sequence. This sequence has a pointwise limit f because each $\{f_n(x)\}$ is Cauchy, and we need to prove that f is continuous. Fix x ∈ X and ε > 0. There is an m such that $\|f_m - f_n\|_\infty < \varepsilon/3$ for all n ≥ m, and there is a δ > 0 such that $|f_m(x') - f_m(x)| < \varepsilon/3$ for all $x' \in U_\delta(x)$. For such x′ we have

$$|f(x') - f(x)| \le |f(x') - f_m(x')| + |f_m(x') - f_m(x)| + |f_m(x) - f(x)| < \varepsilon.$$

Theorem 6.5.2 (Kuratowski (1935), Wojdyslawski (1939)). X is homeomorphic to a relatively closed subset of a convex subset of C(X). If X is complete, then it is homeomorphic to a closed subset of C(X).

Proof. For each x ∈ X let $f_x \in C(X)$ be the function $f_x(y) := \min\{1, d(x,y)\}$; the map $h : x \mapsto f_x$ is evidently an injection from X to C(X). For any x, y ∈ X we have

$$\|f_x - f_y\|_\infty = \sup_z |\min\{1, d(x,z)\} - \min\{1, d(y,z)\}| \le \sup_z |d(x,z) - d(y,z)| \le d(x,y),$$

so h is continuous. On the other hand, if $\{x_n\}$ is a sequence such that $f_{x_n} \to f_x$, then $\min\{1, d(x_n, x)\} = |f_{x_n}(x) - f_x(x)| \le \|f_{x_n} - f_x\|_\infty \to 0$, so $x_n \to x$. Thus the inverse of h is continuous, so h is a homeomorphism.
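
The two estimates in this paragraph combine to give $\min\{1, d(x,y)\} \le \|f_x - f_y\|_\infty \le d(x,y)$, and this can be checked directly on a small example. The sketch below assumes, purely for illustration, that X is a four point subset of the plane with the Euclidean metric, so that each $f_x$ is just a tuple of four values; the names `X`, `f` and `sup_norm_distance` are hypothetical.

```python
import math
from itertools import combinations

# Illustrative assumption: X is a small finite subset of the plane with the
# Euclidean metric, so every function on X is bounded and continuous.
X = [(0.0, 0.0), (0.3, 0.4), (2.0, 0.0), (2.0, 1.5)]

def d(x, y):
    """Euclidean distance between two points of the plane."""
    return math.dist(x, y)

def f(x):
    """The function f_x, stored as the tuple of its values f_x(z), z in X."""
    return tuple(min(1.0, d(x, z)) for z in X)

def sup_norm_distance(g, h):
    """Sup norm of g - h for functions stored as tuples of values."""
    return max(abs(a - b) for a, b in zip(g, h))

# For every pair, min{1, d(x, y)} <= ||f_x - f_y|| <= d(x, y).
bounds_hold = all(
    min(1.0, d(x, y)) - 1e-12 <= sup_norm_distance(f(x), f(y)) <= d(x, y) + 1e-12
    for x, y in combinations(X, 2)
)
```

The lower bound comes from evaluating at z = y, exactly as in the argument for continuity of the inverse.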


Now suppose that $f_{x_n}$ converges to an element $f = \sum_{i=1}^k \lambda_i f_{y_i}$ of the convex hull of h(X). We have $\|f_{x_n} - f\|_\infty \to 0$ and

$$\|f_{x_n} - f\|_\infty \ge |f_{x_n}(x_n) - f(x_n)| = |f(x_n)|,$$

so $f(x_n) \to 0$. For each i we have $0 \le f_{y_i}(x_n) \le f(x_n)/\lambda_i \to 0$, which implies that $x_n \to y_i$, whence $f = f_{y_1} = \cdots = f_{y_k} \in h(X)$. Thus h(X) is closed in the relative topology of its convex hull.

Now suppose that X is complete, and that $\{x_n\}$ is a sequence such that $f_{x_n} \to f$. Then as above, $\min\{1, d(x_m, x_n)\} \le \|f_{x_m} - f_{x_n}\|_\infty$, and $\{f_{x_n}\}$ is a Cauchy sequence, so $\{x_n\}$ is also Cauchy and has a limit x. Above we saw that $f_{x_n} \to f_x$, so $f_x = f$. Thus h(X) is closed in C(X).

The so-called Hilbert cube is

$$I^\infty := \{\, s \in H : |s_i| \le 1/i \text{ for all } i = 1, 2, \dots \,\}.$$

For separable metric spaces we have the following refinement of Theorem 6.5.2.

Theorem 6.5.3 (Urysohn). If (X, d) is a separable metric space, there is an embedding $\iota : X \to I^\infty$.

Proof. Let $\{x_1, x_2, \dots\}$ be a countable dense subset of X. Define $\iota : X \to I^\infty$ by setting

$$\iota_i(x) := \min\{d(x, x_i), 1/i\}.$$

Clearly ι is a continuous injection. To show that the inverse is continuous, suppose that $\{x_j\}$ is a sequence with $\iota(x_j) \to \iota(x)$. If it is not the case that $x_j \to x$, then there is a neighborhood U of x that (perhaps after passing to a subsequence) does not have any elements of the sequence. Choose $x_i$ in that neighborhood. The sequence of numbers $\min\{d(x_j, x_i), 1/i\}$ is bounded below by a positive number, contrary to the assumption that $\iota(x_j) \to \iota(x)$.

6.6 Dugundji’s Theorem

The well known Tietze extension theorem asserts that if a topological space X is normal and f : A → [0, 1] is continuous, where A ⊂ X is closed, then f has a continuous extension to all of X. A map into a finite dimensional Euclidean space is continuous if its component functions are each continuous, so Tietze’s theorem is adequate for finite dimensional applications. Mostly, however, we will work with spaces that are potentially infinite dimensional, for which we will need the following variant due to Dugundji (1951).

Theorem 6.6.1. If A is a closed subset of a metric space (X, d), Y is a locally convex topological vector space, and f : A → Y is continuous, then there is a continuous extension $\overline{f} : X \to Y$ whose image is contained in the convex hull of f(A).


Proof. The sets $U_{d(x,A)/2}(x)$ are open and cover X \ A. Theorem 6.1.1 implies the existence of an open locally finite refinement $\{W_\alpha\}_{\alpha \in I}$. Theorem 6.2.2 implies the existence of a partition of unity $\{\varphi_\alpha\}_{\alpha \in I}$ subordinate to $\{W_\alpha\}_{\alpha \in I}$. For each α choose $a_\alpha \in A$ with $d(a_\alpha, W_\alpha) < 2d(A, W_\alpha)$, and define the extension by setting

$$\overline{f}(x) := \sum_{\alpha \in I} \varphi_\alpha(x) f(a_\alpha) \qquad (x \in X \setminus A).$$

Clearly $\overline{f}$ is continuous at every point of X \ A and at every interior point of A. Let a be a point in the boundary of A, let U be a neighborhood of f(a), which we may assume to be convex, and choose δ > 0 small enough that f(a′) ∈ U whenever $a' \in U_\delta(a) \cap A$. Consider $x \in U_{\delta/7}(a) \cap (X \setminus A)$. For any α such that $x \in W_\alpha$ and x′ such that $W_\alpha \subset U_{d(x',A)/2}(x')$ we have

$$d(a_\alpha, W_\alpha) \ge d(a_\alpha, x') - d(x', A)/2 \ge d(a_\alpha, x') - d(x', a_\alpha)/2 = d(a_\alpha, x')/2$$

and

$$d(x', x) \le d(x', A)/2 \le d(W_\alpha, A) \le d(W_\alpha, a_\alpha),$$

so

$$d(a_\alpha, x) \le d(a_\alpha, x') + d(x', x) \le 3d(a_\alpha, W_\alpha) \le 6d(A, W_\alpha) \le 6d(a, x).$$

Thus $d(a_\alpha, a) \le d(a_\alpha, x) + d(x, a) \le 7d(x, a) < \delta$ whenever $x \in W_\alpha$, so $\overline{f}(x) \in U$.

Chapter 7

Retracts

This chapter begins with Kinoshita’s example of a compact contractible space that does not have the fixed point property. The example is elegant, but also rather complex, and nothing later depends on it, so it can be postponed until the reader is in the mood for a mathematical treat. The point is that fixed point theory depends on some additional condition over and above compactness and contractibility.

After that we develop the required material from the theory of retracts. We first describe retracts in general, and then briefly discuss Euclidean neighborhood retracts, which are retracts of open subsets of Euclidean spaces. This concept is quite general, encompassing simplicial complexes and (as we will see later) smooth manifolds.

The central concept of the chapter is the notion of an absolute neighborhood retract (ANR) which is a metrizable space whose image, under any embedding as a closed subset of a metric space, is a retract of some neighborhood of itself. The two key characterization results are that an open subset of a convex subset of a locally convex linear space is an absolute neighborhood retract, and that an ANR can be embedded in a normed linear space as a retract of an open subset of a convex set.

An absolute retract (AR) is a space that is a retract of any metric space it is embedded in as a closed subset. It turns out that the ARs are precisely the contractible ANRs.

The extension of fixed point theory to infinite dimensional settings ultimately depends on “approximating” the setting with finite dimensional objects. Section 7.6 provides one of the key results in this direction.

7.1 Kinoshita’s Example

This example came to be known as the “tin can with a roll of toilet paper.” As you will see, this description is apt, but does not do justice to the example’s beauty and ingenuity.

Polar coordinates facilitate the description. Let P = [0, ∞) × R, with (r, θ) ∈ P identified with the point (r cos θ, r sin θ). The unit circle and the open unit disk are

C = { (r, θ) : r = 1 } and D = { (r, θ) : r < 1 }.

Let ρ : [0, ∞) → [0, 1) be a homeomorphism, let s : [0, ∞) → P be the function s(t) := (ρ(t), t), and let

S = { s(t) : t ≥ 0 }.

Then S is a curve that spirals out from the origin, approaching C asymptotically. The space of the example is

$$X = (C \times [0,1]) \cup (D \times \{0\}) \cup (S \times [0,1]) \subset \mathbb{R}^3.$$

Here C × [0, 1] is the cylindrical side of the tin can, D × {0} is its base, and S × [0, 1] is the roll of toilet paper. Evidently X is closed, hence compact, and there is an obvious contraction of X that first pushes the cylinder of the tin can and the toilet paper down onto the closed unit disk and then contracts the disk to the origin.

We are now going to define functions

$$f_1 : C \times [0,1] \to X, \qquad f_2 : D \times \{0\} \to X, \qquad f_3 : S \times [0,1] \to X$$

which combine to form a continuous function f : X → X with no fixed points. Fix a number ε > 0 that is not an integral multiple of 2π; imagining that ε is small may help to visualize f as a motion of X. Also, fix a continuous function κ : [0, 1] → [0, 1] with κ(0) = 0, κ(1) = 1, and κ(z) > z for all 0 < z < 1.

The first function is given by the formula

$$f_1(1, \theta, z) := (1, \theta - (1 - 2z)\varepsilon, \kappa(z)).$$

This is evidently well defined and continuous. The point (1, θ, z) cannot be fixed because κ(z) = z implies that z = 0 or z = 1 and ε is not a multiple of 2π.

Observe that D = { (ρ(t), θ) : t ≥ 0, θ ∈ R }. The second function is

$$f_2(\rho(t), \theta, 0) := \begin{cases} (0, 0, 1 - t/\varepsilon), & 0 \le t \le \varepsilon, \\ (\rho(t - \varepsilon), \theta - \varepsilon, 0), & \varepsilon \le t. \end{cases}$$

This is well defined because ρ is invertible and the two formulas give the origin as the image when t = ε. It is continuous because it is continuous on the two subdomains, which are closed and cover D. It does not have any fixed points because the first coordinate of $f_2(\rho(t), \theta, 0)$ is less than ρ(t) except when t = 0, and $f_2(\rho(0), \theta, 0) = (0, 0, 1)$.

The third function is

$$f_3(s(t), z) := \begin{cases} (s((t + \varepsilon)z), 1 - (1 - \kappa(z))t/\varepsilon), & 0 \le t \le \varepsilon, \\ (s(t - (1 - 2z)\varepsilon), \kappa(z)), & \varepsilon \le t. \end{cases}$$

This is well defined because s is invertible and the two formulas give (s(2εz), κ(z)) as the image when t = ε. It is continuous because it is continuous on the two subdomains, which are closed and cover S × [0, 1]. Since $f_2(s(t), 0) = f_3(s(t), 0)$ for all t, $f_2$ and $f_3$ combine to define a continuous function on the union of their domains.
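
The compatibility claims for the three pieces can be checked numerically once concrete choices are made. The sketch below is only an illustration, and every concrete choice in it is an assumption not found in the text: ε = 0.5, ρ(t) = t/(1 + t), and κ(z) = √z. Points of the cylinder are passed as (θ, z), points of the base as (t, θ) standing for (ρ(t), θ, 0), and points of the roll as (t, z) standing for (s(t), z).

```python
import math

EPS = 0.5  # assumed value of epsilon; positive and not a multiple of 2*pi

def rho(t):
    """Assumed homeomorphism from [0, inf) onto [0, 1)."""
    return t / (1.0 + t)

def kappa(z):
    """Assumed kappa: kappa(0) = 0, kappa(1) = 1, kappa(z) > z on (0, 1)."""
    return math.sqrt(z)

def cart(r, theta, z):
    """Convert (r, theta, z) to Cartesian coordinates in R^3."""
    return (r * math.cos(theta), r * math.sin(theta), z)

def s(t):
    """Point of the spiral S in polar coordinates."""
    return (rho(t), t)

def f1(theta, z):
    """Image of the cylinder point (1, theta, z)."""
    return cart(1.0, theta - (1 - 2 * z) * EPS, kappa(z))

def f2(t, theta):
    """Image of the base point (rho(t), theta, 0)."""
    if t <= EPS:
        return cart(0.0, 0.0, 1 - t / EPS)
    return cart(rho(t - EPS), theta - EPS, 0.0)

def f3_low(t, z):
    """First formula for f3(s(t), z), intended for 0 <= t <= EPS."""
    r, theta = s((t + EPS) * z)
    return cart(r, theta, 1 - (1 - kappa(z)) * t / EPS)

def f3_high(t, z):
    """Second formula for f3(s(t), z), intended for t >= EPS."""
    r, theta = s(t - (1 - 2 * z) * EPS)
    return cart(r, theta, kappa(z))

def f3(t, z):
    """Image of the roll point (s(t), z)."""
    return f3_low(t, z) if t <= EPS else f3_high(t, z)
```

With these choices one can compare `f3_low(EPS, z)` with `f3_high(EPS, z)`, and `f2(t, t)` with `f3(t, 0.0)`, using `math.dist`; for large t the value `f3(t, z)` should also be close to `f1(t, z)`, which reflects the continuity of f at the cylinder.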

Can (s(t), z) be a fixed point of $f_3$? If t < ε, then the equation

$$z = 1 - (1 - \kappa(z))t/\varepsilon$$

is equivalent to (1 − κ(z))t = (1 − z)ε, which is impossible if z < 1 due to the conditions on κ. When t < ε and z = 1, we have s(t) ≠ s(t + ε) because s is injective. On the other hand, when t ≥ ε the equation κ(z) = z implies that either z = 0, in which case s(t) ≠ s(t − ε), or z = 1, in which case s(t) ≠ s(t + ε).

We have now shown that f is well defined and has no fixed points, and that it is continuous on (S × [0, 1]) ∪ (D × {0}) and on C × [0, 1]. To complete the verification of continuity, first consider a sequence $\{(\rho(t_i), \theta_i, 0)\}$ in D × {0} converging to (1, θ, 0). Clearly

$$f_2(\rho(t_i), \theta_i, 0) = (\rho(t_i - \varepsilon), \theta_i - \varepsilon, 0) \to (1, \theta - \varepsilon, 0) = f_1(1, \theta, 0).$$

Now consider a sequence $\{(s(t_i), z_i)\}$ converging to a point (1, θ, z). In order for f to be continuous it must be the case that

$$f_3(s(t_i), z_i) = (s(t_i - (1 - 2z_i)\varepsilon), \kappa(z_i)) \to (1, \theta - (1 - 2z)\varepsilon, \kappa(z)) = f_1(1, \theta, z).$$

Since $s(t_i) \to (1, \theta)$ means precisely that $t_i \to \infty$ and $t_i \bmod 2\pi \to \theta \bmod 2\pi$, again this is clear.

7.2 Retracts

This section prepares for later material by presenting general facts about retractions and retracts. Let X be a metric space, and let A be a subset of X such that there is a continuous function r : X → A with r(a) = a for all a ∈ A. We say that A is a retract of X and that r is a retraction. Many desirable properties that X might have are inherited by A.

Lemma 7.2.1. If X has the fixed point property, then A has the fixed point property.

Proof. If f : A → A is continuous, then f ◦ r necessarily has a fixed point, say $a^*$, which must be in A, so that $a^* = f(r(a^*)) = f(a^*)$ is also a fixed point of f.

Lemma 7.2.2. If X is contractible, then A is contractible.

Proof. If c : X × [0, 1] → X is a contraction of X, then the map (a, t) ↦ r(c(a, t)) on A × [0, 1] is a contraction of A.

Lemma 7.2.3. If X is connected, then A is connected.

Proof. We show that if A is not connected, then X is not connected. If $U_1$ and $U_2$ are nonempty open subsets of A with $U_1 \cap U_2 = \emptyset$ and $U_1 \cup U_2 = A$, then $r^{-1}(U_1)$ and $r^{-1}(U_2)$ are nonempty open subsets of X with $r^{-1}(U_1) \cap r^{-1}(U_2) = \emptyset$ and $r^{-1}(U_1) \cup r^{-1}(U_2) = X$.

Here are two basic observations that are too obvious to prove.

Lemma 7.2.4. If s : A → B is a second retraction, then s ◦ r : X → B is a retraction, so B is a retract of X.

Lemma 7.2.5. If A ⊂ Y ⊂ X, then the restriction of r to Y is a retraction, so A is a retract of Y.


We say that A is a neighborhood retract in X if A is a retract of an open U ⊂ X. We note two other simple facts, the first of which is an obvious consequence of the last result:

Lemma 7.2.6. Suppose that A is not connected: there are disjoint open sets $U_1, U_2 \subset X$ such that $A \subset U_1 \cup U_2$ with $A_1 := A \cap U_1$ and $A_2 := A \cap U_2$ both nonempty. Then A is a neighborhood retract in X if and only if both $A_1$ and $A_2$ are neighborhood retracts in X.

Lemma 7.2.7. If A is a neighborhood retract in X and B is a neighborhood retract in A, then B is a neighborhood retract in X.

Proof. Let r : U → A and s : V → B be retractions, where U is a neighborhood of A and V ⊂ A is a neighborhood of B in the relative topology of A. The definition of the relative topology implies that there is a neighborhood W ⊂ X of B such that V = A ∩ W. Then U ∩ W is a neighborhood of B in X, and the composition of s with the restriction of r to U ∩ W is a retraction onto B.

A set A ⊂ X is locally closed if it is the intersection of an open set and a closed set. Equivalently, it is an open subset of a closed set, or a closed subset of an open set.

Lemma 7.2.8. A neighborhood retract is locally closed.

Proof. If U ⊂ X is open and r : U → A is a retraction, A is a closed subset of U because it is the set of fixed points of r.

The terminology ‘locally closed’ is further explained by:

Lemma 7.2.9. If X is a topological space and A ⊂ X, then A is locally closed if and only if each point x ∈ A has a neighborhood U such that U ∩ A is closed in U.

Proof. If A = U ∩ C where U is open and C is closed, then U is a neighborhood of each x ∈ A, and A is closed in U. On the other hand suppose that each x ∈ A has a neighborhood $U_x$ such that $U_x \cap A$ is closed in $U_x$, which is to say that $U_x \cap A = U_x \cap \overline{A}$. Then

$$A = \bigcup_x (U_x \cap A) = \bigcup_x (U_x \cap \overline{A}) = \Bigl(\bigcup_x U_x\Bigr) \cap \overline{A}.$$

Corollary 7.2.10. If X is locally compact, a set A ⊂ X is locally closed if and only if each x ∈ A has a compact neighborhood.

Proof. If A = U ∩ C, x ∈ A, and K is a compact neighborhood of x contained in U, then K ∩ C is a compact neighborhood in A. On the other hand, if x ∈ A and K is a compact neighborhood of x in A, then K = A ∩ V for some neighborhood V of x in X. Let U be the interior of V. Then U ∩ A = U ∩ K is closed in U. This shows that if every point in A has a compact neighborhood, then the condition in the last result holds.


7.3 Euclidean Neighborhood Retracts

A Euclidean neighborhood retract (ENR) is a topological space that is homeomorphic to a neighborhood retract of a Euclidean space. If a subset of a Euclidean space is homeomorphic to an ENR, then it is a neighborhood retract:

Proposition 7.3.1. Suppose that $U \subset \mathbb{R}^m$ is open, r : U → A is a retraction, $B \subset \mathbb{R}^n$, and h : A → B is a homeomorphism. Then B is a neighborhood retract.

Proof. Since A is locally closed and $\mathbb{R}^m$ is locally compact, each point in A has a closed neighborhood that contains a compact neighborhood. Having a compact neighborhood is an intrinsic property, so every point in B has such a neighborhood, and Corollary 7.2.10 implies that B is locally closed. Let $V \subset \mathbb{R}^n$ be an open set that has B as a closed subset. The Tietze extension theorem gives an extension of $h^{-1}$ to a map $j : V \to \mathbb{R}^m$. After replacing V with $j^{-1}(U)$, V is still an open set that contains B, and h ◦ r ◦ j : V → B is a retraction.

Note that every locally closed set $A = U \cap C \subset \mathbb{R}^m$ is homeomorphic to a closed subset of $\mathbb{R}^{m+1}$, by virtue of the embedding $x \mapsto (x, d(x, \mathbb{R}^m \setminus U)^{-1})$, where $d(x, \mathbb{R}^m \setminus U)$ is the distance from x to the nearest point not in U. Thus a sufficient condition for X to be an ENR is that it is homeomorphic to a neighborhood retract of a Euclidean space, but a necessary condition is that it is homeomorphic to a closed neighborhood retract of a Euclidean space.
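
The simplest instance of this embedding may make the idea concrete. The sketch below assumes, for illustration only, that m = 1, U = (0, 1), and C is the whole line, so that A = (0, 1); the function names are hypothetical.

```python
def dist_to_complement(x: float) -> float:
    """Distance from x in U = (0, 1) to the complement of U in the real line."""
    return min(x, 1.0 - x)

def embed(x: float) -> tuple:
    """The map x -> (x, 1 / d(x, complement of U)) for x in U = (0, 1)."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie in the open interval (0, 1)")
    return (x, 1.0 / dist_to_complement(x))
```

The image is the graph of a continuous function on (0, 1) whose values blow up at both endpoints, which is why it is a closed subset of the plane even though (0, 1) is not closed in the line.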

In order to expand the scope of fixed point theory, it is desirable to show that many types of spaces are ENRs. Eventually we will see that a smooth submanifold of a Euclidean space is an ENR. At this point we can show that simplicial complexes have this property.

Lemma 7.3.2. If K ′ = (V ′, C ′) is a subcomplex of a simplicial complex K = (V, C),then |K ′| is a neighborhood retract in |K|.

Proof. To begin with suppose that there are simplices of positive dimension in Kthat are not in K ′. Let σ be such a simplex of maximal dimension, and let β bethe barycenter of |σ|. Then |K| \ {β} is a neighborhood of |K| \ int |σ|, and thereis a retraction r of the former set onto the latter that is the identity on the latter,of course, and which maps (1− t)x+ tβ to x whenever x ∈ |∂σ| and 0 < t < 1.

Iterating this construction and applying Lemma 7.2.7 above, we find that thereis a neighborhood retract of |K| consisting of |K ′| and finitely many isolated points.Now Lemma 7.2.6 implies that |K ′| is a neighborhood retract in |K|.
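The radial retraction used in the proof of Lemma 7.3.2 can be sketched for a single simplex written in barycentric coordinates. This is only an illustration under the assumption that σ is the standard simplex with barycenter β = (1/n, …, 1/n); the function name is hypothetical.

```python
def retract_to_boundary(p, tol=1e-12):
    """Send a point p of the standard simplex (barycentric coordinates),
    other than the barycenter, to the boundary point x with
    p = (1 - t) x + t * beta for some 0 <= t < 1."""
    n = len(p)
    if abs(sum(p) - 1.0) > 1e-9 or min(p) < -tol:
        raise ValueError("p must be a point of the standard simplex")
    beta = 1.0 / n
    # Solving min_i x_i = 0 for x = beta + (p - beta) / (1 - t)
    # gives 1 - t = 1 - n * min_i p_i.
    scale = 1.0 - n * min(p)
    if scale <= tol:
        raise ValueError("the barycenter has no image under the retraction")
    return [beta + (pi - beta) / scale for pi in p]
```

Points that already lie on the boundary have min(p) = 0, so scale = 1 and they are left fixed, as a retraction requires.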

Proposition 7.3.3. If K = (V, C) is a simplicial complex, then |K| is an ENR.

Proof. Let ∆ be the convex hull of the set of unit basis vectors in R^{|V|}. After repeated barycentric subdivision of ∆ there is a (|V| − 1)-dimensional simplex σ in the interior of ∆. (This is a consequence of Proposition 2.5.2.) Identifying the vertices of σ with the elements of V leads to an embedding of |K| as a subcomplex of this subdivision, after which we can apply the result above.


Giving an example of a closed subset of a Euclidean space that is not an ENR is a bit more difficult. Eventually we will see that a contractible ENR has the fixed point property, from which it follows that Kinoshita’s example is not an ENR. A simpler example is the Hawaiian earring H, which is the union over all n = 1, 2, … of the circle of radius 1/n centered at (1/n, 0). If there were a retraction r : U → H of a neighborhood U of H, then for large n the entire disk of radius 1/n centered at (1/n, 0) would be contained in U, and we would have a violation of the following result, which is actually a quite common method of applying the fixed point principle.

Theorem 7.3.4 (No Retraction Theorem). If D^n is the closed unit disk centered at the origin in R^n, and S^{n−1} is its boundary, then there does not exist a continuous r : D^n → R^n \ D^n with r(x) = x for all x ∈ S^{n−1}.

Proof. Suppose that such an r exists, and let g : D^n → S^{n−1} be the function that takes each x ∈ S^{n−1} to itself and takes each x ∈ D^n \ S^{n−1} to the point where the line segment between r(x) and x intersects S^{n−1}. An easy argument shows that g is continuous at each x ∈ D^n \ S^{n−1}, and another easy argument shows that g is continuous at each x ∈ S^{n−1}, so g is continuous. If a : S^{n−1} → S^{n−1} is the antipodal map a(x) = −x, then a ∘ g gives a map from D^n to itself that does not have a fixed point, contradicting Brouwer’s fixed point theorem.
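The map g in this proof is defined by intersecting a line segment with the unit sphere. The following sketch computes that intersection point for a point x strictly inside the disk and a point y outside the open disk; here y stands in for the hypothetical value r(x), and the function name is an assumption of this example.

```python
import math

def boundary_point(x, y):
    """Point where the segment from x (|x| < 1) to y (|y| >= 1)
    meets the unit sphere."""
    if math.hypot(*x) >= 1.0 or math.hypot(*y) < 1.0:
        raise ValueError("need |x| < 1 <= |y|")
    d = [yi - xi for xi, yi in zip(x, y)]
    # Solve |x + s d|^2 = 1 for the root s in (0, 1].
    a = sum(di * di for di in d)
    b = 2.0 * sum(xi * di for xi, di in zip(x, d))
    c = sum(xi * xi for xi in x) - 1.0
    s = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return tuple(xi + s * di for xi, di in zip(x, d))
```

Since c < 0 and a > 0, the discriminant is positive and exactly one root is positive, so the intersection point depends continuously on x and y.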

7.4 Absolute Neighborhood Retracts

A metric space A is an absolute neighborhood retract (ANR) if h(A) is a neighborhood retract whenever X is a metric space, h : A → X is an embedding, and h(A) is closed. This definition is evidently modelled on the description of ENR’s we arrived at in the last section, with ‘metric space’ in place of ‘Euclidean space.’

We saw above that if A ⊂ R^m is a neighborhood retract, then any homeomorphic image of A in another Euclidean space is also a neighborhood retract, and some such homeomorphic image is a closed subset of the Euclidean space. Thus a natural, and at least potentially more restrictive, extension of the concept is obtained by defining an ANR to be a space A such that h(A) is a neighborhood retract whenever h : A → X is an embedding of A in a metric space X, even if h(A) is not closed.

There is a second sense in which the definition is weaker than it might be. A topological space is completely metrizable if its topology can be induced by a complete metric. Since an ENR is homeomorphic to a closed subset of a Euclidean space, an ENR is completely metrizable. Problem 6K of Kelley (1955) shows that a topological space A is completely metrizable if and only if, whenever h : A → X is an embedding of A in a metric space X, h(A) is a G_δ. The set of rational numbers is an example of a space that is metrizable, but not completely metrizable, because it is not a G_δ as a subset of R. To see this observe that the set of irrational numbers is ⋂_{r∈Q} R \ {r}, so if Q was a countable intersection of open sets, then ∅ would be a countable intersection of open dense sets, contrary to the Baire category theorem (p. 200 of Kelley (1955)). The next result shows that the union of { e^{πir} : r ∈ Q } with the open unit disk in C is an ANR, but this space is not completely metrizable, so it is not an ENR. Thus there are finite dimensional ANR’s that are not ENR’s.

By choosing the least restrictive definition we strengthen the various results below. However, these complexities are irrelevant to compact ANR’s, which are, for the most part, the only ANR’s that will figure in our work going forward. Of course the homeomorphic image h(A) of a compact metric space A in any metric space is compact and consequently closed, and of course h(A) is also complete.

At first blush being an ANR might sound like a remarkable property that can only be possessed by quite special spaces, but this is not the case at all. Although ANR’s cannot exhibit the “infinitely detailed features” of the tin can with a roll of toilet paper, the concept is not very restrictive, at least in comparison with other concepts that might serve as an hypothesis of a fixed point theorem.

Proposition 7.4.1. A metric space A is an ANR if it (or its homeomorphic image) is a retract of an open subset of a convex subset of a locally convex linear space.

Proof. Let r : U → A be a retraction, where U is an open subset of a convex set C. Suppose h : A → X maps A homeomorphically onto a closed subset h(A) of a metric space X. Dugundji’s theorem implies that h^{-1} : h(A) → U has a continuous extension j : X → C. Then V = j^{-1}(U) is a neighborhood of h(A), and h ∘ r ∘ j|_V : V → h(A) is a retraction.

Corollary 7.4.2. An ENR is an ANR.

The proposition above gives a sufficient condition for a space to be an ANR. There is a somewhat stronger necessary condition.

Proposition 7.4.3. If A is an ANR, then there is a homeomorphic image of A that is a retract of an open subset of a convex subset of a Banach space.

Proof. Theorem 6.5.2 gives a map h : A → Z, where Z is a Banach space, such that h maps A homeomorphically onto h(A) and h(A) is closed in the relative topology of its convex hull C. Since A is an ANR, there is a relatively open U ⊂ C and a retraction r : U → h(A).

Since compact metric spaces are separable, compact ANR’s satisfy a more demanding embedding condition than the one given by Proposition 7.4.3.

Proposition 7.4.4. If A is a compact ANR, then there exists an embedding ι : A → I^∞ such that ι(A) is a neighborhood retract in I^∞.

Proof. Urysohn’s Theorem guarantees the existence of an embedding ι of A in I^∞. Since A is compact, ι(A) is closed in I^∞, and since A is an ANR, ι(A) is a neighborhood retract in I^∞.

The simplicity of an open subset of I^∞ is the ultimate source of the utility of ANR’s in the theory of fixed points. To exploit this simplicity we need analytic tools that bring it to the surface. Fix a compact metric space (X, d), and let

∆ = { (x, x) : x ∈ X }

be the diagonal in X × X. We say that (X, d) is uniformly locally contractible if, for any neighborhood V ⊂ X × X of ∆ there is a neighborhood W of ∆ and a map γ : W × [0, 1] → X such that:


(a) γ(x, x′, 0) = x′ and γ(x, x′, 1) = x for all (x, x′) ∈ W ;

(b) γ(x, x, t) = x for all x ∈ X and t ∈ [0, 1];

(c) (x, γ(x, x′, t)) ∈ V for all (x, x′) ∈ W and t ∈ [0, 1].

Proposition 7.4.5. A compact ANR A is uniformly locally contractible.

Proof. By Proposition 7.4.4 we may assume that A ⊂ I^∞, and that there is a retraction r : U → A where U ⊂ I^∞ is open. Fix a neighborhood V ⊂ A × A of the diagonal, and let Ṽ = (Id_A × r)^{-1}(V) ⊂ A × U. The distance from x to the nearest point in I^∞ \ U is a positive continuous function on A, which attains its minimum since A is compact, so there is some δ > 0 such that

∆_δ = { (x, x′) ∈ A × U : ‖x′ − x‖ < δ } ⊂ Ṽ.

Let W = ∆_δ ∩ (A × A), and let γ : W × [0, 1] → A be the function

γ(x, x′, t) = r(tx + (1 − t)x′).

Evidently γ has all the required properties.
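To make the formula γ(x, x′, t) = r(tx + (1 − t)x′) concrete, the following sketch replaces I^∞ by R^2 and takes A to be the unit circle, U the plane minus the origin, and r the radial retraction. These choices are assumptions made for illustration only; they are not the setting of the proposition.

```python
import math

def r(p):
    """Radial retraction of U = R^2 minus the origin onto the unit circle A."""
    norm = math.hypot(p[0], p[1])
    if norm == 0.0:
        raise ValueError("the origin is not in U")
    return (p[0] / norm, p[1] / norm)

def gamma(x, x_prime, t):
    """gamma(x, x', t) = r(t x + (1 - t) x'); defined whenever x and x'
    are not antipodal, so in particular for nearby points."""
    q = (t * x[0] + (1 - t) * x_prime[0], t * x[1] + (1 - t) * x_prime[1])
    return r(q)
```

For nearby x and x′ the segment between them stays away from the origin, so γ is defined on a neighborhood of the diagonal, moves x′ to x along the circle, and leaves points of the diagonal fixed, which are properties (a) and (b).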

A topological space X is locally path connected if, for each x ∈ X, each neighborhood Y of x contains a neighborhood U such that for any x_0, x_1 ∈ U there is a continuous path γ : [0, 1] → Y with γ(0) = x_0 and γ(1) = x_1. At first sight this seems less straightforward than requiring that any neighborhood of x contain a pathwise connected neighborhood, but the weaker condition given by the definition is sometimes much easier to verify, and it usually has whatever implications are desired.

Corollary 7.4.6. A compact ANR A is locally path connected.

Proof. The last result (with V = A × A) gives a neighborhood W ⊂ A × A of the diagonal and a function γ : W × [0, 1] → A satisfying (a) and (b). Fix x ∈ A, and let Y be a neighborhood of x. There is a neighborhood U of x such that U × U ⊂ W and γ(U × U × [0, 1]) ⊂ Y. (Combining (b) and the continuity of γ, for each t ∈ [0, 1] there is a neighborhood U_t of x and ε_t > 0 such that U_t × U_t ⊂ W and γ(U_t × U_t × (t − ε_t, t + ε_t)) ⊂ Y. Since [0, 1] is compact there are t_1, …, t_k such that [0, 1] ⊂ ⋃_i (t_i − ε_{t_i}, t_i + ε_{t_i}). Let U = ⋂_i U_{t_i}.) Then for any x_0, x_1 ∈ U, t ↦ γ(x_1, x_0, t) is a path in Y going from x_0 to x_1.

7.5 Absolute Retracts

A metric space A is an absolute retract (AR) if h(A) is a retract of X whenever X is a metric space, h : A → X is an embedding, and h(A) is closed. Of course an AR is an ANR. Below we will see that an ANR is an AR if and only if it is contractible, so compact convex sets are AR’s. Eventually (Theorem 14.1.5) we will show that nonempty compact AR’s have the fixed point property. In this sense AR’s fulfill our goal of replacing the assumption of a convex domain in Kakutani’s theorem with a topological condition.

The embedding conditions characterizing AR’s parallel those for ANR’s, with some simplifications.

Proposition 7.5.1. If a metric space A is a retract of a convex subset C of a locally convex linear space, then it is an AR.

Proof. Suppose h : A → X maps A homeomorphically onto a closed subset h(A) of a metric space X. Dugundji’s theorem implies that h^{-1} : h(A) → C has a continuous extension j : X → C. Let r : C → A be a retraction. Then q := h ∘ r ∘ j is a retraction of X onto h(A).

Proposition 7.5.2. If A is an AR, then there is a homeomorphic image of A that is a retract of a convex subset of a Banach space.

Proof. Theorem 6.5.2 gives a map h : A → Z, where Z is a Banach space, such that h maps A homeomorphically onto h(A) and h(A) is closed in the relative topology of its convex hull C. Since A is an AR, there is a retraction r : C → h(A).

The remainder of the section proves:

Proposition 7.5.3. An ANR is an AR if and only if it is contractible.

In preparation for the proof we introduce an important concept of general topology. A pair of topological spaces X, A with A ⊂ X are said to have the homotopy extension property with respect to the class ANR if, whenever:

(a) Y is an ANR,

(b) f : X → Y is continuous,

(c) η : A × [0, 1] → Y is a homotopy, and

(d) η(·, 0) = f|_A,

there is a continuous η̄ : X × [0, 1] → Y with η̄(·, 0) = f and η̄|_{A×[0,1]} = η.

Proposition 7.5.4. If X is a metric space and A is a closed subset of X, then X and A have the homotopy extension property with respect to ANR’s.

We separate out one of the larger steps in the argument.

Lemma 7.5.5. Let X be a metric space, let A be a closed subset of X, and let

Z := (X × {0}) ∪ (A× [0, 1]).

Then for every neighborhood V ⊂ X × [0, 1] of Z there is a map ϕ : X × [0, 1] → V that agrees with the identity on Z.

Proof. For each (a, t) ∈ A × [0, 1] choose a product neighborhood

U_{(a,t)} × (t − ε_{(a,t)}, t + ε_{(a,t)}) ⊂ V

where U_{(a,t)} ⊂ X is open and ε_{(a,t)} > 0. For any particular a the cover of {a} × [0, 1] has a finite subcover, and the intersection of its first cartesian factors is a neighborhood U_a of a with U_a × [0, 1] ⊂ V. Let U := ⋃_a U_a. Thus there is a neighborhood U of A such that U × [0, 1] ⊂ V.

Urysohn’s lemma gives a function α : X → [0, 1] with α(x) = 0 for all x ∈ X \ U and α(a) = 1 for all a ∈ A, and the function ϕ(x, t) := (x, α(x)t) satisfies the required conditions.
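The map ϕ(x, t) = (x, α(x)t) from the proof of Lemma 7.5.5 can be written out explicitly in a metric space. The sketch below assumes X = R, A = [0, 1] and U = (−0.5, 1.5), and uses the standard distance-ratio formula for the Urysohn function; these concrete choices and all function names are assumptions made for illustration.

```python
def dist_to_A(x):
    """Distance from x to A = [0, 1]."""
    return max(0.0, -x, x - 1.0)

def dist_to_complement_of_U(x):
    """Distance from x to R \\ U for U = (-0.5, 1.5)."""
    return max(0.0, min(x + 0.5, 1.5 - x))

def alpha(x):
    """Urysohn function: 1 on A, 0 outside U, continuous in between."""
    num = dist_to_complement_of_U(x)
    # The denominator is positive because A and R \ U are disjoint closed sets.
    return num / (num + dist_to_A(x))

def phi(x, t):
    """phi(x, t) = (x, alpha(x) * t): the identity on
    (X x {0}) union (A x [0, 1]), with image in (X x {0}) union (U x [0, 1])."""
    return (x, alpha(x) * t)
```

Points outside U are pushed down to height 0, points of A keep their height, and every point at height 0 is fixed.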


Proof of Proposition 7.5.4. Let Y, f : X → Y, and η : A × [0, 1] → Y satisfy (a)-(d) above. By Theorem 6.5.2 we may assume without loss of generality that Y is contained in a Banach space S, and is a relatively closed subset of its convex hull C. Let Z := (X × {0}) ∪ (A × [0, 1]), and define g : Z → Y by setting g(x, 0) = f(x) and g(a, t) = η(a, t). Dugundji’s theorem implies that there is a continuous extension ḡ : X × [0, 1] → C of g. Let W ⊂ C be a neighborhood of Y for which there is a retraction r : W → Y, let V := ḡ^{-1}(W), and let ϕ : X × [0, 1] → V be a continuous map that is the identity on Z, as per the result above. Clearly η̄ := r ∘ ḡ ∘ ϕ has the indicated properties.

We now return to the characterization of AR’s.

Proof of Proposition 7.5.3. Let A be an ANR. By Theorem 6.5.2 we may embed A as a relatively closed subset of a convex subset C of a Banach space.

If A is an AR, then it is a retract of C. A convex set is contractible, and a retract of a contractible set is contractible (Lemma 7.2.2) so A is contractible.

Suppose that A is contractible. By Proposition 7.5.1 it suffices to show that A is a retract of C. Let c : A × [0, 1] → A be a contraction, and let a_1 be its “final value,” by which we mean that c(a, 1) = a_1 for all a ∈ A. Set Z := (C × {0}) ∪ (A × [0, 1]), and define f : Z → A by setting f(x, 0) := a_1 for x ∈ C and f(a, t) := c(a, 1 − t) for (a, t) ∈ A × [0, 1]. Proposition 7.5.4 implies the existence of a continuous extension f̄ : C × [0, 1] → A. Now r := f̄(·, 1) : C → A is the desired retraction.

7.6 Domination

In our development of the fixed point index an important idea will be to pass from a theory for certain simple or elementary spaces to a theory for more general spaces by showing that every space of the latter type can be “approximated” by a simpler space, in the sense of the following definitions. Fix a metric space (X, d).

Definition 7.6.1. If Y is a topological space and ε > 0, a homotopy η : Y × [0, 1] → X is an ε-homotopy if

d(η(y, s), η(y, t)) < ε

for all y ∈ Y and all 0 ≤ s, t ≤ 1. We say that η_0 and η_1 are ε-homotopic.

Definition 7.6.2. For ε > 0, a topological space D ε-dominates C ⊂ X if there are continuous functions ϕ : C → D and ψ : D → X such that ψ ∘ ϕ : C → X is ε-homotopic to Id_C.

This section’s main result is:

Theorem 7.6.3 (Domination Theorem). If X is a separable ANR and C ⊂ X is compact, then for any ε > 0 there is a simplicial complex that ε-dominates C.

Proof. If C = ∅, then for any ε > 0 it is ε-dominated by ∅, which we consider to be a simplicial complex. Similarly, if C is a singleton, then for any ε > 0 it is ε-dominated by the simplicial complex consisting of a single point. Therefore we may assume that C has more than one point.

In view of Proposition 7.4.3 we may assume that X is a retract of an open set U of a convex subset S of a Banach space. Let r : U → X be the retraction, and let d be the metric on S derived from the norm of the Banach space. Fix ε > 0 small enough that C is not contained in the ε/2-ball around any of its points. For x ∈ C let

ρ(x) := (1/2) d(x, S \ r^{-1}(U_{ε/2}(x) ∩ X)).

Choose x_1, …, x_n ∈ C such that

U_1 := U_{ρ(x_1)}(x_1), …, U_n := U_{ρ(x_n)}(x_n)

is an open cover of C. Let e_1, …, e_n be the standard unit basis vectors of R^n. The nerve of the open cover is

N_{(U_1,…,U_n)} = ⋃_{x∈X} conv({ e_j : x ∈ U_j }) = ⋃_{U_{j_1}∩…∩U_{j_k} ≠ ∅} conv(e_{j_1}, …, e_{j_k}).

Of course it is a (geometric) simplicial complex. There are functions α_1, …, α_n : C → [0, 1] given by

α_i(x) := d(x, X \ U_i) / ∑_{j=1}^n d(x, X \ U_j).

Of course the denominator is always positive, so these functions are well defined and continuous. There is a continuous function ϕ : C → N_{(U_1,…,U_n)} given by

ϕ(x) := ∑_{j=1}^n α_j(x) e_j.

We would like to define a function ψ : N_{(U_1,…,U_n)} → X by setting

ψ(∑_{j=1}^n α_j e_j) = r(∑_{j=1}^n α_j x_j).

Consider a point y = ∑_{j=1}^n α_j e_j ∈ N_{(U_1,…,U_n)}. Let j_1, …, j_k be the indices j such that α_j > 0, ordered so that ρ(x_{j_1}) ≥ max{ρ(x_{j_2}), …, ρ(x_{j_k})}. Let B := U_{2ρ(x_{j_1})}(x_{j_1}). The definition of N_{(U_1,…,U_n)} implies that there is a point z ∈ ⋂_{h=1}^k U_{j_h}. For all h = 1, …, k we have x_{j_h} ∈ B because

d(z, x_{j_h}) < ρ(x_{j_h}) ≤ ρ(x_{j_1}),

so that d(x_{j_1}, x_{j_h}) ≤ d(x_{j_1}, z) + d(z, x_{j_h}) < 2ρ(x_{j_1}). Now note that

B ⊂ r^{-1}(U_{ε/2}(x_{j_1}) ∩ X) ⊂ U.

Since B is convex, it contains ∑_{h=1}^k α_{j_h} x_{j_h}, so ψ is well defined.

Now we would like to define a homotopy η : C × [0, 1] → X by setting

η(x, t) = r((1 − t) ∑_j α_j(x) x_j + t x),

so suppose that y = ϕ(x) for some x ∈ C. Then x ∈ U_{j_1} ∩ … ∩ U_{j_k}. In particular B := U_{2ρ(x_{j_1})}(x_{j_1}) ⊃ U_{ρ(x_{j_1})}(x_{j_1}) = U_{j_1}, so B contains x. Again, since B is convex it contains the line segment between x and ∑_{j=1}^n α_j(x) x_j = ∑_{h=1}^k α_{j_h} x_{j_h}, so η is well defined. Evidently η is continuous with η_0 = ψ ∘ ϕ and η_1 = Id_C. In addition, since B ⊂ U we have

η(x, t) ∈ r(B) ⊂ U_{ε/2}(x_{j_1}) ⊂ U_ε(x)

for all 0 ≤ t ≤ 1.
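The maps ϕ, ψ and η from the proof of Theorem 7.6.3 can be sketched numerically in a simplified setting. The code below assumes X = R^2 (so the retraction r is the identity), takes the cover to be open balls with prescribed centers and radii rather than the radii ρ(x_i) of the proof, and uses hypothetical function names.

```python
import math

def ball_weight(x, center, radius):
    """d(x, R^2 \\ U) for the open ball U with the given center and radius."""
    return max(0.0, radius - math.dist(x, center))

def phi(x, centers, radii):
    """Barycentric coordinates of phi(x) in the nerve of the cover."""
    weights = [ball_weight(x, c, rho) for c, rho in zip(centers, radii)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("x is not covered by the balls")
    return [w / total for w in weights]

def psi(coords, centers):
    """Convex combination of the centers (r is the identity here)."""
    return tuple(sum(a * c[k] for a, c in zip(coords, centers)) for k in range(2))

def eta(x, t, centers, radii):
    """Straight-line homotopy from psi(phi(x)) at t = 0 to x at t = 1."""
    p = psi(phi(x, centers, radii), centers)
    return tuple((1 - t) * pk + t * xk for pk, xk in zip(p, x))
```

Only the balls containing x receive positive weight, so ψ(ϕ(x)) is a convex combination of centers that all lie within one radius of x, which is the source of the ε-control in the proof.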

Sometimes we will need the following variant.

Theorem 7.6.4. If X is a separable ANR and C ⊂ X is compact, then for any ε > 0 there is an open U ⊂ R^m, for some m, such that Ū is compact and U ε-dominates C.

Proof. Fixing ε > 0, let P ⊂ R^m be a simplicial complex that ε-dominates C by virtue of the maps ϕ : C → P and ψ : P → X. Since P is an ENR (Proposition 7.3.3) it is a neighborhood retract. Let r : U′ → P be a retraction of a neighborhood. For sufficiently small δ > 0 the closed δ-ball around P is contained in U′. Let U be the open δ-ball around P. Of course Ū is compact. Let ϕ′ : C → U be ϕ interpreted as a function with range U, and let ψ′ = ψ ∘ r : U → X. Since ψ′ ∘ ϕ′ = ψ ∘ ϕ, C is ε-dominated by U.

Chapter 8

Essential Sets of Fixed Points

Figure 2.1 shows a function f : [0, 1] → [0, 1] with two fixed points, s and t. Intuitively, they are qualitatively different, in that a small perturbation of f can result in a function that has no fixed points near s, but this is not the case for t. This distinction was recognized by Fort (1950) who described s as inessential, while t is said to be essential.

Figure 1.1 (graph of f over [0, 1], with the fixed points s and t marked)
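The difference between s and t can be checked numerically on a concrete function. The sketch below assumes the particular map f(x) = x + (x − 0.3)^2 (0.7 − x), whose graph touches the diagonal at s = 0.3 and crosses it at t = 0.7, and it perturbs f only by adding a constant δ. Both the function and the perturbation family are assumptions chosen for illustration; for |δ| ≤ 0.05 the perturbed map still sends [0, 1] into itself.

```python
def displacement(x, delta=0.0):
    """f_delta(x) - x, where f_delta(x) = x + (x - 0.3)**2 * (0.7 - x) + delta."""
    return (x - 0.3) ** 2 * (0.7 - x) + delta

def min_abs_displacement(lo, hi, delta, steps=2000):
    """Smallest |f_delta(x) - x| over a uniform grid on [lo, hi]."""
    return min(
        abs(displacement(lo + (hi - lo) * k / steps, delta))
        for k in range(steps + 1)
    )

def changes_sign(lo, hi, delta):
    """True if f_delta(x) - x has opposite signs at lo and hi, in which
    case the intermediate value theorem gives a fixed point in between."""
    return displacement(lo, delta) * displacement(hi, delta) < 0.0
```

For any δ > 0 the displacement is at least δ on [0.2, 0.4], so the fixed point at s disappears, while the sign change across [0.6, 0.8] survives small shifts in either direction.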

In game theory one often deals with correspondences with sets of fixed points that are infinite, and include continua such as submanifolds. As we will see, the definition proposed by Fort can be extended to sets of fixed points rather easily: roughly, a set of fixed points is essential if every neighborhood of it contains fixed points of every “sufficiently close” perturbation of the given correspondence. (Here one needs to be careful, because in the standard terminology of game theory, following Jiang (1963), essential Nash equilibria, and essential sets of Nash equilibria, are defined in terms of perturbations of the payoffs. This is a form of Q-robustness, which is studied in Section 8.3.) But it is easy to show that the set of all fixed points is essential, so some additional condition must be imposed before essential sets can be used to distinguish some fixed points from others.

The condition that works well, at least from a mathematical viewpoint, is connectedness. This chapter’s main result, Theorem 8.3.2, which is due to Kinoshita (1952), asserts that minimal (in the sense of set inclusion) essential sets are connected. The proof has the following outline. Let K be a minimal essential set of an upper semicontinuous convex valued correspondence F : X → X, where X is a compact, convex subset of a locally convex topological vector space. Suppose that K is disconnected, so there are disjoint open sets U_1, U_2 such that K_1 := K ∩ U_1 and K_2 := K ∩ U_2 are nonempty and K_1 ∪ K_2 = K. Since K is minimal, K_1 and K_2 are not essential, so there are perturbations F_1 and F_2 of F such that each F_i has no fixed points near K_i. Let α_1, α_2 : X → [0, 1] be continuous functions such that each α_i vanishes outside U_i and is identically 1 near K_i, and let α : X → [0, 1] be the function α(x) := 1 − α_1(x) − α_2(x). Then α, α_1, α_2 is a partition of unity subordinate to the open cover X \ K, U_1, U_2. The correspondence

x ↦ α(x)F(x) + α_1(x)F_1(x) + α_2(x)F_2(x)

is then a perturbation of F that has no fixed points near K, which contradicts the assumption that K is essential. Much of this chapter is concerned with filling in the technical details of this argument.

Turning to our particular concerns, Section 8.1 gives the Fan-Glicksberg theorem, which is the extension of the Kakutani fixed point theorem to infinite dimensional sets. Section 8.2 shows that convex valued correspondences can be approximated by functions, and defines convex combinations of convex valued correspondences, with continuously varying weights. Section 8.3 then states and proves Kinoshita’s theorem, which implies that minimal essential sets are connected. There remains the matter of proving that minimal essential sets actually exist, which is also handled in Section 8.3.

8.1 The Fan-Glicksberg Theorem

We now extend the Kakutani fixed point theorem to correspondences with infinite dimensional domains. The result below was proved independently by Fan (1952) and Glicksberg (1952) using quite similar methods; our proof is perhaps a bit closer to Fan’s. In a sense the result was already known, since it can be derived from the Eilenberg-Montgomery theorem, but the proof below is much simpler.

Theorem 8.1.1 (Fan, Glicksberg). If V is a locally convex topological vector space, X ⊂ V is nonempty, convex, and compact, and F : X → X is an upper semicontinuous convex valued correspondence, then F has a fixed point.

We treat two technical points separately:

Lemma 8.1.2. If V is a (not necessarily locally convex) topological vector space and K, C ⊂ V with K compact and C closed, then K + C is closed.

Proof. We will show that the complement is open. Let y be a point of V that is not in K + C. For each x ∈ K, translation invariance of the topology of V implies that x + C is closed, so Lemma 6.3.2 gives a neighborhood W_x of the origin such that (y + W_x + W_x) ∩ (x + C) = ∅. Since we can replace W_x with −W_x ∩ W_x, we may assume that −W_x = W_x, so that (y + W_x) ∩ (x + C + W_x) = ∅. Choose x_1, …, x_k such that the sets x_i + W_{x_i} cover K, and let W = W_{x_1} ∩ … ∩ W_{x_k}. Now

(y + W) ∩ (K + C) ⊂ (y + W) ∩ ⋃_i (x_i + C + W_{x_i}) ⊂ ⋃_i (y + W_{x_i}) ∩ (x_i + C + W_{x_i}) = ∅.

Lemma 8.1.3. If V is a (not necessarily locally convex) topological vector space and K, C, U ⊂ V with K compact, C closed, U open, and C ∩ K ⊂ U, then there is a neighborhood of the origin W such that (C + W) ∩ K ⊂ U.

Proof. Let L := K \ U. Our goal is to find a neighborhood of the origin W such that (C + W) ∩ L = ∅. Since C is closed, for each x ∈ L there is (by Lemma 6.3.2) a neighborhood W_x of the origin such that (x + W_x + W_x) ∩ C = ∅. We can replace W_x with −W_x ∩ W_x, so we may insist that −W_x = W_x. As a closed subset of K, L is compact, so there are x_1, …, x_k such that the sets x_i + W_{x_i} cover L. Let W := W_{x_1} ∩ … ∩ W_{x_k}. Then W = −W, so if (C + W) ∩ L is nonempty, then so is C ∩ (L + W), but

L + W ⊂ (⋃_i x_i + W_{x_i}) + W ⊂ ⋃_i x_i + W_{x_i} + W_{x_i}.

Proof of Theorem 8.1.1. Let U be a closed convex neighborhood of the origin. (Lemma 6.3.4 implies that such a U exists.) Let F_U : X → X be the correspondence F_U(x) := (F(x) + U) ∩ X. Evidently F_U(x) is nonempty and convex, and the first of the two results above implies that it is a closed subset of X, so it is compact.

To show that F_U is upper semicontinuous we consider a particular x and a neighborhood T of F_U(x). The second of the two results above implies that there is a neighborhood W of the origin such that (F(x) + U + W) ∩ X ⊂ T. Since F is upper semicontinuous there is a neighborhood A of x such that F(x′) ⊂ F(x) + W for all x′ ∈ A, and for such an x′ we have

F_U(x′) = (F(x′) + U) ∩ X ⊂ (F(x) + W + U) ∩ X ⊂ T.

Since X is compact, there are finitely many points x_1, …, x_k ∈ X such that x_1 + U, …, x_k + U is a cover of X. Let C be the convex hull of these points. Define G : C → C by setting G(x) = F_U(x) ∩ C; since G(x) contains some x_i, it is nonempty, and of course it is convex. Since C is the image of the continuous function (α_1, …, α_k) ↦ α_1x_1 + ⋯ + α_kx_k from the (k − 1)-dimensional simplex, it is compact, and consequently closed because V is Hausdorff. Since Gr(G) = Gr(F_U) ∩ (C × C) is closed, G is upper semicontinuous. Therefore G satisfies the hypothesis of the Kakutani fixed point theorem and has a nonempty set of fixed points. Any fixed point of G is a fixed point of F_U, so the set ℱ_U of fixed points of F_U is nonempty. Of course it is also closed in X, hence compact.

The collection of compact sets

{ ℱ_U : U is a closed convex neighborhood of the origin }

has the finite intersection property because ℱ_{U_1∩…∩U_k} ⊂ ℱ_{U_1} ∩ … ∩ ℱ_{U_k}, so its intersection is nonempty. Suppose that x^* is an element of this intersection. If x^* was not an element of F(x^*) there would be a closed neighborhood U of the origin such that (x^* − U) ∩ F(x^*) = ∅, which contradicts x^* ∈ ℱ_U, so x^* is a fixed point of F.

8.2 Convex Valued Correspondences

Let X be a topological space, and let Y be a subset of a topological vector space V. Then Con(X, Y) is the set of upper semicontinuous convex valued correspondences from X to Y. Let Con_S(X, Y) denote this set endowed with the relative topology inherited from U_S(X, Y), which was defined in Section 5.2. This section treats two topological issues that are particular to convex valued correspondences: a) approximation by continuous functions; b) the continuity of the process by which they are recombined using convex combinations and partitions of unity.

The following result is a variant, for convex valued correspondences, of the approximation theorem (Theorem 9.1.1) that is the subject of the next chapter.

Proposition 8.2.1. If X is a metric space, V is locally convex, and Y is either open or convex, then C(X, Y) is dense in Con_S(X, Y).

Proof. Fix F ∈ Con(X, Y) and a neighborhood U ⊂ X × Y of Gr(F). Our goal is to produce a continuous function f : X → Y with Gr(f) ⊂ U.

Consider a particular x ∈ X. For each y ∈ F(x) there is a neighborhood T_{x,y} of x and (by Lemma 6.3.2) a neighborhood W_{x,y} of the origin in V such that

T_{x,y} × (y + W_{x,y} + W_{x,y}) ⊂ U.

If Y is open we can also require that y + W_{x,y} + W_{x,y} ⊂ Y. The compactness of F(x) implies that there are y_1, …, y_k such that the y_i + W_{x,y_i} cover F(x). Setting T_x = ⋂_i T_{x,y_i} and W_x = ⋂_i W_{x,y_i}, we have T_x × (F(x) + W_x) ⊂ U and F(x) + W_x ⊂ Y if Y is open. Since V is locally convex, we may assume that W_x is convex because we can replace it with a smaller convex neighborhood. Upper semicontinuity gives a δ_x > 0 such that U_{δ_x}(x) ⊂ T_x and F(x′) ⊂ F(x) + W_x for all x′ ∈ U_{δ_x}(x).

Since metric spaces are paracompact there is a locally finite open cover {T_α}_{α∈A} of X that refines {U_{δ_x/2}(x)}_{x∈X}. For each α ∈ A choose x_α such that T_α ⊂ U_{δ_α/2}(x_α), where δ_α := δ_{x_α}, and choose y_α ∈ F(x_α). Since metric spaces are normal, Theorem 6.2.2 gives a partition of unity {ψ_α} subordinate to {T_α}_{α∈A}. Let f : X → V be the function

f(x) := ∑_{α∈A} ψ_α(x) y_α.

Fixing x ∈ X, let α_1, …, α_n be the α such that ψ_α(x) > 0. After renumbering we may assume that δ_{α_1} ≥ δ_{α_i} for all i = 2, …, n. For each such i we have x_{α_i} ∈ U_{δ_{α_i}/2}(x) ⊂ U_{δ_{α_1}}(x_{α_1}), so that y_{α_i} ∈ F(x_{α_1}) + W_{x_{α_1}}. Since F(x_{α_1}) + W_{x_{α_1}} is convex we have

(x, f(x)) ∈ U_{δ_{α_1}}(x_{α_1}) × (F(x_{α_1}) + W_{x_{α_1}}) ⊂ U.

Note that f(x) is contained in Y either because Y is convex or because F(x_{α_1}) + W_{x_{α_1}} ⊂ Y. Since x was arbitrary, we have shown that Gr(f) ⊂ U.
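The construction f(x) = ∑_α ψ_α(x) y_α can be tried out on a simple example. The sketch below assumes X = Y = [0, 1], a step correspondence F with F(1/2) = [0, 1], a uniform grid in place of the locally finite refinement, piecewise linear "hat" functions as the partition of unity, and midpoints as the selections y_α. All of these choices and names are assumptions made for illustration.

```python
import math

def F(x):
    """Step correspondence on [0, 1]; returns the interval F(x) as (low, high)."""
    if x < 0.5:
        return (0.0, 0.0)
    if x > 0.5:
        return (1.0, 1.0)
    return (0.0, 1.0)

def hat(x, center, width):
    """Piecewise linear bump; the bumps centered at multiples of width sum to 1."""
    return max(0.0, 1.0 - abs(x - center) / width)

def approximant(x, n):
    """f(x) = sum_k psi_k(x) * y_k, with y_k the midpoint of F(k / n)."""
    total = 0.0
    for k in range(n + 1):
        low, high = F(k / n)
        total += hat(x, k / n, 1.0 / n) * 0.5 * (low + high)
    return total

def dist_to_graph(x, y):
    """Euclidean distance from (x, y), with 0 <= y <= 1, to the graph of F."""
    to_vertical = abs(x - 0.5)
    to_lower = y if x < 0.5 else math.hypot(x - 0.5, y)
    to_upper = 1.0 - y if x > 0.5 else math.hypot(x - 0.5, 1.0 - y)
    return min(to_vertical, to_lower, to_upper)
```

With n grid cells the graph of the continuous function approximant(·, n) stays within distance 1/n of Gr(F), so refining the grid places it inside any prescribed neighborhood of the graph.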

We now study correspondences constructed from given correspondences by taking a convex combination, where the weights are given by a partition of unity. Let X be a compact metric space and let V be a topological vector space. Since addition and scalar multiplication are continuous, Proposition 4.5.9 and Lemma 4.5.10 imply that the composition

(α, K) ↦ {α} × K ↦ αK = { αv : v ∈ K } (∗)

and the Minkowski sum

(K, L) ↦ K × L ↦ K + L := { v + w : v ∈ K, w ∈ L } (∗∗)

are continuous functions from R × K(V) and K(V) × K(V) to K(V).

These operations define continuous functions on the corresponding spaces of functions and correspondences. Let C_S(X) denote the space C_S(X, R) defined in Section 5.5.

Lemma 8.2.2. The function (ψ, F) ↦ ψF from C_S(X) × Con_S(X, V) to Con_S(X, V) is continuous.

Proof. To produce a contradiction suppose the assertion is false. Then there is a directed set (D, <) and a convergent net, say {(ψ^d, F^d)}_{d∈D} with limit (ψ, F), such that ψ^d F^d ↛ ψF. Failure of convergence means that there is a neighborhood W ⊂ X × V of Gr(ψF) such that (after choosing a subnet) for every d there are points x^d ∈ X and y^d ∈ F^d(x^d) such that (x^d, ψ^d(x^d) y^d) ∉ W.

Taking a further subnet, we may assume that x^d → x and ψ^d(x^d) → α. For each y ∈ F(x) there are neighborhoods T_y and U_y of x and y such that T_y × U_y ⊂ W. Let U_{y_1}, …, U_{y_m} be a finite subcover of F(x), and set T := ⋂_j T_{y_j} and U := ⋃_j U_{y_j}. Then T and U are neighborhoods of x and F(x) such that T × U ⊂ W.

The continuity of (∗) and (∗∗) implies that there are neighborhoods A of α and U of F(x) such that α′K ⊂ U whenever α′ ∈ A and K ⊂ U. By replacing T with a smaller neighborhood of x if need be, we can ensure that ψ(x′) ∈ A and F(x′) ⊂ U for all x′ ∈ T. Then the set of (ψ′, F′) such that ψ′(x′) ∈ A and F′(x′) ⊂ U for all x′ ∈ T is a neighborhood of (ψ, F), so when d is “large” we will have (ψ^d, F^d) in this neighborhood and x^d ∈ T, which implies that

{x^d} × ψ^d(x^d) F^d(x^d) ⊂ T × U ⊂ W.

This contradicts our supposition, so the proof is complete.

The proof of the following follows the same pattern, and is left to the reader.

Lemma 8.2.3. The function (F_1, F_2) ↦ F_1 + F_2 from Con_S(X, V) × Con_S(X, V) to Con_S(X, V) is continuous.

If ψ_1, …, ψ_k is a partition of unity of X and F_1, …, F_k ∈ Con(X, V), then each F_i may be regarded as a continuous function from X to K(V), so we may define a new continuous function from X to K(V) by setting

(ψ_1F_1 + ⋯ + ψ_kF_k)(x) := ψ_1(x)F_1(x) + ⋯ + ψ_k(x)F_k(x).

A continuous function from X to K(V) is the same thing as an upper semicontinuous compact valued correspondence, so we may regard ψ_1F_1 + ⋯ + ψ_kF_k as an element of Con(X, V). Let PU_k(X) be the space of k-element partitions of unity ψ_1, …, ψ_k of X. We endow PU_k(X) with the relative topology it inherits as a subspace of C_S(X)^k. The last two results now imply:
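The pointwise combination ψ_1F_1 + ⋯ + ψ_kF_k is easy to compute when V = R and every value F_i(x) is a compact interval. The sketch below represents an interval as a pair (low, high); this representation, the sample weights, and the sample correspondences are assumptions made for illustration.

```python
def scale(alpha, K):
    """alpha * K for an interval K = (low, high) and a real number alpha."""
    a, b = alpha * K[0], alpha * K[1]
    return (min(a, b), max(a, b))

def minkowski_sum(K, L):
    """K + L = { v + w : v in K, w in L } for intervals."""
    return (K[0] + L[0], K[1] + L[1])

def combine(weights, correspondences):
    """Return the correspondence x -> psi_1(x) F_1(x) + ... + psi_k(x) F_k(x)."""
    def combined(x):
        result = (0.0, 0.0)
        for psi, F in zip(weights, correspondences):
            result = minkowski_sum(result, scale(psi(x), F(x)))
        return result
    return combined

# Example on X = [0, 1]: weights x and 1 - x, F_1(x) = [0, 1], F_2(x) = {x}.
G = combine(
    [lambda x: x, lambda x: 1.0 - x],
    [lambda x: (0.0, 1.0), lambda x: (x, x)],
)
```

Where one weight vanishes the combination agrees with the other correspondence, which is the property used in the outline of Kinoshita's argument at the start of the chapter.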

Proposition 8.2.4. The function

(ψ_1, …, ψ_k, F_1, …, F_k) ↦ ψ_1F_1 + ⋯ + ψ_kF_k

from PU_k(X) × Con_S(X, V)^k to Con_S(X, V) is continuous.

8.3 Kinoshita’s Theorem

Let X be a compact convex subset of a locally convex topological vector space, and fix a particular F ∈ Con(X, X).

Definition 8.3.1. A set K ⊂ FP(F) is an essential set of fixed points of F if it is compact and for any open U ⊃ K there is a neighborhood V ⊂ Con_S(X, X) of F such that FP(F′) ∩ U ≠ ∅ for all F′ ∈ V.

The following result from Kinoshita (1952) is a key element of the theory of essential sets.

Theorem 8.3.2. (Kinoshita) If K ⊂ FP(F) is essential and K_1, …, K_k is a partition of K into disjoint compact sets, then some K_j is essential.

Proof. Suppose that no K_j is essential. Then for each j = 1, …, k there is a neighborhood U_j of K_j such that for every neighborhood V_j ⊂ Con_S(X, X) of F there is an F_j ∈ V_j with no fixed points in U_j. Replacing the U_j with smaller neighborhoods if need be, we can assume that they are pairwise disjoint. Let U be a neighborhood of X \ (U_1 ∪ … ∪ U_k) whose closure does not intersect K. A compact Hausdorff space is normal, so Theorem 6.2.2 implies the existence of a partition of unity ϕ_1, …, ϕ_k, ϕ : X → [0, 1] subordinate to the open cover U_1, …, U_k, U. Let V ⊂ Con_S(X, X) be a neighborhood of F. Proposition 8.2.4 implies that there are neighborhoods V_1, …, V_k ⊂ Con_S(X, X) of F such that ϕ_1F_1 + ⋯ + ϕ_kF_k + ϕF ∈ V whenever F_1 ∈ V_1, …, F_k ∈ V_k. For each j we can choose an F_j ∈ V_j that has no fixed points in U_j. Then ϕ_1F_1 + ⋯ + ϕ_kF_k + ϕF has no fixed points in X \ U because on each U_j \ U it agrees with F_j. Since X \ U is a neighborhood of K and V was arbitrary, this contradicts the assumption that K is essential.

Recall that a topological space is connected if it is not the union of two disjoint nonempty open sets. A subset of a topological space is connected if the relative topology makes it a connected space.

Corollary 8.3.3. A minimal essential set is connected.

Proof. Let $K$ be an essential set. If $K$ is not connected, then there are disjoint open sets $U_1, U_2$ such that $K \subset U_1 \cup U_2$ and $K_1 := K \cap U_1$ and $K_2 := K \cap U_2$ are both nonempty. Since $K_1$ and $K_2$ are closed subsets of $K$, they are compact, so Kinoshita’s theorem implies that either $K_1$ or $K_2$ is essential. Consequently $K$ cannot be minimal.

Naturally we would like to know whether minimal essential sets exist. Because of important applications in game theory, we will develop the analysis in the context of a slightly more general concept.

Definition 8.3.4. A pointed space is a pair $(A, a_0)$ where $A$ is a topological space and $a_0 \in A$. A pointed map $f : (A, a_0) \to (B, b_0)$ between pointed spaces is a continuous function $f : A \to B$ with $f(a_0) = b_0$.

Definition 8.3.5. Suppose $(A, a_0)$ is a pointed space and

$$Q : (A, a_0) \to (\mathrm{Con}_S(X,X), F)$$

is a pointed map. A nonempty compact set $K \subset \mathrm{FP}(F)$ is $Q$-robust if, for every neighborhood $V \subset X$ of $K$, there is a neighborhood $U \subset A$ of $a_0$ such that $\mathrm{FP}(Q(a)) \cap V \ne \emptyset$ for all $a \in U$.

A set of fixed points is essential if and only if it is $\mathrm{Id}_{(\mathrm{Con}_S(X,X),F)}$-robust. At the other extreme, if $Q$ is a constant function, so that $Q(a) = F$ for all $a$, then any nonempty compact $K \subset \mathrm{FP}(F)$ is $Q$-robust. The weakening of the notion of an essential set provided by this definition is useful when certain perturbations of $F$ are thought to be more relevant than others, or when the perturbations of $F$ are derived from perturbations of the parameter $a$ in a neighborhood of $a_0$. Some of the most important refinements of the Nash equilibrium concept have this form. In particular, Jiang (1963) defines essential Nash equilibria, and essential sets of Nash equilibria, in terms of perturbations of the game’s payoffs, while Kohlberg and Mertens (1986) define stable sets of Nash equilibria in terms of those perturbations of the payoffs that are induced by the trembles of Selten (1975).

Lemma 8.3.6. $\mathrm{FP}(F)$ is $Q$-robust.

Proof. The continuity of $\mathrm{FP}$ (Theorem 5.2.1) implies that for any neighborhood $V \subset X$ of $\mathrm{FP}(F)$ there is a neighborhood $U \subset A$ of $a_0$ such that $\mathrm{FP}(Q(a)) \subset V$ for all $a \in U$. The Fan-Glicksberg fixed point theorem implies that $\mathrm{FP}(Q(a))$ is nonempty.

This result shows that if our goal is to discriminate between some fixed points and others, these concepts must be strengthened in some way. The two main methods for doing this are to require either connectedness or minimality.

Definition 8.3.7. A nonempty compact set $K \subset \mathrm{FP}(F)$ is a minimal $Q$-robust set if it is $Q$-robust and minimal in the class of such sets: $K$ is $Q$-robust and no proper subset is $Q$-robust. A minimal connected $Q$-robust set is a connected $Q$-robust set that does not contain a proper subset that is connected and $Q$-robust.

In general a minimal $Q$-robust set need not be connected. For example, if $(A, a_0) = ((-1, 1), 0)$ and $Q(a)(t) = \operatorname{argmax}_{s \in [0,1]} as$ (so that $F(t) = [0, 1]$ for all $t$) then $\mathrm{FP}(Q(a))$ is $\{0\}$ if $a < 0$ and it is $\{1\}$ if $a > 0$, so the only minimal $Q$-robust set is $\{0, 1\}$. In view of this one must be careful to distinguish between a minimal connected $Q$-robust set and a minimal $Q$-robust set that happens to be connected.
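The example can be checked with a small numerical sketch. The helper names `argmax_linear` and `fixed_points` are hypothetical, and a finite grid stands in for $[0,1]$, so this only illustrates the computation and is not part of the formal argument.

```python
def argmax_linear(a, grid):
    """Return the set of maximizers of s -> a*s over the grid."""
    best = max(a * s for s in grid)
    return {s for s in grid if a * s == best}

def fixed_points(a, grid):
    """Fixed points of the constant correspondence Q(a)(t) = argmax_s a*s."""
    value = argmax_linear(a, grid)   # Q(a)(t) does not depend on t
    return {t for t in grid if t in value}

# Assumed discretization of [0, 1]; the endpoints 0.0 and 1.0 are grid points.
grid = [i / 10 for i in range(11)]

negative = fixed_points(-0.5, grid)  # a < 0: the maximizer is the left endpoint
positive = fixed_points(0.5, grid)   # a > 0: the maximizer is the right endpoint
at_zero = fixed_points(0.0, grid)    # a = 0: every point is a maximizer
```

For every $a \ne 0$ near $a_0 = 0$ the fixed point set is a single endpoint, and both endpoints occur arbitrarily close to $a_0$, which is why a $Q$-robust set must contain both $0$ and $1$.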

Theorem 8.3.8. If $K \subset \mathrm{FP}(F)$ is a $Q$-robust set, then it contains a minimal $Q$-robust set, and if $K$ is a connected $Q$-robust set, then it contains a minimal connected $Q$-robust set.

Proof. Let $C$ be the set of $Q$-robust sets that are contained in $K$. We order this set by reverse inclusion, so that our goal is to show that $C$ has a maximal element. This follows from Zorn’s lemma if we can show that any completely ordered subset $O$ has an upper bound in $C$. The finite intersection property implies that the intersection of all elements of $O$ is nonempty; let $K_\infty$ be this intersection. If $K_\infty$ is not $Q$-robust, then there is a neighborhood $V$ of $K_\infty$ such that every neighborhood $U$ of $a_0$ contains a point $a$ such that $Q(a)$ has no fixed points in $V$. If $L \in O$, we cannot have $L \subset V$ because $L$ is $Q$-robust, but now $\{\, L \setminus V : L \in O \,\}$ is a collection of compact sets with the finite intersection property, so it has a nonempty intersection that is contained in $K_\infty$ but disjoint from $V$. Of course this is absurd.

The argument for connected $Q$-robust sets follows the same lines, except that in addition to showing that $K_\infty$ is $Q$-robust, we must also show that it is connected. If not there are disjoint open sets $V_1$ and $V_2$ such that $K_\infty \subset V_1 \cup V_2$ and $K_\infty \cap V_1 \ne \emptyset \ne K_\infty \cap V_2$. For each $L \in O$ we have $L \cap V_1 \ne \emptyset \ne L \cap V_2$, so $L \setminus (V_1 \cup V_2)$ must be nonempty because $L$ is connected. As above, $\{\, L \setminus (V_1 \cup V_2) : L \in O \,\}$ has a nonempty intersection that is contained in $K_\infty$ but disjoint from $V_1 \cup V_2$, which is impossible.

Chapter 9

Approximation of Correspondences

In extending fixed point theory from functions to correspondences, an important method is to show that continuous functions are dense in the space of correspondences, so that any correspondence can be approximated by a function. In the last chapter we saw such a result (Theorem 8.2.1) for convex valued correspondences, but much greater care and ingenuity is required by the arguments showing that contractible valued correspondences have good approximations. This chapter states and proves the key result in this direction. This result was proved in the Euclidean case by Mas-Colell (1974) and extended to ANR’s by the author in McLennan (1991).

9.1 The Approximation Result

Our main result can be stated rather easily. We now fix ANR’s $X$ and $Y$. We assume throughout this chapter that $X$ is separable, in order to be able to invoke the domination theorem.

Theorem 9.1.1 (Approximation Theorem). Suppose that $C$ and $D$ are compact subsets of $X$ with $C \subset \operatorname{int} D$. Let $F : D \to Y$ be an upper semicontinuous contractible valued correspondence. Then for any neighborhood $U$ of $\mathrm{Gr}(F|_C)$ there are:

(a) a continuous $f : C \to Y$ with $\mathrm{Gr}(f) \subset U$;

(b) a neighborhood $U'$ of $\mathrm{Gr}(F)$ such that, for any two continuous functions $f_0, f_1 : D \to Y$ with $\mathrm{Gr}(f_0), \mathrm{Gr}(f_1) \subset U'$, there is a homotopy $h : C \times [0, 1] \to Y$ with $h_0 = f_0|_C$, $h_1 = f_1|_C$, and $\mathrm{Gr}(h_t) \subset U$ for all $0 \le t \le 1$.

Roughly, (a) is an existence result, while (b) is uniqueness up to effective equivalence.

Here, and later in the book, things would be much simpler if we could have $C = D$. More precisely, it would be nice to drop the assumption that $C \subset \operatorname{int} D$. This may be possible (that is, I do not know a relevant counterexample) but a proof would certainly involve quite different methods.


The following is an initial indication of the significance of this result.

Theorem 9.1.2. If $X$ is a compact ANR with the fixed point property, then any upper semicontinuous contractible valued correspondence $F : X \to X$ has a fixed point.

Proof. In the last result let $Y = X$ and $C = D = X$. Endow $X$ with a metric $d_X$. For each $j = 1, 2, \ldots$ let

$$U_j := \{\, (x', y') \in X \times X : d_X(x, x') + d_X(y, y') < 1/j \text{ for some } (x, y) \in \mathrm{Gr}(F) \,\},$$

let $f_j : X \to X$ be a continuous function with $\mathrm{Gr}(f_j) \subset U_j$, let $z_j$ be a fixed point of $f_j$, and let $(x'_j, y'_j)$ be a point in $\mathrm{Gr}(F)$ with $d_X(x'_j, z_j) + d_X(y'_j, z_j) < 1/j$. Passing to convergent subsequences, we find that the common limit of the sequences $\{x'_j\}$, $\{y'_j\}$, and $\{z_j\}$ is a fixed point of $F$.

Much later, applying Theorem 9.1.1, we will show that a nonempty compact contractible ANR has the fixed point property.

9.2 Extending from the Boundary of a Simplex

The proof of Theorem 9.1.1 begins with a concrete geometric construction that is given in this section. In subsequent sections we will transport this result to increasingly general settings, eventually arriving at our objective.

We now fix a locally convex topological vector space $T$ and a convex $Q \subset T$. A subset $Z$ of a vector space is balanced if $\lambda z \in Z$ whenever $z \in Z$ and $|\lambda| \le 1$. Since $T$ is locally convex, every neighborhood of the origin contains a convex neighborhood $U$, and $U \cap -U$ is a neighborhood that is convex and balanced. Working with balanced neighborhoods of the origin allows us to not keep track of the difference between a neighborhood and its negation.

Proposition 9.2.1. Let $A$ and $B$ be convex balanced neighborhoods of the origin in $T$ with $2A \subset B$. Suppose $S \subset Q$ is compact and $c : S \times [0, 1] \to S$ is a contraction for which there is a $\delta > 0$ such that $c(s, t) - c(s', t') \in B$ for all $(s, t), (s', t') \in S \times [0, 1]$ with $s - s' \in 3A$ and $|t - t'| < \delta$. Let $L$ be a simplex. Then any continuous $f' : \partial L \to (S + A) \cap Q$ has a continuous extension $f : L \to (S + B) \cap Q$.

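Proof. Let $\beta$ be the barycenter of $L$. We define “polar coordinate” functions

$$y : L \setminus \{\beta\} \to \partial L \quad\text{and}\quad t : L \setminus \{\beta\} \to [0, 1)$$

implicitly by requiring that

$$(1 - t(x))\,y(x) + t(x)\,\beta = x.$$

Let

$$L_1 = t^{-1}([0, \tfrac13]), \quad L_2 = t^{-1}([\tfrac13, \tfrac23]), \quad L_3 = t^{-1}([\tfrac23, 1)) \cup \{\beta\}.$$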

We first define f at points in L2, then extend to L1 and L3.
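The polar coordinates can be computed explicitly when points of $L$ are written in barycentric coordinates. The following is a minimal sketch under that assumption: the names `polar_coordinates` and `pieces` are hypothetical, and the formula $t(x) = n \min_i x_i$ uses the fact that a boundary point has at least one vanishing barycentric coordinate while $\beta$ has all coordinates equal to $1/n$, where $n$ is the number of vertices.

```python
def polar_coordinates(x):
    """Return (y, t) with (1 - t)*y + t*beta = x, for x in barycentric coordinates.

    x must be a point of the simplex other than the barycenter beta.
    """
    n = len(x)
    t = n * min(x)                  # smallest coordinate of x is t/n
    if t >= 1:
        raise ValueError("the barycenter has no polar coordinates")
    y = [(xi - t / n) / (1 - t) for xi in x]   # boundary point: min(y) == 0
    return y, t

def pieces(t):
    """Names of the regions L1, L2, L3 containing a point with coordinate t."""
    names = []
    if t <= 1 / 3:
        names.append("L1")
    if 1 / 3 <= t <= 2 / 3:
        names.append("L2")
    if t >= 2 / 3:
        names.append("L3")
    return names

# Illustrative point of a 2-simplex (three vertices).
x = [0.5, 0.3, 0.2]
y, t = polar_coordinates(x)
```

Here the point lies in the middle region, and moving it toward the boundary or toward the barycenter moves it into the first or third region respectively.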


Let $d$ be a metric on $L$. Since $f'$, $t(\cdot)$, and $y(\cdot)$ are continuous, and $L_2$ is compact, for some sufficiently small $\lambda > 0$ it is the case that

$$f'(y(x)) - f'(y(x')) \in A \quad\text{and}\quad |t(x) - t(x')| < \tfrac13\delta$$

for all $x, x' \in L_2$ such that $d(x, x') < \lambda$. There is a polyhedral subdivision of $L_2$ whose cells are the sets

$$y^{-1}(F) \cap t^{-1}(\tfrac13), \quad y^{-1}(F) \cap L_2, \quad y^{-1}(F) \cap t^{-1}(\tfrac23)$$

for the various faces $F$ of $L$. Proposition 2.5.2 implies that repeated barycentric subdivision of this polyhedral complex results eventually in a simplicial subdivision of $L_2$ whose mesh is less than $\lambda$.

For each vertex $v$ of this subdivision choose $s(v) \in (f'(y(v)) + A) \cap S$, and set

$$f(v) := c(s(v), 3t(v) - 1).$$

Let $\Delta$ be a simplex of the subdivision of $L_2$ with vertices $v_1, \ldots, v_r$. We define $f$ on $\Delta$ by linear interpolation on $\Delta$: if $x = \alpha_1v_1 + \cdots + \alpha_rv_r$, then

$$f(x) := \alpha_1f(v_1) + \cdots + \alpha_rf(v_r).$$

This definition does not depend on the choice of $\Delta$ if $x$ is contained in more than one simplex, it is continuous on each $\Delta$, and the simplices are a finite closed cover of $L_2$, so $f$ is continuous.

Suppose that $v$ and $v'$ are two vertices of $\Delta$, so they are the endpoints of an edge. We have $d(v, v') < \lambda$, so $f'(y(v)) - f'(y(v')) \in A$ and $|t(v) - t(v')| < \tfrac13\delta$. In addition, $s(v) - f'(y(v))$ and $f'(y(v')) - s(v')$ are elements of $A$, so

$$s(v) - s(v') \in 3A \quad\text{and}\quad |(3t(v) - 1) - (3t(v') - 1)| < \delta,$$

from which it follows, by hypothesis, that $f(v) - f(v') \in B$. Consider a point $x = \alpha_1v_1 + \cdots + \alpha_rv_r \in \Delta$. Since $f(v_1) \in S$ and

$$f(x) - f(v_1) = \sum_{j=1}^{r} \alpha_j\,(f(v_j) - f(v_1))$$

is a convex combination of the vectors $f(v_j) - f(v_1)$ for the vertices $v_j$ of $\Delta$, we have $f(x) \in (f(v_1) + B) \cap Q \subset (S + B) \cap Q$. Thus $f(L_2) \subset (S + B) \cap Q$.

We now define $f$ on $L_1$ by setting

$$f(x) := (1 - 3t(x))\,f'(y(x)) + 3t(x)\,f(\tfrac13\beta + \tfrac23 y(x)).$$

Since $f$ is continuous on $L_2$, this formula defines a continuous function. Suppose that

$$\tfrac23 y(x) + \tfrac13\beta = \alpha_1v_1 + \cdots + \alpha_rv_r$$

as above. Consider a particular $v_j$. Above we showed that

$$f(\tfrac23 y(x) + \tfrac13\beta) \in (s(v_j) + B) \cap Q.$$

The point $s(v_j)$ was chosen with $f'(y(v_j)) - s(v_j) \in A$, and $f'(y(x)) - f'(y(v_j)) \in A$ because $d(\tfrac23 y(x) + \tfrac13\beta, v_j) < \lambda$, so

$$f'(y(x)) \in (s(v_j) + 2A) \cap Q \subset (s(v_j) + B) \cap Q.$$

Since $f(x)$ is a convex combination of $f'(y(x))$ and $f(\tfrac23 y(x) + \tfrac13\beta)$ we have

$$f(x) \in (s(v_j) + B) \cap Q \subset (S + B) \cap Q.$$

Thus $f(L_1) \subset (S + B) \cap Q$.

Let $z$ be the point $S$ is contracted to by $c$: $c(S, 1) = \{z\}$. We define $f$ on $L_3$ by setting $f(x) := z$. Of course this is a continuous function whose image is contained in $S \subset (S + B) \cap Q$.

If $x \in L_1 \cap L_2$, then $t(x) = \tfrac13$ and $\tfrac23 y(x) + \tfrac13\beta = x$, so the formula defining $f$ on $L_1$ agrees with the definition of $f$ for elements of $L_2$ at $x$. If $v$ is a vertex of the subdivision of $L_2$ contained in $L_2 \cap L_3$, then $t(v) = \tfrac23$, so that the definition of $f$ on $L_2$ gives $f(v) = c(s(v), 3t(v) - 1) = z$. If $x \in L_2 \cap L_3$, then $L_2 \cap L_3$ contains any simplex of the subdivision of $L_2$ that has $x$ as an element, and the definition of $f$ on $L_2$ gives $f(x) = z$. Thus this definition agrees with the definition of $f$ on $L_2$ at points in $L_2 \cap L_3$. Thus $f$ is well defined and continuous.

9.3 Extending to All of a Simplicial Complex

As above, $Q$ is a convex subset of $T$, and we now fix a relatively open $Z \subset Q$. We also fix a simplicial complex $K$ and a subcomplex $J$.

Proposition 9.3.1. Let $F : K \to Z$ be an upper semicontinuous contractible valued correspondence. Then for any neighborhood $W \subset K \times Z$ of $\mathrm{Gr}(F)$ there is a neighborhood $W'$ of $\mathrm{Gr}(F|_J)$ such that any continuous $f' : J \to Z$ with $\mathrm{Gr}(f') \subset W'$ has a continuous extension $f : K \to Z$ with $\mathrm{Gr}(f) \subset W$.

The main argument will employ two technical results, the first of which will also be applied in the next section. Recall that an ANR can be embedded in a normed space (Proposition 7.4.3) so it is metrizable.

Lemma 9.3.2. Let $X$ be an ANR with metric $d$, let $F : X \to Z$ be an upper semicontinuous correspondence, and let $V \subset X \times Z$ be a neighborhood of $\mathrm{Gr}(F)$. For any $x \in X$ there is $\delta > 0$ and a neighborhood $B$ of the origin in $T$ such that $\mathbf{U}_\delta(x) \times ((F(x) + B) \cap Z) \subset V$.

Proof. By the definition of the product topology, for every $z \in F(x)$ there exist $\delta_z > 0$ and an open neighborhood $A_z \subset Z$ of the origin in $T$ such that

$$\mathbf{U}_{\delta_z}(x) \times ((z + A_z) \cap Z) \subset V,$$

and the continuity of addition in $T$ implies that there is a neighborhood $B_z$ of the origin with $B_z + B_z \subset A_z$. Since $F(x)$ is compact there are $z_1, \ldots, z_k$ such that $z_1 + B_{z_1}, \ldots, z_k + B_{z_k}$ is a cover of $F(x)$. Let $\delta := \min_j \delta_{z_j}$ and $B := \bigcap_j B_{z_j}$.


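Lemma 9.3.3. Let $U_1, \ldots, U_n$ be a cover of a metric space $X$ by open sets, none of which are $X$ itself. For each $y \in X$ let

$$r_y = \max_{i : y \in U_i} \sup\{\, \varepsilon > 0 : \mathbf{U}_\varepsilon(y) \subset U_i \,\},$$

and let $V_y$ be an open subset of $\mathbf{U}_{(\sqrt5 - 2) r_y}(y)$ that contains $y$. Then for all $y, y' \in X$, if $V_y \cap V_{y'} \ne \emptyset$, then $V_{y'} \subset \mathbf{U}_{r_y}(y)$.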

Proof. Let $\alpha = \sqrt5 - 2$ and $\beta = 3 - \sqrt5$. Suppose $V_y \cap V_{y'} \ne \emptyset$. The distance from $y$ to any point in $V_{y'}$ cannot exceed $\alpha(r_y + 2r_{y'})$, so if $V_{y'}$ is not contained in $\mathbf{U}_{r_y}(y)$, then $\alpha(r_y + 2r_{y'}) > r_y$, which boils down to $2\alpha r_{y'} > \beta r_y$. Let $i_{y'}$ be one of the indices such that $\mathbf{U}_{r_{y'}}(y') \subset U_{i_{y'}}$. We claim that $y \in U_{i_{y'}}$ because $r_{y'} > \alpha(r_y + r_{y'})$, which reduces to $\beta r_{y'} > \alpha r_y$. A quick computation verifies that $\beta/2\alpha > \alpha/\beta$, so this follows from the inequality above. Since $y \in U_{i_{y'}}$, and the distance from $y$ to $y'$ is less than $\alpha(r_y + r_{y'})$, we have $r_y > r_{y'} - \alpha(r_y + r_{y'})$, which reduces to $(1 + \alpha) r_y > \beta r_{y'}$. Together this inequality and the one above imply that $(1 + \alpha)/\beta > \beta/2\alpha$, but one may easily compute that in fact these two quantities are equal. This contradiction completes the proof.
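The arithmetic behind this lemma can be checked numerically. With $\alpha = \sqrt5 - 2$ and $\beta = 3 - \sqrt5$, the inequality $\alpha(r_y + 2r_{y'}) > r_y$ rearranges to $r_{y'}/r_y > \beta/(2\alpha)$, while $r_y > r_{y'} - \alpha(r_y + r_{y'})$ rearranges to $r_{y'}/r_y < (1 + \alpha)/\beta$. The sketch below, whose variable names are illustrative only, checks in floating point that the two bounds coincide (both are the golden ratio), so the two inequalities cannot hold together.

```python
import math

alpha = math.sqrt(5) - 2
beta = 3 - math.sqrt(5)
golden = (1 + math.sqrt(5)) / 2

# 1 - alpha equals beta, which is what turns alpha*(r + 2r') > r into 2*alpha*r' > beta*r.
one_minus_alpha = 1 - alpha

# Lower bound on r'/r coming from the first inequality.
lower = beta / (2 * alpha)

# Upper bound on r'/r coming from the last inequality.
upper = (1 + alpha) / beta

# Intermediate comparison used for the claim that y lies in the chosen U_i.
intermediate = alpha / beta
```

Since a strict lower bound and a strict upper bound with the same value are incompatible, the constant $\sqrt5 - 2$ is exactly the threshold at which the argument closes.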

Proof of Proposition 9.3.1. Let $m$ be the largest dimension of any simplex in $K$ that is not in $J$. The main idea is to use induction on $m$, but one of the methods used in the construction is subdivision of $K$, and the formulation of the induction hypothesis must be sensitive to this. Precisely, we will show that for each $k = 0, \ldots, m$ there is a neighborhood $W_k \subset W$ of $\mathrm{Gr}(F)$ and a simplicial subdivision of $K$ such that if $H_k$ is the union of $J$ with the $k$-skeleton of some further subdivision, then any $f' : J \to Z$ with $\mathrm{Gr}(f') \subset W_k$ has an extension $f : H_k \to Z$ with $\mathrm{Gr}(f) \subset W$.

For $k = 0$ the claim is obvious: we can let $W_0 = W$ and take $K$ itself without any further subdivision. By induction we may assume that the claim has already been established with $k - 1$ in place of $k$. That is, there is a neighborhood $W_{k-1} \subset W$ of $\mathrm{Gr}(F)$ and a simplicial subdivision of $K$ such that if $H_{k-1}$ is the union of $J$ with the $(k - 1)$-skeleton of some further subdivision, then any $f' : J \to Z$ with $\mathrm{Gr}(f') \subset W_{k-1}$ has an extension $f : H_{k-1} \to Z$ with $\mathrm{Gr}(f) \subset W$.

We now develop two open coverings of $K$. Consider a particular $x \in K$. Fix a contraction $c_x : F(x) \times [0, 1] \to F(x)$. Lemma 9.3.2 allows us to choose a convex balanced neighborhood $B_x$ of the origin in $T$ and $\delta_x > 0$ such that

$$U_x \times ((F(x) + B_x) \cap Z) \subset W_{k-1}$$

where $U_x := \mathbf{U}_{\delta_x}(x)$. By choosing $B_x$ sufficiently small we can also have

$$(F(x) + B_x) \cap Q \subset Z.$$

Since $c_x$ is continuous, we can choose a convex balanced neighborhood $A_x$ of the origin in $T$ and a number $\delta_x > 0$ such that $c_x(z', t') \in c_x(z, t) + B_x$ for all $(z, t), (z', t') \in F(x) \times [0, 1]$ such that $z' - z \in 3A_x$ and $|t' - t| < \delta_x$. Replacing $A_x$ with a smaller convex neighborhood if need be, we may assume that $2A_x \subset B_x$. Since $F$ is upper semicontinuous and $\delta_x$ may be replaced by a smaller positive number, we can ensure that $F(x') \subset F(x) + \tfrac12 A_x$ whenever $x' \in U_x$. Choose $x_1, \ldots, x_n$ such that $U_{x_1}, \ldots, U_{x_n}$ is a covering of $K$. Let $A := \bigcap_{i=1}^n A_{x_i}$.

The second open covering of $K$ is finer. For each $y \in K$ let

$$r_y = \max_{i : y \in U_{x_i}} \sup\{\, \varepsilon > 0 : \mathbf{U}_\varepsilon(y) \subset U_{x_i} \,\}.$$

The upper semicontinuity of $F$ implies that each $y$ has an open neighborhood $V_y$ such that $F(y') \subset F(y) + \tfrac12 A$ for all $y' \in V_y$. We can replace $V_y$ with a smaller neighborhood to bring about $V_y \subset \mathbf{U}_{(\sqrt5 - 2) r_y}(y)$. Choose $y_1, \ldots, y_p \in K$ such that $V_{y_1}, \ldots, V_{y_p}$ cover $K$. Set

$$W_k := \bigcup_{j=1}^{p} V_{y_j} \times ((F(y_j) + \tfrac12 A) \cap Z).$$

Evidently $\mathrm{Gr}(F) \subset W_k$. We have $W_k \subset W_{k-1}$ because for each $j$ there is some $i$ such that $V_{y_j} \subset U_{x_i}$ and

$$(F(y_j) + \tfrac12 A) \cap Z \subset ((F(x_i) + \tfrac12 A_{x_i}) + \tfrac12 A) \cap Z \subset (F(x_i) + A_{x_i}) \cap Z.$$

Starting with the subdivision of $K$ obtained at stage $k - 1$, by Proposition 2.5.2 repeated barycentric subdivision leads eventually to a subdivision of $K$ with each simplex contained in some $V_{y_j}$. Let $H_k$ be the union of $J$ with the $k$-skeleton of some further subdivision, and fix a continuous $f' : J \to Z$ with $\mathrm{Gr}(f') \subset W_k$. By the induction hypothesis there is an extension $f$ of $f'$ to the $(k - 1)$-skeleton of the further subdivision. Since extensions to each of the $k$-simplices that are in $H_k$ but not in $J$ combine to give the desired sort of extension, it suffices to show that there is an extension to a single such $k$-simplex $L$.

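By construction there is a $j$ such that $L \subset V_{y_j}$. Let $\mathcal{J}$ be the set of $j'$ such that $V_{y_j} \cap V_{y_{j'}} \ne \emptyset$. There is some $i$ with $V_{y_{j'}} \subset U_{x_i}$ for all $j' \in \mathcal{J}$, either because all of $K$ is contained in a single $U_{x_i}$ or as an application of the lemma above. The conditions imposed on our construction imply that

$$f(\partial L) \subset \bigcup_{j' \in \mathcal{J}} \big(F(y_{j'}) + \tfrac12 A\big) \subset \bigcup_{j' \in \mathcal{J}} \big(F(y_{j'}) + \tfrac12 A_{x_i}\big) \subset F(x_i) + A_{x_i}.$$

Now Proposition 9.2.1, with $A_{x_i}$, $B_{x_i}$, $\delta_{x_i}$, $F(x_i)$, and $f|_{\partial L}$ in place of $A$, $B$, $\delta$, $S$, and $f'$, gives a continuous extension $f : L \to Z$ with $f(L) \subset (F(x_i) + B_{x_i}) \cap Q$, and by construction this set is contained in $Z$. The proof is complete.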

9.4 Completing the Argument

The next step is a result in which the domains are subsets of the ANR X .

Proposition 9.4.1. Suppose that $C \subset D \subset X$ where $C$ and $D$ are compact with $C \subset \operatorname{int} D$. Let $F : D \to Z$ be an upper semicontinuous contractible valued correspondence. Then for any neighborhood $V$ of $\mathrm{Gr}(F|_C)$ there exist:

(a) a continuous f : C → Z with Gr(f) ⊂ V ;


(b) a neighborhood $V'$ of $\mathrm{Gr}(F)$ such that for any two functions $f_0, f_1 : D \to Z$ with $\mathrm{Gr}(f_0), \mathrm{Gr}(f_1) \subset V'$ there is a homotopy $h : C \times [0, 1] \to Z$ with $h_0 = f_0|_C$, $h_1 = f_1|_C$, and $\mathrm{Gr}(h_t) \subset V$ for all $0 \le t \le 1$.

The passage from this to the main result is straightforward.

Proof of Theorem 9.1.1. Recall (Proposition 7.4.1) that an ANR is a retract of a relatively open subset of a convex subset of a locally convex space. In particular, we now fix a locally convex space $T$, an open subset $Z$ of a convex subset of $T$, and a retraction $r : Z \to Y$. Let $i : Y \to Z$ be the inclusion. Let

$$V := (\mathrm{Id}_X \times r)^{-1}(U).$$

Proposition 9.4.1(a) implies that there is a continuous $f' : C \to Z$ with $\mathrm{Gr}(f') \subset V$, and setting $f := r \circ f'$ verifies (a) of Theorem 9.1.1.

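Let $V' \subset V$ be a neighborhood of $\mathrm{Gr}(i \circ F)$ with the property asserted by Proposition 9.4.1(b). Let $U' := (\mathrm{Id}_X \times i)^{-1}(V')$. Suppose that $f_0, f_1 : D \to Y$ with $\mathrm{Gr}(f_0), \mathrm{Gr}(f_1) \subset U'$. Then there is a homotopy $h : C \times [0, 1] \to Z$ with

$$h_0 = i \circ f_0|_C, \quad h_1 = i \circ f_1|_C, \quad\text{and}\quad \mathrm{Gr}(h_t) \subset V \text{ for all } 0 \le t \le 1,$$

so that

$$r \circ h_0 = f_0|_C, \quad r \circ h_1 = f_1|_C, \quad\text{and}\quad \mathrm{Gr}(r \circ h_t) \subset U \text{ for all } 0 \le t \le 1.$$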

This confirms (b) of Theorem 9.1.1.

The proof of Proposition 9.4.1 depends on two more technical lemmas. Below $d$ denotes a metric for $X$. For the two lemmas below an upper semicontinuous correspondence $F : X \to Z$ is given.

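Lemma 9.4.2. Suppose that $C \subset X$ is compact, and $V \subset C \times Z$ is a neighborhood of $\mathrm{Gr}(F|_C)$. Then there is $\varepsilon > 0$ and a neighborhood $\tilde V$ of $\mathrm{Gr}(F)$ such that

$$\bigcup_{(x,z) \in \tilde V} \mathbf{U}_\varepsilon(x) \times \{z\} \subset V.$$

Proof. For each $x \in C$ Lemma 9.3.2 allows us to choose $\delta_x > 0$ and a neighborhood $A_x$ of $F(x)$ such that $\mathbf{U}_{\delta_x}(x) \times A_x \subset V$. Replacing $\delta_x$ with a smaller number if need be, we may assume without loss of generality that $F(x') \subset A_x$ for all $x' \in \mathbf{U}_{\delta_x}(x)$. Choose $x_1, \ldots, x_H$ such that $\mathbf{U}_{\delta_{x_1}/2}(x_1), \ldots, \mathbf{U}_{\delta_{x_H}/2}(x_H)$ cover $C$. Let $\varepsilon := \min_i \{\delta_{x_i}/2\}$, and set

$$\tilde V := \bigcup_i \mathbf{U}_{\delta_{x_i}/2}(x_i) \times A_{x_i}.$$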

Lemma 9.4.3. Suppose that $f : S \to X$ is a continuous function, where $S$ is a compact metric space. If $U$ is a neighborhood of $\mathrm{Gr}(F \circ f)$, then there is a neighborhood $V$ of $\mathrm{Gr}(F)$ such that $(f \times \mathrm{Id}_Z)^{-1}(V) \subset U$.

Proof. Consider a particular $x \in X$. Applying Lemma 9.3.2, for any $s \in f^{-1}(x)$ we can choose a neighborhood $N_s$ of $s$ and a neighborhood $A_s \subset Z$ of $F(x)$ such that $N_s \times A_s \subset U$. Since $f^{-1}(x)$ is compact, there are $s_1, \ldots, s_\ell$ such that $N_{s_1}, \ldots, N_{s_\ell}$ cover $f^{-1}(x)$. Let $A := A_{s_1} \cap \ldots \cap A_{s_\ell}$, and let $W$ be a neighborhood of $x$ small enough that $f^{-1}(W) \subset N_{s_1} \cup \ldots \cup N_{s_\ell}$ and $F(x') \subset A$ for all $x' \in W$. (Such a $W$ must exist because $S$ is compact and $F$ is upper semicontinuous.) Then

$$(f \times \mathrm{Id}_Z)^{-1}(W \times A) \subset \bigcup_i N_{s_i} \times A \subset U.$$

Since x was arbitrary, this establishes the claim.

Proof of Proposition 9.4.1. Lemma 9.4.2 gives a neighborhood $V''$ of $\mathrm{Gr}(F)$ and $\varepsilon > 0$ such that

$$\bigcup_{(x,z) \in V''} \mathbf{U}_\varepsilon(x) \times \{z\} \subset V.$$

After replacing $\varepsilon$ with a smaller number, $\mathbf{U}_\varepsilon(C)$ is contained in the interior of $D$. Because $X$ is separable, the domination theorem (Theorem 7.6.3) implies that there is a simplicial complex $K$ that $\varepsilon$-dominates $D$ by virtue of the maps $\varphi : D \to K$ and $\psi : K \to X$. Let

$$W'' := (\psi \times \mathrm{Id}_Z)^{-1}(V'').$$

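Since $\psi \circ \varphi$ is $\varepsilon$-homotopic to $\mathrm{Id}_D$ we have $\varphi(C) \subset \psi^{-1}(\mathbf{U}_\varepsilon(C))$. Since $\varphi(C)$ is compact and $\psi^{-1}(\mathbf{U}_\varepsilon(C))$ is open, Proposition 2.5.2 implies that after repeated subdivisions of $K$ the subcomplex $H$ consisting of all simplices that intersect $\varphi(C)$ will satisfy $\psi(H) \subset \mathbf{U}_\varepsilon(C)$. Since $W''$ is a neighborhood of $\mathrm{Gr}(F \circ \psi|_H)$, Proposition 9.3.1 implies the existence of a function $f' : H \to Z$ with $\mathrm{Gr}(f') \subset W''$. Let $f := f' \circ \varphi|_C$. Then $\mathrm{Gr}(f) \subset V$, which verifies (a), because

$$(\varphi|_C \times \mathrm{Id}_Z)^{-1}(W'') = ((\psi \circ \varphi|_C) \times \mathrm{Id}_Z)^{-1}(V'') \subset \bigcup_{(x,z) \in V''} \mathbf{U}_\varepsilon(x) \times \{z\} \subset V. \qquad (\ast)$$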

Turning to (b), let $G : H \times [0, 1] \to Z$ be the correspondence $G(z, t) = F(\psi(z))$. We apply Proposition 9.3.1, with $G$, $H \times [0, 1]$, $W'' \times [0, 1]$, and $H \times \{0, 1\}$ in place of $F$, $K$, $W$, and $J$ respectively, obtaining neighborhoods $W'_0, W'_1 \subset W''$ of $\mathrm{Gr}(F \circ \psi|_H)$ such that for any continuous functions $f'_0, f'_1 : H \to Z$ with $\mathrm{Gr}(f'_0) \subset W'_0$ and $\mathrm{Gr}(f'_1) \subset W'_1$, there is a homotopy $h' : H \times [0, 1] \to Z$ with $h'_0 = f'_0$, $h'_1 = f'_1$, and $\mathrm{Gr}(h'_t) \subset W''$ for all $t$. Let $W' = W'_0 \cap W'_1$.

Lemma 9.4.3 implies that there is a neighborhood $V'$ of $\mathrm{Gr}(F)$ such that

$$(\psi|_H \times \mathrm{Id}_Z)^{-1}(V') \subset W'.$$

Replacing $V'$ with $V' \cap V''$ if need be, we may assume that $V' \subset V''$.

Now consider continuous $f_0, f_1 : D \to Z$ with $\mathrm{Gr}(f_0), \mathrm{Gr}(f_1) \subset V'$. We have

$$\mathrm{Gr}(f_0 \circ \psi|_H), \mathrm{Gr}(f_1 \circ \psi|_H) \subset W'.$$

Therefore there is a homotopy $j : H \times [0, 1] \to Z$ with $j_0 = f_0 \circ \psi|_H$, $j_1 = f_1 \circ \psi|_H$, and $\mathrm{Gr}(j_t) \subset W''$ for all $t$. Let $h'' : C \times [0, 1] \to Z$ be the homotopy $h''(x, t) = j(\varphi(x), t)$. In view of $(\ast)$ we have

$$\mathrm{Gr}(h''_t) \subset (\varphi|_C \times \mathrm{Id}_Z)^{-1}(W'') \subset V$$

for all $t$. Of course $h''_0 = f_0 \circ \psi \circ \varphi|_C$ and $h''_1 = f_1 \circ \psi \circ \varphi|_C$.

We now construct a homotopy $h' : C \times [0, 1] \to Z$ with $h'_0 = f_0|_C$, $h'_1 = f_0 \circ \psi \circ \varphi|_C$, and $\mathrm{Gr}(h'_t) \subset V$ for all $t$. Let $\eta : D \times [0, 1] \to X$ be an $\varepsilon$-homotopy with $\eta_0 = \mathrm{Id}_D$ and $\eta_1 = \psi \circ \varphi$, and define $h'$ by $h'(x, t) := f_0(\eta(x, t))$. Then $h'$ has the desired endpoints, and for all $(x, t)$ in the domain of $h'$ we have $(x, h'_t(x)) \in V$ because $d(x, \eta(x, t)) < \varepsilon$ and

$$(\eta(x, t), h'_t(x)) = (\eta(x, t), f_0(\eta(x, t))) \in V' \subset V''.$$

Similarly, there is a homotopy $h''' : C \times [0, 1] \to Z$ with $h'''_0 = f_1 \circ \psi \circ \varphi|_C$, $h'''_1 = f_1|_C$, and $\mathrm{Gr}(h'''_t) \subset V$ for all $t$. To complete the proof of (b) we construct a homotopy $h$ by setting $h_t = h'_{3t}$ for $0 \le t \le 1/3$, $h_t = h''_{3t-1}$ for $1/3 \le t \le 2/3$, and $h_t = h'''_{3t-2}$ for $2/3 \le t \le 1$.
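The final gluing step is a reparametrized concatenation of three homotopies. A minimal sketch follows, with real-valued maps and straight-line homotopies standing in for the maps of the proof; the names `concatenate` and `line`, and the sample functions, are assumptions made purely for illustration.

```python
def concatenate(h1, h2, h3):
    """Glue three homotopies end to end, each run at three times the speed.

    Requires h1(x, 1) == h2(x, 0) and h2(x, 1) == h3(x, 0) for continuity in t.
    """
    def h(x, t):
        if t <= 1 / 3:
            return h1(x, 3 * t)
        if t <= 2 / 3:
            return h2(x, 3 * t - 1)
        return h3(x, 3 * t - 2)
    return h

def line(f, g):
    """Straight-line homotopy from f to g (available here because R is convex)."""
    return lambda x, t: (1 - t) * f(x) + t * g(x)

# Hypothetical stand-ins: start map, two intermediate maps, end map.
f0 = lambda x: x
m0 = lambda x: x * x
m1 = lambda x: 2 * x
f1 = lambda x: 0.0

h = concatenate(line(f0, m0), line(m0, m1), line(m1, f1))
```

The resulting `h` starts at `f0`, ends at `f1`, and passes through the intermediate maps at $t = 1/3$ and $t = 2/3$, mirroring how the three homotopies are joined at their shared endpoints.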

Part II

Smooth Methods


Chapter 10

Differentiable Manifolds

This chapter introduces the basic concepts of differential topology: ‘manifold,’ ‘tangent vector,’ ‘smooth map,’ ‘derivative.’ If these concepts are new to you, you will probably be relieved to learn that these are just the basic concepts of multivariate differential calculus, with a critical difference.

In multivariate calculus you are handed a coordinate system, and a geometry, when you walk in the door, and everything is a calculation within that given Euclidean space. But many of the applications of multivariate calculus take place in spaces like the sphere, or the physical universe, whose geometry is not Euclidean. The theory of manifolds provides a language for the concepts of differential calculus that is in many ways more natural, because it does not presume a Euclidean setting. Roughly, this has two aspects:

• In differential topology spaces that are locally homeomorphic to Euclidean spaces are defined, and we then impose structure that allows us to talk about differentiation of functions between such spaces. The concepts of interest to differential topology per se are those that are “invariant under diffeomorphism,” much as topology is sometimes defined as “rubber sheet geometry,” namely the study of those properties of spaces that don’t change when the space is bent or stretched.

• The second step is to impose local notions of angle and distance at each point of a manifold. With this additional structure the entire range of geometric issues can be addressed. This vast subject is called differential geometry.

For us differential topology will be primarily a tool that we will use to set up an environment in which issues related to fixed points have a particularly simple and tractable structure. We will only scratch its surface, and differential geometry will not figure in our work at all.

The aim of this chapter is to provide only as much information as we will need later, in the simplest and most concrete manner possible. Thus our treatment of the subject is in various ways terse and incomplete, even as an introduction to this topic, which has had an important influence on economic theory. Milnor (1965) and Guillemin and Pollack (1974) are recommended to those who would like to learn a bit more, and at a somewhat higher level Hirsch (1976) is more comprehensive, but still quite accessible.


10.1 Review of Multivariate Calculus

We begin with a quick review of the most important facts of multivariate differential calculus. Let $f : U \to \mathbb{R}^n$ be a function where $U \subset \mathbb{R}^m$ is open. Recall that if $r \ge 1$ is an integer, we say that $f$ is $C^r$ if all partial derivatives of order $\le r$ are defined and continuous. For reasons that will become evident in the next paragraph, it can be useful to extend this notation to include $r = 0$, with $C^0$ interpreted as a synonym for “continuous.” We say that $f$ is $C^\infty$ if it is $C^r$ for all finite $r$. An order of differentiability is either a nonnegative integer $r$ or $\infty$, and we write $2 \le r \le \infty$, for example, to indicate that $r$ is such an object, within the given bounds.

If $f$ is $C^1$, then $f$ is differentiable: for each $x \in U$ and $\varepsilon > 0$ there is $\delta > 0$ such that

$$\|f(x') - f(x) - Df(x)(x' - x)\| \le \varepsilon\|x' - x\|$$

for all $x' \in U$ with $\|x' - x\| < \delta$, where the derivative of $f$ at $x$ is the linear function

$$Df(x) : \mathbb{R}^m \to \mathbb{R}^n$$

given by the matrix of first partial derivatives at $x$. If $f$ is $C^r$, then the function

$$Df : U \to L(\mathbb{R}^m, \mathbb{R}^n)$$

is $C^{r-1}$ if we identify $L(\mathbb{R}^m, \mathbb{R}^n)$ with the space $\mathbb{R}^{n \times m}$ of $n \times m$ matrices. The reader is expected to know the standard facts of elementary calculus, especially that addition and multiplication are $C^\infty$, so that functions built up from these operations (e.g., linear functions and matrix multiplication) are known to be $C^\infty$.

There are three basic operations used to construct new $C^r$ functions from given functions. The first is restriction of the function to an open subset of its domain, which requires no comment because the derivative is unaffected. The second is forming the cartesian product of two functions: if $f_1 : U \to \mathbb{R}^{n_1}$ and $f_2 : U \to \mathbb{R}^{n_2}$ are functions, we define $f_1 \times f_2 : U \to \mathbb{R}^{n_1 + n_2}$ to be the function $x \mapsto (f_1(x), f_2(x))$. Evidently $f_1 \times f_2$ is $C^r$ if and only if $f_1$ and $f_2$ are $C^r$, and when this is the case we have

$$D(f_1 \times f_2) = Df_1 \times Df_2.$$

The third operation is composition. The most important theorem of multivariate calculus is the chain rule: if $U \subset \mathbb{R}^m$ and $V \subset \mathbb{R}^n$ are open and $f : U \to V$ and $g : V \to \mathbb{R}^p$ are $C^1$, then $g \circ f$ is $C^1$ and

$$D(g \circ f)(x) = Dg(f(x)) \circ Df(x)$$

for all $x \in U$. Of course the composition of two $C^0$ functions is $C^0$. Arguing inductively, suppose we have already shown that the composition of two $C^{r-1}$ functions is $C^{r-1}$. If $f$ and $g$ are $C^r$, then $Dg \circ f$ is $C^{r-1}$, and we can apply the result above about cartesian products, then the chain rule, to the composition

$$x \mapsto (Dg(f(x)), Df(x)) \mapsto Dg(f(x)) \circ Df(x)$$

to show that $D(g \circ f)$ is $C^{r-1}$, so that $g \circ f$ is $C^r$.
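As a sanity check on the chain rule, one can compare the matrix product $Dg(f(x))\,Df(x)$ with a finite-difference approximation of $D(g \circ f)(x)$. The sketch below uses only the standard library; the particular maps `f` and `g`, the helper names, and the step size are arbitrary choices made for illustration.

```python
import math

def f(x):                       # f : R^2 -> R^2
    return [x[0] * x[1], math.sin(x[0])]

def Df(x):                      # 2x2 matrix of first partial derivatives of f
    return [[x[1], x[0]], [math.cos(x[0]), 0.0]]

def g(y):                       # g : R^2 -> R^1
    return [y[0] ** 2 + 3.0 * y[1]]

def Dg(y):                      # 1x2 matrix of first partial derivatives of g
    return [[2.0 * y[0], 3.0]]

def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def numerical_jacobian(h, x, eps=1e-6):
    """Central-difference approximation of the derivative of h at x."""
    cols = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        cols.append([(p - m) / (2 * eps) for p, m in zip(h(xp), h(xm))])
    # cols[i][k] approximates d h_k / d x_i; transpose so rows index outputs.
    return [list(row) for row in zip(*cols)]

x0 = [0.7, -1.2]
chain = matmul(Dg(f(x0)), Df(x0))                     # Dg(f(x)) composed with Df(x)
approx = numerical_jacobian(lambda x: g(f(x)), x0)    # direct estimate of D(g o f)(x)
```

The two $1 \times 2$ matrices agree up to the truncation error of the finite-difference scheme.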


Often the domain and range of the pertinent functions are presented to us as vector spaces without a given or preferred coordinate system, so it is important to observe that we can use the chain rule to achieve definitions that are independent of the coordinate systems. Let $X$ and $Y$ be $m$- and $n$-dimensional vector spaces. (In this chapter all vector spaces are finite dimensional, with $\mathbb{R}$ as the field of scalars.) Let $c : X \to \mathbb{R}^m$ and $d : Y \to \mathbb{R}^n$ be linear isomorphisms. If $U \subset X$ is open, we can say that a function $f : U \to Y$ is $C^r$, by definition, if $d \circ f \circ c^{-1} : c(U) \to \mathbb{R}^n$ is $C^r$, and if this is the case and $x \in U$, then we can define the derivative of $f$ at $x$ to be

$$Df(x) = d^{-1} \circ D(d \circ f \circ c^{-1})(c(x)) \circ c \in L(X, Y).$$

Using the chain rule, one can easily verify that these definitions do not depend on the choice of $c$ and $d$. In addition, the chain rule given above can be used to show that this “coordinate free” definition also satisfies a chain rule. Let $Z$ be a third $p$-dimensional vector space. Then if $V \subset Y$ is open, $g : V \to Z$ is $C^r$, and $f(U) \subset V$, then $g \circ f$ is $C^r$ and $D(g \circ f) = Dg \circ Df$.

Sometimes we will deal with functions whose domains are not open, and we need to define what it means for such a function to be $C^r$. Let $S$ be a subset of $X$ of any sort whatsoever. If $Y$ is another vector space and $f : S \to Y$ is a function, then $f$ is $C^r$ by definition if there is an open $U \subset X$ containing $S$ and a $C^r$ function $F : U \to Y$ such that $f = F|_S$. Evidently being $C^r$ isn’t the same thing as having a well defined derivative at each point in the domain!

Note that the identity function on $S$ is always $C^r$, and the chain rule implies that compositions of $C^r$ functions are $C^r$. Those who are familiar with the category concept will recognize that there is a category of subsets of finite dimensional vector spaces and $C^r$ maps between them. (If you haven’t heard of categories it would certainly be a good idea to learn a bit about them, but what happens later won’t depend on this language.)

We now state coordinate free versions of the inverse and implicit function theorems. Since you are expected to know the usual, coordinate dependent, formulations of these results, and it is obvious that these imply the statements below, we give no proofs.

Theorem 10.1.1 (Inverse Function Theorem). If $n = m$ (that is, $X$ and $Y$ are both $m$-dimensional), $U \subset X$ is open, $f : U \to Y$ is $C^r$, $x \in U$, and $Df(x)$ is nonsingular, then there is an open $V \subset U$ containing $x$ such that $f|_V$ is injective, $f(V)$ is open in $Y$, and $(f|_V)^{-1}$ is $C^r$.

Suppose that $U \subset X \times Y$ is open and $f : U \to Z$ is a function. If $f$ is $C^1$, then, at a point $(x, y) \in U$, we can define “partial derivatives” $D_xf(x, y) \in L(X, Z)$ and $D_yf(x, y) \in L(Y, Z)$ to be the derivatives of the functions

$$f(\cdot, y) : \{\, x \in X : (x, y) \in U \,\} \to Z \quad\text{and}\quad f(x, \cdot) : \{\, y \in Y : (x, y) \in U \,\} \to Z$$

at $x$ and $y$ respectively.

Theorem 10.1.2 (Implicit Function Theorem). Suppose that $p = n$. (That is, $Y$ and $Z$ have the same dimension.) If $U \subset X \times Y$ is open, $f : U \to Z$ is $C^r$, $(x_0, y_0) \in U$, $f(x_0, y_0) = z_0$, and $D_y f(x_0, y_0)$ is nonsingular, then there is an open $V \subset X$ containing $x_0$, an open $W \subset U$ containing $(x_0, y_0)$, and a $C^r$ function $g : V \to Y$ such that $g(x_0) = y_0$ and

$$\{\, (x, g(x)) : x \in V \,\} = \{\, (x, y) \in W : f(x, y) = z_0 \,\}.$$

In addition

$$Dg(x_0) = -D_y f(x_0, y_0)^{-1} \circ D_x f(x_0, y_0).$$
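As a numerical sanity check of the derivative formula in the implicit function theorem, the following sketch uses the hypothetical example $f(x, y) = x^2 + y^2$ with $z_0 = 1$ (the unit circle), where the implicit function near a point with $y_0 > 0$ is known explicitly. The function names and the finite difference step are assumptions made for illustration only.

```python
import math

def f(x, y):
    """f(x, y) = x^2 + y^2, so the level set f = 1 is the unit circle."""
    return x * x + y * y

def g(x):
    """Explicit solution of f(x, g(x)) = 1 near a point (x0, y0) with y0 > 0."""
    return math.sqrt(1.0 - x * x)

def dg_from_theorem(x0, y0):
    """Dg(x0) = -(D_y f)^{-1} o D_x f; here both partials are 1x1 matrices."""
    dx_f = 2.0 * x0
    dy_f = 2.0 * y0
    return -dx_f / dy_f

def dg_numeric(x0, h=1e-6):
    """Central finite difference approximation of g'(x0)."""
    return (g(x0 + h) - g(x0 - h)) / (2.0 * h)

x0 = 0.6
y0 = g(x0)
print(dg_from_theorem(x0, y0), dg_numeric(x0))
```

The two printed numbers should agree up to finite difference error; the formula needs only the partial derivatives of $f$ at $(x_0, y_0)$, not an explicit expression for $g$.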

We will sometimes encounter settings in which the decomposition of the domain into a cartesian product is not given. Suppose that $T$ is a fourth vector space, $U \subset T$ is open, $t_0 \in U$, $f : U \to Z$ is $C^r$, and $Df(t_0) : T \to Z$ is surjective. Let $Y$ be a linear subspace of $T$ of the same dimension as $Z$ such that $Df(t_0)|_Y$ is surjective, and let $X$ be a complementary linear subspace: $X \cap Y = \{0\}$ and $X + Y = T$. If we identify $T$ with $X \times Y$, then the assumptions of the result above hold. We will understand the implicit function theorem as extending in the obvious way to this setting.

10.2 Smooth Partitions of Unity

A common problem in differentiable topology is the passage from local to global. That is, one is given or can prove the existence of objects that are defined locally in a neighborhood of each point, and one wishes to construct a global object with the same properties. A common and simple method of doing so is to take convex combinations, where the weights in the convex combination vary smoothly. This section develops the technology underlying this sort of argument, then develops some illustrative and useful applications.

Fix a finite dimensional vector space X .

Definition 10.2.1. Suppose that $\{U_\alpha\}_{\alpha \in A}$ is a collection of open subsets of $X$, $U = \bigcup_\alpha U_\alpha$, and $0 \le r \le \infty$. A $C^r$ partition of unity for $U$ subordinate to $\{U_\alpha\}$ is a collection $\{\phi_\beta : X \to [0, 1]\}_{\beta \in B}$ of $C^r$ functions such that:

(a) for each $\beta$ the closure of $V_\beta = \{\, x \in X : \phi_\beta(x) > 0 \,\}$ is contained in some $U_\alpha$;

(b) $\{V_\beta\}$ is locally finite (as a cover of $U$);

(c) $\sum_\beta \phi_\beta(x) = 1$ for each $x \in U$.

The first order of business is to show that such partitions of unity exist. The key idea is the following ingenious construction.

Lemma 10.2.2. There is a $C^\infty$ function $\gamma : \mathbb{R} \to \mathbb{R}$ with $\gamma(t) = 0$ for all $t \le 0$ and $\gamma(t) > 0$ for all $t > 0$.

Proof. Let

$$\gamma(t) := \begin{cases} 0, & t \le 0, \\ e^{-1/t}, & t > 0. \end{cases}$$


Standard facts of elementary calculus can be combined inductively to show that for each $r \ge 1$ there is a polynomial $P_r$ such that $\gamma^{(r)}(t) = P_r(1/t)e^{-1/t}$ if $t > 0$. Since the exponential function dominates any polynomial, it follows that $\gamma^{(r)}(t)/t \to 0$ as $t \to 0$, so that each $\gamma^{(r)}$ is differentiable at $0$ with $\gamma^{(r+1)}(0) = 0$. Thus $\gamma$ is $C^\infty$.

Note that for any open rectangle $\prod_{i=1}^m (a_i, b_i) \subset \mathbb{R}^m$ the function

$$x \mapsto \prod_i \gamma(x_i - a_i)\,\gamma(b_i - x_i)$$

is $C^\infty$, positive everywhere in the rectangle, and zero everywhere else.
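The function $\gamma$ and the rectangle bump built from it are easy to evaluate directly. The following is a minimal sketch; the names `gamma` and `rectangle_bump` are chosen here for illustration and do not come from the text.

```python
import math

def gamma(t):
    """gamma(t) = 0 for t <= 0 and exp(-1/t) for t > 0."""
    return math.exp(-1.0 / t) if t > 0 else 0.0

def rectangle_bump(x, a, b):
    """Product over coordinates of gamma(x_i - a_i) * gamma(b_i - x_i).

    x, a, b are sequences of equal length; the result is positive exactly
    when a_i < x_i < b_i for every i, and zero otherwise.
    """
    value = 1.0
    for xi, ai, bi in zip(x, a, b):
        value *= gamma(xi - ai) * gamma(bi - xi)
    return value

# Inside the unit square the bump is positive; on or outside its boundary it vanishes.
print(rectangle_bump((0.5, 0.5), (0.0, 0.0), (1.0, 1.0)))
print(rectangle_bump((1.0, 0.5), (0.0, 0.0), (1.0, 1.0)))
```

The flatness of $\gamma$ at $0$ is visible numerically: $\gamma(t)/t^n$ is already tiny for moderately small $t > 0$, whatever fixed power $n$ one tries.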

Lemma 10.2.3. If $\{U_\alpha\}$ is a collection of open subsets of $\mathbb{R}^m$ and $U = \bigcup_\alpha U_\alpha$, then $U$ has a locally finite (relative to $U$) covering by open rectangles, each of whose closures is contained in some $U_\alpha$.

Proof. For any integer $j \ge 0$ and vector $k = (k_1, \ldots, k_m)$ with integer components let

$$Q_{j,k} = \prod_{i=1}^m \big( (k_i - 1)/2^j,\ (k_i + 1)/2^j \big) \quad\text{and}\quad Q'_{j,k} = \prod_{i=1}^m \big( (k_i - 2)/2^j,\ (k_i + 3)/2^j \big).$$

The cover consists of those $Q_{j,k}$ such that the closure of $Q_{j,k}$ is contained in some $U_\alpha$ and, if $j > 0$, there is no $\alpha$ such that the closure of $Q'_{j,k}$ is contained in $U_\alpha$. Consider a point $x \in U$. The last requirement implies that $x$ has a neighborhood that intersects only finitely many cubes in the collection, which is to say that the collection is locally finite.

For any $j$ the $Q_{j,k}$ cover $\mathbb{R}^m$, so there is some $k$ such that $x \in Q_{j,k}$, and if $j$ is sufficiently large, then the closure of $Q_{j,k}$ is contained in some $U_\alpha$. If $Q_{j,k}$ is not in the collection, then the closure of $Q'_{j,k}$ is contained in some $U_\alpha$. Define $k'$ by letting $k'_i$ be $k_i/2$ or $(k_i + 1)/2$ according to whether $k_i$ is even or odd. Then $Q_{j,k} \subset Q_{j-1,k'} \subset Q'_{j,k}$. Repeating this leads eventually to an element of the collection that contains $x$, so the collection is indeed a cover of $U$.
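To see the selection rule of this proof in action, here is a small sketch in dimension one for the single open set $U = (0, 1)$, an example chosen here and not taken from the text. Exact rational arithmetic avoids boundary ambiguities; all function names are hypothetical.

```python
from fractions import Fraction

def cube(j, k):
    """Q_{j,k} in dimension one: the open interval ((k-1)/2^j, (k+1)/2^j)."""
    return Fraction(k - 1, 2 ** j), Fraction(k + 1, 2 ** j)

def big_cube(j, k):
    """Q'_{j,k} in dimension one: the open interval ((k-2)/2^j, (k+3)/2^j)."""
    return Fraction(k - 2, 2 ** j), Fraction(k + 3, 2 ** j)

def closure_inside_unit_interval(interval):
    """True if the closed interval [lo, hi] is contained in U = (0, 1)."""
    lo, hi = interval
    return lo > 0 and hi < 1

def selected(j, k):
    """Selection rule of the proof, specialized to the single open set (0, 1)."""
    if not closure_inside_unit_interval(cube(j, k)):
        return False
    return j == 0 or not closure_inside_unit_interval(big_cube(j, k))

def cover(max_level):
    """All selected cubes Q_{j,k} with 0 <= j <= max_level, as (j, k) pairs."""
    return [(j, k)
            for j in range(max_level + 1)
            for k in range(0, 2 ** j + 1)
            if selected(j, k)]

def cubes_containing(x, cubes):
    """The selected cubes that contain the point x."""
    return [(j, k) for (j, k) in cubes if cube(j, k)[0] < x < cube(j, k)[1]]

cubes = cover(12)
print(cubes[:4])
print(cubes_containing(Fraction(7, 10), cubes))
```

Truncating at level 12 only loses points very close to the endpoints $0$ and $1$; cubes far from the boundary are rejected at fine levels because their enlargements $Q'_{j,k}$ still fit inside $U$, which is what keeps the family locally finite.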

Imposing a coordinate system on $X$, then combining the observations above, proves that:

Theorem 10.2.4. For any collection $\{U_\alpha\}_{\alpha \in A}$ of open subsets of $X$ there is a $C^\infty$ partition of unity for $\bigcup_\alpha U_\alpha$ subordinate to $\{U_\alpha\}$.
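A concrete one dimensional instance may help. The sketch below assumes the cover of $\mathbb{R}$ by the intervals $U_n = (n - 1, n + 1)$, $n$ an integer, and uses bumps supported on the smaller intervals $(n - 0.75, n + 0.75)$, whose closures lie inside $U_n$. The cover, the radius $0.75$, and the function names are illustrative assumptions, not part of the text.

```python
import math

def gamma(t):
    """The smooth function of Lemma 10.2.2."""
    return math.exp(-1.0 / t) if t > 0 else 0.0

def bump(x, center, radius=0.75):
    """Smooth function positive exactly on (center - radius, center + radius)."""
    return gamma(x - (center - radius)) * gamma((center + radius) - x)

def partition_of_unity(x):
    """Return {n: phi_n(x)} for the finitely many integers n with phi_n(x) > 0.

    phi_n = bump_n / (sum of all bumps); only integers near x can contribute,
    which is the local finiteness used to make the sum well defined.
    """
    base = math.floor(x)
    raw = {n: bump(x, n) for n in range(base - 1, base + 3)}
    total = sum(raw.values())
    return {n: value / total for n, value in raw.items() if value > 0}

print(partition_of_unity(0.5))
print(partition_of_unity(1.25))
```

Dividing by the locally finite sum of the bumps is the standard normalization step: each quotient is smooth because the denominator is smooth and strictly positive on the covered set.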

For future reference we mention a special case that comes up frequently:

Corollary 10.2.5. If $U \subset X$ is open and $C_0$ and $C_1$ are disjoint closed subsets of $U$, then there is a $C^\infty$ function $\alpha : U \to [0, 1]$ with $\alpha(x) = 0$ for all $x \in C_0$ and $\alpha(x) = 1$ for all $x \in C_1$.

Proof. Let {ϕ0, ϕ1} be a C∞ partition of unity subordinate to the open cover {U \C1, U \ C0}, and set α = ϕ1.
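In the simplest case $U = \mathbb{R}$, $C_0 = (-\infty, a]$ and $C_1 = [b, \infty)$ with $a < b$, a separating function can be written down explicitly using $\gamma$. This is a standard alternative construction, not the partition of unity argument of the proof; the name `smooth_step` is hypothetical.

```python
import math

def gamma(t):
    """The smooth function of Lemma 10.2.2."""
    return math.exp(-1.0 / t) if t > 0 else 0.0

def smooth_step(x, a, b):
    """C-infinity function equal to 0 for x <= a and to 1 for x >= b (requires a < b).

    The denominator never vanishes: since a < b, at least one of
    x - a > 0 and b - x > 0 holds for every real x.
    """
    up = gamma(x - a)
    down = gamma(b - x)
    return up / (up + down)

for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(x, smooth_step(x, 0.0, 1.0))
```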


Now let $Y$ be a second vector space. As a first application we consider a problem that arises in connection with the definition in the last section of what it means for a function $f : S \to Y$ on a general domain $S \subset X$ to be $C^r$. We say that $f$ is locally $C^r$ if each $x \in S$ has a neighborhood $U_x \subset X$ that is the domain of a $C^r$ function $F_x : U_x \to Y$ with $F_x|_{S \cap U_x} = f|_{S \cap U_x}$. This seems like the “conceptually correct” definition of what it means for a function to be $C^r$, because this should be a local property that can be checked by looking at a neighborhood of an arbitrary point in the function's domain. A $C^r$ function is locally $C^r$, obviously. Fortunately the converse holds, so that the definition we have given agrees with the one that is conceptually correct. (In addition, it will often be pleasant to apply the given definition because it is simpler!)

Proposition 10.2.6. If S ⊂ X and f : S → Y is locally Cr, then f is Cr.

Proof. Let $\{F_x : U_x \to Y\}_{x \in S}$ be as above. Let $\{\phi_\beta\}_{\beta \in B}$ be a $C^\infty$ partition of unity for $U = \bigcup_x U_x$ subordinate to $\{U_x\}$. For each $\beta$ choose an $x_\beta$ such that the closure of $\{\, x : \phi_\beta(x) > 0 \,\}$ is contained in $U_{x_\beta}$, and let $F := \sum_\beta \phi_\beta \cdot F_{x_\beta} : U \to Y$. Then $F$ is $C^r$ because each point in $U$ has a neighborhood in which it is a finite sum of $C^r$ functions. For $x \in S$ we have

$$F(x) = \sum_\beta \phi_\beta(x) \cdot F_{x_\beta}(x) = \sum_\beta \phi_\beta(x) \cdot f(x) = f(x).$$

Here is another useful result applying a partition of unity.

Proposition 10.2.7. For any S ⊂ X, C∞(S, Y ) is dense in CS(S, Y ).

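Proof. Fix a continuous $f : S \to Y$ and an open $W \subset S \times Y$ containing the graph of $f$. Our goal is to find a $C^\infty$ function from $S$ to $Y$ whose graph is also contained in $W$.

For each $p \in S$ choose a neighborhood $U_p$ of $p$ and $\varepsilon_p > 0$ small enough that

$$f(U_p \cap S) \subset \mathbf{U}_{\varepsilon_p}(f(p)) \quad\text{and}\quad (U_p \cap S) \times \mathbf{U}_{2\varepsilon_p}(f(p)) \subset W.$$

Let $U = \bigcup_{p \in S} U_p$. Let $\{\phi_\beta\}_{\beta \in B}$ be a $C^\infty$ partition of unity for $U$ subordinate to $\{U_p\}_{p \in S}$. For each $\beta$ let $V_\beta = \{\, x : \phi_\beta(x) > 0 \,\}$, choose some $p_\beta$ such that $V_\beta \subset U_{p_\beta}$, and let $U_\beta = U_{p_\beta}$ and $\varepsilon_\beta = \varepsilon_{p_\beta}$. Let $\tilde f : U \to Y$ be the function $x \mapsto \sum_\beta \phi_\beta(x) \cdot f(p_\beta)$. Since $\{V_\beta\}$ is locally finite, $\tilde f : U \to Y$ is $C^\infty$, so $\tilde f|_S$ is $C^\infty$.

We still need to show that the graph of $\tilde f|_S$ is contained in $W$. Consider some $p \in S$. Of those $\beta$ with $\phi_\beta(p) > 0$, let $\alpha$ be one of those for which $\varepsilon_\beta$ is maximal. Of course $p \in U_{p_\alpha}$, and $\tilde f(p) \in \mathbf{U}_{2\varepsilon_\alpha}(f(p_\alpha))$ because $\tilde f(p)$ is a convex combination of the points $f(p_\beta)$ with $\phi_\beta(p) > 0$, and for any other $\beta$ such that $\phi_\beta(p) > 0$ we have

$$\|f(p_\beta) - f(p_\alpha)\| \le \|f(p_\beta) - f(p)\| + \|f(p) - f(p_\alpha)\| < 2\varepsilon_\alpha.$$

Therefore $(p, \tilde f(p)) \in (U_{p_\alpha} \cap S) \times \mathbf{U}_{2\varepsilon_\alpha}(f(p_\alpha)) \subset W$.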
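The approximating function of this proof, a partition of unity weighted average of sampled values $f(p_\beta)$, can be tried out numerically. The sketch below assumes $S = X = \mathbb{R}$, sample points $p_n = nh$ on a grid of spacing $h$, and bumps supported within distance $0.75h$ of each grid point; these choices and the function names are illustrative, not from the text.

```python
import math

def gamma(t):
    """The smooth function of Lemma 10.2.2."""
    return math.exp(-1.0 / t) if t > 0 else 0.0

def smooth_approximation(f, x, h):
    """Evaluate sum_n phi_n(x) * f(n*h) at the point x.

    phi_n is the normalized bump supported on (n*h - 0.75*h, n*h + 0.75*h).
    Working in the rescaled variable u = x / h keeps the bump values away
    from floating point underflow for small h.
    """
    u = x / h
    base = math.floor(u)
    numerator = 0.0
    denominator = 0.0
    for n in range(base - 1, base + 3):
        weight = gamma(u - (n - 0.75)) * gamma((n + 0.75) - u)
        numerator += weight * f(n * h)
        denominator += weight
    return numerator / denominator

# |x| is continuous but not differentiable at 0; its smooth approximation
# stays within 0.75*h of it because |x| is 1-Lipschitz.
h = 0.01
print(smooth_approximation(abs, 0.0, h), smooth_approximation(abs, 0.0037, h))
```

Only sample points within $0.75h$ of $x$ receive positive weight, so the error at $x$ is controlled by how much $f$ varies over that small set, which mirrors the role of $\varepsilon_p$ in the proof.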


10.3 Manifolds

The maneuver we saw in Section 10.1 (passing from a calculus of functions between Euclidean spaces to a calculus of functions between vector spaces) was accomplished not by fully “eliminating” the coordinate systems of the domain and range, but instead by showing that the “real” meaning of the derivative would not change if we replaced those coordinate systems by any others. The definition of a $C^r$ manifold, and of a $C^r$ function between such manifolds, is a more radical and far reaching application of this idea.

A manifold is an object like the sphere, the torus, and so forth, that “looks like” a Euclidean space in a neighborhood of any point, but which may have different sorts of large scale structure. We first of all need to specify what “looks like” means, and this will depend on a degree of differentiability. Fix an $m$-dimensional vector space $X$, an open $U \subset X$, and a degree of differentiability $0 \le r \le \infty$.

Recall that if $A$ and $B$ are topological spaces, a function $e : A \to B$ is an embedding if it is continuous and injective, and its inverse is continuous when $e(A)$ has the subspace topology. Concretely, $e$ maps open sets of $A$ to open subsets of $e(A)$. Note that the restriction of an embedding to any open subset of the domain is also an embedding.

Lemma 10.3.1. If $U \subset X$ is open and $\phi : U \to \mathbb{R}^k$ is a $C^r$ embedding such that for all $x \in U$ the rank of $D\phi(x)$ is $m$, then $\phi^{-1}$ is a $C^r$ function.

Proof. By Proposition 10.2.6 it suffices to show that $\phi^{-1}$ is locally $C^r$. Fix a point $p$ in the image of $\phi$, let $x = \phi^{-1}(p)$, let $X'$ be the image of $D\phi(x)$, and let $\pi : \mathbb{R}^k \to X'$ be the orthogonal projection. Since $D\phi(x)$ has rank $m$, $X'$ is $m$-dimensional, and the rank of $D(\pi \circ \phi)(x) = \pi \circ D\phi(x)$ is $m$. The inverse function theorem implies that the restriction of $\pi \circ \phi$ to some open subset $\tilde U$ of $U$ containing $x$ has a $C^r$ inverse. Now the chain rule implies that $\phi^{-1}|_{\phi(\tilde U)} = (\pi \circ \phi|_{\tilde U})^{-1} \circ \pi|_{\phi(\tilde U)}$ is $C^r$.

Definition 10.3.2. A set $M \subset \mathbb{R}^k$ is an $m$-dimensional $C^r$ manifold if, for each $p \in M$, there is a $C^r$ embedding $\phi : U \to M$, where $U$ is an open subset of an $m$-dimensional vector space, such that for all $x \in U$ the rank of $D\phi(x)$ is $m$ and $\phi(U)$ is a relatively open subset of $M$ that contains $p$. We say that $\phi$ is a $C^r$ parameterization for $M$ and $\phi^{-1}$ is a $C^r$ coordinate chart for $M$. A collection $\{\phi_i\}_{i \in I}$ of $C^r$ parameterizations for $M$ whose images cover $M$ is called a $C^r$ atlas for $M$.

Although the definition above makes sense when $r = 0$, we will have no use for this case because there are certain pathologies that we wish to avoid. Among other things, the beautiful example known as the Alexander horned sphere (Alexander (1924)) shows that a $C^0$ manifold may have what is known as a wild embedding in a Euclidean space. From this point on we assume that $r \ge 1$.

There are many “obvious” examples of $C^r$ manifolds such as spheres, the torus, etc. In analytic work one should bear in mind the most basic examples:

(i) A set $S \subset \mathbb{R}^k$ is discrete if each $p \in S$ has a neighborhood $W$ such that $S \cap W = \{p\}$. A discrete set is a $0$-dimensional $C^r$ manifold.

(ii) Any open subset (including the empty set) of an $m$-dimensional affine subspace of $\mathbb{R}^k$ is an $m$-dimensional $C^r$ manifold. More generally, an open subset of an $m$-dimensional $C^r$ manifold is itself an $m$-dimensional $C^r$ manifold.

(iii) If $U \subset \mathbb{R}^m$ is open and $\varphi : U \to \mathbb{R}^{k-m}$ is $C^r$, then the graph

$$\mathrm{Gr}(\varphi) := \{\, (x, \varphi(x)) : x \in U \,\} \subset \mathbb{R}^k$$

of $\varphi$ is an $m$-dimensional $C^r$ manifold, because $\phi : x \mapsto (x, \varphi(x))$ is a $C^r$ parameterization.

10.4 Smooth Maps

Let $M \subset \mathbb{R}^k$ be an $m$-dimensional $C^r$ manifold, and let $N \subset \mathbb{R}^\ell$ be an $n$-dimensional $C^r$ manifold. We have already defined what it means for a function $f : M \to N$ to be $C^r$: there is an open $W \subset \mathbb{R}^k$ that contains $M$ and a $C^r$ function $F : W \to \mathbb{R}^\ell$ such that $F|_M = f$. The following characterization of this condition is technically useful and conceptually important.

Proposition 10.4.1. For a function f :M → N the following are equivalent:

(a) f is Cr;

(b) for each $p \in M$ there are $C^r$ parameterizations $\phi : U \to M$ and $\psi : V \to N$ such that $p \in \phi(U)$, $f(\phi(U)) \subset \psi(V)$, and $\psi^{-1} \circ f \circ \phi$ is a $C^r$ function;

(c) $\psi^{-1} \circ f \circ \phi$ is a $C^r$ function whenever $\phi : U \to M$ and $\psi : V \to N$ are $C^r$ parameterizations such that $f(\phi(U)) \subset \psi(V)$.

Proof. Because compositions of $C^r$ functions are $C^r$, (a) implies (c), and since each point in a manifold is contained in the image of a $C^r$ parameterization, it is clear that (c) implies (b). Fix a point $p \in M$ and $C^r$ parameterizations $\phi : U \to M$ and $\psi : V \to N$ with $p \in \phi(U)$ and $f(\phi(U)) \subset \psi(V)$ such that $\psi^{-1} \circ f \circ \phi$ is $C^r$. Lemma 10.3.1 implies that $\phi^{-1}$ and $\psi^{-1}$ are $C^r$, so $f|_{\phi(U)} = \psi \circ (\psi^{-1} \circ f \circ \phi) \circ \phi^{-1}$ is $C^r$ on its domain of definition. Since $p$ was arbitrary, we have shown that $f$ is locally $C^r$, and Proposition 10.2.6 implies that $f$ is $C^r$. Thus (b) implies (a).

There is a more abstract approach to differential topology (which is followed in Hirsch (1976)) in which an $m$-dimensional $C^r$ manifold is a topological space $M$ together with a collection $\{\phi_\alpha : U_\alpha \to M\}_{\alpha \in A}$, where each $\phi_\alpha$ is a homeomorphism between an open subset $U_\alpha$ of an $m$-dimensional vector space and an open subset of $M$, $\bigcup_\alpha \phi_\alpha(U_\alpha) = M$, and for any $\alpha, \alpha' \in A$, $\phi_{\alpha'}^{-1} \circ \phi_\alpha$ is $C^r$ on its domain of definition. If $N$ with collection $\{\psi_\beta : V_\beta \to N\}$ is an $n$-dimensional $C^r$ manifold, a function $f : M \to N$ is $C^r$ by definition if, for all $\alpha$ and $\beta$, $\psi_\beta^{-1} \circ f \circ \phi_\alpha$ is a $C^r$ function on its domain of definition.

The abstract approach is preferable from a conceptual point of view; for example, we can't see some $\mathbb{R}^k$ that contains the physical universe, so our physical theories should avoid reference to such an $\mathbb{R}^k$ if possible. (Sometimes $\mathbb{R}^k$ is called the ambient space.) However, in the abstract approach there are certain technical difficulties that must be overcome just to get acceptable definitions. In addition, the Whitney embedding theorems (cf. Hirsch (1976)) show that, under assumptions that are satisfied in almost all applications, a manifold satisfying the abstract definition can be embedded in some $\mathbb{R}^k$, so our approach is not less general in any important sense. From a technical point of view, the assumed embedding of $M$ in $\mathbb{R}^k$ is extremely useful because it automatically imposes conditions such as metrizability and thus paracompactness, and it allows certain constructions that simplify many proofs.

There is a category of $C^r$ manifolds and $C^r$ maps between them. (This can be proved from the definitions, or we can just observe that this category can be obtained from the category of subsets of finite dimensional vector spaces and $C^r$ maps between them by restricting the objects.) The notion of isomorphism for this category is:

Definition 10.4.2. A function $f : M \to N$ is a $C^r$-diffeomorphism if $f$ is a bijection and $f$ and $f^{-1}$ are both $C^r$. If such an $f$ exists we say that $M$ and $N$ are $C^r$ diffeomorphic.

If $M$ and $N$ are $C^r$ diffeomorphic we will, for the most part, regard them as two different “realizations” of “the same” object. In this sense the spirit of the definition of a $C^r$ manifold is that the particular embedding of $M$ in $\mathbb{R}^k$ is of no importance, and $k$ itself is immaterial.

10.5 Tangent Vectors and Derivatives

There are many notions of “derivative” in mathematics, but invariably the term refers to a linear approximation of a function that is accurate “up to first order.” The first step in defining the derivative of a $C^r$ map between manifolds is to specify the vector spaces that serve as the linear approximation's domain and range.

Fix an $m$-dimensional $C^r$ manifold $M \subset \mathbb{R}^k$. Throughout this section, when we refer to a $C^r$ parameterization $\phi : U \to M$, it will be understood that $U$ is an open subset of the $m$-dimensional vector space $X$.

Definition 10.5.1. If $\phi : U \to M$ is a $C^1$ parameterization and $p = \phi(x)$, then the tangent space of $M$ at $p$, denoted $T_pM$, is the image of the linear transformation $D\phi(x) : X \to \mathbb{R}^k$.

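We should check that this does not depend on the choice of $\phi$. If $\phi' : U' \to M$ is a second $C^1$ parameterization with $\phi'(x') = p$, then the chain rule gives $D\phi'(x') = D\phi(x) \circ D(\phi^{-1} \circ \phi')(x')$, so the image of $D\phi'(x')$ is contained in the image of $D\phi(x)$. Exchanging the roles of $\phi$ and $\phi'$ gives the reverse inclusion, so the two images coincide.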
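The independence of the tangent space from the parameterization can be checked numerically on the unit sphere in $\mathbb{R}^3$. The sketch below compares a graph parameterization with a spherical coordinate parameterization at the same point; both parameterizations, the base point, and the finite difference step are illustrative assumptions. Two planes in $\mathbb{R}^3$ coincide exactly when their normal vectors are parallel, which is what the final cross product tests.

```python
import math

def graph_param(u, v):
    """Graph parameterization of the open upper unit hemisphere."""
    return (u, v, math.sqrt(1.0 - u * u - v * v))

def spherical_param(theta, lam):
    """Spherical coordinates: theta is the polar angle, lam the longitude."""
    return (math.sin(theta) * math.cos(lam),
            math.sin(theta) * math.sin(lam),
            math.cos(theta))

def jacobian_columns(param, a, b, h=1e-6):
    """Central difference approximations of the two columns of D(param)(a, b)."""
    col_a = [(p - m) / (2 * h) for p, m in zip(param(a + h, b), param(a - h, b))]
    col_b = [(p - m) / (2 * h) for p, m in zip(param(a, b + h), param(a, b - h))]
    return col_a, col_b

def cross(x, y):
    """Cross product in R^3."""
    return (x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0])

def norm(x):
    return math.sqrt(sum(c * c for c in x))

u0, v0 = 0.3, 0.4
p = graph_param(u0, v0)
theta0 = math.acos(p[2])           # coordinates of the same point p
lam0 = math.atan2(p[1], p[0])      # in the spherical parameterization

# The image of each derivative is the plane orthogonal to this normal vector.
normal_graph = cross(*jacobian_columns(graph_param, u0, v0))
normal_spherical = cross(*jacobian_columns(spherical_param, theta0, lam0))
print(norm(cross(normal_graph, normal_spherical)))
```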

We can combine the tangent spaces at the various points of M :

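Definition 10.5.2. The tangent bundle of $M$ is

$$TM := \bigcup_{p \in M} \{p\} \times T_pM \subset \mathbb{R}^k \times \mathbb{R}^k.$$

For a $C^r$ parameterization $\phi : U \to M$ for $M$ we define

$$T\phi : U \times X \to \{\, (p, v) \in TM : p \in \phi(U) \,\} \subset TM$$

by setting $T\phi(x, w) := (\phi(x), D\phi(x)w)$.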

Lemma 10.5.3. If r ≥ 2, then Tϕ is a Cr−1 parameterization for TM .

Proof. It is easy to see that $T\phi$ is a $C^{r-1}$ immersion, and that it is injective. The inverse function theorem implies that its inverse is continuous.

Every $p \in M$ is contained in the image of some $C^r$ parameterization $\phi$, and for every $v \in T_pM$, $(p, v)$ is in the image of $T\phi$, so the images of the $T\phi$ cover $TM$. Thus:

Proposition 10.5.4. If r ≥ 2, then TM is a Cr−1 manifold.

Fix a second $C^r$ manifold $N \subset \mathbb{R}^\ell$, which we assume to be $n$-dimensional, and a $C^r$ function $f : M \to N$.

Definition 10.5.5. If $F$ is a $C^1$ extension of $f$ to a neighborhood of $p$, the derivative of $f$ at $p$ is the linear function

$$Df(p) = DF(p)|_{T_pM} : T_pM \to T_{f(p)}N.$$

We need to show that this definition does not depend on the choice of extension $F$. Let $\phi : U \to M$ be a $C^r$ parameterization whose image is a neighborhood of $p$, let $x = \phi^{-1}(p)$, and observe that, for any $v \in T_pM$, there is some $w \in X$ such that $v = D\phi(x)w$, so that

$$DF(p)v = DF(p)(D\phi(x)w) = D(F \circ \phi)(x)w = D(f \circ \phi)(x)w.$$

We also need to show that the image of $Df(p)$ is, in fact, contained in $T_{f(p)}N$. Let $\psi : V \to N$ be a $C^r$ parameterization of a neighborhood of $f(p)$. The last equation shows that the image of $Df(p)$ is contained in the image of

$$D(f \circ \phi)(x) = D(\psi \circ \psi^{-1} \circ f \circ \phi)(x) = D\psi(\psi^{-1}(f(p))) \circ D(\psi^{-1} \circ f \circ \phi)(x),$$

so the image of $Df(p)$ is contained in the image of $D\psi(\psi^{-1}(f(p)))$, which is $T_{f(p)}N$.

Naturally the chain rule is the most important basic result about the derivative. We expect that many readers have seen the following result, and at worst it is a suitable exercise, following from the chain rule of multivariable calculus without trickery, so we give no proof.

Proposition 10.5.6. If $M \subset \mathbb{R}^k$, $N \subset \mathbb{R}^\ell$, and $P \subset \mathbb{R}^m$ are $C^1$ manifolds, and $f : M \to N$ and $g : N \to P$ are $C^1$ maps, then, at each $p \in M$,

D(g ◦ f)(p) = Dg(f(p)) ◦Df(p).

We can combine the derivatives defined at the various points of M :


Definition 10.5.7. The derivative of f is the function Tf : TM → TN given by

Tf(p, v) := (f(p), Df(p)v).

These objects have the expected properties:

Proposition 10.5.8. If r ≥ 2, then Tf is a Cr−1 function.

Proof. Each $(p, v) \in TM$ is in the image of $T\phi$ for some $C^r$ parameterization $\phi$ whose image contains $p$. The chain rule implies that

$$Tf \circ T\phi : (x, w) \mapsto \big( f(\phi(x)),\ D(f \circ \phi)(x)w \big),$$

which is a $C^{r-1}$ function. We have verified that $Tf$ satisfies (c) of Proposition 10.4.1.

Proposition 10.5.9. T IdM = IdTM .

Proof. Since $\mathrm{Id}_{\mathbb{R}^k}$ is a $C^\infty$ extension of $\mathrm{Id}_M$, we clearly have $D\mathrm{Id}_M(p) = \mathrm{Id}_{T_pM}$ for each $p \in M$. The claim now follows directly from the definition of $T\mathrm{Id}_M$.

Proposition 10.5.10. If $M$, $N$, and $P$ are $C^r$ manifolds and $f : M \to N$ and $g : N \to P$ are $C^r$ functions, then $T(g \circ f) = Tg \circ Tf$.

Proof. Using Proposition 10.5.6 we compute that

Tg(Tf(p, v)) = Tg(f(p), Df(p)v) = (g(f(p)), Dg(f(p))Df(p)v)

= (g(f(p)), D(g ◦ f)(p)v) = T (g ◦ f)(p, v).
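The identity $T(g \circ f) = Tg \circ Tf$ can be spot checked numerically. The sketch below assumes $M = N$ is the unit circle in $\mathbb{R}^2$, $f$ is the squaring map $(x, y) \mapsto (x^2 - y^2, 2xy)$, and $g$ is the first coordinate function into $\mathbb{R}$; derivatives are approximated by central differences of the obvious extensions to $\mathbb{R}^2$. All names and the example maps are hypothetical.

```python
import math

def tangent_map(F, h=1e-6):
    """Return TF: (p, v) -> (F(p), DF(p)v), with DF(p)v a central difference."""
    def TF(p, v):
        plus = F([pi + h * vi for pi, vi in zip(p, v)])
        minus = F([pi - h * vi for pi, vi in zip(p, v)])
        return F(p), [(a - b) / (2 * h) for a, b in zip(plus, minus)]
    return TF

def F(p):
    """Extension to R^2 of the squaring map f on the unit circle."""
    x, y = p
    return [x * x - y * y, 2 * x * y]

def G(p):
    """Extension to R^2 of the first coordinate function g on the circle."""
    x, y = p
    return [x]

def G_after_F(p):
    return G(F(p))

theta = 0.7
p = [math.cos(theta), math.sin(theta)]
v = [-math.sin(theta), math.cos(theta)]            # a vector in T_p M

left = tangent_map(G_after_F)(p, v)                # T(g o f)(p, v)
right = tangent_map(G)(*tangent_map(F)(p, v))      # Tg(Tf(p, v))
print(left, right)
```

In angular coordinates $g \circ f$ is $\theta \mapsto \cos 2\theta$, so both sides should have second component close to $-2\sin 2\theta$.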

For the categorically minded we mention that Proposition 10.5.4 and the last three results can be summarized very succinctly by saying that if $r \ge 2$, then $T$ is a functor from the category of $C^r$ manifolds and $C^r$ maps between them to the category of $C^{r-1}$ manifolds and $C^{r-1}$ maps between them. Again, we will not use this language later, so in a sense you do not need to know what a functor is, but categorical concepts and terminology are pervasive in modern mathematics, so it would certainly be a good idea to learn the basic definitions.

Let's relate the definitions above to more elementary notions of differentiation. Consider a $C^1$ function $f : (a, b) \to M$ and a point $t \in (a, b)$. Formally $Df(t)$ is a linear function from $T_t(a, b)$ to $T_{f(t)}M$, but thinking about things in this way is usually rather cumbersome. Of course $T_t(a, b)$ is just a copy of $\mathbb{R}$, and we define $f'(t) = Df(t)1 \in T_{f(t)}M$, where $1$ is the element of $T_t(a, b)$ corresponding to $1 \in \mathbb{R}$. When $M$ is an open subset of $\mathbb{R}$ we simplify further by treating $f'(t)$ as a number under the identification of $T_{f(t)}M$ with $\mathbb{R}$. In this way we recover the concept of the derivative as we first learned it in elementary calculus.


10.6 Submanifolds

For almost any kind of mathematical object, we pay special attention to subsets, or perhaps “substructures” of other sorts, that share the structural properties of the object. One only has to imagine a smooth curve on the surface of a sphere to see that such substructures of manifolds arise naturally. Fix a degree of differentiability $1 \le r \le \infty$. If $M \subset \mathbb{R}^k$ is an $m$-dimensional $C^r$ manifold, $N$ is an $n$-dimensional $C^r$ manifold that is also embedded in $\mathbb{R}^k$, and $N \subset M$, then $N$ is a $C^r$ submanifold of $M$. The integer $m - n$ is called the codimension of $N$ in $M$.

The reader can certainly imagine a host of examples, so we only mention one that might easily be overlooked because it is so trivial: any open subset of $M$ is a $C^r$ manifold. Conversely, any codimension zero submanifold of $M$ is just an open subset. Evidently submanifolds of codimension zero are not in themselves particularly interesting, but of course they occur frequently.

Submanifolds arise naturally as images of smooth maps, and as solution sets of systems of equations. We now discuss these two points of view at length, arriving eventually at an important characterization result. Let $M \subset \mathbb{R}^k$ and $N \subset \mathbb{R}^\ell$ be $C^r$ manifolds that are $m$- and $n$-dimensional respectively, and let $f : M \to N$ be a $C^r$ function. We say that $p \in M$ is:

(a) an immersion point of f if Df(p) : TpM → Tf(p)N is injective;

(b) a submersion point of f if Df(p) is surjective;

(c) a diffeomorphism point of f if Df(p) is a bijection.
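When $M$ and $N$ are open subsets of Euclidean spaces, $Df(p)$ is just the Jacobian matrix, and the three conditions reduce to rank conditions: rank equal to $m$, rank equal to $n$, or both. The following sketch makes that explicit; the helper names `rank` and `classify`, the tolerance, and the example matrices are assumptions made for illustration.

```python
def rank(matrix, tol=1e-9):
    """Rank of a small dense matrix (list of rows) by Gaussian elimination."""
    rows = [list(map(float, row)) for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break
        # Choose the largest remaining entry in column c as pivot.
        pivot = max(range(r, n_rows), key=lambda i: abs(rows[i][c]))
        if abs(rows[pivot][c]) <= tol:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][c] / rows[r][c]
            for k in range(c, n_cols):
                rows[i][k] -= factor * rows[r][k]
        r += 1
    return r

def classify(jacobian):
    """Labels for a point whose derivative is the given n-by-m Jacobian."""
    n = len(jacobian)        # dimension of the target
    m = len(jacobian[0])     # dimension of the domain
    r = rank(jacobian)
    labels = set()
    if r == m:
        labels.add("immersion")
    if r == n:
        labels.add("submersion")
    if r == m == n:
        labels.add("diffeomorphism")
    return labels

# The curve t -> (t^2, t^3) has Jacobian [[2t], [3t^2]].
print(classify([[2.0], [3.0]]))   # at t = 1
print(classify([[0.0], [0.0]]))   # at t = 0, the cusp
# The function (x, y) -> x^2 - y^2 has Jacobian [[2x, -2y]].
print(classify([[2.0, 0.0]]))     # at (1, 0)
```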

There are now a number of technical results. Collectively their proofs display the inverse function and the implicit function theorem as the linchpins of the analysis supporting this subject.

Proposition 10.6.1. If $p$ is an immersion point of $f$, then there is a neighborhood $V$ of $p$ such that $f(V)$ is an $m$-dimensional $C^r$ submanifold of $N$. In addition $Df(p) : T_pM \to T_{f(p)}f(V)$ is a linear isomorphism.

Proof. Let $\phi : U \to M$ be a $C^r$ parameterization for $M$ whose image contains $p$, and let $x = \phi^{-1}(p)$. The continuity of the derivative implies that there is a neighborhood $U'$ of $x$ such that for all $x' \in U'$ the rank of $D(f \circ \phi)(x')$ is $m$. Let $X \subset \mathbb{R}^\ell$ be the image of $Df(p)$, and let $\pi : \mathbb{R}^\ell \to X$ be the orthogonal projection. Possibly after replacing $U'$ with a suitable smaller neighborhood of $x$, the inverse function theorem implies that $\pi \circ f \circ \phi|_{U'}$ is invertible. Let $V = \phi(U')$. Now $f \circ \phi|_{U'}$ is an embedding because its inverse is $(\pi \circ f \circ \phi|_{U'})^{-1} \circ \pi$. Lemma 10.3.1 implies that this inverse is also $C^r$. Since, for every $x' \in U'$, the rank of $D(f \circ \phi)(x')$ is $m$, $f(V) = f(\phi(U'))$ satisfies Definition 10.3.2.

The final assertion follows from $Df(p)$ being injective while $T_pM$ and $T_{f(p)}f(V)$ are both $m$-dimensional.

Proposition 10.6.2. If $p$ is a submersion point of $f$, then there is a neighborhood $U$ of $p$ such that $f^{-1}(f(p)) \cap U$ is an $(m - n)$-dimensional $C^r$ submanifold of $M$. In addition $T_p f^{-1}(q) = \ker Df(p)$, where $q = f(p)$.


Proof. Let $\phi : U \to M$ be a $C^r$ parameterization whose image is an open neighborhood of $p$, let $w_0 = \phi^{-1}(p)$, and let $\psi : Z \to \mathbb{R}^n$ be a $C^r$ coordinate chart for an open neighborhood $Z \subset N$ of $f(p)$. Without loss of generality we may assume that $f(\phi(U)) \subset Z$. Since $D\phi(w_0)$ and $D\psi(f(p))$ are bijections,

$$D(\psi \circ f \circ \phi)(w_0) = D\psi(f(p)) \circ Df(p) \circ D\phi(w_0)$$

is surjective, and the vector space containing $U$ can be decomposed as $X \times Y$ where $Y$ is $n$-dimensional and $D_y(\psi \circ f \circ \phi)(w_0)$ is nonsingular. Let $w_0 = (x_0, y_0)$. The implicit function theorem gives an open neighborhood $V \subset X$ containing $x_0$, an open $W \subset U$ containing $w_0$, and a $C^r$ function $g : V \to Y$ such that $g(x_0) = y_0$ and

$$\{\, (x, g(x)) : x \in V \,\} = \{\, w \in W : f(\phi(w)) = f(p) \,\}.$$

Then

$$\{\, \phi(x, g(x)) : x \in V \,\} = f^{-1}(f(p)) \cap \phi(W)$$

is a neighborhood of $p$ in $f^{-1}(f(p))$, and $x \mapsto \phi(x, g(x))$ is a $C^r$ embedding because its inverse is the composition of $\phi^{-1}$ with the projection $(x, y) \mapsto x$.

We obviously have $T_p f^{-1}(q) \subset \ker Df(p)$, and the two vector spaces have the same dimension.

Proposition 10.6.3. If $p$ is a diffeomorphism point of $f$, then there is a neighborhood $W$ of $p$ such that $f(W)$ is a neighborhood of $f(p)$ and $f|_W : W \to f(W)$ is a $C^r$ diffeomorphism.

Proof. Let $\phi : U \to M$ be a $C^r$ parameterization of a neighborhood of $p$, let $x = \phi^{-1}(p)$, and let $\psi : V \to N$ be a $C^r$ parameterization of a neighborhood of $f(p)$. Then

$$D(\psi^{-1} \circ f \circ \phi)(x) = D\psi^{-1}(f(p)) \circ Df(p) \circ D\phi(x)$$

is nonsingular, so the inverse function theorem implies that, after replacing $U$ and $V$ with smaller open sets containing $x$ and $\psi^{-1}(f(p))$, $\psi^{-1} \circ f \circ \phi$ is invertible with $C^r$ inverse. Let $W = \phi(U)$. We now have

$$(f|_W)^{-1} = \phi \circ (\psi^{-1} \circ f \circ \phi)^{-1} \circ \psi^{-1},$$

which is $C^r$.

Now let $P$ be a $p$-dimensional $C^r$ submanifold of $N$. The following is the technical basis of the subsequent characterization theorem.

Lemma 10.6.4. If $q \in P$ then:

(a) There is a neighborhood $V \subset P$ of $q$, a $p$-dimensional $C^r$ manifold $M$, a $C^r$ function $f : M \to P$, a $p \in f^{-1}(q)$ that is an immersion point of $f$, and a neighborhood $U$ of $p$, such that $f(U) = V$.

(b) There is a neighborhood $Z \subset N$ of $q$, an $(n - p)$-dimensional $C^r$ manifold $M$, and a $C^r$ function $f : Z \to M$ such that $q$ is a submersion point of $f$ and $f^{-1}(f(q)) = P \cap Z$.


Proof. Let $\phi : U \to P$ be a $C^r$ parameterization for $P$ whose image contains $q$. Taking $f = \phi$ verifies (a).

Let $w = \phi^{-1}(q)$. Let $\psi : V \to N$ be a $C^r$ parameterization for $N$ whose image contains $q$. Then the rank of $D(\psi^{-1} \circ \phi)(w)$ is $p$, so the vector space containing $V$ can be decomposed as $X \times Y$ where $X$ is the image of $D(\psi^{-1} \circ \phi)(w)$. Let $\pi_X : X \times Y \to X$ and $\pi_Y : X \times Y \to Y$ be the projections $(x, y) \mapsto x$ and $(x, y) \mapsto y$ respectively. The inverse function theorem implies that, after replacing $U$ with a smaller neighborhood of $w$, $\pi_X \circ \psi^{-1} \circ \phi$ is a $C^r$ diffeomorphism between $U$ and an open $W \subset X$. Since we can replace $V$ with $V \cap \pi_X^{-1}(W)$, we may assume that $\pi_X(V) \subset W$. Let $Z = \psi(V)$, and let

$$f = \pi_Y \circ \psi^{-1} - \pi_Y \circ \psi^{-1} \circ \phi \circ (\pi_X \circ \psi^{-1} \circ \phi)^{-1} \circ \pi_X \circ \psi^{-1} : Z \to Y.$$

Evidently every point of $V$ is a submersion point of

$$\pi_Y - \pi_Y \circ \psi^{-1} \circ \phi \circ (\pi_X \circ \psi^{-1} \circ \phi)^{-1} \circ \pi_X,$$

so every point of $Z$ is a submersion point of $f$. If $q' \in P \cap Z$, then $q' = \phi(w')$ for some $w' \in U$, so $f(q') = 0$. On the other hand, suppose $f(q') = 0$, and let $q''$ be the image of $q'$ under the map $\phi \circ (\pi_X \circ \psi^{-1} \circ \phi)^{-1} \circ \pi_X \circ \psi^{-1}$. Then $\pi_X(\psi^{-1}(q')) = \pi_X(\psi^{-1}(q''))$ and $\pi_Y(\psi^{-1}(q')) = \pi_Y(\psi^{-1}(q''))$, so $q'' = q'$ and thus $q' \in P$. Thus $f^{-1}(f(q)) = P \cap Z$.

Theorem 10.6.5. Let $N$ be a $C^r$ manifold. For $P \subset N$ the following are equivalent:

(a) $P$ is a $p$-dimensional $C^r$ submanifold of $N$.

(b) For every $q \in P$ there is a relatively open neighborhood $V \subset P$ of $q$, a $p$-dimensional $C^r$ manifold $M$, a $C^r$ function $f : M \to P$, a $p \in f^{-1}(q)$ that is an immersion point of $f$, and a neighborhood $U$ of $p$, such that $f(U) = V$.

(c) For every $q \in P$ there is a neighborhood $Z \subset N$ of $q$, an $(n - p)$-dimensional $C^r$ manifold $M$, and a $C^r$ function $f : Z \to M$ such that $q$ is a submersion point of $f$ and $f^{-1}(f(q)) = P \cap Z$.

Proof. The last result asserts that (a) implies (b) and (c), Proposition 10.6.1 implies that (b) implies (a), and Proposition 10.6.2 implies that (c) implies (a).

Let $M \subset \mathbb{R}^k$ and $N \subset \mathbb{R}^\ell$ be an $m$-dimensional and an $n$-dimensional $C^r$ manifold, and let $f : M \to N$ be a $C^r$ function. We say that $f$ is an immersion if every $p \in M$ is an immersion point of $f$. It is a submersion if every $p \in M$ is a submersion point, and it is a local diffeomorphism if every $p \in M$ is a diffeomorphism point. There are now some important results that derive submanifolds from functions.

Theorem 10.6.6. If $f : M \to N$ is a $C^r$ immersion, and an embedding, then $f(M)$ is an $m$-dimensional $C^r$ submanifold of $N$.

Proof. We need to show that any $q \in f(M)$ has a neighborhood in $f(M)$ that is an $m$-dimensional $C^r$ manifold. Proposition 10.6.1 implies that any $p \in M$ has an open neighborhood $V$ such that $f(V)$ is a $C^r$ $m$-dimensional submanifold of $N$. Since $f$ is an embedding, $f(V)$ is a neighborhood of $f(p)$ in $f(M)$.


A submersion point of $f$ is also said to be a regular point of $f$. If $p$ is not a regular point of $f$, then it is a critical point of $f$. A point $q \in N$ is a critical value of $f$ if some preimage of $q$ is a critical point, and if $q$ is not a critical value, then it is a regular value. Note the following paradoxical aspect of this terminology: if $q$ is not a value of $f$, in the sense that $f^{-1}(q) = \emptyset$, then $q$ is automatically a regular value of $f$.

Theorem 10.6.7 (Regular Value Theorem). If $q$ is a regular value of $f$, then $f^{-1}(q)$ is an $(m - n)$-dimensional submanifold of $M$.

Proof. This is an immediate consequence of Proposition 10.6.2.
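A small numerical illustration, under the assumption $M = \mathbb{R}^2$, $N = \mathbb{R}$ and $f(x, y) = x^2 - y^2$ (an example chosen here, not from the text): the only critical point is the origin, so $1$ is a regular value and the hyperbola $f^{-1}(1)$ is a one dimensional manifold, while $0$ is a critical value and $f^{-1}(0)$ is a pair of crossing lines. The sampling below is a spot check, not a proof.

```python
import math

def f(x, y):
    return x * x - y * y

def grad_f(x, y):
    """The 1x2 Jacobian of f, written as a vector."""
    return (2.0 * x, -2.0 * y)

def is_submersion_point(x, y, tol=1e-12):
    """Df(x, y): R^2 -> R is surjective exactly when the gradient is nonzero."""
    gx, gy = grad_f(x, y)
    return math.hypot(gx, gy) > tol

# Sample points of f^{-1}(1): the right branch of the hyperbola x^2 - y^2 = 1.
level_one = [(math.cosh(s / 10.0), math.sinh(s / 10.0)) for s in range(-20, 21)]
# Sample points of f^{-1}(0) on the line y = x, including the origin.
level_zero = [(s / 10.0, s / 10.0) for s in range(-20, 21)]

# Every sampled preimage of 1 is a regular point.
print(all(is_submersion_point(x, y) for x, y in level_one))
# The preimage of 0 contains the critical point (0, 0).
print(all(is_submersion_point(x, y) for x, y in level_zero))
```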

This result has an important generalization. Let $P \subset N$ be a $p$-dimensional $C^r$ submanifold.

Definition 10.6.8. The function $f$ is transversal to $P$ along $S \subset M$ if, for all $p \in f^{-1}(P) \cap S$,

$$\operatorname{im} Df(p) + T_{f(p)}P = T_{f(p)}N.$$

We write $f \pitchfork_S P$ to indicate that this is the case, and when $S = M$ we simply write $f \pitchfork P$.

Theorem 10.6.9 (Transversality Theorem). If $f \pitchfork P$, then $f^{-1}(P)$ is an $(m - n + p)$-dimensional $C^r$ submanifold of $M$. For each $p \in f^{-1}(P)$, $T_p f^{-1}(P) = Df(p)^{-1}(T_{f(p)}P)$.

Proof. Fix $p \in f^{-1}(P)$. (If $f^{-1}(P) = \emptyset$, then all claims hold trivially.) We use the characterization of a $C^r$ submanifold given by Theorem 10.6.5: since $P$ is a submanifold of $N$, there is a neighborhood $W \subset N$ of $f(p)$ and a $C^r$ function $\Psi : W \to \mathbb{R}^{n-p}$ such that $D\Psi(f(p))$ has rank $n - p$ and $P \cap W = \Psi^{-1}(0)$.

Let $V = f^{-1}(W)$ and $\Phi = \Psi \circ f|_V$. Of course $V$ is open, $\Phi$ is $C^r$, and $f^{-1}(P) \cap V = \Phi^{-1}(0)$. We compute that

$$\begin{aligned} \operatorname{im} D\Phi(p) &= D\Psi(f(p))\big( \operatorname{im} Df(p) \big) = D\Psi(f(p))\big( \operatorname{im} Df(p) + \ker D\Psi(f(p)) \big) \\ &= D\Psi(f(p))\big( \operatorname{im} Df(p) + T_{f(p)}P \big) = D\Psi(f(p))(T_{f(p)}N) = \mathbb{R}^{n-p}. \end{aligned}$$

(The third equality follows from the final assertion of Proposition 10.6.2, and the fourth is the transversality assumption.) Thus $p$ is a submersion point of $\Phi$. Since $p$ is an arbitrary point of $f^{-1}(P)$ the claim follows from Theorem 10.6.5.

We now have

$$\begin{aligned} T_p f^{-1}(P) &= \ker D\Phi(p) = \ker\big( D\Psi(f(p)) \circ Df(p) \big) \\ &= Df(p)^{-1}\big( \ker D\Psi(f(p)) \big) = Df(p)^{-1}(T_{f(p)}P) \end{aligned}$$

where the first and last equalities are from Proposition 10.6.2.
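For a concrete feel for the transversality condition, consider the hypothetical family of curves $f_c : \mathbb{R} \to \mathbb{R}^2$, $f_c(t) = (t, t^2 - c)$, and let $P$ be the horizontal axis. Here $m = 1$, $n = 2$, $p = 1$, so when $f_c \pitchfork P$ the preimage is a $0$-dimensional manifold. The sketch below tests the condition $\operatorname{im} Df(t) + T_{f(t)}P = \mathbb{R}^2$ with a $2 \times 2$ determinant; the names and the family are assumptions made for illustration.

```python
import math

def f(t, c):
    """The parabola t -> (t, t^2 - c) in the plane."""
    return (t, t * t - c)

def preimage_of_axis(c):
    """Points t with f(t, c) on P = {(x, 0)}, i.e. real solutions of t^2 = c."""
    if c < 0:
        return []
    if c == 0:
        return [0.0]
    root = math.sqrt(c)
    return [-root, root]

def is_transversal_at(t):
    """im Df(t) is spanned by (1, 2t) and T P by (1, 0).

    Together they span R^2 exactly when the determinant of the matrix
    with these two columns is nonzero.
    """
    det = 1.0 * 0.0 - (2.0 * t) * 1.0
    return det != 0.0

def is_transversal(c):
    """True if f(., c) is transversal to P at every point of the preimage."""
    return all(is_transversal_at(t) for t in preimage_of_axis(c))

for c in (1.0, 0.0, -1.0):
    print(c, preimage_of_axis(c), is_transversal(c))
```

For $c = 0$ the parabola is tangent to the axis and transversality fails, even though the preimage happens to be a single point; the theorem gives a sufficient condition, not a necessary one. For $c < 0$ the preimage is empty and the condition holds vacuously.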


10.7 Tubular Neighborhoods

Fix a degree of differentiability $r \ge 2$ and an $n$-dimensional $C^r$ manifold $N \subset \mathbb{R}^\ell$. For each $q \in N$ let $\nu_qN$ be the orthogonal complement of $T_qN$. The normal bundle of $N$ is

$$\nu N = \bigcup_{q \in N} \{q\} \times \nu_qN.$$

Proposition 10.7.1. νN is an ℓ-dimensional Cr−1 submanifold of N × Rℓ.

Proof. Let $\phi : U \to \mathbb{R}^\ell$ be a $C^r$ parameterization for $N$. Let $Z : U \times \mathbb{R}^\ell \to \mathbb{R}^n$ be the function

$$Z(x, w) = \big( \langle D\phi(x)e_1, w \rangle, \ldots, \langle D\phi(x)e_n, w \rangle \big)$$

where $e_1, \ldots, e_n$ is the standard basis for $\mathbb{R}^n$. Clearly $Z$ is $C^{r-1}$, and for every $(x, w)$ in its domain the rank of $DZ(x, w)$ is $n$. Therefore the regular value theorem implies that

$$Z^{-1}(0) = \{\, (x, w) \in U \times \mathbb{R}^\ell : (\phi(x), w) \in \nu N \,\}$$

is an $\ell$-dimensional $C^{r-1}$ manifold. Since $(x, w) \mapsto (\phi(x), w)$ and $(q, w) \mapsto (\phi^{-1}(q), w)$ are inverse $C^{r-1}$ bijections between $Z^{-1}(0)$ and $\nu N \cap (\phi(U) \times \mathbb{R}^\ell)$, the first of these maps is a $C^{r-1}$ embedding, which implies (Theorem 10.6.6) that the latter set is a $C^{r-1}$ manifold. Of course these sets cover $\nu N$ because the images of $C^r$ parameterizations cover $N$.

Like the tangent bundle, the normal bundle attaches a vector space of a certain dimension to each point of $N$. (The general term for such a construct is a vector bundle.) The zero section of $\nu N$ is $\{\, (q, 0) : q \in N \,\}$. There are maps

$$\pi : (q, v) \mapsto q \quad\text{and}\quad \sigma : (q, v) \mapsto q + v$$

from $N \times \mathbb{R}^\ell$ to $N$ and $\mathbb{R}^\ell$ respectively. Let $\pi_T = \pi|_{TN}$, $\pi_\nu = \pi|_{\nu N}$, $\sigma_T = \sigma|_{TN}$, and $\sigma_\nu = \sigma|_{\nu N}$.

For a continuous function $\rho : N \to (0, \infty)$ let

$$U_\rho = \{\, (q, v) \in \nu N : \|v\| < \rho(q) \,\},$$

and let $\sigma_\nu^\rho = \sigma_\nu|_{U_\rho}$. The main topic of this section is the following result and its many applications.

Theorem 10.7.2 (Tubular Neighborhood Theorem). There is a continuous $\rho : N \to (0, \infty)$ such that $\sigma_\nu^\rho$ is a $C^{r-1}$ diffeomorphism onto its image, which is a neighborhood of $N$.

The inverse function theorem implies that each $(q, 0)$ in the zero section has a neighborhood that is mapped $C^{r-1}$ diffeomorphically by $\sigma$ onto a neighborhood of $q$ in $\mathbb{R}^\ell$. The methods used to produce a suitable neighborhood of the zero section with this property are topological and quite technical, in spite of their elementary character.
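For the unit circle $N \subset \mathbb{R}^2$ everything can be written down explicitly, which may help fix ideas. The normal space at $q$ is the line spanned by $q$, and for a constant $\rho \le 1$ the map $\sigma$ sends $(q, tq)$ with $|t| < \rho$ to $(1 + t)q$; its inverse is radial projection. The sketch below is an illustration under these assumptions, with hypothetical function names.

```python
import math

def sigma(q, v):
    """The map (q, v) -> q + v."""
    return (q[0] + v[0], q[1] + v[1])

def normal_point(theta, t):
    """The element (q, v) of the normal bundle with q = (cos, sin) and v = t*q."""
    q = (math.cos(theta), math.sin(theta))
    return q, (t * q[0], t * q[1])

def sigma_inverse(y):
    """Candidate inverse of sigma on the punctured disc 0 < |y| < 2.

    Project y radially onto the circle and keep the difference as the
    normal vector.
    """
    r = math.hypot(y[0], y[1])
    q = (y[0] / r, y[1] / r)
    return q, (y[0] - q[0], y[1] - q[1])

# Round trip inside the tube of radius rho = 1.
q, v = normal_point(1.0, -0.5)
print(sigma_inverse(sigma(q, v)), (q, v))

# With rho > 1 injectivity fails: two different normal vectors land on the origin.
a = normal_point(0.0, -1.0)
b = normal_point(math.pi, -1.0)
print(sigma(*a), sigma(*b))
```

The failure at distance one from the circle shows why $\rho$ must in general depend on the geometry of $N$ and cannot be chosen arbitrarily large.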


Lemma 10.7.3. If $(X, d)$ and $(Y, e)$ are metric spaces, $f : X \to Y$ is continuous, $S$ is a subset of $X$ such that $f|_S$ is an embedding, and for each $s \in S$ the restriction of $f$ to some neighborhood $N_s$ of $s$ is an embedding, then there is an open $U$ such that $S \subset U \subset \bigcup_s N_s$ and $f|_U$ is an embedding.

Proof. For $s \in S$ let $\delta(s)$ be one half of the supremum of the set of $\varepsilon > 0$ such that $\mathbf{U}_\varepsilon(s) \subset N_s$ and $f|_{\mathbf{U}_\varepsilon(s)}$ is an embedding. The restriction of an embedding to any subset of its domain is an embedding, which implies that $\delta$ is continuous.

Since $f|_S$ is an embedding, its inverse is continuous. In conjunction with the continuity of $\delta$ and $d$, this implies that for each $s \in S$ there is a $\zeta_s > 0$ such that

$$d(s, s') < \min\{\, \delta(s) - \tfrac12\delta(s'),\ \delta(s') - \tfrac12\delta(s) \,\} \qquad (*)$$

for all $s' \in S$ with $e(f(s), f(s')) \le \zeta_s$. For each $s$ choose an open $U_s \subset X$ such that $s \in U_s \subset \mathbf{U}_{\delta(s)/2}(s)$ and $f(U_s) \subset \mathbf{U}_{\zeta_s/3}(f(s))$. Let $U = \bigcup_{s \in S} U_s$. We will show that $f|_U$ is injective with continuous inverse.

Consider s, s′ ∈ S and y, y′ ∈ Y with e(f(s), y) < ζs/3 and e(f(s′), y′) < ζs′/3. We claim that if y = y′, then (∗) holds: otherwise e(f(s), f(s′)) > ζs, ζs′, so that

\[
\begin{aligned}
e(y, y') &\ge e(f(s), f(s')) - e(f(s), y) - e(f(s'), y') \\
&> \bigl(\tfrac{1}{2} e(f(s), f(s')) - \zeta_s/3\bigr) + \bigl(\tfrac{1}{2} e(f(s), f(s')) - \zeta_{s'}/3\bigr) \ge \tfrac{1}{6}(\zeta_s + \zeta_{s'}).
\end{aligned}
\]

In particular, if f(x) = y = y′ = f(x′) for some x ∈ Us and x′ ∈ Us′, then ½δ(s′) + d(s, s′) ≤ δ(s) and thus

Us′ ⊂ Uδ(s′)/2(s′) ⊂ Uδ(s′)/2+d(s,s′)(s) ⊂ Uδ(s)(s).

We have x ∈ Us, x′ ∈ Us′, and Us, Us′ ⊂ Uδ(s)(s), and f|Uδ(s)(s) is injective, so it follows that x = x′. We have shown that f|U is injective.

We now need to show that the image of any open subset of U is open in the relative topology of f(U). Fix a particular s ∈ S. In view of the definition of U, it suffices to show that if Vs ⊂ Us is open, then f(Vs) is relatively open. The restriction of f to Uδ(s)(s) is an embedding, so there is an open Zs ⊂ Y such that f(Vs) = f(Uδ(s)(s)) ∩ Zs. Since f(Vs) ⊂ f(Us) ⊂ Uζs/3(f(s)) we have

\[ f(V_s) = \bigl(f(U) \cap U_{\zeta_s/3}(f(s)) \cap Z_s\bigr) \cap f(U_{\delta(s)}(s)). \]

Above we showed that if Uζs/3(f(s)) ∩ Uζs′/3(f(s′)) is nonempty, then (∗) holds. Therefore f(U) ∩ Uζs/3(f(s)) is contained in the union of the f(Us′) for those s′ such that ½δ(s′) + d(s, s′) < δ(s), and for each such s′ we have Us′ ⊂ Uδ(s′)/2(s′) ⊂ Uδ(s)(s). Therefore f(U) ∩ Uζs/3(f(s)) ⊂ f(Uδ(s)(s)), and consequently

f(Vs) = f(U) ∩Uζs/3(f(s)) ∩ Zs,

so f(Vs) is relatively open in f(U).

Lemma 10.7.4. If (X, d) is a metric space, S ⊂ X, and U is an open set containing S, then there is a continuous δ : S → (0,∞) such that for all s ∈ S, Uδ(s)(s) ⊂ U.


Proof. For each s ∈ S let βs = sup{ ε > 0 : Uε(s) ⊂ U }. Since X is paracompact (Theorem 6.1.1) there is a locally finite refinement {Vα}α∈A of {Uβs(s)}s∈S. Theorem 6.2.2 gives a partition of unity {ϕα} subordinate to {Vα}. The claim holds trivially if there is some α with Vα = X; otherwise for each α let δα : S → [0,∞) be the function δα(s) = inf x∈X\Vα d(s, x), which is of course continuous, and define δ by setting δ(s) := ∑α ϕα(s)δα(s). If s ∈ S, s ∈ Vα, and δα′(s) ≤ δα(s) for all other α′ such that s ∈ Vα′, then

Uδ(s)(s) ⊂ Uδα(s)(s) ⊂ Vα ⊂ Uβs′(s′) ⊂ U

for some s′, so Uδ(s)(s) ⊂ U .

The two lemmas above combine to imply that:

Proposition 10.7.5. If (X, d) and (Y, e) are metric spaces, f : X → Y is continuous, S is a subset of X such that f|S is an embedding, and for each s ∈ S the restriction of f to some neighborhood Ns of s is an embedding, then there is a continuous ρ : S → (0,∞) such that Uρ(s)(s) ⊂ Ns for all s and the restriction of f to ⋃s∈S Uρ(s)(s) is an embedding.

Proof of the Tubular Neighborhood Theorem. The inverse function theorem implies that each point (q, 0) in the zero section of νN has a neighborhood Nq such that σν|Nq is a Cr−1 diffeomorphism. If ρ is as in the last result, then σνρ is an embedding, and its inverse is Cr−1 differentiable because Uρ ⊂ ⋃q Nq.

We now develop several applications of the tubular neighborhood theorem. Let M be an m-dimensional Cr ∂-manifold.

Theorem 10.7.6. For any S ⊂M , Cr−1(S,N) is dense in CS(S,N).

Proof. Proposition 10.2.7 implies that Cr−1(S, Vρ) is dense in CS(S, Vρ), and Proposition 5.5.3 implies that f ↦ πν ◦ σρ⁻¹ ◦ f is continuous.

Recall that a topological space X is locally path connected if, for each x ∈ X, each neighborhood U of x contains a neighborhood V such that for any x0, x1 ∈ V there is a continuous path γ : [0, 1] → U with γ(0) = x0 and γ(1) = x1. For an open subset of a locally convex topological vector space, local path connectedness is automatic: any neighborhood of a point contains a convex neighborhood.

Theorem 10.7.7. For any S ⊂M , CS(S,N) is locally path connected.

Proof. Fix a neighborhood U ⊂ CS(S,N) of a continuous f : S → N. The definition of the strong topology implies that there is an open W ⊂ S × N such that f ∈ { f′ ∈ C(S,N) : Gr(f′) ⊂ W } ⊂ U. Lemma 10.7.4 implies that there is a continuous λ : N → (0,∞) such that Uλ(y)(y) ⊂ Vρ for all y ∈ N and (x, π(σρ⁻¹(z))) ∈ W for all x ∈ S and z ∈ Uλ(f(x))(f(x)). Let W′ = { (x, y) ∈ W : y ∈ Uλ(f(x))(f(x)) }. For f0, f1 ∈ C(S,N) with Gr(f0), Gr(f1) ⊂ W′ we define h by setting

\[ h(x, t) = \pi_\nu\bigl(\sigma_\rho^{-1}((1 - t) f_0(x) + t f_1(x))\bigr). \]

If f0 and f1 are Cr, so that they are the restrictions to S of Cr functions defined on open supersets of S, then this formula defines a Cr extension of h to an open superset of S × [0, 1], so that h is Cr.


Proposition 10.7.8. There is a continuous function λ : N → (0,∞) and a Cr−1 function κ : Vλ → N, where Vλ = { (q, v) ∈ TN : ‖v‖ < λ(q) }, such that the function κ̃ : (q, v) ↦ (q, κ(q, v)) is a Cr−1 diffeomorphism between Vλ and a neighborhood of the diagonal in N × N.

Proof. Let ρ and Uρ be as in the tubular neighborhood theorem. For each q ∈ N there is a neighborhood Nq of (q, 0) ∈ TqN such that σT(Nq) is contained in σνρ(Uρ). Let

\[ \kappa = \pi_\nu \circ (\sigma_{\nu\rho})^{-1} \circ \sigma_T : \bigcup_q N_q \to N \quad\text{and}\quad \tilde\kappa = \pi_T \times \kappa : \bigcup_q N_q \to N \times N. \]

It is easy to see (and not hard to compute formally using the chain rule) that Dκ̃(q, 0) = IdTqN × IdTqN under the natural identification of T(q,0)(TN) with TqN × TqN. The inverse function theorem implies that after replacing Nq with a smaller neighborhood of (q, 0), the restriction of κ̃ to Nq is a diffeomorphism onto its image. We can now proceed as in the proof of the tubular neighborhood theorem.

The following construction simulates convex combination.

Proposition 10.7.9. There is a neighborhood W of the diagonal in N × N and a continuous function c : W × [0, 1] → N such that:

(a) c((q, q′), 0) = q for all (q, q′) ∈ W ;

(b) c((q, q′), 1) = q′ for all (q, q′) ∈ W ;

(c) c((q, q), t) = q for all q ∈ N and all t.

Proof. The tubular neighborhood theorem gives an open neighborhood U of the zero section in νN such that if σ : νN → Rk is the map σ(q, v) = q + v, then σ|U is a homeomorphism between U and σ(U). Let π : νN → N be the projection for the normal bundle. Let

W = { (q, q′) ∈ N ×N : (1− t)q + tq′ ∈ σ(U) for all 0 ≤ t ≤ 1 },

and for (q, q′) ∈ W and 0 ≤ t ≤ 1 let

\[ c((q, q'), t) = \pi\bigl((\sigma|_U)^{-1}((1 - t)q + tq')\bigr). \]
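As a concrete illustration of the construction in this proof, here is a minimal Python sketch for one assumed special case: N is the unit circle in R2. There the normal line at q is spanned by q itself, so π ◦ (σ|U)⁻¹ is radial projection z ↦ z/‖z‖, and W consists of the pairs (q, q′) whose connecting segment avoids the origin. The function names are illustrative only and do not come from the text.

```python
import math

def normalize(z):
    """Radial projection onto the unit circle; plays the role of pi((sigma|U)^{-1}(z))."""
    norm = math.hypot(z[0], z[1])
    if norm == 0.0:
        # The origin is not in sigma(U), so (q, q') is not in W in this case.
        raise ValueError("the origin lies outside the tubular neighborhood")
    return (z[0] / norm, z[1] / norm)

def c(q, q_prime, t):
    """Simulated convex combination c((q, q'), t) on the unit circle (assumed example)."""
    z = ((1.0 - t) * q[0] + t * q_prime[0],
         (1.0 - t) * q[1] + t * q_prime[1])
    return normalize(z)

def on_circle(angle):
    """Point of the unit circle with the given angle."""
    return (math.cos(angle), math.sin(angle))

# Two nearby points of the circle and the midpoint of the simulated segment.
q = on_circle(0.3)
q_prime = on_circle(1.1)
midpoint = c(q, q_prime, 0.5)
```

Under these assumptions, properties (a), (b), and (c) can be checked numerically, and every value of c lies on the circle, which is the point of passing the straight-line combination through the tubular neighborhood retraction.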

10.8 Manifolds with Boundary

Let X be an m-dimensional vector space, and let H be a closed half space of X. In the same way that manifolds were “modeled” on open subsets of X, manifolds with boundary are “modeled” on open subsets of H. Examples of ∂-manifolds include the m-dimensional unit disk

Dm := { x ∈ Rm : ‖x‖ ≤ 1 },


the annulus { x ∈ R2 : 1 ≤ ‖x‖ ≤ 2 }, and of course H itself. Since we will be very concerned with homotopies, a particularly important example is M × [0, 1] where M is a manifold (without boundary). Thus it is not surprising that we need to extend our formalism in this direction. What actually seems more surprising is the infrequency with which one needs to refer to “manifolds with corners,” which are spaces that are “modeled” on the nonnegative orthant of Rm.

There is a technical point that we need to discuss. If U ⊂ H is open and f : U → Y is C1, where Y is another vector space, then the derivative Df(x) is defined at any x ∈ U, including those in the boundary of H, in the sense that all C1 extensions f̃ : Ũ → Y of f to open (in X) sets Ũ with Ũ ∩ H = U have the same derivative at x. This is fairly easy to prove by showing that if w ∈ X and the ray rw = { x + tw : t ≥ 0 } from x “goes into” H, then the derivative of f̃ along rw is determined by f, and that the set of such w spans X. We won’t belabor the point by formalizing this argument.

The following definitions parallel those of the last section. If U ⊂ H is open and ϕ : U → Y is a function, we say that ϕ is a Cr ∂-immersion if it is Cr and the rank of Dϕ(x) is m for all x ∈ U. If, in addition, ϕ is a homeomorphism between U and ϕ(U), then we say that ϕ is a Cr ∂-embedding.

Definition 10.8.1. If M ⊂ Rk, an m-dimensional Cr ∂-parameterization for M is a Cr ∂-embedding ϕ : U → M, where U ⊂ H is open and ϕ(U) is a relatively open subset of M. If each p ∈ M is contained in the image of a Cr parameterization for M, then M is an m-dimensional Cr manifold with boundary.

We will often write “∂-manifold” in place of the cumbersome phrase “manifold with boundary.”

Fix an m-dimensional Cr ∂-manifold M ⊂ Rk. We say that p ∈ M is a boundary point of M if there is a Cr ∂-parameterization of M that maps a point in the boundary of H to p. If any Cr parameterization of a neighborhood of p has this property, then all do; this is best understood as a consequence of invariance of domain (Theorem 14.4.4) which is most commonly proved using algebraic topology. Invariance of domain is quite intuitive, and eventually we will be able to establish it, but in the meantime there arises the question of whether our avoidance of results derived from algebraic topology is “pure.” One way of handling this is to read the definition of a ∂-manifold as specifying which points are in the boundary. That is, a ∂-manifold is defined to be a subset of Rk together with an atlas of m-dimensional Cr parameterizations {ϕi}i∈I such that each ϕj⁻¹ ◦ ϕi maps points in the boundary of H to points in the boundary and points in the interior to points in the interior. In order for this to be rigorous it is necessary to check that all the constructions in our proofs preserve this feature, but this will be clear throughout. With this point cleared up, the boundary of M is well defined; we denote this subset by ∂M. Note that ∂M automatically inherits a system of coordinate systems that display it as an (m − 1)-dimensional Cr manifold (without boundary).

Naturally our analytic work will be facilitated by characterizations of ∂-manifolds that are somewhat easier to verify than the definition.

Lemma 10.8.2. For M ⊂ Rk the following are equivalent:


(a) M is an m-dimensional ∂-manifold;

(b) for each p ∈ M there is a neighborhood W ⊂ M, an m-dimensional Cr manifold (without boundary) W̃, and a Cr function h : W̃ → R such that W = h⁻¹([0,∞)) and Dh(p) ≠ 0.

Proof. Fix p ∈ M. If (a) holds then there is a Cr ∂-embedding ϕ : U → M, where U ⊂ H is open and ϕ(U) is a relatively open subset of M. After composing with an affine function, we may assume that H = { x ∈ Rm : xm ≥ 0 }. Let ϕ̃ : Ũ → Rk be a Cr extension of ϕ to an open (in Rm) superset of U. After replacing Ũ with a smaller neighborhood of ϕ⁻¹(p) it will be the case that ϕ̃ is a Cr embedding, and we may replace U with its intersection with this smaller neighborhood. To verify (b) we set W = ϕ(U) and W̃ = ϕ̃(Ũ), and we let h be the last component function of ϕ̃⁻¹.

Now suppose that W, W̃, and h are as in (b). Let ψ̃ : Ṽ → W̃ be a Cr parameterization for W̃ whose image contains p, and let x̃ = ψ̃⁻¹(p). Since Dh(p) ≠ 0 there is some i such that ∂(h ◦ ψ̃)/∂xi (x̃) ≠ 0; after reindexing we may assume that i = m. Let η : Ṽ → Rm be the function

\[ \eta(x) = \bigl(x_1, \ldots, x_{m-1}, h(\tilde\psi(x))\bigr). \]

Examination of the matrix of partial derivatives shows that Dη(x̃) is nonsingular, so, by the inverse function theorem, after replacing Ṽ with a smaller neighborhood of x̃, we may assume that η is a Cr embedding. Let Ũ = η(Ṽ), U = Ũ ∩ H, ϕ̃ = ψ̃ ◦ η⁻¹ : Ũ → W̃, and ϕ = ϕ̃|U : U → W. Evidently ϕ is a Cr ∂-parameterization for M.

The following consequence is obvious, but is still worth mentioning because it will have important applications.

Proposition 10.8.3. If M is an m-dimensional Cr manifold, f : M → R is Cr, and a is a regular value of f, then f⁻¹([a,∞)) is an m-dimensional Cr ∂-manifold.
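To see what the regular value hypothesis buys, here is a small Python sketch for an assumed example that is not in the text: M = R2 and f(x, y) = 1 − x² − y². For a = 0 the set f⁻¹([0,∞)) is the closed unit disk D2, and the gradient of f has norm 2 at every point of the bounding circle, so 0 is a regular value. For a = 1 the set f⁻¹([1,∞)) is the single point (0, 0), where the gradient vanishes, so the proposition does not apply.

```python
import math

def f(x, y):
    """Assumed example function on R^2 whose superlevel set at 0 is the unit disk."""
    return 1.0 - x * x - y * y

def grad_f(x, y):
    """Gradient of f."""
    return (-2.0 * x, -2.0 * y)

def min_gradient_norm_on_circle(radius, samples=360):
    """Smallest norm of grad f over sample points of the circle of the given radius."""
    norms = []
    for i in range(samples):
        angle = 2.0 * math.pi * i / samples
        gx, gy = grad_f(radius * math.cos(angle), radius * math.sin(angle))
        norms.append(math.hypot(gx, gy))
    return min(norms)

# Level set f = 0 is the unit circle; level set f = 1 is the origin alone.
regular_case = min_gradient_norm_on_circle(1.0)
critical_case = math.hypot(*grad_f(0.0, 0.0))
```

The sampling is only a sanity check, not a proof; in this example the gradient norm on the circle of radius r is exactly 2r.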

The definitions of tangent spaces, tangent manifolds, and derivatives are only slightly different from what we saw earlier. Suppose that M ⊂ Rk is an m-dimensional Cr ∂-manifold, ϕ : U → M is a Cr ∂-parameterization, x ∈ U, and ϕ(x) = p. The definition of a Cr function gives a Cr extension ϕ̃ : Ũ → Rk of ϕ to an open (in Rm) superset of U, and we define TpM to be the image of Dϕ̃(x). (Of course there is no difficulty showing that Dϕ̃(x) does not depend on the choice of extension ϕ̃.) As before, the tangent manifold of M is

\[ TM = \bigcup_{p \in M} \{p\} \times T_pM. \]

Let πTM : TM → M be the natural projection π : (p, v) ↦ p.

We wish to show that TM is a Cr−1 ∂-manifold. To this end define Tϕ : U × Rm → πTM⁻¹(U) by setting Tϕ(x, w) = (ϕ(x), Dϕ(x)w). If r ≥ 2, then Tϕ is an injective Cr−1 ∂-immersion whose image is open in TM, so it is a Cr−1 ∂-embedding. Since TM is covered by the images of maps such as Tϕ, it is indeed a Cr−1 ∂-manifold.


If N ⊂ Rℓ is an n-dimensional Cr ∂-manifold and f : M → N is a Cr map, then the definitions of Df(p) : TpM → Tf(p)N for p ∈ M and Tf : TM → TN, and the main properties, are what we saw earlier, with only technical differences in the explanation. In particular, T extends to a functor from the category of Cr ∂-manifolds and Cr maps to the category of Cr−1 ∂-manifolds and Cr−1 maps.

We also need to reconsider the notion of a submanifold. One can of course define a Cr ∂-submanifold of M to be a Cr ∂-manifold that happens to be contained in M, but the submanifolds of interest to us satisfy additional conditions. Any point in the submanifold that lies in ∂M should be a boundary point of the submanifold, and we don’t want the submanifold to be tangent to ∂M at such a point.

Definition 10.8.4. If M is a Cr ∂-manifold, a subset P is a neat Cr ∂-submanifold if it is a Cr ∂-manifold, ∂P = P ∩ ∂M, and for each p ∈ ∂P we have TpP + Tp∂M = TpM.

The reason this is the relevant notion has to do with transversality. Suppose that M is a Cr ∂-manifold, N is a Cr manifold, without boundary, P is a Cr submanifold of N, and f : M → N is Cr. We say that f is transversal to P along S ⊂ M, and write f ⋔S P, if f|M\∂M ⋔S\∂M P and f|∂M ⋔S∩∂M P. As above, when S = M we write f ⋔ P.

The transversality theorem generalizes as follows:

Proposition 10.8.5. If f : M → N is a Cr function that is transversal to P, then f⁻¹(P) is a neat Cr submanifold of M with ∂f⁻¹(P) = f⁻¹(P) ∩ ∂M.

Proof. We need to show that a neighborhood of a point p ∈ f⁻¹(P) has the required properties. If p ∈ M \ ∂M, this follows from Theorem 10.6.9, so suppose that p ∈ ∂M. Lemma 10.8.2 implies that there is a neighborhood W ⊂ M of p, an m-dimensional Cr manifold W̃, and a Cr function h : W̃ → R such that W = h⁻¹([0,∞)), h(p) = 0, and Dh(p) ≠ 0. Let f̃ : W̃ → N be a Cr extension of f|W.¹ We may assume that f̃ is transverse to P, so the transversality theorem implies that f̃⁻¹(P) is a Cr submanifold of W̃.

Since f̃ and f|∂M are both transverse to P, there must be a v ∈ TpM \ Tp∂M such that Df(p)v ∈ Tf(p)P. This implies two things. First, since v ∉ ker Dh(p) = Tp∂M and f⁻¹(P) ∩ W = f̃⁻¹(P) ∩ W̃ ∩ h⁻¹([0,∞)), Lemma 10.8.2 implies that f⁻¹(P) ∩ W is a Cr ∂-manifold in a neighborhood of p. Second, the transversality theorem implies that Tp f̃⁻¹(P) includes v, so we have Tp f⁻¹(P) + Tp∂M = TpM.

10.9 Classification of Compact 1-Manifolds

In order to study the behavior of fixed points under homotopy, we will need to understand the structure of h⁻¹(q) when M and N are manifolds of the same dimension,

h : M × [0, 1] → N

is a Cr homotopy, and q is a regular value of h. The transversality theorem implies that h⁻¹(q) is a 1-dimensional Cr ∂-manifold, so our first step is the following result.

¹ (Footnote to the proof of Proposition 10.8.5.) If ψ : V → N is a Cr parameterization for N whose image contains f(W), then ψ⁻¹ ◦ f|W has a Cr extension, because that is what it means for a function on a possibly nonopen domain to be Cr, and this extension can be composed with ψ to give f̃.

Proposition 10.9.1. A nonempty compact connected 1-dimensional Cr manifold is Cr diffeomorphic to the circle C = { (x, y) ∈ R2 : x² + y² = 1 }. A compact connected 1-dimensional Cr ∂-manifold with nonempty boundary is Cr diffeomorphic to [0, 1].

Of course no one has any doubts about this being true. If there is anything to learn from the following technical lemma and the subsequent argument, it can only concern technique. Readers who skip this will not be at any disadvantage.

Lemma 10.9.2. Suppose that a < b and c < d, and that there is an increasing Cr diffeomorphism f : (a, b) → (c, d). Then for sufficiently large Q ∈ R there is an increasing Cr diffeomorphism λ : (a, b) → (a − Q, d) such that λ(s) = s − Q for all s in some interval (a, a + δ) and λ(s) = f(s) for all s in some interval (b − ε, b).

Proof. Lemma 10.2.2 presented a C∞ function γ : R → [0,∞) with γ(t) = 0 for all t ≤ 0 and γ′(t) > 0 for all t > 0. Setting

\[ \kappa(s) = \frac{\gamma(s - a - \delta)}{\gamma(s - a - \delta) + \gamma(b - \varepsilon - s)} \]

for sufficiently small δ, ε > 0 gives a C∞ function κ : (a, b) → [0, 1] with κ(s) = 0 for all s ∈ (a, a + δ), κ(s) = 1 for all s ∈ (b − ε, b), and κ′(s) > 0 for all s such that 0 < κ(s) < 1. For any real number Q we can define λ : (a, b) → R by setting

λ(s) = (1− κ(s))(s−Q) + κ(s)f(s).

Clearly this will be satisfactory if λ′(s) > 0 for all s. A brief calculation gives

λ′(s) = 1 + κ(s)(f ′(s)− 1) + κ′(s)(Q+ f(s)− s)

= (1− κ(s))(1− f ′(s)) + f ′(s) + κ′(s)(Q+ f(s)− s).

If Q is larger than the upper bound for s − f(s), then λ′(s) > 0 when κ(s) is close to 0 or 1. Since those s for which this is not the case will be contained in a compact interval on which κ′ is positive and continuous, hence bounded below by a positive constant, if Q is sufficiently large then λ′(s) > 0 for all s.
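The construction in this proof can be tried numerically. The Python sketch below is an illustration under stated assumptions rather than part of the argument: it takes γ(t) = exp(−1/t) for t > 0 (one standard function with the properties required of γ, not necessarily the one in Lemma 10.2.2), and the sample data (a, b) = (0, 1), f(s) = 0.1s − 5, δ = ε = 0.25 are hypothetical. Here s − f(s) is bounded above by 5.9 on (0, 1), so Q = 6 should work and Q = 0 should not.

```python
import math

def gamma(t):
    """Smooth function vanishing for t <= 0 and increasing for t > 0 (assumed choice)."""
    return math.exp(-1.0 / t) if t > 0 else 0.0

def make_lambda(f, a, b, Q, delta, eps):
    """Return lambda(s) = (1 - kappa(s)) (s - Q) + kappa(s) f(s) on (a, b)."""
    def kappa(s):
        num = gamma(s - a - delta)
        return num / (num + gamma(b - eps - s))

    def lam(s):
        k = kappa(s)
        return (1.0 - k) * (s - Q) + k * f(s)

    return lam

def is_increasing(g, a, b, n=2000):
    """Check strict monotonicity of g on an evenly spaced grid inside (a, b)."""
    pts = [a + (b - a) * i / (n + 1) for i in range(1, n + 1)]
    vals = [g(s) for s in pts]
    return all(x < y for x, y in zip(vals, vals[1:]))

# Hypothetical data: an increasing diffeomorphism of (0, 1) onto (-5, -4.9).
f = lambda s: 0.1 * s - 5.0
lam_good = make_lambda(f, 0.0, 1.0, 6.0, 0.25, 0.25)  # Q exceeds sup (s - f(s))
lam_bad = make_lambda(f, 0.0, 1.0, 0.0, 0.25, 0.25)   # Q too small
```

With these choices lam_good agrees with s − Q near a and with f near b, while lam_bad fails to be monotone on the transition interval, which is why the lemma asks for Q to be large.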

Proof of Proposition 10.9.1. Let M be a nonempty compact connected 1-dimensional Cr manifold. We can pass from a Cr atlas for M to a Cr atlas whose elements all have connected domains by taking the restrictions of each element of the atlas to the connected components of its domain. To be concrete, we will assume that the domains of the parameterizations are connected subsets of R, i.e., open intervals. Since we can pass from a parameterization with unbounded domain to a countable collection of restrictions to bounded domains, we may assume that all domains are bounded. Since M is compact, any atlas has a finite subset that is also an atlas. We now have an atlas of the form

{ϕ1 : (a1, b1) →M, . . . , ϕK : (aK , bK) → M }.


Finally, we may assume that K is minimal. Since M is compact, K > 1.

Let p be a limit point of ϕ1(s) as s → b1. If p was in the image of ϕ1, say p = ϕ1(s1), then the image of a neighborhood of s1 would be a neighborhood of p, and points close to b1 would be mapped to this neighborhood, contradicting the injectivity of ϕ1. Therefore p is not in the image of ϕ1. After reindexing, we may assume that p is in the image of ϕ2, say p = ϕ2(t2).

Fix ε > 0 small enough that [t2 − ε, t2 + ε] ⊂ (a2, b2). Since ϕ2((t2 − ε, t2 + ε)) and M \ ϕ2([t2 − ε, t2 + ε]) are open and disjoint, and there are at most two s such that ϕ1(s) = ϕ2(t2 ± ε), there is some δ > 0 such that ϕ1((b1 − δ, b1)) ⊂ ϕ2((t2 − ε, t2 + ε)). Then f = ϕ2⁻¹ ◦ ϕ1|(b1−δ,b1) is a Cr diffeomorphism. The intermediate value theorem implies that it is monotonic. Without loss of generality (we could replace ϕ2 with t ↦ ϕ2(−t)) we may assume that it is increasing. Of course lim s→b1 f(s) = t2.

The last result implies that there is some real number Q and an increasing Cr diffeomorphism λ : (b1 − δ, b1) → (b1 − δ − Q, t2) such that λ(s) = s − Q for all s near b1 − δ and λ(s) = f(s) for all s near b1. We can now define ϕ : (a1 − Q, b2) → M by setting

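\[
\varphi(s) =
\begin{cases}
\varphi_1(s + Q), & s \le b_1 - \delta - Q, \\
\varphi_1(\lambda^{-1}(s)), & b_1 - \delta - Q < s < t_2, \\
\varphi_2(s), & s \ge t_2.
\end{cases}
\]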

We have λ⁻¹(s) = s + Q for all s in a neighborhood of b1 − δ − Q and ϕ(s) = ϕ2(s) for all s close to t2. Therefore ϕ is a Cr function. Each point in its domain has a neighborhood such that the restriction of ϕ to that neighborhood is a Cr parameterization for M, which implies that it maps open sets to open sets. If it was injective, it would be a Cr coordinate chart whose image was the union of the images of ϕ1 and ϕ2, which would contradict the minimality of K.

Therefore ϕ is not injective. Since ϕ1 and ϕ2 are injective, there must be s < b1 − δ − Q such that ϕ(s) = ϕ(s′) for some s′ > t2. Let s0 be the supremum of such s. If ϕ(s0) = ϕ(s′) for some s′ > t2, then the restrictions of ϕ to neighborhoods of s0 and s′ would both map diffeomorphically onto some neighborhood of this point, which would give a contradiction of the definition of s0. Therefore ϕ(s0) is in the closure of ϕ((t2, b2)), but is not an element of this set, so it must be lim s′→b2 ϕ(s′). Arguments similar to those given above imply that there are α, β > 0 such that the images of ϕ|(b2−α,b2) and ϕ|(s0−β,s0) are the same, and the Cr diffeomorphism

g = (ϕ|(s0−β,s0))−1 ◦ ϕ|(b2−α,b2)

is increasing. Applying the lemma above again, there is a real number R and an increasing Cr diffeomorphism λ : (b2 − α, b2) → (b2 − α − R, s0) such that λ(s) = s − R for s near b2 − α and λ(s) = g(s) for s near b2.

We now define ψ : [s0, s0 + R) → M by setting

\[
\psi(s) =
\begin{cases}
\varphi(s), & s_0 \le s \le b_2 - \alpha, \\
\varphi(\lambda^{-1}(s - R)), & b_2 - \alpha < s < s_0 + R.
\end{cases}
\]

Then ψ agrees with ϕ near b2 − α, so it is Cr, and it agrees with ϕ(s − R) near s0 + R, so it can be construed as a Cr function from the circle (thought of as R modulo R) to M. This function is easily seen to be injective, and it maps open sets to open sets, so its image is open, but also compact, hence closed. Since M is connected, its image must be all of M, so we have constructed the desired Cr diffeomorphism between the circle and M.

The argument for a compact connected one dimensional Cr ∂-manifold with nonempty boundary is similar, but somewhat simpler, so we leave it to the reader.

Although it will not figure in the work here, the reader should certainly be aware that the analogous issues for higher dimensions are extremely important in topology, and mathematical culture more generally. In general, a classification of some type of mathematical object is a description of all the isomorphism classes (for whatever is the appropriate notion of isomorphism) of the object in question. The result above classifies compact connected 1-dimensional Cr manifolds.

The problem of classifying oriented surfaces (2-dimensional manifolds) was first considered in a paper of Möbius in 1870. The classification of all compact connected surfaces was correctly stated by von Dyck in 1888. This result was proved for surfaces that can be triangulated by Dehn and Heegaard in 1907, and in 1925 Radó showed that any surface can be triangulated.

After some missteps, Poincaré formulated a fundamental problem for the classification of 3-manifolds: is a simply connected compact 3-manifold necessarily homeomorphic to S3? (A topological space X is simply connected if it is connected and any continuous function f : S1 = { (x, y) ∈ R2 : x² + y² = 1 } → X has a continuous extension F : D2 = { (x, y) ∈ R2 : x² + y² ≤ 1 } → X.) Although Poincaré did not express a strong view, this became known as the Poincaré conjecture, and over the course of the 20th century, as it resisted solution and the four color theorem and Fermat’s last theorem were proved, it became perhaps the most famous open problem in mathematics. Curiously, the analogous theorems for higher dimensions were proved first, by Smale in 1961 for dimensions five and higher, and by Freedman in 1982 for dimension four. Finally in late 2002 and 2003 Perelman posted three papers on the internet that sketched a proof of the original conjecture. Over the next three years three different teams of two mathematicians set about filling in the details of the argument. In the middle of 2006 each of the teams posted a (book length) paper giving a complete argument. Although Perelman’s papers were quite terse, and many details needed to be filled in, all three teams agreed that all gaps in his argument were minor.

Chapter 11

Sard’s Theorem

The results concerning existence and uniqueness of solutions of systems of linear equations have been well established for a long time, of course. In the late 19th century Walras recognized that the system describing economic equilibria had (after recognizing the redundant equation now known as Walras’ law) the same number of equations and free variables, which suggested that “typically” economic equilibria should be isolated and also robust, in the sense that the endogenous variables will vary continuously with the underlying parameters in some neighborhood of the initial point. It was several decades before methods for making these ideas precise were established in mathematics, and then several more decades elapsed before they were imported into theoretical economics.

The original versions of what is now known as Sard’s theorem appeared during the 1930’s. There followed a process of evolution, both in the generality of the result and in the method of proof, that culminated in the version due to Federer (see Section 11.3). Our treatment here is primarily based on Milnor (1965), fleshed out with some arguments from Sternberg (1983), which (in its first edition) seems to have been Milnor’s primary source. While not completely general, this version of the result is adequate for all of the applications in economic theory to date, many of which are extremely important.

Suppose 1 ≤ r ≤ ∞, and let f : U → Rn be a Cr function, where U ⊂ Rm is open. If f(x) = y and Df(x) has rank n, then the implicit function theorem (Theorem 10.1.2) implies that, in a neighborhood of x, f⁻¹(y) can be thought of as the graph of a Cr function. Intuition developed by looking at low dimensional examples suggests that for “typical” values of y this pleasant situation will prevail at all elements of f⁻¹(y), but even in the case m = n = 1 one can see that there can be a countable infinity of exceptional y. Thus the difficulty in formulating this idea precisely is that we need a suitable notion of a “small” subset of Rn. This problem was solved by the theory of Lebesgue measure, which explains the relatively late date at which the result first appeared.
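For the case m = n = 1, a countable infinity of exceptional values is easy to exhibit. The Python sketch below uses an assumed example that is not in the text: f(x) = x + 2 sin x, whose derivative 1 + 2 cos x vanishes exactly at the points ±2π/3 + 2πk for integers k. Each period contributes two critical values, and no two of them coincide, so the set of critical values is countably infinite (and still has measure zero).

```python
import math

def f(x):
    """Assumed example: a smooth function on R with infinitely many critical points."""
    return x + 2.0 * math.sin(x)

def df(x):
    """Derivative of f."""
    return 1.0 + 2.0 * math.cos(x)

def critical_points(k_min, k_max):
    """Solutions of cos(x) = -1/2 in the periods k_min, ..., k_max."""
    pts = []
    for k in range(k_min, k_max + 1):
        pts.append(2.0 * math.pi / 3.0 + 2.0 * math.pi * k)
        pts.append(-2.0 * math.pi / 3.0 + 2.0 * math.pi * k)
    return sorted(pts)

# Critical values coming from seven periods; each period adds two new values.
points = critical_points(-3, 3)
critical_values = [f(x) for x in points]
```

Enlarging the range of k produces more critical values without bound, so no finite list of exceptions would suffice here.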

Measure theory has rather complex foundations, so it is preferable that it not be a prerequisite. Thus it is fortunate that only the notion of a set of measure zero is required. Section 11.1 defines this notion and establishes its basic properties. One of the most important results in measure theory is Fubini’s theorem, which, roughly speaking, allows functions to be integrated one variable at a time. Section 11.2 develops a Fubini-like result for sets of measure zero. With these elements in place, it becomes possible to state and prove Sard’s theorem in Section 11.3. Section 11.4 explains how to extend the result to maps between sufficiently smooth manifolds.

The application of Sard’s theorem that is most important in the larger scheme of this book is given in Section 11.5. The overall idea is to show that any map between manifolds can be approximated by one that is transversal to a given submanifold of the range.

11.1 Sets of Measure Zero

For each n there is a positive constant such that the volume of a ball in Rn is that constant times rⁿ, where r is the radius of the ball. Without knowing very much about the constant, we can still say that sets satisfying the following definition are “small.”

Definition 11.1.1. A set S ⊂ Rm has measure zero if, for any ε > 0, there is a sequence {(xj, rj)}∞j=1 in Rm × (0, 1) such that

\[ S \subset \bigcup_j U_{r_j}(x_j) \quad\text{and}\quad \sum_j r_j^m < \varepsilon. \]

Of course we can use different sets, such as cubes, as a measure of whether a set has measure zero. Specifically, if we can find a covering of S by balls of radius rj with $\sum_j r_j^m < \varepsilon$, then there is a covering by cubes of side length 2rj with $\sum_j (2r_j)^m < 2^m \varepsilon$, and if we can find a covering of S by cubes of side lengths 2ℓj with $\sum_j (2\ell_j)^m < \varepsilon$, then there is a covering by balls of radius $\sqrt{m}\,\ell_j$ with $\sum_j (\sqrt{m}\,\ell_j)^m < (\sqrt{m}/2)^m \varepsilon$. We can also use rectangles $\prod_{i=1}^m [a_i, b_i]$ because we can cover such a rectangle with a collection of cubes of almost the same total volume; from the point of view of our methodology it is important to recognize that we “know” this as a fact of arithmetic (and in particular the distributive law) rather than as prior knowledge concerning volume.

The rest of this section develops a few basic facts. The following property ofsets of measure zero occurs frequently in proofs.

Lemma 11.1.2. If S1, S2, . . . ⊂ Rm are sets of measure zero, then S1 ∪ S2 ∪ . . . has measure zero.

Proof. For given ε take the union of a countable cover of S1 by rectangles of total volume < ε/2, a countable cover of S2 by rectangles of total volume < ε/4, etc.
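The geometric series bookkeeping in this proof is easiest to see for a countable set, viewed as a countable union of single points. The Python sketch below is an assumed illustration with m = 1, phrased in the ball form of Definition 11.1.1: the j-th rational of [0, 1] in a fixed enumeration is covered by an interval of radius ε/2^(j+1), so the radii of any finite initial segment sum to less than ε/2. Exact rational arithmetic is used so that the comparison is not affected by rounding; the function names are hypothetical.

```python
from fractions import Fraction

def rationals_in_unit_interval(count):
    """First `count` rationals in [0, 1], enumerated by increasing denominator."""
    seen, out, q = set(), [], 1
    while len(out) < count:
        for p in range(q + 1):
            x = Fraction(p, q)
            if x not in seen:
                seen.add(x)
                out.append(x)
                if len(out) == count:
                    break
        q += 1
    return out

def cover(points, eps):
    """Pair the j-th point (j = 1, 2, ...) with the radius eps / 2**(j + 1)."""
    return [(x, eps / 2 ** (j + 1)) for j, x in enumerate(points, start=1)]

# Cover the first 50 rationals with intervals whose radii sum to less than eps.
eps = Fraction(1, 100)
pairs = cover(rationals_in_unit_interval(50), eps)
total_radius = sum(r for _, r in pairs)
```

However many points are enumerated, the total stays below ε/2, which is the same mechanism that lets the covers of S1, S2, . . . be combined in the proof.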

It is intuitively obvious that a set of measure zero cannot have a nonempty interior, but our methodology requires that we “forget” everything we know about volume, using only arithmetic to prove it.

Lemma 11.1.3. If S has measure zero, its interior is empty, so its complement is dense.


Proof. Suppose that, on the contrary, S has a nonempty interior. Then it contains a closed cube C, say of side length 2ℓ. Fixing ε > 0, suppose that S has a covering by cubes of side length 2ℓj with $\sum_j (2\ell_j)^m < \varepsilon$. Then it has a covering by open cubes Cj of side length 3ℓj, and there is a finite subcover of C. For some large integer K, consider all “standard cubes” of the form $\prod_{j=1}^m \bigl[\tfrac{i_j}{K}, \tfrac{i_j + 1}{K}\bigr]$. For each cube in our finite subcover, let Dj be the union of all such standard cubes contained in Cj, and let nj be the number of such cubes. Let D be the union of all standard cubes containing a point in C, and let n be the number of them. Simply as a matter of counting (that is to say, without reference to any theory of volume) we have $n_j/K^m \le (3\ell_j)^m$ and $n/K^m \ge (2\ell)^m$. If K is sufficiently large, then $D \subset \bigcup_j D_j$, so that $n \le \sum_j n_j$ and

\[ (2\ell)^m \le n/K^m \le \sum_j n_j/K^m \le \sum_j (3\ell_j)^m \le (3/2)^m \varepsilon, \]

so that $\varepsilon > (4\ell/3)^m$ cannot be arbitrarily small.

The next result implies that the notion of a set of measure zero is invariant under C1 changes of coordinates. In the proof of Theorem 11.3.1 we will use this flexibility to choose coordinate systems with useful properties. In addition, this fact is the key to the definition of sets of measure zero in manifolds. Recall that if L : Rm → Rm is a linear transformation, then the operator norm of L is

\[ \|L\| = \max_{\|v\| = 1} \|L(v)\|. \]

Lemma 11.1.4. If U ⊂ Rm is open, f : U → Rm is C1, and S ⊂ U has measure zero, then f(S) has measure zero.

Proof. Let C ⊂ U be a closed cube. Since U can be covered by countably many such cubes (e.g., all cubes contained in U with rational centers and rational side lengths) it suffices to show that f(S ∩ C) has measure zero. Let B := max x∈C ‖Df(x)‖. For any x, y ∈ C we have

\[
\begin{aligned}
\|f(x) - f(y)\| &= \Bigl\| \int_0^1 Df((1 - t)x + ty)(y - x)\,dt \Bigr\| \\
&\le \int_0^1 \|Df((1 - t)x + ty)\| \cdot \|y - x\|\,dt \le B\|y - x\|.
\end{aligned}
\]

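If {(xj, rj)}∞j=1 is a sequence such that

\[ S \cap C \subset \bigcup_j U_{r_j}(x_j) \quad\text{and}\quad \sum_j r_j^m < \varepsilon, \]

then

\[ f(S \cap C) \subset \bigcup_j U_{B r_j}(f(x_j)) \quad\text{and}\quad \sum_j (B r_j)^m < B^m \varepsilon. \]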


11.2 A Weak Fubini Theorem

For a set S ⊂ Rm and t ∈ R let

S(t) := { (x2, . . . , xm) ∈ Rm−1 : (t, x2, . . . , xm) ∈ S }

be the t-slice of S. Let P(S) be the set of t such that S(t) does not have (m − 1)-dimensional measure zero. Certainly it seems natural to expect that if S is a set of m-dimensional measure zero, then P(S) should be a set of 1-dimensional measure zero, and conversely. This is true, by virtue of Fubini’s theorem, but we do not have the means to prove it in full generality. Fortunately all we will need going forward is a special case.

Proposition 11.2.1. If S ⊂ Rm is locally closed, then S has measure zero if and only if P(S) has measure zero.

We will prove this in several steps.

Lemma 11.2.2. If C ⊂ Rm is compact, then C has measure zero if and only if P(C) has measure zero.

Proof of Proposition 11.2.1. Suppose that S = C ∩ U where C is closed and U is open. Let A1, A2, . . . be a countable collection of compact rectangles that cover U. Then the following are equivalent:

(a) S has measure zero;

(b) each C ∩ Aj has measure zero;

(c) each P (C ∩Aj) has measure zero;

(d) P (S) has measure zero.

Specifically, Lemma 11.1.2 implies that (a) and (b) are equivalent, and also that P(S) = ⋃j P(C ∩ Aj), after which the equivalence of (c) and (d) follows from a third application of the result. The equivalence of (b) and (c) follows from the lemma above.

We now need to prove Lemma 11.2.2. Fix a compact set C, which we assume is contained in the rectangle $\prod_{i=1}^m [a_i, b_i]$. For each δ > 0 let Pδ(C) be the set of t such that C(t) cannot be covered by a finite collection of open rectangles whose total (m − 1)-dimensional volume is less than δ.

Lemma 11.2.3. For each δ > 0, Pδ(C) is closed.

Proof. If t is in the complement of P_δ(C), then any collection of open rectangles that covers C(t) also covers C(t′) for t′ sufficiently close to t, because C is compact.

The next two results are the two implications of Lemma 11.2.2.

Lemma 11.2.4. If P (C) has measure zero, then C has measure zero.


Proof. Fix ε > 0, and choose δ < ε/(2(b_1 − a_1)). Since P_δ(C) ⊂ P(C), it has one dimensional measure zero, and since it is closed, hence compact, it can be covered by the union J of finitely many open intervals of total length ε/(2(b_2 − a_2) · · · (b_m − a_m)). In this way { x ∈ C : x_1 ∈ J } is covered by a union of open rectangles of total volume ≤ ε/2.

For each t ∉ J we can choose a finite union of rectangles in R^{m−1} of total volume less than δ that covers C(t), and these will also cover C(t′) for all t′ in some open interval around t. Since [a_1, b_1] \ J is compact, it is covered by a finite collection of such intervals, and it is evident that we can construct a cover of { x ∈ C : x_1 ∉ J } of total volume less than ε/2.

Lemma 11.2.5. If C has measure zero, then P (C) has measure zero.

Proof. Since P(C) = ⋃_{n=1,2,...} P_{1/n}(C), it suffices to show that P_δ(C) has measure zero for any δ > 0. For any ε > 0 there is a covering of C by finitely many rectangles of total volume less than ε. For each t there is an induced covering of C(t) by a finite collection of rectangles, and there is an induced covering of [a_1, b_1]. The total length of intervals with induced coverings of total volume greater than δ cannot exceed ε/δ.
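The final counting step is a Markov-type inequality, and it can be checked by hand in the case m = 2. The following is a minimal sketch, not part of the proof: the cover `rects` is a hypothetical example, each rectangle [a, b] × [c, d] is stored as a tuple `(a, b, c, d)`, and the function names are illustrative.

```python
from fractions import Fraction

def slice_volume(rects, t):
    """Total length of the induced cover of the t-slice: the sum of (d - c)
    over the rectangles [a, b] x [c, d] with a <= t <= b."""
    return sum(d - c for (a, b, c, d) in rects if a <= t <= b)

def length_of_large_slices(rects, delta):
    """Length of { t : slice_volume(rects, t) > delta }.

    The slice volume is constant on the open intervals between consecutive
    endpoints a, b, so one midpoint per interval is enough; the endpoints
    themselves form a finite set and contribute no length."""
    cuts = sorted({x for (a, b, _, _) in rects for x in (a, b)})
    total = Fraction(0)
    for lo, hi in zip(cuts, cuts[1:]):
        if slice_volume(rects, (lo + hi) / 2) > delta:
            total += hi - lo
    return total

F = Fraction
# Hypothetical cover of some compact set by three rectangles in the plane.
rects = [(F(0), F(1), F(0), F(1, 10)),
         (F(1, 2), F(1), F(0), F(3, 10)),
         (F(3, 4), F(2), F(1), F(6, 5))]
eps = sum((b - a) * (d - c) for (a, b, c, d) in rects)  # total area of the cover
delta = F(1, 4)
bad = length_of_large_slices(rects, delta)
```

Here the total area is 1/2, the slices of induced length greater than 1/4 occupy total length 1/2, and the bound ε/δ equals 2, so the inequality holds with room to spare.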

11.3 Sard’s Theorem

We now come to this chapter’s central result. Recall that a critical point of a C^1 function is a point in the domain at which the rank of the derivative is less than the dimension of the range, and a critical value is a point in the range that is the image of a critical point.
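To make the two definitions concrete, here is a small sketch for the illustrative map f(x, y) = (x, y²) from R² to R² (this map is an assumption chosen for the example, not one used in the text). Its derivative has rank less than 2 exactly where the Jacobian determinant 2y vanishes, so the critical points form the x-axis and so do the critical values.

```python
def f(x, y):
    # Illustrative map R^2 -> R^2; not taken from the text.
    return (x, y * y)

def jacobian_det(x, y):
    # Df(x, y) = [[1, 0], [0, 2y]], so the determinant is 2y.
    return 2 * y

# Sample an integer grid; for a square Jacobian, rank < 2 iff the determinant is 0.
grid = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]
critical_points = [p for p in grid if jacobian_det(*p) == 0]
critical_values = sorted({f(*p) for p in critical_points})
```

Every sampled critical value lies on the line y = 0, which is an infinite set of critical values that nevertheless has 2-dimensional measure zero.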

Theorem 11.3.1. If U ⊂ R^m is open and f : U → R^n is a C^r function, where r > max{m − n, 0}, then the set of critical values of f has measure zero.

Proof. If n = 0, then f has no critical points and therefore no critical values. If m = 0, then U is either a single point or the null set, and if n > 0 its image has measure zero. Therefore we may assume that m, n > 0. Since r > m − n implies both r > (m − 1) − (n − 1) and r > (m − 1) − n, by induction we may assume that the claim has been established with (m, n) replaced by either (m − 1, n − 1) or (m − 1, n).

Let C be the set of critical points of f. For i = 1, . . . , r let C_i be the set of points in U at which all partial derivatives of f up to order i vanish. It suffices to show that:

(a) f(C \ C_1) has measure zero;

(b) f(C_i \ C_{i+1}) has measure zero for all i = 1, . . . , r − 1;

(c) f(C_r) has measure zero.

Proof of (a): We will show that each x ∈ C \ C_1 has a neighborhood V such that f(V ∩ C) has measure zero. This suffices because C \ C_1 is an open subset of a closed set, so it is covered by countably many compact sets, each of which is covered by finitely many such neighborhoods, and consequently it has a countable cover by such neighborhoods.

After reindexing we may assume that ∂f_1/∂x_1(x) ≠ 0. Let V be a neighborhood of x in which ∂f_1/∂x_1 does not vanish. Let h : V → R^m be the function

h(x) := (f_1(x), x_2, . . . , x_m).

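The matrix of partial derivatives of h at x is

\[ \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(x) & \frac{\partial f_1}{\partial x_2}(x) & \cdots & \frac{\partial f_1}{\partial x_m}(x) \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}, \]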

so the inverse function theorem implies that, after replacing V with a smaller neighborhood of x, h is a diffeomorphism onto its image. The chain rule implies that the critical values of f are the critical values of g = f ∘ h^{−1}, so we can replace f with g, and g has the additional property that g_1(z) = z_1 for all z in its domain. The upshot of this argument is that we may assume without loss of generality that f_1(x) = x_1 for all x ∈ V.

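For each t ∈ R let V^t := { w ∈ R^{m−1} : (t, w) ∈ V }, let f^t : V^t → R^{n−1} be the function

f^t(w) := (f_2(t, w), . . . , f_n(t, w)),

and let C^t be the set of critical points of f^t. The matrix of partial derivatives of f at x ∈ V is

\[ \begin{pmatrix} 1 & 0 & \cdots & 0 \\ \frac{\partial f_2}{\partial x_1}(x) & \frac{\partial f_2}{\partial x_2}(x) & \cdots & \frac{\partial f_2}{\partial x_m}(x) \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1}(x) & \frac{\partial f_n}{\partial x_2}(x) & \cdots & \frac{\partial f_n}{\partial x_m}(x) \end{pmatrix}, \]

so x is a critical point of f if and only if (x_2, . . . , x_m) is a critical point of f^{x_1}, and consequently

C ∩ V = ⋃_t {t} × C^t and f(C ∩ V) = ⋃_t {t} × f^t(C^t).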

Since the result is known to be true with (m, n) replaced by (m − 1, n − 1), each f^t(C^t) has (n − 1)-dimensional measure zero. In addition, the continuity of the relevant partial derivatives implies that C \ C_1 is locally closed, so Proposition 11.2.1 implies that f(C ∩ V) has measure zero.

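Proof of (b): As above, it is enough to show that an arbitrary x ∈ C_i \ C_{i+1} has a neighborhood V such that f(C_i ∩ V) has measure zero. Choose a partial derivative

\[ \frac{\partial^{i+1} f}{\partial x_{s_1} \cdots \partial x_{s_i}\, \partial x_{s_{i+1}}} \]

that does not vanish at x. Define h : U → R^m by

\[ h(x) := \Bigl( \frac{\partial^{i} f}{\partial x_{s_1} \cdots \partial x_{s_i}}(x), x_2, \ldots, x_m \Bigr). \]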


After reindexing we may assume that s_{i+1} = 1, so that the matrix of partial derivatives of h at x is triangular with nonzero diagonal entries. By the inverse function theorem the restriction of h to some neighborhood V of x is a C^∞ diffeomorphism. Let g := f ∘ (h|_V)^{−1}. Then h(V ∩ C_i) ⊂ {0} × R^{m−1}. Let

g_0 : { y ∈ R^{m−1} : (0, y) ∈ h(V) } → R^n

be the map g_0(y) = g(0, y). Then f(V ∩ (C_i \ C_{i+1})) is contained in the set of critical values of g_0, and the latter set has measure zero because the result is already known when (m, n) is replaced by (m − 1, n).

Proof of (c): Since U can be covered by countably many compact cubes, it suffices to show that f(C_r ∩ I) has measure zero whenever I ⊂ U is a compact cube. Since I is compact and the partials of f of order r are continuous, Taylor’s theorem implies that for every ε > 0 there is δ > 0 such that

‖f(x + h) − f(x)‖ ≤ ε‖h‖^r

whenever x, x + h ∈ I with x ∈ C_r and ‖h‖ < δ. Let L be the side length of I. For each integer d > 0 divide I into d^m subcubes of side length L/d. The diameter of such a subcube is √m L/d. If this quantity is less than δ and the subcube contains a point x ∈ C_r, then its image is contained in a cube of side length 2ε(√m L/d)^r centered at f(x). There are d^m subcubes of I, each one of which may or may not contain a point in C_r, so for large d, f(C_r ∩ I) is contained in a finite union of cubes of total volume at most (2(√m L)^r)^n ε^n d^{m−nr}. Now observe that nr ≥ m: either m < n and r ≥ 1, or m ≥ n and

nr ≥ n(m − n + 1) = (n − 1)(m − n) + m ≥ m.

Therefore f(C_r ∩ I) is contained in a finite union of cubes of total volume at most (2(√m L)^r)^n ε^n, and ε may be arbitrarily small.

Instead of worrying about just which degree of differentiability is the smallest that allows all required applications of Sard’s theorem, in the remainder of the book we will, for the most part, work with objects that are smooth, where smooth is a synonym for C^∞. This will result in no loss of generality, since for the most part the arguments depend on the existence of smooth objects, which will follow from Proposition 10.2.7. However, in Chapter 15 there will be given objects that may, in applications, be only C^1, but Sard’s theorem will be applicable because the domain and range have the same dimension. It is perhaps worth mentioning that for this particular case there is a simpler proof, which can be found on p. 72 of Spivak (1965).

We briefly describe the most general and powerful version of Sard’s theorem, which depends on a more general notion of dimension.

Definition 11.3.2. For α > 0, a set S ⊂ R^k has α-dimensional Hausdorff measure zero if, for any ε > 0, there is a sequence {(x_j, δ_j)}_{j=1}^∞ such that

\[ S \subset \bigcup_j U_{\delta_j}(x_j) \quad\text{and}\quad \sum_j \delta_j^{\alpha} < \varepsilon. \]

Note that this definition makes perfect sense even if α is not an integer! Let U ⊂ R^m be open, and let f : U → R^n be a C^r function. For 0 ≤ p < m let R_p be the set of points x ∈ U such that the rank of Df(x) is less than or equal to p. The most general and sophisticated version of Sard’s theorem, due to Federer, states that f(R_p) has α-dimensional measure zero for all α > p + (m − p)/r. A beautiful informal introduction to the circle of ideas surrounding these concepts, which is the branch of analysis called geometric measure theory, is given by Morgan (1988). The proof itself is in Section 3.4 of Federer (1969). This reference also gives a complete set of counterexamples showing this result to be best possible.
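For orientation, the exponent p + (m − p)/r can be tabulated exactly. The helper below is an illustrative sketch (the name `federer_threshold` is hypothetical) that evaluates the expression as stated above with rational arithmetic; it shows, for instance, that the threshold decreases toward p as the degree of differentiability r grows.

```python
from fractions import Fraction

def federer_threshold(m, p, r):
    """Return p + (m - p)/r, the exponent in the statement quoted above."""
    if not (0 <= p < m) or r < 1:
        raise ValueError("need 0 <= p < m and r >= 1")
    return p + Fraction(m - p, r)

# m = 3, rank at most 1: the threshold shrinks toward p = 1 as r increases.
thresholds = [federer_threshold(3, 1, r) for r in (1, 2, 4, 1000)]
```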

11.4 Measure Zero Subsets of Manifolds

In most books Sard’s theorem is presented as a result concerning maps between Euclidean spaces, as in the last section, with relatively little attention to the extension to maps between manifolds. Certainly this extension is intuitively obvious, and there are no real surprises or subtleties in the details, which are laid out in this section.

Definition 11.4.1. If M ⊂ R^k is an m-dimensional C^1 manifold, then S ⊂ M has m-dimensional measure zero if ϕ^{−1}(S) has measure zero whenever U ⊂ R^m is open and ϕ : U → M is a C^1 parameterization.

In order for this to be sensible, it should be the case that ϕ(S) has measure zero whenever ϕ : U → M is a C^1 parameterization and S ⊂ U has measure zero. That is, it must be the case that if ϕ′ : U′ → M is another C^1 parameterization, then ϕ′^{−1}(ϕ(S)) has measure zero. This follows from the application of Lemma 11.1.4 to ϕ′^{−1} ∘ ϕ.

Clearly the basic properties of sets of measure zero in Euclidean spaces (the complement of a set of measure zero is dense, and countable unions of sets of measure zero have measure zero) extend, by straightforward verifications, to subsets of manifolds of measure zero. Since uncountable unions of sets of measure zero need not have measure zero, the following fact about manifolds (as we have defined them, namely submanifolds of Euclidean spaces) is comforting, even if the definition above makes it superfluous.

Lemma 11.4.2. If M ⊂ R^k is an m-dimensional C^1 manifold, then M is covered by the images of a countable system of parameterizations {ϕ_j : U_j → M}_{j=1,2,...}.

Proof. If p ∈ M and ϕ : U → M is a C^r parameterization with p ∈ ϕ(U), then there is an open set W ⊂ R^k such that ϕ(U) = M ∩ W. Of course there is an open ball B of rational radius whose center has rational coordinates with p ∈ B ⊂ W, and we may replace ϕ with its restriction to ϕ^{−1}(B). Now the claim follows from the fact that there are countably many balls in R^k of rational radii centered at points with rational coordinates.

The “conceptually correct” version of Sard’s theorem is an easy consequence of the Euclidean special case.


Theorem 11.4.3. (Morse-Sard Theorem) If f : M → N is a smooth map, where M and N are smooth manifolds, then the set of critical values of f has measure zero.

Proof. Let C be the set of critical points of f. In view of the last result it suffices to show that f(C ∩ ϕ(U)) has measure zero whenever ϕ : U → M is a parameterization for M. That is, we need to show that ψ^{−1}(f(C ∩ ϕ(U))) has measure zero whenever ψ : V → N is a parameterization for N. But ψ^{−1}(f(C ∩ ϕ(U))) is the set of critical values of ψ^{−1} ∘ f ∘ ϕ, so this follows from Theorem 11.3.1.

11.5 Genericity of Transversality

Intuitively, it should be unlikely that two smooth curves in 3-dimensional space intersect. If they happen to, it should be possible to undo the intersection by “perturbing” one of the curves slightly. Similarly, a smooth curve and a smooth surface in 3-space should intersect at isolated points, and again one expects that a small perturbation can bring about this situation if it is not the case initially. Sard’s theorem can help us express this intuition precisely.

Let M and N be m- and n-dimensional smooth manifolds, and let P be a p-dimensional smooth submanifold of N. Recall that a smooth function f : M → N is transversal to P if

Df(p)(T_pM) + T_{f(p)}P = T_{f(p)}N

for every p ∈ f^{−1}(P). The conceptual point studied in this section is that smooth functions from M to N that are transversal to P are plentiful, in the sense that any continuous function can be approximated by such a map. This is expressed more precisely by the following result.
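At a single point the transversality condition is a statement about ranks, and the curve and surface examples from the opening paragraph can be checked directly. The following sketch works in N = R³ with hypothetical spanning vectors for the image of Df(p) and for the tangent space of P; the function names are illustrative.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    rk = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][col] != 0:
                factor = rows[i][col] / rows[rk][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def is_transversal_at_point(image_of_Df, tangent_of_P, n):
    """Check Df(p)(T_pM) + T_{f(p)}P = R^n using spanning vectors."""
    return rank(list(image_of_Df) + list(tangent_of_P)) == n

# Two curves in R^3 meeting at a point: the tangent lines span at most a plane.
two_curves = is_transversal_at_point([(1, 0, 0)], [(0, 1, 0)], 3)
# A curve crossing the surface z = 0 with nonzero vertical velocity.
curve_surface = is_transversal_at_point([(1, 1, 1)], [(1, 0, 0), (0, 1, 0)], 3)
# A curve tangent to the surface z = 0 at the intersection point.
tangent_case = is_transversal_at_point([(1, 1, 0)], [(1, 0, 0), (0, 1, 0)], 3)
```

Two intersecting curves in 3-space can never be transversal at a common point, so transversality forces them to be disjoint, which matches the intuition that such intersections should be removable by perturbation.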

Proposition 11.5.1. If f : M → N is a continuous function and A ⊂ M × N is a neighborhood of Gr(f), then there is a smooth function f′ : M → N that is transverse to P with Gr(f′) ⊂ A.

In some applications the approximation will be required to satisfy some restriction. A vector field on a set S ⊂ M is a continuous function ζ : S → TM such that π ∘ ζ = Id_M, where π : TM → M is the projection. We can write ζ(p) = (p, ζ_p) where ζ_p ∈ T_pM. Thus a vector field on S attaches a tangent vector ζ_p to each p ∈ S, in a continuous manner. The zero section of TM is M × {0} ⊂ TM. If we like we can think of it as the image of the vector field that is identically zero, and of course it is also an m-dimensional smooth submanifold of TM.

Proposition 11.5.2. If ζ is a vector field on M and A ⊂ TM is an open neighborhood of { (p, ζ_p) : p ∈ M }, then there is a smooth vector field ζ′ such that { (p, ζ′_p) : p ∈ M } ⊂ A and ζ′ is transverse to the zero section of TM.

To obtain a framework that is sufficiently general we introduce an s-dimensional smooth manifold S and a smooth submersion h : N → S. We now fix a continuous function f : M → N such that h ∘ f is a smooth submersion. Our main result, which has the two results above as corollaries, is:

Theorem 11.5.3. If A ⊂ M × N is an open neighborhood of Gr(f), then there is a smooth function f′ : M → N with Gr(f′) ⊂ A, h ∘ f′ = h ∘ f, and f′ ⋔ P.

The first proposition above is the special case of this in which S is a point. To obtain the second proposition we set S = M and let h be the natural projection TM → M.

The rest of this section is devoted to the proof of Theorem 11.5.3. The proof is a matter of repeatedly modifying f on sets that are small enough that we can conduct the construction in a fully Euclidean setting. In the following result, which describes the local modification, there is an open domain U ⊂ R^m and a compact “target” set K ⊂ U. There is a closed set C ⊂ U and an open neighborhood Y of C on which the desired transversality already holds. We wish to modify the function so that the desired transversality holds on a possibly smaller neighborhood of C, and also on a neighborhood of K. However, when we apply this result U will be, in effect, an open subset of M, and in order to preserve the properties of the given function at points in the boundary of U in M it will need to be the case that the function is unchanged outside of a given compact neighborhood of K. Collectively these requirements create a significant burden of complexity.

Proposition 11.5.4. Let U be an open subset of R^m, suppose that C ⊂ W ⊂ U with C relatively closed and W open, let K be a compact subset of U, and let Z be an open neighborhood of K whose closure is a compact subset of U. Suppose that

g = (g^s, g^{n−s}) : U → R^n = R^s × R^{n−s}

is a continuous function, g|_W is smooth, and g^s is a smooth submersion. Let P be a p-dimensional smooth submanifold of R^n, and suppose that g|_C ⋔ P. Let A ⊂ U × R^n be a neighborhood of the graph of g. Then there is an open Z′ ⊂ Z containing K and a continuous function g̃^{n−s} : U → R^{n−s} such that, setting g̃ = (g^s, g̃^{n−s}) : U → R^n, we have:

(a) Gr(g̃) ⊂ A;

(b) g̃|_{U\Z} = g|_{U\Z};

(c) g̃|_{W∪Z′} is smooth;

(d) g̃ ⋔_{C∪K} P.

We first explain how this result can be used to prove Theorem 11.5.3. The next result describes how the local modification of f looks in context. Let ψ : V → N be a smooth parameterization. We say that ψ is aligned with h if h(ψ(y)) is independent of y_{s+1}, . . . , y_n.

Lemma 11.5.5. Suppose that C ⊂ W ⊂ M with C closed and W open, f|_W is smooth, and f ⋔_C P. Let A ⊂ M × N be an open set that contains the graph of f. Suppose that ϕ : U → M and ψ : V → N are smooth parameterizations with f(ϕ(U)) ⊂ ψ(V) and ψ aligned with h. Suppose that K ⊂ ϕ(U) is compact, and Z is an open subset of ϕ(U) whose closure is compact and contained in ϕ(U). Then there is an open Z′ ⊂ Z containing K and a continuous function f̃ : M → N such that:

(a) Gr(f̃) ⊂ A;

(b) f̃|_{M\Z} = f|_{M\Z};

(c) f̃|_{W∪Z′} is smooth;

(d) f̃ ⋔_{C∪K} P.

Proof. Let

P̃ = ψ^{−1}(P), g = ψ^{−1} ∘ f ∘ ϕ, Ã = { (x, y) ∈ U × V : (ϕ(x), ψ(y)) ∈ A },

C̃ = ϕ^{−1}(C), W̃ = ϕ^{−1}(W), K̃ = ϕ^{−1}(K), Z̃ = ϕ^{−1}(Z).

Let Z̃′ and g̃ be the set and function whose existence is guaranteed by the last result. Set Z′ = ϕ(Z̃′), and define f̃ by specifying that f̃ agrees with f on M \ ϕ(U), and

f̃|_{ϕ(U)} = ψ ∘ g̃ ∘ ϕ^{−1}.

Clearly f̃ has all the desired properties.

In order to apply this we need to have an ample supply of smooth parameterizations for N that are aligned with h.

Lemma 11.5.6. Each point q ∈ f(M) is contained in the image of a smooth parameterization that is aligned with h.

Proof. Let ψ : V → N be any smooth parameterization whose image contains q, and let y = ψ^{−1}(q). Let σ : W → S be a smooth parameterization whose image contains h(q); we can replace V with a smaller open set containing y, so we may assume that h(ψ(V)) ⊂ σ(W).

Since h ∘ f is a submersion, the rank of Dh(q) is s, and consequently the rank of D(σ^{−1} ∘ h ∘ ψ)(y) is also s. After some reindexing, y is a regular point of

θ : y′ ↦ (σ^{−1}(h(ψ(y′))), y′_{s+1}, . . . , y′_n).

Applying the inverse function theorem, a smooth parameterization whose image contains q that is aligned with h is given by letting ψ̃ be the restriction of ψ ∘ θ^{−1} to some neighborhood of (σ^{−1}(h(q)), y_{s+1}, . . . , y_n).

Proof of Theorem 11.5.3. Any open subset of M is a union of open subsets whose closures are compact. In view of this fact and the last result, M is covered by the sets ϕ(U) where ϕ : U → M is a smooth parameterization with f(ϕ(U)) ⊂ ψ(V) for some smooth parameterization ψ : V → N that is aligned with h, and the closure of ϕ(U) is compact. Since M is paracompact, there is a locally finite cover by the images of such parameterizations, and since M is separable, this cover is countable. That is, there is a sequence ϕ_1 : U_1 → M, ϕ_2 : U_2 → M, . . . whose images cover M, such that for each i, the closure of ϕ_i(U_i) is compact and there is a smooth parameterization ψ_i : V_i → N that is aligned with h such that f(ϕ_i(U_i)) ⊂ ψ_i(V_i).

We claim that there is a sequence K_1, K_2, . . . of compact subsets of M that cover M, with K_i ⊂ ϕ_i(U_i) for each i. For p ∈ M let δ(p) be the maximum ε such that U_ε(p) ⊂ ϕ_i(U_i) for some i, and let i_p be an integer that attains the maximum. Then δ : M → (0, ∞) is a continuous function. For each i let δ_i := min_{p ∈ ϕ_i(U_i)} δ(p), and let

K_i = { p ∈ ϕ_i(U_i) : U_{δ_i}(p) ⊂ ϕ_i(U_i) }.

Clearly K_i is a closed subset of ϕ_i(U_i), so it is compact. For any p ∈ M we have p ∈ K_{i_p}, so the sets K_1, K_2, . . . cover M.

Let C_0 = ∅, and for each positive i let C_i = K_1 ∪ . . . ∪ K_i. Let f_0 = f. Suppose for some i = 1, 2, . . . that we have already constructed a neighborhood W_{i−1} of C_{i−1} and a continuous function f_{i−1} : M → N with Gr(f_{i−1}) ⊂ A such that f_{i−1}|_{W_{i−1}} is smooth and f_{i−1} ⋔_{C_{i−1}} P. Let Z_i be an open subset of ϕ_i(U_i) that contains K_i, and whose closure is compact and contained in ϕ_i(U_i). Now Lemma 11.5.5 gives an open Z′_i ⊂ Z_i containing K_i and a continuous function f_i : M → N with Gr(f_i) ⊂ A such that f_i|_{W_{i−1}∪Z′_i} is smooth, f_i|_{M\Z_i} = f_{i−1}|_{M\Z_i}, and f_i ⋔_{C_i} P. Set W_i = W_{i−1} ∪ Z′_i. Evidently this constructive process can be extended to all i.

For each i, ϕ_i(U_i) intersects only finitely many ϕ_j(U_j), so the sequence

f_1|_{ϕ_i(U_i)}, f_2|_{ϕ_i(U_i)}, . . .

is unchanging after some point. Thus the sequence f_1, f_2, . . . has a well defined limit that is smooth and transversal to P, and whose graph is contained in A.

We now turn to the proof of Proposition 11.5.4. The main idea is to select a suitable member from a family of perturbations of g. The following lemma isolates the step in the argument that uses Sard’s theorem.

Lemma 11.5.7. If U ⊂ R^m and B ⊂ R^{n−s} are open, P is a p-dimensional smooth submanifold of R^n, and G : U × B → R^n is smooth and transversal to P, then for almost every b ∈ B the function g_b = G(·, b) : U → N is transversal to P.

Proof. Let Q = G^{−1}(P). By the transversality theorem, Q is a smooth manifold, of dimension (m + (n − s)) − (n − p) = m + p − s. Let π be the natural projection U × B → B. Sard’s theorem implies that almost every b ∈ B is a regular value of π|_Q. Fix such a b. We will show that g_b is transversal to P.

Fix x ∈ g_b^{−1}(P), set q = g_b(x), and choose some y ∈ T_qN. Since G is transversal to P there is a u ∈ T_{(x,b)}(U × B) such that y is the sum of DG(x, b)u and an element of T_qP. Let u = (v, w) where v ∈ R^m and w ∈ R^{n−s}. Since (x, b) is a regular point of π|_Q, there is a u′ ∈ T_{(x,b)}Q such that Dπ|_Q(x, b)u′ = −w. Let u′ = (v′, −w). Then T_qP contains DG(x, b)u′, so it contains

DG(x, b)u − y + DG(x, b)u′ = DG(x, b)(v + v′, 0) − y = Dg_b(x)(v + v′) − y.

Thus y is the sum of Dg_b(x)(v + v′) and an element of T_qP, as desired.

Proof of Proposition 11.5.4. For x ∈ U let α̃(x) be the supremum of the set of α_x > 0 such that (x, y) ∈ A whenever y ∈ U_{α_x}(g(x′)). Clearly α̃ is continuous and positive, so (e.g., Proposition 10.2.7 applied to α̃/2) there is a smooth function α : U → (0, ∞) such that 0 < α(x) < α̃(x) for all x.

There is a neighborhood Y ⊂ U of C such that g|_Y is smooth with g|_Y ⋔ P. Let Y′ be an open subset of U with C ⊂ Y′ whose closure is contained in Y. Corollary 10.2.5 gives a smooth function β : U → [0, 1] that vanishes on Y′ and is identically equal to one on U \ Y.

Let B be the open unit disk centered at the origin in R^{n−s}, and let G : U × B → R^n be the smooth function

G(x, b) = ( g^s(x), g^{n−s}(x) + α(x)β(x)b ).

For any (x, b) the image of DG(x, b) contains the image of Dg(x), so, since g|_Y ⋔ P, we have G ⋔_{Y×B} P. Since g^s is a submersion, at every (x, b) such that β(x) > 0 the image of DG(x, b) is all of R^n, so G ⋔_{(U\Y)×B} P. Therefore G ⋔ P. The last result implies that for some b ∈ B, g_b = G(·, b) is transversal to P. Evidently g_b agrees with g on Y′.

Let Z′ be an open subset of U with K ⊂ Z′ whose closure is contained in Z. Corollary 10.2.5 gives a smooth γ : U → [0, 1] that is identically one on Z′ and vanishes on U \ Z. Define g̃ by setting

g̃(x) = ( g^s(x), γ(x) g_b^{n−s}(x) + (1 − γ(x)) g^{n−s}(x) ).

Clearly Gr(g̃) ⊂ A, g̃ is smooth on W ∪ Z′, and g̃ agrees with g on U \ Z. Moreover, g̃ agrees with g on Y′ and with g_b on Z′, so g̃ ⋔_{Y′∪Z′} P.

Chapter 12

Degree Theory

Orientation is an intuitively familiar phenomenon, modelling, among other things, the fact that there is no way to turn a left shoe into a right shoe by rotating it, but the mirror image of a left shoe is a right shoe. Consider that when you look at a mirror there is a coordinate system in which the map taking each point to its mirror image is the linear transformation (x_1, x_2, x_3) ↦ (−x_1, x_2, x_3). It turns out that the critical feature of this transformation is that its determinant is negative. Section 12.1 describes the formalism used to impose an orientation on a vector space and consistently on the tangent spaces of the points of a manifold, when this is possible.

Section 12.2 discusses two senses in which an orientation on a given object induces a derived orientation: a) an orientation on a ∂-manifold induces an orientation of its boundary; b) given a smooth map between two manifolds of the same dimension, an orientation of the tangent space of a regular point in the domain induces an orientation of the tangent space of that point’s image. If both manifolds are oriented, we can define a sense in which the map is orientation preserving or orientation reversing by comparing the induced orientation of the tangent space of the image point with its given orientation.

In Section 12.3 we first define the smooth degree of a smooth (where “smooth” now means C^∞) map over a regular value in the range to be the number of preimages of the point at which the map is orientation preserving minus the number of points at which it is orientation reversing. Although the degree for smooth functions provides the correct geometric intuition, it is insufficiently general. The desired generalization is achieved by approximating a continuous function with smooth functions, and showing that any two sufficiently accurate approximations are homotopic, so that such approximations can be used to define the degree of the given continuous function. However, instead of working directly with such a definition, it turns out that an axiomatic characterization is more useful.

12.1 Orientation

The intuition underlying orientation is simple enough, but the formalism is a bit heavy, with the main definitions expressed as equivalence classes of an equivalence relation. We assume prior familiarity with the main facts about determinants of matrices.

No doubt most readers are well aware that a linear automorphism (that is, a linear transformation from a vector space to itself) has a determinant. What we mean by this is that the determinant of the matrix representing the transformation does not depend on the choice of coordinate system. Concretely, if L and L′ are the matrices of the transformation in two coordinate systems, then there is a matrix U (expressing the change of coordinates) such that L′ = U^{−1}LU, so that

|L′| = |U^{−1}LU| = |U|^{−1} |L| |U| = |L|.

Let X be an m-dimensional vector space. An ordered basis of X is an ordered m-tuple (v_1, . . . , v_m) of linearly independent vectors in X. Mostly we will omit the parentheses, writing v_1, . . . , v_m when the interpretation is clear. If v_1, . . . , v_m and v′_1, . . . , v′_m are ordered bases, we say that they have the same orientation if the determinant |L| of the linear map L taking v_1 ↦ v′_1, . . . , v_m ↦ v′_m is positive, and otherwise they have the opposite orientation. To verify that “has the same orientation as” is an equivalence relation we observe that it is reflexive because the determinant of the identity matrix is positive, symmetric because the determinant of L^{−1} is 1/|L|, and transitive because the determinant of the composition of two linear functions is the product of their determinants.

The last fact also implies that if v_1, . . . , v_m and v′_1, . . . , v′_m have the opposite orientation, and v′_1, . . . , v′_m and v′′_1, . . . , v′′_m also have the opposite orientation, then v_1, . . . , v_m and v′′_1, . . . , v′′_m must have the same orientation, so there are precisely two equivalence classes. An orientation for X is one of these equivalence classes. An oriented vector space is a vector space for which one of the two orientations has been specified. An ordered basis of an oriented vector space is said to be positively oriented (negatively oriented) if it is (not) an element of the specified orientation.
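In coordinates the comparison of two ordered bases reduces to comparing the signs of two determinants. The sketch below assumes X = R^m with each basis given as a list of coordinate tuples; the function names are illustrative, and the Laplace expansion is chosen for brevity rather than efficiency.

```python
from fractions import Fraction

def det(matrix):
    """Determinant by Laplace expansion along the first row (fine for small m)."""
    n = len(matrix)
    if n == 0:
        return Fraction(1)
    total = Fraction(0)
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in matrix[1:]]
        total += (-1) ** j * Fraction(matrix[0][j]) * det(minor)
    return total

def same_orientation(basis, other):
    """True if the linear map taking basis[i] to other[i] has positive determinant.

    Writing the basis vectors as the rows of matrices V and V', that map has
    determinant det(V') / det(V), so only the two signs matter."""
    d1 = det([list(v) for v in basis])
    d2 = det([list(v) for v in other])
    if d1 == 0 or d2 == 0:
        raise ValueError("both arguments must be bases")
    return d1 * d2 > 0

e = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
swapped = [(0, 1, 0), (1, 0, 0), (0, 0, 1)]   # transposition of e_1 and e_2
cyclic = [(0, 1, 0), (0, 0, 1), (1, 0, 0)]    # cyclic permutation of e
mirror = [(-1, 0, 0), (0, 1, 0), (0, 0, 1)]   # the mirror reflection
```

The bases `swapped` and `mirror` are each oppositely oriented to `e`, and hence have the same orientation as each other, which is the two-class phenomenon described above.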

Since the determinant is continuous, each orientation is an open subset of the set of ordered bases of X. The two orientations are disjoint, and their union is the entire set of ordered bases, so each path component of the space of ordered bases is contained in one of the two orientations. If the space of ordered bases had more than two path components, it would be possible to develop an invariant that was more sophisticated than orientation. But this is not the case.

Proposition 12.1.1. Each orientation of X is path connected.

Proof. Fix a “standard” basis e_1, . . . , e_m and some ordered basis v_1, . . . , v_m. We will show that there is a path in the space of ordered bases from v_1, . . . , v_m to either e_1, e_2, . . . , e_m or −e_1, e_2, . . . , e_m. Thus the space of ordered bases has at most two path components, and since each orientation is a nonempty union of path components, each must be a path component.

If i ≠ j, then for any t ∈ R the determinant of the linear transformation taking v_1, . . . , v_m to v_1, . . . , v_i + tv_j, . . . , v_m is one, so varying t gives a continuous path in the space of ordered bases. Combining such paths, we can find a path from v_1, . . . , v_m to w_1, . . . , w_m where w_i = ∑_j b_{ij}e_j with b_{ij} ≠ 0 for all i and j. Beginning at w_1, . . . , w_m, such paths can be combined to eliminate all off diagonal coefficients, arriving at an ordered basis of the form c_1e_1, . . . , c_me_m. From here we can continuously rescale the coefficients, arriving at an ordered basis d_1e_1, . . . , d_me_m with d_i = ±1 for all i.

For any ordered basis v_1, . . . , v_m and any i = 1, . . . , m − 1 there is a path

θ ↦ (v_1, . . . , v_{i−1}, cos θ v_i + sin θ v_{i+1}, cos θ v_{i+1} − sin θ v_i, v_{i+2}, . . . , v_m)

from [0, π] to the space of ordered bases. Evidently such paths can be combined to construct a path from d_1e_1, . . . , d_me_m to ±e_1, e_2, . . . , e_m.
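The rotation path can be checked numerically in the plane spanned by the two affected vectors. The sketch below uses hypothetical vectors v, w in R² and floating point arithmetic: the determinant stays constant along the path (so the pair remains a basis throughout), and at θ = π both vectors have changed sign, which is how pairs of minus signs are removed.

```python
import math

def rotate_pair(v, w, theta):
    """The pair (cos θ v + sin θ w, cos θ w − sin θ v) from the path above."""
    c, s = math.cos(theta), math.sin(theta)
    new_v = tuple(c * a + s * b for a, b in zip(v, w))
    new_w = tuple(c * b - s * a for a, b in zip(v, w))
    return new_v, new_w

def det2(v, w):
    return v[0] * w[1] - v[1] * w[0]

v, w = (2.0, 1.0), (0.5, 3.0)          # illustrative basis of R^2
dets = [det2(*rotate_pair(v, w, k * math.pi / 8)) for k in range(9)]
end_v, end_w = rotate_pair(v, w, math.pi)
```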

This result has a second interpretation. The general linear group of X is the group GL(X) of all nonsingular linear transformations L : X → X, with composition as the group operation. The identity component of GL(X) is the subgroup GL^+(X) of linear transformations with positive determinant. If we fix a particular basis e_1, . . . , e_m there is a bijection L ↔ (Le_1, . . . , Le_m) between GL(X) and the set of ordered bases of X, which gives the following version of the last result.

Corollary 12.1.2. GL^+(X) is path connected.

We wish to extend the notion of orientation to ∂-manifolds. Let M ⊂ R^k be an m-dimensional smooth ∂-manifold. Roughly, an orientation of M is a “continuous” assignment of orientations to the tangent spaces at the various points of M. One way to do this is to require that if ϕ : U → M is a smooth parameterization, where U is a connected open subset of X, and (v_1, . . . , v_m) is an ordered basis of X, then the bases (Dϕ(x)v_1, . . . , Dϕ(x)v_m) are all either positively oriented or negatively oriented. The method we adopt is a bit more concrete, and its explanation is a bit long winded, but the tools we obtain will be useful later.

A path in M is a continuous function γ : [a, b] → M, where a < b. Fix such a γ. A vector field along γ is a continuous function from [a, b] to R^k that maps each t to an element of T_{γ(t)}M. A moving frame along γ is an m-tuple v = (v_1, . . . , v_m) of vector fields along γ such that for each t, v(t) = (v_1(t), . . . , v_m(t)) is a basis of T_{γ(t)}M. More generally, for h = 0, . . . , m a moving h-frame along γ is an h-tuple v = (v_1, . . . , v_h) of vector fields along γ such that for each t, v_1(t), . . . , v_h(t) are linearly independent.

We need to know that moving frames exist in a variety of circumstances.

Proposition 12.1.3. For any h = 0, . . . , m − 1, any moving h-frame v = (v_1, . . . , v_h) along γ, and any v_{h+1} ∈ T_{γ(a)}M such that v_1(a), . . . , v_h(a), v_{h+1} are linearly independent, there is a vector field v_{h+1} along γ such that v_{h+1}(a) = v_{h+1} and (v_1, . . . , v_h, v_{h+1}) is a moving (h + 1)-frame for γ.

There are two parts to the argument, the first of which is concrete and geometric.

Lemma 12.1.4. If η : [a, b] → R^m is a path in R^m, then for any h = 0, . . . , m − 1, any moving h-frame u = (u_1, . . . , u_h) along η, and any u_{h+1} ∈ R^m such that u_1(a), . . . , u_h(a), u_{h+1} are linearly independent, there is a vector field u_{h+1} along η such that u_{h+1}(a) = u_{h+1} and (u_1, . . . , u_h, u_{h+1}) is a moving (h + 1)-frame for η.


Proof. If v_1, . . . , v_h, w ∈ R^m and v_1, . . . , v_h are linearly independent, let

π(v_1, . . . , v_h, w) = w − ∑_i β_i v_i

be the projection of w onto the orthogonal complement of the span of v_1, . . . , v_h. The numbers β_i are the solution of the linear system ⟨∑_i β_i v_i, v_j⟩ = ⟨w, v_j⟩, so the continuity of matrix inversion implies that π is a continuous function.
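The projection π is computed by solving the Gram system for the coefficients β_i. The following sketch does this with exact rational arithmetic; the helper names `dot`, `solve` and `project` are illustrative, and the vectors in the example are arbitrary.

```python
from fractions import Fraction

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def solve(A, rhs):
    """Solve A β = rhs by Gauss-Jordan elimination (A assumed nonsingular)."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if M[i][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for i in range(n):
            if i != col:
                M[i] = [a - M[i][col] * b for a, b in zip(M[i], M[col])]
    return [row[n] for row in M]

def project(vs, w):
    """π(v_1, ..., v_h, w) = w − Σ β_i v_i, where ⟨Σ β_i v_i, v_j⟩ = ⟨w, v_j⟩."""
    gram = [[dot(vi, vj) for vi in vs] for vj in vs]
    beta = solve(gram, [dot(w, vj) for vj in vs])
    return tuple(wk - sum(b * v[k] for b, v in zip(beta, vs))
                 for k, wk in enumerate(w))

vs = [(1, 1, 0), (0, 1, 1)]
p = project(vs, (1, 2, 3))
```

The Gram matrix is nonsingular precisely because v_1, . . . , v_h are linearly independent, which is why the hypothesis is needed for π to be well defined.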

First suppose that u_{h+1} is a unit vector that is orthogonal to u_1(a), . . . , u_h(a). Let s̄ be the least upper bound of the set of s in [a, b] such that there is a continuous u_{h+1} : [a, s] → R^m with u_{h+1}(t) orthogonal to u_1(t), . . . , u_h(t) and ‖u_{h+1}(t)‖ = 1 for all t. The set of pairs (s, u_{h+1}(s)) for such functions has a point of the form (s̄, v_{h+1}) in its closure. The continuity of the inner product implies that v_{h+1} is a unit vector that is orthogonal to u_1(s̄), . . . , u_h(s̄). The continuity of π implies that there is an ε > 0 and a neighborhood U of v_{h+1}, which we may insist is convex, such that π(u_1(t), . . . , u_h(t), u) ≠ 0 for all t ∈ [s̄ − ε, s̄ + ε] ∩ [a, b] and all u ∈ U. We can choose s ∈ [s̄ − ε, s̄) and a function u_{h+1} : [a, s] → R^m satisfying the conditions above with u_{h+1}(s) ∈ U. We extend u_{h+1} to all of [a, min{s̄ + ε, b}] by setting u_{h+1}(t) = ũ_{h+1}(t)/‖ũ_{h+1}(t)‖ where

\[ \tilde u_{h+1}(t) = \begin{cases} \pi\bigl(u_1(t), \ldots, u_h(t), \tfrac{\bar s - t}{\bar s - s}\, u_{h+1}(s) + \tfrac{t - s}{\bar s - s}\, v_{h+1}\bigr), & s \le t \le \bar s, \\ \pi\bigl(u_1(t), \ldots, u_h(t), v_{h+1}\bigr), & \bar s \le t. \end{cases} \]

Then u_{h+1} contradicts the definition of s̄ if s̄ < b, and for s̄ = b it provides a satisfactory function.

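To prove the general case we write $u_{h+1} = \sum_i \alpha_i \mathbf{u}_i(a) + \beta u'_{h+1}$ where $u'_{h+1}$ is a unit vector that is orthogonal to $\mathbf{u}_1(a), \ldots, \mathbf{u}_h(a)$. If $\mathbf{u}'_{h+1}$ is the function constructed in the last paragraph with $u'_{h+1}$ in place of $u_{h+1}$, then we can let $\mathbf{u}_{h+1} = \sum_i \alpha_i \mathbf{u}_i + \beta \mathbf{u}'_{h+1}$.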
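The projection $\pi$ used in the proof of Lemma 12.1.4 can be computed exactly as described there, by solving the Gram system for the coefficients $\beta_i$. The following is a minimal sketch under that reading; the helper names `inner`, `solve`, and `project_off` are illustrative and not from the text, and exact rational arithmetic is used only to make the orthogonality check unambiguous.

```python
from fractions import Fraction

def inner(x, y):
    """Standard inner product on R^m."""
    return sum(a * b for a, b in zip(x, y))

def solve(G, r):
    """Solve G beta = r by Gauss-Jordan elimination (G assumed invertible)."""
    n = len(G)
    aug = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(G, r)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for i in range(n):
            if i != col and aug[i][col] != 0:
                factor = aug[i][col]
                aug[i] = [x - factor * y for x, y in zip(aug[i], aug[col])]
    return [aug[i][n] for i in range(n)]

def project_off(vs, w):
    """pi(v_1, ..., v_h, w) = w - sum_i beta_i v_i, with beta solving the Gram system."""
    if not vs:
        return [Fraction(x) for x in w]
    gram = [[inner(vi, vj) for vi in vs] for vj in vs]   # row j: <v_i, v_j>
    rhs = [inner(w, vj) for vj in vs]                    # <w, v_j>
    beta = solve(gram, rhs)
    return [Fraction(wk) - sum(b * v[k] for b, v in zip(beta, vs))
            for k, wk in enumerate(w)]

vs = [(1, 1, 0), (0, 1, 1)]      # linearly independent v_1, v_2 in R^3
w = (1, 2, 4)
residual = project_off(vs, w)    # lies in the orthogonal complement of span(vs)
```

The entries of the inverse Gram matrix are rational functions of the entries of the $v_i$, which is the sense in which "continuity of matrix inversion" gives continuity of $\pi$.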

The general result is obtained by applying this in the context of a finite collection of parameterizations that cover $\gamma$.

Proof of Proposition 12.1.3. There are $a = t_0 < t_1 < \cdots < t_{J-1} < t_J = b$ such that for each $j = 1, \ldots, J$, the image of $\gamma|_{[t_{j-1}, t_j]}$ is contained in the image of a smooth parameterization. We may assume that $J = 1$ because the general case can obviously be obtained from $J$ applications of this special case. Thus there is a smooth parameterization $\varphi : U \to M$ whose image contains the image of $\gamma$. Let $\psi = \varphi^{-1}$, let $\eta := \psi \circ \gamma$, let $u_{h+1} = D\psi(\gamma(a))v_{h+1}$, and define a moving $h$-frame $\mathbf{u}$ along $\eta$ by setting

\[ \mathbf{u}_1(t) := D\psi(\gamma(t))\mathbf{v}_1(t), \; \ldots, \; \mathbf{u}_h(t) := D\psi(\gamma(t))\mathbf{v}_h(t). \]

The last result gives a $\mathbf{u}_{h+1} : [a, b] \to \mathbb{R}^m$ such that $\mathbf{u}_{h+1}(a) = u_{h+1}$ and $(\mathbf{u}_1, \ldots, \mathbf{u}_h, \mathbf{u}_{h+1})$ is a moving $(h+1)$-frame along $\eta$. We define $\mathbf{v}_{h+1}$ on $[a, b]$ by setting

\[ \mathbf{v}_{h+1}(t) = D\varphi(\eta(t))\mathbf{u}_{h+1}(t). \]


Corollary 12.1.5. For any basis $v_1, \ldots, v_m$ of $T_{\gamma(a)}M$ there is a moving frame $\mathbf{v}$ along $\gamma$ such that $\mathbf{v}_h(a) = v_h$ for all $h$. If the ordered basis $v'_1, \ldots, v'_m$ of $T_{\gamma(b)}M$ has the same orientation as $\mathbf{v}_1(b), \ldots, \mathbf{v}_m(b)$, then there is a moving frame $\mathbf{v}'$ along $\gamma$ such that $\mathbf{v}'_h(a) = v_h$ and $\mathbf{v}'_h(b) = v'_h$ for all $h$.

Proof. The first assertion is obtained by applying the Proposition $m$ times. To prove the second we regard $GL(\mathbb{R}^m)$ as a group of matrices and let $\rho : [a, b] \to GL^+(\mathbb{R}^m)$ be a path with $\rho(a)$ the identity matrix and $\rho(b)$ the matrix such that $\sum_j \rho_{ij}(b)\mathbf{v}_j(b) = v'_i$ for all $i$, as per Corollary 12.1.2. Define $\mathbf{v}'$ by setting

\[ \mathbf{v}'_i(t) = \sum_j \rho_{ij}(t)\mathbf{v}_j(t). \]

Given a moving frame $\mathbf{v}$ and an orientation of $T_{\gamma(a)}M$, there is an induced orientation of $T_{\gamma(b)}M$ defined by requiring that $\mathbf{v}(b)$ is positively oriented if and only if $\mathbf{v}(a)$ is positively oriented. The last result implies that it is always possible to induce an orientation in this way, because a moving frame always exists, and the next result asserts that the induced orientation does not depend on the choice of moving frame, so there is a well defined orientation of $T_{\gamma(b)}M$ that is induced by $\gamma$ and an orientation of $T_{\gamma(a)}M$.

Lemma 12.1.6. If $\mathbf{v}$ and $\tilde{\mathbf{v}}$ are two moving frames along a path $\gamma : [a, b] \to M$, then $\mathbf{v}(a)$ and $\tilde{\mathbf{v}}(a)$ have the same orientation if and only if $\mathbf{v}(b)$ and $\tilde{\mathbf{v}}(b)$ have the same orientation.

Proof. For $a \le t \le b$ let $A(t) = (a_{ij}(t))$ be the matrix such that $\tilde{\mathbf{v}}_i(t) = \sum_{j=1}^m a_{ij}(t)\mathbf{v}_j(t)$. Then $A$ is continuous, and the determinant is continuous, so $t \mapsto |A(t)|$ is a continuous function that never vanishes, and consequently $|A(a)| > 0$ if and only if $|A(b)| > 0$.

If $\gamma(b) = \gamma(a)$ and a given orientation of $T_{\gamma(a)}M = T_{\gamma(b)}M$ differs from the one induced by the given orientation and $\gamma$, then we say that $\gamma$ is an orientation reversing loop. Suppose that $M$ has no orientation reversing loops. For any choice of a “base point” $p_0$ in each path component of $M$ and any specification of an orientation of each $T_{p_0}M$, there is an induced orientation of $T_pM$ for each $p \in M$ defined by requiring that whenever $\gamma : [a, b] \to M$ is a continuous path with $\gamma(a) = p_0$, the orientation of $T_{\gamma(b)}M$ is the one induced by $\gamma$ and the given orientation of $T_{p_0}M$. If $\gamma' : [a', b'] \to M$ is a second path with $\gamma'(a') = \gamma(a)$ and $\gamma'(b') = \gamma(b)$, then for any given orientation of $T_{\gamma(a)}M$ the orientations of $T_{\gamma(b)}M$ induced by $\gamma$ and $\gamma'$ must be the same because otherwise following $\gamma$, then backtracking along $\gamma'$, would be an orientation reversing loop. Thus, in the absence of orientation reversing loops, an orientation of $T_{p_0}M$ induces an orientation at every $p$ in the path component of $p_0$.

We have arrived at the following collection of concepts.

Definition 12.1.7. An orientation for $M$ is an assignment of an orientation to each tangent space $T_pM$ such that for every moving frame $\mathbf{v}$ along a path $\gamma : [a, b] \to M$, $\mathbf{v}(a)$ is a positively oriented basis of $T_{\gamma(a)}M$ if and only if $\mathbf{v}(b)$ is a positively oriented basis of $T_{\gamma(b)}M$. We say that $M$ is orientable if it has an orientation. An oriented $\partial$-manifold is a $\partial$-manifold with a specified orientation. If $p$ is a point in an oriented $\partial$-manifold $M$, we say that an ordered basis $(x_1, \ldots, x_m)$ of $T_pM$ is positively oriented if it is a member of the orientation of $T_pM$ specified by the orientation of $M$; otherwise it is negatively oriented. For any orientation of $M$ there is an opposite orientation obtained by reversing the orientation of each $T_pM$.

Our discussion above has the following summary:

Proposition 12.1.8. Exactly one of the following two situations occurs:

(a) $M$ has an orientation reversing loop.

(b) Each path component of $M$ has two orientations, and any specification of an orientation for each path component of $M$ determines an orientation of $M$.

Probably you already know that the Moebius strip is the best known example of a $\partial$-manifold that is not orientable, while the Klein bottle is the best known example of a manifold that is not orientable. From several points of view two dimensional projective space is a more fundamental example of a manifold that is not orientable, but it is more difficult to visualize. (If you are unfamiliar with any of these spaces you should do a quick web search.)
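The orientation reversing loop on the Moebius strip can be exhibited numerically. The sketch below assumes the familiar parameterization $\varphi(\theta, s) = ((1 + s\cos(\theta/2))\cos\theta, (1 + s\cos(\theta/2))\sin\theta, s\sin(\theta/2))$, which is a choice made here for illustration and does not come from the text. Along the center circle $s = 0$ the partial derivatives of $\varphi$ form a moving frame, and comparing the frame at $\theta = 0$ with the frame at $\theta = 2\pi$ (same base point) gives a change of basis with negative determinant.

```python
import math

def center_frame(theta):
    """Frame (d phi/d theta, d phi/d s) along the center circle s = 0 of the strip."""
    v1 = (-math.sin(theta), math.cos(theta), 0.0)
    v2 = (math.cos(theta / 2) * math.cos(theta),
          math.cos(theta / 2) * math.sin(theta),
          math.sin(theta / 2))
    return v1, v2

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def change_of_basis_det(frame_a, frame_b):
    """det of A with b_i = sum_j A[i][j] a_j, valid because frame_a is orthonormal."""
    a = [[dot(b_i, a_j) for a_j in frame_a] for b_i in frame_b]
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

start = center_frame(0.0)            # frame at gamma(a)
end = center_frame(2 * math.pi)      # frame at gamma(b) = gamma(a)
det_loop = change_of_basis_det(start, end)
```

A determinant close to $-1$ says that the moving frame returns to the starting tangent plane with the opposite orientation, so the center circle is an orientation reversing loop in the sense defined above.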

12.2 Induced Orientation

An orientation on a manifold induces an orientation on an open subset, obviously. More interesting is the orientation induced on $\partial M$ by an orientation on a $\partial$-manifold $M$. We are also interested in how an orientation on a point in the domain of a smooth map between manifolds of equal dimension induces an orientation on the tangent space of the image point. As we will see, this generalizes to the image point being in an oriented submanifold whose codimension is the dimension of the domain.

As before we work with an $m$-dimensional smooth $\partial$-manifold $M$ with a given orientation. Consider a point $p \in \partial M$ and a basis $v_1, \ldots, v_m$ of $T_pM$ with $v_2, \ldots, v_m \in T_p\partial M$. Of course $v_2, \ldots, v_m$ is a basis of $T_p\partial M$. There is a visually obvious sense in which $v_1$ is either “inward pointing” or “outward pointing” that is made precise by using a parameterization $\varphi : U \to M$ (where $U \subset H$ is open) to determine whether the first component of $D\varphi^{-1}(p)v_1$ is positive or negative. Note that the sets of inward and outward pointing vectors are both convex. Our convention will be that an orientation of $T_pM$ induces an orientation of $T_p\partial M$ according to the rule that if $v_1, \ldots, v_m$ is positively oriented and $v_1$ is outward pointing, then $v_2, \ldots, v_m$ is positively oriented.

Does our definition of the induced orientation make sense? There are two issues to address.

First, we need to show that if $v_1, \ldots, v_m$ and $v'_1, \ldots, v'_m$ are two bases of $T_pM$ with the properties described in the definition above, so that either could be used to define the induced orientation of $T_p\partial M$, then they give the same induced orientation. Suppose that $v_1$ and $v'_1$ are both outward pointing. Since the set of outward pointing vectors is convex,

\[ t \mapsto \bigl((1 - t)v'_1 + tv_1, v'_2, \ldots, v'_m\bigr) \]

is a path in the space of bases of $T_pM$, so $v'_1, \ldots, v'_m$ and $v_1, v'_2, \ldots, v'_m$ have the same orientation. The first row of the matrix $A$ of the linear transformation taking $v_1, v_2, \ldots, v_m$ to $v_1, v'_2, \ldots, v'_m$ (concretely, $v'_i = \sum_j a_{ij}v_j$) is $(1, 0, \ldots, 0)$, so the determinant of $A$ is the same as the determinant of its lower right hand $(m-1) \times (m-1)$ submatrix, which is the matrix of the linear transformation taking $v_2, \ldots, v_m$ to $v'_2, \ldots, v'_m$. Therefore $v_1, \ldots, v_m$ has the same orientation as $v_1, v'_2, \ldots, v'_m$ and $v'_1, \ldots, v'_m$ if and only if $v_2, \ldots, v_m$ has the same orientation as $v'_2, \ldots, v'_m$.

We also need to check that what we have defined as the induced orientation of $\partial M$ is, in fact, an orientation. Consider a path $\gamma : [a, b] \to \partial M$. Corollary 12.1.5 gives a moving frame $(\mathbf{v}_2, \ldots, \mathbf{v}_m)$ for $\partial M$ along $\gamma$, and Proposition 12.1.3 implies that it extends to a moving frame $(\mathbf{v}_1, \ldots, \mathbf{v}_m)$ for $M$ along $\gamma$. Suppose that $\mathbf{v}_1(a)$ is outward pointing. By continuity, it must be the case that for all $t$, $\mathbf{v}_1(t)$ is outward pointing. If we assume that $\mathbf{v}_1(a), \ldots, \mathbf{v}_m(a)$ is positively oriented, for the given orientation, then $\mathbf{v}_2(a), \ldots, \mathbf{v}_m(a)$ is positively oriented, for the induced orientation. In addition, $\mathbf{v}_1(b), \ldots, \mathbf{v}_m(b)$ is positively oriented, for the given orientation, so, as desired, $\mathbf{v}_2(b), \ldots, \mathbf{v}_m(b)$ is positively oriented, both with respect to the induced orientation and with respect to the orientation induced by $\gamma$ and $\mathbf{v}_2(a), \ldots, \mathbf{v}_m(a)$.
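For a concrete picture of the "outward vector first" convention, consider the closed unit disk in $\mathbb{R}^2$ with the standard orientation. The sketch below assumes that the radial vector at a boundary point is outward pointing in the everyday sense (the text fixes the sign through a parameterization, which is not reproduced here); under that assumption the induced orientation of the boundary circle is the counterclockwise one. The function names are illustrative.

```python
import math

def det2(u, v):
    """Determinant of the 2x2 matrix whose columns are u and v."""
    return u[0] * v[1] - u[1] * v[0]

def boundary_tangent_is_positive(p, tangent):
    """Unit disk, standard orientation: p itself is taken as the outward vector v_1,
    and `tangent` (playing the role of v_2) is positively oriented for the induced
    orientation exactly when (v_1, v_2) is a positively oriented basis of R^2."""
    return det2(p, tangent) > 0

angles = [0.1 + k * math.pi / 4 for k in range(8)]
counterclockwise = [
    boundary_tangent_is_positive((math.cos(a), math.sin(a)),
                                 (-math.sin(a), math.cos(a)))
    for a in angles
]
clockwise = [
    boundary_tangent_is_positive((math.cos(a), math.sin(a)),
                                 (math.sin(a), -math.cos(a)))
    for a in angles
]
```

The answer is the same at every sampled boundary point, which is the consistency along paths in $\partial M$ that the preceding paragraph verifies in general.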

Now suppose that $M$ and $N$ are two $m$-dimensional oriented smooth manifolds, now without boundary, and that $f : M \to N$ is a smooth function. If $p$ is a regular point of $f$, we say that $f$ is orientation preserving at $p$ if $Df(p)$ maps positively oriented bases of $T_pM$ to positively oriented bases of $T_{f(p)}N$; otherwise $f$ is orientation reversing at $p$. This makes sense because if $v_1, \ldots, v_m$ and $v'_1, \ldots, v'_m$ are two bases of $T_pM$, then the matrix of the linear transformation taking each $v_i$ to $v'_i$ is the same as the matrix of the linear transformation taking each $Df(p)v_i$ to $Df(p)v'_i$.

We can generalize this in a way that does not play a very large role in later developments, but does provide some additional illumination at little cost. Suppose that $M$ is an oriented $m$-dimensional smooth $\partial$-manifold, $N$ is an oriented $n$-dimensional boundaryless manifold, $P$ is an oriented $(n-m)$-dimensional submanifold of $N$, and $f : M \to N$ is a smooth map that is transversal to $P$. We say that $f$ is positively oriented relative to $P$ at a point $p \in f^{-1}(P)$ if

\[ Df(p)v_1, \ldots, Df(p)v_m, w_1, \ldots, w_{n-m} \]

is a positively oriented ordered basis of $T_{f(p)}N$ whenever $v_1, \ldots, v_m$ is a positively oriented ordered basis of $T_pM$ and $w_1, \ldots, w_{n-m}$ is a positively oriented ordered basis of $T_{f(p)}P$. It is easily checked that whether or not this is the case does not depend on the choice of positively oriented ordered bases $v_1, \ldots, v_m$ of $T_pM$ and $w_1, \ldots, w_{n-m}$ of $T_{f(p)}P$. When this is not the case we say that $f$ is negatively oriented relative to $P$ at $p$.

Now, in addition, suppose that $f^{-1}(P)$ is finite. The oriented intersection number $I(f, P)$ is the number of points in $f^{-1}(P)$ at which $f$ is positively oriented relative to $P$ minus the number of points at which $f$ is negatively oriented relative to $P$. An idea of critical importance for the entire project is that under natural and relevant conditions this number is a homotopy invariant. This corresponds to the special case of the following result in which $M$ is the cartesian product of a boundaryless manifold and $[0, 1]$.

Theorem 12.2.1. Suppose that $M$ is an $(m+1)$-dimensional oriented smooth manifold, $N$ is an $n$-dimensional smooth manifold, $P$ is a compact $(n-m)$-dimensional smooth submanifold of $N$ and $f : M \to N$ is a smooth function that is transverse to $P$ with $f^{-1}(P)$ compact. Then

\[ I(f|_{\partial M}, P) = 0. \]

Proof. Proposition 10.8.5 implies that $f^{-1}(P)$ is a neat smooth $\partial$-submanifold of $M$. Since $f^{-1}(P)$ is compact, it has finitely many connected components, and Proposition 10.9.1 implies that each of these is either a loop or a line segment. Recalling the definition of neatness, we see that the elements of $f^{-1}(P) \cap \partial M$ are the endpoints of the line segments. Fix one of the line segments. It suffices to show that $f|_{\partial M}$ is positively oriented relative to $P$ at one endpoint and negatively oriented relative to $P$ at the other.

The line segment is a smooth $\partial$-manifold, and by gluing together smooth parameterizations of open subsets, using a partition of unity, we can construct a smooth path $\gamma : [a, b] \to M$ that traverses it, with nonzero derivative everywhere. Let $\mathbf{v}_1(t) = D\gamma(t)1$ for all $t$. (Here $1$ is thought of as an element of $T_t[a, b]$ under the identification of this space with $\mathbb{R}$.) Neatness implies that $\mathbf{v}_1(a)$ is inward pointing and $\mathbf{v}_1(b)$ is outward pointing.

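Let $v_2, \ldots, v_{m+1}$ be a basis of $T_{\gamma(a)}\partial M$. Proposition 12.1.3 implies that $\mathbf{v}_1$ extends to a moving frame $\mathbf{v}_1, \ldots, \mathbf{v}_{m+1}$ along $\gamma$ with $\mathbf{v}_2(a) = v_2, \ldots, \mathbf{v}_{m+1}(a) = v_{m+1}$. We have

\[ \mathbf{v}_j(b) = v'_j + \alpha_j\mathbf{v}_1(b) \quad (j = 2, \ldots, m+1) \]

for some basis $v'_2, \ldots, v'_{m+1}$ of $T_{\gamma(b)}\partial M$ and scalars $\alpha_2, \ldots, \alpha_{m+1}$. We can replace $\mathbf{v}$ with the moving frame given by Corollary 12.1.5 applied to the ordered basis $\mathbf{v}_1(b), v'_2, \ldots, v'_{m+1}$ of $T_{\gamma(b)}M$, so we may assume that $\mathbf{v}_2(b), \ldots, \mathbf{v}_{m+1}(b) \in T_{\gamma(b)}\partial M$. Then $\mathbf{v}_1(a), \ldots, \mathbf{v}_{m+1}(a)$ is a positively oriented basis of $T_{\gamma(a)}M$ if and only if $\mathbf{v}_1(b), \ldots, \mathbf{v}_{m+1}(b)$ is a positively oriented basis of $T_{\gamma(b)}M$. Since $\mathbf{v}_1(a)$ is inward pointing and $\mathbf{v}_1(b)$ is outward pointing, $\mathbf{v}_2(a), \ldots, \mathbf{v}_{m+1}(a)$ is a positively oriented basis of $T_{\gamma(a)}\partial M$ if and only if $\mathbf{v}_2(b), \ldots, \mathbf{v}_{m+1}(b)$ is a negatively oriented basis of $T_{\gamma(b)}\partial M$.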

Proposition 12.1.3 implies that there is a moving frame $\mathbf{w}_1, \ldots, \mathbf{w}_{n-m}$ along $f \circ \gamma : [a, b] \to P$. As we have defined orientation, $\mathbf{w}_1(a), \ldots, \mathbf{w}_{n-m}(a)$ is a positively oriented basis of $T_{f(\gamma(a))}P$ if and only if $\mathbf{w}_1(b), \ldots, \mathbf{w}_{n-m}(b)$ is a positively oriented basis of $T_{f(\gamma(b))}P$, and

\[ Df(\gamma(a))\mathbf{v}_2(a), \ldots, Df(\gamma(a))\mathbf{v}_{m+1}(a), \mathbf{w}_1(a), \ldots, \mathbf{w}_{n-m}(a) \]

is a positively oriented basis of $T_{f(\gamma(a))}N$ if and only if

\[ Df(\gamma(b))\mathbf{v}_2(b), \ldots, Df(\gamma(b))\mathbf{v}_{m+1}(b), \mathbf{w}_1(b), \ldots, \mathbf{w}_{n-m}(b) \]

is a positively oriented basis of $T_{f(\gamma(b))}N$. Combining all this, we conclude that $f|_{\partial M}$ is positively oriented relative to $P$ at $\gamma(a)$ if and only if it is negatively oriented relative to $P$ at $\gamma(b)$, which is the desired result.
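The lowest dimensional nontrivial instance of Theorem 12.2.1 can be checked by hand or by machine. Take $M$ to be the closed unit disk, $N = \mathbb{R}^2$, and $P$ the $x$-axis oriented by $w = (1, 0)$, and let $f$ be either the inclusion or the shear $f(x, y) = (x, y + 0.5x^2 - 0.2)$, both of which are diffeomorphisms onto their images and hence transversal to $P$. These choices, and the counterclockwise orientation of the boundary circle, are assumptions of this sketch. At a boundary crossing the sign is that of $\det[c'(\theta), w] = -\,dy/d\theta$, where $c = f|_{\partial M}$ written in the angle $\theta$.

```python
import math

def crossing_signs(curve, samples=4000):
    """Signs of the crossings of a closed curve theta -> (x, y) with the x-axis.

    A crossing with y increasing has det[c'(theta), (1, 0)] = -dy/dtheta < 0 and
    is recorded as -1; a crossing with y decreasing is recorded as +1.
    """
    signs = []
    # Half-step offset so that, for the curves below, no sample has y == 0.
    thetas = [2 * math.pi * (k + 0.5) / samples for k in range(samples + 1)]
    for t0, t1 in zip(thetas, thetas[1:]):
        y0, y1 = curve(t0)[1], curve(t1)[1]
        if y0 < 0 < y1:
            signs.append(-1)
        elif y1 < 0 < y0:
            signs.append(+1)
    return signs

circle = lambda t: (math.cos(t), math.sin(t))
sheared = lambda t: (math.cos(t), math.sin(t) + 0.5 * math.cos(t) ** 2 - 0.2)

circle_signs = crossing_signs(circle)
sheared_signs = crossing_signs(sheared)
```

In both cases $f^{-1}(P)$ is a single arc in the disk, its two endpoints are the two boundary crossings, and they carry opposite signs, exactly as in the proof above.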

12.3 The Degree

Let $M$ and $N$ be $m$-dimensional smooth manifolds. We can restrict a smooth $f : M \to N$ to a subset of the domain and consider the degree of the restricted function over some point in the range. The axioms characterizing the degree express relations between the degrees of the various restrictions. In order to get a “clean” theory we need to consider subdomains that are compact, and which have no preimages in their topological boundaries. (Intuitively, a preimage that is in the boundary of a compact subset is neither clearly inside the domain nor unambiguously outside it.)

For a compact $C \subset M$ let $\partial C = C \cap \overline{M \setminus C}$ be the topological boundary of $C$.

Definition 12.3.1. A continuous function $f : C \to N$ with compact domain $C \subset M$ is degree admissible over $q \in N$ if

\[ f^{-1}(q) \cap \partial C = \emptyset. \]

If, in addition, $f$ is smooth and $q$ is a regular value of $f$, then $f$ is smoothly degree admissible over $q$. Let $D(M, N)$ be the set of pairs $(f, q)$ in which $f : C \to N$ is a continuous function with compact domain $C \subset M$ that is degree admissible over $q \in N$. Let $D^\infty(M, N)$ be the set of $(f, q) \in D(M, N)$ such that $f$ is smoothly degree admissible over $q$.

Definition 12.3.2. If $C \subset M$ is compact, a homotopy $h : C \times [0, 1] \to N$ is degree admissible over $q$ if, for each $t$, $h_t$ is degree admissible over $q$. We say that $h$ is smoothly degree admissible over $q$ if, in addition, $h$ is smooth and $h_0$ and $h_1$ are smoothly degree admissible over $q$.

Proposition 12.3.3. There is a unique function $\deg^\infty : D^\infty(M, N) \to \mathbb{Z}$, taking $(f, q)$ to $\deg^\infty_q(f)$, such that:

(∆1) $\deg^\infty_q(f) = 1$ for all $(f, q) \in D^\infty(M, N)$ such that $f^{-1}(q)$ is a singleton $\{p\}$ and $f$ is orientation preserving at $p$.

(∆2) $\deg^\infty_q(f) = \sum_{i=1}^r \deg^\infty_q(f|_{C_i})$ whenever $(f, q) \in D^\infty(M, N)$, the domain of $f$ is $C$, and $C_1, \ldots, C_r$ are pairwise disjoint compact subsets of $C$ such that

\[ f^{-1}(q) \subset (C_1 \cup \ldots \cup C_r) \setminus (\partial C_1 \cup \ldots \cup \partial C_r). \]

(∆3) $\deg^\infty_q(h_0) = \deg^\infty_q(h_1)$ whenever $C \subset M$ is compact and the homotopy $h : C \times [0, 1] \to N$ is smoothly degree admissible over $q$.

Concretely, $\deg^\infty_q(f)$ is the number of $p \in f^{-1}(q)$ at which $f$ is orientation preserving minus the number of $p \in f^{-1}(q)$ at which $f$ is orientation reversing.


Proof. For $(f, q) \in D^\infty(M, N)$ the inverse function theorem implies that each $p \in f^{-1}(q)$ has a neighborhood that contains no other element of $f^{-1}(q)$, and since the domain $C$ of $f$ is compact it follows that $f^{-1}(q)$ is finite. Let $\deg^\infty_q(f)$ be the number of $p \in f^{-1}(q)$ at which $f$ is orientation preserving minus the number of $p \in f^{-1}(q)$ at which $f$ is orientation reversing.

Clearly $\deg^\infty$ satisfies (∆1) and (∆2). Suppose that $h : C \times [0, 1] \to N$ is smoothly degree admissible over $q$, and let $U = C \setminus \partial C$ be the interior of $C$. Let $V$ be a neighborhood of $q$ such that for all $q' \in V$:

(a) $h^{-1}(q') \subset U \times [0, 1]$;

(b) $q'$ is a regular value of $h_0$ and $h_1$;

(c) $\deg^\infty_{q'}(h_0) = \deg^\infty_q(h_0)$ and $\deg^\infty_{q'}(h_1) = \deg^\infty_q(h_1)$.

Sard's theorem implies that some $q' \in V$ is a regular value of $h$. In view of (a) we can apply Theorem 12.2.1, concluding that the degree of $h|_{\partial(U \times [0,1])} = h|_{U \times \{0,1\}}$ over $q'$ is zero. Since the orientation of $M \times \{0\}$ induced by $M \times [0, 1]$ is the opposite of the induced orientation of $M \times \{1\}$, this implies that $\deg^\infty_{q'}(h_0) - \deg^\infty_{q'}(h_1) = 0$, from which it follows that $\deg^\infty_q(h_0) = \deg^\infty_q(h_1)$. We have verified (∆3).

It remains to demonstrate uniqueness. In view of (∆2), this reduces to showing uniqueness for $(f, q) \in D^\infty(M, N)$ such that $f^{-1}(q) = \{p\}$ is a singleton. If $f$ is orientation preserving at $p$, this is a consequence of (∆1), so we assume that $f$ is orientation reversing at $p$.

The constructions in the remainder of the proof are easy to understand, but tedious to elaborate in detail, so we only explain the main ideas. Using the path connectedness of each orientation (Proposition 12.1.1) and an obvious homotopy between an $f$ that has $p$ as a regular point and its linear approximation, with respect to some coordinate systems for the domain and range, one can show that (∆3) implies that $\deg^\infty_q(f)$ does not depend on the particular orientation reversing $f$. Using one of the bump functions constructed after Lemma 10.2.2, one can easily construct a smooth homotopy $j : M \times [0, 1] \to M$ such that $j_0 = \mathrm{Id}_M$, each $j_t$ is a smooth diffeomorphism, and $j_1(p)$ is any point in some neighborhood of $p$. Applying (∆3) to $h = f \circ j$, we find that $\deg^\infty_q(f)$ does not depend on which point (within some neighborhood of $p$) is mapped to $q$. The final construction is a homotopy between the given $f$ and a function $f'$ that has three preimages of $q$ near $p$, with $f'$ being orientation reversing at two of them and orientation preserving at the third. In view of the other conclusions we have reached, (∆2) and (∆3) imply that

\[ \deg^\infty_q(f) = 2\deg^\infty_q(f) + 1, \]

so $\deg^\infty_q(f) = -1$, in agreement with the concrete formula.
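In dimension one the concrete formula for $\deg^\infty_q(f)$ is a signed count of roots, which is easy to compute. The sketch below assumes $M = N = \mathbb{R}$ with the standard orientation and $C = [a, b]$, so that $f$ is orientation preserving at $p$ exactly when $f'(p) > 0$. The function names and the grid based root search are illustrative choices; the search assumes that every grid cell contains at most one preimage and that no grid point is itself a preimage.

```python
def sign(x):
    return (x > 0) - (x < 0)

def bisect_root(g, lo, hi, iterations=200):
    """Locate a zero of g in [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if sign(g(mid)) == sign(g(lo)):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def smooth_degree_1d(f, df, q, a, b, cells=1000):
    """Signed count of the preimages of the regular value q under f on C = [a, b]."""
    g = lambda x: f(x) - q
    assert g(a) != 0 and g(b) != 0, "f must be degree admissible over q"
    grid = [a + (b - a) * k / cells for k in range(cells + 1)]
    degree = 0
    for lo, hi in zip(grid, grid[1:]):
        if sign(g(lo)) * sign(g(hi)) < 0:
            degree += sign(df(bisect_root(g, lo, hi)))
    return degree

f = lambda x: x**3 - 3 * x
df = lambda x: 3 * x**2 - 3

deg_at_1 = smooth_degree_1d(f, df, 1.0, -3.0, 3.0)    # three preimages: +, -, +
deg_at_5 = smooth_degree_1d(f, df, 5.0, -3.0, 3.0)    # one preimage: +
deg_at_20 = smooth_degree_1d(f, df, 20.0, -3.0, 3.0)  # no preimages
```

The values $q = 1$ and $q = 5$ lie in the same component of $\mathbb{R} \setminus f(\partial C) = \mathbb{R} \setminus \{-18, 18\}$ and give the same degree even though the number of preimages differs, while $q = 20$ lies in a different component.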

In preparation for the next result we show that $\deg^\infty$ is continuous in a rather strong sense.

Proposition 12.3.4. If $C \subset M$ is compact, $f : C \to N$ is continuous, and $q \in N \setminus f(\partial C)$, then there are neighborhoods $Z \subset C(C, N)$ of $f$ and $V \subset N \setminus f(\partial C)$ of $q$ such that

\[ \deg^\infty_{q'}(f') = \deg^\infty_{q''}(f'') \]

whenever $f', f'' \in Z \cap C^\infty(C, N)$, $q', q'' \in V$, $q'$ is a regular value of $f'$, and $q''$ is a regular value of $f''$.

Proof. Let $V$ be an open disk in $N$ that contains $q$ with $\overline{V} \subset N \setminus f(\partial C)$. Then

\[ Z' = \{\, f' \in C(C, N) : f'(\partial C) \subset N \setminus \overline{V} \,\} \]

is an open subset of $C(C, N)$, and Theorem 10.7.7 gives an open $Z \subset Z'$ containing $f$ such that for any $f', f'' \in Z \cap C^\infty(C, N)$ there is a smooth homotopy $h : C \times [0, 1] \to N$ with $h_0 = f'$, $h_1 = f''$, and $h_t \in Z'$ for all $t$, which implies that $h$ is a degree admissible homotopy, so (∆3) implies that $\deg^\infty_{q'''}(f') = \deg^\infty_{q'''}(f'')$ whenever $q''' \in V$ is a regular value of both $f'$ and $f''$.

Since Sard's theorem implies that such a $q'''$ exists, it now suffices to show that $\deg^\infty_{q'}(f') = \deg^\infty_{q''}(f')$ whenever $f' \in Z \cap C^\infty(C, N)$ and $q', q'' \in V$ are regular values of $f'$. Let $j : N \times [0, 1] \to N$ be a smooth function with the following properties:

(a) j0 = IdN ;

(b) each jt is a smooth diffeomorphism;

(c) j(y, t) = y for all y ∈ N \ V and all t;

(d) j1(q′) = q′′.

(Construction of such a $j$, using the techniques of Section 10.2, is left as an exercise.) Clearly $j_t(q')$ is a regular value of $j_t \circ f'$ for all $t$, so the concrete characterization of $\deg^\infty$ implies that $\deg^\infty_{j_t(q')}(j_t \circ f')$ is locally constant as a function of $t$. Since the unit interval is connected, it follows that $\deg^\infty_{q'}(f') = \deg^\infty_{q''}(j_1 \circ f')$. On the other hand $j_t \circ f' \in Z'$ for all $t$, so the homotopy $(y, t) \mapsto j(f'(y), t)$ is smoothly degree admissible over $q''$, and (∆3) implies that $\deg^\infty_{q''}(j_1 \circ f') = \deg^\infty_{q''}(f')$.

The theory of the degree is completed by extending the degree to continuous functions, dropping the regularity condition.

Theorem 12.3.5. There is a unique function $\deg : D(M, N) \to \mathbb{Z}$, taking $(f, q)$ to $\deg_q(f)$, such that:

(D1) $\deg_q(f) = 1$ for all $(f, q) \in D(M, N)$ such that $f$ is smooth, $f^{-1}(q)$ is a singleton $\{p\}$, and $f$ is orientation preserving at $p$.

(D2) $\deg_q(f) = \sum_{i=1}^r \deg_q(f|_{C_i})$ whenever $(f, q) \in D(M, N)$, the domain of $f$ is $C$, and $C_1, \ldots, C_r$ are pairwise disjoint compact subsets of $C$ such that

\[ f^{-1}(q) \subset (C_1 \cup \ldots \cup C_r) \setminus (\partial C_1 \cup \ldots \cup \partial C_r). \]

(D3) If $(f, q) \in D(M, N)$ and $C$ is the domain of $f$, there is a neighborhood $A \subset C(C, N) \times N$ of $(f, q)$ such that $\deg_{q'}(f') = \deg_q(f)$ for all $(f', q') \in A$.


Proof. We claim that if $\deg : D(M, N) \to \mathbb{Z}$ satisfies (D1)-(D3), then its restriction to $D^\infty(M, N)$ satisfies (∆1)-(∆3). For (∆1) and (∆2) this is automatic. Suppose that $C \subset M$ is compact and $h : C \times [0, 1] \to N$ is a smoothly degree admissible homotopy over $q$. Such a homotopy may be regarded as a continuous function from $[0, 1]$ to $C(C, N)$. Therefore (D3) implies that $\deg_q(h_t)$ is a locally constant function of $t$, and since $[0, 1]$ is connected, it must be constant. Thus (∆3) holds.

Proposition 11.5.1 implies that for any $(f, q) \in D(M, N)$ the set of smooth $f' : M \to N$ that have $q$ as a regular value is dense at $f$. In conjunction with Proposition 12.3.4, this implies that the only possibility consistent with (D3) is to set $\deg_q(f) = \deg^\infty_{q'}(f')$ for $(f', q') \in D^\infty(M, N)$ with $f'$ and $q'$ close to $f$ and $q$. This establishes uniqueness, and Proposition 12.3.4 also implies that the definition is unambiguous. It is easy to see that (D1) and (D2) follow from (∆1) and (∆2), and (D3) is automatic.

Since (D2) implies that the degree of $f$ over $q$ is the sum of the degrees of the restrictions of $f$ to the various connected components of the domain of $f$, it makes sense to study the degree of the restriction of $f$ to a single component. For this reason, when studying the degree one almost always assumes that $M$ is connected. (In applications of the degree this may fail to be the case, of course.) The image of a connected set under a continuous mapping is connected, so if $M$ is connected and $f : M \to N$ is continuous, its image is contained in one of the connected components of $N$. Therefore it also makes sense to assume that $N$ is connected.

So, assume that $M$ is compact, that $N$ is connected, and that $f : M \to N$ is continuous. We have $(f, q) \in D(M, N)$ for all $q \in N$, because the topological boundary of $M$ in itself is empty, and (D3) asserts that $\deg_q(f)$ is continuous as a function of $q$. Since $\mathbb{Z}$ has the discrete topology, this means that it is a locally constant function, and since $N$ is connected, it is in fact constant. That is, $\deg_q(f)$ does not depend on $q$; when this is the case we will simply write $\deg(f)$, and we speak of the degree of $f$ without any mention of a point in $N$.

12.4 Composition and Cartesian Product

In Chapter 5 we emphasized restriction to a subdomain, composition, and cartesian products, as the basic set theoretic methods for constructing new functions from ones that are given. The behavior of the degree under restriction to a subdomain is already expressed by (D2), and in this section we study the behavior of the degree under composition and products. In both cases the result is given by multiplication, reflecting basic properties of the determinant.

Proposition 12.4.1. If $M$, $N$, and $P$ are oriented $m$-dimensional smooth manifolds, $C \subset M$ and $D \subset N$ are compact, $f : C \to N$ and $g : D \to P$ are continuous, $g$ is degree admissible over $r \in P$, and $g^{-1}(r)$ is contained in one of the connected components of $N \setminus f(\partial C)$, then for any $q \in g^{-1}(r)$ we have

\[ \deg_r(g \circ f) = \deg_q(f) \cdot \deg_r(g). \]

Proof. Since $C^\infty(C, N)$ and $C^\infty(D, P)$ are dense in $C(C, N)$ and $C(D, P)$ (Theorem 10.7.6) and composition is a continuous operation (Proposition 5.3.6), the continuity property (D3) of the degree implies that it suffices to prove the claim when $f$ and $g$ are smooth. Sard's theorem implies that there are points $r'$ arbitrarily near $r$ that are regular values of both $g$ and $g \circ f$, and Proposition 12.3.4 implies that the relevant degrees are unaffected if $r$ is replaced by such a point, so we may assume that $r$ has these regularity properties.

For $q \in g^{-1}(r)$ let $s_g(q)$ be $1$ or $-1$ according to whether $g$ is orientation preserving or orientation reversing at $q$. For $p \in (g \circ f)^{-1}(r)$ define $s_f(p)$ and $s_{g \circ f}(p)$ similarly. In view of the chain rule and the definition of orientation preservation and reversal, $s_{g \circ f}(p) = s_g(f(p))s_f(p)$. Therefore

\[ \deg_r(g \circ f) = \sum_{p \in (g \circ f)^{-1}(r)} s_g(f(p))s_f(p) = \sum_{q \in g^{-1}(r)} s_g(q)\Bigl(\sum_{p \in f^{-1}(q)} s_f(p)\Bigr) = \sum_{q \in g^{-1}(r)} s_g(q)\deg_q(f). \]

Since $g^{-1}(r)$ is contained in a single connected component of $N \setminus f(\partial C)$, Proposition 12.3.4 implies that $\deg_q(f)$ is the same for all $q \in g^{-1}(r)$, and $\sum_{q \in g^{-1}(r)} s_g(q) = \deg_r(g)$.
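The bookkeeping in this proof can be followed in a small planar example. The sketch assumes $M = N = P = \mathbb{R}^2$, identified with the complex plane, $f(z) = \bar{z}^2$ on the closed disk $C$ of radius 2, $g(w) = w^3$ on the closed disk $D$ of radius 2, and $r = 1$; these choices are illustrative and not from the text. Then $g^{-1}(r)$ consists of the cube roots of unity, which lie in the single component $\{|w| < 4\}$ of $N \setminus f(\partial C)$. Orientation is read off from the sign of a finite difference Jacobian determinant.

```python
import cmath

def jacobian_sign(h, z, eps=1e-6):
    """Sign of the Jacobian determinant at z of h, viewed as a map R^2 -> R^2."""
    dx = (h(z + eps) - h(z - eps)) / (2 * eps)            # partial derivative in x
    dy = (h(z + 1j * eps) - h(z - 1j * eps)) / (2 * eps)  # partial derivative in y
    det = dx.real * dy.imag - dx.imag * dy.real
    return 1 if det > 0 else -1

def signed_count(h, preimages):
    """Sum of the orientation signs of h over a given finite set of preimages."""
    return sum(jacobian_sign(h, z) for z in preimages)

f = lambda z: z.conjugate() ** 2   # orientation reversing away from 0
g = lambda w: w ** 3               # orientation preserving away from 0

g_preimages = [cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]   # w^3 = 1

def f_preimages_of(q):
    """Solutions z of conj(z)^2 = q, for q != 0."""
    root = cmath.sqrt(q)
    return [root.conjugate(), (-root).conjugate()]

gf_preimages = [p for q in g_preimages for p in f_preimages_of(q)]

deg_g = signed_count(g, g_preimages)
deg_f = [signed_count(f, f_preimages_of(q)) for q in g_preimages]
deg_gf = signed_count(lambda z: g(f(z)), gf_preimages)
```

Here $\deg_q(f) = -2$ for each of the three points $q \in g^{-1}(r)$, $\deg_r(g) = 3$, and the six preimages of $r$ under $g \circ f$ are all orientation reversing, so $\deg_r(g \circ f) = -6$.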

The hypotheses of the last result are rather stringent, which makes it rather artificial. For topologists the following special case is the main point of interest.

Corollary 12.4.2. If $M$, $N$, and $P$ are compact oriented $m$-dimensional smooth manifolds, $N$ is connected, and $f : M \to N$ and $g : N \to P$ are continuous, then

\[ \deg(g \circ f) = \deg(f) \cdot \deg(g). \]

For cartesian products the situation is much simpler.

Proposition 12.4.3. Suppose that $M$ and $N$ are oriented $m$-dimensional smooth manifolds, $M'$ and $N'$ are oriented $m'$-dimensional smooth manifolds, $C \subset M$ and $C' \subset M'$ are compact, and $f : C \to N$ and $f' : C' \to N'$ are degree admissible over $q$ and $q'$ respectively. Then

\[ \deg_{(q,q')}(f \times f') = \deg_q(f) \cdot \deg_{q'}(f'). \]

Proof. For reasons explained in other proofs above, we may assume that $f$ and $f'$ are smooth and that $q$ and $q'$ are regular values of $f$ and $f'$. For $p \in f^{-1}(q)$ let $s_f(p)$ be $1$ or $-1$ according to whether $f$ is orientation preserving or orientation reversing at $p$, and define $s_{f'}(p')$ for $p' \in f'^{-1}(q')$ similarly. Since the determinant of a block diagonal matrix is the product of the determinants of the blocks, $f \times f'$ is orientation preserving or orientation reversing at $(p, p')$ according to whether $s_f(p)s_{f'}(p')$ is positive or negative, so

\[ \deg_{(q,q')}(f \times f') = \sum_{(p,p') \in (f \times f')^{-1}(q,q')} s_f(p)s_{f'}(p') = \sum_{p \in f^{-1}(q)} s_f(p) \cdot \sum_{p' \in f'^{-1}(q')} s_{f'}(p') = \deg_q(f) \cdot \deg_{q'}(f'). \]
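The factorization of the double sum can be seen in a one dimensional example. The sketch assumes $M = N = M' = N' = \mathbb{R}$, $C = C' = [-3, 3]$, $f(x) = x^3 - 3x$, $f'(y) = 3y - y^3$, and $q = q' = 0$; all of these are illustrative choices, and `df2` stands for the derivative of the second factor $f'$. The Jacobian of the product map at $(p, p')$ is the diagonal matrix with entries `df(p)` and `df2(p')`, so its determinant is their product.

```python
from itertools import product

def sign(x):
    return (x > 0) - (x < 0)

def degree_from_preimages(derivative, preimages):
    """Signed count: +1 where the derivative is positive, -1 where it is negative."""
    return sum(sign(derivative(p)) for p in preimages)

root3 = 3 ** 0.5
df = lambda x: 3 * x**2 - 3      # derivative of f(x) = x^3 - 3x
df2 = lambda y: 3 - 3 * y**2     # derivative of the second factor, 3y - y^3
roots = [-root3, 0.0, root3]     # preimages of 0 under both factors

deg_f = degree_from_preimages(df, roots)
deg_f2 = degree_from_preimages(df2, roots)

# Degree of the product map over (0, 0): one term per pair of preimages, with
# sign given by the determinant of the diagonal Jacobian diag(df(p), df2(p2)).
deg_product = sum(sign(df(p) * df2(p2)) for p, p2 in product(roots, roots))
```

The nine pairs of preimages contribute signs that sum to $-1$, which is the product of the two one dimensional degrees $1$ and $-1$.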

Chapter 13

The Fixed Point Index

We now take up the theory of the fixed point index. For continuous functions defined on compact subsets of Euclidean spaces it is no more than a different rendering of the theory of the degree; this perspective is developed in Section 13.1.

But we will see that it extends to a much higher level of generality, because the domain and the range of the function or correspondence have the same topology. Concretely, there is a property called Commutativity that relates the indices of the two compositions $\hat{g} \circ g$ and $g \circ \hat{g}$ where $g : C \to \hat{X}$ and $\hat{g} : \hat{C} \to X$ are continuous, and other natural restrictions on this data (that will give rise to a quite cumbersome definition) are satisfied. This property requires that we extend our framework to allow comparison across spaces. Section 13.2 introduces the necessary abstractions and verifies that Commutativity is indeed satisfied in the smooth case. It turns out that this boils down to a fact of linear algebra that came as a surprise when this theory was developed.

When we extended the degree from smooth to continuous functions, we showed that continuous functions could be approximated by smooth ones, and that this gave a definition of the degree for continuous functions that made sense and was uniquely characterized by certain axioms. In somewhat the same way Commutativity will be used, in Section 13.4, to extend from Euclidean spaces to separable ANR's, as per the ideas developed in Section 7.6. The argument is lengthy, technically dense, and in several ways the culmination of our work to this point.

The Continuity axiom is then used in Section 13.5 to extend the index to contractible valued correspondences. The underlying idea is the one used to extend from smooth to continuous functions: approximate and show that the resulting definition is consistent and satisfies all properties. Again, there are many verifications, and the argument is rather dense.

Multiplication is an additional property of the index that describes its behavior in connection with cartesian products. For continuous functions on subsets of Euclidean spaces it is a direct consequence of Proposition 12.4.3. At higher levels of generality it is, in principle, a consequence of the axioms, because those axioms characterize the index uniquely, but an argument deriving Multiplication from the other axioms is not known. Therefore we carry Multiplication along as an additional property that is extended from one level of generality to the next along with everything else.


13.1 Axioms for an Index on a Single Space

The axiom system for the fixed point index is introduced in two stages. This section presents the first group of axioms, which describe the properties of the index that concern a single space. Fix a metric space $X$. For a compact $C \subset X$ let $\operatorname{int} C = X \setminus \overline{X \setminus C}$ be the interior of $C$, and let $\partial C = C \setminus \operatorname{int} C$ be its topological boundary.

Definition 13.1.1. An index admissible correspondence for $X$ is an upper semicontinuous correspondence $F : C \to X$, where $C \subset X$ is compact, that has no fixed points in $\partial C$.

There will be various indices, according to which sorts of correspondences are considered. The next definition expresses the common properties of their domains.

Definition 13.1.2. An index base for $X$ is a set $I$ of index admissible correspondences $F : C \to X$ such that:

(a) $f \in I$ whenever $C \subset X$ is compact and $f : C \to X$ is an index admissible continuous function;

(b) $F|_D \in I$ whenever $F : C \to X$ is an element of $I$, $D \subset C$ is compact, and $F|_D$ is index admissible.

For each integer $m \ge 0$ an index base for $\mathbb{R}^m$ is given by letting $I_m$ be the set of index admissible continuous functions $f : C \to \mathbb{R}^m$.

We can now state the first batch of axioms.

Definition 13.1.3. Let $I$ be an index base for $X$. An index for $I$ is a function $\Lambda_X : I \to \mathbb{Z}$ satisfying:

(I1) (Normalization¹) If $c : C \to X$ is a constant function whose value is an element of $\operatorname{int} C$, then

\[ \Lambda_X(c) = 1. \]

(I2) (Additivity) If $F : C \to X$ is an element of $I$, $C_1, \ldots, C_r$ are pairwise disjoint compact subsets of $C$, and $\mathrm{FP}(F) \subset \operatorname{int} C_1 \cup \ldots \cup \operatorname{int} C_r$, then

\[ \Lambda_X(F) = \sum_i \Lambda_X(F|_{C_i}). \]

(I3) (Continuity) For each element $F : C \to X$ of $I$ there is a neighborhood $A \subset U(C, X)$ of $F$ such that $\Lambda_X(\hat{F}) = \Lambda_X(F)$ for all $\hat{F} \in A \cap I$.

A proper appreciation of Continuity depends on the following immediate consequence of Theorem 5.2.1.

¹ In the literature this condition is sometimes described as “Weak Normalization,” in contrast with a stronger condition defined in terms of homology.


Proposition 13.1.4. If C ⊂ X is compact, then

{F ∈ U(C,X) : F is index admissible }

is an open subset of U(C,X).

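Proposition 13.1.5. For each $m = 1, 2, \ldots$ there is a unique index $\Lambda_{\mathbb{R}^m}$ for $I_m$ given by

\[ \Lambda_{\mathbb{R}^m}(f) = \deg_0(\mathrm{Id}_C - f). \]

Proof. Observe that if $C \subset \mathbb{R}^m$ is compact, then $f : C \to \mathbb{R}^m$ is index admissible if and only if $\mathrm{Id}_C - f$ is degree admissible over the origin. Now (I1)-(I3) follow directly from (D1)-(D3).

To prove uniqueness suppose that $\tilde{\Lambda}_m$ is an index for $I_m$. For $(g, q) \in D(\mathbb{R}^m, \mathbb{R}^m)$ let

\[ d_q(g) = \tilde{\Lambda}_m(\mathrm{Id}_C - g + q), \]

where $C$ is the domain of $g$. (The fixed points of $\mathrm{Id}_C - g + q$ are exactly the points of $g^{-1}(q)$.) It is straightforward to show that $d$ satisfies (D1)-(D3), so it must be the degree, and consequently $\tilde{\Lambda}_m = \Lambda_{\mathbb{R}^m}$.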
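In dimension one the formula $\Lambda_{\mathbb{R}^m}(f) = \deg_0(\mathrm{Id}_C - f)$ reduces to a signed count of fixed points, with sign given by $1 - f'(x)$. The sketch below assumes $m = 1$, a smooth $f$ on a compact interval, and fixed points that are interior and regular (that is, $f'(x) \ne 1$); the function name `index_1d` and the three test maps are illustrative.

```python
def sign(x):
    return (x > 0) - (x < 0)

def index_1d(df, fixed_points):
    """deg_0(Id - f) for smooth f on a compact interval: the sum of sign(1 - f'(x))
    over the fixed points x, assumed interior and regular (f'(x) != 1)."""
    return sum(sign(1 - df(x)) for x in fixed_points)

# Constant map c(x) = 0.3 on C = [-1, 1]: one fixed point, derivative 0.
index_constant = index_1d(lambda x: 0.0, [0.3])

# Expanding map f(x) = 2x on C = [-1, 1]: fixed point 0 and Id - f = -x.
index_expanding = index_1d(lambda x: 2.0, [0.0])

# f(x) = x - (x^3 - x) on C = [-2, 2]: fixed points -1, 0, 1 and
# 1 - f'(x) = 3x^2 - 1, so the signs are +, -, +.
index_cubic = index_1d(lambda x: 1 - (3 * x**2 - 1), [-1.0, 0.0, 1.0])
```

The first value is the Normalization axiom in this setting, and the third illustrates Additivity: the three fixed points contribute $+1$, $-1$, and $+1$.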

As we explain now, invariance under homotopy is subsumed by Continuity.However, homotopies will still be important in our work, so we have the followingdefinition and result.

Definition 13.1.6. For a compact C ⊂ X a homotopy h : C× [0, 1] → X is indexadmissible if each ht is index admissible.

Proposition 13.1.7. If I is an index base for X and ΛX is an index for I, thenΛX(h0) = ΛX(h1) whenever h : [0, 1] → C(C,X) is an index admissible homotopy.

Proof. Continuity implies that ΛX(ht) is a locally constant function of t, and [0, 1]is connected.

We will refer to this result as the homotopy principle.
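A one-dimensional illustration of the homotopy principle, offered as a sketch under stated assumptions rather than as part of the proof: on an interval the index can be read off the signs of $x - f(x)$ at the endpoints (a standard fact about degree in dimension one), so it cannot change along a homotopy that never has a fixed point on the boundary. The family $h_t(x) = x^3 + t$ on $[-2, 2]$ and the function names are hypothetical choices made for the example.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def index_1d(f, a, b):
    """Index of f on [a, b], computed from the endpoint signs of x - f(x)."""
    ga, gb = a - f(a), b - f(b)
    if ga == 0 or gb == 0:
        raise ValueError("fixed point on the boundary: not index admissible")
    return (sign(gb) - sign(ga)) // 2


def h(t):
    """The map h_t(x) = x**3 + t of an illustrative homotopy."""
    return lambda x: x ** 3 + t


# Sample the homotopy at t = 0, 0.1, ..., 1; each h_t is admissible on [-2, 2]
# because 2 - h_t(2) < 0 and -2 - h_t(-2) > 0 for every t in [0, 1].
indices = [index_1d(h(k / 10), -2.0, 2.0) for k in range(11)]
print(indices)
```

The number of fixed points changes along this family (three at $t = 0$, one at $t = 1$), but the index does not.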

13.2 Multiple Spaces

We now introduce two properties of the index that involve comparison across different spaces. When we define an abstract notion of an index satisfying these conditions, we need to require that the set of spaces is closed under the operations that are involved in these conditions, so we require that the sets of spaces and correspondences are closed under cartesian products.

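Definition 13.2.1. An index scope $\mathcal{S}$ consists of a class of metric spaces $\mathcal{S}_{\mathcal S}$ and an index base $\mathcal{I}_{\mathcal S}(X)$ for each $X \in \mathcal{S}_{\mathcal S}$ such that

(a) $\mathcal{S}_{\mathcal S}$ contains $X \times X'$ whenever $X, X' \in \mathcal{S}_{\mathcal S}$;

(b) $F \times F' \in \mathcal{I}_{\mathcal S}(X \times X')$ whenever $X, X' \in \mathcal{S}_{\mathcal S}$, $F \in \mathcal{I}_{\mathcal S}(X)$, and $F' \in \mathcal{I}_{\mathcal S}(X')$.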


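Our first index scope $\mathcal{S}^0$ has the collection of spaces $\mathcal{S}_{\mathcal{S}^0} = \{\mathbb{R}^0, \mathbb{R}^1, \mathbb{R}^2, \ldots\}$ with $\mathcal{I}_{\mathcal{S}^0}(\mathbb{R}^m) = \mathcal{I}^m$ for each $m$. Of course (b) is satisfied by identifying $\mathbb{R}^m \times \mathbb{R}^n$ with $\mathbb{R}^{m+n}$.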

To understand the motivation for the following definition, first suppose that $X, \hat X \in \mathcal{S}_{\mathcal S}$, and that $g : X \to \hat X$ and $\hat g : \hat X \to X$ are continuous. In this circumstance it will be the case that $\hat g \circ g$ and $g \circ \hat g$ have the same index. We would like to develop this idea in greater generality, for functions $g : C \to \hat X$ and $\hat g : \hat C \to X$, but for our purposes it is too restrictive to require that $g(C) \subset \hat C$ and $\hat g(\hat C) \subset C$. In this way we arrive at the following definition.

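Definition 13.2.2. A commutativity configuration is a tuple

\[ (X, D, E, g, \hat X, \hat D, \hat E, \hat g) \]

where $X$ and $\hat X$ are metric spaces and:

(a) $E \subset D \subset X$, $\hat E \subset \hat D \subset \hat X$, and $D$, $\hat D$, $E$, and $\hat E$ are compact;

(b) $g \in C(D, \hat X)$ and $\hat g \in C(\hat D, X)$ with $g(E) \subset \operatorname{int} \hat D$ and $\hat g(\hat E) \subset \operatorname{int} D$;

(c) $\hat g \circ g|_E$ and $g \circ \hat g|_{\hat E}$ are index admissible;

(d) $g(FP(\hat g \circ g|_E)) = FP(g \circ \hat g|_{\hat E})$.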

Before going forward, we should think through the details of what (d) means. If $x$ is a fixed point of $\hat g \circ g|_E$, then $g(x)$ is certainly a fixed point of $g \circ \hat g$, so it is a fixed point of $g \circ \hat g|_{\hat E}$ if and only if $g(x) \in \hat E$. Thus the inclusion

\[ g(FP(\hat g \circ g|_E)) \subset FP(g \circ \hat g|_{\hat E}) \]

holds if and only if

\[ g(FP(\hat g \circ g|_E)) \subset \hat E. \qquad (*) \]

On the other hand, if $\hat x$ is a fixed point of $g \circ \hat g|_{\hat E}$, then it is in the image of $g$ and $\hat g(\hat x)$ is a fixed point of $\hat g \circ g$ that is mapped to $\hat x$ by $g$, so $\hat x$ is contained in $g(FP(\hat g \circ g|_E))$ if and only if $\hat g(\hat x) \in E$. Therefore the inclusion

\[ FP(g \circ \hat g|_{\hat E}) \subset g(FP(\hat g \circ g|_E)) \]

holds if and only if

\[ \hat g(FP(g \circ \hat g|_{\hat E})) \subset E. \qquad (**) \]

Thus (d) holds if and only if both $(*)$ and $(**)$ hold, and by symmetry this is the case if and only if $\hat g(FP(g \circ \hat g|_{\hat E})) = FP(\hat g \circ g|_E)$.
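The equivalence between (d) and the pair $(*)$, $(**)$ is purely set-theoretic, so it can be checked on a toy example with finite sets. The sketch below assumes finite sets with maps stored as dictionaries (so topology plays no role); the sets, the maps `g` and `g_hat`, and the helper names are all hypothetical and chosen only to show one case where (d) holds and one where it fails.

```python
def fixed_points(first, second, domain):
    """Points x of `domain` with second(first(x)) == x, maps given as dicts."""
    return {x for x in domain
            if first.get(x) in second and second[first[x]] == x}


def image(mapping, points):
    """Image of a set of points under a dict-valued map."""
    return {mapping[p] for p in points}


def check(g, g_hat, E, E_hat):
    """Return the truth values of (d), (*), and (**) for the given data."""
    fp = fixed_points(g, g_hat, E)            # FP(g_hat o g | E)
    fp_hat = fixed_points(g_hat, g, E_hat)    # FP(g o g_hat | E_hat)
    cond_d = image(g, fp) == fp_hat           # condition (d)
    star = image(g, fp) <= E_hat              # condition (*)
    star_star = image(g_hat, fp_hat) <= E     # condition (**)
    return cond_d, star, star_star


g = {0: 10, 1: 11, 2: 12}        # g : D -> X_hat with D = {0, 1, 2}
g_hat = {10: 0, 11: 2, 12: 1}    # g_hat : D_hat -> X with D_hat = {10, 11, 12}
E = {0, 1}

good = check(g, g_hat, E, {10, 11})   # E_hat contains g(0) = 10
bad = check(g, g_hat, E, {11})        # E_hat misses g(0) = 10
print(good, bad)
```

In the second case $(*)$ fails because the image of the fixed point $0$ falls outside the smaller set, and (d) fails with it.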

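Definition 13.2.3. An index for an index scope $\mathcal{S}$ is a specification of an index $\Lambda_X$ for each $X \in \mathcal{S}_{\mathcal S}$ such that:

(I4) (Commutativity) If $(X, D, E, g, \hat X, \hat D, \hat E, \hat g)$ is a commutativity configuration with $X, \hat X \in \mathcal{S}_{\mathcal S}$, $(E, \hat g \circ g|_E) \in \mathcal{I}_{\mathcal S}(X)$, and $(\hat E, g \circ \hat g|_{\hat E}) \in \mathcal{I}_{\mathcal S}(\hat X)$, then

\[ \Lambda_X(\hat g \circ g|_E) = \Lambda_{\hat X}(g \circ \hat g|_{\hat E}). \]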


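The index is said to be multiplicative if:

(M) (Multiplication) If $X, X' \in \mathcal{S}_{\mathcal S}$, $F \in \mathcal{I}_{\mathcal S}(X)$, and $F' \in \mathcal{I}_{\mathcal S}(X')$, then

\[ \Lambda_{X \times X'}(F \times F') = \Lambda_X(F) \cdot \Lambda_{X'}(F'). \]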

We can now state the result that has been the main objective of all our work. Let $\mathcal{S}_{\mathcal{S}^{Ctr}}$ be the class of separable absolute neighborhood retracts, and for each $X \in \mathcal{S}_{\mathcal{S}^{Ctr}}$ let $\mathcal{I}_{\mathcal{S}^{Ctr}}(X)$ be the union over compact $C \subset X$ of the sets of index admissible upper semicontinuous contractible valued correspondences $F : C \to X$. Since cartesian products of contractible valued correspondences are contractible valued, we have defined an index scope $\mathcal{S}^{Ctr}$.

Theorem 13.2.4. There is a unique index ΛCtr for SCtr, which is multiplicative.

13.3 The Index for Euclidean Spaces

The method of proof of Theorem 13.2.4 is to first establish an index in a quite restricted setting, then show that it has unique extensions, first to an intermediate index scope, and then to $\mathcal{S}^{Ctr}$. Our goal in the remainder of this section is to prove:

Theorem 13.3.1. There is a unique index $\Lambda^0$ for the index scope $\mathcal{S}^0$ given by setting $\Lambda^0_{\mathbb{R}^m} = \Lambda_{\mathbb{R}^m}$ for each $m$, and $\Lambda^0$ is multiplicative.

Insofar as continuous functions can be approximated by smooth ones with regular fixed points, and we can use Additivity to focus on a single fixed point, the verification of Commutativity will boil down to the following fact of linear algebra, which is not at all obvious, and was not known prior to the discovery of its relevance to the theory of fixed points.

Proposition 13.3.2 (Jacobson (1953) pp. 103–106). Suppose $K : V \to W$ and $L : W \to V$ are linear transformations, where $V$ and $W$ are vector spaces of dimensions $m$ and $n$ respectively over an arbitrary field. Suppose $m \le n$. Then the characteristic polynomials $\kappa_{KL}$ and $\kappa_{LK}$ of $KL$ and $LK$ are related by the equation $\kappa_{KL}(\lambda) = \lambda^{n-m} \kappa_{LK}(\lambda)$. In particular,

\[ \kappa_{LK}(1) = |\mathrm{Id}_V - LK| = |\mathrm{Id}_W - KL| = \kappa_{KL}(1). \]

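Proof. We can decompose $V$ and $W$ as direct sums $V = V_1 \oplus V_2 \oplus V_3 \oplus V_4$ and $W = W_1 \oplus W_2 \oplus W_3 \oplus W_4$ where

\[ V_1 = \ker K \cap \operatorname{im} L, \quad V_1 \oplus V_2 = \operatorname{im} L, \quad V_1 \oplus V_3 = \ker K, \]

and similarly for $W$. With suitably chosen bases the matrices of $K$ and $L$ have the forms

\[
\begin{pmatrix} 0 & K_{12} & 0 & K_{14} \\ 0 & K_{22} & 0 & K_{24} \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix} 0 & L_{12} & 0 & L_{14} \\ 0 & L_{22} & 0 & L_{24} \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.
\]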


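Computing the product of these matrices, we find that

\[
\kappa_{KL}(\lambda) =
\begin{vmatrix}
\lambda I & -K_{12}L_{22} & 0 & -K_{12}L_{24} \\
0 & \lambda I - K_{22}L_{22} & 0 & -K_{22}L_{24} \\
0 & 0 & \lambda I & 0 \\
0 & 0 & 0 & \lambda I
\end{vmatrix}.
\]

Using elementary facts about determinants, this reduces to $\kappa_{KL}(\lambda) = \lambda^{n-k} |\lambda I - K_{22}L_{22}|$, where $k = \dim V_2 = \dim W_2$. Symmetrically, $\kappa_{LK}(\lambda) = \lambda^{m-k} |\lambda I - L_{22}K_{22}|$. In effect this reduces the proof to the special case $V_2 = V$ and $W_2 = W$, i.e. $K$ and $L$ are isomorphisms. But this case follows from the computation

\[ |\lambda \mathrm{Id}_V - LK| = |L^{-1}| \cdot |\lambda \mathrm{Id}_V - LK| \cdot |L| = |L^{-1}(\lambda \mathrm{Id}_V - LK)L| = |\lambda \mathrm{Id}_W - KL|. \]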
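Proposition 13.3.2 is easy to test on a concrete pair of matrices. The following sketch uses exact rational arithmetic from the Python standard library, so the comparison involves no rounding; the particular matrices `K` (of size $3 \times 2$) and `L` (of size $2 \times 3$) and all helper names are arbitrary illustrative choices, and a check at finitely many values of $\lambda$ is of course only a sanity test, not a proof.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def det(M):
    """Determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            d = -d                      # a row swap flips the sign
        d *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= factor * M[c][k]
    return d


def char_poly_at(M, lam):
    """Value of the characteristic polynomial |lam * I - M| at lam."""
    n = len(M)
    return det([[(lam if i == j else 0) - M[i][j] for j in range(n)]
                for i in range(n)])


m, n = 2, 3
K = [[1, 2], [0, 1], [3, -1]]     # K : V -> W, an n x m matrix
L = [[2, 0, 1], [1, -1, 0]]       # L : W -> V, an m x n matrix
KL, LK = matmul(K, L), matmul(L, K)

# kappa_KL(lam) == lam**(n - m) * kappa_LK(lam) at several sample points
checks = [char_poly_at(KL, lam) == lam ** (n - m) * char_poly_at(LK, lam)
          for lam in range(-3, 4)]
print(all(checks), char_poly_at(KL, 1) == char_poly_at(LK, 1))
```

The second printed value is the special case $\lambda = 1$, which is the form in which the proposition is applied to fixed points.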

Lemma 13.3.4 states that if we fix the sets $X, D, E, \hat X, \hat D, \hat E$, then the set of pairs $(g, \hat g)$ giving a commutativity configuration is open. This is simple and unsurprising, but without the spadework we did in Chapter 5 the proof would be a tedious slog. We extract one piece of the argument in order to be able to refer to it later.

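Lemma 13.3.3. If $X$ and $\hat X$ are metric spaces, $E \subset D \subset X$, and $\hat E \subset \hat D \subset \hat X$, where $D$, $\hat D$, $E$, and $\hat E$ are compact, then

\[ (g, \hat g) \mapsto \hat g \circ g|_E \]

is a continuous function from $\{\, g \in C(D, \hat X) : g(E) \subset \hat D \,\} \times C(\hat D, X)$ to $C(E, X)$.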

Proof. Lemma 5.3.1 implies that the function $g \mapsto g|_E$ is continuous, after which Proposition 5.3.6 implies that $(g, \hat g) \mapsto \hat g \circ g|_E$ is continuous.

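Lemma 13.3.4. If $X$ and $\hat X$ are metric spaces, $E \subset D \subset X$, $\hat E \subset \hat D \subset \hat X$, and $D$, $\hat D$, $E$, and $\hat E$ are open with compact closure, then the set of

\[ (g, \hat g) \in C(D, \hat X) \times C(\hat D, X) \]

such that $(X, D, E, g, \hat X, \hat D, \hat E, \hat g)$ is a commutativity configuration is an open subset of $C(D, \hat X) \times C(\hat D, X)$.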

Proof. Lemma 4.5.10 implies that the set of $(g, \hat g)$ such that $g(E) \subset \operatorname{int} \hat D$ and $\hat g(\hat E) \subset \operatorname{int} D$ is an open subset of $C(D, \hat X) \times C(\hat D, X)$. The lemma above implies that the functions $(g, \hat g) \mapsto \hat g \circ g|_E$ and $(g, \hat g) \mapsto g \circ \hat g|_{\hat E}$ are continuous, and Proposition 13.1.4 implies that the set of $(g, \hat g)$ satisfying part (c) of Definition 13.2.2 is open. In view of the discussion in the last section, a pair $(g, \hat g)$ that satisfies (a)-(c) of Definition 13.2.2 will also satisfy (d) if and only if

\[ g(FP(\hat g \circ g|_E)) \subset \operatorname{int} \hat E \quad\text{and}\quad \hat g(FP(g \circ \hat g|_{\hat E})) \subset \operatorname{int} E. \]

Since $(g, \hat g) \mapsto \hat g \circ g|_E$ and $(g, \hat g) \mapsto g \circ \hat g|_{\hat E}$ are continuous, Theorem 5.2.1 and Lemma 4.5.10 imply that the set of such pairs is open.


Proof of Theorem 13.3.1. Uniqueness and (I1)-(I3) follow from Proposition 13.1.5, so we only need to prove that (I4) and (M) are satisfied.

Suppose that $(\mathbb{R}^m, D, E, g, \mathbb{R}^{\hat m}, \hat D, \hat E, \hat g)$ is a commutativity configuration. Lemma 13.3.4 states that it remains a commutativity configuration if $g$ and $\hat g$ are replaced by functions in any sufficiently small neighborhood, and Lemma 13.3.3 implies that $\hat g \circ g|_E$ and $g \circ \hat g|_{\hat E}$ are continuous functions of $(g, \hat g)$, so, since we already know that (I3) holds, it suffices to prove the equation of (I4) after such a replacement. Since the smooth functions are dense in $C(D, \mathbb{R}^{\hat m})$ and $C(\hat D, \mathbb{R}^m)$ (Proposition 10.2.7) we may assume that $g$ and $\hat g$ are smooth. In addition, Sard’s theorem implies that the regular values of $\mathrm{Id}_E - \hat g \circ g|_E$ are dense, so after perturbing $\hat g$ by adding an arbitrarily small constant, we can make it the case that $0$ is a regular value. In the same way we can add a small constant to $g$ to make $0$ a regular value of $\mathrm{Id}_{\hat E} - g \circ \hat g|_{\hat E}$, and if the constant is small enough it will still be the case that $0$ is a regular value of $\mathrm{Id}_E - \hat g \circ g|_E$.

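Let $x_1, \ldots, x_r$ be the fixed points of $\hat g \circ g|_E$, and for each $i$ let $\hat x_i = g(x_i)$. Then $\hat x_1, \ldots, \hat x_r$ are the fixed points of $g \circ \hat g|_{\hat E}$. Let $D_1, \ldots, D_r$ be pairwise disjoint compact subsets of $E$ with $x_i \in \operatorname{int} D_i$, and let $\hat D_1, \ldots, \hat D_r$ be pairwise disjoint compact subsets of $\hat E$ with $\hat x_i \in \operatorname{int} \hat D_i$. For each $i$ let $E_i$ be a compact subset of $g^{-1}(\operatorname{int} \hat D_i)$ with $x_i \in \operatorname{int} E_i$, and let $\hat E_i$ be a compact subset of $\hat g^{-1}(\operatorname{int} D_i)$ with $\hat x_i \in \operatorname{int} \hat E_i$. It is easy to check that each $(\mathbb{R}^m, D_i, E_i, g_i, \mathbb{R}^{\hat m}, \hat D_i, \hat E_i, \hat g_i)$ is a commutativity configuration. Recalling the relationship between the index and the degree, we have

\[ \Lambda_{\mathbb{R}^m}(\hat g \circ g|_{E_i}) = \Lambda_{\mathbb{R}^{\hat m}}(g \circ \hat g|_{\hat E_i}) \]

because Proposition 13.3.2 gives

\[ |I - D(\hat g \circ g)(x_i)| = |I - D\hat g(\hat x_i) Dg(x_i)| = |I - Dg(x_i) D\hat g(\hat x_i)| = |I - D(g \circ \hat g)(\hat x_i)|. \]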

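Applying Additivity to sum over $i$ gives the equality asserted by (I4).

Turning to (M), suppose that $C \subset \mathbb{R}^m$ and $C' \subset \mathbb{R}^{m'}$ are compact and $f : C \to \mathbb{R}^m$ and $f' : C' \to \mathbb{R}^{m'}$ are index admissible. Then Proposition 12.4.3 gives

\[
\begin{aligned}
\Lambda_{\mathbb{R}^{m+m'}}(f \times f') &= \deg_{(0,0)}(\mathrm{Id}_{C \times C'} - f \times f') = \deg_{(0,0)}((\mathrm{Id}_C - f) \times (\mathrm{Id}_{C'} - f')) \\
&= \deg_0(\mathrm{Id}_C - f) \cdot \deg_0(\mathrm{Id}_{C'} - f') = \Lambda_{\mathbb{R}^m}(f) \cdot \Lambda_{\mathbb{R}^{m'}}(f').
\end{aligned}
\]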
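Multiplication can be seen concretely for smooth maps with regular fixed points. The sketch below assumes the standard fact that in this case the index is the sum over the fixed points of the sign of $|I - Df(x)|$; the choice $f(x) = f'(x) = x^3$ on $[-2, 2]$ and the helper names are illustrative only. The Jacobian of $f \times f'$ is block diagonal, so each sign for the product map is a product of one-dimensional signs.

```python
from itertools import product


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def df(x):
    """Derivative of the sample map x -> x**3."""
    return 3 * x * x


def det2(a, b, c, d):
    """Determinant of the 2 x 2 matrix [[a, b], [c, d]]."""
    return a * d - b * c


fixed_1d = [-1.0, 0.0, 1.0]   # fixed points of x -> x**3 in [-2, 2]

# One-dimensional index: sum of sign(1 - f'(x)) over the fixed points.
index_f = sum(sign(1 - df(x)) for x in fixed_1d)

# Index of f x f on [-2, 2] x [-2, 2]: the fixed points are all pairs, and
# I - D(f x f)(x, y) is the diagonal matrix diag(1 - f'(x), 1 - f'(y)).
index_product = sum(sign(det2(1 - df(x), 0.0, 0.0, 1 - df(y)))
                    for x, y in product(fixed_1d, fixed_1d))
print(index_f, index_product)
```

Here each factor has index $-1$ and the product map, with nine fixed points, has index $(-1) \cdot (-1) = 1$.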

13.4 Extension by Commutativity

The extension of the fixed point index to absolute neighborhood retracts was first achieved by Felix Browder in his Ph.D. thesis Browder (1948), using the extension method described in this section. This extension method is, perhaps, the most important application of Commutativity, but Commutativity is also sometimes useful in applications of the index, which should not be particularly surprising since the underlying fact of linear algebra it embodies (Proposition 13.3.2) is already nontrivial.


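Throughout this section we will work with two fixed index scopes $\mathcal{S}$ and $\hat{\mathcal S}$. We say that $\mathcal{S}$ subsumes $\hat{\mathcal S}$ if, for every $X \in \mathcal{S}_{\hat{\mathcal S}}$, we have $X \in \mathcal{S}_{\mathcal S}$ and $\mathcal{I}_{\hat{\mathcal S}}(X) \subset \mathcal{I}_{\mathcal S}(X)$. If this is the case, and $\Lambda$ is an index for $\mathcal{S}$, then the restriction (in the obvious sense) of $\Lambda$ to $\hat{\mathcal S}$ is an index for $\hat{\mathcal S}$. (It is easy to check that this is an automatic consequence of the definition of an index.) If $\hat\Lambda$ is an index for $\hat{\mathcal S}$, then an extension to $\mathcal{S}$ is an index for $\mathcal{S}$ whose restriction to $\hat{\mathcal S}$ is $\hat\Lambda$.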

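If $f : C \to X$ is in $\mathcal{I}_{\mathcal S}(X)$, a narrowing of focus for $f$ is a pair $(D, E)$ of compact subsets of $\operatorname{int} C$ such that

\[ FP(f) \subset \operatorname{int} E, \quad E \cup f(E) \subset \operatorname{int} D, \quad\text{and}\quad D \cup f(D) \subset \operatorname{int} C. \]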

For such a pair let $\varepsilon_{(D,E)}$ be the minimum of:

• $d(E \cup f(E),\, X \setminus \operatorname{int} D)$,

• $d(D \cup f(D),\, X \setminus \operatorname{int} C)$,

• the supremum of the set of $\varepsilon > 0$ such that $d(x', f(x')) > 2\varepsilon$ whenever $x \in C \setminus \operatorname{int} E$, $x' \in C$, and $d(x, x') < \varepsilon$,

where $d$ is the given metric for $X$. (Of course $X$ has many metrics that give the same topology. In contexts such as this we will implicitly assume that one has been selected.)

Since $f$ is continuous and admissible, narrowings of focus for $f$ exist: continuity implies the existence of an open neighborhood $V$ of $FP(f)$ satisfying $\overline{V} \cup f(\overline{V}) \subset \operatorname{int} C$. Repeating this observation gives an open neighborhood $W$ of $FP(f)$ satisfying $\overline{W} \cup f(\overline{W}) \subset V$, and we can let $D = \overline{V}$ and $E = \overline{W}$.

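Let $C$ be a compact subset of a metric space $X$. An $(\hat{\mathcal S}, \varepsilon)$-domination of $C$ is a quadruple $(\hat X, \hat C, \varphi, \psi)$ in which $\hat X \in \mathcal{S}_{\hat{\mathcal S}}$, $\hat C$ is a compact subset of $\hat X$, $\varphi : C \to \hat C$ and $\psi : \hat C \to X$ are continuous functions, and $\psi \circ \varphi$ is $\varepsilon$-homotopic to $\mathrm{Id}_C$. We say that $\hat{\mathcal S}$ dominates $\mathcal{S}$ if, for each $X \in \mathcal{S}_{\mathcal S}$, each compact $C \subset X$, and each $\varepsilon > 0$, there is an $(\hat{\mathcal S}, \varepsilon)$-domination of $C$. This section’s main result is: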

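Theorem 13.4.1. If $\hat{\mathcal S}$ dominates $\mathcal{S}$ and $\hat\Lambda$ is an index for $\hat{\mathcal S}$, then there is an index $\Lambda$ for $\mathcal{S}$ that is defined by setting

\[ \Lambda_X(f) = \hat\Lambda_{\hat X}(\varphi \circ f \circ \psi|_{\psi^{-1}(E)}) \qquad (\dagger) \]

whenever $X \in \mathcal{S}_{\mathcal S}$, $f : C \to X$ is an element of $\mathcal{I}_{\mathcal S}(X)$, $(D, E)$ is a narrowing of focus for $f$, $\varepsilon < \varepsilon_{(D,E)}$, and $(\hat X, \hat C, \varphi, \psi)$ is an $(\hat{\mathcal S}, \varepsilon)$-domination of $C$. If, in addition, $\mathcal{S}$ subsumes $\hat{\mathcal S}$, then $\Lambda$ is the unique extension of $\hat\Lambda$ to $\mathcal{S}$. If $\hat\Lambda$ is multiplicative, then so is $\Lambda$.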

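Let $\mathcal{S}_{\mathcal{S}^{ANR}}$ be the class of compact absolute neighborhood retracts, and for each $X \in \mathcal{S}_{\mathcal{S}^{ANR}}$ let $\mathcal{I}_{\mathcal{S}^{ANR}}(X)$ be the union over open $C \subset X$ of the sets of index admissible functions in $C(C, X)$. These definitions specify an index scope $\mathcal{S}^{ANR}$ because $\mathcal{S}_{\mathcal{S}^{ANR}}$ is closed under formation of finite cartesian products, and $f \times f' \in \mathcal{I}_{\mathcal{S}^{ANR}}(X \times X')$ whenever $X, X' \in \mathcal{S}_{\mathcal{S}^{ANR}}$, $f \in \mathcal{I}_{\mathcal{S}^{ANR}}(X)$, and $f' \in \mathcal{I}_{\mathcal{S}^{ANR}}(X')$.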


Theorem 13.4.2. There is a unique index $\Lambda^{ANR}$ for $\mathcal{S}^{ANR}$ that extends $\Lambda^0$, and $\Lambda^{ANR}$ is multiplicative.

Proof. Theorem 7.6.4 implies that $\mathcal{S}^0$ dominates $\mathcal{S}^{ANR}$, and $\mathcal{S}^{ANR}$ evidently subsumes $\mathcal{S}^0$.

The rest of this section is devoted to the proof of Theorem 13.4.1. Before proceeding, the reader should be warned that this is, perhaps, the most difficult argument in this book. Certainly it is the most cumbersome, from the point of view of the burden of notation, because the setup used to extend the index is complex, and then several verifications are required in that setting. To make the expressions somewhat more compact, from this point forward we will frequently drop the symbol for composition, for instance writing $\psi\varphi$ rather than $\psi \circ \varphi$.

Lemma 13.4.3. Suppose $X \in \mathcal{S}_{\mathcal S}$, $f : C \to X$ is in $\mathcal{I}_{\mathcal S}(X)$, $(D, E)$ is a narrowing of focus for $f$, $0 < \varepsilon < \varepsilon_{(D,E)}$, and $(\hat X, \hat C, \varphi, \psi)$ is an $(\hat{\mathcal S}, \varepsilon)$-domination of $C$. Let $\hat D = \psi^{-1}(D)$ and $\hat E = \psi^{-1}(E)$. Then

\[ (X, D, E, \varphi f|_D, \hat X, \hat D, \hat E, \psi|_{\hat D}) \]

is a commutativity configuration.

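Proof. We need to verify (a)-(d) of Definition 13.2.2. We have $E \subset D \subset X$ with $D$ and $E$ compact, so $\hat E \subset \hat D \subset \hat X$. In addition $\hat D$ and $\hat E$ are closed because $\psi$ is continuous, so they are compact because they are subsets of $\hat C$. Thus (a) holds.

Of course $\varphi f|_D$ and $\psi|_{\hat D}$ are continuous. We have

\[ \psi(\varphi(f(E))) \subset U_\varepsilon(f(E)) \subset \operatorname{int} D, \]

so $\varphi(f(E)) \subset \psi^{-1}(\operatorname{int} D) \subset \operatorname{int} \hat D$. In addition, $\psi(\hat E) \subset E \subset \operatorname{int} D$. Thus (b) holds.

If $x \in D \setminus \operatorname{int} E$, then $d(x, f(x)) > 2\varepsilon_{(D,E)} > 2\varepsilon$ and $d(f(x), \psi(\varphi(f(x)))) < \varepsilon$, so $x$ cannot be a fixed point of $\psi\varphi f$. Thus $FP(\psi\varphi f|_E) \subset \operatorname{int} E$. If $\hat x \in \hat D$ is a fixed point of $\varphi f \psi|_{\hat D}$, then $\psi(\hat x)$ is a fixed point of $\psi\varphi f|_D$, so $FP(\varphi f \psi|_{\hat D}) \subset \psi^{-1}(\operatorname{int} E) \subset \operatorname{int} \hat E$. Thus (c) holds.

We now establish $(*)$ and $(**)$. We have

\[ \psi(\varphi(f(FP(\psi\varphi f|_D)))) = FP(\psi\varphi f|_D) \subset \operatorname{int} E, \]

so

\[ \varphi(f(FP(\psi\varphi f|_D))) \subset \psi^{-1}(\operatorname{int} E) \subset \hat E, \]

and $FP(\varphi f \psi|_{\hat E}) \subset \operatorname{int} \hat E$, so

\[ \psi(FP(\varphi f \psi|_{\hat D})) \subset \psi(\operatorname{int} \hat E) \subset E. \]

Thus (d) holds.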

From this point forward we assume that there is a given index $\hat\Lambda$ for $\hat{\mathcal S}$. In order for our proposed definition of $\Lambda$ to be workable it needs to be the case that the definition of the derived index does not depend on the choice of narrowing or domination, and it turns out that proving this will be a substantial part of the overall effort. The argument is divided into a harder part proving a special case and a reduction of the general case to this.


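Lemma 13.4.4. Let $X$ be an element of $\mathcal{S}_{\mathcal S}$, and let $f : C \to X$ be an element of $\mathcal{I}_{\mathcal S}(X)$. Suppose that $(D, E)$ is a narrowing of focus for $f$, $0 < \varepsilon_1, \varepsilon_2 < \varepsilon_{(D,E)}$, and $(\hat X_1, \hat C_1, \varphi_1, \psi_1)$ and $(\hat X_2, \hat C_2, \varphi_2, \psi_2)$ are an $(\hat{\mathcal S}, \varepsilon_1)$-domination and an $(\hat{\mathcal S}, \varepsilon_2)$-domination of $C$. Set

\[ \hat D_1 = \psi_1^{-1}(D), \quad \hat E_1 = \psi_1^{-1}(E), \quad \hat D_2 = \psi_2^{-1}(D), \quad \hat E_2 = \psi_2^{-1}(E). \]

Then

\[ \hat\Lambda_{\hat X_1}(\varphi_1 f \psi_1|_{\hat E_1}) = \hat\Lambda_{\hat X_2}(\varphi_2 f \psi_2|_{\hat E_2}). \]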

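Proof. Let $\varepsilon = \max\{\varepsilon_1, \varepsilon_2\}$. The definition of domination gives an $\varepsilon$-homotopy $h : C \times [0, 1] \to X$ with $h_0 = \mathrm{Id}_C$ and $h_1 = \psi_1\varphi_1$ and an $\varepsilon$-homotopy $j : C \times [0, 1] \to X$ with $j_0 = \mathrm{Id}_C$ and $j_1 = \psi_2\varphi_2$. We will show that:

(a) the homotopy $t \mapsto \varphi_1 j_t f \psi_1|_{\hat E_1}$ is well defined and index admissible;

(b) the homotopy $t \mapsto \varphi_2 f h_t \psi_2|_{\hat E_2}$ is well defined and index admissible;

(c) $(\hat X_1, \hat D_1, \hat E_1, \varphi_2 f \psi_1, \hat X_2, \hat D_2, \hat E_2, \varphi_1 \psi_2)$ is a commutativity configuration.

The claim follows from the computation

\[ \hat\Lambda_{\hat X_1}(\varphi_1 f \psi_1|_{\hat E_1}) = \hat\Lambda_{\hat X_1}(\varphi_1 \psi_2 \varphi_2 f \psi_1|_{\hat E_1}) = \hat\Lambda_{\hat X_2}(\varphi_2 f \psi_1 \varphi_1 \psi_2|_{\hat E_2}) = \hat\Lambda_{\hat X_2}(\varphi_2 f \psi_2|_{\hat E_2}). \]

Specifically, in view of (a) and (b) the first and third equalities follow from the homotopy principle, while (c) permits an application of Commutativity that gives the second equality.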

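For each $t$ the composition $\varphi_1 j_t f \psi_1|_{\hat E_1}$ is well defined because

\[ \psi_1(\hat E_1) \subset E \quad\text{and}\quad j_t(f(E)) \subset U_\varepsilon(f(E)) \subset D \subset C. \]

In order to show that this homotopy is index admissible, we assume (aiming at a contradiction) that for some $0 \le t \le 1$, $y_1 \in \partial \hat E_1$ is a fixed point of $\varphi_1 j_t f \psi_1$. Then $\psi_1(y_1)$ is a fixed point of $\psi_1 \varphi_1 j_t f$. The definition of $\hat E_1$ and the continuity of $\psi_1$ imply that $\psi_1(y_1) \in \partial E$, so that $d(f\psi_1(y_1), \psi_1(y_1)) \ge 2\varepsilon_{(D,E)} > 2\varepsilon$, but

\[
\begin{aligned}
d(\psi_1 \varphi_1 j_t f \psi_1(y_1), f\psi_1(y_1)) &= d(h_1 j_t f \psi_1(y_1), f\psi_1(y_1)) \\
&\le d(h_1 j_t f \psi_1(y_1), j_t f \psi_1(y_1)) + d(j_t f \psi_1(y_1), f\psi_1(y_1)) < 2\varepsilon,
\end{aligned}
\]

so this is impossible. We have established (a) and (by symmetry) (b).

To establish (c) we need to verify (a)-(d) of Definition 13.2.2. Evidently $\hat E_1 \subset \hat D_1 \subset \hat X_1$ and $\hat E_2 \subset \hat D_2 \subset \hat X_2$ with $\hat D_1$, $\hat E_1$, $\hat D_2$, and $\hat E_2$ compact, so (a) holds.

We have $f(\psi_1(\hat D_1)) \subset f(D) \subset C$, so $f(\psi_1(\hat D_1))$ is contained in the domain of $\varphi_2$, and $\psi_2(\hat D_2) \subset D \subset C$, so $\psi_2(\hat D_2)$ is contained in the domain of $\varphi_1$. Thus $\varphi_2 f \psi_1$ and $\varphi_1 \psi_2$ are well defined, and of course they are continuous. In addition,

\[ \psi_2 \varphi_2 f \psi_1(\hat E_1) \subset \psi_2 \varphi_2 f(E) \subset U_\varepsilon(f(E)) \subset \operatorname{int} D \]

and

\[ \psi_1 \varphi_1 \psi_2(\hat E_2) \subset U_\varepsilon(\psi_2(\hat E_2)) \subset U_\varepsilon(E) \subset \operatorname{int} D, \]

so

\[ \varphi_2 f \psi_1(\hat E_1) \subset \psi_2^{-1}(\operatorname{int} D) \subset \operatorname{int} \hat D_2 \quad\text{and}\quad \varphi_1(\psi_2(\hat E_2)) \subset \psi_1^{-1}(\operatorname{int} D) \subset \operatorname{int} \hat D_1. \]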

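Thus (b) holds.

Above we showed that

\[ \varphi_1 \psi_2 \varphi_2 f \psi_1|_{\hat E_1} = \varphi_1 j_1 f \psi_1|_{\hat E_1} \quad\text{and}\quad \varphi_2 f \psi_1 \varphi_1 \psi_2|_{\hat E_2} = \varphi_2 f h_1 \psi_2|_{\hat E_2} \]

are index admissible. That is, (c) holds.

Suppose that $y_1 \in FP(\varphi_1 \psi_2 \varphi_2 f \psi_1|_{\hat E_1})$ and $y_2 = \varphi_2 f \psi_1(y_1)$. Then $\psi_2(y_2)$ is a fixed point of $\psi_2 \varphi_2 f \psi_1 \varphi_1$. The definition of $\varepsilon_{(D,E)}$ implies that this is impossible if $\psi_2(y_2) \notin E$, so $y_2 \in \psi_2^{-1}(E) = \hat E_2$. Now suppose that $y_2 \in FP(\varphi_2 f \psi_1 \varphi_1 \psi_2|_{\hat E_2})$ and $y_1 = \varphi_1 \psi_2(y_2)$. Then $\psi_1(y_1)$ is a fixed point of $\psi_1 \varphi_1 \psi_2 \varphi_2 f$, so $\psi_1(y_1) \in E$ and $y_1 \in \hat E_1$. We have shown that

\[ \varphi_2 f \psi_1(FP(\varphi_1 \psi_2 \varphi_2 f \psi_1|_{\hat E_1})) \subset \hat E_2 \quad\text{and}\quad \varphi_1 \psi_2(FP(\varphi_2 f \psi_1 \varphi_1 \psi_2|_{\hat E_2})) \subset \hat E_1, \]

which is to say that $(*)$ and $(**)$ hold, which implies (d), completing the proof.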

The hypotheses of the next result are mostly somewhat more general, but we now need to assume that $\hat{\mathcal S}$ dominates $\mathcal{S}$.

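Lemma 13.4.5. Assume that $\hat{\mathcal S}$ dominates $\mathcal{S}$. Let $X$ be an element of $\mathcal{S}_{\mathcal S}$, and let $f : C \to X$ be an element of $\mathcal{I}_{\mathcal S}(X)$. Suppose $(D_1, E_1)$ and $(D_2, E_2)$ are narrowings of focus for $f$, $0 < \varepsilon_1 < \varepsilon_{(D_1,E_1)}$, $0 < \varepsilon_2 < \varepsilon_{(D_2,E_2)}$, and $(\hat X_1, \hat C_1, \varphi_1, \psi_1)$ and $(\hat X_2, \hat C_2, \varphi_2, \psi_2)$ are an $(\hat{\mathcal S}, \varepsilon_1)$-domination and an $(\hat{\mathcal S}, \varepsilon_2)$-domination of $C$. Set

\[ \hat D_1 = \psi_1^{-1}(D_1), \quad \hat E_1 = \psi_1^{-1}(E_1), \quad \hat D_2 = \psi_2^{-1}(D_2), \quad \hat E_2 = \psi_2^{-1}(E_2). \]

Then

\[ \hat\Lambda_{\hat X_1}(\varphi_1 f \psi_1|_{\hat E_1}) = \hat\Lambda_{\hat X_2}(\varphi_2 f \psi_2|_{\hat E_2}). \]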

Proof. It suffices to show this when $D_1 \subset D_2$ and $E_1 \subset E_2$, because then the general case follows from two applications in which first $D_1$ and $E_1$, and then $D_2$ and $E_2$, are replaced by $D_1 \cap D_2$ and $E_1 \cap E_2$. The assumption that $\hat{\mathcal S}$ dominates $\mathcal{S}$ guarantees the existence of an $(\hat{\mathcal S}, \varepsilon_2')$-domination of $C$ for arbitrarily small $\varepsilon_2'$, and if we apply the lemma above to this domination and the given one we find that it suffices to prove the result with the given domination replaced by this one. This means that we may assume that $\varepsilon_2$ is as small as need be, and in particular we may assume that $\varepsilon_2 < \varepsilon_{(D_1,E_1)}$. Now Additivity implies that

\[ \hat\Lambda_{\hat X_2}(\varphi_2 f \psi_2|_{\hat E_2}) = \hat\Lambda_{\hat X_2}(\varphi_2 f \psi_2|_{\psi_2^{-1}(E_1)}), \]

which means that it suffices to establish the result with $D_2$ and $E_2$ replaced by $D_1$ and $E_1$, which is the case established in the lemma above.

Proof of Theorem 13.4.1. Since $\hat{\mathcal S}$ dominates $\mathcal{S}$, the objects used to define $\Lambda$ exist, and the last result implies that the definition of $\Lambda$ does not depend on the choice of $(D, E)$, $\varepsilon$, and $(\hat X, \hat C, \varphi, \psi)$. We now verify that $\Lambda$ satisfies (I1)-(I4) and (M).


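For the proofs of (I1)-(I3) we fix a particular $X \in \mathcal{S}_{\mathcal S}$ and an $f : C \to X$ in $\mathcal{I}_{\mathcal S}(X)$, and we let $(D, E)$, $\varepsilon$, and $(\hat X, \hat C, \varphi, \psi)$ be as in the hypotheses.

Normalization:

If $f$ is a constant function, then so is $\varphi f \psi$, so Normalization for $\hat\Lambda$ gives

\[ \Lambda_X(f) = \hat\Lambda_{\hat X}(\varphi f \psi) = 1. \]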

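Additivity:

Suppose that $FP(f) \subset \operatorname{int} C_1 \cup \ldots \cup \operatorname{int} C_r$ where $C_1, \ldots, C_r \subset C$ are compact and pairwise disjoint. For each $j = 1, \ldots, r$ choose open sets $D_j \subset D \cap C_j$ and $E_j \subset E \cap C_j$ such that $(D_j, E_j)$ is a narrowing of focus for $(C_j, f|_{C_j})$. In view of Lemma 13.4.5 we may assume that $\varepsilon < \varepsilon_{(D_j,E_j)}$ for all $j$. It is easy to see that for each $j$, $(\hat X, \hat C, \varphi|_{C_j}, \psi)$ is an $(\hat{\mathcal S}, \varepsilon)$-domination of $C_j$. For each $j$ let $\hat E_j' = \psi^{-1}(E_j)$. Additivity for $\hat\Lambda$ gives

\[ \Lambda_X(f) = \hat\Lambda_{\hat X}(\varphi f \psi|_{\hat E}) = \sum_j \hat\Lambda_{\hat X}(\varphi f \psi|_{\hat E_j'}) = \sum_j \Lambda_X(f|_{C_j}). \]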

Continuity:

It is easy to see that if $f' : C \to X$ is sufficiently close to $f$, then $(D, E)$ is a narrowing of focus for $(C, f')$, and $(\hat X, \hat C, \varphi, \psi)$ is an $(\hat{\mathcal S}, \varepsilon)$-domination of $C$. Since $f' \mapsto \varphi f' \psi$ is continuous (Propositions 5.5.2 and 5.5.3) Continuity for $\hat\Lambda$ gives

\[ \Lambda_X(f) = \hat\Lambda_{\hat X}(\varphi f \psi) = \hat\Lambda_{\hat X}(\varphi f' \psi) = \Lambda_X(f') \]

when $f'$ is sufficiently close to $f$.

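Commutativity:

Suppose that $(X_1, C_1, D_1, g_1, X_2, C_2, D_2, g_2)$ is a commutativity configuration with $X_1, X_2 \in \mathcal{S}_{\mathcal S}$. Replacing $D_1$ and $D_2$ with smaller open neighborhoods of $FP(g_2 g_1)$ and $FP(g_1 g_2)$ if need be, we may assume that

\[ D_1 \cup g_2 g_1(D_1) \subset C_1 \quad\text{and}\quad D_2 \cup g_1 g_2(D_2) \subset C_2. \]

Choose open sets $E_1$ and $E_2$ with

\[ FP(g_2 g_1) \subset E_1, \quad E_1 \cup g_2 g_1(E_1) \subset D_1, \quad FP(g_1 g_2) \subset E_2, \quad E_2 \cup g_1 g_2(E_2) \subset D_2. \]

For any positive $\varepsilon_1 < \varepsilon_{(D_1,E_1)}$ and $\varepsilon_2 < \varepsilon_{(D_2,E_2)}$ Lemma 13.4.5 implies that there is an $(\hat{\mathcal S}, \varepsilon_1)$-domination $(\hat X_1, \hat C_1, \varphi_1, \psi_1)$ of $C_1$ and an $(\hat{\mathcal S}, \varepsilon_2)$-domination $(\hat X_2, \hat C_2, \varphi_2, \psi_2)$ of $C_2$. Let

\[ \hat D_1 = \psi_1^{-1}(D_1), \quad \hat E_1 = \psi_1^{-1}(E_1), \quad \hat D_2 = \psi_2^{-1}(D_2), \quad \hat E_2 = \psi_2^{-1}(E_2). \]

Let $h : C_1 \times [0, 1] \to X_1$ be an $\varepsilon_1$-homotopy with $h_0 = \mathrm{Id}_{C_1}$ and $h_1 = \psi_1 \varphi_1$, and let $j : C_2 \times [0, 1] \to X_2$ be an $\varepsilon_2$-homotopy with $j_0 = \mathrm{Id}_{C_2}$ and $j_1 = \psi_2 \varphi_2$. The desired result will follow from the calculation

\[
\begin{aligned}
\Lambda_{X_1}(g_2 g_1) &= \hat\Lambda_{\hat X_1}(\varphi_1 g_2 g_1 \psi_1|_{\hat E_1}) = \hat\Lambda_{\hat X_1}(\varphi_1 g_2 \psi_2 \varphi_2 g_1 \psi_1|_{\hat E_1}) \\
&= \hat\Lambda_{\hat X_2}(\varphi_2 g_1 \psi_1 \varphi_1 g_2 \psi_2|_{\hat E_2}) = \hat\Lambda_{\hat X_2}(\varphi_2 g_1 g_2 \psi_2|_{\hat E_2}) = \Lambda_{X_2}(g_1 g_2).
\end{aligned}
\]

Here the first and fifth equalities are from the definition of $\Lambda$, the second and fourth are implied by Continuity for $\hat\Lambda$, and the third is from Commutativity for $\hat\Lambda$. In order for this to work it must be the case that all the compositions in this calculation are well defined, in the sense that the image of the first function is contained in the domain of the second function, the homotopies

\[ t \mapsto \varphi_1 g_2 j_t g_1 \psi_1|_{\hat E_1} \quad\text{and}\quad t \mapsto \varphi_2 g_1 h_t g_2 \psi_2|_{\hat E_2} \]

are index admissible, and

\[ (\hat X_1, \hat D_1, \hat E_1, \varphi_2 g_1 \psi_1|_{\hat D_1}, \hat X_2, \hat D_2, \hat E_2, \varphi_1 g_2 \psi_2|_{\hat D_2}) \]

is a commutativity configuration. Clearly this will be the case when $\varepsilon_1$ and $\varepsilon_2$ are sufficiently small.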

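Multiplication:

We now consider $X_1, X_2 \in \mathcal{S}_{\mathcal S}$, $f_1 : C_1 \to X_1$ in $\mathcal{I}_{\mathcal S}(X_1)$, and $f_2 : C_2 \to X_2$ in $\mathcal{I}_{\mathcal S}(X_2)$. For each $i = 1, 2$ let $(D_i, E_i)$ be a narrowing of focus for $(C_i, f_i)$, and let $(\hat X_i, \hat C_i, \varphi_i, \psi_i)$ be an $(\hat{\mathcal S}, \varepsilon_i)$-domination of $C_i$, where $\varepsilon_i < \varepsilon_{(D_i,E_i)}$. The definition of an index scope requires that $X_1 \times X_2 \in \mathcal{S}_{\mathcal S}$ and $(C_1 \times C_2, f_1 \times f_2) \in \mathcal{I}_{\mathcal S}(X_1 \times X_2)$. Clearly $(D_1 \times D_2, E_1 \times E_2)$ is a narrowing of focus for $(C_1 \times C_2, f_1 \times f_2)$. If $d_1$ and $d_2$ are given metrics for $X_1$ and $X_2$ respectively, endow $X_1 \times X_2$ with the metric

\[ ((x_1, x_2), (y_1, y_2)) \mapsto \max\{d_1(x_1, y_1), d_2(x_2, y_2)\}. \]

Let $\varepsilon = \max\{\varepsilon_1, \varepsilon_2\}$. Then $(\hat X_1 \times \hat X_2, \hat C_1 \times \hat C_2, \varphi_1 \times \varphi_2, \psi_1 \times \psi_2)$ is an $(\hat{\mathcal S}, \varepsilon)$-domination of $C_1 \times C_2$. It is also easy to check that $\varepsilon_{(D_1 \times D_2, E_1 \times E_2)} \ge \max\{\varepsilon_{(D_1,E_1)}, \varepsilon_{(D_2,E_2)}\}$, so $\varepsilon < \varepsilon_{(D_1 \times D_2, E_1 \times E_2)}$. Therefore Lemma 13.4.3 implies the validity of the first equality in

\[
\begin{aligned}
\Lambda_{X_1 \times X_2}(f_1 \times f_2) &= \hat\Lambda_{\hat X_1 \times \hat X_2}((\varphi_1 f_1 \psi_1 \times \varphi_2 f_2 \psi_2)|_{\hat E_1 \times \hat E_2}) \\
&= \hat\Lambda_{\hat X_1}(\varphi_1 f_1 \psi_1|_{\hat E_1}) \cdot \hat\Lambda_{\hat X_2}(\varphi_2 f_2 \psi_2|_{\hat E_2}) = \Lambda_{X_1}(f_1) \cdot \Lambda_{X_2}(f_2);
\end{aligned}
\]

the second one is an application of Multiplication for $\hat\Lambda$ and the third is the definition of $\Lambda$.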

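We now prove that if $\mathcal{S}$ subsumes $\hat{\mathcal S}$, then $\Lambda$ is the unique extension of $\hat\Lambda$ to $\mathcal{S}$. Consider $X \in \mathcal{S}_{\hat{\mathcal S}}$ and $(C, f) \in \mathcal{I}_{\hat{\mathcal S}}(X)$. For any $\varepsilon > 0$, $(X, C, \mathrm{Id}_C, \mathrm{Id}_C)$ is an $(\hat{\mathcal S}, \varepsilon)$-domination of $C$. For any narrowing of focus $(D, E)$ equation $(\dagger)$ gives $\Lambda_X(f) = \hat\Lambda_X(f|_E)$ and Additivity for $\hat\Lambda$ gives $\hat\Lambda_X(f|_E) = \hat\Lambda_X(f)$. Thus $\Lambda$ extends $\hat\Lambda$.

Two indices for $\mathcal{S}$ that restrict to $\hat\Lambda$ necessarily agree everywhere because, by Continuity and Commutativity, $(\dagger)$ holds in the circumstances described in the statement of Theorem 13.4.1.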


13.5 Extension by Continuity

This section extends the index from continuous functions to upper semicontinuous contractible valued correspondences. As in the last section, we describe the extension process abstractly, thereby emphasizing the aspects of the situation that drive the argument.

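Definition 13.5.1. If $\mathcal{I}$ and $\hat{\mathcal I}$ are index bases for a compact metric space $X$, we say that $\hat{\mathcal I}$ approximates $\mathcal{I}$ if:

(E1) If $C, D \subset X$ are open with $\overline{D} \subset C$, then $\hat{\mathcal I} \cap C(D, X)$ is dense in

\[ \{\, F|_D : F \in \mathcal{I} \cap U(C, X) \text{ and } F|_D \text{ is index admissible} \,\}. \]

(E2) If $C, D \subset X$ are open with $\overline{D} \subset C$, $F \in \mathcal{I} \cap U(C, X)$, and $A \subset C \times X$ is a neighborhood of $Gr(F)$, then there is a neighborhood $B \subset D \times X$ of $Gr(F|_D)$ such that any two functions $f, f' \in C(D, X)$ with $Gr(f), Gr(f') \subset B$ are the endpoints of a homotopy $h : [0, 1] \to C(D, X)$ with $Gr(h_t) \subset A$ for all $t$.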

It would be simpler if, in (E1) and (E2), we could have $D = C$, but unfortunately Theorem 9.1.1 is not strong enough to justify working with such a definition.

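Definition 13.5.2. If $\mathcal{S}$ and $\hat{\mathcal S}$ are two index scopes with $\mathcal{S}_{\mathcal S} = \mathcal{S}_{\hat{\mathcal S}}$, then $\hat{\mathcal S}$ approximates $\mathcal{S}$ if, for each $X \in \mathcal{S}_{\mathcal S}$, $\mathcal{I}_{\hat{\mathcal S}}(X)$ approximates $\mathcal{I}_{\mathcal S}(X)$, and

(E3) If $(X, C, D, g, X', C', D', g')$ is a commutativity configuration such that $X, X' \in \mathcal{S}_{\mathcal S}$, $g' \circ g \in \mathcal{I}_{\mathcal S}(X)$, and $g \circ g' \in \mathcal{I}_{\mathcal S}(X')$, and $S \subset C(C, X')$ and $S' \subset C(C', X)$ are neighborhoods of $g$ and $g'$, then there exist $\gamma \in S$ and $\gamma' \in S'$ such that $\gamma' \circ \gamma|_D \in \mathcal{I}_{\hat{\mathcal S}}(X)$ and $\gamma \circ \gamma'|_{D'} \in \mathcal{I}_{\hat{\mathcal S}}(X')$.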

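Theorem 13.5.3. Suppose that $\hat{\mathcal S}$ approximates $\mathcal{S}$, and $\hat\Lambda$ is an index for $\hat{\mathcal S}$. For each $X \in \mathcal{S}_{\mathcal S}$ let $\Lambda_X$ be the extension of $\hat\Lambda_X$ to $\mathcal{I}_{\mathcal S}(X)$ given by Proposition 13.5.4. Then the system $\Lambda$ of maps $\Lambda_X$ is an index for $\mathcal{S}$. If, in addition, $\mathcal{S}$ subsumes $\hat{\mathcal S}$, then $\Lambda$ is the unique extension of $\hat\Lambda$ to $\mathcal{S}$.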

Evidently $\mathcal{S}^{Ctr}$ subsumes $\mathcal{S}^{ANR}$. The constant functions in $\mathcal{S}^{ANR}$ and $\mathcal{S}^{Ctr}$ are the same, of course, and Theorem 9.1.1 implies that (E1) and (E2) are satisfied when $\mathcal{S} = \mathcal{S}^{Ctr}$ and $\hat{\mathcal S} = \mathcal{S}^{ANR}$. Therefore Theorem 13.2.4 follows from Theorem 13.4.2 and the last result.

The remainder of this section is devoted to the proof of Theorem 13.5.3. The overall structure of our work here is similar to what we saw in the last section. We are given an index $\hat\Lambda$ for an index base $\hat{\mathcal I}$, and we wish to use this to define an index for another index base $\mathcal{I}$. In this case Continuity is the axiom that does the heavy lifting. Assumption (E1) states that every element of the second base can be approximated by an element of the first base. Therefore we can define the index of an element of the second base to be the index of sufficiently fine approximations by elements of the first base, provided these all agree, and assumption (E2), in conjunction with Continuity, implies that this is the case.

Having defined the index on the second base, we must verify that it satisfies the axioms. This phase is broken down into two parts. The following result verifies that the axioms pertaining to a single index base hold.


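Proposition 13.5.4. Suppose $\mathcal{I}$ and $\hat{\mathcal I}$ are index bases for a compact metric space $X$, and $\hat{\mathcal I}$ approximates $\mathcal{I}$. Then for any index $\hat\Lambda_X : \hat{\mathcal I} \to \mathbb{Z}$ there is a unique index $\Lambda_X : \mathcal{I} \to \mathbb{Z}$ such that for each open $C \subset X$ with compact closure, each $F \in \mathcal{I} \cap U(C, X)$, and each open $D$ with $FP(F) \subset D$ and $\overline{D} \subset C$, there is a neighborhood $E \subset U(D, X)$ of $F|_D$ such that $\Lambda_X(F) = \hat\Lambda_X(f)$ for all $f \in E \cap C(D, X) \cap \hat{\mathcal I}$.

Proof. Fix $C$, $F \in \mathcal{I} \cap U(C, X)$, and $D$ as in the hypotheses. Then $F|_D$ is index admissible, hence an element of $\mathcal{I}$ because $\mathcal{I}$ is an index base.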

Applying (E2), let $B \subset D \times X$ be a neighborhood of $Gr(F|_D)$ such that for any $f, f' \in \hat{\mathcal I} \cap C(D, X)$ with $Gr(f), Gr(f') \subset B$ there is a homotopy $h : [0, 1] \to C(D, X)$ with $h_0 = f$, $h_1 = f'$, and

\[ Gr(h_t) \subset (C \times X) \setminus \{\, (x, x) : x \in C \setminus D \,\} \]

for all $t$. Since $F$ has no fixed points in $C \setminus D$, the right hand side is a neighborhood of $Gr(F|_D)$. We define $\Lambda_X$ by setting

\[ \Lambda_X(F) := \hat\Lambda_X(f) \]

for any such $f$.

We first have to show that this definition makes sense. First, (E1) implies that

\[ \{\, F' \in U(D, X) : Gr(F') \subset B \,\} \cap C(D, X) \cap \hat{\mathcal I} \ne \emptyset, \]

and Continuity implies that this definition does not depend on the choice of $f$. Since $A$ and $B$ can be replaced by smaller open sets, it does not depend on the choice of $A$ and $B$. We must also show that it does not depend on the choice of $D$.

So, let $\tilde D$ be another open set with $\overline{\tilde D} \subset C$ and $FP(F) \cap (C \setminus \tilde D) = \emptyset$. Then $FP(F) \subset D \cap \tilde D$. The desired result follows if we can show that it holds when $D$ and $\tilde D$ are replaced by $D$ and $D \cap \tilde D$ and also when $D$ and $\tilde D$ are replaced by $D \cap \tilde D$ and $\tilde D$. Therefore we may assume that $\tilde D \subset D$.

Let $\tilde B \subset \tilde D \times X$ be a neighborhood of $Gr(F|_{\tilde D})$ such that for any $f, f' \in \hat{\mathcal I} \cap C(\tilde D, X)$ with $Gr(f), Gr(f') \subset \tilde B$ there is a homotopy $h : [0, 1] \to C(\tilde D, X)$ with $h_0 = f$, $h_1 = f'$, and

\[ Gr(h_t) \subset (C \times X) \setminus \{\, (x, x) : x \in C \setminus \tilde D \,\} \]

for all $t$. Since restriction to a compact subdomain is a continuous operation (Lemma 5.3.1) we may replace $B$ with a smaller neighborhood of $Gr(F|_D)$ to obtain $Gr(f|_{\tilde D}) \subset \tilde B$ whenever $Gr(f) \subset B$. For such an $f$ Additivity gives $\hat\Lambda_X(f) = \hat\Lambda_X(f|_{\tilde D})$ as desired.

It remains to show that (I1)-(I3) are satisfied.

Normalization:

If $c$ is a constant function, we can take $c$ itself as the approximation used to define $\Lambda_X(c)$, so Normalization for $\Lambda_X$ follows from Normalization for $\hat\Lambda_X$.

Additivity:

Consider $F \in \mathcal{I}$ with domain $C$. Let $C_1, \ldots, C_r$ be disjoint open subsets of $C$ whose union contains $FP(F)$. Let $D_1, \ldots, D_r$ be open subsets of $C$ with $\overline{D}_1 \subset C_1, \ldots, \overline{D}_r \subset C_r$ and $FP(F) \subset D_1 \cup \ldots \cup D_r$. For each $i = 1, \ldots, r$ let $B_i$ be a neighborhood of $Gr(F|_{D_i})$ such that $\Lambda_X(F|_{C_i}) = \hat\Lambda_X(f_i)$ whenever $f_i \in \hat{\mathcal I} \cap C(D_i, X)$ with $Gr(f_i) \subset B_i$. Let $D := D_1 \cup \ldots \cup D_r$, and let $B$ be a neighborhood of $Gr(F|_D)$ such that $\Lambda_X(F) = \hat\Lambda_X(f)$ whenever $f \in \hat{\mathcal I} \cap C(D, X)$ with $Gr(f) \subset B$. Since restriction to a compact subdomain is a continuous operation (Lemma 5.3.1) $B$ may be chosen so that, for all $i$, $Gr(f|_{D_i}) \subset B_i$ whenever $Gr(f) \subset B$. For any $f \in \hat{\mathcal I} \cap C(D, X)$ with $Gr(f) \subset B$ we now have

\[ \Lambda_X(F) = \hat\Lambda_X(f|_D) = \sum_i \hat\Lambda_X(f|_{D_i}) = \sum_i \Lambda_X(F|_{C_i}). \]

Continuity:

Suppose that $C \subset X$ is open with compact closure, $D \subset C$ is open with $\overline{D} \subset C$, $F \in \mathcal{I}$ with $FP(F) \subset D$, and $B$ is a neighborhood of $Gr(F|_D)$ with $\Lambda_X(F) = \hat\Lambda_X(f)$ for all $f \in \hat{\mathcal I} \cap C(D, X)$ with $Gr(f) \subset B$. Then the set of $F' \in \mathcal{I} \cap U(C, X)$ such that $Gr(F'|_D) \subset B$ and $FP(F') \subset D$ is a neighborhood of $F$, and for every such $F'$ we have $\Lambda_X(F') = \Lambda_X(F)$.

The remainder of the argument shows that the extension procedure described above results in an index satisfying (I4) and (M) when it is used to define extensions for all spaces in an index scope.

Proof of Theorem 13.5.3. We begin by noting that when $\mathcal{S}$ subsumes $\hat{\mathcal S}$, Continuity for $\Lambda$ implies both that $\Lambda$ is an extension of $\hat\Lambda$ and that any extension must satisfy the condition used to define $\Lambda$, so $\Lambda$ is the unique extension. In view of the last result, it is only necessary to verify that $\Lambda$ satisfies (I4) and (M). The argument is based on the description of $\Lambda$ given in the first paragraph of the proof of the last result.

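Commutativity:

Suppose that $(X, C, D, g, X', C', D', g')$ is a commutativity configuration with $g' \circ g \in \mathcal{I}_{\mathcal S}(X)$ and $g \circ g' \in \mathcal{I}_{\mathcal S}(X')$. Lemma 13.3.4 implies that there are neighborhoods $S \subset C(C, X')$ and $S' \subset C(C', X)$ such that $(X, C, D, \gamma, X', C', D', \gamma')$ is a commutativity configuration for all $\gamma \in S$ and $\gamma' \in S'$. Let $B \subset C(D, X)$ and $B' \subset C(D', X')$ be neighborhoods of $g' \circ g|_D$ and $g \circ g'|_{D'}$, respectively, such that $\Lambda_X(g' \circ g|_D) = \hat\Lambda_X(f)$ and $\Lambda_{X'}(g \circ g'|_{D'}) = \hat\Lambda_{X'}(f')$ whenever $f \in B \cap \mathcal{I}_{\hat{\mathcal S}}(X)$ and $f' \in B' \cap \mathcal{I}_{\hat{\mathcal S}}(X')$. The continuity of restriction and composition (Lemma 5.3.1 and Proposition 5.3.6) implies the existence of neighborhoods $T \subset C(C, X')$ of $g$ and $T' \subset C(C', X)$ of $g'$ such that $\gamma' \circ \gamma|_D \in B$ and $\gamma \circ \gamma'|_{D'} \in B'$ whenever $\gamma \in T$ and $\gamma' \in T'$. Applying (E3), we may choose $\gamma \in S \cap T$ and $\gamma' \in S' \cap T'$ such that $\gamma' \circ \gamma|_D \in \mathcal{I}_{\hat{\mathcal S}}(X)$ and $\gamma \circ \gamma'|_{D'} \in \mathcal{I}_{\hat{\mathcal S}}(X')$. Now Commutativity for $\hat\Lambda$ gives

\[ \Lambda_X(g' \circ g|_D) = \hat\Lambda_X(\gamma' \circ \gamma|_D) = \hat\Lambda_{X'}(\gamma \circ \gamma'|_{D'}) = \Lambda_{X'}(g \circ g'|_{D'}). \]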

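Multiplication:

For spaces $X, X' \in \mathcal{S}_{\mathcal S}$ and open $C \subset X$ and $C' \subset X'$ with compact closure consider

\[ F \in \mathcal{I}_{\mathcal S}(X) \cap U(C, X) \quad\text{and}\quad F' \in \mathcal{I}_{\mathcal S}(X') \cap U(C', X'). \]

Then the definition of an index scope implies that $F \times F' \in \mathcal{I}_{\mathcal S}(X \times X')$. Choose open sets $D$ and $D'$ with $FP(F) \subset D$, $\overline{D} \subset C$, $FP(F') \subset D'$, and $\overline{D'} \subset C'$. As above, we can find neighborhoods $B \subset U(D, X)$, $B' \subset U(D', X')$, and $\mathcal{D} \subset U(D \times D', X \times X')$, of $F|_D$, $F'|_{D'}$, and $(F \times F')|_{D \times D'}$ respectively, such that $\Lambda_X(F) = \hat\Lambda_X(f)$ for all $f \in B \cap \mathcal{I}_{\hat{\mathcal S}}(X)$, $\Lambda_{X'}(F') = \hat\Lambda_{X'}(f')$ for all $f' \in B' \cap \mathcal{I}_{\hat{\mathcal S}}(X')$, and $\Lambda_{X \times X'}(F \times F') = \hat\Lambda_{X \times X'}(j)$ for all $j \in \mathcal{D} \cap \mathcal{I}_{\hat{\mathcal S}}(X \times X')$. Since the formation of cartesian products of correspondences is a continuous operation (this is Lemma 5.3.4) we may replace $B$ and $B'$ with smaller neighborhoods to obtain $G \times G' \in \mathcal{D}$ for all $G \in B$ and $G' \in B'$. Assumption (E1) implies that there are

\[ f \in B \cap \mathcal{I}_{\hat{\mathcal S}}(X) \cap C(D, X) \quad\text{and}\quad f' \in B' \cap \mathcal{I}_{\hat{\mathcal S}}(X') \cap C(D', X'). \]

The definition of an index scope implies that $f \times f' \in \mathcal{I}_{\hat{\mathcal S}}(X \times X')$, and Multiplication (M) for $\hat\Lambda$ now gives

\[ \Lambda_{X \times X'}(F \times F') = \hat\Lambda_{X \times X'}(f \times f') = \hat\Lambda_X(f) \cdot \hat\Lambda_{X'}(f') = \Lambda_X(F) \cdot \Lambda_{X'}(F'). \]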

Part III

Applications and Extensions

Chapter 14

Topological Consequences

This chapter is a relaxing and refreshing change of pace. Instead of working very hard to slowly build up a toolbox of techniques and specific facts, we are going to harvest the fruits of our earlier efforts, using the axiomatic description of the fixed point index, and other major results, to quickly derive a number of quite famous results. In Section 14.1 we define the Euler characteristic, relate it to the Lefschetz fixed point theorem, and then describe the Eilenberg-Montgomery theorem as a special case.

For two general compact manifolds, the degree of a map from one to the other is a rather crude invariant, in comparison with many others that topologists have defined. Nevertheless, when the range is the m-dimensional sphere, the degree is already a “complete” invariant in the sense that it classifies functions up to homotopy: if M is a compact m-dimensional manifold that is connected, and f and f′ are functions from M to the m-sphere of the same degree, then f and f′ are homotopic. This famous theorem, due to Hopf, is the subject of Section 14.2. Section 12.4 proves a simple result asserting that the degree of a composition of two functions is the product of their degrees.

Section 14.3 presents several other results concerning fixed points and antipodal points of maps from a sphere to itself. Some of these are immediate consequences of index theory and the Hopf theorem, but the Borsuk-Ulam theorem requires a substantial proof, so it should be thought of as a significant independent fact of topology. It has many consequences, including the fact that spheres of different dimensions are not homeomorphic.

In Section 14.4 we state and prove the theorem known as invariance of domain. It asserts that if U ⊂ Rm is open, and f : U → Rm is continuous and injective, then the image of f is open, and the inverse is continuous. One may think of this as a purely topological version of the inverse function theorem, but from the technical point of view it is much deeper.

If a connected set of fixed points has a nonzero index, it is essential. This raises the question of whether a connected set of fixed points of index zero is necessarily inessential. Section 14.5 presents two results of this sort.

14.1 Euler, Lefschetz, and Eilenberg-Montgomery

The definition of the Euler characteristic, and Euler’s use of it in the analyses of various problems, is often described as the historical starting point of topology as a branch of mathematics. In popular expositions the Euler characteristic of a 2-dimensional manifold M is usually defined by the formula χ(M) := V − E + F where V, E, and F are the numbers of vertices, edges, and 2-simplices in a triangulation of M. Our definition is:

Definition 14.1.1. The Euler characteristic χ(X) of a compact ANR X is ΛX(IdX).

Here is a sketch of a proof that our definition of χ(M) agrees with Euler’s when M is a triangulated compact 2-manifold. We deform the identity function slightly, achieving a function f : M → M defined as follows. Each vertex of the triangulation is mapped to itself by f. Each barycenter of an edge is mapped to itself, and the points on the edge between the barycenter and either of the vertices of the edge are moved toward the barycenter. Each barycenter of a two dimensional simplex is mapped to itself. If x is a point on the boundary of the 2-simplex, the line segment between x and the barycenter is mapped to the line segment between f(x) and the barycenter, with points on the interior of the line segment pushed toward the barycenter, relative to the affine mapping. It is easy to see that the only fixed points of f are the vertices and the barycenters of the edges and 2-simplices. Euler’s formula follows once we show that the index of a vertex is +1, the index of the barycenter of an edge is −1, and the index of the barycenter of a 2-simplex is +1. We will not give a detailed argument to this effect; very roughly it corresponds to the intuition that f is “expansive” at each vertex, “compressive” at the barycenter of each 2-simplex, and expansive in one direction and compressive in another at the barycenter of an edge.
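The count V − E + F, which is also the sum of the indices +1, −1, +1 over vertices, edge barycenters, and 2-simplex barycenters, is easy to compute for small triangulations. The following is only an illustrative sketch: the function names and the two example triangulations (the boundary of a tetrahedron and a grid triangulation of the torus) are our own choices and do not come from the text.

```python
from itertools import combinations

def euler_characteristic(triangles):
    """Return V - E + F for a triangulated surface given as vertex triples.

    Each vertex and each 2-simplex contributes +1 and each edge contributes -1,
    matching the indices of the fixed points of the deformed identity map.
    """
    faces = {frozenset(t) for t in triangles}
    edges = {frozenset(e) for t in faces for e in combinations(sorted(t), 2)}
    vertices = {v for t in faces for v in t}
    return len(vertices) - len(edges) + len(faces)

# Boundary of a tetrahedron: a triangulated 2-sphere (hypothetical example).
tetrahedron = list(combinations(range(4), 3))

def torus_grid(n):
    """Triangulate the torus as an n-by-n grid of squares, each cut in two.

    Assumes n >= 3 so that distinct simplices have distinct vertex sets.
    """
    tris = []
    for i in range(n):
        for j in range(n):
            a = (i, j)
            b = ((i + 1) % n, j)
            c = (i, (j + 1) % n)
            d = ((i + 1) % n, (j + 1) % n)
            tris += [(a, b, d), (a, c, d)]
    return tris

print(euler_characteristic(tetrahedron))    # 4 - 6 + 4 = 2
print(euler_characteristic(torus_grid(4)))  # 16 - 48 + 32 = 0
```

The two values agree with the familiar facts that the 2-sphere has Euler characteristic 2 and the torus has Euler characteristic 0.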

Although Euler could not have expressed the idea in modern language, he certainly understood that the Euler characteristic is important because it is a topological invariant.

Theorem 14.1.2. If X and X′ are homeomorphic compact ANR’s, then

χ(X) = χ(X′).

Proof. For any homeomorphism h : X → X′, Commutativity implies that

χ(X) = ΛX(IdX) = ΛX(IdX ◦ h⁻¹ ◦ h) = ΛX′(h ◦ IdX ◦ h⁻¹) = ΛX′(IdX′) = χ(X′).

The analytic method implicit in Euler’s definition (pass from a topological space, e.g., a compact surface, to a discrete object, in this case a triangulation, that can be analyzed combinatorically and quantitatively) has of course been extremely fruitful. But as a method of proving that the Euler characteristic is a topological invariant, it fails in a spectacular manner. There is first of all the question of whether a triangulation exists. That a two dimensional compact manifold is triangulable was not proved until the 1920’s, by Rado. In the 1950’s Bing and Moise proved that compact three dimensional manifolds are triangulable, and a stream of research during this same general period showed that smooth manifolds are triangulable, but in general a compact manifold need not have a triangulation. For simplicial complexes topological invariance would follow from invariance under subdivision, which can be proved combinatorically, and the Hauptvermutung, which was the conjecture that any two simplicial complexes that are homeomorphic have subdivisions that are combinatorically isomorphic. This conjecture was formulated by Steinitz and Tietze in 1908, but in 1961 Milnor presented a counterexample, and in the late 1960’s it was shown to be false even for triangulable manifolds.

The Lefschetz fixed point theorem is a generalization of Brouwer’s theorem that was developed by Lefschetz for compact manifolds in Lefschetz (1923, 1926) and extended by him to manifolds with boundary in Lefschetz (1927). Using quite different methods, Hopf extended the result to simplicial complexes in Hopf (1928).

Definition 14.1.3. If X is a compact ANR and F : X → X is an upper semicontinuous contractible valued correspondence, the Lefschetz number of F is ΛX(F).

Theorem 14.1.4. If X is a compact ANR, F : X → X is an upper semicontinuous contractible valued correspondence and ΛX(F) ≠ 0, then FP(F) ≠ ∅.

Proof. When FP(F) = ∅ two applications of Additivity give

Λ(F|∅) = Λ(F) = Λ(F|∅) + Λ(F|∅),

so Λ(F|∅) = 0 and therefore Λ(F) = 0.

In Lefschetz’ original formulation the Lefschetz number of a function was defined using algebraic topology. Thus one may view the Lefschetz fixed point theorem as a combination of the result above and a formula expressing the Lefschetz number in terms of homology.

In the Kakutani fixed point theorem, the hypothesis that the correspondence is convex valued cries out for generalization, because convexity is not a topological concept that is preserved by homeomorphisms of the space. The Eilenberg-Montgomery theorem asserts that if X is a compact acyclic ANR, and F : X → X is an upper semicontinuous acyclic valued correspondence, then F has a fixed point. Unfortunately it would take many pages to define acyclicity, so we will simply say that acyclicity is a property that is invariant under homeomorphism, and is weaker than contractibility. The known examples of spaces that are acyclic but not contractible are not objects one would expect to encounter “in nature,” so it seems farfetched that the additional strength of the Eilenberg-Montgomery theorem, beyond that of the result below, will ever figure in economic analysis.

Theorem 14.1.5. If X is a nonempty compact absolute retract and F : X → X is an upper semicontinuous contractible valued correspondence, then F has a fixed point.

Proof. Recall (Proposition 7.5.3) that an absolute retract is an ANR that is contractible. Theorem 9.1.1 implies that F can be approximated in the sense of Continuity by a continuous function, so ΛX(F) = ΛX(f) for some continuous f : X → X. Let c : X × [0, 1] → X be a contraction. Then (x, t) ↦ c(f(x), t) (or (x, t) ↦ f(c(x, t))) is a homotopy between f and a constant function, so Homotopy and Normalization imply that ΛX(f) = 1. Now the claim follows from the last result.

14.2 The Hopf Theorem

Two functions that are homotopic may differ in their quantitative features, but from the perspective of topology these differences are uninteresting. Two functions that are not homotopic differ in some qualitative way that one may hope to characterize in terms of discrete objects. A homotopy invariant may be thought of as a function whose domain is the set of homotopy classes; equivalently, it may be thought of as a mapping from a space of functions that is constant on each homotopy class. A fundamental method of topology is to define and study homotopy invariants.

The degree is an example: for compact manifolds M and N of the same dimension it assigns an integer to each continuous f : M → N, and if f and f′ are homotopic, then they have the same degree. There are a great many other homotopy invariants, whose systematic study is far beyond our scope. In the study of such invariants, one is naturally interested in settings in which some invariant (or collection of invariants) gives a complete classification, in the sense that if two functions are not homotopic, then the invariant assigns different values to them. The prototypical result of this sort, due to Hopf, asserts that the degree is a complete invariant when N is the m-sphere.

Theorem 14.2.1 (Hopf). If M is an m-dimensional compact connected smooth manifold, then two maps f, f′ : M → Sm are homotopic if and only if deg(f) = deg(f′).

We provide a rather informal sketch of the proof. Since the ideas in the argument are geometric, and easily visualized, this should be completely convincing, and little would be gained by adding more formal details of particular constructions.

We already know that two homotopic functions have the same degree, so our goal is to show that two functions of the same degree are homotopic. Consider a particular f : M → Sm. The results of Section 10.7 imply that CS(M, Sm) is locally path connected, and that C∞(M, Sm) is dense in this space, so f is homotopic to a smooth function. Suppose that f is smooth, and that q is a regular value of f. (The existence of such a q follows from Sard’s theorem.) The inverse function theorem implies that if D is a sufficiently small disk in Sm centered at q, then f⁻¹(D) is a collection of pairwise disjoint disks, each containing one element of f⁻¹(q).

Let q− be the antipode of q in Sm. (This is −q when Sm is the unit sphere centered at the origin in Rm+1.) Let j : Sm × [0, 1] → Sm be a homotopy with j0 = IdSm that stretches D until it covers Sm, so that j1 maps the boundary of D and everything outside D to q−. Then f = j0 ◦ f is homotopic to j1 ◦ f.


We have shown that the f we started with is homotopic to a function with the following description: there are finitely many pairwise disjoint disks in M, everything outside the interiors of these disks is mapped to q−, and each disk is mapped bijectively (except that all points in the boundary are mapped to q−) to Sm. We shall leave the peculiarities of the case m = 1 to the reader: when m ≥ 2, it is visually obvious that homotopies can be used to move these disks around freely, so that two maps satisfying this description are homotopic if they have the same number of disks mapped onto Sm in an orientation preserving manner and the same number of disks in which the mapping is orientation reversing.

The final step in the argument is to show that a disk in which the orientation is positive and a disk in which the orientation is negative can be “cancelled,” so that the map is homotopic to a map satisfying the description above, but with one fewer disk of each type. Repeating this cancellation, we eventually arrive at a map in which the mapping is either orientation preserving in all disks or orientation reversing in all disks. Thus any map is homotopic to a map of this form, and any two such maps with the same number of disks of the same orientation are homotopic. Since the number of disks is the absolute value of the degree, and the maps are orientation preserving or orientation reversing according to whether the degree is positive or negative, we conclude that maps of the same degree are homotopic.

For the cancellation step it is best to adopt a concrete model of the domain and range. We will think of Sm as the unit disk Dm = { x ∈ Rm : ‖x‖ ≤ 1 } with the boundary points identified with a single point, which will continue to be denoted by q−. We will think of Rm as representing an open subset of M containing two disks that are mapped with opposite orientation. Let e1 = (1, 0, . . . , 0) ∈ Rm. After sliding the disks around, expanding or contracting them, and revising the maps on their interiors, we can achieve the following specific f : Rm → Sm:

\[
f(x) =
\begin{cases}
x - e_1, & \|x - e_1\| < 1, \\
x - (-e_1) - 2\bigl(\langle x, e_1\rangle - \langle -e_1, e_1\rangle\bigr)e_1, & \|x - (-e_1)\| < 1, \\
q_-, & \text{otherwise.}
\end{cases}
\]

Visually, f maps the unit disk centered at e1 to Sm preserving orientation, it maps the unit disk centered at −e1 reversing orientation, and everything else goes to q−. We now have the following homotopy:

\[
h_t(x) =
\begin{cases}
x - (1 - 2t)e_1, & \|x - (1 - 2t)e_1\| < 1 \text{ and } x_1 \ge 0, \\
x - (1 - 2t)e_1 - 2\langle x, e_1\rangle e_1, & \|x - (1 - 2t)(-e_1)\| < 1 \text{ and } x_1 \le 0, \\
q_-, & \text{otherwise.}
\end{cases}
\]

Of course the first two expressions agree when x1 = 0, so this is well defined and continuous, and h1(x) = q− for all x.
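The two claims just made about the cancellation homotopy can be checked numerically. The sketch below is illustrative only: it works in R² (any m would do), represents the collapsed boundary point q− by a sentinel value, and the names `upper`, `lower`, and `h` are ours rather than the text's.

```python
import math

Q_MINUS = "q-"  # sentinel for the single point obtained by collapsing the boundary of D^m

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def upper(t, x):
    """Branch used when x_1 >= 0: the value x - (1 - 2t) e_1 inside the moving disk."""
    c = 1.0 - 2.0 * t
    y = (x[0] - c,) + tuple(x[1:])
    # ||y|| = ||x - (1 - 2t) e_1||, so this is exactly the condition in the text.
    return y if norm(y) < 1 else Q_MINUS

def lower(t, x):
    """Branch used when x_1 <= 0: the value x - (1 - 2t) e_1 - 2 <x, e_1> e_1."""
    c = 1.0 - 2.0 * t
    y = (-x[0] - c,) + tuple(x[1:])
    # ||y|| = ||x + (1 - 2t) e_1||, the condition for the reflected disk.
    return y if norm(y) < 1 else Q_MINUS

def h(t, x):
    """The homotopy h_t, assembled from the two branches."""
    return upper(t, x) if x[0] >= 0 else lower(t, x)

# The branches agree on the hyperplane x_1 = 0, and h_1 is constant at q-.
for i in range(21):
    t = i / 20
    for j in range(-20, 21):
        assert upper(t, (0.0, j / 10)) == lower(t, (0.0, j / 10))
        assert h(1.0, (j / 7, i / 5 - 2.0)) == Q_MINUS
```

At t = 0 the centers of the two disks, e1 and −e1, are both sent to the origin of the model disk, as one expects from the description of f.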

In preparation for an application of the Hopf theorem, we introduce an important concept from topology. If X is a topological space and A ⊂ X, the pair (X, A) has the homotopy extension property if, for any topological space Y and any continuous function g : (X × {0}) ∪ (A × [0, 1]) → Y, there is a homotopy h : X × [0, 1] → Y that is an extension of g: h(x, 0) = g(x, 0) for all x ∈ X and h(x, t) = g(x, t) for all (x, t) ∈ A × [0, 1].

Lemma 14.2.2. The pair (X, A) has the homotopy extension property if and only if (X × {0}) ∪ (A × [0, 1]) is a retract of X × [0, 1].

Proof. If (X, A) has the homotopy extension property, then the inclusion map from (X × {0}) ∪ (A × [0, 1]) to X × [0, 1] has a continuous extension to all of X × [0, 1], which is to say that there is a retraction. On the other hand, if r is a retraction, then for any g there is a continuous extension h = g ◦ r.

We will only be concerned with the example given by the next result, but it is worth noting that this concept takes on greater power when one realizes that (X, A) has the homotopy extension property whenever X is a simplicial complex and A is a subcomplex. It is easy to prove this if there is only one simplex σ in X that is not in A; either the boundary of σ is contained in A, in which case there is an argument like the proof of the following, or it isn’t, and another very simple construction works. The general case follows from induction because if (X, A) and (A, B) have the homotopy extension property, then so does (X, B). To show this suppose that g : (X × {0}) ∪ (B × [0, 1]) → Y is given. There is a continuous extension h : A × [0, 1] → Y of the restriction of g to (A × {0}) ∪ (B × [0, 1]). The extension of h to all of (X × {0}) ∪ (A × [0, 1]) defined by setting h|X×{0} = g|X×{0} is continuous because it is continuous on X × {0} and A × [0, 1], both of which are closed subsets of X × [0, 1] (here the requirement that A is closed finally shows up) and since (X, A) has the homotopy extension property this h can be further extended to all of X × [0, 1].

Lemma 14.2.3. The pair (Dm, Sm−1) has the homotopy extension property.

Proof. There is an obvious retraction

r : Dm × [0, 1] → (Dm × {0}) ∪ (Sm−1 × [0, 1])

defined by projecting radially from (0, 2) ∈ Rm × R.
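For readers who want the radial projection in coordinates, here is one way to write it down. This is a sketch under our own conventions: the function name `retract` and the case split are ours. The ray from (0, 2) through (x, t) is followed until it first meets either the bottom face Dm × {0} or the side Sm−1 × [0, 1].

```python
import math

def retract(x, t):
    """Project (x, t) in D^m x [0, 1] radially from (0, 2) onto
    (D^m x {0}) U (S^{m-1} x [0, 1]); x is a tuple with norm at most 1."""
    n = math.sqrt(sum(c * c for c in x))
    if n <= 1.0 - t / 2.0:
        # The ray from (0, 2) through (x, t) meets the bottom face first.
        s = 2.0 / (2.0 - t)
        return tuple(s * c for c in x), 0.0
    # Otherwise the ray meets the side S^{m-1} x [0, 1] first.
    # Here n > 1 - t/2 >= 1/2, so the division is safe.
    return tuple(c / n for c in x), 2.0 - (2.0 - t) / n

# Points already in the target set are left where they are.
print(retract((0.3, 0.4), 0.0))
print(retract((1.0, 0.0), 0.5))
```

On the cone ‖x‖ = 1 − t/2 both formulas give the point (x/‖x‖, 0), which is why the two pieces fit together continuously.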

We now relate the degree of a map from Dm to Rm with what may be thought of as the “winding number” of the restriction of the map to Sm−1.

Theorem 14.2.4. If f : Dm → Rm is continuous, 0 ∉ f(Sm−1), and f̃ : Sm−1 → Sm−1 is the function x ↦ f(x)/‖f(x)‖, then deg0(f) = deg(f̃).

Proof. For k ∈ Z let fk : Dm → Rm be the map

(r cos θ, r sin θ, x3, . . . , xm) ↦ (r cos kθ, r sin kθ, x3, . . . , xm).

It is easy to see that deg0(fk) = k = deg(fk|Sm−1).

Now let k = deg(f̃). The Hopf theorem implies that there is a homotopy h̃ : Sm−1 × [0, 1] → Sm−1 with h̃0 = f̃ and h̃1 = fk|Sm−1. Let h : Sm−1 × [0, 1] → Rm be the homotopy with h0 = f|Sm−1 and h1 = fk|Sm−1 given by

h(x, t) = ((1 − t)‖f(x)‖ + t) h̃(x, t),

and extend this to g : (Dm × {0}) ∪ (Sm−1 × [0, 1]) → Rm by setting g(x, 0) = f(x). The last result implies that g extends to a homotopy j : Dm × [0, 1] → Rm. There is an additional homotopy ℓ : Dm × [0, 1] → Rm with ℓ0 = j1 and ℓ1 = fk given by setting

ℓ(x, t) = (1 − t)j1(x) + tfk(x).

Note that ℓt|Sm−1 = fk|Sm−1 for all t. The invariance of degree under degree admissible homotopy now implies that

deg(f̃) = k = deg0(fk) = deg0(j1) = deg0(j0) = deg0(f).
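When m = 2 the theorem can be illustrated numerically by identifying R² with the complex plane. The sketch below is not part of the argument: the function `winding_number` and the example map z ↦ z² − 1/4 are our own. That map has two zeros in the unit disk, at ±1/2, and since it is holomorphic each zero is a regular point with positive Jacobian determinant, so its degree over 0 is 2. The winding number of its restriction to the unit circle should therefore also be 2.

```python
import cmath
import math

def winding_number(f, samples=2000):
    """Degree of z -> f(z)/|f(z)| on the unit circle, by summing angle increments.

    Assumes f has no zero on the circle and that `samples` is large enough
    for consecutive values of f to differ in argument by less than pi.
    """
    total = 0.0
    prev = f(1 + 0j)
    for i in range(1, samples + 1):
        z = cmath.exp(2j * math.pi * i / samples)
        cur = f(z)
        total += cmath.phase(cur / prev)  # increment of the argument, in (-pi, pi]
        prev = cur
    return round(total / (2 * math.pi))

def example(z):
    # Hypothetical example with zeros at z = 1/2 and z = -1/2, both inside the disk.
    return z * z - 0.25

print(winding_number(example))                       # 2
print(winding_number(lambda z: z ** 3))              # 3, the model map f_3
print(winding_number(lambda z: z.conjugate() ** 2))  # -2, the model map f_{-2}
```

The last two lines evaluate the model maps fk of the proof, written in complex notation as z ↦ z^k for k ≥ 0 and z ↦ (conjugate of z)^|k| for k < 0 on the unit circle.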

14.3 More on Maps Between Spheres

Insofar as spheres are the simplest “nontrivial” (where, in effect, this means noncontractible) topological spaces, it is entirely natural that mathematicians would quickly investigate the application of degree and index theory to these spaces, and to maps between them. There are many results coming out of this research, some of which are quite famous.

Our discussion combines some purely topological reasoning with analysis based on concrete examples, and for the latter it is best to agree that

Sm := { x ∈ Rm+1 : ‖x‖ = 1 }.

Some of our arguments involve induction on m, and for this purpose we will regard Sm−1 as a subset of Sm by setting

Sm−1 = { x ∈ Sm : xm+1 = 0 }.

Let am : Sm → Sm be the function

am(x) = −x.

Two points x, y ∈ Sm are said to be antipodal if y = am(x). Regarded topologically, am is a fixed point free local diffeomorphism whose composition with itself is IdSm, and one should expect that all the topological results below involving am and antipodal points should depend only on these properties, but we will not try to demonstrate this (the subject is huge, and our coverage is cursory) instead treating am as an entirely concrete object.

Let Em = { (x, y) ∈ Sm × Sm : y ≠ am(x) }. There is a continuous function rm : Em × [0, 1] → Sm given by

\[
r_m(x, y, t) := \frac{tx + (1 - t)y}{\|tx + (1 - t)y\|}.
\]
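Since rm is used repeatedly below to build homotopies, it may help to see it as code. This is a minimal sketch with names of our own choosing. Note the orientation of the parameter: t = 0 gives y and t = 1 gives x.

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def r(x, y, t):
    """r_m(x, y, t) = (t x + (1 - t) y) / ||t x + (1 - t) y||.

    Assumes x and y are unit vectors that are not antipodal, so that the
    segment between them misses the origin and the normalization is defined.
    """
    return normalize(tuple(t * a + (1 - t) * b for a, b in zip(x, y)))

x = (1.0, 0.0, 0.0)
y = (0.0, 1.0, 0.0)
print(r(x, y, 0.0))  # the point y
print(r(x, y, 1.0))  # the point x
print(r(x, y, 0.5))  # the midpoint of the shorter great circle arc from y to x
```

If y = am(x), the point tx + (1 − t)y is the origin when t = 1/2, which is why antipodal pairs are excluded from Em.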

Proposition 14.3.1. Suppose f, f′ : Sm → Sn are continuous. If they do not map any point to a pair of antipodal points, that is, f′(p) ≠ an(f(p)) for all p ∈ Sm, then f and f′ are homotopic.

Proof. Specifically, there is the homotopy h(x, t) = rn(f(x), f′(x), t).

Consider a continuous function f : Sm → Sn. If m < n, then f is homotopic to a constant map, and thus rather uninteresting. To see this, first note that the smooth functions are dense in C(Sm, Sn), and a sufficiently nearby function does not map any point to the antipode of its image under f, so f is homotopic to a smooth function. So, suppose that f is smooth. By Sard’s theorem, the regular values of f are dense, and since n > m, a regular value is a y ∈ Sn with f⁻¹(y) = ∅. We now have the homotopy h(x, t) = rn(f(x), an(y), t).

When m > n, on the other hand, the analysis of the homotopy classes of maps from Sm to Sn is a very difficult topic that has been worked out for many specific values of m and n, but not in general. We will only discuss the case of m = n, for which the most basic question is the relation between the index and the degree.

Theorem 14.3.2. If f : Sm → Sm is continuous, then

Λ(f) = 1 + (−1)^m deg(f).

Proof. Hopf’s theorem (Theorem 14.2.1) implies that two maps from Sm to itself are homotopic if they have the same degree, and the index is a homotopy invariant, so it suffices to determine the relationship between the degree and index for a specific instance of a map of each possible degree.

We begin with m = 1. For d ∈ Z let f1,d : S1 → S1 be the function

f1,d(cos θ, sin θ) := (cos dθ, sin dθ).

If d > 0, then the preimage f1,d⁻¹(1, 0) consists of d points at which f1,d is orientation preserving, when d = 0 there are points in S1 that are not in the image of f1,0, and if d < 0, then f1,d⁻¹(1, 0) consists of |d| points at which f1,d is orientation reversing. Therefore

deg(f1,d) = d.

Now observe that f1,1 is homotopic to a map without fixed points, while for d ≠ 1 the fixed points of f1,d are the points

\[
\Bigl(\cos\frac{2\pi k}{d-1},\ \sin\frac{2\pi k}{d-1}\Bigr) \qquad (k = 0, \ldots, |d-1| - 1).
\]

If d > 1, then motion in the domain is translated by f1,d into more rapid motion in the range, so the index of each fixed point is −1. When d < 1, f1,d translates motion in the domain into motion in the opposite direction in the range, so the index of each fixed point is 1. Combining these facts, we conclude that

Λ(f1,d) = 1 − d,

which establishes the result when m = 1.

Let em+1 = (0, . . . , 0, 1) ∈ Rm+1. Then

Sm = { αx + βem+1 : x ∈ Sm−1, α ≥ 0, α² + β² = 1 }.

We define fm,d inductively by the formula

fm,d(αx + βem+1) = αfm−1,−d(x) − βem+1.

If fm−1,−d is orientation preserving (reversing) at x ∈ Sm−1, then fm,d is clearly orientation reversing (preserving) at x, so deg(fm,d) = −deg(fm−1,−d). Therefore, by induction, deg(fm,d) = d.

The fixed points of fm,d are evidently the fixed points of fm−1,−d. Fix such an x. Computing in a local coordinate system, one may easily show that the index of x, as a fixed point of fm,d, is the same as the index of x as a fixed point of fm−1,−d, so Λ(fm,d) = Λ(fm−1,−d). By induction,

Λ(fm,d) = Λ(fm−1,−d) = 1 + (−1)^(m−1) deg(fm−1,−d) = 1 + (−1)^m deg(fm,d).
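The bookkeeping in the case m = 1 can be replayed mechanically, and the formula of the theorem can be tabulated for two familiar maps. The sketch below is illustrative and uses names of our own. It assumes, as in the proof, that in the angular coordinate the map f1,d is θ ↦ dθ, so that the index of each fixed point is the sign of 1 − d.

```python
import math

def lefschetz_circle(d, tol=1e-9):
    """Sum of the fixed point indices of f_{1,d}(cos t, sin t) = (cos dt, sin dt), d != 1."""
    total = 0
    for k in range(abs(d - 1)):
        theta = 2 * math.pi * k / (d - 1)
        # Confirm that theta really is a fixed point: d * theta = theta mod 2 * pi.
        assert abs(math.cos(d * theta) - math.cos(theta)) < tol
        assert abs(math.sin(d * theta) - math.sin(theta)) < tol
        # Index -1 when d > 1 (expansion), +1 when d < 1.
        total += 1 if d < 1 else -1
    return total

def lefschetz_from_degree(m, deg):
    """The right hand side 1 + (-1)^m deg(f) of the theorem."""
    return 1 + (-1) ** m * deg

for d in (-3, 0, 2, 5):
    print(d, lefschetz_circle(d), lefschetz_from_degree(1, d))

for m in range(1, 5):
    antipodal = lefschetz_from_degree(m, (-1) ** (m + 1))  # a_m has no fixed points
    identity = lefschetz_from_degree(m, 1)                 # the Euler characteristic of S^m
    print(m, antipodal, identity)
```

For the antipodal map the formula returns 0 in every dimension, and for the identity it returns 2 when m is even and 0 when m is odd, the Euler characteristic of the m-sphere.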

Corollary 14.3.3. If a map f : Sm → Sm has no fixed points, then deg(f) = (−1)^(m+1). If f does not map any point to its antipode, which is to say that am ◦ f has no fixed points, then deg(f) = 1. Consequently, if f does not map any point either to itself or its antipode, then m is odd.

Proof. The first claim follows from Λ(f) = 0 and the result above. In particular, am has no fixed points, so deg(am) = (−1)^(m+1). The second result now follows from the multiplicative property of the degree of a composition (Corollary 12.4.2):

(−1)^(m+1) = deg(am ◦ f) = deg(am) · deg(f) = (−1)^(m+1) deg(f).

Proposition 14.3.4. If the map f : Sm → Sm never maps antipodal points to antipodal points, that is, am(f(p)) ≠ f(am(p)) for all p ∈ Sm, then deg(f) is even. If m is even, then deg(f) = 0.

Proof. The homotopy h : Sm × [0, 1] → Sm given by

h(p, t) := rm(f(p), f(am(p)), t)

shows that f and f ◦ am are homotopic, whence deg(f) = deg(f ◦ am). Corollary 12.4.2 and Corollary 14.3.3 give

deg(f) = deg(f ◦ am) = deg(f) deg(am) = (−1)^(m+1) deg(f),

and when m is even it follows that deg(f) = 0.

Since f is homotopic to a nearby smooth function, we may assume that it is smooth, in which case each ht is also smooth. Sard’s theorem implies that each ht has regular values, and since h1/2 = h1/2 ◦ am, any regular value of h1/2 has an even number of preimages. The sum of an even number of elements of {1, −1} is even, so it follows that deg(f) = deg(h1/2) is even.

Combining this result with the first assertion of Corollary 14.3.3 gives a result that was actually applied to the theory of general economic equilibrium by Hart and Kuhn (1975):

Corollary 14.3.5. Any map f : Sm → Sm either has a fixed point or a point p such that f(am(p)) = am(f(p)).

Of course am extends to the map x ↦ −x from Rm+1 to itself, and in appropriate contexts we will understand it in this sense. If D ⊂ Rm+1 satisfies am(D) = D, a map f : D → Rn+1 is said to be antipodal if

f ◦ am|D = an ◦ f.

An antipodal map f : Sm → Sm induces a map from m-dimensional projective space to itself. If you think about it for a bit, you should be able to see that a map from m-dimensional projective space to itself is induced by such an f if and only if it maps orientation reversing loops to orientation reversing loops.

The next result seems to be naturally paired with Proposition 14.3.4, but it is actually much deeper.

Theorem 14.3.6. If a map f : Sm → Sm is antipodal, then its degree is odd.

Proof. There are smooth maps arbitrarily close to f. For such an f′ the map

p ↦ rm(f′(p), −f′(−p), 1/2)

is well defined, smooth, antipodal, and close to f, so it is homotopic to f and has the same degree. Evidently it suffices to prove the claim with f replaced by this map, so we may assume that f is smooth. Sard’s theorem implies that there is a regular value of f, say q.

After rotating Sm we may assume that q = (0, . . . , 0, 1) and −q = (0, . . . , 0, −1) are the North and South poles of Sm. We would like to assume that

(f⁻¹(q) ∪ f⁻¹(−q)) ∩ Sm−1 = ∅,

and we can bring this about by replacing f with f ◦ h where h : Sm → Sm is an antipodal diffeomorphism that perturbs neighborhoods of the points in f⁻¹(q) ∪ f⁻¹(−q) while leaving points far away from these points fixed. (Such an h can easily be constructed using the methods of Section 10.2.)

Since a sum of numbers drawn from {−1, 1} is even or odd according to whether the number of summands is even or odd, our goal reduces to showing that f⁻¹(q) has an odd number of elements. When m = 0 this is established by considering the two antipode preserving maps from S0 to itself. Proceeding inductively, suppose the result has been established when m is replaced by m − 1.

For p ∈ Sm, p ∈ f⁻¹(q) if and only if −p ∈ f⁻¹(−q), because f is antipodal, so the number of elements of f⁻¹(q) ∪ f⁻¹(−q) is twice the number of elements of f⁻¹(q). Let

Sm+ := { p ∈ Sm : pm+1 ≥ 0 } and Sm− := { p ∈ Sm : pm+1 ≤ 0 }

be the Northern and Southern hemispheres of Sm. Then p ∈ Sm+ if and only if −p ∈ Sm−, so Sm+ contains half the elements of f⁻¹(q) ∪ f⁻¹(−q). Thus it suffices to show that (f⁻¹(q) ∪ f⁻¹(−q)) ∩ Sm+ has an odd number of elements.


For ε > 0 consider the small open and closed disks

Dε := { p ∈ Sm : pm+1 > 1 − ε } and D̄ε := { p ∈ Sm : pm+1 ≥ 1 − ε }

centered at the North pole. Since f is antipode preserving, −q is also a regular value of f. In view of the inverse function theorem, f⁻¹(Dε ∪ −Dε) is a disjoint union of diffeomorphic images of Dε, and none of these intersect Sm−1 if ε is sufficiently small. Concretely, for each p ∈ f⁻¹(q) ∪ f⁻¹(−q) the component Cp of f⁻¹(Dε ∪ −Dε) containing p is mapped diffeomorphically by f to either Dε or −Dε, and the various Cp are disjoint from each other and Sm−1. Therefore we wish to show that f⁻¹(Dε ∪ −Dε) ∩ Sm+ has an odd number of components.

Let M = Sm+ \ f⁻¹(Dε ∪ −Dε). Clearly M is a compact m-dimensional smooth ∂-manifold. Each point in Sm \ {q, −q} has a unique representation of the form αy + βq where y ∈ Sm−1, 0 < α ≤ 1, and α² + β² = 1. Let j : Sm \ {q, −q} → Sm−1 be the function j(αy + βq) := y, and let

g := j ◦ f|M : M → Sm−1.

Sard’s theorem implies that some q∗ ∈ Sm−1 is a regular value of both g and g|∂M. Theorem 12.2.1 implies that degq∗(g|∂M) = 0, so (g|∂M)⁻¹(q∗) has an even number of elements. Evidently g maps the boundary of each Cp diffeomorphically onto Sm−1, so each such boundary contains exactly one element of (g|∂M)⁻¹(q∗). In addition, j maps antipodal points of Sm \ {q, −q} to antipodal points of Sm−1, so g|Sm−1 is antipodal, and our induction hypothesis implies that (g|∂M)⁻¹(q∗) ∩ Sm−1 has an odd number of elements. Therefore the number of components of f⁻¹(Dε ∪ −Dε) contained in Sm+ is odd, as desired.

The hypotheses can be weakened:

Corollary 14.3.7. If the map f : Sm → Sm satisfies f(−p) ≠ f(p) for all p, then the degree of f is odd.

Proof. This will follow from the last result once we have shown that f is homotopic to an antipodal map. Let h : Sm × [0, 1] → Sm be the homotopy h(p, t) = rm(f(p), −f(−p), 1 − t/2). The hypothesis implies that this is well defined, and since rm(x, y, 1) = x we have h0 = f, while h1 = rm(f(·), −f(−·), 1/2) is antipodal.
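The construction of an antipodal map homotopic to f can be tested on an example. The following sketch makes two assumptions of our own: the time parameter is taken to run from 1 down to 1/2 in the third argument of rm, so that the homotopy starts at f and ends at the normalized odd part of f, and the sample map on S² is a hypothetical choice for which f(−p) ≠ f(p) can be verified by hand.

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def neg(v):
    return tuple(-c for c in v)

def r(x, y, t):
    """r_m(x, y, t) for unit vectors x, y that are not antipodal."""
    return normalize(tuple(t * a + (1 - t) * b for a, b in zip(x, y)))

def f(p):
    # Hypothetical example on S^2: translate by a vector c of norm < 1 and renormalize.
    # If f(-p) = f(p) then p + c and -p + c would be positive multiples of each other,
    # which forces ||p|| < ||c|| < 1, so f(-p) != f(p) on the whole sphere.
    return normalize((p[0] + 0.5, p[1] - 0.3, p[2] + 0.2))

def h(p, t):
    """Homotopy from f (t = 0) to an antipodal map (t = 1)."""
    return r(f(p), neg(f(neg(p))), 1 - t / 2)

p = normalize((1.0, 2.0, 3.0))
print(h(p, 0.0), f(p))            # equal up to rounding
print(h(neg(p), 1.0), h(p, 1.0))  # negatives of each other
```

The end map p ↦ h(p, 1) is the normalization of f(p) − f(−p), which is visibly odd in p.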

This result has a wealth of geometric consequences.

Theorem 14.3.8 (Borsuk-Ulam Theorem). The following are true:

(a) If f : Sm → Rm is continuous, then there is a p ∈ Sm such that f(p) = f(am(p)).

(b) If f : Sm → Rm is continuous and antipodal, then there is a p ∈ Sm such that f(p) = 0.

(c) There is no continuous antipodal f : Sm → Sm−1.

(d) There is no continuous g : Dm = { (y1, . . . , ym, 0) ∈ Rm+1 : ‖y‖ ≤ 1 } → Sm−1 such that g|Sm−1 is antipodal.

(e) Any cover F1, . . . , Fm+1 of Sm by m + 1 closed sets has at least one set that contains a pair of antipodal points.

(f) Any cover U1, . . . , Um+1 of Sm by m + 1 open sets has at least one set that contains a pair of antipodal points.

Proof. We think of Rm as Sm with a point removed, so a continuous f : Sm → Rm amounts to a function from Sm to itself that is not surjective, and whose degree is consequently zero. Now (a) follows from the last result, because zero is not odd.

Suppose that f : Sm → Rm is continuous and f(p) = f(−p). If f is also antipodal, then f(−p) = −f(p) so f(p) = 0. Thus (a) implies (b).

Obviously (b) implies (c).

Let π : p ↦ (p1, . . . , pm, 0) be the standard projection from Rm+1 to Rm. As in the proof of Theorem 14.3.6 let Sm+ and Sm− be the Northern and Southern hemispheres of Sm. If g : Dm → Sm−1 were continuous with g|Sm−1 antipodal, we could define a continuous and antipodal f : Sm → Sm−1 by setting

\[
f(p) =
\begin{cases}
g(\pi(p)), & p \in S^m_+, \\
-g(\pi(a_m(p))), & p \in S^m_-.
\end{cases}
\]

(The two expressions agree on Sm−1 because g|Sm−1 is antipodal.) Thus (c) implies (d).

Suppose that F1, . . . , Fm+1 is a cover of Sm by closed sets. Define f : Sm → Rm by setting

f(p) = (d(p, F1), . . . , d(p, Fm)),

where d(x, x′) = ‖x − x′‖ is the usual metric for Rm+1. Suppose that f(p) = f(−p) = y. If yi = 0, then p, −p ∈ Fi, and if all the components of y are nonzero, then p, −p ∈ Fm+1. Thus (a) implies (e).

Suppose U1, . . . , Um+1 is a cover of Sm by open sets and ε > 0. For i = 1, . . . , m + 1 set Fi := { p ∈ Sm : d(p, Sm \ Ui) ≥ ε }. Then each Fi is a closed subset of Ui, and these sets cover Sm if ε is sufficiently small. Thus (e) implies (f).

In the argument above we showed that (a) ⇒ (b) ⇒ (c) ⇒ (d) and (a) ⇒ (e) ⇒ (f). There are also easy arguments for the implications (d) ⇒ (c) ⇒ (b) ⇒ (a) and (f) ⇒ (e) ⇒ (c), so (a)-(f) are equivalent in the sense of each being an elementary consequence of the others. The proofs that (d) ⇒ (c) and (c) ⇒ (b) are obvious and can be safely left to the reader. To show that (b) ⇒ (a), for a given continuous f : Sm → Rm we apply (b) to f − f ◦ am. To show that (f) ⇒ (e) observe that if F1, . . . , Fm+1 are closed and cover Sm, then for each n the sets U1/n(Fi) are open and cover Sm, so there is a pn with pn, −pn ∈ U1/n(Fi) for some i. Any limit point of the sequence {pn} has the desired property.
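In the lowest dimensional case m = 1, the passage from f to f − f ◦ am turns part (a) into an application of the intermediate value theorem, and a point p with f(p) = f(−p) can be located by bisection. The sketch below is illustrative only: it describes a point of S1 by its angle, and the function names and the sample function are ours.

```python
import math

def antipodal_coincidence(f, tol=1e-12):
    """Find an angle th in [0, pi] with f(th) = f(th + pi).

    Assumes f is continuous and 2*pi-periodic, i.e. a function on the circle.
    """
    def g(th):
        # g is the antipodal function f - f o a_1, so g(th + pi) = -g(th).
        return f(th) - f(th + math.pi)

    lo, hi = 0.0, math.pi
    if g(lo) == 0.0:
        return lo
    # g(lo) and g(hi) have opposite signs, so a zero lies between them.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def example(th):
    # Hypothetical continuous function on the circle.
    return math.cos(th) + 0.3 * math.sin(2 * th) + 0.1 * math.cos(3 * th)

th = antipodal_coincidence(example)
print(th, example(th), example(th + math.pi))
```

For m ≥ 2 no such elementary argument is available, which is one way of seeing why the theorem is deep.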

The proof that (e) ⇒ (c) is more interesting. Consider an m-simplex that is embedded in Dm with the origin in its interior. Let F1, . . . , Fm+1 be the radial projections of the facets of the simplex onto Sm−1. These sets are closed and cover Sm−1, and since each facet is separated from the origin by a hyperplane, each Fi does not contain an antipodal pair of points. If f : Sm → Sm−1 is continuous, then f⁻¹(F1), . . . , f⁻¹(Fm+1) are a cover of Sm by closed sets, and (e) implies the existence of p, −p ∈ f⁻¹(Fi) for some i. If f were also antipodal, then f(p), f(−p) = −f(p) ∈ Fi, which is impossible.

As a consequence of the Borsuk-Ulam theorem, the following “obvious” fact is actually highly nontrivial.

Theorem 14.3.9. Spheres of different dimensions are not homeomorphic.

Proof. If k < m then, since Sk can be embedded in Rm, part (a) of the Borsuk-Ulam theorem implies that a continuous function from Sm to Sk cannot be injective.

14.4 Invariance of Domain

The main result of this section, invariance of domain, is a famous result with numerous applications. It can be thought of as a purely topological version of the inverse function theorem. However, before that we give an important consequence of the Borsuk-Ulam theorem for Euclidean spaces.

Theorem 14.4.1. Euclidean spaces of different dimensions are not homeomorphic.

Proof. If k ≠ m and f : Rk → Rm were a homeomorphism, for any sequence {xj} in Rk with ‖xj‖ → ∞ the sequence {f(xj)} could not have a convergent subsequence, so ‖f(xj)‖ → ∞. Identifying Rk and Rm with Sk \ {ptk} and Sm \ {ptm}, the extension of f to Sk given by setting f(ptk) = ptm would be continuous, with a continuous inverse, contrary to the last result.

The next two lemmas develop the proof of this section’s main result.

Lemma 14.4.2. Suppose $S^m_+$ is the Northern hemisphere of $S^m$, $f : S^m_+ \to S^m$ is a map such that $f|_{S^{m-1}}$ is antipodal, and $p \in S^m_+ \setminus S^{m-1}$ is a point such that $-p \notin f(S^m_+)$ and $p \notin f(S^{m-1})$. Then $\deg_p(f)$ is odd.

Proof. Let $\tilde f : S^m \to S^m$ be the extension of $f$ given by setting $\tilde f(p) = -f(-p)$ when $p_{m+1} < 0$. Clearly $\tilde f$ is continuous and antipodal, so its degree is odd. The hypotheses imply that $\tilde f^{-1}(p) \subset S^m_+ \setminus S^{m-1}$, and that $f$ is degree admissible over $p$, so Additivity implies that $\deg_p(f) = \deg_p(\tilde f)$.

Lemma 14.4.3. If $f : D^m \to \mathbb{R}^m$ is injective, then $\deg_{f(0)}(f)$ is odd, and $f(D^m)$ includes a neighborhood of $f(0)$.

Proof. Replacing $f$ with $x \mapsto f(x) - f(0)$, we may assume that $f(0) = 0$. Let $h : D^m \times [0,1] \to \mathbb{R}^m$ be the homotopy

\[ h(x,t) := f\Bigl(\frac{x}{1+t}\Bigr) - f\Bigl(\frac{-tx}{1+t}\Bigr). \]

Of course $h_0 = f$ and $h_1$ is antipodal. If $h_t(x) = 0$ then, because $f$ is injective, $x = -tx$, so that $x = 0$. Therefore $h$ is a degree admissible homotopy over zero, so $\deg_0(h_0) = \deg_0(h_1)$, and the last result implies that $\deg_0(h_1)$ is odd, so $\deg_0(h_0) = \deg_0(f)$ is odd. The Continuity property of the degree implies that $\deg_y(f)$ is odd for all $y$ in some neighborhood of $f(0)$. Since, by Additivity, $\deg_y(f) = 0$ whenever $y \notin f(D^m)$, we conclude that $f(D^m)$ contains a neighborhood of $0$.
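The endpoint and admissibility claims for the homotopy can be verified directly from the formula for $h$. The following derivation is a sketch under the normalization $f(0) = 0$ made at the start of the proof, and it uses no symbols beyond those of the proof.

```latex
\begin{align*}
h_0(x) &= f(x) - f(0) = f(x), \\
h_1(x) &= f(x/2) - f(-x/2), \qquad h_1(-x) = f(-x/2) - f(x/2) = -h_1(x), \\
h_t(x) = 0 &\iff f\Bigl(\tfrac{x}{1+t}\Bigr) = f\Bigl(\tfrac{-tx}{1+t}\Bigr)
  \iff \tfrac{x}{1+t} = \tfrac{-tx}{1+t}
  \iff (1+t)\,x = 0 \iff x = 0.
\end{align*}
```

The second equivalence in the last line is where injectivity of $f$ is used, and the final one uses $t \ge 0$.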


The next result is quite famous, being commonly regarded as one of the major accomplishments of algebraic topology. As the elementary nature of the assertion suggests, it is applied quite frequently.

Theorem 14.4.4 (Invariance of Domain). If $U \subset \mathbb{R}^m$ is open and $f : U \to \mathbb{R}^m$ is continuous and injective, then $f(U)$ is open and $f$ is a homeomorphism onto its image.

Proof. The last result can be applied to a closed disk surrounding any point in the domain, so for any open $V \subset U$, $f(V)$ is open. Thus $f^{-1}$ is continuous.

14.5 Essential Sets Revisited

Let $X$ be a compact ANR, let $C \subset X$ be compact, and let $f : C \to X$ be an index admissible function. If $\Lambda(f) \ne 0$, then the set of fixed points is essential. What about a converse? More specifically, if $\Lambda(f) = 0$, then of course $f$ may have fixed points, but is $f$ necessarily index admissible homotopic to a function without fixed points? When $C = X$, so that $\Lambda(f) = L(f)$, this question amounts to a request for conditions under which a converse of the Lefschetz fixed point theorem holds. We can also ask whether a somewhat more demanding condition holds: does every neighborhood of $f$ in $C(C,X)$ contain a function without any fixed points?

If $C$ is not connected, the answers to these questions are obtained by combining the answers obtained when this question is applied to the restrictions of $f$ to the various connected components of $C$, so we should assume that $C$ is connected. If $C_1, \ldots, C_r$ are pairwise disjoint subsets of $C$ with $\mathrm{FP}(f)$ contained in the interior of $C_1 \cup \ldots \cup C_r$, then $\Lambda(f) = \sum_i \Lambda(f|_{C_i})$, and of course when $r > 1$ it can easily happen that $\Lambda(f|_{C_i}) \ne 0$ for some $i$ even though the sum is zero. Therefore we should assume that $\mathrm{FP}(f)$ is also connected.

Our goal is to develop conditions under which a connected set of fixed points with index zero can be “perturbed away,” in the sense that there is a nearby function or correspondence with no fixed points near that set. Without additional assumptions, there is little hope of achieving positive answers. For the general situation in which the space is an ANR, the techniques we develop below would lead eventually to composing a perturbation with a retraction, and it is difficult to prevent the retraction from introducing undesired fixed points. An approach to this issue for simplicial complexes is developed in Ch. VIII of Brown (1971).

Our attention is restricted to the following settings: a) $X$ is a “well behaved” subset of a smooth manifold; b) $X$ is a compact convex subset of a Euclidean space. The gist of the argument used to prove these results is to first approximate with a smooth function that has only regular fixed points, which are necessarily finite in number and can be organized in pairs of opposite index, then perturb to eliminate each pair.

Proposition 14.5.1. If $g : D^m \to \mathbb{R}^m$ is continuous, $0 \notin g(S^{m-1})$, and $\deg_0(g) = 0$, then there is a continuous $\hat g : D^m \to \mathbb{R}^m \setminus \{0\}$ with $\hat g|_{S^{m-1}} = g|_{S^{m-1}}$.

Proof. Let $\tilde g : S^{m-1} \to S^{m-1}$ be the function $\tilde g(x) = g(x)/\|g(x)\|$. Theorem 14.2.4 implies that $\deg(\tilde g) = 0$, so the Hopf theorem implies that there is a homotopy $h : S^{m-1} \times [0,1] \to S^{m-1}$ with $h_0 = \tilde g$ and $h_1$ a constant function. For $(x,t) \in S^{m-1} \times [0,1]$ we set

\[ \hat g(tx) = \bigl(t\|g(x)\| + (1-t)\bigr)\, h_{1-t}(x). \]

If $(x_r, t_r)$ is a sequence with $t_r \to 0$, then $\hat g(t_r x_r)$ converges to the constant value of $h_1$, so this is well defined and continuous. For $x \in S^{m-1}$ we have $g(x) \ne 0$, so the origin is not in the image of $\hat g$, and $\hat g(x) = \|g(x)\| h_0(x) = g(x)$.
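The radial construction in this proof can be tried numerically in the case $m = 2$. The sketch below is only an illustration under stated assumptions: the boundary data `g_boundary` (the map $x \mapsto x + (2,0)$ on $S^1$, written with complex numbers) is a hypothetical choice, and the homotopy to a constant is obtained by scaling a continuous angle, which works here because the values of this particular $g$ stay in the right half plane. In general the homotopy comes from the Hopf theorem and is not given by a formula.

```python
import cmath
import math


def g_boundary(theta):
    """Hypothetical boundary data g(x) = x + (2, 0) on S^1, as a complex number."""
    return 2.0 + cmath.exp(1j * theta)


def g_hat(t, theta):
    """Value of the extension at the point t*x, where x = e^{i theta}, 0 <= t <= 1.

    alpha is a continuous choice of angle for g/|g| (valid here because
    Re g > 0), and h_s = exp(i (1 - s) alpha) is a homotopy from g/|g|
    (s = 0) to the constant 1 (s = 1). Hence h_{1-t} = exp(i t alpha).
    """
    g = g_boundary(theta)
    alpha = cmath.phase(g)
    radius = t * abs(g) + (1.0 - t)  # strictly positive for 0 <= t <= 1
    return radius * cmath.exp(1j * t * alpha)


def min_modulus(n_t=50, n_theta=200):
    """Smallest |g_hat| over a polar grid on the disk."""
    return min(
        abs(g_hat(i / n_t, 2.0 * math.pi * j / n_theta))
        for i in range(n_t + 1)
        for j in range(n_theta)
    )


if __name__ == "__main__":
    print("min |g_hat| on the grid:", min_modulus())
    print("boundary mismatch at theta = 1:", abs(g_hat(1.0, 1.0) - g_boundary(1.0)))
```

At $t = 0$ the value does not depend on `theta`, which is the well-definedness at the origin noted in the proof, and at $t = 1$ the extension reproduces the boundary data.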

The first of this section’s principal results is as follows.

Theorem 14.5.2. Let $M$ be a smooth $C^r$ manifold, where $2 \le r \le \infty$, let $X \subset M$ be a compact ANR for which there is a homotopy $h : X \times [0,1] \to X$ such that $h_0 = \mathrm{Id}_X$ and, for each $t > 0$, $h_t : X \to h_t(X)$ is a homeomorphism whose image $h_t(X)$ is contained in the interior of $X$. Let $C$ be a compact subset of $X$, and let $f : C \to X$ be an index admissible function. If $\mathrm{FP}(f)$ is connected and $\Lambda(f) = 0$, then $\mathrm{FP}(f)$ is an inessential set of fixed points.

Proof. It suffices to show that for a given open $W \subset C \times X$ containing the graph of $f$ there is a continuous $f' : C \to X$ with $\mathrm{Gr}(f') \subset W$ and $\mathrm{FP}(f') = \emptyset$. We have $\mathrm{Gr}(h_t \circ f) \subset W$ for small $t > 0$, so it suffices to prove the result with $f$ replaced by $h_t \circ f$, which means that we may assume that the image of $f$ is contained in the interior of $X$.

Recall that Proposition 10.7.8 gives a continuous function $\lambda : M \to (0,\infty)$ and a $C^{r-1}$ function $\kappa : V_\lambda \to M$, where $V_\lambda = \{\, (p,v) \in TM : \|v\| < \lambda(p) \,\}$, such that $\kappa(p,0) = p$ for all $p \in M$ and

\[ \tilde\kappa = \pi \times \kappa : V_\lambda \to M \times M \]

is a $C^{r-1}$ embedding, where $\pi : TM \to M$ is the projection. Let $\tilde V_\lambda = \tilde\kappa(V_\lambda)$.

Let $Y_0 = \{\, p \in C : (p, f(p)) \in \tilde V_\lambda \,\}$; of course this is an open set containing $\mathrm{FP}(f)$. Let $Y_1$ and $Y_2$ be open sets such that $\mathrm{FP}(f) \subset Y_2$, $\overline{Y}_2 \subset Y_1$, $\overline{Y}_1 \subset Y_0$, and $Y_2$ is path connected. (Such a $Y_2$ can be constructed by taking a finite union of images of $D^m$ under $C^r$ parameterizations.) We can define a vector field $\zeta$ on a neighborhood of $Y_0$ by setting

\[ \zeta(p) = \tilde\kappa^{-1}(p, f(p)). \]

Proposition 11.5.2 and Corollary 10.2.5 combine to imply that there is a vector field $\tilde\zeta$ on $Y_0$ with image contained in $\tilde\kappa^{-1}(W)$ that agrees with $\zeta$ on $Y_0 \setminus Y_1$, is $C^{r-1}$ on $Y_2$, and has only regular equilibria, all of which are in $Y_2$. The number of equilibria is necessarily finite, and we may assume that, among all the vector fields on $Y_0$ that agree with $\zeta$ on $Y_0 \setminus Y_1$, are $C^{r-1}$ on $Y_2$, and have only regular equilibria in $Y_2$, $\tilde\zeta$ minimizes this number. If $\tilde\zeta$ has no equilibria, then we may define a continuous function $f' : C \to X$ without any fixed points whose graph is contained in $W$ by setting $f'(p) = \kappa(\tilde\zeta(p))$ if $p \in Y_0$ and setting $f'(p) = f(p)$ otherwise.

Aiming at a contradiction, suppose that the perturbed vector field $\tilde\zeta$ has equilibria. Since the index is zero, there must be two equilibria of opposite index, say $p_0$ and $p_1$, and it suffices to show that we can further perturb $\tilde\zeta$ in a way that eliminates both of them. There is a $C^r$ embedding $\gamma : (-\varepsilon, 1+\varepsilon) \to Y_2$ with $\gamma(0) = p_0$ and $\gamma(1) = p_1$.


(This is obvious, but painful to prove formally, and in addition the case $m = 1$ requires special treatment. A formal verification would do little to improve the reader’s understanding, so we omit the details.) Applying the tubular neighborhood theorem, this path can be used to construct a $C^r$ parameterization $\varphi : Z \to U$ where $Z \subset \mathbb{R}^m$ is a neighborhood of $D^m$.

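Let $g : Z \to \mathbb{R}^m$ be defined by setting $g(x) = D\varphi(x)^{-1}\tilde\zeta_{\varphi(x)}$, where $\tilde\zeta$ is the perturbed vector field. Proposition 14.5.1 gives a continuous function $\hat g : Z \to \mathbb{R}^m \setminus \{0\}$ that agrees with $g$ on the closure of $Z \setminus D^m$. We extend $\hat g$ to all of $Z$ by setting $\hat g(x) = g(x)$ if $x \notin D^m$. Define a new vector field $\hat\zeta$ on $\varphi(Z)$ by setting

\[ \hat\zeta(p) = D\varphi(\varphi^{-1}(p))\,\hat g(\varphi^{-1}(p)). \]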

There are two final technical points. In order to ensure that $\hat\zeta(p) \in \tilde\kappa^{-1}(W)$ for all $p$, where $\hat\zeta$ is the new vector field, we can first multiply $\hat g$ by a $C^r$ function $\beta : D^m \to (0,1]$ that is identically 1 on $Z \setminus D^m$ and close to zero in the interior of $D^m$ outside of some neighborhood of $S^{m-1}$. We can also use Proposition 11.5.2 and Corollary 10.2.5 to further perturb $\hat\zeta$ to make it $C^{r-1}$ without introducing any additional equilibria. This completes the construction, thereby arriving at a contradiction that completes the proof.

Economic applications call for a version of the result for correspondences. Ideally one would like to encompass contractible valued correspondences in the setting of a manifold, but the methods used here are not suitable. Instead we are restricted to convex valued correspondences, and thus to settings where convexity is defined.

Theorem 14.5.3. If $X \subset \mathbb{R}^m$ is compact and convex, $C \subset X$ is compact, $F : C \to X$ is an index admissible upper semicontinuous convex valued correspondence, $\Lambda(F) = 0$, and $\mathrm{FP}(F)$ is connected, then $F$ is inessential.

Caution: The analogous result does not hold for essential sets of Nash equilibria, which are defined by Jiang (1963) in terms of perturbations of the game’s payoffs. Hauk and Hurkens (2002) give an example of a game with a component of the set of Nash equilibria that has index zero but is robust with respect to perturbations of payoffs.

Proof. Let $W \subset C \times X$ be an open set containing the graph of $F$. We will show that there is a continuous $f : C \to X$ with $\mathrm{Gr}(f) \subset W$ and $\mathrm{FP}(f) = \emptyset$. Let $x_0$ be a point in the interior of $X$, let $h : X \times [0,1] \to X$ be the contraction $h(x,t) = (1-t)x + tx_0$, and for $t \in [0,1]$ let $h_t \circ F$ be the correspondence $x \mapsto h_t(F(x))$. This correspondence is obviously upper semicontinuous and convex valued, and $\mathrm{Gr}(h_t \circ F) \subset W$ for small $t > 0$, so it suffices to prove the result with $F$ replaced by $h_t \circ F$ for such $t$. Therefore we may assume that the image of $F$ is contained in the interior of $X$.

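For each $x \in \mathrm{FP}(F)$ we choose convex neighborhoods $Y_x \subset C$ of $x$ and $Z_x \subset X$ of $F(x)$ such that $Y_x \subset Z_x$ and $Y_x \times Z_x \subset W$. Choose $x_1, \ldots, x_k$ such that $\mathrm{FP}(F) \subset Y_{x_1} \cup \ldots \cup Y_{x_k}$, and let

\[ Y_0 = Y_{x_1} \cup \ldots \cup Y_{x_k} \quad\text{and}\quad Z_0 = (Y_{x_1} \times Z_{x_1}) \cup \ldots \cup (Y_{x_k} \times Z_{x_k}). \]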

Note that for all $(x,y) \in Z_0$, $Z_0$ contains the line segment $\{\, (x, (1-t)y + tx) : 0 \le t \le 1 \,\}$. Let $Y_1$ and $Y_2$ be open subsets of $C$ such that $\mathrm{FP}(F) \subset Y_2$, $\overline{Y}_2 \subset Y_1$, $\overline{Y}_1 \subset Y_0$, and $Y_2$ is path connected. Let $\alpha : C \to [0,1]$ be a $C^\infty$ function that is identically one on $\overline{Y}_2$ and identically zero on $C \setminus Y_1$.

Let

\[ W_0 = Z_0 \cup (W \cap ((C \setminus Y_1) \times X)) \setminus \{\, (x,x) : x \in C \setminus Y_2 \,\}. \]

This is an open set that contains the graph of $F$, so Proposition 10.2.7 implies that there is a $C^\infty$ function $f : C \to X$ with $\mathrm{Gr}(f) \subset W_0$ that has only regular fixed points. We assume that among all functions with these properties, $f$ is minimal for the number of fixed points.

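There is some $\varepsilon > 0$ such that $\{x\} \times U_\varepsilon(x) \subset Z_0$ for all $x \in \overline{Y}_2$. For any $\delta \in (0,1]$ the function

\[ f' : x \mapsto (1 - \alpha(x))f(x) + \alpha(x)\bigl(x + \delta(f(x) - x)\bigr) \]

is $C^\infty$, its graph is contained in $W_0$, and it has only regular fixed points. If $\delta > 0$ is sufficiently small, then $f'(x) \in U_\varepsilon(x)$ for all $x \in \overline{Y}_2$. Therefore we may assume that $f(x) \in U_\varepsilon(x)$ for all $x \in \overline{Y}_2$.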
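That $f'$ has exactly the same fixed points as $f$ follows from a one-line computation. The sketch below assumes that the convex combination is read as $(1-\alpha(x))f(x) + \alpha(x)\bigl(x + \delta(f(x)-x)\bigr)$, so that both terms are points of $X$.

```latex
\begin{align*}
f'(x) - x &= (1-\alpha(x))\bigl(f(x) - x\bigr) + \alpha(x)\,\delta\,\bigl(f(x) - x\bigr) \\
          &= \bigl(1 - \alpha(x) + \delta\,\alpha(x)\bigr)\bigl(f(x) - x\bigr).
\end{align*}
```

Since $0 \le \alpha(x) \le 1$ and $0 < \delta \le 1$, the scalar factor lies in $[\delta, 1]$, so $f'(x) = x$ if and only if $f(x) = x$. Where $\alpha = 1$ the displacement is $\delta(f(x) - x)$, which is why a small $\delta$ brings $f'(x)$ within $\varepsilon$ of $x$ there.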

Define a function $\zeta : Y_2 \to \mathbb{R}^m$ by setting $\zeta(x) = f(x) - x$. Aiming at a contradiction, suppose that $\zeta$ has zeros. Since $\Lambda(f) = 0$, there must be two zeros of opposite index, say $x_0$ and $x_1$. As in the last proof, there is a $C^\infty$ embedding $\gamma : (-\varepsilon, 1+\varepsilon) \to Y_2$ with $\gamma(0) = x_0$ and $\gamma(1) = x_1$. Applying the tubular neighborhood theorem, this path can be used to construct a $C^\infty$ parameterization $\varphi : T \to Y_2$ where $T \subset \mathbb{R}^m$ is a neighborhood of $D^m$.

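Let $g : T \to \mathbb{R}^m$ be defined by setting $g(x) = D\varphi(x)^{-1}\zeta_{\varphi(x)}$. Proposition 14.5.1 gives a continuous function $\hat g : T \to \mathbb{R}^m \setminus \{0\}$ that agrees with $g$ on the closure of $T \setminus D^m$. We extend $\hat g$ to all of $T$ by setting $\hat g(x) = g(x)$ if $x \notin D^m$. Define a new vector field $\hat\zeta$ on $\varphi(T)$ by setting

\[ \hat\zeta(p) = D\varphi(\varphi^{-1}(p))\,\hat g(\varphi^{-1}(p)). \]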

There are two final technical points. In order to ensure that $\|\hat\zeta(x)\| < \varepsilon$ for all $x$, where $\hat\zeta$ is the new vector field, we can first multiply $\hat g$ by a $C^\infty$ function $\beta : D^m \to (0,1]$ that is identically 1 on $T \setminus D^m$ and close to zero in the interior of $D^m$ outside of some neighborhood of $S^{m-1}$. We can also use Proposition 11.5.2 and Corollary 10.2.5 to further perturb $\hat\zeta$ to make it $C^\infty$ without introducing any additional zeros. We can now define a function $f' : C \to X$ by setting $f'(x) = x + \hat\zeta(x)$ if $x \in \varphi(T)$ and $f'(x) = f(x)$ otherwise. Since $f'$ has all the properties of $f$, and two fewer fixed points, this is a contradiction, and the proof is complete.

Chapter 15

Vector Fields and their Equilibria

Under mild technical conditions, explained in Sections 15.1 and 15.2, a vector field $\zeta$ on a manifold $M$ determines a dynamical system. That is, there is a function $\Phi : W \to M$, where $W \subset M \times \mathbb{R}$ is a neighborhood of $M \times \{0\}$, such that the derivative of $\Phi$ at $(p,t) \in W$, with respect to time, is $\zeta_{\Phi(p,t)}$. In this final chapter we develop the relationship between the fixed point index and the stability of rest points, and sets of rest points, of such a dynamical system.

In addition to the degree and the fixed point index, there is a third expression of the underlying mathematical principle for vector fields. In Section 15.3 we present an axiomatic description of the vector field index, paralleling our axiom systems for the degree and fixed point index. Existence and uniqueness are established by showing that the vector field index of $\zeta|_C$, for suitable compact $C \subset M$, agrees with the fixed point index of $\Phi(\cdot,t)|_C$ for small negative $t$. Since we are primarily interested in forward stability, it is more to the point to say that the fixed point index of $\Phi(\cdot,t)|_C$ for small positive $t$ agrees with the vector field index of $-\zeta|_C$.

The notion of stability we focus on, asymptotic stability, has a rather complicated definition, but the intuition is simple: a compact set $A$ is asymptotically stable if the trajectory of each point in some neighborhood of $A$ is eventually drawn into, and remains inside, arbitrarily small neighborhoods of $A$. In order to use the fixed point index to study stability, we need to find some neighborhood of such an $A$ that is mapped into itself by $\Phi(\cdot,t)$ for small positive $t$. The tool we use to achieve this is the converse Lyapunov theorem, which asserts that if $A$ is asymptotically stable, then there is a Lyapunov function for $\zeta$ that is defined on a neighborhood of $A$. Unlike the better known Lyapunov theorem, which asserts that the existence of a Lyapunov function implies asymptotic stability, the converse Lyapunov theorem is a more recent and difficult result. We prove a version of it that is sufficient for our needs in Section 15.5.

Once all this background material is in place, it will not take long to prove the culminating result, that if $A$ is asymptotically stable, and an ANR, then the vector field index of $-\zeta$ is the Euler characteristic of $A$. This was proved in the context of a game theoretic model by Demichelis and Ritzberger (2003). The special case of $A$ being a singleton is a prominent result in the theory of dynamical systems, due to Krasnosel’ski and Zabreiko (1984): if an isolated rest point is asymptotically stable for $\zeta$, then the vector field index of that point for $-\zeta$ is 1.


Paul Samuelson advocated a “correspondence principle” in two papers (Samuelson 1941, 1942) and in his famous book Foundations of Economic Analysis (Samuelson 1947). The idea is that the stability of an economic equilibrium, with respect to natural dynamics of adjustment to equilibrium, implies certain qualitative properties of the equilibrium’s comparative statics. There are 1-dimensional settings in which this idea is regarded as natural and compelling, but Samuelson’s writings discuss many examples without formulating it as a general theorem, and its nature and status in higher dimensions has not been well understood; Echenique (2008) provides a concise summary of the state of knowledge and related literature. The book concludes with an explanation of how the Krasnosel’ski-Zabreiko theorem allows the correspondence principle to be formulated in a precise and general way.

15.1 Euclidean Dynamical Systems

We begin with a review of the theory of ordinary differential equations in Euclidean space. Let $U \subset \mathbb{R}^m$ be open, and let $z : U \to \mathbb{R}^m$ be a function, thought of as a vector field. A trajectory of $z$ is a $C^1$ function $\gamma : (a,b) \to U$ such that $\gamma'(s) = z_{\gamma(s)}$ for all $s$. Without additional assumptions the dynamics associated with $z$ need not be deterministic: there can be more than one trajectory for the vector field satisfying an initial condition that specifies the position of the trajectory at a particular moment. For example, suppose that $m = 1$, $U = \mathbb{R}$, and

\[ z(t) = \begin{cases} 0, & t \le 0, \\ 2\sqrt{t}, & t > 0. \end{cases} \]

Then for any $s_0$ there is a trajectory $\gamma_{s_0} : \mathbb{R} \to U$ given by

\[ \gamma_{s_0}(s) = \begin{cases} 0, & s \le s_0, \\ (s - s_0)^2, & s > s_0. \end{cases} \]
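The indeterminacy in this example can be checked numerically. The following is a minimal sketch, with the function names `z`, `gamma`, `gamma_prime` and `max_residual` chosen here for illustration: it evaluates the residual $\gamma_{s_0}'(s) - z(\gamma_{s_0}(s))$ on a grid for several values of $s_0 \ge 0$, all of which give trajectories passing through $0$ at time $0$.

```python
import math


def z(x):
    """The vector field of the example: 0 for x <= 0, 2*sqrt(x) for x > 0."""
    return 2.0 * math.sqrt(x) if x > 0 else 0.0


def gamma(s, s0):
    """The trajectory that rests at 0 until time s0 and then departs."""
    return (s - s0) ** 2 if s > s0 else 0.0


def gamma_prime(s, s0):
    """Derivative of gamma(., s0), computed by hand."""
    return 2.0 * (s - s0) if s > s0 else 0.0


def max_residual(s0, grid):
    """Largest violation of gamma' = z(gamma) over the grid."""
    return max(abs(gamma_prime(s, s0) - z(gamma(s, s0))) for s in grid)


if __name__ == "__main__":
    grid = [k / 100.0 for k in range(-200, 401)]
    for s0 in (0.0, 0.5, 1.0):
        print("s0 =", s0,
              "gamma(0) =", gamma(0.0, s0),
              "max residual =", max_residual(s0, grid))
```

Each choice of $s_0 \ge 0$ satisfies the same initial condition at time $0$, yet the trajectories differ at later times, which is exactly the failure of uniqueness described above.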

For most purposes this sort of indeterminacy is unsatisfactory, so we need to find a condition that implies that for any initial condition there is a unique trajectory.

Let $(X,d)$ and $(X',d')$ be metric spaces. A function $f : X \to X'$ is Lipschitz if there is a constant $L > 0$ such that

d′(f(x), f(y)) ≤ Ld(x, y)

for all $x, y \in X$. We say that $f$ is locally Lipschitz if each $x \in X$ has a neighborhood $U$ such that $f|_U$ is Lipschitz. The basic existence-uniqueness result for ordinary differential equations is:

Theorem 15.1.1 (Picard-Lindelöf Theorem). Suppose that $U \subset \mathbb{R}^m$ is open, $z : U \to \mathbb{R}^m$ is locally Lipschitz, and $C \subset U$ is compact. Then for sufficiently small $\varepsilon > 0$ there is a unique function $F : C \times (-\varepsilon, \varepsilon) \to U$ such that for each $x \in C$, $F(x,0) = x$ and $F(x,\cdot)$ is a trajectory of $z$. In addition $F$ is continuous, and if $z$ is $C^s$ ($1 \le s \le \infty$) then so is $F$.


Due to its fundamental character, a detailed proof would be out of place here, but we will briefly describe the central ideas of two methods. First, for any $\Delta > 0$ one can define a piecewise linear approximate solution going forward in time by setting $F_\Delta(x,0) = x$ and inductively applying the equation

\[ F_\Delta(x,t) = F_\Delta(x,k\Delta) + (t - k\Delta)\cdot z(F_\Delta(x,k\Delta)) \quad\text{for } k\Delta < t \le (k+1)\Delta. \]

Concrete calculations show that this collection of functions has a limit as $\Delta \to 0$, that this limit is continuous and satisfies the differential equation (∗), and also that any solution of (∗) is a limit of this collection. These calculations give precise information concerning the accuracy of the numerical scheme for computing approximate solutions described by this approach.
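The piecewise linear scheme just described is commonly known as the forward Euler method, and it is easy to experiment with. The sketch below is an illustration under stated assumptions: the function name `euler_polyline` is chosen here, and the test problem $z(x) = x$ on $\mathbb{R}$, whose exact solution through $x = 1$ is $t \mapsto e^t$, is a hypothetical example rather than one taken from the text.

```python
import math


def euler_polyline(z, x, t, delta):
    """Evaluate the piecewise linear approximation F_delta(x, t) for t >= 0.

    Follows F(x, t) = F(x, k*delta) + (t - k*delta) * z(F(x, k*delta))
    on each interval k*delta < t <= (k + 1)*delta.
    """
    value = x
    k = 0
    # Advance through the full steps that end strictly before time t.
    while (k + 1) * delta < t:
        value = value + delta * z(value)
        k += 1
    # Final partial (or full) step from time k*delta to time t.
    return value + (t - k * delta) * z(value)


def error_at_one(delta):
    """Distance between F_delta(1, 1) and the exact value e for z(x) = x."""
    return abs(euler_polyline(lambda x: x, 1.0, 1.0, delta) - math.e)


if __name__ == "__main__":
    for delta in (0.1, 0.01, 0.001):
        print("delta =", delta, "error =", error_at_one(delta))
```

For this test problem the error shrinks roughly in proportion to $\Delta$, which is the kind of accuracy information the convergence calculations provide.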

The second proof scheme uses a fixed point theorem. It considers the mapping $F \mapsto \tilde F$ given by the equation

\[ \tilde F(x,t) = x + \int_0^t z(F(x,s))\,ds. \]

This defines a function from $C(C \times [-\varepsilon, \varepsilon], U)$ to $C(C \times [-\varepsilon, \varepsilon], \mathbb{R}^m)$. As usual, the range is endowed with the supremum norm. A calculation shows that if $\varepsilon$ is sufficiently small, then the restriction of this function to a certain neighborhood of the function $(x,t) \mapsto x$ is actually a contraction. Since $C(C \times [-\varepsilon, \varepsilon], \mathbb{R}^m)$ is a complete metric space, the contraction mapping theorem gives a unique fixed point. Additional details can be found in Chapter 5 of Spivak (1979) and Chapter 8 of Hirsch and Smale (1974).
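Iterating the integral operator can be carried out exactly in simple cases. The following sketch assumes the hypothetical test problem $z(x) = x$ on $\mathbb{R}$ with a fixed initial point $x_0$, and represents each iterate $t \mapsto F(x_0,t)$ as a polynomial in $t$ with rational coefficients; the names `picard_step` and `picard_iterates` are chosen here for illustration.

```python
from fractions import Fraction


def picard_step(coeffs, x0):
    """Apply F -> x0 + integral_0^t F(s) ds for z(x) = x.

    coeffs[k] is the coefficient of t^k in the current iterate.
    Integrating t^k gives t^(k+1) / (k + 1), and x0 becomes the
    new constant term.
    """
    integrated = [Fraction(c, k + 1) for k, c in enumerate(coeffs)]
    return [Fraction(x0)] + integrated


def picard_iterates(x0, n):
    """Start from the constant function t -> x0 and apply the operator n times."""
    current = [Fraction(x0)]
    for _ in range(n):
        current = picard_step(current, x0)
    return current


if __name__ == "__main__":
    for k, c in enumerate(picard_iterates(1, 5)):
        print("coefficient of t^%d: %s" % (k, c))
```

Starting from the constant function, the iterates for $x_0 = 1$ are the Taylor polynomials of $e^t$, so in this example the fixed point of the operator is approached one Taylor term at a time.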

15.2 Dynamics on a Manifold

Throughout this chapter we will work with a fixed order of differentiability $2 \le r \le \infty$ and an $m$-dimensional $C^r$ manifold $M \subset \mathbb{R}^k$. Recall that if $S$ is a subset of $M$, a vector field on $S$ is a continuous function $\zeta : S \to TM$ such that $\pi \circ \zeta = \mathrm{Id}_S$, where $\pi : TM \to M$ is the projection $(p,v) \mapsto p$. We write $\zeta(p) = (p, \zeta_p)$, so that $\zeta$ is thought of as “attaching” a tangent vector $\zeta_p$ to each $p \in S$, in a continuous manner. A trajectory of $\zeta$ is a $C^1$ function $\gamma : (a,b) \to S$ such that $\gamma'(s) = \zeta_{\gamma(s)}$ for all $s$.

We wish to transport the Picard-Lindelöf theorem to $M$. To this end, we study how vector fields and their associated dynamical systems are transformed by changes of coordinates. In addition to the vector field $\zeta$ on $M$, suppose that $N \subset \mathbb{R}^\ell$ is a second $C^r$ manifold and $h : M \to N$ is a $C^r$ diffeomorphism. Let $\eta$ be the vector field on $N$ defined by

\[ \eta_q = Dh(h^{-1}(q))\,\zeta_{h^{-1}(q)}. \tag{$*$} \]

This formula preserves the dynamics:

Lemma 15.2.1. The curve $\gamma : (a,b) \to M$ is a trajectory of $\zeta$ if and only if $h \circ \gamma$ is a trajectory of $\eta$.


Proof. For each s the chain rule gives

(h ◦ γ)′(s) = Dh(γ(s))γ′(s)

and Dh(γ(s)) is a linear isomorphism because h is a diffeomorphism.
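The proof leaves the final comparison implicit. Spelled out, using equation (∗) in the form $\eta_{h(p)} = Dh(p)\zeta_p$ with $p = \gamma(s)$, the argument runs as follows.

```latex
\begin{align*}
\gamma'(s) = \zeta_{\gamma(s)}
  &\iff Dh(\gamma(s))\,\gamma'(s) = Dh(\gamma(s))\,\zeta_{\gamma(s)}
     && \text{($Dh(\gamma(s))$ is injective)} \\
  &\iff (h \circ \gamma)'(s) = \eta_{h(\gamma(s))}
     && \text{(chain rule and $(*)$).}
\end{align*}
```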

In our main application of this result $N$ will be an open subset of $\mathbb{R}^m$, to which Theorem 15.1.1 can be applied. The given $\zeta$ will be locally Lipschitz, and it should follow that $\eta$ is also locally Lipschitz. Insofar as $M$ is a subset of $\mathbb{R}^k$, $TM$ inherits a metric, which gives meaning to the assumption that $\zeta$ is locally Lipschitz, but this is a technical artifice, and it would be troubling if our concepts depended on this metric in an important way. One consequence of the results below is that different embeddings of $M$ in a Euclidean space give rise to the same class of locally Lipschitz vector fields.

Lemma 15.2.2. If $U \subset \mathbb{R}^m$ is open and $f : U \to \mathbb{R}^n$ is $C^1$, then $f$ is locally Lipschitz.

Proof. Consider a point $x \in U$. There is an $\varepsilon > 0$ such that the closed ball $B$ of radius $\varepsilon$ centered at $x$ is contained in $U$. Let $L := \max_{y \in B} \|Df(y)\|$. Since $B$ is convex, for any $y, z \in B$ we have

\[ \|f(z) - f(y)\| = \Bigl\| \int_0^1 Df(y + t(z-y))(z-y)\,dt \Bigr\| \le \int_0^1 \|Df(y + t(z-y))\| \cdot \|z - y\|\,dt \le L\|z - y\|. \]

Lemma 15.2.3. A composition of two Lipschitz functions is Lipschitz, and a composition of two locally Lipschitz functions is locally Lipschitz.

Proof. Suppose that $f : X \to X'$ is Lipschitz, with Lipschitz constant $L$, that $(X'', d'')$ is a third metric space, and that $g : X' \to X''$ is Lipschitz with Lipschitz constant $M$. Then

\[ d''(g(f(x)), g(f(y))) \le M d'(f(x), f(y)) \le LM d(x,y) \]

for all $x, y \in X$, so $g \circ f$ is Lipschitz with Lipschitz constant $LM$.

Now suppose that $f$ and $g$ are only locally Lipschitz. For any $x \in X$ there is a neighborhood $U$ of $x$ such that $f|_U$ is Lipschitz and a neighborhood $V$ of $f(x)$ such that $g|_V$ is Lipschitz. Then $f|_{U \cap f^{-1}(V)}$ is Lipschitz, and, by continuity, $U \cap f^{-1}(V)$ is a neighborhood of $x$. Thus $g \circ f$ is locally Lipschitz.

In preparation for the next result we note the following immediate consequences of equation (∗):

\[ Dh(p)\zeta_p = \eta_{h(p)} \quad\text{and}\quad Dh^{-1}(q)\eta_q = \zeta_{h^{-1}(q)} \]

for all $p \in M$ and $q \in h(M)$. We also note that for everything we have done up to this point it is enough that $r \ge 1$, but the following result depends on $r$ being at least 2.


Lemma 15.2.4. ζ is locally Lipschitz if and only if η is locally Lipschitz.

Proof. Suppose that $\zeta$ is locally Lipschitz. For any $p, p' \in M$ we have

\[ \|\eta_{h(p)} - \eta_{h(p')}\| = \|Dh(p)(\zeta_p - \zeta_{p'}) + (Dh(p) - Dh(p'))\zeta_{p'}\| \le \|Dh(p)\| \cdot \|\zeta_p - \zeta_{p'}\| + \|Dh(p) - Dh(p')\| \cdot \|\zeta_{p'}\|. \]

Any $p_0 \in M$ has a neighborhood $M_0 \subset M$ such that $\zeta|_{M_0}$ is Lipschitz, say with Lipschitz constant $L_1$, and there are constants $C_1, C_2 > 0$ such that $\|Dh(p)\| \le C_1$ and $\|\zeta_p\| \le C_2$ for all $p \in M_0$. Since $h$ is $C^2$, $Dh$ is $C^1$ and consequently locally Lipschitz, so we can choose $M_0$ such that $Dh|_{M_0}$ is Lipschitz, say with Lipschitz constant $L_2$. Then $\eta \circ h|_{M_0}$ is Lipschitz with Lipschitz constant $C_1L_1 + C_2L_2$.

Now suppose that $\eta$ is locally Lipschitz. By the definition of a $C^r$ function, there is an open $W \subset \mathbb{R}^\ell$ containing $h(M)$ and a $C^r$ function $\Psi : W \to \mathbb{R}^k$ whose restriction to $h(M)$ is $h^{-1}$. Replacing $W$ with $\Psi^{-1}(M)$, we may assume that the image of $\Psi$ is contained in $M$. We extend $\eta$ to $W$ by setting $\tilde\eta = \eta \circ h \circ \Psi$. This is locally Lipschitz because $\eta$ is locally Lipschitz and $h \circ \Psi$ is $C^1$. The remainder of the proof follows the pattern of the first part, with $\zeta$ in place of $\eta$, $\tilde\eta$ in place of $\zeta$, and $\Psi$ in place of $h$.

With the preparations complete, we can now place the Picard-Lindelöf theorem in a general setting.

Theorem 15.2.5. Suppose that $\zeta$ is locally Lipschitz and $C \subset M$ is compact. Then for sufficiently small $\varepsilon > 0$ there is a unique function $\Phi : C \times (-\varepsilon, \varepsilon) \to M$ such that for all $p \in C$, $\Phi(p,0) = p$ and $\Phi(p,\cdot)$ is a trajectory for $\zeta$. In addition $\Phi$ is continuous, and if $\zeta$ is $C^s$ ($1 \le s \le r$) then so is $\Phi$.

Proof. We can cover $C$ with the interiors of a finite collection $K_1, \ldots, K_r$ of compact subsets, each of which is contained in the image of some $C^r$ parameterization $\varphi_i : U_i \to M$. For each $i$ let $z_i$ be the vector field on $U_i$ derived from $\zeta$ and $\varphi_i^{-1}$, as described above, and let $F_i : \varphi_i^{-1}(K_i) \times (-\varepsilon_i, \varepsilon_i) \to U_i$ be the function given by Theorem 15.1.1. Then the function $\Phi_i : K_i \times (-\varepsilon_i, \varepsilon_i) \to M$ given by

\[ \Phi_i(p,t) = \varphi_i(F_i(\varphi_i^{-1}(p), t)) \]

inherits the continuity and smoothness properties of $F_i$, and for each $p \in K_i$, $\Phi_i(p,0) = p$ and $\Phi_i(p,\cdot)$ is a trajectory for $\zeta$. If $\varepsilon \le \min\{\varepsilon_1, \ldots, \varepsilon_r\}$, then we must have $\Phi(p,t) = \Phi_i(p,t)$ whenever $p \in K_i$, so $\Phi$ is unique if it exists. In fact $\Phi$ is unambiguously defined by this condition: if $p \in K_i \cap K_j$, then $\varphi_i^{-1} \circ \Phi_j(p,\cdot)$ is a trajectory for $z_i$, so it agrees with $F_i(\varphi_i^{-1}(p),\cdot)$, and thus $\Phi_i(p,\cdot)$ and $\Phi_j(p,\cdot)$ agree on $(-\varepsilon, \varepsilon)$.

Taking a union of the interiors of the sets $C \times (-\varepsilon, \varepsilon)$ gives an open $W' \subset M \times \mathbb{R}$ such that:

(a) for each p, { t : (p, t) ∈ W ′ } is an interval containing 0;

(b) there is a unique function $\Phi' : W' \to M$ such that for all $p$, $\Phi'(p,0) = p$ and $\Phi'(p,\cdot)$ is a trajectory for $\zeta$.


If $W''$ and $\Phi''$ is a second pair with these properties, then $W' \cup W''$ satisfies (a), and uniqueness implies that $\Phi'$ and $\Phi''$ agree on $W' \cap W''$, so the function on $W' \cup W''$ that agrees with $\Phi'$ on $W'$ and with $\Phi''$ on $W''$ satisfies (b). In fact this logic extends to any, possibly infinite, collection of pairs. Applying it to the collection of all such pairs shows that there is a maximal $W$ satisfying (a), called the flow domain of $\zeta$, such that there is a unique $\Phi : W \to M$ satisfying (b), which is called the flow of $\zeta$. Since the flow agrees, in a neighborhood of any point, with a function derived (by change of time) from one of those given by Theorem 15.2.5, it is continuous, and it is $C^s$ ($1 \le s \le r$) if $\zeta$ is $C^s$.
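The chart formula $\Phi_i(p,t) = \varphi_i(F_i(\varphi_i^{-1}(p),t))$ from the proof of Theorem 15.2.5 can be illustrated on the unit circle. The sketch below is an assumption-laden example, not part of the text: $M = S^1 \subset \mathbb{R}^2$, the vector field is $\zeta_p = (-p_2, p_1)$, and two angle charts with different branch cuts play the role of $\varphi_i$ and $\varphi_j$. In either chart the transported vector field is the constant $1$, whose Euclidean flow is $F(\theta,t) = \theta + t$. All function names are chosen here for illustration.

```python
import math


def phi(theta):
    """Parameterization of the circle by angle."""
    return (math.cos(theta), math.sin(theta))


def angle_a(p):
    """Inverse of phi with angles in (-pi, pi]."""
    return math.atan2(p[1], p[0])


def angle_b(p):
    """Inverse of phi with angles in [0, 2*pi), a second chart."""
    return math.atan2(p[1], p[0]) % (2.0 * math.pi)


def chart_flow(theta, t):
    """Euclidean flow of the constant vector field z = 1."""
    return theta + t


def flow_via_chart(p, t, angle):
    """Phi(p, t) = phi(F(phi^{-1}(p), t)) computed in the given chart."""
    return phi(chart_flow(angle(p), t))


def rotate(p, t):
    """Direct formula for the flow of zeta_p = (-p_2, p_1): rotation by t."""
    c, s = math.cos(t), math.sin(t)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])


if __name__ == "__main__":
    p = phi(2.0)
    print(flow_via_chart(p, 0.3, angle_a))
    print(flow_via_chart(p, 0.3, angle_b))
    print(rotate(p, 0.3))
```

The two charts give the same point, as the overlap argument in the proof requires, and both agree with the rotation that is the flow of this vector field.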

The vector field $\zeta$ is said to be complete if $W = M \times \mathbb{R}$. When this is the case each $\Phi(\cdot,t) : M \to M$ is a homeomorphism (or $C^s$ diffeomorphism if $\zeta$ is $C^s$) with inverse $\Phi(\cdot,-t)$, and $t \mapsto \Phi(\cdot,t)$ is a homomorphism from $\mathbb{R}$ (thought of as a group) to the space of homeomorphisms (or $C^s$ diffeomorphisms) between $M$ and itself.

It is important to understand that when $\zeta$ is not complete, it is because there are trajectories that “go to $\infty$ in finite time.” One way of making this rigorous is to define the notion of “going to $\infty$” as a matter of eventually being outside any compact set. Suppose that $I_p = \{\, t : (p,t) \in W \,\} = (a,b)$, where $b < \infty$, and $C \subset M$ is compact. If we had $\Phi(p,t_n) \in C$ for all $n$, where $\{t_n\}$ is a sequence in $(a,b)$ converging to $b$, then after passing to a subsequence we would have $\Phi(p,t_n) \to q$ for some $q \in C$, and we could use the method of the last proof to show that $(p,b) \in W$.

15.3 The Vector Field Index

If $S \subset M$ and $\zeta$ is a vector field on $S$, an equilibrium of $\zeta$ is a point $p \in S$ such that $\zeta(p) = 0 \in T_pM$. Intuitively, an equilibrium is a rest point of the dynamical system defined by $\zeta$ in the sense that the constant function with value $p$ is a trajectory.

The axiomatic description of the vector field index resembles the corresponding descriptions of the degree and the fixed point index. If $C \subset M$ is compact, a continuous vector field $\zeta$ on $C$ is index admissible if it has no equilibria in $\partial C$. (As before, $\mathrm{int}\,C$ is the topological interior of $C$, and $\partial C = C \setminus \mathrm{int}\,C$ is its topological boundary.) Let $\mathcal{V}(M)$ be the set of index admissible vector fields $\zeta : C \to TM$ where $C$ is compact.

Definition 15.3.1. A vector field index for $M$ is a function $\mathrm{ind} : \mathcal{V}(M) \to \mathbb{Z}$, $\zeta \mapsto \mathrm{ind}(\zeta)$, satisfying:

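(V1) $\mathrm{ind}(\zeta) = 1$ for all $\zeta \in \mathcal{V}(M)$ with domain $C$ such that there is a $C^r$ parameterization $\varphi : V \to M$ with $C \subset \varphi(V)$, $\varphi^{-1}(C) = D^m$, and $D\varphi(x)^{-1}\zeta_{\varphi(x)} = x$ for all $x \in D^m = \{\, x \in \mathbb{R}^m : \|x\| \le 1 \,\}$.

(V2) $\mathrm{ind}(\zeta) = \sum_{i=1}^{s} \mathrm{ind}(\zeta|_{C_i})$ whenever $\zeta \in \mathcal{V}(M)$, $C$ is the domain of $\zeta$, and $C_1, \ldots, C_s$ are pairwise disjoint compact subsets of $C$ such that $\zeta$ has no equilibria in $C \setminus (\mathrm{int}\,C_1 \cup \ldots \cup \mathrm{int}\,C_s)$.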

(V3) For each ζ ∈ V(M) with domain C there is a neighborhood A ⊂ TM of Gr(ζ)such that ind(ζ ′) = ind(ζ) for all vector fields ζ ′ on C with Gr(ζ ′) ⊂ A.
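For a continuous vector field on the unit disk in $\mathbb{R}^2$ with no zeros on the boundary circle, the index is known to coincide with the winding number of the field around that circle. That standard fact is assumed here rather than derived from the axioms. Under this assumption the normalization in (V1) can be checked numerically. The function name `winding_number` and the sample fields are chosen for illustration only.

```python
import math


def winding_number(field, n=2000):
    """Winding number of field(x, y) around the origin along the unit circle.

    Accumulates the change of the angle of the field over n small steps,
    wrapping each increment into (-pi, pi], and divides by 2*pi.
    """
    total = 0.0
    previous = None
    for k in range(n + 1):
        theta = 2.0 * math.pi * k / n
        vx, vy = field(math.cos(theta), math.sin(theta))
        angle = math.atan2(vy, vx)
        if previous is not None:
            step = angle - previous
            while step > math.pi:
                step -= 2.0 * math.pi
            while step <= -math.pi:
                step += 2.0 * math.pi
            total += step
        previous = angle
    return round(total / (2.0 * math.pi))


def identity_field(x, y):
    """The field x -> x appearing in (V1)."""
    return (x, y)


def reflected_field(x, y):
    """The field x -> (-x_1, x_2)."""
    return (-x, y)


def negated_field(x, y):
    """The field x -> -x."""
    return (-x, -y)


if __name__ == "__main__":
    for name, f in (("identity", identity_field),
                    ("reflected", reflected_field),
                    ("negated", negated_field)):
        print(name, winding_number(f))
```

The sample size `n` only needs to be large enough that the field turns by less than a half revolution between consecutive sample points.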


A vector field homotopy on $S$ is a continuous function $\eta : S \times [0,1] \to TM$ such that $\pi(\eta(p,t)) = p$ for all $(p,t)$, which is to say that each $\eta_t = \eta(\cdot,t) : S \to TM$ is a vector field on $S$. A vector field homotopy $\eta$ on $C$ is index admissible if each $\eta_t$ is index admissible. If $\mathrm{ind}(\cdot)$ is a vector field index, then $\mathrm{ind}(\eta_t)$ is locally constant as a function of $t$, hence constant because $[0,1]$ is connected, so $\mathrm{ind}(\eta_0) = \mathrm{ind}(\eta_1)$.

Our analysis of the vector field index relates it to the fixed point index.

Theorem 15.3.2. There is a unique vector field index for $M$. If $\zeta \in \mathcal{V}(M)$ has an extension to a neighborhood of its domain that is locally Lipschitz and $\Phi$ is the flow of this extension, then

\[ \mathrm{ind}(\zeta) = \Lambda(\Phi(\cdot,-t)|_C) = (-1)^m \Lambda(\Phi(\cdot,t)|_C) \]

for all sufficiently small positive $t$. Equivalently, $\mathrm{ind}(-\zeta) = \Lambda(\Phi(\cdot,t)|_C)$ for small positive $t$.

Remark: In the theory of dynamical systems we are more interested in the future than the past. In particular, forward stability is of much greater interest than backward stability, even though the symmetry $t \mapsto -t$ makes the study of one equivalent to the study of the other. From this point of view it seems that it would have been preferable to define the vector field index with (V1) replaced by the normalization requiring that the vector field $x \mapsto -x \in T_x\mathbb{R}^m$ has index 1.

The remainder of this section is devoted to the proof of Theorem 15.3.2. Fix $\zeta \in \mathcal{V}(M)$ with domain $C$. The first order of business is to show that $\zeta$ can be approximated by a well enough behaved vector field that is defined on a neighborhood of $C$.

Since $C$ is compact, it is covered by the interiors of a finite collection $K_1, \ldots, K_k$ of compact sets, with each $K_i$ contained in an open $V_i$ that is the image of a $C^r$ parameterization $\varphi_i$. Each $\varphi_i$ induces an isomorphism between $TV_i$ and $V_i \times \mathbb{R}^m$, so that the Tietze extension theorem implies that there is a vector field $\zeta_i$ on $V_i$ that agrees with $\zeta$ on $C \cap V_i$. There is a partition of unity $\{\lambda_i\}$ for $K_1 \cup \ldots \cup K_k$ subordinate to the cover $V_1, \ldots, V_k$, and we may define an extension of $\zeta$ to $V = \bigcup_i V_i$ by setting

\[ \zeta(p) = \sum_{i \,:\, p \in V_i} \lambda_i(p)\zeta_i(p). \]

Suppose $2 \le r \le \infty$. We will need to show that $\zeta$ can be approximated by a vector field that is locally Lipschitz, but in fact we can approximate with a $C^{r-1}$ vector field. In the setting of the last paragraph, we may assume that the partition of unity $\{\lambda_i\}$ is $C^r$. Proposition 10.2.7 allows us to approximate each vector field

\[ x \mapsto D\varphi_i(x)^{-1}(\zeta_{i,\varphi_i(x)}) \]

on $\varphi_i^{-1}(V_i)$ with a $C^r$ vector field $\xi_i$, and

\[ p \mapsto D\varphi_i(\varphi_i^{-1}(p))\,\xi_{i,\varphi_i^{-1}(p)} \]

is then a $C^{r-1}$ vector field $\tilde\zeta_i$ on $V_i$ that approximates $\zeta_i$. The vector field $\tilde\zeta$ on $V$ given by

\[ \tilde\zeta(p) = \sum_{i \,:\, p \in V_i} \lambda_i(p)\tilde\zeta_i(p) \]

is a $C^{r-1}$ vector field that approximates $\zeta$.

Actually, we wish to approximate $\zeta$ with a $C^{r-1}$ vector field satisfying an additional regularity condition. Recall that $T_{(p,0)}(TM) = T_pM \times T_pM$, and let

π2 : TpM × TpM → TpM, π2(v, w) = w,

be the projection onto the second component. We say that $p$ is a regular equilibrium of $\zeta$ if $p$ is an equilibrium of $\zeta$ and $\pi_2 \circ D\zeta(p)$ is nonsingular. (Intuitively, the derivative at $p$ of the map $q \mapsto \zeta_q$ has rank $m$.) We need the following local result.

Lemma 15.3.3. Suppose that $K \subset V \subset \overline{V} \subset U \subset \mathbb{R}^m$ with $U$ and $V$ open and $K$ and $\overline{V}$ compact, and $\lambda : U \to [0,1]$ is a $C^{r-1}$ ($2 \le r \le \infty$) function with $\lambda(x) = 1$ whenever $x \in K$ and $\lambda(x) = 0$ whenever $x \notin V$. Let $D$ be a closed subset of $U$, and let $f : U \to \mathbb{R}^m$ be a $C^{r-1}$ function whose zeros in $D$ are all regular. Then any neighborhood of the origin in $\mathbb{R}^m$ contains a $y$ such that all the zeros of $f_y : x \mapsto f(x) + \lambda(x)y$ in $D \cup K$ are regular.

Proof. The equidimensional case of Sard’s theorem implies that the set of regular values of $f|_V$ is dense, and if $-y$ is a regular value of $f|_V$, then all the zeros of $f_y|_K$ are regular. If the claim is false, there must be a sequence $y_n \to 0$ such that for each $n$ there is an $x_n \in V \cap D$ such that $x_n$ is a singular zero of $f_{y_n}$. But $\overline{V} \cap D$ is compact, so the sequence $\{x_n\}$ must have a limit point, which is a singular zero of $f|_D$ by continuity, contrary to assumption.

Using the last result, we first choose a perturbation $\tilde\zeta_1$ of $\zeta_1$ such that $\lambda_1\tilde\zeta_1 + \sum_{i=2}^{n} \lambda_i\zeta_i$ has only regular equilibria in $K_1$. Working inductively, we then choose perturbations $\tilde\zeta_2, \ldots, \tilde\zeta_n$ of $\zeta_2, \ldots, \zeta_n$ one at a time, in such a way that for each $i$,

\[ \sum_{h=1}^{i} \lambda_h\tilde\zeta_h + \sum_{h=i+1}^{n} \lambda_h\zeta_h \]

has only regular equilibria in $D_{i-1} \cup K_i = (K_1 \cup \ldots \cup K_{i-1}) \cup K_i$. At the end of this $\tilde\zeta = \sum_i \lambda_i\tilde\zeta_i$ is an approximation of $\zeta$ that has only regular equilibria in $C$.

We can now explain the proof that the vector field index is unique. In view of (V3), if it exists, the vector field index is determined by its values on those $\zeta \in \mathcal{V}(M)$ that are $C^{r-1}$ and have only regular equilibria. Applying (V2), we find that the vector field index is in fact fully determined by its values on the $\zeta$ that are $C^{r-1}$ and have a single regular equilibrium.

For such ζ the main ideas here are essentially the ones that were developed in connection with our analysis of orientation, so we only sketch them briefly and informally. By the logic of that analysis, there is a $C^r$ parameterization ϕ : V → M whose image contains the unique equilibrium, such that $x \mapsto D\varphi(x)^{-1}\zeta_{\varphi(x)}$ is admissibly homotopic to either the vector field $x \mapsto x$ or the vector field $x \mapsto (-x_1, x_2, \ldots, x_m)$. In the first of these two situations, the index is determined by (V1). In addition, one can easily define an admissible homotopy transforming this situation into one in which there are three regular equilibria, two of which are of the first type and one of which is of the second type. Combining (V1) and (V2), we find that the index of the equilibrium of the second type is −1, so the vector field index is indeed uniquely determined by the axioms.

We still need to construct the index. One way to proceed would be to define the vector field index to be the index of nearby smooth approximations with regular equilibria. This is possible, but the key step, namely showing that different approximations give the same result, would duplicate work done in earlier chapters. Instead we will define the vector field index using the characterization in terms of the fixed point index given in the statement of Theorem 15.3.2, after which the axioms for the fixed point index will imply (V1)-(V3).

We need the following technical fact.

Lemma 15.3.4. If C ⊂ M is compact, ζ is a locally Lipschitz vector field defined on a neighborhood U of C, Φ is the flow of ζ, and ζ(p) ≠ 0 for all p ∈ C, then there is ε > 0 such that Φ(p, t) ≠ p for all (p, t) ∈ C × ((−ε, 0) ∪ (0, ε)).

Proof. We have $C = K_1 \cup \ldots \cup K_k$ where each $K_i$ is compact and contained in the domain $W_i$ of a $C^r$ parameterization $\varphi_i$. It suffices to prove the claim with C replaced with $K_i$, so we may assume that C is contained in the image of a $C^r$ parameterization. We can use Lemma 15.2.1 to move the problem to the domain of the parameterization, so we may assume that U is an open subset of $\mathbb{R}^m$ and that ζ is Lipschitz, say with Lipschitz constant L.

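Let V be a neighborhood of C such that $\overline{V}$ is a compact subset of U on which ζ does not vanish (such a V exists because ζ is continuous and nonzero on the compact set C), and let

$$M := \max_{p \in \overline{V}} \|\zeta(p)\| \quad\text{and}\quad m := \min_{p \in \overline{V}} \|\zeta(p)\|.$$

Let ε > 0 be small enough that: a) V × (−ε, ε) is contained in the flow domain $W^\zeta$ of ζ; b) Φ(C × (−ε, ε)) ⊂ V; c) LMε < m.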

We claim that

$$\|\Phi(p, t) - p\| \le M|t| \qquad (*)$$

for all (p, t) ∈ C × (−ε, ε). For any (p, s) ∈ C × (−ε, ε) and $v \in \mathbb{R}^m$ we have

$$\Bigl|\frac{d}{ds}\langle \Phi(p, s) - p, v\rangle\Bigr| = \bigl|\langle \zeta_{\Phi(p,s)}, v\rangle\bigr| \le \|\zeta_{\Phi(p,s)}\| \cdot \|v\| \le M\|v\|.$$

Therefore the mean value theorem implies that $|\langle \Phi(p, t) - p, v\rangle| \le M|t| \cdot \|v\|$. Since v may be any unit vector, (∗) follows.

Now suppose that Φ(p, t) = p for some (p, t) ∈ C × ((−ε, 0) ∪ (0, ε)). Rolle’s theorem implies that there is some s between 0 and t such that

$$0 = \frac{d}{ds}\langle \Phi(p, s) - p, \zeta_p\rangle = \langle \zeta_{\Phi(p,s)}, \zeta_p\rangle,$$

but among the vectors that are orthogonal to $\zeta_p$, the origin is the closest to $\zeta_p$, and

$$\|\zeta_{\Phi(p,s)} - \zeta_p\| \le L\|\Phi(p, s) - p\| \le LM|s| < LM\varepsilon < m \le \|\zeta_p\|,$$

so this is impossible.
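The two estimates in this proof can be checked numerically on a concrete example. The sketch below is an illustration only: the rotation field ζ(x₁, x₂) = (−x₂, x₁), the choice of C as the unit circle and V as the annulus 0.95 < ‖x‖ < 1.05, and every name in the code are assumptions introduced here, not taken from the text. For this field the flow is an explicit rotation, the Lipschitz constant is 1, and the bounds on ‖ζ‖ over the closed annulus are 1.05 and 0.95.

```python
import math


def flow(p, t):
    # Exact flow of zeta(x1, x2) = (-x2, x1): rotation by the angle t.
    c, s = math.cos(t), math.sin(t)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])


def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


# Constants for the assumed annulus around the unit circle.
L_CONST = 1.0   # Lipschitz constant of zeta
M_BOUND = 1.05  # max of |zeta| on the closed annulus
M_MIN = 0.95    # min of |zeta| on the closed annulus
EPS = 0.9       # chosen so that L * M * eps < m


def check(num_points=12, num_times=50):
    # Verify 0 < |Phi(p, t) - p| <= M |t| on a grid of points of the
    # circle and of nonzero times in (-EPS, EPS).
    ok = True
    for i in range(num_points):
        angle = 2.0 * math.pi * i / num_points
        p = (math.cos(angle), math.sin(angle))
        for j in range(1, num_times + 1):
            step = EPS * j / (num_times + 1)
            for t in (step, -step):
                d = dist(flow(p, t), p)
                ok = ok and (0.0 < d <= M_BOUND * abs(t))
    return ok
```

In this example the trajectories are periodic with period 2π, so the conclusion of the lemma cannot hold for arbitrarily large ε; the constraint LMε < m keeps ε well below the period.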


We now define the vector field index of the pair (U, ζ) to be $\Lambda(\Phi_{\tilde\zeta}(\cdot, t)|_C)$, where $\tilde\zeta$ is a nearby $C^{r-1}$ vector field, for all sufficiently small negative t. Since such a $\tilde\zeta$ is vector field admissible, the last result (applied to ∂C) implies that $\Phi_{\tilde\zeta}(\cdot, t)|_C$ is index admissible for all small negative t, and it also (by Homotopy) implies that the choice of t does not affect the definition.

We must also show that the choice of $\tilde\zeta$ does not matter. Certainly there is a neighborhood such that for $\zeta_0$ and $\zeta_1$ in this neighborhood and all s ∈ [0, 1], $\zeta_s = (1 - s)\zeta_0 + s\zeta_1$ is index admissible. In addition, that $\Phi_{\zeta_s}(p, t)$ is jointly continuous as a function of (p, t, s) follows from Theorem 15.2.5 applied to the vector field $(p, s) \mapsto (\zeta_s(p), 0) \in T_{(p,s)}(M \times (-\delta, 1 + \delta))$ on M × (−δ, 1 + δ), where δ is a suitable small positive number. Therefore Continuity for the fixed point index implies that

$$\Lambda(\Phi_{\zeta_0}(\cdot, t)|_C) = \Lambda(\Phi_{\zeta_1}(\cdot, t)|_C).$$

We now have to verify that our definition satisfies (V1)-(V3). But the result established in the last paragraph immediately implies (V3). Of course (V2) follows directly from the Additivity property of the fixed point index. Finally, the flow of the vector field ζ(x) = x on $\mathbb{R}^m$ is $\Phi(x, t) = e^t x$, so for small negative t there is an index admissible homotopy between $\Phi(\cdot, t)|_{D^m}$ and the constant map $x \mapsto 0$, so (V1) follows from Continuity and Normalization for the fixed point index.

All that remains of the proof of Theorem 15.3.2 is to show that if ζ is locally Lipschitz and defined in a neighborhood of C, then

$$\operatorname{ind}(\zeta) = (-1)^m \Lambda(\Phi_\zeta(\cdot, t)|_C)$$

for sufficiently small positive t. Since we can approximate ζ with a vector field that is $C^{r-1}$ and has only regular equilibria, by (V2) it suffices to prove this when C is a single regular equilibrium. If ζ is one of the two vector fields $x \mapsto x \in T_x\mathbb{R}^m$ or $x \mapsto (-x_1, x_2, \ldots, x_m) \in T_x\mathbb{R}^m$ on $\mathbb{R}^m$, then $\Phi_\zeta(x, -t) = -\Phi_\zeta(x, t)$ for all x and t, so the result follows from the relationship between the index and the determinant of the derivative.
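The sign bookkeeping for the two model vector fields can be verified directly. The following sketch assumes the standard formula that the fixed point index of a map f at a regular fixed point is the sign of det(I − Df), and it restricts attention to diagonal linear fields ζ(x) = Ax, whose flow is x ↦ e^{tA}x. The function names and the restriction to diagonal matrices are choices made for this illustration.

```python
import math


def sign(x):
    return (x > 0) - (x < 0)


def fixed_point_index_of_flow(diag, t):
    # zeta(x) = A x with A = diag(a_1, ..., a_m) has the time-t flow map
    # x -> (exp(t a_1) x_1, ..., exp(t a_m) x_m).  Under the assumed
    # formula, its index at the origin is sign det(I - exp(tA)).
    det = 1.0
    for a in diag:
        det *= 1.0 - math.exp(t * a)
    return sign(det)


def vector_field_index(diag):
    # Sign of det A, which is +1 for x -> x and -1 for
    # x -> (-x_1, x_2, ..., x_m), matching the values forced by the axioms.
    det = 1.0
    for a in diag:
        det *= a
    return sign(det)
```

For each of the fields with diagonal (1, 1), (−1, 1), (1, 1, 1) and (−1, 1, 1), the index computed from the flow at a small negative time agrees with the sign of det A, and at a small positive time the two differ by the factor (−1)^m.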

15.4 Dynamic Stability

If an equilibrium of a dynamical system is perturbed, does the system necessarily return to the equilibrium, or can it wander far away from the starting point? Such questions are of obvious importance for physical systems. In economics the notion of dynamic adjustment to equilibrium is problematic, because if the dynamics of adjustment are understood by the agents in the model, they will usually not adjust their strategies in the predicted way. Nonetheless economists would generally agree that an equilibrium may or may not be empirically plausible according to whether there are some “natural” or “reasonable” dynamics for which it is stable.

In this section we study a basic stability notion, and show that a sufficient condition for it is the existence of a Lyapunov function. As before we work with a locally Lipschitz vector field ζ on a $C^r$ manifold M where r ≥ 2. Let W be the flow domain of ζ, and let Φ be the flow.


One of the earliest and most useful tools for understanding stability was introduced by Lyapunov toward the end of the 19th century. A function f : M → R is ζ-differentiable if the ζ-derivative

$$\zeta f(p) = \frac{d}{dt} f(\Phi(p, t))\Big|_{t=0}$$

is defined for every p ∈ M. A continuous function L : M → [0, ∞) is a Lyapunov function for A ⊂ M if:

(a) $L^{-1}(0) = A$;

(b) L is ζ-differentiable with ζL(p) < 0 for all p ∈ M \ A;

(c) for every neighborhood U of A there is an ε > 0 such that $L^{-1}([0, \varepsilon]) \subset U$.
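A simple instance of this definition may help fix ideas. The sketch below assumes the linear vector field ζ(x) = −x on Euclidean space, whose flow is Φ(p, t) = e^{−t}p, and tests the candidate L(x) = ‖x‖² for A = {0}. The field, the candidate, and the finite difference helper are all choices made for illustration rather than anything prescribed by the text.

```python
import math


def flow(p, t):
    # Flow of the assumed vector field zeta(x) = -x.
    return tuple(math.exp(-t) * x for x in p)


def lyap(p):
    # Candidate Lyapunov function L(x) = |x|^2 for A = {0}.
    return sum(x * x for x in p)


def zeta_derivative(func, p, h=1e-6):
    # Central difference approximation to d/dt func(flow(p, t)) at t = 0.
    return (func(flow(p, h)) - func(flow(p, -h))) / (2.0 * h)
```

Here L vanishes exactly at the origin, the ζ-derivative is −2‖p‖², which is negative away from the origin, and the sublevel set where L ≤ ε is the closed ball of radius √ε, so it fits inside any given neighborhood of the origin once ε is small. These are conditions (a), (b) and (c).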

The existence of a Lyapunov function implies quite a bit. A set A ⊂ M is invariant if A × [0, ∞) ⊂ W and Φ(p, t) ∈ A for all p ∈ A and t ≥ 0. The ω-limit set of p ∈ M is

$$\bigcap_{t_0 \ge 0} \overline{\{\, \Phi(p, t) : t \ge t_0 \,\}}.$$

The domain of attraction of A is

D(A) = { p ∈ M : the ω-limit set of p is nonempty and contained in A }.

A set A ⊂M is asymptotically stable if:

(a) A is compact;

(b) A is invariant;

(c) D(A) is a neighborhood of A;

(d) for every neighborhood $\tilde U$ of A there is a neighborhood U such that $\Phi(p, t) \in \tilde U$ for all p ∈ U and t ≥ 0.

Asymptotic stability is a local property, in the sense that A is asymptotically stable if and only if it is asymptotically stable for the restriction of ζ to any given neighborhood of A; this is mostly automatic, but to verify (c) for the restriction one needs to combine (c) and (d) for the given vector field.

Theorem 15.4.1 (Lyapunov (1992)). If A is compact and L is a Lyapunov function for A, then A is asymptotically stable.

Proof. If L(Φ(p, t)) > 0 for some p ∈ A and t > 0, the mean value theorem would give a t′ ∈ [0, t] with

$$0 < \frac{d}{dt} L(\Phi(p, t))\Big|_{t=t'} = \frac{d}{dt} L(\Phi(\Phi(p, t'), t))\Big|_{t=0},$$

contrary to (b). Therefore $A = L^{-1}(0)$ is invariant.


Let K be a compact neighborhood of A, choose ε > 0 such that $L^{-1}([0, \varepsilon]) \subset K$, and consider a point $p \in L^{-1}([0, \varepsilon])$. Since L(Φ(p, t)) is weakly decreasing,

$$\{\, \Phi(p, t) : t \ge 0 \,\} \subset L^{-1}([0, \varepsilon]) \subset K,$$

so the ω-limit of p is a subset of K. Since K is compact and the ω-limit of p is by definition the intersection of nested nonempty closed subsets, the ω-limit is nonempty. To show that the ω-limit of p is contained in A, consider some q ∉ A, and fix a t > 0. Since L is continuous there are neighborhoods $V_0$ of q and $V_t$ of Φ(q, t) such that L(q′) > L(q′′) for all $q' \in V_0$ and $q'' \in V_t$. Since Φ is continuous, we can choose $V_0$ small enough that $\Phi(q', t) \in V_t$ for all $q' \in V_0$. The significance of this is that if the trajectory of p ever entered $V_0$ it would continue to $V_t$, and it could not then return to $V_0$ because L(Φ(p, t)) is a decreasing function of t, so q is not in the ω-limit of p.

We have shown that $L^{-1}([0, \varepsilon)) \subset D(A)$, so D(A) is a neighborhood of A. Now consider a neighborhood $\tilde U$ of A. We want a neighborhood U such that $\Phi(U, t) \subset \tilde U$ for all t ≥ 0, and it suffices to set $U = L^{-1}([0, \delta))$ for some δ > 0 such that $L^{-1}([0, \delta)) \subset \tilde U$. If there was no such δ there would be a sequence $\{p_n\}$ in $L^{-1}([0, \varepsilon]) \setminus \tilde U$ with $L(p_n) \to 0$. Since this sequence would be contained in K, it would have limit points, which would be in A, by (a), but also in $K \setminus \tilde U$. Of course this is impossible.

15.5 The Converse Lyapunov Problem

A converse Lyapunov theorem is a result asserting that if a set is asymptotically stable, then there is a Lyapunov function defined on a neighborhood of the set. The history of converse Lyapunov theorems is sketched by Nadzieja (1990). Briefly, after several partial results, the problem was completely solved by Wilson (1969), who showed that one could require the Lyapunov function to be $C^\infty$ when the given manifold is $C^\infty$. Since we do not need such a refined result, we will follow the simpler treatment given by Nadzieja (1990).

Let M, ζ, W, and Φ be as in the last section. This section’s goal is:

Theorem 15.5.1. If A is asymptotically stable, then (after replacing M with a suitable neighborhood of A) there is a Lyapunov function for A.

The construction requires that the vector field be complete, and that certain other conditions hold, so we begin by explaining how the desired situation can be achieved on some neighborhood of A. Let U ⊂ D(A) be an open neighborhood of A whose closure (as a subset of $\mathbb{R}^k$) is contained in M. For any metric on M (e.g., the one induced by the inclusion in $\mathbb{R}^k$) the infimum of the distance from a point p ∈ U to a point in M \ U is a positive continuous function on U, so Proposition 10.2.7 implies that there is a $C^r$ function α : U → (0, ∞) such that for each p ∈ U, 1/α(p) is less than the distance from p to any point in M \ U. Let $\tilde M$ be the graph of α:

$$\tilde M = \{\, (p, \alpha(p)) : p \in U \,\} \subset U \times \mathbb{R} \subset \mathbb{R}^{k+1}.$$

The closed subsets of $\tilde M$ are the subsets that are closed in $\mathbb{R}^{k+1}$:


Lemma 15.5.2. $\tilde M$ is a closed subset of $\mathbb{R}^{k+1}$.

Proof. Suppose a sequence $\{(p_n, h_n)\}$ in $\tilde M$ converges to (p, h). Then p ∈ M, and it must be in U because otherwise $h_n = \alpha(p_n) \to \infty$. Continuity implies that h = α(p), so $(p, h) \in \tilde M$.

Using the map $\mathrm{Id}_M \times \alpha : U \to \tilde M$, we define a transformed vector field:

$$\tilde\zeta_{(p, \alpha(p))} = D(\mathrm{Id}_M \times \alpha)(p)\,\zeta_p.$$

Since $\mathrm{Id}_M \times \alpha$ is $C^r$, Lemma 15.2.4 implies that $\tilde\zeta$ is a locally Lipschitz vector field on $\tilde M$. Let $\tilde\Phi$ be the flow of $\tilde\zeta$. Using the chain rule, it is easy to show that

$$\tilde\Phi((p, h), t) = \bigl(\Phi(p, t), \alpha(\Phi(p, t))\bigr)$$

for all (p, t) in the flow domain of ζ. Since asymptotic stability is a local property, $\tilde A = \{\, (p, \alpha(p)) : p \in A \,\}$ is asymptotically stable for $\tilde\zeta$.

We now wish to slow the dynamics, to prevent trajectories from going to ∞ in finite time. Another application of Proposition 10.2.7 gives a $C^r$ function $\beta : \tilde M \to (0, \infty)$ with $\beta(p, h) < 1/\|\tilde\zeta(p, h)\|$ for all $(p, h) \in \tilde M$. Define a vector field $\hat\zeta$ on $\tilde M$ by setting

$$\hat\zeta(p, h) = \beta(p, h)\,\tilde\zeta(p, h),$$

and let $\hat\Phi$ be the flow of $\hat\zeta$. For (p, t) such that (p, α(p), t) is in the flow domain of $\hat\zeta$ let

$$B(p, t) = \int_0^t \beta(\hat\Phi(p, \alpha(p), s))\, ds.$$

The chain rule computation

$$\frac{d}{dt}\bigl[\tilde\Phi(p, \alpha(p), B(p, t))\bigr] = \beta(\hat\Phi(p, \alpha(p), t))\,\tilde\zeta_{\tilde\Phi(p, \alpha(p), B(p, t))}$$

shows that $t \mapsto \tilde\Phi(p, \alpha(p), B(p, t))$ is a trajectory for $\hat\zeta$, so

$$\hat\Phi(p, \alpha(p), t) = \tilde\Phi(p, \alpha(p), B(p, t)).$$

This has two important consequences. The first is that the speed of a trajectory of $\hat\zeta$ is never greater than one, so the final component of $\hat\Phi(p, \alpha(p), t)$ cannot go to ∞ in finite (forward or backward) time. In view of our remarks at the end of Section 15.2, $\hat\zeta$ is complete. The second point is that since β is bounded below on any compact set, if $\{\, \tilde\Phi(p, \alpha(p), t) : t \ge 0 \,\}$ is bounded, then $\hat\Phi(p, \alpha(p), \cdot)$ traverses the entire trajectory of $\tilde\zeta$ beginning at (p, α(p)). It follows that $\tilde A$ is asymptotically stable for $\hat\zeta$. Note that if $\hat L$ is a Lyapunov function for $\hat\zeta$ and $\tilde A$, then it is also a Lyapunov function for $\tilde\zeta$ and $\tilde A$, and setting $L(p) = \hat L(p, \alpha(p))$ gives a Lyapunov function for $\zeta|_U$ and A. Therefore it suffices to establish the claim with M and ζ replaced by $\tilde M$ and $\hat\zeta$.

The upshot of the discussion to this point is as follows. We may assume that ζ is complete, and that the domain of attraction of A is all of M. We may also assume that M has a metric d that is complete (that is, any Cauchy sequence converges), so a sequence $\{p_n\}$ that is eventually outside of each compact subset of M diverges in the sense that $d(p, p_n) \to \infty$ for any p ∈ M.

The next four results are technical preparations for the main argument.


Lemma 15.5.3. Let K be a compact subset of M. For any neighborhood U of A there is a neighborhood V of K and a number T such that Φ(p, t) ∈ U whenever p ∈ V and t ≥ T.

Proof. The asymptotic stability of A implies that A has a neighborhood $\tilde U$ such that $\Phi(\tilde U, t) \subset U$ for all t ≥ 0. The domain of attraction of A is all of M, so for each p ∈ K there is $t_p$ such that $\Phi(p, t_p) \in \tilde U$, and the continuity of Φ implies that $\Phi(p', t_p) \in \tilde U$ for all p′ in some neighborhood of p. Since K is compact, it has a finite open cover $V_1, \ldots, V_k$ such that for each i there is some $t_i$ such that Φ(p, t) ∈ U whenever $p \in V_i$ and $t \ge t_i$. Set $V = V_1 \cup \ldots \cup V_k$ and $T = \max\{t_1, \ldots, t_k\}$.

Lemma 15.5.4. If $\{(p_n, t_n)\}$ is a sequence in W = M × R such that the closure of $\{p_n\}$ does not intersect A, and $\{\Phi(p_n, t_n)\}$ is bounded, then the sequence $\{t_n\}$ is bounded below.

Proof. Let U be a neighborhood of A that does not contain any element of $\{p_n\}$. Since $\{\Phi(p_n, t_n)\}$ is bounded, it is contained in a compact set, so the last result gives a T such that $\Phi(\Phi(p_n, t_n), t) = \Phi(p_n, t_n + t) \in U$ for all t ≥ T. For all n we have $t_n > -T$ because otherwise $p_n = \Phi(p_n, 0) \in U$.

Lemma 15.5.5. For all p ∈ M \ A, d(Φ(p, t), p) → ∞ as t → −∞.

Proof. Otherwise there is a p and a sequence $\{t_n\}$ with $t_n \to -\infty$ such that $\{\Phi(p, t_n)\}$ is bounded and consequently contained in a compact set. The last result implies that this is impossible.

Let ℓ : M → [0, ∞) be the function

$$\ell(p) = \inf_{t \le 0} d(\Phi(p, t), A).$$

If p ∈ A, then ℓ(p) = 0. If p ∉ A, then Φ(p, t) ∉ A for all t ≤ 0 because A is invariant, and the last result implies that ℓ(p) > 0.

Lemma 15.5.6. ℓ is continuous.

Proof. Since ℓ(p) ≤ d(p, A), ℓ is continuous at points in A. Suppose that $\{p_n\}$ is a sequence converging to a point p ∉ A. The last result implies that there are t ≤ 0 and $t_n \le 0$ for each n such that ℓ(p) = d(Φ(p, t), A) and $\ell(p_n) = d(\Phi(p_n, t_n), A)$. The continuity of Φ and d gives

$$\limsup_n \ell(p_n) \le \limsup_n d(\Phi(p_n, t), A) = d(\Phi(p, t), A) = \ell(p).$$

On the other hand $d(\Phi(p_n, t_n), A) \le d(p_n, A)$, so the sequence $\Phi(p_n, t_n)$ is bounded, and Lemma 15.5.4 implies that $\{t_n\}$ is bounded below. Passing to a subsequence, we may suppose that $t_n \to t'$, so that

$$\ell(p) \le d(\Phi(p, t'), A) = \lim_n d(\Phi(p_n, t_n), A) = \liminf_n \ell(p_n).$$

Thus $\ell(p_n) \to \ell(p)$.


We are now ready for the main construction. Let L : M → [0, ∞) be defined by

$$L(p) = \int_0^\infty \ell(\Phi(p, s)) \exp(-s)\, ds.$$

The rest of the argument verifies that L is, in fact, a Lyapunov function.

Since A is invariant, L(p) = 0 if p ∈ A. If p ∉ A, then L(p) > 0 because ℓ(p) > 0.

To show that L is continuous at an arbitrary p ∈ M we observe that for any ε > 0 there is a T such that ℓ(Φ(p, T)) < ε/2. Since Φ is continuous we have ℓ(Φ(p′, T)) < ε/2 and |ℓ(Φ(p′, t)) − ℓ(Φ(p, t))| < ε/2 for all p′ in some neighborhood of p and all t ∈ [0, T], so that

$$|L(p') - L(p)| \le \int_0^T \bigl|\ell(\Phi(p', s)) - \ell(\Phi(p, s))\bigr| \exp(-s)\, ds + \Bigl|\int_T^\infty \ell(\Phi(p', s)) \exp(-s)\, ds - \int_T^\infty \ell(\Phi(p, s)) \exp(-s)\, ds\Bigr| < \varepsilon$$

for all p′ in this neighborhood.

To show that L is ζ-differentiable, and to compute its derivative, we observe that

$$L(\Phi(p, t)) = \int_0^\infty \ell(\Phi(p, t + s)) \exp(-s)\, ds = \exp(t) \int_t^\infty \ell(\Phi(p, s)) \exp(-s)\, ds,$$

so that

$$L(\Phi(p, t)) - L(p) = (\exp(t) - 1) \int_t^\infty \ell(\Phi(p, s)) \exp(-s)\, ds - \int_0^t \ell(\Phi(p, s)) \exp(-s)\, ds.$$

Dividing by t and taking the limit as t → 0 gives

$$\zeta L(p) = L(p) - \ell(p).$$

Note that

$$L(p) < \ell(p) \int_0^\infty \exp(-s)\, ds = \ell(p)$$

because ℓ(Φ(p, ·)) is weakly decreasing with $\lim_{t \to \infty} \ell(\Phi(p, t)) = 0$. Therefore ζL(p) < 0 when p ∉ A.
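The construction can be carried out by hand in the simplest case, which gives a quick sanity check on the formula for the ζ-derivative. The sketch below assumes ζ(x) = −x on the real line with A = {0}, so that Φ(p, t) = e^{−t}p, ℓ(p) = |p| (the infimum over t ≤ 0 is attained at t = 0), and L(p) = |p|/2. The example, the truncated trapezoid rule, and the function names are illustrative assumptions.

```python
import math


def flow(p, t):
    # Flow of the assumed vector field zeta(x) = -x on the real line.
    return math.exp(-t) * p


def ell(p):
    # inf over t <= 0 of |flow(p, t)|; backward trajectories move away
    # from the origin, so the infimum is attained at t = 0.
    return abs(p)


def lyap(p, horizon=40.0, steps=40000):
    # Trapezoid rule for the integral of ell(flow(p, s)) * exp(-s) over
    # [0, horizon]; the tail beyond the horizon is negligible here.
    h = horizon / steps
    total = 0.0
    for k in range(steps + 1):
        s = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * ell(flow(p, s)) * math.exp(-s)
    return total * h


def zeta_derivative(func, p, dt=1e-4):
    # Central difference approximation to d/dt func(flow(p, t)) at t = 0.
    return (func(flow(p, dt)) - func(flow(p, -dt))) / (2.0 * dt)
```

For p = 1.5 the numerical value of L is close to 0.75, and the finite difference derivative is close to L(p) − ℓ(p) = −0.75, which is negative as required of a Lyapunov function.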

We need one more technical result.

Lemma 15.5.7. If $\{(p_n, t_n)\}$ is a sequence such that $d(p_n, A) \to \infty$ and there is a number T such that $t_n < T$ for all n, then $d(\Phi(p_n, t_n), A) \to \infty$.

Proof. Suppose not. After passing to a subsequence there is a B > 0 such that $d(\Phi(p_n, t_n), A) < B$ for all n, so the sequence $\{\Phi(p_n, t_n)\}$ is contained in a compact set K. Since the domain of attraction of A is all of M, Φ is continuous, and K is compact, for any ε > 0 there is some S such that d(Φ(p, t), A) < ε whenever p ∈ K and t > S. The function $(p, t) \mapsto d(\Phi(p, t), A)$ is continuous, hence bounded on the compact set K × [−T, S], so it is bounded on all of K × [−T, ∞). But this is impossible because $-t_n > -T$ and

$$d(\Phi(\Phi(p_n, t_n), -t_n), A) = d(p_n, A) \to \infty.$$


It remains to show that if U is open and contains A, then there is an ε > 0 such that $L^{-1}([0, \varepsilon]) \subset U$. The alternative is that there is some sequence $\{p_n\}$ in M \ U with $L(p_n) \to 0$. Since L is continuous and positive on M \ U, the sequence must eventually be outside any compact set. For each n we can choose $t_n \le 1$ such that $\ell(\Phi(p_n, 1)) = d(\Phi(p_n, t_n), A)$, and the last result implies that $\ell(\Phi(p_n, 1)) \to \infty$, so

$$L(p_n) \ge \int_0^1 \ell(\Phi(p_n, t)) \exp(-t)\, dt \ge \ell(\Phi(p_n, 1)) \int_0^1 \exp(-t)\, dt \to \infty.$$

This contradiction completes the proof that L is a Lyapunov function, so the proof of Theorem 15.5.1 is complete.

15.6 A Necessary Condition for Stability

This section establishes the relationship between asymptotic stability and the vector field index. Let M, ζ, and Φ be as before. If A is a compact set of equilibria for ζ that has a compact index admissible neighborhood C that contains no other equilibria of ζ, then $\operatorname{ind}(\zeta|_C)$ is the same for all such C; we denote this common value of the index by $\operatorname{ind}_\zeta(A)$.

Theorem 15.6.1. If A is an ANR that is asymptotically stable, then

$$\operatorname{ind}_{-\zeta}(A) = \chi(A).$$

Proof. From the last section we know that (after restricting to some neighborhood of A) there is a Lyapunov function L for ζ. For some ε > 0, $A_\varepsilon = L^{-1}([0, \varepsilon])$ is compact. Using the flow, it is not hard to show that $A_\varepsilon$ is a retract of $A_{\varepsilon'}$ for some ε′ > ε, and that $A_{\varepsilon'}$ is a neighborhood of $A_\varepsilon$, so $A_\varepsilon$ is an ANR. For each t > 0, $\Phi(\cdot, t)|_{A_\varepsilon}$ maps $A_\varepsilon$ to itself, and is homotopic to the identity, so

$$\chi(A_\varepsilon) = \Lambda(\Phi(\cdot, t)|_{A_\varepsilon}) = (-1)^m \operatorname{ind}(\zeta|_{A_\varepsilon}).$$

Since A is an ANR, there is a retraction r : C → A, where C is a neighborhood of A. By taking ε small we may ensure that $A_\varepsilon \subset C$, and we may then replace C with $A_\varepsilon$, so we may assume the domain of r is actually $A_\varepsilon$. If i : A → C is the inclusion, then Commutativity gives

$$\chi(A) = \Lambda(r \circ i) = \Lambda(i \circ r) = \Lambda(r),$$

so it suffices to show that if t > 0, then

$$\Lambda(\Phi(\cdot, t)|_C) = \Lambda(r).$$

Let W ⊂ M × M be a neighborhood of the diagonal for which there is a “convex combination” function c : W × [0, 1] → M as per Proposition 10.7.9. We claim that if T is sufficiently large, then there is an index admissible homotopy $h : A_\varepsilon \times [0, 1] \to A_\varepsilon$ between $\mathrm{Id}_{A_\varepsilon}$ and r given by

$$h(p, t) = \begin{cases} \Phi(p, 3tT), & 0 \le t \le \tfrac{1}{3}, \\ c\bigl((\Phi(p, T), r(\Phi(p, T))), 3(t - \tfrac{1}{3})\bigr), & \tfrac{1}{3} \le t \le \tfrac{2}{3}, \\ r(\Phi(p, 3(1 - t)T)), & \tfrac{2}{3} \le t \le 1. \end{cases}$$

This works because there is some neighborhood U of A such that c((p, r(p)), t) is defined and in the interior of $A_\varepsilon$ for all p ∈ U and all 0 ≤ t ≤ 1, and $\Phi(A_\varepsilon, T) \subset U$ if T is sufficiently large.

The following special case is a prominent result in the theory of dynamical systems.

Corollary 15.6.2 (Krasnosel’ski and Zabreiko (1984)). If $\{p_0\}$ is asymptotically stable, then

$$\operatorname{ind}_{-\zeta}(\{p_0\}) = 1.$$

Physical equilibrium concepts are usually rest points of explicit dynamical systems, for which the notion of stability is easily understood. For economic models, dynamic adjustment to equilibrium is a concept that goes back to Walras’ notion of tatonnement, but such adjustment is conceptually problematic. If there is gradual adjustment of prices, or gradual adjustment of mixed strategies, and the agents understand and expect this, then instead of conforming to such dynamics the agents will exploit and undermine them. For this reason there are, to a rough approximation, no accepted theoretical foundations for a prediction that an economic or strategic equilibrium is dynamically stable.

Paul Samuelson (1941, 1942, 1947) advocated a correspondence principle, according to which dynamical stability of an equilibrium has implications for the qualitative properties of the equilibrium’s comparative statics. Samuelson’s writings consider many particular models, but he never formulated the correspondence principle as a precise and general theorem, and the economics profession’s understanding of it has languished, being largely restricted to 1-dimensional cases; see Echenique (2008) for a succinct summary. However, it is possible to pass quickly from the Krasnosel’ski-Zabreiko theorem to a general formulation of the correspondence principle, as we now explain.

Let $U \subset \mathbb{R}^m$ be open, let P be a space of parameter values that is an open subset of $\mathbb{R}^n$, and let $z : U \times P \to \mathbb{R}^m$ be a $C^1$ function that we understand as a parameterized vector field. (Working in a Euclidean setting allows us to avoid discussing differentiation of vector fields on manifolds, which is a very substantial topic.) For (x, α) ∈ U × P let $\partial_x z(x, \alpha)$ and $\partial_\alpha z(x, \alpha)$ denote the matrices of partial derivatives of the components of z with respect to the components of x and α respectively.

We consider a point $(x_0, \alpha_0)$ with $z(x_0, \alpha_0) = 0$ such that $\partial_x z(x_0, \alpha_0)$ is nonsingular. The implicit function theorem implies that there is a neighborhood V of $\alpha_0$ and a $C^1$ function σ : V → U such that $\sigma(\alpha_0) = x_0$ and z(σ(α), α) = 0 for all α ∈ V. The method of comparative statics is to differentiate this equation with respect to α, using the chain rule, then rearrange, arriving at

$$\frac{d\sigma}{d\alpha}(\alpha_0) = -\partial_x z(x_0, \alpha_0)^{-1} \cdot \partial_\alpha z(x_0, \alpha_0).$$

The last result implies that if $\{x_0\}$ is asymptotically stable for the vector field $z(\cdot, \alpha_0)$, then the determinant of $-\partial_x z(x_0, \alpha_0)$ is positive, as is the determinant of its inverse. When m = 1 this says that the vector $\frac{d\sigma}{d\alpha}(\alpha_0)$ is a positive scalar multiple of $\partial_\alpha z(x_0, \alpha_0)$. When m > 1 it says that the transformation mapping $\partial_\alpha z(x_0, \alpha_0)$ to $\frac{d\sigma}{d\alpha}(\alpha_0)$ is orientation preserving, which is still a qualitative property of the comparative statics, though of course its intuitive and conceptual significance is less immediate. (It is sometimes argued, e.g., pp. 320-1 of Arrow and Hahn (1971), that the correspondence principle has no consequences beyond the 1-dimensional case, but this does not seem quite right. In higher dimensions it still provides a qualitative restriction on the comparative statics. It is true that the restriction provides only one bit of information, so by itself it is unlikely to be useful, but one should still expect the correspondence principle to have some useful consequences in particular models, in combination with various auxiliary hypotheses.)
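The comparative statics formula and its sign in the case m = 1 can be checked on a small example. The sketch below assumes the parameterized vector field z(x, α) = α − x − x³ with m = n = 1, for which the derivative of z in x is negative everywhere, so each equilibrium is asymptotically stable. The model, the Newton solver, and all names are assumptions chosen for illustration and are not from the text.

```python
def z(x, a):
    # Assumed parameterized vector field on the real line.
    return a - x - x ** 3


def dz_dx(x, a):
    return -1.0 - 3.0 * x * x


def dz_da(x, a):
    return 1.0


def sigma(a, x0=0.0, iters=50):
    # Newton's method for the equilibrium x solving z(x, a) = 0.
    x = x0
    for _ in range(iters):
        x -= z(x, a) / dz_dx(x, a)
    return x


def dsigma_formula(a):
    # Comparative statics formula: -(dz/dx)^{-1} * dz/da at equilibrium.
    x = sigma(a)
    return -dz_da(x, a) / dz_dx(x, a)


def dsigma_numeric(a, h=1e-5):
    # Central difference approximation to the derivative of sigma.
    return (sigma(a + h) - sigma(a - h)) / (2.0 * h)
```

At α = 2 the equilibrium is x = 1, the formula gives dσ/dα = 1/4, and the finite difference estimate agrees. The derivative is a positive multiple of the partial derivative of z in α, which is the one-dimensional statement of the correspondence principle.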

We conclude with some comments on the status of the correspondence principle as a foundational element of economic analysis. First of all, the fact that our current understanding of adjustment to equilibrium gives little reason to expect an equilibrium to be stable is of limited relevance, because in the correspondence principle stability is an hypothesis, not a conclusion. That is, we observe an equilibrium that persists over time, and is consequently stable with respect to whatever mechanism brings about reequilibration after small disturbances. This is given.

In general equilibrium theory and noncooperative game theory, and in a multitude of particular economic models, equilibrium is implicitly defined as a rest point of some process according to which, in response to a failure of an equilibrium condition, some agent would change her behavior in pursuit of higher utility. Such a definition brings with it some sense of “natural” dynamics, e.g., the various prices each adjusting in the direction of excess demand, or each agent adjusting her mixed strategy in some direction that would be improving if others were not also adjusting. The Krasnosel’ski-Zabreiko theorem will typically imply that if $\operatorname{ind}_{-z(\cdot, \alpha_0)}(\{x_0\}) \ne 1$, then $x_0$ is not stable for any dynamic process that is natural in this sense. Logically, this leaves open the possibility that the actual dynamic process is unnatural, which seemingly requires some sort of coordination on the part of the various agents, or perhaps that it is much more complicated than we are imagining. Almost certainly most economists would regard these possibilities as far-fetched. In this sense the correspondence principle is not less reliable or well founded than other basic principles of our imprecise and uncertain science.

Bibliography

Alexander, J. W. (1924). An example of a simply-connected surface bounding a region which is not simply-connected. Proceedings of the National Academy of Sciences, 10:8–10.

Arora, S. and Barak, B. (2007). Computational Complexity: A Modern Approach. Cambridge University Press, Cambridge.

Arrow, K., Block, H. D., and Hurwicz, L. (1959). On the stability of the competitive equilibrium, II. Econometrica, 27:82–109.

Arrow, K. and Debreu, G. (1954). Existence of an equilibrium for a competitive economy. Econometrica, 22:265–290.

Arrow, K. and Hurwicz, L. (1958). On the stability of the competitive equilibrium, I. Econometrica, 26:522–552.

Arrow, K. J. and Hahn, F. H. (1971). General Competitive Analysis. Holden Day, San Francisco.

Bollobas, B. (1979). Graph Theory: an Introductory Course. Springer-Verlag, New York.

Border, K. C. (1985). Fixed point theorems with applications to economics and game theory. Cambridge University Press, Cambridge.

Brouwer, L. E. J. (1912). Über Abbildung von Mannigfaltigkeiten. Mathematische Annalen, 71:97–115.

Browder, F. (1948). The Topological Fixed Point Theory and its Applications to Functional Analysis. PhD thesis, Princeton University.

Brown, R. (1971). The Lefschetz Fixed Point Theorem. Scott Foresman and Co., Glenview, IL.

Chen, X. and Deng, X. (2006a). On the complexity of 2D discrete fixed point problem. In Proceedings of the 33rd International Colloquium on Automata, Languages, and Programming, pages 489–500.

Chen, X. and Deng, X. (2006b). Settling the complexity of two-player Nash equilibrium. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, pages 261–272.


Daskalakis, C., Goldberg, P., and Papadimitriou, C. (2006). The complexity of computing a Nash equilibrium. In Proceedings of the 38th ACM Symposium on the Theory of Computing.

Debreu, G. (1959). Theory of Value: An Axiomatic Analysis of Economic Equilibrium. Wiley & Sons, Inc., New York.

Demichelis, S. and Germano, F. (2000). On the indices of zeros of Nash fields. Journal of Economic Theory, 94:192–217.

Demichelis, S. and Ritzberger, K. (2003). From evolutionary to strategic stability. Journal of Economic Theory, 113:51–75.

Dierker, E. (1972). Two remarks on the number of equilibria of an economy. Econometrica, 40:951–953.

Dugundji, J. (1951). An extension of Tietze’s theorem. Pacific Journal of Mathematics, 1:353–367.

Dugundji, J. and Granas, A. (2003). Fixed Point Theory. Springer-Verlag, New York.

Echenique, F. (2008). The correspondence principle. In Durlauf, S. and Blume, L., editors, The New Palgrave Dictionary of Economics (Second Edition). Palgrave Macmillan, New York.

Eilenberg, S. and Montgomery, D. (1946). Fixed-point theorems for multivalued transformations. American Journal of Mathematics, 68:214–222.

Fan, K. (1952). Fixed point and minimax theorems in locally convex linear spaces. Proceedings of the National Academy of Sciences, 38:121–126.

Federer, H. (1969). Geometric Measure Theory. Springer, New York.

Fort, M. (1950). Essential and nonessential fixed points. American Journal of Mathematics, 72:315–322.

Glicksberg, I. (1952). A further generalization of the Kakutani fixed point theorem with applications to Nash equilibrium. Proceedings of the American Mathematical Society, 3:170–174.

Goldberg, P., Papadimitriou, C., and Savani, R. (2011). The complexity of the homotopy method, equilibrium selection, and Lemke-Howson solutions. In Proceedings of the 52nd Annual IEEE Symposium on the Foundations of Computer Science.

Govindan, S. and Wilson, R. (2008). Nash equilibrium, refinements of. In Durlauf, S. and Blume, L., editors, The New Palgrave Dictionary of Economics (Second Edition). Palgrave Macmillan, New York.


Guillemin, V. and Pollack, A. (1974). Differential Topology. Springer-Verlag, New York.

Hart, O. and Kuhn, H. (1975). A proof of the existence of equilibrium without the free disposal assumption. J. of Mathematical Economics, 2:335–343.

Hauk, E. and Hurkens, S. (2002). On forward induction and evolutionary and strategic stability. Journal of Economic Theory, 106:66–90.

Hirsch, M. (1976). Differential Topology. Springer-Verlag, New York.

Hirsch, M., Papadimitriou, C., and Vavasis, S. (1989). Exponential lower bounds for finding Brouwer fixed points. Journal of Complexity, 5:379–416.

Hirsch, M. and Smale, S. (1974). Differential Equations, Dynamical Systems, and Linear Algebra. Academic Press, Orlando.

Hofbauer, J. (1990). An index theorem for dissipative semiflows. Rocky Mountain Journal of Mathematics, 20:1017–1031.

Hopf, H. (1928). A new proof of the Lefschetz formula on invariant points. Proceedings of the National Academy of Sciences, USA, 14:149–153.

Jacobson, N. (1953). Lectures in Abstract Algebra. D. Van Nostrand Inc., Princeton.

Jiang, J.-h. (1963). Essential component of the set of fixed points of the multivalued mappings and its application to the theory of games. Scientia Sinica, 12:951–964.

Jordan, J. S. (1987). The informational requirement of local stability in decentralized allocation mechanisms. In Groves, T., Radner, R., and Reiter, S., editors, Information, Incentives, and Economic Mechanisms: Essays in Honor of Leonid Hurwicz, pages 183–212. University of Minnesota Press, Minneapolis.

Kakutani, S. (1941). A generalization of Brouwer’s fixed point theorem. Duke Mathematical Journal, 8:457–459.

Karmarkar, N. (1984). A new polynomial-time algorithm for linear programming. In Proceedings of the 16th ACM Symposium on Theory of Computing, STOC ’84, pages 302–311, New York, NY, USA. ACM.

Kelley, J. (1955). General Topology. Springer Verlag, New York.

Khachian, L. (1979). A polynomial algorithm in linear programming. Soviet Mathematics Doklady, 20:191–194.

Kinoshita, S. (1952). On essential components of the set of fixed points. Osaka Mathematical Journal, 4:19–22.

Kinoshita, S. (1953). On some contractible continua without the fixed point property. Fundamenta Mathematicae, 40:96–98.


Klee, V. and Minty, G. (1972). How good is the simplex algorithm? In Sisha, O.,editor, Inequalities III. Academic Press, New York.

Kohlberg, E. and Mertens, J.-F. (1986). On the strategic stability of equilibria.Econometrica, 54:1003–1038.

Krasnosel’ski, M. A. and Zabreiko, P. P. (1984). Geometric Methods of Nonlinear Analysis. Springer-Verlag, Berlin.

Kreps, D. and Wilson, R. (1982). Sequential equilibrium. Econometrica, 50:863–894.

Kuhn, H. and MacKinnon, J. (1975). Sandwich method for finding fixed points. Journal of Optimization Theory and Applications, 17:189–204.

Kuratowski, K. (1935). Quelques problèmes concernant les espaces métriques non-séparables. Fundamenta Mathematicae, 25:534–545.

Lefschetz, S. (1923). Continuous transformations of manifolds. Proceedings of the National Academy of Sciences, 9:90–93.

Lefschetz, S. (1926). Intersections and transformations of complexes and manifolds. Transactions of the American Mathematical Society, 28:1–49.

Lefschetz, S. (1927). Manifolds with a boundary and their transformations. Transactions of the American Mathematical Society, 29:429–462.

Lyapunov, A. (1992). The General Problem of the Stability of Motion. Taylor and Francis, London.

Mas-Colell, A. (1974). A note on a theorem of F. Browder. Mathematical Programming, 6:229–233.

McLennan, A. (1991). Approximation of contractible valued correspondences by functions. Journal of Mathematical Economics, 20:591–598.

McLennan, A. and Tourky, R. (2010). Imitation games and computation. Games and Economic Behavior, 70:4–11.

Merrill, O. (1972). Applications and Extensions of an Algorithm that Computes Fixed Points of Certain Upper Semi-continuous Point to Set Mappings. PhD thesis, University of Michigan, Ann Arbor, MI.

Mertens, J.-F. (1989). Stable equilibria—a reformulation, part I: Definition and basic properties. Mathematics of Operations Research, 14:575–625.

Mertens, J.-F. (1991). Stable equilibria—a reformulation, part II: Discussion of the definition and further results. Mathematics of Operations Research, 16:694–753.

Michael, E. (1951). Topologies on spaces of subsets. Transactions of the American Mathematical Society, 71:152–182.

Milnor, J. (1965). Topology from the Differentiable Viewpoint. University Press of Virginia, Charlottesville.

Morgan, F. (1988). Geometric Measure Theory: A Beginner’s Guide. Academic Press, New York.

Myerson, R. (1978). Refinements of the Nash equilibrium concept. International Journal of Game Theory, 7:73–80.

Nadzieja, T. (1990). Construction of a smooth Lyapunov function for an asymptotically stable set. Czechoslovak Mathematical Journal, 40:195–199.

Nash, J. (1950). Non-cooperative Games. PhD thesis, Mathematics Department, Princeton University.

Nash, J. (1951). Non-cooperative games. Annals of Mathematics, 54:286–295.

Papadimitriou, C. H. (1994a). Computational Complexity. Addison Wesley Longman, New York.

Papadimitriou, C. H. (1994b). On the complexity of the parity argument and other inefficient proofs of existence. Journal of Computer and System Sciences, 48:498–532.

Ritzberger, K. (1994). The theory of normal form games from the differentiable viewpoint. International Journal of Game Theory, 23:201–236.

Rudin, M. E. (1969). A new proof that metric spaces are paracompact. Proc. Amer. Math. Soc., 20:603.

Saari, D. G. (1985). Iterative price mechanisms. Econometrica, 53:1117–1133.

Saari, D. G. and Simon, C. P. (1978). Effective price mechanisms. Econometrica, 46:1097–1125.

Samuelson, P. (1947). Foundations of Economic Analysis. Harvard University Press.

Samuelson, P. A. (1941). The stability of equilibrium: Comparative statics and dynamics. Econometrica, 9:97–120.

Samuelson, P. A. (1942). The stability of equilibrium: Linear and nonlinear systems. Econometrica, 10:1–25.

Savani, R. and von Stengel, B. (2006). Hard-to-solve bimatrix games. Econometrica, 74:397–429.

Scarf, H. (1960). Some examples of global instability of the competitive equilibrium. International Economic Review, 1:157–172.

Selten, R. (1975). Re-examination of the perfectness concept for equilibrium points of extensive games. International Journal of Game Theory, 4:25–55.

Shapley, L. S. (1974). A note on the Lemke-Howson algorithm. Mathematical Programming Study, 1:175–189.

Spivak, M. (1965). Calculus on Manifolds: A Modern Approach to Classical Theorems of Advanced Calculus. Benjamin, New York.

Spivak, M. (1979). A Comprehensive Introduction to Differential Geometry, volume 1. Publish or Perish, 2nd edition.

Sternberg, S. (1983). Lectures on Differential Geometry. Chelsea Publishing Company, New York, 2nd edition.

Stone, A. H. (1948). Paracompactness and product spaces. Bull. Amer. Math. Soc., 54:977–982.

van der Laan, G. and Talman, A. (1979). A restart algorithm for computing fixed points without an extra dimension. Mathematical Programming, 17:74–84.

Vietoris, L. (1923). Bereiche zweiter Ordnung. Monatshefte für Mathematik und Physik, 33:49–62.

von Neumann, J. (1928). Zur Theorie der Gesellschaftsspiele. Mathematische Annalen, 100:295–320.

Williams, S. R. (1985). Necessary and sufficient conditions for the existence of a locally stable message process. Journal of Economic Theory, 35:127–154.

Wilson, F. W. (1969). Smoothing derivatives of functions and applications. Transactions of the American Mathematical Society, 139:413–428.

Wojdyslawski, M. (1939). Rétractes absolus et hyperespaces des continus. Fundamenta Mathematicae, 32:184–192.

Ziegler, G. M. (1995). Lectures on Polytopes. Springer Verlag, New York.

Index

C^r, 126
C^r ∂-embedding, 144
C^r ∂-immersion, 144
C^r atlas, 10
C^r function, 127
C^r manifold, 10, 131
C^r submanifold, 11, 136
Q-robust set, 113
Q-robust set
    minimal, 114
    minimal connected, 114
T_1-space, 66
ω-limit set, 19, 221
∂-parameterization, 144
ε-domination, 17, 104
ε-homotopy, 17, 104
EXP, 61
FNP, 63
NP, 61
PLS (polynomial local search), 64
PPAD, 64
PPA, 65
PPP (polynomial pigeonhole principle), 64
PSPACE, 61
P, 61
TFNP, 63
Clique, 61
EOTL (end of the line), 64
OEOTL (other end of the line), 65

absolute neighborhood retract, 6, 100
absolute retract, 6, 102
acyclic, 34
affine
    combination, 23
    dependence, 23
    hull, 24
    independence, 23
    subspace, 24
Alexander horned sphere, 131
algorithm, 60
ambient space, 10, 133
annulus, 144
antipodal function, 203
antipodal points, 200
approximates, 189
Arrow, Kenneth, 2
asymptotic stability, 20, 221
atlas, 10, 131
axiom of choice, 36

balanced set, 116
Banach space, 90
barycenter, 32
base of a topology, 67
bijection, 6
Bing, R. H., 196
Border, Kim, i
Borsuk, Karol, 18
Borsuk-Ulam theorem, 18, 204
bounding hyperplane, 24
Brouwer’s fixed point theorem, 3
Brouwer, Luitzen, 2
Brown, Robert, i

category, 135
Cauchy sequence, 90
Cauchy-Schwarz inequality, 91
certificate, 61
Church-Turing thesis, 60
closed function, 74
codimension, 24, 136
commutativity configuration, 16, 179
compact-open topology, 83
complete invariant, 194
complete metric space, 90
complete vector field, 216

completely metrizable, 100
component of a graph, 34
computational problem, 60
    complete for a class, 62
    computable, 60
    decision, 61
    search, 61
connected
    graph, 34
    space, 8, 113, 165
continuous, 78
contractible, 5
contraction, 5
converse Lyapunov theorem, 20, 222
convex, 24
    combination, 24
    cone, 25
    hull, 24
coordinate chart, 10, 131
correspondence, 4, 77
    closed valued, 77
    compact valued, 4
    convex valued, 4, 77
    graph of, 77
    lower semicontinuous, 78
    upper semicontinuous, 77
correspondence principle, 19, 212
critical point, 139, 154
critical value, 139, 154
cycle, 34

Debreu, Gerard, 2
degree, 11, 33, 174
degree admissible
    function, 12, 14, 171
    homotopy, 12, 171
Dehn, Max, 149
Demichelis, Stefano, 19
derivative, 126, 134, 135
derivative along a vector field, 20
Descartes, René, 30
deterministic, 212
diameter, 32
diffeomorphism, 11, 133
diffeomorphism point, 136
differentiable, 126
differentiation along a vector field, 221
dimension
    of a polyhedron, 26
    of a polytopal complex, 30
    of an affine subspace, 24
directed graph, 64
discrete set, 131
domain of attraction, 19, 221
domination, 183
dual, 25
Dugundji, James, i, 93

edge, 27, 33
Eilenberg, Samuel, 7, 18
Eilenberg-Montgomery theorem, 196
embedding, 6, 131
endpoint, 33, 42
equilibrium, 19, 21
equilibrium of a vector field, 216
    regular, 218
essential
    fixed point, 7
    Nash equilibrium, 8
    set of fixed points, 8, 112
    set of Nash equilibria, 8
Euclidean neighborhood retract, 6, 99
Euler characteristic, 17, 20, 195
expected payoffs, 37
extension of an index, 183
extraneous solution, 44
extreme point, 29

face, 26
    proper, 27
facet, 27
family of sets
    locally finite, 85
    refinement of, 85
Federer, Herbert, 150
Fermat’s last theorem, 149
fixed point, 3, 4
fixed point property, 3, 6
flow, 19, 216
flow domain, 216
Fort, M. K., 107
four color theorem, 149

Freedman, Michael, 149
Fubini’s theorem, 150
functor, 135

general position, 41
general linear group, 165
Granas, Andrzej, i
graph, 4, 33

half-space, 24
Hauptvermutung, 196
Hausdorff distance, 70
Hausdorff measure zero, 156
Hausdorff space, 67
have the same orientation, 11, 164
Hawaiian earring, 33, 100
Heegaard, Poul, 149
Hilbert cube, 93
Hilbert space, 91
homology, 2, 3, 177, 196
homotopy, 5, 58–59
    class, 5
    extension property, 198
    invariant, 18, 197
    principle, 178
homotopy extension property, 103
Hopf’s theorem, 18, 197–198
Hopf, Heinz, 18, 196
hyperplane, 24

identity component, 165
immersion, 138
immersion point, 136
implicit function theorem, 127
index, 9, 15, 16, 177, 179
index admissible
    correspondence, 15, 177
    homotopy, 178
    vector field, 20
index base, 15, 177
index scope, 15, 178
inessential fixed point, 7
initial point, 27
injection, 6
inner product, 91
inner product space, 91
invariance of domain, 18, 207
invariant, 19
invariant set, 221
inverse function theorem, 127
isometry, 52

Kakutani, Shizuo, 4
Kinoshita, Shin’ichi, 6, 95, 108

labelling, 51
Lefschetz fixed point theorem, 17, 196
Lefschetz number, 17, 196
Lefschetz, Solomon, 17, 196
Lemke-Howson algorithm, 36–49, 62, 64
Lebesgue measure, 150
lineality space, 26
linear complementarity problem, 45
Lipschitz, 212
local diffeomorphism, 138
locally C^r, 130
locally closed set, 98
locally Lipschitz, 212
locally path connected space, 102, 142
lower semicontinuous, 78
Lyapunov function, 221
Lyapunov function for A ⊂ M, 20
Lyapunov theorem, 221
Lyapunov, Aleksandr, 20

manifold, 10
    C^r, 131
manifold with boundary, 13, 144
Mas-Colell, Andreu, 17, 115
maximal, 34
measure theory, 150
measure zero, 151, 157
mesh, 32
Milnor, John, 150, 196
Minkowski sum, 111
Moise, Edwin E., 196
Montgomery, Deane, 7, 18
Morse-Sard theorem, 157
moving frame, 165
multiplicative, 16, 180
Möbius, August Ferdinand, 149

narrowing of focus, 183
Nash equilibrium
    accessible, 43

    mixed, 37
    pure, 37
    refinements of, 8
Nash, John, 2
negatively oriented, 164, 168
negatively oriented relative to P, 169
neighborhood retract, 98
neighbors, 33
nerve of an open cover, 105
no retraction theorem, 100
norm, 90
normal bundle, 140
normal space, 67
normal vector, 24
normed space, 90

opposite orientation, 164, 168
oracle, 62
order of differentiability, 126
ordered basis, 164
orientable, 11, 168
orientation, 163–171
orientation preserving, 11, 53, 169
orientation reversing, 11, 53, 169
orientation reversing loop, 167
oriented ∂-manifold, 168
oriented intersection number, 169
oriented manifold, 11
oriented vector space, 11, 164

paracompact space, 85
parameterization, 10, 131
partition of unity, 86
    C^r, 128
path, 33, 165
path connected space, 164
payoff functions, 37
Perelman, Grigori, 149
Picard-Lindelöf theorem, 212, 215
pivot, 57
pivoting, 48
Poincaré conjecture, 149
Poincaré, Henri, 149
pointed cone, 26
pointed map, 113
pointed space, 113
polyhedral complex, 30
polyhedral subdivision, 30
polyhedron, 26
    minimal representation of, 27
    standard representation of, 27
polytopal complex, 30
polytopal subdivision, 30
polytope, 29
    simple, 46
positively oriented, 11, 164, 168
positively oriented relative to P, 169
predictor-corrector method, 58
prime factorization, 63

quadruple
    edge, 42
    qualified, 42
    vertex, 42
quotient topology, 80

Radó, Tibor, 149, 196
recession cone, 25
reduction, 62
regular fixed point, 9
regular point, 11, 139
regular space, 67
regular value, 11, 139
retract, 6, 97
retraction, 97
Ritzberger, Klaus, i, 19

Samuelson, Paul, 19
Sard’s theorem, 182, 218
Scarf algorithm, 56
Scarf, Herbert, 18
separable, 6
separable metric space, 91
separating hyperplane theorem, 24
set valued mapping, 4
simplex, 31
    accessible completely labelled, 58
    almost completely labelled, 54
    completely labelled, 52
simplicial complex, 31
    abstract, 32
    canonical realization, 32
simplicial subdivision, 31
simply connected, 149

slack variables, 44
slice of a set, 153
Smale, Stephen, 149
smooth, 11, 156
Sperner labelling, 51
star-shaped, 5
Steinitz, Ernst, 196
step size, 58
Sternberg, Shlomo, 150
strategy
    mixed, 37
    pure, 37
    totally mixed, 39
strategy profile
    mixed, 37
    pure, 37
strong topology, 83
strong upper topology, 78
subbase of a topology, 67
subcomplex, 30
submanifold, 11
    neat, 146
submersion, 138
submersion point, 136
subsumes, 183
support of a mixed strategy, 63
surjection, 6

tableau, 47
tangent bundle, 133
tangent space, 10, 133
tâtonnement, 18
Tietze, Heinrich, 196
topological space, 66
topological vector space, 88
    locally convex, 89
torus, 10
trajectory, 212, 213
transition function, 10
translation invariant topology, 88
transversal, 139, 146, 158
tree, 34
triangulation, 31
tubular neighborhood theorem, 140
Turing machine, 59
two person game, 37

Ulam, Stanislaw, 18
uniformly locally contractible metric space, 101
upper semicontinuous, 4, 77
Urysohn’s lemma, 87

van Dyke, Walther, 149
vector bundle, 140
vector field, 19, 158, 213
    along a curve, 165
    index admissible, 216
vector field homotopy, 217
    index admissible, 217
vector field index, 216
vertex, 27, 32
vertices, 33
    connected, 34
Vietoris topology, 68
Vietoris, Leopold, 66
von Neumann, John, 4
Voronoi diagram, 30

walk, 33
weak topology, 83
weak upper topology, 80
well ordering, 85
well ordering theorem, 85
Whitney embedding theorems, 133
Whitney, Hassler, 10
wild embedding, 131
witness, 61

zero section, 140, 158
