Foundations of Ergodic Theory · v Brief historic survey The word ergodic is a concatenation of two Greek words, ǫργoν(ergon) = work and oδoσ(odos) = way, and was introduced

Foundations of Ergodic Theory

Marcelo Viana and Krerley Oliveira

ii

Preface

In short terms, Ergodic Theory is the mathematical discipline that deals withdynamical systems endowed with invariant measures. Let us begin by explainingwhat we mean by this and why these mathematical objects are so worth study-ing. Next, we highlight some of the major achievements in this field, whose rootsgo back to the Physics of the late 19th century. Near the end of the preface, weoutline the content of this book, its structure and its pre-requisites.

What is a dynamical system?

There are several definitions of what is a dynamical system, some more generalthan others. We restrict ourselves to two main models.

The first one, to which we refer most of the time, are transformations f :M → M in some space M . Heuristically, we think of M as the space of allpossible states of a given system. Then f is the evolution law, associating witheach state x ∈ M the one state f(x) ∈ M the system will be in a unit of timelater. Thus, time is a discrete parameter in this model.

We also consider models of dynamical systems with continuous time, namelyflows. Recall that a flow in a space M is a family f t : M → M , t ∈ R oftransformations satisfying

f0 = identity and f t fs = f t+s for all t, s ∈ R. (0.0.1)

Flows appear, most notably, in connection with differential equations: take f t

to be the transformation associating with each x ∈M the value at time t of thesolution of the equation that passes through x at time zero.

We always assume that the dynamical system is measurable, that is, thatthe space M carries a σ-algebra of measurable subsets that is preserved by thedynamics, in the sense that the pre-image of any measurable subset is still ameasurable subset. Often, we take M to be a topological space, or even a metricspace, endowed with the Borel σ-algebra, that is, the smallest σ-algebra thatcontains all open sets. Even more, in many of the situations we consider inthis book, M is a smooth manifold and the dynamical system is taken to bedifferentiable.

iii

iv

What is an invariant measure?

A measure in M is a non-negative function µ defined on the σ-algebra of M ,such that µ(∅) = 0 and

µ(∪n An) =

∑

n

µ(An)

for any countable family An of pairwise disjoint measurable subsets. We call µa probability measure if µ(M) = 1. In most cases, we deal with finite measures,that is, such that µ(M) < ∞. Then we can easily turn µ into a probability ν:just define

ν(E) =µ(E)

µ(M)for every measurable set E ⊂M.

In general, we say that a measure µ is invariant under a transformation f if

µ(E) = µ(f−1(E)) for every measurable set E ⊂M. (0.0.2)

Heuristically, this may be read as follows: the probability that a point is in anygiven measurable set is the same as the probability that its image is in that set.For flows, we replace (0.0.2) by

µ(E) = µ(f−t(E)) for every measurable set E ⊂M and t ∈ R. (0.0.3)

Notice that (0.0.2)–(0.0.3) do make sense since, by assumption, the pre-imageof a measurable set is also a measurable set.

Why study invariant measures?

As in any other branch of mathematics, an important part of the motivationis intrinsic and aesthetical: as we will see, these mathematical structures havedeep and surprising properties, which are expressed through beautiful theorems.Equally fascinating, ideas and results from Ergodic Theory can be applied inmany other areas of Mathematics, including some that do not seem to haveanything to do with probabilistic concepts, such as Combinatorics and NumberTheory.

Another key motivation is that many problems in the experimental sciences,including many complicated natural phenomena, can be modelled by dynamicalsystems that leave some interesting measure invariant. Historically, the mostimportant example came from Physics: Hamiltonian systems, which describethe evolution of conservative systems in Newtonian Mechanics, are described bycertain flows that preserve a natural measure, the so-called Liouville measure.Actually, we will see that very general dynamical systems do possess invariantmeasures.

Yet another fundamental reason to be interested in invariant measures is thattheir study may yield important information on the dynamical system’s behaviorthat would be difficult to obtain otherwise. Poincare’s recurrence theorem, oneof the first results we analyze in this book, is a great illustration of this: itasserts that, relative to any finite invariant measure, almost every orbit returnsarbitrarily close to its initial state.

v

Brief historic survey

The word ergodic is a concatenation of two Greek words, ǫργoν (ergon) = workand oδoσ (odos) = way, and was introduced in the 19th century by the Austrianphysicist L. Boltzmann. The systems that interested Boltzmann, J. C. Maxwelland J. C. Gibbs, the founders of the kinetic theory of gases, can be describedby a Hamiltonian flow, associated with a differential equation of the form

(dq1dt

, . . . ,dqndt

,dp1

dt, . . . ,

dpndt

)=

(∂H

∂p1, . . . ,

∂H

∂pn,−∂H

∂q1, . . . ,− ∂H

∂qn

).

Boltzmann believed that typical orbits of such a flow fill-in the whole energysurface H−1(c) that contains them. Starting from this ergodic hypothesis, hededuced that the (time) averages of observable quantities along typical orbitscoincide with the (space) averages of such quantities on the energy surface,which was crucial for his formulation of the kinetic theory of gases.

In fact, the way it was formulated originally by Boltzmann, this hypothesisis clearly false. So, the denomination ergodic hypothesis was gradually displacedto what would have been a consequence, namely, the claim that time averagesand space averages coincide. Systems for which this is true were called ergodic.And it is fair to say that a great part of the progress experienced by ErgodicTheory in the 20th century was motivated by the quest to understand whethermost Hamiltonian systems, especially those that appear in connection with thekinetic theory of gases, are ergodic or not.

The foundations were set in the 1930’s, when J. von Neumann and G. D.Birkhoff proved that time averages are indeed well-defined for almost everyorbit. However, in the mid 1950’s, the great Russian mathematician A. N.Kolmogorov observed that many Hamiltonian systems are actually not ergodic.This spectacular discovery was much expanded V. Arnold and J. Moser, in whatcame to be called KAM (Kolmogorov-Arnold-Moser) theory.

On the other hand, still in the 1930’s, E. Hopf had given the first importantexamples of Hamiltonian systems that are ergodic, namely, the geodesic flowson surfaces with negative curvature. His result was generalized to geodesic flowson manifolds of any dimension by D. Anosov, in the 1960’s. In fact, Anosovproved ergodicity for a much more general class of systems, both with discretetime and in continuous time, which are now called Anosov systems.

An even broader class, called uniformly hyperbolic systems, was introducedby S. Smale and became a major focus for the theory of Dynamical Systemsthrough the last half a century or so. In the 1970’s, Ya. Sinai developed thetheory of Gibbs measures for Anosov systems, conservative or dissipative, whichD. Ruelle and R. Bowen rapidly extended to uniformly hyperbolic systems. Thiscertainly ranks among the greatest achievements of smooth ergodic theory.

Two other major contributions must also be mentioned in this brief survey.One, is the introduction of the notion of entropy, by Kolmogorov and Sinai,near the end of the 1950’s. Another, is the proof that the entropy is a completeinvariant for Bernoulli shifts (two Bernoulli shifts are equivalent if and only ifthey have the same entropy), by D. Ornstein, some ten years later.

vi

By then, the theory of non-uniformly hyperbolic systems was being initiatedby V. I. Oseledets, Ya. Pesin and others. But that would take us beyond thescope of the present book.

How this book came to be

This book grew from lecture notes we wrote for the participants of mini-courseswe taught at the Department of Mathematics of the Universidade Federal dePernambuco (Recife, Brazil), in January 2003, and at the meeting Novos Talen-tos em Matematica held by Fundacao Calouste Gulbenkian (Lisbon, Portugal),in September 2004.

In both cases, most of the audience consisted of young undergraduates withlittle previous contact with Measure Theory, let alone Ergodic Theory. Thus,it was necessary to provide very friendly material that allowed such students tofollow the main ideas to be presented. Still at that stage, our text was used byother colleagues, such as Vanderlei Horita (Sao Jose do Rio Preto, Brazil), forteaching mini-courses to audiences with a similar profile.

As the text evolved, we have tried to preserve this elementary character of theearly chapters, especially Chapters 1 and 2, so that they can used independentlyof the rest of the book, with as little pre-requisites as possible.

Starting from the mini-course we gave at the 2005 Coloquio Brasileiro deMatematica (IMPA, Rio de Janeiro), this project acquired a broader purpose.Gradually, we evolved towards trying to present in a consistent textbook formatthe material that, in our view, constitutes the core of Ergodic Theory. Inspiredby our own research experience in this area, we endeavored to assemble in aunified presentation the ideas and facts upon which is built the remarkabledevelopment this field experienced over the last decades.

A main concern was to try and keep the text as self-contained as possible.Ergodic Theory is based on several other mathematical disciplines, especiallyMeasure Theory, Topology and Analysis. In the appendices, we collected themain material from those disciplines that is used throughout the text. As arule, proofs are omitted, since they can easily be found in many of the excellentreferences we provide. However, we do assume that the reader is familiar withthe main tools of Linear Algebra, such as the canonical Jordan form.

Structure of the book

The main part of this book consists of 12 chapters, divided into sections andsubsections, and 7 appendices, with the status of sections and also divided intosubsections. A list of exercises is given at the end of every section, appendicesincluded. Statements (theorems, propositions, lemmas, corollaries etc), exer-cises and formulas are numbered by section and chapter: for instance, (2.3.7) isthe seventh formula in the third section of the second chapter and Exercise A.5.1is the first exercise in the fifth appendix. Hints for selected exercises are givenin the special chapter after the appendices. At the end, we provide a list ofreferences and an index.

vii

Chapters 1 through 12 may be organized as follows:

• Chapters 1 through 4 constitute a kind of introductory cycle, in whichwe present the basic notions and facts in Ergodic Theory - invariance,recurrence and ergodicity - as well as some main examples. Chapter 3introduces the fundamental results (ergodic theorems) upon which thewhole theory is built.

• Chapter 4, where we introduce the key notion of ergodicity, is a turningpoint in our text. The next two chapters (Chapters 5 and 6) develop acouple of important related topics: decomposition of invariant measuresinto ergodic measures and systems admitting a unique, necessarily ergodic,invariant measure.

• Chapters 7 through 9 deal with very diverse subjects - loss of memory, theisomorphism problem and entropy - but they also form a coherent struc-ture, built around the idea of considering increasingly ‘chaotic’ systems:mixing, Lebesgue spectrum, Kolmogorov and Bernoulli systems.

• Chapter 9 is another turning point. As we introduce the fundamentalconcept of entropy, we take our time to present it to the reader from severaldifferent viewpoints. This is naturally articulated with the content ofChapter 10, where we develop the topological version of entropy, includingan important generalization called pressure.

• In the two final Chapters 11 and 12, we focus on a specific class of dynam-ical systems, called expanding transformations, that allows us to exhibit aconcrete (and spectacular!) application of many of the general ideas pre-sented along the text. This includes Ruelle’s theorem and its applications,which we view as a natural climax of the book.

Appendices A.1 through A.2 cover several basic topics of Measure and Inte-gration. Appendix A.3 deals with the special case of Borel measures in metricspaces. In Appendix A.4 we recall some basic facts from the theory of mani-folds and smooth maps. Similarly, Appendices A.5 and A.6 cover some usefulbasic material about Banach spaces and Hilbert spaces. Finally, Appendix A.7is devoted to the spectral theorem.

Examples and applications have a key part in any mathematical disciplineand, perhaps, even more so in Ergodic Theory. For this reason, we devote specialattention to presenting concrete situations that illustrate and put in perspectivethe general results. Such examples and constructions are introduced gradually,whenever the context seems better suited to highlight their relevance. Theyoften return later in the text, to illustrate new fundamental concepts as weintroduce them.

The exercises at the end of each section have a threefold purpose. Thereare routine exercises meant to help the reader become acquainted with theconcepts and the results presented in the text. Also, we leave as exercisescertain arguments and proofs that are not used in the sequel and or belong to

viii

more elementary related areas, such as Topology or Measure Theory. Finally,more sophisticated exercises test the reader’s global understanding of the theory.For the reader’s convenience, hints for selected exercises are given in the specialchapter following the appendices.

How to use this book?

These comments are meant, primarily, for the reader who plans to use thisbook to teach a course. Appendices A.1 and A.7 provide quick references tobackground material. In principle, they are not meant to be presented in class.

The content of Chapters 1 through 12 is suitable for a one year course, or asequence of two one semester courses. In either case, the reader should be ableto cover most of the material, possibly, reserving some topics for seminars givenby the students. The following sections are especially suited for that:

Section 1.5, Section 2.5, Section 3.4, Section 4.4, Section 6.4, Section 7.3,Section 7.4, Section 8.3 Section 8.4, Section 8.5, Section 9.5, Section 9.7,Section 10.4, Section 10.5, Section 11.1, Section 11.3, Section 12.3 and Sec-

tion 12.4.

In this format, Ruelle’s theorem (Theorem 12.1) and its applications are a nat-ural closure for the course.

In case only one semester is available, some selection of topics will be neces-sary. The authors’ suggestion is to try and cover the following program:

Chapter 1: Sections 1.1, 1.2 and 1.3.Chapter 2: Sections 2.1 and 2.2.Chapter 3: Sections 3.1, 3.2 and 3.3.Chapter 4: Sections 4.1, 4.2 and 4.3.Chapter 5: Section 5.1 (mention Rokhlin’s theorem).Chapter 6: Sections 6.1, 6.2 and 6.3.Chapter 7: Sections 7.1 and 7.2.Chapter 8: Section 8.1 and 8.2 (mention Ornstein’s theorem).Chapter 9: Sections 9.1, 9.2, 9.3 and 9.4.Chapter 10: Sections 10.1 and 10.2.Chapter 11: Section 11.1.

In this format, the course could close either with the proof of the variational prin-ciple for the entropy (Theorem 10.1) or with the construction of absolutely con-tinuous invariant measures for expanding maps on manifolds (Theorem 11.1.2).

We have designed the text in such a way as to make it feasible for thelecturer to focus on presenting the central ideas, leaving it to the student tostudy in detail many of the proofs and complementary results. Indeed, wedevoted considerable effort to making the explanations as friendly as possible,detailing the arguments and including plenty of cross-references to previousrelated results as well to the definitions of the relevant notions.

In addition to the regular appearance of examples, we have often chosen toapproach the same notion more than once, from different points of view, if thatseemed useful for its in-depth understanding. The special chapter containing the

ix

hints for selected exercises is also part of that effort to encourage and facilitatethe autonomous use of this book by the student.

Acknowledgements

The writing of this book extended for over a decade. During this period webenefitted from constructive criticism from several colleagues and students.

Many colleagues used different preliminary versions of the book to teachcourses and shared their experiences with us. Besides Vanderlei Horita (Sao Josedo Rio Preto, Brazil), Nivaldo Muniz (Sao Luis, Brazil) and Meysam Nassiri(Teheran, Iran), we would like to thank Vıtor Araujo (Salvador, Brazil) for anextended list of suggestions that influenced significantly the way the text evolvedfrom then on. Francois Ledrappier (Paris, France) helped us with questionsabout substitution systems.

We also had the chance to test the material in a number of regular gradu-ate courses at IMPA-Instituto de Matematica Pura e Aplicada and at UFAL-Universidade Federal de Alagoas. Feedback from graduate students Aline GomesCerqueira, Ermerson Araujo, Ignacio Atal, Rafael Lucena, Raphael Cyna e Xiao-Chuan Liu allowed us to correct many of the weaknesses in earlier versions.

The first draft of Appendices A.1-A.2 was written by Joao Gouveia, VıtorSaraiva and Ricardo Andrade, who acted as assistants for the course in NovosTalentos em Matematica 2004 mentioned previously. IMPA students Edilenode Almeida Santos, Felippe Soares Guimaraes, Fernando Nera Lenarduzzi, ItaloDowell Lira Melo, Marco Vinicius Bahi Aymone and Renan Henrique Finderwrote many of the hints for the exercises in Chapters 1 through 8 and some ofthe appendices.

The original Portuguese version of this book, Fundamentos da Teoria Ergodi-ca [VO14], was published in 2014 by SBM-Sociedade Brasileira de Matematica.Feedback from colleagues who used that back to teach graduate courses in dif-ferent places helped eliminate some of the remaining shortcomings. The ex-tended list of remarks by Bernardo Lima (Belo Horizonte, Brazil) and his stu-dent Leonardo Guerini was particularly useful in this regard.

Several other changes and corrections were made in the course of the transla-tion to English. We are grateful to the many colleagues and students who agreedto revise different parts of the translated text, especially Cristina Lizana, ElaısC. Malheiro, Fernando Nera Lenarduzzi, Jiagang Yang, Karina Marin, LucasBackes, Maria Joao Resende, Mauricio Poletti, Paulo Varandas, Ricardo Tur-olla, Sina Turelli, Vanessa Ramos, Vıtor Araujo and Xiao-Chuan Liu.

Rio de Janeiro and Maceio, March 31, 2015

Marcelo Viana 1 and Krerley Oliveira 2

1IMPA, Estrada D. Castorina 110, 22460-320 Rio de Janeiro, Brasil. [email protected] de Matematica, Universidade Federal de Alagoas, Campus A. C. Simoes

s/n, 57072-090 Maceio, Brasil. [email protected].

x

Contents

1 Recurrence 1

1.1 Invariant measures . . . . . . . . . . . . . . . . . . . . . . . . . . 21.1.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.2 Poincare recurrence theorem . . . . . . . . . . . . . . . . . . . . . 41.2.1 Measurable version . . . . . . . . . . . . . . . . . . . . . . 41.2.2 Kac theorem . . . . . . . . . . . . . . . . . . . . . . . . . 51.2.3 Topological version . . . . . . . . . . . . . . . . . . . . . . 71.2.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91.3.1 Decimal expansion . . . . . . . . . . . . . . . . . . . . . . 101.3.2 Gauss map . . . . . . . . . . . . . . . . . . . . . . . . . . 121.3.3 Circle rotations . . . . . . . . . . . . . . . . . . . . . . . . 161.3.4 Rotations on tori . . . . . . . . . . . . . . . . . . . . . . . 181.3.5 Conservative maps . . . . . . . . . . . . . . . . . . . . . . 181.3.6 Conservative flows . . . . . . . . . . . . . . . . . . . . . . 191.3.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

1.4 Induction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221.4.1 First return map . . . . . . . . . . . . . . . . . . . . . . . 221.4.2 Induced transformations . . . . . . . . . . . . . . . . . . . 241.4.3 Kakutani-Rokhlin towers . . . . . . . . . . . . . . . . . . 271.4.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

1.5 Multiple recurrence theorems . . . . . . . . . . . . . . . . . . . . 291.5.1 Birkhoff multiple recurrence theorem . . . . . . . . . . . . 311.5.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

2 Existence of Invariant Measures 35

2.1 Weak∗ topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362.1.1 Definition and properties of the weak∗ topology . . . . . . 362.1.2 Theorem Portmanteau . . . . . . . . . . . . . . . . . . . . 372.1.3 The weak∗ topology is metrizable . . . . . . . . . . . . . . 392.1.4 The weak∗ topology is compact . . . . . . . . . . . . . . . 412.1.5 Theorem of Prohorov . . . . . . . . . . . . . . . . . . . . 422.1.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

2.2 Proof of the existence theorem . . . . . . . . . . . . . . . . . . . 45

xi

xii CONTENTS

2.2.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 482.3 Comments in Functional Analysis . . . . . . . . . . . . . . . . . . 49

2.3.1 Duality and weak topologies . . . . . . . . . . . . . . . . . 492.3.2 Koopman operator . . . . . . . . . . . . . . . . . . . . . . 502.3.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

2.4 Skew-products and natural extensions . . . . . . . . . . . . . . . 532.4.1 Measures on skew-products . . . . . . . . . . . . . . . . . 542.4.2 Natural extensions . . . . . . . . . . . . . . . . . . . . . . 542.4.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

2.5 Arithmetic progressions . . . . . . . . . . . . . . . . . . . . . . . 582.5.1 Theorem of van der Waerden . . . . . . . . . . . . . . . . 602.5.2 Theorem of Szemeredi . . . . . . . . . . . . . . . . . . . . 612.5.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

3 Ergodic Theorems 65

3.1 Ergodic theorem of von Neumann . . . . . . . . . . . . . . . . . . 663.1.1 Isometries in Hilbert spaces . . . . . . . . . . . . . . . . . 663.1.2 Statement and proof of the theorem . . . . . . . . . . . . 693.1.3 Convergence in L2(µ) . . . . . . . . . . . . . . . . . . . . 703.1.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

3.2 Birkhoff ergodic theorem . . . . . . . . . . . . . . . . . . . . . . . 713.2.1 Mean sojourn time . . . . . . . . . . . . . . . . . . . . . . 713.2.2 Time averages . . . . . . . . . . . . . . . . . . . . . . . . 723.2.3 Theorem of von Neumann and consequences . . . . . . . . 753.2.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

3.3 Subadditive ergodic theorem . . . . . . . . . . . . . . . . . . . . 793.3.1 Preparing the proof . . . . . . . . . . . . . . . . . . . . . 803.3.2 Key lemma . . . . . . . . . . . . . . . . . . . . . . . . . . 813.3.3 Estimating ϕ− . . . . . . . . . . . . . . . . . . . . . . . . 833.3.4 Bounding ϕ+ . . . . . . . . . . . . . . . . . . . . . . . . . 843.3.5 Lyapunov exponents . . . . . . . . . . . . . . . . . . . . . 853.3.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

3.4 Discrete time and continuous time . . . . . . . . . . . . . . . . . 883.4.1 Suspension flows . . . . . . . . . . . . . . . . . . . . . . . 893.4.2 Poincare maps . . . . . . . . . . . . . . . . . . . . . . . . 913.4.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

4 Ergodicity 97

4.1 Ergodic systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 984.1.1 Invariant sets and functions . . . . . . . . . . . . . . . . . 984.1.2 Spectral characterization . . . . . . . . . . . . . . . . . . 1004.1.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

4.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1044.2.1 Rotations on tori . . . . . . . . . . . . . . . . . . . . . . . 1044.2.2 Decimal expansion . . . . . . . . . . . . . . . . . . . . . . 1064.2.3 Bernoulli shifts . . . . . . . . . . . . . . . . . . . . . . . . 108

CONTENTS xiii

4.2.4 Gauss map . . . . . . . . . . . . . . . . . . . . . . . . . . 111

4.2.5 Linear endomorphisms of the torus . . . . . . . . . . . . . 114

4.2.6 Hopf argument . . . . . . . . . . . . . . . . . . . . . . . . 116

4.2.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

4.3 Properties of ergodic measures . . . . . . . . . . . . . . . . . . . 121

4.3.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

4.4 Comments in Conservative Dynamics . . . . . . . . . . . . . . . . 124

4.4.1 Hamiltonian systems . . . . . . . . . . . . . . . . . . . . . 125

4.4.2 Kolmogorov-Arnold-Moser theory . . . . . . . . . . . . . . 127

4.4.3 Elliptic periodic points . . . . . . . . . . . . . . . . . . . . 131

4.4.4 Geodesic flows . . . . . . . . . . . . . . . . . . . . . . . . 135

4.4.5 Anosov systems . . . . . . . . . . . . . . . . . . . . . . . . 136

4.4.6 Billiards . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

4.4.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

5 Ergodic decomposition 147

5.1 Ergodic decomposition theorem . . . . . . . . . . . . . . . . . . . 147

5.1.1 Statement of the theorem . . . . . . . . . . . . . . . . . . 148

5.1.2 Disintegration of a measure . . . . . . . . . . . . . . . . . 149

5.1.3 Measurable partitions . . . . . . . . . . . . . . . . . . . . 151

5.1.4 Proof of the ergodic decomposition theorem . . . . . . . . 152

5.1.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 154

5.2 Rokhlin disintegration theorem . . . . . . . . . . . . . . . . . . . 155

5.2.1 Conditional expectations . . . . . . . . . . . . . . . . . . 155

5.2.2 Criterion for σ-additivity . . . . . . . . . . . . . . . . . . 157

5.2.3 Construction of conditional measures . . . . . . . . . . . . 160

5.2.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

6 Unique Ergodicity 163

6.1 Unique ergodicity . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

6.1.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

6.2 Minimality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

6.2.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 167

6.3 Haar measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168

6.3.1 Rotations on tori . . . . . . . . . . . . . . . . . . . . . . . 168

6.3.2 Topological groups and Lie groups . . . . . . . . . . . . . 169

6.3.3 Translations on compact metrizable groups . . . . . . . . 173

6.3.4 Odometers . . . . . . . . . . . . . . . . . . . . . . . . . . 176

6.3.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 178

6.4 Theorem of Weyl . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

6.4.1 Ergodicity . . . . . . . . . . . . . . . . . . . . . . . . . . . 180

6.4.2 Unique ergodicity . . . . . . . . . . . . . . . . . . . . . . . 182

6.4.3 Proof of the theorem of Weyl . . . . . . . . . . . . . . . . 184

6.4.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 186

xiv CONTENTS

7 Correlations 187

7.1 Mixing systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1887.1.1 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . 1887.1.2 Weak mixing . . . . . . . . . . . . . . . . . . . . . . . . . 1917.1.3 Spectral characterization . . . . . . . . . . . . . . . . . . 1937.1.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 195

7.2 Markov shifts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1967.2.1 Ergodicity . . . . . . . . . . . . . . . . . . . . . . . . . . . 2017.2.2 Mixing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2037.2.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 205

7.3 Interval exchanges . . . . . . . . . . . . . . . . . . . . . . . . . . 2077.3.1 Minimality and ergodicity . . . . . . . . . . . . . . . . . . 2097.3.2 Mixing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2117.3.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 214

7.4 Decay of correlations . . . . . . . . . . . . . . . . . . . . . . . . . 2147.4.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 219

8 Equivalent Systems 221

8.1 Ergodic equivalence . . . . . . . . . . . . . . . . . . . . . . . . . 2228.1.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 224

8.2 Spectral equivalence . . . . . . . . . . . . . . . . . . . . . . . . . 2248.2.1 Invariants of spectral equivalence . . . . . . . . . . . . . . 2258.2.2 Eigenvalues and weak mixing . . . . . . . . . . . . . . . . 2268.2.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 228

8.3 Discrete spectrum . . . . . . . . . . . . . . . . . . . . . . . . . . 2308.3.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 233

8.4 Lebesgue spectrum . . . . . . . . . . . . . . . . . . . . . . . . . . 2338.4.1 Examples and properties . . . . . . . . . . . . . . . . . . . 2338.4.2 The invertible case . . . . . . . . . . . . . . . . . . . . . . 2378.4.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 240

8.5 Lebesgue spaces and ergodic isomorphism . . . . . . . . . . . . . 2418.5.1 Ergodic isomorphism . . . . . . . . . . . . . . . . . . . . . 2418.5.2 Lebesgue spaces . . . . . . . . . . . . . . . . . . . . . . . 2438.5.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 248

9 Entropy 249

9.1 Definition of entropy . . . . . . . . . . . . . . . . . . . . . . . . . 2509.1.1 Entropy in Information Theory . . . . . . . . . . . . . . . 2509.1.2 Entropy of a partition . . . . . . . . . . . . . . . . . . . . 2519.1.3 Entropy of a dynamical system . . . . . . . . . . . . . . . 2569.1.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 260

9.2 Theorem of Kolmogorov-Sinai . . . . . . . . . . . . . . . . . . . . 2619.2.1 Generating partitions . . . . . . . . . . . . . . . . . . . . 2639.2.2 Semi-continuity of the entropy . . . . . . . . . . . . . . . 2659.2.3 Expansive transformations . . . . . . . . . . . . . . . . . . 2679.2.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 268

CONTENTS xv

9.3 Local entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2689.3.1 Proof of the Shannon-McMillan-Breiman theorem . . . . 2709.3.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 274

9.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2749.4.1 Markov shifts . . . . . . . . . . . . . . . . . . . . . . . . . 2749.4.2 Gauss map . . . . . . . . . . . . . . . . . . . . . . . . . . 2759.4.3 Linear endomorphisms of the torus . . . . . . . . . . . . . 2779.4.4 Differentiable maps . . . . . . . . . . . . . . . . . . . . . . 2789.4.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 280

9.5 Entropy and equivalence . . . . . . . . . . . . . . . . . . . . . . . 2819.5.1 Bernoulli automorphisms . . . . . . . . . . . . . . . . . . 2829.5.2 Systems with entropy zero . . . . . . . . . . . . . . . . . . 2829.5.3 Kolmogorov systems . . . . . . . . . . . . . . . . . . . . . 2859.5.4 Exact systems . . . . . . . . . . . . . . . . . . . . . . . . 2919.5.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 291

9.6 Entropy and ergodic decomposition . . . . . . . . . . . . . . . . . 2929.6.1 Affine property . . . . . . . . . . . . . . . . . . . . . . . . 2949.6.2 Proof of the Jacobs theorem . . . . . . . . . . . . . . . . . 2969.6.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 299

9.7 Jacobians and the Rokhlin formula . . . . . . . . . . . . . . . . . 3009.7.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 305

10 Variational principle 307

10.1 Topological entropy . . . . . . . . . . . . . . . . . . . . . . . . . . 30810.1.1 Definition via open covers . . . . . . . . . . . . . . . . . . 30810.1.2 Generating sets and separated sets . . . . . . . . . . . . . 31010.1.3 Calculation and properties . . . . . . . . . . . . . . . . . . 31510.1.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 318

10.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31910.2.1 Expansive maps . . . . . . . . . . . . . . . . . . . . . . . 31910.2.2 Shifts of finite type . . . . . . . . . . . . . . . . . . . . . . 32110.2.3 Topological entropy of flows . . . . . . . . . . . . . . . . . 32410.2.4 Differentiable maps . . . . . . . . . . . . . . . . . . . . . . 32610.2.5 Linear endomorphisms of the torus . . . . . . . . . . . . . 32810.2.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 330

10.3 Pressure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33210.3.1 Definition via open covers . . . . . . . . . . . . . . . . . . 33210.3.2 Generating sets and separated sets . . . . . . . . . . . . . 33410.3.3 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . 33610.3.4 Comments in Statistical Mechanics . . . . . . . . . . . . . 33910.3.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 343

10.4 Variational principle . . . . . . . . . . . . . . . . . . . . . . . . . 34410.4.1 Proof of the upper bound . . . . . . . . . . . . . . . . . . 34610.4.2 Approximating the pressure . . . . . . . . . . . . . . . . . 34810.4.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 351

10.5 Equilibrium states . . . . . . . . . . . . . . . . . . . . . . . . . . 352

xvi CONTENTS

10.5.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 357

11 Expanding Maps 359

11.1 Expanding maps on manifolds . . . . . . . . . . . . . . . . . . . . 360

11.1.1 Distortion lemma . . . . . . . . . . . . . . . . . . . . . . . 361

11.1.2 Existence of ergodic measures . . . . . . . . . . . . . . . . 365

11.1.3 Uniqueness and conclusion of the proof . . . . . . . . . . 366

11.1.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 368

11.2 Dynamics of expanding maps . . . . . . . . . . . . . . . . . . . . 369

11.2.1 Contracting inverse branches . . . . . . . . . . . . . . . . 372

11.2.2 Shadowing and periodic points . . . . . . . . . . . . . . . 373

11.2.3 Dynamical decomposition . . . . . . . . . . . . . . . . . . 377

11.2.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 380

11.3 Entropy and periodic points . . . . . . . . . . . . . . . . . . . . . 381

11.3.1 Rate growth of periodic points . . . . . . . . . . . . . . . 382

11.3.2 Approximation by atomic measures . . . . . . . . . . . . . 383

11.3.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 385

12 Thermodynamical Formalism 387

12.1 Theorem of Ruelle . . . . . . . . . . . . . . . . . . . . . . . . . . 388

12.1.1 Reference measure . . . . . . . . . . . . . . . . . . . . . . 390

12.1.2 Distortion and the Gibbs property . . . . . . . . . . . . . 392

12.1.3 Invariant density . . . . . . . . . . . . . . . . . . . . . . . 394

12.1.4 Construction of the equilibrium state . . . . . . . . . . . . 396

12.1.5 Pressure and eigenvalues . . . . . . . . . . . . . . . . . . . 398

12.1.6 Uniqueness of the equilibrium state . . . . . . . . . . . . . 400

12.1.7 Exactness . . . . . . . . . . . . . . . . . . . . . . . . . . . 402

12.1.8 Absolutely continuous measures . . . . . . . . . . . . . . . 403

12.1.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 405

12.2 Theorem of Livsic . . . . . . . . . . . . . . . . . . . . . . . . . . 406

12.2.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 409

12.3 Decay of correlations . . . . . . . . . . . . . . . . . . . . . . . . . 409

12.3.1 Projective distances . . . . . . . . . . . . . . . . . . . . . 411

12.3.2 Cones of Holder functions . . . . . . . . . . . . . . . . . . 416

12.3.3 Exponential convergence . . . . . . . . . . . . . . . . . . . 420

12.3.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 423

12.4 Dimension of conformal repellers . . . . . . . . . . . . . . . . . . 425

12.4.1 Hausdorff dimension . . . . . . . . . . . . . . . . . . . . . 425

12.4.2 Conformal repellers . . . . . . . . . . . . . . . . . . . . . 427

12.4.3 Distortion and conformality . . . . . . . . . . . . . . . . . 429

12.4.4 Existence and uniqueness of d0 . . . . . . . . . . . . . . . 431

12.4.5 Upper bound . . . . . . . . . . . . . . . . . . . . . . . . . 434

12.4.6 Lower bound . . . . . . . . . . . . . . . . . . . . . . . . . 435

12.4.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 436

CONTENTS xvii

A Measure Theory, Topology and Analysis 439

A.1 Measure spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 439A.1.1 Measurable spaces . . . . . . . . . . . . . . . . . . . . . . 440A.1.2 Measure spaces . . . . . . . . . . . . . . . . . . . . . . . . 442A.1.3 Lebesgue measure . . . . . . . . . . . . . . . . . . . . . . 445A.1.4 Measurable maps . . . . . . . . . . . . . . . . . . . . . . . 448A.1.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 450

A.2 Integration in measure spaces . . . . . . . . . . . . . . . . . . . . 452A.2.1 Lebesgue integral . . . . . . . . . . . . . . . . . . . . . . . 453A.2.2 Convergence theorems . . . . . . . . . . . . . . . . . . . . 455A.2.3 Product measures . . . . . . . . . . . . . . . . . . . . . . 456A.2.4 Derivation of measures . . . . . . . . . . . . . . . . . . . . 458A.2.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 460

A.3 Measures in metric spaces . . . . . . . . . . . . . . . . . . . . . . 462A.3.1 Regular measures . . . . . . . . . . . . . . . . . . . . . . . 462A.3.2 Separable complete metric spaces . . . . . . . . . . . . . . 465A.3.3 Space of continuous functions . . . . . . . . . . . . . . . . 466A.3.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 468

A.4 Differentiable manifolds . . . . . . . . . . . . . . . . . . . . . . . 469A.4.1 Differentiable manifolds and maps . . . . . . . . . . . . . 469A.4.2 Tangent space and derivative . . . . . . . . . . . . . . . . 471A.4.3 Cotangent space and differential forms . . . . . . . . . . . 473A.4.4 Transversality . . . . . . . . . . . . . . . . . . . . . . . . . 474A.4.5 Riemannian manifolds . . . . . . . . . . . . . . . . . . . . 475A.4.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 477

A.5 Lp(µ) spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 478A.5.1 Lp(µ) spaces with 1 ≤ p <∞ . . . . . . . . . . . . . . . . 478A.5.2 Inner product in L2(µ) . . . . . . . . . . . . . . . . . . . . 479A.5.3 Space of essentially bounded functions . . . . . . . . . . . 480A.5.4 Convexity . . . . . . . . . . . . . . . . . . . . . . . . . . . 480A.5.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 481

A.6 Hilbert spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 482A.6.1 Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . 482A.6.2 Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483A.6.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 485

A.7 Spectral theorems . . . . . . . . . . . . . . . . . . . . . . . . . . 486A.7.1 Spectral measures . . . . . . . . . . . . . . . . . . . . . . 486A.7.2 Spectral representation . . . . . . . . . . . . . . . . . . . 489A.7.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 490

Hints or solutions for selected exercises 491

xviii CONTENTS

Chapter 1

Recurrence

Ergodic Theory studies the behavior of dynamical systems with respect to mea-sures that remain invariant under time evolution. Indeed, it aims to describethose properties that are valid for the trajectories of almost all initial statesof the system, that is, all but a subset that has zero weight for the invariantmeasure. Our first task, in Section 1.1, will be to explain what we mean by‘dynamical system’ and ‘invariant measure’.

The roots of the theory date back to the first half of the 19th century. By1838, the French mathematician Joseph Liouville observed that every energy-preserving system in Classical (Newtonian) Mechanics admits a natural invari-ant volume measure in the space of configurations. Just a bit later, in 1845,the great German mathematician Carl Friedrich Gauss pointed out that thetransformation

(0, 1] → R, x 7→ fractional part of1

x,

which has an important role in Number Theory, admits an invariant measureequivalent to the Lebesgue measure (in the sense that the two have the samezero measure sets). These are two of the examples of applications of ErgodicTheory that we discuss in Section 1.3. Many others are introduced throughoutthis book.

The first important result was found by the great French mathematicianHenri Poincare by the end of the 19th century. Poincare was particularly in-terested in the motion of celestial bodies, such as planets and comets, whichis described by certain differential equations originating from Newton’s Law ofUniversal Gravitation. Starting from Liouville’s observation, Poincare realizedthat for almost every initial state of the system, that is, almost every value ofthe initial position and velocity, the solution of the differential equation comesback arbitrarily close to that initial state, unless it goes to infinity. Even more,this recurrence property is not specific to the (Celestial) Mechanics: it is sharedby any dynamical system that admits a finite invariant measure. That is thetheme of Section 1.2.

The same theme reappears in Section 1.5, in a more elaborate context: there,

1

2 CHAPTER 1. RECURRENCE

we deal with any finite number of dynamical systems commuting with eachother, and we seek for simultaneousreturns of the orbits of all those systemsto the neighborhood of the initial state. This kind of result has importantapplications in Combinatorics and Number Theory, as we will see.

The recurrence phenomenon is also behind the constructions that we presentin Section 1.4. The basic idea is to fix some positive measure subset of thedomain and to consider the first return to that subset. This first-return trans-formation is often easier to analyze, and it may be used to shed much light onthe behavior of the original transformation.

1.1 Invariant measures

Let (M,B, µ) be a measure space and f : M →M be a measurable transforma-tion. We say that the measure µ is invariant under f if

µ(E) = µ(f−1(E)) for every measurable set E ⊂M . (1.1.1)

We also say that µ is f -invariant, or that f preserves µ, to mean just thesame. Notice that the definition (1.1.1) makes sense, since the pre-image ofa measurable set under a measurable transformation is still a measurable set.Heuristically, the definition means that the probability that a point picked “atrandom” is in a given subset is equal to the probability that its image is in thatsubset.

It is possible, and convenient, to extend this definition to other types ofdynamical systems, beyond transformations. We are especially interested inflows, that is, families of transformations f t : M → M , with t ∈ R, satisfyingthe following conditions:

f0 = id and fs+t = fs f t for every s, t ∈ R. (1.1.2)

In particular, each transformation f t is invertible and the inverse is f−t. Flowsarise naturally in connection with differential equations of the form

dγ

dt(t) = X(γ(t)),

in the following way: under suitable conditions on the vector field X , for eachpoint x in the domain M there exists exactly one solution t 7→ γx(t) of thedifferential equation with γx(0) = x; then f t(x) = γx(t) defines a flow in M .

We say that a measure µ is invariant under a flow (f t)t if it is invariantunder each one of the transformations f t, that is, if

µ(E) = µ(f−t(E)) for every measurable set E ⊂M and t ∈ R. (1.1.3)

Proposition 1.1.1. Let f : M → M be a measurable transformation and µ bea measure on M . Then f preserves µ if and only if

∫φdµ =

∫φ f dµ (1.1.4)

for every µ-integrable function φ : M → R.

1.1. INVARIANT MEASURES 3

Proof. Suppose that the measure µ is invariant under f . We are going to showthat the relation (1.1.4) is valid for increasingly broader classes of functions.Let XB denote the characteristic function of any measurable subset B. Then

µ(B) =

∫XB dµ and µ(f−1(B)) =

∫Xf−1(B) dµ =

∫(XB f) dµ.

Thus, the hypothesis µ(B) = µ(f−1(B)) means that (1.1.4) is valid for char-acteristic functions. Then, by linearity of the integral, (1.1.4) is valid for allsimple functions. Next, given any integrable φ : M → R, consider a sequence(sn)n of simple functions, converging to φ and such that |sn| ≤ |φ| for every n.That such a sequence exists is guaranteed by Proposition A.1.33. Then, usingthe dominated convergence theorem (Theorem A.2.11) twice:

∫φdµ = lim

n

∫sn dµ = lim

n

∫(sn f) dµ =

∫(φ f) dµ.

This shows that (1.1.4) holds for every integrable function if µ is invariant. Theconverse is also contained in the arguments we just presented.

1.1.1 Exercises

1.1.1. Let f : M → M be a measurable transformation. Show that a Diracmeasure δp is invariant under f if and only if p is a fixed point of f . Moregenerally, a probability measure δp,k = k−1

(δp+δf(p)+· · ·+δfk−1(p)

)is invariant

under f if and only if fk(p) = p.

1.1.2. Prove the following version of Proposition 1.1.1. Let M be a metricspace, f : M → M be a measurable transformation and µ be a measure on M .Show that f preserves µ if and only if

∫φdµ =

∫φ f dµ.

for every bounded continuous function φ : M → R.

1.1.3. Prove that if f : M → M preserves a measure µ then, given any k ≥ 2,the iterate fk also preserves µ. Is the converse true?

1.1.4. Suppose that f : M →M preserves a probability measure µ. Let B ⊂Mbe a measurable set satisfying any one of the following conditions:

1. µ(B \ f−1(B)) = 0;

2. µ(f−1(B) \B) = 0;

3. µ(B∆f−1(B)) = 0;

4. f(B) ⊂ B.

Show that there exists C ⊂M such that f−1(C) = C and µ(B∆C) = 0.

1.1.5. Let f : U → U be a C1 diffeomorphism on an open set U ⊂ Rd. Showthat the Lebesgue measure m is invariant under f if and only if | detDf | ≡ 1.


1.2 Poincare recurrence theorem

We are going to study two versions of Poincare’s theorem. The first one (Sec-tion 1.2.1) is formulated in the context of (finite) measure spaces. The theo-rem of Kac, that we state and prove in Section 1.2.2, provides a quantitativecomplement to that statement. The second version of the recurrence theorem(Section 1.2.3) assumes that the ambient is a topological space with certain ad-ditional properties. We will also prove a third version of the recurrence theorem,due to Birkhoff, whose statement is purely topological.

1.2.1 Measurable version

Our first result asserts that, given any finite invariant measure, almost everypoint in any positive measure set E returns to E an infinite number of times:

Theorem 1.2.1 (Poincare recurrence). Let f : M →M be a measurable trans-formation and µ be a finite measure invariant under f . Let E ⊂ M be anymeasurable set with µ(E) > 0. Then, for µ-almost every point x ∈ E there existinfinitely many values of n for which fn(x) is also in E.

Proof. Denote by E0 the set of points x ∈ E that never return to E. As afirst step, let us prove that E0 has zero measure. To this end, let us observethat the pre-images f−n(E0) are pairwise disjoint. Indeed, suppose there existm > n ≥ 1 such that f−m(E0) intersects f−n(E0). Let x be a point in theintersection and y = fn(x). Then y ∈ E0 and fm−n(y) = fm(x) ∈ E0. SinceE0 ⊂ E, this means that y returns to E at least once, which contradicts thedefinition of E0. This contradiction proves that the pre-images are pairwisedisjoint, as claimed.

Since µ is invariant, we also have that µ(f−n(E0)) = µ(E0) for all n ≥ 1. Itfollows that

µ( ∞⋃

n=1

f−n(E0))

=

∞∑

n=1

µ(f−n(E0)) =

∞∑

n=1

µ(E0).

The expression on the left-hand side is finite, since the measure µ is assumed tobe finite. On the right-hand side we have a sum of infinitely many terms thatare all equal. The only way such a sum can be finite is if the terms vanish. So,µ(E0) = 0 as claimed.

Now let us denote by F the set of points x ∈ E that return to E a finitenumber of times. It is clear from the definition that every point x ∈ F has someiterate fk(x) in E0. In other words,

F ⊂∞⋃

k=0

f−k(E0).

Since µ(E0) = 0 and µ is invariant, it follows that

µ(F ) ≤ µ( ∞⋃

k=0

f−k(E0))≤

∞∑

k=0

µ(f−k(E0)

)=

∞∑

k=0

µ(E0) = 0.

1.2. POINCARE RECURRENCE THEOREM 5

Thus, µ(F ) = 0 as we wanted to prove.

Theorem 1.2.1 implies an analogous result for continuous time systems: ifµ is a finite invariant measure of a flow (f t)t then for every measurable setE ⊂ M with positive measure and for µ-almost every x ∈ E there exist timestj → +∞ such that f tj (x) ∈ E. Indeed, if µ is invariant under the flow then, inparticular, it is invariant under the so-called time-1 map f1. So, the statementwe just made follows immediately from Theorem 1.2.1 applied to f1 (the timestj one finds in this way are integers). Similar observations apply to the otherversions of the recurrence theorem that we present in the sequel.

On the other hand, the theorem in the next section is specific to discretetime systems.

1.2.2 Kac theorem

Let f : M → M be a measurable transformation and µ be a finite measureinvariant under f . Let E ⊂M be any measurable set with µ(E) > 0. Considerthe first-return time function ρE : E → N ∪ ∞, defined by

ρE(x) = minn ≥ 1 : fn(x) ∈ E (1.2.1)

if the set on the right-hand side is non-empty and ρE(x) = ∞ if, on the contrary,x has no iterate in E. According to Theorem 1.2.1, the second alternative occursonly on a set with zero measure.

The next result shows that this function is integrable and even provides thevalue of the integral. For the statement we need the following notation:

E0 = x ∈ E : fn(x) /∈ E for every n ≥ 1 and

E∗0 = x ∈M : fn(x) /∈ E for every n ≥ 0.

In other words, E0 is the set of points in E that never return to E and E∗0 is

the set of points in M that never enter E. We have seen in Theorem 1.2.1 thatµ(E0) = 0.

Theorem 1.2.2 (Kac). Let f : M → M be a measurable transformation, µbe a finite invariant measure and E ⊂ M be a positive measure set. Then thefunction ρE is integrable and

∫

E

ρE dµ = µ(M) − µ(E∗0 ).

Proof. For each n ≥ 1, define

En = x ∈ E : f(x) /∈ E, . . . , fn−1(x) /∈ E, but fn(x) ∈ E and

E∗n = x ∈M : x /∈ E, f(x) /∈ E, . . . , fn−1(x) /∈ E, but fn(x) ∈ E.

That is, En is the set of points of E that return to E for the first time exactlyat time n,

En = x ∈ E : ρE(x) = n,


and E∗n is the set points that are not in E and enter E for the first time exactly at

time n. It is clear that these sets are measurable and, hence, ρE is a measurablefunction. Moreover, the sets En, E

∗n, n ≥ 0 constitute a partition of the ambient

space: they are pairwise disjoint and their union is the whole M . So,

µ(M) =∞∑

n=0

(µ(En) + µ(E∗

n))

= µ(E∗0 ) +

∞∑

n=1

(µ(En) + µ(E∗

n)). (1.2.2)

Now observe that

f−1(E∗n) = E∗

n+1 ∪ En+1 for every n. (1.2.3)

Indeed, f(y) ∈ E∗n means that the first iterate of f(y) that belongs to E is

fn(f(y)) = fn+1(y) and that occurs if and only if y ∈ E∗n+1 or else y ∈ En+1.

This proves the equality (1.2.3). So, given that µ is invariant,

µ(E∗n) = µ(f−1(E∗

n)) = µ(E∗n+1) + µ(En+1) for every n.

Applying this relation successively, we find that

µ(E∗n) = µ(E∗

m) +

m∑

i=n+1

µ(Ei) for every m > n. (1.2.4)

The relation (1.2.2) implies that µ(E∗m) → 0 when m→ ∞. So, taking the limit

as m→ ∞ in the equality (1.2.4), we find that

µ(E∗n) =

∞∑

i=n+1

µ(Ei). (1.2.5)

To complete the proof, replace (1.2.5) in the equality (1.2.2). In this way wefind that

µ(M) − µ(E∗0 ) =

∞∑

n=1

( ∞∑

i=n

µ(Ei))

=

∞∑

n=1

nµ(En) =

∫

E

ρE dµ,

as we wanted to prove.

In some cases, for example when the system (f, µ) is ergodic (this propertywill be defined and studied later, starting from Chapter 4), the set E∗

0 has zeromeasure. Then the conclusion of the Kac theorem means that

1

µ(E)

∫

E

ρE dµ =µ(M)

µ(E)(1.2.6)

for every measurable set E with positive measure. The left-hand side of thisexpression is the mean!return time to E. So, (1.2.6) asserts that the mean returntime is inversely proportional to the measure of E.

Remark 1.2.3. By definition, E∗n = f−n(E) \ ∪n−1

k=0f−k(E). So, the fact that

the sum (1.2.2) is finite implies that the measure of E∗n converges to zero when

n→ ∞. This fact will be useful later.

1.2. POINCARE RECURRENCE THEOREM 7

1.2.3 Topological version

Now let us suppose that M is a topological space, endowed with its Borel σ-algebra B. A point x ∈ M is recurrent for a transformation f : M → M ifthere exists a sequence nj → ∞ of natural numbers such that fnj (x) → x.Analogously, we say that x ∈ M is recurrent for a flow (f t)t if there exists asequence tj → +∞ of real numbers such that f tj (x) → x when j → ∞.

In the next theorem we assume that the topological space M admits a count-able basis of open sets, that is, there exists a countable family Uk : k ∈ Nof open sets such that every open subset of M may be written as a union ofelements Uk of this family. This condition holds in most interesting examples.

Theorem 1.2.4 (Poincare recurrence). Suppose that M admits a countablebasis of open sets. Let f : M → M be a measurable transformation and µ bea finite measure on M invariant under f . Then, µ-almost every x ∈ M isrecurrent for f .

Proof. For each k, denote by Uk the set of points x ∈ Uk that never return toUk. According to Theorem 1.2.1, every Uk has zero measure. Consequently, thecountable union

U =⋃

k∈N

Uk

also has zero measure. Hence, to prove the theorem it suffices to check thatevery point x that is not in U is recurrent. That is easy, as we are going tosee. Consider x ∈ M \ U and let U be any neighborhood of x. By definition,there exists some element Uk of the basis of open sets such that x ∈ Uk andUk ⊂ U . Since x is not in U , we also have that x /∈ Uk. In other words, thereexists n ≥ 1 such that fn(x) is in Uk. In particular, fn(x) is also in U . Sincethe neighborhood U is arbitrary, this proves that x is a recurrent point.

Let us point out that the conclusions of Theorems 1.2.1 and 1.2.4 are false,in general, if the measure µ is not finite:

Example 1.2.5. Let f : R → R be the translation by 1, that is, the transfor-mation defined by f(x) = x + 1 for every x ∈ R. It is easy to check that fpreserves the Lebesgue measure on R (which is infinite). On the other hand, nopoint x ∈ R is recurrent for f . According to the recurrence theorem, this lastobservation implies that f can not admit any finite invariant measure.

However, it is possible to extend these statements for certain cases of infinitemeasures: see Exercise 1.2.2.

To conclude, we present a purely topological version of Theorem 1.2.4, calledBirkhoff recurrence theorem, that makes no reference at all to invariant mea-sures:

Theorem 1.2.6 (Birkhoff recurrence). If f : M →M is a continuous transfor-mation on a compact metric space M then there exists some point x ∈ X thatis recurrent for f .


Proof. Consider the family I of all non-empty closed sets X ⊂ M that areinvariant under f , in the sense that f(X) ⊂ X . This family is non-empty, sinceM ∈ I. We claim that an element X ∈ I is minimal for the inclusion relation ifand only if the orbit of every x ∈ X is dense in X . Indeed, it is clear that if Xis a closed invariant subset then X contains the closure of the orbit of each oneof its elements. Hence, in order to be minimal, X must coincide with everyoneof these closures. Conversely, for the same reason, if X coincides with the orbitclosure of each one of its points then it has no proper subset that is closed andinvariant. That is, X is minimal. This proves our claim. In particular, everypoint x in a minimal set is recurrent. Therefore, to prove the theorem it sufficesto prove that there exists some minimal set.

We claim that every totally ordered set Xα ⊂ I admits a lower bound.Indeed, consider X = ∩αXα. Observe that X is non-empty, since the Xα arecompact and they form a totally ordered family. It is clear that X is closed andinvariant under f and it is equally clear that X is a lower bound for the setXα. That proves our claim. Now it follows from Zorn’s lemma that I doescontain minimal elements.

Theorem 1.2.6 can also be deduced from Theorem 1.2.4 together with thefact, which we will prove later (in Chapter 2), that every continuous transfor-mation on a compact metric space admits some invariant probability measure.

1.2.4 Exercises

1.2.1. Show that the following statement is equivalent to Theorem 1.2.1, mean-ing that each one of them can be obtained from the other. Let f : M → M bea measurable transformation and µ be a finite invariant measure. Let E ⊂ Mbe any measurable set with µ(E) > 0. Then there exists N ≥ 1 and a positivemeasure set D ⊂ E such that fN(x) ∈ E for every x ∈ D.

1.2.2. Let f : M → M be an invertible transformation and suppose that µ isan invariant measure, not necessarily finite. Let B ⊂ M be a set with finitemeasure. Prove that, given any measurable set E ⊂ M with positive measure,µ-almost every point x ∈ E either returns to E an infinite number of times orhas only a finite number of iterates in B.

1.2.3. Let f : M → M be an invertible transformation and suppose that µ isa σ-finite invariant measure: there exists an increasing sequence of measurablesubsets Mk with µ(Mk) <∞ for every k and ∪kMk = M . We say that a pointx goes to infinity if, for every k, there exists only a finite number of iterates of xthat are in Mk. Show that, given any E ⊂ M with positive measure, µ-almostevery point x ∈ E returns to E an infinite number of times or else goes toinfinity.

1.2.4. Let f : M →M be a measurable transformation, not necessarily invert-ible, µ be an invariant probability measure and D ⊂ M be a set with positive

1.3. EXAMPLES 9

measure. Prove that almost every point of D spends a positive fraction of timein D:

lim supn

1

n#0 ≤ j ≤ n− 1 : f j(x) ∈ D > 0

for µ-almost every x ∈ D. [Note: One may replace lim sup by lim inf in thestatement, but the proof of that fact will have to wait until Chapter 3.]

1.2.5. Let f : M → M be a measurable transformation preserving a finitemeasure µ. Given any measurable set A ⊂M with µ(A) > 0, let n1 < n2 < · · ·be the sequence of values of n such that µ(f−n(A) ∩ A) > 0. The goal of thisexercise is to prove that VA = n1, n2, . . . is a syndetic, that is, that thereexists C > 0 such that ni+1 − ni ≤ C for every i.

1. Show that for any increasing sequence k1 < k2 < · · · there exist j > i ≥ 1such that µ(A ∩ f−(kj−ki)(A)) > 0.

2. Given any infinite sequence ℓ = (lj)j of natural numbers, denote by S(ℓ)the set of all finite sums of consecutive elements of ℓ. Show that VAintersects S(ℓ) for every ℓ.

3. Deduce that the set VA is syndetic.

[Note: Exercise 3.1.2 provides a different proof of this fact.]

1.2.6. Show that if f : [0, 1] → [0, 1] is a measurable transformation preservingthe Lebesgue measure m then m-almost every point x ∈ [0, 1] satisfies

lim infn

n|fn(x) − x| ≤ 1.

[Note: Boshernitzan [Bos93] proved a much more general result, namely thatlim infn n

1/dd(fn(x), x) < ∞ for µ-almost every point and every probabilitymeasure µ invariant under f : M → M , assuming M is a separable metricwhose d-dimensional Hausdorff measure is σ-finite.]

1.2.7. Define f : [0, 1] → [0, 1] by f(x) = (x+ω)−[x+ω], where ω represents thegolden ratio (1 +

√5)/2. Given x ∈ [0, 1], check that n|fn(x) − x| = n2|ω − qn|

for every n, where (qn)n → ω is the sequence of rational numbers given byqn = [x+ nω]/n. Using that the roots of the polynomial R(z) = z2 − z − 1 areprecisely ω and ω −

√5, prove that lim infn n

2|ω − qn| ≥ 1/√

5. [Note: Thisshows that the constant 1 in Exercise 1.2.6 cannot be replaced by any constantsmaller than 1/

√5. It is not known whether 1 is the smallest constant such that

the statement holds for every transformation on the interval.]

1.3 Examples

Next, we describe some simple examples of invariant measures for transforma-tions and flows that help us interpret the significance of the Poincare recurrenceand also lead to some interesting conclusions.


1.3.1 Decimal expansion

Our first example is the transformation defined on the interval [0, 1] in thefollowing way:

f : [0, 1] → [0, 1], f(x) = 10x− [10x].

Here and in what follows, we use [y] is the integer part of a real number y, thatis, the largest integer smaller or equal than y. So, f is the map sending eachx ∈ [0, 1] to the fractional part of 10x. Figure 1.1 represents the graph of f .

0 2/10 4/10 6/10 8/10

1

1

E

Figure 1.1: Fractional part of 10x

We claim that the Lebesgue measure µ on the interval is invariant under thetransformation f , that is, it satisfies

µ(E) = µ(f−1(E)) for every measurable set E ⊂M. (1.3.1)

This can be checked as follows. Let us begin by supposing that E is an interval.Then, as illustrated in Figure 1.1, its pre-image f−1(E) consists of ten intervals,each of which is ten times shorter than E. Hence, the Lebesgue measure off−1(E) is equal to the Lebesgue measure of E. This proves that (1.3.1) doeshold in the case of intervals. As a consequence, it also holds when E is a finiteunion of intervals. Now, the family of all finite unions of intervals is an algebrathat generates the Borel σ-algebra of [0, 1]. Hence, to conclude the proof it isenough to use the following general fact:

Lemma 1.3.1. Let f : M → M be a measurable transformation and µ be afinite measure on M . Suppose that there exists some algebra A of measurablesubsets of M such that A generates the σ-algebra B of M and µ(E) = µ(f−1(E))for every E ∈ A. Then the latter remains true for every set E ∈ B, that is, themeasure µ is invariant under f .

Proof. We start by proving that C = E ∈ B : µ(E) = µ(f−1(E)) is a mono-tone class. Let E1 ⊂ E2 ⊂ . . . be any increasing sequence of elements of C and

1.3. EXAMPLES 11

let E = ∪∞i=1Ei. By Theorem A.1.14 (see Exercise A.1.9),

µ(E) = limiµ(Ei) and µ(f−1(E)) = lim

iµ(f−1(Ei)).

So, using the fact that Ei ∈ C,

µ(E) = limiµ(Ei) = lim

iµ(f−1(Ei)) = µ(f−1(E)).

Hence, E ∈ C. In precise the same way, one gets that the intersection of anydecreasing sequence of elements of C is in C. This proves that C is indeed amonotone class.

Now it is easy to deduce the conclusion of the lema. Indeed, since C is as-sumed to contain A, we may use the monotone class theorem (Theorem A.1.18),to conclude that C contains the σ-algebra B generated by A. That is preciselywhat we wanted to prove.

Now we explain how one may use the fact that the Lebesgue measure isinvariant under f , together with the Poincare recurrence theorem, to reachsome interesting conclusions. The transformation f is directly related to theusual decimal expansion of a real number: if x is given by

x = 0, a0a1a2a3 · · ·

with ai ∈ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and ai 6= 9 for infinitely many values of i, thenits image under f is given by

f(x) = 0, a1a2a3 · · · .

Thus, more generally, the n-th iterate of f can be expressed in the followingway, for every n ≥ 1:

fn(x) = 0, anan+1an+2 · · · (1.3.2)

Let E be the subset of points x ∈ [0, 1] whose decimal expansion starts withthe digit 7, that is, such that a0 = 7. According to Theorem 1.2.1, almost everyelement in E has infinitely many iterates that are also in E. By the expression(1.3.2), this means that there are infinitely many values of n such that an = 7.So, we have shown that almost every number x whose decimal expansion startswith 7 has infinitely many digits equal to 7.

Of course, instead of 7 we may consider any other digit. Even more, there isa similar result (see Exercise 1.3.2) when, instead of a single digit, one considersa block of k ≥ 1 consecutive digits. Later on, in Chapter 3, we will provea much stronger fact: for almost every number x ∈ [0, 1], every digit occurswith frequency 1/10 (more generally, every block of k ≥ 1 digits occurs withfrequency 1/10k) in the decimal expansion of x.


1.3.2 Gauss map

The system we present in this section is related to another important algorithmin Number Theory, the continued fraction expansion, which plays a central rolein the problem of finding the best rational approximation to any real number.Let us start with a brief presentation of this algorithm.

Given any number x0 ∈ (0, 1), let

a1 =

[1

x0

]and x1 =

1

x0− a1.

Note that a1 is a natural number, x1 ∈ [0, 1) and

x0 =1

a1 + x1.

Supposing that x1 is different from zero, we may repeat this procedure, defining

a2 =

[1

x1

]and x2 =

1

x1− a2.

Then

x1 =1

a1 + x2and so x0 =

1

a1 +1

a2 + x2

.

Now we may proceed by induction: for each n ≥ 1 such that xn−1 ∈ (0, 1),define

an =

[1

xn−1

]and xn =

1

xn−1− an = G(xn−1),

and observe that

x0 =1

a1 +1

a2 +1

· · · + 1

an + xn

. (1.3.3)

It can be shown that the sequence

zn =1

a1 +1

a2 +1

· · · + 1

an

(1.3.4)

converges to x0 when n→ ∞. This is usually expressed through the expression

x0 =1

a1 +1

a2 +1

· · · + 1

an +1

· · ·

, (1.3.5)

1.3. EXAMPLES 13

which is called continued fraction expansion of x0.Note that the sequence (zn)n defined by the relation (1.3.4) consists of ratio-

nal numbers. Indeed, one can show that these are the best rational approxima-tions of the number x0, in the sense that each zn is closer to x0 than any otherrational number whose denominator is smaller or equal than the denominatorof zn (written in irreducible form). Observe also that to obtain (1.3.5) we hadto assume that xn ∈ (0, 1) for every n ∈ N. If in the course of the process oneencounters some xn = 0, then the algorithm halts and we consider (1.3.3) to bethe continued fraction expansion of x0. It is clear that this can happen only ifx0 itself is a rational number.

This continued fraction algorithm is intimately related to a certain dynamicalsystem on the interval [0, 1] that we describe in the sequel. The Gauss mapG : [0, 1] → [0, 1] is defined by

G(x) =1

x−[

1

x

]= fractional part of 1/x,

if x ∈ (0, 1] andG(0) = 0. The graph ofG can be easily sketched (see Figure 1.2),starting from the following observation: for every x in each interval Ik = (1/(k+1), 1/k], the integer part of 1/x is equal to k and so G(x) = 1/x− k.

...

0 1

1

1/21/31/4

Figure 1.2: Gauss map

The continued fraction expansion of any number x0 ∈ (0, 1) can be obtainedfrom the Gauss map, in the following way: for each n ≥ 1, the natural numberan is determined by

Gn−1(x0) ∈ Ian

and the real number xn is simply the n-th iterate Gn(x0) of the point x0. Thisprocess halts whenever we encounter some xn = 0; as we explained previously,this can only happen if x0 is a rational number (see Exercise 1.3.4). In particular,there exists a full Lebesgue measure subset of (0, 1) such that all the iterates ofG are defined for all the points in that subset.


A remarkable fact that makes this transformation interesting from the pointof view of Ergodic Theory is that G admits an invariant probability measurethat, in addition, is equivalent to the Lebesgue measure on the interval. Indeed,consider the measure defined by

µ(E) =

∫

E

c

1 + xdx for every measurable set E ⊂ [0, 1], (1.3.6)

where c is a positive constant. The integral is well defined, since the functionin the integral is continuous on the interval [0, 1]. Moreover, this function takesvalues inside the interval [c/2, c] and that implies

c

2m(E) ≤ µ(E) ≤ cm(E) for every measurable set E ⊂ [0, 1]. (1.3.7)

In particular, µ is indeed equivalent to the Lebesgue measure m.

Proposition 1.3.2. The measure µ is invariant under G. Moreover, if wechoose c = 1/ log 2 then µ is a probability measure.

Proof. We are going to use the following lemma:

Lemma 1.3.3. Let f : [0, 1] → [0, 1] be a transformation such that there existpairwise disjoint open intervals I1, I2, . . . satisfying

1. the union ∪kIk has full Lebesgue measure in [0, 1] and

2. the restriction fk = f | Ik to each Ik is a diffeomorphism onto (0, 1).

Let ρ : [0, 1] → [0,∞) be an integrable function (relative to the Lebesgue measure)satisfying

ρ(y) =∑

x∈f−1(y)

ρ(x)

|f ′(x)| (1.3.8)

for almost every y ∈ [0, 1]. Then the measure µ = ρdx is invariant under f .

Proof. Let φ = χE be the characteristic function of an arbitrary measurable setE ⊂ [0, 1]. Changing variables in the integral,

∫

Ik

φ(f(x))ρ(x) dx =

∫ 1

0

φ(y)ρ(f−1k (y))|(f−1

k )′(y)| dy.

Note that (f−1k )′(y) = 1/f ′(f−1

k (y)). So, the previous relation implies that

∫ 1

0

φ(f(x))ρ(x) dx =∞∑

k=1

∫

Ik

φ(f(x))ρ(x) dx

=

∞∑

k=1

∫ 1

0

φ(y)ρ(f−1

k (y))

|f ′(f−1k (y))| dy.

(1.3.9)

1.3. EXAMPLES 15

Using the monotone convergence theorem (Theorem A.2.9) and the hypothesis(1.3.8), we see that the last expression in (1.3.9) is equal to

∫ 1

0

φ(y)∞∑

k=1

ρ(f−1k (y))

|f ′(f−1k (y))| dy =

∫ 1

0

φ(y)ρ(y) dy.

In this way we find that∫ 1

0φ(f(x))ρ(x) dx =

∫ 1

0φ(y)ρ(y) dy. Since µ = ρdx and

φ = XE , this means that µ(f−1(E)) = µ(E) for every measurable set E ⊂ [0, 1].In other words, µ is invariant under f .

To conclude the proof of Proposition 1.3.2 we must show that the condition(1.3.8) holds for ρ(x) = c/(1 + x) and f = G. Let Gk denote the restriction ofG to the interval Ik = (1/(k+1), 1/k), for k ≥ 1. Note that G−1

k (y) = 1/(y+k)for every k. Note also that G′(x) = (1/x)′ = −1/x2 for every x 6= 0. Therefore,

∞∑

k=1

ρ(G−1k (y))

|G′(G−1k (y))| =

∞∑

k=1

c(y + k)

y + k + 1

( 1

y + k

)2=

∞∑

k=1

c

(y + k)(y + k + 1). (1.3.10)

Observing that

1

(y + k)(y + k + 1)=

1

y + k− 1

y + k + 1,

we see that the last sum in (1.3.10) has a telescopic structure: except for thefirst one, all the terms occur twice, with opposite signs, and so they cancel out.This means that the sum is equal to the first term:

∞∑

k=1

c

(y + k)(y + k + 1)=

c

y + 1= ρ(y).

This proves that the equality (1.3.8) is indeed satisfied and, hence, we may useLemma 1.3.1 to conclude that µ is invariant under f .

Finally, observing that c log(1 + x) is a primitive of the function ρ(x), wefind that

µ([0, 1]) =

∫ 1

0

c

1 + xdx = c log 2.

So, picking c = 1/ log 2 ensures that µ is a probability measure.

This proposition allows us to use ideas from Ergodic Theory, applied to theGauss map, to obtain interesting conclusions in Number Theory. For example(see Exercise 1.3.3), the natural number 7 occurs infinitely many times in thecontinued fraction expansion of almost every number x0 ∈ (1/8, 1/7), that is,one has an = 7 for infinitely many values of n ∈ N. Later on, in Chapter 3, wewill prove a much more precise statement, that contains the following conclusion:for almost every x0 ∈ (0, 1) the number 7 occurs with frequency

1

log 2log

64

63

in the continued fraction expansion of x0. Try to guess right away where thisnumber comes from!


1.3.3 Circle rotations

Let us consider on the real line R the equivalence relation ∼ that identifies anynumbers whose difference is an integer number:

x ∼ y ⇔ x− y ∈ Z.

We represent by [x] ∈ R/Z the equivalence class of each x ∈ R and denote byR/Z the space of all equivalence classes. This space is called the circle andis also denoted by S1. The reason for this terminology is that R/Z can beidentified in a natural way with the unit circle z ∈ C : |z| = 1 on the complexplane, by means of the map

φ : R/Z → z ∈ C : |z| = 1, [x] 7→ e2πxi. (1.3.11)

Note that φ is well defined: since the function x 7→ e2πxi is periodic of period1, the expression e2πxi does not depend on the choice of a representative x forthe class [x]. Moreover, φ is a bijection.

The circle R/Z inherits from the real line R the structure of an abelian group,given by the operation

[x] + [y] = [x+ y].

Observe that this is well defined: the equivalence class on the right-hand sidedoes not depend on the choice of representatives x and y for the classes on theleft-hand side. Given θ ∈ R, we call rotation of angle θ the transformation

Rθ : R/Z → R/Z, [x] 7→ [x+ θ] = [x] + [θ].

Note that Rθ corresponds, via the identification (1.3.11), to the transformationz 7→ e2πθiz on z ∈ C : |z| = 1. The latter is just the restriction to the unitcircle of the rotation of angle 2πθ around the origin in the complex plane. It isclear from the definition that R0 is the identity map and Rθ Rτ = Rθ+τ forevery θ and τ . In particular, every Rθ is invertible and the inverse is R−θ.

We can also endow S1 with a natural structure of a probability space, asfollows. Let π : R → S1 be the canonical projection, that assigns to each x ∈ Rits equivalence class [x]. We say that a set E ⊂ S1 is measurable if π−1(E) is ameasurable subset of the real line. Next, let m be the Lebesgue measure on thereal line. We define the Lebesgue measure µ on the circle to be given by

µ(E) = m(π−1(E) ∩ [k, k + 1)

)for every k ∈ Z.

Note that the left-hand side of this equality does not depend on k, since, bydefinition, π−1(E) ∩ [k, k + 1) =

(π−1(E) ∩ [0, 1)

)+ k and the measure m is

invariant under translations.

It is clear from the definition that µ is a probability. Moreover, µ is invariantunder every rotation Rθ (according to Exercise 1.3.8, it is the only probabilitymeasure with this property). This can be shown as follows. By definition,

1.3. EXAMPLES 17

π−1(R−1θ (E)) = π−1(E) − θ for every measurable set E ⊂ S1. Let k be the

integer part of θ. Since m is invariant under all the translations,

m((π−1(E) − θ) ∩ [0, 1)

)= m

(π−1(E) ∩ [θ, θ + 1)

)

= m(π−1(E) ∩ [θ, k + 1)

)+m

(π−1(E) ∩ [k + 1, θ + 1)

).

Note that π−1(E)∩ [k+ 1, θ+ 1) =(π−1(E)∩ [k, θ)

)+ 1. So, the expression on

the right-hand side of the previous equality may be written as

m(π−1(E) ∩ [θ, k + 1)

)+m

(π−1(E) ∩ [k, θ)

)= m

(π−1(E) ∩ [k, k + 1)

).

Combining these two relations we find that

µ(R−1θ (E)

)= m

(π−1(R−1

θ (E) ∩ [0, 1)))

= m(π−1(E) ∩ [k, k + 1)

)= µ(E)

for every measurable set E ⊂ S1.The rotations Rθ : S1 → S1 exhibit two very different types of dynamical

behavior, depending on the value of θ. If θ is rational, say θ = p/q with p ∈ Zand q ∈ N, then

Rqθ([x]) = [x+ qθ] = [x] for every [x].

Consequently, in this case every point x ∈ S1 is periodic with period q. In theopposite case we have:

Proposition 1.3.4. If θ is irrational then O([x]) = Rnθ ([x]) : n ∈ N is adense subset of the circle R/Z for every [x].

Proof. We claim that the set D = m + nθ : m ∈ Z, n ∈ N is dense in R.Indeed, consider any number r ∈ R. Given any ε > 0, we may choose p ∈ Z andq ∈ N such that |qθ − p| < ε. Note that the number a = qθ − p is necessarilydifferent from zero, since θ is irrational. Let us suppose that a is positive (thecase when a is negative is analogous). Subdividing the real line into intervals oflength a, we see that there exists an integer number l such that 0 ≤ r− la < a.This implies that

|r − (lqθ − lp)| = |r − la| < a < ε.

As m = lq and n = −lq are integer numbers and ε is arbitrary, this proves thatr is in the closure of the set D, for every r ∈ R.

Now, given y ∈ R and ε > 0, we may take r = y− x and, using the previousparagraph, we may find m,n ∈ Z such that |m + nθ − (y − x)| < ε. This isequivalent to saying that the distance from [y] to the iterate Rnθ ([x]) is less thanε. Since x, y and ε are arbitrary, this shows that every orbit O([x]) is dense inS1.

In particular, it follows that every point on the circle is recurrent for Rθ (thisis also true when θ is rational). The previous proposition also leads to someinteresting conclusions in the study of the invariant measures of Rθ. Amongother things, we will learn later (in Chapter 6) that if θ is irrational then theLebesgue measure is the unique probability measure that is preserved by Rθ.Related to this, we will see that the orbits of Rθ are uniformly distributedsubsets of S1.


1.3.4 Rotations on tori

The notions we just presented can be generalized to arbitrary dimension, as weare going to explain. For each d ≥ 1, consider the equivalence relation on Rd

that identifies any two vectors whose difference is an integer vector:

(x1, . . . , xd) ∼ (y1, . . . , yd) ⇔ (x1 − y1, . . . , xd − yd) ∈ Zd.

We denote by [x] or [(x1, . . . , xd)] the equivalence class of any x = (x1, . . . , xd).Then we call d-dimensional torus or, simply, d-torus the space

Td = Rd/Zd = (R/Z)d

formed by those equivalence classes. Let m be the Lebesgue measure on Rd.The operation

[(x1, . . . , xd)] + [(y1, . . . , yd)] = [(x1 + y1, . . . , xd + yd)]

is well defined and turns Td into an abelian group. Given θ = (θ1, . . . , θd) ∈ Rd,we call

Rθ : Td → Td, Rθ([x]) = [x] + [θ]

the rotation by θ (sometimes, Rθ is also called the translation by θ). The map

φ : [0, 1]d → Td, (x1, . . . , xd) 7→ [(x1, . . . , xd)]

is surjective and allows us to define a Lebesgue probability measure µ on thed-torus, through the following formula:

µ(B) = m(φ−1(B)

)for every B ⊂ Td such that φ−1(B) is measurable.

This measure µ is invariant under Rθ for every θ.We say that a vector θ = (θ1, . . . , θd) ∈ Rd is rationally independent if, for

any integer numbers n0, n1, . . . , nd, we have that

n0 + n1θ1 + · · · + ndθd = 0 ⇒ n0 = n1 = · · · = nd = 0.

Otherwise, we say that θ is rationally dependent. One can show that θ isrationally independent if and only if the rotation Rθ is minimal, meaning thatthe orbit O([x]) = Rnθ ([x]) : n ∈ N of every [x] ∈ Td is a dense subset of Td.In this regard, see Exercises 1.3.9–1.3.10 and also Corollary 4.2.3.

1.3.5 Conservative maps

Let M be an open subset of the Euclidian space Rd and f : M → M be aC1 diffeomorphism. This means that f is a bijection, both f and its inversef−1 are differentiable and the two derivatives are continuous. Denote by volthe restriction to M of the Lebesgue measure (volume measure) on Rd. Theformula of change of variables asserts that, for any measurable set B ⊂M ,

vol(f(B)) =

∫

B

| detDf | dx. (1.3.12)

One can easily deduce the following consequence:

1.3. EXAMPLES 19

Lemma 1.3.5. A C1 diffeomorphism f : M →M preserves the volume measurevol if and only if the absolute value | detDf | of its Jacobian is equal to 1 at everypoint.

Proof. Suppose that the absolute value | detDf | of its Jacobian is equal to 1 atevery point. Let E be any measurable set E and B = f−1(E). The formula(1.3.12) yields

vol(E) =

∫

B

1 dx = vol(B) = vol(f−1(E)).

This means that f preserves the measure vol and so we proved the “if” part ofthe statement.

To prove the “only if”, suppose that | detDf(x)| > 1 for some point x ∈M .Then, since the Jacobian is continuous, there exists a neighborhood U of x andsome number σ > 1 such that

| detDf(y)| ≥ σ for all y ∈ U.

Then, applying (1.3.12) to B = U , we get that

vol(f(U)) ≥∫

U

σ dx ≥ σ vol(U).

Denote E = f(U). Since vol(U) > 0, the previous inequality implies thatvol(E) > vol(f−1(E)). Hence, f does not leave vol invariant. In precisely thesame way one shows that if | detDf(x)| < 1 for some point x ∈M then f doesnot leave the measure vol invariant.

1.3.6 Conservative flows

Now we discuss the invariance of the volume measure in the setting of flows f t :M →M , t ∈ R. As before, take M to be an open subset of the Euclidean spaceRd. Let us suppose that the flow is C1, in the sense that the map (t, x) 7→ f t(x)is differentiable and all the derivatives are continuous. Then, in particular, everyflow transformation f t : M → M is a C1 diffeomorphism: the inverse is f−t.Since f0 is the identity map and the Jacobian varies continuously, we have thatdetDf t(x) > 0 at every point.

Applying Lemma 1.3.5 in this context, we find that the flow preserves thevolume measure if and only if

detDf t(x) = 1 for every x ∈ U and every t ∈ R. (1.3.13)

However, this is not very useful in practice because most of the times we donot have an explicit expression for f t and, hence, it is not clear how to checkthe condition (1.3.13). Fortunately, there is a reasonably explicit expression forthe Jacobian of the flow that can be used in some interesting situations. Let usexplain this.


Let us suppose that the flow f t : M →M corresponds to the trajectories ofa C1 vector field F : M → Rd. In other words, each t 7→ f t(x) is the solutionof the differential equation

dy

dt= F (y) (1.3.14)

that has x as the initial condition (when dealing with differential equations wealways assume that their solutions are defined for all times).

The Liouville formula relates the Jacobian of f t to the divergence divF ofthe vector field:

detDf t(x) = exp( ∫ t

0

divF (fs(x)) ds)

for every x and every t.

Recall that the divergence of a vector field F is the trace of its Jacobian matrix,that is

divF =∂F1

∂x1+ · · · + ∂Fd

∂xd. (1.3.15)

Combining the Liouville formula with (1.3.13), we obtain

Lemma 1.3.6 (Liouville). The flow (f t)t associated with a C1 vector field Fpreserves the volume measure if and only if the divergence of F is identicallyzero.

We can extend this discussion to the case when M is any Riemannian man-ifold of dimension d ≥ 2. The reader who is unfamiliar with this notion maywish to check Appendix A.4.5 before proceeding.

For simplicity, let us suppose that the manifold is orientable. Then thevolume measure on M is given by a differentiable d-form ω, called volume form(this remains true in the non-orientable case, except that the form ω is definedup to sign only). What this means is that the volume of any measurable set Bcontained in the domain of local coordinates (x1, . . . , xd) is given by

vol(B) =

∫

B

ρ(x1, . . . , xd) dx1 · · · dxd

where ω = ρdx1 · · ·dxd is the expression of the volume form in those localcoordinates. Let F be a C1 vector field on M . Writing,

F (x1, . . . , xd) = (F1(x1, . . . , xd), . . . , Fd(x1, . . . , xd)),

we may express the divergence as

divF =∂(ρF )

∂x1+ · · · + ∂(ρF )

∂xd

(it can be shown that the right-hand side does not depend on the choice of thelocal coordinates). A proof of the following generalization of Lemma 1.3.6 canbe found in the book of Sternberg [Ste58]:

1.3. EXAMPLES 21

Theorem 1.3.7 (Liouville). The flow (f t)t associated with a C1 vector fieldF on a Riemannian manifold preserves the volume measure on the manifold ifand only if divF = 0 at every point.

Then, it follows from the recurrence theorem for flows that, assuming thatthe manifold has finite volume (for example, if M is compact) and divF = 0,then almost every point is recurrent for the flow of F .

1.3.7 Exercises

1.3.1. Use Lemma 1.3.3 to give another proof of the fact that the decimalexpansion transformation f(x) = 10x − [10x] preserves the Lebesgue measureon the interval.

1.3.2. Prove that, for any number x ∈ [0, 1] whose decimal expansion containsthe block 617 (for instance, x = 0, 3375617264 · · · ), that block occurs infinitelymany times in the decimal expansion of x. Even more, the block 617 occursinfinitely many times in the decimal expansion of almost every x ∈ [0, 1].

1.3.3. Prove that the number 617 appears infinitely many times in the continuedfraction expression of almost every number x0 ∈ (1/618, 1/617), that is, one hasan = 617 for infinitely many values of n ∈ N.

1.3.4. Let G be the Gauss map. Show that a number x ∈ (0, 1) is rational ifand only if there exists n ≥ 1 such that Gn(x) = 0.

1.3.5. Consider the sequence 1, 2, 4, 8, . . . , an = 2n, . . . of all the powers of 2.Prove that, given any digit i ∈ 1, . . . , 9, there exist infinitely many values ofn for which an starts with that digit.

1.3.6. Prove the following extension of Lemma 1.3.3. Let f : M → M be aC1 local diffeomorphism on a compact Riemannian manifold M . Let vol be thevolume measure on M and ρ : M → [0,∞) be a continuous function. Then fpreserves the measure µ = ρ vol if and only if

∑

x∈f−1(y)

ρ(x)

| detDf(x)| = ρ(y) for every y ∈M.

When f is invertible this means that f preserves the measure µ if and only ifρ(x) = ρ(f(x))| detDf(x)| for every x ∈M .

1.3.7. Check that if A is a d×d-matrix with integer coefficients and determinantdifferent from zero then the transformation fA : Td → Td defined on the torusby fA([x]) = [A(x)] preserves the Lebesgue measure on Td.

1.3.8. Show that the Lebesgue measure on S1 is the only probability measureinvariant under all the rotations of S1, even if we restrict to rational rotations.[Note: We will see in Chapter 6 that, for any irrational θ, the Lebesgue measureis the unique probability measure invariant under Rθ.]


1.3.9. Suppose that θ = (θ1, . . . , θd) is rationally dependent. Show that thereexists a continuous non-constant function ϕ : Td → C such that ϕ Rθ = ϕ.Conclude that there exist non-empty open subsets U and V of Td that aredisjoint and invariant under Rθ, in the sense that Rθ(U) = U and Rθ(V ) = V .Deduce that no orbit O([x]) of the rotation Rθ is dense in Td.

1.3.10. Suppose that θ = (θ1, . . . , θd) is rationally independent. Prove that ifV is a non-empty open subset of Td invariant under Rθ, then V is dense inTd. Conclude that ∪n∈ZR

nθ (U) is dense in the torus, for every non-empty open

subset U . Deduce that there exists [x] whose orbit O([x]) under the rotationRθ is dense in Td. Conclude that O([y]) is dense in Td for every [y].

1.3.11. Let U be an open subset of R2d and H : U → R be a C2 function. De-note by (p1, . . . , pd, q1, . . . , qd) the coordinate variables in R2d. The Hamiltonianvector field associated with H is defined by

F (p1 , . . . , pd , q1 , . . . , qd) =

(∂H

∂q1, . . . ,

∂H

∂qd,−∂H

∂p1, . . . , −∂H

∂pd

).

Check that the flow defined by F preserves the volume measure.

1.3.12. Let f : U → U be a C1 diffeomorphism preserving the volume measureon an open subset U of Rd. Let H : U → R be a first integral of f , that is, aC1 function such that H f = H . Let c be a regular value of H and ds be thevolume measure defined on the hypersurface Hc = H−1(c) by the restriction ofthe Riemannian metric of Rd. Prove that the restriction of f to the hypersurfaceHc preserves the measure ds/‖ gradH‖.

1.4 Induction

In this section we describe a general method, based on the Poincare recurrencetheorem, to construct from a given system (f, µ) other systems, that we refer toas systems induced by (f, µ). The reason this is interesting is the following. Onthe one hand, it is often the case that an induced system is easier to analyze,because it has better global properties than the original one. On the otherhand, interesting conclusions about the original system can often be obtainedfrom analyzing the induced one. Examples will appear in a while.

1.4.1 First return map

Let f : M →M be a measurable transformation and µ be an invariant probabil-ity measure. Let E ⊂M be a measurable set with µ(E) > 0 and ρ(x) = ρE(x)be the first-return time of x to E, as given by (1.2.1). The first-return map tothe domain E is the map g given by

g(x) = fρ(x)(x)

1.4. INDUCTION 23

whenever ρ(x) is finite. The Poincare recurrence theorem ensures that this isthe case for µ-almost every x ∈ E and so g is defined on a full measure subsetof E. We also denote by µE the restriction of µ to the measurable subsets E.

Proposition 1.4.1. The measure µE is invariant under the map g : E → E.

Proof. For every k ≥ 1, denote by Ek the subset of points x ∈ E such thatρ(x) = k. By definition, g(x) = fk(x) for every x ∈ Ek. Let B be anymeasurable subset of E. Then

µ(g−1(B)) =

∞∑

k=1

µ(f−k(B) ∩ Ek). (1.4.1)

On the other hand, since µ is f -invariant,

µ(B)

= µ(f−1(B)

)= µ

(f−1(B) ∩ E1

)+ µ

(f−1(B) \ E

). (1.4.2)

Analogously,

µ(f−1(B) \ E

)= µ

(f−2(B) \ f−1(E)

)

= µ(f−2(B) ∩ E2

)+ µ

(f−2(B) \ (E ∪ f−1(E))

).

Replacing this expression in (1.4.2), we find that

µ(B)

=2∑

k=1

µ(f−k(B) ∩ Ek

)+ µ

(f−2(B) \

1⋃

k=0

f−k(E)).

Repeating this argument successively, we obtain

µ(B)

=n∑

k=1

µ(f−k(B) ∩Ek

)+ µ

(f−n(B) \

n−1⋃

k=0

f−k(E)). (1.4.3)

Now let us go to the limit when n → ∞. It is clear that the last term isbounded above by µ

(f−n(E) \ ⋃n−1

k=0 f−k(E)

). So, using Remark 1.2.3, that

term converges to zero when n→ ∞. In this way we conclude that

µ(B)

=

∞∑

k=1

µ(f−k(B) ∩ Ek

).

Together with (1.4.1), this shows that µ(g−1(B)) = µ(B) for every measurablesubset B of E. That is to say, the measure µE is invariant under g.

Example 1.4.2. Consider the transformation f : [0,∞) → [0,∞) defined by

f(0) = 0 and f(x) = 1/x if x ∈ (0, 1) and f(x) = x− 1 if x ≥ 1.

Let E = [0, 1]. The time ρ of first-return to E is given by

ρ(0) = 1 and ρ(x) = k + 1 if x ∈ (1/(k + 1), 1/k] with k ≥ 1.


So, the first-return map to E is given by

g(0) = 0 and g(x) = 1/x− k if x ∈ (1/(k + 1), 1/k] with k ≥ 1.

In other words, g is the Gauss map. We saw in Section 1.3.2 that the Gauss mapadmits an invariant probability measure equivalent to the Lebesgue measure on[0, 1). From this one can draw some interesting conclusions about the originalmap f . For instance, using the ideas in the next section one gets that f admitsan (infinite) invariant measure equivalent to the Lebesgue measure on [0,∞).

1.4.2 Induced transformations

In an opposite direction, given any measure ν invariant under g : E → E, wemay construct a certain related measure νρ that is invariant under f : M →M .For this g does not even have to be a first-return map: the construction thatwe present in the sequel is valid for any map induced from f , that is, any mapof the form

g : E → E, g(x) = fρ(x)(x), (1.4.4)

where ρ : E → N is a measurable function (it suffices that ρ is defined on somefull measure subset of E). As before, we denote by Ek the subset of pointsx ∈ E such that ρ(x) = k. Then we define:

νρ(B) =

∞∑

n=0

∑

k>n

ν(f−n(B) ∩ Ek), (1.4.5)

for every measurable set B ⊂M .

Proposition 1.4.3. The measure νρ defined in (1.4.5) is invariant under f andsatisfies νρ(M) =

∫E ρ dν. In particular, νρ is finite if and only if the function

ρ is integrable with respect to ν.

Proof. First, let us prove that νρ is invariant. By the definition (1.4.5),

νρ(f−1(B)

)=

∞∑

n=0

∑

k>n

ν(f−(n+1)(B) ∩Ek

)=

∞∑

n=1

∑

k≥n

ν(f−n(B) ∩ Ek

).

We may rewrite this expression as follows:

νρ(f−1(B)

)=

∞∑

n=1

∑

k>n

ν(f−n(B) ∩Ek

)+

∞∑

k=1

ν(f−k(B) ∩Ek

). (1.4.6)

Concerning the last term, observe that

∞∑

k=1

ν(f−k(B) ∩ Ek

)= ν

(g−1(B)

)= ν

(B)

=

∞∑

k=1

ν(B ∩ Ek

),

1.4. INDUCTION 25

since ν is invariant under g. Replacing this in (1.4.6), we see that

νρ(f−1(B)

)=

∞∑

n=1

∑

k>n

ν(f−n(B) ∩ Ek

)+

∞∑

k=1

ν(B ∩ Ek

)= νρ

(B)

for every measurable set B ⊂ E. The second claim is a direct consequence ofthe definitions:

νρ(M) =

∞∑

n=0

∑

k>n

ν(f−n(M) ∩ Ek) =

∞∑

n=0

∑

k>n

ν(Ek) =

∞∑

k=1

kν(Ek) =

∫

E

ρ dν.

This completes the proof.

It is interesting to analyze how this construction relates to the one in theprevious section when g is a first-return map of f and the measure ν is therestriction µ | E of some invariant measure µ of f :

Corollary 1.4.4. If g is the first-return map of f to a measurable subset E andν = µ | E, then

1. νρ(B) = ν(B) = µ(B) for every measurable set B ⊂ E.

2. νρ(B) ≤ µ(B) for every measurable set B ⊂M .

Proof. By definition, f−n(E) ∩ Ek = ∅ for every 0 < n < k. This implies that,given any measurable set B ⊂ E, all the terms with n > 0 in the definition(1.4.5) are zero. Hence, νρ(B) =

∑k>0 ν(B∩Ek) = ν(B) as claimed in the first

part of the statement.Consider any measurable set B ⊂M . Then,

µ(B)

= µ(B ∩ E

)+ µ

(B ∩Ec

)= ν(B ∩ E) + µ

(B ∩ Ec

)

=∞∑

k=1

ν(B ∩ Ek

)+ µ

(B ∩ Ec

).

(1.4.7)

Since µ is invariant, µ(B∩Ec) = µ(f−1(B)∩f−1(Ec)

). Then, as in the previous

equality,

µ(B ∩ Ec

)= µ

(f−1(B) ∩E ∩ f−1(Ec)

)+ µ

(f−1(B) ∩ Ec ∩ f−1(Ec)

)

=

∞∑

k=2

ν(f−1(B) ∩Ek

)+ µ

(f−1(B) ∩ Ec ∩ f−1(Ec)

).

Replacing this in (1.4.7), we find that

µ(B)

=1∑

n=0

∑

k>n

ν(f−n(B) ∩ Ek

)+ µ

(f−1(B) ∩

1⋂

n=0

f−n(Ec)).


Repeating this argument successively, we get that

µ(B)

=

N∑

n=0

∑

k>n

ν(f−n(B) ∩ Ek

)+ µ

(f−N (B) ∩

N⋂

k=0

f−n(Ec))

≥N∑

n=0

∑

k>n

ν(f−n(B) ∩ Ek

)for every N ≥ 1.

Passing to the limit as N → ∞, we conclude that µ(B) ≥ νρ(B).

We also have from the Kac theorem (Theorem 1.2.2) that

νρ(M) =

∫

E

ρ dν =

∫

E

ρ dµ = µ(M) − µ(E∗0 ).

So, it follows from Corollary 1.4.4 that νρ = µ if and only if µ(E∗0 ) = 0.

Example 1.4.5 (Manneville-Pomeau). Given d > 0, let a be the only numberin (0, 1) such that a(1 + ad) = 1. Then define f : [0, 1] → [0, 1] as follows:

f(x) = x(1 + xd) if x ∈ [0, a] and f(x) =x− a

1 − aif x ∈ (a, 1].

The graph of f is depicted on the left-hand side of Figure 1.3. Observe that|f ′(x)| ≥ 1 at every point, and the inequality is strict at every x > 0. Let (an)nbe the sequence on the interval [0, a] defined by a1 = a and f(an+1) = an forn ≥ 1. We also write a0 = 1. Some properties of this sequence are studied inExercise 1.4.2.

...

00 1

1

1

1

a1

a1a1

f g

a2

a2a2 a3a3

Figure 1.3: Construction of an induced transformation

Now consider the map g(x) = fρ(x)(x), where

ρ : [0, 1] → N, ρ(x) = 1 + minn ≥ 0 : fn(x) ∈ (a, 1].

1.4. INDUCTION 27

In other words, ρ(x) = k and so g(x) = fk(x) for every x ∈ (ak, ak−1]. Thegraph of g is represented on the right-hand side of Figure 1.3. Note that therestriction to each interval (ak, ak−1] is a bijection onto (0, 1]. A key point isthat the induced map g is expanding:

|g′(x)| ≥ 1

1 − a> 1 for every x ∈ [0, 1].

Using the ideas that will be developed in Chapter 11, one can show that gadmits a unique invariant probability measure ν equivalent to the Lebesguemeasure on (0, 1]. In fact, the density (Radon-Nikodym derivative) of ν withrespect to the Lebesgue measure is bounded from zero and infinity. Then, thef -invariant measure νρ in (1.4.5) is equivalent to Lebesgue measure. It follows(see Exercise 1.4.2) that this measure is finite if and only if d ∈ (0, 1).

1.4.3 Kakutani-Rokhlin towers

It is possible, and useful, to generalize the previous constructions even further,by omitting the initial transformation f : M → M altogether. More precisely,given a transformation g : E → E, a measure ν on E invariant under g anda measurable function ρ : E → N, we are going to construct a transformationf : M → M and a measure νρ invariant under f such that E can be identifiedwith a subset of M , g is the first-return map of f to E, with first-return timegiven by ρ, and the restriction of νρ to E coincides with ν.

This transformation f is called the Kakutani-Rokhlin tower of g with timeρ. The measure νρ is finite if and only if ρ is integrable with respect to ν. Theyare constructed as follows. Begin by defining:

M = (x, n) : x ∈ E and 0 ≤ n < ρ(x)

=

∞⋃

k=1

k−1⋃

n=0

Ek × n.

In other words, M consists of k copies of each set Ek = x ∈ E : ρ(x) = k,“piled up” on top of each other. We call each ∪k>nEk × n the n-th floor ofM . See Figure 1.4.

Next, define f : M →M as follows:

f(x, n) =

(x, n+ 1) if n < ρ(x) − 1(g(x), 0) if n = ρ(x) − 1

.

In other words, the dynamics “lifts” each point (x, n) one floor at a time, untilreaching the floor ρ(x) − 1; at that stage, the point “falls” directly to (g(x), 0)on the ground (zero-th) floor. The ground floor E × 0 is naturally identifiedwith the set E. Besides, the first-return map to E × 0 corresponds preciselyto g : E → E.

Finally, the measure νρ is defined by

νρ | (Ek × n) = ν | Ek


......

E1 E2 E3 Ekground floor

1st floor

2nd floor

(k − 1)-st floor

k-th floor

g

Figure 1.4: Kakutani-Rokhlin tower of g with time ρ

for every 0 ≤ n < k. It is clear that the restriction of νρ to the ground floorcoincides with ν. Moreover, νρ is invariant under f and

νρ(M) =∞∑

k=1

kν(Ek) =

∫

E

ρ dν.

This completes the construction of the Kakutani-Rokhlin tower.

1.4.4 Exercises

1.4.1. Let f : S1 → S1 be the transformation f(x) = 2x mod Z. Show thatthe function τ(x) = mink ≥ 0 : fk(x) ∈ (1/2, 1) is integrable with respectto the Lebesgue measure. State and prove a corresponding result for any C1

transformation g : S1 → S1 that is close to f , in the sense that supx‖g(x) −f(x)‖, ‖g′(x) − f ′(x)‖ is sufficiently small.

1.4.2. Consider the measure νρ and the sequence (an)n defined in Exam-ple 1.4.5. Check that νρ is always σ-finite. Show that (an)n is decreasingand converges to zero. Moreover, there exist c1, c2, c3, c4 > 0 such that

c1 ≤ ajj1/d ≤ c2 and c3 ≤

(aj − aj+1

)j1+1/d ≤ c4 for every j. (1.4.8)

Deduce that the g-invariant measure νρ is finite if and only if d ∈ (0, 1).

1.4.3. Let σ : Σ → Σ be the map defined on the space Σ = 1, . . . , dZ byσ((xn)n) = (xn+1)n. Describe the first-return map g to the subset (xn)n ∈ Σ :x0 = 1.

1.4.4. [Kakutani-Rokhlin lemma] Let f : M →M be an invertible transforma-tion and µ be an invariant probability measure without atoms and such thatµ(∪n∈Nf

n(E)) = 1 for every E ⊂ M with µ(E) > 0. Show that for every

1.5. MULTIPLE RECURRENCE THEOREMS 29

n ≥ 1 and ε > 0 there exists a measurable set B ⊂ M such that the iteratesB, f(B), . . . , fn−1(B) are pairwise disjoint and the complement of their unionhas measure less than ε. In particular, this holds for every invertible systemthat is aperiodic, that is, whose periodic points form a zero measure set.

1.4.5. Let f : M → M be a transformation and (Hj)j≥1 be a collection ofsubsets of M such that if x ∈ Hn then f j(x) ∈ Hn−j for every 0 ≤ j < n. LetH be the set of points that belong to Hj for infinitely many values of j, thatis, H = ∩∞

k=1 ∪∞j=k Hj . For y ∈ H , define τ(y) = minj ≥ 1 : y ∈ Hj and

T (y) = f τ(y)(y). Observe that T maps H inside H . Moreover, show that

lim supn

1

n#1 ≤ j ≤ n : x ∈ Hj ≥ θ > 0 ⇒ lim inf

k

1

k

k−1∑

i=0

τ(T i(x)) ≤ 1

θ.

1.4.6. Let f : M → M be a transformation preserving a measure µ. Let(Hj)j≥1 and τ : M → N be as in Exercise 1.4.5. Consider the sequence offunctions (τn)n defined by τ1(x) = τ(x) and τn(x) = τ(f τn−1(x)(x)) + τn−1(x)for n > 1. Suppose that

lim supn

1

n#1 ≤ j ≤ n : x ∈ Hj ≥ θ > 0 for µ-almost every x ∈M .

Show that τn+1(x)/τn(x) → 1 for µ-almost every x ∈ M . [Note: Sequenceswith this property are called non-lacunary.]

1.5 Multiple recurrence theorems

Now we consider finite families of commuting maps fi : M → M , i = 1, . . . , q,that is, such that

fi fj = fj fi for every i, j ∈ 1, . . . , q.

Our goal is to explain that the results in Section 1.2 extend to this setting: wefind points that are simultaneously recurrent for these transformations.

The first result in this direction generalizes the Birkhoff recurrence theorem(Theorem 1.2.6):

Theorem 1.5.1 (Birkhoff multiple recurrence). Let M be a compact metricspace and f1, . . . , fq : M → M be continuous commuting maps. Then thereexists a ∈M and a sequence (nk)k → ∞ such that

limkfnki (a) = a for every i = 1, . . . , q. (1.5.1)

The key point here is that the sequence (nk)k does not depend on i: we saythat the point a is simultaneously recurrent for all the maps fi, i = 1, . . . , q. Aproof of Theorem 1.5.1 is given in Section 1.5.1. Next, we discuss the followinggeneralization of the Poincare recurrence theorem (Theorem 1.2.1):


Theorem 1.5.2 (Poincare multiple recurrence). Let (M,B, µ) be a probabilityspace and fi : M → M , i = 1, . . . , q be measurable commuting maps that pre-serve the measure µ. Then, given any set E ⊂ M with positive measure, thereexists n ≥ 1 such that

µ(E ∩ f−n

1 (E) ∩ · · · ∩ f−nq (E)

)> 0.

In other words, for a positive measure subset of points x ∈ E, their orbitsunder all the maps fi, i = 1, . . . , q return to E simultaneously at time n (we saythat n is a simultaneous return of x to E): once more, the crucial point withthe statement is that n does not depend on i.

The proof of Theorem 1.5.2 will not be presented here; we refer the inter-ested reader to the book of Furstenberg [Fur77]. We are just going to mentionsome direct consequences and, in Chapter 2, we will use this theorem to provethe Szemeredi theorem on existence of arithmetic progressions inside “dense”subsets of integer numbers.

To begin with, observe that the set of simultaneous returns is always infinite.Indeed, let n be as in the statement of Theorem 1.5.2. Applying the theoremto the set F = E ∩ f−n

1 (E) ∩ · · · ∩ f−nq (E), we find m ≥ 1 such that

µ(E ∩ f−(m+n)

1 (E) ∩ · · · ∩ f−(m+n)q (E)

)

≥ µ(F ∩ f−m

1 (F ) ∩ · · · ∩ f−mq (F )

)> 0.

Thus, m+n is also a simultaneous return to E, for all the points in some subsetof E with positive measure.

It follows that, for any set E ⊂ M with µ(E) > 0 and for µ-almost everypoint x ∈ E, there exist infinitely many simultaneous returns of x to E. Indeed,suppose there is a positive measure set F ⊂ E such that every point of F has afinite number of simultaneous returns to E. On the one hand, up to replacingF by a suitable subset, we may suppose that the simultaneous returns to E ofall the points of F are bounded by some k ≥ 1. On the other hand, using theprevious paragraph, there exists n > k such that G = F ∩f−n

1 (F )∩· · ·∩f−nq (F )

has positive measure. Now, it is clear from the definition that n is a simultaneousreturn to E of every x ∈ G. This contradicts the choice of F , thus proving ourclaim.

Another direct corollary is the Birkhoff multiple recurrence theorem (The-orem 1.5.1). Indeed, if fi : M → M , i = 1, . . . , q are continuous commutingtransformations on a compact metric space then there exists some probabilitymeasure µ that is invariant under all these transformations (this fact will bechecked in the next chapter, see Exercise 2.2.2). From this point on, we mayargue exactly as in the proof of Theorem 1.2.4. More precisely, consider anycountable basis Uk for the topology of M . According to the previous para-graph, for every k there exists a set Uk ⊂ Uk with zero measure such that everypoint in Uk\Uk has infinitely many simultaneous returns to Uk. Then U = ∪kUkhas measure zero and every point in its complement is simultaneously recurrent,in the sense of Theorem 1.5.1.


1.5.1 Birkhoff multiple recurrence theorem

In this section we prove Theorem 1.5.1 in the case when the transformationsf1, . . . , fq are homeomorphisms of M , which suffices for all our purposes in thepresent chapter. The general case may be deduced easily (see Exercise 2.4.7)using the concept of natural extension, which we will present in the next chapter.

The theorem may be reformulated in the following useful way. Consider thetransformation F : M q →M q defined on the product space M q = M × · · ·×Mby F (x1, . . . , xq) = (f1(x1), . . . , fq(xq)). Denote by ∆q the diagonal of M q,that is, the subset of points of the form x = (x, . . . , x). Theorem 1.5.1 claims,precisely, that there exist a ∈ ∆q and (nk)k → ∞ such that

limkFnk(a) = a. (1.5.2)

The proof of Theorem 1.5.1 is by induction on the number q of transforma-tions. The case q = 1 is contained in Theorem 1.2.6. Consider any q ≥ 2 andsuppose that the statement is true for every family of q− 1 commuting homeo-morphisms. We are going to prove that it is true for the family f1, . . . , fq.

Let G be the (abelian) group generated by the homeomorphisms f1, . . . , fq.We say that a set X ⊂M is G-invariant if g(X) ⊂ X for every g ∈ G. Observingthat the inverse g−1 is also in G, we see that this implies g(X) = X for everyg ∈ G. Just as we did in Theorem 1.2.6, we may use Zorn’s lemma to concludethat there exists some minimal, non-empty, closed, G-invariant set X ⊂M (thisis Exercise 1.5.2). The statement of the theorem is not affected if we replace Mby X . Thus, it is no restriction to assume that the ambient space M is minimal.This assumption is used as follows:

Lemma 1.5.3. If M is minimal then for every non-empty open set U ⊂ Mthere exists a finite subset H ⊂ G such that

⋃

h∈H

h−1(U) = M.

Proof. For any x ∈ M , the closure of the orbit G(x) = g(x) : g ∈ G is a non-empty, closed, G-invariant subset of M . So, the hypothesis that M is minimalimplies that every orbit G(x) is dense in M . In particular, there is g ∈ G suchthat g(x) ∈ U . This proves that g−1(U) : g ∈ G is an open cover of M . Bycompactness, it follows that there exists a finite subcover, as claimed.

Consider the product M q endowed with the distance function

d((x1, . . . , xq), (y1, . . . , yq)

)= maxd(xi, yi) : 1 ≤ i ≤ q.

Note that the map M → ∆q, x 7→ x = (x, . . . , x) is a homeomorphism, and evenan isometry for this choice of a distance. Every open set U ⊂ M correspondsto an open set U ⊂ ∆q through this homeomorphism. Given any g ∈ G,we denote by g : M q → M q the homeomorphism defined by g(x1, . . . , xq) =(g(x1), . . . , g(xq)). The fact that the group G is abelian implies that g commutes


with F ; note also that every g preserves the diagonal ∆q. Then the conclusionof Lemma 1.5.3 may be rewritten in the following form:

⋃

h∈H

h−1(U) = ∆q. (1.5.3)

Lemma 1.5.4. Given ε > 0 there exist x ∈ ∆q and y ∈ ∆q and n ≥ 1 suchthat d(Fn(x), y) < ε.

Proof. Define gi = fi f−1q for each i = 1, . . . , q−1. Since the maps fi commute

with each other, so do the maps gi. Then, by induction, there exist y ∈M and(nk)k → ∞ such that

limkgnk

i (y) = y for every i = 1, . . . , q − 1.

Denote xk = f−nkq (y) and consider xk = (xk, . . . , xk) ∈ ∆q. Then,

Fnk(xk) = (fnk1 f−nk

q (y), . . . , fnkq−1f

−nkq (y), fnk

q f−nkq (y))

= (gnk1 (y), . . . , gnk

q−1(y), y)

converges to (y, . . . , y, y) when k → ∞. This proves the lemma, with x = xk,y = (y, . . . , y, y) and n = nk for every k sufficiently large.

The next step is to show that the point y in Lemma 1.5.4 is arbitrary:

Lemma 1.5.5. Given ε > 0 and z ∈ ∆q there exist w ∈ ∆q and m ≥ 1 suchthat d(Fm(w), z) < ε.

Proof. Given ε > 0 and z ∈ ∆q, consider U = open ball of center z and radiusε/2. By Lemma 1.5.3 and the observation (1.5.3), we may find a finite set

H ⊂ G such that the sets h−1(U), h ∈ H cover ∆q. Since the elements of G are(uniformly) continuous functions, there exists δ > 0 such that

d(x1, x2) < δ ⇒ d(h(x1), h(x2)) < ε/2 for every h ∈ H.

By Lemma 1.5.4 there exist x, y ∈ ∆q and n ≥ 1 such that d(Fn(x), y) < δ. Fix

h ∈ H such that y ∈ h−1(U). Then,

d(h(Fn(x)), z

)≤ d(h(Fn(x)), h(y)

)+ d(h(y), z

)< ε/2 + ε/2.

Take w = h(x). Since h commutes with Fn, the previous inequality implies thatd(Fn(w), z) < ε, as we wanted to prove.

Next, we prove that one may take x = y in Lemma 1.5.4:

Lemma 1.5.6 (Bowen). Given ε > 0 there exist v ∈ ∆q and k ≥ 1 withd(F k(v), v) < ε.

Proof. Given ε > 0 and z0 ∈ ∆q, consider the sequences εj , mj and zj , j ≥ 1defined by recurrence as follows. Initially, take ε1 = ε/2.


• By Lemma 1.5.5 there are z1 ∈ ∆q and m1 ≥ 1 with d(Fm1(z1), z0) < ε1.

• By the continuity of Fm1 , there exists ε2 < ε1 such that d(z, z1) < ε2implies d(Fm1(z), z0) < ε1.

Next, given any j ≥ 2:

• By Lemma 1.5.5 there are zj ∈ ∆q andmj ≥ 1 with d(Fmj (zj), zj−1) < εj .

• By the continuity of Fmj , there exists εj+1 < εj such that d(z, zj) < εj+1

implies d(Fmj (z), zj−1) < εj.

In particular, for any i < j,

d(Fmi+1+···+mj (zj), zi) < εi+1 ≤ ε

2.

Since ∆q is compact, we can find i, j with i < j such that d(zi, zj) < ε/2. Takek = mi+1 + · · · +mj. Then,

d(F k(zj), zj) ≤ d(F k(zj), zi) + d(zi, zj) < ε.

This completes the proof of the lemma.

Now we are ready to conclude the proof of Theorem 1.5.1. For that, let usconsider the function

φ : ∆q → [0,∞), φ(x) = infd(Fn(x), x) : n ≥ 1.Observe that φ is upper semi-continuous: given any ε > 0, every point x admitssome neighborhood V such that φ(y) < φ(x) + ε for every y ∈ V . This is animmediate consequence of the fact that φ is given by the infimum of a family ofcontinuous functions. Then (Exercise 1.5.4), φ admits some continuity point a.We are going to show that this point satisfies the conclusion of Theorem 1.5.1.

Let us begin by observing that φ(a) = 0. Indeed, suppose that φ(a) ispositive. Then, by continuity, there exist β > 0 and a neighborhood V of a suchthat φ(y) ≥ β > 0 for every y ∈ V . Then,

d(Fn(y), y) ≥ β for every y ∈ V and n ≥ 1. (1.5.4)

On the other hand, according to (1.5.3), for every x ∈ ∆q there exists h ∈ Hsuch that h(x) ∈ V . Since the transformations h are uniformly continuous, wemay fix α > 0 such that

d(z, w) < α ⇒ d(h(z), h(w)

)< β for every h ∈ H. (1.5.5)

By Lemma 1.5.6, there exists n ≥ 1 such that d(x, Fn(x)) < α. Then, using(1.5.5) and recalling that F commutes with every h,

d(h(x), Fn(h(x))

)< β.

This contradicts (1.5.4). This contradiction proves that φ(a) = 0, as claimed.In other words, there exists (nk)k → ∞ such that d(Fnk(a), a) → 0 when

k → ∞. This means that (1.5.2) is satisfied and, hence, the proof of Theo-rem 1.5.1 is complete.


1.5.2 Exercises

1.5.1. Show, by means of examples, that the conclusion of Theorem 1.5.1 isgenerally false if the transformations fi do not commute with each other.

1.5.2. Let G be the abelian group generated by commuting homeomorphismsf1, . . . , fq : M → M on a compact metric space. Prove that there exists someminimal element X ⊂ M for the inclusion relation in the family of non-empty,closed, G-invariant subsets of M .

1.5.3. Show that if ϕ : M → R is an upper semi-continuous function on acompact metric space then ϕ attains its maximum, that is, there exists p ∈ Msuch that ϕ(p) ≥ ϕ(x) for every x ∈M .

1.5.4. Show that if ϕ : M → R is an (upper or lower) semi-continuous functionon a compact metric space then the set of continuity points of ϕ contains acountable intersection of open and dense subsets of M . In particular, the set ofcontinuity points is dense in M .

1.5.5. Let f : M → M be a measurable transformation preserving a finitemeasure µ. Given k ≥ 1 and a positive measure set A ⊂ M , show that foralmost every x ∈ A there exists n ≥ 1 such that f jn(x) ∈ A for every 1 ≤ j ≤ k.

1.5.6. Let f1, . . . , fq : M → M be commuting homeomorphisms on a compactmetric space. A point x ∈M is called non-wandering if for every neighborhoodU of x there exist n1, . . . , nq ≥ 1 such that fn1

1 · · · fnqq (U) intersects U . The

non-wandering set is the set Ω(f1, · · · , fq) of all non-wandering points. Provethat Ω(f1, · · · , fq) is non-empty and compact.

Chapter 2

Existence of Invariant

Measures

In this chapter we prove the following result, which guarantees the existence ofinvariant measures for a broad class of transformations:

Theorem 2.1 (Existence of invariant measures). Let f : M → M be a con-tinuous transformation on a compact metric space. Then there exists someprobability measure on M invariant under f .

The main point in the proof is to introduce a certain topology in the setM1(M) of probability measures on M , that we call weak∗ topology. The idea isthat two measures are close, with respect to this topology, if the integrals theyassign to (many) bounded continuous functions are close. The precise definitionand some of the properties of the weak∗ topology is presented in Section 2.1. Thecrucial property, that makes this topology so useful for proving the existencetheorem, is that it turns M1(M) into a compact space (Theorem 2.1.5).

The proof of Theorem 2.1 is given in Section 2.2. We will also see, throughexamples, that the hypotheses of continuity and compactness cannot be omitted.

In Section 2.3 we insert the construction of the weak∗ topology into a broaderframework from Functional Analysis and we also take the opportunity to intro-duce the notion of Koopman operator of a transformation, which will be veryuseful in the sequel. In particular, as we are going to see, it allows us to give analternative proof of Theorem 2.1, based on tools from Functional Analysis.

In Section 2.4 we describe certain explicit constructions of invariant measuresfor two important classes of systems: skew-products and natural extensions (orinverse limits) of non-invertible transformations.

Finally, in Section 2.5 we discuss some important applications of the idea ofmultiple recurrence (Section 1.5) in the context of Combinatorial Arithmetics.Theorem 2.1.5 has an important role in the arguments, which is the reason whythis discussion was postponed to the present chapter.

35

36 CHAPTER 2. EXISTENCE OF INVARIANT MEASURES

2.1 Weak∗ topology

In this section M will always be a metric space. Our goal is to define the so-called weak∗ topology in the set M1(M) of Borel probability measures on Mand to discuss its main properties.

Let d(·, ·) be the distance function onM and B(x, δ) denote the ball of centerx ∈M and radius δ > 0. Given B ⊂M , we define

d(x,B) = infd(x, y) : y ∈ B

and we call δ-neighborhood of B the set Bδ of points x ∈M with d(x,B) < δ.

2.1.1 Definition and properties of the weak∗ topology

Given a measure µ ∈ M1(M), a finite set Φ = φ1, . . . , φN of bounded contin-uous functions φi : M → R and a number ε > 0, we define

V (µ,Φ, ε) = ν ∈ M1(M) :∣∣∫φi dν −

∫φi dµ

∣∣ < ε for every i. (2.1.1)

Note that intersection of any two such sets contains some set of this form. Thus,the family V (µ,Φ, ε) : Φ, ε may be taken as a basis of neighborhoods of eachµ ∈ M1(M).

The weak∗ topology is the topology defined by these bases of neighborhoods.In other words, the open sets in the weak∗ topology are the sets A ⊂ M1(M)such that for every µ ∈ A there exists some V (µ,Φ, ε) contained in A. Observethat the definition depends only on the topology of M , not on its distance.Furthermore, this topology is Hausdorff: Proposition A.3.3 implies that if µ andν are distinct probabilities then there exist ε > 0 and some bounded continuousfunction φ : M → R such that V (µ, φ, ε) ∩ V (ν, φ, ε) = ∅.

Lemma 2.1.1. A sequence (µn)n∈N converges to a measure µ ∈ M1(M) in theweak∗ topology if and only if

∫φdµn →

∫φdµ for every bounded continuous function φ : M → R.

Proof. To prove the “only if” claim, consider any set Φ = φ consisting of asingle bounded continuous function φ. Since (µn)n → µ, for any ε > 0 thereexists n ≥ 1 such that µn ∈ V (µ,Φ, ε) for every n ≥ n. This means, precisely,that ∣∣

∫φdµn −

∫φdµ

∣∣ < ε for every n ≥ n.

In other words, the sequence (∫φdµn)n converges to

∫φdµ.

The converse asserts that if (∫φdµn)n converges to

∫φdµ for every bounded

continuous function φ then, given any Φ = φ1, . . . , φN and ε > 0, there existsn ≥ 1 such that µn ∈ V (µ,Φ, ε) for n ≥ n. To check that this is so, let

2.1. WEAK∗ TOPOLOGY 37

Φ = φ1, . . . , φN. The hypothesis ensures that for every i there exists ni suchthat ∣∣

∫φi dµn −

∫φi dµ

∣∣ < ε for every n ≥ ni .

Taking n = maxn1, . . . , nN we get that µn ∈ V (µ,Φ, ε) for every n ≥ n.

2.1.2 Theorem Portmanteau

Now let us discuss other useful ways of defining the weak∗ topology. Indeed,the relations (2.1.2), (2.1.3), (2.1.4) and (2.1.5) below introduce other naturalchoices for neighborhoods of a probability measure µ ∈ M1. In Theorem 2.1.2we prove that all these choices give rise to the same topology in M1(M), whichcoincides with the weak∗ topology.

A direct variation of the definition of weak∗ topology is obtained by takingas basis of neighborhoods the family of sets

V (µ,Ψ, ε) = η ∈ M1(M) :∣∣∫ψi dη −

∫ψi dµ

∣∣ < ε for every i (2.1.2)

where ε > 0 and Ψ = ψ1, . . . , ψN is a family of Lipschitz functions. Thenext definition is formulated in terms of closed subsets. Given any finite familyF = F1, . . . , FN of closed subsets of M and given any ε > 0, consider

Vf (µ,F , ε) = ν ∈ M1 : ν(Fi) < µ(Fi) + ε for every i. (2.1.3)

The next construction is analogous, just with open subsets instead of closedsubsets. Given any finite family A = A1, . . . , AN of open subsets of M andgiven any ε > 0, consider

Va(µ,A, ε) = ν ∈ M1 : ν(Ai) > µ(Ai) − ε for every i. (2.1.4)

We call continuity set of a measure µ any Borel subset B of M whose boundary∂B has zero measure for µ. Given any finite family B = B1, . . . , BN ofcontinuity sets of µ and given any ε > 0, consider

Vc(µ,B, ε) = ν ∈ M1 : |µ(Bi) − ν(Bi)| < ε for every i. (2.1.5)

Given any two topologies T1 and T2 in the same set, we say that T1 is weakerthan T2 (or, equivalently, that T2 is stronger than T1) if every subset that isopen for T1 is also open for T2. We say that the two topologies are equivalent ifthey have exactly the same open sets.

Theorem 2.1.2. The topologies defined by the bases of neighborhoods (2.1.1),(2.1.2), (2.1.3), (2.1.4) and (2.1.5) are all equivalent.

Proof. Since every Lipschitz function is continuous, it is clear that the topology(2.1.2) is weaker than the topology (2.1.1).

To show that the topology (2.1.3) is weaker than the topology (2.1.2), con-sider any finite family F = F1, . . . , FN of closed subsets of M . According


to Lemma A.3.4, for each δ > 0 and each i there exists a Lipschitz functionψi : M → [0, 1] such that XFi ≤ ψi ≤ XF δ

i. Observe that ∩δF δi = Fi, because

Fi is closed, and so µ(F δi ) → µ(Fi) when δ → 0. Fix δ > 0 small enough sothat µ(F δi ) − µ(Fi) < ε/2 for every i. Let Ψ be the set of functions ψ1, . . . , ψNobtained in this way. Observe that

∣∣∫ψi dν −

∫ψi dµ

∣∣ < ε/2 ⇒ ν(Fi) − µ(F δi ) < ε/2 ⇒ ν(Fi) ≤ µ(Fi) + ε

for every i. In other words, V (µ,Ψ, ε/2) is contained in Vf (µ,F , ε).It is easy to see that the topologies (2.1.3) and (2.1.4) are equivalent. In-

deed, let F = F1, . . . , Fn be any finite family of closed subsets and letA = A1, . . . , AN, where each Ai is the complement of Fi. Clearly,

Vf (µ,F , ε) = ν ∈ M1 : ν(Fi) < µ(Fi) + ε for every i= ν ∈ M1 : ν(Ai) > µ(Ai) − ε for every i = Va(µ,A, ε).

Next, let us show that the topology (2.1.5) is weaker than these equivalenttopologies (2.1.3) and (2.1.4). Given any finite family B = B1, . . . , BN ofcontinuity sets of µ, let Fi be the closure and Ai be the interior of each Bi.Denote F = F1, . . . , FN and A = A1, . . . , AN. Since µ(Fi) = µ(Bi) =µ(Ai),

ν(Fi) < µ(Fi) + ε ⇒ ν(Bi) < µ(Bi) + ε

ν(Ai) > µ(Ai) − ε ⇒ ν(Bi) > µ(Bi) − ε

for every i. This means that Vf (µ,F , ε)∩Va(µ,A, ε) is contained in Vc(µ,B, ε).Finally, let us prove that the topology (2.1.1) is weaker than the topology

(2.1.5). Let Φ = φ1, . . . , φN be a finite family of bounded continuous func-tions. Fix an integer number ℓ such that sup |φi(x)| < ℓ for every i. For each i,the pre-images φ−1

i (s), s ∈ [−ℓ, ℓ] are pairwise disjoint. Hence, µ(φ−1i (s)

)= 0

except for a countable set of values of s. In particular, we may choose k ∈ Nand points −ℓ = t0 < t1 < · · · < tk−1 < tk = ℓ such that tj − tj−1 < ε/2 andµ(φ−1

i (tj)) = 0 for every j. Then, each

Bi,j = φ−1i ((tj−1, tj ])

is a continuity set of µ. Moreover,

k∑

j=1

tj µ(Bi,j) ≥∫φi dµ ≥

k∑

j=1

tj−1 µ(Bi,j) >

k∑

j=1

tj µ(Bi,j) − ε/2

and we also have similar inequalities for the integrals relative to ν. It followsthat

∣∣∫φi dµ−

∫φi dν

∣∣ ≤k∑

j=1

ℓ |µ(Bi,j) − ν(Bi,j)| + ε/2 (2.1.6)

for every i. Denote B = Bi,j : i = 1, . . . , N and j = 1, . . . , k. Then therelation (2.1.6) implies that Vc(µ,B, ε/(2kℓ)) is contained in V (µ,Φ, ε).


2.1.3 The weak∗ topology is metrizable

Now assume that the metric space M is separable. We will see in Exercise 2.1.3that the weak∗ topology on M1(M) is separable. Here we show that it is alsometrizable: we exhibit a distance function on M1(M) that induces the weak∗

topology.Given µ, ν ∈ M1(M), define D(µ, ν) to be the infimum of all numbers δ > 0

such that

µ(B) < ν(Bδ) + δ and ν(B) < µ(Bδ) + δ (2.1.7)

for every Borel set B ⊂M .

Lemma 2.1.3. The function D is a distance on M1(M).

Proof. Let us start by showing that D(µ, ν) = 0 implies µ = ν. Indeed, thehypothesis implies

µ(B) ≤ ν(B) and ν(B) ≤ µ(B)

for every Borel set B ⊂ M , where B denotes the closure of B. When B isclosed, these inequalities mean that µ(B) = ν(B). As we have seen previously,any two measures that coincide on the closed subsets are necessarily the same.

We leave it to the reader to check all the other conditions in the definitionof a distance (Exercise 2.1.5).

This distance D is called the Levy-Prohorov metric on M1(M). In whatfollows we denote by BD(µ, r) the ball of radius r > 0 around any µ ∈ M1(M).

Proposition 2.1.4. If M is a separable metric space then the topology inducedby the Levy-Prohorov distance D coincides with the weak∗ topology on M1(M).

Proof. Let ε > 0 and F = F1, . . . , FN be a finite family of closed subsets ofM . Fix δ ∈ (0, ε/2) such that µ(F δi ) < µ(Fi) + ε/2 for every i. If ν ∈ BD(µ, δ)then

ν(Fi) < µ(F δi ) + δ < µ(Fi) + ε for every i,

which means that ν ∈ Vf (µ,F , ε). This shows that the topology induced bythe distance D is stronger than the topology (2.1.3) which, as we have seen, isequivalent to the weak∗ topology.

We are left to prove that ifM is separable then the weak∗ topology is strongerthan the topology induced by D. For that, let p1, p2, . . . be any countabledense subset of M . Given ε > 0, let us fix δ ∈ (0, ε/3). For each j, the spheres∂B(pj, r) = x : d(x, pj) = r, r > 0 are pairwise disjoint. So, we may findr > 0 arbitrarily small such that µ(∂B(pj , r)) = 0 for every j. Fix any such r,with r ∈ (0, δ/3). The family B(pj , r) : j = 1, 2, . . . is a countable cover of Mby continuity sets of µ. Fix k ≥ 1 such that the set U = ∪kj=1B(pj , r) satisfies

µ(U)> 1 − δ. (2.1.8)


Next, let us consider the (finite) partition P of U defined by the family of balls

B(pj , r) : j = 1, . . . , k.

That is, the elements of P are the maximal sets P ⊂ U such that, for each j,either P is contained in B(pj , r) or P is disjoint from B(pj , r). See Figure 2.1.Now let E be the family of all finite unions of elements of P . Note that theboundary of every element of E has measure zero, since it is contained in theunion of the boundaries of the balls B(pj , r), 1 ≤ j ≤ k. That is, every elementof E is a continuity set of µ.

Figure 2.1: Partition defined by a finite cover

If ν ∈ Vc(µ, E , δ) then

|µ(E) − ν(E)| < δ for every E ∈ E . (2.1.9)

In particular, (2.1.8) together with (2.1.9) imply that

ν(U)> 1 − 2δ. (2.1.10)

Now, given any Borel subset B, denote by EB the union of all the elements ofP that intersect B. Then EB ∈ E and so the relation (2.1.9) yields

|µ(EB) − ν(EB)| < δ.

Observe that B is contained in EB ∪ U c. Moreover, EB ⊂ Bδ because everyelement of P has diameter less than 2r < δ. These facts, together with (2.1.8)and (2.1.10), imply that

µ(B) ≤ µ(EB) + δ < ν(EB) + 2δ ≤ ν(Bδ) + 2δ

ν(B) ≤ ν(EB) + 2δ < µ(EB) + 3δ ≤ µ(Bδ) + 3δ.

Since 3δ < ε, these relations imply that ν ∈ BD(µ, ε).

One can show that if M is a complete separable metric space then the Levy-Prohorov metric on M1(M) is complete (and separable, by Exercise 2.1.3). See,for example, Theorem 6.8 in the book of Billingsley [Bil68].


2.1.4 The weak∗ topology is compact

In this section we take the metric space M to be compact. We are going toprove the following fundamental result:

Theorem 2.1.5. The space M1(M) is compact for the weak∗ topology.

Since we already know that M1(M) is metrizable, it suffices to prove:

Proposition 2.1.6. Every sequence (µk)k∈N in M1(M) has some subsequencethat converges in the weak∗ topology.

Proof. Let φn : n ∈ N be a countable dense subset of the unit ball of C0(M)(recall Theorem A.3.13). For each n ∈ N, the sequence of real numbers

∫φn dµk ,

k ∈ N is bounded by 1. Hence, for each n ∈ N there exists a sequence (knj )j∈N

such that∫φn dµkn

jconverges to some number Φn ∈ R when j → ∞.

Moreover, each sequence (kn+1j )j∈N may be chosen to be a subsequence of the

previous (knj )j∈N. Define ℓj = kjj for each j ∈ N. By construction, (ℓj)j∈N is asubsequence of every (knj )j∈N, up to finitely many terms. Hence,

(∫φn dµℓj

)

j

→ Φn for every n ∈ N.

One can easily deduce that

Φ(ϕ) = limj

∫ϕdµℓj (2.1.11)

exists, for every function ϕ ∈ C0(M). Indeed, suppose first that ϕ is in the unitball of C0(M). Given any ε > 0, we may find n ∈ N such that ‖ϕ − φn‖ ≤ ε.Then,

∣∣∫ϕdµℓj −

∫φn dµℓj

∣∣ ≤ ε

for every j. Since∫φn dµℓj converges (to Φn), it follows that

lim supj

∫ϕdµℓj − lim inf

j

∫ϕdµℓj ≤ 2ε.

Since ε is arbitrary, we find that limj

∫ϕdµℓj exists. This proves (2.1.11) when

the function is in the unit ball. The general case reduces immediately to thisone, just replacing ϕ with ϕ/‖ϕ‖. In this way, we have completed the proof of(2.1.11).

Finally, it is clear that the operator Φ : C0(M) → R defined by (2.1.11) islinear and positive: Φ(ϕ) ≥ minϕ ≥ 0 whenever ϕ ≥ 0 at all points. Moreover,


Φ(1) = 1. Thus, by Theorem A.3.11, there exists some Borel probability mea-sure µ on M such that Φ(ϕ) =

∫ϕdµ for every continuous function ϕ. Now,

the equality in (2.1.11) may be rewritten as∫ϕdµ = lim

j

∫ϕdµℓj for every ϕ ∈ C0(M).

According to Lemma 2.1.1, this means that the subsequence (µℓj )j∈N convergesto µ in the weak∗ topology.

As we observed previously, Theorem 2.1.5 is an immediate consequence ofthe proposition we have just proved.

2.1.5 Theorem of Prohorov

The theorem that we are going to state in this section provides a very generalcriterion for a family of probability measures to be compact. Indeed, the class ofmetric spaces M to which it applies includes virtually all interesting examples.

Definition 2.1.7. A set M of Borel measures in a topological space is tightif for every ε > 0 there exists a compact set K ⊂ M such that µ(Kc) < ε forevery measure µ ∈ M.

Note that when M consists of a single measure this definition correspondsexactly to Definition A.3.6. Clearly, tightness is a hereditary property: if a setis tight then all its subsets are also tight. Note also that if M is a compactmetric space then the space M1(M) of all probability measures is a tight set.So, the next result is an extension of Theorem 2.1.5:

Theorem 2.1.8 (Prohorov). Let M be a complete separable metric space. A setK ⊂ M1(M) is tight if and only if every sequence in K admits some subsequencethat is convergent in the weak∗ topology of M1(M).

Proof. We only prove the necessary condition, which is the most useful partof the statement. Then, in Exercise 2.1.8, we invite the reader to prove theconverse.

Suppose that K is tight. Consider an increasing sequence (Kl)l of compactsubsets of M such that η(Kc

l ) ≤ 1/l for every l and every η ∈ K. Fix anysequence (µn)n in K. To begin with, we claim that for every l there exists asubsequence (nj)j and there exists a measure νl on M such that νl(K

cl ) = 0

and (µnj | Kl)j converges to νl, in the sense that∫

Kl

ψ dµnj →∫

Kl

ψ dνl for every continuous function ψ : Kl → R. (2.1.12)

Indeed, that is a simple consequence of Theorem 2.1.5: up to restricting to asubsequence, we may suppose that the limit bl = limn µn(Kl) exists (note that1 ≥ bl ≥ 1 − 1/l); it follows from the theorem that the sequence of normalizedrestrictions (

(µn | Kl)/µn(Kl))n


admits a subsequence converging to some probability measure ηl ∈ M1(Kl); toconclude the proof of the claim it suffices to take ηl to be a probability measureon M with ηl(K

cl ) = 0 and to choose νl = blηl.

Next, using a diagonal argument analogous to the one in Proposition 2.1.6,we may choose a subsequence (nj)j in such a way that (2.1.12) holds, simul-taneously, for every l ≥ 1. Observe that the sequence (νl)l is non-decreasing:given k > l and any continuous function φ : M → [0, 1],

∫φdνl = lim

j

∫

Kl

φdµnj ≤ limj

∫

Kk

φdµnj =

∫φdνk.

Analogously, for any k > l and any continuous function φ : M → [0, 1],

∫φdνk −

∫φdνl = lim

j

∫

Kk\Kl

φdµnj ≤ lim supj

µnj (Kcl ) ≤ 1/l.

Using Exercise A.3.5, we may translate this in terms of measures of sets (ratherthan integrals of functions): for every k > l and every Borel set E ⊂M ,

νl(E) ≤ νk(E) ≤ νl(E) + 1/l. (2.1.13)

Define ν(E) = liml νl(E) for each Borel set E. We claim that ν is a prob-ability measure on M . It is immediate from the definition that ν(∅) = 0 andthat ν is additive. Furthermore, ν(M) = liml ν(Kl) = liml bl = 1. To showthat ν is countably additive (σ-additive), we use the criterion of continuity atthe empty set (Theorem A.1.14). Consider any decreasing sequence (Bn)n ofBorel subsets of M with ∩nBn = ∅. Given ε > 0, choose l such that 1/l < ε.Since νl is countably additive, Theorem A.1.14 shows that νl(Bn) < ε for everyn sufficiently large. Hence, ν(Bn) ≤ νl(Bn) + 1/l < 2ε for every n sufficientlylarge. This proves that (ν(Bn))n converges to zero and, by Theorem A.1.14, itfollows that ν is indeed countably additive.

The definition of ν implies (see Exercise 2.1.1 or Exercise 2.1.4) that (νl)lconverges to ν in the weak∗ topology. So, given ε > 0 and any bounded con-tinuous function ϕ : M → R, we have that |

∫ϕdνl −

∫ϕdν| < ε for every l

sufficiently large. Fix l such that, in addition, sup |ϕ|/l < ε. Then,

|∫ϕdµnj −

∫ϕdνl| ≤ |

∫

Kcl

ϕdµnj | + |∫

Kl

ϕdµnj −∫

Kl

ϕdνl| ≤ 2ε

for every j sufficiently large. This shows that |∫ϕdµnj −

∫ϕdν| < 3ε whenever

j is large enough. Thus, (µnj )j converges to ν in the weak∗ topology.

2.1.6 Exercises

2.1.1. Let M be a metric space and (µn)n be a sequence in M1(M). Show thatthe following conditions are all equivalent:

1. (µn)n converges to a probability measure µ in the weak∗ topology.


2. lim supn µn(F ) ≤ µ(F ) for every closed set F ⊂M .

3. lim infn µn(A) ≥ µ(A) for every open set A ⊂M .

4. limn µn(B) = µ(B) for every continuity set B of µ.

5. limn

∫ψ dµn =

∫ψ dµ for every Lipschitz function ψ : M → R.

2.1.2. Fix any dense subset F of the unit ball of C0(M). Show that a sequence(µn)n∈N of probability measures on M converges to some µ ∈ M1(M) in theweak∗ topology if and only if

∫φdµn converges to

∫φdµ, for every φ ∈ F .

2.1.3. Show that the subset formed by the measures with finite support is densein M1(M), relative to the weak∗ topology. Assuming that the metric space Mis separable, conclude that M1(M) is also separable.

2.1.4. The uniform topology in M1(M) is defined by the basis of neighborhoods

Vu(µ, ε) = ν ∈ M1(M) : |µ(B) − ν(B)| < ε for every B ∈ B

and the pointwise topology is defined by the basis of neighborhoods

Vp(µ,B, ε) = ν ∈ M1(M) : |µ(Bi) − ν(Bi)| < ε for every i

where ε > 0, n ≥ 1 and B = B1, . . . , BN is a finite family of measurable sets.Check that the uniform topology is stronger than the pointwise topology andthe latter is stronger than the weak∗ topology. Show, by means of examples,that these relations may be strict.

2.1.5. Complete the proof of Lemma 2.1.3.

2.1.6. Let Vk, k = 1, 2, . . . be random variables with real values, that is, mea-surable functions Vk : (X,B, µ) → R defined in some probability space (X,B, µ).The distribution function of Vk is the monotone function Fk : R → [0, 1] definedby Fk(a) = µ(x ∈ X : Vk(x) ≤ a). We say that (Vk)k converges in distributionto some random variable V if limk Fk(a) = F (a) for every continuity point a ofthe distribution function F of the random variable V . What does this have todo with the weak∗ topology?

2.1.7. Let (µn)n∈N be a sequence of probability measures converging to someµ in the weak∗ topology. Let B be a continuity set of µ with µ(B) > 0. Provethat the normalized restrictions (µn | B)/µn(B) converge to the normalizedrestriction (µ | B)/µ(B) when n→ ∞. What can be said if we replace continuitysets by closed sets or by open sets?

2.1.8. (Converse to the theorem of Prohorov) Prove that if K ⊂ M1(M) is suchthat every sequence in K admits some convergent subsequence then K is tight.

2.2. PROOF OF THE EXISTENCE THEOREM 45

2.2 Proof of the existence theorem

Given any f : M → M and any measure η on M , we denote by f∗η and calliterate (or image) of η under f the measure defined by

f∗η(B)

= η(f−1(B)

)

for each measurable set B ⊂ M . Note that the measure η is invariant under fif and only if f∗η = η.

Lemma 2.2.1. Let η be a measure and φ be a bounded measurable function.Then ∫

φdf∗η =

∫φ f dη. (2.2.1)

Proof. If φ is the characteristic function of a measurable set B then the relation(2.2.1) means that f∗η(B) = η(f−1(B)), which holds by definition. By linearityof the integral, it follows that (2.2.1) holds whenever φ is a simple function. Fi-nally, since every bounded measurable function can be approximated by simplefunctions (see Proposition A.1.33), it follows that the claim in the lemma is truein general.

Proposition 2.2.2. If f : M →M is continuous then f∗ : M1(M) → M1(M)is continuous relative to the weak∗ topology.

Proof. Let ε > 0 and Φ = φ1, . . . , φn be any family of bounded continuousfunctions. Since f is continuous, the family Ψ = φ1f, . . . , φnf also consistsof bounded continuous functions. By the previous lemma,

|∫φi d(f∗µ) −

∫φi d(f∗ν)| = |

∫(φi f) dµ−

∫(φi f) dν|

and so the left-hand side is smaller than ε if the right-hand side is smaller thanε. That means that

f∗(V (µ,Ψ, ε)

)⊂ V (f∗µ,Φ, ε)) for every µ, Φ and ε

and this last fact shows that f∗ is continuous.

At this point, Theorem 2.1 could be deduced from the classical Schauder-Tychonoff fixed point theorem for continuous operators in topological vectorspaces. A topological vector space is a vector space V endowed with a topologyrelative to which both operations of V (addition and multiplication by a scalar)are continuous. A set K ⊂ V is said to be convex if (1− t)x+ ty ∈ K for everyx, y ∈ K and every t ∈ [0, 1].

Theorem 2.2.3 (Schauder-Tychonoff). Let F : V → V be a continuous trans-formation on a topological vector space V . Suppose that there exists a compactconvex set K ⊂ V such that F (K) ⊂ K. Then F (v) = v for some v ∈ K.


Theorem 2.1 corresponds to the special case when V = M(M) is the spaceof complex measures, K = M1(M) is the space of probability measures on Mand F = f∗ is the action of f on M(M).

However, the situation of Theorem 2.1 is a lot simpler than the generalcase of the Schauder-Tychonoff theorem because the operator f∗ besides beingcontinuous is also linear. This allows for a direct and elementary proof of The-orem 2.1 that also provides some additional information about the invariantmeasure.

To that end, let ν be any probability measure on M : for example, ν couldbe the Dirac mass at any point. Form the sequence of probability measures

µn =1

n

n−1∑

j=0

f j∗ν, (2.2.2)

where f j∗ν is the image of ν under the iterate f j. By Theorem 2.1.5, thissequence has some accumulation point, that is, there exists some subsequence(nk)k∈N and some probability measure µ ∈ M1(M) such that

1

nk

nk−1∑

j=0

f j∗ν → µ (2.2.3)

in the weak∗ topology. Now we only need to prove:

Lemma 2.2.4. Every accumulation point of a sequence (µn)n∈N of the form(2.2.2) is a probability measure invariant under f .

Proof. The relation (2.2.3) asserts that, given any family Φ = φ1, . . . , φN ofbounded continuous functions and given any ε > 0, we have

∣∣ 1

nk

nk−1∑

j=0

∫(φi f j) dν −

∫φi dµ

∣∣ < ε/2 (2.2.4)

for every i and every k sufficiently large. By Proposition 2.2.2,

f∗µ = f∗(limk

1

nk

nk−1∑

j=0

f j∗ν)

= limk

1

nk

nk∑

j=1

f j∗ν . (2.2.5)

Now observe that

∣∣ 1

nk

nk−1∑

j=0


1

nk

nk∑

j=1

∫(φi f j) dν

∣∣

=1

nk

∣∣∫φi dν −

∫(φi fnk) dν

∣∣ ≤ 2

nksup |φi|

and the latter expression is smaller than ε/2 for every i and every k sufficientlylarge. Combining this fact with (2.2.4), we conclude that

∣∣ 1

nk

nk∑

j=1


∫φi dµ

∣∣ < ε (2.2.6)

2.2. PROOF OF THE EXISTENCE THEOREM 47

for every i and every k sufficiently large. This means that

1

nk

nk∑

j=1

f j∗ν → µ

when k → ∞. However, (2.2.5) means that this sequence converges to f∗µ. Byuniqueness of the limit, it follows that f∗µ = µ.

Now the proof of Theorem 2.1 is complete. The examples that follow showthat neither of the two hypotheses in the theorem, continuity and compactness,may be omitted.

Example 2.2.5. Consider f : (0, 1] → (0, 1] given by f(x) = x/2. Supposethat f admits some invariant probability measure: we are going to show thatthis is actually not true. By the recurrence theorem (Theorem 1.2.4), relative tothat probability measure almost every point of (0, 1] is recurrent. However, it isclear that there are no recurrent points: the orbit of every x ∈ (0, 1] convergesto zero and, in particular, does not accumulate on the initial point x. Hence,f is an example of a continuous transformation (on a non-compact space) thatdoes not have any invariant probability measure.

Example 2.2.6. Modifying a little the previous construction, we see that thesame phenomenon may occur in compact spaces, if the transformation is notcontinuous. Consider f : [0, 1] → [0, 1] given by f(x) = x/2 if x 6= 0 andf(0) = 1. For the same reason as before, no point x ∈ (0, 1] is recurrent. So, ifthere exists some invariant probability measure µ then it must give full weightto the sole recurrent point x = 0. In other words, µ must be the Dirac masssupported at zero, that is, the measure δ0 defined by

δ0(E) = 1 if 0 ∈ E and δ0(E) = 0 if 0 /∈ E.

However, the measure δ0 is not invariant under f : for example, the measurableset E = 0 has measure 1 and yet its pre-image f−1(E) is the empty set, whichhas measure zero. Thus, this transformation f has no invariant probabilitymeasures.

Our third example is of a different nature. We include it to stress the limita-tions of Theorem 2.1 (which are inherent to its great generality): the measureswhose existence is ensured by the theorem may be completely trivial; for ex-ample, in the situation that we are going to describe “almost every point” justmeans the point x = 0. For this reason, an important objective in ErgodicTheory is to construct more sophisticated invariant measures, with additionalinteresting properties such as, for instance, being equivalent to the Lebesguemeasure.

Example 2.2.7. Consider f : [0, 1] → [0, 1] given by f(x) = x/2. This is acontinuous transformation on a compact space. So, by Theorem 2.1, f admits


some invariant probability measure. Using the same arguments as in the previ-ous example, we find that there exists a unique invariant probability measure,namely, the Dirac mass δ0 at the origin. Note that in this case the measure δ0is indeed invariant.

As an immediate application of Theorem 2.1, we have the following alter-native proof of the Birkhoff recurrence theorem (Theorem 1.2.6). Suppose thatf : M → M is a continuous transformation on a compact metric space. ByTheorem 2.1, there exists some f -invariant probability measure µ. Every com-pact metric space admits a countable basis of open sets. So, we may applyTheorem 1.2.4 to conclude that µ-almost every point is recurrent. In particular,the set of recurrent points is non-empty, as stated by Theorem 1.2.6.

2.2.1 Exercises

2.2.1. Prove the following generalization of Lemma 2.2.4. Let f : M → Mbe a continuous transformation on a compact metric space, ν be a probabilitymeasure on M and (In)n be a sequence of intervals of natural numbers such that#In converges to infinity when n goes to infinity. Then every accumulation pointof the sequence

µn =1

#In

∑

j∈In

f j∗ν

is an f -invariant probability measure.

2.2.2. Let f1, . . . , fq : M → M be any finite family of commuting continuoustransformations on a compact metric space. Prove that there exists some prob-ability measure µ that is invariant under fi for every i ∈ 1, . . . , q. In fact,the conclusion remains true for any countable family fj : j ∈ N of commutingcontinuous transformations on a compact metric space.

2.2.3. Let f : [0, 1] → [0, 1] be the decimal expansion transformation. Showthat for every k ≥ 1 there exists some invariant probability measure whosesupport is formed by exactly k points (in particular, f admits infinitely manyinvariant probability measures). Determine whether there are invariant proba-bility measures µ such that

(a) the support of µ is infinite countable;

(b) the support of µ is non-countable but has empty interior;

(c) the support of µ has non-empty interior but µ is singular with respect tothe Lebesgue measure m.

2.2.4. Prove the theorem of existence of invariant measures for continuous flows:every continuous flow (f t)t∈R on a compact metric space admits some invariantprobability measure.

2.2.5. Show that the transformation f : [−1, 1] → [−1, 1], f(x) = 1 − 2x2 hassome invariant probability measure equivalent to the Lebesgue on the interval.

2.3. COMMENTS IN FUNCTIONAL ANALYSIS 49

2.2.6. Let f : M → M be an invertible measurable transformation and m bea probability measure on M such that m(A) = 0 if and only if m(f(A)) = 0.We say that the pair (f,m) is totally dissipative if there exists a measurableset W ⊂ M whose iterates f j(W ), j ∈ Z are pairwise disjoint and such thattheir union has full measure. Prove that if (f,m) is totally dissipative then fadmits some σ-finite invariant measure equivalent to Lebesgue measure m. Thismeasure is necessarily infinite.

2.2.7. Let f : M →M be an invertible measurable transformation and m be aprobability measure on M such that m(A) = 0 if and only if m(f(A)) = 0. Wesay that the pair (f,m) is conservative if there is no measurable set W ⊂ Mwith positive measure whose iterates f j(W ), j ∈ Z are pairwise disjoint. Showthat if (f,m) is conservative then, for every measurable set X ⊂ M , m-almostevery point of X returns to X infinitely times.

2.2.8. Suppose that (f,m) is conservative. Show that f admits a σ-finite invari-ant measure µ equivalent tom if and only if there exist setsX1 ⊂ · · · ⊂ Xn ⊂ · · ·with M = ∪nXn and m(Xn) < ∞ for every n, such that the first-return mapfn to each Xn admits a finite invariant measure µn absolutely continuous withrespect to the restriction of m to Xn.

2.2.9. Find conservative pairs (f,m) such that f has no finite invariant mea-sures equivalent to m. [Observation: Ornstein [Orn60] gave examples such thatf does not even have σ-finite invariant measures equivalent to m.]

2.3 Comments in Functional Analysis

The definition of weak∗ topology in the space of probability measures is a specialcase of a construction from Functional Analysis that is worthwhile recalling here.It leads us to introducing a certain linear isometry Uf in the space L1(µ), calledthe Koopman operator of the system (f, µ). These operators have an importantrole in Ergodic Theory because they allow for powerful tools from Analysis tobe used in the study of invariant measures. To illustrate this fact, we presentan alternative proof of Theorem 2.1 based on the spectral properties of theKoopman operator.

2.3.1 Duality and weak topologies

Let E be a Banach space, that is, a vector space endowed with a complete norm.The dual of E is the space E∗ of all continuous linear functionals defined on E.This is also a Banach space, with the norm

‖g‖ = sup |g(v)|

‖v‖ : v ∈ E \ 0. (2.3.1)

The weak topology in the space E is the topology defined by the following basisof neighborhoods:

V (v, g1, . . . , gN, ε) = w ∈ E : |gi(v) − gi(w)| < ε for every i, (2.3.2)


where g1, . . . , gN ∈ E∗. In terms of sequences, it satisfies

(vn)n → v ⇒ (g(vn))n → g(v) for every g ∈ E∗.

The weak∗ topology in the dual space E∗ is the topology defined by the followingbasis of neighborhoods:

V ∗(g, v1, . . . , vN, ε) = h ∈ E∗ : |g(vi) − h(vi)| < ε for every i, (2.3.3)

where v1, . . . , vN ∈ E. It satisfies

(gn)n → g ⇒ (gn(v))n → g(v) for every v ∈ E.

The weak∗ topology has the following remarkable property:

Theorem 2.3.1 (Banach-Alaoglu). The closed unit ball of E∗ is compact forthe weak∗ topology.

The construction carried out in the previous sections corresponds to the sit-uation where E is the space C0(M) of (complex) continuous functions and E∗

is the space M(M) of complex measures on a compact metric space M : accord-ing to the theorem of Riesz-Markov (Theorem A.3.12), M(M) corresponds tothe dual of C0(M) when we identify each measure µ ∈ M(M) with the linearfunctional Iµ(φ) =

∫φdµ. Note that the definition of the norm (2.3.1) implies

that

‖µ‖ = sup |∫φdµ|

sup |φ| : φ ∈ C0(M) \ 0.

In particular, the set M1(M) of probability measures is contained in the unitball of M(M). Since this set is closed for the weak∗ topology, we conclude thatTheorem 2.1.5 is also a direct consequence of the theorem of Banach-Alaoglu.

Now consider any continuous transformation f : M → M and the corre-sponding action f∗ : M(M) → M(M), µ 7→ f∗µ in the space of complexmeasures. Then f∗ is a linear operator on M(M) and it is continuous withrespect to the weak∗ topology. There exists another continuous linear opera-tor naturally associated with f , namely Uf : C0(M) → C0(M), φ 7→ φ f .Observe that these two operators are dual, in the following sense (rememberLemma 2.2.1):

∫Uf (φ) dµ =

∫(φ f) dµ =

∫φd(f∗µ). (2.3.4)

These observations motivate the important notion that we are going to introducein the next section.

2.3.2 Koopman operator

Let (M,B) be a measurable space, f : M →M be a measurable transformationand µ be an f -invariant measure. The Koopman operator of (f, µ) is the linearoperator

Uf : L1(µ) → L1(µ), Uf (φ) = φ f.

2.3. COMMENTS IN FUNCTIONAL ANALYSIS 51

Note that Uf is well defined and is an isometry, that is, it preserves the normin the Banach space L1(µ): since µ is invariant under f ,

‖Uf(φ)‖1 =

∫|Uf (φ)| dµ =

∫|φ| f dµ =

∫|φ| dµ = ‖φ‖1. (2.3.5)

Moreover, Uf is a positive linear operator: Uf(φ) ≥ 0 at µ-almost every pointwhenever φ ≥ 0 at µ-almost every point. For future reference, we summarizethese facts in the following proposition:

Proposition 2.3.2. The Koopman operator Uf : L1(M) → L1(M) of anysystem (f, µ) is a positive linear isometry.

The property (2.3.5) implies that the operator Uf is injective. In general,Uf is not surjective (see Exercise 2.3.5). It is clear that if f is invertible thenUf is an isomorphism: the inverse is just the Koopman operator Uf−1 of theinverse of f .

We may also consider versions of the Koopman operator defined on thespaces Lp(µ),

Uf : Lp(µ) → Lp(µ), Uf (φ) = φ ffor each p ∈ [1,∞]. Proposition 2.3.2 remains valid in all these cases: all theseoperators are positive linear isometries.

When M is a compact metric space and f is continuous, it is particularlyinteresting to investigate the action of Uf restricted to the space C0(M) ofcontinuous functions:

Uf : C0(M) → C0(M).

It is clear that this operator is continuous relative to the norm of uniform con-vergence. As we have seen previously, the dual space of C0(M) is naturallyidentified with the space M(M) of complex measures on M . Moreover, therelation (2.3.4) shows that, under that identification, the dual operator

U∗f : C0(M)∗ → C0(M)∗

corresponds precisely to the action f∗ : M(M) → M(M) of the transformationf on M(M). This fact allows us to give an alternative proof of Theorem 2.1,based on certain facts from Spectral Theory.

For that, we need to recall some notions from the theory of positive linearoperators. The reader can find a lot more details in the book of Deimling [Dei85],including the proofs of the results quoted in the sequel.

Let E be a Banach space. A closed convex subset C is called a cone of E, ifit satisfies:

λC ⊂ C for every λ ≥ 0 and C ∩ (−C) = 0. (2.3.6)

We call the cone C normal if

inf‖x+ y‖ : x, y ∈ C such that ‖x‖ = ‖y‖ = 1 > 0.


Let us fix a cone C of E. Given any continuous linear operator T : E → E,we say that T is positive over C if the image T (C) ⊂ C. Given a continuouslinear functional φ : E → R, we say that φ is positive over C if φ(v) ≥ 0 forevery v ∈ C. By definition, the dual cone C∗ is the cone of E∗ formed by allthe linear functionals positive over C.

Example 2.3.3. The cone C0+(M) = ϕ ∈ C0(M) : ϕ ≥ 0 is a normal cone of

C0(M) (Exercise 2.3.3). By the Riesz-Markov theorem (Theorem A.3.11), thedual cone is naturally identified with the space of finite positive measures onM .

Denote by r(T ) the spectral radius of the continuous linear operator T :

r(T ) = limn

n√‖T n‖.

Then r(T ) = r(T ∗), where T ∗ : E∗ → E∗ represents the linear operator dualto T . The next result is a consequence of the theorem of Banach-Mazur; seeProposition 7.2 in the book of Deimling [Dei85]:

Theorem 2.3.4. Let C be a normal cone of a Banach space E and T : E → Ebe a linear operator positive over C. Then r(T ∗) is an eigenvalue of the dualoperator T ∗ : E∗ → E∗ and it admits some eigenvector v∗ ∈ C∗.

As an application, let us give an alternative proof of the existence of invariantmeasures for continuous transformations on compact spaces. Consider the coneC = C0

+(M) of E = C0(M). As we observed before, the dual cone C∗ is thespace of finite positive measures on M . It is clear from the definition thatthe operator T = Uf is positive over C. Besides, its spectral radius is equalto 1, since sup |T (ϕ)| ≤ sup |ϕ| for every ϕ ∈ C0(M) and T (1) = 1. So, byTheorem 2.3.4, there exists some finite positive measure µ on M that is aneigenvalue of the dual operator T ∗ = f∗ associated with the eigenvalue 1. Inother words, the measure µ is invariant. Multiplying by a suitable constant, wemay suppose that µ is a probability measure.

2.3.3 Exercises

2.3.1. Let ℓ1 be the space of summable sequences of complex numbers, endowedwith the norm ‖(an)n‖1 =

∑∞n=0 |an|. Let ℓ∞ be the space of bounded sequences

and c0 be the space of sequences converging to zero, both endowed with the norm‖(an)n‖∞ = supn≥0 |an|.

(a) Check that ℓ∞, ℓ1 and c0 are Banach spaces.

(b) Show that the map (an)n 7→ [(bn)n 7→∑n anbn] defines norm-preserving

isomorphisms from ℓ∞ to the dual space (ℓ1)∗ and from ℓ1 to the dualspace (c0)

∗.

2.4. SKEW-PRODUCTS AND NATURAL EXTENSIONS 53

2.3.2. Show that a sequence (xk)k in ℓ1 (write xk = (xkn)n for each k) convergesin the topology defined by the norm if and only if it converges in the weaktopology, that is, if and only if (

∑n anx

kn)k converges for every (an)n ∈ ℓ∞.

[Observation: This does not imply that the two topologies are the same. Whynot?] Show that this is no longer true if we replace the weak topology by theweak∗ topology.

2.3.3. Check that C0+(M) is a normal cone.

2.3.4. Let Rθ : S1 → S1 be an irrational rotation and m be the Lebesguemeasure on the circle. Calculate the eigenvalues and the eigenvectors of theKoopman operator Uθ : L2(m) → L2(m). Show that the spectrum of Uθ coin-cides with the unit circle z ∈ C : |z| = 1.

2.3.5. Show, through examples, that the Koopman operator Uf need not besurjective.

2.3.6. Let U : H → H be an isometry of a Hilbert space H . By Exercise A.6.8,the image of U is a closed subspace of H . Deduce that there exist closed sub-spaces V and W such that U(V ) = V , the iterates of W are pairwise orthogonaland orthogonal to V , and

H = V ⊕∞⊕

n=0

Un(W ).

Furthermore, U is an isomorphism if and only if W = 0.

2.3.7. Let φ : E → R be a continuous convex functional on a separable Banachspace E. Assume that φ is differentiable in all directions at some point u ∈ E.Prove that there exists at most one bounded linear functional T : E → R tangentto φ at u, that is, such that T (v) ≤ φ(u + v) − φ(u) for every v ∈ E. If φ isdifferentiable at u then the derivativeDφ(u) is a linear functional tangent to φ atu. [Observation: The smoothness theorem of Mazur (Theorem 1.20 in the bookof Phelps [Phe93]) states that the set of points where φ is differentiable and,consequently, there exists a unique linear functional tangent to φ is a residualsubset of E.]

2.4 Skew-products and natural extensions

In this section we describe two general constructions that are quite useful in Er-godic Theory. The first one is a basic model for the situation where two dynam-ical systems are coupled in the following way: the first system is autonomousbut the second one is not, because its evolution depends on the evolution ofthe former. The second construction associates an invertible system with anygiven system, in such a way that their invariant measures are in one-to-onecorrespondence. This permits to reduce to the invertible case many statementsabout general, not necessarily invertible systems.


2.4.1 Measures on skew-products

Let (X,A) and (Y,B) be measurable spaces. We call skew-product any measur-able transformation F : X × Y → X × Y of the form F (x, y) = (f(x), g(x, y)).Represent by π : X × Y → X the canonical projection to the first coordinate.By definition,

π F = f π. (2.4.1)

Let m be a probability measure on X × Y invariant under F and let µ = π∗mbe its projection to X . Using that m is invariant under F , we get that

f∗µ = f∗π∗m = π∗F∗m = π∗m = µ,

that is, µ is invariant under f . The next proposition provides a partial converseto this conclusion: under suitable hypotheses, every f -invariant measure is theprojection of some F -invariant measure.

Proposition 2.4.1. Let X be a complete separable metric space, Y be a compactmetric space and F be continuous. Then, for every probability measure µ on Xinvariant under f there exists some probability measure m on X × Y invariantunder F and such that π∗m = µ.

Proof. Given any f -invariant probability measure invariant µ on X , let K ⊂M1(X × Y ) be the set of measures η on X × Y such that π∗η = µ. Considerany η ∈ K. Then, π∗F∗η = f∗π∗η = f∗µ = µ. This proves that K is invariantunder F∗. Next, note that the projection π : X × Y → X is continuous and,thus, the operator π∗ is continuous relative to the weak∗ topology. So, K isclosed in M1(X × Y ). By Proposition A.3.7, given any ε > 0 there existsa compact set K ⊂ X such that µ(Kc) < ε. Then K × Y is compact andη((K × Y )c

)= µ(Kc) < ε for every η ∈ K. This proves that the set K is tight.

Consider any η ∈ K. By the theorem of Prohorov (Theorem 2.1.8), the sequence

1

n

n−1∑

j=0

F j∗ η

has some accumulation point m ∈ K. Arguing as in the proof of Lemma 2.2.4,we conclude that m is invariant under F .

2.4.2 Natural extensions

We are going to see that, given any surjective transformation f : M → M , onecan always find an extension f : M → M that is invertible. By extension wemean that there exists a surjective map π : M → M such that π f = f π.This fact is very useful, for it makes it possible to reduce to the invertible casethe proofs of many statements about general systems. We comment on thesurjective hypothesis in Example 2.4.2: we will see that this hypothesis can beomitted in may interesting cases.


To begin with, take M to be the set of all pre-orbits of f , that is, all sequences(xn)n≤0 indexed by the non-positive integers and satisfying f(xn) = xn+1 for

every n < 0. Consider the map π : M → M sending each sequence (xn)n≤0 to

its term x0 of order zero. Observe that π(M) = M . Finally, define f : M → Mto be the shift by one unit to the left:

f(. . . , xn, . . . , x0) = (. . . , xn, . . . , x0, f(x0)). (2.4.2)

It is clear that f is well defined and satisfies π f = f π. Moreover, f isinvertible: the inverse is the shift to the right

(. . . , yn, . . . , y−1, y0) 7→ (. . . , yn, . . . , y−2, y−1).

If M is a measurable space then we may turn M into a measurable space,by endowing it with the σ-algebra generated by the measurable cylinders

[Ak, . . . , A0] = (xn)n≤0 ∈ M : xi ∈ Ai for i = k, . . . , 0, (2.4.3)

where k ≤ 0 and Ak, . . . , A0 are measurable subsets of M . Then π is ameasurable map, since

π−1(A) = [A]. (2.4.4)

Moreover, f is measurable if f is measurable:

f−1([Ak, . . . , A0]) = [Ak, . . . , A−2, A−1 ∩ f−1(A0)]. (2.4.5)

The inverse of f is also measurable, since

f([Ak, . . . , A0]) = [Ak, . . . , A0,M ]. (2.4.6)

Analogously, if M is a topological space then we may turn M into a topo-logical space, by endowing it with the topology generated by the open cylinders[Ak, . . . , A0], where k ≤ 0 and Ak, . . . , A0 are open subsets of M . The relations

(2.4.4) and (2.4.6) show that π and f−1 are continuous, whereas (2.4.5) shows

that f is continuous if f is continuous. Observe that ifM admits a countable ba-sis U of open sets then the cylinders [Ak, . . . , A0] with k ≥ 0 and A0, . . . , Ak ∈ Uconstitute a countable basis of open sets for M .

If M is a metric space, with distance d, then the following function is adistance on M :

d(x, y) =

0∑

n=−∞

2n mind(xn, yn), 1 (2.4.7)

where x = (xn)n≤0 and y = (yn)n≤0. It follows immediately from the definitionthat if x and y belong to the same pre-image π−1(x) then

d(f j(x), f j(y)) ≤ 2−j d(x, y) for every j ≥ 0.

So, every pre-image π−1(x) is a stable set, that is, a subset restricted to which

the transformation f is uniformly contracting.


Example 2.4.2. Given any transformation g : M → M , consider its maximalinvariant set Mg = ∩∞

n=1gn(M). Clearly, g(Mg) ⊂Mg. Suppose that

(a) M is compact and g is continuous or (b) #g−1(y) <∞ for every y.

Then (Exercise 2.4.3), the restriction f = (g | Mg) : Mg → Mg is surjective.This restriction contains all the interesting dynamics of g. For example, as-suming that fn(M) is a measurable set for every n, every probability measureinvariant under g is also invariant under f . Analogously, every point that isrecurrent for g is also recurrent for f , at least in case (a). For this reason, wealso refer to the natural extension of f = (g |Mg) as the natural extension of g.

A set Λ ⊂M such that f−1(Λ) = Λ is called an invariant set of f . There is

a corresponding notion for the transformation f . The next proposition showsthat every closed invariant set of f admits a unique lift to a closed invariant setof the transformation f :

Proposition 2.4.3. Assume that M is a topological space. If Λ ⊂M is a closedset invariant under f then Λ = π−1(Λ) is the only closed set invariant under fand satisfying π(Λ) = Λ.

Proof. Since π is continuous, if Λ is closed then Λ = π−1(Λ) is also closed.

Moreover, if Λ is invariant under f then Λ is invariant under f :

f−1(Λ) = (π f)−1(Λ) = (f π)−1(Λ) = π−1(Λ) = Λ.

In the converse direction, let Λ ⊂ M be a closed set invariant under f andsuch that π(Λ) = Λ. It is clear that Λ ⊂ π−1(Λ). To prove the other inclusion,we must show that, given any x0 ∈ Λ, if x ∈ π−1(x0) then x ∈ Λ. Let us writex = (xn)n≤0. Consider n ≤ 0 and any neighborhood of x of the form

V = [An, . . . , A0], An, · · · , A0 open subsets of M .

By the definition of natural extension, x0 = f−n(xn) and, hence, xn ∈ fn(Λ) =Λ. Then, the hypothesis π(Λ) = Λ implies that π(yn) = xn for some yn ∈ Λ.

Since Λ is invariant under f , we have that f−n(yn) ∈ Λ. Moreover, the propertyπ(yn) = xn implies that

f−n(yn) = (. . . , yn,k, . . . , yn,−1, yn,0 = xn, xn−1, . . . , x−1, x0).

It follows that f−n(yn) ∈ V , since V contains x and its definition only dependson the coordinates indexed by j ∈ n, . . . , 0. This proves that x is accumulatedby elements of Λ. Since Λ is closed, it follows that x ∈ Λ.

Now let µ be an invariant measure of f and let µ = π∗µ. The propertyπ f = f π implies that µ is invariant under f :

f∗µ = f∗π∗µ = π∗f∗µ = π∗µ = µ.

We say that µ is a lift of µ. The next result, which is a kind of version ofProposition 2.4.3 for measures, is due to Rokhlin [Rok61]:


Proposition 2.4.4. Assume that M is a complete separable metric space andf : M →M is continuous. Then every probability measure µ invariant under fadmits a unique lift, that is, there is a unique measure µ on M invariant underf and such that π∗µ = µ.

Uniqueness is easy to establish and is independent of the hypotheses on thespace M and the transformation f . Indeed, if µ is a lift of µ then (2.4.4) and(2.4.5) imply that the measure of every cylinder is uniquely determined:

µ([Ak, . . . , A0]) = µ([Ak ∩ · · · ∩ f−k(A0)]) = µ(Ak ∩ · · · ∩ f−k(A0)). (2.4.8)

The proof of existence will be proposed to the reader in Exercise 5.2.4, usingideas to be developed in Chapter 5. We will also see in Exercise 8.5.7 thatthose arguments remain valid in the somewhat more general setting of Lebesguespaces. But existence of the lift is not true in general, for arbitrary probabilityspaces, as shown by the example in Exercise 1.15 in the book of Przytycki,Urbanski [PU10]).

2.4.3 Exercises

2.4.1. Let M be a compact metric space and X be a set of continuous mapsf : M →M , endowed with a probability measure ν. Consider the skew-productF : XN × M → XN × M defined by F ((fn)n, x) = ((fn+1)n, f0(x)). Showthat F admits some invariant probability measure m of the form m = νN × µ.Moreover, a measure m of this form is invariant under F if and only if themeasure µ is stationary for ν, that is, if and only if µ(E) =

∫f∗µ(E) dν(f) for

every measurable set E ⊂M .

2.4.2. Let f : M → M be a surjective transformation, f : M → M be itsnatural extension and π : M → M be the canonical projection. Show that ifg : N → N is an invertible transformation such that f p = p g for some mapp : N → M then there exists a unique map p : N → M such that π p = pand p g = f p. Suppose that M and N are compact spaces and the maps pand g are continuous. Show that if p is surjective then p is surjective (and so

g : N → N is an extension of f : M → M).

2.4.3. Check the claims in Example 2.4.2.

2.4.4. Show that if (M,d) is a complete separable metric space then the same

holds for the space (M, d) of the pre-orbits of any continuous surjective trans-formation f : M →M .

2.4.5. The purpose of this exercise and the next is to generalize the notion ofnatural extension to finite families of commuting transformations. Let M bea compact space and f1, . . . , fq : M → M be commuting surjective continuous

transformations. Let M be the set of all sequences (xn1,...,nq)n1,...,nq≤0, indexedby the q-uples of non-positive integer numbers, such that

fi(xn1,...,ni,...,nq) = xn1,...,ni+1,...,nq for every i and every (n1, . . . , nq).


Let π : M → M be the map sending (xn1,...,nq)n1,...,nq≤0 to the point x0,...,0.

For each i, let fi : M → M be the map sending (xn1,...,ni,...nq)n1,...,nq≤0 to(xn1,...,ni+1,...nq)n1,...,nq≤0.

(a) Prove that M is a compact space. Moreover, M is metrizable if M ismetrizable.

(b) Show that every fi : M → M is a homeomorphism with π fi = fi π.Moreover, these homeomorphisms commute.

(c) Prove that π is continuous and surjective. In particular, M is non-empty.

2.4.6. Let M be a compact space and g1, . . . , gq : M → M be commutingcontinuous transformations. Define Mg = ∩∞

n=1gn1 · · · gnq (M).

(a) Check that Mg = ∩n1,...,nqgn11 · · · gnq

q (M), where the intersection is overall q-uples (n1, . . . , nq) with ni ≥ 1 for every i.

(b) Show that gi(Mg) ⊂Mg and the restriction fi = gi | Mg is surjective, forevery i.

[Observation: It is clear that these restrictions fi commute.]

2.4.7. Use the construction in Exercises 2.4.5 and 2.4.6 to extend the proofof Theorem 1.5.1 to the case when the transformations fi are not necessarilyinvertible.

2.5 Arithmetic progressions

In this section we prove two fundamental results of Combinatorial Arithmetics,the theorem of van der Waerden and the theorem of Szemeredi, using the mul-tiple recurrence theorems (Theorem 1.5.1 and Theorem 1.5.2) introduced inSection 1.5.

We call partition of the set Z of integer numbers any finite family of pairwisedisjoint sets S1, . . . , Sk ⊂ Z whose union is the whole Z. Recall that a (finite)arithmetic progression is a sequence of the form

m+ n,m+ 2n, . . . ,m+ qn, with m ∈ Z and n, q ≥ 1.

The number q is called the length of the progression.The next theorem was originally proven by the Dutch mathematician Bartel

van der Waerden [vdW27] in the 1920’s:

Theorem 2.5.1 (van der Waerden). Given any partition S1, . . . , Sl of Z,there exists j ∈ 1, . . . , l such that Sj contains arithmetic progressions of everylength. In other words, for every q ≥ 1 there exist m ∈ Z and n ≥ 1 such thatm+ in ∈ Sj for every 1 ≤ i ≤ q.

2.5. ARITHMETIC PROGRESSIONS 59

Some time afterwards, the Hungarian mathematicians Pal Erdos and PalTuran [ET36] conjectured the following statement, which is stronger than thetheorem of van der Waerden: any set S ⊂ Z whose upper density is positivecontains arithmetic progressions of every length. This was proven by anotherHungarian mathematician, Endre Szemeredi [Sze75], almost four decades later.To state the theorem of Szemeredi precisely, we need to define the notion ofupper density of a subset of Z.

We call interval of the set Z any subset I of the form n ∈ Z : a ≤ n < bwith a ≤ b. The cardinal of an interval I is the number #I = b− a. The upperdensity of a subset S of Z is the number

Du(S) = lim sup#I→∞

#(S ∩ I)#I

where I represents any interval of Z. The lower density Dl(S) of a subset S of Zis defined analogously, just replacing limit superior with limit inferior. In otherwords, Du(S) is the largest and Dl(D) is the smallest number D such that

#(S ∩ Ij)#Ij

→ D for some sequence of intervals Ij ⊂ Z with #Ij → ∞.

In the next lemma we collect some simple properties of the upper and lowerdensities. The proof is left as an exercise (Exercise 2.5.1).

Lemma 2.5.2. For any S ⊂ Z,

0 ≤ Dl(S) ≤ Du(S) ≤ 1 and Dl(S) = 1 −Du(Z \ S).

Moreover, if S1, . . . , Sl is a partition of Z then

Dl(S1) + · · · +Dl(Sl) ≤ 1 ≤ Du(S1) + · · · +Du(Sl).

Example 2.5.3. Let S be the set of even numbers. For any interval I ⊂ Z, wehave #(S ∩ I) = #I/2 if the cardinal of I is even and #(S ∩ I) = (#I ± 1)/2if the cardinal of I is odd; the sign ± is positive if the smallest element of Iis an even number and it is negative otherwise. It follows, immediately, thatDu(S) = Dl(S) = 1/2.

Example 2.5.4. Let S be the following subset of Z:

1, 3, 4, 7, 8, 9, 13, 14, 15, 16, 21, 22, 23, 24, 25, 31, 32, 33, 34, 35, 36, 43, . . ..

That is, for each k ≥ 1 we include in S a block of k consecutive integers andthen we omit the next k integer numbers. On the one hand, S contains intervalsof every length. Consequently, Du(S) = 1. On the other hand, the complementof S also contains intervals of every length. So, Dl(S) = 1 −Du(Z \ S) = 0.

Notice that, in both examples, the set S contains arithmetic progressionsof every length. Actually, in Example 2.5.3 the set S even contains arithmeticprogressions of infinite length. That is not true in Example 2.5.4, because inthis case the complement of S contains arbitrarily long intervals.


Theorem 2.5.5 (Szemeredi). If S is a subset of Z with positive upper densitythen it contains arithmetic progressions of every length.

The theorem of van der Waerden is an easy consequence of the theorem ofSzemeredi. Indeed, it follows from Lemma 2.5.2 that if S1, . . . , Sl is a partitionof Z then there exists j such that Du(Sj) > 0. By Theorem 2.5.5, such an Sjcontains arithmetic progressions of every length.

The original proofs of these results were combinatorial. Then, Furstenberg(see [Fur81]) observed that the two theorems could also be deduced from ideas inErgodic Theory: we will show in Section 2.5.1 how to obtain the theorem of vander Waerden from the multiple recurrence theorem of Birkhoff (Theorem 1.5.1);similar arguments yield the theorem of Szemeredi from the multiple recurrencetheorem of Poincare (Theorem 1.5.2), as we will see in Section 2.5.2.

The theory of Szemeredi remains a very active research area. In particular,alternative proofs of Theorem 2.5.5 have been given by other authors. Recently,this led to the following spectacular result of the British mathematician BenGreen and the Australian mathematician Terence Tao [GT08]: the set of primenumbers contains arithmetic progressions of every length. This is not a con-sequence of the theorem of Szemeredi, because the upper density of the set ofprime numbers is zero, but the theorem of Szemeredi does have an importantrole in the proof. On the other hand, the theorem of Green-Tao is a special caseof yet another conjecture of Erdos: if S ⊂ N is such that the sum of the inversesdiverges, that is, such that ∑

n∈S

1

n= ∞,

then S contains arithmetic progressions of every length. This more generalstatement remains open.

2.5.1 Theorem of van der Waerden

In this section we prove Theorem 2.5.1. The idea of the proof is to reduce theconclusion of the theorem to a claim about the shift map

σ : Σ → Σ, (αn)n∈Z 7→ (αn+1)n∈Z

in the space Σ = 1, 2, . . . , lZ of two-sided sequences with values in the set1, 2, . . . , l. This claim will then be proved using the multiple recurrence the-orem of Birkhoff.

Observe that every partition S1, . . . , Sl of Z into l ≥ 2 subsets determinesan element α = (αn)n∈Z of Σ, through αn = i ⇔ n ∈ Si. Conversely, everyα ∈ Σ determines a partition of Z into subsets

Si = n ∈ Z : αn = i, i = 1, . . . , l.

We are going to show that for every α ∈ Σ and every q ≥ 1, there exist m ∈ Zand n ≥ 1 such that

αm+n = · · · = αm+qn. (2.5.1)


In view of what we have just observed, this means that for every partitionS1, . . . , Sl and every q ≥ 1 there exists i ∈ 1, . . . , l such that Si con-tains some arithmetic progression of length q. Since there are finitely manySi, that implies that some Sj contains arithmetic progressions of arbitrarilylarge lengths. This is the same as saying that Si contains arithmetic progres-sions of every length because, clearly, every arithmetic progression of length qcontains arithmetic progressions of every length smaller than q. This reducesthe proof of the theorem to proving the claim in (2.5.1).

To that end, let us consider on Σ the distance defined by d(β, γ) = 2−N(β,γ),where

N(β, γ) = maxN ≥ 0 : βn = γn for every n ∈ Z with |n| < N

.

Note thatd(β, γ) < 1 if and only if β0 = γ0. (2.5.2)

Since the metric space (Σ, d) is compact, the closure Z =σn(α) : n ∈ Z

of

the trajectory of α is also compact. Moreover, Z is invariant under the shiftmap. Let us consider the transformations f1 = σ, f2 = σ2, . . . , fq = σq definedfrom Z to Z. It is clear from the definition that these transformations commutewith each other. So, we may use Theorem 1.5.1 to conclude that there existθ ∈ Z and a sequence (nk)k → ∞ such that

limkfnk

i (θ) = θ for every i = 1, 2, . . . , q.

Observe that fnj

i = σi nj . In particular, we may fix n = nj such that the iteratesσn(θ), σ2n(θ), . . . , σqn(θ) are all within distance less than 1/2 from the pointθ. Consequently,

d(σin(θ), σjn(θ)

)< 1 for every 1 ≤ i, j ≤ q.

Then, as θ is in the closure Z of the orbit of α, we may find m ∈ Z such thatσm(α) is so close to θ that

d(σm+in(α), σm+jn(α)

)< 1 for every 1 ≤ i, j ≤ q.

Taking into account the observation (2.5.2) and the definition of the shift mapσ, this means that αm+n = · · · = αm+qn, as we wanted to prove. This completesthe proof of the theorem of van der Waerden.

2.5.2 Theorem of Szemeredi

Now let us prove Theorem 2.5.5. We use the same kind of dictionary betweenpartitions of Z and sequences of integer numbers that was used in the previoussection to prove the theorem of van der Waerden.

Let S be a subset of integer numbers with positive upper density, that is,such that there exist c > 0 and intervals Ij = [aj , bj) of Z satisfying

limj

#Ij = ∞ and limj

#(S ∩ Ij)#Ij

≥ c.


Let us associate with S the sequence α = (αj)j∈Z ∈ Σ = 0, 1Z defined by

αj = 1 ⇔ j ∈ S.

Consider the shift map σ : Σ → Σ and the subset A = α ∈ Σ : α0 = 1 of Σ.Note that both A and its complement are open cylinders of Σ. Thus, A is bothopen and closed in Σ. Moreover, for every j ∈ Z,

σj(α) ∈ A⇔ αj = 1 ⇔ j ∈ S.

So, to prove the theorem of Szemeredi it suffices to show that for every k ∈ Nthere exist m ∈ Z and n ≥ 1 such that

σm+n(α), σm+2n(α), . . . , σm+kn(α) ∈ A. (2.5.3)

For that, let us consider the sequence µj of probability measures defined on Σby:

µj =1

#Ij

∑

i∈Ij

δσi(α) (2.5.4)

Since the space M1(Σ) of all probability measures on Σ is compact (Theo-rem 2.1.5), up to replacing (µj)j by some subsequence we may suppose that itconverges in the weak∗ topology to some probability measure µ on Σ.

Observe that µ is a σ-invariant probability measure for, given any continuousfunction ϕ : Σ → R,

∫(ϕ σ) dµj =

1

#Ij

∑

i∈Ij

ϕ(σi(α)) +1

#Ij

[ϕ(σbj (α)) − ϕ(σaj (α))

]

=

∫ϕdµj +

1

#Ij

[ϕ(σbj (α)) − ϕ(σaj (α))

]

and, taking the limit when j → ∞, it follows that∫

(ϕσ) dµ =∫ϕdµ. Observe

also that µ(A) > 0. Indeed, since A is closed, Theorem 2.1.2 ensures that

µ(A) ≥ lim supj

µj(A) = lim supj

#(S ∩ Ij)#Ij

≥ c.

Given any k ≥ 1, consider fi = σi for i = 1, . . . , k. It is clear that thesetransformations commute with each other. So, we are in a position to applyTheorem 1.5.2 to conclude that there exists some n ≥ 1 such that

µ(A ∩ σ−n(A) ∩ · · · ∩ σ−kn(A)

)> 0.

Since A is open, this implies (Theorem 2.1.2) that

µl(A ∩ σ−n(A) ∩ · · · ∩ σ−kn(A)

)> 0

for every l sufficiently large. By the definition (2.5.4) of µl, this means thatthere exists some m ∈ Il such that

σm(α) ∈ A ∩ σ−n(A) ∩ · · · ∩ σ−kn(A).

In particular, σm+in(α) ∈ A for every i = 1, . . . , k, as we wanted to prove.


2.5.3 Exercises

2.5.1. Prove Lemma 2.5.2.

2.5.2. Show that the conclusion of Theorem 2.5.1 remains valid for partitionsof finite subsets of Z, as long as they are sufficiently large. More precisely: givenq, l ≥ 1 there exists N ≥ 1 such that, for any partition of the set 1, 2, . . . , Ninto l subsets, some of these subsets contains arithmetic progressions of lengthq.

2.5.3. A point x ∈ M is said to be super non-wandering if, given any neigh-borhood U of x and any k ≥ 1, there exists n ≥ 1 such that ∩kj=0f

−jn(U) 6= ∅.Show that the theorem of van der Warden is equivalent to the following state-ment: every invertible transformation on a compact metric space has some supernon-wandering point.

2.5.4. Prove the following generalization of the theorem of van der Waerdento arbitrary dimension, called Grunwald theorem: given any partition Nk =S1 ∪ · · · ∪ Sl and any q ≥ 1, there exist j ∈ 1, . . . , l, d ∈ N and b ∈ Nk suchthat

b+ d(a1, . . . , ak) ∈ Sj for any 1 ≤ ai ≤ q and any 1 ≤ i ≤ k.


Chapter 3

Ergodic Theorems

In this chapter we present the fundamental results of Ergodic Theory. To mo-tivate the kind of statements that we are going to discuss, let us consider ameasurable set E ⊂ M with positive measure and an arbitrary point x ∈ M .We want to analyze the set of iterates of x that visit E, that is,

j ≥ 0 : f j(x) ∈ E.

For example, the Poincare recurrence theorem states that this set is infinite,for almost every x ∈ E. We would like to have more precise quantitativeinformation. Let us call mean sojourn time of x to E the value of

τ(E, x) = limn→∞

1

n#0 ≤ j < n : f j(x) ∈ E. (3.0.1)

There is an analogous notion for flows, defined by

τ(E, x) = limT→∞

1

Tm(0 ≤ t ≤ T : f t(x) ∈ E

), (3.0.2)

where m is the Lebesgue measure on the real line. It would be interesting toknow, for example, under which conditions the mean sojourn time is positive.But before tackling this problem one must answer an even more basic question:when do the limits in (3.0.1)-(3.0.2) exist?

These questions go back to the work of the Austrian physicist Ludwig Boltz-mann (1844-1906), founder of the Kinetic Theory of gases. Boltzmann was anemphatic supporter of the Atomic Theory, according to which gases are formedby a large number of small moving particles, constantly colliding with eachother, at a time when this theory was still highly controversial. In principle,it should be possible to explain the behavior of a gas by applying the laws ofClassical Mechanics to each one of these particles (molecules). In practice, thisis not realistic because the number of molecules is huge.

The proposal of the Kinetic Theory was, then, to try and explain the behav-ior of gases at a macroscopic scale as the statistical combination of the motions

65

66 CHAPTER 3. ERGODIC THEOREMS

of all its molecules. To formulate the theory in precise mathematical terms,Boltzmann was forced to make an assumption that became known as the er-godic hypothesis. In a modern language, the ergodic hypothesis claims that, forthe kind of systems (Hamiltonian flows) that describe the motions of particlesof a gas, the mean sojourn time to any measurable set E exists and is equal tothe measure of E, for almost every point x.

Efforts to validate (or not) this hypothesis led to important developments,in Mathematics – Ergodic Theory, Dynamical Systems – as well as in Physics– Statistical Mechanics. In this chapter we concentrate ourselves on resultsconcerning the existence of the mean sojourn time. The question of whetherτ(E, x) = µ(E) for almost every x is the subject of Chapter 4.

Denoting by ϕ the characteristic function of the set E, we may rewrite theexpression on the right-hand side of (3.0.1) as

limn→∞

1

n

n−1∑

j=0

ϕ(f j(x)). (3.0.3)

This suggests a natural generalization of the original question: does the limitin (3.0.3) exist for more general functions ϕ, for example, for all integrablefunctions?

The ergodic theorem of von Neumann (Theorem 3.1.6) states that the limit in(3.0.3) does exist, in the space L2(µ), for every function ϕ ∈ L2(µ). The ergodictheorem of Birkhoff (Theorem 3.2.3) goes a lot further, by asserting that theconvergence holds at µ-almost every point, for every ϕ ∈ L1(µ). In particular,the limit in (3.0.1) is well-defined for µ-almost every x (Theorem 3.2.1).

We give a direct proof of the theorem of von Neumann and we also show howit can be deduced from the theorem of Birkhoff. Concerning the latter, we aregoing to see that it can be obtained as a special case of an even stronger result,the subadditive ergodic theorem of Kingman (Theorem 3.3.3). This theoremasserts that ψn/n converges almost everywhere, for any sequence of functionsψn such that ψm+n ≤ ψm + ψn fm for every m,n.

All these results remain valid for flows, as we comment upon in Section 3.4.

3.1 Ergodic theorem of von Neumann

In this section we state and prove the ergodic theorem of von Neumann. Webegin by reviewing some general ideas concerning isometries in Hilbert spaces.See Appendices A.6 and A.7 for more information on this topic.

3.1.1 Isometries in Hilbert spaces

Let H be a Hilbert space and F be a closed subspace of H . Then,

H = F ⊕ F⊥, (3.1.1)

3.1. ERGODIC THEOREM OF VON NEUMANN 67

where F⊥ = w ∈ H : v ·w = 0 for every v ∈ F is the orthogonal complementof F . The projection PF : H → F associated with the decomposition (3.1.1) iscalled orthogonal projection to F . It is uniquely characterized by

‖x− PF (x)‖ = min‖x− v‖ : v ∈ F.

Observe that PF (v) = v for every v ∈ F and, consequently, P 2F = PF .

Example 3.1.1. Consider the Hilbert space L2(µ), with the inner product

ϕ · ψ =

∫ϕψ dµ.

Let F be the subspace of constant functions. Given any ϕ ∈ L2(µ), we havethat (PF (ϕ) − ϕ) · 1 = 0, that is,

PF (ϕ) · 1 = ϕ · 1.

Since PF (ϕ) is a constant function, the expression on the left-hand side is equalto PF (ϕ). The expression on the right-hand side is equal to

∫ϕdµ. Therefore,

the orthogonal projection to the subspace F is given by

PF (ϕ) =

∫ϕdµ.

Recall that the adjoint operator U∗ : H → H of a continuous linear operatorU : H → H is defined by the relation

U∗u · v = u · Uv for every u, v ∈ H . (3.1.2)

The operator U is said to be an isometry if it preserves the inner product:

Uu · Uv = u · v for every u, v ∈ H . (3.1.3)

This is equivalent to saying that U preserves the norm of H (see Exercise A.6.9).Another equivalent condition is U∗U = id . Indeed,

Uu · Uv = u · v for every u, v ⇔ U∗Uu · v = u · v for every u, v.

The property U∗U = id implies that U is injective. In general, an isometry neednot be surjective. See Exercises 2.3.5 and 2.3.6. If an isometry is surjective thenit is an isomorphism; such isometries are also called unitary operators.

Example 3.1.2. If f : M → M preserves a measure µ then, as we saw inSection 2.3.2, the Koopman operator Uf : L2(µ) → L2(µ) is an isometry. If fis invertible then Uf is a unitary operator.

We call set of invariant vectors of a continuous linear operator U : H → Hthe subspace

I(U) = v ∈ H : Uv = v.Observe that I(U) is a closed vector subspace, since U is continuous and linear.When U is an isometry, we have that I(U) = I(U∗):


Lemma 3.1.3. If U : H → H is an isometry then Uv = v if and only ifU∗v = v.

Proof. Since U∗U = id , it is clear that Uv = v implies U∗v = v. Now assumethat U∗v = v. Then, Uv · v = v · U∗v = v · v = ‖v‖2. So, using the fact that Upreserves the norm of H ,

‖Uv − v‖2 = (Uv − v) · (Uv − v) = ‖Uv‖2 − Uv · v − v · Uv + ‖v‖2 = 0.

This means that Uv = v.

To close this brief digression, let us quote a classical result from FunctionalAnalysis, due to Marshall H. Stone, that permits to reduce the study of Koop-man operators of continuous time systems to the discrete case.

Let Ut : H → H , t ∈ R be a 1-parameter group of linear operators ona Banach space: by this we mean that U0 = id and Ut+s = UtUs for everyt, s ∈ R. We say that the group is strongly continuous if

limt→t0

Utv = Ut0v, for every t0 ∈ R and v ∈ H .

Theorem 3.1.4 (Stone). If Ut : H → H, t ∈ R is a strongly continuous1-parameter group of unitary operators on a complex Hilbert space then thereexists a self-adjoint operator A, defined on a dense subspace D(A) of H, suchthat Ut | D(A) = eitA for every t ∈ R.

A proof may be found in the book of Yosida [Yos68, § IX.9] and a simpleapplication is given in Exercise 3.1.5. The operator iA is called infinitesimalgenerator of the group. It may be retrieved through

iAv = limt→0

1

t

(Utv − v

). (3.1.4)

See Yosida [Yos68, § IX.3] for a proof of the fact that the limit on the right-handside exists for every v in a dense subspace of H .

Example 3.1.5. Let H be the Banach space of continuous functions ϕ : S1 →C, with the norm of uniform convergence. Define Ut(ϕ)(x) = ϕ(x+ t) for everyfunction ϕ ∈ H . Observe that (Ut)t is a strongly continuous 1-parameter groupof isometries of H . The infinitesimal generator is given by

iAφ(x) = limt→0

1

t

(Utφ(x) − φ(x)

)= lim

t→0

1

t

(φ(x+ t) − φ(x)

)= φ′(x).

Its domain is the subset of functions of class C1, which is well-known to bedense in H .

3.1. ERGODIC THEOREM OF VON NEUMANN 69

3.1.2 Statement and proof of the theorem

Our first ergodic theorem is:

Theorem 3.1.6 (von Neumann). Let U : H → H be an isometry in a Hilbertspace H and P be the orthogonal projection to the subspace I(U) of invariantvectors of U . Then,

limn→∞

1

n

n−1∑

j=0

U jv = Pv for every v ∈ H. (3.1.5)

Proof. Let L(U) be the set of vectors v ∈ H of the form v = Uu − u for someu ∈ H and let L(U) be its closure. We claim that

I(U) = L(U)⊥. (3.1.6)

This can be checked as follows. Consider any v ∈ I(U) and w ∈ L(U). ByLemma 3.1.3, we have that v ∈ I(U∗), that is, U∗v = v. Moreover, by definitionof L(U), there are un ∈ H , n ≥ 1 such that (Uun − un)n → w. Since

v · (Uun − un) = v · Uun − v · un = U∗v · un − v · un = 0

for every n, we conclude that v ·w = 0. This proves that I(U) ⊂ L(U)⊥. Next,consider any v ∈ L(U)⊥. Then, in particular,

v · (Uu− u) = 0 or, equivalently, U∗v · u− v · u = 0

for every u ∈ H . This means that U∗v = v. Using Lemma 3.1.3 once more, wededuce that v ∈ I(U). This shows that L(U)⊥ ⊂ I(U), which completes theproof of (3.1.6). As a consequence, using (3.1.1),

H = I(U) ⊕ L(U). (3.1.7)

Now we prove the identity (3.1.5), successively, for v ∈ I(U), for v ∈ L(U)and for any v ∈ H . Begin by supposing that v ∈ I(U). On the one hand,Pv = v. On the other hand,

1

n

n−1∑

j=0

U jv =1

n

n−1∑

j=0

v = v

for every n, and so this sequence converges to v when n→ ∞. Combining thesetwo observations we get (3.1.5) in this case.

Next, suppose that v ∈ L(U). Then, by definition, there exists u ∈ H suchthat v = Uu− u. It is clear that

1

n

n−1∑

j=0

U jv =1

n

n−1∑

j=0

(U j+1u− U ju

)=

1

n(Unu− u).


The norm of this last expression is bounded by 2‖u‖/n and, consequently, con-verges to zero when n→ ∞. This shows that

limn

1

n

n−1∑

j=0

U jv = 0 for every v ∈ L(U). (3.1.8)

More generally, suppose that v ∈ L(U). Then, there exist vectors vk ∈ L(U)converging to v when k → ∞. Observe that

∥∥ 1

n

n−1∑

j=0

U jv − 1

n

n−1∑

j=0

U jvk∥∥ ≤ 1

n

n−1∑

j=0

‖U j(v − vk)‖ ≤ ‖v − vk‖

for every n and every k. Together with (3.1.8), this implies that

limn

1

n

n−1∑

j=0

U jv = 0 for every v ∈ L(U). (3.1.9)

Since (3.1.6) implies that Pv = 0 for every v ∈ L(U), this shows that (3.1.5)holds also when v ∈ L(U).

The general case of (3.1.5) follows immediately, as H = I(U) ⊕ L(U).

3.1.3 Convergence in L2(µ)

Given a measurable transformation f : M → M and an invariant probabilitymeasure µ on M , we say that a measurable function ψ : M → R is invariantif ψ f = ψ at µ-almost every point. The following result is a special case ofTheorem 3.1.6:

Theorem 3.1.7. Given any ϕ ∈ L2(µ), let ϕ be the orthogonal projection of ϕto the subspace of invariant functions. Then the sequence

1

n

n−1∑

j=0

ϕ f j (3.1.10)

converges to ϕ in the space L2(µ). If f is invertible, then the sequence

1

n

n−1∑

j=0

ϕ f−j (3.1.11)

also converges to ϕ in L2(µ).

Proof. Let U = Uf : L2(µ) → L2(µ) be the Koopman operator of (f, µ). Notethat a function ψ is in the subspace I(U) of invariant functions if and only ifψ f = ψ at µ-almost every point. By Theorem 3.1.6, the sequence in (3.1.10)converges in the space L2(µ) to the orthogonal projection ϕ of ϕ to the subspaceI(U). This proves the first claim.

3.2. BIRKHOFF ERGODIC THEOREM 71

The second one is analogous, taking instead U = Uf−1 , which is the inverseof Uf . We get that the sequence in (3.1.11) converges in L2(µ) to the orthogonalprojection of ϕ to the subspace I(Uf−1). Observing that I(Uf−1) = I(Uf ), weconclude that the limit of this sequence is just the same function ϕ as before.

3.1.4 Exercises

3.1.1. Show that under the hypotheses of the von Neumann ergodic theoremone has the following stronger conclusion:

limn−m→∞

1

n−m

n−1∑

j=m

ϕ f j → P (ϕ).

3.1.2. Use the previous exercise to show that, given any A ⊂M with µ(A) > 0,the set of values of n ∈ N such that µ(A∩f−n(A)) > 0 is syndetic. [Observation:We have seen a different proof of this fact in Exercise 1.2.5.]

3.1.3. Prove that the set F = ϕ ∈ L1(µ) : ϕ is f -invariant is a closed sub-space of L1(µ).

3.1.4. State and prove a version of the von Neumann ergodic theorem for flows.

3.1.5. Let ft : M →M , t ∈ R be a continuous flow on a compact metric spaceM and µ be an invariant probability measure. Check that the 1-parametergroup Ut : L2(µ) → L2(µ), t ∈ R of Koopman operators ϕ 7→ Utϕ = ϕ ft isstrongly continuous. Show that µ is ergodic if and only if 0 is a simple eigenvalueof the infinitesimal generator of the group.

3.2 Birkhoff ergodic theorem

The theorem that we present in this section was proven by George DavidBirkhoff1, the prominent American mathematician of his generation and au-thor of many other fundamental contributions to Dynamics. It is a substantialimprovement of the von Neumann ergodic theorem, because its conclusion isstated in terms of convergence at µ-almost every point, which in this context isa stronger property than convergence in L2(µ), as explained in Section 3.2.3.

3.2.1 Mean sojourn time

We start by stating the version of the theorem for mean sojourn times:

Theorem 3.2.1 (Birkhoff). Let f : M → M be a measurable transformationand µ be a probability measure invariant under f . Given any measurable setE ⊂M , the mean sojourn time

τ(E, x) = limn

1

n#j = 0, 1, . . . , n− 1 : f j(x) ∈ E

1His son Garret Birkhoff was also a mathematician, and is well known for his work inAlgebra. The notion of projective distance that we use in Section 12.3 was due to him.


exists at µ-almost every point x ∈M . Moreover,∫τ(E, x) dµ(x) = µ(E).

Observe that if τ(E, x) exists for some x ∈M then

τ(E, f(x)) = τ(E, x). (3.2.1)

Indeed, by definition,

τ(E, f(x)) = limn→∞

1

n

n∑

j=1

XE(f j(x))

= limn→∞

1

n

n−1∑

j=0

XE(f j(x)) − 1

n

[XE(x) −XE(fn(x))

]

= τ(E, x) − limn→∞

1

n

[XE(x) −XE(fn(x))

]

Since the characteristic function is bounded, the last limit is equal to zero. Thisproves (3.2.1).

The next example shows that the mean sojourn time does not exist for everypoint, in general:

Example 3.2.2. Consider the number x ∈ (0, 1) defined by the decimal expan-sion x = 0, a1a2a3 . . . , where ai = 1 if 2k ≤ i < 2k+1 with k even and ai = 0 if2k ≤ i < 2k+1 with k odd. In other words,

x = 0, 10011110000000011111111111111110 . . . ,

where the lengths of the alternating blocks of 0s and 1s are given by the suc-cessive powers of 2. Let f : [0, 1] → [0, 1] be the transformation defined inSection 1.3.1 and let E = [0, 1/10). That is, E is the set of all points whosedecimal expansion starts with the digit 0. It is easy to check that if n = 2k − 1with k even then

1

n

n−1∑

j=0

XE(f j(x)) =21 + 23 + · · · + 2k−1

2k − 1=

2

3.

On the other hand, if one takes n = 2k − 1 with k odd then

1

n

n−1∑

j=0

XE(f j(x)) =21 + 23 + · · · + 2k−2

2k − 1=

2k − 2

3(2k − 1)→ 1

3

as k → ∞. Thus, the mean sojourn time of x in the set E does not exist.

3.2.2 Time averages

As we observed previously,

τ(E, x) = limn

1

n

n−1∑

j=0

ϕ(f j(x)), where ϕ = XE .


The next statement extends Theorem 3.2.1 to the case when ϕ is any integrablefunction:

Theorem 3.2.3 (Birkhoff). Let f : M → M be a measurable transformationand µ be a probability measure invariant under f . Given any integrable functionϕ : M → R, the limit

ϕ(x) = limn→∞

1

n

n−1∑

j=0

ϕ(f j(x)) (3.2.2)

exists at µ-almost every point x ∈ M . Moreover, the function ϕ defined in thisway is integrable and satisfies

∫ϕ(x) dµ(x) =

∫ϕ(x) dµ(x).

In a little while, we will obtain this theorem as a special case of a moregeneral result, the subadditive ergodic theorem. The limit ϕ is called timeaverage, or orbital average, of ϕ. The next proposition shows that time averagesare constant on the orbit of µ-almost every point, which generalizes (3.2.1):

Proposition 3.2.4. Let ϕ : M → R be an integrable function. Then,

ϕ(f(x)) = ϕ(x) for µ-almost every point x ∈M . (3.2.3)

Proof. By definition,

ϕ(f(x)) = limn→∞

1

n

n∑

j=1

ϕ(f j(x)) = limn→∞

1

n

n−1∑

j=0

ϕ(f j(x)) +1

n

[ϕ(fn(x)) − ϕ(x)

]

= ϕ(x) + limn→∞

1

n

[ϕ(fn(x)) − ϕ(x)

]

We need the following lemma:

Lemma 3.2.5. If φ is an integrable function then limn(1/n)φ(fn(x)) = 0 forµ-almost every point x ∈M .

Proof. Fix any ε > 0. Since µ is invariant, we have that

µ(x ∈M : |φ(fn(x))| ≥ nε

)= µ

(x ∈M : |φ(x)| ≥ nε

)

=

∞∑

k=n

µ(x ∈M : k ≤ |φ(x)|

ε< k + 1

).

Adding these expressions over n ∈ N, we obtain

∞∑

n=1

µ(x ∈M : |φ(fn(x))| ≥ nε

)=

∞∑

k=1

kµ(x ∈M : k ≤ |φ(x)|

ε< k + 1

)

≤∫ |φ|

εdµ.


Since φ is integrable, by assumption, all these expressions are finite. Thatimplies that the set B(ε) of all points x such that |φ(fn(x))| ≥ nε for infinitelymany values of n has zero measure (check Exercise A.1.6). Now, the definition ofB(ε) implies that for every x /∈ B(ε) there exists p ≥ 1 such that |φ(fn(x))| < nεfor every n ≥ p. Consider the set B = ∪∞

i=1B(1/i). Then B has zero measureand limn(1/n)φ(fn(x)) = 0 for every x /∈ B.

Applying Lemma 3.2.5 to the function φ = ϕ we obtain the identity in(3.2.3). This completes the proof of Proposition 3.2.4.

In general, the total measure subset of points for which the limit in (3.2.2)exists depends on the function ϕ under consideration. However, in some situ-ations it is possible to choose such a set independent of the function. A usefulexample of such a situation is:

Theorem 3.2.6. Let M be a compact metric space and f : M → M be ameasurable map. Then there exists some measurable set G ⊂M with µ(G) = 1such that

1

n

n−1∑

j=0

ϕ(f j(x)) → ϕ(x) (3.2.4)

for every x ∈ G and every continuous function ϕ : M → R.

Proof. By the Birkhoff ergodic theorem, for every continuous function ϕ thereexists G(ϕ) ⊂ M such that µ(G(ϕ)) = 1 and (3.2.4) holds for every x ∈ G(ϕ).By Theorem A.3.13, the space C0(M) of continuous functions admits somecountable dense subset ϕk : k ∈ N. Take

G =

∞⋂

k=1

G(ϕk).

Since the intersection is countable, it is clear that µ(G) = 1. So, it suffices toprove that (3.2.4) holds for every continuous function ϕ whenever x ∈ G. Thiscan be done as follows. Given ϕ ∈ C0(M) and any ε > 0, take k ∈ N such that

‖ϕ− ϕk‖ = sup|ϕ(x) − ϕk(x)| : x ∈M

≤ ε.

Then, given any point x ∈ G,

lim supn

1

n

n−1∑

j=0

ϕ(f j(x)) ≤ limn

1

n

n−1∑

j=0

ϕk(fj(x)) + ε = ϕk(x) + ε

lim infn

1

n

n−1∑

j=0

ϕ(f j(x)) ≥ limn

1

n

n−1∑

j=0

ϕk(fj(x)) − ε = ϕk(x) − ε.

This implies that

lim supn

1

n

n−1∑

j=0

ϕ(f j(x)) − lim infn

1

n

n−1∑

j=0

ϕ(f j(x)) ≤ 2ε.


Since ε is arbitrary, it follows that the limit ϕ(x) exists, as stated.

In general, one can not say anything about the speed of convergence inTheorem 3.2.3. For example, it follows from a theorem of Kakutani and Petersen(check pages 94 to 99 in the book of Petersen [Pet83]) that if the measure µis ergodic 2 and non-atomic then, given any sequence (an)n of positive realnumbers with limn an = 0, there exists some bounded measurable function ϕwith

lim supn

1

an

∣∣∣1

n

n−1∑

j=0

ϕ(f j(x)) −∫ϕdµ

∣∣∣ = +∞.

Another interesting observation is that there is no analogue of the Birkhoffergodic theorem for infinite invariant measures. Indeed, suppose that µ is aσ-finite, but infinite, invariant measure of a transformation f : M → M . Wesay that a measurable set W ⊂M is wandering if the pre-images f−i(W ), i ≥ 0are pairwise disjoint. Suppose that µ is ergodic and conservative, that is, suchthat every wandering set has zero measure. Then, given any sequence (an)n ofpositive real numbers,

• either, for every ϕ ∈ L1(µ),

lim infn

1

an

n−1∑

j=0

ϕ f j = 0 at almost every point;

• or, there exists (nk)k → ∞ such that, for every ϕ ∈ L1(µ),

limk

1

ank

nk−1∑

j=0

ϕ f j = ∞ at almost every point.

This and other related facts about infinite measures are proved in Section 2.4of the book of Aaronson [Aar97].

3.2.3 Theorem of von Neumann and consequences

The theorem of von Neumann (Theorem 3.1.7) may also be deduced directlyfrom the theorem of Birkhoff, as we are going to explain.

Consider any ϕ ∈ L2(µ) and let ϕ be the corresponding time average. Westart by showing that ϕ ∈ L2(µ) and its norm satisfies ‖ϕ‖2 ≤ ‖ϕ‖2. Indeed,

|ϕ| ≤ limn

1

n

n−1∑

j=0

|ϕ f j| and, hence, |ϕ∣∣2 ≤ lim

n

( 1

n

n−1∑

j=0

∣∣ϕ f j|)2

.

2We say that an invariant measure µ is ergodic if f−1(A) = A up to measure zero impliesthat either µ(A) = 0 or µ(Ac) = 0. The study of ergodic measures will be the subject of thenext chapter.


Then, by the Fatou lemma (Theorem A.2.10),

[ ∫|ϕ∣∣2 dµ

]1/2≤ lim inf

n

[ ∫ ( 1

n

n−1∑

j=0

|ϕ f j |)2

dµ]1/2

. (3.2.5)

We can use the Minkowski inequality (Theorem A.5.3) to bound the sequenceon the right-hand side from above:

[ ∫ ( 1

n

n−1∑

j=0

|ϕ f j |)2

dµ]1/2

≤ 1

n

n−1∑

j=0

[ ∫|ϕ f j |2 dµ

]1/2. (3.2.6)

Since µ is invariant under f , the expression on the right-hand side is equal to[ ∫|ϕ|2 dµ

]1/2. So, (3.2.5) and (3.2.6) imply that ‖ϕ‖2 ≤ ‖ϕ‖2 <∞.

Now let us show that (1/n)∑n−1j=0 ϕ f j converges to ϕ in L2(µ). Initially,

suppose that the function ϕ is bounded, that is, there exists C > 0 such that|ϕ| ≤ C. Then,

∣∣∣1

n

n−1∑

j=0

ϕ f j∣∣∣ ≤ C for every n and |ϕ| ≤ C.

Then we may use the dominated convergence theorem (Theorem A.2.11) toconclude that

limn

∫ ( 1

n

n−1∑

j=0

ϕ f j − ϕ)2

dµ =

∫ (limn

1

n

n−1∑

j=0

ϕ f j − ϕ)2

dµ = 0.

In other words, (1/n)∑n−1j=0 ϕ f j converges to ϕ in the space L2(µ). We are

left to extend this conclusion to arbitrary functions ϕ in L2(µ). For that, let usconsider some sequence (ϕk) of bounded functions such that (ϕk)k converges toϕ. For example:

ϕk(x) =

ϕ(x) if |ϕ(x)| ≤ k0 otherwise.

Denote by ϕk the corresponding time averages. Given any ε > 0, let k0 be fixedsuch that ‖ϕ−ϕk‖2 < ε/3 for every k ≥ k0. Note that ‖(ϕ−ϕk) f j‖2 is equalto ‖ϕ− ϕk‖2 for every j ≥ 0, because the measure µ is invariant. Thus,

∥∥∥1

n

n−1∑

j=0

(ϕ−ϕk)f j∥∥∥

2≤ ‖ϕ−ϕk‖2 < ε/3 for every n ≥ 1 and k ≥ k0. (3.2.7)

Observe also that ϕ − ϕk is the time average of the function ϕ − ϕk. So, theargument in the previous paragraph gives that

‖ϕ− ϕk‖2 ≤ ‖ϕ− ϕk‖2 < ε/3 for every k ≥ k0. (3.2.8)


By assumption, for every k ≥ 1 there exists n0(k) ≥ 1 such that

∥∥∥1

n

n−1∑

j=0

ϕk f j − ϕk

∥∥∥2< ε/3 for every n ≥ n0(k). (3.2.9)

Adding (3.2.7), (3.2.8), (3.2.9) we get that

∥∥∥1

n

n−1∑

j=0

ϕ f j − ϕ∥∥∥

2< ε for every n ≥ n0(k0).

This completes the proof of the theorem of von Neumann from the theorem ofBirkhoff.

Exercise 3.2.5 contains an extension of these conclusions to any Lp(µ) space.

Corollary 3.2.7. The time average ϕ of any function ϕ ∈ L2(µ) coincides withthe orthogonal projection P (ϕ) of ϕ to the subspace of invariant functions.

Proof. On the one hand, Theorem 3.1.7 gives that (1/n)∑n−1j=0 ϕ f j converges

to P (ϕ) in L2(µ). On the other hand, we have just shown that this sequenceconverges to ϕ in the space L2(µ). So, by uniqueness of the limit, P (ϕ) = ϕ.

Corollary 3.2.8. If f : M → M is invertible then the time averages of anyfunction ϕ ∈ L2(µ) relative to f and to f−1 coincide at µ-almost every point:

limn

1

n

n−1∑

j=0

ϕ f−j = limn

1

n

n−1∑

j=0

ϕ f j at µ-almost every point. (3.2.10)

Proof. The limit on the left-hand side of (3.2.10) is the orthogonal projectionof ϕ to the subspace of functions invariant under f−1, whereas limit on theright-hand side is the orthogonal projection of ϕ to the subspace of functionsinvariant under f . It is clear that these two subspaces are exactly the same.Thus, the two limits coincide in L2(µ).

3.2.4 Exercises

3.2.1. Let X = x1, . . . , xr be a finite set and σ : X → X be a permutation.We call σ a cyclic permutation if it admits a unique orbit (containing all relements of X).

1. Prove that, for any cyclic permutation σ and any function ϕ : X → R,

limn→∞

1

n

n−1∑

i=0

ϕ(σi(x)) =ϕ(x1) + · · · + ϕ(xr)

r.


2. More generally, prove that for any permutation σ and any function ϕ

limn→∞

1

n

n−1∑

i=0

ϕ(σi(x)) =ϕ(x) + ϕ(σ(x)) + · · · + ϕ(σp−1(x))

p.

where p ≥ 1 is the cardinality of the orbit of x.

3.2.2. Check that Lemma 3.2.5 can also be deduced from the Birkhoff ergodictheorem and then we may even weaken the hypothesis: it suffices to supposethat φ is measurable and ψ = φ f − φ is integrable.

3.2.3. A function ϕ : Z → R is said to be uniformly quasi-periodic if for everyε > 0 there exists L(ε) ∈ N such that every interval n + 1, . . . , n + L(ε) inthe set of integer numbers contains some τ such that |ϕ(k + τ) − ϕ(k)| < ε forevery k ∈ Z. Any such τ is called an ε-quasi-period of f .

(a) Prove that if ϕ is uniformly quasi-periodic then ϕ is bounded.

(b) Show that for every ε > 0 there exists ρ ≥ 1 such that

∣∣∣1

ρ

(n+1)ρ∑

j=nρ+1

ϕ(j) − 1

ρ

ρ∑

j=1

ϕ(j)∣∣∣ < 2ε for every n ≥ 1.

(c) Show that the sequence (1/n)∑nj=1 ϕ(j) converges to some real number

when n→ ∞.

(d) More generally, prove that limn(1/n)∑nk=1 ϕ(x+k) exists for every x ∈ Z

and is independent of x.

3.2.4. Prove that for Lebesgue-almost every x ∈ [0, 1], the geometric meanof the integer numbers a1, . . . , an, . . . in the continued fraction expansion of xconverges to some real number: in other words, there exists b ∈ R such thatlimn(a1a2 · · ·an)1/n = b. [Observation: Compare with Exercise 4.2.12.]

3.2.5. Let ϕ : M → R be an integrable function and ϕ be the correspondingtime average, given by Theorem 3.2.3. Show that if ϕ ∈ Lp(µ) for some p > 1then ϕ ∈ Lp(µ) and ‖ϕ‖p ≤ ‖ϕ‖p. Moreover,

1

n

n−1∑

j=0

ϕ f j

converges to ϕ in the space Lp(µ).

3.2.6. Prove the Birkhoff ergodic theorem for flows: if µ is a probability measureinvariant under a flow f and ϕ ∈ L1(µ) then the function

ϕ(x) = limT→∞

1

T

∫ T

0

ϕ(f t(x)) dt

is defined at µ-almost every point and∫ϕ dµ =

∫ϕdµ.

3.3. SUBADDITIVE ERGODIC THEOREM 79

3.2.7. Prove that if a homeomorphism f : M → M of a metric space Madmits exactly one invariant probability measure µ, and this measure is suchthat µ(A) > 0 for every non-empty open set A ⊂ M , then every orbit of f isdense in M .

3.3 Subadditive ergodic theorem

A sequence of functions ϕn : M → R is said to be subadditive for a transforma-tion f : M →M if

ϕm+n ≤ ϕm + ϕn fm for every m,n ≥ 1. (3.3.1)

Example 3.3.1. A sequence ϕn : M → R is additive for the transformation fif ϕm+n = ϕm + ϕn fm for every m,n ≥ 1. For example, the time sums

ϕn(x) =

n−1∑

j=0

ϕ(f j(x))

of any function ϕ : M → R form an additive sequence. In fact, every additivesequence is of this form, with ϕ = ϕ1. Of course, additive sequences are alsosubadditive.

For the next example we need the notion of norm of a square matrix A ofdimension d, which is defined as follows:

‖A‖ = sup‖Av‖

‖v‖ : v ∈ Rd \ 0

(3.3.2)

Compare with (2.3.1). It follows directly from the definition that the norm ofthe product of two matrices is less or equal than the product of the norms ofthose matrices:

‖AB‖ ≤ ‖A‖ ‖B‖ . (3.3.3)

Example 3.3.2. Let A : M → GL(d) be a measurable function with values inthe linear group, that is, the set GL(d) of invertible square matrices of dimensiond. Define

φn(x) = A(fn−1(x)) · · ·A(f(x))A(x)

for every n ≥ 1 and x ∈ M . Then the sequence ϕn(x) = log ‖φn(x)‖ is subad-ditive. Indeed,

φm+n(x) = φn(fm(x))φm(x)

and so, using (3.3.3),

ϕm+n(x) = log ‖φn(fm(x))φm(x)‖≤ log ‖φm(x)‖ + log ‖φn(fm(x))‖ = ϕm(x) + ϕn(f

m(x)).

for every m, n and x.


Recall that, given any function ϕ : M → R, we denote by ϕ+ : M → R itspositive part, which is defined by ϕ+(x) = maxϕ(x), 0.Theorem 3.3.3 (Kingman). Let µ be a probability measure invariant under atransformation f : M → M and let ϕn : M → R, n ≥ 1 be a subadditive se-quence of measurable functions such that ϕ+

1 ∈ L1(µ). Then (ϕn/n)n convergesat µ-almost every point to some function ϕ : M → [−∞,+∞) that is invariantunder f . Moreover, ϕ+ ∈ L1(µ) and

∫ϕdµ = lim

n

1

n

∫ϕn dµ = inf

n

1

n

∫ϕn dµ ∈ [−∞,+∞).

The proof of Theorem 3.3.3 that we are going to present is due to Avila,Bochi [AB], who started from a proof of the Birkhoff ergodic theorem (The-orem 3.2.3) by Katznelson, Weiss [KW82]. An important observation is thatTheorem 3.2.3 is not used in the arguments. This allows us to obtain the theo-rem of Birkhoff as a particular case of Theorem 3.3.3.

3.3.1 Preparing the proof

A sequence (an)n in [−∞,+∞) is said to be subadditive if am+n ≤ am + an forevery m,n ≥ 1.

Lemma 3.3.4. If (an)n is a subadditive sequence then

limn

ann

= infn

ann

∈ [−∞,∞). (3.3.4)

Proof. If am = −∞ for some m then, by subadditivity, an = −∞ for everyn > m. In that case, both sides of (3.3.4) are equal to −∞ and so the lemmaholds. From now on let us assume that an ∈ R for every n.

Let L = infn(an/n) ∈ [−∞,+∞) and B be any real number larger than L.Then we may find k ≥ 1 such that

akk< B.

For n > k, we may write n = kp + q, where p and q are integer numbers suchthat p ≥ 1 and 1 ≤ q ≤ k. Then, by subadditivity,

an ≤ akp + aq ≤ pak + aq ≤ pak + α,

where α = maxai : 1 ≤ i ≤ k. Hence,

ann

≤ pk

n

akk

+α

n.

Observe that pk/n converges to 1 and α/n converges to zero when n→ ∞. So,since ak/k < B, we have that

L ≤ ann< B


for every n sufficiently large. Making B → L, we conclude that

limn

ann

= L = infn

ann.

This completes the argument.

Now let (ϕn)n be as in Theorem 3.3.3. By subadditivity,

ϕn ≤ ϕ1 + ϕ1 f + · · · + ϕ1 fn−1.

This relation remains valid if we replace ϕn and ϕ1 by their positive parts ϕ+n

and ϕ+1 . Hence, the hypothesis that ϕ+

1 ∈ L1(µ) implies that ϕ+n ∈ L1(µ) for

every n. Moreover, the hypothesis that (ϕn)n is subadditive implies that

an =

∫ϕn dµ, n ≥ 1

is a subadditive sequence in [−∞,+∞). Therefore, by Lemma 3.3.4, the limit

L = limn

ann

= infn

ann

∈ [−∞,∞)

exists. Define ϕ− : M → [−∞,∞] and ϕ+ : M → [−∞,∞] through

ϕ−(x) = lim infn

ϕnn

(x) and ϕ+(x) = lim supn

ϕnn

(x).

Clearly, ϕ−(x) ≤ ϕ+(x) for every x ∈M . We are going to prove that

∫ϕ− dµ ≥ L ≥

∫ϕ+ dµ, (3.3.5)

as long as each function ϕn is bounded from below. Consequently, the twofunctions ϕ− and ϕ+ coincide at µ-almost every point and their integral isequal to L. Thus, the theorem will be proven in this case, with ϕ = ϕ− = ϕ+

(the fact that ϕ is invariant under f is part of Exercise 3.3.2). At the end, weremove that boundedness assumption using a truncation trick.

3.3.2 Key lemma

In this section we assume that ϕ− > −∞ at every point. Fix ε > 0 and define,for each k ∈ N,

Ek =x ∈M : ϕj(x) ≤ j

(ϕ−(x) + ε

)for some j ∈ 1, . . . , k

.

It is clear that Ek ⊂ Ek+1 for every k. Moreover, the definition of ϕ−(x) impliesthat M = ∪kEk. Define also

ψk(x) =

ϕ−(x) + ε if x ∈ Ekϕ1(x) if x ∈ Eck.


It follows from the definition of Ek that ϕ1(x) > ϕ−(x) + ε for every x ∈ Eck.Combining this fact with the previous observations, we see that the sequence(ψk(x))k is non-increasing and converges to ϕ−(x) + ε, for every x ∈ M . Inparticular, by the monotone convergence theorem (Theorem A.2.9),

∫ψk dµ→

∫(ϕ− + ε) dµ as k → ∞.

The crucial step in the proof of the theorem is the following estimate:

Lemma 3.3.5. For every n > k ≥ 1 and µ-almost every x ∈M ,

ϕn(x) ≤n−k−1∑

i=0

ψk(fi(x)) +

n−1∑

i=n−k

maxψk, ϕ1(f i(x)).

Proof. Take x ∈ M such that ϕ−(x) = ϕ−(f j(x)) for every j ≥ 1 (this holdsat µ-almost every point, according to Exercise 3.3.2). Consider the sequence,possibly finite, of integer numbers

m0 ≤ n1 < m1 ≤ n2 < m2 ≤ . . . (3.3.6)

defined inductively as follows (see also Figure 3.1).

m0

m0

m1

m1

ml

ml n

n

nl+1

n1

n1 nl

nl

Eck EckEck

EckEckEckEck

Eck

Figure 3.1: Decomposition of the trajectory of a point

Define m0 = 0. Let nj be the smallest integer number larger or equal thanmj−1 satisfying fnj(x) ∈ Ek (if it exists). Then, by the definition of Ek, thereexists mj such that 1 ≤ mj − nj ≤ k and

ϕmj−nj (fnj (x)) ≤ (mj − nj)(ϕ−(fnj (x)) + ε). (3.3.7)

This completes the definition of the sequence (3.3.6). Now, given n ≥ k, letl ≥ 0 be the largest integer number such that ml ≤ n. By subadditivity,

ϕnj−mj−1(fmj−1(x)) ≤

nj−1∑

i=mj−1

ϕ1(fi(x))


for every j = 1, . . . , l such that mj−1 6= nj and analogously for ϕn−ml(fml(x)).

Thus,

ϕn(x) ≤∑

i∈I

ϕ1(fi(x)) +

l∑

j=1

ϕmj−nj (fnj (x)) (3.3.8)

where I = ∪lj=1[mj−1, nj) ∪ [ml, n). Observe that

ϕ1(fi(x)) = ψk(f

i(x)) for every i ∈ ∪lj=1[mj−1, nj) ∪ [ml,minnl+1, n),

since f i(x) ∈ Eck in all these cases. Moreover, since ϕ− is constant on orbits(see Exercise 3.3.2) and ψk ≥ ϕ− + ε, the relation (3.3.7) gives that

ϕmj−nj (fnj (x)) ≤

mj−1∑

i=nj

(ϕ−(f i(x)) + ε) ≤mj−1∑

i=nj

ψk(fi(x))

for every j = 1, . . . , l. In this way, using (3.3.8), we conclude that

ϕn(x) ≤minnl+1,n−1∑

i=0

ψk(fi(x)) +

n−1∑

i=nl+1

ϕ1(fi(x)).

Since nl+1 > n− k, the lemma is proven.

3.3.3 Estimating ϕ−

Towards establishing (3.3.5), in this section we prove the following lemma:

Lemma 3.3.6.∫ϕ− dµ = L

Proof. Suppose for a while that ϕn/n is uniformly bounded from below, that is,that there exists κ > 0 such that ϕn/n ≥ −κ for every n. Applying the lemmaof Fatou (Theorem A.2.10) to the sequence of non-negative functions ϕn/n+κ,we get that ϕ− is integrable and

∫ϕ− dµ ≤ lim

n

∫ϕnndµ = L.

To prove the opposite inequality, observe that Lemma 3.3.5 implies

1

n

∫ϕn dµ ≤ n− k

n

∫ψk dµ+

k

n

∫maxψk, ϕ1 dµ. (3.3.9)

Note that maxψk, ϕ1 ≤ maxϕ− + ε, ϕ+1 and this last function is integrable.

So, the limit superior of the last term in (3.3.9) as n→ ∞ is less or equal thanzero. So, making n → ∞ we get that L ≤

∫ψk dµ for every k. Then, making

k → ∞, we conclude that

L ≤∫ϕ− dµ+ ε.


Finally, making ε → 0 we get that L ≤∫ϕ− dµ. This proves the lemma when

ϕn/n is uniformly bounded from below.We are left to remove this hypothesis. Define, for each κ > 0,

ϕκn = maxϕn,−κn and ϕκ− = maxϕ−,−κ.

The sequence (ϕκn)n satisfies all the conditions of Theorem 3.3.3: indeed, it issubadditive and the positive part of ϕκ1 is integrable. Moreover, it is clear thatϕκ− = lim infn(ϕ

κn/n). So, the argument in the previous paragraph shows that

∫ϕκ− dµ = inf

n

1

n

∫ϕκn dµ. (3.3.10)

By the monotone convergence theorem (Theorem A.2.9), we also have that

∫ϕn dµ = inf

κ

∫ϕκn dµ and

∫ϕ− dµ = inf

κ

∫ϕκ− dµ. (3.3.11)

Combining the relations (3.3.10) and (3.3.11), we get that

∫ϕ− dµ = inf

κ

∫ϕκ− = inf

κinfn

1

n

∫ϕκn dµ = inf

n

1

n

∫ϕn dµ = L.


3.3.4 Bounding ϕ+

To complete the proof of (3.3.5), we are now going to show that∫ϕ+ dµ ≤ L

as long as infx ϕn(x) is finite for every n. Let us start by proving the followingauxiliary result:

Lemma 3.3.7. For any fixed k,

lim supn

ϕknn

= k lim supn

ϕnn.

Proof. The inequality ≤ is clear, since ϕkn/kn is a subsequence of ϕn/n. Toprove the opposite inequality, let us write n = kqn + rn with rn ∈ 1, . . . , k.By subadditivity,

ϕn ≤ ϕkqn + ϕrn fkqn ≤ ϕkqn + ψ fkqn

where ψ = maxϕ+1 , . . . , ϕ

+k . Observe that n/qn → k when n→ ∞. Moreover,

as ψ ∈ L1(µ), we may use Lemma 3.2.5 to see that ψ fn/n converges to zeroat µ-almost every point. Hence, dividing all the terms in the previous relationby n and taking the limit superior as n→ ∞, we get that

lim supn

1

nϕn ≤ lim sup

n

1

nϕkqn + lim sup

n

1

nψ fkqn =

1

klim sup

q

1

qϕkq,

as stated in the lemma.


Lemma 3.3.8. Suppose that infx ϕn(x) > −∞ for every n. Then∫ϕ+ dµ ≤ L.

Proof. For each k and n ≥ 1, consider θn = −∑n−1j=0 ϕk f jk. Observe that

∫θn dµ = −n

∫ϕk dµ for every n, (3.3.12)

since fk preserves the measure µ. Since the sequence (ϕn)n is subadditive,θn ≤ −ϕkn for every n. Hence, using Lemma 3.3.7,

θ− = lim infn

θnn

≤ − lim supn

ϕknn

= −k lim supn

ϕnn

= −kϕ+

and so ∫θ− dµ ≤ −k

∫ϕ+ dµ. (3.3.13)

Observe also that the sequence (θn)n is additive: θm+n = θm + θn fkm forevery m, n ≥ 1. Since θ1 = −ϕk is bounded from above by − inf ϕk, we alsohave that the function θ+1 is bounded and, consequently, integrable. Thus, wemay apply Lemma 3.3.6, together with the equality (3.3.12), to conclude that

∫θ− dµ = inf

n

∫θnndµ = −

∫ϕk dµ. (3.3.14)

Putting (3.3.13) and (3.3.14) together we get that

∫ϕ+ dµ ≤ 1

k

∫ϕk dµ.

Finally, taking the infimum over k we get that∫ϕ+ dµ ≤ L.

Lemmas 3.3.6 and 3.3.8 imply the relation (3.3.5) and, thus, Theorem 3.3.3is proven when inf ϕk > −∞ for every k. In the general case, consider

ϕκn = maxϕn,−κn and ϕκ− = maxϕ−,−κ and ϕκ+ = maxϕ+,−κ

for every constant κ > 0. The previous arguments may be applied to thesequence (ϕκn)n for each fixed κ > 0. Hence, ϕκ+ = ϕκ− at µ-almost every pointfor every κ > 0. Since ϕκ− → ϕ− and ϕκ+ → ϕ+ when κ → ∞, it follows thatϕ− = ϕ+ at µ-almost every point. The proof of Theorem 3.3.3 is complete.

3.3.5 Lyapunov exponents

We have observed previously that every sequence of time sums

ϕn =n−1∑

j=0

ϕ f j, n ≥ 1


is additive and, in particular, subadditive. Therefore, the ergodic theoremof Birkhoff (Theorem 3.2.3) is a particular case of Theorem 3.3.3. Anotherimportant consequence of the subadditive ergodic theorem is the theorem ofFurstenberg-Kesten that we state next.

Let f : M → M be a measurable transformation and µ be an invariantprobability measure. Consider any measurable function θ : M → GL(d) withvalues in the group GL(d). The cocycle defined by θ over f is the sequence offunctions defined by

φn(x) = θ(fn−1(x)) · · · θ(f(x))θ(x) for n ≥ 1 and φ0(x) = id

for every x ∈M . We leave it to the reader to check that

φm+n(x) = φn(fm(x)) · φm(x) for every m,n ∈ N and x ∈M . (3.3.15)

It is also easy to check that, conversely, any sequence (φn)n with this propertyis the cocycle defined by θ = φ1 over the transformation f .

Theorem 3.3.9 (Furstenberg-Kesten). If log+ ‖θ‖ ∈ L1(µ) then

λmax(x) = limn

1

nlog ‖φn(x)‖

exists at µ-almost every point. Moreover, λ+max ∈ L1(µ) and

∫λmax dµ = lim

n

1

n

∫log ‖φn‖ dµ = inf

n

1

n

∫log ‖φn‖ dµ.

If log+ ‖θ−1‖ ∈ L1(µ) then

λmin(x) = limn

− 1

nlog ‖φn(x)−1‖

exists at µ-almost every point. Moreover, λmin ∈ L1(µ) and∫λmin dµ = lim

n− 1

n

∫log ‖(φn)−1‖ dµ = sup

n− 1

n

∫log ‖(φn)−1‖ dµ.

To deduce this result from Theorem 3.3.3 it suffices to note that the se-quences

ϕmaxn (x) = log ‖φn(x)‖ and ϕmin

n (x) = log ‖φn(x)−1‖are subadditive (recall Example 3.3.2).

The multiplicative ergodic theorem of Oseledets, which we are going to statein the sequel, provides a major refinement of the conclusion of the Furstenberg-Kesten theorem. It asserts that, under the same hypotheses as Theorem 3.3.9,for µ-almost every x ∈ M there exist a positive integer number k = k(x) andreal numbers λ1(x) > · · · > λk(x) and a filtration (that is, a decreasing sequenceof vector subspaces)

Rd = V 1x > · · · > V kx > V k+1

x = 0 (3.3.16)

such that, for every i ∈ 1, . . . , k and µ-almost every x ∈M ,


(a1) k(f(x)) = k(x) and λi(f(x)) = λi(x) and θ(x) · V ix = V if(x);

(b1) limn

1

nlog ‖φn(x)v‖ = λi(x) for every v ∈ V ix \ V i+1

x ;

(c1) limn

1

nlog | detφn(x)| =

k∑

i=1

di(x)λi(x), where di(x) = dimV ix − dimV i+1x .

Moreover, the numbers k(x) and λ1(x), . . . , λk(x) and the subspaces V 1x , . . . , V

kx

depend measurably on the point x.The numbers λi(x) are called Lyapunov exponents of θ at the point x. They

satisfy λ1 = λmax and λk = λmin. For this reason, we also call λmax(x) andλmin(x) the extremal Lyapunov exponents of θ at the point x. Each di(x) iscalled multiplicity of the Lyapunov exponent λi(x).

When f is invertible, we may extend the sequence φn to the whole Z, through

φ−n(x) = φn(f−n(x))−1 for every n ≥ 1 and x ∈M .

Assuming also that log+ ‖θ−1‖ ∈ L1(µ), one obtains a stronger conclusion thanbefore: more than a filtration, there is a decomposition

Rd = E1x ⊕ · · · ⊕Ekx (3.3.17)

such that, for every i = 1, . . . , k,

(a2) θ(x) ·Eix = Eif(x) and V ix = V i+1x ⊕ Eix; so, dimEix = dimV ix − dim V i+1

x ;

(b2) limn→±∞

1

nlog ‖φn(x)v‖ = λi(x) for every v ∈ Eix different from zero;

(c2) limn→+∞

1

nlog | detφn(x)| =

k∑

i=1

di(x)λi(x), where di(x) = dimEix.

The reader will find a much more detailed discussion of these results, includ-ing proofs, in Chapter 4 of the book [Via14].

3.3.6 Exercises

3.3.1. Give a direct proof of the Birkhoff ergodic theorem (Theorem 3.2.3),using the approach in the proof of Theorem 3.3.3.

3.3.2. Given a subadditive sequence (ϕn)n with ϕ+1 ∈ L1(µ), show that the

functionsϕ− = lim inf

n

ϕnn

and ϕ+ = lim supn

ϕnn

are f -invariant, that is, they satisfy ϕ−(x) = ϕ− f(x) and ϕ+(x) = ϕ+ f(x)for µ-almost every x ∈M .


3.3.3. State and prove the subadditive ergodic theorem for flows.

3.3.4. Let M be a compact manifold and f : M → M be a diffeomorphism ofclass C1 that preserves the Lebesgue measure. Check that

k(x)∑

i=1

di(x)λi(x) = 0 at µ-almost every point x ∈M ,

where λi(x), i = 1, . . . , k(x) are the Lyapunov exponents of Df at the point xand di(x), i = 1, . . . , k(x) are the corresponding multiplicities.

3.3.5. Let (ϕn)n be a subadditive sequence of functions for some transformationf : M →M . We call time constant of (ϕn)n the number

limn

1

n

∫ϕn dµ

when it exists. Assuming that the limit does exist and is finite, show that wemay write ϕn = ψn + γn for each n, in such a way that (ψn)n is an additivesequence and (γn)n is a subadditive sequence with time constant equal to zero.

3.3.6. Under the assumptions of the Furstenberg-Kesten theorem, show thatthe sequence ψn = (1/n) log ‖φn‖ is uniformly integrable, in the following sense:for every ε > 0 there exists δ > 0 such that

µ(E) < δ ⇒∫

E

ψ+n dµ < ε for every n.

3.3.7. Under the assumptions of the Furstenberg-Kesten theorem, let Ψk denotethe time average of the function ψk = (1/k) log ‖φk‖ relative to the transforma-tion fk. Show that λmax(x) ≤ Ψk(x) for every k and µ-almost every x. UsingExercise 3.3.6, show that for every ρ > 0 and µ-almost every x there exists ksuch that Ψk(x) ≤ λmax(x) + ρ.

3.4 Discrete time and continuous time

Most of the time we focus our presentation in the context of dynamical systemswith discrete time. However, about everything that was said so far extends,more or less straightforwardly, to systems with continuous time. One reasonwhy the two theories are so similar is that one may relate systems of either kindto systems of the other kind, through certain constructions. That is the subjectof the present section. For simplicity, we stick to the case of invertible systems.The general case may be handled using the notion of natural extension, whichwas described in Section 2.4.2.

3.4. DISCRETE TIME AND CONTINUOUS TIME 89

3.4.1 Suspension flows

Our first construction associates with every invertible map f : M → M andevery measurable function τ : M → (0,∞) a flow gt : N → N , t ∈ R, that wecall suspension of f with return time τ , whose recurrence properties are directlyrelated to the recurrence properties of f . In particular, we associate a measureν invariant under this flow with every measure µ invariant under f . For thisconstruction we assume that the function τ is such that

∞∑

j=1

τ(f j(x)) =∞∑

j=1

τ(f−j(x)) = +∞. (3.4.1)

for every x ∈M . That is the case, for instance, if τ is bounded away from zero.The first step is to construct the domain N of the suspension flow. Let us

consider the transformation F : M × R →M × R defined by

F (x, s) = (f(x), s− τ(x)).

Note that F is invertible. Let ∼ be the equivalence relation defined in M × Rby

(x, s) ∼ (x, s) ⇔ there exists n ∈ Z such that Fn(x, s) = (x, s).

We denote by N the set of equivalence classes and by π : M × R → N thecanonical projection associating with every (x, s) ∈ M × R the correspondingequivalence class.

Now consider the flow Gt : M × R → M × R given by Gt(x, s) = (x, s + t).It is clear that Gt F = F Gt for every t ∈ R. This ensures that Gt, t ∈ Rinduces a flow gt, t ∈ R in the quotient space N , given by

gt(π(x, s)) = π(Gt(x, s)) for every x ∈M and s, t ∈ R. (3.4.2)

Indeed, if π(x, s) = π(x, s) then there exists n ∈ Z such that Fn(x, s) = (x, s).Hence,

Gt(x, s) = Gt Fn(x, s) = Fn Gt(x, s)and so π(Gt(x, s)) = π(Gt(x, s)). This shows that the flow gt, t ∈ R is welldefined.

To better understand how this flow is related to the transformation f , weneed to revisit the construction from a more concrete point of view. Let us con-sider D = (x, s) ∈ M × R : 0 ≤ s < τ(x). We claim that D is a fundamentaldomain for the equivalence relation ∼, that is, it contains exactly one represen-tative of each equivalence class. Uniqueness of the representative is immediate:just observe that if (x, s) ∈ D then Fn(x, s) = (xn, sn) with sn < 0 for everyn > 0 and sn > τ(fn(x)) for every n > 0. To prove existence, we need thecondition (3.4.1): it ensures that the iterates (xn, sn) = Fn(x, s) of any (x, s)satisfy

limn→+∞

sn = −∞ and limn→−∞

sn = +∞.


Then, taking n maximum such that sn ≥ 0, we find that (xn, sn) ∈ D. In thisway, the claim is proved. Now observe that the claim means that the restrictionof the projection π to domain D is a bijection over N . Thus, we may identifyN with D and, in particular, we may consider gt, t ∈ R as a flow in D.

In just the same way, we may identify M with the subset Σ = π(M × 0)of N . Observing that

gτ(x)(π(x, 0)) = π(x, τ(x)) = π(f(x), 0), (3.4.3)

we see that, through this identification, the transformation f : M → M corre-sponds to the first-return map (or Poincare return map) of the suspension flowto Σ. See Figure 3.2.

M

τ(x)

x

f(x)

R 0

Figure 3.2: Suspension flow

Now let µ be a measure on M invariant under f . Let us denote by ds theLebesgue measure on the real line R. It is clear that the (infinite) measureµ× ds is invariant under the flow Gt, t ∈ R. Moreover, it is invariant under thetransformation F , since µ is invariant under f . We call suspension of µ withreturn time τ the measure ν defined on N by

ν = π∗(µ× ds | D). (3.4.4)

In other words, ν is the measure such that

∫ψ dν =

∫ ∫ τ(x)

0

ψ(π(x, s)) ds dµ(x)

for every bounded measurable function ψ : N → (0,∞). In particular,

ν(N) =

∫1 dν =

∫τ(x) dµ(x) (3.4.5)

is finite if and only if the function τ is integrable with respect to µ.

Proposition 3.4.1. The flow gt, t ∈ R preserves the measure ν.


Proof. Let us fix t ∈ R. Given any measurable set B ⊂ N , let B = π−1(B)∩D.By the definition of ν, we have that ν(B) = (µ × ds)(B). For each n ∈ Z,let Bn be the set of all pairs (x, s) ∈ B such that G−t(x, s) ∈ Fn(D) and letBn = π(Bn). Since D is a fundamental domain, Bn : n ∈ Z is a partitionof B and Bn : n ∈ Z is a partition of B. Moreover, Bn = π−1(Bn) ∩ Dand, consequently, ν(Bn) = (µ × ds)(Bn) for every n. The definition of thesuspension flow gives that

π−1(g−t(Bn)

)= G−t

(π−1(Bn)

)= G−t

( ⋃

k∈Z

F k(Bn))

=⋃

k∈Z

F k(G−t(Bn)

).

Observing that F−n(G−t(Bn)) ⊂ D, we conclude that

ν(g−t(Bn)

)= (µ× ds)

(π−1(g−t(Bn)) ∩D

)= (µ× ds)

(F−n(G−t(Bn))

).

As the measure µ× ds is invariant under both F and Gt, the last expression isequal to (µ× ds)(Bn). Therefore,

ν(g−t(B)) =∑

n∈Z

ν(g−t(Bn)) =∑

n∈Z

(µ× ds)(Bn) = (µ× ds)(B) = ν(B).

This proves that ν is invariant under the flow gt, t ∈ R.

In Exercise 3.4.2 we invite the reader to relate the recurrence properties ofthe systems (f, µ) and (gt, ν).

3.4.2 Poincare maps

Next, we present a kind of inverse for the construction described in the previoussection. Let gt : N → N , t ∈ R be a measurable flow and ν be an invariantmeasure. Let Σ ⊂ N be a cross-section of the flow, that is, a subset of N suchthat for every x ∈ Σ there exists τ(x) ∈ (0,∞] such that gt(x) /∈ Σ for everyt ∈ (0, τ(x)) and gτ(x)(x) ∈ Σ whenever τ(x) is finite. We call τ(x) first-returntime of x to Σ. Our goal is to construct, starting from ν, a measure µ that isinvariant under the first-return map (or Poincare return map)

f : x ∈ Σ : τ(x) <∞ → Σ, f(x) = gτ(x)(x).

Observe that this map is injective.For each ρ > 0, denote Σρ = x ∈ Σ : τ(x) ≥ ρ. Given A ⊂ Σρ and

δ ∈ (0, ρ], consider Aδ = gt(x) : x ∈ A and 0 ≤ t < δ. Observe that the map(x, t) 7→ gt(x) is a bijection from A × [0, δ) to Aδ. Assume that Σ is endowedwith a σ-algebra of measurable subsets such that

1. the function τ and the maps f and f−1 are measurable;

2. if A ⊂ Σρ is measurable then Aδ ⊂ N is measurable, for every δ ∈ (0, ρ].

Lemma 3.4.2. Let A be a measurable subset of Σρ for some ρ > 0. Then, thefunction δ 7→ ν(Aδ)/δ is constant in the interval (0, ρ].


Proof. Consider any δ ∈ (0, ρ] and l ≥ 1. It is clear that

Aδ =l−1⋃

i=0

giδ/l(Aδ/l)

and this is a disjoint union. Using that ν is invariant under the flow gt, t ∈ R,we conclude that ν(Aδ) = lν(Aδ/l) for every δ ∈ (0, ρ] and every l ≥ 1. Then,ν(Arδ) = rν(Aδ) for every δ ∈ (0, ρ] and every rational number r ∈ (0, 1).Using, furthermore, the fact that both sides of this relation vary monotonicallywith r, we get that the equality remains true for every real number r ∈ (0, 1).This implies the conclusion of the lemma.

Given any measurable subset A of Σρ, ρ > 0, let us define

µ(A) =ν(Aδ)

δfor any δ ∈ (0, ρ]. (3.4.6)

Next, given any measurable subset A of Σ, let

µ(A) = supρµ(A ∩ Σρ). (3.4.7)

See Figure 3.3. We leave it to the reader to check that µ is a measure in Σ(Exercise 3.4.1). We call it the flux of ν through Σ under the flow.

Aδ

f(A)δ

Σ

A

f(A)

Figure 3.3: Flux of a measure through a cross-section

Proposition 3.4.3. Suppose that the measure ν is finite. Then the measure µin Σ is invariant under the Poincare map f .

Proof. Start by observing that the map f is essentially surjective: the comple-ment of the image f(Σ) has measure zero. Indeed, suppose that there exists aset E with µ(E) > 0 contained in Σ \ f(Σ). It is no restriction to assume thatE ⊂ Σρ for some ρ > 0. Then, ν(Eρ) > 0. Since ν is finite, by assumption, wemay apply the Poincare recurrence theorem to the flow g−t, t ∈ R. We get thatthere exists z ∈ Eρ such that g−s(z) ∈ Eρ for arbitrarily large values of s > 0.


By definition, z = gt(y) for some y ∈ E and some t ∈ (0, ρ]. By construction,the backward trajectory of y intersects Σ. Hence, there exists x ∈ Σ such thatf(x) = y. This contradicts the choice of E. Thus, the claim is proved.

Given any measurable set B ⊂ Σ, let us denote A = f−1(B). Moreover,given ε > 0, let us consider a countable partition of B into measurable subsetsBi satisfying the following conditions: for every i there is ρi > 0 such that

1. Bi and Ai = f−1(Bi) are contained in Σρi ;

2. sup(τ | Ai) − inf(τ | Ai) < ερi.

Next, choose ti < inf(τ | Ai) ≤ sup(τ | Ai) < si such that si − ti < ερi. Fixδi = ρi/2. Then, using the fact that f is essentially surjective,

gti(Aiδi) ⊃ Biδi−(si−ti)

and gsi(Aiδi) ⊂ Biδi+(si−ti)

.

Hence, using the hypothesis that ν is invariant,

ν(Aiδi) = ν(gti(Aiδi

)) ≥ ν(Biδi−(si−ti))

ν(Aiδi) = ν(gsi(Aiδi

)) ≤ ν(Biδi+(si−ti)).

Dividing by δi we get that

µ(Ai) ≥ 1 − (si − ti)

δµ(Bi) > (1 − 2ε)µ(Bi)

µ(Ai) ≤ 1 +(si − ti)

δµ(Bi) < (1 + 2ε)µ(Bi).

Finally, adding over all the values of i, we conclude that

(1 − 2ε)µ(A) ≤ µ(B) ≤ (1 + 2ε)µ(A).

Since ε is arbitrary, this proves that the measure µ is invariant under f .

3.4.3 Exercises

3.4.1. Check that the function µ defined by (3.4.6)-(3.4.7) is a measure.

3.4.2. In the context of Section 3.4.1, suppose that M is a topological spaceand f : M → M and τ : M → (0,∞) are continuous. Let gt : N → N bethe suspension flow and ν be the suspension of some Borel measure µ invariantunder f .

(a) Show that if x ∈M is recurrent for the transformation f then π(x, s) ∈ Nis recurrent for the flow gt, for every s ∈ R.

(b) Show that if π(x, s) ∈ N is recurrent for the flow gt, for some s ∈ R, thenx ∈M is recurrent for f .


(c) Conclude that the set of recurrent points of f has total measure for µ ifand only if the set of recurrent points of gt, t ∈ R has total measure for ν.In particular, this happens if at least one of the measures µ or ν is finite.

3.4.3. Let gt : N → N , t ∈ R be the flow defined by a vector field X of classC1 on a compact Riemannian manifold N . Assume that this flow preserves thevolume measure ν associated with the Riemannian metric. Let Σ be a hyper-surface of N transverse to X and νΣ be the volume measure on Σ associatedwith the restriction of the Riemannian metric. Define φ : Σ → (0,∞) throughφ(y) = |X(y) · n(y)|, where n(·) is a unit vector field orthogonal to Σ. Showthat the measure η = φνΣ is invariant under the Poincare map f : Σ → Σ ofthe flow. Indeed, η coincides with the flux of ν through Σ.

3.4.4. The following construction has a significant role in the theory of intervalexchanges. Let N ⊂ R4

+ be the set of all 4-uples (λ1, λ2, h1, h2) of positivereal numbers, endowed with the standard volume measure ν = dλ1dλ2dh1dh2.Define

F : N → N , F (λ1, λ2, h1, h2) =

(λ1 − λ2, λ2, h1, h1 + h2) if λ1 > λ2

(λ1, λ2 − λ1, h1 + h2, h2) if λ1 < λ2

(F is not defined when λ1 = λ2.) Let N be the quotient of N by the equivalencerelation z ∼ z ⇔ Fn(z) = z for some n ∈ Z and let π : N → N be the canonicalprojection. Define

Gt : N → N , t ∈ R, Gt(λ1, λ2, h1, h2) = (etλ1, etλ2, e

−th1, e−th2).

Let a : N → (0,∞) be the functional given by a(λ1, λ2, h1, h2) = λ1h1 + λ2h2.For each c > 0, let Nc be the subset of all x ∈ N such that a(x) = c, let νc bethe volume measure defined on Nc by the restriction of the Riemannian metricof R4

+ and let ηc = νc/‖ grad a‖.

(a) Show that F preserves the functional a and so there exists a functionala : N → (0,∞) such that a π = a. Show that Gt commutes with Fand preserves a. Hence, (Gt)t induces a flow (gt)t in the quotient space Nthat preserves the functional a. Check that F and (Gt)t preserve ν andηc for every c.

(b) Check that D =(λ1, λ2, h1, h2) : λ1 + λ2 ≥ 1 > maxλ1, λ2

is a

fundamental domain for ∼. Consider the measure ν = π∗(ν | D) on N .Check that the definition does not depend on the choice of the fundamentaldomain and show that ν is invariant under the flow (gt)t. Is the measureν finite?

(c) Check that Σ = π((λ1, λ2, h1, h2) : λ1 +λ2 = 1) is a cross-section of theflow (gt)t. Calculate the Poincare map f : Σ → Σ and the correspondingfirst-return time function τ . Calculate the flux µ of the measure ν throughΣ. Is the measure µ finite?


(d) For every c > 0, let Nc = π(Nc) and ηc = π∗(ηc ∩D). Show that Nc andηc are invariant under (gt)t, for every c > 0. Check that ηc(Nc) < ∞ forevery c. Conclude that ν-almost every point is recurrent for the flow (gt)t.


Chapter 4

Ergodicity

The theorems presented in the previous chapter fully establish the first part ofBoltzmann’s ergodic hypothesis: for any measurable set E, the mean sojourntime τ(E, x) is well defined for almost every point x. The second part of theergodic hypothesis, that is, the claim that τ(E, x) should coincide with themeasure of E for almost every x, is a statement of a different nature and is thesubject of the present chapter.

In this chapter we always take µ to be a probability measure invariant undersome measurable transformation f : M → M . We say that the system (f, µ)is ergodic if, given any measurable set E, we have τ(E, x) = µ(E) for µ-almostevery point x ∈M . We are going to see that this is equivalent to saying that thesystem is dynamically indivisible, in the sense that every invariant set has eitherfull measure or zero measure. Other equivalent formulations of the ergodicityproperty are discussed in Section 4.1. One of them is that time averages coincidewith space averages: for every integrable function ϕ,

limn

1

n

n−1∑

j=0

ϕ(f j(x)) =

∫ϕdµ at µ-almost every point.

In Section 4.2 we illustrate, by means of examples, several techniques to proveor disprove ergodicity. Most of them will be utilized again later in more complexsituations. Next, we take the following viewpoint: we fix the dynamical systemsand analyze the properties of ergodic measures within the space of all invariantmeasures of that dynamical system. As we are going to see in Section 4.3, theergodic measures are precisely the extremal elements of that space.

In Section 4.4 we give a brief outline of the historical development of ErgodicTheory in the context of conservative systems. The main highlights are KAMtheory, thus denominated in homage to Andrey Kolmogorov, Vladimir Arnoldand Jurgen Moser, and hyperbolic dynamics, which was initiated by StevenSmale, Dmitry Anosov, Yakov Sinai and their collaborators. The two theoriesdeal with distinct types of dynamical behavior, elliptic and hyperbolic, and theyreach opposing conclusions: roughly speaking, hyperbolic systems are ergodic

97

98 CHAPTER 4. ERGODICITY

whereas elliptic systems are not.

4.1 Ergodic systems

We use the expressions ‘the measure µ is ergodic with respect to the transfor-mation f ’ or ‘the transformation f is ergodic with respect to the measure µ’to mean the same thing, namely, that the system (f, µ) is ergodic. Recall that,by definition, this means that the mean sojourn time in any measurable set ofµ-almost every point coincides with the measure of that set. This condition canbe rephrased in several equivalent ways, as we are going to see next.

4.1.1 Invariant sets and functions

A measurable function ϕ : M → R is said to be invariant if ϕ = ϕf at µ-almostevery point. In other words, ϕ is invariant if it is constant on every trajectoryof f outside a zero measure subset. Moreover, we say that a measurable setB ⊂ M is invariant if its characteristic function XB is an invariant function.Equivalently, B is invariant if it differs from its pre-image f−1(B) by a zeromeasure set:

µ(B∆f−1(B)) = 0.

Exercise 1.1.4 collects some equivalent formulations of this property. It is easyto check that the family of all invariant sets is a σ-algebra, that is, it is closedunder countable unions and intersections and under passage to the complement.

Example 4.1.1. Let f : [0, 1] → [0, 1] be the decimal expansion transformation,introduced in Section 1.3.1, and µ be the Lebesgue measure. Clearly, the setA = Q ∩ [0, 1] of rational numbers is invariant. Other interesting examples arethe sets of points x = 0, a1a2 . . . in [0, 1] with prescribed proportion of digits aiwith each value k ∈ 0, . . . , 9. More precisely, given any vector p = (p0, . . . , p9)such that pi ≥ 0 for every i and

∑i pi = 1, define

Ap = x : limn

1

n#1 ≤ i ≤ n : ai = k = pk for k = 0, . . . , 9.

Observe that if x = 0, a1a2 . . . then every point y ∈ f−1(x) may be written asy = 0, ba1a2 . . . with b ∈ 0, . . . , 9. It is clear that the extra digit b does notaffect the proportion of digits with any of the values 0, . . . , 9 in the decimalexpansion. Thus, y ∈ Ap if and only if x ∈ Ap. This implies that Ap is indeedinvariant under f .

Example 4.1.2. Let ϕ : [0, 1] → R be a function in L1(µ). According to theergodic theorem of Birkhoff (Theorem 3.2.3), the time average ϕ is an invariantfunction. So, every level set

Bc = x ∈ [0, 1]; ϕ(x) = c,

4.1. ERGODIC SYSTEMS 99

is an invariant set. Observe also that every invariant function is of this form: itis clear that if ϕ is invariant then it coincides with its time average ϕ at µ-almostevery point.

The next proposition collects a few equivalent ways to define ergodicity. Wesay that a function ϕ is constant at µ-almost every point if there exists c ∈ Rsuch that ϕ(x) = c for µ-almost every x ∈M .

Proposition 4.1.3. Let µ be an invariant probability measure of a measurabletransformation f : M → M . The following conditions are all equivalent:

(a) For every measurable set B ⊂ M one has τ(B, x) = µ(B) for µ-almostevery point.

(b) For every measurable set B ⊂ M the function τ(B, ·) is constant at µ-almost every point.

(c) For every integrable function ϕ : M → R one has ϕ(x) =∫ϕdµ for

µ-almost every point.

(d) For every integrable function ϕ : M → R the time average ϕ : M → R isconstant at µ-almost every point.

(e) For every invariant integrable function ψ : M → R one has ψ(x) =∫ψ dµ

for µ-almost every point.

(f) Every invariant integrable function ψ : M → R is constant at µ-almostevery point.

(g) For every invariant subset A we have either µ(A) = 0 or µ(A) = 1.

Proof. It is immediate that (a) implies (b), that (c) implies (d) and that (e)implies (f). It is also clear that (e) implies (c) and (f) implies (d), because thetime average is an invariant function (recall Proposition 3.2.4). Analogously,(c) implies (a) and (d) implies (b), because the mean sojourn time is a timeaverage (of the characteristic function of B). We are left to prove the followingimplications:

(b) implies (g): Let A be an invariant set. Then τ(A, x) = 1 for µ-almostevery x ∈ A and τ(A, x) = 0 for µ-almost every x ∈ Ac. Since τ(A, ·) is assumedto be constant at µ-almost every point, it follows that µ(A) = 0 or µ(A) = 1.

(g) implies (e): Let ψ be an invariant integrable function. Then every levelset

Bc = x ∈M : ψ(x) ≤ cis an invariant set. So, the hypothesis implies that µ(Bc) ∈ 0, 1 for everyc ∈ R. Since c 7→ µ(Bc) is non-decreasing, it follows that there exists c ∈ R suchthat µ(Bc) = 0 for every c < c and µ(Bc) = 1 for every c ≥ c. Then ψ = c atµ-almost every point. Hence,

∫ψ dµ = c and so ψ =

∫ψ dµ at µ-almost every

point.


4.1.2 Spectral characterization

The next proposition characterizes the ergodicity property in terms of the Koop-man operator Uf (ϕ) = ϕ f :

Proposition 4.1.4. Let µ be an invariant probability measure of a measurabletransformation f : M →M . The following conditions are equivalent:

(a) (f, µ) is ergodic.

(b) For any pair of measurable sets A and B one has

limn

1

n

n−1∑

j=0

µ(f−j(A) ∩B) = µ(A)µ(B). (4.1.1)

(c) For any functions ϕ ∈ Lp(µ) and ψ ∈ Lq(µ), with 1/p+ 1/q = 1, one has

limn

1

n

n−1∑

j=0

∫(U jfϕ)ψ dµ =

∫ϕdµ

∫ψ dµ. (4.1.2)

Proof. It is clear that (c) implies (b): just take ϕ = XA and ψ = XB. To showthat (b) implies (a), let A be an invariant set. Taking A = B in the hypothesis(b), we get that

µ(A) = limn

1

n

n−1∑

j=0

µ(f−j(A) ∩A) = µ(A)2.

This implies that µ(A) = 0 or µ(A) = 1.Now it suffices to prove that (a) implies (c). Consider any ϕ ∈ Lp(µ) and

ψ ∈ Lq(µ). By ergodicity and the ergodic theorem of Birkhoff (Theorem 3.2.3)we have that

1

n

n−1∑

j=0

U jfϕ→∫ϕdµ (4.1.3)

at µ-almost every point. Initially, assume that |ϕ| ≤ k for some k ≥ 1. Then,for every n ∈ N,

∣∣( 1

n

n−1∑

j=0

U jfϕ)ψ∣∣ ≤ k|ψ|.

So, since k|ψ| ∈ L1(µ), we may use the dominated convergence theorem (Theo-rem A.2.11) to conclude that

∫(1

n

n−1∑

j=0

U jfϕ)ψ dµ →∫ϕdµ

∫ψ dµ.


This proves the claim (4.1.2) when ϕ is bounded. All that is left to do is removethis restriction. Given any ϕ ∈ Lp(µ) and k ≥ 1, define

ϕk(x) =

k if ϕ(x) > kϕ(x) if ϕ(x) ∈ [−k, k]−k if ϕ(x) < −k.

Fix ε > 0. By the previous argument, for every k ≥ 1 one has

∣∣∫

(1

n

n−1∑

j=0

U jfϕk)ψ dµ−∫ϕk dµ

∫ψ dµ

∣∣ < ε (4.1.4)

if n is large enough (depending on k). Next, observe that ‖ϕk − ϕ‖p → 0 whenk → ∞: this is clear when p = ∞, because ϕk = ϕ for every k > ‖ϕ‖∞; forp < ∞ use the monotone convergence theorem (Theorem A.2.9). Hence, usingthe Holder inequality (Theorem A.5.5), we have that

∣∣∫

(ϕk − ϕ) dµ

∫ψ dµ

∣∣ ≤ ‖ϕk − ϕ‖p∣∣∫ψ dµ

∣∣ < ε, (4.1.5)

for every k sufficiently large. Similarly,

∣∣∫

1

n

n−1∑

j=0

U jf (ϕk − ϕ)ψ dµ∣∣ ≤ 1

n

n−1∑

j=0

∣∣∫U jf (ϕk − ϕ)ψ dµ

∣∣

≤ 1

n

n−1∑

j=0

‖U jf (ϕk − ϕ)‖p ‖ψ‖q dµ

= ‖ϕk − ϕ‖p ‖ψ‖q < ε,

(4.1.6)

for every n and every k sufficiently large, independent of n. Fix k so that (4.1.5)and (4.1.6) hold and then take n sufficiently large such that (4.1.4) also holds.Summing the three relations (4.1.4) to (4.1.6), we get that

∣∣∫

(1

n

n−1∑

j=0

U jfϕ)ψ dµ−∫ϕdµ

∫ψ dµ

∣∣ < 3ε

for every n sufficiently large. This gives condition (c).

In case p = q = 2, the condition (4.1.2) may be expressed in terms of theinner product · in the space L2(µ). In this way we get that (f, µ) is ergodic ifand only if:

limn

1

n

n−1∑

j=0

[(Unf ϕ) − (ϕ · 1)

]· ψ = 0 for every ϕ, ψ ∈ L2(µ). (4.1.7)


We will use a few times the following elementary facts: given any measurablesets A and B,

|µ(A) − µ(B)| = |µ(A \B) − µ(B \A)|≤ µ(A \B) + µ(B \A) = µ(A∆B)

(4.1.8)

and given any sets A1, A2, B1, B2,

(A1 ∩A2)∆(B1 ∩B2) ⊂ (A1∆B1) ∪ (A2∆B2). (4.1.9)

Corollary 4.1.5. Assume that the condition (4.1.1) in Proposition 4.1.4 holdsfor every A and B in some algebra A that generates the σ-algebra of measurablesets. Then (f, µ) is ergodic.

Proof. Let A and B be arbitrary measurable sets. By the approximation the-orem (Theorem A.1.19), given any ε > 0 there are A0 and B0 in A such thatµ(A∆A0) < ε and µ(B∆B0) < ε. Observe that

∣∣µ(f−j(A) ∩B) − µ(f−j(A0) ∩B0)∣∣ ≤ µ(f−j(A)∆f−j(A0)) + µ(B∆B0)

= µ(A∆A0) + µ(B∆B0) < 2ε

(the equality uses the fact that µ is an invariant measure) for every j and

|µ(A)µ(B) − µ(A0)µ(B0)| ≤ µ(A∆A0) + µ(B∆B0) < 2ε.

Then, the hypothesis

limn

1

n

n−1∑

j=0

µ(f−j(A0) ∩B0) = µ(A0)µ(B0)

implies that

−4ε ≤ lim infn

1

n

n−1∑

j=0

µ(f−j(A) ∩B) − µ(A)µ(B)

≤ lim supn

1

n

n−1∑

j=0

µ(f−j(A) ∩B) − µ(A)µ(B) ≤ 4ε.

Since ε is arbitrary, this proves that the condition (4.1.1) holds for all pairs ofmeasurable sets. According to Proposition 4.1.4, it follows that the system isergodic.

In the same spirit, it suffices to check part (c) of Proposition 4.1.4 on densesubsets:

Corollary 4.1.6. Assume that the condition (4.1.2) in Proposition 4.1.4 forevery ϕ and ψ in dense subsets of Lp(µ) and Lq(µ), respectively. Then (f, µ) isergodic.

We leave the proof of this fact to the reader (see Exercise 4.1.3).


4.1.3 Exercises

4.1.1. Let (M,A) be a measurable space and f : M → M be a measurabletransformation. Prove that if p ∈ M is a periodic point of period k, then themeasure µp = 1

k (δp + δf(p) + · · · + δfk−1(p)) is ergodic.

4.1.2. Let µ be an invariant probability measure, not necessarily ergodic, of ameasurable transformation f : M → M . Show that the following limit existsfor any pair of measurable sets A and B:

limn

1

n

n−1∑

i=0

µ(f−i(A) ∩B).

4.1.3. Show that an invariant probability measure µ is ergodic for a transfor-mation f : M →M if and only if any one of the following conditions holds:

(a) µ(⋃n≥0 f

−n(A)) = 1 for every measurable set A with µ(A) > 0;

(b) given any measurable sets A,B with µ(A)µ(B) > 0, there is n ≥ 1 suchthat µ

(f−n(A) ∩B

)> 0;

(c) the convergence in condition (c) of Proposition 4.1.4 holds for some choiceof p, q and some dense subset of functions ϕ ∈ Lp(µ) and ψ ∈ Lq(µ);

(d) there is p ∈ [1,∞] such that every invariant function ϕ ∈ Lp(µ) is constantat µ-almost every point;

(e) every integrable function ϕ satisfying ϕ f ≥ ϕ at µ-almost every point(or ϕf ≤ ϕ at µ-almost every point) is constant at µ-almost every point.

4.1.4. TakeM to be a metric space. Prove that an invariant probability measureµ is ergodic for f : M → M if and only if the time average of every boundeduniformly continuous function ϕ : M → R is constant at µ-almost every point.

4.1.5. Take M to be a metric space. We call basin of an invariant probabilitymeasure µ the set B(µ) of all points x ∈M such that

limn→∞

1

n

n−1∑

j=0

ϕ(f j(x)) =

∫ϕdµ

for every bounded continuous function ϕ : M → R. Check that the basin is aninvariant set. Moreover, if µ is ergodic then B(µ) has full µ-measure.

4.1.6. Show that if µ and η are distinct ergodic probability measures of atransformation f : M →M , then η and µ are mutually singular.

4.1.7. Let µ be a probability measure invariant under some transformationf : M → M . Show that the product measure µ2 = µ × µ is invariant underthe transformation f2 : M ×M → M ×M defined by f2(x, y) = (f(x), f(y)).Moreover, if (f2, µ2) is ergodic then (f, µ) is ergodic. Is the converse true?


4.1.8. Let µ be a probability measure invariant under some transformationf : M → M . Assume that (fn, µ) is ergodic for every n ≥ 1. Show that if ϕ isa non-constant eigenfunction of the Koopman operator Uf then the eigenvalueis not a root of unity and any set restricted to which ϕ is constant has zeromeasure.

4.2 Examples

In this section we use a number of examples to illustrate different methods forchecking whether a system is ergodic or not.


Initially, let us consider the case of a rotation Rθ : S1 → S1 on the circleS1 = R/Z. As observed in Section 1.3.3, the Lebesgue measure m is invariantunder Rθ. We want to analyze the ergodic behavior of the system (Rθ,m) fordifferent values of θ.

If θ is rational, say θ = p/q in irreducible form, Rqθ(x) = x for every x ∈ S1.then, given any segment I ⊂ S1 with length less than 1/q, the set

A = I ∪Rθ(I) ∪ · · · ∪Rq−1θ (I)

is invariant under Rθ and its Lebesgue measure satisfies 0 < m(A) < 1. Thus, ifθ is rational then the Lebesgue measure is not is ergodic. The converse is muchmore interesting:

Proposition 4.2.1. If θ is irrational then Rθ is ergodic relative to the Lebesguemeasure.

We are going to mention two different proofs of this fact. The first one,which we detail in the sequel, uses some simple facts from Fourier Analysis.The second one, which we leave as an exercise (Exercise 4.2.6), is based on adensity point argument similar to the one we will use in Section 4.2.2 to provethat the decimal expansion map is ergodic relative to the Lebesgue measure.

We denote by L2(m) the Hilbert space of measurable functions ψ whosesquare is integrable, that is, such that:

∫|ψ|2 dm <∞.

It is convenient to consider functions with values in C, and we will do so. Weuse the well known fact that the family of functions

φk : S1 → C, x 7→ e2πikx, k ∈ Z

is a Hilbert basis of this space: given any ϕ ∈ L2(m) there exists a uniquesequence (ak)k∈Z of complex numbers such that

ϕ(x) =∑

k∈Z

ake2πikx for almost every x ∈ S1. (4.2.1)

4.2. EXAMPLES 105

This is called the Fourier series expansion of ϕ ∈ L2(m). Then

ϕ(Rθ(x)

)=∑

k∈Z

ake2πikθe2πikx. (4.2.2)

Assume that ϕ is invariant. Then (4.2.1) and (4.2.2) coincide. By uniquenessof the coefficients in the Fourier series expansion, this happens if and only if

ake2πikθ = ak for every k ∈ Z.

The hypothesis that θ is irrational means that e2πikθ 6= 1 for every k 6= 0. Hence,the relation that we just obtained implies that ak = 0 for every k 6= 0. In otherwords, ϕ(z) = a0 for m-almost every z ∈ S1. This proves that every invariantL2 function is constant m-almost everywhere. In particular, the characteristicfunction ϕ = XA of any invariant set A ⊂ S1 is constant at m-almost everypoint. This is the same as saying that A has either zero measure or full measure.Hence, by Proposition 4.1.3, the measure m is ergodic.

These observations extend in a natural way to the rotation on the d-torusTd, for any d ≥ 1:

Proposition 4.2.2. If θ = (θ1, . . . , θd) is rationally independent then the rota-tion Rθ : Td → Td is ergodic relative to the Lebesgue measure.

This may be proved by the same argument as in the case d = 1, using thefact (see Exercise 4.2.1) that the family of functions

φk1,...,kd: Td → C, (x1, . . . , xd) 7→ e2πi(k1x1+···+kdxd), (k1, . . . , kd) ∈ Zd

is a Hilbert basis of the space L2(m).

Corollary 4.2.3. If θ = (θ1, . . . , θd) is rationally independent then the rotationRθ : Td → Td is minimal, that is, every orbit O(x) = Rnθ (x) : n ∈ N is densein Td.

Proof. Let us consider in Td the flat distance, defined by:

d([ξ], [η]) = infd(ξ′, η′) : ξ′, η′ ∈ Rd, ξ′ ∼ ξ, η′ ∼ η.

Observe that this distance is preserved by every rotation. Let Uk : k ∈ Nbe a countable basis of open sets of Td and m be the Lebesgue measure onTd. By ergodicity, there is W ⊂ Td, with total Lebesgue measure, such thatτ(Uk, x) = m(Uk) > 0 for every k and every x ∈ W . In particular, the orbitof x is dense in Td for every x ∈ W . Now consider an arbitrary point x ∈ Mand consider any y ∈ W . Then, for every δ > 0 there exists k ≥ 1 such thatd(fk(y), x) < δ. It follows that d(fn+k(y), fn(x)) < δ for every n ≥ 1. Sincethe orbit of y is dense, this implies that the orbit of x is δ-dense, that is, itintersects the δ-neighborhood of every point. Since δ is arbitrary, this impliesthat the orbit of x is dense in the ambient torus.


In fact, the irrational rotations on the circle or, more generally, on anytorus have a much stronger property than ergodicity: they are uniquely er-godic, meaning that they admit a unique invariant probability measure (whichis the Lebesgue measure, of course). Uniquely ergodic systems are studied inChapter 6.

4.2.2 Decimal expansion

Consider the transformation f : [0, 1] → [0, 1], f(x) = 10x− [10x] introduced inSection 1.3.1. We have seen that f preserves the Lebesgue measure m.

Proposition 4.2.4. The transformation f is ergodic relative to the Lebesguemeasure m.

Proof. By Proposition 4.1.3, it suffices to prove that every invariant set A hastotal measure. The main ingredient is the derivation theorem (Theorem A.2.15),according to which almost every point of A is a density point of A. Moreprecisely (see also Exercise A.2.9), m-almost every point a ∈ A satisfies

limε→0

infm(I ∩A)

m(I): I an interval such that a ∈ I ⊂ B(a, ε)

= 1 . (4.2.3)

Let us fix a density point a ∈ A. Since the set of points of the form m/10k,k ∈ N, 0 ≤ m ≤ 10k has zero measure, it is no restriction to suppose that a isnot of that form. Let us consider the family of intervals

I(k,m) =(m− 1

10k,m

10k), k ∈ N, m = 1, . . . , 10k.

It is clear that for each k ∈ N there exists a unique m = mk such that I(k,mk)contains the point a. Denote Ik = I(k,mk). The property (4.2.3) implies that

m(Ik ∩A)

m(Ik)→ 1 when k → ∞.

Observe also that each fk is an affine bijection from Ik to the interval (0, 1).This has the following immediate consequence, which is crucial for our argument:

Lemma 4.2.5 (Distortion). For every k ∈ N, one has

m(fk(E1))

m(fk(E2))=m(E1)

m(E2)(4.2.4)

for any measurable subsets E1 and E2 of Ik.

Applying this fact to E1 = Ik ∩A and E2 = Ik we find that

m(fk(Ik ∩A)

)

m((0, 1)

) =m(Ik ∩A)

m(Ik).

4.2. EXAMPLES 107

Clearly, m((0, 1)

)= 1. Moreover, as we take A to be invariant, fk(Ik ∩ A) is

contained in A. In this way we get that

m(A) ≥ m(Ik ∩A)

m(Ik)for every k.

Since the sequence on the right-hand side converges to 1 when k → ∞, it followsthat m(A) = 1, as we wanted to prove.

The proof of Lemma 4.2.5 relies on the fact that the transformation f isaffine on each interval

((m − 1)/10,m/10

)and that may give the impression

that the method of proof that we just presented is restricted to a very specialclass of examples. In fact, this is not so, much on the contrary.

The reason is that there are many situations where one can obtain a slightlyweaker version of the statement of Lemma 4.2.5 that is, nevertheless, still suf-ficient to conclude the proof of ergodicity. In a few words, instead of the claimthat the two sides of (4.2.4) are equal, one can often show that the quotientbetween the two terms is bounded by some uniform constant. That is calledbounded distortion property. As an illustration of these ideas, in Section 4.2.4we prove that the Gauss transformation is ergodic.

Next, we describe an application of Proposition 4.2.4 in the context of Num-ber Theory. We say that a number x ∈ R is 10-normal if every block of digits(b1, . . . , bl), l ≥ 1 occurs with frequency 10−l in the decimal expansion of x.Rational numbers are never 10-normal, of course, and it is also easy to giveirrational examples, such as x = 0, 101001000100001000001 · · · . Moreover, it isnot difficult to construct 10-normal numbers, for example, the Champernowneconstant x = 0, 12345678910111213141516171819202122 · · · , which is obtainedby concatenation of the successive natural numbers.

However, it is usually difficult to check whether a given number is 10-normalor not. For example, that remains unknown for the numbers π, e and even

√2.

On the other hand, using the previous proposition one can easily prove thatalmost every number is 10-normal:

Proposition 4.2.6. The set of 10-normal numbers x ∈ R has full Lebesguemeasure in the real line.

Proof. Since the fact of being 10-normal or not is independent of the integerpart of the number, we only need to show that almost every x ∈ [0, 1] is 10-normal. Consider f : [0, 1] → [0, 1] defined by f(x) = 10x − [10x]. For eachblock (b1, . . . , bl) ∈ 0, . . . , 9l, consider the interval

Ib1,...,bl=[ κ10l

,κ+ 1

10l

)with κ =

l∑

i=1

bi10l−i.

Recall that if x = 0, a0a1 · · · akak+1 · · · then fk(x) = 0, akak+1 · · · for everyk ≥ 1. Hence, fk(x) ∈ Ib1,...,bl

if and only if (ak, . . . , ak+l−1) = (b1, . . . , bl).So, the mean sojourn time τ(Ib1,...,bl

, x) is equal to the frequency of the block


(b1, . . . , bl) in the decimal expansion of x. Using the Birkhoff ergodic theoremand the fact that the transformation f is ergodic with respect to the Lebesguemeasure m, we conclude that for every (b1, . . . , bl) there exists a full Lebesguemeasure subset B(b1, . . . , bl) of the interval [0, 1] such that

τ(Ib1,...,bl, x) = m(Ib1,...,bl

) =1

10lfor every x ∈ B(b1, . . . , bl).

Let B be the intersection of B(b1, . . . , bl) over all values of b1, . . . , bl in 0, . . . , 9and every l ≥ 1. Then m(B) = 1 and every x ∈ B is 10-normal.

More generally, for any integer number d ≥ 2, we say that x ∈ R is d-normalif every block (b1, . . . , bl) ∈ 0, . . . , d − 1l, l ≥ 1 occurs with frequency d−l inthe expansion of x in base d. Finally, we say that x is a normal number if it isd-normal for every d ≥ 2. Everything that was said before for d = 10 extendsimmediately to general d. In particular, the set of d-normal numbers has fullLebesgue measure for every d ≥ 2. Taking the intersection over all the values ofd, we conclude that Lebesgue almost every real number is normal (Borel normaltheorem).

4.2.3 Bernoulli shifts

Let (X, C, ν) be a probability space. In this section we consider the productspace Σ = XN, endowed with the product σ-algebra B = CN and the productmeasure µ = νN. As explained in Appendix A.2.3, this means that: Σ is the setof all sequences (xn)n∈N with xn ∈ X for every n; B is the σ-algebra generatedby the measurable cylinders

[m;Am, . . . , An] = (xi)i∈N : xi ∈ Ai for m ≤ i ≤ n

with m ≤ n and Ai ∈ C for each i; and µ is the probability measure on Σcharacterized by

µ([m;Am, . . . , An]) =

n∏

i=m

ν(Ai). (4.2.5)

We may think of the elements of Σ as representing the results of a sequenceof random experiments with values in X and all subject to the same probabilitydistribution ν: given any measurable set A ⊂ X , the probability of xi ∈ A isequal to ν(A) for every i. Moreover, in this model the results of the successiveexperiments are independent: indeed, the relation (4.2.5) means that the prob-ability of xi ∈ Ai for every m ≤ i ≤ n is the product of the probabilities of theindividual events xi ∈ Ai.

In this section we introduce a dynamical system σ : Σ → Σ the space Σ,called shift map, which preserves the measure µ. The pair (σ, µ) is called aBernoulli shift. The main result is that every Bernoulli shift is an ergodicsystem.

It is worth pointing out that N may be replaced with Z throughout theconstruction. That is, we may take Σ to be the space of two-sided sequences

4.2. EXAMPLES 109

(. . . , x−n, . . . , x0, . . . , xn, . . . ). Up to minor adjustments, which we leave to thereader, all that follows remains valid in that case. In addition, in the two-sidedcase the shift map is invertible.

The shift map σ : Σ → Σ is defined by

σ((xn)n) = (xn+1)n.

That is, by definition, σ sends each sequence (x0, x1, . . . , xn, . . . ) to the sequence(x1, . . . , xn, . . . ). Observe that the pre-image of any cylinder is still a cylinder:

σ−1([m;Am, . . . , An]) = [m+ 1;Am, . . . , An]. (4.2.6)

It follows that the map σ is measurable with respect to the σ-algebra B. More-over,

µ(σ−1([m;Am, . . . , An])

)= ν(Am) · · · ν(An) = µ

([m;Am, . . . , An]

)

and (using Lemma 1.3.1) that ensures that the measure µ is invariant under σ.

Proposition 4.2.7. Every Bernoulli shift (σ, µ) is ergodic.

Proof. Let A be an invariant measurable set. We want to prove that µ(A) = 0or µ(A) = 1. We use the following fact:

Lemma 4.2.8. If B and C are finite unions of pairwise disjoint cylinders, then

µ(B ∩ σ−j(C)) = µ(B)µ(σ−j(C)) = µ(B)µ(C),

for every j sufficiently large.

Proof. First, suppose that B and C are both cylinders: B = [k;Bk, . . . , Bl] andC = [m;Cm, . . . , Cn]. Then,

σ−j(C) = [m+ j;Cm, . . . , Cn] for each j.

Consider any j large enough that m+ j > l. Then,

B ∩ σ−j(C) = (xn)n : xk ∈ Bk, . . . , xl ∈ Bl, xm+j ∈ Cm, . . . , xn+j ∈ Cn= [k;Bk, . . . , Bl, X, . . . , X,Cm, . . . , Cn],

where X appears exactly m + j − l − 1 times. By the definition (4.2.5), thisgives that

µ(B ∩ σ−j(C)) =

l∏

i=k

ν(Bi) 1m+j−l−1n∏

i=m

ν(Ci) = µ(B)µ(C).

This proves the conclusion of the lemma when both sets are cylinders. Thegeneral case follows easily, using the fact that µ is finitely additive.


Proceeding with the proof of Proposition 4.2.7, suppose for a while that theinvariant set A belongs to the algebra B0 whose elements are the finite unions ofpairwise disjoint cylinders. Then, on the one hand, we may apply the previouslemma with B = C = A to conclude that µ(A∩σ−j(A)) = µ(A)2 for every largej. On the other hand, since A is invariant, the left-hand side of this identityis equal to µ(A) for every j. It follows that µ(A) = µ(A)2, which means thateither µ(A) = 0 or µ(A) = 1.

Now let A be an arbitrary invariant set. By the approximation theorem(Theorem A.1.19), given any ε > 0 there exists B ∈ B0 such that µ(A∆B) < ε.By Lemma 4.2.8 we may fix j such that

µ(B ∩ σ−j(B)) = µ(B)µ(σ−j(B)) = µ(B)2. (4.2.7)

Using (4.1.8) and (4.1.9) and the fact that µ is invariant, we get that

∣∣µ(A ∩ σ−j(A)) − µ(B ∩ σ−j(B))∣∣ ≤ 2µ(A∆B) < 2ε (4.2.8)

(a similar fact was deduced during the proof of Corollary 4.1.5). Moreover,

∣∣µ(A)2 − µ(B)2∣∣ ≤ 2

∣∣µ(A) − µ(B)∣∣ < 2ε. (4.2.9)

Putting the relations (4.2.7), (4.2.8) and (4.2.9) together, we conclude that|µ(A) − µ(A)2| < 4ε. Since ε is arbitrary, we deduce that µ(A) = µ(A)2 and,hence, either µ(A) = 0 or µ(A) = 1.

When X is a topological space and C is the corresponding Borel σ-algebra,we may endow Σ with the product topology which, by definition, is the topologygenerated by the cylinders [m;Am, . . . , An] where Am, . . . , An are open subsetsof X . The property (4.2.6) implies that the shift map σ : Σ → Σ is continuouswith respect to this topology. The theorem of Tychonoff (see [Dug66]) assertsthat Σ is compact if X is compact.

A relevant special case is when X is a finite set endowed with the discretetopology, that is, such that every subset of X is open. A map f : M →M in atopological space M is said to be transitive if there exists some x ∈ M whosetrajectory fn(x), n ≥ 1 is dense in M . We leave it to the reader (Exercise 4.2.2)to prove the following result:

Proposition 4.2.9. Let X be a finite set and Σ be either XN or XZ. Then theshift map σ : Σ → Σ is transitive. Moreover, the set of all periodic points of σis dense in Σ.

The following informal statement, which is one of many versions of the mon-key paradox, illustrates the meaning of the ergodicity of the Bernoulli measureµ: A monkey hitting keys at random on a typewriter keyboard for an infiniteamount of time will almost surely type the complete text of “Os Lusıadas” 1.

To “prove” this statement we need to formulate it a bit more precisely. Thepossible texts typed by the monkey correspond to the sequences (xn)n∈N in

1Monumental epic poem by the 16th century Portuguese poet Luis de Camoes.

4.2. EXAMPLES 111

the (finite) set X of all the characters on the keyboard: letters, digits, space,punctuation signs, etc. Denote by σ : Σ → Σ the shift map in the space Σ = XN.It is assumed that each character ∗ ∈ X has a positive probability p∗ of beinghit at each time. This corresponds to the probability measure

ν =∑

∗∈X

p∗δ∗

on the set X . Furthermore, it is assumed that the character hit at each timeis independent of all the previous ones. This means that the distribution of thesequences of characters (xn)n is governed by the Bernoulli probability measureµ = νN.

The text of “Os Lusıadas” corresponds to a certain finite (albeit very long)sequence of characters (l0, . . . , lN). Consider the cylinder L = [0; l0, . . . , lN ].Then

µ(L) =

N∏

j=1

plj

is positive (although very small). A sequence (xn)n contains a complete copyof “Os Lusıadas” precisely if σk

((xn)n

)∈ L for some k ≥ 0. By the Birkhoff

ergodic theorem and the fact that (σ, µ) is ergodic, the set K of values of k forwhich that happens satisfies

limn

1

n#(K ∩ [0, n− 1]) = µ(L) > 0, (4.2.10)

with full probability. In particular, for almost all sequences (xn)n the setK is in-finite, which means that (xn)n contains infinitely many copies of “Os Lusıadas”.Actually, (4.2.10) yields an even stronger conclusion: still with full probability,the copies of our poem correspond to a positive (although small) fraction of allthe typed characters. In other words, on average, the monkey types a new copyof “Os Lusıadas” every so many (a great many) years.

4.2.4 Gauss map

As we have seen in Section 1.3.2, the gauss map G(x) = 1/x − [1/x] has aninvariant probability measure µ equivalent to the Lebesgue measure, namely:

µ(E) =1

log 2

∫

E

dx

1 + x. (4.2.11)

Proposition 4.2.10. The system (G,µ) is ergodic.

This can be proved using a more elaborate version of the method introducedin Section 4.2.2. We are going to outline the arguments in the proof, referring toSection 4.2.2 for those parts that are common to both situations and addressingin more detail the main new difficulty.

Let A be an invariant set with positive measure. We want to show thatµ(A) = 1. On the one hand, it remains true that for almost every point a ∈ [0, 1]


there exists a sequence of intervals Ik containing a and such that Gk maps Ikbijectively and differentiably onto (0, 1). Indeed, such intervals can be found asfollows. Firstly, consider

I(1,m) =( 1

m+ 1,

1

m

),

for each m ≥ 1. Next, define, by recurrence,

I(k,m1, . . . ,mk) = I(1,m1) ∩G−k+1(I(k − 1,m2, . . . ,mk)

)

for m1, . . . ,mk ≥ 1. Then, it suffices to take as Ik the interval I(k,m1, . . . ,mk)that contains a. This is well defined for every k ≥ 1 and every point a in thecomplement of a countable set, namely, the set ∪∞

k=0G−k(0, 1).

On the other hand, although the restriction ofGk to each Ik is a differentiablebijection, it is not affine. For that reason, the analogue of (4.2.4) can not holdin the present case. This difficulty is by-passed by the result that follows, whichis an example of distortion control: it is important to note that the constant Kis independent of Ik, E1, E2 and, most of all, k.

Proposition 4.2.11 (Bounded distortion). There exists K > 1 such that, givenany k ≥ 1 and any interval Ik such that Gk restricted to Ik is a differentiablebijection,

µ(Gk(E1))

µ(Gk(E2))≤ K

µ(E1)

µ(E2)

for any measurable subsets E1 and E2 of the interval Ik.

For the proof of this proposition we need the following two auxiliary results:

Lemma 4.2.12. For every x ∈ (0, 1] we have

|G′(x)| ≥ 1 and |(G2)′(x)| ≥ 2 and |G′′(x)/G′(x)2| ≤ 2.

Proof. Recall that G(x) = 1/x−m on each interval (1/(m+1), 1/m]. Therefore,

G′(x) = − 1

x2and G′′(x) =

2

x3.

The first identity implies that |G′(x)| ≥ 1 for every x ∈ (0, 1]. Moreover,|G′(x)| ≥ 2 whenever x ≤ 2/3. On the other hand, x ≥ 2/3 implies thatG(x) = 1/x − 1 < 2/3 and, consequently, G′(G(x)) ≥ 2. Combining theseobservations we find that |(G2)′(x)| = |G′(x)| |G′(G(x))| ≥ 2 for every x ∈ (0, 1].Finally, |G′′(x)/G′(x)2| = 2|x| ≤ 2 also for every x ∈ (0, 1].

Lemma 4.2.13. There exists C > 1 such that, given any k ≥ 1 and any intervalIk such that Gk restricted to Ik is a differentiable bijection,

|(Gk)′(x)||(Gk)′(y)| ≤ C for any x and y in Ik.

4.2. EXAMPLES 113

Proof. Let g be a local inverse of G, that is, a differentiable function defined onsome interval and such that G(g(z)) = z for every z in the domain of definition.Note that [

log |G′ g(z)|]′

=G′′(g(z)) g′(z)

G′(g(z))=G′′(g(z))

G′(g(z))2.

Therefore, the last estimate in Lemma 4.2.12 implies that∣∣[ log |G′ g(z)|

]′∣∣ ≤ 2 for every g and every z. (4.2.12)

In other words, every function of the form log |G′ g| admits 2 as a Lipschitzconstant. Observe also that if x, y ∈ Ik then

log|(Gk)′(x)||(Gk)′(y)| =

k−1∑

j=0

log |G′(Gj(x))| − log |G′(Gj(y))|

=

k∑

j=1

log |G′ gj(Gj(x))| − log |G′ gj(Gj(y))|

where gj denotes a local inverse of G defined on the interval [Gj(x), Gj(y)].Using the estimate (4.2.12), we get that

log|(Gk)′(x)||(Gk)′(y)| ≤ 2

k∑

j=1

|Gj(x) −Gj(y)| = 2k−1∑

i=0

|Gk−i(x) −Gk−i(y)|. (4.2.13)

Now, the first two estimates in Lemma 4.2.12 imply that

|Gk(x) −Gk(y)| ≥ 2[i/2]|Gk−i(x) −Gk−i(y)|for every i = 0, . . . , k. Replacing in (4.2.13), we conclude that

log|(Gk)′(x)||(Gk)′(y)| ≤ 2

k−1∑

i=0

2−[i/2]|Gk(x) −Gk(y)| ≤ 8|Gk(x) −Gk(y)| ≤ 8.

Now it suffices to take C = e8.

Proof of Proposition 4.2.11. Let m be the Lebesgue measure on [0, 1]. It followsfrom Lemma 4.2.13 that

m(Gk(E1))

m(Gk(E2))=

∫E1

|(Gk)′| dm∫E2

|(Gk)′| dm ≤ Cm(E1)

m(E2).

On the other hand, the definition (4.2.11) implies that

1

2 log 2m(E) ≤ µ(E) ≤ 1

log 2m(E),

for every measurable set E ⊂ [0, 1]. Combining these two relations, we find that

µ(Gk(E1))

µ(Gk(E2))≤ 2

m(Gk(E1))

m(Gk(E2))≤ 2C

m(E1)

m(E2)≤ 4C

µ(E1)

µ(E2).

Hence, it suffices to take K = 4C.


We are ready to conclude that (G,µ) is ergodic. Let A be an invariant setwith µ(A) > 0. Then A also has positive Lebesgue measure, since µ is absolutelycontinuous with respect to the Lebesgue measure. Let a be a density point ofA whose future trajectory is contained in the open interval (0, 1). Consider thesequence (Ik)k of the intervals I(k,m1, . . . ,mk) that contain a. It follows fromLemma 4.2.12 that

diam Ik ≤ sup 1

|(Gk)′(x)| : x ∈ Ik≤ 2−[k/2]

for every k ≥ 1. In particular, the diameter of Ik converges to zero and so

µ(Ik ∩A)

µ(Ik)→ 1 when k → ∞. (4.2.14)

Let us take E1 = Ik ∩Ac and E2 = Ik. By Proposition 4.2.11,

µ(Gk(Ik ∩Ac))µ(Gk(Ik))

≤ Kµ(Ik ∩Ac)µ(Ik)

.

Observe that Gk(Ik ∩ Ac) = Ac up to a zero measure set, because the set Ais assumed to be invariant. Recall also that Gk(Ik) = (0, 1), which has fullmeasure. Therefore, the previous inequality may be written as

µ(Ac) ≤ Kµ(Ik ∩Ac)µ(Ik)

.

According to (4.2.14), the expression on the right-hand side converges to zerowhen k → ∞. It follows that µ(Ac) = 0, as we wanted to prove.

4.2.5 Linear endomorphisms of the torus

Recall that we call torus of dimension d (or just d-torus) the quotient spaceTd = Rd/Zd, that is, the space of all equivalence classes of the equivalencerelation defined in Rd by x ∼ y ⇔ x − y ∈ Zd. This quotient inherits from Rd

the structure of a differentiable manifold of dimension d. In what follows weassume that Td is also endowed with the flat Riemannian metric, which makesit locally isometric to the Euclidean space Rd. Let m be the volume measureassociated with this Riemannian metric (see Appendix A.4.5).

Let A be a d-by-d matrix with integer coefficients and determinant differentfrom zero. Then A(Zd) ⊂ Zd and, consequently, A induces a transformation

fA : Td → Td, fA([x]) = [A(x)]

where [x] denotes the equivalence class that contains x ∈ Rd. These transfor-mations are called linear endomorphisms of the torus.

Note that fA is differentiable and the derivative DfA([x]) at each pointis canonically identified with A. In particular, the Jacobian detDfA([x]) isconstant equal to detA. It follows (Exercise 4.2.9) that the degree of f is equal

4.2. EXAMPLES 115

to | detA|. In particular, fA is invertible if and only if | detA| = 1. In thiscase, the inverse is the transformation fA−1 induced by the inverse matrix A−1;observe that A−1 is also a matrix with integer coefficients.

In any case, fA preserves the Lebesgue measure on Td. This may be seen asfollows. Since fA is a local diffeomorphism, the pre-image of any measurable setD with sufficiently small diameter consists of | detA| (= degree of fA) pairwisedisjoint sets Di, each of which is mapped diffeomorphically onto D. By theformula of change of variables, m(D) = | detA|m(Di) for every i. This provesthat m(D) = m(f−1

A (D)) for every measurable set D with small diameter.Hence, fA does preserve the Lebesgue measure m. Next we prove the followingfact:

Theorem 4.2.14. The system (fA,m) is ergodic if and only if no eigenvalueof the matrix A is a root of unity.

Proof. Suppose that no eigenvalue of A is a root of unity. Consider any functionϕ ∈ L2(m) and let

ϕ([x]) =∑

k∈Zd

cke2πi(k·x)

be its Fourier series expansion (with k · x = k1x1 + · · ·+ kdxd). The coefficientsck ∈ C satisfy ∑

k∈Zd

|ck|2 = ‖ϕ‖22 <∞. (4.2.15)

Then, the Fourier series expansion of ϕ fA is:

ϕ(fA([x])) =∑

k∈Zd

cke2πi(k·A(x)) =

∑

k∈Zd

cke2πi(A∗(k)·x),

where A∗ denotes the adjoint of A. Suppose that ϕ is an invariant function,that is, ϕ fA = ϕ at m-almost every point. Then, since the Fourier seriesexpansion is unique, we must have

cA∗(k) = ck for every k ∈ Z. (4.2.16)

We claim that the trajectory of every k 6= 0 under the transformation A∗ isinfinite. Indeed, if the trajectory of some k 6= 0 were finite then there wouldexist l, r ∈ Z with r > 0 such that A(l+r)∗(k) = Al∗(k). This could only happenif A∗ had some eigenvalue λ such that λr = 1. Since A and A∗ have the sameeigenvalues, that would mean that A has some eigenvalue which is a root ofunity, which is excluded by the hypothesis. Hence, the trajectory of every k 6= 0is infinite, as claimed. Then the identity (4.2.16), together with (4.2.15), impliesthat ck = 0 for every k 6= 0. Thus, ϕ = c0 at m-almost every point. This provesthat the system (fA,m) is ergodic.

To prove the converse, assume that A admits some eigenvalue which is aroot of unity. Then the same holds for A∗ and, hence, there exists r > 0 suchthat 1 is eigenvalue of Ar∗. Since Ar∗ has integer coefficients, it follows (see


Exercise 4.2.8) that there exists some k ∈ Zd \ 0 such that Ar∗(k) = k. Fix kand consider the function ϕ ∈ L2(m) defined by

ϕ([x]) =

r−1∑

i=0

e2πi(Ai∗(k)·x) =

r−1∑

i=0

e2πi(k·Ai(x))

Then ϕ is an invariant function for fA and it is not constant at m-almost everypoint. Hence, (fA,m) is not ergodic.

4.2.6 Hopf argument

In this section we present an alternative, more geometric, method to prove theergodicity of certain linear endomorphisms of the torus. This is based on anargument introduced by Eberhard F. Hopf in his pioneering work [Hop39] onthe ergodicity of geodesic flows on surfaces with negative Gaussian curvature.

In the present linear context, the Hopf argument may be used whenever| detA| = 1 and the matrix A is hyperbolic, that is, A has no eigenvalues in theunit circle. But its strongest point is that it may be extended to much moregeneral differentiable systems, not necessarily linear. Some of these extensionsare mentioned in Section 4.4.

The hypothesis that the matrix A is hyperbolic means that the space Rd

may be written as a direct sum Rd = Es ⊕ Eu such that:

1. A(Es) = Es and all the eigenvalues of A | Es have absolute value smallerthan 1;

2. A(Eu) = Eu and all the eigenvalues of A | Eu have absolute value biggerthan 1.

Then there exist constants C > 0 and λ < 1 such that

‖An(vs)‖ ≤ Cλn‖vs‖ for every vs ∈ Es and every n ≥ 0,

‖A−n(vu)‖ ≤ Cλn‖vu‖ for every vu ∈ Eu and every n ≥ 0.(4.2.17)

Example 4.2.15. Consider A =

(2 11 1

). The eigenvalues of A are

λu =3 +

√5

2> 1 > λs =

3 −√

5

2> 0

and the corresponding eigenspaces are:

Eu = (x, y) ∈ R2 : y =

√5 − 1

2x and Es = (x, y) ∈ R2 : y = −

√5 + 1

2x.

The family of all affine subspaces of Rd of the form v + Es, with v ∈ Rd,defines a partition Fs of Rd that we call stable foliation and whose elements wecall stable leaves of A. This partition is invariant under A, meaning that the

4.2. EXAMPLES 117

image of any stable leaf is still a stable leaf. Moreover, by (4.2.17), the trans-formation A contracts distances uniformly inside each stable leaf. Analogously,the family of all affine subspaces of Rd of the form v + Eu with v ∈ Rd definesthe unstable foliation Fu of Rd, whose elements are called unstable leaves. Theunstable foliation is also invariant and the transformation A expands distancesuniformly inside unstable leaves.

Ws(x)

Wu(x)

x

Figure 4.1: Stable foliation and unstable foliation in the torus

Mapping Fs and Fu by the canonical projection π : Rd → Td, we obtainfoliations Ws and Wu of the torus that we call stable foliation and unstablefoliation of the transformation fA. See Figure 4.1. The previous observationsshow that these foliations are invariant under fA. Moreover:

(a) d(f jA(x), f jA(y)) → 0 when j → +∞, for any points x and y in the samestable leaf;

(b) d(f jA(y), f jA(z)) → 0 when j → −∞, for any points y and z in the sameunstable leaf.

We are going to use this geometric information to prove that (fA,m) isergodic. To that end, let ϕ : Td → R be any continuous function and considerthe time averages

ϕ+(x) = limn

1

n

n−1∑

j=0

ϕ(f jA(x)) and ϕ−(x) = limn

1

n

n−1∑

j=0

ϕ(f−jA (x)),

which are defined for m-almost every x ∈ Td. By Corollary 3.2.8, there exists afull measure set X ⊂ Td such that

ϕ+(x) = ϕ−(x) for every x ∈ X . (4.2.18)

Let us denote by Ws(x) and Wu(x), respectively, the stable leaf and theunstable leaf of fA through each point x ∈ Td.

Lemma 4.2.16. The function ϕ+ is constant on each leaf of Ws: if ϕ+(x)exists and y ∈ Ws(x) then ϕ+(y) exists and it is equal to ϕ+(x). Analogously,ϕ− is constant on each leaf of Wu.


Proof. According to property (a) above, d(f jA(x), f jA(y)) converges to zero whenj → ∞. Noting that ϕ is uniformly continuous (because its domain is compact),it follows that

ϕ(f jA(x)) − ϕ(f jA(y)) → 0 when j → ∞.

In particular, the Cesaro limit

limn

1

n

n−1∑

j=0

ϕ(f jA(x)) − ϕ(f jA(y))

is also zero. That implies that ϕ+(y) exists and is equal to ϕ+(x). The argumentfor ϕ− is entirely analogous.

Given any open subset R of the torus and any x ∈ R, denote by Ws(x,R)the connected component of Ws(x) ∩ R that contains x and by Wu(x,R) theconnected component of Wu(x) ∩ R that contains x. We call R a rectangle ifWs(x,R) intersects Wu(y,R) at a unique point, for every x and y in R. SeeFigure 4.2.

Lemma 4.2.17. Given any rectangle R ⊂ Td, there exists a measurable setYR ⊂ X ∩ R such that m(R \ YR) = 0 and, given any x and y in YR, thereexist points x′ and y′ in X ∩R such that x′ ∈ Ws(x,R) and y′ ∈ Ws(y,R) andy′ ∈ Wu(x′).

Ws(x)

Ws(y)

x

y

R

x′

y′

Figure 4.2: Rectangle in Td

Proof. Let us denote by msx the Lebesgue measure on the stable leaf Ws(x) of

each point x ∈ Td. Note that m(R \X) = 0, since X has full measure in Td.Then, by the theorem of Fubini,

msx

(Ws(x,R) \X

)= 0 for m-almost every x ∈ R.

Define YR =x ∈ X ∩R : ms

x

(Ws(x,R) \X

)= 0. Then YR has full measure

in R. Given x, y ∈ R, consider the map π : Ws(x,R) → Ws(y,R) defined by

π(x′) = intersection between Wu(x′, R) and Ws(y,R).

4.2. EXAMPLES 119

This maps is affine and, consequently, it has the following property, called ab-solute continuity:

msx(E) = 0 ⇔ ms

y(π(E)) = 0.

In particular, the image of Ws(x,R) ∩ X has full measure in Ws(y,R) and,consequently, it intersects Ws(y,R) ∩ X . So, there exists x′ ∈ Ws(x,R) ∩ Xwhose image y′ = π(x′) is in Ws(y,R)∩X . Observing that x′ and y′ are in thesame unstable leaf, by the definition of π, we see that these points satisfy theconditions in the conclusion of the lemma.

Consider any rectangle R. Given any x, y in YR, consider the points x′, y′ inX given by Lemma 4.2.17. Using Lemma 4.2.16 as well, we obtain

ϕ−(x) = ϕ+(x) = ϕ+(x′) = ϕ−(x′) = ϕ−(y′) = ϕ+(y′) = ϕ+(y) = ϕ−(y).

This shows that the functions ϕ+ and ϕ− coincide with one another and areconstant in YR.

Now let R1, . . . , RN be a finite cover of the torus by rectangles. Considerthe set

Y =

N⋃

j=1

Yj , where Yj = YRj .

Observe that m(Y ) = 1, since Y ∩ Rj ⊃ Yj has full measure in Rj for everyj. We claim that ϕ+ = ϕ− is constant on the whole Y . Indeed, given anyk, l ∈ 1, . . . , N we may find j0 = k, j1, . . . , jn−1, jn = l such that each Rjiintersects Rji−1 (that is just because the torus is path-connected). Recallingthat Rj is an open set and Yj is a full measure subset, we get that each Yjiintersects Yji−1 . Then, ϕ+ = ϕ− is constant on the union of all the Yji . Thisproves our claim.

In this way, we have shown that the time averages ϕ± of any continuous func-tion ϕ are constant at m-almost every point. Consequently (see Exercise 4.1.4),the system (fA,m) is ergodic.

4.2.7 Exercises

4.2.1. Prove Proposition 4.2.2.


4.2.3. Let I = [0, 1] and f : I → I be the function defined by

f(x) =

2x if 0 ≤ x < 1/32x− 2/3 if 1/3 ≤ x < 1/22x− 1/3 if 1/2 ≤ x < 2/32x− 1 if 2/3 ≤ x ≤ 1.

Show that f is ergodic with respect to the Lebesgue measure m.


4.2.4. Let X be a finite set and Σ = XN. Prove that every infinite compactsubset of Σ invariant under the shift map σ : Σ → Σ contains some non-periodicpoint.

4.2.5. Let X be a topological space, endowed with the corresponding Borelσ-algebra C, and let Σ = XN. Show that if X has a countable basis of opensets then the Borel σ-algebra of Σ (for the product topology) coincides with theproduct σ-algebra B = CN. The same is true for Σ = XZ and B = CZ.

4.2.6. In this exercise we propose an alternative proof of Proposition 4.2.1.Assume that θ is irrational. Let A be an invariant set with positive measure.Recalling that the orbit Rnθ (a) : n ∈ Z of every a ∈ S1 is dense in S1, showthat no point of S1 is a density point of Ac. Conclude that µ(A) = 1.

4.2.7. Assume that θ is irrational. Let ϕ : S1 → R be any continuous function.Show that

ϕ(x) = limn→∞

1

n

n−1∑

j=0

ϕ(Rjθ(x)) (4.2.19)

exists at every point and, in fact, the limit is uniform. Deduce that ϕ is constantat every point. Conclude that Rθ has a unique invariant probability measure.

4.2.8. Let A be a square matrix of dimension d with rational coefficients andlet λ be a rational eigenvalue of A. Show that there exists some eigenvectorwith integer coefficients, that is, some k ∈ Zd \ 0 such that Ak = λk.

4.2.9. Show that if f : M → M is a local diffeomorphism on a compact Rie-mannian manifold then

degreef =

∫| detDf | dm,

where m denotes the volume measure induced by the Riemannian metric of M ,normalized in such a way thatm(M) = 1. In particular, for any square matrix Aof dimension d with integer coefficients, the degree of the linear endomorphismfA : Td → Td is equal to | detA|.

4.2.10. A number x ∈ (0, 1) has continued fraction expansion of bounded typeif the sequence (an)n constructed in Section 1.3.2 is bounded. Prove that theset L ⊂ (0, 1) of points with continued fraction expansion of bounded type hasLebesgue measure zero.

4.2.11. Let f : M → M be a measurable transformation, µ be an ergodicinvariant measure and ϕ : M → R be a measurable function with

∫ϕdµ = +∞.

Prove that limn(1/n)∑n−1j=0 ϕ(f j(x)) = +∞ for µ-almost every x ∈M .

4.2.12. Observe that the number b in Exercise 3.2.4 is independent of x in aset with full Lebesgue measure. Prove that the arithmetic mean of the numbersa1, . . . , an, . . . goes to infinity: limn(1/n)(a1 + · · · + an) = +∞.

4.3. PROPERTIES OF ERGODIC MEASURES 121

4.3 Properties of ergodic measures

In this section we take the transformation f : M → M to be fixed and weanalyze the set Me(f) of probability measures that are ergodic with respect tof as a subset of the space M1(f) of all probability measures invariant under f .

Recall that a measure ν is said to be absolutely continuous with respect toanother measure µ if µ(E) = 0 implies ν(E) = 0. Then we write ν ≪ µ. Thisrelation is transitive: if ν ≪ µ and µ ≪ λ then ν ≪ λ. The first result assertsthat the ergodic probability measures are minimal for this order relation:

Lemma 4.3.1. If µ and ν are invariant probability measures such that µ isergodic and ν is absolutely continuous with respect to µ then µ = ν.

Proof. Let ϕ : M → R be any bounded measurable function. Since µ is invariantand ergodic, the time average

ϕ(x) = limn→∞

1

n

n−1∑

j=0

ϕ(f j(x))

is constant: ϕ(x) =∫ϕdµ at µ-almost every point. Since ν ≪ µ, it follows that

the equality also holds at ν-almost every point. In particular,∫ϕdν =

∫ϕ dν =

∫ϕdµ

(the first equality is part of the Birkhoff ergodic theorem). Therefore, the inte-grals of each bounded measurable function ϕ with respect to µ and with respectto ν coincide. In particular, considering characteristic functions, we concludethat µ = ν.

It is clear that if µ1 and µ2 are probability measures invariant under thetransformation f then so is (1 − t)µ1 + tµ2, for any t ∈ (0, 1). This means thatthe space M1(f) of all probability measures invariant under f is convex. Thenext proposition asserts that the ergodic probability measures are the extremalelements of this convex set:

Proposition 4.3.2. An invariant probability measure µ is ergodic if and onlyif it is not possible to write it as µ = (1 − t)µ1 + tµ2 with t ∈ (0, 1) andµ1, µ2 ∈ M1(f) with µ1 6= µ2.

Proof. To prove the “if” claim, assume that µ is not ergodic. Then there existssome invariant set A with 0 < µ(A) < 1. Define µ1 and µ2 to be the normalizedrestriction of µ to the set A and to its complement Ac, respectively:

µ1(E) =µ(E ∩A)

µ(A)and µ2(E) =

µ(E ∩Ac)µ(Ac)

.

Since A and Ac are invariant sets and µ is an invariant measure, both µ1 andµ2 are still invariant probability measures. Moreover,

µ = µ(A)µ1 + µ(Ac)µ2


and, consequently, µ is not extremal.To prove the converse, assume that µ is ergodic and µ = (1 − t)µ1 + tµ2 for

some t ∈ (0, 1). It is clear that µ(E) = 0 implies µ1(E) = µ2(E) = 0, that is,µ1 and µ2 are absolutely continuous with respect to µ. Hence, by Lemma 4.3.1,µ1 = µ = µ2. This shows that µ is extremal.

Let us also point out that distinct ergodic measures “live” in disjoint subsetsof the space M (see also Exercise 4.3.6):

Lemma 4.3.3. Assume that the σ-algebra of M admits some countable gen-erating subset Γ. Let µi : i ∈ I be an arbitrary family of ergodic probabilitymeasures, all distinct. Then these measures µi are mutually singular: there ex-ist pairwise disjoint measurable subsets Pi : i ∈ I invariant under f and suchthat µi(Pi) = 1 for every i ∈ I.

Proof. Let A be the algebra generated by Γ. Note that A is countable, since itcoincides with the union of the (finite) algebras generated by the finite subsetsof Γ. For each i ∈ I, define

Pi =⋂

A∈A

x ∈M : τ(A, x) = µi(A).

Since µi is ergodic, x ∈ M : τ(A, x) = µi(A) has full measure for each A.Using that A is countable, it follows that µi(Pi) = 1 for every i ∈ I. Moreover,if there exists x ∈ Pi ∩ Pj then µi(A) = τ(A, x) = µj(A) for every A ∈ A. Inother words, µi = µi. This proves that the Pi are pairwise disjoint.

Now assume that f : M → M is a continuous transformation in a topologi-cal space M . We say that f is transitive if there exists some x ∈ M such thatfn(x) : n ∈ N is dense in M . The next lemma provides a useful characteri-zation of transitivity. Recall that a topological space M is called a Baire spaceif the intersection of any countable family of open dense subsets is dense in M .Every complete metric space is a Baire space and the same is true for everylocally compact topological space (see [Dug66]).

Lemma 4.3.4. Let M be a Baire space with a countable basis of open sets.Then f : M → M is transitive if and only if for every pair of open sets U andV there exists k ≥ 1 such that f−k(U) intersects V .

Proof. Assume that f is transitive and let x ∈M be a point whose orbit fn(x) :n ∈ N is dense in M . Then there exists m ≥ 1 such that fm(x) ∈ V and (usingthe fact that fn(x) : n > m is also dense) there exists n > m such thatfn(x) ∈ U . Take k = n−m. Then fm(x) ∈ f−k(U)∩V . This proves the “onlyif” part of the statement.

To prove the converse, let Uj : j ∈ N be a countable basis of open subsetsof M . The hypothesis ensures that the open set ∪∞

k=1f−k(Uj) is dense in M for

every j ∈ N. Then the intersection

X =∞⋂

j=1

∞⋃

k=1

f−k(Uj)

4.3. PROPERTIES OF ERGODIC MEASURES 123

is a dense subset of M . In particular, it is non-empty. On the other hand,by definition, if x ∈ X then for every j ∈ N there exists k ≥ 1 such thatfk(x) ∈ Uj. Since the Uj constitute a basis of open subsets of M , this meansthat fk(x) : k ∈ N is dense in M .

Proposition 4.3.5. Let M be a Baire space with a countable basis of open sets.If µ is an ergodic probability measure then the restriction of f to the support ofµ is transitive.

Proof. Start by noting that suppµ has a countable basis of open sets, becauseit is a subspace of M , and it is a Baire space, since it is closed in M . LetU and V be open subsets of suppµ. By the definition of support, µ(U) > 0and µ(V ) > 0. Define B = ∪∞

k=1f−k(U). Then µ(B) > 0, because B ⊃ U ,

and f−1(B) ⊂ B. By ergodicity (see Exercise 1.1.4) it follows that µ(B) = 1.Then B must intersect V . This proves that there exists k ≥ 1 such that f−k(U)intersects V . By Lemma 4.3.4, it follows that the restriction f : suppµ→ suppµis transitive.

4.3.1 Exercises

4.3.1. Let M be a topological space M with countable basis of open sets,f : M → M be a measurable transformation and µ be an ergodic probabilitymeasure. Show that the orbit fn(x) : n ≥ 0 of µ-almost every point x ∈M isdense in the support of µ.

4.3.2. Let f : M → M be a continuous transformation in a compact metricspace. Given a function ϕ : M → R, prove that there exists an invariantprobability measure µϕ such that

∫ϕdµϕ = sup

η∈M1(f)

∫ϕdη.

4.3.3. Let g : E → E be a transformation induced by f : M → M , that is, atransformation of the form g(x) = fρ(x)(x) with ρ : E → N (see Section 1.4.2).Let ν be an invariant probability measure of g and νg be the invariant measureof f defined by (1.4.5). Assume that νρ(M) < ∞ and denote µ = νρ/νρ(M).Show that (f, µ) is ergodic if and only if (g, ν) is ergodic.

4.3.4. Let f : M →M be a continuous transformation in a separable completemetric space. Given any invariant probability measure µ, let µ be its lift to thenatural extension f : M → M (see Section 2.4.2). Show that (f , µ) is ergodic ifand only if (f, µ) is ergodic.

4.3.5. Let f : M → M be a measurable transformation and µ be an invariantmeasure. Let gt : N → N , t ∈ R be a suspension flow of f and ν be thecorresponding suspension of the measure µ (see Section 3.4.1). Assume thatν(N) <∞ and denote ν = ν/ν(N). Show that ν is ergodic for the flow (gt)t ifand only if µ is ergodic for f .


4.3.6. Show that for finite or countable families of ergodic measures the con-clusion of Lemma 4.3.3 holds even if the σ-algebra is not countably generated.

4.3.7. Give an example of a metric space M and a transformation f : M → Msuch that there exists a sequence of ergodic Borel measures µn converging, inthe weak∗ topology, to a non-ergodic invariant measure µ.

4.3.8. Let M be a metric space, f : M → M be a continuous transformationand µ be an ergodic probability measure. Show that

1

n

n−1∑

j=0

f j∗ν converges to µ in the weak∗ topology

for any probability measure ν on M absolutely continuous with respect to µ,but not necessarily invariant,

4.3.9. Let X = 1, . . . , d and σ : Σ → Σ be the shift map in Σ = XN orΣ = XZ.

(1) Show that for every δ > 0 there exists k ≥ 1 such that, given x1, . . . , xs ∈ Σand m1, . . . ,ms ≥ 1, there exists a periodic point y ∈ Σ with period nsand such that d(f j+ni (y), f j(xi)) < δ for every 0 ≤ j < mi, where n1 = 0and ni = (m1 + k) + · · · + (mi−1 + k) for 1 < i ≤ s.

(2) Let ϕ : Σ → R be a continuous function and ϕ be its Birkhoff average.Show that, given ε > 0, points x1, . . . , xs ∈ Σ where the Birkhoff averageof ϕ is well defined, and numbers α1, . . . , αs > 0 such that

∑i α

i = 1,there exists a periodic point y ∈ Σ satisfying |ϕ(y) −∑i α

iϕ(xi)| < ε.

(3) Conclude that the set Me(σ) of ergodic probability measures is dense inthe space M1(σ) of all invariant probability measures.

4.4 Comments in Conservative Dynamics

The ergodic theorem of Birkhoff, proven in the 1930’s, provided a solid mathe-matical foundation to the statement of the Boltzmann ergodic hypothesis, butleft entirely open the question of its veracity. In this section we briefly surveythe main results obtained since then, in the context of conservative systems,that is, dynamical systems that preserve a volume measure on a manifold.

Let us start by mentioning that, in a certain abstract sense, the majorityof conservative systems are ergodic. That is the sense of the theorem that westate next, which was proven in the early 1940’s by John Oxtoby and StanislavUlam [OU41]. Recall that a subset of a Baire space is called residual if it may bewritten as a countable intersection of open and dense subsets. By the definitionof Baire space, every residual subset is dense.

Theorem 4.4.1 (Oxtoby, Ulam). For every compact Riemannian manifold Mthere exists a residual subset R of the space Homeovol(M) of all conservativehomeomorphisms of M such that every element of R is ergodic.

4.4. COMMENTS IN CONSERVATIVE DYNAMICS 125

The results presented in the sequel imply that the conclusion of this theoremis no longer true when one replaces Homeovol(M) by the space Diffeokvol(M) ofconservative diffeomorphisms of class Ck, at least for k > 3. Essentially nothingis know in this regard in the cases k = 2 and k = 3. On the other hand,Artur Avila, Sylvain Crovisier and Amie Wilkinson have recently announced aC1 version of the previous theorem: for every compact Riemannian manifoldM , there exists a residual subset R of the space Diffeo1

vol(M) of conservativediffeomorphisms of class C1 such that every f ∈ R with positive entropy hvol(f)is ergodic. The notion of entropy will be studied in Chapter 9.

4.4.1 Hamiltonian systems

The systems that interested Boltzmann, relative to the motion of gas molecules,may, in principle, be described by the laws of Newtonian Classical Mechanics.In the so-called Hamiltonian formalism of Classical Mechanics, the states of thesystem are represented by “generalized coordinates” q1, . . . , qd and “generalizedmomenta” p1, . . . , pd and the system’s evolution is described by the solutions ofthe Hamilton-Jacobi equations:

dqjdt

=∂H

∂pjand

dpjdt

= −∂H∂qj

, j = 1, . . . , d, (4.4.1)

where H (the total energy of the system) is a C2 function of the variablesq = (q1, . . . , qd) and p = (p1, . . . , pd); the integer d ≥ 1 is the number of degreesof freedom.

Example 4.4.2 (Harmonic pendulum). Let d = 1 and H(q, p) = p2/2−g cos q,where g is a positive constant and (q, p) ∈ R2. The Hamilton-Jacobi equations

dq

dt= p and

dp

dt= −g sin q

describe the motion of a pendulum subject to a constant gravitational field:the coordinate q measures the angle with respect to the position of (stable)equilibrium and p measures the angular momentum. Then p2/2 is the kineticenergy and −g cos q is the potential energy. Thus, the Hamiltonian H is thetotal energy.

Note that H is always a first integral of the system, that is, it is constantalong the flow trajectories:

dH

dt=

d∑

j=1

∂H

∂qj

dqjdt

+∂H

∂pj

dpjdt

≡ 0.

Thus, we may consider the restriction of the flow to each energy hypersurfaceHc = (q, p) : H(q, p) = c. The volume measure dq1 · · · dqddp1 · · ·dpd is calledLiouville measure. Observe that the divergence of the vector field

F =(− ∂H

∂p1, . . . ,−∂H

∂pd,∂H

∂q1, . . . ,

∂H

∂qd

)


is identically zero. Thus (recall Section 1.3.6), the Liouville measure is invariantunder the Hamiltonian flow. It follows (see Exercise 1.3.12) that the restrictionof the flow to each energy hypersurface Hc admits an invariant measure µc thatis given by

µc(E) =

∫

E

ds

‖ gradH‖ for every measurable set E ⊂ Hc

where ds denotes the volume element on the hypersurface. Then, the ergodichypothesis may be viewed as claiming that, in general, Hamiltonian systemsare ergodic with respect to this invariant measure µc on (almost) every energyhypersurface.

The first important result in this context was announced by Andrey Kol-mogorov at the International Congress of Mathematicians ICM 1954 and wassubstantiated, short afterwards, by the works of Vladimir Arnold and JurgenMoser. This led to deep theory of so-called almost integrable systems that isknown as KAM theory, in homage to its founders, and to which contributedin a decisive manner several other mathematicians, such as Helmut Russmann,Michael Herman, Eduard Zehnder, Jean-Christophe Yoccoz and Jurgen Poschel,among others. Let us explain what is meant by “almost integrable”.

A Hamiltonian system with d degrees of freedom is said to be integrable (inthe sense of Liouville) if it admits d first integrals I1, . . . , Id:

• independent: that is, such that the gradients

grad Ij =

(∂Ij∂q1

,∂Ij∂p1

, . . . ,∂Ij∂qd

,∂Ij∂pd

), 1 ≤ j ≤ d,

are linearly independent at every point on an open and dense subset ofthe domain;

• in involution: that is, such that the Poisson brackets

Ij , Ik =

d∑

i=1

[∂Ij∂qi

∂Ik∂pi

− ∂Ij∂pi

∂Ik∂qi

].

are all identically zero.

It follows from the previous remarks that every system with d = 1 degree offreedom is integrable: the Hamiltonian H itself is a first integral. Anotherimportant example:

Example 4.4.3. For any number d ≥ 1 of degrees of freedom, assume thatthe Hamiltonian H depends only on the variables p = (p1, . . . , pd). Then theHamilton-Jacobi equations (4.4.1) reduce to

dqjdt

=∂H

∂pj(p) and

dpjdt

= −∂H∂qj

(p) = 0.


The second equation means that each pj is a first integral; it is easy to see thatthe first integrals are independent and in involution. Then the expression on theright-hand side of the first equation is independent of time. Hence, the solutionis given by

qj(t) = qj(0) +∂H

∂pi(p(0)) t.

As we are going to comment in the sequel, this example is totally typical ofintegrable systems.

A classical theorem of Liouville asserts that if the system is integrable thenthe Hamilton-Jacobi equations may be solved completely by quadratures. Inthe proof (see the book of Arnold [Arn78]) one constructs certain functionsϕ = (ϕ1, . . . , ϕd) with values in Td which, together with the first integralsI = (I1, . . . , Id) ∈ Rd, constitute canonical coordinates of the system (they arecalled action-angle coordinates). What we mean by “canonical” is that thecoordinate change

Ψ : (q, p) 7→ (ϕ, I)

preserves the form of the Hamilton-Jacobi equations: (4.4.1) becomes

dϕjdt

=∂H ′

∂Ijand

dIjdt

= −∂H′

∂ϕj, (4.4.2)

whereH ′ = HΨ−1 is the expression of the Hamiltonian in the new coordinates.Since the Ij are first integrals, the second equation yields

0 =dIjdt

= −∂H′

∂ϕj.

This means thatH does not depend on the variables ϕj and so we are in the typeof situation described in Example 4.4.3. Each trajectory of the Hamiltonian flowis constrained inside a torus I = const and, according to the first equation in(4.4.2), it is linear in the coordinate ϕ:

ϕj(t) = ϕj(0) + ωj(I)t, where ωj(I) =∂H ′

∂Ij(I).

In terms of the original coordinates (q, p), we conclude that the trajectories ofthe Hamiltonian flow are given by

t 7→ Ψ−1(ϕ(0) + ω(I)t, I) = Φϕ(0),I(ω(I)t) (4.4.3)

where Φϕ(0),I : Rd →M is a Zd-periodic function and ω(I) = (ω1(I), . . . , ωd(I))is called frequency vector. We say that the trajectory is quasi-periodic.

4.4.2 Kolmogorov-Arnold-Moser theory

It is clear that integrable systems are never ergodic. However, since integra-bility is a very rare property, this alone would not be an obstruction to most


Hamiltonian systems being ergodic. Nevertheless, the fundamental result thatwe state next asserts that generic integrable systems are robustly non-ergodic:every nearby Hamiltonian flow is also non-ergodic.

Let H0 be an integrable Hamiltonian, written in action-angle coordinates(ϕ, I). More precisely, let Bd be a ball in Rd and assume that H0(ϕ, I) isdefined for every (ϕ, I) ∈ Td × Bd but depends only on the coordinate I. Wecall H0 non-degenerate if its Hessian matrix is invertible:

det

(∂2H0

∂Ii∂Ij

)

i,j

6= 0 at every point. (4.4.4)

Observe that the Hessian matrix of H0 coincides with the Jacobian matrix ofthe function I 7→ ω(I). Therefore, the twist condition (4.4.4) means that themap assigning to each value of I the corresponding frequency vector ω(I) is alocal diffeomorphism.

The next theorem means that, under this condition, most of the invarianttori of the Hamiltonian flow of H0 persist for any nearby system:

Theorem 4.4.4. Let H0 be an integrable non-degenerate Hamiltonian of classC∞. Then there exists a neighborhood V of H0 in the space C∞(Td × Bd,R)such that for every H ∈ V there exists a compact set K ⊂ Td ×Bd satisfying:

(a) K is a union of differentiable tori of the form (ϕ, u(ϕ)) : ϕ ∈ Td eachof which is invariant under the Hamiltonian flow of H;

(b) the restriction of the Hamiltonian flow of H to each of these tori is con-jugate to a linear flow on Td;

(c) the set K has positive volume and, in fact, the volume of its complementgoes to zero when H → H0.

In particular, the Hamiltonian flow of H can not be ergodic.

The latter is because the set K may be decomposed into positive volumesubsets that are also unions of invariant tori and, thus, are invariant. The proofof the theorem shows that the persistence or not of a given invariant torus of H0

is intimately related to the arithmetic properties of the corresponding frequencyvector. Let us explain this.

Given c > 0 and τ > 0, we say that a vector ω0 ∈ Rd is (c, τ)-Diophantine if

|k · ω0| ≥c

‖k‖τ for every k ∈ Zd, (4.4.5)

where ‖k‖ = |k1| + · · · |kd|. Diophantine vectors are rationally independent; infact, the condition (4.4.5) means that ω0 is badly approximated by rationallydependent vectors. We say that ω0 is τ -Diophantine if it is (c, τ)-Diophantinefor some c > 0. The set of τ -Diophantine vectors is non-empty if and only ifτ ≥ d− 1; moreover, it has full measure in Rd if τ is strictly larger than d − 1(see Exercise 4.4.1).


While proving Theorem 4.4.4, it is shown that, given c > 0 and τ ≥ d−1 andany compact set Ω ⊂ ω(Bd), one can find a neighborhood V of H0 such that,for every H ∈ V and every (c, τ)-Diophantine vector ω0 ∈ Ω, the Hamiltonianflow of H admits a differentiable invariant torus restricted to which the flow isconjugate to the linear flow t 7→ ϕ(t) = ϕ(0) + tω0.

Next, we discuss a version of Theorem 4.4.4 for discrete time systems or,more precisely, symplectic transformations. We call symplectic manifold (seeArnold [Arn78, Chapter 8]) any differentiable manifold M endowed with asymplectic form, that is, a non-degenerate differential 2-form θ. By “non-degenerate” we mean that for every x ∈ M and every u 6= 0 there exists vsuch that θx(u, v) 6= 0. Existence of a symplectic form on M implies that thedimension is even: write dimM = 2d. Moreover, the d-th power θd = θ∧· · · ∧θis a volume form on M .

A differentiable transformation f : M → M is said to be symplectic if itpreserves the symplectic form, meaning that θx(u, v) = θf(x)(Df(x)u,Df(x)v)for every x ∈ M and any u, v ∈ TxM . Then, in particular, f preserves thevolume form θd.

Example 4.4.5. Let M = R2d, with coordinates (q1, . . . , qd, p1, . . . , pd), andlet θ be the differential 2-form defined by

θx = dq1 ∧ dp1 + · · · + dqd ∧ dpd (4.4.6)

for every x. Then θ is a symplectic form on M . Actually, a classical theoremof Darboux states that for every symplectic form there exists some atlas of themanifold such that the expression of the symplectic form in any local chart isof the type (4.4.6). Consider any transformation of the form

f0(q1, . . . , qd, p1, . . . , pd) = (q1 + ω1(p), . . . , qd + ωd(p), p1, . . . , pd).

Using

Df0 ·∂

∂qj=∂

∂qjand Df0 ·

∂

∂pj=∂

∂pj+∑

i

∂ωi∂pj

∂

∂qi,

we see that f0 is symplectic with respect to the form θ.

Example 4.4.6 (Cotangent bundle). Let M be a manifold of class Cr withr ≥ 3. By definition, the cotangent space TqM at each point q ∈M is the dualof the tangent space TqM and the cotangent bundle of M is the disjoint unionT ∗M = ∪q∈MT ∗

qM of all cotangent spaces. See Appendix A.4.3. The cotangentbundle is a manifold of class Cr−1 and the canonical projection π∗ : T ∗M →Mmapping each T ∗

qM to the corresponding base point q is a map of class Cr−1.A very important feature of the cotangent bundle is that it always admits a

canonical symplectic form, that is, one that depends only on the manifold M .That can be seen as follows. Let α be the differential 1-form on T ∗M definedby

α(q,p) : T(q,p)(T∗M) → R, α(q,p) = p Dπ∗(q, p)


for each (q, p) ∈ T ∗M . It is clear that α is well defined and of class Cr−2. Con-sider the exterior derivative θ∗ = dα. One can check (for instance, using localcoordinates) that θ∗ is non-degenerate at every point and, thus, is a symplecticform in T ∗M .

There is no corresponding statement for the tangent bundle TM . However,once we fix a Riemannian metric onM it is possible to endow the tangent bundlewith a (non-canonical) symplectic form:

Example 4.4.7 (Tangent bundle). LetM be a Riemannian manifold of class Cr

with r ≥ 3. Then we may identify the tangent bundle TM with the cotangentbundle T ∗M through the map Ξ : TM → T ∗M that maps each point (q, v) withv ∈ TqM to the point (q, p) with p ∈ T ∗

qM defined by

p : TqM → R, p(w) = v ·q w.

Indeed, Ξ is a diffeomorphism and it maps fibers of TM to fibers of T ∗M , pre-serving the base point. In particular, we may use Ξ to transport the symplecticform θ∗ in Example 4.4.6 to a symplectic form θ in TM :

θ(q,v)(w1, w2

)= θ∗Ξ(q,v)

(DΞ(q, v)w1, DΞ(q, v)w2

)

for any w1, , w2 ∈ T(q,v)(TM). It is clear from the construction that, unlike θ∗,this form θ depends on the Riemannian metric in M .

By analogy with the case of flows, we call a transformation f0 integrable ifthere exist coordinates (q, p) ∈ Td × Bd such that f0(q, p) = (q + ω(p), p) forevery (q, p). Moreover, we say that f0 is non-degenerate if

the map p 7→ ω(p) is a local diffeomorphism. (4.4.7)

Theorem 4.4.8. Let f0 be a non-degenerate integrable transformation of classC∞. Then there exists a neighborhood V of f0 in the space C∞(Td × Bd,Rd)such that for every symplectic transformation2 f ∈ V there exists a compact setK ⊂ Td ×Bd satisfying:

(a) K is a union of differentiable tori of the form (q, u(q)) : q ∈ Td, each ofwhich is invariant under f ;

(b) the restriction of the transformation f to each of these tori is conjugateto a translation on Td;

(c) the set K has positive volume and, in fact, the volume of the complementconverges to zero when f → f0.

In particular, the transformation f can not be ergodic.

2Relative to the canonical symplectic form (4.4.6).


Just as in the previous (continuous time) situation, the set K is formed bytori restricted to which the dynamics is conjugate to a Diophantine rotation.

Theorems 4.4.4 and 4.4.8 extend to systems of class Cr with r finite butsufficiently large, depending on the dimension. For example, the version ofTheorem 4.4.8 for d = 1 is true for r > 3 and false for r < 3; in the boundarycase r = 3, parts (a) and (b) of the theorem remain valid but part (c) does not.

The notion of Hamiltonian flow extends to any symplectic manifold (M, θ),as follows. Let H : M → R be a function of class C2 and dH(z) : TzM → Rdenote its derivative at each point z ∈M . By the definition of symplectic form,each θz : TzM × TzM → R is a non-degenerate alternate 2-form. Hence, thereexists exactly one vector XH(z) ∈ TzM such that

θz(XH(z), v) = dH(z)v for every v ∈ TzM .

The map z 7→ XH(z) is a vector field of class C1 on the manifold M . This iscalled the Hamiltonian vector field associated with H . The corresponding flow,given by the differential equation

dz

dt= XH(z), (4.4.8)

is the Hamiltonian flow associated with H . We leave it to the reader to checkthat (4.4.8) corresponds precisely to the Hamilton-Jacobi equations (4.4.1) whenM = R2d and θ is the symplectic form in Example 4.4.5.

4.4.3 Elliptic periodic points

The ideas behind the results stated in the previous section may be used todescribe the behavior of conservative systems in the neighborhood of ellipticperiodic points. Let us explain this briefly, starting with the symplectic case indimension 2.

When M is a surface, the notions of symplectic form and area form coincide.Thus, a differentiable transformation f : M → M is symplectic if and only ifit preserves area. Let ζ ∈ M be an elliptic fixed point, that is, such that theeigenvalues of Df(ζ) are in the unit circle. Let λ and λ be the eigenvalues. Wesay that the fixed point ζ is non-degenerate if λk 6= 1 for every 1 ≤ k ≤ 4. Then,by the Birkhoff normal form theorem (see Arnold [Arn78, Appendix 7]), thereexist canonical coordinates (x, y) ∈ R2 in the neighborhood of the fixed point,with ζ = (0, 0), such that the transformation f has the form:

f(θ, ρ) = (θ + ω0 + ω1ρ, ρ) +R(θ, ρ) with |R(θ, ρ)| ≤ C|ρ|2, (4.4.9)

where (θ, ρ) ∈ S1 × R are the “polar” coordinates defined by

x =√ρ cos 2πθ and y =

√ρ sin 2πθ.

Observe that the normal form f0 : (θ, ρ) 7→ (θ + ω0 + ω1ρ, ρ) is integrable.Moreover, f0 satisfies the twist condition (4.4.7) as long as ω1 6= 0 (this con-dition does not depend on the choice of the canonical coordinates, just on the


transformation f). Then one may apply the methods of Theorem 4.4.8 to con-clude that there exists a set K with positive area that is formed by invariantcircles with Diophantine rotation numbers, that is, such that the restriction off to each of these circles is conjugate to a Diophantine rotation. Even more,the fixed point ζ is a density point of this set:

limr→0

m(B(ζ, r) \K)

m(B(ζ, r))= 0,

where B(ζ, r) represents the ball of radius r > 0 around ζ.We will refer to points ζ as in the previous paragraph as generic elliptic fixed

points. An important consequence of what we just said is that generic ellipticfixed points of area preserving transformations are stable: the trajectory of anypoint close to ζ remains close to ζ for all times, as it is “trapped” on the inside ofsome small invariant circle. This feature does not extend to higher dimensions,as we will explain in a while.

Still in dimension two, we want to mention other important dynamical phe-nomena that take place in the neighborhood of generic elliptic fixed points. Letus start by presenting a very useful tool, known as Poincare-Birkhoff fixed pointtheorem or Poincare last theorem. The statement was proposed by Poincare,who also presented some special cases, a few months before his death; the gen-eral case was proved by Birkhoff [Bir13] in the following year.

Let A = S1 × [a, b], with 0 < a < b, and let f : A→ A be a homeomorphismthat preserves each of the boundary components of the annulus A. We say that fis a twist homeomorphism if it rotates the two boundary components in oppositesenses or, more precisely, if there exists some lift F : R × [a, b] → R × [a, b],F (θ, ρ) = (Θ(θ, ρ), R(θ, ρ)) of the map f to the universal cover of the annulus,such that [

Θ(θ, a) − θ][

Θ(θ, b) − θ]< 0 for every θ ∈ R. (4.4.10)

Theorem 4.4.9 (Poincare-Birkhoff fixed point). If f : A→ A is a twist home-omorphism that preserves area then f admits, at least, two fixed points in theinterior of A.

As mentioned previously, every generic elliptic fixed point ζ is accumulatedby invariant circles with Diophantine rotation numbers. Any two such disksbound an annulus around ζ. Applying Theorem 4.4.9 (or, more precisely, itscorollary in Exercise 4.4.6) one gets that any such annulus contains, at least, apair of periodic orbits with the same period.

In a sense, these pairs of periodic orbits are what is left of the invariantcircles of the normal form f0 with rational rotation numbers, which are usuallydestroyed by the addition of the term R in (4.4.9). Their periods go to infinitywhen one approaches ζ. Generically, one of these periodic orbits is hyperbolic(saddle points) and the other one is elliptic. An example is sketched in Fig-ure 4.3: the elliptic fixed point ζ is surrounded by a hyperbolic periodic orbitand an elliptic periodic orbit, marked with the letters p and q, respectively, bothwith period 4. Two invariant circles around ζ are also represented.


ζ

pp

pp

q

qq

q

Figure 4.3: Invariant circles, periodic orbits and homoclinic intersections in theneighborhood of a generic elliptic fixed point

The Swiss mathematician Eduard Zehnder proved that, generically, the hy-perbolic periodic orbits exhibit transverse homoclinic points, that is, their sta-ble manifolds and unstable manifolds intersect transversely, as depicted in Fig-ure 4.3. This implies that the geometry of the stable manifolds and unstablemanifolds is extremely complex. Moreover, the elliptic periodic orbits satisfythe genericity conditions mentioned previously. This means that all the dynam-ical complexity that we are describing in the neighborhood of ζ is reproducedin the neighborhood of each one of these “satellite” elliptic orbits (which havetheir own “satellites”, etc).

Moreover, a theory developed by the French physicist Serge Aubry and theAmerican mathematician John Mather shows that ζ is also accumulated by cer-tain infinite, totally disconnected invariant sets, restricted to which the transfor-mation f is minimal (all the orbits are dense). In a sense, these Aubry-Mathersets are a souvenir of the invariant circles of the normal form f0 with irra-tional non-Diophantine rotation numbers that are also typically destroyed bythe addition of the perturbation term R in (4.4.9).

Figure 4.4 illustrates a good part of what we have been saying. It depictsseveral computer calculated trajectories of an area preserving transformation.The behavior of these trajectories suggests the presence of invariant circles,elliptic satellites with their own invariant circles, and even hyperbolic orbitswith associated transverse homoclinic intersections. One can also observe thepresence of certain trajectories with “chaotic” behavior, apparently related tothose homoclinic intersections.

More generally, let f : M → M be a symplectic diffeomorphism on a sym-plectic manifold M of any (even) dimension 2d ≥ 2. We say that a fixedpoint ζ ∈ M is elliptic if all the eigenvalues of the derivative Df(ζ) are in theunit circle. Let λ1, λ1, . . . , λd, λd be those eigenvalues. We say that ζ is non-degenerate if λk11 . . . λkd

d 6= 1 for every (k1, . . . , kd) ∈ Zd with |k1|+ · · ·+ |kd| ≤ 4


Figure 4.4: Computational evidence for the presence of invariant circles, ellipticislands and transverse homoclinic intersections

(in particular, the eigenvalues are all distinct). Then, by the Birkhoff nor-mal form theorem (see Arnold [Arn78, Appendix 7]), there exist canonicalcoordinates (x1, . . . , xd, y1, . . . , yd) ∈ R2d in a neighborhood of ζ such thatζ = (0, . . . , 0, 0, . . . , 0) and the transformation f has the form

f(θ, ρ) = (θ + ω0 + ω1(ρ), ρ) +R(θ, ρ) with ‖R(θ, ρ)‖ ≤ const ‖ρ‖2,

where ω0 ∈ Rd and ω1 : Rd → Rd is a linear map and (θ, ρ) ∈ Td × Rd are the“polar” coordinates defined by

xj =√ρj cos 2πθj and yj =

√ρj sin 2πθj , j = 1, . . . , d.

Assuming that ω1 is an isomorphism (this is yet another generic conditionon the transformation f), we have that the normal form

f0 : (θ, ρ) 7→ (θ + ω0 + ω1(ρ), ρ)

is integrable and satisfies the twist condition (4.4.7). Applying the ideas of The-orem 4.4.8, one concludes that ζ is a density point of a set K formed by invarianttori of dimension d, restricted to which the transformation f is conjugate to aDiophantine rotation.

In particular, symplectic transformations with generic elliptic fixed (or pe-riodic) points are never ergodic. Observe, on the other hand, that for d > 1 atorus of dimension d does not separate the ambient space M into two connectedcomponents. Therefore, the argument we used before to conclude that genericelliptic fixed points on surfaces are stable does not extend to higher dimension.In fact, it is known that when d > 1 elliptic fixed points are usually unsta-ble: trajectories starting arbitrarily close to the fixed point may escape from afixed neighborhood of it. This is related to the phenomenon known as Arnolddiffusion, which is a very active research topic in this area.

Finally, let us mention that this theory also applies to continuous time con-servative systems. We say that a stationary point ζ of a Hamiltonian flow iselliptic if all the eigenvalues of the derivative of the vector field at the point ζ


are pure imaginary numbers. Arguments similar to those in the discrete timecase show that, under generic hypothesis, ζ is a density point of a set formed byinvariant tori of dimension d restricted to each of which the Hamiltonian flowis conjugate to a linear flow.

Moreover, there are corresponding results for periodic trajectories of Hamil-tonian flows. One way to obtain such results is by considering a cross-section tothe flow at some point of the periodic trajectory and applying the previous ideasto the corresponding Poincare map. In this way one gets that, under genericconditions, elliptic periodic trajectories of Hamiltonian flows are accumulatedby sets with positive volume consisting of invariant tori of the flow.

The theory of Kolmogorov-Arnold-Moser has many other applications, in awide variety of situations in Mathematics that go beyond the scope of this book.The reader may find more complete information in the following references:Arnold [Arn78], Bost [Bos86], Yoccoz [Yoc92], de la Llave [dlL93] and Arnold,Kozlov, Neishtadt [AKN06], among others.

4.4.4 Geodesic flows

Let M be a compact Riemannian manifold. Some of the notions that are usedin the sequel are recalled in Appendix A.4.

It follows from the theory of ordinary differential equations that for each(x, v) ∈ TM there exists a unique geodesic γx,v : R → M of the manifold Msuch that γx,v(0) = x and γx,v(0) = v. Moreover, the family of transformationsdefined by

f t : (x, v) 7→ (γx,v(t), γx,v(t))

is a flow on the tangent bundle TM , which is called geodesic flow of M . Wedenote by T 1M the unit tangent bundle, formed by the pairs (x, v) ∈ TM with‖v‖ = 1. The unit tangent bundle is invariant under the geodesic flow.

Equivalently, the geodesic flow may be defined as the Hamiltonian flow in thetangent bundle TM (with the symplectic structure defined in Example 4.4.7)associated with the Hamiltonian function H(x, v) = ‖v‖2. So, (f t)t preservesthe Liouville measure of the tangent bundle.

In this context, the Liouville measure may described as follows. Every innerproduct in a finite-dimensional vector space induces a volume element3 in thatspace, relative to which the cube spanned by any orthonormal basis has volume1. In particular, the Riemannian metric induces a volume element dv on eachtangent space TxM . Integrating this volume element along M , we get a volumemeasure dx on the manifold itself. The Liouville measure of TM is given, locally,by the product dxdv. Moreover, its restriction m to the unit tangent bundle isgiven, locally, by the product dxdα, where dα is the measure of angle on theunit sphere of TxM .

The fact that H is a first integral means that the norm ‖v‖ is constant alongtrajectories of the flow. In particular, (f t)t leaves the unit tangent bundle invari-

3That is, a volume form defined up to sign: the sign is not determined because the innerproduct does not detect the orientation of the vector space.


ant. Furthermore, the geodesic flow preserves the restriction m of the Liouvillemeasure to T 1M . However, the behavior of geodesic flows is, usually, very dif-ferent from the dynamics of the almost integrable systems that we described inSection 4.4.2.

For example, the Austrian mathematician Eberhard F. Hopf [Hop39] provedin 1939 that if M is a compact surface with negative Gaussian curvature atevery point then its geodesic flow is ergodic. Almost three decades later, histheorem was extended to manifolds in any dimension, through the followingremarkable result of the Russian mathematician Dmitry Anosov [Ano67]:

Theorem 4.4.10 (Anosov). Let M be a compact manifold with negative sec-tional curvature. Then the geodesic flow on the unit tangent bundle is ergodicwith respect to the Liouville measure on T 1M .

Thus, the geodesic flows of manifolds with negative curvature were the firstimportant class of Hamiltonian systems for which the ergodic hypothesis couldbe validated rigorously.

4.4.5 Anosov systems

There are two fundamental steps in the proof of Theorem 4.4.10. The first oneis to show that every geodesic flow on a manifold with negative curvature is uni-formly hyperbolic. This means that every trajectory γ of the flow is contained ininvariant submanifolds W s(γ) and Wu(γ) that intersect each other transverselyalong γ and satisfy:

• every trajectory in W s(γ) is exponentially asymptotic to γ in the future;

• every trajectory in Wu(γ) is exponentially asymptotic to γ in the past;

(see Figure 4.5), with exponential convergence rates that are uniform, that is,independent of γ. Moreover, the geodesic flow is transitive. The second mainstep in the proof of Theorem 4.4.10 consists of showing that every transitive,uniformly hyperbolic flow (or transitive Anosov flow) of class C2 that preservesvolume is ergodic. We will comment on this last issue in a little while.

γ

Wu(γ)

W s(γ)

Figure 4.5: Hyperbolic behavior

There exists a corresponding notion for discrete time systems: we say thata diffeomorphism f : N → N on a compact Riemannian manifold is uniformly


hyperbolic (or an Anosov diffeomorphism) if the tangent space to the manifoldat every point z ∈ N admits a direct sum decomposition TzN = Esz ⊕ Euz suchthat the decomposition is invariant under the derivative of f :

Df(z)Esz = Esf(z) and Df(z)Euz = Euf(z) for every z ∈ N , (4.4.11)

and the derivative contracts Esz and expands Euz , uniformly:

supz∈N

‖Df(z) | Esz‖ < 1 and supz∈N

‖Df(z)−1 | Euz ‖ < 1 (4.4.12)

(for some choice of a norm compatible with the Riemannian metric on M).One can prove that for each z ∈ N the set W s(z) of points whose forward

trajectory is asymptotic to the trajectory of z is a differentiable (immersed)submanifold of N tangent to Esz at the point z; analogously, the set W s(z)of points whose backward trajectory is asymptotic to the trajectory of z is adifferentiable submanifold tangent to Euz at the point z. These submanifoldsform foliations (that is, decompositions of N into differentiable submanifolds)that are invariant under the diffeomorphism:

f(W s(z)) = W s(f(z)) and f(Wu(z)) = Wu(f(z)) for every z ∈ N .

We call W s(z) the stable manifold (or stable leaf ) and Wu(z) the unstablemanifold (or unstable leaf ) of the point z ∈M .

Concerning the second part of the proof of Theorem 4.4.10, the crucial tech-nical tool to prove that every transitive, uniformly hyperbolic diffeomorphismof class C2 that preserves volume is ergodic is the following theorem of Anosov,Sinai [AS67]:

Theorem 4.4.11 (Absolute continuity). The stable and unstable foliations ofany Anosov diffeomorphism (or flow) of class C2 are absolutely continuous:

1. if X ⊂ N has zero volume then X ∩W s(x) has volume zero inside W s(x)for almost every x ∈ N ;

2. if Y ⊂ Σ is a zero volume subset of some submanifold Σ transverse to thestable foliation, then the union of the stable manifolds through the pointsof Y has zero volume in N ;

and analogously for the unstable foliation.

Ergodicity of the system may then be deduced using the Hopf argument, thatwe introduced in a special case in Section 4.2.6. Let us explain this. Given anycontinuous function ϕ : N → R, let Eϕ be the set of all points z ∈ N for whichthe forward and backward time averages, ϕ+(z) and ϕ−(z), are well definedand coincide. This set Eϕ has full volume, as we have seen in Corollary 3.2.8.Observe also that ϕ+ is constant on each stable manifold and ϕ− is constant oneach unstable manifold. So, by the first part of Theorem 4.4.11, the intersectionYz = Wu(z) ∩ Eϕ has full volume in Wu(z) for almost every z ∈ N . Moreover,


ϕ− = ϕ+ is constant on each Yz . Fix any such z. The transitivity hypothesisimplies that the union of all stable manifolds through the points of Wu(z) is thewhole ambient manifold N . Hence, using the second part of Theorem 4.4.11,the union of the stable manifolds through the points of Yz has full volume in N .Clearly, ϕ+ is constant on this union. This shows that the time average of everycontinuous function ϕ is constant on a full measure set. Hence, f is ergodic.

We close this section by observing that all the known examples of Anosovdiffeomorphisms are transitive. The corresponding statement for Anosov flowsis false (see Verjovsky [Ver99]). Another open problem in this setting is whetherergodicity still holds when the Anosov system is only of class C1. It is known(see [Bow75b, RY80]) that in this case the absolute continuity theorem (Theo-rem 4.4.11) is false, in general.

4.4.6 Billiards

As we have seen in Sections 4.4.2 and 4.4.3, non-ergodic systems are quitecommon in the realm of Hamiltonian flows and symplectic transformations.However, this fact alone is not sufficient to invalidate the ergodic hypothesis ofBoltzmann in the context where it was formulated. Indeed, ideal gases are aspecial class of systems and it is conceivable that ergodicity could be typicalin this more restricted setting, even it is not typical for general Hamiltoniansystems.

In the 1960’s, the Russian mathematician and theoretical physicist YakovSinai [Sin63] conjectured that Hamiltonian systems formed by spherical hardballs that hit each other elastically are ergodic. Hard ball systems (see Exam-ple 4.4.13 below for a precise definition) had been proposed as a model for thebehavior of ideal gases by the American scientist Josiah Willard Gibbs who,together with Boltzmann and Scottish mathematician and theoretical physicistJames Clark Maxwell, created the area of Statistical Mechanics. The ergodichypothesis of Boltzmann-Sinai, as Sinai’s conjecture is often referred to, is themain topic in the present section.

In fact, we are going to discuss the problem of ergodicity for somewhatmore general systems, called billiards, whose formal definition was first given byBirkhoff, in the 1930’s.

In its simplest form, a billiard is given by a bounded connected domainΩ ⊂ R2, called the billiard table, whose boundary ∂Ω is formed by a finitenumber of differentiable curve. We call corners those points of the boundarywhere it fails to be differentiable; by hypothesis, they constitute a finite setC ⊂ ∂Ω. One considers a point particle moving uniformly along straight linesinside Ω, with elastic reflections on the boundary. That is, whenever the particlehits ∂Ω\C it is reflected in such a way that the angle of incidence equals the angleof reflection. When the particle hits some corner it is absorbed: its trajectoryis not defined from then on.

Let us denote by n the unit vector field orthogonal to the boundary ∂Ωand pointing to the inside of Ω. It defines an orientation in ∂Ω \ C: a vectort tangent to the boundary is positive if the basis t,n of R2 is positive. It is


clear that the motion of the particle is characterized completely by the sequenceof collisions with the boundary. Moreover, each such collision may be describedby the position s ∈ ∂Ω and the angle of reflection θ ∈ (−π/2, π/2). Therefore,the evolution of the billiard is governed by the transformation

f : (∂Ω \ C) × (−π/2, π/2) → ∂Ω × (−π/2, π/2), (4.4.13)

that associates with each collision (s, θ) the subsequent one (s′, θ′). See Fig-ure 4.6.

s′

sθ

θ′s s′

θ

θ

∂Ω ∂Ω

Figure 4.6: Dynamics of billiards

In the example on the left-hand side of Figure 4.6 the billiard table is a poly-gon, that is, the boundary consists of a finite number of straight line segments.The one trajectory represented in the figure hits one of the corners. Nearbytrajectories, to either side, collide with distinct boundary segments, with verydifferent angles of incidence. In particular, it is clear that the billiard trans-formation (4.4.13) can not be continuous. Discontinuities may occur even inthe absence of corners. For example, on the right-hand side of Figure 4.6 theboundary has four connected components, all of which are differentiable curves.Consider the trajectory represented in the figure, tangent to one of the bound-ary components. Nearby trajectories, to either side, hit with different boundarycomponents. Consequently, the billiard map is discontinuous also in this case.

Example 4.4.12. (Circular billiard table) On the left-hand side of Figure 4.7we represent a billiard in the unit ball Ω ⊂ R2. The corresponding billiardtransformation is given by

f : (s, θ) 7→ (s− (π − 2θ), θ).

The behavior of this transformation is described geometrically on the right-hand side of Figure 4.7. Observe that f preserves the area measure ds dθ andsatisfies the twist condition (4.4.4). Note also that f is integrable (in the senseof Section 4.4.2) and, in particular, the area measure is not ergodic. We willsee in a while (Theorem 4.4.14) that every planar billiard preserves a naturalmeasure equivalent to the area measure on ∂Ω × (−π/2, π/2). Then, using theprevious observations, the KAM theory permits to prove that billiards withalmost circular tables are not ergodic with respect to that invariant measure.


s

s′

θ

θ′

π − 2θ

θ = π/2

θ = −π/2

Figure 4.7: Billiard on a circular table

The definition of billiard extends immediately to bounded connected do-mains Ω in any Euclidean space Rd, d ≥ 1, whose boundary consists of a finitenumber of differentiable hypersurfaces intersecting each other along submani-folds with codimension larger than 1. We denote by C the union of the submani-folds. As before, we endow ∂Ω with the orientation induced by the unit vector n

orthogonal to the boundary and pointing to the “inside” of Ω. Elastic reflectionson the boundary are defined by the following two conditions: (i) the incidenttrajectory segment, the reflected trajectory segment and the orthogonal vectorn are co-planar and (ii) the angle of incidence equals the angle of reflection.The billiard transformation is defined as in (4.4.13), having as domain

(s, v) ∈ (∂Ω \ C) × Sd−1 : v · n(s) > 0.Even more generally, we may take as billiard table any bounded connected

domain in a Riemannian surface, whose boundary is formed by a finite number ofdifferentiable hypersurfaces intersecting along higher codimension submanifolds.The definitions are analogous, except that the trajectories between consecutivereflections on the boundary are given by segments of geodesics and angles aremeasured according to the Riemannian metric on the manifolds.

Example 4.4.13 (Ideal gases and billiards). Ideally, a gas is formed by a largenumber N of molecules (N ≈ 1027) that move uniformly along straight lines,between collisions, and collide with each other elastically. Check the right-handside of Figure 4.8. For simplicity, let us assume that the molecules are identicalspheres and that they are contained in the torus4 of dimension d ≥ 2. Let usalso assume that all the molecules move with constant unit speed. This systemcan be modelled by a billiard, as follows.

For 1 ≤ i ≤ N , denote by pi ∈ Td the position of the center of the i-thmolecule Mi. Let ρ > 0 be the radius of each molecule. Then, each state of thesystem is entirely described by a value of p = (p1, . . . , pN ) in the set

Ω = p = (p1, . . . , pN ) ∈ TNd : ‖pi − pj‖ ≥ 2ρ for every i 6= j4One may replace the torus Td by a more plausible container, such as the d-dimensional

cube [0, 1]d, for example. However, the analysis is a bit more complicated in that case, becausewe must take into account the collisions of the balls with the container’s walls.


Td

vjvi vj

v′j

Rij

vi

v′i

Figure 4.8: Model for an ideal gas

(this set is connected, as long as the radius ρ is sufficiently small).In the absence of collisions, the point p moves along a straight line inside Ω,

with constant speed. When two molecules Mi and Mj collide, ‖pi − pj‖ = 2ρand the velocity vectors change in the following way. Let vi and vj be thevelocity vectors of the two molecules immediately before the collision and letRij be the straight line through pi and pj . The elasticity hypothesis means thatthe velocity vectors v′i and v′j immediately after the collision are given by (checkthe right-hand side of Figure 4.8):

(i) the components of vi and v′i in the direction of Rij are symmetric and thesame is true for vj and v′j ;

(ii) the components of vi and v′i in the direction orthogonal to Rij are equaland the same is true for vj and v′j .

This means, precisely, that the point p undergoes elastic reflection on the hy-persurface p ∈ ∂Ω : ‖pi − pj‖ = 2ρ of the boundary of Ω (see Exercise 4.4.4).Therefore, the motion of the point p corresponds exactly to the evolution of thebilliard in the table Ω.

The next result places billiards well inside the domain of interest of ErgodicTheory. Let ds be the volume measure induced on the boundary ∂Ω by theRiemannian metric of the ambient manifold; in the planar case (that is, whenΩ ⊂ R2), ds is just the arc-length. Denote by dθ the angle measure on eachhemisphere v ∈ Sd−1 : v · n(s) > 0.

Theorem 4.4.14. The transformation f preserves the measure ν = cos θds dθon the domain (s, v) ∈ ∂Ω × Sd−1 : v · n(s) > 0.

In what follows we sketch the proof for planar billiards. The reader shouldhave no trouble checking that all the arguments extend naturally to arbitrarydimension.

Consider any family of trajectories starting from a given boundary point (thismeans that s is fixed), as represented on the left-hand side of Figure 4.9. Let


s

s′

dθdt

s

s′

dh

Figure 4.9: Calculating the derivative of the billiard map

this family be parameterized by the angle of reflection θ. Denote by ℓ(s, s′) thelength of the line segment connecting s to s′. Then ℓ(s, s′)dθ = dh = cos θ′ds′

and, thus,∂s′

∂θ=ℓ(s, s′)

cos θ′.

To calculate the derivative of θ′ with respect to θ, observe that the variation ofθ′ is the sum of two components: the first one corresponds to the variation of θ,whereas the second one arises from the variation of the normal vector n(s′) as thecollision point s′ varies. By the definition of curvature, this second componentis equal to κ(s′)ds′. It follows that dθ′ = dθ + κ(s′)ds′ and, consequently,

∂θ′

∂θ= 1 + κ(s′)

∂s′

∂θ= 1 + κ(s′)

ℓ(s, s′)

cos θ′.

This can be summarized as follows:

Df(s, θ)∂

∂θ=ℓ(s, s′)

cos θ′∂

∂s′+(1 + κ(s′)

ℓ(s, s′)

cos θ′) ∂∂θ′

(4.4.14)

Next, consider any family of parallel trajectories, as represented on the right-hand side of Figure 4.9. Let this family be parameterized by the arc-length tin the direction orthogonal to the trajectories. The variations of s and s′ alongthis family are given by − cos θds = dt = cos θ′ds′. Since the trajectories allhave the same direction, the variations of the angles θ and θ′ arise, solely, fromthe variations of the normal vectors n(s) and n(s′) as s and s′ vary. That is,dθ = κ(s)ds and dθ′ = κ(s′)ds′. Therefore,

Df(s, θ)(− 1

cos θ

∂

∂s− κ(s)

cos θ

∂

∂θ

)=

1

cos θ′∂

∂s′+κ(s′)

cos θ′∂

∂θ′. (4.4.15)

Let J(s, θ) be the matrix of the derivative Df(s, θ) with respect to the bases∂/∂s, ∂/∂θ and ∂/∂s′, ∂/∂θ′. The relations (4.4.14) and (4.4.15) imply that

detJ(s, θ) =

∣∣∣∣∣ℓ(s,s′)cos θ′

1cos θ′

1 + κ(s′) ℓ(s,s′)

cos θ′κ(s′)cos θ′

∣∣∣∣∣∣∣∣∣

0 − 1cos θ

1 − κ(s)cos θ

∣∣∣∣=

cos θ

cos θ′. (4.4.16)


So, by change of variables,∫ϕdν =

∫ϕ(s′, θ′) cos θ′ ds′ dθ′ =

∫ϕ(f(s, θ)) cos θ′

cos θ

cos θ′ds dθ

=

∫ϕ(f(s, θ)) cos θ ds dθ =

∫(ϕ f) dν

for every bounded measurable function ϕ. This proves that f preserves themeasure ν = cos θdsdθ, as we stated.

We call a billiard dispersing if the boundary of the billiard table is strictlyconvex at every point, when viewed from the inside. In the planar case, withthe orientation conventions that we adopted, this means that the curvature κis negative at every point. Figure 4.10 presents two examples. In the firstone, Ω ⊂ R2 and the boundary is a connected set formed by the union of fivedifferentiable curves. In the second example, Ω ⊂ T2 and the boundary hasthree connected components, all of which are differentiable and convex.

T2

∂Ω

Figure 4.10: Dispersive billiards

The class of dispersing billiards was introduced by Sinai in his 1970 arti-cle [Sin70]. The denomination “dispersing” refers to the fact that in such bil-liards any (thin) beam of parallel trajectories becomes divergent upon reflectionon the boundary, as illustrated on the left hand side of Figure 4.10. Sinai ob-served that dispersing billiards are hyperbolic systems, in a non-uniform sense:invariant sub-bundles Esz and Euz as in (4.4.11) exist at almost every pointand, instead of (4.4.12), we have that the derivative is contracting along Eszand expanding along Euz asymptotically, that is, for sufficiently large iterates(depending on the point z).

The billiards associated with ideal gases (Example 4.4.13) with N = 2molecules are dispersing: it is easy to see that (p1, p2) ∈ R2d : ‖p1 − p2‖ = 2ρis a convex hypersurface. Consequently, these billiards are hyperbolic, in thesense of the previous paragraph. Using a subtle version of the Hopf argument,Sinai proved in [Sin70] that such billiards are ergodic, at least when d = 2.This was later extended to arbitrary dimension d ≥ 2 by Sinai and his studentNikolai Chernov [SC87], still in the case N = 2. Thus, dispersing billiards werethe first class of billiards for which ergodicity was proven rigorously.


The case N ≥ 3 of the Boltzmann-Sinai ergodic hypothesis is a lot moredifficult because the corresponding billiards are not dispersing: the hypersurface

(p1, p2, . . . , pN) ∈ RNd : ‖p1 − p2‖ = 2ρ

has cylinder geometry, with zero curvature along the direction of the variablespi, i > 2. Such billiards are called semi-dispersing. Most results in this settingare due to Hungarian mathematicians Andras Kramli, Nandor Simanyi andDomoko Szasz. In [KSS91, KSS92] they proved hyperbolicity and ergodicity forN = 3 and also for N = 4 assuming that d ≥ 3. Later, Simanyi [Sim02] provedhyperbolicity for the general case: any number of spheres, in any dimension.The problem of ergodicity remains open, in general, although there are manyother partial results.

Figure 4.11: Bunimovich stadium and mushroom

There are now several known examples of ergodic billiards that are not dis-persing. This even includes some billiards whose boundary curvature is non-negative at every point. The best known example is the Bunimovich stadium,whose boundary is formed by two semi-circles and two straight line segments.See Figure 4.11. This billiard is hyperbolic, but this property arises from a dif-ferent mechanism, called defocusing: a beam of parallel trajectories reflectingon a concave segment of the billiard table wall starts by focusing, but then getsdispersed. Another interesting example is the Bunimovich mushroom, hyper-bolic behavior and elliptic behavior coexist, on disjoint invariant sets both withpositive measure.

4.4.7 Exercises

4.4.1. We say that ω ∈ Rd is τ -Diophantine if it is (c, τ)-Diophantine, that is,if it satisfies (4.4.5), for some c > 0. Prove that the set of τ -Diophantine vectorsis non-empty if and only if τ ≥ d − 1. Moreover, that set has full Lebesguemeasure in Rd whenever τ is strictly larger than d− 1.

4.4.2. Consider a billiard on a rectangular table. Check that every trajectorythat does not hit any corner either is periodic or is dense in the billiard table.

4.4.3. Show that every billiard on an acute triangle exhibits some periodictrajectory. [Observation: the same is true for right triangles, but the problemis open for obtuse triangles.]


4.4.4. Consider the billiard model for ideal gases in Example 4.4.13. Check thatelastic collisions between any two molecules correspond the elastic reflections ofthe billiard point particle on the boundary of Ω.

4.4.5. Prove Theorem 4.4.9 under the additional hypothesis that the functionρ 7→ Θ(θ, ρ) is monotone (increasing or decreasing) for every θ ∈ R.

4.4.6. Consider the context of Theorem 4.4.9 but, instead of (4.4.10), assumethat f rotates the two boundary components of A with different velocities: thereexists some lift F : R × [a, b] → R × [a, b] and there exist p, q ∈ Z with q ≥ 1,such that, denoting F q = (Θq, Rq),

[Θq(θ, a) − p− θ

][Θq(θ, b) − p− θ

]< 0 for every θ ∈ R. (4.4.17)

Show that f has two periodic orbits with period q in the interior of A, at least.

4.4.7. Let Ω be a convex domain in the plane whose boundary ∂Ω is a differ-entiable curve. Show that the billiard on Ω has infinitely many periodic orbits.


Chapter 5

Ergodic decomposition

For convex subsets of vector spaces with finite dimension, it is clear that ev-ery element of the convex set may be written as a convex combination of theextremal elements. For example, every point in a triangle may be written asa convex combination of the vertices of the triangle. In view of the results inSection 4.3, it is natural to ask whether a similar property holds in the spaceof invariant probability measures, that is, whether every invariant measure is aconvex combination of ergodic measures.

The ergodic decomposition theorem, which we prove in this chapter (Theo-rem 5.1.3), asserts that the answer is positive, except that the number of “terms”in this combination is not necessarily finite, not even countable. This theoremhas several important applications; in particular, it permits to reduce the proofof many results to the case when the system is ergodic.

We are going to deduce the ergodic decomposition theorem from anotherimportant result from Measure Theory, the Rokhlin disintegration theorem.The simplest instance of this theorem holds when we have a partition of aprobability space (M,µ) into finitely many measurable subsets P1, . . . , PN withpositive measure. Then, obviously, we may write µ as a linear combination

µ = µ(P1)µ1 + · · · + µ(PN )µN

of its normalized restrictions µi(E) = µ(E ∩ Pi)/µ(Pi) to each of the partitionelements. The Rokhlin disintegration theorem (Theorem 5.1.11) states that thistype of disintegration of the probability measure is possible for any partitionP (possibly uncountable!) that can be obtained as the limit of an increasingsequence of finite partitions.

5.1 Ergodic decomposition theorem

Before stating the ergodic decomposition theorem, let us analyze a couple ofexamples that help motivate and clarify its content:

147

148 CHAPTER 5. ERGODIC DECOMPOSITION

Example 5.1.1. Let f : [0, 1] → [0, 1] be given by f(x) = x2. The Diracmeasures δ0 and δ1 are invariant and ergodic for f . It is also clear that x = 0and x = 1 are the unique recurrent points for f and so every invariant probabilitymeasure µmust satisfy µ(0, 1) = 1. Then, µ = µ(0)δ0+µ(1)δ1 is a (finite)convex combination of the ergodic measures.

Example 5.1.2. Let f : T2 → T2 be given by f(x, y) = (x+y, y). The Lebesguemeasure m on the torus is preserved by f . Observe that every horizontal circleHy = S1 × y is invariant under f and the restriction f : Hy → Hy is therotation Ry. Let my be the Lebesgue measure on Hy. Observe that my is alsoinvariant under f . Moreover, my is ergodic whenever y is irrational. On theother hand, by the Fubini theorem,

m(E) =

∫my(E) dy for every measurable set E. (5.1.1)

The identity is not affected if we consider the integral restricted to the subsetof irrational values of y. Then (5.1.1) presents m as an (uncountable) convexcombination of ergodic measures.

5.1.1 Statement of the theorem

Let us start by introducing some useful terminology. In what follows, (M,B, µ)is a probability space and P is a partition of M into measurable subsets. Wedenote by π : M → P the canonical projection that assigns to each point x ∈Mthe element P(x) of the partition that contains it. This projection map permitsto endow P with the structure of a probability space, as follows. Firstly, bydefinition, a subset Q of P is measurable if and only if its pre-image

π−1(Q) = union of all P ∈ P that belong to Q

is a measurable subset of M . It is easy to check that this definition is consistent:the family B of measurable subsets is a σ-algebra in P . Then, we define thequotient measure µ by

µ(Q) = µ(π−1(Q)) for every Q ∈ B.

Theorem 5.1.3 (Ergodic decomposition). Let M be a complete separable met-ric space, f : M → M be a measurable transformation and µ be an invariantprobability measure. Then there exist a measurable set M0 ⊂M with µ(M0) = 1,a partition P of M0 into measurable subsets and a family µP : P ∈ P of prob-ability measures on M , satisfying

(a) µP (P ) = 1 for µ-almost every P ∈ P;

(b) P 7→ µP (E) is measurable, for every measurable set E ⊂M ;

(c) µP is invariant and ergodic for µ-almost every P ∈ P;

5.1. ERGODIC DECOMPOSITION THEOREM 149

(d) µ(E) =∫µP (E) dµ(P ), for every measurable set E ⊂M .

Part (d) of the theorem means that µ is a convex combination of the ergodicprobability measures µP , where the “weight” of each µP is determined by theprobability measure µ. Part (b) ensures that the integral in (d) is well defined.Moreover (see Exercise 5.1.3), it implies that the map P → M1(M) given byP 7→ µP is measurable.

5.1.2 Disintegration of a measure

We are going to deduce Theorem 5.1.3 from an important result in MeasureTheory, the Rokhlin disintegration theorem, which has many other applications.To state this theorem we need the following notion.

Definition 5.1.4. A disintegration of µ with respect to a partition P is a familyµP : P ∈ P of probability measures on M such that, for every measurable setE ⊂M :

(a) µP (P ) = 1 for µ-almost every P ∈ P ;

(b) the map P → R, defined by P 7→ µP (E) is measurable;

(c) µ(E) =∫µP (E) dµ(P ).

Recall that the partition P inherits fromM a natural structure of probabilityspace, with a σ-algebra B and a probability measure µ. The measures µP arecalled conditional probabilities of µ with respect to P .

Example 5.1.5. Let P = P1, . . . , Pn be a finite partition of M into measur-able subsets with µ(Pi) > 0 for every i. The quotient measure µ is given byµ(Pi) = µ(Pi). Consider the normalized restriction µi of µ to each Pi:

µi(E) =µ(E ∩ Pi)µ(Pi)

for every measurable set E ⊂M.

Then µ1, . . . , µn is a disintegration of µ with respect to P : it is clear thatµ(E) =

∑ni=1 µ(Pi)µi(E) for every measurable set E ⊂M .

This constructions extends immediately to countable partitions. In the nextexample we treat an uncountable case:

Example 5.1.6. Let M = T2 and P be the partition of M into horizontalcircles S1 × y, y ∈ S1. Let m be the Lebesgue measure on T2 and m be theLebesgue measure on S1. Denote by my the Lebesgue measure (arc-length) oneach horizontal circle S1 × y. By the Fubini theorem,

m(E) =

∫my(E) dm(y) for every measurable set E ⊂ T2.

Hence, my : y ∈ S1 is a disintegration of m with respect to P .


The next proposition asserts that disintegrations are essentially unique, whenthey exist. The hypothesis is very general: it holds, for example, if M is atopological space with a countable basis of open sets and B is the Borel σ-algebra:

Proposition 5.1.7. Assume that the σ-algebra B admits some countable gen-erator. If µP : P ∈ P and µ′

P : P ∈ P are disintegrations of µ with respectto P, then µP = µ′

P for µ-almost every P ∈ P.

Proof. Let Γ be a countable generator of the σ-algebra B and A be the algebragenerated by Γ. Note that A is countable, since it coincides with the unionof the (finite) algebras generated by the finite subsets of Γ. For each A ∈ A,consider the sets

QA = P ∈ P : µP (A) > µ′P (A) and RA = P ∈ P : µP (A) < µ′

P (A).

If P ∈ QA then P is contained in π−1(QA) and, using property (a) in thedefinition of disintegration, µP (A∩π−1(QA)) = µP (A). Otherwise, P is disjointfrom π−1(QA) and, hence, µP (A ∩ π−1(QA)) = 0. Moreover, these conclusionsremain valid when one takes µ′

P in the place of µP . Hence, using property (c)in the definition of disintegration,

µ(A ∩ π−1(QA)) =

∫P µP (A ∩ π−1(QA)) dµ(P ) =

∫QA

µP (A) dµ(P )

∫Pµ′P (A ∩ π−1(QA)) dµ(P ) =

∫QA

µ′P (A) dµ(P ).

Since µP (A) > µ′P (A) for every P ∈ QA, this implies that µ(QA) = 0 for every

A ∈ A. A similar argument shows that µ(RA) = 0 for every A ∈ A. So,

⋃

A∈A

QA ∪RA

is also a subset of P with measure zero. For every P in the complement ofthis subset, the measures µP and µ′

P coincide on the generating algebra A and,consequently, they coincide on the whole σ-algebra B.

On the other hand, disintegrations may fail to exist:

Example 5.1.8. Let f : S1 → S1 be an irrational rotation and P be the par-tition of S1 whose elements are the orbits fn(x) : n ∈ Z of f . Assume thatthere exists a disintegration µP : P ∈ P of the Lebesgue measure µ withrespect to P . Consider the iterates f∗µP : P ∈ P of the conditional probabil-ities. Since the partition elements are invariant sets, f∗µP (P ) = µP (P ) = 1 forµ-almost every P . It is clear that, given any measurable set E ⊂M ,

P 7→ f∗µP (E) = µP (f−1(E))

is a measurable function. Moreover, since µ is an invariant measure,

µ(E) = µ(f−1(E)) =

∫µP (f−1(E)) dµ(P ) =

∫f∗µP (E) dµ(P ).


These observations show that f∗µP : P ∈ P is a disintegration of µ withrespect to the partition P . By uniqueness (Proposition 5.1.7), it follows thatf∗µP = µP for µ-almost every P . That is, almost every conditional probabilityµP is invariant. That is a contradiction, because P = fn(x) : n ∈ Z isan infinite countable set and so there can be no invariant probability measuregiving P a positive weight.

The theorem of Rokhlin states that disintegrations always exist if the par-tition P is the limit of an increasing sequence of countable partitions and thespace M is reasonably well behaved. The precise statement is given in thesection that follows.

5.1.3 Measurable partitions

We say that P is a measurable partition if there exists some measurable setM0 ⊂M with full measure such that, restricted to M0,

P =∞∨

n=1

Pn

for some increasing sequence P1 ≺ P2 ≺ · · · ≺ Pn ≺ · · · of countable partitions(see also Exercise 5.1.1). By Pi ≺ Pi+1 we mean that every element of Pi+1 iscontained in some element of Pi or, equivalently, every element of Pi coincideswith a union of elements of Pi+1. Then we say that Pi is coarser than Pi+1 or,equivalently, that Pi+1 is finer than Pi.

Represent by ∨∞n=1Pn the partition whose elements are the non-empty in-

tersections of the form ∩∞n=1Pn with Pn ∈ Pn for every n. Equivalently, this is

the coarser partition such that

Pn ≺∞∨

n=1

Pn for every n.

It follows immediately from the definition that every countable partition ismeasurable. It is easy to find examples of uncountable measurable partitions:

Example 5.1.9. Let M = T2, endowed with the Lebesgue measure m, and letP be the partition of M into horizontal circles S1×y. Then P is a measurablepartition. To see that, consider

Pn = S1 × I(i, n) : i = 1, . . . , 2n,

where I(i, n), 1 ≤ i ≤ 2n is the segment of S1 = R/Z corresponding to theinterval [(i − 1)/2n, i/2n) ⊂ R. The sequence (Pn)n is increasing and P =∨∞n=1Pn.

On the other hand, not all partitions are measurable:


Example 5.1.10. Let f : M → M be a measurable transformation and µ bean ergodic probability measure. Let P be the partition of M whose elementsare the orbits of f . Then P is not measurable, unless f exhibits an orbit withfull measure. Indeed, suppose that there exists a non-decreasing sequence P1 ≺P2 ≺ · · · ≺ Pn ≺ · · · of countable partitions such that P = ∨∞

n=1Pn restrictedto some full measure subset. This last condition implies that almost every orbitof f is contained in some element Pn of the partition Pn. In other words, up tomeasure zero, every element of Pn is invariant under f . By ergodicity, it followsthat for every n there exists exactly one Pn ∈ Pn such that µ(Pn) = 1. DenoteP = ∩∞

n=1Pn. Then P is an element of the partition ∨∞n=1Pn = P , that is, P is

an orbit of f , and it has µ(P ) = 1.

Theorem 5.1.11 (Rokhlin disintegration). Assume that M is a complete sepa-rable metric space and P is a measurable partition. Then the probability measureµ admits some disintegration with respect to P.

Theorem 5.1.11 is proven in Section 5.2. The hypothesis that P is measurableis, actually, also necessary for the conclusion of the theorem (see Exercise 5.2.2).

5.1.4 Proof of the ergodic decomposition theorem

At this point we are going to use Theorem 5.1.11 to prove the ergodic decom-position theorem. Let U be a countable basis of open sets of M and A be thealgebra generated by U . Note that A is countable and that it generates theBorel σ-algebra of M . By the ergodic theorem of Birkhoff, for every A ∈ Athere exists a set MA ⊂ M with µ(MA) = 1 such that the mean sojourn timeτ(A, x) is well defined for every x ∈ MA. Let M0 = ∩A∈AMA. Note thatµ(M0) = 1, since the intersection is countable.

Now consider the partition P of M0 defined as follows: two points x, y ∈M0

are in the same element of P if and only if τ(A, x) = τ(A, y) for every A ∈ A.We claim that this partition is measurable. To prove that it is so, consider anyenumeration Ak : k ∈ N of the elements of the algebra A and let qk : k ∈ Nbe an enumeration of the rational numbers. For each n ∈ N, consider thepartition Pn of M0 defined as follows: two points x, y ∈ M0 are in the sameelement of Pn if and only if, given any i, j ∈ 1, . . . , n,

either τ(Ai, x) ≤ qj and τ(Ai, y) ≤ qj

or τ(Ai, x) > qj and τ(Ai, y) > qj .

It is clear that every Pn is a finite partition, with no more than 2n2

elements.It follows immediately from the definition that x and y are in the same elementof ∨∞

n=1Pn if and only if τ(Ai, x) = τ(Ai, y) for every i. This means that

P =

∞∨

n=1

Pn,

which implies our claim.


So, by Theorem 5.1.11, there exists some disintegration µP : P ∈ P ofµ with respect to P . Parts (a), (b) and (d) of Theorem 5.1.3 are contained inthe definition of disintegration. To prove part (c) it suffices to show that µP isinvariant and ergodic for µ-almost every P , which is what we do in the sequel.

Consider the family of probability measures f∗µP : P ∈ P. Observe thatevery P ∈ P is an invariant set, since mean sojourn times are constant on orbits.It follows that

f∗µP (P ) = µP (f−1(P )) = µP (P ) = 1.

Moreover, given any measurable set E ⊂M , the function

P 7→ f∗µP (E) = µP (f−1(E))

is measurable and, using the fact that µ is invariant under f ,

µ(E) = µ(f−1(E)) =

∫µP (f−1(E)) dµ(P ) =

∫f∗µP (E) dµ(P ).

This shows that f∗µP : P ∈ P is a disintegration of µ with respect to P . Byuniqueness (Proposition 5.1.7), it follows that f∗µP = µP for almost every P .

We are left to prove that µP is ergodic for almost every P . Since µ(M0) = 1,we have that µP (M0 ∩ P ) = 1 for almost every P . Hence, it is enough to provethat, given any P ∈ P and any measurable set E ⊂M , the mean sojourn timeτ(E, x) is well defined for every x ∈ M0 ∩ P and is constant on that set. FixP and denote by C the class of all measurable sets E for which this holds. Byconstruction, C contains the generating algebra A. Observe that if E1, E2 ∈ Cwith E1 ⊃ E2 then E1 \ E2 ∈ C:

τ(E1 \ E2, x) = τ(E1, x) − τ(E2, x)

is well defined and it is constant on M0 ∩ P . In particular, if E ∈ C then Ec isalso in C. Analogously, C is closed under countable pairwise disjoint unions: ifEj ∈ C are pairwise disjoint then

τ(∪j Ej , x) =

∑

j

τ(Ej , x)

is well defined and it is constant on M0 ∩ P . It is easy to deduce that C is amonotone class: given any sequences An, Bn ∈ C, n ≥ 1 with An ⊂ An+1 andBn ⊃ Bn+1 for every n, the two previous observations yield

∞⋃

n=1

An = A1 ∪∞⋃

n=1

(An+1 \An) ∈ C and

∞⋂

n=1

Bn =( ∞⋃

n=1

Bcn)c ∈ C.

By Theorem A.1.18, it follows that C contains the Borel σ-algebra of M .

This concludes the proof of Theorem 5.1.3 from Theorem 5.1.11.


5.1.5 Exercises

5.1.1. Show that a partition P is measurable if and only if there exist mea-surable subsets M0, E1, E2, . . . , En, . . . such that µ(M0) = 1 and, restricted toM0,

P =

∞∨

n=1

En,M \ En.

5.1.2. Let µ be an ergodic probability measure for a transformation f . Then µis also invariant under fk for any k ≥ 2. Describe the ergodic decomposition ofµ for the iterate fk.

5.1.3. Let M be a metric space and X be a measurable space. Prove that thefollowing conditions are all equivalent:

(a) the map ν : X → M1(M), x 7→ νx is measurable;

(b) the mapX → R, x 7→∫ϕdνx is measurable, for every bounded continuous

function ϕ : M → R;

(c) the map X → R, x 7→∫ψ dνx is measurable, for every bounded measur-

able function ψ : M → R;

(d) the map X → R, x 7→ νx(E) is measurable, for every measurable setE ⊂M .

5.1.4. Prove that if µP : P ∈ P is a disintegration of µ then

∫ψ dµ =

∫ ( ∫ψ dµP

)dµ(P )

for every bounded measurable function ψ : M → R.

5.1.5. Let µ be a probability measure invariant under a measurable transfor-mation f : M →M . Let f : M → M be the natural extension of f and µ be thelift of µ (Section 2.4.2). Relate the ergodic decomposition of µ to the ergodicdecomposition of µ.

5.1.6. When M is a compact metric space, we may obtain the ergodic decom-position of an invariant probability measure µ by taking for M0 the subset ofpoints x ∈M such that

µx = limn

1

n

n−1∑

j=0

δfj(x)

exists in the weak∗ topology and taking for P the partition of M0 defined byP(x) = P(y) ⇔ µx = µy. Check the details of this alternative proof of Theo-rem 5.1.3 for compact metric spaces.

5.2. ROKHLIN DISINTEGRATION THEOREM 155

5.1.7. Let σ : Σ → Σ be the shift map in Σ = 1, . . . , dZ. Consider thepartition Ws of Σ into “stable sets”

Ws((an)n) = (xn)n : xn = an for every n ≥ 0.

Given any probability measure µ invariant under σ, let µP : P ∈ P be anergodic decomposition of µ. Check that P ≺ Ws, restricted to a full measuresubset of M .

5.2 Rokhlin disintegration theorem

Now we prove Theorem 5.1.11. Fix any increasing P1 ≺ P2 ≺ · · · ≺ Pn ≺ · · · ofcountable partitions such that P = ∨∞

n=1Pn restricted to some full measure setM0 ⊂M . As before, we use Pn(x) to denote the element of Pn that contains agiven point x ∈M .

5.2.1 Conditional expectations

Let ψ : M → R be any bounded measurable function. For each n ≥ 1, defineen(ψ) : M → R as follows:

en(ψ, x) =

1

µ(Pn(x))

∫

Pn(x)

ψ dµ if µ(Pn(x)) > 0

0 otherwise.(5.2.1)

Since the partitions Pn are countable, the second case of the definition corre-sponds to a subset of points with total measure zero. Observe also that en(ψ) isconstant on each Pn ∈ Pn; let us denote by En(ψ, Pn) the value of that constant.Then,

∫ψ dµ =

∑

Pn

∫

Pn

ψ dµ =∑

Pn

µ(Pn)En(ψ, Pn) =

∫en(ψ) dµ (5.2.2)

for every n ∈ N (the sums involve only partition elements Pn ∈ Pn with positivemeasure).

Lemma 5.2.1. Given any bounded measurable function ψ : M → R, thereexists a subset Mψ of M with µ(Mψ) = 1 such that

(a) e(ψ, x) = limn en(ψ, x) exists for every x ∈Mψ.

(b) e(ψ) : Mψ → R is measurable and constant on each P ∈ P.

(c)∫ψ dµ =

∫e(ψ) dµ.

Proof. Initially, suppose that ψ ≥ 0. For each α < β, let S(α, β) be the set ofpoints x ∈M such that

lim infn

en(ψ, x) < α < β < lim supn

en(ψ, x).


It is clear that the sequence en(ψ, x) diverges if and only if x ∈ S(α, β) for somepair of rational numbers α < β. In other words, the limit e(ψ, x) exists if andonly if x belongs to the intersection Mψ of all S(α, β)c with rational α < β. Asthis is a countable intersection, in order to prove that µ(Mψ) = 1 it suffices toshow that µ(S(α, β)) = 0 for every α < β. We do this in the sequel.

Let α and β be fixed and denote S = S(α, β). Given x ∈ S, fix any sequenceof integer numbers 1 ≤ ax1 < bx1 < · · · < axi < bxi < · · · such that

eaxi(ψ, x) < α and ebx

i(ψ, x) > β for every i ≥ 1.

Define Ai to be the union of the partition elements Ai(x) = Paxi(x) and Bi to

be the union of the partition elements Bi(x) = Pbxi(x) obtained in this way, for

all points x ∈ S. By construction, S ⊂ Ai+1 ⊂ Bi ⊂ Ai for every i ≥ 1. Inparticular, S is contained in the set

S =∞⋂

i=1

Bi =∞⋂

i=1

Ai .

Since the sequence Pn, n ≥ 1, is monotone increasing, given any two of the setsAi(x) = Pax

i(x) that form Ai, either they are disjoint or one is contained in the

other. It follows that the maximal sets Ai(x) are pairwise disjoint and, hence,they constitute a partition of Ai. Hence, adding only over such maximal setswith positive measure,

∫

Ai

ψ dµ =∑

Ai(x)

∫

Ai(x)

ψ dµ ≤∑

Ai(x)

αµ(Ai(x)) = αµ(Ai),

for every i ≥ 1. Analogously,∫

Bi

ψ dµ =∑

Bi(x)

∫

Bi(x)

ψ dµ ≥∑

Bi(x)

βµ(Bi(x)) = βµ(Bi).

Since Ai ⊃ Bi and we are assuming that ψ ≥ 0, it follows that

αµ(Ai) ≥∫

Ai

ψ dµ ≥∫

Bi

ψ dµ ≥ βµ(Bi),

for every i ≥ 1. Taking the limit as i → ∞, we find that αµ(S) ≥ βµ(S). This

implies that µ(S) = 0 and, hence, µ(S) = 0. This proves the claim when ψ isnon-negative. The general case follows immediately, since we may always writeψ = ψ+ −ψ−, where ψ± are measurable, non-negative and bounded. Note thaten(ψ) = en(ψ

+) − en(ψ−) for every n ≥ 1 and, hence, the conclusion of the

lemma holds for ψ if it holds for ψ+ and ψ−. This ends the proof of claim (a).The other claims are simple consequences of the definition. The fact that

e(ψ) is measurable follows directly from Proposition A.1.31. Since Pn is coarserthan P , it is clear that en(ψ) is constant on each P ∈ P , restricted to a subsetof M with full measure. Hence, the same is true for e(ψ). This proves part (b).


Observe also that |en(ψ)| ≤ sup |ψ| for every n ≥ 1. Hence, we may use thedominated convergence theorem to pass to the limit in (5.2.2). In this way, weget part (c).

We are especially interested in the case when ψ is a characteristic function:ψ = XA for some measurable set A ⊂M . In this case, the definition means that

e(ψ, x) = limn

µ(Pn(x) ∩A)

µ(Pn(x)). (5.2.3)

We denote by PA the subset of elements P of the partition P that intersectMψ. Observe that µ(PA) = 1. Moreover, we define E(A) : PA → R by settingE(A,P ) = e(ψ, x) for any x ∈Mψ ∩ P . Note that e(ψ) = E(A) π. Hence, thefunction E(A) is measurable and satisfies:

∫ψ dµ =

∫e(ψ) dµ =

∫E(A) dµ. (5.2.4)

5.2.2 Criterion for σ-additivity

The hypothesis that the ambient space M is complete separable metric space isused in the proof of the important criterion for σ-additivity that we state andprove in the sequel:

Proposition 5.2.2. Let M be a complete separable metric space and A be analgebra generated by a countable basis U = Uk : k ∈ N of open sets of M .Let µ : A → [0, 1] be an additive function with µ(∅) = 0. Then µ extends to aprobability measure on the Borel σ-algebra of M .

First, let us outline the proof. We consider the product space Σ = 0, 1N,endowed with the topology generated by the cylinders

[0; a0, . . . , as] = (ik)k∈N : i0 = a0, . . . , is = as, s ≥ 0.

Note that Σ is compact (Exercise A.1.11). Using the fact that M is a completemetric space, we will show that the map

γ : M → Σ, γ(x) =(XUk

(x))k∈N

is a measurable embedding of M inside Σ. Moreover, the function µ yields anadditive function ν defined on the algebra AΣ generated by the cylinders of Σ.This algebra is compact (Definition A.1.15), since every element is compact.Hence, ν extends to a probability measure on the Borel σ-algebra of Σ; westill represent this extension by ν. We will show that the image γ(M) has fullmeasure for ν. Then, the image γ−1

∗ ν is a probability measure on the Borel σ-algebra ofM . Finally, we will check that this probability measure is an extensionof the function µ.

Now let us detail these arguments. In what follows, given any set A ⊂ M ,we denote A1 = A and A0 = Ac.


Lemma 5.2.3. The image γ(M) is a Borel subset of Σ.

Proof. Let x ∈M and (ik)k = γ(x). It is clear that

(A)⋂kj=0 U

ijj 6= ∅ for every k ∈ N,

since x is in the intersection. Moreover, since U is a basis of open sets of M ,

(B) there exists k ∈ N such that ik = 1 and diamUk ≤ 1 and

(C) for every k ∈ N such that ik = 1 there exists l(k) > k such that il(k) = 1and Ul(k) ⊂ Uk and diamUl(k) ≤ diamUk/2.

Conversely, suppose that (ik)k ∈ Σ satisfies conditions (A), (B) and (C). We aregoing to show that there exists x ∈M such that γ(x) = (ik)k. For that, define

Fn =

n⋂

k=0

Vk,

where Vk = U ck if ik = 0 and Vk = Ul(k) if ik = 1. Then (Fn)n is a decreasingsequence of closed sets. Condition (A) assures that Fn 6= ∅ for every n ≥ 1.Conditions (B) and (C) imply that the diameter of Fn converges to zero asn → ∞. Then, since M is a complete metric space, the intersection ∩nFncontains some point x. By construction, Fn is contained in ∩nk=0U

ikk for every

n. It follows that

x ∈∞⋂

k=0

U ikk

that is, γ(x) = (ik)k. In this way, we have shown that the image of γ is perfectlycharacterized by the conditions (A), (B) and (C).

To conclude the proof it suffices to show that the subset described by eachof these conditions may be constructed from cylinders through countable unionsand intersections. Given k ∈ N, let N(k) be the set of (k+ 1)-uples (a0, . . . , ak)in 0, 1 such that Ua0

0 ∩· · ·∩Uak

k 6= ∅. Condition (A) corresponds to the subset

∞⋂

k=0

⋃

(a0,...,ak)∈N(k)

[0; a0, . . . , ak].

Let D = k ∈ N : diamUk ≤ 1. Then, condition (B) corresponds to⋃

k∈D

⋃

(a0,...,ak−1)

[0; a0, . . . , ak−1, 1].

Finally, given any k ∈ N, let L(k) be the set of all l > k such that Ul ⊂ Uk anddiamUl ≤ diamUk/2. Condition (C) corresponds to the subset

∞⋂

k=0

⋃

a0,...,ak−1

([0; a0, . . . , ak−1, 0]

∪⋃

l∈L(k)

⋃

ak+1,...,al−1

[0; a0, . . . , ak−1, 1, ak+1, . . . , al−1, 1]).



Corollary 5.2.4. The map γ : M → γ(M) is a measurable bijection whoseinverse is also measurable.

Proof. Given any points x 6= y in M , there exists k ∈ N such that Uk containsone of the points but not the other. This ensures that γ is injective. For anys ≥ 0 and a0, . . . , as ∈ 0, 1,

γ−1([0; a0, . . . , as]) = Ua00 ∩ · · · ∩ Uas

s . (5.2.5)

This implies that γ is measurable, because the cylinders generate the Borelσ-algebra of Σ. Next, observe that

γ(Ua00 ∩ · · · ∩ Uas

s ) = [0; a0, . . . , as] ∩ γ(M) (5.2.6)

for every s, a0, . . . , as. Using Lemma 5.2.3, it follows that γ(Ua00 ∩ · · · ∩ Uas

s ) isa Borel subset of Σ for every s, a0, . . . , as. This proves that the inverse trans-formation γ−1 is measurable.

Now we are ready to prove that µ extends to a probability measure on theBorel σ-algebra of M , as claimed in Proposition 5.2.2. For that, let us considerthe algebra AΣ generated by the cylinders of Σ. Note that the elements of A arethe finite pairwise disjoint unions of cylinders. In particular, every element ofAΣ is compact and, consequently, AΣ is a compact algebra (Definition A.1.15).Define:

ν([0; a0, . . . , as]) = µ(Ua00 ∩ · · · ∩ Uas

s ), (5.2.7)

for every s ≥ 0 and a0, . . . , as in 0, 1. Then ν is an additive function in the setof all cylinders, with values in [0, 1]. It extends in a natural way to an additivefunction defined on the algebra AΣ, which we still denote as ν.

It is clear that ν(Σ) = 1. Moreover, since the algebra AΣ is compact, wemay use Theorem A.1.14 to conclude that the function ν : AΣ → [0, 1] is σ-additive. Hence, by Theorem A.1.13, the function ν extends to a probabilitymeasure defined on the Borel σ-algebra of Σ. Given any cover C of γ(M) bycylinders, it follows from the definition (5.2.7) that

ν( ⋃

C∈C

C)

= µ(⋃

C∈C

γ−1(C))

= µ(M) = 1.

Taking the infimum over all covers, we conclude that ν(γ(M)) = 1.By Corollary 5.2.4, the image γ−1

∗ ν is a Borel probability measure on M .By definition, and using the relation (5.2.6),

γ−1∗ ν(Ua0

0 ∩ · · · ∩ Uass ) = ν(γ(Ua0

0 ∩ · · · ∩ Uass )) = ν([0; a0, . . . , as] ∩ γ(M))

= ν([0; a0, . . . , as]) = µ(Ua00 ∩ · · · ∩ Uas

s )

for any s, a0, . . . , as. This implies that γ−1∗ ν is an extension of the function

µ : A → [0, 1]. Therefore, the proof of Proposition 5.2.2 is complete.


5.2.3 Construction of conditional measures

Let U = Uk : k ∈ N be a basis of open sets of M and A be the algebragenerated by U . It is clear that A generates the Borel σ-algebra of M . Observealso that A is countable: it coincides with the union of the (finite) algebrasgenerated by the subsets Uk : 0 ≤ k ≤ n, for every n ≥ 1. Define:

P∗ =⋂

A∈A

PA

Then µ(P∗) = 1, since the intersection is countable. For each P ∈ P∗, define:

µP : A → [0, 1], µP (A) = E(A,P ). (5.2.8)

In particular, µP (M) = E(M,P ) = 1. It is clear that µP is an additive function:the definition (5.2.3) gives that

A ∩B = ∅ ⇒ E(A ∪B,P ) = E(A,P ) + E(B,P ) for every P ∈ P∗.

By Proposition 5.2.2, it follows that this function extends to a probability mea-sure defined on the Borel σ-algebra of M , which we still denote as µP . Weare left to check that this family of measures µP : P ∈ P∗ satisfies all theconditions in the definition of disintegration (Definition 5.1.4).

Let us start with condition (a). Let P ∈ P∗ and, for every n ≥ 1, let Pn bethe element of the partition Pn that contains P . Observe that if A ∈ A is suchthat A ∩ Pn = ∅ for some n then

µP (A) = E(A,P ) = limm

µ(A ∩ Pm)

µ(Pm)= 0,

since Pm ⊂ Pn for everym ≥ n. Fix n. For each s ≥ 0, let P sn be the union of allsets of the form Ua0

0 ∩ · · · ∩Uass that intersect Pn. By the previous observation,

the cylinders of length s + 1 that are not in P sn have measure zero for µP .Therefore, µP (P sn) = 1 for every s ≥ 0. Passing to the limit when s → ∞,we conclude that µP (U) = 1 for every open set U that contains Pn. Since themeasure µP is regular (Proposition A.3.2), it follows that µP (Pn) = 1. Passingto the limit when n→ ∞, we find that µP (P ) = 1 for every P ∈ P∗.

Finally, let C denote the family of all measurable sets E ⊂ M for whichconditions (b) and (c) hold. By construction (recall Lemma 5.2.1), given anyA ∈ A, the function P 7→ µP (A) = E(A,P ) is measurable and satisfies

µ(A) =

∫E(A,P ) dµ(P ) =

∫µP (A) dµ(P ).

This means that A ⊂ C. We claim that C is a monotone class. Indeed, supposethat B is the union of an increasing sequence (Bj)j of sets in C. Then, byProposition A.1.31

P 7→ µP (B) = supjµP (Bj) is a measurable function


and, using the monotone convergence theorem,

µ(B) = limnµ(Bn) = lim

n

∫µP (Bn) dµ =

∫limnµP (Bn) dµ =

∫µP (B) dµ.

This means that B ∈ C. Analogously, if B is the intersection of a decreasing se-quence of sets in C then P 7→ µP (B) is measurable and µ(B) =

∫µP (B) dµ(P ).

That is, B ∈ C. This proves that C is a monotone class, as we claimed. ByTheorem A.1.18 it follows that C coincides with the Borel σ-algebra of M .

The proof of Theorem 5.1.11 is complete.

5.2.4 Exercises

5.2.1. Let P and Q be measurable partitions of (M,B, µ) such that P ≺ Q upto measure zero. Let µP : P ∈ P be a disintegration of µ with respect to Pand, for every P ∈ P , let µP,Q : Q ∈ Q, Q ⊂ P be a disintegration of µP withrespect to Q. Let π : Q → P be the canonical projection, such that Q ⊂ π(Q)for almost every Q ∈ Q. Show that µπ(Q),Q : Q ∈ Q is a disintegration of µwith respect to Q.

5.2.2. (Converse to the theorem of Rokhlin) Let M be a complete separablemetric space. Show that if P satisfies the conclusion of Theorem 5.1.11, that is, ifµ admits a disintegration with respect to P , then the partition P is measurable.

5.2.3. Let P1 ≺ · · · ≺ Pn ≺ · · · be an increasing sequence of countable parti-tions such that the union ∪nPn generates the σ-algebra B of measurable sets,up to measure zero. Show that the conditional expectation e(ψ) = limn en(ψ)coincides with ψ at almost every point, for every bounded measurable function.

5.2.4. Prove Proposition 2.4.4, using Proposition 5.2.2.


Chapter 6

Unique Ergodicity

This chapter is dedicated to a distinguished class of dynamical systems, charac-terized by the fact that they admit exactly one invariant probability measure.Initially, in Section 6.1, we give alternative formulations of this property andwe analyze the properties of the unique invariant measure.

The relation between unique ergodicity and minimality is another importanttheme. A dynamical system is said to be minimal if every orbit is dense in theambient space. As we observe in Section 6.2, every uniquely ergodic system isminimal, restricted to the support of the invariant measure, but the converse isnot true, in general.

The main construction of uniquely ergodic transformations is algebraic innature. In Section 6.3 we introduce the notion of Haar measure of a topologicalgroup and we show that every transitive translation on a compact metrizabletopological group is minimal and even uniquely ergodic: the Haar measure isthe unique invariant probability measure.

In Section 6.4 we present a remarkable application of the idea of uniqueergodicity in the realm of Arithmetics: the theorem of Hermann Weyl on theequidistribution of polynomial sequences.

Throughout this chapter, unless stated otherwise, it is understood that Mis a compact metric space and f : M →M is a continuous transformation.

6.1 Unique ergodicity

We say that a transformation f : M →M is uniquely ergodic if it admits exactlyone invariant probability measure. The corresponding notion for flows is definedin precisely the same way. This denomination is justified by the observation thatthe invariant probability measure µ is necessarily ergodic. Indeed, suppose thereexisted some invariant set A ⊂ M with 0 < µ(A) < 1. Then the normalizedrestriction of µ to A, defined by

µA(E) =µ(E ∩A)

µ(A)for every measurable set E ⊂M

163

164 CHAPTER 6. UNIQUE ERGODICITY

would be an invariant probability measure, different from µ, which would con-tradict the assumption that f is uniquely ergodic.

Proposition 6.1.1. The following conditions are equivalent:

(a) f admits a unique invariant probability measure;

(b) f admits a unique ergodic probability measure;

(c) for every continuous function ϕ : M → R, the sequence of time averages

n−1∑n−1

j=0 ϕ(f j(x)) converges at every point to a constant;

(d) for every continuous function ϕ : M → R, the sequence of time averages

n−1∑n−1

j=0 ϕ f j converges uniformly to a constant.

Proof. It is easy to see that (b) implies (a). Indeed, since invariant measure is aconvex combination of ergodic measures (Theorem 5.1.3), if there is a unique er-godic probability measure then the invariant probability measure is also unique.It is clear that (d) implies (c), since uniform convergence implies pointwise con-vergence. To see that (c) implies (b), suppose that µ and ν are ergodic proba-bility measures of f . Then, given any continuous function ϕ : M → R,

limn

1

n

n−1∑

j=0

ϕ(f j(x)) =

∫ϕdµ at µ-almost every point

∫ϕdν at ν-almost every point.

Since, by assumption, the limit does not depend on the point x, it follows that

∫ϕdµ =

∫ϕdν

for every continuous function ϕ. Using Proposition A.3.3 we find that µ = ν.We are left to prove that (a) implies (d). Start by recalling that f admits

some invariant probability measure µ (by Theorem 2.1). The idea is to showthat if (d) does not hold then there exists some probability measure ν 6= µ and,hence, (a) does not hold either. Suppose then that (d) does not hold, that is,

that there exists some continuous function ϕ : M → R such that n−1∑n−1

j=0 ϕf jdoes not converge uniformly to any constant; in particular, it does not convergeuniformly to

∫ϕdµ. By definition, this means that there exists ε > 0 such that

for every k ≥ 1 there exist nk ≥ k and xk ∈M such that

∣∣ 1

nk

nk−1∑

j=0

ϕ(f j(xk)) −∫ϕdµ

∣∣ ≥ ε. (6.1.1)

Let us consider the sequence of probability measures

νk =1

nk

nk−1∑

j=0

δfj(xk).

6.2. MINIMALITY 165

Since the space M1(M) of probability measures on M is compact for the weak∗

topology (Theorem 2.1.5), up to replacing this sequence by a subsequence,we may suppose that it converges to some probability measure ν on M . ByLemma 2.2.4 applied to the Dirac measure δx, the probability measure ν is in-variant under f . On the other hand, the fact that (νk)k converges to ν in theweak∗ topology implies that

∫ϕdν = lim

k

∫ϕdνk = lim

k

1

nk

nk−1∑

j=0

ϕ(f j(xk)).

Then, recalling (6.1.1), we have that

∣∣∫ϕdν −

∫ϕdµ

∣∣ ≥ ε.

In particular, ν 6= µ. This concludes the argument.

6.1.1 Exercises

6.1.1. Give an example of a transformation f : M → M in a compact metricspace such that (1/n)

∑n−1j=0 ϕ f j converges uniformly, for every continuous

function ϕ : M → R, but f is not uniquely ergodic.

6.1.2. Let f : M →M be a transitive continuous transformation in a compactmetric space. Show that if (1/n)

∑n−1j=0 ϕ f j converges uniformly, for every

continuous function ϕ : M → R, then f is uniquely ergodic.

6.1.3. Let f : M → M be an isometric homeomorphism in a compact metricspace M . Show that if µ is an ergodic measure for f then, for every n ∈ N, thefunction ϕ(x) = d(x, fn(x)) is constant on the support of µ.

6.2 Minimality

Let Λ ⊂ M be a closed invariant set of f : M → M . We say that Λ is minimalif it coincides with the closure of the orbit fn(x) : n ≥ 0 of every point x ∈ Λ.We say that the transformation f is minimal if the ambient M is a minimal set.

Recall that the support of a measure µ is the set of all points x ∈ M suchthat µ(V ) > 0 for every neighborhood V of x. It follows immediately from thedefinition that the complement of the support is an open set: if x /∈ suppµ thenthere exists an open neighborhood V such that µ(V ) = 0; then V is containedin the complement of the support. Therefore, suppµ is a closed set.

It is also easy to see that the support of any invariant measure is an invariantset, in the following sense: f(suppµ) ⊂ suppµ. Indeed, let x ∈ suppµ andlet V be any neighborhood of y = f(x). Since f is continuous, f−1(V ) is aneighborhood of x. Then µ(f−1(V )) > 0, because x ∈ suppµ. Hence, usingthat µ is invariant, µ(V ) > 0. This proves that y ∈ suppµ.


Proposition 6.2.1. If f : M → M is uniquely ergodic then the support of theunique invariant probability measure µ is a minimal set.

Proof. Suppose that there exists x ∈ suppµ whose orbit f j(x) : j ≥ 0 is notdense in the support of µ. This means that there exists some open subset U ofM such that U ∩ suppµ is non-empty and

f j(x) /∈ U ∩ suppµ for every j ≥ 0. (6.2.1)

Let ν be any accumulation point of the sequence of probability measures

νn = n−1n−1∑

j=0

δfj(x), n ≥ 1

with respect to the weak∗ topology. Accumulation points do exist, by Theo-rem 2.1.5, and ν is an invariant probability measure, by Lemma 2.2.4. Thecondition (6.2.1) means that νn(U) = 0 for every n ≥ 1. Hence, using The-orem 2.1.2 (see also part 3 of Exercise 2.1.1) we have that ν(U) = 0. Thisimplies that no point of U is in the support of µ, which contradicts the fact thatU ∩ suppµ is non-empty.

The converse to Proposition 6.2.1 is false in general:

Theorem 6.2.2 (Furstenberg). There exists some real-analytic diffeomorphismf : T2 → T2 that is minimal, preserves the Lebesgue measure m on the torus,but is not ergodic for m. In particular, f is not uniquely ergodic.

In the remainder of this section we give a brief sketch of the proof of thisresult. A detailed presentation may be found in the original paper of Fursten-berg [Fur61], as well as in the book of Mane [Man87]. In Section 7.3.1 we mentionother examples of minimal transformations that are not uniquely ergodic.

To prove Theorem 6.2.2, we look for a transformation f : T2 → T2 ofthe form f(x, y) = (x + α, y + φ(x)), where α is an irrational number andφ : S1 → R is a real-analytic function with

∫φ(x) dx = 0. Note that f preserves

the Lebesgue measure on the T2. Let us also consider the map f0 : T2 → T2

given by f0(x, y) = (x+ α, y). Note that no orbit of f0 is dense in T2 and thatthe system (f0,m) is not ergodic.

Let us consider the cohomological equation

u(x+ α) − u(x) ≡ φ(x). (6.2.2)

If φ and α are such that (6.2.2) admits some measurable solution u : S1 →R then (f0,m) and (f,m) are ergodically equivalent (see Exercise 6.2.1) and,consequently, (f,m) is not ergodic. On the other hand, one can show thatif (6.2.2) admits no continuous solution then f is minimal (the converse tothis fact is Exercise 6.2.2). Therefore, it suffices to find φ and α such thatthe cohomological equation admits a measurable solution but not a continuoussolution.

6.2. MINIMALITY 167

It is convenient to express these conditions in terms of the Fourier expansionφ(x) =

∑n∈Z ane

2πinx. To ensure that φ is real-analytic it is enough to requirethat:

there exists ρ < 1 such that |an| ≤ ρn for every n sufficiently large. (6.2.3)

Indeed, in that case the series∑

n∈Z anzn converges uniformly on every corona

z ∈ C : r ≤ |z| ≤ r−1 with r > ρ. In particular, its sum in the unit circle,which coincides with φ, is a real-analytic function. Since we want φ to takevalues in the real line and to have zero average, we must also require:

a0 = 0 and a−n = an for every n ≥ 1. (6.2.4)

According to Exercise 6.2.3, the cohomological equation admits a solutionin the space L2(m) if and only if

∞∑

n=1

∣∣∣an

e2πniα − 1

∣∣∣2

<∞. (6.2.5)

Moreover, the solution is uniquely determined: u =∑

n∈Z bne2πinx with

bn =an

e2πinα − 1for every n ∈ Z. (6.2.6)

The theorem of Fejer’s (see [Zyg68]) states that if u is a continuous functionthen the sequence of partial sums of its Fourier expansion converges Cesarouniformly to u:

1

n

n∑

k=1

( k∑

j=−k

bje2πijx

)converges uniformly to u(x). (6.2.7)

Hence, to ensure that u is not continuous it suffices to require:

( k∑

j=−k

bj

)

kis not Cesaro convergent. (6.2.8)

In this way, the problem is reduced to finding α and (an)n that satisfy(6.2.3), (6.2.4), (6.2.5) and (6.2.8). Exercise 6.2.4 hints at the issues involved inthe choice of such objects.

6.2.1 Exercises

6.2.1. Show that if u is a measurable solution of the cohomological equation(6.2.2) then h : T2 → T2, h(x, y) = (x, y + u(x)) is an ergodic equivalencebetween (f0,m) and (f,m), that is, h is an invertible measurable transformationthat preserves the measure m and conjugates the two maps f and f0. Deducethat (f,m) can not be ergodic.


6.2.2. Show that if u is a continuous solution of the cohomological equation(6.2.2) then h : T2 → T2, h(x, y) = (x, y + u(x)) is a topological conjugacybetween f0 and f . In particular, f can not be transitive.

6.2.3. Check that if u(x) =∑

n∈Z bne2πinx is a solution of (6.2.2) then

bn =an

e2πinα − 1for every n ∈ Z. (6.2.9)

Moreover, u ∈ L2(m) if and only if∑∞n=1 |bn|2 <∞.

6.2.4. We say that an irrational number α is Diophantine if there exist c > 0and τ > 0 such that |qα−p| ≥ c|q|−τ for any p, q ∈ Z with q 6= 0. Show that thecondition (6.2.5) is satisfied whenever α is Diophantine and φ satisfies (6.2.3).

6.2.5. (Theorem of Gottschalk) Let f : M → M be a continuous map in acompact metric space M . Show that the closure of the orbit of a point x ∈ Mis a minimal set if and only if Rε = n ∈ Z : d(x, fn(x)) < ε is a syndetic setfor every ε > 0.

6.2.6. Let f : M →M be a continuous map in a compact metric space M . Wesay that x, y ∈ M are close if infn d(f

n(x), fn(y)) = 0. Show that if x ∈ M issuch that the closure of its orbit is a minimal set then, for every neighborhoodU of x and every point y close to x, there exists an increasing sequence (ni)isuch that fni1+···+nik (x) and fni1+···+nik (y) are in U for any i1 < · · · < ik andk ≥ 1.

6.2.7. (Theorem of Hindman) A theorem of Auslander and Ellis (see [Fur81,Theorem 8.7]) states that in the conditions of Exercise 6.2.6 the closure of theorbit of every y ∈ M contains some point x that is close to y and such thatthe closure of its orbit is a minimal set. Deduce the following refinement of thetheorem of van der Waerden: given any decomposition N = S1 ∪ · · · ∪ Sq of theset of natural numbers in to pairwise disjoint sets, there exists j such that Sjcontains a sequence n1 < · · · < ni < · · · such that ni1 + · · ·+nik ∈ Sj for everyk ≥ 1 and any i1 < · · · < ik.

6.3 Haar measure

We are going to see that every compact topological group carries a remarkableprobability measure, called Haar measure, that is invariant under every trans-lation and every surjective group endomorphism. Assuming that the group ismetrizable, every transitive translation is uniquely ergodic, with the Haar mea-sure as the unique invariant probability measure.


Fix d ≥ 1 and a rationally independent vector θ = (θ1, . . . , θd). As we haveseen in Section 4.2.1, the rotation Rθ : Td → Td is ergodic with respect to the

6.3. HAAR MEASURE 169

Lebesgue measure m on the torus. Our goal now is to show that, in fact, Rθ isuniquely ergodic.

According to Proposition 6.1.1, we only have to show that, given any con-tinuous function ϕ : Td → R, there exists cϕ ∈ R such that

ϕn =1

n

n−1∑

j=0

ϕ Rjθ converges to cϕ at every point. (6.3.1)

Take cϕ =∫ϕdm. By ergodicity, the sequence (ϕn)n of time averages converges

to cϕ at m-almost every point. In particular, ϕn(x) → cϕ for a dense subset ofvalues of x ∈ Td.

Let d be the distance induced in the torus Td = Rd/Zd by the usual normin Rd: the distance between any two points in the torus is the minimum of thedistances between all their representatives in Rd. It is clear that the rotationRθ preserves that distance:

d(Rθ(x), Rθ(y)) = d(x, y) for every x, y ∈ Td.

Then, using that ϕ is continuous, given any ε > 0 we may find δ > 0 such that

d(x, y) < δ ⇒ d(Rjθ(x), Rjθ(y)) < δ ⇒ |ϕ(Rjθ(x)) − ϕ(Rjθ(y))| < ε

for every j ≥ 0. Then,

d(x, y) < δ ⇒ |ϕn(x) − ϕn(y)| < ε for every n ≥ 1.

Since ε does not depend on n, this proves that the sequence (ϕn)n is equicon-tinuous.

This allows us to use the theorem of Ascoli to prove the claim (6.3.1), asfollows. Suppose that there exists x ∈ Td such that (ϕn(x))n does not convergeto cϕ. Then there exists c 6= cϕ and some subsequence (nk)k such that ϕnk

(x)converges to c when k → ∞. By the theorem of Ascoli, up to restricting to asubsequence we may suppose that (ϕnk

)k is uniformly convergent. Let ψ be itslimit. Then ψ is a continuous function such that ψ(x) = cϕ for a dense subsetof values of x ∈ Td but ψ(x) = c is different from cϕ. It is clear that such afunction does not exist. This contradiction proves our claim that Rθ is uniquelyergodic.

6.3.2 Topological groups and Lie groups

Recall that a topological group is a group (G, ·) endowed with a topology withrespect to which the two operations

G×G→ G, (g, h) 7→ gh and G→ G, g 7→ g−1 (6.3.2)

are continuous. In all that follows it is assumed that the topology of G is suchthat every set consisting of a single point is closed. When G is a manifold and


the operations in (6.3.2) are differentiable, we say that (G, ·) is a Lie group. SeeExercise 6.3.1.

The Euclidean space Rd is a topological group, and even a Lie group, relativeto the addition + and the same holds for the torus Td. Recall that Td is thequotient of Rd by its subgroup Zd. This construction may be generalized asfollows:

Example 6.3.1. Given any closed normal subgroup H of a topological groupG, let G/H be the set of equivalence classes for the equivalence relation definedin G by x ∼ y ⇔ x−1y ∈ H . Denote by xH the equivalence class that containseach x ∈ G. Consider the following group operation in G/H :

xH · yH = (x · y)H.

The hypothesis that H is a normal subgroup ensures that this operation is welldefined. Let π : G 7→ G/H be the canonical projection, given by π(x) = xH .Consider in G/H the quotient topology, defined in the following way: a functionψ : G/H → X is continuous if and only if ψ π : G → X is continuous. Thehypothesis that H is closed ensures that the points are closed subsets of G/H .It follows easily from the definitions that G/H is a topological group. Recallalso that if the group G is abelian then all subgroups are normal.

Example 6.3.2. (Linear group) The set G = GL(d,R) of invertible real matri-ces of dimension d is a Lie group for the multiplication of matrices, called reallinear group of dimension d. Indeed, G may be identified with an open subset ofthe Euclidean space R(d2) and, thus, has a natural structure of a differentiablemanifold. Moreover, it follows directly from the definitions that the multiplica-tion of matrices and the inversion map A 7→ A−1 are differentiable with respectto this manifold structure. G has many important Lie subgroups, such as thespecial linear group SL(d,R), consisting of the matrices with determinant 1, andthe orthogonal group O(d,R), formed by the orthogonal matrices.

We call left-translation and right-translation associated with an element gof the group G, respectively, the maps

Lg : G→ G, Lg(h) = gh and Rg : G→ G, Rg(h) = hg.

An endomorphism of G is a continuous map φ : G → G that preserves thegroup operation, that is, such that φ(gh) = φ(g)φ(h) for every g, h ∈ G. Whenφ is an invertible endomorphism, that is, a bijection whose inverse is also anendomorphism, we call it an automorphism.

Example 6.3.3. Let A ∈ GL(d,Z); in other words, A is an invertible matrixof dimension d with integer coefficients. Then, as we have seen in Section 4.2.5,A induces an endomorphism fA : Td → Td. It can be shown that every endo-morphism of the torus Td is of this form.

A topological group is locally compact if every g ∈ G has some compactneighborhood. For example, every Lie group is locally compact. On the other


hand, the additive group of rational numbers, with the topology inherited fromthe real line, is not locally compact.

The following theorem is the starting point of the ergodic theory of locallycompact groups:

Theorem 6.3.4 (Haar). Let G be a locally compact topological group. Then:

(a) There exists some Borel measure µG on G that is invariant under all left-translations, finite on compact sets and positive on open sets;

(b) If η is a measure invariant under all left-translations and finite on compactsets then η = cµG for some c > 0.

(c) µG(G) <∞ if and only if G is compact.

We are going to sketch the proof of parts (a) and (b) in the special casewhen G is a Lie group. It will be apparent that in this case µG is a volumemeasure on G. The proof of part (c), for any topological group, is proposed inExercise 6.3.4.

Starting with part (a), let e be the unit element and d ≥ 1 be the dimensionof the Lie group. Consider any inner product · in the tangent space TeG. Foreach g ∈ G, represent by Lg : TeG → TgG the derivative of the left-translationLg at the point e. Next, consider the inner product defined in TgG in followingway:

u · v = L−1g (u) · L−1

g (v) for every u, v ∈ TgG.

It is clear that this inner product depends differentiably on g. Therefore, itdefines a Riemannian metric in G. It is also clear from the construction thatthis metric is invariant under left-translations: noting that Lhg = DLh(g)Lg,we see that

DLh(g)(u) ·DLh(g)(v) = L−1hgDLh(g)(u) · L−1

hgDLh(g)(v)

= L−1g (u) · L−1

g (v) = u · v

for any g, h ∈ G and u, v ∈ TgG. Let µG be the volume measure induced by thisRiemannian metric. This measure may be characterized in the following way.Given any x = (x1, . . . , xd) in G, consider

ρ(x) = det

g1,1(x) · · · g1,d(x)· · · · · · · · ·

gd,1(x) · · · gd,d(x)

where gi,j =∂

∂xi· ∂

∂xj.

Then µG(B) =∫B |ρ(x)| dx1 · · · dxd, for any measurable set B contained in

the domain of the local coordinates. Noting that the function ρ is continuousand non-zero, for every local chart, it follows that µG is positive on open setsand finite on compact sets. Moreover, since the Riemannian metric is invariantunder left-translations, the measure µG is also invariant under left-translations.

Now we move on to discussing part (b) of Theorem 6.3.4. Let ν any measureas in the statement. Denote by B(g, r) the open ball of center g and radius r,


relative to the distance associated with the Riemannian metric. In other words,B(g, r) is the set of all points in G that may be connected to g by some curve oflength less than r. Fix ρ > 0 such that ν(B(e, ρ)) is finite (such a ρ does existbecause G is locally compact and ν is finite on compact sets). We claim that

lim supr→0

ν(B(g, r))

µG(B(g, r))≤ ν(B(e, ρ))

µG(B(e, ρ))(6.3.3)

for every g ∈ G. This may be seen as follows.Firstly, the limit on the left-hand side of the inequality does not depend on

g, because both measures are assumed to be invariant under left-translations.Therefore, it is enough to consider the case g = e. Let (rn)n be any sequenceconverging to zero and such that:

limn

ν(B(e, rn))

µG(B(e, rn))= lim sup

r→0

ν(B(e, r))

µG(B(e, r)). (6.3.4)

By the Vitali lemma (Theorem A.2.16), we may find (gj)j in B(e, ρ) and (nj)jin N such that

1. the balls B(gj , rnj ) are contained in B(e, ρ) and they are pairwise disjoint;

2. the union of these balls has full µG-measure in B(e, ρ).

Moreover, given any a ∈ R smaller than the limit in (6.3.4), we may supposethat the integers nj are sufficiently large that ν(B(gj , rnj )) ≥ aµG(B(gj , rnj ))for every j. It follows that

ν(B(e, ρ)) ≥∑

j

ν(B(gj , rnj )) ≥∑

j

aµG(B(gj , rnj )) = aµG(B(e, ρ)).

Since a may be taken arbitrarily close to (6.3.4), this proves the claim (6.3.3).Next, we claim that ν is absolutely continuous with respect to µG. Indeed,

let b be any number larger than the quotient on the right-hand side of (6.3.3).Given any measurable set B ⊂ G with µG(B) = 0, and given any ε > 0, letB(gj, rj) : j be a cover of B by balls of small radii, such that ν(B(gj , rj)) ≤bµ(B(gj , rj)) and

∑j µG(B(gj , rj)) ≤ ε. Then,

ν(B) ≤∑

j

ν(B(gj , rj)) ≤ b∑

j

µ(B(gj , rj)) ≤ bε.

Since ε > 0 is arbitrary, it follows that ν(B) = 0. Therefore, ν ≪ µG, asclaimed. Now, by the Lebesgue derivation theorem (Theorem A.2.15)

dν

µG(g) = lim

r→0

1

µ(B(g, r))

∫

B(g,r)

dν

µGdµG = lim

r→0

ν(B(g, r))

µ(B(g, r))

for µ-almost every g ∈ G. The limit on the left-hand side does not depend on gand, by (6.3.3), it is finite. Let c ∈ R be that limit. Then ν = cµG, as stated inpart (b) of Theorem 6.3.4.


In the case when the group G is compact, it follows from Theorem 6.3.4that there exists a unique probability measure that is invariant under the left-translations, positive on open sets and finite on compact sets. This probabilitymeasure µG is called the Haar measure of the group. For example, the nor-malized Lebesgue measure is the Haar measure on the torus Td. See also Exer-cises 6.3.5 and 6.3.6. The Haar measures features some additional properties:

Corollary 6.3.5. Assume that G is compact. Then the Haar measure µG isinvariant under the right-translations and under every surjective endomorphismof G.

Proof. Given any g ∈ G, consider the probability measure (Rg)∗µG. Observethat Lh Rg = Rg Lh for every h ∈ G. Hence,

(Lh)∗(Rg)∗µG = (Rg)∗(Lh)∗µG = (Rg)∗µG.

In other words, (Rg)∗µG is invariant under every left-translation. By unique-ness, it follows that (Rg)∗µG = µG for every g ∈ G, as claimed.

Given any surjective endomorphism φ : G → G, consider the probabilityφ∗µG. Given any h ∈ G, choose some g ∈ φ−1(h). Observe that Lh φ = φLg.Hence,

(Lh)∗φ∗µG = φ∗(Lg)∗µG = φ∗µG.

In other words, φ∗µG is invariant under every left-translation. By uniqueness,it follows that φ∗µG = µG, as claimed.

More generally, when we do not assume G to be compact, the argument inCorollary 6.3.5 shows that for every g ∈ G there exists λ(g) > 0 such that

(Lg)∗µG = λ(g)µG.

The map G→ (0,∞), g 7→ λ(g) is a group homomorphism.

6.3.3 Translations on compact metrizable groups

We call a distance d in a topological groupG left-invariant if it is invariant underevery left-translation: d(Lh(g1), Lh(g2)) = d(g1, g2) for every g1, g2, h ∈ G.Analogously, we call a distance right-invariant if it is invariant under everyright-translation. In this section we always take the group G to be compactand metrizable. We start by observing that it is always possible to choose thedistance in G in such a way that it is invariant under all the translations:

Lemma 6.3.6. If G is a compact metrizable topological group then there existssome distance compatible with the topology of G that is both left-invariant andright-invariant.

Proof. Let (Un)n be a countable basis of neighborhoods of the unit element eof G. By Lemma A.3.4, for every n there exists a continuous function ϕn : G→


[0, 1] such that ϕn(e) = 0 and ϕn(z) = 1 for every z ∈ G \ Un. Define

ϕ : G→ [0, 1], ϕ(z) =

∞∑

n=1

2−nϕn(z).

Then, ϕ is continuous and ϕ(e) = 0 < ϕ(z) for every z 6= e. Now define

d(x, y) = sup|ϕ(gxh) − ϕ(gyh)| : (g, h) ∈ G2 (6.3.5)

for every x, y ∈ G. The supremum is finite, since we take G to be compact. Itis easy to see that d is a distance in G. Indeed, note that d(x, y) = 0 meansthat ϕ(gxh) = ϕ(gyh) for every g, h ∈ G. In particular, taking g = e andh = y−1, we get that ϕ(xy−1) = ϕ(e). By the construction of ϕ, this impliesthat x = y. All the other axioms of the notion of distance follow directly fromthe definition of d. It is also clear from the definition that d is invariant underboth left-translations and right-translations.

We are left to prove that the distance d is compatible with the topology ofthe group G. It is easy to check that, given any neighborhood V of a pointx ∈ G, there exists δ > 0 such that B(x, δ) ⊂ V . Indeed, since U = x−1V is aneighborhood of e ∈ G, the properties of ϕ ensure that there exists δ > 0 suchthat ϕ(z) ≤ 1 − δ for every z /∈ U . Then, y /∈ V implies that ϕ(x−1y) ≤ 1 − δor, in other words, that |ϕ(e)−ϕ(x−1y)| ≥ δ. Taking g = x−1 and h = e in thedefinition (6.3.5), we see that this last inequality implies that d(x, y) ≥ δ, thatis, y /∈ B(x, δ). Now let us check the converse: given x ∈ G and δ > 0, thereexists some neighborhood V of x contained in B(x, δ). By continuity, for everypair (g, h) ∈ G2 there exists an open neighborhood U × V ×W of (g, x, h) inG3 such that

|ϕ(gxh) − ϕ(g′x′h′)| ≤ δ/2 for every (g′, x′, h′) ∈ U × V ×W. (6.3.6)

The sets U × W obtained in this way, with x fixed and g, h variable, forman open cover of G2. Let Ui ×Wi, i = 1, . . . , k be a finite subcover and Vi,i = 1, . . . , k be the corresponding neighborhoods of x. Take V = ∩ki=1Vi andconsider any y ∈ V . Given any (g, h) ∈ G2, the condition (6.3.6) implies that|ϕ(gxh) − ϕ(gyh)| ≤ δ/2. It follows that d(x, y) ≤ δ/2 and, consequently,y ∈ B(x, δ).

Example 6.3.7. Given a matrix A ∈ GL(d,R), denote by ‖A‖ its operatornorm, that is, ‖A‖ = sup‖Av‖ : ‖v‖ = 1. Observe that ‖OA‖ = ‖A‖ = ‖AO‖for every O in the orthogonal group O(d,R). Define

d(A,B) = log(1 + ‖A−1B − id ‖ + ‖B−1A− id ‖).

Then d is a distance in GL(d,R), invariant under left-translations:

d(CA,CB) = log(1 + ‖A−1C−1CB − id ‖ + ‖B−1C−1CA− id ‖) = d(A,B)


for every C ∈ GL(d,R). This distance is not invariant under right-translationsin GL(d,R) (Exercise 6.3.3). However, it is right-invariant (and left-invariant)restricted to the orthogonal group O(d,R): for every O ∈ O(d,R),

d(AO,CO) = log(1 + ‖O−1A−1BO − id ‖ + ‖O−1B−1AO − id ‖)= log(1 + ‖O−1(A−1B − id )O‖ + ‖O−1(B−1A− id )O‖)= d(A,B).

Theorem 6.3.8. Let G be a compact metrizable topological group and let g ∈ G.The following conditions are equivalent:

(a) Lg is uniquely ergodic;

(b) Lg is ergodic with respect to µG;

(c) the subgroup gn : n ∈ Z generated by g is dense in G.

Proof. It is clear that (a) implies (b). To prove that (b) implies (c), consider theinvariant distance d given by Lemma 6.3.6. Let H be the closure of gn : n ∈ Zand consider the continuous function

ϕ(x) = mind(x, y); y ∈ H.

Observe that this function is invariant under Lg: using that gH = H , we getthat

ϕ(x) = mind(x, y) : y ∈ H = mind(gx, gy) : y ∈ H= mind(gx, z) : z ∈ H = ϕ(gx) for every x ∈ G.

Since H is closed, ϕ(x) = 0 if and only if x ∈ H . If H 6= G then µG(G\H) > 0,as the Haar measure is positive on open sets. In that case, the function ϕ is notconstant at µG-almost every point, which implies that Lg can not be ergodicwith respect to µG.

Finally, to prove that (c) implies (a), let us show that if µ is a probabilitymeasure invariant under Lg then µ = µG. For that, it suffices to check that µis invariant under every left-translation in G. Fix h ∈ G. Since µ is invariantunder Lg, ∫

ϕ(x) dµ(x) =

∫ϕ(gnx) dµ(x)

for every n ∈ N and every continuous function ϕ : G→ R. On the other hand,the hypothesis ensures that there exists a sequence of natural numbers nj → ∞such that gnj → h. Given any (uniformly) continuous function ϕ : G → R andany ε > 0, fix δ > 0 such that |ϕ(x) − ϕ(y)| < ε whenever d(x, y) < δ. If j issufficiently large,

d(gnjx, hx) = d(gnj , h) < δ for every x ∈ G.

Hence, |ϕ(gnjx) − ϕ(hx)| < ǫ for every x and, consequently,

|∫ (

ϕ(x) − ϕ(hx))dµ| = |

∫ (ϕ(gnjx) − ϕ(hx)

)dµ| < ǫ.


Since ε is arbitrary, it follows that∫ϕdµ =

∫ϕ Lh dµ for every continuous

function ϕ and every h ∈ G. This implies that µ is invariant under Lh for everyh ∈ G, as claimed.

6.3.4 Odometers

Odometers, or adding machines, are mathematical models for the mechanismsthat register the distance (number of kilometers) travelled by a car or theamount of electricity (number of energy units) consumed in a house. Theycome with a dynamics, which consists in advancing the counter by one uniteach time. The main difference with respect to the real life odometers is thatour idealized counters allow for an infinite number of digits.

Fix any number basis d ≥ 2, for example d = 10, and consider the setX = 0, 1, . . . , d − 1, endowed with the discrete topology. Let M = XN bethe set of all sequences α = (αn)n with values in X , endowed with the producttopology. This topology is metrizable: it is compatible, for instance, with thedistance defined in M by

d(α, α′) = 2−N(α,α′) where N(α, α′) = minj ≥ 0 : αj 6= α′j. (6.3.7)

Observe also that M is compact, being the product of compact spaces (theoremof Tychonoff).

Let us introduce in M the following operation of “sum with transport”:given α = (αn)n and β = (βn)n in M , define α+ β = (γn)n as follows. Firstly,

• if α0 + β0 < d then γ0 = α0 + β0 and δ1 = 0;

• if α0 + β0 ≥ d then γ0 = α0 + β0 − d and δ1 = 1.

Next, for every n ≥ 1,

• if αn + βn + δn < d then γn = αn + βn + δn and δn+1 = 0;

• if αn + βn + δn ≥ d then γn = αn + βn + δn − d and δn+1 = 1.

The auxiliary sequence (δn)n corresponds precisely to the transports. The map+ : M ×M →M defined in this way turns M into an abelian topological groupand the distance (6.3.7) is invariant under all the translations (Exercise 6.3.8).

Now consider the “translation by 1” f : M → M defined by

f((αn)n

)= (αn)n + (1, 0, . . . , 0, . . . ) = (0, . . . , 0, αk + 1, αk+1, . . . , αn, . . . )

where k ≥ 0 is the smallest value of n such that αn < d − 1; if there exists nosuch k, that is, if (αn)n is the constant sequence equal to d− 1, then the imagef((αn)n) is the constant sequence equal to 0. We leave it to the reader to checkthat this transformation f : M →M is uniquely ergodic (Exercise 6.3.9).

It is possible to generalize this construction somewhat, in the following way.Take M =

∏∞n=00, 1, . . . , dn − 1, where (dn)n is any sequence of integer num-

bers larger than 1. Just as in the previous particular case, this set has thestructure of a metrizable compact abelian group and the “translation by 1” isuniquely ergodic.


Example 6.3.9. A (simple) pile in an interval1 I is an ordered family S ofpairwise disjoint subintervals I0, . . . , Ik−1 with the same length and whoseunion is I. Write Ik = I0. We associate with S the transformation f : I → Iwhose restriction to each Ij is the translation mapping Ij to Ij+1. Graphically,we represent the subintervals “piled-up” on top of each other in order: from thebottom I0 to the top Ik−1. Then f is nothing but the translation “upwards”,except at the top of the pile. See the left-hand side of Figure 6.1.

.

.

.

I0

Ik−1

. ..

. . .

. . .

Figure 6.1: Example of the piling method

Let us consider a sequence (Sn)n of piles in the same interval I, constructedas follows. Fix any integer number d ≥ 2. Take S0 = I. For each n ≥ 1,take as Sn the pile obtained by dividing Sn−1 into d columns, all with the samewidth, and piling them up on top of each other. This procedure is described onthe right-hand side of Figure 6.1 for d = 3. Let fn : I → I be the transformationassociated with each Sn. We leave it to the reader to show (Exercise 6.3.10) thatthe sequence (fn)n converges at every point to a transformation f : I → I thatpreserves the Lebesgue measure. Moreover, this transformation f is uniquelyergodic.

This is only one of the simplest applications of the so-called piling method,which is a very effective tool to produce examples with interesting properties.The reader may find a detailed discussion of this method in Section 6 of thebook of Friedman [Fri69]. Another application, a bit more elaborate, will begiven in Example 8.2.3.

Example 6.3.10 (Substitutions). We are going to mention briefly a construc-tion of combinatorial nature that generalizes the definition of odometer andprovides several other interesting examples of minimal and even uniquely er-godic systems. For more information, including about the relations betweensuch systems and the odometer, recommend the book of Queffelec [Que87] andthe paper of Ferenczi, Fisher, Talet [FFT09].

1For definiteness, take all intervals to be closed on the left and open on the right.


We call substitution in a finite alphabet A any map associating with eachletter α ∈ A a word s(α) formed by a finite number of letters of A. A fewexamples, for A = 0, 1: Thue-Morse substitution s(0) = 01 and s(1) = 10;Fibonacci substitution s(0) = 01 and s(1) = 0; Feigenbaum substitution s(0) =11 and s(1) = 10; Cantor substitution s(0) = 010 and s(1) = 111; and Chaconsubstitution s(0) = 0010 and s(1) = 1. We may iterate a substitution, bydefining s1(α) = s(α) and

sk+1(α) = s(α1) · · · s(αn) if sk(α) = α1 · · ·αn.

We call a substitution s primitive (or aperiodic) if there exists k ≥ 1 such thatfor any α, β ∈ A the word sk(α) contains the letter β.

Let A be endowed with the discrete topology and Σ = AN be the space ofall sequences in A, endowed with the product topology. Denote by S : Σ → Σthe map induced in that space by a given substitution s: the image of each(a0, . . . , an, . . . ) ∈ Σ is the sequence of the letters that constitute the wordobtained when one concatenates the finite words s(a0), . . . , s(an), . . . . Supposethat there exists some letter α0 ∈ A such that the word s(α0) has length largerthan 1 and starts with the letter α0. That is the case for all the examples listedabove. Then (Exercise 6.3.11), S admits a unique fixed point x = (xn)n withx0 = α0.

Consider the restriction σ : X → X of the shift map σ : Σ → Σ to theclosure X ⊂ Σ of the orbit σn(x) : n ≥ 0 of the point x. If the substi-tution s is primitive then σ : X → X is minimal and uniquely ergodic (seeSection 5 in [Que87]). That holds, for instance, for the Thue-Morse, Fibonacciand Feigenbaum substitutions.

6.3.5 Exercises

6.3.1. Let G be a manifold and · be a group operation in G such that the map(g, h) 7→ g · h is of class C1. Show that g 7→ g−1 is also of class C1.

6.3.2. Let G be a compact topological space such that every point admits acountable basis of neighborhoods and let · be a group operation in G such thatthe map (g, h) 7→ g · h is continuous. Show that g 7→ g−1 is also continuous.

6.3.3. Show that the distance d in Example 6.3.7 is not right-invariant.

6.3.4. Prove part (c) of Theorem 6.3.4: a locally compact group G is compactif and only if its Haar measure is finite.

6.3.5. Identify GL(1,R) with the multiplicative group R \ 0. Check that themeasure µ defined on GL(1,R) by

∫

GL(1,R)

ϕdµ =

∫

R\0

ϕ(x)

|x| dx

is both left-invariant and right-invariant. Find a measure invariant under allthe translations of GL(1,C).

6.4. THEOREM OF WEYL 179

6.3.6. Identify GL(2,R) with (a11, a12, a21, a22) ∈ R4 : a11a22−a12a21 6= 0, insuch a way that det(a11, a12, a21, a22) = a11a22−a12a21. Show that the measureµ defined by

∫

GL(2,R)

ϕdµ =

∫ϕ(x11, x12, x21, x22)

| det(x11, x12, x21, x22)|2dx11dx12dx21dx22

is both left-invariant and right-invariant. Find a measure invariant under allthe translations of GL(2,C).

6.3.7. Let G be a compact metrizable group and let g ∈ G. Check that thefollowing conditions are equivalent:

(1) Lg is uniquely ergodic;

(2) Lg is transitive: there is x ∈ G such that gnx : n ∈ Z is dense in G;

(3) Lg is minimal: gny : n ∈ Z is dense in G for every y ∈ G.

6.3.8. Show that the operation + : M ×M → M defined in Section 6.3.4 iscontinuous and endows M with the structure of an abelian group. Moreover,every translation in this group preserves the distance defined in (6.3.7).

6.3.9. Let f : M →M be an odometer, as defined in Section 6.3.4, with d = 10.Given b0, . . . , bk−1 in 0, . . . , 9, denote by [b0, . . . , bk−1] the set of all sequencesβ ∈M with β0 = b0, . . . , βk−1 = bk−1. Show that

limn

1

n#0 ≤ j < n : f j(x) ∈ [b0, . . . , bk−1]

=

1

10k

for every x ∈ M . Moreover, this limit is uniform. Conclude that f admits aunique invariant probability measure and calculate that measure explicitly.

6.3.10. Check the claims in Example 6.3.9.

6.3.11. Prove that if s is a substitution in a finite alphabet A and α ∈ A issuch that s(α) has length larger than 1 and starts with the letter α, then thetransformation S : Σ → Σ defined in Example 6.3.10 admits a unique fixedpoint that starts with the letter α ∈ A.

6.4 Theorem of Weyl

In this section we use ideas that were discussed previously to prove a beau-tiful theorem of Hermann Weyl [Wey16] about the distribution of polynomialsequences.

Consider any polynomial function P : R → R with real coefficients anddegree d ≥ 1:

P (x) = a0 + a1x+ a2x2 + · · · + adx

d.


Composing P with the canonical projection R → S1, we obtain a polynomialfunction P∗ : R → S1 with values on the circle S1 = R/Z. Define:

zn = P∗(n), for every n ≥ 1.

We may think of zn as the fractional part of the real number P (n). We want tounderstand how the sequence (zn)n is distributed on the circle.

Definition 6.4.1. We say that a sequence (xn)n in S1 is equidistributed if, forany continuous function ϕ : S1 → R,

limn→∞

1

n

n∑

j=1

ϕ(xj) =

∫ϕ(x) dx.

According to Exercise 6.4.1, this is equivalent to saying that, for every seg-ment I ⊂ S1, the fraction of terms of the sequence that are in I is equal to thecorresponding length m(I).

Theorem 6.4.2 (Weyl). If some of the coefficients a1, a2, . . . , ad is irrationalthen the sequence zn = P∗(n), n ∈ N is equidistributed.

In order to develop some intuition about this theorem, let us start by con-sidering the special case d = 1. In this case the polynomial function reduces toP (x) = a0 + a1x. Let us consider the transformation

f : S1 → S1, f(θ) = θ + a1.

By assumption, the coefficient a1 is irrational. Therefore, as we have seen inSection 6.3.1, this transformation admits a unique invariant probability measure,which is the Lebesgue measure m. Consequently, given any continuous functionϕ : S1 → R and any point θ ∈ S1,

limn→∞

1

n

n∑

j=1

ϕ(f j(θ)) =

∫ϕdm.

Take θ = a0. Then, f j(θ) = a0 + a1j = zj . Hence, the previous relation yields

limn→∞

1

n

n∑

j=1

ϕ(zj) =

∫ϕdm.

This is precisely what it means to say that zj is equidistributed.

6.4.1 Ergodicity

Now we extend the previous arguments to any degree d ≥ 1. Consider thetransformation f : Td → Td defined on the d-dimensional torus Td by thefollowing expression:

f(θ1, θ2, . . . , θd) = (θ1 + α, θ2 + θ1, . . . , θd + θd−1), (6.4.1)


where α is an irrational number to be chosen later. Note that f is invertible:the inverse is given by

f−1(θ1, θ2, . . . , θd) = (θ1−α, θ2−θ1+α, . . . , θd−θd−1+· · ·+(−1)d−1θ1+(−1)dα).

Note also that the derivative of f at each point is given by the matrix

1 0 0 · · · 0 01 1 0 · · · 0 00 1 1 · · · 0 0· · · · · · · · · · · · · · · · · ·0 0 0 · · · 1 1

,

whose determinant is 1. This ensures that f preserves the Lebesgue measure onthe torus (recall Lemma 1.3.5).

Proposition 6.4.3. The Lebesgue measure on Td is ergodic for f .

Proof. We are going to use a variation of the Fourier series expansion argumentin Proposition 4.2.1. Let ϕ : Td → R be any function in L2(m). Write

ϕ(θ) =∑

n∈Zd

ane2πin·θ

with θ = (θ1, . . . , θd) and n = (n1, . . . , nd) and n · θ = n1θ1 + · · · + ndθd. TheL2-norm of ϕ is given by

∑

n∈Zd

|an|2 =

∫|ϕ(θ)|2 dθ1 · · · dθd <∞. (6.4.2)

Observe that

ϕ(f(θ)) =∑

n∈Zd

ane2πi(n1(θ1+α)+n2(θ2+θ1)+···nd(θd+θd−1))

=∑

n∈Zd

ane2πin1αe2πiL(n)·θ

where L(n) = (n1 + n2, n2 + n3, . . . , nd−1 + nd, nd). Suppose that the functionϕ is invariant, that is, ϕ f = ϕ at almost every point. Then,

ane2πin1α = aL(n) for every n ∈ Zd. (6.4.3)

This implies that an and aL(n) have the same absolute value. On the otherhand, the integrability relation (6.4.2) implies that there exists at most a finitenumber of terms with any given absolute value different from zero. It followsthat an = 0 for every n ∈ Zd whose orbit Lj(n), j ∈ Z is infinite. Observing theexpression of L, we deduce that an = 0 except, possibly, if n2 = · · · = nd = 0.For the remainder values of n, that is, for every n = (n1, 0, . . . , 0), one has thatL(n) = n and, thus, the relation (6.4.3) becomes

an = ane2πin1α.


Since α is irrational, the last factor is different from 1 whenever n1 is non-zero.Therefore, this relation implies that an = 0 also for n = (n1, 0, . . . , 0) withn1 6= 0. In this way, we have shown that if ϕ is an invariant function thenall the terms in its Fourier series vanish except, possibly, the constant term.This means that ϕ is constant at almost every point, and that proves that theLebesgue measure is ergodic for f .

6.4.2 Unique ergodicity

The last step in the proof of Theorem 6.4.2 is the following result:

Proposition 6.4.4. The transformation f is uniquely ergodic: the Lebesguemeasure on the torus is the unique invariant probability measure.

Proof. The proof is by induction on the degree d of the polynomial P . The caseof degree 1 was treated previously. Therefore, we only need to explain how thecase of degree d may be deduced from the case of degree d − 1. For that, wewrite Td = Td−1 × S1 and

f : Td−1 × S1 → Td−1 × S1, f(θ0, η) = (f0(θ0), η + θd−1), (6.4.4)

where θ0 = (θ1, . . . , θd−1) and f0(θ0) = (θ1 + α, θ2 + θ1, . . . , θd−1 + θd−2). Byinduction, the transformation

f0 : Td−1 → Td−1

is uniquely ergodic. Let us denote by π : Td → Td−1 the projection π(θ) = θ0.

Lemma 6.4.5. For any probability measure µ invariant under f , the projectionπ∗µ coincides with the Lebesgue measure m0 on Td−1.

Proof. Given any measurable set E ⊂ Td−1,

(π∗µ)(f−10 (E)) = µ(π−1f−1

0 (E)).

Using that π f = f0 π and the fact that µ is f -invariant, we get that theexpression on the right-hand side is equal to

µ(f−1π−1(E)) = µ(π−1(E)) = (π∗µ)(E).

Therefore, (π∗µ)(f−10 (E)) = (π∗µ)(E) for every measurable subset E, that is,

π∗µ is invariant under f0. It is clear that π∗µ is a probability measure. Sincef0 is uniquely ergodic, it follows that π∗µ coincides with the Lebesgue measurem0 on Td−1.

Now suppose that µ, besides being invariant, is also ergodic for f . ByTheorem 3.2.6, and by ergodicity, the set G(µ) ⊂ M of all points θ ∈ Td suchthat

limn

1

n

n−1∑

j=0

ϕ(f j(θ)) =

∫ϕdµ for any continuous function ϕ : Td → R (6.4.5)


has full measure. Let G0(µ) be the set of all θ0 ∈ Td−1 such that G(µ) intersectsθ0×S1. In other words,G0(µ) = π(G(µ)). It is clear that π−1(G0(µ)) containsG(µ) and, thus, has full measure. Hence, using Lemma 6.4.5,

m0(G0(µ)) = µ(π−1(G0(µ))) = 1. (6.4.6)

For the same reasons, this relation remains valid for the Lebesgue measure:

m0(G0(m)) = m(π−1(G0(m))) = 1. (6.4.7)

The identities (6.4.6) and (6.4.7) imply that the intersection between G0(µ)and G0(m) has full measure for m0. So, in particular, these two sets can not bedisjoint. Let θ0 be any point in the intersection. By definition, G(µ) intersectsθ0 × S1. But the next result asserts that G(m) contains θ0 × S1:

Lemma 6.4.6. If θ0 ∈ G0(m) then θ0 × S1 is contained in G(m).

Proof. The crucial observation is that the measure m is invariant under everytransformation of the form

Rβ : Td−1 × S1 → Td−1 × S1, (ζ, η) 7→ (ζ, η + β).

The hypothesis θ0 ∈ G0(m) means that there exists some η ∈ S1 such that(θ0, η) ∈ G(m), that is,

limn

1

n

n−1∑

j=0

ϕ(f j(θ0, η)) =

∫ϕdm

for every continuous function ϕ : Td → R. Any other point of θ0 × S1 maybe written as (θ0, η + β) = Rβ(θ0, η) for some β ∈ S1. Recalling (6.4.1), we seethat

f(Rβ(τ0, ζ)

)= (τ1 + α, τ2 + τ1, . . . , τd−1 + τd−2, ζ + β + τd−1) = Rβ

(f(τ0, ζ)

)

for every (τ0, ζ) ∈ Td−1 × S1. Hence, by induction,

f j(θ0, η + β) = f j(Rβ(θ0, η)

)= Rβ

(f j(θ0, η)

)

for every j ≥ 1. Therefore, given any continuous function ϕ : Td → R,

limn

1

n

n−1∑

j=0

ϕ(f j(θ0, η + β)) = lim1

n

n−1∑

j=0

(ϕ Rβ)(f j(θ0, η))

=

∫(ϕ Rβ) dm =

∫ϕdm.

This proves that (θ0, η + β) is in G(m) for every β ∈ S1, as stated.


It follows from what we said so far that G(µ) and G(m) intersect each otherat some point of θ0×S1. In view of the definition (6.4.5), this implies that thetwo measures have the same integral for every continuous function. Accordingto Proposition A.3.3, this implies that µ = m, as we wanted to prove.

Corollary 6.4.7. The orbit of every point θ ∈ Td is equidistributed on the torusTd, in the sense that

limn

1

n

n−1∑

j=0

ψ(f j(θ)) =

∫ψ dm

for every continuous function ψ : Td → R.

Proof. This follows immediately from Propositions 6.1.1 and 6.4.4.

6.4.3 Proof of the theorem of Weyl

To complete the proof of Theorem 6.4.2, we introduce the polynomial functionsp1, . . . , pd defined by

pd(x) = P (x) and

pj−1(x) = pj(x+ 1) − pj(x) for j = 2, . . . , d.(6.4.8)

Lemma 6.4.8. The polynomial pj(x) has degree j, for every 1 ≤ j ≤ d. More-over, p1(x) = αx+ β with α = d!ad.

Proof. By definition, pd(x) = P (x) has degree d. Hence, to prove the first claimit suffices to show that if pj(x) has degree j then pj−1(x) has degree j − 1. Inorder to do that, let

pj(x) = bjxj + bj−1x

j−1 + · · · + b0,

where bj 6= 0. Then

pj(x+ 1) = bj(x+ 1)j + bj−1(x+ 1)j−1 + · · · + b0

= bjxj + (jbj + bj−1)x

j−1 + · · · + b0.

Subtracting one expression from the other, we get that

pj−1(x) = (jbj)xj−1 + b′j−2x

j−2 + · · · + b′0

has degree j − 1. This proves the first claim in the lemma. This calculationalso shows that the main coefficient of pj−1(x) (the coefficient of the term withhighest degree) can be obtained multiplying by j the main coefficient of pj(x).Consequently, the main coefficient of p1 must be equal to d!aq, as claimed inthe last part of the lemma.


Lemma 6.4.9. For every n ≥ 0,

fn(p1(0), p2(0), . . . , pd(0)

)=(p1(n), p2(n), . . . , pd(n)

).

Proof. The proof is by induction on n. Since the case n = 0 is obvious, we onlyneed to treat the inductive step. Recall that f was defined in (6.4.1). If

fn−1(p1(0), p2(0), . . . , pd(0)) = (p1(n− 1), p2(n− 1), . . . , pd(n− 1))

then fn(p1(0), p2(0), . . . , pd(0)) is equal to

(p1(n− 1) + α, p2(n− 1) + p1(n− 1), . . . , pd(n− 1) + pd−1(n− 1)).

Using the definition (6.4.8) and Lemma 6.4.8, we find that this expression isequal to

(p1(n), p2(n), . . . , pd(n)),

and that proves the lemma.

Finally, we are ready to prove that the sequence zn = P∗(n), n ∈ N isequidistributed. We treat two cases separately.

Firstly, suppose that the main coefficient ad of P (x) is irrational. Then thenumber α in Lemma 6.4.8 is irrational and, thus, the results in Section 6.4.2 arevalid for the transformation f : Td → Td. Let ϕ : S1 → R be any continuousfunction. Consider ψ : Td → R defined by

ψ(θ1, θ2, . . . , θd) = ϕ(θd).

Fix θ = (p1(0), p2(0), . . . , pd(0)). Using Lemma 6.4.9 and Corollary 6.4.7, weget that

limn

1

n

n−1∑

j=0

ϕ(zn) = limn

1

n

n−1∑

j=0

ψ(fn(θ)) =

∫ψ dm =

∫ϕdx.

This ends the proof of Theorem 6.4.2 in the case when ad is irrational.Now suppose that ad is rational. Write ad = p/q with p ∈ Z and q ∈ N. It

is clear that we may write zn as a sum

zn = xn + yn, xn = adnd and yn = Q∗(n)

where Q(x) = a0+a1x+ · · ·+ad−1xd−1 and Q∗ : R → S1 is given by Q∗ = πQ.

To begin with, observe that

xn+q − xn =p

q(n+ q)d − p

qnd

is an integer number, for every n ∈ N. This means that the sequence xn, n ∈ Nis periodic (with period q) in the circle R/Z. In particular, it takes no morethan q distinct values. Observe also that, since ad is rational, the hypothesis ofthe theorem implies that some of the coefficients a1, . . . , ad−1 of Q is irrational.


Hence, by on the degree, the sequence yn, n ∈ N is equidistributed. More thanthat, the subsequences

yqn+r = Q∗(qn+ r), n ∈ Z

are equidistributed for every r ∈ 0, 1, . . . , q − 1. In fact, as the reader may

readily check, these sequences may be written as ynq+r = Q(r)∗ (n) for some

polynomial Q(r) that also has degree d− 1 and, thus, the induction hypothesisapplies to each one of them as well. From these two observations it follows thatevery subsequence zqn+r, n ∈ Z is equidistributed. Consequently, zn, n ∈ N isalso equidistributed. This completes the proof of Theorem 6.4.2.

6.4.4 Exercises

6.4.1. Show that a sequence (zj)j is equidistributed on the circle if and only if

limn

1

n#1 ≤ j ≤ n : zj ∈ I = m(I)

for every segment I ⊂ S1, where m(I) denotes the length of I.

6.4.2. Show that the sequence (√n mod Z)n is equidistributed on the circle.

The same holds for the sequence (logn mod Z)n?

6.4.3. Koksma [Kok35] proved that the sequence (an mod Z)n is equidis-tributed on the circle, for Lebesgue almost every a > 1. That is not truefor every a > 1. Indeed, consider the golden ratio golden ration a = (1+

√5)/2.

Check that the sequence (an mod Z)n converges to 0 ∈ S1 when n → ∞; inparticular, it is not equidistributed on the circle.

Chapter 7

Correlations

The models of dynamical systems that interest us the most, transformations andflows, are deterministic: the state of the system at any time determines the wholefuture trajectory; when the system is invertible, the past trajectory is equallydetermined. However, these systems may also present stochastic (that is, “partlyrandom”) behavior: at some level coarser than that of individual trajectories,information about the past is gradually lost as the system is iterated. That isthe subject of the present chapter.

In Probability Theory one calls correlation between two random variablesX and Y the number

C(X,Y ) = E[(X − E[X ])(Y − E[Y ])

]= E[XY ] − E[X ]E[Y ].

Note that the expression (X−E[X ])(Y −E[Y ]) is positive if X and Y are on thesame side (either larger or smaller) of their respective means, E[X ] and E[Y ], andit is negative otherwise. Therefore, the sign of C(X,Y ) indicates whether thetwo variables exhibit, predominantly, the same behavior or opposite behaviors,relative to their means. Furthermore, correlation close to zero indicates thatthe two behaviors are little, if at all, related to each other.

Given an invariant probability measure µ of a dynamical system f : M →Mand given measurable functions ϕ, ψ : M → R, we want to analyze the evolutionof the correlations

Cn(ϕ, ψ) = C(ϕ fn, ψ)

when time n goes to infinity. We may think of ϕ and ψ as quantities that aremeasured in the system, such as temperature, acidity (pH), kinetic energy, etc.Then Cn(ϕ, ψ) measures how much the value of ϕ at time n is correlated withthe value of ψ at time zero, to what extent one value “influences” the other.

For example, if ϕ = XA and ψ = XB are characteristic functions, then ψ(x)provides information on the position of the initial point x, whereas ϕ(fn(x))informs on the position of its n-th iterate fn(x). If the correlation Cn(ϕ, ψ) issmall, then the first information is of little use to make predictions about the

187

188 CHAPTER 7. CORRELATIONS

second one. That kind of behavior, where correlations approach zero as time nincreases, is quite common in important models, as we are going to see.

We start by introducing the notions of (strong) mixing and weak mixingsystems, and by studying their basic properties (Section 7.1). In Sections 7.2and 7.3 we discuss these notions in the context of the Markov shifts, which gen-eralize the Bernoulli shifts, and of the interval exchanges, which are an extensionof the class of the circle rotations. In Section 7.4 we analyze, in quantitativeterms, the speed of decay of correlations for certain classes of functions.

7.1 Mixing systems

Let f : M → M be a measurable transformation and µ be an invariantprobability measure. The correlations sequence of two measurable functionsϕ, ψ : M → R is defined by

Cn(ϕ, ψ) =

∫(ϕ fn)ψ dµ−

∫ϕdµ

∫ψ dµ, n ∈ N. (7.1.1)

We say that the system (f, µ) is mixing if

limnCn(XA,XB) = lim

nµ(f−n(A) ∩B) − µ(A)µ(B) = 0, (7.1.2)

for any measurable sets A,B ⊂ M . In other words, when n grows the prob-ability of the event x ∈ B and fn(x) ∈ A converges to the product of theprobabilities of the events x ∈ B and fn(x) ∈ A.

Analogously, given a flow f t : M → M , t ∈ R and an invariant probabilitymeasure µ, we define

Ct(ϕ, ψ) =

∫(ϕ f t)ψ dµ−

∫ϕdµ

∫ψ dµ, t ∈ R (7.1.3)

and we say that the system (f t, µ) is mixing if

limt→+∞

Ct(XA,XB) = limt→+∞

µ(f−t(A) ∩B) − µ(A)µ(B) = 0, (7.1.4)

for any measurable sets A,B ⊂M .

7.1.1 Properties

A mixing system is necessarily ergodic. Indeed, suppose that there exists someinvariant set A ⊂M with 0 < µ(A) < 1. Taking B = Ac, we get f−n(A)∩B = ∅for every n. Then, µ(f−n(A) ∩ B) = 0 for every n, whereas µ(A)µ(B) 6= 0. Inparticular, (f, µ) is not mixing. The example that follows shows that ergodicityis strictly weaker than mixing:

Example 7.1.1. Let θ ∈ R be an irrational number. As we have seen inSection 4.2.1, the rotation Rθ : S1 → S1 is ergodic with respect to the Lebesgue

7.1. MIXING SYSTEMS 189

measure m. However, (Rθ,m) is not mixing. Indeed, if A,B ⊂ S1 are two smallintervals then R−n

θ (A)∩B is empty and, thus, m(R−nθ (A)∩B) = 0 for infinitely

many values of n. Since m(A)m(B) 6= 0, it follows that the condition in (7.1.2)does not hold.

It is clear from the definition (7.1.2) that if (f, µ) is mixing then (fk, µ) ismixing for every k ∈ N. The corresponding statement for ergodicity is false:the map f(x) = 1 − x on the set 0, 1 is ergodic with respect to the measure(δ0 + δ1)/2 but the second iterate f2 is not.

Lemma 7.1.2. Assume that limn µ(f−n(A)∩B) = µ(A)µ(B) for every pair ofsets A and B in an algebra A that generates the σ-algebra of measurable sets.Then (f, µ) is mixing.

Proof. Let C be the family of all measurable sets A such that µ(f−n(A) ∩ B)converges to µ(A)µ(B) for every B ∈ A. By assumption, C contains A. Weclaim that C is a monotone class. Indeed, let A = ∪kAk be the union of anincreasing sequence A1 ⊂ · · · ⊂ Ak ⊂ · · · of elements of C. Given ε > 0, thereexists k0 ≥ 1 such that

µ(A) − µ(Ak) = µ(A \Ak) < ε

for every k ≥ k0. Moreover, for every n ≥ 1,

µ(f−n(A) ∩B) − µ(f−n(Ak) ∩B) = µ(f−n(A \Ak) ∩B)

≤ µ(f−n(A \Ak)) = µ(A \Ak) < ε.

For each fixed k ≥ k0, the fact that Ak ∈ C ensures that there exists n(k) ≥ 1such that

|µ(f−n(Ak) ∩B) − µ(Ak)µ(B)| < ε for every n ≥ n(k).

Adding these three inequalities we conclude that

|µ(f−n(A) ∩B) − µ(A)µ(B)| < 3ε for every n ≥ n(k0).

Since ε > 0 is arbitrary, this shows that A ∈ C. In the same way, one provesthat the intersection of any decreasing sequence of elements of C is still anelement of C. So, C is indeed a monotone class. By the monotone class theorem(Theorem A.1.18), it follows that C contains every measurable set: for everymeasurable set A one has

limnµ(f−n(A) ∩B) = µ(A)µ(B) for every B ∈ A.

All that is left to do is to deduce that this property holds for every measurableset B. This follows from precisely the same kind of arguments as we have justdetailed, as the reader may readily check.


Example 7.1.3. Every Bernoulli shift (recall Section 4.2.3) is mixing. Indeed,given any two cylinders A = [p;Ap, . . . , Aq] and B = [r;Br, . . . , Bs],

µ(f−n(A) ∩B) = µ([r;Br, . . . , Bs, X, . . . , X,Ap, . . . , Aq])

= µ([r;Br, . . . , Bs])µ([p;Ap, . . . , Aq]) = µ(A)µ(B)

for every n > s−p. Let A be the algebra generated by the cylinders: its elementsare the finite pairwise disjoint unions of cylinders. It follows from what we havejust said that µ(f−n(A) ∩ B) = µ(A)µ(B) for every pair of sets A,B ∈ A andevery n sufficiently large. Since A generates the σ-algebra of measurable sets,we may use Lemma 7.1.2 to conclude that the system is mixing, as stated.

Example 7.1.4. Let g : S1 → S1 be defined by g(x) = kx, where k ≥ 2is an integer number, and let m be the Lebesgue measure on the circle. Thesystem (g,m) is equivalent to a Bernoulli shift, in the following sense. LetX = 0, 1, . . . , k − 1 and let f : M → M be the shift map in M = XN.Consider the product measure µ = νN in M , where ν is the probability measuredefined by ν(A) = #A/k for every A ⊂ X . The map

h : M → S1, h((an)n

)=

∞∑

n=1

an−1

kn

is a bijection, restricted to a full measure subset, and both h and its inverseare measurable. Moreover, h∗µ = m and h f = g h at almost every point.We say that h is an ergodic equivalence between (g,m) and (f, µ). Throughit, properties of one system may be translated to corresponding properties forthe other system. In particular, recalling Example 7.1.3, we get that (g,m) ismixing: given any measurable sets A,B ⊂ S1,

m(g−n(A) ∩B

)= µ

(h−1(g−n(A) ∩B)

)= µ

(f−n(h−1(A)) ∩ h−1(B)

)

→ µ(h−1(A))µ(h−1(B)) = m(A)m(B) when n→ ∞.

Example 7.1.5. For surjective endomorphisms of the torus (Section 4.2.5)mixing and ergodicity are equivalent properties: the system (fA,m) is mixingif and only if no eigenvalue of the matrix A is a root of unity (compare Theo-rem 4.2.14). In Exercise 7.1.4 we invite the reader to prove this fact; a strongerstatement will appear in Exercise 8.4.2. More generally, relative to the Haarmeasure, a surjective endomorphism of a compact group is mixing if and onlyif it is ergodic. In fact, even stronger statements are true, as we will commentupon in Section 9.5.3.

Let us also discuss the topological version of the notion of mixing system.For that, take the ambient M to be a topological space. A transformationf : M → M is said to be topologically mixing if, given any non-empty opensets U, V ⊂ M , there exists n0 ∈ N such that f−n(U) ∩ V is non-empty forevery n ≥ n0. This is similar, but strictly stronger than the hypothesis ofLemma 4.3.4: in the lemma we asked f−n(U) to intersect V for some value ofn, whereas now we request that to happen for every n sufficiently large.


Proposition 7.1.6. If (f, µ) is mixing then the restriction of f to the supportof µ is topologically mixing.

Proof. Denote X = supp(µ). Let A,B ⊂ X be open sets. By the definition ofsupport of a measure, µ(A) > 0 and µ(B) > 0. Hence, since µ is mixing, thereexists n0 such that µ(f−n(A) ∩ B) > µ(A)µ(B)/2 > 0 for every n ≥ n0. Inparticular, µ(f−n(A) ∩B) 6= ∅ for every n ≥ n0.

It follows directly from this proposition that if f admits some invariant prob-ability measure µ that is mixing and positive on open sets, then f is topologicallymixing. For example, given any finite set X = 1, . . . , d, the shift map

f : XZ → XZ (or f : XN → XN)

is topologically mixing. Indeed, for any probability measure ν supported on thewhole X , the Bernoulli measure µ = νZ (or µ = νN) is mixing and positive onopen sets, as we have seen in Example 7.1.3. Analogously, by Example 7.1.4,every transformation f : S1 → S1 of the form f(x) = kx with k ≥ 2 is topolog-ically mixing.

Example 7.1.7. Translations in a metrizable group G are never topologicallymixing. Indeed, consider any left-translation Lg (the case of right-translations isanalogous). We may suppose that g is not the unit element e since otherwise it isobvious that Lg is not topologically mixing. Fix some distance d invariant underall the translations of the group G (recall Lemma 6.3.6) and let α = d(e, g−1).Consider U = V = ball of radius α/4 around e. Every L−n

g (U) is a ball ofradius α/4. Assume that L−n

g (U) intersects V . Then L−ng (U) is contained in

the ball of radius 3α/4 and, thus, L−n−1g (U) is contained in the ball of radius

3α/4 around g−1. Consequently, L−n−1g (U) does not intersect V . Since n is

arbitrary, this shows that Lg is not topologically mixing.

7.1.2 Weak mixing

A system (f, µ) is weak mixing if, given any measurable sets A,B ⊂M ,

limn

1

n

n−1∑

j=0

|Cj(XA,XB)| = limn→∞

1

n

n−1∑

j=0

|µ(f−j(A)∩B)−µ(A)µ(B)| = 0. (7.1.5)

It is clear from the definition that every mixing system is also weak mixing. Onthe other hand, every weak mixing system is ergodic. Indeed, if A ⊂ M is aninvariant set then

limn

1

n

n−1∑

j=0

|Cj(XA,XAc)| = µ(A)µ(Ac)

and, hence, the hypothesis implies that µ(A) = 0 or µ(Ac) = 0.


Example 7.1.8. Translations in metrizable compact groups are never weakmixing with respect to the Haar measure µ (or any other invariant measurepositive on open sets). Indeed, as observed in Example 7.1.7, it is always possibleto choose open sets U and V such that f−n(U)∩ V is empty for at least one inevery two consecutive values n. Then,

lim infn

1

n

n−1∑

j=0

|µ(f−j(U) ∩ V ) − µ(U)µ(V )| ≥ 1

2µ(U)µ(V ) > 0.

In this way we get several examples of ergodic systems, even uniquely ergodicones, that are not weak mixing.

We are going to see in Section 7.3.2 that the family of interval exchangescontains many systems that are weak mixing (and uniquely ergodic) but are notmixing.

The proof of the next result is analogous to the proof of Lemma 7.1.2 andis left to the reader:

Lemma 7.1.9. Assume that limn(1/n)∑n−1j=0 |µ(f−j(A)∩B)− µ(A)µ(B)| = 0

for every pair of sets A and B in some algebra A that generates the σ-algebraof measurable sets. Then (f, µ) is weak mixing.

Example 7.1.10. Given a system (f, µ), let us consider the product transfor-mation f2 : M ×M →M ×M given by f2(x, y) = (f(x), f(y)). It is easy to seethat f2 preserves the product measure µ2 = µ × µ. If (f2, µ2) is ergodic then(f, µ) is ergodic: just note that if A ⊂M is invariant under f and µ(A) ∈ (0, 1)then A×A is invariant under f2 and µ2(A×A) ∈ (0, 1).

The converse is not true in general, that is, (f2, µ2) may not be ergodic evenif (f, µ) is ergodic. For example, if f : S1 → S1 is an irrational rotation and d is adistance invariant under rotations, then any neighborhood (x, y) : d(x, y) < rof the diagonal is invariant under f2.

The next result shows that this type of phenomenon can not occur in thecategory of weak mixing systems:


(a) (f, µ) is weak mixing;

(b) (f2, µ2) is weak mixing;

(c) (f2, µ2) is ergodic.

Proof. To prove that (a) implies (b), consider any measurable sets A,B,C,Din M . Then∣∣µ2(f

−j2 (A×B) ∩ (C ×D)) − µ2(A×B)µ2(C ×D)

∣∣

=∣∣µ(f−j(A) ∩ C)µ(f−j(B) ∩D) − µ(A)µ(B)µ(C)µ(D)

∣∣

≤∣∣µ(f−j(A) ∩ C) − µ(A)µ(C)

∣∣ +∣∣µ(f−j(B) ∩D) − µ(B)µ(D)

∣∣.


Therefore, the hypothesis (a) implies that

limn

1

n

n−1∑

j=0

∣∣µ2(f−j2 (A×B) ∩ (C ×D)) − µ2(A×B)µ2(C ×D)

∣∣ = 0.

It follows that

limn

1

n

n−1∑

j=0

∣∣µ2(f−j2 (X) ∩ Y ) − µ2(X)µ2(Y )

∣∣ = 0

for any X,Y in the algebra generated by the products E × F of measurablesubsets of M , that is, the algebra formed by the finite pairwise disjoint unionsof such products. Since this algebra generates the σ-algebra of measurablesubsets of M ×M , we may use Lemma 7.1.9 to conclude that (f2, µ2) is weakmixing.

It is immediate that (b) implies (c). To prove that (c) implies (a), observethat

1

n

n−1∑

j=0

[µ(f−j(A) ∩B) − µ(A)µ(B)

]2

=1

n

n−1∑

j=0

[µ(f−j(A) ∩B)2 − 2µ(A)µ(B)µ(f−j(A) ∩B) + µ(A)2µ(B)2

].

The right-hand side may be rewritten as

1

n

n−1∑

j=0

[µ2

(f−j2 (A×A) ∩ (B ×B)

)− µ2(A×A)µ2(B ×B)

]

− 2µ(A)µ(B)1

n

n−1∑

j=0

[µ(f−j(A) ∩B) − µ(A)µ(B)

].

Since (f2, µ2) is ergodic and, consequently, (f, µ) is also ergodic, part (b) ofProposition 4.1.4 gives that both terms in this expression converge to zero. Inthis way, we conclude that

limn

1

n

n−1∑

j=0

[µ(f−j(A) ∩B) − µ(A)µ(B)

]2= 0

for any measurable sets A,B ⊂M . Using Exercise 7.1.2, we deduce that (f, µ)is weak mixing.

7.1.3 Spectral characterization

In this section we discuss equivalent formulations of the notions of mixing systemand weak mixing systems, in terms of the Koopman operator.



(a) (f, µ) is mixing.

(b) There exist p, q ∈ [1,∞] with 1/p + 1/q = 1 such that Cn(ϕ, ψ) → 0 forany ϕ ∈ Lp(µ) and ψ ∈ Lq(µ).

(c) The condition in part (b) holds for ϕ in some dense subset of Lp(µ) andψ in some dense subset of Lq(µ).

Proof. Condition (a) is the special case of (b) for characteristic functions. Sincethe correlations (ϕ, ψ) 7→ Cn(ϕ, ψ) are bilinear functions, condition (a) impliesthat Cn(ϕ, ψ) → 0 for any simple functions ϕ and ψ. This implies (c), since thesimple functions form a dense subset of Lr(µ) for any r ≥ 1.

To show that (c) implies (b), let us begin by observing that as correlationsCn(ϕ, ψ) are equicontinuous functions of ϕ and ψ. Indeed, given ϕ1, ϕ2 ∈ Lp(µ)and ψ1, ψ2 ∈ Lq(µ), the Holder inequality (Theorem A.5.5) gives that

∣∣∫

(ϕ1 fn)ψ1 dµ−∫

(ϕ2 fn)ψ2 dµ∣∣ ≤ ‖ϕ1 − ϕ2‖p ‖ψ1‖q + ‖ϕ2‖p ‖ψ1 − ψ2‖q.

Moreover,

∣∣∫ϕ1 dµ

∫ψ1 dµ−

∫ϕ2 dµ

∫ψ2 dµ

∣∣ ≤ ‖ϕ1 − ϕ2‖1 ‖ψ1‖1 + ‖ϕ2‖1 ‖ψ1 − ψ2‖1.

Adding these inequalities, and noting that ‖ · ‖1 ≤ ‖ · ‖r for every r ≥ 1, we getthat

∣∣Cn(ϕ1, ψ1) − Cn(ϕ2, ψ2)∣∣ ≤ 2‖ϕ1 − ϕ2‖p ‖ψ1‖q + 2‖ϕ2‖p ‖ψ1 − ψ2‖q (7.1.6)

for every n ≥ 1. Then, given ε > 0 and any ϕ ∈ Lp(µ) and ψ ∈ Lq(µ), we maytake ϕ′ and ψ′ in the dense subsets mentioned in the hypothesis such that

‖ϕ− ϕ′‖p < ε and ‖ψ − ψ′‖q < ε.

In particular, ‖ϕ′‖p < ‖ϕ‖p + ε and ‖ψ′‖q < ‖ψ‖q + ε. Then, (7.1.6) gives that

|Cn(ϕ, ψ)| ≤ |Cn(ϕ′, ψ′)| + 2ε(‖ϕ‖p + ‖ψ‖q + 2ε) for every n.

Moreover, by hypothesis, |Cn(ϕ′, ψ′)| < ε for every n sufficiently large. Since εis arbitrary, these two inequalities imply that Cn(ϕ, ψ) converges to zero whenn→ ∞. This proves property (b).

The same argument proves the following version of Proposition 7.1.12 forthe weak mixing property:


(a) (f, µ) is weak mixing.


(b) There exist p, q ∈ [1,∞] with 1/p+1/q = 1 such that (1/n)∑nj=1 |Cj(ϕ, ψ)|

converges to 0 for any ϕ ∈ Lp(µ) and ψ ∈ Lq(µ).

(c) The condition in part (b) holds for ϕ in some dense subset of Lp(µ) andψ in some dense subset of Lq(µ).

In the case p = q = 2, we may express the correlations in terms of the innerproduct · in the Hilbert space L2(µ):

Cn(ϕ, ψ) =[Unf ϕ− (ϕ · 1)

]· ψ for every ϕ, ψ ∈ L2(µ).

Therefore, Proposition 7.1.12 gives that (f, µ) is mixing if and only if

limn

[Unf ϕ− (ϕ · 1)

]· ψ = 0 for every ϕ, ψ ∈ L2(µ) (7.1.7)

and Proposition 7.1.13 gives that (f, µ) is weak mixing if and only if

limn

1

n

n∑

j=1

∣∣[U jfϕ− (ϕ · 1)]· ψ∣∣ = 0 for every ϕ, ψ ∈ L2(µ). (7.1.8)

The condition (7.1.7) means that Unf ϕ converges weakly to ϕ · 1 =∫ϕdµ, while

(7.1.8) is a Cesaro version of that assertion. Compare both conditions with thecharacterization of ergodicity in (4.1.7).

Corollary 7.1.14. Let f : M → M be a mixing transformation relative tosome invariant probability measure µ. Let ν be any probability measure on M ,absolutely continuous with respect to µ. Then fn∗ ν converges pointwise to µ,that is, ν(f−n(B)) → µ(B) for every measurable set B ⊂M .

Proof. Let ϕ = XB and ψ = dν/dµ. Note that ϕ ∈ L∞(µ) and ψ ∈ L1(µ).Hence, by Proposition 7.1.12,

∫(XB fn)dν

dµdµ =

∫(Unf ϕ)ψ dµ →

∫ϕdµ

∫ψ dµ =

∫XB dµ

∫dν

dµdµ.

The sequence on the left-hand side coincides with∫

(XB fn) dν = ν(f−n(B)).The right-hand side is equal to µ(B)

∫1 dν = µ(B).

7.1.4 Exercises

7.1.1. Show that (f, µ) is mixing if and only if µ(f−n(A) ∩ A) → µ(A)2 forevery measurable set A.

7.1.2. Let (an)n be a bounded sequence of real numbers. Prove that

limn

1

n

n∑

j=1

|aj | = 0


if and only if there exists E ⊂ N with density zero at infinity (that is, withlimn(1/n)#(E ∩ 0, . . . , n − 1) = 0) such that the restriction of (an)n to thecomplement of E converges to zero when n→ ∞. Deduce that

limn

1

n

n∑

j=1

|aj | = 0 ⇔ limn

1

n

n∑

j=1

(aj)2 = 0.

7.1.3. Prove that if µ is weak mixing for f then µ is weak mixing for everyiterate fk, k ≥ 1.

7.1.4. Show that if no eigenvalue of A ∈ SL(d,R) is a root of unity then thelinear endomorphism fA : Td → Td induced by A is mixing, with respect to theHaar measure.

7.1.5. Let f : M → M be a measurable transformation in a metric space.Check that an invariant probability measure µ is mixing if and only if (fn∗ η)nconverges to µ in the weak∗ topology for every probability measure η absolutelycontinuous with respect to µ.

7.1.6 (Multiple von Neumann ergodic theorem). Show that if (f, µ) is weakmixing then

1

N

N−1∑

n=0

(ϕ1 fn) · · · (ϕk fkn) →∫ϕ1 dµ · · ·

∫ϕk dµ in L2(µ),

for any bounded measurable functions ϕ1, . . . , ϕk.

7.2 Markov shifts

In this section we introduce an important class of systems that generalizes thenotion of Bernoulli shift. As explained previously, Bernoulli shifts model se-quences of identical experiments such that the outcome of each experiment isindependent of all the others. In the definition of Markov shifts we weaken thisindependence condition: we allow each outcome to depend on the preceding one,but not the others. More generally, Markov shifts may be used to model theso-called finite memory processes, that is, sequences of experiments for whichthere exists k ≥ 1 such that the outcome of each experiment depends only onthe outcomes of the k previous experiments. In this regard, see Exercise 7.2.4.

To define a Markov shift, let (X,A) be a measurable space and Σ = XN

(or Σ = XZ) be the space of all sequences in X , endowed with the productσ-algebra. Let us consider the shift map

σ : Σ → Σ, σ((xn)n

)= (xn+1)n.

Let us be given a family P (x, ·) : x ∈ X of probability measures on X thatdepend measurably on the point x. They will be called transition probabilities:

7.2. MARKOV SHIFTS 197

for each measurable set E ⊂ X , the number P (x,E) is meant to represent theprobability that xn+1 ∈ E, given that xn = x. A probability measure p in X iscalled a stationary measure, relative to the family of transition probabilities, ifit satisfies

∫P (x,E) dp(x) = p(E), for every measurable set E ⊂ X. (7.2.1)

Heuristically, this means that, relative to p, a probability of the event xn+1 ∈ Eis equal to the probability of the event xn ∈ E.

Fix any stationary measure p (assuming it exists) and then define

µ([m;Am, . . . , An]

)=

=

∫

Am

dp(xm)

∫

Am+1

dP (xm, xm+1) · · ·∫

An

dP (xn−1, xn)(7.2.2)

for every cylinder [m;Am, . . . , An] of Σ. One can show (check Exercise 7.2.1)that this function extends to a probability measure in the σ-algebra generatedby the cylinders. This probability measure is invariant under the shift map σ,because the right-hand side of (7.2.2) does not depend on m. Every probabilitymeasure µ obtained in this way is called a Markov measure; moreover, thesystem (σ, µ) is called Markov shift.

Example 7.2.1. (Bernoulli measure) Suppose that P (x, ·) does not depend onx, that is, that there exists a probability measure ν on X such that P (x, ·) = νfor every x ∈ X . Then

∫P (x,E) dp(x) =

∫ν(E) dp(x) = ν(E)

for every probability measure p and every measurable set E ⊂ X . Therefore,there exists exactly one stationary measure, namely p = ν. The definition in(7.2.2) gives

µ([m;Am, . . . , An]

)=

∫

Am

dν(xm)

∫

Am+1

dν(xm+1) · · ·∫

An

dν(xn)

= ν(Am)ν(Am+1) · · · ν(An).

Example 7.2.2. Suppose that the set X is finite, say X = 1, . . . , d forsome d ≥ 2. Any family of transition probabilities P (x, ·) on X is completelycharacterized by the values

Pi,j = P (i, j), 1 ≤ i, j ≤ d. (7.2.3)

Moreover, a measure p on the set X is completely characterized by the valuespi = p(i), 1 ≤ i ≤ d. With these notations, the definition (7.2.1) translates to

d∑

i=1

piPi,j = pj , for every 1 ≤ j ≤ d. (7.2.4)


Moreover, a Markov measure µ is determined by

µ([m; am, . . . , an]

)= pamPam,am+1 · · ·Pan−1,an . (7.2.5)

In the remainder of this book we always restrict ourselves to finite Markovshifts, that is, to the context of Example 7.2.2. We take the set X endowed withthe discrete topology and the corresponding Borel σ-algebra. Observe that thematrix

P = (Pi,j)1≤i,j≤d

defined by (7.2.3) satisfies the following conditions:

(i) Pi,j ≥ 0 for every 1 ≤ i, j ≤ d;

(ii)∑d

j=1 Pi,j = 1 for every 1 ≤ i ≤ d.

We say that P is a stochastic matrix. Conversely, any matrix that satisfies (i)and (ii) defines a family of transition probabilities on the set X . Observe alsothat, denoting p = (p1, . . . , pd), the relation (7.2.4) corresponds to

P ∗p = p, (7.2.6)

where P ∗ denotes the transposed of the matrix P . In other words, the stationarymeasures correspond precisely to the eigenvectors of the transposed matrix forthe eigenvalue 1. Using the following classical result, one can show that sucheigenvalues always exist:

Theorem 7.2.3 (Perron-Frobenius). Let A be a d×d matrix with non-negativecoefficients. Then there exists λ ≥ 0 and some vector v 6= 0 with non-negativecoefficients such that Av = λv and λ ≥ |γ| for every eigenvalue γ of A.

If A has some power whose coefficients are all positive then λ > 0 and it hassome eigenvector v whose coefficients are all positive. Indeed, λ > |γ| for anyother eigenvalue γ of A. Moreover, the eigenvalue λ has multiplicity 1 and it isthe only eigenvalue of A having some eigenvector with non-negative coefficients.

A proof of the Perron-Frobenius theorem may be found in the book of Mey-ers [Mey00], for example. Applying this theorem to the matrix A = P ∗, weconclude that there exist λ ≥ 0 and p 6= 0 with pi ≥ 0 for every i, such that

d∑

i=1

piPi,j = λpj , for every 1 ≤ j ≤ d.

Adding over j = 1, . . . , d we get that

d∑

j=1

d∑

i=1

piPi,j = λd∑

j=1

pj.


Using property (ii) of the stochastic matrix, the left-hand side of this equalitymay be written as

d∑

i=1

pi

d∑

j=1

Pi,j =

d∑

i=1

pi.

Comparing the last two equalities and recalling that the sum of the coefficientsof p is a positive number, we conclude that λ = 1. This proves our claim thatthere always exist vectors p 6= 0 satisfying (7.2.6).

When Pn has positive coefficients for some n ≥ 1, it follows from Theo-rem 7.2.3 that the eigenvector is unique up to scaling, and it may be chosenwith positive coefficients.

Example 7.2.4. In general, p is not unique and it may also happen that thereis no eigenvalue with positive coefficients. For example, consider:

P =

1 − a a 0 0 0b 1 − b 0 0 00 0 1 − c c 00 0 d 1 − d 0e 0 0 0 1 − e

where a, b, c, d, e ∈ (0, 1). A vector p = (p1, p2, p3, p4, p5) satisfies P ∗p = p ifand only if ap1 = bp2 and cp3 = dp4 and p5 = 0. Therefore, the eigenspace hasdimension 2 and no eigenvector has positive coefficients.

On the other hand, suppose that p is such that pi = 0 for some i. Let µ bethe corresponding Markov measure and let Σi = (X \i)N (or Σi = (X \i)Z).Then µ(Σi) = 1, since µ([n; i]) = pi = 0 for every n. This means that we mayeliminate the symbol i, and still have a system that is equivalent to the originalone. Therefore, up to removing from the set X a certain number of superfluoussymbols, we may always take the eigenvector p to have positive coefficients.

Denote by ΣP the set of all sequences (xn)n ∈ Σ satisfying

Pxn,xn+1 > 0 for every n, (7.2.7)

that is, such that all the transitions are “allowed” by P . It is clear from thedefinition that ΣP is invariant under the shift map σ. The transformationsσ : ΣP → ΣP constructed in this way are called shifts of finite type and will bestudied in more detail in Section 10.2.2.

Lemma 7.2.5. The set ΣP is closed in Σ and, given any solution of P ∗p = pwith positive coefficients, the support of the corresponding Markov measure µcoincides with ΣP .

Proof. Let xk = (xkn)n, k ∈ N be any sequence in ΣP and suppose that itconverges in Σ to some x = (xn)n. By the definition of the topology in Σ,this means that for every n there exists kn ≥ 1 such that xkn = xn for everyk ≥ kn. So, given any n, taking k ≥ maxkn, kn+1 we conclude that Pxn,xn+1 =


Pxkn,x

kn+1

> 0. This shows that x ∈ ΣP and that proves the first part of the

lemma .

To prove the second part, recall that the cylinders [m;xm, . . . , xn] form abasis of neighborhoods of any x = (xn)n in Σ. If x ∈ ΣP then

µ([m;xm, . . . , xn]) = pxmPxm,xm+1 · · ·Pxn−1,xn > 0

for every cylinder and, thus, x ∈ suppµ. If x /∈ ΣP then there exists n suchthat Pxn,xn+1 = 0. In that case, µ([n;xn, xn+1]) = 0 and so x /∈ suppµ.

Example 7.2.6. There are three possibilities for the support of a Markovmeasure in Example 7.2.4. If p = (p1, p2, 0, 0, 0) with p1, p2 > 0 then wemay eliminate the symbols 3, 4, 5. All the sequences of symbols 1, 2 are ad-missible. Hence suppµ = 1, 2N. Analogously, if p = (0, 0, p3, p4, 0) withp3, p4 > 0 then suppµ = 3, 4N. In all the other cases, p = (p1, p2, p3, p4, 0)with p1, p2, p3, p4 > 0. Eliminating the symbol 5, we get that the set of admis-sible sequences is

ΣP = 1, 2N ∪ 3, 4N.

Both sets in this union are invariant and have positive measure. So, in this casethe Markov shift (σ, µ) is not ergodic. But it follows from the theory presentedin the next section that in the previous two cases the system (σ, µ) is indeedergodic.

In the next lemma we collect some simple properties of stochastic matricesthat will be useful in what follows:

Lemma 7.2.7. Let P be a stochastic matrix and p = (p1, . . . , pd) be a solutionof P ∗p = p. For every n ≥ 0, denote by Pni,j , 1 ≤ i, j ≤ d the coefficients of thematrix Pn. Then:

(a)∑d

j=1 Pni,j = 1 for every 1 ≤ i ≤ d and every n ≥ 1;

(b)∑d

i=1 piPni,j = pj for every 1 ≤ j ≤ d and every n ≥ 1;

(c) the hyperplane H = (h1, . . . , hd) : h1 + · · · + hd = 0 is invariant underthe matrix P ∗.

Proof. Condition (ii) in the definition of stochastic matrix may be written asPu = u, with u = (1, . . . , 1). Then Pnu = u for every n ≥ 1. This is just anotherway of writing the claim (a). Analogously, P ∗p = p implies that (P ∗)np = p forevery n ≥ 1, which is another way of writing the claim (b). Observe that H isthe orthogonal complement of vector u. Since u is invariant under P , it followsthat H is invariant under the transposed matrix P ∗, as claimed in (c).


7.2.1 Ergodicity

In this section we always take p = (p1, . . . , pd) to be a solution of P ∗p = p withpi > 0 for every i, normalized in such a way that

∑i pi = 1. Let µ be the

corresponding Markov measure. We want to understand which conditions thestochastic matrix P must satisfy for the system (σ, µ) to be ergodic.

We say that a stochastic matrix P is irreducible if for every 1 ≤ i, j ≤ dthere exists n ≥ 0 such that Pni,j > 0. In other words, P is irreducible if anyoutcome i may be followed by any outcome j, after a certain number n of stepswhich may depend on i and j.

Theorem 7.2.8. The Markov shift (σ, µ) is ergodic if and only if the matrix Pis irreducible.

The remainder of the present section is dedicated to the proof of this theo-rem. We start by proving the following useful estimate:

Lemma 7.2.9. Let A = [m; am, . . . , aq] and B = [r; br, . . . , bs] be cylinders ofΣ with r > q. Then

µ(A ∩B) = µ(A)µ(B)P r−qaq ,br

pbr

.

Proof. We may write A ∩B as a disjoint union

A ∩B =⋃

x

[m; am, . . . , aq, xq+1, . . . , xr−1, br, . . . , bs],

over all x = (xq+1, . . . , xr−1) ∈ Xr−q−1. Then,

µ(A ∩B) =∑

x

pamPam,am+1 · · ·Paq−1,aqPaq,xq+1 . . . Pxr−1,brPbr ,br+1 · · ·Pbs−1,bs

= µ(A)∑

x

Paq,xq+1 . . . Pxr−1,br

1

pbr

µ(B).

The sum in this last expression is equal to P r−qaq,br. Therefore,

µ(A ∩B) = µ(A)µ(B)P r−qaq ,br/pbr ,

as stated.

Lemma 7.2.10. A stochastic matrix P is irreducible if and only if

limn

1

n

n−1∑

l=0

P li,j = pj for every 1 ≤ i, j ≤ d. (7.2.8)

Proof. Assume that (7.2.8) holds. Recall that pj > 0 for every j. Then, givenany 1 ≤ i, j ≤ d, we have P li,j > 0 for infinitely many values of l. In particular,P is irreducible.


To prove the converse, consider A = [0; i] and B = [0; j]. By Lemma 7.2.9,

1

n

n−1∑

l=0

µ(A ∩ σ−l(B)) =1

pjµ(A)µ(B)

1

n

n−1∑

l=0

P li,j .

According to the Exercise 4.1.2, the left-hand side converges when n → ∞.Therefore,

Qi,j = limn

1

n

n−1∑

l=0

P li,j

exists for every 1 ≤ i, j ≤ d. Consider the matrix Q = (Qi,j)i,j , that is,

Q = limn

1

n

n−1∑

l=0

P l. (7.2.9)

Using Lemma 7.2.7(b) and taking the limit when n→ ∞, we get that

d∑

i=1

piQi,j = pj for every 1 ≤ j ≤ d. (7.2.10)

Observe also that, given any k ≥ 1,

P kQ = limn

1

n

n−1∑

l=0

P k+l = limn

1

n

n−1∑

l=0

P l = Q. (7.2.11)

It follows that Qi,j does not depend on i. Indeed, suppose that there exist rand s such that Qr,j < Qs,j. Of course, we may choose s in such a way that theright-hand side of this inequality is largest. Since P is irreducible, there existsk such that P ks,r > 0. Hence, using (7.2.11) followed by Lemma 7.2.7(a),

Qs,j =

d∑

i=1

P ks,iQi,j < (

d∑

i=1

P ks,i)Qs,j = Qs,j ,

which is a contradiction. This contradiction proves that Qi,j does not dependon i, as claimed. Write Qj = Qi,j for any i. The property (7.2.10) gives that

pj =

d∑

i=1

Qi,jpi = Qj(

d∑

i=1

pi) = Qj ,

for every j. This finishes the proof of the lemma.

Proof of Theorem 7.2.8. Suppose that µ is ergodic. Let A = [0; i] andB = [0; j].By Proposition 4.1.4,

limn

1

n

n−1∑

l=0

µ(A ∩ σ−l(B)) = µ(A)µ(B) = pipj . (7.2.12)


On the other hand, by Lemma 7.2.9, we have that µ(A ∩ σ−l(B)) = piPli,j .

Using this identity in (7.2.12) and dividing both sides by pi we find that

limn

1

n

n−1∑

l=0

P li,j = pj .

Note that j is arbitrary. So, by Lemma 7.2.10, this proves that P is irreducible.Now suppose that the matrix P is irreducible. We want to conclude that µ

is ergodic. According to Corollary 4.1.5, it is enough to prove that

limn

1

n

n−1∑

l=0

µ(A ∩ σ−l(B)) = µ(A)µ(B) (7.2.13)

for any A and B in the algebra generated by the cylinders. Since the elementsof this algebra are the finite pairwise disjoint unions of cylinders, it suffices toconsider the case when A and B are cylinders, say A = [m; am, . . . , aq] andB = [r; br, . . . , bs]. Observe also that the validity of (7.2.13) is not affected ifone replaces B by some pre-image σ−j(B). So, it is no restriction to supposethat r > q. Then, by Lemma 7.2.9,

1

n

n−1∑

l=0

µ(A ∩ σ−l(B)) = µ(A)µ(B)1

pbr

1

n

n−1∑

l=0

P r−q+laq ,br

for every n. By Lemma 7.2.10,

limn

1

n

n−1∑

l=0

P r−q+laq,br= lim

n

1

n

n−1∑

l=0

P laq ,br= pbr .

This proves the property (7.2.13) for the cylinders A and B.

7.2.2 Mixing

In this section we characterize the Markov shifts that are mixing, in terms ofthe corresponding stochastic matrix P . As before, we take p to be a normalizedsolution of P ∗p = p with positive coefficients and µ to be the correspondingMarkov measure.

We say that a stochastic matrix P is aperiodic if there exists n ≥ 1 such thatPni,j > 0 for every 1 ≤ i, j ≤ d. In other words, P is aperiodic if some power Pn

has only positive coefficients. The relation between the notions of aperiodicityand irreducibility is analyzed in Exercise 7.2.6.

Theorem 7.2.11. The Markov shift (σ, µ) is mixing if and only if the matrixP is aperiodic.

For the proof of Theorem 7.2.11 we need the following fact:


Lemma 7.2.12. A stochastic matrix P is aperiodic if and only if

limlP li,j = pj for every 1 ≤ i, j ≤ d. (7.2.14)

Proof. Since we assume that pj > 0 for every j, it is clear that (7.2.14) impliesthat P li,j > 0 for every i, j and every l sufficiently large.

Now suppose that P is aperiodic. Then we may apply the theorem of Perron-Frobenius (Theorem 7.2.3) to the matrix A = P ∗. Since p is an eigenvector ofA with positive coefficients, we get that λ = 1 and all the other eigenvalues ofA are smaller than 1 in absolute value. By Lemma 7.2.7(c), the hyperplane Hformed by the vectors (h1, . . . , hd) with h1 + · · · + hd = 0 is invariant under A.It is clear that H is transverse to the direction of p. Then the decomposition

Rd = Rp⊕H (7.2.15)

is invariant under A and the restriction of A to the hyperplane H is a con-traction, meaning that its spectral radius is smaller than 1. It follows that thesequence (Al)l converges to the projection on the first coordinate of (7.2.15),that is, to the matrix B characterized by Bp = p and Bh = 0 for every h ∈ H .In other words, (P l)l converges to B∗. Observe that

Bi,j = pi for every 1 ≤ i, j ≤ d.

Therefore, limn Pli,j = Bj,i = pj for every i, j.

Proof of Theorem 7.2.11. Suppose that the measure µ is mixing. Let A = [0; i]and B = [0; j]. By Lemma 7.2.9, we have that µ(A∩σ−l(B)) = piP

li,j for every

l. Therefore,

pi limlP li,j = lim

lµ(A ∩ σ−l(B)) = µ(A)µ(B) = pipj .

Dividing both sides by pi we get that liml Pli,j = pj . According to Lemma 7.2.12,

this proves that P is aperiodic.Now suppose that the matrix P is aperiodic. We want to conclude that µ is

mixing. According Lemma 7.1.2, it is enough to prove that

limlµ(A ∩ σ−l(B)) = µ(A)µ(B) (7.2.16)

for any A and B in the algebra generated by the cylinders. Since the elementsof this algebra are the finite pairwise disjoint unions of cylinders, it sufficesto treat the case when A and B are cylinders, say A = [m; am, . . . , aq] andB = [r; br, . . . , bs]. By Lemma 7.2.9,

µ(A ∩ σ−l(B)) = µ(A)µ(B)1

pbr

P r−q+laq,br

for every l > q − r. Then, using Lemma 7.2.12,

limlµ(A ∩ σ−l(B)) = µ(A)µ(B)

1

pbr

limlP r−q+laq,br

= µ(A)µ(B)1

pbr

limlP laq,br

= µ(A)µ(B).


This proves the property (7.2.16) for cylinders A and B.

Example 7.2.13. In Example 7.2.4 we found different types of Markov mea-sures, depending on the choice of the probability eigenvector p. In the firstcase, p = (p1, p2, 0, 0, 0) and the measure µ is supported on 1, 2N. Once thesuperfluous symbols 3, 4, 5 have been removed, the stochastic matrix reducesto

P =

(1 − a ab 1 − b

).

Since this matrix is aperiodic, the Markov measure µ is mixing. The secondcase is entirely analogous. In the third case, p = (p1, p2, p3, p4, 0) and, afterremoving the superfluous symbol 5, the stochastic matrix reduces to

P =

1 − a a 0 0b 1 − b 0 00 0 1 − c c0 0 d 1 − d

.

This matrix is not irreducible and, hence, the Markov measures that one findsin this case are not ergodic (recall also Example 7.2.6).

Example 7.2.14. It is not difficult to find examples of irreducible matricesthat are not aperiodic:

P =

0 1/2 0 1/21/2 0 1/2 00 1/2 0 1/2

1/2 0 1/2 0

.

Indeed, we see that Pni,j > 0 if and only if n has the same parity as i− j. Notethat

P 2 =

1/2 0 1/2 00 1/2 0 1/2

1/2 0 1/2 00 1/2 0 1/2

.

Exercise 7.2.6 shows that every irreducible matrix has a form of this type.

7.2.3 Exercises

7.2.1. Let X = 1, . . . , d and P = (Pi,j)i,j be a stochastic matrix and p = (pi)ibe a probability vector such that P ∗p = p. Show that the function defined onthe set of all cylinders by

µ([m; am, . . . , an]

)= pamPam,am+1 · · ·Pan−1,an

extends to a measure on the Borel σ-algebra of Σ = XN (or Σ = XZ), invariantunder the shift map σ : Σ → Σ.


7.2.2. Prove that every weak mixing Markov shift is actually mixing.

7.2.3. Let X = 1, . . . , d and let µ be a Markov measure for the shift mapσ : XZ → XZ. Does it follow that µ is also a Markov measure for the inverseσ−1 : Σ → Σ?

7.2.4. Let X be a finite set and Σ = XZ (or Σ = XN). Let µ be a probabilitymeasure on Σ, invariant under the shift map σ : Σ → Σ. Given k ≥ 0, we saythat µ has memory k if

µ([m− l; am−l, . . . , am−1, am])

µ([m− l; am−l, . . . , am−1])=µ([m− k; am−k, . . . , am−1, am])

µ([m− k; am−k, . . . , am−1]).

for every l ≥ k, every m and every (an)n ∈ Σ (by convention, the equality holdswhenever at least one of the denominators is zero). Check that the measureswith memory 0 are the Bernoulli measures and the measures with memory 1are the Markov measures. Show that every measure with memory k ≥ 2 isequivalent to a Markov measure in the space Σ = XZ (or Σ = XN), whereX = Xk.

7.2.5. The goal is to show that the set of all measures with finite memory isdense in the space M1(σ) of all probability measures invariant under the shiftmap σ : Σ → Σ. Given any invariant probability measure µ and any k ≥ 1,consider the function µk defined on the set of all cylinders by

• µk = µ for cylinders with length less or equal than k;

• for every l ≥ k, every m and every (an)n ∈ Σ,

µk([m− l; am−l, . . . , am−1, am])

µk([m− l; am−l, . . . , am−1])=µ([m− k; am−k, . . . , am−1, am])

µ([m− k; am−k, . . . , am−1])

Show that µk extends to a probability measure on the Borel σ-algebra of Σ,invariant under the shift map and with memory k. Check that limk µk = µ inthe weak∗ topology.

7.2.6. Let P be an irreducible stochastic matrix. The goal is to show that thereexist κ ≥ 1 and a partition of X into κ subsets, such that the restriction of P κ

to each of these subsets is aperiodic. For that:

(1) For every i ∈ X , define R(i) = n ≥ 1 : Pni,i > 0. Show that R(i) isclosed under addition: if n1, n2 ∈ R(i) then n1 + n2 ∈ R(i).

(2) Let R ⊂ N be closed under addition and let κ ≥ 1 be the greatest commondivisor of its elements. Show that there exists m ≥ 1 such that R ∩[m,∞) = κN ∩ [m,∞).

(3) Show that the greatest common divisor κ of the elements of R(i) does notdepend on i ∈ X and that P is aperiodic if and only if κ = 1.

(4) Assume that κ ≥ 2. Find a partition Xr : 0 ≤ r < κ of X such that therestriction of P κ to each Xr is aperiodic.

7.3. INTERVAL EXCHANGES 207

7.3 Interval exchanges

By definition, an interval exchange is a bijection of the interval [0, 1) with a finitenumber of discontinuities and whose restriction to every continuity subintervalis a translation. Figure 7.1 describes an example with 4 continuity subintervals.To fix ideas, we always take the transformation to be continuous on the right,that is, we take all continuity subintervals to be closed on the left and open onthe right.

T C A G

f(G)

f(A)

f(C)

f(T )

Figure 7.1: An interval exchange

As a direct consequence of the definition, every interval exchange preservesthe Lebesgue measure on [0, 1). These transformations exhibit a very rich dy-namical behavior and they also have important connections with many othersystems, such as polygonal billiards, conservative flows on surfaces and Te-ichmuller flows. For example, the construction that we sketch in the sequelshows that interval exchanges arise naturally as Poincare return maps of con-servative vector fields on surfaces.

Example 7.3.1. Let S be an orientable surface and ω be an area form in S,that is, a differential 2-form that is non-degenerate at every point. We mayassociate with every vector field X a differential 1-form β, defined by

βx(v) = ωx(X(x), v) for every vector v ∈ TxS.

Observe that X and β have the same zeros. Moreover, at all other points thekernel of β coincides with the direction of the vector field. The 1-form β permitsto define a notion of “transverse arc-length” of curves c : [a, b] → S, as follows:

ℓ(c) =

∫

c

β =

∫ b

a

βc(t)(c(t)) dt.

Note that the flow trajectories have transverse arc-length zero. However, forcurves transverse to the flow, the measure ℓ is equivalent to the usual arc-lengthmeasure, in the sense that they have the same zero measure sets. We can show(see Exercise 7.3.1) that the 1-form β is closed if and only if X preserves area.Then, using the theorem of Green, the Poincare maps of the flow preserve the


transverse length. With an additional hypothesis on the zeros of X , the first-return map f : Σ → Σ to any cross-section Σ is well defined and is continuousoutside a finite subset of Σ. Then, parameterizing Σ by transverse arc length,f is an interval exchange.

An interval exchange is determined by two ingredients. The first one, of acombinatorial nature, concerns the number of continuity subintervals, the orderof these subintervals and the order of their images inside the interval [0, 1). Thismay be informed by assigning a label (for example, a letter) to each continuitysubinterval and to its image, and by listing these labels in their correspondingorders, in two horizontal rows. For example, in the case described in Figure 7.1,we obtain

π =

(T C A GG A C T

).

Note that the choice of the labels is arbitrary. We denote by A, and call alphabet,the set of all labels.

The second ingredient, of a metric nature, concerns the lengths of the subin-tervals. This may be expressed through a vector with positive coefficients,indexed by the alphabet: each coefficient determines the length of the corre-sponding continuity subinterval (and of its image). In the case of Figure 7.1this length vector has the form

λ = (λT , λC , λA, λG).

The sum of the coefficients of a length vector is always equal to 1.

Then, the interval exchange f : [0, 1) → [0, 1) associated with each pair (π, λ)is defined as follows. For every label α ∈ A, denote by Iα the correspondingcontinuity subinterval and define wα = v1 − v0, where v0 is the sum of thelengths λβ corresponding to all labels β to the left of α on the top row of π andv1 is the sum of the lengths λγ corresponding to all the labels γ to the left of αon the bottom row of π. Then

f(x) = x+ wα for every x ∈ Iα.

The vector w = (wα)α∈A is called translation vector. Clearly, for each fixed π,the translation vector is a linear function of the length vector λ = (λα)α∈A.

Example 7.3.2. The simplest interval exchanges have only two continuitysubintervals. See Figure 7.2. Choosing the alphabet A = A,B, we get

π =

(A BB A

)and f(x) =

x+ λB for x ∈ IAx− λA = x+ λB − 1 for x ∈ IB.

This transformation corresponds precisely to the rotation RλB if we identify[0, 1) with the circle S1, in the natural way. In this sense, the class of intervalexchanges is a generalization of the family of circle rotations.


A B

f(B)

f(A)

Figure 7.2: Rotation viewed as an exchange of two intervals

7.3.1 Minimality and ergodicity

As we saw previously, a circle rotation Rθ is minimal if and only if θ is irra-tional. Moreover, in that case Rθ is also uniquely ergodic. Given that almostevery number is irrational, this means that minimality and unique ergodicityare typical in the family of circle rotations. In this section we discuss the twoproperties in the broader context of interval exchanges.

Let us start with an observation that has no analogue for rotations. We saythat the combinatorics π of an interval exchange is reducible if there exists someposition such that the labels to the left of that position in the two rows of π areexactly the same. For example,

π =

(B X O L F DX O B F D L

)

is reducible, as the labels to the left of the fourth position are the same in bothrows: B, O and X . As a consequence, for any length vector λ, the intervalexchange f defined by (π, λ) leaves the subinterval IB ∪ IO ∪ IX invariant. Inparticular, f can not be minimal, not even transitive. In what follows we alwaysassume the combinatorics π to be irreducible.

It is natural to ask whether the interval exchange is minimal whenever thelength vector λ = (λα)α∈A is rationally independent, that is, whenever

∑

α∈A

nαλα 6= 0

for every non-zero vector (nα)α∈A with integer coefficients. This turns out tobe true but, in fact, the hypothesis of rational independence is a bit too strong:we are going to present a somewhat more general condition that still impliesminimality.

We denote by ∂Iα the left endpoint of each subinterval Iα. We say that apair (π, λ) satisfies the Keane condition if the trajectories of these points aredisjoint:

fm(∂Iα) 6= ∂Iβ for every m ≥ 1 and any α, β ∈ A with ∂Iβ 6= 0 (7.3.1)


(note that there always exist α and β such that f(∂Iα) = 0 = ∂Iβ). We leavethe proof of the next lemma as an exercise (Exercise 7.3.2):

Lemma 7.3.3. (1) If the pair (π, λ) satisfies the Keane condition then thecombinatorics π is irreducible.

(2) If π is irreducible and λ is rationally independent then the pair (π, λ)satisfies the Keane condition.

Since the subset of rationally independent vectors has full Lebesgue measure,it follows that the Keane condition is satisfied for almost every length vector λ,if π is irreducible.

Example 7.3.4. In the case of two subintervals (recall Example 7.3.2), theinterval exchange has the form fm(x) = x + mλB mod Z. Then, the Keanecondition means that

mλB 6= λA + n and λA +mλB 6= λA + n

for every m ∈ N and n ∈ Z. It is clear that this holds if and only if the vector(λA, λB) is rationally independent.

Example 7.3.5. For exchanges of 3 or more intervals, the Keane condition isstrictly weaker than the rational independence of the length vector. Consider,for example,

π =

(A B CC A B

).

Then fm(x) = x+mλC mod Z and, thus, the Keane condition means that

mλC , λA +mλC , λA + λB +mλC and λA + n, λA + λB + n

are disjoint for every m ∈ N and n ∈ Z. Equivalently,

pλC /∈ q, λA + q for every p ∈ Z and q ∈ Z.

This may hold even when (λA, λB, λC) is rationally dependent.

The following result was proved by Michael Keane [Kea75]:

Theorem 7.3.6 (Keane). If (π, λ) satisfies the Keane condition then the inter-val exchange f is minimal.

Example 7.3.7. The Keane condition is not necessary for minimality. Forexample, consider the interval exchange defined by (π, λ), where

π =

(A B C DD C B A

)

λA = λC , λB = λD and λA/λB = λC/λD is irrational. Then (π, λ) does notsatisfy the Keane condition and yet f is minimal.


As observed previously, every minimal circle rotation is also uniquely ergodic.This is still true for exchanges of 3 intervals, but not in general. Indeed, Keanegave an example of an exchange of 4 intervals exhibiting two ergodic probabilitymeasures, notwithstanding the fact that the combinatorics π is irreducible andthe length vector λ is rationally independent.

Keane conjectured that, nevertheless, it should be true that almost everyinterval exchange is uniquely ergodic. The following remarkable result, obtainedindependently by Howard Masur [Mas82] and William Veech [Vee82], establishedthis conjecture:

Theorem 7.3.8 (Masur, Veech). Assume that π is irreducible. Then, forLebesgue almost every length vector λ, the interval exchange defined by (π, λ) isuniquely ergodic.

A bit before, Michael Keane and Gerard Rauzy [KR80] had shown thatunique ergodicity holds for a residual (Baire second category) subset of lengthvectors whenever the combinatorics is irreducible.

7.3.2 Mixing

The interval exchanges provide many examples of systems that are uniquelyergodic and weak mixing but not (strongly) mixing.

Indeed, the theorem of Masur-Veech (Theorem 7.3.8) asserts that almostevery interval exchange is uniquely ergodic. Another deep theorem, due to ArturAvila and Giovanni Forni [AF07], states that, circle rotations (more precisely,interval exchanges with a unique discontinuity point) excluded, almost everyinterval exchange is weak mixing. The topological version of this fact had beenproved by Arnaldo Nogueira and Donald Rudolph [NR97].

On the other hand, a result of Anatole Katok [Kat80] that we are going todiscuss in the sequel asserts that interval exchanges are never mixing:

Theorem 7.3.9. Let f : [0, 1) → [0, 1) be an interval exchange and µ be aninvariant probability measure. Then (f, µ) is not mixing.

Proof. We may take µ to be ergodic, for otherwise the conclusion is obvious.If µ has some atom then its support is a periodic orbit and, thus, µ can notbe mixing. Hence, we may also take µ to be non-atomic. Denote by m theLebesgue measure on the interval and consider the map

h : [0, 1) → [0, 1), h(x) = µ([0, x]).

Then h is a homeomorphism and satisfies h∗µ = m. Consequently, the mapg = h f h−1 : [0, 1) → [0, 1) has finitely many discontinuity points andpreserves the Lebesgue measure. In particular, the restriction of g to eachcontinuity subinterval is a translation. Therefore, g is also an interval exchange.It is clear that (f, µ) is mixing if and only if (g,m) is mixing. Therefore, to proveTheorem 7.3.9 it is no restriction to suppose that µ is the Lebesgue measure m.We do that from now on.


Our goal is to find a measurable set X such that m(X ∩ f−n(X)) does notconverge to m(X)2 when n→ ∞. Let d = #A.

Lemma 7.3.10. Every interval J = [a, b) contained in some Iβ admits a par-tition J1, . . . , Js into no more than d+ 2 subintervals of the form Ji = [ai, bi)and admits natural numbers t1, . . . , ts ≥ 1 such that

(a) fn(Ji) ∩ J = ∅ for every 0 < n < ti and 1 ≤ i ≤ s;

(b) f ti | Ji is a translation for every 1 ≤ i ≤ s;

(c) f t1(J1), . . . , fts(Js) is a partition of J ;

(d) the intervals fn(Ji), 1 ≤ i ≤ s, 0 ≤ n < ti are pairwise disjoint;

(e)⋃∞n=0 f

n(J) =⋃si=1

⋃ti−1n=0 f

n(Ji).

Proof. Let B be the set formed by the endpoints a and b of J together with theendpoints ∂Iα, α ∈ A minus the origin. Then #B ≤ d + 1. Let BJ ⊂ J bethe set of points x ∈ J for which there exists m ≥ 1 such that fm(x) ∈ B andfn(x) /∈ J for every 0 < n < m. The fact that f is injective, together with thedefinition of m, implies that the map

BJ → B, x 7→ fm(x)

is injective. In particular, #BJ ≤ #B ≤ d + 1. Consider the partition of Jinto subintervals Ji = [ai, bi) with endpoints ai, bi in the set BJ ∪ a, b. Thispartition has at most d + 2 elements. By the Poincare recurrence theorem,for each i there exists ti ≥ 1 such that f ti(Ji) intersects J . Take ti minimumwith this property. Part (a) of the lemma is an immediate consequence. Bythe definition of BJ , the restriction of f ti to the interval Ji is a translation,as stated in part (b), and its image is contained in J . Moreover, the imagesf ti(Ji), 1 ≤ i ≤ s are pairwise disjoint, since f is injective and the ti are thefirst-return times to J . In particular,

s∑

i=1

m(f ti(Ji)) =

s∑

i=1

m(Ji) = m(J)

and so ∪si=1fti(Ji) = J . This proves part (c). Part (d) also follows directly from

the fact that f is injective and the ti are the first-return times to J . Finally,part (e) is a direct consequence of part (c).

Consider any interval J contained in some Iβ . By ergodicity, the invariantset ∪∞

n=0fn(J) has full measure. By part (e) of Lemma 7.3.10, this set is a finite

union of intervals closed on the left and open on the right. Therefore,

∞⋃

n=0

fn(J) =

s⋃

i=1

ti−1⋃

n=0

fn(Ji) = I

So, by Lemma 7.3.10(d), the family PJ = fn(Ji) : 1 ≤ i ≤ s and 0 ≤ n < tiis a partition of I.


Lemma 7.3.11. Given δ > 0 and N ≥ 1, we may choose the interval J in sucha way that diamPJ < δ and ti ≥ N for every i.

Proof. It is clear that diam fn(Ji) = diamJi ≤ diamJ for every i and everyn. Hence, diamPJ < δ as long as we pick J with diameter smaller than δ.To get the second property in the statement, take any point x ∈ I such thatfn(x) 6= ∂Iα for every 0 ≤ n < N and α ∈ A. We claim that fn(x) 6= x forevery 0 < n < N . Otherwise, since fn is a translation in the neighborhood of x,we would have fn(y) = y for every point y in that neighborhood, which wouldcontradict the hypothesis that (f,m) is ergodic. This proves our claim. Nowit suffices to take J = [x, x + ε) with ε < min0<n<N d(x, f

n(x)) to ensure thatti ≥ N for every i.

Lemma 7.3.12. For every 1 ≤ i ≤ s there exist si ≤ d + 2 and naturalnumbers ti,1, . . . , ti,si such that ti,j ≥ ti and, given any set A in the algebraAJ generated by PJ , there exists ti,j such that

m(A ∩ f−ti,j (A)) ≥ 1

(d+ 2)2m(A). (7.3.2)

Proof. Applying Lemma 7.3.10 to each of the intervals Ji, 1 ≤ i ≤ s we findsi ≤ d+2, a partition Ji,j : 1 ≤ j ≤ si of the interval Ji and natural numbersti,j such that each ti,j is the first-return time of the points of Ji,j to Ji. It isclear that ti,j ≥ ti, since ti is the first-return time of any point of Ji to theinterval J . The fact that Ji,j ⊂ f−ti,j (Ji) implies that

fn(Ji) =

si⋃

j=1

fn(Ji,j) ⊂si⋃

j=1

f−ti,j (fn(Ji)) for every n ≥ 0.

Since the algebra AJ is formed by the finite pairwise disjoint unions of intervalsfn(Ji), 0 ≤ n < ti, it follows that

A ⊂s⋃

i=1

si⋃

j=1

f−ti,j (A) for every A ∈ AJ .

In particular, m(A) ≤ ∑si=1

∑si

j=1m(A ∩ f−ti,j (A)). Recalling that s ≤ d + 2and si ≤ d+ 2 for every i, this implies (7.3.2).

We are ready to conclude the proof of Theorem 7.3.9. For that, let us fix ameasurable set X ⊂ [0, 1) with

0 < m(X) <1

4(d+ 2)2.

By Lemma 7.3.11, given any N ≥ 1 we may find an interval J ⊂ [0, 1) such thatall the first-return times ti ≥ N and there exists A ∈ AJ such that

m(X∆A) <1

4m(X)2. (7.3.3)


Applying Lemma 7.3.12, we get that there exists ti,j ≥ ti ≥ N such that:

m(X ∩ f−tij (X)) ≥ m(A ∩ f−tij (A)) − 2m(X∆A)

≥ 1

(d+ 2)2m(A) − 1

2m(X)2.

The relation (7.3.3) implies that m(A) ≥ (3/4)m(X). Therefore,

m(X ∩ f−tij(X)) ≥ 3

4

1

(d+ 2)2m(X) − 1

2m(X)2

≥ 3m(X)2 − 1

2m(X)2 > 2m(X)2

This proves that lim supnm(X ∩ f−n(X)) ≥ 2m(X)2 and so the system (f,m)can not be mixing.

7.3.3 Exercises

7.3.1. Let ω be an area form on a surface. Let X be a differentiable vector fieldon S and β be the differential 1-form defined on S by βx = ωx(X(x), ·). Showthat β is closed if and only if X preserves the area measure.

7.3.2. Prove Lemma 7.3.3.

7.3.3. Show that if (π, λ) satisfies the Keane condition then f has no periodicpoints. [Observation: This is a step in the proof of Theorem 7.3.6.]

7.3.4. Let f : [0, 1) → [0, 1) be an irreducible interval exchange and let a ∈ (0, 1)be the largest of all the discontinuity points of f and f−1. The Rauzy-Veechrenormalization R(f) : [0, 1) → [0, 1) is defined by R(f)(x) = g(ax)/a, where gis the first-return map of f to the interval [0, a). Check that R(f) is an intervalexchange with the same number of continuity subintervals as f , or less. If f isdescribed by the data (π, λ), how can we describe R(f)?

7.3.5. Given d ≥ 2 and a bijection σ : N → N without periodic points, considerthe transformation f : [0, 1] → [0, 1] where each f(x) is obtained by permutingthe digits of the base d expansion of x as prescribed by σ. More precisely,if x =

∑∞n=1 and

−n with an ∈ 0, . . . , d − 1 and infinitely many values of nsuch that an < d − 1, then f(x) =

∑∞n=1 aσ(n)d

−n. Show that f preserves theLebesgue measure m in the interval and that (f,m) is mixing.

7.4 Decay of correlations

In this section we discuss how quickly does the correlations sequence Cn(ϕ, ψ)decay to zero, in a mixing system. Since we are dealing with deterministicsystems, we can not expect interesting estimates to hold for arbitrary functions.However, as we are going to see, such estimates do exist in many important

7.4. DECAY OF CORRELATIONS 215

cases, if we restrict ϕ and ψ to suitable subsets of functions. Given that thecorrelations (ϕ, ψ) 7→ Cn(ϕ, ψ) are bilinear functions, it is natural to considersubsets that are vector subspaces.

We say that (f, µ) has exponential decay of correlations on a given vectorspace V if there exists λ < 1 and for every ϕ, ψ ∈ V there exists A(ϕ, ψ) > 0such that

|Cn(ϕ, ψ)| ≤ A(ϕ, ψ)λn for every n ≥ 1. (7.4.1)

There are similar notions (polynomial decay, for instance) where the exponentialλn is replaced by some other sequence converging to zero.

To illustrate the theory, let us analyze the issue of decay of correlations in thecontext of one-sided Markov shifts. That will also allow us to introduce severalideas that will be useful later in more general situations. Let f : M → M bethe shift map in M = XN, where X = 1, . . . , d is a finite set. Let P = (Pi,j)i,jbe an aperiodic stochastic matrix and p = (pi)i be the positive eigenvector ofP ∗, normalized by p1 + · · · + pd = 1. Let µ be the Markov measure defined inM by (7.2.2).

Consider L = G−1P ∗G, where G is the diagonal matrix whose coefficientsare p1, . . . , pd. The coefficients of L are given by

Li,j =pjpiPj,i for each 1 ≤ i, j ≤ d.

Recall that we denote u = (1, . . . , 1) and H = (h1, . . . , hd) : h1 + · · ·+hd = 0.Let

V = (v1, . . . , vd) : p1v1 + · · · + pdvd = 0.Then G(u) = p and G(V ) = H . Recalling (7.2.15), it follows that the decom-position

Rd = Ru⊕ V (7.4.2)

is invariant under L and all the eigenvalues of the restriction of L to V aresmaller than 1 in absolute value. We say that L has the spectral gap property ifthe largest eigenvalue is simple and all the rest of the spectrum is contained ina closed disk with strictly smaller radius.

The transfer operator of the shift map f is the linear operator Lf mappingeach function ψ : M → R to the function Lfψ : M → R defined by

Lfψ(x1, . . . , xn, . . . , ) =d∑

x0=1

Lx1,x0ψ(x0, x1, . . . , xn, . . . ). (7.4.3)

The transfer operator is dual to the Koopman operator Uf , in the followingsense: ∫

ϕ(Lfψ) dµ =

∫(Ufϕ)ψ dµ (7.4.4)

for any bounded measurable functions ϕ, ψ. Let us prove this fact.We call a function ϕ : M → R locally constant if there is n ≥ 0 such that

every ϕ(x) depends only on the first n coordinates x0, . . . , xn−1 of the point x.


For example, characteristic functions of cylinders are locally constant functions.Since every bounded measurable function is a uniform limit of linear combi-nations of characteristic functions of cylinders, it follows that every boundedmeasurable function is the uniform limit of some sequence of locally constantfunctions. Thus, to prove (7.4.4) it suffices to consider the case when ϕ and ψare both locally constant.

Then, consider functions ϕ and ψ that depend only on the first n coordinates.By the definition of Markov measure,∫ϕ(Lfψ) dµ =

∑

a1,...,an

pa1Pa1,a2 · · ·Pan−1,anϕ(a1, . . . , an)Lfψ(a1, . . . , an).

Using the definition of the transfer operator, the right-hand side of this expres-sion is equal to

∑

a0,a1,...,an

pa0Pa0,a1Pa1,a2 · · ·Pan−1,anϕ(a1, . . . , an)ψ(a0, a1, . . . , an).

Observe that ϕ(a1, . . . , an) = Ufϕ(a0, a1, . . . , an). So, using once more thedefinition of the Markov measure, this last expression is equal to

∫(Ufϕ)ψ dµ.

This proves the duality property (7.4.4).As a consequence, we may write the correlations sequence in terms of the

iterates of the transfer operator:

Cn(ϕ, ψ) =

∫(Unf ϕ)ψ −

∫ϕdµ

∫ψ dµ =

∫ϕ(Lnfψ −

∫ψ dµ

)dµ. (7.4.5)

The property Lu = u means that∑

j Li,j = 1 for every j. This has the followinguseful consequence:

sup |Lfψ| ≤ sup |ψ| for every ψ. (7.4.6)

Taking ϕ ≡ 1 in (7.4.4) we get the following special case, which will also beuseful in the sequel:

∫Lfψ dµ =

∫ψ dµ for every ψ. (7.4.7)

Now let us denote by E0 the subset of functions ψ that depend only on thefirst coordinate. The map ψ 7→ (ψ(1), . . . , ψ(d)) is an isomorphism between E0

and the Euclidean space Rd. Moreover, the definition

Lfψ(x1) =

d∑

x0=1

Lx1,x0ψ(x0)

shows that the restriction of the transfer operator to E0 corresponds precisely tothe operator L : Rd → Rd. Note also that the hyperplane V ⊂ Rd correspondsto the subset of ψ ∈ E0 such that

∫ψ dµ = 0. Consider in E0 the norm defined

by ‖ψ‖0 = sup |ψ|.


Fix any number λ between 1 and the spectral radius of L restricted to V .Every function ψ ∈ E0 may be written:

ψ = c+ v with c =

∫ψ dµ ∈ Ru and v = ψ −

∫ψ dµ ∈ V.

Then the spectral gap property implies that there exists B > 1 such that

sup∣∣Lnfψ −

∫ψ dµ

∣∣ ≤ B‖ψ‖0λn for every n ≥ 1. (7.4.8)

Using (7.4.5), it follows that

|Cn(ϕ, ψ)| ≤ B‖ϕ‖0‖ψ‖0λn for every n ≥ 1.

In this way, we have shown that every aperiodic Markov shift has exponentialdecay of correlations in E0.

With a little more effort, one can improve this result, by extending theconclusion to a much large space of functions. Consider in M the distancedefined by

d((xn)n, (yn)n

)= 2−N(x,y) where N(x, y) = minn ≥ 0 : xn 6= yn.

Fix any θ > 0 and denote by E the set of functions ϕ that are θ-Holder, that is,such that

Kθ(ϕ) = sup |ϕ(x) − ϕ(y)|

d(x, y)θ: x 6= y

is finite.

It is clear that E contains all the locally constant functions. We claim:

Theorem 7.4.1. Every aperiodic Markov shift (f, µ) has exponential decay ofcorrelations in the space E of θ-Holder functions, for any θ > 0.

Observe that Lf (E) ⊂ E . The function ‖ψ‖ = sup |ψ|+Kθ(ψ) is a completenorm in E and the linear operator Lf : E → E is continuous relative to thisnorm (Exercise 7.4.1). One way to prove the theorem is by showing that thisoperator has the spectral gap property, with invariant decomposition

E = Ru⊕ ψ ∈ E :

∫ψ dµ = 0.

Once that is done, exactly the same argument that we used before for E0 provesthe exponential decay of correlations in E . We do not present the details here(but we will come back to this theme, in a much more general context, nearthe end of Section 12.3). Instead, we give a direct proof that (7.4.8) may beextended to the space E .

Given ψ ∈ E and x = (x1, . . . , xn, . . . ) ∈M , we have

Lkfψ(x) =d∑

a1,...,ak=1

Lx1,ak· · ·La2,a1ψ(a1, . . . , ak, x1, . . . , xn, . . . )


for every k ≥ 1. Then, given y = (y1, . . . , yn, . . . ) with x1 = y1 = j,

|Lkfψ(x) − Lkfψ(y)| ≤d∑

a1,...,ak=1

Lj,ak· · ·La2,a1Kθ(ψ)2−kθd(x, y)θ.

Using the property∑di=1 Lj,i = 1, we conclude that

|Lkfψ(x) − Lkfψ(y)| ≤ Kθ(ψ)2−kθd(x, y)θ ≤ Kθ(ψ)2−kθ. (7.4.9)

Given any function ϕ, denote by πϕ the function that depends only on the firstcoordinate and coincides with the mean of ϕ on each cylinder [0; i]:

πϕ(i) =1

pi

∫

[0;i]

ϕdµ.

It is clear that sup |πϕ| ≤ sup |ϕ| and∫πϕdµ =

∫ϕdµ. The inequality (7.4.9)

implies that

sup |Lkfψ − π(Lkfψ)| ≤ Kθ(ψ)2−kθ for every k ≥ 1.

Then, using the property (7.4.6),

sup |Lk+lf ψ − Llfπ(Lkfψ)| ≤ Kθ(ψ)2−kθ for every k, l ≥ 1. (7.4.10)

Moreover, the properties (7.4.6) and (7.4.7) imply that

sup |π(Lkfψ)| ≤ sup |ψ| and

∫π(Lkfψ) dµ =

∫ψ dµ.

Therefore, the property (7.4.8) gives that

sup∣∣Llfπ(Lkfψ) −

∫ψ dµ

∣∣ ≤ B sup |ψ|λl for every l ≥ 1. (7.4.11)

Adding (7.4.10) and (7.4.11), we get that

sup∣∣Lk+lf ψ −

∫ψ dµ

∣∣ ≤ Kθ(ψ)2−kθ +B sup |ψ|λl for every k, l ≥ 1.

Fix σ < 1 such that σ2 ≥ max2−θ, λ. Then the previous inequality (withl ≈ n/2 ≈ k) gives

sup∣∣Lnfψ −

∫ψ dµ

∣∣ ≤ B‖ψ‖σn−1 for every n. (7.4.12)

Now Theorem 7.4.1 follows from the same argument that we used before for E0,with (7.4.12) in the place of (7.4.8).


7.4.1 Exercises

7.4.1. Show that ‖ϕ‖ = sup |ϕ| +Kθ(ϕ) defines a complete norm in the spaceE of θ-Holder functions and the transfer operator Lf is continuous relative tothis norm.

7.4.2. Let f : M → M be a local diffeomorphism on a compact manifold Mand d ≥ 2 be the degree of f . Assume that there exists σ > 1 such that‖Df(x)v‖ ≥ σ‖v‖ for every x ∈ M and every vector v tangent to M at thepoint x. Fix θ > 0 and let E be the space of θ-Holder functions ϕ : M → R.For every ϕ ∈ E , define

Lfϕ : M → R, Lfϕ(y) =1

d

∑

x∈f−1(y)

ϕ(x).

(1) Show that inf ϕ ≤ inf Lfϕ ≤ supLfϕ ≤ supϕ and Kθ(Lfϕ) ≤ σ−θKθ(ϕ)for every ϕ ∈ E .

(2) Conclude that Lf : E → E is a continuous linear operator (relative to thenorm defined in Exercise 7.4.1) with ‖Lf‖ = 1.

(3) Show that, for every ϕ ∈ E, the sequence (Lnfϕ)n converges to a constantνϕ ∈ R when n→ ∞. Moreover, there exists C > 0 such that

‖Lnfϕ− νϕ‖ ≤ Cσ−nθ‖ϕ‖ for every n and every ϕ ∈ E .

(4) Conclude that the operator Lf : E → E has the spectral gap property.

(5) Show that the map ϕ 7→ νϕ extends to a Borel probability measure on M(recall Theorem A.3.12).


Chapter 8

Equivalent Systems

This chapter is devoted to the isomorphism problem in Ergodic Theory: inwhich conditions should two systems (f, µ) and (g, ν) be considered “the same”and how to decide, for given systems, whether they are in those conditions?

The fundamental notion is called ergodic equivalence: two systems are saidto be ergodically equivalent if, restricted to subsets with full measure, the corre-sponding transformations are conjugated by some invertible map that preservesthe invariant measures. Through such a map, properties of any of the systemmay be translated to corresponding properties of the other system.

Although this is a natural notion of isomorphism in the context of ErgodicTheory, it is not an easy one to handle. In general, the only way to prove thattwo given systems are equivalent is by exhibiting the equivalence map more orless explicitly. On the other hand, the most usual way to show that two systemsare not equivalent is by finding some property that holds for one but not theother.

Thus, it is useful to consider a weaker notion, called spectral equivalence: twosystems are spectrally equivalent if their Koopman operators are conjugated bysome unitary operator. Two ergodically equivalent systems are always spectrallyequivalent, but the converse is not true.

The idea of spectral equivalence leads to a rich family of invariants, relatedto the spectrum of the Koopman operator, that must have the same value forany two systems that are equivalent and, thus, may be used to exclude thatpossibility. Other invariants, of non-spectral nature, have an equally crucialrole. The most important of all, the entropy, will be treated in Chapter 9.

The notions of ergodic equivalence and spectral equivalence, and the re-lations between them, are studied in Sections 8.1 and 8.2, respectively. InSections 8.3 and 8.4 we study two classes of systems with opposite dynamicalfeatures: the transformations with discrete spectrum, that include the ergodictranslations on compact abelian groups, and the transformations with Lebesguespectrum, which have the Bernoulli shifts as the main example.

These two classes of systems, as well as others that we introduced previously(ergodicity, strong mixing, weak mixing) are invariants of spectral equivalence

221

222 CHAPTER 8. EQUIVALENT SYSTEMS

and, hence, also of ergodic equivalence. Finally, in Section 8.5 we discuss athird notion of equivalence, that we call ergodic isomorphism, especially in thecontext of Lebesgue spaces.

8.1 Ergodic equivalence

Let µ and ν be probability measures invariant under measurable transformationsf : M → M and g : N → N , respectively. We say that the systems (f, µ) and(g, ν) are ergodically equivalent if one can find measurable sets X ⊂ M andY ⊂ N with µ(M \ X) = 0 and ν(N \ Y ) = 0, and a measurable bijectionφ : X → Y with measurable inverse, such that

φ∗µ = ν and φ f = g φ.

We leave it to the reader to check that this is indeed an equivalence relation,that is, reflexive, symmetric and transitive.

Observe also that the sets X and Y in the definition may be chosen to beinvariant under f and g, respectively. Indeed, consider X0 = ∩∞

n=0f−n(X). It

is clear from the definition that X0 ⊂ X and f(X0) ⊂ X0. Since µ(X) = 1and the intersection is countable, we have that µ(X0) = 1. Analogously, Y0 =∩∞n=0g

−n(Y ) is a measurable subset of Y such that ν(Y0) = 1 and g(Y0) ⊂ Y0.Moreover, by construction, Y0 = φ(X0). Therefore, the restriction of φ to X0 isstill a bijection onto Y0.

Example 8.1.1. Let f : [0, 1] → [0, 1] be defined by f(x) = 10x − [10x]. Aswe saw in Section 1.3.1, this transformation preserves the Lebesgue measurem on [0, 1]. If one represents each number x ∈ [0, 1] by its decimal expansionx = 0, a0a1a2 . . . , the transformation f corresponds, simply, to shifting all thedigits of x one unit to the left. That motivates us to consider:

φ : 0, 1, . . . , 9N → [0, 1], φ((an)n

)=

∞∑

n=0

an10n+1

= 0, a0a1a2 . . . .

It is clear that φ is surjective. On the other hand, it is not injective,since certain real numbers have more than one decimal expansion: for example,0, 1000000 · · · = 0, 099999 . . . . Actually, this happens if and only the numbersadmits a finite decimal expansion, that is, such that all but finitely many digitsare zero. The set of such numbers is countable and, hence, is irrelevant from thepoint of view of the Lebesgue measure. More precisely, let us consider the setX ⊂ 0, 1, . . . , 9N of all sequences with an infinite number of symbols differentfrom zero and the set Y ⊂ [0, 1] of all numbers whose decimal expansion isinfinite (hence, unique). Then the restriction of φ to X is a bijection onto Y .

It is easy to check that both φ | X and its inverse are measurable: use thefact that the image of the intersection of X with each cylinder [0; a0, . . . , am−1]is the intersection of Y with some interval of length 10−m. This observation alsoshows that φ∗ν = m, where ν denotes the Bernoulli measure on 0, 1, . . . , 9N

8.1. ERGODIC EQUIVALENCE 223

that assigns equal weights to all the digits. Moreover, denoting by σ the shiftmap in 0, 1, . . . , 9N, we have that

φ σ((an)n

)= 0, a1a2 . . . an · · · = f φ

((an)n

)

for every (an)n ∈ X . This proves that (f,m) is ergodically equivalent to theBernoulli shift (σ, ν).

Suppose that (f, µ) and (g, ν) are ergodically equivalent. A measurable setA ⊂ M is invariant under f : M → M if and only if φ(A) is invariant underg : N → N . Moreover, ν(φ(A)) = µ(A). Therefore, (f, µ) is ergodic if andonly if (g, ν) is ergodic. It is just as easy to obtain similar conclusions for themixing and the weak mixing properties. Indeed, essentially all the propertiesthat we study in this book are invariants of ergodic equivalence, that is, if theyhold for a given system then they also hold for any system that is ergodicallyequivalent to it. An exception is unique ergodicity, which is a property of adifferent nature, since it concerns solely the transformation.

This also means that these properties may be used to try to distinguishsystems that are not ergodically equivalent. Still, that is usually a difficult task.For example, nothing of what was said so far is of much help towards answeringthe following question: are the shift maps

σ : 1, 2Z → 1, 2Z and ζ : 1, 2, 3Z → 1, 2, 3Z, (8.1.1)

endowed with the corresponding Bernoulli measures giving the same weights toall the symbols, ergodically equivalent? It is easy to see that σ and ζ are nottopologically conjugate (for example: ζ has three fixed points, whereas σ hasonly two), but the existence of an ergodic equivalence is a much more delicateissue. In fact, this type of question motivates most of the content of the presentchapter and also leads to the notion of entropy, which is the subject of Chapter 9.

Example 8.1.2. Let σ : M →M be the shift map in M = XN and let µ = νN

be a Bernoulli measure. Let σ : M → M be the natural extension of σ and µbe the lift of µ (Section 2.4.2). Moreover, let σ : M → M be the shift map inM = XZ and µ = νZ be the corresponding Bernoulli measure. Then, (σ, µ) isergodically equivalent to (σ, µ). An equivalence may be constructed as follows.

By definition, M is the space of pre-orbits of σ : M → M , that is, of allthe sequences x = (. . . , x−n, . . . , x0) in M such that σ(x−j) = x−j+1 for everyj ≥ 1. Moreover, each x−j is a sequence (x−j,i)i∈N in X . So, the previousrelation means that

x−j,i+1 = x−j+1,i for every i, j ∈ N. (8.1.2)

Consider the map φ : M → M , x 7→ x given by

xn = x0,n = x−1,n+1 = · · · and x−n = x−n,0 = x−n−1,1 = · · · .We leave it to the reader to check that φ is indeed an ergodic equivalence betweenthe natural extension (σ, µ) and the two-sided shift map (σ, µ).


8.1.1 Exercises

8.1.1. Let f : [0, 1] → [0, 1] be the transformation defined by f(x) = 2x− [2x]and m be the Lebesgue measure on [0, 1]. Exhibit a map g : [0, 1] → [0, 1]and a probability measure ν invariant under g, such that (g, ν) is ergodicallyequivalent to (f, µ) and the support of ν has empty interior.

8.1.2. Let f : 1, . . . , kN → 1, . . . , kN and g : 1, . . . , lN → 1, . . . , lN beone-sided shift maps, endowed with Bernoulli measures µ and ν, respectively.Show that, for every set X ⊂ 1, . . . , kN with f−1(X) = X and µ(X) = 1,there exists x ∈ X such that #(X ∩ f−1(x)) = k. Conclude that if k 6= l then(f, µ) and (g, ν) can not be ergodically equivalent.

8.1.3. Let X = 1, . . . , d and consider the shift map σ : XN → XN endowedwith a Markov measure µ. Given any cylinder C = [0; c0, . . . , cl] in XN, letµC be the normalized restriction of µ to C. Show that there exists an inducedtransformation σC : C → C (see Section 1.4.2) preserving µC and such that(σC , µC) is ergodically equivalent to a Bernoulli shift (σN, ν) in NN.

8.2 Spectral equivalence

Let f : M → M and g : N → N be transformations preserving probabilitymeasures µ and ν, respectively. Let Uf : L2(µ) → L2(µ) and Ug : L2(ν) → L2(ν)be the corresponding Koopman operators. We say that (f, µ) and (g, ν) arespectrally equivalent if there exists some unitary operator L : L2(µ) → L2(ν)such that

Ug L = L Uf . (8.2.1)

We leave it to the reader to check that the relation defined in this way is, indeed,an equivalence relation.

It is easy to see that if two systems are ergodically equivalent then theyare spectrally equivalent. Indeed, suppose that there exists an invertible maph : M → N such that φ∗µ = ν and φ f = g φ. Then, the Koopman operator

Uφ : L2(ν) → L2(µ), Uφ(ψ) = ψ φ

is an isometry and is invertible: the inverse is the Koopman operator associatedwith φ−1. In other words, Uφ is a unitary operator. Moreover,

Uf Uφ = Uφf = Ugφ = Uφ Ug.

Therefore, L = Uφ is a spectral equivalence between the two systems.

The converse is false, as will be clear from the sequel. For example, allcountably generated two-sided Bernoulli shifts are spectrally equivalent (Corol-lary 8.4.12); however, not all have the same entropy (Example 9.1.10) and sonot all are ergodically equivalent.

8.2. SPECTRAL EQUIVALENCE 225

8.2.1 Invariants of spectral equivalence

Recall that the spectrum spec(A) of a bounded linear operator A : E → E in acomplex Banach space E consists of the complex numbers λ such that A− λidis not invertible. We say that λ ∈ spec(A) is an eigenvalue if A − λid is notinjective, that is, if there exists v 6= 0 such that Av = λv. Then, the dimensionof the kernel of A− λid is called multiplicity of the eigenvalue.

By definition, the spectrum of a system (f, µ) is the spectrum of the cor-responding Koopman operator Uf : L2(µ) → L2(µ). If (f, µ) is spectrallyequivalent to (g, ν) then the two systems have the same spectrum: the relation(8.2.1) implies that

(Ug − λ id ) = L (Uf − λ id ) L−1 (8.2.2)

and, consequently, Ug − λ id is invertible if and only if Uf − λ id is invertible.In fact, the spectrum itself is a poor invariant: in particular, all the invert-

ible ergodic systems with no atoms have the same spectrum (Exercise 8.2.1).However, the associated spectral measure does provide very useful invariants,as we are going to see. The simplest one is the set of atoms of the spectralmeasure, that is, the set of eigenvalues of the Koopman operator. Note that(8.2.2) also shows that a given λ is an eigenvalue of Uf if and only if it is aneigenvalue of Ug; besides, in that case the two multiplicities are equal.

Observe that 1 is always an eigenvalue of the Koopman operator, sinceUfϕ = ϕ for every constant function ϕ. By Proposition 4.1.3(e), the system(f, µ) is ergodic if and only if this eigenvalue has multiplicity 1 for Uf . Thus,it follows from what we have just said that (f, µ) is ergodic if and only if anysystem (g, ν) spectrally equivalent to it is ergodic. In other words, ergodicity isan invariant of spectral equivalence.

Analogously, suppose that the system (f, µ) is mixing. Then, by Proposi-tion 7.1.12,

limnUnf ϕ · ψ =

∫ϕdµ

∫ψ dµ

for every ϕ, ψ ∈ L2(µ). Now suppose that (g, ν) is spectrally equivalent to (f, µ).Let L be the unitary operator in (8.2.1). The inverse L−1 maps eigenvectorsof Ug associated with the eigenvalue 1 to eigenvectors of Uf associated withthe same eigenvalue 1. Since the two systems are ergodic (use the previousparagraph), this means that L−1 maps constant functions to constant functions.Since L−1 is unitary,

Ung ϕ · ψ = L−1(Ung ϕ) · L−1ψ = Unf (L−1ϕ) · L−1ψ

and, hence, limn Ung ϕ · ψ =

∫L−1ϕdµ

∫L−1ψ dµ for every ϕ, ψ ∈ L2(ν). Also,

∫L−1ϕdµ = L−1ϕ · 1 = L−1ϕ · L−11 = ϕ · 1 =

∫ϕdν

and, analogously,∫L−1ψ dµ =

∫ψ dµ. In this way, we have shown that

limnUng ϕ · ψ =

∫ϕdµ

∫ψ dµ,


for every ϕ, ψ ∈ L2(ν), that is, (g, ν) is also mixing. This shows that the mixingproperty is an invariant of spectral equivalence.

The same argument may be used for the weak mixing property. But thetheorem that we prove in Section 8.2.2 below gives us a more interesting proofof the fact that weak mixing is an invariant of spectral equivalence.

8.2.2 Eigenvalues and weak mixing

As we have seen, the Koopman operator Uf : L2(µ) → L2(µ) of a system (f, µ) isan isometry, that is, it satisfies U∗

fUf = id . If f is invertible then the Koopmanoperator is unitary, that is, it satisfies U∗

fUf = UfU∗f = id . In particular, in

that case Uf is a normal operator. Then the property of weak mixing admitsthe following interesting characterization:

Theorem 8.2.1. An invertible system (f, µ) is weak mixing if and only if theconstant functions are the only eigenvectors of the Koopman operator.

In particular, a system (f, µ) is weak mixing if and only if it is ergodic and1 is the unique eigenvalue of Uf .

Proof. Suppose that (f, µ) is weak mixing. Let ϕ ∈ L2(µ) be any (non-zero)eigenfunction of Uf and λ be the corresponding eigenvalue. Then,

∫ϕdµ =

∫Ufϕdµ = λ

∫ϕdµ

and this implies that∫ϕdµ = 0 or λ = 1. In the first case,

Cj(ϕ, ϕ) = |∫

(U jfϕ)ϕ dµ| = |λj∫ϕϕ dµ| =

∫|ϕ|2 dµ

for every j ≥ 1 (recall that |λ| = 1). But then,

limn

1

n

n−1∑

j=0

Cj(ϕ, ϕ) =

∫|ϕ|2 dµ > 0,

contradicting the hypothesis that the system is weak mixing. In the secondcase, using that the system is ergodic, we find that ϕ is constant at µ-almostevery point. This shows that if the system is weak mixing then the constantfunctions are the only eigenvectors.

Now suppose that the only eigenvectors of Uf are the constant functions.To conclude that (f, µ) is weak mixing, we must show that

1

n

n−1∑

j=0

Cj(ϕ, ψ)2 → 0 for any ϕ, ψ ∈ L2(µ)

(recall Exercise 7.1.2). It follows immediately from the definition that

Cj(ϕ, ψ) = Cj(ϕ′, ψ) where ϕ′ = ϕ−

∫ϕdµ


and the integral of ϕ′ vanishes. Hence, it is no restriction to suppose that∫ϕdµ = 0. Then, using the relation (A.7.6) for the unitary operator L = Uf ,

we get:

Cj(ϕ, ψ)2 =∣∣∫

(U jfϕ)ψ dµ∣∣2 =

∣∣∫

C

zj dθ(z)∣∣2,

where θ = Eϕ · ψ. The expression on the right-hand side may be rewritten asfollows: ∫

C

zj dθ(z)

∫

C

zj dθ(z) =

∫

C

∫

C

zjwj dθ(z) dθ(w).

Therefore, given any n ≥ 1,

1

n

n−1∑

j=0

Cj(ϕ, ψ)2 =

∫

C

∫

C

1

n

n−1∑

j=0

(zw)j dθ(z) dθ(w). (8.2.3)

We claim that the measure θ = Eϕ · ψ is non-atomic. In fact, suppose thatthere exists λ ∈ C such that θ(λ) 6= 0. Then, E(λ) 6= 0 and then we mayuse Proposition A.7.8 to conclude that the function E(λ)ϕ is an eigenvectorof Uf . By the hypothesis about the operator Uf , this implies that E(λ)ϕ isconstant at µ-almost every point. Hence,

E(λ)ϕ · ϕ = E(λ)ϕ∫ϕ dµ = 0.

Lemma A.7.3 also gives that

E(λ)ϕ · ϕ = E(λ)2ϕ · ϕ = E(λ)ϕ ·E(λ)ϕ.

Putting these two identities together, we conclude that E(λ)ϕ = 0, whichcontradicts the hypothesis. Thus, our claim is proved.

The sequence n−1∑n−1

j=0 (zw)j in (8.2.3) is bounded and (see Exercise 8.2.6)converges to zero on the complement of the diagonal ∆ = (z, w) : z = w.Moreover, the diagonal has measure zero:

(θ × θ)(∆) =

∫θ(y) dθ(y) = 0,

because θ is non-atomic. Then, we may use the monotone convergence theoremto conclude that (8.2.3) converges to zero when n→ ∞. This proves that (f, µ)is weak mixing if Uf has no non-constant eigenvectors.

Suppose that M is a topological space. We say that a continuous mapf : M →M is topologically weak mixing if the Koopman operator Uf has no non-constant continuous eigenfunctions. The following fact is an easy consequenceof Theorem 8.2.1:

Corollary 8.2.2. If (f, µ) is weak mixing then the restriction of f to the supportof µ is topologically weak mixing.


Proof. Let ϕ be a continuous eigenfunction of Uf . By Theorem 8.2.1, the func-tion ϕ is constant at µ-almost every point. Hence, by continuity, ϕ is constant(at every point) on the support of µ.

We mentioned in Section 7.3 that almost every interval exchange is weakmixing but not mixing. In the sequel we describe an explicit construction,based on an extension of ideas that were hinted at Example 6.3.9. The readermay find this and other variations of those ideas in Section 7.4 of the book ofKalikow and McCutcheon [KM10].

Example 8.2.3 (Chacon). Consider the sequence (Sn)n of piles defined asfollows. Firstly, S1 = [0, 2/3). Next, for each n ≥ 1, let Sn be the pileobtained by dividing Sn−1 into 3 columns, with the same width, and pilingthose columns up on top of each other, with an additional interval insertedbetween the second pile and the third one, as illustrated in Figure 8.1.

.

.

.

I0

Ik−1

. ..

. . .

. . .

Figure 8.1: Constructing a weak mixing system that is not mixing

For example, S2 = [0, 2/9), [2/9, 4/9), [6/9, 8/9), [4/9, 6/9) and

S3 = [0, 2/27), [6/27, 8/27), [18/27, 20/27), [12/27, 14/27), [2/27, 4/27),

[8/27, 10/27), [20/27, 22/27), [14/27, 16/27), [24/27, 26/27),

[4/27, 6/27), [10/27, 12/27), [22/27, 24/27), [16/27, 18/27).Note that each Sn is a pile in the interval Jn = [0, 1 − 3−n). The sequence(fn)n of transformations associated with such piles converges at every point toa transformation f : [0, 1) → [0, 1) that preserves the Lebesgue measure m. Thissystem (f,m) is weak mixing but not mixing (Exercise 8.2.7).

8.2.3 Exercises

8.2.1. Let (f, µ) be an invertible ergodic system with no atoms. Show thatevery λ in the unit circle z ∈ C : |z| = 1 is an approximate eigenvalue ofthe Koopman operator Uf : L2(µ) → L2(µ): there exists some sequence (ϕn)nsuch that ‖ϕn‖ → 1 and ‖Ufϕn − λϕn‖ → 0. In particular, the spectrum of Ufcoincides with the unit circle.


8.2.2. Let m be the Lebesgue measure on the circle and Uα : L2(m) → L2(m)be the Koopman operator of the irrational rotationRα : S1 → S1. Calculate theeigenvalues of Uα and deduce that (Rα,m) and (Rβ ,m) are spectrally equivalentif and only if α = ±β. [Observation: Corollary 8.3.6 provides a more completestatement.]

8.2.3. Letm be the Lebesgue measure on the circle and, for each integer numberk ≥ 2, let Uk : L2(m) → L2(m) be the Koopman operator of the transformationfk : S1 → S1 given by fk(x) = kx mod Z. Check that if p 6= q then (fp,m)and (fq,m) are not ergodically equivalent. Show that, for any k ≥ 2,

L2(m) = constants ⊕∞⊕

j=0

U jk(Hk)

where Hk = ∑n∈Z ane2πinx : an = 0 if k | n and the terms in the direct

sum are pairwise orthogonal. Conclude that (fp,m) and (fq,m) are spectrallyequivalent for any p and q.

8.2.4. Let f : S1 → S1 be the transformation given by f(x) = kx mod Zand µ be the Lebesgue measure. Show that (f, µ) is weak mixing if and only if|k| ≥ 2.

8.2.5. Prove that, for any invertible transformation f , if µ is ergodic for everyiterate fn and there exists C > 0 such that

lim supn

µ(f−n(A) ∩B) ≤ Cµ(A)µ(B),

for any measurable sets A and B, then µ is weak mixing. [Observation: Thisstatement is due to Ornstein [Orn72]. In fact, he proved more: under thesehypothesis the system is (strongly) mixing.]

8.2.6. Let z and w be two complex numbers with absolute value 1. Check that

1. limn

1

n

n−1∑

j=0

|zj − 1| = 0 if and only if z = 1;

2. limn

1

n

n−1∑

j=0

(zw)j = 0 if z 6= w.

8.2.7. Consider the system (f,m) in Example 8.2.3. Show that

(a) the system (f,m) is ergodic;

(b) the only eigenvalues of the Koopman operator Uf : L1(m) → L1(m) arethe constant functions; hence, (f,m) is weak mixing;

(b) lim supnm(fn(A)∩A) ≥ 2/27 if we take A = [0, 2/9); in particular, (f,m)is not mixing.


8.3 Discrete spectrum

In this section and the next one we study two extreme cases, in terms of the typeof spectral measure of the Koopman operator: systems with discrete spectrum,whose spectral measure is purely atomic, and systems with Lebesgue spectrum,whose spectral measure is equivalent to the Lebesgue measure on the unit circle.

We begin by describing some properties of the eigenvalues and eigenvectorsof the Koopman operator. It is clear that all the eigenvalues are in the unitcircle, since Uf is an isometry.

Proposition 8.3.1. If ϕ1, ϕ2 ∈ L2(µ) satisfy Ufϕ1 = λ1ϕ1 and Ufϕ2 = λ2ϕ2

with λ1 6= λ2, then ϕ1 ·ϕ2 = 0. Moreover, the eigenvalues of Uf form a subgroupof the unit circle.

If the system (f, µ) is ergodic then every eigenvalue of Uf is simple and theabsolute value of every eigenfunction is constant at µ-almost every point.

Proof. The first claim follows from the identity

ϕ1 · ϕ2 = Ufϕ1 · Ufϕ2 = λ1ϕ1 · λ2ϕ2 = λ1λ2(ϕ1 · ϕ2) = λ1λ−12 (ϕ1 · ϕ2),

since λ1λ−12 6= 1. This identity also shows that the set of all eigenvalues is closed

under the operation (λ1, λ2) 7→ λ1λ−12 . Recalling that 1 is always an eigenvalue,

it follows that this set is a multiplicative group.Now assume that (f, µ) is ergodic and suppose that Ufϕ = λϕ. Then,

Uf (|ϕ|) = |Ufϕ| = |λϕ| = |ϕ| at µ-almost every point. By ergodicity, thisimplies that |ϕ| is constant at µ-almost every point. Next, suppose that Ufϕ1 =λϕ1 and Ufϕ2 = λϕ2 and the functions ϕ1 and ϕ2 are not identically zero. Since|ϕ2| is constant at µ-almost every point, ϕ2(x) 6= 0 for µ-almost every x. Thenϕ1/ϕ2 is well defined. Moreover,

Uf(ϕ1

ϕ2

)=Uf (ϕ1)

Uf (ϕ2)=λϕ1

λϕ2=ϕ1

ϕ2.

By ergodicity, it follows that the quotient is constant at µ-almost every point.That is, ϕ1 = cϕ2 for some c ∈ C.

We say that a system (f, µ) has discrete spectrum if the eigenvectors ofthe Koopman operator Uf : L2(µ) → L2(µ) generate the Hilbert space L2(µ).Observe that this implies that Uf is invertible and, hence, is a unitary op-erator. This terminology is justified by the following observation (recall alsoTheorem A.7.9):

Proposition 8.3.2. A system (f, µ) has discrete spectrum if and only if itsKoopman operator Uf has a spectral representation of the form

T :⊕

j

L2(σj)χj →

⊕

j

L2(σj)χj , (ϕj,l)j,l 7→

(z 7→ zϕj,l(z)

)j,l

(8.3.1)

where each σj is a Dirac measure at a point in the unit circle.

8.3. DISCRETE SPECTRUM 231

Proof. Suppose that Uf admits a spectral representation of the form (8.3.1)with σj = δλj for some λj in the unit circle. Each L2(σj)

χj may be canoni-cally identified with a subspace in the direct sum. The restriction of T to thatsubspace coincides with λj id , since

z ϕj,l(z) = λj ϕj,l(z) at σj-almost every point. (8.3.2)

Let (vj,l)l be a Hilbert basis of L2(σj)χj . Then, (vj,l)j,l is a Hilbert basis of

the direct sum formed by eigenvectors of T . Since T is unitarily conjugate toUf , it follows that L2(µ) admits a Hilbert basis formed by eigenvectors of theKoopman operator.

Now suppose that (f, µ) has discrete spectrum. Let (λj)j be the eigenvaluesof Uf and, for each j, let σj = δλj and χj be the Hilbert dimension of theeigenspace ker(Uf − λj id ). Note that the space L2(σj) is 1-dimensional, sinceevery function is constant at σj -almost every point. Therefore, the Hilbertdimension of L2(δλj )

χj is also equal to χj . Hence, there exists some unitaryisomorphism

Lj : ker(Uf − λj id ) → L2(δλj )χj .

It is clear that Lj Uf L−1j = λj id . In other words, recalling the observation

(8.3.2),

Lj Uf L−1j : (ϕj,l)l 7→

(z 7→ λjϕj,l(z)

)l=(z 7→ zϕj,l(z)

)l. (8.3.3)

The eigenspaces ker(Uf − λj id ) generate L2(µ), by hypothesis, and they arepairwise orthogonal, by Proposition 8.3.1. Hence, we may combine the operatorsLj to obtain a unitary isomorphism L : L2(µ) → ⊕jL2(σj)

χj . The relation(8.3.3) gives that

L Uf L−1 : (ϕj,l)j,l 7→(z 7→ zϕj,l(z)

)l.

is a spectral representation of Uf of the form we are looking for.

Example 8.3.3. Let m be the Lebesgue measure on the torus Td. Considerthe Fourier basis φk(x) = e2πik·x : k ∈ Zd of the Hilbert space L2(m). Let fbe the rotation Rθ : Td → Td corresponding to a given vector θ = (θ1, . . . , θd).Then,

Ufφk(x) = φk(x+ θ) = e2πik·θφk(x) for every x ∈ Td.

This shows that every φk is an eigenvector of Uf and, hence, (f,m) has discretespectrum. Note that the group of eigenvalues is

Gθ = e2πik·θ : k ∈ Zd, (8.3.4)

that is, the subgroup of the unit circle generated by e2πiθj : j = 1, . . . , d.

More generally, every ergodic translation in a compact abelian group hasdiscrete spectrum. Conversely, every ergodic system with discrete spectrum isergodically isomorphic to a translation in a compact abelian group (the notion


of ergodic isomorphism is discussed in Section 8.5). Another interesting resultis that every subgroup of the unit circle is the group of eigenvalues of someergodic system with discrete spectrum. These facts are proved in Section 3.3 ofthe book of Peter Walters [Wal82].

Proposition 8.3.4. Suppose that (f, µ) and (g, ν) are ergodic and have discretespectrum. Then (f, µ) and (g, ν) are spectrally equivalent if and only if theirKoopman operators Uf : L2(µ) → L2(µ) and Ug : L2(ν) → L2(ν) have the sameeigenvalues.

Proof. It is clear that if the Koopman operators are conjugate then they have thesame eigenvalues. To prove the converse, let (λj)j be the eigenvalues of the twooperators. By Proposition 8.3.2, the eigenvalues are simple. For each j, let ujand vj be unit vectors in ker(Uf −λj id ) and ker(Ug−λj id ), respectively. Then(uj)j and (vj)j are Hilbert bases of L2(µ) and L2(ν), respectively. Consider theisomorphism L : L2(µ) → L2(ν) defined by L(uj) = vj . This operator is unitary,since it maps a Hilbert basis to a Hilbert basis, and it satisfies

L Uf(uj) = L(λjuj) = λjvj = Ug(vj) = Ug L(uj)

for every j. By linearity, it follows that L Uf = Ug L. Therefore, (f, µ) and(g, ν) are spectrally equivalent.

Corollary 8.3.5. If (f, µ) is ergodic, invertible and has discrete spectrum then(f, µ) is spectrally equivalent to (f−1, µ).

Proof. It is clear that λ is an eigenvalue of Uf if and only if λ−1 is an eigen-value of Uf−1 ; moreover, in that case the eigenvectors are the same. Since thesets of eigenvalues are groups, it follows that the two operators have the sameeigenvalues and the same eigenvectors. Apply Proposition 8.3.4.

Let m be the Lebesgue measure on the torus Td. Proposition 8.3.4 alsoallows us to classify the irrational rotations on the torus up to equivalence,ergodic and spectral:

Corollary 8.3.6. Let θ = (θ1, . . . , θd) and τ = (τ1, . . . , τd) be rationally inde-pendent vectors and Rθ and Rτ be the corresponding rotations on the torus Td.The following conditions are equivalent:

(a) (Rθ,m) and (Rτ ,m) are ergodically equivalent;

(b) (Rθ,m) and (Rτ ,m) are spectrally equivalent;

(c) there exists L ∈ SL(d,Z) such that θ = Lτ mod Zd.

We leave the proof to the reader (Exercise 8.3.2). In the special case of thecircle, we get that two irrational rotations Rθ and Rτ are equivalent if and onlyif either Rθ = Rτ or Rθ = R−1

τ . See also Exercise 8.3.3.

8.4. LEBESGUE SPECTRUM 233

8.3.1 Exercises

8.3.1. Suppose that (f, µ) has discrete spectrum and the Hilbert space L2(µ)is separable (this is the case, for instance, if the σ-algebra of measurable setsis countably generated). Show that there exists a sequence (nk)k convergingto infinity such that ‖Unk

f ϕ − ϕ‖2 converges to zero when k → ∞, for every

ϕ ∈ L2(µ).

8.3.2. Prove Corollary 8.3.6.

8.3.3. Let m be the Lebesgue measure on S1 and θ = p/q and τ = r/s betwo rational numbers, with gcd(p, q) = 1 = gcd(r, s). Show that the rotations(Rθ,m) and (Rτ ,m) are ergodically equivalent if and only if the denominatorsq and s are equal.

8.4 Lebesgue spectrum

This section is devoted to the class of systems whose Koopman operator hasthe following property (the reason for the terminology will become clear inProposition 8.4.10):

Definition 8.4.1. Let U : H → H be an isometry in a Hilbert space. We saythat U has Lebesgue spectrum if there exists some closed subspace E ⊂ H suchthat

(a) U(E) ⊂ E;

(b)⋂n∈N U

n(E) = 0;

(c)∑

n∈N U−n(E) = H .

Given a probability measure µ, we denote by L20(µ) = L2

0(M,B, µ) the or-thogonal complement, inside the space L2(µ) = L2(M,B, µ), of the subspace ofconstant functions. In other words,

L20(µ) = ϕ ∈ L2(µ) :

∫ϕdµ = 0.

Note that L20(µ) is invariant under the Koopman operator: ϕ ∈ L2

0(µ) if andonly if Ufϕ ∈ L2

0(µ). We say that the system (f, µ) has Lebesgue spectrum ifthe restriction of the Koopman operator to L2

0(µ) has Lebesgue spectrum.

8.4.1 Examples and properties

We start by observing that all Bernoulli shifts have Lebesgue spectrum. It isconvenient to treat one-sided shifts and the two-sided shifts separately.


Example 8.4.2. Consider a one-sided shift map σ : XN → XN and a Bernoullimeasure µ = νN on XN. Let E = L2

0(µ). Conditions (a) and (c) in Defini-tion 8.4.1 are obvious. To prove condition (b), consider any function ϕ ∈ L2

0(µ)in the intersection, that is, such that for every n ∈ N there exists a functionψn ∈ L2

0(µ) satisfying ϕ = ψn σn. We want to show that ϕ is constant atµ-almost every point. For each c ∈ R, consider

Ac = x ∈ XN : ϕ(x) > c.

For each n ∈ N, we may write Ac = σ−n(x ∈ XN : ψn(x) > c). Then Acbelongs to the σ-algebra generated by the cylinders of the form [n;Cn, · · · , Cm]with m ≥ n. Consequently, µ(Ac ∩ C) = µ(Ac)µ(C) for every cylinder C ofthe form C = [0;C0, . . . , Cn−1]. Since n is arbitrary and the cylinders are agenerating family, it follows that µ(Ac ∩B) = µ(Ac)µ(B) for every measurableset B ⊂ XN. Taking B = Ac we conclude that µ(Ac) = µ(Ac)

2; in other words,µ(Ac) ∈ 0, 1 for every c ∈ R. This proves that ϕ is constant at µ-almost everypoint, as stated.

Example 8.4.3. Now consider a two-sided shift map σ : XZ → XZ and aBernoulli measure µ = νZ. Let A be the σ-algebra generated by the cylindersof the form [0;C0, . . . , Cm] with m ≥ 0. Denote by L2

0(XZ,A, µ) the space of

all functions ϕ ∈ L20(µ) that are measurable with respect to the σ-algebra A

(in other words, ϕ(x) depends only on the coordinates xn, n ≥ 0 of the point).Take E = L2

0(XZ,A, µ). Condition (a) in Definition 8.4.1 is obvious. Condition

(b) follows from the same arguments that we used in Example 8.4.2. To provecondition (c), note that ∪nU−n

σ (E) contains the characteristic functions of allthe cylinders. Therefore, it contains all the linear combinations of characteristicfunctions of sets in the algebra generated by the cylinders. This implies thatthe union is dense in L2

0(µ), as we wanted to prove.

Lemma 8.4.4. If (f, µ) has Lebesgue spectrum then limn Unf ϕ ·ψ = 0 for every

ϕ ∈ L20(µ) and every ψ ∈ L2(µ).

Proof. Observe that the sequence Unf ϕ · ψ is bounded. Indeed, by the Cauchy-Schwarz inequality (Theorem A.5.4):

|Unf ϕ · ψ| ≤ ‖Unf ϕ‖2‖ψ‖2 = ‖ϕ‖2‖ψ‖2 for every n.

So, it is enough to prove that every convergent subsequence Unj

f ϕ ·ψ converges

to zero. Furthermore, the set Unf ϕ : n ∈ N is bounded in L2(µ), because Ufis an isometry. By the theorem of Banach-Alaoglu (Theorems A.6.1 and 2.3.1),every sequence in that set admits some weakly convergent subsequence. Hence,it is no restriction to suppose that U

nj

f ϕ converges weakly to some ϕ ∈ L2(µ).Let E be a subspace satisfying the conditions in Definition 8.4.1. Initially,

suppose that ϕ ∈ U−kf (E) for some k. Then U

nj

f ϕ ∈ Unj−kf (E). Hence, given

any l ∈ N, we have that Unj

f ϕ ∈ U lf (E) for every j sufficiently large. It follows

(see Exercise A.6.8) that ϕ ∈ U lf (E) for every l ∈ N. By condition (b) in


the definition, this implies that ϕ = 0 at µ-almost every point. In particular,limj U

nj

f ϕ · ψ = ϕ · ψ = 0.

Now consider any ϕ ∈ L20(µ). By condition (c) in the definition, for every

ε > 0 there exist k ∈ N and ϕk ∈ U−kf (E) such that ‖ϕ− ϕk‖2 ≤ ε. Using the

Cauchy-Schwarz once more:

|Unf ϕ · ψ − Unf ϕk · ψ| ≤ ‖ϕ− ϕk‖2 ‖ψ‖2 ≤ ε‖ψ‖2

for every n. Recalling that limn Unf ϕk · ψ = 0 (by the previous paragraph), we

find that

−ε‖ψ‖2 ≤ lim infn

Unf ϕ · ψ ≤ lim supn

Unf ϕ · ψ ≤ ε‖ψ‖2.

Making ε→ 0, it follows that limn Unf ϕ · ψ = 0, as we wanted to prove.

Corollary 8.4.5. If (f, µ) has Lebesgue spectrum then (f, µ) is mixing.

Proof. It suffices to observe that

Cn(ϕ, ψ) = |Unf ϕ · ψ − (

∫ϕdµ) · ψ| = |Unf (ϕ−

∫ϕdµ) · ψ|

and the function ϕ′ = ϕ−∫ϕdµ is in L2

0(µ).

The converse to Corollary 8.4.5 is false, in general: in Example 8.4.13 wepresent certain mixing systems that do not have Lebesgue spectrum.

The class of systems with Lebesgue spectrum is invariant under spectralequivalence. Indeed, suppose that (f, µ) has Lebesgue spectrum and (g, ν) isspectrally equivalent to (f, µ). Let L : L2(µ) → L2(ν) be a unitary operatorconjugating the Koopman operators Uf and Ug. It follows from the hypothe-sis and Corollary 8.4.5 that (f, µ) is weak mixing. Hence, by Theorem 8.2.1,the constant functions are the only eigenvectors of Uf . Then the same holdsfor Ug and so the conjugacy L maps constant functions to constant functions.Then, as L is unitary, its restriction to the orthogonal complement L2

0(µ) is aunitary operator onto L2

0(ν). Now, given any subspace E ⊂ L20(µ) satisfying the

conditions (a), (b), (c) in Definition 8.4.1 for Uf , it is clear that the subspaceL(E) ⊂ L2

0(ν) satisfies the corresponding conditions for Ug. Hence, (g, ν) hasLebesgue spectrum.

Given closed subspaces V ⊂ W of a Hilbert space H , we denote by W ⊖ Vthe orthogonal complement of V inside W , that is,

W ⊖ V = W ∩ V ⊥ = w ∈W : v · w = 0 for every v ∈ V .

The proof of the following fact is discussed in the next section:

Proposition 8.4.6. If U : H → H is an isometry and E1 and E2 are subspacessatisfying the conditions in Definition 8.4.1, then the orthogonal complementsE1 ⊖ U(E1) and E2 ⊖ U(E2) have the same Hilbert dimension.


This leads to the following definition: the rank of an operator U : H → Hwith Lebesgue spectrum is the Hilbert dimension of E⊖U(E) for any subspaceE satisfying the conditions in Definition 8.4.1.

Then we define the rank of a system (f, µ) with Lebesgue spectrum to be therank of the associated Koopman operator restricted to L2

0(µ). It is clear thatthe rank is less or equal than the Hilbert dimension of L2

0(µ). In particular, ifL2(µ) is separable then the rank is countable, possibly finite. The majority ofinteresting examples fall into this category:

Example 8.4.7. Suppose that the probability space (M,B, µ) is countably gen-erated, that is, there exists a countable family G of measurable subsets such thatevery element of B coincides, up to measure zero, with some element of the σ-algebra generated by G. Then L2(µ) is separable: the algebra A generated byG is countable and the linear combinations with rational coefficients of charac-teristic functions of elements of A form a countable dense subset of L2(µ).

It is interesting to point out that no examples are known of systems withLebesgue spectrum of finite rank. For Bernoulli shifts, the rank coincides withthe dimension of the corresponding L2(µ):

Example 8.4.8. Let (σ, µ) be a one-sided Bernoulli shift (similar considerationsapply in the two-sided case). As we have seen in Example 8.4.2, we may takeE = L2

0(µ). Then, denoting x = (x1, . . . , xn, . . . ) and recalling that µ = νN,

ϕ ∈ E ⊖ Uσ(E) ⇔∫ϕ(x0, x)ψ(x) dµ(x0, x) = 0 ∀ψ ∈ L2

0(µ)

⇔∫ ( ∫

ϕ(x0, x) dν(x0))ψ(x) dµ(x) = 0 ∀ψ ∈ L2

0(µ).

Hence, E ⊖ Uσ(E) =ϕ ∈ L2(µ) :

∫ϕ(x0, x) dν(x0) = 0 for µ-almost every x

.

We claim that dim(E ⊖ Uσ(E)) = dimL2(µ). The inequality ≤ is obvious.To prove the other inequality, fix any measurable function φ : X → R with∫φdν = 0 and

∫φ2 dν = 1. Consider the linear map I : L2(µ) → L2(µ)

associating with each ψ ∈ L2(µ) the function Iψ(x0, x) = φ(x0)ψ(x). Theassumptions on φ imply that

Iψ ∈ E ⊖ Uσ(E) and ‖Iψ‖2 = ‖ψ‖2 for every ψ ∈ L2(µ).

This shows that E ⊖Uσ(E) contains a subspace isometric to L2(µ) and, hence,dimE ⊖ Uσ(E) ≥ dimL2(µ). This concludes the argument.

We say that the shift is of countable type if the probability space X is count-ably generated. This is automatic, for example, if X is finite, or even countable.In that case, the space Σ = XN (or Σ = XZ) is also countably generated: if G isa countable generator ofX then the cylinders [m;Cm, . . . , Cn] with Cj ∈ G forma countable generator of Σ. Then, as observed in Example 8.4.7, the space L2(µ)is separable. Therefore, it follows from Example 8.4.8 that every Bernoulli shiftof countable type has Lebesgue spectrum with countable rank.


8.4.2 The invertible case

In this section we take the system (f, µ) to be invertible. In this context, thenotion of Lebesgue spectrum may be formulated in a more transparent way:

Proposition 8.4.9. Let U : H → H be a unitary operator in a Hilbert spaceH. Then U has Lebesgue spectrum if and only if there exists a closed subspaceF ⊂ H such that the iterates Uk(F ), k ∈ Z are pairwise orthogonal and satisfy

H =⊕

k∈Z

Uk(F ).

Proof. Suppose that there exists some subspace F as in the statement. TakeE = ⊕∞

k=0Uk(F ). Condition (a) in Definition 8.4.1 is immediate:

U(E) =∞⊕

k=1

Uk(F ) ⊂ E.

As for condition (b), note that ϕ ∈ ∩∞n=0U

n(E) means that ϕ ∈ ⊕∞k=nU

k(F )for every n ≥ 0. This implies that ϕ is orthogonal to Uk(F ) for every k ∈ Z.Hence, ϕ = 0. Finally, by hypothesis, we may write any ϕ ∈ H as an orthogonalsum ϕ =

∑k∈Z ϕk with ϕk ∈ Uk(F ) for every k. Then

∞∑

k=−n

ϕk ∈∞⊕

k=−n

Uk(F ) = U−n(E)

for every n and the sequence on the left-hand side converges to ϕ when n→ ∞.This gives condition (c) in the definition.

Now we prove the converse. Given E satisfying the conditions (a), (b) and(c) in the definition, take F = E ⊖ U(E). It is easy to see that the iterates ofF are pairwise orthogonal. We claim that

∞⊕

k=0

Uk(F ) = E. (8.4.1)

Indeed, consider any v ∈ E. It follows immediately from the definition ofF that there exist sequences vn ∈ Un(F ) and wn ∈ Un(E) such that v =v0 + · · ·+ vn−1 +wn for each n ≥ 1. We want to show that (wn)n converges tozero, to conclude that v =

∑∞j=0 vn. For that, observe that

‖v‖2 =

n−1∑

j=0

‖vj‖2 + ‖wn‖2 for every n

and, thus, the series∑∞

j=0 ‖vj‖2 is summable. Given ε > 0, fix m ≥ 1 such thatthe sum of the terms with j ≥ m is less than ε. For every n ≥ m,

‖wm − wn‖2 = ‖vm + · · · + vn−1‖2 = ‖vm‖2 + · · · + ‖vn−1‖2 < ε.


This proves that (wn)n is a Cauchy sequence in H . Let w be its limit. Sincewn ∈ Un(E) ⊂ Um(E) for every m ≤ n, passing to the limit we get thatw ∈ Um(E) for every m. By condition (b) in the hypothesis, this implies thatw = 0. Therefore, the proof of the claim (8.4.1) is complete. To conclude theproof of the proposition it suffices to observe that

∞⊕

k∈Z

Uk(F ) =

∞∑

n=0

∞⊕

k=−n

Uk(F ) =

∞∑

n=0

U−n(E).

Condition (c) in the hypothesis implies that this subspace coincides with H .

In particular, an invertible system (f, µ) has Lebesgue spectrum if and onlyif there exists a closed subspace F ⊂ L2

0(µ) such that

L20(µ) =

⊕

k∈Z

Ukf (F ). (8.4.2)

The next result is the reason why systems with Lebesgue spectrum are de-nominated in this way, and it also leads naturally to the notion of rank:

Proposition 8.4.10. Let U : H → H be a unitary operator in a Hilbert space.Let λ denote the Lebesgue measure on the unit circle. Then U has Lebesguespectrum if and only if it admits a spectral representation

T : L2(λ)χ → L2(λ)χ (ϕα)α 7→ (z 7→ zϕα(z))α

for some cardinal χ. Moreover, χ is uniquely determined by U .

Proof. Let us start by proving the “if” claim. As we know, the Fourier familyzn : n ∈ Z is a Hilbert basis of the space L2(λ). Let Vn be the one-dimensionalsubspace generated by ϕ(z) = zn. Then, L2(λ) = ⊕n∈ZVn and, consequently,

L2(λ)χ =(⊕

n∈Z

Vn)χ

=⊕

n∈Z

V χn (8.4.3)

(Wχ denotes the orthogonal direct sum of χ copies of a spaceW ). Moreover, therestriction of T to each V χn is a unitary operator onto V χn+1. Take F ′ = V χ0 . Therelation (8.4.3) means that the iterates T n(F ′) = V χn are pairwise orthogonaland their orthogonal direct sum is the space L2(λ)χ. Using the conjugacy of Tto the Koopman operator in L2

0(µ), we conclude that there exists a subspace Fin the conditions of Proposition 8.4.9.

Conversely, suppose that there exists F in the conditions of Proposition 8.4.9.Let vq : q ∈ Q be a Hilbert basis of F . Then Un(vq) : n ∈ Z, q ∈ Qis a Hilbert basis of H . Given q ∈ Q, denote by δq the element of the spaceL2(λ)Q that is equal to 1 in the coordinate q and identically zero in all the othercoordinates. Define,

L : H → L2(λ)Q, L(Un(vq)) = znδq for each n ∈ Z and q ∈ Q.


Observe that L is a unitary operator, since znδq is a Hilbert basis of L2(λ)Q.Observe also that LU = TL. This provides the spectral representation in thestatement of the proposition, with χ equal to the cardinal of the set Q, that is,equal to the Hilbert dimension of the subspace F .

Let E ⊂ H be any subspace satisfying the conditions in Definition 8.4.1.Then, the orthogonal difference F = E⊖U(E) satisfies the conclusion of Propo-sition 8.4.9, as we saw during the proof of that proposition. Moreover, accord-ing to the proof of Proposition 8.4.10, we may take the cardinal χ equal to theHilbert dimension of F . Since χ is uniquely determined, the same holds for theHilbert dimension of E ⊖ U(E). This proves Proposition 8.4.6 in the invertiblecase. In Exercise 8.4.3 we invite the reader to prove the general case.

We have just shown that the rank of a system with Lebesgue spectrum iswell defined. Next, we are going to see that for invertible systems the rank is acomplete invariant of spectral equivalence:

Corollary 8.4.11. Two invertible systems with Lebesgue spectrum are spectrallyequivalent if and only if they have the same rank.

Proof. It is clear that two invertible systems are spectrally equivalent if andonly if they admit the same spectral representation. By Proposition 8.4.10, thishappens if and only if the value of the cardinal χ is the same, that is, if therank is the same.

Corollary 8.4.12. All two-sided Bernoulli shifts of countable type are spectrallyequivalent.

Proof. As we saw in the previous section, all Bernoulli shifts of countable typehave countable rank.

Proofs of the facts that are quoted in the sequel may be found in the bookof Ricardo Mane [Man87, Section II.10]:

Example 8.4.13 (Gaussian shifts). Let A = (ai,j)i,j∈Z be an infinite realmatrix. We say that A is positive definite if every finite restriction Am,n =(ai,j)m≤i,j<n is positive definite, for any m < n. We say that A is symmet-ric if ai,j = aj,i for any i, j ∈ Z. Let µ be a Borel probability measure onΣ = RZ (similar considerations hold for Σ = RN). We say that µ is a Gaussianmeasure if there exists some symmetric positive definite matrix A such thatµ([m;Bm, . . . , Bn−1]) is equal to

1

(detAm,n)1/21

(2π)(n−m)/2

∫

Bm×···×Bn−1

exp

(−1

2(A−1

m,nz · z))dz

for any m < n and any measurable sets Bm, . . . , Bn−1 ⊂ R. The reason forthe factor on the left-hand side is explained in Exercise 8.4.4. A is called thecovariance matrix of µ. It is uniquely determined by

ai,j =

∫xixj dµ(x) for each i, j ∈ Z.


For each symmetric positive definite matrix A there exists a unique Gaussianprobability measure µ that has A as its covariance matrix. Moreover, µ isinvariant under the shift map σ : Σ → Σ if and only if ai,j = ai+1,j+1 for anyi, j ∈ Z. In that case, the properties of the system (σ, µ) are directly related tothe behavior of the covariance sequence

αn = an,0 = Unσ x0 · x0 for each n ≥ 0.

In particular, (f, µ) is mixing if and only if the covariance sequence convergesto zero.

Now, Exercise 8.4.5 shows that if (f, µ) has Lebesgue spectrum then the co-variance sequence is generated by some absolutely continuous probability mea-sure ν on the unit circle, in the following sense:

αn =

∫zndν(z) for each n ≥ 0.

(The Riemann-Lebesgue lemma asserts that if ν is a probability measure abso-lutely continuous with respect to the Lebesgue measure λ on the unit circle thenthe sequence

∫zn dν(z) converges to zero when n → ∞.) But Exercise 8.4.6

shows that not every sequence that converges to zero is of this form. Therefore,there exist Gaussian shifts (σ, µ) that are mixing but do not have Lebesguespectrum.

8.4.3 Exercises

8.4.1. Show that every mixing Markov shift has Lebesgue spectrum with count-able rank. [Observation: In Section 9.5.3 we mention stronger results.]

8.4.2. Let µ be the Haar measure on Td and fA : Td → Td be a surjectiveendomorphism. Assume that no eigenvalue of the matrix A is a root of unity.Check that every orbit of At in the set Zd \ 0 is infinite and use this factto conclude that (fA, µ) has Lebesgue spectrum. Conversely, if (fA, µ) hasLebesgue spectrum then no eigenvalue of A is a root of unity.

8.4.3. Complete the proof of Proposition 8.4.6, using Exercise 2.3.6 to reducethe general case to the invertible one.

8.4.4. Check that∫

Re−x

2/2 dx =√

2π. Use this fact to show that if A is asymmetric positive definite matrix of dimension d ≥ 1 then

∫

Rd

exp(− (A−1z · z)/2

)dz = (detA)1/2(2π)d/2.

8.4.5. Let (f, µ) be an invertible system with Lebesgue spectrum. Show thatfor every ϕ ∈ L2

0(µ) there exists a probability measure ν absolutely continuouswith respect to the Lebesgue measure λ on the unit circle z ∈ C : |z| = 1 andsuch that Unf ϕ · ϕ =

∫zn dν(z) for every n ∈ Z.

8.5. LEBESGUE SPACES AND ERGODIC ISOMORPHISM 241

8.4.6. Let λ be the Lebesgue measure in the unit circle. Consider the linearoperator F : L1(λ) → c0 defined by

F (ϕ) =

(∫znϕ(z) dλ(z)

)

n

.

Show that F is continuous and injective but not surjective. Therefore, notevery sequence of complex numbers (αn)n converging to zero may be written asαn =

∫zn dν(z) for n ≥ 0, for some probability measure ν absolutely continuous

with respect to λ.

8.5 Lebesgue spaces and ergodic isomorphism

The main subject in this section are the Lebesgue spaces (also called standardprobability spaces), a class of probability spaces introduced by the Russian math-ematician Vladimir A. Rokhlin [Rok62]. These spaces have a distinguished rolein Measure Theory, for two reasons: on the one hand, they exhibit much betterproperties than a general probability space; on the other hand, they includemost interesting examples. In particular, every complete separable metric spaceendowed with a Borel probability measure is a Lebesgue space.

Initially, we discuss yet another notion of equivalence, intermediate to er-godic equivalence and spectral equivalence, that we call ergodic isomorphism.One of the highlights is that for transformations in Lebesgue spaces the notionsof ergodic equivalence and ergodic isomorphism turn out to coincide.

8.5.1 Ergodic isomorphism

Let (M,B, µ) be a probability space. We denote by B the quotient of the σ-algebra by the equivalence relation A ∼ B ⇔ µ(A∆B) = 0. Observe that ifAk ∼ Bk for every k ∈ N then ∪kAk ∼ ∪kBk and ∩kAk ∼ ∩kBk and Ack ∼ Bckfor every k ∈ N. Therefore, the basic operations of set theory are well definedin the quotient B. Moreover, the measure µ induces a measure µ on B. Thepair (B, µ) is called the measure algebra of the probability space.

Now let (M,B, µ) and (N, C, ν) be two probability spaces and (B, µ) and(C, ν) be their measure algebras. A homomorphism of measure algebras is a mapH : B → C that preserves the operations of union, intersection and complementand also preserves the measures: µ(B) = ν(H(B)) for every B ∈ B. If H isa bijection, we call it an isomorphism of measure algebras. In that case theinverse H−1 is also an isomorphism of measure algebras.

Every measurable map h : M → N satisfying h∗µ = ν defines a homomor-phism h : C → B, through B 7→ h−1(B). Moreover, if h is invertible then h isan isomorphism. In the same way, transformations f : M →M and g : N → Npreserving the measures in the corresponding probability spaces define homo-morphisms f : B → B and g : C → C, respectively. We say that the systems(f, µ) and (g, ν) are ergodically isomorphic if these homomorphisms are conju-gate, that is, if f H = H g for some isomorphism H : C → B.


Ergodically equivalent systems are always ergodically isomorphic: given anyergodic equivalence h, it suffices to take H = h. We also have the followingrelation between ergodic isomorphism and spectral equivalence:

Proposition 8.5.1. If two systems (f, µ) and (g, ν) are ergodically isomorphicthen they are spectrally equivalent.

Proof. Let H : C → B be an isomorphism such that f H = H g. Consider thelinear operator L : L2(ν) → L2(µ) constructed as follows. Initially, L(XC) =XH(C) for every B ∈ C. Note that ‖L(XC)‖ = ‖XC‖. Extend the definition tothe set of simple functions, preserving linearity:

L(

k∑

j=1

cjXCj ) =

k∑

j=1

cjXH(Cj) for any k ≥ 1, cj ∈ R and Cj ∈ C.

The definition does not depend on the representation of the simple functionas a linear combination of characteristic functions (Exercise 8.5.1). Moreover,‖L(ϕ)‖ = ‖ϕ‖ for every simple function. Recall that the set of simple functionsis dense in L2(ν). Then, by continuity, L extends uniquely to a linear isometrydefined on the whole L2(ν). Observe that this isometry is invertible: the inverseis constructed in the same way, starting from the inverse of H . Finally,

Uf L(XC) = Uf(XH(C)) = Xf(H(C)) = XH(g(C)) = L(Xg(C)) = L Ug(XC)

for every C ∈ C. By linearity, it follows that Uf L(ϕ) = L Ug(ϕ) for everysimple function; then, by continuity, the same holds for every ϕ ∈ L2(ν).

Summarizing these observations, we have the following relation between thethree equivalence relations:

ergodic equivalence ⇒ ergodic isomorphism ⇒ spectral equivalence.

In what follows we discuss some partial converses, starting with the relationbetween ergodic isomorphism and spectral equivalence.

The following result of Paul Halmos and John von Neumann [HvN42] broad-ens Proposition 8.3.4 and shows that for systems with discrete spectrum thenotions of ergodic isomorphism and spectral equivalence coincide. The readermay find a proof in Section 3.2 of the book of Peter Walters [Wal75].

Theorem 8.5.2 (Discrete spectrum). If (f, µ) and (g, ν) are ergodic systemswith discrete spectrum then the following conditions are equivalent:

1. (f, µ) and (g, ν) are spectrally equivalent;

2. the Koopman operators of (f, µ) and (g, ν) have the same eigenvalues;

3. (f, µ) and (g, ν) are ergodically isomorphic.

In particular, every invertible ergodic system with discrete spectrum is er-godically isomorphic to its inverse.


8.5.2 Lebesgue spaces

Let (M,B, µ) be any probability space. Initially, suppose that the measure µ isnon-atomic, that is, that µ(x) = 0 for every x ∈M . Let P1 ≺ · · · ≺ Pn ≺ · · ·be an increasing sequence of finite partitions of M into measurable sets. Wecall the sequence separating if, given any two different points x, y ∈ M , thereexists n ≥ 1 such that Pn(x) 6= Pn(y). In other words, the non-empty elementsof the partition ∨∞

n=1Pn contain a unique point.Let MP be the subset one obtains by removing from M all the P ∈ ∪nPn

with measure zero. Observe that MP has full measure. We denote by BP andµP the restrictions of B and µ, respectively, to MP . Let m be the Lebesguemeasure on R. The next proposition means that the separating sequence allowsone to represent the probability space (MP ,BP , µP) as a kind of subspace ofthe real line. We say “kind of” because, in general, the image ι(MP) is not ameasurable subset of R.

Proposition 8.5.3. Given any separating sequence (Pn)n, there exists a com-pact totally disconnected set K ⊂ R and there exists a measurable injective mapι : MP → K such that the closure of the image ι(P ) of every P ∈ ∪nPn is anopen and closed subset of K with m(ι(P )) = µ(P ). In particular, ι∗µ coincideswith the restriction of the Lebesgue measure m to the set K.

Proof. Let αn = 1 + 1/n for n ≥ 1. We are going to construct a sequence ofbijective maps ψn : Pn → In, n ≥ 1 satisfying:

(a) each In is a finite family of compact pairwise disjoint intervals;

(b) each element of In, n > 1 is contained in some element of In−1;

(c) m(ψn(P )) = αnµ(P ) for every P ∈ Pn and every n ≥ 1.

For that, we start by writing P1 = P1, . . . , PN. Consider any family I1 =I1, . . . , IN of compact pairwise disjoint intervals such that m(Ij) = α1µ(Pj)for every j. Let ψ1 : P1 → I1 be the map associating with each Pj the corre-sponding Ij . Now suppose that, for a given n ≥ 1, we have already constructedmaps ψ1, . . . , ψn satisfying (a), (b), (c). For each P ∈ Pn, let I = ψn(P ) andlet P1, . . . , PN be the elements of Pn+1 contained in P . Take compact pair-wise disjoint intervals I1, . . . , IN ⊂ I satisfying m(Ij) = αn+1µ(Pj) for eachj = 1, . . . , N . This is possible because, by the induction hypothesis,

m(I) = αnµ(P ) = αn

N∑

j=1

µ(Pj) > αn+1

N∑

j=1

µ(Pj).

Then, define ψn+1(Pj) = Ij for each j = 1, . . . , N . Repeating this procedure foreach P ∈ Pn, we complete the definition of ψn+1 and In+1. It is clear that theconditions (a), (b), (c) are preserved. This finishes the construction.

Now, let K = ∩n ∪I∈In I. It is clear that K is compact and its intersectionwith any I ∈ In is an open and closed subset of K. Moreover,

maxm(I) : I ∈ In = αn maxµ(P ) : P ∈ Pn → 0 when n→ ∞ (8.5.1)


because the sequence (Pn)n is separating and the measure µ is non-atomic.Hence, K is totally disconnected. For each x ∈ MP , the intervals ψn(Pn(x))form a decreasing sequence of compact sets whose lengths decrease to zero.Define ι(x) to be the unique point in ∩nψn(Pn(x)). The hypothesis that thesequence is separating ensures that ι is injective: if x 6= y then there existsn ≥ 1 such that Pn(x) ∩ Pn(y) = ∅ and, thus, ι(x) 6= ι(y). By construction,the pre-image of K ∩ I is in ∪nPn for every I ∈ ∪nIn. Consider the algebra Aformed by the finite disjoint unions of sets K ∩ I of this form. This algebra isgenerating and we have just checked that ι−1(A) is a measurable set for everyA ∈ A. Therefore, the transformation ι is measurable.

To check the other properties in the statement of the proposition, begin bynoting that, for every n ≥ 1 and P ∈ Pn,

ι(P ) =

∞⋂

k=n

⋃

Q

ψk(Q), (8.5.2)

where the union is over all the Q ∈ Pk that are contained in P . To get theinclusion ⊂ it suffices to note that ι(P ) =

⋃Q ι(Q) and ι(Q) ⊂ ψ(Q) for every

Q ∈ Pk and every k. The converse follows from the fact that ι(P ) intersects ev-ery ψk(Q) (the intersection contains ι(Q)) and the length of the ψk(Q) convergesto zero when k → ∞. In this way, (8.5.2) is proven. It follows that

m(ι(P )) = limkm(⋃

Q

ψk(Q))

= limk

∑

Q

αkµ(Q) = limkαkµ(P ) = µ(P ).

Moreover, (8.5.2) means that ι(P ) = ∩∞k=n ∪I I, where the union is over all

the I ∈ Ik that are contained in ψn(P ). The right-hand side of this equalitycoincides with K ∩ψn(P ) and, hence, is an open and closed subset of K. It alsofollows from the construction that ι−1(ι(P )) = P . Consequently, ι∗µ(ι(P )) =µ(P ) = m(ι(P )) for every P ∈ ∪nPn. Since the algebra of finite pairwise disjointunions of sets ι(P ) generates the measurable structure of K, we conclude thatι∗µ = m | K.

We say that a probability space without atoms (M,B, µ) is a Lebesgue spaceif, for some separating sequence, the image ι(MP) is a Lebesgue measurable set.Actually, this property does not depend on the choice of the generating sequence(nor on the families In in the proof of Proposition 8.5.3), but we do not provethis fact here: the reader may find a proof in [Rok62, §2.2]. Exercise 8.5.6 showsthat it is possible to define Lebesgue space in a more direct way, without usingProposition 8.5.3.

Note that if ι(MP ) is measurable then ι(P ) = ι(MP )∩ψn(P ) is measurablefor every P ∈ Pn and every n. Hence, the inverse ι−1 is also a measurabletransformation. Moreover, m(ι(MP)) = µ(MP) = 1 = m(K). Therefore, everyLebesgue space (M,B, µ) is isomorphic, as a measure space, to a measurablesubset of a compact totally disconnected subset of the real line.

Observe that if the cardinal of M is strictly larger than the cardinal of thecontinuum then (M,B, µ) admits no separating sequence and, thus, can not be a


Lebesgue space. In Exercise 8.5.8 we propose another construction of probabilityspaces that are not Lebesgue spaces. Despite examples such as these, practicallyall the probability spaces we deal with are Lebesgue spaces:

Theorem 8.5.4. If M is a complete separable metric space and µ is a Borelprobability measure with no atoms then (M,B, µ) is a Lebesgue space.

Proof. Let X ⊂ M be a countable dense subset and Bn : n ∈ N be anenumeration of the set of balls B(x, 1/k) with x ∈ X and k ≥ 1. We are goingto construct an increasing sequence (Pn)n of finite partitions such that

(i) Pn is finer than B1, Bc1 ∨ · · · ∨ Bn, Bcn and

(ii) En = x ∈M : Pn(x) is not compact satisfies µ(En) ≤ 2−n.

For that, start by considering Q1 = B1, Bc1. By Proposition A.3.7, there exist

compact sets K1 ⊂ B1 and K2 ⊂ Bc1 such that µ(B1 \ K1) ≤ 2−1µ(B1) andµ(Bc1 \K2) ≤ 2−1µ(Bc1). Then take P1 = K1, B1 \K1,K2, B

c1 \K2. Now, for

each n ≥ 1, assume that one has already constructed partitions P1 ≺ · · · ≺ Pnsatisfying (i) and (ii). Consider the partition Qn+1 = Pn ∨ Bn+1, B

cn+1 and

let Q1, . . . , Qm be its elements. By Proposition A.3.7, there exist compact setsKj ⊂ Qj such that µ(Qj \Kj) ≤ 2−(n+1)µ(Qj) for every j = 1, . . . ,m. Take

Pn+1 = K1, Q1 \K1, . . . ,Km, Qm \Km.

It is clear that Pn+1 satisfies (i) and (ii). Therefore, our construction is complete.All that is left is to show that the existence of such a sequence (Pn)n im-

plies the conclusion of the theorem. Property (i) ensures that the sequence isseparating. Let ι : MP → K be a map as in Proposition 8.5.3. Fix any N ≥ 1and consider any point y ∈ K \ ι(MP). For each n > N , let In be the in-terval in the family In that contains y and let Pn be the element of Pn suchthat ψn(Pn) = In. Note that (Pn)n is a decreasing sequence. If they were allcompact, there would be x ∈ ∩n>NPn and, by definition, ι(x) would be equalto y. Since we are assuming that y is not in the image of ι, this proves thatthere exists l > N such that Pl is not compact. Take l > N minimum andlet Il = ψl(Pl). Recall that m(Il) = αlµ(Pl) ≤ 2µ(Pl). Let IN and PN bethe unions of all these Il and Pl, respectively, when we vary y on the wholeK \ ι(MP ). On the one hand, IN contains K \ ι(MP); on the other hand, PN iscontained in ∪l>NEl. Moreover, the Il are pairwise disjoint (because we took lminimum) and the same holds for the Pl. Hence,

m(IN)≤ 2µ

(PN)≤ 2µ

( ⋃

l>N

El)≤ 2−N+1.

Then, the intersection ∩N IN has Lebesgue measure zero and containsK\ι(MP).Since K is a Borel set, this shows that ι(MP ) is a Lebesgue measurable set.

The next result implies that all the Lebesgue spaces with no atoms areisomorphic:


Proposition 8.5.5. If (M,B, µ) is a Lebesgue space with no atoms, there existsan invertible measurable map h : M → [0, 1] (defined between subsets of fullmeasure) such that h∗µ coincides with the Lebesgue measure on [0, 1].

Proof. Let ι : MP → K be a map as in Proposition 8.5.3. Consider the mapg : K → [0, 1] defined by g(x) = m([a, x] ∩ K), where a = minK. It followsimmediately from the definition that g is non-decreasing and Lipschitz:

g(x2) − g(x1) = m([x1, x2] ∩K) ≤ x2 − x1,

for any x1 < x2 in K. In particular, g is measurable. By monotonicity, thepre-image of any interval [y1, y2] ⊂ [0, 1] is a set of the form [x1, x2] ∩K withx1, x2 ∈ K and g(x1) = y1 and g(x2) = y2. In particular,

m([x1, x2] ∩K) = g(x2) − g(x1) = y2 − y1 = m([y1, y2]).

This shows that g∗(m | K) = m | [0, 1]. Let Y be the set of points y ∈ [0, 1] suchthat g−1(y) = [x1, x2] ∩K with x1, x2 ∈ K and x1 < x2. Let X = g−1(Y ).Then, m(X) = m(Y ) = 0 because Y is countable. Moreover, the restrictiong : K\X → [0, 1]\Y is bijective. Its inverse is non-decreasing and, consequently,measurable. Now, take h = g ι. It follows from the previous observations that

h : MP \ ι−1(X) → g(ι(MP )) \ Y

is a measurable bijection with measurable inverse such that h∗µ = m | [0, 1].

Now we extend this discussion to general probability spaces (M,B, µ), pos-sibly with atoms. Let A ⊂M be the set of all the atoms; note that A is at mostcountable, possibly finite. If the space is purely atomic, that is, if µ(A) = 1then, by definition, it is a Lebesgue space. More generally, let M ′ = M \A, letB′ be the restriction of B to M ′ and let µ′ be the normalized restriction of µto B′. By definition, (M,B, µ) is a Lebesgue space if (M ′,B′, µ′) is a Lebesguespace.

It is clear that Theorem 8.5.4 remains valid in the general case: every com-plete separable metric space endowed with a Borel probability measure, possiblywith atoms, is a Lebesgue space. Moreover, Proposition 8.5.5 has the follow-ing extension to the atomic case: if (M,B, µ) is a Lebesgue space and A ⊂ Mdenotes the set of atoms of the measure µ, then there exists an invertible mea-surable map h : M → [0, 1 − µ(A)] ∪ A such that h∗µ coincides with m on theinterval [0, 1 − µ(A)] and coincides with µ on A.

Proposition 8.5.6. Let (M,B, µ) and (N, C, ν) be two Lebesgue spaces andH : C → B be an isomorphism between the corresponding measure algebras.Then there exists an invertible measurable map h : M → N such that h∗µ = νand H = h for every C ∈ C. Moreover, h is essentially unique: any two mapssatisfying these conditions coincide at µ-almost every point.

We are going to sketch the proof of this proposition in the non-atomic case.The arguments are based on the ideas and use the notations in the proof ofProposition 8.5.3.


Let us start with the uniqueness claim. Let h1, h2 : M → N be any twomaps such that (h1)∗µ = (h2)∗µ = ν. Suppose that h1(x) 6= h2(x) for everyx in a set E ⊂ M with µ(E) > 0. Let (Qn)n be a separating sequence in(N, C, ν). Then, Qn(h1(x)) 6= Qn(h2(x)) for every x ∈ E and every n sufficientlylarge. Hence, we may fix n (large) and E′ ⊂ E with µ(E′) > 0 such thatQn(h1(x)) 6= Qn(h2(x)) for every x ∈ E′. Consequently, there existQ ∈ Qn andE′′ ⊂ E′ with µ(E′′) > 0 such that Qn(h1(x)) = Q and Qn(h2(x)) 6= Q for everyx ∈ E′′. Therefore, E′′ ⊂ h−1

1 (Q) \ h−12 (Q). This implies that h1(Q) 6= h2(Q)

and, hence, h1 6= h2.Next we comment on the existence claim. Let (P ′

n)n and (Q′n)n be separating

sequences in (M,B, µ) and (N, C, ν), respectively. Define Pn = P ′n∨H(Q′

n) andQn = Q′

n∨H−1(Pn). Then (Pn)n and (Qn)n are also separating sequences andPn = H(Qn) for each n. Let ι : MP → K be a map as in Proposition 8.5.3 andψn : Pn → In, n ≥ 1 be the family of bijections used in its construction. Let :NQ → L and ϕn : Qn → Jn be corresponding objects for (N, C, ν). Since we areassuming that (M,B, µ) and (N, C, ν) are Lebesgue spaces, ι and are invertiblemaps over subsets with full measure. Recall also that m(ψn(P )) = αnµ(P ) foreach P ∈ Pn and, analogously, m(ϕn(Q)) = αnν(Q) for each Q ∈ Qn. Hence,m(ψn(P )) = m(ϕn(Q)) if P = H(Q). Then, for each n,

ψn H ϕ−1n : Jn → In (8.5.3)

is a bijection that preserves length. Given z ∈ K and n ≥ 1, let In be theelement of In that contains z and let Jn be the corresponding element of Jn,via (8.5.3). By construction, (Jn)n is a nested sequence of compact intervalswhose length converges to zero. Let φ(z) be the unique point in the intersection.In this way, one has defined a measurable map φ : K → L that preserves theLebesgue measure. It is clear from the construction that φ is invertible and theinverse is also measurable. Now it suffices to take h = −1 φ ι.

All that is left is to check that h is invertible. Applying the constructionin the previous paragraph to the inverse H−1 we find h′ : N → M such that

h′∗ν = µ and H−1 = h′. Then, h′ h = h h′ = id and h h′ = h′ h = id . Byuniqueness, it follows that h′ h = id and h h′ = id at almost every point.

Corollary 8.5.7. Let (M,B, µ) and (N, C, ν) be two Lebesgue spaces and letf : M → M and g : N → N be measurable transformations preserving themeasures in their corresponding domains. Then (f, µ) and (g, ν) are ergodicallyequivalent if and only if they are ergodically isomorphic.

Proof. We only need to show that if the systems are ergodically isomorphic thenthey are ergodically equivalent. Let H : C → B be an ergodic isomorphism. ByProposition 8.5.6, there exists an invertible measurable map h : M → N suchthat h∗µ = ν and H = h. Then,

h f = f h = f H = H g = h g = g h.

By the uniqueness part of Proposition 8.5.6, it follows that h f = g h atµ-almost every point. This shows that h is an ergodic equivalence.


8.5.3 Exercises

8.5.1. Let H : C → B be a homomorphism of measure algebras. Show that

l∑

i=1

biXBi =k∑

j=1

cjXCj ⇒l∑

i=1

biXH(Bi) =k∑

j=1

cjXH(Cj).

8.5.2. Check that the homomorphism of measure algebras g : C → B inducedby a measure preserving map g : M → N is injective. Suppose that N is aLebesgue space. Show that, given another measure preserving map h : M → N ,the corresponding homomorphisms g and h coincide if and only if g = h atalmost every point.

8.5.3. Let f : M → M be a measurable transformation in a Lebesgue space(M,B, µ), preserving the measure µ. Show that (f, µ) is invertible at almostevery point (that is, there exists an invariant full measure subset restricted towhich f is a measurable bijection with measurable inverse) if and only if thecorresponding homomorphism of measure algebras f : B → B is surjective.

8.5.4. Show that the Koopman operator of a system (f, µ) is surjective if andonly if the corresponding homomorphism of measure algebras f : B → B issurjective. In Lebesgue spaces this happens if and only if the system is invertibleat almost every point.

8.5.5. Show that every system (f, µ) with discrete spectrum in a Lebesgue spaceis invertible at almost every point.

8.5.6. Given a separating sequence P1 ≺ · · · ≺ Pn ≺ · · · , we call chain anysequence (Pn)n with Pn ∈ Pn and Pn+1 ⊂ Pn for every n. We say that achain is empty if ∩nPn = ∅. Consider the map ι : MP → K constructed inProposition 8.5.3. Show that the image ι(MP) is a Lebesgue measurable setand m(K \ ι(MP)) = 0 if and only if the empty chains have zero measure in thefollowing sense: for every δ > 0 there exists B ⊂ M such that B is a union ofelements of ∪nPn with µ(B) < δ and every empty chain (Pn)n has Pn ⊂ B forevery n sufficiently large.

8.5.7. Prove the following extension of Proposition 2.4.4: If f : M → Mpreserves a probability measure µ and (M,µ) is a Lebesgue space then µ admits

a (unique) lift µ to the natural extension f : M → M .

8.5.8. Let M be a subset of [0, 1] with exterior measure m∗(M) = 1 but whichis not a Lebesgue measurable set. Consider the σ-algebra M of all sets ofthe form M ∩ B, where B is a Lebesgue measurable subset of R. Check thatµ(M∩B) = m(B) defines a probability measure on (M,M) such that (M,M, µ)is not a Lebesgue space.

Chapter 9

Entropy

The word entropy was invented in 1865 by the German physicist and mathe-matician Rudolf Clausius, one of the founding pioneers of Thermodynamics. Inthe theory of systems in thermodynamical equilibrium, the entropy quantifiesthe degree of “disorder” in the system. The second law of Thermodynamicsstates that, when an isolated system passes from an equilibrium state to an-other, the entropy of the final state is necessarily bigger than the entropy ofthe initial state. For example, when we join two containers with different gases(oxygen and nitrogen, say), the two gases mix with one another until reaching anew macroscopic equilibrium, where they both are uniformly distributed in thetwo containers. The entropy of the new state is larger than the entropy of theinitial equilibrium, where the two gases were separate apart.

The notion of entropy plays a crucial role in different fields of Science. Animportant example, which we explore in our presentation, is the field of Infor-mation Theory, initiated by the work of the American electrical engineer ClaudeShannon in the mid 20th century. At roughly the same time, the Russian math-ematicians Andrey Kolmogorov and Yakov Sinai were proposing a definitionof entropy of a system in Ergodic Theory. The main purpose was to providean invariant of ergodic equivalence that, in particular, could distinguish twoBernoulli shifts. This Kolmogorov-Sinai entropy is the subject of the chapter.

In Section 9.1 we define the entropy of a transformation with respect to aninvariant probability measure, by analogy with a similar notion in InformationTheory. The theorem of Kolmogorov-Sinai, which we discuss in Section 9.2, isa fundamental tool for the actual calculation of the entropy in specific systems.In Section 9.3 we analyze the concept of entropy from a more local viewpoint,which is more closely related with Shannon’s formulation of this concept. Next,in Section 9.4, we illustrate a few methods for calculating the entropy, by meansof concrete examples.

In Section 9.5 we discuss the role of the entropy as an invariant of ergodicequivalence. The highlight is the theorem of Ornstein (Theorem 9.5.2), accord-ing to which any two-sided Bernoulli shifts are ergodically equivalent if and onlyif they have the same entropy. In that section we also introduce the class of

249

250 CHAPTER 9. ENTROPY

Kolmogorov systems, which contains the Bernoulli shifts and is contained in theclass of systems with Lebesgue spectrum. In both cases the inclusion is strict.

In the last couple of sections we present two complementary topics that willbe useful later. The first one (Section 9.6) is the theorem of Jacobs, accordingto which the entropy behaves in an affine way with respect to the ergodic de-composition. The other one (Section 9.7) concerns the notion of Jacobian andits relations with the entropy.

9.1 Definition of entropy

To motivate the definition of Kolmogorov-Sinai entropy, let us look at the fol-lowing basic situation in Information Theory. Consider some communicationchannel transmitting symbols from a certain alphabet A, one after the other.This could be a telegraph transmitting packs of dots and dashes, accordingto the old Morse code, an optical fiber, transmitting packs of zeros and ones,according to the ASCII binary code, or any other process of sequential transmis-sion of information, such as our reader’s going through the text of this book, oneletter after the other. The objective is to measure the entropy of the channel,that is, the mean quantity of information it carries, per unit of time.

9.1.1 Entropy in Information Theory

It is assumed that each symbol has a given frequency, that is, a given probabilityof being used at any time in the communication. For example, if the channel istransmitting a text in English then the letter E is more likely to be used thanthe letter Z, say. The occurrence of rarer symbols, such as Z, restricts the kindof word or sentence in which they appear and, hence, is more informative thanthe presence of commoner symbols, such as E.

This suggests that information should be a function of probability: the moreunlikely a symbol (or a word, defined as a finite sequence of symbols) is, themore information it carries.

The situation is actually more complicated, because for most communicationcodes the probability of using a given symbol also depends on the context. Forexample, still assuming that the channel transmits in English, any sequence ofsymbols S, Y , S, T , E must be followed by an M : in this case, in view of thesymbols transmitted previously, this letter M is unavoidable, which also meansthat it carries no additional information.1

On the other hand, in those situations where symbols are transmitted atrandom, independently of each other, the information carried by each symbolsimply adds to the information conveyed by the previous ones. For example, if

1We once participated in a “treasure hunt” that consisted in searching the woods for hiddenletters that would form the name of a mathematical object. It just so happened that the firstthree letters that were found were Z, Z and Z. That unfortunate circumstance ended the gameprematurely, since at that point the remaining letters could add no information: there is onlyone mathematical object whose name includes three times the letter Z (the Yoccoz puzzle).

9.1. DEFINITION OF ENTROPY 251

the transmission reflects the outcomes of the successive flipping of a fair coin,then the amount of information associated with the outcome (Head, Tail, Tail)must be equal to the sum of the amounts of information associated with eachof the symbols Head, Tail and Tail. Now, by independence, the probabilityof the event (Head, Tail, Tail) is the product of the probabilities of the eventsHead, Tail and Tail.

This suggests that information should be defined in terms of the logarithm ofthe probability. In Information Theory it is usual to consider base 2 logarithms,because essentially all the communication channels one finds in practice arebinary. However, there is no reason to stick to that custom in our setting: wewill consider natural (base e) logarithms instead.

By definition, the quantity of information associated with a symbol a ∈ Ais given by

I(a) = − log pa (9.1.1)

where pa is the probability (frequency) of the symbol a. The mean informationassociated with the alphabet A is given by

I(A) =∑

a∈A

paI(a) =∑

a∈A

−pa log pa. (9.1.2)

More generally, the quantity of information associated with a word a1 . . . anis

I(a1 . . . an) = − log pa1...an (9.1.3)

where pa1...an denotes the probability of the word. In the independent case thiscoincides with the product pa1 . . . pan of the probabilities of the symbols, butnot in general. Denoting by An the set of all the words of length n, we define

I(An) =∑

a1,...,an

pa1...anI(a1, . . . , an) =∑

a1,...,an

−pa1...an log pa1...an . (9.1.4)

Finally, the entropy of the communication channel is defined by:

I = limn

1

nI(An). (9.1.5)

We invite the reader to check that the sequence I(An) is subadditive and, thus,the limit in (9.1.5) does exist. This is also contained in the much more generaltheory that we are about to present.

9.1.2 Entropy of a partition

We want to adapt these ideas to our context in Ergodic Theory. The maindifference is that, while in Information Theory the alphabet A is usually discrete(finite or, at most, countable), that is not the case for the domain (space ofstates) of most interesting dynamical systems. That issue is dealt with by usingpartitions of the domain.


Let (M,B, µ) be a probability space. In this chapter, by partition we alwaysmean a countable (finite or infinite) family P of pairwise disjoint measurablesubsets of M whose union has full measure. We denote by P(x) the element ofthe partition that contains a given point x. The sum P ∨Q of two partitions Pand Q is the partition whose elements are the intersections P ∩Q with P ∈ Pand Q ∈ Q. More generally, given any countable family of partitions Pn, wedefine ∨

n

Pn =⋂

n

Pn : Pn ∈ Pn for each n.

With each partition P we associate the corresponding information function

IP : M → R, IP(x) = − logµ(P(x)). (9.1.6)

It is clear that the function IP is measurable. By definition, the entropy of thepartition P is the mean of its information function, that is,

Hµ(P) =

∫IP dµ =

∑

P∈P

−µ(P ) logµ(P ). (9.1.7)

We always abide to the usual (in the theory of Lebesgue Integration) conventionthat 0 log 0 = limx→0 x log x = 0. See Figure 9.1.

1

Figure 9.1: Graph of the function φ(x) = −x log x

Consider the function φ : (0,∞) → R given by φ(x) = −x log x. One canreadily check that φ′′ < 0. Therefore, φ is strictly concave:

t1φ(x1) + · · · + tkφ(xk) ≤ φ(t1x1 + · · · + tkxk) (9.1.8)

for every x1, . . . , xk > 0 and t1, . . . , tk > 0 with t1 + · · ·+ tk = 1; moreover, theidentity holds if and only if x1 = · · · = xk. This observation will be useful onseveral occasions.

We say that two partitions P and Q are independent if µ(P ∩Q) = µ(P )µ(Q)for every P ∈ P and every Q ∈ Q. Then, IP∨Q = IP + IQ and, therefore,Hµ(P ∨ Q) = Hµ(P) +Hµ(Q). In general, one has the inequality ≤, as we aregoing to see in a while.


Example 9.1.1. Let M = [0, 1] be endowed with the Lebesgue measure. Foreach n ≥ 1, consider the partition Pn of the interval M into the subintervals((i− 1)/10n, i/10n

]with 1 ≤ i ≤ 10n. Then,

Hµ(Pn) =

10n∑

i=1

−10−n log 10−n = n log 10.

Example 9.1.2. Let M = 1, . . . , dN be endowed with a product measureµ = νN. Denote pi = ν(i) for each i ∈ 1, . . . , d. For each n ≥ 1, let Pn bethe partition of M into the cylinders [0; a1, . . . , an] of length n. The entropy ofPn is

Hµ(Pn) =∑

a1,...,an

−pa1 . . . pan log(pa1 . . . pan

)

=∑

j

∑

a1,...,an

−pa1 . . . paj . . . pan log paj

=∑

j

∑

aj

−paj log paj

∑

ai,i6=j

pa1 . . . paj−1paj+1 . . . pan .

The last sum is equal to 1, since∑

i pi = 1. Therefore,

Hµ(Pn) =

n∑

j=1

d∑

aj=1

−paj log paj =

n∑

j=1

d∑

i=1

−pi log pi = −nd∑

i=1

pi log pi.

Lemma 9.1.3. Every finite partition P has finite entropy: Hµ(P) ≤ log #Pand the identity holds if and only if µ(P ) = 1/#P for every P ∈ P.

Proof. Let P = P1, P2, . . . , Pn and consider ti = 1/n and xi = µ(Pi). By theconcavity property (9.1.8):

1

nHµ(P) =

n∑

i=1

tiφ(xi) ≤ φ( n∑

i=1

tixi)

= φ( 1

n

)=

logn

n.

Therefore, Hµ(P) ≤ logn. Moreover, the identity holds if and only if µ(Pi) =1/n for every i = 1, . . . , n.

Example 9.1.4. Let M = [0, 1] be endowed with the Lebesgue measure µ. Ob-serve that the series

∑∞k=1 1/(k(log k)2) is convergent. Let c be the value of the

sum. Then, we may partition [0, 1] into intervals Pk with µ(Pk) = 1/(ck(log k)2)for every k. Let P be the partition formed by these subintervals. Then,

Hµ(P) =∞∑

k=1

log c+ log k + 2 log log k

ck(log k)2.

By the ratio convergence criterion, the series on the right-hand side as the samebehavior as the series

∑∞k=1 1/(k log k) which, as we known (use the integral

convergence criterion), is divergent. Therefore, Hµ(P) = ∞.


This shows that infinite partitions may have infinite entropy. From now,through the whole chapter, we always consider (countable) partitions with finiteentropy.

The conditional entropy of a partition P with respect to another partitionQ is the number

Hµ(P/Q) =∑

P∈P

∑

Q∈Q

−µ(P ∩Q) logµ(P ∩Q)

µ(Q). (9.1.9)

Intuitively, it measures the amount information provided by the partition Pin addition to the information provided by the partition Q. It is clear thatHµ(P/M) = Hµ(P) for every P , where M denotes the trivial partition M =M. Moreover, if P and Q are independent then Hµ(P/Q) = Hµ(P). Ingeneral, one has the inequality ≤, as we are going to see in a while.

Given two partitions P and Q, we say that P is coarser than Q (or, equiva-lently, Q is finer than P) and we write P ≺ Q, if every element of Q is containedin some element of P , up to measure zero. The sum P ∨Q may also be definedas the coarsest of all the partitions R such that P ≺ R and Q ≺ R.

Lemma 9.1.5. Let P, Q and R be partitions with finite entropy. Then,

(a) Hµ(P ∨ Q/R) = Hµ(P/R) +Hµ(Q/P ∨R);

(b) if P ≺ Q then Hµ(P/R) ≤ Hµ(Q/R) and Hµ(R/P) ≥ Hµ(R/Q).

(c) P ≺ Q if and only if Hµ(P/Q) = 0.

Proof. By definition,

Hµ(P ∨ Q/R) =∑

P,Q,R

−µ(P ∩Q ∩R) logµ(P ∩Q ∩R)

µ(R)

=∑

P,Q,R

−µ(P ∩Q ∩R) logµ(P ∩Q ∩R)

µ(P ∩R)

+∑

P,Q,R

−µ(P ∩Q ∩R) logµ(P ∩R)

µ(R).

The sum on the right-hand side may be rewritten as

∑

S∈P∨R,Q∈Q

−µ(S ∩Q) logµ(S ∩Q)

µ(S)+

∑

P∈P,R∈R

−µ(P ∩R) logµ(P ∩R)

µ(R)

= Hµ(Q/P ∨R) +Hµ(P/R).

This proves part (a). Next, observe that if P ≺ Q then

Hµ(P/R) =∑

P

∑

R

∑

Q⊂P

−µ(Q ∩R) logµ(P ∩R)

µ(R)

≤∑

P

∑

R

∑

Q⊂P

−µ(Q ∩R) logµ(Q ∩R)

µ(R)= Hµ(Q/R).


This proves the first half of claim (b). To prove the second half, note that forany P ∈ P and R ∈ R,

µ(R ∩ P )

µ(P )=∑

Q⊂P

µ(Q)

µ(P )

µ(R ∩Q)

µ(Q).

It is clear that∑

Q⊂P µ(Q)/µ(P ) = 1. Therefore, by (9.1.8),

φ(µ(R ∩ P )

µ(P )

)≥∑

Q⊂P

µ(Q)

µ(P )φ(µ(R ∩Q)

µ(Q)

)

for every P ∈ P and R ∈ R. Consequently,

Hµ(R/P) =∑

P,R

µ(P )φ(µ(R ∩ P )

µ(P )

)≥∑

P,R

µ(P )∑

Q⊂P

µ(Q)

µ(P )φ(µ(R ∩Q)

µ(Q)

)

=∑

Q,R

µ(Q)φ(µ(R ∩Q)

µ(Q)

)= Hµ(R/Q).

Finally, it follows from the definition in (9.1.9) that Hµ(P/Q) = 0 if and only if

µ(P ∩Q) = 0 or elseµ(P ∩Q)

µ(Q)= 1

for every P ∈ P and every Q ∈ Q. In other words, either Q is disjoint from P(up to measure zero) or else Q is contained in P (up to measure zero). Thismeans that Hµ(P/Q) = 0 if and only if P ≺ Q.

In particular, taking Q = M in part (b) of the lemma we get that

Hµ(R/P) ≤ Hµ(R) for any partitions R and P . (9.1.10)

Moreover, taking R = M in part (a) we find that

Hµ(P ∨ Q) = Hµ(P) +Hµ(Q/P) ≤ Hµ(P) +Hµ(Q). (9.1.11)

Let f : M → N be a measurable transformation and µ be a probabilitymeasure on M . Then, f∗µ is a probability measure on N . Moreover, if P isa partition of N then f−1(P) = f−1(P ) : P ∈ P is a partition of M . Bydefinition,

Hµ(f−1(P)) =

∑

P∈P

−µ(f−1(P )) log µ(f−1(P ))

=∑

P∈P

−f∗µ(P ) log f∗µ(P ) = Hf∗µ(P).(9.1.12)

In particular, if M = N and the measure µ is invariant under f then

Hµ(f−1(P)) = Hµ(P) for every partition P . (9.1.13)

We also need the following continuity property:


Lemma 9.1.6. Given k ≥ 1 and ε > 0 there exists δ > 0 such that, for anyfinite partitions P = P1, . . . , Pk and Q = Q1, . . . , Qk,

µ(Pi∆Qi) < δ for every i = 1, . . . , k ⇒ Hµ(Q/P) < ε.

Proof. Fix ε > 0 and k ≥ 1. Since φ : [0, 1] → R, φ(x) = −x log x is a continuousfunction, there exists ρ > 0 such that φ(x) < ε/k2 for every x ∈ [0, ρ)∪(1−ρ, 1].Let δ = ρ/k. Given partitions P and Q as in the statement, denote by R thepartition whose elements are the intersections Pi ∩ Qj with i 6= j and also theset ∪ki=1Pi ∩Qi. Note that µ(Pi ∩Qj) ≤ µ(Pi∆Qi) < δ for every i 6= j and

µ( k⋃

i=1

Pi ∩Qi)≥

k∑

i=1

(µ(Pi) − µ(Pi∆Qi)

)>

k∑

i=1

(µ(Pi) − δ

)= 1 − ρ.

Therefore,

Hµ(R) =∑

R∈R

φ(µ(R)) < #R ε

k2≤ ε.

It is clear from the definition that P ∨ Q = P ∨ R. Then, using (9.1.11) and(9.1.10),

Hµ(Q/P) = Hµ(P ∨Q) −Hµ(P) = Hµ(P ∨R) −Hµ(P)

= Hµ(R/P) ≤ Hµ(R) < ε.

This proves the lemma.

9.1.3 Entropy of a dynamical system

Let f : M → M be a measurable transformation preserving a probability mea-sure µ. The notion of entropy of the system (f, µ) that we introduce in whatfollows is inspired by (9.1.5).

Given a partition P of M with finite entropy, denote

Pn =n−1∨

i=0

f−i(P) for each n ≥ 1.

Observe that the element Pn(x) that contains x ∈M is given by:

Pn(x) = P(x) ∩ f−1(P(f(x))) ∩ · · · ∩ f−n+1(P(fn−1(x))).

It is clear that the sequence Pn is non-decreasing, that is, Pn ≺ Pn+1 for everyn. Therefore, the sequence of entropies Hµ(Pn) is also non-decreasing. Anotherimportant fact is that this sequence is subadditive:

Lemma 9.1.7. Hµ(Pm+n) ≤ Hµ(Pm) +Hµ(Pn) for every m,n ≥ 1.


Proof. By definition, Pm+n = ∨m+n−1i=0 f−i(P) = Pm ∨ f−m(Pn). Therefore,

using (9.1.11),Hµ(Pm+n) ≤ Hµ(Pm) +Hµ(f

−m(Pn)). (9.1.14)

On the other hand, since the measure µ is invariant under f , the property(9.1.13) implies that Hµ(f

−m(Pn)) = Hµ(Pn) for every m,n. Substituting thisfact in (9.1.14) we get the conclusion of the lemma.

In view of Lemma 3.3.4, it follows from Lemma 9.1.7 that the limit

hµ(f,P) = limn

1

nHµ(Pn) (9.1.15)

exists and coincides with the infinitum of the sequence on the left-hand side.We call hµ(f,P) the entropy of f with respect to the partition P. Observe thatthis entropy is a non-decreasing function of the partition:

P ≺ Q ⇒ hµ(f,P) ≤ hµ(f,Q). (9.1.16)

Indeed, if P ≺ Q then Pn ≺ Qn for every n. Using Lemma 9.1.5, it follows thatHµ(Pn) ≤ Hµ(Qn) for every n, and this implies (9.1.16).

Finally, the entropy of the system (f, µ) is defined by

hµ(f) = supPhµ(f,P), (9.1.17)

where the supremum is taken over all the partitions with finite entropy. A usefulobservation is that the definition is not affected if we take the supremum onlyover the finite partitions (see Exercise 9.1.2).

Example 9.1.8. Suppose that the invariant measure µ is supported on a pe-riodic orbit. In other words, there exist x ∈ M and k ≥ 1 such that fk(x) = xand the measure µ is given by

µ =1

k

(δx + δf(x) + · · · + δfk−1(x)

).

Note that this measure takes only a finite number of values (because the Diracmeasure takes only the values 0 and 1). Hence, the entropy function P 7→ Hµ(P)also takes only finitely many values. In particular, limn n

−1Hµ(Pn) = 0 forevery partition P . This proves that hµ(f) = 0.

Example 9.1.9. Consider the decimal expansion map f : [0, 1] → [0, 1], givenby f(x) = 10x−[10x]. As observed previously, f preserves the Lebesgue measureµ on the interval. Let P be the partition of [0, 1] into the intervals of the form((i− 1)/10, i/10] with i = 1, . . . , 10. Then, Pn is the partition into the intervals

of the form((i − 1)/10n, i/10n] with i = 1, . . . , 10n. Using the calculation in

Example 9.1.1, we get that

hµ(f,P) = limn

1

nHµ(Pn) = log 10.


Using the theory in Section 9.2 (the theorem of Kolmogorov-Sinai and its corol-laries), one can easily check that this is also the value of the entropy hµ(f), thatis, P realizes the supremum in the definition (9.1.17).

Example 9.1.10. Consider the shift map σ : Σ → Σ in Σ = 1, . . . , dN (orΣ = 1, . . . , dZ), with a Bernoulli measure µ = νN (respectively, µ = νZ). LetP be the partition of Σ into the cylinders [0;a] with a = 1, . . . , d. Then, Pn isthe partition into cylinders [0; a1, . . . , an] of length n. Using the calculation inExample 9.1.2 we conclude that

hµ(σ,P) = limn

1

nHµ(Pn) =

d∑

i=1

−pi log pi. (9.1.18)

The theory presented in Section 9.2 permits to prove that this is also the valueof the entropy hµ(σ).

It follows from the expression (9.1.18) that for every x > 0 there exists someBernoulli shift (σ, µ) such that hµ(σ) = x. We use this observation a few timesin what follows.

Lemma 9.1.11. hµ(f,Q) ≤ hµ(f,P) + Hµ(Q/P) for any partitions P and Qwith finite entropy.

Proof. By Lemma 9.1.5, for every n ≥ 1,

Hµ

(Qn+1/Pn+1

)= Hµ

(Qn ∨ f−n(Q)/Pn ∨ f−n(P)

)

≤ Hµ

(Qn/Pn

)+Hµ

(f−n(Q)/f−n(P)

).

The last term is equal to Hµ(Q/P), because the measure µ is invariant underf . Therefore, the previous relation proves that

Hµ

(Qn/Pn

)≤ nHµ

(Q/P

)for every n ≥ 1. (9.1.19)

Using Lemma 9.1.5 once more, it follows that

Hµ(Qn) ≤ Hµ(Pn ∨ Qn) = Hµ(Pn) +Hµ(Qn/Pn) ≤ Hµ(Pn) + nHµ(Q/P).

Dividing by n and taking the limit when n → ∞, we get the conclusion of thelemma.

Lemma 9.1.12. hµ(f,P) = limnHµ(P/∨nj=1 f

−j(P)) for any partition P withfinite entropy.

Proof. Using Lemma 9.1.5(a) and the fact that the measure µ is invariant underf , we get that

Hµ

( n−1∨

j=0

f−j(P))

= Hµ

( n−1∨

j=1

f−j(P))

+Hµ

(P/

n−1∨

j=1

f−j(P))

= Hµ

( n−2∨

j=0

f−j(P))

+Hµ

(P/

n−1∨

j=1

f−j(P))


for every n. By recurrence, it follows that

Hµ

( n−1∨

j=0

f−j(P))

= Hµ(P) +

n−1∑

k=1

Hµ

(P/

k∨

j=1

f−j(P)).

Therefore, hµ(f,P) is given by the Cesaro limit

hµ(f,P) = limn

1

nHµ

( n−1∨

j=0

f−j(P))

= limn

1

n

n−1∑

k=1

Hµ

(P/

k∨

j=1

f−j(P)).

On the other hand, Lemma 9.1.5(b) ensures that Hµ(P/ ∨nj=1 f−j(P)) is a

non-increasing sequence. In particular, limnHµ(P/ ∨nj=1 f−j(P)) exists and,

consequently, coincides with the Cesaro limit in the previous identity.

Recall that Pn = ∨n−1j=0 f

−j(P). When f : M → M is invertible, we also

consider P±n = ∨n−1j=−nf

−j(P).

Lemma 9.1.13. Let P be a partition with finite entropy. For every k ≥ 1, wehave hµ(f,P) = hµ(f,Pk) and, if f is invertible, hµ(f,P) = hµ(f,P±k).

Proof. Observe that, given any n ≥ 1,

n−1∨

j=0

f−j(Pk) =

n−1∨

j=0

f−j( k−1∨

i=0

f−i(P))

=

n+k−2∨

l=0

f−l(P) = Pn+k−1.

Therefore,

hµ(f,Pk

)= lim

n

1

nHµ

(Pn+k−1

)= lim

n

1

nHµ

(Pn)

= hµ(f,P

).

This proves the first part of the lemma. To prove the second part, note that

n−1∨

j=0

f−j(P±k) =

n−1∨

j=0

f−j( k−1∨

i=−k

f−i(P))

=

n+k−2∨

l=−k

f−l(P) = f−k(Pn+2k−1

)

for every n and every k. Therefore,

hµ(f,P±k

)= lim

n

1

nHµ

(f−k(Pn+2k−1)

)= lim

n

1

nHµ

(Pn+2k−1

)= hµ

(f,P

)

(the second equality uses the fact that µ is invariant under f).

Proposition 9.1.14. One has hµ(fk) = khµ(f) for every k ∈ N. If f is

invertible then hµ(fk) = |k|hµ(f) for every k ∈ Z.


Proof. It is clear that the identity holds for k = 0, since f0 = id and hµ(id) = 0.Take k to be non-zero from now on. Let g = fk and P be any partition of Mwith finite entropy. Recalling that Pk = P ∨ f−1(P) ∨ · · · ∨ f−k+1(P), we seethat

Pkm =

km−1∨

j=0

f−j(P) =

m−1∨

i=0

f−ki( k−1∨

j=0

f−j(P))

=

m−1∨

i=0

g−i(Pk).

Therefore,

khµ(f,P

)= lim

m

1

mHµ

(Pkm

)= hµ

(g,Pk

). (9.1.20)

Since P ≺ Pk, this implies that hµ(g,P) ≤ khµ(f,P) ≤ hµ(g) for any P . Takingthe supremum over these partitions P , it follows that hµ(g) ≤ khµ(f) ≤ hµ(g).This proves that khµ(f) = hµ(g), as stated.

Now suppose that f is invertible. Let P be any partition of M with finiteentropy. For any n ≥ 1,

Hµ

(∨n−1j=0 f

−j(P))

= Hµ

(f−n+1

(∨n−1i=0 f i(P)

))= Hµ

(∨n−1i=0 f i(P)

),

because the measure µ is invariant under f . Dividing by n and taking the limitwhen n→ ∞, we get that

hµ(f,P) = hµ(f−1,P). (9.1.21)

Taking the supremum over these partitions P , it follows that hµ(f) = hµ(f−1).

Replacing f with fk and using the first half of the proposition, we get thathµ(f

−k) = hµ(fk) = khµ(f) for every k ∈ N.

9.1.4 Exercises

9.1.1. Prove that Hµ(P/R) ≤ Hµ(P/Q) + Hµ(Q/R) for any partitions P , Qand R.

9.1.2. Show that the supremum of hµ(f,P) over the finite partitions coincideswith the supremum over all the partitions with finite entropy.

9.1.3. Check that limnHµ(∨k−1i=0 f

−i(P)/ ∨nj=k f−j(P)) = kh(f,P) for everypartition P with finite entropy and every k ≥ 1.

9.1.4. Let f : M →M be a measurable transformation preserving a probabilitymeasure µ.

(a) Assume that there exists an invariant set A ⊂ M with µ(A) ∈ (0, 1). LetµA and µB be the normalized restrictions of µ to the sets A and B = Ac,respectively. Show that hµ(f) = µ(A)hµA(f) + µ(B)hµB (f).

(b) Suppose that µ is a convex combination µ =∑n

i=1 aiµi of ergodic measuresµ1, . . . , µn. Show that hµ(f) =

∑ni=1 aihµi(f).

9.2. THEOREM OF KOLMOGOROV-SINAI 261

[Observation: In Section 9.6 we prove much stronger results.]

9.1.5. Let (M,B, µ) and (N, C, ν) be probability spaces and f : M → M andg : N → N be measurable transformations preserving the measures µ and ν,respectively. We say that (g, ν) is a factor of (f, µ) if there exists a measurablemap, not necessarily invertible, φ : (M,B) → (N, C) such that φ∗µ = ν andφ f = g φ at almost every point. Show that in that case hν(g) ≤ hµ(f).

9.2 Theorem of Kolmogorov-Sinai

In general, the main difficulty to calculate the entropy lies in the calculation ofthe supremum in (9.1.17). The methods that we develop in this section permitto simplify that task in many cases, by identifying certain partitions P thatrealize the supremum, that is, such that hµ(f,P) = hµ(f). The main result is:

Theorem 9.2.1 (Kolmogorov-Sinai). Let P1 ≺ · · · ≺ Pn ≺ · · · be a non-decreasing sequence of partitions with finite entropy such that ∪∞

n=1Pn generatesthe σ-algebra of measurable sets, up to measure zero. Then,

hµ(f) = limnhµ(f,Pn).

Proof. The limit always exists, for property (9.1.16) implies that the sequencehµ(f,Pn) is non-decreasing. The inequality ≥ in the statement is a direct con-sequence of the definition of entropy. Therefore, we only need to show thathµ(f,Q) ≤ limn hµ(f,Pn) for every partition Q with finite entropy. We use thefollowing fact, which is interesting in itself:

Proposition 9.2.2. Let A be an algebra that generates the σ-algebra of mea-surable sets, up to measure zero. For every partition Q with finite entropy andevery ε > 0 there exists some finite partition P ⊂ A such that Hµ(Q/P) < ε.

Proof. The first step is to reduce the statement to the case when Q is finite.Denote by Qj , j = 1, 2, . . . the elements of Q. For each k ≥ 1, consider thefinite partition

Qk = Q1, . . . , Qk,M \ ∪kj=1Qj.Lemma 9.2.3. If Q is a partition with finite entropy then limkHµ(Q/Qk) = 0.

Proof. Denote Q0 = M \ ∪kj=1Qj . By definition,

Hµ(Q/Qk) =

k∑

i=0

∑

j≥1

−µ(Qi ∩Qj) logµ(Qi ∩Qj)µ(Qi)

.

All the terms with i ≥ 1 vanish, since in that case µ(Qi ∩Qj) is equal to zero ifi 6= j and is equal to µ(Qi) if i = j. For i = 0 we have that µ(Qi ∩Qj) is equalto zero if j ≤ k and is equal to µ(Qj) if j > k. Therefore,

Hµ(Q/Qk) =∑

j>k

−µ(Qj) logµ(Qj)

µ(Q0)≤∑

j>k

−µ(Qj) logµ(Qj).


The hypothesis that Q has finite entropy means that the expression on theright-hand side converges to zero when k → ∞.

Given ε > 0, fix k ≥ 1 such that Hµ(Q/Qk) < ε/2. Consider any δ > 0. Bythe approximation theorem (Theorem A.1.19), for each i = 1, . . . , k there existsAi ∈ A such that

µ(Qi∆Ai) < δ/(2k2). (9.2.1)

Define P1 = A1 and Pi = Ai \∪i−1j=1Aj for i = 2, . . . , k and P0 = M \∪k−1

j=1Aj . Itis clear that P = P1, . . . , Pk, P0 is a partition of M and also that Pi ∈ A forevery i. For i = 1, . . . , k, we have Pi∆Ai = Pi \Ai = Ai ∩

(∪i−1j=1 Aj

). For any

x in this set, there is j < i such that x ∈ Ai ∩Aj . Since Qi ∩Qj = ∅, it followsthat x ∈ (Ai \Qi) ∪ (Aj \Qj). This proves that

Pi∆Ai ⊂ ∪ij=1(Aj \Qj) ⊂ ∪ij=1(Aj∆Qj)

and so µ(Pi∆Ai) < iδ/(2k2) ≤ δ/(2k). Together with (9.2.1), this implies that

µ(Pi∆Qi) < δ/(2k2) + δ/(2k) ≤ δ/k for i = 1, . . . , k. (9.2.2)

Moreover, P0∆Q0 ⊂ ∪ki=1Pi∆Qi since P and Qk are partitions of M . Hence,(9.2.2) implies that

µ(P0∆Q0) < δ. (9.2.3)

By Lemma 9.1.6, the relations (9.2.2) and (9.2.3) imply that Hµ(Qk/P) <ε/2, as long as we take δ > 0 sufficiently small. Then, by the inequality inExercise 9.1.1,

Hµ(Q/P) ≤ Hµ(Q/Qk) +Hµ(Qk/P) < ε,

as stated.

Corollary 9.2.4. If (Pn)n is a sequence of partitions as in Theorem 9.2.1 thenlimnHµ(Q/Pn) = 0 for every partition Q with finite entropy.

Proof. For each n, let An be the algebra generated by Pn. Note that the ele-ments of An are the finite unions of elements of Pn together with the comple-ments of such unions. Moreover, A = ∪nAn is the algebra generated by ∪nPn.Consider any ε > 0. By Proposition 9.2.2, there exists a finite partition P ⊂ Asuch that Hµ(Q/P) < ε. Hence, since P is finite, there exists m ≥ 1 such thatP ⊂ Am and, thus, P is coarser than Pm. Then, using Lemma 9.1.5,

Hµ(Q/Pn) ≤ Hµ(Q/Pm) ≤ Hµ(Q/P) < ε for every n ≥ m.

This completes the proof of the corollary.

We are ready to conclude the proof of Theorem 9.2.1. By Lemma 9.1.11,

hµ(f,Q) ≤ hµ(f,Pn) +Hµ(Q/Pn) for every n.

Taking the limit as n → ∞, we get that hµ(f,Q) ≤ limn hµ(f,Pn) for everypartition Q with finite entropy.


9.2.1 Generating partitions

In this section and the ones that follow, we deduce several useful consequencesof Theorem 9.2.1.

Corollary 9.2.5. Let P be a partition with finite entropy such that the unionof the iterates Pn = ∨n−1

j=0 f−j(P), n ≥ 1 generates the σ-algebra of measurable

sets, up to measure zero. Then, hµ(f) = hµ(f,P).

Proof. It suffices to apply Theorem 9.2.1 to the sequence Pn and to recall thathµ(f,Pn) = hµ(f,P) for every n, by Lemma 9.1.13.

Corollary 9.2.6. Assume that the system (f, µ) is invertible. Let P be a parti-tion with finite entropy such that the union of the iterates P±n = ∨n−1

j=−nf−j(P),

n ≥ 1 generates the σ-algebra of measurable sets, up to measure zero. Then,hµ(f) = hµ(f,P).

Proof. It suffices to apply Theorem 9.2.1 to the sequence P±n and to recall thathµ(f,P±n) = hµ(f,P) for every n, by Lemma 9.1.13.

In particular, the Corollaries 9.2.5 and 9.2.6 complete the calculation of theentropy of the decimal expansion and the Bernoulli shifts that we started inExamples 9.1.9 and 9.1.10, respectively.

In both situations, Corollary 9.2.5 and Corollary 9.2.6, we say that P isa generating partition or, simply, a generator of the system. Note, however,that this contains a certain abuse of language, since the conditions in the twocorollaries are not equivalent. For example, for the sift map in M = 1, . . . , dZ,the partition P into cylinders [0; a] : a = 1, . . . , d is such that the unionof the two-sided iterates P±n generates the σ-algebra of measurable sets butthe union of the one-sided iterates Pn does not. Whenever it is necessary todistinguish between these two concepts, we talk of one-sided generator and two-sided generator, respectively.

In this regard, let us point out that certain invertible systems admit one-sided generators. For example, if f : S1 → S1 is an irrational rotation andP = I, S1 \ I is a partition of the circle into two complementary intervals,then P is a one-sided generator (and also a two-sided generator, of course).However, this kind of situation is possible only for systems with entropy zero:

Corollary 9.2.7. Assume that the system (f, µ) is invertible and there existsa partition P with finite entropy such that ∪∞

n=1Pn generates the σ-algebra ofmeasurable sets, up to measure zero. Then, hµ(f) = 0.

Proof. Combining Lemma 9.1.12 and Corollary 9.2.5, we get that

hµ(f) = hµ(f,P) = limnHµ(P/f−1(Pn)).

Since ∪nPn generates the σ-algebra B of measurable sets, ∪nf−1(Pn) generatesthe σ-algebra f−1(B). Now notice that f−1(B) = B, since f is invertible. Hence,Corollary 9.2.4 implies that Hµ(P/f−1(Pn)) converges to zero when n → ∞.It follows that hµ(f) = hµ(f,P) = 0.


Now take M to be a metric space and µ to be a Borel probability measure.

Corollary 9.2.8. Let P1 ≺ · · · ≺ Pn ≺ · · · be a non-decreasing sequenceof partitions with finite entropy such that diamPn(x) → 0 for µ-almost everyx ∈M . Then,

hµ(f) = limnhµ(f,Pn).

Proof. Let U be any non-empty open subset of M . The hypothesis ensures thatfor each x ∈ U there exists n(x) such that the set Px = Pn(x)(x) is containedin U . It is clear that Px belongs to the algebra A generated by ∪nPn. Observealso that A is countable, since it consists of the finite unions of elements of thepartitions Pn together with the complements of such unions. In particular, themap x 7→ Px takes only countably many values. It follows that U = ∪x∈UPx isin the σ-algebra generated by A. This proves that the σ-algebra generated byA contains all the open sets and, thus, all the Borel sets. Now, the conclusionfollows directly from Theorem 9.2.1.

Example 9.2.9. Let f : S1 → S1 be a homeomorphism and µ be any invariantprobability measure. Given a finite partition P of S1 into subintervals, denoteby x1, . . . , xm their endpoints. For any j ≥ 1, the partition f−j(P) consists ofthe subintervals of S1 determined by the points f−j(xi). This implies that, foreach n ≥ 1, the elements of Pn have their endpoints in the set

f−j(xi) : j = 0, . . . , n− 1 and i = 1, . . . ,m.In particular, #Pn ≤ mn. Then, using Lemma 9.1.3,

hµ(f,P) = limn

1

nHµ(Pn) ≤ lim

n

1

nlog #Pn = lim

n

1

nlogmn = 0.

It follows that hµ(f) = 0: to see that, it suffices to consider any sequence of finitepartitions into intervals with diameter going to zero and to apply Corollary 9.2.8.

Corollary 9.2.10. Let P be a partition with finite entropy such that we havediamPn(x) → 0 for µ-almost every x ∈M . Then, hµ(f) = hµ(f,P).

Proof. It suffices to apply Corollary 9.2.8 to the sequence Pn, recalling thathµ(f,Pn) = hµ(f,P) for every n.

Analogously, if f is invertible and P is a partition with finite entropy andsuch that diamP±n(x) → 0 for µ-almost every x ∈M , then hµ(f) = hµ(f,P).

It is known that generators do exist in most interesting cases, although it maybe difficult to exhibit a generator explicitly. Indeed, suppose that the ambientM is a Lebesgue space. Rokhlin [Rok67a, §10] proved that if a system is aperi-odic (that is, the set of periodic points has measure zero) and almost every pointhas a countable (finite or infinite) set of pre-images, then there exists some gen-erator. In particular, every invertible aperiodic system admits some countablegenerator. In general, this generator may have infinite entropy. But Rokhlinalso showed that every invertible aperiodic system with finite entropy admitssome two-sided generator with finite entropy. Moreover (Krieger [Kri70]), thisgenerator may be chosen to be finite if the system is ergodic.


9.2.2 Semi-continuity of the entropy

Next, we examine some properties of the entropy function that associates witheach invariant measure µ of a given transformation f the value of the cor-responding entropy hµ(f). We are going to see that this function is usuallynot continuous. However, under quite broad assumptions, it is upper semi-continuous : given any ε > 0, one has hν(f) ≤ hµ(f) + ε for every ν sufficientlyclose to µ. That holds, in particular, for the class of transformations that we callexpansive. These facts have important consequences, some of which are exploredin Sections 9.2.3 and 9.6. Moreover, we return to this subject in Section 10.5.

Let us start by showing, through an example, that the entropy function maybe discontinuous:

Example 9.2.11. Let f : [0, 1] → [0, 1] be the decimal expansion map. As wesaw in Example 9.1.9, the entropy of f with respect to the Lebesgue measurem is hm(f) = log 10. For each k ≥ 1, denote by Fk the set of fixed points ofthe iterate fk. Observe that Fk is an invariant set with #Fk = 10k. Observealso that these sets are equidistributed, in the following sense: each interval[(i− 1)/10k, i/10k] contains exactly one point of Fk. Consider the sequence ofmeasures

µk =1

10k

∑

x∈Fk

δx.

The previous observations imply (check!) that each µk is an invariant probabilitymeasure and the sequence (µk)k converges to the Lebesgue measure m in theweak∗ topology. Since µk is supported on a finite set, the same argument as inExample 9.1.8 proves that hµk

(f) = 0 for every k. In particular, we have thatlimk hµk

(f) = 0 < hm(f).

On the other hand, in general, consider any finite partition P of M whoseboundary

∂P =⋃

P∈P

∂P

satisfies µ(∂P) = 0. By Theorem 2.1.2 or, more precisely, by the fact that thetopology (2.1.5) is equivalent to the weak∗ topology, the function ν 7→ ν(P ) iscontinuous at the point µ, for every P ∈ P . Consequently, the function

ν 7→ Hν(P) =∑

P∈P

−ν(P ) log ν(P ),

is also continuous at µ. The hypothesis on P also implies that µ(∂Pn) = 0 forevery n ≥ 1, since

∂Pn ⊂ ∂P ∪ f−1(∂P) ∪ · · · ∪ f−n+1(∂P).

Thus, the same argument shows that ν 7→ Hν(Pn) is continuous for every n.

Proposition 9.2.12. Let P be a finite partition such that µ(∂P) = 0. Then,the function ν 7→ hν(f,P) is upper semi-continuous at µ.


Proof. Recall that, by definition,

hν(f,P) = infn

1

nHν(f,P).

It is well-known easy fact that the infimum of any family of continuous functionsis an upper semi-continuous function.

Corollary 9.2.13. Assume that there exists a finite partition P such thatµ(∂P) = 0 and ∪nPn generates the σ-algebra of measurable sets of M , upto measure zero. Then, the function η 7→ hη(f) is upper semi-continuous at µ.

Proof. By Proposition 9.2.12, given ε > 0 there exists a neighborhood U of µsuch that hν(f,P) ≤ hµ(f,P) + ε for every ν ∈ V . By definition, hµ(f,P) ≤hµ(f). By Corollary 9.2.5, the hypothesis implies that hν(f,P) = hν(f) forevery ν. Therefore, hν(f) ≤ hµ(f) + ε for every ν ∈ V .

Now let us suppose that M is a compact metric space. As before, take µto be a Borel probability measure. By definition, the diameter diamP of apartition P is the supremum of the diameters of its elements. Then we have thefollowing more specialized version of the previous corollary:

Corollary 9.2.14. Assume that there exists ε0 > 0 such that every finite par-tition P with diamP < ε0 satisfies limn diamPn = 0. Then, the functionµ 7→ hµ(f) is upper semi-continuous. Consequently, that function is boundedand its supremum is attained for some measure µ.

Proof. As we saw in Corollary 9.2.10, the property limn diamPn = 0 impliesthat ∪nPn generates the σ-algebra of measurable sets. On the other hand, givenany invariant probability measure µ, it is easy to choose2 a partition P withdiameter smaller than ε0 and such that µ(∂P) = 0. It follows from the previouscorollary that the entropy function is upper semi-continuous at µ and, since µis arbitrary, this gives the first claim in the statement.

The other claims are general consequences of semi-continuity and fact thatthe domain of entropy function, that is, the space M1(f) of all invariant prob-ability measures is compact.

When f is invertible we may replace Pn by P±n = ∨n−1j=−nf

−j(P) in thestatement of the Corollaries 9.2.13 and 9.2.14. The proof is analogous, usingCorollary 9.2.5 and the version of Corollary 9.2.10 for invertible transformations.

2For example: for each x choose rx ∈ (0, ε0) such that the boundary of the ball of centerx and radius rx has measure zero. Then let U be a finite cover of M by such balls and takefor P the partition defined by U , that is, the partition whose elements are the maximal setsthat, for each U ∈ U , are either contained in U or disjoint from U ; see Figure 2.1.


9.2.3 Expansive transformations

Next, we discuss a rather broad class of transformations that satisfy the condi-tions in Corollary 9.2.14.

A continuous transformation f : M → M in a metric space is said to beexpansive if there exists ε0 > 0 (called constant of expansivity) such that, givenany x, y ∈ M with x 6= y, there exists n ∈ N such that d(fn(x), fn(y)) ≥ ε0.That is, any two distinct orbits of f may be distinguished, at a macroscopicscale, at some stage of the iteration.

When f is invertible, there is also a two-sided version of the notion of ex-pansivity, defined as follows: there exists ε0 > 0 such that, given any x, y ∈ Mwith x 6= y there exists n ∈ Z such that d(fn(x), fn(y)) ≥ ε0. It is clear that(one-sided) expansive homeomorphisms are also two-sided expansive.

Example 9.2.15. Let σ : Σ → Σ be the shift map in Σ = 1, . . . , dN. Considerin Σ the distance defined by d((xn)n, (yn)n) = 2−N , where N is the smallestvalue of n such that xn 6= yn. Note that d(σN (x), σN (y)) = 20 = 1 if x = (xn)nand y = (yn)n are distinct. This shows that σ is an expansive transformation,with ε0 = 1 as constant of expansivity.

Analogously, the two-sided shift map σ : Σ → Σ in Σ = 1, . . . , dZ istwo-sided expansive (but not one-sided expansive).

We leave it to the reader to check (Exercise 9.2.1) that the decimal expansiontransformation f(x) = 10x− [10x] is also expansive. On the other hand, torusrotations are never expansive.

Proposition 9.2.16. Let f : M → M be an expansive transformation ina compact metric space and let ε0 > 0 be a constant of expansivity. Then,limn diamPn = 0 for every finite partition P with diamP < ε0.

Proof. It is clear that the sequence diamPn is non-increasing. Suppose that itsinfimum δ is positive. Then, for every n ≥ 1 there exist points xn and yn suchthat d(xn, yn) > δ/2 but xn and yn belong to the same element of Pn and, thus,satisfy

d(f j(xn), f j(yn)) ≤ diamP < ε0 for every 0 ≤ j < n.

By compactness, there exists a sequence (nj)j → ∞ such that (xnj )j and(ynj )j converge to points x and y, respectively. Then, d(x, y) ≥ δ/2 > 0 butd(f j(x), f j(y)) ≤ diamP < ε0 for every j ≥ 0. This contradicts the hypothesisthat ε0 is a constant of expansivity.

Corollary 9.2.17. If f : M →M is an expansive transformation in a compactmetric space then the entropy function is upper semi-continuous. Moreover,there exist invariant probability measures µ whose entropy hµ(f) is maximumamong all the invariant probability measures of f .

Proof. Combine Proposition 9.2.16 with Corollary 9.2.14.

If the transformation f is invertible and two-sided expansive, we may replacePn by P±n in Proposition 9.2.16 and the conclusion of Corollary 9.2.17 alsoremains valid as stated.


9.2.4 Exercises

9.2.1. Show that the decimal expansion f : [0, 1] → [0, 1], f(x) = 10x − [10x]is expansive and exhibit a constant of expansivity.

9.2.2. Check that for every s > 0 there exists some Bernoulli shift (σ, µ) whoseentropy is equal to s.

9.2.3. Let X = 0 ∪ 1/n : n ≥ 1 and consider the space Σ = XN endowedwith the distance d((xn)n, (yn)n) = 2−N |xN − yN |, where N = minn ∈ N :xn 6= yn.

(a) Verify that the shift map σ : Σ → Σ is not expansive.

(b) For each k ≥ 1, let νk be the probability measure on X that assigns weight1/2 to each of the points 1/k and 1/(k + 1). Use the Bernoulli measuresµk = νN

k to conclude that the entropy function of the shift is not uppersemi-continuous.

(c) Let µ be the Bernoulli measure associated with any probability vector(px)x∈X such that

∑x∈X −px log px = ∞. Show that hµ(σ) is infinite.

9.2.4. Let f : S1 → S1 be a covering map of degree d ≥ 2 and µ be a probabilitymeasure invariant under f . Show that hµ(f) ≤ log d.

9.2.5. Let P and Q be two partitions with finite entropy. Show that if P iscoarser than ∨∞

j=0f−j(Q) then hµ(f,P) ≤ hµ(f,Q).

9.2.6. Show that if A is an algebra that generates the σ-algebra of measurablesets, up to measure zero, then the supremum of hµ(f,P) over the partitionswith finite entropy (or even the finite partitions) P ⊂ A coincides with hµ(f).

9.2.7. Consider transformations f : M → M and g : N → N preservingprobability measures µ and ν, respectively. Consider f × g : M ×N →M ×Ndefined by (f × g)(x, y) = (f(x), g(y)). Show that f × g preserves the productmeasure µ× ν and that hµ×ν(f × g) = hµ(f) + hν(g).

9.3 Local entropy

The theorem of Shannon-McMillan-Breiman, which we discuss in this section,provides a complementary view of the concept of entropy, more detailed andmore local in nature. We also mention a topological version of that idea, whichis due to Brin-Katok.

Theorem 9.3.1 (Shannon-McMillan-Breiman). Given any partition P withfinite entropy, the limit

hµ(f,P , x) = limn

− 1

nlogµ(Pn(x)) exists at µ-almost every point. (9.3.1)

9.3. LOCAL ENTROPY 269

The function x 7→ hµ(f,P , x) is µ-integrable, and the limit in (9.3.1) also holdsin L1(µ). Moreover,

∫hµ(f,P , x) dµ(x) = hµ(f,P).

If (f, µ) is ergodic then hµ(f,P , x) = hµ(f,P) at µ-almost every point.

Recall that Pn(x) = P(x) ∩ f−1(P(f(x))) ∩ · · · ∩ f−n+1(P(fn−1(x))), thatis, Pn(x) is formed by the points whose trajectories remains “close” to thetrajectory of x during n iterates, in the sense that they visit the same elementsof P . Theorem 9.3.1 states that the measure of this set has a well definedexponential rate of decay: at µ-almost every point,

µ(Pn(x)) ≈ e−nhµ(f,P,x) for every large n.

The proof of the theorem is presented in Section 9.3.1.The theorem of Brin-Katok that we state in the sequel belongs to the same

family of results, but uses a different notion of proximity.

Definition 9.3.2. Let f : M → M be a continuous map in a compact metricspace. Given x ∈ M , n ≥ 1 and ε > 0, we call dynamical ball of length n andradius ε around x the set

B(x, n, ε) = y ∈M : d(f j(x), f j(y)) < ε for every j = 0, 1, . . . , n− 1.

In other words,

B(x, n, ε) = B(x, ε) ∩ f−1(B(f(x), ε)) ∩ · · · ∩ f−n+1(B(fn−1(x), ε)).

Define,

h+µ (f, ε, x) = lim sup

n− 1

nlogµ(B(x, n, ε)) and

h−µ (f, ε, x) = lim infn

− 1

nlogµ(B(x, n, ε)).

Theorem 9.3.3 (Brin-Katok). Let µ be a probability measure invariant underf . The limits

limε→0

h+µ (f, ε, x) and lim

ε→0h−µ (f, ε, x)

exist and are equal, for µ-almost every point. Denoting by hµ(f, x) their commonvalue, the function hµ(f, ·) is integrable and

hµ(f) =

∫hµ(f, x)dµ(x).

The proof of this result may be found in the original paper of Brin, Ka-tok [BK83] and is not presented here.


Example 9.3.4 (Translations in compact groups). Let G be a compact metriz-able group and µ be its Haar measure. Every translation of G has zero entropywith respect to µ. Indeed, consider in G any distance d that is invariant underall the translations (recall Lemma 6.3.6). Relative to such a distance,

Ljg(B(x, ε)) = B(Ljg(x), ε)

for every g ∈ G and j ∈ Z. Consequently, B(x, n, ε) = B(x, ε) for every n ≥ 1.Then,

h±µ (Lg, ε, x) = limn

− 1

nlog µ(B(x, ε)) = 0

for every ε > 0 and x ∈ G. By the theorem of Brin-Katok, it follows thathµ(Lg) = 0 for every g ∈ G. The same argument applies to every right-translation Rg.

9.3.1 Proof of the Shannon-McMillan-Breiman theorem

Consider the sequence of functions ϕn : M → R defined by

ϕn(x) = − logµ(Pn(x))

µ(Pn−1(f(x))).

By telescopic cancellation,

− 1

nlogµ(Pn(x)) = − 1

nlogµ(P(fn−1(x))) +

1

n

n−2∑

j=0

ϕn−j(fj(x)) (9.3.2)

for every n and every x.

Lemma 9.3.5. The sequence −n−1 logµ(P(fn−1(x))) converges to zero µ-almost everywhere and in L1(µ).

Proof. Start by noting that the function x 7→ − logµ(P(x)) is integrable:∫

| logµ(P(x))| dµ(x) =

∫− logµ(P(x)) dµ(x) = Hµ(P) <∞.

Using Lemma 3.2.5, it follows that −(n − 1)−1 logµ(P(fn−1(x))) converges tozero at µ-almost every point. Moreover, it is clear that this conclusion is notaffected if one replaces n − 1 by n in the denominator. This proves the claimof µ-almost everywhere convergence. Next, using the fact that the measure µ isinvariant and Hµ(P) <∞,

‖ − 1

nlogµ(P(fn−1(x)))‖1 =

1

n

∫− logµ(P(fn−1(x))) dµ(x) =

1

nHµ(P)

converges to zero when n→ ∞. This proves the convergence in L1(µ).

Next, we show that the last term in (9.3.2) also converges µ-almost every-where and in L1(µ).


Lemma 9.3.6. The limit ϕ(x) = limn ϕn(x) exists at µ-almost every point.

Proof. For each n > 1, denote by Qn the partition of M defined by

Qn(x) = f−1(Pn−1(f(x))) = f−1(P(f(x))) ∩ · · · ∩ f−n+1(P(fn−1(x))).

Note that µ(Pn−1(f(x)) = µ(Qn(x)) and Pn(x) = P(x) ∩ Qn(x). Therefore,

µ(Pn(x))

µ(Pn−1(f(x)))=µ(P(x) ∩ Qn(x))

µ(Qn(x)). (9.3.3)

For each P ∈ P and n > 1, consider the conditional expectation (recall Sec-tion 5.2.1)

en(XP , x) =1

µ(Qn(x))

∫

Qn(x)

XP dµ =µ(P ∩Qn(x))

µ(Qn(x)).

Comparing with (9.3.3) we see that

en(XP , x) =µ(Pn(x))

µ(Pn−1(f(x)))for every x ∈ P.

By Lemma 5.2.1, the limit e(XP , x) = limn en(XP , x) exists for µ-almost everyx ∈ M and, in particular, for µ-almost every x ∈ P . Since P ∈ P is arbitrary,this proves that

limn

µ(Pn(x))µ(Pn−1(f(x)))

exists for µ-almost every point. Taking logarithms, we get that limn ϕn(x) existsfor µ-almost every point, as stated.

Lemma 9.3.7. The function Φ = supn ϕn is integrable.

Proof. As in the previous lemma, let us consider the partitions Qn defined byQn(x) = f−1(Pn−1(f(x))). Fix any P ∈ P . Given x ∈ P and t > 0, it is clearthat Φ(x) > t if and only if ϕn(x) > t for some n. Moreover,

ϕn(x) > t ⇔ µ(P ∩Qn(x)) < e−tµ(Qn(x))

and, in that case, ϕn(y) > t for every y ∈ P ∩ Qn(x). Therefore, we may writethe set x ∈ P : Φ(x) > t as a disjoint union ∪j(P ∩ Qj), where each Qjbelongs to some partition Qn(j) and

µ(P ∩Qj) < e−tµ(Qj) for every j.

Consequently, for every t > 0 and P ∈ P ,

µ(x ∈ P : Φ(x) > t) =∑

j

µ(P ∩Qj) < e−t∑

j

µ(Qj) ≤ e−t. (9.3.4)


Then (see Exercise 9.3.1),

∫Φ dµ =

∑

P∈P

∫

P

Φ dµ =∑

P∈P

∫ ∞

0

µ(x ∈ P : Φ(x) > t) dt

≤∑

P∈P

∫ ∞

0

mine−t, µ(P ) dt.

The last integral may be rewritten as follows:

∫ − logµ(P )

0

µ(P ) dt+

∫ ∞

− logµ(P )

e−t dt = −µ(P ) logµ(P ) + µ(P ).

Combining these two relations:∫

Φ dµ ≤∑

P∈P

−µ(P ) logµ(P ) + µ(P ) = Hµ(P) + 1 <∞.

This proves the lemma, since Φ is non-negative.

Lemma 9.3.8. The function ϕ is integrable and (ϕn)n converges to ϕ in L1(µ).

Proof. We saw in Lemma 9.3.6 that (ϕn)n converges to ϕ at µ-almost everypoint. Since 0 ≤ ϕn ≤ Φ for every n, we also have that 0 ≤ ϕ ≤ Φ. Inparticular, ϕ is integrable. Moreover, |ϕ − ϕn| ≤ Φ for every n and, thus, wemay use the dominated convergence theorem (Theorem A.2.11) to conclude that

limn

∫|ϕ− ϕn| dµ =

∫limn

|ϕ− ϕn| dµ = 0.

This proves the convergence in L1(µ).

Lemma 9.3.9. At µ-almost every point and in L1(µ),

limn

1

n

n−2∑

j=0

ϕn−j(fj(x)) = lim

n

1

n

n−2∑

j=0

ϕ(f j(x)).

Proof. By the Birkhoff ergodic theorem (Theorem 3.2.3), the limit on the right-hand side exists at µ-almost every point and in L1(µ), indeed, it is equal tothe time average of the function ϕ. Therefore, it is enough to show that thedifference

1

n

n−2∑

j=0

(ϕn−j − ϕ) f j (9.3.5)

converges to zero at µ-almost every point and in L1(µ). Since the measure µ isinvariant, ‖(ϕn−j − ϕ) f j‖1 = ‖ϕn−j − ϕ‖1 for every j. Hence,

‖ 1

n

n−2∑

j=0

(ϕn−j − ϕ) f j‖1 ≤ 1

n

n−2∑

j=0

‖ϕn−j − ϕ‖1.


By Lemma 9.3.8 the sequence on the right-hand side converges to zero. Thisimplies that (9.3.5) converges to zero in L1(µ). We are left to prove that thesequence converges at µ-almost every point.

For each fixed k ≥ 2, consider Φk = supi>k |ϕi −ϕ|. Note that Φk ≤ Φ and,thus, Φk ∈ L1(µ). Moreover,

1

n

n−2∑

j=0

|ϕn−j − ϕ| f j =1

n

n−k−1∑

j=0

|ϕn−j − ϕ| f j +1

n

n−2∑

j=n−k

|ϕn−j − ϕ| f j

≤ 1

n

n−k−1∑

j=0

Φk f j +1

n

n−2∑

j=n−k

Φ f j .

By the Birkhoff ergodic theorem, the first term on the right-hand side convergesto the time average Φk at µ-almost every point. By Lemma 3.2.5, the last termconverges to zero at µ-almost every point: the lemma implies that n−1Φ fn−iconverges to zero for any fixed i. Hence,

lim supn

1

n

n−2∑

j=0

|ϕn−j − ϕ|(f j(x)) ≤ Φk(x) at µ-almost every point. (9.3.6)

We claim that limk Φk(x) = 0 at µ-almost every point. Indeed, the sequence(Φk)k is non-increasing and, by Lemma 9.3.6, it converges to zero at µ-almostevery point. By the monotone convergence theorem (Theorem A.2.9), it followsthat

∫Φk dµ → 0 when k → ∞. Another consequence is that (Φk)k is non-

increasing. Hence, using the monotone convergence theorem together with theBirkhoff ergodic theorem,

∫limk

Φk dµ = limk

∫Φk dµ = lim

k

∫Φk dµ = 0.

Since Φk is non-negative, it follows that limk Φk = 0 at µ-almost every point,as we claimed. Therefore, (9.3.6) implies that

limn

1

n

n−2∑

j=0

|ϕn−j − ϕ| f j = 0

at µ-almost every point. This completes the proof of the lemma.

It follows from (9.3.2) and Lemmas 9.3.5 and 9.3.9 that

hµ(f,P , x) = limn

− 1

nlogµ(Pn(x))

exists at µ-almost every point and in L1(µ): in fact, it coincides with the timeaverage ϕ(x) of the function ϕ. Then, in particular,

∫hµ(f,P , x) dµ(x) = lim

n

1

n

∫− logµ(Pn(x)) dµ(x)

= limn

1

nHµ(Pn) = hµ(f,P).


Moreover, if (f, µ) is ergodic then h(f,P , x) = ϕ(x) is constant at µ-almostevery point. That is, in that case hµ(f,P , x) = hµ(f,P) for µ-almost everypoint. This closes the proof of Theorem 9.3.1.

9.3.2 Exercises

9.3.1. Check that, for any integrable function ϕ : M → (0,∞),

∫ϕdµ =

∫ ∞

0

µ(x ∈M : ϕ(x) > t) dt.

9.3.2. Use Theorem 9.3.1 to calculate the entropy of a Bernoulli shift in Σ =1, . . . , dN.

9.3.3. Show that the function hµ(f, x) in Theorem 9.3.3 is f -invariant. Con-clude that if (f, µ) is ergodic, then hµ(f) = hµ(f, x) for µ-almost every x.

9.3.4. Suppose that (f, µ) is ergodic and let P be a partition with finite entropy.Show that given ε > 0 there exists k ≥ 1 such that for every n ≥ k there existsBn ⊂ Pn such that

e−n(hµ(f,P)+ε) < µ(B) < e−n(hµ(f,P)−ε) for every B ∈ Bn

and the measure of the union of the elements of Bn is larger than 1 − ε.

9.4 Examples

In this section we illustrate the previous results through a few examples.

9.4.1 Markov shifts

Let Σ = 1, . . . , dN and σ : Σ → Σ be the shift map. Let µ be the Markovmeasure associated with a stochastic matrix P = (Pi,j)i,j and a probabilityvector p = (pi)i. We are going to prove:

Proposition 9.4.1. hµ(σ) =∑d

a=1 pa∑d

b=1 −Pa,b logPa,b.

Proof. Consider the partition P of Σ into cylinders [0; a], a = 1, . . . , d. Foreach n, the iterate Pn is the partition into cylinders [0; a1, . . . , an] of length n.Recalling that µ([0; a1, . . . , an]) = pa1Pa1,a2 · · ·Pan−1,an , we see that

Hµ(Pn) =∑

a1,...,an

−pa1Pa1,a2 · · ·Pan−1,an log(pa1Pa1,a2 · · ·Pan−1,an

)

=∑

a1

−pa1 log pa1

∑

a2,...,an

Pa1,a2 · · ·Pan−1,an

+

n−1∑

j=1

∑

aj ,aj+1

− logPaj ,aj+1

∑pa1Pa1,a2 · · ·Pan−1,an .

(9.4.1)

9.4. EXAMPLES 275

where the last sum is over all the values of a1, . . . , aj−1, aj+2, . . . , an. On theone hand, ∑

a2,...,an

Pa1,a2 · · ·Pan−1,an =∑

an

Pna1,an= 1,

because Pn is a stochastic matrix. On the other hand,

∑pa1Pa1,a2 · · ·Pan−1,an =

∑

a1,an

pa1Pja1,aj

Paj ,aj+1Pn−j−1aj+1,an

=∑

a1

pa1Pja1,aj

Paj ,aj+1 = pajPaj ,aj+1 ,

because Pn−j−1 is a stochastic matrix and pP j = P ∗jp = p. Replacing theseobservations in (9.4.1), we get that

Hµ(Pn) =∑

a1

−pa1 log pa1 +

n−1∑

j=1

∑

aj ,aj+1

−pajPaj ,aj+1 logPaj ,aj+1

=∑

a

−pa log pa + (n− 1)∑

a,b

−paPa,b logPa,b.

It follows that hµ(σ,P) =∑

a,b−paPa,b logPa,b. Since the family of all cylinders[0; a1, . . . , an] generates the σ-algebra of Σ, it follows from Corollary 9.2.5 thathµ(σ) = hµ(σ,P). This completes the proof of theorem.

This conclusion remains valid for two-sided Markov shifts as well, that is,when Σ = 1, . . . , dZ. The argument is analogous, using Corollary 9.2.6.

9.4.2 Gauss map

Now we calculate the entropy of the Gauss map G(x) = (1/x) − [1/x] relativeto the invariant probability measure

µ(E) =1

log 2

∫

E

dx

1 + x, (9.4.2)

which was already studied in Sections 1.3.2 and 4.2.4. The method that we aregoing to present extends to a much broader class of systems, including the ex-panding maps of the interval that are defined and discussed in Example 11.1.16.

Let P be the partition of (0, 1) into the subintervals (1/(m+ 1), 1/m) withm ≥ 1. As before, we denote Pn = ∨n−1

j=0G−j(P). The following facts are used

in what follows:

(A) Gn maps each Pn ∈ Pn diffeomorphically onto (0, 1), for each n ≥ 1.

(B) diamPn → 0 when n→ ∞.

(C) There exists C > 1 such that |(Gn)′(y)|/|(Gn)′(x)| ≤ C for every n ≥ 1and any x and y in the same element of the partition Pn.


(D) There exist c1, c2 > 0 such that c1m(Pn) ≤ µ(Pn) ≤ c2m(Pn) for everyn ≥ 1 and every Pn ∈ Pn, where m denotes the Lebesgue measure.

It is immediate from the definition that each P ∈ P is mapped by G diffeomor-phically onto (0, 1). Property (A) is a consequence, by induction on n. Using(A) and Lemma 4.2.12, we get that

diamPn ≤ supx∈Pn

1

|(Gn)′(x)| ≤ 2−[n/2]

for every n ≥ 1 and every Pn ∈ Pn. This implies (B). Property (C) is given byLemma 4.2.13. Finally, (D) follows directly from (9.4.2).

Proposition 9.4.2. hµ(G) =∫

log |G′| dµ.

Proof. Consider the function ψn(x) = − logµ(Pn(x)), for each n ≥ 1. Observethat

Hµ(Pn) =∑

Pn∈Pn

−µ(Pn) log µ(Pn) =

∫ψn(x) dµ(x).

Property (D) gives that

− log c1 ≥ ψn(x) + logm(Pn(x)) ≥ − log c2.

By property (A), we have logm(Pn(x)) = − log |(Gn)′(y)| for some y ∈ Pn(x).Using property (C), it follows that

− log c1 + logC ≥ ψn(x) − log |(Gn)′(x)| ≥ − log c2 − logC

for every x and every n. Consequently,

− log(c1/C) ≥ Hµ(Pn) −∫

log |(Gn)′| dµ ≥ log(c2C) (9.4.3)

for every n. Since the measure µ is invariant under G,

∫log |(Gn)′| dµ =

n−1∑

j=0

∫log |G′| Gj dµ = n

∫|G′| dµ.

Then, dividing (9.4.3) by n and taking the limit when n→ ∞,

hµ(f,P) = limn

1

nHµ(Pn) =

∫log |G′| dµ.

Now, property (B) ensures that we may apply Corollary 9.2.10 to conclude that

hµ(G) = hµ(G,P) =

∫log |G′| dµ.

This completes the proof of the proposition.

9.4. EXAMPLES 277

The integral in the statement of the proposition may be calculated explicitly:we leave it to the reader to check (using integration by parts and the fact that∑∞j=1 1/j2 = π2/6) that

hµ(G) =

∫log |G′| dµ =

∫ 1

0

−2 logxdx

(1 + x) log 2=

π2

6 log 2≈ 5, 46 . . . .

Then, recalling that (G,µ) is ergodic (Section 4.2.4), it follows from the theoremof Shannon-McMillan-Breiman (Theorem 9.3.1) that

limn

− 1

nlogµ(Pn(x)) =

π2

6 log 2for µ-almost every x.

As the measure µ is comparable to the Lebesgue measure, up to a constantfactor, this means that

diamPn(x) ≈ e−π2n

6 log 2 (up to a factor e±εn)

for µ-almost every x and every n sufficiently large. Observe that Pn(x) is formedby the points y whose continued fraction expansion coincide with the continuedfraction expansion of x up to order n.


Given a real number x > 0, we denote log+ x = maxlogx, 0. In this sectionwe prove the following result:

Proposition 9.4.3. Let fA : Td → Td be the endomorphism induced on thetorus Td by some invertible matrix A with integer coefficients. Let µ be theHaar measure of Td. Then,

hµ(fA) =

d∑

i=1

log+ |λi|

where λ1, . . . , λd are the eigenvalues of A, counted with multiplicity.

Initially, let us suppose that the matrix A is diagonalizable. Let v1, . . . , vdbe a normed basis of Rd such that Avi = λivi for each i. Let u be the numberof eigenvalues of A with absolute value strictly larger than 1. We may take theeigenvalues to be numbered in such a way that |λi| > 1 if and only if i ≤ u.Given x ∈ Td, every point y in a neighborhood of x may be written in the form

y = x+

d∑

i=1

tivi

with t1, . . . , td close to zero. Given ε > 0, denote by D(x, ε) the set of pointsy of this form with |ti| < ε for every i = 1, . . . , d. Moreover, for each n ≥ 1,consider

D(x, n, ε) = y ∈ Td : f jA(y) ∈ D(f jA(x), ε) for every j = 0, . . . , n− 1.


Observe that f jA(y) = f jA(x) +∑di=1 tiλ

ji vi for every n ≥ 1. Therefore,

D(x, n, ε) =x+

d∑

i=1

tivi : |λni ti| < ε for i ≤ u and |ti| < ε for i > u.

Hence, there exists a constant C1 > 1 that depends only on A, such that

C−11 εd

u∏

i=1

|λi|−n ≤ µ(D(x, n, ε)) ≤ C1εdu∏

i=1

|λi|−n

for every x ∈ Td, n ≥ 1 and ε > 0. It is also clear that there exists a constantC2 > 1 that depends only on A, such that

B(x,C−12 ε) ⊂ D(x, ε) ⊂ B(x,C2ε)

for x ∈ Td and ε > 0 small. Then, B(x, n, C−12 ε) ⊂ D(x, n, ε) ⊂ B(x, n, C2ε)

for every n ≥ 1. Combining these two observations and taking C = C1Cd2 , we

get that

C−1εdu∏

i=1

|λ−ni | ≤ µ(B(x, n, ε)) ≤ Cεdu∏

i=1

|λ−ni |

for every x ∈ Td, n ≥ 1 and ε > 0. Then,

h+µ (f, ε, x) = h−µ (f, ε, x) = lim

n

1

nlogµ

(B(x, n, ε)

)=

u∑

i=1

log |λi|

for x ∈ T and ε > 0 small. Hence, using the theorem of Brin-Katok (Theo-rem 9.3.3)

hµ(f) = hµ(f, x) =

u∑

i=1

log |λi|

for µ-almost every point x. This proves Proposition 9.4.3 in the diagonalizablecase.

The general case may be treated analogously, through writing the matrix Ain canonical Jordan form. We leave this task to the reader (Exercise 9.4.2).

9.4.4 Differentiable maps

Here we take M to be a Riemannian manifold (check Appendix A.4.5) andf : M → M to be a local diffeomorphism, that is, a C1 map whose derivativeDf(x) : TxM → Tf(x)M at each point x is an isomorphism. We are going tostate and discuss two important theorems, the Margulis-Ruelle inequality andthe Pesin entropy formula, that relate the entropy hµ(f) of an invariant measureto the Lyapunov exponents of the derivative Df .

Let µ be any probability measure invariant under f . According to the mul-tiplicative ergodic theorem of Oseledets (see Section 3.3.5), for µ-almost every

9.4. EXAMPLES 279

point x ∈ M there exist k(x) ≥ 1, real numbers λ1(x) > · · · > λk(x) and a

filtration Rd = V 1x > · · · > V

k(x)x > V

k(x)+1x = 0 such that

Df(x)V ix = V if(x) and limn

1

nlog ‖Dfn(x)v‖ = λi(x)

for every v ∈ V ix \ V i+1x , every i ∈ 1, . . . , k(x) and µ-almost every x ∈ M .

Moreover, all these objects depend measurably on the point x ∈M . When themeasure µ is ergodic, the number k(x), the Lyapunov exponents λi(x) and theirmultiplicities mi(x) = dimV ix −dimV i+1

x are all constant on a full measure set.Define ρ+(x) to be the sum of all positive Lyapunov exponents, counted with

multiplicity:

ρ+(x) =

k(x)∑

i=1

mi(x)λ+i (x) with λ+

i (x) = maxλi(x), 0.

The Margulis-Ruelle inequality asserts that the average of ρ+ is always an upperbound for the entropy of (f, µ). Proofs can be found in Ruelle [Rue78] andMane [Man87, Section 4.12].

Theorem 9.4.4 (Margulis-Ruelle inequality).

hµ(f) ≤∫ρ+ dµ.

It may happen that all the Lyapunov exponents are positive: that is thecase, for instance, for the expanding differentiable maps in Section 11.1. Then,ρ+(x) is simply the sum of all Lyapunov exponents, counted with multiplicity.Now, it is also part of the theorem of Oseledets (property (c1) in Section 3.3.5)that

k(x)∑

i=1

mi(x)λi(x) = limn

1

nlog | detDfn(x)|

at µ-almost every point. Observe that the right-hand side of this identity is aBirkhoff time average:

limn

1

nlog | detDfn(x)| = lim

n

1

n

n−1∑

j=0

log | detDf |(f j(x)).

So, by the Birkhoff ergodic theorem, the integral of ρ+ coincides with the integralof the function log | detDf |. Thus, in this case the Margulis-Ruelle inequalitybecomes:

hµ(f) ≤∫

log | detDf | dµ.

Another interesting particular case is when f is a diffeomorphism. It followsfrom the version of the Oseledets theorem for invertible maps (also stated inSection 3.3.5) that the Lyapunov exponents of Df−1 are the numbers −λi(x),


with multiplicities mi(x). Then, applying Theorem 9.4.4 to the inverse andrecalling (Proposition 9.1.14) that hµ(f) = hµ(f

−1), we get that

hµ(f) ≤∫ρ− dµ (9.4.4)

where ρ−(x) =∑k(x)i=1 mi(x)max−λi(x), 0.

Now let us suppose that the invariant measure µ is absolutely continuouswith respect to the volume measure associated with the Riemannian structureof M (check Appendix A.4.5). In this case, assuming just a little bit moreregularity, we have a much stronger result:

Theorem 9.4.5 (Pesin entropy formula). Assume that the derivative Df isHolder and the invariant measure µ is absolutely continuous. Then

hµ(f) =

∫ρ+ dµ.

This fundamental result was originally proven by Pesin [Pes77]. See alsoMane [Man87, Section 4.13] for an alternative proof.

The expression for the entropy of the Haar measure of a linear torus en-domorphism, given in Proposition 9.4.3, is a special case of Theorem 9.4.5.Indeed, one can check that the Lyapunov exponents of a linear endomorphismfA at every point coincide with the logarithms log |λi| of the absolute valuesof the eigenvalues of the matrix A, with the same multiplicities. Thus, in thiscontext

ρ+(x) ≡d∑

i=1

log+ |λi|.

Of course, the Haar measure is absolutely continuous. Another special case ofthe Pesin entropy formula will appear in Section 12.1.8, see (12.1.31)–(12.1.36).

Finally, let us point out that the assumption of absolute continuity is toostrong: the conclusion of Theorem 9.4.5 still holds if the invariant measure isjust “absolutely continuous along unstable manifolds”. Roughly speaking, thistechnical condition means that the conditional probabilities of µ with respect toa certain measurable partition, whose elements are unstable disks3 are absolutelycontinuous with respect to the volume measures induced on each of the disksby the Riemannian metric of M . Moreover, and most striking, this sufficientcondition is also necessary for the Pesin entropy formula to hold. For precisestatements, related results and proofs, see [LS82, Led84, LY85a, LY85b].

9.4.5 Exercises

9.4.1. Show that every rotation Rθ : Td → Td has entropy zero with respectto the Haar measure of the torus Td. [Observation: This is a special case of

3Unstable disks are differentiably embedded disks that are contracted exponentially undernegative iteration; in the non-invertible case, the definition is formulated in terms of thenatural extension of the transformation.

9.5. ENTROPY AND EQUIVALENCE 281

Example 9.3.4 but for the present statement we do not need the theorem ofBrin-Katok.]

9.4.2. Complete the proof of Proposition 9.4.3.

9.4.3. Let f : M → M be a measurable transformation and µ be an ergodicprobability measure. Let B ⊂M be a measurable set with µ(B) > 0, g : B → Bbe the first-return map of f to B and ν be the normalized restriction of µ tothe set B (recall Section 1.4.1). Show that hµ(f) = ν(B)hν(g).

9.4.4. Let f : M → M be a measure preserving transformation in a Lebesguespace (M,µ). Let f : M → M be the natural extension of f and µ be the lift

of µ (Exercise 8.5.7). Show that hµ(f) = hµ(f).

9.4.5. Prove that if f is the time-1 of a smooth flow on a surfaceM then hµ(f) =0 for every invariant ergodic measure µ. [Observation: Using Theorem 9.6.2below, it follows that the entropy is zero for every invariant measure.]

9.5 Entropy and equivalence

The notion entropy was originally introduced in Ergodic Theory as a means todistinguish systems that are not ergodically equivalent, especially in the case ofsystems that are spectrally equivalent and, thus, can not be distinguished byspectral invariants. It is easy to see that the entropy is, indeed, an invariant ofergodic equivalence:

Proposition 9.5.1. Let f : M → M and g : N → N be transformationspreserving probability measures µ in M and ν in N , respectively. If (f, µ) isergodically equivalent to (g, ν), then hµ(f) = hν(g).

Proof. Let φ : M → N be an ergodic equivalence between the two systems.This means that φ∗µ = ν and there exist full measure subsets X ⊂ M andY ⊂ N such that φ is a measurable bijection from X to Y , with measurableinverse. Moreover, as observed in Section 8.1, the sets X and Y may be chosento be invariant. Given any partition P of M with finite entropy for µ, let PXbe its restriction to X and QY = φ(PX) be the image of PX under φ. ThenQ = QY ∪Y c is a partition of N and, since X and Y are full measure subsets,

Hν(Q) =∑

Q∈QY

−ν(Q) log ν(Q) =∑

P∈PX

−µ(P ) logµ(P ) = Hµ(P).

Since X and Y are both invariant, PnX is the restriction of Pn to the subset Xand Qn = Qn

Y ∪ Y c for every n. Moreover,

QnY = ∨n−1

j=0 g−j(QY ) = φ(∨n−1

j=0 f−j(PX)) = φ(PnX)

for every n. Thus, the previous argument proves that Hν(Qn) = Hµ(Pn) forevery n and so

hν(g,Q) = limn

1

nHν(Qn) = lim

n

1

nHµ(Pn) = hµ(f,P).


Taking the supremum over P , we conclude that hν(g) ≥ hµ(f). The converseinequality is entirely analogous.

Using this observation, Kolmogorov and Sinai concluded that not all two-sided Bernoulli shifts are ergodically equivalent despite their being spectrallyequivalent, as we saw in Corollary 8.4.12. This also shows that spectral equiv-alence is strictly weaker than ergodic equivalence. In fact, as observed in Exer-cise 9.2.2, for every s > 0 there exists some two-sided Bernoulli shift (σ, µ) suchthat hµ(σ) = s. Therefore, a sole class of spectral equivalence may contain awhole continuum of ergodic equivalence classes.

9.5.1 Bernoulli automorphisms

The converse to Proposition 9.5.1 is false, in general. Indeed, we saw in Exam-ple 9.2.9 (and Corollary 9.2.7) that all the circle rotations have entropy zero.But an irrational rotation is never ergodically equivalent to a rational rotation,since the former is ergodic and the latter is not. Besides, Corollary 8.3.6 showsthat irrational rotations are also not ergodically equivalent to each other, ingeneral. The case of rational rotations is treated in Exercise 8.3.3.

However, a remarkable result due to Donald Ornstein [Orn70] states thatthe entropy is a complete invariant for two-sided Bernoulli shifts:

Theorem 9.5.2 (Ornstein). Two-sided Bernoulli shifts in Lebesgue spaces areergodically equivalent if and only if they have the same entropy.

We call Bernoulli automorphism any system that is ergodically equivalentto a two-sided Bernoulli shift. In the sequel we find several examples of suchsystems. The theorem of Ornstein may be reformulated as follows: two Bernoulliautomorphisms in Lebesgue spaces are ergodically equivalent if and only if theyhave the same entropy.

Let us point out that the theorem of Ornstein does not extend to one-sided Bernoulli. Indeed, Exercise 8.1.2 shows that in the non-invertible casethere are other invariants of ergodic equivalence, including the degree of thetransformation (the number of pre-images).

William Parry and Peter Walters [PW72b, PW72a, Wal73] proved, amongother results, that one-sided Bernoulli shifts corresponding to probability vectorsp = (p1, . . . , pk) and q = (q1, . . . , ql) are ergodically equivalent if and only ifk = l and the vector p is a permutation of the vector q. In Exercise 9.7.7, afterintroducing the notion of Jacobian, we invite the reader to prove this fact.

9.5.2 Systems with entropy zero

In this section we study some properties of systems whose entropy is equal tozero. The main result (Proposition 9.5.5) is that such systems are invertible atalmost every point, if the ambient space is a Lebesgue space. It is worthwhilecomparing this statement with Corollary 9.2.7. At the end of the section webriefly discuss the spectral types of systems with entropy zero.


In what follows (M,B, µ) is a probability space and f : M → M is a measurepreserving transformation. In the second half of the section we take (M B, µ)to be a Lebesgue space.

Lemma 9.5.3. For every ε > 0 there exists δ > 0 such that if P and Q arepartitions with finite entropy and Hµ(P/Q) < δ then for every P ∈ P thereexists a union P ′ of elements of Q satisfying µ(P∆P ′) < ε.

Proof. Let s = 1 − ε/2 and δ = −(ε/2) log s. For each P ∈ P consider

S = Q ∈ Q : µ(P ∩Q) ≥ sµ(Q).

Let P ′ be the union of all the elements of S. On the one hand,

µ(P ′ \ P ) =∑

Q∈S

µ(Q \ P ) ≤∑

Q∈S

(1 − s)µ(Q) ≤ ε

2. (9.5.1)

On the other hand,

Hµ(P/Q) =∑

R∈P

∑

Q∈Q

−µ(R ∩Q) logµ(R ∩Q)

µ(Q)

≥∑

Q/∈S

−µ(P ∩Q) log s = −µ(P \ P ′) log s.

This implies that

µ(P \ P ′) ≤ Hµ(P/Q)

− log s<

δ

− log s=ε

2. (9.5.2)

Putting (9.5.1) and (9.5.2) together, we get the conclusion of the lemma.

The next lemma means that the rate hµ(f,P) of information (relative tothe partition P) generated by system at each iteration is zero if and only if thefuture determines the present, in the sense that information relative to the 0-thiterate may be deduced from the ensemble of information relative to the futureiterates.

Lemma 9.5.4. Let P be a partition with finite entropy. Then, hµ(f,P) = 0 ifand only if P ≺ ∨∞

j=1 f−j(P).

Proof. Suppose that hµ(f,P) is zero. Using Lemma 9.1.12, we obtain thatHµ(P/ ∨nj=1 f

−j(P)) converges to zero when n → ∞. Then, by Lemma 9.5.3,for each l ≥ 1 and each P ∈ P there exist nl ≥ 1 and a union Pl of elements ofthe partition ∨nl

j=1f−j(P) such that µ(P∆Pl) < 2−l. It is clear that every Pl is

a union of elements of ∨∞j=1f

−j(P) and, thus, the same is true for every ∪∞l=nPl

and also for P∗ = ∩∞n=1 ∪∞

l=n Pl. Moreover,

µ(P \ ∪∞l=nPl) = 0 and µ(∪∞

l=nPl \ P ) ≤ 2−n


for every n and, consequently, µ(P∆P∗) = 0. This shows that every element ofP coincides, up to measure zero, with a union of elements of ∨∞

j=1f−j(P), as

claimed in the ‘only if’ half of the statement.The argument to prove the converse is similar to the one in Proposition 9.2.2.

Suppose that P ≺ ∨∞j=1f

−j(P). Write P = Pj : j = 1, 2, . . . and, for each

k ≥ 1, consider the finite partition Pk = P1, . . . , Pk,M \ ∪kj=1Pj. Given anyε > 0, Lemma 9.2.3 ensures that Hµ(P/Pk) < ε/2 for every k sufficiently large.Fix k in these conditions and write P0 = M \ ∪kj=1Pj . For each n ≥ 1 and each

j = 1, . . . , k, let Qni be the union of the elements of ∨nj=1f−j(P) that intersect

Pi. The hypothesis ensures that each (Qni )n is a decreasing sequence whoseintersection coincides with Pi up to measure zero. Then, given δ > 0 thereexists n0 such that

k∑

i=1

µ(Qni \ Pi) < δ for every n ≥ n0. (9.5.3)

Define Rni = Qni \ ∪i−1j=1Q

nj for i = 1, . . . , k and also Rn0 = M \ ∪kj=1Q

nj . It is

clear from the construction that Rn = Rn1 , . . . , Rnk , Rn0 is a partition of Mcoarser than ∨nj=1f

−j(P). Since Rni ⊂ Qni and Pi ⊂ Qni and the elements of Pare pairwise disjoint,

Rni \ Pi ⊂ Qni \ Pi and Pi \Rni = Pi ∩(∪i−1j=1 Q

nj

)⊂ ∪i−1

j=1

(Qnj \ Pj

)

for i = 1, . . . , k. Similarly, Rn0 ⊂ P0 and P0\Rn0 = P0∩∪kj=1Qni = ∪kj=1(Q

nj \Pj).

Therefore, the relation (9.5.3) implies that

µ(Pi∆Rni ) < δ for every i = 0, 1, . . . , k and every n ≥ n0.

Then, assuming that δ > 0 is small, it follows from Lemmas 9.1.5 and 9.1.6 that

Hµ

(Pk/

n∨

j=1

f−j(P))≤ Hµ(Pk/Rn) < ε/2

for every n ≥ n0. Using Exercise 9.1.1, we get that Hµ

(P/∨nj=1 f

−j(P))< ε

for every n ≥ n0. In this way, it is shown that Hµ

(P/∨nj=1 f

−j(P))→ 0. By

Lemma 9.1.12, it follows that hµ(f,P) = 0.

As a consequence, we get that every system with entropy zero is invertibleat almost every point:

Proposition 9.5.5. Let (M,B, µ) be a Lebesgue space and f : M → M be ameasure preserving transformation. If hµ(f) = 0 then (f, µ) is invertible: thereexists a measurable transformation g : M → M that preserves the measure µand satisfies f g = g f = id at µ-almost every point.

Proof. Consider the homomorphism f : B → B induced by f in the measurealgebra of B (these notions were introduced in Section 8.5). Recall that f is


always injective (Exercise 8.5.2). Given any B ∈ B, consider the partitionP = B,Bc. The hypothesis hµ(f) = 0 implies that hµ(f,P) = 0. ByLemma 9.5.4, it follows that P ≺ ∨∞

j=1f−j(P). This implies that P ⊂ f−1(B),

because f−j(P) ⊂ f−1(B) for every j ≥ 1. Varying B, we conclude thatB ⊂ f−1(B). In other words, the homomorphism f is surjective. Hence, fis an isomorphism of measure algebras. Then, by Proposition 8.5.6, there existsa measurable map g : M → M preserving the measure µ and such that thecorresponding homomorphism of measure algebra g : B → B is the inverse of f .In other words, f g = g f = id . Then, (Exercise 8.5.2) f g = g f = id , asclaimed.

These arguments also prove the following fact, which will be useful in awhile:

Corollary 9.5.6. In the conditions of Proposition 9.5.5, every σ-algebra A ⊂ Bthat satisfies f−1(A) ⊂ A up to measure zero also satisfies f−1(A) = A up tomeasure zero.

Exercise 9.1.5 implies that if (f, µ) has entropy zero then the same is truefor any factor. Therefore, the following fact is also an immediate consequenceof the proposition:

Corollary 9.5.7. In the conditions of Proposition 9.5.5, every factor of (f, µ)is invertible at almost every point.

It is not completely understood how entropy relates to the spectrum typeof a system, but there are several partial results, especially for systems withentropy zero.

Rokhlin [Rok67a, § 14] proved that every ergodic system with discrete spec-trum defined in a Lebesgue space has entropy zero. This may also be deducedfrom the fact that, as we mentioned in Section 8.3, every ergodic system withdiscrete spectrum is ergodically isomorphic to a translation in a compact abeliangroup. As we saw in Corollary 8.5.7, in Lebesgue spaces ergodic isomorphismimplies ergodic equivalence. Recall also that systems with discrete spectrum inLebesgue spaces are always invertible (Exercise 8.5.5).

In that same work of Rokhlin it is shown that invertible systems with singularspectrum defined in Lebesgue spaces have entropy zero and the same holds forsystems with Lebesgue spectrum of finite rank (if they exist). The case ofinfinite rank is the focus of the next section. We are going to see that thereare systems with Lebesgue spectrum of infinite rank and entropy zero. On theother hand, we introduce the important class of so-called Kolmogorov systems,for which the entropy is necessarily positive, in a strong sense.

9.5.3 Kolmogorov systems

Let (M,B, µ) be a non-trivial probability space, that is, one such that not allmeasurable sets have measure 0 or 1. We use ∨αUα to denote the σ-algebra


generated by any family of subsets Uα of B. Let f : M →M be a transformationpreserving the measure µ.

Definition 9.5.8. We say that (f, µ) is a Kolmogorov system if there existssome σ-algebra A ⊂ B such that

(a) f−1(A) ⊂ A up to measure zero;

(b)⋂∞n=0 f

−n(A) = ∅,M up to measure zero;

(c)∨∞n=0B ∈ B : f−n(B) ∈ A = B up to measure zero.

We leave it to the reader to check that this property is an invariant of ergodicequivalence (it is not an invariant of spectral equivalence, as will be explainedin a while).

If (f, µ) is Kolmogorov system then (fk, µ) is Kolmogorov system, for everyk ≥ 1. Indeed, if A satisfies condition (a) then the sequence f−j(A) is non-increasing and, in particular, f−k(A) ⊂ A. Then, the conditions (b) and (c)imply that

∞⋂

n=0

f−kn(A) =∞⋂

n=0

f−n(A) = ∅,M and

∞∨

n=0

B ∈ B : f−kn(B) ∈ A =

∞∨

n=0

B ∈ B : f−n(B) ∈ A = B

up to measure zero. We say that (f, µ) is a Kolmogorov automorphism if itis invertible and a Kolmogorov system. Then the inverse (f−1, µ) is also aKolmogorov system, as we will see in a while.

Proposition 9.5.9. Every Kolmogorov system has Lebesgue spectrum of infiniterank. If the σ-algebra B is countably generated then the rank is countable.

Proof. Let A ⊂ B be a σ-algebra satisfying the conditions in Definition 9.5.8.Let E = L2

0(M,A, µ) be the subspace of functions ϕ ∈ L20(M,B, µ) that are

A-measurable, that is, such that the pre-image ϕ−1(B) of every measurable setB ⊂ R is in A up to measure zero. We are going to show that E satisfies all theconditions in Definition 8.4.1.

Start by observing that Uf (L20(M,A, µ)) = L2

0(M, f−1(A), µ). Indeed, it isclear that if ϕ is A-measurable then Ufϕ = ϕ f is f−1(A)-measurable. Theinclusion ⊂ follows immediately. Conversely, given any B ∈ f−1(A), take A ∈ Asuch that B = f−1(A) and let c = µ(A) = µ(B). Then, XB − c = Uf(XA − c)is in Uf (L

20(M,A, µ)). This gives the other inclusion. So, the hypothesis that

f−1(A) ⊂ A up to measure zero ensures that Uf (E) ⊂ E.It also follows that Unf (L2

0(M,A, µ)) = L20(M, f−n(A), µ) for every n ≥ 0.

Then,∞⋂

n=0

Unf(L2

0(M,A, µ))

= L20

(M,∩∞

n=0f−n(A), µ

).


Hence, the hypothesis that ∩∞n=0f

−n(A) = 0,M up to measure zero impliesthat ∩∞

n=0Unf (E) = 0.

Now consider An = B ∈ B : f−n(B) ∈ A). The sequence (An)n is non-decreasing, because f−1(A) ⊂ A. Moreover, each ϕ is An-measurable if andonly if Unf ϕ = ϕ fn is A-measurable. This shows that U−n

f (L20(M,A, µ)) =

L20(M,An, µ) for every n ≥ 0. Observe also that

∞∑

n=0

L20

(M,An, µ

)= L2

0

(M,∨∞

n=0An, µ). (9.5.4)

Indeed, it is clear that L20(M,Ak, µ) ⊂ L2

0(M,∨∞n=0An, µ) for every k, since Ak

is contained in ∨∞n=0An. The inclusion ⊂ in (9.5.4) is an immediate consequence

of this observation, since L20(M,∨∞

n=0An, µ) is a Banach space. Now considerany A ∈ ∨∞

n=0An. The approximation theorem (Theorem A.1.19) implies thatfor each ε > 0 there exist n and An ∈ An such that µ(A∆An) < ε. Then,(XAn)n converges to XA in the L2-norm, and so XA ∈ ∑∞

n=0 L20(M,An, µ).

This proves the inclusion ⊃ in (9.5.4). In this way, we have shown that

∞∑

n=0

U−nf (L2

0(M,A, µ)) = L20

(M,∨∞

n=0An, µ).

Therefore, the hypothesis that ∨∞n=0An = B up to measure zero implies that∑∞

n=0 U−nf (E) = L2

0(M,B, µ).This concludes the proof that (f, µ) has Lebesgue spectrum. To prove that

the rank is infinite, we need the following lemma:

Lemma 9.5.10. Let A be any σ-algebra satisfying the conditions in Defini-tion 9.5.8. Then, for every A ∈ A with µ(A) > 0 there exists B ⊂ A such that0 < µ(B) < µ(A).

Proof. Suppose that A has any element A with positive measure that does notsatisfy the conclusion of the lemma. We claim that A ∩ f−k(A) has measurezero for every k ≥ 1. Then,

µ(f−i(A) ∩ f−j(A)) = µ(A ∩ f−j+i(A)) = 0 for every 0 ≤ i < j.

Since µ(f−j(A)) = µ(A) for every j ≥ 0, this implies that the measure µ isinfinite, which is a contradiction. This contradiction reduces the proof of thelemma to proving our claim.

To do that, note that condition (a) implies that f−k(A) ∈ f−k(A) ⊂ A.Then it follows from the choice of A that A ∩ f−k(A) must have either zeromeasure or full measure in A:

µ(A ∩ f−k(A)) = 0 or else µ(A \ f−k(A)) = 0.

So, to prove the claim it suffices to exclude the second possibility. Supposethat µ(A \ f−k(A)) = 0. Then (Exercise 1.1.4), there exists B ∈ A such that


µ(A∆B) = 0 and f−k(B) = B. It follows that B = f−nk(B) for every n ≥ 1and, thus,

B ∈⋂

n∈N

f−nk(A) =⋂

n∈N

f−n(A).

By condition (b), this means that the measure of B is either 0 or 1. Sinceµ(B) = µ(A) is positive, it follows that µ(A) = µ(B) = 1. Then, the hypothesisabout A implies that the σ-algebra A contains only sets with measure 0 or 1.By condition (c), it follows that the same is true for the σ-algebra B, whichcontradicts the assumption that the probability space is non-trivial.

On the way toward proving that the orthogonal complement F = E⊖Uf (E)has infinite dimension, let us start by checking that F 6= 0. Indeed, otherwisewe would have Uf (E) = E and, thus, Unf (E) = E for every n ≥ 1. By condition(b), that would imply that E = ∩nUnf (E) = 0. Then, by condition (c), we

would have L20(M,B, µ) = 0 and that would contradict the hypothesis that

the probability space is non-trivial.

Let ϕ be any non-zero element of F , fixed once and for all, and let N bethe set of all x ∈ M such that ϕ(x) 6= 0. Then, N ∈ A and µ(N) > 0.It is convenient to consider the space E′ = L2(M,A, µ) = E ⊕ constants.Observe that F coincides with E′ ⊖ Uf (E

′), because the Koopman operatorpreserves the line of constant functions. Let E′

N be the subspace of functionsψ ∈ E′ that vanish outside N , that is, such that ψ(x) = 0 for every x ∈ N c.By Lemma 9.5.10, we may find sets Aj ∈ A, j ≥ 1 with positive measure,contained in N and pairwise disjoint. Then, XAj is in E′

N for every j. Moreover,Ai ∩ Aj = ∅ yields XAi · XAj = 0 for every i 6= j. This implies that E′

N hasinfinite dimension.

Now denote by Uf(E′)N the subspace of functions ψ ∈ Uf (E

′) that vanishoutside N . Denote FN = E′

N ⊖ Uf(E′)N . The fact that dimE′

N = ∞ ensuresthat dimFN = ∞ or dimUf (E

′)N = ∞ (or both). We are going to show thatany of these alternatives implies that dimF = ∞.

To treat the first alternative, it suffices to show that FN ⊂ F . For that, sinceit is clear that FN ⊂ E′, it suffices to check that FN is orthogonal to Uf (E

′).Consider any ξ ∈ FN and η ∈ E′. The function (Ufη)XN = Uf (ηXf−1(N))is in Uf (E

′) and vanishes outside N . In other words, it is in Uf (E′)N . Then

ξ · Ufη = ξ · (Ufη)XN = 0, because ξ vanishes outside N and is orthogonal toUf (E

′)N . This completes the argument in this case.Now we treat the second alternative. Given any Ufη ∈ Uf(E

′)N and anyn ∈ N, let ηn = ηXRn with Rn = x ∈ M : |η(x)| ≤ n. Then (ηn)n is a se-quence of bounded functions converging to η in E′. Moreover, every ηn vanishesoutside f−1(N), because η does. Then, (Ufηn)n is a sequence of bounded func-tions that vanish outside N and, recalling that Uf is an isometry, this sequenceconverges to Ufη in E′. This proves that the subspace of bounded functions isdense in Uf (E

′)N . Then, since we are assuming that dimUf (E′)N = ∞, this

subspace must also have infinite dimension. Choose ξk : k ≥ 1 ⊂ E′ suchthat Ufξk : k ≥ 1 is a linearly independent subset of Uf (E

′)N consisting of


bounded functions. Observe that the products ϕ(Ufξk), k ≥ 1 form a linearlyindependent subset of E′. Moreover, given any η ∈ E′,

ϕ(Ufξk) · (Ufη) =

∫ϕ (ξk f) (η f) dµ =

∫ϕ (ξkη) f dµ = ϕ · Uf (ξkη).

This last expression is equal to zero because ξkη ∈ E′ and the function ϕ ∈ F isorthogonal to Uf (E

′). Varying η ∈ E′, we conclude that ϕ(Ufξk) is orthogonalto Uf (E

′) for every k. This shows that ϕ(Ufξk) : k ≥ 1 is contained in Fand, thus, dimF = ∞ also in this case.

This completes the proof that (f, µ) has infinite rank. When B is count-ably generated, L2

0(M,B, µ) is separable (Example 8.4.7) and so the rank isnecessarily countable.

We say that a partition of (M,B, µ) is trivial if all its elements have measure0 or 1. Keep in mind that in the present chapter all partitions are assumed tobe countable.

Proposition 9.5.11. A system (f, µ) in a Lebesgue space is a Kolmogorovsystem if and only if hµ(f,P) > 0 for every non-trivial partition with finiteentropy. In particular, every Kolmogorov system has positive entropy.

This result is due to Pinsker [Pin60] and to Rokhlin, Sinai [RS61]. The proofmay also be found in the article of Rokhlin [Rok67a, § 13]. Let us point out,however, that the last part of the statement is an immediate consequence of theideas in Section 9.5.2. Indeed, suppose that (f, µ) is a Kolmogorov system withzero entropy. By Corollary 9.5.6, any σ-algebra A that satisfies condition (a) inDefinition 9.5.8 also satisfies f−1(A) = A up to measure zero. Then, condition(b) implies that A is trivial and, by condition (c), the σ-algebra B itself is trivial(contradicting the assumption we made at the beginning of this section).

It follows from Proposition 9.5.11 and the relation (9.1.21) that the inverseof a Kolmogorov automorphism is also a Kolmogorov automorphism. Unlikewhat happens for Bernoulli automorphisms (Exercise 9.5.1), in the Kolmogorovcase the two systems (f, µ) and (f−1, µ) need not be ergodically equivalent.

Example 9.5.12. The first example of an invertible system with countableLebesgue spectrum that is not a Kolmogorov system was found by Girsanov in1959, but was never published. Another example, a factor of a certain Gaussianshift (recall Example 8.4.13) with countable Lebesgue spectrum but whose en-tropy vanishes, was exhibited a few years later by Newton, Parry [NP66]. Also,Gurevic [Gur61] proved that the horocyclic flow on surfaces with constant neg-ative curvature has entropy zero; sometime before, Parasyuk [Par53] had shownthat such flows have countable Lebesgue spectrum.

As we saw in Theorem 8.4.11, all the systems with countable Lebesgue spec-trum are spectrally equivalent. Therefore, the fact that systems as in Exam-ple 9.5.12 do exist has the interesting consequence that being a Kolmogorovsystem is not an invariant of spectral equivalence.


Example 9.5.13. We saw in Examples 8.4.2 and 8.4.3 that all the Bernoullishifts have Lebesgue spectrum. In both cases, one-sided and two-sided, weexhibited subspaces of L2

0(M,B, µ) of the form E = L20(M,A, µ) for some σ-

algebra A ⊂ B. Therefore, the same argument proves that every Bernoullishift is a Kolmogorov system. In particular, every Bernoulli automorphism is aKolmogorov automorphism.

There exist Kolmogorov automorphisms that are not Bernoulli automor-phisms. The first example, found by Ornstein, is quite elaborate. The followingsimple construction is due to Kalikow [Kal82]:

Example 9.5.14. Let σ : Σ → Σ be the shift map in Σ = 1, 2Z and µ bethe Bernoulli measure associated with the probability vector p = (1/2, 1/2).Consider the map f : Σ × Σ → Σ × Σ defined as follows:

f((xn)n, (yn)n

)=(σ((xn)n), σ±1((yn)n)

)

where the sign is − if x0 = 1 and is + if x0 = 2. This map preserves the productmeasure µ× µ. Kalikow showed that (f, µ) is a Kolmogorov automorphism butnot a Bernoulli automorphism.

Consider any Kolmogorov automorphism that is not a Bernoulli automor-phism and let s > 0 be its entropy. Consider any Bernoulli automorphismwhose entropy is equal to s (see Exercise 9.2.2). The two systems have the sameentropy but they can not be ergodically equivalent, since being a Bernoulliautomorphism is an invariant of ergodic equivalence. Therefore, the entropy isnot a complete invariant of ergodic equivalence for Kolmogorov automorphisms.Actually (see Ornstein, Shields [OS73]), there exists a non-countable family ofKolmogorov automorphisms that have the same entropy and, yet, are not er-godically equivalent.

The properties of Bernoulli automorphisms described in Exercise 9.5.1 do notextend to the Kolmogorov case: there exist Kolmogorov automorphisms thatare not ergodically equivalent to their inverses (see Ornstein, Shields [OS73])and there are also Kolmogorov automorphisms that admit no k-th root for anyvalue of k ≥ 1 (Clark [Cla72]).

Closing this section, let us discuss the Kolmogorov property for two specificclasses of systems: Markov shifts and automorphisms of compact groups.

Concerning the first class, Friedman and Ornstein [FO70] proved that ev-ery two-sided mixing Markov shift is a Bernoulli automorphism. Recall (The-orem 7.2.11) that a Markov shift is mixing if and only if the correspondingstochastic matrix is aperiodic. It follows from the theorem of Friedman andOrnstein that the entropy is still a complete invariant of ergodic equivalence inthe context of two-sided mixing Markov shifts. Another interesting consequenceis that every two-sided mixing Markov shift is a Kolmogorov automorphism.Observe, however, that this consequence admits a relatively easy direct proof(see Exercise 9.5.4).

As for the second class, every ergodic automorphism of a compact group isa Kolmogorov automorphism. This was proven by Rokhlin [Rok67b] for abelian


groups and by Yuzvinskii [Yuz68] in the general case. In fact, ergodic automor-phisms of metrizable compact groups are Bernoulli automorphisms (Lind [Lin77]and Miles, Thomas [MT78]). In particular, every ergodic linear automorphismof the torus Td is a Bernoulli automorphism; this had been proved by Katznel-son [Kat71]. Recall (Theorem 4.2.14) that a linear automorphism fA is ergodicif and only if no eigenvalue of the matrix A is a root of unity.

9.5.4 Exact systems

We say that a Kolmogorov system is exact if one may take the σ-algebra A inDefinition 9.5.8 to be the σ-algebra B of all measurable sets. Note that in thiscase the conditions (a) and (c) are automatically satisfied. Therefore, a system(f, µ) is exact if and only if the σ-algebra B is such that ∩∞

n=0f−n(B) is trivial,

meaning that it only contains sets with measure 0 or 1. Equivalently, (f, µ) isexact if and only if

∞⋂

n=0

Unf(L2

0(M,B, µ))

= 0.

This observation also implies that, unlike the Kolmogorov property, exactnessis an invariant of spectral equivalence.

We saw in Example 8.4.2 that every one-sided Bernoulli shift has Lebesguespectrum. In order to do that, we considered the subspace E = L2

0(M,B, µ).Therefore, the same argument proves that every one-sided Bernoulli shift is anexact system. A much larger class of examples, expanding maps endowed withtheir equilibrium states, is studied in Chapter 12.

It is immediate that invertible systems are never exact: in the invertible casef−n(B) = B up to measure zero, for every n; therefore, the exactness conditioncorresponds to saying that the σ-algebra B is trivial (which is excluded, byhypothesis).

Figure 9.2 summarizes the relations between the different classes of systemsstudied in this book. It is organized in 3 columns: systems with zero entropy(which are necessarily invertible, as we saw in Proposition 9.5.5), invertiblesystems with positive entropy and non-invertible systems.

9.5.5 Exercises

9.5.1. Show that if (f, µ) is a Bernoulli automorphism then it is ergodicallyequivalent to its inverse (f−1, µ). Moreover, for every k ≥ 1 there exists aBernoulli automorphism (g, ν) that is a k-th root of (f, µ), that is, such that(gk, ν) is ergodically equivalent to (f, µ). [Observation: Ornstein [Orn74] provedthat, conversely, every k-th root of a Bernoulli automorphism is a Bernoulliautomorphism.]

9.5.2. Use the notion of density point to show that the decimal expansion mapf(x) = 10x− [10x] is exact, relative to the Lebesgue measure.


B2

aut. Bernoulli

sist. Kolmogorov

espec. de Lebesgue

sist. misturadores

sist. ergodicos

B1

sist. exatos

nao invertıvel

espec. discreto

h = 0 h > 0, invertıvel

RT

Figure 9.2: Relations between various of classes of systems (B1/B2 = one-sided/two-sided Bernoulli shifts, RT = irrational rotations on tori)

9.5.3. Show that the Gauss map is exact, relative to its absolutely continuousinvariant measure µ.

9.5.4. Show that the two-sided Markov shift associated with any aperiodicstochastic matrix P is a Kolmogorov automorphism.

9.5.5. Show that the one-sided Markov shift associated with any aperiodicstochastic matrix P is an exact system.

9.5.6. Prove that if (f, µ) is exact then hµ(f,P) > 0 for every non-trivialpartition P with finite entropy.

9.6 Entropy and ergodic decomposition

It is not difficult to show that the entropy hµ(f) is always an affine function ofthe invariant measure µ:

Proposition 9.6.1. Let µ and ν be probability measures invariant under atransformation f : M → M . Then, htµ+(1−t)ν(f) = thµ(f) + (1 − t)hν(f) forevery 0 < t < 1.

Proof. Define φ(x) = −x log x for x > 0. On the one hand, since the function φis concave,

φ(tµ(B) + (1 − t)ν(B)) ≥ tφ(µ(B)) + (1 − t)φ(ν(B))

for every measurable set B ⊂M . On the other hand, given any measurable set

9.6. ENTROPY AND ERGODIC DECOMPOSITION 293

B ⊂M ,

φ(tµ(B) + (1 − t)ν(B)

)− tφ

(µ(B)

)− (1 − t)φ

(ν(B)

)

= −tµ(B) logtµ(B) + (1 − t)ν(B)

µ(B)− (1 − t)ν(B) log

tµ(B) + (1 − t)ν(B)

ν(B)

≤ −tµ(B) log t− (1 − t)ν(B) log(1 − t)

because the function − log is decreasing. Therefore, given any partition P withfinite entropy,

Htµ+(1−t)ν(P) ≥ tHµ(P) + (1 − t)Hν(P) and

Htµ+(1−t)ν(P) ≤ tHµ(P) + (1 − t)Hν(P) − t log t− (1 − t) log(1 − t).

Consequently,

htµ+(1−t)ν(f,P) = thµ(f,P) + (1 − t)hν(f,P). (9.6.1)

It follows, immediately, that htµ+(1−t)ν(f) ≤ thµ(f) + (1 − t)hν(f). Moreover,the relations (9.1.16) and (9.6.1) imply that

htµ+(1−t)ν(f,P1 ∨ P2) ≥ thµ(f,P1) + (1 − t)hν(f,P2)

for any partitions P1 and P2. Taking the supremum on P1 and P2 we obtainthat htµ+(1−t)ν(f) ≥ thµ(f) + (1 − t)hν(f).

In particular, given any invariant set A ⊂M , we have that

hµ(f) = µ(A)hµA(f) + µ(Ac)hµAc (f), (9.6.2)

where µA and µAc denote the normalized restrictions of µ to the set A andits complement, respectively (this fact was obtained before, in Exercise 9.1.4).Another immediate consequence is the following version of Proposition 9.6.1 forfinite convex combinations:

µ =

n∑

i=1

tiµi ⇒ hµ(f) =

n∑

i=1

tihµi(f), (9.6.3)

for any invariant probability measures µ1, . . . , µn and any positive numberst1, . . . , tn with

∑ni=1 ti = 1.

A much deeper fact, due to Konrad Jacobs [Jac60, Jac63], is that the affinityproperty extends to the ergodic decomposition given by Theorem 5.1.3:

Theorem 9.6.2 (Jacobs). Suppose that M is a complete separable metric space.Given any invariant probability measure µ, let µP : P ∈ P be its ergodicdecomposition. Then, hµ(f) =

∫hµP (f) dµ(P ) (if one side is infinite, so is the

other side).

We are going to deduce this result from a general theorem about affinefunctionals in the space of probability measures, that we state in Section 9.6.1and prove in Section 9.6.2.


9.6.1 Affine property

Let M be a complete separable metric space. We saw in Lemma 2.1.3 thatthe weak∗ topology in the space of probability measures M1(M) is metrizable.Moreover (Exercise 2.1.3), the metric space M1(M) is separable.

Let W be a probability measure on the Borel σ-algebra of M1(M). Thebarycenter of W is the probability measure bar(W ) ∈ M1(M) given by

∫ψ dbar(W ) =

∫ ( ∫ψ dη

)dW (η) (9.6.4)

for every bounded measurable function ψ : M → R. We leave it to the reader tocheck that this relation determines bar(W ) uniquely (Exercise 9.6.1) and thatthe barycenter is an affine function of the measure (Exercise 9.6.2).

Example 9.6.3. If W is a Dirac measure, that is, if W = δν for some ν ∈M1(M), then bar(W ) = ν. Using Exercise 9.6.2, we get the following general-ization: if W =

∑∞i=1 tiδνi with ti ≥ 0 and

∑∞i=1 ti = 1 and νi ∈ M1(M) for

every i, then bar(W ) =∑∞

i=1 tiνi.

Example 9.6.4. Let µP : P ∈ P be the ergodic decomposition of a proba-bility measure µ invariant under a measurable transformation f : M → M andµ be the associated quotient measure in P (recall Section 5.1.1). Let W = Ξ∗µbe the image of the quotient measure µ under the map Ξ : P → M that assignsto each P ∈ P the conditional probability µP . Then (Exercise 5.1.4),

∫ψdµ =

∫ ( ∫ψ dµP

)dµ(P ) =

∫ ( ∫ψ dη

)dW (η)

for every bounded measurable function ψ : M → R. This means that µ is thebarycenter of W .

A set M ⊂ M1(M) is said to be strongly convex if∑∞i=1 tiνi ∈ M for any

νi ∈ M and ti ≥ 0 with∑∞

i=1 ti = 1.

Theorem 9.6.5. Let M be a strongly convex subset of M1(M) and H : M → Rbe a non-negative affine functional. If H is upper semi-continuous then

H(bar(W )) =

∫H(η) dW (η)

for any probability measure W on M1(M) with W (M) = 1 and bar(W ) ∈ M.

Before proving this result, let us explain how Theorem 9.6.2 may be obtainedfrom it. The essential step is the following lemma:

Lemma 9.6.6. hµ(f,Q) =∫hµP (f,Q) dµ(P ) for any finite partition Q of M .

Proof. Let M = M1(f), the subspace of invariant probability measures, andH : M → R be the functional defined by H(η) = hη(f,Q). Let W be theimage of the quotient measure µ by the map Ξ : P → M that assigns to each


P ∈ P the conditional probability µP . It is clear that M is strongly convex,W (M) = 1 and (recall Example 9.6.4) the barycenter bar(W ) = µ is in M.Proposition 9.6.1 shows that H is affine and it is clear that H is non-negative.In order to apply Theorem 9.6.5, we also need to check that H is upper semi-continuous.

Initially, suppose that f is the shift map in a space Σ = XN, where X is afinite set, and Q is the partition of Σ into cylinders [0; a], a ∈ X . The pointwith this partition is that its elements are both open and closed subsets of Σ.In other words, ∂Q = ∅ for every Q ∈ Q. By Proposition 9.2.12, it follows thatthe map η 7→ H(η) = hη(f,Q) is upper semi-continuous at every point of M.So, we may indeed apply Theorem 9.6.5 to the functional H . In this way weget that

hµ(f,Q) = H(µ) = H(bar(W )) =

∫H(η) dW (η)

=

∫H(µP ) dµ(P ) =

∫hµP (f,Q) dµ(P ).

Now we treat the general case of the lemma, by reduction to the previousparagraph. Given any finite partition Q, let Σ = QN and

h : M → Σ, h(x) =(Q(fn(x))

)n∈N

.

Observe that h f = σ h, where σ : Σ → Σ denotes the shift map. To eachmeasure η on M we may assign the measure η′ = h∗η on Σ. The previousrelation ensures that if η is invariant under f then η′ is invariant under σ.Moreover, if η is ergodic for f then η′ is ergodic for σ. Indeed, if B′ ⊂ Σ isinvariant under σ then B = h−1(B′) is invariant under σ. Assuming that η isergodic, it follows that η′(B′) = η(B) is either 0 or 1; hence, η′ is ergodic.

By construction, Q = h−1(Q′), where Q′ denotes the partition of Σ intothe cylinders [0;Q], Q ∈ Q. More generally, ∨n−1

j=0 f−j(Q) = h−1(∨n−1

j=0 σ−j(Q′))

and, thus,

Hη

( n−1∨

j=0

f−j(Q))

= Hη′( n−1∨

j=0

σ−j(Q′))

for every n ∈ N. Dividing by n and taking the limit,

hη(f,Q) = hη′(σ,Q′) for every η ∈ M. (9.6.5)

Denote µ′ = h∗µ and µ′P = h∗(µP ) for each P . For every bounded measurable

function ψ : Σ → R,∫ψ dµ′ =

∫(ψ h) dµ =

∫ ( ∫(ψ h) dµP

)dµ(P )

=

∫ ( ∫ψ dµ′

P

)dµ(P ).

(9.6.6)

As the measures µ′P are ergodic, the relation (9.6.6) means that µ′

P : P ∈ Pis an ergodic decomposition of µ′. Then, according to the previous paragraph,


hµ′(σ,Q′) =∫hµ′

P(σ,Q′) dµ(P ). By the relation (9.6.5) applied to η = µ and

to η = µP , this may be rewritten as

hµ(σ,Q) =

∫hµP (σ,Q) dµ(P ),

which is precisely what we wanted to prove.

Proceeding with the proof of Theorem 9.6.2, consider any increasing sequenceQ1 ≺ · · · ≺ Qn ≺ · · · of finite partitions of M such that diamQn(x) convergesto zero at every x ∈M (such a sequence may be constructed, for instance, froma family of balls centered at the points of a countable dense subset, with radiiconverging to zero). By Lemma 9.6.6,

hµ(f,Qn) =

∫hµP (f,Qn) dµ(P ) (9.6.7)

for every n. According to (9.1.16), the sequence hη(f,Qn) is non-decreasing,for any invariant measure η. Moreover, by Corollary 9.2.5, its limit is equal tohη(f). Then, we may pass to the limit in (9.6.7) with the aid of the monotoneconvergence theorem. In this way we get that

hµ(f) =

∫hµP (f) dµ(P ),

as we wanted to prove. Note that the argument remains valid even when eitherof the two sides of this identity is infinite (then the other one is also infinite).

In this way, we reduced the proof of Theorem 9.6.2 to proving Theorem 9.6.5.

9.6.2 Proof of the Jacobs theorem

Now we prove Theorem 9.6.5. Let us start by proving that the barycenterfunction has the following continuity property: if W is concentrated in a neigh-borhood V of a given measure ν then the barycenter of W is close to ν. Moreprecisely:

Lemma 9.6.7. Let W be a probability measure on M1(M) and ν ∈ M1(M).Given any finite set Φ = φ1, . . . , φN of bounded continuous functions andany ε > 0, let V = V (ν,Φ, ε) be as defined in (2.1.1). If W (V ) = 1, thenbar(W ) ∈ V .

Proof. Consider any i = 1, . . . , N . By the definition of barycenter and thehypothesis that the complement of V has measure zero,

∣∣∫φi dbar(W ) −

∫φi dν

∣∣ =∣∣∫ ( ∫

φi dη)dW (η) −

∫ ( ∫φi dν

)dW (η)

∣∣

≤∫

V

∣∣(∫φi dη −

∫φi dν

)∣∣ dW (η).


By the definition of V , the last expression is smaller than ε. Therefore,

|∫φi dbar(W ) −

∫φi dν| < ε

for every i = 1, . . . , N . In other words, bar(W ) ∈ V .

We also use the following simple property of non-negative affine functionals:

Lemma 9.6.8. For any non-negative affine functional H : M → R, probabilitymeasures νi ∈ M, i ≥ 1 and non-negative numbers ti, i ≥ 1 with

∑∞i=1 ti = 1,

H(

∞∑

i=1

tiνi) ≥∞∑

i=1

tiH(νi).

Proof. Define sn =∑n

i=1 ti for every n ≥ 1. Let Rn = (1 − sn)−1∑i>n tiνi if

sn < 1; otherwise, pick Rn arbitrarily. Then,

∞∑

i=1

tiνi =

n∑

i=1

tiνi + (1 − sn)Rn.

Since H is affine and the expression on the right-hand side is a (finite) convexcombination, it follows that

H(

∞∑

i=1

tiνi) =

n∑

i=1

tiH(νi) + (1 − sn)H(Rn) ≥n∑

i=1

tiH(νi)

for every n. Now just make n go to infinity.

Corollary 9.6.9. If H : M → R is a non-negative affine functional then H isbounded.

Proof. Suppose thatH is not bounded: there exist νi ∈ M such that H(νi) ≥ 2i

for every i ≥ 1. Consider ν =∑∞

i=1 2−iνi. By Lemma 9.6.8,

H(ν) ≥∞∑

i=1

2−iH(νi) = ∞.

This contradicts the fact that H(ν) is finite.

Now we are ready to prove the inequality ≥ in Theorem 9.6.5. Let us writeµ = bar(W ). By the hypothesis of semi-continuity, given any ε > 0 there existδ > 0 and a finite family Φ = φ1, . . . , φN of bounded continuous functionssuch that

H(η) < H(µ) + ε for every η ∈ M∩ V (µ,Φ, δ). (9.6.8)

Since M1(M) is a separable metric space, it admits a countable basis of opensets, and then so does any subspace. Let V1, . . . , Vn, . . . be a basis of opensets of M, with the following properties:


(i) every Vn is contained in M∩ V (νn,Φ, δ) for some νn ∈ M;

(ii) and H(η) < H(νn) + ε for every η ∈ Vn.

Consider the partition P1, . . . , Pn, . . . of the space M defined by P1 = V1 andPn = Vn \ (V1 ∪ · · · ∪ Vn−1) for every n > 1. It is clear that the properties (i)and (ii) remain valid if we replace Vn by Pn. We claim that

∑

n

W (Pn)νn ∈ V (µ,Φ, δ). (9.6.9)

Indeed, observe that

∣∣∫φi dµ−

∑

n

W (Pn)

∫φi dνn

∣∣ =∣∣∑

n

∫

Pn

( ∫φi dη −

∫φi dνn) dW (η)

∣∣

for every i. Therefore, property (i) ensures that

∣∣∫φi dµ−

∑

n

W (Pn)

∫φi dνn

∣∣ <∑

n

δW (Pn) = δ for every i,

which is precisely what (9.6.9) means. Then, combining (9.6.8), (9.6.9) andLemma 9.6.8,

∑

n

W (Pn)H(νn) ≤ H(∑

n

W (Pn)νn) < H(µ) + ε.

On the other hand, property (ii) implies that∫H(η) dW (η) −

∑

n

W (Pn)H(νn) =∑

n

∫

Pn

(H(η) −H(νn)

)dWµ(η)

<∑

n

εW (Pn) = ε.

Adding the last two inequalities, we get that∫H(η) dW (η) < H(µ)+ 2ε. Since

ε > 0 is arbitrary, this implies that H(µ) ≥∫H(η) dW (η).

Now we prove the inequality ≤ in Theorem 9.6.5. Consider any sequence(Pn)n of finite partitions of M such that, for every ν ∈ M, the diameter of Pn(ν)converges to zero when n goes to infinity. For example, Pn = ∨ni=1Vi, V ci ,where Vn : n ≥ 1 is any countable basis of open sets of M. For each fixed n,consider the normalized restriction WP of the measure W to each set P ∈ Pn(we consider only sets with positive measure: the union of all the elements of∪nPn with W (P ) = 0 has measure zero and so may be neglected):

WP (A) =W (A ∩ P )

W (P )for each measurable set A ⊂M.

It is clear thatW =∑P∈Pn

W (P )WP . Since the barycenter is an affine function(Exercise 9.6.2), it follows that

bar(W ) =∑

P∈Pn

W (P ) bar(WP )


and, therefore,

H(bar(W )) =∑

P∈Pn

W (P )H(bar(WP )).

Define Hn(η) = H(bar(WPn(η))), for each η ∈ M. Then, the last identity abovemay be rewritten as follows:

H(bar(W )) =

∫Hn(η) dW (η) for every n. (9.6.10)

It follows directly from the definition of Hn that 0 ≤ Hn(η) ≤ supH forevery n and every η. Recall that supH < ∞ (Corollary 9.6.9). We also claimthat

lim supn

Hn(η) ≤ H(η) for every η ∈ M. (9.6.11)

This may be seen as follows. Given any neighborhood V = V (η,Φ, ε) of η, wehave that Pn(η) ⊂ V for every large n, because the diameter of Pn(η) convergesto zero. Then, assuming always that W (Pn(η)) is positive,

WPn(η)(V ) ≥WPn(η)(Pn(η)) = 1.

By Lemma 9.6.7, it follows that bar(WPn(η)) ∈ V for every large n. Now (9.6.11)is a direct consequence of the hypothesis that H is upper semi-continuous.

Applying the lemma of Fatou to the sequence −Hn+supH , we deduce that

lim supn

∫Hn(η) dW (η) ≤

∫lim sup

nHn(η) dW (η) ≤

∫H(η) dW (η). (9.6.12)

Combining the relations (9.6.10) and (9.6.12), we get that

H(bar(W )) ≤∫H(η) dW (η),

as we wanted to prove.Now the proof of Theorems 9.6.2 and 9.6.5 is complete.

9.6.3 Exercises

9.6.1. Check that, given any probability measure W on M1(M), there existsa unique probability measure bar(W ) ∈ M1(M) on M that satisfies (9.6.4).

9.6.2. Show that the barycenter function is strongly affine: if Wi, i ≥ 1 areprobability measures on M1(M) and ti, i ≥ 1 are non-negative numbers with∑∞i=1 ti = 1, then

bar(

∞∑

i=1

tiWi) =

∞∑

i=1

ti bar(Wi).

9.6.3. Show that if M ⊂ M1(M) is a closed convex set then M is stronglyconvex. Moreover, in that case W (M) = 1 implies that bar(W ) ∈ M.


9.6.4. Check that the inequality ≥ in Theorem 9.6.2 may also be obtainedthrough the following, more direct, argument:

1. Recalling that the function φ(x) = −x log x is concave, show thatHµ(Q) ≥∫HµP (Q) dµ(P ) for every finite partition Q.

2. Deduce that hµ(f,Q) ≥∫hµP (f,Q) dµ(P ) for every finite partition Q.

3. Conclude that hµ(f) ≥∫hµP (f) dµ(P ).

9.6.5. The inequality ≤ in Theorem 9.6.2 is based on the fact that hµ(f,Q) ≤∫hµP (f,Q) dµ(P ) for every finite partition Q, which is part of Lemma 9.6.6.

Find what is wrong with the following “alternative proof”:Let Q be a finite partition. The theorem of Shannon-McMillan-Breiman

ensures that hµ(f,Q) =∫hµ(f,Q, x) dµ(x), where

hµ(f,Q, x) = limn

− 1

nlogµ(Qn(x)) = lim

n− 1

nlog

∫µP (Qn(x)) dµ(P ).

By the Jensen inequality applied to the convex function ψ(x) = − logx,

limn

− 1

nlog

∫µP (Qn(x)) dµ(P ) ≤ lim

n

∫− 1

nlogµP (Qn(x)) dµ(P ).

Using the fact that hµP (f,Q) = hµP (f,Q, x) at almost every point (because themeasure µP is ergodic),

limn

∫− 1

nlogµP (Qn(x)) dµ(P ) =

∫limn

− 1

nlog µP (Qn(x)) dµ(P )

=

∫hµP (f,Q) dµ(P ).

This shows that hµ(f,Q, x) ≤∫hµP (f,Q) dµ(P ) for every finite partition Q

and µ-almost every x. Consequently, hµ(f,Q) ≤∫hµP (f,Q) dµ(P ) for every

finite partition Q.

9.7 Jacobians and the Rokhlin formula

Let U be an open subset and m be the Lebesgue measure of Rd. Let f : U → Ube a local diffeomorphism. By the formula of change of variables,

m(f(A)) =

∫

A

| detDf(x)| dx (9.7.1)

for any measurable subset A of a small ball restricted to which f is injective.The notion of Jacobian that we present in this section extends this kind ofrelation to much more general transformations and measures. Besides intro-ducing this concept, we are going to show that Jacobians do exist under quite

9.7. JACOBIANS AND THE ROKHLIN FORMULA 301

general hypotheses. Most important, it is possible to express the system’s en-tropy explicitly in terms of the Jacobian. Actually, we already encountered aninteresting manifestation of this fact in Proposition 9.4.2.

Let f : M → M be a measurable transformation. We say that f is locallyinvertible if there exists some countable cover Uk : Uk ≥ 1 ofM by measurablesets such that f(Uk) is a measurable set and the restriction f | Uk : Uk → f(Uk)is a bijection with measurable inverse, for every k ≥ 1. Every measurable subsetof some Uk is called an invertibility domain. Note that the image f(A) of anyinvertibility domain A is a measurable set. It is also clear that if f is locallyinvertible then the pre-image f−1(y) of any y ∈ M is countable: it contains atmost one point in each Uk.

Let η be a probability measure on M , not necessarily invariant under f . Ameasurable function ξ : M → [0,∞) is a Jacobian of f with respect to η if therestriction of ξ to any invertibility domain A is integrable with respect to η andsatisfies

η(f(A)) =

∫

A

ξ dη. (9.7.2)

It is important to note (see Exercise 9.7.1) that the definition does not dependon the choice of the family Uk : k ≥ 1.

Example 9.7.1. Let σ : Σ → Σ be the shift map in Σ = 1, 2, . . . , dN and µbe the Bernoulli measure associated with a probability vector p = (p1, . . . , pd).The restriction of σ to each cylinder [0; a] is an invertible map. Moreover, givenany cylinder [0; a, a1, . . . , an] ⊂ [0; a],

µ(σ([0; a, a1, . . . , an])

)= pa1 · · · pan =

1

paµ([0; a, a1, . . . , an]

).

We invite the reader to deduce that µ(σ(A)

)= (1/pa)µ(A) for every measurable

set A ⊂ [0; a]. Therefore, the function ξ((xn)n) = 1/px0 is a Jacobian of σ withrespect to µ.

We say that a measure η is non-singular with respect to the transformationf : M →M if the image of any invertibility domain with measure zero also hasmeasure zero: if η(A) = 0 then η(f(A)) = 0. For example, if f : U → U is alocal diffeomorphism of an open subset of Rd and η is the Lebesgue measure,then η is non-singular. It is also easy to see that every invariant probabilitymeasure is non-singular.

It follows immediately from the definition (9.7.2) that if f admits a Jacobianwith respect to a measure η then this measure is non-singular. The converse isalso true:

Proposition 9.7.2. Let f : M →M be a locally invertible transformation andη be a measure on M , non-singular with respect to f . Then, there exists someJacobian of f with respect to η and it is essentially unique: any two Jacobianscoincide at η-almost every point.


Proof. We start by proving existence. Given any countable cover Uk : k ≥ 1 ofM by invertibility domains of f , define P1 = U1 and Pk = Uk \ (U1∪· · ·∪Uk−1)for each k > 1. Then, P = Pk : k ≥ 1 is a partition of M formed byinvertibility domains. For each Pk ∈ P , denote by ηk the measure defined onPk by ηk(A) = η(f(A)). Equivalently, ηk is the image under (f | Pk)−1 of themeasure η restricted to f(Pk). The hypothesis that η is non-singular impliesthat every ηk is absolutely continuous with respect to η restricted to Pk:

η(A) = 0 ⇒ ηk(A) = η(f(A)) = 0

for every measurable setA ⊂ Pk. Let ξk = dηk/d(η | Pk) be the Radon-Nikodymderivative (Theorem A.2.18). Then, ξk is a function defined on Pk, integrablewith respect to η and satisfying

η(f(A)) = ηk(A) =

∫

A

ξk dη (9.7.3)

for every measurable set A ⊂ Pk. Consider the function ξ : M → [0,∞) whoserestriction to each Pk ∈ P is given by ξk. Every subset of Uk may be writtenas a (disjoint) union of subsets of P1, . . . , Pk. Applying (9.7.3) to each one ofthese subsets and summing the corresponding equalities, we get that

η(f(A)) =

∫

A

ξ dη for every measurable set A ⊂ Uk and k ≥ 1.

This proves that ξ is a Jacobian of f with respect to η.Now suppose that ξ and ζ are Jacobians of f with respect to η and there

exists B ⊂ M with η(B) > 0 such that ξ(x) 6= ζ(x) for every x ∈ B. Up toreplacing B by a suitable subset, and exchanging the roles of ξ and ζ if necessary,we may suppose that ξ(x) < ζ(x) for every x ∈ B. Similarly, we may supposethat B is contained in some Uk. Then,

η(f(B)) =

∫

B

ξ dη <

∫

B

ζ dη = η(f(B)).

This contradiction proves that the Jacobian is essentially unique.

From now on, we denote by Jηf the (essentially unique) Jacobian of a locallyinvertible transformation f : M → M with respect to a measure η, when itexists.

By definition, Jηf is integrable on each invertibility domain. If f is suchthat the number of pre-images of any y ∈ M is bounded then the Jacobian is(globally) integrable: if ℓ ≥ 1 is the maximum number of pre-images then

∫Jηf dη =

∑

k

∫

Pk

Jηf dη =∑

k

η(f(Pk)) ≤ ℓ,

because every point y ∈M is in no more than ℓ images f(Pk).


The following observation will be useful in the sequel. Let Z ⊂M be the setof points where the Jacobian Jηf vanishes. Covering Z with a countable familyof invertibility domains and using (9.7.2), we see that f(Z) is a measurable setand η(f(Z)) = 0. In other words, the set of points y ∈M such that Jµf(x) > 0for every x ∈ f−1(y) has total measure for η. When the probability measure ηis invariant under f , it also follows that η(f−1(f(Z))) = η(f(Z)) = 0 and soη(Z) = 0.

The main result in this section is the following formula for the entropy of aninvariant measure:

Theorem 9.7.3 (Rokhlin formula). Let f : M → M be a locally invertibletransformation and µ be a probability measure invariant under f . Assume thatthere is some partition P with finite entropy such that ∪nPn generates the σ-algebra of M , up to measure zero, and every P ∈ P is an invertibility domainof f . Then, hµ(f) =

∫log Jµf dµ.

Proof. Let us consider the sequence of partitions Qn = ∨nj=1f−j(P). By Corol-

lary 9.2.5 and Lemma 9.1.12,

hµ(f) = hµ(f,P) = limnHµ(P/Qn). (9.7.4)

By definition (as before, φ(x) = −x log x)

Hµ(P/Qn) =∑

P∈P

∑

Qn∈Qn

−µ(P ∩Qn) logµ(P ∩Qn)µ(Qn)

=∑

P∈P

∑

Qn∈Qn

µ(Qn)φ(µ(P ∩Qn)

µ(Qn)

).

(9.7.5)

Let en(ψ, x) be the conditional expectation of a function ψ with respect to thepartition Qn and e(ψ, x) be its limit when n goes to infinity (these notions wereintroduced in Section 5.2.1: see (5.2.1) and Lemma 5.2.1). It is clear from thedefinition that

µ(P ∩Qn)µ(Qn)

= en(XP , x) for every x ∈ Qn and every Qn ∈ Qn.

Therefore,

∑

P∈P

∑

Qn∈Qn

µ(Qn)φ(µ(P ∩Qn)

µ(Qn)

)=∑

P∈P

∫φ(en(XP , x)) dµ(x). (9.7.6)

By Lemma 5.2.1, the limit e(XP , x) = limn en(XP , x) exists at µ-almost everyx. So, observing that the function φ is bounded, we may use the dominatedconvergence theorem to deduce from (9.7.4) – (9.7.6) that

hµ(f) =∑

P∈P

∫φ(e(XP , x)) dµ(x). (9.7.7)


Now we need to relate the expression inside the integral with the Jacobian. Thiswe do by means of Lemma 9.7.5 below. Beforehand, let us prove the followingchange of variables formulas:

Lemma 9.7.4. For any probability measure η non-singular with respect to f ,and any invertibility domain A ⊂M of f :

(a)∫f(A)

ϕdη =∫A(ϕ f)Jηf dη for any measurable function ϕ : f(A) → R

such that the integrals are defined (possibly ±∞).

(b)∫A ψ dη =

∫f(A)(ψ/Jηf) (f | A)−1 dη for any measurable function ψ :

A→ R such that the integrals are defined (possibly ±∞).

Proof. The definition (9.7.2) means that the formula in part (a) holds for thecharacteristic function ϕ = Xf(A) for any invertibility domain A. Thus, it holdsfor the characteristic function of any measurable subset of f(A), since such asubset may be written as f(B) for some invertibility domain B ⊂ A. Hence, bylinearity, the identity extends to every simple function defined on f(A). Usingthe monotone convergence theorem, we conclude that the identity holds forevery non-negative measurable function. Using linearity once more, we get thegeneral statement of part (a).

To deduce the claim in (b), apply (a) to the function ϕ = (ψ/Jηf)(f | A)−1.Note that this function is well defined at η-almost every point for, as observedbefore, Jηf(x) > 0 for every x in the pre-image of η-almost every y ∈M .

Lemma 9.7.5. For every bounded measurable function ψ : M → R and everyprobability measure η invariant under f ,

e(ψ, x) = ψ(f(x)) for η-almost every x, where ψ(y) =∑

z∈f−1(y)

ψ

Jηf(z).

Proof. Recall that Qn = ∨nj=1f−j(P). We also use the sequence of partitions

Pn = ∨n−1j=0 f

−j(P). Observe that Qn(x) = f−1(Pn−1(f(x))) and Pn(x) =P(x) ∩ Qn(x) for every n and every x. Then,

∫

Pn−1(f(x))

ψ dη =∑

P∈P

∫

f(P )∩Pn−1(f(x))

ψ

Jηf (f | P )−1 dη.

Using the formula of change of variables in Lemma 9.7.4(b), the expression onthe right-hand side may be rewritten as

∑

P∈P

∫

P∩Qn(x)

ψ(z) dη(z) =

∫

Qn(x)

ψ dη.

Therefore, ∫

Pn−1(f(x))

ψ dη =

∫

Qn(x)

ψ dη. (9.7.8)


Let e′n−1(ψ, x) be the conditional expectation of ψ with respect to the partition

Pn−1, as defined in Section 5.2.1, and let e′(ψ, x) be its limit when n goes toinfinity, given by Lemma 5.2.1. The hypothesis that η is invariant gives thatη(Pn−1(f(x))

)= η

(Qn(x)

). Dividing both sides of (9.7.8) by this number, we

get that

e′n−1(ψ, f(x)) = en(ψ, x) for every x and every n > 1. (9.7.9)

Then, taking the limit, e′(ψ, f(x)) = e(ψ, x) for η-almost every x. On the other

hand, according to Exercise 5.2.3, the hypothesis implies that e′(ψ, y) = ψ(y)for η-almost every y ∈M .

Let us apply this lemma to ψ = XP and η = µ. Since f is injective on everyelement of P , each intersection P ∩ f−1(y) either is empty or contains exactlyone point. Therefore, it follows from Lemma 9.7.5 that e(XP , x) = XP (f(x)),with

XP (y) =

1/Jµf

((f | P )−1(y)

)if y ∈ f(P )

0 if y /∈ f(P ).

Then, recalling that the measure µ is assumed to be invariant,∫φ(e(XP , x)) dµ(x) =

∫φ(XP (y)) dµ(y)

=

∫

f(P )

( 1

Jµflog Jµf

) (f | P )−1 dµ =

∫

P

log Jµf dµ

(the last step uses the identity in part (b) of Lemma 9.7.4). Replacing thisexpression in (9.7.7), we get that

hµ(f) =∑

P∈P

∫

P

log Jµf dµ =

∫log Jµf dµ,

as stated in the theorem.

9.7.1 Exercises

9.7.1. Check that the definition of Jacobian does not depend on the choice ofthe cover Uk : k ≥ 1 by invertibility domains.

9.7.2. Let σ : Σ → Σ be the shift map in Σ = 1, 2, . . . , dN and µ be theMarkov measure associated with an aperiodic matrix P . Find the Jacobian off with respect to µ.

9.7.3. Let f : M → M be a locally invertible transformation and η be aprobability measure on M , non-singular with respect to f . Show that for everybounded measurable function ψ : M → R,

∫ψ dη =

∫ ∑

z∈f−1(x)

ψ

Jηf(z)dη(x).


9.7.4. Let f : M → M be a locally invertible transformation and η be aprobability measure on M , non-singular with respect to f . Show that η isinvariant under f if and only if

∑

z∈f−1(x)

1

Jηf(z)= 1 for η-almost every x ∈M .

Moreover, if η is invariant under f then Jηf ≥ 1 at µ-almost every point.

9.7.5. Let f : M → M be a locally invertible transformation and η be aprobability measure on M , non-singular with respect to f . Show that, for everyk ≥ 1, there exists a Jacobian of fk with respect to η and it is given by

Jηfj(x) =

k−1∏

j=0

Jηf(f j(x)) for η-almost every x.

Assuming that f is invertible, what can be said about the Jacobian of f−1 withrespect to η?

9.7.6. Let f : M → M and g : N → N be locally invertible transformationsand let µ and ν be probability measures invariant under f and g, respectively.Assume that there exists an ergodic equivalence φ : M → N between the systems(f, µ) and (g, ν). Show that Jµf = Jνg φ at µ-almost every point.

9.7.7. Let σk : Σk → Σk and σl : Σl → Σl be the shift maps in Σk = 1, . . . , kN

and Σl = 1, . . . , lN. Let µk and µl be the Bernoulli measures on Σk andΣl, respectively, associated with probability vectors p = (p1, . . . , pk) and q =(q1, . . . , ql). Show that the systems (σk, µk) and (σl, µl) are ergodically equiva-lent if and only if k = l and the vectors p and q are obtained from one anotherby permutation of the components.

Chapter 10

Variational principle

In 1965, the IBM researchers R. Adler, A. Konheim and M. McAndrew pro-posed [AKM65] a notion of topological entropy, inspired by the Kolmogorov-Sinai entropy, which we studied in the previous chapter, but whose definitiondoes not involve any invariant measure. This notion applies to any continuoustransformation in a compact topological space.

Subsequently, Efim Dinaburg [Din70] and Rufus Bowen [Bow71, Bow75a]gave a different, yet equivalent, definition for continuous transformations in com-pact metric spaces. Despite being a bit more restrictive, the Bowen-Dinaburgdefinition has the advantage of making more transparent the meaning of thisconcept: the topological entropy is the rate of exponential growth of the numberof orbits that can be distinguished within a certain precision, arbitrarily small.Moreover, Bowen extended the definition to non-compact spaces, which is alsovery useful for applications.

These definitions of topological entropy and their properties are studied inSection 10.1 where, in particular, we observe that the topological entropy is aninvariant of topological equivalence (topological conjugacy). In Section 10.2 weanalyze several concrete examples.

The main result is the following remarkable relation between the topologicalentropy and the entropies of the transformation with respect to its invariantmeasures:

Theorem 10.1 (Variational principle). If f : M → M is a continuous trans-formation in a compact metric space then its topological entropy h(f) coincideswith the supremum of the entropies hµ(f) of f with respect to all the invariantprobability measures.

This theorem was proven by Dinaburg [Din70, Din71], Goodman [Goo71a]and Goodwin [Goo71b]. Here, it arises as a special case of a more general state-ment, the variational principle for the pressure, which is due to Walters [Wal75].

The pressure P (f, φ) is a weighted version of the topological entropy h(f),where the “weights” are determined by a continuous function φ : M → R, whichwe call potential. We study these notions and their properties in Section 10.3.

307

308 CHAPTER 10. VARIATIONAL PRINCIPLE

The topological entropy corresponds to the special case when the potential isidentically zero. The notion of pressure was brought from Statistical Mechan-ics to Ergodic Theory by the Belgium mathematician and theoretical physicistDavid Ruelle, one of the founders of differentiable ergodic theory, and was thenextended by the British mathematician Peter Walters.

The variational principle (Theorem 10.1) extends to the setting of the pres-sure, as we are going to see in Section 10.4:

P (f, φ) = suphµ(f) +

∫φdµ : µ is invariant under f

(10.0.1)

for every continuous function φ : M → R. An invariant probability measure µis called an equilibrium state for the potential φ if it realizes the supremum in(10.0.1), that is, if hµ(f) +

∫φdµ = P (f, φ). The set of all equilibrium states

is studied in Section 10.5.

10.1 Topological entropy

Initially, we present the definitions of Adler-Konheim-McAndrew and Bowen-Dinaburg and we prove that they are equivalent when the ambient is a compactmetric space.

10.1.1 Definition via open covers

The original definition of the topological entropy is very similar to that of theKolmogorov-Sinai entropy, with open covers in the place of partitions into mea-surable sets.

Let M be a compact topological space. An open cover of M is any familyα of open sets whose union is the whole M . By compactness, every open coveradmits a subcover (that is, a subfamily that is still an open cover) with finitelymany elements. We call entropy of the open cover α the number

H(α) = logN(α), (10.1.1)

where N(α) is the smallest number such that α admits some finite subcoverwith that number of elements.

Given two open covers α and β, we say that α is coarser than β (or β is finerthan α), and we write α ≺ β, if every element of β is contained in some elementof α. For example, if β is a subcover of α then α ≺ β. By Exercise 10.1.1,

α ≺ β ⇒ H(α) ≤ H(β). (10.1.2)

Given open covers α1, . . . , αn, we denote by α1 ∨ · · · ∨αn their sum, that is, theopen cover whose elements are the intersections A1 ∩ · · · ∩An with Aj ∈ αj foreach j. Note that αj ≺ α1 ∨ · · · ∨ αn for every j.

Let f : M →M be a continuous transformation. If α is an open cover of Mthen so is f−j(α) = f−j(A) : A ∈ α. For each n ≥ 1, let us denote

αn = α ∨ f−1(α) ∨ · · · ∨ f−n+1(α).

10.1. TOPOLOGICAL ENTROPY 309

Using Exercise 10.1.2, we see that

H(αm+n) = H(αm ∨ f−m(αn)) ≤ H(αm) +H(f−m(αn)) ≤ H(αm) +H(αn)

for every m,n ≥ 1. In other words, the sequence H(αn) is subadditive. Conse-quently, (Lemma 3.3.4),

h(f, α) = limn

1

nH(αn) = inf

n

1

nH(αn) (10.1.3)

always exists and is finite. It is called entropy of f with respect to the opencover α. The relation (10.1.2) implies that

α ≺ β ⇒ h(f, α) ≤ h(f, β). (10.1.4)

Finally, we define the topological entropy of f to be

h(f) = suph(f, α) : α is an open cover of M. (10.1.5)

In particular, if β is a subcover of α then h(f, α) ≤ h(f, β). Therefore, thedefinition (10.1.5) does not change when one restricts the supremum to thefinite open covers.

Observe that the entropy h(f) is a non-negative number, possibly infinite(see Exercise 10.1.6).

Example 10.1.1. Let f : S1 → S1 be any homeomorphism (for example, arotationRθ) and let α be an open cover of the circle formed by a finite number ofopen intervals. Let ∂α be the set consisting of the endpoints of those intervals.For each n ≥ 1, the open cover αn is formed by intervals whose endpoints arein

∂αn = ∂α ∪ f−1(∂α) ∪ · · · ∪ f−n+1(∂α).

Note that #αn ≤ #∂αn ≤ n#∂α. Therefore,

h(f, α) = limn

1

nH(αn) ≤ lim inf

n

1

nlog #αn ≤ lim inf

n

1

nlog n = 0.

Proposition 10.1.12 below gives that h(f) = limk h(f, αk) for any sequence ofopen covers αk with diamαk → 0. Then, considering open covers αk by intervalsof length less than 1/k, we conclude from the previous calculation that h(f) = 0for every homeomorphism of the circle.

Example 10.1.2. Let Σ = 1, . . . , dN and α be the cover of Σ by the cylinders[0; a], a = 1, . . . , d. Consider the shift map σ : Σ → Σ. For each n, the opencover αn consists of the cylinders of length n:

αn = [0; a0, . . . , an−1] : aj = 1, . . . , d.

Therefore, H(αn) = log #αn = log dn and, consequently, h(f, α) = log d. Ob-serve also that diamαn converges to zero when n → ∞, relative to the dis-tance defined by (A.2.7). Then, it follows from Corollary 10.1.13 below thath(f) = h(f, α) = log d. The same holds for the two-sided shift σ : Σ → Σ inΣ = 1, . . . , dZ.


Now we show that the topological entropy is an invariant of topologicalequivalence. Let f : M → M and g : N → N be continuous transformations incompact topological spaces M and N . We say that g is a topological factor off if there exists a surjective continuous map θ : M → N such that θ f = g θ.When θ may be chosen to be invertible (a homeomorphism), we say that thetwo transformations are topologically equivalent, or topologically conjugate, andwe call θ a topological conjugacy between f and g.

Proposition 10.1.3. If g is a topological factor of f then h(g) ≤ h(f). Inparticular, if f and g are topologically equivalent then h(f) = h(g).

Proof. Let θ : M → N be a surjective continuous map such that θ f = g θ.Given any open cover α of N , the family

θ−1(α) = θ−1(A) : A ∈ α

is an open cover ofM . Recall that, by definition, the iterated sum αn is the opencover formed by the sets

⋂n−1j=0 g

−j(Aj) with A0, A1, . . . , An−1 ∈ α. Analogously,

the iterated sum θ−1(α)n consists of the sets⋂n−1j=0 f

−j(θ−1(Aj)

). Clearly,

n−1⋂

j=0

f−j(θ−1(Aj)

)=

n−1⋂

j=0

θ−1(g−j(Aj)

)= θ−1

( n−1⋂

j=0

g−j(Aj)).

Noting that the sets of the form on the right-hand side of this identity constitutethe pre-image θ−1(αn) of αn, we conclude that θ−1(αn) = θ−1(α)n. Since θ issurjective, a family γ ⊂ αn covers N if and only if θ−1(γ) covers M . Therefore,

H(θ−1(α)n) = H(θ−1(αn)) = H(αn).

Since n is arbitrary, it follows that h(f, θ−1(α)) = h(g, α). Then, taking thesupremum over all the open covers α of N :

h(g) = supαh(g, α) = sup

αh(f, θ−1(α)) ≤ h(f).

This proves the first part of the proposition. The second part is an immediateconsequence, since in that case f is also a factor of g.

The converse to Proposition 10.1.3 is false, in general. For example, all thehomeomorphisms of the circle have topological entropy equal to zero (recall Ex-ample 10.1.1) but they are not necessarily topologically equivalent (for example,the identity is not topologically equivalent to any other homeomorphism).

10.1.2 Generating sets and separated sets

Next, we present the definition of topological entropy of Bowen-Dinaburg. Letf : M →M be a continuous transformation in a metric spaceM , not necessarilycompact, and let K ⊂M be any compact subset. When M is compact it sufficesto consider K = M , as observed in (10.1.12) below.


Given ε > 0 and n ∈ N, we say that a set E ⊂ M (n, ε)-generates K, iffor every x ∈ K there exists a ∈ E such that d(f i(x), f i(a)) < ε for everyi ∈ 0, . . . , n− 1. In other words,

K ⊂⋃

a∈E

B(a, n, ε),

where B(a, n, ε) = x ∈ M : d(f i(x), f i(a)) < ε for i = 0, . . . , n − 1 is thedynamical ball of center a, length n and radius ε. Note that B(x, n, ε) : x ∈ Kis an open cover of K. Hence, by compactness, there always exist finite (n, ε)-generating sets.

Let us denote by gn(f, ε,K) the smallest cardinality of an (n, ε)-generatingset of K. We define

g(f, ε,K) = lim supn

1

nlog gn(f, ε,K). (10.1.6)

Observe that the function ε 7→ g(f, ε,K) is monotone non-increasing. Indeed,it is clear from the definition that if ε1 < ε2 then every (n, ε1)-generating setis also (n, ε2)-generating. Therefore, gn(f, ε1,K) ≥ gn(f, ε2,K) for every n ≥ 1and, passing to the limit, g(f, ε1,K) ≥ g(f, ε2,K). This ensures, in particular,that

g(f,K) = limε→0

g(f, ε,K) (10.1.7)

exists. Finally, we define

g(f) = supg(f,K) : K ⊂M compact. (10.1.8)

We also introduce the following dual notion. Given ε > 0 and n ∈ N,we say that a set E ⊂ K is (n, ε)-separated if, given x, y ∈ E, there existsj ∈ 0, . . . , n− 1 such that d(f j(x), f j(y)) ≥ ε. In other words, if x ∈ E thenB(x, n, ε) contains no other point of E. We denote by sn(f, ε,K) the largestcardinality of an (n, ε)-separated set. We define

s(f, ε,K) = lim supn

1

nlog sn(f, ε,K). (10.1.9)

It is clear that if 0 < ε1 < ε2, then every (n, ε2)-separated set is also (n, ε1)-separated. Therefore, sn(f, ε1,K) ≥ sn(f, ε2,K) for every n ≥ 1 and, passingto the limit, s(f, ε1,K) ≥ s(f, ε2,K). In particular,

s(f,K) = limε→0

s(f, ε,K) (10.1.10)

always exists. Finally, we define

s(f) = sups(f,K) : K ⊂M compact. (10.1.11)

It is clear that g(f,K1) ≤ g(f,K2) and s(f,K1) ≤ s(f,K2) if K1 ⊂ K2. Inparticular,

g(f) = g(f,M) and s(f) = s(f,M) if M is compact. (10.1.12)


Another interesting observation (Exercise 10.1.7) is that the definitions (10.1.8)and (10.1.11) are not affected when we restrict the supremum to compact setswith small diameter.

Proposition 10.1.4. We have g(f,K) = s(f,K) for every compact K ⊂ M .Consequently, g(f) = s(f).

Proof. For the proof we need the following lemma:

Lemma 10.1.5. gn(f, ε,K) ≤ sn(f, ε,K) ≤ gn(f, ε/2,K) for every n ≥ 1,every ε > 0 and every compact K ⊂M .

Proof. Let E ⊂ K be an (n, ε)-separated set with maximal cardinality. Givenany y ∈ K \E, the set E ∪y is not (n, ε)-separated, and so there exists x ∈ Esuch that d(f i(x), f i(y)) < ε for every i ∈ 0, . . . , n− 1. This shows that E isan (n, ε)-generating set of K. Consequently, gn(f, ε,K) ≤ #E = sn(f, ε,K).

To prove the other inequality, let E ⊂ K be an (n, ε)-separated set andF ⊂ M be an (n, ε/2)-generating set of K. The hypothesis ensures that, givenany x ∈ E there exists some y ∈ F such that d(f i(x), f i(y)) < ε/2 for everyi ∈ 0, . . . , n − 1. Let φ : E → F be a map such that each φ(x) is a point ysatisfying this condition. We claim that the map φ is injective. Indeed, supposethat x, z ∈ E are such that φ(x) = y = φ(z). Then,

d(f i(x), f i(z)) ≤ d(f i(x), f i(y)) + d(f i(y), f i(z)) < ε/2 + ε/2

for every i ∈ 0, . . . , n− 1. Since E is (n, ε)-separated, this implies that x = z.Therefore, φ is injective, as we claimed. It follows that #E ≤ #F and, since Eand F are arbitrary, this proves that sn(f, ε,K) ≤ gn(f, ε/2,K).

Then, given any ε > 0 and any compact K ⊂M ,

g(f, ε,K) = lim supn

1

nlog gn(f, ε,K)

≤ lim supn

1

nlog sn(f, ε,K) = s(f, ε,K)

≤ lim supn

1

nlog gn(f,

ε

2,K) = g(f,

ε

2,K).

(10.1.13)

Taking the limit when ε→ 0, we get that

g(f,K) = limε→0

g(f, ε,K) ≤ limε→0

s(f, ε,K) = s(f,K)

≤ limε→0

g(f,ε

2,K) = g(f,K).

This proves the first part of the proposition. The second part is an immediateconsequence.

By definition, the diameter of an open cover α of a metric space M is thesupremum of the diameters of all the sets A ∈ α.


Proposition 10.1.6. If M is a compact metric space then h(f) = g(f) = s(f).

Proof. By Proposition 10.1.4, it suffices to show that s(f) ≤ h(f) ≤ g(f).Start by fixing ε > 0 and n ≥ 1. Let E ⊂ M be an (n, ε)-separated set and

α be any open cover of M with diameter less than ε. If x and y are in the sameelement of αn then

d(f i(x), f i(y)) ≤ diamα < ε for every i = 0, . . . , n− 1.

In particular, each element of αn contains at most one element of E. Conse-quently, #E ≤ N(αn). Taking E with maximal cardinality, we conclude thatsn(f, ε,M) ≤ N(αn) for every n ≥ 1. So,

s(f, ε,M) = lim supn

1

nlog sn(f, ε,M)

≤ limn

1

nlogN(αn) = h(f, α) ≤ h(f).

(10.1.14)

Making ε→ 0, we find that s(f) = s(f,M) ≤ h(f).Next, given any open cover α of M , let ε > 0 be a Lebesgue number for

α, that is, a positive number such that every ball of radius ε is contained insome element of α. Let E ⊂ M be an (n, ε)-generating set of M with minimalcardinality. For each x ∈ E and i = 0, . . . , n− 1, there exists Ax,i ∈ α such thatB(f i(x), ε) is contained in Ax,i. Then,

B(x, n, ε) ⊂n−1⋂

i=0

f−i(Ax,i).

Therefore, the hypothesis that E is a generating set implies that the familyγ = ∩n−1

i=0 f−i(Ax,i) : x ∈ E is an open cover of M . Since γ ⊂ αn, it follows

that N(αn) ≤ #E = gn(f, ε,M) for every n. Therefore,

h(f, α) = limn

1

nlogN(αn) ≤ lim inf

n

1

nlog gn(f, ε,M)

≤ lim supn

1

nlog gn(f, ε,M) = g(f, ε,M).

(10.1.15)

Making ε→ 0, we get that h(f, α) ≤ g(f,M) = g(f). Since the open cover α isarbitrary, it follows that h(f) ≤ g(f).

We define the topological entropy of a continuous transformation f : M →Min a metric space M to be g(f) = s(f). Proposition 10.1.6 shows that this defi-nition is compatible with the one we gave in Section 10.1.1 for transformationsin compact topological spaces. A relevant difference is that, while for compactspaces the topological entropy depends only on the topology (because h(f) isdefined solely in terms of the open sets), in the non-compact case the topologi-cal entropy may depend also on the distance function in M . In this regard, seeExercises 10.1.4 and 10.1.5. They also show that in the non-compact case thetopological entropy is no longer an invariant of topological conjugacy, althoughit remains an invariant of uniformly continuous conjugacy.


Example 10.1.7. Assume that f : M → M does not expand distances, that is,that d(f(x), f(y)) ≤ d(x, y) for every x, y ∈ M . Then, the topological entropyof f is equal to zero. Indeed, the hypothesis implies that B(x, n, ε) = B(x, ε)for every n ≥ 1. Hence, a set E is (n, ε)-generating if and only if it is (1, ε)-generating. In particular, the sequence gn(f, ε,K) does not depend on n and,hence, g(f, ε,K) = 0 for every ε > 0 and every compact set K. Making ε → 0and taking the supremum over K we get that g(f) = 0 (analogously, s(f) = 0).

There are two important special cases: contractions, such that there existsλ < 1 satisfying d(f(x), f(y)) ≤ λd(x, y) for every x, y ∈ M ; and isometries,such that d(f(x), f(y)) = d(x, y) for every x, y ∈ M . We saw in Lemma 6.3.6that every compact metrizable group admits a distance relative to which everytranslation is an isometry. Therefore, it also follows from the previous observa-tions that the topological entropy of every translation in a compact metrizablegroup is zero.

Recalling that g(f) = g(f,M) and s(f) = s(f,M) when M is compact, wesee that the conclusion of Proposition 10.1.6 may be rewritten as follows:

h(f) = limε→0

lim supn

1

nlog gn(f, ε,M) = lim

ε→0lim sup

n

1

nlog sn(f, ε,M).

From the proof of the proposition we may also obtain the following relatedidentity:

Corollary 10.1.8. If f : M →M is a continuous transformation in a compactmetric space then

h(f) = limε→0

lim infn

1

nlog gn(f, ε,M) = lim

ε→0lim inf

n

1

nlog sn(f, ε,M).

Proof. The relation (10.1.15) gives that

h(f, α) ≤ lim infn

1

nlog gn(f, ε,M)

whenever ε > 0 is a Lebesgue number for the open cover α. Making ε→ 0, weconclude that

h(f) ≤ limε→0

lim infn

1

nlog gn(f, ε,M). (10.1.16)

The first inequality in Lemma 10.1.5 implies that

limε→0

lim infn

1

nlog gn(f, ε,M) ≤ lim

ε→0lim inf

n

1

nlog sn(f, ε,M). (10.1.17)

Also, it is clear that

limε→0

lim infn

1

nlog sn(f, ε,M) ≤ lim

ε→0lim sup

n

1

nlog sn(f, ε,M). (10.1.18)

As we have just observed, the expression on the right-hand side is equal to h(f).Therefore, the inequalities (10.1.16)-(10.1.18) imply the conclusion.


10.1.3 Calculation and properties

We start by proving a version of Lemma 9.1.13 for the topological entropy. Theproof is a bit more elaborate because, unlike what happens for partitions, givenan open cover α the covers (αk)n and αn+k−1 need not coincide if the elementsof α are not pairwise disjoint.

Example 10.1.9. Let f : M →M be the shift map in M = 1, 2, 3N (or M =1, 2, 3Z) and α be the open cover of M consisting of the cylinders [0; 1, 2]and [0; 1, 3]. For each n ≥ 1, the cover αn consists of the 2n cylinders of theform [0;A0, . . . , An−1] with Aj = [0; 1, 2] or Aj = [0; 1, 3]. In particular,#α3 = 8. On the other hand, (α2)2 contains 12 elements: the 8 elements of α3

together with the 4 cylinders of the form [0;A0, 1, A2] with Aj = [0; 1, 2] orAj = [0; 1, 3] for j = 0 and j = 2. Hence, αn+k−1 6= (αk)n for n = k = 2.

Proposition 10.1.10. Let M be a compact topological space, f : M → M bea continuous transformation and α be an open cover of M . Then, h(f, α) =h(f, αk) for every k ≥ 1. Moreover, if f : M → M is a homeomorphism thenh(f, α) = h(f, α±k) for every k ≥ 1, where α±k = ∨k−1

j=−kf−j(α).

Proof. The main point is to show that the open covers (αk)n and αn+k−1 havethe same entropy, for every n ≥ 1. We use the following simple fact, which willbe useful again later:

Lemma 10.1.11. Given any open cover α and any n, k ≥ 1,

1. αn+k−1 is a subcover of (αk)n and, in particular, (αk)n ≺ αn+k−1;

2. for any subcover β of (αk)n there exists a subcover γ of αn+k−1 such that#γ ≤ #β and γ ≺ β.

Proof. By definition, every element αn+k−1 has the form B = ∩n+k−2l=0 f−l(Bl)

with Bl ∈ α for every l. It is clear that this may be written in the formB = ∩n−1

i=0 f−i(∩k−1j=0 f

−j(Bi+j))

and, thus, B ∈ (αk)n. This proves the first

claim. Next, let β be a subcover of (αk)n. Every element of β has the form

A = ∩n−1i=0 f

−i(∩k−1j=0 f

−j(Ai,j))

= ∩n+k−2l=0 f−l

(∩i+j=l Ai,j

)

with Ai,j ∈ α. Consider B = ∩n+k−2l=0 f−l(Bl), where Bl = Ai,j for some pair

(i, j) such that i+ j = l. Observe that A ⊂ B and B ∈ αn+k−1. Therefore, thefamily γ formed by all the sets B obtained in this way satisfies all the conditionsin the second claim.

According to the relation (10.1.2), the first part of Lemma 10.1.11 impliesthat H((αk)n) ≤ H(αn+k−1). Clearly, the second part of the lemma implies theopposite inequality. Hence,

H(αn+k−1) = H((αk)n) for any n, k ≥ 1, (10.1.19)


as we claimed. Therefore,

h(f, αk) = limn

1

nH((αk)n) = lim

n

1

nH(αn+k−1) = h(f, α) for every k.

When f is invertible, it follows from the definitions that α±k = fk(α2k). UsingExercise 10.1.3, we get that h(f, α±k) = h(f, fk(α2k)) = h(f, α2k) = h(f, α).

The next proposition and its corollary simplify the calculation of the topo-logical entropy significantly in concrete examples. Recall that, when M is ametric space, the diameter of an open cover is defined to be the supremum ofthe diameters of its elements.

Proposition 10.1.12. Assume that M is a compact metric space. Let (βk)k beany sequence of open covers of M such that diamβk converges to zero. Then,

h(f) = supkh(f, βk) = lim

kh(f, βk).

Proof. Given any open cover α, let ε > 0 be a Lebesgue number of α. Taken ≥ 1 such that diamβk < ε for every k ≥ n. By the definition of Lebesguenumber, it follows that every element of βk is contained in some element of α.In other words, α ≺ βk and, hence, h(f, βk) ≥ h(f, α). In view of the definition(10.1.5), this proves that

lim infk

h(f, βk) ≥ h(f).

It is also clear from the definitions that h(f) ≥ supk h(f, βk) ≥ lim supk h(f, βk).Combining these observations, we obtain the conclusion of the proposition.

Corollary 10.1.13. Assume that M is a compact metric space. If β is an opencover such that

(1) the diameter of the one-sided iterated sum βk =∨k−1j=0 f

−j(β) convergesto zero when k → ∞, or

(2) f : M → M is a homeomorphism and the diameter of the two-sided iter-

ated sum β±k =∨k−1j=−k f

−j(β) converges to zero when k → ∞,

then h(f) = h(f, β).

Proof. In case (1), Propositions 10.1.10 and 10.1.12 yield

h(f) = limkh(f, βk) = h(f, β).

The proof in case (2) is analogous.

Next, we check that the topological entropy behaves as one could expectwith respect to positive iterates, at least when the transformation is uniformlycontinuous:


Proposition 10.1.14. If f : M →M is a uniformly continuous transformationin a metric space then h(fk) = kh(f) for every k ∈ N.

Proof. Fix k ≥ 1 and let K ⊂ M be any compact set. Consider any n ≥ 1and ε > 0. It is clear that if E ⊂ M is an (nk, ε)-generating set of K for thetransformation f then it is also an (n, ε)-generating set of K for the iterate fk.Therefore, gn(f

k, ε,K) ≤ gnk(f, ε,K). Hence,

g(fk, ε,K) = limn

1

ngn(f

k, ε,K) ≤ limn

1

ngnk(f, ε,K) = kg(f, ε,K).

Making ε→ 0 and taking the supremum over K, we see that h(fk) ≤ kh(f).The proof of the other inequality uses the assumption that f is uniformly

continuous. Take δ > 0 such that d(x, y) < δ implies d(f j(x), f j(y)) < ε forevery j ∈ 0, . . . , k − 1. If E ⊂M is an (n, δ)-generating set of K for fk thenE is an (nk, ε)-generating set of K for f . Therefore, gnk(f, ε,K) ≤ gn(f

k, δ,K).This shows that kg(f, ε,K) ≤ g(fk, δ,K). Making ε and δ go to zero, we getthat kg(f,K) ≤ g(fk,K) for every compact set K. Hence, kh(f) ≤ h(fk).

In particular, Proposition 10.1.14 holds for every continuous transformationin a compact metric space. On the other hand, in the case of homeomorphismsin compact spaces the conclusion extends to negative iterates:

Proposition 10.1.15. If f : M →M is a homeomorphism of a compact metricspace then h(f−1) = h(f). Consequently, h(fn) = |n|h(f) for every n ∈ Z.

Proof. Let α be an open cover of M . For every n ≥ 1, denote

αn+ = α ∨ f−1(α) ∨ · · · ∨ f−n+1(α) and αn− = α ∨ f(α) ∨ · · · ∨ fn−1(α).

Observe that αn− = fn−1(αn+). Moreover, γ is a finite subcover of αn+ if and onlyif fn−1(γ) is a finite subcover of αn−. Since the two subcovers have the samenumber of elements, it follows that H(αn+) = H(αn−). Therefore,

h(f, α) = limn

1

nH(αn+) = lim

n

1

nH(αn−) = h(f−1, α).

Since α is arbitrary, this proves that h(f) = h(f−1). The second part of thestatement follows from combining the first part with Proposition 10.1.14.

The claim in Proposition 10.1.15 is generally false when the space M is notcompact:

Example 10.1.16. Let M = R with the distance d(x, y) = |x − y| and takef : R → R to be given by f(x) = 2x. We are going to check that h(f) 6= h(f−1).Let K = [0, 1] and, given n ≥ 1 and ε > 0, take E ⊂ R to be any (n, ε)-generating set of K. In particular, every point of fn−1(K) = [0, 2n−1] is withinless than ε from some point of fn−1(E). Hence,

2ε#E = 2ε#fn−1(E) ≥ 2n−1.


This proves that gn(f, ε,K) ≥ 2n−2/ε for every n and, thus, g(f, ε,K) ≥ log 2.It follows that h(f) ≥ g(f,K) ≥ log 2. On the other hand, f−1 is a contractionand so it follows from Example 10.1.7 that its topological entropy h(f−1) iszero.

10.1.4 Exercises

10.1.1. Let M be a compact topological space. Show that if α and β are opencovers of M such that α ≺ β then H(α) ≤ H(β).

10.1.2. Let f : M → M be a continuous transformation and α, β be opencovers of a compact topological space M . Show that H(α ∨ β) ≤ H(α) +H(β)and H(f−1(β)) ≤ H(β). Check that if f is surjective then H(f−1(β)) = H(β).

10.1.3. Let M be a compact topological space. Show that if f : M → M is asurjective continuous transformation and β is an open cover of M then h(f, β) =h(f, f−1(β)). Moreover, if f is a homeomorphism then h(f, β) = h(f, f(β)).

10.1.4. Let M = (0,∞) and f : M →M be given by f(x) = 2x. Calculate thetopological entropy of f when one considers in M :

1. the usual distance d(x, y) = |x− y|;

2. the distance d(x, y) = | log x− log y|.

[Observation: Hence, in non-compact spaces the topological entropy may de-pend on the distance function, not just the topology.]

10.1.5. Consider in M two distances d1 and d2 that are uniformly equivalent:for every ε > 0 there exists δ > 0 such that

d1(x, y) < δ ⇒ d2(x, y) < ε and d2(x, y) < δ ⇒ d1(x, y) < ε.

Show that if f : M →M is continuous with respect to any of the two distancesthen the value of the topological entropy is the same relative to both distances.

10.1.6. Let f : M → M and g : N → N be continuous transformations incompact metric spaces. Show that if there exists a continuous injective mapψ : M → N such that ψ f = g ψ then h(f) ≤ h(g). Use this fact to show thatthe topological entropy of the shift map σ : [0, 1]Z → [0, 1]Z is infinite (thus, thetopological entropy of a homeomorphism of a compact space need not be finite).[Observation: The first claim remains valid for non-compact spaces, as long aswe require the inverse ψ−1 : ψ(M) →M to be uniformly continuous.]

10.1.7. Show that if K,K1, . . . ,Kl are compact sets such that K is containedin K1 ∪ · · · ∪Kl then g(f,K) ≤ maxj g(f,Kj). Conclude that, given any δ > 0,

g(f) = supg(f,K) : K compact with diamK < δ

and analogously for s(f).

10.2. EXAMPLES 319

10.1.8. Prove that the logistic map f : [0, 1] → [0, 1], f(x) = 4x(1 − x) istopologically conjugate to the map g : [0, 1] → [0, 1] defined by g(x) = 1−|2x−1|.Use this fact to calculate h(f).

10.1.9. Let A be a finite alphabet and σ : Σ → Σ be the shift map in Σ = AN.The complexity of a sequence x ∈ Σ is defined by c(x) = limn n

−1 log cn(x),where cn(x) is the number of distinct words of length n that appear in x. Showthat this limit exists and coincides with the topological entropy of the restrictionσ : X → X of the shift map to the closure X of the orbit of x. [Observation:One interesting application we have in mind is in the context of Example 6.3.10,where x is the fixed point of a substitution.]

10.1.10. Check that if θ is the fixed point of the Fibonacci substitution inA = 0, 1 (see Example 6.3.10) then cn(θ) = n + 1 for every n and so thecomplexity c(θ) is equal to zero. Hence, the topological entropy of the shiftmap σ : X → X associated with the Fibonacci substitution is equal to zero.

10.2 Examples

Let us use a few concrete situations to illustrate the ideas introduced in theprevious section.

10.2.1 Expansive maps

Recall (Section 9.2.3) that a continuous transformation f : M → M in acompact metric space is said to be expansive if there exists ε0 > 0 such thatd(f j(x), f j(y)) < ε0 for every j ∈ N implies that x = y. When f : M → M isinvertible, we say that it is two-sided expansive if there exists ε0 > 0 such thatd(f j(x), f j(y)) < ε0 for every j ∈ Z implies that x = y. In both cases, ε0 iscalled a constant of expansivity for f .

Proposition 10.2.1. If ε0 > 0 is a constant of expansivity for f then

(a) h(f) = h(f, α) for every open cover α with diameter less than ε0;

(b) h(f) = g(f, ε,M) = s(f, ε,M) for every ε < ε0/2.

In particular, h(f) <∞.

Proof. Let α be any open cover of M with diameter less than ε0. We claimthat limk diamαk = 0. Indeed, suppose that this is not so. It is clear thatthe sequence of diameters is non-increasing. Then, there exists δ > 0 and foreach k ≥ 1 there exist points xk and yk in the same element of αk such thatd(xk, yk) ≥ δ. By compactness, we may find a subsequence (kj)j such that bothx = limj xkj and y = limj ykj exist. On the one hand, d(x, y) ≥ δ and so x 6= y.On the other hand, the fact that xk and yk are in the same element of αk impliesthat

d(f i(xk), fi(yk)) ≤ diamα for every 0 ≤ i < k.


Passing to the limit, we get that d(f i(x), f i(y)) ≤ diamα < ε0 for every i ≥ 0.This contradicts the hypothesis that ε0 is a constant of expansivity for f . Thiscontradiction proves our claim. Using Corollary 10.1.13, it follows that h(f) =h(f, α), as claimed in part (a).

To prove part (b), let α be the open cover of M formed by the balls of radiusε. Note that αn contains every dynamical ball B(x, n, ε):

B(x, n, ε) =

n−1⋂

j=0

f−j(B(f j(x), ε)

)and each B(f j(x), ε) ∈ α.

If E is an (n, ε)-generating set of M then B(a, n, ε) : a ∈ E is an open coverof M ; in view of what we have just said, it is a subcover of αn. Therefore (recallalso Lemma 10.1.5),

N(αn) ≤ gn(f, ε,M) ≤ sn(f, ε,M) for every n.

Passing to the limit, we get that h(f, α) ≤ g(f, ε,M) ≤ s(f, ε,M). Recall thats(f, ε,M) ≤ s(f,M) = h(f). Since diamα < ε0, the first part of the propositionyields that h(f) = h(f, α). These relations imply part (b).

The last claim in the proposition is a direct consequence, since g(f, ε,M),s(f, ε,M) and h(f, α) are always finite. Indeed, that h(f, α) < ∞ for everyopen cover was observed right after the definition (10.1.3). Then, (10.1.14)implies that s(f, ε,M) <∞ and (10.1.13) implies that g(f, ε,M) <∞ for everyε > 0.

Exercise 10.2.8 contains an extension of Proposition 10.2.1 to h-expansivetransformations, due to Rufus Bowen [Bow72]. Exercise 10.1.6 shows that thetopological entropy of a continuous transformation, or even a homeomorphism,in a compact metric space may be infinite, if one omits the expansivity assump-tion.

Next, we prove that for expansive maps the topological entropy is an upperbound on the rate of growth of the number of periodic points. Let Fix(fn)denote the set of all points x ∈M such that fn(x) = x.

Proposition 10.2.2. If M is a compact metric space and f : M → M isexpansive then

lim supn

1

nlog #Fix(fn) ≤ h(f).

Proof. Let ε0 be a constant of expansivity for f and α be any open cover ofM with diamα < ε0. We claim that every element of αn contains at most onepoint of Fix(fn). Indeed, if x, y ∈ Fix(fn) are in the same element of αn thend(f i(x), f i(y)) < diamα < ε0 for every i = 0, . . . , n − 1. Since fn(x) = x andfn(y) = y, it follows that d(f i(x), f i(y)) < ε0 for every i ≥ 0. By expansivity,this implies that x = y, which proves our claim. It follows that

lim supn

1

nlog #Fix(fn) ≤ lim sup

n

1

nlogN(αn) = h(f, α).

10.2. EXAMPLES 321

Taking the limit when the diameter of α goes to zero, we get the conclusion ofthe proposition.

In some interesting situations, one can show that the topological entropyactually coincides with the rate of growth of the number of periodic points:

limn

1

nlog #Fix(fn) = h(f). (10.2.1)

That is the case, for example, for the shifts of finite type, which we are goingto study in Section 10.2.2 (check Proposition 10.2.5 below). More generally,(10.2.1) holds whenever f : M → M is an expanding transformation in a com-pact metric space, as we are going to see in Section 11.3.

10.2.2 Shifts of finite type

Let X = 1, . . . , d be a finite set and A = (Ai,j)i,j be a transition matrix, thatis, a square matrix of dimension d ≥ 2 with coefficients in the set 0, 1 andsuch that no row is identically zero: for every i there exists j such that Ai,j = 1.Consider the subset ΣA of Σ = XN consisting of all the sequences (xn)n ∈ Σthat are A-admissible, meaning that

Axn,xn+1 = 1 for every n ∈ N. (10.2.2)

It is clear that ΣA is invariant under the shift map σ : Σ → Σ, in the sensethat σ(ΣA) ⊂ ΣA. Note also that ΣA is closed in Σ and, hence, it is a compactmetric space (this is similar to Lemma 7.2.5).

The restriction σA : ΣA → ΣA of the shift map σ : Σ → Σ to this invari-ant compact set is called one-sided shift of finite type associated with A. Thetwo-sided shift of finite type associated with a transition matrix A is definedanalogously, considering Σ = XZ and requiring (10.2.2) for every n ∈ Z. In thiscase, as part of the definition of transition matrix, we also require the columns(not just the rows) of A to be non-zero.

The restriction of the shift map σ : Σ → Σ to the support of any Markovmeasure is a shift of finite type:

Example 10.2.3. Given a stochastic matrix P = (Pi,j)i,j , define A = (Ai,j)i,jby

Ai,j =

1 if Pi,j > 00 if Pi,j = 0.

Note that A is a transition matrix: the definition of stochastic matrix impliesthat no row P is identically zero (in the two-sided situation we must assumethat the columns of P are also not zero; this is automatic, for example, if thematrix P is aperiodic). Comparing (7.2.7) and (10.2.2), we see that a sequenceis A-admissible if and only if it is P -admissible. Let µ be the Markov measuredetermined by a probability vector p = (pj)j with positive coefficients and suchthat P ∗p = p (recall Example 7.2.2). By Lemma 7.2.5, the support of µ coincideswith the set ΣA = ΣP of all admissible sequences.


1 2

3 4

Figure 10.1: Graph associated with a transition matrix

It is useful to associate with any transition matrix A the oriented graphwhose vertices are the points of X = 1, . . . , d and such that there exists anedge from vertex a to vertex b if and only if Aa,b = 1. In other words,

GA = (a, b) ∈ X ×X : Aa,b = 1.

For example, Figure 10.1 describes the graph associated with the matrix

A =

0 1 1 01 1 0 11 0 1 01 0 0 1

.

A path of length l ≥ 1 in the graph GA is a sequence a0, . . . , al in X suchthat Aai−1,ai = 1 for every i, that is, such that there always exists an edgeconnecting ai−1 to ai. Given a, b ∈ X and l ≥ 1, denote by Ala,b the number ofpaths of length l starting at a and ending at b, that is, with a0 = a and al = b.Observe that

1. A1a,b = 1 if there exists an edge connecting a to b and A1

a,b = 0 otherwise.

In other words, A1a,b = Aa,b for every a, b.

2. The paths of length l +m starting at a and ending at b are the concate-nations of the paths of length l starting at a and ending at some pointz ∈ X with the paths of length m starting at that point z and ending atb. Therefore,

Al+ma,b =

d∑

z=1

Ala,zAmz,b for every a, b ∈ X and every l,m ≥ 1.

It follows, by induction on l, that Ala,b coincides with the coefficient in row a

and column b of the matrix Al.The basic topological properties of shifts of finite type are analyzed in Exer-

cise 10.2.2. In the proposition that follows we calculate the topological entropy

10.2. EXAMPLES 323

of these transformations. We need a few prior observations about transitionmatrices.

Recall that the spectral radius ρ(B) of a linear map B : Rd → Rd (that is,the largest absolute value of an eigenvalue of B) is given by

ρ(B) = limn

‖Bn‖1/n = limn

| trcBn|1/n, (10.2.3)

where trc denotes the trace of the matrix and ‖·‖ denotes any norm in the vectorspace of linear maps (all norms are equivalent, as we are in finite dimension).Most of the times, one uses the operator norm ‖B‖ = sup‖Bv‖/‖v‖ : v 6= 0,but it will also be useful to consider the norm ‖ · ‖s defined by

‖B‖s =

d∑

i,j=1

|Bi,j |.

Now take A to be a transition matrix. Since the coefficients of A are non-negative, we may use the Perron-Frobenius theorem (Theorem 7.2.3) to concludethat A admits a non-negative eigenvalue λA that is equal to the spectral radius.By our definition of transition matrix, we also have that all the rows of A arenon-zero. Then, the same is true about An, for any n ≥ 1 (Exercise 10.2.5).This implies that all the coefficients of the vector An(1, . . . , 1) are positive (andinteger) and, thus,

‖An‖ ≥ ‖An(1, . . . , 1)‖‖(1, . . . , 1)‖ ≥ 1 for every n ≥ 1.

Using (10.2.3), we get that λA = ρ(A) ≥ 1 for every transition matrix A.

Proposition 10.2.4. The topological entropy h(σA) of a shift of finite typeσA : ΣA → ΣA is given by h(σA) = logλA, where λA is the largest eigenvalueof the transition matrix A.

Proof. We treat the case of one-sided shifts; the two-sided case is analogous, asthe reader may readily check. Consider the open cover α of ΣA formed by therestrictions

[0; a]A = (xj)j ∈ ΣA : x0 = a

of the cylinders [0; a] of Σ. For each n ≥ 1, the open cover αn is formed by therestrictions

[0; a0, . . . , an−1]A = (xj)j ∈ ΣA : xj = aj for j = 0, . . . , n− 1

of the cylinders of length n. Observe that [0; a0, . . . , an−1]A is non-empty if andonly if a0, . . . , an−1 is a path (of length n−1) in the graph GA: it is evident thatthis condition is necessary; to see that it is also sufficient, use the assumptionthat for every i there exists j such that Ai,j = 1. Since the cylinders are pairwise


disjoint, this observation shows that N(αn) is equal to the total number of pathsof length n− 1 in the graph GA. In other words,

N(αn) =

d∑

i,j=1

An−1i,j = ‖An−1‖s.

By the spectral radius formula (10.2.3), it follows that

h(σA, α) = limn

1

nlogN(αn) = lim

n

1

nlog ‖An−1‖s = log ρ(A) = logλA.

Finally, since diamαn → 0, Corollary 10.1.13 yields that h(σA) = h(σA, α).

Proposition 10.2.5. If σA : ΣA → ΣA is a shift of finite type then

h(σA) = limn

1

nlog #Fix(σnA).

Proof. We treat the case of one-sided shifts, leaving the two-sided case for thereader. Note that (xk)k ∈ ΣA is a fixed point of σnA if and only if xk = xk−n forevery k ≥ n. In particular, every cylinder [0; a0, . . . , an−1]A contains at mostone element of Fix(σnA). Moreover, the cylinder does contain a fixed point ifand only if a0, . . . , an−1, a0 is a path (of length n) in the graph GA. This provesthat

#Fix(σnA) =

d∑

i=1

Ani,i = trcAn

for every n. Consequently,

limn

1

nlog #Fix(σnA) = lim

n

1

nlog trcAn = log ρ(A).

Now the conclusion is a direct consequence of the previous proposition.

10.2.3 Topological entropy of flows

The definition of topological entropy extends easily to the context of continuousflows φ = φt : M →M : t ∈ R in a metric space M , as we now explain.

Given x ∈M and T > 0 and ε > 0, the dynamical ball of center x, length Tand radius ε > 0 is the set

B(x, T, ε) = y ∈M : d(φt(x), φt(y)) < ε for every 0 ≤ t ≤ T .

Let K be any compact subset of M . We say that E ⊂M is a (T, ε)-generatingset for K if

K ⊂⋃

x∈E

B(x, T, ε)

and we say that E ⊂ K is a (T, ε)-separated set if the dynamical ball B(x, T, ε)of each x ∈ E contains no other element of E.

10.2. EXAMPLES 325

Denote by gT (φ, ε,K) the smallest cardinality of a (T, ε)-generating set ofK and by sT (φ, ε,K) the largest cardinality of a (T, ε)-separated set E ⊂ K.Then, take

g(φ,K) = limε→0

lim supT→∞

1

Tlog gT (φ, ε,K) and

s(φ,K) = limε→0

lim supT→∞

1

Tlog sT (φ, ε,K)

and defineg(φ) = sup

Kg(φ,K) and s(φ) = sup

Ks(φ,K),

where both suprema are taken over all the compact sets K ⊂M .The next result, a continuous-time analogue of Proposition 10.1.4, ensures

that these two last numbers coincide. We leave the proof up to the reader(Exercise 10.2.3). By definition, the topological entropy of the flow φ is thenumber h(φ) = g(φ) = s(φ).

Proposition 10.2.6. We have g(φ,K) = s(φ,K) for every compact K ⊂ M .Consequently, g(φ) = s(φ).

In the statement that follows we take the flow to be uniformly continuous,that is, such that for every T > 0 and ε > 0 there exists δ > 0 such that

d(x, y) < δ ⇒ d(φt(x), φt(y)) < ε for every t ∈ [−T, T ].

Observe that this is automatic for continuous flows when M is compact.

Proposition 10.2.7. If the flow φ is uniformly continuous then its topologicalentropy h(φ) coincides with the topological entropy h(φ1) of its time-1 map.

Proof. It suffices to prove that g(φ,K) = g(φ1,K) for every compact K ⊂M .It is clear that if E ⊂M is (T, ε)-generating for K relative to the flow φ then

E is also (n, ε)-generating for K relative to the time-1 map, for any n ≤ T + 1.In particular, gn(φ

1, ε,K) ≤ gT (φ, ε,K). It follows that

lim supn

1

nlog gn(φ

1, ε,K) ≤ lim supT→∞

1

Tlog gT (φ, ε,K)

and so g(φ1,K) ≤ g(φ,K).The hypothesis of uniform continuity is used for the opposite inequality.

Given ε > 0, fix δ ∈ (0, ε) such that if d(x, y) < δ then d(φt(x), φt(y)) < ε forevery t ∈ [0, 1]. If E ⊂ M is an (n, δ)-generating of K relative to φ1 then E isa (T, ε)-generating set of K relative to the flow φ, for any T ≤ n. In particular,gT (φ, ε,K) ≤ gn(φ

1, δ,K). It follows that

lim supT→∞

1

Tlog gT (φ, ε,K) ≤ lim sup

n

1

nlog gn(φ

1, δ,K)

(given a sequence (Tj)j that realizes the supremum on the left-hand side, con-sider the sequence (nj)j given by nj = [Tj ] + 1). Making ε → 0 (then δ → 0),we get that g(φ,K) ≤ g(φ1,K).


We have seen previously that for transformations the topological entropy isan invariant of topological (uniformly continuous) conjugacy. The same is truefor flows: this follows from Proposition 10.2.7 and the obvious observation thatany flow conjugacy also conjugates the corresponding time-1 maps. However,in the continuous-time context, one uses more often the concept of topologicalequivalence, which allows for rescaling of time. Clearly, topological equivalenceneed not preserve the topological entropy.

10.2.4 Differentiable maps

In this section we take M to be a Riemannian manifold (Appendix A.4.5). Letf : M → M be a differentiable map and Df(x) : TxM → Tf(x)M denote thederivative of f at each point x ∈M . Our goal is to prove that the norm of thederivative, defined by

‖Df(x)‖ = sup‖Df(x)v‖

‖v|| : v ∈ TxM and v 6= 0,

determines an upper bound for the topological entropy h(f) of f . For x > 0,we denote log+ x = maxlog x, 0.

Proposition 10.2.8. Let f : M →M be a differentiable map in a Riemannianmanifold of dimension d such that ‖Df‖ is bounded. Then,

h(f) ≤ d log+ sup ‖Df‖ <∞.

Proof. Let L = sup‖Df(x)‖ : x ∈M. By the mean value theorem,

d(f(x), f(y)) ≤ Ld(x, y) for every x, y ∈M.

If L ≤ 1 then, as we have seen in Example 10.1.7, the entropy of f is zero. Thus,from now on we may suppose that L > 1.

Let A be an atlas of the manifold M consisting of charts ϕα : Uα → Xα withXα = (−2, 2)d. Given any compact set K ⊂ M , we may find a finite familyAK ⊂ A such that

ϕ−1α ((−1, 1)d) : ϕα ∈ AK

covers K. Fix B > 0 such that d(u, v) ≤ Bd(ϕα(u), ϕα(v)) for all u, v ∈ [−1, 1]d

and ϕα ∈ AK . Given n ≥ 1 and ε > 0, fix δ = (ε/B√d)L−n. Denote by δZd

the set of all points of the form (δk1, . . . , δkd) with kj ∈ Z for every j = 1, . . . , d.Let E ⊂M be the union of the pre-images ϕ−1

α (δZd ∩ (−1, 1)d), with ϕα ∈ AK .

Note that every point of (−1, 1)d is at distance less than δ√d from some

point of δZd ∩ (−1, 1)d. Therefore, for any ϕα ∈ AK , every x ∈ ϕ−1α ((−1, 1)d)

is at distance less than Bδ√d from some point a ∈ ϕ(δZd ∩ (−1, 1)d). Then, by

the choice of δ,

d(f j(x), f j(a)) ≤ LjBδ√d < LnBδ

√d = ε

10.2. EXAMPLES 327

for every j = 0, . . . , n− 1. This proves that E is an (n, ε)-generating set for K.On the other hand, by construction,

#E ≤ #AK#(δZd ∩ (−1, 1)d

)≤ #AK(2/δ)d ≤ #AK(2B

√dLn/ε)d.

So, the expression on the right-hand side is an upper bound for gn(f, ε,K).Consequently,

g(f, ε,K) ≤ lim supn

1

nlog(2B

√dLn/ε)d = d logL.

Making ε→ 0 and taking the supremum over K, we get that h(f) ≤ d logL.

Combining Propositions 10.1.14 and 10.2.8, we find that

h(f) ≤ 1

nlog+ sup ‖Dfn‖ for every n ≥ 1.

When f is a homeomorphism, using Proposition 10.1.15 we also get that

h(f) ≤ 1

nlog+ sup ‖Df−n‖ for every n ≥ 1.

The following conjecture of Michael Shub [Shu74] is central to the theory oftopological entropy:

Conjecture 10.2.9 (Entropy conjecture). If f : M → M is a diffeomorphismof class C1 in a Riemannian manifold of dimension d, then

h(f) ≥ max1≤k≤d

log ρ(fk), (10.2.4)

where each ρ(fk) denotes the spectral radius of the action fk : Hk(M) → Hk(M)induced by f in the real homology of dimension k.

The full statement of the conjecture remains open to date, but several partialanswers and related results have been obtained, both positive and negative. Letus summarize what is known in this regard.

It follows from a result of Yano [Yan80] that the inequality (10.2.4) is truefor an open and dense subset of the space of homeomorphisms in any manifoldof dimension d ≥ 2. Moreover, it is true for every homeomorphism in cer-tain classes of manifolds, such as the spheres or the infranilmanifolds [MP77b,MP77a, MP08]. On the other hand, Shub [Shu74] exhibited a Lipschitz home-omorphism, with zero topological entropy, for which (10.2.4) is false. See Exer-cise 10.2.7.

A useful way to approach (10.2.4) is by comparing the topological entropywith each one of the spectral radii ρ(fk). The case k = d is relatively easy.Indeed, for any continuous map f in a manifold of dimension d, the spectralradius ρ(fd) is equal to the absolute value | deg f | of the degree of the map. Inparticular, the inequality h(f) ≥ log ρ(fd) is trivial for any homeomorphism.For non-invertible continuous maps, the topological entropy may be less than


the logarithm of the absolute value of the degree. However, it was shown in[MP77b] that for differentiable maps one always has h(f) ≥ log | deg f |.

Anthony Manning [Man75] proved that the inequality h(f) ≥ log ρ(f1) istrue for every homeomorphism in a manifold of any dimension d. It follows thath(f) ≥ log ρ(fd−1), since the duality theorem of Poincare implies that

ρ(fk) = ρ(fd−k) for every 0 < k < d.

In particular, the theorem of Manning together with the observations in theprevious paragraph prove that entropy conjecture is true for every homeomor-phisms in any manifold of dimension d ≤ 3.

Rufus Bowen [Bow78] proved that for any homeomorphism in a manifoldthe topological entropy h(f) is larger or equal than the logarithm of the rateof growth of the fundamental group. One can show that this rate of growth islarger or equal than ρ(f1). Thus, this result of Bowen implies the theorem ofManning that we have just mentioned.

The main result concerning the entropy conjecture is the theorem of YosefYomdin [Yom87], according to which the conjecture is true for every diffeomor-phism of class C∞. The crucial ingredient in the proof is a relation betweenthe topological entropy h(f) and the diffeomorphism’s rate of growth of volume,which is defined as follows. For each 1 ≤ k < d, let Bk be the unit ball inRk. Denote by v(σ) the k-dimensional volume of the image of any differentiableembedding σ : Bk →M . Then, define

vk(f) = supσ

lim supn

1

nlog v(fn σ),

where the supremum is taken over all the embeddings σ : Bk →M of class C∞.Define also v(f) = maxvk(f) : 1 ≤ k < d. It is not difficult to check that

log ρ(fk) ≤ vk(f) for every 1 ≤ k < d. (10.2.5)

On the one hand, Sheldon Newhouse [New88] proved that h(f) ≤ v(f) for everydiffeomorphism of class Cr with r > 1. On the other hand, Yomdin [Yom87]proved the opposite inequality

v(f) ≤ h(f), (10.2.6)

for every diffeomorphism of class C∞ (this inequality is false, in general, in theCr case with r < ∞). Combining (10.2.5) with (10.2.6), one gets the entropyconjecture (10.2.4) for every diffeomorphism of class C∞.

Concerning systems of class C1, it is also known that the inequality (10.2.4)is true for every Axiom A diffeomorphism with no cycles [SW75], for certainpartially hyperbolic diffeomorphisms [SX10] and, more generally, for any C1

diffeomorphism far from homoclinic tangencies [LVY13].


In this section we calculate the topological entropy of the linear endomorphismsof the torus:

10.2. EXAMPLES 329

Proposition 10.2.10. Let fA : Td → Td be the endomorphism induced on thetorus Td by some square matrix A of dimension d with integer coefficients andnon-zero determinant. Then,

h(fA) =

d∑

j=1

log+ |λj |, (10.2.7)

where λ1, . . . , λd are the eigenvalues of A, counted with multiplicity.

We have seen in Proposition 9.4.3 that the entropy of fA with respect to theHaar measure µ is equal to the expression on the right-hand side of (10.2.7).By the variational principle (Theorem 10.1), whose proof is contained in Sec-tion 10.4 below, the topological entropy is larger or equal than the entropy ofthe transformation with respect to any invariant probability measure. Thus,

h(fA) ≥ hµ(f) =

d∑

j=1

log+ |λj |.

In what follows, we focus on proving the opposite inequality:

h(fA) ≤d∑

j=1

log+ |λj |. (10.2.8)

Initially, assume that A is diagonalizable, that is, that there exists a basisv1, . . . , vd of Rd with Avi = λivi for each i. Then, clearly, we may take theelements of such a basis to be unit vectors. Moreover, up to renumbering theeigenvalues, we may assume that there exists u ∈ 0, . . . , d such that |λi| > 1for 1 ≤ i ≤ u and |λi| ≤ 1 for every i > u. Let e1, . . . , ed be the canonicalbasis of Rd and P : Rd → Rd be the linear isomorphism defined by P (ei) = vifor each i. Then, P−1AP is a diagonal matrix. Fix L > 0 large enough sothat P ((0, L)d) contains some unit cube

∏di=1[bi, bi + 1]d. See Figure 10.2. Let

π : Rd → Td be the canonical projection. Then, πP ((0, L)d) contains the wholetorus Td.

Given n ≥ 1 and ε > 0, fix δ > 0 such that ‖P‖δ√d < ε. Moreover, for each

i = 1, . . . , d, take

δi =

δ|λi|−n if i ≤ uδ if i > u.

Consider the set

E = πP(

(k1δ1, . . . , kdδd) ∈ (0, L)d : k1, . . . , kd ∈ Z).

Observe also that, given any j ≥ 0,

f jA(E) ⊂ πP(

(k1λj1δ1, . . . , kdλ

jdδd) : k1, . . . , kd ∈ Z

).

Consider 0 ≤ j < n. By construction, |λji δi| ≤ δ for every i = 1, . . . , d. There-

fore, every point of Rd is at distance less or equal than δ√d from some point of


L

L

P

b1

b2

b1 + 1

b2 + 1

Figure 10.2: Building an (n, ε)-generating set in Td

the form (k1λj1δ1, . . . , kdλ

jdδd). Then (see Figure 10.2), for each x ∈ Td we may

find a ∈ E such that d(f j(x), f j(a)) ≤ ‖P‖δ√d < ε for every 0 ≤ j < n. This

shows that E is an (n, ε)-generating set for Td. On the other hand,

#E ≤d∏

i=1

L

δi=

(L

δ

)d u∏

i=1

|λi|n.

These observations show that gn(fA, ε,Td) ≤ (L/δ)d∏ui=1 |λi|n for every n ≥ 1

and ε > 0. Hence,

h(f) = limε→0

lim supn

1

ngn(fA, ε,T

d) ≤u∑

i=1

log |λi| =d∑

i=1

log+ |λi|.

This proves Proposition 10.2.10 in the case when A is diagonalizable.The general case may be treated in a similar fashion, writing the matrix A

in its Jordan canonical form. The reader is invited to carry out the details.

10.2.6 Exercises

10.2.1. Let (Mi, di), i = 1, 2 be metric spaces and fi : Mi → Mi, i = 1, 2 becontinuous transformations. Let M = M1 ×M2 and d be the distance definedin M by

d((x1, x2), (y1, y2)) = maxd1(x1, y1), d2(x2, y2)and f : M → M be the transformation defined by f(x1, x2) = (f1(x1), f2(x2)).Show that h(f) ≤ h(f1) + h(f2) and the identity holds if at least one of thespaces is compact.

10.2.2. Let σA : ΣA → ΣA be a shift of finite type, either one-sided or two-sided. We say that a transition matrix A is irreducible if for any i, j ∈ X thereexists n ≥ 1 such that Ani,j > 0 and we say that A is aperiodic if there existsn ≥ 1 such that Ani,j > 0 for every i, j ∈ X . Show that:

(a) If A is irreducible then the set of periodic points of σA is dense in ΣA.

10.2. EXAMPLES 331

(b) σA is transitive if and only if A is irreducible.

(c) σA is topologically mixing if and only if A is aperiodic.

[Observation: Condition (b) means that the oriented graph GA is connected:given any a, b ∈ X there exists some path in GA starting at a and ending at b.]


10.2.4. Let M be a compact metric space. Show that, given any ε > 0, therestriction of the topological entropy function f 7→ h(f) to the set of continuoustransformations f : M → M that are ε-expansive is upper semi-continuous(with respect to the topology of uniform convergence).

10.2.5. Show that if A is a transition matrix then, for every k ≥ 1, no row ofAk is identically zero. The same is true for the columns of Ak, k ≥ 1, if weassume that A is a transition matrix in the two-sided sense.

10.2.6. (a) Let f : M → M be a surjective local homeomorphism in a com-pact metric space and let d = infy #f−1(y). Prove that h(f) ≥ log d.

(b) Let f : S1 → S1 be a continuous map in the circle. Show that h(f) islarger or equal than the logarithm of the absolute value of the degree off , that is, h(f) ≥ log | deg f |.

[Observation: Misiurewicz, Przytycki [MP77b] proved that h(f) ≥ log | deg f |for every map f : M →M of class C1 in a compact manifold.]

10.2.7. Consider the map f : C → C defined by f(z) = zd/(2|z|d−1), withd ≥ 2. Prove that the topological entropy of f is zero, but the degree of f is d.Why is this not in contradiction with Exercise 10.2.6?

10.2.8. Let f : M → M be a continuous map in a compact metric space M .Given ε > 0, define

g∗(f, ε) = supg(f,B(x,∞, ε)) : x ∈M,

where B(x,∞, ε) denotes the set of all y ∈ M such that d(f i(x), f i(y)) ≤ ε forevery n ≥ 0. Bowen [Bow72] has shown that, given b > 0 and δ > 0, there existsc > 0 such that

log gn(f, δ, B(x, n, ε)) < c+ (g∗(f, ε) + b)n for every x ∈M and n ≥ 1.

Using this fact, prove that h(f) ≤ g(f, ε,M) + g∗(f, ε). One says that fis h-expansive if g∗(f, ε) = 0 for some ε > 0. Conclude that in that caseh(f) = g(f, ε,M). [Observation: This generalizes Proposition 10.2.1, since ev-ery expansive transformation is also h-expansive.]


10.3 Pressure

In this section we introduce an important extension of the concept of topo-logical entropy, called (topological) pressure, and we study its main properties.Throughout, we consider only continuous transformations in compact metricspaces. Related to this, check Exercises 10.3.4 and 10.3.5.

10.3.1 Definition via open covers

Let f : M →M be a continuous transformation in a compact metric space. Wecall potential in M any continuous function φ : M → R. For each n ∈ N, defineφn : M → R by φn =

∑n−1i=0 φ f i. Given an open cover α of M , let

Pn(f, φ, α) = inf∑

U∈γ

supx∈U

eφn(x) : γ is a finite subcover of αn. (10.3.1)

This sequence logPn(f, φ, α) is subadditive (Exercise 10.3.1) and so the limit

P (f, φ, α) = limn

1

nlogPn(f, φ, α) (10.3.2)

exists. Define the pressure of the potential φ with respect to f to be the limitP (f, φ) of P (f, φ, α) when the diameter of α goes to zero. The existence of thislimit is guaranteed by the following lemma:

Lemma 10.3.1. There exists limdiamα→0 P (f, φ, α), that is, there exists someP (f, φ) ∈ R such that

limkP (f, φ, αk) = P (f, φ)

for every sequence (αk)k of open covers with diamαk → 0.

Proof. Let (αk)k and (βk)k be any sequences of open covers with diametersconverging to zero. Given any ε > 0, fix δ > 0 such that |φ(x) − φ(y)| ≤ εwhenever d(x, y) ≤ δ. By assumption, diamαk < δ for every k sufficientlylarge. For fixed k, let ρ > 0 be a Lebesgue number for αk. By assumption,diamβl < ρ for every l sufficiently large. By the definition of Lebesgue number,it follows that every B ∈ βl is contained in some A ∈ αk. Observe also that

supx∈A

φn(x) ≤ nε+ supy∈B

φn(y)

for every n ≥ 1, since diamαk < δ. This implies that

Pn(f, φ, αk) ≤ enεPn(f, φ, βl) for every n ≥ 1

and, hence, P (f, φ, αk) ≤ ε+ P (f, φ, βl). Making l → ∞ and then k → ∞, weget that

lim supk

P (f, φ, αk) ≤ ε+ lim infl

P (f, φ, βl).

Since ε > 0 is arbitrary, it follows that lim supk P (f, φ, αk) ≤ lim inf l P (f, φ, βl).Exchanging the roles of the two sequences of covers, we conclude that the limitslimk P (f, φ, αk) and liml P (f, φ, βl) exist and are equal.

10.3. PRESSURE 333

Before we proceed, let us mention a few simple consequences of the defini-tions. The first one is that the pressure of the zero potential coincides with thetopological entropy. Indeed, it is immediate from (10.3.1) that Pn(f, 0, α) =N(αn) for every n ≥ 1 and, thus, P (f, 0, α) = h(f, α) for every open cover α.Let (αk)k be any sequence of open covers with diameters going to zero. Then,by Proposition 10.1.12 and the definition of the pressure,

h(f) = limkh(f, αk) = lim

kP (f, 0, αk) = P (f, 0). (10.3.3)

Observe, however, that for general potentials P (f, φ) need not coincide with thesupremum of P (f, φ, α) over all open covers α (see Exercise 10.3.5).

Given any constant c ∈ R, we have that Pn(f, φ + c, α) = ecnPn(f, φ, α) forevery n ≥ 1 and, consequently, P (f, φ+c, α) = P (f, φ, α)+c for any open coverα. Hence,

P (f, φ+ c) = P (f, φ) + c. (10.3.4)

Analogously, if φ ≤ ψ then Pn(f, φ, α) ≤ Pn(f, ψ, α) for every n ≥ 1, whichimplies that P (f, φ, α) = P (f, ψ, α) for every open cover α. That is,

φ ≤ ψ ⇒ P (f, φ) ≤ P (f, ψ). (10.3.5)

In particular, since inf φ ≤ φ ≤ supφ, we have that

h(f) + inf φ ≤ P (f, φ) ≤ h(f) + supφ (10.3.6)

for every potential φ. An interesting corollary is that if h(f) is finite thenP (f, φ) <∞ for every potential φ and, otherwise, P (f, φ) = ∞ for every poten-tial φ. An example of this last situation is given in Exercise 10.1.6.

Another simple consequence of the definition is that the pressure is an in-variant of topological equivalence:

Proposition 10.3.2. Let f : M → M and g : N → N be continuous transfor-mations in compact metric spaces. If there exists a homeomorphism h : M → Nsuch that h f = g h then P (g, φ) = P (f, φ h) for every potential φ in N .

Proof. The correspondence α 7→ h(α) is a bijection between the spaces of opencovers of M and N , respectively. Moreover, since h and its inverse are (uni-formly) continuous, diamαk → 0 if and only if diamh(αk) → 0. Consider thepotential ψ = φ h in M . Note that ψn = φn h and so

supx∈U

ψn(x) = supy∈h(U)

φn(y)

for every U ⊂M and every n ≥ 1. Hence, Pn(f, ψ, α) = Pn(g, φ, h(α)) for everyn and every open cover α of M . Thus, P (f, ψ, α) = P (g, φ, h(α)) and, passingto the limit when the diameter of α goes to zero, P (f, ψ) = P (g, φ).

One may replace the supremum by the infimum in (10.3.1), that is, replacePn(f, φ, α) with

Qn(f, φ, α) = inf∑

U∈γ

infx∈U

eφn(x) : γ is a finite subcover of αn,


although this makes the definition a bit more complicated. In contrast withlogPn(f, φ, α), the sequence logQn(f, φ, α) need not be subadditive. Denote,

Q−(f, φ, α) = lim infn

1

nlogQn(f, φ, α) and

Q+(f, φ, α) = lim supn

1

nlogQn(f, φ, α).

Clearly, Q−(f, φ, α) ≤ Q+(f, φ, α) for every open cover α of M . Further-more, Qn(f, 0, α) = Pn(f, 0, α) = N(αn) for every n and so Q−(f, 0, α) =Q+(f, 0, α) = P (f, 0, α) = h(f, α).

Corollary 10.3.3. For any potential φ : M → R,

P (f, φ) = limdiamα→0

Q+(f, φ, α) = limdiamα→0

Q−(f, φ, α).

Proof. Since φ is (uniformly) continuous, given any ε > 0 there exists δ > 0such that

infx∈C

φn(x) ≤ supx∈C

φn(x) ≤ nε+ infx∈C

φn(x)

whenever diamC ≤ δ. So,

Qn(f, φ, α) ≤ Pn(f, φ, α) ≤ enεQn(f, φ, α)

for every open cover α with diamα ≤ δ. It follows that

lim supn

1

nlogQn(f, φ, α) ≤ P (f, φ, α) ≤ ε+ lim inf

n

1

nlogQn(f, φ, α).

As the diameter of α goes to zero, we may take ε→ 0. Thus,

limdiamα→0

Q−(f, φ, α) = limdiamα→0

Q+(f, φ, α) = limdiamα→0

P (f, φ, α) = P (f, φ),

as claimed.

10.3.2 Generating sets and separated sets

Now we present two alternative definitions of pressure, in terms of generatingsets and separated sets. As before, f : M → M is a continuous transformationin a compact metric space and φ : M → R is a continuous function.

Given n ≥ 1 and ε > 0, define

Gn(f, φ, ε) = inf∑

x∈E

eφn(x) : E is an (n, ε)-generating set for M

and

Sn(f, φ, ε) = sup∑

x∈E

eφn(x) : E is an (n, ε)-separated set in M.

(10.3.7)

10.3. PRESSURE 335

Next, define

G(f, φ, ε) = lim supn

1

nlogGn(f, φ, ε) and

S(f, φ, ε) = lim supn

1

nlogSn(f, φ, ε)

(10.3.8)

and also

G(f, φ) = limε→0

G(f, φ, ε) and S(f, φ) = limε→0

S(f, φ, ε) (10.3.9)

(these limits exist because the functions are monotonic in ε).Note thatGn(f, 0, ε) = gn(f, ε) and Sn(f, 0, ε) = sn(f, ε) for every n ≥ 1 and

every ε > 0. Therefore (Proposition 10.1.6), G(f, 0) = g(f) and S(f, 0) = s(f)coincide with the topological entropy h(f). In fact,

Proposition 10.3.4. P (f, φ) = G(f, φ) = S(f, φ) for every potential φ in M .

Proof. Consider n ≥ 1 and ε > 0. It is clear from the definitions that everymaximal (n, ε)-separated set is (n, ε)-generating. Then,

Sn(f, φ, ε) = sup∑

x∈E

eφn(x) : E is (n, ε)-separated

= sup∑

x∈E

eφn(x) : E is (n, ε)-separated maximal

≥ inf∑

x∈E

eφn(x) : E is (n, ε)-generating

= Gn(f, φ, ε)

(10.3.10)

for every n and every ε. This implies that G(f, φ, ε) ≤ S(f, φ, ε) for every εand, thus, G(f, φ) ≤ S(f, φ).

Next, we prove that S(f, φ) ≤ P (f, φ). Let ε and δ be positive numberssuch that d(x, y) ≤ δ implies |φ(x) − φ(y)| ≤ ε. Let α be any open cover of Mwith diamα < δ and E ⊂ M be any (n, δ)-separated set. Given any subcoverγ of αn, it is obvious that every point of E is contained in some element of γ.On the other hand, the hypothesis that E is (n, δ)-separated implies that eachelement of γ contains at most one element of E. Therefore,

∑

x∈E

eφn(x) ≤∑

U∈γ

supy∈U

eφn(y).

Taking the supremum in E and the infimum in γ, we get that

Sn(f, φ, δ) ≤ Pn(f, φ, α). (10.3.11)

It follows that S(f, φ, δ) ≤ P (f, φ, α). Making δ → 0 (hence diamα → 0), weconclude that S(f, φ) ≤ P (f, φ), as stated.

Finally, we prove that P (f, φ) ≤ G(f, φ). Let ε and δ be positive numberssuch that d(x, y) ≤ δ implies |φ(x) − φ(y)| ≤ ε. Let α be any open cover of Mwith diamα < δ and ρ > 0 be a Lebesgue number of α. Let E ⊂ M be any


(n, ρ)-generating set for M . For each x ∈ E and i = 0, . . . , n − 1, there existsAx,i ∈ α such that B(f i(x), ρ) is contained in Ax,i. Denote

γ(x) =

n−1⋂

i=0

f−i(Ax,i).

Observe that γ(x) ∈ αn and B(x, n, ρ) ⊂ γ(x). Hence, the hypothesis that E is(n, ρ)-generating implies that γ = γ(x) : x ∈ E is a subcover of α. Observealso that

supy∈γ(x)

φn(y) ≤ nε+ φn(x) for every x ∈ E,

since diamAx,i < δ for every i. It follows that

∑

U∈γ

supy∈U

eφn(y) ≤ enε∑

x∈E

eφn(x).

This proves that Pn(f, φ, α) ≤ enεGn(f, φ, ρ) for every n ≥ 1 and, consequently,

P (f, φ, α) ≤ ε+ lim infn

1

nGn(f, φ, ρ) ≤ ε+G(f, φ, ρ). (10.3.12)

Making ρ → 0 we find that P (f, φ, α) ≤ ε + G(f, φ). Hence, making ε, δ anddiamα go to zero, P (f, φ) ≤ G(f, φ).

The conclusion of Proposition 10.3.4 may be rewritten as follows:

P (f, φ) = lims→0

lim supn

1

nlogGn(f, φ, s)

= lims→0

lim supn

1

nlogSn(f, φ, s).

(10.3.13)

The relations (10.3.12) and (10.3.10) in the proof also give that

P (f, φ) ≤ lims→0

lim infn

1

nlogGn(f, φ, s) ≤ lim

s→0lim inf

n

1

nlogSn(f, φ, s).

Combining these observations we get:

P (f, φ) = lims→0

lim infn

1

nlogGn(f, φ, s)

= lims→0

lim infn

1

nlogSn(f, φ, s).

(10.3.14)

10.3.3 Properties

Properties of the pressure function in the spirit of Proposition 10.1.10 and Corol-lary 10.1.13 are stated in Exercise 10.3.3. Let us also extend Propositions 10.1.14and 10.1.15 to the present context:

10.3. PRESSURE 337

Proposition 10.3.5. Let f : M → M be a continuous transformation in acompact metric space and φ be a potential in M . Then:

(1) P (fk, φk) = kP (f, φ) for every k ≥ 1.

(2) If f is a homeomorphism then P (f−1, φ) = P (f, φ).

Proof. Given a potential φ : M → R and an open cover α, denote ψ =∑k−1

i=0 φf i and β = ∨k−1

i=0 f−i(α). Let g = fk. It is clear that

n−1∑

j=0

ψ gj =kn−1∑

l=0

φ f l and ∨n−1j=0 g

−j(β) = ∨nk−1l=0 f−l(α).

Then,

Pn(g, ψ, β) = inf∑

U∈γ

supx∈U

ePn−1

j=0 ψ(gj(x)) : γ ⊂ ∨n−1j=0 g

−j(β)

= inf∑

U∈γ

supx∈U

ePkn−1

l=0 φ(f l(x)) : γ ⊂ ∨nk−1l=0 f−l(α) = Pkn(f, φ, α).

Consequently, P (fk, ψ, β) = kP (f, φ, α) for any α. Making diamα → 0 (notethat diamβ → 0), we deduce that P (fk, ψ) = kP (f, φ). This proves part (1).

Suppose that f is a homeomorphism. Given an open cover α and an integernumber n ≥ 1, denote

φ−n =

n−1∑

j=0

φ f−j and αn− = α ∨ f(α) ∨ · · · ∨ fn−1(α).

It is clear that φ−n = φn fn−1 and αn− = fn−1(αn). Moreover, γ is a subcoverof αn if and only if δ = fn−1(γ) is a subcover of αn−. Combining these facts, wefind that

Pn(f−1, φ, α) = inf

∑

U∈δ

supx∈U

eφ−

n (x) : γ ⊂ αn−

= inf∑

V ∈γ

supy∈V

eφn(y) : δ ⊂ αn

= Pn(f, φ, α)

for every n ≥ 1. Hence, P (f−1, φ, α) = P (f, φ, α) for every open cover α.Making diamα→ 0, we reach the conclusion in part (2).

Next, we fix the transformation f : M → M and we consider P (f, ·) as afunction in the space C0(M) of all continuous functions, with the norm definedby

‖ϕ‖ = sup|ϕ(x)| : x ∈M.We have seen in (10.3.6) that if the topological entropy h(f) is infinite then thepressure function is constant equal to ∞. In what follows we assume that h(f)is finite. Then, P (f, φ) is finite for every potential φ.


Proposition 10.3.6. The pressure function is Lipschitz, with Lipschitz con-stant equal to 1: |P (f, φ) − P (f, ψ)| ≤ ‖φ− ψ‖ for any potentials φ and ψ.

Proof. Clearly, φ ≤ ψ + ‖φ− ψ‖. Hence, by (10.3.4) and (10.3.5), we have thatP (f, φ) ≤ P (f, ψ) + ‖φ − ψ‖. Exchanging the roles of φ and ψ, one gets theother inequality.

Proposition 10.3.7. The pressure function is convex:

P (f, (1 − t)φ+ tψ) ≤ (1 − t)P (f, φ) + tP (f, ψ)

for any potentials φ and ψ in M and any 0 ≤ t ≤ 1.

Proof. Write ξ = (1− t)φ+ tψ. Then, ξn = (1− t)φn+ tψn for every n ≥ 1 and,thus, sup(ξn | U) ≤ (1− t) sup(φn | U) + t sup(ψn | U) for every U ⊂M . Then,by the Holder inequality (Theorem A.5.5),

∑

U∈γ

supx∈U

eξn(x) ≤(∑

U∈γ

supx∈U

eφn(x))1−t(∑

U∈γ

supx∈U

eψn(x))t

for any finite family γ of subsets of M . This implies that, given any open coverα,

Pn(f, ξ, α) ≤ Pn(f, φ, α)1−tPn(f, ψ, α)t

for every n ≥ 1 and, hence, P (f, ξ, α) ≤ (1− t)P (f, φ, α) + tP (f, ψ, α). Passingto the limit when diamα→ 0, we get the conclusion of the proposition.

We say that two potentials φ, ψ : M → R are cohomologous if there exists acontinuous function u : M → R such that φ = ψ + u f − u. Note that this isan equivalence relation in the space of potentials (Exercise 10.3.6).

Proposition 10.3.8. Let f : M → M be a continuous transformation in acompact topological space. If φ, ψ : M → R are cohomologous potentials thenP (f, φ) = P (f, ψ).

Proof. If ψ = φ + u f − u then ψn(x) = φn(x) + u(fn(x)) − u(x) for everyn ∈ N. Let K = sup |u|. Then, | supx∈C ψn(x) − supx∈C φn(x)| ≤ 2K for everyset C ⊂M . Hence, for any open cover γ,

e−2K∑

U∈γ

supx∈U

eφn(x) ≤∑

U∈γ

supx∈U

eψn(x) ≤ e2K∑

U∈γ

supx∈U

eφn(x).

This implies that, given any open cover α of M ,

e−2KPn(f, φ, α) ≤ Pn(f, ψ, α) ≤ e2KPn(f, φ, α)

for every n. Therefore, P (f, φ, α) = P (f, ψ, α) for every α and, consequently,P (f, φ) = P (f, ψ).

10.3. PRESSURE 339

10.3.4 Comments in Statistical Mechanics

Let us make a pause to explain what is the relation between the mathematicalconcept of pressure and the issues in Physics that originated it. This also servesas a preview to Chapter 12, where this theory will be developed in the context ofexpanding maps in metric spaces. The discussion that follows is a combination ofmathematical results and physical considerations, not necessarily rigorous, andis quite brief: we refer the reader to the classical works of David Ruelle [Rue04]and Oscar Lanford [Lan73] for actual presentations of the subject.

The goal of Statistical Mechanics is to describe the properties of physicalsystems consisting of a large number of units that interact with each other. Forexample, these units may be particles, such as molecules of a gas, or sites in acrystal grid, which may or may not be occupied by particles. The constant ofAvogadro 6.022×1023 illustrates what one means by ‘large’ in specific situationsin this context.

The main challenge in this area of Mathematical Physics is to understandthe phenomena of phase transitions, that is, sudden changes from one physi-cal state to another: for example, what happens when liquid water turns intoice? Why does this occur suddenly, at a given freezing temperature? Mathe-matical methods developed for tackling this kind of questions turn out to bevery useful in other areas of science, such as quantum field theory and, closerto the scope of this book, the ergodic theory of hyperbolic dynamical systems(Bowen [Bow75a]).

In order to formulate these problems in mathematical terms, it is convenientto assume that the set L of units in the system is actually infinite, because finitesystems do not have genuine phase transitions. The best studied examples arethe lattice systems, for which L = Zd with d ≥ 1. It is assumed that each unithas a finite set F of possible values (or ‘states’). For example, F = −1,+1 inthe case of spin systems, with ±1 representing the two possible orientations ofthe particle’s ‘spin’, and F = 0, 1 in the case of lattice gases : 1 means thatthe site k ∈ L is occupied by a gas molecule, whereas 0 means that the site isempty.

Then, the system’s configuration space is a subset Ω of the product spaceFL. We assume Ω to be closed in FL and invariant under the shift map

σn : FL → FL, (ξk)k∈L 7→ (ξk+n)k∈L

for every n ∈ L. A state of the system is a probability measure µ on Ω: intu-itively, one presumes that at the microscopical level the system oscillates ran-domly between different configurations ξ ∈ Ω (for example, different positionsor velocities of the molecules), all corresponding to the same macroscopic pa-rameters (same temperature, etc); then, the measure µ describes the probabilitydistribution of these microscopic configurations.

States corresponding to macroscopic configurations that can be physicallyobserved, that is, that actually occur in Nature, are called equilibrium states.This notion has a central role in the theory, in particular, because phase tran-sitions are associated with the coexistence of more than one equilibrium state.


Under our hypotheses on Ω, one can show that every equilibrium state µ isinvariant under the shift maps σn, n ∈ L. Thus, the study of lattice systems isnaturally inserted in the scope of Ergodic Theory.

According to the variational principle of Statistical Mechanics, which goesback to the principle of least action of Maupertuis, the equilibrium states arecharacterized by the fact that they minimize a certain fundamental quantity,called Gibbs free energy, whose definition involves the energy E, the temperatureT and the entropy S of the system’s state. The pressure of that state is, simply,the product of the Gibbs free energy by a negative factor −β whose naturewill be explained in a little while1. Therefore, the equilibrium states are alsocharacterized by the fact that they maximize the pressure among all probabilitymeasures invariant under the shift maps σn, n ∈ L.

From these facts, one can obtain a rather explicit description of the equi-librium states for lattice systems: under suitable hypotheses, the equilibriumstates are precisely the Gibbs states invariant under the shift maps. In the re-mainder of this section we are going to motivate and define this concept ofGibbs state, which will also allow us to illustrate the ideas outlined in the pre-vious paragraphs. By the end of the section we briefly comment on the case ofone-dimensional lattice systems, that is, the case d = 1, whose theory is muchsimpler and which is more closely related to the topics treated in this book.

Let us start by considering the particularly simple case of finite systems,that is, such that the configuration space Ω is finite. The entropy of a state µin Ω is the number

S(µ) =∑

ξ∈Ω

−µ(ξ) logµ(ξ).

To each configuration ξ ∈ Ω corresponds a value E(ξ) for the energy of thesystem. Denote by E(µ) the energy of the state µ, that is, the mean

E(µ) =∑

ξ∈Ω

µ(ξ)E(ξ).

Take the system’s absolute temperature T to be constant in time. Then, theGibbs free energy is defined by

G(µ) = E(µ) − κTS(µ),

where κ = 1, 380 × 10−23m2kg s−2K−1 is the Boltzmann constant. In otherwords, denoting β = 1/(κT ),

−βG(µ) =∑

ξ∈Ω

µ(ξ)[− βE(ξ) − logµ(ξ)

]. (10.3.15)

1From the mathematical point of view, the two quantities are equivalent. Preference forone denomination or the other has mostly to do with the physical interpretation of the set F :for spin systems one usually refers to the Gibbs free energy, whereas for lattice gases, wherethe elements of F describe the rate of occupation of each site, it is more natural to refer tothe pressure.

10.3. PRESSURE 341

This expression is denoted by P (µ) and is called pressure of the state µ.It is easy to check that the pressure (10.3.15) is maximum (hence, the Gibbs

free energy G(µ) is minimum) if and only if

µ(ξ) =e−βE(ξ)

∑η∈Ω e

−βE(η)for every ξ ∈ Ω (10.3.16)

(see Lemma 10.4.4 below). Therefore, the Gibbs distribution µ given by (10.3.16)is the unique equilibrium state of the system. In particular, in this simplecontext there are no phase transitions.

Now we sketch how this analysis can be extended to infinite lattice systems,assuming that the interaction between sites that are far apart is sufficientlyweak. It is part of the hypotheses that the energy associated with each config-uration ξ ∈ Ω comes from the pairwise interactions between the different sitesin the lattice (including self-interactions) and that this interaction is invariantunder the shift maps σn, n ∈ L. Then, the energy Ek,l resulting from the ac-tion of any site k ∈ L on any other site l ∈ L depends only on their relativeposition and on the values of ξk and ξl. In other words, there exists a functionΨ : L× F × F → R such that

Ek,l = Ψ(k − l, ξk, ξl) for any k, l ∈ L.

It is also assumed that the strength of the interaction decays exponentiallywith the distance between the sites, in the following sense: there exist constantsK > 0 and θ > 0 such that

|Ψ(m, a, b)| ≤ Ke−θ|m| for any m ∈ L and a, b ∈ F , (10.3.17)

where |m| = max|m1|, . . . , |md|. In particular (Exercise 10.3.9), the energy

ϕ(ξ) =∑

k∈L

Ψ(k, ξk, ξ0)

resulting from the action of all the sites on the site 0 at the origin is uniformlybounded.

Initially, given any finite set Λ ⊂ L, let us consider the system one obtainsby observing only the sites k ∈ Λ and “switching off” their interactions withthe sites in the complement of Λ. This is a finite system, as the configurationspace is contained in FΛ, with energy function given by

EΛ(x) =∑

l∈Λ

∑

k∈Λ

Ψ(k − l, xk, xl) for every x ∈ FΛ.

Hence, according to (10.3.16), its Gibbs distribution µΛ is given by

µΛ(x) =e−βEΛ(x)

∑y∈FΛ e−βEΛ(y)

for each x ∈ FΛ. (10.3.18)


The notion of Gibbs state is obtained from this one by “switching back on”the interaction with the sites outside Λ, in the way we are going to explain.Denote by rΛ(x) the expression on the right-hand side of (10.3.18). Observethat

rΛ(x) =[ ∑

y∈FΛ

eβEΛ(x)−βEΛ(y)]−1

and recall that

EΛ(x) − EΛ(y) =∑

l∈Λ

∑

k∈Λ

Ψ(k − l, xk, xl) − Ψ(k − l, yk, yl).

For ξ, η ∈ FL, define

E(ξ, η) =∑

l∈L

∑

k∈L

Ψ(k − l, ξk, ξl) − Ψ(k − l, ηk, ηl)

=∑

l∈L

∑

j∈L

Ψ(j, ξj+l, ξl) − Ψ(j, ηj+l, ηl) =∑

l∈L

ϕ(σl(ξ)) − ϕ(σl(η)).

It follows from the condition (10.3.17) that this sum converges whenever thetwo configurations are such that ξk = ηk for every k in the complement of somefinite set (Exercise 10.3.9). Then,

ρΛ(ξ) =[β

∑

η|Λc=ξ|Λc

eβE(ξ,η)]−1

is well defined for every ξ ∈ FΛ.A probability measure µ supported in Ω ⊂ FL is called a Gibbs state if, for

every finite set Λ ⊂ L, the disintegration µΛ,θ : θ ∈ FΛc of µ relative to thepartition FΛ × θ : θ ∈ FΛc of the space FL = FΛ × FΛc

is given by

µΛ,θ(x × θ) =

ρΛ(x, θ) if (x, θ) ∈ Ω0 otherwise.

To conclude this section, we state one of the main results of this formalismthat we have been describing: one-dimensional lattice systems exhibit no phasetransitions. More precisely:

Theorem 10.3.9 (Ruelle). If d = 1 and the interactions decay exponentiallywith the distance, then there exists a unique Gibbs state and it is also the uniqueequilibrium state.

The arguments in the proof of this theorem (Ruelle [Rue04]) are at the basisof the thermodynamical formalism of expanding maps, which we are going topresent in Chapter 12. Let us point out that the theorem is false in dimensiond ≥ 2. Exercise 10.3.10 highlights one of the specificities of the one-dimensionalcase that are behind this result.

10.3. PRESSURE 343

10.3.5 Exercises

10.3.1. Check that the sequence logPn(f, φ, α) is subadditive.

10.3.2. Show that if f is a homeomorphism then P (f, φ, f(α)) = P (f, φ, α) andQ+(f, φ, f(α)) = Q+(f, φ, α) and Q−(f, φ, f(α)) = Q−(f, φ, α) for every opencover α.

10.3.3. Show that, for any potential φ : M → R:

(a) If α, β are open covers with α ≺ β then Q+(f, φ, α) ≤ Q+(f, φ, β) andQ−(f, φ, α) ≤ Q−(f, φ, β).

(b) Q+(f, φ, α) = Q+(f, φ, αk) andQ−(f, φ, α) = Q−(f, φ, αk) for every k ≥ 1and every open cover α.

(c) Q+(f, φ, α) = P (f, φ) = Q−(f, φ, α) for any open cover α such thatdiamαk → 0.

(d) P (f, φ, α) = P (f, φ, αk) for every k ≥ 1 and any open cover α whoseelements are pairwise disjoint.

(e) P (f, φ, α) = P (f, φ) for any open cover α such that diamαk → 0 andwhose elements are pairwise disjoint.

(f) If f is a homeomorphism, one may replace αk by α±k in the statements(b), (c), (d) and (e).

10.3.4 (Walters). Prove that

P (f, φ) = supQ−(f, φ, α) : α is an open cover of M= supQ+(f, φ, α) : α is an open cover of M.

[Observation: In particular, the pressure depends only on the topology of M ,not the distance. This also provides a way to extend the definition to continuoustransformations in compact topological spaces.]

10.3.5. Exhibit a homeomorphism f : M → M , a potential φ : M → Rand open covers α and β of a compact metric space M such that α ≺ β andP (f, φ, α) > P (f, φ, β) = P (f, φ). [Observation: Thus, the conclusions of Exer-cise 10.3.3(a) and Exercise 10.3.4 are no longer valid if one replaces Q±(f, φ, α)by P (f, φ, α).]

10.3.6. Check that the cohomology relation

φ ∼ ψ ⇔ ψ = φ+ u f − u for some continuous function u : M → R

is an equivalence relation.


10.3.7. Let fi : Mi → Mi, i = 1, 2 be continuous transformations in compactmetric spaces and, for each i, let φi be a potential in Mi. Define

f1 × f2 : M1 ×M2 →M1 ×M2, f1 × f2(x1, x2) = (f1(x1), f2(x2))

φ1 × φ2 : M1 ×M2 → R, φ1 × φ2(x1, x2) = φ1(x1) + φ2(x2).

Show that P (f1 × f2, φ1 × φ2) = P (f1, φ1) + P (f2, φ2).

10.3.8. Consider the transformation f : S1 → S1 defined by f(x) = 2x mod Z.Prove that if φ : S1 → R is a Holder function then

P (f, φ) = limn

1

nlog

∑

p∈Fix(fn)

eφn(p).

[Observation: We will get a more general result in Exercise 11.3.4.]

10.3.9. Assuming the conditions in Section 10.3.4, prove that:

(a) There exists C > 0 such that |ϕ(ξ)| ≤ C and |ϕ(ξ)− ϕ(η)| ≤ Ce−θN/2 forany ξ, η ∈ FL such that ξk = ηk whenever |k| < N .

(b) For any finite set Λ ⊂ L, there exists CΛ > 0 such that |E(ξ, η)| ≤ CΓ forany ξ, η ∈ FL such that ξk = ηk for every k ∈ Λc.

10.3.10. Assuming the conditions in Section 10.3.4, prove that if d = 1 thenthere exists C > 0 such that |E(ξ, η) − EΛ(ξ) + EΛ(η)| ≤ C for every finiteinterval Λ ⊂ Z and any ξ, η ∈ FZ with ξ | Λc = η | Λc. Consequently,

e−βC ≤ rΛ(ξ | Λ)

ρΛ(ξ)≤ eβC

for every ξ ∈ FZ and every finite interval Λ. [Observation: Therefore, theprobability distribution of the configurations restricted to each finite interval isnot affected in a significant way by the interactions with the sites outside thatinterval.]

10.4 Variational principle

The variational principle for the pressure, which we state in the sequel, wasproven originally by Ruelle [Rue73], under more restrictive assumptions, andwas then extended by Walters [Wal75] to the context we consider here:

Theorem 10.4.1 (Variational principle). Let f : M → M be a continu-ous transformation in a compact metric space and M1(f) denote the set ofprobability measures invariant under f . Then, for every continuous functionφ : M → R,

P (φ, f) = suphν(f) +

∫φdν : ν ∈ M1(f).

10.4. VARIATIONAL PRINCIPLE 345

Theorem 10.1 corresponds to the special case φ ≡ 0. In particular, it fol-lows that the topological entropy of f is zero if and only if hν(f) = 0 forevery invariant probability measure ν. That is the case, for example, for ev-ery circle homeomorphism (Example 10.1.1) and every translation in a compactmetrizable group (Example 10.1.7). The compactness hypothesis is crucial: asobserved in Exercise 10.4.4, there exist transformations (in non-compact spaces)without invariant measures and whose topological entropy is positive.

In Sections 10.4.1 and 10.4.2 we present a proof of Theorem 10.4.1 that is dueto Misiurewicz [Mis76]. Before that, let us mention a couple of consequences.

Corollary 10.4.2. Let f : M → M be a continuous transformation in a com-pact metric space and Me(f) denote the set of probability measures invariantand ergodic. Then,

P (φ, f) = suphν(f) +

∫φdν : ν ∈ Me(f).

Proof. Given any ν ∈ M1(f), let νP : P ∈ P be its ergodic decomposition.By Theorems 5.1.3 and 9.6.2,

hν(f) +

∫φdν =

∫ (hνP (f) +

∫φdνP

)dµ(P ).

This implies that

suphν(f) +

∫φdν : ν ∈ M1(f) ≤ suphν(f) +

∫φdν : ν ∈ Me(f).

The converse inequality is trivial, since Me(f) ⊂ M1(f). Now it suffices toapply Theorem 10.4.1.

Another interesting consequence is that for transformations with finite topo-logical entropy the pressure function determines the set of all invariant proba-bility measures:

Corollary 10.4.3 (Walters). Let f : M → M be a continuous transformationin a compact metric space with topological entropy h(f) < ∞. Let η be anyfinite signed measure on M . Then, η is a probability measure invariant underf if and only if

∫φdη ≤ P (f, φ) for every continuous function φ : M → R.

Proof. The ‘only if’ claim is an immediate consequence of Theorem 10.4.1: if ηis an invariant probability measure then

P (f, φ) ≥ hη(f) +

∫φdη ≥

∫φdη

for every continuous function φ. In what follows we prove the converse.Let η be a finite signed measure such that

∫φdη ≤ P (f, φ) for every φ.

Consider any φ ≥ 0. For any c > 0 and ε > 0,

c

∫(φ+ ε) dη = −

∫−c(φ+ ε) dη ≥ −P (f,−c(φ+ ε)).


By the relation (10.3.6), we also have that

P (f,−c(φ+ ε)) ≤ h(f) + sup(− c(φ+ ε)

)= h(f) − c inf(φ+ ε).

Therefore, c∫(φ+ ε) dη ≥ −h(f)+ c inf(φ+ ε). When c > 0 is sufficiently large

the right-hand side of this inequality is positive. Hence,∫(φ + ε) dη > 0. Since

ε > 0 is arbitrary, this implies that∫φdη ≥ 0 for every φ ≥ 0. So, η is a

positive measure.The next step is to show that η is a probability measure. By assumption,

∫c dη ≤ P (f, c) = h(f) + c

for every c ∈ R. For c > 0, this implies that η(M) ≤ 1 + h(f)/c. Passing tothe limit when c→ +∞, we get that η(M) ≤ 1. Analogously, considering c < 0and passing to the limit when c → −∞, we get η(M) ≥ 1. Therefore, η is aprobability measure, as stated.

We are left to prove that η is invariant under f . By assumption, given anyc ∈ R and any potential φ,

c

∫(φ f − φ) dη ≤ P (f, c(φ f − φ)).

By Proposition 10.3.8, the expression on the right-hand side is equal to P (f, 0) =h(f). For c > 0, this implies that

∫(φ f − φ) dη ≤ h(f)

c

and, passing to the limit when c→ +∞, it follows that∫

(φf−φ) dη ≤ 0. Thesame argument, applied to the function −φ, shows that

∫(φ f − φ) dη ≥ 0.

Hence,∫φ f dη =

∫φdη for every φ. By Proposition A.3.3, this implies that

f∗η = η.

10.4.1 Proof of the upper bound

In this section we prove that, given any invariant probability measure ν,

hν(f) +

∫φdν ≤ P (f, φ). (10.4.1)

For that, let P = P1, . . . , Ps be any finite partition. We are going to showthat if α is an open cover of M with sufficiently small diameter, depending onlyon P , then

hν(f,P) +

∫φdν ≤ log 4 + P (f, φ, α). (10.4.2)

Making diamα→ 0, it follows that hν(f,P)+∫φdν ≤ log 4+P (f, φ) for every

finite partition P . Hence, hν(f) +∫φdν ≤ log 4 + P (f, φ). Now replace the


transformation f by fk and the potential φ by φk. Note that∫φk dν = k

∫φdν,

since ν is invariant under f . Using also Propositions 9.1.14 and 10.3.5, we getthat

khν(f,P) + k

∫φdν ≤ log 4 + kP (f, φ)

for every k ≥ 1. Dividing by k and passing to the limit when k → ∞, we getthe inequality (10.4.1).

For proving (10.4.2) we need the following elementary fact:

Lemma 10.4.4. Let a1, . . . , ak be real numbers and p1, . . . , pk be non-negativenumbers such that p1 + · · · + pk = 1. Let A =

∑ki=1 e

ai . Then,

k∑

i=1

pi(ai − log pi) ≤ logA.

Moreover, the identity holds if and only if pj = eaj/A for every j.

Proof. Write ti = eai/A and xi = pi/eai . Note that

∑ki=1 ti = 1. By the

concavity property (9.1.8) of the function φ(x) = −x log x,

k∑

i=1

tiφ(xi) ≤ φ(

k∑

i=1

tixi).

Note that tiφ(xi) = (pi/A)(ai − log pi) and∑k

i=1 tixi = 1/A. So, the previousinequality may be rewritten as follows:

k∑

i=1

piA

(ai − log pi) ≤1

AlogA.

Multiplying by A we get the inequality in the statement of the lemma. Moreover,the identity holds if and only if the xi are all equal, that is, if and only if thereexists c such that pi = ceai for every i. Summing over i = 1, . . . , k we see thatin that case c = 1/A, as stated.

Since the measure ν is regular (Proposition A.3.2), given ε > 0 we may findcompact sets Qi ⊂ Pi such that ν(Pi \ Qi) < ε for every i = 1, . . . , s. Let Q0

be the complement of ∪si=1Qi and let P0 = ∅. Then, Q = Q0, Q1, . . . , Qs is afinite partition of M such that ν(Pi∆Qi) < sε for every i = 0, 1, . . . , s. Hence,by Lemma 9.1.6,

Hν(P/Q) ≤ log 2

as long as ε > 0 is sufficiently small (depending only on s). Let ε and Q befixed from now on and assume that the open cover α satisfies

diamα < mind(Qi, Qj) : 1 ≤ i < j ≤ s. (10.4.3)


By Lemma 9.1.11, we have that hν(f,P) ≤ hν(f,Q) +Hν(P/Q) ≤ hν(f,Q) +log 2. Hence, to prove (10.4.2) it suffices to show that

hν(f,Q) +

∫φdν ≤ log 2 + P (f, φ, α). (10.4.4)

To that end, observe that

Hν(Qn) +

∫φn dν ≤

∑

Q∈Qn

ν(Q)(− log ν(Q) + sup

x∈Qφn(x)

)

for every n ≥ 1. Then, by Lemma 10.4.4,

Hν(Qn) +

∫φn dν ≤ log

( ∑

Q∈Qn

supx∈Q

eφn(x)). (10.4.5)

Let γ be any finite subcover of αn. For each Q ∈ Qn, consider any pointxQ in the closure of Q such that φn(xQ) = supx∈Q φn(x). Pick UQ ∈ γ suchthat xQ ∈ UQ. Then, supx∈Q φn(x) ≤ supy∈UQ

φn(y) for every Q ∈ Qn. Thecondition (10.4.3) implies that each element of α intersects the closure of notmore than two elements of Q. Therefore, each element of αn intersects theclosure of not more than 2n elements of Qn. In particular, for each U ∈ γ thereexist not more than 2n sets Q ∈ Qn such that UQ = U . Therefore,

∑

Q∈Qn

supx∈Q

eφn(x) ≤ 2n∑

U∈γ

supy∈U

eφn(y), (10.4.6)

for any finite subcover γ of αn. Combining (10.4.5) and (10.4.6),

Hν(Qn) +

∫φn dν ≤ n log 2 + logPn(f, φ, α).

Dividing by n and passing to the limit when n → ∞, we get (10.4.4). Thiscompletes the proof of the upper bound (10.4.1).

10.4.2 Approximating the pressure

To finish the proof of Theorem 10.4.1, we now show that for every ε > 0 thereexists a probability measure µ invariant under f and such that

hµ(f) +

∫φdµ ≥ S(f, φ, ε) (10.4.7)

Clearly, this implies that the supremum of the values of hν(f) +∫φdν when ν

varies in M1(f) is larger or equal than S(f, φ) = P (f, φ).For each n ≥ 1, let E be an (n, ε)-separated set such that

∑

y∈E

eφn(y) ≥ 1

2Sn(f, φ, ε). (10.4.8)


Denote by A the expression on the left-hand side of this inequality. Considerthe probability measures νn and µn defined on M by:

νn =1

A

∑

x∈E

eφn(x)δx and µn =

n−1∑

j=0

f j∗νn.

By the definition (10.3.8), recalling also that the space of probability measuresis compact (Theorem 2.1.5), we may choose a subsequence (nj)j → ∞ such that

1.1

njlogSnj (f, φ, ε) converges to S(f, φ, ε) and

2. µnj converges, in the weak∗ topology, to some probability measure µ.

We are going to check that such a measure µ is invariant under f and satisfies(10.4.7). For the reader’s convenience, we split the argument into four steps.

Step 1: First, we prove that µ is invariant. Let ϕ : M → R be any continuousfunction. For each n ≥ 1,

∫ϕd(f∗µn) =

1

n

n∑

j=1

∫ϕ f j dνn =

∫ϕdµn +

1

n

( ∫ϕ fn dνn −

∫ϕdνn

)

and, consequently,

∣∣∫ϕd(f∗µn) −

∫ϕdµn

∣∣ ≤ 2

nsup |ϕ|.

Restricting to n = nj and passing to the limit when j → ∞, we see that∫ϕdf∗µ =

∫ϕdµ for every continuous function ϕ : M → R. This proves (recall

Proposition A.3.3) that f∗µ = µ, as claimed.

Step 2: Next, we estimate the entropy with respect to νn. Let P be any finitepartition of M such that diamP < ε and µ(∂P) = 0, where ∂P denotes theunion of the boundaries ∂P of all sets P ∈ P . The first condition implies thateach element of Pn contains at most one element of E. On the other hand, itis clear that every element of E is contained in some element of Pn. Hence,

Hνn(Pn) =∑

x∈E

−νn(x) log νn(x) =∑

x∈E

− 1

Aeφn(x) log

(1

Aeφn(x)

)

= logA− 1

A

∑

x∈E

eφn(x)φn(x) = logA−∫φn dνn

(10.4.9)(the last equality follows directly from the definition of νn).


Step 3: Now we calculate the entropy with respect to µn. Consider 1 ≤ k < n.For each r ∈ 0, . . . , k − 1, let qr ≥ 0 be the largest integer number such thatr + kqr ≤ n. In other words, qr = [(n− r)/k]. Then,

Pn = Pr ∨[ qr−1∨

j=0

f−(kj+r)(Pk)]∨ f−(kqr+r)(Pn−(kqr+r))

(the first term is void if r = 0 and the third one is void if n = kqr+r). Therefore,

Hνn(Pn) ≤qr−1∑

j=0

Hνn(f−(kj+r)(Pk)) +Hνn(Pr) +Hνn(f−(kqr+r)(Pn−(kqr+r))).

Clearly, #Pr ≤ (#P)k. Using Lemma 9.1.3, we find that Hνn(Pr) ≤ k log #P .For the same reason, the last term in the previous inequality is also boundedby k log #P . Then, using also the property (9.1.12),

Hνn(Pn) ≤qr−1∑

j=0

Hf(kj+r)∗ νn

(Pk) + 2k log #P (10.4.10)

for every r ∈ 0, . . . , k−1. Now, it is clear that every number i ∈ 0, . . . , n−1may be written in a unique way as i = kj+r with 0 ≤ j ≤ qr−1. Then, summing(10.4.10) over all the values of r,

kHνn(Pn) ≤n−1∑

i=0

Hfi∗νn

(Pk) + 2k2 log #P . (10.4.11)

The concavity property (9.1.8) of the function φ(x) = −x log x implies that

1

n

n−1∑

i=0

Hfi∗νn

(Pk) ≤ Hµn(Pk).

Combining this inequality with (10.4.11), we see that

1

nHνn(Pn) ≤

1

kHµn(Pk) +

2k

nlog #P .

On the other hand, by the definition of µn,

1

n

∫φn dνn =

1

n

n−1∑

j=0

∫φ f j dνn =

∫φdµn.

Thus, the previous inequality yields

1

nHνn(Pn) +

1

n

∫φn dνn ≤ 1

kHµn(Pk) +

∫φdµn +

2k

nlog #P . (10.4.12)


Step 4: Finally, we translate the previous estimates to the limit measure µ.From (10.4.9) and (10.4.12), we get

1

kHµn(Pk) +

∫φdµn ≥ 1

nlogA− 2k

nlog #P .

By the choice of E in (10.4.8), it follows that

1

kHµn(Pk) +

∫φdµn ≥ 1

nlogSn(f, φ, ε) − 1

nlog 2 − 2k

nlog #P . (10.4.13)

The choice of the partition P with µ(∂P) = 0 implies that µ(∂Pk) = 0 for everyk ≥ 1, since

∂Pk ⊂ ∂P ∪ f−1(∂P) ∪ · · · ∪ f−k+1(∂P).

In other words, every element of Pk is a continuity set for the measure µ.According to Exercise 2.1.1, it follows that µ(P ) = limj µnj (P ) for every P ∈ Pkand, hence,

Hµ(Pk) = limjHµnj

(Pk) for every k ≥ 1.

Since the function φ is continuous, we also have that∫φdµ = limj

∫φdµnj .

Therefore, restricting (10.4.13) to the subsequence (nj)j and passing to thelimit when j → ∞,

1

kHµ(Pk) +

∫φdµ ≥ S(f, φ, ε).

Passing to the limit when k → ∞, we find that

hµ(f,P) +

∫φdµ ≥ S(f, φ, ε).

Then, making ε→ 0 (and, consequently, diamP → 0), we get (10.4.7).This completes the proof of the variational principle (Theorem 10.4.1).

10.4.3 Exercises

10.4.1. Let f : M → M be a continuous transformation in a compact metricspace M . Check that P (f, ϕ) ≤ h(f) + supϕ for every continuous functionϕ : M → R.

10.4.2. Show that if f : M → M is a continuous transformation in a compactmetric space and X ⊂ M is a forward invariant set, meaning that f(X) ⊂ X ,then P (f | X,ϕ | X) ≤ P (f, ϕ).

10.4.3. Give an alternative proof of Proposition 10.3.8, using the variationalprinciple.

10.4.4. Exhibit a continuous transformation f : M → M in a non-compactmetric space M such that f has no invariant probability measure and yet thetopological entropy h(f) is positive. [Observation: Thus, the variational princi-ple need not hold when the ambient space is not compact.]


10.4.5. Given numbers α, β > 0 such that α+ β < 1, define

g : [0, α] ∪ [1 − β, 1] → [0, 1] g(x) =

x/α if x ∈ [0, α](x− 1)/β + 1 if x ∈ [1 − β, 1].

LetK ⊂ [0, 1] be the Cantor set formed by the points x such that gn(x) is definedfor every n ≥ 0 and f : K → K be the restriction of g. Calculate the functionψ : R → R defined by ψ(t) = P (f,−t log g′). Check that ψ is convex anddecreasing and admits a (unique) zero in (0, 1). Show that hµ(f) <

∫log g′ dµ

for every probability measure µ invariant under f .

10.4.6. Let f : M → M be a continuous transformation in a compact metricspace, such that the set of ergodic invariant probability measures is finite. Showthat for every potential ϕ : M → R there exists some invariant probabilitymeasure that realizes the supremum in (10.0.1).

10.5 Equilibrium states

Let f : M →M be a continuous transformation and φ : M →M be a potentialin a compact metric space. In this section we study the fundamental proper-ties of the set E(f, φ) formed by the equilibrium states, that is, the invariantprobability measures µ such that

hµ(f) +

∫φdµ = P (φ, f) = suphν(f) +

∫φdν : ν ∈ M1(f).

In the special case φ ≡ 0 the elements of E(f, φ) are also called measures ofmaximal entropy. Let us start with a few simple examples.

Example 10.5.1. If f : M → M has zero topological entropy then everyinvariant probability measure µ is a measure of maximal entropy: hµ(f) = 0 =h(f). For any potential φ : M → R,

P (f, φ) = sup∫φdν : ν ∈ M1(f).

Hence, ν is an equilibrium state if and only if ν maximizes the integral of φ.Since the function ν 7→

∫φdν is continuous and M1(f) is compact, with respect

to the weak∗ topology, maxima do exist for every potential φ.

Example 10.5.2. Let fA : M → M be the linear endomorphism induced in Td

by some matrix A with integer coefficients and non-zero determinant. Let µ bethe Haar measure on Td. By Propositions 9.4.3 and 10.2.10,

hµ(fA) =

d∑

i=1

log+ |λi| = h(f),

where λ1, . . . , λd are the eigenvalues of A. In particular, the Haar measure is ameasure of maximal entropy for f .

10.5. EQUILIBRIUM STATES 353

Example 10.5.3. Let σ : Σ → Σ be the shift map in Σ = 1, . . . , dN and µbe the Bernoulli measure associated with a probability vector p = (p1, . . . , pd).As observed in Example 9.1.10,

hµ(σ,P) = limn

1

nHµ(Pn) =

d∑

i=1

−pi log pi.

We leave it to the reader (Exercise 10.5.1) to check that this function attainsits maximum precisely when the coefficients pi are all equal to 1/d. Moreover,in that case hµ(σ) = log d. Recall also (Example 10.1.2) that h(σ) = log d.Therefore, the Bernoulli measure associated with the vector p = (1/d, . . . , 1/d)is the only measure of maximal entropy among the Bernoulli measures. In fact,it follows from the theory that we develop in Chapter 12 that µ is the uniquemeasure of maximal entropy among all invariant measures.

Let us start with the following extension of the variational principle:

Proposition 10.5.4. For every potential φ : M → R,

P (f, φ) = suphµ(f) +

∫φdµ : µ invariant and ergodic for f.

Proof. Consider the function Ψ : M1(f) → R given by Ψ(µ) = hµ(f) +∫φdµ.

For each invariant probability measure µ, let µP : P ∈ P be the correspondingergodic decomposition. It follows from Theorem 9.6.2 that

Ψ(µ) =

∫Ψ(µP ) dµ(P ). (10.5.1)

This implies that the supremum of Ψ over all the invariant probability measuresis less or equal than the supremum of Ψ over the ergodic invariant probabil-ity measures. Since the opposite inequality is obvious, it follows that the twosuprema coincide. By the variational principle (Theorem 10.4.1), the supremumof Ψ over all the invariant probability measures is equal to P (f, φ).

Proposition 10.5.5. Assume that h(f) <∞. Then the set of equilibrium statesfor any potential φ : M → R is a convex subset of M1(f): more precisely, givent ∈ (0, 1) and µ1, µ2 ∈ M1(f),

(1 − t)µ1 + tµ2 ∈ E(f, φ) ⇔ µ1, µ2 ⊂ E(f, φ).

Moreover, an invariant probability measure µ is in E(f, φ) if and only if almostevery ergodic component of µ is in E(f, φ).

Proof. As we have seen in (10.3.6), the hypothesis that the topological entropyis finite ensures that P (f, φ) < ∞ for every potential φ. Let us consider thefunctional Ψ(µ) = hµ(f)+

∫φdµ introduced in the proof of the previous result.

By Proposition 9.6.1, this functional is convex:

Ψ((1 − t)µ1 + tµ2) = (1 − t)Ψ(µ1) + tΨ(µ2)


for every t ∈ (0, 1) and any µ1, µ2 ∈ M1(f). Then, Ψ((1 − t)µ1 + tµ2) is equalto the supremum of Ψ if and only if both Ψ(µ1) and Ψ(µ2) are. This provesthe first part of the proposition. The proof of the second part is analogous: therelation (10.5.1) implies that Ψ(µ) = sup Ψ if and only if Ψ(µP ) = supΨ forµ-almost every P .

Corollary 10.5.6. If E(f, φ) is non-empty then it contains ergodic invariantprobability measures. Moreover, the extremal elements of the convex set E(f, φ)are precisely the ergodic measures contained in it.

Proof. To get the first claim it suffices to consider the ergodic components of anyelement of E(f, φ). Let us move on to proving the second claim. If µ ∈ E(f, φ)is ergodic then (Proposition 4.3.2) µ is an extremal element of M1(f) and so itmust be an extremal element of E(f, φ). Conversely, if µ ∈ E(f, φ) is not ergodicthen we may write

µ = (1 − t)µ1 + tµ2, with 0 < t < 1 and µ1, µ2 ∈ M1(f).

By Proposition 10.5.5 we have that µ1, µ2 ∈ E(f, φ), which implies that µ is notan extremal element of the set E(f, φ).

In general, the set of equilibrium states may be empty. The first example ofthis kind was given by Gurevic. The following construction is taken from thebook of Walters [Wal82]:

Example 10.5.7. Let fn : Mn → Mn be a sequence of homeomorphismsin compact metric spaces such that the sequence (h(fn))n is increasing andbounded. We are going to build a metric space M and a homeomorphismf : M →M with the following features:

• M is the union of all Mn with an additional point, denoted as ∞, endowedwith a distance function relative to which (Mn)n converges to ∞.

• f fixes the point ∞ and its restriction to each Mn coincides with fn.

Then we are going to check that f : M → M has no measure of maximalentropy. Let us explain how this is done.

Denote by dn the distance in each metric space Mn. It is no restriction toassume that dn ≤ 1 for every n. Define M = ∪nMn ∪ ∞ and consider thedistance d defined in M by:

d(x, y) =

n−2dn(x, y) if x ∈ Xn and y ∈ Xn with n ≥ 1∑mi=n i

−2 if x ∈ Xn and y ∈ Xm with n < m∑∞i=n i

−2 if x ∈ Xn and y = ∞.

We leave it to the reader to check that d is indeed a distance in M and that(M,d) is a compact space. Let β = supn h(fn). Since the sets ∞ and Mn,n ≥ 1 are invariant and cover the whole M , any ergodic probability measureµ of f satisfies either µ(∞) = 1 or µ(Mn) = 1 for some n ≥ 1. In the first


case, hµ(f) = 0. In the second one, µ may be viewed as a probability measureinvariant under fn and, consequently, hµ(f) ≤ h(fn). In particular, hµ(f) < βfor every ergodic probability measure µ of f . The previous observation alsoshows that

suphµ(f) : µ invariant and ergodic for f= sup

nsuphµ(f) : µ invariant and ergodic for fn.

According to Proposition 10.5.4, this means that h(f) = supn h(fn) = β. Thus,no ergodic invariant measure of f realizes the topological entropy. By Proposi-tion 10.5.5, it follows that f has no measure of maximal entropy.

Nevertheless, there is a broad class of transformations for which equilibriumstates do exist for every potential:

Lemma 10.5.8. If the entropy function ν 7→ hν(f) is upper semi-continuousthen E(f, φ) is compact, relative to the weak∗ topology, and non-empty, for anypotential φ : M → R.

Proof. Let (µn)n be a sequence in M1(f) such that

hµn(f) +

∫φdµn converges to P (f, φ).

Since M1(f) is compact (Theorem 2.1.5), there exists some accumulation pointµ. The assumption implies that ν 7→ hν(f) +

∫φdν is upper semi-continuous.

Consequently,

hµ(f) +

∫φdµ ≥ lim inf

nhµn(f) +

∫φdµn = P (f, φ)

and so µ is an equilibrium state, as stated. Analogously, taking any sequence(νn)n in E(f, φ) we see that every accumulation point ν is an equilibrium state.This shows that E(f, φ) is closed and, thus, compact.

Corollary 10.5.9. Assume that f : M →M is an expansive continuous trans-formation in a compact metric space M . Then, every potential φ : M → Radmits some equilibrium state.

Proof. Just combine Corollary 9.2.17 with Lemma 10.5.8.

The conclusions of Corollaries 9.2.17 and 10.5.9 remain valid when f isjust h-expansive, in the sense of Exercise 10.2.8. See Bowen [Bow72]. Misi-urewicz [Mis73] noted that the same is still true when f is just asymptoticallyh-expansive, meaning that g∗(f, ε) → 0 when ε → 0. Buzzi [Buz97] provedthat every C∞ diffeomorphism is asymptotically h-expansive. The correspond-ing statement for h-expansivity is false: Burguet, Liao, Yang [BLY] found opensets, in the C2 topology, formed by diffeomorphisms that are not h-expansive;C∞ diffeomorphisms are dense in such sets.


Combining these results of Misiurewicz and Buzzi, one gets the following the-orem of Newhouse [New90]: for every C∞ diffeomorphism f , the entropy func-tion ν 7→ hν(f) is upper semi-continuous and so equilibrium states always exist.Yomdin [Yom87] also proved that the topological entropy function f 7→ h(f) isupper semi-continuous in the realm of C∞ diffeomorphisms. Both conclusions,Newhouse’s and Yomdin’s, are usually false for Cr diffeomorphisms with r <∞,according to Misiurewicz [Mis73]. But Liao, Viana, Yang [LVY13] proved thatthey both extend to C1 diffeomorphisms away from homoclinic tangencies andany such diffeomorphism is h-expansive. In particular, equilibrium states alwaysexist in that generality.

Uniqueness of equilibrium states is a much more delicate problem. It isvery easy to exhibit transformations with infinitely many ergodic equilibriumstates. For example, let f : S1 → S1 be a circle homeomorphism with infinitelymany fixed points. The Dirac measures on those points are ergodic invariantprobability measures. Since the topological entropy h(f) is equal to zero (Ex-ample 10.5.1), each of those measures is an equilibrium state for any potentialthat attains its maximum at the corresponding point.

This type of example is trivial, of course, because the transformation is nottransitive. A more interesting question is whether an indivisibility property,such as transitivity or topological mixing, ensures uniqueness of the equilibriumstate. It turns out that this is also not true. The first counter-example (calledDyck shift) was exhibited by Krieger [Kri75]. Next, we present a particularlytransparent and flexible construction, due to Haydn [Hay]. Other interestingexamples were studied by Hofbauer [Hof77].

Example 10.5.10. Let X = ∗, 1, 2, 3, 4 and consider the subsets E = 2, 4(the even symbols) and O = 1, 3 (the odd symbols). We are going to exhibita compact H ⊂ XZ invariant under the shift map in XZ and such that therestriction σ : H → H is topologically mixing and yet admits two mutuallysingular invariant measures, µv and µa, such that

hµv (σ) = hµa(σ) = log 2 = h(σ).

Let us describe this example. By definition, H = EZ ∪ OZ ∪ H∗, where H∗

consists of the sequences x ∈ XZ that satisfy the following rule: Whenever oneblock with m symbols of one type, even or odd, is followed by another blockwith n symbols of the other type, odd or even, the two of them are separatedby a block of no less than m + n symbols ∗. In other words, the followingconfigurations are admissible in sequences x ∈ H∗:

x =(. . . , ∗, e1, . . . , em, ∗, . . . , ∗︸︷︷︸k

, o1, . . . , on, ∗ . . . ) or

x =(. . . , ∗, o1, . . . , om, ∗, . . . , ∗︸︷︷︸k

, e1, . . . , en, ∗ . . . )

with ei ∈ E, oj ∈ O and k ≥ m+n. Observe that a sequence x ∈ H∗ may startand/or end with an infinite block of ∗ but it can neither start nor end with an


infinite block of either even or odd type. It is clear that H is invariant underthe shift map. Haydn [Hay] proved that (see Exercise 10.5.6):

(a) the shift map σ : H → H is topologically mixing;

(b) h(σ) = log 2.

We know that EZ and OZ support Bernoulli measures µv and µa with entropyequal to log 2. Then, µv and µa are measures of maximal entropy for σ : H → Hand they are mutually singular.

Clearly, this construction may be modified to yield transformations with anygiven number of ergodic measures of maximal entropy. Haydn [Hay] has alsoshown how to adapt it to construct examples with multiple equilibrium statesfor other potentials as well.

In Chapters 11 and 12 we study a class of transformations, called expand-ing, for which every Holder potential admits exactly one equilibrium state. Inparticular, these transformations are intrinsically ergodic, that is, they have aunique measure of maximal entropy.

10.5.1 Exercises

10.5.1. Show that, among the Bernoulli measures of the shift map σ : Σ → Σin the space Σ = 1, . . . , dZ, the one with the largest entropy is given by theprobability vector (1/d, . . . , 1/d).

10.5.2. Let σ : Σ → Σ be the shift map in Σ = 1, . . . , dZ and φ : Σ → Rbe a locally constant potential, that is, such that φ is constant on each cylinder[0; i]). Calculate P (f, φ) and show that there exists some equilibrium state thatis a Bernoulli measure.

10.5.3. Let σ : Σ → Σ be the shift map in Σ = 1, . . . , dN. An invariantprobability measure µ is called a Gibbs state for a potential ϕ : Σ → R if thereexist P ∈ R and K > 0 such that

K−1 ≤ µ(C)

exp(ϕn(x) − nP )≤ K (10.5.2)

for every cylinder C = [0; i0, . . . , in−1] and any x ∈ C. Prove that if µ is a Gibbsstate then hµ(σ) +

∫ϕdµ coincides with the constant P in (10.5.2). Therefore,

µ is an equilibrium state if and only if P = P (σ, ϕ). Prove that for each choiceof the constant P there exists at most one ergodic Gibbs state.

10.5.4. Let f : M → M be a continuous transformation in a compact metricspace and φ : M → R be a continuous function. If µ is an equilibrium statefor φ, then the functional Fµ : C0(M) → R defined by Fµ(ψ) =

∫ψ dµ is such

that Fµ(ψ) ≤ P (f, φ+ ψ)− P (f, φ) for every ψ ∈ C0(M). Conclude that if thepressure function P (f, ·) : C0(M) → R is differentiable in every direction at apoint φ then φ admits at most one equilibrium state.


10.5.5. Let f : M → M be a continuous transformation in a compact metricspace. Show that the subset of functions φ : M → R for which there exists aunique equilibrium state is residual in C0(M).

10.5.6. Check the claims (a) and (b) in Example 10.5.10.

Chapter 11

Expanding Maps

The distinctive feature of the transformations f : M →M that we study in thelast two chapters of this book is that they expand the distance between nearbypoints: there exists a constant σ > 1 such that

d(f(x), f(y)) ≥ σd(x, y)

whenever the distance between x and y is small (a precise definition will begiven shortly). There is more than one reason why this class of transformationshas an important role in Ergodic Theory.

On the one hand, as we are going to see, expanding maps exhibit very richdynamical behavior, from the metric and topological point of view as well asfrom the ergodic point of view. Thus, they provide a natural and interestingcontext for utilizing many of the ideas and methods that were introduced alongthe text.

On the other hand, expanding maps lead to paradigms that are useful forunderstanding many other systems, technically more complex. A good illustra-tion of this is the ergodic theory of uniformly hyperbolic systems, for which anexcellent presentation can be found in the book of Bowen [Bow75a].

An important special case of expanding maps are the differentiable transfor-mations on manifolds such that

‖Df(x)v‖ ≥ σ‖v‖

for every x ∈M and every vector v tangent to M at the point x. We focus onthis case in Section 11.1. The main result (Theorem 11.1.2) is that, under thehypothesis that the Jacobian detDf is Holder, the transformation f admits aunique invariant probability measure absolutely continuous with respect to theLebesgue measure. Moreover, that probability measure is ergodic and positiveon the open subsets of M .

In Section 11.2 we extend the notion of expanding map to metric spacesand we give a global description of the topological dynamics of such maps,starting from the study of their periodic points. The main objective is to show

359

360 CHAPTER 11. EXPANDING MAPS

that the global dynamics may always be reduced to the topologically exact case(Theorem 11.2.15). In Section 11.3 we complement this analysis by showing thatfor these transformations the topological entropy coincides with the growth rateof the number of periodic points.

The study of expanding maps will proceed in Chapter 12, where we willdevelop the so-called thermodynamical formalism for such systems.

11.1 Expanding maps on manifolds

Let M be a compact manifold and f : M → M be a map of class C1. We saythat f is expanding if there exists σ > 1 and some Riemannian metric on Msuch that

‖Df(x)v‖ ≥ σ‖v‖ for every x ∈M and every v ∈ TxM . (11.1.1)

In particular, f is a local diffeomorphism: the condition (11.1.1) implies thatDf(x) is an isomorphism for every x ∈ M . In what follows, we call Lebesguemeasure on M the volume measure m induced by such Riemannian metric. Theprecise choice of the metric is not very important, since the volume measuresinduced by different Riemannian metrics are all equivalent.

Example 11.1.1. Let fA : Td → Td be the linear endomorphism of the torusinduced by some matrix A with integer coefficients and determinant differentfrom zero. Assume that all the eigenvalues λ1, . . . , λd of A are larger than 1 inabsolute value. Then, given any 1 < σ < infi |λi|, there exists an inner productin Rd relative to which ‖Av‖ ≥ σ‖v‖ for every v. Indeed, suppose that theeigenvalues are real. Consider any basis of Rd that sets A in canonical Jordanform: A = D + εN where N is nilpotent and D is diagonal with respect tothat basis. The inner product relative to which such basis is orthonormal hasthe required property, as long as ε > 0 is small enough. The reader shouldhave no difficulty extending this argument to the case when there are complexeigenvalues. This shows that the transformation fA is expanding.

It is clear from the definition that any map sufficiently close to an expandingone, relative to the C1 topology, is still expanding. Thus, the observation inExample 11.1.1 provides a whole open set of examples of expanding maps. Aclassical result of Michael Shub [Shu69] asserts a (much deeper) kind of converse:every expanding map on the torus Td is topologically conjugate to an expandinglinear endomorphism fA.

Given a probability measure µ invariant under a transformation f : M →M ,we call basin of µ the set B(µ) of all points x ∈M such that

limn→∞

1

n

n−1∑

j=0

ϕ(f j(x)) =

∫ϕdµ

for every continuous function ϕ : M → R. Note that the basin is always aninvariant set. If µ is ergodic then B(µ) is a full measure set (Exercise 4.1.5).

11.1. EXPANDING MAPS ON MANIFOLDS 361

Theorem 11.1.2. Let f : M → M be an expanding map on a compact (con-nected) manifold M and assume that the Jacobian x 7→ detDf(x) is Holder.Then f admits a unique invariant probability measure µ absolutely continuouswith respect to Lebesgue measure m. Moreover, µ is ergodic, its support coin-cides with M and its basin has full Lebesgue measure in M .

First, let us outline the strategy of the proof of Theorem 11.1.2. The detailswill be given in the forthcoming sections. The conclusion is generally false ifone omits the hypothesis of Holder continuity: see Quas [Qua99].

It is easy to check (Exercise 11.1.1) that the pre-image under f of any setwith zero Lebesgue measurem also has zero Lebesgue measure. This means thatthe image f∗ν under f of any measure ν absolutely continuous with respect tom is also absolutely continuous with respect to m. In particular, the n-th imagefn∗m is always absolutely continuous with respect to m.

In Proposition 11.1.7 we prove that the density (that is, the Radon-Nikodymderivative) of each fn∗m with respect to m is bounded by some constant inde-pendent of n ≥ 1. We deduce from this fact that every accumulation point ofthe sequence

1

n

n−1∑

j=0

f j∗m,

with respect to the weak∗ topology, is an invariant probability measure abso-lutely continuous with respect to Lebesgue measure, with density bounded bythat same constant.

An additional argument, using the fact that M is connected, proves that theaccumulation point is unique and has all the properties in the statement of thetheorem.

11.1.1 Distortion lemma

Starting the proof of Theorem 11.1.2, let us prove the following elementary fact:

Lemma 11.1.3. Let f : M → M be a local diffeomorphism of class Cr, r ≥ 1on a compact Riemannian manifold M and σ > 0 be such that ‖Df(x)v‖ ≥ σ‖v‖for every x ∈ M and every v ∈ TxM . Then there exists ρ > 0 such that, forany pre-image x of any point y ∈ M , there exists a map h : B(y, ρ) → M ofclass Cr such that f h = id and h(y) = x and

d(h(y1), h(y2)) ≤ σ−1 d(y1, y2) for every y1, y2 ∈ B(y, ρ). (11.1.2)

Proof. By the inverse function theorem, for every ξ ∈M there exist open neigh-borhoods U(ξ) ⊂ M of ξ and V (ξ) ⊂ N of f(ξ) such that f maps U(ξ) diffeo-morphically onto V (ξ). Since M is compact, it follows that there exists δ > 0such that d(ξ, ξ′) ≥ δ whenever f(ξ) = f(ξ′). In particular, we may choose theseneighborhoods in such a way that U(ξ) ∩U(ξ′) = ∅ whenever f(ξ) = f(ξ′). Foreach η ∈M , let

W (η) =⋂

ξ∈f−1(η)

V (ξ).


Since f−1(η) is finite (Exercise A.4.6), every W (η) is an open set. Fix ρ > 0such that 2ρ is a Lebesgue number for the open cover W (η) : η ∈ M of M .In particular, for every y ∈M there exists η ∈M such that B(y, ρ) is containedin W (η), that is, it is contained in V (ξ) for all ξ ∈ f−1(η). Since the U(ξ) arepairwise disjoints and #f−1(y) = degree(f) = #f−1(η), given any x ∈ f−1(y)there exists exactly one ξ ∈ f−1(η) such that x ∈ U(ξ). Let h be the restrictionto B(y, ρ) of the inverse of f : U(ξ) → V (ξ). By construction, f h = idand h(y) = x. Moreover, ‖Dh(z)‖ = ‖Df(h(z))−1‖ ≤ σ−1 for every z in thedomain of h. By the mean value theorem, this implies that h has the property(11.1.2).

Transformations h as in this statement are called inverse branches of thelocal diffeomorphism f . Now assume that f is an expanding map. The condition(11.1.1) means that in this case we may take σ > 1 in the hypothesis of thelemma. Then, the conclusion (11.1.2) implies that the inverse branches arecontractions, with uniform contraction rate.

In particular, we may define inverse branches hn of any iterate fn, n ≥ 1,as follows. Given y ∈ M and x ∈ f−n(y), let h1, . . . , hn be inverse branches off with

hj(fn−j+1(x)) = fn−j(x)

for every 1 ≤ j ≤ n. Since every hj is a contraction, its image is contained ina ball around fn−j(x) with radius smaller than ρ. Then, hn = hn · · · h1

is well defined on the closure of the ball of radius ρ around y. It is clear thatfn hn = id and hn(y) = x. Moreover, each hn is a contraction:

d(hn(y1), hn(y2)) ≤ σ−n d(y1, y2) for every y1, y2 ∈ B(y, ρ).

Lemma 11.1.4. If f : M →M is a C1 expanding map on a compact manifoldthen f is expansive.

Proof. By Lemma 11.1.3, there exists ρ > 0 such that, for any pre-image x ofa point y ∈ M , there exists a map h : B(y, ρ) → M of class C1 such thatf h = id and h(y) = x and

d(h(y1), h(y2)) ≤ σ−1 d(y1, y2) for every y1, y2 ∈ B(y, ρ).

Hence, if d(fn(x), fn(y)) ≤ ρ for every n ≥ 0 then

d(x, y) ≤ σ−n d(fn(x), fn(y)) ≤ σ−nρ,

which immediately implies that x = y.

The next result provides a good control of the distortion of the iterates off and their inverse branches, which is crucial for the proof of Theorem 11.1.2.This is the only step of the proof where we use the hypothesis that the Jacobianx 7→ detDf(x) is Holder. Note that, since f is a local diffeomorphism and M is


compact, the Jacobian is bounded from zero and infinity. Hence, the logarithmlog | detDf | is also Holder: there exist C0 > 0 and ν > 0 such that

∣∣ log | detDf(x)| − log | detDf(y)|∣∣ ≤ C0d(x, y)

ν for any x, y ∈M .

Proposition 11.1.5 (Distortion lemma). There exists C1 > 0 such that, givenany n ≥ 1, any y ∈M and any inverse branch hn : B(y, ρ) →M of fn,

log| detDhn(y1)|| detDhn(y2)|

≤ C1d(y1, y2)ν ≤ C1(2ρ)

ν

for every y1, y2 ∈ B(y, ρ).

Proof. Write hn as a composition hn = hn · · · h1 of inverse branches of f .Analogously, hi = hi · · · h1 for 1 ≤ i < n and h0 = id . Then,


=

n∑

i=1

log | detDhi(hi−1(y1))| − log | detDhi(h

i−1(y2))| .

Note that log | detDhi| = − log | detDf | hi and recall that every hj is a σ−1-contraction. Hence,


≤n∑

i=1

C0 d(hi(y1), h

i(y2))ν ≤

n∑

i=1

C0 σ−iνd(y1, y2)

ν .

Therefore, to prove the lemma it suffices to take C1 = C0

∑∞i=1 σ

−iν .

The geometric meaning of this proposition is made even more transparentby the following corollary:

Corollary 11.1.6. There exists C2 > 0 such that, for every y ∈ M and anymeasurable sets B1, B2 ⊂ B(y, ρ),

1

C2

m(B1)

m(B2)≤ m(hn(B1))

m(hn(B2))≤ C2

m(B1)

m(B2).

Proof. Take C2 = exp(2C1(2ρ)ν). It follows from the Proposition 11.1.5 that

m(hn(B1)) =

∫

B1

| detDhn| dm ≤ exp(C1(2ρ)ν)| detDhn(y)|m(B1) and

m(hn(B2)) =

∫

B2

| detDhn| dm ≥ exp(−C1(2ρ)ν)| detDhn(y)|m(B2)

Dividing the two inequalities, we get that

m(hn(B1))

m(hn(B2))≤ C2

m(B1)

m(B2).

Inverting the roles of B1 and B2 we get the other inequality.


The next result, which is also a consequence of the distortion lemma, as-serts that the iterates fn∗m of the Lebesgue measure have uniformly boundeddensities:

Proposition 11.1.7. There exists C3 > 0 such that (fn∗m)(B) ≤ C3m(B) forevery measurable set B ⊂M and every n ≥ 1.

Proof. It is no restriction to suppose that B is contained in a ballB0 = B(z, ρ) ofradius ρ around some point in the pre-image of z ∈M . Using Corollary 11.1.6,we see that

m(hn(B))

m(hn(B0))≤ C2

m(B)

m(B0)

for every inverse branch hn of fn at the point z. Moreover, (fn∗m)(B) =m(f−n(B)) is the sum of m(hn(B)) over all the inverse branches, and anal-ogously for B0. In this way, we find that

(fn∗m)(B)

(fn∗m)(B0)≤ C2

m(B)

m(B0).

It is clear that (fn∗m)(B0) ≤ (fn∗m)(M) = 1. Moreover, the Lebesgue measureof the balls with a fixed radius ρ is bounded from zero for some constant α0 > 0that depends only on ρ. So, to get the conclusion of the proposition it sufficesto take C3 = C2α0.

We also need the auxiliary result that follows. Recall that, given a functionϕ and a measure ν, we denote by ϕν the measure defined by (ϕν)(B) =

∫Bϕdν.

Lemma 11.1.8. Let ν be a probability measure on a compact metric space Xand ϕ : X → [0,+∞) be an integrable function with respect to ν. Let µi, i ≥ 1,be a sequence of probability measures on X converging, in the weak∗ topology,to a probability measure µ. If µi ≤ ϕν for every i ≥ 1 then µ ≤ ϕν.

Proof. Let B be any measurable set. For each ε > 0, let Kε be a compact subsetof B such that µ(B \Kε) and (ϕν)(B \Kε) are both less than ε (such a compactset does exist, by Proposition A.3.2). Fix r > 0 small enough that the measureof Aε \Kε is also less than ε, for both µ and ϕν, where Aε = z : d(z,Kε) < r.The set of values of r for which the boundary of Aε has positive µ-measure is atmost countable (Exercise A.3.2). Hence, up to changing r slightly if necessary,we may suppose that the boundary of Aε has measure zero. Then, µ = limi µiimplies that µ(Aε) = limi µi(Aε) ≤ (ϕν)(Aε). Making ε → 0, we conclude thatµ(B) ≤ (ϕν)(B).

Applying this lemma to our present situation, we obtain

Corollary 11.1.9. Every accumulation point µ of the sequence n−1∑n−1

j=0 fj∗m

is an invariant probability measure for f absolutely continuous with respect tothe Lebesgue measure.


Proof. Take ϕ constant equal to C3 and let ν = m. Choose a subsequence(ni)i such that µi = n−1

i

∑ni−1j=0 f j∗m converges to some probability measure µ.

Proposition 11.1.7 ensures that µi ≤ ϕν. Then, by Lemma 11.1.8, we also haveµ ≤ ϕν = C3m. This implies that µ≪ m with density bounded by C3.

11.1.2 Existence of ergodic measures

Next, we show that the measure µ we have just constructed is the unique in-variant probability measure absolutely continuous with respect to the Lebesguemeasure and, moreover, it is ergodic for f .

Start by fixing a finite partition P0 = U1, . . . , Us of M into regions withnon-empty interior and diameter less than ρ. Then, for each n ≥ 1, definePn to be the partition of M into the images of the Ui, 1 ≤ i ≤ s, under theinverse branches of fn. The diameter of each Pn, that is, the supremum of thediameters of its elements, is less than ρσ−n.

Lemma 11.1.10. Let Pn, n ≥ 1, be a sequence of partitions in a compact metricspace M , with diameters converging to zero. Let ν be a probability measure onM and B be any measurable set with ν(B) > 0. Then, there exist Vn ∈ Pn,n ≥ 1, such that

ν(Vn) > 0 andν(B ∩ Vn)

ν(Vn)→ 1 when n→ ∞.

Proof. Given any 0 < ε < ν(B), letKε ⊂ B be a compact set with ν(B\Kε) < ε.Let Kε,n be the union of all the elements of Pn that intersect Kε. Since thediameters of the partitions converge to zero, ν(Kε,n \ Kε) < ε for every nsufficiently large. By contradiction, suppose that

ν(Kε ∩ Vn) ≤ν(B) − ε

ν(B) + εν(Vn)

for every Vn ∈ Pn that intersects Kε. It would follow that

ν(Kε) ≤∑

Vn

ν(Kε ∩ Vn) ≤∑

Vn

ν(B) − ε

ν(B) + εν(Vn) =

ν(B) − ε

ν(B) + εν(Kε,n)

≤ ν(B) − ε

ν(B) + ε(ν(Kε) + ε) ≤ ν(B) − ε < ν(Kε).

This contradiction shows that there must exist some Vn ∈ Pn such that

ν(Vn) ≥ ν(B ∩ Vn) ≥ ν(Kε ∩ Vn) >ν(B) − ε

ν(B) + εν(Vn)

and, consequently, ν(Vn) > 0. Making ε→ 0 we get the claim.

In the statements that follow, we say that a measurable set A ⊂ M isinvariant under f : M → M if f−1(A) = A up to zero Lebesgue measure.According to Exercise 11.1.1, then we also have that f(A) = A up to zeroLebesgue measure.


Lemma 11.1.11. If A ⊂ M is invariant under f and has positive Lebesguemeasure then A has full Lebesgue measure inside some Ui ∈ P0, that is, thereexists 1 ≤ i ≤ s such that m(Ui \A) = 0.

Proof. By Lemma 11.1.10, we may choose Vn ∈ Pn so that m(Vn \A)/m(Vn)converges to zero when n → ∞. Let Ui(n) = fn(Vn). By Proposition 11.1.5applied to the inverse branch of fn that maps Ui(n) to Vn, we get that

m(Ui(n) \A)

m(Ui(n))≤ m(fn(Vn \A))

m(fn(Vn))≤ exp

(C1(2ρ)

ν)m(Vn \A)

m(Vn)

also converges to zero. Since P0 is finite, there must be 1 ≤ i ≤ s such thati(n) = i for infinitely many values of n. Then, m(Ui \A) = 0.

Corollary 11.1.12. The transformation f : M → M admits some ergodicinvariant probability measure absolutely continuous with respect to the Lebesguemeasure.

Proof. It follows from the previous lemma there exist at most s = #P0 pairwisedisjoint invariant sets with positive Lebesgue measure. Therefore, M may bepartitioned into a finite number of minimal invariant sets A1, . . . , Ar, r ≤ s withpositive Lebesgue measure, where by minimal we mean that there are no invari-ant sets Bi ⊂ Ai with 0 < m(Bi) < m(Ai). Given any absolutely continuousinvariant probability measure µ, there exists some i such that µ(Ai) > 0. Thenormalized restriction

µi(B) =µ(B ∩Ai)µ(Ai)

of µ to any such Ai is invariant and absolutely continuous. Moreover, theassumption that Ai is minimal implies that µi is ergodic.

11.1.3 Uniqueness and conclusion of the proof

The previous argument also shows that there exists only a finite number ofabsolutely continuous ergodic probability measures. The last step is to showthat, in fact, such a probability measure is unique. For that we use the factthat f is topologically exact:

Lemma 11.1.13. Given any non-empty open set U ⊂ M , there exists N ≥ 1such that fN (U) = M .

Proof. Let x ∈ U and r > 0 be such that the ball of radius r around x iscontained in U . Given any n ≥ 1, suppose that fn(U) does not cover the wholemanifold. Then, there exists some curve γ connecting fn(x) to a point y ∈M \ fn(U), and that curve may be taken with length smaller than diamM + 1.Lifting 1 γ by the local diffeomorphism fn, we obtain a curve γn connecting xto some point yn ∈ M \ U . Then, r ≤ length(γn) ≤ σ−n(diamM + 1). Thisprovides an upper bound on the possible value of n. Hence, fn(U) = M forevery n sufficiently large, as claimed.

1Note that any local diffeomorphism from a compact manifold to itself is a covering map.


Corollary 11.1.14. If A ⊂M is an invariant set with positive Lebesgue mea-sure then A has full Lebesgue measure in the whole manifold M .

Proof. Let U be the interior of a set Ui as in Lemma 11.1.11 and N ≥ 1 besuch that fN(U) = M . Then m(U \A) = 0 and, using the fact that f is a localdiffeomorphism, it follows that M \A = fN (U) \ fN (A) ⊂ fN(U \A) also hasLebesgue measure zero.

The next statement completes the proof of Theorem 11.1.2:

Corollary 11.1.15. Let µ be any absolutely continuous invariant probabilitymeasure. Then, µ is ergodic and its basin B(µ) has full Lebesgue measure inM . Consequently, µ is unique. Moreover, its support is the whole manifold M .

Proof. If A is an invariant set then, by Corollary 11.1.14, either A or its com-plement Ac has Lebesgue measure zero. Since µ is absolutely continuous, itfollows that either µ(A) = 0 or µ(Ac) = 0. This proves that µ is ergodic. Thenµ(B(µ)) = 1 and, in particular, m(B(µ)) > 0. Since B(µ) is an invariant set,we get that it has full Lebesgue measure. Analogously, since the support of µ isa compact invariant set with positive Lebesgue measure, it must coincide withM .

Finally, let µ and ν be any two absolutely continuous invariant probabilitymeasures. It follows from what we have just said that the two measures areergodic and their basins intersect each other. Given any point x in B(µ)∩B(ν),the sequence

1

n

n−1∑

j=0

δfj(x)

converges to both µ and ν in the weak∗ topology. Thus, µ = ν.

In general, we say that an invariant probability measure µ of a local diffeo-morphism f : M → M is a physical measure if its basin has positive Lebesguemeasure. It follows from Corollary 11.1.15 that in the present context thereexists a unique physical measure, which is the absolutely continuous invariantmeasure µ, and its basin has full Lebesgue measure. This last fact may beexpressed as follows:

1

n

n−1∑

j=0

δfj(x) → µ for Lebesgue almost every x.

In Chapter 12 we will find this absolutely continuous invariant probabilitymeasure µ through a different approach (Proposition 12.1.20) that also showsthat the density h = dµ/dm is Holder and bounded away from zero. In particu-lar, µ is equivalent to the Lebesgue measure m, not just absolutely continuous.Moreover (Section 12.1.7), the system (f, µ) is exact, not just ergodic. In addi-tion (Lemma 12.1.12), its Jacobian is given by Jµf = | detDf |(hf)/h. Hence,


by the Rokhlin formula (Theorem 9.7.3),

hµ(f) =

∫log Jµf dµ =

∫log | detDf | dµ+

∫log(h f) dµ−

∫log h dµ.

Since µ is invariant, this means that

hµ(f) =

∫log | detDf | dµ.

Actually, the facts stated in the previous paragraph can already be provenwith the methods available at this point. We invite the reader to do just that(Exercises 11.1.3 through 11.1.6), in the context of expanding maps of the in-terval, which are technically a bit simpler than expanding maps on a generalmanifold.

Example 11.1.16. We say that a transformation f : [0, 1] → [0, 1] is an ex-panding map of the interval if there exists a countable (possibly finite) familyP of pairwise disjoint open subintervals whose union has full Lebesgue measurein [0, 1] and which satisfy:

(a) The restriction of f to each P ∈ P is a diffeomorphism onto (0, 1); denoteby f−1

P : (0, 1) → P its inverse.

(b) There exist C > 0 and θ > 0 such that, for every x, y and every P ∈ P ,

∣∣ log |D(f−1P )(x)| − log |D(f−1

P )(y)|∣∣ ≤ C|x− y|θ.

(c) There exist c > 0 and σ > 1 such that, for every n and every x,

|Dfn(x)| ≥ cσn (whenever the derivative is defined.)

This class of transformations includes the decimal expansion and the Gauss mapas special cases. Its properties are analyzed in Exercises 11.1.3 through 11.1.5.

Exercise 11.1.6 deals with a slightly more general class of transformations,where we replace condition (a) by

(a’) There exists δ > 0 such that the restriction of f to each P ∈ P is a diffeo-morphism onto some interval f(P ) of length larger than δ that containsevery element of P that it intersects.

11.1.4 Exercises

11.1.1. Let f : M → M be a local diffeomorphism in a compact manifold andm be the Lebesgue measure on M . Check the following facts:

1. If m(B) = 0 then m(f−1(B)) = 0.

2. If B is measurable then f(B) is measurable.

11.2. DYNAMICS OF EXPANDING MAPS 369

3. If m(B) = 0 then m(f(B)) = 0.

4. If A = B up to zero Lebesgue measure zero then f(A) = f(B) andf−1(A) = f−1(B) up to zero Lebesgue measure.

5. If A is an invariant set then f(A) = A up to zero Lebesgue measure.

11.1.2. Let f : M → M be a transformation of class C1 such that there existσ > 1 and k ≥ 1 satisfying ‖Dfk(x)v‖ ≥ σ‖v‖ for every x ∈ M and everyv ∈ TxM . Show that there exists θ > 1 and a Riemannian norm 〈·〉 equivalentto ‖ · ‖ such that 〈Df(x)v〉 ≥ θ〈v〉 for every x ∈M and every v ∈ TxM .

11.1.3. Show that if f : [0, 1] → [0, 1] is an expanding map of the interval andmis the Lebesgue measure on [0, 1] then there exists a function ρ : (0, 1) → (0,∞)such that log ρ is bounded and Holder and µ = ρm is a probability measureinvariant under f .

11.1.4. Show that the measure µ in Exercise 11.1.3 is exact and is the uniqueinvariant probability measure of f absolutely continuous with respect to theLebesgue measure m.

11.1.5. Show that the measure µ in Exercise 11.1.3 satisfies the Rokhlin for-mula: assuming that log |f ′| ∈ L1(µ), we have that hµ(f) =

∫log |f ′| dµ.

11.1.6. Prove the following generalization of Exercises 11.1.3 and 11.1.4: if fsatisfies the conditions (a’), (b) and (c) in Example 11.1.16 then there exists a fi-nite (non-empty) family of absolutely continuous invariant probability measuresergodic for f and such that every absolutely continuous invariant probabilitymeasure is a convex combination of those ergodic probability measures.

11.2 Dynamics of expanding maps

In this section we extend the notion of expanding map to compact metric spacesand we mention a few interesting examples. In this general set-up, an ex-panding map need not be transitive, let alone topologically exact (compareLemma 11.1.13). However, Theorems 11.2.14 and 11.2.15 assert that the dy-namics may always be reduced to the topologically exact case. This is relevantbecause for the main results in the sections we need the transformation to betopologically exact or, equivalently (Exercise 11.2.2), topologically mixing.

A continuous transformation f : M → M in a compact metric space M isan expanding map if there exist constants σ > 1 and ρ > 0 such that for everyp ∈ M the image of the ball B(p, ρ) contains a neighborhood of the closure ofB(f(p), ρ) and

d(f(x), f(y)) ≥ σd(x, y) for every x, y ∈ B(p, ρ). (11.2.1)

Every expanding map on a manifold, in the sense of Section 11.1, is also ex-panding in the present sense:


Example 11.2.1. Let M be a compact Riemannian manifold and f : M → Mbe a map of class C1 such that ‖Df(x)v‖ ≥ σ‖v‖ for every x ∈ M and everyv ∈ TxM , where σ is a constant larger than 1. Denote K = sup ‖Df‖ (observethat K > 1). Fix ρ > 0 small enough that the restriction of f to every ballB(p, 2Kρ) is a diffeomorphism onto its image. Consider any y ∈ B(f(p), σρ)and let γ : [0, 1] → B(f(p), σρ) be a minimizing geodesic (that is, such thatit realizes the distance between points) with γ(0) = f(p) and γ(1) = y. Bythe choice of ρ, there exists a differentiable curve β : [0, δ] → B(p, ρ) such thatβ(0) = p and f(β(t)) = γ(t) for every t. Observe that (using ℓ(·) to denote thelength of a curve),

d(p, β(t)) ≤ ℓ(β | [0, t]

)≤ σ−1ℓ

(γ | [0, t]

)= σ−1td(f(p), y) < tρ

for every t. This shows that we may take δ = 1. Then, β(1) ∈ B(p, ρ) andf(β(1)) = γ(1) = y. In this way, we have shown that f(B(p, ρ)) containsB(f(p), σρ), which is a neighborhood of the closure of B(f(p), ρ). Now con-sider any x, y ∈ B(p, ρ). Note that d(f(x), f(y)) < 2Kρ. Let γ : [0, 1] →B(f(x), 2Kρ) be a minimizing geodesic connecting f(x) to f(y). Arguing as inthe previous paragraph, we find a differentiable curve β : [0, 1] → B(x, 2Kρ)connecting x to y and such that f(β(t)) = γ(t) for every t. Then,

d(f(x), f(y)) = ℓ(γ) ≥ σℓ(β) ≥ σd(x, y).

This completes the proof that f is an expanding map.

The following fact is useful for constructing further examples:

Lemma 11.2.2. Assume that f : M →M is an expanding map and Λ ⊂M isa compact set such that f−1(Λ) = Λ. Then, the restriction f : Λ → Λ is alsoan expanding map.

Proof. It is clear that the condition (11.2.1) remains valid for the restriction. Weare left to check that f(Λ ∩B(p, ρ)) contains a neighborhood of Λ ∩B(f(p), ρ)inside Λ. By assumption, f(B(p, ρ)) contains some neighborhood V of theclosure of B(f(p), ρ). Then, Λ∩V is a neighborhood of Λ∩B(f(p), ρ). Moreover,given any y ∈ Λ ∩ V there exists x ∈ B(p, ρ) such that f(x) = y. Sincef−1(Λ) = Λ, this point is necessarily in Λ. This proves that Λ ∩ V is containedin the image f(Λ ∩ B(p, ρ)). Hence, the restriction of f to the set Λ is anexpanding map, as stated.

It is not possible to replace the hypothesis of Lemma 11.2.2 by f(Λ) = Λ.See Exercise 11.2.4.

Example 11.2.3. Let J ⊂ [0, 1] be a finite union of (two or more) pairwisedisjoint compact intervals. Consider a map f : J → [0, 1] such that the restric-tion of f to each connected component of J is a diffeomorphism onto [0, 1]. SeeFigure 11.1. Assume that there exists σ > 1 such that

|f ′(x)| ≥ σ for every x ∈ J . (11.2.2)


Denote Λ = ∩∞n=0f

−n(J). In other words, Λ is the set of all points x whoseiterates fn(x) are defined for every n ≥ 0. It follows from the definition thatΛ is compact (one can show that K is a Cantor set) and f−1(Λ) = Λ. Therestriction f : Λ → Λ is an expanding map. Indeed, fix any ρ > 0 smaller thanthe distance between any two connected components of J . Then, every ball ofradius ρ inside Λ is contained in a unique connected component of J and so, by(11.2.2), it is dilated by a factor larger or equal than σ.

0

1

1J1 J2 J3

Figure 11.1: Expanding map on a Cantor set

Example 11.2.4. Let σ : ΣA → ΣA be the one-sided shift of finite typeassociated with a transition matrix A (these notions were introduced in Sec-tion 10.2.2). Consider in ΣA the distance defined by

d((xn)n, (yn)n

)= 2−N , N = infn ∈ N : xn 6= yn. (11.2.3)

Then, σA is an expanding map. Indeed, fix ρ ∈ (1/2, 1) and σ = 2. The openball of radius ρ around any point (pn)n ∈ ΣA is just the cylinder [0; p0]A thatcontains the point. The definition (11.2.3) yields

d((xn+1)n, (yn+1)n

)= 2d

((xn)n, (yn)n

)

for any (xn)n and (yn)n in the cylinder [0; p0]A. Moreover, σA([0; p0]A) is theunion of all cylinders [0; q] such that Ap0,q = 1. In particular, it contains thecylinder [0; p1]A. Since the cylinders are both open and closed in ΣA, this showsthat the image of the ball of radius ρ around (pn)n contains a neighborhood ofthe closure of the ball of radius ρ around (pn+1)n. This completes the proofthat every shift of finite type is an expanding map.

Example 11.2.5. Let f : S1 → S1 be a local diffeomorphism of class C2 withdegree larger than 1. Assume that all the periodic points of f are hyperbolic,that is, |(fn)′(x)| 6= 1 for every x ∈ Fix(fn) and every n ≥ 1. Let Λ be thecomplement of the union of the basins of attraction of all the attracting periodicpoints of f . Then, the restriction f : Λ → Λ is an expanding map: this is aconsequence of a deep theorem of Ricardo Mane [Man85].


For expanding maps f : M → M in a metric space M , the number ofpre-images of a point y ∈ M may vary with y (unless M is connected; seeExercises 11.2.1 and A.4.6). For example, for a shift of finite type σ : ΣA → ΣA(Example 11.2.4) the number of pre-images of a point y = (yn)n ∈ ΣA is equalto the number of symbols i such that Ai,y0 = 1; in general, this number dependson y0.

On the other hand, it is easy to see that the number of pre-images is alwaysfinite, and even bounded: it suffices to consider a finite cover of M by balls ofradius ρ and to notice that every point has at most one pre-image in each ofthose balls. By a slight abuse of language, we call degree of an expanding mapf : M →M the maximum number of pre-images of any point, that is,

degree(f) = max#f−1(y) : y ∈M. (11.2.4)

11.2.1 Contracting inverse branches

Let f : M → M be an expanding map. By definition, the restriction of f toeach ball B(p, ρ) of radius ρ is injective and its image contains the closure ofB(f(p), ρ). Thus, the restriction to B(p, ρ) ∩ f−1(B(f(p), ρ)) is a homeomor-phism onto B(f(p), ρ). We denote by

hp : B(f(p), ρ) → B(p, ρ)

its inverse and call it inverse branch of f at p. It is clear that hp(f(p)) = p andf hp = id . The condition (11.2.1) implies that hp is a σ−1-contraction:

d(hp(z), hp(w)) ≤ σ−1d(z, w) for every z, w ∈ B(f(p), ρ). (11.2.5)

Lemma 11.2.6. If f : M →M is expanding then, for every y ∈M ,

f−1(B(y, ρ)) =⋃

x∈f−1(y)

hx(B(y, ρ)).

Proof. The relation f hx = id implies that hx(B(y, ρ)) is contained in thepre-image of B(y, ρ) for every x ∈ f−1(y). To prove the other inclusion, let zbe any point such that f(z) ∈ B(y, ρ). By the definition of expanding map,f(B(z, ρ)) contains B(f(z), ρ) and, hence, contains y. Let hz : B(f(z), ρ) → Mbe the inverse branch of f at z and let x = hz(y). Both z and hx(f(z)) are inB(x, ρ) ∩ f−1(z). Since f is injective on the balls of radius ρ, it follows thatz = hx(f(z)). This completes the proof.

More generally, for any n ≥ 1, we call inverse branch of fn at p the compo-sition

hnp = hfn−1(p) · · · hp : B(fn(p), ρ) → B(p, ρ)

of the inverse branches of f at the iterates of p. Observe that hnp (fn(p)) = p

and fn hnp = id . Moreover, f j hnp = hn−jfj(p) for each 0 ≤ j < n. Hence,

d(f j hnp (z), f j hnp (w)) ≤ σj−nd(z, w) (11.2.6)

for every z, w ∈ B(fn(p), ρ) and every 0 ≤ j ≤ n.


p f(p) f2(p)

hp hf(p)

Figure 11.2: Inverse branches of fn

Lemma 11.2.7. If f : M →M is an expanding map then fn(B(p, n+ 1, ε)) =B(fn(p), ε) for every p ∈M , n ≥ 0 and ε ∈ (0, ρ].

Proof. The inclusion fn(B(p, n + 1, ε)) ⊂ B(fn(p), ε) is an immediate conse-quence of the definition of dynamical ball. To prove the converse, consider theinverse branch hnp : B(fn(p), ρ) → B(p, ρ). Given any y ∈ B(fn(p), ε), letx = hnp (y). Then, fn(x) = y and, by (11.2.6),

d(f j(x), f j(p)) ≤ σj−nd(fn(x), fn(p)) ≤ d(y, fn(p)) < ε

for every 0 ≤ j ≤ n. This shows that x ∈ B(p, n+ 1, ε).

Corollary 11.2.8. Every expanding map is expansive.

Proof. Assume that d(fn(z), fn(w)) < ρ for every n ≥ 0. This implies thatz = hnw(fn(z)) for every n ≥ 0. Then, the property (11.2.6) gives that

d(z, w) ≤ σ−nd(fn(z), fn(w)) < ρσ−n.

Making n→ ∞, we get that z = w. So, ρ is a constant of expansivity for f .

11.2.2 Shadowing and periodic points

Given δ > 0, we call δ-pseudo-orbit of a transformation f : M → M anysequence (xn)n≥0 such that

d(f(xn), xn+1) < δ for every n ≥ 0.

We say that the δ-pseudo-orbit is periodic if there is κ ≥ 1 such that xn = xn+κ

for every n ≥ 0. It is clear that every orbit is a δ-pseudo-orbit, for every δ > 0.For expanding maps we have a kind of converse: every pseudo-orbit is uniformlyclose to (we say that it is shadowed by) some orbit of the transformation:

Proposition 11.2.9 (Shadowing lemma). Assume that f : M → M is anexpanding map. Then, given any ε > 0 there exists δ > 0 such that for everyδ-pseudo-orbit (xn)n there exists x ∈ M such that d(fn(x), xn) < ε for everyn ≥ 0.

If ε is small enough, so that 2ε is a constant of expansivity for f , then thepoint x is unique. If, in addition, the pseudo-orbit is periodic then x is a periodicpoint.


Proof. It is no restriction to suppose that ε is less than ρ. Fix δ > 0 so thatσ−1ε + δ < ε. For each n ≥ 0, let hn : B(f(xn), ρ) → B(xn, ρ) be the inversebranch of f at xn. The property (11.2.5) ensures that

hn(B(f(xn), ε)

)⊂ B(xn, σ

−1ε) for every n ≥ 1. (11.2.7)

Since d(xn, f(xn−1)) < δ, it follows that

hn(B(f(xn), ε)

)⊂ B(f(xn−1), ε) for every n ≥ 1. (11.2.8)

Then, we may consider the composition hn = h0 · · ·hn−1 and (11.2.8) impliesthat the sequence of compact sets Kn = hn

(B(f(xn), ε)

)is nested. Take x in

the intersection. For every n ≥ 0, we have that x ∈ Kn+1 and so fn(x) belongsto

fn hn+1(B(f(xn), ε)

)= hn

(B(f(xn), ε)

).

By (11.2.7), this implies that d(fn(x), xn) ≤ σ−1ε < ε for every n ≥ 0.The other claims in the proposition are simple consequences, as we now

explain. If x′ is another point as in the conclusion of the proposition then

d(fn(x), fn(x′)) ≤ d(fn(x), xn) + d(fn(x′), xn) < 2ε for every n ≥ 0.

Since 2ε is an expansivity constant, it follows that x = x′. Moreover, if thepseudo-orbit is periodic, with period κ ≥ 1, then

d(fn(fκ(x)), xn) = d(fn+κ(x), xn+κ) < ε for every n ≥ 0.

By uniqueness, it follows fκ(x) = x.

It is worthwhile pointing out that δ depends linearly on ε: the proof ofProposition 11.2.9 shows that we may take δ = cε, where c > 0 depends onlyon σ.

We call pre-orbit of a point x ∈M any sequence (x−n)n≥0 such that x0 = xand f(x−n) = x−n+1 for every n ≥ 1. If x is a periodic point, with period l ≥ 1,then it admits a distinguished periodic pre-orbit (x−n)n, such that x−kl = x forevery integer k ≥ 1.

Lemma 11.2.10. If d(x, y) < ρ then, given any pre-orbit (x−n)n of x, thereexists a pre-orbit (y−n)n of y asymptotic to (x−n)n, in the sense that d(x−n, y−n)converges to 0 when n→ ∞.

Proof. For each n ≥ 1, let hn : B(x, ρ) → M be the inverse branch of fn withhn(x) = x−n. Define y−n = hn(y). It is clear that d(x−n, y−n) ≤ σ−nd(x, y).This implies the claim.

Theorem 11.2.11. Let f : M →M be an expanding map in a compact metricspace and Λ ⊂ M be the closure of the set of all periodic points of f . Then,f(Λ) = Λ and the restriction f : Λ → Λ is an expanding map.


Proof. On the one hand, it is clear that f(Λ) is contained in Λ: if a point xis accumulated by periodic points pn then f(x) is accumulated by the imagesf(pn), which are also periodic points. On the other hand, since f(Λ) is acompact set that contains all the periodic points, it must contain Λ. This showsthat f(Λ) = Λ.

Next, we prove that the restriction f : Λ → Λ is an expanding map. It isclear that the property (11.2.1) remains valid for the restriction. So, we onlyhave to show that there exists r ≤ ρ (we are going to take r = σ−1ρ) such that,for every x ∈ Λ, the image f(Λ∩B(x, r)) contains a neighborhood of Λ∩B(x, r)inside Λ. The main ingredient is the following lemma:

Lemma 11.2.12. Let p be a periodic point and hp : B(f(p), ρ) → B(p, ρ) be theinverse branch of f at p. If y ∈ B(f(p), ρ) is a periodic point then hp(y) ∈ Λ.

Proof. Write x = hp(y) and q = f(p). Consider any ε > 0 such that 2ε isa constant of expansivity for f . Take δ > 0 given by the shadowing lemma(Proposition 11.2.9). By Lemma 11.2.10, there exists a pre-orbit (x−n)n of xasymptotic to the periodic pre-orbit (p−n)n of p. In particular,

d(x−k+1, q) = d(x−k+1, p−k+1) < δ (11.2.9)

for every large multiple k of the period of p. Analogously, there exists a pre-orbit(q−n)n of q asymptotic to the periodic pre-orbit (y−n)n of y. Fix any multiplel of the period of y such that

d(q−l, f(x)) = d(q−l, y) < δ. (11.2.10)

p q

x

y

x−k+1

q−l

q−1

Figure 11.3: Constructing periodic orbits


Now consider the periodic sequence (zn)n, with period k + l, given by

z0 = x, z1 = q−l, . . . , zl = q−1, zl+1 = x−k+1, . . . , zl+k−1 = x−1, zk+l = x.

See Figure 11.3. We claim that (zn)n is a δ-pseudo-orbit. Indeed, if n is amultiple of k + l then, by (11.2.10),

d(f(zn), zn+1) = d(f(x), q−1) = d(y, q−1) < δ.

If n− l is a multiple of k + l then, by (11.2.9),

d(f(zn), zn+1) = d(f(q−1), x−k+1) = d(q, x−k+1) < δ.

In all the other cases, f(zn) = zn+1. This proves our claim. Now we may useProposition 11.2.9 to conclude that there exists a periodic point z such thatd(fn(z), zn) < ε for every n ≥ 0. In particular, d(z, x) < ε. Since ε > 0 isarbitrary, this shows that x is in the closure of the set of periodic points, asstated.

Corollary 11.2.13. Let z ∈ Λ and hz : B(f(z), ρ) → B(z, ρ) be the inversebranch of f at z. If w ∈ Λ ∩B(f(z), ρ) then hz(w) ∈ Λ.

Proof. Since z ∈ Λ, we may find some periodic point p close enough to z thatw ∈ B(f(p), ρ) and hp(w) = hz(w). Since w ∈ Λ, we may find periodic pointsyn ∈ B(f(p), ρ) converging to w. By Lemma 11.2.12, we have that hp(yn) ∈ Λfor every n. Passing to the limit, we conclude that hp(w) ∈ Λ.

We are ready to conclude the proof of Theorem 11.2.11. Take r = σ−1ρ.The property (11.2.6) implies that hz(B(f(z), ρ)) is contained in B(z, r), forevery z ∈ Λ. Then, Corollary 11.2.13 implies that f(Λ ∩ B(z, r)) containsΛ∩B(f(z), ρ), which is a neighborhood of Λ∩B(f(z), r) in Λ. Thus the argumentis complete.

Theorem 11.2.14. Let f : M →M be an expanding map in a compact metricspace and Λ ⊂M be the closure of the set of periodic points of f . Then,

M =

∞⋃

k=0

f−k(Λ).

Proof. Given any x ∈ M , let ω(x) denote its ω-limit set, that is, the set ofaccumulation points of the iterates fn(x) when n → ∞. First, we show thatω(x) ⊂ Λ. Then, we deduce that fk(x) ∈ Λ for some k ≥ 0.

Let ε > 0 be such that 2ε is a constant of expansivity for f . Take δ > 0given by the shadowing lemma (Proposition 11.2.9) and let α ∈ (0, δ) be suchthat d(f(z), f(w)) < δ whenever d(z, w) < α. Let y be any point in ω(x). Thedefinition of the ω-limit set implies that there exist r ≥ 0 and s ≥ 1 such that

d(f r(x), y) < α and d(f r+s(x), y) < α.


Consider the periodic sequence (zn), with period s, given by

z0 = y, z1 = f r+1(x), . . . , zs−1 = f r+s−1(x), zs = y.

Observe that d(f(z0), z1) = d(f(y), f r+1(x)) < δ (because d(y, f r(x)) < α) andd(f(zs−1), zs) = d(f r+s(x), y) < α < δ and f(zn) = zn+1 in all the other cases.In particular, (zn)n is a δ-pseudo-orbit. Then, by Proposition 11.2.9, thereexists some periodic point z such that d(y, z) < ε. Making ε → 0, we concludethat y is accumulated by periodic points, that is, y ∈ Λ.

Let ε > 0 and δ > 0 be as before. It is no restriction to suppose thatδ < ε. Take β ∈ (0, δ/2) such that d(f(z), f(w)) < δ/2 whenever d(z, w) < β.Since ω(x) is contained in Λ, there exist k ≥ 1 and points wn ∈ Λ such thatd(fn+k(x), wn) < β for every n ≥ 0. Observe that

d(f(wn), wn+1) ≤ d(f(wn), fn+k+1(x)) + d(fn+k+1(x), wn+1) < δ/2 + β < δ

for every n ≥ 0. Therefore, (wn)n is a δ-pseudo-orbit in Λ. Since the re-striction of f to Λ is an expanding map (Theorem 11.2.11), it follows fromProposition 11.2.9 applied to the restriction that there exists w ∈ Λ such thatd(fn(w), wn) < ε for every n ≥ 0. Then,

d(fn(fk(x)), fn(w)) ≤ d(fn+k(x), wn) + d(wn, fn(w)) < β + ε < 2ε

for every n ≥ 0. Then, by expansivity, fk(x) = w.

11.2.3 Dynamical decomposition

Theorem 11.2.14 shows that for expanding maps the interesting dynamics islocalized in the closure Λ of the set of periodic points. In particular, suppµ ⊂ Λfor every invariant probability measure f . Moreover (Theorem 11.2.11), therestriction of f to Λ is still an expanding map. Thus, up to replacing M by Λ,it is no restriction to suppose that the set of periodic points is dense in M .

Theorem 11.2.15 (Dynamical decomposition). Let f : M →M be an expand-ing map whose set of periodic points is dense in M . Then, there exists a partitionof M into non-empty compact sets Mi,j, with 1 ≤ i ≤ k and 1 ≤ j ≤ m(i), suchthat

(a) Mi = ∪m(i)j=1 Mi,j is invariant under f , for every i;

(b) f(Mi,j) = Mi,j+1 if j < m(i) and f(Mi,m(i)) = Mi,1, for every i, j;

(c) each restriction f : Mi →Mi is a transitive expanding map;

(d) each fm(i) : Mi,j →Mi,j is a topologically exact expanding map.

Moreover, the number k, the numbers m(i) and the sets Mi,j are unique up torenumbering.


Proof. Consider the relation ∼ defined in the set of periodic points of f asfollows. Given two periodic points p and q, let (p−n)n and (q−n)n, respectively,be their periodic pre-orbits. By definition, p ∼ q if and only if there existpre-orbits (p−n)n of p and (q−n)n of q such that

d(p−n, q−n) → 0 and d(p−n, q−n) → 0. (11.2.11)

We claim that ∼ is an equivalence relation. It is clear from the definitionthat the relation ∼ is reflexive and symmetric. To prove that it is also transitive,suppose that p ∼ q and q ∼ r. Then, there exist pre-orbits (q−n)n of q and(r−n)n of r asymptotic to the periodic pre-orbits (p−n)n of p and (q−n)n ofq, respectively. Let k ≥ 1 be a multiple of the periods of p and q such thatd(r−k, q−k) < ρ. Note that q−k = q, since k is a multiple of the period of q.Then, by Lemma 11.2.10, there exists a pre-orbit (r′−n)n of the point r′ = r−ksuch that d(r′−n, q−n) → 0. Then, d(r′−n, p−n) → 0. Consider the pre-orbit(r′′−n)n of r defined by

r′′−n =

r−n if n ≤ kr′−n+k if n > k.

Since k is a multiple of the period of p, we have d(r′′−n, p−n) = d(r′−n+k, p−n+k)for every n > k. Therefore, (r′′−n)n is asymptotic to (p−n)n. Analogously, onecan find a pre-orbit (p′′−n) of p asymptotic to the periodic pre-orbit (r−n)n of r.Therefore, p ∼ r and this proves that the relation ∼ is indeed transitive.

Next, we claim that p ∼ q if and only if f(p) ∼ f(q). Start by supposingthat p ∼ q, and let (p−n)n and (q−n)n be pre-orbits of p and q as in (11.2.11).The periodic pre-orbits of p′ = f(p) and q′ = f(q) are given by, respectively,

p′−n =

f(p) if n = 0p−n+1 if n ≥ 1

and q′−n =

f(q) if n = 0q−n+1 if n ≥ 1.

Consider the pre-orbits of p and q given by, respectively,

p′−n =

f(p) if n = 0p−n+1 if n ≥ 1

and q′−n =

f(q) if n = 0q−n+1 if n ≥ 1.

It is clear that (p′−n)n is asymptotic to (q′−n)n and (q′−n)n is asymptotic to(p′−n)n. Hence, f(p) ∼ f(q). Now suppose that f(p) ∼ f(q). The previousargument shows that fk(p) ∼ fk(q) for every k ≥ 1. When k is a commonmultiple of the periods of p and q this means that p ∼ q. This completes theproof of our claim. Note that the statement means that the image and thepre-image of any equivalence class are both equivalence classes.

Observe also that if d(p, q) < ρ then p ∼ q. Indeed, by Lemma 11.2.10we may find a pre-orbit of q asymptotic to the periodic pre-orbit of p and,analogously, a pre-orbit of p asymptotic to the periodic pre-orbit of q. It followsthat the equivalence classes are open sets and, since M is compact, they arein finite number. Moreover, if A and B are two different equivalence classes,then their closures A and B are disjoint: the distance between them is at least


ρ. Since p ∼ q if and only if f(p) ∼ f(q), it follows that the transformation fpermutes the closures of the equivalence classes.

Thus, we may enumerate the closures of the equivalence classes as Mi,j , with1 ≤ i ≤ k and 1 ≤ j ≤ m(i), in such a way that

f(Mi,j) = Mi,j+1 for j < m(i) and f(Mi,m(i)) = Mi,1. (11.2.12)

The properties (a) and (b) in the statement of the theorem are immediateconsequences.

Let us prove property (c). Since the Mi are pairwise disjoint, it followsfrom (11.2.12) that f−1(Mi) = Mi for every i. Hence, Lemma 11.2.2 impliesthat f : Mi → Mi is an expanding map. By Lemma 4.3.4, to show that thismap is transitive it suffices to show that given any open subsets U and V ofMi there exists n ≥ 1 such that fn(U) intersects V . It is no restriction toassume that U ⊂ Mi,j for some j. Moreover, up to replacing V by some pre-image f−k(V ), we may suppose that V is contained in the same Mi,j . Chooseperiodic points p ∈ U and q ∈ V . By the definition of equivalence classes, thereexists some pre-orbit (q−n)n of q asymptotic to the periodic pre-orbit (p−n)nof p. In particular, we may find n arbitrarily large such that q−n ∈ U . Then,q ∈ fn(U) ∩ V . Therefore, f : Mi →Mi is transitive.

Next, we prove property (d). Since the Mi,j are pairwise disjoint, it followsfrom (11.2.12) that f−m(i)(Mi,j) = Mi,j for every i. Hence (Lemma 11.2.2),g = fm(i) : Mi,j → Mi,j is an expanding map. We also want to prove that gis topologically exact. Let U be a non-empty open subset of Mi,j and p be aperiodic point of f in U . By (11.2.12), the period κ is a multiple of m(i), sayκ = sm(i). Let q be any periodic point of f in Mi,j . By the definition of theequivalence relation ∼, there exists some pre-orbit (q−n)n of q asymptotic tothe periodic pre-orbit (p−n)n of p. In particular, d(q−κn, p) → 0 when n → ∞.Then, hκnq (B(q, ρ)) is contained in U for every n sufficiently large. This impliesthat gsn(U) = fκn(U) contains B(q, ρ) for every n sufficiently large. SinceMi,j is compact, we may find a finite cover by balls of radius ρ around periodicpoints. Applying the previous argument to each of those periodic points, wededuce that gsn(U) contains Mi,j for every n sufficiently large. Therefore, g istopologically exact.

We are left to prove that k, the m(i) and the Mi,j are unique. Let Nr,s, with1 ≤ r ≤ l and 1 ≤ s ≤ n(r), be another partition as in the statement. Initially,let us consider the partitions M = Mi : 1 ≤ i ≤ k and N = Nr : 1 ≤ r ≤ l,where Nr = ∪n(r)

s=1Nr,s. Given any i and r, the sets Mi and Nr are open, closed,invariant and transitive. We claim that either Mi∩Nr = ∅ or Mi = Nr. Indeed,since the intersection is open, if it is non-empty then it intersects any orbit thatis dense in Mi (or in Nr). Since the intersection is also closed and invariant, itfollows that it contains Mi (and Nr). In other words, Mi = Nr. This provesour claim. It follows that the partitions M and N coincide, that is, k = l andMi = Ni up to renumbering. Now, fix i. The transformation f permutes theMi,j and the Ni,s cyclically, with periods m(i) and n(i). Since each of thesefamilies is a partition of Mi, this is possible only if m(i) = n(i). Since fm(i)


is transitive in each Mi,j and each Ni,s, the same argument as in the first partof this paragraph shows that, given any j and s, either Mi,j ∩ Ni,s = ∅ orMi,j = Ni,s. It follows that the families Mi,j and Ni,s coincide, up to cyclicrenumbering.

The following consequence of the theorem contains Lemma 11.1.13:

Corollary 11.2.16. If M is connected and f : M → M is an expanding mapthen the set of periodic points is dense in M and f is topologically exact.

Proof. We claim that Λ is an open subset of f−1(Λ). Indeed, consider δ ∈ (0, ρ)such that d(x, y) < δ implies d(f(x), f(y)) < ρ. Assume that x ∈ f−1(Λ) issuch that d(x,Λ) < δ. Then, there exists z ∈ Λ such that d(x, z) < δ < ρ andso d(f(x), f(z)) < ρ. Applying Corollary 11.2.13 with w = f(x), we get thatx = hz(w) ∈ Λ. Therefore, Λ contains its δ-neighborhood inside f−1(Λ). Thisimplies our claim.

Then, the set S = f−1(Λ) \ Λ is closed in f−1(Λ) and, consequently, it isclosed in M . Then, f−n(S) is closed in M for every n ≥ 0. By Theorem 11.2.14,the space M is a countable pairwise disjoint union of closed sets Λ and f−n(S),n ≥ 0. By the Baire theorem, some of these closed sets has non-empty interior.Since f is an open map, it follows that Λ has non-empty interior.

Now consider the restriction f : Λ → Λ. By Theorem 11.2.11, this is anexpanding map. Let Λi,j : 1 ≤ i ≤ k, 1 ≤ j ≤ m(i) be the partition of thedomain Λ given by Theorem 11.2.15. Then, some Λi,j contains an open subsetV of M . Since fm(i) is topologically exact, fnm(i)(V ) = Λi,j for some n ≥ 1.Using once more the fact that f is an open map, it follows that the compact setMi,j is an open subset of M . By connectivity, it follows that M = Λi,j . Thisimplies that Λ = M and f : M →M is topologically exact.

11.2.4 Exercises

11.2.1. Show that if f : M → M is a local homeomorphism in a compactconnected metric space then the number of pre-images #f−1(y) is the same forevery y ∈M .

11.2.2. Show that if an expanding map is topologically mixing then it is topo-logically exact.

11.2.3. Let f : M → M be a topologically exact transformation in a com-pact metric space. Show that for every r > 0 there exists N ≥ 1 such thatfN(B(x, r)) = M for every x ∈M .

11.2.4. Consider the expanding map f : S1 → S1 given by f(x) = 2x mod Z.Give an example of a compact set Λ ⊂ S1 such that f(Λ) = Λ but the restrictionf : Λ → Λ is not an expanding map.

11.2.5. Let f : M → M be an expanding map and Λ be the closure of the setof periodic points of f . Show that h(f) = h(f | Λ).

11.3. ENTROPY AND PERIODIC POINTS 381

11.2.6. Let f : M → M be an expanding map such that the set of periodicpoints is dense in M and let Mi, Mi,j be the compact subsets given by Theo-rem 11.2.15. Show that h(f) = maxi h(f |Mi) and

h(f |Mi) =1

m(i)h(fm(i) |Mi,j) for any i, j.

11.2.7. Let σA : ΣA → ΣA be a shift of finite type. Interpret the decompositiongiven by Theorem 11.2.15 in terms of the matrix A.

11.3 Entropy and periodic points

In this section we analyze the distribution of periodic points of an expandingmap f : M → M from a quantitative point of view. We show (Section 11.3.1)that the rate of growth of the number of periodic points is equal to the topolog-ical entropy; compare this statement with the discussion in Section 10.2.1. An-other interesting conclusion (Section 11.3.2) is that every invariant probabilitymeasure may be approximated, in the weak∗ topology, by invariant probabilitymeasures supported on periodic orbits. These results are based on the followingproperty:

Proposition 11.3.1. Let f : M → M be a topologically exact expanding map.Then, given any ε > 0 there exists κ ≥ 1 such that, given any x1, . . . , xs ∈ M ,any n1, . . . , ns ≥ 1 and any k1, . . . , ks ≥ κ, there exists a point p ∈M such that,denoting mj =

∑ji=1 ni + ki for j = 1, . . . , s and m0 = 0,

(i) d(fmj−1+i(p), f i(xj)) < ε for 0 ≤ i < nj and 1 ≤ j ≤ s

(ii) and fms(p) = p.

Proof. Given ε > 0, take δ > 0 as in the shadowing lemma (Proposition 11.2.9).Without loss of generality, we may suppose that δ < ε and 2ε is a constant ofexpansivity for f (recall Lemma 11.1.4). Since f is topologically exact, given anyz ∈M there exists κ ≥ 1 such that fk(B(z, δ)) = M for every k ≥ κ. Moreover(see Exercise 11.2.3), since M is compact, we may choose κ depending only onδ. Let xj , nj , kj ≥ κ, j = 1, . . . , s be as in the statement. In particular, foreach j = 1, . . . , s− 1 there exists yj ∈ B(fnj (xj), δ) such that fkj (yj) = xj+1.Analogously, there exists ys ∈ B(fns(xs), δ) such that fks(ys) = x1. Considerthe periodic δ-pseudo-orbit (zn)n≥0 defined by

zn =

fn−mj−1(xj) for 0 ≤ n−mj−1 < nj and j = 1, . . . sfn−mj−1−nj (yj) for 0 ≤ n−mj−1 − nj < kj and j = 1, . . . , szn−ms for n ≥ ms.

By the second part of the shadowing lemma, there exists some periodic pointp ∈ M , with period ms, whose trajectory ε-shadows this periodic pseudo-orbit(zn)n. In particular, the conditions (i) and (ii) in the statement hold.


The property in the conclusion of Proposition 11.3.1 was introduced by RufusBowen [Bow71] and is called specification by periodic points. When condition (i)holds, but the point p is not necessarily periodic, we say that f has the propertyof specification.

11.3.1 Rate growth of periodic points

Let f : M →M be an expanding map. Then, f is expansive (by Lemma 11.1.4)and so it follows from Proposition 10.2.2 that the rate of growth of the numberof periodic points is bounded above by the topological entropy:

lim supn

1

nlog #Fix(fn) ≤ h(f). (11.3.1)

In this section we prove that, in fact, the identity holds in (11.3.1). We startwith the topologically exact case, where one may even replace the limit superiorby a limit:

Proposition 11.3.2. For any topologically exact expanding map f : M →M ,

limn

1

nlog #Fix(fn) = h(f).

Proof. Given ε > 0, fix κ ≥ 1 satisfying the conclusion of Proposition 11.3.1with ε/2 instead of ε. For each n ≥ 1, let E be a maximal (n, ε)-separated set.According to Proposition 11.3.1, for each x ∈ E there exists p(x) ∈ B(x, n, ε/2)with fn+κ(p(x)) = p(x). We claim that the map x 7→ p(x) is injective. In-deed, consider any y ∈ E \ x. Since the set E was chosen to be maximal,B(x, n, ε/2)∩B(y, n, ε/2) = ∅. This implies that p(x) 6= p(y), which proves ourclaim. In particular, it follows that

#Fix(fn+κ) ≥ #E = sn(f, ε,M) for every n ≥ 1

(recall the definition (10.1.9) in Section 10.2.1). Then,

lim infn

1

nlog #Fix(fn+κ) ≥ lim inf

n

1

nlog sn(f, ε,M).

Making ε→ 0 and using the Corollary 10.1.8, we find that

lim infn

1

nlog #Fix(fn+κ) ≥ lim

ε→0lim inf

n

1

nlog sn(f, ε,M) = h(f). (11.3.2)

Together with (11.3.1), this implies the claim in the proposition.

Proposition 11.3.2 is not true, in general, if f is not topologically exact.For example, given an arbitrary expanding map g : M → M , consider f :M × 0, 1 → M × 0, 1 defined by f(x, i) = (g(x), 1 − i). Then f is anexpanding map but all its periodic points have even period. In particular, inthis case,

lim infn

1

nlog #Fix(fn) = 0.


However, the next proposition shows that this type of example is the worst thatcan happen. The proof also makes it clear when and how the limit may fail toexist.

Proposition 11.3.3. For any expanding map f : M →M ,

lim supn

1

nlog #Fix(fn) = h(f).

Proof. By Theorem 11.2.11, the restriction f to the set of periodic points isan expanding map. According to Exercise 11.2.5 this restriction has the sameentropy as f . Obviously, the two transformations have the same periodic points.Therefore, up to replacing f by this restriction, we may suppose that the set ofperiodic points is dense inM . Then, by the theorem of dynamical decomposition(Theorem 11.2.11), one may write M as a finite union of compact sets Mi,j ,with 1 ≤ i ≤ k and 1 ≤ j ≤ m(i), such that each fm(i) : Mi,j → Mi,j is atopologically exact expanding map. According to Exercise 11.2.6, there exists1 ≤ i ≤ k such that

h(f) =1

m(i)h(fm(i) | Mi,1). (11.3.3)

It is clear that

lim supn

1

nlog #Fix

(fn)≥ lim sup

n

1

nm(i)log #Fix

(fnm(i)

)

≥ 1

m(i)lim sup

n

1

nlog #Fix

((fm(i) | Mi,1)

n).

(11.3.4)

Moreover, Proposition 11.3.2 applied to fm(i) : Mi,1 →Mi,1 yields

limn

1

nlog #Fix

((fm(i) | Mi,1)

n)

= h(fm(i) |Mi,1). (11.3.5)

Combining (11.3.3)–(11.3.5), we find that

lim supn

1

nlog #Fix

(fn)≥ h(f), (11.3.6)

as we wanted to prove.

11.3.2 Approximation by atomic measures

Given a periodic point p, with period n ≥ 1, consider the probability measureµp defined by

µp =1

n

(δp + δf(p) + · · · + δfn−1(p)

).

Clearly, µp is invariant and ergodic for f . We are going to show that if f isexpanding then the set of measures of this form is dense in the space M1(f) ofall invariant probability measures:


Theorem 11.3.4. Let f : M → M be an expanding map. Then, every proba-bility measure µ invariant under f can be approximated, in the weak∗ topology,by invariant probability measures supported on periodic orbits.

Proof. Let ε > 0 and Φ = φ1, . . . , φN be a finite family of continuous functionsin M . We want to show that the neighborhood V (µ,Φ, ε) defined in (2.1.1)contains some measure µp supported on a periodic orbit. By the theorem ofBirkhoff, for µ-almost every point x ∈M ,

φi(x) = limn

1

n

n−1∑

t=0

φi(ft(x)) exists for every i. (11.3.7)

Fix C > sup |φi| ≥ sup |φi| and take δ > 0 such that

d(x, y) < δ ⇒ |φi(x) − φi(y)| <ε

5for every i. (11.3.8)

Fix κ = κ(δ) ≥ 1 given by the property of specification (Proposition 11.3.1).Choose points xj ∈ M , 1 ≤ j ≤ s satisfying (11.3.7) and positive numbers αj ,1 ≤ j ≤ s such that

∑j αj = 1 and

∣∣∫φi dµ−

s∑

j=1

αj φi(xj)∣∣ < ε

5for every i (11.3.9)

(use Exercise A.2.6). Take kj ≡ κ and choose integer numbers nj much biggerthan κ, in such a way that

∣∣ njms

− αj∣∣ < ε

5Cs(11.3.10)

(recall that ms =∑j(nj + kj) = sκ+

∑j nj) and, using (11.3.8),

∣∣nj−1∑

t=0

φi(ft(xj)) − nj φi(xj)

∣∣ < ε

5nj for 1 ≤ i ≤ N. (11.3.11)

Combining (11.3.9) and (11.3.10) with the fact that∫φi dµ =

∫φi dµ, we get

∣∣∫φi dµ−

s∑

j=1

njms

φi(xj)∣∣ < ε

5+

ε

5Css sup |φi| <

2ε

5. (11.3.12)

By Proposition 11.3.1, there exists some periodic point p ∈M , with period ms,such that d(fmj−1+t(p), f t(xj)) < δ for 0 ≤ t < nj and 1 ≤ j ≤ s. Then, theproperty (11.3.8) implies that

∣∣nj−1∑

t=0

φi(fmj−1+t(p)) −

nj−1∑

t=0

φi(ft(xj))

∣∣ < ε

5nj for 1 ≤ j ≤ s.


Combining this relation with (11.3.11), we obtain

∣∣nj−1∑

t=0

φi(fmj−1+t(p)) − nj φi(xj)

∣∣ < 2ε

5nj for 1 ≤ j ≤ s. (11.3.13)

Since∑

j αj = 1, the condition (11.3.10) implies that

sκ = ms −s∑

j=1

nj <ε

5Cms.

Then, (11.3.13) implies that

∣∣ms−1∑

t=0

φi(ft(p)) −

s∑

j=1

nj φi(xj)∣∣ < 2ε

5

s∑

j=1

nj + sκ sup |φi| <3ε

5ms. (11.3.14)

Let µp be the invariant probability measure supported on the orbit of p. The firstterm in (11.3.14) coincides with ms

∫φi dµp. Therefore, adding the inequalities

(11.3.12) and (11.3.14), we conclude that

∣∣∫φi dµp −

∫φi dµ

∣∣ < 2ε

5+

3ε

5= ε for every 1 ≤ i ≤ N.

This means that µp ∈ V (µ,Φ, ε), as we wanted to prove.

11.3.3 Exercises

11.3.1. Let f : M → M be a continuous transformation in a compact metricspace M . Check that if some iterate f l, l ≥ 1 has the property of specification,or specification by periodic points, then so does f .

11.3.2. Let f : M →M be a continuous transformation in a metric space withthe property of specification. Show that f is topologically mixing.

11.3.3. Let f : M → M be a topologically mixing expanding map and ϕ :M → R be a continuous function. Assume that there exist probability mea-sures µ1, µ2 invariant under f and such that

∫ϕdµ1 6=

∫ϕdµ2. Show that

there exists x ∈ M such that the time average of ϕ on the orbit of x does notconverge. [Observation: One can show (see [BS00]) that the set Mϕ of pointswhere the time average of ϕ does not converge has full entropy and full Hausdorffdimension.]

11.3.4. Prove the following generalization of Proposition 11.3.2: if f : M →Mis a topologically exact expanding map then

P (f, φ) = limk

1

klog

∑

p∈Fix(fk)

eφk(p) for every Holder function φ : M → R.


11.3.5. Let f : M → M be an expanding map of class C1 on a compactmanifold M . Show that f admits:

(a) A neighborhood U0 in the C0 topology (that is, the topology of uniformconvergence) such that f is a topological factor of every g ∈ U0. Inparticular, h(g) ≥ h(f) for every g ∈ U0.

(b) A neighborhood U1 in the C1 topology such that every g ∈ U1 is topolog-ically conjugate to f . In particular, g 7→ h(g) is constant on U1.

Chapter 12

Thermodynamical

Formalism

In this chapter we develop the ergodic theory of expanding maps on compactmetric spaces. This theory evolved from the kind of ideas in Statistical Me-chanics that we discussed in Section 10.3.4 and, for that reason, is often calledThermodynamical Formalism. We point out, however, that this last expressionis much broader, encompassing not only the original setting of MathematicalPhysics but also applications to other mathematical systems, such as the so-called uniformly hyperbolic diffeomorphisms and flows (in this latter regard, seethe excellent monograph of Rufus Bowen [Bow75a]).

The main result in this chapter is the following theorem of David Ruelle,which we prove in Section 12.1 (the notion of Gibbs state is also introduced inSection 12.1):

Theorem 12.1 (Ruelle). Let f : M → M be a topologically exact expandingmap on a compact metric space and ϕ : M → R be a Holder function. Then,there exists a unique equilibrium state µ for ϕ. Moreover, the measure µ isexact, it is supported on the whole M and is a Gibbs state.

Recall that an expanding map is topologically exact if (and only if) it istopologically mixing (Exercise 11.2.2).

In the particular case when M is a Riemannian manifold and f is differ-entiable, the equilibrium state of the potential ϕ = − log | detDf | coincideswith the absolutely continuous invariant measure given by Theorem 11.1.2. Inparticular, it is the unique physical measure of f . These facts are proven inSection 12.1.8.

The theorem of Livsic that we present in Section 12.2 complements thetheorem of Ruelle in a very elegant way. It asserts that two potentials ϕ andψ have the same equilibrium state if and only if the difference between themis cohomologous to a constant. In other words, this happens if and only ifϕ−ψ = c+uf−u for some c ∈ R and some continuous function u. Moreover,and remarkably, it suffices to check this condition on the periodic orbits of f .

387

388 CHAPTER 12. THERMODYNAMICAL FORMALISM

In Section 12.3 we show that the system (f, µ) exhibits exponential decayof correlations in the space of Holder functions, for every equilibrium state µ ofany Holder potential.

We close this chapter (Section 12.4) with an application of these ideas to aclass of geometric and dynamical objects called conformal repellers. We provethe Bowen-Manning formula according to which the Hausdorff dimension of therepeller is given by the unique zero of the function t 7→ P (f, tϕu).

12.1 Theorem of Ruelle

Let f : M → M be a topologically exact expanding map and ϕ be a Holderpotential. In what follows, ρ > 0 and σ > 1 are the same constants as in thedefinition (11.2.1). Recall that we denote by ϕn the orbital sums of ϕ:

ϕn(x) =

n−1∑

j=0

ϕ(f j(x)) for x ∈M .

Before getting into the details of the proof of Theorem 12.1 let us outline themain points. The arguments in the proof turn around the transfer operator (orRuelle-Perron-Frobenius operator), the linear operator L : C0(M) → C0(M)defined in the Banach space C0(M) of continuous complex functions by

Lg(y) =∑

x∈f−1(y)

eϕ(x)g(x). (12.1.1)

Observe that L is well defined: Lg ∈ C0(M) whenever g ∈ C0(M). Indeed,as we saw in Lemma 11.2.6, for each y ∈M there exist inverse branches

hi : B(y, ρ) → M, i = 1, . . . , k

of the transformation f such that ∪ki=1hi(B(y, ρ)) coincides with the pre-imageof the ball B(y, ρ). Then,

Lg =

k∑

i=1

(eϕg) hi (12.1.2)

restricted to B(y, ρ) and, clearly, this expression defines a continuous function.It is clear from the definition that L is a positive operator: if g(x) ≥ 0 for

every x ∈M then Lg(y) ≥ 0 for every y ∈M . It is also easy to check that L isa continuous operator: indeed,

‖Lg‖ = sup |Lg| ≤ degree(f)esupϕ sup |g| = degree(f)esupϕ‖g‖ (12.1.3)

for every g ∈ C0(M) and that means that ‖L‖ ≤ degree(f) esupϕ; recall thatdegree(f) was defined in (11.2.4).

12.1. THEOREM OF RUELLE 389

According to the theorem of Riesz-Markov (Theorem A.3.12), the dual spaceof the Banach space C0(M) may be identified with the space M(M) of allcomplex Borel measures. Then, the dual of the transfer operator is the linearoperator L∗ : M(M) → M(M) defined by

∫g d(L∗η

)=

∫ (Lg)dη for every g ∈ C0(M) and η ∈ M(M). (12.1.4)

This operator is positive, in the sense that if η is a positive measure then L∗ηis also a positive measure.

The first step in the proof (Section 12.1.1) is to show that L∗ admits apositive eigenmeasure ν associated with a positive eigenvalue λ. We will see thatsuch measure admits a positive Jacobian which is Holder and whose support isthe whole space M . Moreover (Section 12.1.2), the eigenmeasure ν is a Gibbsstate: there exists a constant P ∈ R and for each ε > 0 there exists K ≥ 1 suchthat

K−1 ≤ ν(B(x, n, ε))

exp(ϕn(x) − nP

) ≤ K for every x ∈M and every n ≥ 1, (12.1.5)

where B(x, n, ε) is the dynamical ball defined in (9.3.2). Actually, P = logλ.Behind the proof of the Gibbs property are certain results about distortion

control that are also crucial to show (Section 12.1.3) that the transfer operatoritself L admits an eigenfunction associated with the eigenvalue λ. This functionis strictly positive and Holder. The measure µ = hν is the equilibrium state weare looking for, although that will take a little while to prove.

It follows easily from the properties of h (Section 12.1.4) that µ is invariantand a Gibbs state and its support is the wholeM . Moreover, hµ(f)+

∫ϕdµ = P .

To conclude that µ is indeed an equilibrium state, we need to check that P isequal to the pressure P (f, ϕ). This is done (Section 12.1.5) with the help ofthe Rokhlin formula (Theorem 9.7.3), which also allows us to conclude that ifη is an equilibrium state then η/h is an eigenmeasure of L∗ associated with theeigenvalue λ = logP (f, ϕ). This last result is the key ingredient for provingthat the equilibrium state is unique (Section 12.1.6).

The distortion control is, again, crucial role for checking (Section 12.1.7) thatthe system (f, µ) is exact. Finally, in Section 12.1.8 we comment on the specialcase ϕ = − log | det f |, when f is an expanding map on a Riemannian manifold.In this case, the reference measure ν is the Lebesgue measure on the manifolditself. Thus, the equilibrium state µ is an invariant measure equivalent to theLebesgue measure, and so it coincides with the invariant measure constructedin Section 11.1.

Before we start to detail these steps, it is convenient to make a couple ofquick comments. Firstly, note that the existence of an equilibrium state followsimmediately from Corollary 10.5.9, since Lemma 11.1.4 asserts that every ex-panding map is expansive. However, this fact is not used in the proof: instead,in Section 12.1.4 we present a much more explicit construction of the equilibriumstate.


The other comment concerns the Rokhlin formula. Let P be any finitepartition of M with diamP < ρ. For each n ≥ 1, every element of the partitionPn = ∨n−1

j=0 f−j(P) is contained in the image hn−1(P ) of some P ∈ P by an

inverse branch hn−1 of the iterate fn−1. In particular, diamPn < σ−n+1ρfor every n. Then, P satisfies the hypotheses of Theorem 9.7.3 at every point.Hence, the Rokhlin formula holds for every invariant probability measure.

12.1.1 Reference measure

Recall that C0+(M) denotes the cone of positive continuous functions. As ob-

served previously, this cone is preserved by the transfer operator L. The dualcone (recall Example 2.3.3) is defined by

C0+(M)∗ = η ∈ C0(M)∗ : η(ψ) ≥ 0 for every ψ ∈ C0

+(M)

and may be seen as the cone of finite positive Borel measures. It follows directlyfrom (12.1.4) that C0

+(M)∗ is preserved by the dual operator L∗.

Lemma 12.1.1. Consider the spectral radius λ = ρ(L∗) = ρ(L). Then, thereexists some probability measure ν on M such that L∗ν = λν.

Proof. As we saw in Exercise 2.3.3, the cone C0+(M) is normal. Hence, we may

apply Theorem 2.3.4 with E = C0(M) and C = C0+(M) and T = L. The

conclusion of the theorem means that L∗ admits some eigenvector ν ∈ C0+(M)∗

associated with the eigenvalue λ. As we have just explained, ν may be identifiedwith a finite positive measure. Normalizing ν, we may take it to be a probabilitymeasure.

In Exercise 12.1.2 we propose an alternative proof of Lemma 12.1.1, basedon the Tychonoff-Schauder theorem (Theorem 2.2.3).

Example 12.1.2. Let f : M → M be a local diffeomorphism on a compactRiemannian manifold M . Consider the transfer operator L associated with thepotential ϕ = − log | detDf |. The Lebesgue measure m (that is, the volumemeasure induced by the Riemannian metric) of M is an eigenmeasure of thetransfer operator associated with the eigenvalue λ = 1:

L∗m = m. (12.1.6)

To check this fact, it is enough to show that L∗m(E) = m(E) for every mea-surable set E contained in the image of some inverse branch hj : B(y, ρ) → M(because, M being compact, every measurable set may be written as a finitedisjoint union of subsets E of this kind). Now, using the expression (12.1.2),

L∗m(E) =

∫XE d(L∗m) =

∫(LXE) dm =

∫ k∑

i=1

XE| detDf | hi dm.


Hence, by the choice of E and the formula of change of variables,

L∗m(E) =

∫ k∑

i=1

XE| detDf | hi dm =

∫XE dm = m(E).

This proves that m is, indeed, a fixed point of L∗.Exercise 12.1.3 gives a similar conclusion for Markov measures.

From now on, we always take ν to be a reference measure, that is, a probabil-ity measure such that L∗ν = λν for some λ > 0. By the end of the proof of The-orem 12.1 we will find that λ is uniquely determined (in view of Lemma 12.1.1,that means that λ is necessarily equal to the spectral radius of L and L∗) andthe measure ν itself is also unique.

Initially, we show that f admits a Jacobian with respect to ν, which may bewritten explicitly in terms of the eigenvalue λ and the potential ϕ:

Lemma 12.1.3. The transformation f : M → M admits a Jacobian withrespect to ν, given by Jνf = λe−ϕ.

Proof. Let A be any domain of invertibility of f . Let (gn)n be a sequence ofcontinuous functions converging to the characteristic function of A at ν-almostevery point and such that sup |gn| ≤ 1 for every n (see Exercise A.3.5). Observethat

L(e−ϕgn)(y) =∑

x∈f−1(y)

gn(x).

The expression on the right-hand side is bounded by the degree of f , as definedin (11.2.4), and it converges to χf(A)(y) at ν-almost every point. Hence, usingthe dominated convergence theorem, the sequence

∫λe−ϕgn dν =

∫e−ϕgn d(L⋆ν) =

∫L(e−ϕgn) dν

converges to ν(f(A)). Since the expression on the left-hand side converges to∫A λe

−ϕdν, we conclude that

ν(f(A)) =

∫

A

λe−ϕdν,

which proves the claim.

The next lemma applies, in particular, to the reference measure ν:

Lemma 12.1.4. Let f : M →M be a topologically exact expanding map and ηbe any Borel probability measure such that there exists Jacobian of f with respectto η. Then, η it is supported on the whole M .

Proof. Suppose, by contradiction, that there exists some open set U ⊂M suchthat η(U) = 0. Note that f is an open map, since it is a local homeomorphism.


Thus, the image f(U) is also an open set. Moreover, we may write U as a finitedisjoint union of domains of invertibility A. For each one of them,

η(f(A)) =

∫

A

Jηf dη = 0.

Therefore, η(f(U)) = 0. By induction, it follows that η(fn(U)) = 0 for everyn ≥ 0. Since we take f to be topologically exact, there exists n ≥ 1 such thatfn(U) = M . This contradicts the fact that η(M) = 1.

12.1.2 Distortion and the Gibbs property

In this section we prove certain distortion bounds that have a central role in theproof of Theorem 12.1. The hypothesis that ϕ is Holder is critical at this stage:most of what follows is false, in general, if the potential is only continuous. As afirst application of this distortion control, we prove that every reference measureν is a Gibbs state.

Fix constants K0 > 0 and α > 0 such that |ϕ(z) − ϕ(w)| ≤ K0d(z, w)α forany z, w ∈M .

Lemma 12.1.5. There exists K1 > 0 such that for every n ≥ 1, every x ∈ Mand every y ∈ B(x, n+ 1, ρ),

|ϕn(x) − ϕn(y)| ≤ K1d(fn(x), fn(y))α.

Proof. By hypothesis, d(f i(x), f i(y)) < ρ for every 0 ≤ i ≤ n. Then, foreach j = 1, . . . , n, the inverse branch hj : B(fn(x), ρ) → M of f j at thepoint fn−j(x), which maps fn(x) to fn−j(x), also maps fn(y) to fn−j(y).Hence, recalling (11.2.6), d(fn−j(x), fn−j(y)) ≤ σ−jd(fn(x), fn(y)) for everyj = 1, . . . , n. Then,

|ϕn(x) − ϕn(y)| ≤n∑

j=1

|ϕ(fn−j(x)) − ϕ(fn−j(y))|

≤n∑

j=1

K0σ−jαd(fn(x), fn(y))α.

Therefore, we may take any K1 ≥ K0

∑∞j=0 σ

−jα.

As a consequence of Lemma 12.1.5, we obtain the following variation ofProposition 11.1.5 where the usual Jacobian with respect to the Lebesgue mea-sure is replaced by the Jacobian with respect to any reference measure ν:

Corollary 12.1.6. There exists K2 > 0 such that for every n ≥ 1, every x ∈Mand every y ∈ B(x, n+ 1, ρ),

K−12 ≤ Jνf

n(x)

Jνfn(y)≤ K2.


Proof. From the expression of the Jacobian in Lemma 12.1.3 it follows that(recall Exercise 9.7.5)

Jνfn(z) = λne−ϕn(z) for every z ∈M and every n ≥ 1. (12.1.7)

Then, Lemma 12.1.5 yields

∣∣ logJνf

n(x)

Jνfn(y)

∣∣ =∣∣ϕn(x) − ϕn(y)

∣∣ ≤ K1d(fn(x), fn(y))α ≤ K1ρ

α.

So, it suffices to take K2 = exp(K1ρα).

Now we may show that every reference measure ν is a Gibbs state:

Lemma 12.1.7. For every small ε > 0, there exists K3 = K3(ε) > 0 such that,denoting P = log λ,

K−13 ≤ ν(B(x, n, ε))

exp(ϕn(x) − nP )≤ K3 for every x ∈M and every n ≥ 1.

Proof. Consider ε < ρ. Then, f | B(y, ε) is injective for every y ∈ M and,consequently, fn | B(x, n, ε) is injective for every x ∈M and every n. Then,

ν(fn(B(x, n, ε))) =

∫

B(x,n,ε)

Jνfn(y)dν(y).

By Corollary 12.1.6, the value of Jνfn at any point y ∈ B(x, n, ε) differs from

Jνfn(x) by a factor bounded by the constant K2. It follows that

K−12 ν(fn(B(x, n, ε))) ≤ Jνf

n(x)ν(B(x, n, ε)) ≤ K2ν(fn(B(x, n, ε))). (12.1.8)

Now, Jνfn(x) = λne−ϕn(x) = exp(nP − ϕn(x)), as we saw in (12.1.7). By

Lemma 11.2.7 we also have that fn(B(x, n, ε)) = f(B(fn−1(x), ε)), and so

ν(fn(B(x, n, ε))) =

∫

B(fn−1(x),ε)

Jνf dν (12.1.9)

for every x ∈ M and every n. It is clear that the left-hand side of (12.1.9)is bounded above by 1. Moreover, Jνf = λe−ϕ is bounded from zero and (byExercise 12.1.1 and Lemma 12.1.4) the set ν(B(y, ε)) : y ∈M is also boundedfrom zero. Therefore, the right-hand side of (12.1.9) is bounded below by somenumber a > 0. Using these observations in (12.1.8), we obtain

K−12 a ≤ ν(B(x, n, ε))

exp(ϕn(x) − nP )≤ K2.

Now it suffices to take K3 = maxK2/a,K2.


12.1.3 Invariant density

Next, we show that the transfer operator L admits some positive eigenfunctionh associated with the eigenvalue λ. We are going to find h as a Cesaro accumu-lation point of the sequence of functions λ−nLn1. To show that there does existsome accumulation point, we start by proving that this sequence is uniformlybounded and equicontinuous.

Lemma 12.1.8. There exists K4 > 0 such that

−K4d(y1, y2)α ≤ log

Ln1(y1)

Ln1(y2)≤ K4d(y1, y2)

α

for every n ≥ 1 and any y1, y2 ∈M with d(y1, y2) < ρ.

Proof. It follows from (12.1.2) that, given any continuous function g,

Lng =∑

i

(eϕng

) hni restricted to each ball B(y, ρ),

where the sum is over all inverse branches hni : B(y, ρ) → M of the iterate fn.In particular,

Ln1(y1)

Ln1(y2)=

∑i eϕn(hn

i (y1))

∑i eϕn(hn

i (y2)).

By Lemma 12.1.5, for each of these inverse branches hni one has

|ϕn(hni (y1)) − ϕn(hni (y2))| ≤ K1d(y1, y2)α.

Consequently,

e−K1d(y1,y2)α ≤ Ln1(y1)

Ln1(y2)≤ eK1d(x1,x2)

α

.

Therefore, one may take any K4 ≥ K1.

It follows that the sequence λ−nLn1 is bounded from zero and infinity:

Corollary 12.1.9. There exists K5 > 0 such that K−15 ≤ λ−nLn1(x) ≤ K5 for

every n ≥ 1 and any x ∈M .

Proof. Start by observing that, for every n ≥ 1,∫

Ln1 dν =

∫1 d(L∗nν) =

∫λn dν = λn.

In particular, for every n ≥ 1,

miny∈M

λ−nLn1(y) ≤ 1 ≤ maxy∈M

λ−nLn1(y). (12.1.10)

Since f is topologically exact, there exists N ≥ 1 such that fN(B(x, ρ)) = Mfor every x ∈ M (check Exercise 11.2.3). Now, given any x, y ∈ M , we mayfind x′ ∈ B(x, ρ) such that fN (x′) = y. Then, on the one hand,

Ln+N1(y) =∑

z∈f−N (y)

eϕN(z)Ln1(z) ≥ eϕN (x′)Ln1(x′) ≥ e−cNLn1(x′).


On the other hand, Lemma 12.1.8 gives that Ln1(x′) ≥ Ln1(x) exp(−K4ρα).

Take c = sup |ϕ| and K ≥ exp(K4ρα)ecNλN . Combining the previous inequali-

ties, we get that

Ln+N1(y) ≥ exp(−K4ρα)e−cNLn1(x) ≥ K−1λNLn1(x)

for every x, y ∈M . Therefore, for every n ≥ 1,

minλ−(n+N)Ln+N1 ≥ K−1 maxλ−nLn1. (12.1.11)

Combining (12.1.10) and (12.1.11), we get:

maxλ−nLn1 ≤ K minλ−(n+N)Ln+N1 ≤ K for every n ≥ 1

minλ−nLn1 ≥ K−1 maxλ−n+NLn−N1 ≥ K−1 for every n > N .

To conclude the proof, we only have to extend this last estimate to the valuesn = 1, . . . , N . For that, observe that each Ln1 is a positive continuous function.Since M is compact, it follows that the minimum of Ln1 is positive for everyn. Then, we may take K5 ≥ K such that min λ−nLn1 ≥ K−1

5 for every n =1, . . . , N .

It follows immediately from Corollary 12.1.9 that the positive eigenvalue λis uniquely determined. By Lemma 12.1.1, this implies that λ = ρ(L) = ρ(L∗).We are also going to see, in a while, that λ = eP (f,ϕ).

Lemma 12.1.10. There exists K6 > 0 such that

|λ−nLn1(x) − λ−nLn1(y)| ≤ K6d(x, y)α for any n ≥ 1 and x, y ∈M .

In particular, the sequence λ−nLn1 is equicontinuous.

Proof. Initially, suppose that d(x, y) < ρ. By Lemma 12.1.8,

Ln1(x) ≤ Ln1(y) exp(K4d(x, y)α)

and, hence,

λ−nLn1(x) − λ−nLn1(y) ≤[exp(K4d(x, y)

α) − 1]λ−nLn1(y).

Take K > 0 such that | exp(K4t) − 1| ≤ K|t| whenever |t| ≤ ρα. Then, usingCorollary 12.1.9,

λ−nLn1(x) − λ−nLn1(y) ≤ KK5d(x, y)α.

Reversing the roles of x and y, we conclude that

|λ−nLn1(x) − λ−nLn1(y)| ≤ KK5d(x, y)α whenever d(x, y) < ρ.

When d(x, y) ≥ ρ, Corollary 12.1.9 gives that

|λ−nLn1(x) − λ−nLn1(y)| ≤ 2K5 ≤ 2K5ρ−αd(x, y)α.

Hence, it suffices to take K6 ≥ maxKK5, 2K5ρ−α to get the first part of the

statement. The second part is an immediate consequence.


We are ready to show that the transfer operator L admits some eigenfunctionassociated with the eigenvalue λ. Corollary 12.1.9 and Lemma 12.1.10 implythat the time average

hn =1

n

n−1∑

i=0

λ−iLi1

defines an equicontinuous bounded sequence. Then, by the theorem of Ascoli-Arzela, there exists some subsequence (hni)i converging uniformly to a contin-uous function h.

Lemma 12.1.11. The function h satisfies Lh = λh. Moreover,∫h dν = 1 and

K−15 ≤ h(x) ≤ K5 and |h(x) − h(y)| ≤ K6d(x, y)

α for every x, y ∈M .

Proof. Consider any subsequence (hni)i converging to h. As the transfer oper-ator L is continuous,

Lh = limiLhni = lim

i

1

ni

ni−1∑

k=0

λ−kLk+11 = limi

λ

ni

ni∑

k=1

λ−kLk1

= limi

λ

ni

ni−1∑

k=0

λ−kLk1 +λ

ni

(λ−niLni1 − 1

).

The first term on the right-hand side converges to λh whereas the second oneconverges to zero, because the sequence λ−nLn1 is uniformly bounded. It followsthat Lh = λh, as we stated.

Note that∫λ−nLn1 dν =

∫λ−nd(L∗nν) =

∫1 dν = 1 for every n ∈ N, by the

definition of ν. It follows that∫hn dν = 1 for every n and, using the dominated

convergence theorem,∫h dν = 1. All the other claims in the statement follow,

in an entirely analogous way, from Corollary 12.1.9 and Lemma 12.1.10.

12.1.4 Construction of the equilibrium state

Consider the measure defined by µ = hν, that is,

µ(A) =

∫

A

h dν for each measurable set A ⊂M .

We are going to see that µ is an equilibrium state for the potential ϕ and satisfiesall the other conditions in Theorem 12.1.

From Lemma 12.1.11 we get that µ(M) =∫h dν = 1 and so µ is a probability

measure. Moreover,

K−15 ν(A) ≤ µ(A) ≤ K5ν(A) (12.1.12)

for every measurable set A ⊂M . In particular, µ is equivalent to the referencemeasure ν. This fact, together with Lemma 12.1.4, gives that suppµ = M . It


also follows from the relation (12.1.12), together with Lemma 12.1.7, that µ isa Gibbs state: taking L = K5K, we find that

L−1 ≤ µ(B(x, n, ε))

exp(ϕn(x) − nP )≤ L, (12.1.13)

for every x ∈M and every n ≥ 1. Recall that P = logλ.

Lemma 12.1.12. The probability measure µ is invariant under f . Moreover,f admits a Jacobian with respect to µ, given by Jµf = λe−ϕ(h f)/h.

Proof. Start by noting that L((g1 f)g2) = g1Lg2, for any continuous functions

g1, g2 : M → R: for every y ∈M ,

L((g1 f)g2

)(y) =

∑

x∈f−1(y)

eϕ(x)g1(f(x))g2(x)

= g1(y)∑

x∈f−1(y)

eϕ(x)g2(x) = g1(y)Lg2(y).(12.1.14)

Thus, for every continuous function g : M → R,∫

(g f) dµ = λ−1

∫(g f)h d(L∗ν) = λ−1

∫L((g f)h

)dν

= λ−1

∫gLh dν =

∫gh dν =

∫g dµ.

In view of Proposition A.3.3, this proves that the probability measure µ isinvariant under f .

To prove the second claim, consider any domain of invertibility A of f . Then,using Lemma 9.7.4(a),

µ(f(A)) =

∫

f(A)

1 dµ =

∫

f(A)

h dν =

∫

A

Jνf(h f) dν =

∫

A

Jνfh fh

dµ.

By Lemma 12.1.3, this means that

Jµf = Jνfh fh

= λe−ϕh fh

,

as stated.

Corollary 12.1.13. The invariant probability measure µ = hν satisfies

hµ(f) +

∫ϕdµ = P.

Proof. Combining the Rokhlin formula (Theorem 9.7.3) with the second part ofLemma 12.1.12,

hµ(f) =

∫log Jµf dµ = logλ−

∫ϕdµ+

∫(log h f − log h) dµ.

Since µ is invariant and logh is bounded (Corollary 12.1.9), the last term isequal to zero. This shows that hµ(f) = P −

∫ϕdµ, as stated.


To complete the proof that µ = hν is an equilibrium state, all that we needto do is to check that P = logλ is equal to the pressure P (f, ϕ). This is donein Corollary 12.1.15 below.

12.1.5 Pressure and eigenvalues

Let η be any probability measure invariant under f and such that

hη(f) +

∫ϕdη ≥ P (12.1.15)

(for example: the probability measure µ constructed in the previous section).Let gη = 1/Jηf and consider also the function g = λ−1eϕh/(h f). Observethat

∑

x∈f−1(y)

g(x) =1

λh(y)

∑

x∈f−1(y)

eϕ(x)h(x) =Lh(y)λh(y)

= 1 (12.1.16)

for every y ∈ M . Moreover, since η is invariant under f , Exercise 9.7.4 givesthat ∑

x∈f−1(y)

gη(x) = 1 for η-almost every y ∈M . (12.1.17)

Using (12.1.15) and the Rokhlin formula (Theorem 9.7.3),

0 ≤ hη(f) +

∫ϕdη − P =

∫(− log gη + ϕ− logλ) dη. (12.1.18)

By the definition of g and the hypothesis that η is invariant, the integral on theright-hand side of (12.1.18) is equal to

∫(− log gη + log g + log h f − log h) dη =

∫log

g

gηdη. (12.1.19)

Recalling the definition of gη, Exercise 9.7.3 gives that

∫log

g

gηdη =

∫ ( ∑

x∈f−1(y)

gη(x) logg

gη(x))dη(y). (12.1.20)

At this point we need the following elementary fact:

Lemma 12.1.14. Let pi, bi, i = 1, . . . , k be positive real numbers such that∑ki=1 pi = 1. Then,

k∑

i=1

pi log bi ≤ log(k∑

i=1

pibi)

and the identity holds if and only if the numbers bj are all equal to∑k

i=1 pibi.


Proof. Take ai = log(pibi) in Lemma 10.4.4. Then, the inequality in the con-clusion of Lemma 10.4.4 corresponds exactly to the inequality in the presentlemma. Moreover, the identity holds if and only if

pj =eaj

∑i eai

⇔ pj =pjbj∑i pibi

⇔ bj =∑

i

pibi

for every j = 1, . . . , n.

For each y ∈ M , take pi = gη(xi) and bi = log(g(xi)/gη(xi)), where thexi are the pre-images of y. The identity (12.1.17) means that

∑i pi = 1 for

η-almost every y. Then, we may apply Lemma 12.1.14:

∑

x∈f−1(y)

gη(x) logg

gη(x) ≤ log

∑

x∈f−1(y)

gη(x)g

gη(x)

= log∑

x∈f−1(y)

g(x) = 0(12.1.21)

for η-almost every y; in the last step we used (12.1.16). Combining the relations(12.1.18) through (12.1.21), we find:

hη(f) +

∫ϕdη − P =

∫log

g

gηdη = 0. (12.1.22)

Corollary 12.1.15. P (f, ϕ) = P = log ρ(L).

Proof. By (12.1.22), we have that hη(f) +∫ϕdη = P for every invariant prob-

ability measure η such that hη(f) +∫ϕdη ≥ P . By the variational principle

(Theorem 10.4.1), it follows that P (f, ϕ) = P . The second identity had beenobserved before, right after Corollary 12.1.9.

At this point we have completed the proof that the measure µ = hν con-structed in the previous section is an equilibrium state for ϕ. The statementthat follows arises from the same kind of ideas and is the basis for proving thatthis equilibrium state is unique:

Corollary 12.1.16. If η is an equilibrium state for ϕ then supp η = M and

Jηf = λe−ϕ(h f)/h and L∗(η/h) = λ(η/h).

Proof. The first claim is an immediate consequence of the second one andLemma 12.1.4.

Note that the identity in (12.1.22) also implies that the identity in (12.1.21)holds for η-almost every y ∈ M . According to Lemma 12.1.14, that happens ifand only if the numbers bi = log(g(xi)/gη(xi)) are all equal. In other words, forη-almost every y ∈M there exists a number c(y) such that

g(x)

gη(x)= c(y) for every x ∈ f−1(y).

h

)=

∫1

h

(Lξ)dη =

∫1

h(y)

( ∑

x∈f−1(y)

eϕ(x)ξ(x))dη(y). (12.1.23)

By the definition of the function g,

eϕ(x)

h(y)=λg(x)

h(x).

Replacing this identity in (12.1.23), we obtain:∫ξ dL∗

(ηh

)=

∫ ( ∑

x∈f−1(y)

λgξ

h(x))dη(y). (12.1.24)

Then, recalling that g = gη = 1/Jηf , we may use Exercise 9.7.3 to concludethat ∫

ξ dL∗(ηh

)=

∫ ( ∑

x∈f−1(y)

λgξ

h(x))dη(y) =

∫λξ

hdη.

Since the continuous function ξ is arbitrary, this shows that L∗(η/h) = λ(η/h),as stated.

12.1.6 Uniqueness of the equilibrium state

Let us start by proving the following distortion bound:

Corollary 12.1.17. There exists K7 > 0 such that for every equilibrium stateη, every n ≥ 1, every x ∈M and every y ∈ B(x, n+ 1, ρ),

K−17 ≤ Jηf

n(x)

Jηfn(y)≤ K7.

Proof. By Corollary 12.1.16,

Jηfn = λe−ϕn

h fnh

= Jνfnh fn

h

for each n ≥ 1. Then, using Corollary 12.1.6 and Lemma 12.1.11,

K−12 K−4

5 ≤ Jηfn(x)

Jηfn(y)=Jνf

n(x)

Jνfn(y)

h(fn(x))h(y)

f(fn(y))h(x)≤ K2K

45 .

Therefore, it suffices to take K7 = K2K45 .


Lemma 12.1.18. All the equilibrium states of ϕ are equivalent measures.

Proof. Let η1 and η2 be equilibrium states. Consider any finite partition P of Msuch that every P ∈ P has non-empty interior and diameter less than ρ. Sincesupp η1 = supp η2 = M (by Corollary 12.1.16), the set ηi(P ) : i = 1, 2 and P ∈P is bounded from zero. Consequently, there exists C1 > 0 such that

1

C1≤ η1(P )

η2(P )≤ C1 for every P ∈ P . (12.1.25)

We are going to show that this relation extends to every measurable subset ofM , up to replacing C1 by a convenient constant C2 > C1.

For each n ≥ 1, let Qn be the partition of M formed by the images hn(P )of the elements of P under the inverse branches hn of the iterate fn. By thedefinition of Jacobian, ηi(P ) =

∫hn(P ) Jηif

n dηi. Hence, using Corollary 12.1.17,

K−17 Jηif

n(x) ≤ ηi(P )

ηi(hn(P ))≤ K7Jηif

n(x)

for any x ∈ hn(P ). Recalling that Jη1f = Jη2f (Corollary 12.1.16), it followsthat

K−27 ≤ η2(P )η1(h

n(P ))

η1(P )η2(hn(P ))≤ K2

7 . (12.1.26)

Combining (12.1.25) and (12.1.26), and taking C2 = C1K27 , we get that

1

C2≤ η1(h

n(P ))

η2(hn(P ))≤ C2 (12.1.27)

for every P ∈ P , every inverse branch hn of fn and every n ≥ 1. In other words,the property in (12.1.25) holds for every element of Qn, with C2 in the place ofC1.

Now observe that diamQn ≤ 2σ−nρ for every n. Given any measurable setB and any δ > 0, we may use Proposition A.3.2 to find a compact set F ⊂ Band an open set A ⊃ B such that ηi(A \ F ) < δ for i = 1, 2. Let Qn be theunion of all the elements of the partition Qn that intersect F . It is clear thatQn ⊃ F and, assuming that n is large enough, Qn ⊂ A. Then,

η1(B) ≤ η1(A) < η1(Qn) + δ and η2(B) ≥ η2(F ) > η2(Qn) − δ.

The relation (12.1.27) gives that η1(Qn) ≤ C2η2(Qn), since Qn is a (disjoint)union of elements of Qn. Combining these three inequalities, we obtain

η1(B) < C2

(η2(B) + δ

)+ δ.

Since δ is arbitrary, we conclude that η1(B) ≤ C2η2(B) for every measurable setB ⊂M . Reversing the roles of the two measures, we also get η2(B) ≤ C2η2(B)for every measurable set B ⊂M .

These inequalities prove that any two equilibrium states are equivalent mea-sures, with Radon-Nikodym derivatives bounded from zero and infinity.


Combining Lemmas 4.3.3 and 12.1.18 we get that all the ergodic equilibriumstates are equal. Now, by Proposition 10.5.5, the connected components of anyequilibrium state are also equilibrium states (ergodic, of course). It follows thatthere exists a unique equilibrium state, as stated.

There is an alternative proof of the fact that the equilibrium state is uniquethat does not use Proposition 10.5.5 and, thus, does not require the theorem ofJacobs. Indeed, the results in the next section imply that the equilibrium stateµ = hν in Section 12.1.4 is ergodic. By Lemma 12.1.18, that implies that allthe equilibrium states are ergodic. Using Lemma 4.3.3, it follows that all theequilibrium states must coincide.

As a consequence, the reference measure ν is also unique: if there were twodistinct reference measures, ν1 and ν2, then µ1 = hν1 and µ2 = hν2 would bedistinct equilibrium states. Analogously, the positive eigenfunction h is uniqueup to multiplication by a positive constant.

12.1.7 Exactness

Finally, let us prove that the system (f, µ) is exact. Recall that this means thatif B ⊂ M is such that there exist measurable sets Bn satisfying B = f−n(Bn)for every n ≥ 1, then B has measure 0 or measure 1.

Let B be such a subset of M and assume that µ(B) > 0. Let P be a finitepartition of M by subsets with non-empty interior and diameter less than ρ.For each n, let Qn be the partition of M whose elements are the images hn(P )of the sets P ∈ P under the inverse branches hn of the iterate fn.

Lemma 12.1.19. For every ε > 0 and every n ≥ 1 sufficiently large there existssome hn(P ) ∈ Qn such that

µ(B ∩ hn(P )) > (1 − ε)µ(hn(P )). (12.1.28)

Proof. Fix ε > 0. Since the measure µ is regular (Proposition A.3.2), given anyδ > 0 there exist some compact set F ⊂ B and some open set A ⊃ B satisfyingµ(A \ F ) < δ. Since we assume that µ(B) > 0, this inequality implies thatµ(F ) > (1 − ε)µ(A), as long as δ > 0 is sufficiently small. Fix δ from now on.Note that diamQn < σ−nρ. Then, for every n sufficiently large, any elementhn(P ) of Qn that intersects F is contained in A. By contradiction, supposethat (12.1.28) is false for every hn(P ). Then, adding over all the hn(P ) thatintersect F ,

µ(F ) ≤∑

P,hn

µ(F ∩ hn(P )) ≤∑

P,hn

µ(B ∩ hn(P ))

≤ (1 − ε)∑

P,hn

µ(hn(P )) ≤ (1 − ε)µ(A).

This contradiction proves that (12.1.28) is valid for some hn(P ) ∈ Qn.


Consider any hn(P ) ∈ Qn such that (12.1.28). Since B = f−n(Bn) andfn hn = id in its domain, we have that fn

(hn(P ) \ B) = P \ Bn. Then,

applying Corollary 12.1.17 to the measure η = µ,

µ(P \Bn) =

∫

hn(P )\B

Jµfn dµ ≤ K7µ(hn(P ) \B)Jµf

n(x)

and µ(P ) =

∫

hn(P )

Jµfn dµ ≥ K−1

7 µ(hn(P ))Jµfn(x)

(12.1.29)

for any x ∈ hn(P ). Combining (12.1.28) and (12.1.29),

µ(P \Bn)µ(P )

≤ K27

µ(hn(P ) \B)

µ(hn(P ))≤ K2

7ε.

In this way we have shown that, given any ε > 0 and any n ≥ 1 sufficientlylarge, there exists some P ∈ P such that µ(P \Bn) ≤ K2

7εµ(P ).Since the partition P is finite, it follows that there exist some P ∈ P and

some sequence (nj)j → ∞ such that

µ(P \Bnj ) → 0 when j → ∞. (12.1.30)

Let P be fixed from now on. Since, by assumption, P has non-empty interiorand f is topologically exact, there exists N ≥ 1 such that fN (P ) = M . LetP = P1 ∪ · · · ∪ Ps be a finite partition of P into domains of invertibility offN . The Corollaries 12.1.9 and 12.1.16 give that Jµf

N = λNe−ϕN (h fN )/f isbounded from zero and infinity. Note also that fN(Pi \Bnj ) = fN (Pi)\Bnj+N ,because f−n(Bn) = B for every n. Combining these two observations with(12.1.30), we find that, given any i = 1, . . . , s, the sequence

µ(fN (Pi) \Bnj+N ) = µ(fN (Pi \Bnj )) =

∫

Pi\Bnj

log JµfN dµ

converges to zero when j → ∞. Now, fN(Pi) : i = 1, . . . , s is a finite cover ofM by measurable sets. Therefore, this last conclusion implies that µ(M\Bnj+N )converges to zero, that is, µ(B) = µ(Bnj+N ) converges to 1 when j → ∞. Thatmeans that µ(B) = 1, of course.

The proof of Theorem 12.1 is complete.

12.1.8 Absolutely continuous measures

In this last section on the theorem of Ruelle we briefly discuss the special casewhen f : M →M is a local diffeomorphism on a compact Riemannian manifoldand ϕ = − log | detDf |. It is assumed that f is such that this potential ϕ isHolder. The first goal is to compare the conclusions of the theorem of Ruelle inthis case with the results in Section 11.1:

Proposition 12.1.20. The invariant absolutely continuous probability measurecoincides with the equilibrium state µ of the potential ϕ = − log | detDf |. Con-sequently, it is equivalent to the Lebesgue measure m, with density dµ/dm Holderand bounded from zero and infinity, and it is exact.


Proof. We saw in Example 12.1.2 that the Lebesgue measure m is an eigen-vector of the dual L∗ of the transfer operator associated with the potentialϕ = − log | detDf |: more precisely,

L∗m = m.

Applying the previous theory (from Lemma 12.1.3 on) with λ = 1 and ν = m,we find a Holder function h : M → R, bounded from zero and infinity, suchthat Lh = h and the measure µ = hm is the equilibrium state of the potentialϕ. Recalling Corollary 11.1.15, it follows that µ is also the unique probabilitymeasure invariant under f and absolutely continuous with respect to m. Thefact that h is positive implies that µ and m are equivalent measures. Exactnesswas proven in Section 12.1.7.

It is worthwhile pointing out that, while the absolutely continuous invariantmeasure is unique (Theorem 11.1.2), the potential ϕ = − log | detDf | dependson the choice of the Riemannian metric onM , because the determinant does. So,Proposition 12.1.20 also implies that all the potentials of this form, correspond-ing to different choices of the Riemannian metric, have the same equilibriumstate. This type of situation is the subject of Section 12.2 and, in particular,Exercise 12.2.3.

It also follows from the proof of Theorem 12.1 that

hµ(f) −∫

log | detDf | dµ = P (f,− log | detDf |) = logλ = 0. (12.1.31)

Let ϕ be the time average of the function ϕ, given by the Birkhoff ergodictheorem. Then,

∫log | detDf | dµ =

∫−ϕdµ =

∫−ϕ dµ. (12.1.32)

Moreover,

−ϕ(x) = limn

1

n

n−1∑

j=0

log | detDf(f j(x))| = limn

1

nlog | detDfn(x)| (12.1.33)

at µ-almost every point. In the context of our comments about the Oseledetstheorem (see the relation (c1) in Section 3.3.5) we mentioned that

limn

1

nlog | detDfn(x)| =

k(x)∑

i=1

di(x)λi(x), (12.1.34)

where λ1(x), . . . , λk(x)(x) are the Lyapunov exponents of the transformationf at the point x and d1(x), . . . , dk(x)(x) are the corresponding multiplicities.Combining the relations (12.1.31)-(12.1.34), we find that

hµ(f) =

∫ ( k(x)∑

i=1

di(x)λi(x))dµ(x). (12.1.35)


Since theses functions are invariant (see the relation (a1) in Section 3.3.5)and the measure µ is ergodic, the functions k(x), λi(x) and di(x) are constantat µ-almost every point. Let us denote by k, λi and di these constants. Then,(12.1.35) translates into the following theorem:

Theorem 12.1.21. Let f : M → M be an expanding map on a compact Rie-mannian manifold, such that the derivative Df is Holder. Let µ be the uniqueinvariant probability measure absolutely continuous with respect to the Lebesguemeasure on M . Then,

hµ(f) =

k∑

i=1

diλi, (12.1.36)

where λi, i = 1, . . . , k are the Lyapunov exponents of f at µ-almost every pointand di, i = 1, . . . , k are the corresponding multiplicities.

As we pointed out before, in Section 9.4.4, this is a special instance of thePesin entropy formula (Theorem 9.4.5).

12.1.9 Exercises

12.1.1. Show that if η is a Borel measure on a compact metric space then forevery ε > 0 there exists b > 0 such that η(B(y, ε)) > b for every y ∈ supp η.

12.1.2. Let f : M →M be an expanding map. Consider the non-linear operatorG : M1(M) → M1(M) defined in the space M1(M) of all Borel probabilitymeasures by

G(η) =L∗(η)∫L1 dη

.

Use the Tychonoff-Schauder theorem (Theorem 2.2.3) to show that G admitssome fixed point and deduce Lemma 12.1.1.

12.1.3. Let σ : ΣA → ΣA be the one-sided shift of finite type associated with agiven transition matrix A (recall Section 10.2.2). Let P be a stochastic matrixsuch that Pi,j = 0 whenever Ai,j = 0 and p be a probability vector with positivecoefficients such that P ∗p = p. Consider the transfer operator L associated withthe locally constant potential

ϕ(i0, i1, . . . , in, . . . ) = − logpi1

pi0Pi0,i1.

Show that the Markov measure µ associated with the matrix P and the vectorp satisfies L∗µ = µ.

12.1.4. Let λ be any positive number and ν be a Borel probability measure suchthat L∗ν = λν. Show that, given any u ∈ L1(ν) and any continuous functionv : M → R, ∫

(u f)v dν =

∫u(λ−1Lv) dν.


12.2 Theorem of Livsic

Now we discuss the following issue: when is it the case that two different Holderpotentials φ and ψ have the same equilibrium state? Observe that, since theseare ergodic measures, the two equilibrium states µφ and µψ either coincide orare mutually singular (by Lemma 4.3.3).

Recall that two potentials are said to be cohomologous (with respect to f)if the difference between them may be written as u f − u for some continuousfunction u : M → R.

Theorem 12.2.1 (Livsic). A potential ϕ : M → R is cohomologous to zero ifand only if ϕn(x) = 0 for every x ∈ Fix(fn) and every n ≥ 1.

Proof. It is clear that if ϕ = u f − u for some u then

ϕn(x) =

n∑

j=1

u(f j(x)) −n−1∑

j=0

u(f j(x)) = 0

for every x ∈M such that fn(x) = x. The converse is a lot more interesting.Suppose that ϕn(x) = 0 for every x ∈ Fix(fn) and every n ≥ 1. Consider

any point z ∈ M whose orbit is dense in M ; such a point exists because f istopologically exact and, consequently, transitive. Define the function u on theorbit of z through the following relation:

u(fn(z)) = u(z) + ϕn(z), (12.2.1)

where u(z) is arbitrary. Observe that

u(fn+1(z)) − u(fn(z)) = ϕn+1(z) − ϕn(z) = ϕ(fn(z)) (12.2.2)

for every n ≥ 0. In other words, the cohomology relation

φ− ψ = u f − u (12.2.3)

holds on the orbit of z. To extend this relation to the whole M , we use thefollowing fact:

Lemma 12.2.2. The function u is uniformly continuous on the orbit of z.

Proof. Given ε ∈ (0, ρ), take δ > 0 given by the shadowing lemma (Proposi-tion 11.2.9). Suppose that k ≥ 0 and l ≥ 1 are such that d(fk(z), fk+l(z)) < δ.Then, the periodic sequence (xn)n of period l given by

x0 = fk(z), x1 = fk+1(z), . . . , xl−1 = fk+l−1(z), xl = fk(z)

is a δ-pseudo-orbit. Hence, by Proposition 11.2.9, there exists x ∈ Fix(f l) suchthat d(f j(x), fk+j(z)) < ε for every j ≥ 0. Since we took ε < ρ, this also impliesthat x = hl(f

l(x)), where hl : B(fk+l(z), ρ) →M denotes the inverse branch off l that maps fk+l(z) to fk(z). By (11.2.6), it follows that

d(f j(x), fk+j(z)) ≤ σj−ld(f l(x), fk+l(z)) for every 0 ≤ j ≤ l. (12.2.4)

12.2. THEOREM OF LIVSIC 407

By the definition (12.2.1),

u(fk+l(z)) − u(fk(z)) = ϕk+l(z) − ϕk(z) = ϕl(fk(z)). (12.2.5)

Fix constants C > 0 and ν > 0 such that |ϕ(x) − ϕ(y)| ≤ Cd(x, y)ν for anyx, y ∈M . Then,

∣∣ϕl(fk(z)) − ϕl(x)∣∣ ≤

j−1∑

j=0

∣∣ϕ(fk+j(z)) − ϕ(f j(x))∣∣ ≤

∑

j=0

Cd(f j(x), fk+j(z))ν .

Using (12.2.4), it follows that

|ϕl(fk(z)) − ϕl(x)| ≤∑

j=0

Cσν(j−l)d(x, fk+l(z))ν ≤ C1εν (12.2.6)

where C1 = C∑∞i=0 σ

−iν . Recall that, by assumption, ψl(x) = 0. Hence,combining (12.2.5) and (12.2.6), we find that |u(fk+l(z)) − u(fk(z))| ≤ C1ε

ν .This completes the proof of the lemma.

It follows from Lemma 12.2.2 that u admits a (unique) continuous extensionto the closure of the orbit of z, that is, the ambient spaceM . Then, by continuityof ϕ and u, the cohomology relation (12.2.3) extends to the whole M . Thisproves Theorem 12.2.1.

Theorem 12.2.3. Let f : M →M be a topologically exact expanding map on acompact metric space and φ and ψ be two Holder potentials in M . The followingconditions are equivalent:

(a) µφ = µψ;

(b) there exist c ∈ R and an arbitrary function u : M → R such that φ− ψ =c+ u f − u;

(c) φ− ψ is cohomologous to some constant c ∈ R;

(d) there exist c ∈ R and a Holder function u : M → R such that φ − ψ =c+ u f − u;

(e) there exists c ∈ R such that φn(x) − ψn(x) = cn for every x ∈ Fix(fn)and n ≥ 1.

Moreover, the constants c ∈ R in (b), (c), (d) and (e) coincide; indeed, they areall equal to P (f, φ) − P (f, ψ).

Proof. It is clear that (d) implies (c) and (c) implies (b).If φ− ψ = c+ u f − u for some function u then, given any x ∈ Fix(fn),

φn(x) − ψn(x) =n−1∑

j=0

(φ− ψ

)(f j(x)) =

n−1∑

j=0

(c+ u(f j+1(x)) − u(f j(x))

).


Since fn(x) = x, the sum of the last two terms over every j = 0, . . . , n − 1vanishes. Therefore, φn(x) − ψn(x) = cn. This proves that (b) implies (e).

Suppose that φn(x) − ψn(x) = cn for every x ∈ Fix(fn) and every n ≥ 0.That means that the function ϕ = φ − ψ − c satisfies ϕn(x) = 0 for every x ∈Fix(fn) and every n ≥ 0. Note also that ϕ is Holder. Hence, by Theorem 12.2.1,there exists a continuous function u : M → R such that ϕ = u f − u. In otherwords, φ− ψ is cohomologous to c. This shows that (e) implies (c).

It follows from (10.3.4) and Proposition 10.3.8 that if φ is cohomologous toψ + c then

P (f, φ) = P (f, ψ + c) = P (f, ψ) + c.

On the other hand, given any invariant probability measure ν,

hν(f) +

∫φdν = hν(f) +

∫(ψ + c) dν = hν(f) +

∫ψ dν + c.

Therefore, ν is an equilibrium state for φ if and only if ν is an equilibrium statefor ψ. This shows that (c) implies (a).

If µφ and µψ coincide then they have the same Jacobian, of course. ByLemma 12.1.12, this means that

λφe−φhφ f

hφ= λψe

−ψ hψ fhψ

. (12.2.7)

Let c = logλφ − logλψ and u = log hφ− log hψ. Both are well defined, since λφand λψ and hφ and hψ are all positive. Moreover, since the functions hφ and hψare Holder and bounded from zero and infinity (Corollary 12.1.9), the functionu is Holder. Finally, (12.2.7) may be rewritten as follows:

φ− ψ = c+ log u f − u.

This shows that (a) implies (d). The proof of the theorem is complete.

Here is an interesting consequence in the differentiable setting:

Corollary 12.2.4. Let f : M → M be a differentiable expanding map on acompact Riemannian manifold such that the Jacobian detDf is Holder. Theabsolutely continuous invariant probability measure µ coincides with the measureof maximum entropy if and only if there exists c ∈ R such that

| detDfn(x)| = ecn for every x ∈ Fix(fn) and every n ≥ 1.

Proof. As we saw in Proposition 12.1.20, µ is the equilibrium state of the po-tential ϕ = − log | detDf |. It is clear that the measure of maximum entropy µ0

is the equilibrium state of the zero function. Observe that

ϕn(x) =

n−1∑

j=0

log | detDf(f j(x))| = log | detDfn(x)|.

Therefore, Theorem 12.2.3 gives that µ = µ0 if and only if there exists somenumber c ∈ R such that log | detDfn(x)| = 0 + cn for every x ∈ Fix(fn) andevery n ≥ 1.


12.2.1 Exercises

12.2.1. Consider the two-sided shift σ : Σ → Σ in Σ = 1, . . . , dZ. Show thatfor every Holder function ϕ : Σ → R, there exists a Holder function ϕ+ : Σ → R,cohomologous to ϕ and such that ϕ+(x) = ϕ+(y) whenever x = (xi)i∈Z andy = (yi)i∈Z are such that xi = yi for i ≥ 0.

12.2.2. Prove that if the functions ϕ, ψ : M → R are such that there existconstants C,L satisfying |ϕn(x) − ψn(x) − nC| ≤ L for every x ∈ M , thenP (f, ϕ) = P (f, ψ) + C and ϕ is cohomologous to ψ + C.

12.2.3. Let f : M → M be a differentiable expanding map on a compactmanifold, with Holder derivative. Check that any two potentials of the formϕ = − log | detDf |, for two different choices of a Riemannian metric on M ,are cohomologous. [Observation: In particular, all such potentials have thesame equilibrium state, namely, the absolutely continuous invariant probabilitymeasure. This was observed before, in Section 12.1.8.]

12.2.4. Given k ≥ 2, let f : S1 → S1 be the (expanding) map given byf(x) = kx mod Z. Let g : S1 → S1 be a differentiable expanding map ofdegree k. Show that f and g are topologically conjugate.

12.2.5. Given k ≥ 2, let f : S1 → S1 be the map given by f(x) = kx mod Z.Let g : S1 → S1 be a differentiable expanding map of degree k, with Holderderivative. Show that the following conditions are equivalent:

(a) f and g are conjugated by some diffeomorphism;

(b) f and g are conjugated by some absolutely continuous homeomorphismwhose inverse is also absolutely continuous;

(c) (gn)′(p) = kn for every p ∈ Fix(fn)

12.3 Decay of correlations

Let f : M → M be a topologically exact expanding map and ϕ : M → Rbe a Holder potential. As before, we denote by ν the reference measure (Sec-tion 12.1.1) and by µ the equilibrium state (Section 12.1.4) of the potential ϕ.Recall that µ = hν, where the function h is bounded from zero and infinity(Corollary 12.1.9). In particular, L1(µ) = L1(ν).

Given b > 0 and β > 0, we say that a function g : M → R is (b, β)-Holder if

|g(x) − g(y)| ≤ bd(x, y)β for any x, y ∈M. (12.3.1)

We say that g is β-Holder if it is (b, β)-Holder for some b > 0. Then, we denoteby Hβ(g) the smallest of such constants b. Moreover, fixing ρ > 0 as in (11.2.1),we denote by Hβ,ρ(g) the smallest constant b such that the inequality in (12.3.1)holds for any x, y ∈M with d(x, y) < ρ.


The correlations sequence of two functions g1 and g2, with respect to theinvariant measure µ, was defined in (7.1.1):

Cn(g1, g2) =∣∣∫

(g1 fn)g2 dµ−∫g1 dµ

∫g2 dµ

∣∣.

We also consider a similar notion for the reference measure ν:

Bn(g1, g2) =∣∣∫

(g1 fn)g2 dν −∫g1 dµ

∫g2 dν

∣∣.

In this section we prove that these sequences decay exponentially.

Theorem 12.3.1 (Exponential convergence to equilibrium). Given β ∈ (0, α],there exists Λ < 1 and for every β-Holder function g2 : M → C there existsK1(g2) > 0 such that

Bn(g1, g2) ≤ K1(g2)Λn

∫|g1| dν for every g1 ∈ L1(ν) and every n ≥ 1.

The proof is presented in Sections 12.3.1 through 12.3.3. It provides anexplicit expression for the factor K1(g2). Observe also that

Bn(g1, g2) =∣∣∫g1 d

(fn∗ (g2ν)

)−∫g1 d

(µ

∫g2 dν

)∣∣.

Then, the conclusion of Theorem 12.3.1 may be interpreted as follows: the iter-ates of any measure of the form g2ν converge to the invariant measure µ

∫g2 dν

exponentially fast.

Theorem 12.3.2 (Exponential decay of correlations). For every β ∈ (0, α]there exists Λ < 1 and for every β-Holder function g2 : M → C there existsK2(g2) > 0 such that

Cn(g1, g2) ≤ K2(g2)Λn

∫|g1| dµ for every g1 ∈ L1(µ) and every n ≥ 1.

In particular, given any pair g1 and g2 of β-Holder functions, there existsK(g1, g2) > 0 such that Cn(g1, g2) ≤ K(g1, g2)Λ

n for every n ≥ 1.

Proof. Recall that µ = hν and, according to Corollary 12.1.9, the functionh is α-Holder and satisfies K−1

5 ≤ h ≤ K5 for some K5 > 0. Hence (seeExercise 12.3.5), g2 is β-Holder if and only if g2h is β-Holder. Moreover,

Cn(g1, g2) =

∫(g1 fn)g2 dµ−

∫g1 dµ

∫g2 dµ

=

∫(g1 fn)(g2h) dν −

∫g1 dµ

∫(g2h) dν = Bn(g1, g2h).

Therefore, it follows from Theorem 12.3.1 that

Cn(g1, g2) ≤ K1(g2h)Λn

∫|g1| dν ≤ K1(g2h)/K5Λ

n

∫|g1| dµ.


This proves the first part of the theorem, with K2(g2) = K1(g2h)/K5. Thesecond part is an immediate consequence: if g1 is β-Holder then g1 ∈ L1(µ) andit suffices to take K(g1, g2) = K2(g2)

∫|g1| dµ.

Before we move to prove Theorem 12.3.1, let us make a few quick comments.The issue of decay of correlations was already discussed in Section 7.4, from theviewpoint of the spectral gap property. Here we introduce a different approach.The proof of the theorem that we are going to present is based on the notionof projective distance associated with a cone, which was introduced by GarretBirkhoff [Bir67]. This tool permits to obtain exponential convergence to equi-librium (which yields exponential decay of correlations, as we have just shown)without having to analyze the spectrum of the transfer operator. Incidentally,this can also be used to deduce that the spectral gap property does hold in thepresent context. We will come back to this theme near the end of Section 12.3.

12.3.1 Projective distances

Let E be a Banach space. We call cone any convex subset C of E such that

tC ⊂ C for every t > 0 and C ∩ (−C) = 0, (12.3.2)

where C denotes the closure of C (previously we considered only closed conesbut at this point it is convenient to loosen that requirement). Given v1, v2 ∈ C,define

α(v1, v2) = supt > 0 : v2 − tv1 ∈ C and β(v1, v2) = infs > 0 : sv1 − v2 ∈ C.

Figure 12.1 helps understand the geometric meaning of these numbers. Byconvention, α(v1, v2) = 0 if v2 − tv1 /∈ C for every t > 0 and β(v1, v2) = +∞ ifsv1 − v2 /∈ C for every s > 0.

0

v1

v2

v2 − α(v1, v2)v1

v1 − β(v1, v2)−1v2

C

Figure 12.1: Defining the projective distance in a cone C

Note that α(v1, v2) is always finite. Indeed, α(v1, v2) = +∞ would meanthat there exists a sequence (tn)n → +∞ with v2 − tnv1 ∈ C for every n. Then,sn = 1/tn would be a sequence of positive numbers converging to zero and such


that snv2 − v1 ∈ C for every n. This would imply that −v1 ∈ C, which wouldcontradict the second condition in (12.3.2). An analogous argument shows thatβ(v1, v2) is always positive: β(v1, v2) = 0 would imply −v2 ∈ C.

Given any cone C ⊂ E and any v1, v2 ∈ C \ 0, we define

θ(v1, v2) = logβ(v1, v2)

α(v1, v2), (12.3.3)

with θ(v1, v2) = +∞ whenever α(v1, v2) = 0 or β(v1, v2) = +∞. The remarksin the previous paragraph ensure that θ(v1, v2) is always well defined. We call θthe projective distance associated with the cone C. This terminology is justifiedby the proposition that follows, which shows that θ defines a distance in theprojective quotient of C \ 0, that is, in the set of equivalence classes of therelation ∼ defined by v1 ∼ v2 ⇔ v1 = tv2 for some t > 0.

Proposition 12.3.3. If C is a cone then

(a) θ(v1, v2) = θ(v2, v1) for any v1, v2 ∈ C;

(b) θ(v1, v2) + θ(v2, v3) ≥ θ(v1, v3) for any v1, v2, v3 ∈ C;

(c) θ(v1, v2) ≥ 0 for any v1, v2 ∈ C;

(d) θ(v1, v2) = 0 if and only if there exists t > 0 such that v1 = tv2;

(e) θ(t1v1, t2v2) = θ(v1, v2) for any v1, v2 ∈ C and t1, t2 > 0.

Proof. If α(v2, v1) > 0 then

α(v2, v1) = supt > 0 : v1 − tv2 ∈ C = supt > 0 :1

tv1 − v2 ∈ C

=(infs > 0 : sv1 − v2 ∈ C

)−1= β(v1, v2)

−1.

Moreover,

α(v2, v1) = 0 ⇔ v1 − tv2 /∈ C for every t > 0

⇔ sv1 − v2 /∈ C for every s > 0 ⇔ β(v1, v2) = +∞.

Therefore, α(v2, v1) = β(v1, v2)−1 in all the cases. Exchanging the roles of v1

and v2, we also get that β(v2, v1) = α(v1, v2)−1 for any v1, v2 ∈ C. Part (a) of

the proposition is an immediate consequence of these observations.Next, we claim that α(v1, v2)α(v2, v3) ≤ α(v1, v3) for any v1, v2, v3 ∈ C.

This is obvious if α(v1, v2) = 0 or α(v2, v3) = 0; therefore, we may supposethat α(v1, v2) > 0 and α(v2, v3) > 0. Then, by definition, there exist increasingsequences of positive numbers (rn)n → α(v1, v2) and (sn)n → α(v2, v3) suchthat

v2 − rnv1 ∈ C and v3 − snv2 ∈ C for every n ≥ 1.

Since C is convex, it follows that v3 − snrnv1 ∈ C and so snrn ≤ α(v1, v3), forevery n ≥ 1. Passing to the limit as n → +∞, we get the claim. An analogous


argument shows that β(v1, v2)β(v2, v3) ≥ β(v1, v3) for any v1, v2, v3 ∈ C. Part(b) of the proposition follows immediately from these inequalities.

Part (c) means, simply, that α(v1, v2) ≤ β(v1, v2) for any v1, v2 ∈ C. Toprove this fact, consider t > 0 and s > 0 such that v2−tv1 ∈ C and sv1−v2 ∈ C.Then, by convexity, (s − t)v1 ∈ C. If s − t were negative, then we would have−v1 ∈ C, which would contradict the last part of (12.3.2). Therefore, s ≥ t forany t and s as above. This implies that α(v1, v2) ≤ β(v1, v2).

Let v1, v2 ∈ C be such that θ(v1, v2) = 0. Then, α(v1, v2) = β(v1, v2) = γfor some γ ∈ (0,+∞). Hence, there exist an increasing sequence (tn)n → γ anda decreasing sequence (sn)n → γ with

v2 − tnv1 ∈ C and snv1 − v2 ∈ C for every n ≥ 1.

Writing v2 − tnv1 = (v2 − γv1) + (γ − tn)v1, we conclude that v2 − γv1 is in theclosure C of C. Analogously, γv1 − v2 ∈ C. By the second part of (12.3.2), itfollows that v2 − γv1 = 0. This proves part (d) of the proposition.

Finally, consider any t1, t2 > 0 and v1, v2 ∈ C. By definition,

α(t1v1, t2v2) =t2t1α(v1, v2) and β(t1v1, t2v2) =

t2t1β(v1, v2).

Hence, θ(t1v1, t2v2) = θ(v1, v2), as stated in part (e) of the proposition.

Example 12.3.4. Consider the cone C = (x, y) ∈ E : y > |x| in E = R2.The projective quotient of C may be identified with the interval (−1, 1), through(x, 1) 7→ x. Given −1 < x1 ≤ x2 < 1, we have:

α((x1, 1), (x2, 1)) = supt > 0 : (x2, 1) − t(x1, 1) ∈ C

= supt > 0 : 1 − t ≥ |x2 − tx1| =1 − x2

1 − x1

and β((x1, 1), (x2, 1)) =x2 + 1

x1 + 1.

Therefore,

θ((x1, 1), (x2, 1)) = logR(−1, x1, x2, 1), (12.3.4)

where

R(a, b, c, d) =(c− a)(d− b)

(b − a)(d− c)

denotes the cross-ratio of four positive numbers a < b ≤ c < d.

In Exercise 12.3.2 we invite the reader to check a similar fact when theinterval is replaced by the unit disk D = z ∈ C : |z| < 1.

Example 12.3.5. Let E = C0(M) be the space of continuous functions on acompact metric space M . Consider the cone C+ = g ∈ E : g(x) > 0 for x ∈


M. For any g1, g2 ∈ C+,

α(g1, g2) = supt > 0 : (g2 − tg1)(x) > 0 for every x ∈M

= infg2g1

(x) : x ∈M

and β(g1, g2) = supg2g1

(x) : x ∈M.

Therefore,

θ(g1, g2) = logsup(g2/g1)

inf(g2/g1)= log sup

g2(x)g1(y)g1(x)g2(y)

: x, y ∈M. (12.3.5)

This projective distance is complete (Exercise 12.3.3) but that is not always thecase (Exercise 12.3.4).

Next, we observe that the projective distance depends monotonically on thecone. Indeed, let C1 and C2 be two cones with C1 ⊂ C2 and let αi(·, ·), βi(·, ·),θi(·, ·), i = 1, 2 be the corresponding functions, as defined previously. It is clearfrom the definitions that, given any v1, v2 ∈ C2,

α1(v1, v2) ≤ α2(v1, v2) and β1(v1, v2) ≥ β2(v1, v2)

and, consequently, θ1(v1, v2) ≥ θ2(v1, v2).More generally, let C1 and C2 be cones in Banach spaces E1 and E2, re-

spectively, and let L : E1 → E2 be a linear operator such that L(C1) ⊂ C2.Then,

α1(v1, v2) = supt > 0 : v2 − tv1 ∈ C1≤ supt > 0 : L(v2 − tv1) ∈ C2= supt > 0 : L(v2) − tL(v1) ∈ C2 = α2(L(v1), L(v2))

and, analogously, β1(v1, v2) ≥ β2(L(v1), L(v2)). Consequently,

θ2(L(v1), L(v2)) ≤ θ1(v1, v2) for any v1, v2 ∈ C1. (12.3.6)

Of course, the inequality (12.3.6) is not strict, in general. However, accordingto the next proposition, one does have a strict inequality whenever L(C1) hasfinite θ2-diameter in C2; actually, in that case L is a contraction with respect tothe projective distances θ1 and θ2. Recall that the hyperbolic tangent is definedby

tanhx =1 − e−2x

1 + e−2xfor every x ∈ R.

Keep in mind that the function tanh takes values in the interval (0, 1).

Proposition 12.3.6. Let C1 and C2 be cones in Banach spaces E1 and E2,respectively, and let L : E1 → E2 be a linear operator such that L(C1) ⊂ C2.Suppose that D = supθ2(L(v1), L(v2)) : v1, v2 ∈ C1 is finite. Then,

θ2(L(v1), L(v2)) ≤ tanh(D

4

)θ1(v1, v2) for any v1, v2 ∈ C.


Proof. Let v1, v2 ∈ C1. It is no restriction to suppose that α1(v1, v2) > 0 andβ1(v1, v2) < +∞: otherwise θ1(v1, v2) = +∞ and there is nothing to prove.Then, there exist an increasing sequence (tn)n → α1(v1, v2) and a decreasingsequence (sn)n → β1(v1, v2) such that

v2 − tn v1 ∈ C1 and sn v1 − v2 ∈ C1 .

In particular, θ2(L(v2 − tnv1), L(snv1 − v2)) ≤ D for every n ≥ 1. Fix anyD0 > D. Then, we may choose positive numbers Tn and Sn such that

L(sn v1 − v2) − TnL(v2 − tnv1) ∈ C2 and

Sn L(v2 − tnv1) − L(sn v1 − v2) ∈ C2

(12.3.7)

and log(Sn/Tn) ≤ D0 for every n ≥ 1. The first part of (12.3.7) gives that

(sn + tnTn)L(v1) − (1 + Tn)L(v2) ∈ C2

and, by definition of β2(·, ·), this implies that

β2(L(v1), L(v2)) ≤sn + tnTn

1 + Tn.

Analogously, the second part of (12.3.7) implies that

α2(L(v1), L(v2)) ≥sn + tnSn

1 + Sn.

Therefore, θ2(L(v1), L(v2)) can not exceed

log

(sn + tnTn

1 + Tn· 1 + Snsn + tnSn

)= log

(sn/tn + Tn

1 + Tn· 1 + Snsn/tn + Sn

).

The last term may be rewritten as

log

(sntn

+ Tn

)− log(1 + Tn) − log

(sntn

+ Sn

)+ log(1 + Sn) =

=

∫ log(sn/tn)

0

(ex dx

ex + Tn− ex dx

ex + Sn

),

and this expression is less or equal than

supx>0

ex(Sn − Tn)

(ex + Tn)(ex + Sn)log

(sntn

).

Now we use the following elementary facts:

supy>0

y(Sn − Tn)

(y + Tn)(y + Sn)=

1 −√Tn/Sn

1 +√Tn/Sn

≤ 1 − e−D0/2

1 + e−D0/2= tanh

D0

4.


Indeed, the supremum is attained for y =√SnTn and the inequality is a conse-

quence of the fact that log(Sn/Tn) ≤ D0. This proves that

θ2(L(v1), L(v2)) ≤ tanh

(D0

4

)log

(sntn

).

Note also that θ(v1, v2) = limn log(sn/tn), due to our choice of sn and tn. Hence,passing to the limit when n → ∞ and then making D0 → D, we obtain theconclusion of the proposition.

Example 12.3.7. Let C+ be the cone of positive continuous functions in M .For each L > 1, let C(L) = g ∈ C+ : sup |g| ≤ L inf |g|. Then, C(L) has finitediameter in C+, for every L > 1. Indeed, we have seen in Example 12.3.5 thatthe projective distance θ associated with C+ is given by

θ(g1, g2) = log supg2(x)g1(y)g1(x)g2(y)

: x, y ∈M.

In particular, θ(g1, g2) ≤ 2 logL for any g1, g2 ∈ C(L).

12.3.2 Cones of Holder functions

Let f : M → M be a topologically exact expanding map and ρ > 0 and σ > 1be the constants in the definition (11.2.1). Let L : C0(M) → C0(M) be thetransfer operator associated with a Holder potential ϕ : M →M . Fix constantsK0 > 0 and α > 0 such that

|ϕ(x) − ϕ(y)| ≤ K0d(x, y)α for any x, y ∈M.

Given b > 0 and β > 0, we denote by C(b, β) the set of positive functionsg ∈ C0(M) whose logarithm is (b, β)-Holder on balls of radius ρ, that is, suchthat

| log g(x) − log g(y)| ≤ bd(x, y)β whenever d(x, y) < ρ. (12.3.8)

Lemma 12.3.8. For any b > 0 and β > 0, the set C(b, β) is a cone in the spaceE = C0(M) and the corresponding projective distance is given by

θ(g1, g2) = logβ(g1, g2)

α(g1, g2)

where α(g1, g2) is the infimum and β(g1, g2) is the supremum of the set

g2g1

(x),exp(bd(x, y)β)g2(x) − g2(y)

exp(bd(x, y)β)g1(x) − g1(y): x 6= y and d(x, y) < ρ

.

Proof. It is clear that g ∈ C implies tg ∈ C for every t > 0. Moreover, theclosure of C is contained in the set of non-negative functions and so −C ∩ Ccontains only the zero function. Now, to conclude that C is a cone, we only


have to check that it is convex. Consider any g1, g2 ∈ C(b, β). The definition(12.3.8) means that

exp(−bd(x, y)β) ≤ gi(x)

gi(y)≤ exp(bd(x, y)β)

for i = 1, 2 and any x, y ∈M with d(x, y) < ρ. Then, given t1, t2 > 0,

exp(−bd(x, y)β) ≤ t1g1(x) + t2g2(x)

t1g1(y) + t2g2(y)≤ exp(bd(x, y)β)

for any x, y ∈M with d(x, y) < ρ. Hence, t1g1 + t2g2 is in C(b, β).We proceed to calculate the projective distance. By definition, α(g1, g2) is

the supremum of all the numbers t > 0 satisfying the following three conditions:

(g2 − tg1)(x) > 0 ⇔ t <g2g1

(x)

(g2 − tg1)(x)

(g2 − tg1)(y)≤ exp(bd(x, y)β) ⇔ t ≤ exp(bd(x, y)β)g2(y) − g2(x)

exp(bd(x, y)β)g1(y) − g1(x)

(g2 − tg1)(x)

(g2 − tg1)(y)≥ exp(−bd(x, y)β) ⇔ t ≤ exp(bd(x, y)β)g2(x) − g2(y)

exp(bd(x, y)β)g1(x) − g1(y)

for any x, y ∈M with x 6= y and d(x, y) < ρ. Hence, α(g1, g2) is equal to

infg2(x)g1(x)

,exp(bd(x, y)β)g2(x) − g2(y)

exp(bd(x, y)β)g1(x) − g1(y): x 6= y and d(x, y) < ρ

.

Analogously, β(g1, g2) is the supremum of this same set.

The crucial fact that makes the proof of Theorem 12.3.1 work is that thetransfer operator tends to improve the regularity of functions, more precisely,their Holder constants. The next proposition is a concrete manifestation of thisfact:

Lemma 12.3.9. For each β ∈ (0, α] there exists a constant λ0 ∈ (0, 1) suchthat L(C(b, β)) ⊂ C(λ0b, β) for every b sufficiently large (depending on β).

Proof. It follows directly from the expression of the transfer operator in (12.1.1)that Lg is positive whenever g is positive. Therefore, we only have to checkthe second condition in the definition of C(λ0b, β). Consider y1, y2 ∈ M withd(y1, y2) < ρ. The expression (12.1.2) gives that

Lg(yi) =k∑

j=1

eϕ(xi,j)g(xi,j)

for i = 1, 2, where the points xi,j ∈ f−1(yi) satisfy d(x1i, x2i) ≤ σ−1d(y1, y2)for every 1 ≤ j ≤ k. By hypothesis, ϕ is (K0, α)-Holder. Since we suppose that


β ≤ α, it follows that ϕ is (K,β)-Holder, with K = K0(diamM)α−β . Therefore,

(Lg)(y1) =

k∑

i=1

eϕ(x1,i)g(x1,i) =

k∑

i=1

eϕ(x2,i)g(x2,i)g(x1,i)e

ϕ(x1,i)

g(x2,i)eϕ(x2,i)

≤k∑

i=1

eϕ(x2,i)g(x2,i) exp(bd(x1,i, x2,i)

β +Kd(x1,i, x2,i)β)

≤ (Lg)(y2) exp((b+K)σ−βd(y1, y2)

β)

for every g ∈ C(b, β). Fix λ0 ∈ (σ−β , 1). For every b sufficiently large, (b +K)σ−β ≤ bλ0. Then, the previous relation gives that

(Lg)(y1) ≤ (Lg)(y2) exp(λ0bd(y1, y2)β),

for any y1, y2 ∈ M with d(y1, y2) < ρ. Exchanging the roles of y1 and y2, weobtain the other inequality.

Next, we use the family of cones C(L) introduced in Example 12.3.7:

Lemma 12.3.10. There exists N ≥ 1 and for every β > 0 and every b > 0there exists L > 1 satisfying LN (C(b, β)) ⊂ C(L).

Proof. By hypothesis, f is topologically exact. Hence, there exists N ≥ 1such that fN(B(z, ρ)) = M for every z ∈ M . Fix N once and for all. Giveng ∈ C(b, β), consider any point z ∈ M such that g(z) = sup g. Considery1, y2 ∈M . On the one hand,

LNg(y1) =∑

x∈f−N (y1)

eϕN(x)g(x) ≤ degree(fN )eN sup |ϕ|g(z).

On the other hand, by the choice of N , there exists x ∈ B(z, ρ) such thatfN(x) = y2. Then,

LNg(y2) ≥ eϕN(x)g(x) ≥ e−N sup |ϕ|e−bd(x,z)β

g(z) ≥ e−N sup |ϕ|−bρβ

g(z).

Since y1 and y2 are arbitrary, this proves that

supLNginf LNg ≤ degree(fN)e2N sup |ϕ|+bρβ

.

Now it suffices to take L equal to the expression on the right-hand side of thisinequality.

Combining Lemmas 12.3.9 and 12.3.10, we get that there exists N ≥ 1 and,given β ∈ (0, α] there exists λ0 ∈ (0, 1) such that, for every b > 0 sufficientlylarge (depending on N and β) there exists L > 1, satisfying

LN (C(b, β)) ⊂ C(λN0 b, β) ∩ C(L). (12.3.9)

In what follows, we write C(c, β,R) = C(c, β) ∩ C(R) for any c > 0, β > 0 andR > 1.


Lemma 12.3.11. For every c ∈ (0, b) and R > 1, the set C(c, β,R) ⊂ C(b, β)has finite diameter with respect to the projective distance of the cone C(b, β).

Proof. We use the expression of θ given by Lemma 12.3.8. On the one hand,the hypothesis that g1, g2 ∈ C(c, β) ensures that

exp(bd(x, y)β

)g2(x) − g2(y)

exp(bd(x, y)β

)g1(x) − g1(y)

=g2(x)

g1(x)

1 − exp(− bd(x, y)β

)(g2(y)/g2(x)

)

1 − exp(− bd(x, y)β

)(g1(y)/g1(x)

)

≥ g2(x)

g1(x)

1 − exp(− (b − c))d(x, y)β)

1 − exp(− (b + c)d(x, y)β

)

≥ g2(x)

g1(x)

1 − exp(− (b− c)ρβ)

1 − exp(− (b + c)ρβ

)

for any x, y ∈M with d(x, y) < ρ. Denote by r the value of the last fraction onthe right-hand side. Then, observing that r ∈ (0, 1),

α(g1, g2) ≥ infg2(x)g1(x)

, rg2(x)

g1(x): x ∈M

= r inf

g2(x)g1(x)

: x ∈M≥ r

inf g2sup g1

.

Analogously,

β(g1, g2) ≤ supg2(x)g1(x)

,1

r

g2(x)

g1(x): x ∈M

=

1

rsup

g2(x)g1(x)

: x ∈M≤ 1

r

sup g2inf g1

.

On the other hand, the hypothesis that g1, g2 ∈ C(R) gives that

sup g2inf g1

≤ R2 inf g2sup g1

.

Combining these three inequalities, we conclude that θ(g1, g2) ≤ log(R2/r2) forany g1, g2 ∈ C(c, β,R).

Corollary 12.3.12. There exists N ≥ 1 such that for every β ∈ (0, α] andevery b > 0 sufficiently large there exists Λ0 < 1 satisfying

θ(LNg1,LNg2) ≤ Λ0θ(g1, g2) for any g1, g2 ∈ C(b, β).

Proof. Take N ≥ 1, λ0 ∈ (0, 1) and L > 1 as in (12.3.9) and consider

c = λN0 b and R = L. (12.3.10)

Then, LN (C(b, β)) ⊂ C(c, β,R) and it follows from Lemma 12.3.11 that thediameter D of the image LN (C(b, β)) with respect to the projective distanceθ is finite. Take Λ0 = tanh(D/4). Now the conclusion of the corollary is animmediate application of Proposition 12.3.6.


12.3.3 Exponential convergence

Fix N ≥ 1 and β ∈ (0, α] and b > 0 and L > 1 as in Corollary 12.3.12 andthen consider c > 0 and R > 1 given by (12.3.10). As before, we denoteby h the positive eigenfunction (Lemma 12.1.11) and by λ the spectral radius(Corollary 12.1.15) of the transfer operator L. Recall that h is α-Holder andbounded from zero and infinity. Therefore, up to increasing the constants b andL, if necessary, we may assume that h ∈ C(c, β,R).

The next lemma follows directly from the previous considerations and is asignificant step toward the estimate in Theorem 12.3.1. We continue denotingby ‖ · ‖ the norm defined in C0(M) by ‖φ‖ = sup|φ(x)| : x ∈M.

Lemma 12.3.13. There exists C > 0 and Λ ∈ (0, 1) such that

‖λ−nLng − h

∫g dν‖ ≤ CΛn

∫g dν for g ∈ C(c, β,R) and n ≥ 1.

Proof. Let g ∈ C(c, β,R). In particular, g > 0 and so∫g dν > 0. The conclu-

sion of the lemma is not affected when we multiply g by any positive number.Hence, it is no restriction to suppose that

∫g dν = 1. Then,

∫λ−nLng dν =

∫λ−ng d(L∗nν) =

∫g dν = 1 =

∫h dν

and, hence, inf(λ−nLng/h) ≤ 1 ≤ sup(λ−nLng/h) for every n ≥ 1. Now, itfollows from the expressions in Lemma 12.3.8 that

α(λ−jNLjNg, h) ≤ infλ−jNLjNg

h≤ 1

β(λ−jNLjNg, h) ≥ supλ−jNLjNg

h≥ 1.

Consequently,

θ(λ−jNLjNg, h) ≥ log β(λ−jNLjNg, h) ≥ log supλ−jNLjNg

h

θ(λ−jNLjNg, h) ≥ − logα(λ−jNLjNg, h) ≥ − log infλ−jNLjNg

h

for every j ≥ 0. Now let D be the diameter of C(c, β,R) with respect tothe projective distance θ (Lemma 12.3.11). By Proposition 12.3.3 and Corol-lary 12.3.12,

θ(λ−jNLjNg, h) = θ(LjN g,LjNh) ≤ Λj0 θ(g, h) ≤ Λj0D

for every j ≥ 0. Combining this with the previous two inequalities,

exp(−Λj0D) ≤ infλ−jNLjNg

h≤ sup

λ−jNLjNgh

≤ exp(Λj0D)


for every j ≥ 0. Fix C1 > 0 such that |ex− 1| ≤ C1|x| whenever |x| ≤ D. Then,the previous relation implies that

∣∣λ−jNLjNg(x) − h(x)∣∣ ≤ h(x)C1Λ

j0D for every x ∈M and j ≥ 0. (12.3.11)

Take C2 = C1D suph and Λ = Λ1/N0 . The inequality (12.3.11) means that

‖λ−jNLjNg − h‖ ≤ C2ΛjN for every j ≥ 1.

Given any n ≥ 1, write n = jN + r with j ≥ 0 and 0 ≤ r < N . Since thetransfer operator L : C0(M) → C0(M) is continuous and Lh = λh,

‖λ−nLng − h‖ = ‖λ−rLr(λ−jNLjNg − h)‖ ≤ (‖L‖/λ)r ‖λ−jNLjN g − h‖.

Combining the last two inequalities,

‖λ−nLng − h‖ ≤ (‖L‖/λ)rC2Λn−r.

This proves the conclusion of the lemma, as long as we take C ≥ C2(‖L‖/(λΛ))r

for every 0 ≤ r < N .

Now we are ready to prove Theorem 12.3.1:

Proof. Start by considering g2 ∈ C(c, β,R). Using the identity in Exercise 12.1.4and recalling that µ = hν,

Bn(g1, g2) =∣∣∫g1

(λ−nLng2 − h

∫g2 dν

)dν∣∣

≤∥∥λ−nLng2 − h

∫g2 dν

∥∥∫

|g1| dν.

Therefore, using Lemma 12.3.13,

Bn(g1, g2) ≤ CΛn∫

|g1| dν∫g2 dν. (12.3.12)

Now let g2 : M → R be any β-Holder function and H = Hβ(g2). Writeg2 = g+

2 − g−2 with

g+2 =

1

2(|g2| + g2) +B and g−2 =

1

2(|g2| − g2) +B

and B is defined by B = maxH/c, sup |g2|/(R − 1). It is clear that thefunctions g±2 are positive: g±2 ≥ B > 0. Moreover, they are (H,β)-Holder:

|g±2 (x) − g±2 (y)| ≤ |g2(x) − g2(y)| ≤ Hd(x, y)β ,

for x, y ∈M . Hence, using the mean value theorem and the fact that B ≥ H/c,

∣∣ log g±2 (x) − log g±2 (y)∣∣ ≤ |g±2 (x) − g±2 (y)|

B≤ Hd(x, y)β

B≤ cd(x, y)β .


Moreover, since B ≥ sup |g2|/(R− 1),

sup g±2 ≤ sup |g2| +B ≤ RB ≤ R inf g±2 .

Together with the previous relation, this means that g±2 ∈ C(c, β,R). Then, wemay apply (12.3.12) to both functions:

Bn(g1, g±2 ) ≤ CΛn

∫|g1| dν

∫g±2 dν

and, consequently,

Bn(g1, g2) ≤ Bn(g1, g+2 ) +Bn(g1, g

−2 )

≤ CΛn∫

|g1| dν∫

(g+2 + g−2 ) dν.

(12.3.13)

Moreover, by the definition of g±2 ,

∫(g+

2 + g−2 ) dν =

∫|g2| dν + 2B ≤

∫|g2| dν +

2H

c+

2 sup |g2|R− 1

≤ 2

cHβ(g2) +

R + 1

R − 1sup |g2|.

(12.3.14)

Take C1 = Cmax2/c, (R+ 1)/(R− 1) and define

K1(g2) = 2C1

(sup |g2| +Hβ(g2)

).

The relations (12.3.13) and (12.3.14) give that

Bn(g1, g2) ≤ C1Λn

∫|g1| dν

(Hβ(g2) + sup |g2|

)≤ 1

2K1(g2)Λ

n

∫|g1| dν.

This closes the proof of the theorem in the case when g2 is a real function.The general (complex) case follows easily. Note that K1(ℜg2) ≤ K1(g2),

because sup |ℜg2| ≤ sup |g2| and Hβ(ℜg2) ≤ Hβ(g2). Analogously, K1(ℑg2) ≤K1(g2). Therefore, the previous argument yields

Bn(g1, g2) ≤ Bn(g1,ℜg2) +Bn(g1,ℑg2) ≤1

2

(K1(ℜg2) +K1(ℑg2)

)Λn∫

|g1| dν

≤ K1(g2)Λn

∫|g1| dν.

This completes the proof of Theorem 12.3.1.

We close this section with a few comments about the spectral gap property.Let Cβ(M) be the vector space of β-Holder functions g : M → C. We leave itto the reader (Exercise 12.3.6) to check the following facts:

(i) The function ‖g‖β,ρ = sup |g| +Hβ,ρ(g) is a complete norm in Cβ(M).


(ii) Cβ(M) is invariant under the transfer operator: L(Cβ(M)) ⊂ Cβ(M).

(iii) The restriction L : Cβ(M) → Cβ(M) is continuous with respect to thenorm ‖ · ‖β,ρ.

Note that h ∈ Cβ(M), since β ≤ α. Define V = g ∈ Cβ(M) :∫g dν = 0.

Then Cβ(M) = V ⊕Ch, because every function g ∈ Cβ(M) may be decomposed,in a unique way, as the sum of a function in V with a multiple of h:

g =(g − h

∫g dν) + h

∫g dν.

Moreover, the direct sum Cβ(M) = V ⊕ Ch is invariant under the transferoperator. Indeed, if g ∈ V then

g ∈ V ⇒∫

Lg dν =

∫gdL∗ν = λ

∫g dν = 0 ⇒ Lg ∈ V.

It follows that the spectrum of L : Cβ(M) → Cβ(M) is the union of λ withthe restriction of L to the hyperplane V . In Exercise 12.3.8 we invite the readerto show that the spectral radius of L | V is strictly less than λ. Consequently,L : Cβ(M) → Cβ(M) has the spectral gap property.

The book of Viviane Baladi [Bal00] contains an in-depth presentation of thespectral theory of transfer operators and its connections to the issue of decay ofcorrelations, for differentiable (or piecewise differentiable) expanding maps andalso for uniformly hyperbolic diffeomorphisms.

12.3.4 Exercises

12.3.1. Show that the cross-ratio R(a, x, y, b) is invariant under every Mobiusautomorphism of the real line, that is, R(φ(a), φ(b), φ(c), φ(d)) = R(a, b, c, d) forany a < b ≤ c < d and every transformation of the form φ(x) = (αx+β)/(γx+δ)with αδ − βγ 6= 0.

12.3.2. Consider the cone C = (z, s) ∈ C×R : s > |z|. Its projective quotientmay be identified with the unit disk D = z ∈ C : |z| < 1 through (z, 1) 7→ z.Let d be the distance induced in D, through this identification, by the projectivedistance of C. Show that d coincides with the Cayley-Klein distance ∆, whichis defined by

∆(p, q) = log|aq| |pb||ap| |bq| , for p, q ∈ D,

where a and b are the points where the straight line through p and q intersectsthe boundary of the disk, denoted in such a way that p is between a and q andq is between p and b. [Observation: The Cayley-Klein is related to the Poincaredistance in the disk through the map z 7→ (2z)/1 + |z|2.]


12.3.3. Show that the projective distance associated with the cone C+ in Ex-ample 12.3.7 is complete, in the following sense: with respect to the projectivedistance, every Cauchy sequence (gn)n converges to some element of C+. More-over, if we normalize the functions (for example, fixing any probability measureη on M and requiring that

∫gn dη = 1 =

∫g dη for every n), then (gn)n con-

verges uniformly to g.

12.3.4. Let M be a compact manifold and C1 be the cone of positive differ-entiable functions in M . Show that the corresponding projective distance θ1 isnot complete.

12.3.5. Check that if g1, g2 : M → R are β-Holder functions, θ : M → M is aL-Lipschitz transformation and η is a probability measure on M then:

(a) Hβ(g1g2) ≤ sup |g1|Hβ(g2) + sup |g2|Hβ(g1);

(b)∫|g1| dη ≤ sup |g1| ≤

∫|g1| dη +Hβ(g1)(diamM)β ;

(c) Hβ(g θ) ≤ LβHβ(g).

Moreover, the claim in (a) remains true if we replace Hβ by Hβ,ρ. The sameholds for the claim in (c), as long as L ≤ 1.

12.3.6. Let Cβ(M) be the vector space of β-Holder functions on a compactmetric space M . Prove the properties (i), (ii), (iii) stated at the end of Sec-tion 12.3.

12.3.7. Endow Cβ(M) with the norm ‖ · ‖β,ρ. Let L : Cβ(M) → Cβ(M)be the transfer operator associated with an α-Holder potential ϕ : M → R,with α ≥ β. Let λ be the spectral radius, ν be the reference measure, hbe the eigenfunction and µ = hν be the equilibrium state of the potentialϕ. Consider the transfer operator P : Cβ(M) → Cβ(M) associated with thepotential ψ = ϕ+ log h− log h f − logλ.

(a) Check that L is linearly conjugate to λP , and so spec(L) = λ spec(P).Moreover, P1 = 1 and P∗µ = µ.

(b) Show that∫|Png| dµ ≤

∫|g| dµ and sup |Png| ≤ sup |g| and there exist

constants C > 0 and τ < 1 such that Hβ,ρ(Png) ≤ τnHβ,ρ(g) + C sup |g|for every g ∈ Cβ(M) and every n ≥ 1.

12.3.8. The goal of this exercise is to prove that the spectral radius of therestriction of L to the hyperplane V = g ∈ Cβ(M) :

∫g dν = 0 is strictly less

than λ. By part (a) of Exercise 12.3.7, it is enough to consider the case L = P(with λ = 1 and ν = µ and h = 1). Fix b, β,R as in Corollary 12.3.12.

(a) Show that there exist K > 1 and r > 0 such that, for every v ∈ V with‖v‖β,ρ ≤ r, the function g = 1 + v is in the cone C(b, β,R) and satisfies

K−1‖v‖β,ρ ≤ θ(1, g) ≤ K‖v‖β,ρ.

(b) Use Corollary 12.3.12 and the previous item to find C > 0 and τ < 1 suchthat ‖Pnv‖β,ρ ≤ Cτn‖v‖β,ρ for every v ∈ V . Deduce that the spectralradius of P | V is less or equal than τ < 1.

12.4. DIMENSION OF CONFORMAL REPELLERS 425

12.4 Dimension of conformal repellers

In this section we present an application of the theory developed previously tothe calculation of the Hausdorff dimension of certain invariant sets of expand-ing maps, that we call conformal repellers. The main result (Theorem 12.4.3)contains a formula for the value of the Hausdorff dimension of the repeller interms of the pressure of certain potentials.

Detailed presentations of the theory of fractal dimensions and its many ap-plications can be found in the books of Falconer [Fal90], Palis, Takens [PT93,Chapter 4], Pesin [Pes97] and Bonatti, Dıaz, Viana [BDV05, Chapter 3].

12.4.1 Hausdorff dimension

Let M be a metric space. In this section, we call cover of M any countable(possibly finite) family of subsets of M , not necessarily open, whose union isthe whole M . The diameter of a cover U is the supremum of the diameters ofits elements. For each d > 0 and δ > 0, define

md(M, δ) = inf ∑

U∈U

diam(U)d : U cover with diamU < δ. (12.4.1)

That is, we consider all possible covers of M by subsets with diameter less thanδ and we try to minimize the sum of the diameters raised to the power d. Thisnumber varies with δ in a monotonic fashion: when δ decreases, the class ofadmissible covers decreases and, thus, the infimum can only increase. We callHausdorff measure of M in dimension d the limit

md(M) = limδ→0

md(M, δ). (12.4.2)

Note that md(M) ∈ [0,∞]. Moreover, it follows directly from the definitionthat

md1(M, δ) ≤ δd1−d2md2(M, δ) for every δ > 0 and any d1 > d2 > 0.

Making δ → 0, it follows that md1(M) = 0 or md2(M) = ∞ or both. Therefore,there exists a unique d(M) ∈ [0,∞] such that md(M) = ∞ for every d < d(M)and md(M) = 0 for every d > d(M). We call d(M) the Hausdorff dimension ofthe metric space M .

Example 12.4.1. Consider the usual Cantor set K in the real line. That is,

K =

∞⋂

n=0

Kn

where K0 = [0, 1] and every Kn, n ≥ 1 is obtained by removing from eachconnected component of Kn−1 the central open subinterval with relative length


1/3. Let d0 = log 2/ log 3. We are going to show that md0(M) = 1, whichimplies that d(M) = d0.

To prove the upper bound consider, for each n ≥ 0, the cover Vn of K whoseelements are the intersections of K with each of the connected components ofKn. It is clear that the sequence (Vn)n is increasing: Vn−1 ≺ Vn for everyn ≥ 1. Note also that Vn has exactly 2n elements, all with diameter equal to3−n. Therefore,

∑

V ∈Vn

(diamV )d0 = 2n3−nd0 = 1 for every n. (12.4.3)

Since diamVn → 0 when n→ ∞, it follows that md0(M) ≤ 1 and so d(M) ≤ d0.The lower bound is a bit more difficult, because one needs to deal with

arbitrary covers. We are going to show that, given any cover U of M ,

∑

U∈U

(diamU)d0 ≥ 1. (12.4.4)

Clearly, this implies that md0(M) ≥ 1 and so d(M) ≥ d0.Let us call open segment the intersection of K with any interval of the real

line whose endpoints are not in K. It is clear that every subset ofK is containedin some open segment whose diameter is only slightly larger. Hence, given anycover U , we can always find covers U ′ whose elements are open segments suchthat

∑U ′∈U ′(diamU ′)d0 is as close to

∑U∈U (diamU)d0 as we want. So, it is no

restriction to assume from the start that the elements of U are open segments.Then, since K is compact, we may also assume that U is finite. For any opensegment U there exists n ≥ 0 such that every element of Vm, m ≥ n thatintersects U is contained in U . Since U is finite, we may choose the same n forall its elements. We claim that

∑

U∈U

(diamU)d0 ≥∑

V ∈Vn

(diamV )d0 . (12.4.5)

Clearly, (12.4.3) and (12.4.5) imply (12.4.4). We are left to prove (12.4.5).The strategy is to modify the cover U successively, in such a way that the

expression on the left-hand side of (12.4.5) never increases and one reaches thecover Vn after finitely many modifications. For each U ∈ U , let k ≥ 0 beminimum such that U intersects a unique element V of Vk. The choice of nimplies that k ≤ n: for k > n, if U intersects an element of Vk then U containsall the 2k−n elements of Vk inside the same element of Vn. Suppose that k < n.By the choice of k, the set U intersects exactly two elements of Vk+1. Let thembe denoted V1 and V2 and let U1 and U2 be their intersections with U . Then,

diamUi ≤ diamVi = 3−k−1 and diamU = diamU1 + 3−k−1 + diamU2.

Hence (Exercise 12.4.1),

(diamU)d0 ≥ (diamU1)d0 + (diamU2)

d0 .


This means that the value on the left-hand side of (12.4.5) does not increasewhen we replace U by U1 and U2 in the cover U . On the one hand, the newcover satisfies the same conditions as the original: U1 and U2 are open segments(because V1, V2 and U are open segments) and they contain every element ofVn that they intersect. On the other hand, by construction, each one of themintersects a unique element of Vk+1. Therefore, after finitely many repetitionsof this procedure we reduce the initial situation to the case when k = n for everyU ∈ U . Now, the choice of n implies that in that case each U ∈ U containsthe unique V ∈ Vn that it intersects. Observe that this means that U = V .In particular, any elements of U that correspond to the same V ∈ Vm mustcoincide. Eliminating such repetitions we obtain the cover Vn. This completesthe calculation.

In general, the Hausdorff measure of a metric space M in its dimension d(M)may take any value. In many interesting situations, including Example 12.4.1and the much more general construction that we treat in the sequel, this measureis positive and finite. But there are many other cases where it is either zero orinfinity.

12.4.2 Conformal repellers

Let D,D1, . . . , DN be compact convex subsets of an Euclidean space Rℓ suchthat Di ⊂ D for every i and Di ∩Dj = ∅ whenever i 6= j. Assume that

vol(D \D∗) > 0, (12.4.6)

where D∗ = D1 ∪ · · · ∪DN and vol denote the volume measure on Rℓ. Assumealso that there exists a map f : D∗ → D such that the restriction to each Di is ahomeomorphism onto D. See Figure 12.2. Note that the sequence of pre-imagesf−n(D) is decreasing. Their intersection

Λ =

∞⋂

n=0

f−n(D) (12.4.7)

is called repeller of f . In other words, Λ is the set of points x whose iteratesfn(x) are defined for every n ≥ 1. It is clear that Λ is compact and f−1(Λ) = Λ.

Example 12.4.2. The Cantor set K in Example 12.4.1 is the repeller of thetransformation f : [0, 1/3] ∪ [2/3, 1] → [0, 1] given by f(x) = 3x if x ∈ [0, 1/3]and f(x) = 3x−2 if x ∈ [2/3, 1]. A more general class of examples in dimension1 was introduced in Example 11.2.3.

In what follows we take the map f : D∗ → D to be of class C1; for pointson the boundary of the domain this just means that f admits a C1 extensionto some neighborhood. We also make the following additional hypotheses.

The first hypothesis is that the map f is expanding: there exists σ > 1 suchthat

‖Df(x)v‖ ≥ σ for every x ∈ D∗ and every v ∈ Rℓ. (12.4.8)


DD

f

hi

hj

Di

Dj

Figure 12.2: A repeller

Then it is not difficult to check that the restriction f : Λ → Λ to the repeller isan expanding map in the sense of Section 11.2.

The second hypothesis is that the logarithm of the Jacobian of f is Holder:there exist C > 0 and θ > 0 such that

log| detDf(x)|| detDf(y)| ≤ C‖x− y‖θ for every x, y ∈ D∗. (12.4.9)

Up to choosing C sufficiently large, the inequality is automatically satisfied whenx and y belong to distinct subdomains Di and Dj , because d(Di, Dj) > 0.

The third and last hypothesis is that the map f is conformal :

‖Df(x)‖ ‖Df(x)−1‖ = 1 for every x ∈ D∗. (12.4.10)

It is important to note that this condition is automatic when ℓ = 1. For ℓ = 2,it holds if and only if the map f is analytic.

All these conditions are satisfied in the case of the Cantor set (Exam-ples 12.4.1 and 12.4.2). They are also satisfied in Example 11.2.3, as long as wetake the derivative of the corresponding map f to be Holder.

Theorem 12.4.3 (Bowen-Manning formula). Suppose that f : D∗ → D satis-fies the conditions (12.4.8), (12.4.9) and (12.4.10). Then, the Hausdorff dimen-sion of the repeller is given by

d(Λ) = d0ℓ,

where d0 ∈ (0, 1) is the unique number such that P (f,−d0 log | detDf |) = 0.

The reader should be warned that we allow ourselves a slight abuse of lan-guage, in order not to overload the notations: throughout this section, P (f, ψ)always denotes the pressure of a potential ψ : Λ → R with respect to the re-striction f : Λ → Λ to the repeller, even if at other points of the arguments weconsider the map f defined on the whole domain D∗.

Before we start proving the theorem, let us mention the following interestingspecial case:


Example 12.4.4. Let f : J → [0, 1] be a map as in Example 11.2.3 andassume that the restriction of f to each connected component Ji of J is affine:the absolute value of the derivative is constant, equal to the inverse of the length|Ji|. Then, the Hausdorff dimension of the repeller K of the map f is the uniquenumber τ such that ∑

i

|Ji|τ = 1. (12.4.11)

To obtain this conclusion from Theorem 12.4.3 it suffices to note that

P (f,−t log |f ′|) = log∑

i

|Ji|t for every t. (12.4.12)

We let the reader check this last claim (Exercise 12.4.6).

12.4.3 Distortion and conformality

Let us call inverse branch of f the inverse hi : D → Di of the restriction of f toeach domain Di. More generally, we call inverse branch of fn any composition

hn = hi0 · · · hin−1 (12.4.13)

with i0, . . . , in−1 ∈ 1, . . . , N. For each n ≥ 1, denote by In the family of allinverse branches hn of fn. By construction, the images hn(D), hn ∈ In arepairwise disjoint and their union contains Λ.

The principal goal in this section is to prove the following geometric estimate,which is at the heart of the proof of Theorem 12.4.3:

Proposition 12.4.5. There exists C0 > 1 such that for every n ≥ 1, everyhn ∈ In, every E ⊂ hn(D) and every x ∈ hn(D):

1

C0[diam fn(E)]ℓ ≤ [diamE]ℓ| detDfn(x)| ≤ C0[diam fn(E)]ℓ. (12.4.14)

Starting the proof of this proposition, observe that our hypotheses implythat every inverse branch hi of f is a diffeomorphism with ‖Dhi‖ ≤ σ−1. Then,since D is convex, we may use the mean value theorem to conclude that

‖hi(z) − hi(w)‖ ≤ σ−1‖z − w‖ for every z, w ∈ D. (12.4.15)

For each inverse branch hn as in (12.4.13), let us consider the sequence of inversebranches

hn−k = hik · · · hin−1 , k = 0, . . . , n− 1. (12.4.16)

Note that hn−k(D) ⊂ Dik for each k. It follows from (12.4.15) that each hn−k

is a σk−n-contraction. In particular,

diamhn−k(D) ≤ σk−n diamD for every k = 0, . . . , n− 1. (12.4.17)

Recall that the convex hull of a set X ⊂ Rℓ is the union of all the linesegments whose endpoints are in X . It is clear that the convex hull has the


same diameter as the set itself. Since Di is convex for every i, the convex hullof each hn−k(D) is contained in Dik . In particular, the derivative Df is definedat every point in the convex hull of every hn−k(D).

Lemma 12.4.6. There exists C1 > 1 such that, for every n ≥ 1 and everyinverse branch hn ∈ In,

n−1∏

k=0

| detDf(zk)|| detDf(wk)|

≤ C1

for any zk, wk in the convex hull of hn−k(D) for k = 0, . . . , n− 1.

Proof. The condition (12.4.9) gives that

log| detDf(zk)|| detDf(wk)|

≤ C‖zk − wk‖θ ≤ C[diamhn−k(D)]θ

for each k = 0, . . . , n− 1. Then, using (12.4.17),

logn−1∏

k=0

| detDf(zk)|| detDf(wk)|

≤n−1∑

k=0

C[diamhn−k(D)]θ ≤ C[diamD]θn−1∑

k=0

σ(k−n)θ.

Therefore, it suffices to take C1 = exp(C(diamD)θ

∑∞j=1 σ

−jθ).

The time has come for us to exploit the conformality hypothesis (12.4.10).Given any linear isomorphism L : Rℓ → Rℓ, it is clear that | detL| ≤ ‖L‖ℓ, andanalogously for the inverse. Therefore,

1 = | detL| | detL−1| ≤(‖L‖ ‖L−1‖

)ℓ.

Hence, ‖L‖ ‖L−1‖ = 1 implies that | detL| = ‖L‖ℓ, and analogously for theinverse. Therefore, (12.4.10) implies that

| detDf(y)| = ‖Df(y)‖ℓ for every y ∈ D∗. (12.4.18)

Now we are ready to prove Proposition 12.4.5:

Proof of Proposition 12.4.5. Let n, hn, E and x be as in the statement. Let wbe a point of maximum for the norm of Dhn in the domain D. By the meanvalue theorem,

‖x1 − x2‖ ≤ ‖Dhn(w)‖ ‖fn(x1) − fn(x2)‖ (12.4.19)

for any x1, x2 in E. Observe that Dhn(w) is the inverse of Dfn(z), with z =hn(w). Hence, by conformality, ‖Dhn(w)‖ = ‖Dfn(z)‖−1. Moreover, usingLemma 12.4.6 and (12.4.18),

| detDfn(x)| ≤ C1| detDfn(z)| = C1‖Dfn(z)‖ℓ. (12.4.20)


Combining (12.4.19) and (12.4.20), we obtain

‖x1 − x2‖ℓ ≤ C1| detDfn(x)|−1‖fn(x1) − fn(x2)‖ℓ.

Varying x1, x2 ∈ E, it follows that

[diamE]ℓ ≤ C1| detDfn(x)|−1[diam fn(E)]ℓ.

This proves the second inequality in (12.4.14), as long as we take C0 ≥ C1.The proof of the other inequality is similar. For each k = 0, . . . , n − 1, let

zk be a point of maximum for the norm of Df restricted to the convex hull ofhn−k(D). Then,

‖fk+1(x1) − fk+1(x2)‖ ≤ ‖Df(zk)‖‖fk(x1) − fk(x2)‖= | detDf(zk)|1/ℓ‖fk(x1) − fk(x2)‖

for every k and any x1, x2 ∈ E. Hence,

‖fn(x1) − fn(x2)‖ℓ ≤n−1∏

k=0

| detDf(zk)| ‖x1 − x2‖ℓ. (12.4.21)

By Lemma 12.4.6,

n−1∏

k=0

| detDf(zk)| ≤ C1| detDfn(x)|. (12.4.22)

Combining (12.4.21) and (12.4.22), we obtain

‖y1 − y2‖ℓ ≤ C1| detDfn(x)| ‖x1 − x2‖ℓ.

Varying y1, y2, we conclude that

[diam fn(E)]ℓ ≤ C1| detDfn(x)|[diamE]ℓ.

This proves the first inequality in (12.4.14), for any C0 ≥ C1.

12.4.4 Existence and uniqueness of d0

Now we prove the existence and uniqueness of the number d0 in the statementof Theorem 12.4.3. Denote φ = − log | detDf | and consider the function

Ψ : R → R, Ψ(t) = P (f, tφ).

We want to show that there exists a unique d0 such that Ψ(d0) = 0.Uniqueness is easy to prove. Indeed, the hypotheses (12.4.8) and (12.4.10)

imply that

φ = log | detDf−1 f | = ℓ log ‖Df−1 f‖ ≤ −ℓ log σ.


Then, given any s < t, we have tφ ≤ sφ − (t − s)ℓ log σ. Using (10.3.4) and(10.3.5), it follows that

P (f, tφ) ≤ P (f, sφ) − (t− s)ℓ log σ < P (f, sφ).

This proves that Ψ is strictly decreasing, and so there exists at most one d0 ∈ Rsuch that Ψ(d0) = 0.

On the other hand, it follows from Proposition 10.3.6 that Ψ is continuous.Hence, to prove the existence of d0 it is enough to show that Ψ(0) > 0 > Ψ(1).This may be done as follows.

Let L be the open cover of Λ whose elements are the images h(Λ) of Λ underall the inverse branches of f . For each n ≥ 1, the iterated sum Ln is formedby the images hn(Λ) of Λ under the inverse branches of fn. It follows from(12.4.17) that diamLn ≤ σ−n diamD for every n, and so diamLn → 0. Then,since the elements of L are pairwise disjoint, we may use Exercise 10.3.3 toconclude that

P (f, ψ) = P (f, ψ,L) for every potential ψ. (12.4.23)

In particular, Ψ(0) = P (f, 0,L) = h(f,L). Note that each family Ln is aminimal cover of the repeller, that is, no proper subfamily covers Λ. Therefore,H(Ln) = log #Ln = n logN for every n and, consequently, h(f,L) = logN .This proves that Ψ(0) is positive.

Proposition 12.4.7. Ψ(1) = limn1n log vol

(f−n(D)

)< 0.

Proof. By (12.4.23), we have that Ψ(1) = P (f, φ,L). In other words,

Ψ(1) = limn

1

nlogPn(f, φ,L) = lim

n

1

nlog

∑

hn∈In

eφn(hn(Λ)).

Since φ = − log | detDf |, this means that

Ψ(1) = limn

1

nlog

∑

hn∈In

suphn(D)

1

| detDfn| . (12.4.24)

On the other hand, by the formula of change of variables,

vol(f−n(D)

)=∑

hn∈In

vol(hn(D)) =∑

hn∈In

∫

D

1

| detDfn| hn dx.

It follows from Lemma 12.4.6 that

infhn(D)

| detDfn| ≤ | detDfn|(hn(z)) ≤ C1 infhn(D)

| detDfn|

for every z ∈ hn(D) and every hn ∈ In. Consequently,

vol(f−n(D)

)≤∑

hn∈In

suphn(D)

1

| detDfn| ≤ C1 vol(f−n(D)

).


Combining these inequalities with (12.4.24), we conclude that

lim supn

1

nlog vol

(f−n(D)

)≤ Ψ(1) ≤ lim inf

n

1

nlog vol

(f−n(D)

).

This proves the identity in the statement of the proposition.It remains to prove that the volume of the pre-images f−n(D) decays ex-

ponentially fast. For that, observe that f−(n+1)(D) = f−n(D∗) is the disjointunion of the images hn(D∗), with hn ∈ In. Therefore,

vol(f−(n+1)(D)

)

vol(f−n(D)

) =

∑hn∈In vol

(hn(D∗)

)∑hn∈In vol

(hn(D)

) ≤ maxhn∈In

vol(hn(D∗)

)

vol(hn(D)

) . (12.4.25)

By the formula of change of variables,

vol(hn(D)) =

∫

D

1

| detDfn| hn dx and

vol(hn(D \D∗)) =

∫

D\D∗

1

| detDfn| hn dx.

Hence, using Lemma 12.4.6,

vol(hn(D \D∗)

)

vol(hn(D)

) ≥ 1

C1

vol(D \D∗

)

vol(D) (12.4.26)

for every hn ∈ In. By the hypothesis (12.4.6), the expression on the right-handside of (12.4.26) is positive. Fix β > 0 close enough to zero that 1 − e−β issmaller than that expression. Then,

vol(hn(D) \ hn(D∗)

)

vol(hn(D)

) ≥ 1 − e−β

for every hn ∈ In. Combining this inequality with (12.4.25) and the fact thatvol(hn(D) \ hn(D∗)) = volhn(D) − volhn(D∗), we obtain that

vol(f−(n+1)(D)

)

vol(f−n(D)

) ≤ e−β for every n ≥ 0

(the case n = 0 follows directly from the hypothesis (12.4.6)). Hence,

limn

1

nlog vol

(f−n(D)

)≤ −β < 0.

This concludes the proof of the proposition.

Figure 12.3 summarizes the conclusions in this section. Recall that the func-tion defined by Ψ(t) = P (f,−t log | detDf |) is convex, by Proposition 10.3.5.


d0

h(f)

01

t 7→ Ψ(t)

Figure 12.3: Pressure and Hausdorff dimension

12.4.5 Upper bound

Here we show that d(Λ) ≤ bℓ for every b > 0 such that P (f, bφ) < 0. In view ofthe observations in the previous section, this proves that d(Λ) ≤ d0ℓ.

Let L be the open cover of Λ introduced in the previous section and let b > 0be such that P (f, bφ) < 0. The property (12.4.23) implies that

P (f, bφ,L) = P (f, bφ) < −κ

for some κ > 0. By the definition (10.3.2), it follows that

Pn(f, bφ,L) ≤ e−κn for every n sufficiently large. (12.4.27)

It is clear that Ln is a minimal cover of Λ: no proper subfamily covers Λ. Hence,recalling the definition (10.3.1), the inequality (12.4.27) implies that

∑

L∈Ln

ebφn(L) ≤ e−κn for every n sufficiently large. (12.4.28)

It is clear that every L ∈ Ln is compact. Hence, by continuity of the Jacobian,

eφn(L) = supL

| detDfn|−1 = | detDfn(x)|−1

for some x ∈ L. It is also clear that fn(L) = Λ for every L ∈ Ln. Then, takingE = L in Proposition 12.4.5,

[diamL]ℓe−φn(L) ≤ C0[diamΛ]ℓ.

Combining this inequality with (12.4.28), we obtain that

∑

L∈Ln

[diamL]bℓ ≤ Cb0 [diamΛ]bℓ∑

L∈Ln

ebφn(L) ≤ Cb0 [diamΛ]bℓe−κn

for every n sufficiently large. Since the expression on the right-hand side con-verges to zero, and the diameter of the covers Ln also converges to zero, itfollows that mbℓ(M) = 0. Therefore, d(M) ≤ bℓ.


12.4.6 Lower bound

Now we show that d(Λ) ≥ a for every a such that P (f, aφ) > 0. This impliesthat d(Λ) ≥ d0, which completes the proof of Theorem 12.4.3.

As observed in the previous section, the cover L realizes the pressure andall its iterated sums Ln are minimal covers of Λ. Hence, the choice of a impliesthat there exists κ > 0 such that

Pn(f, aφ,L) =∑

L∈Ln

eaφn(L) ≥ eκn for every n sufficiently large. (12.4.29)

Fix such an n. Let ε > 0 be a lower bound for the distance between anytwo elements of Ln: a lower bound does exist because the elements of Ln arecompact and pairwise disjoint. Fix ρ ∈ (0, εaℓ). The reason for this choice willbe clear in a while. We claim that

∑

U∈U

[diamU ]al ≥ 2−alρ (12.4.30)

for every cover U of Λ. By definition, this implies that maℓ(Λ) ≥ 2−alρ > 0and, consequently, d(Λ) ≥ aℓ. Therefore, to end the proof of Theorem 12.4.3 itsuffices to prove this claim.

Let us suppose that there exists some open cover of Λ which does not satisfy(12.4.30). Then, using Exercise 12.4.3, there exists some open cover U of Λ with

∑

U∈U

[diamU ]al < ρ < εal. (12.4.31)

By compactness, we may suppose that this open cover U is finite. The relation(12.4.31) implies that every U ∈ U has diameter less than ε. Hence, each U ∈ Uintersects at most one L ∈ Ln. Since Ln covers Λ and U is a non-empty subsetof Λ, we also have that U intersects some L ∈ Ln. This means that U is thedisjoint union of the families

UL = U ∈ U : U ∩ L 6= ∅, L ∈ Ln.If U ∈ UL then U ⊂ L. Let us consider the families fn(UL) = fn(U) :U ∈ UL. Observe that each one of them is a cover of Λ. Moreover, usingProposition 12.4.5,

∑

V ∈fn(UL)

[diamV ]aℓ =∑

U∈UL

[diam fn(U)]aℓ ≤ C0e−aφn(L)

∑

U∈UL

[diamU ]aℓ.

(12.4.32)Therefore,∑

U∈U

[diamU ]aℓ =∑

L∈Ln

∑

U∈UL

[diamU ]aℓ ≥∑

L∈Ln

C−10 eaφn(L)

∑

V ∈fn(UL)

[diamV ]aℓ.

Let us suppose that∑

V ∈fn(UL)

[diamV ]aℓ ≥∑

U∈U

[diamU ]aℓ for every L ∈ Ln.


Then, the previous inequality implies

∑

U∈U

[diamU ]aℓ ≥∑

L∈Ln

C−10 eaφn(L)

∑

U∈U

[diamU ]aℓ ≥ C−10 eκn

∑

U∈U

[diamU ]aℓ.

This is a contradiction, because eκn > C0. Hence, there exists L ∈ Ln such that

∑

V ∈fn(UL)

[diamV ]aℓ ≤∑

U∈U

[diamU ]aℓ < ρ.

Thus, we may repeat the previous procedure with fn(UL) in the place of U .Observe, however, that #fn(UL) = #UL is strictly less than #U . Therefore,this process must stop after a finite number of steps. This contradiction provesthe claim 12.4.30.

The proof of Theorem 12.4.3 is complete. However, it is possible to prove aneven stronger result: in the conditions of the theorem, the Hausdorff measure ofΛ in dimension d(M) is positive and finite. We leave this statement as a specialchallenge (Exercise 12.4.7) for the reader who remained with us till the end ofthis book!

Figure 12.4: Sierpinski triangle

12.4.7 Exercises

12.4.1. Let d = log 2/ log 3. Show that (x1 + 1 + x2)d ≥ xd1 + xd2 for every

x1, x2 ∈ [0, 1]. Moreover, the identity holds if and only if x1 = x2 = 1.

12.4.2. Let f : M → N be a Lipschitz map, with Lipschitz constant L. Showthat

md(f(A)) ≤ Ldmd(A)

for any d ∈ (0,∞) and any A ⊂ M . Use this fact to show that if A ⊂ Rn andt > 0, then md(tA) = tdmd(A), where tA = tx : x ∈ A.

12.4.3. Represent bymod(M) and mc

d(M) the numbers defined in the same waysas the Hausdorff measure md(M) but considering only covers by open sets andcovers by closed sets, respectively. Show that mo

d(M) = mcd(M) = md(M).


12.4.4. (Mass distribution principle) Let µ be a finite measure on a compactmetric space M and assume that there exist numbers d, K, ρ > 0 such thatµ(B) ≤ K(diamU)d for every set B ⊂M with diameter less than ρ. Show thatif A ⊂M is such that µ(A) > 0 then md(A) > 0 and so d(A) ≥ d.

12.4.5. Use the mass distribution principle to show that the Hausdorff dimen-sion of the Sierpinski triangle (Figure 12.4) is equal to d0 = log 3/ log 2 and theHausdorff measure in dimension d0 is positive and finite.

12.4.6. Check the pressure formula (12.4.12).

12.4.7. Adapting arguments from Exercise 12.4.5, show that in the conditionsof Theorem 12.4.3 one has 0 < md(Λ)(Λ) <∞.


Appendix A

Topics of Measure Theory,

Topology and Analysis

In this series of appendices we recall several basic concepts and facts in MeasureTheory, Topology and Functional Analysis that are useful throughout the book.Our purpose is to provide the reader with a quick, accessible source of refer-ences to measure and integration, general and differential topology and spectraltheory, to try and make this book as self-contained as possible. We have notattempted to make the material in these appendices completely sequential: itmay happen that a notion mentioned at an appendix is defined, or discussed inmore depth, only at a latter appendix (check the index).

As a general rule, we omit the proofs. For Appendices A.1, A.2 and A.5,the reader may find detailed information in the books of Castro [Cas04], Fer-nandez [Fer02], Halmos [Hal50], Royden [Roy63] and Rudin [Rud87]. The pre-sentation in Appendix A.3 is a bit more complete, including the proofs of mostresults, but the reader may find additional relevant material in the books ofBillingsley [Bil68, Bil71]. We recommend the books of Hirsch [Hir94] and doCarmo [dC79] to all those interested in going further into the topics in Ap-pendix A.4. For more information on the subjects of Appendices A.6 and A.7,including proofs of the results quoted here, check the book of Halmos [Hal51]and the treatise of Dunford, Schwarz [DS57, DS63], especially Section IV.4 ofthe first volume and the initial sections of the second volume.

A.1 Measure spaces

Measure spaces are the natural environment for the definition of Lebesgue in-tegral, which is the main topic to be presented in Appendix A.2. We begin byintroducing the notions of algebra and σ-algebra of subsets of a set, which leadto the concept of measurable space. Next, we present the notion of measure ona σ-algebra and we analyze some of its properties. In particular, we mentiona few results on the construction of measures, including Lebesgue measures in

439

440 APPENDIX A. MEASURE THEORY, TOPOLOGY AND ANALYSIS

Euclidean spaces. The last part is dedicated to measurable maps, which are themaps that preserve the structure of measurable spaces.

A.1.1 Measurable spaces

Given a set X , we often denote by Ac the complement X \A of each subset A.

Definition A.1.1. An algebra of subsets of a set X is a family B of subsets ofX that contains the empty set and is closed under the elementary operations ofset theory:

(i) ∅ ∈ B

(ii) A ∈ B implies Ac ∈ B

(iii) A ∈ B and B ∈ B implies A ∪B ∈ B

(iv) A ∈ B and B ∈ B implies A ∩B ∈ B

(v) A ∈ B and B ∈ B implies A \B ∈ B.

The two last properties are immediate consequences of the previous ones,since A ∩ B = (Ac ∪ Bc)c and A \ B = A ∩ Bc. Moreover, by associativity,properties (iii) and (iv) imply that the union and the intersection of any finitefamily of elements of B are also in B.

Definition A.1.2. A σ-algebra of subsets of a set X is an algebra B of subsetsof X that is also closed under countable unions:

Aj ∈ B for j = 1, . . . , n, . . . implies

∞⋃

j=1

Aj ∈ B.

Then B is also closed under countable intersections:

Aj ∈ B for j = 1, . . . , n, . . . implies∞⋂

j=1

Aj =(∪∞j=1 A

cj

)c ∈ B.

Definition A.1.3. A measurable space is a pair (X,B) where X is a set and Bis a σ-algebra of subsets of X . The elements of B are called measurable sets.

Next, we describe a few examples of constructions of σ-algebras.

Example A.1.4. For any setX , the following families of subsets are σ-algebras:

∅, X and 2X = all subsets of X.

Moreover, clearly, if B is any algebra of subsets of X then ∅, X ⊂ B ⊂ 2X .So, ∅, X is the smallest and 2X is the largest of all algebras of subsets of X .

In the statement that follows, I is an arbitrary set whose sole use is to indexthe elements of the family of σ-algebras.

A.1. MEASURE SPACES 441

Proposition A.1.5. Consider any non-empty family Bi : i ∈ I of σ-algebrasof subsets of the same set X. Then the intersection B = ∩i∈IBi is also aσ-algebra of subsets of X.

Given any family E of subsets of X , we may apply Proposition A.1.5 to thefamily of all σ-algebras that contain E . Note that this family is non-empty,since it contains the σ-algebra 2X of all subsets of X . According to the pre-vious proposition, the intersection of all these σ-algebras is also σ-algebra. Byconstruction, this σ-algebra contains E and is contained in every σ-algebra thatcontains E . In other words, it is the smallest σ-algebra that contains E . Thisleads to the following definition:

Definition A.1.6. The σ-algebra generated by a family E of subsets of X isthe smallest σ-algebra σ(E) that contains E or, in other words, the intersectionof all the σ-algebras that contain E .

Recall that a topological space is a pair (X, τ) where X is a set and τ is afamily of subsets of X that contains ∅, X and is closed under finite intersec-tions and arbitrary unions. Such a family τ is called a topology and its elementsare called open subsets of X . In this book we take all topological spaces to beHausdorff, that is, such that for any pair of distinct points there exists a pairof disjoint open subsets each of which contains one of the points.

Definition A.1.7. The Borel σ-algebra of a topological space is the σ-algebraσ(τ) generated by the topology τ , that is, the smallest σ-algebra that containsall the open subsets of X . The elements of σ(τ) are called Borel subsets of X .The closed subsets of X , being the complements of the open subsets, are alsoin the Borel σ-algebra.

Analogously to Proposition A.1.5, the intersection of any non-empty familyτi : i ∈ I of topologies of the same set X is also a topology of X . Then,by the same argument as we used before for σ-algebras, given any family E ofsubsets of X there exists a smallest topology τ(E) that contains E . We call itthe topology generated by E .

Example A.1.8. Let (X,B) be a measurable space. The limit superior of asequence of sets En ∈ B is the set lim supnEn formed by the points x ∈ X suchthat x ∈ En for infinitely many values of n. Analogously, the limit inferior of(En)n is the set lim infnEn of points x ∈ X such that x ∈ En for every valueof n sufficiently large. In other words,

lim infn

En =⋃

n≥1

⋂

m≥n

Em and lim supn

En =⋂

n≥1

⋃

m≥n

Em.

Observe that lim infnEn ⊂ lim supnEn and both sets are in B.

Example A.1.9. The extended line R = [−∞,∞] is the union of the real lineR = (−∞,+∞) with the two points ±∞ at infinity. This space has a naturaltopology, generated by the intervals [−∞, b) and (a,+∞], with a, b ∈ R. It


is easy to see that the extended line is homeomorphic to a compact intervalin the real line: for example, the function arctan : R → (−π/2, π/2) extendsstraightforwardly to a homeomorphism (that is, a continuous bijection whoseinverse is also continuous) between R and [−π/2, π/2]. We always consider onthe extended line the Borel σ-algebra associated with this topology.

Of course, the real line R is a subspace (measurable as well as topological)of the extended line. The Borel subsets of the real line constitute a large familyand one might even be led to thinking that every subset of R is a Borel subset.However, this is not true: a counterexample is constructed in Exercise A.1.4.

A.1.2 Measure spaces

Let (X,B) be a measurable space. The following notions have a central role inthis book:

Definition A.1.10. A measure on (X,B) is a function µ : B → [0,+∞] suchthat µ(∅) = 0 and

µ(

∞⋃

j=1

Aj) =

∞∑

j=1

µ(Aj)

for any countable family of pairwise disjoint sets Aj ∈ B. This last propertyis called countable additivity or σ-additivity. Then the triple (X,B, µ) is calleda measure space. If µ(X) < ∞ then we say that µ is a finite measure and ifµ(X) = 1 then we call µ a probability measure. In this last case, (X,B, µ) iscalled a probability space.

Example A.1.11. Let X be an arbitrary set, endowed with the σ-algebraB = 2X . Given any p ∈ X , consider the function δp : 2X → [0,+∞] defined by:

δp(A) =

1 if p ∈ A

0 if p /∈ A.

It is easy to see that δp is a measure. It is usually called the Dirac measure, orDirac mass at p.

Definition A.1.12. We say that a measure µ is σ-finite if there exists a se-quence A1, . . . , An, . . . of subsets of X such that µ(Ai) < ∞ for every i ∈ Nand

X =

∞⋃

i=1

Ai.

We say that a function µ : B → [0,+∞] is finitely additive if

µ(N⋃

j=1

Aj) =N∑

j=1

µ(Aj)


for any finite family A1, . . . , AN ∈ B of pairwise disjoint subsets. Note that if µis σ-additive then it is also finitely additive. Moreover, if µ is finitely additiveand is not constant equal to +∞ then µ(∅) = 0.

The main tool for constructing measures is the following theorem:

Theorem A.1.13 (Extension). Let A be an algebra of subsets of X and letµ0 : A → [0,+∞] be a σ-additive function with µ0(X) < ∞. Then thereexists a unique measure µ defined on the σ-algebra B generated by A that is anextension of µ0, meaning that it satisfies µ(A) = µ0(A) for every A ∈ A.

Theorem A.1.13 remains valid for σ-finite measures. Moreover, there is aversion for finitely additive functions: if µ0 is finitely additive then it admitsa finitely additive extension to σ-algebra B generated by A. However, in thiscontext the extension need not be unique.

The most useful criterion for proving that a given function is σ-additive isprovided by the following theorem:

Theorem A.1.14 (Continuity at the empty set). Let A be an algebra of subsetsof X and µ : A → [0,+∞) be a finitely additive function with µ(X) <∞. Thenµ is σ-additive if and only if

limnµ(An)

= 0 (A.1.1)

for every sequence A1 ⊃ · · · ⊃ Aj ⊃ · · · of elements of A with ∩∞j=1Aj = ∅.

The proof of this theorem is proposed in Exercise A.1.7. Exercise A.1.9 dealswith some variations of the statement.

Definition A.1.15. We say that an algebra A is compact if any decreasingsequence A1 ⊃ · · · ⊃ An ⊃ · · · of non-empty elements of A has non-emptyintersection.

An open cover of a topological space is a family of open subsets whose unionis the wholeK. A subcover is just a subfamily of elements of a cover whose unionis still the whole space. A topological space is compact if every open cover admitssome finite subcover. A subset K of a topological space X is compact if thetopology of X restricted to K turns the latter into a compact topological space.Every closed subset of compact space is compact. Conversely, (assuming X is aHausdorff space) then every compact subset is closed. Another important factis that the intersection ∩nKn of any decreasing sequence K1 ⊃ · · · ⊃ Kn ⊃ · · ·of compact subsets is non-empty.

Example A.1.16. It follows from what we have just said that if X is a (Haus-dorff) topological space and every element of the algebra A is compact then Ais a compact algebra.

It follows from Theorem A.1.14 that if A is a compact algebra then everyfinitely additive function µ : A → [0,+∞) with µ(X) <∞ is σ-additive. Hence,by Theorem A.1.13, µ extends uniquely to a measure defined on the σ-algebragenerated by A.


Definition A.1.17. We say that a non-empty family C of subsets of X is amonotone class if C contains X and is closed under countable monotone unionsand intersections:

• if A1 ⊂ A2 ⊂ · · · are in C then ∪n≥1An ∈ C and

• if A1 ⊃ A2 ⊃ · · · are in C then ∩n≥1An ∈ C.

Clearly, the two families ∅, X and 2X are monotone classes. Moreover,if Ci : i ∈ I is any family of monotone classes then the intersection ∩i∈ICiis a monotone class. Thus, for every subset A of 2X there exists the smallestmonotone class that contains A.

Theorem A.1.18 (Monotone class). The smallest monotone class that containsan algebra A coincides with the σ-algebra σ(A) generated by A.

Another important result about σ-algebras that will be useful later statesthat every element of a σ-algebra B generated by an algebra A is approximatedby the elements of A, in the sense that the measure of the symmetric difference

A∆B = (A \B) ∪ (B \A) = (A ∪B) \ (A ∩B)

can be made arbitrarily small. More precisely:

Theorem A.1.19 (Approximation). Let (X,B, µ) be a probability space and Abe an algebra A of subsets of X that generates the σ-algebra B. Then, for everyε > 0 and every B ∈ B there exists A ∈ A such that µ(A∆B) < ε.

Definition A.1.20. A measure space is complete if every subset of a measurableset with zero measure is also measurable.

It is possible to transform any measure space (X,B, µ) into a complete space,as follows. Let B be the family of all subsets A ⊂ X such that there existB1, B2 ∈ B with B1 ⊂ A ⊂ B2 and µ(B2 \ B1) = 0. Then B is a σ-algebraand it contains B. Consider the function µ : B → [0,+∞] defined by µ(A) =µ(B1) = µ(B2), for any B1, B2 ∈ B as before. The function µ is well defined,it is a measure on B and its restriction to B coincides with µ. By construction,(X, B, µ) is a complete measure space. It is called the completion of (X,B, µ).

Given subsets U1 and U2 of the σ-algebra B, we say that U1 ⊂ U2 up tomeasure zero if for every B1 ∈ U1 there exists B2 ∈ U2 such that µ(B1∆B2) = 0.By definition, U1 = U2 up to measure zero if U1 ⊂ U2 up to measure zero andU2 ⊂ U1 up to measure zero. We say that a set U ⊂ B generates the σ-algebra Bup to measure zero if the σ-algebra generated by U is equal to B up to measurezero. Equivalently, U generates B up to measure zero if the completion of theσ-algebra generated by U coincides with the completion of B.

By definition, a measure takes values in [0,∞]. Whenever it is convenientto stress that fact, we speak of positive measure instead. But it is possible toweaken that requirement and, indeed, such generalizations are useful for ourpurposes.


We call signed measure on a measurable space (X,B) any σ-additive functionµ : B → [−∞,∞] such that µ(∅) = 0. More precisely, µ may take either thevalue −∞ or the value +∞, but not both; this is to avoid the “indetermination”∞−∞ in the additivity condition.

Theorem A.1.21 (Hahn decomposition). If µ is a signed measure then thereexist measurable sets P,N ⊂ X such that P ∪N = X and P ∩N = ∅ and

µ(E) ≥ 0 for every E ⊂ P and µ(E) ≤ 0 for every E ⊂ N.

This means that we may write µ = µ+ − µ− where µ+ and µ− are the(positive) measures defined by

µ+(E) = µ(E ∩ P ) and µ−(E) = −µ(E ∩N).

In particular, the sum |µ| = µ+ + µ− is also a positive measure; it is called thetotal variation of the signed measure µ.

If µ takes values in (−∞,∞) only, we call it a finite signed measure. In thiscase, the measures µ+ and µ− are finite. The set M(X) of finite signed measuresis a real vector space and the function ‖µ‖ = |µ|(X) is a complete norm in thisspace (see Exercise A.1.10). In other words, (M(X), ‖·||) is a real Banach space.When X is a compact metric space, this Banach space is isomorphic to the dualof the space C0(X) of continuous real functions X (theorem of Riesz-Markov).

More generally, we call complex measure on a measurable space (X,B) anyσ-additive function µ : B → C. Observe that µ(∅) is necessarily zero. Clearly,we may write µ = ℜµ+ iℑµ, where the real part ℜµ and the imaginary part ℑµare finite signed measures. The total variation of µ is the finite measure definedby

|µ|(E) = supP

∑

P∈P

|µ(P )|,

where the supremum is taken over all countable partitions of the measurableset E into measurable subsets (this definition coincides with the one we gavepreviously in the special case when µ is real). The function ‖µ‖ = |µ|(X) definesa norm in the vector space of complex measures on X , which we also denote asM(X). Moreover, this norm is complete. When X is a compact metric space,the complex Banach space (M(X), ‖ · ‖) is isomorphic to the dual of the spaceC0(X) of continuous complex functions on X (theorem of Riesz-Markov).

A.1.3 Lebesgue measure

The notion of Lebesgue measure corresponds to the notion of volume of subsetsof the Euclidean space Rd. It is defined as follows.

Let X = [0, 1] and A be the family of all subsets of the form A = I1∪· · ·∪INwhere I1, . . . , IN are pairwise disjoint intervals. It is easy to check that A is analgebra of subsets of X . Let m0 : A → [0, 1] be the function defined on thisalgebra by

m0

(I1 ∪ · · · ∪ IN

)= |I1| + · · · + |IN |,


where |Ij | represents the length of each interval Ij . Note that m0(X) = 1. InExercise A.1.8 we ask the reader to show that m0 is σ-additive.

Note that the σ-algebra B generated by A coincides with the Borel σ-algebraof X , since every open subset is a countable union of pairwise intervals. So, byTheorem A.1.13, there exists a unique probability measure m defined on B thatis an extension of m0. It is called the Lebesgue measure on [0, 1].

More generally, one defines the Lebesgue measure m on the cube X = [0, 1]d

of any dimension d ≥ 1, in the following way. Firstly, we call rectangle in X anysubset of the form R = I1 × · · · × Id where the Ij are intervals. Then we define:

m0(R) = |I1| × · · · × |Id|.

Next, we consider the algebra A of subsets of X of the form A = R1∪· · · ∪RN ,where R1, . . . , RN are pairwise disjoint rectangles, and we define

m0(A) = m0(R1) + · · · +m0(RN )

for every A in that algebra. The σ-algebra generated by A coincides with theBorel σ-algebra of X . The Lebesgue measure on the cube X = [0, 1]d is theextension of m0 to that σ-algebra.

In order to define the Lebesgue measure on the whole Euclidean space Rd,we decompose the space into cubes of unit size:

Rd =⋃

k1∈Z

· · ·⋃

kd∈Z

[k1, k1 + 1) × · · · × [kd, kd + 1).

Each cube [k1, k1 + 1)× · · · × [kd, kd + 1) may be identified with [0, 1)d throughthe translation Tk1,...,kd

(x) = x − (k1, . . . , kd) that maps (k1, k2, . . . , kd) to theorigin. That allows us to define a measure mk1,k2,...,kd

on C, by setting

mk1,k2,...,kd(B) = m0

(Tk1,...,kd

(B))

for every measurable set B ⊂ C. Finally, given any measurable set B ⊂ Rd,define:

m(B) =∑

k1∈Z

· · ·∑

kd∈Z

mk1,...,kd

(B ∩ [k1, k1 + 1) × · · · × [kd, kd + 1)

).

Note that this measure m is σ-finite but not finite.

Example A.1.22. It is worthwhile outlining a classical alternative constructionof the Lebesgue measure (see Chapter 2 of the book of Royden [Roy63] fordetails). We call Lebesgue exterior measure of an arbitrary set E ⊂ Rd thenumber

m∗(E) = inf∑

k

m0(Rk)

where the infimum is taken over all countable covers (Rk)k of E by open rect-angles. The function E 7→ m(E) is defined for every E ⊂ Rd, but is not finitely


additive (although it is countably subadditive). We say that E is a Lebesguemeasurable set if

m∗(A) = m∗(A ∩ E) +m∗(A ∩ Ec) for every A ⊂ Rd.

Every rectangle R is a Lebesgue measurable set and satisfies m∗(R) = m0(R).The family M of all Lebesgue measurable sets is a σ-algebra. Moreover, therestriction of m∗ to M is σ-additive and, hence, a measure. By the previousobservation, M contains every Borel set of Rd. The restriction of m∗ to theBorel σ-algebra B of Rd coincides with the Lebesgue measure on Rd.

Actually, M coincides with the completion of the Borel σ-algebra of Rd withrespect to the Lebesgue measure. This and other related properties are part ofExercise A.1.13.

Example A.1.23. Let φ : [0, 1] → R be a positive continuous function. Givenany interval I, with endpoints 0 ≤ a < b ≤ 1, define

µφ(I) =

∫ b

a

φ(x) dx (Riemann integral).

Next, extend the definition of µφ to the algebra A formed by the finite unionsA = I1 ∪ · · · ∪ Ik of pairwise disjoint intervals, through the relation

µφ(A) =

k∑

j=1

µφ(Ij).

The basic properties of the Riemann integral ensure that µφ is finitely additive.We leave it to the reader to check that the measure µφ is σ-additive in the algebraA (see Exercise A.1.7). Moreover, µφ(∅) = 0 and µφ([0, 1]) < ∞, because φ iscontinuous and, hence, bounded. With the help of Theorem A.1.13, we mayextend µφ to the whole Borel σ-algebra of [0, 1].

The measure µφ that we have just constructed has the following specialproperty: if a set A ⊂ [0, 1] has Lebesgue measure zero then µφ(A) = 0. Thisproperty is called absolute continuity (with respect to the Lebesgue measure)and is studied in a lot more depth in Appendix A.2.4.

Here is an example of a measure that is positive on any open set but is notabsolutely continuous with respect to Lebesgue measure:

Example A.1.24. Fix any enumeration r1, r2, . . . of the set Q of rationalnumbers. Consider the measure µ defined on R by:

µ(A) =∑

ri∈A

1

2i.

On the one hand, the measure of any non-empty open subset of the real lineis positive, for such a subset must contain some ri. On the other hand, themeasure of Q is

µ(Q) =∑

ri∈Q

1

2i= 1.


Since Q has Lebesgue measure zero (because it is a countable set), this impliesthat µ is not absolutely continuous with respect to the Lebesgue measure.

This example also motivates the concept of support of a measure on a topo-logical space (X, τ), which we introduce next. For that, we must recall a fewbasic ideas from Topology.

A subset τ ′ of the topology τ is a basis of the topology, or a basis of opensets, if for every x ∈ X and every open set U containing x there exists U ′ ∈ τ ′

such that x ∈ U ′ ⊂ U . We say that the topological space admits a countablebasis of open sets if such a subset τ ′ may be chosen to be countable. A setV ⊂ X is a neighborhood of a point x ∈ X if there exists some open set U suchthat x ∈ U ⊂ V . Thus, a subset X is open if and only if it is a neighborhood ofeach one of its points. A family υ′ of subsets of X is a basis of neighborhoods ofa point x ∈ X if for every neighborhood V there exists some V ′ ∈ υ′ such thatx ∈ V ′ ⊂ V . We say that x admits a countable basis of neighborhoods if υ′ maybe chosen to be countable. If the topological space admits a countable basis ofopen sets then every x ∈ X admits a countable basis of neighborhoods, namely,the family of elements of the countable basis of open sets that contain x.

Definition A.1.25. Let (X, τ) be a topological space and µ be a measure onthe Borel σ-algebra of X . The support of the measure µ is the set suppµ formedby the points x ∈ X such that µ(V ) > 0 for any neighborhood V of x.

It follows immediately from the definition that the support of a measure isa closed set. In Example A.1.24 above, the support of µ is the whole real line,despite the fact that µ(Q) = 1.

Proposition A.1.26. If X is a topological space with a countable basis of opensets and µ is a non-zero measure on X, then the support suppµ is non-empty.

Proof. If suppµ is empty then for each point x ∈ X we may find an openneighborhood Vx such that µ(Vx) = 0. Let Aj : j = 1, 2, . . . be a countablebasis of the topology of X . Then, for each x ∈ X we may choose i(x) ∈ N suchthat x ∈ Ai(x) ⊂ Vx. Hence,

X =⋃

x∈X

Vx =⋃

x∈X

Ai(x)

and so

µ(X) = µ( ⋃

x∈X

Ai(x))≤

∞∑

i=1

µ(Ai) = 0.

This is a contradiction, and so suppµ can not be empty.

A.1.4 Measurable maps

Measurable maps play a role in Measure Theory similar to the role of continuousmaps in Topology: measurability corresponds to the idea that the map preservesthe family of measurable subsets, just as continuity means that the family ofopen subsets is preserved by the map.


Definition A.1.27. Given measurable spaces (X,B) and (Y, C), we say that amap f : X → Y is measurable if f−1(C) ∈ B for every C ∈ C.

In general, the family of sets C ∈ C such that f−1(C) ∈ B is a σ-algebra.So, to prove that f is measurable it suffices to show that f−1(C0) ∈ B forevery set C0 in some family C0 ⊂ C that generates the σ-algebra C. See alsoExercise A.1.1.

Example A.1.28. A function f : X → [−∞,∞] is measurable if and only if theset f−1((c,+∞]) belongs to B for every c ∈ R. This follows from the previousobservation, since the family of intervals (c,+∞] generates the Borel σ-algebraof the extended line (recall Example A.1.9). In particular, if a function f takesvalues in (−∞,+∞) then it is measurable if and only if f−1((c,+∞)) belongsto B for every c ∈ R.

Example A.1.29. If X is a topological space and B is the corresponding Borelσ-algebra, then every continuous function f : X → R is measurable. Indeed,continuity means that the pre-image of every open subset of R is an open subsetof X and, hence, is in B. Since the family of open sets generates the Borel σ-algebra of R, it follows that the pre-image of every Borel subset of the real lineis also in B.

Example A.1.30. The characteristic function XB : X → R of a set B ⊂ X isdefined by:

XB(x) =

1, if x ∈ B;0, otherwise.

Observe that the function XB is measurable if and only if B is a measurablesubset: indeed, X−1

B (A) ∈ ∅, B,X \B,X for any A ⊂ R.

Among the basic properties of measurable functions, let us highlight:

Proposition A.1.31. Let f, g : X → [−∞,+∞] be measurable functions andlet a, b ∈ R. Then the following functions are also measurable:

(af + bg)(x) = af(x) + bg(x) and (f · g)(x) = f(x) · g(x).

Moreover, if fn : X → [−∞,+∞] is a sequence of measurable functions, thenthe following functions are also measurable:

s(x) = supfn(x) : n ≥ 1 and i(x) = inffn(x) : n ≥ 1,

f∗(x) = lim supn

fn(x) and f∗(x) = lim infn

fn(x).

In particular, if f(x) = lim fn(x) exists then f is measurable.

The linear combinations of characteristic functions form an important classof measurable functions:


Definition A.1.32. We say that a function s : X → R is simple if there existconstants α1, . . . , αk ∈ R and pairwise disjoint measurable sets A1, . . . , Ak ∈ Bsuch that

s =k∑

j=1

αjXAj , (A.1.2)

where XA is the characteristic function of the set A.

Note that every simple function is measurable. In the converse direction,the result that follows asserts that every measurable function is the limit of asequence of simple functions. This fact will be very useful in the next appendix,when defining the Lebesgue integral.

Proposition A.1.33. Let f : X → [−∞,+∞] be a measurable function. Thenthere exists a sequence (sn)n of simple functions such that |sn(x)| ≤ |f(x)| forevery n and

limnsn(x) = f(x) for every x ∈ X.

If f takes values in R, we may take every sn with values in R. If f is bounded,the sequence (sn)n may be chosen such that the convergence is uniform. If f isnon-negative, we may take 0 ≤ s1 ≤ s2 ≤ · · · ≤ f .

In Exercise A.1.16 the reader is invited to prove this proposition.

A.1.5 Exercises

A.1.1. Let X be a set and (Y, C) be a measurable space. Show that, for anytransformation f : X → Y there exists some σ-algebra B of subsets of X suchthat the transformation is measurable with respect to the σ-algebras B and C.

A.1.2. Let X be a set and consider the family of subsets

B0 = A ⊂ X : A is finite or Ac is finite.

Show that B0 is an algebra. Moreover, B0 is a σ-algebra if and only if the setX is finite. Show also that, in general,

B1 = A ⊂ X : A is finite or countable or Ac is finite or countable

is the σ-algebra generated by the algebra B0.

A.1.3. Prove Proposition A.1.5.

A.1.4. The purpose of this exercise is to exhibit a non-Borel subset of the realline. Let α be any irrational number. Consider the following relation on R :x ∼ y ⇔ there are m,n ∈ Z such that x − y = m + nα. Check that ∼ is anequivalence relation and every equivalence class intersects [0, 1). Let E0 be asubset of [0, 1) containing exactly one element of each equivalence class (theexistence of such a set is a consequence of the Axiom of Choice). Show that E0

is not a Borel set.


A.1.5. Let (X,B, µ) be a measure space. Show that if A1,A2, . . . are in B then

µ(∞⋃

j=1

Aj) ≤∞∑

j=1

µ(Aj).

A.1.6 (Lemma of Borel-Cantelli). Let (En)n be a countable family of measur-able sets. Let F be the set of points that belong to En for infinitely many valuesof n, that is, F = lim supnEn = ∩∞

k=1 ∪∞n=k En. Show that if

∑n µ(En) < ∞

then µ(F ) = 0.

A.1.7. Prove Theorem A.1.14.

A.1.8. Let A be the collection of subsets of X = [0, 1] that may be written asfinite unions of pairwise disjoint intervals. Check that A is an algebra of subsetsof X . Let m0 : A → [0, 1] be the function defined on this algebra by

m0

(I1 ∪ · · · ∪ IN

)= |I1| + · · · + |IN |,

where |Ij | represents the length of Ij . Show that m0 is σ-additive.

A.1.9. Let B be an algebra of subsets of X and µ : B → [0,+∞) be a finitelyadditive function with µ(X) <∞. Show that µ is σ-additive if and only if anyone of the following conditions holds:

1. limn µ(An) = µ(∩∞j=1Aj) for any decreasing sequence A1 ⊃ · · · ⊃ Aj ⊃ · · ·

of elements of B;

2. limn µ(An) = µ(∪∞j=1Aj) for any increasing sequence A1 ⊂ · · · ⊂ Aj ⊂ · · ·

of elements of B.

A.1.10. Show that ‖µ‖ = |µ|(X) defines a complete norm in the vector spaceof finite signed measures on a measurable space (X,B).

A.1.11. Let X = 1, . . . , d be a finite set, endowed with the discrete topology,and let M = XI with I = N or I = Z.

(a) Check that (A.2.7) defines a distance on M and that the topology definedby this distance coincides with the product topology on M . Describe theopen balls and the closed balls around any point x ∈ XI.

(b) Without using the theorem of Tychonoff, show that (M,d) is a compactspace.

(c) Let A be the algebra generated by the elementary cylinders of M . Showthat every additive function µ : A → [0, 1] with µ(M) = 1 extends to aprobability measure on the Borel σ-algebra of M .

A.1.12. Let K ⊂ [0, 1] be the Cantor set, that is, K = ∩∞n=0Kn where K0 =

[0, 1] and each Kn is the set obtained by removing from each connected compo-nent C of Kn−1 the open interval whose center coincides with the center of Cand whose length is one third of the length of C. Show that K has Lebesguemeasure equal to zero.


A.1.13. Given a set E ⊂ Rd, prove that the following conditions are equivalent:

(a) E is a Lebesgue measurable set.

(b) E belongs to the completion of the Borel σ-algebra relative to the Lebesguemeasure, that is, there exist Borel sets B1, B2 ⊂ Rd such that B1 ⊂ E ⊂B2 and m(B2 \B1) = 0.

(c) (Approximation from above by open sets) Given ε > 0 we can find anopen set A such that E ⊂ A and m∗(A \ E) < ε.

(d) (Approximation from below by closed sets) Given ε > 0 we can find aclosed set F such that F ⊂ E and m∗(E \ F ) < ε.


A.1.15. Let gn : M → R, n ≥ 1 be a sequence of measurable functions suchthat f(x) =

∑∞n=1 gn(x) converges at every point. Show that the sum f is a

measurable function.


A.1.17. Let f : X → X be a measurable transformation and ν be a measureon X . Define (f∗ν)(A) = ν(f−1(A)). Show that f∗ν is a measure and note thatit is finite if and only if ν itself is finite.

A.1.18. Let ω5 : [0, 1] → [0, 1] be the function assigning to each x ∈ [0, 1] theupper frequency of the digit 5 in the decimal expansion of x. In other words,writing x = 0, a0a1a2 . . . ... with ai 6= 9 for infinitely many values of i,

ω5(x) = lim supn

1

n#0 ≤ j ≤ n− 1 : aj = 5.

Prove that the function ω5 is measurable.

A.2 Integration in measure spaces

In this appendix we define the Lebesgue integral of a measurable function withrespect to a measure. This generalizes to a much broader class of functions thenotion of Riemann integral that is usually presented in Calculus or introduc-tory Analysis courses. Indeed, the Riemann integral is not defined for manyuseful functions, for example, the characteristic functions of arbitrary measur-able sets (see Example A.2.5 below). In contrast, the Lebesgue integral makessense for the whole class of measurable functions, which, as we have seen inProposition A.1.31, is closed under all the main operations in Analysis.

Also in this appendix, we state some important results about the behaviorof the (Lebesgue) integral under limits of sequences. Moreover, we describe theproduct of any finite family of finite measures; for probability measures we evenextend this construction to countable families. Near the end, we discuss therelated notions of absolute continuity and Lebesgue derivation.

A.2. INTEGRATION IN MEASURE SPACES 453

A.2.1 Lebesgue integral

Throughout this section, we always take (X,B, µ) to be a measure space. Weare going to introduce the notion of Lebesgue integral in a certain number ofsteps. The first one deals with the integral of a simple function:

Definition A.2.1. Let s =∑k

j=1 αjXAj be a simple function. The integral ofs is given by:

∫s dµ =

k∑

j=1

αjµ(Aj).

It is easy to check (Exercise A.2.1) that this definition is consistent: if twodifferent linear combinations of characteristic functions define the same functionthen the values of the integrals obtained from those two linear combinations areequal.

The next step is to define the integral of a non-negative measurable function.The idea is to approximate the function by a monotone sequence of simplefunctions, using Proposition A.1.33:

Definition A.2.2. Let f : X → [0,∞] be a non-negative measurable function.Then ∫

fdµ = limn

∫sndµ,

where s1 ≤ s2 ≤ . . . is a non-decreasing sequence of simple functions such thatlimn sn(x) = f(x) for every x ∈ X .

It is not difficult to check (Exercise A.2.2) that this definition is consistent:the value of the integral does not depend on the choice of the sequence (sn)n.

Next, to extend the definition of integral to an arbitrary measurable function,let us observe that given any function f : X → [−∞,+∞] we can always writef = f+ − f− with

f+(x) = maxf(x), 0 and f−(x) = max−f(x), 0.

It is clear that the functions f+ and f− are non-negative. Moreover, by Propo-sition A.1.31, they are measurable whenever f is measurable.

Definition A.2.3. Let f : X → [−∞,+∞] be a measurable function. Then,

∫f dµ =

∫f+ dµ−

∫f− dµ,

as long as at least one of the integrals on the right-hand side is finite (with theusual conventions that (+∞)− a = +∞ and a− (+∞) = −∞ for every a ∈ R).

Definition A.2.4. A function f : X → [−∞,+∞] is integrable if it is mea-surable and its integral is a real number. We denote the set of all integrablefunctions as L1(X,B, µ) or, simply, as L1(µ).


Given a measurable function f : X → [−∞,∞] and a measurable set E, wedefine the integral of f over E to be

∫

E

fdµ =

∫fXE dµ,

where XE is the characteristic function of the set E.

Example A.2.5. Consider X = [0, 1] endowed with the Lebesgue measure m.Let f = XB, where B is the subset of rational numbers. Then m(B) = 0 andso, using Definition A.2.2, the Lebesgue integral of f is equal to zero. On theother hand, a direct calculation shows that every lower Riemann sum of f isequal to 0, while every upper Riemann sum of f is equal to 1. So, the Riemannintegral of f does not exist. Indeed, more generally, the Riemann integral of thecharacteristic function of a measurable set exists if and only if the boundary ofthe set has zero Lebesgue measure. Note that in the present case the boundaryis the whole [0, 1], which has positive Lebesgue measure.

Example A.2.6. Let x1, . . . , xm ∈ X and p1, . . . , pm > 0 with p1+· · ·+pm = 1.Let µ be the probability measure µ defined on 2X by

µ =

m∑

i=1

piδxi where δxi is the Dirac mass at xi.

In other words, µ(A) =∑

xi∈Api for every subset A of X . Then, for any

function f : X → [−∞,+∞],

∫f dµ =

m∑

i=1

pif(xi).

Proposition A.2.7. The set L1(µ) of all real integrable functions is a realvector space. Moreover, the map I : L1(µ) → R given by I(f) =

∫f dµ is a

positive linear functional:

(1)∫af + bg dµ = a

∫f dµ+ b

∫g dµ and

(2)∫f dµ ≥

∫g dµ if f(x) ≥ g(x) for every x.

In particular, |∫f dµ| ≤

∫|f | dµ if |f | ∈ L1(µ). Moreover, |f | ∈ L1(µ) if and

only if f ∈ L1(µ).

The notion of Lebesgue integral may be extended to an even broader classof functions, in two different ways. On the one hand, we may consider complexfunctions f : X → C. In this case, we say that f is integrable if and onlyif the real part ℜf and the imaginary part ℑf are both integrable. Then, bydefinition, ∫

f dµ =

∫ℜf dµ+ i

∫ℑf dµ.


On the other hand, we may consider functions that are not necessarily mea-surable but coincide with some measurable function on a subset of the domainwith total measure. To explain this, we need the following notion, which is usedfrequently throughout the text:

Definition A.2.8. We say that a property holds at µ-almost every point (orµ-almost everywhere) if the subset of points of X for which it does not hold iscontained in some zero measure set.

For example, we say that a sequence of functions (fn)n converges to somefunction at µ-almost every point if there exists some measurable set N ⊂ Xwith µ(N) = 0 such that f(x) = limn fn(x) for every x ∈ X \N . Analogously,we say that two functions f and g are equal at µ-almost every point if thereexists a measurable set N ⊂ X with µ(N) = 0 such that f(x) = g(x) for everyx ∈ X \ N . Clearly, this is an equivalence relation in the space of functionsdefined on X . Moreover, assuming that the two functions are integrable, itimplies that the two integrals coincide:

∫f dµ =

∫g dµ if f = g at µ-almost every point.

This observation permits to define the integral for any function f , possiblynon-measurable, that coincides at µ-almost every point with some measurablefunction g: it suffices to take

∫f dµ =

∫g dµ.

To close this section, let us observe that the notion of integral may also beextended to signed measures and even complex measures, as follows. Let µ bea signed measure and µ = µ+ − µ− be its Hahn decomposition. We say that afunction φ is integrable with respect to µ if it is integrable with respect to bothµ+ and µ−. Then we define:

∫φdµ =

∫φdµ+ −

∫φdµ−.

Similarly, let µ be a complex measure. By definition, a function φ is integrablewith respect to µ if it is integrable with respect to both the real part ℜµ andthe imaginary part ℑµ. Then we define:

∫φdµ =

∫φdℜµ−

∫φdℑµ.

A.2.2 Convergence theorems

Next, we mention three important results concerning the convergence of func-tions under the integral sign. The first one deals with monotone sequences offunctions:

Theorem A.2.9 (Monotone convergence). Let fn : X → [−∞,+∞] be a non-decreasing sequence of non-negative measurable functions. Consider the functionf : X → [−∞,+∞] defined by f(x) = limn fn(x). Then,

limn

∫fn dµ =

∫f(x) dµ.


The next result applies to much more general sequences, not necessarilymonotone:

Theorem A.2.10 (Lemma of Fatou). Let fn : X → [0,+∞] be a sequenceof non-negative measurable functions. Then the function f : X → [−∞,+∞]defined by f(x) = lim infn fn(x) is integrable and satisfies

∫lim inf

nfn(x) dµ ≤ lim inf

n

∫fn dµ.

The most powerful of the results in this section is the dominated conver-gence theorem, which asserts that we may take the limit under the integral signwhenever the sequence of functions is bounded by some integrable function:

Theorem A.2.11 (Dominated convergence). Let fn : X → R be a sequenceof measurable functions and assume that there exists some integrable functiong : X → R such that |fn(x)| ≤ |g(x)| for µ-almost every x in X. Assumemoreover that the sequence (fn)n converges at µ-almost every point to somefunction f : X → R. Then f is integrable and satisfies

limn

∫fn dµ =

∫f dµ.

In Exercise A.2.7 we invite the reader to deduce the dominated convergencetheorem from the Lemma of Fatou.

A.2.3 Product measures

Let (Xj ,Aj , µj), j = 1, . . . , n be finite measure spaces, that is, such thatµj(Xj) < ∞ for every j. One can endow the Cartesian product X1 × · · · ×Xn

with the structure of a finite measure space, in the following way. Consider onX1 × · · · ×Xn the σ-algebra generated by the family of all subsets of the formA1×· · ·×An with Aj ∈ Aj . This is called the product σ-algebra and is denotedby A1 ⊗ · · · ⊗ An.

Theorem A.2.12. There exists a unique measure µ on the measurable space(X1 × · · · ×Xn,A1 ⊗ · · · ⊗An) such that µ(A1 × · · · ×An) = µ1(A1) · · ·µn(An)for every A1 ∈ A1, . . . , An ∈ An. In particular, µ is a finite measure.

The proof of this result (see Theorem 35.B in the book of Halmos [Hal50])combines the extension theorem (Theorem A.1.13) with the monotone conver-gence theorem (Theorem A.2.9). The measure µ in the statement is the productof the measures µ1, . . . , µn and is denoted by µ1 × · · · × µn. In this way onedefines the product measure space

(X1 × · · · ×Xn,A1 ⊗ · · · ⊗ An, µ1 × · · · × µn).

Theorem A.2.12 remains valid when the measures µj are just σ-finite, exceptthat in this case the product measure µ is also only σ-finite.


Next, we describe the product of a countable family of measure spaces. Ac-tually, presently we restrict ourselves to the case of probability spaces. Let(Xj ,Bj, µj), j ∈ I be probability measure spaces with µj(Xj) = 1 for everyj ∈ I. What follows holds for both I = N and I = Z. Consider the Cartesianproduct

Σ =∏

j∈I

Xj = (xj)j∈I : xj ∈ Xj. (A.2.1)

We call cylinders of Σ all subsets of the form

[m;Am, . . . , An] = (xj)j∈I : xj ∈ Aj for m ≤ j ≤ n (A.2.2)

where m ∈ I and n ≥ m and Aj ∈ Bj for each m ≤ j ≤ n. Note that X itself isa cylinder: we may write X = [1;X1], for example. By definition, the productσ-algebra on Σ is the σ-algebra B generated by the family of all cylinders. Thefamily A of all finite unions of pairwise disjoint cylinders is an algebra and itgenerates the product σ-algebra B.

Theorem A.2.13. There exists a unique measure µ on (Σ,B) such that

µ([m;Am, . . . , An]) = µm(Am) · · ·µn(An) (A.2.3)

for every cylinder [m;Am, . . . , An]. In particular, µ is a probability measure.

The proof of this theorem (see Theorem 38.B in the book of Halmos [Hal50])uses the extension theorem (Theorem A.1.13) together with the theorem ofcontinuity at the empty set (Theorem A.1.14). The probability measure µ iscalled the product of the measures µj and is denoted as

∏j∈I µj . The probability

space (Σ,B, µ) is called the product of the spaces (Xj ,Bj , µj), j ∈ I.An important special case is when the spaces (Xi,Bi, µi) are all equal to

a given (X, C, ν). The corresponding product space may be used to model asequence of identical random experiments such that the outcome of each exper-iment is independent of all the others. To explain this, take X to be the set ofpossible outcomes of each experiment and let ν be the probability distributionof those outcomes. In this context, the measure µ = νI =

∏j∈I ν is usually

called the Bernoulli measure defined by ν. Property (A.2.3) corresponds to theidentity

µ([m;Am, . . . , An]) =

n∏

j=m

ν(Aj), (A.2.4)

which may be read in the following way: the probability of any composite eventxm ∈ Am, . . . , xn ∈ An is equal to the product of the probabilities of theindividual events xi ∈ Ai. So, (A.2.4) does reflect the assumption that thesuccessive experiments are mutually independent.

We have a special interest in the case when X is a finite set, endowed withthe σ-algebra C = 2X of all its subsets. In this case, it is useful to consider theelementary cylinders

[m; am, . . . , an] = (xj)j∈I ∈ X : xm = am, . . . , xn = an, (A.2.5)


corresponding to subsets Aj consisting of a single point aj . Observe that everycylinder is a finite union of pairwise disjoint elementary cylinders. In particular,the σ-algebra generated by the elementary cylinders coincides with the σ-algebragenerated by all the cylinders, and the same is true for the generated algebra.Moreover, the relation (A.2.4) may be written as

µ([m; am, . . . , an]) = pam · · · pan where pa = ν(a) for a ∈ X . (A.2.6)

Consider the finite set X endowed with the discrete topology. The producttopology on Σ = XI coincides with the topology generated by the elemen-tary cylinders. Moreover (see Exercise A.1.11), it coincides with the topologyassociated with the distance defined by

d((xi)i∈I , (yi)i∈I

)= θN , (A.2.7)

where θ ∈ (0, 1) is fixed and N = N((xi)i∈I , (yi)i∈I) ≥ 0 is the largest integernumber such that xi = yi for every i ∈ I with |i| < N .

A.2.4 Derivation of measures

Let m be the Lebesgue measure on Rd. Given a measurable subset A of Rd, wesay that a ∈ Rd is a density point of A if the subset A occupies most of everysmall neighborhood of a, in the following sense:

limδ→0

m(B(a, δ) ∩A)

m(B(a, δ))= 1. (A.2.8)

Theorem A.2.14. Let A be a measurable subset of Rd with Lebesgue measurem(A) positive. Then m-almost every a ∈ A is a density point of A.

In Exercise A.2.11 we propose a proof of this result. It is also a directconsequence of the theorem that we state in the sequel. We say that a functionf : Rd → R is locally integrable if the product fXK is integrable for everycompact set K ⊂ Rd.

Theorem A.2.15 (Lebesgue derivation). Let X = Rd and B be a Borel σ-algebra and m be the Lebesgue measure on Rd. Let f : X → R be a locallyintegrable function. Then

limr→0

1

m(B(x, r))

∫

B(x,r)

|f(y) − f(x)|dm = 0 at m-almost every point x.

In particular,

limr→0

1

m(B(x, r))

∫

B(x,r)

f(y)dm = f(x) at m-almost every point x.

The crucial ingredient in the proof of these results is the following geometricfact:


Theorem A.2.16 (Lemma of Vitali). Let m be the Lebesgue measure on Rd

and suppose that for every x ∈ R one is given a sequence (Bn(x))n of ballscentered at x with radii converging to zero. Let A ⊂ Rd be a measurable set withm(A) > 0. Then, for every ε > 0 there exist sequences (xj)j in R and (nj)j inN such that

1. the balls Bnj (xj) are pairwise disjoint;

2. m(∪j Bnj (xj) \A

)< ε and m

(A \ ∪jBnj (xj)

)= 0.

This theorem remains valid if, instead of balls, we take for (Bn(x))n anysequence of sets such that ∩nBn(x) = x and

supx,n

supd(x, y) : y ∈ Bn(x)infd(x, z) : z /∈ Bn(x)

<∞.

The set of measures defined on the same measurable space possesses a naturalpartial order relation:

Definition A.2.17. Let µ and ν be two measures in the same measurablespace (X,B). We say that ν is absolutely continuous with respect to µ if everymeasurable set E that satisfies µ(E) = 0 also satisfies ν(E) = 0; then we writeν ≪ µ. We say that µ and ν are equivalent if each one of them is absolutelycontinuous with respect to the other; then we write µ ∼ ν. In other words, twomeasures are equivalent if they have exactly the same zero measure sets.

Another very important result, known as the theorem of Radon-Nikodym,asserts that if ν ≪ µ then the measure ν may be seen as the product of µ bysome measurable function ρ:

Theorem A.2.18 (Radon-Nikodym). If µ and ν are finite measures such thatν ≪ µ then there exists a measurable function ρ : X → [0,+∞] such thatν = ρµ, meaning that∫φdν =

∫φρ dµ for any bounded measurable function φ : X → R. (A.2.9)

In particular, ν(E) =∫E ρ dµ for every measurable set E ⊂ X. Moreover, ρ

is essentially unique: any two functions satisfying (A.2.9) coincide at µ-almostevery point.

We call ρ the density, or Radon-Nikodym derivative, of ν relative to µ andwe write

ρ =dν

dµ.

Definition A.2.19. Let µ and ν be two measures in the same measurablespace (X,B). We say that µ and ν are mutually singular if there exist disjointmeasurable subsets A and B such that A∪B = X and µ(A) = 0 and ν(B) = 0.Then we write µ ⊥ ν.


The Lebesgue decomposition theorem states that, given any two finite mea-sures µ and ν in the same measurable space, we may write ν = νa+νs where νaand νs are finite measures such that νa ≪ µ and νs ⊥ µ. Combining this withthe theorem of Radon-Nikodym, we get:

Theorem A.2.20 (Lebesgue decomposition). Given any finite measures µ andν, there exist a measurable function ρ : X → [0,+∞] and a finite measure ηsuch that ν = ρµ+ η and η ⊥ µ.

A.2.5 Exercises

A.2.1. Prove that the integral of a simple function is well defined: if two lin-ear combinations of characteristic functions define the same function, then thevalues of the integrals obtained from the two combinations coincide.

A.2.2. Show that if (rn)n and (sn)n are non-decreasing sequences of non-negative functions converging at µ-almost every point to the same functionf : M → [0,+∞), then limn

∫rn dµ = limn

∫sn dµ.


A.2.4. (Tchebysheff-Markov inequality) Let f : M → R be a non-negativefunction integrable with respect to a finite measure µ. Then, given any realnumber a > 0,

µ(x ∈M : f(x) ≥ a

)≤ 1

a

∫

X

f dµ.

In particular, if∫|f | dµ = 0 then µ

(x ∈ X : f(x) 6= 0

)= 0.

A.2.5. Let f be an integrable function. Show that for every ε > 0 there existsδ > 0 such that |

∫Ef dµ| < ε for every measurable set E with µ(E) < δ.

A.2.6. Let ψ1, . . . , ψN : M → R be bounded measurable functions defined on aprobability space (M,B, µ). Show that for any ε > 0 there exist x1, . . . , xs ∈Mand positive numbers α1, . . . , αs such that

∑sj=1 αj = 1 and

|∫ψi dµ−

s∑

j=1

αjψi(xj)| < ε for every i = 1, . . . , N .

A.2.7. Deduce the dominated convergence theorem (Theorem A.2.11) from theLemma of Fatou (Theorem A.2.10).

A.2.8. A set F of measurable functions f : M → R is said to be uniformlyintegrable with respect to a probability measure µ if for every α > 0 there existsC > 0 such that

∫|f |>C |f | dµ < α for every f ∈ F . Show that

(a) F is uniformly integrable with respect to µ if and only if there existsL > 0 and for every ε > 0 there exists δ > 0 such that

∫|f | dµ < L and∫

A|f | dµ < ε for every f ∈ F and every measurable set A with µ(A) < δ.


(b) If there exists a function g : M → R integrable with respect to µ suchthat |f | ≤ |g| for every f ∈ F (we say that F is dominated by g) then theset F is uniformly integrable with respect to µ.

(c) If the set F is uniformly integrable with respect to µ then limn

∫fn dµ =∫

lim fn dµ for any sequence (fn)n in F such that limn fn exists at µ-almost every point.

A.2.9. Show that a is a density point of a set A ⊂ Rd if and only if

limδ→0

inf

m(B ∩A)

m(B): B a ball with a ∈ B ⊂ B(a, δ)

= 1 (A.2.10)

A.2.10. Let Pn, n ≥ 1 be a sequence of countable partitions of Rd into mea-surable subsets. Assume that the diameter diamPn = supdiamP : P ∈ Pnconverges to zero when n → ∞. Show that, given any measurable set A ⊂ Rd

with positive Lebesgue measure, it is possible to choose sets Pn ∈ Pn, n ≥ 1 insuch a way that m(A ∩ Pn)/m(Pn) → 1 when n→ ∞.

A.2.11. Prove Theorem A.2.14.

A.2.12. Consider x1, x2 ∈M and p1, p2, q1, q2 > 0 with p1 + p2 = q1 + q2 = 1.Let µ and ν be the probability measures given by

µ(A) =∑

xi∈A

pi, ν(A) =∑

xi∈A

qi,

that is, µ = p1δx1 + p2δx2 and ν = q1δx1 + q2δx2 . Check that ν ≪ µ and µ≪ νand calculate the corresponding Radon-Nikodym derivatives.

A.2.13. Construct a probability measure µ on [0, 1] absolutely continuous withrespect to the Lebesgue measure m and such that there exists a measurable setK ⊂ [0, 1] with µ(K) = 0 and m(K) = 1/2. In particular, m is not absolutelycontinuous with respect to µ. Could we require that m(K) = 1?

A.2.14. Assume that f : X → X is such that there exists a countable coverof M by measurable sets Bn, n ≥ 1, such that the restriction of f to each Bnis a bijection onto its image, with measurable inverse. Let η be a probabilitymeasure on M such that A ⊂ Bn and η(A) = 0 implies η(f(A)) = 0. Show thatthere exists a function Jη : X → [0,+∞] such que

∫

f(Bn)

ψ dη =

∫

Bn

(ψ f)Jη dη

for every bounded measurable function ψ : X → R and every n. Moreover, Jηis essentially unique.

A.2.15. Let µ = µ+−µ− be the Hahn decomposition of a finite signed measureµ. Show that there exist functions ρ± and τ± such that µ+ = ρ+|µ| = τ+µ andµ− = ρ−|µ| = τ−µ. Which functions are these?


A.2.16. Let (µn)n and (νn)n be two sequences of measures such that µ =∑

n µnand ν =

∑n νn are finite measures. Let µn =

∑ni=1 µi and νn =

∑ni=1 νi. Show

that if µn ≪ νn for every n then µ≪ ν and

dµ

dν= lim

n

dµndνn

at ν-almost every point.

A.3 Measures in metric spaces

In this appendix, unless stated otherwise, µ is a Borel probability measure on ametric space M , that is, a probability measure defined on the Borel σ-algebraof M . Most of the results extend immediately to finite Borel measures, in fact.

Recall that a metric space is a pair (M,d) where M is a set and d is adistance in M , that is, a function d : M ×M → R satisfying:

1. d(x, y) ≥ 0 for any x, y and the equality holds if and only if x = y;

2. d(x, y) = d(y, x) for any x, y;

3. d(x, y) ≤ d(x, z) + d(z, y) for any x, y, z.

We denote B(x, r) = y ∈M : d(x, y) < r and call it the ball of center x ∈Mand radius r > 0.

Every metric space has a natural structure of a topological space where thefamily of balls centered at each point is a basis of neighborhoods for that point.Equivalently, a subset of M is open if and only if it contains some ball centeredat each one of its points. In the converse direction, one says that a topologicalspace is metrizable if its topology can be defined in this way, from some distancefunction.

A.3.1 Regular measures

A first interesting fact is that any probability measure on a metric space iscompletely determined by the values it takes on the open subsets (or the closedsubsets) of the space.

Definition A.3.1. A (Borel) measure µ on a topological space is regular if forevery measurable subset B and every ε > 0 there exists a closed set F and anopen set A such that F ⊂ B ⊂ A and µ(A \ F ) < ε.

Proposition A.3.2. Any probability measure on a metric space is regular.

Proof. Let B0 be the family of all Borel subsets B for which the condition inthe definition holds, that is, such that for every ε > 0 there exist a closed set Fand an open set A satisfying F ⊂ B ⊂ A and µ(A \ F ) < ε. Begin by notingthat B0 contains all the closed subsets of M . Indeed, let B be any closed setand let Bδ denote the (open) set of points whose distance to B is less than δ.By Theorem A.1.14, we have that µ(Bδ \ B) → 0 when δ → 0. Hence, we maytake F = B and A = Bδ for δ > 0 sufficiently small.

A.3. MEASURES IN METRIC SPACES 463

It is immediate that the family B0 is closed under passing to the complement,that is, Bc ∈ B0 whenever B ∈ B0. Furthermore, consider any countable familyBn, n = 1, 2, . . . of elements of B0 and let B = ∪∞

n=1Bn. By hypothesis, forevery n ∈ N and ε > 0 there exist a closed set Fn and an open set An satisfyingFn ⊂ Bn ⊂ An and µ(An \ Fn) < ε/2n+1. The union A = ∪∞

n=1An is an openset and any finite union F = ∪mn=1Fn is a closed set. Fix m large enough that

µ( ∞⋃

n=1

Fn \ F)< ε/2

(recall Theorem A.1.14). Then F ⊂ B ⊂ A and

µ(A \ F

)≤

∞∑

n=1

µ(An \ Fn

)+ µ

( ∞⋃

n=1

Fn \ F)<

∞∑

n=1

ε

2n+1+ε

2= ε.

This shows that B ∈ B0. In this way, we have shown that B0 is a σ-algebra.Hence, B0 contains all the Borel subsets of M .

It follows that, as stated above, the values that the probability measure µtakes on the closed subsets of M determine µ completely: if ν is another prob-ability measure such that µ(F ) = ν(F ) for every closed set F then, passing tothe complement, µ(A) = ν(A) for every open set A and, using the theorem,µ(B) = ν(B) for every Borel set B. In other words, µ = ν. The same argu-ment shows that the values of µ on the open sets also determine the measurecompletely.

The proposition that we state and prove next implies that the values of theintegrals of the bounded continuous functions also determine the probabilitymeasure completely. Indeed, the same is true for the (smaller) set of boundedLipschitz functions.

Recall that a map h : M → N is Lipschitz if there exists some constantC > 0such that d(h(x), h(y)) ≤ Cd(x, y) for every x, y ∈ M . When it is necessary tospecify the constant, we say that the function h is C-Lipschitz. More generally,we say that h is Holder if there exist C, θ > 0 such that d(h(x), h(y)) ≤ Cd(x, y)θ

for every x, y ∈M . Then we also say that h is θ-Holder or even (C, θ)-Holder.

Proposition A.3.3. If µ and ν are probability measures on a metric space Mwith

∫ϕdµ =

∫ϕdν for every bounded Lipschitz function ϕ : M → R then

µ = ν.

Proof. We are going to use the following simple topological fact:

Lemma A.3.4. Given any closed subset F of M and any δ > 0, there existsa Lipschitz function gδ : M → [0, 1] such that gδ(x) = 1 for every x ∈ F andgδ(x) = 0 for every x ∈M such that d(x, F ) ≥ δ.

Proof. Consider the function h : R → [0, 1] given by h(s) = 1 if s ≤ 0 andh(s) = 0 if s ≥ 1 and h(s) = 1 − s if 0 ≤ s ≤ 1. Define

g : M → [0, 1], g(x) = h(1δd(x, F )

).


Note that g is Lipschitz, since it is a composition of Lipschitz functions. Theother properties in the lemma follow immediately from the definition.

Now we may finish the proof of Proposition A.3.3. Let F be any closedsubset of M and, for every δ > 0, let gδ : M → [0, 1] be a function as in thelemma above. By assumption,

∫gδ dµ =

∫gδ dν for every δ > 0.

Moreover, by the dominated convergence theorem (Theorem A.2.11),

limδ→0

∫gδ dµ = µ(F ) and lim

δ→0

∫gδ dν = ν(F ).

This shows that µ(F ) = ν(F ) for every closed subset F . As pointed out before,the latter implies µ = ν.

As observed in Example A.1.29, continuous maps are automatically mea-surable relative to the Borel σ-algebra. The result that we prove next assertsthat, under a simple condition on the metric space, there is a kind of converse:measurable maps are continuous, restricted to certain subsets with almost fullmeasure.

A subset of a topological space M is dense if it intersects every open subsetof M . We say that the space M is separable if it admits some countable densesubset. In the special case of metric spaces this is equivalent to saying that thetopology admits a countable basis of open sets (Exercise A.3.1).

Theorem A.3.5 (Lusin). Let ϕ : M → N be a measurable map with valuesin some separable metric space N . Given any ε > 0, there exists a closed setF ⊂M such that µ(M \ F ) < ε and the restriction of ϕ to F is continuous.

Proof. Let xn : n ∈ N be a countable dense subset of N and, for every k ≥ 1,letBn,k be the ball of center xn and radius 1/k. Fix ε > 0. By Proposition A.3.2,for every (n, k) we may find an open set An,k ⊂ M containing ϕ−1(Bn,k) andsatisfying µ(An,k \ ϕ−1(Bn,k)) < ε/2n+k+1. Define

E =

∞⋂

n,k=1

(ϕ−1(Bn,k) ∪Acn,k

).

On the one hand,

µ(M \ E) ≤∞∑

n,k=1

µ(An,k \ ϕ−1(Bn,k)) <

∞∑

n,k=1

ε

2n+k+1=ε

2.

On the other hand, every ϕ−1(Bn,k) is an open subset of ϕ−1(Bn,k)∪Acn,k, since

the complement is the closed set Acn,k. Consequently, ϕ−1(Bn,k) is open in Efor every (n, k). This shows that the restriction of ϕ to the set E is continuous.To conclude the proof it suffices to use Proposition A.3.2 once more to find aclosed set F ⊂ E such that µ(E \ F ) < ε/2.


A.3.2 Separable complete metric spaces

Next, we discuss another important property of measures on metric spaces thatare both separable and complete. Recall that the latter means that every Cauchysequence converges.

Definition A.3.6. A (Borel) measure µ on a topological space is tight if forevery ε > 0 there exists a compact subset K such that µ(Kc) < ε.

Since every closed subset of a compact metric space is also compact, itfollows immediately from Proposition A.3.2 that every probability measure ona compact metric space is tight. However, this conclusion is a lot more general:

Proposition A.3.7. Every probability measure on a separable complete metricspace is tight.

Proof. Let pk : k ∈ N be a countable dense subset of M . Then, for everyn ≥ 1, the closed balls B(pk, 1/n), k ∈ N form a countable cover of M . Givenε > 0 and n ≥ 1, fix k(n) ≥ 1 in such a way that the (closed) set

Ln =

k(n)⋃

k=1

B(pk, 1/n)

satisfies µ(Ln) > 1 − ε/2n. Take K = ∩∞n=1Ln. Note that K is closed and

µ(Kc) ≤ µ( ∞⋃

n=1

Lcn)<

∞∑

n=1

ε

2n= ε.

It remains to check that K is compact. For that, it is enough to show that everysequence (xi)i in K admits some Cauchy subsequence (since M is complete, thissubsequence converges). Such a subsequence may be found as follows. Sincexi ∈ L1 for every i, there exists l(1) ≤ k(1) such that the set of indices

I1 = i ∈ N : xi ∈ B(pl(1), 1)

is infinite. Let i(1) be the smallest element of I1. Next, since xi ∈ L2 for everyi, there exists l(2) ≤ k(2) such that

I2 = i ∈ I1 : xi ∈ B(pl(2), 1/2)

is infinite. Let i(2) be the smallest element of I2 \ i(1). Repeating thisprocedure, we construct a decreasing sequence In of infinite subsets of N, andan increasing sequence i(1) < i(2) < · · · < i(n) < · · · of integer numbers suchthat i(n) ∈ In and all the xi, i ∈ In are contained in the same closed ball ofradius 1/n. In particular,

d(xi(a), xi(b)) ≤ 2/n for every a, b ≥ n.

This shows that the subsequence (xi(n))n is indeed Cauchy.


Corollary A.3.8. Assume that M is a separable complete metric space and µis a probability measure on M . For every ε > 0 and every Borel set B ⊂ Mthere exists a compact set L ⊂ B such that µ(B \ L) < ε.

Proof. By Proposition A.3.2, we may find some closed set F ⊂ B such thatµ(B \F ) < ε/2. By Theorem A.3.5, there exists a compact subset K ⊂M suchthat µ(M \K) < ε/2. Take L = F∩K. Then L is compact and µ(B\L) < ε.

Analogously, when the metric space M is separable and complete we canimprove the statement of Lusin’s theorem, replacing ‘closed’ with ‘compact’ inthe conclusion:

Theorem A.3.9 (Lusin). Assume that M is a separable complete metric spaceand µ is a probability measure on M . Let ϕ : M → N be a measurable mapwith values in a separable metric space N . Then, given any ε > 0 there existsa compact set K ⊂M such that µ(M \K) < ε and the restriction of ϕ to K iscontinuous.

We close this section with another important fact about measures on sepa-rable complete metric spaces. A measure µ is called atomic if there exists somepoint x such that µ(x) > 0; any such point is called an atom. Otherwise, themeasure µ is said to be non-atomic.

The next theorem states that every non-atomic probability measure on aseparable complete metric space is equivalent to the Lebesgue measure in theinterval. The proof is given in Section 8.5.

Theorem A.3.10. Let M be a separable complete metric space and µ be anon-atomic probability measure on M . Then there exists a measurable mapψ : M → [0, 1] such that ψ is a bijection with measurable inverse, restricted toa subset with full measure, and ψ∗µ is the Lebesgue measure on [0, 1].

A.3.3 Space of continuous functions

Let M be a compact metric space. We are going to describe some importantproperties of the vector space C0(M) of continuous functions, real or complex,defined on M . We consider on this space the norm of uniform convergence,given by:

‖φ‖ = sup|φ(x)| : x ∈M.This norm is complete and, hence, endows C0(M) with the structure of a Banachspace.

The conclusions of the previous sections hold in this setting, since everycompact metric space is separable and complete. Another useful fact aboutcompact metric spaces is that every open cover admits some Lebesgue number,that is, some number ρ > 0 such that for every x ∈M there exists some elementof the cover that contains the ball B(x, ρ).

A linear functional Φ : C0(M) → C is said to be positive if Φ(ϕ) ≥ 0 forevery function ϕ ∈ C0(M) with ϕ(x) ≥ 0 for every x ∈ M . The theorem of


Riesz-Markov (see Theorem 6.19 in the book of Rudin [Rud87]) shows that theonly positive linear functionals on C0(M) are the integrals:

Theorem A.3.11 (Riesz-Markov). Let M be a compact metric space. Considerany positive linear functional Φ : C0(M) → C . Then there exists a unique finiteBorel measure µ on M such that

Φ(ϕ) =

∫ϕdµ for every ϕ ∈ C0(M).

Moreover, µ is a probability measure if and only if Φ(1) = 1.

The next result, which is also known as the theorem of Riesz-Markov, givesan analogous representation for continuous linear functionals in C0(M), notnecessarily positive. Recall that the norm of a linear functional Φ : C0(M) → Cis defined by

‖Φ‖ = sup |Φ(ϕ)|

‖ϕ‖ : ϕ 6= 0

(A.3.1)

and that Φ is continuous if and only if the norm is finite.

Theorem A.3.12 (Riesz-Markov). Let M be a compact metric space. Considerany continuous linear functional Φ : C0(M) → C. Then there exists somecomplex Borel measure µ on M such that

Φ(ϕ) =

∫ϕdµ for every ϕ ∈ C0(M).

The norm ‖µ‖ = |µ|(X) of the measure µ coincides with the norm ‖Φ‖ of thefunctional Φ. Moreover, µ takes values in [0,∞) if and only if Φ is positive andµ takes values in R if and only if Φ(ϕ) ∈ R for every real function ϕ.

In other words, this last theorem asserts that the dual space of C0(M) isisometrically isomorphic to M(M). Theorems A.3.11 and A.3.12 extend tolocally compact topological spaces, with suitable assumptions on the behaviorof the functions at infinity. In this context the measure µ is still regular, butnot necessarily finite.

We also use the fact that the space C0(M) has countable dense subsets(Exercise A.3.6 is a particular instance):

Theorem A.3.13. If M is a compact metric space then C0(M) is separable.

Proof. We treat the case of real functions; the complex case is entirely analogous.Every compact metric space is separable. Let xk : k ∈ N be a countable densesubset of M . For each k ∈ N , consider the function fk : M → R defined byfk(x) = d(x, xk). Represent by A the set of all functions f : M → R of the form

f = c+∑

k1,...,ks

ck1,...,ksfk1 · · · fks (A.3.2)


with c ∈ R and ck1,...,ks ∈ R for every k1, . . . , ks ∈ N. It is clear that A containsall constant functions. Observe also that A is an algebra of functions, thatis, it is closed under the operations of addition and multiplication (includingmultiplication by any constant). Moreover, A separates the points of M , in thesense that for any x 6= y there exists some f ∈ A such that f(x) 6= f(y). Tosee that, fix ε > 0 such that d(x, y) > 2ε, consider k ∈ N such that d(x, xk) < εand then take f = fk. Note that f(x) = d(x, xk) < ε while, by the triangleinequality, f(y) = d(y, xk) ≥ d(x, y)− d(x, xk) > ε. So, the algebra of functionsA is separating, as we claimed. Now, the theorem of Stone-Weierstrass (see[DS57, Theorem 4.6.16]) asserts that every separating subalgebra of the space ofcontinuous functions that contains the constant function 1 is dense in C0(M).The previous observations show that this applies to A. It follows that the(countable) set of functions of the form (A.3.2) with c ∈ Q and ck1,...,ks ∈ Q isalso dense in C0(M).

A.3.4 Exercises

A.3.1. Let M be a metrizable topological space. Justify that every point ofM admits a countable basis of neighborhoods. Check that M is separable ifand only if it admits a countable basis of open sets. Give examples of separablemetric spaces and non-separable metric spaces.

A.3.2. Let µ be a finite measure on a metric space M . Show that for everyclosed set F ⊂M there exists some finite or countable set E ⊂ (0,∞) such that

µ(x ∈M : d(x, F ) = r) = 0 for every r ∈ (0,∞) \ E.

A.3.3. Let µ be a finite measure on a separable metric space M . Show thatfor every ε > 0 there exists a countable partition of M into measurable subsetswith diameter less than ε and whose boundaries have measure zero.

A.3.4. Let µ be a probability measure on [0, 1] and φ : [0, 1] → [0, 1] be thefunction given by φ(x) = µ([0, x]). Check that φ is continuous if and only if µ isnon-atomic. Check that φ is absolutely continuous if and only if µ is absolutelycontinuous with respect to the Lebesgue measure.

A.3.5. Let µ be a probability measure on some metric space M . Show thatfor every integrable function ψ : M → R there exists a sequence ψn : M → R,n ≥ 1 of uniformly continuous functions converging to ψ at µ-almost everypoint. Moreover, if ψ is bounded then we may choose the sequence in sucha way that sup |ψn| ≤ sup |ψ| for every n. Do these claims remain true if werequire convergence at every point?

A.3.6. Without using Theorem A.3.13, show that the space C0([0, 1]d) of con-tinuous functions, real or complex, on the compact unit cube is separable, forevery d ≥ 1.

A.4. DIFFERENTIABLE MANIFOLDS 469

A.4 Differentiable manifolds

In this appendix we review some fundamental notions and facts from DifferentialTopology and Riemannian Geometry.

A.4.1 Differentiable manifolds and maps

A differentiable manifold of dimension d is a (Hausdorff) topological space Mendowed with a differentiable atlas of dimension d, that is, a family of homeo-morphisms ϕα : Uα → Xα such that

1. each Uα is an open subset of M and each Xα is an open subset of Rd andM = ∪αUα;

2. the map ϕβ ϕ−1α : ϕα(Uα ∩ Uβ) → ϕβ(Uα ∩ Uβ) is differentiable, for any

α and β such that Uα ∩ Uβ 6= ∅.

More generally, instead of Rd we may consider any Banach space E. Then wesay that M is a differentiable manifold modelled on the space E.

The homeomorphisms ϕα are called local charts, or local coordinates, and thetransformations ϕβ ϕ−1

α are called coordinate changes. Exchanging the roles ofα and β, we see that the inverse (ϕβ ϕ−1

α )−1 = ϕα ϕ−1β is also differentiable.

So, the definition of differentiable manifold requires the coordinate changes tobe diffeomorphisms between open subsets of Euclidean space.

Unless explicitly stated otherwise, we only consider manifolds such that Madmits a countable basis of open sets and is connected. The latter means thatno subset of M is both open and closed, except for M and ∅.

Let r ∈ N ∪ ∞. If every coordinate change is of class Cr (that is, allits partial derivatives up to order r exist and are continuous), we say that themanifold M (and the atlas ϕα : Uα → Xα) are of class Cr. Clearly, everymanifold of class Cr is also of class Cs for every s ≤ r.

Example A.4.1. The following are manifolds of class C∞ and dimension d:Euclidean space Rd: consider the atlas consisting of a unique map, namely,

the identity map Rd → Rd.Sphere Sd = (x0, x1, . . . , xd) ∈ Rd+1 : x2

0 + x21 + · · ·+ x2

d = 1: consider theatlas formed by the two stereographic projections:

Sd \ (1, 0, . . . , 0) → Rd, (x0, x1, . . . , xd) 7→ (x1, . . . , xd)/(1 − x0)

Sd \ (−1, 0, . . . , 0) → Rd, (x0, x1, . . . , xd) 7→ (x1, . . . , xd)/(1 + x0).

Torus Td = Rd/Zd: consider the atlas formed by the inverses of the mapsgz : (0, 1)d → Td, defined by gz(x) = z + x mod Zd for every z ∈ Rd.

Example A.4.2 (Grassmannian manifolds). Given 0 ≤ k ≤ d, denote byGr(k, d) the set of all vector subspaces of dimension k of the Euclidean spaceRd. For each j1 < · · · < jk, denote by Gr(k, d, j1, . . . , jk) the subset of elementsof Gr(k, d) that are transverse to (xj)j ∈ Rd : xj1 = · · · = xjk = 0. For every


V ∈ Gr(k, d, j1, . . . , jk) there exists a unique matrix (ui,j)i,j with (d − k) rowsand k columns such that

V =(xj)j ∈ Rd : xi = ui,j1xj1 + · · · + ui,jkxjk for every i /∈ j1, . . . , jk

.

The maps Gr(k, d, j1, . . . , jk) → R(d−k)k associating with each V the corre-sponding matrix (ui,j)i,j constitute an atlas of class C∞ for Gr(k, d). So, everyGr(k, d) is a manifold of class C∞ and dimension (d− k)k.

Let M be a manifold of dimension d and A = ϕα : Uα → Xα be thecorresponding atlas. Let S be a subset of M . We say that S is a submanifold ofdimension k < d if there exists some atlas B = ψβ : Vβ → Yβ of M such that

(a) A and B are compatible: the coordinate changes ψβ ϕ−1α and ϕα ψ−1

β

are differentiable in their domains, for every α and every β;

(b) for every β, the local chart ψβ maps V ′β = Vβ ∩ S onto an open subset Y ′

β

of Rk × 0d−k.Identifying Rk×0d−k ≃ Rk, we get that the family formed by the restrictionsψβ : V ′

β → Y ′β constitutes an atlas for S. Hence, S is a manifold of dimension k.

If M is a manifold of class Cr and the atlases A and B are Cr-compatible, thatis, if all the coordinate changes in (a) are of class Cr, then S is a (sub)manifoldof class Cr.

We say that a map f : M → N between two manifolds is differentiable if

ψβ f ϕ−1α : ϕα(Uα ∩ f−1(Vβ)) → ψβ(Vβ ∩ f(Uα)) (A.4.1)

is a differentiable map for every local chart ϕα : Uα → Xα of M and every localchart ψβ : Vβ → Yβ of N with f(Uα) ∩ Vβ 6= ∅. Moreover, we say that f is ofclass Cr if M and N are manifolds of class Cr and every map ψβ f ϕ−1

α in(A.4.1) is of class Cr. A diffeomorphism f : M → N is a bijection between twomanifolds such that both f and f−1 are differentiable. If both maps are of classCr then we say that the diffeomorphism is of class Cr.

Let Cr(M,N) be the space of maps of class Cr between two manifolds Mand N . We are going to introduce in this space a certain topology, called Cr

topology, for which two maps are close if and only if they are uniformly closeand the same is true for their derivatives up to order r. The definition may begiven in a very broad context (see Section 2.1 of Hirsch [Hir94]), but we restrictourselves to the case when M and N are compact. In this case, the Cr topologymay be defined in the following way.

Fix finite families of local charts ϕi : Ui → Xi of M and ψj : Vj → Yj of N ,such that ∪iUi = M and ∪jVj = N . Let δ > 0 be a Lebesgue number for theopen cover Ui∩f−1(Vj) of M . For each pair (i, j) such that Ui∩f−1(Vj) 6= ∅,let Ki,j be the set of points whose distance to the complement of Ui ∩ f−1(Vj)is bigger or equal than δ. Then Ki,j is a compact set contained in Ui ∩ f−1(Vj)and the union ∪i,jKi,j is the whole M . Consider

U(f) = g ∈ Cr(M,N) : g(Ki,j) ⊂ Vj for any i, j.


It is clear that f ∈ U(f). For each g ∈ U(f) and each pair (i, j) such that Ki,j

is non-empty, denote by gi,j the restriction of ψj g ϕ−1i to the set ϕi(Ki,j).

For each r ∈ N and ε > 0, define

Ur(f, ε) = g ∈ U(f) : sups,x,i,j

‖Dsfi,j(x) −Dsgi,j(x)‖ < ε, (A.4.2)

where the supremum is over every s ∈ 1, . . . , r, every x ∈ ϕi(Ki,j) and everypair (i, j) such that Ki,j 6= ∅. By definition, the family Ur(f, ε) : ε > 0 is abasis of neighborhoods of each f ∈ Cr(M,N) relative to the Cr topology. Alsoby definition, the family Ur(f, ε) : ε > 0 and r ∈ N is a basis of neighborhoodsof f ∈ C∞(M,N) relative to the C∞ topology.

The Cr topology has very nice properties: in particular, it admits a countablebasis of open sets and is completely metrizable, that is, it is generated by somecomplete distance. An interesting consequence is that Cr(M,N) is a Bairespace: every intersection of a countable family of open dense subsets is dense inthe space. The set Diffeor(M) of diffeomorphisms of class Cr is an open subsetof Cr(M,M) relative to the Cr topology.

A.4.2 Tangent space and derivative

Let M be a manifold. For each p ∈ M , consider the set C(p) of all the curvesc : I → M whose domain is some open interval I containing 0 ∈ R, such thatc(0) = p and c is differentiable at the point 0. The latter means that the mapϕα c is differentiable at the point 0 for every local chart ϕα : Uα → Xα withp ∈ Uα. We say that two curves c1, c2 ∈ C(p) are equivalent if (ϕα c1)′(0) =(ϕα c2)′(0) for every local chart ϕα : Uα → Xα with p ∈ Uα. Actually, if theequality holds for some local chart then it holds for all the other charts as well.We denote by [c] the equivalence class of any curve c ∈ C(p).

The tangent space to the manifold M at the point p is the set of suchequivalence classes. We denote this set by TpM . For any fixed local chartϕα : Uα → Xα with p ∈ Uα, the map

Dϕα(p) : TpM → Rd, [c] 7→ (ϕα c)′(0)

is well defined and is a bijection. We may use this bijection to identify TpMwith Rd. In this way, the tangent space acquires the structure of a vector space,transported from Rd via Dϕα(p). Although this identification Dϕα(p) dependson the choice of the local chart, the vector space structure on TpM does not.That is because, for any other local chart ϕβ : Uβ → Xβ with p ∈ Uβ , thecorresponding map Dϕβ(p) is given by

Dϕβ(p) = D(ϕβ ϕ−1

α

)(ϕα(p)) Dϕα(p).

SinceD(ϕβϕ−1

α

)(ϕα(p)) is a linear isomorphism, it follows that the vector space

structures transported from Euclidean space to TpM by Dϕα(p) and Dϕβ(p)coincide, as we stated.


If f : M → N is a differentiable map, its derivative at a point p ∈M is thelinear map Df(p) : TpM → Tf(p)N defined by

Df(p) = Dψβ(f(p))−1 D(ψβ f ϕ−1

α

)(ϕα(p)) Dϕα(p),

where ϕα : Uα → Xα is a local chart of M with p ∈ Uα and ψβ : Vβ → Yβ is alocal chart of N with f(p) ∈ Vβ . The definition does not depend on the choiceof these local charts.

The tangent bundle to M is the (disjoint) union TM = ∪p∈MTpM of all thetangent spaces to M . For each local chart ϕα : Uα → Xα, consider the unionTUαM = ∪p∈UαTpM and the map

Dϕα : TUαM → Xα × Rd

that associates with each [c] ∈ TUαM the pair

((ϕα c)(0), (ϕα c)′(0)) ∈ Xα × Rd.

We consider on TM the (unique) topology that turns every Dϕα into a home-omorphism. Assuming that the atlas ϕα : Uα → Xα of the manifold M is ofclass Cr, the coordinate change

Dϕβ Dϕ−1α : ϕα(Uα ∩ Uβ) × Rd → ϕβ(Uα ∩ Uβ) × Rd

is a map of class Cr−1 for any α and β such that Uα ∩Uβ 6= ∅. So, the tangentbundle TM is endowed with the structure of a manifold of class Cr−1 anddimension 2d.

The derivative Df : TM → TN of a differentiable map f : M → N is themap whose restriction to each tangent space TpM is given by Df(p). If f isof class Cr then Df is of class Cr−1, relative to the manifold structure on thetangent bundles TM and TN that we introduced in the previous paragraph.For example, the canonical projection π : TM → M , associating with eachv ∈ TM the unique point p ∈ M such that v ∈ TpM , is a map of class Cr−1

(Exercise A.4.9).A vector field on a manifold M is a map that associates with each point

p ∈M an element X(p) of the tangent space TpM , that is, a map X : M → TMsuch that π X = id . We say that the vector field is of class Ck, with k ≤ r−1,if this map is of class Ck.

Assuming that k ≥ 1, we may apply the theorem of existence and uniquenessof solutions of ordinary differential equations to conclude that for every pointp ∈M there exists a unique curve cp : Ip →M such that

• cp(0) = p and c′p(t) = X(c(t)) for every t ∈ Ip and

• Ip is the largest open interval where such a curve can be defined.

If M is compact then Ip = R for any p ∈M . Moreover, the maps f t : M → Mdefined by f t(p) = cp(t) are diffeomorphisms of class Ck, with f0 = id andfs f t = fs+t for any s, t ∈ R. The family f t : t ∈ R is called the flow of thevector field X .


A.4.3 Cotangent space and differential forms

The cotangent space T ∗pM to a manifoldM at a point p is the dual of the tangent

space TpM , that is, the space of linear functionals ξ : TpM → R. For any localchart ϕα : Uα → Xα with p ∈ Uα, the isomorphism Dϕα(p) : TpM → Rd

induces an isomorphismDϕ∗

α(p) : T ∗pM → Rd

as follows. For each i = 1, . . . , d, let dxi = πi Dϕα(p), where πi : Rd → R isthe projection to the i-th coordinate. Then dxi ∈ T ∗

pM and, in fact, the familydx1, . . . , dxd is a basis of T ∗

pM . For each ξ ∈ T ∗pM , define

Dϕ∗α(p)ξ = (ξ1, . . . , ξd) ⇔ ξ =

d∑

i=1

ξidxi.

The cotangent bundle of M is the (disjoint) union T ∗M = ∪p∈MT ∗pM of all

the cotangent spaces to M . For each local chart ϕα : Uα → Xα, consider theunion T ∗

UαM = ∪p∈UαT

∗pM and the map

Dϕ∗α : T ∗

UαM → Xα × Rd

defined by Dϕ∗αξ = (ϕα(p), Dϕ∗

α(p)ξ) if ξ ∈ T ∗PM . It is clear that this is a

bijection. We consider on T ∗M the unique topology that turns every Dϕ∗α into

a homeomorphism. If ϕα : Uα → Xα is an atlas of class Cr for the manifoldM then

Dϕ∗α : T ∗

UαM → Xα × Rd

is an atlas of class Cr−1 for T ∗M . So, the cotangent bundle T ∗M is also endowedwith the structure of a manifold of class Cr−1 and dimension 2d. Moreover, thecanonical map π∗ : T ∗M →M defined by π∗ | T ∗

pM = p is of class Cr−1.A differential 1-form in M is a differentiable map θ : M → T ∗M such that

π∗ θ = id . In other words, θ assigns to each point p ∈ M a linear functional(or linear form) θp : TpM → R that depends differentiably on the point.

More generally, for any 0 ≤ k ≤ d, an alternate k-linear form in TpM is amap1

θp : (TpM)k → R, (v1, . . . , vk) 7→ θp(v1, . . . , vk)

such that θp is linear on each variable vi and

θp(v1, . . . , vi, vi+1, . . . , vk) = −θp(v1, . . . , vi+1, vi, . . . , vk)

for any 1 ≤ i < k and any (v1, . . . , vk) ∈ (TpM)k.Let dx1, . . . , dxd be the basis of the cotangent space associated with a

local chart ϕα : Uα → Xα and let ∂/∂x1, . . . , ∂/∂xd be the dual basis ofTpM , defined by

dxi(∂/∂xj) =

1 if i = j0 if i 6= j.

1An alternate 0-linear form is just a real number.


If i1 . . . , ik ∈ 1, . . . , d are all distinct, there exists a unique alternate k-linearform dxi1 ∧ · · · ∧ dxik such that

• dxi1 ∧ · · · ∧ dxik(∂/∂xi1 , . . . , ∂/∂xik) = 1 and

• dxi1 ∧· · · ∧dxik (∂/∂xj1 , . . . , ∂/∂xjk) = 0 when i1, . . . , ik 6= j1, . . . , jk.

The family dxi1 ∧ · · · ∧ dxik : 1 ≤ i1 < · · · < ik ≤ d is a basis of the vectorspace of alternate k-linear forms in TpM .

A differential k-form in M is a map θ assigning to each point p ∈ M analternate k-linear form in the tangent space TpM that depends differentiably onthe point. In local coordinates, this may be written as

θp =∑

1≤i1<···<ik≤d

ai1,...,ik(p)dxi1 ∧ · · · ∧ dxik .

The differentiability condition means that the coefficients ai1,...,ik(p) dependdifferentiably on the point p.

Assuming that k < d, the exterior derivative of θ is the differential (k + 1)-form dθ determined by

dθp =∑

1≤i1<···<ik≤d

∑

j

∂ai1,...,ik∂xj

(p)dxj ∧ dxi1 ∧ · · · ∧ dxik

where the second sum is over all j /∈ i1, . . . , ik; one can check that the expres-sion on the right-hand side does not depend on the choice of the local chart.A differential k-form θ is closed if dθ = 0 (or else k = d) and it is exact ifthere exists some (k − 1)-form η such that dη = θ (or else k = 0). Every exactdifferential form is closed.

For much more information on the subject of differential forms, see the bookof Henri Cartan [Car70].

A.4.4 Transversality

The result that we state next is a main tool for constructing new manifolds.We say that y ∈ N is a regular value of a differentiable map f : M → N if thederivative Df(x) : TxM → TyN is surjective for every x ∈ f−1(y). Note thatthis holds, automatically, if y is not in the image of f , that is, if f−1(y) is theempty set. On the other hand, in order that some point y ∈ f(M) be a regularvalue of f it is necessary that dimM ≥ dimN .

Theorem A.4.3. Let f : M → N be a map of class Cr and y ∈ f(M) be aregular value of f . Then f−1(y) is a submanifold (not necessarily connected) ofclass Cr of M , with dimension equal to dimM − dimN .

Example A.4.4. For any d ≥ 1, the space of square matrices of dimensiond with real coefficients is isomorphic to the Euclidean space R(d2) and, hence,it is a manifold of dimension d2 and class C∞. The linear group GL(d,R)


of invertible matrices is an open subset of that space and, hence, it is also amanifold of dimension d2 and class C∞. The function det : GL(d,R) → R thatmaps each matrix to its determinant is of class C∞ and y = 1 is a regularvalue (see Exercise A.4.5). Using Theorem A.4.3, it follows that the speciallinear group SL(d,R), formed by the matrices with determinant equal to 1, is asubmanifold of class C∞ of GL(d,R), with dimension equal to d2 − 1.

It is possible to generalize Theorem A.4.3, using the notion of transversality.We say that a submanifold S of N is transverse to f if

Df(x)(TxM

)+ Tf(x)S = Tf(x)N for every x ∈ f−1(S). (A.4.3)

For example, if S is a submanifold of dimension zero, that is, if it consists ofa unique point, then S is transverse to f if and only if that point is a regularvalue of f . Therefore, the following statement generalizes Theorem A.4.3:

Theorem A.4.5. Let f : M → N be a map of class Cr and let S be asubmanifold of class Cr of N transverse to f . Then f−1(S) is a subman-ifold (not necessarily connected) of class Cr of M , with dimension equal todimM − dimN + dimS.

The next theorem asserts that, for every map f : M → N of class Cr withr sufficiently high, “almost all” points y ∈ N are regular values. We say thata set X ⊂ N is residual if it contains some countable intersection of open anddense subsets. Every residual set is dense in the manifold, because manifoldsare Baire spaces. We say that a set Z ⊂ N has volume zero if for every localchart ψβ : Vβ → Yβ the image ψβ(Z ∩ Vβ) is a subset of the Euclidean spacewith volume zero, that is, it may be covered by balls in such a way that thesum of the volumes of those balls is arbitrarily small.

Theorem A.4.6 (Sard). Assume that f : M → N is a map of class Cr withr > max0, dimM − dimN. Then the set of regular points of f is a residualsubset of N and its complement has volume zero.

A.4.5 Riemannian manifolds

A Riemannian metric on a manifold M is a map that associates with each pointp ∈M an inner product in the tangent space TpM , that is, a symmetric bilinearmap

·p : TpM × TpM → R

such that v ·pv > 0 for every non-zero vector v ∈ TpM . As part of the definition,this inner product is required to vary in a differentiable way with the point p, inthe following sense. Consider any local chart ϕα : Uα → Xα of M . As explainedpreviously, for every p ∈ Uα we may identify TpM with Rd, through the mapDϕα(p). Thus, we may view ·p as an inner product in the Euclidean space. Lete1, . . . , ed be a basis of Rd. Then the functions gα,i,j(p) = ei ·p ej are required tobe differentiable, for every pair (i, j) and any choice of the local chart ϕα andthe basis e1, . . . , ed.


We call Riemannian manifold any manifold endowed with a Riemannianmetric. Every submanifold S of a Riemannian manifold M inherits a structureof Riemannian manifold, given by the restriction of the inner product ·p of Mto the tangent subspace TpS of each point p ∈ S. Every compact manifoldadmits (infinitely many) Riemannian metrics. That follows from the theorem ofWhitney (see Section 1.3 of Hirsch [Hir94]), according to which every compactmanifold may be realized as a submanifold of some Euclidean space. Actually,this remains true in the much larger class of paracompact manifolds (which wedo not define here): every paracompact manifold of dimension d is diffeomorphicto some submanifold of R2d. In particular, paracompact manifolds are alwaysmetrizable.

Starting from the Riemannian metric, we may define the length of a differ-entiable curve γ : [a, b] →M , by

length(γ) =

∫ b

a

‖γ′(t)‖γ(t) dt, where ‖v‖p = (v ·p v)1/2.

This also allows us to define on the manifold M the following distance associatedwith the Riemannian metric: the distance d(p, q) between two points p, q ∈ Mis the infimum of the lengths of all the differentiable curves connecting the twopoints. We say that a differentiable curve γ : [a, b] → M is minimizing if itrealizes the distance between its endpoints, that is, if

length(γ) = d(γ(a), γ(b)).

Any two points p, q ∈ M are connected by some minimizing curve; in otherwords, the infimum in the definition of d(p, q) is always realized.

A differentiable curve γ : I → M defined on an open interval I is calleda geodesic if it is locally minimizing, in the following sense: for every c ∈ Ithere exists δ > 0 such that the restriction of γ to the interval [c − δ, c + δ] isminimizing. Every minimizing curve is a geodesic, but the converse is not true:for example, the great circles are geodesics on the sphere S2, but closed curvescan not be minimizing. An important fact is that if γ is a geodesic then the norm‖γ′(t)‖γ(t) is constant on the domain I. The theory of ordinary differentiableequations may be used to show that for every p ∈M and every v ∈ TpM thereexists a unique geodesic γp,v : Ip,v → M such that γp,v(0) = p and γ′p,v(0) = vand Ip,v is a maximal interval such that γp,v is locally minimizing.

If the manifold M is compact then Ip,v = R for every p ∈ M and everyv ∈ TpM . Then we define the exponential map at each point p ∈M :

expp : TpM →M, v 7→ γp,v(1).

This is a differentiable map and its derivative at v = 0 is the identity trans-formation on the tangent space TpM . We also define the geodesic flow on thetangent bundle:

f t : TM → TM, (p, v) 7→ (γp,v(t), γ′p,v(t)).


Most of the times, one considers the restriction of the geodesic flow to the unittangent bundle T 1M = (p, v) ∈ TM : ‖v‖p = 1. This is well defined since, aswe mentioned before, the norm of the velocity vector of any geodesic is constant.

A.4.6 Exercises

A.4.1. Check that every set X with the cardinality of R may be endowed withthe structure of a differentiable manifold of class C∞ and dimension d, for anyd ≥ 1.

A.4.2. Consider the differentiable manifolds M = (R,A) and N = (R,B),where A is the atlas consisting of the map φ(x) = x and B is the atlas consistingof the map ψ(x) = x3. The map f : M → N defined by f(x) = x is adiffeomorphism between these manifolds?

A.4.3. A topological space is path-connected if any two points are connectedby some continuous curve. Show that every (connected) manifold is path-connected.

A.4.4. For each d ≥ 2, the projective space of dimension d is the set Pd of allsubspaces of Rd+1 with dimension 1. Equivalently, Pd is the quotient space ofRd+1 \ 0 for the equivalence relation defined by:

(x0, . . . , xd) ∼ (y0, . . . , yd) ⇔ there exists c 6= 0 such that xi = cyi for every i.

Show that the family of maps ϕi : Ui → Rd, i = 0, . . . , d defined by

Ui = [x0 : · · · : xd] ∈ Pd : xi 6= 0

(where [x0 : · · · : xd] denotes the equivalence class of (x0, . . . , xd)) and

ϕi([x0 : · · · : xd]) =(x0

xi, . . . ,

xi−1

xi,xi+1

xi, . . . ,

xdxi

),

constitutes an atlas of class C∞ and dimension d for Pd.

A.4.5. Check the claims in Example A.4.4.

A.4.6. Let M and N be two compact (connected) manifolds with the samedimension. A map f : M → N of class C1 is a local diffeomorphism if thederivative Df(x) : TxM → Tf(x)N is an isomorphism for every x ∈ M . Showthat in that case there exists an integer number k ≥ 1 such that every y ∈ Mhas exactly k pre-images:

#f−1(y) = k for every y ∈ N .

[Observation: The number k is called degree of f and is denoted degree(f).]

A.4.7. Consider on R+ = x ∈ R : x > 0 the Riemannian metric defined byu ·x v = uv/x2. Calculate the distance d(a, b) between any two points a, b ∈ R+.


A.4.8. Let M and N be submanifolds of Rm+n with dimM = m and dimN =n. Show that there exists a set Z ⊂ Rm+n with volume zero such that, for everyv in the complement of Z, the translate M + v is transverse to N :

Tx(M + v) + TxN = Rd for every x ∈ (M + v) ∩N.

A.4.9. Show that if M is a manifold of class Cr then the canonical projectionπ : TM →M is a map of class Cr−1.

A.5 Lp(µ) spaces

In this appendix we review certain Banach spaces formed by functions with spe-cial integrability properties. Throughout, (X,B, µ) is a measure space. Recallthat a Banach space is a vector space endowed with a norm relative to which thespace is complete. We also state some properties of the norms in these spaces.

A.5.1 Lp(µ) spaces with 1 ≤ p < ∞Given any p ∈ [1,∞), we say that a function f : X → C is p-integrable withrespect to µ if the function |f |p is integrable with respect to µ. For p = 1 thisis the same as saying that the function f is integrable (Definition A.2.4 andProposition A.2.7).

Definition A.5.1. We denote by Lp(µ) the set of all complex functions p-integrable with respect to µ, modulo the equivalence relation that identifies anytwo functions that are equal at µ-almost every point.

Note that if the measure µ is finite, which is the case in most of our examples,then all bounded measurable functions are in Lp(µ):

∫|f |p dµ ≤ (sup |f |)pm(X) <∞.

In particular, ifX is a compact topological space then every continuous functionis in Lp(µ). In other words, the space C0(X) of all continuous functions iscontained in Lp(µ) for every p ≥ 1.

For every function f ∈ Lp(µ), define the Lp-norm of f by:

‖f‖p =

(∫|f |p dµ

) 1p

.

The next theorem asserts that ‖ · ‖p turns Lp(µ) into a Banach space:

Theorem A.5.2. The set Lp(µ) is a complex vector space. Moreover, ‖ · ‖p isa norm in Lp(µ) and this norm is complete.

The most interesting part of the proof of this theorem is to establish thetriangle inequality, which in this context is known as the Minkowski inequality:

A.5. LP (µ) SPACES 479

Theorem A.5.3 (Minkowski inequality). Let f, g ∈ Lp(µ). Then:

(∫|f + g|p dµ

) 1p

≤(∫

|f |p dµ) 1

p

+

(∫|g|p dµ

) 1p

.

In Exercises A.5.2 and A.5.5 we invite the reader to prove the Minkowskiinequality and to complete the proof of Theorem A.5.2.

A.5.2 Inner product in L2(µ)

The case p = 2 deserves special attention. The reason is that the norm ‖ · ‖2

introduced in the previous section arises from an (Hermitian) inner product.Indeed, consider:

f · g =

∫f g dµ. (A.5.1)

It follows from the properties of the Lebesgue integral that this expression doesdefine an inner product on L2(µ). Moreover, this product gives rise to the norm‖ · ‖2 through:

‖f‖2 = (f · f)1/2.

In particular, we have the Cauchy-Schwarz inequality:

Theorem A.5.4 (Cauchy-Schwarz Inequality). For every f, g ∈ L2(µ) we havethat f g ∈ L1(µ) and

|∫f g dµ| ≤

∫|f g| dµ ≤

( ∫|f |2 dµ

)1/2(∫

|g|2 dµ)1/2

.

This inequality has the following interesting consequence. Assume that themeasure µ is finite and consider any f ∈ L2(µ). Then, taking g ≡ 1,

∫|f | dµ =

∫|f g| dµ ≤

( ∫|f |2 dµ

)1/2(∫

1 dµ)1/2

<∞. (A.5.2)

This proves that every function in L2(µ) is also in L1(µ). In fact, when themeasure µ is finite one has Lp(µ) ⊂ Lq(µ) whenever p ≥ q (Exercise A.5.3).

The next result is a generalization of the Cauchy-Schwarz inequality for allvalues of p > 1:

Theorem A.5.5 (Holder inequality). Given 1 < p <∞, consider q > 1 definedby the relation 1

p + 1q = 1. Then, for every f ∈ Lp(µ) and every g ∈ Lq(µ), we

have that f g ∈ L1(µ) and

∫|f g| dµ ≤

(∫|f |p dµ

) 1p(∫

|g|q dµ) 1

q

.


A.5.3 Space of essentially bounded functions

Next, we extend the definition of Lp(µ) to the case p = ∞. For that we needthe following notion. We say that a function f : X → C is essentially boundedwith respect to µ if there exists some constant K > 0 such that |f(x)| ≤ Kat µ-almost every point. Then, the infimum of all such constants K is calledessential supremum of f and is denoted by supessµ(f).

Definition A.5.6. We denote by L∞(µ) the set of all complex functions essen-tially bounded with respect to µ, identifying any two functions that coincide atµ-almost every point.

We endow L∞(µ) with the following norm:

‖f‖∞ = supessµ(f).

The conclusion of Proposition A.5.2 remains valid for p = ∞ (Exercise A.5.5):the space L∞(µ) is a Banach space for the norm ‖ · ‖∞. Clearly, if µ is a finitemeasure then L∞(µ) ⊂ Lp(µ) for any p ≥ 1.

The dual of a complex Banach space E is the space E∗ of all continuouslinear functionals φ : E → C, endowed with the norm

‖φ‖ = sup |φ(v)|

‖v‖ : v ∈ E \ 0. (A.5.3)

The Holder inequality (Theorem A.5.5) leads to the following explicit charac-terization of the dual space of Lp(µ) for every p <∞:

Theorem A.5.7. For each p ∈ [1,∞) consider q ∈ (1,∞] defined by the relation1p + 1

q = 1. The map Lq(µ) → Lp(µ)∗ defined by g 7→[f 7→

∫fg dµ

]is an

isomorphism and an isometry between Lq(µ) and the dual space of Lp(µ).

This statement is false for p = ∞: in general, the dual space of L∞(µ) is notisomorphic to L1(µ).

A.5.4 Convexity

We say that a function φ : I → R defined on an interval I of the real line isconvex if

φ(tx + (1 − t)y) ≤ tφ(x) + (1 − t)φ(y)

for every x, y ∈ I and t ∈ [0, 1]. Moreover, we say that φ is concave if −φis convex. For functions that are twice differentiable we have the followingpractical criterion (Exercise A.5.1): φ is convex if φ′′(x) ≥ 0 for every x ∈ I andit is concave if φ′′(x) ≤ 0 for every x ∈ I.

Theorem A.5.8 (Jensen inequality). Let φ : I → R be a convex function. If µis a probability measure on X and f ∈ L1(µ) is such that

∫f dµ ∈ I, then:

φ

(∫f dµ

)≤∫φ f dµ.

A.5. LP (µ) SPACES 481

Example A.5.9. For any probability measure µ and any integrable positivefunction f , we have

log

∫f dµ ≥

∫log f dµ.

Indeed, this corresponds to the Jensen inequality for the function φ : (0,∞) → Rgiven by φ(x) = − logx. Note that φ is convex: φ′′(x) = 1/x2 > 0 for every x.

Example A.5.10. Let φ : R → R be a convex function, (λi)i be a sequenceof non-negative real numbers satisfying

∑∞i=1 λi ≤ 1 and (ai)i be a bounded

sequence of real numbers. Then,

φ

(∞∑

i=1

λiai

)≤

∞∑

i=1

λiφ(ai). (A.5.4)

This may be seen as follows. Consider X = [0, 1] endowed with the Lebesguemeasure µ. Let f : [0, 1] → R be a function of the form f =

∑∞i=1 aiXEi , where

the Ei are pairwise disjoint measurable sets such that µ(Ei) = λi. The Jenseninequality applied to this function f gives precisely the relation (A.5.4).

A.5.5 Exercises

A.5.1. Consider any function ϕ : (a, b) → R. Show that if ϕ is twice differ-entiable and ϕ

′′ ≥ 0 then ϕ is convex. Show that if ϕ is convex then it iscontinuous.

A.5.2. Consider p, q > 1 such that 1/p+ 1/q = 1. Prove:

(a) The Young inequality: ab ≤ ap/p+ aq/q for every a, b > 0.

(b) The Holder inequality (Theorem A.5.5).

(c) The Minkowski inequality (Theorem A.5.3).

A.5.3. Show that if µ is a finite measure then we have Lq(µ) ⊂ Lp(µ) for every1 ≤ p < q ≤ ∞.

A.5.4. Let µ be a finite measure and f ∈ L∞(µ) be different from zero. Showthat

‖f‖∞ = limn

∫|f |n+1 dµ∫|f |n dµ .

A.5.5. Show that a normed vector space (V, ‖·‖) is complete if and only if everyseries

∑k vk that is absolutely summable (meaning that

∑k ‖vk‖ converges) is

convergent. Use this fact to show that if µ is a probability measure then ‖ · ‖pis a complete norm on Lp(µ) for every 1 ≤ p ≤ ∞.

A.5.6. Show that if µ is a finite measure and 1/p+ 1/q = 1 with 1 ≤ p < ∞then the map Φ : Lq(µ) → Lp(µ)∗, Φ(g)f =

∫fg dµ is an isomorphism and an

isometry.


A.5.7. Show that if X is a metric space then, given any Borel probabilitymeasure µ, the set C0(X) of all continuous functions is dense in Lp(µ) for every1 ≤ p ≤ ∞. Indeed, the same holds for the subset of all uniformly continuousbounded functions.

A.5.8. Let f, g : X → R be two positive measurable functions such thatf(x)g(x) ≥ 1 for every x. Show that

∫f dµ

∫g dµ ≥ 1 for every probability

measure µ.

A.6 Hilbert spaces

Let H be a vector space, real or complex. An (Hermitian) inner product on His a map (u, v) 7→ u · v from H ×H to the scalars field (R or C, respectively)satisfying: for any u, v, w ∈ H and any scalar λ,

1. (u + w) · v = u · v + w · v and u · (v + w) = u · v + u · w;

2. (λu) · v = λ(u · v) and u · (λv) = λ(u · v);

3. u · v = v · u;

4. u · u ≥ 0 and u · u = 0 if and only if u = 0.

Then we can define the norm of a vector u ∈ H to be ‖u‖ = (u · u)1/2.A Hilbert space is a vector space endowed with an inner product whose norm

‖ · ‖ is complete: relative to ‖ · ‖ every Cauchy sequence is convergent. Thus,in particular, (H, ‖ · ‖) is a Banach space. A standard example of a Hilbertspace is the space L2(µ) of square-integrable functions that we introduced inAppendix A.5.2.

Given v ∈ H and any family (vα)α of vectors of H , we say that v =∑α vα

if for every ε > 0 there exists a finite set I such that

‖v −∑

β∈J

vβ‖ ≤ ε for every finite set J ⊃ I.

Given any family (Hα)α of subspaces of H , the set of all vectors of the formv =

∑α vα with vα ∈ Hα for every α is a subspace of H (see Exercise A.6.2).

It is called the sum of the family (Hα)α and it is denoted by∑

αHα.

A.6.1 Orthogonality

Let H be a Hilbert space. Two vectors u, v ∈ H are said to be orthogonal ifu · v = 0. We call a subset of H orthonormal if its elements have norm 1 andare pairwise orthogonal.

A Hilbert basis of H is an orthonormal subset B = vβ such that the setof all (finite) linear combinations of elements of B is dense in H . For example,the Fourier basis

x 7→ e2πikx : k ∈ Z (A.6.1)

A.6. HILBERT SPACES 483

is a Hilbert basis of the space L2(m) of all measurable functions on the unitcircle whose square is integrable with respect to the Lebesgue measure.

A Hilbert basis B = vβ is usually not a basis of the vector space, in theusual sense (Hammel basis): it is usually not true that every vector of H is afinite linear combination of the elements of B. However, every v ∈ H may bewritten as an infinite linear combination of the elements of the Hilbert basis:

v =∑

β

(v · vβ)vβ and, moreover, ‖v‖2 =∑

β

|v · vβ |2.

In particular, v · vβ = 0 except, possibly, for a countable subset of values of β.Every orthonormal subset of H may be extended to a Hilbert basis. In

particular, Hilbert bases always exist. Moreover, any two Hilbert bases havethe same cardinal, which is called the Hilbert dimension of H . The Hilbertdimension depends monotonically on the space: if H1 is a subspace of H2 thendimH1 ≤ dimH2. We say that two Hilbert spaces are isometrically isomorphicif there exists some isomorphism between the two that also preserves the innerproduct. A necessary and sufficient condition is that the two spaces have thesame Hilbert dimension.

A Hilbert space is said to be separable if it admits some countable subsetthat is dense for the topology defined by the norm. This happens if and only ifthe Hilbert dimension is either finite or countable. In particular, all separableHilbert spaces with infinite Hilbert dimension are isometrically isomorphic. Forthis reason, one often finds in the literature (especially in the area of Mathe-matical Physics) mentions to the Hilbert space, as if there were only one.

Given any family (Hα)α of Hilbert spaces, we denote by ⊕αHα their or-thogonal direct sum, that is, the vector space of all (vα)α ∈ ∏αHα such that∑α ‖vα‖2

α < ∞ (this implies that vα = 0 except, possibly, for a countable setof values of α), endowed with the inner product

(vα)α · (wα)α =∑

α

vαwα.

The orthogonal complement of a subset S of a Hilbert space H is the set S⊥

of all the vectors of H that are orthogonal to every vector of S. It is easy tosee that S⊥ is a closed subspace of H (Exercise A.6.7). If S itself is a closedsubspace of H then S = (S⊥)⊥ and every vector v ∈ H may be decomposed as asum v = s+s⊥ of some s ∈ S and some s⊥ ∈ S⊥. Moreover, this decompositionis unique and the vectors s and s⊥ are the elements of S and S⊥, respectively,that are closest to v.

A.6.2 Duality

A linear functional on a Hilbert space H (or, more generally on a Banach space)is a linear map from H to the scalars field (R or C). It is said to be bounded if

‖φ‖ = sup |φ(v)|

‖v‖ : v 6= 0<∞.


This is equivalent to saying that the linear functional is continuous, relative tothe topology defined by the norm of H (see Exercise A.6.3). The dual spaceof a Hilbert space H is the vector space H∗ formed by all the bounded linearfunctionals. The function φ 7→ ‖φ‖ is a complete norm on H∗ and, hence, itendows the dual with the structure of a Banach space. The map

h : H → H∗, w 7→[v 7→ v · w

](A.6.2)

is a bijection between the two spaces and it preserves the norms. In particular,h is a homeomorphism. Moreover, it satisfies h(w1 + w2) = h(w1) + h(w2) andh(λw) = λh(w).

The weak topology in H is the smallest topology relative to which all thelinear functionals v 7→ v · w are continuous. In terms of sequences, it can becharacterized as follows:

(wn)n → w weakly ⇔ (v · wn)n → v · w for every v ∈ H.

The weak∗ topology in the dual space H∗ is the smallest topology relative towhich φ 7→ φ(v) is continuous for every v ∈ H .

It is known from the theory of Banach spaces (theorem of Banach-Alaoglu)that every bounded closed subset of the dual space is compact for the weak∗

topology. In the special case of Hilbert spaces, the weak topology in the spaceH is homeomorphic to the weak∗ topology in the dual space H∗: the map hin (A.6.2) is a homeomorphism also for these topologies. Since h preserves theclass of bounded sets, it follows that the weak topology in the space H itselfenjoys the property in the theorem of Banach-Alaoglu:

Theorem A.6.1 (Banach-Alaoglu). Every bounded closed subset of a Hilbertspace H is compact for the weak topology in H.

A linear operator L : H1 → H2 between two Hilbert spaces is continuous (orbounded) if

‖L‖ = sup |L(v)|

‖v‖ : v 6= 0

is finite. The adjoint of a continuous linear operator is the linear operatorL∗ : H2 → H1 defined by

v · Lw = L∗v · w for every v, w ∈ H.

The adjoint operator is continuous, with ‖L∗‖ = ‖L‖ and ‖L∗L‖ = ‖LL∗‖ =‖L‖2. Moreover, (L∗)∗ = L and (L1 + L2)

∗ = L∗1 + L∗

2 and (λL)∗ = λL∗ (inExercise A.6.5 we invite the reader to prove these facts).

A continuous linear operator L : H → H is self-adjoint if L = L∗. Moregenerally, L is normal if it satisfies L∗L = LL∗. We are especially interested inthe case when L is unitary, that is, L∗L = id = LL∗. We call linear isometryto every linear operator L : H → H such that L∗L = id . Hence, the unitaryoperators are the linear isometries that are also normal operators.

A.6. HILBERT SPACES 485

A.6.3 Exercises

A.6.1. Let H be a Hilbert space. Prove:

1. That every ball (either open or closed) is a convex subset of H .

2. The parallelogram identity: ‖v + w‖2 + ‖v − w‖2 = ‖v‖2 + ‖w‖2 for anyv, w ∈ H .

3. The polarization identity: 4(v · w) = ‖v + w‖2 − ‖v − w‖2 (real case) or4(v ·w) = (‖v+w‖2−‖v−w‖2)+ i(‖v+ iw‖2−‖v− iw‖2) (complex case).

A.6.2. Show that, given any family (Hα)α of subspaces of a Hilbert space H ,the set of all the vectors of the form v =

∑α vα with vα ∈ Hα for every α is a

vector subspace of H .

A.6.3. Show that a linear operator L : E1 → E2 between two Banach spacesis continuous if and only if there exists C > 0 such that ‖L(v)‖2 ≤ C‖v‖1 forevery v ∈ E1, where ‖ · ‖i denotes the norm in the space Ei (we say that L is abounded operator).

A.6.4. Consider the Hilbert space L2(µ). Let V be the subspace formed by theconstant functions. What is the orthogonal complement of V ? Determine the(orthogonal) projection to V of an arbitrary function g ∈ L2(µ).

A.6.5. Prove that if L : H → H is a bounded operator on a Hilbert space Hthen the adjoint operator L∗ is also bounded and ‖L∗‖ = ‖L‖ and ‖L∗L‖ =‖LL∗‖ = ‖L‖2 and (L∗)∗ = L.

A.6.6. Show that if K is a closed convex subset of a Hilbert space then forevery z ∈ H there exists a unique v ∈ K such that ‖z − v‖ = d(z,K).

A.6.7. Let S be a subspace of a Hilbert space H . Prove that:

1. The orthogonal complement S⊥ of S is a closed subspace of H and itcoincides with the orthogonal complement of the closure S. Moreover,(S⊥)⊥ = S.

2. Every v ∈ H may be written, in a unique fashion, as a sum v = s+ s⊥ ofsome s ∈ S and some s⊥ ∈ S⊥. The two vectors s and s⊥ are the elementsof S and S⊥ that are closest to v.

A.6.8. Let E be a closed subspace of a Hilbert space H . Show that E is alsoclosed in the weak topology. Moreover, U(E) is a closed subspace of H , forevery isometry U : H → H .

A.6.9. Show that a linear operator L : H → H on a Hilbert space H is anisometry if and only if ‖L(v)‖ = ‖v‖ for every v ∈ H . Moreover, L is a unitaryoperator if and only if L is an isometry and is invertible.


A.7 Spectral theorems

Let H be a complex Hilbert space. The spectrum of a continuous linear operatorL : H → H is the set spec(L) of all numbers λ ∈ C such that L − λ id isnot an isomorphism. The spectrum is closed and it is contained in the closeddisk of radius ‖L‖ around 0 ∈ C. In particular, spec(L) is a compact subsetof the complex plane. When H has finite dimension, spec(L) consists of theeigenvalues of L, that is, the complex numbers λ such that L − λ id is notinjective. In general, the spectrum is strictly larger than the set of eigenvalues(see Exercise A.7.2).

A.7.1 Spectral measures

By definition, a projection in H is a continuous linear operator P : H → H thatis idempotent (P 2 = P ) and self-adjoint (P ∗ = P ). Then the image and thekernel of P are closed subspaces of H and they are orthogonal complements toeach other. In fact, the image coincides with the set of all fixed points of P .

Consider any map E associating with each measurable subset of the planeC a projection in H . Such a map is called a spectral measure if it satisfiesE(C) = id and

E( ⋃

n∈N

Bn)

=∑

n∈N

E(Bn)

whenever the Bn are pairwise disjoint (σ-additivity). Then, given any v, w ∈ H ,the function

Ev · w : B 7→ E(B)v · w (A.7.1)

is a complex measure in C. Clearly, it depends on the pair (v, w) in a bilinearfashion.

We call support of a spectral measure E the set suppE of all the points z ∈ Csuch that E(V ) 6= 0 for every neighborhood V of z. Note that the support isalways a closed set. Moreover, the support of the complex measure Ev · w iscontained in suppE for every v, w ∈ H .

Example A.7.1. Consider λ1, . . . , λs ⊂ C and let V1, . . . , Vs be a finite familyof subspaces of Cd, pairwise orthogonal and such that Cd = V1 ⊕ · · · ⊕ Vs. Foreach set J ⊂ 1, . . . , s, denote by PJ the projection in Cd whose image is⊕j∈JVj . For each measurable set B ⊂ C define

E(B) : Cd → Cd, E(B) = PJ(B)

where J(B) is the set of all j ∈ 1, . . . , s such that λj ∈ B. The function E isa spectral measure.

Example A.7.2. Let µ be a probability measure in C and H = L2(µ) be thespace of all complex functions whose square is integrable with respect to µ. Foreach measurable set B ⊂ C, let

E(B) : L2(µ) → L2(µ), ϕ 7→ XBϕ.

A.7. SPECTRAL THEOREMS 487

Each E(B) is a projection and the function E is a spectral measure.

The next lemma collects a few simple properties of the spectral measures:

Lemma A.7.3. Let E be a spectral measure and A,B be measurable subsets ofC. Then:

1. E(∅) = 0 and E(suppE) = id ;

2. if A ⊂ B then E(A) ≤ E(B) and E(B \A) = E(B) − E(A);

3. E(A ∪B) + E(A ∩B) = E(A) + E(B);

4. E(A)E(B) = E(A ∩B) = E(B)E(A).

In what follows we always assume that E is a spectral measure with compactsupport. Then the support of every complex measure Ev · w is also compact.Consequently, the integral

∫z d(E(z)v · w) is well defined and it is a bilinear

function of (v, w). Hence, there exists a bounded linear operator L : H → Hsuch that

Lv · w =

∫z d(E(z)v · w) for every v, w ∈ H . (A.7.2)

We write, in shorter form:

L =

∫z dE(z). (A.7.3)

More generally, given any bounded measurable function ψ in the support of thespectral measure E, there exists a bounded linear operator ψ(L) : H → H thatis characterized by

ψ(L)v · w =

∫ψ(z) d(E(z)v · w) for every v, w ∈ H. (A.7.4)

We write

ψ(L) =

∫ψ(z) dE(z). (A.7.5)

Lemma A.7.4. Let E be a spectral measure with compact support. Givenbounded measurable functions ϕ, ψ and numbers α, β ∈ C,

(1)∫(αϕ + βψ)(z) dE(z) = α

∫ϕdE(z) + β

∫ψ dE(z);

(2)∫ϕ(z) dE(z) =

( ∫ϕ(z) dE(z)

)∗;

(3)∫(ϕψ)(z) dE(z) =

( ∫ϕ(z) dE(z)

)( ∫

ψ(z) dE(z))

In particular, by part (3) of this lemma,

Lj =( ∫

z dE(z))j

=

∫zj dE(z) for every j ∈ N. (A.7.6)


Analogously, using also part (2) of the lemma,

LL∗ =( ∫

z dE(z))( ∫

z dE(z))

=

∫|z|2 dE(z)

=( ∫

z dE(z))( ∫

z dE(z))

= L∗L.

(A.7.7)

Consequently, the linear operator defined by (A.7.3) is normal. Conversely, thespectral theorem asserts that every normal operator may be written in this way:

Theorem A.7.5 (Spectral). For every normal operator L : H → H there existsa spectral measure E such that L =

∫z dE(z). This measure is unique and its

support coincides with the spectrum of L. In particular, L is unitary if and onlyif suppE is contained in the unit circle z ∈ C : |z| = 1.Example A.7.6 (Spectral theorem in finite dimension). Let H be a complexHilbert space with finite dimension. Then for every normal operator L : H → Hthere exists a basis of H formed by eigenvectors of L. Let λ1, . . . , λs be theeigenvalues of L. The eigenspaces Vj = ker(L − λj id ) are pairwise orthogonal,because L is normal. Moreover, by Theorem A.7.5, the direct sum ⊕sj=1Vj isthe whole H . So

L =

s∑

j=1

λjπj

where πj denotes the orthogonal projection to Vj . In other words, the spectralmeasure E of the operator L is given by E(λj) = πj for every j = 1, . . . , sand E(B) = 0 if B contains no eigenvalue of L.

Example A.7.7. Let (σα)α∈A be any family of finite measures in the unit circlez ∈ C : |z| = 1. Consider H = ⊕α∈AL

2(σα) and the linear operator

L : H → H, (ϕα)α 7→ (z 7→ zϕα(z))α.

Consider the spectral measure E given by

E(B) : H → H, (ϕα)α 7→ (XBϕα)α

(compare with Example A.7.2). Then, L =∫z dE(z). Indeed, the definition of

E gives that Eϕ · ψ =∑α ϕαψα σα for every ϕ = (ϕα)α and ψ = (ψα)α in the

space H . Then,

Lϕ · ψ =∑

α

∫zϕα(z)ψα(z) dσα(z) =

∫z d(E(z)ϕ · ψ) (A.7.8)

for every ϕ, ψ.

We say that λ ∈ C is an atom of the spectral measure if E(λ) 6= 0 or,equivalently, if there exists some non-zero vector ω ∈ H such that E(λ)ω 6= 0.The proof of the next proposition is outlined in Exercise A.7.4.

Proposition A.7.8. Every eigenvalue of L is an atom of the spectral measureE. Conversely, if λ is an atom of E then λ is an eigenvalue of the operator Land every non-zero vector of the form v = E(λ)ω is an eigenvector.

A.7. SPECTRAL THEOREMS 489

A.7.2 Spectral representation

Theorem A.7.5 shows that normal linear operators on a Hilbert space are essen-tially the same thing as spectral measures in that space. Theorems of this type,establishing a kind of dictionary between two classes of objects that a priorido not seem to be related, are among the most fascinating results in Mathe-matics. Of course, just how useful such a dictionary is to study one of thoseclasses (normal linear operators, say) depends on to what extent we are capableof understanding the other one (spectral measures, in this case). In the presentsituation this is handled, in a most satisfactory way, by the next result, whichexhibits a canonical form (inspired by Example A.7.2) in which every normallinear operator may be written.

As before, we use ⊕ to denote the orthogonal direct sum of Hilbert spaces.Given any cardinal χ, finite or infinite, and a Hilbert space V , we denote by V χ

the orthogonal direct sum of χ copies of V .

Theorem A.7.9 (Spectral representation). Let L : H → H be a normal linearoperator. Then there exist mutually singular finite measures (σj)j with sup-port in the spectrum of L, there exist cardinals (χj)j and there exists a unitaryoperator U : H → ⊕jL2(σj)

χj , such that the conjugate ULU−1 = T is given by:

T :⊕

j

L2(σj)χj →

⊕

j

L2(σj)χj , (ϕj,l)j,l 7→

(z 7→ zϕj,l(z)

)j,l. (A.7.9)

We call (A.7.9) the spectral representation of the normal operator L. Let uspoint out that the measures σj in Theorem A.7.9 are not uniquely determined.However, the spectral representation is unique, in the following sense.

Call multiplicity function of the operator L the function associating witheach finite measure θ in C the smallest cardinal χj such that the measures θand σj are not mutually singular. One can prove that this function is uniquelydetermined by the operator L, that is, it does not depend on the choice of themeasures σj in the statement. Moreover, two normal operators are conjugate bysome unitary operator if and only if they have the same multiplicity function.

Example A.7.10 (Spectral representation in finite dimension). Let us go backto the setting of Example A.7.6. For each j = 1, . . . , s, let σj be the Dirac massat the eigenvalue λj and χj be the dimension of the eigenspace Vj . Note thatthe space L2(σj) has dimension 1. Hence, we may choose a unitary operatorUj : Vj → L2(σj), for each j = 1, . . . , s. Since L = λj id restricted to Vj , wehave that Tj = UjLU

−1j = λj id , that is,

Tj :((ϕα)α

)7→(z 7→ λjϕα(z)

)α

=(z 7→ zϕα(z)

)α.

In this way, we have found a unitary operator

U : Cd →s⊕

j=1

L2(σj)χj .

such that T = ULU−1 is a spectral representation of L.


A.7.3 Exercises

A.7.1. Let T : E → E be a Banach space isomorphism, that is, a continuouslinear bijection whose inverse is also continuous. Show that T +H is a Banachspace isomorphism for every continuous linear map H : E → E such that‖H‖ ‖T−1‖ < 1. Use this fact to prove that the spectrum of every continuouslinear operator L : E → E is a closed set and is contained in the closed disk ofradius ‖L‖ around the origin.

A.7.2. Show that if L : H → H is a linear operator in a Hilbert space Hwith finite dimension then spec(L) consists of the eigenvalues of L, that is,the complex numbers λ for which L − λid is not injective. Give an example,in infinite dimension, such that the spectrum is strictly larger than the set ofeigenvalues.

A.7.3. Prove Lemma A.7.3.

A.7.4. Prove Proposition A.7.8, along the following lines:

1. Assume that Lv = λv for some v 6= 0. Consider the functions

ϕn(z) =

(z − λ)−1 if |z − λ| > 1/n0 otherwise.

Show that ϕn(L)(L−λid ) = E(z : |z−λ| > 1/n) for every n. Concludethat E(λ)v = v and, consequently, λ is an atom of E.

2. Assume that there exists w ∈ H such that v = E(λ)w is non-zero. Showthat, given any measurable set B ⊂ C,

E(B)v =

v if λ ∈ B0 if λ /∈ B.

Conclude that Lv = λv and, consequently, λ is an eigenvalue of L.

A.7.5. Let (σj)j be the family of measures given in Theorem A.7.9. Given anymeasurable set B ⊂ C, check that E(B) = 0 if and only if σj(B) = 0 for everyj. Therefore, given any measure η in C, we have that E ≪ η if and only ifσj ≪ η for every j.

DR

AFT

Hints or solutions for

selected exercises

1.1.2. Use Exercise A.3.5 to approximate characteristic functions by continuous functions.

1.2.5. Show that if N > 1/µ(A), then there exists j ∈ VA with 0 ≤ j ≤ N . Adaptingthe proof of the previous statement, conclude that is K is a set of non-negative integerswith #K > 1/µ(A), then we may find k1, k2 ∈ K and n ∈ VA such that n = k1 − k2.That is, the set K − K = k1 − k2; k1, k2 ∈ K intersects VA. To conclude that S issyndetic assume, by contradiction, that for every n ∈ N there is some number ln such thatln, ln + 1, . . . , ln + n ∩ VA = ∅. Consider an element k1 /∈ VA and construct, recursively, asequence kj+1 = lkj

+kj . Prove that the set K = k1, . . . , kN is such that (K−K)∩VA = ∅.1.2.6. Otherwise, there exists k ≥ 1 and b > 1 such that the set B = x ∈ [0, 1] : n|fn(x) −x| > b for every n ≥ k has positive measure. Let a ∈ B be a density point of B. ConsiderE = B ∩ B(a, r), for r small. Get a lower estimate for the return time to E of any point ofx ∈ E and use the Kac theorem to reach a contradiction.

1.3.5. Consider the sequence log10 an, where log10 denotes the base 10 logarithm, and observethat log10 2 is an irrational number.

1.3.12. Consider orthonormal bases v1, . . . , vd, at x, and w1, . . . , wd, at f(x), such thatv1 and w1 are orthogonal to Hc. Check that gradH(f(x)) ·Df(x)v = gradH(x) · v for everyv. Deduce that the matrix of Df(x) with respect to those bases has the form

Df(x) =

0

B

B

@

α 0 · · · 0β2 γ2,2 · · · γ2,d· · · · · · · · · · · ·βd γd,2 · · · γd,d

1

C

C

A

with ‖ gradH(f(x))‖ |α| = ‖ gradH(x)‖. Note that Γ = (γi,j)i,j is the matrix of D(f | Hc)and observe that |det Γ| = ‖ gradH(x)‖/‖ gradH(f(x))‖. Using the formula of change ofvariables, conclude that f | Hc preserves the measure ds/‖ gradH‖.1.4.4. Choose a set E ⊂M with measure less than ε/n and, for each k ≥ 1, let Ek be the setof points x ∈ E that return to E in exactly k iterates. Take for B the union of the sets Ekwith k ≥ n, of the n-th iterates of the sets Ek with k ≥ 2n, and so on. For the second part,observe that if (f, µ) is aperiodic then µ can not have atoms.

1.4.5. By assumption, fτ (y) ∈ Hn−τ(y) whenever y ∈ Hn with n > τ(y). Therefore,

T (y) ∈ H if y ∈ H. Consider An = 1 ≤ j ≤ n : x ∈ Hj and Bn = l ≥ 1 :Pli=0 τ(T

i(x)) ≤n. Show, by induction, that #An ≤ #Bn and deduce that lim supn#Bn/n ≥ θ. Now

suppose that lim infk(1/k)Pk−1i=0 τ(T

i(x)) > (1/θ). Show that there exists θ0 < θ such that#Bn < θ0n, for every n sufficiently large. This contradicts the previous conclusion.

1.5.5. Observe that the maps f, f2, . . . , fk commute with each other and then use thePoincare multiple recurrence theorem.

1.5.6. By definition, the complement of Ω(f1, . . . , fq)c is an open set. The Birkhoff multiplerecurrence theorem ensures that the non-wandering set is non-empty.

491

DR

AFT

492 HINTS OR SOLUTIONS FOR SELECTED EXERCISES

2.1.6. Consider the imageV∗µ of the measure µ under V . Check that V∗µ((a, b]) = F (b)−F (a)for every a < b. Consequently, V∗µ(b) = F (b) − lima→b F (a). Therefore, (−∞, b] is acontinuity set for V∗µ if and only if b is continuity point for F . Using Theorem 2.1.2, itfollows that if (Vk∗µ)k converges to V∗µ in the weak∗ topology then (Vk)k converges toV in distribution. Conversely, if (Vk)k converges to V in distribution then Vk∗µ((a, b]) =Fk(b) − Fk(a) converges to F (b) − F (a) = V∗µ((a, b]), for any continuity points a < b of F .Observing that such intervals (a, b] generate the Borel σ-algebra of the real lines, concludethat (Vk∗µ)k converges to V∗µ in the weak∗ topology.

2.1.8. (Billingsley [Bil68]) Use the hypothesis to show that if (Un)n is an increasing sequenceof open subsets of M such that ∪nUn = M then, for every ε > 0 there exists n such thatµ(Un) ≥ 1 − ε for every µ ∈ K. Next, imitate the proof of Proposition A.3.7.

2.2.2. For the first part of the statement use induction in q. The case q = 1 corresponds toTheorem 2.1. Consider continuous transformations fi : M → M , 1 ≤ i ≤ q + 1 commutingwith each other. By the induction hypothesis, there exists a probability ν invariant underfi for 1 ≤ i ≤ q. Define µn = (1/n)

Pn−1j=0 (fq+1)j∗(ν). Note that (fi)∗µn = µn for every

1 ≤ i ≤ q and every n. Hence, every accumulation point of (µn)n is invariant under every fi,1 ≤ i ≤ q. By compactness, there exists some accumulation point µ ∈ M1(M). Check that µis invariant under fq+1. For the second part, denote by Mq ⊂ M1(M) the set of probabilitymeasures invariant under fi, 1 ≤ i ≤ q. Then, (Mq)q is a non-increasing sequence of closednon-empty subsets of M1(M). By compactness, the intersection ∩qMq is non-empty.

2.2.6. Define µ in each iterate fj(W ), j ∈ Z by letting µ(A) = m(f−j(A)) for each measurableset A ⊂ fj(W ).

2.3.2. Clearly, convergence in norm implies weak convergence. To prove the converse, assumethat (xk)k converges to zero in the weak topology but not in the norm topology. The first

condition implies that, for every fixed N , the sumPNn=0 |xkn| converges to zero when k → ∞.

The second condition means that, up to restricting to a subsequence, there exists δ > 0 suchthat ‖xk‖ > δ for every k. Then, there exists some increasing sequence (lk)k such that

lk−1X

n=0

|xkn| ≤1

kbut

lkX

n=0

|xkn| ≥ ‖xk‖ − 1

k≥ δ − 1

kfor every k.

Take an = xkn/|xkn| for each lk−1 < n ≤ lk. Then, for every k,

˛

˛

∞X

n=0

anxkn

˛

˛ ≥X

lk−1<n≤lk

|xkn| −X

n≤lk−1

|xkn| −X

n>lk

|xkn| ≥ ‖xk‖ − 4

k≥ δ − 4

k.

This contradicts the hypotheses. Now take xkn = 1 if k = n and xkn = 0 otherwise. Given any(an)n ∈ c0, we have that

P

n anxkn = ak converges to zero when k → ∞. Therefore, (xk)k

converges to zero in the weak∗ topology. But ‖xk‖ = 1 for every k, hence (xk)k does notconverge to zero in the norm topology.

2.3.6. Take W = U(H)⊥ and V = (L∞n=0 U

n(W ))⊥.

2.3.7. Suppose that there exist tangent functionals T1 and T2 with T1(v) > T2(v) for somev ∈ E. Show that φ(u + tv) + φ(u − tv) − 2φ(u) ≥ t(T1(u) − T2(u)) for every t and deducethat φ is not differentiable in the direction of v.

2.4.1. Consider the set P of all probability measures on X ×M of the form νZ × η. Notethat P is compact in the weak∗ topology and is invariant under the operator F∗.

2.4.2. The condition p g = f p entails fn p = p gn for every n ∈ Z. Using π p = p,it follows that π fn p = p gn for every n ≤ 0. Therefore, p(y) =

`

p(gn(y))´

n≤0. This

proves the existence and uniqueness of p. Now suppose that p is surjective. The hypothesesof compactness and continuity ensure that

`

g−n(p−1(xn))´

n≤0

is a nested sequence of compact sets, for every (xn)n≤0 ∈ M . Take y in the intersection andnote that p(y) = (xn)n≤0.

DR

AFT

493

2.5.2. Fix q and l. Assume that for every n ≥ 1 there exists a partition Sn1 , . . . , Snl ofthe set 1, . . . , n such that no subset of Snj contains an arithmetic progression of length q.

Consider the function φn : N → 1, . . . , l given by φn(i) = j if i ∈ Sj and φn(i) = l if i > n.Take (nk)k → ∞ such that the subsequence (φnk )k converges at every point to some functionφ : N → 1, . . . , l. Consider Sj = φ−1(j) for j = 1, . . . , l. Some Sj contains some arithmeticprogression of length q. Then S

nkj contains that arithmetic progression for every k sufficiently

large.

2.5.4. Consider Σ = 1, ..., lNkwith the distance d(ω, ω′) = 2−N where N ≥ 0 is largest

such that ω(i1, ..., ik) = ω′(i1, ..., ik) for every i1, . . . , ik < N . Note that Σ is a compactmetric space. Given q ≥ 1, let Fq = (a1, . . . , ak) : 1 ≤ ai ≤ q and 1 ≤ i ≤ k. Let e1, . . . , embe an enumeration of the elements of Fq. For each j = 1, . . . , m, consider the shift mapσj : Σ → Σ given by (σjω)(n) = ω(n + ej) for n ∈ Nk . Consider the point ω ∈ Σ defined by

ω(n) = i ⇔ n ∈ Si. Let Z be the closure of σl11 · · ·σlmm (ω) : l1, ..., lm ∈ N. Note that Z isinvariant under the shift maps σj . By the Birkhoff multiple recurrence theorem, there existζ ∈ Z and s ≥ 1 such that d(σsj (ζ), ζ) < 1 for every j = 1, . . . ,m. Let e = (1, . . . , 1) ∈ Nk.

Then, ζ(e) = ζ(e + se1) = · · · = ζ(e + sem). Consider σl11 · · ·σlmm (ω) close enough to ζ thatω(b) = ω(b + se1) = · · · = ω(b + sem), where b = e + l1e1 + · · · + lmem. It follows that ifi = ω(b), then b + sFq ⊂ Si. Given that there are only finitely many sets Si, some of themmust contain infinitely many sets of the type b+ sFq, with q arbitrarily large.

3.1.1. Mimic the proof of Theorem 3.1.6.

3.1.2. Suppose that for every k ∈ N there exists nk ∈ N such that µ(A ∩ f−j(A)) = 0 forevery nk + 1 ≤ j ≤ nk + k. It is no restriction to assume that (nk)k → ∞. Take ϕ = XA.

By Exercise 3.1.1, (1/k)Pnk+kj=nk+1 ϕ · ϕ fj → ϕ · P (ϕ). The left-hand side is identically

zero and the right-hand side is equal to ‖P (ϕ)‖2. Hence, the time average P (ϕ) = 0 and soµ(A) =

R

P (ϕ) dµ = 0.

3.2.3. (a) Consider ε = 1 and let C = sup|ϕ(l)| : |l| ≤ L(1). Given n ∈ Z, fix s ∈ Z suchthat sL(1) < n ≤ (s + 1)L(1). By hypothesis, there exists τ ∈ sL(1) + 1, . . . , (s + 1)L(1)such that |ϕ(k+ τ) − ϕ(k)| < 1 for every k ∈ Z. Take k = n− τ and observe that |k| ≤ L(1).It follows that |ϕ(n)| < 1+C. (b) Take ρε > 2L(ε) sup |ϕ|. For every n ∈ Z there exists someε-quasi-period τ = nρ+ r with 1 ≤ r ≤ L(ε). Then,

˛

˛

(n+1)ρX

j=nρ+1

ϕ(j) −ρ−rX

j=1−r

ϕ(j)˛

˛ < ρε and˛

˛

ρ−rX

j=1−r

ϕ(j) −ρ

X

j=1

ϕ(j)˛

˛ ≤ 2r sup |ϕ| < ρε.

(c) Given ε > 0, take ρ as in part (b). For each n ≥ 1, write n = sρ + r, with 1 ≤ r ≤ ρ.Then,

1

n

nX

j=1

ϕ(j) =ρ

sρ+ r

s−1X

i=0

1

ρ

(i+1)ρX

l=iρ+1

ϕ(l) +1

n

sρ+rX

l=sρ+1

ϕ(l).

For s large, the first term on the right-hand side is close to (1/ρ)Pρ−1j=0 ϕ(j) (by part (b)) and

the last term is close to zero (by part (a)). Conclude that the left-hand side of the identity isa Cauchy sequence. (d) Observe that

| 1n

nX

j=1

ϕ(x+ k) − 1

n

nX

j=1

ϕ(j)| ≤ 2|x|n

sup |ϕ|

and use parts (a) and (c).

3.3.3. Let µ be a probability measure invariant under a flow f t : M → M , t ∈ R and let(ϕs)s>0 be a family of functions, indexed by the positive real numbers, such that ϕs+t ≤ϕt + ϕs f t and the function Φ = sup0<s<1 ϕ

+s is in L1(µ). Then, (1/T )ϕT converges at µ-

almost every point to a function ϕ such that ϕ+ ∈ L1(µ) andR

ϕdµ = limT→∞(1/T )R

ϕT dµ.To prove this, take ϕ = limn(1/n)ϕn (Theorem 3.3.3). For T > 0 non-integer, write T = n+swith N ∈ N and s ∈ (0, 1). Then,

ϕT ≤ ϕn + ϕs fn ≤ ϕn + Φ fn and ϕT ≥ ϕn+1 − ϕ1−s fT ≥ ϕn − Φ fT .

DR

AFT


Using Lemma 3.2.5, the first inequality shows that lim supT→∞(1/T )ϕT ≤ ϕ. Analogously,using the version of Lemma 3.2.5 for continuous time, the second inequality above givesthat lim infT→∞(1/T )ϕT ≥ ϕ. It also follows that limT→∞(1/T )

R

ϕT dµ coincides withlimn(1/n)

R

ϕn dµ. By Theorem 3.3.3, this last limit is equal toR

ϕdµ.

3.3.6. Since log+ ‖φ‖ ∈ L1(µ), for every ε > 0 there exists δ > 0 such that µ(B) < δ impliesR

B log+ ‖θ‖ dµ < ε. Using that log+ ‖φn‖ ≤ Pn−1j=0 log+ ‖θ‖ fj , one gets that

µ(E) < δ ⇒ 1

n

Z

Elog+ ‖φn‖ dµ ≤ 1

n

n−1X

j=0

Z

f−j(E)log+ ‖θ‖ dµ ≤ ε.

3.4.3. Consider local coordinates x = (x1, x2, . . . , xd) such that Σ is contained in x1 = 0.Write ν = ψ(x) dx1dx2 . . . dxd. Then, νΣ = ψ(y) dx2 . . . dxd with y = (0, x2, . . . , xd). GivenA ⊂ Σ and δ > 0, the map ξ : (t, y) 7→ gt(y) is a diffeomorphism from [0, δ] × A to Aδ.Therefore, ν(Aδ) =

R

[0,δ]×A(ψ ξ)| detDξ| dtdx2 . . . dxd and, consequently,

limδ→0

ν(Aδ)

δ=

Z

Aψ(y)| detDξ|(y) dx2 . . . dxd.

Next, note that |detDξ|(y) = |X(y) · (∂/∂t)| = φ(y) for every y ∈ Σ. It follows that the fluxof ν coincides with the measure η = φνΣ. In particular, η is invariant under the Poincaremap.

4.1.2. Use the theorem of Birkhoff and the dominated convergence theorem.

4.1.8. Assume that Ufϕ = λϕ. Since Uf is an isometry, |λ| = 1. If λn = 1 for some nthen ϕ fn = ϕ and, by ergodicity, ϕ is constant almost everywhere. Otherwise, given anyc 6= 0, the sets ϕ−1(λ−kc), k ≥ 0 are pairwise disjoint. Since they all have the same measure,this measure must be zero. Finally, the set ϕ−1(c) is invariant under f and, consequently, itsmeasure is either zero or total.

4.2.4. Let K be such a set. We may assume that K contains an infinite sequence of periodicorbits (On)n with period going to infinity. Let Y ⊂ K be the set of accumulation points ofthat sequence. Show that Y can not consist of a single point. Let p 6= q be periodic pointsin Y and z be a heteroclinic point, that is, such that σn(z) converges to the orbit of p whenn → −∞ and to the orbit of q when n → +∞. Show that z ∈ Y and deduce the conclusionof the exercise.

4.2.10. Let Jk = (0, 1/k), for each k ≥ 1. Check that the continued fraction expansion of xis of bounded type if and only if there exists k ≥ 1 such that Gn(x) /∈ Jk for every n. Observethat µ(Jk) > 0 for every k. Deduce that for every k and µ-almost every x there exists n ≥ 1such that Gn(x) ∈ Jk. Conclude that L has zero Lebesgue measure.

4.2.11. For each L ∈ N, consider ϕL(x) = minφ(x), L. Then, ϕL ∈ L1(µ) and, byergodicity, ϕL =

R

ϕL dµ at µ-almost every point. To conclude, observe that φ ≥ φL for everyL and

R

φL dµ → +∞.

4.3.7. Let M = 0, 1N and, for each n, let µn be the invariant measure supported on theperiodic orbit αn = (αnk )k , with period 2n, defined by αnk = 0 if 0 ≤ k < n and αnk = 1 ifn ≤ k < 2n. Show that (µn)n converges to (δ0 + δ1)/2, where 0 and 1 are the fixed points ofthe shift map.

4.3.9. (1) Take k ≥ 1 such that every cylinder of length k has diameter less than δ. Takey = (yj) defined by yj+ni

= xij for each 0 ≤ j < mi+ k. (2) Take δ > 0 such that d(z,w) < δ

implies |ϕ(z) − ϕ(w)| < ε and consider k ≥ 1 given by part (1). Choose mi, i = 1, . . . , ssuch that mi/ns ≈ αi for every i. Then take y as in part (1). (3) By the ergodic theorem,R

ϕdµ =R

ϕ dµ. Take x1, . . . , xs ∈ Σ and α1, . . . , αs such thatR

ϕ dµ ≈ P

i αiϕ(xi). Notethat ϕ(y) =

R

ϕdνy , where νy is the invariant measure supported on the orbit of y. RecallExercise 4.1.1.

4.4.3. On each side of the triangle, consider the foot of the corresponding height, that is, theorthogonal projection of the opposite vertex. Show that the defined by those three points isa periodic orbit of the billiard.

DR

AFT

495

4.4.5. Using (4.4.10) and the twist condition, we get that for each θ ∈ R there exists exactlyone number ρθ ∈ (a, b) such that Θ(θ, ρθ) = θ. The function θ 7→ ρθ is continuous andperiodic, with period 1. Consider its graph Γ = (θ, ρθ) : θ ∈ S1. Every point in Γ ∩ f(Γ)is fixed under f : if (θ, ρθ) = f(γ, ργ) =

`

Θ(γ, ργ), R(γ, ργ)´

then, since Θ(γ, ργ) = γ, itfollows that θ = γ and so ρθ = ργ . Since f preserves the area measure, none of the connectedcomponents of A \ Γ may be mapped inside itself. This implies that f(Γ) intersect Γ at noless than two points.

4.4.7. Taking inspiration from Example 4.4.12, show that the billiard map in Ω extends to aDehn twist in the annulus A = S1 × [−π/2, π/2], that is, a homeomorphism f : A → A thatcoincides with the identity on both boundary components but is homotopically non-trivial:actually, f admits a lift F : R × [−π/2, π/2] → R × [−π/2, π/2] such that F (s,−π/2) = (s−2π,−π/2) and F (s, π/2) = (s, π/2) for every s. Consider rational numbers pn/qn ∈ (−2π, 0)with qn → ∞. Use Exercise 4.4.6 to show that g has periodic points of period qn. One wayto ensure that these periodic points are all distinct is to take the qn mutually prime.

5.1.7. The statement does not depend on the choice of the ergodic decomposition, sincethe latter is essentially unique. Consider the construction in Exercise 5.1.6. The set M0 issaturated by the partition Ws, that is, if x ∈ M0 then Ws(x) ⊂ M0. Moreover, the mapy 7→ µy is constant on each Ws(x). Since the partition P is characterized by P(x) = P(y) ⇔µx = µy , it follows that P ≺ Ws restricted to M0.

5.2.1. Consider the canonical projections πP : M → P and πQ : M → Q, the quotientmeasures µP = (πP )∗µ and µQ = (πQ)∗µ and the disintegrations µ =

R

µP dµP (P ) and µ =R

µQ dµQ(Q). Moreover, for each P ∈ P, consider µP,Q = (πQ)∗µP and the disintegrationµP =

R

µP,Q dµP,Q(Q). Observe thatR

µP,Q dµP (P ) = µQ: given any B ⊂ Q,Z

µP,Q(B) dµP (P ) =

Z

µP (π−1Q (B)) dµP (P ) = µ(π−1

Q (B)) = µQ(B).

To check that µπ(Q),Q is a disintegration of µ with respect to Q: (a) µP,Q(Q) = 1 for µP,Q-almost every Q and µP -almost every P . Moreover, µP,Q = µπ(Q),Q for µP,Q-almost every Qand µP -almost every P , because µP (P ) = 1 for µP -almost every P . By the previous observa-tion, it follows that µπ(Q),Q(Q) = 1 for µQ-almost every Q. (b) P 7→ µP (E) is measurable,up to measure zero, for every Borel set E ⊂ M . By construction (Section 5.2.3), there existsa countable generating algebra A such that µP,Q(E) = limn µP (E ∩Qn)/µP (Qn) for everyE ∈ A (where Qn is the element of Qn that contains Q). Deduce that P 7→ µπ(Q),Q(E) ismeasurable, up to measure zero, for every E ∈ A. Extend this conclusion to every Borel setE, using the monotone class argument in Section 5.2.3. (c) Note that µ =

R

µP dµP (P ) =R R

µP,Q dµP,Q(Q)dµP (P ) =R R

µπ(Q),Q dµP,Q(Q)dµP (P ) =R

µπ(Q),Q dµQ(Q).

5.2.2. Argue that the partition Q of the space M1(M) into points is measurable. Given anydisintegration µP : P ∈ P, consider the measurable map M 7→ M1(M), x 7→ µP (x). Thepre-image of Q under this map is a measurable partition. Check that this pre-image coincideswith P on a subset with full measure.

6.1.3. The function ϕ is invariant.

6.2.5. Denote by X the closure of the orbit of x. If X is minimal, for each y ∈ X thereexists n(y) ≥ 1 such that d(fn(y)(y), x) < ε. Then, by continuity, y admits an open neigh-borhood V (y) such that d(fn(y)(z), x) < ε for every z ∈ V (y). Take y1, ..., ys such thatX ⊂ ∪iV (yi) and let m = maxi n(yi). Given any k ≥ 1, take i such that fk(x) ∈ V (yi).Then, d(fk+ni (x), x) < ε, that is, k+ni ∈ Rε. This proves that, given any m+1 consecutiveintegers, at least one of them is in Rε. Hence, Rε is syndetic. Now assume that X is notminimal. Then, there exists a non-empty, closed invariant set F properly contained in X.Note that x /∈ F and so, for every ε sufficiently small, there exists an open set U that containsF and does not intersect B(x, ε). On the other hand, since Rε is syndetic, there exists m ≥ 1such that for any k ≥ 1 there exists n ∈ k, . . . , k + m satisfying fn(x) ∈ B(x, ε). Take ksuch that fk(x) ∈ U1, where U1 = U ∩ f−1(U) ∩ · · · ∩ f−m(U), and find a contradiction.

6.2.6. By Exercise 6.2.5, the set Rε = n ∈ N : d(x, fn(x)) < ε is syndetic for every ε > 0.If y is close to x then n ∈ N : d(fn(x), fn(y)) < ε contains blocs of consecutive integers

DR

AFT


with arbitrary length, no matter the choice of ε > 0. Let U1 be any neighborhood of x. Itfollows from the previous observations that there exist infinitely many values of n ∈ N suchthat fn(x), fn(y) are in U1. Fix n1 with this property. Next, consider U2 = U1 ∩ f−n1 (U1).By the previous step, there exists n2 > n1 such that fn2 (x), fn2 (y) ∈ U2. Continuing inthis way, construct a non-increasing sequence of open sets Uk and an increasing sequenceof natural numbers nk such that fnk (Uk+1) ⊂ Uk and fnk (x), fnk (y) ∈ Uk. Check that

fni1+···+nik (x) and fni1

+···+nik (y) are in U1 for any i1 < · · · < ik, k ≥ 1.

6.2.7. Consider the shift map σ : Σ → Σ in Σ = 1, 2, ..., qN. The partition N = S1 ∪· · ·∪Sqdefines a certain element α = (αn) ∈ Σ, given by αn = i if and only if n ∈ Si. Consider β inthe closure of the orbit of α such that α and β are near and the closure of the orbit of β is aminimal set. Apply Exercise 6.2.6 with x = β, y = α and U = [0;α0] to obtain the result.

6.3.6. Write g = (a11, a12, a2, a22). Then,

Eg(x11, x12, x21, x22) = (a11x11 +a12x21, a11x12 +a12x22, a21x11 +a22x21, a21x12 +a22x22).

Write the right-hand side as (y11, y12, y21, y22). Use the formula of change of variables, ob-serving that det(y11, y12, y21, y22) = (det g) det(x11, x12, x21, x22) and

dy11dy12dy21dy22 = (det g)2dx11dx12dx21dx22.

In the complex case, takeZ

GL(2,R)ϕdµ =

Z

ϕ(z11, z12, z21, z22)

|det(z11, z12, z21, z22)|4 dx11dy11dx12dy12dx21dy21dx22dy22,

where zjk = xjk + yjki. [Observation: Generalize these constructions to any dimension!]

6.3.9. Given x ∈M , there exists a unique number 0 ≤ r < 10k such that fr(x) ∈ [b0, ..., bk−1].Moreover, fn(x) ∈ [b0, . . . , bk−1] if and only if n− r is a multiple of 10k . Use this observationto conclude that

τ([b0, . . . , bk−1], x) = 10−k for every x ∈ M .

Conclude that if f admits an ergodic probability measure µ then µ([b0, . . . , bk−1]) = 10−k forevery b0, . . . , bk−1. This determines µ uniquely. To conclude, show that µ is well defined andinvariant.

6.3.11. Consider the sequence of words wn defined inductively by w1 = α and s(wn+1) = wnfor n ≥ 1. Decompose the word s(α) = w2 = αr1 and prove, by induction, that wn+1 may bedecomposed as wn+1 = wnrn, for some word rn with length larger or equal than n, such thats(rn) = rn+1. Define w = αr1r2 · · · and note that s(w) = s(α)s(r1)s(r2) · · · = αr1r2r3 · · · =w. This proves existence. To prove uniqueness, let γ ∈ Σ be a sequence starting with α andsuch that S(γ) = γ. Decompose γ as γ = αγ1γ2γ3 · · · , in such a way that γi and ri have thesame length. Note that S(α) = αγ1 = αr1, and so γ1 = r1. Conclude by induction.

6.4.2. Given any 0 ≤ α < β ≤ 1, we have that√n ∈ (α, β) in the circle if and only if

there exists some integer number k ≥ 1 such that k2 + 2kα + α2 < n < k2 + 2kβ + β2. Foreach k the number of values of n that satisfy this inequality is equal to the integer part of2k(β − α) + (β2 − α2). Therefore,

#1 ≤ n < N2 :√n ∈ (α, β) ≤

N−1X

k=1

2k(β − α) + (β2 − α2)

and the difference between the term on the right and the one on the left is less than N . Hence,

lim1

N2#1 ≤ n < N2 :

√n ∈ (α, β) = β − α.

A similar calculation shows that the sequence (log n mod Z)n is not equidistributed in thecircle. [Observation: But it does admit a continuous (non-constant) limit density. Calculatethat density!]

6.4.3. Define φn = an + (−1/a)n. Check that (φn)n is the Fibonacci sequence and, inparticular, φn ∈ N for every n ≥ 1. Now observe that (−1/a)n converges to zero. Hence,n ≥ 1 : an mod Z ∈ I is finite, for any interval I ⊂ S1 whose closure does not contain zero.

DR

AFT

497

7.1.1. It is clear that the condition is necessary. To see that it is sufficient: Given A, considerthe closed subspace V of L2(µ) generated by the functions 1 and Xf−k(A), k ∈ N. The

hypothesis ensures that limn Unf (XA) ·Xf−k(A) = (XA · 1)(Xf−k(A) · 1) for every k. Conclude

that limn Unf (XA) · φ = (XA · 1)(φ · 1) for every φ ∈ V . Given a measurable set B, write

XB = φ+φ⊥ with φ ∈ V and φ⊥ ∈ V⊥ to conclude that limn Unf (XA) ·XB = (XA ·1)(XB ·1).

7.1.2. Assuming that E exists, decompose (1/n)Pn−1j=0 |aj | into two terms, one over j ∈ E and

the other over j /∈ E. The hypotheses imply that the two terms converge to zero. Conversely,assume that (1/n)

Pn−1j=0 |aj | converges to zero. Define Em = j ≥ 0 : |aj | ≥ (1/m) for each

m ≥ 1. The sequence (Em)m is increasing and each Em has density zero; in particular, thereexists ℓm ≥ 1 such that (1/n)#

`

Em ∩ 0, . . . , n − 1´

< (1/m) for every n ≥ ℓm. Choose(ℓm)m increasing and define E = ∪m(Em ∩ ℓm, . . . , ℓm+1 − 1). For the second part of theexercise, apply the first part to both sequences, (an)n and (a2n)n.

7.1.6. (Pollicott, Yuri [PY98]) It is enough to treat the case whenR

ϕj dµ = 0 for every j.Use induction on the number k of functions. The case k = 1 is contained in Theorem 3.1.6.Use the inequalities

1

N

nX

n=1

an ≤ 1

N

N−m+1X

n=1

“ 1

m

m−1X

j=0

an+j

”

+m

N

`

max1≤i≤m

|ai| + maxN−m≤i≤N

|ai|´

“ 1

N

NX

n=1

bn”2

≤ (1/N)N

X

n=1

|bn|2

to conclude thatR ˛

˛(1/N)PN−1j=0 (ϕ1 fn) · · · (ϕk fkn)

˛

˛

2dµ is bounded above by

1

N

NX

n=1

“

Z

| 1

m

m−1X

j=0

`

ϕ1 fn+j´

· · ·`

ϕk fk(n+j)´˛

˛

2dµ +

“2m

N+m2

N2

”“

max1≤i≤k

supess |ϕi|”2.

The integral is equal to

m−1X

i=0

m−1X

j=0

Z kY

l=1

“

ϕl`

ϕl f l(j−i)´

”

f l(n+i) dµ.

By the induction hypothesis,

1

N

NX

n=1

kY

l=2

“

ϕl`

ϕl f l(j−i)´

”

f l(n+i) →k

Y

l=2

Z

ϕl`

ϕl f l(j−i)´

dµ

in L2(µ), when N → ∞. Therefore,

1

N

NX

n=1

Z kY

l=1

“

ϕl`

ϕl f l(j−i)´

”

f l(n+i) dµ→k

Y

l=1

Z

ϕl`

ϕl f l(j−i)´

dµ

in L2(µ), when N → ∞. Combining these estimates,

lim supN

Z

˛

˛

1

N

NX

n=1

`

ϕ1 fn´

· · ·`

ϕk fkn´˛

˛

2dµ ≤ 1

m2

m−1X

i=0

m−1X

j=0

kY

l=1

Z

ϕl(ϕl f l(j−i)´

dµ.

Since (f, µ) is weak mixing,R

ϕl(ϕl f lr´

dµ converges to 0 when r → ∞, restricted to a setof values of l with density 1 at infinity (recall Exercise 7.1.2). Therefore, the expression onthe right-hand side is close to zero when m is large.

7.2.5. The first statement is analogous to Exercise 7.2.1. The definition ensures that µk hasmemory k. Given ε > 0 and any (uniformly) continuous function ϕ : Σ → R, there existsκ ≥ 1 such that |

R

C ϕdη − ϕ(x)η(C)| ≤ εη(C) for every x ∈ C, every cylinder C of lengthl ≥ κ and every probability measure η. Since µ = µk for cylinders of length k, it follows that

DR

AFT


|R

ϕdµk −R

ϕdµ| ≤ ε for every k ≥ κ. This proves that (µk)k converges to µ in the weak∗

topology.

7.2.6. (1) Use that Pn1+n2i,i =

P

j Pn1i,j P

n2j,i . All the terms in this expression are non-negative

and the term corresponding to j = i is positive. (2) Up to replacing R by R/κ, we maysuppose that κ = 1. Start by showing that if S ⊂ Z is closed under addition and subtractionthen S = aZ, where a is the smallest positive element of S. Use that fact to show that ifa1, ..., as are positive integers with greatest common divisor equal to 1 then there exist integernumbers b1, ..., bs such that b1a1 + ... + bsas = 1. Now take a1, ..., as ∈ R such that theirgreatest common divisor is equal to 1. Using the previous observation, and the hypothesisthat R is closed under addition, conclude that there exists p, q ∈ R such that p − q = 1. Tofinish, show that R contains every integer number n ≥ pq. (3) Consider any i, j ∈ X andlet κi, κj be the greatest common divisors of R(i), R(j), respectively. By irreducibility, thereexist k, l ≥ 1 such that P ki,j > 0 and P lj,i > 0. Deduce that if n ∈ R(i) then n+ k + l ∈ R(j).

In view of (2), this is possible only if κi ≥ κj . Exchanging the roles of i and j, it also followsthat κi ≤ κj . If κ ≥ 2 then, given any i, we have Pni,i = 0 for n arbitrarily large and so P

can not be aperiodic. Now suppose that κ = 1. Then, using (2) and the hypothesis that Xis finite, there exists m ≥ 1 such that Pni,i > 0 for every i ∈ X and every n ≥ m. Then,since P is irreducible and X is finite, there exists k ≥ 1 such that for any i, j there existsl ≤ k such that P li,j > 0. Deduce that Pm+k

i,j > 0 for every i, j and so P is aperiodic. (4)

Fix any i ∈ X and, for each r ∈ 0, . . . , κ − 1, define Xr = j ∈ X : there exists n ≡ rmod κ such that Pni,j > 0. Check that these sets Xr cover X and are pairwise disjoint. Showthat the restriction of Pκ to each of them is aperiodic.

7.3.1. By the theorem of Darboux, there exist coordinates (x1, x2) in the neighborhoodof any point of S such that ω = dx1 ∧ dx2. Consider the expression of the vector field inthose coordinates: X = X1(∂/∂x1) +X2(∂/∂x2). Show that β = X1dx− 2 −X2dx1 and sodβ = (divX) dx1 ∧ dx2. Hence, β is closed if and only if the divergent of X vanishes.

7.3.5. Observe that f is invertible and if A is a d-adic interval of level r ≥ 1, (that is, aninterval of the form A = [id−r, (i+ 1)d−r ]), then there exists s ≥ r such that f(A) consists ofds−r d-adic intervals of level s. Deduce that f preserves the Lebesgue measure. Show also thatifA and B are d-adic intervals then, since σ has no periodic points, m(fk(A)∩B) = m(A)m(B)for every large k.

7.4.2. (1) Given y1, y2 ∈ M , write f−1(yi) = xi1, . . . , xid with d(x1j , x

2j ) ≤ σ−1d(y1, y2).

Then,

|Lϕ(y1) −Lϕ(y2)| =1

d

dX

j=1

|ϕ(x1j ) − ϕ(x2

j )| ≤ Kθ(ϕ)σ−θd(y1, y2)θ.

(2) It follows that ‖Lϕ‖ ≤ sup |ϕ|+σ−θKθ(ϕ) ≤ ‖ϕ‖ for every ϕ ∈ E, and the identity holds ifand only if ϕ is constant. Hence, ‖L‖ = 1. (3) Let Jn = [inf Lnϕ, supLnϕ]. By part (1), thesequence (Jn)n is decreasing and the diameter of Jn converges to zero exponentially fast. Takeνϕ to be the point in the intersection and note that ‖Lnϕ−νϕ‖ = sup |Lnϕ−νϕ|+Kθ(Lnϕ).(4) The constant functions are eigenvectors of L, associated with the eigenvalue λ = 1. Itfollows that νϕ+c = νϕ + c for every ϕ ∈ E and every c ∈ R. Then, H = ϕ : νϕ = 0 isa hyperplane of E transverse to the line of constant functions. This hyperplane is invariantunder L and, by part (3), the spectral radius of L | H is less or equal than σ−θ < 1. (5) Bypart (2), ‖Lnϕ − Lnψ‖ ≤ ‖Lkϕ − Lkψ‖ for every n ≥ k ≥ 1. Making n → ∞, we get that|νϕ − νψ| ≤ ‖Lkϕ − Lkψ‖ for every k ≥ 1. Using part (1) and making k → ∞, we get that|νϕ − νψ| ≤ sup |ϕ− ψ|. Therefore, the linear operator ψ 7→ νψ is continuous, relative to thenorm in the space C0(M).

8.1.2. Denote Xi = X ∩ [0; i] and pi = µ([0; i]), for i = 1, . . . , k. Since µ is a Bernoullimeasure, µ(Xi) = piµ(f(Xi)). Hence,

P

i piµ(f(Xi)) = 1. SinceP

i pi = 1, it follows thatµ(f(Xi)) = 1 for every i. Consequently, ∩if(Xi) has full measure. Take x in that intersection.If (f, µ) and (g, ν) are ergodically equivalent, there exists a bijection φ : X → Y between fullmeasure invariant subsets such that φ f = g φ. Take x ∈ X with k pre-images x1, . . . , xk

DR

AFT

499

in X. The points φ(xi) are pre-images of φ(x) for the transformation g. Hence, k ≤ l; bysymmetry, we also have that l ≤ k.

8.2.5. Assume that (f, µ) is not weak mixing. By Theorem 8.2.1, there exists a non-constantfunction ϕ such that Ufϕ = λϕ for some λ = e2πiθ . By ergodicity, the absolute value of ϕis constant µ-almost everywhere. Using that fn is ergodic for every n (Exercise 4.1.8), θ isirrational and any set where ϕ is constant has measure zero. Given α < β in [0, 2π], considerA = x ∈ C : α ≤ arg(ϕ(x)) ≤ β. Show that for every ε > 0 there exists n such thatµ(f−n(A) \ A) < ε. Show that, by choosing |β − α| sufficiently small, one gets to contradictthe inequality in the statement.

8.2.7. Note that fn+1(x) = fn(x) for every x ∈ Jn that is not on the top of Sn. Hence,(for example, arguing as in Exercise 6.3.10), f(x) = fn(x) for every x ∈ [0, 1) and every nsufficiently large; moreover, f preserves the Lebesgue measure. Let an = #Sn be the heightof each pile Sn. Denote by Ie, Ic, Id the partition of each I ∈ Sn into subintervals of equallength, ordered from left to right. (a) If A is a set with m(A) > 0 then for every ε > 0 thereexists n ≥ 1 and some interval I ∈ Sn such that m(A ∩ I) ≥ (1 − ε)m(I). If A is invariant, itfollows that m(A∩ J) ≥ (1− ε)m(J) for every J ∈ Sn. (b) Assume that Ufϕ = λϕ. Since Ufis an isometry, |λ| = 1. By ergodicity, |ϕ| is constant almost everywhere; we may suppose that|ϕ| ≡ 1. Initially, assume that there exists n and some interval I ∈ Sn such that the restrictionof ϕ to I is constant. Take x ∈ Ie and y ∈ Ic and z ∈ Id. Then, ϕ(x) = ϕ(y) = ϕ(z) andϕ(y) = λanϕ(x) and ϕ(z) = λan+1ϕ(y). Hence, λ = 1 and, by ergodicity, ϕ is constant. Ingeneral, use the theorem of Lusin (Theorems A.3.5-A.3.9) to reach the same conclusion. (c)A is a union of intervals Ij in the pile Sn for each n ≥ 2. Then, fan(Iej ) = Icj for every j.

Hence, m(fan (A) ∩ A) ≥ m(A)/3 = 2/27.

8.3.1. Let vj : j ∈ I be a basis of H formed by eigenvectors with norm 1 and λj be theeigenvalue associated with each eigenvector vj . The hypothesis ensures that we may considerI = N. Show that for every δ > 0 and every k ≥ 1 there exists n ≥ 1 such that |λnj − 1| ≤ δ

for every j ∈ 1, . . . , k (use the pigeonhole principle). Decompose ϕ =P

j cjvj , with cj ∈ C.

Observe that Unf ϕ =P

j∈N cjλnj vj and so

‖Unf ϕ− ϕ‖22 ≤

kX

j=1

|cj(λnj − 1)|2 +∞

X

j=k+1

2|cj |2 ≤ δ2‖ϕ‖22 +

∞X

j=k+1

2|cj |2.

Given ε > 0, we may choose δ and k in such a way that each one of the terms on the right-handside is less than ε/2.

8.4.3. Let U : H → H be a non-invertible isometry. Recalling Exercise 2.3.6, show thatthere exist closed subspaces V and W of H such that U : H → H is unitarily conjugate tothe operator U1 : V ⊕WN → V ⊕WN given by U1 | V = U | V and U1 | WN = id . LetU2 : V ⊕W Z → V ⊕W Z be the linear operator defined by U1 | V = U | V and U1 |W Z = id .Check that U2 is a unitary operator such that U2 = U1, where : V ⊕WN → V ⊕W Z

is the natural inclusion. Show that if E ⊂ V ⊕WN satisfies the conditions in the definitionof Lebesgue spectrum for U1 then j(E) satisfies those same conditions for U2. Conclude thatthe rank of U1 is well defined.

8.4.6. The lemma of Riemann-Lebesgue ensures that F takes values in c0. The operator Fis continuous: ‖F (ϕ)‖ ≤ ‖ϕ‖ for every ϕ ∈ L1(λ). Moreover, F is injective: if F (ϕ) = 0 thenR

ϕ(z)ψ(z) dλ(z) = 0 for every linear combination ψ(z) =P

|j|≤l ajzj , aj ∈ C. Given any

interval I ⊂ S1, the sequence ψN =P

|n|≤N cnzn, cn =R

Iz−n dλ(z) of partial sums of the

Fourier series of the characteristic function XI is bounded (see [Zyg68, pagina 90]). Using thedominated convergence theorem, it follows that F (ϕ) = 0 implies

R

Iϕ(z) dλ(z) = 0, for any

interval I. Hence, ϕ = 0. If F were bijective then, by the open mapping theorem, its inversewould be a continuous linear operator. Then, there would be c > 0 such that ‖F (ϕ)‖ ≥ c‖ϕ‖for every ϕ ∈ L1(λ). But that is false: consider DN (z) =

P

|n|≤N zn for N ≥ 0. Check that

F (DN ) = (aNn )n with aNn = 1 if |n| ≤ N and aNn = 0 otherwise. Hence, ‖F (DN )‖ = 1 forevery N . Writing z = e2πit, check that DN (z) = sin((2N + 1)πt)/ sin(πt). Conclude that‖DN‖ =

R

|DN (z)| dλ(z) converges to infinity when N → ∞. [Observation: One can alsogive explicit examples. For instance, if (an)n converges to zero and satisfies

P∞n=1 an/n = ∞

DR

AFT


then the sequence (αn)n given by αn = an/(2i) for n ≥ 1 and αn + α−n = 0 for everyn ≥ 0 may not be written in the form αn =

R

zn dν(z). See Section 7.3.4 of the book ofEdwards [Edw79].]

8.5.3. By Exercise 8.5.2, f is always injective. Conclude that if f is surjective then it isinvertible: there exists a homomorphism of measure algebras h : B → B such that h f =f h = id . Use Proposition 8.5.6 to find g : M →M such that g f = f g at µ-almost everypoint. The converse is easy: if (f, µ) is invertible at almost every point then the homomorphismof measure algebras g associated with g = f−1 satisfies g f = f g = id ; in particular, f issurjective.

8.5.6. Check that the unions of elements of ∪nPn are pre-images, under the inclusion ι, ofopen subset of K. Use that fact to show that if the chains have measure zero then for eachδ > 0 there exists an open set A ⊂ K such that m(A) < δ and every point outside A is in theimage of the inclusion: in other words, K \ ι(MP ) ⊂ A. Conclude that ι(MP ) is a Lebesguemeasurable set and its complement in K has measure zero. For the converse, use the fact that(a) implies (c) in Exercise A.1.13.

9.1.1. Hµ(P/R) ≤ Hµ(P ∨ Q/R) = Hµ(Q/R) +Hµ(P/Q ∨R) ≤ Hµ(Q/R) +Hµ(P/Q).

9.1.3. Let g = fk . Then, Hµ(∨k−1i=0 f

−i(P)/ ∨nj=k f−j(P)) = Hµ(Pk/ ∨n−ki=1 g−i(Pk)). By

Lemma 9.1.12, this expression converges to hµ(g,Pk). Now use Lemma 9.1.13.

9.2.5. Write Qn = ∨n−1j=0 f

−j(Q) for each n and let A be the σ-algebra generated by ∪nQn.Check that f is measurable with respect to the σ-algebra A. Show that the hypothesisimplies that P ⊂ A. By Corollary 9.2.4, it follows that Hµ(P/Qn) converges to zero. ByLemmas 9.1.11 and 9.1.13, we have that hµ(f,P) ≤ hµ(f,Q) +Hµ(P/Qn) for every n.

9.2.7. The set A of all finite disjoint of rectangles Ai × Bi, with Ai ⊂ M and Bi ⊂ N ,is an algebra that generates the σ-algebra of M × N . Given partitions P and Q of M andN , respectively, the family P × Q = P × Q : P ∈ P and Q ∈ Q is a partition of M × Ncontained in A and such that hµ×ν(f × g,P × Q) = hµ(f,P) + hν(g,Q). Conversely, givenany partition R ⊂ A of M ×N , there exist partitions P and Q such that R ≺ P ×Q and sohµ×ν(f × g,R) ≤ hµ(f,P) + hν(g,Q). Conclude using Exercise 9.2.6.

9.3.3. It is clear that B(x, n, ε) ⊂ B(f(x), n − 1, ε). Hence, hµ(f, x) ≥ hµ(f, f(x)) for µ-almost every x. On the other hand,

R

hµ(f, x) dµ(x) =R

hµ(f, f(x)) dµ(x) since the measureµ is invariant under f .

9.4.2. Use the following consequence of the Jordan canonical form: there exist numbersρ1, . . . , ρl > 0, there exists an A-invariant decomposition Rd = E1⊕· · ·⊕El and, given α > 0,there exists an inner product in Rd relative to which the subspaces Ej are orthogonal andsatisfy e−αρj‖v‖ ≤ ‖Av‖ ≤ eαρj‖v‖ for every v ∈ Ej . Moreover, the ρi are the absolute

values of the eigenvalues of A and they satisfyPdi=1 log+ |λi| =

Plj=1 dimEj log+ ρj .

9.4.3. Consider any countable partition P with B,Bc ≺ P. Let Q be the restriction of

P to the set B. Write Pn = ∨n−1j=0 f

−j(P) and Qk = ∨k−1j=0 g

−j(Q). Check that, for every

x ∈ B and k ≥ 1, there exists nk ≥ 1 such that Qk(x) = Pnk (x). Moreover, by ergodicity,limk k/nk = τ(B, x) = µ(B) for almost every x. By the theorem of Shannon-McMillan-Breiman,

hν(g,Q, x) = limk

− 1

klog ν(Qk(x)) and hµ(f,P, x) = lim

k− 1

nklog µ(Pnk (x)).

Conclude that hν(f,Q, x) = µ(B)hν (g,Q, x) for almost every x ∈ B. Varying P, deduce thathν(f) = µ(B)hν (g).

9.5.2. Consider A ∈ ∩nf−n(B) with m(A) > 0. Then, for each n there exists An ∈ B suchthat A = f−n(An). Consider the intervals Ij,n =

`

(j − 1)/10n, j/10n´

. Then,

m(A ∩ Ij,n)

m(Ij,n)=

m(An)

m((0, 1))= m(A) for every 1 ≤ j ≤ 10n.

Making n→ ∞, conclude that Ac has no points of density. Hence, m(Ac) = 0.

DR

AFT

501

9.5.6. Assume that hµ(f,P) = 0. Use Lemma 9.5.4 to show that P ≺ W∞j=1 f

−jk(P) for

every k ≥ 1. Deduce that, up to measure zero, P is contained in f−k(B) for every k ≥ 1.Conclude that the partition P is trivial.

9.6.1. Uniqueness is immediate. To prove the existence, consider the functional Ψ definedby Ψ(ψ) =

R ` R

ψ dη´

dW (η) in the space of bounded measurable functions ψ : M → R.Note that Ψ is linear and non-negative and satisfies Ψ(1) = 1. Use the monotone convergencetheorem to show that if Bn, n ≥ 1 are pairwise disjoint measurable subsets of M thenΨ(X∪nBn ) =

P

nΨ(XBn ). Conclude that ξ(B) = Ψ(XB) defines a probability measure inthe σ-algebra of measurable subsets of M . Show that

R

ψ dξ = Ψ(ψ) for every boundedmeasurable function. Take bar(W ) = ξ.

9.6.5. For the penultimate identity one would need to know that n−1 log µP (Qn(x)) is adominated sequence (for example).

9.7.4. By Exercise 9.7.3, given any bounded measurable function ψ,Z

(ψ f) dη =

Z

ψ(x)“

X

z∈f−1(x)

1

Jηf(z)

”

dη(x).

Deduce the first part of the statement. For the second part, note that if η is invariant thenη(f(A)) = η(f−1(f(A))) ≥ η(A) for every domain of invertibility A.

9.7.7. The ‘if’ part of the statement is easy: we may exhibit the ergodic equivalence explicitly.Assume that the two systems are ergodically equivalent. The fact that k = l follows fromExercise 8.1.2. To prove that p and q are permutations of one another, use the fact that theJacobian is invariant under ergodic equivalence (Exercise 9.7.6), together with the expressionsof the Jacobians given by Example 9.7.1.

10.1.6. Note that ψ(M) is a compact and the inverse ψ−1 : ψ(M) → M is (uniformly)continuous. Hence, given ε > 0 there exists δ > 0 such that if E ⊂ M is (n, ε)-separated forf then ψ(E) ⊂ N is (n, δ)-separated for g. Conclude that s(f, ε,M) ≤ s(g, δ,N) and deducethat h(f) ≤ h(g). [Observation: The statement remains valid in the non-compact case, aslong as we assume the inverse ψ−1 : ψ(M) →M to be uniformly continuous.]

For the second part, consider the distance defined in Σ by

d`

(xn)n, (yn)n´

=X

n∈Z

2−|n||xn − yn|.

Consider a discrete set A ⊂ [0, 1] with n elements. Check that the restriction to AZ of thedistance of [0, 1]Z is uniformly equivalent to the distance defined in (9.2.15). Using Exam-ple 10.1.2, conclude that the topological entropy of σ is at least log n, for any n.

10.1.10. (Carlos Gustavo Moreira) Let θ1=0, θ2 = 01 and, for n ≥ 2, θn+1 = θnθn−1.We claim that, for every n ≥ 1, there exists a word τn such that θnθn+1 = τnαn andθn+1θn = τnβn, where αn = 10 and βn = 01 if n is even and αn = 01 and βn = 10 if nis odd. That holds for n = 1 with τ1 = 0 and for n = 2 with τ2 = 010. If it holds fora given n, then θn+1θn+2 = θn+1θn+1θn = θn+1τnβn = τn+1αn+1 and also θn+2θn+1 =θn+1θnθn+1 = θn+1τnαn = τn+1βn+1, as long as we take τn+1 = θn+1τn. This proves theclaim. It follows that the last letters of θn and θn+1 are distinct.

Now, we claim that θ = lim θn is not pre-periodic. Indeed, suppose that θ were pre-periodic and let m be its period. Since the length of θn is Fn+1 (where Fk is the k-th Fibonaccinumber), we may take n large such that m divides Fn+1 and the pre-period (that is, the lengthof the non-periodic part) of θ is less than Fn+2. Then, θ starts with θn+3 = θn+2θn+1 =θn+1θnθn+1. However, since the length Fn+1 of θn is a multiple of the period m, the Fm+2-thletter of θ, which is the last letter of θn+1, must coincide with the (Fm+2 + Fn+1)-th letterof θ, which is the last letter of θn. This would contradict the conclusion of the previousparagraph.

Next, we claim that ck+1(θ) > ck(θ) for every k. Indeed, suppose that ck+1(θ) = ck(θ)for some k. Then, every subword of length k can have only one continuation of length k + 1.Hence, we have a transformation in the set of subwords of length k, assigning to each subword

DR

AFT


its unique continuation, without the first letter. Since the domain is finite, all the orbits ofthis transformation are pre-periodic. In particular, θ is also pre-periodic, which contradictsthe conclusion in the previous paragraph.

Since c1(θ) = 2, it follows that ck(θ) ≥ k + 1 for every k. We claim that cFn+1(θ) ≤

Fn+1 + 1 for every n > 1. To prove that fact, note that θ may be written as a concatenationof words belonging to θn, θn+1, because (by induction), every θr, with r ≥ n may be writtenas a concatenation of words belonging to θn, θn+1. Thus, any subword of θ of length Fn+1

(which is the length of θn) is a subword of θnθn+1 or θn+1θn. Since θnθn+1 = θnθnθn−1 isa subword of θnθnθn−1θn−2 = θnθnθn, there are at most |θn| = Fn+1 subwords of length|θn| = Fn+1 of θnθnθn and, hence, of θnθn+1. Since θnθn+1 = τnαn and θn+1θn = τnβn andθn+1θn ends with θn and |βn| = 2, the unique subword of θn+1θn of length |θn| = Fn+1 thatmay not be a subword of θnθn+1 is the subword that ends with the first letter of βn (that is,one position before the end of θn+1θn). Hence, cFn+1

(θ) ≤ Fn+1 + 1 as stated.We are ready to obtain the statement of the exercise. Assume that ck(θ) > k+1 for some k.

Taking n such that Fn+1 > k, we would have cFn+1(θ)−ck(θ) < Fn+1+1−(k+1) = Fn+1−k

and that would imply that cm+1(θ) ≤ cm(θ) for some m with k ≤ m < Fn+1. This wouldcontradict the conclusion in the previous paragraph.

10.2.4. By Proposition 10.2.1, h(f) = g(f, δ,M) whenever f is ε-expansive and δ < ε/2.Show that if d(f, h) < δ/3 then g(h, δ/3, M) ≤ g(f, δ,M). Deduce that if (fk)k converges tof then lim supk h(fk) = lim supk g(fk, δ/3,M) ≤ g(f, δ,M) = h(f).

10.2.8. (Bowen [Bow72]) Write a = g∗(f, ε). Observe that if E is an (n, δ)-generating setof M , with δ < ε, then M = ∪x∈EB(x, n, ε). Combining this fact with the result of Bowen,show that gn(f, δ,M) ≤ #Eec+(a+b)n. Take b→ 0 to conclude the inequality.

10.3.3. (a) The hypothesis implies that for every n and every subcover δ of βn there exists asubcover γ of αn such that γ ≺ δ. Taking γ minimal, #γ ≤ #δ and

P

U∈γ infx∈U eφn(x) ≤

P

V ∈δ infy∈V eφn(y). It follows that Qn(f, φ, α) ≤ Qn(f, φ, β) for every n. (b) Lemma 10.1.11

gives that αn+k−1 is a subcover of (αk)n. A variation of the argument in part (a) gives thatQn(f, φ, αk) ≤ e(k−1) sup |φ|Qn+k−1(f, φ, α) for every n. Hence, Q±(f, φ, αk) ≤ Q±(f, φ, α).[Observation: Analogously, P (f, φ, αk) ≤ P (f, φ, α).] By the second part of Lemma 10.1.11,for every subcover β of (αk)n there exists a subcover γ of αn+k−1 such that γ ≺ β and

#γ ≤ #β andP

U∈γ infx∈U eφn+k−1(x) ≤ e(k−1) sup |φ|

P

V ∈β infy∈V eφn(y) (taking γ min-

imal). Deduce that Qn+k−1(f, φ, α) ≤ e(k−1) sup |φ|Qn(f, φ, αk). Hence, Q±(f, φ, α) ≤Q±(f, φ, αk). (c) Follows from part (b) and Corollary 10.3.3. (d) If the elements of α aredisjoint then (αk)n = αn+k−1 and so

Pn(f, φ, αk) = infX

U∈γ

supx∈U

eφn(x) : γ ⊂ (αk)n = infX

U∈γ

supx∈U

eφn(x) : γ ⊂ αn+k−1.

It follows that e−(k−1) sup |φ|Pn(f, φ, αk) ≤ Pn+k−1(f, φ, α) ≤ e(k−1) sup |φ|Pn(f, φ, αk). (e)Follows from part (d) and the definition of pressure (Lemma 10.3.1). (f) Note that α±k =fk(α2k) and use Exercise 10.3.2.

10.3.8. Show that given ε > 0 there exists κ ≥ 1 such that every dynamical ball B(x, n, ε)has diameter equal to ε2−n and contains some periodic point pnx of period n+ κ. Show thatgiven C, θ > 0 there exists K > 0 such that |φn(y) − φn(pnx)| ≤ K for every y ∈ B(x, n, ε),every n ≥ 1 and every (C, θ)-Holder function φ : S1 → R. Use this fact to replace generating(or separated) sets by sets of periodic points in the definition of pressure.

10.3.10. Since ξ | Λc = η | Λc,˛

˛E(ξ, η) −EΛ(ξ | Λ) +EΛ(η | Λ)˛

˛ =˛

˛

X

(k,l)∈Λ×Λc∪Λc×Λ

Ψ(k − l, ξk, ξl) − Ψ(k − l, ηk , ηl)˛

˛

≤X

(k,l)P

(k,l)∈Λ×Λc∪Λc×Λ

2Ke−θ|k−l|.

Recalling that Λ is an interval, the cardinal of (k, l) ∈ Λ × Λc ∪ Λc × Λ : |k − l| = n is lessor equal than 4n for every n ≥ 1. Hence,

˛

˛E(ξ, η) − EΛ(ξ | Λ) + EΛ(η | Λ)˛

˛ ≤∞

X

n=1

8Kne−θn < ∞.

DR

AFT

503

The second part of the statement is an immediate consequence of the first one.

10.4.4. Consider the shift map σ in the space Σ = 0, 1N. Consider the function φ : Σ → R

defined by φ(x) = 0 if x0 = 0 and φ(x) = 1 if x0 = 1. Let N be the set of points x ∈ Σ suchthat the time average in the orbit of x does not converge. Check that N is invariant under σand is non-empty: for each finite sequence (z0, . . . , zk) one can find x ∈ N with xi = zi fori = 0, . . . , k. Deduce that the topological entropy of the restriction f | Nφ is equal to log 2.Justify that N does not support any probability measure invariant under f .

10.4.5. Consider the open cover ξ of K whose elements are K ∩ [0, α] and K ∩ [1 − β, 1].Check that P (f, φ) = P (f, φ, ξ) for every potential φ. Moreover,

Pn(f,−t log g′, ξ) =X

U∈αn

[(gn)′]−t(U) = (αt + βt)n.

Conclude that ψ(t) = log(αt + βt). Check that ψ′ < 0 and ψ′′ > 0 (convexity also followsfrom Proposition 10.3.7). Moreover, ψ(0) > 0 > ψ(1). By the variational principle, the lastinequality implies that hµ(f) −

R

log g′ dµ < 0.

10.5.3. The Gibbs property gives that limn(1/n) log µ(Cn(x)) = ϕ(x) − P , where Cn(x) isthe cylinder of length n that contains x. Combine this identity with the theorem of Brin-Katok (Theorem 9.3.3) and the theorem of Birkhoff to get the first claim. Now assume thatµ1 and µ2 are two ergodic Gibbs states with the same constant P . Observe that there existsC such that C−1µ1(A) ≤ µ2(A) ≤ Cµ1(A) for every A in the algebra formed by the finitedisjoint unions of cylinders. Using the monotone class theorem (Theorem A.1.18), deducethat C−1µ1(A) ≤ µ2(A) ≤ Cµ1(A) for any measurable set A. This implies that µ1 and µ2

are equivalent measures. Using Lemma 4.3.1, it follows that µ1 = µ2.

10.5.5. By Proposition 10.3.7, the pressure function is convex. By Exercise A.5.1, it followsthat it is also continuous. By the smoothness theorem of Mazur (recall Exercise 2.3.7), thereexists a residual subset R ⊂ C0(M) such that the pressure function is differentiable at everyϕ ∈ R. Apply Exercise 10.5.4.

11.1.3. Adapt the arguments in Section 9.4.2, as follows. Start by checking that the iteratesof f have bounded distortion: there exists K > 1 such that

1

K≤ |Dfn(x)|

|Dfn(y)| ≤ K,

for every n ≥ 1 and any points x, y with Pn(x) = Pn(y). Consider the sequence µn =

(1/n)Pn−1j=0 f

j∗m of averages of the iterates of the Lebesgue measure m. Show that the

Radon-Nikodym derivatives dµn/dm are uniformly bounded and are Holder, with uniformHolder constants. Deduce that every accumulation point µ of that sequence is an invariantprobability measure absolutely continuous with respect to the Lebesgue measure. Show thatthe Radon-Nikodym derivative ρ = dµ/dm is bounded from zero and infinity (in other words,log ρ is bounded). Show that ρ and log ρ are Holder.

11.1.5. Check that Jµf = (ρ f)|f ′|/ρ and use the Rokhlin formula (Theorem 9.7.3).

11.2.4. Take Λ = 2−n : n ≥ 0 mod Z. The restriction f : Λ → Λ can not be an expandingmap because 1/2 is an isolated point in Λ but 1 = f(1/2) is not. [Observation: Note thatΛ = S1 \ ∪∞

n=0f−n(I), where I = (1/2, 1) mod Z. Modifying suitably the choice of I, one

finds many other examples, possibly with Λ uncountable.]

11.3.3. Let a =R

ϕdµ1 and b =R

ϕdµ2. Assume that a < b and write r = (b − a)/5.By the ergodic decomposition theorem, we may assume that µ1 and µ2 are ergodic. Then,there exist x1 and x2 such that ϕ(x1) = a and ϕ(x2) = b. Using the hypothesis that f istopologically exact, construct a pseudo-orbit (zn)n≥0 alternating (long) segments of the orbitsof x1 and x2, in such a way that the sequence of time averages of ϕ along the pseudo-orbit(zn)n oscillates from a+ r to b− r (meaning that lim inf ≤ a+ r and lim sup ≥ b− r). Next,use the shadowing lemma to find x ∈ M whose orbit shadows this pseudo-orbit. Using thatϕ is uniformly continuous, conclude that the sequence of time averages of ϕ along the orbitof x oscillates from a+ 2r to b− 2r.

DR

AFT


12.1.2. Theorem 2.1.5 gives that M1(M) is weak∗ compact and it is clear that it is convex.Check that the operator L : C0(M) → C0(M) is continuous and deduce that its dual L∗ :M(M) → M(M) is also continuous. If (ηn)n → η in the weak∗ topology then (

R

L1 dηn)n →R

L1dη. Conclude that the operator G : M1(M) → M1(M) is continuous. Hence, by theTychonoff-Schauder theorem, G has some fixed point ν. This means that L∗ν = λν, whereλ =

R

L1dν. Since λ > 0, this proves that ν is a reference measure. Using Corollary 12.1.9,

check that λ = lim supnnp

‖Ln1‖ and deduce that λ is the spectral radius of L.

12.2.4. Fix in S1 the orientation induced by R. Consider the fixed point p0 = 0 of f and letp1, . . . , pd be its pre-images, ordered cyclically, with pd = p0. Analogously, let q0 be a fixedpoint of g and q1, . . . , qd be its pre-images, ordered cyclically, with qd = q0. Note that f mapseach [pi−1, pi] and g maps each [qi−1, qi] onto S1. Then, for each sequence (in)n ∈ 1, . . . , dN

there exists exactly one point x ∈ S1 and one point y ∈ S1 such that fn(x) ∈ [pin−1, pin ] andgn(y) ∈ [qin−1, qin ] for every n. Clearly, the maps (in)n 7→ x and (in)n 7→ y are surjective.Consider two sequences (in)n and (jn)n to be equivalent if there exists N ∈ N ∪ ∞ suchthat (1) in = jn for every n ≤ N and either (2a) in = 1 and jn = d for every n > N or (2b)in = d and jn = 1 for every n > N . Show that the points x corresponding to (in)n and (jn)ncoincide if and only if the two sequences are equivalent and a similar fact holds for the pointsy corresponding to the two sequences. Conclude that the map φ : x 7→ y is well defined andis a bijection in S1 such that φ(f(x)) = g(φ(x)) for every x. Observe that φ preserves theorientation of S1 and, thus, is a homeomorphism.

12.2.5. (a) ⇒ (b): Trivial. (b) ⇒ (c): Let µa be the absolutely continuous invariant prob-ability measure and µm be the measure of maximum entropy of f ; let νa and νm be thecorresponding measures for g. Show that µa = µm. Let φ : S1 → S1 be a topologicalconjugacy. Show that νm = φ∗µm and νa = φ∗µa if φ is absolutely continuous. Use Corol-lary 12.2.4 to conclude that in the latter case |(gn)′(x)| = kn for every x ∈ Fix(fn). (c) ⇒(a): The hypothesis implies that νa = νm and so νa = φ∗µa. Recall (Proposition 12.1.20)that the densities dµa/dm and dνa/dm are continuous and bounded from zero and infinity.Conclude that φ is differentiable, with φ′ = (dµ/dm)

‹

(dν/dm) φ.

12.3.2. Consider A = (a, 1), P = (p, 1), Q = (q, 1), B = (b, 1), O = (0, 0) ∈ C × R. LetA′ (respectively, B′) be the point where the line parallel to OQ (respectively, OP ) passingthrough P (respectively, Q) intersects the boundary of C. Note that all these points belongto the plane determined by P , Q and O; note also that A′ ∈ OA and B′ ∈ OB. By definition,α(P,Q) = |B′Q|/|OP | and β(P,Q) = |OQ|/|A′P |. Check that |AP |/|AQ| = |A′P |/|OQ| and|BQ|/|BP | = |B′Q|/|OP |. Hence,

θ(P,Q) = logβ(P,Q)

α(P,Q)= log

|OQ| |OP ||A′P | |B′Q| = log

|AQ| |BP ||AP | |BQ| .

In other words, d(p, q) = log(|aq| |bp|)/(|ap| |bq|) = ∆(p, q), for any p, q ∈ D.

12.3.4. Consider the cone C0 of positive continuous functions in M . The correspondingprojective distance θ0 is given in Example 12.3.5. Check that θ1 is the restriction of θ0 tothe cone C1. Consider a sequence of positive differentiable functions converging uniformlyto a (continuous but) non-differentiable function g0 . Show that (gn)n converges to g0 withrespect to the distance θ0 and, thus, is a Cauchy sequence for θ0 and θ1. Argue that (gn)ncan not be convergent for θ1.

12.3.8. (a) It is clear that log g is (b, β)-Holder and sup g/ inf g is close to 1 if the norm ‖v‖β,ρis small; this will be implicit in all that follows. Then, g ∈ C(b, β,R). To estimate θ(1, g), usethe expression given by Lemma 12.3.8. Observe that

β(1, g) = sup˘

g(x),exp(bδ)g(x) − g(y)

exp(bδ) − 1: x 6= y, d(x, y) < ρ

¯

where δ = d(x, y)β .

Clearly, g(x) ≤ 1 + sup |v|. Moreover,

exp(bδ)g(x) − g(y)

exp(bδ) − 1≤ exp(bδ)g(y) + exp(bδ)Hβ,ρ(v)δ − g(y)

exp(bδ) − 1= g(y) +

δ exp(bδ)

exp(bδ) − 1Hβ,ρ(v).

Take K1 > K2 > 0, depending only on b, β, ρ, such that K1 ≥ exp(bs)s/(exp(bs)−1) ≥ K2 forevery s ∈ [0, ρβ ]. Then, the term on the right-hand side of the inequality previous is bounded

DR

AFT

505

by 1 + sup |v| + K1Hβ,ρ(v). Hence, logβ(1, g) ≤ log(1 + sup |v| +K1Hβ,ρ(v)) ≤ K ′1 ‖v‖β,ρ,

where K ′1 = maxK1, 1. Varying x and y in the previous arguments, we also find that

β(1, g) ≥ 1 + sup |v| and β(1, g) ≥ 1 − sup |v| +K2Hβ,ρ(v). Deduce that

log β(1, g) ≥ max˘

log(1 + sup |v|), log(1 − sup |v| +K2Hβ,ρ(v))¯

≥ K ′2 ‖v‖β,ρ,

where the constant K ′2 depends only on K2, β and ρ. Analogously, there exist constants

K ′3 > K ′

4 > 0 such that −K ′3 ‖v‖β,ρ ≤ logα(1, g) ≤ −K ′

4 ‖v‖β,ρ. Fixing

K ≥ max(K1 +K3), 1/(K2 +K4),it follows that K−1 ‖v‖β,ρ ≤ θ(1, g) ≤ K ‖v‖β,ρ. (b) It is no restriction to assume that‖v‖β,ρ < r. Note that Png = 1 + Pnv for every n. Corollary 12.3.12 gives that

θ(PkNg, 1) ≤ Λk0θ(1, g) for every k,

with Λ0 < 1. By part (a), it follows that ‖PkNv‖ ≤ K2Λk0 for every k. This yields the

statement, with τ = Λ1/k0 and C = K2‖P‖NΛ−1

0 .

12.4.4. Consider 0 < δ ≤ ρ. For every cover U of A with diameter less than δ, we haveX

U∈U

(diamU)d ≥X

U∈U

K−1µ(U) ≥ K−1µ(A).

Taking the infimum over U , we get that md(A, δ) ≥ K−1µ(A). Making δ → 0, we find thatmd(A) > K−1µ(A); hence, d(A) ≥ d.

12.4.7. Consider ℓ = 1. Then, D,D1, . . . ,DN (Section 12.4.3) are compact intervals. Itis no restriction to assume that D = [0, 1]. Write Din = hi0 · · · hin−1

(D) for eachin = (i0, . . . , in−1) in 1, . . . , Nn. Starting from the bounded distortion property (Proposi-tion 12.4.5), prove that there exists c > 0 such that, for every in and every n,

(a) c ≤ |(fn)′(x)| diamDin ≤ c−1 for every x ∈ Din ;

(b) d(Din ,Djn ) ≥ cdiamDin for every jn 6= in;

(c) diamDin+1 ≥ cdiamDin for every in, where in+1 = (i0, . . . , in−1, in).

Let ν be the reference measure of the potential ϕ = −d0 log |f ′|. Since P (f, ϕ) = 0, itfollows from Lemma 12.1.3 and Corollary 12.1.15 that Jνf = |f ′|d0 . Deduce that c ≤|(fn)′(x)|d0ν(Din ) ≤ c−1 for any x ∈ Din and, using (a) once more, conclude that

c2 ≤ diam(Din )d0

ν(Din )≤ c−2 for every in and every n.

It follows thatP

in diam(Din )d0 ≤ c−2P

in ν(Din ) = c−2. Since the diameter of Dinconverges uniformly to zero when n→ ∞, this implies that md0 (Λ) ≤ c−2.

For the lower estimate, let us prove that ν satisfies the hypothesis of the mass distribu-tion principle (Exercise 12.4.4). Given any U with diamU < cmindiamD1, . . . , diamDN,there exist n ≥ 1 and in such that Din intersects U and cdiamDin > diamU . By (b), we

have that ν(U) ≤ ν(Din ) ≤ c−2 diamDd0in . Take n maximum. Then, using (c), diamU ≥cdiamDin+1 ≥ c2 diamDin for some choice of in. Combining the two inequalities, we getν(U) ≤ c−2−2d0 (diamU)d0 . Then, by the mass distribution principle, md0 (Λ) ≥ c2+2d0 .Finally, extend these arguments to any dimension ℓ ≥ 1.

A.1.9. Given A1 ⊃ · · · ⊃ Ai ⊃ · · · take A = ∩∞i=1Ai. For j ≥ 1, consider A′

j = Aj \ A. By

Theorem A.1.14, we have that µ(A′j ) → 0 and so µ(Aj) → µ(A). Given A1 ⊂ · · · ⊂ Ai ⊂ · · ·

take A = ∪∞i=1Ai. For each j, consider A′

j = A \ Aj . By Theorem A.1.14, we have that

µ(A′j) → 0, that is, µ(Aj) → µ(A).

A.1.13. (Royden [Roy63]) (b) ⇒ (a) Assume that there exist Borel sets B1, B2 such thatB1 ⊂ E ⊂ B2 and m(B2 \B1) = 0. Deduce that m∗(E \B1) = 0, hence E \B1 is a Lebesguemeasurable set. Conclude that E is a Lebesgue measurable set. (a) ⇒ (c) Let E be a Lebesguemeasurable set such that m∗(E) < ∞. Given ε > 0, there exists a cover by open rectangles

DR

AFT


(Rk)k such thatP

km∗(Rk) < m∗(E) + ε. Then, A = ∪kRk is an open set containing E

and such that m∗(A) −m∗(E) < ε. Using that E is a Lebesgue measurable set, deduce thatm∗(A \E) < ε. For the general case, write E as a disjoint union of Lebesgue measurable setswith finite exterior measure. (c) ⇔ (d) It is clear that E is a Lebesgue measurable set if andonly if its complement is. (c) and (d) ⇒ (b) For each k ≥ 1, consider a closed set Fk ⊂ E andan open set Ak ⊃ E such that m∗(E \Fk) and m∗(Ak \E) are less than 1/k. Then, B1 = ∪Fkand B2 = ∩kAk are Borel sets such that B1 ⊂ E ⊂ B2 and m∗(E \ B1) = m∗(B2 \ E) = 0.Conclude that m(B2 \B1) = m∗(B2 \B1) = 0.

A.1.18. Show that x 7→ 1n

#0 ≤ j ≤ n− 1 : aj = 5 is a simple function for each n ≥ 1. ByProposition A.1.31, it follows that ω5 is measurable.

A.2.8. (a) Assume that F is uniformly integrable. Consider C > 0 corresponding to α = 1and take L = C + 1. Check that

R

|f | dµ < L for every f ∈ F . Given ε > 0, consider C > 0corresponding to α = ε/2 and take δ = ε/(2C). Check that

R

A|f | dµ < ε for every f ∈ F and

every set with µ(A) < δ. Conversely, given α > 0, take δ > 0 corresponding to ε = α and letC = L/δ. Show that

R

|f |>C |f | dµ < α. (b) Applying Exercise A.2.5 to the function |g|, show

that F satisfies the criterion in (a). (c) Let us prove three facts about f = limn fn. (i) f isfinite at almost every point : Consider L as in (a). Note that µ(x : |fn(x)| ≥ k) ≤ L/k forevery n, k ≥ 1 (Exercise A.2.4) and deduce that µ(x : |f(x)| ≥ k) ≤ L/k for every k ≥ 1.(ii) f is integrable: Fix K > 0. Given any ε > 0, take δ as in (a). Take n sufficiently largethat µ(x : |fn(x) − f(x)| > ε) < δ. Note that

Z

|f |≤K|f | dµ ≤

Z

|fn−f |≤ε|f | dµ+

Z

|f |≤K,|fn−f |>ε|f | dµ ≤ (L+ ε) +Kδ.

Deduce thatR

|f |≤K|f | dµ ≤ L for every K and

R

|f | dµ ≤ L. (iii) (fn)n converges to f

in L1(µ): Show that, given ε > 0 there exists K > 0 such thatR

|f |>K|f | dµ < ε and

R

|f |>K |fn| dµ < ε for every n. Take δ as in part (a) and n large enough that µ(x : |fn(x) −f(x)| > ε) < δ. Then,

Z

|f |≤K|fn − f | dµ ≤

Z

|fn−f |≤ε|fn − f | dµ +

Z

|fn−f |>ε|fn| dµ +

Z

|fn|≤K,|fn−f |>ε|f | dµ.

The right-hand side is bounded above by 2ε+Kδ. Combining these inequalities,R

|fn−f | dµ <4ε+Kδ for every n sufficiently large.

A.2.14. It is no restriction to assume that the Bn are pairwise disjoint. For each n, considerthe measure ηn defined in Bn by ηn(A) = η(f(A)). Then, ηn ≪ (η | Bn) and, by the theoremof Radon-Nikodym, there exists ρn : Bn → [0,+∞] such that

R

Bnφ dηn =

R

Bnφρn dη for

every bounded measurable function φ : Bn → R. Define Jη | Bn = ρn. The essentialuniqueness of Jη is a consequence of the essential uniqueness of the Radon-Nikodym derivative.

A.3.5. Given any Borel set B ⊂ M , use Proposition A.3.2 and Lemma A.3.4 to constructLipschitz functions ψn : M → [0, 1] such that µ(x ∈ M : ψn(x) 6= XB(x)) ≤ 2−n forevery n. Conclude that the claim in the exercise is true for every simple function. Extendthe conclusion to every bounded measurable function, using the fact that it is a uniformlimit of simple functions. Finally, for any integrable function, use the fact that the positivepart and the negative part are monotone pointwise limits of bounded measurable functions.Now consider M = [0, 1] and assume that there exists a sequence of continuous functionsψn : M → R converging to the characteristic function ψ of M ∩ Q at every point. Considerthe set R = ∩m ∪n>m x ∈ M : ψn(x) > 1/2. On the one hand, R = Q ∩M ; on the otherhand, R is a residual subset of M ; this is a contradiction.

A.4.6. By the inverse function theorem, for every x ∈ M there exist neighborhoods U(x) ⊂Mof x and V (x) ⊂ N of f(x) such that f maps U(x) diffeomorphically onto V (x). This impliesthat the function y 7→ #f−1(y) is lower semi-continuous. Moreover, this function is bounded.Indeed, if there were yn ∈ N with #f−1(yn) ≥ n for every n ≥ 1 then, since M is compact,we could find xn, x′n ∈ f−1(yn) distinct with d(xn, x′n) → 0. Let x be any accumulation pointof either sequence. Then, f would not be injective in the neighborhood of x, contradicting thehypothesis. Let k be the maximum value of #f−1(y). The set Bk of points y ∈ N such that#f−1(y) = k is open, closed and non-empty. Since N is connected, it follows that Bk = M .

DR

AFT

507

A.4.9. Consider local charts ϕα : Uα → Xα, x 7→ ϕα(x) of M and ϕα : TUαM → Xα × Rd,

(x, v) 7→ (ϕα,Dϕα(x)v) of TM . Note that ϕαπDϕ−1α is the canonical projectionXα×Rd →

Xα, which is infinitely differentiable. Since M is of class Cr and TM is of class Cr−1, it followsthat π is of class Cr−1.

A.5.2. (a) Use the fact that the exponential function is convex. (b) Starting from the Younginequality, show that

R

|fg| dµ ≤ 1 whenever ‖f‖p = ‖g‖q = 1. Deduce the general case ofthe Holder inequality. (c) Start by noting that |f + g|p ≤ |f ||f + g|p−1 + |g||f + g|p−1. Applythe Holder inequality to each of the terms on the right-hand side of this inequality to obtainthe Minkowski inequality.

A.5.6. (Rudin [Rud87, Theorem 6.16]) Note that Φ(g) ∈ Lp(µ)∗ and ‖Φ(g)‖ ≤ ‖g‖q : forq < ∞ that follows from the Holder inequality; the case q = ∞ is immediate. It is clear thatΦ is linear. To see that it is injective, given g such that Φ(g) = 0, consider a function βwith values on the unit circle such that βg = |g|. Then, φ(g)β =

R

|g| dµ = 0, hence g = 0.We are left to prove that for every φ ∈ Lp(µ)∗ there exists g ∈ Lq(µ) such that φ = Φ(g)and ‖g‖q = ‖φ‖. For each measurable set B ⊂ M , define η(B) = φ(XB). Check that ηis a complex measure (to prove σ-additivity one needs p < ∞) and observe that η ≪ µ.Consider the Radon-Nikodym derivative g = (dη/dµ). Then, φ(XB) =

R

B g dµ for every B;conclude that φ(f) =

R

fg dµ for every f ∈ L∞(µ). In the case p = 1, this construction yields|R

B g dµ| ≤ ‖φ‖µ(B) for every measurable set. Deduce that ‖g‖∞ ≤ ‖φ‖. Now suppose that

1 < p <∞. Take fn = XBnβ|g|q−1, where Bn = x : |g(x)| ≤ n. Observe that fn ∈ L∞(µ)and |fn|p = |g|q in the set Bn and

Z

Bn

|g|q dµ =

Z

fng dµ = φ(fn) ≤ ‖φ‖`

Z

|fn|p dµ´1/p ≤ ‖φ‖

`

Z

Bn

|g|q dµ´1/p

.

This yieldsR

Bn|g|q dµ ≤ ‖φ‖q for every n and, thus, ‖g‖q ≤ ‖φ‖. Finally, φ(f) =

R

fg dµ

for every f ∈ Lp(µ), since the two sides are continuous functionals and they coincide on thedense subset L∞.

A.6.5. By definition, u · Lv = L∗u · v and u · L∗v = (L∗)∗u · v for any u and v. Hence,v · (L∗)∗u = L∗v · u for any u and v. Reversing the roles of u and v, we see that L = (L∗)∗.Note that ‖L∗u · v‖ ≤ ‖L‖ ‖u‖ ‖v‖ for every u and v. Taking v = L∗u, it follows that‖L∗u‖ ≤ ‖L‖ ‖u‖ for every u and so ‖L∗‖ ≤ ‖L‖. Since L = (L∗)∗, it follows that ‖L‖ ≤ ‖L∗,hence the two norms coincide. Since the operator norm is submultiplicative, ‖L∗L‖ ≤ ‖L‖2.On the other hand, u · L∗Lu = ‖Lu‖2 and so ‖L∗L‖ ‖u‖2 ≥ ‖Lu‖2, for every u. Deduce that‖L∗L‖ ≥ ‖L‖2 and so the two expressions coincide. Analogously, ‖LL∗‖ = ‖L‖2.

A.6.8. Assume that v ∈ H and (un)n is a sequence in E such that un · v → u · v for everyv ∈ H. Considering v ∈ E⊥, conclude that v ∈ (E⊥)⊥. By Exercise A.6.7, it follows thatu ∈ E. Therefore, E is closed in the weak topology. Now consider any sequence (vn)n inU(E) converging to some v ∈ H. For each n, take un = h−1(vn) ∈ E. Since h is an isometry,‖um − un‖ = ‖vm − vn‖ for any m,n. It follows that (un)n is a Cauchy sequence in E andso it admits a limit u ∈ E. Hence, v = h(u) is in U(E).

A.7.1. The inverse of T +H is given by the equation (T +H)(T−1 + J) = id , which may berewritten as a fixed point equation J = −L−1HL−1 + L−1HJ . Use the hypothesis to showthat this equation admits a (unique) solution. Hence, T +H is an isomorphism. Deduce thatL − λid whenever λ > ‖L‖. Therefore, the spectrum of L is contained in the disk of radius‖L‖. It also follows from the previous observation that if L−λid is an isomorphism then thesame is true for L− λ′id if λ′ is sufficiently close to λ.

A.7.4. (1) Observe that L− λid =R

(z − λ) dE(z) and use Lemma A.7.4. By the continuityfrom below property (Exercise A.1.9), E(λ) = limn E(z : |z − λ| ≤ 1/n). It follows thatE(λ)v = v. (2) It follows from Exercise A.7.3 that E(B)E(λ) = E(λ) if λ ∈ B andE(B)E(λ) = E(∅) = 0 otherwise. Since L =

R

z dE(z), we get that Lv = λE(λ)v = λv.


Bibliography

[Aar97] J. Aaronson. An introduction to infinite ergodic theory, volume 50 of MathematicalSurveys and Monographs. American Mathematical Society, 1997.

[AB] A. Avila and J. Bochi. Proof of the subadditive ergodic theorem. Preprintwww.mat.puc-rio.br/∼jairo/.

[AF07] A. Avila and G. Forni. Weak mixing for interval exchange transformations andtranslation flows. Ann. of Math., 165:637–664, 2007.

[AKM65] R. Adler, A. Konheim, and M. McAndrew. Topological entropy. Trans. Amer.Math. Soc., 114:309–319, 1965.

[AKN06] V. Arnold, V. Kozlov, and A. Neishtadt. Mathematical aspects of classical andcelestial mechanics, volume 3 of Encyclopaedia of Mathematical Sciences. Springer-Verlag, third edition, 2006. [Dynamical systems. III], Translated from the Russianoriginal by E. Khukhro.

[Ano67] D. V. Anosov. Geodesic flows on closed Riemannian manifolds of negative curva-ture. Proc. Steklov Math. Inst., 90:1–235, 1967.

[Arn78] V. I. Arnold. Mathematical methods of classical mechanics. Springer Verlag, 1978.

[AS67] D. V. Anosov and Ya. G. Sinai. Certain smooth ergodic systems. Russian Math.Surveys, 22:103–167, 1967.

[Bal00] V. Baladi. Positive transfer operators and decay of correlations. World ScientificPublishing Co. Inc., 2000.

[BDV05] C. Bonatti, L. J. Dıaz, and M. Viana. Dynamics beyond uniform hyperbolicity,volume 102 of Encyclopaedia of Mathematical Sciences. Springer-Verlag, 2005.

[Bil68] P. Billingsley. Convergence of probability measures. John Wiley & Sons Inc., NewYork, 1968.

[Bil71] P. Billingsley. Weak convergence of measures: Applications in probability. Societyfor Industrial and Applied Mathematics, 1971. Conference Board of the Mathe-matical Sciences Regional Conference Series in Applied Mathematics, No. 5.

[Bir13] G. D. Birkhoff. Proof of Poincare’s last Geometric Theorem. Trans. Amer. Math.Soc., 14:14–22, 1913.

[Bir67] G. Birkhoff. Lattice theory, volume 25. A.M.S. Colloq. Publ., 1967.

[BK83] M. Brin and A. Katok. On local entropy. In Geometric dynamics (Rio de Janeiro,1981), volume 1007 of Lecture Notes in Math., pages 30–38. Springer-Verlag, 1983.

[BLY] D. Burguet, G. Liao, and J. Yang. Asymptotic h-expansiveness rate of C∞ maps.www.arxiv.org:1404.1771.

[Bos86] J.-B. Bost. Tores invariants des systemes hamiltoniens. Asterisque, 133–134:113–157, 1986.

[Bos93] M. Boshernitzan. Quantitative recurrence results. Invent. Math., 113(3):617–631,1993.

509

510 BIBLIOGRAPHY

[Bow71] R. Bowen. Entropy for group endomorphisms and homogeneous spaces. Trans.Amer. Math. Soc., 153:401–414, 1971.

[Bow72] R. Bowen. Entropy expansive maps. Trans. Am. Math. Soc., 164:323–331, 1972.

[Bow75a] R. Bowen. Equilibrium states and the ergodic theory of Anosov diffeomorphisms,volume 470 of Lect. Notes in Math. Springer Verlag, 1975.

[Bow75b] R. Bowen. A horseshoe with positive measure. Invent. Math., 29:203–204, 1975.

[Bow78] R. Bowen. Entropy and the fundamental group. In The Structure of Attractors inDynamical Systems, volume 668 of Lecture Notes in Math., pages 21–29. Springer-Verlag, 1978.

[BS00] L. Barreira and J. Schmeling. Sets of “non-typical” points have full topologicalentropy and full Hausdorff dimension. Israel J. Math., 116:29–70, 2000.

[Buz97] J. Buzzi. Intrinsic ergodicity for smooth interval maps. Isreal J. Math, 100:125–161,1997.

[Car70] H. Cartan. Differential forms. Hermann, 1970.

[Cas04] A. A. Castro. Teoria da medida. Projeto Euclides. IMPA, 2004.

[Cla72] J. Clark. A Kolmogorov shift with no roots. ProQuest LLC, Ann Arbor, MI, 1972.Thesis (Ph.D.)–Stanford University.

[dC79] M. do Carmo. Geometria riemanniana, volume 10 of Projeto Euclides. Institutode Matematica Pura e Aplicada, 1979.

[Dei85] K. Deimling. Nonlinear functional analysis. Springer Verlag, 1985.

[Din70] E. Dinaburg. A correlation between topological entropy and metric entropy. Dokl.Akad. Nauk SSSR, 190:19–22, 1970.

[Din71] E. Dinaburg. A connection between various entropy characterizations of dynamicalsystems. Izv. Akad. Nauk SSSR Ser. Mat., 35:324–366, 1971.

[dlL93] R. de la Llave. Introduction to K.A.M. theory. In Computational physics(Almunecar, 1992), pages 73–105. World Sci. Publ., 1993.

[DS57] N. Dunford and J. Schwarz. Linear operators I: General theory. Wiley & Sons,1957.

[DS63] N. Dunford and J. Schwarz. Linear operators II: Spectral theory. Wiley & Sons,1963.

[Dug66] J. Dugundji. Topology. Allyn and Bacon Inc., 1966.

[Edw79] R. E. Edwards. Fourier series. A modern introduction. Vol. 1, volume 64 of Grad-uate Texts in Mathematics. Springer-Verlag, second edition, 1979.

[ET36] P. Erdos and P. Turan. On some sequences of integers. J. London. Math. Soc.,11:261–264, 1936.

[Fal90] K. Falconer. Fractal geometry. John Wiley & Sons Ltd., 1990. Mathematicalfoundations and applications.

[Fer02] R. Fernandez. Medida e integracao. Projeto Euclides. IMPA, 2002.

[FFT09] S. Ferenczi, A. Fisher, and M. Talet. Minimality and unique ergodicity for adictransformations. J. Anal. Math., 109:1–31, 2009.

[FO70] N. Friedman and D. Ornstein. On isomorphism of weak Bernoulli transformations.Advances in Math., 5:365–394, 1970.

[Fri69] N. Friedman. Introduction to ergodic theory. Van Nostrand, 1969.

[Fur61] H. Furstenberg. Strict ergodicity and transformation of the torus. Amer. J. Math.,83:573–601, 1961.

[Fur77] H. Furstenberg. Ergodic behavior and a theorem of Szemeredi on arithmetic pro-gressions. J. d’Analyse Math., 31:204–256, 1977.

BIBLIOGRAPHY 511

[Fur81] H. Furstenberg. Recurrence in ergodic theory and combinatorial number theory.Princeton Univertsity Press, 1981.

[Goo71a] T. Goodman. Relating topological entropy and measure entropy. Bull. LondonMath. Soc., 3:176–180, 1971.

[Goo71b] G. Goodwin. Optimal input signals for nonlinear-system identification. Proc. Inst.Elec. Engrs., 118:922–926, 1971.

[GT08] B. Green and T. Tao. The primes contain arbitrarily long arithmetic progressions.Ann. of Math., 167:481–547, 2008.

[Gur61] B. M. Gurevic. The entropy of horocycle flows. Dokl. Akad. Nauk SSSR, 136:768–770, 1961.

[Hal50] P. Halmos. Measure Theory. D. Van Nostrand Company, 1950.

[Hal51] P. Halmos. Introduction to Hilbert Space and the theory of Spectral Multiplicity.Chelsea Publishing Company, New York, 1951.

[Hay] N. Haydn. Multiple measures of maximal entropy and equilibrium states for one-simensional subshifts. Preprint Penn State University.

[Hir94] M. Hirsch. Differential topology, volume 33 of Graduate Texts in Mathematics.Springer-Verlag, 1994. Corrected reprint of the 1976 original.

[Hof77] F. Hofbauer. Examples for the nonuniqueness of the equilibrium state. Trans.Amer. Math. Soc., 228:223–241, 1977.

[Hop39] E. F. Hopf. Statistik der geodatischen Linien in Mannigfaltigkeiten negativerKrummung. Ber. Verh. Sachs. Akad. Wiss. Leipzig, 91:261–304, 1939.

[HvN42] P. Halmos and J. von Neumann. Operator methods in classical mechanics. II. Ann.of Math., 43:332–350, 1942.

[Jac60] K. Jacobs. Neuere Methoden und Ergebnisse der Ergodentheorie. Ergebnisse derMathematik und ihrer Grenzgebiete. N. F., Heft 29. Springer-Verlag, 1960.

[Jac63] K. Jacobs. Lecture notes on ergodic theory, 1962/63. Parts I, II. MatematiskInstitut, Aarhus Universitet, Aarhus, 1963.

[Kal82] S. Kalikow. T, T−1 transformation is not loosely Bernoulli. Ann. of Math., 115:393–409, 1982.

[Kat71] Yi. Katznelson. Ergodic automorphisms of Tn are Bernoulli shifts. Israel J. Math.,10:186–195, 1971.

[Kat80] A. Katok. Lyapunov exponents, entropy and periodic points of diffeomorphisms.Publ. Math. IHES, 51:137–173, 1980.

[Kea75] M. Keane. Interval exchange transformations. Math. Zeit., 141:25–31, 1975.

[KM10] S. Kalikow and R. McCutcheon. An outline of ergodic theory, volume 122 of Cam-bridge Studies in Advanced Mathematics. Cambridge University Press, 2010.

[Kok35] J. F. Koksma. Ein mengentheoretischer Satz uber die Gleichverteilung moduloEins. Compositio Math., 2:250–258, 1935.

[KR80] M. Keane and G. Rauzy. Stricte ergodicite des echanges d’intervalles. Math. Zeit.,174:203–212, 1980.

[Kri70] W. Krieger. On entropy and generators of measure-preserving transformations.Trans. Amer. Math. Soc., 149:453–464, 1970.

[Kri75] W. Krieger. On the uniqueness of the equilibrium state. Math. Systems Theory,8:97–104, 1974/75.

[KSS91] A. Kramli, N. Simanyi, and D. Szasz. The K-property of three billiard balls. Ann.of Math., 133:37–72, 1991.

[KSS92] A. Kramli, N. Simanyi, and D. Szasz. The K-property of four billiard balls. Comm.Math. Phys., 144:107–148, 1992.

512 BIBLIOGRAPHY

[KW82] Y. Katznelson and B. Weiss. A simple proof of some ergodic theorems. Israel J.Math., 42:291–296, 1982.

[Lan73] O. Lanford. Entropy and equilibrium states in classical statistical mechanics. InStatistical mechanics and mathematical problems, volume 20 of Lecture Notes inPhysics, page 1–113. Springer Verlag, 1973.

[Led84] F. Ledrappier. Proprietes ergodiques des mesures de Sinaı. Publ. Math. I.H.E.S.,59:163–188, 1984.

[Lin77] D. Lind. The structure of skew products with ergodic group actions. Israel J.Math., 28:205–248, 1977.

[LS82] F. Ledrappier and J.-M. Strelcyn. A proof of the estimation from below in pesin’sentropy formula. Ergod. Th & Dynam. Sys, 2:203–219, 1982.

[LVY13] G. Liao, M. Viana, and J. Yang. The entropy conjecture for diffeomorphisms awayfrom tangencies. J. Eur. Math. Soc. (JEMS), 15(6):2043–2060, 2013.

[LY85a] F. Ledrappier and L.-S. Young. The metric entropy of diffeomorphisms. I. Charac-terization of measures satisfying Pesin’s entropy formula. Ann. of Math., 122:509–539, 1985.

[LY85b] F. Ledrappier and L.-S. Young. The metric entropy of diffeomorphisms. II. Relationsbetween entropy, exponents and dimension. Ann. of Math., 122:540–574, 1985.

[Man75] A. Manning. Topological entropy and the first homology group. In DynamicalSystems, Warwick, 1974, volume 468 of Lecture Notes in Math., pages 185–190.Springer-Verlag, 1975.

[Man85] R. Mane. Hyperbolicity, sinks and measure in one-dimensional dynamics. Comm.Math. Phys., 100:495–524, 1985.

[Man87] R. Mane. Ergodic theory and differentiable dynamics. Springer Verlag, 1987.

[Mas82] H. Masur. Interval exchange transformations and measured foliations. Ann. ofMath, 115:169–200, 1982.

[Mey00] C. Meyer. Matrix analysis and applied linear algebra. Society for Industrial andApplied Mathematics (SIAM), 2000.

[Mis73] M. Misiurewicz. Diffeomorphim without any measure of maximal entropy. Bull.Acad. Pol. Sci., 21:903–910, 1973. Series sci. math, astr. et phys.

[Mis76] M. Misiurewicz. A short proof of the variational principle for a ZN+ action on acompact space. Asterisque, 40:147–187, 1976.

[MP77a] M. Misiurewicz and F. Przytycki. Entropy conjecture for tori. Bull. Pol. Acad. Sci.Math., 25:575–578, 1977.

[MP77b] M. Misiurewicz and F. Przytycki. Topological entropy and degree of smooth map-pings. Bull. Pol. Acad. Sci. Math., 25:573–574, 1977.

[MP08] W. Marzantowicz and F. Przytycki. Estimates of the topological entropy frombelow for continuous self-maps on some compact manifolds. Discrete Contin. Dyn.Syst. Ser., 21:501–512, 2008.

[MT78] G. Miles and R. Thomas. Generalized torus automorphisms are Bernoullian. Ad-vances in Math. Supplementary Studies, 2:231–249, 1978.

[New88] S. Newhouse. Entropy and volume. Ergodic Theory Dynam. Systems, 8∗(CharlesConley Memorial Issue):283–299, 1988.

[New90] S. Newhouse. Continuity properties of entropy. Annals of Math., 129:215–235,1990. Errata in Annals of Math. 131:409–410, 1990.

[NP66] D. Newton and W. Parry. On a factor automorphism of a normal dynamical system.Ann. Math. Statist, 37:1528–1533, 1966.

[NR97] A. Nogueira and D. Rudolph. Topological weak-mixing of interval exchange maps.Ergod. Th. & Dynam. Sys., 17:1183–1209, 1997.

BIBLIOGRAPHY 513

[Orn60] D. Ornstein. On invariant measures. Bull. Amer. Math. Soc., 66:297–300, 1960.

[Orn70] D. Ornstein. Bernoulli shifts with the same entropy are isomorphic. Advances inMath., 4:337–352 (1970), 1970.

[Orn72] Donald S. Ornstein. On the root problem in ergodic theory. In Proceedings ofthe Sixth Berkeley Symposium on Mathematical Statistics and Probability (Univ.California, Berkeley, Calif., 1970/1971), Vol. II: Probability theory, pages 347–356. Univ. California Press, 1972.

[Orn74] D. Ornstein. Ergodic theory, randomness, and dynamical systems. Yale Univer-sity Press, 1974. James K. Whittemore Lectures in Mathematics given at YaleUniversity, Yale Mathematical Monographs, No. 5.

[OS73] D. Ornstein and P. Shields. An uncountable family of K-automorphisms. Advancesin Math., 10:63–88, 1973.

[OU41] J. C. Oxtoby and S. M. Ulam. Measure-preserving homeomorphisms and metricaltransitivity. Ann. of Math., 42:874–920, 1941.

[Par53] O. S. Parasyuk. Flows of horocycles on surfaces of constant negative curvature.Uspehi Matem. Nauk (N.S.), 8:125–126, 1953.

[Pes77] Ya. B. Pesin. Characteristic Lyapunov exponents and smooth ergodic theory. Rus-sian Math. Surveys, 324:55–114, 1977.

[Pes97] Ya. Pesin. Dimension theory in dynamical systems. University of Chicago Press,1997. Contemporary views and applications.

[Pet83] K. Petersen. Ergodic theory. Cambridge Univ. Press, 1983.

[Phe93] R. Phelps. Convex functions, monotone operators and differentiability, volume1364 of Lecture Notes in Mathematics. Springer-Verlag, second edition, 1993.

[Pin60] M. S. Pinsker. Informatsiya i informatsionnaya ustoichivostsluchainykh velichin iprotsessov. Problemy Peredaci Informacii, Vyp. 7. Izdat. Akad. Nauk SSSR, 1960.

[PT93] J. Palis and F. Takens. Hyperbolicity and sensitive-chaotic dynamics at homoclinicbifurcations. Cambridge University Press, 1993.

[PU10] F. Przytycki and M. Urbanski. Conformal fractals: ergodic theory methods, volume371 of London Mathematical Society Lecture Note Series. Cambridge UniversityPress, 2010.

[PW72a] W. Parry and P. Walters. Errata: “Endomorphisms of a Lebesgue space”. Bull.Amer. Math. Soc., 78:628, 1972.

[PW72b] W. Parry and P. Waltersr. Endomorphisms of a Lebesgue space. Bull. Amer. Math.Soc., 78:272–276, 1972.

[PY98] M. Pollicott and M. Yuri. Dynamical systems and ergodic theory, volume 40 ofLondon Mathematical Society Student Texts. Cambridge University Press, 1998.

[Qua99] A. Quas. Most expanding maps have no absolutely continuous invariant mesure.Studia Math., 134:69–78, 1999.

[Que87] M. Queffelec. Substitution dynamical systems—spectral analysis, volume 1294 ofLecture Notes in Mathematics. Springer-Verlag, 1987.

[Rok61] V. A. Rokhlin. Exact endomorphisms of a Lebesgue space. Izv. Akad. Nauk SSSRSer. Mat., 25:499–530, 1961.

[Rok62] V. A. Rokhlin. On the fundamental ideas of measure theory. A. M. S. Transl.,10:1–54, 1962. Transl. from Mat. Sbornik 25 (1949), 107–150. First published bythe A. M. S. in 1952 as Translation Number 71.

[Rok67a] V. A. Rokhlin. Lectures on the entropy theory of measure-preserving transforma-tions. Russ. Math. Surveys, 22 -5:1–52, 1967. Transl. from Uspekhi Mat. Nauk. 22- 5 (1967), 3–56.

514 BIBLIOGRAPHY

[Rok67b] V. A. Rokhlin. Metric properties of endomorphisms of compact commuitativegroups. Amer. Math. Soc. Transl., 64:244–252, 1967.

[Roy63] H. L. Royden. Real analysis. The Macmillan Co., 1963.

[RS61] V. A. Rokhlin and Ja. G. Sinaı. The structure and properties of invariant measur-able partitions. Dokl. Akad. Nauk SSSR, 141:1038–1041, 1961.

[Rud87] W. Rudin. Real and complex analysis. McGraw-Hill, 1987.

[Rue73] D. Ruelle. Statistical Mechanics on a compact set with Zν action satisfying expan-siveness and specification. Trans. Amer. Math. Soc., 186:237–251, 1973.

[Rue78] D. Ruelle. An inequality for the entropy of differentiable maps. Bull. Braz. Math.Soc., 9:83–87, 1978.

[Rue04] D. Ruelle. Thermodynamic formalism. Cambridge Mathematical Library. Cam-bridge University Press, second edition, 2004. The mathematical structures ofequilibrium statistical mechanics.

[RY80] C. Robinson and L. S. Young. Nonabsolutely continuous foliations for an Anosovdiffeomorphism. Invent. Math., 61:159–176, 1980.

[SC87] Ya. Sinaı and Nikolay Chernov. Ergodic properties of some systems of two-dimensional disks and three-dimensional balls. Uspekhi Mat. Nauk, 42:153–174,256, 1987.

[Shu69] M. Shub. Endomorphisms of compact differentiable manifolds. Amer. Journal ofMath., 91:129–155, 1969.

[Shu74] M. Shub. Dynamical systems, filtrations and entropy. Bull. Amer. Math. Soc.,80:27–41, 1974.

[Sim02] N. Simanyi. The complete hyperbolicity of cylindric billiards. Ergodic TheoryDynam. Systems, 22:281–302, 2002.

[Sin63] Ya. Sinaı. On the foundations of the ergodic hypothesis for a dynamical system ofStatistical Mechanics. Soviet. Math. Dokl., 4:1818–1822, 1963.

[Sin70] Ya. Sinaı. Dynamical systems with elastic reflections. Ergodic properties of dispers-ing billiards. Uspehi Mat. Nauk, 25:141–192, 1970.

[Ste58] E. Sternberg. On the structure of local homeomorphisms of Euclidean n-space - II.Amer. J. Math., 80:623–631, 1958.

[SW75] M. Shub and R. Williams. Entropy and stability. Topology, 14:329–338, 1975.

[SX10] R. Saghin and Z. Xia. The entropy conjecture for partially hyperbolic diffeomor-phisms with 1-D center. Topology Appl., 157:29–34, 2010.

[Sze75] S. Szemeredi. On sets of integers containing no k elements in arithmetic progression.Acta Arith., 27:199–245, 1975.

[vdW27] B. van der Waerden. Beweis eibe Baudetschen Vermutung. Nieuw Arch. Wisk.,15:212–216, 1927.

[Vee82] W. Veech. Gauss measures for transformations on the space of interval exchangemaps. Ann. of Math., 115:201–242, 1982.

[Ver99] Alberto Verjovsky. Sistemas de Anosov, volume 9 of Monographs of the Instituteof Mathematics and Related Sciences. Instituto de Matematica y Ciencias Afines,IMCA, Lima, 1999.

[Via14] M. Viana. Lectures on Lyapunov exponents. Cambridge University Press, 2014.

[VO14] M. Viana and K. Oliveira. Fundamentos da Teoria Ergodica. Colecao Fronteirasda Matematica. Sociedade Brasileira de Matematica, 2014.

[Wal73] P. Walters. Some results on the classification of non-invertible measure preservingtransformations. In Recent advances in topological dynamics (Proc. Conf. Topolog-ical Dynamics, Yale Univ., New Haven, Conn., 1972; in honor of Gustav ArnoldHedlund), pages 266–276. Lecture Notes in Math., Vol. 318. Springer-Verlag, 1973.

BIBLIOGRAPHY 515

[Wal75] P. Walters. A variational principle for the pressure of continuous transformations.Amer. J. Math., 97:937–971, 1975.

[Wal82] P. Walters. An introduction to ergodic theory. Springer Verlag, 1982.

[Wey16] H. Weyl. Uber die Gleichverteilungen von Zahlen mod Eins. Math. Ann., 77:313–352, 1916.

[Yan80] K. Yano. A remark on the topological entropy of homeomorphisms. Invent. Math.,59:215–220, 1980.

[Yoc92] J.-C. Yoccoz. Travaux de Herman sur les tores invariants. Asterisque, 206:Exp.No. 754, 4, 311–344, 1992. Seminaire Bourbaki, Vol. 1991/92.

[Yom87] Y. Yomdin. Volume growth and entropy. Israel J. Math., 57:285–300, 1987.

[Yos68] K. Yosida. Functional analysis. Second edition. Die Grundlehren der mathematis-chen Wissenschaften, Band 123. Springer-Verlag, 1968.

[Yuz68] S. A. Yuzvinskii. Metric properties of endomorphisms of compact groups. Amer.Math. Soc. Transl., 66:63–98, 1968.

[Zyg68] A. Zygmund. Trigonometric series: Vols. I, II. Second edition, reprinted withcorrections and some additions. Cambridge University Press, 1968.

Index

2X

family of all subsets, 440A∆B

symmetric difference of sets, 444B(x, T, ε)

dynamical ball for flows, 324B(x,∞, ε)

infinite dynamical ball, 331B(x, n, ε)

dynamical ball, 269B(x, r)

ball of center x and radius r, 462Bδ

δ-neighborhood of a set, 36C∗

dual cone, 52C0(M)

space of continuous functions, 50, 445,466

C0+(M)

cone of positive functions, 52Cβ(M)

space of Holder functions, 422Cr(M,N)

space of Cr maps, 470Cn(ϕ,ψ)

correlations sequence, 188Dl

lower density, 59Du

upper density, 59Df

derivative of a map, 471, 472E(A,P )

conditional expectation, 157E∗

dual of a Banach space, 49G(f, φ)

pressure, via generating sets, 335H(α)

entropy of an open cover, 308Hβ(g)

Holder constant, 409Hµ(P)

entropy of a partition, 252Hµ(P/Q)

conditional entropy, 254Hβ,ρ(g)

local Holder constant, 409I(U)

set of invariant vectors, 67I(A)

mean information of an alphabet, 251I(a)

information of a symbol, 251IP

information function of a partition, 252L∞(µ)

space of essentially bounded functions,480

Lp(µ)space of p-integrable functions, 478

P (f, φ)pressure, 332

P (f, φ, α)pressure relative to an open cover, 332

P (x, ·)transition probability, 196

Pi,jtransition probability, 197

Rθrotation on circle or torus, 16

S(f, φ)pressure, via separated sets, 335

S1

circle, 16S⊥

orthogonal complement, 483Sd

sphere of dimension d, 469TM

tangent bundle, 472T 1M

unit tangent bundle, 477TpM

tangent space at a point, 471Uf

Koopman operator, 50, 51U∗f

dual of the Koopman operator, 51V (µ,Φ, ε)

neighborhoods in weak∗ topology, 36

516

INDEX 517

V (v, g1, . . . , gN, ε)neighborhoods in weak topology, 49

V ∗(g, v1, . . . , vN, ε)neighborhoods in weak∗ topology, 50

Va(µ,A, ε)neighborhoods in weak∗ topology, 37

Vc(µ,B, ε)neighborhoods in weak∗ topology, 37

Vf (µ,F , ε)neighborhoods in weak∗ topology, 37

Vp(µ,B, ε)neighborhoods in pointwise topology, 44

Vu(µ, ε)neighborhoods in uniform topology, 44

XBcharacteristic function of a set, 449

Diffeor(M)space of Cr diffeomorphisms, 471

Fix(f)set of fixed points, 320

GL(d,R)linear group, 79, 170, 474

O(d,R)orthogonal group, 170

SL(d,R)special linear group, 170, 475

ΣA, ΣPshift of finite type, 199, 321

α ∨ βsum of open covers, 308

α ≺ βorder relation for open covers, 308

αn, α±n

iterated sum of an open cover, 308, 315,316

L1(µ)space of integrable functions, 453

M(X)space of measures, 50, 445

M1(M)space of probability measures, 36

M1(f)space of invariant probability measures,

121Me(f)

space of ergodic probability measures,121

P ≺ Qorder relation for partitions, 254

P ∨ Qsum of partitions, 252

Pn, P±n

iterated sum of a partition, 256, 259Ur(f, ε)

Cr neighborhood of a map, 470δp

Dirac measure, 47

divFdivergence of a vector field, 20

degree(f)degree of a map, 477

λ = (λα)αlength vector, 208

λmax

largest Lyapunov exponent, 86λmin

smallest Lyapunov exponent, 86µ ⊥ ν

mutually singular measures, 459ν ≪ µ

absolutely continuous measure, 459∂I

left endpoint of an interval, 209∂P

boundary of a partition, 265, 349Pd

projective space, 477ρ(B)

spectral radius, 323spec(L)

spectrum of a linear operator, 486supessϕ

essential supremum of a function, 480suppµ

support of a measure, 448, 486tanh

hyperbolic tangent, 414τ(E, x)

mean sojourn time, 65θ(g1, g2)

projective distance, 412ϕ

time average of a function, 73Td

torus of dimension d, 18, 469ϕ+

positive part of a function, 453ϕ−

negative part of a function, 453ϕn

orbital sum of a function, 332, 388∨αUα

σ-algebra generated by a family, 286d-adic interval, 498d(M)

Hausdorff dimension, 425e(ψ, x)

conditional expectation, 155f∗µ

image of a measure, 45, 50fA

linear endomorphism of the torus, 115g(φ)

518 INDEX

topological entropy of flows, via gener-ating sets, 325

g(f)topological entropy, via generating sets,

311h(f)

topological entropy, 309h(f, α)

entropy relative to an open cover, 309hµ(f)

entropy of a dynamical system, 257hµ(f,P)

entropy relative to a partition, 257hµ(f,P, x)

entropy (local) at a point, 268h±µ (f, ε, x)

entropy (local) at a point, 269md(M)

d-dimensional Hausdorff measure, 425s(φ)

topological entropy of flows, via sepa-rated sets, 325

s(f)topological entropy, via separated sets,

311w = (wα)α

translation vector, 208

1-parameter group, 68σ-additive function, 443, 486σ-algebra, 440

Borel, iii, 441generated, 286, 441

up to measure zero, 444product, 108, 456, 457

σ-finite measure, 8, 75, 442C0 topology, 385C1 topology, 385Cr topology, 470h-expansive map, 320, 331, 355k-linear form, 473L2 convergence, 70L∞ norm, 480Lp norm, 478p-integrable function, 478

absolute continuity, 118, 447, 459theorem, 137

absolutelycontinuous measure, 121, 459summable series, 481

action-angle coordinates, 127adding machine, 176additive

function, 442sequence, 79

adjoint linear operator, 484

admissible sequence, 321affine function, 292algebra, 440

compact, 443generating, 10of functions, 468

separating, 468of measures, 241

almostevery point, 99, 455everywhere, 455

convergence, 455integrable system, 126

alphabet, 208alternate form, 473Anosov

flow, 136system, vtheorem, 136

aperiodicstochastic matrix, 203system, 29, 264

approximate eigenvalue, 228approximation theorem, 444area form, 207arithmetic progression, 58

length, 58atlas

compatible, 470differentiable, 469of class Cr , 469

atom, 466, 488atomic measure, 466Aubry-Mather set, 133automorphism

Bernoulli, 282Kolmogorov, 286, 289Mobius, 423of a group, 170

Avogadro constant, 339

Baire space, 122, 124, 471, 475Banach space, 49, 478, 482Banach-Alaoglu theorem, 484Banach-Mazur theorem, 52barycenter of a measure, 294basin of a measure, 103, 360basis

dual, 473Fourier, 482Hammel, 483Hilbert, 482of neighborhoods, 36, 37, 448

countable, 448of open sets, 448

countable, 448of the topology, 448

INDEX 519

Bernoulliautomorphism, 282measure, 197, 457shift, 108, 109, 197

billiard, 138corner, 138dispersing, 143semi-dispersing, 144table, 138

Birkhoffergodic theorem, 66, 71, 73, 75ergodic theorem for flows, 78multiple recurrence theorem, 29normal form theorem, 131, 134recurrence theorem, 7, 48

Boltzmannconstant, 340ergodic hypothesis, v, 65, 124

Boltzmann-Sinai ergodic hypothesis, 138Borel

σ-algebra, iii, 441measure, 462normal theorem, 108set, 441

Borel-Cantelli lemma, 451bottom of a pile, 177boundary of a partition, 265, 349bounded

distortion, 106, 107, 112linear functional, 483linear operator, 484, 485

Bowen-Manning formula, 388, 428branch (inverse), 362, 429

contracting, 362, 372Brin-Katok theorem, 269bundle

cotangent, 129, 473tangent, 130, 135

Bunimovichmushroom billiard, 144stadium billiard, 144

Cantorset, 425substitution, 178

Cauchy-Schwarz inequality, 479Cayley-Klein distance, 423Chacon

example, 228substitution, 178

Champernowne constant, 107change

of coordinates, 469of variables formula, 304

characteristic function, 449circle, 16

rotation, 16

class Cr

atlas, 469diffeomorphism, 470manifold, 469map, 470

closed differential form, 474coarser

cover, 308partition, 151, 254

cocycle, 86cohomological equation, 166cohomologous potentials, 338, 406cohomology relation, 338, 343commuting maps, 29compact

algebra, 443group, 173space, 443

compactness theorem, 41compatible atlases, 470complete

measure, 444measure space, 444metric space, 465metrizable space, 471

completely metrizable space, 471completion of a measure space, 444complex measure, 445, 486concave function, 480condition

Keane, 209twist, 128, 130, 132, 134

conditionalentropy, 254expectation, 155, 157, 271probability, 149

cone, 51, 411dual, 52, 390normal, 51

configuration space, 339conformal

map, 428repeller, 388, 428

conjecture of entropy, 327conjugacy (topological), 223, 310connected space, 469conservative

flow, 19map, 19system, 49, 124

constantAvogadro, 339Boltzmann, 340Champernowne, 107of expansivity, 267, 319

continued fractionexpansion, 13

520 INDEX

of bounded type, 120continuity

absolute, 118at the empty set theorem, 443from above theorem, 451from below theorem, 451set of a measure, 37

continuousfunction, 449linear functional, 467, 483linear operator, 484map, 449

contracting inverse branch, 362, 372contraction, 314convergence

almost everywhere, 455in L2, 70in distribution, 44to equilibrium, 410

convexfunction, 480hull, 429set, 45

convexity, 121, 292coordinate change, 469correlation, 187

decay, 215correlations sequence, 188cotangent

bundle, 129, 473space, 129, 473

countable basisof neighborhoods, 448of open sets, 448

countable type shift, 236countably

additive function, 442generated system, 236

covariancematrix, 240sequence, 240

cover, 425coarser, 308diameter, 312, 316, 425finer, 308open, 308, 443

cross-ratio, 413cross-section, 91cube, 446cylinder, 457

elementary, 457measurable, 55open, 55

decay of correlations, 215, 217decimal expansion, 10

ergodicity, 106

decompositionof Oseledets, 87theorem of Hahn, 445theorem of Lebesgue, 460

degree of a map, 362, 372, 477Dehn twist, 495density

lower, 59of a measure, 361, 459point, 458upper, 59, 61zero at infinity, 195

derivation theorem of Lebesgue, 458derivative, 471, 472

exterior, 474Radon-Nikodym, 361, 459

diagonal, 31diameter

of a cover, 312, 316, 425of a partition, 266, 461

diffeomorphism, 469, 470of class Cr , 470

difference (orthogonal), 235differentiable

atlas, 469manifold, 469map, 470

differentialform, 473, 474

closed, 474exact, 474

dimensionHausdorff, 425Hilbert, 483

Diophantinenumber, 168vector, 128

Diracmass, 442measure, 47, 442

direct sum (orthogonal), 483discrete

spectrum, 221, 230spectrum theorem, 242topology, 110

disintegrationof a measure, 149theorem of Rokhlin, 152, 161

dispersing billiard, 143distance, 462

associated with Riemannian metric, 476Cayley-Klein, 423flat on the torus, 105hyperbolic, 423invariant, 174Poincare, 423projective, 412

INDEX 521

distortion, 106, 107, 112lemma, 363

distributionfunction, 44Gibbs, 341

divergence of a vector field, 20domain

fundamental, 89of invertibility, 301

dominated convergence theorem, 456dual

basis, 473cone, 52, 390linear operator, 51, 388of a Banach space, 49, 480of a Hilbert space, 484

duality, 50, 215, 388, 479dynamical

ball, 269, 311for a flow, 324infinite, 331

decomposition theorem, 377system, iii

eigenvalue, 225approximate, 228multiplicity of, 225

elementary cylinder, 457elliptic fixed point, 131–133, 135

generic, 132endomorphism

of a group, 170of the torus, 115

ergodic, 115energy

hypersurface, 125of a state, 340

entropy, vconditional, 254conjecture, 327formula, 405formula (Pesin), 280function, 265Gauss map, 276local, 268, 269of a communication channel, 251of a dynamical system, 257of a linear endomorphism, 277of a Markov shift, 274of a partition, 252of a state, 340of an open cover, 308relative to a partition, 257relative to an open cover, 309semi-continuity, 265topological, 307, 309, 313

equation

cohomological, 166Hamilton-Jacobi, 125, 127, 131

equidistributed sequence, 180equilibrium state, 308, 339, 352equivalence

ergodic, 167, 190, 221, 222, 241invariant of, 223

spectral, 221, 224, 242invariant of, 225

topological, 310of flows, 326

equivalentmeasures, 14, 459topologies, 37

ergodicdecomposition theorem, 148equivalence, 167, 190, 221, 222, 241

invariant of, 223hypothesis, v, 65, 124, 138isomorphism, 222, 241measure, 75system, 97, 98theorem

Birkhoff, 66, 71, 73, 75Birkhoff for flows, 78Kingman, 66, 80Kingman for flows, 87multiplicative, 86, 278Oseledets, 86, 278subadditive, 66, 80subadditive for flows, 87von Neumann, 66, 69, 75von Neumann for flows, 71von Neumann multiple, 196

ergodicitydecimal expansion, 106irrational rotation, 104, 105linear endomorphism, 115Markov shift, 201

essential supremum, 480essentially bounded function, 480Euclidean space, 469exact differential form, 474exactness, 291

topological, 366example

Chacon, 228Furstenberg, 166

existence theorem, 35for flows, 48

expanding map, 27of the interval, 368on a manifold, 360on a metric space, 369

expansive map, 267, 319, 362, 373two-sided, 319

expansivity

522 INDEX

constant, 267, 319one-sided, 267two-sided, 267

expectation (conditional), 271exponential

decay of correlations, 217decay of interactions, 341map, 476

extended real line, 441, 449extension

natural, 54, 57multiple, 57

of a transformation, 54theorem, 443

exteriorderivative, 474measure, 446

extremal element of a convex set, 121

factor, 261topological, 310, 386

Fatou lemma, 456Feigenbaum substitution, 178Fibonacci substitution, 178, 319filtration of Oseledets, 86finer

cover, 308partition, 151, 254

finiteMarkov shift, 198measure, 442memory, 196, 206signed measure, 445

finitely additive function, 443first integral, 22, 125first-return

map, 5, 22, 90, 91time, 5, 23, 91

fixed pointelliptic, 131–133, 135

generic, 132hyperbolic, 132non-degenerate, 131, 134

flat distance on the torus, 105flow, iii, 2, 472

Anosov, 136conservative, 19geodesic, 135, 476Hamiltonian, 125, 131horocyclic, 289suspension, 89uniformly continuous, 325uniformly hyperbolic, 136

flowsBirkhoff ergodic theorem, 78existence theorem, 48Kingman ergodic theorem, 88

Poincare recurrence theorem, 5subadditive ergodic theorem, 88topological entropy, 325, 326von Neumann ergodic theorem, 71

flux of a measure, 92, 94foliation

stable, 116, 117, 137unstable, 116, 117, 137

formk-linear, 473alternate, 473area, 207differential, 473, 474

closed, 474exact, 474

linear, 473symplectic, 129volume, 20, 129

formulaBowen-Manning, 388, 428change of variables, 304entropy, 405Liouvile, 20of entropy (Pesin), 280Pesin, 405Rokhlin, 301, 303, 368

Fourierbasis, 482series, 105, 115

fractional part, 10free energy (Gibbs), 340frequency vector, 127Friedman-Ornstein theorem, 290function

σ-additive, 442p-integrable, 478affine, 292characteristic, 449concave, 480continuous, 449convex, 480countably additive, 442entropy, 265essentially bounded, 480finitely additive, 442Holder, 217, 409information of a partition, 252integrable, 453invariant, 70, 98locally constant, 215locally integrable, 458measurable, 449of distribution, 44of multiplicity, 489quasi-periodic, 127semi-continuous, 33simple, 450

INDEX 523

strongly affine, 299uniformly quasi-periodic, 78

functionalbounded, 483continuous, 467norm, 467positive, 454, 467

over a cone, 52tangent, 53

functions algebra, 468separating, 468

fundamental domain, 89Furstenberg

example, 166theorem, 166

Furstenberg-Kesten theorem, 86

gasideal, 140lattice, 339

Gauss map, 13, 24entropy, 276

Gaussianmeasure, 239shift, 239, 289

generatedσ-algebra, 286, 441topology, 441

generatingalgebra, 10partition, 263, 264set, 310

for flows, 324generator

one-sided, 263two-sided, 263

geodesic, 476flow, 135, 476

Gibbsdistribution, 341free energy, 340state, 340, 342, 357, 387, 389

golden ratio, 9, 186Gottschalk theorem, 168Grunwald theorem, 63Grassmannian manifold, 469Green-Tao theorem, 60group

1-parameter, 68automorphism, 170compact, 173endomorphism, 170Lie, 169linear, 170, 474locally compact, 170metrizable, 173orthogonal, 170

special linear, 170, 475topological, 169

Holderfunction, 217, 409inequality, 479, 481map, 463

Haarmeasure, 173theorem, 171

Hahn decomposition theorem, 445, 455Halmos-von Neumann theorem, 242Hamilton-Jacobi equation, 125, 127, 131Hamiltonian

flow, 125, 131function, 125non-degenerate, 128system, 125vector field, 22, 131

Hammel basis, 483Hausdorff

dimension, 425measure, 425space, 36, 441

hereditary property, 42heteroclinic

point, 494Hilbert

basis, 482dimension, 483space, 482

Hindman theorem, 168homeomorphism, 442

twist, 132homoclinic point, 132homomorphism of measure algebras, 241horocyclic flow, 289hyperbolic

distance, 423fixed point, 132matrix, 116

hypersurface of energy, 125

ideal gas, 140idempotent linear operator, 486identity

parallelogram, 485polarization, 485

image of a measure, 45, 50independent partitions, 252induced map, 24inequality

Cauchy-Schwarz, 479Holder, 479, 481Jensen, 480Margulis-Ruelle, 279Minkowski, 478, 481

524 INDEX

Tchebysheff-Markov, 460Young, 481

infinitedynamical ball, 331matrix, 239measure, 7

infinitesimal generator, 68information

function of a partition, 252of a symbol, 251of an alphabet, 251

inner product, 479, 482integer part, 10integrability, 453

uniform, 88, 460integrable

function, 453map, 130system, 126

integral, 453, 454first, 125of a simple function, 453with respect to a signed measure, 455with respect to complex measure, 455

intervald-adic, 498exchange, 94, 207

irreducible, 209in Z, 59

intrinsically ergodic map, 357invariant

distance, 174function, 70, 98measure, 2, 45of ergodic equivalence, 223of spectral equivalence, 225set, 56, 98, 365

inverse branch, 362, 429contracting, 362, 372

invertibility domain, 301irrational rotation, 17

ergodicity, 104, 105irreducible

interval exchange, 209stochastic matrix, 201

isometrically isomorphic spaces, 467, 483isometry, 314

linear, 51, 484, 485isomorphism

ergodic, 222, 241of measure algebras, 241

iterate of a measure, 45, 50iterated sum

of a partition, 256, 259of an open cover, 308, 315, 316

Jacobian, 301

Jacobs theorem, 293, 294Jensen inequality, 480

Kac theorem, 5Kakutani-Rokhlin

lemma, 29tower, 27, 28

Keanecondition, 209theorem, 210

Kingman ergodic theorem, 66, 80for flows, 87

Kolmogorovautomorphism, 286, 289system, 286, 289

Kolmogorov-Arnold-Mosertheorem, 128, 130theory, v

Kolmogorov-Sinaientropy, 257theorem, 261

Koopman operator, 50, 51

latticegas, 339system, 339

state, 339leaf

stable, 117unstable, 117

Lebesguedecomposition theorem, 460derivation theorem, 458exterior measure, 446integral, 453, 454measurable set, 244, 447, 452measure, 446

on the circle, 16number of an open cover, 313, 466space, 241, 244, 246spectrum, 221, 233, 289

rank of, 236, 238left

invariance, 173translation, 170

lemmaBorel-Cantelli, 451distortion, 363Fatou, 456Kakutani-Rokhlin, 29Riemann-Lebesgue, 240shadowing, 373Vitali, 459Zorn, 31

lengthof a curve, 476of an arithmetic progression, 58

INDEX 525

vector, 208Levy-Prohorov metric, 39Lie group, 169lift

of an invariant measure, 56of an invariant set, 56

limit of a sequence of setsinferior, 441superior, 441

linearendomorphism of the torus, 115form, 473functional

bounded, 483continuous, 467, 483norm, 49, 467positive, 454, 467positive over a cone, 52tangent, 53, 357

group, 170, 474special, 475

isometry, 51, 484, 485operator

adjoint, 484bounded, 484, 485continuous, 484dual, 51, 388idempotent, 486Koopman, 50, 51normal, 484, 488, 489positive, 41, 51, 388positive over a cone, 52self-adjoint, 484, 486spectrum, 486unitary, 484, 488

Liouvilleformula, 20measure, iv, 125theorem, 20, 21

Lipschitz map, 463Livsic theorem, 387, 406local

chart, 469coordinate, 469diffeomorphism, 477entropy, 268, 269

local diffeomorphism, 278locally

compact group, 170constant function, 215integrable function, 458invertible map, 301

logistic map, 319lower density, 59Lusin theorem, 464, 466Lyapunov exponent, 87

Mobius automorphism, 423

manifolddifferentiable, 469Grassmannian, 469leaf, 137modeled on a Banach space, 469of class Cr , 469Riemannian, 476stable, 117, 137symplectic, 129unstable, 117, 137

Manneville-Pomeau map, 26map

h-expansive, 320, 331, 355conformal, 428conservative, 19continuous, 449decimal expansion, 10degree, 362, 372, 477derivative, 471, 472differentiable, 470expanding, 27, 369

of the interval, 368on a manifold, 360

expansive, 267, 319, 362, 373exponential, 476first-return, 5, 22, 90, 91Gauss, 13, 24Holder, 463induced, 24integrable, 130interval exchange, 94, 207intrinsically ergodic, 357Lipschitz, 463locally invertible, 301logistic, 319Manneville-Pomeau, 26measurable, 449minimal, 18, 105non-degenerate, 130of class Cr , 470Poincare, 90, 91shift, 60, 109symplectic, 129time-1, 5topologically

exact, 366mixing, 190weak mixing, 227

transitive, 110two-sided expansive, 319

mapstopologically conjugate, 310topologically equivalent, 310

Margulis-Ruelleinequality, 279

Markovmeasure, 197

526 INDEX

shift, 197entropy, 274ergodic, 201finite, 198mixing, 203

mass distribution principle, 436Masur-Veech theorem, 211matrix

hyperbolic, 116infinite, 239of covariance, 240positive definite, 239stochastic, 198

aperiodic, 203irreducible, 201

symmetric, 240transition, 321

maximal entropy measure, 352Mazur theorem, 53mean

information of an alphabet, 251return time, 6sojourn time, 65, 71

measurablecylinder, 55function, 449map, 449partition, 147, 151set, iii, 440

Lebesgue, 244, 447, 452space, 440

measure, iv, 442σ-finite, 8, 75, 442absolutely continuous, 121algebra, 241

homomorphism, 241isomorphism, 241

atomic, 466barycenter of, 294basin, 360Bernoulli, 197, 457Borel, 462complete, 444complex, 445, 486density, 361Dirac, 47, 442ergodic, 75exterior, 446finite, 442flux, 92, 94Gaussian, 239Haar, 173Hausdorff, 425infinite, 7invariant, 2, 45Lebesgue, 446

on the circle, 16

Liouville, iv, 125Markov, 197non-atomic, 466non-singular, 301, 461of maximal entropy, 352of probability, iv, 442physical, 367positive, 444product, 108, 456, 457quotient, 148reference, 389–391regular, 462signed, 50, 445

finite, 445space, 442

complete, 444completion, 444

spectral, 486stationary, 57, 197suspension of, 93tight, 465with finite memory, 196, 206

measuresequivalent, 14, 459mutually singular, 122, 459

metricLevy-Prohorov, 39Riemannian, 475space, 462

complete, 465metrizable

group, 173space, 39, 462

minimalmap, 18, 105set, 8, 165, 168system, 163, 165, 210

minimality, 18, 163, 210minimizing curve, 476Minkowski inequality, 478, 481mixing, 188

Markov shift, 203weak, 226

monkey paradox, 110monotone

class, 444theorem, 444

convergence theorem, 455multiple

natural extension, 57recurrence theorem

Birkhoff, 29Poincare, 29

von Neumann ergodic theorem, 196multiplicative ergodic theorem, 86, 278multiplicity

function, 489

INDEX 527

of a Lyapunov exponent, 87of an eigenvalue, 225

mushroom billiard, 144mutually singular measures, 122, 459

natural extension, 54, 57multiple, 57

negativecurvature, 136part of a function, 453

neighborhoodof a point, 448of a set, 36

non-atomic measure, 466non-degenerate

fixed point, 131, 134Hamiltonian, 128map, 130

non-lacunary sequence, 29non-singular measure, 301, 461non-trivial

partition, 289probability space, 285

non-wanderingpoint, 34

super, 63set, 34

norm, 479, 482L∞, 480Lp, 478of a linear functional, 49, 467of a matrix, 79of a measure, 445of an operator, 174, 323, 326uniform convergence, 466

normalcone, 51linear operator, 484, 488, 489number, 11, 107, 108

normalized restriction of a measure, 147, 163number

Diophantine, 168normal, 11, 107, 108

odometer, 176one-sided

expansivity, 267generator, 263iterated sum

of a partition, 259of an open cover, 308, 316

shift, 109open

cover, 308, 443diameter, 312, 316

cylinder, 55operator

dual, 51, 388Koopman, 50, 51norm, 174, 323, 326normal, 484, 488, 489positive, 41, 51, 388

over a cone, 52Ruelle-Perron-Frobenius, 388transfer, 215, 388

orbitalaverage, 73sum, 332, 388

orthogonalcomplement, 67, 483difference, 235direct sum, 483group, 170projection, 66vectors, 482

orthonormal set, 482Oseledets

decomposition, 87ergodic theorem, 86, 278filtration, 86

Oxtoby-Ulam theorem, 124

parallelogram identity, 485part

fractional, 10integer, 10negative, 453positive, 80, 453

partition, 6, 251, 461boundary, 265, 349defined by a cover, 40, 266diameter, 266, 461generating, 263, 264measurable, 147, 151non-trivial, 289of Z, 58

partitionscoarser, 151, 254finer, 151, 254independent, 252

path-connected space, 477periodic pre-orbit, 374permutation, 77Perron-Frobenius theorem, 198Pesin entropy formula, 280, 405phase transition, 339physical measure, 367pile

bottom, 177simple, 177top, 177

piling method, 177Poincare

distance, 423

528 INDEX

first-return map, 90, 91last theorem, 132recurrence theorem, 4, 7

for flows, 5multiple, 29

Poincare-Birkhoff fixed point theorem, 132point

heteroclinic, 494non-wandering, 34of density, 458recurrent, 7simultaneously recurrent, 29super non-wandering, 63transverse homoclinic, 132

pointwise topology, 44polarization identity, 485Portmanteau theorem, 37positive

definite matrix, 239linear functional, 454, 467linear operator, 41, 51, 388measure, 444over a cone, 52part of a function, 80, 453

potential, 307, 332cohomologous, 338, 406

pre-orbit, 55, 374periodic, 374

pressure, 307, 332of a state, 340, 341

primitive substitution, 178principle

least action (Maupertuis), 340mass distribution, 436variational, 340, 344

probabilityconditional, 149measure, iv, 442space, 442

non-trivial, 285standard, 241

transition, 196product

σ-algebra, 108, 456, 457inner, 479, 482measure, 108, 457of measures, 456

countable case, 457finite case, 456

space, 456, 457topology, 110, 451, 458

Prohorov theorem, 42projection, 486

orthogonal, 66stereographic, 469

projectivedistance, 412

quotient, 412space, 477

pseudo-orbit, 373periodic, 373

quasi-periodic function, 127quotient

measure, 148projective, 412

Radon-Nikodymderivative, 361, 459theorem, 459

random variable, 44rank of Lebesgue spectrum, 236, 238rational rotation, 17rationally independent vector, 18, 209Rauzy-Veech renormalization, 214rectangle, 118, 446recurrent point, 7

simultaneously, 29reference measure, 389–391regular

measure, 462value of a map, 474

renormalization of Rauzy-Veech, 214repeller, 427

conformal, 388, 428residual set, 124, 471, 475return

first, 5simultaneous, 30time, 89, 90time (mean), 6

Riemann sum, 454Riemann-Lebesgue lemma, 240Riemannian

manifold, 476metric, 475submanifold, 476

Riesz-Markov theorem, 445, 467right

invariance, 173translation, 170

Rokhlindisintegration theorem, 152, 161formula, 301, 303, 368

root of a system, 291rotation, 16

irrational, 17number, 132on the circle, 16on the torus, 18rational, 17spectrum, 232

Ruelleinequality, 279

INDEX 529

Ruelle theorem, 342, 387Ruelle-Perron-Frobenius operator, 388

Sard theorem, 475Schauder-Tychonoff theorem, 45section transverse to a flow, 91self-adjoint linear operator, 484, 486semi-continuity of the entropy, 265semi-continuous function, 33semi-dispersing billiard, 144separable

Hilbert space, 483space, 39, 464, 467

separated set, 311for flows, 324

separatingfunctions algebra, 468sequence, 243

sequenceadditive, 79admissible, 321equidistributed, 180non-lacunary, 29of correlations, 188of covariance, 240separating, 243subadditive, 79, 80

seriesabsolutely summable, 481Fourier, 105, 115

setAubry-Mather, 133Borel, 441Cantor, 425convex, 45generating, 310

for flows, 324invariant, 56, 98, 365Lebesgue measurable, 244, 447, 452measurable, 440minimal, 8, 165, 168non-wandering, 34of continuity of a measure, 37of invariant vectors, 67orthonormal, 482residual, 124, 471, 475separated, 311

for flows, 324strongly convex, 294syndetic, 9, 168tight, 42transitive, 122with zero volume, 475

shadowing lemma, 373Shannon-McMillan-Breiman theorem, 268shift

Bernoulli, 108, 109, 197

Gaussian, 239, 289map, 60, 109Markov, 197

entropy, 274ergodic, 201finite, 198mixing, 203

multi-dimensional, 339of countable type, 236of finite type, 199, 321one-sided, 109two-sided, 60, 109

Sierpinski triangle, 437signed measure, 445

finite, 445simple

function, 450pile, 177

simultaneous return, 30simultaneously recurrent point, 29Sinai ergodicity theorem, 143Sinai, Ruelle, Bowen theory, vskew-product, 54space

Baire, 122, 124, 471, 475Banach, 49, 478, 482compact, 443completely metrizable, 471connected, 469cotangent, 129, 473dual, 49, 480, 484Euclidean, 469Hausdorff, 36, 441Hilbert, 482

separable, 483Lebesgue, 241, 244, 246measurable, 440measure, 442

complete, 444metric, 462

complete, 465metrizable, 39, 462

complete, 471of configurations, 339path-connected, 477probability, 442

non-trivial, 285product, 456, 457projective, 477separable, 39, 464, 467tangent at a point, 130, 471topological, 441topological vector, 45

spacesisometric, 467, 483isomorphic, 467, 483

special linear group, 170, 475

530 INDEX

specification, 381, 382by periodic orbits, 382

spectralequivalence, 221, 224, 242

invariant of, 225gap property, 215, 422measure, 486radius, 52, 323representation theorem, 489theorem, 488

spectrumdiscrete, 221, 230Lebesgue, 221, 233

rank of, 236, 238of a linear operator, 225, 486of a rotation, 232of a transformation, 225

sphere of dimension d, 469spin system, 339stable

foliation, 116, 117, 137leaf, 117, 137manifold, 117, 137set, 55

stadium billiard, 144standard probability space, 241state

energy of, 340entropy of, 340equilibrium, 308, 339, 352Gibbs, 340, 342, 357, 387, 389of a lattice system, 339pressure of, 340, 341

stationary measure, 57, 197stereographic projection, 469stochastic matrix, 198

aperiodic, 203irreducible, 201

Stone theorem, 68Stone-Weierstrass theorem, 468stronger topology, 37strongly

affine function, 299convex set, 294

subadditiveergodic theorem, 66, 80

for flows, 87sequence, 79, 80

subcover, 308, 443submanifold, 470

Riemannian, 476substitution, 177, 179

Cantor, 178Chacon, 178Feigenbaum, 178Fibonacci, 178, 319primitive, 178

Thue-Morse, 178sum

direct orthogonal, 483of a family of subspaces, 482of a family of vectors, 482of open covers, 308of partitions, 252orbital, 332, 388Riemann, 454

super non-wandering point, 63support

of a measure, 448of a spectral measure, 486

suspensionflow, 89of a measure, 90, 93of a transformation, 89

symmetricdifference, 444matrix, 240

symplecticform, 129manifold, 129map, 129

syndetic set, 9, 168system

almost everywhere invertible, 248almost integrable, 126Anosov, vaperiodic, 29, 264conservative, 49, 124countably generated, 236ergodic, 97, 98Hamiltonian, 125integrable, 126Kolmogorov, 286, 289lattice, 339

state, 339Lebesgue spectrum

rank of, 236, 238minimal, 163, 165, 210mixing, 188root, 291spin, 339totally dissipative, 49uniquely ergodic, 163weak mixing, 191, 226with discrete spectrum, 221, 230with finite memory, 196, 206with Lebesgue spectrum, 221, 233, 289

Szemeredi theorem, 60, 61

tangentbundle, 130, 135, 472

unit, 135, 477linear functional, 53, 357space at a point, 130, 471

INDEX 531

Tchebysheff-Markov inequality, 460theorem

absolute continuity, 137Anosov, 136approximation, 444Banach-Alaoglu, 50, 484Banach-Mazur, 52Birkhoff

ergodic, 66, 71, 73, 75ergodic for flows, 78multiple recurrence, 29normal form, 131, 134recurrence, 7, 48

Borel normal, 108Brin-Katok, 269compactness, 41continuity

at the empty set, 443from above, 451from below, 451

discrete spectrum, 242disintegration, 152dominated convergence, 456dynamical decomposition, 377ergodic decomposition, 148existence of invariant measures, 35

for flows, 48extension of measures, 443Friedman-Ornstein, 290Furstenberg, 166Furstenberg-Kesten, 86Gottschalk, 168Grunwald, 63Green-Tao, 60Haar, 171Hahn decomposition, 445Halmos-von Neumann, 242Hindman, 168Jacobs, 293, 294Kac, 5Keane, 210Kingman ergodic, 66, 80

for flows, 87Kolmogorov-Arnold-Moser, 128, 130Kolmogorov-Sinai, 261Lebesgue

decomposition, 460derivation, 458

Liouville, 20, 21Livsic, 387, 406Lusin, 464, 466Masur-Veech, 211Mazur, 53monotone class, 444monotone convergence, 455multiplicative ergodic, 86, 278Oseledets, 86, 278

Oxtoby-Ulam, 124Perron-Frobenius, 198Poincare

multiple recurrence, 29recurrence, 4, 7

Poincare-Birkhoff fixed point, 132Portmanteau, 37Prohorov, 42Radon-Nikodym, 459Riesz-Markov, 445, 467Rokhlin, 152, 161Ruelle, 342, 387Sard, 475Schauder-Tychonoff, 45Shannon-McMillan-Breiman, 268Sinai ergodicity, 143spectral, 488spectral representation, 489Stone, 68Stone-Weierstrass, 468subadditive ergodic, 66, 80

for flows, 87Szemeredi, 60, 61Tychonoff, 110van der Waerden, 58, 60von Neumann ergodic, 66, 69, 75


Weyl, 180Whitney, 476

Thue-Morse substitution, 178tight

measure, 465set of measures, 42

timeaverage, 73constant of a subadditive sequence, 88mean sojourn, 65, 71of first return, 5, 23, 91of return, 89, 90

time-1 map, 5top of a pile, 177topological

conjugacy, 223, 310entropy, 307, 309, 313

for flows, 325, 326equivalence, 310

of flows, 326factor, 310, 386group, 169space, 441vector space, 45weak mixing, 227

topologicallyconjugate maps, 310equivalent maps, 310exact map, 366

532 INDEX

mixing map, 190weak mixing map, 227

topology, 441C0, 385C1, 385Cr , 470defined by

a basis of neighborhoods, 36a distance, 462

discrete, 110generated, 441pointwise, 44product, 110, 451, 458stronger, 37uniform, 44uniform convergence, 385weak, 49, 484weak∗, 36, 50, 484weaker, 37

torus, 18of dimension d, 114, 469rotation, 18

total variation, 445totally dissipative system, 49tower, 27

Kakutani-Rokhlin, 28transfer operator, 215, 388transition

matrix, 321phase, 339probability, 196

transitivemap, 110set, 122

transitivity, 122translation

in a compact group, 314left, 170right, 170vector, 208

transversality, 475, 478transverse

homoclinic point, 132section, 91

twistcondition, 128, 130, 132, 134Dehn, 495homeomorphism, 132

two-sidedexpansivity, 267generator, 263iterated sum

of a partition, 259of an open cover, 315, 316

shift, 60, 109Tychonoff theorem, 110

uniform

convergence norm, 466integrability, 88, 460topology, 44

uniformlycontinuous flow, 325hyperbolic flow, 136quasi-periodic function, 78

unique ergodicity, 163unit

circle, 16tangent bundle, 135, 477

unitary linear operator, 484, 488unstable

foliation, 116, 117, 137leaf, 117, 137manifold, 117, 137

up to measure zero, 444upper density, 59, 61

van der Waerden theorem, 58, 60variation, 445variational principle, 340, 344vector

Diophantine, 128field, 472

Hamiltonian, 22, 131frequency, 127length, 208rationally independent, 18, 209translation, 208

Vitalilemma, 459

volumeelement, 135form, 20, 129induced by a Riemannian metric, 135,

171von Neumann ergodic theorem, 66, 69, 75


weakmixing, 191, 226topology, 49, 484

weak∗ topology, 36, 50, 484weaker topology, 37Weyl theorem, 180Whitney theorem, 476word, 250

Young inequality, 481

zero volume set, 475Zorn lemma, 31

Documents

Foundations of Ergodic Theory · v Brief historic survey The word ergodic is a concatenation of two Greek words, ǫργoν(ergon) = work and oδoσ(odos) = way, and was introduced