88
LECTURE NOTES FOR THE COURSE MATH 245A: GRADUATE ANALYSIS FALL QUARTER 2014 BY MAREK BISKUP Note: Preliminary version, comments welcome!

 · 2020. 9. 8. · 0 CONTENTS 1. A brief history of integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

  • LECTURE NOTES FOR THE COURSE

    MATH 245A: GRADUATE ANALYSIS

    FALL QUARTER 2014

    BY MAREK BISKUP

    Note: Preliminary version, comments welcome!

  • 0

    CONTENTS

    1. A brief history of integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1. Newton’s integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.2. Cauchy’s integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.3. Riemann’s integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31.4. Stieltjes’ integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51.5. Stieltjes and beyond: Itô-Stratonovich and Young integrals . . . . . . . . . . . . . . . . . . . . . 7

    2. Peano-Jordan content . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102.1. Elementary sets and (pre)content . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102.2. Outer/inner content, measurability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132.3. Non-measurable sets, relation to topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172.4. Connection with Riemann integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

    3. Lebesgue measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203.1. Lebesgue outer measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203.2. Lebesgue measurable sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243.3. Lebesgue measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273.4. Convergence Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293.5. Nonmeasurable sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

    4. Lebesgue integral: bounded case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334.1. Bounded functions on finite measure spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334.2. Integrability vs measurability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374.3. Bounded Convergence Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414.4. Characterization of Riemann integrability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

    5. Lebesgue integral: general case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475.1. Unsigned integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475.2. Signed integral and convergence theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 505.3. Absolute integrability and L1-space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 535.4. Uniform integrability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

    6. Abstract measure theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 596.1. From semialgebras to outer measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 596.2. Carathéodory measurability and extension theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 626.3. Uniqueness of extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 666.4. Lebesgue-Stieltjes measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

    7. More on outer measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 737.1. Regular outer measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 737.2. Borel regular and Radon measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 757.3. Metric outer measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 777.4. Hausdorff measure and dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 807.5. Frostman’s lemma. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 1

    1. A BRIEF HISTORY OF INTEGRALThe MATH 245A analysis course is by and large devoted to Lebesgue’s theory of measure andintegration. This theory stands on the foundations laid over 200 years by many great mindsworking in mathematics. In order to appreciate these better, we begin by a brief recount of thehistory of integral and the rather winding road that led to Lebesgue’s ultimate formulation.

    In what follows, I will freely invoke various technical terms such as limit, continuity, deriva-tive, etc, pertinent to differential calculus. These are introduced and discussed in detail in variousundergraduate analysis courses (e.g., MATH 131).

    1.1 Newton’s integral.

    At the time of Newton (and Leibnitz), integral was understood in the sense of an “inverse” toderivative. The interpretation in terms of “area under graph of function” was known but wasregarded as secondary. We may thus take Newton’s definition of integral to be:

    Definition 1.1 (Newton’s integral) Let f : [a,b]→ R and let F be such that F ′(x) = f (x) for allx ∈ [a,b]. Then ∫ b

    af (x)dx := F(b)−F(a) (1.1)

    Newton’s integral thus required the function to have an antiderivative (a.k.a. the primitive func-tion). The definition needs to be supplied by the following observation that, we note, uses onlydifferential calculus:

    Lemma 1.2 If F : [a,b]→ R is such that F ′(x) = 0 for all x ∈ [a,b], then F is constant.

    Indeed, if there are two antiderivatives for f (x), then their difference has zero derivative andso must be constant. The lemma is proved by way of:

    Theorem 1.3 (Mean-Value Theorem) Let f : [a,b]→ R be continuous on [a,b] and such thatf ′(x) exists for all x ∈ (a,b). Then there is c ∈ (a,b) such that

    f (b)− f (a)b−a

    = f ′(c). (1.2)

    It is worth noting that, since we require no regularity of f ′, the Mean-Value Theorem re-quires some facts from point-set topology that were certainly not available at the time of New-ton. Indeed, a linear transformation reduces the statement to the case f (b) = f (a), which goesby the name Rolles’ Theorem. This is in turn proved by using that a continuous function ona compact interval achieves its maximum and minimum in the interior for which we need theBolzano-Weierstrass Theorem (stating that every bounded sequence of reals has a convergingsubsequence). A simple argument underlying the first-derivative criterion then gives the result.

    An attractive feature of Newton’s definition is that, once the integral is well defined, we imme-diately get both basic formulations of the Fundamental Theorem of Calculus:

    Fundamental Theorem of Calculus I: The derivative of the integral with respect to the upperlimit is the integrand,

    ddx

    ∫ xa

    f (t)dt = f (x) (1.3)

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 2

    Fundamental Theorem of Calculus II: The integral of the derivative of a function equals thefunction itself, ∫ x

    aF ′(t)dt = F(x) (1.4)

    assuming F(a) = 0.A major shortcoming of Newton’s integral is that its definition is non-constructive. In particu-

    lar, there is no direct way to decide whether a given function admit antiderivatives. This is, in asense, the key issue in all definitions that followed Newton’s.

    1.2 Cauchy’s integral.

    In his 1821 monograph, Cauchy put forward a definition of integral that is directly based on theinterpretation of “area under graph of function.” To state this precisely, we need some notation.(For the impatient, this notation will carry through a bunch of forthcoming subsections.) Given aninterval [a,b], a marked partition Π of an interval [a,b] is a collection of intervals {[xi−1,xi) : i =1, . . . ,n} and points z1, . . . ,zn such that

    a = x0 < x1 < · · ·< xn−1 < xn = b (1.5)

    and

    zi ∈ [xi−1,xi], i = 1, . . . ,n. (1.6)

    The norm of the partition Π is defined by

    ‖Π‖ := maxi=1,...,n

    |xi− xi−1|. (1.7)

    Given a function f : [a,b]→ R and a marked partition Π of [a,b], we then define

    R( f ,Π) :=n

    ∑i=1

    f (zi)(xi− xi−1). (1.8)

    (Here “R” is for Riemann sum.) Cauchy then more or less proved:

    Theorem 1.4 (Cauchy’s integral) Let f : [a,b]→ R be continuous. Then the limit∫ ba

    f (x)dx := lim‖Π‖→0

    R( f ,Π) (1.9)

    exists (finitely) in the sense that

    limδ↓0

    infΠ : ‖Π‖ 0 such that

    x,y ∈ [a,b] & |x− y|< δ ⇒∣∣ f (y)− f (x)∣∣< ε. (1.11)

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 3

    It follows that, if Π and Π′ are two marked partitions with same x1, . . . ,xn but (possibly) differentz1, . . . ,zn, ∣∣R( f ,Π)−R( f ,Π′)∣∣≤∑

    i=1

    ∣∣ f (zi)− f (z′i)∣∣(xi− xi−1)< ε(b−a). (1.12)The right-hand side tends to zero as ε ↓ 0 and so the choice of the marked points thus playsno role for the limit and so we may as well choose one that makes R( f ,Π) maximal. Thiscorresponds to zi being the maximizer of f over [xi−1,xi]. But for such partition the limit existsby monotonicity because refining the partition decreases (maximal) R( f ,Π). (We will discussthis more in the context of upper and lower integrals in the next subsection.)

    Cauchy completed his theory by providing proofs of both versions of the Fundamental Theo-rem of Calculus. Thanks to him, there was a first constructive definition of the definite integralfor which the classic Newton-Leibnitz theory applied. It seemed that harmony was achieved;unfortunately, not for too long.

    1.3 Riemann’s integral.

    The first half of the 19th century was marked by gradual but steady increase in precision withwhich mathematicians approached various questions surrounding the theory of functions. Ini-tially, there was not even a consensus about what it meant for a function to be continuous —before Cauchy, continuity meant that the same expression defining f (x) applied to the entire do-main of allowed x values. Numerous examples of discontinuous functions (in present-day senseof the word) were being considered and argued about. This permitted to extend Cauchy’s defi-nition to even functions with a finite number of points of discontinuity, in fact, even such pointswhere the integrand is unbounded.

    For instance, when f : [a,c)∪ (c,b]→ R is given with f continuous on both [a,c) and (c,b],we may define ∫ b

    af (x)dx := lim

    ε,ε ′↓0

    (∫ c−εa

    f (x)dx+∫ b

    c+ε ′f (x)dx

    ), (1.13)

    provided the double limit exists. The resulting object is referred to as an improper integral. (It iseasy to check that, if f is in fact continuous on all of [a,b], then this coincides with the originalnotion.) There are even more stringent recipes, e.g., given a < 0 < b we set

    PV∫ b

    a

    1x

    dx := limε↓0

    (∫ −εa

    1x

    dx+∫ b

    ε

    1x

    dx)

    (1.14)

    where “PV” stands for principal value. (It is easy to check that the integral of 1x fails to exists asan improper integral, yet it does exists in the sense of principal value.) Notwithstanding, all ofthese attempts carry additional choices that make the concept difficult to use. In particular, thereare properties that naturally hold for the ordinary integral but fail for these extensions.

    In his habilitation work, Riemann made a significant philosophical leap compared to all earlierattempts. Here is the definition he used:

    Definition 1.5 (Riemann’s integral) Given f : [a,b] → R, we say that the Riemann integral∫ ba f (x)dx exists if the limit ∫ b

    af (x)dx := lim

    ‖Π‖→0R( f ,Π), (1.15)

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 4

    defined in the sense of equality (1.10), exists (finitely). The function f is then said to be Riemannintegrable on [a,b].

    The most notable fact is that the existence of an integral is no longer derived from assumptionson regularity of f ; instead, it becomes a regularity property itself. To make the work with hisconcept easier, Riemann supplied a condition for f to be integrable in his sense:

    Lemma 1.6 A function f : [a,b]→ R is Riemann integrable on [a,b] if and only if

    lim‖Π‖→0

    n

    ∑i=1

    osc(

    f , [xi−1,xi])(xi− xi−1) = 0. (1.16)

    Here osc(

    f ,A) is the oscillation of f on set A which is defined by

    osc(

    f ,A) := sup{| f (y)− f (x)| : x,y ∈ A

    }(1.17)

    and the limit is over partitions of [a,b] without marked points.

    We leave the proof of this fact to homework. Note that this criterion shows almost instantlythat all continuous f are Riemann integrable. Riemann’s definition thus subsumes Cauchy’scompletely. We will later see this does not quite apply to Newton’s definition.

    Another way how to define the integral is by way already touched upon in the section onCauchy’s integral. This method is generally attributed to Darboux — and, in the French literature,Darboux’s name is occasionaly used instead of Riemann’s in connection with this integral —although there are certainly predecessors to this. The idea is to introduce the upper Darboux sum,

    U( f ,Π) :=n

    ∑i=1

    (sup

    t∈[xi−1,xi]f (t))(xi− xi−1) (1.18)

    and the lower Darboux sum,

    L( f ,Π) :=n

    ∑i=1

    (inf

    t∈[xi−1,xi]f (t))(xi− xi−1). (1.19)

    These sums now depend only on the partition Π; no additional marked points are necessary. Anencouraging feature of these is the behavior under refinements of the partition. Indeed, considera partition Π′ obtained by inserting a point (or points) into Π. Then

    U( f ,Π)≥U( f ,Π′) while L( f ,Π)≤ L( f ,Π′). (1.20)It follows that the limit ‖Π‖ → 0 can be simulated by taking infimum, resp., supremum of thesequantities. This leads to the definition of upper and lower Riemann/Darboux integrals,∫ b

    af (x)dx := sup

    ΠL( f ,Π) and

    ∫ ba

    f (x)dx := infΠ

    U( f ,Π). (1.21)

    A key point to note is then:

    Lemma 1.7 f : [a,b]→ R is Riemann integrable on [a,b] if and only if∫ ba

    f (x)dx =∫ b

    af (x)dx with both values in R. (1.22)

    The common value of these integrals then coincides with∫ b

    a f (x)dx.

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 5

    Given its generality, it is no surprise that Riemann’s definition includes functions that hadheretofore seemed inaccessible. To give an example, pick a non-increasing sequence {αn} withan ≥ 0 and consider an enumeration of rationals into a sequence, Q= {qi : i≥ 1}. Define

    f (x) :=

    {αn, if x = qn for some n≥ 1,0, otherwise.

    (1.23)

    Since {αn} is non-increasing, osc(

    f , [xi−1,xi])> αn only for at most n intervals in the partition.

    Hence, the sum in (1.16) is less than nαn+αn(b−a). Thus, if nαn→ 0, this function is Riemannintegrable. But once an > 0 for all n, this function is also discontinuous at all rationals! Moreover,it is not hard to check that ∫ b

    af (x)dx = 0 (1.24)

    for all a < b. In particular, the Fundamental Theorem of Calculus I fails at rational x.The previous example shows that Riemann’s integral is strictly more general than Cauchy’s.

    To see that it does not even subsume Newton’s integral, we state:

    Lemma 1.8 (Volterra’s example) There exists a function F : [a,b]→ R with F ′(x) defined andbounded at all x ∈ [a,b] and yet F ′ not Riemann integrable on [a,b].

    Thus, although Riemann’s definition of integral is more general than Cauchy’s, it is too gen-eral to make the FTC I work at all points (usual statements of FTC I require continuity of theintegrand) and yet not general enough to make the FTC II work (since one has to assume that F ′

    is Riemann integrable). This is a quandary that only Lebegue’s notion of measurable functionwas able to resolve.

    There are numerous other shortcomings of the Riemann integral that complicate its use. Wename just a few:

    (1) The integral requires the integrated function to be bounded.(2) The integral behaves poorly under pointwise limits: There is a sequence of continuous

    functions converging pointwise to a bounded function that is not Riemann integrable.

    Further shortcomings concern generalizations of the Riemann integral to higher dimensions whichshould be quite obvious: Partition integration domain into squares and take a limit of the corre-sponding Riemann sum. Notwithstanding, we then run into:

    (3) Generalizations to two (or higher) dimensions require additional conditions on the under-lying domain of integration. Or, taking this problem back to one dimension, the integralis defined more or less only for intervals.

    (4) Fubini’s theorem, stating that an integral over a rectangle in R2 can be computed astwo successive one-dimensional integrals requires unwieldy conditions. Indeed, even iff : R2→ R is Riemann integrable, g(x) := f (x,y) may not be.

    Again, all of these are resolved very elegantly in Lebesgue’s theory.

    1.4 Stieltjes’ integral.

    Before we move to discussing Lebesgue’s theory, it is worth pointing out a generalization ofRiemann’ integral, due to Stieltjes. This involves two functions, f ,g : [a,b]→ R for which we

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 6

    consider the sum

    S( f ,dg,Π) :=n

    ∑i=1

    f (zi)(g(xi)−g(xi−1)

    ). (1.25)

    Obviously, when g(x) := x, this degenerates to R( f ,Π).

    Definition 1.9 (Stieltjes’ integral) A function f is said to be Riemann-Stieltjes integrable withrespect to g on [a,b] if ∫ b

    af dg := lim

    ‖Π‖→0S( f ,dg,Π) (1.26)

    exists (in the above sense). The resulting value defines the Riemann-Stieltjes integral.

    We note that, although the Stieltjes integral appears quite asymmetric in f and g, there isactually a lot of symmetry. In fact, we have:

    Lemma 1.10 If∫ b

    a f dg exists, then so does∫ b

    a gd f and∫ ba

    f dg+∫ b

    agd f = f (b)g(b)− f (a)g(a). (1.27)

    Note that this boils down to the integration by parts formula when f and g are continuouslydifferentiable. Indeed, we have:

    Lemma 1.11 If g is continuously differentiable,∫ ba

    f dg =∫ b

    af (x)g′(x)dx (1.28)

    whenever the Riemann integral on the right exists.

    We leave both lemmas as an exercise in the homework. The integral has a lot of “standard”properties although the additivity property with respect to domain of integration,∫ b

    af dg =

    ∫ ca

    f dg+∫ b

    cf dg (1.29)

    fails in general. Again, we leave this as an exercise to the reader.There is a criterion for the existence which is formulated using the notion of (first) variation

    of f on [a,b]. This is the quantity

    V(

    f , [a,b])

    := supΠ

    n

    ∑i=1

    ∣∣ f (xi)− f (xi−1)∣∣. (1.30)(Again, thanks to the triangle inequality, insertion of points into the partition increases the sumso we take the supremum instead of ‖Π‖ → 0.) A function f is said to be of finite variation on[a,b] if V

    (f , [a,b]

    )< ∞. By analyzing the trivial decomposition

    f (x) =V(

    f , [a,x])−(V(

    f , [a,x])− f (x), x ∈ [a,b], (1.31)

    we obtain the classic theorem due to Jordan that a function is of finite variation on a finite closedinterval if and only if it can be written as the difference of two bounded non-decreasing functions.It thus often suffices to develop the integral

    ∫ ba f dg for monotone g only.

    The promised criterion for existence is then:

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 7

    Lemma 1.12 Suppose f ,g : [a,b]→ R with f continuous and g of finite variation. Then theStieltjes integral

    ∫ ba f dg exists and, in fact,∣∣∣∣∫ ba f dg

    ∣∣∣∣≤ ( supx∈[a,b]

    ∣∣ f (x)∣∣)V(g, [a,b]) (1.32)Proof. Similar to Riemann’s integral, a sufficient condition for the limit underpinning the defini-tion of Stieltjes integral to exist is

    lim‖Π‖→0

    n

    ∑i=1

    osc(

    f , [xi−1,xi])∣∣g(xi)− xi−1∣∣= 0. (1.33)

    Under the assumption of continuity, and thus uniform continuity, of f , given ε > 0 there is δ > 0such that once ‖Π‖ < δ , we have osc( f , [xi−1,xi]) < ε . For such partitions, the sum is at mostεV (g, [a,b]). Taking ε ↓ 0 proves (1.33).

    To get the bound on the integral, we note that S( f ,dg,Π) is itself bounded by the right-handside of (1.32). This, naturally, survives taking the limit. �

    1.5 Stieltjes and beyond: Itô-Stratonovich and Young integrals.

    The restriction to Stieltjes integrals with respect to functions of finite variation seemed fine ini-tially but, as more and more irregular functions become considered by mathematicians, it becameclear that other criteria had to be developed as well. One example when this is necessary is thestochastic or Itô integral. Here g is taken to be a path of a random process called Brownian mo-tion, which is continuous, but nowhere differentiable and of infinite variation. Notwithstanding,if we define the p-variation of g by

    V (p)(

    f , [a,b])

    := lim‖Π‖→0

    n

    ∑i=1

    ∣∣ f (xi)− f (xi−1)∣∣p, (1.34)where for p≥ 1 the existence of the limit needs to be proved by a separate argument, the Brownianmotion has the p= 2 (i.e., the second) variation (well-defined) finite and positive over any intervalof positive length.

    Notwithstanding, in this case the integral cannot be defined in the usual sense. This is seenrather simply from the following observation. Let L( f ,dg,Π) denote the left endpoint approxi-mation of the integral,

    L( f ,dg,Π) =n

    ∑i=1

    f (xi−1)(g(xi)−g(xi−1)

    )(1.35)

    and let

    R( f ,dg,Π) =n

    ∑i=1

    f (xi)(g(xi)−g(xi−1)

    )(1.36)

    be the corresponding right endpoint approximation. Then

    R( f ,dg,Π)−L( f ,dg,Π) =n

    ∑i=1

    (f (xi)− f (xi−1)

    )(g(xi)−g(xi−1)

    )(1.37)

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 8

    and so if f = g, then

    R( f ,d f ,Π)−L( f ,d f ,Π) =n

    ∑i=1

    (f (xi)− f (xi−1)

    )2 (1.38)Assuming that the ‖Π‖ → 0 limit of the right-hand side exists, we get that the limits of theright and left endpoint approximations differ exactly by the second variation of f . Under suchconditions the Stieltjes integral

    ∫ ba f df cannot exist in the usual sense.

    It turns out that the integral using the left-endpoint approximation does exist in the sense ofL2-convergence and bears the name Itô integral. If mid-point approximation is used instead, itis called the Stratonovich integral. Each of these has its own merits, although the FundamentalTheorem of Calculus works only for the Stratonovich integral.

    Let us turn the whole story in the positive direction by noting a simple but useful observationmade by Young that gives an easy-to-check criterion for existence of the Stieltjes integral. Themain idea is that one can trade an increase in regularity of f against a decrease in regularity of g.The resulting integral is sometimes called Young’s, although it remains in the sense of Stieltjes.

    In order to state Young’s criterion, recall that f is α-Hölder if there exists a constant K f suchthat | f (y)− f (x)| ≤ K f |y− x|α for all x and y in the domain of f . (The least such constant isreferred to as Hölder norm of f .) The set of all α-Hölder functions on a set A will be denoted byCα(A). Then we have:

    Theorem 1.13 (Young’s integral) Suppose that f : [a,b]→ R are such that f ∈Cα([a,b]) andg ∈ Cβ ([a,b]) for some α,β > 0 with α + β > 1. Then f is Riemann-Stieltjes integrable withrespect to g on [a,b] (and vice versa).

    The key point of the proof is:

    Lemma 1.14 (Hölder’s inequality) Let p,q ∈ (1,∞) be such that 1p +1q = 1. Then for all real

    numbers a1, . . . ,an,b1, . . . ,bn ≥ 0n

    ∑i=1

    aibi ≤( n

    ∑i=1

    api

    )1/p( n∑i=1

    bqi

    )1/q(1.39)

    Proof. This is an arithmetic-geometric inequality in disguise. A quick way to make the argumentis to consider the function ϕ(x,y) := xy− xp/p− yq/q for x,y≥ 0. It is straightforward to checkthat, once 1p +

    1q = 1, we have ϕ(x,y)≥ 0 for all x,y≥ 0. Assuming that not all ai’s and bi’s are

    zero (otherwise the inequality is trivial), we may set

    ãi :=ai(

    ∑ni=1 api

    )1/p and b̃i := bi(∑ni=1 bpi )1/p (1.40)and use the non-negativity of ϕ to get

    ãib̃i ≤ãpip+

    b̃qiq. (1.41)

    Summing this over i then yieldsn

    ∑i=1

    ãib̃i ≤1p

    n

    ∑i=1

    ãpi +1q

    n

    ∑i=1

    b̃qi . (1.42)

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 9

    But the sums on the right are both equal to one and, since 1p +1q = 1, the sum on the left is thus

    bounded by one. It is now easy to check that this is the desired bound. �

    Proof of Theorem 1.13. Our goal is to show (1.33). To treat f and g on the same footing, we firstbound the sum as

    n

    ∑i=1

    osc(

    f , [xi−1,xi])∣∣g(xi)− xi−1∣∣≤ n∑

    i=1osc(

    f , [xi−1,xi])osc(g, [xi−1,xi]

    )(1.43)

    Next we apply Hölder’s inequality to estimate this by( n∑i=1

    osc(

    f , [xi−1,xi])p)1/p( n∑

    i=1osc(g, [xi−1,xi]

    )q)1/q (1.44)for any p,q > 1 with 1p +

    1q = 1. Thanks to f ∈C

    α we have | f (x)− f (y)| ≤ K f |x− y|α and thus

    osc(

    f , [xi−1,xi])≤ K f |xi− xi−1|α (1.45)

    and similarlyosc(g, [xi−1,xi]

    )≤ Kg|xi− xi−1|β (1.46)

    The fact that α,β > 0 with α + β > 1 permits us to find p and q such that 1p +1q = 1 and yet

    α p≥ 1 and βq≥ 1. This implies|xi− xi−1|α p = |xi− xi−1| |xi− xi−1|α p−1 ≤ |xi− xi−1|‖Π‖α p−1 (1.47)

    and, since ∑ni=1 |xi− xi−1|= (b−a), also( n∑i=1

    osc(

    f , [xi−1,xi])p)1/p ≤ K f( n∑

    i=1|xi− xi−1|α p

    )1/p≤ K f ‖Π‖α−1/p(b−a)1/p

    (1.48)

    An analogous estimate (with α and p replaced by β and q) applies to g. It follows thatn

    ∑i=1

    osc(

    f , [xi−1,xi])∣∣g(xi)− xi−1∣∣≤ K f Kg‖Π‖α+β−1(b−a) (1.49)

    and, since α +β > 1, the criterion (1.33) for existence of the Stieltjes integral is satisfied. (Notethat we even get a rate of convergence in terms of ‖Π‖ which can be useful for numerics.) �

    Note that the Young criterion for existence of∫ b

    a f d f requires f to be Hölder with an exponentstrictly larger than 1/2. This is not consistent with the second variation of f being positive andso the Itô/Stratonovich integrals are not included in Young’s class. This is no surprise since weshowed that different choices of the “rule” can lead to different limits.

    We also remark that there are numerous other integrals that further generalize those above (andthat even includes the Lebesgue integral that we have not discussed yet). We will not developthese here as they are a bit too special.

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 10

    2. PEANO-JORDAN CONTENTThe somewhat dissatisfactory situation with the Riemann integral led Cantor, Peano and par-ticularly Jordan develop a theory of (what we now call) the Peano-Jordan content (a.k.a. justas Jordan content). The underlying motivation is that, since the Riemann integral is after all ameans to define the area of a region in R2 — namely that bounded between one of the axes andthe graph of a function — we may as well try to address the notion of area directly from theoutset. The usefulness of this philosophical turn of thought becomes even more apparent whenhigher-dimensional Riemann integrals are considered; there one often wants to integrate f onlyover a bounded region in, say, R2 or R3 and then even integrating function f = 1 over such regionsleads to problem with defining areas and volumes.

    Compared to the Lebesgue measure, the Peano-Jordan content has numerous shortcomings(essentially identical to those of the Riemann integral) and so, for present day mathematics, itsrelevance rests mostly in the historical role it played at its time. However, the development ofPeano-Jordan content will demonstrate many technical aspects of a full-fledged theory of measureand so we may start with it just as well.

    2.1 Elementary sets and (pre)content.

    The development of any theory of content or measure has essentially two parts:

    (1) Identify a class of elementary sets for which we agree what their content/measure is.(2) Extend this to a larger class, called measurable sets, by way of approximations from

    within and without by elementary sets.

    Of course, there are numerous aspects one should be concerned about in implementing such aprogram. For instance, if the class of elementary sets is chosen too large, or the specific meaningof content we agree on is poorly behaved, the approximations considered in (2) might lead toinconsistent answers even for some elementary set. On the other hand, if the class of elementarysets is too small, too few additional sets could be reached by approximations. As we shall seelater, these are exactly the concerns that took quite a while to tune out correctly.

    For the Peano-Jordan content, our choice of elementary set will be as follows:

    Definition 2.1 (Elementary sets) A half-open box in Rd is a set of the form

    I := (a1,b1]× (a2,b2]×·· ·× (ad ,bd ] (2.1)

    where −∞ < ai < bi < ∞ for each i = 1, . . . ,d. An elementary set is then any union of the form⋃ni=1 Ii where Ii ∈I for all i = 1, . . . ,n. We will write I to denote the set of all half-open boxes

    including the empty set and E for the class of elementary sets.

    Lemma 2.2 The class E is closed under unions, intersections and set differences.

    Proof. The closure under unions is trivial. For intersections we use that if I,J are non-disjointhalf-open boxes, then so is I∩ J. For set differences we use that if I,J are half-open boxes, thenJ r I is either empty or a finite union of half-open boxes. To extend this to elementary sets, wejust apply basic set operations. �

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 11

    For a half-open box I as in (2.1) we then define its (pre)content by

    |I| :=d

    ∏i=1

    (bi−ai). (2.2)

    with the proviso | /0|= 0. When a set is a union of a finite number of disjoint half-open boxes {Ii},our intuition would dictate to set its (pre)content to the sum ∑ni=1 |Ii|. However, this harbors aconsistency problem: We do not know that a different representation as a sum of half-open boxeswould result in the same value. This, and a few other issues that are related to this problem, isaddressed in:

    Proposition 2.3 (Consistency, subadditivity, additivity) Let {Ii : i = 1, . . . ,n} ⊂I and {J j : j =1, . . . ,m} ⊂I with Ii∩ Ii′ = /0 whenever i 6= i′. Then

    n⋃i=1

    Ii ⊆m⋃

    j=1

    J j ⇒n

    ∑i=1|Ii| ≤

    m

    ∑j=1|J j|. (2.3)

    In particular, if the unions are equal and {J j : j = 1, . . . ,m} are disjoint, then equality holds.

    We begin with an elementary version of the claim:

    Lemma 2.4 Let {Ii : i = 1, . . . ,n} ⊂I be disjoint with I :=⋃n

    i=1 Ii ∈I . Then

    |I|=n

    ∑i=1|Ii|. (2.4)

    Proof in d = 1. Suppose I is a half-open interval and {Ii : i = 1, . . . ,n} are disjoint half-openintervals such that I =

    ⋃ni=1 Ii. We can always relabel the intervals such that Ii = (ai,bi] for

    −∞ < a1 < b1 ≤ a2 < b2 ≤ ·· · ≤ an < bn < ∞. (2.5)

    But then the fact that the union of Ii’s must be an interval forces bi = ai+1 for all i = 1, . . . ,n−1along with a1 = a and bn = b. This and the specific form∣∣(ai,bi]∣∣= (bi−ai) (2.6)implies

    n

    ∑i=1|Ii|=

    n

    ∑i=1

    (bi−ai) =

    n

    ∑i=1

    bi−n

    ∑i=1

    ai

    = b+n−1

    ∑i=1

    bi−n

    ∑i=2

    ai−a = b−a(2.7)

    and so we get |I|= ∑ni=1 |Ii| as desired. �

    Proof in d ≥ 2. The proof has two steps. First we address the case of (what we will call) productpartitions. Then we will use that to handle general partitions as well.

    STEP 1 (Product partitions): By a product partition of a half-open box I we will mean anycollection of sets of the form J1, j1×·· ·× J1, jd such that

    (1) Ji, j is a half-open interval in R for all i = 1, . . . ,d and all j = 1, . . . ,mi,(2) Ji, j ∩ Ji,k = /0 unless j = k, and

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 12

    (3) their union is all of I, i.e.,

    I =m1⋃

    j1=1

    · · ·md⋃

    jd=1

    (J1, j1×·· ·× Jd, jd ). (2.8)

    Naturally, the union in (2.8) is disjoint thanks to (2). Observe also that

    Ki :=mi⋃j=1

    Ji, j (2.9)

    is necessarily an half-open interval and I = K1×·· ·×Kd .Given such a product partition, we can use the product form of (2.2) and the distributive law

    for multiplication around summation to getm1

    ∑j1=1· · ·

    md

    ∑jd=1|J1, j1×·· ·× Jd, jd |=

    m1

    ∑j1=1· · ·

    md

    ∑jd=1

    ( d∏i=1|Ji, ji |

    )=

    d

    ∏i=1

    ( mi∑j=1|Ji, j|

    )(2.10)

    From the (already proved) claim in d = 1 we know that ∑mij=1 |Ji, ji | = |Ki|. Relabeling things weconclude that

    m1

    ∑j1=1· · ·

    md

    ∑jd=1

    ∣∣J1, j1×·· ·× Jd, jd ∣∣= |I|. (2.11)This verifies the claim in the case of product partitions.

    STEP 2 (General partitions): Let I ∈ I and suppose that I =⋃n

    i=1 Ii for some {Ii} ⊂ I withIi∩ I j = /0 whenever i 6= j. Since Ii is itself a product of half-open intervals, for each k = 1, . . . ,dwe can enumerate all endpoints of all intervals in the k-th coordinate directions that appear in theproduct constituting Ii for some i. Labeling these endpoints increasingly, for each k = 1, . . . ,d weget a sequence

    ak,1 ≤ ak,2 ≤ ·· · ≤ ak,2n−1 ≤ ak,2n. (2.12)We can now use these to define a product partition

    ⋃mj=1 J j of I, where m = (2n)

    d and each J j isa half-open box of the form

    (a1,i1 ,a1,i1+1]×·· ·× (ad,id ,ad,id+1] (2.13)

    (which is empty whenever ak,ik+1 = ak,ik for any k = 1, . . . ,d). Then

    |I|=m

    ∑j=1|J j| (2.14)

    by STEP 1.Thanks to disjointness of {Ii}, each J j has a non-empty intersection with at most one Ii, and

    since both {Ii} and {J j} are partitions of I, we have

    J j ∩ Ii 6= /0 ⇒ J j ⊂ Ii. (2.15)

    Setting Si := { j : J j∩ Ii 6= /0}, it then follows that {J j : j ∈ Si} constitutes a product partition of Ii,for each i = 1, . . . ,n. By STEP 1 again, we thus have

    ∑j∈Si|J j|= |Ii|, i = 1, . . . ,d, (2.16)

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 13

    and summing this over i with the help of (2.14) then yields

    |I|=m

    ∑j=1|J j|=

    n

    ∑i=1

    ∑j∈Si|J j|=

    n

    ∑i=1|Ii| (2.17)

    where we used that Si∩Si′ = /0 when i 6= i′ and⋃n

    i=1 Si = {1, . . . ,m}. �By small modifications of the proof, we also get:

    Corollary 2.5 Let {Ii : i = 1, . . . ,n} ⊂I and I ∈I . Then

    I ⊆n⋃

    i=1

    Ii ⇒ |I| ≤n

    ∑i=1|Ii|. (2.18)

    Proof. Repeating the argument in STEP 2 of the previous proof, we can write⋃n

    i=1 Ii as⋃m

    j=1 J jwhere J j ∈I and {J j} is a product partition arising from collecting all endpoints of all intervalsconstituting {Ii}. Given i ∈ {1, . . . ,n}, let S j be as there. Then Ii =

    ⋃j∈Si J j and thus |Ii| =

    ∑ j∈Si |J j|. Note also that I =⋃m

    j=1(J j∩ I) and {J j∩ I} ⊂I is disjoint. The only part of that proofthat needs a chance is thus (2.17) which becomes

    |I|=m

    ∑j=1|J j ∩ I| ≤

    m

    ∑j=1|J j| ≤

    n

    ∑i=1

    ∑j∈Si|J j|=

    n

    ∑i=1|Ii|, (2.19)

    where the first inequality comes from the fact that |I ∩ J| ≤ |I| whenever I,J ∈ I — the side-lengths of I∩J are at most those of I — while the second inequality follows from the fact that foreach j there is at least one i such that j ∈ Si. �

    Proof of Proposition 2.3. Let us first assume that both {Ii} and {J j} are disjoint collections withequal unions. Then {Ii ∩ J j : i = 1, . . . ,n, j = 1, . . . ,m} is also a disjoint collection of half-openboxes with the same union. Then

    |Ii|=m

    ∑j=1|Ii∩ J j| and |J j|=

    n

    ∑i=1|Ii∩ J j| (2.20)

    by (2.4). Hencen

    ∑i=1|Ii|=

    n

    ∑i=1

    m

    ∑j=1|Ii∩ J j|=

    m

    ∑j=1

    n

    ∑i=1|Ii∩ J j|=

    m

    ∑j=1|J j|. (2.21)

    This proves the claim in the case when {J j} are disjoint and the unions are equal. For the casewhen {J j} are not necessarily disjoint and the union of {Ii} is only included in the union of {J j},we use (2.18) instead of (2.4). �

    2.2 Outer/inner content, measurability.

    We are now ready to proclaim

    {Ii} ⊂I disjoint ⇒ c( n⋃

    i=1

    Ii)

    :=n

    ∑i=1|Ii|. (2.22)

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 14

    to be the content of any elementary set. (The independence of representation follows from Propo-sition 2.3.) A trivial consequence of the definition is

    E,F ∈ E disjoint ⇒ c(E ∪F) = c(E)+ c(F). (2.23)

    In order to extend the notion of content to more general sets, we invoke approximations:

    Definition 2.6 (Inner/outer content, measurability) Let A⊂ R. Then we define:(1) the outer Peano-Jordan content c?(A) of A by

    c?(A) := inf{ n

    ∑i=1|Ii| : {Ii} ⊂I AND

    n⋃i=1

    Ii ⊃ A}, (2.24)

    (2) the inner Peano-Jordan content c?(A) of A by

    c?(A) := sup{ n

    ∑i=1|Ii| : {Ii} ⊂I disjoint AND

    n⋃i=1

    Ii ⊂ A}. (2.25)

    (3) We say that A is Peano-Jordan measurable if c?(A) = c?(A). In such a case we call

    c(A) := c?(A) (= c?(A)) (2.26)

    the Peano-Jordan content of A. LetJ denote the class of Peano-Jordan measurable sets.

    First we observe that the apparent discrepancy (use of “disjoint” collections) between the def-inition of outer and inner content plays little role:

    Lemma 2.7 Requiring that the families {Ii} are disjoint in the definition of c?(A) leads to thesame value of the infimum. In particular, for any A⊂ Rd we have

    c?(A)≤ c?(A). (2.27)

    Proof. By Proposition 2.3, representing the union⋃m

    j=1 J j as a disjoint union⋃n

    i=1 Ii does notincrease the value of the sum of (pre)contents. Once “disjoint” is required also in the definitionof c?(A), Proposition 2.3 ensures that we get a value no smaller than c?(A). �

    We are then able to conclude:

    Lemma 2.8 (Elementary sets are measurable) Every A ∈ E is measurable and c(A) agrees withthe expression in (2.22).

    Proof. Let A be elementary. Using A itself in the approximations leading to c?(A) and c?(A)shows, with the help of the lemma before, that c?(A) = c?(A) = the value in (2.22). �

    Hence, the outer content of A is thus the least value of content seen in approximations of A byelementary sets from outside while the inner content of A is the largest value of content seen inapproximations of A by elementary sets from within, i.e.,

    c?(A) = inf{

    c(E) : E ∈ E , A⊆ E}

    (2.28)

    andc?(A) = sup

    {c(E) : E ∈ E , E ⊆ A

    }. (2.29)

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 15

    Just as for the concept of Riemann integrability, being Peano-Jordan measurable then signifiesthat these approximations result in the same value.

    Let us now collect some basic consequences of the definitions of outer/inner content. The firstone of these is:

    Lemma 2.9 (Positivity and monotonicity) For any A, both c?(A)≥ 0 and c?(A)≥ 0. In fact,

    A⊂ B ⇒ c?(A)≤ c?(B) and c?(A)≤ c?(B). (2.30)

    Proof. The monotonicity follows immediately. The positivity is then a consequence of /0 ∈ I ,c?( /0) = 0 and (2.27). �

    This has a trivial corollary that does not even need a proof:

    Corollary 2.10 (Null sets are measurable) If A is a null set in the sense that c?(A) = 0 thenA ∈J and, in fact, B ∈J for all B⊆ A.

    Another, albeit somewhat less direct, consequence of the definitions is:

    Lemma 2.11 (Sub/superadditivity) For any sets A1, . . . ,An,

    c?( n⋃

    i=1

    Ai)≤

    n

    ∑i=1

    c?(Ai) (subadditivity of c?) (2.31)

    Similarly, if Ai∩A j = /0 whenever i 6= j, then also

    c?( n⋃

    i=1

    Ai)≥

    n

    ∑i=1

    c?(Ai). (superadditivity of c?) (2.32)

    Proof. We start with (2.31). In light of positivity, we may assume that c?(Ai) < ∞ for all i;otherwise there is nothing to prove. Let ε > 0 and, for each i = 1, . . . ,n find Ei ∈ E such that

    Ai ⊂ Ei and c?(Ai)+ ε/n≥ c(Ei), i = 1, . . . ,n. (2.33)

    Since⋃n

    i=1 Ei ⊃⋃n

    i=1 Ai, from Lemma 2.2, (2.28) and (2.23) we thus get

    c?( n⋃

    i=1

    Ai)≤ c( n⋃

    i=1

    Ei)=

    n

    ∑i=1

    c(Ei)≤n

    ∑i=1

    (c?(Ai)+ ε/n

    )≤ ε +

    n

    ∑i=1

    c?(Ai). (2.34)

    But ε is arbitrary and so (2.31) follows. The proof of (2.32) is completely analogous. �It turns out that A ⊂ Rd is unbounded if and only if c?(A) = ∞. (It is instructive to write a

    proof of this fact.) This shows that, rather than for subsets of all of R, the Peano-Jordan contentshould be considered for subsets of a (measurable) bounded set only. Here we will find anotherconsequence of the above definitions quite handy:

    Lemma 2.12 For any bounded B ∈J and any A⊂ B,

    c?(A) = c(B)− c?(BrA). (2.35)

    In particular,c?(A)− c?(A) = c?(BrA)− c?(BrA). (2.36)

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 16

    Proof. Fix ε > 0 and let E,F ∈ E be such that E ⊇ B (this is where we need that B is bounded)and F ⊆ A obey c(F)≥ c?(A)−ε . As E rF ∈ E by Lemma 2.2 and E rF ⊇ BrA, from (2.23)we have

    c(E) = c(F)+ c(E rF)≥ c?(A)− ε + c?(BrA). (2.37)

    Varying E along a sequence such that c(E) decreases to c?(B) = c(B), we get that ≤ in (2.35).For the opposite inequality, given ε > 0, we pick E,F ∈ E such that E ⊆ B and F ⊇ BrA

    (again using the boundedness of B) obeys c(F) ≤ c?(BrA)+ ε . Then E rF ⊆ A and, sinceE rF ∈ E , from (2.23) we get

    c(E) = c(F)+ c(E rF)≤ c?(BrA)+ ε + c?(A). (2.38)

    Taking E along a sequence such that c(E) tends to c?(B) = c(B), we get ≥ in (2.35). The secondconclusion then follows by applying the first one to A and BrA. �

    Note that going via complements could be considered as an alternative way to define innercontent. (We will see that this is exactly what is done for the Lebesgue measure.) With theseproperties established, we now get:

    Theorem 2.13 The following holds:(1) If A1, . . . ,An ∈J are disjoint, then also

    ⋃ni=1 Ai ∈J and

    c( n⋃

    i=1

    Ai)=

    n

    ∑i=1

    c(Ai). (2.39)

    (2) If A,B are bounded and measurable, then so is A∩B and BrA.In particular, for any bounded D ∈J , the class {A ∈J : A ⊆ D} of Peano-Jordan measurablesubsets of D is an algebra (of subsets of D) and c is an additive set function on it.

    Proof. To get (1) we note that, by Lemmas 2.7 and 2.11,

    n

    ∑i=1

    c?(Ai)≤ c?( n⋃

    i=1

    Ai)≤ c?

    ( n⋃i=1

    Ai)≤

    n

    ∑i=1

    c?(Ai). (2.40)

    The left and right-hand sides agree whenever A1, . . . ,An ∈J . The claim then follows from thedefinition of measurability.

    For (2) let D ∈J be bounded and note that, by Lemma 2.12, A⊆ D is measurable if and onlyif DrA is measurable. Now, for A,B⊆ D,

    A∩B = Dr((DrA)∪ (DrB)

    )ArB = A∩ (DrB)

    (2.41)

    thus show that if A,B are measurable, then so are A∩B and ArB. �

    Remark 2.14 We note that all results in this section use only some very basic facts about ele-mentary sets. Namely, we need that E is closed under unions, intersections and set differencesand the content is well-defined, finite and finitely additive on E .

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 17

    2.3 Non-measurable sets, relation to topology.

    A natural question to ask is whether there are non-measurable sets — i.e., those that fail to bemeasurable. The answer to this is quite easy:

    Lemma 2.15 The set A :=Q∩ [0,1] is not Peano-Jordan measurable.

    Proof. The set A := Q∩ [0,1] is totally disconnected and so it contains no non-trivial intervals.Hence, we get c?(A) = 0. On the other hand, any elementary set that contains A must contain[0,1] and so c?(A)≥ c?([0,1]) = 1. �

    A slightly more subtle question is what properties of sets characterize Peano-Jordan measura-bility. A simple exercise shows:

    Lemma 2.16 Let A ⊂ Rd be bounded. Then A ∈J if and only if for each ε > 0 there areE,F ∈ E with F ⊆ A⊆ E and c?(E rF)< ε .

    Unfortunately, this is still very much in the spirit of the original definition. A more interestingcharacterization arises when we try to relate measurability to topology. Recall that, for any setA ⊂ Rd , A denotes the closure of A while A◦ denotes the interior of A with respect to the usualEuclidean metric (and thus topology) on Rd . The boundary ∂A of A is then given by ∂A :=ArA◦.

    Proposition 2.17 (Topological characterization of PJ-measuability) For any bounded A⊆ Rd ,c?(A) = c?(A) and c?(A) = c?(A◦) (2.42)

    and so A is measurable if and only if c?(A) = c?(A◦). Hence, if A is bounded and measurablethen so is any B such that A◦ ⊆ B⊆ A and c(B) = c(A) for all of these. In particular,

    A ∈J ⇒ c(A) = c(A) = c(A◦). (2.43)Finally, for A⊆ Rd bounded, Peano-Jordan measurability is characterized by

    A ∈J ⇔ c?(∂A) = 0. (2.44)

    Proof. We begin by proving (2.42). Let E ∈ E be an elementary sets such that A ⊆ E andc?(A)+ ε ≥ c(E). Writing E as the disjoint union

    ⋃ni=1 Ii of half-open boxes {Ii}, we now find

    (by enlarging Ii slightly in all directions) half-open boxes I′i such that

    Ii ⊆ I′i and |I′i | ≤ |Ii|+ ε/n. (2.45)The fact that A is the smallest closed set containing A implies

    A⊆n⋃

    i=1

    Ii ⊆n⋃

    i=1

    I′i = E′ (2.46)

    and we thus get

    c?(A)≤n

    ∑i=1|I′i | ≤

    n

    ∑i=1

    (|Ii|+ ε/n

    )= ε + c(E)≤ c?(A)+2ε. (2.47)

    The monotonicity with respect to inclusion then yields c?(A) = c?(A). The second part of (2.42)is proved analogously: We find an elementary F ⊆ A with c?(A)− ε ≤ c(F), write it as a disjointunion of half-open boxes {Ii}, shrink these to half-open boxes I′i ⊂ (Ii)◦ with close value of

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 18

    content, use that A◦ is the largest open set contained in A and then apply a similar calculation asin (2.47) to conclude c?(A◦)≥ c?(A). Monotonicity then finishes the claim.

    From (2.42) we then immediately get that A ∈J if and only if c?(A) = c?(A◦). Then (2.43)follows from the monotonicity of inner/outer content with respect to inclusion. Similarly we getthat, once A ∈J , A◦ ⊆ B ⊆ A implies B ∈J as well. It thus remains to prove the equivalence(2.44). Lemma 2.16 gives the implication⇒: If E,F are as in the lemma, then ∂A⊂ ErF and soc?(∂A)≤ c(E rF), which can be made as small as desired. For the opposite implication,⇐, weuse c?(∂A) = 0 to find a non-empty F ∈ E such that ∂A⊂ F◦ and c(F)< ε . Then ArF◦ ⊆ A◦.But ArF◦ is closed and so one can find E ∈ E such that

    ArF◦ ⊆ E ⊆ A◦ and c?(A)− ε ≥ c(E). (2.48)

    But E ∪F ∈ E and E ∪F ⊇ A. Therefore,

    c?(A)≤ c(E ∪F)≤ c(E)+ c(F)< c?(A◦)+2ε. (2.49)

    Hence, c?(A) = c?(A◦) and, by the first part of the claim, A ∈J . �

    2.4 Connection with Riemann integral.

    The fact that measurability of A implies measurability of both A and A◦ seems quite natural:indeed, we usually expect that the closure and interior of a set are more regular than the set itself.However, this conclusion has its limitations:

    Lemma 2.18 There exists a closed set A⊂ [0,1] which is not Peano-Jordan measurable.

    Proof (sketch). One example for this would be any fat Cantor set in [0,1]. (The existence of suchsets is what underlies Lemma 1.8.) Even more simple is an example constructed as follows: Let{qn} enumerate Q and, given ε > 0, let

    A := [0,1]r⋃n≥1

    (qn− ε2−n,qn + ε2−n). (2.50)

    It is obvious that A is closed (possibly empty). By truncating the union at a finite n and replacingopen intervals by half-open ones, we readily find c?(A)≥ 1−2ε . On the other hand, A is totallydisconnected and so c?(A) = 0. So for ε < 1/2, A is a non-empty closed set which is not Peano-Jordan measurable. �

    The conclusion that the contents of A, A and A◦ are all equal was, at the time of conceptionof this theory, considered as completely natural. However, later it was found too restrictive and,in fact, a limitation of the theory. Here we will demonstrate this further by stating without prooftwo lemmas that show that measurability is essentially equivalent to Riemann integrability.

    Lemma 2.19 Let A ⊆ Rd be a bounded set. Then A ∈J if and only if 1A, the characteristicfunction of A, is Riemann integrable.

    Lemma 2.20 Let a,b ∈ R obey a < b and let f : [a,b]→ [0,∞) be a function. Then

    f is Riemann integrable ⇔{(x, t) ∈ [a,b]× [0,∞) : f (x)≤ t

    }∈J (2.51)

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 19

    Underlying these is the fact that the Peano-Jordan content is not countably additive. Theobstruction is explained in the next lemma, whose proof we also leave to the reader:

    Lemma 2.21 There are sets A1,A2, · · · ∈J with Ai ∈ [0,1] for all i such that⋃∞

    i=1 Ai 6∈J .

    (Note, however, that if the union is measurable, and the sets Ai are disjoint, the content iscountably additive on this sequence.) The punchline is that, despite all of the conceptual develop-ments, we still have not left the paradigm (and thus have to face also the associated shortcomings)of Riemann integrability. However, as we will see in the next chapter, we are already on a goodway to do so.

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 20

    3. LEBESGUE MEASUREWe can now finally start developing the theory of Lebesgue measure. There are two key im-provements compared to Peano-Jordan content: We define the outer measure using countable(as opposed to finite) covers and, abandoning the concept of inner approximations, we definemeasurability directly from outer measure.

    Since we will often talk about infinite series throughout, it is useful to recall the followingbasic facts: For any sequence {ai : i ∈ N} satisfying ai ≥ 0 for each i,

    ∑i=1

    ai := limn→∞

    n

    ∑i=1

    ai (3.1)

    coincides with

    ∑i∈N

    ai := sup{

    ∑i∈A

    ai : A⊂ N, finite}. (3.2)

    In particular, the value of the infinite sum is the same for all reorderings of the sequence {ai}. Forsuch absolutely converging sums we pretty much get all of the standard properties of finite sums.For instance, if ai,bi ≥ 0 and ε ≥ 0, then

    ∑i∈N

    (ai + εbi) = ∑i∈N

    ai + ε ∑i∈N

    bi. (3.3)

    Similarly,0≤ ai ≤ bi ∀i ∈ N ⇒ ∑

    i∈Nai ≤ ∑

    i∈Nbi. (3.4)

    Finally, if {ai j : i, j ∈ N} obey ai j ≥ 0, then

    ∑i, j∈N

    ai j = ∑i∈N

    ∑j∈N

    ai j (3.5)

    (This will be later called the discrete Tonelli Theorem.) We leave (easy) verification of theseclaims to the reader.

    3.1 Lebesgue outer measure.

    Let us now move to the development of Lebesgue’s theory. Recall that I denotes the class ofnonempty half-open boxes in Rd and that the outer content was defined by

    c?(A) := inf{ n

    ∑i=1|Ii| : {Ii} ⊂I AND

    n⋃i=1

    Ii ⊃ A}. (3.6)

    The first important insight provided by Lebesgue is to enlarge the covers to countably infiniteones. This leads to:

    Definition 3.1 (Lebesgue outer measure) For any A ⊂ Rd , the Lebesgue outer measure λ ?(A)of A is defined by

    λ ?(A) := inf{ ∞

    ∑i=1|Ii| : {Ii} ⊂I AND

    ∞⋃i=1

    Ii ⊃ A}, (3.7)

    The outer measure has the following properties (which, as we will see, can be used to definedouter measures abstractly):

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 21

    Proposition 3.2 (Outer measure properties) We have:(1) (positivity) λ ?(A)≥ 0 and λ ?( /0) = 0,(2) (monotonicity) if A⊂ B, then λ ?(A)≤ λ ?(B),(3) (countable subadditivity) for any sets {Ai : i ∈ N},

    λ ?(⋃

    i∈NAi

    )≤ ∑

    i∈Nλ ?(Ai). (3.8)

    Proof. (1) The real line can be covered by a countable union of unit intervals and so λ ?(A) ≥ 0for all sets A. The claim for A := /0 would be trivial if /0 were included I . However, since it isnot, for any ε > 0 we pick a collection {Ii} ⊂I such that |Ii| ≤ ε2−i. Then µ?( /0)≤∑ni=1 |Ii| ≤ εand so we must have µ( /0) = 0.

    For (2) we just observe that if {Ii} ⊂I covers A then it covers B. So the set of covers of A isat least as large as the set of covers of B.

    To get (3), we may as well assume that µ?(Ai)< ∞ for all i because otherwise the claim holdstrivially. Given ε > 0, for each i we can then find {Ii j : j ∈ N} ⊂I such that

    Ai ⊂⋃j∈N

    Ii j and λ ?(Ai)+ ε2−i ≥ ∑j∈N|Ii j| (3.9)

    hold for each i. But then ⋃i∈N

    Ai ⊂⋃

    i, j∈NIi j (3.10)

    and so, by the definition of λ ?,

    λ ?(⋃

    i∈NAi

    )≤ ∑

    i, j∈N|Ii j|= ∑

    i∈N∑j∈N|Ii j|

    ≤ ∑i∈N

    (λ ?(Ai)+ ε2−i

    )= ε + ∑

    i∈Nλ ?(Ai)

    (3.11)

    where we used (3.3–3.5) and ∑i∈N 2−i = 1 in the intermediate steps. Since this holds for all ε > 0,the claim follows. �

    A natural question is whether the outer measure in fact gives a better approximation of the“volume” of A from without than the outer content does. This is settled in:

    Lemma 3.3 The infimum defining λ ?(A) is unchanged if we permit Ii to be empty. In particular,for each A⊂ Rd ,

    λ?(A)≤ c?(A). (3.12)

    Proof. Let {Ii : i = 1, . . . ,n} be a cover of A by Ii ∈I ∪{ /0}. Given ε > 0, define I′i := Ii if Ii 6= /0and I′i :=(0,ε2−i]d otherwise. Then {I′i}⊂I is a cover of A and ∑i∈N |I′i | ≤ εd +∑i∈N |Ii|. Hence,allowing Ii to be empty does not affect the value of the infimum. The conclusion λ?(A) ≤ c?(A)then follows by comparing (3.6) with (3.7). �

    The comparison with the inner content is somewhat harder. We will need a lemma:

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 22

    Lemma 3.4 Let I ∈I and let {Ii} ⊂I be such that I ⊂⋃

    i∈N Ii. Then

    |I| ≤ ∑i∈N|Ii|. (3.13)

    Proof. In d = 1 this may perhaps be proved by reordering the sets monotonically using transfiniteinduction but an argument based on topology is much easier. We will use that each half-openbox contains a closed box, and is contained in an open box, with both of these of nearly the samevolume. More precisely, fix ε > 0. Then there is J ∈I such that J⊂ I and |J| ≥ |I|−ε . Similarly,for each i ∈ N, there is I′i ∈I be such that Ii ⊂ (I′i )◦ and |I′i | ≤ |Ii|+ ε2−i for each i.

    A key fact is that {(I′i )◦ : i ∈ N} form an open cover of the compact set J and so, by theHeine-Borel Theorem, there is n ∈ N such that

    J ⊂ J ⊂n⋃

    i=1

    (I′i )◦ ⊂

    n⋃i=1

    I′i . (3.14)

    By Corollary 2.5 we thus get

    |I|− ε ≤ |J| ≤n

    ∑i=1|I′i | ≤ ε +

    n

    ∑i=1|Ii| ≤ ε + ∑

    i∈N|Ii|. (3.15)

    Since ε was arbitrary, we are done. �This implies:

    Lemma 3.5 For any A⊂ Rd ,λ ?(A)≥ c?(A). (3.16)

    In particular, if A ∈J , then λ ?(A) = c(A).

    Proof. We start by proving the ultimate conclusion for elementary sets: If E ∈ E , then λ ?(E) =c(E). Clearly, λ ?(E)< ∞ for any E ∈ E . By Lemma 3.3, for any E ∈ E we have λ ?(E)≤ c(E).Now, given ε > 0, let {Ii} be a cover of E such that λ ?(E) ≥ ∑i∈N |Ii| − ε . Write E =

    ⋃nj=1 J j

    with with {J j} ⊂I disjoint. Then {Ii∩ J j : i ∈ N} is a cover of J j and so, by Lemma 3.4,

    |J j| ≤ ∑i∈N|Ii∩ J j|. (3.17)

    On the other hand, {Ii∩ J j : j = 1, . . . ,n} is a disjoint partition of Ii and so, by Lemma 2.4,

    |Ii|=n

    ∑j=1|Ii∩ J j|. (3.18)

    Combining these we get

    λ ?(E)+ ε ≥ ∑i∈N|Ii|= ∑

    i∈N

    n

    ∑j=1|Ii∩ J j|

    =n

    ∑j=1

    ∑i∈N|Ii∩ J j| ≥

    n

    ∑j=1|J j|= c(E)

    (3.19)

    where we again used (3.5) to write the two sums in a convenient order. Since ε was arbitrary, weget λ ?(E) = c(E) as claimed.

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 23

    Now recall that c?(A) is the supremum of c(E) over all E ∈ E with E ⊂ A. Thus we get

    c?(A) = sup{

    λ ?(E) : E ∈ E , E ⊂ A}. (3.20)

    But Proposition 3.2(2) gives λ ?(E) ≤ λ ?(A) for all such E and so c?(A) ≤ λ ?(A) as well. IfA ∈J , then c?(A) = c?(A) and so they also equal λ ?(A). �

    The above argument has a simple, albeit quite non-trivial, consequence:

    Corollary 3.6 If {Ii} ⊂I are disjoint, then

    λ ?(⋃

    i∈NIi

    )= ∑

    i∈N|Ii| (3.21)

    In particular, for any pair of disjoint collections {Ii} ⊂I and {J j} ⊂I ,⋃i∈N

    Ii =⋃j∈N

    J j ⇒ ∑i∈N|Ii|= ∑

    j∈N|J j| (3.22)

    Proof. By subadditivity (Proposition 3.2(3)) we have “≤.” For the other direction we use that⋃i∈N Ii ⊃ E for E :=

    ⋃ni=1 Ii. This is an elementary set so the previous lemma gives us

    λ ?(⋃

    i∈NIi

    )≥ λ ?

    ( n⋃i=1

    Ii

    )= c( n⋃

    i=1

    Ii

    )=

    n

    ∑i=1|Ii| (3.23)

    by Proposition 3.2(2) and Theorem 2.13(1). Taking n→ ∞, we get “≥” as well. The second partfollows trivially from the first. �

    We remark that, although (3.22) is quite intuitive, a direct proof may be quite challenging.

    Lemmas 3.3 and 3.5 can be further strengthened for bounded sets. Given a bounded A ⊂ Rdand any I ∈I such that A⊂ I, define

    λ?(A) := |I|−λ ?(I rA) (3.24)

    This is the inner Lebesgue measure. As is easy to check (using, e.g., Corollary 3.6), this definitionis independent of the choice of I. We then have:

    Lemma 3.7 Let A⊂ Rd be bounded. Then

    c?(A)≤ λ?(A)≤ λ ?(A)≤ c?(A). (3.25)

    In particular, if A ∈J , then λ ?(A) = λ?(A).

    Proof. This is a direct consequence of Lemmas 3.3, 3.5 and 2.12. �

    This property suggest calling a bounded set A ⊂ Rd to be Lebesgue measurable wheneverλ ?(A) = λ?(A). The conclusion of the lemma is then that every Peano-Jordan measurable setis automatically Lebesgue measurable. Unfortunately, defining Lebesgue measurability via thisroute would make the whole concept restricted to bounded sets. So another, more general, defini-tion will need to be employed that subsumes the inner/outer measure definition for bounded sets.

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 24

    3.2 Lebesgue measurable sets.

    Our definition of measurability will be tied to topology. Recall that a set O⊂ Rd is open if, witheach point x, it contains a ball of a positive radius centered at x. The class of open sets defines atopology. The following property ties topology to the Lebesgue outer measure:

    Lemma 3.8 For each A⊂ Rd ,

    λ ?(A) = inf{

    λ ?(O) : O⊂ A, open}. (3.26)

    Proof. Whenever A⊂O, we have λ ?(A)≤ λ ?(O) by Proposition 3.2(2). Hence, λ ?(A) is at mostthe infimum. For the opposite inequality assume, without loss of generality, that λ ?(A) < ∞.Then for each ε > 0 there are {Ii} ⊂ I such that A ⊂

    ⋃i∈N Ii and λ ?(A)+ ε ≥ ∑i∈N |Ii|. Now

    enlarge each Ii to get I′i open with I′i ⊃ Ii and |I′i | ≤ |Ii|+ ε2−i. Define O :=

    ⋃i∈N I

    ′i . Then O is

    open and O⊃ A. Moreover,

    λ ?(O)≤ ∑i∈N|I′i | ≤ ε + ∑

    i∈N|Ii| ≤ 2ε +λ ?(A). (3.27)

    Since ε was arbitrary, the claim follows. �When an open set O⊃ A is gradually decreased along a sequence On so that λ ?(On) ↓ λ ?(A),

    we may ask whether λ ?(On rA) tends to zero as well. We may even consider the intersectionB :=

    ⋂n∈N On and ask about the value of λ ?(BrA). None of these seem to hold generally and so

    we single out a class of sets for which these properties can be proved:

    Definition 3.9 (Lebesgue measurable sets) A set A ⊂ Rd is Lebesgue measurable if for ev-ery ε > 0 there is an open set O⊂ A such that λ ?(OrA)< ε .

    We will write L or, when explicit mention of dimension is necessary, L (Rd) to denote theclass of Lebesgue measurable sets in Rd . This class has the following properties:

    Proposition 3.10 We have:(1) if O⊂ Rd is open, then O ∈L (Rd),(2) if {An} ⊂L (Rd), then also

    ⋃n∈N An ∈L (Rd),

    (3) if C ⊂ Rd is closed, then C ∈L (Rd),(4) if λ ?(A) = 0 then A ∈L (Rd),(5) if A ∈L (Rd) then also Ac ∈L (Rd).

    Proof of (1) and (2). (1) is trivial as for A open one takes O := A. For (2), let {An} ⊂L (Rd).Then, given ε > 0, we can find open sets On ⊃ An such that λ ?(On rAn) < ε2−n. Then A :=⋃

    n∈N An ⊂ O :=⋃

    n∈N On with O open and, since⋃n∈N

    On r⋃

    n∈NAn ⊂

    ⋃n∈N

    (On rAn), (3.28)

    the monotonicity and countable subadditivity of λ ? show

    λ ?(An rOn)≤ ∑n∈N

    λ ?(On rAn)≤ ∑n∈N

    ε2−n = ε. (3.29)

    Hence A ∈L (Rd) as well. �

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 25

    For the proof of the remaining parts, we need some lemmas:

    Lemma 3.11 If O⊂ Rd is open then there are {Ii} ⊂I disjoint such that O =⋃

    i∈N Ii.

    Proof. This is based on the use of dyadic partitions of Rn. Let

    Dn :={(

    k12−n,(k1 +1)2−n]×·· ·×

    (kd2−n,(kd +1)2−n

    ]: k1, . . . ,kd ∈ Z

    }. (3.30)

    We call elements of Dn the dyadic (half-open) cubes of base 2−n. These are rather special because,for any m,n ∈ N with m≤ n we have

    I ∈Dm, J ∈Dn ⇒ J ⊆ I OR I∩ J = /0. (3.31)This is because the partitions induced by Dn and Dm are nested.

    Now pick O open and define a sequence of sets An inductively as follows: Set A1 := O and,given An, let An+1 be the set of all x ∈ An such that if I ∈ Dn+1 obeys x ∈ I, then An r I 6= /0. (Inother words, An+1 is An with all I ∈Dn+1 that fit entirely into An removed.) Then

    An rAn−1 =m(n)⋃i=1

    Jn,i (3.32)

    for some Jn,i ∈Dn with {Jn,i : n∈N, i= 1, . . . ,m(n)} disjoint. Obviously, An decreases with n; weclaim that An ↓ /0. Indeed, with each x ∈O, the set O contains an open Euclidean ball containing xand thus also a dyadic cube I ∈Dn+1 with x ∈ I, for some n ∈ N. By (3.31), either this I is eitherentirely contained in one of dyadic cubes constituting OrAn, or is disjoint from all of them. Inthe former case x ∈ OrAn, in the latter case x ∈ ArAn+1. Hence,

    O = A1 =⋃n≥2

    (An rAn−1) =⋃n≥2

    m(n)⋃i=1

    Jn,i. (3.33)

    Since the dyadic cubes on the right are by construction disjoint, the claim follows. �

    The other lemma gives a statement concerning additivity of the outer measure λ ?. We notethat outer measures are generally not additive — i.e., we may have λ ?(A∪B) < λ ?(A)+λ ?(B)even if A,B⊂Rd obey A∩B = /0. However, if we assume that A and B are separated by a positivedistance, then additivity holds:

    Lemma 3.12 Let ρ(A,B) := inf{|x− y|∞ : x ∈ A, y ∈ B} where |x− y|∞ := maxi=1,...,d |xi− yi|.Then for all A,B⊂ Rd ,

    ρ(A,B)> 0 ⇒ λ ?(A∪B) = λ ?(A)+λ ?(B). (3.34)

    Proof. In light of subadditivity of λ ?, we only need to show “≥” which in turn permits us toassume λ ?(A∪B)< ∞. We will use the fact that, by Lemma 2.4, given any δ > 0, if every cover{Ii} of a set by half-open boxes can be refined into a cover by boxes whose every side is lessthan δ without changing ∑i∈N |Ii|.

    Thus, pick δ := 13 ρ(A,B) and let {Ii} be a cover of A∪B by half-open boxes (or empty sets)with the largest side at most δ so that

    λ ?(A∪B)+ ε ≥ ∑i∈N|Ii|. (3.35)

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 26

    Define S := {i ∈ N : Ii∩A 6= /0} and note that then A⊂⋃

    i∈S Ii. By our choice of δ , i ∈ S impliesIi∩B = /0 and so also B⊂

    ⋃i∈NrS Ii. Hence, by the definition of λ ? and (3.35),

    λ ?(A)+λ ?(B)≤∑i∈S|Ii|+ ∑

    i∈NrS|Ii|= ∑

    i∈N|Ii| ≤ λ ?(A∪B)+ ε. (3.36)

    Since ε was arbitrary, the claim thus follows. �We may now finish the proof of Proposition 3.10:

    Proof of (3). Pick C ⊂ Rd closed and assume, for the start, that C is also bounded (and thuscompact). If O is an open set such that O⊃C, then also OrC is open. By Lemma 3.11 there aredisjoint {Ii} ⊂I so that

    OrC =⋃i∈N

    Ii. (3.37)

    By Corollary 3.6,λ ?(OrC) = ∑

    i∈N|Ii|. (3.38)

    Next let ε > 0 and let I′i ∈I be such that I′i ⊂ Ii and |I′i |+ε2−i ≤ |Ii|. By the Bolzano-Weierstrasstheorem, two compact sets A,B ⊂ Rd with A∩B = /0 necessarily obey ρ(A,B) > 0, and since afinite union of compact sets is compact,

    ρ(

    C,n⋃

    i=1

    I′i)> 0, n ∈ N. (3.39)

    Corollary 3.6 gives λ ?(⋃n

    i=1 I′i ) = ∑

    ni=1 |I′i | and, by (3.37), the monotonicity of λ ?, Lemma 3.12

    and the construction of {I′i} we thus get

    λ ?(O)≥ λ ?(

    C∪n⋃

    i=1

    I′i)

    = λ ?(C)+λ ?( n⋃

    i=1

    I′i)= λ ?(C)+

    n

    ∑i=1|I′i |

    ≥ λ ?(C)− ε +n

    ∑i=1|Ii|.

    (3.40)

    With the help of (3.38), the limits n→ ∞ and ε ↓ 0 yield

    λ ?(OrC) = ∑i∈N|Ii| ≤ λ ?(O)−λ ?(C). (3.41)

    Since C is bounded, λ ?(C) < ∞. By Lemma 3.8 for each ε > 0 there is O ⊃ C open so thatλ ?(O) ≤ λ ?(C)+ ε . The above then shows λ ?(OrC) < ε and so C ∈L (Rd). This proves (3)for every C compact. As every closed set in Rd can be written as a countable union of compactsets, (2) extends this to general closed sets as well. �

    Proof of (4). Let A⊂Rd be such that λ ?(A) = 0. Since Rd is open, the infimum in Lemma 3.8 isnot over an empty set. Hence, for each ε > 0 there is an O ⊃ A open such that λ ?(O) < ε . Butλ ?(OrA)≤ λ ?(O) and so A ∈L (Rd) as claimed. �

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 27

    Proof of (5). Let A ∈ L (Rd). Then there are On ⊃ A open with λ ?(On r A) → 0. DefineB :=

    ⋃n∈N O

    cn and note that A

    crB⊂AcrOcn for each n∈N. But A⊂On implies AcrOcn =OnrAand so

    λ ?(Ac rB)≤ λ ?(Ac rOcn) = λ ?(On rA) −→n→∞ 0. (3.42)

    Hence λ ?(Ac rB) = 0 and, by (4), Ac rB ∈ L (Rd). But B ∈ L (Rd) as well by (1,2,3) and,since B⊂ Ac, also Ac = B∪Ac ∈L (Rd) by (2). �

    Since /0 and Rd are automatically open, Proposition 3.10 shows that the structure of L (Rd)fits into the following important concept:

    Definition 3.13 A class M of subsets of a set X is a σ -algebra if

    (1) /0,X ∈M ,(2) A ∈M ⇒ Ac ∈M ,(3) {An} ∈M ⇒

    ⋃n∈N An ∈M .

    Note that this specializes the concept of an algebra, introduce earlier in these notes, by requir-ing that also countable unions of sets in M are contained in M . Obviously, M is then closedalso under countable intersection and set differences.

    An important question is whether the characterization we used to define L (Rd) is unique.Consider the following classes of sets:

    Fσ (Rd) :={⋃

    n∈NKn : Kn ⊂ Rd , closed

    }(3.43)

    and

    Gδ (Rd) :={⋂

    n∈NOn : On ⊂ Rd , open

    }(3.44)

    As a simple consequence of Proposition 3.10, we get:

    Corollary 3.14 (Alternative definitions of Lebesgue measurable sets) A ∈L (Rd) is equivalentto any of these three statements:

    (1) For each ε > 0 there is K ⊂ A closed so that λ ?(ArK)< ε .(2) There is B ∈ Gδ (Rd) such that B⊃ A and λ ?(BrA) = 0.(3) There is C ∈Fσ (Rd) such that C ⊂ A and λ ?(ArC) = 0.

    Proof. (1) If O⊃ Ac is open then K := Oc is closed with K ⊂ A and ArK = OrAc. For (2) usethe argument from the proof (5) of Proposition 3.10; (3) then follows by complementation. �

    3.3 Lebesgue measure.

    We now move on to defining the Lebesgue measure. Here is a general definition:

    Definition 3.15 (General measure) Let M be a σ -algebra of subsets of a set X . The mapµ : M → [0,∞] is a measure if

    (1) µ( /0) = 0, and

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 28

    (2) if {An} ⊂M are disjoint, then

    µ(⋃

    n∈NAn)= ∑

    n∈Nµ(An). (3.45)

    Condition (2) is referred to as countable subadditivity of µ .

    The main purpose of condition (1) is to ensure that µ is not identically infinity on M . (Notethat, by (2), µ( /0) can only take values 0 or ∞.) Going back to our main line of reasoning, we nowclaim:

    Proposition 3.16 The restriction of λ ? to L (Rd) is a measure.

    Proof. We have λ ?( /0) = 0 so we only need to show that λ ? acts additively on countable unionsof disjoint sets from L (Rd). First we prove finite additivity for bounded sets. Let A,B ∈L (Rd)be bounded. By Corollary 3.14, for each ε > 0 there are closed sets K1 ⊂ A and K2 ⊂ B (thuscompact) such that λ ?(ArK1),λ ?(BrK2) < ε . If also A∩B = /0, the fact that K1 and K2 arecompact ensures that ρ(K1,K2)> 0. Hence

    λ ?(K1∪K2) = λ ?(K1)+λ ?(K2) (3.46)

    by Lemma 3.12. But λ ?(A)≤ λ ?(K1)+ε and λ ?(B)≤ λ ?(K2)+ε by subadditivity of λ ? and so

    λ ?(A∪B)≥ λ ?(K1∪K2)≥ λ ?(A)+λ ?(B)−2ε. (3.47)

    By subadditivity again, λ ? is additive on finite unions of bounded disjoint sets.Now consider a countable collection {An} ⊂L (Rd) of bounded disjoint sets. Then

    λ ?(⋃

    n∈NAn)≥ λ ?

    ( n⋃i=1

    Ai)=

    n

    ∑i=1

    λ ?(Ai)n→∞ ∑

    n∈Nλ ?(An) (3.48)

    where we used finite additivity to get the middle equality. Hence, λ ? is additive on countableunions of disjoint bounded sets. To remove the boundedness restriction, let D be a partitionof Rd into shifts of (0,1]d by integers in each lattice direction. If {Dm} enumerates D and{An} ⊂L (Rd) is a disjoint collection, then

    λ ?(⋃

    n∈NAn)= λ ?

    ( ⋃m,n∈N

    (An∩Dm))

    = ∑m,n∈N

    λ ?(An∩Dm) = ∑n∈N

    ∑m∈N

    λ ?(An∩Dm)

    = ∑n∈N

    λ ?(⋃

    n∈N(An∩Dm)

    )= ∑

    n∈Nλ ?(An)

    (3.49)

    where the second and fourth equality follow because {(An ∩Bm) : m,n ∈ N} is a countable col-lection of disjoint bounded sets, the third equality is a consequence of (3.5) and the last equalityfollows because {Dm : m ∈ N} form a partition of Rd . �

    Definition 3.17 (Lebesgue measure) Let us write λ (A) for λ ?(A) whenever A ∈L (Rd). Wewill call λ (A) the Lebesgue measure of A.

    Returning to the concept of inner measure for bounded sets, we now observe:

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 29

    Proposition 3.18 Let A⊂ Rd be bounded. Then A ∈L (Rd) if and only if λ ?(A) = λ?(A).

    Proof. Homework exercise. �

    The Lebesgue measure has a lot of symmetries. In particular, we have:

    Proposition 3.19 (Translation invariance) For any A⊂ Rd and z ∈ Rd , define

    z+A := {z+ x : x ∈ A} (3.50)

    Then for all A ∈L (Rd) and all z ∈ Rd ,

    z+A ∈L (Rd) and λ (z+A) = λ (A). (3.51)

    Proof. Homework exercise (easy). �

    The Lebesgue measure is further invariant under rotations (i.e., linear maps induced by orthog-onal matrices) and transforms nicely under dilations. We will revisit this question later.

    3.4 Convergence Theorems.

    As a corollary to the definition of measure, we get useful convergence theorems. (We state thesefor a general measure as no other properties are needed.) The first theorem deals with mono-tone sequences. Recall that we write An ↑ A whenever n 7→ An is increasing and A =

    ⋃n∈N An.

    Similarly, we write An ↓ A whenever n 7→ An is decreasing and A =⋂

    n∈N An.

    Theorem 3.20 (Monotone Convergence Theorem for sets) Let µ be a measure on σ -algebra M .Let {An} ∈M . Then we have:

    (1) If An ↑ A, then µ(An) ↑ µ(A).(2) If An ↓ A and µ(A1)< ∞, then µ(An) ↓ µ(A).

    Proof. If An ↑ A then {An rAn−1 : n ∈ N}, where A0 := /0, are disjoint with

    An =n⋃

    k=1

    (Ak rAk−1) and A =⋃k∈N

    (Ak rAk−1). (3.52)

    Therefore,

    µ(An) =n

    ∑k=1

    µ(Ak rAk−1) −→n→∞ ∑

    k∈Nµ(Ak rAk−1) = µ(A). (3.53)

    This proves (1).For (2), let An ↓ A. Then A1 rAn ↑ A1 rA and so, by µ(A1)< ∞,

    µ(An) = µ(A1)−µ(A1 rAn) −→n→∞

    µ(A1)−µ(A1 rA) = µ(A). (3.54)

    where the limit follows by part (1). �

    Our next convergence result addresses the situation when the sequence of sets is not necessarilymonotone. Given a collection of sets {An}, we define

    limsupn→∞

    An :=⋂n≥1

    ⋃k≥n

    Ak (3.55)

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 30

    andliminf

    n→∞An :=

    ⋃n≥1

    ⋂k≥n

    Ak (3.56)

    Obviously, liminfn→∞ An ⊂ limsupn→∞ An and so we saylimn→∞

    An exists if limsupn→∞

    An = liminfn→∞

    An (3.57)

    The limit is the the common value of the limes inferior and the limes superior. As is easy tocheck, this subsumes the aforementioned case when {An} is either increasing or decreasing.

    Lemma 3.21 (Fatou’s lemma for sets) Let µ be a measure on a σ -algebra M . Let {An} ⊂M .Then liminfn→∞ An ⊂M and

    µ(liminf

    n→∞An)≤ liminf

    n→∞µ(An). (3.58)

    Proof. For each m≥ n we have Am ⊃⋂

    k≥n Ak. Hence,

    infm≥n

    µ(Am)≥ µ(⋂

    k≥nAk). (3.59)

    Since⋂

    k≥n Ak ↑ liminfn→∞ An, the claim follows from the Upward Monotone Convergence The-orem for sets. �

    Due to the asymmetry between the Upward and Downward version of the Monotone Conver-gence Theorem for sets, the corresponding theorem for limes superior requires bounded mea-sures:

    Lemma 3.22 (Reverse Fatou’s lemma for sets) Let µ be a measure on a σ -algebra M . Let{An} ⊂M . Then limsupn→∞ An ⊂M and, if there is B ∈M with µ(B) < ∞ and B ⊃ An foreach n, then

    µ(limsup

    n→∞An)≤ limsup

    n→∞µ(An). (3.60)

    Proof. We havelimsup

    n→∞An = Br liminf

    n→∞(BrAn) (3.61)

    Once µ(B)< ∞, the result follows by complementation from Fatou’s lemma. �When the limit of An exists, these simplify into:

    Theorem 3.23 (Dominated Convergence Theorem for sets) Let µ be a measure on σ -algebra M .Let {An} ⊂M be such that limn→∞ An exists. Then limn→∞ An ∈M and if there is a B ∈M withµ(B)< ∞ and B⊃ An for all n, we have

    µ(

    limn→∞

    An)= lim

    n→∞µ(An). (3.62)

    (In particular, the limit on the right exists, albeit only in [0,∞].)

    Proof. Apply Fatou and reverse Fatou to get

    limsupn→∞

    µ(An)≤ µ(

    infn→∞

    An)≤ liminf

    n→∞µ(An). (3.63)

    Hence the limit of µ(An) exists and equals the measure of the limit of An. �

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 31

    We remark that, perhaps with the exception of the Monotone Convergence Theorem, the aboveresults are far more frequently used in their versions for integrals of measurable functions.

    3.5 Nonmeasurable sets.

    We have seen that all open and closed sets are Lebesgue measurable, and so are the sets inFσ (Rd) and Gδ (Rd) — and in particular, also the counterexample to Peano-Jordan measurabilityfrom the proof of Lemma 2.18. In fact, for any set A ⊂L (Rd), if we denote by

    Aσ :={⋃

    n∈NAn : An ∈A

    }(3.64)

    and

    Aδ :={⋂

    n∈NOn : An ∈A

    }(3.65)

    the sets in Aδ ,Aσ ,Aδσ := (Aδ )σ , Aσδ := (Aσ )δ ), Aσδσ := (Aσδ )σ , etc., all lie in L (Rd). Aquestion thus arises whether there are in fact any sets that are not Lebesgue measurable. This wasresolved rather soon after Lebesgue set up his theory:

    Theorem 3.24 (Vitali) Assuming the Axiom of Choice, there is A⊂ [0,1] such that A 6∈L (Rd).

    Here we recall:

    Axiom of Choice For any set X there is a map f : 2X → X so that f (A) ∈ A for every A ∈ 2X .

    In words, the Axiom of Choice guarantees that from each subset A of X we can choose arepresentative element f (x) that, naturally, belongs to A. This axiom of set theory (not the partof the standard ZF system) is often invoked in analysis where a proof without it is not known (orknown not to be possible, as is the example of Vitali’s theorem).Proof of Theorem 3.24. Define an equivalence relation ∼ on R by saying x ∼ y if and onlyif x− y ∈ Q. Denoting by [x] := {y ∈ R : y ∼ x} the equivalence class containing x, we haveR =

    ⋃x∈R[x]. The Axiom of Choice guarantees the existence of a map f : {[x] : x ∈ R} → R so

    that f ([x])∈ [x]. By taking this map modulo integers, we may (and will) assume that f ([x])∈ [0,1)for all x ∈ R.

    Let A := { f (x) : x ∈ R}. Then A⊂ [0,1). Writing q+A := {q+ x : x ∈ A}, we now claim

    ∀q,q′ ∈Q : (q+A)∩ (q′+A) 6= /0 ⇔ q = q′ (3.66)

    Indeed, if x,y ∈ A are such that x+q = y+q′ for q,q′ ∈ Q, then y ∈ [x]. But A contains at mostone representative of [x] and so y = x. Then also q = q′. (The other direction is trivial.)

    From the above we know that {q+A : q ∈Q} are disjoint. Next we observe⋃q∈Q

    (q+A) = R. (3.67)

    This is because each y ∈ R belongs to [x] for some x ∈ A. Finally, we note that, since A⊂ [0,1],⋃q∈[0,1]∩Q

    (q+A)⊂ [0,2]. (3.68)

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 32

    Now let us assume that A ∈L (R) and so λ (A) is meaningful. Then Proposition 3.19 tells us thatalso q+A ∈L (R) and

    λ (q+A) = λ (A) (3.69)for all q ∈Q. So, in light of (3.68), (3.66) and

    2 = λ ([0,2])≥ ∑q∈[0,1]∩Q

    λ (q+A) = ∑q∈[0,1]∩Q

    λ (A), (3.70)

    we must have λ (A) = 0. However, (3.67) gives

    λ (R) = ∑q∈Q

    λ (q+A) = ∑q∈Q

    λ (A) = 0 (3.71)

    a contradiction with λ (R) = ∞. Hence A 6∈L (R) after all. �It is known that the Axiom of Choice cannot be left out from the assumptions. In fact, there

    are set theories without the Axiom of Choice where all subsets of [0,1] are measurable.

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 33

    4. LEBESGUE INTEGRAL: BOUNDED CASEHaving developed the concept of the Lebesgue measure, we will now move to defining theLebesgue integral. It would be a waste to restrict attention to integrals with respect to theLebesgue measure only; indeed, working with general measure adds only little extra work. (Wewill still call the integral the Lebesgue integral, though.) The theory is best appreciated if startedfrom integrating bounded functions on sets of finite measure.

    4.1 Bounded functions on finite measure spaces.

    Recall the definitions of the σ -algebra and general measure from Definitions 3.13 and 3.15. Westart with some standard notions:

    Definition 4.1 A pair (X ,F ), where F is a σ -algebra of subsets of X , is called a measurablespace. If µ is a measure on (X ,F ), we call (X ,F ,µ) a measure space. The measure space(X ,F ,µ) is finite if µ(X)< ∞. It is a probability space if µ(X) = 1.

    Let (X ,F ) be a measurable space and consider a bounded function f : X → R. The definitionof the Lebesgue integral we will give is similar to Darboux’ version of Riemann’s integral: Wewill approximate f from below and above by “piece-wise” constant functions — to be calledsimple functions — for which the integral is canonically defined. If the resulting upper and lower(Lebesgue) integrals coincide, we proclaim the common value the (Lebesgue) integral of f . Thesimple functions are defined as follows:

    Definition 4.2 Let (X ,F ) be a measurable space. A map φ : X → R is a simple function ifthere are disjoint sets {A1, . . . ,An} ⊂F and values a1, . . . ,an ∈ R such that

    φ =n

    ∑i=1

    ai1Ai . (4.1)

    Obviously, we may even require that the values {ai} are distinct and non-zero. Under thisrestriction, the representation (4.1) of φ is unique. The integral of simple functions with respectto a measure µ on (X ,F ) is then defined canonically by∫

    φ dµ :=n

    ∑i=1

    ai µ(Ai). (4.2)

    Here we see the reason for requiring the Ai’s to be measurable.The distinctness restriction on {ai} are sometimes inconvenient to impose. However, without

    this restriction the representation of the simple function is no longer unique and one then needsto check that the integral does not depend on the representation. It suffices to prove:

    Lemma 4.3 (Independence of representation) Suppose µ(X) < ∞ and consider a collection ofnumbers a1, . . . ,an,b1, . . . ,bm ∈ R and setsA1, . . . ,An,B1, . . . ,Bm ∈ F be such that both fami-lies {Ai} and {B j} are disjoint and {b j} are distinct. If

    n

    ∑i=1

    ai1Ai(x) =m

    ∑j=1

    b j1B j(x), x ∈ X , (4.3)

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 34

    then alson

    ∑i=1

    ai µ(Ai) =m

    ∑j=1

    b j µ(B j). (4.4)

    In particular, (4.2) applies whenever (4.1) holds, regardless of distinctness restriction on {ai}.

    Proof. A term with ai = 0 or b j = 0 can be left out of the expression so let us also assume thata1, . . . ,an,b1, . . . ,bm 6= 0. Then (4.3) shows

    ⋃ni=1 Ai =

    ⋃mj=1 B j. Hence {Ai} and {B j} are both

    disjoint partitions of this union and so

    Ai =m⋃

    j=1

    (Ai∩B j) and B j =n⋃

    i=1

    (Ai∩B j). (4.5)

    Since alsoAi∩B j 6= /0 ⇒ ai = b j (4.6)

    we thus haven

    ∑i=1

    ai µ(Ai) =n

    ∑i=1

    m

    ∑j=1

    ai µ(Ai∩B j)

    =m

    ∑j=1

    n

    ∑i=1

    b j µ(Ai∩B j),=m

    ∑j=1

    b j µ(B j)(4.7)

    where we also used finite additivity of µ . �The integral in (4.2) has the following immediate properties:

    Lemma 4.4 Suppose µ(X)< ∞. Then we have:(1) If φ1, φ2 are simple then so is φ1 +φ2 and∫

    (φ1 +φ2)dµ =∫

    φ1 dµ +∫

    φ2 dµ. (4.8)

    (2) If φ is simple and c ∈ R, then cφ is simple and∫(cφ)dµ = c

    ∫φ dµ. (4.9)

    (3) If φ1, φ2 are simple with φ1 ≤ φ2, then∫φ1 dµ ≤

    ∫φ2 dµ. (4.10)

    (4) If φ is simple, then ∣∣∣∫ φ dµ∣∣∣≤ (supx∈X|φ(x)|

    )µ(X). (4.11)

    Proof. Property (2) is immediate from the definition of the integral. An important technical pointfor the proof of (1) and (3) is that if φ1 and φ2 are simple functions then there are disjoint sets{Ai} and (not necessarily distinct) collections of numbers a1, . . . ,an and b1, . . . ,bn such that

    φ1 =n

    ∑i=1

    ai 1Ai and φ2 =n

    ∑i=1

    bi 1Ai . (4.12)

    245A lecture notes ( c©2017 M. Biskup) Preliminary version (typeset: December 29, 2019)

  • 35

    Using this representation of both simple functions, to get (1) it then suffices to note that φ1+φ2 =∑ni=1(ai +bi)1Ai and so∫

    (φ1 +φ2)dµ =n

    ∑i=1

    (ai +bi)µ(Ai)

    =n

    ∑i=1

    ai µ(Ai)+n

    ∑i=1

    bi µ(Ai) =∫

    φ1 dµ +∫

    φ2 dµ,(4.13)

    where in the first and third inequality we used Lemma 4.3.For (3) we in turn note that, for the representation as in (4.12), φ1≤ φ2 implies ai≤ bi whenever

    Ai 6= /0 and so (again with the help of Lemma 4.3),∫φ1 dµ =

    n

    ∑i=1

    ai µ(Ai)≤n

    ∑i=1

    bi µ(Ai) =∫

    φ2 dµ. (4.14)

    Finally, for property (4) we use the fact that φ1 :=(supx∈X |φ(x)|

    )is simple and −φ1 ≤ φ ≤ φ1.

    So, by (2) and (3) we have

    −(supx∈X|φ(x)|

    )µ(X) =

    ∫(−φ1)dµ ≤

    ∫φ dµ ≤

    ∫φ1 dµ =