  • MA40042 Measure Theory and Integration

    class notes prepared by Dr. Antal A. Jarai

    December 9, 2014

    Contents

    0 Introduction
        0.1 Basic information
        0.2 Conventions regarding infinities

    1 Systems of sets
        1.1 $\sigma$-algebras
        1.2 Some easy properties
        1.3 Examples and non-examples
        1.4 Algebras
        1.5 Generated $\sigma$-algebras
        1.6 Borel sets

    2 Measures
        2.1 Definition of measure
        2.2 Some simple properties
        2.3 Some simple examples

    3 Construction of measures, Lebesgue measure in $\mathbb{R}^d$
        3.1 Volume in $\mathbb{R}^d$, the overall idea
        3.2 Boxes
        3.3 Semi-algebras
        3.4 Generated algebra
        3.5 Volume of boxes in $\mathbb{R}^d$
        3.6 Extension from a semi-algebra to the generated algebra
        3.7 Pre-measures
        3.8 Verifying the conditions for boxes
        3.9 $\sigma$-finiteness
        3.10 Carathéodory's extension theorem
        3.11 Dynkin's lemma ($\pi$-$\lambda$ theorem)
        3.12 Uniqueness lemma
        3.13 Construction of the extension, Outer measures
        3.14 Example of a set that is not Lebesgue measurable

    4 Measurable functions and their properties
        4.1 Open sets in $[-\infty,\infty]$
        4.2 Continuous functions
        4.3 Measurable functions
        4.4 Compositions of functions
        4.5 Borel functions
        4.6 Limits of measurable functions
        4.7 Simple functions
        4.8 Monotone class theorem

    5 Abstract integration theory (Lebesgue integral)
        5.1 Arithmetic in $[0,\infty]$
        5.2 Integration of non-negative simple functions
        5.3 Integration of non-negative functions
        5.4 Basic properties of the integral
        5.5 Integral as a new measure
        5.6 Monotone Convergence Theorem
        5.7 Sums of non-negative series
        5.8 Fatou's Lemma
        5.9 Density functions
        5.10 Integration of signed functions
        5.11 Linearity of the integral
        5.12 Dominated Convergence Theorem
        5.13 The role of sets of measure 0
        5.14 Completion
        5.15 Series absolutely convergent in $L^1(\mu)$
        5.16 Examples of a.e.-type conclusions

    6 Inequalities and $L^p$ spaces
        6.1 Convex functions
        6.2 Jensen's inequality
        6.3 Examples
        6.4 Hölder's inequality; Minkowski's inequality
        6.5 $L^p$-spaces
        6.6 Completeness of $L^p(\mu)$

    7 Product spaces and Fubini's theorem
        7.1 Product measure
        7.2 Example
        7.3 Fubini's Theorem
        7.4 Important counterexamples
        7.5 Convolutions

    8 Applications to probability
        8.1 Product spaces
        8.2 Independence
        8.3 Conditional expectation

    A Appendix

    0 Introduction

    0.1 Basic information

    These are class notes for the unit MA40042 Measure Theory and Integration. They combine some of the material (adapted to the level of prerequisites students in MA40042 have) from the following two sources:

    1. W. Rudin, Real and complex analysis, third edition, McGraw-Hill, New York, 1987.

    2. R. Durrett, Probability: theory and examples, fourth edition, Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge Univ. Press, Cambridge, 2010.

    These notes are a summary of the lectures and their goal is to supplement the lectures by recording the essentials. Motivation, further detail and explanation may be given in the lectures, in particular, if there are student questions.

    Numbering conventions: If there is only a single theorem (definition, lemma, etc.) in Subsection X.Y, then in later sections this theorem is referred to as Theorem X.Y. The theorem does not carry a separate number in Subsection X.Y itself. If there are, say, three theorems in Subsection X.Y, then in later subsections these are referred to as Theorems X.Y.1, X.Y.2 and X.Y.3. In Subsection X.Y itself, the theorems are simply numbered as Theorems 1, 2 and 3.

    In text exercises: Some statements made in the text are followed by (HW). This means they are left as exercises for you to do. Some of these will appear on problem sheets, but not all of them. You will be expected to know how to do them.

    Potential typos: Unavoidably, some typos will have crept in. Please ask if anything is not clear.

    Non-examinable: Any part that is non-examinable is clearly marked as such, printed in smaller font size and indented. All else is examinable.

    0.2 Conventions regarding infinities

    A number of conventions regarding infinities will be very useful in measure theory. This subsection summarizes them. [Note: the conventions are restricted to this subject only. Do not try to use them without care in other units.]

    When discussing measures and integrals, one inevitably encounters $\infty$. The area or volume of a set may be infinite, or the integral of a function may be $+\infty$ or $-\infty$. Also, although we will mainly be interested in functions that take real values, at some points it may be reasonable to define the function to take an infinite value. This is often the case when we take limits or form infinite sums of functions. We need precise definitions to deal with such cases unambiguously.

    Arithmetic in $[0,\infty]$. If $0 \le a \le \infty$, we define:

    \[ a + \infty = \infty + a = \infty; \]

    and

    \[ a \cdot \infty = \infty \cdot a = \begin{cases} \infty & \text{if } 0 < a \le \infty; \\ 0 & \text{if } a = 0. \end{cases} \]

    To illustrate in an example why it will be convenient to define $0 \cdot \infty = 0$, think about the set in the two-dimensional plane consisting of the points on the $x$-axis. This set has infinite length in the $x$-direction, and $0$ width in the $y$-direction, and it has $0$ area, correspondingly.
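The conventions above differ from ordinary floating-point arithmetic in one place, and a small sketch can make that visible. The helper names `ext_add` and `ext_mul` below are hypothetical and chosen only for illustration; the sketch assumes both arguments lie in $[0,\infty]$.

```python
import math

def ext_add(a, b):
    """Addition in [0, inf]; plain float addition already matches the convention."""
    return a + b

def ext_mul(a, b):
    """Multiplication in [0, inf] using the measure-theory convention 0 * inf = 0."""
    if a == 0 or b == 0:
        return 0.0
    return a * b

# The x-axis viewed as a "rectangle": infinite length times zero width.
area_of_x_axis = ext_mul(math.inf, 0.0)

# IEEE floating point does not follow the convention: 0.0 * inf is NaN there.
ieee_product = 0.0 * math.inf
```

The explicit zero check is needed precisely because the floating-point product of zero and infinity is undefined (NaN), whereas the convention of this subsection sets it to zero.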

    Warning: Careful with cancellations! Suppose $a, b, c \in [0,\infty]$. If $a + b = a + c$, this implies the statement $b = c$, provided that $a < \infty$. If $ab = ac$, this implies the statement $b = c$, provided $0 < a < \infty$.

  • is, when a
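    $\mathcal{M}$ is a $\sigma$-algebra.

    (b) Non-example: $X = \{1, 2, 3, \dots\}$,

    \[ \mathcal{A} = \{A \subset X : A \text{ is finite or } X \setminus A \text{ is finite}\}. \]

    $\mathcal{A}$ is not a $\sigma$-algebra: $\{1\}, \{3\}, \{5\}, \dots \in \mathcal{A}$, but their union $\{1, 3, 5, \dots\} \notin \mathcal{A}$.

    (c) Example: $X$ is an uncountable set,

    \[ \mathcal{M} = \{A \subset X : A \text{ is countable or } X \setminus A \text{ is countable}\}. \]

    $\mathcal{M}$ is a $\sigma$-algebra (HW).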

    1.4 Algebras

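    Definition. A collection $\mathcal{A}$ of subsets of $X$ is called an algebra in $X$, if it has the following properties.
    (i) $X \in \mathcal{A}$;
    (ii) if $A \in \mathcal{A}$ then also $X \setminus A = A^c \in \mathcal{A}$;
    (iii) if $A_1, \dots, A_n \in \mathcal{A}$, then also $\bigcup_{k=1}^n A_k \in \mathcal{A}$.

    Example 1.3(b) is an algebra. Any $\sigma$-algebra is also an algebra (HW).

    1.5 Generated $\sigma$-algebras

    Theorem. If $\mathcal{C}$ is any collection of subsets of $X$, then there is a smallest $\sigma$-algebra $\mathcal{M}$ in $X$, such that $\mathcal{C} \subset \mathcal{M}$.

    $\mathcal{M}$ is called the $\sigma$-algebra generated by $\mathcal{C}$.
    Notation: $\mathcal{M} = \sigma(\mathcal{C})$.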

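    Proof of Theorem. Let $\mathcal{M}$ be the intersection of all $\sigma$-algebras that contain $\mathcal{C}$. (We know there is at least one such $\sigma$-algebra, namely the one containing all subsets of $X$.) Then clearly $\mathcal{C} \subset \mathcal{M}$. We show that $\mathcal{M}$ is itself a $\sigma$-algebra. For this, we show the more general statement that if $\{\mathcal{M}_\alpha\}_{\alpha \in I}$ is any family of $\sigma$-algebras in $X$, then $\bigcap_{\alpha \in I} \mathcal{M}_\alpha$ is also a $\sigma$-algebra in $X$. This statement then can be applied to the family of all $\sigma$-algebras containing $\mathcal{C}$. (Here $I$ is an arbitrary non-empty index set, possibly uncountable.)

    Let $\mathcal{F} = \bigcap_{\alpha \in I} \mathcal{M}_\alpha$, and we check that $\mathcal{F}$ satisfies the requirements of Definition 1.1(i)-(iii).

    (i): $X \in \mathcal{M}_\alpha$ for all $\alpha \in I$, because each $\mathcal{M}_\alpha$ is a $\sigma$-algebra. Therefore $X \in \bigcap_{\alpha \in I} \mathcal{M}_\alpha = \mathcal{F}$.

    (ii): $A \in \mathcal{F} \implies A \in \mathcal{M}_\alpha,\ \forall \alpha \in I \implies A^c \in \mathcal{M}_\alpha,\ \forall \alpha \in I \implies A^c \in \bigcap_{\alpha \in I} \mathcal{M}_\alpha = \mathcal{F}$.

    (iii): $A_1, A_2, \dots \in \mathcal{F} \implies A_1, A_2, \dots \in \mathcal{M}_\alpha,\ \forall \alpha \in I \implies \bigcup_{n=1}^\infty A_n \in \mathcal{M}_\alpha,\ \forall \alpha \in I \implies \bigcup_{n=1}^\infty A_n \in \bigcap_{\alpha \in I} \mathcal{M}_\alpha = \mathcal{F}$.

    We get that $\mathcal{F}$ is a $\sigma$-algebra. In particular, our $\mathcal{M}$ is a $\sigma$-algebra. It is clear that $\mathcal{M}$ is necessarily the smallest $\sigma$-algebra containing $\mathcal{C}$.

    (HW): (i) If $\mathcal{M}_1 \subset \mathcal{M}_2 \subset \dots$ are $\sigma$-algebras in $X$, then $\bigcup_{n=1}^\infty \mathcal{M}_n$ is always an algebra in $X$, but it may not be a $\sigma$-algebra.
    (ii) The smallest $\sigma$-algebra that contains all the $\mathcal{M}_n$'s is $\sigma\left(\bigcup_{n=1}^\infty \mathcal{M}_n\right)$, that is, in general, it is necessary to apply the $\sigma(\cdot)$ operation to the union.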
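On a finite set $X$ the generated $\sigma$-algebra can be computed by brute force, which may help build intuition for the abstract "intersection of all $\sigma$-algebras" definition. The following is only a sketch under that finiteness assumption (where closure under complements and pairwise unions already suffices); the function name `generated_sigma_algebra` is hypothetical.

```python
from itertools import combinations

def generated_sigma_algebra(X, C):
    """Smallest family containing C, closed under complement and union.

    Assumes X is finite, so countable unions reduce to finite ones.
    """
    X = frozenset(X)
    family = {frozenset(), X} | {frozenset(c) for c in C}
    changed = True
    while changed:
        changed = False
        current = list(family)
        # Close under complements.
        for A in current:
            comp = X - A
            if comp not in family:
                family.add(comp)
                changed = True
        # Close under pairwise unions (finite unions follow by iteration).
        for A, B in combinations(current, 2):
            U = A | B
            if U not in family:
                family.add(U)
                changed = True
    return family

sigma = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {2}])
```

In this example the generated family consists of all unions of the three "atoms" $\{1\}$, $\{2\}$ and $\{3,4\}$. Intersections need no separate step, since they are obtained from complements and unions by De Morgan's laws.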

    1.6 Borel sets

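    Definition 1.
    A set $U \subset \mathbb{R}^d$ is called open, if for all $x \in U$ there exists $r > 0$ such that $B(x, r) \subset U$. Here $B(x, r) = \{y \in \mathbb{R}^d : |y - x| < r\}$ is the open ball of radius $r$ centred at $x$.
    A set $F \subset \mathbb{R}^d$ is called closed, if $F^c$ is open.
    A set $K \subset \mathbb{R}^d$ is called compact, if any open cover of $K$ admits a finite subcover. [That is, if $K \subset \bigcup_{\alpha \in I} U_\alpha$, where the $U_\alpha$ are arbitrary open sets, then there exist $n \ge 1$ and $\alpha_1, \dots, \alpha_n \in I$ such that also $K \subset \bigcup_{i=1}^n U_{\alpha_i}$.]

    Theorem (Heine-Borel Theorem). (see Appendix) A set $K \subset \mathbb{R}^d$ is compact if and only if it is closed and bounded.

    Definition 2. We denote

    \[ \mathcal{G} := \text{collection of all open sets} = \{U : U \subset \mathbb{R}^d,\ U \text{ open}\}. \]

    We define

    \[ \mathcal{B} := \sigma(\mathcal{G}) = \text{smallest } \sigma\text{-algebra containing all open sets}. \]

    $\mathcal{B}$ is called the collection of Borel sets in $\mathbb{R}^d$.
    (HW): $\mathcal{B} = \sigma(\text{closed sets}) = \sigma(\text{compact sets})$.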

    2 Measures

    In this section, let (X,M) be a measurable space.

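    2.1 Definition of measure

    Definition.
    (a) A measure is a function $\mu$ defined on a $\sigma$-algebra $\mathcal{M}$ with the following properties:
    (i) $\mu : \mathcal{M} \to [0,\infty]$;
    (ii) $\mu(\emptyset) = 0$;
    (iii) If $A_1, A_2, \dots \in \mathcal{M}$ are pairwise disjoint then we have $\mu\left(\bigcup_{n=1}^\infty A_n\right) = \sum_{n=1}^\infty \mu(A_n)$ (countable additivity, $\sigma$-additivity).

    (b) The triple $(X, \mathcal{M}, \mu)$ is called a measure space, and the members of $\mathcal{M}$ are called measurable sets.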

    2.2 Some simple properties

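    Theorem. Let $\mu$ be a measure on $(X, \mathcal{M})$. Then we have:
    (i) If $A_1, \dots, A_n \in \mathcal{M}$ are pairwise disjoint, then

    \[ \mu(A_1 \cup \dots \cup A_n) = \mu(A_1) + \dots + \mu(A_n) \quad \text{(finite additivity).} \]

    (ii) If $A, B \in \mathcal{M}$, $A \subset B$, then $\mu(A) \le \mu(B)$ (monotonicity).
    (iii) If $A_1 \subset A_2 \subset \dots$, and $A = \bigcup_{n=1}^\infty A_n$, then

    \[ \mu(A) = \lim_{n \to \infty} \mu(A_n) \quad \text{(continuity for increasing unions).} \]

    (iv) If $A_1 \supset A_2 \supset \dots$, and $A = \bigcap_{n=1}^\infty A_n$, and $\mu(A_1) < \infty$, then $\mu(A) = \lim_{n \to \infty} \mu(A_n)$.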

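    (iii) Put

    \[ B_1 = A_1, \quad B_2 = A_2 \setminus A_1, \quad \dots, \quad B_n = A_n \setminus A_{n-1}, \quad \dots \]

    Then $B_1, B_2, \dots$ are pairwise disjoint and $\bigcup_{n=1}^\infty A_n = \bigcup_{n=1}^\infty B_n$. Therefore, we have

    \begin{align*}
    \mu(A) &= \mu\left(\bigcup_{n=1}^\infty A_n\right) = \mu\left(\bigcup_{n=1}^\infty B_n\right)
       \overset{\text{Defn 2.1(a)(iii)}}{=} \sum_{n=1}^\infty \mu(B_n) \\
     &= \lim_{n \to \infty} \sum_{k=1}^n \mu(B_k)
       \overset{\text{part (i)}}{=} \lim_{n \to \infty} \mu\left(\bigcup_{k=1}^n B_k\right)
       = \lim_{n \to \infty} \mu\left(\bigcup_{k=1}^n A_k\right) \\
     &= \lim_{n \to \infty} \mu(A_n),
    \end{align*}

    as required.

    (iv) Put $B_k = A_1 \setminus A_k$, $k = 1, 2, \dots$. Then $B_1 \subset B_2 \subset \dots$, and

    \[ \bigcup_{n=1}^\infty B_n = A_1 \cap \left(\bigcup_{n=1}^\infty A_n^c\right) = A_1 \cap \left(\bigcap_{n=1}^\infty A_n\right)^c = A_1 \cap A^c = A_1 \setminus A. \]

    Also, $\mu(B_n) = \mu(A_1) - \mu(A_n)$, since both $\mu(A_1)$ and $\mu(A_n)$ are finite. Therefore,

    \begin{align*}
    \mu(A_1) &= \mu(A) + \mu(A_1 \setminus A) \overset{\text{part (iii)}}{=} \mu(A) + \lim_{n \to \infty} \mu(B_n) \\
     &= \mu(A) + \lim_{n \to \infty} \left[\mu(A_1) - \mu(A_n)\right] = \mu(A) + \mu(A_1) - \lim_{n \to \infty} \mu(A_n).
    \end{align*}

    We may cancel $\mu(A_1)$ from both sides, since $\mu(A_1) < \infty$.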

    is called counting measure on $X$.

    (b) Point mass (Dirac measure). $X$ any set, $\mathcal{M}$ = all subsets of $X$, $x_0 \in X$ fixed.

    \[ \delta_{x_0}(A) = \begin{cases} 1 & \text{if } x_0 \in A; \\ 0 & \text{if } x_0 \notin A. \end{cases} \]

    $\delta_{x_0}$ is called the point mass at $x_0$, or Dirac measure at $x_0$.

    (c) Discrete measures. $X$ is a countable set, $\mathcal{M}$ = all subsets of $X$, $m_i \ge 0$, $i \in X$ given numbers (weights).

    \[ \mu(A) = \sum_{i \in A} m_i. \]

    (HW): Check that (a), (b), (c) are indeed measures.

    Whenever $(X, \mathcal{M}, \mu)$ is a measure space and $\mu(X) = 1$, we say that $\mu$ is a probability measure. (b) is a probability measure, and whenever $\sum_{i \in X} m_i = 1$, so is (c).
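Examples (b) and (c) are easy to experiment with on a finite set. The sketch below is illustrative only: the names `discrete_measure` and `dirac` are hypothetical, the weights are an arbitrary choice summing to 1, and exact rational arithmetic is used so that additivity can be checked without rounding error.

```python
from fractions import Fraction

def discrete_measure(weights):
    """Return mu with mu(A) = sum of weights[i] over i in A (missing weights count as 0)."""
    def mu(A):
        return sum((weights.get(x, Fraction(0)) for x in A), Fraction(0))
    return mu

def dirac(x0):
    """Point mass at x0, viewed as a discrete measure with a single weight 1."""
    return discrete_measure({x0: Fraction(1)})

# Arbitrary illustrative weights on X = {1, 2, 3}; they sum to 1.
weights = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}
mu = discrete_measure(weights)

# Additivity on the disjoint sets {1, 2} and {3}.
lhs = mu({1, 2, 3})
rhs = mu({1, 2}) + mu({3})
```

Here `mu` is a probability measure because the weights sum to 1, and `dirac(2)` assigns mass 1 exactly to the sets containing the point 2.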

    3 Construction of measures, Lebesgue measure in $\mathbb{R}^d$

    One general approach to defining a measure is to define it first on a collection of simple sets, where we already know what we want the measure to be, and then extend it to all sets in the generated $\sigma$-algebra. The extension step is often non-trivial, and we will see in this section what properties one has to verify for this. Thus there will be two parallel threads: the general construction, and, as an illustration, the concrete construction of $d$-dimensional volume on the Borel sets in $\mathbb{R}^d$ (called Lebesgue measure).

    A lot of the material in this section is quite abstract. So it is important to keep in mind that once we have constructed a measure, we can essentially forget about how it was obtained: the rest of the material, namely integration theory and the three main theorems of integration, will not rely on the method of construction, and it is perfectly possible to understand the further sections without knowing the internal workings of this section. But one needs to be aware that the construction step, in almost all interesting cases, involves something non-trivial.

        Finally, if in the future you will be using more delicate properties of functions than we treat in this course, you may want to know that one can construct measures in a somewhat more elegant way than we do here; see e.g. [2, Chapter 2]. The reason I chose a more pedestrian approach in this unit, although it is slightly longer, is that I believe it is more intuitive on a first encounter with the subject.

    3.1 Volume in Rd, the overall idea

    For a rectangular set $[a_1, b_1] \times \dots \times [a_d, b_d]$, we already know what its volume should be: $\prod_{i=1}^d (b_i - a_i)$. We can go a slight bit further, and consider sets that are finite unions of rectangular sets. These can be decomposed into a finite disjoint union of rectangular sets, so their volume is determined due to finite additivity. After taking care of a few uninteresting technicalities, we can ensure that the volume is defined on the algebra generated by rectangular sets, and it is finitely additive there. The main step is to go from this algebra to the Borel $\sigma$-algebra, and this requires the most work. But the basic idea is simple: given a Borel set $B \subset \mathbb{R}^d$, we consider coverings $B \subset \bigcup_{n=1}^\infty A_n$, where the $A_n$'s are rectangular sets. Then the sum of the volumes of the $A_n$'s is an approximation to what the volume of $B$ should be. Making the covering as efficient as possible, we define the measure of $B$ as the infimum of the approximations over all coverings.

    3.2 Boxes

    For technical reasons (that are not essential, but help make some of the statements neater), we will work with half-open intervals. A box $S \subset \mathbb{R}^d$ is a set of the form $S = I_1 \times \dots \times I_d$, where, for each $i = 1, \dots, d$ the interval $I_i$ takes one of the following four forms:

    $I_i = (a_i, b_i]$ for some $-\infty < a_i \le b_i$

    Example. The collection of boxes in $\mathbb{R}^d$, defined in Section 3.2, is a semi-algebra. (HW): The complement of any box is the disjoint union of at most $2d$ boxes.

    3.4 Generated algebra

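    Lemma. Let $\mathcal{S}$ be a semi-algebra in $X$. Put

    \begin{align*}
    \mathcal{A} &= \text{finite disjoint unions of members of } \mathcal{S} \\
     &= \{A \subset X : \exists\, n \ge 1,\ S_1, \dots, S_n \in \mathcal{S} \text{ disjoint such that } A = \textstyle\bigcup_{k=1}^n S_k\}.
    \end{align*}

    Then $\mathcal{A}$ is an algebra in $X$. (It is called the algebra generated by $\mathcal{S}$.)

    Proof. Step 1. We first show that $\mathcal{A}$ is closed under intersections. Suppose $A = \bigcup_{i=1}^k S_i$, $B = \bigcup_{j=1}^\ell T_j$ (disjoint unions). Then

    \[ A \cap B = \bigcup_{i=1}^k \bigcup_{j=1}^\ell \underbrace{(S_i \cap T_j)}_{\in \mathcal{S} \text{ by Defn. 3.3(i)}} \in \mathcal{A} \quad \text{by definition of } \mathcal{A}. \]

    Step 2. We now show that $\mathcal{A}$ is closed under complements. Suppose $A = \bigcup_{i=1}^k S_i$ (disjoint union). Then

    \[ A^c = \bigcap_{i=1}^k \underbrace{S_i^c}_{\in \mathcal{A} \text{ by Defn. 3.3(ii)}} \in \mathcal{A} \quad \text{by Step 1 and induction.} \]

    This verifies that $\mathcal{A}$ is closed under complements.

    Step 3. We check that $X \in \mathcal{A}$. Take any $S \in \mathcal{S}$. Write $S^c = \bigcup_{i=1}^k S_i$ (disjoint). Then $X = S \cup \bigcup_{i=1}^k S_i \in \mathcal{A}$.

    We check that $\mathcal{A}$ is closed under finite unions: if $A, B \in \mathcal{A}$, we have

    \[ A \cup B = \big( \underbrace{A^c \cap B^c}_{\in \mathcal{A} \text{ by Steps 1 and 2}} \big)^c \in \mathcal{A} \quad \text{by Step 2.} \]

    Example. If $\mathcal{S}$ = collection of boxes in $\mathbb{R}^d$, then

    \[ \mathcal{A} = \text{finite disjoint unions of boxes} \]

    is an algebra.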

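    3.5 Volume of boxes in $\mathbb{R}^d$

    Take $\mathcal{S}$ = collection of boxes in $\mathbb{R}^d$. For $S = I_1 \times \dots \times I_d \in \mathcal{S}$ put

    \[ \lambda(S) := \prod_{i=1}^d (b_i - a_i), \]

    where $a_i$ and $b_i$ are the endpoints of $I_i$; allowed to take infinite values. (Recall that the convention $0 \cdot \infty = 0$ is in force.)

    Lemma. The set-function $\lambda$ is finitely additive on $\mathcal{S}$, that is, if $S \in \mathcal{S}$, $S = \bigcup_{i=1}^k S_i$ with $S_1, \dots, S_k \in \mathcal{S}$ disjoint, then

    \[ \lambda(S) = \sum_{i=1}^k \lambda(S_i). \]

    Proof. Let us say that $S = \bigcup_{j=1}^\ell B_j$ is a regular subdivision of $S$, if for each $i = 1, \dots, d$, there exists a sequence

    \[ a_i = \alpha_{i,0} < \alpha_{i,1} < \dots < \alpha_{i,n_i} = b_i \]

    such that each $B_j$ is of the form:

    \[ (\alpha_{1,r_1 - 1}, \alpha_{1,r_1}] \times \dots \times (\alpha_{d,r_d - 1}, \alpha_{d,r_d}], \quad \text{for some } 1 \le r_i \le n_i,\ i = 1, \dots, d \]

    (and consequently: $\ell = n_1 \cdots n_d$). It follows easily, from the distributive law, that for a regular subdivision $\lambda(S) = \sum_{j=1}^\ell \lambda(B_j)$.

    General case: let $S = \bigcup_{i=1}^k S_i$. We can subdivide each $S_i$ regularly in such a way that

    \begin{align*}
    S_i &= \bigcup_{j=1}^{m_i} T_{i,j} \quad \text{(regular subdivision)} \\
    S &= \bigcup_{i,j} T_{i,j} \quad \text{(also a regular subdivision).}
    \end{align*}

    It follows that

    \[ \lambda(S) = \sum_{i=1}^k \sum_{j=1}^{m_i} \lambda(T_{i,j}) = \sum_{i=1}^k \lambda(S_i). \]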
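The additivity of volume over a regular subdivision can be checked numerically for bounded boxes. The sketch below is an illustration only: a box is represented, by assumption, as a list of endpoint pairs `(a, b)` standing for the half-open interval $(a, b]$ in each coordinate, and the function names `volume` and `regular_subdivision` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def volume(box):
    """Volume of a bounded box given as [(a_1, b_1), ..., (a_d, b_d)]."""
    v = Fraction(1)
    for a, b in box:
        v *= Fraction(b) - Fraction(a)
    return v

def regular_subdivision(box, cuts):
    """Split a box along interior cut points; cuts[i] lists the cuts for coordinate i."""
    pieces_per_axis = []
    for (a, b), axis_cuts in zip(box, cuts):
        points = [a] + sorted(axis_cuts) + [b]
        # Consecutive points give the sub-intervals (points[r-1], points[r]].
        pieces_per_axis.append(list(zip(points[:-1], points[1:])))
    # Each piece of the subdivision picks one sub-interval per coordinate.
    return [list(piece) for piece in product(*pieces_per_axis)]

box = [(0, 2), (0, 3)]
pieces = regular_subdivision(box, cuts=[[1], [1, 2]])
total = sum(volume(p) for p in pieces)
```

With one cut in the first coordinate and two in the second, the subdivision has $2 \cdot 3 = 6$ pieces, matching the count $\ell = n_1 \cdots n_d$ in the proof, and their volumes add up to the volume of the original box.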

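    3.6 Extension from a semi-algebra to the generated algebra

    Theorem. Let $\mathcal{S}$ be a semi-algebra in $X$, and suppose that the set-function $\mu : \mathcal{S} \to [0,\infty]$ satisfies:
    (i) $\mu(\emptyset) = 0$;
    (ii) (finitely additive) if $S = \bigcup_{i=1}^k S_i$, as a disjoint union, then $\mu(S) = \sum_{i=1}^k \mu(S_i)$.
    Then $\mu$ has a unique extension to the algebra $\mathcal{A}$ generated by $\mathcal{S}$, such that the following property holds:
    (a) (extension is still finitely additive) If $A, B_1, \dots, B_n \in \mathcal{A}$, $A = \bigcup_{i=1}^n B_i$, disjoint union, then

    \[ \mu(A) = \sum_{i=1}^n \mu(B_i). \]

    Moreover, $\mu$ also has the following property:
    (b) ($\mu$ is finitely sub-additive) If $A, B_1, \dots, B_n \in \mathcal{A}$, $A \subset \bigcup_{i=1}^n B_i$ (not necessarily disjoint union), then

    \[ \mu(A) \le \sum_{i=1}^n \mu(B_i). \]

    Proof. We define $\mu$ on $\mathcal{A}$, by writing any $A \in \mathcal{A}$ in the form $A = \bigcup_{i=1}^k S_i$ (disjoint union of members of $\mathcal{S}$), and putting $\mu(A) = \sum_{i=1}^k \mu(S_i)$.

    Note: this is the only possible way to define $\mu(A)$, if we want property (a) to hold at all. Hence uniqueness of the extension is immediate.

    But we have to check that $\mu$ is well-defined, that is, the value of $\mu(A)$ does not depend on the way we represented $A$ as a union. Suppose $A = \bigcup_{i=1}^k S_i = \bigcup_{j=1}^\ell T_j$. Then

    \[ S_i = \bigcup_{j=1}^\ell (S_i \cap T_j) \quad \text{and} \quad T_j = \bigcup_{i=1}^k (S_i \cap T_j). \]

    This implies:

    \[ \sum_{i=1}^k \mu(S_i) \overset{\text{(ii)}}{=} \sum_{i=1}^k \sum_{j=1}^\ell \mu(S_i \cap T_j) \overset{\text{(ii)}}{=} \sum_{j=1}^\ell \mu(T_j). \]

    So any two possible definitions of $\mu(A)$ agree.

    Now to see that (a) holds: write $B_i = \bigcup_{j=1}^{m_i} S_{i,j}$ as a disjoint union of members of $\mathcal{S}$. Then $A = \bigcup_{i=1}^n \bigcup_{j=1}^{m_i} S_{i,j}$, and hence

    \[ \mu(A) \overset{\text{defn. of } \mu(A)}{=} \sum_{i=1}^n \sum_{j=1}^{m_i} \mu(S_{i,j}) \overset{\text{defn. of } \mu(B_i)}{=} \sum_{i=1}^n \mu(B_i). \]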

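    Finally, to see that (b) holds: start with $n = 1$, $B_1 = B$. Then $B = A \cup (\underbrace{B \cap A^c}_{\in \mathcal{A}})$, so

    \[ \mu(A) \le \mu(A) + \mu(B \cap A^c) = \mu(B). \]

    Now we handle $n > 1$. Let $F_k = B_1^c \cap \dots \cap B_{k-1}^c \cap B_k \in \mathcal{A}$. Then $\bigcup_{i=1}^n B_i = \bigcup_{i=1}^n F_i$, and the latter is a disjoint union. So we have:

    \[ \mu(A) \overset{\text{(a)}}{=} \sum_{i=1}^n \mu(F_i \cap A) \overset{n = 1 \text{ case}}{\le} \sum_{i=1}^n \mu(F_i) \overset{n = 1 \text{ case}}{\le} \sum_{i=1}^n \mu(B_i). \]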
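The sets $F_k$ in the proof of (b) are a standard "disjointification" of $B_1, \dots, B_n$: each $F_k$ keeps only the part of $B_k$ not already covered by earlier sets. A minimal sketch with finite Python sets (the function name `disjointify` is hypothetical):

```python
def disjointify(sets):
    """Return F_k = B_k minus (B_1 ∪ ... ∪ B_{k-1}) for each k, in order."""
    seen = set()
    result = []
    for B in sets:
        B = set(B)
        result.append(B - seen)  # part of B_k not covered by earlier sets
        seen |= B
    return result

B = [{1, 2}, {2, 3}, {3, 4}]
F = disjointify(B)
```

The resulting sets are pairwise disjoint, satisfy $F_k \subset B_k$, and have the same union as the original sets, which is exactly what the two inequalities in the proof use.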

    Corollary. If $A = \bigcup_{i=1}^\infty B_i$, as a disjoint union, and $A, B_1, B_2, \dots \in \mathcal{A}$, then

    \[ \mu(A) \ge \sum_{i=1}^\infty \mu(B_i). \tag{1} \]

    Proof. Write $A = \left(\bigcup_{i=1}^n B_i\right) \cup C_n$, as a disjoint union of members of $\mathcal{A}$ (note that $C_n = A \cap \left(\bigcup_{i=1}^n B_i\right)^c \in \mathcal{A}$). Hence we get:

    \[ \mu(A) \overset{\text{(a)}}{=} \mu(B_1) + \dots + \mu(B_n) + \mu(C_n) \ge \sum_{i=1}^n \mu(B_i). \]

    Letting $n \to \infty$, we get (1).

    Example. The theorem implies that the set-function $\lambda$ defined on boxes in Section 3.5 extends uniquely to a finitely additive set-function on the generated algebra (finite disjoint unions of boxes).

    Note: Of course, we would like $=$ in (1), but this does not follow from the assumptions we made on $\mu$. For example, consider $X = \{1, 2, 3, \dots\}$, $\mathcal{A}$ = finite or co-finite subsets, $\mu(A) = 0$ if $A$ is finite and $\mu(A) = 1$ if $A$ is co-finite. This $\mu$ is finitely additive, but the inequality in (1) is strict for $B_i = \{i\}$, $A = \{1, 2, 3, \dots\}$. This set-function simply cannot be extended to a measure on the generated $\sigma$-algebra. We will need extra conditions on $\mu$ to guarantee that we can further extend to the generated $\sigma$-algebra.
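To spell out the strictness claimed in the Note, here is the computation for the finite/co-finite example, written with the same symbols. Finite additivity holds because two disjoint members of $\mathcal{A}$ cannot both be co-finite; the failure only appears for the countable union:

```latex
\[
  \sum_{i=1}^{n} \mu(\{i\}) = 0 \quad \text{for every } n \ge 1,
  \qquad \text{hence} \qquad
  \sum_{i=1}^{\infty} \mu(\{i\}) = 0 \;<\; 1 = \mu(\{1, 2, 3, \dots\}).
\]
```

So (1) holds here with strict inequality, and countable additivity fails for this set-function.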

    3.7 Pre-measures

    First, a brief summary of what we have achieved so far. If $\mathcal{S}$ is a semi-algebra in $X$ and $\mathcal{A}$ is the generated algebra:

    Assume: $\mu$ is finitely additive on $\mathcal{S}$, that is:
        $\mu(S) = \sum_{i=1}^k \mu(S_i)$, if $S = \bigcup_{i=1}^k S_i$ (a disjoint union).

    $\Longrightarrow$ unique extension to $\mathcal{A}$, which is:
        finitely additive: $\mu(A) = \sum_{i=1}^k \mu(A_i)$, if $A = \bigcup_{i=1}^k A_i$ (a disjoint union);
        finitely subadditive: $\mu(A) \le \sum_{i=1}^k \mu(A_i)$, if $A \subset \bigcup_{i=1}^k A_i$ (not necessarily disjoint union);
        countably superadditive: $\mu(A) \ge \sum_{i=1}^\infty \mu(A_i)$, if $A = \bigcup_{i=1}^\infty A_i$ (disjoint union).

    Examples of $\mu$'s we have seen so far: $\lambda$ on boxes in $\mathbb{R}^d$, and $\mu_F$ on 1D boxes.

    We want countable additivity for $\mu$. As the Note at the end of Section 3.6 shows, for this it is not enough to assume what we did in the first box above. We are going to assume that $\mu$ is countably sub-additive on $\mathcal{S}$, and show that this implies the same for $\mu$ on $\mathcal{A}$. Such a $\mu$ will be called a pre-measure.

    Definition. Let $\mathcal{A}$ be an algebra in a set $X$. A set-function $\mu : \mathcal{A} \to [0,\infty]$ is called a pre-measure, if it satisfies:
    (i) $\mu(\emptyset) = 0$;
    (ii) ($\sigma$-additive) If $A_1, A_2, \dots \in \mathcal{A}$ are disjoint and $A = \bigcup_{n=1}^\infty A_n \in \mathcal{A}$, then $\mu(A) = \sum_{n=1}^\infty \mu(A_n)$.

    Note: In short, a pre-measure has the same defining properties as a measure, except that it is defined on an algebra, not a $\sigma$-algebra.

    Note: There is an appropriate definition of pre-measure also when $\mathcal{A}$ is not an algebra (that you might see in some books), but it is more technical.

    Theorem. (An enhancement of Theorem 3.6) Let $\mathcal{S}$ be a semi-algebra in $X$ and $\mathcal{A}$ the generated algebra. Suppose that $\mu : \mathcal{S} \to [0,\infty]$ satisfies:
    (i) $\mu(\emptyset) = 0$;
    (ii) If $S = \bigcup_{i=1}^k S_i$, disjoint union, then $\mu(S) = \sum_{i=1}^k \mu(S_i)$;
    (iii) If $S = \bigcup_{i=1}^\infty S_i$, disjoint union, then $\mu(S) \le \sum_{i=1}^\infty \mu(S_i)$.
    Then $\mu$ has a unique extension to $\mathcal{A}$, and this extension is a pre-measure on $\mathcal{A}$.

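    Proof. In Theorem 3.6 we saw that (i) + (ii) implies the existence of a unique extension on $\mathcal{A}$ satisfying 3.6(a) + (b) + (1). What is left to check is that if $A = \bigcup_{n=1}^\infty A_n$, a disjoint union, $A, A_1, A_2, \dots \in \mathcal{A}$, then

    \[ \mu(A) \le \sum_{n=1}^\infty \mu(A_n). \tag{2} \]

    (Because (2) together with 3.6(1) implies $\sigma$-additivity, the required property (ii) of a pre-measure.)

    In order to show (2), for each $n \ge 1$ write:

    \[ A_n = \bigcup_{j=1}^{k_n} S_{n,j}, \quad \text{disjoint, } S_{n,j} \in \mathcal{S}. \]

    We have

    \[ \sum_{n=1}^\infty \mu(A_n) = \sum_{n=1}^\infty \sum_{j=1}^{k_n} \mu(S_{n,j}). \tag{3} \]

    For ease of reference, reindex the $S_{n,j}$'s as a single sequence: $S'_1, S'_2, \dots$, disjoint (since the $A_n$'s are also disjoint), so that

    \[ \bigcup_{i=1}^\infty S'_i = \bigcup_{n=1}^\infty \bigcup_{j=1}^{k_n} S_{n,j} = A. \]

    Since $A \in \mathcal{A}$, it also has a decomposition $A = \bigcup_{j=1}^\ell T_j$, disjoint, $T_j \in \mathcal{S}$, so we can write:

    \[ T_j = \bigcup_{i=1}^\infty (\underbrace{T_j \cap S'_i}_{\in \mathcal{S}}). \]

    Due to assumption (iii), we have:

    \[ \mu(T_j) \le \sum_{i=1}^\infty \mu(T_j \cap S'_i). \]

    Summing over $j$ and exchanging the sums we get:

    \begin{align*}
    \mu(A) = \sum_{j=1}^\ell \mu(T_j) &\le \sum_{j=1}^\ell \sum_{i=1}^\infty \mu(T_j \cap S'_i) = \sum_{i=1}^\infty \sum_{j=1}^\ell \mu(T_j \cap S'_i) \\
     &= \sum_{i=1}^\infty \mu(A \cap S'_i) = \sum_{i=1}^\infty \mu(S'_i) \overset{\text{reindexing}}{=} \sum_{n=1}^\infty \sum_{j=1}^{k_n} \mu(S_{n,j}) \overset{\text{(3)}}{=} \sum_{n=1}^\infty \mu(A_n).
    \end{align*}

    This establishes the required (2).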

    3.8 Verifying the conditions for boxes

    Let $\mathcal{S}$ = boxes in $\mathbb{R}^d$, let $\mathcal{A}$ be the generated algebra, $\lambda$ the volume function defined in Section 3.5, and write $\lambda$ also for its extension to $\mathcal{A}$.

    In this section we verify the condition (iii) of Theorem 3.7 for $\lambda$, and as a consequence that its extension is a pre-measure on the generated algebra $\mathcal{A}$.

    Suppose that $S = \bigcup_{i=1}^\infty S_i$, disjoint union, $S, S_1, S_2, \dots$ boxes. We need to check that

    \[ \lambda(S) \le \sum_{i=1}^\infty \lambda(S_i). \tag{4} \]

    You might want to pause here a little bit, and try to think how you would prove this. It is very intuitive that the two sides should be equal, yet, to prove just the inequality above involves something non-trivial.

    Before presenting the argument, I should mention that one could streamline the proof below somewhat, but the reason I do not, is that this way the argument will apply equally well to the set-functions $\mu_F$ we looked at on Problem Sheet 2.

    Proof of (4). First assume that $S$ is bounded, so

    \[ S = (a_1, b_1] \times \dots \times (a_d, b_d]. \]

    We may assume that the sum on the right hand side of (4) is finite, since otherwise there is nothing to prove. Let $\varepsilon > 0$. We can choose $\delta > 0$ such that for the box

    \[ S' = (a_1 + \delta, b_1] \times \dots \times (a_d + \delta, b_d], \]

    that is slightly smaller than $S$, we have

    \[ \lambda(S) \le \lambda(S') + \varepsilon. \tag{5} \]

    Similarly, for each $i = 1, 2, \dots$, writing $S_i = (a^i_1, b^i_1] \times \dots \times (a^i_d, b^i_d]$, we can choose $\delta_i > 0$ such that for the box

    \[ S'_i = (a^i_1, b^i_1 + \delta_i] \times \dots \times (a^i_d, b^i_d + \delta_i], \]

    that is slightly larger than $S_i$, we have

    \[ \lambda(S'_i) \le \lambda(S_i) + \frac{\varepsilon}{2^i}, \quad i = 1, 2, \dots. \tag{6} \]

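    Put

    \begin{align*}
    S'' &:= [a_1 + \delta, b_1] \times \dots \times [a_d + \delta, b_d]; \\
    S''_i &:= (a^i_1, b^i_1 + \delta_i) \times \dots \times (a^i_d, b^i_d + \delta_i),
    \end{align*}

    and observe that $S''$ is compact and the $S''_i$'s are open. Moreover, we have a covering, because of the inclusions:

    \[ S'' \subset (a_1, b_1] \times \dots \times (a_d, b_d] = \bigcup_{i=1}^\infty S_i \subset \bigcup_{i=1}^\infty S''_i. \tag{7} \]

    Due to the Heine-Borel Theorem (see the Appendix), there exists a finite $N$ such that $S'' \subset \bigcup_{i=1}^N S''_i$. From this the following inclusion follows:

    \[ S' \subset S'' \subset \bigcup_{i=1}^N S''_i \subset \bigcup_{i=1}^N S'_i. \]

    From Theorem 3.6(b) we get:

    \[ \lambda(S) \overset{\text{(5)}}{\le} \lambda(S') + \varepsilon \overset{\text{Thm 3.6}}{\le} \sum_{i=1}^N \lambda(S'_i) + \varepsilon \overset{\text{(6)}}{\le} \sum_{i=1}^N \lambda(S_i) + 2\varepsilon \le \sum_{i=1}^\infty \lambda(S_i) + 2\varepsilon. \tag{8} \]

    Since $\varepsilon > 0$ was arbitrary, we can let $\varepsilon \to 0$ in (8) to get:

    \[ \lambda(S) \le \sum_{i=1}^\infty \lambda(S_i), \]

    which is what we wanted to prove.

    Finally, we deal with the case of unbounded $S$. Let $\tilde{S} \subset S$ be a bounded box, so that $\tilde{S} \subset \bigcup_{i=1}^\infty S_i$. We can repeat the previous argument and get

    \[ \lambda(\tilde{S}) \le \sum_{i=1}^\infty \lambda(S_i). \]

    (For this, note that we only need the inclusion $\subset$, and not $=$, to hold in the middle of (7).) Since this holds for any bounded $\tilde{S} \subset S$, (4) follows.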

    A summary of the achievement of the last two sections:

    Assume $\mu$ is:
        finitely additive on $\mathcal{S}$: $\mu(S) = \sum_{i=1}^k \mu(S_i)$, if $S = \bigcup_{i=1}^k S_i$ (a disjoint union);
        countably sub-additive on $\mathcal{S}$: $\mu(S) \le \sum_{i=1}^\infty \mu(S_i)$, if $S = \bigcup_{i=1}^\infty S_i$ (a disjoint union).

    $\Longrightarrow$ unique extension to $\mathcal{A}$, a pre-measure, that is:
        countably additive: $\mu(A) = \sum_{i=1}^\infty \mu(A_i)$, if $A = \bigcup_{i=1}^\infty A_i$ (disjoint union).

    Examples of $\mu$ we have seen that satisfy the assumptions in the first box: $\lambda$ in $\mathbb{R}^d$, $\mu_F$ in 1D (Problem Sheets).

    What is left of the construction of measures is to extend a pre-measure $\mu$ defined on an algebra $\mathcal{A}$ to a measure on $\sigma(\mathcal{A})$. This involves two separate parts: showing that an extension exists, and proving that it is unique. The uniqueness part is the easier one, and we will start with that. In doing so, we will need to impose another restriction (which is, however, usually easy to verify). This is explained in the next section.

    3.9 $\sigma$-finiteness

    The following concept is needed to ensure that a pre-measure defined on an algebra $\mathcal{A}$ has at most one extension to a measure on $\sigma(\mathcal{A})$.

    Definition. Let $\mu$ be a pre-measure on an algebra $\mathcal{A}$ in $X$ (or a measure on a $\sigma$-algebra $\mathcal{M}$ in $X$). We say that $\mu$ is $\sigma$-finite, if there exist $A_1, A_2, \dots \in \mathcal{A}$ (or $\in \mathcal{M}$), such that

    \[ \bigcup_{n=1}^\infty A_n = X \quad \text{and} \quad \mu(A_n) < \infty \text{ for all } n \ge 1. \]

    Example. By the work in Sections 3.7 and 3.8 the volume function $\lambda$ is a pre-measure on the algebra $\mathcal{A}$ generated by the boxes. It is not difficult to check that it is $\sigma$-finite: $\lambda((-n, n]^d) = (2n)^d < \infty$ for all $n \ge 1$, and $\mathbb{R}^d = \bigcup_{n \ge 1} (-n, n]^d$. It follows from the Theorem that there is a unique extension to a measure on $\sigma(\mathcal{A}) = \mathcal{B}$. We call this measure the $d$-dimensional Lebesgue measure.

    The proof of the Theorem is quite technical, and as mentioned earlier, the uniqueness part is easier. We do this next. Following that we define the extension. The details of seeing that the definition indeed works and defines a measure will be non-examinable.

    3.11 Dynkin's lemma ($\pi$-$\lambda$ theorem)

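    In this section we prove a very useful lemma that helps with statements of the sort: if we know that a property holds for a certain (often rather small) collection of sets, then it necessarily holds for all sets in the generated $\sigma$-algebra.

    Let $X$ be a non-empty set.

    Definition.
    A collection $\mathcal{P}$ of subsets of $X$ is called a $\pi$-system, if it is closed under intersection:
    (i) $A, B \in \mathcal{P} \implies A \cap B \in \mathcal{P}$.
    A collection $\mathcal{L}$ of subsets of $X$ is called a $\lambda$-system, if it has the following three properties:
    (ii) $X \in \mathcal{L}$;
    (iii) (closed under monotonic difference) $A, B \in \mathcal{L}$, $A \subset B \implies B \setminus A \in \mathcal{L}$;
    (iv) (closed under monotonic union) $A_1, A_2, \dots \in \mathcal{L}$, $A_1 \subset A_2 \subset \dots \implies \bigcup_{n=1}^\infty A_n \in \mathcal{L}$.

    Note: Sometimes a $\lambda$-system is called a d-system (after Dynkin).

    Theorem (Dynkin's $\pi$-$\lambda$ theorem). If $\mathcal{P}$ is a $\pi$-system and $\mathcal{L}$ is a $\lambda$-system, and $\mathcal{P} \subset \mathcal{L}$, then also $\sigma(\mathcal{P}) \subset \mathcal{L}$.

        Proof. (Non-examinable)
        Let $\lambda(\mathcal{P})$ denote the smallest $\lambda$-system that contains $\mathcal{P}$. (That there is a smallest such $\lambda$-system can be seen similarly to the case of $\sigma$-algebras; Theorem 1.5.) The Theorem will follow, once we show that:

        \[ \lambda(\mathcal{P}) \text{ is a } \sigma\text{-algebra.} \tag{a} \]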

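        Indeed, then we get:

        \[ \sigma(\mathcal{P}) \subset \lambda(\mathcal{P}) \subset \mathcal{L}. \]

        The first inclusion holds because $\sigma(\mathcal{P})$ is the smallest $\sigma$-algebra containing $\mathcal{P}$, and by (a) $\lambda(\mathcal{P})$ is one such $\sigma$-algebra. The second inclusion holds because $\lambda(\mathcal{P})$ is the smallest $\lambda$-system containing $\mathcal{P}$, and $\mathcal{L}$ is one such $\lambda$-system, by our assumption $\mathcal{P} \subset \mathcal{L}$.

        In order to prove (a), it is sufficient to prove:

        \[ \lambda(\mathcal{P}) \text{ is closed under intersection.} \tag{b} \]

        Indeed, if we have (b), we get:
        - $X \in \lambda(\mathcal{P})$, since $\lambda(\mathcal{P})$ is a $\lambda$-system;
        - If $A \in \lambda(\mathcal{P})$, then $X \setminus A \in \lambda(\mathcal{P})$ (since this is a monotonic difference);
        - If $A_1, A_2, \dots \in \lambda(\mathcal{P})$, then

        \[ \bigcup_{i=1}^n A_i = \Big( \underbrace{\bigcap_{i=1}^n \underbrace{A_i^c}_{\in \lambda(\mathcal{P}) \text{ as just shown}}}_{\in \lambda(\mathcal{P}) \text{ due to (b)}} \Big)^c \in \lambda(\mathcal{P}), \]

          and then $\bigcup_{i=1}^n A_i \uparrow \bigcup_{i=1}^\infty A_i \in \lambda(\mathcal{P})$, as it is a monotonic union.
        The three bullet points verify the three requirements for a $\sigma$-algebra, so indeed (b) implies (a), and it is left to prove (b).

        Proving (b) is the main part of the argument. For any set $A \in \lambda(\mathcal{P})$, we define

        \[ \mathcal{D}_A := \{B \subset X : A \cap B \in \lambda(\mathcal{P})\}. \]

        We claim that:

        \[ \mathcal{D}_A \text{ is a } \lambda\text{-system.} \tag{c} \]

        Indeed:
        - $A \cap X = A \in \lambda(\mathcal{P})$, so $X \in \mathcal{D}_A$;
        - If $B, C \in \mathcal{D}_A$, and $B \subset C$, then $A \cap (C \setminus B) = (A \cap C) \setminus (A \cap B) \in \lambda(\mathcal{P})$, since this is a monotonic difference of members of $\lambda(\mathcal{P})$;
        - If $B_1, B_2, \dots \in \mathcal{D}_A$ and $B_1 \subset B_2 \subset \dots$, then $A \cap \left(\bigcup_{n=1}^\infty B_n\right) = \bigcup_{n=1}^\infty (A \cap B_n) \in \lambda(\mathcal{P})$, since this is a monotonic union of members of $\lambda(\mathcal{P})$.
        The above verifies (c), saying that $\mathcal{D}_A$ is a $\lambda$-system.

        The remaining part of the argument is simply tracing definitions in the right way. First we show that:

        \[ \text{If } A \in \mathcal{P}, \text{ then } \mathcal{P} \subset \mathcal{D}_A. \tag{d} \]

        Indeed: for any $B \in \mathcal{P}$, we have $A \cap B \in \mathcal{P} \subset \lambda(\mathcal{P})$, since $\mathcal{P}$ is closed under intersection, and by the definition of $\mathcal{D}_A$ we have $B \in \mathcal{D}_A$.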

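        Next, (c) + (d) imply that if $A \in \mathcal{P}$ then $\lambda(\mathcal{P}) \subset \mathcal{D}_A$ (since $\mathcal{D}_A$ is a $\lambda$-system containing $\mathcal{P}$, and $\lambda(\mathcal{P})$ is the smallest one). Tracing the definition of $\mathcal{D}_A$, this means that:

        \[ \text{If } A \in \mathcal{P} \text{ and } B \in \lambda(\mathcal{P}), \text{ then } A \cap B \in \lambda(\mathcal{P}). \]

        Swapping the roles of $A$ and $B$ in this statement, we can rephrase it as:

        \[ \text{If } A \in \lambda(\mathcal{P}) \text{ and } B \in \mathcal{P}, \text{ then } A \cap B \in \lambda(\mathcal{P}). \tag{e} \]

        But (e) says that if $A \in \lambda(\mathcal{P})$, then $\mathcal{P} \subset \mathcal{D}_A$. This implies that if $A \in \lambda(\mathcal{P})$, then $\lambda(\mathcal{P}) \subset \mathcal{D}_A$ (since $\mathcal{D}_A$ is a $\lambda$-system containing $\mathcal{P}$, and $\lambda(\mathcal{P})$ is the smallest such $\lambda$-system). By the definition of $\mathcal{D}_A$, the last statement says:

        \[ \text{If } A \in \lambda(\mathcal{P}) \text{ and } B \in \lambda(\mathcal{P}), \text{ then } A \cap B \in \lambda(\mathcal{P}). \]

        And this is the statement (b) that we wanted to prove. The proof of the Theorem is complete.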

    3.12 Uniqueness lemma

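    Theorem. Suppose that $\mu_1$ and $\mu_2$ are measures on $\sigma$-algebras $\mathcal{M}_1$ and $\mathcal{M}_2$ in $X$. Suppose that $\mu_1$ and $\mu_2$ are equal on a $\pi$-system $\mathcal{P} \subset \mathcal{M}_1 \cap \mathcal{M}_2$. Suppose also that there exist $A_1, A_2, \dots \in \mathcal{P}$, $A_n \uparrow X$, such that $\mu_1(A_n) = \mu_2(A_n) < \infty$ for all $n \ge 1$. Then $\mu_1$ and $\mu_2$ are equal on $\sigma(\mathcal{P})$.

    Proof. Fix $A \in \mathcal{P}$ such that $\mu_1(A) = \mu_2(A) < \infty$.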

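    Hence $C \setminus B \in \mathcal{L}$.

    Suppose $B_1, B_2, \dots \in \mathcal{L}$ and $B_1 \subset B_2 \subset \dots$. We have:

    \begin{align*}
    \mu_1\left(A \cap \left(\bigcup_{n=1}^\infty B_n\right)\right) &= \mu_1\left(\bigcup_{n=1}^\infty (A \cap B_n)\right)
       \overset{\text{increasing union}}{=} \lim_{n \to \infty} \mu_1(A \cap B_n) \\
     &\overset{B_1, B_2, \dots \in \mathcal{L}}{=} \lim_{n \to \infty} \mu_2(A \cap B_n)
       \overset{\text{increasing union}}{=} \mu_2\left(\bigcup_{n=1}^\infty (A \cap B_n)\right) \\
     &= \mu_2\left(A \cap \left(\bigcup_{n=1}^\infty B_n\right)\right).
    \end{align*}

    Hence $\bigcup_{n=1}^\infty B_n \in \mathcal{L}$.

    The above verifies the three requirements for $\mathcal{L}$ to be a $\lambda$-system. The $\pi$-$\lambda$ theorem implies that $\sigma(\mathcal{P}) \subset \mathcal{L}$, and hence that $\mu_1(A \cap B) = \mu_2(A \cap B)$ for all sets $B \in \sigma(\mathcal{P})$. We apply this now with $A = A_1, A_2, \dots$ from the assumption of the Theorem. This gives, for all $B \in \sigma(\mathcal{P})$, the equality:

    \[ \mu_1(B) = \lim_{n \to \infty} \mu_1(A_n \cap B) = \lim_{n \to \infty} \mu_2(A_n \cap B) = \mu_2(B). \]

    Hence the statement is proved.

    Corollary. Suppose that $\mu$ is a $\sigma$-finite pre-measure on an algebra $\mathcal{A}$. Then $\mu$ has at most one extension to a measure on $\sigma(\mathcal{A})$.

    Proof. Suppose that $\mu_1$ and $\mu_2$ are both extensions of $\mu$. Then $\mu_1$ and $\mu_2$ are equal on $\mathcal{A}$, which is a $\pi$-system. The $\sigma$-finiteness assumption implies that the Theorem can be applied, and hence $\mu_1$ and $\mu_2$ are equal on $\sigma(\mathcal{A})$.

    Remark. In particular, $d$-dimensional Lebesgue measure is uniquely determined on the Borel sets, by specifying it on boxes.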

    3.13 Construction of the extension, Outer measures

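    Let $X$ be a non-empty set, and write $\mathcal{E}(X)$ for the collection of all subsets of $X$.

    Definition 1. A set-function $\mu^* : \mathcal{E}(X) \to [0, \infty]$ is called an outer measure, if it satisfies:
    (i) $\mu^*(\emptyset) = 0$;
    (ii) ($\sigma$-sub-additivity) $\mu^*(E) \le \sum_{n=1}^\infty \mu^*(E_n)$, whenever $E \subset \cup_{n=1}^\infty E_n$.

    Remark 1. Observe that (i) + (ii) imply monotonicity: if $E \subset F$, then $\mu^*(E) \le \mu^*(F)$, because $E \subset F \cup \emptyset \cup \emptyset \cup \dots$.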

    One way to obtain an outer measure is to start with an arbitrary set-function, and cover sets as efficiently as possible, as we next show.

    Let us return to $\mu$ from the earlier sections, that is, a pre-measure on an algebra $\mathcal{A}$ in $X$.

    Definition 2. The outer measure constructed from $\mu$ is defined as:
    \[ \mu^*(E) := \inf \Big\{ \sum_{n=1}^\infty \mu(A_n) : E \subset \cup_{n=1}^\infty A_n,\ A_1, A_2, \dots \in \mathcal{A} \Big\}, \tag{9} \]
    that is, the infimum is over all countable coverings of $E$ by elements of $\mathcal{A}$.

    Lemma. The set-function $\mu^*$ is indeed an outer measure, in the sense of Definition 1.

    Proof. The empty set $E = \emptyset$ can be covered with $\emptyset = A_1 = A_2 = \dots$, showing that $\mu^*(\emptyset) = 0$.

    In order to see $\sigma$-sub-additivity, suppose that $E \subset \cup_{n=1}^\infty E_n$; we want to show that
    \[ \mu^*(E) \le \sum_{n=1}^\infty \mu^*(E_n). \tag{10} \]
    If for any $n \ge 1$ we have $\mu^*(E_n) = \infty$, then the inequality (10) holds trivially. Henceforth assume that $\mu^*(E_n) < \infty$ for all $n \ge 1$. Let $\varepsilon > 0$ be fixed. For each $n \ge 1$, due to the definition of $\mu^*(E_n)$ by an infimum, as well as its finiteness, we can find a covering $E_n \subset \cup_{k=1}^\infty A_{n,k}$, with $A_{n,1}, A_{n,2}, \dots \in \mathcal{A}$, such that
    \[ \sum_{k=1}^\infty \mu(A_{n,k}) \le \mu^*(E_n) + \varepsilon 2^{-n}. \tag{11} \]
    Since
    \[ E \subset \cup_{n=1}^\infty E_n \subset \cup_{n=1}^\infty \cup_{k=1}^\infty A_{n,k}, \]
    the doubly indexed collection $\{A_{n,k}\}_{n \ge 1, k \ge 1}$ is a countable covering of $E$ by elements of $\mathcal{A}$. Therefore, using (11), we have
    \[ \mu^*(E) \le \sum_{n=1}^\infty \sum_{k=1}^\infty \mu(A_{n,k}) \le \sum_{n=1}^\infty \big( \mu^*(E_n) + \varepsilon 2^{-n} \big) = \sum_{n=1}^\infty \mu^*(E_n) + \varepsilon. \]
    Since $\varepsilon > 0$ was arbitrary, we can let $\varepsilon \downarrow 0$ to obtain $\mu^*(E) \le \sum_{n=1}^\infty \mu^*(E_n)$. This was the required inequality (10). This completes the proof that $\mu^*$ is an outer measure.

    Remark. Note that we did not use at all in the proof that $\mu$ was a pre-measure, nor that it was defined on an algebra. So we could have started from any set-function, and (9) would have given an outer measure. This is sometimes useful. The fact that $\mu$ is a pre-measure comes in when we want to show that $\mu^*$ is an extension of $\mu$.

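    Example 1. Consider the volume $\lambda$, defined on $\mathcal{A}$, the finite unions of boxes in $\mathbb{R}^d$. We call
    \[ \lambda^*(E) := \inf \Big\{ \sum_{n=1}^\infty \lambda(A_n) : E \subset \cup_{n=1}^\infty A_n,\ A_1, A_2, \dots \in \mathcal{A} \Big\} \]
    the $d$-dimensional Lebesgue outer measure. Writing each $A_n$ as a finite disjoint union of boxes, and re-indexing, $\lambda^*$ can also be written in terms of coverings with boxes:
    \[ \lambda^*(E) = \inf \Big\{ \sum_{n=1}^\infty \lambda(S_n) : E \subset \cup_{n=1}^\infty S_n,\ S_1, S_2, \dots \in \mathcal{S} \Big\}. \]
    The infimum is now over all countable coverings of $E$ with boxes.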

    The following is the crucial definition for the construction.

    Definition 3. Let $\mu^*$ be an outer measure on $\mathcal{E}(X)$. We say that a set $E \subset X$ is $\mu^*$-measurable, if for all $T \subset X$ we have:
    \[ \mu^*(T) = \mu^*(T \cap E) + \mu^*(T \cap E^c). \tag{12} \]

    Note: One way to remember the formula (12) is that it says: "$E$ cuts every set $T$ nicely" (nice meaning: additively).

    Remark 2. (Important) Since an outer measure is $\sigma$-sub-additive, and $\mu^*(\emptyset) = 0$, the inequality $\mu^*(T) \le \mu^*(T \cap E) + \mu^*(T \cap E^c)$ always holds, trivially. Hence, in showing that a given set $E \subset X$ is $\mu^*$-measurable, it will always be enough to show that $\mu^*(T) \ge \mu^*(T \cap E) + \mu^*(T \cap E^c)$ for all $T \subset X$.

    Example 2. With $\lambda^*$ the Lebesgue outer measure, we say that a set $E \subset X = \mathbb{R}^d$ is Lebesgue measurable, if it is $\lambda^*$-measurable, in the sense of Definition 3.

    Theorem (Main theorem). Suppose that $\mu$ is a pre-measure on an algebra $\mathcal{A}$ in $X$, and let $\mu^*$ be the outer measure constructed from $\mu$, according to Definition 2.
    (i) The collection $\mathcal{M}$ of $\mu^*$-measurable sets is a $\sigma$-algebra.
    (ii) $\mathcal{M} \supset \mathcal{A}$, and hence $\mathcal{M} \supset \sigma(\mathcal{A})$.
    (iii) $\mu^*$ restricted to $\mathcal{M}$ is a measure.
    (iv) $\mu^*|_{\mathcal{A}} = \mu$.
    In particular, $\mu$ extends to a measure on a $\sigma$-algebra that is at least as large as $\sigma(\mathcal{A})$.

    Example 3. Consider the Lebesgue outer measure $\lambda^*$, and let $\mathcal{M}$ denote the collection of Lebesgue measurable sets. Then $\mathcal{M} \supset \mathcal{B}$, and $\lambda^*$ restricted to $\mathcal{M}$ is called $d$-dimensional Lebesgue measure.
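    The construction in Definitions 2 and 3 can be tried out by brute force on a finite set. The sketch below is only an illustration under assumptions that are not in the notes: the four-point set `X`, the algebra and the values stored in `mu` are hypothetical. Because the algebra is finite, a countable covering uses only finitely many distinct sets and repetitions can only increase the sum, so the infimum in (9) reduces to a minimum over finite sub-collections.

```python
from itertools import combinations

X = frozenset({0, 1, 2, 3})

# Hypothetical pre-measure on the algebra {∅, {0,1}, {2,3}, X}.
mu = {
    frozenset(): 0,
    frozenset({0, 1}): 1,
    frozenset({2, 3}): 2,
    X: 3,
}


def subsets(s):
    """All subsets of the finite set s."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def outer(E):
    """Outer measure of E: cheapest covering of E by members of the algebra."""
    best = float("inf")
    members = list(mu)
    for r in range(1, len(members) + 1):
        for cover in combinations(members, r):
            if E <= frozenset().union(*cover):
                best = min(best, sum(mu[A] for A in cover))
    return best


def is_measurable(E):
    """Caratheodory's condition (12), checked against every test set T."""
    return all(outer(T) == outer(T & E) + outer(T - E) for T in subsets(X))


measurable = [E for E in subsets(X) if is_measurable(E)]
```

    In this toy example the outer measure agrees with `mu` on the algebra, as in part (iv), and the measurable sets turn out to be exactly the four members of the algebra; a set such as {0} is not measurable, because it splits the atom {0,1}.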

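    Proof of Main Theorem. (Non-examinable) We start with the proofs of (i) and (iii). It is an important fact that these hold regardless of how the outer measure $\mu^*$ was obtained, so we state them in this greater generality in the next two lemmas.

    Lemma 1. Let $\mu^*$ be any outer measure on $\mathcal{E}(X)$, and let $\mathcal{M}$ denote the collection of $\mu^*$-measurable sets. Then $\mathcal{M}$ is a $\sigma$-algebra.

    Proof. It is easy to check that $X \in \mathcal{M}$. Indeed, for any $T \subset X$ we have
    \[ \mu^*(T) = \mu^*(T \cap X) = \mu^*(T \cap X) + \mu^*(\emptyset) = \mu^*(T \cap X) + \mu^*(T \cap X^c), \]
    and hence $X$ is $\mu^*$-measurable.

    It is also easy to see that $\mathcal{M}$ is closed under complements, since the definition (12) of $\mu^*$-measurability is symmetric in $E$ and $E^c$.

    We show that $\mathcal{M}$ is closed under countable intersections. This is sufficient to finish the proof, because taking complements we get that $\mathcal{M}$ is closed under countable unions. So suppose that $E_1, E_2, \dots \in \mathcal{M}$, and recall that it is sufficient to show that for any $T \subset X$ we have
    \[ \mu^*(T) \ge \mu^*\Big( T \cap \bigcap_{j=1}^\infty E_j \Big) + \mu^*\Big( T \cap \Big( \bigcap_{j=1}^\infty E_j \Big)^c \Big). \tag{13} \]
    Using $\mu^*$-measurability of $E_1, E_2, \dots$ in turn, with the role of the test-set $T$ being played by $T$, $T \cap E_1$, $T \cap E_1 \cap E_2$, ... we get:
    \begin{align*}
    \mu^*(T) &= \mu^*(T \cap E_1^c) + \mu^*(T \cap E_1) \\
    &= \mu^*(T \cap E_1^c) + \mu^*(T \cap E_1 \cap E_2^c) + \mu^*(T \cap E_1 \cap E_2) \\
    &= \mu^*(T \cap E_1^c) + \mu^*(T \cap E_1 \cap E_2^c) + \mu^*(T \cap E_1 \cap E_2 \cap E_3^c) + \mu^*(T \cap E_1 \cap E_2 \cap E_3) \\
    &\;\;\vdots \\
    &= \sum_{i=1}^{n} \mu^*\Big( T \cap \Big( \bigcap_{j=1}^{i-1} E_j \Big) \cap E_i^c \Big) + \mu^*\Big( T \cap \Big( \bigcap_{j=1}^{n} E_j \Big) \Big).
    \end{align*}
    Due to monotonicity, the last term on the right hand side is at least $\mu^*(T \cap (\cap_{j=1}^\infty E_j))$, so we have:
    \[ \mu^*(T) \ge \sum_{i=1}^{n} \mu^*\Big( T \cap \Big( \bigcap_{j=1}^{i-1} E_j \Big) \cap E_i^c \Big) + \mu^*\Big( T \cap \Big( \bigcap_{j=1}^{\infty} E_j \Big) \Big). \]
    Since this inequality holds for any $n \ge 1$, we can let $n \to \infty$ on the right hand side and obtain:
    \[ \mu^*(T) \ge \sum_{i=1}^{\infty} \mu^*\Big( T \cap \Big( \bigcap_{j=1}^{i-1} E_j \Big) \cap E_i^c \Big) + \mu^*\Big( T \cap \Big( \bigcap_{j=1}^{\infty} E_j \Big) \Big). \]
    The union of the sets $T \cap (\cap_{j=1}^{i-1} E_j) \cap E_i^c$ (that appear in the sum) over $i = 1, 2, \dots$ equals $T \cap (\cup_{j=1}^\infty E_j^c) = T \cap (\cap_{j=1}^\infty E_j)^c$. Hence $\sigma$-sub-additivity of $\mu^*$ bounds the sum from below, and we get
    \[ \mu^*(T) \ge \mu^*\Big( T \cap \Big( \bigcap_{j=1}^{\infty} E_j \Big)^c \Big) + \mu^*\Big( T \cap \Big( \bigcap_{j=1}^{\infty} E_j \Big) \Big). \]
    And this is the inequality (13) we wanted to show. The proof is complete.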

    Lemma 2. Let $\mu^*$ be any outer measure on $\mathcal{E}(X)$, and let $\mathcal{M}$ be the collection of $\mu^*$-measurable sets. Then $\mu^*|_{\mathcal{M}}$ is a measure.

    Proof. It is easy to see finite additivity: let $E, F \in \mathcal{M}$ be disjoint. Using the $\mu^*$-measurability of $F$ with $T = E \cup F$, we have:
    \[ \mu^*(E \cup F) = \mu^*((E \cup F) \cap F) + \mu^*((E \cup F) \cap F^c) = \mu^*(F) + \mu^*(E). \]
    Since due to Lemma 1 $\mathcal{M}$ is an algebra, finite additivity follows by induction.

    We saw in Eqn. 1 of Corollary 3.6 that finite additivity on an algebra implies countable super-additivity: if $E = \cup_{n=1}^\infty E_n$ as a disjoint union, with $E_1, E_2, \dots \in \mathcal{M}$, then
    \[ \mu^*(E) \ge \sum_{n=1}^\infty \mu^*(E_n). \]
    On the other hand, since $\mu^*$ is an outer measure, we also have
    \[ \mu^*(E) \le \sum_{n=1}^\infty \mu^*(E_n). \]
    This proves that $\mu^*$ is $\sigma$-additive on $\mathcal{M}$, and the lemma follows.

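    We next prove (ii) in the Main Theorem. Here we need to use that $\mu$ is finitely additive, and that the outer measure $\mu^*$ was constructed using coverings with elements of $\mathcal{A}$.

    Lemma 3. Under the assumptions of the Main Theorem, we have $\mathcal{M} \supset \mathcal{A}$, and consequently $\mathcal{M} \supset \sigma(\mathcal{A})$.

    Proof. We need to show that any $A \in \mathcal{A}$ is $\mu^*$-measurable, and for this it is enough to show that for any $T \subset X$ we have
    \[ \mu^*(T) \ge \mu^*(T \cap A) + \mu^*(T \cap A^c). \tag{14} \]
    We may assume that $\mu^*(T) < \infty$, since otherwise (14) holds trivially. Let $\varepsilon > 0$. By the definition of $\mu^*$, we can find a covering $T \subset \cup_{n=1}^\infty B_n$, with $B_1, B_2, \dots \in \mathcal{A}$, such that
    \[ \mu^*(T) + \varepsilon \ge \sum_{n=1}^\infty \mu(B_n). \]
    Writing $\mu(B_n) = \mu(B_n \cap A) + \mu(B_n \cap A^c)$, this can be reformulated as:
    \[ \mu^*(T) + \varepsilon \ge \sum_{n=1}^\infty \mu(B_n \cap A) + \sum_{n=1}^\infty \mu(B_n \cap A^c). \tag{15} \]
    Since $T \cap A \subset \cup_{n=1}^\infty (B_n \cap A)$ and $T \cap A^c \subset \cup_{n=1}^\infty (B_n \cap A^c)$, the right hand side of (15) is bounded below by $\mu^*(T \cap A) + \mu^*(T \cap A^c)$. This gives:
    \[ \mu^*(T) + \varepsilon \ge \mu^*(T \cap A) + \mu^*(T \cap A^c). \]
    Since $\varepsilon > 0$ was arbitrary, we can let $\varepsilon \downarrow 0$, and this gives the required (14).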

    Finally, we prove (iv) of the Main Theorem. This is where we use that $\mu$ is a pre-measure.

    Lemma 4. Under the assumptions of the Main Theorem, $\mu^*|_{\mathcal{A}} = \mu$.

    Proof. Let $A \in \mathcal{A}$. Since $A$ is covered by $A \cup \emptyset \cup \emptyset \cup \dots$, we immediately have $\mu^*(A) \le \mu(A)$. What we need to show is that there does not exist a countable covering of $A$ that is more efficient than $A$ itself. For this, consider any covering $A \subset \cup_{n=1}^\infty B_n$, with $B_1, B_2, \dots \in \mathcal{A}$. Let $B'_n := A \cap B_n$, $n \ge 1$; these sets still cover $A$, and are in $\mathcal{A}$. We have $A = \cup_{n=1}^\infty B'_n$. Further, let us make the sets disjoint, that is, consider $B''_n := B'_n \cap ((B'_1)^c \cap \dots \cap (B'_{n-1})^c)$. Then $A = \cup_{n=1}^\infty B''_n$ as a disjoint union.

    Using that $\mu$ is a pre-measure, we have:
    \[ \mu(A) = \sum_{n=1}^\infty \mu(B''_n) \le \sum_{n=1}^\infty \mu(B'_n) \le \sum_{n=1}^\infty \mu(B_n). \]
    Since the covering $A \subset \cup_{n=1}^\infty B_n$ was arbitrary, we get $\mu(A) \le \mu^*(A)$. This completes the proof.

    Together Lemmas 1, 2, 3, 4 prove the Main Theorem.

    Together the Main Theorem and the Uniqueness Lemma complete the proof of Carathéodory's Extension Theorem.

    (Non-examinable)

    3.14 Example of a set that is not Lebesgue measurable

    Define the following relation between elements of the interval $[0, 1]$: for $x, y \in [0, 1]$, we write $x \sim y$, if $x - y \in \mathbb{Q}$. This is an equivalence relation:
    (i) $x \sim x$, because $x - x = 0 \in \mathbb{Q}$;
    (ii) $x \sim y$ implies $y \sim x$, because $y - x = -(x - y) \in \mathbb{Q}$;
    (iii) $x \sim y$ and $y \sim z$ imply $x \sim z$, because $x - z = (x - y) + (y - z) \in \mathbb{Q}$.
    Consider the equivalence classes with respect to the equivalence relation $\sim$. Let $E$ be a set that contains exactly one number from each equivalence class. We show that $E$ is not Lebesgue measurable.

    For each $q \in [-1, 1] \cap \mathbb{Q}$, denote $E_q := E + q$. We claim that the sets $E_q$, $q \in [-1, 1] \cap \mathbb{Q}$ are disjoint. Indeed, suppose we had $x \in E_q \cap E_r$, with $q, r \in [-1, 1] \cap \mathbb{Q}$. Then due to the definitions of $E_q$ and $E_r$, we must have $x = y + q = z + r$ for some $y, z \in E$. But then $y - z = r - q \in \mathbb{Q}$, and hence $y \sim z$. But since $E$ contains exactly one number from each equivalence class, we can only have $y \sim z$ if $y = z$ and hence $q = r$. This proves the disjointness claim.

    Next we claim that
    \[ [0, 1] \subset \bigcup_{q \in [-1,1] \cap \mathbb{Q}} E_q \subset [-1, 2]. \tag{16} \]
    The second inclusion is obvious, because $E \subset [0, 1]$ and each $q$ is in $[-1, 1]$. To see the first inclusion, let $x \in [0, 1]$. Then there exists a unique $y \in E$ such that $x \sim y$. Let $q = x - y \in [-1, 1] \cap \mathbb{Q}$. Then $x = y + q \in E_q$, so $x$ is an element of the middle set in (16).

    Suppose that $E$ was Lebesgue measurable. Then so are all the sets $E_q$ and $\lambda(E_q) = \lambda(E)$, because Lebesgue outer measure is translation invariant (because the volumes of boxes are, and Lebesgue outer measure was defined in terms of those). From (16) we have:
    \[ 0 < 1 = \lambda([0, 1]) \le \lambda\Big( \bigcup_{q \in [-1,1] \cap \mathbb{Q}} E_q \Big) = \sum_{q \in [-1,1] \cap \mathbb{Q}} \lambda(E_q) = \sum_{q \in [-1,1] \cap \mathbb{Q}} \lambda(E). \]
    It follows from this that we must have $\lambda(E) > 0$. But then, again from (16) we get
    \[ 3 = \lambda([-1, 2]) \ge \lambda\Big( \bigcup_{q \in [-1,1] \cap \mathbb{Q}} E_q \Big) = \sum_{q \in [-1,1] \cap \mathbb{Q}} \lambda(E_q) = \sum_{q \in [-1,1] \cap \mathbb{Q}} \lambda(E) = \infty, \]
    because an infinite sum, all of whose terms are a fixed positive number, is $\infty$. This is a contradiction, so we must have that $E$ is not Lebesgue measurable.

    Remark 1. In defining the set $E$ we used the so-called Axiom of choice, that asserts that if there is a family of non-empty sets $\{A_\alpha : \alpha \in I\}$, then there exists a function $f : I \to \cup_{\alpha \in I} A_\alpha$ such that $f(\alpha) \in A_\alpha$ for each $\alpha \in I$. That is, there exists a "choice function" that selects one element from each set in the family. In our case the sets $A_\alpha$ are the equivalence classes with respect to $\sim$.

    4 Measurable functions and their properties

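    4.1 Open sets in $[-\infty, \infty]$

    Sometimes it will be useful to consider functions taking values in $[-\infty, \infty]$, rather than in $\mathbb{R}$. The following two definitions adapt the notion of open set to this setting.

    Definition 1. We say that $I \subset [-\infty, \infty]$ is an open interval, if either $I \subset \mathbb{R}$, and $I$ is an open interval in the ordinary sense, or if $I$ has one of the following forms:
    \[ I = \begin{cases} [-\infty, b) & \text{where } -\infty < b \le \infty; \text{ or} \\ (a, \infty] & \text{where } -\infty \le a < \infty. \end{cases} \]

    Definition 2. We say that $U \subset [-\infty, \infty]$ is open, if for any $x \in U$ there exists an open interval $I$ such that $x \in I \subset U$. A set $F \subset [-\infty, \infty]$ is closed, if $F^c = [-\infty, \infty] \setminus F$ is open.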

    4.2 Continuous functions

    Notation (inverse image): For any function $f : X \to Y$ between two sets, and any $B \subset Y$, we denote the inverse image of $B$ under $f$ as
    \[ f^{-1}(B) := \{x \in X : f(x) \in B\}. \]
    Note that nothing is implied about $f$ having an inverse: in general, it will not have one. The notation $f^{-1}(B)$ is simply a shorthand for the set of points in $X$ that $f$ maps into the set $B$.

    Definition 1. Let $U \subset \mathbb{R}^d$ be an open set. A function $f : U \to \mathbb{R}^k$ is called continuous, if for all open sets $V \subset \mathbb{R}^k$ the set $f^{-1}(V)$ is an open subset of $U$.

    Note: A similar definition can be made for functions $f : U \to [-\infty, \infty]$.

    Remark. (HW) The above definition of continuity is equivalent to the usual $\varepsilon$-$\delta$ definition of continuity:

    for all $x \in U$ and all $\varepsilon > 0$, there exists $\delta > 0$ such that whenever $y \in U$ and $|y - x| < \delta$ then $|f(y) - f(x)| < \varepsilon$.

    4.3 Measurable functions

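    Definition 1. Let $(X, \mathcal{M})$ be a measurable space. We say that a function $f : X \to \mathbb{R}^k$ is measurable, if for all open sets $V \subset \mathbb{R}^k$, the set $f^{-1}(V) \in \mathcal{M}$.

    Note: When more than one $\sigma$-algebra is considered in $X$ (a situation common in probability), then we say $f$ is $\mathcal{M}$-measurable, in order to clarify which $\sigma$-algebra is meant.

    We sometimes need the following.

    Definition 2. Let $(X, \mathcal{M})$ be a measurable space. We say that a function $f : X \to [-\infty, \infty]$ is measurable, if for all open $V \subset [-\infty, \infty]$, the set $f^{-1}(V) \in \mathcal{M}$.

    Example 1. Let $E \in \mathcal{M}$. Then $1_E : X \to \mathbb{R}$ defined by
    \[ 1_E(x) := \begin{cases} 1 & \text{if } x \in E; \\ 0 & \text{if } x \notin E, \end{cases} \]
    is a measurable function (HW).

    Example 2. Let $U \subset \mathbb{R}^d$ be open, and let
    \[ \mathcal{B}_U := \{B \cap U : B \in \mathcal{B}\} = \sigma(\text{open subsets of } U) \quad \text{(the equality is HW)}, \]
    so that $(U, \mathcal{B}_U)$ is a measurable space. Suppose that $f : U \to \mathbb{R}^k$ is continuous. Then $f$ is also measurable on $(U, \mathcal{B}_U)$, as can be seen as follows. Let $V \subset \mathbb{R}^k$ be open. Then $f^{-1}(V) \subset U$ is open, and hence it is in $\mathcal{B}_U$. Therefore, $f$ is measurable on $(U, \mathcal{B}_U)$.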

    4.4 Compositions of functions

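    Let $(X, \mathcal{M})$ be a measurable space.

    Theorem 1. Let $U \subset \mathbb{R}^d$ be an open set. If $f : X \to U$ is measurable, and $g : U \to \mathbb{R}^k$ is continuous, then $h = g \circ f$, defined by $h(x) = g(f(x))$, $h : X \to \mathbb{R}^k$, is measurable.

    Proof. Let $V \subset \mathbb{R}^k$ be open. We have:
    \[ h^{-1}(V) = \{x \in X : h(x) \in V\} = \{x \in X : g(f(x)) \in V\} = \{x \in X : f(x) \in g^{-1}(V)\} = f^{-1}(g^{-1}(V)) \in \mathcal{M}, \]
    where $g^{-1}(V)$ is open by the assumption on $g$, and then $f^{-1}(g^{-1}(V)) \in \mathcal{M}$ by the assumption on $f$.

    Theorem 2. Let $u, v : X \to \mathbb{R}$ be measurable, let $\Phi : \mathbb{R}^2 \to \mathbb{R}$ be continuous, and put $h(x) = \Phi(u(x), v(x))$, $h : X \to \mathbb{R}$. Then $h$ is measurable.

    Proof. Put $f(x) = (u(x), v(x))$, so that $f : X \to \mathbb{R}^2$, and $h = \Phi \circ f$. Due to Theorem 1, it is enough to show that $f$ is measurable. Let $R \subset \mathbb{R}^2$ be any open rectangle, that is $R = I_1 \times I_2$ for open intervals $I_1, I_2 \subset \mathbb{R}$. We have:
    \[ f^{-1}(R) = \{x \in X : f(x) \in R\} = \{x \in X : (u(x), v(x)) \in I_1 \times I_2\} = \{x \in X : u(x) \in I_1\} \cap \{x \in X : v(x) \in I_2\} = u^{-1}(I_1) \cap v^{-1}(I_2) \in \mathcal{M}. \]
    Hence the inverse image of any open rectangle is measurable. Now if $V \subset \mathbb{R}^2$ is any open set, we can write $V = \cup_{i=1}^\infty R_i$, with all $R_i$'s open rectangles (HW). Then we have
    \[ f^{-1}(V) = f^{-1}(\cup_{i=1}^\infty R_i) = \cup_{i=1}^\infty f^{-1}(R_i) \in \mathcal{M}, \]
    where the second equality is (HW), and each $f^{-1}(R_i) \in \mathcal{M}$.

    Corollary.
    (i) If $f : X \to \mathbb{R}$ is measurable, so are $|f|$, $f^+ := \max\{f, 0\}$ and $f^- := \max\{-f, 0\}$.
    (ii) If $f, g : X \to \mathbb{R}$ are measurable, so are $f + g$ and $fg$.
    (iii) If $f : X \to \mathbb{C}$ is measurable, there exists a measurable function $\alpha : X \to \mathbb{C}$ such that $|\alpha| \equiv 1$, and $f = \alpha |f|$.

    Proof. (i) Follows from Theorem 1 with $g(y) = |y|$, $g(y) = \max\{y, 0\}$ and $g(y) = \max\{-y, 0\}$.

    (ii) Follows from Theorem 2 with $\Phi(s, t) = s + t$ and $\Phi(s, t) = st$.

    (iii) Put $X_0 := f^{-1}(\mathbb{C} \setminus \{0\}) \in \mathcal{M}$. The restriction $f|_{X_0} : X_0 \to \mathbb{C} \setminus \{0\}$ is measurable (with respect to $\mathcal{M}_0 := \{E \subset X_0 : E \in \mathcal{M}\}$), and $g : \mathbb{C} \setminus \{0\} \to \mathbb{C}$ defined by $g(z) = z/|z|$ is continuous. Therefore, on $X_0$ we can define $\alpha(x) = g(f(x))$, and on $X \setminus X_0$, we can set $\alpha(x) = 1$.

    Note: It is not difficult to check that (i) remains true if $f : X \to [-\infty, \infty]$.

    Note: In general, I will not spell out the case of complex valued functions in these notes. These do not present any major difficulty compared to the real-valued case, as you can check by consulting the text [4].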

    4.5 Borel functions

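    Definition 1. Consider the measurable space $(\mathbb{R}^d, \mathcal{B})$, or more generally the space $(U, \mathcal{B}_U)$, where $U \subset \mathbb{R}^d$ is open. A function $g : U \to \mathbb{R}$ is called a Borel function, if it is measurable with respect to $(U, \mathcal{B}_U)$.

    Theorem. Let $\mathcal{M}$ be a $\sigma$-algebra in $X$, and $f : X \to \mathbb{R}^d$ a function.
    (i) Let $\mathcal{F} := \{E \subset \mathbb{R}^d : f^{-1}(E) \in \mathcal{M}\}$. Then $\mathcal{F}$ is a $\sigma$-algebra in $\mathbb{R}^d$.
    (ii) If $f$ is measurable, and $E$ is a Borel set in $\mathbb{R}^d$, then $f^{-1}(E) \in \mathcal{M}$.
    (iii) If $f : X \to \mathbb{R}$ and $f^{-1}((\alpha, \infty)) \in \mathcal{M}$ for every $\alpha \in \mathbb{R}$, then $f$ is measurable.
    (iii') If $f : X \to [-\infty, \infty]$ and $f^{-1}((\alpha, \infty]) \in \mathcal{M}$ for every $\alpha \in \mathbb{R}$, then $f$ is measurable.
    (iv) If $f : X \to \mathbb{R}^d$ is measurable, $g : \mathbb{R}^d \to \mathbb{R}$ is a Borel function, and $h = g \circ f$, then $h : X \to \mathbb{R}$ is also measurable.

    Proof. (i) We check the three properties for $\mathcal{F}$ to be a $\sigma$-algebra. First, $f^{-1}(\mathbb{R}^d) = X \in \mathcal{M}$, so $\mathbb{R}^d \in \mathcal{F}$. Second, if $E \in \mathcal{F}$, then
    \[ f^{-1}(\mathbb{R}^d \setminus E) = X \setminus f^{-1}(E) \in \mathcal{M}, \]
    since $f^{-1}(E) \in \mathcal{M}$, and hence $\mathbb{R}^d \setminus E \in \mathcal{F}$, so $\mathcal{F}$ is closed under complements. Finally, if $E_1, E_2, \dots \in \mathcal{F}$, then
    \[ f^{-1}(\cup_{i=1}^\infty E_i) = \cup_{i=1}^\infty f^{-1}(E_i) \in \mathcal{M}, \]
    since each $f^{-1}(E_i) \in \mathcal{M}$, so $\cup_{i=1}^\infty E_i \in \mathcal{F}$, and $\mathcal{F}$ is closed under countable unions. This completes the proof that $\mathcal{F}$ is a $\sigma$-algebra.

    (ii) If $f$ is measurable, then all open sets of $\mathbb{R}^d$ are in $\mathcal{F}$. Since by part (i) the collection $\mathcal{F}$ is a $\sigma$-algebra, it follows that $\mathcal{B} \subset \mathcal{F}$. This proves statement (ii).

    (iii), (iii') We only prove (iii'), the proof of (iii) is very similar. Let $\mathcal{F} = \{E \subset [-\infty, \infty] : f^{-1}(E) \in \mathcal{M}\}$. By the same proof as in part (i), $\mathcal{F}$ is a $\sigma$-algebra. Choose $\alpha \in \mathbb{R}$, and choose $\alpha_n < \alpha$ such that $\alpha_n \to \alpha$. Then
    \[ [-\infty, \alpha) = \cup_{n=1}^\infty [-\infty, \alpha_n] = \cup_{n=1}^\infty (\alpha_n, \infty]^c \in \mathcal{F}, \]
    since each $(\alpha_n, \infty] \in \mathcal{F}$ by assumption. It follows that
    \[ (\alpha, \beta) = [-\infty, \beta) \cap (\alpha, \infty] \in \mathcal{F}. \]
    Every open set $V \subset [-\infty, \infty]$ is a countable union of segments of the above types, so all open sets in $[-\infty, \infty]$ belong to $\mathcal{F}$. This implies that $f$ is measurable.

    (iv) Let $V \subset \mathbb{R}$ be open. Then $g^{-1}(V) \in \mathcal{B}$, and hence
    \[ h^{-1}(V) = f^{-1}(g^{-1}(V)) \in \mathcal{M} \]
    by part (ii).

    Remark. Statement (iv) remains true if $\mathbb{R}^d$ is replaced by $[-\infty, \infty]$, $[0, \infty]$, or in fact any Borel subset of $[-\infty, \infty]$.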

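    4.6 Limits of measurable functions

    Let $(X, \mathcal{M})$ be a measurable space.

    Theorem. If $f_n : X \to [-\infty, \infty]$, $n \ge 1$ are measurable functions, then
    \[ g := \sup_{n \ge 1} f_n \quad \text{and} \quad h := \limsup_{n \to \infty} f_n \]
    are also measurable. Similarly for $\inf_{n \ge 1} f_n$, and $\liminf_{n \to \infty} f_n$.

    Proof. For any $\alpha \in \mathbb{R}$ we have
    \begin{align*}
    g^{-1}((\alpha, \infty]) &= \{x \in X : \sup_{n \ge 1} f_n(x) > \alpha\} \\
    &= \{x \in X : f_n(x) > \alpha \text{ for some } n \ge 1\} \\
    &= \bigcup_{n=1}^\infty \{x \in X : f_n(x) > \alpha\} \\
    &= \bigcup_{n=1}^\infty f_n^{-1}((\alpha, \infty]) \in \mathcal{M},
    \end{align*}
    since each $f_n^{-1}((\alpha, \infty]) \in \mathcal{M}$ by the measurability of $f_n$. This implies that $g$ is measurable, by the extended-real version of Theorem 4.5(iii). From this we also get that
    \[ \inf_{n \ge 1} f_n = -\sup_{n \ge 1} (-f_n) \]
    is also measurable.

    We can write
    \[ h = \lim_{n \to \infty} \sup_{k \ge n} f_k = \inf_{n \ge 1} \sup_{k \ge n} f_k, \]
    and this is measurable by the previous paragraph. Similarly, $\liminf_{n \to \infty} f_n = \sup_{n \ge 1} \inf_{k \ge n} f_k$ is also measurable.

    Corollary.
    (i) If $f_n : X \to [-\infty, \infty]$ are measurable, and $f_n(x) \to f(x)$ in $[-\infty, \infty]$ at every $x \in X$, then $f$ is also measurable.
    (ii) If $f_n : X \to \mathbb{R}$ are measurable, and $f_n(x) \to f(x)$ in $\mathbb{R}$ at every $x \in X$, then $f$ is also measurable.
    (iii) If $f : X \to [-\infty, \infty]$ is measurable, so are $f^+ := \max\{f, 0\}$ and $f^- := \max\{-f, 0\}$.

    Remark. $|f| = f^+ + f^-$ and $f = f^+ - f^-$ are useful identities.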

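    4.7 Simple functions

    Let $(X, \mathcal{M})$ be a measurable space.

    Definition. A function $s : X \to \mathbb{R}$ is called simple, if its range is a finite set.

    Note: Here we exclude $\pm\infty$ from the possible values.

    If $\alpha_1, \dots, \alpha_n$ are the distinct values of $s$, we set
    \[ A_i = \{x \in X : s(x) = \alpha_i\}. \]
    Then
    \[ s = \sum_{i=1}^n \alpha_i 1_{A_i}. \]

    Remark. (HW) $s$ is measurable $\iff$ $A_i \in \mathcal{M}$ for all $1 \le i \le n$.

    Theorem. Let $f : X \to [0, \infty]$ be measurable. There exist measurable simple functions $s_n$, such that:
    (a) $0 \le s_1 \le s_2 \le \dots \le f$;
    (b) $s_n(x) \to f(x)$ as $n \to \infty$ at every $x \in X$.

    Proof. For each $n \ge 1$, define the function:
    \[ \varphi_n(t) := \begin{cases} k 2^{-n} & \text{if } k 2^{-n} \le t < (k+1) 2^{-n} \text{ for some integer } 0 \le k < n 2^n; \\ n & \text{if } t \ge n. \end{cases} \]
    It is easy to see that:
    each $\varphi_n$ is a Borel function on $[0, \infty]$;
    $t - 2^{-n} < \varphi_n(t) \le t$, if $0 \le t \le n$;
    $0 \le \varphi_1(t) \le \varphi_2(t) \le \dots \le t$;
    $\varphi_n(t) \to t$ as $n \to \infty$ for all $t \in [0, \infty]$.
    Let us set $s_n(x) := \varphi_n(f(x))$. Then (a) + (b) are clear. Also, $s_n$ is measurable as a composition of the measurable function $f$ with the Borel function $\varphi_n$ (Theorem 4.5(iv)).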
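    The staircase functions used in the proof above are easy to experiment with numerically. The following is a minimal sketch: the name `phi` and the sample points in the usage note are choices made here, not part of the notes. It rounds $t$ down to the nearest multiple of $2^{-n}$ and caps the result at $n$.

```python
import math


def phi(n: int, t: float) -> float:
    """Dyadic staircase: largest multiple of 2**-n not exceeding t, capped at n."""
    if t >= n:  # also covers t = math.inf
        return float(n)
    return math.floor(t * 2**n) / 2**n


def simple_approximation(n, f, x):
    """s_n(x) = phi_n(f(x)) for a non-negative function f."""
    return phi(n, f(x))
```

    For instance, `phi(1, 0.7)`, `phi(2, 0.7)` and `phi(3, 0.7)` give 0.5, 0.5 and 0.625, which increase towards 0.7, while `phi(2, math.inf)` returns 2.0, reflecting the cap at $n$.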


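    4.8 Monotone class theorem

    Let $X$ be a non-empty set.

    Theorem (Monotone class theorem). Let $\mathcal{P}$ be a $\pi$-system in $X$, such that $X \in \mathcal{P}$. Let $\mathcal{H}$ be a collection of functions $X \to \mathbb{R}$ satisfying the following:
    (i) If $A \in \mathcal{P}$, then $1_A \in \mathcal{H}$;
    (ii) If $f, g \in \mathcal{H}$, then also $f + g \in \mathcal{H}$ and for all $c \in \mathbb{R}$ also $cf \in \mathcal{H}$ (linear space);
    (iii) If $f_n \in \mathcal{H}$, $f_n \uparrow f$, $f$ bounded, $f_n \ge 0$, then also $f \in \mathcal{H}$ (closed under bounded monotone convergence of non-negative functions).
    Then $\mathcal{H}$ contains all bounded measurable functions with respect to $\sigma(\mathcal{P})$.

    Proof. (Non-examinable) Step 1. Put
    \[ \mathcal{G} := \{A \subset X : 1_A \in \mathcal{H}\}. \]
    Then $\mathcal{P} \subset \mathcal{G}$ by assumption. We show that $\mathcal{G}$ is a $\lambda$-system in $X$:
    $X \in \mathcal{P}$ by assumption, so by (i) we have $X \in \mathcal{G}$;
    if $B, C \in \mathcal{G}$, $B \subset C$, then $1_{C \setminus B} = 1_C - 1_B \in \mathcal{H}$ by (ii). Hence $C \setminus B \in \mathcal{G}$;
    if $B_n \in \mathcal{G}$, $B_n \uparrow B$, then $1_{B_n} \uparrow 1_B$, and by (iii) we have $1_B \in \mathcal{H}$. Hence $B \in \mathcal{G}$.
    This verifies that $\mathcal{G}$ is a $\lambda$-system. By the $\pi$-$\lambda$ theorem, we conclude that $\sigma(\mathcal{P}) \subset \mathcal{G}$.

    Summarizing the previous paragraph: the indicator function of any element of $\sigma(\mathcal{P})$ is in $\mathcal{H}$. It follows using (ii) that all simple $\sigma(\mathcal{P})$-measurable functions belong to $\mathcal{H}$.

    Using (iii) and Theorem 4.7 we get that all non-negative bounded $\sigma(\mathcal{P})$-measurable functions belong to $\mathcal{H}$. Using $f = f^+ - f^-$, the statement follows for all bounded $\sigma(\mathcal{P})$-measurable functions.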

    5 Abstract integration theory (Lebesgue integral)

    5.1 Arithmetic in $[0, \infty]$

    Recall the arithmetic operations introduced in Section 0.2.

    Proposition. If $0 \le a_1 \le a_2 \le \dots$ and $0 \le b_1 \le b_2 \le \dots$, and $a_n \to a$ and $b_n \to b$, then $a_n b_n \to ab$.

    Proof. (HW).

    With the arithmetic introduced, sums and products of measurable functions with values in $[0, \infty]$ are also measurable. Indeed, if we have $0 \le s_n \uparrow f$ and $0 \le t_n \uparrow g$ (with $s_n$, $t_n$ measurable simple functions), then using the Proposition we have $(s_n + t_n) \uparrow (f + g)$ and $s_n t_n \uparrow fg$, showing that $f + g$ and $fg$ are also measurable.

    5.2 Integration of non-negative simple functions

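    Let $(X, \mathcal{M}, \mu)$ be a measure space.

    Definition. Let $s : X \to [0, \infty)$ be a measurable simple function with distinct values $\alpha_1, \dots, \alpha_n$, so that $s = \sum_{i=1}^n \alpha_i 1_{A_i}$. For any $E \in \mathcal{M}$, we define
    \[ \int_E s \, d\mu := \sum_{i=1}^n \alpha_i \mu(E \cap A_i). \]

    Note: we use here the convention $0 \cdot \infty = 0$, because we may well have $\alpha_i = 0$ and $\mu(E \cap A_i) = \infty$.

    Lemma. If $0 \le s \le t$ are measurable simple functions, then
    \[ \int_E s \, d\mu \le \int_E t \, d\mu. \]

    Proof. If $s = \sum_i \alpha_i 1_{A_i}$ and $t = \sum_j \beta_j 1_{B_j}$, then whenever $A_i \cap B_j$ is non-empty, we must have $\alpha_i \le \beta_j$ (since we assumed $s \le t$). Hence, we have
    \begin{align*}
    \int_E s \, d\mu &= \sum_i \alpha_i \mu(E \cap A_i) = \sum_i \sum_j \alpha_i \mu(E \cap A_i \cap B_j) \\
    &\le \sum_i \sum_j \beta_j \mu(E \cap A_i \cap B_j) = \sum_j \sum_i \beta_j \mu(E \cap A_i \cap B_j) \\
    &= \sum_j \beta_j \mu(E \cap B_j) = \int_E t \, d\mu.
    \end{align*}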
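    On a finite set the definition above can be computed directly. The sketch below assumes, purely for illustration, that the measure is given by point masses stored in a dictionary; the names `integral_simple`, `mu`, `s` and `t` are hypothetical and not from the notes. It implements the sum over the distinct values of $s$, including the convention $0 \cdot \infty = 0$.

```python
import math


def integral_simple(s, mu, E):
    """Integral over E of a simple function s >= 0 (dict: point -> value)
    with respect to the measure with point masses mu (dict: point -> mass)."""
    total = 0.0
    for alpha in set(s.values()):                      # distinct values of s
        level_set = {x for x in s if s[x] == alpha}    # A_i = {s = alpha_i}
        mass = sum(mu[x] for x in level_set & E)       # mu(E ∩ A_i)
        if alpha == 0 or mass == 0:
            continue                                   # convention 0 * inf = 0
        total += alpha * mass
    return total


# Hypothetical data: the point "b" carries infinite mass, but s and t vanish there.
mu = {"a": 1.0, "b": math.inf, "c": 2.0}
s = {"a": 3.0, "b": 0.0, "c": 3.0}
t = {"a": 4.0, "b": 0.0, "c": 3.0}
X = set(mu)
```

    With these values, `integral_simple(s, mu, X)` is 9.0 and `integral_simple(t, mu, X)` is 10.0, in line with the Lemma since $s \le t$. The infinite mass at "b" contributes nothing because both functions are zero there.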

    5.3 Integration of non-negative functions

    As in the previous section, $(X, \mathcal{M}, \mu)$ is any measure space.

    Definition. Let $f : X \to [0, \infty]$ be a measurable function. For any $E \in \mathcal{M}$, we define
    \[ \int_E f \, d\mu := \sup \Big\{ \int_E s \, d\mu : 0 \le s \le f,\ s \text{ a measurable simple function} \Big\}, \tag{*} \]
    that is, the supremum is over all measurable simple functions $s$ satisfying $0 \le s \le f$. The expression (*) is called the Lebesgue integral of $f$ with respect to $\mu$. It is a value in $[0, \infty]$.

    Note: If $f$ itself is a simple function, then $s = f$ appears in the supremum, and due to Lemma 5.2 is the maximal value considered in the supremum. Therefore, for simple functions the definition (*) agrees with the earlier Definition 5.2.

    5.4 Basic properties of the integral

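    Theorem. All functions below are assumed measurable.
    (i) If $0 \le f \le g$ then $\int_E f \, d\mu \le \int_E g \, d\mu$.
    (ii) If $A \subset B$, $f \ge 0$ then $\int_A f \, d\mu \le \int_B f \, d\mu$.
    (iii) If $f \ge 0$, $c$ is a constant $0 \le c < \infty$, then $\int_E cf \, d\mu = c \int_E f \, d\mu$.
    [...]

    [...] the distinct values of $cs$. Hence
    \[ \int_E cs \, d\mu = \sum_i c \alpha_i \mu(E \cap A_i) = c \sum_i \alpha_i \mu(E \cap A_i) = c \int_E s \, d\mu. \tag{17} \]
    Also note that $0 \le s \le f$ if and only if $0 \le cs \le cf$. Taking sup on both sides of (17) we get the statement.

    (iv) If $E = \emptyset$, then $\int_E s \, d\mu = 0$ for any $0 \le s \le f$. If $E \ne \emptyset$, $0 \le s \le f$, then $0 \in \mathrm{rng}(s)$. Without loss of generality, assume $\alpha_1 = 0$. Then $E \cap A_i = \emptyset$ for $i \ge 2$, and
    \[ \int_E s \, d\mu = 0 \cdot \mu(E \cap A_1) + \sum_{i \ge 2} \alpha_i \mu(\emptyset) = 0, \]
    (even if $\mu(E) = \infty$).

    (v) If $\mu(E) = 0$, then $\mu(E \cap A_i) = 0$ for all $i$, so for any $0 \le s \le f$ we have $\int_E s \, d\mu = 0$ (even if $f(x) = \infty$ for all $x \in E$). This implies the statement.

    (vi) We may assume that $E \ne X$ (otherwise the statement is trivial). Let $0 \le s \le f$, $s = \sum_{i=1}^n \alpha_i 1_{A_i}$. Observe that $1_E s$ is also a simple function. We distinguish two cases according to whether $0 \in \mathrm{rng}(s)$ or $0 \notin \mathrm{rng}(s)$.

    If $0 \in \mathrm{rng}(s)$, assume (without loss of generality) that $\alpha_1 = 0$, and that $E \cap A_i \ne \emptyset$ for $i = 2, \dots, n' \le n$, and $E \cap A_i = \emptyset$ for $n' < i \le n$. Then the distinct values of $1_E s$ are $0 = \alpha_1, \alpha_2, \dots, \alpha_{n'}$, and
    \[ 1_E s = 0 \cdot 1_{E^c \cup A_1} + \sum_{i=2}^{n'} \alpha_i 1_{E \cap A_i} = 0 \cdot 1_{E^c \cup A_1} + \sum_{i=2}^{n} \alpha_i 1_{E \cap A_i} \]
    gives the decomposition of the simple function $1_E s$ into a sum of indicators. The decomposition gives:
    \[ \int_X 1_E s \, d\mu = 0 \cdot \mu(E^c \cup A_1) + \sum_{i=2}^{n} \alpha_i \mu(E \cap A_i) = \sum_{i=1}^{n} \alpha_i \mu(E \cap A_i) = \int_E s \, d\mu. \]

    If $0 \notin \mathrm{rng}(s)$, then assume without loss of generality that $E \cap A_i \ne \emptyset$ for $i = 1, \dots, n' \le n$, and $E \cap A_i = \emptyset$ for $n' < i \le n$. Then the distinct values of $1_E s$ are $0, \alpha_1, \alpha_2, \dots, \alpha_{n'}$, and
    \[ 1_E s = 0 \cdot 1_{E^c} + \sum_{i=1}^{n'} \alpha_i 1_{E \cap A_i} = 0 \cdot 1_{E^c} + \sum_{i=1}^{n} \alpha_i 1_{E \cap A_i} \]
    gives the decomposition of the simple function $1_E s$ into a sum of indicators. The decomposition gives:
    \[ \int_X 1_E s \, d\mu = 0 \cdot \mu(E^c) + \sum_{i=1}^{n} \alpha_i \mu(E \cap A_i) = \sum_{i=1}^{n} \alpha_i \mu(E \cap A_i) = \int_E s \, d\mu. \]

    Thus in both cases we have
    \[ \int_X 1_E s \, d\mu = \int_E s \, d\mu. \tag{18} \]
    Now we use that all simple functions $0 \le t \le 1_E f$ are of the form $1_E s$ for some simple function $0 \le s \le f$. Therefore, taking the sup on both sides of (18) we get the statement.

    (vii) Put $E_n = \{x : f(x) \ge n\}$. Then for all $n \ge 1$, using (i) for the first inequality and (ii) for the second, we have
    \[ n \mu(E_n) = \int_{E_n} n \, d\mu \le \int_{E_n} f \, d\mu \le \int_X f \, d\mu =: K \]
    [...]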

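    Proof. (i) It is clear that $\varphi : \mathcal{M} \to [0, \infty]$, where $\varphi(E) = \int_E s \, d\mu$. We also have $\varphi(\emptyset) = 0$, due to Theorem 5.4(v). Suppose now that $E_1, E_2, \dots \in \mathcal{M}$ are disjoint, and $s = \sum_{i=1}^n \alpha_i 1_{A_i}$. Then
    \begin{align*}
    \varphi(\cup_{m=1}^\infty E_m) &= \int_{\cup_{m=1}^\infty E_m} s \, d\mu = \sum_{i=1}^n \alpha_i \mu\Big( \Big( \bigcup_{m=1}^\infty E_m \Big) \cap A_i \Big) \\
    &= \sum_{i=1}^n \alpha_i \mu\Big( \bigcup_{m=1}^\infty (E_m \cap A_i) \Big) = \sum_{i=1}^n \alpha_i \sum_{m=1}^\infty \mu(E_m \cap A_i) \\
    &= \sum_{m=1}^\infty \sum_{i=1}^n \alpha_i \mu(E_m \cap A_i) = \sum_{m=1}^\infty \int_{E_m} s \, d\mu = \sum_{m=1}^\infty \varphi(E_m).
    \end{align*}
    This completes the proof that $\varphi$ is a measure.

    (ii) Let $t = \sum_{j=1}^m \beta_j 1_{B_j}$. Then if $E_{ij} = A_i \cap B_j$, we have
    \[ \int_{E_{ij}} (s + t) \, d\mu = (\alpha_i + \beta_j) \mu(E_{ij}) = \alpha_i \mu(E_{ij}) + \beta_j \mu(E_{ij}) = \int_{E_{ij}} s \, d\mu + \int_{E_{ij}} t \, d\mu. \]
    Due to part (i), we can sum both sides over $i = 1, \dots, n$, $j = 1, \dots, m$ to get the statement.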

    5.6 Monotone Convergence Theorem

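    Theorem (Monotone Convergence Theorem). Let $\{f_n\}_{n \ge 1}$ be a sequence of measurable functions on $X$, and suppose that:
    (i) $0 \le f_1(x) \le f_2(x) \le \dots$ for all $x \in X$;
    (ii) $f_n(x) \to f(x)$ as $n \to \infty$ for all $x \in X$.
    Then $f$ is measurable, and
    \[ \int_X f_n \, d\mu \to \int_X f \, d\mu, \quad \text{as } n \to \infty. \tag{*} \]

    Proof. By Theorem 4.6, $f$ is measurable. Due to Theorem 5.4(i), we have $\int_X f_n \, d\mu \le \int_X f_{n+1} \, d\mu$, and therefore there exists $\alpha \in [0, \infty]$, such that $\int_X f_n \, d\mu \to \alpha$ as $n \to \infty$. Since $f_n \le f$ for all $n \ge 1$, it is also clear that $\alpha \le \int_X f \, d\mu$. Hence it remains to show that
    \[ \alpha \ge \int_X f \, d\mu. \tag{19} \]
    Fix $0 < c < 1$, and let $0 \le s \le f$ be a measurable simple function. Consider the sets
    \[ E_n := \{x \in X : f_n(x) \ge c s(x)\}. \]
    Due to the monotonicity of the $f_n$'s we have $E_1 \subset E_2 \subset \dots$. Also, every $x \in X$ belongs to $E_n$ for large enough $n$: if $f(x) = 0$, then $s(x) = 0$ and $x \in E_1$; if $f(x) > 0$, then $f(x) > c s(x)$ because $c < 1$, and since $f_n(x) \to f(x)$ we get $f_n(x) \ge c s(x)$ for large $n$. This shows that $\cup_{n=1}^\infty E_n = X$. Now observe that
    \[ \int_X f_n \, d\mu \ge \int_{E_n} f_n \, d\mu \ge \int_{E_n} c s \, d\mu = c \int_{E_n} s \, d\mu. \]
    As $n \to \infty$, the left hand side approaches $\alpha$. Using that $\varphi(E) := \int_E s \, d\mu$ is a measure (Theorem 5.5), the right hand side approaches $c \varphi(X) = c \int_X s \, d\mu$, which gives:
    \[ \alpha \ge c \int_X s \, d\mu. \]
    Since $0 < c < 1$ is arbitrary here, we can let $c \uparrow 1$, and this yields:
    \[ \alpha \ge \int_X s \, d\mu. \]
    Finally, taking the sup over $s$ on the right hand side yields (19). The proof is complete.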

    5.7 Sums of non-negative series

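    As in earlier sections, let $(X, \mathcal{M}, \mu)$ be any measure space.

    Theorem.
    (i) If $f, g : X \to [0, \infty]$ are measurable, then
    \[ \int_X (f + g) \, d\mu = \int_X f \, d\mu + \int_X g \, d\mu. \]
    (ii) If $f_n : X \to [0, \infty]$, $n = 1, 2, \dots$ are measurable, and $f(x) = \sum_{n=1}^\infty f_n(x)$, then
    \[ \int_X f \, d\mu = \sum_{n=1}^\infty \int_X f_n \, d\mu. \]

    Proof. (i) There exist measurable simple functions $0 \le s_i \uparrow f$ and $0 \le t_i \uparrow g$. Then due to Proposition 5.1 we have $0 \le (s_i + t_i) \uparrow (f + g)$. Applying the Monotone Convergence Theorem (Theorem 5.6) in the first and last steps, and Theorem 5.5 in the second, we get:
    \begin{align*}
    \int_X (f + g) \, d\mu &= \lim_{i \to \infty} \int_X (s_i + t_i) \, d\mu = \lim_{i \to \infty} \Big[ \int_X s_i \, d\mu + \int_X t_i \, d\mu \Big] \\
    &= \lim_{i \to \infty} \int_X s_i \, d\mu + \lim_{i \to \infty} \int_X t_i \, d\mu = \int_X f \, d\mu + \int_X g \, d\mu.
    \end{align*}

    (ii) Consider the partial sums $g_n(x) = \sum_{i=1}^n f_i(x)$. Then $0 \le g_1(x) \le g_2(x) \le \dots$, and $g_n(x) \uparrow f(x)$ for all $x \in X$. Hence by the Monotone Convergence Theorem (first equality) and part (i) (second equality) we get
    \[ \int_X f \, d\mu = \lim_{n \to \infty} \int_X g_n \, d\mu = \lim_{n \to \infty} \Big[ \sum_{i=1}^n \int_X f_i \, d\mu \Big] = \sum_{n=1}^\infty \int_X f_n \, d\mu. \]

    Corollary. Suppose that $a_{ij} \ge 0$, $i, j = 1, 2, \dots$ are real numbers. Then
    \[ \sum_{i=1}^\infty \sum_{j=1}^\infty a_{ij} = \sum_{j=1}^\infty \sum_{i=1}^\infty a_{ij}. \]

    Proof. Indeed, take $X = \{1, 2, \dots\}$ with counting measure, let $f_i : X \to [0, \infty]$ be defined by $f_i(j) = a_{ij}$, and apply part (ii) of the Theorem.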
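    The non-negativity assumption in the Corollary cannot simply be dropped. The sketch below uses a standard signed array, chosen here for illustration and not taken from the notes: $a_{ii} = 1$, $a_{i,i+1} = -1$, and $a_{ij} = 0$ otherwise. Each row and each column has at most two non-zero entries, so the inner sums are computed exactly over their finite support; the function names are hypothetical.

```python
def a(i: int, j: int) -> int:
    """Signed array with indices i, j >= 1: +1 on the diagonal, -1 just right of it."""
    if j == i:
        return 1
    if j == i + 1:
        return -1
    return 0


def row_sum(i: int) -> int:
    """Sum over j of a(i, j); the only non-zero entries are at j = i and j = i + 1."""
    return sum(a(i, j) for j in range(1, i + 3))


def col_sum(j: int) -> int:
    """Sum over i of a(i, j); the only non-zero entries are at i = j - 1 and i = j."""
    return sum(a(i, j) for i in range(1, j + 2))


N = 50  # every row and column sum beyond the first column vanishes, so N is arbitrary
rows_then_cols = sum(row_sum(i) for i in range(1, N + 1))
cols_then_rows = sum(col_sum(j) for j in range(1, N + 1))
```

    Every row sums to 0, while the first column sums to 1 and all other columns sum to 0. So summing rows first gives 0 and summing columns first gives 1, whatever the value of `N`.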

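    5.8 Fatou's Lemma

    As earlier, $(X, \mathcal{M}, \mu)$ is any measure space.

    Theorem (Fatou's Lemma). If $f_n : X \to [0, \infty]$, $n = 1, 2, \dots$ are measurable, then
    \[ \int_X \liminf_{n \to \infty} f_n \, d\mu \le \liminf_{n \to \infty} \int_X f_n \, d\mu. \]

    Proof. Put $g_n(x) := \inf_{k \ge n} f_k(x)$, $n = 1, 2, \dots$. Then we have $0 \le g_1(x) \le g_2(x) \le \dots$, and $g_n(x) \uparrow \liminf_{n \to \infty} f_n(x)$ as $n \to \infty$. Hence due to the Monotone Convergence Theorem, we get
    \[ \int_X \liminf_{n \to \infty} f_n \, d\mu = \int_X \lim_{n \to \infty} g_n \, d\mu = \lim_{n \to \infty} \int_X g_n \, d\mu. \tag{20} \]
    But $g_n(x) \le f_k(x)$, for all $k \ge n$, and therefore $\int_X g_n \, d\mu \le \int_X f_k \, d\mu$ for all $k \ge n$. Therefore, $\int_X g_n \, d\mu \le \inf_{k \ge n} \int_X f_k \, d\mu$. Inserting this into (20), the right hand side is at most $\lim_{n \to \infty} \inf_{k \ge n} \int_X f_k \, d\mu = \liminf_{n \to \infty} \int_X f_n \, d\mu$, and the statement follows.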

    5.9 Density functions

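    Let $(X, \mathcal{M}, \mu)$ be any measure space.

    Theorem 1. Let $f : X \to [0, \infty]$ be measurable, and for $E \in \mathcal{M}$, put $\nu(E) := \int_E f \, d\mu$. Then $\nu$ is a measure on $(X, \mathcal{M})$, and for all measurable $g : X \to [0, \infty]$ we have:
    \[ \int_X g \, d\nu = \int_X g f \, d\mu. \]

    Proof. It is clear that $\nu : \mathcal{M} \to [0, \infty]$, and that $\nu(\emptyset) = 0$. Suppose that $E_1, E_2, \dots \in \mathcal{M}$ are disjoint and $E = \cup_{n=1}^\infty E_n$. Then $1_E f = \sum_{n=1}^\infty 1_{E_n} f$. Therefore, using Theorem 5.7(ii) we have
    \[ \nu(E) = \int_E f \, d\mu = \int_X 1_E f \, d\mu = \sum_{n=1}^\infty \int_X 1_{E_n} f \, d\mu = \sum_{n=1}^\infty \int_{E_n} f \, d\mu = \sum_{n=1}^\infty \nu(E_n). \]
    This proves that $\nu$ is a measure.

    For the second statement, first note that it holds when $g = 1_E$ for some $E \in \mathcal{M}$, since:
    \[ \int_X 1_E \, d\nu = \nu(E) = \int_E f \, d\mu = \int_X 1_E f \, d\mu. \]
    From this, the statement also follows for simple functions $g = \sum_{i=1}^n \alpha_i 1_{A_i}$. Indeed, using Theorem 5.7(i) and Theorem 5.4(iii) in the first and last steps, and the indicator case in the middle step:
    \[ \int_X \Big[ \sum_{i=1}^n \alpha_i 1_{A_i} \Big] d\nu = \sum_{i=1}^n \alpha_i \int_X 1_{A_i} \, d\nu = \sum_{i=1}^n \alpha_i \int_X 1_{A_i} f \, d\mu = \int_X \Big[ \sum_{i=1}^n \alpha_i 1_{A_i} \Big] f \, d\mu. \]
    For general non-negative measurable $g$, take simple $0 \le s_i \uparrow g$, and use the Monotone Convergence Theorem on both sides of the equality:
    \[ \int_X s_i \, d\nu = \int_X s_i f \, d\mu \]
    to get the statement.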
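    On a finite set, Theorem 1 reduces to re-weighting point masses, which can be checked with exact rational arithmetic. The sketch below is an illustration only: the three-point space, the masses in `mu`, the density `f` and the test function `g` are hypothetical values, not taken from the notes.

```python
from fractions import Fraction as F

X = ["a", "b", "c"]
mu = {"a": F(1, 2), "b": F(2), "c": F(1, 3)}  # point masses of the base measure
f = {"a": F(4), "b": F(0), "c": F(3)}         # density function
g = {"a": F(1), "b": F(5), "c": F(2)}         # function to be integrated


def nu(E):
    """nu(E) = integral of f over E with respect to mu."""
    return sum((f[x] * mu[x] for x in E), F(0))


# Integral of g with respect to nu, using the point masses nu({x}).
integral_g_dnu = sum((g[x] * nu([x]) for x in X), F(0))
# Integral of g * f with respect to mu.
integral_gf_dmu = sum((g[x] * f[x] * mu[x] for x in X), F(0))
```

    Both integrals equal 4 here. Note also that `nu(["b"])` is 0 although the base measure gives the point "b" mass 2, because the density vanishes there.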

    Remark. It is customary to write $d\nu = f \, d\mu$, and call $f$ the density function of $\nu$ with respect to $\mu$.

    There is an important converse to the above theorem that we now state.

    Definition. If $\nu$ and $\mu$ are both measures on $(X, \mathcal{M})$, we say that $\nu$ is absolutely continuous with respect to $\mu$, denoted $\nu \ll \mu$, if whenever $\mu(E) = 0$ then also $\nu(E) = 0$.

    Observe that the measure $\nu$ of Theorem 1 has this property: if $\mu(E) = 0$, then $\nu(E) = \int_E f \, d\mu = 0$.

    The following theorem states that under a $\sigma$-finiteness assumption, any finite measure that is absolutely continuous with respect to $\mu$ has a density.

    Theorem 2 (Radon-Nikodym Theorem). If $\mu$ is $\sigma$-finite, $\nu \ll \mu$, and $\nu(X) < \infty$, then there exists a non-negative measurable function $f$ on $X$ such that $\nu(E) = \int_E f \, d\mu$ for all $E \in \mathcal{M}$.

    (Non-examinable) For the proof see for example [4].

    5.10 Integration of signed functions

    Let $(X, \mathcal{M}, \mu)$ be any measure space.

    Definition 1. We define:
    \[ L^1(\mu) := \Big\{ f : X \to \mathbb{R} : f \text{ is measurable and } \int_X |f| \, d\mu < \infty \Big\}. \]
    [...]

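    5.11 Linearity of the integral

    Let $(X, \mathcal{M}, \mu)$ be any measure space.

    Theorem 1. If $f, g \in L^1(\mu)$ and $\alpha, \beta \in \mathbb{R}$, then $(\alpha f + \beta g) \in L^1(\mu)$ and we have
    \[ \int_X (\alpha f + \beta g) \, d\mu = \alpha \int_X f \, d\mu + \beta \int_X g \, d\mu. \tag{21} \]

    Proof. It is clear from Corollary 4.4(ii) that $\alpha f + \beta g$ is measurable. To see integrability, we estimate
    \[ \int_X |\alpha f + \beta g| \, d\mu \le \int_X (|\alpha| |f| + |\beta| |g|) \, d\mu = |\alpha| \int_X |f| \, d\mu + |\beta| \int_X |g| \, d\mu < \infty, \]
    where the equality uses Theorem 5.7 and Theorem 5.4(iii).
    [...]

    When $\alpha < 0$, we can reduce to the case of positive $\alpha$ as follows:
    \begin{align*}
    \text{l.h.s. of (23)} &= \int_X ((-\alpha)(-f))^+ \, d\mu - \int_X ((-\alpha)(-f))^- \, d\mu \\
    &= \int_X (-\alpha)(f^-) \, d\mu - \int_X (-\alpha)(f^+) \, d\mu \\
    &= (-\alpha) \int_X f^- \, d\mu - (-\alpha) \int_X f^+ \, d\mu \\
    &= \alpha \Big[ -\int_X f^- \, d\mu + \int_X f^+ \, d\mu \Big] = \text{r.h.s. of (23)}.
    \end{align*}

    The following theorem is essentially trivial for real-valued functions, so we spell out the complex-valued case that is only slightly more tricky.

    Theorem 2. Suppose $f : X \to \mathbb{C}$. If $f \in L^1(\mu)$, then $\left| \int_X f \, d\mu \right| \le \int_X |f| \, d\mu$.

    Proof. When $f$ is real-valued, the statement follows from the fact that the distance between two non-negative real numbers is at most their sum:
    \[ \Big| \int_X f \, d\mu \Big| = \Big| \int_X f^+ \, d\mu - \int_X f^- \, d\mu \Big| \le \int_X f^+ \, d\mu + \int_X f^- \, d\mu = \int_X |f| \, d\mu. \]
    When $f$ is complex-valued, we use that there exists a real number $\theta$ such that $|\int_X f \, d\mu| = e^{i\theta} \int_X f \, d\mu$. Therefore, using Theorem 1 for the second equality, we have
    \[ \Big| \int_X f \, d\mu \Big| = e^{i\theta} \int_X f \, d\mu = \int_X e^{i\theta} f \, d\mu = \int_X \Re(e^{i\theta} f) \, d\mu. \]
    Here $\Re(z)$ denotes the real part of the complex number $z$, and the last equality holds, because the quantity we started with at the left hand side is real. Now, the real part of a complex number is at most its modulus, so we have
    \[ \Big| \int_X f \, d\mu \Big| = \int_X \Re(e^{i\theta} f) \, d\mu \le \int_X |e^{i\theta} f| \, d\mu = \int_X |f| \, d\mu, \]
    since $|e^{i\theta}| = 1$. This proves the statement in the complex case.

    Remark. It is not difficult to see that equality holds here if and only if $\Re(e^{i\theta} f) = |f|$, except on a set of measure 0. In other words, if and only if $f = e^{-i\theta} |f|$, that is the argument of $f$ is constant, apart from a set of measure 0.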

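    5.12 Dominated Convergence Theorem

    Let $(X,\mathcal{M},\mu)$ be a measure space.

    Theorem (Dominated Convergence Theorem). Suppose that $f_n : X \to \mathbb{C}$ are measurable, and $f_n(x) \to f(x)$ for every $x \in X$. If there exists $g : X \to [0,\infty]$ such that $g \in L^1(\mu)$ and $|f_n(x)| \le g(x)$ for every $x \in X$, then:

    (i) $\lim_{n\to\infty} \int_X |f_n - f|\,d\mu = 0$;

    (ii) $\lim_{n\to\infty} \int_X f_n\,d\mu = \int_X f\,d\mu$.

    Proof. Letting $n \to \infty$ in the inequality $|f_n(x)| \le g(x)$ we get $|f(x)| \le g(x)$, and hence $f \in L^1(\mu)$. Since $|f_n - f| \le 2g$, we have $2g - |f_n - f| \ge 0$. Therefore, Fatou's Lemma (Theorem 5.8) can be applied to the sequence $2g - |f_n - f|$. This gives

    \[
    \begin{aligned}
    \int_X 2g\,d\mu = \int_X \liminf_{n\to\infty} \bigl(2g - |f_n - f|\bigr)\,d\mu
    &\le \liminf_{n\to\infty} \int_X \bigl(2g - |f_n - f|\bigr)\,d\mu \\
    &= \int_X 2g\,d\mu + \liminf_{n\to\infty} \Bigl(-\int_X |f_n - f|\,d\mu\Bigr) \\
    &= \int_X 2g\,d\mu - \limsup_{n\to\infty} \int_X |f_n - f|\,d\mu.
    \end{aligned}
    \]

    Since $\int_X 2g\,d\mu < \infty$, we may cancel it from the left hand side and the last expression, and rearrange to get

    \[ \limsup_{n\to\infty} \int_X |f_n - f|\,d\mu \le 0. \]

    But the lim sup of a non-negative sequence can only be less than or equal to 0 if the sequence converges to 0, so we get the statement (i).

    The statement (ii) now follows from (i), because $\bigl|\int_X f_n\,d\mu - \int_X f\,d\mu\bigr| \le \int_X |f_n - f|\,d\mu$.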

    5.13 The role of sets of measure 0

    Let $(X,\mathcal{M},\mu)$ be a measure space.

    Definition. Let $P$ be a property that a point $x \in X$ may or may not have. For example, "$f(x) > 0$" (with $f$ a given function), or "$f_n(x)$ converges to a limit" (with $\{f_n\}$ a given sequence of functions). Given $E \in \mathcal{M}$, we say that $P$ holds almost everywhere on $E$ if there exists $N \in \mathcal{M}$ with $\mu(N) = 0$ such that $P$ holds for all $x \in E \setminus N$. We often abbreviate this to "$P$ holds a.e. on $E$", and when the measure needs to be emphasized, to "$P$ holds a.e.[$\mu$] on $E$".

    Example. If $f, g : X \to \mathbb{R}$ are measurable, and $\mu(\{x \in X : f(x) \ne g(x)\}) = 0$, we say that $f = g$ a.e.[$\mu$] on $X$, or simply that $f = g$ a.e.[$\mu$]. We are going to write $f \sim g$ for this relation ("$f$ and $g$ are equivalent"). When $f \sim g$, then for all $E \in \mathcal{M}$ we have $\int_E f\,d\mu = \int_E g\,d\mu$. That is, equivalent functions behave identically for the purposes of integration.

    Lemma 1. The relation $\sim$ is indeed an equivalence relation.

    Proof. Reflexivity is clear: $f = f$ a.e.[$\mu$] (we can take the exceptional set $N = \emptyset$).

    Symmetry is also clear, since if $f \sim g$ with exceptional set $N$, then the same exceptional set can be used to show that $g \sim f$.

    Let us check transitivity. Suppose that $f \sim g$ and $g \sim h$, and let $N_1$ and $N_2$ be the exceptional sets: $f(x) = g(x)$ for all $x \in X \setminus N_1$, and $g(x) = h(x)$ for all $x \in X \setminus N_2$, where $\mu(N_1) = 0 = \mu(N_2)$. Put $N = N_1 \cup N_2$. Then $\mu(N) = 0$, and we have $f(x) = h(x)$ for all $x \in X \setminus N$.

    Remark 1. As in the last part of the proof above, we can combine exceptional sets to show that several statements simultaneously hold a.e. The only thing we have to be careful about is that we are only allowed to combine countably many statements.

    $L^1(\mu)$ and $L^1[\mu]$. It is sometimes convenient to identify equivalent functions. Let us write

    \[ [f] := \bigl\{ f_1 \in L^1(\mu) : f_1 \sim f \bigr\}. \]

    Define

    \[ L^1[\mu] := \{[f] : f \in L^1(\mu)\} = \Bigl\{[f] : \int_X |f|\,d\mu < \infty \Bigr\}. \]

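    5.14 Completion

    Theorem. Let $(X,\mathcal{M},\mu)$ be a measure space. Let

    \[ \overline{\mathcal{M}} := \{E \subset X : \exists\, A, B \in \mathcal{M},\ A \subset E \subset B, \text{ such that } \mu(B \setminus A) = 0\}. \]

    For $E \in \overline{\mathcal{M}}$, define $\overline{\mu}(E) = \mu(A)$, with $A$ as in the definition of $\overline{\mathcal{M}}$. Then $\overline{\mathcal{M}}$ is a $\sigma$-algebra, $\overline{\mu}$ is well-defined and is a measure on $\overline{\mathcal{M}}$.

    Proof. (HW)

    Definition. We call $\overline{\mathcal{M}}$ the completion of $\mathcal{M}$ with respect to $\mu$. It has the property that if $E \in \overline{\mathcal{M}}$, $\overline{\mu}(E) = 0$, and $F \subset E$, then we also have $F \in \overline{\mathcal{M}}$ (and of course $\overline{\mu}(F) = 0$ as well). We say that the measure space $(X,\overline{\mathcal{M}},\overline{\mu})$ is complete.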

    5.15 Series absolutely convergent in $L^1(\mu)$

    Let $(X,\mathcal{M},\mu)$ be a measure space.

    Theorem. Suppose $f_n$ are defined a.e.[$\mu$] on $X$, and are measurable. Suppose

    \[ \sum_{n=1}^{\infty} \int_X |f_n|\,d\mu < \infty. \]

    Let $E = \{x \in S : \varphi(x) < \infty\}$. From (26) we have $\mu(E^c) = 0$. For every $x \in E$, the series (25) converges absolutely, and $|f(x)| \le \varphi(x)$ on $E$. Hence $f \in L^1(\mu)$. If $g_n = f_1 + \dots + f_n$, then $|g_n| \le \varphi$ and $g_n(x) \to f(x)$ as $n \to \infty$ at every $x \in E$, so by the Dominated Convergence Theorem, we get (*).

    5.16 Examples of a.e.-type conclusions

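    Let $(X,\mathcal{M},\mu)$ be a measure space.

    1. If $f : X \to [-\infty,\infty]$ is measurable, $E \in \mathcal{M}$, and $\int_E |f|\,d\mu = 0$, then $f = 0$ a.e. on $E$.

    2. If $f \in L^1(\mu)$ and $\int_E f\,d\mu = 0$ for all $E \in \mathcal{M}$, then $f = 0$ a.e. on $X$.

    3. If $f \in L^1(\mu)$ and $\bigl|\int_X f\,d\mu\bigr| = \int_X |f|\,d\mu$, then there exists a constant $\alpha$, $|\alpha| = 1$, such that $\alpha f = |f|$ a.e. on $X$.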

    6 Inequalities and Lp spaces

    6.1 Convex functions

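    Definition. A function $\varphi : (a,b) \to \mathbb{R}$ (where $-\infty \le a < b \le \infty$) is called convex if

    \[ \varphi(\lambda x + (1-\lambda) y) \le \lambda \varphi(x) + (1-\lambda)\varphi(y) \tag{27} \]

    for all $x, y \in (a,b)$ and for all $0 \le \lambda \le 1$. Equivalent to (27) is the requirement that

    \[ \frac{\varphi(t) - \varphi(s)}{t - s} \le \frac{\varphi(u) - \varphi(t)}{u - t} \tag{28} \]

    for all $a < s < t < u < b$.

    How to check convexity? A differentiable function $\varphi$ is convex on $(a,b)$ if $\varphi'(s) \le \varphi'(t)$ for all $a < s < t < b$.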

    We will use the following theorem from elementary analysis.

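    Theorem. If $\varphi$ is convex on $(a,b)$, then it is continuous on $(a,b)$.

    Proof. (Non-examinable) Fix $t \in (a,b)$. We show that $\varphi$ is continuous at $t$. Fix any $a < u < t < v < b$. Then for $u < s < v$, $s \ne t$, we have:

    \[ \frac{\varphi(t) - \varphi(u)}{t - u} \le \frac{\varphi(s) - \varphi(t)}{s - t} \le \frac{\varphi(v) - \varphi(t)}{v - t}. \]

    Hence

    \[ |\varphi(s) - \varphi(t)| \le |s - t| \max\Bigl\{ \Bigl|\frac{\varphi(t) - \varphi(u)}{t - u}\Bigr|, \Bigl|\frac{\varphi(v) - \varphi(t)}{v - t}\Bigr| \Bigr\}. \]

    This shows that as $s \to t$, $\varphi(s) \to \varphi(t)$. (We even get the stronger statement that $\varphi$ is Lipschitz on any compact subinterval of $(a,b)$.)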

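    6.2 Jensen's inequality

    Theorem. Suppose $(\Omega,\mathcal{M},\mu)$ is a measure space such that $\mu(\Omega) = 1$ (a probability space). If $f \in L^1(\mu)$, $a < f(x) < b$ for all $x \in \Omega$, and $\varphi$ is convex on $(a,b)$, then

    \[ \varphi\Bigl(\int_\Omega f\,d\mu\Bigr) \le \int_\Omega (\varphi \circ f)\,d\mu. \tag{29} \]

    Remark. It may happen that $(\varphi \circ f) \notin L^1(\mu)$. In this case the proof below will show that the right hand side is $+\infty$.

    Proof of Theorem. Denote $t := \int_\Omega f\,d\mu$. Then $a < t < b$. Let

    \[ \beta := \sup_{s :\, a < s < t} \frac{\varphi(t) - \varphi(s)}{t - s}. \]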

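    Since $\varphi$ is continuous, $\varphi \circ f$ is measurable. Integrating both sides of (32) we get:

    \[ \int_\Omega (\varphi \circ f)\,d\mu - \varphi\Bigl(\int_\Omega f\,d\mu\Bigr) - \beta\,\underbrace{\Bigl(\int_\Omega f\,d\mu - \Bigl(\int_\Omega f\,d\mu\Bigr)\mu(\Omega)\Bigr)}_{=0} \ge 0. \]

    Hence (29) follows.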

    6.3 Examples

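    1. Take $\varphi(x) = e^x$. Then

    \[ \exp\Bigl\{\int_\Omega f\,d\mu\Bigr\} \le \int_\Omega e^f\,d\mu. \]

    2. Suppose $\Omega$ is finite, $\Omega = \{x_1, \dots, x_n\}$, $\mu(\{x_i\}) = \frac{1}{n}$, $f(x_i) = a_i$. Then item (1) specializes to:

    \[ \exp\Bigl\{\frac{1}{n}(a_1 + \dots + a_n)\Bigr\} \le \frac{1}{n}\bigl(e^{a_1} + \dots + e^{a_n}\bigr). \]

    3. Putting $b_i = e^{a_i}$ in item (2), we get the familiar inequality between the geometric and arithmetic mean:

    \[ (b_1 \cdots b_n)^{1/n} \le \frac{1}{n}(b_1 + \dots + b_n). \]

    4. A bit more generally than item (3), if $\mu(\{x_i\}) = p_i > 0$, where $\sum_{i=1}^n p_i = 1$, we have:

    \[ b_1^{p_1} \cdots b_n^{p_n} \le p_1 b_1 + \dots + p_n b_n. \]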
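The weighted inequality between the geometric and arithmetic mean in item 4 is easy to check numerically. The following is a minimal sketch, not part of the notes: the helper names `weighted_geometric_mean`, `weighted_arithmetic_mean` and `random_weights` are hypothetical, and the sample values are arbitrary.

```python
import math
import random


def weighted_geometric_mean(b, p):
    """Return b_1^{p_1} * ... * b_n^{p_n} for positive b_i and weights p_i."""
    return math.exp(sum(pi * math.log(bi) for bi, pi in zip(b, p)))


def weighted_arithmetic_mean(b, p):
    """Return p_1 b_1 + ... + p_n b_n."""
    return sum(pi * bi for bi, pi in zip(b, p))


def random_weights(n, rng):
    """Return n positive weights that sum to 1 (a probability vector)."""
    raw = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


rng = random.Random(0)  # fixed seed so the run is reproducible
b = [rng.uniform(0.1, 10.0) for _ in range(5)]
p = random_weights(5, rng)

gm = weighted_geometric_mean(b, p)
am = weighted_arithmetic_mean(b, p)
print("geometric mean <= arithmetic mean:", gm <= am)
```

With equal weights $p_i = 1/n$ the same functions reproduce the unweighted inequality of item 3.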

    6.4 Hölder's inequality; Minkowski's inequality

    Let $(X,\mathcal{M},\mu)$ be a measure space.

    Definition. If $1 < p < \infty$, $1 < q < \infty$ and $\frac{1}{p} + \frac{1}{q} = 1$, we call $p$ and $q$ conjugate exponents. (A special case is $p = q = 2$.) We extend the definition to the pairs $p = 1$, $q = \infty$ and $p = \infty$, $q = 1$.

    Theorem. Let $p$ and $q$ be conjugate exponents, $1 < p$

    $0$ and $B = \infty$, then the right hand side of (Hölder's inequality) is $\infty$, and the inequality holds. By symmetric arguments, we may discard the case $B = 0$ and the case $B > 0$, $A = \infty$. Henceforth we may assume $0 < A$
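    We now prove Minkowski's inequality. Write

    \[ (f + g)^p = f(f + g)^{p-1} + g(f + g)^{p-1}. \tag{34} \]

    Using Hölder's inequality for the first term, we have

    \[ \int f(f + g)^{p-1} \le \Bigl(\int f^p\Bigr)^{1/p} \Bigl(\int (f + g)^{(p-1)q}\Bigr)^{1/q} = \Bigl(\int f^p\Bigr)^{1/p} \Bigl(\int (f + g)^p\Bigr)^{1/q}, \]

    where in the last step we used

    \[ (p - 1)q = (p - 1)\frac{1}{1 - \frac{1}{p}} = \frac{p(p - 1)}{p - 1} = p. \]

    Similarly, using Hölder's inequality for the second term in (34), we have

    \[ \int g(f + g)^{p-1} \le \Bigl(\int g^p\Bigr)^{1/p} \Bigl(\int (f + g)^{(p-1)q}\Bigr)^{1/q} = \Bigl(\int g^p\Bigr)^{1/p} \Bigl(\int (f + g)^p\Bigr)^{1/q}. \]

    Adding the two inequalities, we have:

    \[ \int (f + g)^p \le \Bigl[\Bigl(\int f^p\Bigr)^{1/p} + \Bigl(\int g^p\Bigr)^{1/p}\Bigr] \Bigl(\int (f + g)^p\Bigr)^{1/q}. \tag{35} \]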

    We now discard some trivial cases to make sure there is no problem with infinities when re-arranging. If $\int (f + g)^p = 0$, then $f = g = 0$ a.e., and (Minkowski's inequality) holds. Assume $\int (f + g)^p > 0$. If we have $\int f^p = \infty$ or $\int g^p = \infty$, then (Minkowski's inequality) clearly holds. Otherwise, using that $\bigl(\frac{f+g}{2}\bigr)^p \le \frac{1}{2}(f^p + g^p)$, we have

    \[ \int (f + g)^p < \infty. \]

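    6.5 $L^p$-spaces

    Let $(X,\mathcal{M},\mu)$ be a measure space.

    Definition 1. If $0 < p < \infty$, and $f : X \to [-\infty,\infty]$ is measurable, we define

    \[ \|f\|_p := \Bigl\{\int_X |f|^p\,d\mu\Bigr\}^{1/p}, \]

    and let

    \[ L^p(\mu) := \{f : X \to [-\infty,\infty] : f \text{ is measurable and } \|f\|_p < \infty\}. \]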

    Example 3. We write $L^\infty(\mathbb{R}^d)$ when $\mu$ is Lebesgue measure on $\mathbb{R}^d$. When $\mu$ is counting measure on $A$, all functions are measurable, and only the empty set has measure 0, so the definition reduces to:

    \[ \ell^\infty(A) = \{f : A \to \mathbb{R} : f \text{ is bounded}\}. \]

    When $A$ is countable, we write $\ell^\infty$ for the space of bounded sequences.

    The following theorem is essentially obvious from our definitions and Hölder's inequality for non-negative functions.

    Theorem 1. If $p$ and $q$ are conjugate exponents, $1 \le p \le \infty$, and $f \in L^p(\mu)$ and $g \in L^q(\mu)$, then $fg \in L^1(\mu)$ and $\|fg\|_1 \le \|f\|_p \|g\|_q$.

    Proof. If $1 < p$

    The spaces $L^p(\mu)$ and $L^p[\mu]$. Note that for $\alpha \in \mathbb{R}$ we have $\|\alpha f\|_p = |\alpha| \|f\|_p$, and $L^p(\mu)$ is a vector space. The following abstract nonsense is sometimes useful. We write $L^p[\mu]$ for the collection of equivalence classes of functions in $L^p(\mu)$. Then we can define a distance on $L^p[\mu]$ as follows:

    \[ d([f],[g]) := \|f - g\|_p. \]

    It is easy to check that the right hand side indeed does not depend on which representatives we pick from the equivalence classes. With this definition, the distance function $d$ satisfies the triangle inequality: for $[f], [g], [h] \in L^p[\mu]$ we have

    \[ d([f],[h]) = \|f - h\|_p \le \|f - g\|_p + \|g - h\|_p = d([f],[g]) + d([g],[h]). \]

    Moreover, if $d([f],[g]) = 0$, we have $\|f - g\|_p = 0$, and hence $|f - g| = 0$ a.e. and hence $[f] = [g]$. The latter property would not have been true without the identification of equivalent functions.
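For counting measure on a finite set, $\|\cdot\|_p$ is the usual $p$-norm of a vector, so both Hölder's inequality $\|fg\|_1 \le \|f\|_p\|g\|_q$ and the triangle inequality for the distance $d$ can be checked numerically. The sketch below is an illustration under that assumption only; the function names `p_norm` and `dist` and the sample vectors are hypothetical.

```python
def p_norm(v, p):
    """p-norm of a finite sequence, i.e. ||v||_p for counting measure."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def dist(u, v, p):
    """Distance d(u, v) = ||u - v||_p."""
    return p_norm([a - b for a, b in zip(u, v)], p)


# Arbitrary sample "functions" on a three-point set.
f = [1.0, -2.0, 3.0]
g = [0.5, 4.0, -1.0]
h = [2.0, 0.0, 1.0]

p = 3.0
q = p / (p - 1.0)  # conjugate exponent: 1/p + 1/q = 1

# Hoelder: ||f g||_1 <= ||f||_p ||g||_q
holder_lhs = sum(abs(a * b) for a, b in zip(f, g))
holder_rhs = p_norm(f, p) * p_norm(g, q)

# Triangle inequality: d(f, h) <= d(f, g) + d(g, h)
tri_lhs = dist(f, h, p)
tri_rhs = dist(f, g, p) + dist(g, h, p)

print("Hoelder holds:", holder_lhs <= holder_rhs)
print("triangle inequality holds:", tri_lhs <= tri_rhs)
```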

    6.6 Completeness of $L^p(\mu)$

    Let $(X,\mathcal{M},\mu)$ be a measure space.

    Definition 1. If $f_n \in L^p(\mu)$, $n = 1, 2, \dots$ and $f \in L^p(\mu)$, we say that $f_n$ converges to $f$ in $L^p(\mu)$ if $\lim_{n\to\infty} \|f_n - f\|_p = 0$. We denote this as $f_n \xrightarrow{L^p} f$.

    Definition 2. We say that a sequence $\{f_n\}_{n\ge1}$ of elements of $L^p(\mu)$ is a Cauchy sequence in $L^p(\mu)$, if for any $\varepsilon > 0$ there exists $N = N(\varepsilon)$ such that for all $n, m \ge N$ we have $\|f_n - f_m\|_p < \varepsilon$.

    Theorem 1 (Riesz-Fischer Theorem). Let $1 \le p \le \infty$. Then $L^p(\mu)$ is complete, that is, every Cauchy sequence $\{f_n\}_{n\ge1}$ in $L^p(\mu)$ converges to a limit $f \in L^p(\mu)$ in $L^p(\mu)$.

    Proof. We first prove the case $1 \le p$

    $n_1$.

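    Next, using the Cauchy property with $\varepsilon = 1/4$, we can find $n_2 > n_1$, such that

    \[ \|f_m - f_{n_2}\|_p < \tfrac{1}{4} \quad \text{for all } m > n_2. \]

    Continuing inductively, we find indices $n_1 < n_2 < \dots$ such that

    \[ \|f_m - f_{n_i}\|_p < 2^{-i} \quad \text{for all } m > n_i. \tag{37} \]

    The inequality (37) in particular implies (36).

    Put

    \[ g_k := \sum_{i=1}^{k} |f_{n_{i+1}} - f_{n_i}|, \qquad g := \sum_{i=1}^{\infty} |f_{n_{i+1}} - f_{n_i}|. \]

    We claim that $g \in L^p(\mu)$. We have

    \[ \|g_k\|_p \le \sum_{i=1}^{k} \|f_{n_{i+1}} - f_{n_i}\|_p \le \sum_{i=1}^{k} \frac{1}{2^i} \le 1. \]

    Since $0 \le g_k^p \uparrow g^p$, the Monotone Convergence Theorem gives:

    \[ \int_X g^p\,d\mu = \lim_{k\to\infty} \int_X g_k^p\,d\mu \le 1. \]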

    In particular, g

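    Fix $\varepsilon > 0$. Due to the Cauchy property, there exists $N$ such that for $n, m \ge N$ we have $\|f_m - f_n\|_p < \varepsilon$. In particular, fixing an $n \ge N$, and applying Fatou's Lemma we have

    \[
    \begin{aligned}
    \int_X |f - f_n|^p\,d\mu &= \int_X \liminf_{k\to\infty} |f_{n_k} - f_n|^p\,d\mu \\
    &\le \liminf_{k\to\infty} \int_X |f_{n_k} - f_n|^p\,d\mu \\
    &= \liminf_{k\to\infty} \|f_{n_k} - f_n\|_p^p \le \varepsilon^p.
    \end{aligned}
    \tag{38}
    \]

    Since $f_n \in L^p(\mu)$ and $\|f\|_p \le \|f_n\|_p + \|f - f_n\|_p$

    $\|f_n - f_m\|_\infty\}$ have measure 0. Let $E = \bigl(\bigcup_{k=1}^{\infty} A_k\bigr) \cup \bigl(\bigcup_{n,m=1}^{\infty} B_{n,m}\bigr)$, so that $\mu(E) = 0$. At every $x \in X \setminus E$, the sequence $\{f_n(x)\}_{n\ge1}$ is a Cauchy sequence in $\mathbb{R}$, and hence converges to a limit $f(x)$ in $\mathbb{R}$. Since the Cauchy property holds uniformly at every $x \in X \setminus E$, $|f(x)|$ is uniformly bounded for $x \in X \setminus E$, and the convergence is uniform in $x \in X \setminus E$. Therefore, $f \in L^\infty(\mu)$, and $\|f_n - f\|_\infty \to 0$ as $n \to \infty$.

    The following theorem is a corollary of the proof of Theorem 1, and is useful in itself.

    Theorem 2. Let $1 \le p \le \infty$. If $\{f_n\}_{n\ge1}$ is a Cauchy sequence in $L^p(\mu)$ with limit $f$, then there exists a subsequence $f_{n_k}$ such that $f_{n_k} \to f$ a.e.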

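    7 Product spaces and Fubini's theorem

    7.1 Product measure

    Definition 1. Let $(X,\mathcal{M},\mu)$ and $(Y,\mathcal{N},\nu)$ be measure spaces. A set of the form $A \times B \subset X \times Y$, where $A \in \mathcal{M}$ and $B \in \mathcal{N}$, is called a measurable rectangle.

    We denote

    \[ \mathcal{S} := \{A \times B : A \in \mathcal{M},\ B \in \mathcal{N}\}. \]

    Lemma. $\mathcal{S}$ is a semi-algebra.

    Proof. If $A \times B,\ C \times D \in \mathcal{S}$, we have

    \[ (A \times B) \cap (C \times D) = \underbrace{(A \cap C)}_{\in \mathcal{M}} \times \underbrace{(B \cap D)}_{\in \mathcal{N}} \in \mathcal{S}, \]

    so $\mathcal{S}$ is closed under intersection. If $A \times B \in \mathcal{S}$, we have:

    \[ (A \times B)^c = (A^c \times B) \cup (A \times B^c) \cup (A^c \times B^c), \]

    which is a disjoint union of members of $\mathcal{S}$. This shows that $\mathcal{S}$ is a semi-algebra.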
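The decomposition of the complement of a rectangle into three disjoint rectangles can be verified by brute force on small finite sets. This is only an illustrative sketch: the sets `X`, `Y`, `A`, `B` are arbitrary choices, and the helper `rectangle` is a hypothetical name.

```python
from itertools import product


def rectangle(A, B):
    """Return the rectangle A x B as a set of ordered pairs."""
    return set(product(A, B))


# Small ambient sets and one rectangle A x B inside X x Y.
X = {0, 1, 2}
Y = {"a", "b"}
A = {0, 1}
B = {"a"}

Ac = X - A  # complement of A in X
Bc = Y - B  # complement of B in Y

complement = rectangle(X, Y) - rectangle(A, B)
pieces = [rectangle(Ac, B), rectangle(A, Bc), rectangle(Ac, Bc)]

union = set().union(*pieces)
pairwise_disjoint = all(
    pieces[i].isdisjoint(pieces[j])
    for i in range(len(pieces))
    for j in range(i + 1, len(pieces))
)

print("union equals complement:", union == complement)
print("pieces pairwise disjoint:", pairwise_disjoint)
```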

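    Definition 2. We denote $\mathcal{M} \times \mathcal{N} := \sigma(\mathcal{S})$, and call it the product $\sigma$-algebra (of $\mathcal{M}$ and $\mathcal{N}$).

    Caution: Although we write $\mathcal{M} \times \mathcal{N}$, this is not a Cartesian product of the collections $\mathcal{M}$ and $\mathcal{N}$.

    (HW): Show that if $\mathcal{B}_n = \mathcal{B}(\mathbb{R}^n)$, then $\mathcal{B}_n \times \mathcal{B}_m = \mathcal{B}_{n+m}$.

    Theorem. If $\mu$ and $\nu$ are $\sigma$-finite measures, then there exists a unique measure $\rho$ on the product $\sigma$-algebra $\mathcal{M} \times \mathcal{N}$ such that

    \[ \rho(A \times B) = \mu(A)\nu(B), \quad \text{for all } A \in \mathcal{M},\ B \in \mathcal{N}. \tag{39} \]

    Definition 3. We denote $\rho =: \mu \times \nu$, and call it the product measure (of $\mu$ and $\nu$).

    Proof of Theorem. Formula (39) defines $\rho$ on $\mathcal{S}$. We show that the conditions of Theorem 3.7 are satisfied. We have

    \[ \rho(\emptyset) = \rho(\emptyset \times \emptyset) = \mu(\emptyset)\nu(\emptyset) = 0. \]

    We also show that if $A \times B = \bigcup_{i=1}^{\infty} (A_i \times B_i)$ is a disjoint union, then

    \[ \rho(A \times B) = \sum_{i=1}^{\infty} \rho(A_i \times B_i). \tag{40} \]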

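    Given $x \in A$, write $I(x) := \{1 \le i < \infty : x \in A_i\}$. Then we have $B = \bigcup_{i \in I(x)} B_i$, a disjoint union, and hence by countable additivity of $\nu$, we have

    \[ 1_A(x)\nu(B) = \sum_{i=1}^{\infty} 1_{A_i}(x)\nu(B_i). \]

    Integrating both sides with respect to $\mu$, and using Theorem 5.7(ii), we get

    \[ \mu(A)\nu(B) = \int_X \Bigl(\sum_{i=1}^{\infty} 1_{A_i}(x)\nu(B_i)\Bigr)\,d\mu(x) = \sum_{i=1}^{\infty} \int_X 1_{A_i}(x)\nu(B_i)\,d\mu(x) = \sum_{i=1}^{\infty} \mu(A_i)\nu(B_i). \]

    This proves the claim (40). An application of Theorem 3.7 yields that $\rho$ extends uniquely to a pre-measure on the algebra generated by $\mathcal{S}$.

    Since we assumed that $\mu$ and $\nu$ are $\sigma$-finite, we can write $X = \bigcup_{n=1}^{\infty} X_n$, $Y = \bigcup_{n=1}^{\infty} Y_n$, where $\mu(X_n), \nu(Y_n) < \infty$. Therefore, we have $X \times Y = \bigcup_{n=1}^{\infty} X_n \times Y_n$, with $\rho(X_n \times Y_n) < \infty$.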

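    (i) for all $x \in X$ the function $y \mapsto f(x,y)$ is $\mathcal{N}$-measurable; and

    (ii) the function $x \mapsto \int_Y f(x,y)\,d\nu(y)$ is defined $\mu$-a.e. and is $\mathcal{M}$-measurable.

    Assume first that $f = 1_E$, where $E \in \mathcal{M} \times \mathcal{N}$. For $x \in X$, define

    \[ E_x := \{y \in Y : (x,y) \in E\}, \]

    called the cross section of $E$ at $x$.

    Lemma 1. Let $E \in \mathcal{M} \times \mathcal{N}$. Then for all $x \in X$, we have $E_x \in \mathcal{N}$.

    Proof. Fix $x \in X$, and consider the collection of sets for which the statement in the lemma is true, that is, let

    \[ \mathcal{E} := \{E \in \mathcal{M} \times \mathcal{N} : E_x \in \mathcal{N}\}. \]

    We have $\mathcal{S} \subset \mathcal{E}$: indeed, if $A \times B \in \mathcal{S}$, we have

    \[ (A \times B)_x = \begin{cases} B & \text{if } x \in A; \\ \emptyset & \text{if } x \notin A. \end{cases} \]

    We show that $\mathcal{E}$ is a $\sigma$-algebra in $X \times Y$, which will imply that $\mathcal{E}$ contains $\sigma(\mathcal{S}) = \mathcal{M} \times \mathcal{N}$, and hence equals $\mathcal{M} \times \mathcal{N}$.

    It is clear that $X \times Y \in \mathcal{E}$. Let now $E \in \mathcal{E}$. Then

    \[ (E^c)_x = \{y \in Y : (x,y) \in E^c\} = \{y \in Y : (x,y) \notin E\} = \{y \in Y : (x,y) \in E\}^c = (E_x)^c \in \mathcal{N}, \]

    showing that $\mathcal{E}$ is closed under complements. Let $E_1, E_2, \dots \in \mathcal{E}$. Then

    \[ \Bigl(\bigcup_{i=1}^{\infty} E_i\Bigr)_x = \Bigl\{y \in Y : (x,y) \in \bigcup_{i=1}^{\infty} E_i\Bigr\} = \bigcup_{i=1}^{\infty} \{y \in Y : (x,y) \in E_i\} = \bigcup_{i=1}^{\infty} (E_i)_x \in \mathcal{N}, \]

    showing that $\mathcal{E}$ is closed under countable unions. Therefore, $\mathcal{E}$ is indeed a $\sigma$-algebra, and the proof of the lemma is complete.

    Lemma 2. Let $E \in \mathcal{M} \times \mathcal{N}$. The function $g(x) := \nu(E_x)$ is $\mathcal{M}$-measurable, and $\int_X g\,d\mu = (\mu \times \nu)(E)$.

    Proof. Let $X = \bigcup_{n=1}^{\infty} X_n$ and $Y = \bigcup_{n=1}^{\infty} Y_n$, such that $\mu(X_n), \nu(Y_n) < \infty$. We first prove the statement for $E \subset X_n \times Y_n$. Let

    \[ \mathcal{P}_n := \text{measurable rectangles in } X_n \times Y_n = \{A \times B : A \subset X_n,\ A \in \mathcal{M},\ B \subset Y_n,\ B \in \mathcal{N}\}. \]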

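    Then $\mathcal{P}_n$ is a $\pi$-system in $X_n \times Y_n$. Let $\mathcal{L}_n$ denote the collection of subsets of $X_n \times Y_n$ that are in $\mathcal{M} \times \mathcal{N}$ and for which the statement of the lemma is true:

    \[ \mathcal{L}_n := \Bigl\{E \in \mathcal{M} \times \mathcal{N} : E \subset X_n \times Y_n, \text{ the function } x \mapsto \nu(E_x) \text{ is } \mathcal{M}\text{-measurable and } \int_X \nu(E_x)\,d\mu = (\mu \times \nu)(E)\Bigr\}. \]

    We first check that $\mathcal{P}_n \subset \mathcal{L}_n$. Let $E = A \times B \in \mathcal{P}_n$. Then

    \[ g(x) = \nu(E_x) = \begin{cases} \nu(B) & \text{if } x \in A; \\ 0 & \text{if } x \notin A. \end{cases} \]

    So $g(x) = \nu(B) 1_A(x)$, and this function is indeed $\mathcal{M}$-measurable, since $A \in \mathcal{M}$. Also, $\int_X g\,d\mu = \mu(A)\nu(B) = (\mu \times \nu)(E)$.

    Now we check that $\mathcal{L}_n$ is a $\lambda$-system. First, it is clear that $X_n \times Y_n \in \mathcal{L}_n$, by the previous paragraph. Let $E, F \in \mathcal{L}_n$, $E \subset F$. Then

    \[ \nu((F \setminus E)_x) = \nu(F_x \setminus E_x) = \nu(F_x) - \nu(E_x). \]

    Here we used that we are allowed to subtract, since the sets have finite measure due to $\nu(Y_n) < \infty$. Since $E, F \in \mathcal{L}_n$, the functions $x \mapsto \nu(F_x)$ and $x \mapsto \nu(E_x)$ are both $\mathcal{M}$-measurable, and hence $x \mapsto \nu((F \setminus E)_x)$ is $\mathcal{M}$-measurable. Also, we have

    \[ \int_X \nu((F \setminus E)_x)\,d\mu = \int_X \nu(F_x)\,d\mu - \int_X \nu(E_x)\,d\mu = (\mu \times \nu)(F) - (\mu \times \nu)(E) = (\mu \times \nu)(F \setminus E). \]

    Here we used that we are allowed to subtract, since the integrals are finite due to $\mu(X_n) < \infty$. Therefore, we have $F \setminus E \in \mathcal{L}_n$. The proof is similar for monotone unions. Let $E_1 \subset E_2 \subset \dots$, $E_1, E_2, \dots \in \mathcal{L}_n$, and write $E = \bigcup_{i=1}^{\infty} E_i$. Then $E_x = \bigcup_{i=1}^{\infty} (E_i)_x$, and this is an increasing union. Therefore

    \[ \nu(E_x) = \nu\Bigl(\bigcup_{i=1}^{\infty} (E_i)_x\Bigr) = \lim_{i\to\infty} \nu((E_i)_x). \]

    Thus $x \mapsto \nu(E_x)$ is a limit of $\mathcal{M}$-measurable functions, and hence $\mathcal{M}$-measurable. Also, due to the Monotone Convergence Theorem, we have

    \[ \int_X \nu(E_x)\,d\mu = \lim_{i\to\infty} \int_X \nu((E_i)_x)\,d\mu = \lim_{i\to\infty} (\mu \times \nu)(E_i) = (\mu \times \nu)(E). \]

    This shows that $E \in \mathcal{L}_n$, and completes the proof that $\mathcal{L}_n$ is a $\lambda$-system.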

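    Now we can apply the $\pi$-$\lambda$ Theorem to get $\mathcal{L}_n \supset \sigma(\mathcal{P}_n) = \mathcal{M}|_{X_n} \times \mathcal{N}|_{Y_n}$, that is, the statement of the lemma holds for all $E \subset X_n \times Y_n$, $E \in \mathcal{M} \times \mathcal{N}$.

    To complete the proof of the lemma, let now $E \in \mathcal{M} \times \mathcal{N}$ be arbitrary, and let $E_n = E \cap (X_n \times Y_n)$. We have

    \[ \nu(E_x) = \lim_{n\to\infty} \nu((E_n)_x), \]

    so the function $x \mapsto \nu(E_x)$ is a limit of $\mathcal{M}$-measurable functions and hence itself $\mathcal{M}$-measurable. Also, the Monotone Convergence Theorem gives

    \[ \int_X \nu(E_x)\,d\mu = \lim_{n\to\infty} \int_X \nu((E_n)_x)\,d\mu = \lim_{n\to\infty} (\mu \times \nu)(E_n) = (\mu \times \nu)(E). \]

    This completes the proof of the lemma.

    We can now finish the proof of Fubini's Theorem in a few steps. If $f = 1_E$ for some $E \in \mathcal{M} \times \mathcal{N}$, the statement holds due to Lemma 2.

    If $f = \sum_{i=1}^{n} \alpha_i 1_{E_i}$ is a non-negative simple function, then the function

    \[ y \mapsto f(x,y) = \sum_{i=1}^{n} \alpha_i 1_{(E_i)_x}(y) \]

    is $\mathcal{N}$-measurable, as a linear combination of $\mathcal{N}$-measurable functions. Integrating it with respect to $\nu$ we have

    \[ \int_Y f(x,y)\,d\nu(y) = \sum_{i=1}^{n} \alpha_i \nu((E_i)_x). \]

    So using Lemma 2 the function $x \mapsto \int_Y f(x,y)\,d\nu(y)$ is $\mathcal{M}$-measurable. Integrating with respect to $\mu$, we get:

    \[ \int_X \Bigl[\int_Y f(x,y)\,d\nu(y)\Bigr]d\mu(x) = \sum_{i=1}^{n} \alpha_i \int_X \nu((E_i)_x)\,d\mu(x) = \sum_{i=1}^{n} \alpha_i (\mu \times \nu)(E_i) = \int_{X \times Y} f\,d(\mu \times \nu). \]

    So the statement holds for non-negative simple functions.

    Let now $f$ be any non-negative $\mathcal{M} \times \mathcal{N}$-measurable function. Choose simple functions $0 \le s_n \uparrow f$. Then by the previous paragraph,

    \[ y \mapsto f(x,y) = \lim_{n\to\infty} s_n(x,y) \]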

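    is a limit of $\mathcal{N}$-measurable functions, and hence is $\mathcal{N}$-measurable. By the Monotone Convergence Theorem,

    \[ \int_Y f(x,y)\,d\nu(y) = \lim_{n\to\infty} \int_Y s_n(x,y)\,d\nu(y). \]

    Hence $x \mapsto \int_Y f(x,y)\,d\nu(y)$ is a limit of $\mathcal{M}$-measurable functions, and hence is itself $\mathcal{M}$-measurable. Integrating with respect to $\mu$ and using the Monotone Convergence Theorem again yields

    \[
    \begin{aligned}
    \int_X \Bigl[\int_Y f(x,y)\,d\nu(y)\Bigr]d\mu(x) &= \lim_{n\to\infty} \int_X \Bigl[\int_Y s_n(x,y)\,d\nu(y)\Bigr]d\mu(x) \\
    &= \lim_{n\to\infty} \int_{X \times Y} s_n\,d(\mu \times \nu) \\
    &= \int_{X \times Y} f\,d(\mu \times \nu).
    \end{aligned}
    \]

    This completes the non-negative case of the Theorem.

    Finally, suppose that $f \in L^1(\mu \times \nu)$. Write $f = f^+ - f^-$. Then since $f^+$ and $f^-$ are $\mathcal{M} \times \mathcal{N}$-measurable,

    \[ y \mapsto f(x,y) = f^+(x,y) - f^-(x,y) \]

    is also $\mathcal{N}$-measurable. Applying the non-negative case to $|f|$ we have that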

    \[ \int_X \Bigl[\int_Y |f(x,y)|\,d\nu(y)\Bigr]d\mu(x) = \int_{X \times Y} |f|\,d(\mu \times \nu) < \infty. \tag{41} \]

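    (x). Therefore, due to (41) we have $g \in L^1(\mu)$ and

    \[
    \begin{aligned}
    \int_X \Bigl[\int_Y f(x,y)\,d\nu(y)\Bigr]d\mu(x) &= \int_X g(x)\,d\mu(x) \\
    &= \int_X \Bigl[\int_Y f^+(x,y)\,d\nu(y)\Bigr]d\mu(x) - \int_X \Bigl[\int_Y f^-(x,y)\,d\nu(y)\Bigr]d\mu(x) \\
    &= \int_{X \times Y} f^+\,d(\mu \times \nu) - \int_{X \times Y} f^-\,d(\mu \times \nu) = \int_{X \times Y} f\,d(\mu \times \nu).
    \end{aligned}
    \]

    This completes the $L^1$-case of the Theorem.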

    7.4 Important counterexamples

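    Example 1. Let $X = Y = \{1, 2, \dots\}$ with $\mu = \nu =$ counting measure. Put $f(m,m) = 1$, $f(m,m+1) = -1$ for all $m \ge 1$, and put $f(m,n) = 0$ otherwise. Every sum over $n$ with $m$ fixed equals $1 - 1 = 0$, whereas the sum over $m$ with $n$ fixed equals $1$ for $n = 1$ and $0$ for $n \ge 2$. Then

    \[ \sum_{n} \Bigl[\sum_{m} f(m,n)\Bigr] = 1 \quad \text{and} \quad \sum_{m} \Bigl[\sum_{n} f(m,n)\Bigr] = 0. \]

    Here $f \not\ge 0$ and $f \notin L^1(\mu \times \nu)$.

    Example 2. Take $X = (0,1)$, $Y = (1,\infty)$ with Lebesgue measure. Let $f(x,y) = e^{-xy} - 2e^{-2xy}$. Then

    \[ \int_0^1 \Bigl[\int_1^\infty f(x,y)\,dy\Bigr]dx > 0, \qquad \int_1^\infty \Bigl[\int_0^1 f(x,y)\,dx\Bigr]dy < 0. \]

    Here again $f \not\ge 0$ and $f \notin L^1((0,1) \times (1,\infty))$.

    Example 3. Let $X = (0,1)$, $\mathcal{M} = \mathcal{B}$, $\mu =$ Lebesgue measure, and let $Y = (0,1)$, $\mathcal{N} =$ all subsets of $(0,1)$, and $\nu =$ counting measure. Let

    \[ f(x,y) = \begin{cases} 1 & \text{if } x = y; \\ 0 & \text{if } x \ne y. \end{cases} \]

    We have

    \[ \int_X \Bigl[\int_Y f(x,y)\,d\nu(y)\Bigr]d\mu(x) = \int_X 1\,d\mu(x) = 1, \qquad \int_Y \Bigl[\int_X f(x,y)\,d\mu(x)\Bigr]d\nu(y) = \int_Y 0\,d\nu(y) = 0. \]

    Here $\nu$ is not $\sigma$-finite.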

    Example 4. (Non-examinable) Take $X = Y = [0,1]$ with Lebesgue measure. Let $\prec$ be a linear ordering of $[0,1]$ with the property that any non-empty subset $\emptyset \ne A \subset [0,1]$ has a least element with respect to $\prec$ (called a well-ordering). Such $\prec$ exists if we assume the so-called Axiom of Choice. Further assume that $\prec$ has the property that for all $x \in [0,1]$ the set $\{y \in [0,1] : y \prec x\}$ is countable. This can be achieved if we assume the so-called Continuum Hypothesis. Let

    \[ f(x,y) = \begin{cases} 1 & \text{if } y \prec x; \\ 0 & \text{otherwise.} \end{cases} \]

    We have

    \[ \int_X \Bigl[\int_Y f(x,y)\,d\nu(y)\Bigr]d\mu(x) = \int_X 0\,d\mu(x) = 0; \qquad \int_Y \Bigl[\int_X f(x,y)\,d\mu(x)\Bigr]d\nu(y) = \int_Y 1\,d\nu(y) = 1. \]

    What goes wrong here is that $f$ is not $\mathcal{B} \times \mathcal{B}$-measurable (although it is measurable separately in both variables).
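The failure of Fubini's theorem in Example 1 (counting measure on $\{1, 2, \dots\}$, with $f(m,m) = 1$, $f(m,m+1) = -1$ and $f = 0$ otherwise) can be observed directly. The sketch below is only an illustration: each inner sum has finite support, so it is computed exactly, and the outer sum is truncated at an arbitrary cutoff `N`; the helper names `row_sum` and `col_sum` are hypothetical.

```python
def f(m, n):
    """f(m, m) = 1, f(m, m + 1) = -1, and 0 otherwise (m, n >= 1)."""
    if n == m:
        return 1
    if n == m + 1:
        return -1
    return 0


def row_sum(m):
    """Sum of f(m, n) over all n >= 1; only n = m and n = m + 1 contribute."""
    return sum(f(m, n) for n in range(1, m + 2))


def col_sum(n):
    """Sum of f(m, n) over all m >= 1; only m = n - 1 and m = n contribute."""
    return sum(f(m, n) for m in range(1, n + 1))


N = 50  # cutoff for the outer sums; the inner sums are exact

sum_n_inside = sum(row_sum(m) for m in range(1, N + 1))  # sum over n first
sum_m_inside = sum(col_sum(n) for n in range(1, N + 1))  # sum over m first

print("summing over n first, then m:", sum_n_inside)
print("summing over m first, then n:", sum_m_inside)
```

The two iterated sums differ for every cutoff `N`, which is consistent with $f \notin L^1(\mu \times \nu)$: the sum of $|f|$ over all pairs is infinite.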

    7.5 Convolutions

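    Let $f, g \in L^1(\mathbb{R})$. We define

    \[ h(x) := (f * g)(x) := \int_{-\infty}^{\infty} f(y) g(x - y)\,dy, \tag{42} \]

    whenever the integral exists.

    Theorem. Suppose $f, g \in L^1(\mathbb{R})$ are $\mathcal{B}$-measurable. Then

    \[ \int_{-\infty}^{\infty} |f(y) g(x - y)|\,dy \]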

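    First we have to check that $F$ is a Borel function on $\mathbb{R}^2$. For this, define $\varphi, \psi : \mathbb{R}^2 \to \mathbb{R}$ by

    \[ \varphi(x,y) := y \quad \text{and} \quad \psi(x,y) := x - y. \]

    Since $\varphi$ and $\psi$ are continuous on $\mathbb{R}^2$, they are Borel functions. It follows that

    \[ (x,y) \mapsto f(y) = (f \circ \varphi)(x,y) \quad \text{and} \quad (x,y) \mapsto g(x - y) = (g \circ \psi)(x,y) \]

    are Borel functions, and hence so is their product $F(x,y)$. Apply Fubini's Theorem to $|F|$:

    \[
    \begin{aligned}
    \int_{-\infty}^{\infty} \Bigl[\int_{-\infty}^{\infty} |f(y) g(x - y)|\,dy\Bigr]dx
    &= \int_{-\infty}^{\infty} \Bigl[\int_{-\infty}^{\infty} |f(y)||g(x - y)|\,dx\Bigr]dy \\
    &= \int_{-\infty}^{\infty} |f(y)| \Bigl[\int_{-\infty}^{\infty} |g(x - y)|\,dx\Bigr]dy \\
    &= \int_{-\infty}^{\infty} |f(y)| \Bigl[\int_{-\infty}^{\infty} |g(x)|\,dx\Bigr]dy \\
    &= \|f\|_1 \|g\|_1
    \end{aligned}
    \]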
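A discrete analogue of this computation, with counting measure on the integers in place of Lebesgue measure on $\mathbb{R}$, gives $\|f * g\|_1 \le \|f\|_1\|g\|_1$ for finitely supported sequences. The sketch below illustrates that analogue only; it is an assumption-laden stand-in, not the continuous statement, and the names `convolve` and `l1_norm` are hypothetical.

```python
def convolve(f, g):
    """Discrete convolution h[k] = sum over i + j = k of f[i] * g[j]."""
    h = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] += fi * gj
    return h


def l1_norm(f):
    """Sum of absolute values, i.e. the L^1 norm for counting measure."""
    return sum(abs(v) for v in f)


# Arbitrary finitely supported sequences with mixed signs.
f = [1.0, 2.0, 3.0]
g = [1.0, -1.0]

h = convolve(f, g)
print("||f * g||_1 =", l1_norm(h))
print("||f||_1 ||g||_1 =", l1_norm(f) * l1_norm(g))
```

For non-negative sequences the two quantities coincide, mirroring the chain of equalities obtained from Fubini's Theorem applied to $|F|$.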

    for probability is summarized in many texts, for example [1] or [5]. A nice short treatment is provided by the book of Kolmogorov [3], who was the first to work out the measure theoretic foundations of probability. Statements not proved in detail in this section can be found in these references.

    8.1 Product spaces

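    Example 1. Let $X = \{0, 1\}$, $\mathcal{M} = \{\emptyset, \{0\}, \{1\}, \{0,1\}\}$, and $\mu_i = (1 - p_i)\delta_0 + p_i\delta_1$, $i = 1, \dots, n$. The probability space $(X,\mathcal{M},\mu_i)$ models the flip of a coin with bias $p_i$. The product space $(\Omega_n,\mathcal{F}_n,P_n)$, where $\Omega_n = X \times \dots \times X$, $\mathcal{F}_n = \mathcal{M} \times \dots \times \mathcal{M}$, $P_n = \mu_1 \times \dots \times \mu_n$, models flipping the coins independently (see Section 8.2 below):

    \[ P_n[\{(\omega_1, \dots, \omega_n)\}] = \mu_1(\{\omega_1\}) \cdots \mu_n(\{\omega_n\}). \]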
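The product formula for independent coin flips can be tabulated explicitly for small $n$. The following sketch enumerates all outcomes in $\{0,1\}^n$ and assigns each the product of the one-coin probabilities; the function names `coin` and `product_measure` and the bias values are hypothetical choices for illustration.

```python
import math
from itertools import product


def coin(p):
    """Point masses of a single coin: outcome 1 has mass p, outcome 0 has mass 1 - p."""
    return {0: 1.0 - p, 1: p}


def product_measure(biases):
    """Map each outcome (w_1, ..., w_n) to mu_1({w_1}) * ... * mu_n({w_n})."""
    mus = [coin(p) for p in biases]
    return {
        omega: math.prod(mu[w] for mu, w in zip(mus, omega))
        for omega in product((0, 1), repeat=len(biases))
    }


# Three coins with (assumed) biases p_1 = 0.5, p_2 = 0.2, p_3 = 0.9.
P = product_measure([0.5, 0.2, 0.9])

print("number of outcomes:", len(P))
print("total mass:", sum(P.values()))
print("P[{(1, 0, 1)}] =", P[(1, 0, 1)])
```

The total mass is 1 up to floating-point rounding, as it must be for a probability measure.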

    Sometimes it is necessary to model an infinite sequenc