MEASURE AND INTEGRATION
Syafiq Johar
Contents
1 Riemann/Darboux Integration
2 Measure
2.1 Caratheodory Extension Theorem
2.2 Lebesgue Space
2.3 Borel Space
2.4 Cantor Set
3 Measurable Functions
4 Lebesgue Integration
4.1 Lebesgue Integration of Non-Negative Functions
4.2 Lebesgue Integration of General Functions
5 Convergence Theorems
6 Double Integrals
6.1 Product Measure
6.2 Theorems by Tonelli and Fubini
1 Riemann/Darboux Integration
The intuition behind the Riemann or Darboux integral is that we slice the subgraph of a function into strips and approximate the area of the subgraph from above and below by rectangles. These rectangles have a well-defined area, which is simply the product of their side lengths. The integral is defined to be the “limiting” area of these rectangles as we take finer and finer slices.
The first step in defining a Riemann integral is to define partitions and step functions.
Definition 1.1 (Partition). Let $[a, b]$ be an interval in $\mathbb{R}$. A partition $P$ of the interval $[a, b]$ is a finite sequence of points $a = x_0 < x_1 < x_2 < \cdots < x_n = b$. Each interval $(x_{i-1}, x_i]$ is called a subinterval of the partition.
Definition 1.2 (Refinement of partition). A partition $P'$ given by $a = x'_0 < x'_1 < x'_2 < \cdots < x'_m = b$ is called a refinement of the partition $P$ defined above if each $x_i$ is equal to some $x'_j$ for some $j \in \{0, 1, \ldots, m\}$.
An important remark here is that if $P_1$ and $P_2$ are two different partitions of $[a, b]$, then there exists a common refinement $P_3$ of both $P_1$ and $P_2$. This is obtained simply by taking the union of the partition points of both partitions and arranging them in increasing order.
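As a small illustration of this remark, here is a Python sketch that builds the common refinement of two partitions. It assumes a partition is represented as an increasing list of its points; this representation and the names `common_refinement` and `is_refinement` are our own choices, not part of the notes.

```python
from fractions import Fraction

def common_refinement(p1, p2):
    """Common refinement of two partitions of the same interval [a, b].

    A partition is represented (by assumption) as a strictly increasing
    list of points a = x_0 < x_1 < ... < x_n = b.
    """
    if p1[0] != p2[0] or p1[-1] != p2[-1]:
        raise ValueError("partitions must share the endpoints a and b")
    # Union of the partition points, arranged in increasing order.
    return sorted(set(p1) | set(p2))

def is_refinement(fine, coarse):
    """Definition 1.2: every point of `coarse` is also a point of `fine`."""
    return set(coarse) <= set(fine)

p1 = [Fraction(0), Fraction(1, 2), Fraction(1)]
p2 = [Fraction(0), Fraction(1, 3), Fraction(2, 3), Fraction(1)]
p3 = common_refinement(p1, p2)
print(p3)  # [Fraction(0, 1), Fraction(1, 3), Fraction(1, 2), Fraction(2, 3), Fraction(1, 1)]
```

Exact rational arithmetic via `Fraction` avoids floating-point ties when comparing partition points.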
Definition 1.3 (Step function). A function $\varphi : [a, b] \to \mathbb{R}$ is called a step function adapted to the partition $P$ if $\varphi$ is constant on each subinterval $(x_{i-1}, x_i]$.
Suppose that for each $i = 1, 2, \ldots, n$ we have $\varphi(x) = c_i$ for $x \in (x_{i-1}, x_i]$. We can express the step function $\varphi$ as the sum:
$$\varphi = \sum_{i=1}^{n} c_i \mathbf{1}_{(x_{i-1}, x_i]},$$
where $\mathbf{1}_{(x_{i-1}, x_i]}$ is the indicator function of the interval $(x_{i-1}, x_i]$. We know how to “integrate” these step functions: we sum the lengths of the subintervals of the partition, each weighted by the value of $\varphi$ on that subinterval. More precisely, this is defined as:
Definition 1.4. Let $\varphi$ be a step function adapted to some partition $P$, and suppose that for each $i = 1, 2, \ldots, n$ we have $\varphi(x) = c_i$ for $x \in (x_{i-1}, x_i]$. Then we define
$$I(\varphi, P) = \sum_{i=1}^{n} c_i |x_i - x_{i-1}|.$$
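To make Definition 1.4 concrete, the following Python sketch computes $I(\varphi, P)$ and checks on one example that rewriting the same step function on a finer partition does not change the value. The list-based representation and the helper names `step_integral` and `refine` are illustrative assumptions.

```python
from fractions import Fraction

def step_integral(points, values):
    """I(phi, P) = sum of c_i * |x_i - x_{i-1}|.

    `points` is the partition x_0 < ... < x_n and `values[i-1]` is the
    constant value c_i of phi on (x_{i-1}, x_i].
    """
    assert len(values) == len(points) - 1
    return sum(c * (points[i + 1] - points[i]) for i, c in enumerate(values))

def refine(points, values, new_points):
    """Rewrite the same step function on the partition points + new_points."""
    fine = sorted(set(points) | set(new_points))
    fine_values = []
    for left, right in zip(fine, fine[1:]):
        # Locate the coarse subinterval (x_{i-1}, x_i] containing (left, right].
        i = next(k for k in range(1, len(points))
                 if points[k - 1] <= left and right <= points[k])
        fine_values.append(values[i - 1])
    return fine, fine_values

P = [Fraction(0), Fraction(1), Fraction(3)]
c = [Fraction(2), Fraction(5)]  # phi = 2 on (0, 1] and 5 on (1, 3]
P_fine, c_fine = refine(P, c, [Fraction(1, 2), Fraction(2)])
print(step_integral(P, c), step_integral(P_fine, c_fine))  # 12 12
```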
Note that if $P'$ is a refinement of $P$, then $\varphi$ is also adapted to $P'$ and $I(\varphi, P) = I(\varphi, P')$.
How do we adapt this to general functions? We define upper and lower Darboux sums. For a bounded function $f : [a, b] \to \mathbb{R}$ and a partition $P$, we define:
$$m_i = \inf_{x \in (x_{i-1}, x_i]} f(x), \qquad M_i = \sup_{x \in (x_{i-1}, x_i]} f(x).$$
From this, we approximate the function $f$ from below and from above by step functions adapted to the partition $P$, called the lower and upper approximations respectively:
$$\underline{f}(P) = \sum_{i=1}^{n} m_i \mathbf{1}_{(x_{i-1}, x_i]}, \qquad \overline{f}(P) = \sum_{i=1}^{n} M_i \mathbf{1}_{(x_{i-1}, x_i]}.$$
Note that, given a partition $P$, we have the pointwise ordering $\underline{f}(P) \le f \le \overline{f}(P)$. Furthermore, if $P'$ is a refinement of $P$, then $\underline{f}(P') \ge \underline{f}(P)$ and $\overline{f}(P') \le \overline{f}(P)$; that is, the finer the partition, the larger the lower approximation and the smaller the upper approximation, pointwise.
From this, we define the lower and upper Darboux sums as:
$$L_{f,P} = I(\underline{f}(P), P) = \sum_{i=1}^{n} m_i |x_i - x_{i-1}|, \qquad U_{f,P} = I(\overline{f}(P), P) = \sum_{i=1}^{n} M_i |x_i - x_{i-1}|.$$
By the same argument as for the pointwise approximations, the finer the partition, the larger the lower Darboux sum and the smaller the upper Darboux sum. Note that for a given partition $P$, we have $L_{f,P} \le U_{f,P}$ by the definitions of $m_i$ and $M_i$. Finally, we define the lower and upper Darboux integrals by taking the supremum of the lower sums and the infimum of the upper sums over all possible partitions:
$$L_f = \sup\{L_{f,P} : P \text{ is a partition of } [a, b]\}, \qquad U_f = \inf\{U_{f,P} : P \text{ is a partition of } [a, b]\}.$$
Note that if $P_1$ and $P_2$ are two different partitions of $[a, b]$ and $P_3$ is their common refinement, then
$$L_{f,P_1} \le L_{f,P_3} \le U_{f,P_3} \le U_{f,P_2},$$
which implies that the lower Darboux sum for any partition of $[a, b]$ is always less than or equal to the upper Darboux sum for any partition. Thus, we must have $L_f \le U_f$. We call a function $f : [a, b] \to \mathbb{R}$ Darboux integrable, or simply integrable, if $L_f = U_f$, and we define its integral as:
$$\int_a^b f(x)\, dx = L_f = U_f.$$
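For a continuous non-decreasing function, the infimum and supremum on each subinterval $(x_{i-1}, x_i]$ are the endpoint values $f(x_{i-1})$ and $f(x_i)$, so the Darboux sums can be computed exactly. The Python sketch below relies on that assumption, uses $f(x) = x^2$ on $[0, 1]$ with uniform partitions (our own choice of example), and shows $U_{f,P} - L_{f,P} = (f(1) - f(0))/n \to 0$, consistent with $L_f = U_f = \frac{1}{3}$. The function names are illustrative.

```python
from fractions import Fraction

def darboux_sums(f, a, b, n):
    """Lower and upper Darboux sums of f on the uniform n-piece partition of [a, b].

    Assumes f is continuous and non-decreasing, so that on (x_{i-1}, x_i]
    the infimum is f(x_{i-1}) and the supremum is f(x_i).
    """
    xs = [a + (b - a) * Fraction(i, n) for i in range(n + 1)]
    lower = sum(f(xs[i - 1]) * (xs[i] - xs[i - 1]) for i in range(1, n + 1))
    upper = sum(f(xs[i]) * (xs[i] - xs[i - 1]) for i in range(1, n + 1))
    return lower, upper

def square(x):
    return x * x

for n in (1, 10, 100):
    L, U = darboux_sums(square, Fraction(0), Fraction(1), n)
    print(n, L, U, U - L)  # the gap U - L equals 1/n
```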
Remark 1.5. Technically, the above construction is called the Darboux integral. Riemann's construction instead uses a “tagged partition”, in which for each subinterval of a partition $P$ a point (the tag) $p_i \in (x_{i-1}, x_i]$ is chosen. A Riemann sum is defined exactly as a Darboux sum, except that instead of building the step function from the infimum or supremum of the function on each subinterval, it is built from the values $f(p_i)$ at the tagged points. One can show that the two constructions are equivalent: a bounded function is Riemann integrable if and only if it is Darboux integrable, and the two integrals coincide.
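Since $m_i \le f(p_i) \le M_i$, every Riemann sum lies between the lower and upper Darboux sums of the same partition. A short Python check of this, assuming midpoint tags (one arbitrary choice among many) and reusing the example $f(x) = x^2$ on $[0, 1]$; the name `riemann_sum` is our own:

```python
from fractions import Fraction

def square(x):
    return x * x

def riemann_sum(f, points, tags):
    """Sum of f(p_i) * (x_i - x_{i-1}), with one tag p_i per subinterval."""
    return sum(f(p) * (points[i + 1] - points[i]) for i, p in enumerate(tags))

n = 8
xs = [Fraction(i, n) for i in range(n + 1)]
mids = [(xs[i] + xs[i + 1]) / 2 for i in range(n)]  # midpoint tags
widths = [xs[i + 1] - xs[i] for i in range(n)]

R = riemann_sum(square, xs, mids)
# x^2 is continuous and increasing on [0, 1], so m_i = f(x_{i-1}) and M_i = f(x_i).
L = sum(square(xs[i]) * widths[i] for i in range(n))
U = sum(square(xs[i + 1]) * widths[i] for i in range(n))
print(L <= R <= U, R)  # True 85/256
```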
Proposition 1.6 (Properties of Riemann/Darboux integral). Suppose that $f, g : [a, b] \to \mathbb{R}$ are integrable on $[a, b]$. Then
1. The integral is linear, that is, for constants $\lambda, \mu$, we have $\int_a^b (\lambda f + \mu g)\, dx = \lambda \int_a^b f\, dx + \mu \int_a^b g\, dx$.
2. The functions $\min(f, g)$ and $\max(f, g)$ are also integrable.
3. If $f \le g$ pointwise, then $\int_a^b f\, dx \le \int_a^b g\, dx$. A direct corollary is $\left|\int_a^b f\, dx\right| \le \int_a^b |f|\, dx$.
The following two statements are known as the fundamental theorem of calculus:
Proposition 1.7 (Fundamental theorem of calculus).
1. Suppose that $h : (a, b) \to \mathbb{R}$ is integrable on $(a, b)$, and let $H : [a, b] \to \mathbb{R}$ be defined as:
$$H(x) = \int_a^x h(y)\, dy.$$
Then $H$ is continuous on $[a, b]$. Furthermore, if $h$ is continuous at $c \in (a, b)$, then $H$ is differentiable at $c$ with $H'(c) = h(c)$.
2. Suppose that $H$ is continuous on $[a, b]$ and differentiable in $(a, b)$. Suppose furthermore that its derivative $H'$ is integrable on $(a, b)$. Then:
$$\int_a^b H'(y)\, dy = H(b) - H(a).$$
Proposition 1.8.
1. If f differs from an integrable function in finitely many points, then f is also integrable,
2. If f is continuous on [a, b], then it is integrable,
3. If f is monotone on [a, b], then it is integrable,
4. If f : (a, b)→ R is continuous and bounded on (a, b), then it is integrable on (a, b).
There are some problems with taking limits with the Darboux/Riemann integral. For example, consider the function $g : [0, 1] \to \mathbb{R}$ defined as:
$$g(x) = \begin{cases} 0 & \text{if } x \in \mathbb{Q}, \\ 1 & \text{if } x \notin \mathbb{Q}. \end{cases}$$
It differs from the integrable constant function $1$ only at the countably many rational points, so it is the pointwise limit of a sequence of functions that each differ from $1$ at finitely many points (set the first $n$ rationals of an enumeration of $\mathbb{Q} \cap [0, 1]$ to $0$), all of which are integrable by Proposition 1.8. However, $g$ itself is not integrable: every subinterval contains both rational and irrational points, so regardless of the partition used, we always have $L_{g,P} = 0$ and $U_{g,P} = 1$, and the lower and upper Darboux integrals are never equal.
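Another example is the following: for $n = 1, 2, \ldots$, let $f_n : [0, 1] \to \mathbb{R}$ be defined as:
$$f_n(x) = \begin{cases} n & \text{if } x \in (0, \tfrac{1}{n}], \\ 0 & \text{otherwise}. \end{cases}$$
Then $f_n \to 0$ pointwise on $[0, 1]$, since for each fixed $x > 0$ we have $f_n(x) = 0$ as soon as $n > 1/x$, and $f_n(0) = 0$ for all $n$. However:
$$\lim_{n\to\infty} \int_0^1 f_n\, dx = \lim_{n\to\infty} 1 = 1, \qquad \int_0^1 \lim_{n\to\infty} f_n\, dx = \int_0^1 0\, dx = 0.$$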
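A quick exact check of this example in Python, using the fact that $f_n$ is a step function with $\int_0^1 f_n\,dx = n \cdot \frac{1}{n}$ (the helper names `f` and `integral_f` are our own): the integrals stay at $1$, while the values at any fixed point eventually vanish.

```python
from fractions import Fraction

def f(n, x):
    """f_n(x) = n on (0, 1/n] and 0 elsewhere on [0, 1]."""
    return n if 0 < x <= Fraction(1, n) else 0

def integral_f(n):
    """Integral of the step function f_n: height n times length 1/n."""
    return n * Fraction(1, n)

x = Fraction(1, 100)
print([integral_f(n) for n in (1, 10, 1000)])      # every integral equals 1
print([f(n, x) for n in (1, 10, 100, 101, 1000)])  # [1, 10, 100, 0, 0]
```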
However, not all is lost. Under stronger hypotheses, we do have some limit/convergence results:
Proposition 1.9.
1. Suppose that $f_n : [a, b] \to \mathbb{R}$ is a sequence of functions converging uniformly to a function $f$, that is, $f_n \xrightarrow{u} f$. If all the $f_n$ are integrable on $[a, b]$, then $f$ is also integrable with:
$$\lim_{n\to\infty} \int_a^b f_n\, dx = \int_a^b \lim_{n\to\infty} f_n\, dx = \int_a^b f\, dx.$$
2. Suppose that $\varphi_n : [a, b] \to \mathbb{R}$ are integrable functions with $|\varphi_n| \le M_n$ and $\sum_{n=1}^\infty M_n < \infty$. Then the sum $\sum_{n=1}^\infty \varphi_n$ is integrable and:
$$\sum_{n=1}^\infty \int_a^b \varphi_n\, dx = \int_a^b \sum_{n=1}^\infty \varphi_n\, dx.$$
3. Suppose that $f_n : [a, b] \to \mathbb{R}$ is a sequence of functions that are continuously differentiable in $(a, b)$, with $f_n \xrightarrow{pw} f$ on $[a, b]$ and $f_n' \xrightarrow{u} g$ for some bounded function $g : (a, b) \to \mathbb{R}$. Then $f$ is differentiable and $f' = g$, that is:
$$\lim_{n\to\infty} f_n' = \left(\lim_{n\to\infty} f_n\right)'.$$
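As a worked example of part 2 (the choice of series is ours), take $\varphi_n(x) = x^n/n!$ on $[0, 1]$, so that $|\varphi_n| \le M_n = 1/n!$ with $\sum_n M_n = e - 1 < \infty$. Integrating term by term gives the same value as integrating the sum $\sum_{n \ge 1} x^n/n! = e^x - 1$ directly:

```latex
\sum_{n=1}^{\infty} \int_0^1 \frac{x^n}{n!}\, dx
  = \sum_{n=1}^{\infty} \frac{1}{(n+1)!}
  = \sum_{k=2}^{\infty} \frac{1}{k!}
  = e - 2,
\qquad
\int_0^1 \sum_{n=1}^{\infty} \frac{x^n}{n!}\, dx
  = \int_0^1 (e^x - 1)\, dx
  = (e - 1) - 1
  = e - 2.
```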
Propositions 1.8 and 1.9 are essentially the best general results available for the Riemann/Darboux integral, and they are quite restrictive. Even with these limitations, the Riemann/Darboux integral is good enough for some purposes.
However, there remains the limitation of not being able to take limits (and hence derivatives, infinite sums, ...) across the integral. This limitation stems from the fact that we approximate our functions by step functions, which are derived from finite partitions of the domain of integration. Finite partitions are important in the construction because the lengths of the subintervals make sense, and thus the integral makes sense. For example, the function $g$ which takes the value $0$ on the rationals and $1$ otherwise cannot be squeezed between step functions whose integrals are arbitrarily close, since every subinterval contains both rational and irrational points.
If we enlarge the class of approximating functions to other functions which are “integrable” in some other sense, we have a better chance of integrating these more complicated functions. We just need to approximate them by “simple” functions of the form:
$$\varphi = \sum_{i=1}^{n} c_i \mathbf{1}_{E_i},$$
where the $E_i$ are subsets of the domain to which we can assign sizes, just like the intervals. Then the approximating integral is defined as:
$$I(\varphi) = \sum_{i=1}^{n} c_i |E_i|,$$
where $|E_i|$ denotes the size of $E_i$. These sizes are called the measures of the $E_i$, and the sets $E_i$ will be called measurable sets.
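For instance, if each $E_i$ is a finite disjoint union of half-open intervals, its size is just its total length, and $I(\varphi)$ can be computed directly. The Python sketch below assumes exactly that representation (a list of pairs `(c, d)` standing for $(c, d]$; the names are ours). It only illustrates the formula, since the point of what follows is to allow far more general sets $E_i$.

```python
from fractions import Fraction

def size(E):
    """|E| for E given as a list of pairwise disjoint half-open intervals (c, d]."""
    return sum(d - c for c, d in E)

def simple_integral(terms):
    """I(phi) = sum of c_i * |E_i| for phi = sum of c_i * 1_{E_i}.

    `terms` is a list of pairs (c_i, E_i).
    """
    return sum(c * size(E) for c, E in terms)

# phi = 3 on (0, 1] u (2, 5/2] and -1 on (1, 2]; the E_i need not be intervals.
phi = [(3, [(Fraction(0), Fraction(1)), (Fraction(2), Fraction(5, 2))]),
       (-1, [(Fraction(1), Fraction(2))])]
print(simple_integral(phi))  # 7/2
```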
2 Measure
We first think about what a measure is. A measure should be defined on “certain subsets” of our domain, say $X$ (for demonstration, we shall consider $\mathbb{R}$), and take values in $[0, \infty]$. We do this by demanding that the measure satisfies certain reasonable properties. First we need to define the domain of the measure. What are these “certain subsets”? They form a subcollection of the power set of $X$. Is it possible for this collection of subsets to coincide with the power set? We shall see that for some domains, the power set is too big for the measure to make reasonable “physical” sense. We denote by $2^X$ the power set of the set $X$, that is, the collection of all subsets of $X$.
Definition 2.1 ($\pi$-system). Let $X$ be a domain. A collection of subsets $\mathcal{P} \subseteq 2^X$ is called a $\pi$-system if it is non-empty and closed under finite intersection, that is, for any $A, B \in \mathcal{P}$, we have $A \cap B \in \mathcal{P}$.
We have seen an example of a $\pi$-system during the construction of the Riemann/Darboux integral. The collection of sets:
$$J = \{\text{intervals in } [a, b] \text{ of the form } (c, d] \text{ where } a \le c \le d \le b\}$$
forms a $\pi$-system. We can “measure” the sets in this $\pi$-system by defining the length of an interval $(c, d]$ as the non-negative number $d - c \ge 0$. This length is called a content. Now we generalise this collection further by allowing unions and set differences.
Definition 2.2 (Rings). Let $X$ be a domain. A non-empty collection $R \subseteq 2^X$ is called a ring if it is closed under union and set difference, that is, for $A, B \in R$, we have:
1. $A \cup B \in R$,
2. $A \setminus B \in R$.
These also imply that $\varnothing = A \setminus A \in R$ and $A \cap B = A \setminus (A \setminus B) \in R$.
Here are two facts about rings:
Lemma 2.3.
1. The intersection $\bigcap_{i=1}^n R_i$ of finitely many rings $R_i$ of $X$ is also a ring.
2. Let $S \subseteq 2^X$ be an arbitrary non-empty collection of subsets of $X$. Then there exists exactly one ring $R(S)$ containing $S$ and contained in every ring $R$ containing $S$ (that is, $R(S)$ is the smallest ring containing $S$). We call $R(S)$ the ring generated by $S$.
From the lemma above, there exists a unique ring $R(J)$ generated by the collection $J$ of intervals in $[a, b]$. The elements of $R(J)$ are called elementary sets; they are exactly the finite disjoint unions of intervals in $J$. Again, there are no qualms about defining the content on the sets in $R(J)$: the content of a finite disjoint union of intervals is just the sum of their individual contents. We say that the content on $J$ extends to a content on $R(J)$.
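The content of an elementary set can be computed by first rewriting a finite union of intervals as a disjoint union. A minimal Python sketch, assuming intervals are given as pairs `(c, d)` meaning $(c, d]$ (our own representation; the names are illustrative):

```python
def disjointify(intervals):
    """Rewrite a finite union of half-open intervals (c, d] as a sorted disjoint union."""
    merged = []
    for c, d in sorted(i for i in intervals if i[0] < i[1]):  # drop empty (c, c]
        if merged and c <= merged[-1][1]:
            # Overlapping or touching the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], d))
        else:
            merged.append((c, d))
    return merged

def content(intervals):
    """Content of the elementary set given by a finite union of intervals."""
    return sum(d - c for c, d in disjointify(intervals))

E = [(0, 1), (0.5, 2), (3, 4), (2, 2)]
print(disjointify(E), content(E))  # [(0, 2), (3, 4)] 3
```

Merging before summing is what prevents overlapping pieces, such as $(0, 1]$ and $(0.5, 2]$, from being counted twice.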
Definition 2.4 ($\sigma$-rings). Let $X$ be a domain. A non-empty collection $R \subseteq 2^X$ is called a $\sigma$-ring if it is a ring which is closed under countable union, that is, for $A, B, A_i \in R$, we have:
1. $\bigcup_{i=1}^\infty A_i \in R$,
2. $A \setminus B \in R$.
These also imply that $\varnothing \in R$ and $\bigcap_{i=1}^\infty A_i \in R$.
This is where the problem of defining contents starts: how do we define the content of a set made up of infinitely many pieces? The content is not suitable here since it is only finitely additive. Also, how would we define the “length” of the intersection of infinitely many sets? For good measure (no pun intended), we also throw the universal set, and all the sets related to it, into the collection, obtaining what we call a $\sigma$-algebra. This will be the natural collection of sets on which to define measures.
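Definition 2.5 ($\sigma$-algebra). Let $X$ be a domain. A non-empty collection $\mathcal{F} \subseteq 2^X$ is called a $\sigma$-algebra (or $\sigma$-field) if it is a $\sigma$-ring such that $X$ itself is contained in $\mathcal{F}$. That is, for $A, A_i, B \in \mathcal{F}$, we have:
1. $X \in \mathcal{F}$,
2. $\bigcup_{i=1}^\infty A_i \in \mathcal{F}$,
3. $A \setminus B \in \mathcal{F}$.
These also imply that $\varnothing \in \mathcal{F}$ and $\bigcap_{i=1}^\infty A_i \in \mathcal{F}$.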
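Remark 2.6. Alternatively, instead of requiring $A \setminus B \in \mathcal{F}$ for all $A, B \in \mathcal{F}$, it is enough to have $X \setminus C \in \mathcal{F}$ for any $C \in \mathcal{F}$, since for any $A, B \in \mathcal{F}$ we have $A \setminus B = X \setminus (B \cup (X \setminus A))$.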
Similar to rings, we have the following lemma for $\sigma$-algebras.
Lemma 2.7. Let $S \subseteq 2^X$ be an arbitrary non-empty collection of subsets of $X$. Then there exists exactly one $\sigma$-algebra $\mathcal{F}(S)$ containing $S$ and contained in every $\sigma$-algebra $\mathcal{F}$ containing $S$ (that is, $\mathcal{F}(S)$ is the smallest $\sigma$-algebra containing $S$). We call $\mathcal{F}(S)$ the $\sigma$-algebra generated by $S$.
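When $X$ is finite, countable unions reduce to finite unions, so $\mathcal{F}(S)$ can be computed by brute force: keep adding complements and pairwise unions until nothing new appears (by Remark 2.6, these operations suffice). The Python sketch below does this. It is only meant for small finite $X$, and the function name is our own.

```python
from itertools import combinations

def generated_sigma_algebra(X, S):
    """Smallest sigma-algebra on the finite set X containing every set in S.

    For finite X, closing under complements and finite unions is enough.
    """
    X = frozenset(X)
    family = {frozenset(), X} | {frozenset(s) for s in S}
    while True:
        new = {X - A for A in family}
        new |= {A | B for A, B in combinations(family, 2)}
        if new <= family:  # nothing new was produced: the family is closed
            return family
        family |= new

F = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {2}])
print(len(F))  # 8: all unions of the "atoms" {1}, {2}, {3, 4}
print(sorted(sorted(A) for A in F))
```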
Thus, for a given set $X$, every $\sigma$-algebra is a $\sigma$-ring, every $\sigma$-ring is a ring, and every ring is a $\pi$-system. Note that $2^X$ is also a $\sigma$-algebra of $X$, because it satisfies all the defining properties of a $\sigma$-algebra; it is the largest one.
We aim to generalise the notion of length from the $\pi$-system $J$ to the largest possible collection of subsets. Sadly, in general we cannot do this for the collection $2^X$ of all subsets, as we shall see in the following subsections. The best we can do in general is to define it on some smaller $\sigma$-algebra of $X$, by requiring a condition called the Caratheodory condition. We demand that a measure on a $\sigma$-algebra satisfies certain reasonable “physical” properties. From these properties, we define:
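Definition 2.8 (Measure). A measure $\mu$ on a $\sigma$-algebra $\mathcal{F}$ is a function $\mu : \mathcal{F} \to [0, \infty]$ such that:
1. $\mu(\varnothing) = 0$,
2. $\mu$ is non-negative, that is, for any $E \in \mathcal{F}$, we have $\mu(E) \ge 0$,
3. $\mu$ is $\sigma$-additive, that is, for any countable collection of pairwise disjoint sets $E_i \in \mathcal{F}$, we have:
$$\mu\left(\bigcup_{i=1}^\infty E_i\right) = \sum_{i=1}^\infty \mu(E_i).$$
The sets in $\mathcal{F}$ are called $\mu$-measurable sets.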
Remark 2.9. A content, as mentioned earlier, is just like a measure, except that instead of $\sigma$-additivity we only have finite additivity. This is why contents are naturally defined on rings, whereas measures are defined on $\sigma$-algebras.
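The pair $(X, \mathcal{F})$ is called a measurable space. The triple $(X, \mathcal{F}, \mu)$ is called a measure space. A measure space satisfies the following properties:
Proposition 2.10. Let $(X, \mathcal{F}, \mu)$ be a measure space. Then:
1. If $A \subseteq B$ where $A, B \in \mathcal{F}$, then $\mu(A) \le \mu(B)$.
2. If $A_i \in \mathcal{F}$ are such that $A_i \subseteq A_{i+1}$ for all $i = 1, 2, \ldots$, then $\mu\left(\bigcup_{i=1}^\infty A_i\right) = \lim_{i\to\infty} \mu(A_i)$.
3. If $A_i \in \mathcal{F}$ are such that $A_i \supseteq A_{i+1}$ for all $i = 1, 2, \ldots$ and $\mu(A_1) < \infty$, then $\mu\left(\bigcap_{i=1}^\infty A_i\right) = \lim_{i\to\infty} \mu(A_i)$.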
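Proof. For the first one, since $B = A \cup (B \setminus A)$ is a disjoint union, we have $\mu(B) = \mu(A) + \mu(B \setminus A) \ge \mu(A)$.
For the second one, define $A'_1 = A_1$ and $A'_i = A_i \setminus A_{i-1}$ for all $i = 2, 3, \ldots$. Then the $A'_i$ are pairwise disjoint, $A_n = \bigcup_{i=1}^n A'_i$ and $\bigcup_{i=1}^\infty A_i = \bigcup_{i=1}^\infty A'_i$. Thus:
$$\mu\left(\bigcup_{i=1}^\infty A_i\right) = \mu\left(\bigcup_{i=1}^\infty A'_i\right) = \sum_{i=1}^\infty \mu(A'_i) = \lim_{i\to\infty} \sum_{k=1}^{i} \mu(A'_k) = \lim_{i\to\infty} \mu(A_i).$$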
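Finally, for the third, define $B_i = A_1 \setminus A_i$. Then $(B_i)$ is an increasing nested sequence of sets, and $\bigcup_{i=1}^\infty B_i = A_1 \setminus \bigcap_{i=1}^\infty A_i$. Thus, using (2), we get:
$$\lim_{i\to\infty} \mu(B_i) = \mu\left(\bigcup_{i=1}^\infty B_i\right) = \mu\left(A_1 \setminus \bigcap_{i=1}^\infty A_i\right) = \mu(A_1) - \mu\left(\bigcap_{i=1}^\infty A_i\right), \tag{1}$$
while on the other hand, since $A_1 = A_i \cup B_i$ for all $i$, with $A_i$ and $B_i$ disjoint, we have $\mu(A_1) = \mu(B_i) + \mu(A_i)$ for all $i = 1, 2, \ldots$. Hence (1) becomes:
$$\lim_{i\to\infty} \left(\mu(A_1) - \mu(A_i)\right) = \mu(A_1) - \mu\left(\bigcap_{i=1}^\infty A_i\right),$$
and using the fact that $\mu(A_1) < \infty$, rearranging the terms yields the result.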
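The hypothesis $\mu(A_1) < \infty$ in part 3 cannot be dropped. As an illustration, we use the counting measure, which these notes do not introduce: on $\mathcal{F} = 2^{\mathbb{N}}$, let $\mu(E)$ be the number of elements of $E$, or $\infty$ if $E$ is infinite. Take the decreasing sets $A_i = \{i, i+1, i+2, \ldots\}$:

```latex
\mu(A_i) = \infty \ \text{ for all } i,
\qquad
\bigcap_{i=1}^{\infty} A_i = \varnothing,
\qquad\text{so}\qquad
\lim_{i\to\infty} \mu(A_i) = \infty \neq 0 = \mu\Big(\bigcap_{i=1}^{\infty} A_i\Big).
```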
Most of the time, we are interested in $\sigma$-finite measures, since they allow us to work locally in order to get a global picture. The definition of a $\sigma$-finite measure is:
Definition 2.11 ($\sigma$-finite measure). Let $(X, \mathcal{F}, \mu)$ be a measure space. Then the measure $\mu$ is $\sigma$-finite if there exist countably many sets $E_i \in \mathcal{F}$ such that $X = \bigcup_{i=1}^\infty E_i$ and $\mu(E_i) < \infty$ for all $i = 1, 2, \ldots$.
2.1 Caratheodory Extension Theorem
Let us construct a $\sigma$-algebra on $\mathbb{R}$ by working with what we know so far. This idea can easily be extended to more general spaces. We begin with the $\pi$-system $J$ made up of intervals of the form $(a, b]$. The content $m$ on the sets in $J$ is defined as:
$$m : J \to [0, \infty], \qquad (a, b] \mapsto |b - a|.$$
This content $m$ can easily be extended to the ring $R(J)$ generated by $J$.
Now we want to extend this content to a bigger collection of sets. Let us be ambitious and try to define it on all subsets of $\mathbb{R}$. For any $A \in 2^{\mathbb{R}}$, we define the size of the set $A$ by something called the outer measure $m^*$:
$$m^*(A) = \inf\left\{\sum_{i=1}^\infty m(J_i) : J_i \in J \text{ such that } A \subseteq \bigcup_{i=1}^\infty J_i\right\}. \tag{2}$$
Essentially, we cover the set $A$ with countably many intervals and define the outer measure of $A$ as the infimum of the total contents of all such interval covers of $A$, which could also be infinite. The outer measure $m^*$ satisfies the following properties:
Lemma 2.12. The outer measure $m^* : 2^{\mathbb{R}} \to [0, \infty]$ satisfies:
1. $m^*(\varnothing) = 0$,
2. $m^*(A) \ge 0$ for any $A \in 2^{\mathbb{R}}$,
3. $m^*$ is $\sigma$-subadditive, that is, for any countable collection of sets $E_i \in 2^{\mathbb{R}}$, we have:
$$m^*\left(\bigcup_{i=1}^\infty E_i\right) \le \sum_{i=1}^\infty m^*(E_i).$$
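Proof. For the last property, if $\sum_{i=1}^\infty m^*(E_i) = \infty$, we are done. Otherwise, $\sum_{i=1}^\infty m^*(E_i) < \infty$, so $m^*(E_i) < \infty$ for all $i$. Fix an arbitrary $\varepsilon > 0$. Then, by the definition of the outer measure, for each $i = 1, 2, \ldots$ there exists a countable cover $\{J_i^j\}_{j=1}^\infty \subseteq J$ of $E_i$ such that:
$$\sum_{j=1}^\infty m(J_i^j) < m^*(E_i) + \frac{\varepsilon}{2^i}.$$
Thus, since $\bigcup_{i=1}^\infty \{J_i^j\}_{j=1}^\infty$ forms a countable cover of $\bigcup_{i=1}^\infty E_i$, by the definition of the outer measure we have:
$$m^*\left(\bigcup_{i=1}^\infty E_i\right) \le \sum_{i=1}^\infty \sum_{j=1}^\infty m(J_i^j) < \sum_{i=1}^\infty \left(m^*(E_i) + \frac{\varepsilon}{2^i}\right) = \sum_{i=1}^\infty m^*(E_i) + \varepsilon,$$
and since $\varepsilon > 0$ is arbitrary, we are done.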
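The same $\varepsilon/2^i$ trick shows that every countable set, for example $\mathbb{Q} \cap [0, 1]$, has outer measure $0$: cover the $i$-th point by an interval of length $\varepsilon/2^i$. The Python sketch below (our own illustration; the helper names are hypothetical) builds such covers for the first few rationals in an enumeration by increasing denominator and checks that their total length stays below $\varepsilon$. A program can of course only look at finitely many terms.

```python
from fractions import Fraction

def rationals_in_unit_interval(max_den):
    """Enumerate the distinct rationals p/q in [0, 1] with q <= max_den."""
    seen = []
    for q in range(1, max_den + 1):
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.append(r)
    return seen

def cover(points, eps):
    """Cover the i-th point r (i = 1, 2, ...) by (r - eps/2^(i+1), r + eps/2^(i+1)]."""
    return [(r - eps / 2 ** (i + 1), r + eps / 2 ** (i + 1))
            for i, r in enumerate(points, start=1)]

eps = Fraction(1, 1000)
qs = rationals_in_unit_interval(30)
intervals = cover(qs, eps)
total = sum(d - c for c, d in intervals)  # equals eps * (1 - 2^(-len(qs))) < eps
print(len(qs), total < eps)
```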
The last property above is only $\sigma$-subadditivity rather than $\sigma$-additivity, so $m^*$ need not be a genuine measure on $2^{\mathbb{R}}$ in the sense of Definition 2.8; in fact, the argument of Example 2.18 below shows that it is not. For some other spaces this might be enough, but for $\mathbb{R}$ it is not good enough. To recap:
$$\begin{array}{ccccc} J & \subseteq & ? & \subseteq & 2^{\mathbb{R}} \\ \text{(content) } m & & ? & & m^* \text{ (outer measure)} \end{array}$$
which means that $J$ is too small to be equipped with a measure, while $2^{\mathbb{R}}$ is too big. So we need a $\sigma$-algebra somewhere in the middle; that is, we need to throw away some sets from $2^{\mathbb{R}}$ so that the outer measure becomes a measure on the resulting collection. This is where the Caratheodory condition comes in:
Definition 2.13 (Caratheodory condition). Let $X$ be a domain and let $\mathcal{G} \subseteq 2^X$ be a $\sigma$-algebra equipped with an outer measure $m^*$. A subset $E \in \mathcal{G}$ is $m^*$-measurable if it satisfies the Caratheodory condition:
$$m^*(F) = m^*(F \cap E) + m^*(F \cap E^c) \quad \text{for all } F \in \mathcal{G}.$$
The collection of sets in $\mathcal{G}$ which satisfy the Caratheodory condition is denoted $\mathcal{G}^*$.
Remark 2.14. Sometimes the Caratheodory condition is relaxed to simply requiring:
$$m^*(F) \ge m^*(F \cap E) + m^*(F \cap E^c) \quad \text{for all } F \in \mathcal{G},$$
since the reverse inequality $\le$ always holds by the subadditivity of $m^*$.
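An important consequence of this definition is the following:
Theorem 2.15 (Caratheodory Extension Theorem). If $X \in \mathcal{G}$, then $\mathcal{G}^*$ is a $\sigma$-algebra of $X$ and $m^*$ is a measure on $\mathcal{G}^*$.
Proof. We begin by showing that $\mathcal{G}^*$ is a $\sigma$-algebra. Clearly $\varnothing, X \in \mathcal{G}^*$. Now we want to show that for any $A, B \in \mathcal{G}^*$, we have $A \setminus B = A \cap B^c \in \mathcal{G}^*$, that is, we require:
$$\text{Goal: } m^*(G) = m^*(G \cap A \cap B^c) + m^*(G \cap (A^c \cup B)) \quad \text{for any } G \in \mathcal{G}.$$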
Note that since $A, B \in \mathcal{G}^*$, for any $F \in \mathcal{G}$ we have:
$$m^*(F) = m^*(F \cap A) + m^*(F \cap A^c), \tag{3}$$
$$m^*(F) = m^*(F \cap B) + m^*(F \cap B^c). \tag{4}$$
Applying (4) to $F = G \cap A$, we get $m^*(G \cap A) = m^*(G \cap A \cap B) + m^*(G \cap A \cap B^c)$. Substituting this into (3) with $F = G$ yields:
$$m^*(G) = m^*(G \cap A \cap B) + m^*(G \cap A \cap B^c) + m^*(G \cap A^c). \tag{5}$$
Next, consider (3) with $F = G \cap (A^c \cup B)$. This gives us:
$$\begin{aligned} m^*(G \cap (A^c \cup B)) &= m^*(G \cap (A^c \cup B) \cap A) + m^*(G \cap (A^c \cup B) \cap A^c) \\ &= m^*(G \cap B \cap A) + m^*(G \cap A^c), \end{aligned}$$
by the distributivity of the set operations. Substituting this into (5) yields the desired goal.
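Note that $A \cup B = X \setminus ((X \setminus A) \setminus B) \in \mathcal{G}^*$, so any finite union of elements of $\mathcal{G}^*$ also lies in $\mathcal{G}^*$.
Next we want to prove closure under countable unions. Let $E_i \in \mathcal{G}^*$, that is, each $E_i$ satisfies the Caratheodory condition. Without loss of generality, assume further that they are pairwise disjoint (otherwise replace $E_i$ by $E_i \setminus \bigcup_{j<i} E_j$, which lies in $\mathcal{G}^*$ by the above and gives the same union). Fix $n < \infty$. Then $\bigcup_{i=1}^n E_i$ is also $m^*$-measurable, and for any $F \in \mathcal{G}$ we have: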
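$$m^*(F) = m^*\left(F \cap \left(\bigcup_{i=1}^n E_i\right)\right) + m^*\left(F \cap \left(\bigcup_{i=1}^n E_i\right)^c\right). \tag{6}$$
Furthermore, since $E_n \in \mathcal{G}^*$, we have:
$$\begin{aligned} m^*\left(F \cap \left(\bigcup_{i=1}^n E_i\right)\right) &= m^*\left(F \cap \left(\bigcup_{i=1}^n E_i\right) \cap E_n\right) + m^*\left(F \cap \left(\bigcup_{i=1}^n E_i\right) \cap E_n^c\right) \\ &= m^*(F \cap E_n) + m^*\left(F \cap \left(\bigcup_{i=1}^{n-1} E_i\right)\right) = \cdots = \sum_{i=1}^n m^*(F \cap E_i), \end{aligned}$$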
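by using the fact that the $E_i$ are pairwise disjoint and induction on $n$. Substituting this into (6), and using the monotonicity of $m^*$ (any cover of a set also covers its subsets) together with $\left(\bigcup_{i=1}^n E_i\right)^c \supseteq \left(\bigcup_{i=1}^\infty E_i\right)^c$, we have:
$$m^*(F) = \sum_{i=1}^n m^*(F \cap E_i) + m^*\left(F \cap \left(\bigcup_{i=1}^n E_i\right)^c\right) \ge \sum_{i=1}^n m^*(F \cap E_i) + m^*\left(F \cap \left(\bigcup_{i=1}^\infty E_i\right)^c\right).$$
Taking the limit as $n \to \infty$ and using the $\sigma$-subadditivity of $m^*$, we get:
$$\begin{aligned} m^*(F) &\ge \sum_{i=1}^\infty m^*(F \cap E_i) + m^*\left(F \cap \left(\bigcup_{i=1}^\infty E_i\right)^c\right) \\ &\ge m^*\left(\bigcup_{i=1}^\infty (F \cap E_i)\right) + m^*\left(F \cap \left(\bigcup_{i=1}^\infty E_i\right)^c\right) = m^*\left(F \cap \left(\bigcup_{i=1}^\infty E_i\right)\right) + m^*\left(F \cap \left(\bigcup_{i=1}^\infty E_i\right)^c\right), \end{aligned}$$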
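which is the Caratheodory condition in the form of Remark 2.14. Thus $\bigcup_{i=1}^\infty E_i \in \mathcal{G}^*$, and $\mathcal{G}^*$ is a $\sigma$-algebra. Furthermore, since the last expression is at least $m^*(F)$ by subadditivity, all the inequalities above are in fact equalities:
$$m^*(F) = \sum_{i=1}^\infty m^*(F \cap E_i) + m^*\left(F \cap \left(\bigcup_{i=1}^\infty E_i\right)^c\right) \quad \text{for any } F \in \mathcal{G}.$$
So, by setting $F = \bigcup_{i=1}^\infty E_i \in \mathcal{G}$, for which $F \cap E_i = E_i$ and the last term is $m^*(\varnothing) = 0$, we have:
$$\sum_{i=1}^\infty m^*(E_i) = m^*\left(\bigcup_{i=1}^\infty E_i\right)$$
for arbitrary pairwise disjoint sets $E_i \in \mathcal{G}^*$. Thus $m^*$ is $\sigma$-additive on $\mathcal{G}^*$, and hence a measure on $\mathcal{G}^*$.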
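On a finite set, $\mathcal{G}^*$ can be found by brute force, which makes Definition 2.13 and Theorem 2.15 easy to experiment with. The Python sketch below uses a toy outer measure of our own choosing on $X = \{1, 2, 3\}$ with $\mathcal{G} = 2^X$: $m^*(A) = 0$ if $A \subseteq \{3\}$ and $m^*(A) = 1$ otherwise (one can check that it vanishes on $\varnothing$, is monotone and is subadditive). Only four sets survive; they form a $\sigma$-algebra on which $m^*$ is additive, and the null set $\{3\}$ is among them.

```python
from itertools import chain, combinations

X = frozenset({1, 2, 3})

def power_set(S):
    """All subsets of the finite set S, as frozensets."""
    items = list(S)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def outer(A):
    """Toy outer measure (an assumption for illustration): 0 on subsets of {3}, else 1."""
    return 0 if A <= {3} else 1

def caratheodory_sets(X, m):
    """Sets E with m(F) = m(F & E) + m(F - E) for every F in the power set of X."""
    G = power_set(X)
    return {E for E in G if all(m(F) == m(F & E) + m(F - E) for F in G)}

G_star = caratheodory_sets(X, outer)
print(sorted(sorted(E) for E in G_star))  # [[], [1, 2], [1, 2, 3], [3]]
```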
The most obvious examples of sets that satisfy the Caratheodory condition are the $m^*$-null sets, that is, the sets $E$ with $m^*(E) = 0$.
Lemma 2.16.
1. Suppose that $(E_i)$ is a countable collection of $m^*$-null sets. Then $\bigcup_{i=1}^\infty E_i$ is also $m^*$-null.
2. If $E$ is $m^*$-null, then $E \in \mathcal{G}^*$.
Proof. The first is clearly true by the $\sigma$-subadditivity of $m^*$.
For the second, we have to show that $E$ satisfies the Caratheodory condition. Pick an arbitrary $F \in \mathcal{G}$, and note that $F \supseteq F \cap A^c$ for any $A \subseteq X$. Thus, by setting $A = E$ and using the monotonicity of $m^*$, we have:
$$m^*(F) \ge m^*(F \cap E^c). \tag{7}$$
On the other hand, $m^*(F \cap E) \le m^*(E) = 0$. Thus (7) is really $m^*(F) \ge m^*(F \cap E^c) + m^*(F \cap E)$, which is the Caratheodory condition in the form of Remark 2.14.
2.2 Lebesgue Space
Going back to our construction, from the set of intervals $J$ with content $m$, we constructed an outer measure $m^*$ on $2^{\mathbb{R}}$. To turn this outer measure $m^*$ into a genuine measure, we discard from $2^{\mathbb{R}}$ the sets which do not satisfy the Caratheodory condition, so that $m^*$ restricted to the remaining collection, which by Theorem 2.15 is a $\sigma$-algebra denoted $\mathcal{L}$, is a measure. We write $\mu = m^*|_{\mathcal{L}}$.
Furthermore, the measure $\mu$ restricted to $J$ is simply the content of the intervals. The space $(\mathbb{R}, \mathcal{L}, \mu)$ is called the Lebesgue space. One property of the Lebesgue space, which follows from Lemma 2.16 (any subset of a null set is itself null by the monotonicity of $m^*$), is that it is a complete measure space:
Definition 2.17 (Complete measure space). A measure space (X,F , µ) is called a complete
measure space if whenever E ∈ F is such that µ(E) = 0, then any subset of E is also in F .
The Lebesgue $\sigma$-algebra $\mathcal{L}$ is not all of $2^{\mathbb{R}}$, though it is very difficult to construct a set which is not in $\mathcal{L}$. Here is an example of a subset of $\mathbb{R}$ which is not measurable:
Example 2.18. The Lebesgue measure is invariant under translation, that is, for any $E \in \mathcal{L}$ and any $x \in \mathbb{R}$, we have $E + x \in \mathcal{L}$ and $\mu(E + x) = \mu(E)$.
Consider the interval $[0, 1]$ and define an equivalence relation on $[0, 1]$ by $x \sim y$ if and only if $x - y \in \mathbb{Q}$. Divide $[0, 1]$ into equivalence classes and, by the Axiom of Choice, choose one representative from each class. Collect these representatives into a subset $A \subseteq [0, 1]$.
This set $A$ is not Lebesgue measurable. Suppose it is, so that $\mu(A) \ge 0$ is some fixed value. The rational numbers in $[-1, 1]$ are countable, so enumerate them as $r_1, r_2, \ldots$. Consider the sets $r_i + A$ for $i = 1, 2, \ldots$. These sets are pairwise disjoint by construction, and $\mu(r_i + A) = \mu(A)$. Furthermore, $[0, 1] \subseteq \bigcup_{i=1}^\infty (r_i + A) \subseteq [-1, 2]$, so by monotonicity and $\sigma$-additivity we have:
$$1 \le \sum_{i=1}^\infty \mu(r_i + A) = \sum_{i=1}^\infty \mu(A) \le 3.$$
However, this yields a contradiction: since $\mu(A) \ge 0$, we either have $\mu(A) = 0$ or $\mu(A) > 0$. The former implies that $1 \le \sum_{i=1}^\infty \mu(A) = 0$, while the latter implies that an infinite sum of equal positive terms is bounded by $3$; both are absurd. Hence, $A$ is not Lebesgue measurable.
By the above construction, we have:
Lemma 2.19. Every set in L with positive measure contains a non-Lebesgue measurable set.
2.3 Borel Space
Of course, there are other ways to construct a $\sigma$-algebra from a given collection of sets. Recall Lemma 2.7: given a collection of sets $S \subseteq 2^X$, we can construct the $\sigma$-algebra $\mathcal{F}(S)$ generated by $S$. This is the opposite of our previous construction: here we add subsets to the collection $S$ until it becomes a $\sigma$-algebra, whereas previously we discarded sets from $2^X$ until the outer measure $m^*$ became a measure on what remained.
For $X = \mathbb{R}$, our previous method yields the Lebesgue space $\mathcal{L}$. This other method yields the Borel space $\mathcal{B}$, which is the $\sigma$-algebra on $\mathbb{R}$ generated by the open sets in $\mathbb{R}$. By construction, we can expect the Borel space to be smaller than the Lebesgue space. This is because the Borel space is, by definition, the smallest $\sigma$-algebra that contains the open sets, and $\mathcal{L}$ is a $\sigma$-algebra that contains the open sets (every open set is a countable union of intervals), so $\mathcal{B} \subseteq \mathcal{L}$. In fact, we have:
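Lemma 2.20.
1. $\mathcal{B} \subsetneq \mathcal{L}$, that is, any Borel measurable set is Lebesgue measurable, and there are Lebesgue measurable sets which are not Borel measurable.
2. If $E \in \mathcal{L}$, then there are $A, B \in \mathcal{B} \subseteq \mathcal{L}$ such that $A \subseteq E \subseteq B$ with $\mu(B \setminus E) = \mu(E \setminus A) = 0$. That is, $\mathcal{L}$ and $\mathcal{B}$ differ by Lebesgue-null subsets.
3. $\mathcal{L}$ is the completion of $\mathcal{B}$.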
An interesting lemma is the following:
Lemma 2.21. A strictly increasing homeomorphism on an interval maps Borel sets to Borel
sets.
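Proof. Let $f$ be such a homeomorphism; being a strictly increasing homeomorphism, it has a continuous inverse. It is easy to check that for any continuous function $g$ the collection
$$\mathcal{A} = \{E : g^{-1}(E) \in \mathcal{B}\}$$
is a $\sigma$-algebra containing the open sets. Thus $\mathcal{A} \supseteq \mathcal{B}$ by the definition of $\mathcal{B}$. Taking $g = f^{-1}$, so that $g^{-1}(E) = f(E)$, we have:
$$\mathcal{B} \subseteq \mathcal{A} = \{E : f(E) \in \mathcal{B}\},$$
which yields the result.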
2.4 Cantor Set
One important example in measure theory is the Cantor set $C$. The Cantor set is constructed iteratively from the interval $[0, 1]$ by removing the open middle third of this segment, then removing the open middle third of each of the two remaining segments, then of each of the four remaining segments, then of each of the eight remaining segments, and so on ad infinitum. The explicit formula for $C$ is:
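$$C = [0, 1] \setminus \bigcup_{n=1}^\infty \bigcup_{k=0}^{3^{n-1}-1} \left(\frac{3k+1}{3^n}, \frac{3k+2}{3^n}\right).$$
Here the $n$-th union removes the open middle third of every interval $[\frac{k}{3^{n-1}}, \frac{k+1}{3^{n-1}}]$; removing the middle third of an interval that was already discarded at an earlier stage changes nothing.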
[Figure: the stages $n = 0, 1, 2, \ldots$ of the construction of the Cantor set.]
In this form, we see that $C$ is a closed subset of $[0, 1]$, being the complement in $[0, 1]$ of a union of open intervals. It has Lebesgue measure $0$.
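Lemma 2.22. The Lebesgue measure of $C$ is $0$, that is, $\mu(C) = 0$.
Proof. We note that at each iteration of the construction, the remaining intervals form a finite cover of the set $C$. That is, $[0, 1]$ is a cover, $[0, \frac{1}{3}] \cup [\frac{2}{3}, 1]$ is a cover, et cetera. In general, for every $n \in \mathbb{N}$, the set $K_n$ remaining after $n$ steps is a cover of $C$; it is the union of $2^n$ disjoint closed intervals of length $3^{-n}$:
$$K_n = \bigcup_{(a_1, \ldots, a_n) \in \{0, 2\}^n} \left[\sum_{i=1}^n \frac{a_i}{3^i}, \ \sum_{i=1}^n \frac{a_i}{3^i} + \frac{1}{3^n}\right].$$
Thus $\mu(C) \le \mu(K_n) = 2^n \cdot 3^{-n} = \left(\frac{2}{3}\right)^n$ for all $n \in \mathbb{N}$. Taking the limit as $n \to \infty$, we have $\mu(C) = 0$.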
Another description of the Cantor set uses the ternary expansion of numbers in the interval $[0, 1]$. A ternary expansion of $x \in [0, 1]$ is a representation
$$x = \sum_{i=1}^\infty \frac{a_i}{3^i} = 0.a_1a_2a_3\ldots \ (\text{base } 3), \qquad a_i \in \{0, 1, 2\}.$$
This expansion is unique except for numbers of the form $k/3^N$, which have two expansions, one ending in repeated $0$s and one ending in repeated $2$s (for example, $\frac{1}{3} = 0.1000\ldots = 0.0222\ldots$ in base $3$). In the first step of the construction, we remove the numbers in $(\frac{1}{3}, \frac{2}{3})$, which are exactly those all of whose expansions have the form $0.1\ldots$ (base $3$); in the second step, we remove from the remaining set the numbers whose expansions are all of the form $0.01\ldots$ or $0.21\ldots$ (base $3$); and so on. Iteratively, by construction, $C$ consists exactly of the numbers in $[0, 1]$ that admit a ternary expansion with no digit $1$, that is, with all $a_i \in \{0, 2\}$. Each element of $C$ has exactly one such expansion, so $C$ is in bijection with the set of sequences with terms in $\{0, 2\}$, and by Cantor's diagonal argument, $C$ is uncountable.
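The ternary description suggests a simple test of whether a point survives the first $n$ removal steps, i.e. lies in the set $K_n$ left after $n$ steps (our notation): look at which third of $[0, 1]$ the point lies in, reject it if it is in the open middle third, and otherwise rescale that closed third back to $[0, 1]$, which shifts the ternary digits by one place. A Python sketch, with a function name of our own; it can only certify membership in $K_n$ for finite $n$, not in $C$ itself:

```python
from fractions import Fraction

def in_cantor_stage(x, n):
    """Return True if x in [0, 1] lies in K_n, i.e. survives the first n removals."""
    x = Fraction(x)
    for _ in range(n):
        if Fraction(1, 3) < x < Fraction(2, 3):
            return False  # x lies in a removed open middle third
        # Rescale the left or right closed third back to [0, 1].
        x = 3 * x if x <= Fraction(1, 3) else 3 * x - 2
    return True

print(in_cantor_stage(Fraction(1, 4), 50))  # 1/4 = 0.0202... (base 3): True
print(in_cantor_stage(Fraction(1, 3), 50))  # 1/3 = 0.0222... (base 3): True
print(in_cantor_stage(Fraction(1, 2), 50))  # 1/2 = 0.1111... (base 3): False
```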
Definition 2.23 (Cantor staircase). The Cantor staircase is the function $C : [0, 1] \to [0, 1]$ defined as the limit of the sequence of functions $C_n : [0, 1] \to [0, 1]$ constructed iteratively as follows. Let $C_0(x) = x$. For every integer $n \ge 0$, the function $C_{n+1}$ is defined in terms of $C_n$ as:
$$C_{n+1}(x) = \begin{cases} \dfrac{C_n(3x)}{2} & \text{if } x \in [0, \tfrac{1}{3}], \\ \dfrac{1}{2} & \text{if } x \in [\tfrac{1}{3}, \tfrac{2}{3}], \\ \dfrac{1 + C_n(3x - 2)}{2} & \text{if } x \in [\tfrac{2}{3}, 1]. \end{cases}$$
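[Figure: graph of the Cantor staircase, with the points $\frac{1}{9}, \frac{2}{9}, \frac{1}{3}, \frac{2}{3}, \frac{7}{9}, \frac{8}{9}$ marked on the horizontal axis and the levels $\frac{1}{4}, \frac{1}{2}, \frac{3}{4}$ on the vertical axis.]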
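From this construction, one may check that the convergence is uniform. Indeed, for each $n \in \mathbb{N}$, by splitting into the three different regions, we have:
$$\max_{x \in [0,1]} |C_{n+1}(x) - C_n(x)| \le \frac{1}{2} \max_{x \in [0,1]} |C_n(x) - C_{n-1}(x)|,$$
so iteratively, since $\max_{x \in [0,1]} |C_1(x) - C_0(x)| = \frac{1}{6}$, for every $n \in \mathbb{N}$ we have:
$$\max_{x \in [0,1]} |C(x) - C_n(x)| \le \sum_{k=n}^\infty \max_{x \in [0,1]} |C_{k+1}(x) - C_k(x)| \le \frac{2}{2^n} \max_{x \in [0,1]} |C_1(x) - C_0(x)| = \frac{1}{3 \cdot 2^n},$$
which proves uniform convergence. And since all the $C_n$ are continuous, $C$ is continuous.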
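Furthermore, for an arbitrary $x \in [0, 1] \setminus C$, since $C$ is closed, there exists an open ball $B_\varepsilon(x) \subseteq [0, 1] \setminus C$, which we may take small enough to lie inside a single interval removed at some stage $N \in \mathbb{N}$. Then for all $n \ge N$, the functions $C_n$ are constant on $B_\varepsilon(x)$, with the same value $\frac{2k+1}{2^N}$ for some $k \in \{0, 1, \ldots, 2^{N-1} - 1\}$. Thus $C'(x) = 0$ for $x \in [0, 1] \setminus C$, and since $\mu(C) = 0$, the derivative of $C$ vanishes almost everywhere. So this Cantor staircase is locally constant outside a null set, yet it increases continuously from $0$ to $1$. Strange!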
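The recursion in Definition 2.23 can be evaluated exactly with rational arithmetic. The Python sketch below (function name ours) computes $C_n(x)$ and checks the bound $|C_{n+1} - C_n| \le \frac{1}{6} \cdot 2^{-n}$ from the argument above on a grid of points. For instance, the recursion gives $C(\frac{1}{4}) = \frac{1}{2} C(\frac{3}{4}) = \frac{1 + C(\frac{1}{4})}{4}$, so $C(\frac{1}{4}) = \frac{1}{3}$, and $C_n(\frac{1}{4})$ indeed approaches $\frac{1}{3}$.

```python
from fractions import Fraction

ONE_THIRD, TWO_THIRDS = Fraction(1, 3), Fraction(2, 3)

def cantor_iterate(x, n):
    """Exact value of C_n(x) from the recursion in Definition 2.23 (C_0(x) = x)."""
    x = Fraction(x)
    if n == 0:
        return x
    if x <= ONE_THIRD:
        return cantor_iterate(3 * x, n - 1) / 2
    if x < TWO_THIRDS:
        return Fraction(1, 2)
    return (1 + cantor_iterate(3 * x - 2, n - 1)) / 2

grid = [Fraction(k, 243) for k in range(244)]
for n in range(6):
    gap = max(abs(cantor_iterate(x, n + 1) - cantor_iterate(x, n)) for x in grid)
    # Bound from the text: max |C_{n+1} - C_n| <= (1/6) * 2^(-n).
    print(n, gap, gap <= Fraction(1, 6) / 2 ** n)

print(float(cantor_iterate(Fraction(1, 4), 30)))  # close to 1/3
```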
Example 2.24. Now let us construct a Lebesgue measurable set which is not Borel measurable.
Consider the function f(x) = C(x) + x for x ∈ [0, 1]. Then this function satisfies:
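1. $f$ is continuous and strictly increasing.
2. $f' = 1$ almost everywhere.
3. $f^{-1}$ exists: since $f$ is strictly increasing and continuous, it is a bijection onto its image, which is $[0, 2]$ (as $f(0) = 0$ and $f(1) = 2$).
4. $f^{-1}$ is continuous (by a topological argument: a continuous bijection from a compact space to a Hausdorff space is a homeomorphism).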
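The function $f$ maps the intervals making up $[0, 1] \setminus C$ to intervals in $[0, 2]$ of the same length. Indeed, suppose $(a, b)$ is one of the open intervals removed in the construction of $C$. Then $C(a) = C(b)$, and hence
$$\mu(f((a, b))) = \mu((f(a), f(b))) = f(b) - f(a) = C(b) + b - C(a) - a = b - a.$$
Since $[0, 1] \setminus C$ is a countable disjoint union of such intervals and $f$ is injective, $\sigma$-additivity shows that the image of $[0, 1] \setminus C$ has measure $\mu(f([0, 1] \setminus C)) = \mu([0, 1] \setminus C) = 1$, since $\mu(C) = 0$.
However, since $[0, 2] = f(C) \cup f([0, 1] \setminus C)$ is a disjoint union, taking measures gives $2 = \mu(f(C)) + 1$, which implies that the image of $C$ has measure $1$. So the map $f$ stretches the null set $C$ into a set of positive measure. Thus, since $f(C)$ is compact and hence Lebesgue measurable, Lemma 2.19 shows that there exists a non-measurable set $N \subseteq f(C)$, that is, $N \notin \mathcal{L}$.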
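Note that $N \subseteq f(C)$, so $f^{-1}(N) \subseteq C$. Thus $f^{-1}(N)$ is a subset of a null set, and since the Lebesgue space is complete, $f^{-1}(N) \in \mathcal{L}$ with measure $0$. However, $f^{-1}(N)$ is not Borel. Indeed, suppose it is; then by Lemma 2.21 applied to $f$, we have $f(f^{-1}(N)) = N \in \mathcal{B}$. Since $\mathcal{B} \subseteq \mathcal{L}$ and $N$ was chosen not to be in $\mathcal{L}$, we get a contradiction.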
3 Measurable Functions
Now that we have properly defined measures and the subsets of a domain which can be measured, we proceed to define the functions which we want to integrate. These functions are called measurable functions. Note that whether a function is measurable does not depend on the measures themselves, only on the underlying $\sigma$-algebras. We define it as:
Definition 3.1 (Measurable functions). Let $(X, \mathcal{F})$ and $(Y, \mathcal{E})$ be measurable spaces. The map $f : X \to Y$ is measurable if for any $E \in \mathcal{E}$, its preimage under $f$ is in $\mathcal{F}$, that is, $f^{-1}(E) \in \mathcal{F}$. To emphasise the dependence on $\mathcal{E}$ and $\mathcal{F}$, we sometimes write $f : (X, \mathcal{F}) \to (Y, \mathcal{E})$.
In particular, we have:
Definition 3.2. Let $(X, \mathcal{F})$ be a measurable space. The map $f : X \to \mathbb{R}$ is $\mathcal{F}$-measurable if for any Borel set $E \in \mathcal{B}$, its preimage under $f$ is in $\mathcal{F}$, that is, $f^{-1}(E) \in \mathcal{F}$.
Some properties:
Proposition 3.3. Let (X,F) be a measurable space. Then:
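1. Let $A \subseteq X$. Then $\mathbf{1}_A$ is measurable if and only if $A \in \mathcal{F}$.
2. Let $f : X \to \mathbb{R}$ be $\mathcal{F}$-measurable and $g : \mathbb{R} \to \mathbb{R}$ be Borel measurable. Then $g \circ f : X \to \mathbb{R}$ is $\mathcal{F}$-measurable.
3. If $a \in \mathbb{R}$ and $f, g : X \to \mathbb{R}$ are $\mathcal{F}$-measurable, then so are $af$, $f \pm g$, $fg$, $f/g$ (if $g \neq 0$), $\max(f, g)$, $\min(f, g)$, $f^+$, $f^-$ and $|f|$, where $f^+ = \max(f, 0)$ and $f^- = \max(-f, 0)$.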
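Several items in part 3 reduce to the others through the following pointwise identities (standard algebra, added here only as an aid). For example, measurability of $\max(f, g)$ follows from that of $f + g$ and $|f - g|$, and $|f|$ is measurable by part 2 since $|\cdot|$ is continuous:

```latex
f = f^+ - f^-, \qquad |f| = f^+ + f^-, \qquad
\max(f, g) = \frac{f + g + |f - g|}{2}, \qquad
\min(f, g) = \frac{f + g - |f - g|}{2}.
```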
We can extend the definition of $\mathcal{F}$-measurable functions to functions taking the values $\pm\infty$.
Definition 3.4. A function $f : X \to [-\infty, \infty]$ is $\mathcal{F}$-measurable if $\{f = \infty\}$ and $\{f = -\infty\}$ are measurable and if for any $E \in \mathcal{B}$, its preimage under $f$ is in $\mathcal{F}$, that is, $f^{-1}(E) \in \mathcal{F}$.
Measurable functions behave nicely under limits, which makes them excellent candidates for integration. Recall the following definitions:
Definition 3.5 (Limit inferior and limit superior). Let $(x_n)_{n=1}^\infty$ be a sequence of real numbers. We define the limit inferior of this sequence as the infimum of the set of limits (in [−∞,∞]) of subsequences of $(x_n)$. More explicitly, it is defined as:
$$\liminf_{n\to\infty} x_n = \lim_{n\to\infty}\Big(\inf_{m\ge n} x_m\Big) = \sup_{n\ge 1}\Big(\inf_{m\ge n} x_m\Big).$$
Similarly, the limit superior of this sequence is the supremum of the set of limits of subsequences of $(x_n)$. More explicitly, it is defined as:
$$\limsup_{n\to\infty} x_n = \lim_{n\to\infty}\Big(\sup_{m\ge n} x_m\Big) = \inf_{n\ge 1}\Big(\sup_{m\ge n} x_m\Big).$$
Lemma 3.6. Here are some properties of the limit inferior and limit superior of sequences $(x_n)$ and $(y_n)$:
1. for the sequence $(x_n)$, we have $\inf_n x_n \le \liminf_{n\to\infty} x_n \le \limsup_{n\to\infty} x_n \le \sup_n x_n$,
2. the sequence $(x_n)$ converges if and only if $\limsup_{n\to\infty} x_n = \liminf_{n\to\infty} x_n$, and this common value is $\lim_{n\to\infty} x_n$,
3. the limit superior satisfies finite subadditivity, that is, whenever all the terms are defined, we have:
$$\limsup_{n\to\infty}(x_n + y_n) \le \limsup_{n\to\infty} x_n + \limsup_{n\to\infty} y_n,$$
4. the limit inferior satisfies finite superadditivity, that is, whenever all the terms are defined, we have:
$$\liminf_{n\to\infty}(x_n + y_n) \ge \liminf_{n\to\infty} x_n + \liminf_{n\to\infty} y_n.$$
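As a quick sanity check of Definition 3.5 and Lemma 3.6, the Python sketch below uses the sequence $x_n = (-1)^n(1 + 1/n)$, an example of our own choosing, whose tail infima increase to −1 and whose tail suprema decrease to 1. A program can only inspect finitely many terms, so the helpers tail_inf and tail_sup (hypothetical names) work on a window of N terms; this truncation is an assumption of the sketch. The last lines take $x_n = (-1)^n$ and $y_n = -x_n$ to show that the inequality in Lemma 3.6(3) can be strict.

```python
# Illustration of Definition 3.5 and Lemma 3.6 on concrete sequences.
# Assumption: only finitely many terms can be inspected, so each sequence
# is truncated at N terms and tail infima/suprema are taken over that window.

N = 100_000  # hypothetical truncation length


def x(n: int) -> float:
    """x_n = (-1)^n (1 + 1/n), whose liminf is -1 and limsup is 1."""
    return (-1) ** n * (1 + 1 / n)


seq = [x(n) for n in range(1, N + 1)]  # seq[n - 1] holds x_n


def tail_inf(n: int) -> float:
    """inf_{m >= n} x_m over the window; increases towards the liminf."""
    return min(seq[n - 1:])


def tail_sup(n: int) -> float:
    """sup_{m >= n} x_m over the window; decreases towards the limsup."""
    return max(seq[n - 1:])


# Strict subadditivity in Lemma 3.6(3): x_n = (-1)^n and y_n = -x_n.
a = [(-1) ** n for n in range(1, 1001)]
b = [-t for t in a]
s = [p + q for p, q in zip(a, b)]
print(max(s[500:]), max(a[500:]) + max(b[500:]))  # tail sups: 0 versus 2
```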
With the limit inferior and limit superior defined for sequences of real numbers, we extend these notions pointwise to sequences of functions and obtain:
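Proposition 3.7. Let $f_n : X \to \mathbb{R}$ be a sequence of F-measurable functions. Then $\sup_n f_n$, $\inf_n f_n$, $\limsup_{n\to\infty} f_n$ and $\liminf_{n\to\infty} f_n$ are F-measurable as functions X → [−∞,∞] (in the sense of Definition 3.4). In particular, if $\lim_{n\to\infty} f_n$ exists and equals f, then f is F-measurable.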
We are now ready to approximate F-measurable functions by an analogue of the step functions we defined for the Riemann/Darboux integral, namely simple functions:
Definition 3.8 (Simple functions). Let (X,F) be a measurable space. Then a function φ : X → R is called a simple function if there exist a finite n, constants $c_i \in \mathbb{R}$ and measurable sets $E_i \in \mathcal{F}$ such that:
$$\phi = \sum_{i=1}^{n} c_i\,\mathbf{1}_{E_i}.$$
The following useful result says that any non-negative measurable function can be approximated from below by simple functions.
Proposition 3.9. Let f : X → [0,∞] be an F-measurable function. Then there is an increasing
sequence fn of simple functions such that fn ↑ f .
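Proof. For each n ∈ N, we define:
$$f_n = \sum_{k=1}^{2^{2n}-1} \frac{k}{2^n}\,\mathbf{1}_{E^{(n)}_k} + 2^n\,\mathbf{1}_{A_n},$$
where:
$$E^{(n)}_k = \Big\{x : \frac{k}{2^n} \le f(x) < \frac{k+1}{2^n}\Big\} \quad\text{and}\quad A_n = \{x : f(x) \ge 2^n\}.$$
These sets are in F because f is F-measurable, and for each fixed n they are pairwise disjoint. Thus, $(f_n)$ is an increasing sequence of simple functions with $0 \le f - f_n \le 2^{-n}$ on $\{f < 2^n\}$ and $f_n = 2^n$ on $\{f \ge 2^n\}$. If f(x) < ∞, then $f(x) < 2^n$ for all large n, so $f_n(x) \to f(x)$; if f(x) = ∞, then $f_n(x) = 2^n \to \infty$. Therefore $f_n \uparrow f$ as n → ∞.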
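The construction in the proof of Proposition 3.9 can be evaluated one point at a time: on $\{f < 2^n\}$, $f_n(x)$ is f(x) rounded down to the grid $2^{-n}\mathbb{Z}$, and on $\{f \ge 2^n\}$ it equals $2^n$. The Python sketch below assumes we are handed the value f(x) directly and uses half-open dyadic intervals $[k/2^n, (k+1)/2^n)$ so that they are disjoint; the function name dyadic_approx is hypothetical.

```python
import math


def dyadic_approx(fx: float, n: int) -> float:
    """Value of f_n(x) from the proof of Proposition 3.9, given fx = f(x).

    Assumes fx is in [0, inf] and uses half-open dyadic intervals
    [k/2^n, (k+1)/2^n), so on {f < 2^n} the value is f(x) rounded down
    to the grid 2^-n * Z, and on {f >= 2^n} the value is 2^n.
    """
    cap = 2 ** n
    if fx >= cap:  # x lies in A_n (this includes f(x) = inf)
        return float(cap)
    k = math.floor(fx * cap)  # index of the dyadic interval containing f(x)
    return k / cap


# f_n(x) increases with n and approaches f(x), e.g. when f(x) = pi:
print([dyadic_approx(math.pi, n) for n in range(1, 6)])  # [2.0, 3.0, 3.125, 3.125, 3.125]
```

Running it on larger n shows the values climbing towards π, with error at most $2^{-n}$ once $\pi < 2^n$.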
Another useful notion is that of a property holding almost everywhere.
Proposition 3.10. Let (X,F,µ) be a measure space. We say that a property holds almost everywhere (or a.e.) if the set of points where the property does not hold has measure 0. Suppose that (X,F,µ) is a complete measure space. Then:
1. If f : X → [−∞,∞] is F-measurable and f = g a.e. (that is, µ{x : f(x) ≠ g(x)} = 0), then g is also F-measurable.
2. If $f_n : X \to [-\infty,\infty]$ is a sequence of F-measurable functions and $f_n \xrightarrow{\text{a.e.}} f$, then f is also F-measurable. Here $f_n \xrightarrow{\text{a.e.}} f$ means that $f_n(x) \to f(x)$ outside a set of measure 0; for real-valued $f_n$ this says:
$$\mu\{x : f_n(x) \text{ does not converge to a number}\} = \mu\{x : (f_n(x)) \text{ is not a Cauchy sequence}\} = \mu\Big(\bigcup_{k=1}^{\infty}\bigcap_{N=1}^{\infty}\bigcup_{m,n=N}^{\infty}\Big\{|f_n - f_m| \ge \frac{1}{k}\Big\}\Big) = 0.$$
4 Lebesgue Integration
We are now in a position to define the Lebesgue integral. We assume two conditions here: the measure space is complete and σ-finite. We first consider non-negative functions.
4.1 Lebesgue Integration of Non-Negative Functions
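As we did for the Riemann/Darboux integral, we first define the integral of simple functions. Recall that simple functions are functions φ of the form:
$$\phi = \sum_{i=1}^{n} c_i\,\mathbf{1}_{E_i},$$
where $E_i \in \mathcal{F}$; in this subsection we take $c_i \ge 0$. The natural integral of such a simple function is (with the convention 0 · ∞ = 0):
$$I(\phi) = \sum_{i=1}^{n} c_i\,\mu(E_i).$$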
Note that the integral of simple functions is linear, that is, $I(\lambda\phi + \nu\varphi) = \lambda I(\phi) + \nu I(\varphi)$ for λ, ν ≥ 0. For a general non-negative F-measurable function f : X → [0,∞], we then define:
$$\int_X f\,d\mu = \sup\{I(\phi) : \phi \text{ is a simple function s.t. } 0 \le \phi \le f\}.$$
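To make the formula $I(\phi) = \sum_i c_i\,\mu(E_i)$ concrete, here is a small Python sketch under an assumption of our own: X = R with Lebesgue measure, and every $E_i$ is a bounded interval $(a_i, b_i]$, so $\mu(E_i) = b_i - a_i$. Exact rational arithmetic (Fraction) avoids rounding. The helper names simple_integral, scale and add are hypothetical; add simply concatenates two representations, which is legitimate because I does not depend on the chosen representation (a standard fact the sketch relies on but does not check).

```python
from fractions import Fraction

# Assumptions for illustration: X = R with Lebesgue measure mu, and every
# E_i is a bounded interval (a_i, b_i], so mu(E_i) = b_i - a_i.
Interval = tuple[Fraction, Fraction]
SimpleFn = list[tuple[Fraction, Interval]]  # pairs (c_i, E_i)


def simple_integral(phi: SimpleFn) -> Fraction:
    """I(phi) = sum_i c_i * mu(E_i) for phi given as (c_i, (a_i, b_i)) pairs."""
    return sum((c * (b - a) for c, (a, b) in phi), Fraction(0))


def scale(lam: Fraction, phi: SimpleFn) -> SimpleFn:
    """A representation of lam * phi."""
    return [(lam * c, E) for c, E in phi]


def add(phi: SimpleFn, psi: SimpleFn) -> SimpleFn:
    """A representation of phi + psi: concatenate the two lists of terms.

    The E_i need not be disjoint; I is independent of the representation.
    """
    return phi + psi


F = Fraction
phi = [(F(2), (F(0), F(1))), (F(3), (F(1), F(3)))]  # 2*1_(0,1] + 3*1_(1,3]
psi = [(F(1, 2), (F(0), F(4)))]                      # (1/2)*1_(0,4]
print(simple_integral(add(scale(F(3), phi), scale(F(5), psi))))  # 34
```

Here $I(\phi) = 8$ and $I(\psi) = 2$, so linearity predicts $3 \cdot 8 + 5 \cdot 2 = 34$, which is what the combined representation gives.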
Another (equivalent) definition of the Lebesgue integral is via an improper integral. Suppose that f : (X,F,µ) → [0,∞] is an F-measurable function. Then we define its Lebesgue integral as the improper Riemann integral:
$$\int_X f\,d\mu = \int_0^\infty \mu\{x \in X : f(x) \ge t\}\,dt.$$
This definition makes sense because f is F-measurable, so the integrand on the RHS is well defined for all t, and the function
$$F : [0,\infty) \to [0,\infty], \qquad t \mapsto \mu\{x \in X : f(x) \ge t\}$$
is non-increasing and hence, by Proposition 1.8, Riemann integrable in the improper sense.
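A numerical check of the two definitions, in a setting we pick ourselves: X = [0, 1] with Lebesgue measure and f(x) = x². Then $\{x : x^2 \ge t\} = [\sqrt{t}, 1]$ for t ∈ [0, 1], so its measure is $1 - \sqrt{t}$, and both sides should equal 1/3. The Python sketch below approximates both Riemann integrals with a composite midpoint rule, which is our own choice of quadrature and not part of the notes.

```python
import math


def midpoint(g, a: float, b: float, steps: int = 100_000) -> float:
    """Composite midpoint rule approximating the Riemann integral of g on [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


def f(x: float) -> float:
    return x * x


def level_set_measure(t: float) -> float:
    """mu{x in [0, 1] : f(x) >= t} = mu([sqrt(t), 1]) for t >= 0."""
    return 1.0 - math.sqrt(t) if t <= 1.0 else 0.0


direct = midpoint(f, 0.0, 1.0)
# The level sets are empty for t > 1, so the improper integral over [0, inf)
# reduces to an integral over [0, 1].
layer_cake = midpoint(level_set_measure, 0.0, 1.0)
print(direct, layer_cake)  # both close to 1/3
```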
A non-negative function f is called Lebesgue integrable if $\int_X f\,d\mu < \infty$. The space of integrable functions is denoted L¹(X,F,µ), or simply L¹(X) if there is no confusion. From the definition, the following properties are clear:
1. if λ > 0, then $\int_X \lambda f\,d\mu = \lambda\int_X f\,d\mu$,
2. if 0 ≤ f ≤ g and f, g are both F-measurable functions, then $\int_X f\,d\mu \le \int_X g\,d\mu$,
3. if A, B ∈ F are disjoint measurable subsets of X and f ∈ L¹(X), then $\int_{A\cup B} f\,d\mu = \int_A f\,d\mu + \int_B f\,d\mu$, where $\int_A f\,d\mu = \int_X f\,\mathbf{1}_A\,d\mu$.
The following two results require more work:
Proposition 4.1.
1. If f : X → [0,∞] is integrable, then µ{f =∞} = 0.
2. If f : X → [0,∞] is F-measurable and´X f dµ = 0, then f = 0 a.e. on X.
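Proof. The proof uses Markov's inequality, which states that if f : X → [0,∞] is F-measurable, then for any λ > 0 we have:
$$\mu\{f \ge \lambda\} \le \frac{1}{\lambda}\int_X f\,d\mu.$$
This inequality is proven by considering the simple function $\phi = \lambda\mathbf{1}_{\{f\ge\lambda\}} \le f$, which implies that
$$\lambda\,\mu\{f \ge \lambda\} = \int_{\{f\ge\lambda\}} \lambda\,d\mu = \int_X \phi\,d\mu \le \int_X f\,d\mu,$$
and the inequality follows after rearrangement.
For the first claim, $\int_X f\,d\mu < \infty$ and {f = ∞} ⊆ {f ≥ λ} for every λ > 0, so $\mu\{f = \infty\} \le \frac{1}{\lambda}\int_X f\,d\mu$; letting λ → ∞ yields µ{f = ∞} = 0. For the second claim, Markov's inequality implies that µ{f ≥ λ} = 0 for every λ > 0, so by countable subadditivity:
$$\mu\{f > 0\} = \mu\Big(\bigcup_{n=1}^\infty \Big\{f \ge \frac{1}{n}\Big\}\Big) \le \sum_{n=1}^\infty \mu\Big\{f \ge \frac{1}{n}\Big\} = 0.$$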
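Markov's inequality is easy to test on a finite measure space. The Python sketch below assumes a hypothetical space X = {0, …, n−1} with F = all subsets and µ({i}) = w[i], where the weights and the values of f are seeded random data invented for the illustration; it compares µ{f ≥ λ} with $\frac{1}{\lambda}\int_X f\,d\mu$ for a few values of λ.

```python
import random

# Hypothetical finite measure space: X = {0, ..., n-1} with F = all subsets
# and mu({i}) = w[i]. The weights and f are random data, seeded for repeatability.
random.seed(0)
n = 1_000
w = [random.uniform(0.0, 2.0) for _ in range(n)]   # mu({i}) >= 0
f = [random.expovariate(1.0) for _ in range(n)]    # f(i) >= 0

integral = sum(fi * wi for fi, wi in zip(f, w))     # int_X f dmu


def mu_superlevel(lam: float) -> float:
    """mu{f >= lam}: total weight of the points where f is at least lam."""
    return sum(wi for fi, wi in zip(f, w) if fi >= lam)


for lam in (0.5, 1.0, 2.0, 5.0):
    # Markov: mu{f >= lam} <= (1/lam) * int_X f dmu
    print(lam, mu_superlevel(lam), integral / lam)
```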
A useful result for non-negative F-measurable functions is the Monotone Convergence Theorem (MCT):
Theorem 4.2 (Monotone Convergence Theorem, MCT). Let $f_n : X \to [0,\infty]$ be an increasing sequence of non-negative F-measurable functions with $\lim_{n\to\infty} f_n = f$. Then we have:
$$\int_X f\,d\mu = \int_X \lim_{n\to\infty} f_n\,d\mu = \lim_{n\to\infty}\int_X f_n\,d\mu,$$
where these integrals take values in [0,∞].
Proof. We know that fn ≤ f . Thus, we have´X fn dµ ≤
´X f dµ for all n ∈ N. Taking the
supremum yields supn(´X fn dµ
)≤´X f dµ. Now we need to show the reverse inequality.
Consider a simple function φ =∑k
i=1 ci1Ei such that 0 ≤ φ ≤ f . Fix λ ∈ (0, 1) and consider
the set Bn = {x : fn(x) ≥ λφ(x)}. Thus, Bn is F-measurable and since fn is increasing,
Bn ⊆ Bn+1 for all n ∈ N. Furthermore, we have⋃∞n=1Bn = X since we have f(x) > λφ(x) and
fn(x)→ f(x) for all x ∈ X. Since λφ1Bn ≤ fn1Bn ≤ fn, we have
λ
ˆBn
φdµ ≤ˆXfn dµ, (8)
By definition of φ, we have:
ˆBn
φdµ =
k∑i=1
ciµ(Ei ∩Bn) −−−→n→∞
k∑i=1
ciµ(Ei) =
ˆXφdµ.
Thus, taking the limit as n→∞ in (8), we get:
λ
ˆXφdµ ≤ lim
n→∞fn dµ,
for all λ ∈ (0, 1). Taking the limit as λ→ 1 yields:
ˆXφdµ ≤ lim
n→∞fn dµ.
Since φ is an arbitrary simple function such that φ ≤ f , by definition of the Lebesgue integral
of f , we have the opposite inequality.
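To see the MCT at work on an unbounded function, take (as our own example) f(x) = x^{−1/2} on (0, 1] with Lebesgue measure and the increasing truncations $f_n = \min(f, n)$. Splitting at $x = 1/n^2$ gives $\int_0^1 f_n\,dx = n \cdot \frac{1}{n^2} + \big(2 - \frac{2}{n}\big) = 2 - \frac{1}{n}$, which increases to $\int_0^1 x^{-1/2}\,dx = 2$. The Python sketch below approximates these integrals with a midpoint rule (our choice of quadrature; its nodes never touch x = 0) and compares them with 2 − 1/n.

```python
import math


def midpoint(g, a: float, b: float, steps: int = 100_000) -> float:
    """Composite midpoint rule; the nodes avoid the endpoint x = 0."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


def truncation(n: int):
    """f_n = min(f, n) for f(x) = 1/sqrt(x) on (0, 1]; f_n increases to f."""
    return lambda x: min(1.0 / math.sqrt(x), float(n))


integrals = {n: midpoint(truncation(n), 0.0, 1.0) for n in (1, 2, 4, 8, 16)}
for n, value in integrals.items():
    print(n, value, 2 - 1 / n)  # numerical value vs the exact 2 - 1/n
```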
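A direct corollary is that the Lebesgue integral is linear on non-negative functions: if f, g are non-negative F-measurable functions and λ, ν ≥ 0, then:
$$\int_X (\lambda f + \nu g)\,d\mu = \lambda\int_X f\,d\mu + \nu\int_X g\,d\mu.$$
For additivity, take simple functions $\phi_n \uparrow f$ and $\varphi_n \uparrow g$ (Proposition 3.9). Then $\phi_n + \varphi_n \uparrow f + g$, so by the MCT and the linearity of the integral of simple functions, we have:
$$\int_X (f+g)\,d\mu = \lim_{n\to\infty}\int_X (\phi_n+\varphi_n)\,d\mu = \lim_{n\to\infty}\Big(\int_X \phi_n\,d\mu + \int_X \varphi_n\,d\mu\Big) = \lim_{n\to\infty}\int_X \phi_n\,d\mu + \lim_{n\to\infty}\int_X \varphi_n\,d\mu = \int_X f\,d\mu + \int_X g\,d\mu.$$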
Since the Lebesgue integral behaves well under limits of non-negative functions, we can do much more. For example, an infinite sum is defined as the limit of finite sums as the number of terms goes to infinity: if $(f_k)$ is a sequence of functions, we define the infinite sum (series) pointwise via the limit of the partial sums $S_n(x) = \sum_{k=1}^n f_k(x)$, that is:
$$\sum_{k=1}^\infty f_k(x) := \lim_{n\to\infty}\sum_{k=1}^n f_k(x) = \lim_{n\to\infty} S_n(x).$$
If the functions in the series are all non-negative, then the partial sums $S_n(x)$ are non-negative and increase pointwise. Thus, applying the MCT to the partial sums and noting that the Lebesgue integral is linear for a finite sum of terms, we have:
Proposition 4.3 (Series of non-negative functions). Suppose that $f_n : X \to [0,\infty]$ is a sequence of non-negative F-measurable functions. Then we have:
$$\sum_{n=1}^\infty \int_X f_n\,d\mu = \int_X \sum_{n=1}^\infty f_n\,d\mu.$$
In particular, $\sum_{n=1}^\infty f_n \in L^1(X,\mu)$ if and only if $\sum_{n=1}^\infty \int_X f_n\,d\mu < \infty$.
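A worked instance of Proposition 4.3, with an example of our own choosing: on X = [0, 1] with Lebesgue measure, let $f_n(x) = x^{n-1}(1 - x)$ for n ≥ 1. The partial sums telescope to $S_N(x) = 1 - x^N$, so the series sums to 1 on [0, 1) and to 0 at x = 1, and its integral is 1. On the other side, $\int_0^1 f_n\,dx = \frac{1}{n} - \frac{1}{n+1}$. The Python sketch below checks with exact fractions that the partial sums of the integrals agree with the integrals of the partial sums, both equal to $1 - \frac{1}{N+1}$, which increases to 1.

```python
from fractions import Fraction

# Example: f_n(x) = x^(n-1) (1 - x) on X = [0, 1] with Lebesgue measure, n >= 1.
# Pointwise, sum_n f_n(x) = 1 for x in [0, 1) and 0 at x = 1, so the
# integral of the sum is 1.


def integral_f(n: int) -> Fraction:
    """int_0^1 x^(n-1) (1 - x) dx = 1/n - 1/(n+1)."""
    return Fraction(1, n) - Fraction(1, n + 1)


def integral_partial_sum(N: int) -> Fraction:
    """int_0^1 S_N dx, where S_N(x) = sum_{n<=N} x^(n-1)(1-x) = 1 - x^N."""
    return 1 - Fraction(1, N + 1)


partial = {N: sum(integral_f(n) for n in range(1, N + 1)) for N in (1, 5, 50)}
# Each partial sum of integrals equals 1 - 1/(N+1), which increases to 1.
print(partial)
```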
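A variant of Proposition 4.3 concerns a measurable set E split into countably many disjoint measurable pieces:
Proposition 4.4. Suppose that $(E_n)$ is a sequence of pairwise disjoint measurable sets and $E = \bigcup_{n=1}^\infty E_n$. Let f : X → [0,∞] be a non-negative F-measurable function. Then we have:
$$\int_E f\,d\mu = \sum_{n=1}^\infty \int_{E_n} f\,d\mu.$$
This follows from Proposition 4.3 applied to the functions $f\,\mathbf{1}_{E_n}$, whose sum is $f\,\mathbf{1}_E$.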
4.2 Lebesgue Integration of General Functions
In fact, the construction of the integral generalises to F-measurable functions taking values in [−∞,∞], not just non-negative functions. This is done by splitting an arbitrary F-measurable function into its positive and negative parts, that is:
$$f = f^+ - f^-,$$
where f⁺ = max(f, 0) and f⁻ = max(−f, 0) are non-negative functions on X. Since these two functions are also F-measurable, $\int_X f^+\,d\mu$ and $\int_X f^-\,d\mu$ are well defined (though possibly infinite). So, if at least one of these two integrals is finite, we define the Lebesgue integral of f as:
$$\int_X f\,d\mu = \int_X f^+\,d\mu - \int_X f^-\,d\mu,$$
which takes values in [−∞,∞].
Definition 4.5 (Lebesgue integrable functions). If both of the integrals $\int_X f^+\,d\mu$ and $\int_X f^-\,d\mu$ are finite, we call the function f = f⁺ − f⁻ Lebesgue integrable. The space of Lebesgue integrable functions over X is denoted as:
$$L^1(X,\mathcal{F},\mu) = \Big\{f : X \to [-\infty,\infty] \;\Big|\; f \text{ is } \mathcal{F}\text{-measurable and } \int_X |f|\,d\mu = \int_X (f^+ + f^-)\,d\mu < \infty\Big\}.$$
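As a small illustration of the decomposition f = f⁺ − f⁻, take (our own example) X = [−1, 2] with Lebesgue measure and f(x) = x. Then $\int_X f^+\,d\mu = 2$, $\int_X f^-\,d\mu = \frac{1}{2}$, so $\int_X f\,d\mu = \frac{3}{2}$ and $\int_X |f|\,d\mu = \frac{5}{2}$. The Python sketch approximates each piece with a midpoint rule, which is our choice of quadrature rather than anything from the notes.

```python
def midpoint(g, a: float, b: float, steps: int = 100_000) -> float:
    """Composite midpoint rule approximating the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


def f(x: float) -> float:
    return x


def f_plus(x: float) -> float:
    return max(f(x), 0.0)


def f_minus(x: float) -> float:
    return max(-f(x), 0.0)


a, b = -1.0, 2.0                 # X = [-1, 2] with Lebesgue measure (assumed)
pos = midpoint(f_plus, a, b)     # approximates int f^+ (exact value 2)
neg = midpoint(f_minus, a, b)    # approximates int f^- (exact value 1/2)
total = pos - neg                # approximates int f (exact value 3/2)
abs_integral = pos + neg         # approximates int |f| = int (f^+ + f^-) (exact value 5/2)
print(pos, neg, total, abs_integral)
```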
Here are some direct consequences of the definition above:
Lemma 4.6. For F-measurable functions f, g : X → [−∞,∞], we have the following results:
1. the function f is Lebesgue integrable if and only if |f| is Lebesgue integrable,
2. if |g| ≤ f and f ∈ L¹(X), then g ∈ L¹(X),
3. if 0 ≤ g ≤ |f| and g ∉ L¹(X), then f ∉ L¹(X),
4. if f, g ∈ L¹(X) and f ≤ g, then $\int_X f\,d\mu \le \int_X g\,d\mu$,
5. if Y ∈ F is a measurable subset of X and f ∈ L¹(X), then f ∈ L¹(Y),
6. if f and g are both integrable, α, β ∈ R and αf + βg is defined, then the function αf + βg is integrable and $\int_X (\alpha f + \beta g)\,d\mu = \alpha\int_X f\,d\mu + \beta\int_X g\,d\mu$,
7. if f ∈ L¹(X) and f = g a.e., then g ∈ L¹(X),
8. if f ∈ L¹(X), then f ∈ R a.e.,
9. if f ∈ L¹(X) and $\int_X |f|\,d\mu = 0$, then f = 0 a.e.,
10. if f ∈ L¹(X) and g : X → R is a bounded and F-measurable function, then fg ∈ L¹(X).
5 Convergence Theorems
Having seen the monotone convergence theorem (MCT) for non-negative functions, we want to extend it to more general families of functions. We can improve Theorem 4.2 to integrable functions on X that are not necessarily non-negative:
Theorem 5.1 (General MCT). Let $f_n : X \to \mathbb{R}$ be an increasing sequence of integrable functions with $\lim_{n\to\infty} f_n = f$ a.e. Suppose further that the set $\{\int_X f_n\,d\mu\}$ is bounded above. Then f ∈ L¹(X,F,µ) and:
$$\int_X f\,d\mu = \int_X \lim_{n\to\infty} f_n\,d\mu = \lim_{n\to\infty}\int_X f_n\,d\mu.$$
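Proof. Consider the functions $g_n = f_n - f_1$ for n ∈ N. This sequence is non-negative and increasing, with $g_n \uparrow f - f_1$ a.e., and $\int_X g_n\,d\mu = \int_X f_n\,d\mu - \int_X f_1\,d\mu$ is bounded above. Applying Theorem 4.2 to $(g_n)$ shows that $\int_X (f - f_1)\,d\mu = \lim_{n\to\infty}\int_X g_n\,d\mu < \infty$, so $f - f_1$, and hence $f = (f - f_1) + f_1$, is integrable. Adding $\int_X f_1\,d\mu$ to both sides gives the result.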
Since integrals can be thought of as limits of sums, the finite subadditivity of the limit superior and the finite superadditivity of the limit inferior, which we saw in Lemma 3.6, carry over to Lebesgue integrals. These generalisations are called Fatou's Lemmas.
Theorem 5.2 (Fatou's Lemma). Suppose that $f_n : X \to [0,\infty]$ is a sequence of F-measurable functions. Then:
$$\int_X \liminf_{n\to\infty} f_n\,d\mu \le \liminf_{n\to\infty}\int_X f_n\,d\mu.$$
In particular, if $(f_n)$ is a sequence of non-negative F-measurable functions and $f_n \xrightarrow{\text{a.e.}} f$, then:
$$\int_X f\,d\mu \le \liminf_{n\to\infty}\int_X f_n\,d\mu.$$
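Proof. Define $g_n(x) = \inf_{i\ge n} f_i(x)$ for n ∈ N. Then $g_n \le f_i$ for all i = n, n+1, …, and $g_n \uparrow \liminf_{n\to\infty} f_n$. Since the $g_n$ are non-negative and F-measurable, we apply Theorem 4.2 to the sequence $(g_n)$ to get:
$$\lim_{n\to\infty}\int_X g_n\,d\mu = \int_X \lim_{n\to\infty} g_n\,d\mu = \int_X \liminf_{n\to\infty} f_n\,d\mu. \tag{9}$$
On the other hand, $g_n \le f_i$ for all i = n, n+1, …, so $\int_X g_n\,d\mu \le \int_X f_i\,d\mu$ for i = n, n+1, …. Thus, we have:
$$\int_X g_n\,d\mu \le \inf_{i\ge n}\int_X f_i\,d\mu. \tag{10}$$
Letting n → ∞ in (10), the right-hand side tends to $\liminf_{n\to\infty}\int_X f_n\,d\mu$, and combining this with (9) gives the result. The a.e. version follows since $\liminf_{n\to\infty} f_n = f$ a.e.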
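The inequality in Fatou's Lemma can be strict. In a standard example (our choice, not from the notes), let X = [0, 1] with Lebesgue measure and $f_n = n\,\mathbf{1}_{(0,1/n)}$. For each fixed x we have $f_n(x) = 0$ once n ≥ 1/x (and always at x = 0), so $\liminf_n f_n = 0$ and its integral is 0, whereas $\int_X f_n\,d\mu = n \cdot \frac{1}{n} = 1$ for every n. The Python sketch below checks both facts with exact fractions; the sample points and the range of n are arbitrary choices.

```python
from fractions import Fraction


def f(n: int, x: Fraction) -> int:
    """f_n(x) = n * 1_{(0, 1/n)}(x) on X = [0, 1]."""
    return n if 0 < x < Fraction(1, n) else 0


def integral_f(n: int) -> Fraction:
    """Lebesgue measure of (0, 1/n) is 1/n, so int_X f_n = n * (1/n) = 1."""
    return n * Fraction(1, n)


# For fixed x > 0, f_n(x) = 0 as soon as n >= 1/x (and f_n(0) = 0 always),
# so liminf_n f_n = 0 everywhere and its integral is 0 < 1 = liminf_n int f_n.
sample_points = [Fraction(0), Fraction(1, 3), Fraction(7, 10_000)]
tails = {x: [f(n, x) for n in range(10_000, 10_010)] for x in sample_points}
print(tails)
```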
Corollary 5.3 (Reverse Fatou's Lemma). Suppose that $f_n : X \to [0,\infty]$ is a sequence of F-measurable functions such that there exists a non-negative integrable function g with $f_n \le g$ for all n ∈ N. Then:
$$\limsup_{n\to\infty}\int_X f_n\,d\mu \le \int_X \limsup_{n\to\infty} f_n\,d\mu.$$
Proof. Apply Fatou's Lemma to the sequence of non-negative functions $h_n = g - f_n$, subtract $\int_X g\,d\mu$ (which is finite since g is integrable) from both sides and rearrange the inequality, noting that $\liminf(-x_n) = -\limsup(x_n)$.
Other powerful convergence theorems, which do not require monotonicity of the terms in the sequence, are the Dominated and Bounded Convergence Theorems:
Theorem 5.4 (Dominated Convergence Theorem, DCT). Let $f_n : X \to [-\infty,\infty]$ be a sequence of F-measurable functions with $\lim_{n\to\infty} f_n = f$ a.e. on X. Suppose further that there exists an integrable function g ∈ L¹(X,F,µ) such that $|f_n(x)| \le g(x)$ a.e. on X for all n ∈ N. Then:
1. $f_n$ and f are in L¹(X,F,µ),
2. $\lim_{n\to\infty}\int_X f_n\,d\mu = \int_X f\,d\mu$.
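Proof. From Proposition 3.10, f is F-measurable. By comparison, since $|f_n| \le g$ a.e., each $f_n$ is integrable, and since taking limits gives |f| ≤ g a.e., f is integrable as well. Applying Fatou's Lemma to the sequence of non-negative functions $h_n = g - f_n$, we get:
$$\int_X \liminf_{n\to\infty} h_n\,d\mu \le \liminf_{n\to\infty}\int_X h_n\,d\mu$$
$$\Rightarrow\quad \int_X (g - f)\,d\mu \le \liminf_{n\to\infty}\int_X (g - f_n)\,d\mu = \int_X g\,d\mu - \limsup_{n\to\infty}\int_X f_n\,d\mu$$
$$\Rightarrow\quad \limsup_{n\to\infty}\int_X f_n\,d\mu \le \int_X f\,d\mu,$$
where in the last step we subtracted the finite quantity $\int_X g\,d\mu$. Repeating the process with the non-negative sequence $k_n = g + f_n$, we get:
$$\int_X f\,d\mu \le \liminf_{n\to\infty}\int_X f_n\,d\mu.$$
So, we get the inequality:
$$\limsup_{n\to\infty}\int_X f_n\,d\mu \le \int_X f\,d\mu \le \liminf_{n\to\infty}\int_X f_n\,d\mu.$$
However, the limit inferior of a sequence never exceeds its limit superior, so the whole chain is an equality:
$$\limsup_{n\to\infty}\int_X f_n\,d\mu = \int_X f\,d\mu = \liminf_{n\to\infty}\int_X f_n\,d\mu,$$
which, by Lemma 3.6, implies that $\lim_{n\to\infty}\int_X f_n\,d\mu = \int_X f\,d\mu$.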
The integrable function g here, called the dominating function, cannot be dropped from the hypotheses. A counterexample is X = R with Lebesgue measure and $f_n = \mathbf{1}_{(n-1,n]}$. We know that $\int_X f_n\,d\mu = 1$ for all n ∈ N and $f_n \to 0$ pointwise. But
$$1 = \lim_{n\to\infty}\int_X f_n\,d\mu \quad\text{while}\quad \int_X \lim_{n\to\infty} f_n\,d\mu = 0,$$
which do not agree. This is because there is no integrable function that dominates all of the $f_n$ at the same time. A corollary of the DCT is the Bounded Convergence Theorem:
Corollary 5.5 (Bounded Convergence Theorem, BCT). Let (X,F,µ) be a finite measure space, that is, µ(X) < ∞. Let $f_n : X \to [-\infty,\infty]$ be a sequence of F-measurable functions with $\lim_{n\to\infty} f_n = f$ a.e. on X. Suppose further that there exists a constant K ∈ R such that $|f_n(x)| \le K$ a.e. on X for all n ∈ N. Then:
1. $f_n$ and f are in L¹(X,F,µ),
2. $\lim_{n\to\infty}\int_X f_n\,d\mu = \int_X f\,d\mu$.
Proof. Use $K\mathbf{1}_X$, which is integrable since µ(X) < ∞, as the dominating function and apply the DCT.
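The counterexample $f_n = \mathbf{1}_{(n-1,n]}$ given before the Bounded Convergence Theorem can be checked numerically. The Python sketch below (the helper names and the midpoint quadrature are our own choices) confirms that each $f_n$ has integral 1, that $f_n(x) \to 0$ for a fixed x, and that the smallest function dominating every $f_n$, namely $\sup_n f_n = \mathbf{1}_{(0,\infty)}$, has integral M over (0, M], which is unbounded. This is why the DCT does not apply.

```python
def f(n: int, x: float) -> float:
    """f_n = 1_{(n-1, n]} on X = R."""
    return 1.0 if n - 1 < x <= n else 0.0


def sup_f(x: float) -> float:
    """sup_n f_n = 1_{(0, inf)}: the smallest function dominating every f_n."""
    return 1.0 if x > 0 else 0.0


def midpoint(g, a: float, b: float, steps: int = 3_000) -> float:
    """Composite midpoint rule approximating the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


# Every f_n has integral 1 (f_n vanishes outside [n - 2, n + 1]) ...
print([midpoint(lambda x: f(n, x), n - 2.0, n + 1.0) for n in (2, 10, 50)])
# ... yet f_n(x) -> 0 for each fixed x, and the candidate dominating function
# sup_f has integral M over (0, M], which is unbounded in M.
print([f(n, 2.5) for n in range(1, 8)])
print([midpoint(sup_f, 0.0, float(M), steps=1_000 * M) for M in (10, 100)])
```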
Limits also appear implicitly in infinite sums and in differentiation. Recall Proposition 4.3, in which we used the MCT to exchange the order of an infinite sum and an integral. By applying the MCT to the series of positive and negative parts separately, we have:
Proposition 5.6 (Series of functions). Suppose that $f_n : X \to [-\infty,\infty]$ is a sequence of F-measurable functions. Then we have:
1. if $\sum_{n=1}^\infty \int_X |f_n|\,d\mu < \infty$, then the partial sums $S_n(x)$ converge a.e. to an integrable function,
2. if $\sum_{n=1}^\infty |f_n(x)|$ is integrable, then the partial sums $S_n(x)$ converge a.e. to an integrable function.
In both cases, we have:
$$\sum_{n=1}^\infty \int_X f_n\,d\mu = \int_X \sum_{n=1}^\infty f_n\,d\mu.$$
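An example of Proposition 5.6 with terms of both signs, chosen by us: on X = [0, 1] with Lebesgue measure, let $f_n(x) = (-x)^n/n!$ for n = 0, 1, 2, …. Then $\int_0^1 |f_n|\,dx = \frac{1}{(n+1)!}$, whose sum is e − 1 < ∞, so hypothesis 1 holds. The series sums pointwise to $e^{-x}$ with integral $1 - e^{-1}$, while $\int_0^1 f_n\,dx = \frac{(-1)^n}{(n+1)!}$. The Python sketch below sums the first 20 terms (a truncation we assume is enough, since the tail is below $10^{-18}$) and compares both sides.

```python
import math

# f_n(x) = (-x)^n / n! on X = [0, 1] with Lebesgue measure, n = 0, 1, 2, ...
# int_0^1 |f_n| dx = 1/(n+1)!, so the sum of int |f_n| is e - 1 < infinity.
# int_0^1 f_n dx = (-1)^n / (n+1)!.

N = 20  # hypothetical truncation; the omitted tail is below 1e-18
sum_of_abs_integrals = sum(1 / math.factorial(n + 1) for n in range(N))
sum_of_integrals = sum((-1) ** n / math.factorial(n + 1) for n in range(N))

# The pointwise sum is e^{-x}, whose integral over [0, 1] is 1 - e^{-1}.
integral_of_sum = 1 - math.exp(-1)
print(sum_of_integrals, integral_of_sum, sum_of_abs_integrals)
```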
Another operation that requires a limiting process is differentiation. Recall that if f : R → R is a function, we define its derivative at $y_0 \in \mathbb{R}$ as:
$$\frac{df}{dy}(y_0) = \lim_{h\to 0}\frac{f(y_0+h) - f(y_0)}{h}.$$
If we have a function f : X × R → R of two variables, say f(x, y), we can fix the R variable y and consider the function $f_y(x) = f(x,y)$ on X. If for each y ∈ R the function $f_y$ is in L¹(X), we have a well-defined function F : R → R given by:
$$F(y) = \int_X f(x,y)\,d\mu = \int_X f_y(x)\,d\mu.$$
We can prove the continuity of this function under some mild assumptions.
Theorem 5.7. Let I ⊂ R be an interval and f : X × I → R be a function such that:
1. for each y ∈ I, the function $f_y$ is in L¹(X),
2. for a.e. x ∈ X and every $y_0 \in I$, we have $\lim_{y\to y_0} f(x,y) = f(x,y_0)$,
3. for each $y_0 \in I$, there is an open subinterval $J_0 \subset I$ with $y_0 \in J_0$ and a function $g_0 \in L^1(X)$ such that for all $y \in J_0$, we have $|f(x,y)| \le g_0(x)$ for a.e. x.
Then F(y) is continuous on I.
Proof. Pick $y_0 \in I$ and let $(y_n)$ be a sequence of points in I such that $y_n \to y_0$. For large enough N, we must have $y_n \in J_0$ for all n ≥ N, so WLOG we assume $y_n \in J_0$ for all n ∈ N. Then $(f_{y_n})$ is a sequence of functions from X to R. For this $y_0$ we have an integrable dominating function $g_0$, so $|f_{y_n}(x)| \le g_0(x)$ for a.e. x and all n, and $f(x,y_n) \to f(x,y_0)$ for a.e. x. We apply the DCT to get:
$$\lim_{n\to\infty} F(y_n) = \lim_{n\to\infty}\int_X f_{y_n}(x)\,d\mu = \int_X \lim_{n\to\infty} f(x,y_n)\,d\mu = \int_X f(x,y_0)\,d\mu = F(y_0).$$
Since this holds for every sequence $y_n \to y_0$, F is continuous at $y_0 \in I$. Since $y_0$ is arbitrary, this proves the theorem.
Since F is continuous everywhere in I, we have some hope that the function F is differentiable. We now want to find conditions under which the derivative with respect to the y variable commutes with the Lebesgue integral, that is:
$$\frac{dF}{dy}(y) = \int_X \frac{\partial f}{\partial y}(x,y)\,d\mu.$$
We have the following theorem:
Theorem 5.8. Let I ⊂ R be an interval and f : X × I → R be a function such that:
1. for each y ∈ I, the function $f_y$ is in L¹(X),
2. for each x ∈ X and y ∈ I, the partial derivative $\frac{\partial f}{\partial y}(x,y)$ exists,
3. for each $y_0 \in I$, there is an open subinterval $J_0 \subset I$ with $y_0 \in J_0$ and a function $g_0 \in L^1(X)$ such that, for a.e. x ∈ X, we have $\big|\frac{\partial f}{\partial y}(x,y)\big| \le g_0(x)$ for all $y \in J_0$.
Then F(y) is differentiable on I and:
$$\frac{dF}{dy}(y) = \int_X \frac{\partial f}{\partial y}(x,y)\,d\mu.$$
Proof. Pick $y_0 \in I$ and let $(y_n)$ be a sequence of points in I such that $y_n \to y_0$ and $y_n \ne y_0$. For large enough N, we must have $y_n \in J_0$ for all n ≥ N, so WLOG we assume $y_n \in J_0$ for all n ∈ N. Define:
$$g_n(x) = \frac{f(x,y_n) - f(x,y_0)}{y_n - y_0}.$$
Since $y_n \ne y_0$ for all n ∈ N, $g_n$ is a scalar multiple of a difference of two functions in L¹(X), so $g_n \in L^1(X)$ for all n. Furthermore, since $\frac{\partial f}{\partial y}(x,y)$ exists for all x ∈ X and y ∈ I, we have $g_n(x) \to \frac{\partial f}{\partial y}(x,y_0)$ as n → ∞.
Moreover, for each x and n, since y ↦ f(x, y) is differentiable (hence continuous) on I, the Mean Value Theorem gives some $\xi_n = \xi_n(x) \in (\min(y_0,y_n), \max(y_0,y_n)) \subset J_0$ such that:
$$g_n(x) = \frac{f(x,y_n) - f(x,y_0)}{y_n - y_0} = \frac{\partial f}{\partial y}(x,\xi_n).$$
For this $y_0$ we have an integrable dominating function $g_0$, and since $\xi_n \in J_0$, we have $|g_n(x)| = \big|\frac{\partial f}{\partial y}(x,\xi_n)\big| \le g_0(x)$ for a.e. x. Thus, we apply the DCT to get:
$$\frac{F(y_n) - F(y_0)}{y_n - y_0} = \int_X \frac{f(x,y_n) - f(x,y_0)}{y_n - y_0}\,d\mu = \int_X g_n(x)\,d\mu \xrightarrow[n\to\infty]{} \int_X \frac{\partial f}{\partial y}(x,y_0)\,d\mu.$$
Since this holds for every such sequence $(y_n)$, F is differentiable at $y_0 \in I$, and by uniqueness of limits we have:
$$\frac{dF}{dy}(y_0) = \int_X \frac{\partial f}{\partial y}(x,y_0)\,d\mu.$$
Since $y_0$ is arbitrary, this proves the theorem.
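A numerical check of Theorem 5.8 on an example of our own: X = [0, 1] with Lebesgue measure and f(x, y) = sin(xy). Here $\big|\frac{\partial f}{\partial y}(x,y)\big| = |x\cos(xy)| \le 1$, and the constant 1 is integrable on [0, 1], so the theorem gives $F'(y) = \int_0^1 x\cos(xy)\,dx$. For y ≠ 0 the closed form $F(y) = \frac{1 - \cos y}{y}$ gives $F'(y) = \frac{\sin y}{y} - \frac{1 - \cos y}{y^2}$. The Python sketch compares the integral formula, the closed form and a central difference of the numerically computed F; the midpoint quadrature and step h are our own choices.

```python
import math


def midpoint(g, a: float, b: float, steps: int = 20_000) -> float:
    """Composite midpoint rule approximating the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


def F(y: float) -> float:
    """F(y) = int_0^1 sin(x y) dx, computed numerically."""
    return midpoint(lambda x: math.sin(x * y), 0.0, 1.0)


def dF_under_integral(y: float) -> float:
    """int_0^1 (d/dy) sin(x y) dx = int_0^1 x cos(x y) dx, as Theorem 5.8 allows."""
    return midpoint(lambda x: x * math.cos(x * y), 0.0, 1.0)


def dF_closed_form(y: float) -> float:
    """Derivative of the closed form F(y) = (1 - cos y) / y, valid for y != 0."""
    return math.sin(y) / y - (1 - math.cos(y)) / y ** 2


h = 1e-4
for y in (0.5, 1.0, 3.0):
    central_difference = (F(y + h) - F(y - h)) / (2 * h)
    print(y, dF_under_integral(y), dF_closed_form(y), central_difference)
```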
6 Double Integrals
6.1 Product Measure
Let us extend the Lebesgue integral from two distinct measure spaces $(X,\mathcal{F},\mu_1)$ and $(Y,\mathcal{G},\mu_2)$ to their product space X × Y. We first need to define a candidate for the measure µ on X × Y.
We can clearly define the measure (or in this case, the content) of any set in X × Y of the rectangular form A × B ⊂ X × Y, where A ∈ F and B ∈ G, via $m(A\times B) = \mu_1(A)\mu_2(B)$. The rectangular sets form a π-system but not a σ-algebra, which is not good enough for Lebesgue integration.
The σ-algebra on X × Y generated by all the sets A × B with A ∈ F and B ∈ G contains many more sets than these rectangular sets! We proceed to define an outer measure $m^*$ on all subsets of X × Y by the covering argument we have seen in Section 2.1. That is, for any $E \in 2^{X\times Y}$, we define:
$$m^*(E) = \inf\Big\{\sum_{i=1}^\infty \mu_1(A_i)\mu_2(B_i) : A_i \in \mathcal{F},\ B_i \in \mathcal{G} \text{ s.t. } E \subseteq \bigcup_{i=1}^\infty A_i\times B_i\Big\}. \tag{11}$$
Similar to the construction in Section 2.1, this defines an outer measure, not a genuine measure. Therefore, we discard from $2^{X\times Y}$ the sets that do not satisfy the Caratheodory condition (see Definition 2.13). The resulting collection of sets, which we call H, is a σ-algebra, and the outer measure restricted to this σ-algebra, which we call µ, is a genuine measure. We call this measure the product measure.
Furthermore, for any A ∈ F and B ∈ G, we have A × B ∈ H and $\mu(A\times B) = \mu_1(A)\mu_2(B)$. Moreover, if $\mu_1$ and $\mu_2$ are σ-finite measures on X and Y respectively, the product measure µ is unique. The Caratheodory Extension Theorem saves the day again!
6.2 Theorems by Tonelli and Fubini
Let (X,F , µ1) and (Y,G, µ2) be σ-finite measure spaces and (X × Y,H, µ) be their product
measure space. Now let us look at the theorem by Tonelli, which allows one to swap the order
of integration of non-negative functions under some very mild assumptions.
Theorem 6.1 (Tonelli's theorem). Let f : X × Y → [0,∞] be an H-measurable function. Then:
1. the function x ↦ f(x, y) is F-measurable for a.e. y ∈ Y,
2. the function $x \mapsto \int_Y f(x,y)\,d\mu_2$ is non-negative and F-measurable,
3. we have the equality:
$$\int_{X\times Y} f(x,y)\,d\mu = \int_X\Big(\int_Y f(x,y)\,d\mu_2\Big)d\mu_1 = \int_Y\Big(\int_X f(x,y)\,d\mu_1\Big)d\mu_2.$$
We can extend this theorem to integrable functions by considering f+ and f− separately.
This is Fubini’s theorem:
Theorem 6.2 (Fubini's theorem). Let f : X × Y → [−∞,∞] be Lebesgue integrable, that is, f ∈ L¹(X × Y, H, µ). Then:
1. for almost all y ∈ Y, $f_y \in L^1(X)$, where $f_y(x) = f(x,y)$ is regarded as a function of x for fixed y,
2. defining $F(y) = \int_X f_y(x)\,d\mu_1$, we have F ∈ L¹(Y) and the equality:
$$\int_Y F(y)\,d\mu_2 = \int_{X\times Y} f(x,y)\,d\mu,$$
3. we have the equality:
$$\int_{X\times Y} f(x,y)\,d\mu = \int_X\Big(\int_Y f(x,y)\,d\mu_2\Big)d\mu_1 = \int_Y\Big(\int_X f(x,y)\,d\mu_1\Big)d\mu_2.$$
Putting these two theorems together, we have the Fubini-Tonelli theorem:
Theorem 6.3 (Fubini-Tonelli theorem). Let f : X × Y → [−∞,∞] be H-measurable and suppose that:
$$\int_X\Big(\int_Y |f(x,y)|\,d\mu_2\Big)d\mu_1 < \infty \quad\text{or}\quad \int_Y\Big(\int_X |f(x,y)|\,d\mu_1\Big)d\mu_2 < \infty.$$
Then f is Lebesgue integrable, that is, f ∈ L¹(X × Y, H, µ), and hence the conclusions of Theorem 6.2 hold.
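The absolute integrability hypothesis in the Fubini-Tonelli theorem cannot simply be dropped. A classical discrete counterexample, which we add here for illustration: take X = Y = {0, 1, 2, …} with counting measure (which is σ-finite) and let a(m, n) = 1 if m = n, a(m, n) = −1 if m = n + 1, and 0 otherwise. Every row sum is 0 except row m = 0, whose sum is 1, while every column sum is 0, so the two iterated sums are 1 and 0. Meanwhile the sum of |a| is infinite. The Python sketch below computes both iterated sums, truncating only the outer sum at a hypothetical M (harmless here, since the omitted rows and columns all sum to 0), and shows that the sum of |a| over the M × M corner, 2M − 1, grows without bound.

```python
def a(m: int, n: int) -> int:
    """a(m, n) = 1 if m == n, -1 if m == n + 1, and 0 otherwise (m, n >= 0)."""
    if m == n:
        return 1
    if m == n + 1:
        return -1
    return 0


M = 500  # truncation of the outer sum (an assumption of this sketch)

# Row m has non-zero entries only at n = m - 1, m; column n only at m = n, n + 1.
# So the finite inner ranges below give the exact inner sums.
rows_first = sum(sum(a(m, n) for n in range(m + 1)) for m in range(M))
cols_first = sum(sum(a(m, n) for m in range(n + 2)) for n in range(M))

# The sum of |a| over the M x M corner is 2M - 1, which is unbounded in M,
# so a is not integrable for the product of counting measures.
abs_total = sum(abs(a(m, n)) for m in range(M) for n in range(M))
print(rows_first, cols_first, abs_total)
```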