8/8/2019 Probability Theory Presentation 07
BST 401 Probability Theory
Xing Qiu, Ha Youn Lee
Department of Biostatistics and Computational Biology, University of Rochester
September 22, 2009
Qiu, Lee BST 401
Review of last lecture
The distribution function of a measure $\mu$ on $\mathbb{R}$: $F(x) = \mu((-\infty, x])$.
By definition, $\mu$ determines $F$.
On the other hand, $\mu((a, b]) = F(b) - F(a)$, which then determines the value of this measure on all Borel sets through countably many set operations (Carathéodory extension theorem).
Lebesgue-Stieltjes measures. Their distribution functions are a) increasing; b) right-continuous. (Most literature requires $F$ to be non-negative as well.)
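As a small numerical illustration of the identity $\mu((a, b]) = F(b) - F(a)$, the sketch below uses a hypothetical distribution function `F` (an exponential-type continuous part plus a jump of size 0.5 at $x = 1$). Neither this particular `F` nor the helper name `interval_measure` comes from the lecture; they are assumptions chosen only to make the formula concrete.

```python
import math

def F(x: float) -> float:
    """Hypothetical distribution function: increasing and right-continuous."""
    continuous_part = 1.0 - math.exp(-x) if x >= 0 else 0.0
    jump_part = 0.5 if x >= 1 else 0.0   # right-continuous jump at x = 1
    return continuous_part + jump_part

def interval_measure(a: float, b: float) -> float:
    """Measure of the half-open interval (a, b], computed as F(b) - F(a)."""
    if b < a:
        raise ValueError("need a <= b")
    return F(b) - F(a)

# Additivity over adjacent intervals: (0, 2] is the disjoint union of (0, 1] and (1, 2].
whole = interval_measure(0.0, 2.0)
parts = interval_measure(0.0, 1.0) + interval_measure(1.0, 2.0)
print(whole, parts)
```

Because the intervals are half-open on the left, the jump at $x = 1$ is counted in $(0, 1]$ and not in $(1, 2]$, so the two pieces add up to the measure of $(0, 2]$ without double counting.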
Review of last lecture (II)
Compare to the Lebesgue measure: both are defined on $\mathcal{B}$; both are finite on bounded intervals.
An L-S measure does not have to be uniform, i.e., $\mu((a, b]) \neq b - a$ in general. An L-S measure may contain discrete measures, or jump points of $F$, i.e., for a certain single-point set $\{a\}$, $\mu(\{a\}) > 0$.
Restriction of a measure.
$\mathbb{R}^n$ generalizations of distribution functions and L-S measures.
Basic definition of measurable functions. Sometimes called coding functions.
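The point-mass remark can be checked numerically: for an L-S measure, $\mu(\{a\}) = F(a) - F(a^-)$, the size of the jump of $F$ at $a$. The sketch below is only an illustration under assumptions: the distribution function `F` (a uniform part of total mass 0.7 on $[0, 1]$ plus a jump of 0.3 at $x = 2$) and the helper `point_mass`, which approximates the left limit with a small offset `eps`, are hypothetical.

```python
def F(x: float) -> float:
    """Hypothetical distribution function with a jump of size 0.3 at x = 2."""
    base = min(max(x, 0.0), 1.0) * 0.7   # uniform part on [0, 1], total mass 0.7
    jump = 0.3 if x >= 2 else 0.0        # discrete part: point mass at x = 2
    return base + jump

def point_mass(a: float, eps: float = 1e-9) -> float:
    """Approximate mu({a}) = F(a) - F(a-) using a small left offset eps."""
    return F(a) - F(a - eps)

print(point_mass(2.0))    # close to 0.3: a jump point of F carries mass
print(point_mass(0.5))    # close to 0: no mass at a continuity point
print(F(1.0) - F(0.0))    # mass of (0, 1] is 0.7, not the length 1
```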
General measurable functions
We need two measure spaces $(\Omega_1, \mathcal{F}_1, \mu_1)$ and $(\Omega_2, \mathcal{F}_2, \mu_2)$.
For random variables, the first one is an arbitrary probability space; the second one is a "good" measure space, e.g., the Lebesgue measure space of real numbers.
A function $h : \Omega_1 \to \Omega_2$ is called measurable relative to $\mathcal{F}_1, \mathcal{F}_2$ if for every $A$ in $\mathcal{F}_2$, its inverse image $h^{-1}(A)$ is measurable. In mathematical notation:
$h^{-1}(A) \in \mathcal{F}_1, \quad \forall A \in \mathcal{F}_2. \quad (1)$
Borel measurable functions are real functions (i.e., $f : \mathbb{R} \to \mathbb{R}$) which are measurable relative to the Borel sets of the two copies of $\mathbb{R}$.
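On finite spaces, condition (1) can be checked by brute force. The sketch below is a toy illustration, not part of the lecture: the spaces `omega1`, the σ-algebras `F1` and `F2`, and the functions `h_good` and `h_bad` are all hypothetical choices made so that every preimage can be enumerated.

```python
def preimage(h, domain, A):
    """Inverse image h^{-1}(A) as a frozenset of points of the domain."""
    return frozenset(w for w in domain if h(w) in A)

def is_measurable(h, domain, F1, F2):
    """Check that h^{-1}(A) belongs to F1 for every A in F2."""
    return all(preimage(h, domain, A) in F1 for A in F2)

omega1 = {1, 2, 3, 4}
# Sigma-algebra on omega1 generated by the partition {1, 2}, {3, 4}.
F1 = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), frozenset(omega1)}
# Power set of {0, 1} as the sigma-algebra on the target space.
F2 = {frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})}

h_good = lambda w: 0 if w <= 2 else 1   # constant on each block of the partition
h_bad = lambda w: w % 2                 # separates points inside a block

print(is_measurable(h_good, omega1, F1, F2))   # True
print(is_measurable(h_bad, omega1, F1, F2))    # False
```

The second function fails because $h^{-1}(\{0\}) = \{2, 4\}$ is not one of the four sets in `F1`.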
$\sigma$-algebra generated by a measurable function
A measurable function (random variable) $h$ can generate a $\sigma$-algebra on $\Omega$ in this way:
$\sigma(h) := \{ h^{-1}(B) : B \in \mathcal{B} \}$,
where $\mathcal{B}$ denotes the Borel $\sigma$-algebra.
It is not hard to show that $\sigma(h) \subseteq \mathcal{F}$.
In general, $\sigma(h) \subsetneq \mathcal{F}$, and some information is lost in this process.
A binomial example
Toss a coin three times. $\Omega = \{H, T\}^3$. Define $X(\omega) : \Omega \to \mathbb{R}$ to be the number of heads. What is the $\sigma$-algebra generated by $X$? Is the set $A = \{(H, H, H), (H, T, H)\}$ a member of this $\sigma$-algebra?
So $\sigma(X)$ is coarser than $2^\Omega$. From this aspect, some information is lost.
Another interpretation: if we know a particular observation $\omega$, we can compute $X$. But if we only know $X$, we can't be sure what $\omega$ might be.
However, $X$ is the minimal sufficient statistic of a Binomial model. In other words, it captures all the information which is relevant to the unknown parameter.
I will send you some slides on this topic later.
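The two questions in this example can be answered by enumeration. The following sketch (variable names such as `atoms` and `sigma_X` are my own, not from the lecture) builds $\sigma(X)$ as the collection of all unions of the level sets $X^{-1}(\{k\})$, which is valid here because $X$ takes only finitely many values.

```python
from itertools import product, combinations

omega = list(product("HT", repeat=3))   # the 8 outcomes of three tosses

def X(w):
    """Number of heads in the outcome w."""
    return w.count("H")

# Level sets X^{-1}({k}) for k = 0, 1, 2, 3.
atoms = [frozenset(w for w in omega if X(w) == k) for k in range(4)]

# sigma(X) consists of all unions of level sets.
sigma_X = set()
for r in range(len(atoms) + 1):
    for combo in combinations(atoms, r):
        sigma_X.add(frozenset().union(*combo))

A = frozenset({("H", "H", "H"), ("H", "T", "H")})
print(len(sigma_X), 2 ** len(omega))   # 16 256
print(A in sigma_X)                    # False
```

So $\sigma(X)$ has 16 members while $2^\Omega$ has 256, and $A$ is not in $\sigma(X)$: it contains $(H, T, H)$ but not the other two outcomes with exactly two heads.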
Indicators and simple functions (I)
We would like to study the properties of all random variables/Borel measurable functions. Again, let us start with the most trivial building blocks.
Indicator function: $1_A(\omega) = \begin{cases} 1 & \text{for } \omega \in A, \\ 0 & \text{for } \omega \notin A \end{cases}$ is called the indicator function of $A$.
Indicators and simple functions (II)
If $A$ is a Borel set, $1_A$ is a Borel-measurable function, and vice versa.
$1_A$ takes only two values, 0 and 1.
If $h$ a) is Borel-measurable and b) takes only finitely many values, $h$ is called a simple function.
Equivalently, $h$ is a finite weighted sum of indicator functions: $h(\omega) = \sum_{i=1}^{r} x_i 1_{A_i}(\omega)$, where the $A_i$ are disjoint Borel sets.
$+, -, \times, /$ of simple functions are simple functions (for $/$, provided the denominator is never zero).
Remark: step functions are simple functions, but simple functions may not be step functions!
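The representation $h = \sum_i x_i 1_{A_i}$ translates directly into code. In the sketch below a small finite set stands in for the real line purely for illustration, and the helper names `indicator` and `simple_function` are hypothetical. It also shows that the sum of two simple functions again takes only finitely many values.

```python
def indicator(A):
    """Indicator function 1_A of the set A."""
    return lambda w: 1 if w in A else 0

def simple_function(values, sets):
    """h(w) = sum_i x_i * 1_{A_i}(w) for disjoint sets A_i."""
    indicators = [indicator(A) for A in sets]
    return lambda w: sum(x * ind(w) for x, ind in zip(values, indicators))

omega = range(6)   # toy sample space standing in for R
h = simple_function([2.0, 5.0], [{0, 1}, {2, 3, 4}])
g = simple_function([1.0, -1.0], [{0, 2}, {1, 3}])

# The pointwise sum of two simple functions is again simple.
total = lambda w: h(w) + g(w)
print(sorted({total(w) for w in omega}))   # [0.0, 1.0, 3.0, 4.0, 5.0, 6.0]
```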
Borel-measurable functions are limits of simple functions
Point-wise convergence: $h_n(\omega) \to h(\omega)$ for all $\omega \in \Omega$. We can simply say $h$ is the limit of $h_n$.
Ash's book, Thm 1.5.5: All Borel-measurable functions are limits of simple functions. Use a figure to illustrate this point.
Ash's book, Thm 1.5.4: limits of sequences of Borel-measurable functions are again Borel-measurable functions. Analogy: set limits of Borel sets are Borel sets.
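One standard way to build such an approximating sequence for a non-negative function is the dyadic construction $h_n = \min(\lfloor 2^n h \rfloor / 2^n, \, n)$. This is offered as a sketch of the idea and is not necessarily the construction used in Ash's proof; the function name `dyadic_approx` and the test function $h(x) = x^2$ are assumptions for illustration.

```python
import math

def dyadic_approx(h, n):
    """n-th simple approximation of a non-negative function h.

    h_n takes values in {0, 1/2^n, 2/2^n, ..., n}, so it has finitely many values.
    """
    def h_n(w):
        value = h(w)
        return min(math.floor(value * 2**n) / 2**n, n)
    return h_n

h = lambda x: x * x
x = 1.3
approximations = [dyadic_approx(h, n)(x) for n in range(1, 12)]
print(approximations)
print(h(x) - approximations[-1])   # error is below 2**-11 once n exceeds h(x)
```

The values increase with $n$ and never exceed $h(x)$, which is the monotone behaviour that the later definition of the integral relies on.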
Properties of Borel-measurable functions
The set of Borel-measurable functions is a) closed under the point-wise limit operation; b) generated by just simple functions through the (pointwise) limit operation.
$\mathbb{R}$, $\mathbb{Q}$ analogy again. $\mathbb{R}$ is closed under the limit operation. $\mathbb{R}$ can be generated by its dense subset $\mathbb{Q}$.
The set of Borel-measurable functions is closed under $+, -, \times, /$, and function composition because simple functions are closed under these operations.
Definition of the Integral
Now we are going to define the abstract Lebesgue integral.
For indicators: $\int_\Omega 1_A \, d\mu = \mu(A)$.
For simple functions: $\int_\Omega h \, d\mu = \sum_{i=1}^{r} x_i \mu(A_i)$. (Other notations: $\int_\Omega h(\omega) \, d\mu(\omega)$, $\int_\Omega h(\omega) \, \mu(d\omega)$.)
Motivation 1. Mathematical expectation of a discrete random variable.
Motivation 2. Riemann integral. Velocity and distance. Classical rectangle representation.
Roughly speaking, an integral w.r.t. $\mu$ is just a weighted Riemann integral/summation.
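To connect the formula $\int h \, d\mu = \sum_i x_i \mu(A_i)$ with Motivation 1, the sketch below computes the expected number of heads in three tosses. The fair-coin measure `mu` and the helper names `measure` and `integrate_simple` are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=3))
mu = {w: Fraction(1, 8) for w in omega}   # assumed: fair coin, uniform measure

def measure(A):
    """mu(A) for a finite set A of outcomes."""
    return sum(mu[w] for w in A)

def integrate_simple(values, sets):
    """Integral of h = sum_i x_i 1_{A_i} w.r.t. mu, i.e. sum_i x_i mu(A_i)."""
    return sum(x * measure(A) for x, A in zip(values, sets))

# X = number of heads is simple: value k on the set A_k = {X = k}.
values = [0, 1, 2, 3]
sets = [[w for w in omega if w.count("H") == k] for k in values]

expectation = integrate_simple(values, sets)
print(expectation)   # 3/2
```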
Definition of the integral (II)
For arbitrary Borel-measurable functions (random variables), we use limit/approximation to define the integral. Technically, it involves four steps. Please go through the textbook, pages 452-458.
An analogy: in measure extension, we first have a very simple algebra $\mathcal{F}_0$ and a simple $\mu_0$. Then we extend $\mu_0$ to $\mu_1$ defined on $\mathcal{G}$, which includes all limiting sets of $\mathcal{F}_0$. $\mu_1$ is defined by taking limits.
Integrability
A Borel-measurable function $h$ is called integrable if $\int |h| \, d\mu$ is finite.
Equivalently, the positive and negative branches of $h$ both have finite integrals. Show a picture of these two branches.
For a random variable $X$, this means $EX$ exists and is finite.
Infinity is a nuisance in math. Almost all theorems about the integral (expectation) of random variables/measurable functions need the integrability condition.
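A discrete example makes the integrability condition tangible. In the sketch below (both random variables are hypothetical examples, not from the lecture), $P(X = 2^k) = 2^{-k}$ gives partial sums of $E|X|$ that grow without bound, while $P(Y = k) = 2^{-k}$ gives partial sums that stay below 2.

```python
from fractions import Fraction

def partial_abs_integral_unbounded(n):
    """Sum_{k=1}^{n} |x_k| P(X = x_k) with x_k = 2^k and P(X = x_k) = 2^{-k}."""
    return sum(Fraction(2**k) * Fraction(1, 2**k) for k in range(1, n + 1))

def partial_abs_integral_bounded(n):
    """Sum_{k=1}^{n} |y_k| P(Y = y_k) with y_k = k and P(Y = y_k) = 2^{-k}."""
    return sum(Fraction(k, 2**k) for k in range(1, n + 1))

for n in (10, 20, 40):
    print(n, partial_abs_integral_unbounded(n), float(partial_abs_integral_bounded(n)))
```

Each term of the first sum equals 1, so the partial sums equal $n$ and $X$ is not integrable; the second sum converges to 2, so $Y$ is integrable.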
The notion of almost everywhere
A mathematical condition (such as two functions being equal, a sequence of functions converging, etc.) is said to hold almost everywhere w.r.t. $\mu$ (simply denoted as either a.e. or a.s.) if this condition is true up to a set of measure zero.
For example, $1_{\mathbb{Q}} = 0$ is true almost everywhere w.r.t. the Lebesgue measure. But it is not true w.r.t. many discrete probabilities.
From the measure/probability theory point of view, almost everywhere/almost sure conclusions are good enough.
Almost all the theorems in probability theory (and almost all other branches of statistics) are just as good if we replace pointwise properties by almost everywhere properties.
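The dependence of "almost everywhere" on the measure can be mimicked on a finite space. The sketch below is a toy analogue of the $1_{\mathbb{Q}}$ example, under assumptions: the three-point space, the measures `mu` and `nu`, and the helper `holds_almost_everywhere` are all hypothetical.

```python
from fractions import Fraction

def holds_almost_everywhere(condition, measure):
    """True if the set of points where `condition` fails has measure zero."""
    exceptional_mass = sum(m for w, m in measure.items() if not condition(w))
    return exceptional_mass == 0

f = lambda w: 1 if w == 2 else 0   # differs from g only at the point 2
g = lambda w: 0
equal = lambda w: f(w) == g(w)

mu = {0: Fraction(1, 2), 1: Fraction(1, 2), 2: Fraction(0)}      # no mass at 2
nu = {0: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(1, 3)}   # mass at 2

print(holds_almost_everywhere(equal, mu))   # True: f = g a.e. w.r.t. mu
print(holds_almost_everywhere(equal, nu))   # False: the exceptional set has mass 1/3
```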
The four steps approach
1. Define the integral for simple functions.
2. Define the integral for measurable functions that are a) non-zero only on a set $E$ with finite measure; b) bounded.
3. Extend the above definition to non-negative functions without the two restrictions. Non-negativity is important because it ensures that the limiting process in the definition (first equation, page 456) is a monotonic process.
4. Extend the above definition to arbitrary integrable functions by breaking the function into two branches, $f = f^+ - f^-$, and taking the integrals separately for these two branches.
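Step 4 can be sketched on a finite measure space, where every function is simple and the earlier steps collapse into a weighted sum. The measure `mu`, the function `f`, and the helper `integrate_nonnegative` below are hypothetical choices for illustration only.

```python
from fractions import Fraction

mu = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
f = {"a": 3, "b": -2, "c": 0}

def integrate_nonnegative(g):
    """Integral of a non-negative function on the finite space: sum of g(w) mu({w})."""
    assert all(v >= 0 for v in g.values())
    return sum(g[w] * mu[w] for w in mu)

# Positive and negative branches: f = f_plus - f_minus, both non-negative.
f_plus = {w: max(v, 0) for w, v in f.items()}
f_minus = {w: max(-v, 0) for w, v in f.items()}

integral = integrate_nonnegative(f_plus) - integrate_nonnegative(f_minus)
abs_integral = integrate_nonnegative(f_plus) + integrate_nonnegative(f_minus)
print(integral, abs_integral)   # 1 2
```

The second quantity is $\int |f| \, d\mu$; the difference in the last step is well defined exactly when both branch integrals are finite.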
Homework
Page 43, problems 2 and 3.