Lecture 2
Statistical Inference
An example of data

Mid-heights of parents x (columns) vs. heights of adult children (rows), in inches:
below 64.5 65.5 66.5 67.5 68.5 69.5 70.5 71.5 72.5 above sum
above 5 3 2 4 14
73.2 3 4 3 2 2 3 17
72.2 1 4 4 11 4 9 7 1 41
71.2 2 11 18 20 7 4 2 64
70.2 5 4 19 21 25 14 10 1 99
69.2 1 2 7 13 38 48 33 18 5 2 167
68.2 1 7 14 28 34 20 12 3 1 120
67.2 2 5 11 17 38 31 27 3 4 138
66.2 2 5 11 17 36 25 17 1 3 117
65.2 1 1 7 2 15 16 4 1 1 48
64.2 4 4 5 5 14 11 16 59
63.2 2 4 9 3 5 7 1 1 32
62.2 1 3 3 7
below 1 1 1 1 1 5
sum 14 23 66 78 211 219 183 68 43 19 4 928
F. Galton: Regression towards mediocrity in hereditary stature, Anthropological Miscellanea (1886)
Multi-dimensional data and data matrix
2 dimensional data
No. father Adult child
1 68.2 63.3
2 71.8 72.5
3 64.4 69.2
⋯ ⋯ ⋯
𝑖 𝑥𝑖 𝑦𝑖
⋯ ⋯ ⋯
928 𝑥928 𝑦928
𝑥1 𝑥2 ⋯ 𝑥𝑗 ⋯ 𝑥𝑝
1
2
⋮
𝑖 𝑥𝑖1 𝑥𝑖2 ⋯ 𝑥𝑖𝑗 ⋯ 𝑥𝑖𝑝
⋮
𝑛
p variables
nsa
mp
les
p dimensional data
𝐗𝑖 =
𝑥𝑖1𝑥𝑖2⋮𝑥𝑖𝑝
𝑖-th data
data matrix
𝐃 = 𝐗1 𝐗2⋯ 𝐗𝑖 ⋯𝐗𝑛𝑇
3
Joint distribution: the table above can be read as the joint frequency distribution of x = mid-height of parents and y = height of the adult child.
Marginal distribution (Heights of adult children)
population → samples → statistical inference

Inch        below  62.2  63.2  64.2  65.2  66.2  67.2  68.2  69.2  70.2  71.2  72.2  73.2  above |   sum
Num.            5     7    32    59    48   117   138   120   167    99    64    41    17     14 |   928
Rel. freq.  0.005 0.008 0.034 0.064 0.052 0.126 0.149 0.129 0.180 0.107 0.069 0.044 0.018  0.015 | 1.000

[Bar chart of the relative frequencies, from "below" through "above"]
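The relative-frequency row can be recomputed from the raw counts. A minimal Python sketch, with the marginal counts transcribed from the table above:

```python
# Marginal counts of adult-child heights, transcribed from the table above
# (bins: below, 62.2, 63.2, ..., 73.2, above).
counts = [5, 7, 32, 59, 48, 117, 138, 120, 167, 99, 64, 41, 17, 14]
total = sum(counts)                     # 928 children in all
rel_freq = [c / total for c in counts]  # relative frequencies, summing to 1

print(total)                 # 928
print(round(rel_freq[6], 3)) # the 67.2-inch bin: 138/928 ≈ 0.149
```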
Point estimates of population parameters
The population distribution is our ultimate target. However, it is not feasible to obtain full information about it. Instead, we estimate some parameters of the population distribution, e.g., the mean value, variance, maximal value, correlation coefficient, ....

Point estimate: Let θ be a parameter of the population distribution. We wish to find a function

    θ̂ = f(x_1, x_2, x_3, ⋯, x_N)

such that θ̂ is a reasonable estimate of the unknown parameter θ computed from the sample data x_1, x_2, x_3, ⋯, x_N.

population distribution → samples x_1, x_2, x_3, ⋯, x_N → θ̂ (statistical inference)
Unbiased estimates
Definition: θ̂ = θ̂(X_1, X_2, ⋯, X_N) is called an unbiased estimator of θ if E[θ̂] = θ.

population (unknown parameter θ) → samples x_1, x_2, x_3, ⋯, x_N → θ̂ = θ̂(x_1, x_2, ⋯, x_N)

Sample data x_1, x_2, x_3, ⋯, x_N are definite values once a sampling is performed.
However, they vary from sampling to sampling even though the population remains the same.
Moreover, because each sample x_i is the result of random sampling, it is modeled by a random variable X_i obeying the population distribution.
Thus, the sample data x_1, x_2, x_3, ⋯, x_N are regarded as realized values of the independent random variables X_1, X_2, X_3, ⋯, X_N.
Therefore, θ̂ should itself be considered a random variable θ̂ = θ̂(X_1, X_2, ⋯, X_N).
As usual, we adopt random sampling with replacement.
Sample mean
For random samples X_1, X_2, X_3, ⋯, X_N, the sample mean is defined by

    X̄ = (1/N) Σ_{k=1}^N X_k.

Theorem: The sample mean X̄ is an unbiased estimator of the mean value of the population:

    E[X̄] = m,

where m is the mean value of the population.

PROOF: Since each X_k is a random sample from the population, its distribution coincides with the population distribution. In particular, E[X_k] = m. Then,

    E[X̄] = (1/N) Σ_{k=1}^N E[X_k] = m.

population mean m:
    sample 1: x_1, x_2, x_3, ⋯, x_N → x̄
    sample 2: y_1, y_2, y_3, ⋯, y_N → ȳ
    ⋮
Different samplings give different estimates x̄, ȳ, ⋯ scattered around m.
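The theorem can be checked by brute force on a toy population: under sampling with replacement, every ordered sample is equally likely, so E[X̄] is simply the average of the sample means over all possible samples. A minimal sketch (the population {1, 2, 3} is an arbitrary choice for illustration):

```python
from itertools import product
from statistics import mean

population = [1, 2, 3]   # toy population, each value equally likely
m = mean(population)     # population mean m = 2

N = 2                    # sample size, drawn with replacement

# Enumerate all 3^N equally likely ordered samples and average their sample means.
samples = list(product(population, repeat=N))
E_sample_mean = mean(mean(s) for s in samples)

print(E_sample_mean)     # equals m: the sample mean is unbiased
```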
Better estimates
1. We have seen that the sample mean

    X̄ = (1/N) Σ_{k=1}^N X_k

is an unbiased estimator: E[X̄] = m.

2. Let X_1, X_2, X_3, ⋯, X_N be random samples. The weighted mean

    m̂ = Σ_{k=1}^N a_k X_k,    Σ_{k=1}^N a_k = 1,

is also an unbiased estimator. In fact,

    E[m̂] = Σ_{k=1}^N a_k E[X_k] = m.

3. Which is better?

Definition: Let θ̂_1 and θ̂_2 be two unbiased estimators of θ, i.e., E[θ̂_1] = E[θ̂_2] = θ. We say that θ̂_1 is better than θ̂_2 if

    E[(θ̂_1 − θ)²] ≤ E[(θ̂_2 − θ)²].

In general, E[(θ̂ − θ)²] is called the mean squared error; for an unbiased estimator it equals the variance V[θ̂].
Sample mean is better than weighted mean
For the weighted mean m̂ = Σ_{k=1}^n a_k X_k we compute the mean squared error E[(m̂ − m)²].

First note that

    m̂² = (Σ_{j=1}^n a_j X_j)(Σ_{k=1}^n a_k X_k) = Σ_{j≠k} a_j a_k X_j X_k + Σ_{k=1}^n a_k² X_k²,

and, since the X_k are independent with E[X_k] = m and E[X_k²] = σ² + m²,

    E[m̂²] = Σ_{j≠k} a_j a_k E[X_j] E[X_k] + Σ_{k=1}^n a_k² E[X_k²]
          = Σ_{j≠k} a_j a_k m² + Σ_{k=1}^n a_k² (σ² + m²)
          = m² Σ_{j,k=1}^n a_j a_k + σ² Σ_{k=1}^n a_k²
          = m² + σ² Σ_{k=1}^n a_k²    (using Σ_{k=1}^n a_k = 1).

Hence (since E[m̂] = m)

    E[(m̂ − m)²] = E[m̂²] − m² = σ² Σ_{k=1}^n a_k²
                = σ² [ Σ_{k=1}^n (a_k − 1/n)² + (2/n) Σ_{k=1}^n a_k − n × (1/n²) ]
                = σ² Σ_{k=1}^n (a_k − 1/n)² + σ²/n.

This attains its minimum σ²/n when a_k = 1/n, i.e., for the sample mean.
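The key identity E[(m̂ − m)²] = σ² Σ a_k² can be sanity-checked numerically: for any weights summing to 1 it is minimized by the uniform weights a_k = 1/n. A small sketch (σ² and the weight vectors are arbitrary illustrative choices):

```python
n = 5
sigma2 = 4.0   # illustrative population variance

def mse(weights):
    """Mean squared error of the weighted mean: sigma^2 * sum(a_k^2)."""
    assert abs(sum(weights) - 1.0) < 1e-12   # weights must sum to 1
    return sigma2 * sum(a * a for a in weights)

uniform = [1 / n] * n                # sample mean: a_k = 1/n
skewed = [0.5, 0.2, 0.1, 0.1, 0.1]   # some other unbiased weighting

print(round(mse(uniform), 6))  # sigma^2 / n = 0.8, the minimum
print(round(mse(skewed), 6))   # 1.28, strictly worse
```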
Exercise 3 (10 min)

(1) A coin is tossed N times. Let X be the number of heads. Verify that the relative frequency X/N is an unbiased estimator of the probability that a head occurs. [Hint] X/N = (1/N) Σ_{k=1}^N Z_k, where Z_k = 1 or 0 according as the k-th toss is heads or tails.

(2) An urn contains many white balls and a relatively small number of red balls. N balls are chosen randomly with replacement. Find an unbiased estimator of the ratio of red balls in the urn.
Exercise 4 (Challenge) [The median is an unbiased estimator]

There is a deck of N cards, and a unique number from 1 to N is assigned to each card as a reward. Three cards are drawn randomly with replacement, say X_1, X_2, X_3. Then we know that (X_1 + X_2 + X_3)/3 is an unbiased estimator of the mean of the reward. The median M of X_1, X_2, X_3 is the reward in the middle rank among the three, i.e., the second largest (equivalently, the second smallest).

(1) Find the distribution of M, i.e., P(M = k).
(2) Show that M is an unbiased estimator.
Unbiased variance
• Let X_1, X_2, X_3, ⋯, X_N be random samples with mean m and variance σ².

• The sample mean is defined by

    X̄ = (1/N) Σ_{k=1}^N X_k.

• The variance would naturally be defined by

    S² = (1/N) Σ_{k=1}^N (X_k − X̄)².

• However, E[S²] ≠ σ².

Theorem: An unbiased estimator of σ² is given by

    U² = (1/(N − 1)) Σ_{k=1}^N (X_k − X̄)².

In other words, E[U²] = σ².

Definition: U² is called the unbiased variance (also called the sample variance in some of the literature).

PROOF is by direct calculation of E[U²], expanding the right-hand side [Hoel: p. 220].
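The bias of S² and the unbiasedness of U² can be verified exactly on a tiny example: for a fair-coin population (σ² = 1/4) and N = 2, enumerate all equally likely samples. A minimal sketch:

```python
from itertools import product

population = [0, 1]   # fair coin: m = 1/2, sigma^2 = 1/4
N = 2

def s2(xs):
    """'Natural' variance, divisor N (biased)."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / len(xs)

def u2(xs):
    """Unbiased variance, divisor N - 1."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

samples = list(product(population, repeat=N))   # 4 equally likely samples
E_s2 = sum(s2(s) for s in samples) / len(samples)
E_u2 = sum(u2(s) for s in samples) / len(samples)

print(E_s2)   # 0.125 = (N-1)/N * sigma^2  (biased low)
print(E_u2)   # 0.25  = sigma^2            (unbiased)
```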
Maximum likelihood estimates
• It is desirable to use the best unbiased estimator.
• The best unbiased estimators are known for many concrete cases; however, it is not straightforward to obtain the best unbiased estimator in general.
• The maximum likelihood method is therefore employed, because the maximum likelihood estimator is rather easily derived, and in many cases it is the best unbiased estimator.
Examples:

(1) Binomial population – opinion polls, etc. The population distribution is the Bernoulli distribution B(1, p), so the unknown parameter is p.

(2) Poisson population – counting rare events. The population distribution Po(λ) contains just one unknown parameter, λ.

(3) Normal population – a population whose individuals are subject to many small random fluctuations. The normal distribution N(m, σ²) contains two unknown parameters, m and σ².

Formulation: population distribution f(x, θ) with unknown parameter θ → samples x_1, x_2, x_3, ⋯, x_N → θ̂ = θ̂(x_1, x_2, ⋯, x_N)
Likelihood functions
Let f(x, θ) be the population distribution, where θ is unknown. The likelihood function is defined by

    L(x_1, x_2, ⋯, x_N; θ) = Π_{k=1}^N f(x_k, θ).

Basic idea:
• Given a set of sample data x_1, x_2, ⋯, x_N, we determine the value θ̂ which maximizes L(x_1, x_2, ⋯, x_N; θ).
• Roughly, we presume that the event that actually occurred (the data x_1, x_2, ⋯, x_N) is the one that has the highest probability among the many other candidates.

The maximum value of L is found by solving ∂L/∂θ = 0. Such a value θ = θ̂ is called the maximum likelihood estimator.

It is often more convenient to consider the log-likelihood function:

    log L(x_1, x_2, ⋯, x_N; θ) = Σ_{k=1}^N log f(x_k, θ).

Then

    ∂ log L/∂θ = Σ_{k=1}^N ∂/∂θ log f(x_k, θ).
Binomial population
We consider

    f(x, p) = p if x = 1,  1 − p if x = 0;   i.e., f(x, p) = p^x (1 − p)^{1−x},

where p is unknown and to be estimated.

The likelihood function is defined by

    L(x_1, x_2, ⋯, x_N; p) = Π_{k=1}^N f(x_k, p) = Π_{k=1}^N p^{x_k} (1 − p)^{1−x_k},

and the log-likelihood function by

    log L = Σ_{k=1}^N [x_k log p + (1 − x_k) log(1 − p)].

Then we have

    ∂/∂p log L = Σ_{k=1}^N [x_k/p − (1 − x_k)/(1 − p)] = Σ_{k=1}^N (x_k − p)/(p(1 − p)),

and solving ∂/∂p log L = 0 we obtain the maximum likelihood estimator:

    p̂ = (1/N) Σ_{k=1}^N x_k,

which is known as the sample mean and is an unbiased estimator too.
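The conclusion can be illustrated numerically: for a hypothetical 0/1 sample, the log-likelihood evaluated at p̂ = sample mean dominates nearby values of p. A minimal sketch (the sample values are an arbitrary illustration):

```python
import math

xs = [1, 0, 1, 1, 0, 1, 0, 1]   # hypothetical Bernoulli sample
p_hat = sum(xs) / len(xs)       # MLE = sample mean = 0.625

def log_L(p):
    """Log-likelihood of the Bernoulli sample at parameter p."""
    return sum(x * math.log(p) + (1 - x) * math.log(1 - p) for x in xs)

# log L is strictly concave in p, so the critical point p_hat is the maximum.
print(p_hat)                             # 0.625
print(log_L(p_hat) > log_L(p_hat - 0.1)) # True
print(log_L(p_hat) > log_L(p_hat + 0.1)) # True
```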
Point estimates vs Interval estimates
population (unknown parameter θ) → samples x_1, x_2, x_3, ⋯, x_N → point estimator θ̂ = θ̂(x_1, x_2, ⋯, x_N)

• θ̂ is calculated by using our good estimator.
• However, nobody knows whether or not θ̂ is close to θ.
• What we know is that our estimating method is statistically reasonable, in the sense that θ̂ hits a good approximation of θ with high frequency if we apply it many times.

A probabilistic estimate is made using the confidence interval [Hoel: Chap. 5].
Problem 5
Consider the interval [0, A], where A > 0 is unknown and to be estimated (e.g., the ultimate world record). Let X_1, X_2, ⋯, X_n be random samples obeying the uniform distribution on [0, A], and let X̄ be the sample mean.

(1) Show that 2X̄ is an unbiased estimator of A.

(2) This model is too simple for application to estimating the ultimate world record of the long jump. Discuss this point.
Problem 6
There are many runners on the street. Each runner wears a number cloth, numbered from 1 up to a certain unknown big number N. Snapshots are taken at random.

(1) In a snapshot there are 3 runners with numbers X_1, X_2, X_3. Set M = max(X_1, X_2, X_3). Prove that

    P(M = k) = 3(k − 1)(k − 2) / (N(N − 1)(N − 2)).

(2) Calculating the mean value E[M], show that (4/3)M − 1 is an unbiased estimator of N.

(3) In a snapshot there are n runners with numbers X_1, X_2, ⋯, X_n. Show that (1 + 1/n)M − 1 is an unbiased estimator of N, where M = max(X_1, X_2, ⋯, X_n).
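Claim (3) can be sanity-checked by exact enumeration for a small N: every set of n distinct bib numbers is an equally likely snapshot, and the estimator (1 + 1/n)M − 1 averages exactly to N. A small sketch (N = 10 and n = 3 are arbitrary illustrative choices; this checks the claim, it does not prove it):

```python
from fractions import Fraction
from itertools import combinations

N, n = 10, 3   # toy values: 10 runners, snapshots showing 3 distinct bibs

# All equally likely snapshots (unordered sets of n distinct numbers).
snapshots = list(combinations(range(1, N + 1), n))

# E[(1 + 1/n) * M - 1], computed exactly with rational arithmetic.
E_est = sum((1 + Fraction(1, n)) * max(s) - 1 for s in snapshots) / len(snapshots)

print(E_est)   # N, i.e., the estimator is unbiased
```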
Problem 7
Recall Exercise 4. There is a deck of N cards and a unique number from 1 to N is assigned to each card as a reward. Three cards are drawn randomly with replacement, say X_1, X_2, X_3. We know that both

    X̄ = (1/3)(X_1 + X_2 + X_3)   and   M = median(X_1, X_2, X_3)

are unbiased estimators of the mean of the reward. Calculating the mean squared errors, determine the better estimator.
Problem 8
The exponential distribution with parameter λ > 0 is defined by the density function:

    f(x, λ) = λ e^{−λx} for x ≥ 0,   f(x, λ) = 0 for x < 0.

Then the likelihood function is defined by

    L(x_1, x_2, ⋯, x_N; λ) = Π_{k=1}^N f(x_k, λ).

Find the maximum likelihood estimator λ̂ of λ.
Problem 9

It is known [de Moivre–Laplace theorem] that the binomial distribution B(n, p) is well approximated by the normal distribution N(np, np(1 − p)) when np ≥ 30 and n(1 − p) ≥ 30. Then, for a random variable X ~ B(n, p) we have

    P(X ≤ x) = P( (X − np)/√(np(1 − p)) ≤ y ) ≈ (1/√(2π)) ∫_{−∞}^{y} e^{−t²/2} dt,   y = (x − np)/√(np(1 − p)).

On the other hand, by numerical computation, for Z ~ N(0, 1) we know that

    P(|Z| ≤ 1.96) = (1/√(2π)) ∫_{−1.96}^{1.96} e^{−t²/2} dt = 0.95.

A fair coin is tossed n times (n is large, n ≫ 100). Let X_n be the number of heads among the n trials. Find a_n such that

    P( |X_n/n − 1/2| ≤ a_n ) = 0.95.
Solution to Exercise 3
(1) Let p be the probability that a head occurs and X the number of heads obtained in tossing a coin N times. Then X/N is an unbiased estimator of p. In fact, define

    Z_k = 1 if a head occurs on the k-th toss,  Z_k = 0 otherwise.

Then Z_1, Z_2, ⋯, Z_N are random samples from a population with mean p (this is called a binomial population). Then we know that the sample mean

    Z̄ = (1/N) Σ_{k=1}^N Z_k

is an unbiased estimator, i.e., E[Z̄] = p. On the other hand, Σ_{k=1}^N Z_k is the number of heads obtained in tossing a coin N times and coincides with X. Consequently, X/N = Z̄ is an unbiased estimator of p.

(2) Let p be the ratio of red balls in the urn. Then choosing a ball from the urn is equivalent to tossing a coin for which the probability that a head occurs is p, so the situation is the same as in (1). Letting X be the number of red balls among N trials, X/N is an unbiased estimator of p.
Solution to Exercise 4
First we find the distribution of M. The event {M = k} is divided into the following four cases:

• X_1 < k, X_2 = k, X_3 > k, or its permutations
• X_1 = X_2 = k, X_3 > k, or its permutations
• X_1 < k, X_2 = X_3 = k, or its permutations
• X_1 = X_2 = X_3 = k

Thus,

    P(M = k) = (1/N³) [6(k − 1)(N − k) + 3(N − k) + 3(k − 1) + 1]
             = (1/N³) [−6k² + 6(N + 1)k − (3N + 2)].

The mean value of M is computed as follows:

    E[M] = Σ_{k=1}^N k P(M = k) = (1/N³) Σ_{k=1}^N [−6k³ + 6(N + 1)k² − (3N + 2)k]
         = (1/N³) [ −6 · N²(N + 1)²/4 + 6(N + 1) · N(N + 1)(2N + 1)/6 − (3N + 2) · N(N + 1)/2 ]
         = (N + 1)/2,

which coincides with the mean of the reward, (N + 1)/2.
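The result E[M] = (N + 1)/2 can be cross-checked by exhaustive enumeration for a small deck, since all N³ ordered draws with replacement are equally likely. A minimal sketch (N = 5 is an arbitrary illustrative choice):

```python
from fractions import Fraction
from itertools import product
from statistics import median

N = 5   # small deck, so we can enumerate all N^3 ordered draws exactly

draws = list(product(range(1, N + 1), repeat=3))   # equally likely triples
E_M = sum(Fraction(median(t)) for t in draws) / len(draws)

print(E_M)   # (N + 1)/2 = 3
```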