Lecture 2
Statistical Inference
An example of data

Mid-heights of parents x (columns) vs. heights of adult children (rows), in inches:
below 64.5 65.5 66.5 67.5 68.5 69.5 70.5 71.5 72.5 above sum
above 5 3 2 4 14
73.2 3 4 3 2 2 3 17
72.2 1 4 4 11 4 9 7 1 41
71.2 2 11 18 20 7 4 2 64
70.2 5 4 19 21 25 14 10 1 99
69.2 1 2 7 13 38 48 33 18 5 2 167
68.2 1 7 14 28 34 20 12 3 1 120
67.2 2 5 11 17 38 31 27 3 4 138
66.2 2 5 11 17 36 25 17 1 3 117
65.2 1 1 7 2 15 16 4 1 1 48
64.2 4 4 5 5 14 11 16 59
63.2 2 4 9 3 5 7 1 1 32
62.2 1 3 3 7
below 1 1 1 1 1 5
sum 14 23 66 78 211 219 183 68 43 19 4 928
F. Galton: Regression towards mediocrity in hereditary stature, Anthropological Miscellanea (1886)
Multi-dimensional data and data matrix
2 dimensional data
No. father Adult child
1 68.2 63.3
2 71.8 72.5
3 64.4 69.2
⋯ ⋯ ⋯
𝑖 𝑥𝑖 𝑦𝑖
⋯ ⋯ ⋯
928 𝑥928 𝑦928
𝑥1 𝑥2 ⋯ 𝑥𝑗 ⋯ 𝑥𝑝
1
2
⋮
𝑖 𝑥𝑖1 𝑥𝑖2 ⋯ 𝑥𝑖𝑗 ⋯ 𝑥𝑖𝑝
⋮
𝑛
p variables
nsa
mp
les
p dimensional data
𝐗𝑖 =
𝑥𝑖1𝑥𝑖2⋮𝑥𝑖𝑝
𝑖-th data
data matrix
𝐃 = 𝐗1 𝐗2⋯ 𝐗𝑖 ⋯𝐗𝑛𝑇
3
Joint distribution: the table above can be read as the joint frequency distribution of x = mid-height of parents and y = height of the adult child.
Marginal distribution (Heights of adult children)
population → samples → statistical inference

Inch        below  62.2  63.2  64.2  65.2  66.2  67.2  68.2  69.2  70.2  71.2  72.2  73.2  above |   sum
Num.            5     7    32    59    48   117   138   120   167    99    64    41    17     14 |   928
Rel. freq.  0.005 0.008 0.034 0.064 0.052 0.126 0.149 0.129 0.180 0.107 0.069 0.044 0.018  0.015 | 1.000

[Bar chart of the relative frequencies, from "below" through "above"]
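The relative-frequency row can be recomputed from the raw counts. A minimal Python sketch, with the marginal counts transcribed from the table above:

```python
# Marginal counts of adult-child heights, transcribed from the table above
# (bins: below, 62.2, 63.2, ..., 73.2, above).
counts = [5, 7, 32, 59, 48, 117, 138, 120, 167, 99, 64, 41, 17, 14]
total = sum(counts)                     # 928 children in all
rel_freq = [c / total for c in counts]  # relative frequencies, summing to 1

print(total)                 # 928
print(round(rel_freq[6], 3)) # the 67.2-inch bin: 138/928 ≈ 0.149
```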
Point estimates of population parameters
The population distribution is our ultimate target. However, it is not feasible to obtain full information about it. Instead, we estimate some parameters of the population distribution, e.g., the mean value, variance, maximal value, correlation coefficient, ....

Point estimate: Let θ be a parameter of the population distribution. We wish to find a function

    θ̂ = f(x_1, x_2, x_3, ⋯, x_N)

such that θ̂ is a reasonable estimate of the unknown parameter θ computed from the sample data x_1, x_2, x_3, ⋯, x_N.

population distribution → samples x_1, x_2, x_3, ⋯, x_N → θ̂ (statistical inference)
Unbiased estimates
Definition: θ̂ = θ̂(X_1, X_2, ⋯, X_N) is called an unbiased estimator of θ if E[θ̂] = θ.

population (unknown parameter θ) → samples x_1, x_2, x_3, ⋯, x_N → θ̂ = θ̂(x_1, x_2, ⋯, x_N)

Sample data x_1, x_2, x_3, ⋯, x_N are definite values once a sampling is performed.
However, they vary from sampling to sampling even though the population remains the same.
Moreover, because each sample x_i is the result of random sampling, it is modeled by a random variable X_i obeying the population distribution.
Thus, the sample data x_1, x_2, x_3, ⋯, x_N are regarded as realized values of the independent random variables X_1, X_2, X_3, ⋯, X_N.
Therefore, θ̂ should itself be considered a random variable θ̂ = θ̂(X_1, X_2, ⋯, X_N).
As usual, we adopt random sampling with replacement.
Sample mean
For random samples X_1, X_2, X_3, ⋯, X_N, the sample mean is defined by

    X̄ = (1/N) Σ_{k=1}^N X_k.

Theorem: The sample mean X̄ is an unbiased estimator of the mean value of the population:

    E[X̄] = m,

where m is the mean value of the population.

PROOF: Since each X_k is a random sample from the population, its distribution coincides with the population distribution. In particular, E[X_k] = m. Then,

    E[X̄] = (1/N) Σ_{k=1}^N E[X_k] = m.

population mean m:
    sample 1: x_1, x_2, x_3, ⋯, x_N → x̄
    sample 2: y_1, y_2, y_3, ⋯, y_N → ȳ
    ⋮
Different samplings give different estimates x̄, ȳ, ⋯ scattered around m.
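The theorem can be checked by brute force on a toy population: under sampling with replacement, every ordered sample is equally likely, so E[X̄] is simply the average of the sample means over all possible samples. A minimal sketch (the population {1, 2, 3} is an arbitrary choice for illustration):

```python
from itertools import product
from statistics import mean

population = [1, 2, 3]   # toy population, each value equally likely
m = mean(population)     # population mean m = 2

N = 2                    # sample size, drawn with replacement

# Enumerate all 3^N equally likely ordered samples and average their sample means.
samples = list(product(population, repeat=N))
E_sample_mean = mean(mean(s) for s in samples)

print(E_sample_mean)     # equals m: the sample mean is unbiased
```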
Better estimates
1. We have seen that the sample mean

    X̄ = (1/N) Σ_{k=1}^N X_k

is an unbiased estimator: E[X̄] = m.

2. Let X_1, X_2, X_3, ⋯, X_N be random samples. The weighted mean

    m̂ = Σ_{k=1}^N a_k X_k,    Σ_{k=1}^N a_k = 1,

is also an unbiased estimator. In fact,

    E[m̂] = Σ_{k=1}^N a_k E[X_k] = m.

3. Which is better?

Definition: Let θ̂_1 and θ̂_2 be two unbiased estimators of θ, i.e., E[θ̂_1] = E[θ̂_2] = θ. We say that θ̂_1 is better than θ̂_2 if

    E[(θ̂_1 − θ)²] ≤ E[(θ̂_2 − θ)²].

In general, E[(θ̂ − θ)²] is called the mean squared error; for an unbiased estimator it equals the variance V[θ̂].
Sample mean is better than weighted mean
For the weighted mean m̂ = Σ_{k=1}^n a_k X_k we compute the mean squared error E[(m̂ − m)²].

First note that

    m̂² = (Σ_{j=1}^n a_j X_j)(Σ_{k=1}^n a_k X_k) = Σ_{j≠k} a_j a_k X_j X_k + Σ_{k=1}^n a_k² X_k²,

and, since the X_k are independent with E[X_k] = m and E[X_k²] = σ² + m²,

    E[m̂²] = Σ_{j≠k} a_j a_k E[X_j] E[X_k] + Σ_{k=1}^n a_k² E[X_k²]
          = Σ_{j≠k} a_j a_k m² + Σ_{k=1}^n a_k² (σ² + m²)
          = m² Σ_{j,k=1}^n a_j a_k + σ² Σ_{k=1}^n a_k²
          = m² + σ² Σ_{k=1}^n a_k²    (using Σ_{k=1}^n a_k = 1).

Hence (since E[m̂] = m)

    E[(m̂ − m)²] = E[m̂²] − m² = σ² Σ_{k=1}^n a_k²
                = σ² [ Σ_{k=1}^n (a_k − 1/n)² + (2/n) Σ_{k=1}^n a_k − n × (1/n²) ]
                = σ² Σ_{k=1}^n (a_k − 1/n)² + σ²/n.

This attains its minimum σ²/n when a_k = 1/n, i.e., for the sample mean.
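The key identity E[(m̂ − m)²] = σ² Σ a_k² can be sanity-checked numerically: for any weights summing to 1 it is minimized by the uniform weights a_k = 1/n. A small sketch (σ² and the weight vectors are arbitrary illustrative choices):

```python
n = 5
sigma2 = 4.0   # illustrative population variance

def mse(weights):
    """Mean squared error of the weighted mean: sigma^2 * sum(a_k^2)."""
    assert abs(sum(weights) - 1.0) < 1e-12   # weights must sum to 1
    return sigma2 * sum(a * a for a in weights)

uniform = [1 / n] * n                # sample mean: a_k = 1/n
skewed = [0.5, 0.2, 0.1, 0.1, 0.1]   # some other unbiased weighting

print(round(mse(uniform), 6))  # sigma^2 / n = 0.8, the minimum
print(round(mse(skewed), 6))   # 1.28, strictly worse
```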
Exercise 3 (10 min)

(1) A coin is tossed N times. Let X be the number of heads. Verify that the relative frequency X/N is an unbiased estimator of the probability that a head occurs. [Hint] X/N = (1/N) Σ_{k=1}^N Z_k, where Z_k = 1 or 0 according as the k-th toss is heads or tails.

(2) An urn contains many white balls and a relatively small number of red balls. N balls are chosen randomly with replacement. Find an unbiased estimator of the ratio of red balls in the urn.
Exercise 4 (Challenge) [The median is an unbiased estimator]

There is a deck of N cards, and a unique number from 1 to N is assigned to each card as a reward. Three cards are drawn randomly with replacement, say X_1, X_2, X_3. Then we know that (X_1 + X_2 + X_3)/3 is an unbiased estimator of the mean of the reward. The median M of X_1, X_2, X_3 is the reward in the middle rank among the three, i.e., the second largest (equivalently, the second smallest).

(1) Find the distribution of M, i.e., P(M = k).
(2) Show that M is an unbiased estimator.
Unbiased variance
• Let X_1, X_2, X_3, ⋯, X_N be random samples with mean m and variance σ².

• The sample mean is defined by

    X̄ = (1/N) Σ_{k=1}^N X_k.

• The variance would naturally be defined by

    S² = (1/N) Σ_{k=1}^N (X_k − X̄)².

• However, E[S²] ≠ σ².

Theorem: An unbiased estimator of σ² is given by

    U² = (1/(N − 1)) Σ_{k=1}^N (X_k − X̄)².

In other words, E[U²] = σ².

Definition: U² is called the unbiased variance (also called the sample variance in some of the literature).

PROOF is by direct calculation of E[U²], expanding the right-hand side [Hoel: p. 220].
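The bias of S² and the unbiasedness of U² can be verified exactly on a tiny example: for a fair-coin population (σ² = 1/4) and N = 2, enumerate all equally likely samples. A minimal sketch:

```python
from itertools import product

population = [0, 1]   # fair coin: m = 1/2, sigma^2 = 1/4
N = 2

def s2(xs):
    """'Natural' variance, divisor N (biased)."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / len(xs)

def u2(xs):
    """Unbiased variance, divisor N - 1."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

samples = list(product(population, repeat=N))   # 4 equally likely samples
E_s2 = sum(s2(s) for s in samples) / len(samples)
E_u2 = sum(u2(s) for s in samples) / len(samples)

print(E_s2)   # 0.125 = (N-1)/N * sigma^2  (biased low)
print(E_u2)   # 0.25  = sigma^2            (unbiased)
```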
Maximum likelihood estimates
• It is desirable to use the best unbiased estimator.
• The best unbiased estimators are known for many concrete cases; however, it is not straightforward to obtain the best unbiased estimator in general.
• The maximum likelihood method is therefore employed, because the maximum likelihood estimator is rather easily derived, and in many cases it is the best unbiased estimator.
Examples:

(1) Binomial population – opinion polls, etc. The population distribution is the Bernoulli distribution B(1, p), so the unknown parameter is p.

(2) Poisson population – counting rare events. The population distribution Po(λ) contains just one unknown parameter, λ.

(3) Normal population – a population whose individuals are subject to many small random fluctuations. The normal distribution N(m, σ²) contains two unknown parameters, m and σ².

Formulation: population distribution f(x, θ) with unknown parameter θ → samples x_1, x_2, x_3, ⋯, x_N → θ̂ = θ̂(x_1, x_2, ⋯, x_N)
Likelihood functions
Let f(x, θ) be the population distribution, where θ is unknown. The likelihood function is defined by

    L(x_1, x_2, ⋯, x_N; θ) = Π_{k=1}^N f(x_k, θ).

Basic idea:
• Given a set of sample data x_1, x_2, ⋯, x_N, we determine the value θ̂ which maximizes L(x_1, x_2, ⋯, x_N; θ).
• Roughly, we presume that the event that actually occurred (the data x_1, x_2, ⋯, x_N) is the one that has the highest probability among the many other candidates.

The maximum value of L is found by solving ∂L/∂θ = 0. Such a value θ = θ̂ is called the maximum likelihood estimator.

It is often more convenient to consider the log-likelihood function:

    log L(x_1, x_2, ⋯, x_N; θ) = Σ_{k=1}^N log f(x_k, θ).

Then

    ∂ log L/∂θ = Σ_{k=1}^N ∂/∂θ log f(x_k, θ).
Binomial population
We consider

    f(x, p) = p if x = 1,  1 − p if x = 0;   i.e., f(x, p) = p^x (1 − p)^{1−x},

where p is unknown and to be estimated.

The likelihood function is defined by

    L(x_1, x_2, ⋯, x_N; p) = Π_{k=1}^N f(x_k, p) = Π_{k=1}^N p^{x_k} (1 − p)^{1−x_k},

and the log-likelihood function by

    log L = Σ_{k=1}^N [x_k log p + (1 − x_k) log(1 − p)].

Then we have

    ∂/∂p log L = Σ_{k=1}^N [x_k/p − (1 − x_k)/(1 − p)] = Σ_{k=1}^N (x_k − p)/(p(1 − p)),

and solving ∂/∂p log L = 0 we obtain the maximum likelihood estimator:

    p̂ = (1/N) Σ_{k=1}^N x_k,

which is known as the sample mean and is an unbiased estimator too.
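The conclusion can be illustrated numerically: for a hypothetical 0/1 sample, the log-likelihood evaluated at p̂ = sample mean dominates nearby values of p. A minimal sketch (the sample values are an arbitrary illustration):

```python
import math

xs = [1, 0, 1, 1, 0, 1, 0, 1]   # hypothetical Bernoulli sample
p_hat = sum(xs) / len(xs)       # MLE = sample mean = 0.625

def log_L(p):
    """Log-likelihood of the Bernoulli sample at parameter p."""
    return sum(x * math.log(p) + (1 - x) * math.log(1 - p) for x in xs)

# log L is strictly concave in p, so the critical point p_hat is the maximum.
print(p_hat)                             # 0.625
print(log_L(p_hat) > log_L(p_hat - 0.1)) # True
print(log_L(p_hat) > log_L(p_hat + 0.1)) # True
```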
Point estimates vs Interval estimates
population (unknown parameter θ) → samples x_1, x_2, x_3, ⋯, x_N → point estimator θ̂ = θ̂(x_1, x_2, ⋯, x_N)

• θ̂ is calculated by using our good estimator.
• However, nobody knows whether or not θ̂ is close to θ.
• What we know is that our estimating method is statistically reasonable, in the sense that θ̂ hits a good approximation of θ with high frequency if we apply it many times.

A probabilistic estimate is made using the confidence interval [Hoel: Chap. 5].
Problem 5
Consider the interval [0, A], where A > 0 is unknown and to be estimated (e.g., the ultimate world record). Let X_1, X_2, ⋯, X_n be random samples obeying the uniform distribution on [0, A], and let X̄ be the sample mean.

(1) Show that 2X̄ is an unbiased estimator of A.

(2) This model is too simple for application to estimating the ultimate world record of the long jump. Discuss this point.
Problem 6
There are many runners on the street. Each runner wears a number cloth, numbered from 1 up to a certain unknown big number N. Snapshots are taken at random.

(1) In a snapshot there are 3 runners with numbers X_1, X_2, X_3. Set M = max(X_1, X_2, X_3). Prove that

    P(M = k) = 3(k − 1)(k − 2) / (N(N − 1)(N − 2)).

(2) Calculating the mean value E[M], show that (4/3)M − 1 is an unbiased estimator of N.

(3) In a snapshot there are n runners with numbers X_1, X_2, ⋯, X_n. Show that (1 + 1/n)M − 1 is an unbiased estimator of N, where M = max(X_1, X_2, ⋯, X_n).
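Claim (3) can be sanity-checked by exact enumeration for a small N: every set of n distinct bib numbers is an equally likely snapshot, and the estimator (1 + 1/n)M − 1 averages exactly to N. A small sketch (N = 10 and n = 3 are arbitrary illustrative choices; this checks the claim, it does not prove it):

```python
from fractions import Fraction
from itertools import combinations

N, n = 10, 3   # toy values: 10 runners, snapshots showing 3 distinct bibs

# All equally likely snapshots (unordered sets of n distinct numbers).
snapshots = list(combinations(range(1, N + 1), n))

# E[(1 + 1/n) * M - 1], computed exactly with rational arithmetic.
E_est = sum((1 + Fraction(1, n)) * max(s) - 1 for s in snapshots) / len(snapshots)

print(E_est)   # N, i.e., the estimator is unbiased
```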
Problem 7
Recall Exercise 4. There is a deck of N cards and a unique number from 1 to N is assigned to each card as a reward. Three cards are drawn randomly with replacement, say X_1, X_2, X_3. We know that both

    X̄ = (1/3)(X_1 + X_2 + X_3)   and   M = median(X_1, X_2, X_3)

are unbiased estimators of the mean of the reward. Calculating the mean squared errors, determine the better estimator.
Problem 8
The exponential distribution with parameter λ > 0 is defined by the density function:

    f(x, λ) = λ e^{−λx} for x ≥ 0,   f(x, λ) = 0 for x < 0.

Then the likelihood function is defined by

    L(x_1, x_2, ⋯, x_N; λ) = Π_{k=1}^N f(x_k, λ).

Find the maximum likelihood estimator λ̂ of λ.
Problem 9

It is known [de Moivre–Laplace theorem] that the binomial distribution B(n, p) is well approximated by the normal distribution N(np, np(1 − p)) when np ≥ 30 and n(1 − p) ≥ 30. Then, for a random variable X ~ B(n, p) we have

    P(X ≤ x) = P( (X − np)/√(np(1 − p)) ≤ y ) ≈ (1/√(2π)) ∫_{−∞}^{y} e^{−t²/2} dt,   y = (x − np)/√(np(1 − p)).

On the other hand, by numerical computation, for Z ~ N(0, 1) we know that

    P(|Z| ≤ 1.96) = (1/√(2π)) ∫_{−1.96}^{1.96} e^{−t²/2} dt = 0.95.

A fair coin is tossed n times (n is large, n ≫ 100). Let X_n be the number of heads among the n trials. Find a_n such that

    P( |X_n/n − 1/2| ≤ a_n ) = 0.95.
Solution to Exercise 3
(1) Let p be the probability that a head occurs and X the number of heads obtained in tossing a coin N times. Then X/N is an unbiased estimator of p. In fact, define

    Z_k = 1 if a head occurs on the k-th toss,  Z_k = 0 otherwise.

Then Z_1, Z_2, ⋯, Z_N are random samples from a population with mean p (this is called a binomial population). Then we know that the sample mean

    Z̄ = (1/N) Σ_{k=1}^N Z_k

is an unbiased estimator, i.e., E[Z̄] = p. On the other hand, Σ_{k=1}^N Z_k is the number of heads obtained in tossing a coin N times and coincides with X. Consequently, X/N = Z̄ is an unbiased estimator of p.

(2) Let p be the ratio of red balls in the urn. Then choosing a ball from the urn is equivalent to tossing a coin for which the probability that a head occurs is p, so the situation is the same as in (1). Letting X be the number of red balls among N trials, X/N is an unbiased estimator of p.
Solution to Exercise 4
First we find the distribution of M. The event {M = k} is divided into the following four cases:

• X_1 < k, X_2 = k, X_3 > k, or its permutations
• X_1 = X_2 = k, X_3 > k, or its permutations
• X_1 < k, X_2 = X_3 = k, or its permutations
• X_1 = X_2 = X_3 = k

Thus,

    P(M = k) = (1/N³) [6(k − 1)(N − k) + 3(N − k) + 3(k − 1) + 1]
             = (1/N³) [−6k² + 6(N + 1)k − (3N + 2)].

The mean value of M is computed as follows:

    E[M] = Σ_{k=1}^N k P(M = k) = (1/N³) Σ_{k=1}^N [−6k³ + 6(N + 1)k² − (3N + 2)k]
         = (1/N³) [ −6 · N²(N + 1)²/4 + 6(N + 1) · N(N + 1)(2N + 1)/6 − (3N + 2) · N(N + 1)/2 ]
         = (N + 1)/2,

which coincides with the mean of the reward, (N + 1)/2.
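The result E[M] = (N + 1)/2 can be cross-checked by exhaustive enumeration for a small deck, since all N³ ordered draws with replacement are equally likely. A minimal sketch (N = 5 is an arbitrary illustrative choice):

```python
from fractions import Fraction
from itertools import product
from statistics import median

N = 5   # small deck, so we can enumerate all N^3 ordered draws exactly

draws = list(product(range(1, N + 1), repeat=3))   # equally likely triples
E_M = sum(Fraction(median(t)) for t in draws) / len(draws)

print(E_M)   # (N + 1)/2 = 3
```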