Bernoulli and Binomial Distributions

Bernoulli Random Variables

• Setting:
  – finite population
  – each subject has a categorical response with one of 2 possible values (0/1)
  – pick a simple random sample of n=1 subject

• Y: random variable representing the response (a Bernoulli random variable)

$$E(Y) = p, \qquad \mathrm{var}(Y) = p(1-p), \qquad \text{where } p = \mathrm{Prob}(Y=1)$$

Bernoulli Random Variables

• Example: Finite population of 100 subjects, where 40 are normal weight and 60 are overweight.

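• Response:
  – 0 normal weight
  – 1 overweight

Population Parameters:

Mean

$$\mu = \frac{1}{N}\sum_{s=1}^{N} y_s = \frac{60}{100} = 0.6 = p$$

Variance

$$\begin{aligned}
\sigma^2 &= \frac{1}{N}\sum_{s=1}^{N} (y_s - \mu)^2 \\
&= \frac{1}{100}\left[40(0-p)^2 + 60(1-p)^2\right] \\
&= \frac{40}{100}\,p^2 + \frac{60}{100}\,(1-p)^2 \\
&= (1-p)\,p^2 + p\,(1-p)^2 \\
&= p(1-p)\left[p + (1-p)\right] \\
&= p(1-p)
\end{aligned}$$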

Bernoulli Random Variables

• Example: Finite population of 100 subjects, where 40 are normal weight and 60 are overweight.

Values:
  – 0 normal weight
  – 1 overweight

Population Parameters:

Mean: $\mu = p = 0.6$

Variance: $\sigma^2 = p(1-p) = 0.24$

Pick a single subject at random ($n = 1$):

$Y$, a Bernoulli random variable taking the value 1 or 0

Bernoulli Random Variables

• Example: Finite population of 100 subjects, where 40 are normal weight and 60 are overweight.

Values:
  – 0 normal weight
  – 1 overweight

$\mu = p = 0.6$, $\sigma^2 = p(1-p) = 0.24$

$Y$: a Bernoulli random variable

Probability distribution:

Event     y     P(y)
Normal    0     1-p
Overwt    1     p
Total           1

• events are mutually exclusive
• events are exhaustive
• probabilities sum to 1

Bernoulli Random Variables

• Example: Finite population of 100 subjects, where 40 are normal weight and 60 are overweight.

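Values:
  – 0 normal weight
  – 1 overweight

$\mu = p = 0.6$, $\sigma^2 = p(1-p) = 0.24$

$Y$: a Bernoulli random variable

Event     y     P(y)
Normal    0     1-p
Overwt    1     p
Total           1

$$E(Y) = \sum_{\text{events}} p(y)\,y = (1-p)(0) + p(1) = p$$

$$\mathrm{var}(Y) = \sum_{\text{events}} p(y)\,\left[y - E(y)\right]^2 = (1-p)(0-p)^2 + p(1-p)^2 = p(1-p)$$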
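The two sums over events can be checked numerically. The following is a minimal sketch, assuming the example population with p = 60/100; the function name `bernoulli_moments` is illustrative and does not come from the slides. Exact fractions are used so the results match the hand calculation.

```python
from fractions import Fraction

def bernoulli_moments(p):
    """Return (E[Y], var[Y]) by summing over the two events y = 0 and y = 1."""
    pmf = {0: 1 - p, 1: p}
    mean = sum(prob * y for y, prob in pmf.items())
    var = sum(prob * (y - mean) ** 2 for y, prob in pmf.items())
    return mean, var

# p = 60/100 as in the normal weight / overweight example
mean, var = bernoulli_moments(Fraction(60, 100))
print(mean, var)  # 3/5 6/25
```

The printed values are 0.6 and 0.24 in decimal form, which agrees with p and p(1-p).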

Bernoulli Random Variables

• Example: Finite population of 100 subjects, where 40 are normal weight and 60 are overweight.

Values:
  – 0 normal weight
  – 1 overweight

$\mu = p = 0.6$, $\sigma^2 = p(1-p) = 0.24$

$Y$: a Bernoulli random variable (simple random sample of n=1)

$$E(Y) = p, \qquad \mathrm{var}(Y) = p(1-p)$$

Binomial Random Variable

• Binomial Random Variable: The sum of independent identically distributed Bernoulli random variables.

• Example: Finite population of 100 subjects, where 40 are normal weight and 60 are overweight.

Values:
  – 0 normal weight
  – 1 overweight

• Select a simple random sample of size n with replacement
  – the random variable representing each selection is a Bernoulli random variable
  – the random variables are independent
  – the random variables are identically distributed

• iid = independent and identically distributed (always occurs for random variables representing selections using simple random sampling with replacement)

$$X = \sum_{i=1}^{n} Y_i$$

$X$ is a Binomial random variable.
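A small simulation can make the "sum of iid Bernoulli selections" definition concrete. This is only a sketch: the sample size n = 10, the number of repetitions, and the seed are arbitrary choices, and `draw_binomial` is a hypothetical helper name.

```python
import random

def draw_binomial(population, n, rng):
    """Sum of n selections made with replacement from a 0/1 population."""
    return sum(rng.choice(population) for _ in range(n))

population = [0] * 40 + [1] * 60   # 40 normal weight, 60 overweight
rng = random.Random(2009)          # arbitrary seed, only for repeatability
totals = [draw_binomial(population, 10, rng) for _ in range(5000)]
average_total = sum(totals) / len(totals)
print(average_total)               # should land close to n * p = 6
```

Each call to `rng.choice` is one Bernoulli selection, and because the subject is returned to the population before the next draw, the selections are independent and identically distributed.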

Independent Variables

Are the two random variables independent?

$Y_1$: first selection in a sample
$Y_2$: second selection in a sample

(with replacement)

Two random variables are independent if, for any realized value of the first random variable, the probability of each realized value of the second random variable is unchanged.

Independent Variables

Are the two random variables independent?

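$Y_1$: first selection in a sample
$Y_2$: second selection in a sample

(with replacement)

Unconditionally:

$$Y_2 = \begin{cases} 0 & \text{with } p(Y_2 = 0) = 1-p \\ 1 & \text{with } p(Y_2 = 1) = p \end{cases}$$

Suppose $Y_1 = 0$:

$$Y_2 = \begin{cases} 0 & \text{with } p(Y_2 = 0 \mid Y_1 = 0) = 1-p \\ 1 & \text{with } p(Y_2 = 1 \mid Y_1 = 0) = p \end{cases}$$

Suppose $Y_1 = 1$:

$$Y_2 = \begin{cases} 0 & \text{with } p(Y_2 = 0 \mid Y_1 = 1) = 1-p \\ 1 & \text{with } p(Y_2 = 1 \mid Y_1 = 1) = p \end{cases}$$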

Conclusion: The RV’s are independent

Independent Variables

Are the two random variables independent?

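$Y_1$: first selection in a sample
$Y_2$: second selection in a sample

(without replacement)

$$Y_1 = \begin{cases} 0 & \text{with } p(Y_1 = 0) = 1-p \\ 1 & \text{with } p(Y_1 = 1) = p \end{cases}$$

$Y_1$ is a Bernoulli random variable taking the value 1 or 0.

Population: $N = 10$, $p = 0.6$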

Independent Variables

Are the two random variables independent?

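$Y_1$: first selection in a sample
$Y_2$: second selection in a sample

(without replacement)

Suppose $Y_1 = 0$:

$$Y_2 = \begin{cases} 0 & \text{with } p(Y_2 = 0 \mid Y_1 = 0) = \frac{3}{9} \\ 1 & \text{with } p(Y_2 = 1 \mid Y_1 = 0) = \frac{6}{9} \end{cases}$$

Suppose $Y_1 = 1$:

$$Y_2 = \begin{cases} 0 & \text{with } p(Y_2 = 0 \mid Y_1 = 1) = \frac{4}{9} \\ 1 & \text{with } p(Y_2 = 1 \mid Y_1 = 1) = \frac{5}{9} \end{cases}$$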

Conclusion: The RV’s are not independent
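The conditional probabilities behind both conclusions can be reproduced by counting. The sketch below assumes the N = 10 population with 4 normal weight and 6 overweight subjects (p = 0.6); the function `second_draw_prob` is a hypothetical helper, not part of the original material.

```python
from fractions import Fraction

def second_draw_prob(n_zero, n_one, first, second, replace):
    """P(Y2 = second | Y1 = first) for a population of zeros and ones."""
    counts = {0: n_zero, 1: n_one}
    if not replace:
        counts[first] -= 1  # the first subject is no longer available
    return Fraction(counts[second], counts[0] + counts[1])

for first in (0, 1):
    with_rep = second_draw_prob(4, 6, first, 1, replace=True)
    without_rep = second_draw_prob(4, 6, first, 1, replace=False)
    print(first, with_rep, without_rep)
```

With replacement, P(Y2 = 1 | Y1) is 6/10 whatever the first selection was. Without replacement it is 6/9 after a 0 and 5/9 after a 1, so the distribution of the second selection depends on the first.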

Binomial Random Variable

• Binomial Random Variable: The sum of independent identically distributed (iid) Bernoulli random variables.

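$$X = \sum_{i=1}^{n} Y_i = \begin{pmatrix} 1 & 1 & \cdots & 1 \end{pmatrix} \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \mathbf{1}_n' \mathbf{Y}$$

$X$ is a Binomial random variable.

$$\mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} \quad \text{a vector of random variables}$$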

Expected Value and Variance of a Vector of Random Variables

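$$\mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} \quad \text{a vector of random variables}$$

$$\mathrm{cov}(Y_1, Y_2) = \sum_{\text{all possible values}} p(Y_1 = y_1, Y_2 = y_2)\,\left[y_1 - E(Y_1)\right]\left[y_2 - E(Y_2)\right]$$

$$E(\mathbf{Y}) = E\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} E(Y_1) \\ E(Y_2) \\ \vdots \\ E(Y_n) \end{pmatrix}$$

$$\mathrm{var}(\mathbf{Y}) = \begin{pmatrix}
\mathrm{var}(Y_1) & \mathrm{cov}(Y_1, Y_2) & \cdots & \mathrm{cov}(Y_1, Y_n) \\
\mathrm{cov}(Y_2, Y_1) & \mathrm{var}(Y_2) & \cdots & \mathrm{cov}(Y_2, Y_n) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{cov}(Y_n, Y_1) & \mathrm{cov}(Y_n, Y_2) & \cdots & \mathrm{var}(Y_n)
\end{pmatrix}$$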
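The covariance sum over all possible values can be evaluated directly for two selections. The sketch below is an illustration under assumptions: it reuses the N = 10, p = 0.6 population from the independence slides, and the helper names `joint_pmf` and `covariance` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def covariance(joint):
    """cov(Y1, Y2) from a joint pmf given as {(y1, y2): probability}."""
    e1 = sum(p * y1 for (y1, _), p in joint.items())
    e2 = sum(p * y2 for (_, y2), p in joint.items())
    return sum(p * (y1 - e1) * (y2 - e2) for (y1, y2), p in joint.items())

def joint_pmf(n_zero, n_one, replace):
    """Joint pmf of the first two selections from a 0/1 population."""
    total = n_zero + n_one
    joint = {}
    for y1, y2 in product((0, 1), repeat=2):
        counts = {0: n_zero, 1: n_one}
        p1 = Fraction(counts[y1], total)
        if not replace:
            counts[y1] -= 1          # first subject is not returned
        remaining = total if replace else total - 1
        joint[(y1, y2)] = p1 * Fraction(counts[y2], remaining)
    return joint

cov_with = covariance(joint_pmf(4, 6, replace=True))
cov_without = covariance(joint_pmf(4, 6, replace=False))
print(cov_with, cov_without)  # 0 -2/75
```

Sampling with replacement gives a zero off-diagonal entry in var(Y), while sampling without replacement gives a small negative covariance.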

Expected Value and Variance of a Vector of Random Variables

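$\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)'$, a vector of independent random variables:

$$E(\mathbf{Y}) = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix}, \qquad
\mathrm{var}(\mathbf{Y}) = \begin{pmatrix}
\sigma_1^2 & 0 & \cdots & 0 \\
0 & \sigma_2^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma_n^2
\end{pmatrix} \quad \text{(zero covariances)}$$

$\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)'$, a vector of independent and identically distributed (iid) random variables:

$$E(\mathbf{Y}) = \begin{pmatrix} \mu \\ \mu \\ \vdots \\ \mu \end{pmatrix} = \mu \mathbf{1}_n, \qquad
\mathrm{var}(\mathbf{Y}) = \begin{pmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{pmatrix} = \sigma^2 \mathbf{I}_n$$

where $\mathbf{I}_n$ is the identity matrix.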

Expected Value and Variance of a Linear Combination of Random Variables

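$$X = \sum_{i=1}^{n} Y_i = \mathbf{1}_n' \mathbf{Y} \quad \text{a Binomial random variable}$$

$\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)'$ is a vector of independent and identically distributed Bernoulli random variables:

$$E(\mathbf{Y}) = p\,\mathbf{1}_n, \qquad \mathrm{var}(\mathbf{Y}) = p(1-p)\,\mathbf{I}_n$$

$$E(X) = E(\mathbf{1}_n' \mathbf{Y}) = \mathbf{1}_n' E(\mathbf{Y}), \qquad \mathrm{var}(X) = \mathrm{var}(\mathbf{1}_n' \mathbf{Y}) = \mathbf{1}_n'\, \mathrm{var}(\mathbf{Y})\, \mathbf{1}_n$$

In general, for

$$X = \sum_{i=1}^{n} c_i Y_i = \mathbf{c}' \mathbf{Y}$$

$$E(X) = \mathbf{c}' E(\mathbf{Y}), \qquad \mathrm{var}(X) = \mathbf{c}'\, \mathrm{var}(\mathbf{Y})\, \mathbf{c}$$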

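Variance of a Binomial Random Variable

$$\begin{aligned}
\mathrm{var}(X) &= \mathrm{var}(\mathbf{1}_n' \mathbf{Y}) = \mathbf{1}_n'\, \mathrm{var}(\mathbf{Y})\, \mathbf{1}_n \\
&= \mathbf{1}_n' \begin{pmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{pmatrix} \mathbf{1}_n \\
&= \sigma^2\, \mathbf{1}_n' \begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix} \mathbf{1}_n \\
&= \sigma^2\, \mathbf{1}_n' \mathbf{1}_n \\
&= n \sigma^2
\end{aligned}$$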
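The quadratic form 1' var(Y) 1 can be evaluated with plain lists to see the result n σ² appear. This is a sketch with assumed values n = 4 and p = 0.6; `quadratic_form` is an illustrative helper that works for any coefficient vector c, not only the vector of ones.

```python
from fractions import Fraction

def quadratic_form(c, cov):
    """Compute c' * cov * c for a list c and a square list-of-lists cov."""
    n = len(c)
    return sum(c[i] * cov[i][j] * c[j] for i in range(n) for j in range(n))

n, p = 4, Fraction(6, 10)
sigma2 = p * (1 - p)                       # Bernoulli variance p(1-p)
# var(Y) for iid selections: sigma^2 on the diagonal, zero covariances
cov_iid = [[sigma2 if i == j else 0 for j in range(n)] for i in range(n)]
ones = [1] * n
var_x = quadratic_form(ones, cov_iid)
print(var_x)  # 24/25, i.e. n * sigma^2 = 4 * 0.24
```

Only the n diagonal terms contribute because the off-diagonal covariances are zero, which is why the independence of the selections matters for this result.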

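Expected Value and Variance of a Binomial Random Variable

$$X = \sum_{i=1}^{n} Y_i = \mathbf{1}_n' \mathbf{Y} \quad \text{a Binomial random variable}$$

$\mathbf{Y}$ is a vector of independent and identically distributed Bernoulli random variables:

$$E(\mathbf{Y}) = p\,\mathbf{1}_n, \qquad \mathrm{var}(\mathbf{Y}) = p(1-p)\,\mathbf{I}_n$$

$$E(X) = np, \qquad \mathrm{var}(X) = np(1-p)$$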

Binomial Distribution

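See Table A.1 in the Appendix of the text. Column for $p = 0.4$:

n    k = x    P(X = x)
4    0        0.1296
     1        0.3456
     2        0.3456
     3        0.1536
     4        0.0256

$$P(X = 2 \mid p = 0.4,\ n = 4) = 0.3456$$

$$P(X \ge 3 \mid p = 0.4,\ n = 4) = 0.1536 + 0.0256 = 0.1792$$

$$P(X < 3 \mid p = 0.4,\ n = 4) = 1 - P(X \ge 3 \mid p = 0.4,\ n = 4) = 1 - 0.1792 = 0.8208$$

$$P(X = 1 \mid p = 0.6,\ n = 4) = \ ?$$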

Binomial Distribution

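See Table A.1 in the Appendix of the text. For $p = 0.6$, read the $p = 0.4$ column with the values of $k$ reversed:

n    k    P(X = k)
4    4    0.1296
     3    0.3456
     2    0.3456
     1    0.1536
     0    0.0256

$$P(X = 1 \mid p = 0.6,\ n = 4) = 0.1536$$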
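The table lookups can be cross-checked by computing the binomial probabilities directly. The following is a minimal sketch using the standard formula with `math.comb`; the function name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 4, 0.4
table = [binomial_pmf(x, n, p) for x in range(n + 1)]
for x, prob in enumerate(table):
    print(x, round(prob, 4))

upper_tail = table[3] + table[4]           # P(X >= 3)
print(round(upper_tail, 4), round(1 - upper_tail, 4))
print(round(binomial_pmf(1, 4, 0.6), 4))   # the p = 0.6 question
```

A quick sanity check on any such table is that the probabilities for x = 0, ..., n add up to 1.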

SRS with rep: Seasons Study

With Seasons Study, define High Total Cholesterol: TC>240

Select SRS with replacement:

Run SAS program: ejs09b540p46.sas

Example: Change Program to get 5 samples of size n=10

For each, calculate total TC>240

Binomial Distribution

Figure 2. Histogram of Totals for Sample (Prop with TC>240) based on Samples of n=20

[Histogram: horizontal axis x1_sum (0 to 20), vertical axis Percent (0 to 40)]

Source: ejs09b540p47.sas 12/2/2009 by ejs

What if 10,000 samples were selected?

Binomial Distribution

$$\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$$

P(X = x = # with TC>240)

= (# ways of picking samples with x 'successes') × Pr(x 'successes') × Pr(n-x 'failures')

$$P(X = x) = \binom{n}{x}\, p^x (1-p)^{n-x}$$
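The "number of ways times probability of one way" argument can be checked by brute force. The sketch below lists every 0/1 sequence of length n and compares the count and total probability with the formula; the values n = 5, x = 2, p = 0.3 are illustrative and do not come from the Seasons Study.

```python
from itertools import product
from math import comb

def prob_by_enumeration(x, n, p):
    """Add up the probability of every 0/1 sequence of length n with x ones."""
    total = 0.0
    for seq in product((0, 1), repeat=n):
        if sum(seq) == x:
            # each such sequence has x successes and n - x failures
            total += p**x * (1 - p) ** (n - x)
    return total

n, x, p = 5, 2, 0.3   # illustrative values only
n_sequences = sum(1 for seq in product((0, 1), repeat=n) if sum(seq) == x)
formula = comb(n, x) * p**x * (1 - p) ** (n - x)
print(n_sequences, comb(n, x))
print(prob_by_enumeration(x, n, p), formula)
```

Enumeration is only practical for small n because there are 2^n sequences, which is the reason the closed-form count n!/(x!(n-x)!) is used instead.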

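Binomial Distribution: Likelihood

We select a SRS with replacement of n=10 and observe x=4. What is p?

$$\begin{aligned}
P(X = 4 \mid p,\ n = 10) &= \binom{n}{x}\, p^x (1-p)^{n-x} \\
&= \binom{10}{4}\, p^4 (1-p)^6 \\
&= \frac{10 \cdot 9 \cdot 8 \cdot 7}{4 \cdot 3 \cdot 2 \cdot 1}\, p^4 (1-p)^6 \\
&= 210\, p^4 (1-p)^6
\end{aligned}$$

This is a function of p.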

Binomial Distribution: Likelihood

We select a SRS with replacement of n=10 and observe x=4. What is p?

$$P(X = 4 \mid p,\ n = 10) = 210\, p^4 (1-p)^6$$

Likelihood:

$$L(p) = 210\, p^4 (1-p)^6$$

Use a table to find values for p:

p       L(p)       p       L(p)
0.05    0.001      0.40    0.2508
0.10    0.0112     0.45    0.2384
0.15    0.0401     0.50    0.2051
0.20    0.0881     0.55    0.1596
0.25    0.1460     0.60    0.1115
0.30    0.2001     0.65    0.0689
0.35    0.2377     etc.

Binomial Distribution: Maximum Likelihood

Likelihood:

$$L(p) = 210\, p^4 (1-p)^6$$

p       L(p)       p       L(p)
0.05    0.001      0.40    0.2508
0.10    0.0112     0.45    0.2384
0.15    0.0401     0.50    0.2051
0.20    0.0881     0.55    0.1596
0.25    0.1460     0.60    0.1115
0.30    0.2001     0.65    0.0689
0.35    0.2377     etc.

[Plot of $L(p)$ against $p$, labeled "Maximum Likelihood"]

$$\hat{p} = \frac{x}{n} = 0.4$$
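The table-based search for the maximum can be automated with a finer grid. This is a minimal sketch, assuming the observed x = 4 out of n = 10; the grid spacing of 0.01 is an arbitrary choice and `likelihood` is an illustrative name.

```python
from math import comb

def likelihood(p, x=4, n=10):
    """Binomial likelihood L(p) = C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

grid = [k / 100 for k in range(1, 100)]   # p = 0.01, 0.02, ..., 0.99
p_hat = max(grid, key=likelihood)         # grid point with the largest L(p)
print(p_hat, round(likelihood(p_hat), 4))
```

The grid maximum falls at x/n, in agreement with the closed-form maximum likelihood estimate; a grid search is used here only because it mirrors the table lookup.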

Binomial Distribution- Differences in Use

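Usually report "total" instead of "mean".

                      Mean                                                      Total
Estimate              $\hat{P} = \bar{Y}$                                       $n\hat{P} = n\bar{Y}$
Variance              $\dfrac{P(1-P)}{n}$                                       $nP(1-P)$
Estimated variance    $\hat{\sigma}^2_{\hat{p}} = \dfrac{\hat{P}(1-\hat{P})}{n}$   $n\hat{P}(1-\hat{P})$

Use Normal CLT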

Binomial Distribution- Differences in Use

Use Normal Dist for Interval Estimates

Mean:

$$\hat{P} \pm z_{0.975}\, \hat{\sigma}_{\hat{p}}$$

Total:

$$n\hat{P} \pm z_{0.975}\, n\, \hat{\sigma}_{\hat{p}}$$

Approximation good when $np \ge 5$ and $n(1-p) \ge 5$.
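A short calculation shows how the interval for the mean and the interval for the total relate. The sample below (x = 20 out of n = 50) is hypothetical, chosen so that the rule of thumb for the normal approximation holds; `proportion_interval` is an illustrative helper built on the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, level=0.95):
    """Normal-approximation interval for p from x successes in n selections."""
    p_hat = x / n
    se = sqrt(p_hat * (1 - p_hat) / n)              # estimated sd of p_hat
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # z_0.975 when level = 0.95
    return p_hat - z * se, p_hat + z * se

n, x = 50, 20   # hypothetical sample, not from the Seasons Study
low, high = proportion_interval(x, n)
print(round(low, 3), round(high, 3))
print(round(n * low, 1), round(n * high, 1))   # same interval for the total
```

The interval for the total is the interval for the mean multiplied by n, because the standard deviation of n P-hat is n times that of P-hat.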

Binomial Distribution- Differences in Use

$$H_0: p = p_0$$

$$z_{cal} = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}$$

Use the hypothesized p for the variance when $np \ge 5$ and $n(1-p) \ge 5$.
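The test statistic uses the hypothesized value in the variance, which the sketch below makes explicit. The data (x = 20, n = 50) and the null value 0.5 are hypothetical, and the two-sided p-value is an added assumption since no alternative hypothesis is stated here.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(x, n, p0):
    """z statistic and two-sided p-value for H0: p = p0."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)    # variance uses p0, not p_hat
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # assumes a two-sided alternative
    return z, p_value

z_cal, p_value = one_sample_z(x=20, n=50, p0=0.5)  # hypothetical data
print(round(z_cal, 3), round(p_value, 3))
```

Here n p0 = 25 and n(1 - p0) = 25, so the normal approximation condition is comfortably met.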

Binomial Distribution- CI for Difference in Prop.

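Diff in Means (Proportions, see 14.6)

$$(\hat{P}_1 - \hat{P}_2) \pm z_{0.975} \sqrt{\widehat{\mathrm{var}}(\hat{P}_1 - \hat{P}_2)}$$

$$\widehat{\mathrm{var}}(\hat{P}_1 - \hat{P}_2) = \frac{\hat{P}_1(1-\hat{P}_1)}{n_1} + \frac{\hat{P}_2(1-\hat{P}_2)}{n_2}$$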
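The interval for a difference in proportions adds the two estimated variances before taking the square root. The sketch below uses hypothetical counts (30 of 50 and 20 of 50); `diff_interval` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

def diff_interval(x1, n1, x2, n2, level=0.95):
    """Normal-approximation interval for p1 - p2 (unpooled variances)."""
    p1, p2 = x1 / n1, x2 / n2
    var_diff = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half = z * sqrt(var_diff)
    return (p1 - p2) - half, (p1 - p2) + half

low, high = diff_interval(30, 50, 20, 50)   # hypothetical counts
print(round(low, 3), round(high, 3))
```

Each sample keeps its own estimated proportion in the variance, since the interval does not assume the two proportions are equal.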

Binomial Distribution- Hyp. Test for Difference in Prop.

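$$H_0: p_1 = p_2 \qquad H_a: p_1 \ne p_2$$

or, equivalently,

$$H_0: p_1 - p_2 = 0 \qquad H_a: p_1 - p_2 \ne 0$$

$$z_{cal} = \frac{(\hat{P}_1 - \hat{P}_2) - (p_1 - p_2)}{\hat{\sigma}_d}$$

$$\hat{\sigma}_d = \sqrt{\frac{\hat{P}(1-\hat{P})}{n_1} + \frac{\hat{P}(1-\hat{P})}{n_2}}$$

Pooled prob:

$$\hat{P} = \frac{n_1 \hat{P}_1 + n_2 \hat{P}_2}{n_1 + n_2}$$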

Chi-Square Distribution Hyp. Test for Difference in Prop.

$$\chi^2_{cal} = z^2_{cal}$$

Under the null hypothesis, this statistic follows a chi-square distribution with 1 degree of freedom.

$$H_0: p_1 = p_2 \qquad H_a: p_1 \ne p_2$$

or, equivalently,

$$H_0: p_1 - p_2 = 0 \qquad H_a: p_1 - p_2 \ne 0$$
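The pooled z statistic and its square can be computed together. The sketch below uses hypothetical counts (30 of 50 and 20 of 50) and sets p1 - p2 = 0 under the null hypothesis; `two_sample_test` is an illustrative name. The two-sided normal p-value is used because it equals the upper-tail probability of a chi-square variable with 1 degree of freedom evaluated at z squared.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_test(x1, n1, x2, n2):
    """Pooled z statistic, its square, and the two-sided p-value for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)     # pooled probability
    sigma_d = sqrt(pooled * (1 - pooled) / n1 + pooled * (1 - pooled) / n2)
    z = (p1 - p2) / sigma_d                      # p1 - p2 = 0 under H0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, z**2, p_value

z_cal, chi2_cal, p_value = two_sample_test(30, 50, 20, 50)  # hypothetical counts
print(round(z_cal, 3), round(chi2_cal, 3), round(p_value, 4))
```

Unlike the confidence interval, the test pools the two samples when estimating the variance, because the null hypothesis says both groups share one proportion.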