1
Bernoulli and Binomial Distributions
2
Bernoulli Random Variables
• Setting:
  – finite population
  – each subject has a categorical response with one of 2 possible values (0/1)
  – pick a simple random sample of n = 1 subject
• Y: random variable representing the response (a Bernoulli random variable)
$$E(Y) = p$$
$$\operatorname{var}(Y) = p(1-p)$$
where $p = \mathrm{Prob}(Y = 1)$
3
Bernoulli Random Variables
• Example: Finite population of 100 subjects, where 40 are normal weight and 60 are overweight.
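• Response: 0 = normal weight; 1 = overweight
Population Parameters:
Mean
$$\mu = \frac{1}{N}\sum_{s=1}^{N} y_s = \frac{60}{100} = 0.6 = p$$
Variance
$$\begin{aligned}
\sigma^2 &= \frac{1}{N}\sum_{s=1}^{N} (y_s - p)^2 \\
&= \frac{1}{100}\left[40(0-p)^2 + 60(1-p)^2\right] \\
&= \frac{40}{100}\,p^2 + \frac{60}{100}\,(1-p)^2 \\
&= (1-p)\,p^2 + p\,(1-p)^2 \\
&= p(1-p)\left[p + (1-p)\right] \\
&= p(1-p)
\end{aligned}$$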
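The population mean and variance above can be checked numerically. The following is a minimal sketch that builds the 100-subject population from the example and uses the divisor N, as in the slide; the variable names are illustrative.

```python
from fractions import Fraction

# Population from the example: 40 subjects coded 0, 60 subjects coded 1.
population = [0] * 40 + [1] * 60
N = len(population)

# Population mean: (1/N) * sum of y_s.
p = Fraction(sum(population), N)

# Population variance with divisor N, matching the definition on the slide.
variance = sum((y - p) ** 2 for y in population) / N

print("mean     =", float(p))
print("variance =", float(variance))
print("p(1-p)   =", float(p * (1 - p)))
```

Exact fractions are used so that the variance can be compared with p(1-p) without rounding error.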
4
Bernoulli Random Variables
• Example: Finite population of 100 subjects, where 40 are normal weight and 60 are overweight.
Values: 0 = normal weight; 1 = overweight
Population Parameters:
Mean: $\mu = p = 0.6$
Variance: $\sigma^2 = p(1-p) = 0.24$
Pick a single subject at random ($n = 1$):
$Y$, a Bernoulli random variable, takes the value 1 or 0.
5
Bernoulli Random Variables
• Example: Finite population of 100 subjects, where 40 are normal weight and 60 are overweight.
Values: 0 = normal weight; 1 = overweight
$p = 0.6$, $\sigma^2 = p(1-p) = 0.24$
Probability distribution of $Y$, a Bernoulli random variable:

Event     y    P(y)
Normal    0    1-p
Overwt    1    p
Total          1

• events are mutually exclusive
• events are exhaustive
• probabilities sum to 1
6
Bernoulli Random Variables
• Example: Finite population of 100 subjects, where 40 are normal weight and 60 are overweight.
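Values: 0 = normal weight; 1 = overweight
$p = 0.6$, $\sigma^2 = p(1-p) = 0.24$
$Y$ is a Bernoulli random variable:

Event     y    P(y)
Normal    0    1-p
Overwt    1    p
Total          1

$$E(Y) = \sum_{\text{events}} p(y)\,y = (1-p)(0) + p(1) = p$$
$$\operatorname{var}(Y) = \sum_{\text{events}} p(y)\,\left[y - E(y)\right]^2 = (1-p)(0-p)^2 + p(1-p)^2 = p(1-p)$$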
7
Bernoulli Random Variables
• Example: Finite population of 100 subjects, where 40 are normal weight and 60 are overweight.
Values: 0 = normal weight; 1 = overweight
$p = 0.6$, $\sigma^2 = p(1-p) = 0.24$
$Y$ is a Bernoulli random variable, with $E(Y) = p$ and $\operatorname{var}(Y) = p(1-p)$.
Simple random sample of n = 1
8
Binomial Random Variable
• Binomial Random Variable: The sum of independent identically distributed Bernoulli random variables.
• Example: Finite population of 100 subjects, where 40 are normal weight and 60 are overweight.
Values: 0 = normal weight; 1 = overweight
• Select a simple random sample of size n with replacement:
  – the random variable representing each selection is a Bernoulli random variable
  – the random variables are independent
  – the random variables are identically distributed
• iid = independent and identically distributed (this always holds for random variables representing selections made by simple random sampling with replacement)
$$X = \sum_{i=1}^{n} Y_i$$
$X$ is a Binomial random variable.
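To illustrate the definition, the distribution of a sum of iid Bernoulli variables can be built by enumerating every possible sequence of selections. This is a sketch only: p = 0.6 comes from the overweight example, while the sample size n = 3 is an assumption chosen to keep the enumeration small.

```python
from fractions import Fraction
from itertools import product

p = Fraction(6, 10)   # P(Y_i = 1), from the overweight example
n = 3                 # illustrative sample size (an assumption)

# Accumulate P(X = x) over every possible sequence (y_1, ..., y_n).
dist = {x: Fraction(0) for x in range(n + 1)}
for ys in product([0, 1], repeat=n):
    prob = Fraction(1)
    for y in ys:
        # Independence: the probability of a sequence is the product
        # of the probabilities of its individual selections.
        prob *= p if y == 1 else 1 - p
    dist[sum(ys)] += prob

for x, prob in dist.items():
    print(x, float(prob))
```

The printed values sum to 1, since the enumerated sequences are mutually exclusive and exhaustive.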
9
Independent Variables
Are the two random variables independent?
$Y_1$: first selection in a sample
$Y_2$: second selection in a sample
(with replacement)
Two random variables are independent if, for any realized value of the first random variable, the probability of each value of the second random variable is unchanged.
10
Independent Variables
Are the two random variables independent?
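$Y_1$: first selection in a sample
$Y_2$: second selection in a sample
(with replacement)
$$Y_2 = \begin{cases} 0 & \text{with } p(Y_2 = 0) = 1-p \\ 1 & \text{with } p(Y_2 = 1) = p \end{cases}$$
Suppose $Y_1 = 0$:
$$Y_2 \mid Y_1 = 0 \;=\; \begin{cases} 0 & \text{with } p(Y_2 = 0 \mid Y_1 = 0) = 1-p \\ 1 & \text{with } p(Y_2 = 1 \mid Y_1 = 0) = p \end{cases}$$
Suppose $Y_1 = 1$:
$$Y_2 \mid Y_1 = 1 \;=\; \begin{cases} 0 & \text{with } p(Y_2 = 0 \mid Y_1 = 1) = 1-p \\ 1 & \text{with } p(Y_2 = 1 \mid Y_1 = 1) = p \end{cases}$$
Conclusion: The RVs are independent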
11
Independent Variables
Are the two random variables independent?
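$Y_1$: first selection in a sample
$Y_2$: second selection in a sample
(without replacement)
$$Y_1 = \begin{cases} 0 & \text{with } p(Y_1 = 0) = 1-p \\ 1 & \text{with } p(Y_1 = 1) = p \end{cases}$$
$Y$, a Bernoulli random variable, takes the value 1 or 0.
Population: $N = 10$, $p = 0.6$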
12
Independent Variables
Are the two random variables independent?
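$Y_1$: first selection in a sample
$Y_2$: second selection in a sample
(without replacement)
Suppose $Y_1 = 0$:
$$Y_2 \mid Y_1 = 0 \;=\; \begin{cases} 0 & \text{with } p(Y_2 = 0 \mid Y_1 = 0) = \frac{3}{9} \\ 1 & \text{with } p(Y_2 = 1 \mid Y_1 = 0) = \frac{6}{9} \end{cases}$$
Suppose $Y_1 = 1$:
$$Y_2 \mid Y_1 = 1 \;=\; \begin{cases} 0 & \text{with } p(Y_2 = 0 \mid Y_1 = 1) = \frac{4}{9} \\ 1 & \text{with } p(Y_2 = 1 \mid Y_1 = 1) = \frac{5}{9} \end{cases}$$
Conclusion: The RVs are not independent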
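The contrast between sampling with and without replacement can be checked by counting. The sketch below assumes a population of N = 10 with four subjects coded 0 and six coded 1 (so p = 0.6); the helper `second_draw_prob` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction

def second_draw_prob(population, first, second, replace):
    """P(Y2 = second | Y1 = first) for draws from a list of 0/1 values."""
    remaining = list(population)
    if not replace:
        remaining.remove(first)   # the first subject cannot be drawn again
    return Fraction(remaining.count(second), len(remaining))

# Assumed population: four 0s (normal weight) and six 1s (overweight).
population = [0] * 4 + [1] * 6

for replace in (True, False):
    for first in (0, 1):
        prob = second_draw_prob(population, first, 1, replace)
        print(f"replace={replace}, Y1={first}: P(Y2=1 | Y1) = {prob}")
```

With replacement the conditional probability does not depend on the first draw; without replacement it does, which is the sense in which the selections fail to be independent.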
13
Binomial Random Variable
• Binomial Random Variable: The sum of independent identically distributed (iid) Bernoulli random variables.
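$$X = \sum_{i=1}^{n} Y_i = \begin{pmatrix} 1 & 1 & \cdots & 1 \end{pmatrix} \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \mathbf{1}_n' \mathbf{Y}$$
$X$ is a Binomial random variable.
$$\mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}$$
$\mathbf{Y}$ is a vector of random variables.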
14
Expected Value and Variance of a Vector of Random Variables
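$$\mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}$$
$\mathbf{Y}$ is a vector of random variables.
$$E(\mathbf{Y}) = E\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} E(Y_1) \\ E(Y_2) \\ \vdots \\ E(Y_n) \end{pmatrix}$$
$$\operatorname{cov}(Y_1, Y_2) = \sum_{\text{all possible values}} p(Y_1 = y_1, Y_2 = y_2)\,\left[y_1 - E(Y_1)\right]\left[y_2 - E(Y_2)\right]$$
$$\operatorname{var}(\mathbf{Y}) = \begin{pmatrix}
\operatorname{var}(Y_1) & \operatorname{cov}(Y_1, Y_2) & \cdots & \operatorname{cov}(Y_1, Y_n) \\
\operatorname{cov}(Y_2, Y_1) & \operatorname{var}(Y_2) & \cdots & \operatorname{cov}(Y_2, Y_n) \\
\vdots & \vdots & \ddots & \vdots \\
\operatorname{cov}(Y_n, Y_1) & \operatorname{cov}(Y_n, Y_2) & \cdots & \operatorname{var}(Y_n)
\end{pmatrix}$$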
15
Expected Value and Variance of a Vector of Random Variables
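When $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)'$ is a vector of independent random variables:
$$E(\mathbf{Y}) = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix}, \qquad
\operatorname{var}(\mathbf{Y}) = \begin{pmatrix}
\sigma_1^2 & 0 & \cdots & 0 \\
0 & \sigma_2^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma_n^2
\end{pmatrix} \quad \text{(zero covariances)}$$
When $\mathbf{Y}$ is a vector of independent and identically distributed (iid) random variables:
$$E(\mathbf{Y}) = \mu \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} = \mu\,\mathbf{1}_n, \qquad
\operatorname{var}(\mathbf{Y}) = \begin{pmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{pmatrix} = \sigma^2\,\mathbf{I}_n$$
where $\mathbf{I}_n$ is the identity matrix.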
16
Expected Value and Variance of a Linear Combination of Random Variables
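$$X = \sum_{i=1}^{n} Y_i = \mathbf{1}_n' \mathbf{Y}$$
$X$ is a Binomial random variable; $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)'$ is a vector of independent and identically distributed Bernoulli random variables.
$$E(\mathbf{Y}) = E\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = p\,\mathbf{1}_n$$
$$\operatorname{var}(\mathbf{Y}) = p(1-p)\begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix} = p(1-p)\,\mathbf{I}_n$$
$$E(X) = E(\mathbf{1}_n' \mathbf{Y}) = \mathbf{1}_n' E(\mathbf{Y})$$
$$\operatorname{var}(X) = \operatorname{var}(\mathbf{1}_n' \mathbf{Y}) = \mathbf{1}_n' \operatorname{var}(\mathbf{Y})\,\mathbf{1}_n$$
In general, for $X = \sum_{i=1}^{n} c_i Y_i = \mathbf{c}' \mathbf{Y}$:
$$E(X) = \mathbf{c}' E(\mathbf{Y}), \qquad \operatorname{var}(X) = \mathbf{c}' \operatorname{var}(\mathbf{Y})\,\mathbf{c}$$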
17
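Variance of a Binomial Random Variable
$$\begin{aligned}
\operatorname{var}(X) &= \mathbf{1}_n' \operatorname{var}(\mathbf{Y})\,\mathbf{1}_n \\
&= \mathbf{1}_n' \begin{pmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{pmatrix} \mathbf{1}_n \\
&= \sigma^2\, \mathbf{1}_n' \begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix} \mathbf{1}_n \\
&= \sigma^2\, \mathbf{1}_n' \mathbf{1}_n \\
&= n\sigma^2
\end{aligned}$$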
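The quadratic form above can be evaluated directly for a small case. This is a sketch under stated assumptions: n = 4 is an illustrative sample size, p = 0.6 is taken from the overweight example, and `quadratic_form` is a hypothetical helper written with plain lists rather than a matrix library.

```python
from fractions import Fraction

def quadratic_form(c, cov):
    """Return c' * cov * c for a weight list c and a square matrix cov."""
    n = len(c)
    return sum(c[i] * cov[i][j] * c[j] for i in range(n) for j in range(n))

n = 4                       # illustrative sample size (an assumption)
p = Fraction(6, 10)         # p from the overweight example
sigma2 = p * (1 - p)        # variance of each Bernoulli Y_i

# var(Y) = sigma^2 * I_n: equal variances on the diagonal, zero covariances.
cov = [[sigma2 if i == j else Fraction(0) for j in range(n)] for i in range(n)]
ones = [1] * n

var_x = quadratic_form(ones, cov)
print("1' var(Y) 1 =", var_x)
print("n p (1-p)   =", n * p * (1 - p))
```

Replacing `ones` with any other list of weights gives the variance of the corresponding linear combination of the Y_i.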
18
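Expected Value and Variance of a Binomial Random Variable
$$X = \sum_{i=1}^{n} Y_i = \mathbf{1}_n' \mathbf{Y}$$
$X$ is a Binomial random variable; $\mathbf{Y}$ is a vector of independent and identically distributed Bernoulli random variables.
$$E(\mathbf{Y}) = p\,\mathbf{1}_n, \qquad \operatorname{var}(\mathbf{Y}) = p(1-p)\,\mathbf{I}_n$$
$$E(X) = np$$
$$\operatorname{var}(X) = np(1-p)$$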
19
Binomial Distribution
see Table A.1 in Appendix of Text

n    k = x    P(X = x | p = 0.4)
4    0        0.1296
     1        0.3456
     2        0.3456
     3        0.1536
     4        0.0256

$P(X = 2 \mid p = 0.4, n = 4) = 0.3456$
$P(X \ge 3 \mid p = 0.4, n = 4) = 0.1536 + 0.0256 = 0.1792$
$P(X < 3 \mid p = 0.4, n = 4) = 1 - P(X \ge 3 \mid p = 0.4, n = 4) = 1 - 0.1792 = 0.8208$
$P(X = 1 \mid p = 0.6, n = 4) = ?$
20
Binomial Distribution
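see Table A.1 in Appendix of Text
For p = 0.6, the same probabilities apply with the values of k in reverse order (k successes with p = 0.6 correspond to n - k successes with p = 0.4):

n    k = x    P(X = x | p = 0.6)
4    4        0.1296
     3        0.3456
     2        0.3456
     1        0.1536
     0        0.0256

$P(X = 1 \mid p = 0.6, n = 4) = 0.1536$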
21
SRS with rep: Seasons Study
In the Seasons Study, define High Total Cholesterol as TC>240.
Select an SRS with replacement:
Run SAS program: ejs09b540p46.sas
Example: Change the program to draw 5 samples of size n=10.
For each sample, calculate the total number of subjects with TC>240.
22
Binomial Distribution
[Figure 2. Histogram of Totals for Sample (Prop with TC>240) based on Samples of n=20. Horizontal axis: x1_sum (0 to 20); vertical axis: Percent (0 to 40). Source: ejs09b540p47.sas 12/2/2009 by ejs]
What if 10,000 samples were selected?
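The question can be explored with a small simulation. The sketch below is not the SAS program referenced on the slide; it draws 10,000 samples of n = 20 with replacement, and the proportion p = 0.3 with TC>240 is a hypothetical value, since the study proportion is not given here.

```python
import random
from collections import Counter

random.seed(540)     # fixed seed so the run is repeatable
n = 20               # sample size used in the figure
p = 0.3              # hypothetical proportion with TC>240 (not from the study)
n_samples = 10_000

# Each sample total is the number of "successes" among n independent draws.
totals = Counter(
    sum(random.random() < p for _ in range(n))
    for _ in range(n_samples)
)

# Text histogram: percent of samples with each total.
for x in range(n + 1):
    percent = 100 * totals[x] / n_samples
    print(f"{x:2d} {percent:5.1f}% {'#' * round(percent)}")
```

With this many samples, the percentages settle close to the binomial probabilities for the chosen n and p.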
23
Binomial Distribution
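$$\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$$
$P(X = x)$, where $x$ = # with TC>240:
$$P(X = x) = (\text{\# ways of picking samples with } x \text{ successes}) \times \Pr(x \text{ successes}) \times \Pr(n - x \text{ failures}) = \binom{n}{x}\, p^x (1-p)^{n-x}$$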
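The formula can be evaluated directly with the standard library. This is a minimal sketch; `binom_pmf` is an illustrative name, and the values n = 4 with p = 0.4 and p = 0.6 are chosen to match the Table A.1 lookups, for instance P(X = 1 | p = 0.6, n = 4) = 0.1536.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for a binomial random variable with parameters n and p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n = 4
print("x   p=0.4   p=0.6")
for x in range(n + 1):
    print(f"{x}  {binom_pmf(x, n, 0.4):.4f}  {binom_pmf(x, n, 0.6):.4f}")

# Upper-tail probability P(X >= 3 | p = 0.4, n = 4).
upper_tail = sum(binom_pmf(x, n, 0.4) for x in (3, 4))
print(f"P(X >= 3) = {upper_tail:.4f}")
```

The two columns mirror each other because x successes with p = 0.6 have the same probability as n - x successes with p = 0.4.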
24
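Binomial Distribution: Likelihood
We select an SRS with replacement of n=10 and observe x=4. What is p?
$$\begin{aligned}
P(X = 4 \mid p, n = 10) &= \binom{n}{x}\, p^x (1-p)^{n-x} \\
&= \binom{10}{4}\, p^4 (1-p)^6 \\
&= \frac{10 \cdot 9 \cdot 8 \cdot 7}{4 \cdot 3 \cdot 2 \cdot 1}\, p^4 (1-p)^6 \\
&= 210\, p^4 (1-p)^6
\end{aligned}$$
This is a function of p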
25
Binomial Distribution: Likelihood
We select an SRS with replacement of n=10 and observe x=4. What is p?
$$P(X = 4 \mid p, n = 10) = 210\, p^4 (1-p)^6$$
Likelihood: $L(p) = 210\, p^4 (1-p)^6$
Use a table to find values of L(p) for different p:

p       L(p)      p       L(p)
0.05    0.001     0.40    0.2508
0.10    0.0112    0.45    0.2384
0.15    0.0401    0.50    0.2051
0.20    0.0881    0.55    0.1596
0.25    0.1460    0.60    0.1115
0.30    0.2001    0.65    0.0689
0.35    0.2377    etc.
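The likelihood values can be regenerated from the formula. The sketch below assumes the observed data x = 4 and n = 10 from the slide and evaluates L(p) on the same grid of p values; the function name `likelihood` is illustrative.

```python
from math import comb

def likelihood(p, x=4, n=10):
    """Binomial likelihood L(p) for x successes observed in n selections."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Grid of candidate values: 0.05, 0.10, ..., 0.65.
grid = [round(0.05 * k, 2) for k in range(1, 14)]

print("p     L(p)")
for p in grid:
    print(f"{p:.2f}  {likelihood(p):.4f}")
```

Only p changes from row to row; the data (x and n) are held fixed, which is what distinguishes a likelihood from a probability distribution over x.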
26
Binomial Distribution: Maximum Likelihood
Likelihood: $L(p) = 210\, p^4 (1-p)^6$
Among the tabled values of p from 0.05 to 0.65, L(p) is largest at p = 0.40, where L(p) = 0.2508.
[Plot of L(p) against p, with the maximum of the likelihood marked at p = 0.4]
Maximum Likelihood:
$$\hat{p} = \frac{x}{n} = 0.4$$
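The maximum can also be located numerically. This sketch uses a simple grid search over p (a swapped-in technique for illustration, not something the slides prescribe) and compares the result with the closed-form estimate x/n for the observed x = 4 and n = 10.

```python
from math import comb

x, n = 4, 10   # observed data from the slide

def likelihood(p):
    """Binomial likelihood L(p) for the observed x and n."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Evaluate L(p) on a fine grid and keep the p with the largest value.
grid = [k / 1000 for k in range(1, 1000)]
p_hat_grid = max(grid, key=likelihood)

p_hat_closed = x / n   # closed-form maximum likelihood estimate

print("grid search :", p_hat_grid)
print("x / n       :", p_hat_closed)
print("L(p_hat)    :", round(likelihood(p_hat_closed), 4))
```

A finer grid only confirms the same location, since L(p) has a single peak on the interval from 0 to 1.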
27
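Binomial Distribution: Differences in Use

                      Mean                                                    Total
Estimate              $\hat{P} = \bar{Y}$                                     $n\hat{P} = n\bar{Y}$
Variance              $\frac{P(1-P)}{n}$                                      $nP(1-P)$
Estimated Variance    $\hat{\sigma}^2_{\hat{p}} = \frac{\hat{P}(1-\hat{P})}{n}$   $n\hat{P}(1-\hat{P})$

Usually report “total” instead of “mean”.
Use Normal (CLT)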
28
Binomial Distribution: Differences in Use
Use Normal Dist for Interval Estimates
Mean: $\hat{P} \pm z_{0.975}\,\hat{\sigma}_{\hat{p}}$
Total: $n\hat{P} \pm z_{0.975}\, n\,\hat{\sigma}_{\hat{p}}$
Approximation good when $np \ge 5$ and $n(1-p) \ge 5$
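A short numerical illustration of the interval for the mean and for the total follows. The counts x = 40 and n = 100 are hypothetical, chosen so that the rule-of-thumb check passes, and `proportion_ci` is an illustrative helper name.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(x, n, level=0.95):
    """Normal-approximation interval for a proportion: p_hat +/- z * se."""
    p_hat = x / n
    se = sqrt(p_hat * (1 - p_hat) / n)          # estimated sd of p_hat
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_0.975 for level 0.95
    return p_hat - z * se, p_hat + z * se

x, n = 40, 100   # hypothetical data (not from the source)

# Rule-of-thumb check for the normal approximation, using the estimate.
p_hat = x / n
approx_ok = n * p_hat >= 5 and n * (1 - p_hat) >= 5

lower, upper = proportion_ci(x, n)
print("approximation reasonable:", approx_ok)
print(f"mean : ({lower:.4f}, {upper:.4f})")
print(f"total: ({n * lower:.2f}, {n * upper:.2f})")
```

The interval for the total is the interval for the mean multiplied by n, which matches the two formulas on the slide.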
29
Binomial Distribution: Differences in Use
$H_0: p = p_0$
$$z_{cal} = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}$$
Use hypothesized p for variance when $np \ge 5$ and $n(1-p) \ge 5$
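As a worked illustration of the test statistic, the sketch below uses hypothetical data (x = 40, n = 100) and a hypothetical null value p0 = 0.5. The two-sided p-value is an assumption about the alternative, which the slide does not state.

```python
from math import sqrt
from statistics import NormalDist

x, n = 40, 100   # hypothetical data (not from the source)
p0 = 0.5         # hypothetical null value

p_hat = x / n

# The variance uses the hypothesized p0, not the estimate p_hat.
se0 = sqrt(p0 * (1 - p0) / n)
z_cal = (p_hat - p0) / se0

# Two-sided p-value from the standard normal distribution (assumed alternative).
p_value = 2 * (1 - NormalDist().cdf(abs(z_cal)))

print(f"z_cal   = {z_cal:.4f}")
print(f"p-value = {p_value:.4f}")
```

Using p0 in the standard error reflects that the statistic is evaluated under the null hypothesis.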
30
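Binomial Distribution: CI for Difference in Prop.
Diff in Means (Proportions, see 14.6)
$$(\hat{P}_1 - \hat{P}_2) \pm z_{0.975}\sqrt{\widehat{\operatorname{var}}(\hat{P}_1 - \hat{P}_2)}$$
$$\widehat{\operatorname{var}}(\hat{P}_1 - \hat{P}_2) = \frac{\hat{P}_1(1-\hat{P}_1)}{n_1} + \frac{\hat{P}_2(1-\hat{P}_2)}{n_2}$$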
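The interval for a difference in proportions can be computed in a few lines. The counts below are hypothetical and `diff_ci` is an illustrative helper; the variance is the unpooled sum of the two estimated variances.

```python
from math import sqrt
from statistics import NormalDist

def diff_ci(x1, n1, x2, n2, level=0.95):
    """Normal-approximation interval for p1 - p2 using unpooled variances."""
    p1, p2 = x1 / n1, x2 / n2
    var_diff = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sqrt(var_diff)
    return (p1 - p2) - half_width, (p1 - p2) + half_width

# Hypothetical counts (not from the source).
lower, upper = diff_ci(x1=45, n1=100, x2=30, n2=100)
print(f"95% CI for p1 - p2: ({lower:.4f}, {upper:.4f})")
```

The variances add because the two samples are assumed to be independent of each other.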
31
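Binomial Distribution: Hyp. Test for Difference in Prop.
$H_0: p_1 = p_2$ vs. $H_a: p_1 \ne p_2$, or equivalently $H_0: p_1 - p_2 = 0$ vs. $H_a: p_1 - p_2 \ne 0$
$$z_{cal} = \frac{(\hat{P}_1 - \hat{P}_2) - (p_1 - p_2)}{\hat{\sigma}_d}$$
$$\hat{\sigma}_d = \sqrt{\frac{\hat{P}(1-\hat{P})}{n_1} + \frac{\hat{P}(1-\hat{P})}{n_2}}$$
Pooled prob:
$$\hat{P} = \frac{n_1\hat{P}_1 + n_2\hat{P}_2}{n_1 + n_2}$$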
32
Chi-Square Distribution: Hyp. Test for Difference in Prop.
$H_0: p_1 = p_2$ vs. $H_a: p_1 \ne p_2$, or equivalently $H_0: p_1 - p_2 = 0$ vs. $H_a: p_1 - p_2 \ne 0$
$$\chi^2_{cal} = z_{cal}^2$$
Under the null hypothesis, this statistic follows a chi-square distribution with 1 degree of freedom.
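The relation between the pooled z statistic and the chi-square statistic can be checked numerically. The sketch below uses hypothetical counts and also computes the Pearson chi-square from the 2x2 table of counts as a comparison; that comparison is an added illustration, not something stated on the slides.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical counts (not from the source): successes and sample sizes.
x1, n1 = 45, 100
x2, n2 = 30, 100

p1_hat, p2_hat = x1 / n1, x2 / n2
p_pooled = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)

# Standard error of the difference under H0: p1 = p2 (pooled estimate).
sigma_d = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z_cal = (p1_hat - p2_hat) / sigma_d
chi2_cal = z_cal ** 2

# Pearson chi-square from the 2x2 table of observed counts, for comparison.
observed = [[x1, n1 - x1], [x2, n2 - x2]]
col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
total = n1 + n2
pearson = 0.0
for row, n_row in zip(observed, (n1, n2)):
    for obs, col_total in zip(row, col_totals):
        expected = n_row * col_total / total
        pearson += (obs - expected) ** 2 / expected

# With 1 degree of freedom, P(chi-square > z^2) equals the two-sided normal p-value.
p_value = 2 * (1 - NormalDist().cdf(abs(z_cal)))

print(f"z_cal = {z_cal:.4f}, z_cal^2 = {chi2_cal:.4f}, Pearson = {pearson:.4f}")
print(f"p-value = {p_value:.4f}")
```

Computing the p-value from the normal distribution avoids needing a chi-square routine, which the standard library does not provide.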