Statistical Distributions
Discrete Type
. Bernoulli Distribution
. Binomial Distribution
. Hypergeometric Distribution
. Multinomial Distribution
. Poisson Distribution
Continuous Type
. Normal Distribution
. Gamma Distribution
. Exponential Distribution
. Chi-square Distribution
. Student’s T Distribution
. F Distribution
. Bernoulli Distribution
Bernoulli experiment: an experiment whose outcome X is one of two values, Success (S, x = 1) or Failure (F, x = 0), with P(S) = P(X = 1) = p and P(F) = P(X = 0) = 1 - p = q.
X ~ Ber(p)
Probability distribution function
$$f(x) = P(X = x) = \begin{cases} p^{x}(1-p)^{1-x}, & x = 0, 1 \\ 0, & \text{o.w.} \end{cases}$$
$$E(X) = p$$
$$Var(X) = E(X^2) - [E(X)]^2 = p - p^2 = p(1-p) = pq$$
. Binomial Distribution
Binomial experiment: an independent Bernoulli experiment repeated n times,
i.e. a Binomial experiment possesses the following properties:
1° the experiment consists of a fixed number n of trials
2° the result of each trial can be classified into one of two categories
3° the probability p of a success remains constant for each trial
4° each trial of the experiment is independent of the other trials
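Let the r.v. X be the number of successes in the n trials of a Binomial experiment; then X is said to follow the Binomial distribution, X ~ Bin(n, p).
The probability distribution function is
$$f(x) = P(X = x) = \begin{cases} C^{n}_{x}\, p^{x}(1-p)^{n-x}, & x = 0, 1, 2, \ldots, n \\ 0, & \text{o.w.} \end{cases} \qquad \text{where } C^{n}_{x} = \frac{n!}{x!\,(n-x)!}$$
mean: $$E(X) = np$$
variance: $$Var(X) = E(X^2) - [E(X)]^2 = np(1-p)$$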
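Let $X_1, X_2, \ldots, X_n$ be a random sample from Ber(p) and $X = X_1 + X_2 + \cdots + X_n$; then X ~ Bin(n, p) and
$$E(X) = E(X_1 + X_2 + \cdots + X_n) = E(X_1) + E(X_2) + \cdots + E(X_n) = np$$
$$Var(X) = Var(X_1 + X_2 + \cdots + X_n) = Var(X_1) + Var(X_2) + \cdots + Var(X_n) = np(1-p)$$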
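As a quick numerical check of the binomial mean and variance, the sketch below sums over the pmf directly. The values n = 10 and p = 0.4 are illustrative assumptions, not taken from the slides.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.4  # illustrative values (assumed)
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                                          # should be 1
mean = sum(x * f for x, f in enumerate(pmf))              # E(X)
second_moment = sum(x**2 * f for x, f in enumerate(pmf))  # E(X^2)
variance = second_moment - mean**2                        # E(X^2) - [E(X)]^2

print(f"sum = {total:.6f}, mean = {mean:.6f}, variance = {variance:.6f}")
print(f"np = {n * p}, np(1-p) = {n * p * (1 - p)}")
```

The enumerated moments should agree with np and np(1-p) up to floating-point rounding.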
. Hypergeometric Distribution
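◆ Sampling with replacement (WR) vs. sampling without replacement (WTR)
[Figure: a population of N balls, R of them red; a sample of n balls is drawn from it.]
Let X be the number of red balls in the sample (drawn without replacement); then X follows the hypergeometric distribution, X ~ Hyp(n, R, N).
The probability distribution function (PDF) is
$$f(x) = P(X = x) = \begin{cases} \dfrac{C^{R}_{x}\, C^{N-R}_{n-x}}{C^{N}_{n}}, & \max\{0,\ n-(N-R)\} \le x \le \min\{n, R\} \\ 0, & \text{o.w.} \end{cases}$$
mean: $$E(X) = n\,\frac{R}{N}$$
variance: $$Var(X) = n\,\frac{R}{N}\left(1 - \frac{R}{N}\right)\frac{N-n}{N-1}$$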
Theorem:
If X ~ Hyp(n, R, N), then for each value x = 0, 1, 2, ..., n, and as N → ∞ and R → ∞ with R/N → p, a positive constant,
$$\lim_{N \to \infty} \frac{C^{R}_{x}\, C^{N-R}_{n-x}}{C^{N}_{n}} = C^{n}_{x}\, p^{x}(1-p)^{n-x}$$
Example:
Ten seeds are selected from a bin that contains 1000 flower seeds,
of which 400 are red flowering seeds, and the rest are of other colors.
1° P(exactly five red flowering seeds) = 0.2013
mean = 4, variance = 2.378
2° P(five red flowering seeds by the Binomial approximation) = 0.2007
mean = 4, variance = 2.40
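The figures in this example can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function names are illustrative.

```python
from math import comb

def hypergeom_pmf(x, n, R, N):
    """P(X = x): x red items when n are drawn without replacement from N, R of them red."""
    return comb(R, x) * comb(N - R, n - x) / comb(N, n)

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

N, R, n = 1000, 400, 10   # bin size, red-flowering seeds, sample size
p = R / N

exact = hypergeom_pmf(5, n, R, N)                # hypergeometric probability
approx = binomial_pmf(5, n, p)                   # binomial approximation
hyp_var = n * p * (1 - p) * (N - n) / (N - 1)    # hypergeometric variance
bin_var = n * p * (1 - p)                        # binomial variance

print(f"hypergeometric: P = {exact:.4f}, variance = {hyp_var:.3f}")
print(f"binomial:       P = {approx:.4f}, variance = {bin_var:.2f}")
```

Here n/N = 0.01, so the two distributions are close, as the theorem above suggests.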
Extension:
. Multinomial Distribution
If each trial has several different possible outcomes, label the outcome types by i, where i = 1, 2, ..., k. Let $p_i$ be the probability of type i at each trial and $X_i$ the count of type i in a sample of size n; then the probability of $X = (X_1, X_2, \ldots, X_k)$ is
$$f(x_1, x_2, \ldots, x_k) = P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k) = \frac{n!}{x_1!\, x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k} = \frac{n!}{\prod_{i=1}^{k} x_i!} \prod_{i=1}^{k} p_i^{x_i}, \qquad \sum_{i=1}^{k} x_i = n,\ \sum_{i=1}^{k} p_i = 1$$
$$X \sim MULT(n, p_1, p_2, \ldots, p_k)$$
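A minimal sketch of the multinomial probability function follows. The counts (2, 1, 1) and probabilities (0.5, 0.3, 0.2) are made-up illustrative values, and the helper name is hypothetical.

```python
from math import comb, factorial, prod

def multinomial_pmf(counts, probs):
    """P(X_1 = x_1, ..., X_k = x_k) with n = sum(counts) trials."""
    n = sum(counts)
    coefficient = factorial(n)
    for x in counts:
        coefficient //= factorial(x)   # n! / (x_1! x_2! ... x_k!), exact at every step
    return coefficient * prod(p**x for p, x in zip(probs, counts))

# Illustrative case (assumed values): n = 4 trials, k = 3 outcome types.
value = multinomial_pmf((2, 1, 1), (0.5, 0.3, 0.2))
print(f"{value:.4f}")

# With k = 2 the formula reduces to the binomial pmf.
two_types = multinomial_pmf((3, 7), (0.4, 0.6))
binomial = comb(10, 3) * 0.4**3 * 0.6**7
print(f"{two_types:.6f} {binomial:.6f}")
```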
H.W. Write out the “Extended Hypergeometric Distribution”.
. Poisson Distribution
The number of cars that are red, out of every 10 cars that pass a certain spot on a road → Binomial distribution
The number of red cars that pass the spot per hour, without specifying how many cars there are in total → Poisson distribution
r.v. X ~ Poi(λ)
$$f(x) = P(X = x) = \begin{cases} \dfrac{e^{-\lambda} \lambda^{x}}{x!}, & x = 0, 1, 2, \ldots \\ 0, & \text{o.w.} \end{cases}$$
mean = variance = λ
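Example:
Suppose the number of patients arriving at a hospital follows a Poisson process with an average of 1 arrival per hour. Then
1° P(no patient arrives within 1 hour)
$$P(X = 0) = f(0) = \frac{e^{-1}\, 1^{0}}{0!} = e^{-1} = 0.3679$$
2° P(fewer than 4 patients arrive within 1 hour)
$$P(X < 4) = P(X \le 3) = \sum_{x=0}^{3} \frac{e^{-1}\, 1^{x}}{x!} = 0.9810$$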
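The two probabilities in this hospital example can be checked with a short standard-library sketch (the function name is illustrative):

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poi(lam)."""
    return exp(-lam) * lam**x / factorial(x)

lam = 1.0  # average of 1 arrival per hour

p_none = poisson_pmf(0, lam)                                  # P(X = 0)
p_fewer_than_4 = sum(poisson_pmf(x, lam) for x in range(4))   # P(X <= 3)

print(f"P(X = 0)  = {p_none:.4f}")
print(f"P(X <= 3) = {p_fewer_than_4:.4f}")
```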
Approximation chain:
Hypergeometric distribution → Binomial distribution, when n/N ≤ 0.05
Binomial distribution → Poisson distribution, when n is large and p is small (rare event)
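Example:
Suppose the cure rate for a certain disease is 2%. If 100 patients receive treatment, find the probability that at most three are cured.
<Sol. Binomial>
Let the r.v. X be the number of patients cured; then X ~ Bin(100, 0.02) and
$$P(X \le 3) = \sum_{x=0}^{3} C^{100}_{x}\, (0.02)^{x} (0.98)^{100-x} = 0.859$$
<Sol. Poisson>
n = 100 is large and p = 0.02 is small, with np = 2, so approximately X ~ Poi(2) and
$$P(X \le 3) \approx \sum_{x=0}^{3} \frac{e^{-2}\, 2^{x}}{x!} = 0.857$$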
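Both solutions can be compared numerically. The sketch below uses only the standard library and takes λ = np for the Poisson approximation, as in the example.

```python
from math import comb, exp, factorial

n, p = 100, 0.02      # patients treated, cure rate
lam = n * p           # Poisson parameter for the approximation

# Exact binomial probability of at most three cures.
binomial_prob = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(4))

# Poisson approximation with lambda = np = 2.
poisson_prob = sum(exp(-lam) * lam**x / factorial(x) for x in range(4))

print(f"Binomial: {binomial_prob:.3f}")
print(f"Poisson:  {poisson_prob:.3f}")
```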
. Normal Distribution
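r.v. X ~ N(μ, σ²)
[Figure: bell-shaped density curve f(x) plotted against X, centered at μ.]
the pdf for X is
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad x \in R,\ \sigma^2 > 0$$
$$E(X) = \mu, \qquad Var(X) = \sigma^2$$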
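[Figure: density of X ~ N(μ, σ²) with the intervals μ ± σ and μ ± 2σ marked.]
$$P(\mu - \sigma < X < \mu + \sigma) = P(|X - \mu| < \sigma) = 0.683$$
$$P(\mu - 2\sigma < X < \mu + 2\sigma) = P(|X - \mu| < 2\sigma) = 0.954$$
$$P(\mu - 3\sigma < X < \mu + 3\sigma) = P(|X - \mu| < 3\sigma) = 0.997$$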
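The three interval probabilities do not depend on μ or σ, since P(|X − μ| < kσ) = P(|Z| < k) for a standard normal Z. A minimal sketch, using the identity P(|Z| < k) = erf(k/√2) from the standard library:

```python
from math import erf, sqrt

def prob_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any normal X, i.e. P(|Z| < k)."""
    return erf(k / sqrt(2))

probs = {k: prob_within(k) for k in (1, 2, 3)}
for k, value in probs.items():
    print(f"within {k} sigma: {value:.4f}")
```

To four decimals the values are about 0.6827, 0.9545 and 0.9973, which the slide quotes to three decimals.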
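X ~ N(μ, σ²)
[Figure: density of X ~ N(μ, σ²) with the points a and b marked on either side of μ.]
$$P(a \le X \le b) = F(b) - F(a) = \int_{a}^{b} f(x)\,dx = \int_{a}^{b} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = \ ??$$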
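● X ~ N(μ, σ²) → normalized → Z ~ N(0, 1)
$$Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$
[Figure: standard normal density ψ(z) plotted against Z, centered at 0.]
● the pdf for Z is
$$\psi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}, \qquad -\infty < z < \infty$$
Z ~ N(0, 1)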
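● $$\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} \psi(t)\,dt = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{t^2}{2}}\,dt = \ ??$$
(look it up in the table)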
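Standard Normal Cumulative Distribution Function Φ(z)
(… marks an entry not shown on the slide)
z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1   0.5398  …       …       …       …       0.5596  …       …       …       …
0.2   0.5793  …       …       …       …       0.5987  …       …       …       …
0.3   0.6179  …       …       …       …       0.6368  …       …       …       …
0.4   0.6554  …       …       …       …       0.6736  …       …       …       …
0.5   0.6915  …       …       …       …       0.7088  …       …       …       …
0.6   0.7257  …       …       …       …       0.7422  …       …       …       …
0.7   0.7580  …       …       …       …       0.7734  …       …       …       …
0.8   0.7881  …       …       …       …       0.8023  …       …       …       …
0.9   0.8159  …       …       …       …       0.8289  …       …       …       …
1.0   0.8413  …       …       …       …       0.8531  …       …       …       …
1.1   0.8643  …       …       …       …       0.8749  …       …       …       …
1.2   0.8849  …       …       …       …       0.8944  …       …       …       …
1.3   0.9032  …       …       …       …       0.9115  …       …       …       …
1.4   0.9192  …       …       …       …       0.9265  …       …       …       …
1.5   0.9332  …       …       …       …       0.9394  …       …       …       …
1.6   0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
1.7   0.9554  …       …       …       …       0.9599  …       …       …       …
1.8   0.9641  …       …       …       …       0.9678  …       …       …       …
1.9   0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0   0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  …       …       …       …
2.1   0.9821  …       …       …       …       0.9842  …       …       …       …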
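● Properties: Z ~ N(0, 1)
$$\Phi(0) = P(Z \le 0) = 0.5000$$
[Figure: standard normal density ψ(z) with the points −z and z marked on either side of 0.]
1° $$\Phi(-z) = P(Z \le -z) = P(Z \ge z) = 1 - P(Z \le z) = 1 - \Phi(z)$$
2° $$P(a \le Z \le b) = \Phi(b) - \Phi(a)$$
3° X ~ N(μ, σ²):
$$P(a \le X \le b) = F(b) - F(a) = P\left(\frac{a - \mu}{\sigma} \le Z \le \frac{b - \mu}{\sigma}\right) = \Phi\left(\frac{b - \mu}{\sigma}\right) - \Phi\left(\frac{a - \mu}{\sigma}\right)$$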
Example:
1. P(−0.15 ≤ Z ≤ 1.60) = Φ(1.60) − Φ(−0.15) = Φ(1.60) − [1 − Φ(0.15)] = 0.9452 − [1 − 0.5596] = 0.9452 − 0.4404 = 0.5048
2. P(Z ≤ −1.9 or Z ≥ 2.1) = P(Z ≤ −1.9) + P(Z ≥ 2.1) = Φ(−1.9) + [1 − Φ(2.1)] = [1 − Φ(1.9)] + [1 − Φ(2.1)] = [1 − 0.9713] + [1 − 0.9821] = 0.0287 + 0.0179 = 0.0466
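Instead of a table lookup, Φ(z) can be evaluated from the error function, Φ(z) = ½[1 + erf(z/√2)]. The sketch below recomputes the two examples; `phi_cdf` is an illustrative name.

```python
from math import erf, sqrt

def phi_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Example 1: P(-0.15 <= Z <= 1.60)
p1 = phi_cdf(1.60) - phi_cdf(-0.15)

# Example 2: P(Z <= -1.9 or Z >= 2.1)
p2 = phi_cdf(-1.9) + (1 - phi_cdf(2.1))

print(f"{p1:.4f} {p2:.4f}")
```

Results may differ from the table-based answers in the last digit because table entries are rounded to four decimals.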
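Z ~ N(0, 1)
[Figure: standard normal density with an area α in each tail, to the left of −z_α and to the right of z_α.]
● $$P(Z \ge z_\alpha) = P(Z \le -z_\alpha) = \alpha$$
● $$\Phi(z_\alpha) = 1 - \alpha, \qquad \Phi(-z_\alpha) = 1 - \Phi(z_\alpha) = \alpha$$
Example: $z_{0.025} = 1.96$, $z_{0.05} = 1.645$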
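Example: Suppose the body weight (X) of males in a certain population is normally distributed with mean 80 kg and standard deviation 5 kg. Then the proportion whose weight is between 65 and 75 kg is
$$P(65 \le X \le 75) = P\left(\frac{65 - 80}{5} \le \frac{X - 80}{5} \le \frac{75 - 80}{5}\right) = P(-3.0 \le Z \le -1.0) = 0.1574$$
H.W. What proportion weighs more than 85 kg?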
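The 65 to 75 kg proportion can also be computed without standardizing by hand. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8+).

```python
from statistics import NormalDist

# Body weight X ~ N(80, 5^2), in kilograms.
weight = NormalDist(mu=80, sigma=5)

# P(65 <= X <= 75) = F(75) - F(65)
proportion = weight.cdf(75) - weight.cdf(65)

# The same probability after standardizing: P(-3.0 <= Z <= -1.0)
standard = NormalDist()  # N(0, 1)
standardized = standard.cdf(-1.0) - standard.cdf(-3.0)

print(f"{proportion:.4f} {standardized:.4f}")
```

The unrounded result is about 0.1573. The 0.1574 quoted in the example comes from table values rounded to four decimals (0.1587 − 0.0013).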
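● r.v. X ~ Bin(n, p), and np ≥ 5, n(1 − p) ≥ 5
⇒ approximately X ~ N(np, np(1 − p)), i.e.
$$Z = \frac{X - np}{\sqrt{np(1-p)}} \sim N(0, 1)$$
1° $$P(X \le x) \approx P\left(Z \le \frac{x + 0.5 - np}{\sqrt{np(1-p)}}\right)$$
2° $$P(X \ge x) \approx P\left(Z \ge \frac{x - 0.5 - np}{\sqrt{np(1-p)}}\right)$$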
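Example: Consider sample allele proportions for the ABO blood group system. For a sample of 16 alleles from a population in which allele A has proportion 0.50, let X be the number of A alleles in the sample.
1° P(X ≤ 6) = 0.2272 by the Binomial distribution
2° By the Normal approximation,
$$P(X \le 6) \approx P\left(Z \le \frac{6 + 0.5 - 16 \times 0.5}{\sqrt{16 \times 0.5 \times 0.5}}\right) = P(Z \le -0.75) = 0.2266$$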
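The exact binomial probability and its normal approximation with the 0.5 continuity correction can be compared directly. A minimal standard-library sketch:

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, x = 16, 0.5, 6   # sample of 16 alleles, P(allele A) = 0.50, event X <= 6

# Exact: sum the Bin(16, 0.5) pmf from 0 to 6.
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

# Normal approximation with continuity correction.
z = (x + 0.5 - n * p) / sqrt(n * p * (1 - p))
approx = NormalDist().cdf(z)

print(f"z = {z}, exact = {exact:.4f}, approx = {approx:.4f}")
```

Here np = n(1 − p) = 8, so the np ≥ 5 and n(1 − p) ≥ 5 conditions hold.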
. Gamma Distribution
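r.v. X ~ Gam(α, β)
pdf:
$$f(x) = \frac{1}{\Gamma(\alpha)\, \beta^{\alpha}}\, x^{\alpha - 1} e^{-x/\beta}, \qquad x > 0,\ \alpha > 0,\ \beta > 0$$
$$E(X) = \alpha\beta$$
$$Var(X) = \alpha\beta^2$$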
. Exponential Distribution
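In the Gamma distribution, r.v. X ~ Gam(α, β), if α = 1 then
r.v. X ~ EXP(β), with pdf
$$f(x) = \frac{1}{\beta}\, e^{-x/\beta}, \qquad x > 0$$
$$E(X) = \beta, \qquad Var(X) = \beta^2$$
Theorem: no-memory property
$$X \sim Exp(\beta) \ \Rightarrow\ P(X > a + t \mid X > a) = P(X > t)$$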
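The no-memory property follows from the survival function P(X > x) = e^{−x/β}, because P(X > a + t | X > a) = e^{−(a+t)/β} / e^{−a/β} = e^{−t/β}. The sketch below checks this numerically. It assumes the scale parameterization, with β as the mean, and the values of β, a and t are arbitrary illustrations.

```python
from math import exp

def survival(x, beta):
    """P(X > x) for an exponential r.v. with mean (scale) beta."""
    return exp(-x / beta)

beta, a, t = 2.0, 1.5, 0.7   # illustrative values (assumed)

# P(X > a + t | X > a) = P(X > a + t) / P(X > a)
conditional = survival(a + t, beta) / survival(a, beta)

# P(X > t)
unconditional = survival(t, beta)

print(f"{conditional:.6f} {unconditional:.6f}")
```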
. Chi-square Distribution
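In the Gamma distribution, r.v. X ~ Gam(α, β), if α = ν/2 and β = 2 then
r.v. X ~ χ²(ν), with pdf
$$f(x) = \frac{1}{\Gamma\left(\frac{\nu}{2}\right) 2^{\nu/2}}\, x^{\frac{\nu}{2} - 1} e^{-x/2}, \qquad x > 0$$
$$E(X) = \nu, \qquad Var(X) = 2\nu$$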
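Example:
r.s. $X_1, X_2, \ldots, X_n \sim N(\mu, \sigma^2)$, with $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$
$$\Rightarrow\ \frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$$
$$E\left(\frac{(n-1)S^2}{\sigma^2}\right) = n - 1 \ \Rightarrow\ \frac{n-1}{\sigma^2}\, E(S^2) = n - 1 \ \Rightarrow\ E(S^2) = \sigma^2$$
$$Var\left(\frac{(n-1)S^2}{\sigma^2}\right) = 2(n-1) \ \Rightarrow\ \frac{(n-1)^2}{\sigma^4}\, Var(S^2) = 2(n-1) \ \Rightarrow\ Var(S^2) = \frac{2\sigma^4}{n-1}$$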
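The two results for the sample variance of a normal sample, E(S²) = σ² and Var(S²) = 2σ⁴/(n − 1), can be checked roughly by simulation. The sketch below is a Monte Carlo illustration only. The values of μ, σ, n, the number of replications and the seed are arbitrary assumptions, and the estimates carry sampling error.

```python
import random

def sample_variance(xs):
    """S^2 with divisor n - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

random.seed(0)                               # fixed seed so the run is repeatable
mu, sigma, n, reps = 10.0, 2.0, 5, 20000     # illustrative settings (assumed)

# Draw many normal samples of size n and record S^2 for each.
s2_values = [
    sample_variance([random.gauss(mu, sigma) for _ in range(n)])
    for _ in range(reps)
]

mean_s2 = sum(s2_values) / reps              # estimates E(S^2)
var_s2 = sample_variance(s2_values)          # estimates Var(S^2)

print(f"mean of S^2 ~ {mean_s2:.3f} (theory {sigma**2:.3f})")
print(f"var of S^2  ~ {var_s2:.3f} (theory {2 * sigma**4 / (n - 1):.3f})")
```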
. Student’s T Distribution
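If Z ~ N(0, 1) and V ~ χ²(ν) are independent, then
$$T = \frac{Z}{\sqrt{V/\nu}} \sim t(\nu)$$
pdf:
$$f(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\ \Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \qquad x \in R$$
$$E(X) = 0, \qquad Var(X) = \frac{\nu}{\nu - 2},\ \nu > 2$$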
. F Distribution
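If $V_1 \sim \chi^2(\nu_1)$ and $V_2 \sim \chi^2(\nu_2)$ are independent, then
$$X = \frac{V_1/\nu_1}{V_2/\nu_2} \sim F(\nu_1, \nu_2)$$
pdf:
$$f(x) = \frac{\Gamma\left(\frac{\nu_1+\nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\frac{\nu_1}{2}} x^{\frac{\nu_1}{2} - 1} \left(1 + \frac{\nu_1}{\nu_2}\, x\right)^{-\frac{\nu_1+\nu_2}{2}}, \qquad x > 0$$
$$E(X) = \frac{\nu_2}{\nu_2 - 2},\ \nu_2 > 2$$
$$Var(X) = \frac{2\nu_2^2\,(\nu_1 + \nu_2 - 2)}{\nu_1 (\nu_2 - 2)^2 (\nu_2 - 4)},\ \nu_2 > 4$$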
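Example:
r.s. $X_1, X_2, \ldots, X_{n_x} \sim N(\mu_x, \sigma_x^2)$ and r.s. $Y_1, Y_2, \ldots, Y_{n_y} \sim N(\mu_y, \sigma_y^2)$ are independent, with
$$S_x^2 = \frac{1}{n_x - 1} \sum_{i=1}^{n_x} (X_i - \bar{X})^2, \qquad S_y^2 = \frac{1}{n_y - 1} \sum_{i=1}^{n_y} (Y_i - \bar{Y})^2$$
$$\frac{(n_x - 1) S_x^2}{\sigma_x^2} \sim \chi^2(n_x - 1) \quad \text{and} \quad \frac{(n_y - 1) S_y^2}{\sigma_y^2} \sim \chi^2(n_y - 1) \quad \text{are independent}$$
$$F = \frac{\dfrac{(n_x - 1) S_x^2}{\sigma_x^2} \Big/ (n_x - 1)}{\dfrac{(n_y - 1) S_y^2}{\sigma_y^2} \Big/ (n_y - 1)} = \frac{S_x^2 / \sigma_x^2}{S_y^2 / \sigma_y^2} \sim F(n_x - 1,\ n_y - 1)$$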
Basic Sampling Distribution Theory
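● A statistic is a function of a random sample.
e.g.
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
● A statistic is a random variable. → Sampling distribution
e.g.
$$\text{random sample } X_1, X_2, \ldots, X_n \sim N(\mu, \sigma^2) \ \Rightarrow\ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$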
Inferential statistics
◆ Estimation of parameter
◆ Testing of statistical hypothesis
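Estimation: draw a sample from the population and, based on the sampling distribution of the sample statistic, infer the true value of the population parameter.
Point estimation
Interval estimation
Point estimation
e.g.
Sample mean $\bar{X}$ → Est. → Population mean $\mu$
Sample variance $S^2$ → Est. → Population variance $\sigma^2$
Sample proportion $\hat{p} = x/n$ → Est. → Population proportion $p$
i.e. $\hat{\theta}$ (Statistic) → Est. → $\theta$ (Parameter)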
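‧ Unbiasedness
Suppose T is an estimator of an unknown population parameter θ. If E(T) = θ, then T is an unbiased estimator of θ.
Is there only one estimator (statistic) for a given parameter?
‧ Efficiency
Suppose T1 and T2 are both unbiased estimators of an unknown population parameter θ, and Var(T1) < Var(T2). Then T1 is more efficient than T2.
‧ Consistency
Suppose T is an unbiased (or asymptotically unbiased) estimator of an unknown population parameter θ. If
$$\lim_{n \to \infty} Var(T) = 0$$
then T is a consistent estimator of θ.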
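Example: r.s. $X_1, X_2, X_3$ with expected value μ and variance σ².
Let
$$T_1 = \frac{X_1 + X_2 + X_3}{3}, \qquad T_2 = \frac{X_1 + 2X_2}{3}, \qquad T_3 = \frac{X_1 + 2X_2 + 3X_3}{3}$$
Then
$$E(T_1) = E\left(\frac{X_1 + X_2 + X_3}{3}\right) = \frac{\mu + \mu + \mu}{3} = \mu$$
$$E(T_2) = \mu$$
$$E(T_3) = 2\mu$$
so T1 and T2 are unbiased estimators of μ, while T3 is not.
$$Var(T_1) = Var\left(\frac{X_1 + X_2 + X_3}{3}\right) = \frac{\sigma^2 + \sigma^2 + \sigma^2}{9} = \frac{3\sigma^2}{9} = \frac{\sigma^2}{3}$$
$$Var(T_2) = \frac{5\sigma^2}{9}$$
Since Var(T1) < Var(T2), T1 is the more efficient of the two unbiased estimators.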
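For a linear estimator T = Σ wᵢXᵢ of independent observations with common mean μ and variance σ², E(T) = (Σ wᵢ)μ and Var(T) = (Σ wᵢ²)σ². The sketch below applies this to the three estimators, read from the example as T1 = (X1 + X2 + X3)/3, T2 = (X1 + 2X2)/3 and T3 = (X1 + 2X2 + 3X3)/3. The helper name is hypothetical, and exact fractions are used so the coefficients print cleanly.

```python
from fractions import Fraction

def mean_and_variance_coefficients(weights):
    """Return (c, v) with E(T) = c*mu and Var(T) = v*sigma^2 for T = sum(w_i * X_i)."""
    return sum(weights), sum(w**2 for w in weights)

third = Fraction(1, 3)
estimators = {
    "T1": [third, third, third],          # (X1 + X2 + X3) / 3
    "T2": [third, 2 * third, 0],          # (X1 + 2*X2) / 3
    "T3": [third, 2 * third, 3 * third],  # (X1 + 2*X2 + 3*X3) / 3
}

results = {name: mean_and_variance_coefficients(w) for name, w in estimators.items()}
for name, (c, v) in results.items():
    label = "unbiased" if c == 1 else "biased"
    print(f"{name}: E = {c}*mu ({label}), Var = {v}*sigma^2")
```

An estimator is unbiased for μ exactly when its weights sum to 1, which holds for T1 and T2 but not for T3.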
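Example: r.s. $X_1, X_2, \ldots, X_n \sim N(\mu, \sigma^2)$
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
$$E(\bar{X}) = \mu \ \Rightarrow\ \bar{X} \text{ is an unbiased estimator of } \mu$$
$$\lim_{n \to \infty} Var(\bar{X}) = \lim_{n \to \infty} \frac{\sigma^2}{n} = 0 \ \Rightarrow\ \bar{X} \text{ is a consistent estimator of } \mu$$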