Computational Inference
Simple computational tools for solving hard statistical problems.

- Monte Carlo/simulation
- MC and simulation in frequentist inference
- Random number generation / simulating from probability distributions
- Further Bayesian computation
Methods implemented via simple programs in R.
Chapter 1: Monte Carlo methods
Problem 1: estimating probabilities
A particular site is being considered for a wind farm. At that site, the log of the wind speed in m/s on day t is known to follow an AR(2) process:
Yt = 0.6Yt−1 + 0.4Yt−2 + εt, (1)
with εt ∼ N(0, 0.01).
If Y1 = Y2 = 1.5, what is the probability that the wind speed exp(Yt) will be below 15 km/h for more than 10 days in a 100-day period?
Problem 2: estimating variances
Given a sample of 5 standard normal random variables X1, . . . , X5, what is the variance of

  max_i{Xi} − min_i{Xi}?
Problem 3: Estimating percentiles
The concentration of pollutant at any point in a region following release from a point source can be described by the model
  C(x, y, z) = (Q / (2π u10 σz σy)) exp[−(1/2){y²/σy² + (z − h)²/σz²}], (2)
C: air concentration of pollutant, Q: release rate, u10: wind speed at 10m above ground, σy, σz: diffusion parameters in horizontal and vertical directions, h: release height, (x, y, z): coordinates along wind direction, cross wind, and above ground.
Given Q = 100 and h = 50m, but u10, σy, σz uncertain. If

  log u10 ∼ N(2, 0.1), log σy² ∼ N(10, 0.2), log σz² ∼ N(5, 0.05),
What is the 95th percentile of C(100, 100, 40)?
Problem 4: Estimating expectations
A hospital ward has 8 beds.

- The number of patients arriving each day is uniformly distributed between 0 and 5 inclusive.
- The length of stay for each patient is also uniformly distributed between 1 and 3 days inclusive.

If all 8 beds are free initially, what is the expected number of days before there are more patients than beds?
Problem 5: Optimal decisions
The Monty Hall Problem
On a game show you are given the choice of three doors.

- Behind one door is a car; behind the others, goats.

The rules of the game are:

- After you have chosen a door, the game show host, Monty Hall, opens one of the two remaining doors to reveal a goat.
- You are now asked whether you want to stay with your first choice, or to switch to the other unopened door.

What is the optimal strategy? And what is the resulting probability of winning?
These 5 problems are all either hard or impossible to tackle analytically. However, the Monte Carlo method can be used to obtain approximate answers to all of them.
Monte Carlo methods are a broad class of computational algorithms relying on repeated random sampling to obtain numerical results. They use randomness to solve problems that might be deterministic in principle.
Some useful results
Monte Carlo is primarily used to calculate integrals. For example:

- Expectation of a random variable X ∼ f(·), or a function of it:

    E g(X) = ∫ g(x) f(x) dx

- Variance: Var(X) = E(X²) − E(X)². (3)

- Probability: P(X < a) is the expectation of I_{X<a}, the indicator function which is 1 if X < a and 0 otherwise. Then

    P(X < a) = 1 × P(X < a) + 0 × P(X ≥ a) = E{I_{X<a}} = ∫ I_{x<a} f(x) dx
Monte Carlo Integration - I
Suppose we are interested in the integral

  I = E(g(X)) = ∫ g(x) f(x) dx

Let X1, X2, . . . , Xn be independent random variables with pdf f(x). Then a Monte Carlo approximation to I is

  In = (1/n) Σ_{i=1}^n g(Xi). (4)
Example: suppose we wish to find

  EX = ∫ x f(x) dx

Then we can approximate the theoretical mean by the sample mean:

  E(X) ≈ (1/n) Σ_{i=1}^n Xi
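The course's examples are implemented in R (on MOLE); as an illustration, here is a hedged Python sketch of this sample-mean estimator, applied to X ∼ Exponential(1) (my choice of distribution, not from the slides), whose true mean is exactly 1. The function name `mc_mean` is mine:

```python
import random

def mc_mean(sample, n, seed=0):
    """Monte Carlo estimate of E(X): the average of n draws from `sample`."""
    rng = random.Random(seed)
    return sum(sample(rng) for _ in range(n)) / n

# X ~ Exponential(rate 1), so E(X) = 1 exactly.
estimate = mc_mean(lambda rng: rng.expovariate(1.0), n=100_000)
print(estimate)  # close to 1
```

With n = 100 000 the estimate typically lands within a few thousandths of the true mean.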
Monte Carlo Integration - II
Some properties of In.

(1) In is an unbiased estimator of I. Proof:

We have

  E(In) = (1/n) Σ_{i=1}^n E{g(Xi)},

but each g(Xi) has expectation

  E{g(X)} = ∫ g(x) f(x) dx = I.
Monte Carlo Integration - III
(2) In converges to I as n → ∞.

Proof: The strong law of large numbers says that if the Xi are iid rvs with E(X) = µ and Var X < ∞, then

  (1/n) Σ_{i=1}^n Xi → µ with probability 1,

or equivalently

  P( lim_{n→∞} (1/n) Σ_{i=1}^n Xi = µ ) = 1.

Apply the SLLN to the rvs g(Xi) to find In → I w.p. 1.
Monte Carlo Integration - IV
The SLLN tells us In converges, but not how fast. It doesn't tell us how large n must be to achieve a certain error.
(3)

  E[(In − I)²] = σ²/n,

where σ² = Var(g(X)). Thus the 'root mean square error' (RMSE) of In is

  RMSE(In) = σ/√n = O(n^{−1/2}).
Thus, our estimate becomes more accurate as n increases, and is less accurate when σ² is large. σ² will usually be unknown, but we can estimate it:

  σ̂² = (1/(n−1)) Σ_{i=1}^n (g(Xi) − In)²

We call σ̂/√n the Monte Carlo standard error.
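The O(n^{−1/2}) rate can be checked empirically. A hedged Python sketch (function names mine; the course code is in R) estimating the RMSE of In for g(x) = x² with X ∼ U[0, 1], where I = 1/3 exactly:

```python
import random

def mc_estimate(n, rng):
    """In = (1/n) * sum of g(Xi) with g(x) = x**2 and Xi ~ U[0, 1]; true I = 1/3."""
    return sum(rng.random() ** 2 for _ in range(n)) / n

def rmse(n, reps=200, seed=1):
    """Empirical root mean square error of In over `reps` independent replications."""
    rng = random.Random(seed)
    return (sum((mc_estimate(n, rng) - 1 / 3) ** 2 for _ in range(reps)) / reps) ** 0.5

# Quadrupling n should roughly halve the RMSE, reflecting the O(n^{-1/2}) rate.
print(rmse(100), rmse(400))
```

Quadrupling the sample size roughly halves the observed error, as the σ/√n formula predicts.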
Monte Carlo Integration - V
We write

  RMSE(In) = O(n^{−1/2})

to emphasise the rate of convergence of the error with n.
To get 1 digit more accuracy requires a 100-fold increase in n. A 3-digit improvement would require us to multiply n by 10⁶.

Consequently Monte Carlo is not usually suited to problems where we need very high accuracy. Although the rate of convergence is slow (the RMSE decreases slowly with n), it has the nice properties that the RMSE

- does not depend on d = dim(x)
- does not depend on the smoothness of f

Consequently Monte Carlo is very competitive in high-dimensional problems that are not smooth.
Monte Carlo Integration - VI
In addition to the rate of convergence, the central limit theorem tells us the asymptotic distribution of In:

(4)

  √n (In − I)/σ → N(0, 1) in distribution as n → ∞.

Informally, In is approximately N(I, σ²/n) for large n.
This allows us to calculate confidence intervals for I.
See the R code on MOLE.
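In the same spirit as the R code on MOLE, a hedged Python sketch of a CLT-based confidence interval. The helper name `mc_ci` and the test integrand (E(X²) = 1/3 for X ∼ U[0, 1]) are my own choices:

```python
import math
import random

def mc_ci(g, sample, n=10_000, z=1.96, seed=2):
    """Monte Carlo estimate of I = E g(X) with a CLT-based 95% confidence interval."""
    rng = random.Random(seed)
    vals = [g(sample(rng)) for _ in range(n)]
    mean = sum(vals) / n
    s2 = sum((v - mean) ** 2 for v in vals) / (n - 1)   # estimate of sigma^2
    half = z * math.sqrt(s2 / n)
    return mean - half, mean + half

# E(X^2) for X ~ U[0, 1] is 1/3; the interval should sit near that value.
lo, hi = mc_ci(lambda x: x * x, lambda rng: rng.random())
print(lo, hi)
```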
- If we require E{f(X)}, random observations from the distribution of f(X) can be generated by generating X1, . . . , Xn from the distribution of X, and then evaluating f(X1), . . . , f(Xn).
- The preceding results can be applied when estimating variances or probabilities of events.
- Percentiles are estimated by taking the sample percentile from the generated sample of values X1, . . . , Xn. We expect the estimate to be more accurate as n increases. Determining a percentile is equivalent to inverting a CDF: if we wish to know the 95th percentile, we must find ν such that
P (X ≤ ν) = 0.95, (5)
Monte Carlo solutions to the example problems
Question 1
Define E: the event that in 100 days the wind speed is below 15 km/h for more than 10 days. To estimate P(E), generate lots of individual time series, and count the proportion of series in which E occurs. For i = 1, 2, . . . , N:
1. Generate the ith realisation of the time series process. For t = 3, 4, . . . , 100:
   - Set Yt ← 0.6 Yt−1 + 0.4 Yt−2 + N(0, 0.01)
2. Count the number of elements of {Y1, . . . , Y100} below the threshold; 15 km/h = 4.167 m/s, so count days with Yt < log 4.167 ≈ 1.427:
   - Set Xi ← Σ_{t=1}^{100} I{Yt < log 4.167}
3. Determine if event E has occurred for time series i:
   - Set Ei ← I{Xi > 10}
4. Estimate P(E) by (1/N) Σ_{i=1}^N Ei
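The steps above can be sketched in Python as follows (the course code is in R, and the function name is mine). The threshold log(15/3.6) reflects the unit conversion 15 km/h = 4.167 m/s, applied on the log scale:

```python
import math
import random

def simulate_prob(N=20_000, seed=3):
    """Estimate P(more than 10 of 100 days with wind speed exp(Yt) below 15 km/h).
    AR(2): Yt = 0.6*Y[t-1] + 0.4*Y[t-2] + N(0, 0.01), with Y1 = Y2 = 1.5."""
    rng = random.Random(seed)
    threshold = math.log(15 / 3.6)   # 15 km/h = 4.167 m/s, compared on the log scale
    hits = 0
    for _ in range(N):
        series = [1.5, 1.5]
        for _ in range(98):          # t = 3, ..., 100
            # innovation sd = sqrt(0.01) = 0.1
            series.append(0.6 * series[-1] + 0.4 * series[-2] + rng.gauss(0.0, 0.1))
        low_days = sum(1 for y in series if y < threshold)
        hits += low_days > 10
    return hits / N
```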
Question 2
Define Z to be the difference between the max and min of 5 standard normal random variables. Estimate the variance.
For i = 1, 2, . . . , N:

1. Generate the ith random value of Z:
   - Sample X1, . . . , X5 independently from N(0, 1)
   - Set Zi ← max{X1, . . . , X5} − min{X1, . . . , X5}
2. Estimate Var(Z) by (1/(N−1)) Σ_{i=1}^N (Zi − Z̄)²
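A hedged Python sketch of this recipe (function name mine; the course uses R):

```python
import random

def estimate_range_variance(N=100_000, seed=4):
    """Estimate Var(max{X1..X5} - min{X1..X5}) for 5 iid N(0, 1) draws (Problem 2)."""
    rng = random.Random(seed)
    z = []
    for _ in range(N):
        xs = [rng.gauss(0.0, 1.0) for _ in range(5)]
        z.append(max(xs) - min(xs))
    zbar = sum(z) / N
    return sum((v - zbar) ** 2 for v in z) / (N - 1)    # sample variance of the Zi
```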
Question 3
Transformation of a random variable: given random variables X1, . . . , Xd we want to know the distribution of Y = f(X1, . . . , Xd).

- The Monte Carlo method can be used:
  - Sample unknown inputs from their distributions,
  - evaluate the function to obtain an output value from its distribution.
- Given a suitably large sample, the 95th percentile from the distribution of C(100, 100, 40) can be estimated by the 95th percentile from the sample of simulated values of C(100, 100, 40).
For i = 1, 2, . . . , N :
1. Sample a set of input values:
   - Sample u10,i from logN(2, 0.1)
   - Sample σy,i² from logN(10, 0.2)
   - Sample σz,i² from logN(5, 0.05)
2. Evaluate the model output Ci:
   - Set Ci ← (100 / (2π u10,i σz,i σy,i)) exp[−(1/2){100²/σy,i² + (40 − 50)²/σz,i²}]
3. Return the 95th percentile of C1, C2, . . . , CN.
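A hedged Python sketch of this algorithm (the course code is in R, and the function name is mine). The slides do not say whether the second parameter of each normal distribution is a variance or a standard deviation; the code below treats it as a variance, which is an assumption:

```python
import math
import random

def percentile_C(N=50_000, seed=5):
    """95th percentile of C(100, 100, 40): Q = 100, h = 50, so y = 100 and z = 40."""
    rng = random.Random(seed)
    cs = []
    for _ in range(N):
        # Second normal parameter treated as a variance (assumption).
        u10 = math.exp(rng.gauss(2.0, math.sqrt(0.1)))    # log u10 ~ N(2, 0.1)
        sy2 = math.exp(rng.gauss(10.0, math.sqrt(0.2)))   # log sigma_y^2 ~ N(10, 0.2)
        sz2 = math.exp(rng.gauss(5.0, math.sqrt(0.05)))   # log sigma_z^2 ~ N(5, 0.05)
        c = (100.0 / (2 * math.pi * u10 * math.sqrt(sy2 * sz2))
             * math.exp(-0.5 * (100.0 ** 2 / sy2 + (40.0 - 50.0) ** 2 / sz2)))
        cs.append(c)
    cs.sort()
    return cs[int(0.95 * N)]   # sample 95th percentile
```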
Question 4
- Define W to be the number of days before the first patient arrives to find no available beds.
- The question asks us to give E(W).
- If we can generate W1, . . . , Wn from the distribution of W, we can then estimate E(W) by W̄.
See the R code on MOLE for a way to simulate this process.
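One possible Python sketch of this process (the official code on MOLE is in R). The slides do not specify the within-day ordering of discharges and admissions; the version below discharges patients at the start of each day and then admits arrivals one at a time, which is an assumption:

```python
import random

def days_until_full(rng, beds=8):
    """One run: the day on which a patient first arrives to find no free bed.
    Arrivals/day ~ Uniform{0..5}; each stay ~ Uniform{1..3} whole days."""
    stays = []                    # remaining days for each patient currently in a bed
    day = 0
    while True:
        day += 1
        stays = [d - 1 for d in stays if d > 1]   # discharge patients whose stay ended
        for _ in range(rng.randint(0, 5)):        # today's arrivals, one at a time
            if len(stays) >= beds:
                return day                        # this patient finds no free bed
            stays.append(rng.randint(1, 3))

def expected_days(N=20_000, seed=6):
    """Estimate E(W) by the sample mean of N simulated runs."""
    rng = random.Random(seed)
    return sum(days_until_full(rng) for _ in range(N)) / N
```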
The Monty Hall Problem
- Simulate N separate games by randomly letting x take values in {1, 2, 3} with equal probability. x represents which door the car is behind.
- Simulate the contestant randomly picking a door by choosing a value y in {1, 2, 3} (it doesn't matter how we do this; we can always choose 1 if you like, the results are the same).
- Now the game show host will open the door which hasn't been picked and contains a goat. For each of the N games, record the success of the two strategies:
  1. stick with choice y
  2. change to the unopened door.
- Calculate the success rate for each strategy.
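A hedged Python sketch of the simulation (function name mine; the course uses R). It exploits the fact that, once the host has revealed a goat, switching wins exactly when the first pick was wrong:

```python
import random

def monty_hall(N=100_000, seed=7):
    """Estimate the win probabilities of the 'stick' and 'switch' strategies."""
    rng = random.Random(seed)
    stick = switch = 0
    for _ in range(N):
        car = rng.randint(1, 3)    # door hiding the car
        pick = rng.randint(1, 3)   # contestant's first choice
        # The host opens a goat door that wasn't picked, so switching wins
        # exactly when the first pick was wrong.
        stick += pick == car
        switch += pick != car
    return stick / N, switch / N

print(monty_hall())  # roughly (0.333, 0.667): switching doubles the win probability
```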
Example 1
Consider the probability p that a standard normal random variable will lie in the interval [0, 1]. This can be written as an integral

  p = ∫₀¹ (1/√(2π)) exp(−x²/2) dx. (6)
Two methods for estimating/evaluating this probability are:

1. numerical integration/quadrature, e.g., trapezium rule, Simpson's rule, etc.
2. given a sample of standard normal random variables Z1, . . . , Zn, look at the proportion of the Zi occurring in the interval [0, 1].
Example 1: An alternative method
1. Y is a RV, with f(Y) any function of Y. To generate a random value from the distribution of f(Y), generate a random Y from the distribution of Y, and then evaluate f(Y).
2. Providing E{f(Y)} exists, given a sample f(Y1), . . . , f(Yn),

     (1/n) Σ_{i=1}^n f(Yi)

   is an unbiased estimator of E{f(Y)}.
3. Let X be a random variable with a U[0, 1] distribution. For an arbitrary function f(X), what is the expectation of f(X)?

     E{f(X)} = ∫₀¹ f(x) · 1 dx, (7)
4. Now choose f to be the function f(X) = (1/√(2π)) exp(−X²/2). Then if X ∼ U[0, 1],

     E{f(X)} = ∫₀¹ (1/√(2π)) exp(−x²/2) · 1 dx (8)

Given a sample f(X1), . . . , f(Xn) from the distribution of f(X), we can estimate E{f(X)} by the unbiased Monte Carlo estimator p̂:

  p̂ = (1/n) Σ_{i=1}^n f(Xi), (9)

where the Xi are drawn randomly from the U[0, 1] distribution.
Key idea
re-express the integral of interest (6) as an expectation.
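A hedged Python sketch of estimator (9) (the course code is in R; the function name is mine):

```python
import math
import random

def estimate_p(n=100_000, seed=8):
    """p-hat = (1/n) * sum f(Xi), with f(x) = exp(-x^2/2)/sqrt(2*pi), Xi ~ U[0, 1]."""
    rng = random.Random(seed)
    f = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    return sum(f(rng.random()) for _ in range(n)) / n

print(estimate_p())  # close to Phi(1) - Phi(0) = 0.3413
```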
Example 2
Consider the integral ∫₀¹ h(x) dx where

  h(x) = [cos(50x) + sin(20x)]²

Generate X1, . . . , Xn from U[0, 1] and estimate with In = (1/n) Σ h(Xi).
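A hedged Python sketch of this estimator (function name mine; the course uses R). The integrand oscillates rapidly, which is exactly the kind of non-smooth behaviour where Monte Carlo is convenient:

```python
import math
import random

def estimate_integral(n=100_000, seed=9):
    """In = (1/n) * sum h(Xi) for h(x) = (cos(50x) + sin(20x))^2, Xi ~ U[0, 1]."""
    rng = random.Random(seed)
    h = lambda x: (math.cos(50 * x) + math.sin(20 * x)) ** 2
    return sum(h(rng.random()) for _ in range(n)) / n

print(estimate_integral())
```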
The general framework
  R = ∫ f(x) dx (10)

Let g(x) be some density function that is easy to sample from. How do we re-write (10) as the expectation of a function of a random variable X with density function g(x)?

  R = ∫ (f(x)/g(x)) g(x) dx = ∫ h(x) g(x) dx, (11)

with h(x) = f(x)/g(x).
So we now have R = E{h(X)}, where X has the density function g(x). If we now sample X1, . . . , Xn from g(x), then evaluate h(X1), . . . , h(Xn),

  R̂ = (1/n) Σ_{i=1}^n h(Xi) (12)

is an unbiased estimator of R.
Example 3
Use Monte Carlo integration to estimate

  R = ∫₋₁¹ exp(−x²) dx. (13)

We'll consider two different choices for g(x).

1. A uniform density on [−1, 1]: g(x) = 0.5 for x ∈ [−1, 1]. We sample X1, . . . , Xn from U[−1, 1], and estimate R by

     R̂ = (1/n) Σ_{i=1}^n exp(−Xi²)/g(Xi) = (1/n) Σ_{i=1}^n 2 exp(−Xi²). (14)
2. A normal density function N(0, 0.5).
   Note: a sampled value X from g(x) is not constrained to lie in [−1, 1]. Re-write R as

     R = ∫₋∞^∞ I{−1 ≤ x ≤ 1} exp(−x²) dx, (15)

   where I{} denotes the indicator function. We now sample X1, . . . , Xn from N(0, 0.5) and estimate R by

     R̂ = (1/n) Σ_{i=1}^n I{−1 ≤ Xi ≤ 1} exp(−Xi²)/g(Xi) = (1/n) Σ_{i=1}^n π^{1/2} I{−1 ≤ Xi ≤ 1}. (16)
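A hedged Python sketch comparing estimators (14) and (16) against the exact value √π · erf(1) (function names mine; the course uses R):

```python
import math
import random

def estimate_uniform(n, rng):
    """Estimator (14): g is U[-1, 1], so h(x) = 2*exp(-x^2)."""
    total = 0.0
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        total += 2 * math.exp(-x * x)
    return total / n

def estimate_normal(n, rng):
    """Estimator (16): g is N(0, 0.5), so h(x) = sqrt(pi) * I{-1 <= x <= 1}."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, math.sqrt(0.5))
        total += math.sqrt(math.pi) * (-1 <= x <= 1)
    return total / n

true_R = math.sqrt(math.pi) * math.erf(1)   # exact value of the integral over [-1, 1]
rng = random.Random(10)
print(estimate_uniform(100_000, rng), estimate_normal(100_000, rng), true_R)
```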
Key idea
g(x) needs to mimic f(x) as closely as possible. Consider again R = ∫₋₁¹ exp(−x²) dx.
Two terrible choices of g:

1. A uniform density on [0, 1]: g(x) = 1 for x ∈ [0, 1].

     R = ∫₋∞^∞ I{−1 ≤ x ≤ 1} exp(−x²) dx, (17)

   For x ∈ [−1, 0), we have f(x) > 0 and g(x) = 0. We must have g(x) > 0 for all x where f(x) > 0.
2. A normal density N(0, 0.09).
   In this case, we have g(x) > 0 for x ∈ [−1, 1], but when we sample x from g, we expect around 95% of the values to lie in the range (−0.6, 0.6). The Monte Carlo estimate of R is given by

     R̂ = (1/n) Σ_{i=1}^n I{−1 ≤ Xi ≤ 1} exp(−Xi²)/g(Xi),  where g(x) = (1/√(0.18π)) exp(−5.56x²). (18)
Convergence

- Provided f(x) > 0 ⇒ g(x) > 0, R̂ will converge to R as n → ∞.
- Use the central limit theorem to derive a confidence interval for R̂:

    R̂ ∼ N(R, σ²/n), (19)
  where we estimate σ² by

    σ̂² = (1/(n−1)) Σ_{i=1}^n {h(Xi) − R̂}² (20)

- We can then report the confidence interval as

    R̂ ± Z_{1−α/2} √(σ̂²/n), (21)

- Estimates of σ² in the example: U[−1, 1]: 0.16, N(0, 0.5): 0.42, N(0, 0.09): 6.81.
Comparison of Monte Carlo with numerical integration
Mid-ordinate rule
Consider finding I = ∫₀¹ f(x) dx. There are many different numerical integration schemes we might use. For example, the mid-ordinate rule is one of the simplest methods, and approximates I by a sum

  In = (1/n) Σ_{i=1}^n f(xi),

where the points xi = (i − 1/2)h are equally spaced at intervals of h = 1/n.
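A hedged Python sketch of the rule (function name mine), applied to the Example 1 integrand so the result can be compared with the Monte Carlo estimate:

```python
import math

def mid_ordinate(f, n):
    """Mid-ordinate (midpoint) rule on [0, 1]: average of f at the n midpoints."""
    h = 1.0 / n
    return h * sum(f((i - 0.5) * h) for i in range(1, n + 1))

# Same integrand as Example 1: the true value is Phi(1) - Phi(0) = 0.3413 (4 d.p.).
approx = mid_ordinate(lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi), 100)
print(approx)  # 0.3413...
```

Even with only n = 100 evaluations, the rule is accurate to about 6 decimal places here, far better than a Monte Carlo estimate with the same n.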
Comparison of MC with numerical integration II
Mid-ordinate rule error analysis

For smooth 1-d functions the error rates for quadrature rules can be much better than Monte Carlo. For example, if f : [0, 1] → R and f''(x) is continuous, then

  |I − In| ≤ (1/(24n²)) max_{0≤x≤1} |f''(x)|,

so

  RMSE(In) = O(n^{−2}),

i.e., it is a second order method. Other rules achieve even faster convergence rates; for example, Simpson's rule is a fourth order method.

This is much faster than Monte Carlo: to get an extra digit of accuracy we only need to multiply n by a factor of √10 ≈ 3.2.
Comparison of MC with numerical integration III
Curse of dimensionality

Classical quadrature methods work well for smooth 1-d problems. But for d-dimensional integrals we have a problem. Suppose

  I = ∫₀¹ ∫₀¹ . . . ∫₀¹ f(x1, . . . , xd) dx1 . . . dxd

We can use the same N-point 1-d quadrature rule on each of the d integrals. This uses n = N^d evaluations of f. The 1-d mid-ordinate rule has error O(N^{−2}), so the d-dimensional mid-ordinate rule has error

  |I − Î| = O(N^{−2}) = O(n^{−2/d})

For d = 4 this is the same as Monte Carlo. For larger d it is worse. In addition, we require f to be smooth (f''(x) to be continuous) for the method to work well. Monte Carlo has the same O(n^{−1/2}) error rate regardless of dim(x) or f''(x).