Computing for Research I, Spring 2013
Presented by: Liqiong Fan
R: Random number generation & Simulations
April 7
Outline
How to sample from common distributions:
• Uniform distribution
• Binomial distribution
• Normal distribution
• Pre-specified vector
Examples:
• Randomization code generation
• Simulation 1 (explore the relationship between power and effect size)
• Simulation 2 (explore the relationship between power and sample size)
Syntax for random number generation in R
1. Sample from a known distribution: “r” + name of distribution
e.g., runif() Uniform, rbinom() Binomial, rnorm() Normal, …
2. Sample from a vector: sample()
e.g., extract two numbers from {1,2,3,4,5,6} with replacement
Uniform distribution (continuous) on [a, b]
PDF: f(x) = 1/(b - a) for a <= x <= b, and 0 otherwise
Mean: (a + b)/2
Variance: (b - a)^2/12
[R] Uniform distribution
runif(n, min=0, max=1)
See R code …
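A minimal sketch of runif() in use. The seed, the sample size of 1000 and the interval [2, 5] are assumptions chosen only for illustration.

```r
set.seed(2013)  # assumed seed, only so the run is reproducible

# 1000 draws from Uniform(0, 1), the default min and max
u <- runif(1000)

# 1000 draws from Uniform(2, 5)
v <- runif(1000, min = 2, max = 5)

# Sample moments should be close to the theoretical values:
# mean = (2 + 5) / 2 = 3.5 and variance = (5 - 2)^2 / 12 = 0.75
mean(v)
var(v)
```

Calling hist(v) afterwards should show a roughly flat histogram between 2 and 5.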
[R] Uniform distribution
Use UNIFORM distribution to generate BERNOULLI distribution
See R code …
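Basic idea:
[Figure: the Uniform distribution axis from 0.0 to 1.0 (ticks at 0.0, 0.2, 0.4, 0.6, 0.8, 1.0) is split at a cut point a; draws on one side of a map to 1 and draws on the other side map to 0, giving a Bernoulli distribution.]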
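One way to sketch the Uniform-to-Bernoulli idea in R. The success probability p = 0.6, the seed, and the choice to map draws below p to 1 are assumptions for illustration, not taken from the slide.

```r
set.seed(2013)    # assumed seed, for reproducibility only

p <- 0.6          # assumed success probability
u <- runif(100)   # 100 Uniform(0, 1) draws

# P(U < p) = p, so the indicator of the event U < p is Bernoulli(p)
x <- as.integer(u < p)

table(x)   # counts of 0s and 1s
mean(x)    # should be near p
```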
[R] Binomial distribution
rbinom(n, size, prob)
See R code …
e.g. generate 10 Binomial random numbers from Binom(100, 0.6)
n = 10, size = 100, prob = 0.6
rbinom(10, 100, 0.6)
e.g. generate 100 Bernoulli random numbers with p = 0.6
n = 100, size = 1, prob = 0.6
rbinom(100, 1, 0.6)
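The two calls above can be run and checked as follows. This is a sketch; the seed and the variable names b and z are assumptions.

```r
set.seed(2013)  # assumed seed, for reproducibility only

# 10 draws from Binom(100, 0.6): each value is a count between 0 and 100
b <- rbinom(10, size = 100, prob = 0.6)

# 100 Bernoulli(0.6) draws: each value is 0 or 1
z <- rbinom(100, size = 1, prob = 0.6)

mean(b)  # should be near size * prob = 60
sum(z)   # the sum of 100 Bernoulli(0.6) draws is itself one Binom(100, 0.6) draw
```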
[R] Normal distribution
See R code …
rnorm(n, mean, sd) #random number
dnorm(x, mean, sd) #density
pnorm(q, mean, sd) #P(X<=q) cdf
qnorm(p, mean, sd) #quantile
[R] Normal distribution
dnorm(x, mean, sd) #density
e.g. plot a standard normal curve
pnorm(q, mean, sd) #probability P(X<=q)
e.g. calculate the p-value for a one-sided test with a standardized test statistic
H0: X<=0
H1: X>0
Reject H0 if “Z” is very large
If from the one-sided test, we got the Z value = 3.0, what’s the p-value?
P-value = P(Z>=z) = 1 - P(Z<=z)
1 - pnorm(3, 0, 1)
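A short sketch of the p-value calculation, plus the density values needed for a standard normal curve. The grid from -4 to 4 and the variable names are assumptions.

```r
z <- 3.0  # observed standardized test statistic from the example

# Upper-tail p-value, written two equivalent ways
p_upper  <- 1 - pnorm(z, mean = 0, sd = 1)
p_upper2 <- pnorm(z, lower.tail = FALSE)  # asks for P(Z > z) directly

p_upper   # about 0.00135

# Density of the standard normal on a grid (assumed range -4 to 4)
x <- seq(-4, 4, by = 0.01)
y <- dnorm(x, mean = 0, sd = 1)
```

Passing x and y to plot(x, y, type = "l") draws the standard normal curve. The lower.tail = FALSE form is usually preferable for very large z, because it avoids subtracting two nearly equal numbers.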
[R] Normal distribution
qnorm(p, mean, sd) #quantile
See R code …
rnorm(n, mean, sd) #random number
See R code …
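A brief sketch of qnorm() and rnorm(). The mean of 120, the SD of 7, the sample size and the seed are illustrative assumptions.

```r
set.seed(2013)  # assumed seed, for reproducibility only

# qnorm() inverts pnorm(): it returns the quantile for a given probability
q975 <- qnorm(0.975)          # approximately 1.96
back <- qnorm(pnorm(1.5))     # round trip, recovers 1.5

# rnorm() takes the standard deviation, not the variance
x <- rnorm(1000, mean = 120, sd = 7)
mean(x)  # should be near 120
sd(x)    # should be near 7
```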
[R] Another useful command for sampling from a vector – “sample()”
e.g. randomly choose two numbers from {2,4,6,8,10} with/without replacement
sample(x, size, replace = FALSE, prob = NULL)
sample(c(2,4,6,8,10), 2, replace = F)
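Both variants of the draw, as a sketch. The seed and variable names are assumptions.

```r
set.seed(2013)  # assumed seed, for reproducibility only

x <- c(2, 4, 6, 8, 10)

# Without replacement: the two numbers are always different
draw_without <- sample(x, 2, replace = FALSE)

# With replacement: the same number can be drawn twice
draw_with <- sample(x, 2, replace = TRUE)

draw_without
draw_with
```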
[R] Another useful command for sampling from a vector – “sample()”
e.g. A question from our THEORY I CLASS:
“Draw a histogram of all possible averages of 6 numbers selected from {1,2,7,8,14,20} with replacement”
Answer: A quick way to solve this question is to run a simulation. That is, we repeat the selection of 6 balls with replacement from the urn many times and plot the averages. The R code looks like:
a <- NULL
for (i in 1:10000) {
  a[i] <- mean(sample(c(1,2,7,8,14,20), 6, replace = T))
}
hist(a)
[R] Another useful command for sampling from a vector – “sample()”
e.g. Generate 1000 Bernoulli random numbers with p = 0.6
sample(x, size, replace = T, prob = )
Answer:
Let x = (0, 1),
Let size = 1,
Let replace = T/F (with size = 1 the choice makes no difference),
Let prob = (0.4, 0.6).
Repeat 1000 times.
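A sketch of this answer in R, written two ways. The single call with size = 1000 and replace = TRUE is an assumed shortcut that should be equivalent to repeating a size-1 draw 1000 times; the seed and names are also assumptions.

```r
set.seed(2013)  # assumed seed, for reproducibility only

# Literal version: one draw of size 1, repeated 1000 times
bern_loop <- replicate(1000, sample(c(0, 1), size = 1, prob = c(0.4, 0.6)))

# Shortcut: 1000 independent draws in one call (replace must be TRUE here)
bern <- sample(c(0, 1), size = 1000, replace = TRUE, prob = c(0.4, 0.6))

mean(bern_loop)  # both proportions of 1s should be near 0.6
mean(bern)
```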
Example 1: Generate randomization sequence
Goal: randomize 100 patients to TRT A and B
1. Simple randomization (like flipping a coin): Bernoulli distribution
runif(), rbinom(), sample().
0 0 1 0 0 1 0 1 0 0 …. 1 0 1 0
See R code …
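Simple randomization could be sketched with each of the three functions. Mapping 1 to treatment A and 0 to treatment B is an assumption, as are the seed and variable names.

```r
set.seed(2013)  # assumed seed, for reproducibility only
n <- 100        # number of patients

# (a) runif(): compare a Uniform(0, 1) draw with 0.5
trt_runif <- ifelse(runif(n) < 0.5, "A", "B")

# (b) rbinom(): one Bernoulli(0.5) draw per patient
trt_rbinom <- ifelse(rbinom(n, size = 1, prob = 0.5) == 1, "A", "B")

# (c) sample(): draw labels with replacement, equal probability by default
trt_sample <- sample(c("A", "B"), size = n, replace = TRUE)

table(trt_sample)  # group sizes are random, so not necessarily 50 and 50
```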
Example 1: Generate randomization sequence
Goal: randomize 100 patients to TRT A and B
2. Random allocation rule (RAL)
Unlike simple randomization, the number of allocations for each treatment needs to be fixed in advance.
Again, think about the urn model! The urn holds 50 balls per treatment; draw the balls without replacement.
RAL guarantees that treatment allocation is balanced only at the end of the sequence.
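A sketch of the random allocation rule as an urn with 50 balls per treatment. The seed and variable names are assumptions.

```r
set.seed(2013)  # assumed seed, for reproducibility only

# The urn: 50 "A" balls and 50 "B" balls
urn <- rep(c("A", "B"), each = 50)

# Drawing every ball without replacement is a random permutation of the urn
ral <- sample(urn)

table(ral)  # exactly 50 A and 50 B

# Running imbalance (A minus B) after each patient: it can drift during
# enrollment and is only guaranteed to be 0 after the last patient
imbalance <- cumsum(ral == "A") - cumsum(ral == "B")
max(abs(imbalance))
```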
Example 1: Generate randomization sequence
Goal: randomize 100 patients to TRT A and B
3. Permuted block randomization
AABB BABA BBAA BABA BAAB … BBAA
Block size = 4
sample()
Think about a multi-urn model!
[Figure: multiple urns; labels 50, 50, …, 25]
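A sketch of permuted block randomization with sample(), treating each block as its own small urn. With 100 patients and a block size of 4 this gives 25 blocks; the helper one_block() and the other names are hypothetical.

```r
set.seed(2013)  # assumed seed, for reproducibility only

n_patients <- 100
block_size <- 4
n_blocks   <- n_patients / block_size  # 25 blocks

# One block: 2 "A" and 2 "B" in random order (a small urn, drawn without replacement)
one_block <- function() sample(rep(c("A", "B"), each = block_size / 2))

# Each column of this 4 x 25 matrix is one block
blocks <- replicate(n_blocks, one_block())

# Reading the matrix column by column gives the full allocation sequence
pbr <- as.vector(blocks)

# Display in the "AABB BABA ..." style
paste(apply(blocks, 2, paste, collapse = ""), collapse = " ")
```

Allocation is balanced after every completed block, not only at the end of the sequence.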
Example 2: Investigate the relationship between effect size and power - drug increases SBP
Y: Systolic Blood Pressure (response)
X: intervention (1 = drug vs. 0 = control)
e: random error, with var(e) = var(Y) within each group
Linear model: Y = b0 + b1X + e
b1 represents the effect size of the new drug relative to the control. For instance, assuming that SBP in the control population is distributed as N(120, 49) (variance 49, so SD = 7), what is the power if the new drug truly increases SBP by 0, 1, 2, 3, 4 or 5 units in a study with a sample size of 100 (50 on drug, 50 on placebo)?
Important information:
Y (placebo) ~ N(120, 49)
b0 = 120
e ~ N(0, 49)
When X=0, E(Y) = b0, effect of control;
When X=1, E(Y) = b0 + b1, effect of drug;
The between-group difference is represented by b1.
Example 2: Investigate the relationship between effect size and power - drug increases BP
Y: Blood Pressure (response)
X: intervention (1 = drug vs. 0 = control)
e: random error, with var(e) = var(Y) within each group
Linear model: Y = b0 + b1X + e
Important information:
Y (placebo) ~ N(120, 49)
b0 = 120
e ~ N(0, 49)
We try to answer: What is the power when b1 (the real effect size of the treatment) is 0, 1, 2, 3, 4 or 5?
If we run the simulation N times, power is estimated by the proportion of the N simulations in which the linear regression test of b1 (treatment effect) is significant (P<0.05).
Definition of Power: Probability of rejecting the NULL when the ALTERNATIVE IS TRUE (i.e., b1 = some non-zero value).
Example 2: Investigate the relationship between effect size and power - drug increases BP
Y: Systolic Blood Pressure (response)
X: intervention (1 = drug vs. 0 = control)
e: random error, with var(e) = var(Y) within each group
Linear model: Y = b0 + b1X + e
Simulation steps (e.g. sample size = 50 per group, 1000 simulations):
1. Generate X according to the study design (50 “1”s and 50 “0”s);
2. Generate 100 “e” from N(0, 49);
3. Given b0 and b1, generate Y using Y = b0 + b1X + e;
4. Use the 100 pairs of (Y, X) to refit a linear model, and get the new estimates of b0 and b1 and their p-values;
5. Repeat these steps 1000 times;
6. If the type I error is 0.05, for a two-sided test:
Power = (# of simulations with p-value for b1 < 0.05) / 1000
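The simulation steps for Example 2 can be sketched as follows. This is one possible implementation, not necessarily the course's own code: the function name sim_power(), its arguments and the seed are hypothetical, and the SD of 7 assumes that N(0, 49) gives the variance.

```r
set.seed(2013)  # assumed seed, for reproducibility only

# Hypothetical helper: estimated power for one value of b1
sim_power <- function(b1, n_per_group = 50, n_sim = 1000,
                      b0 = 120, sd_e = 7, alpha = 0.05) {
  x <- rep(c(1, 0), each = n_per_group)                 # step 1: design
  p_values <- replicate(n_sim, {                        # step 5: repeat
    e <- rnorm(2 * n_per_group, mean = 0, sd = sd_e)    # step 2: errors
    y <- b0 + b1 * x + e                                # step 3: response
    fit <- lm(y ~ x)                                    # step 4: refit
    summary(fit)$coefficients["x", "Pr(>|t|)"]          # p-value for b1
  })
  mean(p_values < alpha)                                # step 6: power
}

effect_sizes <- 0:5
power <- sapply(effect_sizes, sim_power)
data.frame(b1 = effect_sizes, power = power)
```

For b1 = 0 the null is true, so the estimated rejection rate is the type I error and should land near 0.05; the estimates should rise toward 1 as b1 grows.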
Example 3: Investigate the relationship between sample size and power
We try to answer: What is the power given b1 = 2 and sample size = 25, 50, 75, 100, 125 or 150 per group?
Linear model: Y = b0 + b1X + e
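A sketch of the sample-size version of the simulation. It assumes the same setup as Example 2 (b0 = 120, error SD = 7, alpha = 0.05, 1000 simulations per scenario), which this slide does not restate; power_for_n() and the seed are hypothetical.

```r
set.seed(2013)  # assumed seed, for reproducibility only

# Hypothetical helper: estimated power for one per-group sample size
power_for_n <- function(n_per_group, b1 = 2, n_sim = 1000,
                        b0 = 120, sd_e = 7, alpha = 0.05) {
  x <- rep(c(1, 0), each = n_per_group)   # n_per_group on drug, n_per_group on control
  rejected <- replicate(n_sim, {
    y <- b0 + b1 * x + rnorm(2 * n_per_group, mean = 0, sd = sd_e)
    summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"] < alpha
  })
  mean(rejected)                          # proportion of significant tests
}

sizes <- c(25, 50, 75, 100, 125, 150)
power <- sapply(sizes, power_for_n)
data.frame(n_per_group = sizes, power = power)
```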
Some recommendations
1. Try not to “fix” (hard-code) the parameters in your simulation
2. Always test your code with a small number of iterations before you actually start your simulation
3. Use append / write.table (… append = T …) to save the results or simulated data
4. Print the number of iterations / scenarios. Code:
print(c)
flush.console()
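A sketch that combines these recommendations in one loop. The temporary output file, the toy "estimate" being saved and all variable names are hypothetical.

```r
set.seed(2013)  # assumed seed, for reproducibility only

n_iter   <- 5                            # parameter, not hard-coded in the loop; small for a test run
out_file <- tempfile(fileext = ".txt")   # hypothetical output location

for (i in seq_len(n_iter)) {
  # Toy result for this iteration (stands in for a real simulation result)
  result <- data.frame(iter = i, estimate = mean(rnorm(20)))

  # Append each result as it is produced; write the header only once
  write.table(result, file = out_file, append = (i > 1),
              col.names = (i == 1), row.names = FALSE)

  print(i)          # show progress
  flush.console()   # push the printed progress to the console right away
}

saved <- read.table(out_file, header = TRUE)
saved
```

Saving after every iteration means that an interrupted run still leaves the completed results on disk.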