BIO5312 Biostatistics
R Session 03: Random Number and Probability Distributions

Dr. Junchao Xia
Center of Biophysics and Computational Biology
Fall 2016
9/13/2016



Random Number Generator

Random number generators have many important applications in gambling, statistical sampling, computer simulations, and other areas where producing an unpredictable random sequence is desirable. A generator of genuinely random numbers is a mechanism for producing a sequence of random variables X1, X2, X3, …, Xn with the properties that

1) Each Xi is uniformly distributed between 0 and 1.
2) The Xi are mutually independent.

“True” vs. pseudo-random numbers

1) “True” random numbers: the first method measures some physical phenomenon that is expected to be random, such as atmospheric noise or thermal noise, and then compensates for possible biases in the measurement process.
2) Pseudo-random numbers: the second method uses computational algorithms that produce long sequences of apparently random results, which are in fact completely determined by a shorter initial value, known as the seed.

A linear congruential generator is a recurrence of the following form:

$$ I_{i+1} = a I_i \bmod m, \qquad X_{i+1} = I_{i+1} / m $$

where the multiplier a and the modulus m are integer constants that determine the values generated, given an initial value (seed) $I_0$.

1) Park and Miller method: a = 16807, m = 2^31 - 1 = 2147483647
2) L’Ecuyer method: a = 40692, m = 2147483399
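The recurrence can be sketched in a few lines of R. This is only an illustration, assuming the standard Park and Miller constants (multiplier a = 16807, modulus m = 2^31 - 1); the function name `lcg` and its arguments are hypothetical and not part of the course material. R's own `runif()` uses a different generator by default.

```r
# Minimal linear congruential generator (illustrative sketch only).
# Assumed constants: Park and Miller, a = 16807, m = 2^31 - 1.
lcg <- function(n, seed = 12345, a = 16807, m = 2147483647) {
  stopifnot(seed > 0, seed < m)
  state <- seed                  # I_0
  out <- numeric(n)
  for (i in seq_len(n)) {
    state <- (a * state) %% m    # I_{i+1} = a * I_i mod m (exact in double precision here)
    out[i] <- state / m          # X_{i+1} = I_{i+1} / m, a value in (0, 1)
  }
  out
}

u <- lcg(5, seed = 1)
u
```

The same seed always reproduces the same sequence, which is what makes the numbers pseudo-random rather than truly random.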


General Sampling Methods

Assume we have a random number generator that produces a sequence of random variables U1, U2, U3, …, Un, which are mutually independent and uniformly distributed between 0 and 1. How can we obtain a sequence of variables that follows a given distribution, such as the normal distribution?

The inverse transform method is a simple but very important one among many others.

1) Suppose we want to sample from a cumulative distribution function F(x); i.e., we want to generate a random variable X with the property that P(X ≤ x) = F(x) for all x.

2) The inverse transform method sets X = F^-1(U), where U ~ Unif[0,1].
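As a small illustration of the inverse transform method, the sketch below samples from an exponential distribution, whose CDF F(x) = 1 - exp(-rate * x) can be inverted by hand to F^-1(u) = -log(1 - u) / rate. The choice of the exponential distribution and the value of `rate` are assumptions made for this example only.

```r
# Inverse transform sampling for an exponential distribution (illustrative sketch).
set.seed(1)
rate <- 2                       # hypothetical rate parameter
u <- runif(10000)               # U ~ Unif[0, 1]
x <- -log(1 - u) / rate         # X = F^-1(U)

# The sample mean should be close to the theoretical mean 1 / rate
c(sample_mean = mean(x), theoretical_mean = 1 / rate)

# R's quantile function qexp() is the same inverse CDF
all.equal(x, qexp(u, rate))
```

In R, the q-functions (`qexp()`, `qnorm()`, `qbinom()`, ...) play the role of F^-1, so `qnorm(runif(n))` is an inverse transform sample from the standard normal distribution.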


Generate Random Integers in R

Examples using the sample() function

# set the working directory
> setwd("C:/Users/Junchao/Desktop/Biostatistics_5312/2016/lab_03")

# generate one random integer between 1 and 20
> sample(1:20, 1)

# generate 10 random integers between 1 and 20, repeats allowed
> sample(1:20, 10, replace=TRUE)

# select 10 states randomly without repeats
> sample(state.name, 10, replace=FALSE)

# try to sample 52 states without repeats: this gives an error,
# because state.name contains only 50 names
> sample(state.name, 52, replace=FALSE)

# sample 52 states randomly with repeats: this works
> sample(state.name, 52, replace=TRUE)

# sample 50 states randomly without repeats (a random permutation of all states)
> sample(state.name, 50, replace=FALSE)
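The difference between sampling with and without replacement can be checked directly. The sketch below is only an illustration; it uses `tryCatch()` to capture the error message instead of stopping the script, and the variable names are chosen for this example.

```r
# Sampling more items than the population holds (illustrative sketch).
length(state.name)   # 50

# Without replacement, asking for 52 of 50 names fails; capture the message
res <- tryCatch(
  sample(state.name, 52, replace = FALSE),
  error = function(e) conditionMessage(e)
)
res

# With replacement the same request succeeds, and some names repeat
with_repeats <- sample(state.name, 52, replace = TRUE)
length(with_repeats)
any(duplicated(with_repeats))
```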


Generate Random Floats

Examples using the runif() function

# generate 10 random numbers between 0 and 1
> runif(10, 0, 1)

# generate 1000 random numbers between 1.5 and 10.5
> y <- runif(1000, 1.5, 10.5)
# check the histogram
> hist(y)

# generate 10,000 random numbers between 1.5 and 10.5
> y <- runif(10000, 1.5, 10.5)
# check the histogram, any difference?
> hist(y)

# set the seed for the random number generator
> set.seed(12345)
# generate 1000 random numbers and assign them to x
> x <- runif(1000, 1.5, 10.5)
# generate another 1000 random numbers and assign them to y
> y <- runif(1000, 1.5, 10.5)
# reset the random number seed to 12345
> set.seed(12345)
# generate another 1000 random numbers and assign them to z
> z <- runif(1000, 1.5, 10.5)

# scatter plots for x-y and x-z
> plot(x, y, xlab="x", ylab="y")
> plot(x, z, xlab="x", ylab="z")
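The effect of resetting the seed can also be checked numerically rather than by eye. The following sketch repeats the steps above and compares the vectors; it is an illustration of the idea, not a replacement for the scatter plots.

```r
# Reproducibility of runif() after resetting the seed (illustrative sketch).
set.seed(12345)
x <- runif(1000, 1.5, 10.5)
y <- runif(1000, 1.5, 10.5)   # continues the same random stream
set.seed(12345)
z <- runif(1000, 1.5, 10.5)   # restarts the stream from the same seed

identical(x, z)   # TRUE: same seed, same sequence
identical(x, y)   # FALSE: y comes from later in the stream
cor(x, y)         # should be near 0, since x and y behave as independent draws
```

This is why the x-z scatter plot falls on a straight line while the x-y plot shows no visible structure.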


Generate Random Floats: Continued

[Figure: example plots from the previous slide]


Binomial Distribution

Examples using dbinom(), pbinom(), and rbinom()

# check the help
> help(dbinom)

# get a binomial distribution with n=10, p=0.05
> x <- 0:10
> y <- dbinom(x, 10, 0.05)
> plot(x, y, xlab="k", ylab="Pr(k)", main="n=10,p=0.05")

# get a binomial distribution with n=10, p=0.95
> y <- dbinom(x, 10, 0.95)
> plot(x, y, xlab="k", ylab="Pr(k)", main="n=10,p=0.95")

# get a binomial distribution with n=10, p=0.50
> y <- dbinom(x, 10, 0.50)
> plot(x, y, xlab="k", ylab="Pr(k)", main="n=10,p=0.50")

# get the cumulative distribution function
> y <- pbinom(x, 10, 0.5)
> plot(x, y, xlab="k", ylab="CDF of Pr(k)", main="n=10,p=0.50")

# generate 1000 random numbers from the binomial distribution
> z <- rbinom(1000, 10, 0.5)
> hist(z)
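To see how the three functions relate, the sketch below compares `dbinom()` with the binomial formula C(n,k) p^k (1-p)^(n-k), checks that `pbinom()` is the running sum of `dbinom()`, and compares the mean of simulated values with n*p. The seed value and variable names are arbitrary choices made for this illustration.

```r
# Relating dbinom(), pbinom() and rbinom() (illustrative sketch).
n <- 10
p <- 0.5
k <- 0:n

# dbinom() agrees with the closed-form binomial probability
pmf_formula <- choose(n, k) * p^k * (1 - p)^(n - k)
pmf_builtin <- dbinom(k, n, p)
all.equal(pmf_formula, pmf_builtin)

# pbinom() is the cumulative sum of the probabilities
all.equal(cumsum(pmf_builtin), pbinom(k, n, p))

# simulated values should average close to n * p
set.seed(2016)
z <- rbinom(1000, n, p)
c(sample_mean = mean(z), theoretical_mean = n * p)
```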


Binomial Distribution: Continued

[Figure: some plots from the previous slide]


Poisson Distribution

Examples using dpois(), ppois(), and rpois()

# check the help
> help(dpois)

# get a Poisson distribution with lambda*t=4.6
> x <- 0:10
> y <- dpois(x, 4.6)
> plot(x, y, xlab="k", ylab="Pr(k)", main="lambda*t=4.6")

# get a Poisson distribution with lambda*t=1.15
> y <- dpois(x, 1.15)
> plot(x, y, xlab="k", ylab="Pr(k)", main="lambda*t=1.15")

# get the cumulative distribution function
> y <- ppois(x, 4.6)
> plot(x, y, xlab="k", ylab="CDF of Pr(k)", main="lambda*t=4.6")

# generate 1000 random numbers from the Poisson distribution
> z <- rpois(1000, 4.6)
> hist(z)

# Poisson approximation to the binomial distribution
> x <- 0:20
> y <- dbinom(x, 100, 0.05)
> z <- dpois(x, 5.0)
> plot(x, y, xlab="k", ylab="Pr(k)", col="red", main="Red: Binomial,n=100,p=0.05 \n green: Poisson, lambda*t=5.0")
> points(x, z, col="green")
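The quality of the Poisson approximation can also be measured numerically. The sketch below computes the largest pointwise gap between the two probability functions for the case above, and then for a second, hypothetical case with larger n and smaller p but the same n*p, where the approximation is expected to be tighter.

```r
# How close is Poisson(n * p) to Binomial(n, p)? (illustrative sketch)
x <- 0:20
pois_pmf <- dpois(x, 5.0)

# case from the slide: n = 100, p = 0.05
binom_pmf <- dbinom(x, 100, 0.05)
max_diff <- max(abs(binom_pmf - pois_pmf))

# hypothetical second case: n = 1000, p = 0.005 (same n * p = 5)
binom_pmf2 <- dbinom(x, 1000, 0.005)
max_diff2 <- max(abs(binom_pmf2 - pois_pmf))

c(n100 = max_diff, n1000 = max_diff2)
```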


Poisson Distribution: Continued

[Figure: some plots from the previous slide]


Normal Distribution

Examples using dnorm(), pnorm(), and rnorm()

# check the help
> help(dnorm)

# get a normal distribution with mean=2, sd=4
> x <- -5:9
> y <- dnorm(x, 2, 4)
> plot(x, y, xlab="x", ylab="Pr(x)", main="normal, mean=2, sd=4")

# generate 1000 random numbers from the normal distribution
> z <- rnorm(1000, 2, 4)
# plot the histogram of z on the density scale (an estimate of the PDF)
> hist(z, freq=FALSE)
# add y values as red points
> points(x, y, col="red")

# normal approximation to the binomial distribution
> x <- 0:20
> y <- dbinom(x, 25, 0.4)
# normal distribution with mean=np, variance=npq
> z <- dnorm(x, 10, sqrt(6))
> plot(x, y, xlab="x", ylab="Pr(x)", col="red", main="red: binomial\n green : normal")
> points(x, z, col="green")

# normal approximation to the Poisson distribution
> y <- dpois(x, 10)
> z <- dnorm(x, 10, sqrt(10))
> plot(x, y, xlab="x", ylab="Pr(x)", col="red", main="red: poisson\n green : normal")
> points(x, z, col="green")
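The normal approximation can be checked on a cumulative probability as well. The sketch below compares the exact binomial probability P(X ≤ 10) for n=25, p=0.4 with two normal approximations. The continuity correction (evaluating the normal CDF at 10.5 instead of 10) is an addition for illustration and does not appear in the examples above.

```r
# Normal approximation to a binomial CDF value (illustrative sketch).
n <- 25
p <- 0.4
mu <- n * p                       # mean = np
sigma <- sqrt(n * p * (1 - p))    # sd = sqrt(npq)

exact <- pbinom(10, n, p)               # exact P(X <= 10)
approx_plain <- pnorm(10, mu, sigma)    # no continuity correction
approx_cc <- pnorm(10.5, mu, sigma)     # with continuity correction (assumed addition)

c(exact = exact, plain = approx_plain, corrected = approx_cc)
```

Because the binomial variable only takes integer values, shifting the cut-off by 0.5 usually brings the normal approximation closer to the exact probability.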


Normal Distribution: Continued

[Figure: some plots from the previous slide]


The End
