Parallel MCMCRandom Number Generators
Summary
Parallel Bayesian computation in R ≥ 2.14 using the packages foreach and parallel
Matt Moores, Cathy Hargrave
Bayesian Research & Applications Group
Queensland University of Technology, Brisbane, Australia
CRICOS provider no. 00213J
Thursday September 27, 2012
BRAG Sept. 27 Parallel MCMC in R
Outline
1 Parallel MCMC
  - Introduction
  - R packages
2 Random Number Generators
  - RNG and parallel MCMC
  - RNGs available in R
Motivation
Why parallel?
- large datasets
- many MCMC iterations
- multiple CPU cores now commonplace
  - eg. Intel Core i5 and i7
  - even mobile phones have multicore CPUs
Parallel MCMC
2 kinds of parallelism:
- concurrent MCMC chains
  - always applicable
  - straightforward to implement
- concurrent updates within an iteration
  - only useful for a very large parameter space
  - ideally in a compiled language (eg. Rcpp with OpenMP)
Also implicit parallelism, eg. with Intel Math Kernel Library
Concurrent Chains
Simple Network Of Workstations
- R package snow by Luke Tierney, et al.
- spawns multiple copies of R
- provides several options for inter-process communication
  - TCP sockets: available on any platform, including Microsoft Windows
  - Message Passing Interface (via the package Rmpi)
  - Parallel Virtual Machine (via the package rpvm)
  - NetWorkSpaces (via the package nws)
- can either run on a local machine or a cluster (eg. Lyra)
multicore
- R package by Simon Urbanek
- implemented using the POSIX fork system call
  - available on Linux and Mac OS X
- clones the R instance (functions + data)
- takes advantage of copy-on-write
- will fork as many processes as there are available CPU cores, unless told otherwise
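As a minimal sketch of the fork-based approach, the following uses mclapply from the parallel package, which incorporates the multicore functionality. The function run_chain is a hypothetical stand-in for a real sampler, and the choice of 2 cores is an assumption made for illustration.

```r
library(parallel)

# Forking is unavailable on Windows, so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Hypothetical stand-in for one MCMC chain: the mean of 1000 normal draws.
run_chain <- function(chain_id) {
  mean(rnorm(1000, mean = chain_id))
}

# Each of the 4 tasks runs in a forked child that sees the parent's
# functions and data without them being copied explicitly.
chain_means <- mclapply(1:4, run_chain, mc.cores = n_cores)
```

Because the children are clones of the parent, run_chain does not need to be exported to them.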
parallel
- R package parallel included in the core R distribution
- available in versions ≥ 2.14.0
- incorporates subsets of snow, multicore, and rlecuyer
- sensible default behaviour
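The snow-derived part of parallel can also be used directly, without foreach. This is a small sketch, assuming a local cluster of 2 workers; the squaring function is only a placeholder for real work.

```r
library(parallel)

# makeCluster() creates a socket (PSOCK) cluster by default.
cl <- makeCluster(2)

# parLapply() splits the input across the workers and returns a list.
squares <- parLapply(cl, 1:4, function(x) x^2)

# Always shut the workers down when finished.
stopCluster(cl)

unlist(squares)   # 1 4 9 16
```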
foreach
"syntactic sugar"

    library(foreach)
    library(parallel)
    library(doParallel)

    # will automatically use a SOCK cluster on Windows
    # (otherwise uses multicore)
    registerDoParallel(cores=detectCores())

    foreach(i=1:getDoParWorkers()) %dopar% {
      # this code will be executed concurrently
      ...
    }
foreach with SNOW

    library(foreach)
    library(parallel)
    library(doParallel)

    # setup local SOCK cluster for 4 CPU cores
    cl <- makePSOCKcluster(4)
    registerDoParallel(cl)

    foreach(i=1:getDoParWorkers()) %dopar% {
      # this code will be executed concurrently
      ...
    }
    stopCluster(cl)
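To make the pattern concrete, here is a small runnable sketch in which the loop body simply reports the process ID of the worker that ran it. It assumes the foreach and doParallel packages are installed; the cluster size of 2 is chosen only for illustration.

```r
library(foreach)
library(parallel)
library(doParallel)

cl <- makePSOCKcluster(2)
registerDoParallel(cl)

# .combine = c collects the per-iteration results into one vector.
worker_pids <- foreach(i = 1:getDoParWorkers(), .combine = c) %dopar% {
  Sys.getpid()
}

stopCluster(cl)

# The workers are separate copies of R, so none of the reported
# process IDs matches that of the master session.
all(worker_pids != Sys.getpid())
```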
foreach with multicore

    library(foreach)
    library(parallel)
    library(doParallel)

    # fork one child process for each CPU core
    cl <- makeForkCluster(detectCores())
    registerDoParallel(cl)

    foreach(i=1:getDoParWorkers()) %dopar% {
      # this code will be executed concurrently
      ...
    }
foreach with CODA
If your Gibbs sampler returns an mcmc object, the objects from each chain can be combined into an mcmc.list:

    library(coda)

    samples.list <- foreach(i=1:getDoParWorkers(),
                            .combine=mcmc.list,
                            .multicombine=TRUE) %dopar% {
      # this code will be executed concurrently
      ...
    }
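A complete version of this pattern might look like the sketch below. It assumes the coda, foreach and doParallel packages are installed. The "sampler" is a hypothetical stand-in that just draws from N(0, 1), and the parameter name theta, the cluster size and the seed are all illustrative choices.

```r
library(coda)
library(foreach)
library(parallel)
library(doParallel)

cl <- makePSOCKcluster(2)
clusterSetRNGStream(cl, iseed = 42)  # independent RNG stream per worker
registerDoParallel(cl)

# Hypothetical stand-in for a Gibbs sampler: 500 draws from N(0, 1),
# wrapped in an mcmc object as a real sampler's output would be.
samples.list <- foreach(i = 1:getDoParWorkers(),
                        .combine = mcmc.list,
                        .multicombine = TRUE,
                        .packages = "coda") %dopar% {
  mcmc(matrix(rnorm(500), ncol = 1, dimnames = list(NULL, "theta")))
}

stopCluster(cl)

# With several chains in one mcmc.list, coda's diagnostics apply directly.
gelman.diag(samples.list)
```

Passing .packages = "coda" loads coda on each worker, which is needed here because the loop body calls mcmc().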
foreach with other libraries
You need to declare any libraries that are used inside the child process. For example:

    library(mvtnorm)
    library(coda)

    foreach(i=1:getDoParWorkers(),
            .packages=c("mvtnorm","coda")) %dopar% {
      # this code uses mcmc(...) and rmvnorm(...)
      ...
    }
Random Number Generators for parallel MCMC
The chains of our Gibbs sampler run independently, but:
- if the same RNG is seeded with the same value, all of the chains will generate the same random numbers in the same sequence: they will be identical!
- we either need to use:
  - different seeds, or
  - different random number generators
  for each chain (preferably both)
- it is also advisable to choose (or generate) different initial values in each chain of our Gibbs sampler
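The seeding problem is easy to reproduce in a single R session. In this sketch, run_chain is a hypothetical stand-in for a chain that draws five uniform random numbers after being seeded.

```r
# Hypothetical stand-in for one chain: five uniform draws from a given seed.
run_chain <- function(seed) {
  set.seed(seed)
  runif(5)
}

# Same RNG, same seed: the two "chains" are identical.
same_a <- run_chain(1)
same_b <- run_chain(1)
identical(same_a, same_b)   # TRUE

# Same RNG, different seeds: the sequences differ.
diff_a <- run_chain(1)
diff_b <- run_chain(2)
identical(diff_a, diff_b)   # FALSE
```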
Mersenne Twister
- The default RNG in R
- pseudo-random sequence with 32-bit precision
- periodicity of 2^19937 − 1
- takes 0.4 seconds to generate 10^7 random numbers on an Intel Core i5 running R 2.15.1 and Windows 7
- open-source implementation available at: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html
Matsumoto & Nishimura (1998) TOMACS 8: 3–30.
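A quick way to check which generator is active and to repeat the timing on your own machine is sketched below. The elapsed time will depend on hardware and R version, so no particular figure should be expected.

```r
# First element is the uniform generator in use;
# "Mersenne-Twister" unless the default has been changed.
RNGkind()[1]

# Time the generation of 10^7 uniform random numbers.
timing <- system.time(x <- runif(1e7))
timing["elapsed"]
```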
Other RNGs in the base package
- Wichmann-Hill (1982) Applied Statistics 31, 188–190.
- Marsaglia-Multicarry (Usenet newsgroup sci.stat.math, 1997)
- Super-Duper (Reeds, J., Hubert, S. and Abrahams, M., 1982–4)

For JAGS with up to 4 concurrent chains:

    rngInits <- parallel.seeds("base::BaseRNG", 4)
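Within R itself, the generator can be switched with RNGkind. The sketch below draws from two of the alternative generators using the same seed and then restores the previous settings; the seed value and the number of draws are arbitrary.

```r
# Remember the current settings so they can be restored afterwards.
old_kind <- RNGkind()

RNGkind("Wichmann-Hill")
set.seed(1)
wh <- runif(3)

RNGkind("Marsaglia-Multicarry")
set.seed(1)
mm <- runif(3)

# Restore the original generator settings.
do.call(RNGkind, as.list(old_kind))

# Same seed, different generators: the sequences differ.
identical(wh, mm)
```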
L’Ecuyer
- Available via R libraries rlecuyer or parallel
- Multiple independent streams of random numbers
- Periodicity ≈ 2^191 (each stream is a subsequence of length 2^127)
- 0.6 seconds to generate 10^7 random numbers via runif

To initialize each child process in a SNOW cluster with an independent stream:

    cl <- makeCluster(4)
    clusterSetRNGStream(cl)
    registerDoParallel(cl)
L’Ecuyer, et al. (2002) Operations Research, 50(6): 1073–1075.
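The effect of clusterSetRNGStream can be checked with a small experiment. This sketch assumes a local cluster of 2 workers and supplies an arbitrary iseed so that the run is reproducible.

```r
library(parallel)

cl <- makeCluster(2)

# Give each worker its own L'Ecuyer stream, derived from iseed.
clusterSetRNGStream(cl, iseed = 123)
draws1 <- clusterEvalQ(cl, runif(3))

# Resetting with the same iseed reproduces the same streams.
clusterSetRNGStream(cl, iseed = 123)
draws2 <- clusterEvalQ(cl, runif(3))

stopCluster(cl)

identical(draws1, draws2)             # TRUE: reproducible given iseed
identical(draws1[[1]], draws1[[2]])   # FALSE: the workers' streams differ
```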
Summary
- Most MCMC algorithms are "embarrassingly parallel"
  - chains run independently (as long as the RNG is set up correctly)
- The R packages foreach and doParallel make parallelism easy, on any computing platform
- Related topics (not covered in this presentation):
  - Running R on a supercomputer (eg. lyra.qut.edu.au)
  - Cloud computing with Apache Hadoop
  - GPU programming in R (NVIDIA CUDA)
For Further Reading
- Norman Matloff. The Art of R Programming. No Starch Press, 2011.
- M. Schmidberger, M. Morgan, D. Eddelbuettel, H. Yu, L. Tierney & U. Mansmann. State of the Art in Parallel Computing with R. Journal of Statistical Software, 31(1), 2009.
- P. L’Ecuyer, R. Simard, E.J. Chen & W.D. Kelton. An Object-Oriented Random-Number Package with Many Long Streams and Substreams. Operations Research, 50(6): 1073–1075, 2002.
- M. Matsumoto & T. Nishimura. Mersenne Twister: A 623-Dimensionally Equidistributed Uniform Pseudo-Random Number Generator. ACM Transactions on Modeling and Computer Simulation, 8: 3–30, 1998.