
Parallel MCMC | Random Number Generators | Summary

Parallel Bayesian computation in R ≥ 2.14 using the packages foreach and parallel

Matt Moores, Cathy Hargrave

Bayesian Research & Applications Group, Queensland University of Technology, Brisbane, Australia

CRICOS provider no. 00213J

Thursday September 27, 2012

BRAG Sept. 27 Parallel MCMC in R


Outline

1 Parallel MCMC
   Introduction
   R packages

2 Random Number Generators
   RNG and parallel MCMC
   RNGs available in R


Introduction | R packages

Motivation

Why parallel?
- large datasets
- many MCMC iterations
- multiple CPU cores are now commonplace
  - e.g. Intel Core i5 and i7
  - even mobile phones have multicore CPUs


Parallel MCMC

Two kinds of parallelism:
- concurrent MCMC chains
  - always applicable
  - straightforward to implement
- concurrent updates within an iteration
  - only useful for a very large parameter space
  - ideally implemented in a compiled language (e.g. Rcpp with OpenMP)

There is also implicit parallelism, e.g. from a multithreaded linear algebra library such as the Intel Math Kernel Library.


Concurrent Chains
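Concurrent chains amount to running the same sampler once per chain, each with its own starting value and its own random number stream. The following is a minimal sketch using only the parallel package; the random-walk Metropolis sampler `rw_metropolis()` and its N(0, 1) target are hypothetical stand-ins, not code from this presentation:

```r
library(parallel)

# Hypothetical toy sampler: random-walk Metropolis targeting N(0, 1).
rw_metropolis <- function(n.iter, init, step = 1) {
  x <- numeric(n.iter)
  current <- init
  for (t in seq_len(n.iter)) {
    proposal <- current + rnorm(1, sd = step)
    # accept with probability min(1, pi(proposal) / pi(current))
    log.ratio <- dnorm(proposal, log = TRUE) - dnorm(current, log = TRUE)
    if (log(runif(1)) < log.ratio) current <- proposal
    x[t] <- current
  }
  x
}

inits <- c(-10, -3, 3, 10)              # a different starting value for each chain
cl <- makeCluster(2)                    # 2 local worker processes (PSOCK)
clusterSetRNGStream(cl, iseed = 2012)   # an independent RNG stream per worker
clusterExport(cl, "rw_metropolis")      # workers start with an empty workspace
chains <- parLapply(cl, inits, function(init) rw_metropolis(5000, init))
stopCluster(cl)

# posterior mean of each chain after discarding 1000 burn-in iterations
post.means <- sapply(chains, function(x) mean(x[-(1:1000)]))
post.means
```

All four posterior means should be close to 0, even though the chains start far apart.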


Simple Network Of Workstations

R package snow by Luke Tierney, et al.
- spawns multiple copies of R
- provides several options for inter-process communication:
  - TCP sockets (available on any platform, including Microsoft Windows)
  - Message Passing Interface (via the package Rmpi)
  - Parallel Virtual Machine (via the package rpvm)
  - NetWorkSpaces (via the package nws)
- can run either on a local machine or on a cluster (e.g. Lyra)
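With the socket transport each worker is a fresh R process, so data have to be shipped to the workers explicitly. A minimal sketch using the snow-derived functions that are also exported by parallel (`makeCluster`, `clusterExport`, `parLapply`); the data set `y` and the bootstrap task are hypothetical:

```r
library(parallel)

y <- rnorm(100, mean = 3)              # hypothetical data set held by the master
cl <- makeCluster(2, type = "PSOCK")   # 2 R worker processes talking over TCP sockets

# Workers start with an empty workspace: copy y to each of them.
clusterExport(cl, "y")

# Each task computes one bootstrap mean of y (a stand-in for real work).
boot.means <- parLapply(cl, 1:4, function(i) mean(sample(y, replace = TRUE)))
stopCluster(cl)

unlist(boot.means)
```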


multicore

R package by Simon Urbanek
- implemented using the POSIX fork system call
  - available on Linux and Mac OS X
- clones the R instance (functions + data)
  - takes advantage of copy-on-write
- will fork as many processes as there are available CPU cores, unless told otherwise
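The multicore interface survives in parallel as `mclapply()`. Because the workers are forked copies of the master, objects already in memory (here a hypothetical matrix `X`) are visible to them without any explicit export. A minimal sketch; forking is unavailable on Windows, so the sketch falls back to a single core there:

```r
library(parallel)

X <- matrix(rnorm(1e6), ncol = 100)   # hypothetical large data set in the master

# Forking is not supported on Windows, where mc.cores must be 1.
n.cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each forked child reads X through copy-on-write; nothing is exported first.
col.sums <- mclapply(1:4, function(i) sum(X[, i]), mc.cores = n.cores)
unlist(col.sums)
```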


parallel

R package parallel
- included in the core R distribution
- available in versions ≥ 2.14.0
- incorporates subsets of snow, multicore, and rlecuyer
- sensible default behaviour
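Since parallel bundles both the fork and the socket back ends, one script can choose whichever suits the platform. A minimal sketch, assuming a hypothetical helper `portable_lapply()` that is not part of parallel itself:

```r
library(parallel)

# Hypothetical helper: fork on Linux / Mac OS X, socket cluster elsewhere.
portable_lapply <- function(X, FUN, cores = 2L) {
  if (.Platform$OS.type == "unix") {
    mclapply(X, FUN, mc.cores = cores)
  } else {
    cl <- makeCluster(cores)          # PSOCK is the default cluster type
    on.exit(stopCluster(cl))
    parLapply(cl, X, FUN)
  }
}

detectCores()                         # cores visible to parallel (NA if unknown)
squares <- portable_lapply(1:4, function(i) i^2)
unlist(squares)
```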


foreach

foreach provides "syntactic sugar" for parallel loops:

library(foreach)
library(parallel)
library(doParallel)

# will automatically use a SOCK cluster on Windows
# (otherwise uses multicore)
registerDoParallel(cores = detectCores())

foreach(i = 1:getDoParWorkers()) %dopar% {
  # this code will be executed concurrently
  # ...
}


foreach with SNOW

library(foreach)
library(parallel)
library(doParallel)

# set up a local SOCK cluster for 4 CPU cores
cl <- makePSOCKcluster(4)
registerDoParallel(cl)

foreach(i = 1:getDoParWorkers()) %dopar% {
  # this code will be executed concurrently
  # ...
}
stopCluster(cl)


foreach with multicore

library(foreach)
library(parallel)
library(doParallel)

# fork one child process for each CPU core
# (forking is not available on Windows)
cl <- makeForkCluster(detectCores())
registerDoParallel(cl)

foreach(i = 1:getDoParWorkers()) %dopar% {
  # this code will be executed concurrently
  # ...
}
stopCluster(cl)


foreach with CODA

If your Gibbs sampler returns an mcmc object, the results from all of the workers can be combined into an mcmc.list:

library(coda)

samples.list <- foreach(i = 1:getDoParWorkers(),
                        .combine = mcmc.list,
                        .multicombine = TRUE) %dopar% {
  # this code will be executed concurrently
  # ...
}
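To make the pattern concrete, here is a minimal sketch assuming a hypothetical Gibbs sampler `gibbs_bvn()` for a standard bivariate normal with correlation rho, which alternates the full conditionals x | y ~ N(rho y, 1 − rho^2) and y | x ~ N(rho x, 1 − rho^2). Each worker returns an mcmc object, and foreach combines them into an mcmc.list that coda can diagnose:

```r
library(foreach)
library(parallel)
library(doParallel)
library(coda)

# Hypothetical Gibbs sampler for a standard bivariate normal with correlation rho.
gibbs_bvn <- function(n.iter, rho, init) {
  out <- matrix(NA_real_, nrow = n.iter, ncol = 2,
                dimnames = list(NULL, c("x", "y")))
  x <- init[1]
  y <- init[2]
  s <- sqrt(1 - rho^2)                 # conditional standard deviation
  for (t in seq_len(n.iter)) {
    x <- rnorm(1, rho * y, s)          # draw x | y
    y <- rnorm(1, rho * x, s)          # draw y | x
    out[t, ] <- c(x, y)
  }
  mcmc(out)                            # coda::mcmc object
}

cl <- makeCluster(2)
clusterSetRNGStream(cl, iseed = 2012)  # independent stream per worker
registerDoParallel(cl)

inits <- list(c(-5, -5), c(5, 5))      # over-dispersed starting values
samples.list <- foreach(i = 1:2, .combine = mcmc.list, .multicombine = TRUE,
                        .packages = "coda") %dopar% {
  gibbs_bvn(2000, rho = 0.5, init = inits[[i]])
}
stopCluster(cl)

gelman.diag(samples.list)              # potential scale reduction factors
```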


foreach with other libraries

You need to declare any packages that are used inside the child processes. For example:

library(mvtnorm)
library(coda)

foreach(i = 1:getDoParWorkers(),
        .packages = c("mvtnorm", "coda")) %dopar% {
  # this code uses mcmc(...) and rmvnorm(...)
  # ...
}


RNG and parallel MCMC | RNGs available in R

Random Number Generators for parallel MCMC

The chains of our Gibbs sampler run independently, but if the same RNG is seeded with the same value, all of the chains will generate the same random numbers in the same sequence: they will be identical! We need to use either
- different seeds, or
- different random number generators

for each chain (preferably both). It is also advisable to choose (or generate) different initial values for each chain of our Gibbs sampler.
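The pitfall is easy to reproduce without any parallelism: two "chains" that seed the same generator with the same value produce exactly the same draws. A minimal sketch, using a hypothetical function `draw_chain()` in place of a real sampler:

```r
# Hypothetical stand-in for one chain of a sampler: seed, then draw.
draw_chain <- function(seed, n = 5) {
  set.seed(seed)
  cumsum(rnorm(n))                 # a toy random walk
}

chain1 <- draw_chain(seed = 42)
chain2 <- draw_chain(seed = 42)    # same RNG, same seed
chain3 <- draw_chain(seed = 43)    # same RNG, different seed

identical(chain1, chain2)          # TRUE: the two chains are duplicates
identical(chain1, chain3)          # FALSE
```

Different seeds for the same generator make duplicates unlikely, but they do not by themselves guarantee non-overlapping sequences; that is what dedicated parallel streams are for.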


Mersenne Twister

The default RNG in R
- pseudo-random sequence with 32-bit precision
- periodicity of $2^{19937} - 1$
- takes 0.4 seconds to generate $10^7$ random numbers on an Intel Core i5 running R 2.15.1 and Windows 7

open-source implementation available at:http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html


Matsumoto & Nishimura (1998) TOMACS 8: 3–30.
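In R the generator in use can be queried and selected with `RNGkind()`, and its state is stored in `.Random.seed`. A small sketch that selects the Mersenne Twister explicitly, inspects its state, and repeats the timing above on whatever machine runs it (timings will differ from the figure quoted):

```r
set.seed(2012, kind = "Mersenne-Twister")
RNGkind()[1]                       # "Mersenne-Twister" (also R's default kind)

# 626 integers: one code for the RNG kinds, the position index, and 624 state words
length(.Random.seed)

system.time(u <- runif(1e7))       # elapsed time to draw 10^7 uniforms
```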


Other RNGs in the base package

- Wichmann-Hill (1982) Applied Statistics 31, 188–190
- Marsaglia-Multicarry (Usenet newsgroup sci.stat.math, 1997)
- Super-Duper (Reeds, J., Hubert, S. and Abrahams, M., 1982–4)

For JAGS with up to 4 concurrent chains (the base::BaseRNG factory supplies four generators: these three plus the Mersenne Twister), using the rjags package:

library(rjags)
rngInits <- parallel.seeds("base::BaseRNG", 4)
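Within R itself, each chain can be given one of these generators through the `kind` argument of `set.seed()`. This is a base-R analogue for illustration, not the mechanism JAGS uses. A minimal sketch, assuming a hypothetical helper `draws_with_kind()` that switches back to the caller's generator kind afterwards:

```r
# Hypothetical helper: draw n uniforms from a given base-R generator.
draws_with_kind <- function(kind, seed, n = 5) {
  old.kind <- RNGkind()[1]
  # restore the caller's generator kind (its state is re-initialised, not restored)
  on.exit(RNGkind(old.kind))
  set.seed(seed, kind = kind)
  runif(n)
}

kinds <- c("Wichmann-Hill", "Marsaglia-Multicarry", "Super-Duper", "Mersenne-Twister")

# Same seed, but a different generator for each chain.
draws <- lapply(kinds, draws_with_kind, seed = 1)
names(draws) <- kinds
draws
```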


L’Ecuyer

- available via the R packages rlecuyer or parallel
- multiple independent streams of random numbers
- periodicity $\approx 2^{191}$ (each stream is a subsequence of length $2^{127}$)
- 0.6 seconds to generate $10^7$ random numbers via runif

To initialize each child process in a SNOW cluster with an independent stream:

library(parallel)
library(doParallel)

cl <- makeCluster(4)
clusterSetRNGStream(cl)   # pass iseed = <integer> for reproducible streams
registerDoParallel(cl)


L’Ecuyer, et al. (2002) Operations Research, 50(6): 1073–1075.
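The streams can also be handled explicitly. After switching to the `"L'Ecuyer-CMRG"` generator, `nextRNGStream()` from parallel advances a seed to the start of the next stream, and `clusterSetRNGStream()` with a fixed `iseed` makes the worker streams reproducible from run to run. A minimal sketch; the helper `run_once()` is hypothetical:

```r
library(parallel)

# Explicit streams: seed L'Ecuyer-CMRG, then derive one seed per chain.
RNGkind("L'Ecuyer-CMRG")           # note: the master keeps this generator afterwards
set.seed(2012)
seeds <- vector("list", 4)
seeds[[1]] <- .Random.seed
for (k in 2:4) seeds[[k]] <- nextRNGStream(seeds[[k - 1]])  # start of the next stream

# Each chain would install its own seed before sampling.
first.draws <- sapply(seeds, function(s) {
  assign(".Random.seed", s, envir = .GlobalEnv)
  runif(1)
})

# Cluster version: the same iseed gives the same worker streams on every run.
run_once <- function() {
  cl <- makeCluster(2)
  on.exit(stopCluster(cl))
  clusterSetRNGStream(cl, iseed = 2012)
  unlist(clusterEvalQ(cl, runif(3)))
}
r1 <- run_once()
r2 <- run_once()
identical(r1, r2)                  # TRUE
```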


Summary

- Most MCMC algorithms are "embarrassingly parallel" across chains: the chains run independently (as long as the RNG is set up correctly)
- The R packages foreach and doParallel make parallelism easy on any computing platform
- Related topics (not covered in this presentation):
  - running R on a supercomputer (e.g. lyra.qut.edu.au)
  - cloud computing with Apache Hadoop
  - GPU programming in R (nVidia CUDA)


Appendix

For Further Reading

N. Matloff. The Art of R Programming. No Starch Press, 2011.

M. Schmidberger, M. Morgan, D. Eddelbuettel, H. Yu, L. Tierney & U. Mansmann. State of the Art in Parallel Computing with R. Journal of Statistical Software, 31(1), 2009.

P. L’Ecuyer, R. Simard, E.J. Chen & W.D. Kelton. An Object-Oriented Random-Number Package with Many Long Streams and Substreams. Operations Research, 50(6): 1073–1075, 2002.

M. Matsumoto & T. Nishimura. Mersenne Twister: A 623-Dimensionally Equidistributed Uniform Pseudo-Random Number Generator. ACM Transactions on Modeling and Computer Simulation, 8: 3–30, 1998.
