Parallel Random Generator Manny Ko Principal Engineer Activision

Parallel Random Generator - GDC 2015

Page 1: Parallel Random Generator - GDC 2015

Parallel Random Generator Manny Ko Principal Engineer Activision

Page 2: Parallel Random Generator - GDC 2015

Outline

●Serial RNG

●Background

●LCG, LFG, crypto-hash

●Parallel RNG

●Leapfrog, splitting, crypto-hash

Page 3: Parallel Random Generator - GDC 2015

RNG - desiderata

● White noise like

● Repeatable for any # of cores

● Fast

● Small storage

Page 4: Parallel Random Generator - GDC 2015

RNG Quality

● DIEHARD

● Spectral test

● SmallCrush

● BigCrush

GPUBBS

Page 5: Parallel Random Generator - GDC 2015

Power Spectrum

[Figure panels: power spectrum density, radial mean, radial variance]

Page 6: Parallel Random Generator - GDC 2015

Serial RNG: LCG

● Linear-congruential (LCG)

● $X_i = (a X_{i-1} + c) \bmod M$

● a, c and M must be chosen carefully!

● Never choose $M = 2^{31}$! It should be a prime

● Park & Miller: $a = 16807$, $m = 2147483647 = 2^{31} - 1$. $m$ is a Mersenne prime!

● Most likely in your C runtime
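A minimal sketch of the Park & Miller generator described above, written in Python for readability. The function names `minstd` and `nth` are illustrative, not from the talk. Park & Miller 88 publish a check value for this generator: starting from seed 1, the 10000th value should be 1043618065.

```python
M = 2**31 - 1   # Mersenne prime modulus from the slide
A = 16807       # Park & Miller multiplier (7**5)

def minstd(seed):
    """Yield the multiplicative LCG X_i = A * X_{i-1} mod M (c = 0)."""
    x = seed
    while True:
        x = (A * x) % M
        yield x

def nth(seed, n):
    """Return the n-th value generated after the seed (n >= 1)."""
    gen = minstd(seed)
    for _ in range(n - 1):
        next(gen)
    return next(gen)

print(nth(1, 10000))  # Park & Miller's published check value: 1043618065
```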

Page 7: Parallel Random Generator - GDC 2015

LCG: the good and bad

● Good:

● Simple and efficient even if we use mod

● Single word of state

● Bad:

● Short period – at most m

● Low bits are correlated, especially if $m = 2^n$

● Pure serial
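The low-bit problem is easy to see in a small experiment. The sketch below assumes a common textbook parameter set ($a = 1103515245$, $c = 12345$, $M = 2^{31}$), which is not taken from the talk. With a power-of-two modulus, the lowest $k$ bits repeat with a period of at most $2^k$, so the lowest bit simply alternates.

```python
# Assumed example constants with a power-of-two modulus (not from the talk).
A, C, M = 1103515245, 12345, 2**31

def lcg(seed, count):
    """Return `count` successive values of X = (A * X + C) mod M."""
    x, out = seed, []
    for _ in range(count):
        x = (A * x + C) % M
        out.append(x)
    return out

values = lcg(seed=42, count=16)
low_bits = [v & 1 for v in values]   # lowest bit: period 2
low_two = [v & 3 for v in values]    # lowest two bits: period 4
print(low_bits)
print(low_two)
```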

Page 8: Parallel Random Generator - GDC 2015

LCG - bad

● $X_{k+1} = (3 X_k + 4) \bmod 8$

● {1,7,1,7, … }
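The degenerate cycle above can be reproduced directly. This is a small sketch (the helper name `bad_lcg` is illustrative); it also checks that every seed of this generator falls into a cycle of length 1 or 2.

```python
def bad_lcg(x, count):
    """Return `count` values of X_{k+1} = (3 * X_k + 4) mod 8, starting at x."""
    out = []
    for _ in range(count):
        out.append(x)
        x = (3 * x + 4) % 8
    return out

print(bad_lcg(1, 6))  # [1, 7, 1, 7, 1, 7]

# Two steps always return to the starting value, whatever the seed.
cycle_lengths = {seed: len(set(bad_lcg(seed, 8))) for seed in range(8)}
print(cycle_lengths)
```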

Page 9: Parallel Random Generator - GDC 2015

Mersenne Prime modulo

● IDIV can be 40~80 cycles for 32b/32b

● $k \bmod p$ where $p = 2^s - 1$:

● i = (k & p) + (k >> s);

● ret (i >= p) ? i - p : i;
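A sketch of the division-free reduction above, in Python. It relies on $2^s \equiv 1 \pmod p$, so the high and low halves of $k$ can simply be added. The single conditional subtraction is only sufficient when $0 \le k \le p^2$, which covers the product of two already-reduced values; that bound is an assumption of this sketch, and the name `mod_mersenne` is illustrative.

```python
S = 31
P = (1 << S) - 1   # 2**31 - 1, the Mersenne prime used by Park & Miller

def mod_mersenne(k):
    """Return k mod P for 0 <= k <= P*P using mask, shift and add only."""
    i = (k & P) + (k >> S)          # low S bits plus the remaining high bits
    return i - P if i >= P else i   # at most one correction step is needed

# One LCG step without a division: 16807 * x mod P
x = 123456789
print(mod_mersenne(16807 * x) == (16807 * x) % P)
```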

Page 10: Parallel Random Generator - GDC 2015

Lagged-Fibonacci Generator

● $X_i = X_{i-p} \star X_{i-q}$; p and q are the lags

● $\star$ is +, − or * mod M (or XOR)

● ALFG: $X_n = (X_{n-j} + X_{n-k}) \bmod 2^m$

● * gives the best quality

● Period = $(2^p - 1)\,2^{b-3}$; $M = 2^b$
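A minimal sketch of the additive variant (ALFG), assuming the lag pair (5, 17) and 32-bit words; neither choice comes from the talk, and real use needs a carefully chosen lag pair and a better seeding procedure. The state is just the last `k` outputs, kept in a fixed-length deque.

```python
from collections import deque

def alfg(seed_words, j, k, bits=32):
    """Yield X_n = (X_{n-j} + X_{n-k}) mod 2**bits, with lags j < k."""
    assert 0 < j < k == len(seed_words)
    assert any(w & 1 for w in seed_words), "need at least one odd seed word"
    mask = (1 << bits) - 1
    # state[0] is X_{n-k} (oldest), state[-1] is X_{n-1} (newest)
    state = deque(seed_words, maxlen=k)
    while True:
        x = (state[-j] + state[0]) & mask
        state.append(x)   # maxlen drops the oldest word automatically
        yield x

# Illustrative seeding only: k small integers, at least one of them odd.
gen = alfg(list(range(1, 18)), j=5, k=17)
print([next(gen) for _ in range(5)])
```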

Page 11: Parallel Random Generator - GDC 2015

LFG

● The good:

●Very efficient: 2 ops + power-of-2 mod

●Much longer period than LCG

●Directly works in floats

●Higher quality than LCG

●ALFG can skip ahead

Page 12: Parallel Random Generator - GDC 2015

LFG – the bad

● Need to store max(p,q) floats

● Purely sequential

● multiplicative LFG can’t jump ahead.

Page 13: Parallel Random Generator - GDC 2015

Mersenne Twister

● Gold standard ?

● Large state (624 ints)

● Lots of flops

● Hard to leapfrog

● Limited parallelism

power spectrum

Page 14: Parallel Random Generator - GDC 2015

● End of Basic RNG Overview

Page 15: Parallel Random Generator - GDC 2015

Parallel RNG

● Maintain the RNG’s quality

● Same result regardless of the # of cores

● Minimal state especially for gpu.

● Minimal correlation among the streams.

Page 16: Parallel Random Generator - GDC 2015

Random Tree

• 2 LCGs with different 𝑎

• L used to generate a seed for R

• No need to know the number of generators or the number of values per thread

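The random-tree idea can be sketched as follows. This assumes two multiplicative LCGs modulo $2^{31} - 1$ with multipliers 16807 (for L) and 48271 (for R); the second multiplier and the function name `spawn_streams` are assumptions made for illustration. Because each stream's seed comes only from L, stream $i$ does not change when more streams or more values per stream are requested.

```python
M = 2**31 - 1
A_LEFT = 16807    # multiplier for L, the generator that hands out seeds
A_RIGHT = 48271   # multiplier for R, the per-stream generator (assumed)

def spawn_streams(root_seed, n_streams, n_values):
    """Return n_streams lists of n_values numbers, one list per thread."""
    streams = []
    left = root_seed
    for _ in range(n_streams):
        left = (A_LEFT * left) % M       # L: next stream seed
        x, values = left, []
        for _ in range(n_values):
            x = (A_RIGHT * x) % M        # R: walk along this stream
            values.append(x)
        streams.append(values)
    return streams

three = spawn_streams(root_seed=1, n_streams=3, n_values=4)
five = spawn_streams(root_seed=1, n_streams=5, n_values=4)
print(three == five[:3])
```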

Page 17: Parallel Random Generator - GDC 2015

Leapfrog with 3 cores

• Each thread leaps ahead by $N$ using L

• Each thread uses its own R to generate its own sequence

• $N = \text{cores} \times \text{seq\_per\_core}$

Page 18: Parallel Random Generator - GDC 2015

Leapfrog

● Basic LCG without c:

● $L_{k+1} = a L_k \bmod m$

● $R_{k+1} = a^n R_k \bmod m$

● LCG: $A = a^n$ and $C = c(a^n - 1)/(a - 1)$; each core jumps ahead by n (# of cores)
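A sketch of leapfrogging for a general LCG. Rather than dividing by $(a - 1)$ modulo $m$, it builds $A$ and $C$ by composing the single-step map $n$ times, which gives the same $A = a^n$ and $C = c(1 + a + \dots + a^{n-1})$. The function names and the example constants are assumptions for illustration.

```python
def lcg_step(x, a, c, m):
    return (a * x + c) % m

def leap_params(a, c, m, n):
    """Return (A, C) so that one step with (A, C) equals n steps with (a, c)."""
    A, C = 1, 0
    for _ in range(n):
        # Compose x -> a*x + c onto the accumulated map x -> A*x + C.
        A, C = (a * A) % m, (a * C + c) % m
    return A, C

def leapfrog_stream(seed, a, c, m, n_cores, core, count):
    """Values X_{core+1}, X_{core+1+n}, X_{core+1+2n}, ... of the serial LCG."""
    A, C = leap_params(a, c, m, n_cores)
    x = seed
    for _ in range(core + 1):        # walk to this core's first value
        x = lcg_step(x, a, c, m)
    out = [x]
    for _ in range(count - 1):
        x = lcg_step(x, A, C, m)     # jump n_cores positions at once
        out.append(x)
    return out

a, c, m = 1103515245, 12345, 2**31   # assumed example constants
streams = [leapfrog_stream(1, a, c, m, 3, core, 4) for core in range(3)]
interleaved = [streams[i % 3][i // 3] for i in range(12)]
print(interleaved[:3])
```

Interleaving the per-core streams in round-robin order gives back the serial sequence, which is the property the slide relies on.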

Page 19: Parallel Random Generator - GDC 2015

Leapfrog with 3 cores

• Each sequence will not overlap

• Final sequence is the same as the serial code

Page 20: Parallel Random Generator - GDC 2015

Leapfrog – the good

● Same sequence as serial code

● Limited choice of RNG (e.g. no MLFG)

● No need to fix the # of random values used per core (need to fix ‘n’)

Page 21: Parallel Random Generator - GDC 2015

Leapfrog – the bad

● $a^p$ no longer has the good qualities of $a$

● A power-of-2 N produces correlated sub-sequences

● Need to fix ‘n’ - # of generators/sequences

● The period of the original RNG is shortened by a factor of ‘n’. A 32-bit LCG has a short period to start with.

Page 22: Parallel Random Generator - GDC 2015

Sequence Splitting

• If we know the # of values per thread, $n$:

• $L_{k+1} = a^n L_k \bmod m$

• $R_{k+1} = a R_k \bmod m$

• the sequence is a subset of the serial code
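A sketch of sequence splitting for the multiplicative Park & Miller LCG. Thread $t$ jumps straight to the start of its block with $a^{nt}$ and then steps normally; the function name `split_stream` is illustrative, and each thread must draw no more than `block_len` values or the blocks overlap.

```python
M = 2**31 - 1
A = 16807

def split_stream(seed, block_len, thread, count):
    """Return `count` values from the block owned by `thread`."""
    assert count <= block_len, "drawing more would overlap the next block"
    jump = pow(A, block_len, M)                 # a^n mod m
    start = (pow(jump, thread, M) * seed) % M   # L advanced `thread` times
    x, out = start, []
    for _ in range(count):
        x = (A * x) % M                         # R: ordinary single steps
        out.append(x)
    return out

blocks = [split_stream(seed=1, block_len=5, thread=t, count=5) for t in range(3)]
print(blocks[0])
```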

Page 23: Parallel Random Generator - GDC 2015

Leapfrog and Splitting

● Only guarantee that the sequences do not overlap; they say nothing about quality

● Not invariant to degree of parallelism

● Results change when the # of cores changes

● Serial and parallel code do not match

Page 24: Parallel Random Generator - GDC 2015

Lagged-Fibonacci Leapfrog

● LFG has a very long period

● Period = $(2^p - 1)\,2^{b-3}$; $M = 2^b$

● 𝑀 can be power-of-two!

● Much better quality than LCG

● No leapfrog for the best variant – ‘*’

● Luckily the ALFG supports leapfrogging

Page 25: Parallel Random Generator - GDC 2015

Issues with Leapfrog & Splitting

● LCG’s period gets even shorter

● Questionable quality

● ALFG is much better but has to store more state for the ‘lag’.

Page 26: Parallel Random Generator - GDC 2015

Crypto Hash

● MD5

● TEA: tiny encryption algorithm

Page 27: Parallel Random Generator - GDC 2015

Core Idea

1. input trivially prepared in parallel, e.g. linear ramp

2. feed input value into hash, independently and in parallel

3. output white noise

[Figure: input → hash → output]
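The three steps can be sketched with MD5 from Python's standard `hashlib`. The packing of `(seed, index)` into 16 bytes and the choice to keep only the first 32 bits of the digest are assumptions of this sketch, not details from the talk. Each output depends only on its own index, so the result is the same however the index range is divided among workers.

```python
import hashlib
import struct

def hash_noise(index, seed=0):
    """Map an integer index to a float in [0, 1) by hashing it with MD5."""
    digest = hashlib.md5(struct.pack("<QQ", seed, index)).digest()
    (word,) = struct.unpack_from("<I", digest)   # first 32 bits of the digest
    return word / 2**32

def noise_chunk(start, stop, seed=0):
    """Noise for the linear ramp start, start+1, ..., stop-1."""
    return [hash_noise(i, seed) for i in range(start, stop)]

whole = noise_chunk(0, 12)
parts = noise_chunk(0, 5) + noise_chunk(5, 12)   # as if split over two workers
print(whole == parts)
```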

Page 28: Parallel Random Generator - GDC 2015

TEA

● A Feistel coder

● Input is split into L and R

● 128-bit key

● F: shifts and XORs or adds

Page 29: Parallel Random Generator - GDC 2015

TEA

Page 30: Parallel Random Generator - GDC 2015

Magic ‘delta’

● $\text{delta} = (\sqrt{5} - 1)\,2^{31}$

● Avalanche in 6 cycles (often in 4)

● * mixes better than ^ but makes TEA twice as slow
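A sketch of TEA used as a hash, following the structure of the published cipher with 32-bit halves and a 128-bit key given as four 32-bit words. The key below is an arbitrary placeholder, `tea_noise` is a hypothetical helper, and its default of 6 cycles follows the avalanche figure quoted above; fewer cycles trade quality for speed. The constant `DELTA` is computed from the formula on this slide.

```python
import math

MASK = 0xFFFFFFFF
DELTA = int((math.sqrt(5) - 1) * 2**31)   # equals 0x9E3779B9

def tea_encrypt(v0, v1, key, rounds=32):
    """Run `rounds` TEA cycles on the 32-bit halves (v0, v1)."""
    k0, k1, k2, k3 = key
    total = 0
    for _ in range(rounds):
        total = (total + DELTA) & MASK
        # Masking once per update is enough: only the low 32 bits are kept.
        v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK
        v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK
    return v0, v1

KEY = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)  # arbitrary placeholder

def tea_noise(index, key=KEY, rounds=6):
    """Hash an integer index to a float in [0, 1)."""
    v0, _ = tea_encrypt(index & MASK, 0, key, rounds)
    return v0 / 2**32

print(hex(DELTA))
print([round(tea_noise(i), 3) for i in range(4)])
```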

Page 31: Parallel Random Generator - GDC 2015

Applications

Fractal terrain

(vertex shader)

Texture tiling

(fragment shader)

Page 32: Parallel Random Generator - GDC 2015

SPRNG

● Good package by Michael Mascagni

● http://www.sprng.org/

Page 33: Parallel Random Generator - GDC 2015

References

● [Mascagni 99] Some Methods for Parallel Pseudorandom Number Generation, 1999.

● [Park & Miller 88] Random Number Generators: Good Ones Are Hard to Find, CACM, 1988.

● [Pryor 94] Implementation of a Portable and Reproducible Parallel Pseudorandom Number Generator, SC, 1994

● [Tzeng & Li 08] Parallel White Noise Generation on a GPU via Cryptographic Hash, I3D, 2008

● [Wheeler 95] TEA, a tiny encryption algorithm, 1995.

Page 34: Parallel Random Generator - GDC 2015

Take Aways

● Look beyond LCG

● ALFG is worth a closer look

● Crypto-based hash is most promising – especially TEA.