Parallel RNGs
Theory: multiple RNGs
If you want to use random generators in multiple worker threads simultaneously, then you will want to use multiple RNGs. A few suggested approaches:
- Use
thread_rng
in each worker thread. This is seeded automatically (lazily and uniquely) on each thread where it is used. - Use
thread_rng
(or another master RNG) to seed a custom RNG on each worker thread. The main advantage here is flexibility over the RNG used. - Use a custom RNG per work unit, not per worker thread. If these RNGs are seeded in a deterministic fashion, then deterministic results are possible. Unfortunately, seeding a new RNG for each work unit from a master generator cannot be done in parallel, thus may be slow.
- Use a single master seed. For each work unit, seed an RNG using the master seed and set the RNG's stream to the work unit number. This is potentially a faster than (3) while still deterministic, but not supported by all RNGs.
Note: do not simply clone RNGs for worker threads/units. Clones return the same sequence of output as the original. You may however use clones if you then set a unique stream on each.
Streams
Which RNG families support multiple streams?
- ChaCha: the ChaCha RNGs support 256-bit seed, 64-bit stream and 64-bit counter (per 16-word block), thus supporting 264 streams of 268 words each.
- Hc128 is a cryptographic RNG supporting a 256-bit seed; one could construct this seed from (e.g.) a smaller 192-bit key plus a 64-bit stream.
Note that the above approach of constructing the seed from a smaller key plus a stream counter can only be recommended with cryptographic PRNGs since simpler RNGs often have correlations in the RNG's output using two similar keys, and may also require "random looking" seeds to produce high quality output.
Non-cryptographic PRNGs may still support multiple streams, but likely with significant limitations (especially noting that a common recommendation with such PRNGs is not to consume more than the square root of the generator's period).
-
Xoshiro: the Xoshiro family of RNGs support
jump
andlong_jump
methods which may effectively be used to divide the output of a single RNG into multiple streams. In practice this is only useful with a small number of streams, sincejump
must be calledn
times to select the nth "stream". -
Pcg: these RNGs support construction with
state
andstream
parameters. Note, however, that the RNGs have been critiqued in that multiple streams using the same key are often strongly correlated. See the author's own comments.The PCG RNGs also support an
fn advance(delta)
method, which might be used to divide a single stream into multiple sub-streams as with Xoshiro'sjump
(but better since the offset can be specified).
Practice: non-deterministic multi-threaded
We use Rayon's parallel iterators, using map_init
to initialize an RNG in
each worker thread. Note: this RNG may be re-used across multiple work units,
which may be split between worker threads in non-deterministic fashion.
use rand::distributions::{Distribution, Uniform}; use rayon::prelude::*; static SAMPLES: u64 = 1_000_000; fn main() { let range = Uniform::new(-1.0f64, 1.0); let in_circle = (0..SAMPLES) .into_par_iter() .map_init(|| rand::thread_rng(), |rng, _| { let a = range.sample(rng); let b = range.sample(rng); if a * a + b * b <= 1.0 { 1 } else { 0 } }) .reduce(|| 0usize, |a, b| a + b); // prints something close to 3.14159... println!( "π is approximately {}", 4. * (in_circle as f64) / (SAMPLES as f64) ); }
Practice: deterministic multi-threaded
We use approach (4) above to achieve a deterministic result: initialize all RNGs
from a single seed, but using multiple streams.
We use ChaCha8Rng::set_stream
to achieve this.
Note further that we manually batch multiple work-units according to
BATCH_SIZE
. This is important since the cost of initializing an RNG is large
compared to the cost of our work unit (generating two random samples plus some
trivial calculations). Manual batching could improve performance of the above
non-deterministic simulation too.
(Note: this example is https://github.com/rust-random/rand/blob/master/examples/rayon-monte-carlo.rs.)
use rand::distributions::{Distribution, Uniform}; use rand_chacha::{rand_core::SeedableRng, ChaCha8Rng}; use rayon::prelude::*; static SEED: u64 = 0; static BATCH_SIZE: u64 = 10_000; static BATCHES: u64 = 1000; fn main() { let range = Uniform::new(-1.0f64, 1.0); let in_circle = (0..BATCHES) .into_par_iter() .map(|i| { let mut rng = ChaCha8Rng::seed_from_u64(SEED); rng.set_stream(i); let mut count = 0; for _ in 0..BATCH_SIZE { let a = range.sample(&mut rng); let b = range.sample(&mut rng); if a * a + b * b <= 1.0 { count += 1; } } count }) .reduce(|| 0usize, |a, b| a + b); // prints 3.1409052 (determinstic and reproducible result) println!( "π is approximately {}", 4. * (in_circle as f64) / ((BATCH_SIZE * BATCHES) as f64) ); }