Pseudorandom number generators

Random numbers are useful for a variety of purposes, such as generating data encryption keys, simulating and modeling complex phenomena and for selecting random samples from larger data sets. They have also been used aesthetically, for example in literature and music, and are of course ever popular for games and gambling. When discussing single numbers, a random number is one that is drawn from a set of possible values, each of which is equally probable, i.e., a uniform distribution. When discussing a sequence of random numbers, each number drawn must be statistically independent of the others.[1]

This chapter explains why it is hard, and why it is very important (besides being interesting), to understand how to get a computer to generate proper random numbers. Since most currently used security and encryption standards depend on random numbers, choosing insecure algorithms exposes many aspects of our digital lives to clever programmers and to companies interested in data analysis or in less legal practices (identity theft, surveillance, bank fraud and so on). The privacy of, for example, banking activities, email communication and social networking depends heavily on the randomness used in the applied security mechanisms.

In cryptography, a pseudorandom generator (or PRG) is a procedure that outputs a sequence computationally indistinguishable from a truly random, uniformly distributed sequence. The prefix pseudo (from Greek ψευδής "lying, false") is used to mark something as false, fraudulent, or pretending to be something it is not. Pseudorandom generators find application in many fields besides cryptography, such as applied mathematics, physics and simulations. Simulations often require mechanisms producing sequences of random values. These procedures are non-trivial and often require significant amounts of computational time.

Definition

A pseudorandom generator (or PRG) is a (deterministic) map ${\displaystyle {G:\{0,1\}^{l}\rightarrow \{0,1\}^{n}}}$, where ${\displaystyle {n\geq l}}$. Here ${\displaystyle {l}}$ is the 'seed length' and ${\displaystyle {n-l\geq 0}}$ is the 'stretch'. We typically think that ${\displaystyle {n\gg l}}$ and that ${\displaystyle {G}}$ is efficiently computable in some model. If ${\displaystyle {f:\{0,1\}^{n}\rightarrow \{0,1\}}}$ is any 'statistical test', we say that ${\displaystyle {G}}$ '${\displaystyle {\epsilon }}$-fools' ${\displaystyle {f}}$ if

${\displaystyle {|\Pr[f(U_{n})=1]-\Pr[f(G(U_{l}))=1]|\leq \epsilon }}$

where ${\displaystyle {U_{m}}}$ denotes a uniformly random string in ${\displaystyle {\{0,1\}^{m}}}$. Here the string ${\displaystyle {U_{l}}}$ is called the 'seed'. If ${\displaystyle {C}}$ is a class of tests, we say that G '${\displaystyle {\epsilon }}$-fools ${\displaystyle {C}}$' or is an '${\displaystyle {\epsilon }}$-PRG against ${\displaystyle {C}}$' if ${\displaystyle {G}}$ ${\displaystyle {\epsilon }}$-fools ${\displaystyle {f}}$ for every ${\displaystyle {f\in C}}$.[2]

In other words, we are trying to convince any outside party (let's call them an adversary) that the sequences returned by the PRG were chosen at random. The adversary may run statistical tests on both the outputs of the PRG and uniformly random sequences. A PRG ensures that both look the same to the adversary.
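The definition can be made concrete with a deliberately bad toy generator. The sketch below is illustrative only: the names `parity_generator`, `parity_test` and `advantage` are hypothetical, and the generator (which appends the parity of the seed, giving a stretch of one bit) is chosen because it is easy to break. The code enumerates all seeds and all output strings to compute the distinguishing gap exactly.

```python
from fractions import Fraction
from itertools import product


def parity_generator(seed):
    """Toy G: {0,1}^l -> {0,1}^(l+1) that appends the parity of the seed."""
    return seed + (sum(seed) % 2,)


def parity_test(bits):
    """Statistical test f: returns 1 if the last bit is the parity of the rest."""
    *head, last = bits
    return 1 if sum(head) % 2 == last else 0


def advantage(generator, test, seed_len, out_len):
    """Exact value of |Pr[f(U_n) = 1] - Pr[f(G(U_l)) = 1]| by enumeration."""
    seeds = list(product((0, 1), repeat=seed_len))
    strings = list(product((0, 1), repeat=out_len))
    p_uniform = Fraction(sum(test(u) for u in strings), len(strings))
    p_generator = Fraction(sum(test(generator(s)) for s in seeds), len(seeds))
    return abs(p_uniform - p_generator)


print(advantage(parity_generator, parity_test, seed_len=4, out_len=5))  # 1/2
```

The test accepts every output of the toy generator but only half of all uniformly random strings, so this generator does not fool the test for any ${\displaystyle {\epsilon }}$ below 1/2.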

Required properties

A reliable PRG should have all of the following properties[3]:

Unbiased - Uniform distribution
By definition of the word unbiased, this property states that the PRG shows no prejudice for or against any value. In PRG terms, this means that all values are equiprobable, whatever sample size is collected. This property ensures the independence of the generator and its stability against certain types of attacks.

Unpredictable - Independence
It is impossible to predict the next output given all the previous outputs but not the internal state. If this is not guaranteed, then basically anyone can pose as the generator. This is exploited, for example, in the man-in-the-middle attack.

Unreproducible
Two instances of the same generator, given the same starting conditions, will produce different outputs. Certain types of spoofing attacks try to reproduce a subset of the real production environment in order to exploit a lack of security with respect to this condition.

Long period
The generator should have a long period, because the period directly influences the randomness of the generated outputs. This is crucial, for example, for simulations, since they are conducted to simulate the dynamic behavior and states of an environment, not only its cyclic stages.

Fast computation
The generator should be reasonably fast. It is always a good idea to consider the users of the generator, since they might be working in a constrained environment, limited either by hardware specifications or by expected time performance.

Security
The generator should be secure. Basically, security of the generator ensures that no one can break it in reasonable time, either by brute force or by more clever approaches. If there is no polynomial-time algorithm that, given the first ${\displaystyle m}$ bits of the output sequence, can predict the ${\displaystyle (m+1)^{\text{th}}}$ bit with probability non-negligibly greater than 0.5, we consider the generator to be secure.

Construction of simple PRG

There are many ways to construct PRGs; one of the simplest is to use a pseudorandom function and expand the key. A pseudorandom function (or PRF) is any function defined over ${\displaystyle {(K,X,Y)}}$:

${\displaystyle {F:K\times X\rightarrow Y}}$

where:

1. ${\displaystyle {K}}$ is the key space
2. ${\displaystyle {X}}$ is the input space
3. ${\displaystyle {Y}}$ is the output space

such that there exists an efficient algorithm to evaluate ${\displaystyle {F(k,x)}}$.[4]

On the other hand, a pseudorandom permutation is any function defined over ${\displaystyle {(K,X)}}$:

${\displaystyle {E:K\times X\rightarrow X}}$

such that[4]:

1. There exists an efficient deterministic algorithm to evaluate ${\displaystyle {E(k,x)}}$
2. The function ${\displaystyle {E(k,\cdot )}}$ is one-to-one
3. There exists an efficient inversion algorithm ${\displaystyle {D(k,y)}}$

Any pseudorandom permutation (or PRP) is also a pseudorandom function for which

1. ${\displaystyle {X=Y}}$
2. The function is efficiently invertible

So let ${\displaystyle {F:K\times \{0,1\}^{n}\rightarrow \{0,1\}^{n}}}$ be a PRF and define:

${\displaystyle {\begin{cases}{\text{Functions}}[X,Y]:{\text{ the set of all functions from }}X{\text{ to }}Y\\S_{F}=\{F(k,\cdot ){\text{ s.t. }}k\in K\}\subseteq {\text{Functions}}[X,Y]\end{cases}}}$

For a PRF to be suitable for use in a PRG it must be secure, and therefore computationally indistinguishable from a random function ${\displaystyle {f(\cdot )}}$. In other words, the adversary cannot distinguish whether an output came from a function in ${\displaystyle {S_{F}}}$ or from some random ${\displaystyle {f(\cdot )}}$.

If ${\displaystyle {F}}$ is a secure PRF, we can use key expansion to construct a secure PRG defined as

${\displaystyle {G:K\rightarrow \{0,1\}^{nt}}}$ where:

1. ${\displaystyle {n}}$ is the number of bits in each block
2. ${\displaystyle {t}}$ is the number of generated blocks

A great advantage of this approach is the ability to employ multiple CPU cores and take advantage of parallelization (for example, odd-numbered blocks are computed by core 1 and even-numbered blocks by core 2). The security of the PRG follows from the fact that ${\displaystyle {F(k,\cdot )}}$ is indistinguishable from a random ${\displaystyle {f(\cdot )}}$. The output is obtained by key expansion, concatenating the PRF values on successive inputs:

${\displaystyle {G(k)=F(k,0)||F(k,1)||...||F(k,t-1)}}$
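The construction above can be sketched in a few lines. The text does not name a concrete PRF, so this sketch assumes HMAC-SHA256 from the Python standard library as a stand-in for ${\displaystyle {F}}$ (giving ${\displaystyle {n=256}}$ bits per block); the function names `prf` and `prg` and the 8-byte counter encoding are illustrative choices, not part of the original definition.

```python
import hashlib
import hmac


def prf(key: bytes, x: int) -> bytes:
    """Stand-in for F(k, x): HMAC-SHA256 over an 8-byte big-endian counter."""
    return hmac.new(key, x.to_bytes(8, "big"), hashlib.sha256).digest()


def prg(key: bytes, t: int) -> bytes:
    """G(k) = F(k, 0) || F(k, 1) || ... || F(k, t-1)."""
    return b"".join(prf(key, i) for i in range(t))


key = bytes(range(16))           # example key; a real key must be secret and random
stream = prg(key, t=4)           # 4 blocks of 32 bytes each
print(len(stream))               # 128
```

Because each block depends only on the key and its own counter value, the blocks can be computed independently, which is what makes the parallel evaluation mentioned above possible.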

Linear methods

Linear Congruential Generator

Advantages:
• fast computation time
• small memory requirements
• suitable for embedded systems
• suitable for gaming consoles (high order bits)

Disadvantages:
• low quality of randomness
• points selected from ${\displaystyle n}$-dimensional space are positioned on lines
• weak against the spectral test
• big differences in the length of the period for low and high order bits

Linear Congruential Generator (or LCG) is one of the best known PRGs in the world. This generator is defined as follows

${\displaystyle X_{n+1}\equiv \left(aX_{n}+c\right)~~{\pmod {m}}}$

where:

1. ${\displaystyle X_{n}}$ is the sequence of pseudorandom values
2. ${\displaystyle X_{0},\,0\leq X_{0}<m}$ is the seed
3. ${\displaystyle a,\,0<a<m}$ is the multiplier
4. ${\displaystyle c,\,0\leq c<m}$ is the increment
5. ${\displaystyle m,\,0<m}$ is the modulus

These are integer constants that specify the generator.[5]

The range of output values is restricted, since after at most ${\displaystyle m}$ values the sequence starts to repeat itself (the same pattern recurs). For a given modulus, the most significant element in terms of the length of the period is the multiplier. For ${\displaystyle c\neq 0}$, the generator reaches the full period ${\displaystyle m}$ exactly when the following conditions hold (the Hull-Dobell theorem):

• ${\displaystyle a-1}$ is divisible by all the primes that divide the modulus
• ${\displaystyle a-1}$ is a multiple of 4 when the modulus is a multiple of 4
• the modulus and the increment have no common divisor (other than 1)

An example of output exported from the WolframAlpha application[6]:

Example:
The following example shows how to compute the first five output values for the specified LCG:

• ${\displaystyle m=7902}$
• ${\displaystyle a=4331}$
• ${\displaystyle c=3492}$
• ${\displaystyle X_{0}=1477}$

${\displaystyle X_{1}\equiv 4331\cdot X_{0}+3492\equiv 4331\cdot 1477+3492\equiv 7661{\pmod {7902}}}$
${\displaystyle X_{2}\equiv 4331\cdot X_{1}+3492\equiv 4331\cdot 7661+3492\equiv 2785{\pmod {7902}}}$
${\displaystyle X_{3}\equiv 4331\cdot X_{2}+3492\equiv 4331\cdot 2785+3492\equiv 6875{\pmod {7902}}}$
${\displaystyle X_{4}\equiv 4331\cdot X_{3}+3492\equiv 4331\cdot 6875+3492\equiv 4381{\pmod {7902}}}$
${\displaystyle X_{5}\equiv 4331\cdot X_{4}+3492\equiv 4331\cdot 4381+3492\equiv 4901{\pmod {7902}}}$
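The worked example can be reproduced with a short generator function. This is a minimal sketch; the function name `lcg` and the parameter checks are illustrative, while the constants are the ones from the example above.

```python
from itertools import islice


def lcg(m, a, c, seed):
    """Yield X_1, X_2, ... for X_{n+1} = (a * X_n + c) mod m."""
    if not (m > 0 and 0 < a < m and 0 <= c < m and 0 <= seed < m):
        raise ValueError("parameters outside the ranges given in the definition")
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


first_five = list(islice(lcg(m=7902, a=4331, c=3492, seed=1477), 5))
print(first_five)  # [7661, 2785, 6875, 4381, 4901]
```

Only the current value has to be stored between calls, which is why the generator suits the memory-constrained settings listed among its advantages.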

Multiplicative Congruential Generator

Choice of m: The most natural choice for ${\displaystyle m}$ is one that equals the capacity of a computer word:
• ${\displaystyle m=2^{b}}$ (binary machine), where ${\displaystyle b}$ is the number of bits in the computer word
• ${\displaystyle m=10^{d}}$ (decimal machine), where ${\displaystyle d}$ is the number of digits in the computer word

The Multiplicative Congruential Generator (or MCG) is a simplified version of the LCG: setting ${\displaystyle c=0}$ in the LCG gives the MCG[5]. This generator is defined as follows:

${\displaystyle X_{n+1}\equiv aX_{n}~~{\pmod {m}}}$

where:

1. ${\displaystyle X_{n}}$ is the sequence of pseudorandom values
2. ${\displaystyle X_{0},\,0\leq X_{0}<m}$ is the seed
3. ${\displaystyle a,\,0<a<m}$ is the multiplier
4. ${\displaystyle m,\,0<m}$ is the modulus

An example of output (of a 59-bit multiplicative congruential generator) exported from the WolframAlpha application[7]:

Example:
The following example shows how to compute the first five output values for the specified MCG:

• ${\displaystyle m=5037}$
• ${\displaystyle a=3414}$
• ${\displaystyle X_{0}=1739}$

${\displaystyle X_{1}\equiv 3414\cdot X_{0}\equiv 3414\cdot 1739\equiv 3360{\pmod {5037}}}$
${\displaystyle X_{2}\equiv 3414\cdot X_{1}\equiv 3414\cdot 3360\equiv 1791{\pmod {5037}}}$
${\displaystyle X_{3}\equiv 3414\cdot X_{2}\equiv 3414\cdot 1791\equiv 4593{\pmod {5037}}}$
${\displaystyle X_{4}\equiv 3414\cdot X_{3}\equiv 3414\cdot 4593\equiv 0321{\pmod {5037}}}$
${\displaystyle X_{5}\equiv 3414\cdot X_{4}\equiv 3414\cdot 0321\equiv 2865{\pmod {5037}}}$
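The same values can be checked with a few lines of code. This is a minimal sketch with an illustrative function name `mcg`; it uses the parameters listed above, with ${\displaystyle m=5037}$ as the modulus.

```python
from itertools import islice


def mcg(m, a, seed):
    """Yield X_1, X_2, ... for X_{n+1} = (a * X_n) mod m."""
    x = seed
    while True:
        x = (a * x) % m
        yield x


print(list(islice(mcg(m=5037, a=3414, seed=1739), 5)))
# [3360, 1791, 4593, 321, 2865]

# With no increment, a zero seed can never leave zero.
print(list(islice(mcg(m=5037, a=3414, seed=0), 3)))  # [0, 0, 0]
```

The second call shows a practical consequence of dropping the increment: a seed of 0 makes every output 0, so in practice a nonzero seed is chosen.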

Lagged Fibonacci Generator

Advantages:
• fast computation time
• efficient implementation
• long period
• acceptable performance on standard statistical tests for randomness
• a large number of independent streams of numbers can be generated from the same initial values

Disadvantages:
• poor behavior with R(24,55) and smaller generators
• weak against birthday spacing and generalized triple tests
• complexity of the initialization of LFGs
• outputs are very sensitive to initial conditions

Lagged Fibonacci generator (or LFG) is one of the fastest PRGs providing long period. LFG is defined as follows

${\displaystyle LFG(p,q,\bigoplus )}$

where:

1. ${\displaystyle p}$ is a lag coefficient, ${\displaystyle p>q}$
2. ${\displaystyle q}$ is a lag coefficient, ${\displaystyle p>q}$
3. ${\displaystyle \bigoplus }$ is a binary operation such as addition or subtraction modulo ${\displaystyle m}$, multiplication modulo ${\displaystyle m}$ or bitwise exclusive ${\displaystyle OR}$ (${\displaystyle XOR}$)

and the sequence is defined by

${\displaystyle x_{n}=x_{n-p}\bigoplus x_{n-q}}$

For the generator to work, at least one of the initial ${\displaystyle p}$ values must be odd when addition or subtraction is used (all of them when multiplication is used). The generator stores the last ${\displaystyle p}$ values in a lag table. To achieve the maximum period length and a fair degree of randomness, the parameters need to be set in the following way:

1. ${\displaystyle m=2^{b}}$
2. ${\displaystyle p}$ and ${\displaystyle q}$ are the exponents of a primitive polynomial ${\displaystyle x^{p}+x^{q}+1}$ over the binary field

producing the following period length for ${\displaystyle XOR}$:

${\displaystyle P=2^{p}-1}$[8]
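The recurrence and the lag table can be sketched as follows. This is an illustrative additive variant only: the function name `lagged_fibonacci`, the lag pair (10, 7), the modulus ${\displaystyle 2^{16}}$ and the seed table 1..10 are assumptions made for the example, not recommended parameters.

```python
from collections import deque
from itertools import islice


def lagged_fibonacci(seed_values, p, q, m):
    """Yield x_n = (x_{n-p} + x_{n-q}) mod m, keeping the last p values."""
    if not p > q > 0:
        raise ValueError("lags must satisfy p > q > 0")
    if len(seed_values) != p:
        raise ValueError("the lag table needs exactly p initial values")
    # table[0] holds x_{n-p}; table[p - q] holds x_{n-q}.
    table = deque(seed_values, maxlen=p)
    while True:
        x = (table[0] + table[p - q]) % m
        table.append(x)  # the oldest value drops out automatically
        yield x


gen = lagged_fibonacci(list(range(1, 11)), p=10, q=7, m=2**16)
print(list(islice(gen, 3)))  # [5, 7, 9]
```

Each step costs one addition and one table update, which is the source of the speed noted above. The price is that ${\displaystyle p}$ words of state have to be initialized before the first output.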

An example of an output[9]:

Mersenne Twister

Advantages:
• long period
• passes a number of tests for statistical randomness
• passes most of the strict tests from the TestU01 Crush randomness tests

Disadvantages:
• not suitable for cryptography (in its native form)
• after a certain number of observations one is able to predict the outputs of future iterations
• it takes quite a long time to turn a non-random initial state into sufficiently random output that passes randomness tests
• may require an LFG or LCG to do the initial seeding

The Mersenne twister (or MT) is one of the modern implementations of PRGs and it provides pseudorandom numbers of very high quality. It was developed by Makoto Matsumoto and Takuji Nishimura in 1997 with the aim of correcting some known faults of older PRGs. MT is founded on a matrix linear recurrence over a finite binary field. Nowadays, the most used versions are MT with 32-bit and 64-bit word lengths. MT is well suited to Monte Carlo simulations in many fields of science.

Since the description of the algorithm is quite involved, readers are referred to a paper from George Mason University in which the internal mechanics of MT are explained in detail.
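MT can be tried without implementing it, because Python's standard `random` module uses the Mersenne Twister as its core generator. The sketch below only demonstrates the determinism that makes MT convenient for simulations; the seed value 2024 is an arbitrary choice for the example.

```python
import random

# Two independent generator objects seeded with the same value.
gen_a = random.Random(2024)
gen_b = random.Random(2024)

seq_a = [gen_a.getrandbits(32) for _ in range(5)]
seq_b = [gen_b.getrandbits(32) for _ in range(5)]

# Same seed, same sequence: handy for repeatable simulations,
# but also the reason the native form is unsuitable for cryptography.
print(seq_a == seq_b)  # True
```

For keys and tokens, the Python documentation recommends the `secrets` module rather than `random`.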

An example of output (of a Mersenne twister shift register generator) exported from the WolframAlpha application[7]:

Nonlinear methods

Blum Blum Shub

Advantages:
• unpredictable (passes the next-bit test)
• security is based on the difficulty of factoring ${\displaystyle N}$
• suitable for key generation

Disadvantages:
• slow computation times
• uses very large numbers
• not suitable for cipher use

Blum Blum Shub (or BBS) is a PRG developed by Lenore Blum, Manuel Blum and Michael Shub in 1986. It is defined as follows:

${\displaystyle x_{n+1}=x_{n}^{2}{\bmod {M}}}$

where:

1. ${\displaystyle p}$ is a large random prime
2. ${\displaystyle q}$ is a large random prime
3. ${\displaystyle M}$ is the product of ${\displaystyle p}$ and ${\displaystyle q}$
4. ${\displaystyle x_{0}}$ is the seed; usually an integer that is co-prime to ${\displaystyle M}$[10]

BBS does not find application in simulations due to its running time. On the other hand, due to its security properties it is appropriate for use in cryptography.
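The recurrence is easy to try with toy numbers. The sketch below is illustrative only: the primes 11 and 23 and the seed 3 are far too small to give any security, the function name `blum_blum_shub` is an assumption, and taking the least significant bit of each state as the output bit is one common convention rather than something stated above. The check that both primes are congruent to 3 modulo 4 reflects the usual requirement for BBS.

```python
from itertools import islice
from math import gcd


def blum_blum_shub(p, q, seed):
    """Yield the states x_1, x_2, ... of x_{n+1} = x_n^2 mod (p * q)."""
    if p % 4 != 3 or q % 4 != 3:
        raise ValueError("p and q are usually required to be 3 mod 4")
    m = p * q
    if gcd(seed, m) != 1:
        raise ValueError("the seed should be co-prime to M")
    x = seed % m
    while True:
        x = (x * x) % m
        yield x


states = list(islice(blum_blum_shub(p=11, q=23, seed=3), 6))
print(states)                   # [9, 81, 236, 36, 31, 202]
print([x & 1 for x in states])  # [1, 1, 0, 0, 1, 0]
```

Every output bit costs a modular squaring of a number as large as ${\displaystyle M}$, which illustrates why the generator is slow compared with the linear methods.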

An example of output constructed in Maple 14 (seed=10, range=1000):

Testing

Testing PRGs is a very important means of ensuring reliability and determining possible areas of use. There are many tests for PRGs. First of all, let's take a look at the basic categories of tests.

Theoretic tests

These tests aim at a detailed study of the internal structure of a PRG, its parameters and its inner workings. They are typically used when the theoretical concepts of the PRG are publicly known. From cryptography we know that the best way to test any security concept is to assume that the adversary already knows its structure and inner workings, so that we rely only on the complexity of the mathematics backing the concept.

These tests aim at:

• finding logical gaps in proposed solutions
• finding shortcomings in the ways parameters are processed
• detailed analysis and possible test coverage (in the software development testing sense)

Examples:

• Autocorrelation test: analyzes the serial correlation of the members of the sequence.
• Spectral test: detects periodic aspects of the produced sequences. It is commonly applied to LCGs.
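A serial correlation check can be sketched in a few lines. This is one common form of the sample autocorrelation coefficient, not necessarily the exact statistic used by any particular test suite; the function name `autocorrelation` and the alternating example sequence are illustrative.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of a sequence at the given lag."""
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    mean = sum(values) / n
    denominator = sum((v - mean) ** 2 for v in values)
    if denominator == 0:
        raise ValueError("a constant sequence has no defined autocorrelation")
    numerator = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    )
    return numerator / denominator


alternating = [0, 1] * 50  # a perfectly predictable "bit stream"
print(round(autocorrelation(alternating, 1), 2))  # -0.99
print(round(autocorrelation(alternating, 2), 2))  # 0.98
```

Values near 0 at every lag are what one hopes to see from a good generator. Values near +1 or -1, as here, reveal a strong dependence between outputs that are `lag` steps apart.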

Blackbox testing

Sometimes the adversary does not have access to the PRG in use, or simply cannot determine what kind of PRG (if any) is used. In these cases the adversary takes another approach and tries to identify the PRG by supplying certain groups of parameters and then analysing and testing the resulting sequences. The analysis is based on finding similarities and patterns in the result sets. For a less secure PRG, the adversary can determine in this way alone whether they are interacting with a PRG or with a truly random function. An adversary might even be able to determine the kind of PRG, or the concrete implementation, from the randomness of the gathered outputs.

These tests aim at:

• faults in PRG inner workings
• repeating sequences and patterns in result sequences
• outputs of certain combination of entry parameters
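One simple blackbox check looks only at how often each output value occurs. The sketch below computes a chi-squared frequency statistic; it is an illustrative example of this style of testing, with the hypothetical function name `chi_squared_uniform`, and it assumes the outputs are integers in the range 0 to `bins - 1`.

```python
from collections import Counter


def chi_squared_uniform(values, bins):
    """Chi-squared statistic of observed counts against a uniform expectation."""
    if not values:
        raise ValueError("at least one observation is required")
    expected = len(values) / bins
    counts = Counter(values)
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(bins))


balanced = [0, 1, 2, 3] * 25   # every value appears equally often
stuck = [0] * 100              # the generator always returns the same value
print(chi_squared_uniform(balanced, bins=4))  # 0.0
print(chi_squared_uniform(stuck, bins=4))     # 300.0
```

A large statistic signals a biased generator. A statistic of exactly zero is also suspicious for genuinely random data, so in practice the value is compared against the chi-squared distribution with `bins - 1` degrees of freedom. A frequency check alone also says nothing about the order of the outputs, which is why it is combined with pattern-based tests.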

Self test

1. What is the plaintext for cipher text "33 63 66 15 41 79 85 15 65 58 85" with following encryption properties:

Let's assume that a secret message has been prepared by converting the letters into digits following the rule:

Letter      A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
Letter code 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26

Then the successive digits are added, modulo 10, to the successive digits of the output of an LCG with the following properties: ${\displaystyle m=8397}$, ${\displaystyle a=4381}$, ${\displaystyle c=7364}$ and ${\displaystyle x_{0}=2134}$.

For example, if ${\displaystyle x_{1}=1234}$, then the ciphertext for plaintext AB is:

Plaintext A B
Plaintext code 01 02
Key digits 12 34
Ciphertext 13 36
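The digit-wise addition in this example can be sketched as follows. The helper names `letters_to_digits` and `add_digits_mod10` are hypothetical, and the code only reproduces the AB illustration with the key digits 1234 given above; it does not generate the LCG key stream needed for the exercise itself.

```python
def letters_to_digits(text):
    """Encode A..Z as two-digit codes 01..26 and join them into one string."""
    return "".join(f"{ord(ch) - ord('A') + 1:02d}" for ch in text)


def add_digits_mod10(plain_digits, key_digits):
    """Add corresponding decimal digits modulo 10."""
    if len(plain_digits) != len(key_digits):
        raise ValueError("plaintext and key digit strings must have equal length")
    return "".join(
        str((int(p) + int(k)) % 10) for p, k in zip(plain_digits, key_digits)
    )


plain = letters_to_digits("AB")
print(plain)                            # 0102
print(add_digits_mod10(plain, "1234"))  # 1336
```

Decryption reverses the step by subtracting the key digits modulo 10.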

2. Let ${\displaystyle F:K\times X\rightarrow \{0,1\}^{128}}$ be a secure PRF. Is the following generator ${\displaystyle G}$ a secure PRF?

${\displaystyle G(k,x)={\begin{cases}0^{128}&{\text{if }}x=0\\F(k,x)&{\text{otherwise}}\end{cases}}}$

• a) No, it is easy to distinguish G from a random function
• b) Yes, an attack on G would also break F
• c) It depends on F

3. Which required property of a PRG is not fulfilled, based on the following output from a generator?

Input 235 803 186 597 931 235 274 727
Output 345812 971486 207319 349183 729460 345812 367428 319708
• a) unbiased
• b) unpredictable
• c) unreproducible
• d) none of the above

4. What is the value of ${\displaystyle X_{8}}$ for the MCG defined as follows: ${\displaystyle m=6478}$, ${\displaystyle a=5620}$ and ${\displaystyle x_{0}=3671}$?

• a) 4572
• b) 3649
• c) 2892
• d) 6217

5. Categorize the test based on its description.

Birthday spacings: Choose random points on a large interval. The spacings between the points should be asymptotically exponentially distributed. The name is based on the birthday paradox.

• a) theoretic test
• b) blackbox testing