Pseudorandom number generators

Random numbers are useful for a variety of purposes, such as generating data encryption keys, simulating and modeling complex phenomena and for selecting random samples from larger data sets. They have also been used aesthetically, for example in literature and music, and are of course ever popular for games and gambling. When discussing single numbers, a random number is one that is drawn from a set of possible values, each of which is equally probable, i.e., a uniform distribution. When discussing a sequence of random numbers, each number drawn must be statistically independent of the others.

This chapter explains why it is both hard and important (as well as interesting) to get a computer to generate proper random numbers. Since most security and encryption standards in current use depend on random numbers, choosing an insecure algorithm exposes many aspects of our digital lives to clever programmers and to companies interested in data analysis or in less legal practices (identity theft, surveillance, bank fraud and so on). The privacy of, for example, banking activities, email communication and social networking depends heavily on the randomness used in the underlying security mechanisms.

In cryptography, a pseudorandom generator (or PRG) is a procedure that outputs a sequence computationally indistinguishable from a truly random, uniformly distributed sequence. The prefix pseudo (from Greek ψευδής "lying, false") is used to mark something as false, fraudulent, or pretending to be something it is not. Besides cryptography, pseudorandom generators find application in many fields such as applied mathematics, physics and simulations. Simulations often require mechanisms producing sequences of random values. These procedures are certainly non-trivial and often require significant amounts of computational time.

Definition
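A pseudorandom generator (or PRG) is a (deterministic) map $$G : \{0,1\}^\ell \rightarrow \{0,1\}^n$$, where $$n \geq \ell$$. Here $$\ell$$ is the 'seed length' and $$n - \ell \geq 0$$ is the 'stretch'. We typically think that $$n \gg \ell$$ and that $$G$$ is efficiently computable in some model. If $$f : \{0,1\}^n \rightarrow \{0,1\}$$ is any 'statistical test', we say that G '$$\epsilon$$-fools' $$f$$ if

$$\left| \Pr[f(G(U_\ell)) = 1] - \Pr[f(U_n) = 1] \right| \leq \epsilon$$

where $${U_m}$$ denotes a uniformly random string in $$\{0,1\}^m$$. Here the string $$U_\ell$$ is called the 'seed'. If $$\mathcal{C}$$ is a class of tests, we say that G '$$\epsilon$$-fools $$\mathcal{C}$$' or is an '$$\epsilon$$-PRG against $$\mathcal{C}$$' if $$G$$ $$\epsilon$$-fools $$f$$ for every $$f \in \mathcal{C}$$.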

In other words, we are trying to convince any outside party (let's call them an adversary) that the sequences returned from the PRG were chosen at random. The adversary may use statistical tests to compare outputs of the PRG with uniformly random sequences. A good PRG ensures that both look the same to the adversary.
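The idea of a statistical test telling a generator apart from true randomness can be illustrated with a deliberately bad generator. The sketch below is only a toy: the names `weak_generator` and `halves_equal_test` and the lengths 4 and 8 are assumptions made for this illustration. It computes, by exhaustive enumeration, the difference between the probability that the test accepts the generator's output and the probability that it accepts a uniformly random string.

```python
from fractions import Fraction
from itertools import product

SEED_LEN = 4  # toy seed length (assumption for this example)
OUT_LEN = 8   # toy output length (assumption for this example)


def weak_generator(seed):
    """Toy 'PRG' that simply repeats its seed; deliberately insecure."""
    return seed + seed


def halves_equal_test(bits):
    """Statistical test: returns 1 if both halves of the string match."""
    half = len(bits) // 2
    return 1 if bits[:half] == bits[half:] else 0


def acceptance_probability(strings, test):
    """Exact probability that the test accepts a string drawn uniformly
    from the given collection."""
    strings = list(strings)
    return Fraction(sum(test(s) for s in strings), len(strings))


def advantage(generator, test, seed_len, out_len):
    """Gap between acceptance on generator outputs and on uniform strings."""
    seeds = product((0, 1), repeat=seed_len)
    uniform = product((0, 1), repeat=out_len)
    p_gen = acceptance_probability((generator(s) for s in seeds), test)
    p_uni = acceptance_probability(uniform, test)
    return abs(p_gen - p_uni)


print(advantage(weak_generator, halves_equal_test, SEED_LEN, OUT_LEN))  # 15/16
```

The test accepts every output of the toy generator but only 16 of the 256 uniform strings, so the gap is 15/16. A good PRG would keep this gap small for every efficient test.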

Required properties
A reliable PRG should have all of these properties:

Unbiased - Uniform distribution: By definition of the word unbiased, this property states that the PRG shows no prejudice for or against any value. In PRG language this means that all values are equiprobable, whatever sample size is collected. This property ensures the independence of the generator and its stability against certain types of attacks.

Unpredictable - Independence: It must be infeasible to predict the next output given all the previous outputs but not the internal state. If this is not guaranteed, anyone who has observed enough output can pose as the generator. This is exploited, for example, in man-in-the-middle attacks.

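Unreproducible: Two instances of the same generator, given the same starting conditions, will produce different outputs. Because a PRG is a deterministic map, in practice this can only be achieved by never reusing a seed and by drawing each seed from a fresh, unpredictable source. Certain types of spoofing attacks try to reproduce a subset of the real production environment in order to exploit a lack of security with respect to this condition.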

Long period: The generator should have a long period, because this property directly influences the randomness of the generated outputs. This is crucial, for example, for simulations, since they are meant to reproduce the dynamic behavior and states of an environment, not only its cyclic stages.

Fast computation: The generator should be reasonably fast. Its users may work in a constrained environment, limited either by hardware specifications or by the expected time performance.

Security: The generator should be secure. Security means that no one can break the generator in reasonable time, either by brute force or by a more clever approach. If there is no polynomial-time algorithm that, given the first $$m$$ bits of the output sequence, can predict the $$(m + 1)^{th}$$ bit with probability non-negligibly greater than 0.5, we consider the generator to be secure.

Construction of simple PRG
There are many ways to construct PRGs, and one of the simplest is to use a pseudorandom function and expand the key. A pseudorandom function (or PRF) is any function defined over $$(K, X, Y)$$:

$$F: K \times X \rightarrow Y$$

where:


 * 1) $$K$$ is the key space
 * 2) $$X$$ is the input space
 * 3) $$Y$$ is the output space

such that there exists an efficient algorithm to evaluate $$F(k, x)$$.

On the other hand, a pseudorandom permutation is any function defined over $$(K, X)$$:

$$E: K \times X \rightarrow X$$

such that:


 * 1) There exists an efficient deterministic algorithm to evaluate $$E(k, x)$$
 * 2) The function $$E(k, \cdot)$$ is one-to-one
 * 3) There exists an efficient inversion algorithm $$D(k, y)$$

Any pseudorandom permutation (or PRP) is also a pseudorandom function in which
 * 1) $$X = Y$$
 * 2) Function is efficiently invertible

So let $$F: K \times X \rightarrow Y$$ be a PRF and let

$$\begin{cases} Functions[X, Y]: \text{ all functions from } X \text{ to } Y\\ S_F = \{F(k, \cdot) \text{ s.t. } k \in K\} \subseteq Functions[X, Y] \end{cases}$$

For a PRF to be suitable for use in a PRG it must be secure, that is, computationally indistinguishable from a random function $$f(\cdot)$$ chosen uniformly from $$Functions[X, Y]$$. This situation is depicted below:



The adversary cannot distinguish whether the output came from $$S_F$$ or from some random $$f(\cdot)$$.

If $$F$$ is a secure PRF, we can use key expansion to construct a secure PRG defined as

$$G: K \rightarrow \{0, 1\}^{nt}$$ where:
 * 1) $$n$$ is the number of bits in each block
 * 2) $$t$$ is the number of generated blocks

The return value is obtained by key expansion, that is, by evaluating $$F$$ under the key $$k$$ on the inputs $$0, 1, \dots, t-1$$ and concatenating the results. A great advantage of this approach is that the blocks do not depend on each other, so the computation can be spread over multiple CPU cores (for example, odd blocks are computed by core 1 and even blocks by core 2). The security of the PRG follows from the fact that $$F(k, \cdot)$$ is indistinguishable from a random $$f(\cdot)$$.

$${{G(k) = F(k, 0) || F(k, 1) || ... || F(k, t-1)}}$$
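The construction can be sketched in a few lines. The example below is a minimal illustration, not a vetted implementation: it assumes HMAC-SHA256 as a stand-in for the PRF $$F$$ (so $$n = 256$$ bits), and the function names `prf` and `prg` and the 8-byte counter encoding are choices made only for this sketch.

```python
import hashlib
import hmac

BLOCK_BYTES = 32  # n = 256 bits per block when HMAC-SHA256 plays the PRF


def prf(key: bytes, counter: int) -> bytes:
    """F(k, x): HMAC-SHA256 of the counter encoded as 8 big-endian bytes."""
    return hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()


def prg(key: bytes, blocks: int) -> bytes:
    """G(k) = F(k, 0) || F(k, 1) || ... || F(k, t-1) for t = blocks."""
    return b"".join(prf(key, i) for i in range(blocks))


stream = prg(b"example key", 4)
print(len(stream))  # 128
```

Each block depends only on the key and its own counter, which is why the blocks could be computed on separate cores and concatenated afterwards.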

Linear Congruential Generator
The Linear Congruential Generator (or LCG) is one of the best-known PRGs. It is defined as follows:


 * $$X_{n+1} \equiv \left( a X_n + c \right)\pmod{m}$$

where:
 * 1) $$X_{n}$$ is the sequence of pseudorandom values
 * 2) $$ X_0,\,0 \le X_0 < m$$ is the seed
 * 3) $$ a,\,0 < a < m$$ is the multiplier
 * 4) $$ c,\,0 \le c < m$$ is the increment
 * 5) $$ m,\, 0<m $$ is the modulus

All of these are integer constants that specify the generator.

The range of output values is restricted: the sequence is periodic, and after at most $$m$$ values it starts to repeat itself. For a fixed modulus, the parameter with the greatest influence on the length of the period is the multiplier. With $$c \neq 0$$, the generator reaches the full period $$m$$ when:


 * $$a - 1$$ is divisible by all the primes that divide the modulus $$m$$
 * $$a - 1$$ is a multiple of 4 when the modulus $$m$$ is a multiple of 4
 * the modulus $$m$$ and the increment $$c$$ have no common divisor (other than 1)
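The full-period conditions can be checked mechanically. The sketch below uses them in their standard (Hull-Dobell) form, in which the primes and the factor 4 refer to the modulus $$m$$, and compares the result with a brute-force cycle count. The helper names and the small parameters $$m = 16$$, $$a = 5$$, $$c = 3$$ are assumptions chosen for illustration.

```python
from math import gcd


def prime_factors(n):
    """Return the set of prime factors of n, found by trial division."""
    factors = set()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors


def has_full_period(m, a, c):
    """Check the three full-period conditions for an LCG with c != 0."""
    return (
        all((a - 1) % p == 0 for p in prime_factors(m))
        and (m % 4 != 0 or (a - 1) % 4 == 0)
        and gcd(c, m) == 1
    )


def cycle_length(m, a, c, seed):
    """Length of the cycle the sequence eventually enters, by brute force."""
    seen = {}
    x, step = seed, 0
    while x not in seen:
        seen[x] = step
        x = (a * x + c) % m
        step += 1
    return step - seen[x]


print(has_full_period(16, 5, 3), cycle_length(16, 5, 3, seed=7))  # True 16
```

The brute-force count is only practical for small moduli; for realistic parameters one would rely on the conditions themselves.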

An example of output exported from the WolframAlpha application:



Example: compute the first five output values of the LCG specified by:


 * $$m = 7902$$
 * $$a = 4331$$
 * $$c = 3492$$
 * $$x_0 = 1477$$.
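The five values can be obtained by applying the recurrence $$X_{n+1} = (a X_n + c) \bmod m$$ five times. A minimal sketch (the generator function `lcg` is a name chosen for this example):

```python
def lcg(m, a, c, seed):
    """Yield X_1, X_2, ... of the recurrence X_{n+1} = (a * X_n + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


gen = lcg(m=7902, a=4331, c=3492, seed=1477)
first_five = [next(gen) for _ in range(5)]
print(first_five)  # [7661, 2785, 6875, 4381, 4901]
```

For instance, the first value is $$(4331 \cdot 1477 + 3492) \bmod 7902 = 6400379 \bmod 7902 = 7661$$.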

Multiplicative Congruential Generator
The Multiplicative Congruential Generator (or MCG) is a simplified version of the LCG: setting $$c = 0$$ in the LCG gives the MCG. This generator is defined as follows:

$$X_{n+1} \equiv a X_n \pmod{m}$$

where:
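 * 1) $$X_{n}$$ is the sequence of pseudorandom values
 * 2) $$ X_0,\,0 < X_0 < m$$ is the seed (non-zero, since $$X_0 = 0$$ would produce only zeros)
 * 3) $$ a,\,0 < a < m$$ is the multiplier
 * 4) $$ m,\, 0<m$$ is the modulus

Lagged Fibonacci Generator
The Lagged Fibonacci Generator (or LFG) combines two earlier values of the sequence. Its parameters are:
 * 1) $$p$$ is coefficient of lag, $$p > q$$
 * 2) $$q$$ is coefficient of lag, $$p > q$$
 * 3) $$\bigoplus$$ is a binary operation such as addition or subtraction modulo $$m$$, multiplication modulo $$m$$, or bitwise exclusive $$OR$$ ($$XOR$$)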

and the sequence is defined by

$$x_n = x_{n-p} \bigoplus x_{n-q}$$

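The generator keeps the last $$p$$ generated values in a lag table, which has to be filled with $$p$$ initial values before the first output. For the additive and subtractive variants at least one of these initial values must be odd (for the multiplicative variant, all of them). In order to achieve the maximum length of the period and a fair degree of randomness, the parameters need to be set in the following way:
 * 1) $$m = 2^b$$
 * 2) $$p$$ and $$q$$ are the exponents of a primitive polynomial $$x^p + x^q + 1$$ over the binary field

which gives the following period length for $$XOR$$: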

$$P = 2^p - 1$$
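A lag table is easy to express with a fixed-length queue. The following sketch implements the additive variant $$x_n = (x_{n-p} + x_{n-q}) \bmod m$$. The lag pair $$(p, q) = (55, 24)$$ is a classic textbook choice, and both it and the seeding with the integers 1 to 55 are assumptions made for this example rather than recommendations.

```python
from collections import deque


def lagged_fibonacci(seed_values, p=55, q=24, m=2**32):
    """Additive LFG: x_n = (x_{n-p} + x_{n-q}) mod m, with p > q."""
    if len(seed_values) != p:
        raise ValueError("need exactly p seed values")
    # The lag table holds x_{n-p}, ..., x_{n-1}, oldest value first.
    table = deque((v % m for v in seed_values), maxlen=p)
    while True:
        # table[0] is x_{n-p}; table[p - q] is x_{n-q}.
        x = (table[0] + table[p - q]) % m
        table.append(x)  # maxlen=p drops the oldest value automatically
        yield x


seeds = list(range(1, 56))  # toy seeding; contains odd values
gen = lagged_fibonacci(seeds)
print([next(gen) for _ in range(3)])
```

In practice the lag table would be filled by another generator rather than by consecutive integers.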

An example of an output :



Mersenne Twister
Mersenne Twister (or MT) is one of the modern PRGs and it provides pseudorandom numbers of very high quality. It was developed by Makoto Matsumoto and Takuji Nishimura in 1997 with the aim of remedying some known faults of older PRGs. MT is based on a matrix linear recurrence over a finite binary field. Nowadays, the most commonly used versions are MT with 32-bit and 64-bit word length. Thanks to the optimizations applied to it, MT is well suited to Monte Carlo simulations in many fields of science.

Since the description of the algorithm is quite long, readers are referred to the paper from George Mason University listed under the external links, where the internal mechanics of MT are explained in detail.
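Many languages ship an implementation of MT. As a small illustration, Python's standard `random` module uses the Mersenne Twister as its core generator, so the deterministic nature of the generator can be observed by seeding two instances identically. The seed value 2024 is arbitrary.

```python
import random

# Two independent generator objects seeded with the same value.
rng_a = random.Random(2024)
rng_b = random.Random(2024)

sample_a = [rng_a.getrandbits(32) for _ in range(5)]
sample_b = [rng_b.getrandbits(32) for _ in range(5)]

print(sample_a == sample_b)  # True
```

This reproducibility is convenient for simulations, but it also shows why such a generator is not meant for cryptographic use: the output is fully determined by the seed.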

An example of output (of the Mersenne Twister shift register generator) exported from the WolframAlpha application:



Blum Blum Shub
Blum Blum Shub (or BBS) is a PRG developed by Lenore Blum, Manuel Blum and Michael Shub in 1986. It is defined as follows:

$$x_{n+1} = x_n^2 \bmod M$$

where:
 * 1) $$p$$ is a random large prime
 * 2) $$q$$ is a random large prime
 * 3) $$M$$ is the product of $$p$$ and $$q$$
 * 4) $$x_0$$ is the seed; usually an integer that is co-prime to $$M$$

BBS does not find application in simulations due to its running time. On the other hand, due to its security properties it is appropriate for use in cryptography.
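The recurrence can be sketched with toy parameters. The example below makes two assumptions that are not stated above: both primes are taken congruent to 3 modulo 4, as in the standard description of BBS, and one bit (the least significant bit of each state) is emitted per step. The primes 11 and 23 are far too small for real use and serve only to keep the numbers readable.

```python
from math import gcd


def blum_blum_shub(p, q, seed, count):
    """Return `count` output bits of x_{n+1} = x_n^2 mod M, where M = p * q."""
    if p % 4 != 3 or q % 4 != 3:
        raise ValueError("p and q should both be congruent to 3 mod 4")
    modulus = p * q
    if gcd(seed, modulus) != 1:
        raise ValueError("seed must be co-prime to M")
    x = seed
    bits = []
    for _ in range(count):
        x = pow(x, 2, modulus)  # square the state modulo M
        bits.append(x & 1)      # emit the least significant bit
    return bits


print(blum_blum_shub(p=11, q=23, seed=3, count=5))  # [1, 1, 0, 0, 1]
```

With $$M = 253$$ and seed 3 the states are 9, 81, 236, 36 and 31, whose parities give the bits shown.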

An example of output constructed in Maple 14 (seed=10, range=1000):



Testing
Testing PRGs is a very important step in ensuring their reliability and determining their possible areas of use. There are many tests for PRGs. First of all, let's take a look at the basic categories of tests.

Theoretic tests
These tests aim at a detailed study of the internal structure of a PRG, its parameters and inner workings. They are typically used when the theoretical concepts behind the PRG are publicly known. From cryptography we know that the best way to test any security concept is to assume that the adversary already knows the structure and inner workings of the tested concept, so that we rely only on the complexity of the mathematics backing it.

These tests aim at:
 * finding logical gaps in proposed solutions
 * finding shortcomings in the way parameters are processed
 * detailed analysis and possible test coverage (in the software development sense of testing)

Examples:
 * Autocorrelation test
 * Analyses the serial correlation between members of the sequence. A tutorial on how to approach this problem, with a simple applet for computing the autocorrelation test, is listed under the external links.
 * Spectral test
 * This test detects periodic aspects of the produced sequences. An application of this test to LCGs is listed under the external links.
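The core computation behind an autocorrelation test is short. The sketch below computes the sample autocorrelation coefficient of a sequence at a given lag; the function name and the alternating example sequence are assumptions for illustration, and a complete test would also compare the coefficient against a significance threshold.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation coefficient of `values` at the given lag."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values)
    covariance = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


# A strictly alternating sequence is as far from independent as possible.
alternating = [0, 1] * 50
print(autocorrelation(alternating, lag=1))  # close to -1
print(autocorrelation(alternating, lag=2))  # close to +1
```

For a sequence of independent values the coefficient should stay close to 0 at every non-zero lag.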

Blackbox testing
Sometimes the adversary does not have access to the PRG in use, or simply cannot determine what kind of PRG (if any) is used. In these cases the adversary takes another approach and tries to identify the PRG by supplying certain groups of parameters and analysing the resulting sequences. The analysis is based on finding similarities and patterns in the result sets. In the case of a less secure PRG, the adversary is able to determine by this alone whether they are interacting with a PRG or a truly random function. An adversary might even be able to determine the kind of PRG, or the concrete implementation, based on the randomness of the gathered outputs.

These tests aim at:
 * faults in PRG inner workings
 * repeating sequences and patterns in result sequences
 * outputs of certain combination of entry parameters
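One simple way to look for patterns in collected outputs without knowing the generator is a frequency check. The sketch below computes the chi-square statistic of observed bucket counts against a uniform distribution; the function name and the two example inputs are assumptions for illustration, and this is only one of many possible black-box checks.

```python
def chi_square_uniform(samples, buckets):
    """Chi-square statistic of samples in range(buckets) against uniform."""
    counts = [0] * buckets
    for s in samples:
        counts[s] += 1
    expected = len(samples) / buckets
    return sum((c - expected) ** 2 / expected for c in counts)


balanced = list(range(10)) * 10  # every value appears exactly 10 times
biased = [0] * 100               # only one value ever appears

print(chi_square_uniform(balanced, 10))  # 0.0
print(chi_square_uniform(biased, 10))    # 900.0
```

With 10 buckets there are 9 degrees of freedom, and a statistic above roughly 16.9 would be rejected at the 5% level. A suspiciously small value, such as the 0.0 above, is also a sign that the sequence is not random.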

External links

 * 1) WolframAlpha - Linear Congruential Generator
 * 2) WolframAlpha - Mersenne Twister simulator
 * 3) Mersenne Twister – A Pseudo Random Number Generator and its Variants
 * 4) Autocorrelation test
 * 5) Spectral test
 * 6) Web dedicated to the 'randomness'

Self test
1. What is the plaintext for the ciphertext "33 63 66 15 41 79 85 15 65 58 85" under the following encryption scheme?

Let's assume that a secret message has been prepared by converting the letters into digits following the rule:

Then the successive digits are added, modulo 10, to the successive digits of the output of an LCG with the following properties: $$m = 8397$$, $$a = 4381$$, $$c = 7364$$ and $$x_0 = 2134$$.

If $$x_1 = 1234$$, then the cipher for plaintext AB is:

2. Let $$F:K \times X \rightarrow \{0, 1\}^{128}$$ be a secure PRF. Is the following generator G a secure PRF?

$$G(k, x) = \begin{cases} 0^{128} & \text{if } x = 0\\ F(k, x) & \text{otherwise} \end{cases}$$


 * a) No, it is easy to distinguish G from a random function
 * b) Yes, an attack on G would also break F
 * c) It depends on F

3. Which required property of PRG is not fulfilled based on the following output from a generator:


 * a) unbiased
 * b) unpredictable
 * c) unreproducible
 * d) none of the above

4. What is the value of $$X_8$$ for the MCG defined as follows: $$m = 6478$$, $$a = 5620$$, and $$x_0 = 3671$$?


 * a) 4572
 * b) 3649
 * c) 2892
 * d) 6217

5. Categorize test based on its description.

Birthday spacings: Choose random points on a large interval. The spacings between the points should be asymptotically exponentially distributed. The name is based on the birthday paradox.


 * a) theoretic test
 * b) blackbox testing

Solution

 * 1. Answer: LCGINACTION
 * First determine the first 6 outputs of the generator. Recover the letter codes by reversing the modulo 10 addition and translate the ciphertext.
 * 2. Answer: a
 * When the adversary queries G at x = 0 they always get $$0^{128}$$, so they know they are interacting with G and not with a truly random function.
 * 3. Answer: c
 * The generator produced the same output for two identical inputs
 * 4. Answer: c
 * 5. Answer: a