Pseudorandom number generators

From Simulace.info

Random numbers are useful for a variety of purposes, such as generating data encryption keys, simulating and modeling complex phenomena and for selecting random samples from larger data sets. They have also been used aesthetically, for example in literature and music, and are of course ever popular for games and gambling. When discussing single numbers, a random number is one that is drawn from a set of possible values, each of which is equally probable, i.e., a uniform distribution. When discussing a sequence of random numbers, each number drawn must be statistically independent of the others.[1]

This chapter explains why it is hard, and very important (besides being interesting), to get a computer to generate proper random numbers. Most security and encryption standards in current use depend on random numbers, so choosing an insecure generator can expose many aspects of our digital lives to clever programmers, to companies interested in data analysis, or to less legal practices (identity theft, surveillance, bank fraud and so on). The privacy of, for example, banking activities, email communication and social networking depends heavily on the randomness used in the applied security mechanisms.

In cryptography, a pseudorandom generator (or PRG) is a procedure that outputs a sequence computationally indistinguishable from a truly random, uniformly distributed sequence. The prefix pseudo (from Greek ψευδής "lying, false") is used to mark something as false, fraudulent, or pretending to be something it is not. Pseudorandom generators find application in many fields besides cryptography, such as applied mathematics, physics and simulations. Simulations often require mechanisms producing sequences of random values. These procedures are non-trivial and often require significant amounts of computational time.


Definition

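A pseudorandom generator (or PRG) is a (deterministic) map $G : \{0,1\}^s \to \{0,1\}^n$, where $n \ge s$. Here $s$ is the 'seed length' and $n - s \ge 0$ is the 'stretch'. We typically think that $n \gg s$ and that $G$ is efficiently computable in some model. If $f : \{0,1\}^n \to \{0,1\}$ is any 'statistical test', we say that $G$ '$\epsilon$-fools' $f$ if

$$\left| \Pr[f(U_n) = 1] - \Pr[f(G(U_s)) = 1] \right| \le \epsilon$$

where $U_m$ denotes a uniformly random string in $\{0,1\}^m$. Here the string $U_s$ is called the 'seed'. If $\mathcal{C}$ is a class of tests, we say that $G$ '$\epsilon$-fools $\mathcal{C}$' or is an '$\epsilon$-PRG against $\mathcal{C}$' if $G$ $\epsilon$-fools $f$ for every $f \in \mathcal{C}$.[2]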


In other words, we are trying to convince any outside party (let's call them an adversary) that the sequences returned from the PRG were chosen at random. The adversary may use statistical test algorithms to compare outputs from the PRG with uniformly random sequences. A PRG ensures that both look the same to the adversary.
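To make the idea of a statistical test distinguishing a generator from uniform randomness concrete, here is a minimal sketch. The toy generator (`parity_generator`) and the test (`parity_test`) are hypothetical names invented for illustration; the generator is deliberately weak, and the advantage is computed exactly by enumerating all strings, which only works for tiny lengths.

```python
from itertools import product

def parity_generator(seed):
    """Toy map from s bits to s+1 bits: append the parity of the seed."""
    return seed + (sum(seed) % 2,)

def parity_test(bits):
    """Statistical test: output 1 iff the last bit is the parity of the rest."""
    return 1 if bits[-1] == sum(bits[:-1]) % 2 else 0

def advantage(generator, test, seed_len, out_len):
    """Exact |Pr[test(uniform)=1] - Pr[test(generator(seed))=1]| by enumeration."""
    seeds = list(product((0, 1), repeat=seed_len))
    strings = list(product((0, 1), repeat=out_len))
    p_prg = sum(test(generator(s)) for s in seeds) / len(seeds)
    p_uniform = sum(test(u) for u in strings) / len(strings)
    return abs(p_uniform - p_prg)

print(advantage(parity_generator, parity_test, seed_len=3, out_len=4))  # 0.5
```

The test accepts every output of the toy generator but only half of all uniformly random strings, so the generator is not fooling this test for any bound smaller than 0.5.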

Required properties

A reliable PRG should have all of these properties[3]:

Unbiased - Uniform distribution
By definition, an unbiased PRG shows no prejudice for or against any value. In PRG terms this means that all values are equiprobable, whatever sample size is collected. This property ensures the independence of the generator and its stability against certain types of attacks.


Unpredictable - Independence
It is impossible to predict the next output given all the previous outputs but not the internal state. If this is not guaranteed, then basically anyone can pose as the generator. This is exploited, for example, in the man-in-the-middle attack.


Unreproducible
Two instances of the same generator, given the same starting conditions, will produce different outputs. Certain types of spoofing attacks try to reproduce a subset of the real production environment in order to exploit a lack of security with respect to this condition.


Long period
The generator should have a long period, because this property directly influences the randomness of the generated outputs. This is crucial, for example, for simulations, since they are conducted to simulate the dynamic behavior and states of an environment, not only its cyclic stages.


Fast computation
The generator should be reasonably fast. It is always a good idea to take care about the users of the generator since they might be in a quite constrained environment either by hardware specifications or by expected time performance.


Security
The generator should be secure. Security of the generator ensures that no one can break it in reasonable time, either by brute force or by more clever approaches. If there is no polynomial-time algorithm that, given the first $k$ bits of the output sequence, can predict the $(k+1)$-th bit with probability significantly greater than 0.5, we consider the generator to be secure.

Construction of simple PRG

There are many ways to construct PRGs; one of the simplest is to take a pseudorandom function and expand the key. A pseudorandom function (or PRF) is any function defined over $(K, X, Y)$:

$$F : K \times X \to Y$$

where:

  1. $K$ is the key space
  2. $X$ is the input space
  3. $Y$ is the output space

such that there exists an efficient algorithm to evaluate $F(k, x)$.[4]


A pseudorandom permutation (or PRP), on the other hand, is any function defined over $(K, X)$:

$$E : K \times X \to X$$

such that[4]:

  1. There exists an efficient deterministic algorithm to evaluate $E(k, x)$
  2. The function $E(k, \cdot)$ is one-to-one
  3. There exists an efficient inversion algorithm $D(k, y)$


Any pseudorandom permutation is also a pseudorandom function with $X = Y$, given that:

  1. The function is efficiently invertible


So let $F : K \times X \to Y$ be a PRF, and define:


$$\begin{cases} \text{Functions}[X, Y]: \text{ the set of all functions from } X \text{ to } Y \\ S_F = \{ F(k, \cdot) \text{ s.t. } k \in K \} \subseteq \text{Functions}[X, Y] \end{cases}$$


For a PRF to be suitable for use in a PRG it must be secure, that is, computationally indistinguishable from a random function in $\text{Functions}[X, Y]$. This situation is depicted below:


[Figure: Secure prf.png]


The adversary cannot distinguish whether the output came from $F(k, \cdot) \in S_F$ or from some random $f \in \text{Functions}[X, Y]$.


If $F : K \times \{0,1\}^n \to \{0,1\}^n$ is a secure PRF, we can use key expansion to construct a secure PRG defined as

$$G : K \to \{0,1\}^{nt}, \qquad G(k) = F(k, 0) \,\|\, F(k, 1) \,\|\, \cdots \,\|\, F(k, t-1)$$

where:

  1. $n$ is the number of bits in each block
  2. $t$ is the number of generated blocks


The return value is obtained by key expansion. A great advantage of this approach is that the blocks are independent of each other, so multiple CPU cores can compute them in parallel (for example, odd blocks are computed by core 1 and even blocks by core 2). The security of the PRG follows from the fact that $F(k, \cdot)$ is indistinguishable from a random function $f(\cdot)$.
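The key expansion idea, concatenating the PRF evaluated at 0, 1, 2 and so on under one key, can be sketched as follows. This is only an illustration: HMAC-SHA256 from the Python standard library is used as a stand-in for the PRF (an assumption, the text does not name a concrete function), and `prf` and `expand_key` are hypothetical helper names.

```python
import hashlib
import hmac

BLOCK_BYTES = 32  # block size in bytes: the HMAC-SHA256 output length

def prf(key: bytes, x: int) -> bytes:
    """Stand-in PRF: HMAC-SHA256 of the 8-byte big-endian counter x."""
    return hmac.new(key, x.to_bytes(8, "big"), hashlib.sha256).digest()

def expand_key(key: bytes, blocks: int) -> bytes:
    """Concatenate prf(key, 0), prf(key, 1), ..., prf(key, blocks - 1)."""
    return b"".join(prf(key, i) for i in range(blocks))

stream = expand_key(b"example key, not a secret", blocks=4)
print(len(stream), stream[:8].hex())
```

Because each block depends only on the key and its own counter value, the calls to `prf` could be distributed over several workers without changing the result.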

Linear methods

Linear Congruential Generator

Advantages
  • fast computation time
  • small memory requirements
  • suitable for embedded systems
  • suitable for gaming consoles (high order bits)
Disadvantages
  • low quality of randomness
  • points selected from an $n$-dimensional space lie on a limited number of hyperplanes
  • weak against the spectral test
  • big differences in the length of the period for low and high order bits

The Linear Congruential Generator (or LCG) is one of the best known PRGs in the world. This generator is defined as follows:

$$X_{n+1} = (a X_n + c) \bmod m$$

where:

  1. $X$ is the sequence of pseudorandom values
  2. $X_0$ is the seed
  3. $a$ is the multiplier
  4. $c$ is the increment
  5. $m$ is the modulus

$a$, $c$ and $m$ are integer constants that specify the generator.[5]

The range of output values is restricted, since after at most $m$ values the sequence starts to repeat itself. The most significant element in terms of the length of the period is the multiplier. The generator attains the longest possible period, $m$, when the following conditions hold:

  • $a - 1$ is divisible by all the primes that divide the modulus $m$
  • $a - 1$ is a multiple of 4 when the modulus $m$ is a multiple of 4
  • the modulus $m$ and the increment $c$ have no common divisor (other than 1)
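The full-period conditions for an LCG (the increment is co-prime to the modulus, the multiplier minus one is divisible by every prime factor of the modulus, and by 4 whenever the modulus is) can be checked mechanically. The sketch below is illustrative only; `prime_factors`, `has_full_period` and `period` are hypothetical helper names, and the small generator with multiplier 5, increment 3 and modulus 16 is an assumed example chosen because it is easy to verify by hand.

```python
from math import gcd

def prime_factors(n):
    """Set of prime factors of n, by trial division (fine for small n)."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors

def has_full_period(a, c, m):
    """Check the three full-period conditions for X -> (a*X + c) mod m."""
    return (
        gcd(c, m) == 1
        and all((a - 1) % p == 0 for p in prime_factors(m))
        and (m % 4 != 0 or (a - 1) % 4 == 0)
    )

def period(a, c, m, seed=0):
    """Brute-force cycle length of the sequence started from seed."""
    seen, x, step = {}, seed, 0
    while x not in seen:
        seen[x] = step
        x = (a * x + c) % m
        step += 1
    return step - seen[x]

print(has_full_period(5, 3, 16), period(5, 3, 16))  # True 16
print(has_full_period(4331, 3492, 7902))            # False
```

The second call uses a multiplier of 4331, an increment of 3492 and a modulus of 7902; the increment and modulus share the divisor 18, so the conditions fail and the period is shorter than the modulus.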

An example of output exported from the WolframAlpha application[6]:

[Figure: LCG example.png]

Example:
The following example shows how to compute the first five output values of the LCG with:

  • $a = 4331$, $c = 3492$, $m = 7902$, $X_0 = 1477$.
$X_1 = (4331 \cdot X_0 + 3492) \bmod 7902 = (4331 \cdot 1477 + 3492) \bmod 7902 = 7661$
$X_2 = (4331 \cdot X_1 + 3492) \bmod 7902 = (4331 \cdot 7661 + 3492) \bmod 7902 = 2785$
$X_3 = (4331 \cdot X_2 + 3492) \bmod 7902 = (4331 \cdot 2785 + 3492) \bmod 7902 = 6875$
$X_4 = (4331 \cdot X_3 + 3492) \bmod 7902 = (4331 \cdot 6875 + 3492) \bmod 7902 = 4381$
$X_5 = (4331 \cdot X_4 + 3492) \bmod 7902 = (4331 \cdot 4381 + 3492) \bmod 7902 = 4901$
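The worked example above can be reproduced with a few lines of code. This is a minimal sketch; the function name `lcg` is an assumption made for illustration, while the constants (multiplier 4331, increment 3492, modulus 7902, seed 1477) are the ones from the example.

```python
def lcg(a, c, m, seed):
    """Yield successive values of X -> (a * X + c) mod m, starting after seed."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

gen = lcg(a=4331, c=3492, m=7902, seed=1477)
first_five = [next(gen) for _ in range(5)]
print(first_five)  # [7661, 2785, 6875, 4381, 4901]
```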

Multiplicative Congruential Generator

Choice of m:
The most natural choice for $m$ is one that equals the capacity of a computer word.

$m = 2^b$ (binary machine), where $b$ is the number of bits in the computer word.

$m = 10^d$ (decimal machine), where $d$ is the number of digits in the computer word.

The Multiplicative Congruential Generator (or MCG) is a simplified version of the LCG: setting $c = 0$ in the LCG gives the MCG[5]. This generator is defined as follows:

$$X_{n+1} = a X_n \bmod m$$

where:

  1. $X$ is the sequence of pseudorandom values
  2. $X_0$ is the seed
  3. $a$ is the multiplier
  4. $m$ is the modulus

An example of output (of a 59-bit multiplicative congruential generator) exported from the WolframAlpha application[7]:

[Figure: MCG59 example.png]

Example:
The following example shows how to compute the first five output values of the MCG with:

  • $a = 3414$, $m = 5037$, $X_0 = 1739$.
$X_1 = (X_0 \cdot 3414) \bmod 5037 = (1739 \cdot 3414) \bmod 5037 = 3360$
$X_2 = (X_1 \cdot 3414) \bmod 5037 = (3360 \cdot 3414) \bmod 5037 = 1791$
$X_3 = (X_2 \cdot 3414) \bmod 5037 = (1791 \cdot 3414) \bmod 5037 = 4593$
$X_4 = (X_3 \cdot 3414) \bmod 5037 = (4593 \cdot 3414) \bmod 5037 = 321$
$X_5 = (X_4 \cdot 3414) \bmod 5037 = (321 \cdot 3414) \bmod 5037 = 2865$
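A multiplicative congruential generator takes only a few lines of code. The sketch below assumes multiplier 3414, seed 1739 and modulus 5037; the modulus is the value for which the five outputs listed in the example above (3360, 1791, 4593, 0321, 2865) are reproduced. The function name `mcg` is hypothetical.

```python
def mcg(a, m, seed):
    """Yield successive values of X -> (a * X) mod m, starting after seed."""
    x = seed
    while True:
        x = (a * x) % m
        yield x

gen = mcg(a=3414, m=5037, seed=1739)
first_five = [next(gen) for _ in range(5)]
print(first_five)  # [3360, 1791, 4593, 321, 2865]
```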

Lagged Fibonacci Generator

Advantages
  • fast computation time
  • efficient implementation
  • long period
  • acceptable performance on standard statistical tests for randomness
  • a large number of independent streams of numbers can be generated from the same initial values
Disadvantages
  • poor behavior with R(24,55) and smaller generators
  • weak against birthday spacing and generalized triple tests
  • complexity of the initialization of LFGs
  • outputs are very sensitive to initial conditions

The Lagged Fibonacci generator (or LFG) is one of the fastest PRGs providing a long period. An LFG is specified by the following parameters:

  1. $j$, a coefficient of lag,
  2. $k$, a coefficient of lag,
  3. $\star$, a binary operation such as addition or subtraction modulo $m$, multiplication modulo $m$, or bitwise exclusive or ($\oplus$)

and the sequence is defined by

$$S_n \equiv S_{n-j} \star S_{n-k} \pmod{m}, \qquad 0 < j < k$$

For generator to work, and must be odd numbers. Generator stores used values in a lag table. In order to achieve the maximum length of the period and fair degree of randomness parameters need to be set in the following way:

  1. and have values of powers of primitive polynomials

producing the length of the period for :

[8]
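As a rough illustration of the lag table mentioned above, here is a sketch of an additive lagged Fibonacci generator. Several choices are assumptions made for the example, not taken from the text: addition is used as the binary operation, the lags are 24 and 55, the modulus is 2 to the power 32, and the table is filled from a small LCG (the helper `lcg_seed_values` is hypothetical).

```python
from collections import deque

def additive_lfg(initial, j, k, m):
    """Yield S_n = (S_{n-j} + S_{n-k}) mod m, given k initial values."""
    if len(initial) != k or not 0 < j < k:
        raise ValueError("need exactly k initial values and 0 < j < k")
    lag_table = deque(initial, maxlen=k)  # holds the k most recent values
    while True:
        value = (lag_table[-j] + lag_table[-k]) % m
        lag_table.append(value)  # the oldest entry drops out automatically
        yield value

def lcg_seed_values(count, seed=1477, a=4331, c=3492, m=7902):
    """Fill the lag table from a small LCG (illustrative constants)."""
    values, x = [], seed
    for _ in range(count):
        x = (a * x + c) % m
        values.append(x)
    return values

J, K, M = 24, 55, 2**32
initial = lcg_seed_values(K)
gen = additive_lfg(initial, J, K, M)
outputs = [next(gen) for _ in range(5)]
print(outputs)
```

The generator keeps only the last 55 values in memory, which is why the cost per output is a single addition and a table update.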


An example of an output[9]:

[Figure: LFG example.gif]

Mersenne Twister

Advantages
  • long period
  • passes a number of tests for statistical randomness
  • passes most of the strict tests from the TestU01 Crush randomness tests
Disadvantages
  • not suitable for cryptography (in its native form)
  • after a certain number of observations one is able to predict the outputs of future iterations
  • it takes quite a long time to turn a non-random initial state into sufficiently random output that passes randomness tests
  • may require LFG or LCG to do the initial seeding

The Mersenne Twister (or MT) is one of the modern PRGs and provides very high quality pseudorandom numbers. It was developed by Makoto Matsumoto and Takuji Nishimura in 1997 with the aim of rectifying known faults of older PRGs. MT is founded on a matrix linear recurrence over a finite binary field. Nowadays, the most used versions are MT with 32-bit and 64-bit word length. Thanks to its design, MT is well suited to Monte Carlo simulations in many fields of science.

Since the description of the algorithm is quite involved, readers are referred to a paper from George Mason University, where the internal mechanics of MT are explained in detail.
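Rather than implementing MT by hand, one can experiment with an existing implementation. The sketch below relies on the fact that Python's standard `random` module uses the Mersenne Twister as its core generator. The seeds, the sample count and the helper `estimate_pi` are arbitrary choices made for illustration.

```python
import random

# Two generator objects seeded identically produce the same stream.
rng_a = random.Random(2024)
rng_b = random.Random(2024)
run_a = [rng_a.getrandbits(32) for _ in range(5)]
run_b = [rng_b.getrandbits(32) for _ in range(5)]
print(run_a == run_b)  # True

def estimate_pi(rng, samples):
    """Monte Carlo estimate of pi from points in the unit square."""
    inside = sum(
        1 for _ in range(samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / samples

pi_estimate = estimate_pi(random.Random(7), 100_000)
print(pi_estimate)
```

For anything security related, the Python documentation points to the `secrets` module instead, in line with the disadvantage listed above.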

An example of output (of a Mersenne Twister shift register generator) exported from the WolframAlpha application[7]:

[Figure: MT example.png]

Nonlinear methods

Blum Blum Shub

Advantages
  • unpredictable (passes next-bit test)
  • security is based on difficulty of factoring
  • suitable for key generation
Disadvantages
  • slow computation times
  • uses very large numbers
  • not suitable for cipher use

Blum Blum Shub (or BBS) is a PRG developed by Lenore Blum, Manuel Blum and Michael Shub in 1986. It is defined as follows:

$$x_{n+1} = x_n^2 \bmod M$$

where:

  1. $p$ is a random large prime
  2. $q$ is a random large prime
  3. $M$ is the product of $p$ and $q$
  4. $x_0$ is the seed; usually an integer that is co-prime to $M$[10]


BBS does not find application in simulations due to its running time. On the other hand, due to its security properties it is appropriate for use in cryptography.
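The squaring recurrence is easy to try out with toy parameters. The following sketch uses the tiny primes 11 and 23 (both congruent to 3 modulo 4, as is usual for BBS) and seed 3; these values are assumptions chosen so the arithmetic can be checked by hand, and are far too small for any real use. Taking the least significant bit of each state as the output bit is one common choice, not something the text prescribes.

```python
from math import gcd

def blum_blum_shub(p, q, seed):
    """Yield successive states of x -> x^2 mod (p * q), starting after seed."""
    modulus = p * q
    if gcd(seed, modulus) != 1:
        raise ValueError("seed must be co-prime to the modulus")
    x = seed
    while True:
        x = pow(x, 2, modulus)
        yield x

gen = blum_blum_shub(p=11, q=23, seed=3)
states = [next(gen) for _ in range(6)]
bits = [x & 1 for x in states]  # least significant bit of each state
print(states)  # [9, 81, 236, 36, 31, 202]
print(bits)    # [1, 1, 0, 0, 1, 0]
```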

An example of output constructed in Maple 14 (seed=10, range=1000):

[Figure: BBS example.gif]

Testing

A very important concept for ensuring reliability and determining possible areas of use is the testing of PRGs. There are many tests for PRGs. First of all, let's take a look at the basic categories of tests.

Theoretic tests

These tests aim at a detailed study of the internal structure of a PRG, its parameters and inner workings. They are typically used when the theoretical concepts behind the PRG are publicly known. From cryptography we know that the best way to test any security concept is to assume that the adversary already knows the structure and inner workings of the tested concept, so that we rely only on the complexity of the mathematics backing it.


These tests aim at:

  • finding logical gaps in proposed solutions
  • shortcomings in the way parameters are processed
  • detailed analysis and possible test coverage (in the software-testing sense)


Examples:

  • Autocorrelation test
    • Analyses the serial correlation of the members of the sequence. A tutorial on this problem, with a simple applet for computing the autocorrelation test, is listed under External links.
  • Spectral test
    • This test detects periodic aspects of the produced sequences. An application of this test to LCGs is listed under External links.
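As a small illustration of the autocorrelation idea, the sketch below computes the sample autocorrelation coefficient of a sequence at a given lag. The function name `autocorrelation` and the use of the LCG constants from the earlier example as test data are assumptions made for illustration; a real test would also compare the coefficient against a significance threshold.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation coefficient of values at the given lag."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values)
    covariance = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance

# Sequence from the LCG with multiplier 4331, increment 3492, modulus 7902.
samples, x = [], 1477
for _ in range(1000):
    x = (4331 * x + 3492) % 7902
    samples.append(x / 7902)

for lag in (1, 2, 3):
    print(lag, round(autocorrelation(samples, lag), 4))

# A strictly alternating sequence is strongly (negatively) correlated at lag 1.
alternating = [0, 1] * 50
print(round(autocorrelation(alternating, 1), 2))  # -0.99
```

Values close to zero at every lag are what one hopes to see; values near 1 or -1, as in the alternating sequence, reveal serial dependence.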

Blackbox testing

Sometimes the adversary does not have access to the PRG in use, or simply cannot determine what kind of PRG (if any) is used. In these cases the adversary takes another approach and tries to identify the PRG by supplying certain groups of parameters and then analysing and testing the resulting sequences. The analysis is based on finding similarities and patterns in the result sets. With a less secure PRG, the adversary is able to determine in this way alone whether they are interacting with a PRG or with a truly random function. An adversary might even be able to determine the kind of PRG, or the concrete implementation, from the randomness of the gathered outputs.


These tests aim at:

  • faults in PRG inner workings
  • repeating sequences and patterns in result sequences
  • outputs of certain combination of entry parameters
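One very simple black-box check for repeating sequences is to look for a period in the collected outputs alone. The sketch below is a heuristic under stated assumptions: it treats the generator as an opaque list of outputs, the helper name `detect_period` is hypothetical, and it is only reliable when each output fully determines the next one (as with the LCG from the earlier example, used here as the observed source).

```python
def detect_period(outputs):
    """Smallest p for which the last 2*p outputs are two identical halves."""
    n = len(outputs)
    for p in range(1, n // 2 + 1):
        start = n - 2 * p
        if all(outputs[start + i] == outputs[start + i + p] for i in range(p)):
            return p
    return None

# Observed outputs of a generator whose internals are treated as unknown.
observed, x = [], 1477
for _ in range(2 * 7902):
    x = (4331 * x + 3492) % 7902
    observed.append(x)

detected = detect_period(observed)
print(detected)
```

A detected period far shorter than the number of samples is a strong hint that the source is a small deterministic generator rather than a truly random function.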

References

External links

  1. WolframAlpha - Linear Congruential Generator
  2. WolframAlpha - Mersenne Twister simulator
  3. Mersenne Twister – A Pseudo Random Number Generator and its Variants
  4. Autocorrelation test
  5. Spectral test
  6. Web dedicated to the 'randomness'

Self test

1. What is the plaintext for the ciphertext "33 63 66 15 41 79 85 15 65 58 85" with the following encryption properties:

Let's assume that a secret message has been prepared by converting the letters into digits following the rule:

Letter       A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
Letter code  01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26

Then the successive digits are added, modulo 10 to the successive digits of the output of a LCG with following properties: , , and .

If , then cipher for plaintext AB is:

Plaintext A B
Plaintext code 01 02
Key digits 12 34
Ciphertext 13 36

2. Let $F : K \times X \to \{0,1\}^{128}$ be a secure PRF. Is the following $G$ a secure PRF?

$$G(k, x) = \begin{cases} 0^{128} & \text{if } x = 0 \\ F(k, x) & \text{otherwise} \end{cases}$$

  • a) No, it is easy to distinguish G from a random function
  • b) Yes, an attack on G would also break F
  • c) It depends on F

3. Which required property of PRG is not fulfilled based on the following output from a generator:

Input   235     803     186     597     931     235     274     727
Output  345812  971486  207319  349183  729460  345812  367428  319708
  • a) unbiased
  • b) unpredictable
  • c) unreproducible
  • d) none of the above

4. What is value of for MCG defined as follows: , , and .

  • a) 4572
  • b) 3649
  • c) 2892
  • d) 6217

5. Categorize test based on its description.

Birthday spacings: Choose random points on a large interval. The spacings between the points should be asymptotically exponentially distributed. The name is based on the birthday paradox.

  • a) theoretic test
  • b) blackbox testing

Solution

  • 1. Answer: LCGINACTION
    • First determine the first 6 outputs of the generator. Recover the letter codes by reversing the modulo 10 addition, then translate them back into letters.
  • 2. Answer: a
    • When the adversary queries G at x = 0 they always get $0^{128}$, so they know they are interacting with the PRF and not with a truly random function.
  • 3. Answer: c
    • The generator produced the same output for two identical inputs.
  • 4. Answer: c
  • 5. Answer: a