Simulation

Generating Uniform Random Numbers

Introduction

Uniform(0,1) random numbers are the key to all random variate generation and simulation. As we have discussed previously, we transform uniforms to get other random variables - exponential, normal, and so on.

The goal is to produce an algorithm that can generate a sequence of pseudo-random numbers (PRNs) R_1, R_2, ... that "appear" to be i.i.d Uniform(0,1). Of course, these numbers are not truly random: a deterministic algorithm produces them, so they are neither independent of one another nor random. However, they appear random to humans, and they have all the statistical properties of i.i.d Uniform(0,1) random numbers, so we are justified in using them.

Such an algorithm has a few desirable properties. As we said, it must produce output that appears to be i.i.d Unif(0,1). It must also execute quickly, because we might want to generate billions of PRNs. Moreover, it must be able to reproduce any sequence it generates. This property allows us to replicate experiments and simulation runs that rely on streams of random numbers.

In the next few lessons, we will look at different classes of Unif(0,1) generators. We'll start with some lousy generators, such as

  • the output of a random device
  • a table of random numbers
  • the mid-square method
  • the Fibonacci method

We will then move on to the more common methods used in practice: the linear congruential generators and the Tausworthe generator. Finally, we'll talk about hybrid models, which most people use today.

Some Lousy Generators

In this lesson, we will spend some time talking about poor generators. Obviously, we won't use these generators in practice, but we can study them briefly both for a historical perspective and to understand why they are so bad.

Random Devices

We'll start our discussion by looking at random devices. A coin toss is a random device that can help us generate values of zero or one randomly. Similarly, we can roll a die to generate a random number between one and six. More sophisticated random devices include Geiger counters and atomic clocks.

Naturally, random devices have strong randomness properties; however, we cannot reproduce their results easily. For example, suppose we ran an experiment based on one billion die rolls. We'd have to store the results of those rolls somewhere if we ever wanted to replicate the experiment. This storage requirement makes random devices unwieldy in practice.

Random Number Tables

We can also use random number tables. These tables were once ubiquitous, although they have fallen out of use since the 1950s. One famous table, published by the RAND Corporation, contains one million random digits and one hundred thousand normal random variates. Basically, folks used a random device to generate random numbers, and then they wrote them down in a book.

How do we use this book? Well, we can simply flip to a random page, put our finger down in the middle of the page, and start reading off the digits.

While this approach has strong reproducibility and randomness properties, it hasn't scaled to meet today's needs. More specifically, a finite sequence of one million random numbers is just not sufficient for most applications: we can consume a billion random numbers in seconds on modern hardware. Additionally, once the digits were tabulated, they were no longer random.

Mid-Square Method

The mid-square method was created by the famous mathematician John von Neumann. The main idea is that we take an integer, square it, and then use the middle digits of the square as our next random integer, repeating the process as many times as we need. To generate the Uniform(0,1) random variate, we divide each generated integer by the appropriate power of ten.

Let's look at an example. Suppose we have the seed X_0 = 6632. If we square X_0, we get 6632² = 43983424, and we set X_1 equal to the middle four digits: X_1 = 9834. We can generate X_2 in a similar fashion: X_1² = 9834² = 96707556, so X_2 = 7075. Now, since we are taking the middle four digits, we generate the corresponding Uniform(0,1) random variate, R_i, by dividing X_i by 10000. For example, R_1 = X_1/10000 = 9834/10000 = 0.9834, and R_2 = 0.7075 by similar arithmetic.
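
To make the procedure concrete, here is a minimal Python sketch of the four-digit mid-square method described above (the function name and zero-padding convention are our own assumptions):

```python
def mid_square(seed, n):
    """Von Neumann's mid-square method for a four-digit seed.

    Each square is zero-padded to eight digits so that the
    "middle four" digits are always well defined.
    """
    x = seed
    for _ in range(n):
        square = str(x * x).zfill(8)  # e.g., 6632^2 -> "43983424"
        x = int(square[2:6])          # middle four digits -> 9834
        yield x / 10000               # scale to a Unif(0,1) variate

# Reproduces the example: 0.9834, 0.7075, ...
print(list(mid_square(6632, 5)))
```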

Unfortunately, it turns out that there is a positive serial correlation in the R_i's. For example, a very small X_{i+1} often follows a very small X_i, more so than we would expect by chance. More tragically, the sequence occasionally degenerates. For example, if X_i = 0003, then the sequence produces only zeros from that point on.

Fibonacci and Additive Congruential Generators

The Fibonacci and additive congruential generators, not to be confused with the linear congruential generators, are also no good.

Here is the expression for the Fibonacci generator, which got its name from the famed sequence in which the previous two numbers are added to produce the current number.

\begin{alignedat}{1} X_i &= (X_{i-1} + X_{i-2}) \bmod m, \quad i = 1,2,\ldots \\ R_i &= X_i / m \end{alignedat}

For this sequence, X_{-1} and X_0 are seeds, and m is the modulus. Remember that a = b mod m if and only if a is the remainder of b/m. For example, 6 = 13 mod 7.

Like the mid-square method, this generator has a problem where small numbers frequently follow small numbers. Worse, it turns out that one-third of the possible orderings can never occur: it is impossible to produce an X_{i+1} such that X_{i-1} < X_{i+1} < X_i or X_i < X_{i+1} < X_{i-1}. In other words, the new value can never fall strictly between the two previous values, even though those configurations should occur one-third of the time. This generator does not have strong randomness properties.
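
A minimal Python sketch of this generator (the seeds and modulus in the demo call are arbitrary choices for illustration):

```python
def fibonacci_generator(x_prev, x0, m, n):
    """Additive (Fibonacci) generator: X_i = (X_{i-1} + X_{i-2}) mod m."""
    a, b = x_prev, x0          # the seeds X_{-1} and X_0
    for _ in range(n):
        a, b = b, (a + b) % m  # slide the window and add mod m
        yield b / m            # R_i = X_i / m

# Arbitrary seeds and a deliberately tiny modulus
print(list(fibonacci_generator(3, 5, 8, 6)))
```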

Linear Congruential Generator

In this lesson, we will ditch the lousy pseudo-random number generators and turn to a much better alternative: the linear congruential generator (LCG). Variations of LCGs are the most commonly used generators today in practice.

LCGs

Here is the general form for an LCG. Given a modulus m and constants a and c, we generate a pseudo-random integer, X_i, and its corresponding Uniform(0,1) counterpart, R_i, using the following equations:

\begin{alignedat}{1} X_i &= (aX_{i-1}+c) \bmod m, \quad \text{where } X_0 \text{ is the seed} \\ R_i &= X_i / m, \quad i = 1,2,\ldots \end{alignedat}

We choose a, c, and m carefully to ensure both that the R_i's appear to be i.i.d uniforms and that the generator has a long period or cycle length: the number of steps until the LCG starts to repeat itself.

Multiplicative generators are LCGs where c = 0.
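
Here is a minimal Python sketch of the general LCG; the demo call uses the a = 5, c = 3, m = 8 generator from the trivial example below:

```python
def lcg(seed, a, c, m, n):
    """Linear congruential generator: X_i = (a*X_{i-1} + c) mod m."""
    x = seed
    for _ in range(n):
        x = (a * x + c) % m
        yield x / m  # R_i = X_i / m

# Matches the R_i column of the table in the next subsection
print(list(lcg(0, 5, 3, 8, 10)))
```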

Trivial Example

Let's look at an example. Consider the following LCG:

X_i = (5X_{i-1} + 3) \bmod 8

Starting with the seed X_0 = 0, we can generate the following sequence of values:

\begin{array}{c|cccccccccc} i & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ \hline X_i & 0 & 3 & 2 & 5 & 4 & 7 & 6 & 1 & 0 & 3 \\ \hline R_i & 0 & 3/8 & 2/8 & 5/8 & 4/8 & 7/8 & 6/8 & 1/8 & 0 & 3/8 \end{array}

Notice that this sequence starts repeating after eight observations: X_8 = X_0 = 0. This generator is a full-period generator, since it has a cycle length equal to m. Generally speaking, we prefer full-period generators, but we would never use this particular generator in practice because the modulus is much too small.

Let's consider the following plot of all (X_{i-1}, X_i) coordinate pairs. For example, the point (0, 3) corresponds to (X_0, X_1). Notice that we also include the point (−2, 1): since −2 mod 8 = 6, it is equivalent to the pair (6, 1), which occurs in the sequence.

What's interesting about this plot is that the pseudo-random numbers appear to fall on lines. This feature is a general property of the linear congruential generators.

Easy Exercise

Consider the following generator:

X_i = (5X_{i-1} + 2) \bmod 8

Does this generator achieve full cycle? Let's find out:

\begin{array}{c|ccccc} i & 0 & 1 & 2 & 3 & 4 \\ \hline X_i & 0 & 2 & 4 & 6 & 0 \\ \hline R_i & 0 & 2/8 & 4/8 & 6/8 & 0 \end{array}

If we seed this sequence with an odd number, we only see odd numbers. Similarly, if we seed this sequence with an even number, we only see even numbers. Regardless, this sequence repeats after four observations, and, since the modulus is eight, this generator does not achieve full cycle.

Better Example

Let's look at a much better generator, which we have seen before:

X_i = 16807X_{i-1} \bmod (2^{31}-1)

This particular generator is full-period, with a cycle length of 2^{31} − 2 (greater than two billion), except when X_0 = 0.

Let's look at the algorithm for generating each X_i and R_i:

\begin{alignedat}{1} & K \leftarrow \lfloor X_{i-1} / 127773 \rfloor \\ & X_i \leftarrow 16807(X_{i-1} - 127773K) - 2836K \\ & \text{if } X_i < 0, \text{ then set } X_i \leftarrow X_i + 2147483647 \\ & R_i \leftarrow X_i \times 4.656612875\text{E-}10 \end{alignedat}

As an example, if X_0 = 12345678, then:

\begin{alignedat}{1} & K \leftarrow \lfloor 12345678 / 127773 \rfloor = 96 \\ & X_1 \leftarrow 16807[12345678 - 127773(96)] - 2836(96) = 1335380034 \\ & R_1 \leftarrow 1335380034 \times 4.656612875\text{E-}10 = 0.621835 \end{alignedat}
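
Here is a hedged Python sketch of this algorithm (Schrage's decomposition m = aq + r with q = 127773 and r = 2836, which avoids intermediate overflow on 32-bit machines):

```python
def minimal_standard(x, n):
    """X_i = 16807 * X_{i-1} mod (2^31 - 1) via Schrage's decomposition."""
    m, a, q, r = 2**31 - 1, 16807, 127773, 2836  # note m = a*q + r
    for _ in range(n):
        k = x // q
        x = a * (x - q * k) - r * k
        if x < 0:
            x += m
        yield x / m  # equivalently, x * 4.656612875e-10

# Reproduces the worked example: R_1 = 0.621835...
print(next(minimal_standard(12345678, 1)))
```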

What Can Go Wrong?

As we saw, we can have generators like X_i = (5X_{i-1} + 2) mod 8, which, depending on the seed, produce only even or only odd integers and are therefore not full-period. Alternatively, the generator X_i = (X_{i-1} + 1) mod 8 is full-period, but it produces very non-random output: X_1 = 1, X_2 = 2, and so on.

In any case, if the modulus m is "small", we'll get unusably short cycling whether or not the generator is full-period. By small, we mean anything less than two billion or so. However, even when m is big, we still have to be careful, because subtle problems can arise.

RANDU Generator

Let's take a look at the RANDU generator, which was popular during the 1960s:

X_i = 65539 X_{i-1} \bmod 2^{31}

Unfortunately, the numbers that this LCG generates are provably not i.i.d., and the failure is easy to visualize.

Let's take a sufficiently large i and plot the points (R_{i-2}, R_{i-1}, R_i) within the unit cube. If the numbers generated from this sequence were truly random, we would see a random dispersion of dots within the cube. Instead, the numbers fall entirely on fifteen hyperplanes.

Tausworthe Generators

In this lesson, we will look at the Tausworthe generator.

Tausworthe Generator

Let's take a look at the Tausworthe generator. We will define a sequence of binary digits, B_1, B_2, ..., as:

B_i = \left(\sum_{j=1}^q c_j B_{i-j}\right) \bmod 2

In other words, we calculate each B_i by taking the sum of the q previous c_j B_{i-j} products, where each c_j ∈ {0, 1}. We take the entire sum mod 2, so every B_i is either a zero or a one.

Instead of using all q previous entries in the sequence to compute B_i, we can use a shortcut that saves a lot of computational effort and takes the same amount of time regardless of the size of q:

B_i = (B_{i-r} + B_{i-q}) \bmod 2 = B_{i-r} \text{ XOR } B_{i-q}, \quad 0 < r < q

Remember that any B_i can only be either 0 or 1. Consider the following table:

\begin{array}{cc|cc} B_{i-r} & B_{i-q} & B_{i-r} + B_{i-q} & (B_{i-r} + B_{i-q}) \bmod 2 \\ \hline 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 2 & 0 \end{array}

Now, let's consider the XOR operator. If we think of the OR operator as, colloquially, "either this or that", we can think of XOR as "either this or that, but not both"; indeed, XOR is an abbreviation of "eXclusive OR". Let's look at the truth table:

\begin{array}{cc|c} B_{i-r} & B_{i-q} & B_{i-r} \text{ XOR } B_{i-q} \\ \hline 0 & 0 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{array}

Thus, we can see that (B_{i-r} + B_{i-q}) mod 2 and B_{i-r} XOR B_{i-q} are indeed equivalent. Of course, we can use an even simpler expression to compute B_i, which doesn't involve XOR:

B_i = \begin{cases} 0 & B_{i-r} = B_{i-q} \\ 1 & B_{i-r} \neq B_{i-q} \end{cases}

To initialize the B_i sequence, we need to specify the seeds B_1, B_2, ..., B_q.

Example

Let's look at an example. Consider r = 3, q = 5, and B_1 = ⋯ = B_5 = 1. Thus, for i > 5, B_i = B_{i-3} XOR B_{i-5}. For example, B_6 = B_3 XOR B_1 = 0 and B_7 = B_4 XOR B_2 = 0. Let's look at the first 36 values in the sequence:

1111 \quad 1000 \quad 1101 \quad 1101 \quad 0100 \quad 0010 \quad 0101 \quad 1001 \quad 1111

When we choose r and q well, the period of these bit sequences is 2^q − 1. In our case, q = 5, so the period is 2^5 − 1 = 31. Indeed, the thirty-second bit restarts the sequence of five ones that we see starting from the first bit.

Generating Uniforms

How do we get Unif(0,1) random variables from the B_i's? We can take l bits at a time, interpret them as a binary integer, and divide by 2^l to compute a real number between zero and one.

For example, suppose l = 4. Given the bits in the previous sequence, we get the following sequence of uniforms:

1111_2, 1000_2, 1101_2, 1101_2, \ldots \to \frac{15}{16}, \frac{8}{16}, \frac{13}{16}, \frac{13}{16}, \ldots
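
Here is a small Python sketch of this process, using the r = 3, q = 5 example and l = 4 bits per uniform (the function layout and names are our own):

```python
def tausworthe_bits(r, q, n_bits, seed=None):
    """Generate B_1, B_2, ... where B_i = B_{i-r} XOR B_{i-q} for i > q."""
    bits = list(seed) if seed else [1] * q  # default seed: B_1 = ... = B_q = 1
    while len(bits) < n_bits:
        bits.append(bits[-r] ^ bits[-q])    # picks out B_{i-r} and B_{i-q}
    return bits[:n_bits]

def bits_to_uniforms(bits, l):
    """Group l bits at a time and divide by 2^l to get Unif(0,1) variates."""
    return [int("".join(map(str, bits[i:i + l])), 2) / 2**l
            for i in range(0, len(bits) - l + 1, l)]

bits = tausworthe_bits(r=3, q=5, n_bits=36)
print(bits_to_uniforms(bits, l=4))  # 15/16, 8/16, 13/16, 13/16, ...
```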

Tausworthe generators have a lot of potential. They have many nice properties, including long periods and fast calculations. As with the LCGs, we have to choose our parameters - q, r, B_1, ..., B_q, and l - with care.

Generalizations of LCGs

In this lesson, we will return to the LCGs and look at several generalizations, some of which have remarkable properties.

A Simple Generalization

Let's consider the following generalization:

X_i = \left(\sum_{j=1}^q a_j X_{i-j}\right) \bmod m

Generators of this form can have extremely large periods - up to m^q − 1 if we choose the parameters correctly. However, we need to be careful: the Fibonacci generator, which we saw earlier, is a special case of this generalization, and it's a terrible generator, as we demonstrated previously.

Combinations of Generators

We can combine two generators, X_1, X_2, ... and Y_1, Y_2, ..., to construct a third generator, Z_1, Z_2, .... We might use one of the following techniques to construct the Z_i's:

  • Set Z_i equal to (X_i + Y_i) mod m
  • Shuffle the X_i's and Y_i's
  • Set Z_i conditionally equal to X_i or Y_i

However, the properties of these composite generators are challenging to prove, and we should not use these simple tricks to combine generators.

A Really Good Combined Generator

The following is a very strong combined generator. First, we initialize X_{1,0}, X_{1,1}, X_{1,2}, X_{2,0}, X_{2,1}, X_{2,2}. Next, for i ≥ 3:

\begin{aligned} & X_{1,i} = (1403580\,X_{1,i-2} - 810728\,X_{1,i-3}) \bmod (2^{32} - 209) \\ & X_{2,i} = (527612\,X_{2,i-1} - 1370589\,X_{2,i-3}) \bmod (2^{32} - 22853) \\ & Y_i = (X_{1,i} - X_{2,i}) \bmod (2^{32} - 209) \\ & R_i = Y_i / (2^{32} - 209) \end{aligned}

As crazy as this generator looks, it only requires simple mathematical operations. It works well, it's easy to implement, and it has an amazing cycle length of about 2^{191}.
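
Here is a minimal Python sketch of this combined generator - this is the structure of L'Ecuyer's MRG32k3a - with arbitrary (nonzero) seeds chosen for illustration:

```python
def combined_mrg(n, seed1=(12345, 12345, 12345), seed2=(12345, 12345, 12345)):
    """Combined multiple recursive generator with period of roughly 2^191."""
    m1, m2 = 2**32 - 209, 2**32 - 22853
    x1, x2 = list(seed1), list(seed2)  # [X_{j,i-3}, X_{j,i-2}, X_{j,i-1}]
    for _ in range(n):
        new1 = (1403580 * x1[1] - 810728 * x1[0]) % m1
        new2 = (527612 * x2[2] - 1370589 * x2[0]) % m2
        x1 = x1[1:] + [new1]
        x2 = x2[1:] + [new2]
        yield ((new1 - new2) % m1) / m1  # R_i = Y_i / (2^32 - 209)

print(list(combined_mrg(3)))
```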

Some Remarks

Matsumoto and Nishimura have developed the "Mersenne Twister" generator, which has a period of 2^{19937} − 1. This period is more than sufficient for any modern application: we may need several billion PRNs, but that is nowhere near even 2^{100}. All standard packages - Excel, Python, Arena, etc. - use one of these "good" generators.

Choosing a Good Generator - Theory

In this lesson, we will discuss some PRN generator properties from a theoretical perspective, and we'll look at an amalgamation of typical results to aid our discussion.

Theorem

Suppose we have the following multiplicative generator:

X_i = aX_{i-1} \bmod 2^n

This generator can have a cycle length of at most 2^{n-2}, which means that it is not full-cycle. To make matters worse, we can only achieve this maximum cycle length when X_0 is odd and a = 8k + 3 or a = 8k + 5 for some k.

For example, suppose X_i = 13X_{i-1} mod 64. Note that a = 13 = 8(1) + 5 and m = 64 = 2^6, so n = 6. Consider the following sequence of values:

\begin{array}{c|cccccccc} X_0 & X_1 & X_2 & X_3 & X_4 & \cdots & X_8 & \cdots & X_{16} \\ \hline 1 & 13 & 41 & 21 & 17 & \cdots & 33 & \cdots & 1 \end{array}

We can see that we have cycled after 2^{n-2} = 2^4 = 16 entries. What happens to the cycle length if X_0 is even?

\begin{array}{c|cccccc} X_0 & X_1 & X_2 & X_3 & X_4 & \cdots & X_8 \\ \hline 2 & 26 & 18 & 42 & 34 & \cdots & 2 \end{array}

Here, we cycle after 2^{n-3} = 8 entries. Let's look at the cycle for X_0 = 3. As we can see, our cycle length increases back to 2^{n-2}:

\begin{array}{c|cccccccc} X_0 & X_1 & X_2 & X_3 & X_4 & \cdots & X_8 & \cdots & X_{16} \\ \hline 3 & 39 & 59 & 63 & 51 & \cdots & 35 & \cdots & 3 \end{array}

Finally, let's look at the sequence when X_0 = 4. As we can see, when we seed the generator with this value, our cycle length drops to four.

\begin{array}{c|cccc} X_0 & X_1 & X_2 & X_3 & X_4 \\ \hline 4 & 52 & 36 & 20 & 4 \end{array}
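
A quick Python sketch (our own helper) to verify these cycle lengths empirically:

```python
def cycle_length(seed, a, m):
    """Steps until X_i = a*X_{i-1} mod m returns to its seed value."""
    x, steps = (a * seed) % m, 1
    while x != seed:
        x = (a * x) % m
        steps += 1
    return steps

# Verifies the examples above for X_i = 13*X_{i-1} mod 64
for seed in (1, 2, 3, 4):
    print(seed, cycle_length(seed, 13, 64))  # cycle lengths 16, 8, 16, 4
```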

Theorem

Suppose we have the following linear congruential generator:

X_i = (aX_{i-1} + c) \bmod m, \quad c > 0

This generator has full cycle if the following three conditions hold:

  • c and m are relatively prime
  • a − 1 is a multiple of every prime that divides m
  • a − 1 is a multiple of 4 if 4 divides m

Let's look at a corollary to this theorem. Consider the following special case of the general LCG above:

X_i = (aX_{i-1} + c) \bmod 2^n, \quad (c, n > 1)

This generator has full cycle if c is odd and a = 4k + 1 for some k.

Theorem

Suppose we have the following multiplicative generator:

X_i = aX_{i-1} \bmod m, \quad m \text{ is prime}

This generator has full period (m − 1, in this case) if and only if the following two conditions hold:

  • m divides a^{m−1} − 1
  • for all integers i < m − 1, m does not divide a^i − 1

We define the full period as m − 1 in this case because, if we start at X_0 = 0, we just cycle at zero.

For m = 2^{31} − 1, it can be shown that 534,600,000 multipliers yield full period. In some sense, a = 950,706,376 is the "best" multiplier, according to a 1986 paper by Fishman and Moore.
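
The two conditions above say that a must be a primitive root of m. Here is a brute-force Python sketch that checks them directly (practical only for small prime moduli; the function name is ours):

```python
def is_full_period_multiplier(a, m):
    """Check that m divides a^(m-1) - 1 but divides no a^i - 1 for i < m - 1.

    Equivalent to a being a primitive root of the prime modulus m.
    Brute force, so only sensible for small m.
    """
    if pow(a, m - 1, m) != 1:  # first condition: m divides a^(m-1) - 1
        return False
    return all(pow(a, i, m) != 1 for i in range(1, m - 1))

# For m = 7: a = 3 is a primitive root; a = 2 is not (2^3 = 8 = 1 mod 7)
print(is_full_period_multiplier(3, 7), is_full_period_multiplier(2, 7))
```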

Geometric Considerations

Let's look at k consecutive PRNs, (R_i, ..., R_{i+k-1}), i ≥ 1, generated from a multiplicative generator. As it turns out, these k-tuples lie on parallel hyperplanes in the k-dimensional unit cube.

We might be interested in the following geometric optimizations:

  • maximizing the minimum number of hyperplanes in all directions
  • minimizing the maximum distance between parallel hyperplanes
  • maximizing the minimum Euclidean distance between adjacent k-tuples

Remember that the RANDU generator is particularly bad because the PRNs it generates only lie on 15 hyperplanes.

One-Step Serial Correlation

We can also look at a property called one-step serial correlation, which measures how strongly adjacent PRNs are correlated with each other. Here's a result from a 1961 paper by Greenberg:

\text{Corr}(R_1, R_2) \leq \frac{1}{a}\left(1 - \frac{6c}{m} + 6\left(\frac{c}{m}\right)^2\right) + \frac{a+6}{m}

Remember that a is the multiplicative constant, c is the additive constant, and m is the modulus.

This upper bound is very small when m is in the range of two billion and a is, say, 16807. This result is good because it demonstrates that small X_i's don't tend to follow small X_{i-1}'s, and large X_i's don't tend to follow large X_{i-1}'s.
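
For instance, a quick Python check of the bound for the generator with a = 16807, c = 0, and m = 2^31 − 1:

```python
# Greenberg's upper bound on Corr(R_1, R_2) for an LCG
a, c, m = 16807, 0, 2**31 - 1
bound = (1 / a) * (1 - 6 * c / m + 6 * (c / m) ** 2) + (a + 6) / m
print(bound)  # roughly 6.7e-05: adjacent PRNs are nearly uncorrelated
```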

Choosing a Good Generator - Stats Test

In this lesson, we'll give an overview of various statistical tests for goodness-of-fit and independence. We'll conduct specific tests in subsequent lessons.

Statistical Tests Intro

We will look at two general classes of tests because we have a couple of goals we want to accomplish when examining how well pseudo-random number generators perform.

We run goodness-of-fit tests to ensure that the PRNs are approximately Unif(0,1). Generally speaking, we will look at the chi-squared test to measure goodness-of-fit, though there are other tests available. We also run independence tests to determine whether the PRNs are approximately independent. If a generator passes both tests, we can assume that it generates approximately i.i.d Unif(0,1) random variables.

All the tests that we will look at are hypothesis tests: we will test our null hypothesis (H_0) against an alternative hypothesis (H_1).

We regard H_0 as the status quo - "that which we currently believe to be true". In this case, H_0 refers to the belief that the numbers we are currently generating are, in fact, i.i.d Unif(0,1).

If we get substantial observational evidence that H_0 is wrong, then we will reject it in favor of H_1. In other words, H_0 is innocent until proven guilty, at least from a statistical point of view.

When we design a hypothesis test, we need to set the level of significance, α, which is the probability that we reject H_0 given that it is true: α = P(Reject H_0 | H_0 true). Typically, we set α equal to 0.05 or 0.1. Rejecting a true H_0 is known as a Type I error.

Selecting a value for α helps determine how many observations we need to take. For example, if we set α to a very small number, we typically have to draw more observations than if we were more flexible with α.

We can also specify the probability of a Type II error, which is the probability, β, of accepting H_0 given that it is false: β = P(Accept H_0 | H_0 false). We won't focus on β much in these lessons.

Usually, we are much more concerned with avoiding incorrect rejections of H_0, so α tends to be the more important of the two measures.

For instance, suppose we have a new anti-cancer drug competing against our current drug, which has average performance. Our null hypothesis would likely be that the current drug is at least as good as the new drug, because we want to preserve the status quo. Incorrectly rejecting this hypothesis - replacing the current drug with an inferior one - is a much worse outcome than failing to adopt a superior drug.

Choosing a Good Generator - Goodness-of-Fit Tests

In this lesson, we'll look at the chi-squared goodness-of-fit test to check whether or not the PRNs we generate are indeed Unif(0,1). There are many goodness-of-fit tests in the wild, such as the Kolmogorov-Smirnov test and the Anderson-Darling test, but chi-squared is the most tried-and-true. We'll look at how to do goodness-of-fit tests for other distributions later.

Chi-Squared Goodness-of-Fit Test

The first thing we need to consider when designing this test is what our null hypothesis ought to be. In this case, we assume that our PRNs are Unif(0,1). Formally, H_0: R_1, R_2, ..., R_n ~ Unif(0,1). As with all hypothesis tests, we assume that H_0 is true until we have ample evidence to the contrary, at which point we reject H_0.

To start, let's divide the unit interval [0,1] into k adjacent sub-intervals of equal width: [0, 1/k), [1/k, 2/k), ..., [(k−1)/k, 1]. If our alleged uniforms are truly uniform, then the probability that any one observation R_j falls in a particular sub-interval is 1/k.

Next, we'll draw n observations, R_1, ..., R_n, and observe how many fall into each of the k cells. Let O_i equal the number of R_j's that fall in sub-interval i. Given n trials, each with probability 1/k of landing in sub-interval i, each O_i is distributed as a binomial random variable: O_i ~ Bin(n, 1/k), i = 1, 2, ..., k. Note that we can only describe O_i in this way if we assume that the R_j's are i.i.d.

Since O_i is a binomial random variable, the expected number of R_j's that fall in cell i is E_i = E[O_i] = n/k, i = 1, 2, ..., k. Remember that, by definition, the expected value of a Bin(n, p) random variable is np.

We reject the null hypothesis that the PRNs are uniform if the O_i's don't match the E_i's well. In other words, if our observations don't match our expectations under the assumption that H_0 is true, then something is wrong - and the only thing that could be wrong is the null hypothesis.

The goodness-of-fit statistic, χ²_0, gives us some insight into how well our observations match our expectations. Here's how we compute it:

\chi_0^2 = \sum_{i=1}^k \frac{(O_i - E_i)^2}{E_i}

Note the subscript of 0. All this means is that χ²_0 is a statistic that we compute from our observations.

In layman's terms, we take the sum of the squared differences of each of our observations and expectations, and we standardize each squared difference by the expectation. Note that we square the difference so that negative and positive deviations don't cancel.

If the observations don't match the expectations, then χ²_0 will be a large number, which indicates a bad fit. When we talk about "bad fit" here, we are saying that describing the sequence of PRNs as approximately Unif(0,1) is inappropriate: the claim doesn't fit the data.

As we said, O_i is a binomial random variable. For large values of n and k, binomial random variables are approximately normal. Since E_i is the expected value of O_i, we have (O_i − E_i) ~ Nor(0, σ²), approximately.

As it turns out, (O_i − E_i)², once we standardize it by dividing by E_i, is approximately a χ² random variable with one degree of freedom. If we add up k independent χ² random variables with one degree of freedom each, we get a χ² random variable with k degrees of freedom.

Note that the O_i's are correlated with one another; if R_j falls in one sub-interval, it cannot fall in another. The number of observations that land in one sub-interval obviously depends on the number that land in the other sub-intervals.

All this is to say: we don't actually get k degrees of freedom in our resulting χ² random variable; we only get k − 1. We pay a penalty of one degree of freedom for the correlation. In summary, by the central limit theorem, χ²_0 ~ χ²_{k−1} if H_0 is true.

Now, we can look up various (1 − α) quantiles for χ² distributions with varying degrees of freedom n, where we define the quantile χ²_{α,n} by P(χ²_n < χ²_{α,n}) = 1 − α. In our case, we reject the null hypothesis if our test statistic χ²_0 is larger than χ²_{α,k−1}, and we fail to reject H_0 if χ²_0 ≤ χ²_{α,k−1}.

The usual recommendation from an introductory statistics class for the χ² goodness-of-fit test is to pick large values for k and n: E_i = n/k should be at least five, and n should be at least 30.

However, when we test PRN generators, we usually have massive values for both n and k. When k is so large that we can't find tables with values for χ²_{α,k−1}, we can approximate the χ² quantile using the following expression:

χα,kβˆ’12β‰ˆ(kβˆ’1)[1βˆ’29(kβˆ’1)+za29(kβˆ’1)]3\chi^2_{\alpha, k-1} \approx (k-1) \left[1 - \frac{2}{9(k-1)} + z_a \sqrt{\frac{2}{9(k-1)}}\right]^3

Note that z_α is the appropriate standard normal quantile.
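
A small Python sketch of this approximation (the z_α value for α = 0.05 is hard-coded):

```python
def chi2_quantile_approx(df, z_alpha):
    """Approximate chi-squared quantile with df degrees of freedom."""
    term = 2 / (9 * df)
    return df * (1 - term + z_alpha * term ** 0.5) ** 3

# For alpha = 0.05, z_alpha = 1.645; approximate chi^2_{0.05, 4}
print(chi2_quantile_approx(4, 1.645))  # about 9.46, vs. the table value 9.49
```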

Illustrative Example

Suppose we draw n = 1000 observations and divide the unit interval into k = 5 equal sub-intervals. The expected number of observations that fall into each interval is E_i = n/k = 200. Let's look at the observations we gathered:

\begin{array}{c|ccccc} \text{interval} & [0, 0.2] & (0.2, 0.4] & (0.4, 0.6] & (0.6, 0.8] & (0.8, 1.0] \\ \hline E_i & 200 & 200 & 200 & 200 & 200 \\ O_i & 179 & 208 & 222 & 199 & 192 \end{array}

Let's compute the goodness-of-fit statistic:

\begin{alignedat}{1} \chi_0^2 &= \sum_{i=1}^k \frac{(O_i - E_i)^2}{E_i} \\ &= \frac{1}{200}\left((-21)^2 + 8^2 + 22^2 + (-1)^2 + (-8)^2\right) \\ &= \frac{1054}{200} = 5.27 \end{alignedat}

Since k = 5, we are looking at a χ² distribution with four degrees of freedom. Let's choose α = 0.05. Now, we look up the χ²_{0.05,4} quantile in a table, which has a value of 9.49. Since our test statistic is less than the quantile, we fail to reject H_0. We can assume that these observations are approximately uniform.
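
The whole test takes only a few lines of Python (the observed counts come from the table above):

```python
def chi2_gof_statistic(observed, expected):
    """Chi-squared goodness-of-fit statistic: sum of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [179, 208, 222, 199, 192]
expected = [200] * 5
stat = chi2_gof_statistic(observed, expected)
print(stat, stat <= 9.49)  # 5.27, True -> fail to reject H_0 at alpha = 0.05
```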

Choosing a Good Generator - Independence Tests I

In this lesson, we'll look at the so-called "runs" tests for independence of PRNs. There are many other tests - correlation tests, gap tests, poker tests, birthday tests, etc. - but we'll focus specifically on two types of runs tests.

Interestingly, we find ourselves in something of a catch-22: the independence tests all assume that the PRNs are Unif(0,1), while the goodness-of-fit tests all assume that the PRNs are independent.

Independence - Runs Tests

Let's consider three different sequences of coin tosses:

\begin{aligned} &\text{A. H,T,H,T,H,T,H,T,H,T} \\ &\text{B. H,H,H,H,H,T,T,T,T,T} \\ &\text{C. H,H,H,T,T,H,T,T,H,T} \end{aligned}

In example A, we have a high negative correlation between the coin tosses, since tails always follows heads and vice versa. In example B, we have a high positive correlation, since similar outcomes tend to follow one another. We can't see a discernible pattern among the coin tosses in example C, so the observations in this sequence might be independent.

A run is a series of similar observations. In example A above, there are ten runs: "H", "T", "H", "T", "H", "T", "H", "T", "H", "T". In example B, there are two runs: "HHHHH", "TTTTT". Finally, in example C, there are six runs: "HHH", "TT", "H", "TT", "H", "T".

In independence testing, our null hypothesis is that the PRNs R_1, R_2, ..., R_n are independent. A runs test rejects the null hypothesis if there are "too many" or "too few" runs, once we quantify the terms "too many" and "too few". There are several types of runs tests, and we'll look at two of them.

Runs Test "Up and Down"

Consider the following PRNs:

.41 \quad .68 \quad .89 \quad .84 \quad .74 \quad .91 \quad .55 \quad .71 \quad .36 \quad .30 \quad .09

In the "up and down" runs test, we look to see whether we go "up" or "down" as we move from PRN to PRN. Going from .41 to .68, we go up. Going from .68 to .89, we go up. Going from .89 to .84, we go down. From .84 to .74, we go down again.

Let's transform our sequence of PRNs into one of plusses and minuses, where a plus sign indicates going up, and a minus sign indicates going down:

++--+-+---

Here are the associated runs - there are six in total - demarcated by commas:

++, --, +, -, +, ---

Let A denote the total number of "up and down" runs out of the n observations. As we said, A = 6 in this example. If n is large (say, n ≥ 20), and the R_j's are indeed independent, then A is approximately normal, with the following parameters:

A \approx \text{Nor}\left(\frac{2n-1}{3}, \frac{16n-29}{90}\right)

Let's transform A into a standard normal random variable, Z_0, which we accomplish with the following manipulation:

Z_0 = \frac{A - E[A]}{\sqrt{\text{Var}(A)}}

Now we can finally quantify "too large" and "too small". Specifically, we reject H_0 if the absolute value of Z_0 is greater than the α/2 standard normal quantile:

f(Z_0) = \begin{cases} \text{Reject } H_0 & |Z_0| > z_{\alpha/2} \\ \text{Accept } H_0 & |Z_0| \leq z_{\alpha/2} \end{cases}

Up and Down Example

Suppose we have observed A = 55 runs over n = 100 observations. Given these values, A is approximately normal with the following parameters:

\begin{alignedat}{1} A &\approx \text{Nor}\left(\frac{2(100)-1}{3}, \frac{16(100)-29}{90}\right) \\ &\approx \text{Nor}\left(\frac{199}{3}, \frac{1571}{90}\right) \approx \text{Nor}(66.33, 17.46) \end{alignedat}

Let's compute Z_0:

Z_0 = \frac{A - E[A]}{\sqrt{\text{Var}(A)}} = \frac{55 - 66.33}{\sqrt{17.46}} \approx \frac{-11.33}{4.1785} \approx -2.71

If α = 0.05, then z_{α/2} = 1.96. Since |Z_0| = 2.71 > 1.96, we reject H_0, thereby rejecting independence.
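
Here is a Python sketch of the whole "up and down" test (NormalDist comes from the standard library; the demo data are the eleven PRNs above, so n is smaller than the recommended 20 and the normal approximation is rough):

```python
from statistics import NormalDist

def up_down_runs_test(data, alpha=0.05):
    """Runs test "up and down": reject independence if |Z_0| > z_{alpha/2}."""
    signs = ["+" if b > a else "-" for a, b in zip(data, data[1:])]
    a_runs = 1 + sum(s != t for s, t in zip(signs, signs[1:]))  # A
    n = len(data)
    mean, var = (2 * n - 1) / 3, (16 * n - 29) / 90
    z0 = (a_runs - mean) / var ** 0.5
    return a_runs, z0, abs(z0) > NormalDist().inv_cdf(1 - alpha / 2)

data = [.41, .68, .89, .84, .74, .91, .55, .71, .36, .30, .09]
print(up_down_runs_test(data))  # A = 6 runs, as counted above
```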

Runs Test "Above and Below the Mean"

Let's look at the same sequence of PRNs:

.41 \quad .68 \quad .89 \quad .84 \quad .74 \quad .91 \quad .55 \quad .71 \quad .36 \quad .30 \quad .09

Let's transform our sequence of PRNs into one of plusses and minuses, where a plus sign indicates that R_i ≥ 0.5, and a minus sign indicates that R_i < 0.5:

-+++++++---

Here are the associated runs - there are three in total - demarcated by commas:

-, +++++++, ---

If n is large (say, n ≥ 20), and the R_j's are indeed independent, then the number of runs, B, is again approximately normal, with the following parameters:

B \approx \text{Nor}\left(\frac{2n_1n_2}{n} + \frac{1}{2}, \frac{2n_1n_2(2n_1n_2 - n)}{n^2(n-1)}\right)

Note that n_1 refers to the number of observations greater than or equal to the mean, and n_2 = n − n_1.

Let's transform B into a standard normal random variable, Z_0, which we accomplish with the following manipulation:

Z_0 = \frac{B - E[B]}{\sqrt{\text{Var}(B)}}

Again, we reject H_0 if the absolute value of Z_0 is greater than the α/2 standard normal quantile:

f(Z_0) = \begin{cases} \text{Reject } H_0 & |Z_0| > z_{\alpha/2} \\ \text{Accept } H_0 & |Z_0| \leq z_{\alpha/2} \end{cases}

Above and Below the Mean Example

Consider the following +/− sequence of n = 40 observations:

\begin{aligned} &-+++++++---++-+----- \\ &--++----++--+-+--++- \end{aligned}

In this case, we have n_1 = 18 observations above the mean and n_2 = 22 observations below the mean, as well as B = 17 runs. Without walking through the algebra, we can compute that B ≈ Nor(20.3, 9.54).

Let's compute Z_0:

Z_0 = \frac{B - E[B]}{\sqrt{\text{Var}(B)}} = \frac{17 - 20.3}{\sqrt{9.54}} \approx \frac{-3.3}{3.0887} \approx -1.07

If α = 0.05, then z_{α/2} = 1.96, and we fail to reject H_0. Therefore, we can treat the observations in this sequence as being independent.
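
A matching Python sketch for the "above and below the mean" test; the demo reconstructs the 40-observation example from its +/− signs:

```python
from statistics import NormalDist

def above_below_runs_test(data, mean=0.5, alpha=0.05):
    """Runs test "above and below the mean": reject if |Z_0| > z_{alpha/2}."""
    signs = [x >= mean for x in data]
    b = 1 + sum(s != t for s, t in zip(signs, signs[1:]))  # number of runs B
    n, n1 = len(signs), sum(signs)
    n2 = n - n1
    mu = 2 * n1 * n2 / n + 0.5
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z0 = (b - mu) / var ** 0.5
    return b, z0, abs(z0) > NormalDist().inv_cdf(1 - alpha / 2)

seq = ("-+++++++---++-+-----"
       "--++----++--+-+--++-")
data = [1.0 if ch == "+" else 0.0 for ch in seq]
print(above_below_runs_test(data))  # B = 17, Z_0 ~ -1.07, False
```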

Choosing a Good Generator - Independence Tests II (OPTIONAL)

In this lesson, we'll look at another class of tests, autocorrelation tests, that we can use to evaluate whether pseudo-random numbers are independent.

Correlation Test

Given a sequence of PRNs, R_1, R_2, ..., R_n, and assuming that each R_i is Unif(0,1), we can conduct a correlation test against the null hypothesis that the R_i's are indeed independent.

We define the lag-1 correlation of the R_i's by ρ ≡ Corr(R_i, R_{i+1}). In other words, the lag-1 correlation measures the correlation between one PRN and its immediate successor. Ideally, if the PRNs are uncorrelated, ρ should be zero.

A good estimator for ρ is given by:

\hat\rho \equiv \left(\frac{12}{n-1}\sum_{k=1}^{n-1}R_kR_{k+1}\right) - 3

In particular, if n is large, and H_0 is true:

\hat\rho \approx \text{Nor}\left(0, \frac{13n - 19}{(n-1)^2}\right)

Let's transform ρ̂ into a standard normal random variable, Z_0, noting that E[ρ̂] = 0 under H_0:

Z_0 = \frac{\hat\rho - E[\hat\rho]}{\sqrt{\text{Var}(\hat\rho)}} = \frac{\hat\rho}{\sqrt{\text{Var}(\hat\rho)}}

We reject H_0 if the absolute value of Z_0 is greater than the α/2 standard normal quantile:

f(Z_0) = \begin{cases} \text{Reject } H_0 & |Z_0| > z_{\alpha/2} \\ \text{Accept } H_0 & |Z_0| \leq z_{\alpha/2} \end{cases}

Example

Consider the following n = 30 PRNs:

\begin{aligned} &0.29 \quad 0.38 \quad 0.46 \quad 0.29 \quad 0.69 \quad 0.73 \quad 0.80 \quad 0.74 \quad 0.99 \quad 0.74 \\ &0.88 \quad 0.66 \quad 0.56 \quad 0.41 \quad 0.35 \quad 0.22 \quad 0.18 \quad 0.05 \quad 0.25 \quad 0.36 \\ &0.39 \quad 0.45 \quad 0.50 \quad 0.62 \quad 0.76 \quad 0.81 \quad 0.97 \quad 0.72 \quad 0.11 \quad 0.55 \end{aligned}

After some algebra, we get ρ̂ = 0.950 and Var(ρ̂) = 0.441. Notice how high our correlation estimator is; we might expect a robust rejection of the null hypothesis.

Let's compute Z_0:

Z_0 = \frac{\hat\rho}{\sqrt{\text{Var}(\hat\rho)}} = \frac{0.950}{\sqrt{0.441}} \approx 1.43

If α = 0.05, then z_{α/2} = 1.96, and we fail to reject H_0. Therefore, we can treat the observations in this sequence as being independent. That said, n = 30 observations is quite small; given such a high correlation estimate, we might well reject H_0 if we collected more observations.
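
Finally, a Python sketch of the lag-1 correlation test applied to the thirty PRNs above:

```python
from statistics import NormalDist

def lag1_correlation_test(data, alpha=0.05):
    """Lag-1 correlation test: reject independence if |Z_0| > z_{alpha/2}."""
    n = len(data)
    rho_hat = (12 / (n - 1)) * sum(a * b for a, b in zip(data, data[1:])) - 3
    var = (13 * n - 19) / (n - 1) ** 2
    z0 = rho_hat / var ** 0.5
    return rho_hat, z0, abs(z0) > NormalDist().inv_cdf(1 - alpha / 2)

data = [0.29, 0.38, 0.46, 0.29, 0.69, 0.73, 0.80, 0.74, 0.99, 0.74,
        0.88, 0.66, 0.56, 0.41, 0.35, 0.22, 0.18, 0.05, 0.25, 0.36,
        0.39, 0.45, 0.50, 0.62, 0.76, 0.81, 0.97, 0.72, 0.11, 0.55]
print(lag1_correlation_test(data))  # rho_hat ~ 0.95, Z_0 ~ 1.4, False
```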
