Simulation

Random Variate Generation


Random Variate Generation

Inverse Transform Method

In this lesson, we'll go into some additional detail regarding the inverse transform method.

Inverse Transform Method

The inverse transform method states that, if X is a continuous random variable with cdf F(x), then F(X) \sim \mathcal{U}(0,1). In other words, if we plug a random variable, from any distribution, into its own cdf, we get a Unif(0,1) random variable.

We can prove the inverse transform method. Let Y=F(X)Y = F(X). Since YY is a random variable, it has a cdf, which we can denote G(y)G(y). By definition:

G(y)=P(Yy)G(y) = P(Y \leq y)

Since Y=F(X)Y = F(X):

G(y)=P(F(X)y)G(y) = P(F(X) \leq y)

Since XX is a continuous random variable, its cdf is continuous. Therefore, we can apply the inverse, F1F^{-1}, to both sides of the inequality:

G(y)=P(F1(F(X))F1(y))G(y) = P(F^{-1}(F(X)) \leq F^{-1}(y))

What is F1(F(X))F^{-1}(F(X))? Simply, XX:

G(y)=P(XF1(y))G(y) = P(X \leq F^{-1}(y))

Notice that we have an expression of the form P(XxP(X \leq x), where x=F1(y)x = F^{-1}(y). We know, by definition, F(x)=P(Xx)F(x) = P(X \leq x), so:

G(y)=F(F1(y))=yG(y) = F(F^{-1}(y)) = y

In summary, the cdf of YY is G(y)=yG(y) = y. If we take the derivative of the cdf to get the pdf, we see that g(y)=1g(y) = 1. Let's remember the pdf for a uniform random variable:

f(x)={1bax[a,b]0otherwisef(x) = \left\{ \begin{matrix} \frac{1}{b-a} & x \in [a,b] \\ 0 & \text{otherwise} \end{matrix} \right.

If a=0,b=1a = 0, b = 1, then f(x)=1=g(y)f(x) = 1 = g(y). Therefore, YU(0,1)Y \sim \mathcal{U}(0,1).

How Do We Use This Result

Let UU(0,1)U \sim \mathcal{U}(0,1). Since F(X)=UF(X) = U, then F1(U)=XF^{-1}(U) = X. In other words, if we set F(X)=UF(X) = U, and then solve for XX, we get a random variable from the same distribution as XX, but in terms of UU.

Here is how we use the inverse transform method for generating a random variable XX having cdf F(x)F(x): 1. Sample UU from U(0,1)\mathcal{U}(0,1) 2. Return X=F1(U)X = F^{-1}(U)

Illustration

Consider the following cdf, F(x)F(x).

Let's think about the range of F(x)F(x). Since we are dealing with a cdf, we know that 0F(x)10 \leq F(x) \leq 1. Note that, correspondingly, 0U<10 \leq \mathcal{U} < 1. This means that, for a given Unif(0,1) random variable, UU, we can generate an xx, such that F(x)=UF(x) = U, by taking the inverse of FF with UU: x=F1(U)x = F^{-1}(U).

Let's look at an example. Suppose we generate U=0.9U = 0.9. Let's draw a horizontal line (0,0.9)(0, 0.9) to the graph of F(x)F(x). From there, if we draw a vertical line down to the xx-axis, the xx-coordinate of the intersection is F1(U)F^{-1}(U). In this case, F1(0.9)=3F^{-1}(0.9) = 3.

Uniform Example

Consider the U(a,b)\mathcal{U}(a,b) distribution, which has the following cdf:

F(x)=xaba,axbF(x) = \frac{x - a}{b - a}, \quad a \leq x \leq b

Let's set F(X)=UF(X) = U and solve for XX:

U=Xaba(ba)U=Xaa+(ba)U=X\begin{alignedat}{1} U &= \frac{X - a}{b - a} \\[2ex] (b - a)U &= X - a \\ a + (b - a )U &= X \end{alignedat}

Intuitively, this result makes perfect sense. If we take a Unif(0,1) random variable, and multiply it by bab-a, we end up with a Unif(0, bab-a) random variable. Then, if we add aa to each end, we get a Unif(aa, bb) random variable.
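Here's a minimal Matlab sketch of this transformation; the endpoints a = 2 and b = 5 are arbitrary choices for illustration.

a = 2; b = 5;           % arbitrary example endpoints
U = rand;               % U ~ Unif(0,1)
X = a + (b - a) * U;    % X ~ Unif(a,b)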

Exponential Example

Consider the Exp(λ)\text{Exp}(\lambda) distribution, which has the following cdf:

F(x)=1eλx,x0F(x) = 1 - e^{-\lambda x}, x \geq 0

Let's set F(X)=UF(X) = U and solve for XX:

\begin{alignedat}{1} U &= 1 - e^{-\lambda X} \\ 1 - U &= e^{-\lambda X} \\ \ln(1 - U) &= -\lambda X \\ \frac{\ln(1 - U)}{-\lambda} &= X \end{alignedat}

Also, we know that the expression 1 - U is itself a Unif(0,1) random variable, so we can simplify:

X = \frac{\ln(U)}{-\lambda}
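Here's a minimal Matlab sketch of this transformation; the rate lambda = 2 and sample size m = 5 are arbitrary example values.

lambda = 2;              % arbitrary example rate
m = 5;                   % number of variates to generate
U = rand(1, m);          % Unif(0,1) draws
X = -log(U) / lambda;    % X ~ Exp(lambda) via the inverse transform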

Inverse Transform Method - Continuous Examples

In this lesson, we'll apply the inverse transform method to trickier continuous examples.

Weibull Example

Let's start with the Weibull distribution, which has the following cdf:

F(x) = 1 - e^{-(\lambda x)^\beta}, \quad x \geq 0

We can solve F(X)=UF(X) = U for XX:

\begin{alignedat}{1} U &= 1 - e^{-(\lambda X)^\beta} \\ 1 - U &= e^{-(\lambda X)^\beta} \\ -\ln(1 - U) &= (\lambda X)^\beta \\ \left[-\ln(1 - U)\right]^{1/\beta} &= \lambda X \\ \frac{1}{\lambda}\left[-\ln(1 - U)\right]^{1/\beta} &= X \end{alignedat}

As we saw before, the expression 1 - U is itself Unif(0,1), so we can simplify:

X = \frac{1}{\lambda}\left[-\ln(U)\right]^{1/\beta}

Notice that, if β=1\beta = 1, X=ln(U)/λExp(λ)X = -\ln(U) / \lambda \sim \text{Exp}(\lambda). We say that the Weibull distribution generalizes the exponential distribution.

Triangular Example

Now let's look at the triangular distribution. This distribution is important when we don't have a lot of data. For example, suppose we ask someone how long it takes to complete a particular job. Without much data, all they might be able to tell us is the minimum, maximum, and most common (mode) amounts of time. A triangular distribution - parameterized by these three values - is often a good first approximation.

Let's specifically consider the triangular(0,1,2) distribution, which has the following pdf:

f(x)={x0x<12x1x2f(x) = \left\{ \begin{matrix} x & 0 \leq x < 1 \\ 2 - x & 1 \leq x \leq 2 \end{matrix} \right.

If we integrate, we get the following cdf:

F(x)={x2/20x<11(x2)2/21x2F(x) = \left\{ \begin{matrix} x^2 / 2 & 0 \leq x < 1 \\ 1 - (x-2)^2 / 2 & 1 \leq x \leq 2 \end{matrix} \right.

Since the cdf contains distinct expressions for different ranges of xx, we need to transform each expression independently. How do we know which uniforms belong to which expressions? Well, the cdf evaluates the top expression for the first half of the possible xx-values and the bottom expression for the second half, so we can similarly divide the unit interval in half.

Specifically, if U<1/2U < 1/2, we solve X2/2=UX^2/2 = U to get X=2UX = \sqrt{2U}. Note that the square root technically contains both negative and positive roots, but we only consider the positive root since we know that XX must be between zero and one.

If U1/2U \geq 1/2, we solve the second expression for XX:

\begin{alignedat}{1} U &= 1 - (X-2)^2 / 2 \\ 1 - U &= (X-2)^2 / 2 \\ 2(1 - U) &= (X-2)^2 \\ -\sqrt{2(1 - U)} &= X - 2 \\ 2 - \sqrt{2(1 - U)} &= X \end{alignedat}

Technically, we could have a ±\pm in front of the square root, but it doesn't make sense to consider the positive root since XX must be between one and two. In this case, we only consider the negative root to keep XX within bounds.

Be aware that we cannot replace 1U1-U with UU in this example. We need 1U1-U to keep XX bounded appropriately. For example, suppose we made the replacement and drew U=1U = 1. Then, X=22<1X = 2 - \sqrt{2} < 1, which doesn't make sense since we know that XX must be between one and two.

In any event, let's look at two simple examples. If U=0.4U = 0.4, we evaluate X=2U=0.8X = \sqrt{2U} = \sqrt{0.8}. Alternatively, if U=0.5U = 0.5, then X=22(10.5)=21=1X = 2 - \sqrt{2(1-0.5)} = 2 - 1 = 1.

Inverse Transform Method - Continuous Examples - DEMO 1

In this demo, we'll look at some Matlab code that applies the inverse transform method to simulate a triangular(0,1,2) random variable. Here's the code.

m = 100;
X = [];
for i=1:m
  U = rand;                       % U ~ Unif(0,1)
  if U < 0.5
    X(i) = sqrt(2 * U);
  else
    X(i) = 2 - sqrt(2 * (1-U));
  end
end
histogram(X)

First, we initialize an empty array, X, and then we iterate one hundred times. On each iteration, we sample a uniform random variable with the rand function, perform the appropriate transform depending on whether U < 0.5, and then store the result in X. After we finish iterating, we plot a histogram of the elements in X.

Let's take a look at the histogram we created, which slightly resembles a triangle.

Now, let's iterate one hundred thousand times and look at the resulting histogram. This result looks much nicer.

Inverse Transform Method - Continuous Examples - DEMO 2

Now, let's talk about generating standard normal random variates. Remember our general formula: to compute F1(U)F^{-1}(U), we first set F(X)=UF(X) = U and then solve for XX.

Unfortunately, the inverse cdf of the normal distribution, Φ1()\Phi^{-1}(\cdot), does not have an analytical form, which means we can't use our normal approach. This obstacle frequently occurs when working with the inverse transform method.

Generating Standard Normal Random Variables

One easy solution to get around this issue is to look up the values in a standard normal table. For instance, if U=0.975U = 0.975, then the value of ZZ, for which Φ(Z)=0.975\Phi(Z) = 0.975, is 1.961.96. If we want to compute Φ1(0.975)\Phi^{-1}(0.975) in Excel, we can execute this function: NORMSINV(0.975).

We can also use the following crude approximation for Z, which gives at least one decimal place of accuracy for 0.00134 \leq U \leq 0.99865:

Z = \Phi^{-1}(U) \approx \frac{U^{0.135} - (1-U)^{0.135}}{0.1975}

Here's a more accurate, albeit significantly more complicated, approximation, which has an absolute error 0.45×103\leq 0.45 \times 10^{-3}:

\begin{aligned} & Z = \text{sign}(U - 1/2)\left(t - \frac{c_0 + c_1t + c_2t^2}{1 + d_1t + d_2t^2 + d_3t^3}\right) \\[2ex] & \text{where} \\[2ex] & \text{sign}(x) = \left\{ \begin{matrix} -1 & x < 0 \\ 0 & x = 0 \\ 1 & x > 0 \end{matrix} \right., \\[5ex] & t = \sqrt{-\ln[\min(U, 1-U)]^2}, \\[2ex] & c_0 = 2.515517, \quad c_1 = 0.802853, \quad c_2 = 0.010328, \\ & d_1 = 1.432788, \quad d_2 = 0.189269, \quad d_3 = 0.001308 \end{aligned}
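Here's a sketch of this approximation as a Matlab function; the name phi_inv is ours (not a built-in), and it assumes a scalar U strictly between zero and one.

function Z = phi_inv(U)
  % Crude rational approximation to the standard normal inverse cdf
  c = [2.515517, 0.802853, 0.010328];
  d = [1.432788, 0.189269, 0.001308];
  t = sqrt(-log(min(U, 1-U)^2));
  Z = sign(U - 0.5) * (t - (c(1) + c(2)*t + c(3)*t^2) / ...
      (1 + d(1)*t + d(2)*t^2 + d(3)*t^3));
end

For instance, phi_inv(0.975) returns approximately 1.96, matching the table lookup above.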

Transforming Standard Normals to Other Normals

Now, suppose we have ZNor(0,1)Z \sim \text{Nor}(0,1), and we want XNor(μ,σ2)X \sim \text{Nor}(\mu, \sigma^2). We can apply the following transformation:

Xμ+σZX \leftarrow \mu + \sigma Z

Note that we multiply ZZ by σ\sigma, not σ2\sigma^2!

Let's look at an example. Suppose we want to generate XNor(3,16)X \sim \text{Nor}(3,16), and we start with U=0.59U = 0.59. We know that:

X=μ+σZX = \mu + \sigma Z

Remember that Z=Φ1(U)Z = \Phi^{-1}(U):

X=μ+σΦ1(U)=μ+σΦ1(0.59)X = \mu + \sigma \Phi^{-1}(U) = \mu + \sigma \Phi^{-1}(0.59)

If we compute Φ1(0.59)\Phi^{-1}(0.59), using one of the methods above, we get Φ1(0.59)=0.2275\Phi^{-1}(0.59) = 0.2275. Now we can plug and chug:

X=3+4(0.2275)=3.91X = 3 + 4(0.2275) = 3.91

Matlab Demonstration

Let's look at some Matlab code to generate standard normal random variables from Unif(0,1) observations. Consider the following:

m = 100;
X = [];
for i=1:m
  X(i) = norminv(rand, 0, 1);
end
histogram(X)

First, we initialize an empty array, X, and then we iterate one hundred times. On each iteration, we first sample a Unif(0,1) random variable with the rand function. Next, we transform it into a Nor(0,1) random variable using the norminv function, which receives a real number, a mean, and a standard deviation and returns the transformed value. Finally, we store the result in X. After we finish iterating, we plot a histogram of the elements in X.

Let's look at the resulting histogram, which doesn't look like the famous bell curve since we are only drawing one hundred observations.

Let's now sample one hundred thousand uniforms and plot the corresponding histogram. This result looks much prettier.

Inverse Transform Method - Discrete Examples

In this lesson, we will use the inverse transform method to generate discrete random variables.

Discrete Examples

It's often best to construct a table for discrete distributions when looking to apply the inverse transform method, especially if the distribution doesn't take on too many values.

Bernoulli Example

Let's look at the Bern(pp) distribution. XBern(p)X \sim \text{Bern}(p) takes on the value of one with probability pp and the value zero with probability 1p1 - p.

The cdf, F(x), takes on the value 1 - p at x = 0 since P(X \leq 0) = P(X = 0) = 1 - p. At x = 1, F(x) = 1, since P(X \leq 1) = P(X = 0) + P(X = 1) = 1 - p + p = 1.

Let's construct a table to see how we might map the unit interval onto these two values of XX:

xP(X=x)F(x)U(0,1)’s01p1p[0,1p]1p1(1p,1]\begin{array}{cccc} x & P(X = x) & F(x) & \mathcal{U}(0,1)\text{'s} \\ \hline 0 & 1 - p & 1 - p & [0, 1 - p] \\ 1 & p & 1 & (1 - p, 1] \end{array}

Since P(X=0)P(X = 0) occurs with probability 1p1-p, we transform all uniforms on the range [0,1p][0, 1-p] to X=0X = 0. This leaves a range of uniforms of width pp, which map to X=1X = 1, from which we draw values with probability pp.

Here's a simple implementation. If we draw a uniform U \leq 1 - p, then we take X = 0; otherwise, we take X = 1. For example, if p = 0.75 and U = 0.13, we take X = 0 since U \leq (1 - p = 0.25).

Alternatively, we can construct the following "backwards" table, starting with X=1X = 1 instead of X=0X = 0, which isn't a strict application of the inverse transform method. However, this approach slices the uniforms in a much more intuitive manner. Here's the table:

xP(X=x)F(x)U(0,1)’s1p1[0,p]01p1p(p,1]\begin{array}{cccc} x & P(X = x) & F(x) & \mathcal{U}(0,1)\text{'s} \\ \hline 1 & p & 1 & [0, p] \\ 0 & 1 - p & 1 - p & (p, 1] \end{array}

Here, we take X=1X = 1 if UpU \leq p, which occurs with probability pp, and we take X=0X = 0 if U>pU > p, which occurs with probability 1p1 - p.
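As a sketch, this "backwards" rule is a one-liner in Matlab (p = 0.75 matches the earlier example):

p = 0.75;         % arbitrary example success probability
U = rand;
X = (U <= p);     % X = 1 with probability p, X = 0 otherwise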

Generic Example

Consider this slightly less-trivial pmf:

xP(X=x)F(x)U(0,1)’s10.60.6[0.0,0.6]2.50.30.9(0.6,0.9]40.11.0(0.9,1.0]\begin{array}{cccc} x & P(X = x) & F(x) & \mathcal{U}(0,1)\text{'s} \\ \hline -1 & 0.6 & 0.6 & [0.0, 0.6] \\ 2.5 & 0.3 & 0.9 & (0.6, 0.9] \\ 4 & 0.1 & 1.0 & (0.9, 1.0] \end{array}

Remember that, for a discrete distribution, F(xi)=P(X=xi)+F(xi1)F(x_i) = P(X = x_i) + F(x_{i-1}), where F(x0)=0F(x_0) = 0. Correspondingly, the range of uniforms that maps to xix_i is bounded by [F(xi1),F(xi)][F(x_{i-1}), F(x_i)]. For example, if U=0.63U = 0.63, we take X=2.5X = 2.5.

Discrete Uniform Example

Sometimes, there's an easy way to avoid constructing a table. For example, consider the discrete uniform distribution over {1,2,...,n}\{1,2,...,n\}, which has the following pmf:

P(X=k)=1n,k=1,2,...,nP(X = k) = \frac{1}{n}, \quad k = 1,2,...,n

We can think of this random variable as an nn-sided die toss. How do we transform a Unif(0,1) correctly? If we take a real number between zero and one and multiply it by nn, we get a real number between zero and nn. If we round up, we get an integer between one and nn. For example, if n=10n = 10 and U=0.376U = 0.376, then X=3.76=4X = \lceil 3.76 \rceil = 4, where \lceil\cdot\rceil is the ceiling/"round up" function.
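Here's the corresponding Matlab sketch (n = 10 matches the example):

n = 10;              % arbitrary example: a 10-sided die
U = rand;
X = ceil(n * U);     % discrete uniform on {1,2,...,n}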

Geometric Example

Now let's look at the geometric distribution. Intuitively, this distribution represents the number of Bern(pp) trials until the first success. Therefore, the pmf, f(k)f(k), represents the probability that the first success occurs on the kk-th try: k1k - 1 failures, with probability qk1q^{k-1}, followed by a single success, with probability pp.

Accordingly, this distribution has the following pmf and cdf:

f(k)=qk1p,F(k)=1qk,k=1,2,...,f(k) = q^{k-1}p, \quad F(k) = 1 - q^k, k = 1,2,...,

Let's set F(X)=UF(X) = U and solve for XX, finding the minimum kk such that 1qkU1 - q^k \geq U. After some algebra, we see that:

X=min[k:1qkU]=ln(1U)ln(1p)X = \min[k : 1 - q^k \geq U] = \left\lceil\frac{\ln(1-U)}{\ln(1-p)}\right\rceil

As we've seen in the past, we can also replace 1U1-U with UU:

X=min[k:1qkU]=ln(U)ln(1p)X = \min[k : 1 - q^k \geq U] = \left\lceil\frac{\ln(U)}{\ln(1-p)}\right\rceil

For instance, if p=0.3p = 0.3, and U=0.72U = 0.72, we obtain:

X=ln(1U)ln(1p)=ln(0.28)ln(0.7)=4X = \left\lceil\frac{\ln(1- U)}{\ln(1-p)}\right\rceil = \left\lceil\frac{\ln(0.28)}{\ln(0.7)}\right\rceil = 4
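Here's a minimal Matlab sketch of this formula (p = 0.3, as in the example):

p = 0.3;                              % arbitrary example success probability
U = rand;
X = ceil(log(1 - U) / log(1 - p));    % X ~ Geom(p)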

We can also generate a Geom(pp) random variable by counting Bern(pp) trials until we see a success. Suppose we want to generate XGeom(1/6)X \sim \text{Geom}(1/6). We can generate Bern(1/61/6) random variables until we see a one. For instance, if we generate 0,0,0,0,10,0,0,0,1, then X=5X = 5.

More generally, we generate a Geom(pp) random variable by counting the number of trials until UipU_i \leq p, which occurs with probability pp for each UiU_i. For example, if p=0.3p=0.3, and we draw U1=0.71U_1 = 0.71, U2=0.96U_2 = 0.96, and U3=0.12U_3 = 0.12, then X=3X=3.

Poisson Example

If we have a discrete distribution, like Pois(λ\lambda), which has an infinite number of values, we could write out table entries until the cdf is nearly one, generate a uniform, UU, and then search the table to find X=xi=F1(U)X = x_i = F^{-1}(U), such that U(F(xi1),F(xi)]U \in (F(x_{i-1}), F(x_i)]. Consider the following table:

xP(X=x)F(x)U(0,1)’sx1f(x1)F(x1)[0.0,F(x1)]x2f(x2)F(x2)(F(x1),F(x2)]x3f(x3)F(x3)(F(x2),F(x3)]\begin{array}{cccc} x & P(X = x) & F(x) & \mathcal{U}(0,1)\text{'s} \\ \hline x_1 & f(x_1) & F(x_1) & [0.0, F(x_1)] \\ x_2 & f(x_2) & F(x_2) & (F(x_1), F(x_2)] \\ x_3 & f(x_3) & F(x_3) & (F(x_2), F(x_3)] \\ \vdots \end{array}

For example, if we generate a uniform in the range (F(x1),F(x2)](F(x_1), F(x_2)], we select X=x2X = x_2.

Let's look at a concrete example, in which we will generate XPois(2)X \sim \text{Pois}(2), which has the following pmf:

f(x)=e22xx!,x=0,1,2,...f(x) = \frac{e^{-2}2^x}{x!}, \quad x = 0,1,2,...

We can construct the following table:

xP(X=x)F(x)U(0,1)’s00.13530.1353[0.0,0.1353]10.27060.4059(0.1353,0.4059]20.27060.6765(0.4059,0.6765]\begin{array}{cccc} x & P(X = x) & F(x) & \mathcal{U}(0,1)\text{'s} \\ \hline 0 & 0.1353 & 0.1353 & [0.0, 0.1353] \\ 1 & 0.2706 & 0.4059 & (0.1353, 0.4059] \\ 2 & 0.2706 & 0.6765 & (0.4059, 0.6765] \\ \vdots \end{array}

For example, if U=0.313U = 0.313, then X=1X = 1.
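As a sketch, here's how we might carry out this table lookup in Matlab by accumulating the cdf on the fly (lambda = 2, as in the example):

lambda = 2;                   % arbitrary example rate
U = rand;
x = 0;
F = exp(-lambda);             % F(0) = P(X = 0)
while U > F
  x = x + 1;
  F = F + exp(-lambda) * lambda^x / factorial(x);   % add P(X = x)
end
X = x;                        % X ~ Pois(lambda)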

Inverse Transform Method - Empirical Distributions (OPTIONAL)

In this lesson, we will look at how to apply the inverse transform method to unknown, empirical, continuous distributions. Here, we don't know F(x)F(x), so we have to approximate it. We will blend continuous and discrete strategies to transform these random variables.

Continuous Empirical Distributions

If we can't find a good theoretical distribution to model a certain random variable, or we don't have enough data, we can use the empirical cdf of the data, X1,X2,...,XnX_1, X_2,...,X_n:

F^n(x)number of Xi’sxn\hat F_n(x) \equiv \frac{\text{number of } X_i\text{'s} \leq x}{n}

Each XiX_i has a probability 1/n1/n of being drawn from our unknown continuous distribution. As a result, we can construct a cdf, F^n\hat F_n, where F^n(x)\hat F_n(x) is equal to the number of observations less than or equal to xx, divided by the total number of observations.

Note that F^n(x)\hat F_n(x) is a step function, which jumps by 1/n1/n at every XiX_i.

Even though X is continuous, the Glivenko-Cantelli Lemma states that \hat F_n(x) \to F(x) for all x as n \to \infty. As a result, \hat F_n(x) is a good approximation to the true cdf, F(x).

We can use the ARENA functions DISC and CONT to generate random variables from the empirical cdf's of discrete and continuous distributions, respectively.

To generate these random variables ourselves, let's first define the ordered points X_{(1)} \leq X_{(2)} \leq \cdots \leq X_{(n)}, also called order statistics. Note that we wrap the subscripts in parentheses to differentiate them from the order in which the observations arrived. For example, if X_1 = 4, X_2 = 1, and X_3 = 6, then X_{(1)} = 1, X_{(2)} = 4, and X_{(3)} = 6.

Here's what the empirical cdf looks like, shown in red below:

Notice that we step from zero to 1/31/3 at X=1X=1, from 1/31/3 to 2/32/3 at X=4X=4, and 2/32/3 to 11 at X=6X=6. If we were to take many observations, then the empirical cdf would approach the true cdf, shown above in green.

Linear Interpolation

Given that we only have a finite number, nn, of data points, we can attempt to transform the empirical cdf into a continuous function by using linear interpolation between the X(i)X_{(i)}'s. Here is the corresponding interpolated cdf (which is not the true cdf even though we denote it as F(x)F(x)):

F(x) = \left\{ \begin{matrix} 0 & \text{if } x < X_{(1)} \\ \frac{i - 1}{n - 1} + \frac{x - X_{(i)}}{(n-1)(X_{(i+1)} - X_{(i)})} & \text{if } X_{(i)} \leq x < X_{(i+1)}, \ i = 1, \ldots, n-1 \\ 1 & \text{if } x \geq X_{(n)} \end{matrix} \right.

If xx is less than the first order statistic, then F(x)=0F(x) = 0. Given our current observations, we have no evidence that we will encounter a value smaller than our current minimum. Similarly, if xx is greater than our last order statistic, then F(x)=1F(x) = 1. Again, we have no evidence that we will encounter a value greater than our current maximum. For all other points, we interpolate.

Here's the interpolation algorithm. First, we set F(X) = U \sim \mathcal U(0,1). Next, we set P = (n-1)U and I = \lceil P \rceil. Note that I is a discrete uniform random variable over \{1,2,...,n-1\}. Finally, we evaluate the following equation for X:

X=X(I)+(PI+1)(X(I+1)X(I))X = X_{(I)} + (P - I + 1)(X_{(I+1)} - X_{(I)})

Let's break down the above equation. X(I)X_{(I)} corresponds to our random starting order statistic. The expression PI+1P - I + 1 turns out to be Unif(0,1): we subtract an integer from the uniform from which it was rounded up and then add one. We multiply that uniform random value by X(I+1)X(I)X_{(I+1)} - X_{(I)} to move the appropriate distance between our starting point and the next order statistic.

For example, suppose X(1)=1X_{(1)} = 1, X(2)=4X_{(2)} = 4 and X(3)=6X_{(3)} = 6. If U=0.73U = 0.73, then P=(n1)U=1.46P = (n-1)U = 1.46 and I=P=2I = \lceil P \rceil = 2. Then:

X=X(I)+(PI+1)(X(I+1)X(I))=X(2)+(1.462+1)(X(3)X(2))=4+(0.46)(64)=4.92\begin{alignedat}{1} X &= X_{(I)} + (P - I + 1)(X_{(I+1)} - X_{(I)}) \\ &= X_{(2)} + (1.46 - 2 + 1)(X_{(3)} - X_{(2)}) \\ &= 4 + (0.46)(6 - 4) \\ &= 4.92 \end{alignedat}
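Here's a minimal Matlab sketch of the interpolation algorithm, reusing the three ordered observations from the example:

Xs = [1 4 6];                                  % ordered observations X_(1) <= ... <= X_(n)
n = length(Xs);
U = rand;                                      % e.g., U = 0.73 reproduces the example
P = (n - 1) * U;
I = max(1, ceil(P));                           % guard against the rare case P = 0
X = Xs(I) + (P - I + 1) * (Xs(I+1) - Xs(I));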

Let's see what the linearly interpolated cdf looks like, shown below in orange. Notice that, between X=1X = 1 and X=6X=6, we have two line segments: one from X=1X = 1 to X=4X=4, and one from X=4X=4 to X=6X=6.

Suppose we draw a uniform, U=0.73U = 0.73.

If we drop a line from the intersection point with the cdf vertically down, the xx-coordinate of the intersection with the xx-axis is F1(U)F^{-1}(U).

Alternatively, we can write the interpolated cdf explicitly by using the equation we saw previously and enumerating the two [X(i),X(i+1)][X_{(i)}, X_{(i+1)}] ranges:

F(x)={0+x12(41)if 1x<4(i = 1 case)12+x42(64)if 4x<6(i = 2 case)F(x) = \left\{ \begin{matrix} 0 + \frac{x-1}{2(4-1)} & \text{if } 1 \leq x < 4 \quad \text{(i = 1 case)} \\[2ex] \frac{1}{2} + \frac{x-4}{2(6-4)} & \text{if } 4 \leq x < 6 \quad \text{(i = 2 case)} \\ \end{matrix} \right.

If we set F(X)=UF(X) = U and solve for both cases, we have:

X={1+6Uif U<1/22+4Uif U1/2X = \left\{\begin{matrix} 1 + 6U & \text{if } U < 1/2 \\[2ex] 2 + 4U & \text{if } U \geq 1/2 \\ \end{matrix} \right.

Again, if U=0.73U = 0.73, then X=2+4(0.73)=4.92X = 2 + 4(0.73) = 4.92.

Convolution Method

In this lesson, we'll look at an alternative to the inverse transform method: the convolution method.

Binomial Example

The term convolution refers to adding things up. For example, if we add up nn iid Bern(pp) random variables, we get a Binomial(nn, pp) random variable:

Y=i=1n(XiBern(p))Bin(n,p)Y = \sum_{i=1}^n (X_i \sim \text{Bern}(p)) \sim \text{Bin}(n,p)

We already know how to generate Bern(pp) random variables using the inverse transform method. Suppose we have a collection of uniforms, U1,...,UniidU(0,1)U_1,...,U_n \overset{\text{iid}}{\sim} \mathcal{U}(0,1). If UipU_i \leq p, we set Xi=1X_i = 1; otherwise, Xi=0X_i = 0. If we repeat this process for all XiX_i and then add up the XiX_i's, we get YY.

For instance, suppose we want to generate YBin(3,0.4)Y \sim \text{Bin}(3, 0.4). Let's draw three uniforms: U1=0.63U_1 = 0.63, U2=0.17U_2 = 0.17, and U3=0.81U_3 = 0.81. We transform these uniforms into the respective Bern(pp) random variables: X1=0X_1 = 0, X2=1X_2 = 1, and X3=0X_3 = 0. If we add these values together, we get Y=1Y = 1.
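Here's a minimal Matlab sketch of this convolution approach (n = 3 and p = 0.4 match the example):

n = 3; p = 0.4;       % arbitrary example parameters
U = rand(1, n);       % n iid Unif(0,1) draws
Y = sum(U <= p);      % Y ~ Bin(n, p): sum of n Bernoulli(p) variates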

Triangular Example

Now let's turn to the Tria(0,1,2) distribution. Remember, we generated triangular random variables using the inverse transform method previously. Alternatively, we can add two iid uniforms together to generate a Tria(0,1,2). In other words, if U1,U2iidU(0,1)U_1, U_2 \overset{\text{iid}}{\sim} \mathcal U(0,1), then U1+U2Tria(0,1,2)U_1 + U_2 \sim \text{Tria}(0,1,2).

Let's look at a pictorial representation of this result.

In the Tria(0,1,2) distribution, values close to either end of the range are quite rare. Indeed, it is highly unlikely that two iid uniforms will both have a value very close to zero or one. Values closer to 0.5 are much more likely, and the resulting distribution, which has a maximum probability at X=1X = 1, reflects this fact.

This result should remind us of dice toss outcomes. Suppose we toss two six-sided dice. It's highly unlikely for us to see a two or a twelve, each of which occurs with probability 1/361/36, but it's much more common to see a seven, which occurs with probability 1/61/6. There are just more ways to roll a seven.

Erlang Example

Suppose X1,...,XniidExp(λ)X_1,..., X_n \overset{\text{iid}}{\sim} \text{Exp}(\lambda). By definition, the sum of these exponential random variables forms an Erlangn_n(λ\lambda) random variable:

Y=i=1nXiErlangn(λ)Y = \sum_{i=1}^n X_i \sim \text{Erlang}_n(\lambda)

Let's use the inverse transform method with convolution to express YY in terms of uniform random variables. First, let's remember how we transform uniforms into exponential random variables:

F(x)=1eλxU=1eλXX=1λln(U)\begin{alignedat}{1} F(x) &= 1 - e^{-\lambda x} \\ U &= 1 - e^{-\lambda X} \\ \vdots \\ X &= \frac{-1}{\lambda}\ln(U) \end{alignedat}

Let's rewrite our summation expression to reflect this transformation:

Y=i=1nXi=1λi=1nln(Ui)Y = \sum_{i=1}^n X_i = \frac{-1}{\lambda}\sum_{i=1}^n \ln(U_i)

Now, we can take advantage of the fact that ln(a)+ln(b)=ln(ab)\ln(a) + \ln(b) = \ln(ab) and replace nn natural log invocations with just one. Why might we do this? Natural logarithms are expensive to evaluate on a computer, so we want to reduce them where possible. Consider the following manipulation:

Y=i=1n[1λln(Ui)]=1λln(i=1nUi)Y = \sum_{i=1}^n \left[\frac{-1}{\lambda}\ln(U_i)\right] = \frac{-1}{\lambda}\ln\left(\prod_{i=1}^n U_i\right)
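Here's a minimal Matlab sketch of this trick, with arbitrary example values n = 4 and lambda = 2:

n = 4; lambda = 2;              % arbitrary example parameters
U = rand(1, n);
Y = -log(prod(U)) / lambda;     % Y ~ Erlang_n(lambda), using a single log call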

Desert Island Nor(0,1) Approximate Generator

Suppose we have the random variables U1,...,UniidU(0,1)U_1,...,U_n \overset{\text{iid}}{\sim} \mathcal U(0,1), and we let YY equal the summation of the UiU_i's: Y=i=1nUiY = \sum_{i=1}^n U_i. The central limit theorem tells us that, for large enough nn, YY is approximately normal. Furthermore, we know that the mean of YY is nE[Ui]nE[U_i] and the variance of YY is nVar(Ui)n\text{Var}(U_i). As a result, YNor(n/2,n/12)Y \approx \text{Nor}(n/2, n/12).

Let's choose n=12n=12 and assume that it's "large enough". Then YNor(6,1)Y \approx \text{Nor}(6, 1). If we subtract 66 from YY, then the resulting mean is 00, and we end up with a standard normal random variable:

Y6=i=112Ui6Nor(0,1)Y - 6 = \sum_{i=1}^{12} U_i - 6 \approx \text{Nor}(0,1)
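Here's the corresponding one-line Matlab sketch:

Z = sum(rand(1, 12)) - 6;    % approximately Nor(0,1)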

If X1,...,XnX_1,...,X_n are iid Geom(pp) random variables, then the sum of the XiX_i's is a negative binomial random variable:

i=1nXiNegBin(n,p)\sum_{i=1}^{n} X_i \sim \text{NegBin}(n,p)

If Z1,...,ZnZ_1,..., Z_n are iid Nor(0, 1) random variables, then sum of the squares of the ZiZ_i's is a χ2\chi^2 random variable:

i=1nZi2χ2(n)\sum_{i=1}^{n} Z_i^2 \sim \chi^2(n)

If X_1,...,X_n are iid Cauchy random variables, then the sample mean, \bar X, is also a Cauchy random variable. We might think that \bar X should be approximately normal for large n, but that is not the case: the Cauchy distribution has no finite mean or variance, so the central limit theorem does not apply.

Convolution Method DEMO

In this demo, we will look at some of the random variables we can generate using convolution. First, we'll look at how we generate a uniform distribution, which, admittedly, involves no convolution. Consider the following code:

n = 1;
m = 1000000;
Z = 0 * unifrnd(0,1,[1 m]);

for i=1:n
  Z = Z + unifrnd(0,1,[1 m]);
end
Z=Z/n;
hist(Z,100);

This code generates the following histogram, which indeed appears uniform.

If we change n from one to two, then Z becomes the sum of two uniforms. This code generates the following histogram, which appears triangular, as expected.

Let's change n to three and repeat. The following histogram appears to be taking on a normal shape.

Let's change n to 12 to emulate the "desert island" normal random variable generator we discussed previously. This histogram resembles the normal distribution even more closely.

Now, let's look at generating Exp(1) random variables. Consider the following code:

n = 1;
m = 1000000;
Z = 0 * unifrnd(0,1,[1 m]);

for i=1:n
  Z = Z - log(unifrnd(0,1,[1 m])); % This is the only line that changes
end
Z=Z/n;
hist(Z,100);

Here's the corresponding histogram.

Now, if we change n from one to two, then Z becomes the sum of two exponential random variables, itself an Erlang random variable. Here's the corresponding histogram.

Let's add n=30 exponential random variables, which the central limit theorem tells us will produce an approximately normal result. Here's the histogram. This histogram is clearly not the normal distribution, as it contains considerable right skew.

For a much better approximation, we likely need to add several hundred observations together. Here's the histogram for n=200.

Finally, let's look at generating Cauchy random variables. Consider the following code:

n = 1;
m = 1000;
Z = 0 * unifrnd(0,1,[1 m]);

for i=1:n
  Z = Z + normrnd(0.0, 1.0, [1 m]) ./ normrnd(0.0, 1.0, [1 m]); % This is the only line that changes
end
Z=Z/n;
hist(Z,100);

If we generate m = 1000 of these random variables, we get the following histogram. The Cauchy distribution has a large variance so, even though most of the observations are close to zero, there are a few observations far out on either side.

If we add n = 1000 Cauchy random variables, we might expect to see a histogram approximating the normal distribution. However, as we can see from the following plot, the Cauchy distribution violates the central limit theorem.

Acceptance-Rejection Method

In this lesson, we'll talk about the acceptance-rejection method, which is an alternative to the inverse transform method and the convolution technique we studied previously. In contrast to those two methods, acceptance-rejection is very general, and it always works regardless of the random variate we want to generate.

Acceptance-Rejection Method

The motivation for the acceptance-rejection method is that most cdf's cannot be inverted efficiently, which means that the inverse transform method has limited applicability.

The acceptance-rejection method samples from an "easier" distribution, which is close to but not identical to the one we want. Then, it adjusts for this discrepancy by "accepting" (keeping) only a certain proportion of the variates it generates from the approximating distribution.

The acceptance proportion is based on how close the approximate distribution is to the one we want. If the sampling distribution resembles the desired distribution closely, we keep a high proportion of the samples.

Let's look at a simple example. Suppose we want to generate XU(2/3,1)X \sim \mathcal U(2/3, 1). Of course, we'd usually use the inverse transform method for something so trivial, but let's see how we'd generate XX using acceptance-rejection.

Here's the algorithm. First, we generate UU(0,1)U \sim \mathcal U(0, 1). Next, we check to see if U2/3U \geq 2/3. If so, we accept UU and set X=UX=U, because the conditional distribution of XX given that U2/3U \geq 2/3 is U(2/3,1)\mathcal U (2/3, 1). If U<2/3U < 2/3, we reject the observation, and we return to step one, where we draw another uniform. Eventually, we'll get U2/3U \geq 2/3, at which point we stop.

Notation

Suppose we want to simulate a continuous random variable, XX, with pdf f(x)f(x), but XX is difficult to generate directly. Also suppose that we can easily generate a random variable that has a distinct pdf, h(x)t(x)/ch(x) \equiv t(x)/c, where cc is a constant and t(x)t(x) majorizes f(x).

When we say that t(x)t(x) majorizes f(x)f(x), we mean that t(x)f(x)t(x) \geq f(x), for all xx.

Note that only h(x)h(x), not t(x)t(x), is a pdf. Since t(x)f(x)t(x) \geq f(x), and the integral of f(x)f(x) over the real line equals one - by virtue of f(x)f(x) being a pdf - then the corresponding integral of t(x)t(x) must be greater than or equal to one. Since t(x)t(x) doesn't strictly integrate to one, it cannot be a pdf.

Let's look at the constant, cc. We define cc, which we assume to be finite, as the integral of t(x)t(x) over the real line, which we just said was greater than or equal to the corresponding integral of f(x)f(x):

cRt(x)dxRf(x)dx=1,c<c \equiv \int_{\mathbb{R}} t(x)dx \geq \int_{\mathbb{R}} f(x)dx = 1, \quad c < \infty

Theorem

Let's define a function g(x)f(x)/t(x)g(x) \equiv f(x)/t(x). We can notice that, since f(x)t(x)f(x) \leq t(x), 0g(x)10 \leq g(x) \leq 1 for all xx. Additionally, let's sample UU(0,1)U \sim \mathcal U(0,1), and let YY be a random variable (independent of UU), with the pdf we defined earlier: h(y)t(y)/ch(y) \equiv t(y) / c.

If Ug(Y)U \leq g(Y), then the conditional distribution of YY has pdf f(y)f(y). Remember, the random variable we are trying to generate has pdf f(y)f(y). In other words, we accept that YY has the pdf of the random variable we want when Ug(Y)U \leq g(Y).

This result suggests the following acceptance-rejection algorithm for generating a random variable XX with pdf f(x)f(x). First, we generate UU(0,1)U \sim \mathcal U(0,1) and YY with pdf h(y)h(y), independent of UU. If Ug(Y)U \leq g(Y), we accept YY and we return X=YX = Y; otherwise, we reject YY and start over, continuing until we produce an acceptable (U,Y)(U, Y) pair. We refer to the event Ug(Y)U \leq g(Y) as the acceptance event, which always happens eventually.

Visual Walkthrough

Consider the following pdf, outlined in green, which is the pdf of the random variable we want to generate.

Now consider t(x)t(x), shown in red below, which majorizes f(x)f(x). As we can see: t(x)f(x)t(x) \geq f(x), for all xx.

Finally, let's consider h(x)h(x), which is defined as t(x)/ct(x) / c, where cc is the area under t(x)t(x). As a result, h(x)h(x) is proportional to t(x)t(x), but has an area equal to one, which makes it a valid pdf.

Let's generate a point, YY, uniformly under t(x)t(x). We accept that point with probability g(Y)=f(Y)/t(Y)=f(Y)/ch(Y)g(Y) = f(Y)/t(Y) = f(Y)/ch(Y). For the point below, we can see the corresponding values of t(Y)t(Y) and f(Y)f(Y). As the distance between t(Y)t(Y) and f(Y)f(Y) decreases, the probability of accepting YY as a valid value of XX increases.

Proof of Acceptance-Rejection Method (OPTIONAL)

In this lesson, we will prove the acceptance-rejection method.

A-R Method Recap

Let's define g(x)f(x)/t(x)g(x) \equiv f(x)/t(x). Remember that f(x)f(x) is the pdf of the random variable, XX, that we wish to generate, and t(x)t(x) majorizes f(x)f(x). Let UU(0,1)U \sim \mathcal U(0,1), and let YY be a random variable, independent of UU, with pdf h(y)=t(y)/ch(y) = t(y)/c.

Note that t(y)t(y) is not a pdf, but we can transform it into a pdf by dividing it by cc - the area under t(y)t(y) - which guarantees that it integrates to one.

If Ug(Y)U \leq g(Y), we set X=YX=Y because the conditional distribution of YY, given Ug(Y)U \leq g(Y), is f(x)f(x). Again, we refer to Ug(Y)U \leq g(Y) as the acceptance event. We continue sampling UU and YY until we have an acceptance, at which point we set X=YX=Y and stop.

Proof

We need to prove that the value that comes out of this algorithm, which we claim has pdf f(x)f(x), indeed has pdf f(x)f(x).

Let AA be the acceptance event. The cdf of XX is, by definition:

P(Xx)P(X \leq x)

Given that we have experienced AA, we have set Y=XY = X. As a result, we can express the cdf of XX with a conditional probability concerning YY and AA:

P(Xx)=P(YxA)P(X \leq x) = P(Y \leq x | A)

We can expand the conditional probability:

P(X \leq x) = P(Y \leq x | A) = \frac{P(A, Y \leq x)}{P(A)} \quad (1)

Now, what's the probability of the acceptance event, AA, given YY? Well, from the definition of AA, we see that:

P(AY=y)=P(Ug(Y)Y=y)P(A|Y=y) = P(U \leq g(Y)|Y=y)

Since Y=yY=y, we can substitute yy for YY:

P(Ug(Y)Y=y)=P(Ug(y)Y=y)P(U \leq g(Y)|Y=y) = P(U \leq g(y)|Y=y)

Earlier, we stated that UU and YY are independent. As a result, information about YY gives us no information about UU, so:

P(Ug(y)Y=y)=P(Ug(y))P(U \leq g(y)|Y=y) = P(U \leq g(y))

Additionally, we know that UU is uniform, so, by definition, P(Ux)=x,0x1P(U \leq x) = x, 0 \leq x \leq 1. Since g(y)g(y) also has a range of [0,1][0,1], by definition, then:

P(Ug(y))=g(y)(2)P(U \leq g(y)) = g(y) \quad (2)

Now let's consider the joint probability P(A,Yx)P(A, Y \leq x). Here, we can use the law of total probability, also known as the standard conditioning argument:

P(A,Yx)=P(A,YxY=y)h(y)dyP(A, Y \leq x) = \int_{-\infty}^\infty P(A, Y \leq x | Y=y)h(y)dy

We can express YxY \leq x directly in the limits of integration, instead of in the conditional probability expression inside the integral. If we integrate over all yy, such that yx-\infty \leq y \leq x, then we are still only considering values of YY such that YxY \leq x:

P(A,Yx)=xP(AY=y)h(y)dyP(A, Y \leq x) = \int_{-\infty}^x P(A | Y=y)h(y)dy

We can substitute t(y)/c=h(y)t(y)/c = h(y):

P(A,Yx)=1cxP(AY=y)t(y)dyP(A, Y \leq x) = \frac{1}{c}\int_{-\infty}^x P(A | Y=y)t(y)dy

We might remember, from result (2)(2) above, that P(AY=y)=g(y)P(A|Y=y) = g(y). Therefore:

P(A,Yx)=1cxg(y)t(y)dyP(A, Y \leq x) = \frac{1}{c}\int_{-\infty}^x g(y)t(y)dy

Also remember that, by definition, g(y) = f(y)/t(y), so g(y)t(y) = f(y). Therefore:

P(A,Yx)=1cxf(y)dy(3)P(A, Y \leq x) = \frac{1}{c}\int_{-\infty}^x f(y)dy \quad (3)

If we let xx \to \infty, then we have:

P(A)=1cf(y)dyP(A) = \frac{1}{c}\int_{-\infty}^\infty f(y)dy

Notice that two things changed here. First, we changed the upper limit of integration from xx to \infty. Second, we changed the probability expression from P(A,Yx)P(A, Y \leq x) to P(A)P(A). Why? If xx \to \infty, P(Yx)1P(Y \leq x) \to 1, so P(A,Yx)P(A)P(A, Y \leq x) \to P(A).

We know that f(x)f(x) is a pdf, and the integral of a pdf over the real line is equal to one, so:

P(A)=1cf(y)dy=1c(4)P(A) = \frac{1}{c}\int_{-\infty}^\infty f(y)dy = \frac{1}{c} \quad (4)

Together, facts (1)(1), (3)(3), and (4)(4) imply that:

P(Xx)=P(A,Yx)P(A)=xf(y)dyP(X \leq x) = \frac{P(A, Y \leq x)}{P(A)} = \int_{-\infty}^x f(y)dy

Essentially, we have shown that the cdf of XX is equal to the integral of f(y)f(y), from -\infty to xx. If we take the derivative of both sides, we see that the pdf of XX is equal to f(x)f(x).

There are two main issues here. First, we need the ability to sample YY from h(y)h(y) quickly since we can't sample XX from f(x)f(x) easily. Second, cc must be small since the probability of the acceptance event is:

P(Ug(Y))=1cP(U \leq g(Y)) = \frac{1}{c}

Now, c is bounded below by one, with c = 1 only when t(x) = f(x) for all x, so we want c to be as close to one as possible. If so, then:

P(Ug(Y))=1c1P(U \leq g(Y)) = \frac{1}{c} \approx 1

If c1c \approx 1, then we accept almost every YY we sample. If cc is very large, then 1/c1/c is very small, and we will likely have to draw many (U,Y)(U, Y) pairs before acceptance. In fact, the number of trials until "success", defined as Ug(Y)U \leq g(Y), is Geom(1/c1/c), which has an expected value of cc trials.

A-R Method - Continuous Examples

In this lesson, we will look at some examples of applying the acceptance-rejection method on continuous distributions. As we said, acceptance-rejection is a general method that always works, even when other methods may be difficult to apply.

Theorem Refresher

Define g(x)f(x)/t(x)g(x) \equiv f(x)/t(x), and note that 0g(x)10 \leq g(x) \leq 1 for all xx. We sample UU(0,1)U \sim \mathcal U(0,1) and YY from pdf h(y)=t(y)/ch(y) = t(y)/c, where t(y)t(y) majorizes f(x)f(x) and cc is the integral of t(y)t(y) over the real line. If Ug(Y)U \leq g(Y), then YY has the conditional pdf f(y)f(y).

Here's the basic algorithm, which we repeat until acceptance. First, we draw UU from U(0,1)\mathcal U(0,1) and YY from h(y)h(y) independent of UU. If Ug(Y)U \leq g(Y), we return XYX \leftarrow Y. Otherwise, we repeat, sampling UU and YY again.

Example

Let's generate a random variable with pdf f(x)=60x3(1x)2,0x1f(x) = 60x^3(1-x)^2, 0 \leq x \leq 1. This pdf is shown below.

We can't invert the corresponding cdf analytically, so we can't use the inverse transform method; instead, we'll use acceptance-rejection. We can use some calculus to determine that the maximum of f(x) occurs at x = 0.6: f(0.6) = 2.0736. With this knowledge, we can use the constant function t(x) = 2.0736 as a basic majorizer.

Remember that we said we wanted cc to be as close to one as possible so that we accept samples from h(y)h(y) with high probability. We know that cc equals the integral of t(x)t(x) from zero to one, so, in this case, c=2.0736c = 2.0736. All this to say, t(x)t(x) is a relatively inefficient majorizer.

Like we said, h(x) = t(x)/c. In this case, h(x) = 1, since t(x) = 2.0736 = c. This result means that Y is a Unif(0,1) random variable, since its pdf is identically one on [0,1].

Finally, let's compute g(x)g(x):

g(x)=f(x)t(x)=60x3(1x)22.0736g(x) = \frac{f(x)}{t(x)} = \frac{60x^3(1-x)^2}{2.0736}

Let's look at a simple example. Let's draw two uniforms, U=0.13U = 0.13 and Y=0.25Y = 0.25. If we plug and chug, we see that g(Y)0.25g(Y) \approx 0.25. Therefore, Ug(Y)U \leq g(Y) and we take X0.25X \leftarrow 0.25.

A-R Method - Continuous Examples DEMO

Previous Example

Let's demo the acceptance-rejection example we looked at previously. We wanted to generate XX with pdf f(x)=60x3(1x)2,0x1f(x) = 60x^3(1-x)^2, 0 \leq x \leq 1. Let's recall g(x)g(x):

g(x)=f(x)t(x)=60x3(1x)22.0736g(x) = \frac{f(x)}{t(x)} = \frac{60x^3(1-x)^2}{2.0736}

Additionally, remember that our acceptance event is Ug(Y)U \leq g(Y). Consider the following code:

m = 10000;
X = [];
for i=1:m
  U = 1; % Initialize to known failing condition
  Y = 0; % Initialize to known failing condition
  while U > 60 * (Y^3) * (1-Y)^2 / 2.0736
    U = rand;
    Y = rand;
  end
  X(i) = Y;
end
histogram(X)

This code iterates for m=10000 iterations and, on each iteration, generates Unif(0,1) random variables, Y and U, until the acceptance event occurs. Once it does, we store away the accepted value of Y and proceed to the next iteration. Finally, we generate a histogram of the m accepted values of Y.

Let's look at the generated histogram after 10,000 samples.

Here's the histogram after 100,000 samples. As we can see, the histogram of generated values looks quite close to the graph of f(x)f(x).

Half-Normal Example

In this example, we are going to generate a standard half-normal random variable, which has the following pdf:

f(x)=22πex22,x0f(x) = \frac{2}{\sqrt{2\pi}}e^{\frac{-x^2}{2}}, \quad x \geq 0

This random variable closely resembles a standard normal random variable, except that we've "flipped" the negative portion of the distribution over the yy-axis and doubled f(x)f(x), for all x0x \geq 0.

We can use the following majorizing function, t(x)t(x), as it is always greater than f(x)f(x):

t(x)=2eπexf(x)t(x) = \sqrt{\frac{2e}{\pi}}e^{-x} \geq f(x)

If we take the integral of t(x)t(x) over the domain of f(x)f(x), we get cc:

c=0t(x)dx=2eπc = \int_0^{\infty} t(x)dx = \sqrt{\frac{2e}{\pi}}

Now, let's compute h(x)h(x):

h(x)=t(x)c=exh(x) = \frac{t(x)}{c} = e^{-x}

Let's remember the pdf for an exponential random variable:

f(x) = \lambda e^{-\lambda x}, \quad x \geq 0

We can see that h(x)h(x) is simply the pdf of an Exp(1) random variable, which we can generate easily!

Finally, let's look at g(x)g(x):

g(x)=f(x)t(x)=e(x1)2/2g(x) = \frac{f(x)}{t(x)} = e^{-(x-1)^2/2}

To generate a half normal, we simply generate UU(0,1)U \sim \mathcal U(0,1) and YExp(1)Y \sim \text{Exp}(1) and accept YY if Ug(Y)U \leq g(Y).

We can use the half-normal result to generate a Nor(0,1) random variable. We simply have to "flip back" half of the XX values over the yy-axis. Given UU(0,1)U \sim \mathcal U(0,1) and XX from the half-normal distribution, we can see that:

Z={XU1/2XU>1/2Nor(0,1)Z = \left\{ \begin{matrix} -X & U \leq 1/2 \\ X & U > 1/2 \end{matrix} \right. \sim \text{Nor}(0,1)

As always, we can generate a Nor(μ\mu, σ2\sigma^2) by applying the transformation μ+σZ\mu + \sigma Z.
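Here's a minimal Matlab sketch of the whole procedure, combining the acceptance-rejection step for the half-normal with the random sign flip:

% Accept-reject a half-normal variate
while true
  U = rand;
  Y = -log(rand);                   % Y ~ Exp(1)
  if U <= exp(-(Y - 1)^2 / 2)       % acceptance event U <= g(Y)
    X = Y;                          % X ~ half-normal
    break
  end
end
% Flip half of the values back over the y-axis
if rand <= 0.5
  Z = -X;
else
  Z = X;
end                                 % Z ~ Nor(0,1)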

A-R Method - Poisson Distribution

In this lesson, we will use a method similar to acceptance-rejection to generate a discrete random variable.

Poisson Distribution

The Poisson distribution has the following pmf:

P(X=n)=eλλnn!,n=0,1,...P(X=n) = e^{-\lambda}\frac{\lambda^n}{n!}, \quad n = 0, 1,...

We'll use a variation of the acceptance-rejection method here to generate a realization of XX. The algorithm will go through a series of equivalent statements to arrive at a rule that gives X=nX = n for some nn.

Remember that, by definition, X=nX = n if and only if we observe exactly nn arrivals from a Poisson(λ\lambda) process in one time unit. We can define AiA_i as the iith interarrival time from a Pois(λ\lambda) process: the time between arrivals i1i-1 and ii.

Like we said, X=nX=n if and only if we see exactly nn Pois(λ\lambda) arrivals by t=1t = 1. We can express this requirement as an inequality of sums:

X=n    i=1nAi1<i=1n+1AiX = n \iff \sum_{i=1}^n A_i \leq 1 < \sum_{i=1}^{n+1} A_i

In other words, the sum of the first nn interarrival times must be less than or equal to one, and, correspondingly, the sum of the first n+1n+1 interarrival times must be greater than one. Put another way, the nnth arrival occurs by time t=1t=1, but the (n+1)(n+1)th arrival occurs after time t=1t=1.

We might recall that interarrival times from a Pois(λ\lambda) are generated from an Exp(λ\lambda) distribution, and we know how to transform uniforms into Exp(λ\lambda) random variables:

X=n    i=1n[1λlnUi]1<i=1n+1[1λlnUi]X = n \iff \sum_{i=1}^n \left[\frac{-1}{\lambda}\ln U_i\right] \leq 1 < \sum_{i=1}^{n+1} \left[\frac{-1}{\lambda}\ln U_i\right]

As we saw previously, we can move the natural logarithm outside of the sum and convert the sum to a product: ln(a)+ln(b)=ln(ab)\ln(a) + \ln(b) = \ln(ab). Consider:

X=n    1λln(i=1nUi)1<1λln(i=1n+1Ui)X = n \iff \frac{-1}{\lambda}\ln\left(\prod_{i=1}^nU_i\right) \leq 1 < \frac{-1}{\lambda}\ln\left(\prod_{i=1}^{n+1}U_i\right)

Let's multiply through by λ-\lambda - make sure to flip the inequalities when multiplying by a negative - and raise the whole thing by ee:

X=n    i=1nUieλ>i=1n+1UiX = n \iff \prod_{i=1}^nU_i \geq e^{-\lambda} > \prod_{i=1}^{n+1}U_i

To generate a Pois(λ\lambda) random variable, we will generate n+1n+1 uniforms until the above inequality holds, and then we'll set X=nX=n.

Poisson Generation Algorithm

Let's set a = e^{-\lambda}, p = 1, and X = -1. Note that even though we set X to -1, the first thing we do is increment it. As long as p \geq a, we generate a uniform, U, update the running product, p = pU, and increment X by one. Once p < a, we return X.
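Here's a minimal Matlab sketch of this algorithm (lambda = 2 is an arbitrary example value):

lambda = 2;          % arbitrary example rate
a = exp(-lambda);
p = 1;
X = -1;
while p >= a         % keep multiplying uniforms until the product drops below a
  U = rand;
  p = p * U;
  X = X + 1;
end                  % X ~ Pois(lambda)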

Example

Let's obtain a Pois(2) random variable. We will sample uniforms until the following condition holds:

eλ=e2=0.1353>i=1n+1Uie^{-\lambda} = e^{-2} = 0.1353 > \prod_{i=1}^{n+1}U_i

Consider the following table:

nUn+1i=1n+1UiStop?00.39110.3911No10.94510.3696No20.50330.1860No30.70030.1303Yes\begin{array}{cccc} n & U_{n+1} & \prod_{i=1}^{n+1}U_i & \text{Stop?} \\ \hline 0 & 0.3911 & 0.3911 & \text{No} \\ 1 & 0.9451 & 0.3696 & \text{No} \\ 2 & 0.5033 & 0.1860 & \text{No} \\ 3 & 0.7003 & 0.1303 & \text{Yes} \end{array}

From our iterative sampling above, we see that we take X=3X=3 since the inequality above holds for n=3n=3.

Remark

It turns out that the expected number of U's required to generate one realization of X is E[X+1] = \lambda + 1. For example, if \lambda = 8.3, then we need to sample, on average, 9.3 uniforms before we realize X.

This sampling process becomes tedious; thankfully, for large values of λ\lambda, we can take a shortcut. When λ20\lambda \geq 20, then XX is approximately Nor(λ\lambda, λ\lambda). We can standardize XX to get a Nor(0,1) random variable:

XλλNor(0,1)\frac{X - \lambda}{\sqrt \lambda} \approx \text{Nor}(0,1)

Given this fact, we can use the following algorithm to generate a Pois(λ\lambda) random variable. First, generate ZZ from Nor(0,1). Next, return:

X=max(0,λ+λZ+0.5)X = \max(0, \lfloor \lambda + \sqrt \lambda Z + 0.5\rfloor)

Let's walk through the above expression. The term λZ\sqrt \lambda Z turns our Nor(0,1) random variable into a Nor(0, λ\lambda) random variable. If we add λ\lambda, we get a Nor(λ\lambda, λ\lambda) random variable.

Remember that the normal distribution is continuous, and the Poisson distribution is discrete. We need to discretize our real-valued transformation of ZZ, which we accomplish by rounding down. However, if we always round down, then we will underestimate XX, on average.

We perform a continuity correction by adding 0.50.5 before we round, which means that if λ+λZ\lambda + \sqrt \lambda Z is between two numbers, we choose the larger number with probability 0.50.5 and the lower number with probability 0.50.5.

Finally, we have to take the maximum of our computed value and zero. Why? The standard normal distribution can return negative values, but the Poisson can only return zero or larger.

Let's look at a simple example. If \lambda = 30 and Z = 1.46, then X = \lfloor 30.5 + \sqrt{30}(1.46) \rfloor = 38.
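Here's a minimal Matlab sketch of this approximate generator (lambda = 30 is an arbitrary example value; norminv supplies the Nor(0,1) variate, as in the earlier demo):

lambda = 30;                                          % arbitrary large rate
Z = norminv(rand, 0, 1);                              % Z ~ Nor(0,1)
X = max(0, floor(lambda + sqrt(lambda)*Z + 0.5));     % approximately Pois(lambda)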

Of course, we could have also used the discrete version of the inverse transform method to generate Pois(λ\lambda) random variables by tabling the cdf values and mapping them to the appropriate uniform ranges.
