Calculus, Probability, and Statistics Primers, Continued
Conditional Expectation (OPTIONAL)
In this lesson, we are going to continue our exploration of conditional expectation and look at several cool applications. This lesson will likely be the toughest one for the foreseeable future, but don't panic!
Conditional Expectation Recap
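Let's revisit the conditional expectation of Y given X=x. The definition of this expectation is as follows:
$$E[Y \mid X = x] = \begin{cases} \sum_y y f(y \mid x) & \text{discrete} \\ \int_{\mathbb{R}} y f(y \mid x)\,dy & \text{continuous} \end{cases}$$
For example, suppose $f(x,y) = 21x^2y/4$ for $x^2 \le y \le 1$. Then, by definition: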
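$$f(y \mid x) = \frac{f(x,y)}{f_X(x)}$$
We calculated the marginal pdf, $f_X(x)$, previously, as the integral of $f(x,y)$ over all possible values of $y \in [x^2, 1]$. We can plug in $f_X(x)$ and $f(x,y)$ below:
$$f(y \mid x) = \frac{\frac{21}{4}x^2 y}{\frac{21}{8}x^2(1-x^4)} = \frac{2y}{1-x^4}, \quad x^2 \le y \le 1$$
Given $f(y \mid x)$, we can now compute $E[Y \mid X = x]$:
$$E[Y \mid X = x] = \int_{\mathbb{R}} y \cdot \left(\frac{2y}{1-x^4}\right) dy$$
We adjust the limits of integration to match the limits of y:
$$E[Y \mid X = x] = \int_{x^2}^{1} y \cdot \left(\frac{2y}{1-x^4}\right) dy$$
Now, complete the integration:
$$E[Y \mid X = x] = \int_{x^2}^{1} \frac{2y^2}{1-x^4}\,dy$$
$$E[Y \mid X = x] = \frac{2}{1-x^4}\int_{x^2}^{1} y^2\,dy$$
$$E[Y \mid X = x] = \frac{2}{1-x^4} \cdot \frac{y^3}{3}\Big|_{x^2}^{1}$$
$$E[Y \mid X = x] = \frac{2y^3}{3(1-x^4)}\Big|_{x^2}^{1} = \frac{2(1-x^6)}{3(1-x^4)}$$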
Double Expectations
We just looked at the expected value of Y given a particular value X=x. Now we are going to average the expected value of Y over all values of X. In other words, we are going to take the average expected value of all the conditional expected values, which will give us the overall population average for Y.
The theorem of double expectations states that the expected value of the expected value of Y given X is the expected value of Y. In other words:
E[E(Y∣X)]=E[Y]
Let's look at E[Y∣X]. We can use the formula that we used to calculate E[Y∣X=x] to find E[Y∣X], replacing x with X. Let's look back at our conditional expectation from the previous slide:
$$E[Y \mid X = x] = \frac{2(1-x^6)}{3(1-x^4)}$$
If we replace the fixed value $x$ with the random variable $X$, we get the following expression:
$$E[Y \mid X] = \frac{2(1-X^6)}{3(1-X^4)}$$
What does this mean? $E[Y \mid X]$ is itself a random variable that is a function of the random variable X. Let's call this function h:
$$h(X) = \frac{2(1-X^6)}{3(1-X^4)}$$
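We now have to calculate $E[h(X)]$, which we can accomplish using the definition of LOTUS:
$$E[E[Y \mid X]] = E[h(X)] = \int_{\mathbb{R}} h(x) f_X(x)\,dx = \int_{\mathbb{R}} \left(\int_{\mathbb{R}} y f(y \mid x)\,dy\right) f_X(x)\,dx$$
We can rearrange the right-hand side. After swapping the order of integration, we can move $y$ outside of the inner integral since it is a constant value when we integrate with respect to $x$:
$$E[E[Y \mid X]] = \int_{\mathbb{R}} \int_{\mathbb{R}} y f(y \mid x) f_X(x)\,dx\,dy = \int_{\mathbb{R}} y \int_{\mathbb{R}} f(y \mid x) f_X(x)\,dx\,dy$$
Remember now the definition for the conditional pdf:
$$f(y \mid x) = \frac{f(x,y)}{f_X(x)}; \quad f(y \mid x) f_X(x) = f(x,y)$$
We can substitute in $f(x,y)$ for $f(y \mid x) f_X(x)$:
$$E[E[Y \mid X]] = \int_{\mathbb{R}} y \int_{\mathbb{R}} f(x,y)\,dx\,dy$$
Let's remember the definition for the marginal pdf of Y:
$$f_Y(y) = \int_{\mathbb{R}} f(x,y)\,dx$$
Let's substitute:
$$E[E[Y \mid X]] = \int_{\mathbb{R}} y f_Y(y)\,dy$$
Of course, the expected value of Y, $E[Y]$, equals:
$$E[Y] = \int_{\mathbb{R}} y f_Y(y)\,dy$$
Thus:
$$E[E[Y \mid X]] = E[Y]$$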
Example
Let's apply this theorem using our favorite joint pdf: $f(x,y) = 21x^2y/4$, $x^2 \le y \le 1$. Through previous examples, we know $f_X(x)$, $f_Y(y)$ and $E[Y \mid x]$:
$$f_X(x) = \frac{21}{8}x^2(1-x^4)$$
$$f_Y(y) = \frac{7}{2}y^{5/2}$$
$$E[Y \mid x] = \frac{2(1-x^6)}{3(1-x^4)}$$
We are going to look at two ways to compute $E[Y]$. First, we can just use the definition of expected value and integrate the product $y f_Y(y)$ over the real line:
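$$E[Y] = \int_{\mathbb{R}} y f_Y(y)\,dy$$
$$E[Y] = \int_0^1 y \cdot \frac{7}{2}y^{5/2}\,dy$$
$$E[Y] = \int_0^1 \frac{7}{2}y^{7/2}\,dy$$
$$E[Y] = \frac{7}{2}\int_0^1 y^{7/2}\,dy$$
$$E[Y] = \frac{7}{2} \cdot \frac{2}{9}y^{9/2}\Big|_0^1 = \frac{7}{9}y^{9/2}\Big|_0^1 = \frac{7}{9}$$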
Now, let's calculate E[Y] using the double expectation theorem we just learned:
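$$E[Y] = E[E(Y \mid X)] = \int_{\mathbb{R}} E(Y \mid x) f_X(x)\,dx$$
$$E[Y] = \int_{-1}^{1} \frac{2(1-x^6)}{3(1-x^4)} \times \frac{21}{8}x^2(1-x^4)\,dx$$
$$E[Y] = \frac{42}{24}\int_{-1}^{1} \frac{1-x^6}{1-x^4} \times x^2(1-x^4)\,dx$$
$$E[Y] = \frac{42}{24}\int_{-1}^{1} x^2(1-x^6)\,dx$$
$$E[Y] = \frac{42}{24}\int_{-1}^{1} (x^2 - x^8)\,dx$$
$$E[Y] = \frac{42}{24}\left(\frac{x^3}{3} - \frac{x^9}{9}\right)\Big|_{-1}^{1}$$
$$E[Y] = \frac{42}{24}\left(\frac{3x^3 - x^9}{9}\right)\Big|_{-1}^{1}$$
$$E[Y] = \frac{42}{24}\left(\frac{3 - 1 - (-3 + 1)}{9}\right) = \frac{42}{24} \cdot \frac{4}{9} = \frac{7}{9}$$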
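As a rough numerical cross-check of the two calculations, the sketch below approximates both integrals with a simple midpoint rule. The helper `integrate` and the function names are illustrative choices, not part of the original notes, and the step count is an arbitrary assumption.

```python
def integrate(g, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f_X(x):
    # Marginal pdf of X on [-1, 1]
    return 21 / 8 * x**2 * (1 - x**4)

def f_Y(y):
    # Marginal pdf of Y on [0, 1]
    return 7 / 2 * y**2.5

def cond_mean(x):
    # E[Y | X = x] = 2(1 - x^6) / (3(1 - x^4)), for -1 < x < 1
    return 2 * (1 - x**6) / (3 * (1 - x**4))

# Route 1: integrate y * f_Y(y) directly.
direct = integrate(lambda y: y * f_Y(y), 0.0, 1.0)
# Route 2: average the conditional mean over the distribution of X.
double = integrate(lambda x: cond_mean(x) * f_X(x), -1.0, 1.0)

print(direct, double, 7 / 9)
```

Both approximations should land very close to 7/9. The midpoint rule never evaluates the integrand at the endpoints, so the removable singularity of the conditional mean at x = ±1 is not an issue here.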
Mean of the Geometric Distribution
In this application, we are going to see how we can use double expectation to calculate the mean of a geometric distribution.
Let Y equal the number of coin flips until a head, H, appears (counting the flip that lands heads), where $P(H) = p$. Thus, Y is distributed as a geometric random variable parameterized by p: $Y \sim \text{Geom}(p)$. We know that the pmf of Y is $f_Y(y) = P(Y = y) = (1-p)^{y-1}p$, $y = 1, 2, \ldots$. In other words, $P(Y = y)$ is the product of the probability of $y-1$ failures and the probability of one success.
Let's calculate the expected value of Y using the summation equation we've used previously (take the result on faith):
$$E[Y] = \sum_y y f_Y(y) = \sum_{y=1}^{\infty} y(1-p)^{y-1}p = \frac{1}{p}$$
Now we are going to use double expectation and a standard one-step conditioning argument to compute E[Y]. First, let's define X=1 if the first flip is H and X=0 otherwise. Let's pretend that we have knowledge of the first flip. We don't really have this knowledge, but we do know that the first flip can either be heads or tails: P(X=1)=p,P(X=0)=1−p.
Let's remember the double expectation formula:
$$E[Y] = E[E(Y \mid X)] = \sum_x E(Y \mid x) f_X(x)$$
What are the x-values? X can only equal 0 or 1, so:
E[Y]=E(Y∣X=0)P(X=0)+E(Y∣X=1)P(X=1)
Now, if X=0, the first flip was tails, and I have to start counting all over again. The expected number of flips I have to make before I see heads is $E[Y]$. However, I have already flipped once, and I flipped tails: that's what X=0 means. So, the expected number of flips I need, given that I already flipped tails, is $1 + E[Y]$; that is, $E[Y \mid X=0] = 1 + E[Y]$. What is $P(X=0)$? It's just $1-p$. Thus:
$$E[Y \mid X=0]P(X=0) = (1 + E[Y])(1-p)$$
Now, if X=1, the first flip was heads. I won! Given that X=1, the expected value of Y is one. If I know that I flipped heads on the first try, the expected number of trials until I flip heads is that one trial: $E[Y \mid X=1] = 1$. What is $P(X=1)$? It's just $p$. Thus:
$$E[Y \mid X=1]P(X=1) = (1)(p) = p$$
Let's solve for E[Y]:
E[Y]=(1+E[Y])(1−p)+p
E[Y]=1+E[Y]−p−pE[Y]+p
E[Y]=1+E[Y]−pE[Y]
$$pE[Y] = 1; \quad E[Y] = \frac{1}{p}$$
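To see the 1/p result empirically, here is a small simulation sketch. The function names, the choice p = 0.25, the trial count, and the seed are all assumptions made for illustration.

```python
import random

def flips_until_head(p, rng):
    """Count flips up to and including the first head."""
    flips = 1
    while rng.random() >= p:  # a tail occurs with probability 1 - p
        flips += 1
    return flips

def estimate_mean(p, trials=200_000, seed=0):
    """Monte Carlo estimate of E[Y] for Y ~ Geom(p)."""
    rng = random.Random(seed)
    return sum(flips_until_head(p, rng) for _ in range(trials)) / trials

p = 0.25
estimate = estimate_mean(p)
print(estimate, 1 / p)
```

The estimate should sit close to 1/p = 4, with the gap shrinking as the number of trials grows.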
Computing Probabilities by Conditioning
Let A be some event. We define the random variable Y=1 if A occurs, and Y=0 otherwise. We refer to Y as an indicator function of A; that is, the value of Y indicates the occurrence of A. The expected value of Y is given by:
$$E[Y] = \sum_y y f_Y(y)$$
Let's enumerate the y-values:
E[Y]=0(P(Y=0))+1(P(Y=1))=P(Y=1)
What is P(Y=1)? Well, Y=1 when A occurs, so P(Y=1)=P(A)=E[Y]. Indeed, the expected value of an indicator function is the probability of the corresponding event.
Let's look at an implication of the above result. By definition:
P[A]=E[Y]=E[E(Y∣X)]
Using LOTUS:
P[A]=∫RE[Y∣X=x]dFX(x)
Since we saw that E[Y∣X=x]=P(A∣X=x), then:
P[A]=∫RP(A∣X=x)dFX(x)
Theorem
The result above implies that, if X and Y are independent, continuous random variables, then:
P(Y<X)=∫RP(Y<x)fX(x)dx
To prove, let A={Y<X}. Then:
P[A]=∫RP(A∣X=x)dFX(x)
Substitute A={Y<X}:
P[A]=∫RP(Y<X∣X=x)dFX(x)
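What's $P(Y < X \mid X = x)$? In other words, for a given X=x, what's the probability that Y<X? Because X and Y are independent, knowing X=x does not change the distribution of Y, so this is just $P(Y < x)$:
$$P[A] = \int_{\mathbb{R}} P(Y < x)\,dF_X(x)$$
$$P[A] = P(Y < X) = \int_{\mathbb{R}} P(Y < x) f_X(x)\,dx, \quad \text{since } dF_X(x) = f_X(x)\,dx$$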
Example
Suppose we have two independent random variables, $X \sim \text{Exp}(\mu)$ and $Y \sim \text{Exp}(\lambda)$. Then:
P(Y<X)=∫RP(Y<x)fX(x)dx
Note that P(Y<x) is the cdf of Y at x: FY(x). Thus:
P(Y<X)=∫RFY(x)fX(x)dx
Since X and Y are both exponentially distributed, we know that they have the following pdf and cdf, by definition:
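$$f(x; \lambda) = \lambda e^{-\lambda x}, \quad x \ge 0$$
$$F(x; \lambda) = 1 - e^{-\lambda x}, \quad x \ge 0$$
Let's substitute these values in, adjusting the limits of integration appropriately:
$$P(Y < X) = \int_0^{\infty} \left(1 - e^{-\lambda x}\right)\left(\mu e^{-\mu x}\right) dx$$
Let's rearrange:
$$P(Y < X) = \mu \int_0^{\infty} \left(e^{-\mu x} - e^{-\lambda x - \mu x}\right) dx$$
$$P(Y < X) = \mu \left[\int_0^{\infty} e^{-\mu x}\,dx - \int_0^{\infty} e^{-\lambda x - \mu x}\,dx\right]$$
Let $u_1 = -\mu x$. Then $du_1 = -\mu\,dx$. Let $u_2 = -\lambda x - \mu x$. Then $du_2 = -(\lambda + \mu)\,dx$. Keeping the limits in terms of $x$:
$$P(Y < X) = \mu \left[-\int_{x=0}^{x=\infty} \frac{e^{u_1}}{\mu}\,du_1 + \int_{x=0}^{x=\infty} \frac{e^{u_2}}{\lambda + \mu}\,du_2\right]$$
Now we can integrate:
$$P(Y < X) = \mu \left[\int_{x=0}^{x=\infty} \frac{e^{u_2}}{\lambda + \mu}\,du_2 - \int_{x=0}^{x=\infty} \frac{e^{u_1}}{\mu}\,du_1\right]$$
$$P(Y < X) = \mu \left[\frac{e^{u_2}}{\lambda + \mu} - \frac{e^{u_1}}{\mu}\right]_{x=0}^{x=\infty}$$
$$P(Y < X) = \mu \left[\frac{e^{-\lambda x - \mu x}}{\lambda + \mu} - \frac{e^{-\mu x}}{\mu}\right]_0^{\infty}$$
$$P(Y < X) = \mu \left[0 - \frac{1}{\lambda + \mu} + \frac{1}{\mu}\right]$$
$$P(Y < X) = \mu \left[\frac{1}{\mu} - \frac{1}{\lambda + \mu}\right]$$
$$P(Y < X) = \frac{\mu}{\mu} - \frac{\mu}{\lambda + \mu}$$
$$P(Y < X) = \frac{\lambda + \mu}{\lambda + \mu} - \frac{\mu}{\lambda + \mu} = \frac{\lambda}{\lambda + \mu}$$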
As it turns out, this result makes sense because X and Y correspond to arrival times in Poisson processes, and μ and λ are the arrival rates. For example, suppose that X is the time until the next woman arrives at a store, and Y is the time until the next man arrives. If women are coming in at a rate of three per hour ($\mu = 3$) and men are coming in at a rate of nine per hour ($\lambda = 9$), then the probability that a man arrives before a woman is $P(Y < X) = 9/(9+3) = 3/4$.
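The closed form $P(Y < X) = \lambda/(\lambda + \mu)$ can be sanity-checked by simulation. The sketch below assumes independent draws with illustrative rates $\lambda = 9$ and $\mu = 3$; the function name, trial count, and seed are arbitrary choices.

```python
import random

def prob_y_before_x(lam, mu, trials=200_000, seed=1):
    """Estimate P(Y < X) for independent Y ~ Exp(lam) and X ~ Exp(mu)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        y = rng.expovariate(lam)  # expovariate takes the rate, so mean is 1/lam
        x = rng.expovariate(mu)
        wins += y < x
    return wins / trials

lam, mu = 9.0, 3.0
estimate = prob_y_before_x(lam, mu)
print(estimate, lam / (lam + mu))
```

With these rates the estimate should be near 0.75: the variable with the larger rate tends to come first.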
Variance Decomposition
Just as we can use double expectation for the expected value of Y, we can express the variance of Y, Var(Y) in a similar fashion, which we refer to as variance decomposition:
Var(Y)=E[Var(Y∣X)]+Var[E(Y∣X)]
Proof
Let's start with the first term: $E[\text{Var}(Y \mid X)]$. Remember the definition of variance, as the second central moment:
$$\text{Var}(X) = E[X^2] - (E[X])^2$$
Thus, we can express $E[\text{Var}(Y \mid X)]$ as:
$$E[\text{Var}(Y \mid X)] = E\left[E[Y^2 \mid X] - (E[Y \mid X])^2\right]$$
Note that, since expectation is linear:
$$E[\text{Var}(Y \mid X)] = E\left[E[Y^2 \mid X]\right] - E\left[(E[Y \mid X])^2\right]$$
Notice the first expression on the right-hand side. That's a double expectation, and we know how to simplify that:
$$E[\text{Var}(Y \mid X)] = E[Y^2] - E\left[(E[Y \mid X])^2\right] \quad (1)$$
Now let's look at the second term in the variance decomposition: $\text{Var}[E(Y \mid X)]$. Considering again the definition for variance above, we can transform this term:
$$\text{Var}[E(Y \mid X)] = E\left[(E[Y \mid X])^2\right] - \left(E[E[Y \mid X]]\right)^2$$
In this equation, we again see a double expectation, quantity squared. So:
$$\text{Var}[E(Y \mid X)] = E\left[(E[Y \mid X])^2\right] - (E[Y])^2 \quad (2)$$
Remember the equation for variance decomposition:
$$\text{Var}(Y) = E[\text{Var}(Y \mid X)] + \text{Var}[E(Y \mid X)]$$
Let's plug in (1) and (2) for the first and second term, respectively:
$$\text{Var}(Y) = E[Y^2] - E\left[(E[Y \mid X])^2\right] + E\left[(E[Y \mid X])^2\right] - (E[Y])^2$$
Notice the cancellation of the two scary inner terms to reveal the definition for variance:
$$\text{Var}(Y) = E[Y^2] - (E[Y])^2 = \text{Var}(Y)$$
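The decomposition can be verified exactly on a small discrete example. The joint pmf below is hypothetical, chosen only for illustration, and exact fractions are used so the two sides can be compared without rounding.

```python
from fractions import Fraction as F

# Hypothetical joint pmf f(x, y), chosen only for illustration.
pmf = {
    (0, 1): F(1, 10), (0, 2): F(3, 10),
    (1, 1): F(4, 10), (1, 2): F(2, 10),
}

def expect(g):
    """E[g(X, Y)] via two-dimensional LOTUS."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

# Left-hand side: Var(Y) = E[Y^2] - (E[Y])^2
var_y = expect(lambda x, y: y * y) - expect(lambda x, y: y) ** 2

xs = sorted({x for x, _ in pmf})
f_x = {x: sum(p for (a, _), p in pmf.items() if a == x) for x in xs}

def cond_moment(x, k):
    """E[Y^k | X = x]."""
    return sum(y**k * p for (a, y), p in pmf.items() if a == x) / f_x[x]

cond_mean = {x: cond_moment(x, 1) for x in xs}
cond_var = {x: cond_moment(x, 2) - cond_mean[x] ** 2 for x in xs}

# Right-hand side: E[Var(Y|X)] + Var[E(Y|X)]
mean_of_var = sum(cond_var[x] * f_x[x] for x in xs)
var_of_mean = (sum(cond_mean[x] ** 2 * f_x[x] for x in xs)
               - sum(cond_mean[x] * f_x[x] for x in xs) ** 2)

print(var_y, mean_of_var + var_of_mean)
```

For this pmf both sides equal 1/4, and the same check works for any joint pmf substituted into `pmf`.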
Covariance and Correlation
In this lesson, we are going to talk about independence, covariance, correlation, and some related results. Correlation shows up all over the place in simulation, from inputs to outputs to everywhere in between.
LOTUS in 2D
Suppose that h(X,Y) is some function of two random variables, X and Y. Then, via LOTUS, we know how to calculate the expected value, E[h(X,Y)]:
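$$E[h(X,Y)] = \begin{cases} \sum_x \sum_y h(x,y) f(x,y) & \text{if } (X,Y) \text{ is discrete} \\ \int_{\mathbb{R}} \int_{\mathbb{R}} h(x,y) f(x,y)\,dx\,dy & \text{if } (X,Y) \text{ is continuous} \end{cases}$$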
Expected Value, Variance of Sum
Whether or not X and Y are independent, the sum of the expected values equals the expected value of the sum:
E[X+Y]=E[X]+E[Y]
If X and Y are independent, then the sum of the variances equals the variance of the sum:
Var(X+Y)=Var(X)+Var(Y)
Note that we need the equations for LOTUS in two dimensions to prove both of these theorems.
Aside: I tried to prove these theorems. It went terribly! Check out the proper proofs here.
Random Sample
Let's suppose we have a set of n random variables: X1,...,Xn. This set is said to form a random sample from the pmf/pdf f(x) if all the variables are (i) independent and (ii) each Xi has the same pdf/pmf f(x).
We can use the following notation to refer to such a random sample:
$$X_1, \ldots, X_n \overset{\text{iid}}{\sim} f(x)$$
Note that "iid" means "independent and identically distributed", which is what (i) and (ii) mean, respectively, in our definition above.
Theorem
Given a random sample, $X_1, \ldots, X_n \overset{\text{iid}}{\sim} f(x)$, the sample mean, $\bar{X}_n$, equals the following:
$$\bar{X}_n \equiv \sum_{i=1}^{n} \frac{X_i}{n}$$
Given the sample mean, the expected value of the sample mean is the expected value of any of the individual variables, and the variance of the sample mean is the variance of any of the individual variables divided by n:
$$E[\bar{X}_n] = E[X_i]; \quad \text{Var}(\bar{X}_n) = \text{Var}(X_i)/n$$
We can observe that as n increases, $E[\bar{X}_n]$ is unaffected, but $\text{Var}(\bar{X}_n)$ decreases.
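A quick simulation can illustrate how the variance of the sample mean shrinks like 1/n. The sketch below assumes Unif(0, 1) observations, for which Var(X_i) = 1/12; the function name, replication count, and seed are illustrative assumptions.

```python
import random
import statistics

def sample_mean_stats(n, reps=20_000, seed=2):
    """Mean and variance of the sample mean of n iid Unif(0, 1) draws."""
    rng = random.Random(seed)
    means = [sum(rng.random() for _ in range(n)) / n for _ in range(reps)]
    return statistics.fmean(means), statistics.pvariance(means)

for n in (1, 10, 100):
    mean, var = sample_mean_stats(n)
    # Columns: n, simulated mean, simulated variance, theoretical 1/(12n)
    print(n, round(mean, 4), round(var, 6), round(1 / (12 * n), 6))
```

The simulated mean should stay near 0.5 for every n, while the simulated variance should track 1/(12n).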
Covariance
Covariance is one of the most fundamental measures of non-independence between two random variables. The covariance between X and Y, Cov(X,Y) is defined as:
Cov(X,Y)≡E[(X−E[X])(Y−E[Y])]
The right-hand side of this equation looks daunting, so let's see if we can simplify it. We can first expand the product:
$$E[(X - E[X])(Y - E[Y])] = E\left[XY - XE[Y] - YE[X] + E[Y]E[X]\right]$$
Since expectation is linear, we can rewrite the right-hand side as a sum and difference of expected values:
$$\text{Cov}(X,Y) = E[XY] - E[XE[Y]] - E[YE[X]] + E[E[Y]E[X]]$$
Note that both $E[X]$ and $E[Y]$ are just numbers: the expected values of the corresponding random variables. As a result, we can apply two principles here: $E[aX] = aE[X]$ and $E[a] = a$. Consider the following rearrangement:
$$\text{Cov}(X,Y) = E[XY] - E[Y]E[X] - E[X]E[Y] + E[Y]E[X]$$
The last three terms are the same product up to sign, and they sum to $-E[Y]E[X]$. Thus:
$$\text{Cov}(X,Y) \equiv E[(X - E[X])(Y - E[Y])] = E[XY] - E[Y]E[X]$$
This equation is much easier to work with; namely, h(X,Y)=XY is a much simpler function than h(X,Y)=(X−E[X])(Y−E[Y]) when it comes time to apply LOTUS.
Let's understand what happens when we take the covariance of X with itself:
$$\text{Cov}(X,X) = E[X \cdot X] - E[X]E[X] = E[X^2] - (E[X])^2 = \text{Var}(X)$$
Theorem
If X and Y are independent random variables, then Cov(X,Y)=0. On the other hand, a covariance of 0 does not mean that X and Y are independent.
For example, consider two random variables, $X \sim \text{Unif}(-1,1)$ and $Y = X^2$. Since Y is a function of X, the two random variables are dependent: if you know X, you know Y. However, take a look at the covariance:
$$\text{Cov}(X,Y) = E[X^3] - E[X]E[X^2]$$
What is $E[X]$? Well, we can integrate $x f(x)$ from −1 to 1, or we can understand that the expected value of a uniform random variable is the average of the bounds of the distribution. That's a long way of saying that $E[X] = (-1+1)/2 = 0$.
Now, what is $E[X^3]$? We can apply LOTUS:
$$E[X^3] = \int_{-1}^{1} x^3 f(x)\,dx$$
What is the pdf of a uniform random variable? By definition, it's one over the difference of the bounds:
$$E[X^3] = \frac{1}{1-(-1)}\int_{-1}^{1} x^3\,dx$$
Let's integrate and evaluate:
$$E[X^3] = \frac{1}{2} \cdot \frac{x^4}{4}\Big|_{-1}^{1} = \frac{1^4}{8} - \frac{(-1)^4}{8} = 0$$
Thus:
$$\text{Cov}(X,Y) = E[X^3] - E[X]E[X^2] = 0$$
Just because the covariance between X and Y is 0 does not mean that they are independent!
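The zero covariance in this example can be confirmed with exact arithmetic. The sketch below uses the closed-form moments of a uniform distribution with integer bounds; the helper name `uniform_moment` is an illustrative choice.

```python
from fractions import Fraction

def uniform_moment(k, a=-1, b=1):
    """E[X^k] for X ~ Unif(a, b): (b^(k+1) - a^(k+1)) / ((k+1)(b-a))."""
    return Fraction(b ** (k + 1) - a ** (k + 1), (k + 1) * (b - a))

# Cov(X, Y) with Y = X^2 is E[X^3] - E[X] E[X^2]
cov = uniform_moment(3) - uniform_moment(1) * uniform_moment(2)
print(cov)                # covariance of X and X^2
print(uniform_moment(2))  # E[Y] = E[X^2]
```

The covariance comes out to exactly zero because the odd moments of a distribution symmetric about zero vanish, even though Y is completely determined by X.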
More Theorems
Suppose that we have two random variables, X and Y, as well as two constants, a and b. We have the following theorem:
Cov(aX,bY)=abCov(X,Y)
Whether or not X and Y are independent,
Var(X+Y)=Var(X)+Var(Y)+2Cov(X,Y)
Var(X−Y)=Var(X)+Var(Y)−2Cov(X,Y)
Note that we looked at a theorem previously which gave an equation for the variance of X+Y when both variables are independent: Var(X+Y)=Var(X)+Var(Y). That equation was a special case of the theorem above, where Cov(X,Y)=0 as is the case between two independent random variables.
Correlation
The correlation between X and Y, ρ, is equal to:
$$\rho \equiv \frac{\text{Cov}(X,Y)}{\sqrt{\text{Var}(X)\text{Var}(Y)}}$$
Note that correlation is standardized covariance. In other words, for any X and Y, −1≤ρ≤1.
If two variables are highly correlated, then ρ will be close to 1. If two variables are highly negatively correlated, then ρ will be close to −1. Two variables with low correlation will have a ρ close to 0.
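Example
Consider the following joint pmf, $f(x,y)$, for X and Y:

| $f(x,y)$ | $X=2$ | $X=3$ | $X=4$ | $f_Y(y)$ |
|---|---|---|---|---|
| $Y=40$ | 0.00 | 0.20 | 0.10 | 0.3 |
| $Y=50$ | 0.15 | 0.10 | 0.05 | 0.3 |
| $Y=60$ | 0.30 | 0.00 | 0.10 | 0.4 |
| $f_X(x)$ | 0.45 | 0.30 | 0.25 | 1 |

For this pmf, X can take values in {2,3,4} and Y can take values in {40,50,60}. Note the marginal pmfs along the table's right and bottom, and remember that all pmfs sum to one when calculated over all appropriate values.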
What is the expected value of X? Let's use fX(x):
E[X]=2(0.45)+3(0.3)+4(0.25)=2.8
Now let's calculate the variance:
$$\text{Var}(X) = E[X^2] - (E[X])^2$$
$$\text{Var}(X) = 4(0.45) + 9(0.3) + 16(0.25) - (2.8)^2 = 0.66$$
What is the expected value of Y? Let's use $f_Y(y)$:
$$E[Y] = 40(0.3) + 50(0.3) + 60(0.4) = 51$$
Now let's calculate the variance:
$$\text{Var}(Y) = E[Y^2] - (E[Y])^2$$
$$\text{Var}(Y) = 1600(0.3) + 2500(0.3) + 3600(0.4) - (51)^2 = 69$$
If we want to calculate the covariance of X and Y, we need to know E[XY], which we can calculate using two-dimensional LOTUS:
$$E[XY] = \sum_x \sum_y xy f(x,y)$$
E[XY]=(2∗40∗0.00)+(2∗50∗0.15)+...+(4∗60∗0.1)=140
With E[XY] in hand, we can calculate the covariance of X and Y:
Cov(X,Y)=E[XY]−E[X]E[Y]=140−(2.8∗51)=−2.8
Finally, we can calculate the correlation:
$$\rho = \frac{\text{Cov}(X,Y)}{\sqrt{\text{Var}(X)\text{Var}(Y)}}$$
$$\rho = \frac{-2.8}{\sqrt{0.66(69)}} \approx -0.415$$
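The whole calculation can be reproduced with two-dimensional LOTUS in a few lines. Note the assumption: the individual cells of the joint pmf below are inferred from the marginals, the three cell values, and $E[XY] = 140$ quoted in this example, so treat them as a reconstruction rather than given data.

```python
import math

# Joint pmf f(x, y). ASSUMPTION: cells inferred from the quoted marginals,
# the quoted cells f(2,40), f(2,50), f(4,60), and E[XY] = 140.
pmf = {
    (2, 40): 0.00, (3, 40): 0.20, (4, 40): 0.10,
    (2, 50): 0.15, (3, 50): 0.10, (4, 50): 0.05,
    (2, 60): 0.30, (3, 60): 0.00, (4, 60): 0.10,
}

def expect(g):
    """E[g(X, Y)] via two-dimensional LOTUS."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

e_x, e_y = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: x * x) - e_x**2
var_y = expect(lambda x, y: y * y) - e_y**2
cov = expect(lambda x, y: x * y) - e_x * e_y
rho = cov / math.sqrt(var_x * var_y)

print(round(e_x, 2), round(e_y, 2), round(cov, 2), round(rho, 3))
```

Writing everything in terms of a single `expect` helper mirrors the text: every quantity here is the expected value of some function h(X, Y).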
Portfolio Example
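Let's look at two different assets, $S_1$ and $S_2$, that we hold in our portfolio. The expected yearly returns of the assets are $E[S_1] = \mu_1$ and $E[S_2] = \mu_2$, and the variances are $\text{Var}(S_1) = \sigma_1^2$ and $\text{Var}(S_2) = \sigma_2^2$. The covariance between the assets is $\sigma_{12}$.
A portfolio is just a weighted combination of assets, and we can define our portfolio, P, as:
$$P = wS_1 + (1-w)S_2, \quad w \in [0,1]$$
The portfolio's expected value is the sum of the expected values of the assets times their corresponding weights:
$$E[P] = E[wS_1 + (1-w)S_2] = w\mu_1 + (1-w)\mu_2$$
The portfolio's variance follows from the theorems above for $\text{Cov}(aX, bY)$ and $\text{Var}(X+Y)$:
$$\text{Var}(P) = w^2\text{Var}(S_1) + (1-w)^2\text{Var}(S_2) + 2w(1-w)\text{Cov}(S_1, S_2)$$
Finally, let's substitute in the appropriate variables:
$$\text{Var}(P) = w^2\sigma_1^2 + (1-w)^2\sigma_2^2 + 2w(1-w)\sigma_{12}$$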
How might we optimize this portfolio? One thing we might want to optimize for is minimal variance: many people want their portfolios to have as little volatility as possible.
Let's recap. Given a function f(x), how do we find the x that minimizes f(x)? We can take the derivative, f′(x), set it to 0 and then solve for x. Let's apply this logic to Var(P). First, we take the derivative with respect to w:
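$$\frac{d}{dw}\text{Var}(P) = 2w\sigma_1^2 - 2(1-w)\sigma_2^2 + 2\sigma_{12} - 4w\sigma_{12}$$
$$\frac{d}{dw}\text{Var}(P) = 2w\sigma_1^2 - 2\sigma_2^2 + 2w\sigma_2^2 + 2\sigma_{12} - 4w\sigma_{12}$$
Then, we set the derivative equal to 0 and solve for w:
$$0 = 2w\sigma_1^2 - 2\sigma_2^2 + 2w\sigma_2^2 + 2\sigma_{12} - 4w\sigma_{12}$$
$$0 = w\sigma_1^2 - \sigma_2^2 + w\sigma_2^2 + \sigma_{12} - 2w\sigma_{12}$$
$$\sigma_2^2 - \sigma_{12} = w\sigma_1^2 + w\sigma_2^2 - 2w\sigma_{12}$$
$$\sigma_2^2 - \sigma_{12} = w(\sigma_1^2 + \sigma_2^2 - 2\sigma_{12})$$
$$w = \frac{\sigma_2^2 - \sigma_{12}}{\sigma_1^2 + \sigma_2^2 - 2\sigma_{12}}$$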
Example
Suppose E[S1]=0.2, E[S2]=0.1, Var(S1)=0.2, Var(S2)=0.4, and Cov(S1,S2)=−0.1.
What value of w maximizes the expected return of this portfolio? We don't even have to do any math: just allocate 100% of the portfolio to the asset with the higher expected return - S1. Since we define our portfolio as wS1+(1−w)S2, the correct value for w is 1.
What value of w minimizes the variance? Let's plug and chug:
$$w = \frac{\sigma_2^2 - \sigma_{12}}{\sigma_1^2 + \sigma_2^2 - 2\sigma_{12}}$$
$$w = \frac{0.4 + 0.1}{0.2 + 0.4 + 0.2} = 0.5/0.8 = 0.625$$
To minimize variance, we should hold a portfolio consisting of 5/8 $S_1$ and 3/8 $S_2$.
There are tradeoffs in any optimization. For example, optimizing for maximal expected return may introduce high levels of volatility into the portfolio. Conversely, optimizing for minimal variance may result in paltry returns.
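The minimum-variance weight from this example can be checked against a brute-force search. In the sketch below, the function names and the grid resolution are illustrative assumptions; the inputs are the variances and covariance from the example.

```python
def portfolio_variance(w, var1, var2, cov12):
    """Var(P) for P = w*S1 + (1 - w)*S2."""
    return w**2 * var1 + (1 - w) ** 2 * var2 + 2 * w * (1 - w) * cov12

def min_variance_weight(var1, var2, cov12):
    """Closed-form w that sets the derivative of Var(P) to zero."""
    return (var2 - cov12) / (var1 + var2 - 2 * cov12)

var1, var2, cov12 = 0.2, 0.4, -0.1
w_star = min_variance_weight(var1, var2, cov12)

# Grid search over w in [0, 1] as an independent sanity check.
w_grid = min((i / 1000 for i in range(1001)),
             key=lambda w: portfolio_variance(w, var1, var2, cov12))

print(w_star, w_grid, portfolio_variance(w_star, var1, var2, cov12))
```

Both approaches should agree on w = 0.625. The stationary point is a minimum because the second derivative, $2(\sigma_1^2 + \sigma_2^2 - 2\sigma_{12}) = 2\text{Var}(S_1 - S_2)$, is nonnegative.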
Probability Distributions
In this lesson, we are going to review several popular discrete and continuous distributions.
Bernoulli (Discrete)
Suppose we have a random variable, X∼Bernoulli(p). X has the following pmf:
$$f(x) = \begin{cases} p & \text{if } x = 1 \\ 1-p \;(= q) & \text{if } x = 0 \end{cases}$$
Additionally, X has the following properties:
$$E[X] = p$$
$$\text{Var}(X) = pq$$
$$M_X(t) = pe^t + q$$
Binomial (Discrete)
The Bernoulli distribution generalizes to the binomial distribution. Suppose we have n iid Bernoulli random variables: X1,X2,...,Xn∼iidBern(p). Each Xi takes on the value 1 with probability p and 0 with probability 1−p. If we take the sum of the successes, we have the following random variable, Y:
$$Y = \sum_{i=1}^{n} X_i \sim \text{Bin}(n,p)$$
Y has the following pmf:
$$f(y) = \binom{n}{y} p^y q^{n-y}, \quad y = 0, 1, \ldots, n.$$
Notice the binomial coefficient in this equation. We read this as "n choose y", which is defined as:
$$\binom{n}{y} = \frac{n!}{y!(n-y)!}$$
What's going on here? First, what is the probability of y successes? Well, more completely, it's the probability of y successes and n−y failures: $p^y q^{n-y}$. Of course, the outcome of y consecutive successes followed by n−y consecutive failures is just one particular arrangement of many. How many? n choose y. This is what the binomial coefficient expresses.
Additionally, Y has the following properties:
E[Y]=np
Var(Y)=npq
$$M_Y(t) = (pe^t + q)^n$$
Note that the variance and the expected value are equal to n times the variance and the expected value of the Bernoulli random variable. This relationship makes sense: a binomial random variable is the sum of n Bernoulli's. The moment-generating function looks a little bit different. As it turns out, we multiply the moment-generating functions when we sum the random variables.
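The binomial mean and variance can be recovered directly from the pmf. The sketch below assumes illustrative parameters n = 12 and p = 0.3 and uses `math.comb` for the binomial coefficient.

```python
from math import comb

def binomial_pmf(y, n, p):
    """P(Y = y) for Y ~ Bin(n, p)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

n, p = 12, 0.3
probs = [binomial_pmf(y, n, p) for y in range(n + 1)]

total = sum(probs)                                   # should be 1
mean = sum(y * f for y, f in enumerate(probs))       # should be n*p
var = sum(y * y * f for y, f in enumerate(probs)) - mean**2  # should be n*p*q

print(total, mean, n * p, var, n * p * (1 - p))
```

Up to floating-point error, the pmf sums to one and the computed moments match np and npq.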
Geometric (Discrete)
Suppose we have a random variable, X∼Geometric(p). A geometric random variable corresponds to the number of Bern(p) trials until a success occurs. For example, three failures followed by a success ("FFFS") implies that X=4. A geometric random variable has the following pmf:
$$f(x) = q^{x-1}p, \quad x = 1, 2, \ldots$$
We can see that this equation directly corresponds to the probability of x−1 failures, each with probability q, followed by one success, with probability p.
Additionally, X has the following properties:
$$E[X] = \frac{1}{p}$$
$$\text{Var}(X) = \frac{q}{p^2}$$
$$M_X(t) = \frac{pe^t}{1 - qe^t}$$
Negative Binomial (Discrete)
The geometric distribution generalizes to the negative binomial distribution. Suppose that we are interested in the number of trials it takes to see r successes. We can add r iid Geom(p) random variables to get the random variable $Y \sim \text{NegBin}(r,p)$. For example, if r=3, then a run of "FFFSSFS" implies that $Y = 7$, where $Y \sim \text{NegBin}(3,p)$. Y has the following pmf:
$$f(y) = \binom{y-1}{r-1} q^{y-r} p^r, \quad y = r, r+1, \ldots$$
Additionally, Y has the following properties:
$$E[Y] = \frac{r}{p}$$
$$\text{Var}(Y) = \frac{qr}{p^2}$$
Note that the variance and the expected value are equal to r times the variance and the expected value of the geometric random variable. This relationship makes sense: a negative binomial random variable is the sum of r geometric random variables.
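A short simulation can illustrate the r/p and qr/p² formulas. The sketch below counts Bernoulli trials until the r-th success; the parameters r = 3 and p = 0.4, the replication count, and the seed are assumptions chosen for illustration.

```python
import random

def trials_until_r_successes(r, p, rng):
    """Number of Bern(p) trials needed to observe r successes."""
    trials = successes = 0
    while successes < r:
        trials += 1
        successes += rng.random() < p  # success with probability p
    return trials

def simulate(r, p, reps=100_000, seed=3):
    """Sample mean and variance of simulated NegBin(r, p) draws."""
    rng = random.Random(seed)
    draws = [trials_until_r_successes(r, p, rng) for _ in range(reps)]
    mean = sum(draws) / reps
    var = sum((d - mean) ** 2 for d in draws) / reps
    return mean, var

r, p = 3, 0.4
mean, var = simulate(r, p)
print(mean, r / p, var, r * (1 - p) / p**2)
```

The simulated mean and variance should be close to 7.5 and 11.25, respectively.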
Poisson (Discrete)
A counting process, N(t) keeps track of the number of "arrivals" observed between time 0 and time t. For example, if 7 people show up to a store by time t=3, then N(3)=7. A Poisson process is a counting process that satisfies the following criteria.
Arrivals must occur one-at-a-time at a rate, λ. For example, λ=4/hr means that, on average, arrivals occur every fifteen minutes, yet no two arrivals coincide.
Disjoint time increments are independent. Suppose we are looking at arrivals on the intervals 12 am - 2 am and 5 am - 10 am. Independent increments means that the arrivals in the first interval don't impact arrivals in the second.
Increments are stationary; in other words, the distribution of the number of arrivals in the interval [s,s+t] depends only on the interval's length, t. It does not depend on where the interval starts, s.
A random variable X∼Pois(λ) describes the number of arrivals that a Poisson process experiences in one time unit, i.e., N(1). X has the following pmf:
$$f(x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0, 1, \ldots$$
Additionally, X has the following properties:
$$E[X] = \text{Var}(X) = \lambda$$
$$M_X(t) = e^{\lambda(e^t - 1)}$$
Uniform (Continuous)
A uniform random variable, X∼Uniform(a,b), has the following pdf:
$$f(x) = \frac{1}{b-a}, \quad a \le x \le b$$
Additionally, X has the following properties:
$$E[X] = \frac{a+b}{2}$$
$$\text{Var}(X) = \frac{(b-a)^2}{12}$$
$$M_X(t) = \frac{e^{tb} - e^{ta}}{tb - ta}$$
Exponential (Continuous)
A continuous, exponential random variable X∼Exponential(λ) has the following pdf:
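$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0$$
Additionally, X has the following properties:
$$E[X] = \frac{1}{\lambda}$$
$$\text{Var}(X) = \frac{1}{\lambda^2}$$
$$M_X(t) = \frac{\lambda}{\lambda - t}, \quad t < \lambda$$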
The exponential distribution also has a memoryless property, which means that for s,t>0, P(X>s+t∣X>s)=P(X>t). For example, if we have a light bulb, and we know that it has lived for s time units, the conditional probability that it will live for s+t