In this post we will explore the duality between uniform and exponential distributions, and try to see how it can be used as a clever trick to solve probability problems.

Suppose you have a problem that mentions “i.i.d. Uniform” and asks about order statistics, typically the minimum and/or maximum. Instead of doing tedious arithmetic/calculus, what if there were another way?

Recall the Poisson process. This counting process has independent increments, and it is a well-known fact that its inter-arrival times are exponentially distributed. A less-known feature is that, conditional on the number of arrivals in a finite interval, the arrival times are distributed like i.i.d. uniform points on that interval. This is where the intuition comes from.
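To make this concrete, here is a minimal numpy sketch (not from the original post; the rate, horizon and arrival count are arbitrary choices) that keeps only the Poisson-process runs with exactly three arrivals on $[0,1]$ and compares those arrival times with sorted uniforms:

```python
import numpy as np

rng = np.random.default_rng(0)
T, rate, trials = 1.0, 5.0, 100_000

# Arrival times of a rate-5 Poisson process on [0, T],
# keeping only the runs that contain exactly 3 arrivals.
kept = []
for _ in range(trials):
    arrivals = np.cumsum(rng.exponential(1 / rate, size=25))
    arrivals = arrivals[arrivals < T]
    if len(arrivals) == 3:
        kept.append(arrivals)
kept = np.array(kept)

# Conditional on the count, arrivals should behave like sorted Uniform[0, T].
uniforms = np.sort(rng.uniform(0, T, size=(len(kept), 3)), axis=1)
print(kept.mean(axis=0))      # ≈ [0.25, 0.50, 0.75]
print(uniforms.mean(axis=0))  # ≈ [0.25, 0.50, 0.75]
```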

Uniform - Exponential duality

We skip the rigorous proof here, but an alternative way to sample $n$ uniformly distributed random variables and then sort them is to generate $n$ exponential random variables representing the gaps in between, draw one more representing the last gap, and eventually normalize by the total sum.

Let $Y_1, Y_2, \dots, Y_{n+1}$ be independent Exponential(1) random variables. Denote

$$Y_{(k)} = \frac{\sum\limits_{i=1}^{k} Y_i}{\sum\limits_{i=1}^{n+1} Y_i}.$$

Then $Y_{(k)}$, $k=1,\dots,n$, are jointly distributed exactly the same as Uniform[0;1] order statistics.
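As a quick sanity check (a numpy sketch under the definitions above, not part of the original argument), both routes produce the same order statistics in distribution:

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 5, 100_000

# Route 1: sort n i.i.d. Uniform[0, 1] draws.
u_sorted = np.sort(rng.uniform(size=(trials, n)), axis=1)

# Route 2: n + 1 Exp(1) gaps, cumulative sums normalized by the total.
y = rng.exponential(size=(trials, n + 1))
y_order = np.cumsum(y, axis=1)[:, :n] / y.sum(axis=1, keepdims=True)

# The k-th order statistic of n uniforms has mean k / (n + 1).
print(u_sorted.mean(axis=0))  # ≈ [1/6, 2/6, 3/6, 4/6, 5/6]
print(y_order.mean(axis=0))   # ≈ the same values
```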

A couple of useful facts (and links) about the $Y_{(k)}$ random variables and their building blocks:

| Expression | Distribution | Mean | Variance |
| --- | --- | --- | --- |
| $Exp(\lambda)$ (wiki) | $\Gamma(1,\lambda)$ (wiki) | $\frac{1}{\lambda}$ | $\frac{1}{\lambda^2}$ |
| $k \cdot Exp(\lambda)$ | $Exp(\frac{\lambda}{k})$ | $\frac{k}{\lambda}$ | $\frac{k^2}{\lambda^2}$ |
| Sum of $n$ i.i.d. $Exp(1)$ (wiki) | $\Gamma(n,1)$ | $n$ | $n$ |
| Min of $n$ i.i.d. $Exp(1)$ (wiki) | $Exp(n)$ | $\frac{1}{n}$ | $\frac{1}{n^2}$ |
| Ratio of indep. Gammas (wiki): $\frac{\Gamma(m,1)}{\Gamma(m,1)+\Gamma(n,1)}$ | $Beta(m,n)$ (wiki) | $\frac{m}{m+n}$ | $\frac{mn}{(m+n)^2(m+n+1)}$ |
| Joint spacings of $Y_{(1)}, Y_{(2)}, \dots$ | Dirichlet (wiki) with parameters $(1,1,\dots)$ |  |  |
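The Gamma-ratio row does the heavy lifting in Problem 3 below, so here is a small numerical check (a numpy sketch; the parameters $m=3$, $n=4$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, trials = 3, 4, 500_000

# Ratio of independent Gamma(m, 1) and Gamma(n, 1) variables.
g_m = rng.gamma(shape=m, scale=1.0, size=trials)
g_n = rng.gamma(shape=n, scale=1.0, size=trials)
ratio = g_m / (g_m + g_n)

# Compare against a direct Beta(m, n) sample and the closed-form moments.
beta = rng.beta(m, n, size=trials)
print(ratio.mean(), beta.mean(), m / (m + n))                          # ≈ 3/7
print(ratio.var(), beta.var(), m * n / ((m + n) ** 2 * (m + n + 1)))   # ≈ 0.0306
```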

 

Now let’s demonstrate how the uniform-exponential duality can be applied to get succinct proofs.

Problem 1 (Easy). Given $n$ i.i.d. Uniform[0;1] random variables, find the probability that the sum of their maximum and minimum is greater than 1.

Let $Y_1, Y_2, \dots, Y_{n+1}$ be independent Exponential(1) random variables. Denote the minimum and maximum as $X_{min}, X_{max}$. Then, by the machinery above, it holds that

$X_{min} = \frac{Y_1}{\sum\limits_{i=1}^{n+1} Y_i}$ and $X_{max} = \frac{\sum\limits_{i=1}^{n} Y_i}{\sum\limits_{i=1}^{n+1} Y_i}$.

We want to evaluate $P(X_{min} + X_{max} > 1)$. After substituting the $Y_i$ and multiplying both sides by the common denominator, the inequality becomes $Y_1 + \sum\limits_{i=1}^{n} Y_i > \sum\limits_{i=1}^{n+1} Y_i$, i.e. simply $P(Y_1 > Y_{n+1})$, which equals $1/2$ by symmetry.

The alternative approach is to notice that the maximum has the same distribution as one minus the minimum and play with that. This involves a bit of creativity, and the standard machinery helps when the “noticing” was not successful.
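And if the “noticing” fails entirely, a brute-force Monte Carlo check (a numpy sketch with an arbitrary $n$) confirms that the answer is $1/2$ and does not depend on $n$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials = 7, 1_000_000

# P(min + max > 1) for n i.i.d. Uniform[0, 1] draws.
x = rng.uniform(size=(trials, n))
print(np.mean(x.min(axis=1) + x.max(axis=1) > 1))  # ≈ 0.5 for any n
```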

Problem 2 (Hard). Given $n$ i.i.d. Uniform[0;1] random variables, find the correlation between their minimum and maximum.

Let $Y_1, Y_2, \dots, Y_{n+1}$ be independent Exponential(1) random variables. Denote the minimum and maximum as $X_{min}, X_{max}$. Then, by the machinery above, it holds that

$X_{min} = \frac{Y_1}{\sum\limits_{i=1}^{n+1} Y_i}$ and $X_{max} = \frac{\sum\limits_{i=1}^{n} Y_i}{\sum\limits_{i=1}^{n+1} Y_i}$.

After this, the magic happens when one recognizes that for any two random variables $A, B$, by linearity, it holds that

$$cov(A, B) = - cov(A, 1-B).$$

Denote $z = cov\left( \frac{Y_i}{\sum\limits_{k=1}^{n+1} Y_k}, \frac{Y_j}{\sum\limits_{k=1}^{n+1} Y_k} \right)$ for $i \neq j$. Note that we can drop the indices $i, j$, since by symmetry all these covariances are the same.

Let’s substitute the minimum and maximum into the “magic” observation. Since $1 - X_{max} = \frac{Y_{n+1}}{\sum\limits_{k=1}^{n+1} Y_k}$, the relevant indices are $i=1$ and $j=n+1$, and we get $cov\left( X_{min}, X_{max} \right) = -cov\left( X_{min}, 1-X_{max} \right) = -z$. On the other hand, expanding the sum in $X_{max}$, the same covariance can be expressed as

$$cov\left( X_{min}, X_{max} \right) = Var\left( X_{min} \right) + (n-1)z,$$

where the variance term comes from $Y_1$ appearing in both. Equating the two expressions gives $z = -\frac{Var\left( X_{min} \right)}{n}$, and hence $cov\left( X_{min}, X_{max} \right) = \frac{Var\left( X_{min} \right)}{n}$, which is the covariance we are after! Finally, for the correlation, noticing that the variances of the minimum and maximum are equal (same machinery),

$$\rho \left( X_{min}, X_{max} \right) = \frac{ cov\left(X_{min}, X_{max}\right) }{\sqrt{ Var\left(X_{min}\right) Var\left(X_{max}\right) }} = \frac{Var\left(X_{min}\right)/n}{Var\left(X_{min}\right)} = \frac{1}{n}.$$
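A Monte Carlo check of the $\frac{1}{n}$ answer (a numpy sketch with an arbitrary $n$):

```python
import numpy as np

rng = np.random.default_rng(4)
n, trials = 6, 1_000_000

# Correlation between the minimum and maximum of n i.i.d. Uniform[0, 1] draws.
x = rng.uniform(size=(trials, n))
xmin, xmax = x.min(axis=1), x.max(axis=1)
print(np.corrcoef(xmin, xmax)[0, 1], 1 / n)  # both ≈ 0.1667
```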

Problem 3 (Hard). Given $n$ i.i.d. Uniform[0;1] random variables, find the expected value of the smallest distance between them.

Use the machinery above. Denote the sum of the exponentials (the common denominator) as $V = Y_1 + \dots + Y_{n+1}$. Then the distances between consecutive ordered points are conveniently $\frac{Y_2}{V}, \dots, \frac{Y_n}{V}$, and we are looking for the expectation of their minimum. They share a common denominator that can be factored out:

$$\min \left\{ \frac{Y_2}{V}, \dots, \frac{Y_n}{V} \right\} = \frac{1}{V} \min \left\{ Y_2, \dots, Y_n \right\} = \frac{M}{V}.$$

NB: the random variables $Y_1$ and $Y_{n+1}$ are not inside the minimum function, since they correspond to the gaps at the two ends of the interval (see the image at the beginning). From the table above, $M$ is the minimum of $n-1$ i.i.d. Exp(1) variables, so $(n-1) \cdot M \sim Exp(1) \equiv \Gamma(1, 1)$. Now the idea is to find the Beta distribution in the form of a ratio of Gammas.

Since $M$ and $V$ are not independent, we can’t use the Beta distribution without justification. Recall that the exponential distribution is memoryless: after the minimum time has passed, the remaining waiting times are again independent Exp(1) variables.

Let’s condition the expectation on the minimum being achieved at some index $k^{*}$ with value $M=m$, and use the law of total expectation $\mathbb{E}\left[\frac{M}{V}\right] = \mathbb{E}\left[\mathbb{E}\left[ \frac{M}{V} \mid k^{*}, M=m \right]\right]$. The denominator $V$ consists of:

  • $Y_1$ and $Y_{n+1}$, independent of the minimum $Y_{k^{*}}$;
  • the minimum $Y_{k^{*}} = m$ itself;
  • $n-2$ exponential variables greater than or equal to $m$, with indices from $2$ to $n$ skipping index $k^{*}$.

By the memoryless property $\mathbb{P} \left( Y_k > s+m \mid Y_k > m \right) = \mathbb{P} \left( Y_k > s \right)$, each of these can be written as $Y_k = m + \widetilde{Y_k}$, where $\widetilde{Y_k}$ has exactly the same exponential distribution and is independent of $M$. Now, since $M$ and $Y_{k^{*}}$ are measurable with respect to the conditioning and the other variables are independent of it, we can remove the condition and make the numerator a Gamma variable by multiplying and dividing by $n-1$, to obtain the unconditional expectation

$$\mathbb{E} \left[ \frac{M}{V} \right] = \frac{1}{n-1}\,\mathbb{E} \left[ \frac{(n-1)M}{Y_1 + Y_{n+1} + M + (\widetilde{Y_2}+M)+\dots} \right].$$

The denominator contains the minimum $M$ exactly $n-1$ times, and the remainder consists of $n$ exponentials independent of the minimum, with their sum being $\Gamma(n,1)$. Recalling that $(n-1)M$ has a $\Gamma(1,1)$ distribution, we can use the Beta ratio from the table:

$$\mathbb{E} \left[ \frac{M}{V} \right] = \frac{1}{n-1}\,\mathbb{E} \left[ Beta(1, n) \right] = \frac{1}{(n-1)(n+1)}.$$
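Once more, a Monte Carlo sanity check of the closed form (a numpy sketch; with $n=5$ the answer is $\frac{1}{24} \approx 0.0417$):

```python
import numpy as np

rng = np.random.default_rng(5)
n, trials = 5, 1_000_000

# Smallest gap between consecutive order statistics of n uniforms.
x = np.sort(rng.uniform(size=(trials, n)), axis=1)
min_gap = np.diff(x, axis=1).min(axis=1)
print(min_gap.mean(), 1 / ((n - 1) * (n + 1)))  # both ≈ 0.0417
```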
