Randomized QuickSort

A PDF of Eric’s handwritten notes is available here.

Geometric Random Variables

Flip a coin, which will give

• heads with probability $p$
• tails with probability $1-p$

Let $X =$ # of flips until the first heads (including the flip with the heads)

We denote the above distribution as $X \sim$ Geom$(p)$, i.e. a geometric distribution with parameter $p$. Let $\mu = E(X)$.

What is $\mu$? If we consider $p=1/2$, then we would expect $\mu=2$. For general $p$, we would expect $\mu = 1/p$, and we will now prove this fact.

Suppose we tried to directly evaluate the expectation:

$\mu = \sum_{i=1}^\infty i \cdot \Pr(X=i) = \sum_{i=1}^\infty i (1-p)^{i-1} p$

We could try to evaluate the above sum and eventually arrive at the correct answer, or we could view $\mu$ another way. Consider the first flip. If it’s heads, then we’re done and only flip once. If it’s tails, then we repeat the process, thus flipping $1+\mu$ times in expectation.

So

$\mu = p \cdot 1 + (1-p)\cdot (\mu + 1)$

and solving the above for $\mu$ gives $\mu = 1/p$.
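As a quick sanity check on $\mu = 1/p$, here is a short simulation sketch (illustrative code, not part of the original notes; function names are our own):

```python
import random

def sample_geometric(p, rng):
    """Flip a p-biased coin until the first heads; return the number of flips."""
    flips = 1
    while rng.random() >= p:  # probability 1-p of tails, so keep flipping
        flips += 1
    return flips

def mean_geometric(p, trials=200_000, seed=0):
    """Empirical estimate of E[X] for X ~ Geom(p)."""
    rng = random.Random(seed)
    return sum(sample_geometric(p, rng) for _ in range(trials)) / trials
```

For $p = 1/2$ the estimate comes out close to $2$, and in general it tracks $1/p$.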

Linearity of Expectation

Here we give a proof of linearity of expectation.

Let $X,Y$ be random variables and take values in $\{0,1,\ldots,n\}$ (though you can use the same proof for any domain). Note that $X$ and $Y$ may not be independent. First, observe the following fact:

$\sum_{j=0}^n \Pr(X=i, Y=j) = \Pr(X=i)$.

We can now see that

\begin{aligned} E[X+Y] &= \sum_{i=0}^n \sum_{j=0}^n (i+j) \Pr(X=i, Y=j) \qquad \mbox{ by definition of expectation}\\ &=\sum_{i=0}^n \sum_{j=0}^n i \Pr(X=i, Y=j) + \sum_{i=0}^n\sum_{j=0}^n j \Pr(X=i, Y=j) \\ &= \sum_{i=0}^n i \sum_{j=0}^n \Pr(X=i, Y=j) + \sum_{j=0}^n j \sum_{i=0}^n \Pr(X=i, Y=j) \\ &= \sum_{i=0}^n i \Pr(X=i) + \sum_{j=0}^n j \Pr(Y=j) \qquad \mbox{ by above fact}\\ &= E[X] + E[Y]. \end{aligned}
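Linearity holds even when $X$ and $Y$ are dependent. As a small exact check (a hypothetical example, not from the notes): let $X$ be a fair six-sided die and let $Y$ copy $X$ with probability $1/2$, otherwise be uniform on $\{1,\ldots,6\}$. Enumerating the joint distribution and summing, $E[X+Y] = E[X] + E[Y]$ exactly as the proof promises:

```python
# Exact check of E[X+Y] = E[X] + E[Y] for a *dependent* pair.
# X is a fair die; Y equals X with prob. 1/2, else is uniform on 1..6.
pr = {}
for i in range(1, 7):
    for j in range(1, 7):
        pr[(i, j)] = (1 / 6) * ((1 / 2 if j == i else 0) + (1 / 2) * (1 / 6))

EX  = sum(i * p for (i, j), p in pr.items())        # E[X]
EY  = sum(j * p for (i, j), p in pr.items())        # E[Y]
EXY = sum((i + j) * p for (i, j), p in pr.items())  # E[X+Y]
```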

Coupon Collector

Suppose we have $n$ coupons in an urn. Consider the following process:

1. Choose a coupon from the urn at random and look at it.
2. Put the coupon back in the urn.
3. Repeat.

Let $X =$ # of steps until we see all $n$ coupons at least once.
What is $E(X)$?

Consider defining $X_i$ as the number of steps to see the $i$-th different coupon once we have seen $i-1$ different coupons. So $X_1$ is the time to see the first coupon, $X_2$ is the time to see the second coupon after seeing the first coupon, etc.

Then, we can see that $X = X_1 + X_2 + \ldots + X_n$. By linearity of expectation, we have

$E(X) = E(X_1 + \ldots + X_n) = E(X_1) + \ldots + E(X_n)$.

So what is $E(X_i)$?
At that point we have seen $i-1$ distinct coupons, so $n - i + 1$ coupons remain unseen.
Thus, the probability that the next draw shows a new coupon is $\frac{n-i+1}{n}$.

And $X_i$ is the time until this event of seeing a new coupon occurs.  Thus $X_i$ is a geometric random variable with parameter $\frac{n-i+1}{n}$. Thus,

$E(X_i) = \frac{n}{n-i+1}$.

Plugging this back into the equation for $E(X)$, we see that

\begin{aligned} E(X) &= \sum_{i=1}^n \frac{n}{n-i+1} \\ &= \frac{n}{n} + \frac{n}{n-1} + \ldots + \frac{n}{1} \\ &= n \left(1 + \frac{1}{2} + \frac{1}{3} + \ldots + \frac{1}{n}\right) \end{aligned}
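A quick simulation sketch (illustrative, not in the original notes) matches this closed form $E(X) = \sum_{i=1}^n \frac{n}{n-i+1} = n H_n$:

```python
import random

def coupon_collector_steps(n, rng):
    """Draw coupons uniformly with replacement until all n have been seen."""
    seen, steps = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        steps += 1
    return steps

def exact_expectation(n):
    """E(X) = sum_{i=1}^n n/(n-i+1), i.e. n times the n-th harmonic number."""
    return sum(n / (n - i + 1) for i in range(1, n + 1))

n, trials = 20, 20_000
rng = random.Random(1)
avg = sum(coupon_collector_steps(n, rng) for _ in range(trials)) / trials
```

For $n = 20$ the empirical average lands close to $20 \cdot H_{20} \approx 72$.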

Claim $E(X) = n \ln n + O(n)$.

Proof We’ll show that

$\frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} \leq \ln{n},$

and thus

$n( 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} )\leq n(1 + \ln{n}) = n\ln{n} + O(n),$ which gives the upper bound; the matching lower bound is sketched at the end of the proof.

To see the claimed upper bound, consider the function $g(x) = 1/\lfloor x+1 \rfloor$. On the interval $[1, n]$, the graph of $g$ consists of $n-1$ rectangles of area $1/2, 1/3, \ldots, 1/n$, and thus $\int_1^n g(x)\,dx = \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}$.

Now consider the function $f(x) = 1/x$. Notice that $g(x) \leq f(x)$ for all $x \ge 1$, and thus $\int_1^n g(x)\,dx \leq \int_1^n f(x)\,dx$. Therefore,

\begin{aligned} \frac{1}{2} + \frac{1}{3} + \ldots + \frac{1}{n} &= \int_1^n g(x)\,dx \\ &\le \int_1^n f(x)\,dx \\ &= \int_1^n \frac{1}{x}\, dx \\ &= \ln x \Big |_{x=1}^n \\ &= \ln n. \end{aligned}

Thus we get $\sum_{i=1}^n \frac{1}{i} \le 1 + \ln n$ and $E(X) \le n \ln n + n$.

We can also see a lower bound of $E[X] \ge n\ln n$ by a similar argument. Draw $n$ rectangles of area $1, 1/2, \ldots, 1/n$ over the interval $[1, n+1]$; since $1/\lfloor x \rfloor \ge 1/x$ there, their total area is at least $\int_1^{n+1} \frac{1}{x}\,dx = \ln(n+1) \ge \ln n$.
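Numerically, the two integral bounds sandwich the harmonic sum exactly as claimed (a quick check; `harmonic` is our own helper name):

```python
import math

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1 / i for i in range(1, n + 1))

# The integral bounds give ln n <= H_n <= 1 + ln n for every n >= 1.
bounds_hold = all(
    math.log(n) <= harmonic(n) <= 1 + math.log(n)
    for n in (1, 10, 100, 10_000)
)
```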

Expected Runtime of QuickSort

Input: unsorted array $A = [a_1, \ldots, a_n]$ of $n$ numbers

Output: sorted $A$

QuickSort(A):

1. Choose a random pivot $p$.
2. Partition $A$ into $A_{<p}$ (elements less than $p$) and $A_{>p}$ (elements greater than $p$).
3. Recursively sort $A_{<p}$ and $A_{>p}$.
4. Return $(A_{<p}, p, A_{>p})$.
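The steps above can be sketched in Python as follows (a minimal out-of-place version for illustration; names are our own):

```python
import random

def quicksort(a, rng=None):
    """Randomized QuickSort: returns a new sorted list."""
    if rng is None:
        rng = random.Random()
    if len(a) <= 1:
        return list(a)
    p = a[rng.randrange(len(a))]          # step 1: choose a random pivot
    less    = [x for x in a if x < p]     # step 2: partition around p
    equal   = [x for x in a if x == p]
    greater = [x for x in a if x > p]
    # steps 3-4: sort both sides recursively and concatenate
    return quicksort(less, rng) + equal + quicksort(greater, rng)
```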

In the worst case, the above algorithm could take $\Omega(n^2)$ time, e.g. every time we select the smallest element as the pivot. If the pivot $p$ was the median element, then we get

$T(n) = 2T(n/2) + O(n) = O(n \log n)$.

What is the runtime when we select $p$ at random? We will examine the expected number of comparisons of randomized QuickSort.

Let $X =$ # of comparisons made by QuickSort.

Claim $E(X) \le 2n \ln n$.

Proof

Let $S = \{s_1, \ldots, s_n\}$ be a sorted version of $A$, so $s_1 \le s_2 \le \ldots \le s_n$. If there are multiple sorted versions of $A$, we simply let $S$ be one of them.

For $1 \le i < j \le n$, let $X_{ij}$ be the number of comparisons between $s_i$ and $s_j$. And so

$X = \sum_{1\le i < j \le n} X_{ij}$

and again by linearity of expectation

$E(X) = \sum_{1 \le i < j \le n} E(X_{ij})$.

Note that in QuickSort, two elements will be compared at most once. So $0 \le X_{ij} \le 1$ and further

$E(X_{ij}) = 0 \cdot \Pr(X_{ij}=0) + 1 \cdot \Pr(X_{ij}=1)$.

What is the probability that $s_i$ and $s_j$ are compared? Consider the elements $s_i, s_{i+1}, \ldots, s_j$. For $s_i$ and $s_j$ to be compared, one of them must be chosen as a pivot while both are in the same subproblem. This happens exactly when one of $s_i, s_j$ is chosen as a pivot before ANY of $s_{i+1}, \ldots, s_{j-1}$.

Each of the $j-i+1$ elements $s_i, \ldots, s_j$ is equally likely to be the first among them chosen as a pivot, so the probability that $s_i$ or $s_j$ comes first is $2/(j-i+1)$. Therefore,

$E(X_{ij}) = \frac{2}{j-i+1}$

and combining all terms together in the above equation for $E(X)$, we see that

\begin{aligned} E(X) &= \sum_{1 \le i < j \le n} \frac{2}{j-i+1} \\ &=\sum_{i=1}^{n-1} \sum_{j=i+1}^n \frac{2}{j-i+1} \\ &=\sum_{i=1}^{n-1} \left( \frac{2}{2} + \frac{2}{3} + \ldots + \frac{2}{n-i+1}\right) \\ &\le 2 \sum_{i=1}^n \left(\frac{1}{2} + \ldots + \frac{1}{n}\right) \\ &\le 2n \ln n. \end{aligned}.
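As an empirical check of the $2n \ln n$ bound, here is a sketch with our own instrumentation (we count one comparison per non-pivot element in each partition step, matching the analysis above):

```python
import math
import random

def count_comparisons(a, rng):
    """Comparisons made by randomized QuickSort on the list a."""
    if len(a) <= 1:
        return 0
    p = a[rng.randrange(len(a))]
    less    = [x for x in a if x < p]
    greater = [x for x in a if x > p]
    # Partitioning compares each of the other len(a) - 1 elements to the pivot.
    return (len(a) - 1) + count_comparisons(less, rng) + count_comparisons(greater, rng)

n, trials = 1000, 50
rng = random.Random(0)
avg = sum(count_comparisons(list(range(n)), rng) for _ in range(trials)) / trials
bound = 2 * n * math.log(n)
```

For $n = 1000$ the average comparison count stays comfortably below $2n \ln n \approx 13{,}800$.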