Randomized QuickSort

A PDF of Eric’s handwritten notes is here.

Geometric Random Variables

Flip a biased coin that comes up

  • heads with probability p
  • tails with probability 1-p

Let X = # of flips until the first heads (including the flip that comes up heads)

We denote the above distribution as X \sim Geom(p), i.e. a geometric distribution with parameter p. Let \mu = E(X).

What is \mu? If we consider p=1/2, then we would expect \mu=2. For general p, we would expect \mu = 1/p, and we will now prove this fact.

Suppose we tried to directly evaluate the expectation:

\mu = \sum_{i=1}^\infty i \cdot \Pr(X=i) = \sum_{i=1}^\infty i (1-p)^{i-1} p

We could try to evaluate the above sum and eventually arrive at the correct answer, or we could view \mu another way. Consider the first flip. If it’s heads, then we’re done and only flip once. If it’s tails, then we repeat the process, thus flipping 1+\mu times in expectation.

So

\mu = p \cdot 1 + (1-p)\cdot (\mu + 1)

and solving the above for \mu gives \mu = 1/p.
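
As a sanity check, here is a small Python simulation (an illustration, not part of the original notes; the function name sample_geometric is ours) that estimates \mu by flipping a p-biased coin until the first heads. The empirical mean should be close to 1/p.

```python
import random

def sample_geometric(p):
    """Flip a p-biased coin until the first heads; return the number of flips."""
    flips = 1
    while random.random() >= p:  # tails with probability 1 - p
        flips += 1
    return flips

p = 0.25
trials = 100_000
estimate = sum(sample_geometric(p) for _ in range(trials)) / trials
print(f"empirical mean = {estimate:.3f}, predicted 1/p = {1 / p:.3f}")
```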

Linearity of Expectation

Here we give a proof of linearity of expectation.

Let X,Y be random variables taking values in \{0,1,\ldots,n\} (though the same proof works for any domain). Note that X and Y need not be independent. First, observe the following marginalization fact: for every i,

\sum_{j=0}^n \Pr(X=i, Y=j) = \Pr(X=i), \qquad \mbox{and symmetrically} \qquad \sum_{i=0}^n \Pr(X=i, Y=j) = \Pr(Y=j).

We can now see that

\begin{aligned}  E[X+Y] &= \sum_{i=0}^n \sum_{j=0}^n (i+j) \Pr(X=i, Y=j) \qquad \mbox{ by definition of expectation}\\  &=\sum_{i=0}^n \sum_{j=0}^n i \Pr(X=i, Y=j) + \sum_{i=0}^n\sum_{j=0}^n j \Pr(X=i, Y=j) \\  &= \sum_{i=0}^n i \sum_{j=0}^n \Pr(X=i, Y=j) + \sum_{j=0}^n j \sum_{i=0}^n \Pr(X=i, Y=j) \\  &= \sum_{i=0}^n i \Pr(X=i) + \sum_{j=0}^n j \Pr(Y=j) \qquad \mbox{ by above fact}\\  &= E[X] + E[Y].  \end{aligned}
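
To see that independence really is not needed, here is a quick numerical illustration (our own sketch, not from the notes) with two deliberately dependent variables: X is a fair die roll and Y indicates whether X is even.

```python
import random

random.seed(0)
trials = 200_000
sum_x = sum_y = sum_xy = 0
for _ in range(trials):
    x = random.randint(1, 6)    # fair die roll
    y = 1 if x % 2 == 0 else 0  # determined by x, so clearly not independent
    sum_x += x
    sum_y += y
    sum_xy += x + y

print(f"E[X]   ~ {sum_x / trials:.3f} (exact 3.5)")
print(f"E[Y]   ~ {sum_y / trials:.3f} (exact 0.5)")
print(f"E[X+Y] ~ {sum_xy / trials:.3f} (exact 4.0)")
```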

Coupon Collector

Suppose we have n coupons in an urn. Consider the following process:

  1. Choose a coupon from the urn at random and look at it.
  2. Put the coupon back in the urn.
  3. Repeat.

Let X = # of steps until we see all n coupons at least once. What is E(X)?

Consider defining X_i as the number of steps to see the i-th different coupon once we have seen i-1 different coupons. So X_1 is the time to see the first coupon, X_2 is the time to see the second coupon after seeing the first coupon, etc.

Then, we can see that X = X_1 + X_2 + \ldots + X_n. By linearity of expectation, we have

E(X) = E(X_1 + \ldots + X_n) = E(X_1) + \ldots + E(X_n).

So what is E(X_i)? At this point we have seen i-1 coupons and have not seen the other n-i+1 coupons, so the probability that a draw shows a new coupon is \frac{n-i+1}{n}.

X_i is the number of draws until this event of seeing a new coupon occurs, so X_i is a geometric random variable with parameter \frac{n-i+1}{n}. Thus,

E(X_i) = \frac{n}{n-i+1}.

Plugging this back into the equation for E(X), we see that

\begin{aligned}  E(X) &= \sum_{i=1}^n \frac{n}{n-i+1} \\  &= \frac{n}{n} + \frac{n}{n-1} + \ldots + \frac{n}{1} \\  &= n \left(1 + \frac{1}{2} + \frac{1}{3} + \ldots + \frac{1}{n}\right) \end{aligned}
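
A quick simulation (our own sketch, not from the notes) agrees with the formula E(X) = n\left(1 + \frac{1}{2} + \cdots + \frac{1}{n}\right):

```python
import random

def coupon_collector_steps(n):
    """Draw coupons uniformly with replacement until all n distinct ones appear."""
    seen = set()
    steps = 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        steps += 1
    return steps

n = 50
trials = 10_000
empirical = sum(coupon_collector_steps(n) for _ in range(trials)) / trials
h_n = sum(1 / i for i in range(1, n + 1))
print(f"empirical E(X) = {empirical:.1f}, n * H_n = {n * h_n:.1f}")
```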

Claim E(X) = n \ln n + O(n).

Proof We’ll show that

\frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} \leq \ln{n},

and thus

n\left( 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} \right) \leq n(1 + \ln{n}) = n\ln{n} + O(n),

which will complete the proof.

To see the claimed upper bound, consider the function g(x) = 1/\lfloor x+1 \rfloor. Note that on the interval from x=1 to x=n, the graph of g consists of n-1 unit-width rectangles of heights (and hence areas) 1/2, 1/3, \ldots, 1/n, and thus \int_1^n g(x)\,dx = \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}.

Now consider the function f(x) = 1/x. Notice that g(x) \leq f(x), and thus \int_1^n g(x)\,dx \leq \int_1^n f(x)\,dx. Therefore,

\begin{aligned}  \frac{1}{2} + \frac{1}{3} + \ldots + \frac{1}{n} &= \int_1^n g(x)\,dx \\  &\le \int_1^n f(x)\,dx \\  &= \int_1^n \frac{1}{x}\,dx \\  &= \ln x \,\Big|_{x=1}^n \\  &= \ln n.  \end{aligned}

Thus we get \sum_{i=1}^n \frac{1}{i} \le 1 + \ln n and E(X) \le n \ln n + n.

We can also see a lower bound of E[X] \ge n\ln n by a similar argument. Draw n unit-width rectangles of areas 1, 1/2, \ldots, 1/n; these rectangles cover the region under the curve 1/x, so their total area H_n = 1 + \frac{1}{2} + \cdots + \frac{1}{n} is lower bounded by \int_1^n \frac{1}{x}\,dx = \ln n, and hence E[X] = nH_n \ge n\ln n.
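
Together, the upper and lower bounds pin the harmonic sum H_n = \sum_{i=1}^n 1/i between \ln n and 1 + \ln n. A quick numerical check (illustration only, not from the notes):

```python
import math

for n in [10, 100, 10_000, 1_000_000]:
    h_n = sum(1 / i for i in range(1, n + 1))
    print(f"n = {n:>9}: ln n = {math.log(n):8.4f} <= H_n = {h_n:8.4f} "
          f"<= 1 + ln n = {1 + math.log(n):8.4f}")
```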

Expected Runtime of QuickSort

Input: unsorted array A = [a_1, \ldots, a_n] of n numbers

Output: sorted A

QuickSort(A):

  1. Choose a random pivot p.
  2. Partition A into A_{<p}, A_{=p}, A_{>p}.
  3. Recursively sort A_{<p}, A_{>p}.
  4. Return (A_{<p}, A_{=p}, A_{>p}).
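
For concreteness, here is one way the pseudocode might look in Python (a sketch using the three-way partitioning described above; this is not code from the notes):

```python
import random

def quicksort(a):
    """Randomized QuickSort with three-way partitioning."""
    if len(a) <= 1:
        return a
    p = random.choice(a)                       # step 1: random pivot
    less    = [x for x in a if x < p]          # step 2: A_{<p}
    equal   = [x for x in a if x == p]         #         A_{=p}
    greater = [x for x in a if x > p]          #         A_{>p}
    # steps 3-4: recursively sort the two sides and concatenate
    return quicksort(less) + equal + quicksort(greater)

print(quicksort([5, 3, 8, 1, 9, 2, 7, 3]))    # [1, 2, 3, 3, 5, 7, 8, 9]
```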

In the worst case, the above algorithm could take \Omega(n^2) time, e.g. if we select the smallest element as the pivot every time. If the pivot p were always the median element, then we would get

T(n) = 2T(n/2) + O(n) = O(n \log n).

What is the runtime when we select p at random? We will examine the expected number of comparisons of randomized QuickSort.

Let X = # of comparisons made by QuickSort.

Claim E(X) \le 2n \ln n.

Proof 

Let S = \{s_1, \ldots, s_n\} be a sorted version of A, so s_1 \le s_2 \le \ldots \le s_n. If there are multiple sorted versions of A, we simply let S be one of them.

For 1 \le i < j \le n, let X_{ij} be the number of comparisons made between s_i and s_j, so that

X = \sum_{1\le i < j \le n} X_{ij}

and again by linearity of expectation

E(X) = \sum_{1 \le i<j \le n} E(X_{ij}).

Note that in QuickSort, two elements will be compared at most once. So 0 \le X_{ij} \le 1 and further

E(X_{ij}) = 0 \cdot \Pr(X_{ij}=0) + 1 \cdot \Pr(X_{ij}=1).

What is the probability that s_i, s_j are compared? Consider s_i, s_{i+1}, \ldots, s_j. Two elements are compared only when one of them is the pivot of a subproblem containing both. If some element strictly between them, i.e. one of s_{i+1}, \ldots, s_{j-1}, is chosen as a pivot first, then s_i and s_j are separated into different subproblems and are never compared. So s_i and s_j are compared exactly when one of them is chosen as a pivot before any element of s_{i+1}, \ldots, s_{j-1}.

Each of the j-i+1 elements s_i, s_{i+1}, \ldots, s_j is equally likely to be the first among them chosen as a pivot, so the probability that it is s_i or s_j is 2/(j-i+1). Therefore,

E(X_{ij}) = \frac{2}{j-i+1}

and combining all terms together in the above equation for E(X), we see that

\begin{aligned}  E(X) &= \sum_{1 \le i < j \le n} \frac{2}{j-i+1} \\  &=\sum_{i=1}^{n-1} \sum_{j=i+1}^n \frac{2}{j-i+1} \\  &=\sum_{i=1}^{n-1} \left( \frac{2}{2} + \frac{2}{3} + \ldots + \frac{2}{n-i+1}\right) \\  &\le 2 \sum_{i=1}^n \left(\frac{1}{2} + \ldots + \frac{1}{n}\right) \\  &\le 2n \ln n.  \end{aligned}
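
We can check the 2n \ln n bound empirically by instrumenting the partition step to count comparisons; in each call, every non-pivot element is compared against the pivot once. The sketch below (ours, not from the notes) averages over random permutations:

```python
import math
import random

def quicksort_count(a):
    """Return (sorted list, number of element-vs-pivot comparisons)."""
    if len(a) <= 1:
        return a, 0
    p = random.choice(a)
    comparisons = len(a) - 1  # each remaining element is compared with the pivot
    less, equal, greater = [], [], []
    for x in a:
        if x < p:
            less.append(x)
        elif x > p:
            greater.append(x)
        else:
            equal.append(x)
    left, c_left = quicksort_count(less)
    right, c_right = quicksort_count(greater)
    return left + equal + right, comparisons + c_left + c_right

n = 1000
trials = 200
total = 0
for _ in range(trials):
    arr = random.sample(range(n), n)  # a random permutation of 0..n-1
    total += quicksort_count(arr)[1]
print(f"average comparisons = {total / trials:.0f}, "
      f"bound 2 n ln n = {2 * n * math.log(n):.0f}")
```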
