Geometric Random Variables
Flip a coin, which will give
- heads with probability
- tails with probability
Let # of flips until the first heads (including the flip with the heads)
We denote the above distribution as Geom, i.e. a geometric distribution with parameter . Let .
What is ? If we consider , then we would expect . For general , we would expect , and we will now prove this fact.
Suppose we tried to directly evaluate the expectation:
We could try to evaluate the above sum and eventually arrive at the correct answer, or we could view another way. Consider the first flip. If it’s heads, then we’re done and only flip once. If it’s tails, then we repeat the process, thus flipping times in expectation.
and solving the above for gives .
Linearity of Expectation
Here we give a proof of linearity of expectation.
Let be random variables and take values in (though you can use the same proof for any domain). Note that and may not be independent. First, observe the following fact:
We can now see that
Suppose we have coupons in an urn. Consider the following process:
- Choose a coupon from the urn at random and look at it.
- Put the coupon back in the urn.
Let # of steps until we see all coupons at least once.
What is ?
Consider defining as the number of steps to see the -th different coupon once we have seen different coupons. So is the time to see the first coupon, is the time to see the second coupon after seeing the first coupon, etc.
Then, we can see that . By linearity of expectation, we have
So what is ?
We have seen coupons, and not seen coupons.
So, the probability that we see a new coupon is .
And is the time until this event of seeing a new coupon occurs. Thus is a geometric random variable with parameter . Thus,
Plugging this back into the equation for , we see that
Proof We’ll show that
which will complete the proof.
To see the earlier claimed upper bound, consider the following function . Note that for the function consists of rectangles of area , and thus .
Now consider the function: . Notice that and thus . Therefore,
Thus we get and .
We can also see a lower bound of by a similar argument. We draw rectangles of area , and we lower bound the area of the rectangles by .
Expected Runtime of QuickSort
Input: unsorted array of numbers
- Choose a random pivot .
- Partition into .
- Recursively sort .
- Return .
In the worst case, the above algorithm could take time, e.g. every time we select the smallest element as the pivot. If the pivot was the median element, then we get
What is the runtime when we select at random? We will examine the expected number of comparisons of randomized QuickSort.
Let # of comparisons for QuickSort.
Let be a sorted version of , so . If there are multiple sorted versions of , we simply let be one of them.
For , let be the number of comparisons between and . And so
and again by linearity of expectation
Note that in QuickSort, two elements will be compared at most once. So and further
What is the probability that compare? Consider . For to compare, we need one of them to be a pivot WHILE they are in the subproblem. Therefore, we further need that one of to be a pivot before ANY element of .
So, the probability that is a pivot before all elements between them is . Therefore,
and combining all terms together in the above equation for , we see that