1 Quicksort
1.1 Main topics in sorting
1.2 Quicksort Implementation
1.3 It’s all in how you partition it...
1.4 And now we see why shuffle is necessary
1.5 Weakness in this partition
1.6 Daily Exercise
1.7 Homework1 Feedback
1.8 Homework2 Instructions
1.9 Tilde Explanation
1.10 Lecture Key Topics
1.11 Version : 2015/11/10

CS 2223 Nov 09 2015

Lecture Path: 08

Expected reading: 288-301
Daily Exercise:

Thou dost lie in’t,
to be in’t and say it is thine.
’Tis for the dead,
not for the quick;
therefore thou liest.
William Shakespeare

1 Quicksort

1.1 Main topics in sorting

We have made some major progress in analyzing the sorting problem. In the last lecture we were able to assert that Mergesort is an asymptotically optimal compare-based sorting algorithm.

We made this claim because we demonstrated that any comparison-based sorting algorithm requires ~ N log N comparisons to correctly sort an array of N elements.

The logic was based on identifying a recursive formula for the number of comparisons required. We came up with the following formula, which bounds the maximum number of comparisons that would be used to sort N elements.

C(N) <= C(N/2) + C(N/2) + N
C(N) <= 2*C(N/2) + N

Be sure you understand the logic as presented in lecture, which is supported by the book; review the class capture if that would be helpful.
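
As a quick sanity check on the recurrence above, the following sketch evaluates it for powers of two and prints the result next to N log2 N. It treats the bound as an equality and assumes a base case of C(1) = 0, which is not stated above; the class and method names are made up for illustration.

```java
// Sketch: evaluate the Mergesort upper-bound recurrence for powers of two.
// Assumption (not stated in the notes): the base case is C(1) = 0.
final class MergeRecurrence {

    // C(N) = 2*C(N/2) + N with C(1) = 0, taking the bound as an equality.
    static long c(long n) {
        if (n <= 1) return 0;
        return 2 * c(n / 2) + n;
    }

    public static void main(String[] args) {
        for (int k = 1; k <= 20; k++) {
            long n = 1L << k;              // N = 2^k
            long closedForm = n * k;       // N * log2(N)
            System.out.println(n + "\t" + c(n) + "\t" + closedForm);
        }
    }
}
```

For each power of two the last two columns should agree, which is the N log N behavior claimed for Mergesort.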

1.2 Quicksort Implementation

At this point, you might ask why we should continue investigating sorting as a problem. Well, the first issue is that MergeSort still required ~ N extra storage with which to work. Is it possible to eliminate the extra overhead? It is, and the surprisingly brief implementation might stun you.

Find this implementation in the Git repository algs.days.day08.

public static void sort(Comparable[] a) {
    shuffle(a);
    sort(a, 0, a.length - 1);
}

// quicksort the subarray from a[lo] to a[hi]
private static void sort(Comparable[] a, int lo, int hi) {
    if (hi <= lo) return;
    int j = partition(a, lo, hi);
    sort(a, lo, j-1);
    sort(a, j+1, hi);
}

Once again, you will note there are no comparisons or exchanges, so the real work appears to be done in partition. Just given the information above, let’s see if we can reverse engineer what the method is supposed to do.

It is instructive to compare this method with MergeSort:

static void mergesort(Comparable[] a, int lo, int hi) {
    if (hi <= lo) return;
    int mid = lo + (hi - lo)/2;    // midpoint splits a[lo..hi] in half
    mergesort(a, lo, mid);
    mergesort(a, mid+1, hi);
    merge(a, lo, mid, hi);
}

In MergeSort, both halves of the array are sorted, and then merged into place using auxiliary storage.

In Quicksort, an item in a[lo..hi] is found and put into its final spot at a location identified by j. Let’s assume for a moment that this is even possible. Then, all you need to do is sort the left side a[lo..j-1] and then the right side a[j+1..hi].

Now, if partition reliably finds a value "near the middle" of the array, then the left and right sort subproblems will each be more or less half the size of the original problem, and we will be able to demonstrate an asymptotically optimal compare-based sorting algorithm.

1.3 It’s all in how you partition it...

The speed and elegance of Quicksort come from the partition method. Numerous methods have been proposed (there are at least four that I know of). The book lists one (p. 290) that is clean and easy to visualize. It also has some nice features that take advantage of duplicate values that might exist in the array.

First the intuition: Pick one of the existing values in a[lo..hi] and try to find where it belongs. For convenience, start with the value in a[lo]. Now, imagine sweeping through the array with two index values. Index i starts at lo and advances towards hi, while another index j starts at hi+1 and descends towards lo. Until these indices "cross each other" you try to find two elements that can be swapped with each other.

Let’s try to identify some truths in the following annotated code. The handout has the clean code, so the following is a bit verbose but it has everything you need to know.

private static int partition(Comparable[] a, int lo, int hi) {
    int i = lo;
    int j = hi + 1;
    Comparable v = a[lo];

    // Truth: (1) items in a[j .. hi] all >= v. EMPTY RANGE
    // Truth: (2) items in a[lo .. i] all <= v. IDENTITY
    while (true) {
        // find item on lo to swap. Note pre-increment
        while (less(a[++i], v)) {
            if (i == hi) break;
        }
        // Truth: either (1) a[i] >= v or (2) done with array

        // find item on hi to swap. Note pre-decrement
        while (less(v, a[--j])) {
            if (j == lo) break;
        }
        // Truth: either (1) v >= a[j] or (2) done with array

        // check if pointers cross
        if (i >= j) break;

        exch(a, i, j);
        // Truth: (1) items in a[j .. hi] all >= v.
        // Truth: (2) items in a[lo .. i] all <= v.
    }

    // Truth: a[j] is <= a[lo] BECAUSE pointers crossed

    // put partitioning item v at a[j]
    exch(a, lo, j);

    // now, a[lo .. j-1] <= a[j] <= a[j+1 .. hi]
    return j;
}

These ten lines of code pack a number of complicated logical conditions into a small space. You truly need to work out a number of examples BY HAND to make sure you are able to follow the logic. Page 291 of the book has a long example and I encourage you to review that example.

Let’s try our hand at partitioning the following seven-member array:

egg   fly   ant   cat   dog   get   bat
lo                                  hi

Start as follows (i begins at lo, and j begins one position past hi):

egg   fly   ant   cat   dog   get   bat
i                                         j
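
To check a hand trace of this example, here is a minimal sketch that runs the partition logic from the annotated code on the seven-member array and prints the array after every exchange. The class name, the String element type, the printing, and the less and exch helpers are illustrative assumptions; only the partition logic itself comes from the code above.

```java
import java.util.Arrays;

// Sketch: the partition method, instrumented to print each exchange.
// PartitionTrace, less, and exch are illustrative helpers.
final class PartitionTrace {

    static boolean less(String v, String w) { return v.compareTo(w) < 0; }

    static void exch(String[] a, int i, int j) {
        String t = a[i]; a[i] = a[j]; a[j] = t;
    }

    static int partition(String[] a, int lo, int hi) {
        int i = lo, j = hi + 1;
        String v = a[lo];
        while (true) {
            while (less(a[++i], v)) { if (i == hi) break; }   // scan right
            while (less(v, a[--j])) { if (j == lo) break; }   // scan left
            if (i >= j) break;                                // pointers crossed
            exch(a, i, j);
            System.out.println("exch(" + i + "," + j + "): " + Arrays.toString(a));
        }
        exch(a, lo, j);                                       // place v at a[j]
        System.out.println("exch(" + lo + "," + j + "): " + Arrays.toString(a));
        return j;
    }

    public static void main(String[] args) {
        String[] a = {"egg", "fly", "ant", "cat", "dog", "get", "bat"};
        System.out.println("start:     " + Arrays.toString(a));
        int j = partition(a, 0, a.length - 1);
        System.out.println("returned j = " + j);
    }
}
```

With lo = 0, the trace shows one exchange inside the loop (fly and bat, at positions 1 and 6), then the final exchange that places egg at position 4, leaving dog bat ant cat egg get fly with j = 4 returned.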










1.4 And now we see why shuffle is necessary

Quicksort is dependent on the selection of the partitioning element. As you will see on page 295, if the partitioning only reduces the problem size by 1 then the total number of compares will be ~ N^2/2, which was exactly the case for Selection Sort.

So the trick is to ensure that the value chosen as the partitioning element is close to the median value, that is, the value in the list which has as many items smaller than or equal to it as it has items greater than or equal to it.

You might think that you need to sort a collection to find its median value, but it turns out there is an approach you can use to determine it in time directly proportional to the number of elements in the array. However, it is not used in practice because of its complexity.

The advanced mathematics alluded to in the book shows that, "on average", the partition will split the array about in half, and in the long run this leads to a number of comparisons that is roughly 40% above optimum (~ 2N ln N, which is about 1.39 N lg N).
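
To see the effect of the shuffle concretely, the sketch below counts compares when the Quicksort from this lecture is given an array that is already in sorted order, once without and once with an initial shuffle. The compare counter, the Integer element type, the fixed random seed, and the Knuth-shuffle helper are illustrative assumptions and are not taken from the repository code.

```java
import java.util.Random;

// Sketch: count compares on sorted input, with and without the shuffle.
final class ShuffleDemo {
    static long compares = 0;   // illustrative instrumentation

    static boolean less(Integer v, Integer w) {
        compares++;
        return v.compareTo(w) < 0;
    }

    static void exch(Integer[] a, int i, int j) {
        Integer t = a[i]; a[i] = a[j]; a[j] = t;
    }

    // Knuth (Fisher-Yates) shuffle; stands in for the shuffle(a) call.
    static void shuffle(Integer[] a, Random rnd) {
        for (int i = a.length - 1; i > 0; i--) exch(a, i, rnd.nextInt(i + 1));
    }

    static int partition(Integer[] a, int lo, int hi) {
        int i = lo, j = hi + 1;
        Integer v = a[lo];
        while (true) {
            while (less(a[++i], v)) { if (i == hi) break; }
            while (less(v, a[--j])) { if (j == lo) break; }
            if (i >= j) break;
            exch(a, i, j);
        }
        exch(a, lo, j);
        return j;
    }

    static void sort(Integer[] a, int lo, int hi) {
        if (hi <= lo) return;
        int j = partition(a, lo, hi);
        sort(a, lo, j - 1);
        sort(a, j + 1, hi);
    }

    // Sorts 0..n-1 (initially in order) and returns the compares used.
    static long countCompares(int n, boolean doShuffle) {
        Integer[] a = new Integer[n];
        for (int i = 0; i < n; i++) a[i] = i;
        if (doShuffle) shuffle(a, new Random(42));   // fixed seed, repeatable
        compares = 0;
        sort(a, 0, n - 1);
        return compares;
    }

    public static void main(String[] args) {
        int n = 2000;
        System.out.println("no shuffle: " + countCompares(n, false));
        System.out.println("shuffled:   " + countCompares(n, true));
        System.out.println("N^2/2:      " + (long) n * n / 2);
    }
}
```

Without the shuffle, every partition peels off just one element, so the count lands close to N^2/2; with the shuffle it should be far smaller, on the order of N log N.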

1.5 Weakness in this partition

Review what happens when all elements are the same. That is, given an array of seven identical elements, how does partition operate? And what value of j is returned?

the   the   the   the   the   the   the
lo                                  hi

Start by selecting a[lo] as the element v to place, set i to lo and j to be one greater than hi:

the   the   the   the   the   the   the
i                                         j

Remember that the inner while loops use pre-increment for i and pre-decrement for j. In both cases, the loop condition is false because all values are the same; however, i has still been incremented and j has still been decremented. When we get to the exch within the outer while loop, the two elements are exchanged, even though they hold the same value. The resulting state is as follows:

the   the   the   the   the   the   the
      i                             j

This will repeat two more times, each pass moving i and j one position closer together; on the pass after that, i and j cross paths, thus terminating the loop. Naturally, during all this time, these exchanges are wasted.

But ask yourself whether it is worth your time to add an additional comparison just to ensure that the elements being swapped are different. Since you are trying to reduce the number of comparisons, it turns out not to help.
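
The all-equal walkthrough above can be checked with a small sketch that runs the same partition logic on seven copies of "the" and counts calls to exch. The exchange counter, the String element type, and the helper names are illustrative assumptions.

```java
// Sketch: partition seven equal keys and count the exchanges performed.
final class EqualKeysDemo {
    static int exchanges = 0;   // illustrative instrumentation

    static boolean less(String v, String w) { return v.compareTo(w) < 0; }

    static void exch(String[] a, int i, int j) {
        exchanges++;
        String t = a[i]; a[i] = a[j]; a[j] = t;
    }

    static int partition(String[] a, int lo, int hi) {
        int i = lo, j = hi + 1;
        String v = a[lo];
        while (true) {
            while (less(a[++i], v)) { if (i == hi) break; }   // stops at once on equal keys
            while (less(v, a[--j])) { if (j == lo) break; }   // stops at once on equal keys
            if (i >= j) break;
            exch(a, i, j);                                    // swaps two equal values
        }
        exch(a, lo, j);
        return j;
    }

    public static void main(String[] args) {
        String[] a = {"the", "the", "the", "the", "the", "the", "the"};
        int j = partition(a, 0, a.length - 1);
        System.out.println("returned j = " + j + ", exchanges = " + exchanges);
    }
}
```

With lo = 0 this reports j = 3 and 4 exchanges (three inside the loop plus the final one that places v). The exchanges are wasted, but because both scans stop on equal keys, the returned j sits in the middle of the array, so the two subproblems stay balanced.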

1.6 Daily Exercise

By now you have heard me state on a number of occasions that you need at least N-1 comparisons to locate the largest element within a collection of N elements.

Consider the following recursive solution for determining the maximum element in an array.

static Comparable largest (Comparable[] a, int lo, int hi) {
    if (hi <= lo) { return a[lo]; }

    int mid = lo + (hi - lo)/2;
    Comparable maxLeft = largest (a, lo, mid);
    Comparable maxRight = largest (a, mid+1, hi);

    if (less(maxLeft, maxRight)) {
        return maxRight;
    } else {
        return maxLeft;
    }
}

Using the same approach used to count comparisons in MergeSort, write an equation to compute C(N), the number of comparisons, and solve the equation assuming N = 2^n, that is, a power of two.

Note: this could be an exam question, but I need to prepare students more for this to be a reality.
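
Once you have a closed form for C(N), one way to check it empirically is to instrument less with a counter and run largest on arrays whose sizes are powers of two. The sketch below does that; the counter, the Integer element type, the countFor helper, and the test data are illustrative assumptions.

```java
// Sketch: the recursive largest method with a compare counter added.
final class LargestCount {
    static int compares = 0;   // illustrative instrumentation

    static boolean less(Integer v, Integer w) {
        compares++;
        return v.compareTo(w) < 0;
    }

    static Integer largest(Integer[] a, int lo, int hi) {
        if (hi <= lo) { return a[lo]; }
        int mid = lo + (hi - lo) / 2;
        Integer maxLeft = largest(a, lo, mid);
        Integer maxRight = largest(a, mid + 1, hi);
        return less(maxLeft, maxRight) ? maxRight : maxLeft;
    }

    // Returns the number of compares used to find the largest of n values.
    static int countFor(int n) {
        Integer[] a = new Integer[n];
        for (int i = 0; i < n; i++) a[i] = (i * 37) % n;   // arbitrary data
        compares = 0;
        largest(a, 0, n - 1);
        return compares;
    }

    public static void main(String[] args) {
        for (int n = 2; n <= 1024; n *= 2) {
            System.out.println("N = " + n + ", compares = " + countFor(n));
        }
    }
}
```

Compare the printed counts against your solved equation for each N.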

1.7 Homework1 Feedback

To retrieve your homework feedback, visit the following web page:

http://www.wpi.edu/~heineman/cs2223/

1.8 Homework2 Instructions

I have to make some more detailed suggestions to the UniqueBag problem. Note, however, that you have enough to go on as it stands. I made two slight changes to the code I provided, so you should make these changes if you have already copied the old class file into your workspace:

1.9 Tilde Explanation

This notation was introduced on page 179 of the book. It is meant to approximate a more complicated function whose lower-order terms can be ignored as N grows.

As another example, assume f(n) = n^3/2 - 950n. In the long run, the lower-order term 950n doesn’t matter with regard to the overall growth of this function, and it can be approximated with the Tilde approximation:

~ n^3/2

This concept is bound to the notion of Order of growth. You can start with the Tilde approximation and eliminate constants, because you want to know how the performance will change, for example, as the problem size doubles.

In the above case, the Order of Growth is n^3.
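
A small numeric sketch can make both ideas concrete: the ratio of f(n) to its Tilde approximation approaches 1 as n grows, and doubling n multiplies the approximation by 8, which is what an Order of Growth of n^3 predicts. The class and method names are made up for illustration.

```java
// Sketch: compare f(n) = n^3/2 - 950n with its Tilde approximation n^3/2.
final class TildeDemo {

    static double f(double n) { return n * n * n / 2 - 950 * n; }

    static double tilde(double n) { return n * n * n / 2; }

    public static void main(String[] args) {
        for (double n = 10; n <= 1e6; n *= 10) {
            // The ratio tends to 1 as the lower-order term stops mattering.
            System.out.println("n = " + n + "  f(n)/tilde(n) = " + f(n) / tilde(n));
        }
        // Doubling the problem size multiplies a cubic function by 2^3 = 8.
        System.out.println("tilde(2000)/tilde(1000) = " + tilde(2000) / tilde(1000));
    }
}
```

For small n the ratio is far from 1 (it is even negative at n = 10, where 950n dominates), which is why the approximation is only meaningful as n grows.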

1.10 Lecture Key Topics

1.11 Version : 2015/11/10

(c) 2015, George Heineman