16 Search and Sort
16 Search and Sort
Hello!
The main topics for this chapter are searching and sorting data. We'll limit our discussion to arrays for
searching and sorting, so that our data are linear (it's much simpler to discuss the ideas of searching and
sorting, but we could also search and sort a tree structure of data, for instance).
So, although we present some pretty simple algorithms for sorting and searching, keep in mind that we are only
showing quite simple sorts; there is quite intensive study into what kind of algorithms perform best in which
situations. That said, we still want to be able to manage smaller sets of data, even if our small projects or toy
online database of phonebook entries can’t be macro-scaled to millions of entries without betraying a bit of
naiveté. So let’s get sorting, and let’s start searching!
Getting Started
Grab the two files, SearchSortComplete.java, and SearchSortIncomplete.java, to your machine and open
them (they are also pasted at the end of this document, though cut-pasting from a pdf is dicey).
https://cs.gmu.edu/~msnyde14/211/share/SearchSortComplete.java
https://cs.gmu.edu/~msnyde14/211/share/SearchSortIncomplete.java
You will be implementing some of the method 'stubs' throughout the chapter in the *Incomplete version, but
you will have the *Complete version for testing and comparison's sake. This lab will be slightly more about
observing code than writing it: we maybe saw some of these examples in class, and we'll just learn about a
couple more algorithms today.
There are a few things to notice about this class.
First of all, scroll down to the bottom and notice there a few 'helper' functions—
printArray, getChoice (two versions), printMenu, and displaySearchResult. This just helps us to
organize our basic testing and exploring harness. In the main method, using these methods makes the
code in the main method much shorter, and more descriptive. Instead of seeing a loop, and some
sc.nextFoo() stuff, we see getChoice; instead of a lot of print statements, we see printMenu.
Notice the two pairs of functions—the two overloadings of binarySearch, and the two overloadings of
quickSort. The versions with more parameters are called by the others: the simpler ones (fewer
parameters) are called "driver functions". All they do is take a request, make a start-up call to the 'real'
method with any initial values for the additional parameters, and pass any results through. We could
just as well have put the extra parameters (usually a zero or an x.length-1 or something) in our call from
the main method, but driver methods are one more way we can abstract details away from our code.
Notice the class variable debug; We make this variable at the very beginning, and then anywhere we
want to print things out in testing mode, we conditionally test if debug==true. Taking a quick look
inside a method like bubbleSort will exemplify this. Maybe using the debugger is a more effective way
to view the state, but if you want to take print-style debugging to the next level, this is one approach.
Your Turn!
You should be comfortable writing a linear search algorithm by now, so try implementing the body of
linearSearch without looking at your notes or the solution.
Test your code: noting the default array values created in main, run the program and plug in a key to
search for. Search for values that are:
in the array
not in the array
in the array multiple times
→ what would be some good unit tests for these? It's a good chance to practice your
testing skills if you've got the time.
Compare your code against the solution.
Each time we perform that swapping-traversal, one more almost-as-large value is guaranteed to have bubbled
up to its correct location; if we have an array of length n, and we make (n-1) traversals, then we know n
elements are in order. Thus all the elements are in order. So the outer loop must run as many as (n-1) times in
the worst case.
Bubble Sort is the simplest sort; we might bash it in class, because it does far more swaps than
necessary. Notice that, in sorting a particular spot, we might swap many of the values into the sorting spot, if
we keep on finding smaller values to the right.
Your Turn!
Try implementing bubble sort on your own, without looking at the solution.
Using the completed code, turn our debugging print statements on (set the class variable debug equal to
true); run bubble sort and see how it does a lot of swaps. Reset the array, and compare to Insertion Sort
(using the complete code's version so that there's an implementation there).
Now, try placing different values (and more of them) in the array—specifically,
in decreasing order. Now see how inefficient bubbleSort can be!
Going Further. We can make many improvements to the provided bubble sort implementation:
Make the inner for-loop smarter: if we've completed k traversals, we know the last k elements
are sorted; let's end not when i is less than the array length, but when it has entered the "already
sorted" zone.
Change the outer for-loop to a while loop. When should it quit? When no swaps were necessary
during the last walk. (Use a boolean variable to track this).
Your Turn!
Going Further: Implement and test your own binary search, trying not to look at the solution (or class
notes) in the process. This lends well to a recursive definition, but does not have to be.
Your Turn!
Try running this algorithm just like linear search, without sorting first. Notice you can get some buggy
results—search for a ‘6’ in the array originally listed, and the program crashes!
A binary search needs sorted data. Let’s fix the example:
Add a class variable, a boolean named isSorted. (This must be static; why?)
Go ahead and initialize it to false right there, on the same line. (Reminder: we must always initialize
on the same line for class variables—things at the class level don’t need an object, so we can't rely
on some constructor call to instantiate it).
Now, in the binarySearch method, at the very beginning, add an if-statement that checks
if isSorted is false, and if so run a sort algorithm and set isSorted to true (take your pick of the
various sorting methods in our program).
We don’t have any other searches in our lab that require a sorted array, so we don’t have to fix this
anywhere else. However, if we copy over the old array, we need to reset this variable to false. Go
to the main method, and look for the option that resets the array to the copy. Immediately following
that code, set isSorted to false. Now, our binarySearch will conditionally sort the information, if
necessary, and won’t spend all that time if it’s already sorted! This is especially good because many
sorting algorithms are pathologically slow when working on mostly- or completely-sorted
data. Always sorting first would be an expensive waste of time!
The first iteration, we make sure the first element by itself is sorted. Well, yes, of course it is. So really, the first
step is to add one more unsorted location to our already sorted list of length one, meaning we consider the first
two items. If necessary, swap them.
Your Turn!
Try implementing insertion sort. Notice that we still will likely have two for-loops: one inner one that
handles inserting the next item into our sorted portion of the array, and one outer loop that guides us to
adding each unsorted element to the sorted portion, one at a time.
Okay, one way or another, let's now take a look at the solution:
Notice that the inner loop counts down through indexes, starting as high as the outer loop tells us the
next unsorted location is. If you had an array drawn on paper, try tracing your finger over indexes that
correspond to j's values.
After traversing down through our sorted array and swapping our unsorted element as far down as
necessary, this additional value is now in its sorted place. Once it no longer needs to be swapped, it
won't have to go any further so we can quit the inner loop.
Merge Sort is a recursive algorithm, as implemented. This means we have to check for base cases first, and
then complete the inductive (recursive) cases.
Your Turn!
If you're up for the challenge, try implementing merge sort!
o First, create two sub-lists by splitting the list in half (make two new arrays).
o Then, recursively call merge sort on each.
o Then, recreate a list of the original length by always copying in the smallest value from either list
that hasn't yet been included. (Relying on their being sorted makes this simple, though you might
have a many-else-ifs block of code).
Okay, let's discuss the provided solution. It uses arrays instead of ArrayLists (which Snyder's in-class examples
showed), so that you can compare the two implementations.
This sort first creates two halves from the given array. Then, it calls mergeSort on both halves: we now have
two smaller sorted arrays, and we just need to combine them into one larger sorted array. Look at the code for
that zipping-together portion, and notice it basically checks if the value should come from the next in the first
half, the next in the second half, or the smallest of the next in each.
The reason this works is that lists with only one element in them get returned—they’re already sorted—
and from this base case, we use the zipping portion to merge two sorted sub-arrays together. So, what
this algorithm really does is break the array up into individual spots, then merge them all back together,
two arrays at a time, until we get the entire set of values back in one array.
Note that, especially with the creation of so many arrays all over the place, this is not going to be a very
efficient implementation: it uses up a lot of extra space storing all these 'copies' of the array in various
levels of separation. An in-place sort (which reuses the current locations in the array instead of creating
copies of those values somewhere else) would be much more space-efficient, though it would probably
mean more work shuffling values around in that limited space.
Your Turn!
Try running this sort, as well as in debug mode. It prints a lot, but you can sort of see the two steps—
splitting and merging—occurring, just by the width of the left and right arrays being used.
Your Turn!
Try turning on debugging to see how calls to quickSort operate.
What happens when two values are identical?
How does quickSort fare when the list is already sorted? When it's sorted the wrong way? (Try
starting with values such as {8,7,6,5,4,3,2,1}).
Closing Thoughts on Search & Sort
That's it for searching and sorting for us! Hopefully you now understand what the purpose of searching and
sorting is, at least for linear data structures like arrays. In later classes and projects, you will have larger and
more complex data structures, but the ideas of searching and sorting will constantly show up: searching through
those structures for specific parts (values); trying to organize unsorted data; or, even trying to maintain data in a
sorted fashion while adding elements to it one at a time.
We've looked at just a couple of ways to search through data. A linear search was very simple, and needed
nothing more than a way to inspect every location in some sequential order. But if we knew we had ordered
data, we could do much better with binary search: each inspection allows us to throw out half the remaining
search space each time.
We considered a few more ways to sort data, with increasing complexity and different approaches. Bubble sort
is very simple, but has pretty poor performance (e.g., it performs many more swaps on average than the other
sorts). Insertion sort keeps growing a sorted list at the front of the list, repeatedly sliding the next unsorted
value into our sorted portion until it fits. It performs a bit better than bubble sort. Next we considered merge
sort, a divide-and-conquer algorithm, which can easily be defined by recursion (but doesn't have to be). It
successively splits the list down into sub-lists until they are of size 1 (sorted by definition), and then merges
them back together, relying on the simplicity of combining two already-sorted lists. Lastly, we looked at quick
sort, which chooses a pivot value, puts all the smaller-than-pivot values on the left, all the larger-than-pivot
values on the right, and the pivot between them; it then recursively quicksorts the left and right, and is done.
These versions of sorting had different behaviors – some used lots of space (merge sort), some worked in place
(the other three). Some performed better on nearly-sorted data than others.
There are many ways to improve upon the searching and sorting that we've introduced here. What can you find
out about these two ubiquitous programming tasks?
SearchSortIncomplete.java
import java.util.*;
while (!quit) {
printMenu();
/*
Sorting Methods Section.
*/
/**
* Searching Methods Section.
*/
if (i < 10)
indices += " ";
if (i < 100)
indices += " ";
indices += " " + i;
}
System.out.println(vals + "\n" + indices);
}
}
SearchSortComplete.java
import java.util.*;
while (!quit) {
printMenu();
/*
Sorting Methods Section.
*/
//base cases
if (a.length <= 1) {
return;
}
//make two arrays. secondHalf might be one-longer than firstHalf for odd-length arrays.
int[] firstHalf = new int[halfnum];
int[] secondHalf = new int[a.length - halfnum];
if (debug) {
System.out.println("to merge: " + a.length);
printArray(a);
printArray(firstHalf);
printArray(secondHalf);
}
if (debug) {
System.out.println("having merged: " + a.length);
printArray(a);
printArray(firstHalf);
printArray(secondHalf);
}
/**
* merge the two halves back together. If you think of the smallest
* cases, this part actually does the sorting!
*/
int ifst = 0;
int isnd = 0;
if (debug) {
System.out.println("\tending a length: " + a.length);
printArray(a);
}
}
/*
Searching Methods Section.
*/
/**
* Helpful Functions Section. These all make the main method more
* "streamlined."
*/
if (i < 10)
indices += " ";
if (i < 100)
indices += " ";
indices += " " + i;
}
System.out.println(vals + "\n" + indices);
}