Sorting and Searching
Searching data involves determining whether a value (referred to as the
search key) is present in the data and, if so, finding the value’s location.
◦ Two popular search algorithms are the simple linear search and the faster but
more complex binary search
Sorting places data in ascending or descending order, based on one or
more sort keys.
◦ You’ll learn about insertion sort and selection sort, as well as the more
efficient but more complex merge sort
◦ This chapter also introduces Big O notation, which is used to characterize
an algorithm’s worst-case runtime, that is, how hard an algorithm may
have to work to solve a problem.
Looking up a phone number, accessing a website and
checking a word’s definition in a dictionary all involve
searching through large amounts of data.
Searching algorithms all accomplish the same goal—
finding an element that matches a given search key, if
such an element does, in fact, exist.
The major difference is the amount of effort they
require to complete the search.
One way to describe this effort is with Big O notation.
◦ For searching and sorting algorithms, this is particularly
dependent on the number of data elements.
Function Template linearSearch
Function template linearSearch (Fig. 20.2, lines 10–18)
compares each element of an array with a search key (line 14).
Because the array is not in any particular order, it’s just as likely that
the search key will be found in the first element as the last.
On average, therefore, the program must compare the search key with
half of the array’s elements.
To determine that a value is not in the array, the program must
compare the search key to every array element.
Linear search works well for small or unsorted arrays.
However, for large arrays, linear searching is inefficient.
If the array is sorted (e.g., its elements are in ascending order), you
can use the high-speed binary search technique.
Big O: Constant Runtime
Suppose an algorithm simply tests whether the first element of an array is
equal to the second element.
If the array has 10 elements, this algorithm requires only one comparison.
If the array has 1000 elements, the algorithm still requires only one
comparison.
In fact, the algorithm is independent of the number of array elements.
This algorithm is said to have a constant runtime, which is represented in Big
O notation as O(1).
An algorithm that is O(1) does not necessarily require only one comparison.
O(1) just means that the number of comparisons is constant—it does not grow
as the size of the array increases.
An algorithm that tests whether the first element of an array is equal to any
of the next three elements will always require three comparisons, but in Big O
notation it’s still considered O(1).
O(1) is often pronounced “on the order of 1” or more simply “order 1.”
Big O: Linear Runtime
An algorithm that tests whether the first element of an array is
equal to any of the other elements of the array requires at
most n – 1 comparisons, where n is the number of elements in the
array.
If the array has 10 elements, the algorithm requires up to nine
comparisons.
If the array has 1000 elements, the algorithm requires up to
999 comparisons.
As n grows larger, the n part of the expression n – 1 “dominates,”
and subtracting one becomes inconsequential.
Big O is designed to highlight these dominant terms and ignore
terms that become unimportant as n grows.
An algorithm that requires a total of n – 1 comparisons
is said to be O(n) and is referred to as having a linear
runtime.
O(n) is often pronounced “on the order of n” or more
simply “order n.”
Binary Search
The binary search algorithm begins by testing the middle
array element.
If this matches the search key, the algorithm ends.
Assuming the array is sorted in ascending order, then if the search
key is less than the middle element, the search key cannot match any
element in the array’s second half so the algorithm continues with
only the first half (i.e., the first element up to, but not including, the
middle element).
If the search key is greater than the middle element, the search key
cannot match any element in the array’s first half so the algorithm
continues with only the second half (i.e., the element after the middle
element through the last element).
Each iteration tests the middle value of the array’s remaining
elements.
If the element does not match the search key, the algorithm eliminates
half of the remaining elements.
The algorithm ends either by finding an element that matches the
search key or by reducing the sub-array to zero size.
Binary Search Example
Figure 20.3 implements and demonstrates the binary-search algorithm.
Throughout the program’s execution, we use function template
displayElements (lines 11–22) to display the portion of the array
that’s currently being searched.
Lines 25–56 define function template
binarySearch, which has two parameters—a
reference to the array to search and a reference to the
search key.
Lines 28–30 calculate the low-end index, high-end index and middle
index of the portion of the array currently being searched.
Because each comparison eliminates half of the remaining elements,
searching a sorted array of n elements requires at most log2 n
comparisons.
In Big O notation, smaller terms drop out and constants are ignored,
so binary search has a logarithmic runtime of O(log n).
Efficiency of Merge Sort
Merge sort is a far more efficient algorithm than either insertion sort or
selection sort—although that may be difficult to believe when looking at the
busy output in Fig. 20.6.
Consider the first (nonrecursive) call to function mergeSort (line 120).
This results in two recursive calls to function mergeSort with sub-
arrays that are each approximately half the original array’s size, and a
single call to function merge.
The call to merge requires, at worst, n – 1 comparisons to fill the original
array, which is O(n).
The two calls to function mergeSort result in four more recursive calls
to function mergeSort—each with a sub-array approximately one-
quarter the size of the original array—and two calls to function merge.
These two calls to function merge each require, at worst, n/2 – 1
comparisons, for a total number of comparisons of O(n).
This process continues, each call to mergeSort
generating two additional calls to mergeSort and a call to
merge, until the algorithm has split the array into one-
element sub-arrays.
At each level, O(n) comparisons are required to merge the
sub-arrays.
Each level splits the size of the arrays in half, so doubling
the size of the array requires one more level.
Quadrupling the size of the array requires two more levels.
This pattern is logarithmic and results in log2 n levels.
This results in a total efficiency of O(n log n).
Summary of Searching and Sorting Algorithm Efficiencies
Figure 20.7 summarizes the searching and sorting algorithms
discussed in this chapter and lists the Big O for each.