An Intuitive Introduction To Data Structures
Version 2.0

Licensed under a Creative Commons Attribution-Noncommercial-Share Alike 4.0 Unported License
Preface
This book covers standard topics in data structures including running time
analysis, dynamic arrays, linked lists, stacks, queues, recursion, binary trees,
binary search trees, heaps, hashing, sets, maps, graphs, and sorting. It is
based on Data Structures and Algorithms classes I've taught over the years.
The first time I taught the course, I used a bunch of data structures and
introductory programming books as reference, and while I liked parts of all the
books, none of them approached things quite in the way I wanted, so I
decided to write my own book. I originally wrote this book in 2012. This
version is substantially revised and reorganized from that earlier version. My
approach to topics is a lot more intuitive than it is formal. If you are looking
for a formal approach, there are many books out there that take that
approach.
I've tried to keep explanations short and to the point. When I was learning this
material, I found that just reading about a data structure wasn't helping the
material to stick, so I would instead try to implement the data structure
myself. By doing that, I was really able to get a sense for how things work.
This book contains many implementations of data structures, but it is
recommended that you either try to implement the data structures yourself or
work along with the examples in the book.
There are a few hundred exercises. They are all grouped together in Chapter
12. It is highly recommended that you do some of the exercises in order to
become comfortable with the data structures and to get better at
programming.
If you spot any errors, please send me a note at heinold@msmary.edu.

Chapter 1. Running times of algorithms
long total = 0;
for (long i = 1; i <= n; i++)
    for (long j = 0; j < i; j++)
        total++;
Any of these algorithms would work fine if we are just adding up numbers
from 1 to 100. But if we are adding up numbers from 1 to a billion, then the
choice of algorithm would really matter. On my system, I tested out adding
integers from 1 to a billion. The first algorithm took about 1 second to add up
everything. The next algorithm returned the answer almost instantly. I
estimate the last algorithm would take around 12 years to finish, based on its
progress over the three minutes I ran it.
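The three algorithms being compared survive here only in fragments, so as a point of reference, here is a sketch of what they might look like in Java (the method names are mine, not the book's):

```java
public class SumDemo {
    // Algorithm 1: loop and add each number in turn; O(n).
    public static long sumLoop(long n) {
        long total = 0;
        for (long i = 1; i <= n; i++)
            total += i;
        return total;
    }

    // Algorithm 2: the closed-form formula n(n+1)/2; O(1).
    public static long sumFormula(long n) {
        return n * (n + 1) / 2;
    }

    // Algorithm 3: count up one at a time; the total number of
    // increments is 1+2+...+n, which is O(n^2).
    public static long sumIncrement(long n) {
        long total = 0;
        for (long i = 1; i <= n; i++)
            for (long j = 0; j < i; j++)
                total++;
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumLoop(100));      // 5050
        System.out.println(sumFormula(100));   // 5050
        System.out.println(sumIncrement(100)); // 5050
    }
}
```

On small inputs all three agree; the difference only shows up as n grows.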
This notation, called big O notation, is used to measure the running time of
algorithms. Big O notation doesn't give the exact running time, but rather it's
an order of magnitude estimation. Measuring the exact running time of an
algorithm isn't practical as so many different things like processor speed,
amount of memory, what else is running on the machine, the version of the
programming language, etc. can affect the running time. So instead, we use
big O notation, which measures how an algorithm's running time grows as
some parameter n grows. In the example above, n is the integer we are
summing up to, but in other cases it might be the size of an array or list, the
number of items in a matrix, etc.
With big O, we usually only care about the dominant or most important term.
So something like O(n²+n+1) is the same as O(n²), as n and 1 are small in
comparison to n² as n grows. Also, we usually ignore constants. So O(3.47n)
is the same as O(n). And if we have something like O(2n+4n³+.6n²-1), we
would just write that as O(n³). Remember that big O is just an order of
magnitude estimation, and we don't often care about being exact.
1.2. Estimating the running times of algorithms

To estimate the big O running time of an algorithm, here are a couple of rules
of thumb:

1. Loops typically contribute a factor of n, and nested loops multiply their running times.
2. Code segments that follow one another have their running times added, and we keep only the dominant term.
Here are several examples of how to compute the big O running time of some
code segments. All of these involve working with an array, and we will assume
n is the length of the array.
1. Here is a single loop:

int total = 0;
for (int i=0; i<a.length; i++)
    total += a[i];

This loop runs for n steps and does a constant amount of work at each step, so it is O(n).

2. Here are two nested loops:

for (int i=0; i<a.length; i++)
    for (int j=0; j<a.length; j++)
        System.out.println(a[i]+a[j]);
These are two pretty ordinary loops, each running for n = a.length steps.
They are nested, so their running times multiply, and overall the running
time is O(n²). For each of the n times the outer loop runs, the inner loop
has to also run n times, which is where the n² comes from.
3. Here are three nested loops:

for (int i=0; i<a.length; i++)
    for (int j=i+1; j<a.length; j++)
        for (int k=j+1; k<a.length; k++)
            System.out.println(a[i]+a[j]*a[k]);
This code runs in O(n³) time. Technically, since the second and third
loops don't run the full length of the array, we could write it as O(n(n-1)(n-2)),
but remember that we are only interested in the most important
term, which is n³, as n(n-1)(n-2) can be written as n³ plus a few smaller
terms.
4. Here is some code with no loops:

int count = 0;
count++;
int count2 = 0;
count2 += 2;
int w = a[0];
int z = a[a.length-1];
System.out.println(w + z);
The running time here is O(1). The key idea is that the running time has
no dependence on n. No matter how large that array is, this code will
always take the same amount of time to run.
5. Here is a loop that runs a fixed number of times:

int c = 0;
for (int i=0; i<100; i++)
    c += a[i];
Despite the loop, this code runs in O(1) time. The loop always runs to
100, regardless of how large the array is. Notice that a.length never
appears in the code.
6. Here is a loop whose variable doubles at each step:

int sum = 0;
for (int i=1; i<n; i*=2)
    sum += i;
This code runs in O(log n) time. The reason is that the loop variable
is increasing via i *= 2. It goes up as 1, 2, 4, 8, 16, 32, 64, 128, …. It
takes only 10 steps to get to 1000, 20 steps to get to 1 million, and 30
steps to get to 1 billion.

The number of steps to get to n is gotten by solving 2^x = n, and the
solution to that is log₂(n) (which we often write just as log(n)).
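To check the doubling claim directly, here is a small sketch (the stepsTo helper is my name, not the book's) that counts how many times a loop variable starting at 1 must double before reaching n:

```java
public class DoublingSteps {
    // Counts how many doublings it takes for i, starting at 1, to reach n.
    public static int stepsTo(long n) {
        int steps = 0;
        for (long i = 1; i < n; i *= 2)
            steps++;
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(stepsTo(1000));        // 10
        System.out.println(stepsTo(1000000));     // 20
        System.out.println(stepsTo(1000000000L)); // 30
    }
}
```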
8. Here is a set of nested loops:
int n = a.length;
int sum = 0;
for (int i=0; i<n; i++) {
    int m = n;
    while (m > 1) {
        sum += m;
        m /= 2;
    }
}
This code has running time O(n log n). The outer loop is a pretty
ordinary one and contributes the factor of n to the running time. The
inner loop is a logarithmic one as the loop variable is cut in half at each
stage. Since the loops are nested, we multiply their running times to get
O(n log n).
9. Here is a while loop followed by a pair of nested loops:

int total = 0;
int i = 0;
while (i < a.length) {
    total += a[i];
    i++;
}
for (i=0; i<a.length; i++)
    for (int j=i; j<a.length; j++)
        total += a[i]*a[j];
The running time is O(n²). To get this, start with the fact that the while
loop runs in O(n) time. The nested loops' running times multiply to give
O(n · n/2). The while loop and the nested loops follow one another, so
we add their running times to get O(n + n · n/2). But remember that we
only care about the dominant term, and we drop constants, so the end
result is O(n²).
1.3. Common running times
The most common running times are probably O(1), O(log n), O(n), O(n log n),
O(n²), and O(2ⁿ). These are listed in order of desirability. An O(1)
algorithm is usually the most preferable, while an O(2ⁿ) algorithm should be
avoided if at all possible.
To get a good feel for the functions, it helps to compare them side by side. The
table below compares the values of several common functions for varying
values of n.
          n = 1     n = 10    n = 100     n = 1000      n = 10000
1         1         1         1           1             1
log n     0         3.3       6.6         10.0          13.3
n         1         10        100         1000          10,000
n log n   0         33        664         9966          132,877
n²        1         100       10,000      1,000,000     100,000,000
2ⁿ        2         1024      1.2×10³⁰    1.1×10³⁰¹     2.0×10³⁰¹⁰
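The n log n row of the table can be reproduced with a quick computation (the nLogN helper is my name; logs are base 2, as elsewhere in this chapter):

```java
public class GrowthTable {
    // Computes n * log2(n), rounded to the nearest integer.
    public static long nLogN(long n) {
        return Math.round(n * (Math.log(n) / Math.log(2)));
    }

    public static void main(String[] args) {
        for (long n : new long[]{10, 100, 1000, 10000})
            System.out.println(n + ": " + nLogN(n));
    }
}
```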
Things to note:
1. The first line remains constant. This is why O(1) algorithms are usually
preferable. No matter how large the input gets, the running time stays
the same.
3. Linear growth (O(n)) is what we are most familiar with from real life. If
we double the input size, we double the running time. If we increase the
input size by a factor of 10, the running time also increases by a factor
of 10. This kind of growth is often manageable. A lot of important
algorithms can't be done in O(1) or O(log n) time, so it's nice to be able
to find a linear algorithm in those cases.
6. The last line is exponential growth, O(2ⁿ). Exponential growth gets out
of hand extremely quickly. None of the examples we saw earlier had this
running time, but there are a number of important practical optimization
problems whose best known running times are exponential.

7. Other running times: There are infinitely many other running times,
like O(n³), O(n⁴), …, or O(n^1.5), O(n!), and O(2^(2ⁿ)). The ones listed
above are just the most common.
4. Big O notation sometimes can have multiple parameters. For instance, if
we are searching for a substring of length m inside a string of length n,
we might talk about an algorithm that has running time O(n+m).
5. Last, but not least, we have been using big O notation somewhat
incorrectly. Instead we really should be using something called big theta,
as in Θ(n) instead of O(n). In formal computer science, big O is sort of a
"no worse than" running time. That is, O(1) and O(n) algorithms are
technically also O(n²) algorithms because a running time of 1 or n is no
worse than n². Big O is sort of like a "less than or equal to", while big
theta is used for "exactly equal to". However, most people are not really
careful about this and use big O when technically they should use big
theta. We will do the same thing here. If you are around someone
pedantic or if you take a more advanced course in algorithms, then you
might want to use big theta, but otherwise big O is fine.
1.4. Searching

Suppose we want to tell whether an item is in an array. Here is a simple way to do it:

public static boolean search(int[] a, int item) {
    for (int i=0; i<a.length; i++)
        if (a[i]==item)
            return true;
    return false;
}
This is probably the most straightforward way of searching: check the first
item, then the second item, etc. It is called a linear search, and its running
time is O(n). However, this is not the most efficient type of search. If the data
in the array is in order, then there is an O(log n) algorithm called the binary
search that can be used.
To understand how it works, imagine a guess-a-number game. A person picks
a random number from 1 to 1000 and you have 10 guesses to get it right.
After each guess, you are told if your guess is too high or too low. Your first
guess should be 500, as whether it's too high or too low, you will immediately
have eliminated 500 numbers from consideration, which is the highest amount
you can guarantee removing. Now let's suppose that 500 turns out to be too
low. Then you know that the number is between 500 and 1000, so your next
guess should be halfway between them, at 750. Suppose 750 turns out to be
too high. Then you know the number is between 500 and 750, so your next
guess should be 625, halfway between again.
At each step, the number of possibilities is cut in half from 1000 to 500 to 250,
and so on, and after 10 of these halvings, there is only one possibility left. A
binary search of an array is essentially this process. We look at the middle
item of the array and compare it with the thing we are looking for. Either we
get really lucky and find it, or else that middle element is either larger or
smaller than what we are looking for. If it is larger, then we know to check the
first half of the array, and otherwise we check the second half. Then we repeat
the process just like with the guess-a-number game, until we either find the
element or run out of places to look. In general, if there are n items, then no
more than about log₂(n) guesses will be needed, making this an O(log n)
algorithm. Here is the code for it:
public static boolean binarySearch(int[] a, int item) {
    int start = 0, end = a.length - 1;
    while (start <= end) {
        int mid = (start + end) / 2;
        if (a[mid] == item)
            return true;
        else if (a[mid] > item)
            end = mid - 1;
        else
            start = mid + 1;
    }
    return false;
}
Inside the loop, the code first locates the middle index. We then compare the
middle item to the one we are looking for and adjust the start or end
variables, which has the effect of focusing our attention on one half or the
other of the array.
If possible, a binary search should be used over a linear search. For instance,
a linear search on a list of 10 billion items will involve 10 billion checks if the
item is not in the list, while a binary search will only require log₂(10 billion) ≈
34 checks.
However, binary search does require the data to be sorted, and the initial sort
will take longer than a single linear search. So a linear search would make
sense if you are only doing one search, and it would also make sense if the
items in the array are of a data type that can't be sorted. A linear search
would also be fine for very small arrays.
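To make the comparison concrete, here is a sketch (class and method names are mine) in which each search counts how many array elements it examines before giving up on an absent item:

```java
public class SearchCompare {
    // Linear search that counts how many elements it examines.
    public static int linearChecks(int[] a, int item) {
        int checks = 0;
        for (int i = 0; i < a.length; i++) {
            checks++;
            if (a[i] == item)
                return checks;
        }
        return checks;
    }

    // Binary search that counts how many middle elements it examines.
    public static int binaryChecks(int[] a, int item) {
        int checks = 0, start = 0, end = a.length - 1;
        while (start <= end) {
            int mid = (start + end) / 2;
            checks++;
            if (a[mid] == item)
                return checks;
            else if (a[mid] > item)
                end = mid - 1;
            else
                start = mid + 1;
        }
        return checks;
    }

    public static void main(String[] args) {
        int n = 1000000;
        int[] a = new int[n];
        for (int i = 0; i < n; i++)
            a[i] = 2 * i;   // sorted even numbers, so odd items are absent
        // The linear search checks all million items; the binary search
        // needs only about 20 checks.
        System.out.println(linearChecks(a, 1));
        System.out.println(binaryChecks(a, 1));
    }
}
```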
Chapter 2. Lists
The first data structure we will talk about is the list. A list is a collection of
objects, such as [5, 31, 19, 27, 16], where order matters and there may be
repeated elements. We will look at two ways to implement lists: dynamic
arrays and linked lists.
Let's start out with looking at an ordinary Java array. An array is data stored in
a contiguous chunk of memory, like in the figure below.
Each cell in the array has the same data type, meaning it takes up the same
amount of space. This makes it quick to access any given location in the array.
For instance, to access the element at index 5 in an array a in Java, we use
a[5]. Internally, Java finds that location via a simple computation, which takes
the starting memory location of the array and adds to it 5 times the size of the
array's data type. In short, getting and setting elements in an array is a very
fast O(1) operation.
The main limitation of Java's arrays is that they are fixed in size. Often we
don't know up front how much space we will need. We can always just create
our arrays with a million elements apiece, but that's wasteful. This brings us to
the concept of dynamic arrays. A dynamic array is an array that can grow as
needed to accommodate new elements. We can create this data structure
using an ordinary array. We can start the array small, maybe making it 10
items long. Once we use up those 10 spots in the array, we create a new array
and copy over all the stuff from the old array to the new one. Java will
automatically free up the space in that old array to be used elsewhere in a
process called garbage collection.

One question is how big we should make the new array. If we make it too
small, then the expensive operation of copying the old array over will have to
be done too often. A simple and reasonable thing to do is to make the new
array double the size of the old one.
Implementing a dynamic array
Here is the start of a dynamic array class:
public class AList {
    private int[] data;
    private int numElements;
    private int capacity;

    public AList() {
        data = new int[10];
        numElements = 0;
        capacity = 10;
    }
We have an array to hold the data, a variable called numElements that keeps
track of how many data values are in the list, and a variable called capacity
that is the size of the data array. Once numElements reaches capacity, we will
run out of room in the current array.
For instance, consider the list [77, 24, 11]. It has three elements, so
numElements is 3. The capacity in the figure is 10.
Adding elements
To add an element, there are a few things to do. First, we need to check if the
array is at capacity. If so, then we need to enlarge it. We do that by calling a
separate enlarge method. In terms of actually adding the element to the array,
we use the numElements variable to tell us where in the data array to add the
element. Specifically, it should go right after the last data value. We also need
to increase numElements by 1 after adding the new item. Here is the code:
public void add(int item) {
    if (numElements == capacity)
        enlarge();
    data[numElements] = item;
    numElements++;
}
Below is the code that enlarges the array as needed. What we do is create a
new array that is twice the size of the current one, then copy over all data
from the old array into this one. After that, we update the capacity variable
and point the data variable to our new array.
private void enlarge() {
    capacity *= 2;
    int[] copy = new int[capacity];
    for (int i=0; i<numElements; i++)
        copy[i] = data[i];
    data = copy;
}
Notice the design decision to make enlarge a private method. This is because
it is part of the internal plumbing that our class uses to make things work, but
it's not something that users of our class should need to see or use.
Displaying the contents of the dynamic array
We now have a working dynamic array, but we don't have any way yet to see
what is in it. To do that, we will write a toString method. The toString method
is a special method that is designed to work with Java's printing methods. So if
we create an AList object called list, then System.out.println(list) will
automatically call our toString method to get a string representation of our
list to print. Here is the method:
@Override
public String toString() {
    if (numElements == 0)
        return "[]";
    String s = "[";
    for (int i=0; i<numElements-1; i++)
        s += data[i] + ", ";
    s += data[numElements-1] + "]";
    return s;
}
It returns the items in the list with square brackets around them and commas
between them. The code builds up a string one item at a time. In order to
avoid a comma being generated after the last element, we stop the loop one
short of the end, and add the last element in separately. Note also the special
case for an empty list.
Here is some code to test out our class. We first add a couple of elements and
print out the list, and then we add enough elements to force the array to have
to be enlarged, to make sure that process works.
public static void main(String[] args) {
    AList list = new AList();
    list.add(1);
    list.add(2);
    System.out.println(list);
    for (int i=0; i<20; i++)
        list.add(i);
    System.out.println(list);
}
When testing out a class like this, be sure to also test out all the various edge
cases you can think of. For example, if testing out a method that inserts items
into a list, edge cases would be inserting at the start or end of a list or
inserting into an empty list. In general, try to think of as many interesting
scenarios as you can that might give your code trouble. The extra time spent
testing now can save a lot more time in the future tracking down bugs.
More methods
To make our list class more useful, there are a few other methods we might
add. Some of these methods are also good for programming practice, as they
demonstrate useful techniques for working with lists and arrays.
Getting and setting

Here are methods to get and set the value at a given index:

public int get(int index) {
    return data[index];
}

public void set(int index, int value) {
    data[index] = value;
}
In both cases, we really should do some error-checking to make sure the index
chosen is valid. Both methods should have code like the following:

if (index < 0 || index >= numElements)
    throw new IndexOutOfBoundsException("Index: " + index);
For the sake of simplicity, since our goal is to understand how data structures
work, we will usually not do much error-checking, as real error-checking code
can often become long enough to obscure how the data structure works.
Getting the size of the list
Since we have a variable keeping track of how many things are in the list,
finding the size is easy:

public int size() {
    return numElements;
}
Lists often have a separate method called isEmpty that tells if the list has
anything in it or not. We can do that by checking if numElements is 0 or not. A
standard way to do that would involve four lines of code with an if/else
statement. However, there is a slick way to do this in one line of code, shown
below.

public boolean isEmpty() {
    return numElements == 0;
}
A contains method
A useful task is to see if a list contains something. We can use a linear search
to do that.
public boolean contains(int value) {
    for (int i=0; i<numElements; i++)
        if (data[i] == value)
            return true;
    return false;
}
A common mistake is to have an else statement inside the loop that returns
false. The problem with this is that the code checks the first item and
immediately returns false if it is not the thing being searched for. In the
approach above, we only return false if the code gets all the way through the
loop and hasn't found the desired element.
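To see the mistake concretely, here is a sketch contrasting a correct standalone version with the broken one (containsWrong is a made-up name for the buggy variant):

```java
public class ContainsMistake {
    // Correct: only return false after the whole array has been checked.
    public static boolean contains(int[] data, int value) {
        for (int i = 0; i < data.length; i++)
            if (data[i] == value)
                return true;
        return false;
    }

    // Broken: the else returns false as soon as the first item
    // doesn't match, so only data[0] is ever examined.
    public static boolean containsWrong(int[] data, int value) {
        for (int i = 0; i < data.length; i++)
            if (data[i] == value)
                return true;
            else
                return false;
        return false;  // only reached if the array is empty
    }

    public static void main(String[] args) {
        int[] a = {5, 31, 19};
        System.out.println(contains(a, 19));      // true
        System.out.println(containsWrong(a, 19)); // false: only a[0] was checked
    }
}
```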
Reversing a list
The first thing we would want to decide is whether we want the method to
modify the list itself or return a new list. Let's say we want to modify the list.
Here is one approach:
public void reverse() {
    int[] copy = new int[capacity];
    int j = 0;
    for (int i=numElements-1; i>=0; i--) {
        copy[j] = data[i];
        j++;
    }
    data = copy;
}
This is a relatively straightforward approach that loops through the data array
backwards and fills up a new array with those elements and then replaces the
old data array with the new one. Note that we use a separate index variable j
to advance us through the new array. Here is another approach that avoids
creating a new array:
public void reverse2() {
    for (int i=0; i<numElements/2; i++) {
        int hold = data[i];
        data[i] = data[numElements-1-i];
        data[numElements-1-i] = hold;
    }
}
The way this approach works is it swaps the first and last element, then it
swaps the second and second-to-last elements, etc. It stops right at the
middle of the list.
Note: A common mistake in writing these and other methods is to loop to
capacity instead of numElements. The array's data values end at index
numElements-1. Everything beyond there is essentially garbage.
Deleting things
An efficient way to delete an item from the array is to slide everything to the
right of it down by one index. As each item is being slid down, it overwrites
the item in that place. For an illustration, see the figure below where the
element 5 is being deleted. The algorithm starts by overwriting 5 with 7, then
overwriting 7 with 8, and then overwriting 8 with 9. The last item in the list is
not overwritten, but by decreasing numElements by 1, we make that index
essentially inaccessible to the other methods because all of our methods never
look beyond index numElements-1.
public void delete(int index) {
    for (int i=index; i<numElements-1; i++)
        data[i] = data[i+1];
    numElements--;
}
Inserting things

To insert an item at a given index, we do the reverse of deletion: slide
everything from that index onward up one spot to make room, then put the
new item at the index. As with add, we may first need to enlarge the array.
Here is the code:

public void insert(int index, int value) {
    if (numElements == capacity)
        enlarge();
    for (int i=numElements; i>index; i--)
        data[i] = data[i-1];
    data[index] = value;
    numElements++;
}
Running times
Looking back at the code we wrote, here are the big O running times of the
methods.
1. The add method is O(1) most of the time. The enlarge method is O(n),
and every once in a while the add method needs to call it.
2. The size and isEmpty methods are O(1). Each of them consists of a
single operation with no loop involved. In some implementations of a list
data structure, the size method would be O(n), where you would loop
through the list and count how many items you see. However, since we
are maintaining a variable numElements that keeps track of the size, that
counting is spread out over all the calls to the add method.
3. The get and set methods are O(1). Each of them consists of a single
operation with no loop involved.
4. The reverse method is O(n). The first reverse method we wrote has a
single loop that runs through the entire array, so it's O(n). The second
reverse method loops through half of the array, so it's O(n/2), but since
we ignore constants, we say it's O(n). Still in a practical sense, the
second method is about twice as fast as the first.
5. The contains, insert, and delete methods are all O(n). Remember that
for big O, we are usually concerned with the worst case scenarios. For
the contains method, this would be if the item is not in the list. For the
insert and delete methods, this would be for inserting/deleting at index
0. In all these cases, we have to loop through the entire array, so we get
O(n).
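To see why the occasional O(n) enlarge doesn't hurt the add method much on average, here is a sketch (the names are mine) that simulates n adds into a doubling array and counts the total number of element copies the enlarge operations perform. The total stays under 2n, so the average cost per add works out to O(1):

```java
public class AmortizedDemo {
    // Simulates n adds into a doubling array (starting capacity 10)
    // and counts element copies performed when the array is enlarged.
    public static long copiesForAdds(int n) {
        long copies = 0;
        int capacity = 10, size = 0;
        for (int i = 0; i < n; i++) {
            if (size == capacity) {
                copies += size;   // enlarging copies every current element
                capacity *= 2;
            }
            size++;
        }
        return copies;
    }

    public static void main(String[] args) {
        // The copy total is always less than 2n.
        for (int n : new int[]{100, 10000, 1000000})
            System.out.println(n + " adds: " + copiesForAdds(n) + " copies");
    }
}
```

For example, 100 adds trigger enlarges at sizes 10, 20, 40, and 80, for 150 total copies, which is less than 2·100.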
2.2. Linked lists
Linked lists provide a completely different approach to implementing a list data
structure. With arrays, the elements are all stored next to each other, one
after another, in memory. With a linked list, the elements can be spread out all
over memory.

In particular, each element in a linked list is paired with a link that points to
where the next element is stored in memory. So a linked list is just sort of
strung along from one element to the next across memory, as in the figure
below.
Each item in a linked list is called a node. Each node consists of a data value
as well as a link to the next node in the list. The last node's link is null; in
other words, it is a link going nowhere. This indicates the end of the list.
public class LList {
    private class Node {
        public int value;
        public Node next;

        public Node(int value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    private Node front;

    public LList() {
        front = null;
    }
}
Binary Search Trees + The linked list class itself only has one class variable, a Node variable called
front that keeps track of which node is at the front of the list. The Node class is
Heaps + made private since it is an internal implementation detail of the LList class,
not something that users of our list need to worry about. The Node class is
Sets and Hashing + quite simple, having just the two class variables and a constructor that sets
their values.**
Adding to the front of a linked list
Let's look first at adding to the front of a linked list. To do that, we need to do
three things: create a new node, point that new node to the old front of the
list, and then make the new node into the new front of the list. See the figure
below.
It often helps to draw figures like this first and use them to create the linked
list code. Here is code that does this process:
public void addToFront(int value) {
    Node node = new Node(value, null);
    node.next = front;
    front = node;
}
Working with linked lists often involves creating new nodes and moving links
around. Below is an imaginative view of what memory might look like, with a
linked list before and after adding a new item at the front.
We see that we have to first create the new node in memory. As we're creating
that node, we specify what it will point to, and after that we have to move the
front marker from its old location so that it's now indicating the new node is
the front.
The three lines of code can actually be done in a single line, like below:
public void addToFront(int value) {
    front = new Node(value, front);
}
Adding to the end of a linked list

To add to the end of a linked list, we have to first walk down the list to find
the last node. Here is the loop that does that:

Node node = front;
while (node.next != null)
    node = node.next;

This loop is one that is very useful for looping through linked lists. The idea of
it is that we create a Node variable called node that acts as a loop variable,
similar to how the variable i is used in a typical for loop. This Node variable
starts at the beginning of the list via the line node = front. Then the line node
= node.next moves the node variable through the list one node at a time. The
loop condition node.next != null stops us just short of the last node. A lot
of times we will want to use node != null in its place to make sure we get
through the whole list. In both cases, remember that the way we know when
to stop the loop is that the end of the list is marked by a null link.
public void add(int value) {
    if (front == null)
        front = new Node(value, null);
    else {
        Node node = front;
        while (node.next != null)
            node = node.next;
        node.next = new Node(value, null);
    }
}
Note the special case at the start. Checking if front==null checks if the list is
empty. In particular, when adding to an empty list, not only do we have to
create a new node, but we also need to set the front variable to point to it. If
we leave this code out, we would get a very common linked list error
message: a null pointer exception. What would happen is if the list is empty,
then node=front will set node to null. Then when we try to check node.next, we
are trying to get a field from something that has no fields (it is null). That
gives a null pointer exception.
The last line of code of the method creates the new node and points the old
last node of the list to this new node. This is shown in the figure below.
Note also that the code that creates a new node sets its link to null. This is
because the end of a list is indicated by a null link, and the node we are
creating is to go at the end of the list.
toString()
This method is a lot like the toString method for dynamic arrays except that
in place of the for loop that is used for dynamic arrays, we use a linked list
while loop. Here is the code:
@Override
public String toString() {
    if (front == null)
        return "[]";
    String s = "[";
    Node node = front;
    while (node.next != null) {
        s += node.value + ", ";
        node = node.next;
    }
    s += node.value + "]";
    return s;
}
Here is a little code to try out the class:

LList list = new LList();
list.add(1);
list.add(2);
System.out.println(list);
It's worth stopping here to take stock of what we have done. Namely, we have
managed to create a list essentially from scratch, without using arrays, where
the array-like behavior is simulated by linking the items together.
size and isEmpty

Finding the size of a linked list gives us another opportunity to use the
standard linked list loop:
public int size() {
    int count = 0;
    Node node = front;
    while (node != null) {
        count++;
        node = node.next;
    }
    return count;
}
To check if a list is empty, we need only check if the front variable points to
something or if it is null. This can be done very quickly, like below:
public boolean isEmpty() {
    return front == null;
}

get and set

To get or set the value at a given index, we loop through the list, counting
nodes until we reach that index:

public int get(int index) {
    Node node = front;
    int count = 0;
    while (count < index) {
        count++;
        node = node.next;
    }
    return node.value;
}

public void set(int index, int value) {
    Node node = front;
    int count = 0;
    while (count < index) {
        count++;
        node = node.next;
    }
    node.value = value;
}
Deleting items

To delete a node, we reroute the link of the node immediately before it as
below (the middle node is being deleted).
To do this, we step through the list node-by-node, using a loop similar to the
ones for the get and set methods. Once the loop variable node reaches the
node immediately before the one being deleted, we use the following line to
perform the deletion:

node.next = node.next.next;
This line reroutes node's link around the deleted node so it points to the node
immediately after the deleted node. That deleted node is still sitting there in
memory, but with nothing now pointing to it, it is no longer part of the list.
Eventually Java's garbage collection algorithm will come around and allow that
part of memory to be reused for other purposes.
The full delete code is below. Note that the line above will not work for
deleting the first index, so we need a special case for that. To handle it, it
suffices to do front = front.next, which moves the front variable to point to
the second thing in the list, making that the new front.
public void delete(int index) {
    if (index == 0)
        front = front.next;
    else {
        Node node = front;
        int count = 0;
        while (count < index-1) {
            count++;
            node = node.next;
        }
        node.next = node.next.next;
    }
}
Note also that we don't need a special case for deleting the last thing in the
list. Try to see why the code will work just fine in that case.
We can see that to insert a node, we go to the node immediately before the
index we need. Let's call it node. We point node's link to a new node that we
create, and make sure the new node's link points to what the node used to
point to. The node creation and rerouting of links can be done in one line, like
this:
node.next = new Node(value, node.next);
Much of the code for inserting into a list is similar to the delete method. In
particular, we need a special case for inserting at the beginning of the list
(where we can simply call the addToFront method that we already wrote), and
the code that loops to get to the insertion point is the same as the delete
method's code. Here is the method:
public void insert(int index, int value) {
    if (index == 0)
        addToFront(value);
    else {
        Node node = front;
        int count = 0;
        while (count < index-1) {
            count++;
            node = node.next;
        }
        node.next = new Node(value, node.next);
    }
}
Basics
First, remember the basic idea of a linked list, that each item in the list is a
node that contains a data value and a link to the next node in the list. Creating
a new node is done with a line like below:

Node node = new Node(value, null);
To change the link of node to point to something else, say the front of the list,
we could do the following:

node.next = front;
Remember that the front marker is a Node variable that keeps track of which
node is at the front of the list. The line below moves the front marker one
spot forward in the list (essentially deleting the first element of the list):

front = front.next;
Looping
To move through a linked list, a loop like below is used:
Node node = front;
while (node != null)
    node = node.next;
This will loop through the entire list. If we use node.next != null, that will
stop us at the last element instead of moving through the whole list. If we
want to stop at a certain index in the list, we can use a loop like the one below
(looping to index 10):
Node node = front;
int count = 0;
while (count < 10) {
    count++;
    node = node.next;
}
Using pictures
Many linked list methods just involve creating some new nodes and moving
some links around. To keep everything straight, it can be helpful to draw a
picture. For instance, below is the picture we used earlier when describing how
to add a new node to the front of a list:
front = new Node(value, front);
This line does a few things. It creates a new node, points that node to the old front of the list, and moves the front marker to be at this new node.
Null pointer exceptions
When working with nodes, there is a good chance of making a mistake and getting a null pointer exception from Java. This usually happens if you use an object without first initializing it. For example, the following will cause a null pointer exception:
Node node = null;
if (node.value == 3)
Since the node variable was not initialized to anything, its current value is the
special Java value, null. When we try to access node.value, we get a null
pointer exception because node is null and has no field called value.
The correction is to either set node equal to some other Node object or initialize it like below:
Node node = new Node(value, null);
A doubling method
Here is a linked list method that replaces each copy of an element with two
copies of itself. For instance, if the list is [1,2,3], after calling this method it
will be [1,1,2,2,3,3]. Here is the code:
public void doubler() {
    Node node = front;
    while (node != null) {
        node.next = new Node(node.value, node.next);
        node = node.next.next;
    }
}
We do this with a single loop. At each step of the loop, we make a copy of the current node and point the current node to the copy. Then we move to the next node. But we can't just do node = node.next because that will move from the current node to its copy, and we will end up with an infinite loop that keeps copying the same value over and over. Instead, node = node.next.next moves us forward by two elements in the list.
Note that we could do this method by looping through the list and calling the insert method. However, that would be an O(n²) approach, as the insert method itself uses a loop to get to the appropriate locations. That of course is wasteful since we are already at the appropriate location. The approach given above is O(n).
Removing zeroes
Let's write a method to remove all the zeroes from the list. We could do this by looping through the list, checking if the current element is 0, and calling the delete method, but that would be an O(n²) approach. Here is an O(n) approach:
public void removeZeroes() {
    while (front != null && front.value == 0)
        front = front.next;
    Node node = front;
    while (node != null && node.next != null) {
        while (node.next != null && node.next.value == 0)
            node.next = node.next.next;
        node = node.next;
    }
}
Writing this requires some real care. First, dealing with zeroes at the start of the list is different from dealing with them in the middle. To delete a zero at the front of the list, we can use front = front.next. However, it may happen that the next element is also zero and maybe the one after that, so we use a loop. We also need to be careful that the list is not all zeroes, as we can easily end up with a null pointer exception that way. The first two lines of the method take care of all of this.
The next few lines handle zeroes in the middle of the list. We can delete these zeroes by rerouting links in the same way that the delete method does. But we have to be careful about multiple zeroes in a row. To handle this, every time we see a zero, we loop until we stop seeing zeroes or reach the end of the list. Note that despite the nested loops, they both affect the loop variable node, so overall this is still an O(n) algorithm.
To test this method out, try it with a variety of cases including lists of all zeroes, lists that start with multiple zeroes, lists that end with multiple zeroes, and various other combinations.
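Those cases can be checked with a small harness like the following. The ZList class here is a stripped-down, illustrative stand-in for the book's list class, with removeZeroes written as described above.

```java
// A quick test harness for removeZeroes, covering the tricky cases:
// all zeroes, leading zeroes, trailing zeroes, and runs of zeroes.
import java.util.Arrays;

public class ZListDemo {
    static class Node {
        int value;
        Node next;
        Node(int value, Node next) { this.value = value; this.next = next; }
    }

    static class ZList {
        Node front;

        ZList(int... values) {
            for (int i = values.length - 1; i >= 0; i--)
                front = new Node(values[i], front);
        }

        void removeZeroes() {
            while (front != null && front.value == 0)   // zeroes at the front
                front = front.next;
            Node node = front;
            while (node != null && node.next != null) { // zeroes in the middle/end
                while (node.next != null && node.next.value == 0)
                    node.next = node.next.next;
                node = node.next;
            }
        }

        public String toString() {
            StringBuilder sb = new StringBuilder("[");
            for (Node n = front; n != null; n = n.next)
                sb.append(n.value).append(n.next != null ? "," : "");
            return sb.append("]").toString();
        }
    }

    public static void main(String[] args) {
        int[][] cases = {{0, 0, 0}, {0, 0, 1, 2}, {1, 2, 0, 0}, {1, 0, 0, 2, 0}, {}};
        for (int[] c : cases) {
            ZList list = new ZList(c);
            list.removeZeroes();
            System.out.println(Arrays.toString(c) + " -> " + list);
        }
    }
}
```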
1. The addToFront method is O(1) as it only involves creating a new node and reassigning the front variable.
2. The add method, the way we wrote it, is O(n) since we loop all the way to the end of the list before creating and linking the new node. We could turn this into an O(1) method if we maintain a back variable, analogous to the front variable, that keeps track of where the back of the list is. Doing that requires a little more care in some of the other methods. For instance, in the delete method, we would now need a special case to handle deleting the last item in the list, as the back variable would need to change.
3. The size method is O(n) because we loop through the entire list to count
how many items there are. We could make it O(1) by adding a
numElements variable to the class, similar to what we have with our
dynamic array class. Every time a node is added or removed, we would
update that variable. The size method would then just have to return
the value of that variable.
4. The isEmpty method is O(1) since it's just a simple check to see if front
is null.
5. The insert and delete methods are O(n). The reason is that we have to
loop through the list to get to the insertion/deletion point. However, once
there, the insertion and deletion operations are quick. This is in contrast
to dynamic arrays, where getting to the insertion/deletion point is quick,
but the actual insertion/deletion is not so quick because all the elements
after the insertion/deletion point need to be moved.
Comparing dynamic arrays and linked lists
Dynamic arrays are probably the better choice in most cases. One of the big advantages of dynamic arrays has to do with cache performance. Not all RAM is created equal. RAM access can be slow, so modern processors include a small amount of cache memory that is faster to access than regular RAM. Arrays, occupying contiguous chunks of memory, can be easily moved into the cache all at once. Linked list nodes, which are spread out through memory, can't be moved into the cache as easily.
Dynamic arrays are also better when a lot of random access (accessing things in the middle of the list) is required. This is because getting and setting elements is O(1) for dynamic arrays, while it is O(n) for linked lists.
In terms of memory usage, either type of list is fine for most applications. For really large lists, remember that linked lists require extra memory for all the links, an extra five or six bytes per list element. Dynamic arrays have their own potential problems in that not every space allocated for the array is used, and for really large arrays, finding a contiguous block of memory could be tricky, especially if memory has become fragmented (as chunks of memory are allocated and freed, memory can start to look like Swiss cheese).
Despite the fact that dynamic arrays are the better choice for most applications, it is still worth learning linked lists, as they are a fundamental topic in computer science that every computer science student is expected to know. They are popular in job interview questions, and the ideas behind them are useful in many contexts. For instance, some file systems use linked lists to store the blocks of the file spread out over a disk drive.
The second variant of a linked list is a doubly-linked list, where each node has
two links, one to the next node and one to the previous node, as shown below:
Having two links can be useful for methods that need to know the predecessor of a node or for traversing the list in reverse order. Doubly-linked lists are a little more complex to implement because we now have to keep track of two links for each node. To keep track of all the links, it can be helpful to sketch things out. For instance, here is a sketch useful for adding a node to the front of a doubly-linked list.
We see from the sketch that we need to create a new node whose next link points to the old front and whose previous link is null. We also have to set the previous link of the old front to point to this new node, and then set the front marker to the new node. A special case is needed if the list is empty.
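Here is one way those steps could look in code. This is a minimal sketch; the class and field names (DLList, prev, next) are illustrative, and a list that also keeps a back marker would need the extra step noted in the comment.

```java
// A sketch of adding to the front of a doubly-linked list, following the
// steps described above.
public class DLList {
    static class Node {
        int value;
        Node prev, next;
        Node(int value) { this.value = value; }
    }

    Node front;

    void addToFront(int value) {
        Node newNode = new Node(value);  // new node; its prev is null by default
        newNode.next = front;            // next link points to the old front
        if (front != null)
            front.prev = newNode;        // old front's prev points back at it
        front = newNode;                 // move the front marker
        // Special case: if the list was empty, a back marker (if the class
        // keeps one) would also need to be set to newNode here.
    }
}
```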
The angle brackets are used to indicate a generic type. After that, the main thing we have to do is anywhere we refer to the data type of the values stored in the list, we have to change it from int to T. For example, the get method changes as shown below:
public T get(int index) {
    Node node = front;
    int count = 0;
    while (count < index) {
        count++;
        node = node.next;
    }
    return node.value;
}
Notice that the return type of the method is now type T instead of int. Notice also that index stays an int. Indices are still integers, so they don't change. It's only the data type stored in the list that changes to type T.
One other small change is that when comparing generic types, using == doesn't work anymore, so we have to replace that with a call to .equals, just like when comparing strings in Java.
Here is some code that shows how to create an object from the generic class (which we'll call GLList):
GLList<Integer> list = new GLList<Integer>();
2.6. Lists in the Java Collections Framework
While it is instructive to write our own list classes, it is best to use the ones
provided by Java for most practical applications. Java's list classes have been
extensively tested and optimized, whereas ours are mostly designed to help us
understand how the two types of lists work. On the other hand, it's nice to
know that if we need something slightly different from what Java provides, we
could code it.
List<Integer> list = new ArrayList<Integer>();
The Integer in the angle brackets indicates the data type that the list will hold. This is an example of Java generics. We can put any class name here. Here are some examples of other types of lists we could have:
List<String> list = new ArrayList<String>();
List<Double> list = new ArrayList<Double>();
The data type always has to be a class. In particular, since int is a primitive data type and not a class, we need to use Integer, which is the class version of int, in the declaration. Here are a few lines of code demonstrating how to work with lists in Java:
// Create list, add to list, print list
List<Integer> list = new ArrayList<Integer>();
list.add(1);
list.add(2);
System.out.println(list);

// Set the value at index 1 and shuffle the list
list.set(1, 99);
Collections.shuffle(list);

// Two ways to loop through the list, printing each element
for (int i = 0; i < list.size(); i++)
    System.out.println(list.get(i));
for (int x : list)
    System.out.println(x);
Note: Java has two list classes built in: one from java.util and one from
java.awt. You want the one from java.util. If you find yourself getting weird
list errors, check to make sure the right list type is imported.
List methods
Here are the most useful methods of the List interface. They are available to
both ArrayLists and LinkedLists.
Method Description
add(x) adds x to the list
add(i, x) adds x to the list at index i
contains(x) returns whether x is in the list
equals(list2) returns whether the list equals list2
get(i) returns the value at index i
indexOf(x) returns the first index (location) of x in the list
isEmpty() returns whether the list is empty
lastIndexOf(x) like indexOf, but returns the last index
remove(i) removes the item at index i
remove(x) removes the first occurrence of the object x
removeAll(x) removes all occurrences of x
set(i, x) sets the value at index i to x
size() returns the number of elements in the list
subList(i,j) returns a slice of a list from index i to j-1, sort of like substring
The constructor itself is also useful for making a copy of a list. See below:
List<Integer> list2 = new ArrayList<Integer>(list);
Note that setting list2 = list would not have the same effect. It would just create an alias, where list and list2 would be two names for the same object in memory. The code above, however, creates a brand new list, so that list and list2 reference totally different lists in memory. Forgetting this is a really common source of bugs.
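A short demonstration makes the difference concrete. After the original list is modified, the alias sees the change but the copy does not:

```java
// Aliasing vs. copying a list: an alias is a second name for the same
// object, while the copy constructor creates an independent list.
import java.util.ArrayList;
import java.util.List;

public class CopyDemo {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<Integer>();
        list.add(1);
        list.add(2);

        List<Integer> alias = list;                         // same list, two names
        List<Integer> copy = new ArrayList<Integer>(list);  // a brand new list

        list.add(3);
        System.out.println(alias);  // [1, 2, 3] -- changed along with list
        System.out.println(copy);   // [1, 2]    -- unaffected
    }
}
```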
A couple of examples
To demonstrate how to work with lists, here is some code that builds a list of
the dates of the year and then prints random dates. Notice that we make use
of lists and loops to do this instead of copy-pasting 365 dates.
List<Integer> monthDays = new ArrayList<Integer>();
Collections.addAll(monthDays, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);

// build a list of all the dates of the year
List<String> dates = new ArrayList<String>();
for (int i = 0; i < 12; i++)
    for (int j = 1; j < monthDays.get(i) + 1; j++)
        dates.add((i + 1) + "/" + j);

// print out 10 random dates where repeats could possibly happen
Random random = new Random();
for (int i = 0; i < 10; i++)
    System.out.println(dates.get(random.nextInt(dates.size())));

// shuffle the list and print the first 10 entries; no repeats this way
Collections.shuffle(dates);
for (int i = 0; i < 10; i++)
    System.out.println(dates.get(i));
Here is another example that reads from a wordlist file and prints a few different things about the words in the file. The word list file is assumed to have one word on each line. Wordlists of common English words are easy to find on the internet and are particularly useful.
Scanner scanner = new Scanner(new File("wordlist.txt"));  // use your wordlist's filename
List<String> words = new ArrayList<String>();
while (scanner.hasNext())
    words.add(scanner.nextLine());

// find and print the longest word
String longest = "";
for (String word : words)
    if (word.length() > longest.length())
        longest = word;
System.out.println(longest);
Chapter 3. Stacks and Queues
3.1. Introduction
This chapter is about two of the simplest and most useful data structures: stacks and queues. Stacks and queues behave like lists of items, each with its own rules for how items are added and removed.
Stacks
A stack is a LIFO structure, short for last in, first out. Think of a stack of dishes, where the last dish placed on the stack is the first one to be taken off. Or as another example, think of a discard pile in a game of cards where the last card put on the pile is the first one to be picked up. Stacks have two main operations: push and pop. The push operation puts a new element on the top of the stack and the pop operation removes whatever is on the top of the stack and returns it to the caller.
Queues
A queue is a FIFO structure, short for first in, first out. A queue is like a line
(or queue) at a store. The first person in line is the first person to check out. If
a line at a store were implemented as a stack, on the other hand, the last
person in line would be the first to check out, which would lead to lots of angry
customers. Queues have two main operations, add and remove. The add
operation adds a new element to the back of the queue, and the remove
operation removes the element at the front of the queue and returns it to the
caller.
Here is an example of some queue operations, where the front of the queue is
at the left.
Deques
There is one other data structure, called a deque, that we will mention briefly. It behaves like a list that supports adding and removing elements on either end. The name deque is short for double-ended queue. Since it supports addition and deletion at both ends, it can also be used as a stack or a queue.
public class LLStack<T> {
    private class Node {
        public T value;
        public Node next;
        public Node(T value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    private Node top;

    public LLStack() {
        top = null;
    }

    public void push(T value) {
        top = new Node(value, top);
    }

    public T pop() {
        T returnValue = top.value;
        top = top.next;
        return returnValue;
    }
}
This is not much different from our linked list class. The Node class is the same, the variable top is the same as the linked list class's front variable, and the push method is the same as the addToFront method. The only new thing is the pop method, which removes the top of the stack via the line top = top.next and also returns the value stored there.
Notice that the push and pop methods both have an O(1) running time.
public class LLQueue<T> {
    private class Node {
        public T value;
        public Node next;
        public Node(T value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    private Node front;
    private Node back;

    public void add(T value) {
        Node newNode = new Node(value, null);
        if (back != null)
            back = back.next = newNode;
        else
            front = back = newNode;
    }

    public T remove() {
        T save = front.value;
        front = front.next;
        if (front == null)
            back = null;
        return save;
    }
}
With a stack, pushing and popping both happen at the first item in the list. With a queue, removing items happens at the first item, but adding happens at the end of the list. This means that our queue's remove method is mostly the same as our stack's pop method, but to make it so the add method is O(1), we add a back variable that keeps track of which node is in the back. Keeping track of that back variable adds a few complications, as we need to be extra careful when the queue empties out. See the figure below for what is going on with the add method:
Note that both the add and remove methods are O(1).
3.4. Stacks, queues, and deques in the Collections Framework
The Collections Framework supports all three of these data structures. It does contain a Stack class, but it is old and flawed and only kept around to avoid breaking old code. The Java documentation recommends against using it. Instead, it recommends using the Deque interface, which is usually implemented by the ArrayDeque class. That class can also implement queues, so we will use it to do so. Here is some code showing how things work:
Deque<Integer> stack = new ArrayDeque<Integer>();
stack.push(3);
System.out.println(stack.pop());

Deque<Integer> queue = new ArrayDeque<Integer>();
queue.add(3);
System.out.println(queue.remove());
The Deque interface also contains a method called element, which can be used to view the top/front of the stack/queue without removing it. It also contains size and isEmpty methods, and more. In particular, a deque, or double-ended queue, allows for O(1) additions and removals at both the front and back. The methods for these are addFirst, addLast, removeFirst, and removeLast. The stack and queue methods push, pop, etc. are synonyms for certain of these.
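To round out the picture, here is a short demonstration of ArrayDeque used at both ends:

```java
// ArrayDeque as a double-ended queue: add and remove at either end,
// and peek at the front with element().
import java.util.ArrayDeque;
import java.util.Deque;

public class DequeDemo {
    public static void main(String[] args) {
        Deque<Integer> deque = new ArrayDeque<Integer>();
        deque.addLast(1);                         // deque: [1]
        deque.addLast(2);                         // deque: [1, 2]
        deque.addFirst(0);                        // deque: [0, 1, 2]
        System.out.println(deque.removeFirst());  // prints 0
        System.out.println(deque.removeLast());   // prints 2
        System.out.println(deque.element());      // prints 1 (peek, not remove)
    }
}
```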
3.5. Applications
Stacks and queues have many applications. Here are a few.
2. For card games, stacks are useful for representing a discard pile, where the card most recently placed on the pile is the first one to be picked back up. Queues are useful for the game War, where players play the top card from their hand and cards that are won are placed on the bottom of the hand.
3. Stacks can be used to reverse a list. Here is a little code to demonstrate that:
Deque<Integer> stack = new ArrayDeque<Integer>();
for (int x : list)
    stack.push(x);
List<Integer> reversed = new ArrayList<Integer>();
while (!stack.isEmpty())
    reversed.add(stack.pop());
4. A common algorithm using stacks is for checking if parentheses are balanced. To be balanced, each opening parenthesis must have a matching closing parenthesis, and any other parentheses that come between the two must themselves be opened and closed before the outer group is closed. For instance, the parentheses in the expression (2*[3+4*{5+6}]) are balanced, but those in (2+3*(4+5) and 2+(3*[4+5)] are not.
A simple stack-based algorithm for this is as follows:
Loop through the expression. Every time an opening parenthesis is encountered, push it onto the stack. Every time a closing parenthesis is encountered, pop the top element from the stack and compare it with the closing parenthesis. If they match, then things are still okay. If they don't match, then the expression is trying to close one type of parenthesis with another type, indicating a problem. If the stack is empty when we try to pop, that also indicates a problem, as we have a closing parenthesis with no corresponding opening parenthesis. Finally, if the stack is not empty after we are done reading through the expression, that also indicates a problem, namely an opening parenthesis with no corresponding closing parenthesis.
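A sketch of this algorithm in Java, handling the three bracket types from the example, might look like the following (the class and method names are our own):

```java
// Stack-based balance checking for (), [], and {}, following the
// algorithm described above.
import java.util.ArrayDeque;
import java.util.Deque;

public class BalanceChecker {
    public static boolean isBalanced(String expression) {
        Deque<Character> stack = new ArrayDeque<Character>();
        for (char c : expression.toCharArray()) {
            if (c == '(' || c == '[' || c == '{')
                stack.push(c);                   // opening: push it
            else if (c == ')' || c == ']' || c == '}') {
                if (stack.isEmpty())
                    return false;                // closer with no opener
                char open = stack.pop();
                if ((c == ')' && open != '(') ||
                    (c == ']' && open != '[') ||
                    (c == '}' && open != '{'))
                    return false;                // mismatched bracket types
            }
        }
        return stack.isEmpty();                  // leftover openers = problem
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("(2*[3+4*{5+6}])"));  // true
        System.out.println(isBalanced("(2+3*(4+5)"));       // false
        System.out.println(isBalanced("2+(3*[4+5)]"));      // false
    }
}
```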
5. Stacks are useful for parsing formulas. Suppose, for example, we need to write a program to evaluate user-entered expressions like 2*(3+4)+5*6. One technique is to convert the expression into what is called postfix notation (or Reverse Polish notation) and then use a stack. In postfix, an expression is written with its operands first, followed by the operators. Ordinary notation, with the operator between its operands, is called infix notation. For example, 1+2 in infix becomes 1 2 + in postfix. The expression 1+2+3 becomes 1 2 3 + +, and the expression (3+4)*(5+6) becomes 3 4 + 5 6 + *.
To evaluate a postfix expression, we use a stack of operands and results. We move through the formula left to right. Every time we meet an operand, we push it onto the stack. Every time we meet an operator, we pop the top two items from the stack, apply the operator to them, and push the result back onto the stack.
For example, let's evaluate (3+4)*(5+6), which is 3 4 + 5 6 + * in postfix. Here is the sequence of operations:
3: push it onto stack. Stack = [3]
4: push it onto stack. Stack = [4, 3]
+: pop 4 and 3 from stack, compute 3+4=7, push 7 onto stack. Stack = [7]
5: push it onto stack. Stack = [5, 7]
6: push it onto stack. Stack = [6, 5, 7]
+: pop 6 and 5 from stack, compute 5+6=11, push 11 onto stack. Stack = [11, 7]
*: pop 11 and 7 from stack, compute 7*11=77, push 77 onto stack. Stack = [77]
The result is 77, the only thing left on the stack at the end.
Postfix notation has some benefits over infix notation in that parentheses are never needed and it is straightforward to write a program to evaluate postfix expressions. In fact, many calculators of the '70s and '80s used postfix for just this reason. Calculator users would enter their formulas using postfix notation. To convert from infix to postfix there are several algorithms. A popular one is Dijkstra's Shunting Yard Algorithm. It uses stacks as a key part of what it does.
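Here is a sketch of a postfix evaluator along these lines. It assumes tokens are separated by spaces and handles only +, -, and *; the class and method names are our own.

```java
// Evaluating a postfix expression with a stack of operands, as described
// above: operands are pushed, operators pop two values and push the result.
import java.util.ArrayDeque;
import java.util.Deque;

public class PostfixEvaluator {
    public static int evaluate(String postfix) {
        Deque<Integer> stack = new ArrayDeque<Integer>();
        for (String token : postfix.split(" ")) {
            if (token.equals("+") || token.equals("-") || token.equals("*")) {
                int right = stack.pop();  // operands come off in reverse order
                int left = stack.pop();
                if (token.equals("+"))
                    stack.push(left + right);
                else if (token.equals("-"))
                    stack.push(left - right);
                else
                    stack.push(left * right);
            } else {
                stack.push(Integer.parseInt(token));  // operand: push it
            }
        }
        return stack.pop();  // the result is the last thing on the stack
    }

    public static void main(String[] args) {
        System.out.println(evaluate("3 4 + 5 6 + *"));  // (3+4)*(5+6) = 77
    }
}
```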
6. Queues are used in operating systems for scheduling tasks. Often, if the
system is busy doing something, tasks will pile up waiting to be
executed. They are usually done in the order in which they are received,
and a queue is an appropriate data structure to handle this.
In general, queues are useful when things must be processed in the order in
which they were received, and stacks are useful if things must be processed in
the opposite order, where the most recent thing must be processed first.
Chapter 4. Recursion
public static String reverse(String s) {
    if (s.equals(""))
        return "";
    return reverse(s.substring(1)) + s.charAt(0);
}
Notice how the function calls itself in the last line. To understand how the function works, consider the string abcde. Pull off the first letter to break the string into a and bcde. If we reverse bcde and add a to the end of it, we will have reversed the string. That is what the following line does:
return reverse(s.substring(1)) + s.charAt(0);
The key here is that bcde is smaller than the original string. The function gets
called on this smaller string and it reverses that string by breaking off b from
the front and reversing cde. Then cde is reversed in a similar way. We continue
breaking the string down until we can't break it down any further, as shown
below:
reverse("abcde") = reverse("bcde") + "a"
reverse("bcde") = reverse("cde") + "b"
reverse("cde") = reverse("de") + "c"
reverse("de") = reverse("e") + "d"
reverse("e") = reverse("") + "e"
reverse("") = ""
Building back up from the bottom, we go from "" to "e" to "ed" to "edc", etc. All of this is encapsulated in the single line of code shown below again, where the reverse function keeps calling itself:
return reverse(s.substring(1)) + s.charAt(0);
We stop the recursive process when we get to the empty string "". This is our base case. We need it to prevent the recursion from continuing forever. In the function it corresponds to the following two lines:
if (s.equals(""))
    return "";
Another way to think of this is as a series of nested function calls, like this:
When I sit down to write something recursively, I have all of the above in the back of my mind, but trying to think about that while writing the code makes things harder. Instead, I ask myself the following:
1. How can I solve the problem in terms of smaller cases of itself?
2. How will I stop the recursion? (In other words, what are the small cases that don't need recursion?)
One of the most useful breakdowns for a string is to break it into its head (first character) and tail (everything else). The head in Java is s.charAt(0), and the tail is s.substring(1).
In order to reverse a string, I think about it like this: If I know what the head (first character) is, and I know that calling reverse on the tail will correctly reverse the tail, how can I combine these things to reverse the entire string? The answer is to take the reverse of the tail and append the first character to the end. I don't worry about how the tail reversal happens; I just trust that the method will handle it for me.
Length of a string
public static int length(String s) {
    if (s.equals(""))
        return 0;
    return 1 + length(s.substring(1));
}
Here again the idea is that we break the problem into a smaller version of
itself. We do the same breakdown as for reversing the string, where we break
the string up into its head (the first element) and its tail (everything else). The
length of the string is 1 (for the head) plus the length of the tail. The base
case that stops the recursion is when we get down to the empty string.
The key to writing this function is we just trust that the length function will correctly return the length of the tail. This can be hard to do, but try it: just write the recursive part of the code trusting that the length function will do its job on the tail. It almost doesn't seem like we are doing anything at all, like the function is too simple to possibly work, but that is the nature of recursion.
Notice that there is no need for a loop or for a variable to keep count. We will rarely need those things in our recursive examples.
Sum of a list
Here is a recursive function that sums a list of integers:
public static int sum(List<Integer> a) {
    if (a.size() == 0)
        return 0;
    return a.get(0) + sum(a.subList(1, a.size()));
}
We break down the list into its head and tail, just like in the previous two examples. The base case is the empty list, which has a sum of 0.
The sum of the list is the value of the head plus the sum of the elements of
the tail. That is what the last line says. Again, we just trust that the sum
function will correctly return the sum of the tail.
If it seems like we aren't really doing anything, that is an illusion. Each call to
the function does the task of adding one new item to the total, just like total
+= list.get(i) would do in ordinary iterative code. The function calling itself
causes a sort of loop, and the running total is actually hidden in the internal
system (call stack) that Java uses to manage its function calls.
Fibonacci numbers
The Fibonacci numbers are the numbers 1, 1, 2, 3, 5, 8, 13, 21, 34, …, where
the first two numbers are 1, and each number thereafter is the sum of the two
previous numbers. For example, 21=13+8 and 34=21+13.
The formal way of saying this is F(1) = 1, F(2) = 1, and F(n) = F(n-1) + F(n-2). We can translate this definition directly into a recursive function, like below:
public static long fib(int n) {
    if (n == 1 || n == 2)
        return 1;
    return fib(n-1) + fib(n-2);
}
Notice how close to the mathematical definition the code is. One unfortunate problem with this approach is that it turns out to be really slow. For example, to compute F(50), the program computes F(49) and F(48). Then to get F(49) it computes F(48) and F(47). But we had already computed F(48) and here we are computing it again. This waste actually grows exponentially, as the tree diagram below shows:
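One standard remedy, sketched here, is memoization: store each Fibonacci number the first time it is computed so that it never has to be recomputed. The class name Fib and the use of a HashMap for the stored values are our own choices for this sketch.

```java
// Memoized Fibonacci: each value is computed once and then looked up,
// turning the exponential-time recursion into a fast one.
import java.util.HashMap;
import java.util.Map;

public class Fib {
    private static Map<Integer, Long> memo = new HashMap<Integer, Long>();

    public static long fib(int n) {
        if (n == 1 || n == 2)
            return 1;
        if (memo.containsKey(n))
            return memo.get(n);          // already computed: just look it up
        long result = fib(n - 1) + fib(n - 2);
        memo.put(n, result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fib(50));  // 12586269025, computed almost instantly
    }
}
```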
public static boolean contains(String s, char c) {
    if (s.equals(""))
        return false;
    if (s.charAt(0) == c)
        return true;
    else
        return contains(s.substring(1), c);
}
Again, we break our string into a head and tail. If the first character is the
thing we are looking for, then we return true. Otherwise, we sort of punt on
the question and ask recursion to handle the rest. Specifically, we ask if the
desired character is in the tail or not. It's a little like an assembly line, where
we do our part, which is to look at the head, and if we don't see the character,
then we pass the rest of the string on down and assume that everyone else on
the assembly line will do their part, eventually telling us the answer.
To put it another way, an empty string cannot contain the desired character,
and a nonempty string contains the desired character if the character is either
the head or it is somewhere in the tail. The six lines of the function say exactly
this in Java code.
public static int min(List<Integer> list) {
    if (list.size() == 1)
        return list.get(0);
    int minTail = min(list.subList(1, list.size()));
    if (list.get(0) < minTail)
        return list.get(0);
    else
        return minTail;
}
Like many of our previous examples, we break the list into its head (first element) and tail (all other elements). If we know what the first element is and we know what the minimum is of the tail, then how can we combine those to get the overall minimum of the list? The answer is that the overall minimum is either the first element or the minimum of the tail, whichever is smaller. That is precisely what the code does.
public static boolean isPalindrome(String s) {
    if (s.length() <= 1)
        return true;
    if (s.charAt(0) == s.charAt(s.length()-1))
        return isPalindrome(s.substring(1, s.length()-1));
    else
        return false;
}
The base case covers two possibilities: zero- and one-character strings, which
are automatically palindromes.
The recursive part works a little differently than the head-tail breakdown we
have used for the earlier examples. Recursion requires us to break the
problem into smaller versions of itself and head-tail is only one way of doing
that. For checking palindromes, a better breakdown is to pull off both the first
and last characters of the string. If those characters are equal and the rest of
the string (the middle) is also a palindrome, then the overall string is a
palindrome.
This demonstrates one of the keys to solving problems recursively. Think about
how to define the problem in terms of smaller cases of itself and then translate
that into code. In particular, here we use an if statement to compare the first
and last character. If they are equal then we move on and check the rest of
the string (everything but the first and last characters), but if they are not
equal, then we know we don't have a palindrome, and we return false. We also
need a base case to stop the recursion, which is what the first two lines do.
Repeating a string
Below is a recursive method that repeats a string a given number of times. For instance, repeat("abc", 3) will return "abcabcabc".
public static String repeat(String s, int n) {
    if (n == 0)
        return "";
    return s + repeat(s, n-1);
}
In this problem, we don't break the string down into a head and a tail. We need to keep the string together in order to repeat it. Instead, the recursion happens in the second parameter, n. To understand the line s + repeat(s, n-1), think of it as saying if we want n copies of the string s, then we can take one copy of s and attach n-1 copies of s to it. Again, just trust that recursion will correctly handle the n-1 copies. It makes it easier to write recursive methods that way.
Here is a recursive function that returns a list of integers in a given range:
public static List<Integer> range(int a, int b) {
    if (b < a)
        return new ArrayList<Integer>();
    List<Integer> list = range(a, b-1);
    list.add(b);
    return list;
}
Remember that with recursion, the key idea is to break the problem into
smaller versions of itself. Here we break down range(a, b) into range(a, b-1)
with b separated off. For instance, we can think of [2,3,4,5] as [2,3,4] with a
5 to be added to the end. Recursion takes care of range(a, b-1) and we call
the list add method to put b onto the end of that. The base case here is b<a,
which is used to stop the recursion. We keep subtracting 1 from b until
eventually b ends up less than a, and that's when we stop and return an empty
list.
A replaceAll method
Here is a recursive method that replaces all occurrences of the character c1
with the character c2 in a string s:
public static String replaceAll(String s, char c1, char c2) {
    if (s.equals(""))
        return "";
    if (s.charAt(0) == c1)
        return c2 + replaceAll(s.substring(1), c1, c2);
    else
        return s.charAt(0) + replaceAll(s.substring(1), c1, c2);
}
We break the string into its head and tail. We then look at the head and either replace it or leave it be. Then we call replaceAll on the tail. Compare this to what the iterative code would look like:
Recursion + String t = "";
t = t + c2;
t = t + s.charAt(i);
Heaps + }
The if statement where we look at the current character is the same as in the recursive code. The statement t = t + c2 is similar to where in the recursive code we return c2 plus replaceAll called on the tail. Here t stands for the string we have built up so far, where we don't really worry about how it got built up. We just know that it has everything it needs. This is similar to how we just trust that recursion has correctly replaced everything in the tail.
An allZero method

Here is a recursive method that returns whether every element of a list of integers is 0:

public static boolean allZero(List<Integer> list) {
    if (list.size() == 0)
        return true;
    if (list.get(0) == 0 && allZero(list.subList(1, list.size())))
        return true;
    else
        return false;
}
The base case for an empty list returns true because, technically, everything in an empty list is 0. Mathematicians say it is vacuously true. Sometimes I tell people I have won every round of golf I ever played, which is technically (vacuously) true since I haven't ever played golf.
Again, we break the list into its head and tail. When writing this, ask the
following question: if we know what the first element of the list is and if we
know whether or not the tail consists of all zeroes, then how can we use that
information to solve the problem? The answer is if the head is 0 and every
element of the tail is 0, then the list consists of all zeros. Otherwise, it doesn't.
That's what we do in the code—we check if the first element (list.get(0)) is 0
and we check if the rest of the list is 0 by calling allZero on the tail.
Power function
Here is a recursive function that raises a number to a nonnegative integer power:
public static double pow(double x, int n) {
    if (n==0)
        return 1;
    return x * pow(x, n-1);
}
For example, pow(3,5) computes 3^5 = 243.
To see how things get broken down here, consider 3^5 = 3 · 3 · 3 · 3 · 3. If we pull the first 3 off, we have 3^5 = 3 · 3^4. In general, x^n = x · x^(n-1). This translates to pow(x,n) = x * pow(x,n-1).

We can keep breaking n down by 1 until we get to n=0, which is our base case. In that case we return 1, since anything (except maybe 0) to the 0 power is 1.
It is worth comparing this with the iterative code that computes the power:
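The iterative version might look like the following sketch, with a variable prod holding the running product:

```java
public class PowIterative {
    // Iterative power: multiply a running product by x, n times.
    public static double pow(double x, int n) {
        double prod = 1;          // matches the recursive base case n == 0
        for (int i = 0; i < n; i++)
            prod = prod * x;      // matches the recursive step x * pow(x, n-1)
        return prod;
    }
}
```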
The base case of the recursion corresponds to the statement that sets prod
equal to 1. The recursive case x * pow(x, n-1) corresponds to the line prod =
prod * x, where prod holds the product computed thus far, which is similar to
how pow(x, n-1) holds the product from the next level down.
An indexOf method

Here is a recursive method that returns the index of the first occurrence of a character c in a string s, assuming the character occurs in the string:

public static int indexOf(String s, char c) {
    if (s.charAt(0)==c)
        return 0;
    return 1 + indexOf(s.substring(1), c);
}
Since we are assuming that the character is in the string, we don't need much
of a base case except that if the first character is a match, then we return 0.
Otherwise, we call indexOf on the tail. That tells us where the character is
located in the tail. Wherever that is, we need to add 1 to it since the tail is
offset by 1 from the start of the string.
Finding repeats
Below is a recursive method that determines if a string contains consecutive
characters that are equal. For instance, "abccde" contains the repeat "cc".
public static boolean hasRepeat(String s) {
    if (s.length() <= 1)
        return false;
    if (s.charAt(0) == s.charAt(1))
        return true;
    return hasRepeat(s.substring(1));
}
Our base case consists of strings of length 0 or 1, which are not long enough to have repeats. Otherwise, we break the string into a head and tail, but we also need to look at not just the first character but also the second so that we can make the comparison for repeats.
Minimum of a list (again)
To demonstrate that head/tail isn't the only way to break things down, here is a way to find the minimum of a list by splitting it right down the middle into left and right halves:
public static int min(List<Integer> list) {
    if (list.size() == 1)
        return list.get(0);
    int minLeft = min(list.subList(0, list.size()/2));
    int minRight = min(list.subList(list.size()/2, list.size()));
    if (minLeft < minRight)
        return minLeft;
    else
        return minRight;
}
The way the recursion works is that we find the minimum in both the left and
right halves and return whichever of those is smaller.
Summing a list

Here is a recursive helper method that sums a list of integers, using a parameter to hold the running total:

private static int sumHelper(List<Integer> list, int total) {
    if (list.isEmpty())
        return total;
    return sumHelper(list.subList(1, list.size()), total + list.get(0));
}
When writing iterative code to sum a list, we would use a variable to hold the running total. One way to do that in recursion is to use a method parameter to hold the total. At each stage, we update the total variable in the function call. When we get down to the base case, having essentially “looped” recursively through the entire list, that total variable contains the sum of all the items, and so that's what we return.
Note that adding a parameter changes the way people use the method since there's an extra parameter to worry about. To avoid this, we put the recursion into a separate helper method and have the sum method call that helper.
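Putting the wrapper and helper together gives a runnable sketch like this (the helper name and the subList-based tail are our choices):

```java
import java.util.List;

public class SumDemo {
    // Public wrapper: callers never see the extra total parameter.
    public static int sum(List<Integer> list) {
        return sumHelper(list, 0);
    }

    // The parameter total plays the role of the running total from a loop.
    private static int sumHelper(List<Integer> list, int total) {
        if (list.isEmpty())
            return total;
        return sumHelper(list.subList(1, list.size()), total + list.get(0));
    }
}
```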
Here is another example of doing recursion by adding a parameter. We will one more time find the minimum of a list. Here is the code:
public static int min(List<Integer> list) {
    return minHelper(list.subList(1, list.size()), list.get(0));
}

private static int minHelper(List<Integer> list, int smallest) {
    if (list.size() == 0)
        return smallest;
    if (list.get(0) < smallest)
        return minHelper(list.subList(1, list.size()), list.get(0));
    else
        return minHelper(list.subList(1, list.size()), smallest);
}
If we were doing things iteratively, the following code would be a key part of
the solution:
if (x < smallest)
    smallest = x;
In particular, we use a variable to hold the smallest thing seen thus far as we
traverse the list. To do this recursively, we use that variable as a parameter.
Notice how in the recursive code if list.get(0) < smallest, then we change
that parameter to list.get(0) and otherwise leave it alone. This is just like
what happens in the if statement of the iterative code.
Printing the files in a directory

Here is a method that loops over a directory and prints the files inside it:

public static void printContents(File dir) {
    for (File f : dir.listFiles())
        System.out.println(f);
}

We can turn this into recursive code that checks the entire tree of subdirectories by replacing the print statement with a recursive call to printContents and adding a base case that handles normal files (non-directories):
public static void printContents(File file) {
    if (!file.isDirectory())
        System.out.println(file);
    else {
        for (File f : file.listFiles())
            printContents(f);
    }
}
Again, notice how the loop in the first version of the method is the same as
the one in the second method, except that the simple print statement has
been replaced with a recursive call to the method itself.
For comparison, here is an iterative version that does the same job by managing a list of files itself:

public static void printContentsIterative(File file) {
    List<File> files = new ArrayList<File>();
    files.add(file);
    for (int i = 0; i < files.size(); i++) {
        File current = files.get(i);
        if (current.isDirectory()) {
            int j = 1;
            for (File f : current.listFiles())
                files.add(i + j++, f);
        }
        else
            System.out.println(current.getName());
    }
}
Factoring numbers
In math it is often important to factor integers into their prime divisors. For example, 54 factors into 2 × 3 × 3 × 3. One way to factor is to search for a divisor and once we find one, we divide the number by that divisor and then factor the smaller number. For example, with 54 we first notice that 2 is a divisor, then we divide 54 by 2 to get 27, and after that we factor 27. To factor 27 we find that 3 is a divisor, divide 27 by 3 to get 9 and then factor 9. The process continues like this until we get down to a prime number. Since we are repeating the same process over and over, each time on smaller numbers, recursion seems like a natural choice. Here is the recursive solution:
public static List<Integer> factors(int n) {
    for (int i=2; i<=n; i++) {
        if (n % i == 0) {
            List<Integer> list = factors(n / i);
            list.add(0, i);
            return list;
        }
    }
    return new ArrayList<Integer>();
}
The code is very short. Let's look at it with the example of n=75. We loop
through potential divisors starting with 2. The first one we find is 3. We divide
75 by 3 to get 25 and call factors(25). We just trust that this will return with
the correct list of divisors, then add 3 to the front of that list (to keep the
divisors in order) and return the list.
Permutations
The key is that we can build up the permutations from the permutations the
next level down. For example, the twenty-four permutations of 1234 can be
broken into those starting with 1, those starting with 2, those starting with 3,
and those starting with 4. The permutations starting with 1 consist of a 1
followed by all the ways to permute 234. The other cases are similar, as shown
below:
1 + permutations of 234: 1234, 1243, 1324, 1342, 1423, 1432
2 + permutations of 134: 2134, 2143, 2314, 2341, 2413, 2431
3 + permutations of 124: 3124, 3142, 3214, 3241, 3412, 3421
4 + permutations of 123: 4123, 4132, 4213, 4231, 4312, 4321
public static List<String> perms(String s) {
    List<String> list = new ArrayList<String>();
    if (s.length() == 1) {
        list.add(s);
        return list;
    }
    for (int i=0; i<s.length(); i++) {
        for (String x : perms(s.substring(0,i) + s.substring(i+1)))
            list.add(s.charAt(i) + x);
    }
    return list;
}
The base case is a string of length 1. In that case, the only permutation is to leave the element fixed.
For the recursive part, we loop over all the characters in the string and take each of them as the starting character for a portion of all the permutations. Those permutations consist of that character followed by all the ways to permute the other characters. The substring code above finds those other characters. We then use recursion to handle finding the permutations of those characters.
Combinations
To compute all the k-element combinations from the set {1,2,…,n}, we can
use a recursive process. Let's say we want the two-element combinations of
1234. We first generate the two-element combinations of 123, which are 12,
13, and 23. These are half of the two-element combinations of 1234. For the
other half, we take the one-element combinations of 123, namely 1, 2, and 3,
and append a 4 to the end of them to get 14, 24, and 34.
Here is the code:

public static List<String> combos(int n, int k) {
    List<String> list = new ArrayList<String>();
    if (n==0 || k==0)
        return list;
    if (k==1) {
        for (int i=1; i<=n; i++)
            list.add(String.valueOf(i));
        return list;
    }
    list.addAll(combos(n-1, k));
    for (String s : combos(n-1, k-1))
        list.add(s + n);
    return list;
}
The Sierpinski triangle

The Sierpinski triangle starts with a single triangle. We connect the midpoints of its sides and cut out the middle triangle that this forms, leaving three smaller triangles. The same procedure that we apply at the start is repeatedly applied to all the smaller triangles. This makes it a candidate for recursion. Here is a program to draw an approximation to the Sierpinski triangle:
import javax.swing.*;
import java.awt.*;

public class Sierpinski extends JFrame {
    private Container contents;
    private int numIterations;

    public Sierpinski(int numIterations) {
        super("Sierpinski Triangle");
        this.numIterations = numIterations;
        contents = getContentPane();
        contents.setLayout(new BorderLayout());
        // The canvas setup and coordinates here are a reasonable
        // reconstruction: any component that calls sierpinski from its
        // painting method will do.
        JPanel canvas = new JPanel() {
            protected void paintComponent(Graphics g) {
                super.paintComponent(g);
                sierpinski(g, 225, 25, 25, 400, 425, 400, Sierpinski.this.numIterations);
            }
        };
        contents.add(canvas, BorderLayout.CENTER);
        setSize(450, 450);
        setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        setVisible(true);
    }

    public void drawTriangle(Graphics g, int x1, int y1, int x2, int y2, int x3, int y3) {
        g.drawLine(x1, y1, x2, y2);
        g.drawLine(x2, y2, x3, y3);
        g.drawLine(x3, y3, x1, y1);
    }

    public void sierpinski(Graphics g, int x1, int y1, int x2, int y2, int x3, int y3, int depth) {
        if (depth > 0) {
            sierpinski(g, x1, y1, (x1+x2)/2, (y1+y2)/2, (x1+x3)/2, (y1+y3)/2, depth-1);
            sierpinski(g, (x1+x2)/2, (y1+y2)/2, x2, y2, (x2+x3)/2, (y2+y3)/2, depth-1);
            sierpinski(g, (x1+x3)/2, (y1+y3)/2, (x2+x3)/2, (y2+y3)/2, x3, y3, depth-1);
        }
        else
            drawTriangle(g, x1, y1, x2, y2, x3, y3);
    }

    public static void main(String[] args) {
        new Sierpinski(6);
    }
}
The key to writing the code is just to worry about one case, namely the first
one. When we cut out the middle triangle, we create three new triangles,
whose coordinates are shown in the figure below.
If we want to just draw one level of the Sierpinski triangle, we could do the
following, using the coordinates given in the figure above along with the
midpoint formula:
public void sierpinski(Graphics g, int x1, int y1, int x2, int y2, int x3, int y3) {
    drawTriangle(g, x1, y1, (x1+x2)/2, (y1+y2)/2, (x1+x3)/2, (y1+y3)/2);
    drawTriangle(g, (x1+x2)/2, (y1+y2)/2, x2, y2, (x2+x3)/2, (y2+y3)/2);
    drawTriangle(g, (x1+x3)/2, (y1+y3)/2, (x2+x3)/2, (y2+y3)/2, x3, y3);
}
A small change to this code turns it into magical recursion code that draws not just one level, but arbitrarily many levels of the triangle. The change is to replace the drawTriangle calls with recursive calls to the sierpinski method:
public void sierpinski(Graphics g, int x1, int y1, int x2, int y2, int x3, int y3, int depth) {
    if (depth > 0) {
        sierpinski(g, x1, y1, (x1+x2)/2, (y1+y2)/2, (x1+x3)/2, (y1+y3)/2, depth-1);
        sierpinski(g, (x1+x2)/2, (y1+y2)/2, x2, y2, (x2+x3)/2, (y2+y3)/2, depth-1);
        sierpinski(g, (x1+x3)/2, (y1+y3)/2, (x2+x3)/2, (y2+y3)/2, x3, y3, depth-1);
    }
    else
        drawTriangle(g, x1, y1, x2, y2, x3, y3);
}
Each of those calls tells where to break up the bigger triangle and then says to “do the Sierpinski thing” on them, instead of just drawing a single triangle. To keep the recursion from going on forever, we add a depth variable that keeps track of how many iterations of the process we've got left. Each time we call sierpinski, we decrease that depth by 1 until we get to the base case of depth 0, which is when we actually draw the triangles to the screen.
The Towers of Hanoi

The Towers of Hanoi is a classic puzzle consisting of three pegs and a number of rings of different sizes, stacked on the left peg from largest on the bottom to smallest on top. The goal is to move all of the rings to the right peg, moving one ring at a time and never placing a ring on top of a smaller one.

At this point, it would be good to stop reading and get out four objects of varying sizes and try the problem. Rings and pegs are not needed—any four objects of different sizes will do. The minimum number of moves needed is 15. It's also worth trying with just three objects. In that case, the minimum number of moves is 7.
We will solve the problem using recursion. The key idea is that the solution to the problem builds on the solution with one ring less. To start, here is the optimal solution with two rings:
The two-ring solution takes three moves and is pretty obvious. Shown below is the optimal seven-move solution with three rings.
Looking carefully, we can see that the solution for two rings is used to solve the three-ring problem. Steps 1, 2, and 3 are the two-ring solution for moving two rings from the left to the center. Steps 5, 6, and 7 are the two-ring solution for moving two rings from the center to the right. And overall, the three-ring solution itself is structured like the two-ring solution. Now here's the solution for four rings:
Notice that steps 1-7 use the three-ring solution to move the top three rings to the middle. Step 8 moves the big ring over to the end, and then steps 9-15 use the three-ring solution to move the top three rings over to the end.
2 rings: Move the top ring to the center, move the bottom ring to
the right, move the top ring to the right.
3 rings: Move the top two rings to the center, move the bottom
ring to the right, move the top two rings to the right.
4 rings: Move the top three rings to the center, move the bottom
ring to the right, move the top three rings to the right.
Shown below are parts of the two-, three-, and four-ring solutions, lined up,
showing how they are similar.
The general solution is a similar combination of the n-1 solution and the two-ring solution. We use the n-1 solution to move the top n-1 rings to the center, then move the bottom ring to the right, and finally use the n-1 solution to move the top n-1 rings to the right.
It's not too hard to count the number of moves needed in general. The solution for two rings requires 3 moves. The solution for three rings requires two applications of the two-ring solution plus one more move, for a total of 2 · 3 + 1 = 7 moves. Similarly, the solution for four rings requires two applications of the three-ring solution plus one additional move, for 2 · 7 + 1 = 15 moves. In general, if h_n is the number of moves in the optimal solution with n rings, then we have h_n = 2h_(n-1) + 1, which works out to h_n = 2^n - 1 moves.
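A short sketch (ours) can check the recurrence against the closed form:

```java
public class HanoiCount {
    // h(n) = 2*h(n-1) + 1 with h(1) = 1, from building on the n-1 solution.
    public static long movesByRecurrence(int n) {
        if (n == 1)
            return 1;
        return 2 * movesByRecurrence(n - 1) + 1;
    }

    // Closed form: h(n) = 2^n - 1.
    public static long movesClosedForm(int n) {
        return (1L << n) - 1;
    }
}
```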
This is exponential growth. Already, the solution for 10 rings requires 2^10 - 1 = 1023 moves. This is a lot, but doable by hand in under an hour. The solution for 20 rings requires over a million moves, and the solution for 30 rings requires over a billion moves. The solution for 100 rings requires roughly 10^30 moves, which just might be able to be completed if every computer in existence worked round the clock for millions of years on the problem.
Here is a program that returns the optimal solution to the Towers of Hanoi
problem:
import java.util.ArrayDeque;
import java.util.Deque;

public class Hanoi {
    private Deque<Integer> s1, s2, s3;

    public Hanoi(int n) {
        s1 = new ArrayDeque<Integer>();
        s2 = new ArrayDeque<Integer>();
        s3 = new ArrayDeque<Integer>();
        for (int i=n; i>=1; i--)
            s1.push(i);
        System.out.println(s1 + " " + s2 + " " + s3);
        recursiveHanoi(s1, s3, s2, n);
    }

    private void recursiveHanoi(Deque<Integer> start, Deque<Integer> end,
                                Deque<Integer> other, int size) {
        if (size == 1) {
            end.push(start.pop());
            System.out.println(s1 + " " + s2 + " " + s3);
        }
        else {
            recursiveHanoi(start, other, end, size-1);
            recursiveHanoi(start, end, other, 1);
            recursiveHanoi(other, end, start, size-1);
        }
    }

    public static void main(String[] args) {
        new Hanoi(3);
    }
}
Here is the output for the three-ring solution:
[1, 2, 3] [] []
[2, 3] [] [1]
[3] [2] [1]
[3] [1, 2] []
[] [1, 2] [3]
[1] [2] [3]
[1] [] [2, 3]
[] [] [1, 2, 3]
The code uses three stacks to represent the pegs. The class's constructor
initializes the stacks and fills the left stack. All of the work is done by the
recursiveHanoi method. The method takes three stacks—start, end, and other
—as parameters, as well as an integer size. Any move can be thought of as
moving size number of rings from start to end, using other as a helper peg.
The base case of the recursion is size=1, which is when we are moving a single ring from one peg to another. We accomplish that by popping the ring from the start stack and pushing it onto the end stack. At this point we also print out the stacks, so that every time a move is made, the stacks are printed.
The recursive part of the method moves size-1 rings from the start stack to
the helper stack (other), then moves the bottom ring from the start stack to
the end stack, and finally moves size-1 rings from the helper stack to the end
stack.
Here is the code from Section 4.1 that recursively reverses a string.
public static String reverse(String s) {
    if (s.equals(""))
        return "";
    return reverse(s.substring(1)) + s.charAt(0);
}
If we try running the reverse function on a really long string, we will get a stack overflow error. The reason for this is that every time we make a recursive call, the program uses a certain amount of memory for local variables and other things the function needs. This is saved on an internal Java data structure called the call stack. This stack has a limited amount of space, typically space for maybe 10,000 function calls. Recursive calls are nested—each call happening before the previous one returns—which can cause the call stack to fill up, or overflow. This is a problem in most imperative languages, not just Java.
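A small experiment, not from the text, gives a feel for that limit by recursing until the stack overflows and counting how deep it got:

```java
public class DepthProbe {
    private static int depth = 0;

    private static void recurse() {
        depth++;       // count this call, then go one level deeper
        recurse();
    }

    // Returns roughly how many nested calls fit on the call stack.
    // The exact number varies by JVM and settings.
    public static int maxDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // expected: the call stack filled up
        }
        return depth;
    }
}
```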
As an extreme example, consider the following method, which does nothing but call itself:

public static void f() {
    f();
}
Here we basically have an infinite recursive loop. This will eventually fill up the
call stack and cause a stack overflow. When writing recursive code, it is easy
for a program bug to cause a stack overflow. This often happens from making
a mistake in the base case or forgetting it entirely. It can also happen if your
recursive code does not actually break the input into smaller pieces. For
instance, if instead of s.substring(1) in the reverse method, we use the entire
string s itself, then the input string to the function will always be the same
size, and the recursive calls will keep happening, never getting to the base
case to stop the recursion. Eventually the call stack will overflow.
Using parameters
Here is a simple example: a method that counts up to 10, printing each number. The iterative version would use a loop variable; in the recursive version below, that variable becomes a parameter:

public static void countToTen(int i) {
    System.out.println(i);
    if (i == 10)
        return;
    else
        countToTen(i + 1);
}
Notice how the loop variable from the iterative code becomes a parameter in the recursive code. In general, it's possible to convert any chunk of iterative code into recursive code using parameters in a similar way. As another example, here is some iterative code to compute Fibonacci numbers:
public static long fib(int n) {
    if (n==1 || n==2)
        return 1;
    long p = 1, c = 1;
    for (int i=0; i<n-2; i++) {
        long hold = c;
        c = p+c;
        p = hold;
    }
    return c;
}
Here is a direct translation of this approach into recursive code:
public static long fib(int n) {
    if (n==1 || n==2)
        return 1;
    return helperFib(n, 0, 1, 1);
}

private static long helperFib(int n, int i, long c, long p) {
    if (i==n-2)
        return c;
    return helperFib(n, i+1, c+p, c);
}
We have a helper function here because our recursive method has a lot of
parameters that the caller shouldn't be bothered with. The fib function just
handles the simple cases that n equals 1 or 2 and then sets the helperFib
function in motion by essentially “initializing” the variable i to 0, and c and p
to 1.
Notice how the variables in the iterative code (including the loop variable i) become parameters to the recursive function. Also, notice how the for loop in the iterative code runs until i equals n-2, which is the base case of the helperFib function.
Let's compare the iterative code's for loop with the recursive call to helperFib.
for (int i=0; i<n-2; i++) {
    long hold = c;
    c = p+c;
    p = hold;
}

return helperFib(n, i+1, c+p, c);
Notice that the recursive function calls itself with the i parameter set to i+1, with the c parameter set to c+p, and with the p parameter set to c. This is the equivalent of what happens in the iterative code's for loop, where we do i++, c=c+p, and p=hold, where hold gets the old value of c.
This approach is more work than the earlier recursive Fibonacci method we wrote, but it runs much more efficiently than the other version. In general, a similar process to this can be used to translate any iterative code into recursive code. Note that it's actually the call stack that makes recursive solutions seem simpler than iterative ones. The call stack saves the value of the local variables, whereas with an iterative solution we have to keep track of the variables ourselves. In fact, we can convert a recursive solution into an iterative one by managing a stack ourselves.
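For example, here is a sketch of reversing a string iteratively by managing a stack ourselves, doing by hand what the call stack does for the recursive version:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StackReverse {
    public static String reverse(String s) {
        Deque<Character> stack = new ArrayDeque<Character>();
        for (char c : s.toCharArray())
            stack.push(c);                 // characters go on in order
        StringBuilder result = new StringBuilder();
        while (!stack.isEmpty())
            result.append(stack.pop());    // and come off in reverse order
        return result.toString();
    }
}
```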
The upshot of all this is that any (solvable) problem can be approached either iteratively or recursively.
Tail recursion
There is a class of programming languages, called functional languages, which have better support for recursion than Java. They use what is called tail-call optimization to eliminate most of the overhead of the call stack. A function is said to be tail-recursive if it is written so that the last thing the function does is call itself without making any use of the result of that call other than to return it. Many of the examples from this chapter could easily be rewritten to be tail-recursive. For example, here is a tail-recursive version of the reverse method:
public static String reverse(String s) {
    return reverseHelper(s, "");
}

private static String reverseHelper(String s, String total) {
    if (s.equals(""))
        return total;
    return reverseHelper(s.substring(1), s.charAt(0) + total);
}
Notice how the last thing we do is call the function, and nothing is done with
the result of that other than to return it. This makes it tail recursive.
The reason for tail recursion is that some programming languages (but not
Java as of this writing) can turn tail recursion into iterative code, eliminating
the possibility for a stack overflow. Of course, why not just write the code
iteratively to start? The answer is you should just write it iteratively, except
that in some programming languages (mostly functional languages), you just
don't (or can't) do that. In those languages, you write most things using
recursion, and writing them in a tail-recursive way allows for the language's
compiler to turn the code into iterative code that doesn't risk a stack overflow.
Summary
Sometimes recursion almost seems like magic. All you have to do is specify how to use the solution to the smaller problem to solve the larger one, take it on faith that the smaller one works, and you're essentially done (once you do the base case, which is often simple).
For example, here again is the recursive solution to reversing a string: break off the first character, reverse the remaining characters, and add the first character to the end. The code for this is:
return reverse(s.substring(1)) + s.charAt(0);
We just assume that reverse correctly reverses the smaller substring and then add the first character to the end. This, plus the base case to stop the recursion, are all that we need.
In Java, a simple rule for when to use recursion might be to use it when it gives a simpler, easier-to-code solution than an iterative approach and doesn't risk a stack overflow error or other inefficiency (like with the first Fibonacci function we did). A good example of an appropriate use of recursion is calculating permutations. Iterative code to generate permutations is somewhat more complicated than the recursive code. And there is no real risk of a stack overflow, since we couldn't generate permutations for very large values of n anyway, recursively or iteratively, because there are so many permutations (n! of them).
The take-home lesson is that in Java, most code is done iteratively, and
recursion is useful on occasion.
Binary Trees

A binary tree is a structure made of nodes, where each node holds a data value and has a left and a right child, either of which may be absent. We can implement a binary tree using a node class much like the one from linked lists, except with two links:
public class BinaryTree<T> {
    private class Node {
        public T value;
        public Node left, right;

        public Node(T value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    private Node root;

    public BinaryTree() {
        root = null;
    }
}
Notice the similarity to the linked list class. Next, we will consider how to add a
new node to the tree. We will allow callers to add to any location in the tree.
To do this, callers will specify a string of L's and R's, indicating how to traverse
the tree by moving left and right to get to where the new node should go. For
instance, the figure below shows the addition of a node with data value 8 as
the left child of the node with data value 1.
To get to the place to add the new node, starting at the root we would have to walk left and then right, and then add the new node as the left child. The caller could do this via add(8, "LRL"). The first two letters, "LR", tell how to get to the insertion point, and the last letter, L, says to insert the new node as the left child. Here is code that implements this addition process:
public void add(T value, String location) {
    if (location.equals("")) {
        root = new Node(value, null, null);
        return;
    }
    Node node = root;
    for (char c : location.substring(0, location.length()-1).toCharArray()) {
        if (c == 'L')
            node = node.left;
        else
            node = node.right;
    }
    if (location.charAt(location.length()-1) == 'L')
        node.left = new Node(value, null, null);
    else
        node.right = new Node(value, null, null);
}
The first case, where the location is an empty string, is how the user would
add something to the root of the tree. Otherwise, we use all the characters of
the location string except the last to move us to the right spot in the tree.
Once we get there, we use the last character to tell us whether to add the new
node as the left or right child.
Note that if the caller specifies a location that is already in the tree, then this code will wipe out a portion of the tree.
Recursion on binary trees

Many binary tree methods are easiest to write recursively. To see the idea, first consider this recursive method that finds the length of a string:

public static int length(String s) {
    if (s.equals(""))
        return 0;
    return length(s.substring(1)) + 1;
}
We break the string into its head (the current character) and tail (everything after it), and the overall length is the length of the tail plus 1 for the head.
Here now is the binary tree size method:
public int size(Node node) {
    if (node == null)
        return 0;
    return size(node.left) + size(node.right) + 1;
}
Whereas we break strings up into their head and tail, for binary trees it's more
like there are two tails: a left and a right one. The size of the tree that starts
at a given node is the size of the left subtree plus the size of the right subtree,
plus 1 for the node itself. See the figure below.
Note the base case is a null node. The ends of the tree are indicated by nulls,
so that is where we stop the recursion.
One issue with this approach is that the recursion requires a Node parameter,
but someone calling the method just wants to do something like tree.size()
without having to worry about a Node parameter. To fix this, we introduce a
helper, as below:
public int size() {
    return sizeHelper(root);
}

private int sizeHelper(Node node) {
    if (node == null)
        return 0;
    return sizeHelper(node.left) + sizeHelper(node.right) + 1;
}
The helper method does all the work. The size method is a wrapper around it that simply starts the recursion in motion at the tree's root.
We will look at several other binary tree methods. Each will follow the same structure as this one. The way to approach these problems is to ask the following: if we know the answer to the problem for the left and right subtrees, how can we combine that along with information about the node itself to answer the overall problem?
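As one more illustration of that question, here is a sketch that sums all the values in a tree, using a minimal stand-alone tree class just for the example:

```java
public class TreeSumDemo {
    // A minimal recursive tree node, just for this illustration.
    static class Tree {
        int value;
        Tree left, right;

        Tree(int value, Tree left, Tree right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    // Combine the answers for the subtrees with the node's own value.
    public static int sum(Tree t) {
        if (t == null)
            return 0;
        return sum(t.left) + sum(t.right) + t.value;
    }
}
```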
A contains method
Here we look at a method that tells whether or not the tree contains a
particular data value. To write it recursively, we use the same approach as with
the size method. We look at the current node itself, its left subtree, and its
right subtree. In particular, the subtree starting at a particular node contains a
given data value if that data value is either at the node itself, in its left
subtree, or in its right subtree. See the figure below:
public boolean contains(T value) {
    return containsHelper(root, value);
}

private boolean containsHelper(Node node, T value) {
    if (node == null)
        return false;
    return node.value.equals(value) || containsHelper(node.left, value) ||
           containsHelper(node.right, value);
}
Just like with the size method, we use a helper method. The base case is a null node. The recursive part checks the node itself for the data value and it recursively checks the left and right subtrees. We use a single statement here to return the logical OR of the three possibilities, but we could also do this with an if/else statement.
A height method

The height of a tree is the number of levels it has. Knowing the heights of the left and right subtrees, how do we get the overall height? Take whichever of the two heights is larger and add 1 to it since the current node adds 1 to the total height. See the figure below:
Below is the code for the method. Note that Java's Math.max method returns
the larger of two numbers.
public int height() {
    return heightHelper(root);
}

private int heightHelper(Node node) {
    if (node == null)
        return 0;
    return Math.max(heightHelper(node.left), heightHelper(node.right)) + 1;
}
This height method demonstrates how to think recursively: figure out how the problem can be solved in terms of smaller versions of itself. You don't actually have to specify how to move through the tree. Instead specify how to combine the solutions to the subproblems to get a solution to the whole, and specify a base case to stop the recursion.
A method to count leaves
The leaves of a tree are the nodes at the ends of the tree that don't have any children. In terms of Java code, a node called node is a leaf if node.left and node.right are both null. We can combine this fact along with recursion to count the number of leaves in a tree. In particular, the number of leaves is the number of leaves to the left of the current node plus the number to the right. See the figure below:
public int countLeaves() {
    return countLeavesHelper(root);
}

private int countLeavesHelper(Node node) {
    if (node == null)
        return 0;
    if (node.left == null && node.right == null)
        return 1;
    return countLeavesHelper(node.left) + countLeavesHelper(node.right);
}
There are essentially two base cases here, one for when we fall off the end of
the tree and one for if the node is a leaf.
A toString method
Here we write a toString method for printing all the elements in the tree. The
method follows the same basic structure as all of the other methods we have
written. Here is the code:
public String toString() {
    return toStringHelper(root);
}

private String toStringHelper(Node node) {
    if (node == null)
        return "";
    return toStringHelper(node.left) + " " + node.value + " " + toStringHelper(node.right);
}
The way that works is we always visit nodes in the order left-center-right. That is, from any given node, we will always print out everything in the left subtree, then the node's own value, and finally everything in its right subtree. This is called an in-order traversal, and it is a recursive process. From a given node we first print out the in-order traversal of the left subtree, then the value of the node, then the in-order traversal of the right subtree. Let's look at how this works on the tree below:
Start at the root. According to the in-order rule, we first look at its left subtree. So we're now at the node labeled with 5. Again, recursively following the rule, we look at this node's left subtree. We're now at the node labeled 9. Its left subtree is empty, so following the in-order rule, we then look at the node itself, which has data value 9, and that's the first thing that gets printed. The node labeled 9 has no right subtree. Having finished with this node, we back up to the previous node. Having finished its left subtree, we visit the node itself, and print out a 5. We then visit its right subtree and apply the same process. In the end, we get 9 5 1 4 3 6 2.
If we change around the order in the return statement we can get other types
of traversals. Two common ones are the pre-order traversal (node, left, right)
and post-order traversal (left, right, node).
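The three traversal orders can be compared side by side in a small sketch (again with a minimal stand-in tree class):

```java
public class TraversalDemo {
    static class Tree {
        int value;
        Tree left, right;

        Tree(int value, Tree left, Tree right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    public static String inOrder(Tree t) {     // left, node, right
        if (t == null)
            return "";
        return inOrder(t.left) + t.value + " " + inOrder(t.right);
    }

    public static String preOrder(Tree t) {    // node, left, right
        if (t == null)
            return "";
        return t.value + " " + preOrder(t.left) + preOrder(t.right);
    }

    public static String postOrder(Tree t) {   // left, right, node
        if (t == null)
            return "";
        return postOrder(t.left) + postOrder(t.right) + t.value + " ";
    }
}
```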
Below is an alternate toStringHelper method that prints out the path to each
node as a string of L's and R's. To get the path, we add a parameter location
to the helper function. In the recursion, we add an "L" or "R" to the previous
value of location depending on whether we are moving on to the left or right
subtree. This location variable thus keeps track of the string of L's and R's
that we have to take from the root to get to the node.
private String toStringHelper(Node n, String location) {
    if (n == null)
        return "";
    if (n == root)
        return toStringHelper(n.left, location+"L") + " " + "root: " + n.value + " "
               + toStringHelper(n.right, location+"R");
    else
        return toStringHelper(n.left, location+"L") + " " + location + ": " + n.value + " "
               + toStringHelper(n.right, location+"R");
}
For the tree shown earlier in this section, here is the output of this method:

LL: 9
L: 5
LR: 1
LRR: 4
root: 3
RL: 2
R: 6
private void stuffAtEvenLevelsHelper(Node node, int level) {
    if (node == null)
        return;
    if (level % 2 == 0) {
        // do something with the node's value here
        stuffAtEvenLevelsHelper(node.left, level+1);
        stuffAtEvenLevelsHelper(node.right, level+1);
    }
    else {
        stuffAtEvenLevelsHelper(node.left, level+1);
        stuffAtEvenLevelsHelper(node.right, level+1);
    }
}
public class Tree {
    private int value;
    private Tree left, right;

    public Tree(int value, Tree left, Tree right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }
}

Here is a binary tree and its declaration using this class. The declaration is
spread across several lines for readability.

Tree tree = new Tree(5,
                new Tree(2, null, null),
                new Tree(8, null, null));
public String toString() {
    return toStringHelper(this);
}

private static String toStringHelper(Tree tree) {
    if (tree == null)
        return "";
    return toStringHelper(tree.left) + " " + tree.value + " "
           + toStringHelper(tree.right);
}
We can see that this is very similar to the earlier toString method. In fact, the
other recursive algorithms we wrote would change very little with this new
approach.

This approach and the earlier node-based approach are actually quite similar.
In the earlier approach the recursion is hidden in the Node class, which has
references to itself.

Either of the two implementations of binary trees gives a good basis to work
from if you need a binary tree class for something. This is important as there
is no binary tree or general tree class in Java's Collections Framework, so if
you need an explicit binary tree class, you either have to code one up yourself
or look around for one.
Applications
Binary trees and more general trees are a fundamental building block of other
data structures and algorithms and form a useful way to represent data. They
show up in networking algorithms, compression algorithms, and databases. In
Chapters 6 and 7 we will see binary search trees and heaps, which are
specialized binary trees used for maintaining data in order.
A computer's file system can be thought of as a tree with the directories and
files being the nodes. The files are leaf nodes, and the directory nodes are
nodes whose children are subdirectories and files in the directory.
Chapter 6. Binary Search Trees

6.1. Introduction
Suppose we have some data that is in order. We are continually adding and
deleting things, and we want the data to stay in order as things are added and
deleted. One approach is to store the data in a list and use a search to find the
right place to add new elements, or just add new elements to the end of the
list and re-sort it. This is fine if the list is small or if we don't care about speed.
However, list insertions, as well as deletions, are typically O(n), and a list sort
is typically an O(n log n) operation.
By storing our data in a more clever way in something called a binary search
tree (BST), we can get the running time down to O(log n). It is worth noting
once more how big a difference there is between n and log n: if
n=1,000,000, then log n = 20. There is a huge difference between a program
that takes 1,000,000 microseconds to run and a program that takes 20
microseconds to run.
Recall from Section 1.5 that the binary search algorithm is an O(log n)
algorithm for finding an element in a sorted list. It works by continually
breaking the data into halves and only looking at the half that could contain
the value we are looking for. We start by looking at an entry in the middle of
the list. Assuming we don't find it on the first try, if the entry is bigger than
the one we want we know not to bother with the upper half of the list and to
just look at the lower half of the list, and if the entry is smaller than the one
we want, we can just focus on the upper half of the list. We then apply the
same procedure to the appropriate half of the list and keep doing so, cutting
the search space in half at each step until we find the value we want. This
continuous cutting in half is characteristic of logarithmic algorithms.
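That halving process can be sketched as follows (a minimal iterative version, not the Section 1.5 code itself):

```java
public class BinarySearchDemo {
    // Returns the index of target in the sorted array a, or -1 if absent.
    static int binarySearch(int[] a, int target) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) / 2;
            if (a[mid] == target)
                return mid;
            else if (a[mid] > target)
                hi = mid - 1;   // discard the upper half
            else
                lo = mid + 1;   // discard the lower half
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] a = {2, 5, 8, 12, 16, 23, 38};
        System.out.println(binarySearch(a, 23));  // 5
        System.out.println(binarySearch(a, 7));   // -1
    }
}
```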
A BST can be thought of as combination of a binary tree and the binary search
algorithm. Formally, a BST is a binary tree with one special property:
For any given node N, each element in its left subtree is less than
or equal to the value stored in N and each element in its right
subtree is greater than the value stored in N.
Below are two binary trees. The one on the left is a binary search tree, but the
one on the right is not. Notice in the tree on the left that for any given node,
everything to the left of it is less than or equal to it and everything to the right
is greater than it. On the other hand, the tree on the right is not a binary
search tree for two reasons. First, the 16 in the third level is greater than the
14 at the root but is to the left of 14. Second, the 17 at the bottom of the tree
is less than the 18 two levels up from it but is to the right of it.
To add an element to a BST we rely on the key property of BSTs, that things to
the left of a node are smaller and things to the right are larger. We start at the
root and traverse the tree comparing the value to be added to the value stored
in the current node. If the value to be added is less than or equal to the value
in the current node, we move left; otherwise, we move right. We continue this
until we reach the end of the tree (reach a null link).
For instance, in the example below, let's suppose we are adding a 7 into the
BST.
We start by comparing 7 to the root's value 5. Since 7 > 5, we move right. Then
we compare 7 to the next node's value, 11, and since 7 ≤ 11, we move left.
Then comparing 7 to 6, we see 7 > 6 so we move right. At this point we stop
and add the new node because the last move right led to a null node at the
end of the tree.
Here is another example. Suppose we add the integers 5, 11, 19, 6, 2, 4, 2, 3,
26 to the BST in exactly that order. The figure below shows the evolution of
the tree.
Try this on paper for yourself to get a good sense of how the BST property
works.
It might not be obvious that the data stored in this tree is actually sorted, but
it is. The key is to do an in-order traversal of the tree. Remember that an in-
order traversal visits nodes in the order left-middle-right, visiting the entire
left subtree, then the node itself, then the right subtree. The BST property
guarantees that the data values of everything to the left of a node will be less
than or equal to its value and everything to the right will be greater, which means
that an in-order traversal will return the data in exactly sorted order.
Deleting things
When deleting things from a BST, we sometimes have to be careful. There are
three cases to consider: (1) deleting a leaf, (2) deleting a node with exactly
one child, (3) deleting a node with two children.
Deleting a leaf
Deleting a leaf is pretty straightforward. We simply remove the leaf from the
tree. Doing so has no effect on the BST property. See below.

Deleting a node with two children
This is the most interesting case. We can't reroute the parent's link to the child
like in the previous case because there are two children. Rather, the trick is to
replace the node being deleted with the largest node in left subtree of the
node being deleted. See the figure below for an example.
Why does this work? Let d be the name of the node being deleted and let m be
the largest node in the left subtree of d. Because m is to the left of d, it is
guaranteed to be less than anything to the right of d, so that part of the BST
property works. Further, since m is the largest thing in the left subtree, it is
guaranteed to be greater than everything else to the left of d, so that part of
the BST property is satisfied. So, overall, m fits perfectly into the tree right
where d is.

One other potential problem is how to delete m, since we are moving it
from its current location. The fact is that m is guaranteed to have no right
child since any such child would have to be larger than m, and m is the largest
thing in that subtree. So m has at most one child, and we can just delete it if
it's a leaf or reroute its parent's link around it if it has a left child.

It's worth asking if this is the only way to do things. The answer is no. By
symmetry, we could replace the deleted node with the smallest thing in its
right subtree. There are often other things that we could replace it with, but in
order to program this, we want to have some rule that always works, and this
rule is guaranteed to work. It's also fast, usually running in O(log n) time
(we'll see why a little later).
Running times
An important consideration about BSTs is how well balanced they are. Roughly
speaking, a tree is balanced if all of the paths from the root to the leaves
(nodes at the ends of the tree) are roughly the same length. If there are some
long and some short paths, the tree is unbalanced. The tree on the left below
is balanced, while the one on the right is not.
In the perfectly balanced case, each of the paths from the root to the leaves
has length roughly log n, where n is the number of nodes. To see why, note
that at the root level there is one node. At the next level there are at most two
nodes, the root's two children. At the next level, each child has at most two
children for a total of four nodes. In general, as we go from one level to the
next, the maximum number of nodes doubles, so we have 2^(k-1) nodes at
level k, where level 1 is the root level. If the tree is completely full to level k,
then, using the geometric series formula, we find that there are
1 + 2 + 4 + … + 2^(k-1) = 2^k - 1 total nodes in the tree. Thus we see that a
tree with n nodes and k levels must satisfy the equation 2^k - 1 = n. Solving
for k, we get k = log(n+1), where the logarithm is base 2. So the number of
levels is the logarithm of the number of nodes. Another way to look at things
is that each time we choose a left or right child and step down the tree, we cut
off half of the tree. This continual dividing in half is just like the binary search
algorithm, which has a logarithmic running time.
In a case like this, adding and checking if the tree contains an item will both
work in O(log n) time as they both walk down the tree one level at a time,
until they reach their goal or the bottom of the tree. Deleting an item will also
work in O(log n) time, as it won't take more than O(log n) steps to find the
largest item in the left subtree.
As long as the tree is relatively balanced, the methods will work in more or
less O(log n) time. If there is a lot of data and it's pretty random, the tree will
be fairly well balanced. On the other hand, suppose the data we're putting into
a BST already happens to be sorted. Then each item, as it's added to the BST,
will be added as the right child of its parent. The result is a degenerate tree, a
long string of nodes arranged like a list. In this case the BST behaves a lot
like a linked list and all the operations are reduced to O(n). This is a serious
problem because in a lot of real-life situations, data is often sorted or nearly
sorted.
In a case like this, there are techniques to balance the tree. One of these
techniques, AVL trees, involves rebalancing the tree after each addition of a
node. This involves moving around a few nodes in the area of the added node.
Another technique is red-black trees. These techniques are nice to know, but
we will not cover them here. They are covered in other data structures books
and on the web.
Java's Collections Framework does not have a dedicated BST class, though it
does use BSTs to implement some other data structures (like sets and maps,
which we'll see in Chapter 8).
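For example, Java's TreeSet (which is backed by a red-black tree, a self-balancing BST) keeps its elements in sorted order with O(log n) add, remove, and contains. A quick sketch:

```java
import java.util.TreeSet;

public class TreeSetDemo {
    public static void main(String[] args) {
        // TreeSet is backed by a red-black tree, a self-balancing BST
        TreeSet<Integer> set = new TreeSet<Integer>();
        int[] values = {5, 11, 19, 6, 2, 4, 3, 26};
        for (int x : values)
            set.add(x);
        System.out.println(set);               // elements in sorted order
        System.out.println(set.first());       // smallest element
        System.out.println(set.contains(19));  // O(log n) lookup
    }
}
```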
public class BinarySearchTree {
    private Node root;

    private class Node {
        public int value;
        public Node left, right;

        public Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    public BinarySearchTree() {
        root = null;
    }

    @Override
    public String toString() {
        return toStringHelper(root);
    }

    private String toStringHelper(Node node) {
        if (node == null)
            return "";
        else
            return toStringHelper(node.left) + " " + node.value + " "
                   + toStringHelper(node.right);
    }
}
The toString method works in the order left-center-right. Keeping in mind the
BST property about everything left being less and everything right being
greater, this toString method will always print out the BST's data in perfectly
sorted order.
The process for adding a value to a BST is as follows: Start at the root,
compare the value to it, move left if it's less than or equal to the root's value
and move right otherwise. At the next node, we repeat the process. We
continue until we reach the end of the tree (a null node) and then insert a new
node at that location. This is all coded below:
public void add(int value) {
    // special case for adding to an empty tree
    if (root == null) {
        root = new Node(value, null, null);
        return;
    }

    // walk down the tree, keeping track of the parent of the current node
    Node node = root, parent = null;
    while (node != null) {
        parent = node;
        if (value <= node.value)
            node = node.left;
        else
            node = node.right;
    }

    // insert the new node as a child of the last tree node we reached
    if (value <= parent.value)
        parent.left = new Node(value, null, null);
    else
        parent.right = new Node(value, null, null);
}
At the start of the code we have a special case for adding to an empty tree.
After that we have the code that walks the tree, moving left or right depending
on how the inserted value compares to the current node's value.

Once we are done with that, we create the new node and insert it as either the
left or the right child of the last tree node we reached. In order to keep track
of that last tree node, we use a separate Node variable called parent. In the
loop, before we update the variable node, we set parent=node. The effect is
that as we walk through the tree, node is the current node we are at and
parent is the one we were at in the previous step.
Here is some code to test the add method:

BinarySearchTree bst = new BinarySearchTree();
int[] values = {5, 11, 19, 6, 2, 4, 2, 3, 26};
for (int x : values)
    bst.add(x);
System.out.println(bst);
And the output looks like below, in perfectly sorted order:
2 2 3 4 5 6 11 19 26
The code for checking for containment is very similar to the code for the add
method. We have a very similar loop that walks through the tree, moving left
or right as appropriate. If in the loop we ever find the value we are looking for,
we return true. If we fall off the end of the tree, then we are guaranteed by
the BST properties that the value is not in the tree, so we return false.
public boolean contains(int value) {
    Node node = root;
    while (node != null) {
        if (node.value == value)
            return true;
        if (value <= node.value)
            node = node.left;
        else
            node = node.right;
    }
    return false;
}
Deleting things
Recall that there are three cases to deleting a node: (1) deleting a leaf (a node
with no children), (2) deleting a node with exactly one child, and (3) deleting a
node with exactly two children. This method is arguably the most complicated
one in this book. First, here is the entire method:
public void delete(int value) {
    // Special case if removing root and root doesn't have 2 children
    if (root != null && root.value == value
            && (root.left == null || root.right == null)) {
        if (root.left == null)
            root = root.right;
        else
            root = root.left;
        return;
    }

    // locate the node to be deleted and its parent
    Node node = root, parent = null;
    while (node != null && node.value != value) {
        parent = node;
        if (value <= node.value)
            node = node.left;
        else
            node = node.right;
    }
    if (node == null)  // value is not in the tree
        return;

    // case 1: deleting a leaf
    if (node.left == null && node.right == null) {
        if (node == parent.left)
            parent.left = null;
        else
            parent.right = null;
    }
    // case 2: deleting a node with only a left child
    else if (node.right == null) {
        if (node == parent.left)
            parent.left = node.left;
        else
            parent.right = node.left;
    }
    // case 2: deleting a node with only a right child
    else if (node.left == null) {
        if (node == parent.left)
            parent.left = node.right;
        else
            parent.right = node.right;
    }
    // case 3: two children -- replace with the largest value in the left subtree
    else {
        Node largest = node.left, parentOfLargest = node;
        while (largest.right != null) {
            parentOfLargest = largest;
            largest = largest.right;
        }
        node.value = largest.value;
        if (parentOfLargest == node)
            parentOfLargest.left = largest.left;
        else
            parentOfLargest.right = largest.left;
    }
}
The code starts off with a special case involving the root that isn't covered by
any of the later cases. After that, we have a loop, very similar to the loop in
the add method, that locates the node to be deleted as well as its parent.

We then take care of deleting a leaf. To do that, we break the link from its
parent, using an if statement to figure out if the node is its parent's left or
right child.
The next part of the code handles deleting a node with one child by routing the
link from the parent around the child to point to the grandchild.
The last part of the code handles deleting a node with two children. Recall that
to do that, we replace the deleted node with the largest value in its left
subtree. To find that node, we move left from the deleted node and then move
right as far as we can until we hit the end of the tree. The node we reach in
that way is guaranteed to be the largest in the left subtree.
We then replace the deleted node's value with the value we just found and we
delete the node we just found by routing its parent's link around it to its left
child (which might be null). There won't be a right child since we found that
node by going right as far as we could. Note that we need a special case to
handle the possibility that the largest thing in the left subtree is actually the left
child of the node being deleted.
It's good to stress-test this method a bit. Below is some code to do that. The
code adds 100 random integers to a BST and then randomly chooses elements
to delete from the tree.
BinarySearchTree bst = new BinarySearchTree();
List<Integer> list = new ArrayList<Integer>();
Random random = new Random();
for (int i = 0; i < 100; i++) {
    int x = random.nextInt(1000);
    list.add(x);
    bst.add(x);
}
System.out.println(bst);

Collections.shuffle(list);
for (int x : list) {
    bst.delete(x);
    System.out.println(bst);
}
Java's built-in Integer, Double, and String classes all implement the
Comparable interface. For instance, the following prints out a negative number:
System.out.println("abc".compareTo("def"));
public class Card implements Comparable<Card> {
    private int value;
    private String suit;

    public Card(int value, String suit) {
        this.value = value;
        this.suit = suit;
    }

    @Override
    public int compareTo(Card other) {
        return ((Integer)this.value).compareTo(other.value);
    }
}
The compareTo method for this class piggybacks off the compareTo method of
Java's built-in Integer class. Here is how we would use the class to compare
cards:
Card a = new Card(3, "diamonds");
Card b = new Card(8, "hearts");
if (a.compareTo(b) < 0)
    System.out.println("a comes before b");
else
    System.out.println("b comes before a (or they are equal)");
Ordinary    compareTo
a < b       a.compareTo(b) < 0
a <= b      a.compareTo(b) <= 0
a == b      a.compareTo(b) == 0
a != b      a.compareTo(b) != 0
a > b       a.compareTo(b) > 0
a >= b      a.compareTo(b) >= 0
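One payoff of implementing Comparable is that library methods like Collections.sort can then order the objects. Here is a sketch using a simplified Card class (similar to, but not identical to, the one above):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CardSortDemo {
    // simplified version of the Card class discussed above
    static class Card implements Comparable<Card> {
        int value;
        String suit;
        Card(int value, String suit) { this.value = value; this.suit = suit; }

        @Override
        public int compareTo(Card other) {
            return ((Integer) this.value).compareTo(other.value);
        }

        @Override
        public String toString() { return value + " of " + suit; }
    }

    public static void main(String[] args) {
        List<Card> hand = new ArrayList<Card>();
        hand.add(new Card(7, "spades"));
        hand.add(new Card(3, "diamonds"));
        hand.add(new Card(10, "clubs"));
        Collections.sort(hand);  // uses compareTo to order the cards
        System.out.println(hand);
    }
}
```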
Chapter 7. Heaps

7.1. Introduction
A heap is another kind of binary tree. Like BSTs, heaps maintain their
elements in a kind of sorted order. Heaps are designed so that finding, and
possibly removing, the smallest element is very fast. In particular, finding the
minimum element in a heap is an O(1) operation, compared with O(log n) for a
BST. However, unlike BSTs, heaps do not store all the data in sorted order. It's
just the minimum value that is guaranteed to be easy to find.

The definition of a heap is that it is a binary tree with two special properties:
1. The value of a parent node is less than or equal to the values of both its
children.
2. The nodes are added into the tree typewriter style. That is, we start at
the root and move through the tree from left to right, completely filling a
level before moving on to the next. The order is shown in the figure
below.
Notice the difference in property 1 between BSTs and heaps. In a BST, values
to the left of the parent are always less than or equal to the parent's value and
values to the right are larger. In a heap, left and right are not important. All
that matters is that the value of the parent be less than or equal to the
values of its children. Property 2 guarantees that the heap is a balanced binary
tree.
On the other hand, the tree on the right is not a heap for two reasons. First,
the node with value 12 has a child with value 9, breaking the rule that the
parent has to be less than or equal to the child. Second, there is a hole where
the node with value 4 should have a left child. This tree was not filled
typewriter-style.
Below is an example where we add the value 3 (shown in green) to the heap.
We start by placing it at the end of the heap. Its value is out of place, however.
We compare its value to its parent, which has value 8. Since 3 < 8, a swap is
necessary. We then compare 3 with its new parent, which has value 5. Since
3 < 5, another swap is needed. We again compare 3 with its new parent and
this time 3 > 1, so we can stop.
In the example we start by moving the very last item (value 9) from the end
of the heap to the top. Specifically, that value of 9 is deleted from the bottom
and it replaces the value at the top of the tree. We then compare 9 to its two
new children, 4 and 5. Since 9 is out of place, being larger than them, we
swap it with the smaller child, 4. We then repeat the process with 9's two new
children, and then repeat the process one more time until 9 ends up at the
right place in the heap.
We can greatly reduce the number of checks using a heap. First assume that
the particles are moving in straight line trajectories. We can calculate ahead of
time when two particles will collide. We use a heap of collisions, with the heap
values being the collision times. After each time step we pop off any imminent
collisions from the heap and address them. It may happen that a collision we
calculated ahead of time may not happen because one of the particles got
deflected in the meantime, so we also keep track of which collisions won't
happen. After addressing collisions, we have to recalculate new collisions since
the colliding particles are now moving in different directions, but the total
number of these calculations is usually a lot less than n(n-1)/2.
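The idea can be sketched with Java's built-in PriorityQueue; the Event class here and its fields are hypothetical stand-ins, just to show the pattern:

```java
import java.util.PriorityQueue;

public class CollisionQueueDemo {
    // Hypothetical event record: a predicted collision at a given time.
    static class Event implements Comparable<Event> {
        double time;
        String particles;
        Event(double time, String particles) {
            this.time = time; this.particles = particles;
        }

        @Override
        public int compareTo(Event other) {
            return Double.compare(this.time, other.time);
        }
    }

    public static void main(String[] args) {
        PriorityQueue<Event> collisions = new PriorityQueue<Event>();
        collisions.add(new Event(4.2, "A-B"));
        collisions.add(new Event(1.7, "C-D"));
        collisions.add(new Event(3.1, "A-C"));

        // Events pop off in time order, so the soonest collision is always first.
        double now = 0, step = 2.0;
        now += step;  // advance the simulation one time step
        while (!collisions.isEmpty() && collisions.peek().time <= now)
            System.out.println("handle collision " + collisions.poll().particles);
    }
}
```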
Java doesn't have a class specifically called a heap, but it does have something
called a priority queue, which can be used as a heap. A priority queue is a
queue where each element is given a priority and when elements are removed
from the queue, they are removed in order of priority. Lots of real-life
situations can be described by priority queues. For instance, in our everyday
lives we have a bunch of things to do, some of which have higher priorities
than others. In computers, operating systems use priority queues to determine
which tasks to run when, as certain tasks are more important than others.
Internally, Java's PriorityQueue class is implemented with a heap, and we can
use the PriorityQueue as a heap. Here is an example:

PriorityQueue<Integer> heap = new PriorityQueue<Integer>();
heap.add(8);
heap.add(4);
heap.add(1);
heap.add(2);
while (!heap.isEmpty())
    System.out.print(heap.remove() + " ");

The remove method will pop things off in numerical order, and the output will
be 1 2 4 8. Note further that the method that returns the top value without
removing it is called element.
7.4. Implementing a heap

Shown below is a heap with the nodes labeled in order by level, left to right.
Remember that a heap has no gaps. So this suggests that we can represent a
heap using a list, specifically a dynamic array. In particular, the root will be at
list index 0, its two children will be at indices 1 and 2, its left child's children
will be at indices 3 and 4, etc.
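In general, the children of the node at index p live at indices 2p+1 and 2p+2, and the parent of the node at index c is at (c-1)/2 using integer division. A quick sanity check of those formulas (a small sketch, not part of the heap class):

```java
public class HeapIndexDemo {
    public static void main(String[] args) {
        // children of the node at index p are at 2p+1 and 2p+2
        int p = 1;
        System.out.println((2*p + 1) + " " + (2*p + 2));  // 3 4

        // parent of the node at index c is (c-1)/2, integer division
        System.out.println((3 - 1) / 2);  // 1
        System.out.println((4 - 1) / 2);  // 1
    }
}
```

Notice that integer division makes both children, at indices 3 and 4, map back to the same parent at index 1.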
public class Heap {
    private List<Integer> data;

    public Heap() {
        data = new ArrayList<Integer>();
    }
}

The easiest method to write is the peek method that returns the minimum
value stored in the heap. That value is always stored at the top of the heap,
which is index 0 of the list. Here is the method:

public int peek() {
    return data.get(0);
}
Adding to a heap
Recall the process for adding to a heap: We add the new item at the very end
of the heap. Then we compare that item to its parent's value and swap the two
if they are out of order. We then repeat the process at the parent's location, its
parent's location, etc. until the value is at the right place in the tree. Below is
the code for the method:
public void add(int value) {
    data.add(value);
    int c = data.size() - 1;
    int p = (c-1) / 2;
    while (c > 0 && data.get(c) < data.get(p)) {
        int temp = data.get(p);
        data.set(p, data.get(c));
        data.set(c, temp);
        c = p;
        p = (c-1)/2;
    }
}
The code starts by adding the new item to the end of the list. The code uses
the p=(c-1)/2 formula for finding the parent of a node. The loop runs until the
new value is in the right place in the tree or until we reach the top of the tree.
The code inside the loop swaps the parent and child values and updates the p
and c variables that keep track of which items we are currently comparing.
Popping from a heap

The code for popping from a heap takes a little work to get right. The way the
popping process works is after we pop the top item, we need to find something
to replace it. We replace it with the last item in the heap, since removing that
item maintains the typewriter property of the heap. That item probably doesn't
belong at the top of the heap, so we have to find the right place for it. Here is
the code for the method:
Sets and Hashing + public int pop() {
// save the top value and replace it with the last item in the list
data.set(0, data.get(data.size()-1));
Graphs +
data.remove(data.size()-1);
Sorting +
// move the new top to its correct location in the heap
int p = 0;
Exercises +
int c1 = 2*p + 1;
int c2 = 2*p + 2;
data.set(p, data.get(c1));
data.set(c1, temp);
p = c1;
else {
data.set(p, data.get(c2));
data.set(c2, temp);
p = c2;
c1 = 2*p + 1;
c2 = 2*p + 2;
data.set(p, data.get(c1));
data.set(c1, temp);
return returnValue;
Running times of }
algorithms +
The code starts out by saving the top value, and the last line of the method
returns that value. After saving that top value, we replace the top with the last
thing from the list and remove that last value from the list. The loop contains a
lot of code, but the concept of what it's doing is relatively straightforward: We
start with p=0 and use the formulas c1=2p+1 and c2=2p+2 to get the locations
of the two children. We then compare the parent's value to both of those
children and swap with whichever one is smaller.

We continue this process until it happens that the parent's value is not larger
than either child or until we reach the bottom of the tree. In particular, the
first two conditions in the while loop handle looking for the bottom of the tree,
while the condition on the second line compares the parent and children.

After the while loop, there's one special case we have to worry about that's not
covered by the loop. Specifically, it deals with the possibility that the last value
of p is a node that has a left child but no right child. This can only happen at
the very bottom right of the tree.
Sets are useful for storing data where we want to be able to quickly tell if the
value is in the set or not. In particular, by using a technique called hashing, we
can create a set data structure where checking for containment, as well as
adding and deleting elements, are all O(1) operations.
0: []
1: [6]
2: [2, 12]
3: []
To add a new element to the set, we run it through the hash function and add
it to the list at the resulting index. To determine if the set contains an element,
we compute its hash and search the corresponding list for the element. These
are all fast operations as long as the buckets are not too full.
Hash functions
The hash functions we have considered thus far consist of just modding by a
number. This isn't ideal as it tends to lead to a lot of collisions when working
with real-life data. We want to minimize collisions and keep things spread
relatively evenly among the buckets. A good hash function to use with integers
is one of the form h(x)=ax mod b, where a and b are relatively large primes.
We can also have hash functions that operate on other data types, like strings.
For instance, a simple, but not particularly good, hash function operating on
strings of letters might convert the individual letters into numerical values, add
the numerical values of the letters, and then mod by 100. For example, the
hash of "message" would be 13+5+19+19+1+7+5 mod 100 = 69. Notice how
it takes any string and converts it into a value in a fixed range, in this case the
range from 0 to 99.
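Here is that letter-sum hash in code (a sketch assuming lowercase words, with a=1 through z=26):

```java
public class StringHashDemo {
    // hash of a lowercase word: add the letter values (a=1, ..., z=26), mod 100
    static int hash(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++)
            sum += s.charAt(i) - 'a' + 1;
        return sum % 100;
    }

    public static void main(String[] args) {
        // 13+5+19+19+1+7+5 = 69, and 69 mod 100 = 69
        System.out.println(hash("message"));
    }
}
```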
There is a lot that goes into constructing good hash functions. There are many
places to learn more about this if you are interested. For simplicity, here we
will just focus on sets with integer data.
public class HSet {
    private final int A = 7919;   // a large prime
    private final int B = 17393;
    private List<List<Integer>> buckets = new ArrayList<List<Integer>>();

    public HSet() {
        for (int i = 0; i < B; i++)
            buckets.add(new ArrayList<Integer>());
    }

    private int hash(int x) {
        return Math.abs(A * x) % B;
    }
}

We have made the two primes constants that are easily changeable. The hash
table is a list of buckets, and those buckets are themselves lists of integers, so
we declare that bucket list as a List<List<Integer>>. Each of those buckets
needs to start out as an empty list, and that is what the constructor does: it
fills up the bucket list with empty lists.
Adding an item
To add an item, we compute its hash value to find the right bucket and then
we add the item to that bucket, provided it's not already there. The code for
this is pretty short:
public void add(int x) {
    int bucket = hash(x);
    if (!buckets.get(bucket).contains(x))
        buckets.get(bucket).add(x);
}
Removing an item
The code for removing an item is nearly the same as for adding an item. See
below:

public void remove(int x) {
    int bucket = hash(x);
    if (buckets.get(bucket).contains(x))
        buckets.get(bucket).remove((Integer) x);
}
The reason for casting x as Integer is that if we call the remove method with an
int argument, it will treat that integer as an index. We want the integer
treated as an actual item in the list, and casting it as an Integer object is a
sneaky trick that does that.
Checking to see if an item is in the set

To check to see if an element is in the set, we compute its hash to find its
bucket and then look for it in that bucket. It is quick to do:

public boolean contains(int x) {
    int bucket = hash(x);
    return buckets.get(bucket).contains(x);
}
A toString method

To display all the elements in the set, we can loop through the list of buckets,
like below:

@Override
public String toString() {
    String s = "{";
    for (List<Integer> bucket : buckets)
        for (int x : bucket)
            s += x + ", ";
    if (s.equals("{"))
        return "{}";
    else
        return s.substring(0, s.length()-2) + "}";
}
of elements in the set. As mentioned, if N is large and M is small (a high load
factor), then things could degenerate to O(n). If things are reversed, with a lot
of buckets but a small set, then there will be few collisions but lots of wasted
space, and iterating through the set will be slowed. Generally, a load factor
between about .6 and .8 is considered good.
There is a common alternative to chaining called probing or open addressing.
Instead of using lists of elements with the same hash, (linear) probing does
the following: Calculate the item's hash. If there is already an element at that
hash index, then check the next index in the bucket list. If it is empty, then
store the item there, and otherwise keep going until a suitable index is found.
There are pros and cons to probing that we won't get into here. More
information can be found in other books on data structures or on the web.
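Here is a minimal sketch of linear-probing insertion, assuming a small fixed-size table, a simple mod hash, and no handling for a full table or for resizing:

```java
public class ProbingDemo {
    static Integer[] table = new Integer[7];  // small table for illustration; null = empty

    static void add(int x) {
        int i = x % table.length;   // simple mod hash for illustration
        while (table[i] != null)    // slot taken: check the next index, wrapping around
            i = (i + 1) % table.length;
        table[i] = x;
    }

    public static void main(String[] args) {
        add(3);   // hashes to index 3
        add(10);  // also hashes to 3, probes to 4
        add(17);  // also hashes to 3, probes to 5
        for (int i = 0; i < table.length; i++)
            System.out.println(i + ": " + table[i]);
    }
}
```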
Hash functions are used extensively in cryptography. Uses include storing
passwords, creating message authentication codes, and testing files for
changes. Hash functions for these purposes are much more sophisticated than
the ones we have considered here. In particular, they need to be designed so
that the chance of a collision in a real-world scenario is astronomically small.
The Set interface includes add, remove, and contains methods, among others.
We can iterate through the items of a set with a foreach loop. Here is a loop
through a set of integers:
for (int x : set)
System.out.println(x);
Running times of
algorithms +
8.5. Applications
Lists +
Using LinkedHashSet makes sure that the elements stay in the same order.

Sets give a quick way to tell if a list contains any duplicates. For instance, the following tells us if an integer list called list contains duplicates:
if (new HashSet<Integer>(list).size() == list.size())
The code above creates a set from the list, which has the effect of removing
repeats, and it checks to see if the set has the same size as the original list; if
so, then there must be no repeats.
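Wrapped up as a runnable sketch (the method name allDistinct is our own):

```java
import java.util.*;

public class DuplicateDemo {
    // Returns true if the list contains no repeated elements.
    public static boolean allDistinct(List<Integer> list) {
        // Building a set discards repeats; sizes match only if there were none.
        return new HashSet<Integer>(list).size() == list.size();
    }

    public static void main(String[] args) {
        System.out.println(allDistinct(Arrays.asList(1, 2, 3)));    // true
        System.out.println(allDistinct(Arrays.asList(1, 2, 2, 3))); // false
    }
}
```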
Sets are also useful for storing a collection of elements where we don't want
repeats. For example, suppose we are running a search where as we find
certain things, we add them to a list of things to be checked later. We don't
want repeats in the list as that will waste space and time, so in place of the
list, we could use a set.
One of the best applications of sets is for storing values where you repeatedly
have to check to see if a value is in the set. Suppose we want to find all
English words that are also words when written backwards. For instance, when
the word bat is written backwards, we get tab, which is also an English word.
To code this, it helps to have a method to reverse a string. Here is code for
that:
String t = "";
t += s.charAt(i);
Running times of
return t;
algorithms + }
Next, assuming we have a file called wordlist.txt that is a list of common English words, one per line, here is code that stores those words in a set and then finds all the words that are words backwards:

Set<String> words = new HashSet<String>();
Scanner scanner = new Scanner(new File("wordlist.txt"));
while (scanner.hasNext())
    words.add(scanner.nextLine());

for (String x : words)
    if (words.contains(reverse(x)))
        System.out.println(x);
It's interesting to try using a list of words instead of a set of words. When I tested this using the set approach on my laptop, it would typically find all of the words in about 175 milliseconds. When I replaced the set with a list, it took about 2 minutes (120,000 milliseconds). That is a difference of about 700 times, a huge speedup for changing just a few characters of code. Note that the list contains method uses a linear search. When I replaced that with a call to Collections.binarySearch, things were much faster, averaging about 225 milliseconds, but still slower than the set approach.
This demonstrates the difference between O(1) (sets), O(n) (lists with a linear
search), and O(log n) (lists with a binary search).
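To make the three lookup styles concrete, here is a small sketch contrasting them on a toy sorted list (not the word list from the experiment above):

```java
import java.util.*;

public class SearchDemo {
    public static void main(String[] args) {
        List<String> words = new ArrayList<String>(Arrays.asList("ant", "bat", "cow", "dog"));

        // Linear search: O(n), scans the list front to back.
        System.out.println(words.contains("cow"));                      // true

        // Binary search: O(log n), but requires a sorted list; returns the index.
        System.out.println(Collections.binarySearch(words, "cow"));     // 2

        // Hash set: O(1) on average.
        System.out.println(new HashSet<String>(words).contains("cow")); // true
    }
}
```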
Union

The union of sets A and B, written A∪B, is the set containing all the elements that are in A or B or both. For instance, the union of {1,2,3} and {3,4,5} is {1,2,3,4,5}.

Java sets have an addAll method that can be used to implement the union operation. That method adds everything from one set into another. For instance, set1.addAll(set2) will add everything from set2 into set1. If we want to store the union in a separate set, we can do the following:

Set<Integer> union = new HashSet<Integer>(set1);
union.addAll(set2);
Intersection

The intersection of sets A and B, written A∩B, is all the elements that are in A and B both. It is what they have in common. For instance, the intersection of {1,2,3} and {3,4,5} is {3}. Java's retainAll method can be used to implement the intersection, as below:

Set<Integer> intersection = new HashSet<Integer>(set1);
intersection.retainAll(set2);
Difference

The difference of sets A and B, often written as A-B, consists of all the elements that are in A but not B. For instance, {1,2,3}-{3,4,5} is {1,2}. We can use Java's removeAll method for differences:

Set<Integer> difference = new HashSet<Integer>(set1);
difference.removeAll(set2);
Subset

To check whether set2 is a subset of set1, meaning every element of set2 is also in set1, we can use Java's containsAll method:

if (set1.containsAll(set2))
    System.out.println("set2 is a subset of set1");
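Here is a runnable sketch putting the four operations together on the example sets {1,2,3} and {3,4,5} used above:

```java
import java.util.*;

public class SetOpsDemo {
    public static void main(String[] args) {
        Set<Integer> set1 = new HashSet<Integer>(Arrays.asList(1, 2, 3));
        Set<Integer> set2 = new HashSet<Integer>(Arrays.asList(3, 4, 5));

        Set<Integer> union = new HashSet<Integer>(set1);
        union.addAll(set2);              // {1,2,3,4,5}

        Set<Integer> intersection = new HashSet<Integer>(set1);
        intersection.retainAll(set2);    // {3}

        Set<Integer> difference = new HashSet<Integer>(set1);
        difference.removeAll(set2);      // {1,2}

        System.out.println(union + " " + intersection + " " + difference);
        System.out.println(set1.containsAll(Arrays.asList(1, 2))); // true: {1,2} is a subset of set1
    }
}
```

Note that addAll, retainAll, and removeAll modify the set they are called on, which is why each result starts from a fresh copy.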
Chapter 9. Maps
9.1. Introduction
To motivate the idea of a map, consider this list of the number of days in the
first three months of the year: [31, 28, 31]. We could write the contents of
the list like below, with the indices shown on the left:
0 : 31
1 : 28
2 : 31

It would be nice if we could replace those indices 0 through 2 with something more meaningful. Maps allow us to do that, like below:
"Feb" : 28
Recursion + "Mar" : 31
Binary Trees + The strings that are taking the place of indices are called keys and the days in
each month are the values. Together, a key and its value form a key-value pair
Binary Search Trees + (or binding).
Heaps + Basically, a map is like a list where the indices can be things other than
integers. Most often those indices (keys) are strings, but they can be other
Sets and Hashing + things like characters, numbers, and objects.
Maps + Maps are known by several other names, including dictionaries, associative
arrays, and symbol tables. Maps can be implemented with hashing in a similar
Graphs + way to how sets are implemented. This means map operations like getting and
setting elements are typically O(1) operations.
Sorting +
months.put("Jan", 31);
months.put("Feb", 28);
months.put("Mar", 31);
System.out.println(months.get("Jan"));
Note that the method for adding things is named put and not set.
Just like with sets, there are three types of maps: HashMap, LinkedHashMap, and
TreeMap. Of these, HashMap is the fastest, though it may scramble the order of
the keys. LinkedHashMap is almost as fast, and it keeps keys in the order they
are added. TreeMap is implemented with a BST and stores the keys in order. It
is not as fast as the other two, but it is still pretty fast. Here are some of the
more useful methods for maps:
Method            Description
get(k)            returns the value associated with the key k
put(k,v)          adds the key-value pair (k,v) into the map
containsKey(k)    returns whether k is a key in the map
containsValue(v)  returns whether v is a value in the map
keySet()          returns the keys as a set
To loop over a map, use code like below (assuming the keys are strings):

for (String key : map.keySet())
    System.out.println(key + " " + map.get(key));
Counting points in Scrabble
Scrabble is a word game where each letter is associated with a point value. The score of a word is the sum of the values of its letters. For instance, the values of the first five letters are A=1, B=3, C=3, D=2, and E=1. The word ACE has a total score of 1+3+1=5. A natural way to store the letter scores is with a map whose keys are the letters and whose values are the point values. Here is some code that builds part of that map and scores a word.

Map<Character, Integer> map = new HashMap<Character, Integer>();
map.put('A', 1);
map.put('B', 3);
map.put('C', 3);

String word = "CAB";
int total = 0;
for (char c : word.toCharArray())
    total += map.get(c);
Note that it is kind of tedious to have to enter 26 put statements to create the
map. There is currently (as of Java 11) no easy way around this in Java
syntax, but one approach would be to create a text file or string that looks like
below and use some code to read it and fill in the map.
A 1
B 3
C 3
...
Assuming this info is in a file called scrabble.txt, here is how to read it into a map:

Map<Character, Integer> map = new HashMap<Character, Integer>();
Scanner scanner = new Scanner(new File("scrabble.txt"));
while (scanner.hasNext()) {
    String[] stuff = scanner.nextLine().split(" ");
    map.put(stuff[0].charAt(0), Integer.parseInt(stuff[1]));
}
Substitution cipher
A substitution cipher is a simple encryption technique where we encrypt a message by replacing each letter by a different letter. For instance, maybe A, B, and C are replaced with S, A, and P. Then the message CAB would be encrypted to SPA.

To use a map for this, the keys are the letters of the alphabet and the values are what those letters are replaced with. For example, here is how we might create such a map:

String alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
String key = "..."; // a scrambled 26-letter version of the alphabet
Map<Character, Character> map = new HashMap<Character, Character>();
for (int i = 0; i < alpha.length(); i++)
    map.put(alpha.charAt(i), key.charAt(i));
Then to encrypt a message, we can do the following (assuming the message only consists of letters):

String encrypted = "";
for (char c : message.toCharArray())
    encrypted += map.get(c);
For decryption, we could use another map with the keys and values reversed
from the map above.
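One way to build that reversed map is to loop over the entries of the forward map, as in this sketch (the method name invert is our own, and it assumes no two letters map to the same letter):

```java
import java.util.*;

public class InvertDemo {
    // Build a map with keys and values swapped (assumes the values are unique).
    public static Map<Character, Character> invert(Map<Character, Character> map) {
        Map<Character, Character> inverse = new HashMap<Character, Character>();
        for (Map.Entry<Character, Character> e : map.entrySet())
            inverse.put(e.getValue(), e.getKey());
        return inverse;
    }

    public static void main(String[] args) {
        Map<Character, Character> map = new HashMap<Character, Character>();
        map.put('A', 'S');
        map.put('B', 'A');
        map.put('C', 'P');
        // The inverse sends S back to A, A back to B, and P back to C.
        System.out.println(invert(map));
    }
}
```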
Roman numerals
A nice place where a map can be used is to convert from Roman numerals to
ordinary numbers. The keys are the Roman numerals and the values are their
numerical values. We would like to then compute the numerical value by
looping through the string and counting like below:
int total = 0;
for (char c : s.toCharArray())
    total += romanMap.get(c);
The problem with this is it doesn't work with things like IV, which is 4. The
trick we use to take care of this is to replace each occurrence of IV with some
unused character, like 'a', and add an entry to the map for it. The code is below:
Map<Character, Integer> romanMap = new HashMap<Character, Integer>();
romanMap.put('M', 1000);
romanMap.put('D', 500);
romanMap.put('C', 100);
romanMap.put('L', 50);
romanMap.put('X', 10);
romanMap.put('V', 5);
romanMap.put('I', 1);
romanMap.put('a', 900); // CM
romanMap.put('b', 400); // CD
romanMap.put('e', 9);   // IX

s = s.replace("CM", "a").replace("CD", "b").replace("IX", "e");

int total = 0;
for (char c : s.toCharArray())
    total += romanMap.get(c);
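Here is the whole idea as a runnable sketch. The placeholder letters for the two-character numerals beyond the ones shown above (c, d, and f) are our own arbitrary choices:

```java
import java.util.*;

public class RomanDemo {
    public static int romanToInt(String s) {
        Map<Character, Integer> romanMap = new HashMap<Character, Integer>();
        romanMap.put('M', 1000); romanMap.put('D', 500);
        romanMap.put('C', 100);  romanMap.put('L', 50);
        romanMap.put('X', 10);   romanMap.put('V', 5);
        romanMap.put('I', 1);
        // Placeholder letters for the subtractive pairs.
        romanMap.put('a', 900);  // CM
        romanMap.put('b', 400);  // CD
        romanMap.put('c', 90);   // XC
        romanMap.put('d', 40);   // XL
        romanMap.put('e', 9);    // IX
        romanMap.put('f', 4);    // IV

        // Replace each two-character numeral with its placeholder letter.
        s = s.replace("CM", "a").replace("CD", "b").replace("XC", "c")
             .replace("XL", "d").replace("IX", "e").replace("IV", "f");

        int total = 0;
        for (char ch : s.toCharArray())
            total += romanMap.get(ch);
        return total;
    }

    public static void main(String[] args) {
        System.out.println(romanToInt("XIV"));     // 14
        System.out.println(romanToInt("MCMXCIV")); // 1994
    }
}
```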
Baseball
Suppose we have a text file called baseball.txt that contains stats for all the players in the 2014 MLB season. A typical line of the file looks like this:

Abreu, B NYM 12 33 1 14 .248
The entries in each line are separated by tabs. Home runs are the second-to-
last entry. Suppose we want to know which team hit the most total home runs.
We can do this by creating a map whose keys are the team names and whose
values are the total home runs that team has hit. To find the total number of
home runs, we loop through the file, continually adding to the appropriate map
entry, as shown below:
Scanner textFile = new Scanner(new File("baseball.txt"));
Map<String, Integer> map = new HashMap<String, Integer>();
while (textFile.hasNext()) {
    String[] fields = textFile.nextLine().split("\t");
    String team = fields[1];
    int hr = Integer.parseInt(fields[fields.length-2]); // second-to-last entry
    if (map.containsKey(team))
        map.put(team, map.get(team) + hr);
    else
        map.put(team, hr);
}
The map acts like a list of count variables, one for each team. Each time we
encounter a new team, we create a new map entry for that team and initialize
it to the number of home runs in the line we're looking at. If the map entry
already exists, then we add the number of homeruns to the current map
value.
Then we can loop through the map we created to find the maximum (Baltimore, with 224 home runs):

int best = 0;
String bestTeam = "";
for (String team : map.keySet()) {
    if (map.get(team) > best) {
        best = map.get(team);
        bestTeam = team;
    }
}
Word frequencies

Suppose we have read the entire contents of a document into a string called text and we want to count how many times each word appears. We first clean up the string:

text = text.toLowerCase();
text = text.replaceAll("\\s+", " ");
text = text.replaceAll("[^a-z ]", "");
The first call to replaceAll uses regular expressions to replace any group of
spaces (including tabs and newlines) with a single space. The second call
removes all characters that are not letters or spaces. This is not ideal, as it
removes hyphens from hyphenated words as well as apostrophes from
contractions, but for demonstration purposes, it's good enough.
Next, we build the map of word frequencies. That map is essentially a whole
collection of count variables, one for each word in the document. The first time
we see a word, we create a new map entry for it with a value (count) of 1, and
otherwise we read the previous count and add 1 to it.
if (!wordCounts.containsKey(word))
wordCounts.put(word, 1);
else
wordCounts.put(word, wordCounts.get(word)+1);
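The same counting pattern, packaged as a runnable sketch on a small string (the method name countWords is our own):

```java
import java.util.*;

public class WordCountDemo {
    // Count how many times each space-separated word appears.
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> wordCounts = new HashMap<String, Integer>();
        for (String word : text.split(" ")) {
            if (!wordCounts.containsKey(word))
                wordCounts.put(word, 1);                      // first sighting
            else
                wordCounts.put(word, wordCounts.get(word)+1); // seen before
        }
        return wordCounts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("the cat and the hat");
        System.out.println(counts.get("the")); // 2
        System.out.println(counts.get("cat")); // 1
    }
}
```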
Finally, here is some code that lists the top 100 most frequent words. The code is a little tricky. It uses the entrySet method that returns an object containing all the key-value pairs. We put that object into a list and then sort that list by a special criterion, namely by the value in the key-value pair.

List<Entry<String, Integer>> list = new ArrayList<Entry<String, Integer>>(wordCounts.entrySet());
list.sort(Entry.comparingByValue());
Collections.reverse(list);
for (int i = 0; i < 100; i++)
    System.out.println(list.get(i).getKey() + " " + list.get(i).getValue());
Chapter 10. Graphs

10.1. Introduction

We have seen one type of graph already—trees. A lot of real-life problems can be modeled with graphs.
Vocabulary
To illustrate these definitions, consider the graph below. There are 5 vertices
and 7 edges. The edges involving vertex a are ab, ac, and ad. So we say that
a is adjacent to b, c, and d and that those vertices are neighbors of a.
10.2. Graph data structures

There are two types of data structures that are often used to implement graphs: adjacency lists and adjacency matrices.

Adjacency lists

For each vertex of the graph we keep a list of the vertices it is adjacent to. Here is a graph and its adjacency lists:
We can use a map to represent the adjacency lists, where the keys are the vertices and the values are lists of neighbors of the corresponding vertex.
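For instance, a triangle graph on vertices a, b, and c could be stored with plain library types like this (a quick sketch, before we build a real class in the next section):

```java
import java.util.*;

public class AdjListDemo {
    public static void main(String[] args) {
        // Keys are vertices; values are lists of that vertex's neighbors.
        Map<String, List<String>> neighbors = new HashMap<String, List<String>>();
        neighbors.put("a", new ArrayList<String>(Arrays.asList("b", "c")));
        neighbors.put("b", new ArrayList<String>(Arrays.asList("a", "c")));
        neighbors.put("c", new ArrayList<String>(Arrays.asList("a", "b")));

        // The neighbors of a vertex are just a map lookup away.
        System.out.println(neighbors.get("a")); // [b, c]
    }
}
```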
Adjacency matrices
The idea of adjacency matrices is that we use a matrix to keep track of which
vertices are adjacent to which other vertices. An entry of 1 in the matrix
indicates two vertices are adjacent and a 0 indicates they aren't adjacent. For
instance, here is a graph and its adjacency matrix:
Adjacency lists use a lot less space than adjacency matrices if vertices tend to
not have too many neighbors, which is the case for many graphs that arise in
practice. For this reason (and others) adjacency lists are used more often than
adjacency matrices when representing graphs on a computer.
10.3. Implementing a graph class

Java does not have a built-in graph class, but we can implement our own pretty quickly. We will use an adjacency list approach. Here is the entire class:
public class Graph<T> implements Iterable<T> {
    protected Map<T, List<T>> neighbors = new HashMap<T, List<T>>();

    public void add(T v) {
        if (!neighbors.containsKey(v))
            neighbors.put(v, new ArrayList<T>());
    }

    public void addEdge(T u, T v) {
        add(u);
        add(v);
        neighbors.get(u).add(v);
        neighbors.get(v).add(u);
    }

    public List<T> getNeighbors(T v) {
        return neighbors.get(v);
    }

    @Override
    public Iterator<T> iterator() {
        return neighbors.keySet().iterator();
    }
}
The adjacency list approach uses a map whose keys are the vertices. The
value corresponding to each key is a list of vertices adjacent to that key
vertex. We have made that map protected instead of private because we will
later be inheriting from this class to build another type of graph class.
We have used generics here so that the data type of the vertices can be
anything. Most of the time we will use strings.
Adding a new vertex is as simple as creating a new map entry for that vertex
and setting its adjacency list to an empty list. Note that we check to see if a
vertex has already been added. It's not strictly necessary, but it turns out to
be helpful when building graphs because if we accidentally add a vertex to a
graph twice, we don't want to wipe out all the adjacencies for that vertex.
To add an edge between vertices u and v, we add each vertex to the other's
adjacency list.
It is convenient to be able to loop over the vertices of a graph with a foreach loop, like below:

for (String v : graph)
    // do something with v

In order to be able to loop like this in Java, we have to have our class implement the Iterable interface, which in turn requires that our class have an iterator method. Rather than write our own, we use the fact that Java sets have their own iterator method.
Digraphs

For some problems, it is useful to have edges that go in only one direction instead of both ways. This is sort of like having one-way streets instead of two-way streets. A graph where the edges are directional is called a directed graph or digraph. An example is shown below:

Implementing a digraph class is pretty similar to implementing an ordinary graph class. The only difference is in the addEdge method. When adding an edge directed from u to v, we add v to the adjacency list for u, but we don't do the reverse. This is what makes the edge directional. Namely, v is a neighbor of u, but u is not a neighbor of v. Since only the addEdge method will change, we can create a class for digraphs by inheriting from the Graph class and overriding the addEdge method, as shown below:
public class Digraph<T> extends Graph<T> {
    @Override
    public void addEdge(T u, T v) {
        add(u);
        add(v);
        neighbors.get(u).add(v);
    }
}
10.4. Searching
Graphs can be used for a number of things. Here we will use graphs to solve a
few types of puzzles, looking for a path in the graph from a starting vertex to a
solution vertex. There are two algorithms we will look at for finding these
paths: breadth-first search and depth-first search.
The basic idea of each of these is we start somewhere in the graph, then visit
that vertex's neighbors, then their neighbors, etc., all the while keeping track
of vertices we've already visited to avoid getting stuck in an infinite loop. The
figure below shows the order in which BFS and DFS visit the vertices of a
graph, starting at vertex a.
The code
Below is some code implementing BFS. It takes a graph and a starting vertex
as parameters and returns a set of all the vertices able to be reached from the
starting vertex by following paths of vertices and edges.
public static <T> Set<T> bfs(Graph<T> graph, T start) {
    Set<T> found = new HashSet<T>();
    LinkedList<T> waiting = new LinkedList<T>();
    found.add(start);
    waiting.add(start);
    while (!waiting.isEmpty()) {
        T v = waiting.remove();
        for (T u : graph.getNeighbors(v)) {
            if (!found.contains(u)) {
                found.add(u);
                waiting.add(u);
            }
        }
    }
    return found;
}
We maintain a queue called waiting that contains the vertices in line to be explored, and we maintain a set that contains all the vertices we have found. Initially, both start out with only the start vertex in them. We then continually do the following: remove a vertex from the waiting list and loop through its neighbors, adding any neighbor we haven't already found into both the found set and waiting list. The while loop keeps running as long as the waiting list is not empty, i.e. as long as there are still vertices we haven't explored.

A remarkable fact is that the code for DFS is almost exactly the same, except that instead of using a waiting queue, we use a waiting stack. The only thing we need to change is to replace waiting.remove with waiting.removeLast. In other words, the two algorithms differ only in how they choose the next vertex to visit. DFS visits the most recently added vertex, while BFS visits the earliest added vertex.
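The sketch below makes that one-line difference concrete. It uses a plain map instead of our Graph class, and a boolean flag (our own device) to switch between remove and removeLast; on the small square graph a-b, a-c, b-d, c-d it produces two different visit orders:

```java
import java.util.*;

public class SearchOrderDemo {
    // Runs BFS if useQueue is true, DFS otherwise; returns the visit order.
    public static List<String> search(Map<String, List<String>> g, String start, boolean useQueue) {
        List<String> order = new ArrayList<String>();
        Set<String> found = new HashSet<String>();
        LinkedList<String> waiting = new LinkedList<String>();
        found.add(start);
        waiting.add(start);
        while (!waiting.isEmpty()) {
            // The only difference between BFS and DFS is this line.
            String v = useQueue ? waiting.remove() : waiting.removeLast();
            order.add(v);
            for (String u : g.get(v)) {
                if (!found.contains(u)) {
                    found.add(u);
                    waiting.add(u);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = new HashMap<String, List<String>>();
        g.put("a", Arrays.asList("b", "c"));
        g.put("b", Arrays.asList("a", "d"));
        g.put("c", Arrays.asList("a", "d"));
        g.put("d", Arrays.asList("b", "c"));
        System.out.println(search(g, "a", true));  // BFS: [a, b, c, d]
        System.out.println(search(g, "a", false)); // DFS: [a, c, d, b]
    }
}
```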
Let's run BFS on the graph below, starting at vertex a. The first thing the algorithm does is add each of the neighbors of a to the waiting queue. So at this point, the queue is [b,c,d,e]. BFS always visits the first vertex in the queue, so it visits b next and adds its neighbors to the list and set. So now the queue is [c,d,e,f,g]. For the next step, we take the first thing off the queue, which is c, and visit it. We add its neighbors h and i to the list and set. This gives us a queue of [d,e,f,g,h,i]. Note also that each time we find a new vertex, we add it to the found set. As the process continues, the queue will start shrinking and the found set will eventually include all the vertices.
Now let's look at the first few steps of DFS on the same graph, starting at
vertex a. It starts by adding the neighbors of a to the waiting stack and found.
After the first step, the stack is [b,c,d,e]. DFS then visits the last thing on the
stack, e, as opposed to BFS, which visits the first. DFS adds the neighbors of
e, which are j, k, and l, to the stack, which is now [b,c,d,j,k,l]. It visits the last
vertex in the stack, l, and adds its neighbors m and n to the list. This gives a
stack of [b,c,d,j,k,m]. We then visit vertex m. It has only one neighbor, k,
which has already been found, so we do nothing with that neighbor. This is
how we avoid getting stuck in an infinite loop because of the cycle. The stack
at this point is now [b,c,d,j,k] and the found set is {a,b,c,d,e,j,k,l,m}. We visit
k next. It has a neighbor e, but we don't add e to the stack because e is in the
visited set already. The waiting stack is now [b,c,d,j]. The next vertex visited
is j. Notice how the algorithm has essentially backtracked to e and is now
searching in a different direction from e. The algorithm continues for several more steps from here, but for brevity, we won't include them.
Shortest paths

The searches above find all the vertices that can be reached from a given starting vertex. This is nice, but it is often more useful to be able to find a path between two given vertices. We can take the BFS code above and modify it slightly to find such a path, and because of the way BFS fans out, the path found will be as short as possible. Here is the code:
public static <T> List<T> bfsPath(Graph<T> graph, T start, T end) {
    Map<T, T> parent = new HashMap<T, T>();
    LinkedList<T> waiting = new LinkedList<T>();
    waiting.add(start);
    parent.put(start, null);
    while (!waiting.isEmpty()) {
        T v = waiting.remove();
        for (T u : graph.getNeighbors(v)) {
            if (!parent.containsKey(u)) {
                waiting.add(u);
                parent.put(u, v);
                if (u.equals(end)) {
                    List<T> path = new ArrayList<T>();
                    while (u != null) {
                        path.add(u);
                        u = parent.get(u);
                    }
                    Collections.reverse(path);
                    return path;
                }
            }
        }
    }
    return null; // no path from start to end
}
What is different here from the BFS code is that instead of a found set, we use
a map called parent. Each time we find a new vertex, we record which vertex
we reached it from. Once we find the goal vertex, we use that parent map to
work our way backward to the start vertex.
Word ladders

A word ladder is a type of puzzle where you are given two words and you have to get from one to another by changing a single letter at a time, with each intermediate word being a real word. For example, to get from time to rise, we could do time → tide → ride → rise.

We can model this as a graph whose vertices are words, with edges whenever two words differ by a single letter. Solving a word ladder amounts to finding a path in the graph between two vertices. The first thing we will need is a method to determine if two words are one letter away from each other:
public static boolean oneAway(String s, String t) {
    if (s.length() != t.length())
        return false;
    int total = 0;
    for (int i = 0; i < s.length(); i++)
        if (s.charAt(i) == t.charAt(i))
            total++;
    return total == s.length()-1;
}
Now we create the graph. Using a wordlist, we read in all the 4-letter words from the list. Each of those becomes a vertex of the graph. We then use a nested loop to compare each word to each other word and add an edge between any pairs that are one letter apart. This is an O(n²) approach that takes a few seconds to put together the entire graph. It's a nice exercise to try to find a faster approach. Once the graph is built, we can call the bfsPath method to find the shortest path between two given words, giving us a word ladder between them. Here is the code:
Scanner scanner = new Scanner(new File("wordlist.txt"));
List<String> words = new ArrayList<String>();
while (scanner.hasNext()) {
    String w = scanner.nextLine();
    if (w.length() == 4)
        words.add(w);
}

Graph<String> graph = new Graph<String>();
for (int i = 0; i < words.size(); i++)
    for (int j = i+1; j < words.size(); j++)
        if (oneAway(words.get(i), words.get(j)))
            graph.addEdge(words.get(i), words.get(j));
Actors and movies

Another nice application is finding connections between movie actors. Suppose we have a file where each line contains a movie followed by the actors in it, with the entries separated by slashes. We build a graph whose vertices are the movies and actors, with an edge between each movie and each actor in that movie. Here is some code that parses through the file and fills up the graph:

Graph<String> graph = new Graph<String>();
Scanner scanner = new Scanner(new File("movies.txt")); // assumed filename
while (scanner.hasNext()) {
    String line = scanner.nextLine();
    String[] a = line.split("/");
    graph.add(a[0]);
    for (int i = 1; i < a.length; i++) {
        graph.add(a[i]);
        graph.addEdge(a[0], a[i]);
    }
}
We can then call bfsPath to find the shortest path between any pair of actors. It turns out that there is almost always a path between any two actors that you've ever heard of, and that path is usually pretty short, rarely needing more than a couple of intermediate movies.
A water pouring puzzle

Suppose we have a 7-gallon pail and an 11-gallon pail, and we want to measure out exactly 6 gallons. We are allowed to completely fill a pail, completely empty a pail, or pour one pail into the other until the source is empty or the destination is full.

For example, maybe we could start by filling up the 7-gallon pail, then dumping it into the 11-gallon pail. We could then fill up the 7-gallon pail and then dump it into the 11-gallon again, leaving 3 gallons in the 7-gallon pail. Maybe then we could dump the 11-gallon pail out completely. We could keep going doing operations like this until we have 6 gallons in one of the pails.

We can represent this problem with a digraph. The vertices are the possible states, represented as pairs (x,y), where x and y are the amount of water in the 7- and 11-gallon pails, respectively. There is an edge between two vertices if it is possible to go from one of their corresponding states to the other, following the rules of the problem. For example, there is an edge between (3,0) and (0,0) because starting at the (3,0) state, we can dump out the first pail to get to the (0,0) state. However, there is not an edge going in the opposite direction because there is no way to go from (0,0) to (3,0) in one step following the rules.
To create the graph, we first make a class to represent the states. The class has two integer fields, x and y. We have a simple toString method for the class. We override the equals method because we will need to test states for equality, and we override hashCode because this class is used for vertices, which are themselves keys in a hash map. We have used an IDE to generate these methods. Though the code for this class is long, it is very simple and useful in other places.
private static class Pair {
    private int x;
    private int y;

    public Pair(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public String toString() {
        return "(" + x + "," + y + ")";
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + x;
        result = prime * result + y;
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        Pair other = (Pair) obj;
        if (x != other.x)
            return false;
        if (y != other.y)
            return false;
        return true;
    }
}
Binary Trees + Below is the code that creates the digraph. An explanation follows.
int M = 7, N = 11;
Heaps +
Maps +
for (Pair p : G) {
int y = p.y;
Sorting +
if (x != 0 && y != N) {
if (x + y > N)
else
if (y != 0 && x != M) {
if (x + y > M)
else
The code starts by adding vertices for all the states (0,0), (0,1) up through (M,N). Then for every one of those states we add edges. The first two edges added correspond to emptying out one of the pails. The next two correspond to completely filling up one of the pails.

After that, we have two more edges where we pour one pail into the other. We have to be careful here as doing this will sometimes completely empty one pail and sometimes not, and there is a little math involved to figure out exactly how much will end up in each pail in both cases. We also have to avoid trying to pour into an already full pail or trying to pour from an empty pail.
Shortest paths
An important type of graph is a weighted graph, where the edges have
numerical values called weights attached to them. Suppose we have a graph
where the vertices are locations and the edges are roads connecting them.
Weights could be the costs of using those roads, in terms of time, distance, or
money. We would then be interested in the shortest or cheapest path between
two vertices. The shortest path algorithm is a relative of the BFS algorithm we
wrote earlier. This, or something like it, is used by GPS systems to find
directions.
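To give a taste of how this works, here is a sketch of Dijkstra's algorithm, the standard weighted relative of BFS, on a tiny hard-coded graph. The array-based representation and all names are our own; the book does not develop this algorithm:

```java
import java.util.*;

public class DijkstraDemo {
    // g[v] holds {neighbor, weight} pairs; returns shortest distances from start.
    public static int[] shortestDistances(List<int[]>[] g, int start) {
        int[] dist = new int[g.length];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[start] = 0;
        // Queue entries are {vertex, distance}, ordered by distance.
        PriorityQueue<int[]> pq = new PriorityQueue<int[]>((a, b) -> a[1] - b[1]);
        pq.add(new int[]{start, 0});
        while (!pq.isEmpty()) {
            int[] cur = pq.remove();
            int v = cur[0];
            if (cur[1] > dist[v])
                continue; // stale entry; a shorter path to v was already found
            for (int[] edge : g[v]) {
                int u = edge[0], w = edge[1];
                if (dist[v] + w < dist[u]) {
                    dist[u] = dist[v] + w;
                    pq.add(new int[]{u, dist[u]});
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        // Vertices 0,1,2 with edges 0-1 (weight 1), 1-2 (weight 1), 0-2 (weight 5).
        @SuppressWarnings("unchecked")
        List<int[]>[] g = new List[3];
        for (int i = 0; i < 3; i++)
            g[i] = new ArrayList<int[]>();
        g[0].add(new int[]{1, 1}); g[1].add(new int[]{0, 1});
        g[1].add(new int[]{2, 1}); g[2].add(new int[]{1, 1});
        g[0].add(new int[]{2, 5}); g[2].add(new int[]{0, 5});
        System.out.println(Arrays.toString(shortestDistances(g, 0))); // [0, 1, 2]
    }
}
```

Notice that the cheapest route from vertex 0 to vertex 2 goes through vertex 1 (total weight 2), not along the direct edge of weight 5, which is exactly the kind of decision an unweighted BFS cannot make.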
Spanning trees

A spanning tree of a graph is a subgraph that is a tree and contains all the vertices. Spanning trees are useful in routing packets over a network. Networks can often be complicated and it's possible to get a routing loop, where a packet gets caught in a loop and never gets to its destination. By finding a spanning tree in a network, we get a subgraph that still connects all the vertices but contains no cycles, which will allow us to prevent routing loops.
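One simple way to extract a spanning tree is to run a BFS and keep only the edges along which each vertex is first discovered. The sketch below (plain map representation, all names our own) does this for a triangle graph, which has 3 edges but whose spanning tree keeps only 2:

```java
import java.util.*;

public class SpanningTreeDemo {
    // Returns the tree edges found by a BFS from start, as {parent, child} pairs.
    public static List<String[]> spanningTree(Map<String, List<String>> g, String start) {
        List<String[]> edges = new ArrayList<String[]>();
        Set<String> found = new HashSet<String>();
        Queue<String> waiting = new LinkedList<String>();
        found.add(start);
        waiting.add(start);
        while (!waiting.isEmpty()) {
            String v = waiting.remove();
            for (String u : g.get(v)) {
                if (!found.contains(u)) {
                    found.add(u);
                    waiting.add(u);
                    edges.add(new String[]{v, u}); // the first edge used to reach u
                }
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = new HashMap<String, List<String>>();
        g.put("a", Arrays.asList("b", "c"));
        g.put("b", Arrays.asList("a", "c"));
        g.put("c", Arrays.asList("a", "b"));
        // A spanning tree of a graph with 3 vertices always has exactly 2 edges.
        System.out.println(spanningTree(g, "a").size()); // 2
    }
}
```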
Chapter 11. Sorting

11.1. Introduction

For clarity, the figures demonstrating the sorts will use letters instead of numbers. Our sorts will directly modify the caller's array. We could also leave the array untouched and return a new sorted array, but we've chosen this approach for simplicity.

The first three sorts that we cover—insertion sort, mergesort, and quicksort—are arguably the three most important sorts. The other sorts we cover are nice to know, but not as important as these three.
11.2. Insertion sort

Insertion sort is an O(n²) sorting algorithm. It is slower than mergesort and quicksort for large arrays, but it is simple and faster than those sorts for very small arrays (of size up to maybe a few dozen elements). Mergesort and quicksort require a certain amount of setup work or overhead due to recursion that insertion sort doesn't. Once they get going, however, they will usually outperform insertion sort.

To understand the basic idea, suppose we have a stack of papers we want to put into alphabetical order. We could start by putting the first two papers in order. Then we could put the third paper where it fits in order with the other two. Then we could put the fourth paper where it fits in order with the first three. If we keep doing this, we have what is essentially insertion sort.
To do insertion sort on an array, we loop over the indices of the array running from 1 to the end of the array. For each index, we take the element at that index and loop back through the array towards the front, looking for where the element belongs among the earlier elements. These elements are in sorted order (having been placed in order by the previous steps), so to find the location we run the loop until we find an element that is smaller than the one we are looking at or we fall off the front of the array. The figure below shows it in action.
public static void insertionSort(int[] a) {
    for (int i = 1; i < a.length; i++) {
        int x = a[i];
        int j = i;
        while (j > 0 && a[j-1] > x) {
            a[j] = a[j-1];
            j--;
        }
        a[j] = x;
    }
}
The outer loop goes through each new element of the array, and the inner loop determines where to place that element.

We see the nested loops in the code, which is where the O(n²) running time comes from. If the list is already sorted, the condition on the inner while loop will always be false and insertion sort runs in O(n) time, which is the best any sorting algorithm can manage. In fact, insertion sort can run relatively quickly even on very large arrays if those arrays are nearly sorted. On the other hand, if the list is in reverse order, the while loop condition is always true and we have n(n-1)/2 swaps, which is about as bad as things can get for a sort.
11.3. Mergesort

Mergesort is one of the fastest sorts available. It runs in O(n log n) time for all inputs. The sorting algorithms built into many programming languages (including Java) use mergesort or a variation of it called timsort.

Mergesort works as follows: We break the array into two halves, sort them, and then merge them together. The halves themselves are sorted with mergesort, making this a recursive algorithm. For instance, say we want to sort the string CEIGJBFDHA. We break it up into two halves, CEIGJ and BFDHA. We then sort those halves (using mergesort) to get CEGIJ and ABDFH and merge the sorted halves back together to get ABCDEFGHIJ. See the figure below:
The way the merging process works is we have position markers (counters) for each of the two halves, each initially at the start of its corresponding half. We compare the values at the markers, choose the smaller of the two, and advance the corresponding marker. We repeat the process until we reach the end of one of the halves. After that, we append any remaining items from the other half to the end. The first few steps are shown below:
Here is the code for mergesort:

public static void mergesort(int[] a) {
    if (a.length <= 1)
        return;
    int[] left = Arrays.copyOfRange(a, 0, a.length/2);
    int[] right = Arrays.copyOfRange(a, a.length/2, a.length);
    mergesort(left);
    mergesort(right);
    int[] m = merge(left, right);
    for (int i = 0; i < a.length; i++)
        a[i] = m[i];
}

public static int[] merge(int[] a, int[] b) {
    int[] m = new int[a.length + b.length];
    int x = 0, y = 0, z = 0;
    while (x < a.length && y < b.length) {
        if (a[x] < b[y])
            m[z++] = a[x++];
        else
            m[z++] = b[y++];
    }
    while (x < a.length)
        m[z++] = a[x++];
    while (y < b.length)
        m[z++] = b[y++];
    return m;
}
After the base case, the mergesort method breaks the list into its left and right halves using the Arrays.copyOfRange method. It then recursively calls mergesort on those two halves and calls the merge method to put things back together. At the end we copy the result back into the a variable, overwriting the old array with the new, sorted one.

The merge method first declares a new array that will hold the merged result and it sets up counters x, y that are the position markers mentioned above. The counter z is to keep track of what index we are adding to in the m array. We then loop until one of the counters x and y reaches the end of its array. In that loop, we compare the array values at those indices and put the smaller one into the m array. After the loop, if there is anything left in either a or b, we add all of it to the end of m.
Sets and Hashing + One thing to note is that to keep the code short, we use the ++ operator. When
used like this, the incrementing happens after the assignment. So, for
Maps +
instance, the code on the left is equivalent to the code on the right.
Graphs +
m[z++] = a[x++]; m[z] = a[x];
Sorting + z++;
x++;
Exercises +
Not everyone likes this way of doing things, but some people do and you will
likely see it in the future, especially if you work with C or C++.
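To see the post-increment behavior concretely, here is a tiny standalone example (the values are made up for illustration):

```java
public class PostIncrementDemo {
    public static void main(String[] args) {
        int[] a = {10, 20};
        int[] m = new int[2];
        int x = 0, z = 0;
        // copies a[0] into m[0] first, then increments both counters
        m[z++] = a[x++];
        System.out.println(m[0] + " " + x + " " + z);  // prints "10 1 1"
    }
}
```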
This version of mergesort is a little slow. There are several places that it can
be improved, but one in particular stands out—we waste a lot of time creating
new arrays. Creating and filling a new array is time-consuming, and we create
three new arrays with each recursive call of the function. There is a way
around this that involves doing most of the sorting and merging within the
original array. We will need to create just one other array, b, the same size
as a, as a place to temporarily store some things when merging. The key is
that instead of creating the left and right arrays, we will work completely
within a, using indices called left, mid, right to keep track of where in the
array we are currently sorting. These become parameters to the merge
function itself. The faster code is below:
public static void mergesort(int[] a) {
    int[] b = new int[a.length];
    mergesortHelper(a, b, 0, a.length);
}

public static void mergesortHelper(int[] a, int[] b, int left, int right) {
    if (right-left <= 1)
        return;
    int mid = (left + right) / 2;
    mergesortHelper(a, b, left, mid);
    mergesortHelper(a, b, mid, right);
    merge(a, b, left, mid, right);
}

public static void merge(int[] a, int[] b, int left, int mid, int right) {
    int x=left, y=mid, z=left;
    while (x < mid && y < right) {
        if (a[x] < a[y])
            b[z++] = a[x++];
        else
            b[z++] = a[y++];
    }
    while (x < mid)
        b[z++] = a[x++];
    while (y < right)
        b[z++] = a[y++];
    for (int i=left; i<right; i++)
        a[i] = b[i];
}
If you look closely, you'll see that the structure of this code really is the same
as what we wrote earlier, just reworked to avoid continually creating arrays.

Mergesort runs in O(n log n) time. Unlike many other sorting algorithms,
mergesort is pretty consistent in that it runs in O(n log n) time for all inputs.
There really aren't any inputs that it does really well with or really poorly with.
11.4. Quicksort
Quicksort is one of the fastest sorting algorithms. It is built into a number of
programming languages. Like mergesort, it works by breaking the array into
two subarrays, sorting them and then recombining.
Quicksort first picks an element of the array to serve as a pivot. For simplicity,
we will use the first element. Quicksort then breaks the array into the portion
that consists of all the elements less than or equal to the pivot and the portion
that consists of all the elements greater than the pivot. For instance, if we are
sorting CEIGJBFDHA, then the pivot is C, and the two portions are BA
(everything less than the pivot) and EIGJFDH (everything greater than the
pivot).
We then sort the two portions using quicksort again, making this a recursive
algorithm. Finally, we combine the portions. Because we know that all the
things in the one portion are less than or equal to the pivot and all the things
in the other portion are greater than the pivot, putting things together is
quick. See the figure below:
Here is code for this first version of quicksort:

public static void quicksort(int[] a) {
    if (a.length < 2)
        return;
    int pivot = a[0];
    int[] smaller = new int[a.length-1], larger = new int[a.length-1];
    int d = 0, e = 0;
    for (int i=1; i<a.length; i++) {
        if (a[i] <= pivot)
            smaller[d++] = a[i];
        else
            larger[e++] = a[i];
    }
    smaller = Arrays.copyOf(smaller, d);
    larger = Arrays.copyOf(larger, e);
    quicksort(smaller);
    quicksort(larger);
    int c = 0;
    for (int x : smaller)
        a[c++] = x;
    a[c++] = pivot;
    for (int x : larger)
        a[c++] = x;
}
After the base case, the code creates two arrays to hold the things less than or
equal to the pivot and the things greater than the pivot. We then loop through
the main array to fill up these two arrays. We initially don't know how large to
make the two arrays, so we set their initial sizes to a.length-1 and then use
Arrays.copyOf to resize them to their correct sizes. After this, we call quicksort
recursively and then combine the two parts along with the pivot into the final,
sorted result.
Like our first version of mergesort, this version of quicksort wastes a lot of
time creating new arrays. We can avoid that by partitioning within the original
array.

We use two indices, i and j, with i starting at the left end of the array and j
starting at the right. Both indices move towards the middle of the array. We
first advance i until we meet an element that is greater than or equal to the
pivot. We then advance j until we meet an element that is less than or equal
to the pivot. When this happens, we swap the elements at each index. We
then continue advancing the indices and swapping in the same way, stopping
the process once the i and j indices meet each other. See the figure below:
In the figure above, the pivot is M, the first letter in the array. The figure is
broken down into “seek” phases, where we advance i and j, and swap phases,
where we exchange the values at those indices. The blue highlighted letter on
the left corresponds to the position of i and the green highlighted letter on the
right corresponds to the position of j. Notice at the last step how they cross
paths.
This process partitions the array. We then make recursive calls to quicksort,
using the positions of the indices i and j to indicate where the two subarrays
are located in the main array. In particular, we make a recursive call on the
subarray starting at the left end and ending at the position of i, and we make
a recursive call on the subarray starting at the position of j and ending at the
right end. So in the example above, we would call quicksort on the subarrays
from 0 to 8 and from 8 to 13. Here is the code for our improved quicksort:
public static void quicksort(int[] a) {
    quicksort_helper(a, 0, a.length-1);
}

public static void quicksort_helper(int[] a, int left, int right) {
    if (left >= right)
        return;
    int pivot = a[left];
    int i = left, j = right;
    while (i <= j) {
        while (a[i] < pivot)
            i++;
        while (a[j] > pivot)
            j--;
        if (i <= j) {
            int hold = a[i];
            a[i++] = a[j];
            a[j--] = hold;
        }
    }
    quicksort_helper(a, left, j);
    quicksort_helper(a, i, right);
}
Quicksort generally has an O(n log n) running time, but there are some special
cases where it degenerates to O(n²). One of these would be if the array were
already sorted. In that case the subarray smaller will be empty, while the
subarray larger will have size n-1. When we call quicksort on larger, we will
again have the same problem with the new smaller array being empty and the
new larger array having size n-2. This will continue with the larger array
going down in size by 1 at each step. This will lead to n recursive calls, each
taking O(n) time, so we end up with an O(n²) running time overall. Since real-
life arrays are often sorted or nearly so, this is an important problem. One way
around this problem is to use a different pivot, such as one in the middle of
the array or one that is chosen randomly.
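One simple way to implement the random-pivot idea is to swap a randomly chosen element into the first position before partitioning, so the rest of the partitioning code can keep using the first element as the pivot. The sketch below is ours, not the book's code; the method name choosePivot is illustrative:

```java
import java.util.Random;

public class RandomPivot {
    private static final Random rand = new Random();

    // Before partitioning a[left..right], swap a randomly chosen element
    // into position left so it becomes the pivot. The partitioning code
    // that follows can then stay exactly the same.
    public static void choosePivot(int[] a, int left, int right) {
        int r = left + rand.nextInt(right - left + 1);
        int hold = a[left];
        a[left] = a[r];
        a[r] = hold;
    }
}
```

Calling this at the top of quicksort_helper makes the already-sorted worst case extremely unlikely, since no fixed input can consistently produce bad pivots.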
11.5. Selection sort

Selection sort is one of the simplest sorts to implement, but it is also slow. It
works as follows: Start by considering the first element of the array in relation
to all the other elements. Find the smallest element among the others and if it
is less than the first element, then swap them. Now move on to the second
element of the array. Look at all the elements after the second, find the
smallest element among them, and if it is smaller than the second element,
then swap them. We continue the same process for the third, fourth, etc.
elements in the array, until we reach the end.

This method is so slow that it is not really worth using, except that a variation
of it is extremely quick to implement, consisting of nested loops, a
comparison, and a swap, as shown below:

public static void selectionSort(int[] a) {
    for (int i=0; i<a.length; i++) {
        for (int j=i+1; j<a.length; j++) {
            if (a[i] > a[j]) {
                int hold = a[i];
                a[i] = a[j];
                a[j] = hold;
            }
        }
    }
}
Bubble sort

Bubble sort works by repeatedly comparing adjacent elements and swapping
them whenever they are out of order. We start at the end of the array,
compare the last two elements and swap if needed, then compare the next
pair, and so on, which bubbles the smallest element toward the front. We then
continue the process, where at each step, we start at the end of the array and
compare adjacent elements and continue doing that until we get to the
stopping point. At each step, the stopping point moves one index forward
until we reach the end of the array. The figure below shows the order in which
elements are compared. The arrows indicate when a swap is performed.

We use nested for loops to implement the algorithm. The outer loop keeps
track of the stopping point. It starts at the front of the array and works its way
toward the end. The inner loop always starts at the end of the array and ends
at the stopping point. Inside the loop we compare adjacent elements and swap
if necessary.

public static void bubbleSort(int[] a) {
    for (int i=0; i<a.length; i++) {
        for (int j=a.length-1; j>i; j--) {
            if (a[j-1] > a[j]) {
                int hold = a[j-1];
                a[j-1] = a[j];
                a[j] = hold;
            }
        }
    }
}
Just like selection sort, bubble sort is O(n²), making n(n-1)/2 comparisons
regardless of whether the array is already sorted or completely jumbled.
However, we can actually improve on this. If there are no swaps made on one
of the passes through the array (i.e., one run of the inner loop), then the array
must be sorted and we can stop. This improvement, however, is still not
enough to make bubble sort useful for sorting large arrays. In fact, the
improved version's performance can be worse than that of ordinary bubble
sort for large, well-mixed arrays.
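The no-swap improvement can be sketched with a boolean flag in the outer loop. This is our illustrative version, not the book's code:

```java
public class BubbleSortImproved {
    public static void bubbleSort(int[] a) {
        for (int i=0; i<a.length; i++) {
            boolean swapped = false;  // track whether this pass swapped anything
            for (int j=a.length-1; j>i; j--) {
                if (a[j-1] > a[j]) {
                    int hold = a[j-1];
                    a[j-1] = a[j];
                    a[j] = hold;
                    swapped = true;
                }
            }
            if (!swapped)  // no swaps on this pass, so the array is sorted
                return;
        }
    }
}
```

On an already sorted array this version makes a single pass and stops, while the original always makes all n(n-1)/2 comparisons.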
Shellsort

Shellsort builds on insertion sort. Here is an example of one variation of
shellsort: We start by pulling out every fourth character starting with the first
and using insertion sort to sort just those characters. We then pull out every
fourth character starting with the second character and use insertion sort to
sort those characters. We do the same for every fourth character starting with
the third and then every fourth starting with the fourth. See the figure below:

We then do something similar except we break up the array at a finer scale,
first breaking it up at every second character starting with the first, sorting
those characters, and then breaking it up at every second character starting
with the second and sorting, as shown below:
We run a final insertion sort on the entire array to sort it. This step is pretty
quick because only a few swaps need to be made before the array is sorted.
Now it may seem like we did so many individual sorts that it would have been
more efficient to just have run a single insertion sort in the first place, but that
is not the case. The total number of comparisons and swaps needed will often
be quite a bit less for shellsort than for insertion sort.

In the example above, we used gaps of length 4, 2, and 1. The sequence
(4,2,1) is called a gap sequence. There are a variety of different gap
sequences one could use, the only requirement being that the last gap is 1. An
implementation of shellsort is shown below using a very specific gap sequence.
The code is surprisingly short and very efficient as it essentially works on all
the subarrays for a given gap at once. In the example above, with a gap of 4,
we did four separate steps, but the code below would handle them all together.

public static void shellsort(int[] a) {
    // gap sequence n/2, n/4, ..., 1
    for (int gap=a.length/2; gap>0; gap/=2) {
        for (int i=gap; i<a.length; i++) {
            int hold = a[i];
            int j = i;
            while (j >= gap && a[j-gap] > hold) {
                a[j] = a[j-gap];
                j -= gap;
            }
            a[j] = hold;
        }
    }
}
Heapsort

Heapsort uses a heap to sort an array. Recall that a heap is a data structure
that gives easy access to the smallest item. Using a heap to sort an array is
very easy—we just run through the array, putting the elements one-by-one
onto the heap, and when that's done, we then one-by-one remove the top
element from the heap. Since the top item of a heap is always the smallest
item, things will come out in sorted order.

The PriorityQueue class in Java's Collections Framework is implemented using
a heap. So we can use Java's PriorityQueue to implement heapsort. Here is
the code:

public static void heapsort(int[] a) {
    PriorityQueue<Integer> heap = new PriorityQueue<>();
    for (int i=0; i<a.length; i++)
        heap.add(a[i]);
    for (int i=0; i<a.length; i++)
        a[i] = heap.remove();
}
Heapsort has an O(n log n) running time. When we insert the elements into
the heap we have to loop through the array, so that's where the n comes from.
Adding each element to the heap is an O(log n) operation, so overall adding
the elements of the array into the heap takes O(n log n) time. Looping through
and removing the items also takes O(n log n) time. Heapsort is one of the
faster sorts. It runs in O(n log n) time regardless of whether the data is
already sorted, in reverse order, or whatever.
A close relative of heapsort is treesort, which works in the same way but uses
a BST in place of a heap.
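As a rough sketch of the treesort idea, we can use Java's TreeMap, which is backed by a balanced BST. A map of counts is used so that duplicate values aren't lost. This is our illustration, not the book's implementation:

```java
import java.util.Map;
import java.util.TreeMap;

public class Treesort {
    public static void treesort(int[] a) {
        // Count occurrences of each value; TreeMap keeps keys in sorted order.
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (int x : a)
            counts.merge(x, 1, Integer::sum);

        // Walk the keys in order, writing each one back as many times as it occurred.
        int i = 0;
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            for (int k = 0; k < e.getValue(); k++)
                a[i++] = e.getKey();
        }
    }
}
```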
Counting sort
If we know something about the data in our array, then we can do better than
the O(n log n) algorithms we've seen. For instance, if we know that the array
contains only values in a relatively small range, then we can use an O(n) sort
known as counting sort.
Counting sort works by keeping an array of counts of how many times each
element occurs in the array. We scan through the array and each time we
meet an element, we add 1 to its count. Suppose we know that our arrays will
only contain integers between 0 and 9. If we have the array
[1,2,5,0,1,5,1,3,5], the array of counts would be [1,3,1,1,0,3,0,0,0,0]
because we have 1 zero, 3 ones, 1 two, 1 three, no fours, 3 fives, and no
sixes, sevens, eights, or nines.

We can then use this array of counts to construct the sorted list
[0,1,1,1,2,3,5,5,5] by repeating 0 one time, repeating 1 three times,
repeating 2 one time, etc., starting from 0 and repeating each element
according to its count. Here is code for the counting sort. The max argument
needs to be at least as large as the largest thing in the array.

public static void countingSort(int[] a, int max) {
    int[] counts = new int[max+1];
    for (int x : a)
        counts[x]++;
    int c = 0;
    for (int i=0; i<=max; i++) {
        for (int j=0; j<counts[i]; j++)
            a[c++] = i;
    }
}
This simple sort has an O(n) running time. The only drawback is that it only
works if the array consists of integers between 0 and max. This approach
becomes infeasible if max is very large.
Regarding the big O running times, remember that there are constants
involved. For instance, selection sort and bubble sort are both O(n²)
algorithms, but it may turn out that selection sort's running time is something
like 10n² and bubble sort's is 15n², making selection sort a bit faster. These
values of 10 and 15 are made up, but the point is that there are constants in
the running times that do not appear in the big O notation, and those
constants are important in the overall running time.
A bit more practically, the O(n²) insertion sort beats the O(n log n) mergesort
for small arrays. This could happen if insertion sort's running time looks
something like 0.2n² and mergesort's running time looks like 2n log n. For
values up until around 50, 0.2n² is smaller, but after n=50 the n² term starts
to dominate n log n. So in this scenario, insertion sort is faster than mergesort
up through around n=50.
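This observation is why practical sorts often switch to insertion sort below some cutoff. Here is an illustrative (not benchmarked) hybrid mergesort; the CUTOFF value of 50 is just the made-up figure from the scenario above:

```java
import java.util.Arrays;

public class HybridSort {
    static final int CUTOFF = 50;  // illustrative; real cutoffs are found by timing

    public static void sort(int[] a) {
        if (a.length <= CUTOFF) {
            insertionSort(a);  // quadratic, but faster on small arrays
            return;
        }
        int[] left = Arrays.copyOfRange(a, 0, a.length/2);
        int[] right = Arrays.copyOfRange(a, a.length/2, a.length);
        sort(left);
        sort(right);
        // merge the sorted halves back into a
        int x=0, y=0, z=0;
        while (x < left.length && y < right.length)
            a[z++] = (left[x] <= right[y]) ? left[x++] : right[y++];
        while (x < left.length)
            a[z++] = left[x++];
        while (y < right.length)
            a[z++] = right[y++];
    }

    static void insertionSort(int[] a) {
        for (int i=1; i<a.length; i++) {
            int hold = a[i];
            int j = i;
            while (j > 0 && a[j-1] > hold) {
                a[j] = a[j-1];
                j--;
            }
            a[j] = hold;
        }
    }
}
```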
There is more to the story than just big O. One thing that matters is exactly
how the algorithms are implemented and on what sort of data they will be
used. For instance, working with integer arrays where comparisons are simple
is a lot different than working on arrays of objects where the comparison
operation might be fairly complex. Similarly, swapping integers is quicker than
swapping objects. So in certain situations a sorting algorithm that uses fewer
comparisons or fewer swaps may be better.

Another consideration is memory usage. Most of the O(n log n) algorithms
have memory usage associated with either a data structure, recursive
overhead, or auxiliary arrays used to hold intermediate results.
Also, we need to consider how the algorithms work with memory. For instance,
it matters whether the algorithm reads from memory sequentially or
randomly. This is important, for example, when sorting large files that require
reads from a hard disk drive. Repositioning the read head on a hard disk is
slow, whereas once it is positioned, reading the next elements in sequence is
relatively fast. So, since mergesort does sequential reads and quicksort does
random ones, mergesort may be better when a lot of reading from a hard disk
is required. Some of these same considerations apply to how well the
algorithm makes use of cache memory.
One last practical consideration is that real-life data often has some structure
to it. For instance, timsort, a variation of mergesort, takes advantage of the
fact that in real-life data there are often pockets of data, called runs, that are
sorted or nearly so within the larger array.
There are still further considerations that we won't get into here. In summary,
each of the algorithms has its good and bad points, but of all the algorithms,
quicksort and mergesort are probably the most used.
[†] The results for shellsort vary depending on the gap sequence. For
some gap sequences, the worst-case running time is O(n²), O(n^(4/3)), or
even O(n log² n). The average-case running times for most gap
sequences used in practice are unknown. For already sorted data, the
running time can be O(n log n) or O(n), depending on the gap sequence.

[‡] Depends on the pivot. Choosing the first element as the pivot will
make this O(n²), but choosing the middle element will make it O(n log n).
Notice in particular the difference between the O(n) counting sort; the O(n log
n) heapsort, mergesort, and quicksort; and the O(n²) selection sort, bubble
sort, and insertion sort. The difference is especially clear for n=100,000.
Here are the results sorted by algorithm:

The tests for large arrays were not run on insertion sort, bubble sort, or
selection sort because they would take far too long. In fact, when looking at
the times for these algorithms, we can see the O(n²) growth. Look, for
instance, at the bubble sort times. We see that as n goes from 100 to 1000
(up by a factor of 10), the time goes from .115 to 1.73, an increase of roughly
100 times. From n=1000 to 10000 we see a similar near hundredfold increase
from 1.73 to 157.6. From n=10000 to 100000, it again goes up by a factor of
nearly 100, from 157.6 to 18262.3. This is quadratic growth—a tenfold
increase in n translates to a hundredfold (100 = 10²) increase in running time.
Were we to try running bubble sort with n=1,000,000, we would expect a
running time on the order of about 2,000,000 milliseconds (about 30 minutes).
At n=10,000,000, bubble sort would take about 3000 minutes (about 2 days).
Here is the variation on the selection sort algorithm from Section 11.5,
working on an integer array:

public static void selectionSortVariant(int[] a) {
    for (int i=0; i<a.length; i++) {
        for (int j=i+1; j<a.length; j++) {
            if (a[i] > a[j]) {
                int hold = a[i];
                a[i] = a[j];
                a[j] = hold;
            }
        }
    }
}

Here is a generic version that works on lists of Comparable objects:

public static <T extends Comparable<T>> void selectionSortVariant(List<T> a) {
    for (int i=0; i<a.size(); i++) {
        for (int j=i+1; j<a.size(); j++) {
            if (a.get(i).compareTo(a.get(j)) > 0) {
                T hold = a.get(i);
                a.set(i, a.get(j));
                a.set(j, hold);
            }
        }
    }
}
To sort an array in Java, use Arrays.sort:

int[] a = {3,9,4,1,3,2};
Arrays.sort(a);

To sort a list, one option is Collections.sort:

Collections.addAll(list, 3,9,4,1,3,2);
Collections.sort(list);

In Java 8 and later, lists also have their own sorting method, as shown below.
Passing null tells it to use the elements' natural ordering:

Collections.addAll(list, 3,9,4,1,3,2);
list.sort(null);
Here is another example. Say we have a class called Record that has three
fields—firstName, lastName, and age—and we want to sort a list of records by
age. We can pass a comparator to the sort method:

list.sort((r1, r2) -> r1.age - r2.age);

If we have a getter written for the age, we could also do the following:

list.sort(Comparator.comparing(Record::getAge));

To sort by last name, breaking ties by first name, we can write a comparator
like this:

list.sort((r1, r2) -> {
    if (!r1.lastName.equals(r2.lastName))
        return r1.lastName.compareTo(r2.lastName);
    else
        return r1.firstName.compareTo(r2.firstName);
});
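Java's Comparator interface can also chain sort keys directly with thenComparing. The sketch below assumes Record has getLastName and getFirstName getters (an assumption; the text only mentions a getter for the age):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ComparatorChainDemo {
    // A hypothetical Record class with the fields described in the text,
    // plus getters so that method references can be used (our assumption).
    static class Record {
        String firstName, lastName;
        int age;
        Record(String f, String l, int a) { firstName = f; lastName = l; age = a; }
        String getFirstName() { return firstName; }
        String getLastName() { return lastName; }
    }

    // Sort by last name, breaking ties by first name, using chained keys.
    static void byName(List<Record> list) {
        list.sort(Comparator.comparing(Record::getLastName)
                            .thenComparing(Record::getFirstName));
    }

    public static void main(String[] args) {
        List<Record> list = new ArrayList<>();
        list.add(new Record("Ann", "Smith", 30));
        list.add(new Record("Bob", "Jones", 25));
        list.add(new Record("Amy", "Jones", 40));
        byName(list);
        for (Record r : list)
            System.out.println(r.firstName + " " + r.lastName);
    }
}
```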
12.1. Chapter 1 Exercises

1. Order the running times below from slowest to fastest:
   O(log n), O(2ⁿ), O(1), O(n), O(n²), O(n³), O(n log n)

2. The running times of certain algorithms have been determined to be exactly
   the values given below. Using big O notation, how would people usually
   write these (in their simplest possible form)?

   a. n² + 2n + 3
   b. 3 + 5n²
   c. 4.034n³ + 2.116n + 9.037
   d. 24n + 10n⁴ + 270.09
   e. n + log n
   f. 100 · 2ⁿ + n!/5
3. Determine the big O running time of each of the following code snippets.

count++;

b. int sum=0;
for (int j=0; j<a.length; j++) {
sum += a[i]*a[j]*a[k];
sum += a[i]*a[j]*a[k];
}
}
d. int i = 1;
System.out.println(a[i]);
i *= 2;
e. if (a.length < 2)
System.out.println("Too small.");
System.out.println("Too large.");
else
System.out.println(a[a.length-1]);
System.out.println(a[i] - a[j]);
System.out.print(a[j] - a[i]);
a[i] = 0;
h. int i = a.length-1;
while (i >= 1) {
int j=0;
a[j]+=i;
j++;
i /= 2;
}
i. int sum=0;
while (i >= 1) {
i /= 3;
}

l. for (int i=0; i<a.length; i++) {
System.out.println(a[i] + a[j]);
m. int i = a.length-1;
while (i>=1) {
a[j]+=i;
i /= 3;
a[i] = 0;
o. int sum = 0;
sum += a[i]*j;
p. int sum = 0;
sum += a[i]*j;
q. int sum = 0;
System.out.println((a[0]+a[a.length-1])/2.0);
r. int sum = 0;
int j = 0;
sum += a[i]*j;
}
System.out.println(Math.pow(a[0], i));

t. int i = 1;
a[i] = 0;

System.out.println(a);
12.2. Chapter 2 Exercises
1. If we create an AList object, add values into it so it equals [1, 3, 6, 10,
15] and then call the f method, what will the list look like after the call?
if (i % 2 == 1)
data[i] *= 2;
}
2. Keeping in mind that when Java creates an array, it initializes all the
elements to 0, if we create a method g for the AList class as given below
and then run the code below it in the main method of some class, what
will the output be?
numElements = n;
list.add(1);
list.g(4);
System.out.println(list);
3. If we create a new AList object and call the add method 40 times, what
will the value of capacity be?
4. What is the output of the following AList method on the list
[1,2,3,4,5,6]?
for (int i=0; i<numElements-1; i+=2) {
int x = a[i];
a[i+1] = x;
5. Add the following methods to the dynamic array class AList.

   a. addAll(a) — Adds all the elements of the array a to the end of the list.

   b. addSorted(x) — Assuming the list is in sorted order, this method adds x
      to the list in the appropriate place so that the list remains sorted.

   c. allIndicesOf(x) — Returns an integer ArrayList consisting of every
      index in the list where x occurs.

   d. clear() — Empties all the elements from the list.

   e. copy() — Returns a new AList object which is a copy of the current one.

   f. count(x) — Returns the number of occurrences of x in the list.

   t. shrink() — Checks if the array holding the data is less than half
      full. If so, it replaces the data array with a new data array half the
      size and copies all the old data over to it.
list.delete(i);
7. Suppose an LList object is created and filled with items so that it equals
[1,2,3,4,5]. Suppose that we add the following method to the LList
class and then call list.f(). What will the list look like after this
operation?
if (front == null)
return;
node = node.next;
front.next = node;
8. What will the list {[1,2,3]} look like after the LList method below is
called?
}
}
9. In the method of the previous problem, if the if statement is removed, a
   null pointer exception could result. Give an example of a list that would
   cause the null pointer exception and specify the exact line on which that
   exception will occur.

10. If the method below is called on the list [0, 1, 2, 0, 1, 2], what will
    the list look like after the method is called?

public void f() {
if (node.data == 0)
else
prev = node;
node = node.next;
11. Suppose we write a linked list method and the first line of the method is
Node node = front.next. What will happen if the method is called and
the list is empty? Explain why, and explain how to fix things.
12. Add the following methods to the linked list class LList class.
Important note: For all of the methods below, your code must
not call any of the methods already written for the class. In other
words, you can't use size, get, delete, etc.
c. clear() — Empties all the elements from the list (i.e., it sets the
list back to []).
d. combine(list2) — Adds all of the items of the LList object list2 to
   the end of the current list. For instance, if list = [1,2,3] and list2 =
   [4,5,6], then list.combine(list2) will change list into [1,2,3,4,5,6].

e. count(x) — Returns the number of occurrences of x in the list.

m. removeFront() — Removes the first thing in the list and leaves the
   list alone if it is empty.

n. rotate() — Shifts the elements of the list so that the last item in
   the list is now first and everything else is shifted right by one
   position.

p. sumNegatives() — Returns the sum of all the negative entries in
   the list. For instance, if the list is [2,-4,1,-5], the method would
   return -9.
13. Create a class called LList2. This will be a linked list class that has
    pointers to both the front and the back of the list. Copy the Node class
    and toString method over from the LList class. Then add the following
    methods:

    a. A constructor that creates an empty list.

    b. A method addToFront(x) that adds x to the front of the list and
       runs in O(1) time.

    c. A method add(x) that adds x to the back of the list and runs in
       O(1) time.

    d. A method delete(int index) that deletes the item at the given
       index. It should run in O(n) time. [Hint: be careful to update either
       the front, back, or both, if necessary.]
15. Add a method called rotate to the circularly linked list class. This
method should rotate all the elements to the left by one unit, with the
end of the list wrapping around to the front. For instance, [1,2,3,4]
should become [2,3,4,1]. Your solution to this part should run in O(1)
time.
16. Implement a doubly-linked list, giving it the same methods as the LList
class.
18. Rewrite the doubly-linked list class to include a field called back that
keeps track of the back of the list. To do this properly, you will have to
modify many of the methods to keep track of the new field.
1. 44
2. 23
3. 18
4. 5
5. 23
There were 5 numbers generated. The repeated number first appeared as #2.
22. Each line of the file elements.txt has an element number, its symbol,
    and its name, each separated by a space, followed by a hyphen and then
    another space. Use this file to create a list that contains the symbols of
    all the elements. It should be ["H", "He", "Li", ...].

23. Write a program that asks the user to enter a length in feet. The
    program should then give the user the option to convert from feet into
    inches, yards, miles, millimeters, centimeters, meters, or kilometers.
    Say if the user enters a 1, then the program converts to inches, if they
    enter a 2, then the program converts to yards, etc. While this can be
    done with if statements, it is much shorter with lists, and it is also
    easier to add new conversions if you use lists, so please use an approach
    that uses lists to store the conversions.
24. Write a simple quiz program that reads a bunch of questions and
answers from a file and randomly chooses five questions. It should ask
the user those questions and print out the user's score. Store the
questions and answers in lists.
25. Write a program that estimates the average number of drawings it takes
before the user's numbers are picked in a lottery that consists of
correctly picking six different numbers that are between 1 and 10. To do
this, run a loop 1000 times that randomly generates a set of user
numbers and simulates drawings until the user's numbers are drawn.
Find the average number of drawings needed over the 1000 times the
loop runs.
26. Write a program that simulates drawing names out of a hat. In this
drawing, the number of hat entries each person gets may vary. Allow the
user to input a list of names and a list of how many entries each person
has in the drawing, and print out who wins the drawing.
push(11);
push(6);
push(14);
pop();
push(9);
push(24);
push(47);

2. Suppose instead that the stack from the problem above is replaced with
   a queue and that the push/pop operations are replaced with add/remove
   operations. What would the queue look like after all the operations have
   been performed?

3. Fill in the blanks with either stack or queue.
a. In an operating system there are often several tasks waiting to be
   executed. They are typically processed in the order in which they
   are generated. A ___ would be the more appropriate data structure
   to use.
4. What is the output of the function below, given the input [2, 5, 11, 0,
0, 1, 19]?
for (int x : a) {
if (x == 0)
stack2.push(stack1.pop());
else if (x == 1)
stack1.push(stack2.pop());
else
stack1.push(x);
int total = 0;
total += stack1.pop();
return total;
if (Character.isUpperCase(s.charAt(i)))
stack.push(s.charAt(i));
else if (stack.isEmpty() ||
         Character.toLowerCase(stack.pop()) != s.charAt(i))
return false;
}
}
6. What is the output of the function below, given the input [4, 1, 8, 15,
   9, 0, 1, 8, 0]?

public static int f(int[] a) {
    for (int x : a) {
        if (x == 0)
            total += stack2.pop();
        else if (x == 1)
            stack2.push(stack1.pop());
        else
            stack1.push(x);
    }
    return total;
u — if the user enters the letter u (for undo), then remove the
most recently added integer. The user should be able to call this
multiple times in a row. However, if there is nothing left to undo,
print a message saying so.
r — if the user enters the letter r (for redo), then put back the
number most recently undone. This should be able to handle
multiple undo operations. For instance, if the user chooses undo
three times in a row, then the program should allow the user to
redo all of them. If there is nothing to redo, print a message
saying so. Note that redo should work just like it does for real
programs, where if you undo a bunch of things and then enter a
new number, then each remaining redo is wiped out.
q — if the user enters q (for quit), then print a message and exit
the program.
anything else — print a message indicating that the input is
invalid.
Finally, after each action, the program should display the contents of the
stack of entered integers. Here is some sample output:

Action: 1
[1]
Action: 2
[2, 1]
Action: u
[1]
Action: u
[]
Action: u
Nothing to undo!
[]
Action: r
[1]
Action: r
[2, 1]
Action: 3
[3, 2, 1]
Action: u
[2, 1]
Action: u
[1]
Action: 18
[18, 1]
Action: r
Nothing to redo!
[18, 1]
Action: help
[18, 1]
Action: q
Bye!
A: 1 B: 2
A: 10 B: 8
[Hints: Remember that the cards are just integers. You don't need a
special card object. Java's queues have a size method. You can't use
Collections.shuffle to shuffle a queue, so you might want to build a
list, shuffle it, and then add the cards from the list into the queue.]
11. Write a simple version of the card game War. Instead of cards with suits,
just consider cards with integer values 2 through 14, with four copies of
each card in the deck. Each player has a shuffled deck of cards. On each
turn of the simplified game, both players compare the top card of their
deck. The player with the higher-valued card takes both cards and puts
them on the bottom of his deck. If both cards have the same value, then
each player puts their card on the bottom of their deck (this part differs
from the usual rules). Use queues to implement the decks.
12. One application of a stack is for checking if parentheses in an expression
are correct. For instance, the parentheses in the expression (2*[3+4*
{5+6}]) are correct, but those in (2+3*(4+5) and 2+(3*[4+5)] are
incorrect. For this problem, assume (, [, and { count as parentheses.
Each opening parenthesis must have a matching closing parenthesis and
any other parentheses that come between the two must themselves be
opened and closed before the outer group is closed.
A simple stack-based algorithm for this is as follows: Loop through the
expression. Every time an opening parenthesis is encountered, push it
onto the stack. Every time a closing parenthesis is encountered, pop the
top element from the stack and compare it with the closing parenthesis.
If they match, then things are still okay. If they don't match, then there
is a problem. If the stack is empty when you try to pop, that also
indicates a problem. Finally, if the stack is not empty after you are done
reading through the expression, that also indicates a problem.
Implement this algorithm in a function called checkParens. It should take
a string as its parameter and return a boolean indicating whether the
parentheses are okay. The stack should be implemented using Java's
ArrayDeque class.
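The stack-based algorithm described above can be sketched as follows, using ArrayDeque as the stack:

```java
import java.util.ArrayDeque;

public class ParenCheck {
    // Returns true if (, [, { are properly matched and nested in expr.
    public static boolean checkParens(String expr) {
        ArrayDeque<Character> stack = new ArrayDeque<>();
        for (char c : expr.toCharArray()) {
            if (c == '(' || c == '[' || c == '{') {
                stack.push(c);                       // remember the opener
            } else if (c == ')' || c == ']' || c == '}') {
                if (stack.isEmpty())
                    return false;                    // closer with nothing open
                char open = stack.pop();
                boolean match = (open == '(' && c == ')')
                             || (open == '[' && c == ']')
                             || (open == '{' && c == '}');
                if (!match)
                    return false;                    // mismatched pair
            }
        }
        return stack.isEmpty();                      // leftover openers mean failure
    }
}
```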
13. Write a postfix expression evaluator. For simplicity, assume the valid
operators are +, -, *, and / and that operands are integers.
14. For this problem you will be working with strings that consist of capital
and lowercase letters. Such a string is considered valid if it satisfies the
following rules:
The letters of the string always come in pairs—one capital and one
lowercase. The capital letter always comes first followed by its
lowercase partner somewhere later in the string.
Here are a few valid strings: ZYyz, ABCcba, ABCcCcba, AaBb, AAaa,
ABbACDAaDC, and ABBbCDEedcba.
Write a boolean method called valid that takes a string and returns
whether or not it is valid according to the rules above.
15. Add exception-handling to the LLStack class. An exception should be
raised if the user tries a peek or a pop operation when the data structure
is empty.
16. The Java documentation recommends using the ArrayDeque class for a
stack. A problem with this is that the ArrayDeque class is very general
and has a lot of methods that are not needed for stacks. Create a stack
class that acts as a wrapper around the ArrayDeque class. It should
provide the same methods as our stack class from this chapter and
implement them by calling the appropriate ArrayDeque methods. There
should be no other methods.
17. This exercise is about implementing a queue with an array instead of as
a linked list. Create a class called AQueue. It will have class variables a,
front, back, capacity, and numElements. The queue's data (integers)
are stored in the integer array a, whose length is capacity. The field
numElements is for how many elements are actually stored in the queue.
The front and back variables are integers that keep track of the front
and back of the queue. Create the following methods for the class.
a. A constructor that takes size as a parameter.
b. add(x) — Adds an integer into the queue. The way add works is
that you add x into the array at the location of back. Then increase
back by 1. Note that if back ends up greater than or equal to the
capacity, then it should wrap around. If adding an element would fill
up the queue beyond capacity, then instead of adding, you should
throw a RuntimeException.
Note that as things are added and removed from the queue, the part of
the array that contains the queue's data will shift to the right. After
a while it will even wrap around. To check if the queue is empty or full,
you can either maintain a separate count variable or do a check based
on the values of the front and back variables.
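A sketch of the wraparound idea described above. The remove, isEmpty, and size methods are assumptions about the parts of the exercise not shown here; remove mirrors add at the front end of the array.

```java
public class AQueue {
    private int[] a;
    private int front = 0, back = 0, capacity, numElements = 0;

    public AQueue(int size) {
        capacity = size;
        a = new int[capacity];
    }

    public void add(int x) {
        if (numElements == capacity)
            throw new RuntimeException("Queue is full");
        a[back] = x;
        back = (back + 1) % capacity;   // wrap around past the end
        numElements++;
    }

    public int remove() {
        if (numElements == 0)
            throw new RuntimeException("Queue is empty");
        int x = a[front];
        front = (front + 1) % capacity; // front chases back around the array
        numElements--;
        return x;
    }

    public boolean isEmpty() { return numElements == 0; }
    public int size() { return numElements; }
}
```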
18. Repeat the exercise above but create a class called AStack that is a stack
implemented with an array. In place of add and remove, the methods will
be push and pop.
19. Implement a queue class by using two stacks. The queue class should
have the same methods as the one from this chapter.
20. A deque is a double-ended queue that allows adding and removing from
both ends. Implement a deque class from scratch using a linked list
approach like we did for stacks and queues. Your class should
have methods addFront, removeFront, addBack, removeBack, toString,
and a constructor that creates an empty deque. The remove methods
should not only remove the value, but they should also return that
value. It should use a doubly-linked list with both front and back
markers. This will allow all of the adding and removing methods to run in
O(1) time. This means that there should be no loops in your code.
12.4. Chapter 4 Exercises
Here are a few notes about your solutions to the recursive problems below.
In many cases an iterative solution would be much easier, but this
exercise is for practice thinking recursively. Assume that any lists in
these problems are lists of integers.
This is recursion, so make sure that somewhere in your solution, your
method calls itself.
You cannot use any global variables.
The number and type of your methods' parameters must match what is
specified in the problem.
g. compress(list) — compresses the list so that all duplicates are
reduced to a single copy. For instance, [1,2,2,3,4,4,4,4,5,5,6]
gets reduced to [1,2,3,4,5,6].
p. f(n) — Takes an integer n and returns the nth term of the
sequence given by x_1 = 3, x_n = 7x_{n-1} - 15.
q. swap(s) — assuming s is a string consisting of zeros and ones, this
method returns a string with all of the ones changed to zeros and
all of the zeros changed to ones. For instance, swap("00001101")
should return "11110010".
r. trim(s) — Returns a string with all the spaces from both the front
and the back of s removed. For instance, if s is " abc de f ", then
trim(s) should return "abc de f".
s. triple(s) — returns a string with each character of s replaced by
three adjacent copies of itself. For instance, triple("abc") should
return "aaabbbccc".
t. tripleUp(a) — Multiplies all the elements in an array by 3 and
returns the new array. For instance, if a is [1,2,3], tripleUp(a)
would return [3,6,9].
if (a==0)
return b;
if (n==5)
return s;
5. For the following recursive method, what value will it return if given the
list [4,8,15,7,2,18,40,9]?
if (list.size() == 1)
return list.get(0);
if (list.get(0) < x)
return list.get(0);
else
}
6. What is the output of the function below when given the string abc?
if (s.equals(""))
String c = String.valueOf(s.charAt(0));
}
7. What does the following recursive method tell us about its input string?
public static boolean f(String s) {
return true;
return false;
}
8. Write a method called countAs(s) that counts the number of capital A's
in a string. But do this by creating a helper function called
countAHelper(s, total), where total is a parameter that will hold a
running total. The countAs method should contain exactly one line of
code, and that line should call the helper. The helper should do almost all
the work, making use of that total parameter.
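The helper-with-accumulator pattern described above can be sketched like this:

```java
public class CountAs {
    // Public method: exactly one line, which starts the helper with a total of 0.
    public static int countAs(String s) {
        return countAHelper(s, 0);
    }

    // The total parameter accumulates the answer as the string shrinks.
    private static int countAHelper(String s, int total) {
        if (s.isEmpty())
            return total;                               // base case: nothing left to scan
        if (s.charAt(0) == 'A')
            total++;                                    // count the head character
        return countAHelper(s.substring(1), total);     // recurse on the tail
    }
}
```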
10. Rewrite the sum function from Section 4.2 so that it uses a parameter to
the function called total to keep a running total.
11. Translate the following code into recursive code using the technique
outlined in Section 4.6.
if (i*i%5==1)
    a.add(i);
if (i*i+j*j==k*k)
    count++;
return count;
}
12. Consider the code below that finds the smallest item in a list. Convert
that code into recursive code in the same way that we converted the
iterative Fibonacci code into recursive code. That is, the variables i and
small should become parameters so that you have a function called
smallest_helper(L, i, small) that does all the work along with a
function smallest(L) that calls the helper function. Here is the iterative
code you should translate:
public int smallest(List<Integer> list) {
    int small = list.get(0);
    for (int i = 1; i < list.size(); i++) {
        if (list.get(i) < small)
            small = list.get(i);
    }
    return small;
}
13. Write a function max(list) that returns the largest element in the given
list of integers. Do this by breaking the list into halves split at the
middle of the list (instead of head and tail).
14. Rewrite the binary search algorithm from Section 1.5 recursively.
15. Rewrite the permutations function from Section 4.3 to return a list of the
permutations of an arbitrary string. For instance, when called on "abc",
it should return [cba, bca, bac, cab, acb, abc].
16. Rewrite the permutations function from Section 4.3 so that it returns
lists of integer lists instead of strings. For instance, when called with
n=2, it should return [[1,2], [2,1]].
21. Write a function called onesAndTwos(n) that returns a list of strings,
where the strings in the list contain all the ways to add ones and twos to
get n. Here are the first few values:
n=1: 1
n=2: 11, 2
n=3: 111, 21, 12
For instance, in the n=3 case, we see that 1+1+1, 2+1, and 1+2 are all
the ways to add ones and twos to equal 3.
To get the values for n, take all the ways to get n-1 and add a 1 to the
end of each, and take all the ways to get n-2 and add a 2 to the end of
each. Combine all these into a single list and return it. The two base
cases are n=1, which should return the list ["1"], and n=2, which should
return the list ["11", "2"].
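The recursion described above translates almost directly into code. A sketch:

```java
import java.util.ArrayList;
import java.util.List;

public class OnesAndTwos {
    // Returns all strings of 1s and 2s whose digits sum to n.
    public static List<String> onesAndTwos(int n) {
        if (n == 1)
            return new ArrayList<>(List.of("1"));
        if (n == 2)
            return new ArrayList<>(List.of("11", "2"));
        List<String> result = new ArrayList<>();
        for (String way : onesAndTwos(n - 1))
            result.add(way + "1");      // every way to make n-1, plus a 1
        for (String way : onesAndTwos(n - 2))
            result.add(way + "2");      // every way to make n-2, plus a 2
        return result;
    }
}
```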
22. Write a function called partitions(n) that returns a list of all the
partitions of the integer n. A partition of n is a breakdown of n into
positive integers less than or equal to n that sum to n, where order
doesn't matter. For instance, all the partitions of 5 are 5, 4+1, 3+2,
3+1+1, 2+2+1, 2+1+1+1, and 1+1+1+1+1. Each partition should be
represented as a string, like "2+1+1".
k. printLevel(k) — prints all of the elements at level k.
l. set(location, value) — changes the data value at the node specified
by the string location to the given new value.
2. Our binary tree class works with generic types. Make the class less
general so that it only works with integer data. Then add the following
methods to the class:
a. max() — Returns the largest integer stored in the tree.
b. numNegatives() — Returns how many negative entries are in the
tree.
c. numNegativeLeaves() — Returns how many leaves in the tree have
negative values.
d. sum() — Returns the sum of all the integers stored in the tree.
4. Rewrite the binary tree class to represent a general tree where each
node can have an arbitrary number of children. To do this, use a list of
nodes in place of the left and right variables.
5. Write a binary tree class that uses a dynamic array instead of a linked
structure. Use the following technique: The root is stored at index 0. Its
children are stored at indices 1 and 2. The children of the root's left child
are stored at indices 3 and 4, and the children of its right child are stored
at indices 5 and 6. In general, if the index of a node is i, then its left and
right children are stored at indices 2i+1 and 2i+2, respectively. The class
should have three methods:
3. This is not a BST. Indicate precisely where the BST property is broken.
d. Besides the answer given in part (b), there is one other item that
we could use. What is it?
5. If the binary search tree is well-balanced, what are the big O running
times of adding and deleting?
6. If elements are added to a binary search tree in numerical order (and no
effort is made to rebalance the tree), what are the big O running times
of adding and deleting items?
7. This exercise involves adding some methods to the BST class. Your
methods must make use of the order property of binary search trees. In
other words, they should work with binary search trees, but not on an
arbitrary binary tree. Except for the last problem, these methods should
not have to look at every item in the tree. Binary search trees are
ordered to make searching faster than with arbitrary trees, and this
problem is about taking advantage of that ordering.
a. get(k) — returns the kth smallest item in the BST.
9. It would be nice if Java had a special class for ordered pairs (things like
(2,3) or (5,1) that you might remember from algebra). Java
unfortunately doesn't have such a class, but we can create one. Create a
class called OrderedPair with the following specifications.
h. Use Collections.sort to sort the list, and then print out the list.
2. Elements are added to a heap in the order given below. Draw the heap
at each individual step.
4. Recall that heaps are implemented using lists. What would the list be for
the original heap shown in problem 3?
5. Add the following methods to the Heap class:
a. delete(x) — Deletes an arbitrary element x from the heap. One
way to do this is to do a linear search through the list representing
the heap until you find the element and then delete it in more or
less the same way that the pop method removes the min. If
there are multiple copies of x, your method doesn't have to delete
all of them.
6. Create a class called MaxHeap that is like the Heap class except that it
implements a max heap, where a parent's value is always greater than
or equal to both of its children.
7. Create a class called MaxHeap2 that has the same methods as the Heap
class, but uses Java's PriorityQueue class under the hood to implement
the methods. Assume that the data in the heap is all integers. [Hint:
mathematically, if x>y then -x<-y.]
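The negation trick from the hint can be sketched like this, assuming the Heap class's interface is add, peek, pop, size, and isEmpty:

```java
import java.util.PriorityQueue;

// A max heap for integers built on Java's min-oriented PriorityQueue.
// Trick from the hint: if x > y then -x < -y, so we store negated values,
// and the PriorityQueue's minimum corresponds to our maximum.
public class MaxHeap2 {
    private PriorityQueue<Integer> pq = new PriorityQueue<>();

    public void add(int x) { pq.add(-x); }          // store the negation
    public int peek() { return -pq.peek(); }        // smallest negation = largest value
    public int pop() { return -pq.poll(); }
    public int size() { return pq.size(); }
    public boolean isEmpty() { return pq.isEmpty(); }
}
```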
8. Create a class called GHeap that is a generic version of the Heap class that
works with any data type that implements the Comparable interface.
Make sure the class has the same methods as the Heap class.
11. One nice application of heaps is for sorting really huge data sets, like
data from a file several gigabytes in size, where it would be impossible
to fit all of the data into RAM at once. Suppose we have a huge file
consisting of integers, and we need to know the 10 largest integers from
the file. Here is an algorithm to find them: Start by reading the first
10,000 items from the file. Add them all into a heap. Pop the first 10
things off of the heap and add them to a new heap. Then add the next
9990 items from the file onto that heap. Pop the first 10 things off of this
heap onto a new heap and add the next 9990 items from the file onto
that heap. Keep repeating this process until you've gone through all the
data in the file. Popping the first 10 things off of the last heap gives the
ultimate solution. Implement this algorithm in a function called
largest10. The function should take a string argument representing a
filename and return a list of the 10 largest integers from that file.
Assume the file contains one integer per line.
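For this to find the largest values, the heap must be a max heap (so that pop removes the largest element), like the MaxHeap class from Exercise 6. The sketch below uses a PriorityQueue with a reversed comparator for that role, and it takes an in-memory list standing in for the file's integers; the file-reading part is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class Largest10 {
    // The chunked-heap algorithm: repeatedly top the heap up to 10,000 items,
    // then carry only the 10 largest forward into a fresh heap.
    public static List<Integer> largest10(List<Integer> nums) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(Collections.reverseOrder());
        int i = 0;
        while (i < nums.size()) {
            // Refill to 10,000 items (the 10 carried over plus the next 9990).
            while (i < nums.size() && heap.size() < 10000)
                heap.add(nums.get(i++));
            // Pop the 10 largest into a new heap and discard the rest.
            PriorityQueue<Integer> next = new PriorityQueue<>(Collections.reverseOrder());
            for (int k = 0; k < 10 && !heap.isEmpty(); k++)
                next.add(heap.poll());
            heap = next;
        }
        List<Integer> result = new ArrayList<>();
        while (!heap.isEmpty())
            result.add(heap.poll());   // largest first
        return result;
    }
}
```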
2. a. If we use a good hash function, the buckets will each have at
most a few elements. This means that add, remove, and
contains will all have what big O running time?
a. Do this by creating a set of all the words in Pride and Prejudice and
another set of all the words in War and Peace and seeing what is in
the latter that is not in the former. Be sure to remove all the
punctuation and convert the text to lowercase.
c. How long did the first program take to find all the words? How long
did the second one take?
d. How does your answer in part (c) relate to the big O of looking
through a set and a list/array for something?
The indices 0, 1, 4, and 6 are true, and the others are false.
Write a BitSet class that works for sets of integers and contains the
following methods:
m. difference(s1, s2) — a static method that returns the
difference of sets s1 and s2. [Be careful because the sets may
have different sizes.]
n. isSubset(s1, s2) — a static method that returns whether or not
s1 is a subset of s2.
8. Implement a set using a binary search tree. Provide add, remove,
contains, and union methods.
12.9. Chapter 9 Exercises
1. What does the function below accomplish? Assume c is a character in s.
public static int f(String s, char c) {
    Map<Character, Integer> map = new HashMap<>();
    for (int i = 0; i < s.length(); i++) {
        if (map.containsKey(s.charAt(i)))
            map.put(s.charAt(i), map.get(s.charAt(i))+1);
        else
            map.put(s.charAt(i), 1);
    }
    return map.get(c);
}
2. Write a few lines of code that creates the map {"a":4, "b":2, "c":3}
and then uses that map to create a string consisting of the keys of that
map repeated a number of times equal to their values. For this map, it
would be "aaaabbccc".
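One way to sketch this transformation, using a TreeMap so that the keys come out in alphabetical order (a plain HashMap makes no ordering guarantee); the class and method names here are illustrative:

```java
import java.util.Map;
import java.util.TreeMap;

public class RepeatKeys {
    // Build a string with each key repeated a number of times equal to its value.
    public static String expand(Map<Character, Integer> map) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Character, Integer> e : map.entrySet())
            sb.append(String.valueOf(e.getKey()).repeat(e.getValue()));
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Character, Integer> map = new TreeMap<>();
        map.put('a', 4);
        map.put('b', 2);
        map.put('c', 3);
        System.out.println(expand(map));
    }
}
```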
4. The file elements.txt contains periodic table data. Write a program that
reads the file and creates a map whose keys are the element symbols
(H, He, Li, etc.) and whose values are the corresponding names of those
elements (Hydrogen, Helium, Lithium, etc.). Then ask the user to enter
an abbreviation and print out its name. If the abbreviation is not a valid
element abbreviation, print out a message saying so.
5. Write a method called sameCharFreq(s, t) that takes two strings and
returns whether they contain the same characters with the same
frequencies, though possibly in different orders. For instance, "AABCCC"
and "CCCABA" both contain two As, a B, and three Cs, so the method
would return true for those strings. It would return false for the strings
"AAABC" and "ABC" since the frequencies don't match up. Your method
should use a map.
6. This problem will give you a tool for cheating at crossword puzzles.
Create a function called crosswordCheat(s) that takes a string s that
consists of the letters you know from the word, with asterisks as
placeholders for the letters you don't know. Then print a list of all the
words that could work. For instance, if s is w**er, the program should
print a list containing water, wafer, wider, etc.
Please do the problem using a map as follows: Create a map map1 from s
so that its keys are the indices of the non-asterisk characters and its
values are the characters themselves. Then loop through the words in a
wordlist. If the word has a different length than s, then move on to the
next word. Otherwise, create a similar map, map2, for the word, and
compare the two maps. If map1 has any keys that are not in map2 or if the
value associated with any key of map1 is different from its value in map2,
then the word won't work. Otherwise, it will, and print it out.
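The map comparison described above can be sketched as a predicate. The helper names (knownLetters, matches) are illustrative, and the tiny hard-coded list in main stands in for a real wordlist file:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CrosswordCheat {
    // Map each non-asterisk character of s to its index, as described.
    private static Map<Integer, Character> knownLetters(String s) {
        Map<Integer, Character> map = new HashMap<>();
        for (int i = 0; i < s.length(); i++)
            if (s.charAt(i) != '*')
                map.put(i, s.charAt(i));
        return map;
    }

    // A word works if it has the same length and agrees with s at every known index.
    public static boolean matches(String s, String word) {
        if (word.length() != s.length())
            return false;
        Map<Integer, Character> map1 = knownLetters(s);
        Map<Integer, Character> map2 = knownLetters(word);
        for (Integer key : map1.keySet())
            if (!map1.get(key).equals(map2.get(key)))
                return false;
        return true;
    }

    public static void main(String[] args) {
        List<String> wordlist = List.of("water", "wafer", "wider", "otter", "wave");
        for (String word : wordlist)
            if (matches("w**er", word))
                System.out.println(word);
    }
}
```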
7. Suppose we are given a word, and we want to find all the words that can
be formed from its letters. For instance, given water, we can form the
words we, wear, are, tea, etc.
One solution is, for a given word, form a map whose keys are the
characters of the word and whose values are the number of times the
character appears in the word. For instance, the map corresponding to
java would be {a:2, j:1, v:1}. A word w can be made from the letters
of another word W if each key in the map of w also occurs in the map of
W and the corresponding values for the letters of w are less than or
equal to the values for the letters of W.
Using this approach, write a method called wordsFrom that takes a string
and returns a list of all the English words (from a wordlist) that can be
made from that string.
8. One use for maps is for implementing sparse integer lists. A sparse
integer list is one where most of the entries are 0, and only a few are
nonzero. We can save a lot of space by using a map whose keys are the
indices of nonzero entries and whose values are the corresponding
entries. Create a class called SparseList that uses this approach. The
class should have get, set, and contains methods. In this exercise, you
are creating what is basically an implementation of a list that uses maps.
What is special about it is that the map only stores the indices of
nonzero values.
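A minimal sketch of such a class:

```java
import java.util.HashMap;
import java.util.Map;

// A list of integers where only nonzero entries take up space.
public class SparseList {
    private Map<Integer, Integer> entries = new HashMap<>();

    public int get(int index) {
        return entries.getOrDefault(index, 0);   // a missing index means 0
    }

    public void set(int index, int value) {
        if (value == 0)
            entries.remove(index);               // keep only nonzero entries
        else
            entries.put(index, value);
    }

    public boolean contains(int value) {
        // Note: this sketch does not track an explicit length, so it only
        // checks the nonzero entries that are actually stored.
        return entries.containsValue(value);
    }
}
```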
9. The file scores.txt contains the results of every 2009-10 NCAA Division
I basketball game (from http://kenpom.com). Each line of that file looks
like the line below:
11/09/2009 AlcornSt. 60 OhioSt. 100
Write a program that finds all the teams with winning records (more
wins than losses) who were collectively (over the course of the season)
outscored by their opponents. To solve the problem, use maps wins,
losses, pointsFor, and pointsAgainst whose keys are the teams.
10. The substitution cipher encodes a word by replacing every letter of a
word with a different letter. For instance, every a might be replaced with
an e, every b might be replaced with an a, etc. Write a program that
asks the user to enter two strings. Then determine if the second string
could be an encoded version of the first one with a substitution cipher.
For instance, CXYZ is not an encoded version of BOOK because O got
mapped to two separate letters. Also, CXXK is not an encoded version of
BOOK, because K got mapped to itself. On the other hand, CXXZ would
be an encoding of BOOK.
11. Suppose you are given a file courses.txt, where each line of that file
contains a name and the CS courses that person has taken. Read this
info into a map whose keys are the names and whose values are lists of
courses. Then create a static method called getStudents(map, course)
that takes the map and a course number and uses the map to return a
list of all the students who have taken that course.
12. The file war_and_peace.txt contains the complete text of Tolstoy's War
and Peace. Convert it all to lowercase and create a map whose keys
are the lowercase letters a through z and whose values are what
percentage (as a double) of the total that letter comprises, each rounded
to 1 decimal place. Your final map should start with the exact entries
below (in alphabetical order) when printed out:
That is, 8.1% of the letters are a's, 1.4% are b's, etc.
Hint: This line can be used to read the entire contents of a file into a
string in one fell swoop:
12.10. Chapter 10 Exercises
1. Give the adjacency list representation of the graph below.
2. For the graphs below, we are starting at vertex a. Indicate the order in
which both breadth-first search and depth-first search visit the vertices.
a. Describe what to make the vertices and what to make the edges.
b. For the particular case of the state of the puzzle above on the left,
give exactly what the vertices and edges associated with that state
would be.
6. A floodfill is where you have a two-dimensional region and you fill part of
it with a certain value until you reach boundaries. Those boundaries can
be the edge of the region or other values in the region. This is a common
feature in many drawing programs. It is also useful in certain games,
like Minesweeper. As an example, a region (two-dimensional array of
characters) is shown below on the left. We want to floodfill parts of the
region by replacing dashes with @ symbols. The result of floodfilling
starting at index (0,0) is shown in the middle. There we start by setting
the dash at (0,0) to @. From there we continue searching up, down, left,
and right until we reach the end of the array or an asterisk character. On
the right below is the result of a floodfill starting at index (1,3).
- - * - * -     @ @ * - * -     - - * @ * -
- * - - * *     @ * - - * *     - * @ @ * *
* * * - - *     * * * - - *     * * * @ @ *
- * - - * *     - * - - * *     - * @ @ * *
Write a function called floodfill(char[][] a, int r, int c) that takes
a two-dimensional array of characters and a location (r,c) and performs
a floodfill replacing dashes with @ characters, starting at location (r,c).
Your function should modify the caller's array. Instead of creating a
graph, do this by modifying the BFS or DFS code to work with an array.
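One way to sketch this is with a recursive search (recursion is effectively the DFS variant mentioned above):

```java
public class Floodfill {
    // Recursive depth-first floodfill: replace '-' with '@' starting at (r, c),
    // stopping at the array edges and at any non-dash character.
    public static void floodfill(char[][] a, int r, int c) {
        if (r < 0 || r >= a.length || c < 0 || c >= a[r].length)
            return;                 // off the edge of the region
        if (a[r][c] != '-')
            return;                 // a boundary ('*') or an already-filled cell
        a[r][c] = '@';
        floodfill(a, r - 1, c);     // up
        floodfill(a, r + 1, c);     // down
        floodfill(a, r, c - 1);     // left
        floodfill(a, r, c + 1);     // right
    }
}
```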
7. In the game Boggle, letters are arranged in a 5×5 grid. The goal is to
try to find words in the grid. You form words by picking a starting letter
and following a path from cell to cell. You can move horizontally,
vertically, or diagonally, and this can be done backwards or forwards. For
example, in the board below, if we start at the B near the bottom right,
we can form the word BYTES by moving right, then up, then left, and
then diagonally left and down.
B D N V M
S T M C B
I M E N A
S P K E T
T L S B Y
Five professors and their five dogs were having a lot of fun
camping. Each professor owned one of the dogs. While they
were all on a hike, they came to a river they had to cross.
There was a small motorboat alongside the river. They
determined that the boat was only large enough to hold
three living things—either dogs or professors. Unfortunately,
the dogs were a little temperamental, and they really didn't
like the professors. Each dog was comfortable with its owner
and no other professor. Each dog could not be left with the
professors unless the dog's owner was present—not even
momentarily! Dogs could be left with other dogs, however.
The crossing would have been impossible except Professor
A's dog, which was very smart, knew how to operate the
boat. None of the other dogs were that smart, however. How
was the crossing arranged, and how many trips did it take?
Recursion + 9. Add the following methods to the Graph class.
a : b c
b : a
c : b d
d : c
e :
q. isValidColoring(m) — returns true if the map m represents a valid
coloring of the graph and false otherwise. A coloring of a graph is
an assignment of colors to the vertices of the graph such that
adjacent vertices are not given the same color. The argument m is
a map whose keys are the vertices of the graph and whose values
are the colors (integers) assigned to the vertices. Usually positive
integers are used for the color names.
r. findValidColoring() — returns a map like in the previous problem
that represents a valid coloring of the graph. Create the map using
the following greedy algorithm: Loop through the vertices in any
order, and assign each vertex the smallest color (starting with the
number 1) that has not already been assigned to one of its
neighbors.
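Since the Graph class internals aren't shown here, the two methods above can be sketched as static functions over an adjacency-list map; inside the actual class they would use its own vertex and neighbor accessors instead:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class Coloring {
    // Greedy coloring: each vertex gets the smallest positive color
    // not already used by one of its neighbors.
    public static Map<String, Integer> findValidColoring(Map<String, List<String>> adj) {
        Map<String, Integer> colors = new HashMap<>();
        for (String v : adj.keySet()) {
            Set<Integer> used = new HashSet<>();
            for (String neighbor : adj.get(v))
                if (colors.containsKey(neighbor))
                    used.add(colors.get(neighbor));   // colors taken by neighbors
            int color = 1;
            while (used.contains(color))
                color++;                              // smallest color not taken
            colors.put(v, color);
        }
        return colors;
    }

    // A coloring is valid if no edge joins two vertices of the same color.
    public static boolean isValidColoring(Map<String, List<String>> adj,
                                          Map<String, Integer> m) {
        for (String v : adj.keySet())
            for (String neighbor : adj.get(v))
                if (m.get(v).equals(m.get(neighbor)))
                    return false;
        return true;
    }
}
```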
10. In a file separate from the Graph class, create the following methods:
13. Write a recursive version of the depth-first search algorithm.
14. Modify the depth-first search algorithm to create a function hasCycle(G)
that determines if the graph G has a cycle. [Hint: if depth-first search
encounters a vertex that it has already seen, that indicates that there is
a cycle in the graph. Just be careful that the vertex doesn't happen to
be the parent of the vertex currently being checked.]
a. Mergesort will initially break this array into two pieces. What are
those two pieces?
b. Quicksort will initially break this array into two pieces. What are
those two pieces if the first element is used as the pivot?
3. Suppose that mergesort has broken an array into two parts and sorted
those two parts into [C, R, S, V] and [D, M, T, Y, Z]. Show each step
of the merge process that merges those two parts into one big array.
4. Suppose that quicksort has broken an array into two parts and sorted
those parts into [B, D, F, R], and [T, W, Z]. If the pivot is S, describe
how quicksort combines everything back into one big array.
5. Show how the in-place quicksort partition operation works on the list
below, using the first element of the list as a pivot. Please show your
work clearly, indicating how the indices move through the list and when
swaps occur. You are showing just the partition process in this problem,
not the recursive part of quicksort.
6. Fill in the missing entries in the table of big O running times below.
Algorithm Worst Case Best Case Average Case
Insertion sort
Mergesort O(n log n)
Quicksort O(n log n)
7. a. Give an example of an array of 10 items for which quicksort will run
in O(n²) time if it always uses the first item as a pivot.
b. If the pivot is changed to always pick the middle element, will the
running time for your array be O(n²) or O(n log n)?
9. At the end of the part of Section 11.5 on Bubble Sort, there was a
suggestion for improving Bubble Sort by checking to see if any swaps
were made on a pass through the array. If no swaps are made, then the
array must be sorted and the algorithm can be stopped. Implement this.
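The early-exit idea can be sketched like this:

```java
public class BubbleSort {
    // Bubble sort that stops early once a full pass makes no swaps,
    // since a swap-free pass means the array is already sorted.
    public static void bubbleSort(int[] a) {
        for (int end = a.length - 1; end > 0; end--) {
            boolean swapped = false;
            for (int i = 0; i < end; i++) {
                if (a[i] > a[i + 1]) {
                    int temp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = temp;
                    swapped = true;
                }
            }
            if (!swapped)
                return;   // no swaps on this pass: already sorted
        }
    }
}
```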
10. In Section 11.7, we showed how to write the selection sort variant to
work with a generic list. Rewrite the Quicksort method in the same way.
11. Exercise 9 of Section 12.6 asked you to create a class called OrderedPair
that represents an ordered pair (x,y). Create a list of pairs and sort the
pairs by their x coordinates. Then perform a sort that sorts them first by
their x coordinates and second by their y coordinates.
12. Write a method that takes a string of lowercase letters and creates a
map of how many times each letter appears. Using that map, do the
following:
a. Print out the letters and their frequencies, ordered alphabetically.
b. Print out the letters and their frequencies, ordered from most to
least frequent.
c. Print out the letters and their frequencies, ordered from least to
most frequent.
13. Implement the following sorting algorithms to sort integer arrays.
a. Treesort — This works just like Heapsort except it uses a BST in
place of a heap. Do this using the BST class from Chapter 7. In
particular, you will want to add a method to that class called
inOrderList that returns a list of the items in the BST from an
in-order traversal.
b. Cocktail sort — This is a relative of bubble sort. Assume the n
elements are in indices 0, 1, … n-1. Start by comparing elements 0
and 1, then elements 1 and 2, and so on down to the end of the
array, swapping each time if the elements are out of order. When
we're done, the largest element will be in place at the end of the
array. Now go backwards. Start at the second to last element
(index n-2) and compare it with the one before it (index n-3).
Then compare the elements at n-3 and n-4, the elements at n-4
and n-5, etc., down to the front of the list. After this, the smallest
element will be in place at the front of the array. Now proceed
forward again, starting by comparing elements 1 and 2, then 2 and
3, etc., down to n-3 and n-2. After this, the second to last element
will be in place. Then go backwards again, then forwards, then
backwards, each time stopping one position short of the previous
stopping place, until there is nowhere left to move.
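The alternating passes described above can be sketched with a pair of shrinking bounds, lo and hi, that each move inward by one after their pass:

```java
public class CocktailSort {
    public static void cocktailSort(int[] a) {
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            // Forward pass: bubble the largest remaining element up to index hi.
            for (int i = lo; i < hi; i++)
                if (a[i] > a[i + 1]) swap(a, i, i + 1);
            hi--;            // a[hi] is now in its final place
            // Backward pass: bubble the smallest remaining element down to index lo.
            for (int i = hi; i > lo; i--)
                if (a[i - 1] > a[i]) swap(a, i - 1, i);
            lo++;            // a[lo] is now in its final place
        }
    }

    private static void swap(int[] a, int i, int j) {
        int temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }
}
```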