
CSC-319 Data Structures and Algorithms

Even though computers can perform literally millions of mathematical computations per second,
when a problem gets large and complicated, performance can nonetheless be an issue. One of the
most crucial aspects to how quickly a problem can be solved is how the data is stored in
memory.

To illustrate this point, consider going to the local library to find a book about a specific subject
matter. Most likely, you will be able to use some kind of electronic reference or, in the worst
case, a card catalog, to determine the title and author of the book you want. Since the books are
typically shelved by category, and within each category sorted by author's name, it is a fairly
straightforward and painless process to then physically select your book from the shelves.

Now, suppose instead you came to the library in search of a particular book, but instead of
organized shelves, were greeted with large garbage bags lining both sides of the room, each
arbitrarily filled with books that may or may not have anything to do with one another. It would
take hours, or even days, to find the book you needed. This is how software runs when data is not
stored in an efficient format appropriate to the application.

A data structure is an arrangement of data in a computer's memory or even disk storage.
Examples of common data structures are arrays, linked lists, queues, stacks, binary trees, and
hash tables.

Algorithms, on the other hand, are used to manipulate the data contained in these data structures,
as in searching and sorting. Many algorithms apply directly to specific data structures. When
working with certain data structures you need to know how to insert new data, search for a
specified item, and delete a specific item.

Abstract Data Types (ADT)

An Abstract Data Type focuses on what a data structure does while ignoring how it does its
job. A stack or a queue is an example of an ADT. It is important to understand that both stacks
and queues can be implemented using an array. It is also possible to implement stacks and
queues using a linked list. This demonstrates the "abstract" nature of stacks and queues: they
can be considered separately from their implementation.

To describe the term Abstract Data Type, it helps to break the term down into "data type"
and then "abstract".

Data type

When we consider a primitive type we are actually referring to two things: a data item with
certain characteristics and the permissible operations on that data. An int in Java, for example,
can contain any whole-number value from -2,147,483,648 to +2,147,483,647. It can also be used
with the operators +, -, *, and /. The data type's permissible operations are an inseparable part of
its identity; understanding the type means understanding what operations can be performed on it.
Primitive data types are predefined types of data supported by the programming
language. In Java these are int, char, boolean, byte, short, long, float, and double. (Note that
String is a class in Java, not a primitive type.)

In Java, any class represents a data type, in the sense that a class is made up of data (fields) and
permissible operations on that data (methods). By extension, when a data storage structure like a
stack or queue is represented by a class, it too can be referred to as a data type. A stack is
different in many ways from an int, but they are both defined as a certain arrangement of data
and a set of operations on that data.

Recall the definition of a data type: "a data type consists of the values it represents and the
operations defined upon it." You are already familiar with some basic data types such as integers
and characters. These data types are found in nearly all computers, and they have well-defined
operations that can be done with them. For example, the integer data type has operations for
addition, subtraction, multiplication, and division. The computer knows how to perform these
arithmetic operations with any two integers because these operations are part of the integer data
type.

We can extend the idea of data types to include more than just the basic data types. Our
definition of data types consists of two parts: 1) values, and 2) operations. Now suppose we
extended our definition so that the first part included data structures. This extension makes sense
because our data structures are really just novel ways of organizing values. We have already seen
several examples of these extended or abstract data types (ADTs).

Abstract

Now let’s look at the "abstract" portion of the phrase. The word abstract in our context stands for
"considered apart from the detailed specifications or implementation".

In Java, an Abstract Data Type is a class considered without regard to its implementation. It can
be thought of as a "description" of the data in the class, a list of operations that can be carried
out on that data, and instructions on how to use these operations. What is excluded, though, are
the details of how the methods carry out their tasks. As an end user (or class user), you should be
told what methods to call, how to call them, and the results to expect, but not HOW they work.

We can further extend the meaning of the ADT when applying it to data structures such as a
stack and queue. In Java, as with any class, it means the data and the operations that can be
performed on it. In this context though, even the fundamentals of how the data is stored should
be invisible to the user. Users not only should not know how the methods work, they should also
not know what structures are being used to store the data.

Consider, for example, the stack class. The end user knows that push() and pop() (among other
similar methods) exist and how to use them. The user doesn't, and shouldn't have to, know how
push() and pop() are implemented, or whether data is stored in an array, a linked list, or some
other data structure like a tree.
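This separation can be sketched in Java. The interface and class names below are illustrative, not from any standard library; the point is only that callers see the operations, never the storage.

```java
import java.util.ArrayDeque;

// A sketch of the stack ADT: users see only the operations, not the storage.
interface StackADT<T> {
    void push(T item);   // add an item to the top
    T pop();             // remove and return the top item
}

// One possible implementation. It happens to delegate to an ArrayDeque,
// but callers of StackADT cannot tell and should not care.
class DequeStack<T> implements StackADT<T> {
    private final ArrayDeque<T> items = new ArrayDeque<>();
    public void push(T item) { items.addFirst(item); }
    public T pop() { return items.removeFirst(); }
}
```

Swapping DequeStack for an array-backed or linked-list-backed class would leave every caller unchanged, which is exactly the "abstract" property described above.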

So, stop trying to memorize everything. Instead, start with the basics and learn to do two things:

 Visualize the data structure. Intuitively understand what the data structure looks like,
what it feels like to use it, and how it is structured both in the abstract and physically in
your computer’s memory.

 Learn when and how to use different data structures and their algorithms in your own
code. This is harder as a student, as class discussions alone won't impart this
knowledge. That's fine. Realize you won't master data structures until you are working
on real-world problems and discover when to use an array, a hash table, a tree, a
priority queue, and so on. As a student, focus on the practicalities rather than the minutiae.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Imagine that you are hired by company XYZ to organize all of their records into a computer
database. The first thing you are asked to do is to create a database of names with all the company's
management and employees. To start your work, you make a list of everyone in the company along with
their position.

Name       Position
Aaron      Manager
Charles    VP
George     Employee
Jack       Employee
Janet      VP
John       President
Kim        Manager
Larry      Manager
Martha     Employee
Patricia   Employee
Rick       Secretary
Sarah      VP
Susan      Manager
Thomas     Employee
Zack       Employee

But this list only shows one view of the company. You also want your database to represent the
relationships between management and employees at XYZ. Although your list contains both
name and position, it does not tell you which managers are responsible for which workers and so
on. After thinking about the problem for a while, you decide that a tree diagram is a much better
structure for showing the work relationships at XYZ.
These two diagrams are examples of different data structures. In one of the data structures, your
data is organized into a list. This is very useful for keeping the names of the employees in
alphabetical order so that we can locate the employee's record very quickly. However, this
structure is not very useful for showing the relationships between employees. A tree structure is
much better suited for this purpose.
In computer science, data structures are an important way of organizing information in a
computer. Just like the diagrams above illustrate, there are many different data structures that
programmers use to organize data in computers. Some data structures are similar to the tree
diagram because they are good for representing relationships between data. Other structures are
good for ordering data in a particular way like the list of employees. Each data structure has
unique properties that make it well suited to give a certain view of the data.

Computer Memory

To understand how a computer can represent large data structures like our tree diagram, we first
need to understand some basic facts about computer memory. Every piece of data that is stored
in a computer is kept in a memory cell with a specific address. We can think of these memory
cells as being a long row of boxes where each box is labeled with an address. If you have ever
used a computer spreadsheet before, you know that spreadsheets also have labeled boxes that can
hold data. Computer memory is similar to this with the exception that computer memory is
linear. That's why we think of computer memory as being organized in a row rather than a grid
like a spreadsheet.

The computer can store many different types of data in its memory. You have already learned
about some of the basic types of data the computer uses. These include integers, real numbers,
and characters. Once the computer stores data in the memory cells, it can access the data by
using the address of the data cells. For example, consider the following instructions for adding
two integers together.

Instructions:

1. Store '1' in cell 2003.
2. Store '5' in cell 2004.
3. Add cells 2003 and 2004 and store the result in cell 2006.

Notice how the computer performs operations by referring to the address of the memory cells.
These addresses are a very important component in creating various data structures in computer
memory. For example, suppose we want a data structure that can store a group of characters as a
word. In many computer languages, this data structure is called a string. If we store the
characters for the string 'apple' in the computer's memory, it might look something like this.

In order for the computer to recognize that 'apple' is a string, it must have some way of
identifying the start and end of the characters stored in memory. This is why the addresses of the
memory cells are important. By using the addresses to refer to a group of memory cells as a
string, the computer can store many strings in a row to create a list. This is one way that we
could create a data structure to represent our list of employees at company XYZ.
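In Java the raw addresses are hidden from the programmer, but the same idea survives: a String is a group of adjacent character cells whose start and length mark its boundaries. A minimal sketch (class and method names are illustrative):

```java
// 'apple' stored as five adjacent character cells; the array's start
// and length play the role of the start and end addresses in the text.
class StringCells {
    static String fromCells(char[] cells) {
        return new String(cells);   // interpret the cell group as one string
    }
}
```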

But what happens when we try to represent our tree diagram of company XYZ? It doesn't make
sense to store the names one after the other because the tree is not linear. Now we have a
problem. We want to represent a nonlinear data structure using computer memory that is linear.
In order to do this, we are going to need some way of mapping nonlinear structures like trees or
spreadsheet tables onto linear computer memory. Let’s see one solution to this problem.

Pointer Indirection

In our last lesson, we discovered a problem with representing data structures that are not linear.
We need some way to map these data structures to the computer's linear memory. One solution is
to use pointers. Pointers are memory locations that are stored in memory cells. By using a
pointer, one memory cell can "point" to another memory cell by holding a memory address
rather than data. Let's see how it works.

In the diagram above, the memory cell at address 2003 contains a pointer, an address of another
cell. In this case, the pointer is pointing to the memory cell 2005 which contains the letter 'c'.
This means that we now have two ways of accessing the letter 'c' as stored data. We can refer to
the memory cell which contains 'c' directly or we can use our pointer to refer to it indirectly. The
process of accessing data through pointers is known as indirection. We can also create multiple levels of
indirection using pointers. The diagram below shows an example of double indirection. Notice
that we must follow two pointers this time to reach the stored data.
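Java has no raw pointers, but indirection can be modeled by letting an array stand in for memory and storing cell indices as "addresses". This is a hypothetical sketch of the idea, not how the JVM actually works:

```java
// Memory modeled as an array of cells; a cell holds either data (a
// Character) or a pointer (an Integer index of another cell).
class PointerDemo {
    // Follow 'hops' pointers starting from 'address', then return the data.
    static char follow(Object[] memory, int address, int hops) {
        for (int i = 0; i < hops; i++) {
            address = (Integer) memory[address];  // one level of indirection
        }
        return (Character) memory[address];
    }
}
```

With the letter 'c' in cell 5, a pointer to it in cell 3, and a pointer to that pointer in cell 1, one hop from cell 3 or two hops from cell 1 both reach the same data, mirroring the single and double indirection diagrams.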

As you can see, pointers can become very complex and difficult to use with many levels of
indirection. In fact, when used incorrectly, pointers can make data structures very difficult to
understand. Whenever we use pointers in constructing data structures, we have to consider the
tradeoff between complexity and flexibility.

The idea of pointers and indirection is not exclusive to computer memory. Pointers appear in
many different aspects of computer use. A good example is hyperlinks in web pages. These links
are really pointers to another web page. Perhaps you have even experienced "double indirection"
when you went to visit a familiar web site and found the site had moved. Instead of the page you
expected, you saw a notice that the web pages had been moved and a link to the new site. Rather
than clicking a single link, you had to follow two links or two pointers to reach the web page.

Linear Data Structures

In our previous lesson, we saw that it is very simple to create data structures that are organized
similar to the way the computer's memory is organized. For example, the list of employees from
the XYZ company is a linear data structure. Since the computer's memory is also linear, it is very
easy to see how we can represent this list with the computer. Any data structure which organizes
the data elements one after the other is a linear data structure. So far we have seen two examples
of linear data structures: the string data structure (a list of characters) and the XYZ company list
(a list of strings).

Example String

Example List

You may have noticed that these two examples of linear data structures resemble each other.
This is because they are both really different kinds of lists. In general, all linear data structures
look like a list. However, this does not mean that all linear data structures are exactly the same.

Suppose I want to design a list to store the names of the XYZ employees in the computer. One
possible design is to organize the names similar to the example pictured above. Another possible
design is to use the pointers we learned about in the last lesson. While these two designs provide
the same functionality (i.e. a list that can hold names), the way they are implemented in the
computer is much different. This means that there is an abstract view of a list which is distinct
from any particular computer implementation. We will return to this idea of an abstract view of a
data structure later.

You may have also noticed that the example picture of the XYZ employees is not exactly the
same as the original list. Take another look at the employee list below. When we make a list of
names, we tend to organize this list in a column rather than a row. In this case, the conceptual
or logical representation of a list is a column of names. However, the physical representation of
the list in the computer's memory is a row of strings. For most data structures, the way that we
think about them is far different from the way they are implemented in the computer. In other
words, the physical representation is much different from the logical representation, especially
in data structures that use pointers.

Name
Aaron
Charles
George
Jack
Janet
John
Kim
Larry
Martha
Patricia
Rick
Sarah
Susan
Thomas
Zack

Ordered List: The Abstract View


The most common linear data structure is the list. By now you are already pretty familiar with
the idea of a list and at least one way of representing a list in the computer. Now we are going to
look at a particular kind of list: an ordered list. Ordered lists are very similar to the alphabetical
list of employee names for the XYZ company. These lists keep items in a specific order such as
alphabetical or numerical order. Whenever an item is added to the list, it is placed in the correct
sorted position so that the entire list is always sorted.

Before we consider how to implement such a list, we need to consider the abstract view of an
ordered list. Since the idea of an abstract view of a list may be a little confusing, let's think about
a more familiar example. Consider the abstract view of a television. Regardless of who makes a
television, we all expect certain basic things like the ability to change channels and adjust the
volume. As long as these operations are available and the TV displays the shows we want to
view, we really don't care about who made the TV or how they chose to construct it. The
circuitry inside the TV set may be very different from one brand to the next, but the functionality
remains the same. Similarly, when we consider the abstract view of an ordered list, we don't
worry about the details of implementation. We are only concerned with what the list does, not
how it does it.

Suppose we want a list that can hold the following group of sorted numbers: [2 4 6 7]. What are
some things that we might want to do with our list? Well, since our list is in order, we will need
some way of adding numbers to the list in the proper place, and we will need some way of
deleting numbers we don't want from the list. To represent these operations, we will use the
following notation:

AddListItem(List, Item)
RemoveListItem(List, Item)

Each operation has a name and a list of parameters the operation needs. The parameter list for the
AddListItem operation includes a list (the list we want to add to) and an item (the item we want
to add). The RemoveListItem operation is very similar except this time we specify the item we
want to remove. These operations are part of the abstract view of an ordered list. They are what
we expect from any ordered list regardless of how it is implemented in the computer.
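The two operations can be sketched in Java over a list of integers. The method names mirror the notation above; the linear scan shown is just one possible implementation, chosen for clarity:

```java
import java.util.ArrayList;
import java.util.List;

// AddListItem: insert the item at the position that keeps the list sorted.
// RemoveListItem: delete the first occurrence of the item, if present.
class OrderedListOps {
    static void addListItem(List<Integer> list, int item) {
        int i = 0;
        while (i < list.size() && list.get(i) < item) i++;  // find sorted spot
        list.add(i, item);
    }
    static void removeListItem(List<Integer> list, int item) {
        list.remove(Integer.valueOf(item));  // remove by value, not by index
    }
}
```

Adding 5 to the list [2 4 6 7] places it between 4 and 6, so the entire list stays sorted, exactly as the abstract view requires.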

Array Implementation

One approach to creating a list is simply to reserve a block of adjacent memory cells large
enough to hold the entire list. Such a block of memory is called an array. Of course, since we
will want to add items to our list, we need to reserve more than just four memory cells. For now,
we will make our array large enough to hold six numbers.

The array implementation has two disadvantages. First, the elements in the list must be kept in
sequence; that is, there must not be gaps in the list. If gaps were allowed, the computer would
not be able to determine which items are part of the list and which are not. For this reason,
ordered lists implemented with arrays are known as sequential lists.

The second disadvantage is that arrays have a fixed size and therefore limit the number of items
the list can contain. Of course we could try to increase the size of the array, but it may not always
be the case that the adjacent memory cells in the computer are available. They could be in use by
some other program. However, it is quite likely that the computer does have available memory at
some other non-adjacent location. To take advantage of this memory, we need to design our list
so that the list items do not have to be adjacent.

Pointer Implementation

A second approach to creating a list is to link groups of memory cells together using pointers.
Each group of memory cells is called a node. With this implementation every node contains a
data item and a pointer to the next item in the list. You can picture this structure as a chain of
nodes linked together by pointers. As long as we know where the chain begins, we can follow
the links to reach any item in the list. Often this structure is called a linked list.

Notice that the last memory cell in our chain contains a symbol called "Null". This symbol is a
special value that tells us we have reached the end of our list. You can think of this symbol as a
pointer that points to nothing. Since we are using pointers to implement our list, the list
operations AddListItem and RemoveListItem will work differently than they did for sequential
lists.
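A node in Java is simply an object holding a data item and a reference to the next node, with null playing the role of the "Null" end marker. A minimal sketch (the class name is illustrative):

```java
// One node of a linked list: a data item plus a pointer to the next node.
class Node {
    char data;
    Node next;
    Node(char data, Node next) { this.data = data; this.next = next; }

    // Follow the links from 'head' to the end, collecting every item.
    static String traverse(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {  // null marks the end
            sb.append(n.data);
        }
        return sb.toString();
    }
}
```

As long as we keep a reference to the first node, following the next links reaches every item in the chain.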

By implementing our list with pointers, we are able to avoid the two disadvantages we
discovered with using sequential lists. However, this does not mean that linked lists are the
perfect solution. Whenever we use indirection in building a data structure, it becomes much
harder to find mistakes. For example, it is very easy to assign the wrong address to a pointer and
"short circuit" our list. In general, sequential lists are simpler than linked lists but they are also
more limited. Linked lists give us a great amount of flexibility but this comes at the cost of
increased complexity.

Stack: Abstract View

Another common linear data structure is the stack. Just like we did with the ordered list, we will
examine the abstract view of a stack first and then look at a couple of ways a stack can be
implemented. In one way, a stack is very similar to a list except that a stack is more restricted.

With the stack, we have restricted the access to one end of the list by using the pop and push
operations. The result of this restriction is that items in the list pile one on top of the other. To get
to the bottom item, we must first remove all the items above it. This behavior is sometimes
described as "last-in, first-out" or LIFO since the last item to enter the stack is the first item to
leave the stack. With the stack, the top item is always the last item to enter the stack and it is
always the first item to leave the stack since no other items can be removed until the top item is
removed.

Let's take another look at the operations that can be performed on a stack. We will represent
these two operations with the following notation:

PushStackItem(Stack, Item)
Item PopStackItem(Stack)

The PushStackItem operation has two parameters which are a stack and an item. This operation
adds the item to the top of the specified stack. The PopStackItem operation only takes one
parameter which is a stack. However, notice that this operation has the keyword Item listed to
the left. This keyword represents the item that is removed from the top of the stack when the
PopStackItem operation is done. These two operations are part of the abstract view of a stack.
They are what we expect from any stack regardless of how it is implemented in the computer.

Stack: The Implementation View

As we did with the ordered list, we are going to look at two implementations of a stack. The first
implementation uses an array to create the stack data structure, and the second implementation
uses pointers.

Array Implementation

In order to implement a stack using an array, we need to reserve a block of memory cells large
enough to hold all the items we want to put on the stack. The picture below shows an array of six
memory cells that represent our stack. Notice that we have one other memory cell called a stack
pointer that holds the location of the top of our stack. As the stack grows and shrinks, this pointer
is updated so that it always points to the top item of the stack.

Notice that our array implementation retains one of the problems we saw with the array
implementation of an ordered list. Since our array is a fixed size, our stack can only grow to a
certain size. Once our stack is full, we will have to use the PopStackItem operation before we
can push any more items onto the stack. To make the size of our stack more flexible, we can use
pointers to implement the stack.
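The array stack just described might be sketched in Java as follows; the top field plays the role of the stack pointer, and the names are illustrative:

```java
// Fixed-size array stack with an explicit "stack pointer" (top index).
// A sketch of the idea in the text, not production code.
class ArrayStack {
    private final int[] cells;
    private int top = -1;            // stack pointer: index of the top item

    ArrayStack(int capacity) { cells = new int[capacity]; }

    boolean isFull() { return top == cells.length - 1; }

    void push(int item) {
        if (isFull()) throw new IllegalStateException("stack full");
        cells[++top] = item;         // advance the pointer, then store
    }

    int pop() {
        if (top < 0) throw new IllegalStateException("stack empty");
        return cells[top--];         // read the top, then retreat the pointer
    }
}
```

The fixed array size is visible in isFull(): once the stack pointer reaches the last cell, push refuses new items until pop frees a cell.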

Pointer Implementation

In order to implement a stack using pointers, we need to link nodes (groups of memory cells)
together just like we did for the pointer implementation of a list. Each node contains a stack item
and a pointer to the next node. We also need a special pointer to keep track of the top of our
stack.

Notice that the stack operations can get a little tricky when we use pointers. To push an item onto
the stack, we need to find a free memory location, set the pointer of the new location to the top of
the stack, and finally set the stack pointer to the new location. The order of these operations is
very important. If we set the stack pointer to the location of the new memory first, we will lose
the location of the top of our stack. This example shows the same tradeoff that we saw earlier
with the ordered list implementations. While the array implementation is simpler, the added
complexity of the pointer implementation gives us a more flexible stack.
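The ordering issue can be seen in a Java sketch of the linked stack: the new node must capture the old top before the stack pointer moves. In Java the two steps can safely collapse into one expression, since the constructor argument is evaluated first (class and field names are illustrative):

```java
// Linked stack: each node holds an item and a pointer to the node below it.
class LinkedStack {
    private static class Node {
        int item;
        Node next;
        Node(int item, Node next) { this.item = item; this.next = next; }
    }
    private Node top;   // stack pointer: reference to the top node

    void push(int item) {
        // The new node points at the old top BEFORE 'top' is reassigned,
        // so the rest of the stack is never lost.
        top = new Node(item, top);
    }
    int pop() {
        int item = top.item;
        top = top.next;   // the node below becomes the new top
        return item;
    }
    boolean isEmpty() { return top == null; }
}
```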

Queues: The Abstract View

The final linear data structure that we will examine is the queue. Like the stack, the queue is a
type of restricted list. However, instead of restricting all the operations to one end of the list as a
stack does, the queue allows items to be added at one end of the list and removed at the other end.

The restrictions placed on a queue cause this structure to be a "first-in, first-out" or FIFO
structure. This idea is similar to customer lines at a grocery store. When customer X is ready to
check out, he or she enters the tail of the waiting line. When the preceding customers have paid,
then customer X pays and exits the head of the line. The check-out line is really a queue that
enforces a "first come, first served" policy.

Now let's take another look at the operations that can be performed on a queue. We will represent
these two operations with the following notation:

EnqueueItem(Queue, Item)
Item DequeueItem(Queue)

These two operations are very similar to the operations we learned for the stack data structure.
Although the names are different, the logic of the parameters is the same. The EnqueueItem
operation takes the Item parameter and adds it to the tail of Queue. The DequeueItem operation
removes the head item of Queue and returns this as Item. Notice that we represent the returned
item with a keyword located to the left of the operation name. These two operations are part of
the abstract view of a queue. Regardless of how we choose to implement our queue on the
computer, the queue must support these two operations.
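The same pair of operations maps directly onto Java's built-in Queue interface, where offer adds at the tail and poll removes from the head. A small sketch (the wrapper names mirror the notation above):

```java
import java.util.ArrayDeque;
import java.util.Queue;

// EnqueueItem and DequeueItem expressed with Java's Queue interface.
class QueueOps {
    static <T> void enqueueItem(Queue<T> queue, T item) {
        queue.offer(item);    // add at the tail
    }
    static <T> T dequeueItem(Queue<T> queue) {
        return queue.poll();  // remove and return the head item
    }
}
```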

Queues: The Implementation View

When we looked at the ordered list and stack data structures, we saw two different ways to
implement each one. Although the implementations were different, the data structure was still
the same from the abstract point of view. We could still use the same operations on the data
structures regardless of their implementations. With the queue, it is also possible to have various
implementations that support the operations EnqueueItem and DequeueItem. However, in this
lesson, we are only going to focus on one implementation in order to highlight another
distinction: the distinction between the logical representation of a queue and the physical
representation of a queue. Remember that the logical representation is the way that we think of
the data being stored in the computer. The physical representation is the way the data is actually
organized in the memory cells.

To implement our queue, we will use an array of eight memory cells and two pointers to keep
track of the head and tail of the queue. The diagram below shows a snapshot of a queue in the
computer's memory. The queue currently contains five letter items with 'L' at the head of the
queue and 'O' at the tail of the queue.

Now let's consider how the EnqueueItem and DequeueItem operations might be implemented.
To enqueue letters into the queue, we could advance the tail pointer one location and add the new
letter. To dequeue letters, we could remove the head letter and increase the head pointer one
location. While this approach seems very straightforward, it has a serious problem. As items are
added and removed, our queue will march straight through the computer's entire memory. We
have not limited the size of our queue.

Perhaps we could limit the size of the queue by not allowing the tail pointer to advance beyond a
certain location. This implementation would stop the queue from traversing the entire memory,
but it would only allow us to fill the queue one time. Once the head and tail pointers reached the
stop location, our queue would no longer work.

What we really need is a way to make our array circular. Of course, we know that the computer's
memory is linear, so we can't change the physical representation of the data. However, we can
implement our operations in such a way that our queue acts like it was a ring or a circle. In other
words, we can create a logical representation that is different from the physical representation in
memory.

Circular Queue

In a normal queue data structure, we can insert elements until the queue becomes full. But once
the queue is full, we cannot insert the next element, even after elements are deleted from the
queue. For example, consider the queue below:

After inserting all the elements into the queue.

Now consider the following situation even after deleting three elements from the queue:

We still get the message that the queue is full, and we cannot insert the new element. Even
though we have empty positions in the queue, we still cannot make use of them to insert a new
element. This is the major problem with the normal queue data structure. To overcome this
problem we use the circular queue data structure.

What is Circular Queue?

A Circular Queue can be defined as follows:

A circular queue is a linear data structure in which the operations are performed based on the
FIFO (First In, First Out) principle and the last position is connected back to the first position to
make a circle. Here's a graphical representation of a circular queue:

Example 2

We are given an array A with two indices, back and front. Each time we insert (enqueue) a new
item, we increase the back index; each time we remove (dequeue) an item, we increase the front
index. Here is a picture that illustrates the model:

As you can see from the picture, the queue logically moves in the array from left to right. After
several moves, back reaches the end, leaving no space for adding new elements.

However, there is free space before the front index. We can use that space for enqueueing new
items; that is, the next entry will be stored at index 0, then 1, and so on until we reach front.
Such a model is called a wrap-around queue or a circular queue. Finally, when back reaches
front, the queue is full.
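The wrap-around behavior can be sketched in Java with modulo arithmetic doing the wrapping; the class and field names are illustrative:

```java
// Circular (wrap-around) queue over a fixed array; the front and back
// indices advance modulo the array length, so they wrap to index 0
// after passing the last cell.
class CircularQueue {
    private final char[] cells;
    private int front = 0, back = 0, count = 0;

    CircularQueue(int capacity) { cells = new char[capacity]; }

    boolean isFull()  { return count == cells.length; }
    boolean isEmpty() { return count == 0; }

    void enqueue(char item) {
        if (isFull()) throw new IllegalStateException("queue full");
        cells[back] = item;
        back = (back + 1) % cells.length;   // wrap past the end to index 0
        count++;
    }

    char dequeue() {
        if (isEmpty()) throw new IllegalStateException("queue empty");
        char item = cells[front];
        front = (front + 1) % cells.length; // head pointer wraps the same way
        count--;
        return item;
    }
}
```

The count field distinguishes the full queue from the empty one, since in both cases back equals front.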

Nonlinear Data Structures

In the first lesson of this section, we discussed the problem of representing relationships between
employees at the XYZ Company. Since these relationships are not linear, we could not
adequately show the relationships using a linear data structure like a list or a stack. Instead, we
needed something that looks more like a tree.

A tree is just one example of a nonlinear data structure. Two other examples are
multidimensional arrays and graphs. In the next few lessons, we will examine these data
structures to see how they are represented using the computer's linear memory. Remember that in
the last lesson we saw that we could create a logical representation of a circular queue. Although
the actual memory locations were just an array (a group of linear memory cells), we made them
seem to be circular by wrapping our pointers around to the front of the array each time they
reached the end. This example demonstrated that there are two ways of representing our data
structure. The logical representation was a circle or a loop, while the physical representation was
a simple linear array.

Multidimensional Arrays

Let's return to our example of the XYZ company. After you finish creating a list of the
company's employees, you are asked to make an electronic time sheet that shows the number of
hours that each employee worked during the week. You begin by associating each data point
(hours worked) with two labels: employee's name and day of the week. To organize the data, you
use the following logical representation.

           Mon  Tue  Wed  Thu  Fri

Aaron       8    9    3    4   10
Charles     8    8    7    7    5
George      7    6    5    4    7
Jack        6    9    7    6    7
Janet       6    7    9    7    6
John        9    5    7    8    7
Kim         7    8    9    7    9
Larry       8    4    6    8    7
Martha      2    8    6    7    9
Patricia    5    9    8    5    7
Rick        8    9    8    6    5
Sarah       6    9    8    7    6
Susan       9    9    7    6    9
Thomas      4    6    5    4    9
Zack        1   10    7    8    9

Now we have a new data structure called a two-dimensional array. We can see how the data
structure gets its name by comparing it with a typical array. All the arrays we have seen so far
were simply a group of contiguous memory cells. Since computer memory is linear, the arrays
were also linear or one-dimensional. Notice, however, that each row of the table above looks like
a typical, linear array. Our table is really a collection of one-dimensional arrays with five
memory cells. Each memory cell represents a day of the week, and each array in the table
represents an employee. Just like a one-dimensional array is a collection of memory cells, a two-
dimensional array is a collection of one-dimensional arrays.

It is common to refer to array locations by specifying the row and column numbers after the
array name. If we named our table above "Hours" then we could find the number of hours that
Aaron worked on Wednesday at the following array location: Hours[1, 3].

We can also have higher dimensional arrays that are collections of lower dimensional arrays. For
example, we could organize the time sheet for one month by making a three-dimensional array.
This array would be a collection of four weeks of time sheets which are two-dimensional arrays.

Then we could find the number of hours that Aaron worked during the second Wednesday of the
month at the following array location: Hours[2, 1, 3]. The '2' represents the second week of the
month, the '1' represents the employee Aaron, and the '3' represents Wednesday, the third day of
the week.
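The indexing described above can be sketched in Python with nested lists. Note that Python indexes from 0, while the text counts rows and columns from 1, so Hours[1, 3] becomes hours[0][2] here:

```python
# Two-dimensional "Hours" table: one inner list (row) per employee,
# one cell per weekday (Mon..Fri). Values taken from the table above.
hours = [
    [8, 9, 3, 4, 10],   # Aaron
    [8, 8, 7, 7, 5],    # Charles
    [7, 6, 5, 4, 7],    # George
]

# Hours[1, 3] in the text (Aaron, Wednesday) becomes hours[0][2] here.
aaron_wed = hours[0][2]

# A three-dimensional array adds a week index: month[week][employee][day].
# Four identical weeks are used purely for illustration.
month = [hours, hours, hours, hours]
aaron_wed_week2 = month[1][0][2]    # Hours[2, 1, 3] in the text
```

Both lookups return 3, the number of hours Aaron worked on Wednesday.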

Trees

Another common nonlinear data structure is the tree. We have already seen an example of a tree
when we looked at the employee hierarchy from the XYZ company. Let's take another look at
this diagram with some of the important features of trees highlighted.

In this diagram, we can see that the starting point, or the root node, is circled in blue. A node is a
simple structure that holds data and links to other nodes. In this case, our root node contains the
data string "John" and three links to other nodes. Notice that the group of nodes circled in red do
not have any links. These nodes are at the ends of the branches and they are appropriately called
leaves or leaf nodes. In our diagram, the nodes are connected with solid black lines called arcs or
edges. These edges show the relationships between nodes in the tree. One important relationship
is the parent/child relationship. Parent nodes have at least one edge to a node lower in the tree.
This node is called the child node. Nodes can have more than one child, but children can only
have a single parent. Notice that the root node has no parent, and the leaf nodes have no children.
The final feature to note in our diagram is the subtree. At each level of the tree, we can see that
the tree structure is repeated. For example, the two nodes representing "Charles" and "Rick"
compose a very simple tree with "Charles" as the root node and "Rick" as a single leaf node.

Binary Tree

In a normal tree, every node can have any number of children. A binary tree is a special type of
tree data structure in which every node can have a maximum of 2 children: one is known as the
left child and the other is known as the right child.

A tree in which every node can have a maximum of two children is called Binary Tree.

In a binary tree, every node can have 0, 1, or 2 children, but never more than 2. Example:

There are different types of binary trees:

1. Strictly Binary Tree

In a binary tree, every node can have a maximum of two children, but in a strictly binary tree
every node must have exactly two children or none. That means every internal node
must have exactly two children. A strictly binary tree can be defined as follows...

A binary tree in which every node has either two children or none is called a Strictly
Binary Tree. A strictly binary tree is also called a Full Binary Tree or Proper Binary Tree.

2. Complete Binary Tree

In a binary tree, every node can have a maximum of two children, and in a strictly binary tree
every node has exactly two children or none. In a complete binary tree, in addition, every level
must be completely filled: at level l there must be 2^l nodes. For example, at level 2 there must be
2² = 4 nodes and at level 3 there must be 2³ = 8 nodes.

A binary tree in which every internal node has exactly two children and all leaf nodes are at the same
level is called a Complete Binary Tree. A complete binary tree is also called a Perfect Binary
Tree. Example:

Now let's examine one way that trees are implemented in the computer's memory. We will begin
by introducing a simple tree structure called a binary tree. Binary trees have the restriction that
nodes can have no more than two children. With this restriction, we can easily determine how to
represent a single binary node in memory. Our node will need to reserve memory for data and
two pointers.

Using our binary node, we can construct a binary tree. In the data cell of each node, we will store
a letter. The physical representation of our tree might look something like this:

Although the diagram above represents a tree, it doesn't look much like the tree we examined
from the XYZ company. Because our tree uses pointers, the physical representation is quite
different from the logical representation. Starting with the root node of the binary tree (the node
that contains 'H'), see if you can draw a sketch of the logical representation of this tree.
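As a minimal sketch, a binary node with one data cell and two pointers might look like this in Python. Since the figure is not reproduced here, the letters below the root 'H' are illustrative assumptions:

```python
class Node:
    """A binary tree node: one data cell plus two child pointers."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

# An assumed tree shape with 'H' as the root, as the text states.
root = Node('H',
            Node('D', Node('B'), Node('F')),
            Node('L', Node('J'), Node('N')))

def preorder(node):
    """Visit the root, then the left subtree, then the right subtree."""
    if node is None:
        return []
    return [node.data] + preorder(node.left) + preorder(node.right)
```

Walking the tree recovers the logical representation from the physical one: `preorder(root)` lists 'H' first, then the letters of its left and right subtrees.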

Graphs

The last data structure that we will study in this section is the graph. Graphs are similar to trees
except they do not have as many restrictions. In the previous lesson, we saw that every tree has a
root node, and all the other nodes in the tree are children of this node. We also saw that nodes
can have many children but only one parent. When we relax these restrictions, we get the graph
data structure. The logical representation of a typical graph might look something like this:

Notice that our graph does not have a root node like the tree data structure did. Instead, any node
can be connected with any other node. Nodes do not have a clear parent/child relationship like
we saw in the tree. Instead nodes are called neighbors if they are connected by an edge. For
example, node A above has three neighbors: B, C, and D.

It is not hard to imagine how the graph data structure could be useful for representing data.
Perhaps each of the nodes above could represent a city and the edges connecting the nodes could
represent roads. Or we could use a graph to represent a computer network where the nodes are
workstations and the edges are network connections. Graphs have so many applications in
computer science and mathematics that several algorithms have been written to perform standard
graph operations such as searching the graph and finding the shortest path between nodes of a
graph.

Now that you have a basic idea of the logical representation of graphs, let's take a look at one
way that graphs are commonly represented in computers. The representation is called an
adjacency matrix, and it uses a two-dimensional array to store information about the graph
nodes. The adjacency matrix for our graph is given below.

     A   B   C   D   E   F

A   --   1   1   1  --  --
B    1  --   1  --   1  --
C    1   1  --  --  --  --
D    1  --  --  --   1   1
E   --   1  --   1  --  --
F   --  --  --   1  --  --

Notice that the matrix has six rows and six columns, labeled with the nodes from the graph. We
mark a '1' in a cell if there exists an edge between the two nodes that index that cell. For example,
since we have an edge between A and B, we mark a '1' in the cell at row A, column B, and in the
cell at row B, column A. With our adjacency matrix, we can represent every possible edge that
our graph can have.
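A sketch of this adjacency matrix in Python, using 0 in place of '--':

```python
# Adjacency matrix for the six-node graph above (0 = no edge, 1 = edge).
# Row/column order: A, B, C, D, E, F.
nodes = ['A', 'B', 'C', 'D', 'E', 'F']
adj = [
    [0, 1, 1, 1, 0, 0],   # A
    [1, 0, 1, 0, 1, 0],   # B
    [1, 1, 0, 0, 0, 0],   # C
    [1, 0, 0, 0, 1, 1],   # D
    [0, 1, 0, 1, 0, 0],   # E
    [0, 0, 0, 1, 0, 0],   # F
]

def neighbors(name):
    """Return the names of all nodes sharing an edge with `name`."""
    row = adj[nodes.index(name)]
    return [nodes[j] for j, cell in enumerate(row) if cell == 1]
```

As the text notes, node A has three neighbors, and `neighbors('A')` returns exactly B, C, and D.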

Characteristics of Data Structures

Data Structure   Operations

Ordered List     AddListItem(List, Item)
                 RemoveListItem(List, Item)

Stack            PushStackItem(Stack, Item)
                 Item PopStackItem(Stack)

Queue            EnqueueItem(Queue, Item)
                 Item DequeueItem(Queue)

(The "Abstract View" column of the original table contained a figure for each data structure.)

Notice that each of the ADTs above has the two elements required by our definition: (1) a
particular data structure, and (2) operations related to the data structure. We call these data types
"abstract" because we have said nothing about how they are implemented. Instead, we have
defined an interface for using the data type which consists of certain operations. The interface for
an ADT remains the same regardless of how the data structure and operations are implemented.
For example, we can use the stack ADT in a computer program by using the two operations
defined in the stack interface. We do not need to know whether these operations were
implemented using a linked list or an array.
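The point about implementation independence can be sketched in Python: the client function below uses only the stack interface (push and pop, simplified names chosen for illustration), so it works unchanged whether the stack is backed by an array or a linked list:

```python
class ArrayStack:
    """Stack backed by a contiguous array (a Python list)."""
    def __init__(self):
        self._items = []
    def push(self, item):
        self._items.append(item)
    def pop(self):
        return self._items.pop()

class LinkedStack:
    """Stack backed by a singly linked list of (item, next) pairs."""
    def __init__(self):
        self._top = None
    def push(self, item):
        self._top = (item, self._top)   # new node links to the old top
    def pop(self):
        item, self._top = self._top
        return item

def reverse(stack, values):
    """Client code written purely against the stack interface:
    it never learns which implementation it is using."""
    for v in values:
        stack.push(v)
    return [stack.pop() for _ in values]
```

Calling `reverse` with either stack yields the same result, which is exactly what the ADT's interface guarantees.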

Hashing

In search techniques such as linear search, binary search, and search trees, the time required to
find a data element depends on the total number of data elements in that data structure: as the
number of elements grows, so does the time required to find an element.

Hashing is a different approach, in which the time required to find a data element does not depend on
the number of data elements. Using a hashing data structure, an element can be found in constant
time (on average). Hashing is an effective way to reduce the number of comparisons needed to find an
element in a data structure.

Hashing is the process of indexing and retrieving an element (data) in a data structure so as to
provide a faster way of finding the element using a hash key.

Here, the hash key is a value which provides the index at which the actual data is stored
in the data structure.

In this data structure, we use a concept called a hash table to store data. Every data value is
inserted into the hash table based on its hash key value, which maps the data to an index in the
hash table. The hash key is generated for each data item by a hash function, so every entry in the
hash table is placed according to a key value produced by that function. A hash table is defined
as follows:

A hash table is just an array which maps a key (data) into the data structure with the help of a
hash function, such that insertion, deletion, and search operations can be performed in
constant time.

Hash tables are used to perform insertion, deletion, and search operations very quickly in
a data structure. Using a hash table, these operations are accomplished in constant time on
average. Generally, every hash table makes use of a function, which we'll call the hash function,
to map data into the hash table. A hash function is defined as follows:

A hash function is a function which takes a piece of data (i.e. a key) as input and outputs an
integer (i.e. a hash value) which maps the data to a particular index in the hash table.

The basic concept of hashing and hash tables is described below.
A hash table is made up of two parts: an array (the actual table where the data to be searched is
stored) and a mapping function, known as a hash function. The hash function is a mapping
from the input space to the integer space that defines the indices of the array. In other words, the
hash function provides a way for assigning numbers to the input data such that the data can then
be stored at the array index corresponding to the assigned number.
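A minimal sketch of this idea in Python, with a toy hash function (summing character codes, an assumption for illustration) and a plain array as the table. Collisions are ignored here and discussed later:

```python
TABLE_SIZE = 11                 # small, illustrative table size

def hash_key(key):
    """Toy hash function: sum the character codes of the key, then
    take the remainder modulo the table size to get an array index."""
    return sum(ord(ch) for ch in key) % TABLE_SIZE

table = [None] * TABLE_SIZE     # the array part of the hash table

def insert(key, value):
    """Store (key, value) at the index the hash function assigns."""
    table[hash_key(key)] = (key, value)

def search(key):
    """Look only at the one index the hash function assigns."""
    entry = table[hash_key(key)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None
```

Note that `search` inspects a single slot regardless of how many items are stored, which is what gives hashing its constant-time behavior.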

Hash Table

How does a hash table work?

Here's a breakdown:

Let's assume you want to fill up a library with books; not just stuff them in there, but you want to
be able to easily find them again when you need them.

So, you decide that if the person who wants to read a book knows the exact title of the book, then
that's all it should take. With the title, the person, with the aid of the librarian, should be able to
go find the book easily and quickly.

So, how can you do that? Well, obviously you can keep some kind of list of where you put each
book, but then you have the same problem as searching the library because then you need to
search the list. Granted, the list would be smaller and easier to search, but still you don't want to
search sequentially from one end of the library (or list) to the other.

You want something that, with the title of the book, can give you the right spot at once, so all
you have to do is just stroll over to the right shelf, and pick up the book.

But how can that be done? Well, with a bit of forethought when you are filling up the library.

Instead of just starting to fill up the library from one end to the other, you devise a clever little
method. You take the title of the book, run it through a small computer program, which spits out
a shelf number and a slot number on that shelf. This is where you place the book.

The beauty of this program is that later on, when a person comes back in to read the book, you
feed the title through the program once more, and get back the same shelf number and slot
number that you were originally given, and this is where the book is located.

The program is called a hash algorithm or hash computation, and usually works by taking the
data fed into it (the title of the book in this case) and calculating a number from it.

For simplicity, let's say that it just converts each letter and symbol into a number, and sums them
all up. In reality it's a lot more complicated than that, but let's leave it at that for now.

The beauty of such an algorithm is that if you feed the same input into it again and again, it will
keep spitting out the same number each time.

Ok, so that's basically how a hash table works.

Technical stuff

First, there's the size of the number. Usually, the output of such a hash algorithm falls within some
large range, typically much larger than the space you have in your table. For instance,
let's say that we have room for exactly one million books in the library. The output of the hash
calculation could be anywhere in the range of 0 to one billion: far more than the number of slots.

So what do we do? We use something called modulus calculation. The idea is that you count up to
the number you want (i.e. the hash value), but to stay inside a much smaller range, each time you
hit the limit of that smaller range you start back at 0; the number you land on when you finish
counting is the result.

Say that the output of the hash algorithm is in the range of 0 to 20, and you get the value 17 from
a particular title. If the size of the library is only 7 books, you count 0, 1, 2, 3, 4, 5, 6, and when
you get to 7, you start back at 0. Since we need to count 17 times, we have 0, 1, 2, 3, 4, 5, 6, 0, 1,
2, 3, 4, 5, 6, 0, 1, 2, 3, and the final number is 3.

Of course modulus calculation isn't done like that, it's done with division, and remainder. The
remainder of dividing 17 by 7 is 3 (7 goes 2 times into 17, to 14, and the difference between 17
and 14 is 3).

Thus, you put the book in slot number 3.
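The worked example above, expressed directly in code:

```python
# A hash value of 17 must fit into a library (table) with only 7 slots.
hash_value = 17
table_size = 7

# Modulus calculation: the remainder after dividing 17 by 7.
slot = hash_value % table_size   # 7 goes into 17 twice (14); 17 - 14 = 3
```

The `%` operator performs in one step the wrap-around counting described above, so `slot` is 3.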

This leads to the next problem: collisions. Since the algorithm has no way to space out the books
so that they fill the library exactly (or the hash table, if you will), it will invariably end up
calculating a number that has been used before. In the library sense, when you get to the shelf
and the slot number where you wish to put a book, there's already a book there.

Various collision handling methods exist, including running the data through yet another calculation
to get another spot in the table, or simply finding a space close to the one you were given (i.e.
right next to the previous book). This would mean that you have some digging to do when you
try to find the book later, but it's still better than simply starting at one end of the library.

Finally, at some point, you might want to put more books into the library than the library allows.
In other words, you need to build a bigger library. Since the exact spot in the library was
calculated using the exact, and current, size of the library, it follows that if you resize the library,
you might end up having to find new spots for all the books, since the calculation done to find
their spots has changed.
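A short sketch of why resizing forces re-placement: the slot depends on the table size, so the same title maps to a different slot once the size changes (the hash function below is an illustrative assumption):

```python
def slot_for(title, table_size):
    """Home slot for a title in a table of the given size: the toy
    character-code sum, taken modulo the current table size."""
    return sum(ord(ch) for ch in title) % table_size

# The same book lands in a different slot once the library is enlarged,
# so every stored title must be re-hashed after a resize.
old_slot = slot_for("Hamlet", 7)
new_slot = slot_for("Hamlet", 13)
```

Because `old_slot` and `new_slot` differ, a lookup against the old position would miss the book; this is why resizing a hash table typically rehashes every entry into the new array.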

