Iare Ds Lecture Notes 2vish
Iare Ds Lecture Notes 2vish
Iare Ds Lecture Notes 2vish
Notes
Department of Computer Science & Engineering
3CS4-05 Data Structures and Algorithms -Total Lecture Required:
Faculty Name : Asst.Professor Vaishali Sharma IA(MT1+MT2+Assignment) : 30 Marks
A data structure is a way of storing data in a computer so that it can be used efficiently and it will allow
the most efficient algorithm to be used. A well-designed data structure allows a variety of critical
operations to be performed, using as few resources, both execution time and memory space, as possible.
Data structure introduction refers to a scheme for organizing data, or in other words it is an arrangement
of data in computer's memory in such a way that it could make the data quickly available to the
processor for required calculations.
A data structure should be seen as a logical concept that must address two fundamental concerns.
Linear data structures can be constructed as a continuous arrangement of data elements in the memory.
It can be constructed by using array data type. In the linear Data Structures the relationship of adjacency
is maintained between the data elements.
1. Add an element
2. Delete an element
3. Traverse
4. Sort the list of elements
5. Search for a data element
1
For example Stack, Queue, Array and Linked Lists.
Non-linear data structure can be constructed as a collection of randomly distributed set of data item
joined together by using a special pointer (tag). In non-linear Data structure the relationship of
adjacency is not maintained between the data items.
Algorithms:
1. Input Step
2. Assignment Step
3. Decision Step
4. Repetitive Step
5. Output Step
2. Definiteness: The steps of the algorithm must be precisely defined or unambiguously specified.
3. Generality: An algorithm must be generic enough to solve all problems of a particular class.
4. Effectiveness: the operations of the algorithm must be basic enough to be put down on pencil and
paper. They should not be too complex to warrant writing another algorithm for the operation.
5. Input-Output: The algorithm must have certain initial and precise inputs, and outputs that may be
generated both at its intermediate and final steps.
An algorithm does not enforce a language or mode for its expression but only demands adherence to its
properties.
1. To save time (Time Complexity): A program that runs faster is a better program.
2
2. To save space (Space Complexity): A program that saves space over a competing program is
considerable desirable.
Efficiency of Algorithms:
The performances of algorithms can be measured on the scales of time and space. The performance of
a program is the amount of computer memory and time needed to run a program. We use two
approaches to determine the performance of a program.
Time Complexity: The time complexity of an algorithm or a program is a function of the running time
of the algorithm or a program. In other words, it is the amount of computer time it needs to run to
completion.
Space Complexity: The space complexity of an algorithm or program is a function of the space needed
by the algorithm or program to run to completion.
Stack:
A stack is a container of objects that are inserted and removed according to the last-in first-out (LIFO)
principle. In the pushdown stacks only two operations are allowed: push the item into the stack, and pop
the item out of the stack. A stack is a limited access data structure - elements can be added and removed
from the stack only at the top. Push adds an item to the top of the stack, pop removes the item from the
top. A helpful analogy is to think of a stack of books; you can remove only the top book, also you can
add a new book on the top.
A stack may be implemented to have a bounded capacity. If the stack is full and does not contain enough
space to accept an entity to be pushed, the stack is then considered to be in an overflow state. The pop
operation removes an item from the top of the stack. A pop either reveals previously concealed items or
results in an empty stack, but, if the stack is empty, it goes into underflow state, which means no items
are present in stack to be removed.
Stack is an Abstract data structure works on the principle Last In First Out (LIFO). The last element add
to the stack is the first element to be delete. Insertion and deletion can be takes place at one end called
TOP. It looks like one side closed tube.
The add operation of the stack is called push operation
3
The delete operation is called as pop operation.
Push operation on a full stack causes stack overflow.
Pop operation on an empty stack causes stack underflow.
SP is a pointer, which is used to access the top element of the stack.
If you push elements that are added at the top of the stack;
In the same way when we pop the elements, the element at the top of the stack is deleted.
Operations of stack:
There are two operations applied on stack they are
1. push
2. pop.
While performing push & pop operations the following test must be conducted on the stack.
1) Stack is empty or not
2) Stack is full or not
Push:
Push operation is used to add new elements in to the stack. At the time of addition first check the stack is
full or not. If the stack is full it generates an error message "stack overflow".
Pop:
Pop operation is used to delete elements from the stack. At the time of deletion first check the stack is
empty or not. If the stack is empty it generates an error message "stack underflow".
Representation of a Stack using Arrays:
Let us consider a stack with 6 elements capacity. This is called as the size of the stack. The number of
elements to be added should not exceed the maximum size of the stack. If we attempt to add new
element beyond the maximum size, we will encounter a stack overflow condition. Similarly, you cannot
remove elements beyond the base of the stack. If such is the case, we will reach a stack underflow
condition.
4
When an element is taken off from the stack, the operation is performed by pop().
STACK: Stack is a linear data structure which works under the principle of last in first out. Basic
operations: push, pop, display.
1. PUSH: if (top==MAX), display Stack overflow else reading the data and making stack [top]
=data and incrementing the top value by doing top++.
2. POP: if (top==0), display Stack underflow else printing the element at the top of the stack and
decrementing the top value by doing the top.
3. DISPLAY: IF (TOP==0), display Stack is empty else printing the elements in the stack from
stack [0] to stack [top].
We can represent a stack as a linked list. In a stack push and pop operations are performed at one end
called top. We can perform similar operations at one end of list using top pointer.
Stack Applications:
1. Stack is used by compilers to check for balancing of parentheses, brackets and braces.
5
4. In recursion, all intermediate arguments and return values are stored on the processor‟s stack.
5. During a function call the return address and arguments are pushed onto a stack and on return
they are popped off.
6. Depth first search uses a stack data structure to find an element from a graph.
Factorial(n)
Input: integer n ≥ 0
Output: n!
Procedure:
b) If the scanned symbol is an operand, then place directly in the postfix expression
(output).
c) If the symbol scanned is a right parenthesis, then go on popping all the items from
the stack and place them in the postfix expression till we get the matching left
parenthesis.
d) If the scanned symbol is an operator, then go on removing all the operators from the
stack and place them in the postfix expression, if and only if the precedence of the
operator which is on the top of the stack is greater than (or equal) to the precedence
of the scanned operator and push the scanned operator onto the stack otherwise,
push the scanned operator onto the stack.
6
Convert the following infix expression A + B * C – D / E * H into its equivalent postfix expression.
A A
+ A +
B AB +
* AB +*
C ABC -
- ABC *+ -
D ABC *+D -
/ ABC *+D -/
E ABC *+DE -/
* ABC *+DE/ -*
H ABC *+DE/H -*
End of The input is now empty. Pop the output symbols from the
string ABC *+DE/H*- stack until it is empty.
Procedure:
The postfix expression is evaluated easily by the use of a stack. When a number is seen, it is pushed onto
the stack; when an operator is seen, the operator is applied to the two numbers that are popped from the
stack and the result is pushed onto the stack.
6 6
5 6, 5
2 6, 5, 2
7
Next a „+‟ is read, so 3 and 2
+ 2 3 5 6, 5, 5 are popped from the stack and
their sum 5, is pushed
8 2 3 5 6, 5, 5, 8 Next 8 is pushed
Recursion:
In general terms recursion means the process to define a problem or the solution for a problem in a much
simpler way compared to the original version. It is a problem-solving programming technique that has a
remarkable and unique characteristic.
In recursion in data structure, a method or a function has the capability to decode an issue. In the process of
recursion, a problem is resolved by transforming it into small variations of itself. In this procedure, the
function can call itself either directly or indirectly. Based on the differences in call recursion in data
structure can be categorized into different categories. You will get to know about these categories later in
the blog.
Moreover, the indirect calling of the subroutine takes place whenever a subroutine calls another subroutine,
which in turn calls the first subroutine. Recursion in data structure can use several lines of code to refer to
an elaborate job.
A recursive algorithm calls itself with small input values & returns the outcome for the present input by
executing fundamental functions on the returned value for the small input. By and large, if a problem could
be sorted out by using remedies to smaller adaptations of the same problem, in addition to the smaller
variations reduced to conveniently solvable cases, then the problem could be fixed employing a recursive
algorithm.
As soon as the root issue has been developed, functions return their value to functions through which they
are called. The moment this takes place, memories are de-allocated and the cycle repeats.
One crucial element of a recursive function is the serious need for a base instance or termination point.
Every program created with recursion must make sure the functions are terminated, failing which could
result in the system crashing errors.
There are 5 effective recursion techniques programmers can make use of in functional programming. And,
they are
Even though linear recursion is by far the most widely used technique, tail recursion is considerably more
beneficial. When you use the tail recursion process, the recursive functions are called in the end.
When working with binary recursion, functions are called upon 2 times rather than being called one by one.
This specific form of recursion data structure is used in operations like merging and also tree traversal.
It is the most widely used recursion method. applying this approach, functions call themselves in a non-
complex form and then terminate the function with a termination condition. This particular strategy consists
of functions making one-off calls to itself during execution.
It is the approach where a function will use others recursively. Functions turn out calling one another in this
technique. This strategy is particularly used when writing programs making use of programming languages
that often do not support recursive calling functions. Hence, implementing mutual recursion can serve as an
alternative for a recursion function. In mutual recursion, starting circumstances are used for a single,
several, or even all of the functions.
It is an exception wherein these sorts of recursions can’t be transformed into an iterative format. Within this
technique, recursive functions pass the parameters as recursive calls that translate to recursions in
recursions.
There are five types of recursion in data structure that are broadly categorized into two major types of
recursion. These are direct recursion and indirect recursion.
In the direct recursion, functions call themselves. This kind of operation consists of a single-stage recursive
9
call by the function from within itself. Why don’t we investigate precisely how to carry out direct recursion
to determine the square root of a number.
// base case
if (x == 0)
return x;
// recursive case
else
int main() {
int input=3;
return 0;
3^4 = 9
Indirect recursion happens when functions call other functions to call the initial function. This specific
course of action consists of 2 simple steps when developing a recursive call, essentially making functions
call functions to generate a recursive call. Mutual recursion could be referred to as indirect recursion.
Let’s examine precisely how to put into action indirect recursion to print or perhaps find out all of the
figures from 10 to 20.
int n=10;
10
// declaring functions
void foo1(void);
void foo2(void);
void foo1()
if (n <= 20)
cout<<n<<” “; // prints n
n++; // increments n by 1
else
return;
void foo2()
if (n <= 20)
cout<<n<<” “; // prints n
n++; // increments n by 1
else
return;
10 11 12 13 14 15 16 17 18 19 20
int fun(int z)
printf(“%d”,z);
fun(z-1);
If you notice this particular program, you can observe that the last line ADI is going to execute for method
fun is a recursive call. And due to that, there’s no need to recall any earlier state of the program.
A recursive function is said to be non-tail recursive in case the recursion call isn’t the final thing performed
by the function. After reverting back, there’s another thing still there to assess. Now, look at the example.
int fun(int z)
fun(z-1);
printf(“%d”,z);
In this particular function, you can notice that there is another operation following the recursive call. Hence
the ADI is going to have to remember the preceding state within this specific method block. That’s the
reason this program could be regarded as non-tail recursive.
You’ll find circumstances where you can take advantage of iteration or recursion in data structure.
Nevertheless, you must always select a solution that seems to be the considerably more natural fit for a
challenge. A recursion in data structure is, obviously, the right choice when we talk about data abstraction.
Folks generally use recursive definitions to describe data along with associated operations.
It will not be inaccurate to imply that recursion in data structure is usually the natural alternative for issues
related to the application of distinct operations on data. Nevertheless, you can get certain things related to
recursion in data structure that could not make it the ideal answer for every situation. In these situations, a
solution such as the iterative technique is the greatest fit.
The implementation of recursion in data structure uses a good deal of stack space, which may usually lead
to redundancy. Whenever we make use of recursion in data structure, we call a method that results in the
formation of a whole new instance of that approach. A new instance holds variables and parameters that are
kept on the stack, and therefore are taken on the return. So while recursion in data structure is a slightly
12
more straightforward option as opposed to others, it is not often the most practical.
Additionally, we do not have some predefined guidelines that may help select recursion or iteration for
various issues. The most significant advantage of using recursion in data structure is that it’s a concise
technique. This process makes checking out and maintaining it much easier than usual. But recursive
methods are not the most effective approaches available to us as they take a good deal of storage capacity
and use up plenty of time during implementation.
Acknowledging a couple of factors can make it easier to determine if selecting a recursion in data structure
for a problem is the appropriate way to go or not. You must opt for recursion if the issue that you’re
intending to fix is pointed out in recursive terms and the recursive solution appears to be less complicated.
You must remember that recursion in data structure, in nearly all cases, simplifies the implementation of the
algorithms that you would like to use. However, if the complexities connected with making use of iteration
and recursion are the same for a specified issue, you must opt for iteration as the prospects of it being a lot
more effective are higher.
Each recursive call causes a new version of that method in the memory. When the data is returned by this
method, the copy is taken out of the memory. Since all the variables and other stuff declared within the
function get saved in the stack. As a result, a separate stack is maintained at each recursive call. After the
value is returned from the corresponding function, the stack will get wiped out. Recursion in data structure
consists of a lot of complexity in solving and monitoring the values at each recursive call. Thus we need to
maintain the stack and monitor the values of the variables determined in the stack.
Let us figure out how to efficiently apply recursion in data structure using different programming languages
to create a few easy programs or seek answers to diverse problems. Being recognized as one of the most
compact and optimized strategies of generating functions call or deploy functions when programming,
recursion in data structure could be used to effectively employ many algorithms and functions.
Recursion in data structure is quite widely used in programming languages including C++. Let’s examine
exactly how to work with the recursive function in C++ to carry out recursion in data structure when
creating a program to obtain the factorial of 6.
// Factorial function
int f(int n)
// Stop condition
if ( n == 0 || n == 1 )
return 1;
// Recursive condition
else
// Driver code
int main()
int n = 6;
return 0;
15
Implementing Recursion in Data Structure C
Let’s apply recursion in data structure in C to determine the sum of consecutive numbers from 0.
#include
scanf(“%d”, &number);
result = sum(number);
return 0;
int sum(int n) {
if (n != 0)
return n + sum(n-1);
else
return n;
Enter the last number: (Let’s use the positive integer 6, then the consecutive numbers are 1, 2, 3, 4, 5, and
6)
Reversing an Array
Let’s look at the issue of reversing the n components of an array, A so that the first component turns into
the final, the 2nd component will become the 2nd to the last, and so forth. We can work out this specific
challenge with the linear recursion in data structure, by paying attention to how the reversal of an array
could be done by swapping the first and last components and subsequently recursively reversing the rest of
the components in the array.
if i < j then
17
Swap A[i] and A[j]
return
Fibonacci Sequence
Fibonacci sequences are the sequence of numbers 1, 1, 2, 3, 5, 8, 13, 21, 34, 55…. The initial two numbers
of the sequence are the two 1, while every succeeding number is the sum of the two numbers prior to it. We
can set a function F(n) that will calculate the nth Fibonacci number.
if (n <= 1) then
return (n, 0)
else
return (i + j, i)
Recursion uses stack space: Each recursive method call creates a new instance of the method, one with
a brand new set of local variables. The complete stack space consumed is dependent upon the degree
of nesting of this recursion method, and the number of local variables along with parameters.
Recursion might carry out redundant computations: Look at the recursive computation of the Fibonacci
sequence.
In total, an individual has to weigh the ease of the code presented by recursion against its downsides as
outlined above. When a straightforward iterative alternative is achievable, it’s undoubtedly a much better
option.
Towers of Hanoi
Input: The aim of the tower of Hanoi problem is to move the initial n different sized disks from needle
A to needle C using a temporary needle B. The rule is that no larger disk is to be placed above the
smaller disk in any of the needle while moving or at any time, and only the top of the disk is to be
moved at a time from any needle to any needle.
Output:
18
2. If n>1, move the top n-1 disks from A to B using C as temporary.
return
n=4
A queue is a data structure that is best described as "first in, first out". A queue is another special kind of
list, where items are inserted at one end called the rear and deleted at the other end called the front. A
real world example of a queue is people waiting in line at the bank. As each person enters the bank, he
or she is "enqueued" at the back of the line. When a teller becomes available, they are "dequeued" at the
front of the line.
Let us consider a queue, which can hold maximum of five elements. Initially the queue is empty.
19
0 1 2 3 4
RE A R = RE A R + 1 = 2
F RO NT = 0
FR
Again insert another element 33 to the queue. The
Q u e u e E mp t y
status of the queue is:
F RO NT = RE A R = 0 0 1 2 3 4
11 22 33
0 1 2 3 4
RE A R = RE A R + 1 = 3
11 F RO NT = 0
F R
RE A R = RE A R + 1 = 1 F
RO NT = 0
0 1 2 3 4
11 22
F R
Now, delete an element. The element deleted is the element at the front of the queue. So the status of the
queue is:
0 1 2 3 4
22 33 RE A R = 3
F RO NT = F R O NT + 1 = 1
F R
Again, delete an element. The element to be deleted is always pointed to by the FRONT pointer. So, 22
is deleted. The queue status is as follows:
20
0 1 2 3 4
33
F R
RE A R = 3
F RO NT = F R O NT + 1 = 2
Now, insert new elements 44 and 55 into the queue. The queue status is:
0 1 2 3 4
33 44 55
F R
RE A R = 5
F RO NT = 2
Next insert another element, say 66 to the queue. We cannot insert 66 to the queue as the rear crossed the
maximum size of the queue (i.e., 5). There will be queue full signal. The queue status is as follows:
0 1 2 3 4
33 44 55
F R
RE A R = 5 F RO NT = 2
Now it is not possible to insert an element 66 even though there are two vacant positions in the linear
queue. To overcome this problem the elements of the queue are to be shifted towards the beginning of
the queue so that it creates vacant position at the rear end. Then the FRONT and REAR are to be
adjusted properly. The element 66 can be inserted at the rear end. After this operation, the queue status is
as follows:
0 1 2 3 4
RE A R = 4
33 44 55 66
F RO NT = 0
F R
This difficulty can overcome if we treat queue position with index 0 as a position that comes after
position with index 4 i.e., we treat the queue as a circular queue.
While implementing a queue data structure using stacks, we will have to consider the natural behaviour
21
of stack too, which is First in Last Out.
For performing enqueue we require only one stack as we can directly push data onto the stack, but to
perform dequeue we will require two Stacks, because we need to follow queue's FIFO property and if we
directly pop any data element out of Stack, it will follow LIFO approach(Last in First Out).
We can represent a queue as a linked list. In a queue data is deleted from the front end and inserted at the
rear end. We can perform similar operations on the two ends of a list. We use two pointers front and rear
for our linked queue implementation.
Applications of Queues:
2. When multiple users send print jobs to a printer, each printing job is kept in the printing queue.
Then the printer prints those jobs according to first in first out (FIFO) basis.
3. Breadth first search uses a queue data structure to find an element from a graph.
There are two problems associated with linear queue. They are:
Time consuming: linear time to be spent in shifting the elements to the beginning of the queue.
The output restricted DEQUE allows deletions from only one end and input restricted DEQUE allow
insertions at only one end. The DEQUE can be constructed in two ways they are
1) Using array
Operations in DEQUE
1. Insert element at back
23
24
Applications of DEQUE:
1. The A-Steal algorithm implements task scheduling for several processors (multiprocessor
scheduling).
3. When one of the processor completes execution of its own threads it can steal a thread from
another processor.
4. It gets the last element from the deque of another processor and executes it.
Circular Queue:
Circular queue is a linear data structure. It follows FIFO principle. In circular queue the last node is
connected back to the first node to make a circle.
Elements are added at the rear end and the elements are deleted at front end of the queue
Both the front and the rear pointers points to the beginning of the array.
3. Using arrays
25
Representation of Circular Queue:
Let us consider a circular queue, which can hold maximum (MAX) of six elements. Initially the queue is
empty.
F R
5 0
1 Q u e u e E mp t y
4 M A X = 6
F RO NT = RE A R = 0
C O U NT = 0
3 2
C irc u lar Q u e u e
Now, insert 11 to the circular queue. Then circular queue status will be:
5 0
R
11
1 F RO NT = 0
4 RE A R = ( RE A R + 1) % 6 = 1
C O U NT = 1
3 2
C irc u lar Q u e u e
Insert new elements 22, 33, 44 and 55 into the circular queue. The circular queue status is:
F
R
0
5
11
1 F RO N T = 0, RE A R =22
5
4 55
RE A R = RE A R % 6 = 5
C O U NT = 5
44 33
3 2
C irc u lar Q u e u e
Now, delete an element. The element deleted is the element at the front of the circular queue. So, 11 is
deleted. The circular queue status is as follows:
26
R
0
5
F
22 1 F RO NT = (F R O NT + 1) % 6 = 1
4 55
RE A R = 5
C O U NT = C O U NT - 1 = 4
44 33
3 2
C irc u lar Q u e u e
Again, delete an element. The element to be deleted is always pointed to by the FRONT pointer. So, 22
is deleted. The circular queue status is as follows:
0
5
1 F RO NT = (F R O NT + 1) % 6 = 2
4 55
RE A R = 5
C O U NT = C O U NT - 1 = 3
44 33
F
3 2
C irc u lar Q u e u e
Again, insert another element 66 to the circular queue. The status of the circular queue is:
0
5
66
55 1
4 F RO N T = 2
RE A R = ( RE A R + 1) % 6 = 0
C O U NT = C O U NT + 1 = 4
44 33
F 3 2
C irc u lar Q u e u e
Now, insert new elements 77 and 88 into the circular queue. The circular queue status is:
27
0
5
66 77
55 88 1
4 F RO NT = 2, RE A R = 2
RE A R = RE A R % 6 = 2
C O U NT = 6
44 33
2 R
3 F
C irc u lar Q u e u e
Now, if we insert an element to the circular queue, as COUNT = MAX we cannot add the element to
circular queue. So, the circular queue is full.
Priority Queue is an abstract data type that is similar to a queue, and every element has some priority
value associated with it. The priority of the elements in a priority queue determines the order in which
elements are served (i.e., the order in which they are removed). If in any case the elements have same
priority, they are served as per their ordering in the queue.
28
How to Implement Priority Queue?
Priority queue can be implemented using the following data structures:
Arrays
Linked list
Heap data structure
Binary search tree
A typical priority queue supports following operations.
insert(item, priority): Inserts an item with given priority.
getHighestPriority(): Returns the highest priority item.
deleteHighestPriority(): Removes the highest priority item.
Implementation priority queue
insert() operation can be implemented by adding an item at end of array in O(1) time.
getHighestPriority() operation can be implemented by linearly searching the highest priority item in
array. This operation takes O(n) time.
deleteHighestPriority() operation can be implemented by first linearly searching an item, then removing
the item by moving all subsequent items one position back.
We can also use Linked List, time complexity of all operations with linked list remains same as array. The
advantage with linked list is deleteHighestPriority() can be more efficient as we don’t have to move
items.
29
LINKED LISTS
Linked lists and arrays are similar since they both store collections of data. Array is the most common
data structure used to store collections of elements. Arrays are convenient to declare and provide the easy
syntax to access any element by its index number. Once the array is set up, access to any element is
convenient and fast.
The disadvantages of arrays are:
• The size of the array is fixed. Most often this size is specified at compile time. This makes the
programmers to allocate arrays, which seems "large enough" than required.
• Inserting new elements at the front is potentially expensive because existing elements need to be
shifted over to make room.
• Deleting an element from an array is not possible. Linked lists have their own strengths and
weaknesses, but they happen to be strong where arrays are weak.
Generally array's allocates the memory for all its elements in one block whereas linked lists use
an entirely different strategy. Linked lists allocate memory for each element separately and only
when necessary.
Linked List Concepts:
A linked list is a non-sequential collection of data items. It is a dynamic data structure. For every data
item in a linked list, there is an associated pointer that would give the memory location of the next data
item in the linked list. The data items in the linked list are not in consecutive memory locations. They
may be anywhere, but the accessing of these data items is easier as each data item contains the address of
the next data item.
Advantages of linked lists:
Linked lists have many advantages. Some of the very important advantages are:
1. Linked lists are dynamic data structures. i.e., they can grow or shrink during the execution of a
program.
2. Linked lists have efficient memory utilization. Here, memory is not preallocated. Memory is
allocated whenever it is required and it is de-allocated (removed) when it is no longer needed.
3. Insertion and Deletions are easier and efficient. Linked lists provide flexibility in inserting a data
item at a specified position and deletion of the data item from the given position.
4. Many complex applications can be easily carried out with linked lists.
55
4. Circular Double Linked List.
A single linked list is one in which all nodes are linked together in some sequential manner. Hence, it is
also called as linear linked list.
A double linked list is one in which all nodes are linked together by multiple links which helps in
accessing both the successor node (next node) and predecessor node (previous node) from any arbitrary
node within the list. Therefore each node in a double linked list has two link fields (pointers) to point to
the left node (previous) and the right node (next). This helps to traverse in forward direction and
backward direction.
A circular linked list is one, which has no beginning and no end. A single linked list can be made a
circular linked list by simply storing address of the very first node in the link field of the last node.
A circular double linked list is one, which has both the successor pointer and predecessor pointer in the
circular manner.
56
Applications of linked list:
1. Linked lists are used to represent and manipulate polynomial. Polynomials are expression containing
terms with nonzero coefficient and exponents. For example: P(x) = a0 Xn + a1 Xn-1 + …… + an-1 X
+ an
2. Represent very large numbers and operations of the large number such as addition, multiplication and
division.
3. Linked lists are to implement stack, queue, trees and graphs. 4. Implement the symbol table in
compiler construction.
Single Linked List:
A linked list allocates space for each element separately in its own block of memory called a "node". The
list gets an overall structure by using pointers to connect all its nodes together like the links in a chain.
Each node contains two fields; a "data" field to store whatever element, and a "next" field which is a
pointer used to link to the next node. Each node is allocated in the heap using malloc(), so the node
memory continues to exist until it is explicitly de-allocated using free(). The front of the list is a pointer to
the “start” node.
57
Insertion of a Node:
One of the most primitive operations that can be done in a singly linked list is the insertion of a node.
Memory is to be allocated for the new node (in a similar way that is done while creating a list) before
reading the data. The new node will contain empty data field and empty next field. The data field of the
new node is then stored with the information read from the user. The next field of the new node is
assigned to NULL. The new node can then be inserted at three different places namely:
• Inserting a node at the beginning.
• Inserting a node at the end.
• Inserting a node at intermediate position.
58
Inserting a node into the single linked list at a specified intermediate position other than
beginning and end.
Deletion of a node:
Another primitive operation that can be done in a singly linked list is the deletion of a node. Memory is
to be released for the node to be deleted. A node can be deleted from the list from three different places
namely.
• Deleting a node at the beginning.
• Deleting a node at the end.
• Deleting a node at intermediate position.
Deleting a node at the beginning:
59
Deleting a node at the end:
60
Note that if your linked lists do include a header node, there is no need for the special case code given
above for the remove operation; node n can never be the first node in the list, so there is no need to check
for that case. Similarly, having a header node can simplify the code that adds a node before a given node
n.
Note that if you do decide to use a header node, you must remember to initialize an empty list to contain
one (dummy) node, you must remember not to include the header node in the count of "real" nodes in the
list.
It is also useful when information other than that found in each node of the list is needed. For example,
imagine an application in which the number of items in a list is often calculated. In a standard linked list,
the list function to count the number of nodes has to traverse the entire list every time. However, if the
current length is maintained in a header node, that information can be obtained very quickly. 3.5. Array
based linked lists: Another alternative is to allocate the nodes in blocks. In fact, if you know the
61
maximum size of a list a head of time, you can pre-allocate the nodes in a single array. The result is a
hybrid structure – an array based linked list.
shows an example of null terminated single linked list where all the nodes are allocated contiguously in
an array.
Double Linked List: A double linked list is a two-way list in which all nodes will have two links. This
helps in accessing both successor node and predecessor node from the given node position. It provides bi-
directional traversing. Each node contains three fields:
Left link.
Data.
Right link.
The left link points to the predecessor node and the right link points to the successor node. The data field
stores the required data.
Many applications require searching forward and backward thru nodes of a list. For example searching
for a name in a telephone directory would need forward and backward scanning thru a region of the whole
list.
The basic operations in a double linked list are:
Creation.
Insertion.
Deletion.
Traversing.
A double linked list is shown in figure
62
The beginning of the double linked list is stored in a "start" pointer which points to the first node. The
first node‟s left link and last node‟s right link is set to NULL.
Creating a node for Double Linked List:
Creating a double linked list starts with creating a node. Sufficient memory has to be allocated for
creating a node. The information is stored in the memory.
Double Linked List with 3 nodes:
63
Inserting a node at an intermediate position:
64
Traversal and displaying a list (Left to Right):
To display the information, you have to traverse the list, node by node from the first node, until the end
of the list is reached. The function traverse_left_right() is used for traversing and displaying the
information stored in the list from left to right.
Traversal and displaying a list (Right to Left):
To display the information from right to left, you have to traverse the list, node by node from the first
node, until the end of the list is reached. The function traverse_right_left() is used for traversing and
displaying the information stored in the list from right to left.
65
Inserting a node at the beginning:
67
Inserting a node at an intermediate position:
68
Deleting a node at Intermediate position:
Polynomials:
A polynomial is of the form: i n i ∑ ci
Where, ci is the coefficient of the ith term and
n is the degree of the polynomial
Some examples are:
5x2 + 3x + 1
12x3 – 4x
5x4 – 8x3 + 2x2 + 4x1 + 9x0
69
It is not necessary to write terms of the polynomials in decreasing order of degree. In other words the two
polynomials 1 + x and x + 1 are equivalent.
The computer implementation requires implementing polynomials as a list of pairs of coefficient and
exponent. Each of these pairs will constitute a structure, so a polynomial will be represented as a list of
structures.
A linked list structure that represents polynomials 5x4 – 8x3 + 2x2 + 4x1 + 9x0
Addition of Polynomials:
To add two polynomials we need to scan them once. If we find terms with the same exponent in the two
polynomials, then we add the coefficients; otherwise, we copy the term of larger exponent into the sum
and go on. When we reach at the end of one of the polynomial, then remaining part of the other is copied
into the sum.
To add two polynomials follow the following steps:
• Read two polynomials
• Add them.
• Display the resultant polynomial.
70
UNIT – IV NON LINEAR DATA STRUCTURES
Trees Basic Concepts:
A tree is a non-empty set one element of which is designated the root of the tree while the remaining
elements are partitioned into non-empty sets each of which is a sub-tree of the root.
A tree T is a set of nodes storing elements such that the nodes have a parent-child relationship that
satisfies the following
• If T is not empty, T has a special tree called the root that has no parent.
• Each node v of T different than the root has a unique parent node w; each node with parent w is a
child of w.
Tree nodes have many useful properties. The depth of a node is the length of the path (or the number of
edges) from the root to that node. The height of a node is the longest path from that node to its leaves.
The height of a tree is the height of the root. A leaf node has no children -- its only path is up to its
parent.
Binary Tree:
In a binary tree, each node can have at most two children. A binary tree is either empty or consists of a
node called the root together with two binary trees called the left subtree and the right subtree.
84
Tree Terminology:
Leaf node
A node with no children is called a leaf (or external node). A node which is not a leaf is called an internal
node.
Path: A sequence of nodes n1, n2, . . ., nk, such that ni is the parent of ni + 1 for i = 1, 2,. . ., k - 1. The
length of a path is 1 less than the number of nodes on the path. Thus there is a path of length zero from a
node to itself.
Ancestor and Descendent If there is a path from node A to node B, then A is called an ancestor of B
and B is called a descendent of A.
Level: The level of the node refers to its distance from the root. The root of the tree has level O, and the
level of any other node in the tree is one more than the level of its parent.
Height:The maximum level in a tree determines its height. The height of a node in a tree is the length of a
longest path from the node to a leaf. The term depth is also used to denote height of the tree.
Depth:The depth of a node is the number of nodes along the path from the root to that node.
Assigning level numbers and Numbering of nodes for a binary tree: The nodes of a binary tree can be
numbered in a natural way, level by level, left to right. The nodes of a complete binary tree can be
numbered so that the root is assigned the number 1, a left child is assigned twice the number assigned its
parent, and a right child is assigned one more than twice the number assigned its parent.
85
Properties of Binary Trees:
3. Since a binary tree can contain at most one node at level 0 (the root), it can contain at most 2l
node at level l.
If every non-leaf node in a binary tree has nonempty left and right subtrees, the tree is termed a strictly
binary tree. Thus the tree of figure 7.2.3(a) is strictly binary. A strictly binary tree with n leaves always
contains 2n - 1 nodes.
A full binary tree of height h has all its leaves at level h. Alternatively; All non leaf nodes of a full binary
tree have two children, and the leaf nodes have no children.
A full binary tree with height h has 2h + 1 - 1 nodes. A full binary tree of height h is a strictly binary tree all
of whose leaves are at level h.
86
For example, a full binary tree of height 3 contains 2 3+1 – 1 = 15 nodes.
A binary tree with n nodes is said to be complete if it contains all the first n nodes of the above
numbering scheme.
A complete binary tree of height h looks like a full binary tree down to level h-1, and the level h is filled
from left to right.
A Binary tree is Perfect Binary Tree in which all internal nodes have two children and all leaves are at
same level.
Following are examples of Perfect Binary Trees.
18
/ \
15 30
/ \ /\
40 50 100 40
87
18
/ \
15 30
A Perfect Binary Tree of height h (where height is number of nodes on path from root to leaf) has 2h – 1
node.
Example of Perfect binary tree is ancestors in family. Keep a person at root, parents as children, parents
of parents as their children.
A binary tree is balanced if height of the tree is O(Log n) where n is number of nodes. For Example, AVL
tree maintain O(Log n) height by making sure that the difference between heights of left and right
subtrees is 1. Red-Black trees maintain O(Log n) height by making sure that the number of Black nodes
on every root to leaf paths are same and there are no adjacent red nodes. Balanced Binary Search trees are
performance wise good as they provide O(log n) time for search, insert and delete.
Representation of Binary Trees:
2. Pointer-based.
For these nodes are numbered / indexed according to a scheme giving 0 to root. Then all the
nodes are numbered from left to right level by level from top to bottom. Empty nodes are also
numbered. Then each node having an index i is put into the array as its ith element.
In the figure shown below the nodes of binary tree are numbered according to the given scheme.
88
The figure shows how a binary tree is represented as an array. The root 3 is the 0 th element while its
leftchild 5 is the 1 st element of the array. Node 6 does not have any child so its children i.e. 7 th and 8 th
element of the array are shown as a Null value.
It is found that if n is the number or index of a node, then its left child occurs at (2n + 1) th position and
right child at (2n + 2) th position of the array. If any node does not have any of its child, then null value is
stored at the corresponding index of the array.
The following program implements the above binary tree in an array form. And then traverses
the tree in inorder traversal.
Binary trees can be represented by links where each node contains the address of the left child
and the right child. If any node has its left or right child empty then it will have in its respective
link field, a null value. A leaf node has null value in both of its links.
Traversal of a binary tree means to visit each node in the tree exactly once. The tree traversal is used in all t
it.
89
In a linear list nodes are visited from first to last, but a tree being a non linear one we need definite rules. Th
ways to traverse a tree. All of them differ only in the order in which they visit the nodes.
Inorder Traversal
Preorder Traversal
Postorder Traversal
In all of them we do not require to do anything to traverse an empty tree. All the traversal methods are base
90
functions since a binary tree is itself recursive as every child of a node in a binary tree is itself a binary tree.
Inorder Traversal:
To traverse a non empty tree in inorder the following steps are followed recursively.
Preorder Traversal:
Algorithm Pre-order(tree)
Post-order Traversal:
Algorithm Post-order(tree)
91
self.val = key
if root:
if root:
if root:
# Driver code
92
root = Node(1)
root.left = Node(2)
root.right = Node(3)
root.left.left = Node(4)
root.left.right = Node(5)
print "Preorder traversal of binary tree is"
printPreorder(root)
Binary Search Tree, is a node-based binary tree data structure which has the following properties:
The left sub-tree of a node contains only nodes with keys less than the node’s key.
The right sub-tree of a node contains only nodes with keys greater than the node’s key.
The left and right sub-tree each must also be a binary search tree.
There must be no duplicate nodes.
The above properties of Binary Search Tree provide an ordering among keys so that the operations like
search, minimum and maximum can be done fast. If there is no ordering, then we may have to compare
every key to search a given key.
Searching a key
To search a given key in Binary Search Tree, we first compare it with root, if the key is present at root,
we return root. If key is greater than root’s key, we recur for right sub-tree of root node. Otherwise we
recur for left sub-tree.
# A utility function to search a given key in BST
def search(root,key):
# Base Cases: root is null or key is present at root
93
if root is None or root.val == key:
return root
# Key is greater than root's key
if root.val < key:
return search(root.right,key)
# Key is smaller than root's key
return search(root.left,key)
Application of Trees:
1. One reason to use trees might be because you want to store information that naturally forms a
hierarchy. For example, the file system on a computer:
file system
———–
/ <-- root
/ \
... home
/ \
ugrad course
/ / | \
... cs101 cs112 cs113
2. If we organize keys in form of a tree (with some ordering e.g., BST), we can search for a
given key in moderate time (quicker than Linked List and slower than arrays). Self-balancing
search trees like AVL and Red-Black trees guarantee an upper bound of O(logn) for search.
3) We can insert/delete keys in moderate time (quicker than Arrays and slower than Unordered
Linked Lists). Self-balancing search trees like AVL and Red-Black trees guarantee an upper
bound of O(logn) for insertion/deletion.
4) Like Linked Lists and unlike Arrays, Pointer implementation of trees don’t have an upper
limit on number of nodes as nodes are linked using pointers.
The pair is ordered because (u, v) is not same as (v, u) in case of directed graph (di-graph). The pair of
form (u, v) indicates that there is an edge from vertex u to vertex v. The edges may contain
weight/value/cost.
95
Graph and its representations:
Graphs are used to represent many real life applications: Graphs are used to represent networks. The
networks may include paths in a city or telephone network or circuit network. Graphs are also used in
social networks like linkedIn, facebook. For example, in facebook, each person is represented with a
vertex(or node). Each node is a structure and contains information like person id, name, gender and
locale.
There are other representations also like, Incidence Matrix and Incidence List. The choice of the graph
representation is situation specific. It totally depends on the type of operations to be performed and ease
of use.
Adjacency Matrix:
Adjacency Matrix is a 2D array of size V x V where V is the number of vertices in a graph. Let the 2D
array be adj[][], a slot adj[i][j] = 1 indicates that there is an edge from vertex i to vertex j. Adjacency
matrix for undirected graph is always symmetric. Adjacency Matrix is also used to represent weighted
graphs. If adj[i][j] = w, then there is an edge from vertex i to vertex j with weight w.
96
Adjacency Matrix Representation of the above graph
Pros: Representation is easier to implement and follow. Removing an edge takes O(1) time. Queries like
whether there is an edge from vertex ‘u’ to vertex ‘v’ are efficient and can be done O(1).
Cons: Consumes more space O(V^2). Even if the graph is sparse (contains less number of edges), it
consumes the same space. Adding a vertex is O(V^2) time.
Adjacency List:
An array of linked lists is used. Size of the array is equal to number of vertices. Let the array be array[].
An entry array[i] represents the linked list of vertices adjacent to the ith vertex. This representation can
also be used to represent a weighted graph. The weights of edges can be stored in nodes of linked lists.
Following is adjacency list representation of the above graph.
Breadth First Traversal (or Search) for a graph is similar to Breadth First Traversal of a tree The only
catch here is, unlike trees, graphs may contain cycles, so we may come to the same node again. To avoid
processing a node more than once, we use a boolean visited array.
For example, in the following graph, we start traversal from vertex 2. When we come to vertex 0, we look
for all adjacent vertices of it. 2 is also an adjacent vertex of 0. If we don’t mark visited vertices, then 2
will be processed again and it will become a non-terminating process. Breadth First Traversal of the
following graph is 2, 0, 3, 1.
97
Algorithm: Breadth-First Search Traversal
BFS(V, E, s)
98
Applications of Breadth First Traversal
1) Shortest Path and Minimum Spanning Tree for unweighted graph In unweighted graph, the
shortest path is the path with least number of edges. With Breadth First, we always reach a vertex from
given source using minimum number of edges. Also, in case of unweighted graphs, any spanning tree is
Minimum Spanning Tree and we can use either Depth or Breadth first traversal for finding a spanning
tree.
2) Peer to Peer Networks. In Peer to Peer Networks like BitTorrent, Breadth First Search is used to find
all neighbor nodes.
3) Crawlers in Search Engines: Crawlers build index using Bread First. The idea is to start from source
page and follow all links from source and keep doing same. Depth First Traversal can also be used for
crawlers, but the advantage with Breadth First Traversal is, depth or levels of built tree can be limited.
4) Social Networking Websites: In social networks, we can find people within a given distance ‘k’ from
a person using Breadth First Search till ‘k’ levels.
5) GPS Navigation systems: Breadth First Search is used to find all neighboring locations.
6) Broadcasting in Network: In networks, a broadcasted packet follows Breadth First Search to reach all
nodes.
7) In Garbage Collection: Breadth First Search is used in copying garbage collection using Cheney’s
algorithm.
8) Cycle detection in undirected graph: In undirected graphs, either Breadth First Search or Depth First
Search can be used to detect cycle. In directed graph, only depth first search can be used.
9) Ford–Fulkerson algorithm In Ford-Fulkerson algorithm, we can either use Breadth First or Depth
First Traversal to find the maximum flow. Breadth First Traversal is preferred as it reduces worst case
time complexity to O(VE2).
10) To test if a graph is Bipartite We can either use Breadth First or Depth First Traversal.
11) Path Finding We can either use Breadth First or Depth First Traversal to find if there is a path
between two vertices.
12) Finding all nodes within one connected component: We can either use Breadth First or Depth First
Traversal to find all nodes reachable from a given node.
Depth First Traversal (or Search) for a graph is similar to Depth First Traversal of a tree. The only catch
here is, unlike trees, graphs may contain cycles, so we may come to the same node again. To avoid
processing a node more than once, we use a boolean visited array.
For example, in the following graph, we start traversal from vertex 2. When we come to vertex 0, we look
for all adjacent vertices of it. 2 is also an adjacent vertex of 0. If we don’t mark visited vertices, then 2
will be processed again and it will become a non-terminating process. Depth First Traversal of the
99
following graph is 2, 0, 1, 3
The DFS forms a depth-first forest comprised of more than one depth-first trees. Each tree is made of
edges (u, v) such that u is gray and v is white when edge (u, v) is explored. The following pseudocode for
DFS uses a global timestamp time.
DFS (V, E)
DFS-Visit(u)
color[u] ← GRAY
time ← time + 1
d[u] ← time
for each vertex v adjacent to u
do if color[v] ← WHITE
then π[v] ← u
DFS-Visit(v)
color[u] ← BLACK
time ← time + 1
f[u] ← time
Applications of Depth First Search
100
1) For an unweighted graph, DFS traversal of the graph produces the minimum spanning tree and all pair
shortest path tree.
3) Path Finding
We can specialize the DFS algorithm to find a path between two given vertices u and z.
i) Call DFS(G, u) with u as the start vertex.
ii) Use a stack S to keep track of the path between the start vertex and the current vertex.
iii) As soon as destination vertex z is encountered, return the path as the
contents of the stack
4) Topological Sorting
6) Finding Strongly Connected Components of a graph A directed graph is called strongly connected
if there is a path from each vertex in the graph to every other vertex. (See this for DFS based also for
finding Strongly Connected Components)
DFS search starts from root node then traversal into left child node and continues, if item found it stops
otherwise it continues. The advantage of DFS is it requires less memory compare to Breadth First Search
(BFS).
BFS search starts from root node then traversal into next level of graph or tree and continues, if item
found it stops otherwise it continues. The disadvantage of BFS is it requires more memory compare to
Depth First Search (DFS).
101
UNIT – V BINARY TREES AND HASHING
Binary Search Trees:
An important special kind of binary tree is the binary search tree (BST). In a BST, each node
stores some information including a unique key value, and perhaps some associated data. A
binary tree is a BST iff, for every node n in the tree:
All keys in n's left subtree are less than the key in n, and
All keys in n's right subtree are greater than the key in n.
In other words, binary search trees are binary trees in which all values in the node’s left subtree are less
than node value all values in the node’s right subtree are greater than node value.
Here are some BSTs in which each node just stores an integer key:
In the left one 5 is not greater than 6. In the right one 6 is not greater than 7.
The reason binary-search trees are important is that the following operations can be implemented
efficiently using a BST:
102
and searching for 12:
103
Properties and Operations:
Inserting a node
A naïve algorithm for inserting a node into a BST is that, we start from the root node, if the node to insert
is less than the root, we go to left child, and otherwise we go to the right child of the root. We continue
this process (each node is a root for some sub tree) until we find a null pointer (or leaf node) where we
cannot go any further. We then insert the node as a left or right child of the leaf node based on node is less
or greater than the leaf node. We note that a new node is always inserted as a leaf node. A recursive
algorithm for inserting a node into a BST is as follows. Assume we insert a node N to tree T. if the tree is
empty, the we return new node N as the tree. Otherwise, the problem of inserting is reduced to inserting
the node N to left of right sub trees of T, depending on N is less or greater than T. A definition is as
follows.
Insert(N, T) = N if T is empty
= insert(N, T.left) if N < T
= insert(N, T.right) if N > T
104
= search(N, T.left) if N < T
= search(N, T.right) if N > T
Deleting a node
A BST is a connected structure. That is, all nodes in a tree are connected to some other node. For
example, each node has a parent, unless node is the root. Therefore deleting a node could affect all sub
trees of that node. For example, deleting node 5 from the tree could result in losing sub trees that are
rooted at 1 and 9.
Hence we need to be careful about deleting nodes from a tree. The best way to deal with deletion seems to
be considering special cases. What if the node to delete is a leaf node? What if the node is a node with
just one child? What if the node is an internal node (with two children). The latter case is the hardest to
resolve. But we will find a way to handle this situation as well.
Case 1 : The node to delete is a leaf node
This is a very easy case. Just delete the node 46. We are done
105
Case 3: The node to delete is a node with two children
This is a difficult case as we need to deal with two sub trees. But we find an easy way to handle it. First
we find a replacement node (from leaf node or nodes with one child) for the node to be deleted. We need
to do this while maintaining the BST order property. Then we swap leaf node or node with one child with
the node to be deleted (swap the data) and delete the leaf node or node with one child (case 1 or case 2)
Next problem is finding a replacement leaf node for the node to be deleted. We can easily find this as
follows. If the node to be deleted is N, the find the largest node in the left sub tree of N or the smallest
node in the right sub tree of N. These are two candidates that can replace the node to be deleted without
losing the order property. For example, consider the following tree and suppose we need to delete the root
38.
Then we find the largest node in the left sub tree (15) or smallest node in the right sub tree (45) and
replace the root with that node and then delete that node. The following set of images demonstrates this
process.
106
107
Balanced Search Trees:
A self-balancing (or height-balanced) binary search tree is any node-based binary search tree
that automatically keeps its height (maximal number of levels below the root) small in the face of
arbitrary item insertions and deletions. The red–black tree, which is a type of self-balancing
binary search tree, was called symmetric binary B-tree. Self-balancing binary search trees can be
used in a natural way to construct and maintain ordered lists, such as priority queues. They can
also be used for associative arrays; key-value pairs are simply inserted with an ordering based on
the key alone. In this capacity, self-balancing BSTs have a number of advantages and
disadvantages over their main competitor, hash tables. One advantage of self-balancing BSTs is
that they allow fast (indeed, asymptotically optimal) enumeration of the items in key order,
which hash tables do not provide. One disadvantage is that their lookup algorithms get more
complicated when there may be multiple items with the same key. Self-balancing BSTs have
better worst-case lookup performance than hash tables (O(log n) compared to O(n)), but have
worse average-case performance (O(log n) compared to O(1)).
Self-balancing BSTs can be used to implement any algorithm that requires mutable ordered lists,
to achieve optimal worst-case asymptotic performance. For example, if binary tree sort is
implemented with a self-balanced BST, we have a very simple-to-describe yet asymptotically
optimal O(n log n) sorting algorithm. Similarly, many algorithms in computational geometry
exploit variations on self-balancing BSTs to solve problems such as the line segment intersection
problem and the point location problem efficiently. (For average-case performance, however,
self-balanced BSTs may be less efficient than other solutions. Binary tree sort, in particular, is
likely to be slower than merge sort, quicksort, or heapsort, because of the tree-balancing
overhead as well as cache access patterns.)
Self-balancing BSTs are flexible data structures, in that it's easy to extend them to efficiently
record additional information or perform new operations. For example, one can record the
number of nodes in each subtree having a certain property, allowing one to count the number of
nodes in a certain key range with that property in O(log n) time. These extensions can be used,
for example, to optimize database queries or other list-processing algorithms.
AVL Trees:
An AVL tree is another balanced binary search tree. Named after their inventors, Adelson-
Velskii and Landis, they were the first dynamically balanced trees to be proposed. Like red-black
trees, they are not perfectly balanced, but pairs of sub-trees differ in height by at most 1,
maintaining an O(logn) search time. Addition and deletion operations also take O(logn) time.
Definition of an AVL tree: An AVL tree is a binary search tree which has the following
properties:
108
Balance
requirement
for an AVL
tree: the left
and right
sub-trees
differ by at
most 1 in
height.
Yes this is an AVL tree. Examination shows that each left sub-tree has a height 1 greater than
each right sub-tree.
No this is not an AVL tree. Sub-tree with root 8 has height 4 and sub-tree with root 18 has
height 2.
An AVL tree implements the Map abstract data type just like a regular binary search tree, the
only difference is in how the tree performs. To implement our AVL tree we need to keep track
of a balance factor for each node in the tree. We do this by looking at the heights of the left and
right subtrees for each node. More formally, we define the balance factor for a node as the
difference between the height of the left subtree and the height of the right subtree.
balanceFactor=height(leftSubTree)−height(rightSubTree)
109
Using the definition for balance factor given above we say that a subtree is left-heavy if the
balance factor is greater than zero. If the balance factor is less than zero then the subtree is right
heavy. If the balance factor is zero then the tree is perfectly in balance. For purposes of
implementing an AVL tree, and gaining the benefit of having a balanced tree we will define a
tree to be in balance if the balance factor is -1, 0, or 1. Once the balance factor of a node in a
tree is outside this range we will need to have a procedure to bring the tree back into balance.
Figure shows an example of an unbalanced, right-heavy tree and the balance factors of each
node.
AVL trees are identical to standard binary search trees except that for every node in an AVL
tree, the height of the left and right subtrees can differ by at most 1 (Weiss, 1993, p:108). AVL
trees are HB-k trees (height balanced trees of order k) of order HB-1.
When storing an AVL tree, a field must be added to each node with one of three values: 1, 0, or
-1. A value of 1 in this field means that the left subtree has a height one more than the right
subtree. A value of -1 denotes the opposite. A value of 0 indicates that the heights of both
subtrees are the same. Updates of AVL trees require up to rotations, whereas updating
red-black trees can be done using only one or two rotations (up to color changes). For
this reason, they (AVL trees) are considered a bit obsolete by some.
Sparse AVL trees are defined as AVL trees of height h with the fewest possible nodes. Figure 3
shows sparse AVL trees of heights 0, 1, 2, and 3.
110
Figure Structure of an AVL tree
A multiway tree is a tree that can have more than two children. A multiway tree of order m (or
an m-way tree) is one in which a tree can have m children.
As with the other trees that have been studied, the nodes in an m-way tree will be made up of key
fields, in this case m-1 key fields, and pointers to children.
To make the processing of m-way trees easier some type of order will be imposed on the keys
within each node, resulting in a multiway search tree of order m (or an m-way search tree).
By definition an m-way search tree is a m-way tree in which:
111
M-way search trees give the same advantages to m-way trees that binary search trees gave to
binary trees - they provide fast information retrieval and update. However, they also have the
same problems that binary search trees had - they can become unbalanced, which means that the
construction of the tree becomes of vital importance.
B Trees:
An extension of a multiway search tree of order m is a B-tree of order m. This type of tree will
be used when the data to be accessed/stored is located on secondary storage devices because they
allow for large amounts of data to be stored in a node.
1. The root has at least two subtrees unless it is the only node in the tree.
2. Each nonroot and each nonleaf node have at most m nonempty children and at least m/2
nonempty children.
3. The number of keys in each nonroot and each nonleaf node is one less than the number of
its nonempty children.
4. All leaves are on the same level.
These restrictions make B-trees always at least half full, have few levels, and remain perfectly
balanced.
Searching a B-tree
An algorithm for finding a key in B-tree is simple. Start at the root and determine which pointer
to follow based on a comparison between the search value and key fields in the root node.
Follow the appropriate pointer to a child node. Examine the key fields in the child node and
continue to follow the appropriate pointers until the search value is found or a leaf node is
reached that doesn't contain the desired search value.
112
The condition that all leaves must be on the same level forces a characteristic behavior of B-
trees, namely that B-trees are not allowed to grow at the their leaves; instead they are forced to
grow at the root.
When inserting into a B-tree, a value is inserted directly into a leaf. This leads to three common
situations that can occur:
This is the easiest of the cases to solve because the value is simply inserted into the correct sorted
position in the leaf node.
In this case, the leaf node where the value should be inserted is split in two, resulting in a new
leaf node. Half of the keys will be moved from the full leaf to the new leaf. The new leaf is then
incorporated into the B-tree.
The new leaf is incorporated by moving the middle value to the parent and a pointer to the new
leaf is also added to the parent. This process is continues up the tree until all of the values have
"found" a location.
113
results in a split of the first leaf node:
The new node needs to be incorporated into the tree - this is accomplished by taking the middle
value and inserting it in the parent:
The upward movement of values from case 2 means that it's possible that a value could move up
to the root of the B-tree. If the root is full, the same basic process from case 2 will be applied and
a new root will be created. This type of split results in 2 new nodes being added to the B-tree.
114
Results in:
The 15 needs to be moved to the root node but it is full. This means that the root needs to be
divided:
The 15 is inserted into the parent, which means that it becomes the new root node:
As usual, this is the hardest of the processes to apply. The deletion process will basically be a
115
reversal of the insertion process - rather than splitting nodes, it's possible that nodes will be
merged so that B-tree properties, namely the requirement that a node must be at least half full,
can be maintained.
1a) If the leaf is at least half full after deleting the desired value, the remaining larger values are
moved to "fill the gap".
results in:
1b) If the leaf is less than half full after deleting the desired value (known as underflow), two
things could happen:
116
1b-1) If there is a left or right sibling with the number of keys exceeding the minimum
requirement, all of the keys from the leaf and sibling will be redistributed between them by
moving the separator key from the parent to the leaf and moving the middle key from the node
and the sibling combined to the parent.
1b-2) If the number of keys in the sibling does not exceed the minimum requirement, then the
leaf and sibling are merged by putting the keys from the leaf, the sibling, and the separator from
the parent into the leaf. The sibling node is discarded and the keys in the parent are moved to
"fill the gap". It's possible that this will cause the parent to underflow. If that is the case, treat the
parent as a leaf and continue repeating step 1b-2 until the minimum requirement is met or the
root of the tree is reached.
Special Case for 1b-2: When merging nodes, if the parent is the root with only one key, the keys
from the node, the sibling, and the only key of the root are placed into a node and this will
become the new root for the B-tree. Both the sibling and the old root will be discarded.
117
Case 2: Deletion from a non-leaf
This case can lead to problems with tree reorganization but it will be solved in a manner similar
to deletion from a binary search tree.
The key to be deleted will be replaced by its immediate predecessor (or successor) and then the
predecessor (or successor) will be deleted since it can only be found in a leaf node.
118
and then the immediate predecessor is deleted:
The vales in the left sibling are combined with the separator key (18) and the remaining values.
They are divided between the 2 nodes:
119
and then the middle value is moved to the parent:
Hashing is the technique used for performing almost constant time search in case of insertion, deletion
and find operation. Taking a very simple example of it, an array with its index as key is the example of
hash table. So each index (key) can be used for accessing the value in a constant search time. This
mapping key must be simple to compute and must helping in identifying the associated value. Function
which helps us in generating such kind of key-value mapping is known as Hash Function.
In a hashing system the keys are stored in an array which is called the Hash Table. A perfectly
implemented hash table would always promise an average insert/delete/retrieval time of O(1).
Hashing Function:
A function which employs some algorithm to computes the key K for all the data elements in the set U,
such that the key K which is of a fixed size. The same key K can be used to map data to a hash table
and all the operations like insertion, deletion and searching should be possible. The values returned by
a hash function are also referred to as hash values, hash codes, hash sums, or hashes.
120
Hash Collision:
A situation when the resultant hashes for two or more data elements in the data set U, maps to the same
location in the has table, is called a hash collision. In such a situation two or more data elements would
qualify to be stored / mapped to the same location in the hash table.
Open Hashing, is a technique in which the data is not directly stored at the hash key index (k) of the Hash
table. Rather the data at the key index (k) in the hash table is a pointer to the head of the data structure
where the data is actually stored. In the most simple and common implementations the data structure
adopted for storing the element is a linked-list.
n this technique when a data needs to be searched, it might become necessary (worst case) to traverse
all the nodes in the linked list to retrieve the data.
Note that the order in which the data is stored in each of these linked lists (or other data structures) is
completely based on implementation requirements. Some of the popular criteria are insertion order,
frequency of access etc.
In this technique a hash table with pre-identified size is considered. All items are stored in the hash
table itself. In addition to the data, each hash bucket also maintains the three states: EMPTY,
OCCUPIED, DELETED. While inserting, if a collision occurs, alternative cells are tried until an empty
bucket is found. For which one of the following technique is adopted.
121
1. Liner Probing
2. Quadratic probing
3. Double hashing (in short in case of collision another hashing function is used with the key value
as an input to identify where in the open addressing scheme the data should actually be stored.)
122
123