
Data Structures

DATA STRUCTURES (C53PC2)

Course Objective:
To understand the significance of data structures and apply them in building efficient algorithms.
Course Outcomes:
After completion of this course, the student will be able to
1. Understand the concepts of time and space complexities.
2. Understand the concept of Abstract Data Type.
3. Choose appropriate data structures to represent data items in real world problems.
4. Analyze the time and space complexities of algorithms.
5. Design programs using a variety of data structures such as stacks, queues, hash tables, binary trees, search trees, heaps, graphs, and B-trees, and implement various searching and sorting techniques.

UNIT I
Basic Concepts
Data objects and Structures,
Algorithm Specification-Introduction,
Recursive algorithms,
Data abstraction,
Performance analysis-Time complexity and Space complexity,
Asymptotic Notation-Big O, Omega and Theta notations,
Complexity Analysis Examples,
Introduction to Linear and Non-Linear data structures.

1 Basic Concepts

1.1 Data objects and Structures


The term data structure is used to describe the way data is stored, and the term algorithm is used to describe the way data is processed. Data structures and algorithms are interrelated: choosing a data structure affects the kind of algorithm you might use, and choosing an algorithm affects the data structures you use.


Data structure is a representation of logical relationship existing between individual elements of


data. In other words, a data structure defines a way of organizing all data items that considers not
only the elements stored but also their relationship to each other. The term data structure is used to
describe the way data is stored.

To develop a program for an algorithm, we should select an appropriate data structure for that algorithm. This relationship is expressed as:

Algorithm + Data structure = Program

A data structure is said to be linear if its elements form a sequence or a linear list. The linear data
structures like an array, stacks, queues and linked lists organize data in linear order. A data structure
is said to be nonlinear if its elements form a hierarchical classification where, data items appear at
various levels.

Trees and graphs are widely used non-linear data structures. Tree and graph structures represent hierarchical relationships between individual data elements. Graphs are nothing but trees with certain restrictions removed.

Data structures are divided into two types:

• Primitive data structures.

• Non-primitive data structures.

Primitive Data Structures are the basic data structures that directly operate upon the machine
instructions. They have different representations on different computers. Integers, floating point
numbers, character constants, string constants and pointers come under this category.

Non-primitive data structures are more complicated data structures and are derived from primitive data structures. They emphasize grouping the same or different data items with a relationship between each data item. Arrays, lists and files come under this category. Figure 1.1 shows the classification of data structures.

Figure 1.1 Classification of Data Structure


1.1.2 Data structures: Organization of data

The collection of data you work with in a program has some kind of structure or organization. No matter how complex your data structures are, they can be broken down into two fundamental types:

• Contiguous

• Non-Contiguous.

In contiguous structures, items of data are kept together in memory (either RAM or in a file). An array is an example of a contiguous structure, since each element in the array is located next to one or two other elements. In contrast, items in a non-contiguous structure are scattered in memory, but are linked to each other in some way. A linked list is an example of a non-contiguous data structure. Here, the nodes of the list are linked together using pointers stored in each node. Figure 1.2 below illustrates the difference between contiguous and non-contiguous structures.

Contiguous structures:

Contiguous structures can be broken down further into two kinds: those that contain data items of all the same size, and those where the size may differ. Figure 1.2 shows an example of each kind. The first kind is called the array. Figure 1.3(a) shows an example of an array of numbers. In an array, each element is of the same type, and thus has the same size.

The second kind of contiguous structure is called a struct. Figure 1.3(b) shows a simple structure consisting of a person's name and age. In a struct, elements may be of different data types and thus may have different sizes.

For example, a person's age can be represented with a simple integer that occupies two bytes of memory. But his or her name, represented as a string of characters, may require many bytes and may even be of varying length.

Coupled with the atomic types (that is, the single data-item built-in types such as integer, float and pointers), arrays and structs provide all the "mortar" you need to build more exotic forms of data structure, including the non-contiguous forms.


Non-contiguous structures:

Non-contiguous structures are implemented as a collection of data-items, called nodes, where each
node can point to one or more other nodes in the collection. The simplest kind of non-contiguous
structure is linked list.

A linked list represents a linear, one-dimensional type of non-contiguous structure, where there is only the notion of backwards and forwards. A tree such as shown in figure 1.4(b) is an example of a two-dimensional non-contiguous structure. Here, there is the notion of up and down and left and right.

In a tree each node has only one link that leads into the node and links can only go down the tree.
The most general type of non-contiguous structure, called a graph has no such restrictions. Figure
1.4(c) is an example of a graph.

Hybrid structures:

If two basic types of structures are mixed then it is a hybrid form, with one part contiguous and another part non-contiguous. For example, figure 1.5 shows how to implement a doubly linked list using three parallel arrays, possibly stored apart from each other in memory.


The array D contains the data for the list, whereas the arrays P and N hold the previous and next "pointers". The pointers are actually nothing more than indexes into the D array. For instance, D[i] holds the data for node i and P[i] holds the index of the node previous to i, which may or may not reside at position i-1. Likewise, N[i] holds the index of the next node in the list.
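A minimal C++ sketch of this parallel-array idea (the capacity, the NIL sentinel and the traversal function are illustrative assumptions, not part of the figure):

#include <iostream>

const int MAX = 10;   // capacity of the parallel arrays (assumed)
const int NIL = -1;   // sentinel index meaning "no node"

int D[MAX];           // data for each node
int P[MAX];           // index of the previous node
int N[MAX];           // index of the next node

// Walk the list forward starting from the node stored at index head.
void printForward(int head)
{
    for (int i = head; i != NIL; i = N[i])
        std::cout << D[i] << " ";
    std::cout << "\n";
}

int main()
{
    // Build the list 10 <-> 20 <-> 30 stored at indexes 2, 0 and 5,
    // showing that logically adjacent nodes need not be adjacent in memory.
    D[2] = 10; P[2] = NIL; N[2] = 0;
    D[0] = 20; P[0] = 2;   N[0] = 5;
    D[5] = 30; P[5] = 0;   N[5] = NIL;

    printForward(2);   // prints: 10 20 30
    return 0;
}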

1.2 Algorithm Specification-Introduction


An Algorithm is a finite sequence of instructions, each of which has a clear meaning and can be
performed with a finite amount of effort in a finite length of time. No matter what the input values
may be, an algorithm terminates after executing a finite number of instructions.

An Algorithm is a sequence of unambiguous instructions which can be used to solve the problem. In
addition, every algorithm must satisfy the following criteria.

Input: there are zero or more quantities, which are externally supplied.

Output: at least one quantity is produced.

Definiteness: each instruction must be clear and unambiguous.

Finiteness: if we trace out the instructions of an algorithm, then for all cases the algorithm will
terminate after a finite number of steps.

Effectiveness: every instruction must be sufficiently basic that it can in principle be carried out by a
person using only pencil or paper. It is not enough that each operation be definite, but it must also
be feasible.

Practical Algorithm design issues:

There are some criteria that we need to take care of while designing an algorithm for any process:

1) Try to save time (Time Complexity)
2) Try to save space (Space Complexity)
3) Try to save face

A program that runs faster is a better program, so saving time is an obvious goal. Likewise, a program that saves space over a competing program is considered desirable.


1.3 Recursive algorithms


A recursive function or method invokes itself.
When a function is called within the same function, it is known as recursion in C++. The function which calls the same function is known as a recursive function.
In direct recursion the code for function f contains a statement that invokes f, whereas in indirect recursion the function f invokes some other function g, which invokes yet another function h, and so on until function f is again invoked.
1.3.1 Difference between Recursion and Iteration:
1. A function is said to be recursive if it calls itself again and again within its body whereas
iterative functions are loop based imperative functions.
2. Recursion uses stack whereas iteration does not use stack.
3. Recursion uses more memory than iteration as its concept is based on stacks.
4. Recursion is comparatively slower than iteration due to overhead condition of maintaining
stacks.
5. Recursion makes code smaller and iteration makes code longer.
6. Iteration terminates when the loop-continuation condition fails whereas recursion terminates when a base case is recognized.
7. While using recursion, multiple activation records are created on the stack for each call, whereas in iteration everything is done in one activation record.
8. Infinite recursion can crash the system whereas infinite looping uses CPU cycles repeatedly.
9. Recursion uses a selection structure whereas iteration uses a repetition structure.
1.3.2 Types of Recursions:

Recursion is of two types depending on whether a function calls itself from within itself or
whether two functions call one another mutually. The former is called direct recursion and
the latter is called indirect recursion.
Thus, there are two types of recursions:
 Direct Recursion
 Indirect Recursion
Recursion may be further categorized as:
 Linear Recursion
 Binary Recursion
 Multiple Recursion
Linear Recursion: It is the most common type of recursion, in which a function calls itself repeatedly until the base condition [termination case] is reached. Once the base case is reached


the results are returned to the caller function. If a recursive function calls itself only once, then it is called linear recursion.
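A small C++ illustration of linear (direct) recursion using the usual factorial example (the function name is illustrative):

#include <iostream>

// Linear recursion: each call makes at most one recursive call,
// and the base condition n <= 1 terminates the chain of calls.
long factorial(int n)
{
    if (n <= 1)                       // base case (termination case)
        return 1;
    return n * factorial(n - 1);      // the function calls itself once
}

int main()
{
    std::cout << factorial(5) << std::endl;   // prints 120
    return 0;
}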

1.4 Data abstraction


Data abstraction refers to providing only essential information to the outside world and
hiding their background details, i.e., to represent the needed information in program without
presenting the details.
Data abstraction is a programming (and design) technique that relies on the separation of
interface and implementation.
Let's take one real life example of a TV, which you can turn on and off, change the channel,
adjust the volume, and add external components such as speakers, VCRs, and DVD players,
BUT you do not know its internal details, that is, you do not know how it receives signals
over the air or through a cable, how it translates them, and finally displays them on the
screen.
Thus, we can say a television clearly separates its internal implementation from its external
interface and you can play with its interfaces like the power button, channel changer, and
volume control without having any knowledge of its internals.
In C++, classes provide a great level of data abstraction. They provide sufficient public methods to the outside world to play with the functionality of the object and to manipulate object data, i.e., state, without actually knowing how the class has been implemented internally.
For example, your program can make a call to the sort() function without knowing what
algorithm the function actually uses to sort the given values. In fact, the underlying
implementation of the sorting functionality could change between releases of the library, and
as long as the interface stays the same, your function call will still work.
In C++, we use classes to define our own abstract data types (ADT). You can use
the cout object of class ostream to stream data to standard output like this −
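A minimal example of such a statement (the message text is arbitrary):

#include <iostream>
using namespace std;

int main()
{
    // cout is an object of class ostream; << streams data to standard output.
    cout << "Hello, world" << endl;
    return 0;
}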

1.4.1 Access Labels Enforce Abstraction

In C++, we use access labels to define the abstract interface to the class. A class may contain
zero or more access labels −
 Members defined with a public label are accessible to all parts of the program. The
data-abstraction view of a type is defined by its public members.
 Members defined with a private label are not accessible to code that uses the class.
The private sections hide the implementation from code that uses the type.
There are no restrictions on how often an access label may appear. Each access label
specifies the access level of the succeeding member definitions. The specified access level
remains in effect until the next access label is encountered or the closing right brace of the
class body is seen.

1.4.2 Benefits of Data Abstraction

Data abstraction provides two important advantages −


 Class internals are protected from inadvertent user-level errors, which might corrupt
the state of the object.
 The class implementation may evolve over time in response to changing requirements
or bug reports without requiring change in user-level code.
By defining data members only in the private section of the class, the class author is free to make changes in the data. If the implementation changes, only the class code needs to be examined to see what effect the change may have. If data is public, then any function that directly accesses the data members of the old representation might be broken.
Ex:

#include <iostream>
using namespace std;

class Adder {
public:
    // constructor
    Adder(int i = 0) {
        total = i;
    }

    // interface to the outside world
    void addNum(int number) {
        total += number;
    }

    // interface to the outside world
    int getTotal() {
        return total;
    }

private:
    // hidden data from the outside world
    int total;
};

int main() {
    Adder a;

    a.addNum(10);
    a.addNum(20);
    a.addNum(30);

    cout << "Total " << a.getTotal() << endl;
    return 0;
}


1.5 Performance analysis-Time complexity and Space complexity


The performance of a program is the amount of computer memory it consumes and the time required to execute the program.
We use two approaches to determine the performance of a program: one is the analytical method and the other is the experimental method. In performance analysis we use analytical methods, while in performance measurement we conduct experiments.
1.5.1 Time Complexity:
The Time complexity of a program is the amount of computer time it needs to run to
completion.
The limiting behaviour of the complexity as the input size increases is called the asymptotic time complexity. It is the asymptotic complexity of an algorithm which ultimately determines the size of problems that can be solved by the algorithm.
1.5.2 Space Complexity:
The space complexity of the program is the amount of memory it needs to run to completion.
The space needed by a program has the following components.
1. Instruction Space: Instruction space is the space needed to store the compiled version of the program instructions. The amount of instruction space that is needed depends on factors such as:
a. The compiler used to compile the program into machine code.
b. The compiler options in effect at the time of compilation.
c. The target computer.
2. Data Space: Data space has two components:
a. Space needed by constants and simple variables in program.
b. Space needed by dynamically allocated objects such as arrays and class
instances.
3. Environment Stack Space: The Environment stack is used to save information needed
to resume execution of partially completed functions.

1.6 Asymptotic Notation-Big O, Omega and Theta notations


Big O Notation
We express complexity using big-O notation, which is used to measure running time.
Basically, it tells you how fast a function grows or declines. Big O specifically describes the worst-case scenario, and can be used to describe the execution time required or the space used by an algorithm.


For example, when analyzing some algorithm, one might find that the time (or the number of steps) it takes to complete a problem of size n is given by T(n) = 4n^2 - 2n + 2. If we ignore constants (which makes sense because those depend on the particular hardware the program is run on) and slower-growing terms, we could say "T(n) grows at the order of n^2" and write T(n) = O(n^2).
T(n) = O(f(n)) (pronounced "order of" or "big oh") says that the growth rate of T(n) is less than or equal to (<=) that of f(n).
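For instance, for all n >= 1 we have 4n^2 - 2n + 2 <= 4n^2 + 2n^2 = 6n^2, so choosing c = 6 and n0 = 1 shows directly that T(n) = O(n^2); these particular constants are one convenient choice, not the only one.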
Omega Notation (Ω)
Ω notation provides an asymptotic lower bound. It is useful for finding the best-case time an algorithm can take.

Ω(f(n)) = { g(n) : there exist constants c > 0 and n0 such that g(n) >= c·f(n) for all n >= n0 }.
Theta Notation (Θ)
This notation describes asymptotically tight bounds.


A theoretical measure of the execution of an algorithm, usually the time or memory needed, given the problem size n, which is usually the number of items. Informally, saying f(n) = Θ(g(n)) means that f(n) is within a constant multiple of g(n). The equation is read "f of n is theta g of n".

f(n) = Θ(g(n)) means there are positive constants c1, c2, and k such that 0 <= c1·g(n) <= f(n) <= c2·g(n) for all n >= k. The values of c1, c2 and k must be fixed for the function f and must not depend on n.

1.7 Complexity Analysis Examples
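As a hedged illustration of how such an analysis proceeds (the function names are made up for this sketch), counting how often the dominant statement executes gives the order of growth:

#include <vector>

// The loop body runs n times for a vector of n elements,
// so the running time grows linearly: O(n) time, O(1) extra space.
int sum(const std::vector<int>& a)
{
    int s = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        s += a[i];
    return s;
}

// The inner loop runs up to n times for each of the n outer iterations,
// giving roughly n * n basic steps: O(n^2) time, O(1) extra space.
int countPairs(const std::vector<int>& a)
{
    int count = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = i + 1; j < a.size(); ++j)
            ++count;
    return count;
}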

1.8 Introduction to Linear and Non-Linear data structures.


UNIT-II
2.1 Sparse matrices-array and linked representations.
2.2 Linear list ADT-array representation and linked list representation,
2.3 Singly Linked Lists-Operations Insertion, Deletion,
2.4 Circular linked lists-Operations for Circular linked lists,
2.5 Doubly Linked Lists Operations-Insertion, Deletion.
2.6 Stack ADT, definition, array and linked list implementations,
2.7 applications-infix to postfix conversion, Postfix expression evaluation,
2.8 recursion implementation,
2.9 Queue ADT, definition, array and linked list, Implementations,
2.10 Circular Queues-Insertion and deletion operations,
2.11 Polynomial.


2.1 Sparse Matrix:

A matrix is a two-dimensional data object made of m rows and n columns, therefore having
total m x n values. If most of the elements of the matrix have 0 value, then it is called a
sparse matrix.
0 0 3 0 4
0 0 5 7 0
0 0 0 0 0
0 2 6 0 0
Representing a sparse matrix by a 2D array leads to wastage of lots of memory as zeroes in
the matrix are of no use in most of the cases. So, instead of storing zeroes with non-zero
elements, we only store non-zero elements. This means storing non-zero elements
with triples- (Row, Column, value).
Sparse Matrix Representations can be done in many ways following are two common
representations:
1. Array representation
2. Linked list representation
Method 1: Using Arrays:
A 2D array is used to represent a sparse matrix, in which there are three rows named as
 Row: Index of row, where non-zero element is located
 Column: Index of column, where non-zero element is located
 Value: Value of the non zero element located at index – (row,column)

// C++ program for Sparse Matrix Representation
// using Array
#include <iostream>
using namespace std;

int main()
{
    // Assume 4x5 sparse matrix
    int sparseMatrix[4][5] =
    {
        {0 , 0 , 3 , 0 , 4 },
        {0 , 0 , 5 , 7 , 0 },
        {0 , 0 , 0 , 0 , 0 },
        {0 , 2 , 6 , 0 , 0 }
    };

    int size = 0;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 5; j++)
            if (sparseMatrix[i][j] != 0)
                size++;

    // number of columns in compactMatrix (size) must be
    // equal to number of non-zero elements in sparseMatrix
    // (note: a variable-length array like this is a compiler
    // extension, not standard C++)
    int compactMatrix[3][size];

    // Making of new matrix
    int k = 0;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 5; j++)
            if (sparseMatrix[i][j] != 0)
            {
                compactMatrix[0][k] = i;
                compactMatrix[1][k] = j;
                compactMatrix[2][k] = sparseMatrix[i][j];
                k++;
            }

    for (int i = 0; i < 3; i++)
    {
        for (int j = 0; j < size; j++)
            cout << " " << compactMatrix[i][j];
        cout << "\n";
    }
    return 0;
}

Method 2: Using Linked Lists


In linked list, each node has four fields. These four fields are defined as:
 Row: Index of row, where non-zero element is located
 Column: Index of column, where non-zero element is located
 Value: Value of the non zero element located at index – (row,column)
 Next node: Address of the next node

// C++ program for sparse matrix representation.
// Using Link list
#include <iostream>
using namespace std;

// Node class to represent link list
class Node
{
public:
    int row;
    int col;
    int data;
    Node *next;
};

// Function to create new node
void create_new_node(Node **p, int row_index, int col_index, int x)
{
    Node *temp = *p;
    Node *r;

    // If link list is empty then
    // create first node and assign value.
    if (temp == NULL)
    {
        temp = new Node();
        temp->row = row_index;
        temp->col = col_index;
        temp->data = x;
        temp->next = NULL;
        *p = temp;
    }
    // If link list is already created
    // then append newly created node
    else
    {
        while (temp->next != NULL)
            temp = temp->next;

        r = new Node();
        r->row = row_index;
        r->col = col_index;
        r->data = x;
        r->next = NULL;
        temp->next = r;
    }
}

// Function prints contents of linked list
// starting from start
void printList(Node *start)
{
    Node *ptr = start;
    cout << "row_position:";
    while (ptr != NULL)
    {
        cout << ptr->row << " ";
        ptr = ptr->next;
    }
    cout << endl;

    cout << "column_position:";
    ptr = start;
    while (ptr != NULL)
    {
        cout << ptr->col << " ";
        ptr = ptr->next;
    }
    cout << endl;

    cout << "Value:";
    ptr = start;
    while (ptr != NULL)
    {
        cout << ptr->data << " ";
        ptr = ptr->next;
    }
}

// Driver Code
int main()
{
    // 4x5 sparse matrix
    int sparseMatrix[4][5] = { { 0 , 0 , 3 , 0 , 4 },
                               { 0 , 0 , 5 , 7 , 0 },
                               { 0 , 0 , 0 , 0 , 0 },
                               { 0 , 2 , 6 , 0 , 0 } };

    // Creating head/first node of list as NULL
    Node *first = NULL;
    for (int i = 0; i < 4; i++)
    {
        for (int j = 0; j < 5; j++)
        {
            // Pass only those values which
            // are non - zero
            if (sparseMatrix[i][j] != 0)
                create_new_node(&first, i, j, sparseMatrix[i][j]);
        }
    }
    printList(first);

    return 0;
}

2.2 Linear list ADT-array representation and linked list representation


Data Structure is a particular way of storing and organizing data in a computer so that it can
be used efficiently.
Different kinds of data structures are suited to different kinds of applications.
Storing and retrieving can be carried out on data stored in both memory and in secondary
memory.
Way in which data are stored for efficient search and retrieval.
The simplest data structure is the one-dimensional (linear) array.
Data items stored non-consecutively in memory may be linked by pointers.
Many algorithms have been developed for storing data efficiently.
An algorithm is a step-by-step procedure for calculations.
An algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function.
The transition from one state to the next is not necessarily deterministic: some algorithms incorporate random input.
A procedure that produces the answer to a question or the solution to a problem in a finite number of steps.
An algorithm that produces a yes or no answer is called a decision procedure; one that leads to a solution is a computation procedure.
Primitive/Non-Primitive
Basic Data structures available / derived from Primitive Data Structures.
Homogeneous/Heterogeneous
Elements are of the same type/Different types


Static/Dynamic
Memory is allocated at the time of compilation/run-time
Linear/Non-Linear
Maintain a Linear relationship between element
Problem solving with a computer means processing data
To process data, we need to define the data type and the operation to be performed on the
data
The definition of the data type and the definition of the operation to be applied to the data is
part of the idea behind an Abstract Data Type (ADT)
The user of an ADT needs only to know that a set of operations are available for the data
type, but does not need to know how they are applied.
Several simple ADTs such as integer, real, character, pointer and so on, have been
implemented are available for use in most languages.
A data type is characterized by:
A set of values
A Data representation, which is common to all these values, and
A set of operations, which can be applied uniformly to all these values.
Languages like C provide the following primitive data types:
Boolean
Char, byte, int
Float, double
Each primitive data type has:
A set of values
A Data type representation
A set of operations
In computer science, an abstract data type (ADT) is a mathematical model for a certain class
of data structures that have similar behaviour.
An abstract data type is defined indirectly, only by the operations that may be performed on it
and by mathematical constraints on the effects (and possibly cost) of those operations.
An ADT may be implemented by specific data types or data structures, in many ways and in
many programming languages; or described in a formal specification language.
Example an abstract stack could be defined by three operations:
Push, that inserts some data item onto the structure,


Pop, that extracts an item from it, and


Peek, that allows data on top of the structure to be examined without removal.
ADT in simple words:
Is a set of operations
Mathematical abstraction
No Implementation details
Example:
Lists, sets, graphs, stacks are examples of ADT along with their operations.
Modularity:
Divide program into small functions
Easy to debug and maintain
Easy to modify
Group work
Reuse:
Do some operations only once
Easy to change implementation:
Transparent to the program
A Data representation
Must be able to represent all necessary values of the ADT
Should be private
An algorithm for each of the necessary operation:
Must be consistent with the chosen representation
All auxiliary (helper) operations that are not in the contract should be private
The List ADT
The List is an ordered sequence of data items called elements
A1, A2, A3, ..., An is a list of size N
The size of an empty list is 0
Ai+1 succeeds Ai
Ai-1 precedes Ai


Position of Ai is i
The first element A1 is called the "head"
The last element An is called the "tail"
Operations on Lists
MakeEmpty
PrintList
Find
FindKth
Insert
Delete
Next
Previous
List: An Example
The elements of the list are 34, 12, 52, 16, 12
Find(52)->3
Insert(20,4)-> 34,12,52,20,16,12
Delete(52)-> 34,12,20,16,12
FindKth(3)-> 20
List can be implemented using:
Arrays
Linked List
Cursor [Linked List using Arrays]
Arrays
Array is a static data structure that represents a collection of fixed number of
homogenous data items or
A fixed sized indexed sequence of elements, all of the same type.
The individual elements are typically stored in consecutive memory locations
The length of the array is determined when the array is created and cannot be changed.
Any component of the array can be inspected or updated by using its index.
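A hedged sketch of the array implementation of the List ADT, reproducing the example above; the capacity, the 1-based positions and the member names are assumptions made for illustration:

#include <iostream>

const int CAP = 100;               // fixed capacity of the underlying array (assumed)

struct ArrayList {
    int data[CAP];
    int size = 0;                  // current number of elements

    // Insert x at 1-based position pos, shifting later elements right.
    void insert(int x, int pos) {
        for (int i = size; i >= pos; --i)
            data[i] = data[i - 1];
        data[pos - 1] = x;
        ++size;
    }

    // Return the 1-based position of x, or 0 if it is not present.
    int find(int x) const {
        for (int i = 0; i < size; ++i)
            if (data[i] == x) return i + 1;
        return 0;
    }

    // Remove the first occurrence of x, shifting later elements left.
    void remove(int x) {
        int pos = find(x);
        if (pos == 0) return;
        for (int i = pos - 1; i < size - 1; ++i)
            data[i] = data[i + 1];
        --size;
    }
};

int main() {
    ArrayList list;                        // 34, 12, 52, 16, 12 as in the example
    int values[] = {34, 12, 52, 16, 12};
    for (int i = 0; i < 5; ++i) list.insert(values[i], i + 1);

    std::cout << list.find(52) << "\n";    // 3
    list.insert(20, 4);                    // 34, 12, 52, 20, 16, 12
    list.remove(52);                       // 34, 12, 20, 16, 12
    std::cout << list.data[2] << "\n";     // FindKth(3) -> 20
    return 0;
}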


https://www.slideshare.net/ayyakathir/unit-i-linear-data-structures-list

2.3 Singly Linked Lists-Operations Insertion, Deletion,


2.4 Circular linked lists-Operations for Circular linked lists,
2.5 Doubly Linked Lists Operations-Insertion, Deletion.
2.6 Stack ADT, definition, array and linked list implementations,
2.7 applications-infix to postfix conversion, Postfix expression evaluation,
2.8 recursion implementation,
2.9 Queue ADT, definition, array and linked list, Implementations,
2.10 Circular Queues-Insertion and deletion operations,
2.11 Polynomial.
A polynomial p(x) is an expression in the variable x of the form (a·x^n + b·x^(n-1) + ... + jx + k), where a, b, c, ..., k are real numbers and 'n' is a non-negative integer called the degree of the polynomial.

An essential characteristic of the polynomial is that each term in the polynomial


expression consists of two parts:

 one is the coefficient


 other is the exponent
Ex:
5x^3 + 4x^2 + 10x + 1

10x^2 + 26x, here 10 and 26 are coefficients and 2 and 1 are their exponents.
Points to keep in Mind while working with Polynomials:

 The sign of each coefficient and exponent is stored within the coefficient and the exponent itself
 Addition of terms having equal exponents is possible
 The storage allocation for each term in the polynomial must be done in ascending or descending order of their exponent


Polynomial can be represented in the various ways. These are:

 By the use of arrays


 By the use of Linked List

There may arise situations where you need to evaluate many polynomial expressions and perform basic arithmetic operations like addition and subtraction on them. For this, you will need a way to represent those polynomials. The simple way is to represent a polynomial of degree 'n' by storing the coefficients of its n+1 terms in an array. So every array element will consist of two values:

 Coefficient and
 Exponent

A polynomial can be represented using the C++ code:
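A minimal sketch of such a representation (the struct and field names, and the evaluation loop, are illustrative assumptions):

#include <iostream>

// One term of the polynomial: coefficient and exponent.
struct Term {
    int coeff;
    int exp;
};

int main()
{
    // 5x^3 + 4x^2 + 10x + 1 stored term by term in descending
    // order of exponent, as the points above require.
    Term poly[] = { {5, 3}, {4, 2}, {10, 1}, {1, 0} };
    int terms = 4;

    // Evaluate the polynomial at x = 2 using the stored pairs.
    int x = 2, value = 0;
    for (int i = 0; i < terms; ++i) {
        int power = 1;
        for (int j = 0; j < poly[i].exp; ++j) power *= x;
        value += poly[i].coeff * power;
    }
    std::cout << "p(2) = " << value << "\n";   // 5*8 + 4*4 + 10*2 + 1 = 77
    return 0;
}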

UNIT III
3.1 Trees
3.1.1 Definition, terminology,
3.1.2 Binary trees-definition,
3.1.3 Properties of Binary Trees, Binary Tree ADT,
3.1.4 representation of Binary Trees-array and linked representations,
3.1.5 Binary Tree traversals,
3.1.6 threaded binary trees,
3.2 Priority Queues –Definition and applications,
3.2.1 Max Priority Queue ADT implementation-Max Heap-Definition,
3.2.2 Insertion into a Max Heap, Deletion from a Max Heap.
3.3 Disjoint set ADT -Equivalence relations,
3.3.1 the dynamic equivalence problem,
3.3.2 Basic data structure,
3.3.3 Smart union algorithms, Path compression,


3.3.4 worst case for union by rank and path compression, and an application -generation of
mazes.

Tree - Terminology
In a linear data structure, data is organized in sequential order, while in a non-linear data structure data is organized in a non-sequential (hierarchical) order. A tree is a very popular non-linear data structure used in a wide range of applications. A tree data structure can be defined as follows...

Tree is a non-linear data structure which organizes data in hierarchical


structure and this is a recursive definition.

A tree data structure can also be defined as follows...

Tree data structure is a collection of data (Node) which is organized in


hierarchical structure recursively

In tree data structure, every individual element is called as Node. Node in a tree
data structure stores the actual data of that particular element and link to next
element in hierarchical structure.

In a tree data structure, if we have N number of nodes then we can have a maximum
of N-1 number of links.

Example

Terminology
In a tree data structure, we use the following terminology...

1. Root
In a tree data structure, the first node is called as Root Node. Every tree must have
a root node. We can say that the root node is the origin of the tree data structure. In


any tree, there must be only one root node. We never have multiple root nodes in a
tree.

2. Edge
In a tree data structure, the connecting link between any two nodes is called
as EDGE. In a tree with 'N' number of nodes there will be a maximum of 'N-1'
number of edges.

3. Parent
In a tree data structure, the node which is a predecessor of any node is called
as PARENT NODE. In simple words, the node which has a branch from it to any
other node is called a parent node. Parent node can also be defined as "The node
which has child / children".


4. Child
In a tree data structure, the node which is descendant of any node is called
as CHILD Node. In simple words, the node which has a link from its parent node is
called as child node. In a tree, any parent node can have any number of child nodes.
In a tree, all the nodes except root are child nodes.

5. Siblings
In a tree data structure, nodes which belong to same Parent are called as SIBLINGS.
In simple words, the nodes with the same parent are called Sibling nodes.


6. Leaf
In a tree data structure, the node which does not have a child is called as LEAF
Node. In simple words, a leaf is a node with no child.

In a tree data structure, the leaf nodes are also called as External Nodes. External
node is also a node with no child. In a tree, leaf node is also called as ' Terminal'
node.

7. Internal Nodes
In a tree data structure, the node which has at least one child is called as INTERNAL Node. In simple words, an internal node is a node with at least one child.

In a tree data structure, nodes other than leaf nodes are called as Internal
Nodes. The root node is also said to be Internal Node if the tree has more than
one node. Internal nodes are also called as 'Non-Terminal' nodes.


8. Degree
In a tree data structure, the total number of children of a node is called
as DEGREE of that Node. In simple words, the Degree of a node is total number of
children it has. The highest degree of a node among all the nodes in a tree is called
as 'Degree of Tree'

9. Level
In a tree data structure, the root node is said to be at Level 0 and the children of root
node are at Level 1 and the children of the nodes which are at Level 1 will be at
Level 2 and so on... In simple words, in a tree each step from top to bottom is called
as a Level and the Level count starts with '0' and is incremented by one at each level
(Step).


10. Height
In a tree data structure, the total number of edges from leaf node to a particular
node in the longest path is called as HEIGHT of that Node. In a tree, height of the
root node is said to be height of the tree. In a tree, height of all leaf nodes is
'0'.

11. Depth
In a tree data structure, the total number of edges from root node to a particular
node is called as DEPTH of that Node. In a tree, the total number of edges from root
node to a leaf node in the longest path is said to be Depth of the tree. In simple
words, the highest depth of any leaf node in a tree is said to be depth of that tree. In
a tree, depth of the root node is '0'.


12. Path
In a tree data structure, the sequence of Nodes and Edges from one node to another
node is called as PATH between that two Nodes. Length of a Path is total number
of nodes in that path. In below example the path A - B - E - J has length 4.

13. Sub Tree


In a tree data structure, each child from a node forms a subtree recursively. Every
child node will form a subtree on its parent node.


Tree Representations

A tree data structure can be represented in two methods. Those methods are as

follows...

1. List Representation

2. Left Child - Right Sibling Representation

1. List Representation
In this representation, we use two types of nodes one for representing the node with
data called 'data node' and another for representing only references called
'reference node'. We start with a 'data node' from the root node in the tree. Then it is
linked to an internal node through a 'reference node' which is further linked to any
other node directly. This process repeats for all the nodes in the tree.
The above example tree can be represented using List representation as follows...


2. Left Child - Right Sibling Representation


In this representation, we use a list with one type of node which consists of three
fields namely Data field, Left child reference field and Right sibling reference field.
Data field stores the actual value of a node, left reference field stores the address of
the left child and right reference field stores the address of the right sibling node.
Graphical representation of that node is as follows...

In this representation, every node's data field stores the actual value of that node. If that node has a left child, then the left reference field stores the address of that left child node; otherwise it stores NULL. If that node has a right sibling, then the right reference field stores the address of the right sibling node; otherwise it stores NULL.
The above example tree can be represented using Left Child - Right Sibling
representation as follows...
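A minimal C++ declaration matching this description (the type and field names are illustrative):

// Left Child - Right Sibling representation of a general tree node.
struct TreeNode
{
    int data;                 // actual value of the node
    TreeNode *leftChild;      // address of the left (first) child, or NULL
    TreeNode *rightSibling;   // address of the right sibling, or NULL
};

All children of a node are then reached by following leftChild once and then rightSibling repeatedly.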


Binary Tree Data structure

In a normal tree, every node can have any number of children. A binary tree is a

special type of tree data structure in which every node can have a maximum of 2

children. One is known as a left child and the other is known as right child.

A tree in which every node can have a maximum of two children is called

Binary Tree.


There are different types of binary trees and they are...

1. Strictly Binary Tree


In a binary tree, every node can have a maximum of two children. But in strictly
binary tree, every node should have exactly two children or none. That means every
internal node must have exactly two children. A strictly Binary Tree can be defined
as follows...

A binary tree in which every node has either two children or none is called a Strictly Binary Tree

Strictly binary tree is also called as Full Binary Tree or Proper Binary Tree or 2-
Tree


Strictly binary tree data structure is used to represent mathematical expressions.

Example

2. Complete Binary Tree


In a binary tree, every node can have a maximum of two children. But in a strictly binary tree, every node should have exactly two children or none, and in a complete binary tree all the nodes must have exactly two children and at every level there must be 2^level number of nodes. For example, at level 2 there must be 2^2 = 4 nodes and at level 3 there must be 2^3 = 8 nodes.

A binary tree in which every internal node has exactly two children and all
leaf nodes are at same level is called Complete Binary Tree.

Complete binary tree is also called as Perfect Binary Tree


3. Extended Binary Tree


A binary tree can be converted into Full Binary tree by adding dummy nodes to
existing nodes wherever required.

The full binary tree obtained by adding dummy nodes to a binary tree is
called as Extended Binary Tree.

In above figure, a normal binary tree is converted into full binary tree by adding
dummy nodes (In pink colour).


A tree is an abstract data type that stores elements hierarchically. With the
exception of the top element, each element in a tree has a parent element and
zero or
more children elements. A tree is usually visualized by placing elements inside
ovals or rectangles, and by drawing the connections between parents and
children
with straight lines. (See Figure 7.2.) We typically call the top element the root
of the tree, but it is drawn as the highest element, with the other elements being
connected below (just the opposite of a botanical tree)

Formal Tree Definition


Formally, we define tree T to be a set of nodes storing elements in a parent-
child relationship with the following properties:
• If T is nonempty, it has a special node, called the root of T, that has no parent.
• Each node v of T different from the root has a unique parent node w; every
node with parent w is a child of w.
Note that according to our definition, a tree can be empty, meaning that it
doesn’t have any nodes. This convention also allows us to define a tree
recursively, such that a tree T is either empty or consists of a node r, called the
root of T, and a (possibly empty) set of trees whose roots are the children of r.
Two nodes that are children of the same parent are siblings. A node v is
external if v has no children. A node v is internal if it has one or more children.
External nodes are also known as leaves.
Example 7.1: In most operating systems, files are organized hierarchically into
nested directories (also called folders), which are presented to the user in the
form of a tree. More specifically, the internal nodes of the tree are associated
with directories and the external nodes are associated with regular files.
In the UNIX and Linux operating systems, the root of the tree is appropriately
called the “root directory,” and is represented by the symbol “/.”


UNIT IV
4.1 Searching
4.1.1 Linear Search,
4.1.2 Binary Search,
4.1.3 Hashing-Introduction,
4.1.4 hash tables,
4.1.5 hash functions,
4.1.6 Overflow Handling,
4.1.7 Comparison of Searching methods.
4.2 Sorting:
4.2.1 Insertion Sort,
4.2.2 Selection Sort,
4.2.3 Radix Sort,
4.2.4 Quick sort,
4.2.5 Heap Sort,
4.2.6 Merge sort,
4.2.7 Comparison of Sorting methods.

4.3 External sorting-Model for external sorting,

Model for External Sorting

The wide variety of mass storage devices makes external sorting much more device dependent than
internal sorting. The algorithms that we will consider work on tapes, which are probably the most
restrictive storage medium. Since access to an element on tape is done by winding the tape to the
correct location, tapes can be efficiently accessed only in sequential order (in either direction). We
will assume that we have at least three tape drives to perform the sorting. We need two drives to do
an efficient sort; the third drive simplifies matters. If only one tape drive is present, then we are in
trouble: any algorithm will require O(n^2) tape accesses.


The Simple Algorithm

The basic external sorting algorithm uses the merge routine from merge sort. Suppose we have four
tapes, Ta1, Ta2, Tb1, Tb2, which are two input and two output tapes.

Depending on the point in the algorithm, the a and b tapes are either input tapes or output tapes.
Suppose the data is initially on Ta1. Suppose further that the internal memory can hold (and sort) m
records at a time. A natural first step is to read m records at a time from the input tape, sort the
records internally, and then write the sorted records alternately to Tb1 and Tb2. We will call each set
of sorted records a run. When this is done, we rewind all the tapes. Suppose we have the same input
as our example for Shell sort.

Now Tb1 and Tb2 contain a group of runs. We take the first run from each tape and merge them,
writing the result, which is a run twice as long, onto Ta1. Then we take the next run from each tape,
merge these, and write the result to Ta2. We continue this process, alternating between Ta1 and
Ta2, until either Tb1 or Tb2 is empty. At this point either both are empty or there is one run left. In
the latter case, we copy this run to the appropriate tape. We rewind all four tapes, and repeat the
same steps, this time using the a tapes as input and the b tapes as output. This will give runs of 4m.
We continue the process until we get one run of length n.

This algorithm will require ⌈log(n/m)⌉ passes, plus the initial run-constructing pass. For instance, if we have 10 million records of 128 bytes each, and four megabytes of internal memory, then the first pass will create 320 runs. We would then need nine more passes to complete the sort. Our example requires ⌈log(13/3)⌉ = 3 more passes, which are shown in the following figure.


Multiway Merge

If we have extra tapes, then we can expect to reduce the number of passes required to sort our
input. We do this by extending the basic (two-way) merge to a k-way merge.

Merging two runs is done by winding each input tape to the beginning of each run. Then the smaller
element is found, placed on an output tape, and the appropriate input tape is advanced. If there are
k input tapes, this strategy works the same way, the only difference being that it is slightly more
complicated to find the smallest of the k elements. We can find the smallest of these elements by
using a priority queue. To obtain the next element to write on the output tape, we perform a
delete_min operation. The appropriate input tape is advanced, and if the run on the input tape is not
yet completed, we insert the new element into the priority queue. Using the same example as
before, we distribute the input onto the three tapes
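A hedged in-memory sketch of this priority-queue-based k-way merge (std::priority_queue stands in for the delete_min structure and vectors stand in for tapes; the run contents are illustrative):

#include <iostream>
#include <queue>
#include <vector>

// Merge k sorted runs into one sorted output using a min-priority queue.
// Each queue entry remembers which run it came from and the position there.
std::vector<int> kWayMerge(const std::vector<std::vector<int>>& runs)
{
    using Entry = std::pair<int, std::pair<size_t, size_t>>;  // (value, (run, index))
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;

    for (size_t r = 0; r < runs.size(); ++r)
        if (!runs[r].empty())
            pq.push({runs[r][0], {r, 0}});

    std::vector<int> output;
    while (!pq.empty()) {
        auto [value, pos] = pq.top();          // delete_min
        pq.pop();
        output.push_back(value);               // "write to the output tape"
        auto [r, i] = pos;
        if (i + 1 < runs[r].size())            // advance that input "tape"
            pq.push({runs[r][i + 1], {r, i + 1}});
    }
    return output;
}

int main()
{
    std::vector<std::vector<int>> runs = {{11, 81, 94}, {12, 35, 96}, {15, 17, 99}};
    for (int v : kWayMerge(runs)) std::cout << v << " ";
    std::cout << "\n";
    return 0;
}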


After the initial run construction phase, the number of passes required using k-way merging is ⌈log_k(n/m)⌉, because the runs get k times as large in each pass. For the example above, the formula is verified, since ⌈log_3(13/3)⌉ = 2. If we have 10 tapes, then k = 5, and our large example from the previous section would require ⌈log_5(320)⌉ = 4 passes.

Polyphase Merge

The k-way merging strategy developed in the last section requires the use of 2k tapes. This could be
prohibitive for some applications. It is possible to get by with only k + 1 tapes. As an example, we will
show how to perform two-way merging using only three tapes

Suppose we have three tapes, T1, T2, and T3, and an input file on T1 that will produce 34 runs. One
option is to put 17 runs on each of T2 and T3. We could then merge this result onto T1, obtaining
one tape with 17 runs. The problem is that since all the runs are on one tape, we must now put
some of these runs on T2 to perform another merge. The logical way to do this is to copy the first
eight runs from T1 onto T2 and then perform the merge. This has the effect of adding an extra half
pass for every pass we do.

An alternative method is to split the original 34 runs unevenly. Suppose we put 21 runs on T2 and 13
runs on T3. We would then merge 13 runs onto T1 before T3 was empty. At this point, we could
rewind T1 and T3, and merge T1, with 13 runs, and T2, which has 8 runs, onto T3. We could then
merge 8 runs until T2 was empty, which would leave 5 runs left on T1 and 8 runs on T3. We could
then merge T1 and T3, and so on. The following table below shows the number of runs on each tape
after each pass.

The original distribution of runs makes a great deal of difference. For instance, if 22 runs are placed
on T2, with 12 on T3, then after the first merge, we obtain 12 runs on T1 and 10 runs on T2. After
another merge, there are 10 runs on T1 and 2 runs on T3. At this point the going gets slow, because
we can only merge two sets of runs before T3 is exhausted. Then T1 has 8 runs and T2 has 2 runs.
Again, we can only merge two sets of runs, obtaining T1 with 6 runs and T3 with 2 runs. After three
more passes, T2 has two runs and the other tapes are empty. We must copy one run to another
tape, and then we can finish the merge.

It turns out that the first distribution we gave is optimal. If the number of runs is a Fibonacci number
Fn, then the best way to distribute them is to split them into two Fibonacci numbers Fn-1 and Fn-2.
Otherwise, it is necessary to pad the tape with dummy runs in order to get the number of runs up to
a Fibonacci number. We leave the details of how to place the initial set of runs on the tapes as an
exercise.

We can extend this to a k-way merge, in which case we need kth order Fibonacci numbers for the distribution, where the kth order Fibonacci number is defined as F^(k)(n) = F^(k)(n-1) + F^(k)(n-2) + ... + F^(k)(n-k), with the appropriate initial conditions F^(k)(n) = 0 for 0 <= n <= k-2, and F^(k)(k-1) = 1.


Replacement Selection

The last item we will consider is construction of the runs. The strategy we have used so far is the
simplest possible: We read as many records as possible and sort them, writing the result to some
tape. This seems like the best approach possible, until one realizes that as soon as the first record is
written to an output tape, the memory it used becomes available for another record. If the next
record on the input tape is larger than the record we have just output, then it can be included in the
run.

Using this observation, we can give an algorithm for producing runs. This technique is commonly
referred to as replacement selection.

Initially, m records are read into memory and placed in a priority queue. We perform a delete_min,
writing the smallest record to the output tape. We read the next record from the input tape. If it is
larger than the record we have just written, we can add it to the priority queue. Otherwise, it cannot
go into the current run. Since the priority queue is smaller by one element, we can store this new
element in the dead space of the priority queue until the run is completed and use the element for
the next run. Storing an element in the dead space is similar to what is done in heapsort. We
continue doing this until the size of the priority queue is zero, at which point the run is over. We
start a new run by building a new priority queue, using all the elements in the dead space. Figure
7.18 shows the run construction for the small example we have been using, with m = 3. Dead
elements are indicated by an asterisk.
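A hedged sketch of replacement selection with m = 3, again using std::priority_queue in place of the tape-based machinery (the input values are illustrative):

#include <iostream>
#include <queue>
#include <vector>

// Produce runs by replacement selection: keep m records in a min-heap,
// output the smallest, and read the next input record. Records smaller
// than the one just written are "dead" and saved for the next run.
int main()
{
    std::vector<int> input = {81, 94, 11, 96, 12, 35, 17, 99, 28, 58, 41, 75, 15};
    const size_t m = 3;
    size_t next = 0;

    std::priority_queue<int, std::vector<int>, std::greater<int>> heap, dead;
    while (heap.size() < m && next < input.size()) heap.push(input[next++]);

    int runNumber = 1;
    while (!heap.empty()) {
        std::cout << "Run " << runNumber << ": ";
        while (!heap.empty()) {
            int smallest = heap.top();                       // delete_min
            heap.pop();
            std::cout << smallest << " ";
            if (next < input.size()) {
                int record = input[next++];
                if (record >= smallest) heap.push(record);   // fits the current run
                else dead.push(record);                      // hold for the next run
            }
        }
        std::cout << "\n";
        std::swap(heap, dead);                               // start the next run
        ++runNumber;
    }
    return 0;
}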

In this example, replacement selection produces only three runs, compared with the five runs
obtained by sorting. Because of this, a three-way merge finishes in one pass instead of two. If the
input is randomly distributed, replacement selection can be shown to produce runs of average
length 2m. For our large example, we would expect 160 runs instead of 320 runs, so a five-way
merge would require four passes. In this case, we have not saved a pass, although we might if we get
lucky and have 125 runs or less. Since external sorts take so long, every pass saved can make a
significant difference in the running time.


4.3.1 basic external sorting algorithm,


4.3.2 multi-way merge,
4.3.3 poly-phase merge,
4.3.4 replacement selection.


UNIT V
5.1 Graphs
5.1.1 Definitions, Terminology,
A graph is a pictorial representation of a set of objects where some pairs of objects are
connected by links. The interconnected objects are represented by points termed as vertices,
and the links that connect the vertices are called edges.
Formally, a graph is a pair of sets (V, E), where V is the set of vertices and E is the set of
edges, connecting the pairs of vertices. Take a look at the following graph −

In the above graph,


V = {a, b, c, d, e}
E = {ab, ac, bd, cd, de}

Graph Data Structure

Mathematical graphs can be represented in a data structure. We can represent a graph using an
array of vertices and a two-dimensional array of edges. Before we proceed further, let's
familiarize ourselves with some important terms −
 Vertex − Each node of the graph is represented as a vertex. In the following example,
the labeled circle represents vertices. Thus, A to G are vertices. We can represent
them using an array as shown in the following image. Here A can be identified by
index 0. B can be identified using index 1 and so on.
 Edge − Edge represents a path between two vertices or a line between two vertices.
In the following example, the lines from A to B, B to C, and so on represents edges.
We can use a two-dimensional array to represent the edges as shown in the following
image. Here AB can be represented as 1 at row 0, column 1, BC as 1 at row 1,
column 2 and so on, keeping other combinations as 0.
 Adjacency − Two nodes or vertices are adjacent if they are connected to each other
through an edge. In the following example, B is adjacent to A, C is adjacent to B,
and so on.
 Path − Path represents a sequence of edges between the two vertices. In the
following example, ABCD represents a path from A to D.


Basic Operations

Following are basic primary operations of a Graph −


 Add Vertex − Adds a vertex to the graph.
 Add Edge − Adds an edge between the two vertices of the graph.
 Display Vertex − Displays a vertex of the graph.

5.1.2 Applications and more definitions, Properties,


5.1.3 Graph ADT, Graph Representations-Adjacency matrix, Adjacency lists,

5.1.4 Graph Search methods -DFS and BFS, Complexity analysis.


Depth First Search (DFS) algorithm traverses a graph in a depthward motion and uses a stack to remember the vertex from which to resume the search when a dead end occurs in any iteration.

As in the example given above, DFS algorithm traverses from S to A to D to G to E to B first, then to
F and lastly to C. It employs the following rules.
 Rule 1 − Visit the adjacent unvisited vertex. Mark it as visited. Display it. Push it in a stack.


 Rule 2 − If no adjacent vertex is found, pop up a vertex from the stack. (It will pop up all the
vertices from the stack, which do not have adjacent vertices.)
 Rule 3 − Repeat Rule 1 and Rule 2 until the stack is empty.

Step 1: Initialize the stack.

Step 2: Mark S as visited and put it onto the stack. Explore any unvisited adjacent node from S. We have three nodes and we can pick any of them. For this example, we shall take the nodes in alphabetical order.

Step 3: Mark A as visited and put it onto the stack. Explore any unvisited adjacent node from A. Both S and D are adjacent to A, but we are concerned with unvisited nodes only.

Step 4: Visit D, mark it as visited and put it onto the stack. Here, we have B and C nodes, which are adjacent to D and both are unvisited. However, we shall again choose in alphabetical order.

Step 5: We choose B, mark it as visited and put it onto the stack. Here B does not have any unvisited adjacent node. So, we pop B from the stack.

Step 6: We check the stack top to return to the previous node and check if it has any unvisited nodes. Here, we find D to be on the top of the stack.

Step 7: The only unvisited adjacent node from D is C now. So we visit C, mark it as visited and put it onto the stack.

As C does not have any unvisited adjacent node so we keep popping the stack until we find
a node that has an unvisited adjacent node. In this case, there's none and we keep popping
until the stack is empty.
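A hedged C++ sketch of this stack-based DFS on an adjacency-list graph (vertices are labelled by index here, and the visiting order depends on how neighbours are stored; both are assumptions of the sketch):

#include <iostream>
#include <stack>
#include <vector>

// Iterative DFS following the rules above: push the start vertex,
// then repeatedly visit an unvisited adjacent vertex, backtracking
// by popping when a dead end is reached.
void dfs(const std::vector<std::vector<int>>& adj, int start)
{
    std::vector<bool> visited(adj.size(), false);
    std::stack<int> st;

    visited[start] = true;
    std::cout << start << " ";
    st.push(start);

    while (!st.empty()) {
        int v = st.top();
        bool found = false;
        for (int u : adj[v]) {              // Rule 1: look for an unvisited neighbour
            if (!visited[u]) {
                visited[u] = true;
                std::cout << u << " ";
                st.push(u);
                found = true;
                break;
            }
        }
        if (!found) st.pop();               // Rule 2: dead end, pop the stack
    }
}

int main()
{
    // Edges 0-1, 0-2, 1-3, 2-3 (a small undirected graph for illustration)
    std::vector<std::vector<int>> adj = {{1, 2}, {0, 3}, {0, 3}, {1, 2}};
    dfs(adj, 0);                            // prints: 0 1 3 2
    std::cout << "\n";
    return 0;
}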

Breadth First Search (BFS) algorithm traverses a graph in a breadthward motion and uses a queue to remember the vertex from which to resume the search when a dead end occurs in any iteration.

As in the example given above, BFS algorithm traverses from A to B to E to F first then to C
and G lastly to D. It employs the following rules.
 Rule 1 − Visit the adjacent unvisited vertex. Mark it as visited. Display it. Insert it in
a queue.
 Rule 2 − If no adjacent vertex is found, remove the first vertex from the queue.
 Rule 3 − Repeat Rule 1 and Rule 2 until the queue is empty.


Step 1: Initialize the queue.

Step 2: We start by visiting S (the starting node) and mark it as visited.

Step 3: We then see an unvisited adjacent node from S. In this example, we have three nodes, but alphabetically we choose A, mark it as visited and enqueue it.

Step 4: Next, the unvisited adjacent node from S is B. We mark it as visited and enqueue it.

Step 5: Next, the unvisited adjacent node from S is C. We mark it as visited and enqueue it.

Step 6: Now, S is left with no unvisited adjacent nodes. So, we dequeue and find A.

Step 7: From A we have D as an unvisited adjacent node. We mark it as visited and enqueue it.

At this stage, we are left with no unmarked (unvisited) nodes. But as per the algorithm we
keep on dequeuing in order to get all unvisited nodes. When the queue gets emptied, the
program is over.
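
A matching iterative BFS sketch in C, again assuming the adjacency-matrix representation and the unvisitedAdjacent() helper used in the DFS sketch (all names are illustrative). The complexity is the same: O(V^2) with a matrix, O(V + E) with adjacency lists.

#include <stdio.h>

#define MAX 10

char labels[MAX];       /* same representation as in the DFS sketch */
int adj[MAX][MAX];
int visited[MAX];
int vertexCount = 0;

int queue[MAX];
int front = 0, rear = -1, itemCount = 0;

void enqueue(int v) { queue[++rear] = v; itemCount++; }
int dequeue(void) { itemCount--; return queue[front++]; }
int isQueueEmpty(void) { return itemCount == 0; }

/* Return the first unvisited vertex adjacent to v, or -1 if none */
int unvisitedAdjacent(int v) {
   for (int i = 0; i < vertexCount; i++)
      if (adj[v][i] == 1 && !visited[i])
         return i;
   return -1;
}

void bfs(int start) {
   visited[start] = 1;              /* visit, display and enqueue the start vertex */
   printf("%c ", labels[start]);
   enqueue(start);

   while (!isQueueEmpty()) {        /* Rule 3: repeat until the queue is empty */
      int v = dequeue();            /* Rule 2: take the next vertex from the queue */
      int next;
      while ((next = unvisitedAdjacent(v)) != -1) {
         visited[next] = 1;         /* Rule 1: visit, display, enqueue */
         printf("%c ", labels[next]);
         enqueue(next);
      }
   }
}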

DFS vs BFS
1. DFS uses a stack; BFS uses a queue.
2. DFS explores the graph depth-wise; BFS explores it breadth-wise.
3. DFS moves to the first unvisited adjacent vertex immediately; BFS visits all the adjacent vertices of a
vertex before moving on.


5.2 Search Trees Binary Search Tree ADT, Definition,


5.2.1 Operations-Searching, Insertion and Deletion,
5.2.3 Balanced search trees-AVL Trees-Definition and Examples only,
5.2.4 B-Trees-Definition and Examples only,
5.2.5 Red Black Trees-Definitions and Examples only,
A tree represents nodes connected by edges. We will discuss the binary tree and the binary search
tree specifically.
A Binary Tree is a special data structure used for data storage purposes. A binary tree has the
special condition that each node can have a maximum of two children. A binary search tree combines
the benefits of an ordered array and a linked list: search is as quick as in a sorted array, and insertion
and deletion operations are as fast as in a linked list.

Important Terms

Following are the important terms with respect to trees.


 Path − Path refers to the sequence of nodes along the edges of a tree.
 Root − The node at the top of the tree is called root. There is only one root per tree
and one path from the root node to any node.
 Parent − Any node except the root node has one edge upward to a node called
parent.
 Child − The node below a given node connected by its edge downward is called its
child node.
 Leaf − The node which does not have any child node is called the leaf node.
 Subtree − Subtree represents the descendants of a node.
 Visiting − Visiting refers to checking the value of a node when control is on the
node.
 Traversing − Traversing means passing through nodes in a specific order.
 Levels − Level of a node represents the generation of a node. If the root node is at
level 0, then its next child node is at level 1, its grandchild is at level 2, and so on.


 Keys − A key represents the value of a node, on the basis of which a search operation is
carried out for the node.

Binary Search Tree Representation

A Binary Search Tree exhibits a special behavior. A node's left child must have a value less
than its parent's value, and the node's right child must have a value greater than its parent's
value.

Tree Node

The code to write a tree node would be similar to what is given below. It has a data part and
references to its left and right child nodes.
struct node {
int data;
struct node *leftChild;
struct node *rightChild;
};
In a tree, all nodes share a common construct.

BST Basic Operations

The basic operations that can be performed on a binary search tree data structure, are the
following −
 Insert − Inserts an element in a tree/create a tree.
 Search − Searches an element in a tree.
 Preorder Traversal − Traverses a tree in a pre-order manner.
 Inorder Traversal − Traverses a tree in an in-order manner.
 Postorder Traversal − Traverses a tree in a post-order manner.
We shall learn about creating (inserting into) a tree structure and searching for a data item in a tree
in this section. Tree traversal methods are covered afterwards.

Insert Operation

The very first insertion creates the tree. Afterwards, whenever an element is to be inserted,
first locate its proper location. Start searching from the root node, then if the data is less than
the key value, search for the empty location in the left subtree and insert the data. Otherwise,
search for the empty location in the right subtree and insert the data.


Algorithm
If root is NULL
   then create root node
   return

If root exists then
   compare the data with node.data

   while the insertion position is not located
      If data is greater than node.data
         go to right subtree
      else
         go to left subtree
   endwhile

   insert data
end If
Implementation
The implementation of insert function should look like this −
void insert(int data) {
struct node *tempNode = (struct node*) malloc(sizeof(struct node));
struct node *current;
struct node *parent;

tempNode->data = data;
tempNode->leftChild = NULL;
tempNode->rightChild = NULL;

//if tree is empty, create root node


if(root == NULL) {
root = tempNode;
} else {
current = root;
parent = NULL;

while(1) {
parent = current;

//go to left of the tree


if(data < parent->data) {
current = current->leftChild;

//insert to the left


if(current == NULL) {
parent->leftChild = tempNode;
return;


}
}

//go to right of the tree


else {
current = current->rightChild;

//insert to the right


if(current == NULL) {
parent->rightChild = tempNode;
return;
}
}
}
}
}

Search Operation

Whenever an element is to be searched, start searching from the root node, then if the data is
less than the key value, search for the element in the left subtree. Otherwise, search for the
element in the right subtree. Follow the same algorithm for each node.
Algorithm
If root.data is equal to search.data
   return root
else
   while data not found
      If data is greater than node.data
         go to right subtree
      else
         go to left subtree
      If data found
         return node
   endwhile
   return data not found
end if
The implementation of this algorithm should look like this.
struct node* search(int data) {
   struct node *current = root;
   printf("Visiting elements: ");

   //walk down the tree until the key is found or we fall off a leaf
   while(current != NULL && current->data != data) {
      printf("%d ",current->data);

      //go to left tree
      if(current->data > data) {
         current = current->leftChild;
      }
      //else go to right tree
      else {
         current = current->rightChild;
      }
   }

   //not found
   if(current == NULL) {
      return NULL;
   }

   return current;
}

Traversal is a process to visit all the nodes of a tree and may print their values too. Because
all nodes are connected via edges (links), we always start from the root (head) node. That is,
we cannot randomly access a node in a tree. There are three ways in which we traverse a
tree −

 In-order Traversal
 Pre-order Traversal
 Post-order Traversal
Generally, we traverse a tree to search or locate a given item or key in the tree or to print all
the values it contains.

In-order Traversal

In this traversal method, the left subtree is visited first, then the root and later the right
subtree. We should always remember that every node may represent a subtree itself.
If a binary search tree is traversed in-order, the output will produce sorted key values in
ascending order.


We start from A, and following in-order traversal, we move to its left subtree B. B is also
traversed in-order. The process goes on until all the nodes are visited. The output of inorder
traversal of this tree will be −
D→B→E→A→F→C→G
Algorithm
Until all nodes are traversed −
Step 1 − Recursively traverse left subtree.
Step 2 − Visit root node.
Step 3 − Recursively traverse right subtree.

Pre-order Traversal

In this traversal method, the root node is visited first, then the left subtree and finally the
right subtree.


We start from A, and following pre-order traversal, we first visit A itself and then move to
its left subtree B. B is also traversed pre-order. The process goes on until all the nodes are
visited. The output of pre-order traversal of this tree will be −
A→B→D→E→C→F→G
Algorithm
Until all nodes are traversed −
Step 1 − Visit root node.
Step 2 − Recursively traverse left subtree.
Step 3 − Recursively traverse right subtree.

Post-order Traversal

In this traversal method, the root node is visited last, hence the name. First we traverse the
left subtree, then the right subtree and finally the root node.

We start from A, and following Post-order traversal, we first visit the left subtree B. B is
also traversed post-order. The process goes on until all the nodes are visited. The output of
post-order traversal of this tree will be −
D→E→B→F→G→C→A
Algorithm
Until all nodes are traversed −
Step 1 − Recursively traverse left subtree.
Step 2 − Recursively traverse right subtree.
Step 3 − Visit root node.
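
A minimal recursive sketch of the three traversals in C, assuming #include <stdio.h> and the struct node (data, leftChild, rightChild) defined earlier in this unit:

//In-order: left subtree, then root, then right subtree
void inorder(struct node *root) {
   if(root == NULL) return;
   inorder(root->leftChild);
   printf("%d ", root->data);
   inorder(root->rightChild);
}

//Pre-order: root, then left subtree, then right subtree
void preorder(struct node *root) {
   if(root == NULL) return;
   printf("%d ", root->data);
   preorder(root->leftChild);
   preorder(root->rightChild);
}

//Post-order: left subtree, then right subtree, then root
void postorder(struct node *root) {
   if(root == NULL) return;
   postorder(root->leftChild);
   postorder(root->rightChild);
   printf("%d ", root->data);
}

Each function visits every node exactly once, so all three traversals run in O(n) time for a tree of n nodes.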

A Binary Search Tree (BST) is a tree in which all the nodes follow the below-mentioned
properties −
 The value of the key of the left sub-tree is less than the value of its parent (root)
node's key.


 The value of the key of the right sub-tree is greater than or equal to the value of its
parent (root) node's key.
Thus, BST divides all its sub-trees into two segments; the left sub-tree and the right sub-tree
and can be defined as −
left_subtree (keys) < node (key) ≤ right_subtree (keys)
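
As an illustration of this property, the following hedged sketch checks whether a tree built from the struct node above satisfies the BST condition by passing the allowed key range down the recursion; the function names and the INT_MIN/INT_MAX sentinels are assumptions made for the example, not part of the prescribed code.

#include <limits.h>

//Returns 1 if every key in the subtree lies in the half-open range [lo, hi),
//which enforces left_subtree (keys) < node (key) <= right_subtree (keys)
int isBSTUtil(struct node *root, int lo, int hi) {
   if(root == NULL)
      return 1;
   if(root->data < lo || root->data >= hi)
      return 0;
   return isBSTUtil(root->leftChild, lo, root->data) &&
          isBSTUtil(root->rightChild, root->data, hi);
}

int isBST(struct node *root) {
   //keys equal to INT_MAX are rejected by the half-open range; acceptable for a sketch
   return isBSTUtil(root, INT_MIN, INT_MAX);
}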

Representation

BST is a collection of nodes arranged in a way where they maintain BST properties. Each
node has a key and an associated value. While searching, the desired key is compared to the
keys in BST and if found, the associated value is retrieved.
Following is a pictorial representation of BST −

We observe that the root node key (27) has all less-valued keys on the left sub-tree and the
higher valued keys on the right sub-tree.

Basic Operations

Following are the basic operations of a tree −


 Search − Searches an element in a tree.
 Insert − Inserts an element in a tree.
 Pre-order Traversal − Traverses a tree in a pre-order manner.
 In-order Traversal − Traverses a tree in an in-order manner.
 Post-order Traversal − Traverses a tree in a post-order manner.

Node

Define a node having some data, references to its left and right child nodes.
struct node {
int data;
struct node *leftChild;
struct node *rightChild;
};

Search Operation


Whenever an element is to be searched, start searching from the root node. Then if the data
is less than the key value, search for the element in the left subtree. Otherwise, search for the
element in the right subtree. Follow the same algorithm for each node.
Algorithm
struct node* search(int data){
struct node *current = root;
printf("Visiting elements: ");

while(current->data != data){

if(current != NULL) {
printf("%d ",current->data);

//go to left tree


if(current->data > data){
current = current->leftChild;
} //else go to right tree
else {
current = current->rightChild;
}

//not found
if(current == NULL){
return NULL;
}
}
}

return current;
}

Insert Operation

Whenever an element is to be inserted, first locate its proper location. Start searching from
the root node, then if the data is less than the key value, search for the empty location in the
left subtree and insert the data. Otherwise, search for the empty location in the right subtree
and insert the data.
Algorithm
void insert(int data) {
struct node *tempNode = (struct node*) malloc(sizeof(struct node));
struct node *current;
struct node *parent;

tempNode->data = data;
tempNode->leftChild = NULL;
tempNode->rightChild = NULL;

//if tree is empty


if(root == NULL) {


root = tempNode;
} else {
current = root;
parent = NULL;

while(1) {
parent = current;

//go to left of the tree


if(data < parent->data) {
current = current->leftChild;
//insert to the left

if(current == NULL) {
parent->leftChild = tempNode;
return;
}
} //go to right of the tree
else {
current = current->rightChild;

//insert to the right


if(current == NULL) {
parent->rightChild = tempNode;
return;
}
}
}
}
}


AVL Trees-Definition and Examples only


What if the input to binary search tree comes in a sorted (ascending or descending)
manner? It will then look like this −

It is observed that BST's worst-case performance is closest to linear search


algorithms, that is Ο(n). In real-time data, we cannot predict data pattern and their
frequencies. So, a need arises to balance out the existing BST.
Named after their inventors Adelson-Velskii and Landis, AVL trees are height-
balancing binary search trees. An AVL tree checks the heights of the left and the right
sub-trees and ensures that the difference is not more than 1. This difference is
called the Balance Factor.
Here we see that the first tree is balanced and the next two trees are not balanced −

In the second tree, the left subtree of C has height 2 and the right subtree has
height 0, so the difference is 2. In the third tree, the right subtree of A has height 2
and the left is missing, so it is 0, and the difference is 2 again. AVL tree permits
difference (balance factor) to be only 1.


BalanceFactor = height(left subtree) − height(right subtree)


If the difference in the height of left and right sub-trees is more than 1, the tree is
balanced using some rotation techniques.

AVL Rotations
To balance itself, an AVL tree may perform the following four kinds of rotations −

 Left rotation
 Right rotation
 Left-Right rotation
 Right-Left rotation
The first two rotations are single rotations and the next two rotations are double
rotations. To have an unbalanced tree, we at least need a tree of height 2. With this
simple tree, let's understand them one by one.
Left Rotation
If a tree becomes unbalanced, when a node is inserted into the right subtree of the
right subtree, then we perform a single left rotation −

In our example, node A has become unbalanced as a node is inserted in the right
subtree of A's right subtree. We perform the left rotation by making A the left
subtree of B.

Right Rotation
AVL tree may become unbalanced, if a node is inserted in the left subtree of the left
subtree. The tree then needs a right rotation.


As depicted, the unbalanced node becomes the right child of its left child by
performing a right rotation.
Left-Right Rotation
Double rotations are a slightly more complex version of the rotations explained above.
To understand them better, we should take note of each action performed
during rotation. Let's first check how to perform a Left-Right rotation. A left-right
rotation is a combination of a left rotation followed by a right rotation.

The sequence of states and actions is described below (the step diagrams are omitted).

1. A node has been inserted into the right subtree of the left subtree. This makes C an
unbalanced node. These scenarios cause the AVL tree to perform a left-right rotation.

2. We first perform the left rotation on the left subtree of C. This makes A the left subtree of B.

3. Node C is still unbalanced; however, now it is because of the left-subtree of the left-subtree.

4. We shall now right-rotate the tree, making B the new root node of this subtree. C now becomes
the right subtree of its own left subtree.

5. The tree is now balanced.

Right-Left Rotation
The second type of double rotation is Right-Left Rotation. It is a combination of right
rotation followed by left rotation.

The sequence of states and actions is described below (the step diagrams are omitted).

1. A node has been inserted into the left subtree of the right subtree. This makes A an
unbalanced node with balance factor 2.

2. First, we perform the right rotation along node C, making C the right subtree of its own left
subtree B. Now, B becomes the right subtree of A.

3. Node A is still unbalanced because of the right subtree of its right subtree and requires a left
rotation.

4. A left rotation is performed by making B the new root node of the subtree. A becomes the left
subtree of its right subtree B.

5. The tree is now balanced.
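
A minimal sketch of the two single rotations in C, assuming an AVL node that stores the height of its own subtree (the struct layout and helper names are illustrative assumptions):

struct AVLNode {
   int data;
   int height;                          /* height of the subtree rooted here */
   struct AVLNode *left, *right;
};

int height(struct AVLNode *n) { return n ? n->height : 0; }
int maxOf(int a, int b) { return a > b ? a : b; }

/* BalanceFactor = height(left subtree) - height(right subtree) */
int getBalance(struct AVLNode *n) { return n ? height(n->left) - height(n->right) : 0; }

/* Right rotation: used when a node was inserted into the left subtree of the
   left subtree; the unbalanced node becomes the right child of its left child */
struct AVLNode* rightRotate(struct AVLNode *y) {
   struct AVLNode *x = y->left;
   struct AVLNode *T2 = x->right;

   x->right = y;
   y->left = T2;

   y->height = maxOf(height(y->left), height(y->right)) + 1;
   x->height = maxOf(height(x->left), height(x->right)) + 1;
   return x;                            /* new root of this subtree */
}

/* Left rotation: the mirror case, for the right subtree of the right subtree */
struct AVLNode* leftRotate(struct AVLNode *x) {
   struct AVLNode *y = x->right;
   struct AVLNode *T2 = y->left;

   y->left = x;
   x->right = T2;

   x->height = maxOf(height(x->left), height(x->right)) + 1;
   y->height = maxOf(height(y->left), height(y->right)) + 1;
   return y;                            /* new root of this subtree */
}

A left-right rotation is then a leftRotate on the left child followed by a rightRotate on the unbalanced node, and a right-left rotation is the mirror combination.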


Operations on an AVL Tree


The following operations are performed on AVL tree...

1. Search
2. Insertion
3. Deletion

Search Operation in AVL Tree


In an AVL tree, the search operation is performed with O(log n) time complexity.
The search operation in the AVL tree is similar to the search operation in a Binary
search tree. We use the following steps to search an element in AVL tree...

 Step 1 - Read the search element from the user.

 Step 2 - Compare the search element with the value of root node in the tree.

 Step 3 - If both are matched, then display "Given node is found!!!" and
terminate the function

 Step 4 - If both are not matched, then check whether search element is
smaller or larger than that node value.

 Step 5 - If search element is smaller, then continue the search process in left
subtree.

 Step 6 - If search element is larger, then continue the search process in right
subtree.

 Step 7 - Repeat the same until we find the exact element or until the search
element is compared with the leaf node.

 Step 8 - If we reach the node having the value equal to the search value,
then display "Element is found" and terminate the function.

 Step 9 - If we reach a leaf node and it does not match the search
element, then display "Element is not found" and terminate the
function.

Insertion Operation in AVL Tree


In an AVL tree, the insertion operation is performed with O(log n) time complexity.
In AVL Tree, a new node is always inserted as a leaf node. The insertion operation is
performed as follows...

 Step 1 - Insert the new element into the tree using Binary Search Tree
insertion logic.

 Step 2 - After insertion, check the Balance Factor of every node.


 Step 3 - If the Balance Factor of every node is 0 or 1 or -1 then go for next


operation.

 Step 4 - If the Balance Factor of any node is other than 0 or 1 or -1 then
that tree is said to be imbalanced. In this case, perform a suitable Rotation to
make it balanced and go for the next operation. (A recursive insertion sketch
combining these steps is given below.)
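
Putting these steps together, the following is a hedged recursive insertion sketch that reuses the AVLNode struct and rotation helpers sketched earlier. It follows the standard AVL insertion scheme; duplicates are simply ignored in this sketch, and it needs #include <stdlib.h> for malloc.

struct AVLNode* newAVLNode(int data) {
   struct AVLNode *n = (struct AVLNode*) malloc(sizeof(struct AVLNode));
   n->data = data;
   n->height = 1;                       /* a new node is always inserted as a leaf */
   n->left = n->right = NULL;
   return n;
}

struct AVLNode* insertAVL(struct AVLNode *node, int data) {
   /* Step 1: normal BST insertion */
   if (node == NULL)
      return newAVLNode(data);
   if (data < node->data)
      node->left = insertAVL(node->left, data);
   else if (data > node->data)
      node->right = insertAVL(node->right, data);
   else
      return node;                      /* duplicate keys ignored in this sketch */

   /* Step 2: update the height of this ancestor and compute its Balance Factor */
   node->height = 1 + maxOf(height(node->left), height(node->right));
   int balance = getBalance(node);

   /* Steps 3-4: if the node became imbalanced, apply the suitable rotation */
   if (balance > 1 && data < node->left->data)       /* Left-Left  -> right rotation */
      return rightRotate(node);
   if (balance < -1 && data > node->right->data)     /* Right-Right -> left rotation */
      return leftRotate(node);
   if (balance > 1 && data > node->left->data) {     /* Left-Right rotation */
      node->left = leftRotate(node->left);
      return rightRotate(node);
   }
   if (balance < -1 && data < node->right->data) {   /* Right-Left rotation */
      node->right = rightRotate(node->right);
      return leftRotate(node);
   }
   return node;                         /* still balanced */
}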


Example: Construct an AVL Tree by inserting the numbers from 1 to 8. (The step-by-step
diagrams for this example are omitted here.)


Deletion Operation in AVL Tree


The deletion operation in an AVL tree is similar to the deletion operation in a BST. But after
every deletion operation, we need to check the Balance Factor condition. If the
tree is balanced after deletion, go for the next operation; otherwise, perform a suitable
rotation to make the tree balanced.

Red Black Trees-Definitions and Examples only

The Red-Black Tree is a self-balancing binary search tree. There are some
conditions on each node. These are listed below −

 Each node has a colour, which is either red or black

 The root is always black
 There are no two adjacent red nodes
 Every path from a node (including the root) to any of its descendant NULL nodes
has the same number of black nodes.

Example of Red-black tree

Red-Black tree with Null Nodes at leaf


Introduction:
A red-black tree is a kind of self-balancing binary search tree where each
node has an extra bit, and that bit is often interpreted as the colour (red or
black). These colours are used to ensure that the tree remains balanced
during insertions and deletions. Although the balance of the tree is not
perfect, it is good enough to reduce the searching time and maintain it
around O(log n) time, where n is the total number of elements in the tree.
This tree was invented in 1972 by Rudolf Bayer.
It must be noted that as each node requires only 1 bit of space to store the
colour information, these types of trees show identical memory footprint to
the classic (uncoloured) binary search tree.
Rules That Every Red-Black Tree Follows:
1. Every node has a colour either red or black.
2. The root of the tree is always black.
3. There are no two adjacent red nodes (A red node cannot have a
red parent or red child).
4. Every path from a node (including root) to any of its descendants
NULL nodes has the same number of black nodes.
5. All leaf nodes are black nodes.
Why Red-Black Trees?
Most of the BST operations (e.g., search, max, min, insert, delete.. etc) take
O(h) time where h is the height of the BST. The cost of these operations may
become O(n) for a skewed Binary tree. If we make sure that the height of the
tree remains O(log n) after every insertion and deletion, then we can
guarantee an upper bound of O(log n) for all these operations. The height of
a Red-Black tree is always O(log n) where n is the number of nodes in the
tree.
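
A minimal sketch of a red-black node in C, assuming an enum for the colour bit and a parent pointer of the kind normally needed by the insertion and deletion fix-up routines (all names are illustrative assumptions):

enum Colour { RED, BLACK };

struct RBNode {
   int data;
   enum Colour colour;                  /* the extra bit stored per node */
   struct RBNode *left;
   struct RBNode *right;
   struct RBNode *parent;               /* used by the rebalancing fix-ups */
};

/* NULL children stand for the black NIL leaves, so colour is read via a helper */
enum Colour colourOf(struct RBNode *n) {
   return n ? n->colour : BLACK;
}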


Sr. No.   Algorithm   Time Complexity
1.        Search      O(log n)
2.        Insert      O(log n)
3.        Delete      O(log n)

Here, n is the total number of elements in the red-black tree.


Comparison with AVL Tree:
The AVL trees are more balanced compared to Red-Black Trees, but they
may cause more rotations during insertion and deletion. So if your
application involves frequent insertions and deletions, then Red-Black trees
should be preferred. And if the insertions and deletions are less frequent and
search is a more frequent operation, then AVL tree should be preferred over
Red-Black Tree.
How does a Red-Black Tree ensure balance?
A simple example to understand balancing is that a chain of 3 nodes is not
possible in a Red-Black tree. We can try any combination of colours and
see that all of them violate the Red-Black tree properties.
A chain of 3 nodes is not possible in Red-Black Trees.
Following are NOT Red-Black Trees
30 30 30
/ \ / \ / \
20 NIL 20 NIL 20 NIL
/ \ / \ / \
10 NIL 10 NIL 10 NIL
Violates Violates Violates
Property 4. Property 4 Property 3

Following are different possible Red-Black Trees with the above 3 keys:
20 20
/ \ / \
10 30 10 30
/ \ / \ / \ / \
NIL NIL NIL NIL NIL NIL NIL NIL

Interesting points about Red-Black Tree:


1. Black height of the red-black tree is the number of black nodes on a path from
the root node to a leaf node. Leaf nodes are also counted as black nodes. So, a
red-black tree of height h has black height >= h/2.
2. Height of a red-black tree with n nodes is h <= 2 Log2(n + 1).
3. All leaves (NIL) are black.
4. The black depth of a node is defined as the number of black nodes
from the root to that node i.e the number of black ancestors.
5. Every red-black tree is a special case of a binary tree.

Black Height of a Red-Black Tree :


Black height is the number of black nodes on a path from the root to a leaf. Leaf nodes are
also counted as black nodes. From properties 3 and 4 above, we can derive that a Red-Black
Tree of height h has black-height >= h/2. The number of nodes from a node to its farthest
descendant leaf is no more than twice the number of nodes to its nearest descendant leaf.
Every Red-Black Tree with n nodes has height <= 2 Log2(n+1).
This can be proved using the following facts:
1. For a general Binary Tree, let k be the minimum number of nodes on all
root-to-NULL paths; then n >= 2^k − 1 (for example, if k is 3, then n is at
least 7). This expression can also be written as k <= Log2(n+1).
2. From property 4 of Red-Black trees and the above claim, we can say that in a
Red-Black Tree with n nodes, there is a root-to-leaf path with at most
Log2(n+1) black nodes.
3. From properties 3 and 5 of Red-Black trees, we can claim that the number of
black nodes in a Red-Black tree is at least ⌊ n/2 ⌋, where n is the total
number of nodes.

From the above points, we can conclude that a Red-Black Tree with n nodes has
height <= 2 Log2(n+1).
Search Operation in Red-black Tree:
As every red-black tree is a special case of a binary tree so the searching
algorithm of a red-black tree is similar to that of a binary tree.
Algorithm:
searchElement (tree, val)
Step 1:
   If tree -> data = val OR tree = NULL
      Return tree
   Else
      If val < data
         Return searchElement (tree -> left, val)
      Else
         Return searchElement (tree -> right, val)
      [ End of if ]
   [ End of if ]

Step 2: END
For the program, refer to the search routine shown earlier for the binary search tree; a recursive sketch is also given below.
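
A hedged recursive sketch of this search in C, using the RBNode struct assumed earlier; searching ignores the colours entirely, exactly as in a plain binary search tree.

struct RBNode* searchElement(struct RBNode *tree, int val) {
   /* Step 1: either the key is found here or we have run off the tree */
   if (tree == NULL || tree->data == val)
      return tree;

   /* otherwise recurse into the only subtree that can contain val */
   if (val < tree->data)
      return searchElement(tree->left, val);
   else
      return searchElement(tree->right, val);
}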


Example: Searching 11 in the following red-black tree.

Solution:
1. Start from the root.
2. Compare the element to be searched with the root; if it is less than the root, then
recurse for the left subtree, else recurse for the right subtree.
3. If the element being searched is found anywhere, return true, else return
false.



So far, we have introduced Red-Black trees, discussed how balance is
ensured, and seen how to search for an element in a red-black tree. The hard part is to
maintain balance when keys are added and removed; since this unit requires definitions and
examples only, the insertion and deletion algorithms are not developed in detail here.
Exercise:
1) Is it possible to have all black nodes in a Red-Black tree?
2) Draw a Red-Black Tree that is not an AVL tree structure-wise?
Applications:
1. Most of the self-balancing BST library implementations, such as map and set in
C++ (or TreeSet and TreeMap in Java), use the Red-Black Tree.
2. It is used to implement CPU scheduling in Linux; the Completely Fair
Scheduler uses it.
3. They are also used in the k-means clustering algorithm for
reducing time complexity.
4. Moreover, MySQL also uses the Red-Black tree for indexes on
tables.

Comparison with AVL Tree


AVL Trees are more balanced than the Red-Black tree, but the major disadvantage
is that they may require more rotations during insertion and deletion. For workloads with
frequent insertions and deletions, the Red-Black tree is the more helpful choice.


5.2.7 Comparison of Search Trees.

Here we will see some search trees and their differences. There are many different
search trees. They are different in nature. The basic search tree is Binary Search
Tree (BST). Some other search trees are AVL tree, B tree, Red-Black tree, splay
tree etc.
These trees can be compared based on their operations. We will see the time
complexities of these trees.

Average Case

Search Tree           Insert        Delete        Search
Binary Search Tree    O(log n)      O(log n)      O(log n)
AVL tree              O(log2 n)     O(log2 n)     O(log2 n)
B Tree                O(log n)      O(log n)      O(log n)
Red-Black Tree        O(log n)      O(log n)      O(log n)
Splay Tree            O(log2 n)     O(log2 n)     O(log2 n)

Worst Case

Search Tree           Insert        Delete        Search
Binary Search Tree    O(n)          O(n)          O(n)
AVL tree              O(log2 n)     O(log2 n)     O(log2 n)
B Tree                O(log n)      O(log n)      O(log n)
Red-Black Tree        O(log n)      O(log n)      O(log n)
Splay Tree            O(log2 n)     O(log2 n)     O(log2 n)

5.2.6 k-d trees,


A K-D Tree (also called as K-Dimensional Tree) is a binary search tree
where data in each node is a K-Dimensional point in space. In short, it is a
space partitioning (details below) data structure for organizing points in a K-
Dimensional space.
A non-leaf node in K-D tree divides the space into two parts, called as half-
spaces.
Points to the left of this space are represented by the left subtree of that
node and points to the right of the space are represented by the right
subtree. We will soon be explaining the concept on how the space is divided
and tree is formed.
For the sake of simplicity, let us understand a 2-D Tree with an example.
The root would have an x-aligned plane, the root’s children would both have
y-aligned planes, the root’s grandchildren would all have x-aligned planes,
and the root’s great-grandchildren would all have y-aligned planes and so
on.
Generalization:
Let us number the planes as 0, 1, 2, …(K – 1). From the above example, it is
quite clear that a point (node) at depth D will have A aligned plane where A
is calculated as:
A = D mod K
How to determine if a point will lie in the left subtree or in right
subtree?
If the root node is aligned in plane A, then the left subtree will contain all
points whose coordinates in that plane are smaller than that of root node.
Similarly, the right subtree will contain all points whose coordinates in that
plane are greater-equal to that of root node.


Creation of a 2-D Tree:


Consider following points in a 2-D plane:
(3, 6), (17, 15), (13, 15), (6, 12), (9, 1), (2, 7), (10, 19)
1. Insert (3, 6): Since tree is empty, make it the root node.
2. Insert (17, 15): Compare it with the root node point. Since the root node is
X-aligned, the X-coordinate values are compared to determine whether it
lies in the left subtree or in the right subtree. This point will be Y-
aligned.
3. Insert (13, 15): X-value of this point is greater than X-value of point
in root node. So, this will lie in the right subtree of (3, 6). Again
Compare Y-value of this point with the Y-value of point (17, 15)
(Why?). Since, they are equal, this point will lie in the right subtree
of (17, 15). This point will be X-aligned.
4. Insert (6, 12): X-value of this point is greater than X-value of point in
root node. So, this will lie in the right subtree of (3, 6). Again
Compare Y-value of this point with the Y-value of point (17, 15)
(Why?). Since, 12 < 15, this point will lie in the left subtree of (17,
15). This point will be X-aligned.
5. Insert (9, 1):Similarly, this point will lie in the right of (6, 12).
6. Insert (2, 7):Similarly, this point will lie in the left of (3, 6).
7. Insert (10, 19): Similarly, this point will lie in the left of (13, 15).

How is space partitioned?


All 7 points will be plotted in the X-Y plane as follows:


1. Point (3, 6) will divide the space into two parts: Draw line X = 3.

2. Point (2, 7) will divide the space to the left of line X = 3 into two
parts horizontally.


Draw line Y = 7 to the left of line X = 3.

3. Point (17, 15) will divide the space to the right of line X = 3 into two
parts horizontally.


Draw line Y = 15 to the right of line X = 3.

 Point (6, 12) will divide the space below line Y = 15 and to the right
of line X = 3 into two parts.


Draw line X = 6 to the right of line X = 3 and below line Y = 15.

 Point (13, 15) will divide the space below line Y = 15 and to the
right of line X = 6 into two parts.


Draw line X = 13 to the right of line X = 6 and below line Y = 15.

 Point (9, 1) will divide the space between lines X = 3, X = 6 and Y = 15 into two parts.


Draw line Y = 1 between lines X = 3 and X = 6.

 Point (10, 19) will divide the space to the right of line X = 3 and
above line Y = 15 into two parts.


Draw line Y = 19 to the right of line X = 3 and above line Y = 15.

Following is a C++ implementation of the basic KD Tree operations insert and search.

// A C++ program to demonstrate operations of KD tree


#include<bits/stdc++.h>
using namespace std;

const int k = 2;

// A structure to represent node of kd tree


struct Node
{
int point[k]; // To store k dimensional point
Node *left, *right;
};

// A method to create a node of K D tree


struct Node* newNode(int arr[])
{
struct Node* temp = new Node;

for (int i=0; i<k; i++)


temp->point[i] = arr[i];


temp->left = temp->right = NULL;


return temp;
}

// Inserts a new node and returns root of modified tree


// The parameter depth is used to decide axis of comparison
Node *insertRec(Node *root, int point[], unsigned depth)
{
// Tree is empty?
if (root == NULL)
return newNode(point);

// Calculate current dimension (cd) of comparison


unsigned cd = depth % k;

// Compare the new point with root on current dimension 'cd'


// and decide the left or right subtree
if (point[cd] < (root->point[cd]))
root->left = insertRec(root->left, point, depth + 1);
else
root->right = insertRec(root->right, point, depth + 1);

return root;
}

// Function to insert a new point with given point in


// KD Tree and return new root. It mainly uses above recursive
// function "insertRec()"
Node* insert(Node *root, int point[])
{
return insertRec(root, point, 0);
}

// A utility method to determine if two Points are same


// in K Dimensional space
bool arePointsSame(int point1[], int point2[])
{
// Compare individual pointinate values
for (int i = 0; i < k; ++i)
if (point1[i] != point2[i])
return false;

return true;
}

// Searches a Point represented by "point[]" in the K D tree.


// The parameter depth is used to determine current axis.


bool searchRec(Node* root, int point[], unsigned depth)
{
// Base cases
if (root == NULL)
return false;
if (arePointsSame(root->point, point))
return true;

// Current dimension is computed using current depth and total


// dimensions (k)
unsigned cd = depth % k;

// Compare point with root with respect to cd (Current dimension)


if (point[cd] < root->point[cd])
return searchRec(root->left, point, depth + 1);

return searchRec(root->right, point, depth + 1);


}

// Searches a Point in the K D tree. It mainly uses


// searchRec()
bool search(Node* root, int point[])
{
// Pass current depth as 0
return searchRec(root, point, 0);
}

// Driver program to test above functions


int main()
{
struct Node *root = NULL;
int points[][k] = {{3, 6}, {17, 15}, {13, 15}, {6, 12},
{9, 1}, {2, 7}, {10, 19}};

int n = sizeof(points)/sizeof(points[0]);

for (int i=0; i<n; i++)


root = insert(root, points[i]);

int point1[] = {10, 19};


(search(root, point1))? cout << "Found\n": cout << "Not Found\n";

int point2[] = {12, 19};


(search(root, point2))? cout << "Found\n": cout << "Not Found\n";

return 0;
}


Output:
Found
Not Found
