UNIT-I
Introduction to Algorithm Analysis, Space and Time Complexity analysis,
Asymptotic
Notations.
AVL Trees – Creation, Insertion, Deletion operations and Applications
B-Trees – Creation, Insertion, Deletion operations and Applications
Algorithm:
An algorithm is a set of well-defined instructions to solve a particular problem. It
takes a set of input(s) and produces the desired output.
For example,
An algorithm to add two numbers:
1. Take two number inputs
2. Add numbers using the + operator
3. Display the result
Qualities of a Good Algorithm
1. Input and output should be defined accurately.
2. Each step in the algorithm should be clear and unambiguous.
3. Algorithms should be most effective among many different ways to solve a
problem.
4. An algorithm should not include computer code. Instead, the algorithm
should be written in such a way that it can be used in different programming
languages.
Algorithm 1: Add two numbers entered by the user
Step 1: Start
Step 2: Declare variables num1, num2 and sum.
Step 3: Read values num1 and num2.
Step 4: Add num1 and num2 and assign the result to sum.
Sum= num1+num2
Step 5: Display sum
Step 6: Stop
Algorithm 2: Find the largest number among three numbers
Step 1: Start
Step 2: Declare variables a,b and c.
Step 3: Read variables a,b and c.
Step 4: If a > b
If a > c
Display a is the largest number.
Else
Display c is the largest number.
Else
If b > c
Display b is the largest number.
Else
Display c is the greatest number.
Step 5: Stop
Algorithm 3: Find the factorial of a number
Step 1: Start
Step 2: Declare variables n, factorial and i.
Step 3: Initialize variables
factorial = 1, i = 1
Step 4: Read value of n
Step 5: Repeat the steps until i = n
5.1: factorial = factorial*i
5.2: i = i+1
Step 6: Display factorial
Step 7: Stop
Performance Analysis
Performance Analysis of an algorithm:
If we want to go from city "A" to city "B", there can be many ways of doing this. We
can go by flight, by bus, by train and also by bicycle. Depending on the availability
and convenience, we choose the one which suits us. Similarly, in computer science,
there are multiple algorithms to solve a problem. When we have more than one
algorithm to solve a problem, we need to select the best one. Performance analysis
helps us to select the best algorithm from multiple algorithms to solve a problem.
When there are multiple alternative algorithms to solve a problem, we analyse them
and pick the one, which is best suitable for our requirements. The formal definition is
as follows...
Performance of an algorithm is a process of making evaluative judgement about
algorithms.
It can also be defined as follows...
Performance of an algorithm means predicting the resources, which are required
to an algorithm to perform its task.
That means when we have multiple algorithms to solve a problem, we need to select
a suitable algorithm to solve that problem.
We compare algorithms with each other which are solving the same problem, to select
the best algorithm. To compare algorithms, we use a set of parameters or set of
elements like memory required by that algorithm, the execution speed of that
algorithm, easy to understand, easy to implement, etc.,
Generally, the performance of an algorithm depends on the following elements...
1. Whether that algorithm is providing the exact solution for the problem?
2. Whether it is easy to understand?
3. Whether it is easy to implement?
4. How much space (memory) it requires to solve the problem?
5. How much time it takes to solve the problem? Etc.,
When we want to analyse an algorithm, we consider only the space and time required
by that particular algorithm and we ignore all the remaining elements.
Based on this information, performance analysis of an algorithm can also be defined
as follows...
Performance analysis of an algorithm is the process of calculating space and time
required by that algorithm.
Performance analysis of an algorithm is performed by using the following measures...
1. Space required to complete the task of that algorithm (Space Complexity). It
includes program space and data space
2. Time required to complete the task of that algorithm (Time Complexity)
Space Complexity:
When we design an algorithm to solve a problem, it needs some computer memory to
complete its execution. For any algorithm, memory is required for the following
purposes...
1. To store program instructions.
2. To store constant values.
3. To store variable values.
4. And for few other things like function calls, jumping statements etc,.
Space complexity of an algorithm can be defined as follows...
Total amount of computer memory required by an algorithm to complete its
execution is called as space complexity of that algorithm.
Generally, when a program is under execution it uses the computer memory for
THREE reasons. They are as follows...
1. Instruction Space: It is the amount of memory used to store compiled version
of instructions.
2. Environmental Stack: It is the amount of memory used to store information of
partially executed functions at the time of function call.
3. Data Space: It is the amount of memory used to store all the variables and
constants.
To calculate the space complexity, we must know the memory required to store
different datatype values (according to the compiler). For example, the C
Programming Language compiler requires the following...
1. 2 bytes to store Integer value.
2. 4 bytes to store Floating Point value.
3. 1 byte to store Character value.
4. 6 (OR) 8 bytes to store double value.
Consider the following piece of code...
Example 1
int square(int a)
return a*a;
}
In the above sample code, it requires 2 bytes of memory to store variable 'a' and
another 2 bytes of memory is used for return value.
That means, totally it requires 4 bytes of memory to complete its execution. And this
4 bytes of memory is fixed for any input value of 'a'.
Time Complexity:
What is Time complexity?
Every algorithm requires some amount of computer time to execute its instruction to
perform the task. This computer time required is called time complexity.
The time complexity of an algorithm can be defined as follows...
The time complexity of an algorithm is the total amount of time required by an
algorithm to complete its execution.
Generally, the running time of an algorithm depends upon the following...
1. Whether it is running on Single processor machine or Multi processor
machine.
2. Whether it is a 32 bit machine or 64 bit machine.
3. Read and Write speed of the machine.
4. The amount of time required by an algorithm to perform Arithmetic
operations, logical operations, return value and assignment operations etc.,
5. Input data
Calculating Time Complexity of an algorithm based on the system configuration is a
very difficult task because the configuration changes from one system to another
system. To solve this problem, we must assume a model machine with a specific
configuration. So that, we can able to calculate generalized time complexity according
to that model machine.
To calculate the time complexity of an algorithm, we need to define a model machine.
Let us assume a machine with following configuration...
1. It is a Single processor machine
2. It is a 32 bit Operating System machine
3. It performs sequential execution
4. It requires 1 unit of time for Arithmetic and Logical operations
5. It requires 1 unit of time for Assignment and Return value
6. It requires 1 unit of time for Read and Write operations
Now, we calculate the time complexity of following example code by using the above-
defined model machine...
Example 1: The following sample code
int sum (int a, int b)
return a+b;
In the above sample code, it requires 1 unit of time to calculate a+b and 1 unit of time
to return the value. That means, totally it takes 2 units of time to complete its
execution. And it does not change based on the input values of a and b. That means
for all input values, it requires the same amount of time i.e. 2 units.
Asymptotic Analysis:
The efficiency of an algorithm depends on the amount of time, storage and other
resources(Type and size of the processor) required to execute the algorithm. The
efficiency is measured with the help of asymptotic notations.
An algorithm may not have the same performance for different types of inputs. With
the increase in the input size, the performance will change.
The study of change in performance of the algorithm with the change in the order of
the input size is defined as asymptotic analysis.
Asymptotic Notations:
Asymptotic notations are the mathematical notations used to describe the running
time of an algorithm when the input tends towards a particular value or a limiting
value.
For Example: In bubble sort, when the input array is already sorted, the time taken by
the algorithm is linear i.e. the best case.
But, when the input array is in reverse condition, the algorithm takes the maximum
time (quadratic) to sort the elements i.e. the worst case.
When the input array is neither sorted nor in reverse order, then it takes average time.
These durations are denoted using asymptotic notations.
There are mainly three asymptotic notations:
1. Big-O notation
2. Omega notation
3. Theta notation
Big-O Notation (O-notation)
Big-O notation represents the upper bound of the running time of an algorithm.
Thus, it gives the worst-case complexity of an algorithm.
Big-O gives the upper bound of a function
O(g(n)) = { f(n): there exist positive constants c and n0
such that 0 ≤ f(n) ≤ cg(n) for all n ≥ n0 }
The above expression can be described as a function f(n) belongs to the set
O(g(n)) if there exists a positive constant c such that it lies between 0 and
cg(n), for sufficiently large n.
For any value of n, the running time of an algorithm does not cross the time
provided by O(g(n)).Meanwhile it gives the worst-case running time of an
algorithm.
Omega Notation (Ω-notation)
Omega notation represents the lower bound of the running time of an
algorithm. Thus, it provides the best case complexity of an algorithm.
Omega gives the lower bound of a function
Ω(g(n)) = { f(n): there exist positive constants c and n0
such that 0 ≤ cg(n) ≤ f(n) for all n ≥ n0 }
The above expression can be described as a function f(n) belongs to the set
Ω(g(n)) if there exists a positive constant c such that it lies above cg(n), for
sufficiently large n.
For any value of n, the minimum time required by the algorithm is given by
Omega Ω(g(n)).
Theta Notation (Θ-notation)
Theta notation encloses the function from above and below. Since it represents
the upper and the lower bound of the running time of an algorithm, it is used
for analysing the average-case complexity of an algorithm.
Theta bounds the function within constants factors
Θ(g(n)) = { f(n): there exist positive constants c1, c2 and n0
such that 0 ≤ c1g(n) ≤ f(n) ≤ c2g(n) for all n ≥ n0 }
The above expression can be described as a function f(n) belongs to the set
Θ(g(n)) if there exist positive constants c1 and c2 such that it can be fitted in
between c1g(n) and c2g(n), for sufficiently large n.
If a function f(n) lies anywhere in between c1g(n) and c2g(n) for all n ≥ n0,
then f(n) is said to be asymptotically tight bound.
AVL Tree Data structure
AVL tree is a height-balanced binary search tree. That means, an AVL tree is also a
binary search tree but it is a balanced tree. A binary tree is said to be balanced if, the
difference between the heights of left and right subtrees of every node in the tree is
either -1, 0 or +1. In other words, a binary tree is said to be balanced if the height of
left and right children of every node differ by either -1, 0 or +1. In an AVL tree, every
node maintains an extra information known as balance factor. The AVL tree was
introduced in the year 1962 by G.M. Adelson-Velsky and E.M. Landis.
An AVL tree is defined as follows...
An AVL tree is a balanced binary search tree. In an AVL tree, balance factor of
every node is either -1, 0 or +1.
Balance factor of a node is the difference between the heights of the left and right
subtrees of that node. The balance factor of a node is calculated either height of left
subtree - height of right subtree (OR) height of right subtree - height of left subtree.
In the following explanation, we calculate as follows...
Balance factor = height of Left Subtree – height of Right Subtree
Example1 of AVL Tree
The above tree is a binary search tree and every node is satisfying balance factor
condition. So this tree is said to be an AVL tree.
Example2 of AVL Tree
Note: Every AVL Tree is a binary search tree but every Binary Search Tree need
not be AVL tree.
Complexity
Algorithm Average case Worst case
Space o(n) o(n)
Search o(log n) o(log n)
Insert o(log n) o(log n)
Delete o(log n) o(log n)
AVL Tree Rotations
In AVL tree, after performing operations like insertion and deletion we need to
check the balance factor of every node in the tree. If every node satisfies the
balance factor condition then we conclude the operation otherwise, we must
make it balanced. Whenever the tree becomes imbalanced due to any operation
we use rotation operations to make the tree balanced.
Rotation operations are used to make the tree balanced.
Rotation is the process of moving nodes either to left or to right to make the tree
balanced.
There are four rotations and they are classified into two types.
Single Left Rotation (LL Rotation)
In LL Rotation, every node moves one position to left from the current position.
To understand LL Rotation, let us consider the following insertion operation in
AVL Tree...
Single Right Rotation (RR Rotation)
In RR Rotation, every node moves one position to right from the current position. To
understand RR Rotation, let us consider the following insertion operation in AVL
Tree...
Left Right Rotation (LR Rotation)
The LR Rotation is a sequence of single left rotation followed by a single right
rotation. In LR Rotation, at first, every node moves one position to the left and
one position to right from the current position. To understand LR Rotation, let
us consider the following insertion operation in AVL Tree...
Right Left Rotation (RL Rotation)
The RL Rotation is sequence of single right rotation followed by single left rotation. In
RL Rotation, at first every node moves one position to right and one position to left
from the current position. To understand RL Rotation, let us consider the following
insertion operation in AVL Tree...
Operations on an AVL Tree
The following operations are performed on AVL tree...
1. Search
2. Insertion
3. Deletion
Search Operation in AVL Tree
In an AVL tree, the search operation is performed with O (log n) time
complexity. The search operation in the AVL tree is similar to the search
operation in a Binary search tree. We use the following steps to search an
element in AVL tree...
• Step 1 - Read the search element from the user. n
• Step 2 - Compare the search element with the value of root node in the
tree.
• Step 3 - If both are matched, then display "Given node is found!!!" and
terminate the function
• Step 4 - If both are not matched, then check whether search element is
smaller or larger than that node value.
• Step 5 - If search element is smaller, then continue the search process in
left subtree.
• Step 6 - If search element is larger, then continue the search process in
right subtree.
• Step 7 - Repeat the same until we find the exact element or until the
search element is compared with the leaf node.
• Step 8 - If we reach to the node having the value equal to the search value,
then display "Element is found" and terminate the function.
• Step 9 - If we reach to the leaf node and if it is also not matched with the
search element, then display "Element is not found" and terminate the
function.
Insertion Operation in AVL Tree:
In an AVL tree, the insertion operation is performed with O (log n) time
complexity. In AVL Tree, a new node is always inserted as a leaf node. The
insertion operation is performed as follows...
• Step 1 - Insert the new element into the tree using Binary Search Tree
insertion logic.
• Step 2 - After insertion, check the Balance Factor of every node.
• Step 3 - If the Balance Factor of every node is 0 or 1 or -1 then go for next
operation.
• Step 4 - If the Balance Factor of any node is other than 0 or 1 or -1 then
that tree is said to be imbalanced. In this case, perform
suitable Rotation to make it balanced and go for next operation.
Example: Construct an AVL Tree by inserting numbers from 1
to 8.
Deletion Operation in AVL Tree:
The deletion operation in AVL Tree is similar to deletion operation in BST. But
after every deletion operation, we need to check with the Balance Factor
condition. If the tree is balanced after deletion go for next operation otherwise
perform suitable rotation to make the tree Balanced.
Applications of AVL Trees
1. Most in-memory sets and dictionaries are stored using AVL trees.
2. Database applications, where insertions and deletions are less common
but frequent data lookups are necessary, also frequently employ AVL
trees.
3. In addition to database applications, it is employed in other applications
that call for better searching.
4. A balanced binary search tree called an AVL tree uses rotation to keep
things balanced.
5. It may be used in games with plotlines as well.
6. It is mostly utilized in business sectors where it is necessary to keep
records on the employees that work there and their shift changes.
B - Tree Data structure
In search trees like binary search tree, AVL Tree, Red-Black tree, etc., every node
contains only one value (key) and a maximum of two children. But there is a special
type of search tree called B-Tree in which a node contains more than one value (key)
and more than two children. B-Tree was developed in the year 1972 by Bayer and
McCreight with the name Height Balanced m-way Search Tree. Later it was named
as B-Tree.
B-Tree can be defined as follows...
B-Tree is a self-balanced search tree in which every node contains multiple
keys and has more than two children.
Note1: The number of keys in a node and number of children for a node depends on
the order of B-Tree.
Note2: Every B-Tree has an order.
B-Tree of Order m has the following properties...
1. Property 1 - All leaf nodes must be at same level.
2. Property 2 - All nodes except root must have at least [m/2]-1 keys and
maximum of m-1 keys.
3. Property 3 - All non leaf nodes except root (i.e. all internal nodes) must have
at least m/2 children.
4. Property 4 - If the root node is a non leaf node, then it must have atleast
2 children.
5. Property 5 - A non leaf node with n-1 keys must have n number of children.
6. Property 6 - All the key values in a node must be in Ascending Order.
For example, B-Tree of Order 4 contains a maximum of 3 key values in a node
and maximum of 4 children for a node.
Example
Operations on a B-Tree
The following operations are performed on a B-Tree...
1. Search
2. Insertion
3. Deletion
Search Operation in B-Tree
The search operation in B-Tree is similar to the search operation in Binary Search
Tree. In a Binary search tree, the search process starts from the root node and we make
a 2-way decision every time (we go to either left subtree or right subtree). In B-Tree
also search process starts from the root node but here we make an n-way decision
every time. Where 'n' is the total number of children the node has. In a B-Tree, the
search operation is performed with O(log n) time complexity. The search operation is
performed as follows...
Step 1 - Read the search element from the user.
Step 2 - Compare the search element with first key value of root node in the tree.
Step 3 - If both are matched, then display "Given node is found!!!" and terminate
the function
Step 4 - If both are not matched, then check whether search element is smaller or
larger than that key value.
Step 5 - If search element is smaller, then continue the search process in left
subtree.
Step 6 - If search element is larger, then compare the search element with next key
value in the same node and repeat steps 3, 4, 5 and 6 until we find the exact match
or until the search element is compared with last key value in the leaf node.
Step 7 - If the last key value in the leaf node is also not matched then display
"Element is not found" and terminate the function.
Insertion Operation in B-Tree
In a B-Tree, a new element must be added only at the leaf node. That means, the new key Value is
always attached to the leaf node only. The insertion operation is performed as follows...
Step 1 - Check whether tree is Empty.
Step 2 - If tree is Empty, then create a new node with new key value and insert it into the tree
as a root node.
Step 3 - If tree is Not Empty, then find the suitable leaf node to which the new key value is
added using Binary Search Tree logic.
Step 4 - If that leaf node has empty position, add the new key value to that leaf node in
ascending order of key value within the node.
Step 5 - If that leaf node is already full, split that leaf node by sending middle value to its
parent node. Repeat the same until the sending value is fixed into a node.
Step 6 - If the spilting is performed at root node then the middle value becomes new root node
for the tree and the height of the tree is increased by one.
Example
Construct a B-Tree of Order 3 by inserting numbers from 1 to 10.
The Order of B Tree is m=3
Min. Number of keys= m/2-1=3/2-1=1.5-1=0.5=0
Max. Number of keys=m-1=3-1=2
Min. Number of children’s=m/2=3/2=1
Max. Number of Children’s=m=3
Example2:
Insertion operation
The insertion operation for a B Tree is done similar to the Binary Search Tree but the
elements are inserted into the same node until the maximum keys are reached. The
insertion is done using the following procedure −
Step 1 − Calculate the maximum (m−1) and, minimum (⌈m/2⌉−1) number of keys a
node can hold, where m is denoted by the order of the B Tree.
Step 2 − The data is inserted into the tree using the binary search insertion and once
the keys reach the maximum number, the node is split into half and the median key
becomes the internal node while the left and right keys become its children .
Step 3 − All the leaf nodes must be on the same level.
The keys, 5, 3, 21, 9, 13 are all added into the node according to the binary search
property but if we add the key 22, it will violate the maximum key property. Hence,
the node is split in half, the median key is shifted to the parent node and the insertion
is then continued.
Another interruption occurs during the insertion of 11, so the node is split and
median is shifted to the parent.
While inserting 16, even if the node is split in two parts, the parent node also overflows
as it reached the maximum keys. Hence, the parent node is split first and the median
key becomes the root. Then, the leaf node is split in half the median of leaf node is
shifted to its parent.
The final B tree after inserting all the elements is achieved.
Deletion operation of a B Tree
The Deletion operation in a B Tree is slightly different from the deletion operation of
a Binary Search Tree.
Let us say the node to be deleted is called the target key. The target key can either be at
the leaf node or an internal node
The deletion of node in a B-Tree can be broadly classified into two cases:
1. Deletion At Leaf Node.
2. Deletion At Internal Node.
1. If the target key is at the leaf node:
If the target key is at the leaf node, we further study the given data to check if any of
the following cases exist:
Case 1: If the leaf node consists of the min number of keys according to the given
degree/order, then the key is simply deleted from the node.
Case 2: If the leaf contains the minimum number of keys, then:
Case 2a: The node can borrow a key from the immediate left sibling node, if it has
more than the minimum number of keys. The transfer of the keys take place through
the parent node, i.e, the maximum key of the left sibling moves upwards and replaces
the parent; while the parent key moves down to the target node from where the target
key is simply deleted.
Case 2b: The node can borrow a key from the immediate right sibling node, if it has
more than the minimum number of keys. The transfer of the keys take place through
the parent node, i.e, the minimum key of the right sibling moves upwards and replaces
the parent; while the parent key moves down to the target node from where the target
key is simply deleted.
Case 2c: If neither of the siblings have keys more than the minimum number of keys
required then, merge the target node with either the left or the right sibling along with
the parent key of respective node.
2.If the target key is at the internal node:
If the target key is at an internal node, we further study the given data to check if
any of the following cases exist:
Case 1: If the left child has more than the minimum number of keys, the target key in
the internal node is replaced by its inorder predecessor ,i.e, the largest element of the
left child node.
Case 2: If the right child has more than the minimum number of keys, the target key
in the internal node is replaced by it’s inorder successor ,i.e, the smallest element of
the right child node.
Applications of B Tree:
1. Large databases employ it to access information stored on discs.
2. Using the B-Tree, finding data in a data set can be done in a great deal less time.
3. Multilevel indexing is possible with the indexing feature.
4. The B-tree method is also used by the majority of servers.
5. In CAD systems, B-Trees are used to catalogue and search geometric data.
6. Other applications of B-Trees include encryption, computer networks, and
natural language processing.
7. Since accessing values stored in a large database that is stored on a disc takes a
long time, B trees are used to index the data and provide quick access to the
actual data stored on the disks.
8. In the worst case, it takes O (n) running time to search a database with n key
values that is not sorted or indexed. However, if we use B Tree to index this
database, it will be searched in O (log n) time in worst case.