Data Structure Intro and Algo
Introduction
A data structure can be defined as a group of data elements that provides an efficient way of storing and
organising data in the computer so that it can be used efficiently. Some examples of data structures are
arrays, linked lists, stacks, queues, etc. Data structures are widely used in almost every area of computer
science, e.g. operating systems, compiler design, artificial intelligence, graphics and many more.
Data structures are the main part of many computer science algorithms, as they enable programmers to
handle data in an efficient way. They play a vital role in enhancing the performance of a program, since a
main function of software is to store and retrieve the user's data as fast as possible.
Basic Terminology
Data structures are the building blocks of any program or software. Choosing the appropriate data
structure for a program is one of the most difficult tasks for a programmer. The following terminology is
used when discussing data structures:
Data: Data can be defined as an elementary value or a collection of values, for example, a student's name
and ID are data about the student.
Group Items: Data items which have subordinate data items are called group items, for example, the name of
a student can have a first name and a last name.
Record: Record can be defined as the collection of various data items, for example, if we talk about the
student entity, then its name, address, course and marks can be grouped together to form the record for the
student.
File: A file is a collection of various records of one type of entity, for example, if there are 60 employees
in an organisation, then there will be 60 records in the related file, where each record contains the data
about one employee.
Attribute and Entity: An entity represents a class of certain objects. It contains various attributes. Each
attribute represents a particular property of that entity.
Field: Field is a single elementary unit of information representing the attribute of an entity.
Processor speed: To handle a very large amount of data, high-speed processing is required, but as data
grows day by day to billions of files per entity, the processor may fail to deal with that much data.
Data Search: Consider an inventory of 10^6 items in a store. If our application needs to search for a
particular item, it may need to traverse all 10^6 items every time, which slows down the search process.
Multiple requests: If thousands of users are searching the data simultaneously on a web server, then there
is a chance that even a very large server may fail during that process.
In order to solve the above problems, data structures are used. Data is organized to form a data structure in
such a way that not all items need to be examined and the required data can be found almost instantly.
Reusability: Data structures are reusable, i.e. once we have implemented a particular data structure, we can
use it at any other place. Implementation of data structures can be compiled into libraries which can be used
by different clients.
Abstraction: A data structure is specified by its abstract data type (ADT), which provides a level of
abstraction. The client program uses the data structure through its interface only, without getting into the
implementation details.
Linear Data Structures: A data structure is called linear if all of its elements are arranged in a linear
order. In linear data structures, the elements are stored in a non-hierarchical way where each element has a
successor and a predecessor, except the first element (no predecessor) and the last element (no successor).
The types of linear data structures are given below:
Arrays: An array is a collection of data items of the same type, and each data item is called an element of
the array. The data type of the elements may be any valid data type such as char, int, float or double.
The elements of an array share the same variable name, but each one carries a different index number known
as its subscript. An array can be one-dimensional, two-dimensional or multidimensional.
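As a rough illustration of subscripts, the sketch below uses Python's standard `array` module for a one-dimensional array of integers and a list of rows for a two-dimensional one. The variable names and values are made up for the example.

```python
from array import array

# One-dimensional array: every element has the same type ('i' = signed int).
marks = array('i', [72, 85, 90, 66, 78, 81])

# Two-dimensional array modelled as a list of rows (an assumption of this
# sketch; other languages have built-in 2-D arrays).
grid = [
    [1, 2, 3],
    [4, 5, 6],
]

first = marks[0]               # subscript 0 selects the first element
last = marks[len(marks) - 1]   # the last valid subscript is length - 1
cell = grid[1][2]              # row 1, column 2

print(first, last, cell)
```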
Linked List: A linked list is a linear data structure which is used to maintain a list in memory. It can be
seen as a collection of nodes stored at non-contiguous memory locations. Each node of the list contains a
pointer to its adjacent node.
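A minimal sketch of a singly linked list, assuming each node points only to the next node. The class and method names (`Node`, `LinkedList`, `push_front`, `to_list`) are illustrative, not taken from any library.

```python
class Node:
    """One node of the list: a value plus a reference to the next node."""

    def __init__(self, value):
        self.value = value
        self.next = None


class LinkedList:
    def __init__(self):
        self.head = None  # an empty list has no first node

    def push_front(self, value):
        # The new node points at the old head and becomes the new head.
        node = Node(value)
        node.next = self.head
        self.head = node

    def to_list(self):
        # Follow the next-pointers from the head until the end is reached.
        items = []
        current = self.head
        while current is not None:
            items.append(current.value)
            current = current.next
        return items


numbers = LinkedList()
for value in (3, 2, 1):
    numbers.push_front(value)
print(numbers.to_list())
```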
Stack: A stack is a linear list in which insertions and deletions are allowed only at one end, called the top.
A stack is an abstract data type (ADT) that can be implemented in most programming languages. It is
named a stack because it behaves like a real-world stack, for example, a pile of plates or a deck of cards.
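The behaviour described above can be sketched as follows, assuming a Python list as the underlying storage with its end acting as the top. The `Stack` class and its method names are hypothetical.

```python
class Stack:
    def __init__(self):
        self._items = []  # the end of the list plays the role of the top

    def push(self, value):
        self._items.append(value)

    def pop(self):
        if not self._items:
            raise IndexError("pop from an empty stack (underflow)")
        return self._items.pop()

    def is_empty(self):
        return not self._items


plates = Stack()
plates.push("plate 1")
plates.push("plate 2")
print(plates.pop())  # the plate added last is removed first
```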
Queue: A queue is a linear list in which elements can be inserted only at one end, called the rear, and
deleted only at the other end, called the front.
It is an abstract data type, similar to a stack. A queue is open at both ends, therefore it follows the
First-In-First-Out (FIFO) methodology for storing the data items.
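A small sketch of FIFO behaviour, assuming `collections.deque` from the standard library as storage. The `Queue` class and the `enqueue`/`dequeue` names are chosen for the example.

```python
from collections import deque


class Queue:
    def __init__(self):
        self._items = deque()

    def enqueue(self, value):
        # Insert at the rear.
        self._items.append(value)

    def dequeue(self):
        # Delete from the front.
        if not self._items:
            raise IndexError("dequeue from an empty queue (underflow)")
        return self._items.popleft()


line = Queue()
line.enqueue("first")
line.enqueue("second")
print(line.dequeue())  # the element inserted first leaves first
```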
Non-Linear Data Structures: These data structures do not form a sequence, i.e. an item or element may be
connected with two or more other items in a non-linear arrangement. The data elements are not arranged in
a sequential structure.
Trees: Trees are multilevel data structures with a hierarchical relationship among their elements, known as
nodes. The bottommost nodes in the hierarchy are called leaf nodes, while the topmost node is called the
root node. Each node contains pointers to its adjacent nodes.
The tree data structure is based on the parent-child relationship among the nodes. Each node in the tree can
have one or more children, except the leaf nodes, which have none, and each node has exactly one parent,
except the root node, which has none. Trees can be classified into many categories, which will be discussed
later in this tutorial.
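The parent-child relationship can be sketched as below. `TreeNode`, `add_child` and `leaves` are illustrative names, and storing children in a list is an assumption of this example.

```python
class TreeNode:
    def __init__(self, value):
        self.value = value
        self.parent = None   # the root keeps None here
        self.children = []   # a leaf keeps this list empty

    def add_child(self, child):
        child.parent = self
        self.children.append(child)
        return child


def leaves(node):
    """Collect the values of all leaf nodes below (and including) node."""
    if not node.children:
        return [node.value]
    result = []
    for child in node.children:
        result.extend(leaves(child))
    return result


root = TreeNode("root")
a = root.add_child(TreeNode("a"))
b = root.add_child(TreeNode("b"))
c = a.add_child(TreeNode("c"))
print(leaves(root))
```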
Graphs: A graph can be defined as a pictorial representation of a set of elements (represented by vertices)
connected by links known as edges. A graph differs from a tree in that a graph can contain cycles, while a
tree cannot.
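To make the difference concrete, the sketch below stores an undirected graph as an adjacency dictionary and checks for a cycle with a depth-first search. The representation, the function name `has_cycle`, and the assumption of a simple graph (no parallel edges or self-loops) are choices made for this example.

```python
def has_cycle(graph):
    """Return True if the undirected graph (vertex -> list of neighbours)
    contains a cycle."""
    visited = set()

    def visit(vertex, parent):
        visited.add(vertex)
        for neighbour in graph[vertex]:
            if neighbour == parent:
                continue  # do not walk straight back along the same edge
            if neighbour in visited or visit(neighbour, vertex):
                return True
        return False

    # Start a search from every vertex not reached so far.
    return any(v not in visited and visit(v, None) for v in graph)


tree_like = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
triangle = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
print(has_cycle(tree_like), has_cycle(triangle))
```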
Example: If we need to calculate the average of the marks obtained by a student in 6 different subjects, we
need to traverse the complete array of marks and calculate the total sum, then divide that sum by the
number of subjects, i.e. 6, in order to find the average.
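The example above can be sketched as a single pass over the array. The marks used here are invented sample values.

```python
def average(marks):
    # Traverse every element once, accumulating the total.
    total = 0
    for mark in marks:
        total += mark
    # Divide the total by the number of subjects.
    return total / len(marks)


marks = [60, 70, 80, 90, 75, 75]  # six subjects
print(average(marks))
```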
2) Insertion: Insertion can be defined as the process of adding an element to the data structure at any
location.
If the data structure has a fixed capacity of n elements and is already full, a further insertion causes
overflow.
3) Deletion: The process of removing an element from the data structure is called deletion. We can delete
an element from the data structure at any location.
If we try to delete an element from an empty data structure, then underflow occurs.
4) Searching: The process of finding the location of an element within the data structure is called searching.
Two common algorithms for searching are linear search and binary search. We will discuss each of
them later in this tutorial.
5) Sorting: The process of arranging the data structure in a specific order is known as Sorting. There are
many algorithms that can be used to perform sorting, for example, insertion sort, selection sort, bubble sort,
etc.
6) Merging: When two lists, List A and List B, of sizes M and N respectively and containing elements of a
similar type, are joined to produce a third list, List C, of size (M+N), the process is called merging.
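A rough sketch of insertion, deletion and merging, assuming a hypothetical fixed-capacity container. `BoundedList`, its `capacity` parameter and the `merge` helper are names invented for this illustration.

```python
class BoundedList:
    def __init__(self, capacity):
        self.capacity = capacity  # assumed fixed size of the structure
        self.items = []

    def insert(self, index, value):
        if len(self.items) >= self.capacity:
            raise OverflowError("overflow: the structure is full")
        self.items.insert(index, value)

    def delete(self, index):
        if not self.items:
            raise IndexError("underflow: the structure is empty")
        return self.items.pop(index)


def merge(list_a, list_b):
    # Join a list of size M and a list of size N into one of size M + N.
    return list_a + list_b


box = BoundedList(capacity=2)
box.insert(0, "a")
box.insert(1, "b")
print(box.delete(0), merge([1, 2], [3, 4, 5]))
```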
DS Algorithm
What is an Algorithm?
An algorithm is a process or a set of rules required to perform calculations or other problem-solving
operations, especially by a computer. Formally, an algorithm is a finite set of instructions carried out
in a specific order to perform a specific task. It is not the complete program or code; it is just the
solution (logic) of a problem, which can be represented as an informal description, a flowchart or
pseudocode.
Characteristics of an Algorithm
The following are the characteristics of an algorithm:
o Input: An algorithm takes zero or more input values.
o Finiteness: An algorithm should be finite. Here, finiteness means that the algorithm should
contain a limited number of instructions, i.e., the instructions should be countable.
o Algorithm: An algorithm, which is a step-by-step procedure, is designed for a given problem.
o Input: After designing an algorithm, the required inputs are provided to the
algorithm.
o Processing unit: The input will be given to the processing unit, and the processing unit will produce
the desired output.
o Scalability: An algorithm helps us to understand scalability. When we have a big real-world problem,
we need to break it down into small steps so that it is easier to analyze.
o Performance: Real-world problems are not always easily broken down into smaller steps. If a problem
can be easily broken into smaller steps, it means that the problem is feasible.
Let's understand the algorithm through a real-world example. Suppose we want to make lemon juice; the
following are the steps required to make it:
Step 2: Squeeze the lemon as much you can and take out its juice in a container.
Step 5: When sugar gets dissolved, add some water and ice in it.
The above real-world example can be directly compared to the definition of an algorithm. We cannot perform
step 3 before step 2; we need to follow the specific order to make lemon juice. Likewise, an algorithm
requires that every instruction be followed in a specific order to perform a specific task.
Now we will look at an example of an algorithm in programming.
The following are the steps required to add two numbers entered by the user:
Step 1: Start
Step 4: Add the values of a and b and store the result in the sum variable, i.e., sum=a+b.
Step 6: Stop
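A minimal sketch of this algorithm, assuming the steps not shown above read the two values and display the result. The values are fixed here instead of being read from the user so that the example runs unattended, and `total` stands in for the `sum` variable of Step 4.

```python
def add_two_numbers(a, b):
    # Step 4: add the values of a and b and store the result.
    total = a + b
    return total


# Assumed input step: in an interactive program these would come from the user.
a = 7
b = 5
result = add_two_numbers(a, b)
# Assumed output step: show the result before stopping.
print("Sum:", result)
```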
Factors of an Algorithm
The following are the factors that we need to consider for designing an
algorithm:
o Modularity: If a problem is given and we can break it into small modules or small steps, which is
the basic definition of an algorithm, it means that this feature has been designed well for the
algorithm.
o Correctness: An algorithm is correct when the given inputs produce the desired output, which
means that the algorithm has been designed correctly and its analysis has been done correctly.
o Maintainability: Here, maintainability means that the algorithm should be designed in a very
simple, structured way so that when we redefine the algorithm, no major change needs to be made to
it.
o Robustness: Robustness refers to how clearly an algorithm defines our problem and how well it
copes with unexpected or invalid input.
o User-friendly: If the algorithm is not user-friendly, then the designer will not be able to explain it
to the programmer.
o Extensibility: If any other algorithm designer or programmer wants to use your algorithm, then it
should be extensible.
Importance of Algorithms
1. Theoretical importance: When a real-world problem is given to us, we break the problem
into small modules. To break down the problem, we should know all the theoretical aspects.
2. Practical importance: Theory is incomplete without practical implementation. So, the importance
of algorithms can be considered both theoretical and practical.
Issues of Algorithms
The following are the issues that come while designing an algorithm:
Approaches of Algorithm
The following are the approaches used after considering both the
theoretical and practical importance of designing an algorithm:
o Brute force algorithm: The general logic structure is applied to design an algorithm. It is also
known as an exhaustive search algorithm, which tries all the possibilities to provide the required
solution. Such algorithms are of two types:
1. Optimizing: Find all the solutions of a problem and then pick the best one; if the value of
the best solution is known in advance, the search terminates as soon as that solution is found.
o Greedy algorithm: It is an algorithmic paradigm that makes the locally optimal choice at each
iteration in the hope of reaching the best overall solution. It is easy to implement and usually fast
to execute. However, it is not guaranteed to give the globally optimal solution; it does so only for
problems with suitable structure.
o Dynamic programming: It makes the algorithm more efficient by storing intermediate results.
It follows these steps to find the optimal solution for the problem:
1. It breaks down the problem into subproblems.
2. After breaking down the problem, it finds the optimal solution to each of these subproblems.
3. It stores the results of the subproblems, which is known as memoization.
4. It reuses the stored results so that they need not be recomputed for the same subproblems.
o Branch and Bound Algorithm: The branch and bound algorithm is commonly applied to integer
programming and other combinatorial optimization problems. This approach divides the set of
feasible solutions into smaller subsets. These subsets are further evaluated to find the best solution.
o Randomized Algorithm: In a regular algorithm, we have a predefined input and a required output.
Algorithms that have a defined set of inputs and a required output, and follow fixed, described steps,
are known as deterministic algorithms. What happens when a random variable is introduced? In a
randomized algorithm, some random bits are generated by the algorithm and used alongside the input
to produce the output, so the behaviour is random in nature. Randomized algorithms are often simpler
and more efficient than comparable deterministic algorithms.
o Backtracking: Backtracking is an algorithmic technique that builds a solution recursively and
abandons a candidate as soon as it fails to satisfy the constraints of the problem.
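As an illustration of the dynamic programming approach listed above, the sketch below stores subproblem results in a dictionary and reuses them. Fibonacci numbers are used only as a familiar example problem; the function name `fib` and the `memo` parameter are assumptions of this sketch.

```python
def fib(n, memo=None):
    """Return the n-th Fibonacci number (fib(0) = 0, fib(1) = 1)."""
    if memo is None:
        memo = {}
    if n < 2:
        return n
    if n not in memo:
        # Break the problem into two smaller subproblems and store the result.
        memo[n] = fib(n - 1, memo) + fib(n - 2, memo)
    # Reuse the stored result instead of recomputing it.
    return memo[n]


print(fib(10))
```

Without the stored results, the same subproblems would be recomputed many times and the running time would grow exponentially with n.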
o Search: Algorithm developed for searching the items inside a data structure.
o Delete: Algorithm developed for deleting the existing element from the data structure.
o Update: Algorithm developed for updating the existing element inside a data structure.
Algorithm Analysis
An algorithm can be analyzed at two levels: before it is implemented and after it is implemented. The
following are the two kinds of analysis:
o Priori Analysis: Here, priori analysis is the theoretical analysis of an algorithm, which is done before
implementing the algorithm. In this analysis, factors such as processor speed are assumed to be
constant and to have no effect on the implementation.
o Posterior Analysis: Here, posterior analysis is the practical analysis of an algorithm. The practical
analysis is achieved by implementing the algorithm in a programming language. This analysis
evaluates how much running time and space the algorithm takes.
Algorithm Complexity
The performance of an algorithm can be measured by two factors:
o Time complexity: The time complexity of an algorithm is the amount of time required to complete
its execution. It is commonly expressed using big O notation, an asymptotic notation. The time
complexity is mainly calculated by counting the number of steps needed to finish the execution.
Let's understand the time complexity through an example.
    sum = 0
    // Suppose we have to calculate the sum of n numbers.
    for i = 1 to n
        sum = sum + i
    // When the loop ends, sum holds the sum of the n numbers.
    return sum
In the above code, the loop body runs n times, so the time taken by the loop grows linearly with n, i.e. it
is O(n). The statement return sum, in contrast, takes constant time, as it does not depend on the value of n
and provides the result in one step only. We generally consider the worst-case time complexity, as it is the
maximum time taken for any input of a given size.
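The step counting described above can be made visible with a small sketch that counts loop iterations alongside the sum. The `steps` counter and the function name are additions made only for this illustration.

```python
def sum_with_step_count(n):
    total = 0
    steps = 0
    for i in range(1, n + 1):
        total += i
        steps += 1  # one step per loop iteration
    return total, steps


for n in (10, 100, 1000):
    total, steps = sum_with_step_count(n)
    print(n, total, steps)  # the step count grows in proportion to n
```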
o Space complexity: An algorithm's space complexity is the amount of space required to solve a
problem and produce an output. Similar to the time complexity, space complexity is also expressed
in big O notation.
So,
Types of Algorithms
The following are the types of algorithm:
o Search Algorithm
o Sort Algorithm
Search Algorithm
We search for something almost every day in our daily life. Similarly, a computer stores a huge amount of
data, and whenever the user asks for some data, the computer searches for it in memory and provides it to
the user. There are mainly two techniques available to search for data in an array:
o Linear search
o Binary search
Linear Search
Linear search is a very simple algorithm that starts searching for an element or a value from the beginning
of an array and continues until the required element is found or the end of the array is reached. It compares
the element to be searched with the elements in the array one by one; if a match is found, it returns the
index of the element, otherwise it returns -1. This algorithm can be used on an unsorted list.
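A minimal sketch of linear search as described above, returning the index of the first match or -1. The function name and sample data are illustrative.

```python
def linear_search(items, target):
    # Compare the target with each element in turn, starting at index 0.
    for index, value in enumerate(items):
        if value == target:
            return index
    return -1  # the end was reached without finding the target


data = [42, 7, 19, 3, 25]  # the list does not need to be sorted
print(linear_search(data, 19), linear_search(data, 100))
```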
Binary Search
Binary search is a simple algorithm that finds an element very quickly. It is used to search for an
element in a sorted list. The elements must be stored in sorted order for binary search to work; it cannot
be used if the elements are stored in a random order. It works by repeatedly comparing the target with the
middle element of the remaining range and discarding the half that cannot contain the target.
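A sketch of an iterative binary search, assuming the list is sorted in ascending order and that -1 is returned when the element is absent (mirroring the convention used for linear search). The names are illustrative.

```python
def binary_search(sorted_items, target):
    low = 0
    high = len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2  # index of the middle element
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1   # the target can only be in the right half
        else:
            high = mid - 1  # the target can only be in the left half
    return -1


data = [3, 7, 19, 25, 42]  # must already be sorted
print(binary_search(data, 25), binary_search(data, 8))
```

Because the search range is halved at every step, the number of comparisons grows as O(log n) rather than O(n).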
Sorting Algorithms
Sorting algorithms are used to rearrange the elements of an array or a given data structure in either
ascending or descending order. The comparison operator decides the new order of the elements.
o Searching for a particular element in a sorted list is faster than in an unsorted list.
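As one example of a sorting algorithm, the sketch below implements insertion sort, with the comparison chosen by a `descending` flag. The flag and the function name are assumptions made for this illustration.

```python
def insertion_sort(items, descending=False):
    result = list(items)  # work on a copy, leaving the input unchanged
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift elements that are out of order relative to current one
        # position to the right; the comparison decides the final order.
        while j >= 0 and (
            result[j] < current if descending else result[j] > current
        ):
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


print(insertion_sort([5, 2, 9, 1]))
print(insertion_sort([5, 2, 9, 1], descending=True))
```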