Data Structures
(Computer Science)
SEMESTER - II (CBCS)
: Mlind Thorat
Asst. Professor,
KJSIEIT, Sion, Mumbai
: Dr Triveni Koul
Asst. Professor,
Yashwantrao Chavan College,
Kopar Khairane, Navi Mumbai
: Abhijeet Pawaskar
Asst. Professor,
Thakur Educational Trust's Thakur College of
Science And Commerce, Mumbai
Published by : Director,
Institute of Distance and Open Learning,
University of Mumbai,
Vidyanagari, Mumbai - 400 098.
ipin Enterprises,
Tantia Jogani Industrial Estate, Unit No. 2,
Ground Floor, Sitaram Mill Compound,
J.R. Boricha Marg, Mumbai - 400 011
DTP Composed and Printed by : Mumbai University Press,
Vidyanagari, Santacruz (E), Mumbai - 400 098
CONTENTS
Unit No. Title
Unit I
1. Abstract Data Types
2. Arrays
3. Sets and Maps
4. Algorithm Analysis
5. Application of Searching
6. Application of Sorting and Working With Sorted Lists
Unit II
7. Linked Structures
8. Stacks
9. Linked Lists
10. Queues
Unit III
11. Recursion
12. Hash Table
13. Advanced Sorting
14. Binary Trees
F.Y.B.SC.
(COMPUTER SCIENCE)
SEM - II (CBCS)
Unit I
1
ABSTRACT DATA TYPES
Unit Structure
1.0 Objective
1.1 Introduction
1.1.1 Abstractions
1.1.2 Abstract Data Types
1.1.3 Data Structures
1.1.4 General Definitions
1.2 The Date Abstract Data Type
1.2.1 Defining the ADT
1.2.2 Preconditions and Postconditions
1.3 Bags
1.4 Iterators
1.5 Application: Student Records
1.6 Exercise
1.0 OBJECTIVE
In this chapter we are going to learn.
➔ the concept of abstract data types (ADTs) for both simple types, those
containing individual data fields, and the more complex types, those
containing data structures.
➔ ADTs definition, use, and implementation.
➔ the importance of abstraction,
➔ several ADTs and then how a well-defined ADT can be used without
knowing how it's actually implemented.
➔ the implementation of the ADTs with an emphasis placed on the
importance of selecting an appropriate data structure.
The chapter includes an introduction to the Python iterator mechanism and
provides an example of a user-defined iterator for use with a container
type ADT.
1.1 INTRODUCTION
● Data items are represented within a computer as a sequence of binary
digits.
● These sequences can appear very similar but have different meanings
since computers can store and manipulate different types of data.
● For example, the binary sequence 01001100110010110101110011011100
could be a string of characters, an integer value, or a real value.
● To distinguish between the different types of data, the term type is
often used to refer to a collection of values and the term data type to
refer to a given type along with a collection of operations for
manipulating values of the given type.
● Programming languages commonly provide data types as part of the
language itself.
1.1.1 Abstractions
● An abstraction is a mechanism for separating the properties of an
object and restricting the focus to those relevant in the current context.
● The user of the abstraction does not have to understand all of the
details in order to utilize the object, but only those relevant to the
current task or problem.
● Two common types of abstractions encountered in computer science
are procedural abstraction and data abstraction.
● All data structures store a collection of values, but differ in how they
organize the individual data items and by what operations can be
applied to manage the collection.
● The choice of a particular data structure depends on the ADT and the
problem at hand. Some data structures are better suited to particular
problems.
● For example, the queue structure is perfect for implementing a printer
queue, while the B-Tree is the better choice for a database index.
1.2 THE DATE ABSTRACT DATA TYPE
Definition:
A date represents a single day in the proleptic Gregorian calendar in
which the first day starts on November 24, 4713 BC.
Method Description
dayOfWeek(): ➢ Returns the day of the week as a
number between 0 and 6 with 0
representing Monday and 6
representing Sunday.
● Postconditions:
A postcondition indicates the result or ending state of the ADT
instance after the operation is performed.
● The precondition is assumed to be true while the postcondition is a
guarantee as long as the preconditions are met.
● Attempting to perform an operation in which the precondition is not
satisfied should be flagged as an error.
● Example:
Consider the use of the pop(i) method for removing a value from a list.
○ When this method is called, the precondition states the supplied
index must be within the legal range.
○ Upon successful completion of the operation, the postcondition
guarantees the item has been removed from the list.
○ If an invalid index, one that is out of the legal range, is passed to
the pop() method, an exception is raised.
● All operations have at least one precondition, which is that the ADT
instance has to have been previously initialized.
● When implementing abstract data types, it’s important that we ensure
the proper execution of the various operations by verifying any stated
preconditions.
● The appropriate mechanism when testing preconditions for abstract
data types is to test the precondition and raise an exception when the
precondition fails.
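The following is a minimal sketch (not the book's listing) of how a precondition can be tested and an exception raised, using a simplified pop() on a list-like container; the class and variable names are illustrative.

# Checking a precondition before performing an operation.
class SimpleList:
    def __init__(self):
        self._items = []

    def append(self, item):
        self._items.append(item)

    def pop(self, i):
        # Precondition: the supplied index must be within the legal range.
        if i < 0 or i >= len(self._items):
            raise IndexError("invalid index: " + str(i))
        # Postcondition: the item at position i is removed and returned.
        return self._items.pop(i)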
1.3 BAGS
● The Date ADT provided an example of a simple abstract data type.
● To illustrate the design and implementation of a complex abstract data
type, we define the Bag ADT.
● A bag is a simple container like a shopping bag that can be used to
store a collection of items.
Definition:
A bag is a container that stores a collection in which duplicate values are
allowed.
The items, each of which is individually stored, have no particular order
but they must be comparable.
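A minimal list-based sketch of the Bag ADT is given below; the method names follow the usual Bag operations (add, remove, membership test, length), and the sketch is illustrative rather than the book's implementation.

# A list-based Bag: duplicates are allowed, items have no particular order.
class Bag:
    def __init__(self):
        self._items = list()

    def __len__(self):
        return len(self._items)

    def __contains__(self, item):
        return item in self._items

    def add(self, item):
        self._items.append(item)

    def remove(self, item):
        # Precondition: the item must be in the bag.
        assert item in self._items, "The item must be in the bag."
        self._items.remove(item)
        return item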
1.4 ITERATORS
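Python's iterator mechanism lets a container ADT be used with the for loop. A sketch of a user-defined iterator for the Bag sketched above is shown below; the class name _BagIterator is illustrative.

# An iterator object that steps through the items of a Bag one at a time.
class _BagIterator:
    def __init__(self, theItems):
        self._items = theItems
        self._curIdx = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._curIdx < len(self._items):
            item = self._items[self._curIdx]
            self._curIdx += 1
            return item
        raise StopIteration

# Added to the Bag class so a Bag can be traversed with a for loop:
#     def __iter__(self):
#         return _BagIterator(self._items)
#
# Usage:  for item in bag: print(item)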
1.5 APPLICATION: STUDENT RECORDS
● Most computer applications are written to process and manipulate data
that is stored external to the program.
● Data is commonly extracted from files stored on disk, from databases,
and even from remote sites through web services.
● For example, suppose we have a collection of records stored on disk
that contain information related to students at Smalltown College.
● We have been assigned the task to extract this information and
produce a report similar to the following in which the records are
sorted by identification number.
● Our contact in the Registrar’s office, who assigned the task, has
provided some information about the data.
● We know each record contains five pieces of information for an
individual student:
(1) the student’s id number represented as an integer;
(2) their first and last names, which are strings;
(3) an integer classification code in the range [1 . . . 4] that
indicates if the student is a freshman, sophomore, junior, or
senior; and
(4) their current grade point average represented as a floating-point
value.
● The data could be stored in a plain text file, in a binary file, or even in
a database.
● In addition, if the data is stored in a text or binary file, we will need to
know how the data is formatted in the file, and if it’s in a relational
database, we will need to know the type and the structure of the
database.
1.6 EXERCISE
Answer the following:
1. Define ADT (Abstract Data Type). Mention the features of ADT. What
are the benefits of ADT?
2. Explain the various operations of the list ADT with examples
Reference Book:
Data Structure and Algorithm Using Python, Rance D. Necaise, 2016 Wiley
India Edition
2
ARRAYS
Unit Structure
2.0 Objective
2.1 Array Structure
2.2 Python List
2.3 Two Dimensional Arrays
2.4 Matrix Abstract Data Type
2.5 Application: The Game of Life
2.6 Exercise
2.0 OBJECTIVE
● Introduces the student to the array structure, which is important since
Python only provides the list structure and students are unlikely to
have seen the concept of the array as a fixed-sized structure in a first
course using Python.
● We define an ADT for a one-dimensional array and implement it using
a hardware array provided through a special mechanism of the C-
implemented version of Python.
● The two-dimensional array is also introduced and implemented using a
1-D array of arrays.
● The array structures will be used throughout the text in place of
Python's list when it is the appropriate choice.
● The implementation of the list structure provided by Python is
presented to show how the various operations are implemented using a
1-D array.
● The Matrix ADT is introduced and includes an implementation using a
two-dimensional array that exposes the students to an example of an
ADT that is best implemented using a structure other than the list or
dictionary.
● The most basic structure for storing and accessing a collection of data
is the array.
● Arrays can be used to solve a wide range of problems in computer
science. Most programming languages provide this structured data
type as a primitive and allow for the creation of arrays with multiple
dimensions.
● In this chapter, we implement an array structure for a one-dimensional
array and then use it to implement a two-dimensional array and the
related matrix structure.
List vs. Array
● A list can store values of different types, whereas an array can only hold
values of the same type.
● A list cannot handle direct arithmetic operations, whereas an array can.
● Lists are a built-in data structure, so no import is needed, whereas the
array module must be imported before an array can be used.
● Lists are less compact than arrays when storing data.
● A list can be printed in its entirety without explicit looping, whereas
printing every element of an array requires an explicit loop.
Method Description
getitem( index ): Returns the value stored in the array at element position
index. The index argument must be within the valid range.
Accessed using the subscript operator.
setitem( index, value ): Modifies the contents of the array element at position
index to contain value. The index must be within the
valid range. Accessed using the subscript operator.
➔ After the array has been created, the elements can be accessed using
the same integer subscript notation as used with Python’s own
sequence types.
➔ For the slots array, the legal range is [0 . . . 4].
➔ The range() function can be used to iterate over the array indices when
initializing the elements.
➔ References to any type of Python object can be stored in any element
of the array.
➔ For example, the following code segment stores three integers in
various elements of the array:
slots[1] = 12
slots[3] = 54
slots[4] = 37
the result of which is illustrated here:
➔ The size of the array can never change, so removing an item from an
array has no effect on the size of the array or on the items stored in
other elements.
➔ The array does not provide any of the list type operations such as
appending or popping items, searching for a specific item, or sorting
the items.
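A fixed-size Array of this kind can be built on top of a low-level array obtained through the ctypes module of the C-implemented version of Python, as mentioned in the objectives. The sketch below is one possible implementation under that assumption; the names are illustrative.

import ctypes

class Array:
    # A fixed-size, one-dimensional array of object references.
    def __init__(self, size):
        assert size > 0, "Array size must be > 0"
        self._size = size
        PyArrayType = ctypes.py_object * size    # low-level array type
        self._elements = PyArrayType()
        self.clear(None)                          # initialize every slot

    def __len__(self):
        return self._size

    def __getitem__(self, index):
        assert 0 <= index < len(self), "Array subscript out of range"
        return self._elements[index]

    def __setitem__(self, index, value):
        assert 0 <= index < len(self), "Array subscript out of range"
        self._elements[index] = value

    def clear(self, value):
        # Set every element to the given value.
        for i in range(len(self)):
            self._elements[i] = value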
2.2.1 Creating a Python List
● Suppose we create a list containing several values:
pyList = [ 4, 12, 2, 34, 17 ]
● This statement results in the list() constructor being called to create a list
object and fill it with the given values.
● Following Figure illustrates the abstract and physical views of our
sample list:
● In the physical view, the elements of the array structure used to store
the actual contents of the list are enclosed inside the dashed gray box.
● The elements with null references shown outside the dashed gray box
are the remaining elements of the underlying array structure that are
still available for use.
● This notation will be used throughout the section to illustrate the
contents of the list and the underlying array used to implement it.
2.2.2 Appending Items
● If there is room in the array, the item is stored in the next available slot
of the array and the length field is incremented by one.
● The result of appending 50 to pyList is illustrated in Figure
pyList.append( 50 )
Figure: The steps required to expand the array to provide space for
value 6
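The append mechanism described above can be sketched in plain Python as follows; this is an illustrative model of the behaviour (the real list is implemented in C), with an underlying fixed-capacity array, a length counter, and a capacity value.

class SimplePyList:
    def __init__(self):
        self._capacity = 4
        self._length = 0
        self._array = [None] * self._capacity     # stand-in for the real C array

    def append(self, item):
        if self._length == self._capacity:         # no free slot: expand the array
            newArray = [None] * (2 * self._capacity)
            for i in range(self._length):           # copy old contents element by element
                newArray[i] = self._array[i]
            self._array = newArray
            self._capacity *= 2
        self._array[self._length] = item            # store item in the next free slot
        self._length += 1                            # increment the length field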
2.2.3 Extending a List
● A list can be appended to a second list using the extend() method as
shown in the following example:
pyListA = [ 34, 12 ]
pyListB = [ 4, 6, 31, 9 ]
pyListA.extend( pyListB )
● The new array will be created larger than needed to allow more items
to be added to the list without first requiring an immediate expansion
of the array.
● After the new array is created, elements from the destination list are
copied to the new array followed by the elements from the source list,
pyListB, as illustrated in Figure:
2.2.4 Inserting Items
Figure: Inserting an item into a list: (a) the array elements are shifted
to the right one at a time, traversing from right to left; (b) the new
value is then inserted into the array at the given position; (c) the result
after inserting the item.
2.2.5 Removing Items
● An item can be removed from any position within the list using the
pop() method.
● Consider the following code segment, which removes both the first
and last items from the sample list:
pyList.pop( 0 ) # remove the first item
pyList.pop() # remove the last item
● The first statement removes the first item from the list.
● After the item is removed, typically by setting the reference variable to
None, the items following it within the array are shifted down, from
left to right, to close the gap.
● Finally, the length of the list is decremented to reflect the smaller size.
● The following figure illustrates the process of removing
the first item from the sample list.
● The second pop() operation in the example code removes the last item
from the list
Figure: Removing an item from a list: (a) a copy of the item is saved;
(b) the array elements are shifted to the left one at a time, traversing
left to right; and (c) the size of the list is decremented by one.
● After removing an item from the list, the size of the array may be
reduced using a technique similar to that for expansion.
● This reduction occurs when the number of available slots in the
internal array falls below a certain threshold.
● For example, when more than half of the array elements are empty, the
size of the array may be cut in half.
2.2.6 List Slice
● Slicing is an operation that creates a new list consisting of a
contiguous subset of elements from the original list.
● The original list is not modified by this operation.
● Instead, references to the corresponding elements are copied and
stored in the new list.
● In Python, slicing is performed on a list using the colon operator by
specifying the index of the first element and the index one past the last
element to be included in the subset.
● Consider the following example code segment, which creates a slice
from our sample list:
aSlice = theVector[2:3]
● To slice a list, a new list is created with a capacity large enough to
store the entire subset of elements plus additional space for future
insertions.
● The elements within the specified range are then copied, element by
element, to the new list. The result of creating the sample slice is
illustrated in Figure:
2.3 TWO-DIMENSIONAL ARRAYS
Method Description
setitem( i1, i2, value ): Modifies the contents of the 2-D array
element indicated by the 2-tuple (i1, i2) with
the new value. Both indices must be within
the valid range. Accessed using the
subscript operator: x[0,3] = y.
clear() The clear() method can set every element to the given
value by calling the clear() method on each of the 1-D
arrays used to store the individual rows.
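A 2-D array can be implemented as a 1-D array of 1-D arrays, as described in the objectives. The sketch below assumes the Array class sketched earlier in this chapter; method names mirror the 2-D array operations described above.

class Array2D:
    def __init__(self, numRows, numCols):
        # A 1-D array holding one row array per row.
        self._theRows = Array(numRows)
        for i in range(numRows):
            self._theRows[i] = Array(numCols)

    def numRows(self):
        return len(self._theRows)

    def numCols(self):
        return len(self._theRows[0])

    def clear(self, value):
        # Set every element by clearing each 1-D row array in turn.
        for row in range(self.numRows()):
            self._theRows[row].clear(value)

    def __getitem__(self, ndxTuple):
        row, col = ndxTuple                 # accessed as x[row, col]
        return self._theRows[row][col]

    def __setitem__(self, ndxTuple, value):
        row, col = ndxTuple                 # accessed as x[row, col] = value
        self._theRows[row][col] = value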
2.4 MATRIX ABSTRACT DATA TYPE
Method Description
setitem ( row, col, scalar ): Sets the matrix element at the given row
and col to scalar. The element indices
must be within the valid range.
● Scaling.
○ A matrix can be uniformly scaled, which modifies each element of
the matrix by the same scale factor.
○ A scale factor of less than 1 has the effect of reducing the value of
each element whereas a scale factor greater than 1 increases the
value of each element.
○ Scaling a matrix by a scale factor of 3 is illustrated here:
● Transpose
○ Another useful operation that can be applied to a matrix is the matrix
transpose.
○ Given an m × n matrix, the transpose swaps the rows and columns to
create a new matrix of size n × m as illustrated here:
● Multiplication.
○ Matrix multiplication is only defined for matrices where the number
of columns in the matrix on the lefthand side is equal to the number
of rows in the matrix on the righthand side.
○ The result is a new matrix that contains the same number of rows as
the matrix on the lefthand side and the same number of columns as
the matrix on the righthand side.
○ In other words, given a matrix of size m × n multiplied by a matrix
of size n × p, the resulting matrix is of size m × p.
○ In multiplying two matrices, each element of the new matrix is the
result of summing the product of a row in the lefthand side matrix by
a column in the righthand side matrix.
○ In the example matrix multiplication illustrated here, the row and
column used to compute entry (0, 0) of the new matrix is shaded in
gray.
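The multiplication rule described above can be illustrated with a short sketch using plain nested lists rather than the Matrix ADT; the function name and sample data are illustrative.

def matrix_multiply(A, B):
    m, n = len(A), len(A[0])        # A is m x n
    n2, p = len(B), len(B[0])       # B is n x p
    assert n == n2, "columns in A must equal rows in B"
    C = [[0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            # Entry (i, j) is the sum of products of row i of A and column j of B.
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

# Example: a 2 x 3 matrix times a 3 x 2 matrix produces a 2 x 2 matrix.
# matrix_multiply([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]])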
2.5 APPLICATION: THE GAME OF LIFE
● The game of Life requires the use of a grid for storing the organisms.
● A Life Grid ADT can be defined to add a layer of abstraction between
the algorithm for “playing” the game and the underlying structure used
to store and manipulate the data.
Method Description
clearCell( row, col ): Clears the individual cell (row, col) and
sets it to dead.
The cell indices must be within the valid
range of the grid.
The gameoflife.py program:
1 # Program for playing the game of Life.
2 from life import LifeGrid
3
4 # Define the initial configuration of live cells.
5 INIT_CONFIG = [ (1,1), (1,2), (2,2), (3,2) ]
6
7 # Set the size of the grid.
8 GRID_WIDTH = 5
9 GRID_HEIGHT = 5
10
11 # Indicate the number of generations.
12 NUM_GENS = 8
13
14 def main():
15 # Construct the game grid and configure it.
16 grid = LifeGrid( GRID_WIDTH, GRID_HEIGHT )
17 grid.configure( INIT_CONFIG )
18
34 # Determine the number of live neighbors for this cell.
35 neighbors = grid.numLiveNeighbors( i, j )
36
37 # Add the (i,j) tuple to liveCells if this cell contains
38 # a live organism in the next generation.
39 if (neighbors == 2 and grid.isLiveCell( i, j )) or \
40 (neighbors == 3 ) :
41 liveCells.append( (i, j) )
42
43 # Reconfigure the grid using the liveCells coord list.
44 grid.configure( liveCells )
45
46 # Prints a text-based representation of the game grid.
47 def draw( grid ):
48 ......
49
50 # Executes the main routine.
51 main()
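The listing above jumps from line 18 to line 34; the omitted portion plays the game for NUM_GENS generations and begins an evolve() routine whose body continues at line 34. A plausible sketch of evolve() is given below; the LifeGrid methods numRows() and numCols() are assumed from the Life Grid ADT.

def evolve(grid):
    # Collect the coordinates of cells that are alive in the next generation.
    liveCells = list()
    for i in range(grid.numRows()):
        for j in range(grid.numCols()):
            # Determine the number of live neighbors for this cell.
            neighbors = grid.numLiveNeighbors(i, j)
            # A cell lives if it has 3 live neighbors, or 2 and is already alive.
            if (neighbors == 2 and grid.isLiveCell(i, j)) or neighbors == 3:
                liveCells.append((i, j))
    # Reconfigure the grid using the liveCells coordinate list.
    grid.configure(liveCells)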
2.6 EXERCISE
Answer the following:
1. Explain the different operations performed on a List.
2. Explain the applications of the Array ADT.
Reference:
Data Structure and Algorithm Using Python, Rance D. Necaise, 2016 Wiley
India Edition
3
SETS AND MAPS
Unit Structure
3.0 Objective
3.1 Sets
3.1.1 Set ADT
3.1.2 Selecting Data Structure
3.1.3 List based Implementation
3.2 Maps
3.2.1 Map ADT
3.2.2 List Based Implementation
3.3 Multi-Dimensional Arrays
3.3.1 Multi-Array ADT
3.3.2 Implementing Multiarrays
3.4 Application
3.5 Exercise
3.0 OBJECTIVE
In this chapter we are going to learn about:
➢ Both the Set and Map (or dictionary) ADTs.
➢ Multi-dimensional arrays (those of two or more dimensions)
➢ Concept of physically storing these using a one-dimensional array in
either row-major or column-major order.
➢ Benefit from the use of a three-dimensional array.
3.1 SETS
● The Set ADT is a common container used in computer science.
● Set is commonly used when you need to store a collection of unique
values without regard to how they are stored or when you need to
perform various mathematical set operations on collections.
Definition Set ADT
A set is a container that stores a collection of unique values over a given
comparable domain in which the stored values have no particular
ordering.
Method Description
remove( element ): Removes the given value from the set if the
value is contained in the set and raises an
exception otherwise.
union( setB ): Creates and returns a new set that is the union of
this set and set B.
The new set created from the union of two sets,
A and B, contains all elements in A plus those
elements in B that are not in A.
Neither set A nor set B is modified by this
operation.
intersect( setB ): Creates and returns a new set that is the
intersection of this set and set B.
The intersection of sets A and B contains only
those elements that are in both A and B.
Neither set A nor set B is modified by this
operation.
Example:
In the following code segment, we create two sets and add elements to
each. The results are illustrated in Figure:
smith = Set()
smith.add( "CSCI-112" )
smith.add( "MATH-121" )
smith.add( "HIST-340" )
smith.add( "ECON-101" )
roberts = Set()
roberts.add( "POL-101" )
roberts.add( "ANTH-230" )
roberts.add( "CSCI-112" )
roberts.add( "ECON-101" )
3.1.2 Selecting a Data Structure
● We are trying to replicate the functionality of the set structure
provided by Python, which leaves the array, list, and dictionary
containers for consideration in implementing the Set ADT.
● The storage requirements for the bag and set are very similar with the
difference being that a set cannot contain duplicates.
● The dictionary would seem to be the ideal choice since it can store
unique items, but it would waste space in this case.
● The dictionary stores key/value pairs, which requires two data fields
per entry.
● An array could be used to implement the set, but a set can contain any
number of elements and by definition an array has a fixed size.
● To use the array structure, we would have to manage the expansion of
the array when necessary in the same fashion as it’s done for the list.
● Since the list can grow as needed, it seems ideal for storing the
elements of a set just as it was for the bag and it does provide for the
complete functionality of the ADT.
● Since the list allows for duplicate values, however, we must make
sure as part of the implementation that no duplicates are added to our
set.
3.1.3 List-Based Implementation
● Having selected the list structure, we can now implement the Set
ADT.
● Some of the operations of the set are very similar to those of the Bag
ADT and are implemented in a similar fashion.
● Sample instances for the two sets from Figure (a) are illustrated in
Figure (b)
(a) (b)
Figure (a) and (b): Two instances of the Set class implemented as a
list.
Adding Elements
➢ We must ensure that duplicate values are not
added to the set since the list structure does not
handle this for us.
➢ When implementing the add method we must
first determine if the supplied element is
already in the list or not.
○ If the element is not a duplicate, we can
simply append the value to the end of the
list;
○ If the element is a duplicate, we do
nothing.
➢ The reason for this is that the definition of the
add() operation indicates no action is taken
when an attempt is made to add a duplicate
value.
➢ This is known as a noop, which is short for no
operation and indicates no action is taken.
➢ Noops are appropriate in some cases, which
will be stated implicitly in the definition of an
abstract data type by indicating no action is to
be taken when the precondition fails as we did
with the add() operation.
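A minimal sketch of the list-based Set described above is shown below, with the add() method written exactly as the steps describe; the attribute and method names are illustrative.

class Set:
    def __init__(self):
        self._theElements = list()

    def __len__(self):
        return len(self._theElements)

    def __contains__(self, element):
        return element in self._theElements

    def __iter__(self):
        return iter(self._theElements)

    def add(self, element):
        # Append only if the element is not already in the set;
        # otherwise the operation is a noop.
        if element not in self:
            self._theElements.append(element)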
Comparing Two Sets
➢ After verifying the size of the lists, we can
test to see if the self set is a subset of setB by
calling self.isSubsetOf(setB).
➢ This is a valid test since two equal sets are
subsets of each other and we already know
they are of the same size.
➢ To determine if one set is the subset of
another, we can iterate over the list of
elements in the self set and make sure each is
contained in setB.
➢ If just one element in the self set is not in
setB, then it is not a subset.
The Set Union
➢ For the first step, we simply create a new
instance of the Set class.
➢ The second step is accomplished with the use
of the list extend() method.
➢ It directly copies the entire contents of the list
used to store the elements of the self set to the
list used to store the elements of the newSet.
➢ For the final step, we iterate through the
elements of setB and add those elements to
the newSet that are not in the self set.
➢ The unique elements are added to the newSet
by appending them to the list used to store the
elements of the newSet.
➢ The remaining operations of the Set ADT can
be implemented in a similar fashion and are
left as exercises.
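Continuing the Set sketch above, the subset test, equality test, and union can be written as follows (these methods go inside the same Set class); they follow the steps just described and are a sketch rather than the book's listing.

    def isSubsetOf(self, setB):
        # Every element of this set must also be in setB.
        for element in self:
            if element not in setB:
                return False
        return True

    def __eq__(self, setB):
        # Two sets are equal if they are the same size and one is a subset of the other.
        if len(self) != len(setB):
            return False
        return self.isSubsetOf(setB)

    def union(self, setB):
        newSet = Set()
        # Copy the elements of this set, then add the unique elements of setB.
        newSet._theElements.extend(self._theElements)
        for element in setB:
            if element not in self:
                newSet._theElements.append(element)
        return newSet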
3.2 MAPS
● An abstract data type that provides this type of search capability is
often referred to as a map or dictionary since it maps a key to a
corresponding value.
● Consider the problem of a university registrar having to manage and
process large volumes of data related to students.
● To keep track of the information or records of data, the registrar
assigns a unique student identification number to each individual
student as illustrated in following Figure.
3.2.1 The Map ADT
Method Description
add( key, value ): Adds a new key/value pair to the map if the key is
not already in the map or replaces the data
associated with the key if the key is in the map.
Returns True if this is a new key and False if the
data associated with the existing key is replaced.
remove( key ): Removes the key/value pair for the given key if it is
in the map and raises an exception otherwise.
valueOf( key ): Returns the data record associated with the given
key. The key must exist in the map or an exception
is raised.
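A minimal list-based sketch of the Map ADT is given below; each key/value pair is kept in a small helper object, and the helper name _MapEntry and other identifiers are illustrative.

class _MapEntry:
    def __init__(self, key, value):
        self.key = key
        self.value = value

class Map:
    def __init__(self):
        self._entryList = list()

    def __len__(self):
        return len(self._entryList)

    def __contains__(self, key):
        return self._findPosition(key) is not None

    def add(self, key, value):
        ndx = self._findPosition(key)
        if ndx is not None:                         # key exists: replace the value
            self._entryList[ndx].value = value
            return False
        else:                                       # new key: add a new entry
            self._entryList.append(_MapEntry(key, value))
            return True

    def valueOf(self, key):
        ndx = self._findPosition(key)
        assert ndx is not None, "Invalid map key."
        return self._entryList[ndx].value

    def remove(self, key):
        ndx = self._findPosition(key)
        assert ndx is not None, "Invalid map key."
        self._entryList.pop(ndx)

    def _findPosition(self, key):
        # Return the index of the entry containing key, or None if not found.
        for i in range(len(self._entryList)):
            if self._entryList[i].key == key:
                return i
        return None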
3.3 MULTI-DIMENSIONAL ARRAYS
Method Description
length( dim ): Returns the length of the given array
dimension.
The individual dimensions are numbered
starting from 1, where 1 represents the
first, or highest, dimension possible in
the array.
Thus, in an array with three dimensions,
1 indicates the number of tables in the
box, 2 is the number of rows, and 3 is the
number of columns.
setitem ( i1, i2, . . . in, value ): Modifies the contents of the specified
array element to contain the given value.
The element is specified by the n-tuple
(i1, i2, . . . in).
All of the subscript components must be
given and they must be within the valid
range of the corresponding array
dimensions.
Accessed using the element operator: x[
1, 2 ] = y.
Data Organization
● Most computer architectures provide a mechanism at the hardware
level for creating and using one-dimensional arrays.
● Programming languages need only provide appropriate syntax to make
use of a 1-D array.
● Multi-dimensional arrays are not handled at the hardware level.
● Instead, the programming language typically provides its own
mechanism for creating and managing multi-dimensional arrays.
Array Storage
● A one-dimensional array is commonly used to physically store arrays
of higher dimensions.
● Consider a two-dimensional array divided into a table of rows and
columns as illustrated in the following Figure.
● In column-major order, the 2-D array is stored sequentially, one entire
column at a time, as illustrated in the given Figure.
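The index arithmetic behind the two orderings can be sketched as follows, assuming a 2-D array with nrows rows and ncols columns stored in a single 1-D array; the function names are illustrative.

def offset_row_major(i, j, ncols):
    # Row-major: all of row 0, then all of row 1, and so on.
    return i * ncols + j

def offset_col_major(i, j, nrows):
    # Column-major: all of column 0, then all of column 1, and so on.
    return j * nrows + i

# Example: element (2, 1) of a 3 x 4 array is at offset 2*4 + 1 = 9 in
# row-major order and at offset 1*3 + 2 = 5 in column-major order.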
Dimensionality and Lengths
➢ In the multi-dimensional version of the array, each dimension of the array has an
associated size.
➢ The size of the requested dimension is then
returned using the appropriate value from the
_dims tuple.
➢ The _numDims() method returns the
dimensionality of the array, which can be
obtained from the number of elements in the
_dims tuple.
3.4 APPLICATION: SALES REPORTS
Scenario:
● LazyMart, Inc. is a small regional chain of department stores with
locations in several different cities and states.
● The company maintains a collection of sales records for the various
items sold and would like to generate several different types of reports
from this data.
● One such report, for example, is the yearly sales by store, as illustrated
in Figure :
Total Sales by Store
➢ With the data loaded from the file and stored in a 3-D array, we can
produce many different types of reports or extract various information
from the sales data.
➢ For example, suppose we want to determine the
total sales for a given store, which includes the
sales figures of all items sold in that store for all
12 months.
➢ Assuming our view of the data as a collection of
spreadsheets, this requires traversing over every
element in the spreadsheet containing the data
for the given store.
➢ If store equals 1, this is equivalent to processing
every element in the spreadsheet.
➢ Two nested loops are required since we must
sum the values from each row and column
contained in the given store spreadsheet.
➢ The number of rows (dimension number 2) and
columns (dimension number 3) can be obtained
using the length() array method.
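A sketch of this computation is shown below; it assumes the sales data is held in a 3-D MultiArray indexed as salesData[store, month, item], with dimension 1 the store, dimension 2 the rows, and dimension 3 the columns as described above. The index conventions and names are illustrative.

def totalSalesByStore(salesData, store):
    total = 0.0
    # Sum every row and column of the spreadsheet for the given store.
    for row in range(salesData.length(2)):
        for col in range(salesData.length(3)):
            total += salesData[store, row, col]
    return total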
Total Sales by Item
➢ Another value that we can compute from the sales data in the 3-D array
is the total sales for a given item, which includes the sales figures for
all 12 months and from all 8 stores.
3.5 EXERCISE
Answer the following:
1. Complete the Set ADT by implementing intersect() and difference().
2. Modify the Set() constructor to accept an optional variable argument to
which a collection of initial values can be passed to initialize the set.
The prototype for the new constructor should look as follows:
def Set( self, *initElements = None )
It can then be used as shown here to create a set initialized with the
given values:
s = Set( 150, 75, 23, 86, 49 )
3. Add a new operation to the Set ADT to test for a proper subset. Given
two sets, A and B, A is a proper subset of B, if A is a subset of B and
A does not equal B.
Reference:
Data Structure and algorithm Using Python, Rance D. Necaise, 2016
Wiley India Edition
4
ALGORITHM ANALYSIS
Unit Structure
4.0 Objective
4.1 Algorithm Analysis
4.1.1 The Need for Analysis
4.2 Complexity Analysis
4.2.1 Big-O Notation
4.3 Evaluating Python Code
4.4 Evaluating Python List
4.5 Amortized Cost
4.6 Evaluating Set ADT
4.7 Exercise
4.0 OBJECTIVE
● In this section we are going to learn about Algorithm analysis.
● In this chapter, we will discuss the need for analysis of algorithms
and how to choose a better algorithm for a particular problem as one
computational problem can be solved by different algorithms.
● By considering an algorithm for a specific problem, we can begin to
develop pattern recognition so that similar types of problems can be
solved by the help of this algorithm.
● In this chapter, we will also discuss the analysis of the algorithm
using Big – O asymptotic notation in complete detail.
● We can implement the solution by constructing a computer program,
using a given programming language.
● We then execute the program and time it using a wall clock or the
computer’s internal clock.
● The execution time is dependent on several factors.
● First, the amount of data that must be processed directly affects the
execution time.
● As the data set size increases, so does the execution time.
● Second, the execution times can vary depending on the type of
hardware and the time of day a computer is used.
● If we use a multi-process, multi-user system to execute the program,
the execution of other programs on the same machine can directly
affect the execution time of our program.
● Finally, the choice of programming language and compiler used to
implement an algorithm can also influence the execution time.
● Some compilers are better optimizers than others and some languages
produce better optimized code than others.
● Thus, we need a method to analyze an algorithm’s efficiency
independent of the implementation details.
○ Worst case − The maximum number of steps taken on any
instance of size a.
○ Best case − The minimum number of steps taken on any
instance of size a.
○ Average case − An average number of steps taken on any
instance of size a.
○ Amortized − A sequence of operations applied to the input of
size a averaged over time.
● To solve a problem, we need to consider time as well as space
complexity as the program may run on a system where memory is
limited but adequate space is available or may be vice-versa.
● Reasons for analyzing algorithms:
1. To predict the resources that the algorithm require
○ Computational time (CPU consumption).
○ Memory space (RAM consumption).
○ Communication bandwidth consumption.
2. To predict the running time of an algorithm
○ Total number of primitive operations executed.
● In this version, the inner loop is again executed n times, but this
time, it only contains one addition operation.
● That gives a total of n additions for each iteration of the outer loop,
but the outer loop now contains an addition operator of its own.
● To calculate the total number of additions for this version, we take
the n additions performed by the inner loop and add one for the
addition performed at the bottom of the outer loop.
● This gives n + 1 additions for each iteration of the outer loop, which
is performed n times for a total of n² + n additions.
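The code being analyzed is not reproduced above; the sketch below shows the kind of nested-loop code the counting argument refers to, with illustrative names, so the n² + n figure can be traced directly.

def ex3(n):
    total = 0
    for i in range(n):
        rowSum = 0
        for j in range(n):
            rowSum += 1        # one addition per inner iteration: n per outer pass
        total += rowSum        # one more addition at the bottom of the outer loop
    return total               # (n + 1) additions per outer pass, n passes: n² + n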
Table: Growth rate comparisons for different input sizes (also shown
graphically in the figure below).
➔ A linear-time function/method is “order N”: O(N)
➔ A quadratic-time function/method is “order N squared”: O(N²)
Definition:
Let g and f be functions from the set of natural numbers to itself.
The function f is said to be O(g) (read big-oh of g), if there is a constant
c > 0 and a natural number n0 such that f(n) ≤ c·g(n) for all n ≥ n0.
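● Example: the count of n² + n additions obtained earlier is O(n²); taking
c = 2 and n0 = 1 gives n² + n ≤ 2n² for all n ≥ 1.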
● Thus algorithms can be rated by their computational complexity according
to the order of performance: O(1) < O(log n) < O(n) < O(n log n) <
O(n²) < O(n³) < O(2ⁿ).
4.3 EVALUATING PYTHON CODE
● As indicated earlier, when evaluating the time complexity of an
algorithm or code segment, we assume that basic operations only
require constant time.
● The basic operations include statements and function calls whose
execution time does not depend on the specific values of the data that
is used or manipulated by the given instruction.
● For example, the assignment statement
x=5
is a basic instruction since the time required to assign a reference
to the given variable is independent of the value or type of object
specified on the right hand side of the = sign.
● The evaluation of arithmetic and logical expressions
y=x
z=x+y*6
done = x > 0 and x < 100
● are basic instructions, again since they require the same number of
steps to perform the given operations regardless of the values of their
operands.
● The subscript operator, when used with Python’s sequence types
(strings, tuples, and lists) is also a basic instruction.
List Traversal
● A sequence traversal accesses the individual items, one after the
other, in order to perform some operation on every item.
● Python provides the built-in iteration for the list structure, which
accesses the items in sequential order starting with the first item.
● Consider the following code segment, which iterates over and
computes the sum of the integer values in a list:
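The code segment itself is missing here; a minimal sketch of the traversal described, with illustrative sample data, is:

valueList = [3, 8, 2, 5]
total = 0
for value in valueList:
    total += value          # one addition per item, so the traversal is O(n)
print(total)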
Appending to a List
● The append() operation adds a new item to the end of the sequence.
● If the underlying array used to implement the list has available
capacity to add the new item, the operation has a best case time of
O(1) since it only requires a single element access.
● Creating the new larger array and destroying the old array can each be
done in O(1) time.
● To copy the contents of the old array to the new larger array, the
items have to be copied element by element, which requires O(n)
time.
● Combining the times from the three steps yields a time of
T(n) = 1 + 1 + n and
a worst case time of O(n).
Extending a List
● The extend() operation adds the entire contents of a source list to the
end of the destination list.
● This operation involves two lists, each of which have their own
collection of items that may be of different lengths.
● To simplify the analysis, however, we can assume both lists contain n
items.
● When the destination list has sufficient capacity to store the new
items, the entire contents of the source list can be copied in O(n) time.
● But if there is not sufficient capacity, the underlying array of the
destination list has to be expanded to make room for the new items.
● This expansion requires O(n) time since there are currently n items in
the destination list.
● After the expansion, the n items in the source list are copied to the
expanded array, which also requires O(n) time.
● Thus, in the worst case the extend operation requires T(n) = n + n =
2n or O(n) time.
The list data type has some more methods. Here are all of the methods of
list objects:
Method Description
list.count(x) ➔ Return the number of times x appears in
the list.
Aggregate Method
● The aggregate method is used to find the total cost.
● If we want to add a bunch of data, then we need to find the amortized
cost by this formula.
● For a sequence of n operations, the amortized cost per operation is the
total cost T(n) of the entire sequence divided by n, i.e., T(n)/n.
si represents the time required to store an item into the array, which occurs
when there is an available slot in the array or immediately after the array
has been expanded.
Storing an item into an array element is a constant time operation.
ei represents the time required to expand the array when it does not contain
available capacity to store the item.
Based on our assumptions related to the size of the array, an expansion
only occurs when i − 1 is a power of 2 and the time incurred is based on
the current size of the array (i − 1).
While every append operation entails a storage cost, relatively few require
an expansion cost.
Note that as the size of n increases, the distance between append
operations requiring an expansion also increases.
Based on the tabulated results in following Table , the total time required
to perform a sequence of 16 append operations on an initially empty list is
31, or just under 2n.
This results from a total storage cost (si) of 16 and a total expansion cost
(ei) of 15.
It can be shown that for any n, the sum of the storage and expansion costs,
si + ei , will never be more than T(n) = 2n.
Since there are relatively few expansion operations, the expansion cost can
be distributed across the sequence of operations, resulting in an amortized
cost of T(n) = 2n/n or O(1) for the append operation.
Table : Using the aggregate method to compute the total run time for
a sequence of 16 append operations.
4.6 EVALUATING SET ADT
● The set equality operation is also O(n²) since it calls isSubsetOf()
after determining the two sets are of equal size.
4.7 EXERCISE
Answer the following:
1. Arrange the following expressions from slowest to fastest growth rate.
3. Prove or show why the worst case time-complexity for the insert() and
remove() list operations is O(n).
4. Evaluate each of the following code segments and determine the O(·)
for the best and worst cases. Assume an input size of n.
5
APPLICATION OF SEARCHING
Unit Structure
5.0 Objective
5.1 Application Searching and Sorting
5.2 Searching
5.2.1 Linear Search
5.2.2 Binary Search
5.3 Implementation using Python
5.3.1 Linear Search using Python
5.3.2 Binary Search Iterative Method using Python
5.3.3 Binary Search Recursive Method using Python
5.4 Exercise
5.0 OBJECTIVE
In this chapter, we explore these important topics and study some of the
basic algorithms for use with sequence structures.
The searching problem will be discussed many times throughout the text
as it can be applied to collections stored using many different data
structures, not just sequences.
In this chapter we are going to be able to explain and implement
sequential search and binary search.
5.2 SEARCHING
● Searching is the process of selecting particular information from a
collection of data based on specific criteria.
● In this text, we restrict the term searching to refer to the process of
finding a specific item in a collection of data items.
● The search operation can be performed on many different data
structures.
● The sequence search, which is the focus in this chapter, involves
finding an item within a sequence using a search key to identify the
specific item.
● A key is a unique value used to identify the data elements of a
collection.
● In collections containing simple types such as integers or reals, the
values themselves are the keys.
● For collections of complex types, a specific data component has to be
identified as the key.
● In some instances, a key may consist of multiple components, which
is also known as a compound key.
● The use of the in operator makes our code simple and easy to read
but it hides the inner workings.
● Underneath, the in operator is implemented as a linear search.
● Consider the unsorted 1-D array of integer values shown in
Figure (a).
● The value of K, i.e., 41, is not matched with the first element of the
array. So, move to the next element. And follow the same process
until the respective element is found.
● To prime the loop, we assume the first value in the sequence is the
smallest and start the comparisons at the second item.
● Since the smallest value can occur anywhere in the sequence, we must
always perform a complete traversal, resulting in a worst case time of
O (n).
○ If a larger value is encountered, the loop terminates and False is
returned.
○ With the modification to the linear search algorithm, we have
produced a better version, but the time-complexity remains the
same.
● The reason is that the worst case occurs when the value is not in the
sequence and is larger than the last element.
● In this case, we must still traverse the entire sequence of n items.
● In performing this task, most people would not begin with the first
exam and flip through one at a time until the requested exam is found,
as would be done with a linear search.
● Instead, you would probably flip to the middle and determine if the
requested exam comes alphabetically before or after that one.
● Assuming Jessica’s paper follows alphabetically after the middle one,
you know it cannot possibly be in the top half of the stack.
● Instead, you would probably continue searching in a similar fashion by
splitting the remaining stack of exams in half to determine which
portion contains Jessica’s exam.
● This is an example of a divide and conquer strategy, which entails
dividing a larger problem into smaller parts and conquering the smaller
part.
Working of Binary Search
● Now, let's see the working of the Binary Search Algorithm.
● To understand the working of the Binary search algorithm, let's take a
sorted array. It will be easy to understand the working of Binary search
with an example.
● There are two methods to implement the binary search algorithm -
○ Iterative method
○ Recursive method
● The recursive method of binary search follows the divide and conquer
approach.
● Let the elements of array are -
1. Time Complexity: the best case is O(1) (the target is found at the middle
on the first comparison), while the average and worst cases are O(log n).
2. Space Complexity: O(1) for the iterative method and O(log n) for the
recursive method because of the call stack.
5.3 IMPLEMENTATION USING PYTHON
5.3.1 Linear Search using Python
# Linear Search in Python
def linearSearch(array, n, x):
    # Compare each element of the array with x.
    for i in range(n):
        if (array[i] == x):
            return i
    return -1

array = [2, 4, 0, 1, 9]
x = 1
n = len(array)
result = linearSearch(array, n, x)
if (result == -1):
    print("Element not found")
else:
    print("Element found at index:", result)

Output
Element found at index: 3
5.3.2 Binary Search (Iterative Method)
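The iterative listing itself is missing here; a minimal sketch, assuming the same sample array and print format as the recursive version in Section 5.3.3, is:

# Binary Search in Python (iterative)
def binarySearch(array, x, low, high):
    while low <= high:
        mid = low + (high - low) // 2
        if array[mid] == x:
            return mid
        elif array[mid] < x:
            low = mid + 1        # search the right half
        else:
            high = mid - 1       # search the left half
    return -1

array = [3, 4, 5, 6, 7, 8, 9]
x = 4
result = binarySearch(array, x, 0, len(array) - 1)
if result != -1:
    print("Element is present at index " + str(result))
else:
    print("Not found")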
Output
Element is present at index 1
5.3.3 Binary Search (Recursive Method)
# Binary Search in python
def binarySearch(array, x, low, high):
if high >= low:
mid = low + (high - low)//2
# If found at mid, then return it
if array[mid] == x:
return mid
# Search the left half
elif array[mid] > x:
return binarySearch(array, x, low, mid-1)
# Search the right half
else:
return binarySearch(array, x, mid + 1, high)
else:
return -1
array = [3, 4, 5, 6, 7, 8, 9]
x=4
result = binarySearch(array, x, 0, len(array)-1)
if result != -1:
print("Element is present at index " + str(result))
else:
print("Not found")
Output
Element is present at index 1
5.4 EXERCISE
1. What do you mean by Searching? Explain Sequential search and
Binary search with help of example.
2. What is searching?
3. What is Linear search?
4. Define Space Complexity
5. Define Time Complexity
6. What are asymptotic notations?
6
APPLICATION OF SORTING AND
WORKING WITH SORTED LISTS
Unit Structure
6.0 Objective
6.1 Sorting
6.1.1 Difference between Searching and Sorting Algorithms
6.1.2 Bubble Sort
6.1.3 Selection Sort
6.1.4 Insertion Sort
6.2 Working with Sorted Lists
6.2.1 Maintaining Sorted List
6.2.2 Merging Sorted Lists.
6.3 Implementation using Python
6.3.1 Bubble Sort
6.3.2 Selection Sort
6.3.3 Insertion Sort
6.4 Exercise
6.0 OBJECTIVE
● In this chapter we are going to be able to explain and understand the
difference between searching and sorting.
● Sorting refers to arranging data in a particular format.
● Sorting algorithm specifies the way to arrange data in a particular
order.
● Most common orders are in numerical or lexicographical order.
● In this chapter, we will discuss the Bubble sort Algorithm.
● A sorting algorithm is an algorithm that puts elements of a list in a
certain order.
● The most used orders are numerical order and lexicographical order.
● Efficient sorting is important for optimizing the use of other algorithms
that require sorted lists to work correctly and for producing human-readable
output.
6.1 SORTING
● Sorting is the process of arranging or ordering a collection of items
such that each item and its successor satisfy a prescribed relationship.
● The items can be simple values, such as integers and reals, or more
complex types, such as student records or dictionary entries.
● In either case, the ordering of the items is based on the value of a sort
key.
● The key is the value itself when sorting simple types or it can be a
specific component or a combination of components when sorting
complex types.
● We encounter many examples of sorting in everyday life.
● Consider the listings of a phone book, the definitions in a dictionary,
or the terms in an index, all of which are organized in alphabetical
order to make finding an entry much easier.
● As we saw earlier in the chapter, the efficiency of some applications
can be improved when working with sorted lists.
● Another common use of sorting is for the presentation of data in some
organized fashion.
● For example, we may want to sort a class roster by student name, sort
a list of cities by zip code or population, rank order SAT scores, or list
entries on a bank statement by date.
● Sorting is one of the most studied problems in computer science and
extensive research has been done in this area, resulting in many
different algorithms.
● While Python provides a sort() method for sorting a list, it cannot be
used with an array or other data structures.
● In addition, exploring the techniques used by some of the sorting
algorithms for improving the efficiency of the sort problem may
provide ideas that can be used with other types of problems.
● In this section, we present three basic sorting algorithms, all of which
can be applied to data stored in a mutable sequence such as an array or
list.
Sorting algorithms are often classified by :
● Computational complexity (worst, average and best case) in terms of
the size of the list (N).
For typical sorting algorithms, good behaviour is O(N log N), worst-case
behaviour is O(N²), and ideal behaviour is O(N).
● Memory Utilization
● Stability - Maintaining relative order of records with equal keys.
● Number of comparisons.
● Methods applied like Insertion, exchange, selection, merging etc.
● Sorting is a process of linear ordering of lists of objects.
● Sorting techniques are categorized into
○ Internal Sorting
○ External Sorting
● Internal Sorting takes place in the main memory of a computer.
○ Example: - Bubble sort, Insertion sort, Shell sort, Quick sort, Heap
sort, etc.
● External Sorting, takes place in the secondary memory of a computer,
Since the number of objects to be sorted is too large to fit in main
memory.
○ Example: - Merge Sort, Multiway Merge, Polyphase merge.
4. There are no stable and unstable searching algorithms, whereas Bubble
Sort, Insertion Sort, Merge Sort, etc. are stable sorting algorithms and
Quick Sort, Heap Sort, etc. are unstable sorting algorithms.
5. Linear Search and Binary Search are examples of searching algorithms,
whereas Bubble Sort, Insertion Sort, Selection Sort, Merge Sort, Quick
Sort, etc. are examples of sorting algorithms.
6.1.2 Bubble Sort
Algorithm:
1. begin BubbleSort(arr)
2. for all array elements
3. if arr[i] > arr[i+1]
4. swap(arr[i], arr[i+1])
5. end if
6. end for
7. return arr
8. end BubbleSort
Working of Bubble Sort Algorithm
● To understand the working of bubble sort algorithm, let's take an
unsorted array.
● We are taking a short and simple array, as we know the complexity
of bubble sort is O(n²).
● Let the elements of array are -
● First Pass
○ Sorting will start from the initial two elements. Let us compare
them to check which is greater.
● Fourth pass
○ Similarly, after the fourth iteration, the array will be -
● Stable: Yes
● The space complexity of bubble sort is O(1). It is because, in bubble
sort, an extra variable is required for swapping.
● The optimized bubble sort requires two extra variables, which is still a
constant amount of space, i.e., O(1).
Optimized Bubble Sort Algorithm
● In the bubble sort algorithm, comparisons are made even when the
array is already sorted. Because of that, the execution time increases.
● To solve it, we can use an extra variable swapped. It is set to true if
swapping requires; otherwise, it is set to false.
● It will be helpful, as suppose after an iteration, if there is no swapping
required, the value of variable swapped will be false.
● It means that the elements are already sorted, and no further iterations
are required.
● This method will reduce the execution time and also optimizes the
bubble sort.
1. bubbleSort(array)
2. n = length(array)
3. repeat
4. swapped = false
5. for i = 1 to n - 1
6. if array[i - 1] > array[i], then
7. swap(array[i - 1], array[i])
8. swapped = true
9. end if
10. end for
11. n = n - 1
12. until not swapped
13. end bubbleSort
6.1.3 Selection Sort
Algorithm:
1. SELECTION SORT(arr, n)
2.
3.Step 1: Repeat Steps 2 and 3 for i = 0 to n-1
4.Step 2: CALL SMALLEST(arr, i, n, pos)
5.Step 3: SWAP arr[i] with arr[pos]
6.[END OF LOOP]
7.Step 4: EXIT
8.
9.SMALLEST (arr, i, n, pos)
10. Step 1: [INITIALIZE] SET SMALL = arr[i]
11. Step 2: [INITIALIZE] SET pos = i
12. Step 3: Repeat for j = i+1 to n
13. if (SMALL > arr[j])
14. SET SMALL = arr[j]
15. SET pos = j
16. [END OF if]
17. [END OF LOOP]
18. Step 4: RETURN pos
Working of Selection Sort Algorithm
● Now, let's see the working of the Selection sort Algorithm.
● To understand the working of the Selection sort algorithm, let's take an
unsorted array. It will be easier to understand the Selection sort via an
example.
● Let the elements of array are -
○ Now, for the first position in the sorted array, the entire array is to be
scanned sequentially.
○ At present, 12 is stored at the first position, after searching the entire
array, it is found that 8 is the smallest value.
○ So, swap 12 with 8. After the first iteration, 8 will appear at the first
position in the sorted array.
● Now, swap 29 with 12. After the second iteration, 12 will appear at the
second position in the sorted array. So, after two iterations, the two
smallest values are placed at the beginning in a sorted way.
● The same process is applied to the rest of the array elements. Now, we
are showing a pictorial representation of the entire sorting process.
1. Time Complexity
● The best, average, and worst case time complexity of selection sort is
O(n²), since the nested scan for the smallest value is performed regardless
of the input order.
2. Space Complexity
● Stability: the standard swap-based selection sort is not stable.
● The space complexity of selection sort is O(1). It is because, in
selection sort, an extra variable is required for swapping.
6.1.4 Insertion Sort
Algorithm:
The simple steps of achieving the insertion sort are listed as follows:
Step 1 - If the element is the first element, assume that it is already sorted.
Step 2 - Pick the next element and store it separately as a key.
Step 3 - Compare the key with the elements in the sorted sub-array.
Step 4 - Shift the elements of the sorted sub-array that are greater than the
key one position to the right.
Step 5 - Insert the key at its correct position.
Step 6 - Repeat until the array is sorted.
● Here, 31 is greater than 12. That means both elements are already in
ascending order. So, for now, 12 is stored in a sorted sub-array.
102
● Here, 25 is smaller than 31. So, 31 is not at the correct position. Now,
swap 31 with 25. Along with swapping, insertion sort will also check it
with all elements in the sorted array.
● For now, the sorted array has only one element, i.e. 12. So, 25 is
greater than 12. Hence, the sorted array remains sorted after swapping.
● Now, two elements in the sorted array are 12 and 25. Move forward to
the next elements that are 31 and 8.
● Now, the sorted array has three items that are 8, 12 and 25. Move to
the next items that are 31 and 32.
● Hence, they are already sorted. Now, the sorted array includes 8, 12,
25 and 31.
1. Time Complexity
● Best Case Complexity - It occurs when no sorting is required, i.e. the
array is already sorted. The best-case time complexity of
insertion sort is O(n).
● Average Case Complexity - It occurs when the array elements are in
jumbled order that is not properly ascending and not properly
descending. The average case time complexity of insertion sort is
O(n²).
● Worst Case Complexity - It occurs when the array elements are
required to be sorted in reverse order. That means suppose you have to
sort the array elements in ascending order, but its elements are in
descending order. The worst-case time complexity of insertion sort is
O(n²).
2. Space Complexity
● The space complexity of insertion sort is O(1), because only a constant
amount of extra space (the key being inserted) is required.
● Stable: Yes
6.2 WORKING WITH SORTED LISTS
● Note that while the sorting algorithms from the previous section all
require O(n²) time in the worst case, there are more efficient sorting
algorithms that only require O(n log n) time.
6.2.1 Maintaining a Sorted List
Figure: Finding the location of a target value using the binary search.
● It shows the changes to the three variables low, mid, and high as the
binary search algorithm progresses in searching for value 25.
● The while loop terminates when either the low or high range variable
crosses the other, resulting in the condition low > high.
● Upon termination of the loop, the low variable will contain the
position where the new value should be placed.
● This index can then be supplied to the insert() method to insert the new
value into the list.
● The findOrderedPosition() function can also be used with lists
containing duplicate values, but there is no guarantee where the new
value will be placed in relation to the other duplicate values beyond
the proper ordering requirement that they be adjacent.
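A sketch of the findOrderedPosition() function described above is given below; it is written from the description rather than copied from the book's listing.

def findOrderedPosition(value, theList):
    low = 0
    high = len(theList) - 1
    while low <= high:
        mid = (high + low) // 2
        if theList[mid] < value:
            low = mid + 1            # value belongs in the upper half
        else:
            high = mid - 1           # value belongs in the lower half (or here)
    return low                       # low is where the new value should be placed

# Usage: theList.insert(findOrderedPosition(x, theList), x)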
6.2.2 Merging Sorted Lists
Problem Solution
● This problem can be solved by simulating the action a person might
take to merge two stacks of exam papers, each of which are in
alphabetical order.
● Start by choosing the exam from the two stacks with the name that
comes first in alphabetical order.
● Flip it over on the table to start a new stack.
● Again, choose the exam from the top of the two stacks that comes
next in alphabetical order and flip it over and place it on top of first
one.
● Repeat this process until one of the two original stacks is exhausted.
● The exams in the remaining stack can be flipped over on top of the
new stack as they are already in alphabetical order and alphabetically
follow the last exam flipped onto the new stack.
● A similar approach can be used to merge two sorted lists.
● Consider the illustration in the following Figure:
● The process of merging the two lists begins by creating a new empty
list and initializing the two index variables to zero.
● A loop is used to repeat the process of selecting the next smallest value
to be added to the new merged list.
● During the iteration of the loop, the value at listA[a] is compared to
the value listB[b].
● The smaller of these two values is added or appended to the new list.
● If the two values are equal, the value from listB is chosen.
● As values are copied from the two original lists to the new merged
list, one of the two index variables a or b is incremented to indicate
the next smallest value in the corresponding list.
● This process is repeated until all of the values have been copied from
one of the two lists, which occurs when a equals the length of listA or
b equals the length of listB.
● Note that we could have created and initialized the new list with a
sufficient number of elements to store all of the items from both listA
and listB.
● While that works for this specific problem, we want to create a more
general solution that we can easily modify for similar problems where
the new list may not contain all of the items from the other two lists.
● After the first loop terminates, one of the two lists will be empty and
one will contain at least one additional value.
● All of the values remaining in that list must be copied to the new merged list.
● This is done by the next two while loops, but only one will be
executed depending on which list contains additional values.
● The position containing the next value to be copied is denoted by the
respective index variable a or b.
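A sketch of the merge operation just described, assuming two Python lists already sorted in ascending order (the function name mergeSortedLists follows the exercise below):

def mergeSortedLists(listA, listB):
    # Create the new list and initialize the two index variables.
    newList = []
    a = 0
    b = 0
    # Repeatedly select the next smallest value from the two lists.
    while a < len(listA) and b < len(listB):
        if listA[a] < listB[b]:
            newList.append(listA[a])
            a += 1
        else:                        # on ties the value from listB is chosen
            newList.append(listB[b])
            b += 1
    # Copy the remaining values from whichever list is not yet exhausted.
    while a < len(listA):
        newList.append(listA[a])
        a += 1
    while b < len(listB):
        newList.append(listB[b])
        b += 1
    return newList

print(mergeSortedLists([2, 8, 15, 23, 37], [4, 6, 15, 20]))
# prints [2, 4, 6, 8, 15, 15, 20, 23, 37]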
6.3 IMPLEMENTATION USING PYTHON
6.3.1 Bubble Sort
Program: Write a program to implement bubble sort in python.
Output
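The printed listing and its output appear as figures in the source; a minimal Python sketch (the function name bubbleSort is an assumption), with a small demonstration list, might look like this:

def bubbleSort(seq):
    n = len(seq)
    # After each pass the largest remaining value bubbles to the end.
    for i in range(n - 1):
        for j in range(n - 1 - i):
            if seq[j] > seq[j + 1]:
                seq[j], seq[j + 1] = seq[j + 1], seq[j]   # swap neighbours
    return seq

print(bubbleSort([80, 7, 24, 16, 43]))   # prints [7, 16, 24, 43, 80]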
6.3.2 Selection Sort
Program: Write a program to implement selection sort in python.
Output:
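Again the listing and output are figures in the source; a minimal selection sort sketch (function name selectionSort assumed):

def selectionSort(seq):
    n = len(seq)
    for i in range(n - 1):
        smallest = i
        # Find the smallest value in the unsorted part of the list.
        for j in range(i + 1, n):
            if seq[j] < seq[smallest]:
                smallest = j
        seq[i], seq[smallest] = seq[smallest], seq[i]     # move it into place
    return seq

print(selectionSort([80, 7, 24, 16, 43]))   # prints [7, 16, 24, 43, 80]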
6.3.3 Insertion Sort:
Program: Write a program to implement insertion sort in python.
Output:
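As with the previous programs, only the figures carry the listing; a minimal insertion sort sketch (function name insertionSort assumed):

def insertionSort(seq):
    for i in range(1, len(seq)):
        value = seq[i]
        pos = i
        # Shift larger values to the right to open a slot for value.
        while pos > 0 and value < seq[pos - 1]:
            seq[pos] = seq[pos - 1]
            pos -= 1
        seq[pos] = value
    return seq

print(insertionSort([80, 7, 24, 16, 43]))   # prints [7, 16, 24, 43, 80]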
6.4 EXERCISE
Answer the following:
1. In this chapter, we used a modified version of the mergeSortedLists() function to develop a linear time union() operation for our Set ADT implemented using a sorted list. Use a similar approach to implement new linear time versions of the isSubsetOf(), intersect(), and difference() methods.
2. Given the following list of keys (80, 7, 24, 16, 43, 91, 35, 2, 19, 72),
show the contents of the array after each iteration of the outer loop for
the indicated algorithm when sorting in ascending order.
(a) bubble sort (b) selection sort (c) insertion sort
3. Given the following list of keys (3, 18, 29, 32, 39, 44, 67, 75), show the contents of the array after each iteration of the outer loop for the indicated algorithm when sorting in ascending order.
(a) bubble sort (b) selection sort (c) insertion sort
4. Evaluate the insertion sort algorithm to determine the best case and the
worst case time complexities.
Unit II
7
LINKED STRUCTURES
Unit Structure
7.0 Objective
7.1 Introduction
7.2 Singly Linked List
7.2.1 Traversing.
7.2.2 Searching.
7.2.3 Prepending Nodes
7.2.4 Removing Nodes
7.3 Bag ADT-Linked List Implementation
7.4 Comparing Implementations
7.5 Linked List Iterators
7.6 More Ways to Build Linked Lists
7.7 Application-Polynomials
7.8 Summary
7.9 Reference for further reading
7.10 Unit End Exercises
7.0 OBJECTIVE
1. To understand the concept of linked list in Data Structures.
2. To understand the implementation of linked lists using python.
3. To learn different operations under the linked list.
4. To study the Bag ADT implementation.
5. To understand the application of linked lists.
7.1 INTRODUCTION
● A linked list can be defined as a collection of objects, called nodes, that are not necessarily stored contiguously in memory.
● The linked list is a data structure that provides an alternative to an array-based sequence (such as a Python list).
● Both array-based sequences and linked lists keep elements in a certain order, but using a very different style.
● An array provides a more centralized representation, with one large chunk of memory capable of accommodating references to many elements.
● A linked list, in contrast, relies on a more distributed representation in
which a lightweight object, known as a node, is allocated for each
element.
● Each node maintains a reference to its element and one or more references to neighboring nodes in order to collectively represent the linear order of the sequence.
● The Python list, which is also a sequence container, is an abstract sequence type implemented using an array structure. It provides more functionality than a plain array by supporting a larger set of operations, and it can automatically adjust in size as items are added or removed.
● The linked list is a general purpose structure that can be used to store a collection in linear order. The linked list improves on the construction and management of an array and Python list by requiring smaller memory allocations and no element shifts for insertions and deletions.
● There are several types of linked lists. The singly linked list is a linear structure in which traversals start at the front and progress, one element at a time, to the end. Other variations include the circularly linked, the doubly linked, and the circular doubly linked lists.
class ListNode :
    def __init__( self, data ) :
        self.data = data
● In a linked list we create several instances of this ListNode class, each storing data of our choosing. In the following example, we create three instances, each storing an integer value:
a = ListNode( 11 )
b = ListNode( 52 )
c = ListNode( 18 )
● These three statements create three variables and three objects:
Fig. 1
● Add a second data field, next, to the ListNode class:
class ListNode :
    def __init__( self, data ) :
        self.data = data
        self.next = None
● The three objects from the previous figure would now have a second data field, next, initialized with a null reference (None), as shown in the following Fig. 2:
Fig. 2
● Since the next field can contain a reference to any type of object, we can assign to it a reference to one of the other ListNode objects. For example, suppose we assign b to the next field of object a:
a.next = b
● which results in object a being linked to object b, as shown below.
Fig. 3
● And at the end, link object b to object c:
b.next = c
● resulting in a series of objects, as shown here.
Fig. 4
● Now remove the two external references b and c by assigning None to each, as shown below.
Fig. 5
● The final result is a linked list structure. The two objects previously
pointed to by b and c are still accessible via a. For example, suppose
we wanted to print the values of the three objects. We can access the
other two objects through the next field of the first object:
print(a.data )
print(a.next.data )
print(a.next.next.data )
● A linked structure contains a collection of objects called nodes, each
of which contains data and at least one reference or pointer or link to
another node. A linked list is a linked structure in which the nodes are
connected in order to form a linear list.
Fig. 6
● The figure above provides an example of a linked list consisting of 5 nodes. The last node in the list, commonly called the tail node, is indicated by a null link reference. Most nodes in the list have no name of their own and are simply referenced via the link field of the preceding node.
● The first node in the list, however, must be named or referenced by an external variable because it provides an entry point into the linked list.
● This variable is usually referred to as the head pointer, or head reference. A linked list can also be empty, which is indicated when the head reference is None (NULL).
7.2 SINGLY LINKED LIST
● A singly linked list, in its simplest form, is a collection of nodes that collectively form a linear sequence.
● Each node stores a reference to an object that is an element of the sequence, as well as a reference to the next node of the list.
● A node in the singly linked list consists of two parts: data part and link
part. Data part of the node stores actual information that is to be
represented by the node while the link part of the node stores the
address of its immediate successor.
7.2.1 Traversing.
● Traversing a linked list requires the initialization and adjustment of a
temporary external reference variable.
● Step by Step Linked List Traversing:
1. After initializing the temporary external reference.
Fig. 7
2. Advancing the external reference after printing value 2.
Fig. 8
3. Advancing the external reference after printing value 52.
Fig. 9
120
4. Advancing the external reference after printing value 18. Linked Structures
Fig. 10
5. Advancing the external reference after printing value 36.
Fig. 11
6. The external reference is set to None after printing value 13.
Fig. 12
Implementation of the linked list:
Example:
Code:
class Node:
    def __init__(self, datavalue1=None):
        self.datavalue1 = datavalue1
        self.nextvalue1 = None

class SLinkedList:
    def __init__(self):
        self.headvalue1 = None

    def listprint(self):
        printvalue1 = self.headvalue1
        while printvalue1 is not None:
            print(printvalue1.datavalue1)
            printvalue1 = printvalue1.nextvalue1

list = SLinkedList()
list.headvalue1 = Node("One")
e2 = Node("Two")
e3 = Node("Three")
# Link 1st Node to 2nd node
list.headvalue1.nextvalue1 = e2
# Link 2nd Node to 3rd node
e2.nextvalue1 = e3
list.listprint()
Output:
One
Two
Three
7.2.2 Searching
● A linear or sequential search operation can be carried out on a linked list. It is very nearly the same as the traversal operation. The only difference is that the loop can stop early if we find the target value within the list.
● Note the order of the two conditions in the while loop: it is important that we test for a null (None) currentNode reference before trying to examine the contents of the node.
● If the item is not found in the list, currentNode will be NULL when the
end of the list is reached.
● If we try to evaluate the data field of the NULL reference, an
exception will be raised, resulting in a run-time error.
● A NULL reference does not point to an object and thus there are no
fields or methods to be referenced.
● When we use the search operation for the linked list, we must make sure that it works with both empty and non-empty lists.
● In fact, we do not need a separate test to identify whether the list is empty. This is done automatically by checking the traversal variable in the loop condition. If the list were empty, currentNode would be set to None initially and the loop would never be entered.
● The linked list search operation requires O(n) time in the worst case, which occurs when the target item is not in the list.
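A sketch of the search operation just described, assuming the ListNode nodes introduced earlier and an external head reference (the function name unorderedSearch is an assumption):

def unorderedSearch(head, target):
    # Returns True if target is stored somewhere in the linked list.
    curNode = head
    # Test currentNode for None before examining its data field.
    while curNode is not None and curNode.data != target:
        curNode = curNode.next
    return curNode is not None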
def traversal( head ):
    currentNode = head
    while currentNode is not None :
        print( currentNode.data )
        currentNode = currentNode.next
7.2.3 Prepending Nodes
Fig. 13
Prepending a node to the linked list:
(a) The original list
(b) Link modifications required to prepend the node; and
(c) The result after prepending 96.
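A sketch of the prepend operation illustrated in the figure, assuming the ListNode class defined earlier and an external head reference (the function name prepend is an assumption):

def prepend(head, value):
    # Prepends value to the linked list and returns the new head
    # reference; this is an O(1) operation.
    newNode = ListNode(value)
    newNode.next = head      # link the new node to the old front node
    return newNode           # the new node becomes the head of the list

head = None
for value in (13, 36, 18, 52, 2):
    head = prepend(head, value)    # the list is now 2 -> 52 -> 18 -> 36 -> 13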
7.2.4 Removing Nodes
● An item is deleted from a linked list by removing or unlinking the node that contains it.
● Consider the linked list shown in Figure 14 and suppose we remove the node containing 18. First, we must find the node containing the target value and position an external reference variable pointing to it, as shown in Figure 14(a). After finding the node, it has to be unlinked from the list, which requires adjusting the link field of the node's predecessor to point to its successor, as shown in Figure 14(b). The node's link field is also freed by setting it to None.
Fig. 14
Removing a node from a linked list:
1. Finding the node that needs to be removed and assigning an external reference variable to it.
2. The link modification required to unlink and remove the node.
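A sketch of these removal steps, again assuming the ListNode class and a head reference (the function name remove is an assumption):

def remove(head, target):
    # Unlinks the first node containing target and returns the (possibly
    # new) head reference; the list is unchanged if target is not found.
    predNode = None
    curNode = head
    while curNode is not None and curNode.data != target:
        predNode = curNode                 # remember the predecessor
        curNode = curNode.next
    if curNode is not None:                # the target was found
        if curNode is head:
            head = curNode.next            # removing the first node
        else:
            predNode.next = curNode.next   # unlink the node
        curNode.next = None                # free the node's link field
    return head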
7.3 BAG ADT - LINKED LIST IMPLEMENTATION
● The linked list implementation of the Bag ADT begins with the constructor. The head field stores the head pointer of the linked list; this reference is initialized to None to represent an empty bag.
● The size field is used to keep track of the number of items stored in the bag, which is required by the len() method. This field is not strictly needed, but it saves us from having to traverse the list to count the nodes each time the length is required. We define only a head pointer as a data field in the object. Short-lived references, such as the currentNode reference used to traverse the list, are not defined as attributes, but instead as local variables within the individual methods as needed.
● A sample instance of the new Bag class is shown in Figure 15
Fig. 15
Sample instance of the Bag class
● The contains() method is a simple search of the linked list. The add() method simply implements the prepend operation, though we must also increment the item counter (size) as new items are added.
● The BagListNode class, used to represent the individual nodes, is also defined within the same module.
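The full listing appears in the figures; a condensed sketch of the linked-list Bag described above, using the BagListNode name from the text and private field names of my own choosing, might look like this:

class BagListNode:
    def __init__(self, item):
        self.item = item
        self.next = None

class Bag:
    def __init__(self):
        self._head = None        # head pointer of the linked list
        self._size = 0           # number of items in the bag

    def __len__(self):
        return self._size

    def __contains__(self, target):
        # Linear search over the nodes of the linked list.
        curNode = self._head
        while curNode is not None and curNode.item != target:
            curNode = curNode.next
        return curNode is not None

    def add(self, item):
        # Prepend the new node; O(1) regardless of the bag size.
        newNode = BagListNode(item)
        newNode.next = self._head
        self._head = newNode
        self._size += 1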
Fig. 16
7.5 LINKED LIST ITERATORS
● When iterating over a linked list, we need only keep track of the current node being processed, and thus we use a single data field, currentNode, in the iterator.
● The iterator advances through the linked list as the for loop iterates over the nodes.
● Figure 17 shows the Bag and BagIterator objects at the beginning of
the for loop.
● The currentNode pointer in the BagIterator object is used just like the
currentNode pointer we used when directly performing a linked list
traversal.
● The difference is that we do not include a while loop since Python
manages the iteration for us as part of the for loop.
● The iterator objects can be used with the singly linked list structure to traverse the nodes and return the data contained in each one.
Fig. 17
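A minimal sketch of the BagIterator just described, assuming the Bag and BagListNode classes sketched earlier:

class BagIterator:
    def __init__(self, head):
        self._curNode = head            # node currently being processed

    def __iter__(self):
        return self

    def __next__(self):
        if self._curNode is None:
            raise StopIteration         # end of the linked list
        item = self._curNode.item
        self._curNode = self._curNode.next
        return item

# Adding this method to the Bag class lets a for loop drive the traversal:
#     def __iter__(self):
#         return BagIterator(self._head)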
7.6 MORE WAYS TO BUILD LINKED LISTS
4. Instead of a single external head reference, we can use two external references, one for the head and one for the tail. Figure 18 shows a sample linked list with both a head and a tail reference.
Fig. 18 Sample linked list using both head and tail external references
7.7 APPLICATION-POLYNOMIALS
● Polynomials are arithmetic expressions specified in terms of variables and constants. A polynomial in one variable can be written in expanded form.
● The exponent of the variable in each term is called the degree of that term and is limited to whole numbers. For example, the polynomial
12x² - 3x + 7
consists of three terms:
○ The first term, 12x², is of degree 2 and has a coefficient of 12.
○ The second term, -3x, is of degree 1 and has a coefficient of -3.
○ The last term, the constant 7, is of degree 0 with a coefficient of 7.
○ Polynomials can be characterized by degree (i.e., all second-
degree polynomials).
○ The degree of a polynomial is the largest single degree of its
terms.
○ The example polynomial above has a degree of 2 since the
degree of the first term has the largest degree.
● Design and implement an abstract data type to represent polynomials
in one variable expressed in expanded form.
Polynomial Operations
A number of operations can be performed on polynomials. Let us start with the addition operation.
● Addition
Two polynomials can be added by summing the coefficients of corresponding terms of equal degree. The result is a third polynomial.
● Multiplication
The product of two polynomials is also a third polynomial. The new polynomial is found by summing the results of multiplying each term of the first polynomial by each term of the second.
● Evaluation
Evaluation is the easiest operation to perform on a polynomial. Polynomials can be evaluated by assigning a value to the variable, commonly called the unknown. By specifying a value for the variable, the expression can be computed, resulting in a real value. For example, if we assign the value 3 to the variable x in the example polynomial above, we get 12(3)² - 3(3) + 7 = 106.
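A small sketch of polynomial evaluation, assuming the polynomial is stored as a linked list of terms as this section suggests (the TermNode and evaluate names are assumptions):

class TermNode:
    def __init__(self, degree, coefficient):
        self.degree = degree
        self.coefficient = coefficient
        self.next = None

def evaluate(polyHead, x):
    # Sums coefficient * x**degree over every term in the linked list.
    result = 0.0
    curNode = polyHead
    while curNode is not None:
        result += curNode.coefficient * x ** curNode.degree
        curNode = curNode.next
    return result

# Build 12x^2 - 3x + 7 and evaluate it at x = 3, giving 106.0.
head = TermNode(2, 12)
head.next = TermNode(1, -3)
head.next.next = TermNode(0, 7)
print(evaluate(head, 3))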
7.8 SUMMARY
1. The linked list improves on the construction and management of an array and Python list by requiring smaller memory allocations and no element shifts for insertions and deletions.
2. A singly linked list, in its simplest form, is a collection of nodes that combine to form a linear sequence.
3. A bag is a simple container, like a shopping bag, that can be used to store a collection of items.
4. Either the Python list or the linked list can be used to store the elements of a bag. Both implementations give the same time-complexities for the various operations, with the exception of the add() method.
7.9 REFERENCE FOR FURTHER READING
8
STACKS
Unit Structure
8.0 Objective
8.1 Introduction
8.2 Stack ADT
8.3 Implementing Stacks
8.3.1 Using Python List
8.3.2 Using Linked List
8.4 Stack Applications
8.4.1 Balanced Delimiters
8.4.2 Evaluating Postfix Expressions
8.5 Summary
8.6 Reference for further reading
8.7 Unit End Exercises
8.0 OBJECTIVE
1. To understand the concept of Stacks in Data Structures.
2. To understand the implementation of stack using python.
3. To learn different operations under the stack.
4. To study the Stack ADT implementation.
5. To understand the applications of stacks.
8.1 INTRODUCTION
● In this chapter we use the Python list and linked list structures to implement a different container ADT.
● The stack is a type of container with restricted access that stores a linear collection.
● Stacks are very common in computer science and are utilized in many types of problems.
● Stacks also occur in our everyday lives.
● Consider a stack of trays. When a tray is taken away from the top, the others shift up. If trays are placed onto the stack, the others are pushed down.
8.2 STACK ADT
● A stack is used to store data such that the last item inserted is the first item removed.
● It implements a last-in first-out (LIFO) type of data structure.
● The stack is a linear data structure in which new items are added, and existing items are removed, from the same end, commonly called the top of the stack.
● The opposite end is known as the base of the stack.
● The example shown in Figure 1 illustrates new values being added to the top of the stack and one value being removed from the top.
8.3 IMPLEMENTING STACKS
8.3.1 Using Python List
class Stack :
    # Creates an empty stack backed by a Python list.
    def __init__( self ):
        self._theItems = list()

    # Returns True if the stack is empty or False otherwise.
    def isEmpty( self ):
        return len( self ) == 0

    # Returns the number of items in the stack.
    def __len__( self ):
        return len( self._theItems )

    # Returns the top item on the stack without removing it.
    def peek( self ):
        assert not self.isEmpty(), "Cannot peek at an empty stack"
        return self._theItems[-1]

    # Removes and returns the top item on the stack.
    def pop( self ):
        assert not self.isEmpty(), "Cannot pop from an empty stack"
        return self._theItems.pop()

    # Push an item onto the top of the stack.
    def push( self, item ):
        self._theItems.append( item )
● The peek() and pop() operations can only be used with a non-empty stack.
● To enforce this requirement, we first assert that the stack is not empty before performing the given operation.
● The peek() method directly returns a reference to the last item in the
list.
● To implement the pop() method, call the pop() method of the list
structure, which actually performs the same operation that we are
striving to implement. This will save a copy of the last item in the list,
delete the item from the list, and then return the saved item.
● The push() method simply inserts new items to the end of the list since
that represents the top of our stack.
● The individual stack operations are easy to evaluate for the Python list-based implementation. isEmpty(), __len__(), and peek() only require O(1) time.
● The pop() and push() methods both require O(n) time in the worst case; when used in sequence, both operations have an amortized cost of O(1).
8.3.2 Using Linked List
● The Python list-based implementation may not be the best choice for stacks with a huge number of push and pop operations.
● Keep in mind that each append() and pop() list operation may require a reallocation of the underlying array used to implement the list.
● A singly linked list can be used to implement the Stack ADT, eliminating the concern over array reallocations.
● The Stack ADT implemented using a linked list is shown below:
● The class constructor creates two instance variables for each Stack.
● The top field is the head reference used to maintain the linked list, while size is an integer value for keeping track of the number of items on the stack.
● The latter has to be adjusted when items are pushed onto or popped off the stack.
● Figure 2 shows a sample Stack object for the stack from Figure 1(b).
● The StackNode class is used to create the linked list nodes.
● Note the inclusion of the link argument in the constructor, which is used to initialize the next field of the new node.
● By including this argument, we can simplify the prepend operation of the push() method.
● The two steps required to prepend a node to a linked list are combined by passing the top reference as the second argument of the StackNode() constructor.
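The full listing is given in the figures; a minimal sketch of the linked-list Stack just described (field names are assumptions) might look like this:

class StackNode:
    def __init__(self, item, link):
        self.item = item
        self.next = link          # the link argument simplifies push()

class Stack:
    def __init__(self):
        self._top = None          # head reference of the linked list
        self._size = 0

    def isEmpty(self):
        return self._top is None

    def __len__(self):
        return self._size

    def peek(self):
        assert not self.isEmpty(), "Cannot peek at an empty stack"
        return self._top.item

    def pop(self):
        assert not self.isEmpty(), "Cannot pop from an empty stack"
        node = self._top
        self._top = self._top.next
        self._size -= 1
        return node.item

    def push(self, item):
        self._top = StackNode(item, self._top)   # prepend in one step
        self._size += 1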
8.4 STACK APPLICATIONS
8.4.1 Balanced Delimiters
Example:
Table 1 shows the steps performed by our algorithm and the contents of the stack after each delimiter is encountered in the given sample code.
○ The delimiters are balanced when there is an equal number of opening and closing delimiters, all properly matched.
○ If the delimiters are not balanced, we will encounter more opening or more closing delimiters than expected.
○ For example, suppose the programmer makes a typographical error in the function header:
int sumList( int theList[]], int size )
● In Table 2 the stack is empty, since the left parenthesis was popped and matched with the preceding right parenthesis.
● Similarly, unbalanced delimiters in which there are more closing delimiters than opening ones can be detected when we try to pop from the stack and find that it is empty.
def isValidSource( srcfile ):
    s = Stack()
    for line in srcfile :
        for token in line :
            if token in "{[(" :
                s.push( token )
            elif token in "}])" :
                if s.isEmpty() :
                    return False
                else :
                    left = s.pop()
                    if (token == "}" and left != "{") or \
                       (token == "]" and left != "[") or \
                       (token == ")" and left != "(") :
                        return False
    return s.isEmpty()
A similar algorithm can be used for converting from infix to prefix notation.
8.4.2 Evaluating Postfix Expressions
● Assume we are given a valid postfix expression stored in a string consisting of operators and single-letter variables. We can evaluate the expression by scanning the string, one character or token at a time. For each token, we perform the following steps:
1. If the current item is an operand, push its value onto the stack.
2. If the current item is an operator:
(a) Pop the top two operands of the stack.
(b) Perform the operation.
(Note the top value is the right operand while the next to the top value
is the left operand.)
(c) Push the result of this operation back onto the stack.
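A sketch of the evaluation steps listed above, using the Stack class from earlier in the chapter; as an adaptation, single decimal digits are used in place of single-letter variables so the expression can be evaluated numerically:

def evalPostfix(expression):
    s = Stack()
    for token in expression:
        if token.isdigit():
            s.push(int(token))               # step 1: push operands
        elif token in "+-*/":
            right = s.pop()                  # step 2(a): the top value is
            left = s.pop()                   #            the right operand
            if token == '+':
                result = left + right
            elif token == '-':
                result = left - right
            elif token == '*':
                result = left * right
            else:
                result = left / right
            s.push(result)                   # step 2(c): push the result
    return s.pop()

print(evalPostfix("45*2+"))                  # prints 22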
● Consider the following invalid expression in which there are more operands than available operators:
AB*CD+
● A problem also occurs if there are too many operators for the given number of operands. Consider the invalid expression:
AB*+C/
● In this case, there are too few operands on the stack when we encounter the addition operator, as shown in Table 4.
8.5 SUMMARY
1. A stack is used to store data such that the last item inserted is the first item removed. It is used to implement a last-in first-out (LIFO) protocol.
2. A stack is a data structure that stores a linear collection of items with access limited to a last-in first-out order. Adding and removing items is restricted to one end known as the top of the stack. An empty stack is one containing no items.
3. The two most common implementation approaches in Python are the use of a Python list and a linked list.
4. The function isValidSource() accepts a file object, which we assume was previously opened and contains C++ source code.
5. Evaluating a postfix expression requires the use of a stack to store the operands or variables at the beginning of the expression until they are needed.
9
LINKED LIST
Unit Structure
9.0 Objectives
9.1 Advanced Linked Lists
9.1.1 Doubly linked list
9.1.1.1 Memory Representation of a doubly linked list
9.1.1.2 Operations on doubly linked list
9.1.2 Circular Singly Linked List
9.1.2.1 Memory Representation of circular linked list
9.1.2.2 Operations on Circular Singly linked list
9.1.3 Multi-Linked Lists
9.1.3.1 Example 1: Multiple Orders Of One Set Of Elements
9.1.3.2 Example 2: Sparse Matrices
9.2 Points to Remember
9.3 References
9.0 OBJECTIVES
This chapter will make the readers understand the following concepts:
Doubly Linked List
Operations on doubly linked list
Circular Linked List
Operations on circular linked list
Multilinked list
Examples
9.1 ADVANCED LINKED LISTS
9.1.1 Doubly Linked List
Doubly linked list is a complex type of linked list in which a node contains a pointer to the previous as well as the next node in the sequence.
Therefore, in a doubly linked list, a node consists of three parts: node data,
pointer to the next node in sequence (next pointer), pointer to the previous
node (previous pointer). A sample node in a doubly linked list is shown in
the figure.
2. Insertion at end - Adding a node at the end of the linked list.
3. Insertion after specified node - Adding a node into the linked list after the specified node.
5. Deletion at the end - Removing the node from the end of the list.
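A minimal sketch of a doubly linked node and insertion at the end, using hypothetical DNode and DoublyLinkedList names:

class DNode:
    def __init__(self, data):
        self.data = data
        self.prev = None     # pointer to the previous node
        self.next = None     # pointer to the next node

class DoublyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def insertAtEnd(self, data):
        node = DNode(data)
        if self.head is None:          # the list is empty
            self.head = self.tail = node
        else:
            node.prev = self.tail      # link the new node behind the tail
            self.tail.next = node
            self.tail = node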
9.1.2 Circular Singly Linked List
In a circular singly linked list, the last node of the list contains a pointer to the first node of the list. We can have a circular singly linked list as well as a circular doubly linked list.
We traverse a circular singly linked list until we reach the same node where we started. The circular singly linked list has no beginning and no ending. There is no null value present in the next part of any of the nodes.
The following image shows a circular singly linked list.
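A sketch of traversing a circular singly linked list: the loop stops once it comes back to the node where it started. The node class and function name below are assumptions; the nodes simply carry data and next fields as in Chapter 7.

class _Node:
    def __init__(self, data):
        self.data = data
        self.next = None

def traverseCircular(last):
    # last refers to the last node; last.next is the first node.
    if last is None:
        return
    curNode = last.next
    while True:
        print(curNode.data)
        curNode = curNode.next
        if curNode is last.next:       # back at the starting node
            break

# Build a small circular list 1 -> 2 -> 3 -> (back to 1) and traverse it.
a, b, c = _Node(1), _Node(2), _Node(3)
a.next, b.next, c.next = b, c, a
traverseCircular(c)                    # prints 1, 2, 3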
9.1.2.2 Operations on Circular Singly Linked List
Insertion
9.1.3 Multi-Linked Lists
A multilinked list is a more general linked list with multiple links from each node.
A multi-list has more than one next pointer, like a doubly linked list, but the pointers create separate lists.
A doubly linked list or multi-list is a data structure with multiple pointers in each node.
9.2 POINTS TO REMEMBER
We traverse a circular singly linked list until we reach the same node where we started.
Every item in a priority queue has a priority associated with it.
If two elements have the same priority, they are served according to their order in the queue.
In the linked queue, there are two pointers maintained in memory, i.e. the front pointer and the rear pointer. The front pointer contains the address of the starting element of the queue while the rear pointer contains the address of the last element of the queue.
9.3 REFERENCES
Data Structures and Algorithms, Narasimha Karumanchi.
Data Structures and Algorithms using C, C++ - learnbay.com
GeeksforGeeks.com
Data Structures by Schaum Series
Introduction to Algorithms by Thomas H. Cormen
Introduction to Algorithm: A Creative Approach
10
QUEUES
Unit Structure
10.0 Objectives
10.1 The Queue Abstract Data Type introduction
10.1.1 Queue Representation
10.2 Basic Operations
10.2.1 Enqueue Operation
10.2.2 Dequeue Operation
10.3 Implementing Queue-Using Python List
10.3.1 Implementation using list
10.4 Circular Queue
10.4.1 Operations on Circular Queue:
10.4.2 Applications:
10.5 Linked Queue
10.5.1 Operation on Linked Queue
10.6 Priority Queue — Abstract Data Type
10.6.1 ADT — Interface
10.6.2 Implementation of priority queue
10.6.3 Bounded Priority Queue
10.6.4 Unbounded Priority Queues
10.6.5 Applications of Priority Queue:
10.8 Points to Remember
10.9 References
10.0 OBJECTIVES
This chapter will make the readers understand the following concepts:
Introduction to Queue
Basic operations on queues
Queues using Python List
Concept of circular Queues
Concept of Linked Queues
Priority Queues
Doubly Linked List
Circular Linked List
Multilinked list
Figure 1- Queue
As in stacks, a queue can also be implemented using Arrays, Linked-lists,
Pointers and Structures.
10.2 BASIC OPERATIONS
Sometimes, we also check to see if a queue is initialized or not, to handle any unforeseen situations.
Algorithm for enqueue operation
procedure enqueue(data)
    if queue is full
        return overflow
    endif
    rear ← rear + 1
    queue[rear] ← data
    return true
end procedure

int enqueue(int data)
{
    if (isfull())
        return 0;
    rear = rear + 1;
    queue[rear] = data;
    return 1;
}
procedure dequeue
    if queue is empty
        return underflow
    endif
    data = queue[front]
    front ← front + 1
    return true
end procedure

int dequeue()
{
    if (isempty())
        return 0;
    int data = queue[front];
    front = front + 1;
    return data;
}
10.3 IMPLEMENTING QUEUE-USING PYTHON LIST
There are various ways to implement a queue in Python. This can be done
in the following ways:
list
collections.deque
queue.Queue
10.3.1 Implementation using list
The list is Python's built-in data structure that can be used as a queue. Instead of enqueue() and dequeue(), the append() and pop() functions are used. However, lists are quite slow for this purpose because inserting or deleting an element at the beginning requires shifting all of the other elements by one, requiring O(n) time.
# Python program to demonstrate queue implementation using list
# Initializing a queue
queue = []
# Adding elements to the queue
queue.append('a')
queue.append('b')
queue.append('c')
print("Initial queue")
print(queue)
# Removing elements from the queue
print("\nElements dequeued from queue")
print(queue.pop(0))
print(queue.pop(0))
print(queue.pop(0))
print("\nQueue after removing elements")
print(queue)
# Uncommenting print(queue.pop(0))
# will raise an IndexError
# as the queue is now empty
Output:
Initial queue
['a', 'b', 'c']
Elements dequeued from queue
a
b
c
Queue after removing elements
[]
2. If it is full then display "Queue is full". If the queue is not full then check if (rear == SIZE - 1 && front != 0); if it is true then set rear = 0 and insert the element.
// Function to insert an element into the Circular Queue
void Queue::enQueue(int value)
{
    if ((front == 0 && rear == size-1) ||
        (rear == (front-1)%(size-1)))
    {
        printf("\nQueue is Full");
        return;
    }
    else if (front == -1) /* Insert First Element */
    {
        front = rear = 0;
        arr[rear] = value;
    }
    else if (rear == size-1 && front != 0)
    {
        rear = 0;
        arr[rear] = value;
    }
    else
    {
        rear++;
        arr[rear] = value;
    }
}
// Function to delete element from Circular Queue
int Queue::deQueue()
{
if (front == -1)
{printf("\nQueue is Empty");
return INT_MIN;
}
int data = arr[front];
arr[front] = -1;
if (front == rear)
{ front = -1;
rear = -1;
}
else if (front == size-1)
front = 0;
else
front++;
return data;
}
// Function displaying the elements
// of Circular Queue
void Queue::displayQueue()
{
if (front == -1)
{
printf("\nQueue is Empty");
return;
}
printf("\nElements in Circular Queue are: ");
if (rear >= front)
{
for (int i = front; i<= rear; i++)
printf("%d ",arr[i]);
} else
{ for (int i = front; i< size; i++)
printf("%d ", arr[i]);
for (int i = 0; i<= rear; i++)
printf("%d ", arr[i]);
}
}
/* Driver of the program */
int main()
{
Queue q(5);
// Inserting elements in Circular Queue
q.enQueue(14);
q.enQueue(22);
q.enQueue(13);
q.enQueue(-6);
// Display elements present in Circular Queue
q.displayQueue();
// Deleting elements from Circular Queue
printf("\nDeleted value = %d", q.deQueue());
printf("\nDeleted value = %d", q.deQueue());
q.displayQueue();
q.enQueue(9);
q.enQueue(20);
q.enQueue(5);
q.displayQueue();
q.enQueue(20);
return 0;
}
Output:
Elements in Circular Queue are: 14 22 13 -6
Deleted value = 14
Deleted value = 22
Elements in Circular Queue are: 13 -6
Elements in Circular Queue are: 13 -6 9 20 5
Queue is Full
Time Complexity: The time complexity of the enQueue() and deQueue() operations is O(1), as there is no loop in either operation.
10.4.2 Applications:
10.5 LINKED QUEUE
10.5.1 Operation on Linked Queue
Insert operation
The insert operation appends to the queue by adding an element at the end of the queue. The new element will be the last element of the queue.
Firstly, allocate memory for the new node ptr by using the following statement:
ptr = (struct node *) malloc (sizeof(struct node));
In the second case, the queue contains more than one element. The condition front == NULL becomes false. In this scenario, we need to update the rear pointer so that the next pointer of the current rear points to the new node ptr, then make rear point to the newly added node ptr, and finally make the next pointer of the new rear point to NULL.
rear -> next = ptr;
rear = ptr;
rear->next = NULL;
In this way, the element is inserted into the queue. The algorithm and the
C implementation is given as follows.
Algorithm
Step 3: IF FRONT = NULL
    SET FRONT = REAR = PTR
    SET FRONT -> NEXT = REAR -> NEXT = NULL
ELSE
    SET REAR -> NEXT = PTR
    SET REAR = PTR
    SET REAR -> NEXT = NULL
[END OF IF]
Step 4: END
Deletion
The deletion operation removes the element that was first inserted among all the queue elements. Firstly, we need to check whether the list is empty or not. The condition front == NULL becomes true if the list is empty; in this case, we simply write underflow on the console and exit.
Otherwise, we will delete the element that is pointed to by the pointer front. For this purpose, copy the node pointed to by the front pointer into the pointer ptr. Now, shift the front pointer to point to its next node and free the node pointed to by ptr. This is done using the following statements:
ptr = front;
front = front -> next;
free(ptr);
The algorithm and C function are given as follows.
Algorithm
Step 1: IF FRONT = NULL
    Write "Underflow"
    Go to Step 5
[END OF IF]
Step 5: END
10.6 PRIORITY QUEUE - ABSTRACT DATA TYPE
Every item in a priority queue has a priority associated with it. If two elements have the same priority, they are served according to their order in the queue.
A real-life example of a priority queue would be a hospital queue where the patient with the most critical condition would be the first in the queue. In this case, the priority order is the condition of each patient.
A typical priority queue supports the following operations.
Main operations
peek() -> Check the element on the top
isEmpty() -> Check if the queue is empty
10.6.3 Bounded Priority Queue
Because bounded priority queues require a comparator and a maximum size constraint, they do not comply with the recommendation in the Collection interface in that they neither implement a nullary constructor nor a constructor taking a single collection. Instead, they are constructed with a comparator and a maximum size bound.
10.6.4 Unbounded Priority Queues
An unbounded priority queue is based on a priority heap. The elements of the priority queue are ordered according to their natural ordering, or by a Comparator provided at queue construction time, depending on which constructor is used. A priority queue does not permit null elements. A priority queue relying on natural ordering also does not permit insertion of non-comparable objects (doing so may result in a ClassCastException).
The head of this queue is the least element with respect to the specified
ordering. If multiple elements are tied for least value, the head is one of
those elements -- ties are broken arbitrarily. The queue retrieval
operations poll, remove, peek, and element access the element at the head
of the queue.
A priority queue is unbounded, but has an internal capacity governing the
size of an array used to store the elements on the queue. It is always at
least as large as the queue size. As elements are added to a priority queue,
its capacity grows automatically. The details of the growth policy are not
specified.
This class and its iterator implement all of the optional methods of the Collection and Iterator interfaces.
The Iterator provided in method iterator() is not guaranteed to traverse the
elements of the priority queue in any particular order. If you need ordered
traversal, consider using Arrays.sort(pq.toArray()).
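The paragraphs above describe the behaviour of the Java PriorityQueue class; a hedged Python sketch of an unbounded priority queue built on the standard heapq module (the class and method names are assumptions) would be:

import heapq

class PriorityQueue:
    def __init__(self):
        self._heap = []
        self._count = 0          # tie-breaker preserving insertion order

    def isEmpty(self):
        return len(self._heap) == 0

    def enqueue(self, item, priority):
        # Lower priority numbers are served first; equal priorities are
        # served in the order they were added.
        heapq.heappush(self._heap, (priority, self._count, item))
        self._count += 1

    def dequeue(self):
        assert not self.isEmpty(), "Cannot dequeue from an empty queue"
        return heapq.heappop(self._heap)[2]

pq = PriorityQueue()
pq.enqueue("routine check-up", 3)
pq.enqueue("critical patient", 1)
print(pq.dequeue())              # prints "critical patient"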
We traverse a circular singly linked list until we reach the same node where we started.
Every item in a priority queue has a priority associated with it.
If two elements have the same priority, they are served according to their order in the queue.
10.8 REFERENCES
Data Structures and Algorithms, Narasimha Karumanchi.
Data Structures and Algorithms using C, C++ - learnbay.com
GeeksforGeeks.com
Data Structures by Schaum Series
Introduction to Algorithms by Thomas H. Cormen
Introduction to Algorithm: A Creative Approach
10.9 UNIT END EXERCISES
1. What is Queue ? Explain Bounded queue with different operations.
2. What is the meaning of ADT?
3. Write a short note on:
a. Linked Queue
b. Circular Queue
Unit III
11
RECURSION
Unit Structure:
11.0 Objective
11.1 Introduction
11.2 Types of recursion
11.2.1 Implementation
11.2.2 Analysis of Recursion
11.2.3 Time Complexity
11.2.4 Space Complexity
11.3 Properties of recursive functions
11.4 How does recursion work?
11.5 When is recursion used?
11.6 Practical Applications
11.7 Summary
11.8 Questions
11.0 OBJECTIVE
To study the concept of recursion.
11.1 INTRODUCTION
In simple words, recursion is a problem-solving technique, and in some cases a programming technique, that has a very special and exclusive property. In recursion, a function or method has the ability to call itself to solve the problem. The process of recursion involves solving a problem by turning it into smaller versions of itself.
The process in which a function calls itself could happen directly as well
as indirectly. This difference in call gives rise to different types of
recursion, which we will talk about a little later.
The concept of recursion is based on the idea that a problem can be solved more easily, and in less time, if it is represented as one or more smaller versions of itself. Adding base conditions to stop the recursion is another important part of using this technique to solve a problem.
2. Tail recursive
Tail recursion is a form of linear recursion. In tail recursion, the recursive
call is the last thing the function does. Often, the value of the recursive call
is returned. As such, tail recursive functions can often be easily
implemented in an iterative manner; by taking out the recursive call and
replacing it with a loop, the same effect can generally be achieved. In fact,
a good compiler can recognize tail recursion and convert it to iteration in
order to optimize the performance of the code.
A good example of a tail recursive function is a function to compute the GCD, or Greatest Common Divisor, of two numbers:
int gcd(int m, int n)
{
    int r;
    if (m < n) return gcd(n, m);
    r = m % n;
    if (r == 0) return n;
    else return gcd(n, r);
}
3. Binary Recursive
Some recursive functions don't just have one call to themselves, they have two (or more). Functions with two recursive calls are referred to as binary recursive functions.
The mathematical combinations operation is a good example of a function that can quickly be implemented as a binary recursive function. The number of combinations, often represented as nCk, where we are choosing k elements out of a set of n elements, can be implemented as follows:
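The choose function itself is not reproduced in the text; a binary recursive sketch in Python (the function name is an assumption, and the same logic ports directly to C) might be:

def choose(n, k):
    # Each non-base call makes two further recursive calls, which is
    # what makes this function binary recursive.
    if k == 0 or k == n:
        return 1
    return choose(n - 1, k - 1) + choose(n - 1, k)

print(choose(5, 2))   # prints 10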
4. Exponential recursion
An exponential recursive function is one that, if you were to draw out a representation of all the function calls, would have an exponential number of calls in relation to the size of the data set (exponential meaning that if there were n elements, there would be O(aⁿ) function calls, where a is some positive constant).
A good example of an exponentially recursive function is a function to compute all the permutations of a data set. Let's write a function to take an array of n integers and print out every permutation of it.
5. Nested Recursion
In nested recursion, the argument passed to the recursive call is itself another recursive call. The Ackermann function is the classic example:
int ackerman(int m, int n)
{
    if (m == 0) return n + 1;
    else if (n == 0)
        return ackerman(m - 1, 1);
    else
        return ackerman(m - 1, ackerman(m, n - 1));
}
6. Mutual Recursion
A recursive function doesn't necessarily need to call itself. Some recursive
functions work in pairs or even larger groups. For example, function A
calls function B which calls function C which in turn calls function A.
A simple example of mutual recursion is a set of function to determine
whether an integer is even or odd. How do we know if a number is even?
Well, we know 0 is even. And we also know that if a number n is even,
then n - 1 must be odd. How do we know if a number is odd? It's not even!
int is_even(unsigned int n)
{
    if (n == 0) return 1;
    else return is_odd(n - 1);
}

int is_odd(unsigned int n)
{
    return !is_even(n);
}
I told you recursion was powerful! Of course, this is just an illustration.
The above situation isn't the best example of when we'd want to use
recursion instead of iteration or a closed form solution. A more efficient
set of function to determine whether an integer is even or odd would be
the following:
int is_even(unsigned int n)
{
    if (n % 2 == 0) return 1;
    else return 0;
}

int is_odd(unsigned int n)
{
    if (n % 2 != 0) return 1;
    else return 0;
}
Problem: Your boss asks you to write a function to sum up all of the
numbers between some high and low value. You decide to write two
different versions of the function, one recursive and one iterative. 1) Write
them. The next morning you come into work and your boss calls you into
his office, unhappy at how slow both of your functions work, compared to
how the problem could be solved. 2) How else could you solve this
problem?
1a) Iteratively:
int sum_nums(int low, int high)
{
    int i, total = 0;
    for (i = low; i <= high; i++)
        total += i;
    return total;
}
1b) Recursively:
int sum_nums(int low, int high)
{
    if (low == high) return high;
    else return low + sum_nums(low + 1, high);
}
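Part 2 of the problem is presumably answered by the closed-form arithmetic-series formula, which avoids both the loop and the recursion; a sketch in Python:

def sum_nums(low, high):
    # Sum of the integers from low to high in O(1) time instead of O(n).
    return (low + high) * (high - low + 1) // 2

print(sum_nums(1, 100))   # prints 5050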
Problem: Find the bug in the following function:
int factorial(int n)
{
    if (n <= 1) return 1;
    else if (n < 0) return 0;
    else return factorial(n - 1) * n;
}
The first two if statements should be switched. This function works fine if
the function is called on valid input (n > = 0). But if it is called on invalid
input (n < 0), the function will incorrectly return 1.
Problem: Your research assistant has come to you with the following two
functions:
int factorial_iter(int n)
{
    int fact = 1;
    if (n < 0) return 0;
    for ( ; n > 0; n--) fact *= n;
    return fact;
}

and

int factorial_recur(int n)
{
    if (n < 0) return 0;
    else if (n <= 1)
        return 1;
    else
        return n * factorial_recur(n - 1);
}
11.2.1 Implementation
Many programming languages implement recursion by means of stacks.
Generally, whenever a function (caller) calls another function (callee) or
itself as callee, the caller function transfers execution control to the
callee. This transfer process may also involve some data to be passed
from the caller to the callee.
This implies that the caller function has to suspend its execution temporarily and resume later when the execution control returns from the callee function. Here, the caller function needs to restart exactly from the point of execution where it put itself on hold. It also needs the exact same data values it was working on. For this purpose, an activation record (or stack frame) is created for the caller function.
This activation record keeps the information about local variables, formal
parameters, return address and all information passed to the caller
function.
11.2.2 Analysis of Recursion
One may argue why to use recursion, as the same task can be done with
iteration. The first reason is, recursion makes a program more readable
and because of latest enhanced CPU systems, recursion is more efficient
than iterations.
11.2.3 Time Complexity
In case of iterations, we take number of iterations to count the time
complexity. Likewise, in case of recursion, assuming everything is
constant, we try to figure out the number of times a recursive call is being
made. A call made to a function is Ο(1), hence the (n) number of times a
recursive call is made makes the recursive function Ο(n).
11.2.4 Space Complexity
Space complexity is counted as what amount of extra space is required
for a module to execute. In case of iterations, the compiler hardly requires
any extra space. The compiler keeps updating the values of variables
used in the iterations. But in case of recursion, the system needs to store
activation record each time a recursive call is made. Hence, it is
considered that space complexity of recursive function may go higher
than that of a function with iteration.
11.3 PROPERTIES OF RECURSIVE FUNCTIONS
The base case is the condition that allows the algorithm to stop the recursion and begin the process of returning to the original calling
recursion and begin the process of returning to the original calling
function. This process is sometimes called unwinding the stack. A base
case is typically a problem that is small enough to solve directly. In
the accumulate algorithm the base case is an empty vector.
The second property requires modifying something in the recursive
function that on subsequent calls moves the state of the program closer to
the base case. A change of state means that some data that the algorithm is
using is modified. Usually the data that represents our problem gets
smaller in some way. In the accumulate algorithm our primary data
structure is a vector, so we must focus our state-changing efforts on the
vector. Since the base case is the empty vector, a natural progression
toward the base case is to shorten the vector.
Lastly, a recursive algorithm must call itself. This is the very definition of
recursion. Recursion is a confusing concept to many beginning
programmers. As a novice programmer, you have learned that functions
are good because you can take a large problem and break it up into smaller
problems. The smaller problems can be solved by writing a function to
solve each problem. When we talk about recursion it may seem that we are
talking ourselves in circles. We have a problem to solve with a function,
but that function solves the problem by calling itself! But the logic is not circular at all; the logic of recursion is an elegant expression of solving a problem by breaking it down into smaller and easier problems.
In the remainder of this chapter we will look at more examples of
recursion. In each case we will focus on designing a solution to a problem
by using the three properties of recursive functions.
again go on. This is why recursion makes defining situations a lot easier than usual.
Recursion is also a useful programming technique. A recursive subroutine is defined as one that directly or indirectly calls itself. Calling a subroutine directly means that the subroutine's definition contains a call statement to the subroutine itself.
On the other hand, the indirect calling of a subroutine happens when a subroutine calls another subroutine, which then calls the original subroutine. Recursion can use a few lines of code to describe a very complex task. Let us now turn our attention to the different types of recursion that we have already touched upon.
If you’re not familiar with Candy Crush (You should be) then chess is
another example of recursion in action. Almost all searching
algorithms today use a form of recursion as well. In this day and age
where information is key, recursion becomes one of the most
important methods in programming.
Multiple recursion with the Sierpinski gasket
So far, the examples of recursion that we've seen require you to make
one recursive call each time. But sometimes you need to make
multiple recursive calls. Here's a good example, a mathematical
construct that is a fractal known as a Sierpinski gasket:
Take the three squares with an × through them—the top left, top right,
and bottom right—and divide them into four sections in the same
way:
Keep going. Divide every square with an × into four sections, and
place an × in the top left, top right, and bottom right squares, but
never the bottom left.
Once the squares get small enough, stop dividing. If you fill in each
square with an × and forget about all the other squares, you get the
Sierpinski gasket. Here it is once again:
var dim = 240;
var minSize = 8;
draw = function() {
    background(255, 255, 255);
    fill(255, 255, 0);
    rect(0, 0, dim, dim);
    stroke(0, 0, 255);
    fill(0, 0, 255);
    drawGasket(0, 0, dim);
};
Example.
1. "Recursion" is a technique of solving a problem by calling the same function again and again until some breaking (base) condition is reached, where the recursion stops and the solution is calculated from there on. For example, calculating the factorial of a given number.
2. Thus, in recursion, the last function called needs to be completed first.
3. Now a stack is a LIFO data structure, i.e. Last In First Out, and hence it is used to implement recursion.
4. High level programming languages, such as Pascal, C etc., that provide support for recursion use the stack for bookkeeping.
5. In each recursive call, there is a need to save:
1. the current values of parameters,
2. local variables, and
3. the return address (the address where the control has to return from the call).
6. Also, as a function calls another function, first its arguments, then the return address and finally space for local variables is pushed onto the stack.
Examples
Let's take a classic example where recursion is the best solution: the Fibonacci sequence. If we want to generate the nth Fibonacci number using recursion, we can do it like this:
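The code referred to here appears only as a figure in the source; a minimal recursive sketch (the function name fib is an assumption) might be:

def fib(n):
    # Returns the nth Fibonacci number, with fib(0) = 0 and fib(1) = 1,
    # making two recursive calls for every non-base case.
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(10))   # prints 55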
Let’s take another example. In this case, we have a number of bunnies and
each bunny has two big floppy ears. We want to compute the total number
of ears across all the bunnies recursively. We could do it like this:
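Likewise, the bunny-ears code is a figure in the source; a sketch of the idea (each bunny contributes two ears, and the remaining bunnies are handled by the recursive call; the function name is an assumption):

def bunny_ears(bunnies):
    # Total number of ears on a group of bunnies, two ears per bunny.
    if bunnies == 0:
        return 0                      # base case: no bunnies, no ears
    return 2 + bunny_ears(bunnies - 1)

print(bunny_ears(5))   # prints 10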
11.7 SUMMARY
In simple words, recursion is a problem solving, and in some cases, a
programming technique that has a very special and exclusive
property. In recursion, a function or method has the ability to call
itself to solve the problem.
Some recursive functions don't just have one call to themselves, they have two (or more). Functions with two recursive calls are referred to as binary recursive functions.
11.8 QUESTIONS
1. Write a short note on Recursion.
2. What are the types of Recursion?
3. Explain Linear Recursive with example.
4. Explain Tail recursive with example.
5. Explain Binary Recursive with example.
6. Explain Exponential recursion with example.
7. Explain Nested Recursion with example.
8. Explain Mutual Recursion with example.
9. Properties of recursive functions
10. How does recursion work?
11. When is recursion used?
11.9 REFERENCE FOR FURTHER READING
https://www.upgrad.com/blog/recursion-in-data-structure/
https://www.sparknotes.com/cs/recursion/whatisrecursion/section2/
https://www.upgrad.com/blog/recursion-in-data-structure/
https://daveparillo.github.io/cisc187-reader/recursion/properties.html
https://medium.com/@frankzou4000/recursion-and-its-applications-4dc00ee94130
12
HASH TABLE
Unit Structure:
12.0 Objective
12.1 Introduction
12.2 Hashing
12.3 Hash Function
12.3.1 Types of Hash Functions-
12.3.2 Properties of Hash Function
12.4 Collision
12.5 Collision Resolution
12.5.1 Separate chaining (Open Hashing)
12.5.2 In open addressing
12.6 Summary
12.7 Questions
12.8 Reference for further reading
12.0 OBJECTIVE
To study hashing techniques.
To study applications of hashing algorithms.
To find the best hashing algorithm for a particular situation.
12.1 INTRODUCTION
Hash functions are used in conjunction with hash tables to store and
retrieve data items or data records. The hash function translates the key
associated with each datum or record into a hash code, which is used to
index the hash table. When an item is to be added to the table, the hash
code may index an empty slot (also called a bucket), in which case the
item is added to the table there. If the hash code indexes a full slot, some
kind of collision resolution is required: the new item may be omitted (not
added to the table), or replace the old item, or it can be added to the table
in some other location by a specified procedure.
12.2 HASHING
In data structures,
There are several searching techniques like linear search, binary search,
search trees etc.
In these techniques, the time taken to search any particular element depends on the total number of elements.
Example-
Linear Search takes O(n) time to perform the search in unsorted arrays consisting of n elements.
Binary Search takes O(log n) time to perform the search in sorted arrays consisting of n elements.
It takes O(log n) time to perform the search in a Binary Search Tree consisting of n elements.
Drawback-
The main drawback of these techniques is-
As the number of elements increases, time taken to perform the search
also increases.
This becomes problematic when total number of elements become too
large.
Hashing is one of the searching techniques that uses constant time. The time complexity in hashing is O(1). Till now, we have read two techniques for searching, i.e., linear search and binary search. The worst-case time complexity in linear search is O(n), and O(log n) in binary search. In both searching techniques, the searching time depends upon the number of elements, but we want a technique that takes constant time. So, the hashing technique was developed to provide a constant-time search.
In Hashing technique, the hash table and hash function are used. Using the
hash function, we can calculate the address at which the value can be
stored.
The main idea behind the hashing is to create the (key/value) pairs. If the
key is given, then the algorithm computes the index at which the value
would be stored. It can be written as:
Index = hash(key)
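A tiny sketch of the key/value idea above, using the division method to turn a key into a table index (the table size and function name are assumptions):

tableSize = 10
hashTable = [None] * tableSize

def hashIndex(key):
    return key % tableSize      # Index = hash(key)

for key in (22, 9, 14, 17):
    hashTable[hashIndex(key)] = key

print(hashTable)   # [None, None, 22, None, 14, None, None, 17, None, 9]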
Advantage-
Unlike other searching techniques,
Hashing is extremely efficient.
The time taken by it to perform the search does not depend upon the
total number of elements.
It completes the search with constant time complexity O(1).
Hashing Mechanism-
In hashing,
An array data structure called as Hash table is used to store the data
items.
Based on the hash key value, data items are inserted into the hash table.
Hash function is a function that maps any big number or string to a small
integer value.
Hash function takes the data item as an input and returns a small integer
value as an output.
The small integer value is called as a hash value.
Hash value of the data item is then used as an index for storing it into
the hash table.
12.3.1 Types of Hash Functions-
There are various types of hash functions available, such as the division method, the mid-square method, and the folding method.
2. Mid-Square Hash Function
In the mid-square method, the key is squared and the middle digits of the result are taken as the hash value. This works well because most or all bits of the key value contribute to the result. For example, consider records whose keys are 4-digit numbers in base 10.
The goal is to hash these key values to a table of size 100 (i.e., a range
of 0 to 99). This range is equivalent to two digits in base 10.
All digits of the original key value (equivalently, all bits when the
number is viewed in binary) contribute to the middle two digits of the
squared value. Thus, the result is not dominated by the distribution of
the bottom digit or the top digit of the original key value.
3. Folding Hash Function
The folding method for constructing hash functions begins by dividing the
item into equal-size pieces (the last piece may not be of equal size). These
pieces are then added together to give the resulting hash value. For
example, if our item was the phone number 436-555-4601, we would take
the digits and divide them into groups of 2 (43,65,55,46,01). After the
addition, 43+65+55+46+01, we get 210. If we assume our hash table has
11 slots, then we need to perform the extra step of dividing by 11 and
keeping the remainder. In this case 210 % 11 is 1, so the phone number
436-555-4601 hashes to slot 1. Some folding methods go one step further
and reverse every other piece before the addition. For the above example,
we get 43+56+55+64+01=219 which gives 219 % 11=10.
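A sketch of the folding method described above (without the reversing variation), applied to the phone-number example; the function name is an assumption:

def foldingHash(digits, tableSize):
    # Split the digit string into pieces of two digits, add the pieces,
    # then take the remainder by the table size.
    total = 0
    for i in range(0, len(digits), 2):
        total += int(digits[i:i + 2])
    return total % tableSize

print(foldingHash("4365554601", 11))   # 43+65+55+46+01 = 210, 210 % 11 = 1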
12.3.2 Properties of Hash Function-
The properties of a good hash function are-
It is efficiently computable.
It minimizes the number of collisions.
It distributes the keys uniformly over the table.
12.4 COLLISION
A collision occurs when more than one value to be hashed by a particular
hash function hash to the same slot in the table or data structure (hash
table) being generated by the hash function.
Example Hash Table With Collisions:
Let's take the exact same hash function from before: take the value to be hashed mod 10, and place it in that slot in the hash table.
Numbers to hash: 22, 9, 14, 17, 42
As before, the hash table is shown to the right.
As before, we hash each value as it appears in the string of values to hash,
starting with the first value. The first four values can be entered into the
hash table without any issues. It is the last value, 42, however, that causes
a problem. 42 mod 10 = 2, but there is already a value in slot 2 of the hash
table, namely 22. This is a collision.
The value 42 must end up in one of the hash table's slots, but arbitrarily assigning it a slot at random would make accessing data in a hash table much more time consuming, as we obviously want to retain the constant-time growth of accessing our hash table. There are two common ways to deal with collisions: chaining, and open addressing.
Step-02:
Insert the given keys in the hash table one by one.
The first key to be inserted in the hash table = 50.
Bucket of the hash table to which key 50 maps = 50 mod 7 = 1.
So, key 50 will be inserted in bucket-1 of the hash table as-
Step-03:
The next key to be inserted in the hash table = 700.
Bucket of the hash table to which key 700 maps = 700 mod 7 = 0.
So, key 700 will be inserted in bucket-0 of the hash table as-
Step-04:
The next key to be inserted in the hash table = 76.
Bucket of the hash table to which key 76 maps = 76 mod 7 = 6.
So, key 76 will be inserted in bucket-6 of the hash table as-
Step-05:
The next key to be inserted in the hash table = 85.
Bucket of the hash table to which key 85 maps = 85 mod 7 = 1.
Since bucket-1 is already occupied, so collision occurs.
Separate chaining handles the collision by creating a linked list to
bucket-1.
So, key 85 will be inserted in bucket-1 of the hash table as-
Step-06:
The next key to be inserted in the hash table = 92.
Bucket of the hash table to which key 92 maps = 92 mod 7 = 1.
Since bucket-1 is already occupied, so collision occurs.
Separate chaining handles the collision by creating a linked list to
bucket-1.
So, key 92 will be inserted in bucket-1 of the hash table as-
Step-07:
The next key to be inserted in the hash table = 73.
Bucket of the hash table to which key 73 maps = 73 mod 7 = 3.
So, key 73 will be inserted in bucket-3 of the hash table as-
Step-08:
The next key to be inserted in the hash table = 101.
Bucket of the hash table to which key 101 maps = 101 mod 7 = 3.
Since bucket-3 is already occupied, so collision occurs.
Separate chaining handles the collision by creating a linked list to
bucket-3.
So, key 101 will be inserted in bucket-3 of the hash table as-
Open Addressing-
In open addressing, all the keys are stored inside the hash table itself and no key is stored outside the table.
Search Operation-
To search any particular key, its hash value is obtained using the hash function.
Using the hash value, that bucket of the hash table is checked.
If the required key is found, the search is successful.
Otherwise, the subsequent buckets are checked until the required key or
an empty bucket is found.
The empty bucket indicates that the key is not present in the hash table.
Delete Operation-
The key is first searched and then deleted.
After deleting the key, that particular bucket is marked as “deleted”.
NOTE-
During insertion, the buckets marked as “deleted” are treated like any
other empty bucket.
During searching, the search is not terminated on encountering the
bucket marked as “deleted”.
The search terminates only after the required key or an empty bucket is
found.
1. Linear Probing
In linear probing, when a collision occurs, the following buckets are probed one by one (h, h+1, h+2, ...) until an empty bucket is found, and the key is inserted there.
Example
Suppose we have a list of size 20 (m = 20). We want to put some elements
in linear probing fashion. The elements are {96, 48, 63, 29, 87, 77, 48, 65,
69, 94, 61}
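A minimal Python sketch of open addressing with linear probing is given below, following the insert, search and delete rules just described (buckets marked “deleted” are reused during insertion but do not stop a search). The class name, the EMPTY/DELETED markers and the hash function key mod table_size are illustrative assumptions; the driver uses the keys of the practice problem at the end of this chapter.

EMPTY, DELETED = object(), object()        # sentinel markers for bucket states

class LinearProbingTable:
    def __init__(self, size):
        self.size = size
        self.slots = [EMPTY] * size

    def _probe(self, key):
        start = key % self.size
        for i in range(self.size):
            yield (start + i) % self.size  # probe h, h+1, h+2, ... (mod table size)

    def insert(self, key):
        for idx in self._probe(key):
            # during insertion, "deleted" buckets are treated like empty ones
            if self.slots[idx] is EMPTY or self.slots[idx] is DELETED:
                self.slots[idx] = key
                return idx
        raise RuntimeError("hash table is full")

    def search(self, key):
        for idx in self._probe(key):
            if self.slots[idx] is EMPTY:   # a truly empty bucket ends the search
                return None
            if self.slots[idx] is not DELETED and self.slots[idx] == key:
                return idx                 # found; "deleted" buckets are skipped
        return None

    def delete(self, key):
        idx = self.search(key)
        if idx is not None:
            self.slots[idx] = DELETED      # mark as "deleted" instead of emptying it

table = LinearProbingTable(7)
for k in (50, 700, 76, 85, 92, 73, 101):
    table.insert(k)
print(table.slots)                          # [700, 50, 85, 92, 73, 101, 76]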
2. Quadratic Probing
In quadratic probing, instead of probing consecutive buckets, the i-th probe examines the bucket at (h + i²) mod m.
This means that if the first hash value is h, the successive values are
h+1, h+4, h+9, h+16, and so on.
Figure shows our example values after they are placed using this
technique.
Consider inserting the value 11. Its hash value is 5, but location 5 is already occupied, so quadratic probing is applied:
When i = 0
Index = (5 + 0²) % 10 = 5
When i = 1
Index = (5 + 1²) % 10 = 6
Since location 6 is empty, the value 11 will be added at index 6.
The next element is 13. When the hash function is applied on 13, then the
index value comes out to be 9, which we already discussed in the chaining
method. At index 9, the cell is occupied by another value, i.e., 3. So, we
will apply the quadratic probing technique to calculate the free location.
When i = 0
Index = (9 + 0²) % 10 = 9
When i = 1
Index = (9 + 1²) % 10 = 0
Since location 0 is empty, the value 13 will be added at index 0.
The next element is 7. When the hash function is applied on 7, the index value comes out to be 7, which we already discussed in the chaining method. At index 7, the cell is occupied by another value, i.e., 2. So, we will apply the quadratic probing technique to calculate the free location.
When i = 0
Index = (7 + 0²) % 10 = 7
When i = 1
Index = (7 + 1²) % 10 = 8
Since location 8 is empty, the value 7 will be added at index 8.
The next element is 12. When the hash function is applied on 12, then the
index value comes out to be 7. When we observe the hash table then we
will get to know that the cell at index 7 is already occupied by the value 2.
So, we apply the Quadratic probing technique on 12 to determine the free
location.
When i = 0
Index = (7 + 0²) % 10 = 7
When i = 1
Index = (7 + 1²) % 10 = 8
When i = 2
Index = (7 + 2²) % 10 = 1
When i = 3
Index = (7 + 3²) % 10 = 6
When i = 4
Index = (7 + 4²) % 10 = 3
Since location 3 is empty, the value 12 will be stored at index 3.
The final hash table would be:
Therefore, the order of the elements is 13, 9, _, 12, _, 6, 11, 2, 7, 3.
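A small Python sketch of the probing sequence used in this example follows. The primary hash h(k) = (2k + 3) mod 10 is the one used for these keys (it also appears in the double hashing table in the next section); the function names are illustrative assumptions.

def h(key, m=10):
    return (2 * key + 3) % m

def quadratic_probe_insert(table, key):
    m = len(table)
    for i in range(m):
        idx = (h(key, m) + i * i) % m      # probe h, h+1, h+4, h+9, ...
        if table[idx] is None:
            table[idx] = key
            return idx
    return None                            # no free slot found within m probes

table = [None] * 10
for k in (3, 2, 9, 6, 11, 13, 7, 12):
    quadratic_probe_insert(table, k)
print(table)                               # [13, 9, None, 12, None, 6, 11, 2, 7, 3]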
3. Double Hashing
Double hashing is an open addressing technique used to resolve collisions. When a collision occurs, this technique uses a secondary hash of the key to decide the step size, moving forward by that amount until an empty location is found.
In double hashing, two hash functions are used. Suppose h1(k) is the hash function used to calculate the initial location and h2(k) is a second hash function. The rule can be stated as "insert key k at the first free place from (u + v*i) % m, where i = 0 to m-1". Here, u is the location computed using h1(k) and v is equal to h2(k) % m.
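A minimal Python sketch of this rule is shown below, using the two hash functions of the worked example that follows, h1(k) = (2k + 3) mod 10 and h2(k) = (3k + 1) mod 10; the function names are illustrative assumptions.

def h1(k, m=10):
    return (2 * k + 3) % m

def h2(k, m=10):
    return (3 * k + 1) % m

def double_hash_insert(table, key):
    # Insert key at the first free place from (u + v*i) % m, i = 0 .. m-1,
    # where u = h1(key) and v = h2(key) % m.
    m = len(table)
    u, v = h1(key, m), h2(key, m) % m
    for i in range(m):
        idx = (u + v * i) % m
        if table[idx] is None:
            table[idx] = key
            return idx
    return None    # e.g. when v is 0, every probe lands on the same bucket

table = [None] * 10
for k in (3, 2, 9, 6, 11, 13, 7, 12):
    double_hash_insert(table, k)
print(table)       # 13 and 7 cannot be placed, exactly as in the example below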
Consider the same example that we use in quadratic probing.
Key    h1(k) = (2k+3) mod 10        h2(k) = (3k+1) mod 10        Probes required
3      ((2*3)+3) % 10 = 9           -                            1
2      ((2*2)+3) % 10 = 7           -                            1
9      ((2*9)+3) % 10 = 1           -                            1
6      ((2*6)+3) % 10 = 5           -                            1
11     ((2*11)+3) % 10 = 5          ((3*11)+1) % 10 = 4          3
13     ((2*13)+3) % 10 = 9          ((3*13)+1) % 10 = 0          cannot be inserted
7      ((2*7)+3) % 10 = 7           ((3*7)+1) % 10 = 2           cannot be inserted
12     ((2*12)+3) % 10 = 7          ((3*12)+1) % 10 = 7          2
As we know that no collision would occur while inserting the keys (3, 2,
9, 6), so we will not apply double hashing on these key values.
On inserting the key 11 in the hash table, a collision occurs because the calculated index value of 11 is 5, which is already occupied by another value. Therefore, we will apply the double hashing technique on key 11. When the key value is 11, the value of v is 4.
Now, substituting the values of u and v in (u+v*i)%m
When i=0
Index = (5+4*0)%10 =5
When i=1
Index = (5+4*1)%10 = 9
When i=2
Index = (5+4*2)%10 = 3
Since location 3 is empty in the hash table, the key 11 is added at index 3.
The next element is 13. The calculated index value of 13 is 9, which is already occupied by another key value. So, we will use the double hashing technique to find a free location. The value of v is 0.
Now, substituting the values of u and v in (u+v*i)%m
When i=0
Index = (9+0*0)%10 = 9
We get index 9 in all iterations from 0 to m-1 because the value of v is zero, so the probe sequence never moves. Therefore, we cannot insert 13 into the hash table (this is why the second hash function should never evaluate to zero).
The next element is 7. The calculated index value of 7 is 7, which is already occupied by another key value. So, we will use the double hashing technique to find a free location. The value of v is 2.
Now, substituting the values of u and v in (u+v*i)%m
When i=0
Index = (7 + 2*0)%10 = 7
When i=1
Index = (7+2*1)%10 = 9
When i=2
Index = (7+2*2)%10 = 1
When i=3
Index = (7+2*3)%10 = 3
When i=4
Index = (7+2*4)%10 = 5
When i=5
Index = (7+2*5)%10 = 7
When i=6
Index = (7+2*6)%10 = 9
When i=7
Index = (7+2*7)%10 = 1
When i=8
Index = (7+2*8)%10 = 3
When i=9
Index = (7+2*9)%10 = 5
We have checked all values of i (from 0 to 9) but did not find a suitable place to insert 7. Therefore, key 7 cannot be inserted into the hash table.
The next element is 12. The calculated index value of 12 is 7, which is already occupied by another key value. So, we will use the double hashing technique to find a free location. The value of v is 7.
Now, substituting the values of u and v in (u+v*i)%m
When i=0
Index = (7+7*0)%10 = 7
When i=1
Index = (7+7*1)%10 = 4
Since location 4 is empty, the key 12 is inserted at index 4.
The final hash table would be:
Separate Chaining Vs Open Addressing-
Separate Chaining: Some buckets of the hash table are never used, which leads to wastage of space.
Open Addressing: Buckets may be used even if no key maps to those particular buckets.
Comparison of Open Addressing Techniques (m = size of table)-
Secondary Clustering: Linear Probing - Yes; Quadratic Probing - Yes; Double Hashing - No
Number of Probe Sequences: Linear Probing - m; Quadratic Probing - m; Double Hashing - m²
Cache Performance: Linear Probing - Best; Quadratic Probing - Lies between the two; Double Hashing - Poor
Conclusions-
Linear Probing has the best cache performance but suffers from
clustering.
Quadratic probing lies between the two in terms of cache performance
and clustering.
Double hashing has poor cache performance but no clustering.
Load Factor (α)-
Load factor (α) is defined as-
α = Number of keys stored in the hash table / Size of the hash table
In open addressing, the value of the load factor always lies between 0 and 1.
This is because-
In open addressing, all the keys are stored inside the hash table.
So, the size of the table is always greater than or equal to the number of keys stored in the table.
PRACTICE PROBLEM BASED ON OPEN ADDRESSING-
Problem-
Using the hash function ‘key mod 7’, insert the following sequence of
keys in the hash table-
50, 700, 76, 85, 92, 73 and 101
Use linear probing technique for collision resolution.
Solution-
The given sequence of keys will be inserted in the hash table as-
Step-01:
Draw an empty hash table.
For the given hash function, the possible range of hash values is [0, 6].
So, draw an empty hash table consisting of 7 buckets as-
Step-02:
Insert the given keys in the hash table one by one.
The first key to be inserted in the hash table = 50.
Bucket of the hash table to which key 50 maps = 50 mod 7 = 1.
So, key 50 will be inserted in bucket-1 of the hash table as-
Step-03:
The next key to be inserted in the hash table = 700.
Bucket of the hash table to which key 700 maps = 700 mod 7 = 0.
So, key 700 will be inserted in bucket-0 of the hash table as-
Step-04:
The next key to be inserted in the hash table = 76.
Bucket of the hash table to which key 76 maps = 76 mod 7 = 6.
So, key 76 will be inserted in bucket-6 of the hash table as-
Step-05:
The next key to be inserted in the hash table = 85.
Bucket of the hash table to which key 85 maps = 85 mod 7 = 1.
Since bucket-1 is already occupied, so collision occurs.
To handle the collision, linear probing technique keeps probing linearly
until an empty bucket is found.
The first empty bucket is bucket-2.
So, key 85 will be inserted in bucket-2 of the hash table as-
Step-06:
The next key to be inserted in the hash table = 92.
Bucket of the hash table to which key 92 maps = 92 mod 7 = 1.
Since bucket-1 is already occupied, so collision occurs.
To handle the collision, linear probing technique keeps probing linearly
until an empty bucket is found.
The first empty bucket is bucket-3.
So, key 92 will be inserted in bucket-3 of the hash table as-
Step-07:
The next key to be inserted in the hash table = 73.
Bucket of the hash table to which key 73 maps = 73 mod 7 = 3.
Since bucket-3 is already occupied, so collision occurs.
To handle the collision, linear probing technique keeps probing linearly
until an empty bucket is found.
The first empty bucket is bucket-4.
So, key 73 will be inserted in bucket-4 of the hash table as-
Step-08:
The next key to be inserted in the hash table = 101.
Bucket of the hash table to which key 101 maps = 101 mod 7 = 3.
Since bucket-3 is already occupied, so collision occurs.
To handle the collision, linear probing technique keeps probing linearly
until an empty bucket is found.
The first empty bucket is bucket-5.
So, key 101 will be inserted in bucket-5 of the hash table as-
12.6 SUMMARY
Linear Search takes O(n) time to perform the search in unsorted
arrays consisting of n elements.
There are various types of hash functions available, such as the Mid Square Hash Function, the Division Hash Function and the Folding Hash Function.
12.7 QUESTIONS
1. Write a short note on Hashing.
2. Differentiate between Linear Search and Binary Search.
3. What are advantages and disadvantages of Hashing?
4. What are types of Hash Function?
5. Explain Collision in detail.
6. What techniques are used to avoid collisions?
7. Explain quadratic probing.
8. Explain double hashing.
9. Differentiate between Separate Chaining and Open Addressing.
12.8 REFERENCE FOR FURTHER READING
http://www.cs.cmu.edu/~ab/15-111N09/Lectures/Lecture%2017%20-%20%20Hashing.pdf
https://www.gatevidyalay.com/hashing/
https://www.gatevidyalay.com/tag/hashing-in-data-structure-notes/
http://web.mit.edu/16.070/www/lecture/hashing.pdf
https://www.upgrad.com/blog/hashing-in-data-structure/
https://research.cs.vt.edu/AVresearch/openalgoviz/VT/Hashing/tags/20090324-release/midsquare.php
http://www.umsl.edu/~siegelj/information_theory/projects/HashingFunctionsInCryptography.html
https://runestone.academy/runestone/books/published/pythonds/SortSearch/Hashing.html
13
ADVANCED SORTING
Unit Structure:
13.0 Objective
13.1 Introduction
13.2 Merge Sort
13.3 Quick Sort
13.4 Radix Sort
13.5 Sorting Linked List
13.6 Summary
13.7 Questions
13.8 Reference for further reading
13.0 OBJECTIVE
To study and analyze time complexity of various sorting algorithms
13.1 INTRODUCTION
A sorting algorithm is used to rearrange the elements of a given array or list according to a comparison operator on the elements. The comparison operator is used to decide the new order of the elements in the respective data structure.
13.2 MERGE SORT
Algorithm
Merge sort keeps on dividing the list into equal halves until it can no longer be divided. By definition, if there is only one element in the list, it is sorted.
Then, merge sort combines the smaller sorted lists keeping the new list
sorted too.
Repeatedly merge sublists to produce newly sorted sublists until there is
only 1 sublist remaining. This will be the sorted list.
Pseudocode
We shall now see the pseudocode for the merge sort functions. As our algorithm points out, there are two main functions: divide & merge. Merge sort works with recursion, and we shall see our implementation in the same way.
procedure merge( var a as array, var b as array )
   var c as array
   while ( a and b have elements )
      if ( a[0] > b[0] )
         add b[0] to the end of c
         remove b[0] from b
      else
         add a[0] to the end of c
         remove a[0] from a
      end if
   end while
   while ( a has elements )
      add a[0] to the end of c
      remove a[0] from a
   end while
   while ( b has elements )
      add b[0] to the end of c
      remove b[0] from b
   end while
   return c
end procedure
Program Explanation
1. The user is prompted to enter a list of numbers.
2. The list is passed to the merge_sort function.
3. The sorted list is displayed.
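The Python listing that this explanation refers to is not reproduced in this extract. A minimal merge sort consistent with the explanation and with the merge pseudocode above (a sketch, not the book's exact program) might look like this:

def merge(a, b):
    # Merge two already-sorted lists into one sorted list.
    c = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] > b[j]:
            c.append(b[j])
            j += 1
        else:
            c.append(a[i])
            i += 1
    c.extend(a[i:])      # copy whatever remains in a
    c.extend(b[j:])      # copy whatever remains in b
    return c

def merge_sort(alist):
    # Recursively split the list in half, sort each half, then merge them.
    if len(alist) <= 1:
        return alist
    mid = len(alist) // 2
    return merge(merge_sort(alist[:mid]), merge_sort(alist[mid:]))

alist = [int(x) for x in input('Enter the list of numbers: ').split()]
print('Sorted list:', merge_sort(alist))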
Complexity
Merge sort runs in O(n log n) time in the best, average and worst cases, since the list is halved at each of the log n levels of recursion and each level does O(n) work to merge; it uses O(n) extra space for merging.
13.3 QUICK SORT
Quick sort is a divide-and-conquer algorithm: it picks an element as the pivot, partitions the array so that smaller elements come before the pivot and larger elements come after it, and then sorts the two partitions recursively.
Step 2: The main array after the first step becomes
Pseudocode of Quick Sort Algorithm:
/**
 * The main function that implements quick sort.
 * @Parameters: array, starting index and ending index
 */
quickSort(arr[], low, high)
{
    if (low < high)
    {
        // pivot_index is the partitioning index; arr[pivot_index] is now at its
        // correct place in the sorted array
        pivot_index = partition(arr, low, high);

        // separately sort the elements before and after the pivot
        quickSort(arr, low, pivot_index - 1);
        quickSort(arr, pivot_index + 1, high);
    }
}
Program Explanation
1. The user is prompted to enter a list of numbers.
2. The list is passed to the quicksort function.
3. The sorted list is displayed.
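The quick sort program mentioned above is likewise not reproduced in this extract. A minimal sketch that matches the pseudocode and the explanation, using the Lomuto partition scheme (the pivot is the last element), is:

def partition(arr, low, high):
    # Place arr[high] (the pivot) at its correct position and return that index.
    pivot = arr[high]
    i = low - 1
    for j in range(low, high):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return i + 1

def quicksort(arr, low, high):
    if low < high:
        pivot_index = partition(arr, low, high)
        quicksort(arr, low, pivot_index - 1)    # sort the elements before the pivot
        quicksort(arr, pivot_index + 1, high)   # sort the elements after the pivot

arr = [int(x) for x in input('Enter the list of numbers: ').split()]
quicksort(arr, 0, len(arr) - 1)
print('Sorted list:', arr)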
Complexity Analysis
Time Complexity of Quick Sort
Best case scenario: The best case occurs when the partitions are as evenly balanced as possible, i.e. the sizes of the sublists on either side of the pivot element are equal or differ by at most 1.
o Case 1: The sublists on either side of the pivot have equal size when the subarray has an odd number of elements and the pivot ends up right in the middle after partitioning. Each partition then has (n-1)/2 elements.
o Case 2: The size difference of 1 between the two sublists on either side
of pivot happens if the subarray has an even number, n, of elements.
One partition will have n/2 elements with the other having (n/2)-1.
In either of these cases, each partition will have at most n/2 elements, and
the tree representation of the subproblem sizes will be as below:
13.4 RADIX SORT
Radix sort takes in a list of n integers which are in base b (the radix), so each number has at most d digits, where d = ⌊log_b(k)⌋ + 1 and k is the largest number in the list. For example, three digits are needed to represent decimal 104 (in base 10). It is important that radix sort can work with any base, since the running time of the algorithm, O(d(n+b)), depends on the base it uses. The algorithm runs in linear time when b and n are of the same order of magnitude, so knowing n, b can be chosen to optimize the running time of the algorithm.
Radix sort works by sorting each digit from least significant digit to most
significant digit. So in base 10 (the decimal system), radix sort would sort
by the digits in the 1's place, then the 10’s place, and so on. To do this,
radix sort uses counting sort as a subroutine to sort the digits in each place
value. This means that for a three-digit number in base 10, counting sort
will be called to sort the 1's place, then it will be called to sort the 10's
place, and finally, it will be called to sort the 100's place, resulting in a
completely sorted list. Here is a quick refresher on the counting
sort algorithm.
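That refresher is not reproduced in this extract; the sketch below shows a stable counting sort on a single place value, which is the subroutine radix sort needs (the function and parameter names are illustrative; the complete radix sort program later in the chapter uses the same idea).

def counting_sort_by_digit(alist, exp, base=10):
    # Stable counting sort of alist on the digit selected by exp (1, 10, 100, ...).
    count = [0] * base
    for value in alist:                       # count how often each digit occurs
        count[(value // exp) % base] += 1
    for d in range(1, base):                  # prefix sums give each digit's last position
        count[d] += count[d - 1]
    output = [0] * len(alist)
    for value in reversed(alist):             # walk backwards to keep equal digits stable
        digit = (value // exp) % base
        count[digit] -= 1
        output[count[digit]] = value
    return output

print(counting_sort_by_digit([56, 43, 51, 58], exp=1))   # [51, 43, 56, 58]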
Radix sort is a stable sort, which means it preserves the relative order of
elements that have the same key value. This is very important.
For example, the list of numbers [56,43,51,58] will be sorted as [51,43,56,58] when the 1’s place is sorted (since 1 < 3 < 6 < 8). On the second pass, when the 10’s place is being sorted, the sort will see that three of the four values have the digit 5 in the 10’s place.
To preserve the sorting that the algorithm determined while sorting the 1’s
place, it is important to maintain relative order (namely 1 < 6 < 8) between
the numbers with the same value in the 10’s place (or whatever place
value is currently being sorted).
The second pass of the radix sort will produce [43,51,56,58].
Counting sort can only sort one place value of a given base. For example,
a counting sort for base-10 numbers can only sort digits zero through nine.
To sort two-digit numbers, counting sort would need to operate in base-
100. Radix sort is more powerful because it can sort multi-digit numbers
without having to search over a wider range of keys (which would happen
if the base was larger).
In the image showing radix sort below, notice that each column of
numbers (each place value) is sorted by the digit in question before the
algorithm moves on to the next place value. This shows how radix sort
preserves the relative order of digits with the same value at a given place value: remember, 66 and 68 will both appear as 6’s in the 10’s column, but 68 > 66, so the order determined in the 1’s column (that 8 > 6) must be preserved for the sort to work properly and produce the correct answer.
def radix_sort(alist, base=10):
    if alist == []:
        return

    # key returns the digit of alist[index] at the place value base**digit
    def key_factory(digit, base):
        def key(alist, index):
            return (alist[index] // (base ** digit)) % base
        return key

    largest = max(alist)
    exp = 0
    while base ** exp <= largest:
        alist = counting_sort(alist, base - 1, key_factory(exp, base))
        exp = exp + 1
    return alist

def counting_sort(alist, largest, key):
    c = [0] * (largest + 1)
    for i in range(len(alist)):
        c[key(alist, i)] = c[key(alist, i)] + 1

    # Find the last index for each element
    c[0] = c[0] - 1  # to decrement each element for zero-based indexing
    for i in range(1, largest + 1):
        c[i] = c[i] + c[i - 1]

    result = [None] * len(alist)
    for i in range(len(alist) - 1, -1, -1):
        result[c[key(alist, i)]] = alist[i]
        c[key(alist, i)] = c[key(alist, i)] - 1
    return result

alist = input('Enter the list of (nonnegative) numbers: ').split()
alist = [int(x) for x in alist]
sorted_list = radix_sort(alist)
print('Sorted list: ', end='')
print(sorted_list)
Program Explanation
1. The user is prompted to enter a list of numbers.
2. The list is passed to the radix_sort function and the returned list is the
sorted list.
3. The sorted list is displayed.
If d is constant and b isn't much larger than n (in other words, b = O(n)), then radix sort takes linear time.
13.5 SORTING LINKED LIST
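The next listing sorts a singly linked list with merge sort. Its helper definitions (Node, sortedMerge and printList) appear on a page that is not reproduced in this extract; a minimal sketch of those helpers, consistent with how the listing uses them, is given below.

class Node:
    # A singly linked list node holding a key and a reference to the next node.
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

def sortedMerge(a, b):
    # Merge two sorted lists `a` and `b` and return the head of the merged list.
    dummy = Node(None)
    tail = dummy
    while a and b:
        if a.data <= b.data:
            tail.next, a = a, a.next
        else:
            tail.next, b = b, b.next
        tail = tail.next
    tail.next = a if a else b      # append whichever list still has nodes
    return dummy.next

def printList(head):
    # Print the keys of the list from head to tail.
    while head:
        print(head.data, end=' ')
        head = head.next
    print()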
'''
Split the given list's nodes into front and back halves.
If the length is odd, the extra node goes in the front list.
It uses the fast/slow pointer strategy.
'''
def frontBackSplit(source):
    # if the length is less than 2, handle it separately
    if source is None or source.next is None:
        return source, None

    (slow, fast) = (source, source.next)

    # advance `fast` two nodes, and advance `slow` one node
    while fast:
        fast = fast.next
        if fast:
            slow = slow.next
            fast = fast.next

    # `slow` is before the midpoint of the list, so split it in two
    # at that point.
    ret = (source, slow.next)
    slow.next = None
    return ret

# Sort a given linked list using the merge sort algorithm
def mergesort(head):
    # base case: length 0 or 1
    if head is None or head.next is None:
        return head

    # split `head` into `front` and `back` sublists
    front, back = frontBackSplit(head)

    # recursively sort the sublists
    front = mergesort(front)
    back = mergesort(back)

    # answer = merge the two sorted lists
    return sortedMerge(front, back)

if __name__ == '__main__':
    # input keys
    keys = [8, 6, 4, 9, 3, 1]

    head = None
    for key in keys:
        head = Node(key, head)

    # sort the list
    head = mergesort(head)

    # print the sorted list
    printList(head)
Partition:
1. Take the rightmost element as the pivot.
2. Traverse through the list:
1. If the current node has a value greater than the pivot, we will move it
after the tail
2. else, keep it in the same position.
Quick Sort:
1. Call the partition() method, which will place the pivot at the right
position and will return the pivot
2. Find the tail node of the left sublist i.e., left side of the pivot and recur
the whole algorithm for the left list.
3. Now, recur the algorithm for the right list.
'''
sort a linked list using quick sort
'''
class Node:
    def __init__(self, val):
        self.data = val
        self.next = None

class QuickSortLinkedList:

    def __init__(self):
        self.head = None

    def addNode(self, data):
        if self.head is None:
            self.head = Node(data)
            return
        curr = self.head
        while curr.next is not None:
            curr = curr.next
        newNode = Node(data)
        curr.next = newNode

    def printList(self, n):
        while n is not None:
            print(n.data, end=" ")
            n = n.next

    def partitionLast(self, start, end):
        '''takes the first and last node, but does not
           break any links in the whole linked list'''
        if start == end or start is None or end is None:
            return start

        pivot_prev = start
        curr = start
        pivot = end.data

        '''iterate till one before the end; no need to iterate
           till the end because end is the pivot'''
        while start != end:
            if start.data < pivot:
                # keep track of the last modified item
                pivot_prev = curr
                temp = curr.data
                curr.data = start.data
                start.data = temp
                curr = curr.next
            start = start.next

        '''swap the position of curr (the next suitable index) and the pivot'''
        temp = curr.data
        curr.data = pivot
        end.data = temp

        '''return the node previous to curr because curr is now pointing to the pivot'''
        return pivot_prev

    def sort(self, start, end):
        if start is None or start == end or start == end.next:
            return

        # split the list and partition recursively
        pivot_prev = self.partitionLast(start, end)
        self.sort(start, pivot_prev)

        '''if the pivot was picked and moved to the start, then start and
           pivot are the same, so continue from the node after the pivot'''
        if pivot_prev is not None and pivot_prev == start:
            self.sort(pivot_prev.next, end)

        # if the pivot is in between the list, start from the node after the pivot;
        # since we have pivot_prev, we move two nodes ahead
        elif pivot_prev is not None and pivot_prev.next is not None:
            self.sort(pivot_prev.next.next, end)

if __name__ == "__main__":
    ll = QuickSortLinkedList()
    ll.addNode(30)
    ll.addNode(3)
    ll.addNode(4)
    ll.addNode(20)
    ll.addNode(5)

    n = ll.head
    while n.next is not None:
        n = n.next

    print("\nLinked List before sorting")
    ll.printList(ll.head)

    ll.sort(ll.head, n)

    print("\nLinked List after sorting")
    ll.printList(ll.head)

# This code is contributed by humpheykibet.
Radix sort isn't a comparison-based sort, which means it can attain O(n) performance. However, if you're sorting a flat array, it has to do a lot of data shuffling. This is one of the few times where a linked list actually has a performance advantage, as it can rearrange the elements with only a single re-link operation per move.
def countingSort(arr, exp1):
    n = len(arr)

    # The output array elements that will have sorted arr
    output = [0] * n

    # initialize count array as 0
    count = [0] * 10

    # Store count of occurrences in count[]
    for i in range(0, n):
        index = arr[i] // exp1
        count[index % 10] += 1

    # Change count[i] so that count[i] now contains the actual
    # position of this digit in the output array
    for i in range(1, 10):
        count[i] += count[i - 1]

    # Build the output array
    i = n - 1
    while i >= 0:
        index = arr[i] // exp1
        output[count[index % 10] - 1] = arr[i]
        count[index % 10] -= 1
        i -= 1

    # Copying the output array to arr[],
    # so that arr now contains sorted numbers
    for i in range(0, len(arr)):
        arr[i] = output[i]

# Method to do Radix Sort
def radixSort(arr):
    # Find the maximum number to know the number of digits
    max1 = max(arr)

    # Do counting sort for every digit. Note that instead
    # of passing the digit number, exp is passed. exp is 10^i
    # where i is the current digit number
    exp = 1
    while max1 // exp > 0:
        countingSort(arr, exp)
        exp *= 10

# Driver code
arr = [170, 45, 75, 90, 802, 24, 2, 66]

# Function Call
radixSort(arr)

for i in range(len(arr)):
    print(arr[i])
13.6 SUMMARY
Merge sort is one of the most efficient sorting algorithms. It works on
the principle of Divide and Conquer.
The name "Quick Sort" comes from the fact that quick sort can, in practice, sort a list of data elements significantly faster (often two or three times faster) than other common O(n log n) algorithms such as merge sort and heapsort.
Radix sort is an integer sorting algorithm that sorts data with integer
keys by grouping the keys by individual digits that share the same
significant position and value (place value).
13.7 QUESTIONS
1. Explain Merge Sort with example.
2. Write python code for Merge Sort.
3. Explain Quick Sort with example.
4. Write python code for Quick Sort.
5. Explain Radix Sort with example.
6. Write python code for Radix Sort.
13.8 REFERENCE FOR FURTHER READING
https://brilliant.org/wiki/radix-sort/
https://www.techiedelight.com/merge-sort-singly-linked-list/
https://www.studytonight.com/post/use-quick-sort-to-sort-a-linear-linked-list
https://daveparillo.github.io/cisc187-reader/recursion/properties.html
https://medium.com/@frankzou4000/recursion-and-its-applications-4dc00ee94130
https://www.sanfoundry.com/python-program-implement-radix-sort/
14
BINARY TREES
Unit Structure:
14.0 Objective
14.1 Introduction
14.2 Tree Terminology
14.3 Types of Tree
14.4 Binary Tree
14.5 Implementation of Binary Tree
14.6 Traversing a Tree
14.7 Expression Trees
14.8 Heaps and Heapsort
14.9 Search Trees
14.10 Summary
14.11 Questions
14.12 Reference for further reading
14.0 OBJECTIVE
Trees reflect structural relationships in the data. Trees are used to represent hierarchies. Trees provide efficient insertion and searching. Trees are very flexible data structures, allowing subtrees to be moved around with minimum effort.
14.1 INTRODUCTION
We have all seen trees since our childhood. A tree has roots, stems, branches and leaves. It was observed long ago that each leaf of a tree can be traced back to the root via a unique path. Hence the tree structure is used to explain hierarchical relationships, e.g. a family tree, animal kingdom classification, etc.
This hierarchical structure of trees is used in computer science as an abstract data type for various applications like data storage, search and sort algorithms. Let us explore this data type in detail.
14.2 TREE TERMINOLOGY
Terminology - Description - Example from diagram
Parent Node: A parent node is the immediate predecessor of a node. (Example: B is the parent of D & E.)
Edge: An edge is a connection between one node and another. It is a line between two nodes, or between a node and a leaf. (Example: the line between A & B is an edge.)
Path / Traversing: A path is a number of successive edges from a source node to a destination node. (Example: A - B - E - J is a path from node A to node J.)
Height of Node: The height of a node is the number of edges on the longest path between that node and a leaf. (Example: A, B, C, D & E can have a height. The height of A is the number of edges between A and H, as that is the longest path, which is 3. The height of C is 1.)
Level of Node: The level of a node represents the generation of the node. If the root node is at level 0, then its child nodes are at level 1, its grandchildren are at level 2, and so on. (Example: the level of H, I & J is 3; the level of D, E, F & G is 2.)
Degree of Node: The degree of a node is the number of children of the node. (Example: the degree of D is 2 and of E is 1.)
14.3 TYPES OF TREE
1. General Tree
A general tree is a tree data structure where there are no constraints on the hierarchical structure.
Properties
1. Follow properties of a tree.
2. A node can have any number of children.
Usage
1. Used to store hierarchical data such as folder structures.
2. Binary Tree
A binary tree is a tree data structure where the following properties can be
found.
Properties
1. Follow properties of a tree.
2. A node can have at most two child nodes (children).
3. These two child nodes are known as the left child and right child.
Usage
1. Used by compilers to build syntax trees.
2. Used to implement expression parsers and expression solvers.
3. Used to store router-tables in routers.
3. Binary Search Tree
A binary search tree is a more constricted extension of a binary tree.
Properties
1. Follow properties of a binary tree.
2. Has a unique property known as the binary-search-tree property.
This property states that the value (or key) of the left child of a given
node should be less than or equal to the parent value and the value of
the right child should be greater than or equal to the parent value.
Usage
1. Used to implement simple sorting algorithms.
2. Can be used as priority queues.
3. Used in many search applications where data are constantly entering
and leaving.
4. AVL tree
An AVL tree is a self-balancing binary search tree. This is the first tree
introduced which automatically balances its height.
Properties
1. Follow properties of binary search trees.
2. Self-balancing.
3. Each node stores a value called a balance factor which is the
difference in height between its left subtree and right subtree.
4. All the nodes must have a balance factor of -1, 0 or 1.
After performing insertions or deletions, if there is at least one node that
does not have a balance factor of -1, 0 or 1 then rotations should be
performed to balance the tree (self-balancing).
Usage
1. Used in situations where frequent insertions are involved.
2. Used in Memory management subsystem of the Linux kernel to search
memory regions of processes during preemption.
5. Red-black tree
A red-black tree is a self-balancing binary search tree, where each node has
a colour; red or black. The colours of the nodes are used to make sure that
the tree remains approximately balanced during insertions and deletions.
Properties
1. Follow properties of binary search trees.
2. Self-balancing.
3. Each node is either red or black.
4. The root is black (sometimes omitted).
5. All leaves (denoted as NIL) are black.
6. If a node is red, then both its children are black.
7. Every path from a given node to any of its leaf nodes must go through
the same number of black nodes.
Usage
1. As a base for data structures used in computational geometry.
2. Used in the Completely Fair Scheduler used in current Linux kernels.
3. Used in the epoll system call implementation of Linux kernel.
6. Splay tree
A splay tree is a self-balancing binary search tree.
Properties
1. Follow properties of binary search trees.
2. Self-balancing.
3. Recently accessed elements are quick to access again.
After performing a search, insertion or deletion, splay trees perform an
action called splaying where the tree is rearranged (using rotations) so that
the particular element is placed at the root of the tree.
Usage
1. Used to implement caches
2. Used in garbage collectors.
3. Used in data compression
7. Treap
A treap (the name derived from tree + heap) is a binary search tree.
Properties
1. Each node has two values; a key and a priority.
2. The keys follow the binary-search-tree property.
3. The priorities (which are random values) follow the heap property.
Usage
1. Used to maintain authorization certificates in public-key cryptosystems.
2. Can be used to perform fast set operations.
8. B-tree
B tree is a self-balancing search tree and contains multiple nodes which
keep data in sorted order. Each node has 2 or more children and consists of
multiple keys.
Properties
1. Every node x stores the number of keys it holds, the keys themselves in increasing order, and a flag indicating whether x is a leaf.
2. The tree has a fixed integer t ≥ 2 called the minimum degree. Every node other than the root must have at least (t-1) keys and at most (2t-1) keys. Hence, every internal node other than the root has at least t children and at most 2t children. We say a node is full if it has (2t-1) keys.
Usage
1. Used in database indexing to speed up the search.
2. Used in file systems to implement directories.
Benefits of a Binary Tree
The search operation in a binary tree is faster as compared to other
trees
Only two traversals are enough to provide the elements in sorted order
It is easy to pick up the maximum and minimum elements
Graph traversal also uses binary trees
Converting between different postfix and prefix expressions is possible using binary trees
class Node:
    def __init__(self, data):
        self.left = None
        self.right = None
        self.data = data

    def PrintTree(self):
        print(self.data)

root = Node(10)
root.PrintTree()
Output
When the above code is executed, it produces the following result −
10
class Node:
    def __init__(self, data):
        self.left = None
        self.right = None
        self.data = data

    def insert(self, data):
        # Compare the new value with the parent node
        if self.data:
            if data < self.data:
                if self.left is None:
                    self.left = Node(data)
                else:
                    self.left.insert(data)
            elif data > self.data:
                if self.right is None:
                    self.right = Node(data)
                else:
                    self.right.insert(data)
        else:
            self.data = data

    # Print the tree
    def PrintTree(self):
        if self.left:
            self.left.PrintTree()
        print(self.data, end=" ")
        if self.right:
            self.right.PrintTree()

# Use the insert method to add nodes
root = Node(12)
root.insert(6)
root.insert(14)
root.insert(3)
root.PrintTree()
Output
When the above code is executed, it produces the following result −
3 6 12 14
14.6 TRAVERSING A TREE
A binary tree can be traversed in three common ways: inorder, preorder and postorder.
1. Inorder Traversal
An inorder traversal is a traversal technique that follows the policy Left Root Right: the left subtree is traversed first, then the root node, and finally the right subtree.
Let's understand the inorder traversal through an example.
Consider the below tree for the inorder traversal.
First, we will visit the left part, then root, and then the right part of
performing the inorder traversal. In the above tree, A is a root node, so we
move to the left of the A, i.e., B. As node B does not have any left child so
B will be printed as shown below:
After visiting node B, we move to the right child of node B, i.e., D. Since
node D is a leaf node, so node D gets printed as shown below:
The left part of node A is traversed. Now, we will visit the root node, i.e.,
A, and it gets printed as shown below:
Once the traversing of left part and root node is completed, we move to
the right part of the root node. We move to the right child of node A, i.e.,
C. The node C has also left child, i.e., E and E has also left child, i.e., G.
Since G is a leaf node, so G gets printed as shown below:
Since E does not have any right child, so we move to the root of the E
node, i.e., C. C gets printed as shown below:
Once the left part of node C and the root node, i.e., C, are traversed, we
move to the right part of Node C. We move to the node F and node F has a
left child, i.e., H. Since H is a leaf node, so it gets printed as shown below:
Now we move to the root node of H, i.e., F and it gets printed as shown
below:
After visiting the F node, we move to the right child of node F, i.e., I, and
it gets printed as shown below:
class Node:
    def __init__(self, data):
        self.left = None
        self.right = None
        self.data = data

    # Insert Node
    def insert(self, data):
        if self.data:
            if data < self.data:
                if self.left is None:
                    self.left = Node(data)
                else:
                    self.left.insert(data)
            elif data > self.data:
                if self.right is None:
                    self.right = Node(data)
                else:
                    self.right.insert(data)
        else:
            self.data = data

    # Print the Tree
    def PrintTree(self):
        if self.left:
            self.left.PrintTree()
        print(self.data, end=" ")
        if self.right:
            self.right.PrintTree()

    # Inorder traversal
    # Left -> Root -> Right
    def inorderTraversal(self, root):
        res = []
        if root:
            res = self.inorderTraversal(root.left)
            res.append(root.data)
            res = res + self.inorderTraversal(root.right)
        return res

root = Node(27)
root.insert(14)
root.insert(35)
root.insert(10)
root.insert(19)
root.insert(31)
root.insert(42)
print(root.inorderTraversal(root))
When the above code is executed, it produces the following result −
[10,14,19,27,31,35,42]
2. Preorder Traversal
A preorder traversal is a traversal technique that follows the policy,
i.e., Root Left Right. Here, Root Left Right means root node of the tree is
traversed first, then the left subtree and finally the right subtree is
traversed. Here, the Preorder name itself suggests that the root node would
be traversed first.
To perform the preorder traversal, we first visit the root node, then the left
part, and then we traverse the right part of the root node. As node A is the
root node in the above tree, so it gets printed as shown below:
Once the root node is traversed, we move to the left subtree. In the left
subtree, B is the root node for its right child, i.e., D. Therefore, B gets
printed as shown below:
Since node B does not have a left child, and it has only a right child;
therefore, D gets printed as shown below:
Once the left part of the root node A is traversed, we move to the right part
of node A. The right child of node A is C. Since C is a root node for all the
other nodes; therefore, C gets printed as shown below:
Now we move to the left child, i.e., E of node C. Since node E is a root
node for node G; therefore, E gets printed as shown below:
The node E has a left child, i.e., G, and it gets printed as shown below:
Since the left part of the node C is completed, so we move to the right part
of the node C. The right child of node C is node F. The node F is a root
node for the nodes H and I; therefore, the node F gets printed as shown
below:
Once the node F is visited, we will traverse the left child, i.e., H of node F
as shown below:
Now we will traverse the right child, i.e., I of node F, as shown below:
class Node:
    def __init__(self, data):
        self.left = None
        self.right = None
        self.data = data

    # Insert Node
    def insert(self, data):
        if self.data:
            if data < self.data:
                if self.left is None:
                    self.left = Node(data)
                else:
                    self.left.insert(data)
            elif data > self.data:
                if self.right is None:
                    self.right = Node(data)
                else:
                    self.right.insert(data)
        else:
            self.data = data

    # Preorder traversal
    # Root -> Left -> Right
    def PreorderTraversal(self, root):
        res = []
        if root:
            res.append(root.data)
            res = res + self.PreorderTraversal(root.left)
            res = res + self.PreorderTraversal(root.right)
        return res

root = Node(27)
root.insert(14)
root.insert(35)
root.insert(10)
root.insert(19)
root.insert(31)
root.insert(42)
print(root.PreorderTraversal(root))
[27,14,10,19,35,31,42]
3. Postorder Traversal
A Postorder traversal is a traversal technique that follows the
policy, i.e., Left Right Root. Here, Left Right Root means the left subtree
of the root node is traversed first, then the right subtree, and finally, the
root node is traversed. Here, the Postorder name itself suggests that the
root node of the tree would be traversed at the last.
Let's understand the Postorder traversal through an example.
Consider the below tree for the Postorder traversal.
To perform the postorder traversal, we first visit the left part, then the right
part, and then we traverse the root node. In the above tree, we move to the
left child, i.e., B of node A. Since B is a root node for the node D;
therefore, the right child, i.e., D of node B, would be traversed first and
then B as shown below:
Once the traversing of the left subtree of node A is completed, then the
right part of node A would be traversed. We move to the right child of
node A, i.e., C. Since node C is a root node for the other nodes, so we
move to the left child of node C, i.e., node E. The node E is a root node,
and node G is a left child of node E; therefore, the node G is printed first
and then E as shown below:
Once the traversal of the left part of the node C is traversed, then we move
to the right part of the node C. The right child of node C is node F. Since F
is also a root node for the nodes H and I; therefore, the left child 'H' is
traversed first and then the right child 'I' of node F as shown below:
Once the left part and the right part of node C are traversed, then the node
C is traversed as shown below:
In the above tree, the left subtree and the right subtree of root node A have
been traversed, the node A would be traversed.
class Node:
    def __init__(self, data):
        self.left = None
        self.right = None
        self.data = data

    # Insert Node
    def insert(self, data):
        if self.data:
            if data < self.data:
                if self.left is None:
                    self.left = Node(data)
                else:
                    self.left.insert(data)
            elif data > self.data:
                if self.right is None:
                    self.right = Node(data)
                else:
                    self.right.insert(data)
        else:
            self.data = data

    # Print the Tree
    def PrintTree(self):
        if self.left:
            self.left.PrintTree()
        print(self.data, end=" ")
        if self.right:
            self.right.PrintTree()

    # Postorder traversal
    # Left -> Right -> Root
    def PostorderTraversal(self, root):
        res = []
        if root:
            res = self.PostorderTraversal(root.left)
            res = res + self.PostorderTraversal(root.right)
            res.append(root.data)
        return res

root = Node(27)
root.insert(14)
root.insert(35)
root.insert(10)
root.insert(19)
root.insert(31)
root.insert(42)
print(root.PostorderTraversal(root))
[10,19,14,31,42,35,27]
14.7 EXPRESSION TREES
An expression tree is a binary tree in which each internal node is an operator and each leaf node is an operand. For example, consider the expression:
a + (b * c) + d * (e + f)
Expression Tree
All of the below are also expressions. Expressions may include constant values as well as variables:
a*6
16
(a^2)+(b^2)+(2 * a * b)
(a/b) + (c)
m * (c ^ 2)
Thus, an expression tree is constructed by reading the symbols or numbers of the expression from the left. If the symbol is an operand, we create a one-node tree and push a pointer to it onto a stack. If the symbol is an operator, we pop pointers to two trees from the stack, create a new tree with the operator as the root and the two popped trees as its subtrees, and push a pointer to the new tree onto the stack.
The first two symbols are operands, so we create one-node trees and push pointers to them onto the stack.
Next, a '+' symbol is read, so two tree pointers are popped, a new tree is formed, and a pointer to it is pushed onto the stack.
Next, 'c' is read; we create a one-node tree and push a pointer to it onto the stack.
Finally, the last symbol, '*', is read; we pop two tree pointers and form a new tree with '*' as the root, and a pointer to the final tree remains on the stack.
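A minimal Python sketch of this stack-based construction for a postfix expression follows; the class and function names are illustrative assumptions.

class ExprNode:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def build_expression_tree(postfix_tokens):
    # Build an expression tree from postfix tokens, e.g. ['a', 'b', '+', 'c', '*'].
    stack = []
    for token in postfix_tokens:
        if token in '+-*/^':
            right = stack.pop()             # operator: pop two subtrees and make them
            left = stack.pop()              # the children of a new operator node
            stack.append(ExprNode(token, left, right))
        else:
            stack.append(ExprNode(token))   # operand: push a one-node tree
    return stack.pop()                      # pointer to the final tree

def inorder(node):
    # An inorder walk of the tree prints the infix form (without parentheses).
    if node:
        inorder(node.left)
        print(node.value, end=' ')
        inorder(node.right)

root = build_expression_tree(list('ab+c*'))  # the (a + b) * c tree built above
inorder(root)                                # prints: a + b * c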
Examples
Expression trees are used to represent expressions. Let us look at some examples of prefix, infix and postfix expressions obtained from the expression trees of three expressions:
a*b+c
a+b*c+d
a+b-c*d+e*f
Infix, Prefix and Postfix Expressions from Expression Tree for a+b*c+d
Expression Tree for a + b * c + d can be represented as:
Infix expression: a + b * c + d
Prefix expression: + + a * b c d
Postfix expression: a b c * + d +
Infix, Prefix and Postfix Expressions from Expression Tree for a+b-
c*d+e*f
Expression Tree for a + b - c * d + e * f can be represented as:
Infix expression: a + b - c * d + e * f
Prefix expression: + - + a b * c d * e f
Postfix expression: a b + c d * - e f * +
14.8 HEAPS AND HEAPSORT
A heap is a tree-based data structure in which all the nodes of the tree are in a specific order.
For example, if X is the parent node of Y, then the value of X follows a
specific order with respect to the value of Y and the same order will be
followed across the tree.
The maximum number of children of a node in a heap depends on the type
of heap. However, in the more commonly-used heap type, there are at
most 2 children of a node and it's known as a Binary heap.
In a binary heap, if the heap is a complete binary tree with N nodes, then it has the smallest possible height, which is ⌊log₂N⌋.
In the diagram above, you can observe a particular sequence, i.e. each node has a greater value than any of its children.
Suppose there are N Jobs in a queue to be done, and each job has its own
priority. The job with maximum priority will get completed first than
others. At each instant, we are completing a job with maximum priority
and at the same time we are also interested in inserting a new job in the
queue with its own priority.
So at each instant we have to check for the job with maximum priority to
complete it and also insert if there is a new job. This task can be very
easily executed using a heap by considering N jobs as N nodes of the tree.
As you can see in the diagram below, we can use an array to store the
nodes of the tree. Let’s say we have 7 elements with values {6, 4, 5, 3, 2,
0, 1}.
Note: An array can be used to simulate a tree in the following way. If we are storing one element at index i in array Arr, then its parent will be stored at index i/2 (unless it is the root, as the root has no parent) and can be accessed by Arr[i/2]; its left child can be accessed by Arr[2*i] and its right child by Arr[2*i+1]. The index of the root will be 1 in the array.
Implementation:
Let’s assume that we have a heap with some elements stored in array Arr. The way to convert this array into a heap structure is the following: we pick a node in the array and check whether the left sub-tree and the right sub-tree are max heaps in themselves and whether the node itself forms a max heap (its value should be greater than the values of all its children).
To do this we will implement a function that can maintain the property of a max heap (i.e. each element's value should be greater than or equal to that of any of its children and smaller than or equal to that of its parent).
void max_heapify(int Arr[], int i, int N)
{
    int left = 2*i;                              // left child
    int right = 2*i + 1;                         // right child
    int largest;
    if (left <= N and Arr[left] > Arr[i])
        largest = left;
    else
        largest = i;
    if (right <= N and Arr[right] > Arr[largest])
        largest = right;
    if (largest != i)
    {
        swap(Arr[i], Arr[largest]);              // move the larger child up
        max_heapify(Arr, largest, N);            // and fix the affected subtree
    }
}
Complexity: O(logN)
Example:
In the diagram below,initially 1st node (root node) is violating property of
max-heap as it has smaller value than its children, so we are performing
max_heapify function on this node having value 4.
Let's observe this with an example:
Let's take the above example of 7 elements having values {8, 7, 6, 3, 2, 4, 5}.
void build_maxheap(int Arr[])
{
    for (int i = N/2; i >= 1; i--)
    {
        max_heapify(Arr, i, N);
    }
}
Complexity: O(N). The max_heapify function has complexity O(logN) and build_maxheap calls it about N/2 times, which suggests O(N logN); however, a tighter (amortized) analysis shows that building the heap actually takes linear time.
Example:
Suppose you have 7 elements stored in array Arr.
Here N=7, so starting from node having index N/2=3, (also having value 3
in the above diagram), we will call max_heapify from index N/2 to 1.
In the diagram below:
In step 1, in max_heapify(Arr, 3), as 10 is greater than 3, 3 and 10 are
swapped and further call to max_heap(Arr, 7) will have no effect as 3 is a
leaf node now.
Step 4 is a subpart of step 3, as after swapping 1 with 10, again a recursive
call of max_heapify(Arr, 3) will be performed , and 1 will be swapped
with 9. Now further call to max_heapify(Arr, 7) will have no effect, as 1 is
a leaf node now.
In step 5, we finally get a max-heap and the elements in the array Arr will be:
2. Min Heap: In this type of heap, the value of parent node will always
be less than or equal to the value of child node across the tree and the
node with lowest value will be the root node of tree.
As you can see in the above diagram, each node has a value
smaller than the value of their children.
We can perform same operations as performed in building max_heap.
First we will make function which can maintain the min heap property, if
some element is violating it.
void min_heapify(int Arr[], int i, int N)
{
    int left = 2*i;
    int right = 2*i + 1;
    int smallest;
    if (left <= N and Arr[left] < Arr[i])
        smallest = left;
    else
        smallest = i;
    if (right <= N and Arr[right] < Arr[smallest])
        smallest = right;
    if (smallest != i)
    {
        swap(Arr[i], Arr[smallest]);
        min_heapify(Arr, smallest, N);
    }
}
Complexity: O(logN).
Example:
Suppose you have elements stored in array Arr {4, 5, 1, 6, 7, 3, 2}. As you can see in the diagram below, the element at index 1 is violating the property of the min-heap, so performing min_heapify(Arr, 1) will maintain the min-heap.
Now let’s use above function in building min-heap. We will run the above
function on remaining nodes other than leaves as leaf nodes are 1 element
heap.
void build_minheap(int Arr[])
{
    for (int i = N/2; i >= 1; i--)
        min_heapify(Arr, i, N);
}
Heaps can be considered partially ordered trees: as you can see in the above examples, the nodes of the tree do not follow any order with respect to their siblings (nodes on the same level). Heaps are mainly used when we give more priority to the smallest or the largest node in the tree, as we can extract these nodes very efficiently using heaps.
Heap Sort:
We can use heaps in sorting the elements in a specific order in efficient
time.
Let’s say we want to sort elements of array Arr in ascending order. We
can use max heap to perform this operation.
Idea: We build the max heap of elements stored in Arr, and the maximum
element of Arr will always be at the root of the heap.
Leveraging this idea we can sort an array in the following manner.
Processing:
Initially we will build a max heap of elements in Arr.
Now the root element that is Arr[1] contains maximum element
of Arr. After that, we will exchange this element with the last
element of Arr and will again build a max heap excluding the last
element which is already in its correct position and will decrease
the length of heap by one.
We will repeat step 2 until we get all the elements in their
correct position.
We will get a sorted array.
Implementation:
Suppose there are N elements stored in array Arr.
void heap_sort(int Arr[])
{
    int heap_size = N;
    build_maxheap(Arr);
    for (int i = N; i >= 2; i--)
    {
        swap(Arr[1], Arr[i]);               // move the current maximum to the end
        heap_size = heap_size - 1;
        max_heapify(Arr, 1, heap_size);     // restore the max-heap on the rest
    }
}
Example:
In the diagram below,initially there is an unsorted array Arr having 6
elements. We begin by building max-heap.
After building max-heap, the elements in the array Arr will be:
Processing:
Step 1: 8 is swapped with 5.
Step 2: 8 is disconnected from heap as 8 is in correct position now.
Step 3: Max-heap is created and 7 is swapped with 3.
Step 4: 7 is disconnected from heap.
Step 5: Max heap is created and 5 is swapped with 1.
Step 6: 5 is disconnected from heap.
Step 7: Max heap is created and 4 is swapped with 3.
Step 8: 4 is disconnected from heap.
Step 9: Max heap is created and 3 is swapped with 1.
Step 10: 3 is disconnected.
After all the steps, we will get a sorted array.
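The pseudocode above uses 1-based array indices. Since the rest of this book works in Python, a 0-based Python translation of max_heapify, build_maxheap and heap_sort is sketched below; the names mirror the pseudocode, but this is a sketch rather than the book's own listing.

def max_heapify(arr, i, n):
    # Sift arr[i] down until the subtree rooted at i satisfies the max-heap property.
    left, right = 2 * i + 1, 2 * i + 2       # 0-based children of index i
    largest = i
    if left < n and arr[left] > arr[largest]:
        largest = left
    if right < n and arr[right] > arr[largest]:
        largest = right
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]
        max_heapify(arr, largest, n)

def build_maxheap(arr):
    n = len(arr)
    for i in range(n // 2 - 1, -1, -1):      # from the last non-leaf node down to the root
        max_heapify(arr, i, n)

def heap_sort(arr):
    build_maxheap(arr)
    for end in range(len(arr) - 1, 0, -1):
        arr[0], arr[end] = arr[end], arr[0]  # move the current maximum to its final place
        max_heapify(arr, 0, end)             # restore the heap on the remaining prefix

arr = [8, 7, 6, 3, 2, 4, 5]
heap_sort(arr)
print(arr)                                   # [2, 3, 4, 5, 6, 7, 8]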
14.9 SEARCH TREES
A Binary Search Tree is a binary tree in which every node contains only
smaller values in its left subtree and only larger values in its right
subtree.
In a binary search tree, all the nodes in the left subtree of any node
contains smaller values and all the nodes in the right subtree of any node
contains larger values as shown in the following figure...
Example
The following tree is a Binary Search Tree. In this tree, left subtree of
every node contains nodes with smaller values and right subtree of every
node contains larger values.
Every binary search tree is a binary tree, but every binary tree need not be a binary search tree.
Operations on a Binary Search Tree
The following operations are performed on a binary search tree...
1. Search
2. Insertion
3. Deletion
Search Operation in BST
In a binary search tree, the search operation is performed with O(log n) average-case time complexity (it degrades to O(n) for a skewed tree). The search operation is performed as follows...
Step 1 - Read the search element from the user.
Step 2 - Compare the search element with the value of root node in
the tree.
Step 3 - If both are matched, then display "Given node is found!!!"
and terminate the function
Step 4 - If both are not matched, then check whether search
element is smaller or larger than that node value.
Step 5 - If search element is smaller, then continue the search
process in left subtree.
Step 6- If search element is larger, then continue the search
process in right subtree.
Step 7 - Repeat the same until we find the exact element or until the search element is compared with a leaf node.
Step 8 - If we reach the node having the value equal to the search value, then display "Element is found" and terminate the function.
Step 9 - If we reach a leaf node and it is also not matched with the search element, then display "Element is not found" and terminate the function.
Example: construct a Binary Search Tree by inserting the following sequence of elements-
10, 12, 5, 4, 20, 8, 7, 15 and 13
The above elements are inserted into a Binary Search Tree as follows...
class Node:
    def __init__(self, data):
        self.left = None
        self.right = None
        self.data = data

    # Insert method to create nodes
    def insert(self, data):
        if self.data:
            if data < self.data:
                if self.left is None:
                    self.left = Node(data)
                else:
                    self.left.insert(data)
            elif data > self.data:
                if self.right is None:
                    self.right = Node(data)
                else:
                    self.right.insert(data)
        else:
            self.data = data

    # findval method to compare the value with nodes
    def findval(self, lkpval):
        if lkpval < self.data:
            if self.left is None:
                return str(lkpval) + " Not Found"
            return self.left.findval(lkpval)
        elif lkpval > self.data:
            if self.right is None:
                return str(lkpval) + " Not Found"
            return self.right.findval(lkpval)
        else:
            return str(self.data) + ' is found'

    # Print the tree
    def PrintTree(self):
        if self.left:
            self.left.PrintTree()
        print(self.data, end=" ")
        if self.right:
            self.right.PrintTree()

root = Node(12)
root.insert(6)
root.insert(14)
root.insert(3)
print(root.findval(7))
print(root.findval(14))
Output
When the above code is executed, it produces the following result –
7 Not Found
14 is found
14.10 SUMMARY
A tree is a hierarchical data structure defined as a collection of
nodes. Nodes represent value and nodes are connected by edges.
A Full Binary Tree is a special kind of binary tree in which every node has either zero or two children.
A complete binary tree is another specific type of binary tree where all
the tree levels are filled entirely with nodes, except the lowest level of
the tree.
In a binary tree, every node can have a maximum of two children, but there is no need to maintain any ordering of nodes based on their values. In a binary tree, the elements are arranged in the order they arrive at the tree, from top to bottom and left to right.
14.11 QUESTIONS
1. Explain all tree terminology.
2. What are the types of tree?
3. Write a short note on AVL tree.
4. What is Binary Tree? And How it is different than general tree?
5. What are the types of Binary Tree?
6. Write a short note on Tree Traversal.
7. What do you mean by Expression Tree?
8. Write a short note on Heap and its types.
9. How does Heap Sort work?
10. Write a short note on Binary Search Tree.
14.12 REFERENCE FOR FURTHER READING
https://towardsdatascience.com/8-useful-tree-data-structures-worth-knowing-8532c7231e8c
https://www.upgrad.com/blog/5-types-of-binary-tree/
https://www.krivalar.com/data-structures-expression-tree
https://www.hackerearth.com/practice/data-structures/trees/heapspriority-queues/tutorial/
http://www.btechsmartclass.com/data_structures/binary-search-tree.html