DATA STRUCTURE NOTES
WEEK 7: TREE:
Linear data structures
• Data elements are arranged in sequential manner
• There are at most two elements connected to an element
• There is no concept of hierarchy – all data elements are on the same level
Non-linear data structures
• Hierarchical - parent-child relationship possible between data elements
• An element can be connected to more than two elements
• Tree is an example of non-linear data structures
What is Tree?
o Non-linear data structure (tree, graphs, etc)
o Represents data in hierarchical form
o An undirected graph is a tree if and only if there is a unique simple path between any two of its vertices.
o Every Tree is a Graph ,but every Graph is not a tree.
o A tree is a specific type of graph.
o Connected: Every pair of nodes is connected by exactly one unique path (no cycles).
o Acyclic: No cycles exist in Trees, But Graph may be cyclic
Feature Tree Graph
A connected acyclic graph with a
A collection of vertices and edges, may
Definition root node and hierarchical
contain cycles
structure
Exactly one path between any
Multiple paths between vertices, may have
Connectivity pair of vertices, no disconnected
disconnected components
components
Acyclicity Always acyclic (no cycles) May be acyclic or cyclic
Used in hierarchical data
Used to model various real-world scenarios
Applications structures, e.g., file systems,
like social networks, transportation networks
organizational charts
In an unbalanced tree, some branches may have significantly more nodes than others, resulting in inefficient
operations.
In a balanced tree, each branch maintains a relatively equal number of nodes, ensuring that the height of the tree
remains small and operations are efficient
A self-balancing tree is a type of tree data structure that automatically adjusts its shape during insertions and
deletions to maintain balance. Examples of self-balancing trees include AVL trees and red-black trees.
Tree data structure allows faster search algorithms, therefore database indexes are stored using trees
XML/HTML also follows tree structure
Trees terminology
o A tree is a collection of nodes
o The nodes are connected through edges
o A non-empty tree has one special node called the root
o There is a parent-child relationship among nodes of a tree
o Any node may have a parent and/or child nodes
Root node does not have a parent node
o A node can have multiple child nodes but has only one parent node
o The nodes with a common parent are sibling nodes
o A node with no children is called leaf node
o A tree can be thought of as a recursive data structure having sub-trees
A child node can be considered as the root of a sub-tree
o Length of a path between two nodes in the number of edges between them
o Depth of a node is the length of the path from the root to that node
o Height of a node is the length of the longest path from this node to a leaf node
Tree traversal
o Traversal: Processing each item in a collection
o Tree traversal: Processing each node of the tree
o Uses of traversal
Printing data (key) in all nodes in the tree
Searching for a particular key
Counting the number of nodes
Determining the height/depth of the tree
How to traverse a Tree?
List traversal was easy
• It’s a linear structure, so we just sequentially access all elements
Tree is a non-linear structure
• We can’t just start from one end and sequentially Process all nodes
• We have to make use of a key property of trees
The parent-child relationship between nodes (sub-tree)
• Tree traversal is done using stack or recursion
Tree types
Binary Tree:
o A binary tree is a hierarchical data structure in which each node has at most two children, referred to as
the left child and the right child.
o Nodes in a binary tree can have zero, one, or two children.
o Binary trees are used in various applications, including binary search trees and expression trees.
Binary Search Tree (BST):
o A binary search tree is a specific type of binary tree in which the left child of a node contains values
smaller than the parent node's value, and the right child contains values greater than or equal to the node's
value.
o Binary search trees are commonly used in search algorithms and database indexing
AVL (Adelson-Velsky and Landis)Tree:
o An AVL tree is a self-balancing binary search tree in which the heights of the two child subtrees of any
node differ by at most one.
o AVL trees maintain balance during insertions and deletions to ensure efficient performance of operations.
o AVL trees are used in scenarios where efficient search, insertion, and deletion operations are crucial, such
as in databases and compilers.
o Height of left sub tree – height of right sub tree is {-1, 0, or 1 }(Balanced Factor)
o For Each this balance factor is calculated
o No duplicate element in BST
o If balance factor is not {-1, 0, or 1 } then you have to balance it
Red-Black Tree:
o A red-black tree is another type of self-balancing binary search tree that maintains balance by ensuring
certain properties are preserved during insertions and deletions.
o Each node in a red-black tree is assigned a color (red or black), and the tree is balanced according to
specific rules related to the colors of nodes and their positions.
o Red-black trees are commonly used in implementations of associative arrays and in-memory databases.
Binary Search Tree
o BSTs allows for efficient searching, insertion, and deletion operations.
o When performing a search in a BST, the tree's structure enables a "divide and conquer" approach, where
the search space is divided in half at each step, similar to the process used in binary search on sorted
arrays.
o Running time of basic operations on binary search trees
o On average: (lgn)
The expected height of the tree is lgn
o In the worst case: (n)
The tree is a linear chain of n nodes 5
o BST property
o The left child of parent node is less than the parent node
o The right child of the parent node is greater than or equal to the parent node
o Both left and right subtree must be Binary Search Trees
o Inorder tree walk: 3 7
o Prints the keys of a binary tree in sorted order
o Root is printed between the values of its left and right subtrees: left, root, right
o Preorder tree walk:
o root printed first: root, left, right
2 5 9
o Postorder tree walk: left, right, root
o root printed last
Inorder: 2 3 5 5 7 9
Searching for a key
Given a pointer to the root of a tree and a key k:
Preorder: 5 3 2 5 7 9
Return a pointer to a node with key k Postorder: 2 5 3 9 7 5
if one exists
Otherwise return NIL
Alg: TREE-SEARCH(x, k)
1. if x = NIL or k = key [x]
2. then return x
3. if k < key [x]
4. then return TREE-SEARCH(left [x], k )
5. else return TREE-SEARCH(right [x], k )
Running Time: O (h), h – the height of the tree
What is the time complexity of find?
Average case (general case)
n/2k = 1
n = 2k
Take log2 of both sides
log2(n) = log2 (2k)
log2(n) = k log2(2)
log2(n) = k
k is the number of basic operations (comparisons/recursive calls) performed to find the required node in the tree,
and it is equal to log (n) where n is the number of nodes in the tree
Hence complexity of find = O(log(n))
Stack
A stack is a linear list in which insertion and deletion operations are performed at only one end of the list.Thus you are
able to insert as well as delete the element from only one end of the stack. Since insertion and deletion operations are
performed at same end, the element which is inserted Last is first to delete. A deck of cards or a pile of plates, A real-
world stack allows operations at one end only. LIFO stands for Last-In-First-Out. Here, the element which is placed
(inserted or added) last, is accessed first. In stack terminology, insertion operation is called PUSH operation and removal
operation is called POP operation.
PUSH(STACK, TOP, SIZE, ITEM)
This procedure pushes an ITEMS onto a stack.
IF (TOP == SIZE) then
Print “OVERFLOW”
Return
ELSE
Set TOP: = TOP + 1
[Increase TOP by 1.]
Set STACK[TOP] := ITEM.
[ Inserts ITEM in new TOP Position.]
Return.
POP(STACK, TOP, ITEM)
IF TOP = 0, then:
Write: UNDERFLOW
Return
ENDIF
Set ITEM := STACK[TOP]. [Assigns TOP elements to ITEM.]
Set TOP := TOP - 1.
[Decreases TOP elements to ITEM.]
4. Return.
Polish Notation:
Queue is a linear Data Structure in which insertion operation is performed at one end called Rear and deletion operation is
performed at another end called front. Queue is ordered collection of homogeneous data elements in which insertion and
deletion operation take place at two end . insertion allowed from starting of queue called REAR point and deletion
allowed from FRONT end only insertion operation is called ENQUEUE, deletion operation is called DEQUEUE Like
Stack, Queue is also an ordered list of elements of similar data types. Queue is a FIFO( First in First Out ) structure. The
new element is added to the end of the queue, and the elements already present in the queue maintain their relative order.
1.Queue is empty
FRONT=REAR (Front & Rear = -1)
2.Queue is full
REAR=N //N size of Queue
3.Queue contains element >=1
FRONT<REAR
NO. OF ELEMENT = REAR-FRONT+1
This means that after the last element, the next element is placed at the beginning of the queue if there is space available,
effectively forming a circular structure.
Deque stands for double ended queue. It Allows elements to be added or removed on either the ends (front or rear). Deque
is also known as head-tail linked list.
What is a list?
A linear collection of similar type of data
Can there be any other (non-linear) forms of data structure?
Yes. Trees and Graphs are non-linear data structures
Trees and Graphs are hierarchical data structures
What is Data Structure?
Data Structure is a mathematical or logical way of organizing data in memory. Data structures let the input and output be
represented in a way that can be handled efficiently and effectively. Data Structure represent data in memory and also
represents the relationship among the data that is stored in memory.
Rich Structure =>Ability to model real world
Simple =>Easier Processing
There are various operations that can be performed on Data Structure:
Traversal Insertion Deletion Searching Sorting Merging
Basically Data Structure can be classified into two categories:
(1) Primitive Data Structure
"The Data Structure which is directly operated by machine level instruction is known as Primitive Data Structure.”
All primary (built in) data types are known as Primitive data Structure.
Following are Primitive Data Structure:
Integer Floating Point Character
(2) Non Primitive Data Structure
"The Data Structure which is not directly operated by machine level instruction is known as Non Primitive Data
Structure."
Non Primitive Data Structure is derived from Primitive Data Structure. Non Primitive Data Structure are classified into
two categories:
(1) Linear Data Structure
"The Data Structure in which elements are arranged such that we can process / accessed them in linear fashion
(sequentially) is called linear data structure.“
Following are Examples of Linear Data Structure:
Array Stack Queue Linked List
(2) Non Linear Data Structure
"The Data Structure in which elements are arranged such that we can not process them in linear fashion
(sequentially) is called Non-Linear data structure.“
Non-linear data structures are those in which elements are not arranged in a sequential or linear order. Instead,
elements are organized in a more complex manner, often involving branching or hierarchical relationships.
Non Linear Data Structures are useful to represent more complex relationships as compared to Linear Data
Structures.
Following are Examples of Non-Linear Data Structure:
Tree Graph Hash map Hash tree
Asymptotic Notations are used to describe the complexity of an algorithm. Complexity of an algorithm indicates
how much time needed by an algorithm to complete its execution for given set of inputdata.
Big - O Notation is used to describe the worst case Time and Space complexity of an algorithm.
It means how much time is needed by an algorithm to complete its execution for the input size of N in worst
situations.