Data Structure
Fall 2024
Instructor: Prof. John Yun
Lecture: 9-12P, Wednesdays; S2-402
Department of AI Big Data
Woosong University
Announcements
• Quiz 2 next week
• Topics on Sorting, Index and Binary Search Tree
• Special 3-Week Venture Program (with Babson Academy)
• Will be offered to all SolBridge ISB/Woosong University Students
• Will be a hybrid course (campus/online lectures)
• You will receive 3 credit units
• You will receive an official Babson Venture Program Certifications
• Index
• Binary Search Tree
• Binary Search Tree
Algorithm & Implementation
Index
Week 12
Fall 2024
Index
Searching…
• The Era of Data Deluge
• (2.5*1018) = 25,000,000,000,000,000,000 bytes of data created
daily
• 1 Terabyte Disk = 2,500,000 Disks
• Searching…
• Index: At the end of a book
• Website
• File
• Record
Searching in our lives
Record, Key, Search Tree
• Record
• Including everything about an object
• Example: Personal Record
• National ID (Social Security) Number
• Name
• Student ID
• Home Address
• Home Telephone Number Each of these data is
• Work Phone Number, called Field
• Personal Mobile Number
• Education Level
• Annual Income
• Family Members
• The purpose of the index is to search for the record of the object.
Key
• Index is made of a field that represents the record
• Field that uniquely identifying a record
• We call such field a Key.
• In the Personal Record example: disk
• The Name CANNOT be the key
• Fields that CAN be the key:
record
• Student ID
• National ID
• Mobile Number
• Structure of the Index:
• Key, Page#
• Key + Page(or block #) of the record that uses the Key
Binary Search Tree, AVL-Tree, Red-Black Tree
Binary Search Tree
Binary Search Tree AVL Tree
Built-in Index (Memory)
Red-Black Tree
Search Tree
Balanced Search Tree
2-3-4 Tree
Hashed Search Tree
B Tree External Index (Memory)
ADT: Index
• Insert key x into the index
• Search for key x in the index
• Delete key x from the index
• Check if the index is empty
• Clears the index
Data Structure for the Index
• Binary search tree
• Balanced binary search tree
• AVL tree, Red-Black tree
• B-tree
• Hash table
We can use the array for the index, but very inefficient,
It’s not practical.
(a, p1) (b, p2) … (x, pn)
• Index
• Binary Search Tree
• Binary Search Tree
Algorithm & Implementation
Binary Search Tree
Week 12
Fall 2024
Tree
root
40
root
If you reverse it, it looks like a tree
Search Tree
40
key
10 55
7 20 70
17 All nodes have a key value (for searching)
Search Tree
(a) Root node with the key value 30 (b) Root node with the key value 40
Example of Binary Search Trees
K-ary Search Tree
40
Ternary Search Tree
K=3, Maximum of 3 Branches
10 55
40 60
7 20 70
10 30 50 65
17
7 20 35 61 70 80
Binary Search Tree
K=2, Maximum of 2 Branches
32
Binary Search Tree
• All nodes have a unique key
• Each node has a maximum of 2 child (max. of 2 branches)
• For the key of an arbitrary node:
• Is bigger than all key values of the nodes in the left subtree,
• Is smaller than all key values of the nodes in the right subtree
TR
TL
Subtree r
40
10 55
7 20 70
Left Subtree of node r Right subtree of node r
17
10
55
7 20
70
17
Expression can be represented
with the binary search tree
a-b a–b*c (a + b) / c
- - /
a b a * + c
b c a b
Array-based Binary Search Tree Representation
leftChild rightChild
0 40 3 1
1 55 -1 2
0 2 70 -1 -1
40
3 10 6 4
3 1 4 20 5 -1
10 55
5 17 -1 -1
6 4 2 6 7 -1 -1
7 20 70
7 ?
8 ?
5
17
… … … …
Structure of (Binary Search Tree)
Node Object
(a) Binary Search Tree Node (b) Actual Binary Search Tree Node
Structure of (Binary Search Tree)
Node Object
Object-Oriented Binary Search Tree Node Representation
Structure of (Binary Search Tree)
Node Object
__ root → root of the binary search tree
search(x)→ search element x from the binary search tree
insert(x) → insert element x into the binary search tree
delete(x) → delete element x from the binary search tree
isEmpty()→ check if the binary search tree is empty
clear() → clears the binary search tree
Link-based Binary Search Tree Representation
Typical Binary Search Tree
Relationship between Binary Search Tree object and Binary Search Tree represented with tree nodes
Same Data, Different Binary Search Tree
20
Optimal Shape
17 55
55
7 10 40 70 20 70
17 40
7 10
It could look like this…
20 7 Worst Case Shape
10
17 40
17
7 10 55
20
70 40
55
70
• Index
• Binary Search Tree
• Binary Search Tree
Algorithm &
Binary Search Tree Implementation
Algorithm &
Implementation
Week 12
Fall 2024
Search (of a Binary Search Tree)
search(t, x): # t: root reference of the subtree
# x: the key you’re searching for
if (t = null || t.item = x)
return t
else if (x < t.item)
return search(t.left, x) # t.left: left children of the t
else
return search(t.right, x) # t.right: right children of the t
t.left t.right
Solving it recursively!
Recursive Perspective on Search
search(t, x)
t
search(t.left, x)
t.left
search(t, x):
if ( t = null || t.item = x )
return t
else if (x < t.item)
return search(t.left, x)
else
return search(t.right, x)
Failed Search 55
20 70
17 40 Φ(null)
7 10
search(t, x):
if ( t = null || t.item = x )
return t
else if (x < t.item)
return search(t.left, x)
else
return search(t.right, x)
55
Successful Search
20 70
17 40
7 10
(a) Successful Search (b) Failed Searches
Insertion
• Assumption: There is no element x in the search tree
insertSketch(t, x): # t: root node of the (sub)tree
# x: element to be inserted
if (t = null)
attach the node having the value x under the parent and return
else if (x < t.item)
insertSketch(t.left, x)
t
else t.left t.right
insertSketch(t.right, x)
insertSketch(t, x):
◀ t: root node of the (sub)tree
◀ x: key to be inserted
if (t = null)
Attach the node having the key x to the parent and end
else if (x < t.item)
insertSketch(t.left, x)
else
insertSketch(t.right, x)
55
20 70
17 40 60
7 10
Insert() Algorithm (for Binary Search Tree)
insert(x):
root ← insertItem(root, x) # root: the reference root node
insertItem(t, x):
# t: root node of the (sub)tree
# x: element x to be inserted
# after finishing inserting the element, return the root node reference
if (t = null)
r.item ← x; r.left ← null; r.right ← null # r: new node
return r
else if (x < t.item)
t.left ← insertItem(t.left, x)
return t
else
t.right ← insertItem(t.right, x)
return x
insert() Algorithm (non-recursive)
# root: reference of the tree root node
insert(x): # x: inserting key
r.item ← x; r.left ← null; r.right ← null # r: new node
if (root = null)
root ← r
else
t ← root
while (t ≠ null)
parent ← t
if (x < t.item) t ← t.left
else t ← t.right
if (x < parent.item) parent.left ← r
else parent.right ← r
Unbalanced Binary Search Tree
Deletion
deleteSketch(r): # r: node to be deleted
if (r is the leaf node) # case 1
leave r as where it is
else if (if r has only one child) # case 2
let r’s parent point to r’s child directly
else # case 3
find the smallest element s from the right subtree of r
copy s into r’s place and remove s
deleteSketch(r): ◀ r: deleting node
if (r is the leaf node) ◀ Case 1
leave the r
else if (r has only child) ◀ Case 2
let r’s parent point to r’s child directly
else ◀ Case 3
find the minimum element s from r’s right subtree
copy s into r and remove s
Case 1 Case 2 Case 3
r r r
r has 2 children
r has no child r has only 1 child
Case 1
Case 1 Delete Result
p p
r has no child
Case 2
Case 2 Delete Result
p p
r has only 1 child
Case 3 Delete Result
r r r
r has 2 children
m m
Case 1
(a) r has no child (b) simply delete r
Case 2
(a) r has only 1 child (b) Delete r (c) Place r’s child into r’s place
Case 3 – (1)
(a) find the minimum s(30) from the (b) delete r
right subtree
Case 3 – (2)
(c) Move s’ child to the s place (d) Move s into r’s place
delete() Algorithm (non-recursive)
# root: root node of the tree
delete(r, p): # r: deleting node, p: parent node of r
# when r is the root node
if (r = root)
root ← deleteNode(root);
# when r is not the root node
else if (r = p.left)
p.left ← deleteNode(r) # r is the left child of p
else
p.right ← deleteNode(r) # r is the right child of p
deleteNode(r):
if (r.left = r.right = null) return null
else if (r.left = null and r.right ≠ null) return r.right
else if (r.left ≠ null and r.right = null) return r.left
else
s ← r.right
while (s.left ≠ null)
parent ← s; s ← s.left
r.item ← s.item # copy content
if (s = r.right) r.right ← s.right
else parent.left = s.right
return r
delete() Algorithm (recursive with given x)
delete(t, x): # t: root node of the tree
# x: deleting element
if (t = null)
Error(“item not found!”)
else if (x = t.item)
t ← deleteNode(t)
return t
else if (x < t.item)
t.left ← delete(t.left, x)
return t
else
t.right ← delete(t.right, x)
return t
deleteNode(t):
if (t.left = null && t.right = null)
return null # case 1
else if (t.left = null)
return t.right # case 2 (has only left child)
else if (t.right = null)
return t.left # case 2 (has only right child)
else # case 3 (has both children)
(minItem, node) ← deleteMinItem(t.right)
t.item ← minItem
t.right ← node
return t # t survived but changed
deleteMinItem(t):
if (t.left = null)
return (t.item, t.right) # found minimum at t
else # branch left
(minItem, node) = deleteMinItem(t.left)
t.left ← node
return (minItem, t)
Search Time
Fact:
The asymptotic average search time for a binary
search tree with a total of n nodes is Θ(log n)
Properties of a Binary Search Tree
For integer h>=1, the total number of leaf nodes l(h) of a full binary
Fact 1
search tree with the height h is 2h-1.
Proof
i) (Base Case) When h=1, there is only 1 leaf node, so l(1) = 21-1 =
20 = 1 is satisfied.
ii) (Recursive Assumption) Let’s assume that the full binary search
tree with the height=k has l(k) = 2k-1 number of leaf nodes.
iii) (Recursive Analysis) We need to prove that l(k+1) = 2k . The full
binary search tree with the height k+1 has two subtrees which are
full binary search tree of their own with the height k. Thus, the
total number of leaf nodes is l(k+1) = 2ㆍl(k) = 2ㆍ 2k-1 = 2k
For integer h>=0, the total number of nodes n(h) of a full binary search
Fact 2
tree with the height h is 2h-1
Proof
i) (Base Case) When h=0, it is an empty binary search tree. This
also belongs to the full binary search tree. The number of nodes
is 0. n(0) = 20-1 = 1 is satisfied.
ii) (Recursive Assumption) Let’s assume that for all 0<=k<h, the
number of nodes for a full binary search tree is n(k) = 2k-1
iii) (Recursive Analysis) We need to prove that n(h) = 2h-1 Given the
assumption, the number of nodes for a full binary search tree with
the height h-1 is n(h-1) = 2h-1-1. The full binary search tree with
the height h has two subtrees which are full binary search tree of
their own with the height h-1. Thus, the total number of nodes is
n(h) = 1+2ㆍn(h-1) = 1+2(2h-1-1) = 2h-1 (Here we are adding 1 for
the root node)
For integer h>=0, the binary tree with the height h has equal or less
Fact 3
than 2h-1 number of nodes
Proof
The full binary tree with the height h has the most number of nodes
among all binary trees. Since the full binary tree with the height h has
2h-1 number of nodes, all binary trees with the height h has at most
2h-1 number of nodes.
The height of a binary tree having the total number of n nodes is at
Fact 4
least h log2(n+1) )
Proof If the height of the binary tree is h, then its number of nodes is n 2h-
1 from the Fact 3. When converted, it is h log2(n+1). Because h is
an integer, h log2(n+1) )
Thus, the height of the binary tree having n total number of nodes can
be represented as Ω(log n)
The maximum height of a binary tree having the total n number of
Fact 5
nodes is n
The asymptotic time for insertion in a binary search tree is the same as
Fact 6
the time for searching
The asymptotic average search time for a binary search tree with n
Fact 7
nodes is Ꝋ(log n).
Traversal of Binary Tree
Traversal: Visiting all nodes of a binary tree
Preorder Traversal
Inorder Traversal
Postorder Traversal Number of the node
represents the order of the
visit
Preorder Inorder Postorder
- Root first - Root in the middle - Root at the end
preOrder() Algorithm
preOrder(r):
if(r != null)
r.visited = true
preOrder(r.left)
preOrder(r.right)
inOrder() Algorithm
inOrder(r):
if(r != null)
inOrder(r.left)
r.visited = true
inOrder(r.right)
postOrder() Algorithm
postOrder(r):
if(r != null)
postOrder(r.left)
postOrder(r.right)
r.visited = true
Traversal of Binary Tree
Preorder Inorder Postorder
Inorder traversal visits a binary
search tree in sorted order
Theorem: The Inorder traversal of the binary search tree T visits its nodesT in sorted
order
Proof:
Base Case: h=1, (h: height) r
• T has only 1 node(root)
• Visiting only 1 node is naturally in
h
sorted order h–1
TR
Recursive Assumption:
TL
Let’s assume, for all k, k < h is true.
Recursive Proof: (Show that the analysis is true for k=h)
Binary search tree T can be drawn as on the right.
Based on the recursive assumption above, the Inorder traversal for TL
and TR visits each tree in sorted order. All keys of TL is less than the key
of r, All keys for TR is bigger than the key of r, Inorder traversal, TL → r →
TR is in sorted order.
Efficiency of Binary Search Tree Operations
Operation Average Case Worst Case
Search Θ(log n) Θ(n)
Insert Θ(log n) Θ(n)
Delete Θ(log n) Θ(n)
Θ(n)
Traversal Θ(n)
Extension
Internal Path Length (IPL)
• The sum of the depths of all nodes in a binary search tree
Fact:
1 When n elements are inserted
40
sequentially to form a binary search tree,
2 2 and all possible permutations of the input
10 55 elements are equally likely, the expected
IPL (Internal Path Length) of the
3 3 3 resulting binary search tree is O(n log n).
7 20 70
What it means:
4 Average Search Time is O(n logn)
17
IPL = 18
Finding the size of the tree
size(t):
if (t = null) return 0
else return (1+size(t.left)+size(t.right))
Tree Sort
Sort using the binary search tree
treeSort(A[ ], n):
Create a binary search tree, T from reading the elements from A[0…n-1]
in order
Traverse the tree T in inOrder, and save in A[0…n-1]
Time Complexity: Average Ꝋ(n log n)
Building a binary search tree – Average Ꝋ(n log n)
Inorder Traversal: Ꝋ(n)
The theoretical performance is good, but the constant
factor included in Ꝋ(n log n) is large.
Saving Binary Search Tree into a File
• I want to store the binary search tree in a file and load it
back into memory when needed.
• It can be stored in preorder traversal order or in inorder
traversal order.
• To restore it to its original form – Preorder Traversal
• To restore it to a perfectly balanced shape – Inorder Traversal
Restoring a binary search tree from a list stored in inorder traversal
recursiveRestore (L): ◀ L: element list
if (L is null)
return null
else
Find the median element rrr of LLL, and take it as the root.
r.left = recursiveRestore(list of elements before r)
r.right = recursiveRestore(list of elements after r)
return r
Binary Search Tree Implementation
newItem
newItem
left right
Implementation: Search
Implementation: Insert
Implementation: Delete (1)
Implementation: Delete (2)
# 3 cases
# 1. tNode is the leaf node
# 2. tNode has only 1 child
# 3. tNode has two children
# case 1(no children)
# case 2 (right child only)
# case 2 (left child only)
# case 3 (has two children)
Implementation: Delete (3)
# Miscellaneous
End of Lecture