Vietnam National University of HCMC
International University
School of Computer Science and Engineering
Data Structures and Algorithms
★ Simple Sorting ★
Dr Vi Chi Thanh - vcthanh@hcmiu.edu.vn
https://vichithanh.github.io
Week by week topics (*)
1. Overview, DSA, OOP and Java 7. Advanced Sorting
2. Arrays 8. Binary Tree
3. Sorting 9. Hash Table
4. Queue, Stack 10.Graphs
5. List 11.Graphs Adv.
6. Recursion Final-Exam
Mid-Term 10 LABS
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 2
Objectives
• Understand and know how to use basic sorting methods.
• Bubble Sort
• Selection Sort
• Insertion Sort
• Compare their performance
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 D A TA ST R U C T U R E S A N D A L G O RI T H M S ( I T5 1 1) 3
Major Topics
• Introductory Remarks
• Bubble Sort
• Selection Sort
• Insertion Sort
• Sorting Objects
• Comparing Sorts
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 D A TA ST R U C T U R E S A N D A L G O RI T H M S ( I T5 1 1) 4
Introduction
• Why do we need to sort data?
• To get the lowest price
• To get the most crowded country
• etc.
• So many lists are better dealt if ordered.
• Sorting data may be a preliminary step to searching
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 D A TA ST R U C T U R E S A N D A L G O RI T H M S ( I T5 1 1) 5
Introduction
• Sorting is time-consuming task
• ➔ many sorting algorithms are developed
• Will look at simple sorting first.
• Note: there are books written on many advanced sorting techniques. E.g. shell sort;
quicksort; heapsort; etc.
• Will start with ‘simple’ sorts.
• Relatively slow,
• Easy to understand, and
• Excellent performance under circumstances.
• All these are O(N2) sorts.
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 D A TA ST R U C T U R E S A N D A L G O RI T H M S ( I T5 1 1) 6
How would you do it?
ARRANGE PLAYER IN ORDER OF
INCREASING HEIGHT
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 D A TA ST R U C T U R E S A N D A L G O RI T H M S ( I T5 1 1) 7
Basic idea of these simple sorts
• Compare two items
• Swap or Copy over
• Depending on the specific algorithm…
• Don’t need additional space
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 D A TA ST R U C T U R E S A N D A L G O RI T H M S ( I T5 1 1) 8
Bubble sort
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 9
Description
Suppose we have an array of data which is unsorted:
+Starting at the front, traverse the array, find the largest item, and
move (or bubble) it to the top
+With each subsequent iteration, find the next largest item and
bubble it up towards the top of the array
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 D A TA ST R U C T U R E S A N D A L G O RI T H M S ( I T5 1 1) 10
Observations
As well as looking at good algorithms, it is often useful to look at
sub-optimal algorithms
+Bubble sort is a simple algorithm with:
+a memorable name, and
+a simple idea
+It is also significantly worse than insertion sort
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 D A TA ST R U C T U R E S A N D A L G O RI T H M S ( I T5 1 1) 11
Observations
Some thoughts about bubble sort:
+the Jargon file states that bubble sort is
“the generic bad algorithm”
+Donald Knuth comments that
“the bubble sort seems to have nothing to recommend it, except a
catchy name and the fact that it leads to some interesting theoretical
problems”
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 D A TA ST R U C T U R E S A N D A L G O RI T H M S ( I T5 1 1) 12
Obama on
bubble sort
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 D A TA ST R U C T U R E S A N D A L G O RI T H M S ( I T5 1 1) 13
Implementation
+Starting with the first item, assume that it is the largest
+Compare it with the second item:
+If the first is larger, swap the two,
+Otherwise, assume that the second item is the largest
+Continue up the array, either swapping or redefining the largest item
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 D A TA ST R U C T U R E S A N D A L G O RI T H M S ( I T5 1 1) 14
Implementation
+After one pass, the largest item must be the last in the list
+Start at the front again:
+the second pass will bring the second largest element into the second last
position
+Repeat n – 1 times, after which, all entries will be in place
+That why it is named Bubble sort
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 D A TA ST R U C T U R E S A N D A L G O RI T H M S ( I T5 1 1) 15
Simulation
https://www.hackerearth.com/practice/algorithms/sorting/bubble-sort/visualize/
https://visualgo.net/en/sorting
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 16
Bubble Sort process
• After first pass, we made
• n -1 comparisons
• 0 to n-1 swaps (depend on data)
• Continue this process.
• Next time we do not check the last entry (N-1), because we know it is
in the right spot. We stop comparing at (N-2).
• See Some code (p85-86).
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 17
The Basic Algorithm
+Here we have two nested loops, and therefore calculating the run
time is straight-forward:
n −1 n ( n − 1) n ( n − 1)
( n − k ) = n ( n − 1) −
k =1 2
=
2
= (n 2 )
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 D A TA ST R U C T U R E S A N D A L G O RI T H M S ( I T5 1 1) 18
Bubble Sort process
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 19
Hand-on
+Try with some examples
89 58 29 40 12 42 10 1
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 20
Efficiency of the Bubble Sort - Comparisons
• Can readily see that there are fewer comparisons each ‘pass.’
• Thus, number of comparisons is computed as:
(n-1)+(n-2)+… + 1 = n(n-1)/2;
• For 10 elements, the number is 10*9/2 = 45.
• So, the algorithm makes about n2/2 comparisons
• (ignoring the -1 which is negligible especially if N is large)
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 25
Efficiency of the Bubble Sort - Swaps
• Fewer SWAPS than COMPARISONS
• since every comparison does not result in a swap.
• In general, a swap will occur half the time.
• For n2/2 comparisons,
we have n2/4 swaps.
• Worst case, every compare results in a swap (which case?)
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 26
Overall – Bubble Sort
• Both swaps and compares are proportional to n 2.
• Ignore the 2 and 4
➔ Complexity of Bubble Sort is O(n2)
➔ Rather slow.
• Hint to determine Big-O:
• 2 nested-loop ➔ O(n2)
• Outer loop executes n times and inner loop executes in n times PER
execution of the outer n times: hence n2
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 27
Question
• The bubble sort algorithm alternates between:
• A) Comparing and swapping
• B) Moving and copying
• C) Moving and comparing
• D) Copying and comparing
• What can be improved in Bubble Sort algorithm?
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 28
Selection sort
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 29
Selection Sort
• Fewer swaps, same comparisons.
• Swaps O(n2) ➔ O(n)
• Comparisons O(n2)
➔Important while dealing with large records
➔Reduction in swap time is more important than one in comparison
time
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 30
How does the Selection Sort work?
• Start from the first element (e.g., the left end)
• Scan all elements to selecting the smallest (largest) item.
• Swap with the first element
• Next pass, move one position right
• Repeat until all are sorted.
Animation
https://www.hackerearth.com/practice/algorithms/sorting/selection-sort/visualize/
https://visualgo.net/en/sorting
Selection Sort – in more detail
• So, in one pass, you have made n comparisons but possibly ONLY
ONE Swap!
• With each succeeding pass,
• one more item is sorted and in place;
• one fewer item needs to be considered.
• Java code for the Selection Sort (p93-94).
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 33
Selection Sort
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 34
Hand-on
+Try with some examples
89 58 29 40 12 42 10 1
1 32 12 53 11 76 23 89
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 35
Selection Sort itself - more
+Algorithm implies it is an O(n2) sort (and it is).
+How did we see this?
+In comparison with Bubble Sort
+Same number of comparisons n2/2
+Fewer swap n
→ Faster than Bubble Sort
Insertion sort
Insertion Sort
• In many cases, this sort is considered the best of these elementary
sorts.
• Still an O(n2) but:
• about twice as fast as bubble sort and
• somewhat faster than selection sort in most situations.
• Easy, but a bit more complex than the others
• Sometimes used as final stage of some more sophisticated sorts,
such as a QuickSort (coming).
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 38
Insertion Sort – The idea
39
+Thinking that ‘half’ of the list of items to be sorted. (Partially sorted ,
Marked, Unsorted items)
Insert marked item to partially sorted list
+Take out of line → Shift right until appropriate place → Insert
Insertion sort – Intermediate result
Animation
https://www.hackerearth.com/practice/algorithms/sorting/insertion-sort/visualize/
https://visualgo.net/en/sorting
Insertion Sort
• Result after each round:
• Partially-ordered list is now one item larger and
• The unsorted list is now one item smaller.
• Marked item moves one slot to the right, so once more it is again in front of
the leftmost unsorted item.
• Continue process until all unsorted items have been inserted.
• Hence the name ‘insertion sort.’
• Code page 99 - 100
Insertion sort
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 44
Discussion– How it really implemented!!
• Start with out = 1, which means there is only a single element to its
‘left.’
• We infer that this item to its left is sorted unto itself.
• Hard to argue this is not true. (This is out = 0)
• a[out] is the marked item, and it is moved into temp.
• a[out] is the leftmost unsorted item.
Hand-on
+Try with some examples
89 58 29 40 12 42 10 1
1 32 12 53 11 76 23 89
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 46
Efficiency of the Insertion Sort
• Comparisons:
• On pass one, max of one;
pass two, max of two, etc.
Up to a max of n-1 comparisons.
→ 1+2+3+…+n-1 = n*(n-1)/2 comparisons.
• But, on average
n*(n-1)/4.
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 47
Efficiency of the Insertion Sort (2 of 3)
• Copy:
• Have lots of ‘copies’
• (same as number of comparisons – about)
• ➔ But a copy is not nearly as time-consuming as a swap. Think about this!!
• For random data,
• twice as fast as the bubble sort
• faster than the selection sort.
• Still runs on O(n2) time for random data.
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 48
Efficiency of the Insertion Sort (3 of 3)
• If data is nearly sorted ➔ quite well.
• Condition in while loop is never true, so it almost runs in O(n) time; much
better than O(n2) time!
• If data is in very unsorted order (nearly backward)
• ➔ No faster than the bubble sort
• as every possible comparison and shift is carried out.
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 49
Sorting Objects
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 50
Sorting Objects
+Very important to be able to sort objects.
+Must be careful, especially in noting that
+An array are objects, and
+The sort is based on values of String attributes inside of the object.
+Use inherited String method, compareTo.
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 51
Sorting Objects
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 52
Secondary Sort Fields?
• Equal Keys Problems ? E.g., Last name
• Solutions
• Using Secondary Key? e.g., ZIP code
• Using ‘Stable’ Sort? What does this mean?
• Only sort what needs to be sorted
• Leave everything else in its original order
• All of the algorithms so far are stable
• Think Windows:
• Arrange by Type; then by date modified…
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 53
Comparing the Simple Sorts
• Bubble Sort, Selection Sort, and Insertion Sort all have a worst-case
time complexity of O(n^2).
• Bubble Sort, Selection Sort, and Insertion Sort all have a space
complexity of O(1) as they are in-place sorting algorithms.
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 54
Comparing the Simple Sorts
• Bubble Sort – simplest.
• Use only if you don’t have other algorithms available and ‘n’ is small.
• Selection Sort
• Minimizes number of swaps, but number of comparisons still high.
• Useful when amount of data is small, and swapping is very time consuming – like
when sorting records in tables – internal sorts.
• Insertion Sort
• The most versatile, and is usually best bet in most situations, if amount of data is
small or data is almost sorted.
• For large n, other sorts are better. We will cover advanced sorts later…
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 55
Comparing the Simple Sorts
• All require very little space, and they sort in place.
• Can’t see the real efficiencies / differences unless you apply the different
sources to large amounts of data.
T U E SD A Y , 2 4 S EP T EM BE R 2 0 24 56
Vietnam National University of HCMC
International University
School of Computer Science and Engineering
THANK YOU
Dr Vi Chi Thanh - vcthanh@hcmiu.edu.vn
https://vichithanh.github.io