
Mergesort

Merge Sort Visualizer


Quicksort
If we want to sort big sequences quickly, without the extra memory and copying back
and forth of mergesort, the answer is quicksort.

In real-world time, it is typically the fastest general-purpose sorting algorithm known.

It can be performed without extra memory (but our version isn't that one). While its
worst-case run time is O(n^2), its average run time is O(n lg n). A simple algorithm for
quicksort is:
Quicksort
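The algorithm on this slide appeared as a figure. As a stand-in, here is a minimal Python sketch of the simple, not-in-place version described above (the function name is ours):

    def quicksort(items):
        # Base case: a list of 0 or 1 elements is already sorted.
        if len(items) <= 1:
            return items
        pivot = items[0]  # simplest (and riskiest) choice: the first element
        smaller = [x for x in items[1:] if x < pivot]
        larger = [x for x in items[1:] if x >= pivot]
        # Sort each partition recursively, then glue the pieces together.
        return quicksort(smaller) + [pivot] + quicksort(larger)

    print(quicksort([5, 2, 1, 4, 3]))  # [1, 2, 3, 4, 5]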
Here’s an imperfect trace of quicksort on a small list (assume we’ve drawn little boxes to
represent the list):
Quicksort complexity

What input will give quicksort its worst performance? (Bonus)


Quicksort complexity

Performance is best when the pivot splits the unsorted portion in half. It's worst
when the pivot doesn't split it at all.
Performance depends on choosing good pivots.
With good pivot choices, average case behavior for quicksort is O(n lg n). Common pivot-selection strategies:
● Always pick the first element as the pivot.
● Always pick the last element as the pivot.
● Pick a random element as the pivot.
● Pick the middle element as the pivot.
Choosing a Pivot
Simple: Pick the first or last element of the range. (bad on partially sorted input)

Better: Pick the item in the middle of the range. (better on partially sorted input)

However, picking any arbitrary element runs the risk of poorly partitioning the array of size n into two arrays of sizes 1 and n-1. If that happens often enough, your quicksort runs the risk of becoming O(n^2).

One improvement is to pick median(first, mid, last). In the worst case it can still go to O(n^2), but probabilistically this is a rare case.

For most data, picking the first or last is sufficient. But if you find that you're running into worst-case scenarios often (partially sorted input), the first option would be to pick the central value, which is a statistically good pivot for partially sorted data.

Choosing a random pivot also minimizes the chance of encountering worst-case O(n^2) performance, since always choosing the first or last element triggers the worst case on nearly-sorted or nearly-reverse-sorted data. Beware the relative cost of comparisons, though: if your comparisons are expensive, median-of-three does more of them than choosing a single pivot value at random.
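As a sketch of the median-of-three idea (the helper name and calling convention are ours, not from the slides):

    def median_of_three(items, lo, hi):
        # Return the index of the median of items[lo], items[mid], items[hi]:
        # a cheap guard against sorted or reverse-sorted input.
        mid = (lo + hi) // 2
        a, b, c = items[lo], items[mid], items[hi]
        if a <= b <= c or c <= b <= a:
            return mid
        if b <= a <= c or c <= a <= b:
            return lo
        return hi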
Complexity
● Best Case: Ω(N log N)
The best-case scenario for quicksort occurs when the pivot chosen at each step divides the array into roughly equal halves.
In this case, the algorithm makes balanced partitions, leading to efficient sorting.
● Average Case: Θ(N log N)
Quicksort's average-case performance is usually very good in practice, making it one of the fastest sorting algorithms.
● Worst Case: O(N^2)
The worst-case scenario for quicksort occurs when the pivot at each step consistently results in highly unbalanced partitions, e.g., when the array is already sorted and the pivot is always chosen as the smallest or largest element. To mitigate the worst case, various techniques are used, such as choosing a good pivot (e.g., median of three) or using a randomized algorithm (Randomized Quicksort) to shuffle the elements before sorting.
● Auxiliary Space: O(1) if we don't consider the recursion stack. If we do, quicksort requires O(N) stack space in the worst case.
General

Advantages of Quick Sort:


● It is a divide-and-conquer algorithm, which makes the problem easier to solve.
● It is efficient on large data sets.
● It has a low overhead, as it only requires a small amount of memory to function.

Disadvantages of Quick Sort:


● It has a worst-case time complexity of O(N^2), which occurs when the pivot is chosen poorly.
● It is not a good choice for small data sets.
● It is not a stable sort: if two elements have the same key, their relative order is not necessarily
preserved in the sorted output, because quicksort swaps elements according to the pivot's position
(without considering their original positions).
Radix Sort
● Radix Sort is a linear sorting algorithm.

● Radix Sort has a time complexity of O(nd), where n is the size of the array and d is the
number of digits in the largest number. It is not an in-place sorting algorithm
because it requires extra space.

● Radix sort may be slower than comparison-based algorithms such as merge
sort and quicksort if its underlying operations are inefficient. These operations include
inserting into and deleting from the sub-lists, and the process of isolating the desired digits.

● Because it is based on digits or letters, radix sort is less flexible than other sorts. If
the type of data changes, the Radix sort must be rewritten.
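To make the O(nd) bound concrete, here is a minimal sketch of an LSD (least-significant-digit first) radix sort for non-negative integers; the names are ours:

    def radix_sort(nums):
        # LSD radix sort: one stable bucketing pass per decimal digit.
        if not nums:
            return nums
        place = 1
        while max(nums) // place > 0:  # d passes: one per digit of the max
            buckets = [[] for _ in range(10)]
            for n in nums:
                buckets[(n // place) % 10].append(n)  # isolate current digit
            nums = [n for bucket in buckets for n in bucket]  # stable gather
            place *= 10
        return nums

    print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
    # [2, 24, 45, 66, 75, 90, 170, 802]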
Bubble sort and Shell sort

They are well-known approaches to sorting that are part of the computer
science vocabulary, even though they are often not particularly useful in
practice.
Bubble Sort
Bubble Sort is the simplest sorting algorithm that works by repeatedly swapping the
adjacent elements if they are in the wrong order. This algorithm is not suitable for large
data sets as its average and worst-case time complexity is quite high.
In Bubble Sort algorithm,
● Traverse from the left and compare adjacent elements; the larger of each pair is placed on the right.
● In this way, the largest element is moved to the rightmost end on the first pass.
● The process is then continued to find and place the second largest, and so on,
until the data is sorted.
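A minimal sketch, including the early-exit flag that gives the linear-time behavior on almost-sorted input mentioned later in this section:

    def bubble_sort(items):
        n = len(items)
        for i in range(n - 1):
            swapped = False
            # After pass i, the i largest items are already in place on the
            # right, so each pass can stop one position earlier.
            for j in range(n - 1 - i):
                if items[j] > items[j + 1]:
                    items[j], items[j + 1] = items[j + 1], items[j]
                    swapped = True
            if not swapped:  # no swaps means the list is already sorted
                break
        return items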
Bubble Sort

https://www.geeksforgeeks.org/bubble-sort/?ref=header_search
Bubble Sort complexity
Time Complexity: O(N^2)
Auxiliary Space: O(1)

Advantages of Bubble Sort:


● Bubble sort is easy to understand and implement.
● It does not require any additional memory space.
● It is a stable sorting algorithm, meaning that elements with the same key value maintain their relative order in the sorted
output.

Disadvantages of Bubble Sort:


● Bubble sort has a time complexity of O(N^2) which makes it very slow for large data sets.

Due to its simplicity, bubble sort is often used to introduce the concept of a sorting algorithm. In computer graphics, it is popular for
its ability to detect a tiny error (like a swap of just two elements) in almost-sorted arrays and fix it in just linear time (about 2n comparisons).
Shell Sort

Shell sort is a generalized version of the insertion sort algorithm. It first sorts elements that
are far apart from each other and successively reduces the interval between the elements
to be sorted.
● Step 1 − Start
● Step 2 − Initialize the gap size h.
● Step 3 − Divide the list into sub-lists of elements spaced h apart.
● Step 4 − Sort these sub-lists using insertion sort.
● Step 5 − Reduce the gap and repeat from Step 2 until the list is sorted.
● Step 6 − Print the sorted list.
● Step 7 − Stop.
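A sketch of these steps in Python, using the simple gap sequence n/2, n/4, ..., 1 (other gap sequences are possible and affect the complexity, as discussed below):

    def shell_sort(items):
        n = len(items)
        gap = n // 2  # initial gap size h
        while gap > 0:
            # Gapped insertion sort: each pass insertion-sorts the
            # sub-lists of elements that are gap positions apart.
            for i in range(gap, n):
                current = items[i]
                j = i
                while j >= gap and items[j - gap] > current:
                    items[j] = items[j - gap]
                    j -= gap
                items[j] = current
            gap //= 2  # reduce the gap and repeat
        return items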
Shell Sort
Worst Case Complexity
The worst-case complexity for shell sort is O(n^2)

Best Case Complexity


When the given array is already sorted, the total count of comparisons for each interval is equal to the size of
the given array.
So the best-case complexity is Ω(n log n)

Average Case Complexity


Shell sort's average-case complexity depends on the interval (gap) sequence selected by the programmer;
commonly cited figures range from Θ(n (log n)^2) down to about O(n^1.25).

Space Complexity
The space complexity of the shell sort is O(1).
Quick Recap
Heapsort
We can use a heap as the foundation for a very efficient sorting algorithm called
heapsort.

Heapsort consists of two phases:


Heapify: build a heap using the elements to be sorted
Sort: use the heap to sort the data

This can all be done in place in the array that holds the heap, but it’s easier to see if
we draw the trees instead of the array. Once you see how it works with trees, make
sure you understand how it works in place in the array.
Previously...

...we considered the possibility that sorted linear lists, whether linked
or otherwise, pose some challenges when it comes to insertion (and
also deletion)

We suggested that there might be a better way...


Trees

In computing, a tree has a root (or root node) and leaves (or leaf nodes) just like a
botanical tree. But the resemblance between our trees and nature’s best sort of ends
there.
For example, we like our trees upside down. Non-computing people
tend to think this is weird. Really, we’re just trendsetters...
Anyone who has ever used a computer has experience with trees,
though they may not know it. That’s how file directory structures are
represented on your computer.
A tree is a (possibly non-linear) data structure made up of nodes or vertices and
edges with only one pathway from the root node to a given node. The tree with no
nodes is called the null or empty tree. A tree that is not empty consists of a root
node and potentially many levels of additional nodes that form a hierarchy.
Binary trees

For reasons which will become obvious, we're primarily interested in
binary trees (and variations on the binary-tree theme).

A binary tree is a tree that is either

- empty (or null), or
- a node called the root node and two binary trees called the left
subtree and the right subtree.

(Remember that there can’t be multiple paths from the root to any leaf
node in a tree.)
Binary trees

[Figures: several example binary trees, including small and empty ones; a tree that is not binary; and a "binary" structure that is not a tree.]
Each binary tree node holds some data, a pointer to its left subtree and
a pointer to its right subtree.
Tree traversal

Sometimes, we want to operate on the values contained in a binary
tree. We walk through the tree in a prescribed order and visit or
process the value at each node as it is encountered. This is called
tree traversal.
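For instance, a minimal node type and one common traversal order, in-order (left subtree, then the node, then the right subtree), might look like this; the class and function names are ours:

    class Node:
        def __init__(self, data, left=None, right=None):
            self.data = data    # the value held at this node
            self.left = left    # pointer to the left subtree (or None)
            self.right = right  # pointer to the right subtree (or None)

    def in_order(node):
        # Visit the left subtree, then this node, then the right subtree.
        if node is not None:
            in_order(node.left)
            print(node.data)
            in_order(node.right)

    # A three-node tree: 2 at the root, 1 and 3 as leaves.
    in_order(Node(2, Node(1), Node(3)))  # prints 1, 2, 3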
Heaps

Say you’re at the grocery store. You put a few items in your cart and
head for checkout. There are a couple of other sparsely-laden carts
ahead of you, but ahead of those is the guy whose cart has enough
food for the next five Thanksgiving dinners. Should you and the other
sparsely-laden carts go ahead of him?
You click ‘send’ on an email to your significant other to say you’ll meet
them in 15 minutes at your favorite pub. But just before you clicked
that button, another person using the same email system clicked
‘send’ on an uncompressed file of all the digital video he’s taken since
2007. Whose email should go first?

You put your to-do list in the printer queue right after the student in the
next office hit the ‘print’ button on her 957-page doctoral dissertation
with annotated bibliography. Which one should get printed first?
Heaps

Problems like this can be solved easily by searching the queue for
“little jobs” that can be done before the “big jobs”.

If that dissertation in the printer queue holds up the small print


requests of 50 other people, you have 50 unhappy people. If those 50
jobs go first, then at most you have 1 unhappy person. And that person
isn’t expecting speedy printing anyway, so she probably won’t notice.
Heaps

If the jobs/requests are being fed into a traditional FIFO queue, you can
extract the “small jobs” by performing a linear search on the data
structure. As the queue gets bigger, it takes more time to search for the
“small jobs” ... time that could be spent on processing the jobs.

If we want to prioritize the entries in the queue, we’ll want a faster way
of getting at the “small jobs”. The heap data structure will help us get
there. We’ll use it as a queue, but it’s not a linear data structure...it’s a
tree.
Heaps

Except that you can implement that tree as a linear list, and really
cool stuff can happen as a result. Wait. What?

A heap is a complete* binary tree in which:

• If A is a parent node of B, then the value (or key) at A is ordered with
respect to the value (or key) of B, and the same ordering applies
throughout the heap. In other words, every subtree of a heap is a heap.

*Danger! Terminology confusion ahead.


A full binary tree is a binary tree in which each node has exactly 0 or 2 children. Each
internal node has exactly 2 children, and each leaf has 0 children.

A complete binary tree is a binary tree of height h in which leaves appear only at two
adjacent levels, depth h – 1 and depth h, and the leaves at the very bottom (depth h)
are in the leftmost positions. This is the common textbook definition.

Some sources (e.g., Wikipedia) say that a binary tree is almost complete or nearly
complete if the bottom level is not completely filled, and the tree is only complete if the
bottom level is completely filled.
Tree Terminology
Heaps

So like we said, a heap is a complete binary tree in which:

• If A is a parent node of B, then the value (or key) at A is ordered with
respect to the value (or key) of B, and the same ordering applies
throughout the heap. In other words, every subtree of a heap is a heap.
Heaps
If the values (or keys) of parent nodes are always greater than or equal to those of the
child nodes and the largest value (or key) is at the root node, we have a maximum binary
heap or max heap.

If the values (or keys) of parent nodes are always less than or equal to those of the child
nodes and the smallest value (or key) is at the root node, we have a minimum binary
heap or min heap. All the examples in your textbook seem to be min heaps.

So in lecture, we’ll use min heaps for consistency. Also note the following: This type of
heap is completely unrelated to the common pool of free memory from which
dynamically allocated memory is assigned, which is also called a heap.
Heaps
Insert
Delete
Implementing a heap
If the node we’re looking at is 28, then k = 4
The left child is 41 at index = 8
The right child is 52 at index = 9
The parent is 14 at index = 2
The more formal algorithms for insertion in and deletion from a heap
are in your textbook. You should make sure you understand the details.
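The slide's index arithmetic is 1-based (left child of k at 2k, right child at 2k+1, parent at k/2). As a hedged sketch, not the textbook's code, here is a 0-based Python min heap where those formulas shift to 2k+1, 2k+2, and (k-1)/2:

    class MinHeap:
        def __init__(self):
            self.items = []  # 0-based array: children of k are 2k+1, 2k+2

        def insert(self, value):
            self.items.append(value)  # add at the next available position
            self._reheap_up(len(self.items) - 1)

        def _reheap_up(self, k):
            # Float the new item up while it is smaller than its parent.
            while k > 0:
                parent = (k - 1) // 2
                if self.items[k] < self.items[parent]:
                    self.items[k], self.items[parent] = self.items[parent], self.items[k]
                    k = parent
                else:
                    break

        def remove_min(self):
            top = self.items[0]
            last = self.items.pop()  # move the last item to the root...
            if self.items:
                self.items[0] = last
                self._reheap_down(0)  # ...and sink it back into place
            return top

        def _reheap_down(self, k):
            n = len(self.items)
            while True:
                smallest = k
                for child in (2 * k + 1, 2 * k + 2):
                    if child < n and self.items[child] < self.items[smallest]:
                        smallest = child
                if smallest == k:
                    break
                self.items[k], self.items[smallest] = self.items[smallest], self.items[k]
                k = smallest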
Priority queues

The heap data structure provides the basis for the priority queue
abstract data type. If we assume that a value in the heap represents
the priority of a job (smaller numbers having higher priority, like number
of pages in a print request), then as values are added, the smaller,
higher priority values float toward the top, and lower priority values fall
to the bottom.
Priority queues

When we want to select the highest priority job in the queue, we just
grab the item at the top of the heap (e.g. 3). We then move the last item
in the heap to the top and perform ReheapDown as described
previously.
Priority queues

Removing the minimum value (i.e., highest priority item) from the
priority queue requires ReheapDown, which takes O(lg n) time. Adding
a new item to the priority queue requires ReheapUp, which also takes
O(lg n) time.
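Python's standard heapq module implements this kind of min-heap-backed priority queue, if you want to experiment (the job data here is made up):

    import heapq

    jobs = []  # heapq treats a plain list as a min heap
    heapq.heappush(jobs, (957, "dissertation"))  # (pages, job) tuples:
    heapq.heappush(jobs, (3, "to-do list"))      # fewer pages = higher priority
    heapq.heappush(jobs, (12, "lab report"))

    pages, name = heapq.heappop(jobs)  # O(lg n): a ReheapDown under the hood
    print(pages, name)                 # 3 to-do list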
Heapsort
We can use a heap as the foundation for a very efficient sorting algorithm called
heapsort.

Heapsort consists of two phases:


Heapify: build a heap using the elements to be sorted
Sort: use the heap to sort the data

This can all be done in place in the array that holds the heap, but it’s easier to see if
we draw the trees instead of the array. Once you see how it works with trees, make
sure you understand how it works in place in the array.
Heapsort

Here’s the heapify component:

for each item in the sequence to be sorted:
    add the item to the next available position in the complete binary tree
    restore the heap property (using ReheapUp)

Say we want to sort the sequence 5 2 1 4 3:


Heapsort - heapify

Say we want to sort the sequence 5 2 1 4 3:

The sequence is heapified


Heapsort - Sort

Now for the sorting:

while the heap is not empty


● remove the first item from the heap by
swapping it with the last item in the
heap
● reduce the size of the heap by one
● restore the heap property

Now it’s sorted, but it’s largest to smallest (reading top to bottom, left to right). We
created a min heap, and we ended up with the sequence sorted from largest to
smallest. The same thing will happen when sorting in place in the array.
Heapsort

In the array-based heapsort example in the book (which you should
study), the original sequence was heapified into a max heap, and the
resulting sorted sequence in the array went from smallest to largest.
Just something to keep in mind.
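Here is a compact in-place sketch in that max-heap style, so the array ends up smallest to largest. Note one liberty: it heapifies with a bottom-up sift-down (ReheapDown) over the internal nodes rather than the insert-and-ReheapUp loop shown earlier; the resulting heap is equally valid:

    def heapsort(a):
        n = len(a)

        def reheap_down(k, size):
            # Sink a[k] until neither child is larger (max-heap property).
            while True:
                largest = k
                for child in (2 * k + 1, 2 * k + 2):
                    if child < size and a[child] > a[largest]:
                        largest = child
                if largest == k:
                    return
                a[k], a[largest] = a[largest], a[k]
                k = largest

        # Heapify: sift down every internal node, from the bottom up.
        for k in range(n // 2 - 1, -1, -1):
            reheap_down(k, n)

        # Sort: swap the max to the end, shrink the heap, restore the heap.
        for end in range(n - 1, 0, -1):
            a[0], a[end] = a[end], a[0]
            reheap_down(0, end)
        return a

    print(heapsort([5, 2, 1, 4, 3]))  # [1, 2, 3, 4, 5]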
Heapsort analysis
A heap of size n has about lg n levels. Building a heap requires finding the correct location
for each item in a heap with lg n levels. Because there are n items to insert in the
heap and each insert is O(lg n), the time to heapify the original unsorted sequence
using ReheapUp or Sift-up is O(n lg n). During sorting, we have n items to remove
from the heap, which then is also O(n lg n). Because we can do it all in the original
array, no extra storage is required.
https://www.explainxkcd.com/wiki/index.php/1185:_Ineffective_Sorts
