Jump to content

Adaptive heap sort: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
No edit summary
Citation bot (talk | contribs)
Altered title. Added chapter. Removed parameters. Some additions/deletions were parameter name changes. | Use this bot. Report bugs. | #UCB_CommandLine
 
(16 intermediate revisions by 10 users not shown)
Line 1: Line 1:
{{Short description|Comparison-based sorting algorithm}}
In computer science, adaptive heap sort is a [[Comparison sort|comparison-based]] [[sorting algorithm]] of the [[Adaptive sort|adaptive sort family]]. It is a variant of [[Heapsort|heap sort]] that performs better when the data contains existing order. Published by Christos Levcopoulos and Ola Petersson in 1992, the algorithm utilizes a new measure of presortedness, ''Osc,'' as the number of oscillations <ref name=":0">{{Cite journal|last=Levcopoulos|first=C.|last2=Petersson|first2=O.|date=1993-05-01|title=Adaptive Heapsort|url=http://www.sciencedirect.com/science/article/pii/S0196677483710217|journal=Journal of Algorithms|volume=14|issue=3|pages=395–413|doi=10.1006/jagm.1993.1021|issn=0196-6774}}</ref>. Instead of putting all the data into the heap as the traditional heap sort did, adaptive heap sort only take part of the data into the heap so that the run time will reduce significantly when the presortedness of the data is high<ref name=":0" />.
In computer science, '''adaptive heap sort''' is a [[Comparison sort|comparison-based]] [[sorting algorithm]] of the [[Adaptive sort|adaptive sort family]]. It is a variant of [[Heapsort|heap sort]] that performs better when the data contains existing order. Published by [[Christos Levcopoulos]] and [[Ola Petersson]] in 1992, the algorithm utilizes a new measure of presortedness, ''Osc,'' as the number of oscillations.<ref name=":0">{{Cite journal|last1=Levcopoulos|first1=C.|last2=Petersson|first2=O.|date=1993-05-01|title=Adaptive Heapsort|journal=Journal of Algorithms|volume=14|issue=3|pages=395–413|doi=10.1006/jagm.1993.1021|issn=0196-6774}}</ref> Instead of putting all the data into the heap as the traditional heap sort did, adaptive heap sort only take part of the data into the heap so that the run time will reduce significantly when the presortedness of the data is high.<ref name=":0" />


== Heapsort ==
== Heapsort ==
Heap sort is a sorting algorithm that utilizes [[binary heap]] data structure. The method treats an array as a complete [[binary tree]] and builds up a Max-Heap/Min-Heap to achieve sorting. <ref name=":1">{{Cite journal|last=Schaffer|first=R.|last2=Sedgewick|first2=R.|date=1993-07-01|title=The Analysis of Heapsort|url=http://www.sciencedirect.com/science/article/pii/S019667748371031X|journal=Journal of Algorithms|volume=15|issue=1|pages=76–100|doi=10.1006/jagm.1993.1031|issn=0196-6774}}</ref>It usually involves the following four steps.
Heap sort is a sorting algorithm that utilizes [[binary heap]] data structure. The method treats an array as a complete [[binary tree]] and builds up a Max-Heap/Min-Heap to achieve sorting.<ref name=":1">{{Cite journal|last1=Schaffer|first1=R.|last2=Sedgewick|first2=R.|date=1993-07-01|title=The Analysis of Heapsort|journal=Journal of Algorithms|volume=15|issue=1|pages=76–100|doi=10.1006/jagm.1993.1031|issn=0196-6774}}</ref> It usually involves the following four steps.


# Build a Max-Heap(Min-Heap): put all the data into the heap so that all nodes are either greater than or equal (less than or equal to for ''Min-Heap'') to each of its child nodes.
# Build a Max-Heap(Min-Heap): put all the data into the heap so that all nodes are either greater than or equal (less than or equal to for ''Min-Heap'') to each of its child nodes.
Line 9: Line 10:
# Repeat Step 2 and 3 until the heap has only one element. Put this last element at the end of the list and output the list. The data in the list will be sorted.
# Repeat Step 2 and 3 until the heap has only one element. Put this last element at the end of the list and output the list. The data in the list will be sorted.


Below is a C++ implementation that builds up a Max-Heap and sorts the array after the heap is built.<br />
Below is a C/C++ implementation that builds up a Max-Heap and sorts the array after the heap is built.
#include <iostream>
/*
''A C++ sample heap sort code that sort an array to an increasing order''
*/
using namespace std;
void heapify(int array[], int start, int end);// A ''function that build up a max-heap binary tree''
void heapify(int array[], int start, int end)
{
int parent = start;
int child = parent * 2 + 1;
while (child <= end)
{ if (child + 1 <= end ) ''//when there are two child nodes''
{
if (array[child + 1] > array[child])
{
child ++; ''//take the bigger child node''
}
}
if (array[parent] > array[child])
{
return; ''//if the parent node is greater, then it's already heapified''
}
if (array[parent] < array[child]) ''// when child node is greater than parent node''
{
swap (array[parent], array[child]); ''// switch parent and child node''
parent = child;
child = child * 2 + 1; ''//continue the loop, compare the child node and its child nodes''
}
}
}
void heap_sort (int array[], int len); ''// heap_sort function''
void heap_sort (int array[], int len)
{
for (int i = len/2 - 1; i >= 0; i--) '''''//Step 1: build up the max-heap'''''
{
heapify(array, i, len);
}
for (int i = len - 1; i >= 0; i--) '''''//Step 4: repeat step 2 and 3 till finished'''''
{
swap(array[0], array[i]); '''''// Step 2: put the max at the end of the array'''''
heapify (array, 0, i-1); '''''// Step 3: remove the max from the tree and heapify again'''''
}
}
int main()
{
int array[] = {42, 1283, 123, 654, 239847, 45, 97, 85, 763, 90, 770, 616, 328, 1444, 911, 315, 38, 5040, 1}; ''//the array that will be sorted''
int array_len = sizeof(array)/sizeof(*array); ''//length of the array''
heap_sort (array, array_len);// heap sort
return 0;
}


<syntaxhighlight lang="c">
== Measures of Presortedness ==
/*
Measures of presortedness measures the existing order in a given sequence.<ref>{{Cite journal|last=Mannila|first=Heikki|date=1985-4|title=Measures of Presortedness and Optimal Sorting Algorithms|url=http://ieeexplore.ieee.org/document/5009382/|journal=IEEE Transactions on Computers|volume=C-34|issue=4|pages=318–325|doi=10.1109/TC.1985.5009382|issn=0018-9340}}</ref> These measures of presortedness decides the number of data that will be put in to the heap during the sorting process as well as the lower bound of running time. <ref name=":2" />
A C/C++ sample heap sort code that sort an array to an increasing order
*/

// A function that build up a max-heap binary tree
void heapify(int array[], int start, int end)
{
int parent = start;
int child = parent * 2 + 1;
while (child <= end)
{ if (child + 1 <= end) // when there are two child nodes
{
if (array[child + 1] > array[child])
{
child ++; //take the bigger child node
}
}
if (array[parent] > array[child])
{
return; //if the parent node is greater, then it's already heapified
}
if (array[parent] < array[child]) // when child node is greater than parent node
{
swap (array[parent], array[child]); // switch parent and child node
parent = child;
child = child * 2 + 1; //continue the loop, compare the child node and its child nodes
}
}
}

// heap_sort function
void heap_sort (int array[], int len)
{
for (int i = len/2 - 1; i >= 0; i--) //Step 1: build up the max-heap
{
heapify(array, i, len);
}
for (int i = len - 1; i >= 0; i--) //Step 4: repeat step 2 and 3 till finished
{
swap(array[0], array[i]); // Step 2: put the max at the end of the array
heapify (array, 0, i-1); // Step 3: remove the max from the tree and heapify again
}
}


int main()
{

//the array that will be sorted
int array[] = {42, 1283, 123, 654, 239847, 45, 97, 85, 763, 90, 770, 616, 328, 1444, 911, 315, 38, 5040, 1};
int array_len = sizeof(array)/sizeof(*array); //length of the array

heap_sort (array, array_len);

return 0;
}
</syntaxhighlight>

== Measures of presortedness ==
[[Measure of presortedness|Measures of presortedness]] measures the existing order in a given sequence.<ref>{{Cite journal|last=Mannila|first=Heikki|date=April 1985|title=Measures of Presortedness and Optimal Sorting Algorithms|journal=IEEE Transactions on Computers|volume=C-34|issue=4|pages=318–325|doi=10.1109/TC.1985.5009382|issn=0018-9340}}</ref> These measures of presortedness decides the number of data that will be put in to the heap during the sorting process as well as the lower bound of running time.<ref name=":2" />


=== Oscillations (''Osc'') ===
=== Oscillations (''Osc'') ===
For sequence <math>X = <x_1, x_2, x_3, .... ,x_n></math>, ''Cross''(x<sub>i</sub>) is defined as the number edges of the line plot of X that are intersected by a horizontal line through the point (i, x<sub>i</sub>). Mathematically, it is defined as<math>Cross(x_i) = \{ j \mid 1 \leq j < n\ and \ \min\{ x_j, x_{j+1}\} < x_i < \max\{ x_j, x_{j+1}\}\}</math>, for <math>1\leq i \leq n</math>. The oscillation(''Osc'') of X is just the total number of intersections, defined as <math>Osc(x) = \textstyle \sum_{i=1}^n \displaystyle\lVert Cross(x_i) \rVert</math>. <ref name=":0" /><br />
For sequence <math>X = \langle x_1, x_2, x_3, \dots ,x_n \rangle</math>, ''Cross''(''x<sub>i</sub>'') is defined as the number edges of the line plot of ''X'' that are intersected by a horizontal line through the point (''i, x<sub>i</sub>''). Mathematically, it is defined as <math>\mathit{Cross}(x_i) = \{j \mid \min\{ x_j, x_{j+1}\} < x_i < \max\{ x_j, x_{j+1}\} \text{ for } 1 \leq j < n \}\text{, for }1\leq i \leq n</math>. The oscillation(''Osc'') of X is just the total number of intersections, defined as <math>\mathit{Osc}(x) = \textstyle \sum_{i=1}^n \displaystyle\lVert \mathit{Cross}(x_i) \rVert</math>.<ref name=":0" />


=== Other measures ===
=== Other measures ===
Besides the original Osc measurement, other known measures include the number of inversions ''Inv'', the number of runs ''Runs'', the number of blocks ''Block'', and the measures ''Max'', ''Exc'' and ''Rem''. Most of these different measurements are related for adaptive heap sort. Some measures dominate the others: every Osc-optimal algorithm is Inv optimal and Runs optimal; every Inv-optimal algorithm is Max optimal; and every Block-optimal algorithm is Exc optimal and Rem optimal . <ref name=":2">{{Cite journal|last=Edelkamp|first=Stefan|last2=Elmasry|first2=Amr|last3=Katajainen|first3=Jyrki|date=2011|editor-last=Iliopoulos|editor-first=Costas S.|editor2-last=Smyth|editor2-first=William F.|title=Two Constant-Factor-Optimal Realizations of Adaptive Heapsort|url=https://link.springer.com/chapter/10.1007%2F978-3-642-25011-8_16|journal=Combinatorial Algorithms|series=Lecture Notes in Computer Science|language=en|publisher=Springer Berlin Heidelberg|pages=195–208|doi=10.1007/978-3-642-25011-8_16|isbn=9783642250118}}</ref>
Besides the original Osc measurement, other known measures include the number of inversions ''Inv'', the number of runs ''Runs'', the number of blocks ''Block'', and the measures ''Max'', ''Exc'' and ''Rem''. Most of these different measurements are related for adaptive heap sort. Some measures dominate the others: every Osc-optimal algorithm is Inv optimal and Runs optimal; every Inv-optimal algorithm is Max optimal; and every Block-optimal algorithm is Exc optimal and Rem optimal.<ref name=":2">{{Cite book|last1=Edelkamp|first1=Stefan|last2=Elmasry|first2=Amr|last3=Katajainen|first3=Jyrki|s2cid=10325857|date=2011|editor-last=Iliopoulos|editor-first=Costas S.|editor2-last=Smyth|editor2-first=William F.|chapter=Two Constant-Factor-Optimal Realizations of Adaptive Heapsort|title=Combinatorial Algorithms|volume=7056|series=Lecture Notes in Computer Science|publisher=Springer Berlin Heidelberg|pages=195–208|doi=10.1007/978-3-642-25011-8_16|isbn=9783642250118}}</ref>


== Algorithm ==
== Algorithm ==


Adaptive heap sort is a variant of heap sort that seeks optimality (asymptotically optimal) with respect to the lower bound derived with the measure of presortedness by taking advantage of the existing order in the data. In heap sort, for a data <math>X = <x_1, x_2, x_3, .... ,x_n></math> , we put all n elements into the heap and then keep extracting the maximum (or minimum) for n times. Since the time of each max-extraction action is the logarithmic in the size of the heap, the total running time of standard heap sort is [[Big O notation|O(n log n)]].<ref name=":1" /> For adaptive heap sort, instead of putting all the elements into the heap, only the possible maximums of the data (max-candidates) will be put into the heap so that fewer runs are required when each time we try to locate the maximum(or minimum). First, a [[Cartesian tree]] is built from the input in <math>O(n)</math> time by putting the data into a binary tree and making each node in the tree is greater(or smaller) than all its children nodes, and the root of the Cartesian tree is inserted into an empty binary heap. Then repeatedly extract the maximum from the binary heap, retrieve the maximum in the Cartesian tree, and add its left and right children (if any) which are themselves Cartesian trees, to the binary heap. If the input is already nearly sorted, the Cartesian trees will be very unbalanced, with few nodes having left and right children, resulting in the binary heap remaining small, and allowing the algorithm to sort more quickly than <math>O(n\log n)</math> for inputs that are already nearly sorted.<ref>{{Cite web|url=http://www.keithschwarz.com/interesting/code/?dir=cartesian-tree-sort|title=Archive of Interesting Code|website=www.keithschwarz.com|access-date=2019-10-31}}</ref><br />
Adaptive heap sort is a variant of heap sort that seeks optimality (asymptotically optimal) with respect to the lower bound derived with the measure of presortedness by taking advantage of the existing order in the data. In heap sort, for a data <math>X = \langle x_1, x_2, x_3, \dots ,x_n \rangle</math> , we put all n elements into the heap and then keep extracting the maximum (or minimum) for n times. Since the time of each max-extraction action is the logarithmic in the size of the heap, the total running time of standard heap sort is [[Big O notation|<math>\color{Blue} O(n \log n)</math>]].<ref name=":1" /> For adaptive heap sort, instead of putting all the elements into the heap, only the possible maximums of the data (max-candidates) will be put into the heap so that fewer runs are required when each time we try to locate the maximum (or minimum).
First, a [[Cartesian tree]] is built from the input in <math>O(n)</math> time by putting the data into a binary tree and making each node in the tree is greater(or smaller) than all its children nodes, and the root of the Cartesian tree is inserted into an empty binary heap. Then repeatedly extract the maximum from the binary heap, retrieve the maximum in the Cartesian tree, and add its left and right children (if any) which are themselves Cartesian trees, to the binary heap. If the input is already nearly sorted, the Cartesian trees will be very unbalanced, with few nodes having left and right children, resulting in the binary heap remaining small, and allowing the algorithm to sort more quickly than <math>O(n\log n)</math> for inputs that are already nearly sorted.<ref>{{Cite web|url=http://www.keithschwarz.com/interesting/code/?dir=cartesian-tree-sort|title=Archive of Interesting Code|website=www.keithschwarz.com|access-date=2019-10-31}}</ref>

Below is an implementation in pseudo-code:<ref name=":0" />

Input: an array of n elements that need to be sorted
Input: an array of n elements that need to be sorted
Construct the Cartesian tree ''l''(x)
Construct the Cartesian tree ''l''(''x'')
Insert the root of ''l''(x) into a heap
Insert the root of ''l''(''x'') into a heap
for i = from 1 to n
for i = from 1 to n
{
{
Perform ExtractMax on the heap
Perform ExtractMax on the heap
if the max element extracted has any children in ''l''(x)
if the max element extracted has any children in ''l''(''x'')
{
{
retrieve the children in ''l''(x)
retrieve the children in ''l''(''x'')
insert the children element into the heap
insert the children element into the heap
}
}
}
}<ref name=":0" />


== Drawbacks ==
== Drawbacks ==
Despite decades of researches, gap between the theory of adaptive sorting and the actual computing practice still exists, leading to several drawbacks of this algorithm. Due to the use of Cartesian trees and the pointer manipulations, problems regarding low cache efficiency and high memory requirements arose, deteriorate the performance of any implementation of adaptive heapsort.<ref name=":2" />
Despite decades of research, there's still a gap between the theory of adaptive heap sort and its practical use. Because the algorithm makes use of Cartesian trees and pointer manipulation, it has low cache-efficiency and high memory requirements, both of which deteriorate the performance of implementations.<ref name=":2" />

==See also==
* [[Adaptive sort]]
*[[Heapsort]]
*[[Cartesian tree]]


== References ==
== References ==
{{reflist}}
__FORCETOC__

[[Category:Comparison sorts]]
[[Category:Heaps (data structures)]]

Latest revision as of 09:12, 22 June 2024

In computer science, adaptive heap sort is a comparison-based sorting algorithm of the adaptive sort family. It is a variant of heap sort that performs better when the data contains existing order. Published by Christos Levcopoulos and Ola Petersson in 1992, the algorithm utilizes a new measure of presortedness, Osc, as the number of oscillations.[1] Instead of putting all the data into the heap as the traditional heap sort did, adaptive heap sort only take part of the data into the heap so that the run time will reduce significantly when the presortedness of the data is high.[1]

Heapsort

[edit]

Heap sort is a sorting algorithm that utilizes binary heap data structure. The method treats an array as a complete binary tree and builds up a Max-Heap/Min-Heap to achieve sorting.[2] It usually involves the following four steps.

  1. Build a Max-Heap(Min-Heap): put all the data into the heap so that all nodes are either greater than or equal (less than or equal to for Min-Heap) to each of its child nodes.
  2. Swap the first element of the heap with the last element of the heap.
  3. Remove the last element from the heap and put it at the end of the list. Adjust the heap so that the first element ends up at the right place in the heap.
  4. Repeat Step 2 and 3 until the heap has only one element. Put this last element at the end of the list and output the list. The data in the list will be sorted.

Below is a C/C++ implementation that builds up a Max-Heap and sorts the array after the heap is built.

/*
A C/C++ sample heap sort code that sort an array to an increasing order
*/

// A function that build up a max-heap binary tree
void heapify(int array[], int start, int end)
{
	int parent = start;
	int child = parent * 2 + 1;
	while (child <= end)
	{	if (child + 1 <= end) // when there are two child nodes
		{
			if (array[child + 1] > array[child])
			{
				child ++; //take the bigger child node
			}
				
		}
		if (array[parent] > array[child])
		{
			return; //if the parent node is greater, then it's already heapified
		}
		if (array[parent] < array[child]) // when child node is greater than parent node
		{
			swap (array[parent], array[child]); // switch parent and child node
			parent = child;
			child = child * 2 + 1; //continue the loop, compare the child node and its child nodes
			
		}
	}
}

// heap_sort function
void heap_sort (int array[], int len)
{
	for (int i = len/2 - 1; i >= 0; i--) //Step 1: build up the max-heap
	{
		heapify(array, i, len);
	}
	for (int i = len - 1; i >= 0; i--) //Step 4: repeat step 2 and 3 till finished
	{
		swap(array[0], array[i]); // Step 2: put the max at the end of the array
		heapify (array, 0, i-1); // Step 3: remove the max from the tree and heapify again
	}
}


int main()
{

	//the array that will be sorted
	int array[] = {42, 1283, 123, 654, 239847, 45, 97, 85, 763, 90, 770, 616, 328, 1444, 911, 315, 38, 5040, 1};
	int array_len = sizeof(array)/sizeof(*array); //length of the array

	heap_sort (array, array_len);

	return 0;
}

Measures of presortedness

[edit]

Measures of presortedness measures the existing order in a given sequence.[3] These measures of presortedness decides the number of data that will be put in to the heap during the sorting process as well as the lower bound of running time.[4]

Oscillations (Osc)

[edit]

For sequence , Cross(xi) is defined as the number edges of the line plot of X that are intersected by a horizontal line through the point (i, xi). Mathematically, it is defined as . The oscillation(Osc) of X is just the total number of intersections, defined as .[1]

Other measures

[edit]

Besides the original Osc measurement, other known measures include the number of inversions Inv, the number of runs Runs, the number of blocks Block, and the measures Max, Exc and Rem. Most of these different measurements are related for adaptive heap sort. Some measures dominate the others: every Osc-optimal algorithm is Inv optimal and Runs optimal; every Inv-optimal algorithm is Max optimal; and every Block-optimal algorithm is Exc optimal and Rem optimal.[4]

Algorithm

[edit]

Adaptive heap sort is a variant of heap sort that seeks optimality (asymptotically optimal) with respect to the lower bound derived with the measure of presortedness by taking advantage of the existing order in the data. In heap sort, for a data , we put all n elements into the heap and then keep extracting the maximum (or minimum) for n times. Since the time of each max-extraction action is the logarithmic in the size of the heap, the total running time of standard heap sort is .[2] For adaptive heap sort, instead of putting all the elements into the heap, only the possible maximums of the data (max-candidates) will be put into the heap so that fewer runs are required when each time we try to locate the maximum (or minimum).

First, a Cartesian tree is built from the input in time by putting the data into a binary tree and making each node in the tree is greater(or smaller) than all its children nodes, and the root of the Cartesian tree is inserted into an empty binary heap. Then repeatedly extract the maximum from the binary heap, retrieve the maximum in the Cartesian tree, and add its left and right children (if any) which are themselves Cartesian trees, to the binary heap. If the input is already nearly sorted, the Cartesian trees will be very unbalanced, with few nodes having left and right children, resulting in the binary heap remaining small, and allowing the algorithm to sort more quickly than for inputs that are already nearly sorted.[5]

Below is an implementation in pseudo-code:[1]

Input: an array of n elements that need to be sorted

Construct the Cartesian tree l(x)
Insert the root of l(x) into a heap

for i = from 1 to n
{
    Perform ExtractMax on the heap 
    if the max element extracted has any children in l(x)
    {
        retrieve the children in l(x)
        insert the children element into the heap
    }
}

Drawbacks

[edit]

Despite decades of research, there's still a gap between the theory of adaptive heap sort and its practical use. Because the algorithm makes use of Cartesian trees and pointer manipulation, it has low cache-efficiency and high memory requirements, both of which deteriorate the performance of implementations.[4]

See also

[edit]

References

[edit]
  1. ^ a b c d Levcopoulos, C.; Petersson, O. (1993-05-01). "Adaptive Heapsort". Journal of Algorithms. 14 (3): 395–413. doi:10.1006/jagm.1993.1021. ISSN 0196-6774.
  2. ^ a b Schaffer, R.; Sedgewick, R. (1993-07-01). "The Analysis of Heapsort". Journal of Algorithms. 15 (1): 76–100. doi:10.1006/jagm.1993.1031. ISSN 0196-6774.
  3. ^ Mannila, Heikki (April 1985). "Measures of Presortedness and Optimal Sorting Algorithms". IEEE Transactions on Computers. C-34 (4): 318–325. doi:10.1109/TC.1985.5009382. ISSN 0018-9340.
  4. ^ a b c Edelkamp, Stefan; Elmasry, Amr; Katajainen, Jyrki (2011). "Two Constant-Factor-Optimal Realizations of Adaptive Heapsort". In Iliopoulos, Costas S.; Smyth, William F. (eds.). Combinatorial Algorithms. Lecture Notes in Computer Science. Vol. 7056. Springer Berlin Heidelberg. pp. 195–208. doi:10.1007/978-3-642-25011-8_16. ISBN 9783642250118. S2CID 10325857.
  5. ^ "Archive of Interesting Code". www.keithschwarz.com. Retrieved 2019-10-31.