Cyclesort - A Curious Little Sorting Algorithm
Cyclesort - A Curious Little Sorting Algorithm
Cyclesort - A Curious Little Sorting Algorithm
http://www.reddit.com/r/programming/comments...
MY SUBREDDITS
FRONT - ALL - RANDOM | PICS - FUNNY - POLITICS - GAMING - ASKREDDIT - WORLDNEWS - VIDEOS - IAMA - TODAYILEARNED MORE
PROGRAMMING
comments
related
200
search reddit
this post was submitted on 22 Nov 2010
all 83 comments
sorted by: best
[] [deleted] 12 points 2 years ago
Don't confuse a secondary source with a primary source when critiquing the algorithm. See http://comjnl.oxfordjournals.org/content /33/4/365.full.pdf+html - of note, the original paper gives some applications of the algorithm.
permalink [] zahlenwang 28 points 2 years ago*
username
remember me
password
reset password
login
edit2: I apologize for comment-jacking my own comment and royally messing up the comment structure with programming repeated comments and edits, but here goes. The blog subscribe 430,810 readers article presents cycle sort as if it were an elaborate way to sort the integers [0 .. n]; it is more useful than that, 544 users here now and you can read the original 3 page paper by Haddon for accurate information. The Wikipedia article contains a /r/programming is a reddit for discussion and news about computer programming very similar algorithm (does not assume a key function, runs in quadratic time), and is clearly and concisely written, so you should read that too. You can disregard Girls Who Code everything else I've said in these comments, my comments below have no additional information.
hackNY
While the article makes cycle sort sound like a hilariously round-about way of sorting the integers [0 .. n], reading Guidelines the Wikipedia article clarifies the matter. The Please try to keep submissions on topic and of <s>actual</s> Wikipedia cycle sort algorithm does not use the magic indexing function, and runs in O(n ) time. Its advantage is that it is an in-place algorithm with O(1) additional needed memory that performs the minimum possible number of writes, i.e. one write for every element that does not start in the correct position. Also: The cyclesort_simple algorithm only works on permutations of sets of numbers ranging from 0 to n. There are other fast ways to sort data of this restricted kind, but all the methods I know of require additional memory proportional to n.
def sort_permutation(l): return range(0, len(l)) 2
high quality.
Just because it has a computer in it doesn't make it programming. Memes and image macros are not acceptable forms of content. If there is no code in your link, it probably doesn't belong here. App demos should include code and/or architecture discussion. Please follow proper reddiquette. Info Do you have a question? Check out /r/learnprogramming, /r/cscareerquestions, or stackoverflow. For posting job listings, please visit /r/forhire or /r/jobbit.
or def sort_permutation(l): for i in xrange(0, len(l)): l[i] = i depending on your definition of "additional memory".
1 of 12
03/27/2013 09:45 PM
http://www.reddit.com/r/programming/comments...
Check out our faq. It could use some updating. If you're an all-star hacker (or even just beginning), why not join the discussion at /r/redditdev and steal our reddit code! Related reddits /r/technology /r/learnprogramming /r/coding /r/compsci /r/netsec /r/webdev /r/web_design /r/gamedev /r/cscareerquestions /r/reverseengineering /r/startups /r/techsupport Specific languages
created by spez a community for 7 years
message the moderators
edit: At emef's suggestion I've now read the original paper cited by Wikipedia and the blog article. The algorithm listed in Wikipedia originated from a stackoverflow post; the original algorithm is indeed O(n), and assumes the existence of a key function -- specifically it is for sorting data associated to the integers [0 .. n] with possible duplicates. It would be reasonable to use "cycle sort" to refer to the original paper, but that leaves the question of what name to give to the algorithm in the Wikipedia article on cycle sort.
permalink [] fonkprop 5 points 2 years ago*
Jesus. Over on Hacker News, this same article has an interesting and wide-ranging discussion and the consensus is that cycle-sort is O(n). Over here on Reddit, the total contribution is "But you could just generate the sequence! Derp." You totally miss the point, morans. Nobody is actually going to sort the sequence of numbers range(0, n) with this. It's useful if there is data associated with each key.
permalink parent [] zahlenwang -1 points 2 years ago*
MODERATORS ketralnis spez Poromenos tryx dons masta kylev chromakode a_redditor
If there is data associated to these integers, we still can easily construct the sorted list, as others have pointed out. For example, def sort_permutation_with_data(l): s = [None] * len(l) for i in xrange(0, len(l)): k, v = l[i] s[k] = v return s print (sort_permutation_with_data([(1, 'b'), (2, 'c'), (0, 'a'), (3, 'd')])) Our point is not that cycle sort is a bad algorithm; it has its uses, as I said. Our point is that the author doesn't appear to know what he or she is talking about. The author never mentions data being associated to the integers, or that there exist easier algorithms (and "all the methods I know..." admits that s/he doesn't know of the easier algorithms). That and the "in-place" apply_cycle function makes the author look a little muddled. Although someone who already understands permutations and cycles and sorting could easily fill in the gaps in the author's understanding, if one's goal were to give an introduction to cycle sort aimed at the novice, then surely there are better explanations available -- explanations that carefully explain the differences between sorting integers with or without associated data, using cycle sort on data sets that lack a indexing function, and the situations in which cycle sort might be preferable over simpler algorithms.
...and 1 more
2 of 12
03/27/2013 09:45 PM
http://www.reddit.com/r/programming/comments...
edit: I did not intend to suggest that sort_permutation_with_data uses O(1) additional space. Clearly it requires O(n) additional space.
permalink parent [] zohebv 6 points 2 years ago
I am familiar with cycle sort. In fact, I ended up discovering it myself when trying to prove that "number of swaps required to change one permutation to another is always odd or always even". In fact, this cyclic pattern can be a handy tool when analyzing permutations. This is exactly what the author of the article is talking about. He is perfectly lucid and understands everything that he is talking about. Our point is that the author doesn't appear to know what he or she is talking about. False The author never mentions data being associated to the integers His entire discussion about the key function is exactly about this association. and "all the methods I know..." admits that s/he doesn't know of the easier algorithms I don't even .... That and the "in-place" apply_cycle function makes the author look a little muddled. Seriously, he is explaining things one logical step at a time. Although someone who already understands permutations and cycles and sorting could easily fill in the gaps in the author's understanding WTF! The fun thing about cycle sort is that it uses the not very obvious property that any permutation can be decomposed into a unique set of disjoint cycles, as the author has explained. This is what he is so keen on demonstrating and using it to do the O(1) space sort. situations in which cycle sort might be preferable over simpler algorithms. This is isn't about applicable situations. You need to appreciate the pattern that all permutations are just a bunch of cyclic rotations. I certainly never thought of permutations that way until I attempted to solve the "swap" problem.
permalink parent [] zahlenwang -1 points 2 years ago
His entire discussion about the key function is exactly about this association. Throughout the article he only sorts lists of integers (or other small sets), and never associates values to these integers. Thus the examples the author presents can be trivially solved without cycle sort. The word "key" used by the author refers to the relationship between the integer and its index in the sorted array (which is the identity when there is only one of each integer). That and the "in-place" apply_cycle function makes the author look a little muddled. Seriously, he is explaining things one logical step at a time. Sorry, I wasn't clear; what I'm referring to is that the author says the apply_cycle function is in-place when it explicitly is not.
permalink parent [] zohebv 2 points 2 years ago
The word "key" used by the author refers to the relationship between the integer and its index in the sorted array (which is the identity when there is only one of each integer). He explains that it is clearly a special case. The way you use the key function to associate data is this. You have two fields in the object. The integer and The data associated with the integer. In this case, the key function takes the object as parameter
3 of 12
03/27/2013 09:45 PM
http://www.reddit.com/r/programming/comments...
and returns the integer field in the object. As simple as that. The key function is a more general way of specifying that data be associated with an integer. For example the key could be a one-way function. what I'm referring to is that the author says the apply_cycle function is in-place when it explicitly is not. I am not familiar with python. What he seems to be doing is destructively update the array that is passed in. In most computer languages this is called an in-place update.
permalink parent [] fonkprop 2 points 2 years ago
You just don't really understand what you've read, do you? Your proposed replacement algorithm has double the memory requirement of cycle-sort, so it loses what makes the algorithm interesting. This is the whole point: with cycle-sort we can do it in-place. Anyway, I'll let you guys get back to upvoting whichever commenter has the same ignorant misconception you have - move along!
permalink parent [] zahlenwang -1 points 2 years ago*
You and I both well understand the differences and relative merits of the various sorting algorithms. (Yes, in the circumstance that you need to sort a collection of data associated to the integers [0 .. n] with O(1) additional memory and with the minimum number of writes, you should use cycle sort.) My claim is that the author of the article either doesn't understand or does a rather poor job of explaining things. edit: In the Shyamalian event that you are the author of the article, then since you do understand cycle sort, I'd propose rewriting the article to reflect that fact. Of course, that's not really my business, it's the author's blog to do with as he or she pleases. If you aren't the author please disregard.
permalink parent [] fonkprop 3 points 2 years ago*
The author seems to understand things just fine. The same can't be said for you everything you've said about this has just been flat-out wrong. The "actual cycle sort" does use the equivalent of a "key" function because that's how the algorithm works you just don't understand it. You think you can replace cyclesort by "generating a range", but really, you just don't understand what it does. You think you can replace cyclesort with your obtuse algorithm above, but, really, you just don't understand its basic characteristics. Basically, not one damn thing you've said has been correct or useful. And yet your contribution is the top comment. Ever wonder why proggit has degenerated to the cesspool of ignorance and despair that it is now?
permalink parent [] zahlenwang 1 point 2 years ago*
everything you've said about this has just been flat-out wrong I can see how my comments give the impression that sort_permutation_with_data uses O(1) additional space; I did not mean to suggest that. Obviously it requires O(n) additional space. Other than that, I don't think I've said anything false, unless I overlooked where the author discusses data associated with keys. The "actual cycle sort" does use the equivalent of a "key" function <s>By "actual cycle sort" I refer to the O(n ) algorithm discussed under "Algorithm" on the Wikipedia page. That algorithm only uses comparison of elements. (Wikipedia does elaborate under "Specific-situation optimizations" that the algorithm can be improved to O(n) if a hash function is available.) I haven't
2
4 of 12
03/27/2013 09:45 PM
http://www.reddit.com/r/programming/comments...
read the original paper by Haddon, perhaps the algorithm discussed there is different from what I thought.</s> You think you can replace cyclesort In nearly all cases, including the cases the article discusses, cycle sort can be replaced by a simpler or more general algorithm. As I've said, there do exist cases where cycle sort is preferable. edit: your contribution A naive reader quickly looking at the blog article may get the impression that cycle sort can only be used for sorting the integers [0 .. n] (with possible repeats). The purpose of my original remark was to clarify that cycle sort is, in fact, actually useful in certain circumstances. edit2: see edit to my root post regarding which algorithm is the actual cycle sort.
permalink parent [] emef 3 points 2 years ago
Good call. The wikipedia algorithm is not that in the original paper.
permalink parent [] fonkprop -3 points 2 years ago
Right. So your entire idiotic critique is based on the fact that you a) didn't understand the article, and b) you didn't refer to the original paper linked to right in the opening fucking paragraph. Instead, you thought you'd try to look clever by parroting the Wikipedia page, little knowing that it is, in fact, wrong. Priceless. I have a better idea: why don't you just refrain from commenting on things you don't know the first fucking thing about?
permalink parent [] tias 1 point 2 years ago
No matter who is right, I know you're the one being an ass about it.
permalink parent [] zahlenwang -1 points 2 years ago
I wrongly assumed that the Wikipedia article is more trustworthy than a poorly written blog post. Now that I've read the original paper, I see that the blog is more faithful to the original paper than the Wikipedia article is (although that raises the question of what name to call the algorithm that appears on Wikipedia). I am sorry about the confusion that has come from my using the name "cycle sort" for a slightly different algorithm. Other than the question of what name to give to what algorithm, my critique of the blog article stands unchanged. Fundamentally, it comes down to the fact that the blog never discusses associating data to keys, which is rather essential to the algorithm being of use.
permalink parent [+] fonkprop comment score below threshold (2 children) [] cojoco -2 points 2 years ago
morans
5 of 12
03/27/2013 09:45 PM
http://www.reddit.com/r/programming/comments...
Yes, that's what I thought. By the way, we're all talking about a sorting algorithm in which the final index of every data element is incorporated into every key I think anyone would have a hard time writing a sort for this kind of data which was worse than O(n)
permalink parent [] Nebu 1 point 2 years ago
I think anyone would have a hard time writing a sort for this kind of data which was worse than O(n) I bet I could write a sort that's a hundred times worse than O(n).
permalink parent [] cojoco 1 point 2 years ago
Oops, sorry, I was feeling a bit defensive after all of the downvotes.
permalink [] zohebv 2 points 2 years ago parent
The blog article presents cycle sort as if it were an elaborate way to sort the integers [0 .. n]; it is more useful than that, This is getting really infuriating! The blog post is perfectly well written. You did not follow it or the usage of the key function. The author first explains a very interesting fact about permutations, then uses it do demonstrate an in-place sort which is the most obvious application. The Wikipedia article contains a very similar algorithm (does not assume a key function, runs in quadratic time) So it looks like you did not follow the wikipedia article either. Notice this sentence - "a constant-time perfect hash function can greatly speed up finding where to put an item1, turning the sort from (n2) time to (n + k) time" What is this constant-time perfect hash function?! Precisely, the key function! Your inability to follow the blog post or map the key function to the perfect-hash fn is not the fault of the author. A certain amount of intelligence needs to be used when reading and insisting that the author should use the exact phraseology from wikipedia is ridiculous. If you want to make a comment about something just do it. Stop shitting on the author of the blog post who is obviously several notches smarter than you and has put up a well written article and understands the topic way better than you do, even after 100s of comments, corrections and edits.
permalink parent [] hobbified 1 point 2 years ago*
Your inability to follow the blog post or map the key function to the perfect-hash fn is not the fault of the author. I think you're missing the point entirely. That's not at issue. What is is the fact that the blog post never mentions the general O(n algorithm, only the O(n + m) one that arises from it.
permalink parent [] zohebv 3 points 2 years ago 2)
6 of 12
03/27/2013 09:45 PM
http://www.reddit.com/r/programming/comments...
Why should the article by the author be an exact replica of the wikipedia article? If we cannot write a decent O(1) key function, but need an O(n) key function, obviously the algorithm will be O(n The algorithm is only interesting when we have a fast key function and this is what the author described. Instead, the comments here are childish - "He he, we can't sort faster than nlogn, Hey why doesn't this fool just print out 1..n directly, the author doesn't know what he is talking about". This is just ridiculous. Sure, he didn't mention that - if the key function isn't O(1) as expected the complexity of the algo is not O(n). This was because the focus of the post was on demonstrating O(1) space complexity and the reduction of a permute action to a rotation of disjoint sets of objects.
permalink parent [] zahlenwang 0 points 2 years ago 2).
I disagree (though I originally was conflating the Wikipedia algorithm with the Haddon algorithm). While it would have been nice for the blog post to discuss the quadratic algorithm, it does not appear in the Haddon paper, so its omission is not a big deal (or at least, I personally don't think so).
permalink parent [] zahlenwang 1 point 2 years ago*
Notice this sentence - "a constant-time perfect hash function can greatly speed up finding where to put an item1, turning the sort from (n2) time to (n + k) time" I am responsible for some confusion here. I stated elsewhere (very deeply buried of course) that when I refer to the Wikipedia algorithm, I mean the O(n ) one in the Algorithm section -- that runs in quadratic time and requires no key function. The O(n) specialization given in the following section is identical to cycle sort as originally presented by Haddon, and requires a key function. I have come across as far harsher against the original blog post than I had originally intended. It is overall an okay presentation of a cycle sort. However all the back-and-forth has made things rather ugly.
permalink parent [] Doctor -3 points 2 years ago 2
Yeah, that was my response too. It's quick to sort integers when each integer is its own sorted index? Well, duh.
permalink parent [] basscadet 3 points 2 years ago
produces: 3 0 1 2 and /:~ 3 4 5 1 gives 1 3 4 5 ~ makes the /: (grade is the name of that) reflexive, meaning "use the cycle generated for that list to order the list"
permalink [] euneirophrenia 10 points 2 years ago
Here's a Python function that applies a cycle to a list in-place: def apply_cycle(lst, c): # Extract the cycle's values vals = [lst[i] for i in c] # Rotate them circularly by one position vals = [vals[-1]] + vals[:-1] # Re-insert them into the list for i, offset in enumerate(c): lst[offset] = vals[i] This is the exact opposite of in-place
permalink
7 of 12
03/27/2013 09:45 PM
http://www.reddit.com/r/programming/comments...
XOR swap is a neat trick to show an intro CS class, but I'd be surprised to ever see it in practice. The wiki article you linked sums up the reasons nicely Most modern compilers can optimize away the temporary variable in the naive swap, in which case the naive swap uses the same amount of memory and the same number of registers as the XOR swap and is at least as fast, and often faster.[2] The XOR swap is also much less readable, and can be completely opaque to anyone who isn't already familiar with the technique. On modern CPU architectures, the XOR technique is considerably slower than using a temporary variable to do swapping. One reason is that modern CPUs strive to execute commands in parallel via instruction pipelines. In the XOR technique, the inputs to each operation depend on the results of the previous operation, so they must be executed in strictly sequential order. If efficiency is of tremendous concern, it is advised to test the speeds of both the XOR technique and temporary variable swapping on the target architecture
permalink parent [] geon 19 points 2 years ago
If you know the data is a continuos range [0..n], you can simply generate that range... If I do that lazily, it's O(1).
permalink [] yatima2975 16 points 2 years ago
I was wondering whether I was missing something, because of this exact point! Could it also sort permutations of subsets of [0..n]? Nope! But then I looked at this little snippet of code:
def key(element): return element
And then it struck me: this is not useful for sorting the numbers themselves, but when there is also satellite data/values associated to them. In that case, just generating the range is not enough, since you also need to go through the original array again to find the satellite data.
permalink parent [] geon -1 points 2 years ago
Yes. But in that case, this is a really stupid data structure to begin with, and should be converted to an array, which is a much simpler task than to sort it.
permalink parent [] bluGill 3 points 2 years ago
What if the data is far too big to fit in memory, so it lives only on tape (or disk, but tape is worse). Further, lets assume that sorted read doesn't need to happen often, so insertion sort isn't worth the bother (perhaps 90% of access is for one of the 10 previous records). However once in a while (before year end inventory?) you need to read everything in sorted order. In this case you need a sort that can quickly get things moved around - but your access time to the data is the critical factor in speed, so you need to minimize that.
8 of 12
03/27/2013 09:45 PM
http://www.reddit.com/r/programming/comments...
Good point.
permalink parent [] jennyther 5 points 2 years ago
and the first step in converting it to an array would be what exactly? Not sorting it into order surely?
permalink parent [] geon -1 points 2 years ago
The datastructure described would be an association list. If the keys are all the integers of a range [0, n], it makes sense to instead use an array, where the key of the association list is used as the index. If I wouldn't do it this way, the key would be a completely pointless detail. Pseudocode:
foreach(someList as element) someArray[element.key] = element.value
After this conversion, the array would be sorted. (Unless you do it in PHP, where hashtables and arrays are almost-but-not-really the same thing.)
permalink parent [] durandal42 4 points 2 years ago
Indeed. The author goes on to claim: The Haddon paper presents a solution for one common case: permutations whose elements come from a relatively small set, where the number of occurances of each element is known. In this case too, you can simply generate the output directly.
permalink parent [] MarshallBanana 27 points 2 years ago
Nobody sorts lists of just integers. You usually sort a list of data records, that have some field which you sort by. That field can be an integer that fulfills the requirements to use cyclesort.
permalink parent [] piderman 2 points 2 years ago
The thing is though, for cyclesort to work you have to know in advance the occurrence and the order of the integers you use since you use the keys themselves to lookup their place in the sequence. So if you get some random data set from a database with indexes to sort on, you first have to sort the integers before you can use cyclesort. Cycles are fun but this algorithm is worse than useless.
permalink parent [] MarshallBanana 7 points 2 years ago
You don't, if the range of the integers is small enough. As the article explains.
permalink parent [] DoorsofPerceptron 1 point 2 years ago
You can have 'holes' in the cycles, or point to numbers that don't have a value in them. This means that, if you know you have integers you can make a single pass, pick the minima and maxima and then cycle sort within a larger array of size min - max. This means the algorithm is weakly linear.
permalink parent
9 of 12
03/27/2013 09:45 PM
http://www.reddit.com/r/programming/comments...
You could just go through the list and count how many there is of each key. It wouldn't hurt the time complexity. Although at that point you are basically just reinventing bucket sort.
permalink parent [] MarshallBanana 6 points 2 years ago
Even for the part there you count the occurances of every key?
permalink parent [] MarshallBanana 3 points 2 years ago
If the range of keys is less than the size of the sorted array, yes.
permalink [] [deleted] 2 years ago parent
[deleted]
permalink parent [] dnew 1 point 2 years ago
Assuming this is an actual question, it's because integers by themselves are already sorted. If you want to sort the integers from one to five, the result is 1, 2, 3, 4, 5 regardless of what order they came in.
permalink parent [] fonkprop 2 points 2 years ago
Rest easy, my poor baffled friend - nobody is actually going to sort a continuous range [0..n]. This type of algorithm is used when there's data associated with each key.
permalink parent [] cojoco 0 points 2 years ago
True, but any other algorithm won't stay O(n), when you start evaluating it.
permalink parent [] cojoco 1 point 2 years ago
True, but any other algorithm won't stay O(n), when you start evaluating it. Yowsers! What's the point? Downvotes for pointing out that your statement was trivial, and more downvotes when you trivialize me pointing out that what you said was trivial!
permalink parent [] stlunatic102 1 point 2 years ago
10 of 12
03/27/2013 09:45 PM
http://www.reddit.com/r/programming/comments...
Upvoted for being the best troll algorithm all year. If it's not a troll algorithm, upvoted for being the best non-troll algorithm all year.
permalink [] spyne-02139 -1 points 2 years ago
The author would be more clear, if in the third paragraph he/she notes that a cycle is a separate entity applied to a list and not merely a property of the list.
permalink [] pkrecker -2 points 2 years ago
I'm pretty sure that this is blog post is a fail. It begins when he doesn't really describe the concept of a cycle (2 examples but no concrete notation?) and continues until the end when you realize that this guy has just written a massive blog post on something that is old, useless and not any more interesting than the other linear time sorting algorithms out there.
permalink [] noumuon 3 points 2 years ago
he does do a moderate job of describing a cycle. examples are really the only way to get across the idea, as there isn't any particular notation to fall back on (as there would be with, for example, group action of an additive group of integers mod n on some other set).
permalink parent [] pkrecker 1 point 2 years ago
I think the original article at http://comjnl.oxfordjournals.org/content/33/4/365.full.pdf+html does a good job of explaining cycles for the purpose of this algorithm
permalink parent [] pkrecker 1 point 2 years ago
OK fine I deserve the downvotes for being too harsh. But I still think the original paper does a better job of explaining this than the blog post.
permalink parent [] [deleted] 2 years ago*
[deleted]
permalink [] time_circuits 10 points 2 years ago
Impossible for comparison sorts, yes. Sorting by tallying, etc. can be achieved in linear time.
permalink parent [] rabidgnat 7 points 2 years ago
It's impossible to do a comparison-based sort better than O(n log(n)). If you don't use comparisons, sometimes you can do better. When your items almost covers all possible items, bucket sort is O(n). Bead sort has a theoretical complexity of O(1), and real-world complexity of O(sqrt(n)).
permalink parent [+] Mikle comment score below threshold (0 children) [] theresistor 4 points 2 years ago
It's impossible for a comparisons-only sort to be done in less than O(n log n). This algorithm is only linear for data sets that have additional constraints that would require more comparisons to verify, and thus pull it up to O(n log n).
permalink parent [] ethraax 5 points 2 years ago
11 of 12
03/27/2013 09:45 PM
http://www.reddit.com/r/programming/comments...
Radix sort and bucket sort are both sorting algorithms with linear time complexity. They don't exclusively use comparisons, so that limit (which is true of comparison-based sorts) does not apply.
permalink parent
about
blog about team source code advertise
help
wiki FAQ reddiquette rules contact us
tools
mobile firefox extension chrome extension buttons widget
<3
reddit gold store redditgifts reddit.tv radio reddit
Use of this site constitutes acceptance of our User Agreement and Privacy Policy. 2013 reddit inc. All rights reserved. REDDIT and the ALIEN Logo are registered trademarks of reddit inc.
12 of 12
03/27/2013 09:45 PM