Sorting is an essential part of our everyday lives. Because of this importance, many people have devoted much time towards improving the performance of different sorting procedures. Previously, I wrote a script comparing many of these algorithms. I decided to break that single script up into several scripts (one for each algorithm) so that the individual algorithms are easier to access and learn more about.

**BubbleSort**

When I first learned about BubbleSort, I thought of being in elementary school and how my teacher would arrange students in line by increasing height. The way she did this was to have us begin by standing in line (in arbitrary order). Then she would start at the beginning of the line and compare the height of each student to that of the student behind him. If the student in front was taller, then she’d have the two students change places. She continued doing this until she reached the back of the line, taking note of whether or not two students had moved or not. If a swap had been made, she would go back to the beginning of the line and start again.

What my teacher was doing (knowingly or unknowingly) was executing the BubbleSort algorithm. A more formal pseudocode version of this algorithm is listed below:

BubbleSort(A, start, end):

swapped <-
false
while (swapped is false)
for (i = start to end)
if (A[i] > A[i+1]) then

temp <-
A[i]
A[i] <- A[i+1]
A[i+1] <- temp
swapped <- true
end if
end for
end while
end BubbleSortHere is an execution of the BubbleSort algorithm.

**SelectionSort**

One of the more common ways that we do sorting in real life is through something similar to selection sort. Suppose we have a list of graded exams and want to have them in descending order before we pass them back to the students. The idea behind selection sort is to search for the largest element in the list, or in this case the highest test score, move that element to the beginning of the list, and then repeat the same procedure on the remaining portion of the list.

Here is a more formal pseudocode for the SelectionSort algorithm:

SelectionSort(A, start, end):

for (i = start to end)

min <-
i
for (j = i+1 to end)
if (A[j] < A[min])
min <- j
end if
end fortemp <- A[i]
A[i] <- A[min]
A[min] <- temp
end for
end SelectionSortHere is an execution of the SelectionSort algorithm

**InsertionSort**

If we think of the same set of (unsorted) graded exams as in the SelectionSort description that we wish to have them in descending order before we pass them back to the students, an alternative way to sort the exams is to have more of a “sort-as-you-go” type of a procedure where each test is only compared to an already sorted subset of the exams. The way this is carried out is to divide the exams into two piles. Originally, one pile (call it Pile 1) is empty and the other pile (Pile 2) contains the entire set of exams. We take the exam from the top of Pile 2 and compare it to the exams in Pile 1 and place it in the proper location in this pile (removing it from Pile 1). We continue this until Pile 2 is empty, in which case Pile 1 will have all the exams and the set of exams will be sorted.

Here is the pseudocode for the InsertionSort algorithm:

InsertionSort(A, start, end):

for (i = start to end)

val <-
A[i]
j <- iwhile (A[j - 1] > val)

A[j] <-
A[j-1]
j <- j - 1
end whileA[j] <- val
end for
end InsertionSortHere is an execution of the InsertionSort algorithm

**MergeSort**

MergeSort is based on a simple “Divide and Conquer” approach. The principle behind it is simple: We want to sort an entire list, but if this entire list is sorted, then each half of this list must also be sorted. It is an easier problem to sort a smaller list than a larger list, so we sort each of the smaller lists. Once these two lists are sorted, we run a merge algorithm which combines the two smaller lists into one larger list. These smaller lists are then sorted using the same principle until our smaller lists are of size 1 or 2, in which case we can simply swap the two items or return the item itself.

The algorithm itself is similar to the hierarchy at work.

The boss is given a job – “sort these 8 documents in decreasing order”.

Boss divides the list into two sets of size 8 and gets two subordinates and says “hey you two, each of you take one of these lists and sort it and get it back to me”.

Each of subordinates then divide their lists into two distinct lists of size 4 and gets two sub-subordinates to sort these smaller lists.

The sub-subordinates then divide their lists into two distinct lists of size 2 and gets two sub-sub-subordinates to sort these smaller lists.

The sub-sub-subordinates now have lists of size 2 which are easy to sort, if they’re ordered, then they say “its already sorted”; if they’re not ordered, they swap the elements and say “here you go”.

Now that the lists of size 2 have been sorted, the sub-subordinates need to combine them into a list of size 4. This can be done by simply reading both lists and taking the maximum element from each list.

Similarly, the subordinates can just combine the lists of size 4 into lists of size 8 using that same principle.

Finally the boss can take the two lists of size 8 and combine it into a sorted list of size 16 using that same principle.

Here is the pseudocode for the MergeSort Algorithm:

MergeSort(A, start, end)

if ((end – start + 1) = 1)

return A

end if

mid <- (end - start) / 2left <- {A[start], A[start+1], ... A[mid]} right <- {A[mid+1], A[mid+2], ... A[end]left <- MergeSort(left, 0, mid) right <- MergeSort(right, 0, end-mid)return Merge(left, right) end MergeSortHere is an execution of the MergeSort algorithm

**QuickSort**

QuickSort is considered the best sorting algorithm among the ones listed here. Like MergeSort, it operates under a divide and conquer principle. Unlike MergeSort, however, the question of where to divide the list (known as the pivot element) and the recursive calls are a bit more complex decisions.

The algorithm works by choosing a pivot element in the list (by default we can let this be the middle element in the list, but there are more complex variations that help decide what this should be), and then reordering the list so that the elements to the left (call this sub-list “small”) of the pivot element are all less than the pivot element, and the elements to the right (call this sub-list “big”) of the pivot element are all greater than the pivot element. So after the swap part of quicksort, the largest element of “small” will be less than the smallest element of “big”. We then need to check on whether the sub-lists “small” and “big” need to be sorted, which is true if we have not considered those regions yet. In such a case, we call the procedure Quicksort on each of the sub-lists.

Here is the pseudocode for the QuickSort Algorithm:

QuickSort(A, left, right)

i <-
left
j <- right
pivot <- A[floor((left + right) / 2)]
while (i <= j)
while (A[i] < pivot)
i <- i+1
end while
while (A[j] > pivot)

j <-
j-1
end while
if (i <= j)
tmp <- A[i]
A[i] <- A[j]
A[j] <- tmp
i <- i+1
j <- j-1
end if
end while
if (left < j)
A <- QuickSort(A, left, j)
end if
if (i < right)
A <- QuickSort(A, i, right)
end ifreturn A;
end QuickSortHere is an execution of the QuickSort algorithm

Aaron:Don’t bother. qsort() is going to be culkny and slow because compilers can’t optimize out the very painful indirect call (read pipeline stall) through the comparator.True, but you can get rid of it in your own implementation.I’m not sure why you want an implementation that looks “clean.” It’s not like you actually need to look at and review the sorting algorithm itself on a regular basis. Real sorting algorithm implementations aren’t clean, because there are a lot of complications. There’s a lot of special cases you need to watch out for in a real implementation, and each of these will require adding various “ugly” bits in.That’s where you’re wrong, there are many cases where you need to specialize the sorting. Say for example you have multiple arrays where all your data is associated with each other. Perhaps one array contains first names, another last names, another ID numbers and so on. These should be in a structure, but for whatever reason, they are independent arrays. If you want to sort them, you need to modify your quick sort to check just one of these arrays for order, but for the swapping part to swap in each of these arrays. Due to this specialization needed at times, one should have a clean quick qsort handy which they can modify as needed.It also means we need to allocate memory, either with C99 VLAs (poorly supported), alloca() (non-standard), or malloc/new (might fail).Not so.Take this for example:static void qsort_internal(char *begin, char *end, size_t size, int(*compar)(const void *, const void *)){ if (begin < end) { char *pivot = partition(begin, end, size, compar); qsort_internal(begin, pivot, size, compar); qsort_internal(pivot+size, end, size, compar); }}Can be replaced with:static void qsort_internal(char *begin, char *end, size_t size, int(*compar)(const void *, const void *)){ while (begin < end) { char *pivot = partition(begin, end, size, compar); qsort_internal(begin, pivot, size, compar); begin = pivot+size; }}And once we reach this point, the other qsort call can be passed off to another CPU thus parallelizing the code making it faster on dual core / dual CPU machines, or replaced with an internal stack of set size. Once the looping mechanism is in place, a static array of log(sizeof(maximum_array)) is all that is needed, meaning you can do stack_t stack[32]; at the top of your qsort() for a machine limited to 4GB of memory.Another issue with simple implementations is that what you probably want in a high quality implementation is really an 'introspective sort;' something that will fall back to a heapsort when qsort is taking too long. This eliminates a very serious failure mode, one that can lead to security problems, at only the cost of increased code size.I mentioned that in my article.Well, in other words, I have a suspicion that you're going to have serious difficult beating a high quality qsort() implementation, which has been tuned for many man-hours by experts, but any modest improvements you make will be negated by the poor comparator performance. As an example of such a high-quality qsort(), see glibc's qsort().I already beat several high quality qsort() implementations which is a story for a different time. glibc's is not quite the best out there. Here's results of a test I ran:Sun 406000Microsoft 448000AT&T 589000glibc 590000Insane 594000IBM 603000BSD 1993 809000Syslinux 1342000uClibc 1547000BSD 1990 2006000BSD 1983 2277000Simple 3973000diet libc 10809000Simple is the one I offered above, which beat out diet libc's. Insane is what I'm currently working on. I would like more feedback from experts to get some new ideas so I can tune my ideas better and beat even Sun's. I've already employed several techniques which I will discuss in a later article to get around limitations your straight forward high quality qsort()s run into.C++ offers many advantages here, which you may or mat be interested in. For example, C++ std::sort() is mostly the same as qsort(), but eliminates the comparator problem, as well as the type safety issue. It is difficult to make a qsort implementation, in C or C++, which will beat a high quality compiler's std::sort().Yes this I know, std::sort() is blazingly fast, and I use it in C++, it's not an option in C though, and those programs and programmers still exist. There are also times where you need specialization like I explained above.Alternately, a lot of people don't realize that a radix sort beats the socks off of qsort in most real cases. The reason you don't see it as much is that its arguably harder to generalize, as its not comparison-based. The main advantage is that radix sort is O(n) average and worst case, whereas qsort is O(nlogn) average and O(n**2) worst case. However, radix sort can be applied to most data sets that a comparison-based sort can. The difficulty in generalizing the radix sort is a serious problem though, because radix sort can get just as hairy as qsort.I may discuss radix sort more a different time.Anyway, real problems are difficult, and have complicated solutions; this requires complicated and complex code. Simple solutions look good on paper, but not always so good in the real world.Yes I know, I've spent much time working on my own implementations of qsort() and studying other ones, and reading the existing papers on the subject. What I want is to see others who have done similar work to elaborate on what conclusions they came to. I myself plan on writing a future article about some optimizations I came up with, and perhaps provide a new extremely fast, but clean and easy to modify implementation.