By now you might be wondering if there are any sorting algorithms that do better than O(n²). Yes, there are. One of the most elegant implementations of a sorting algorithm is merge sort, which I will discuss now.
Merge sort is a recursive algorithm. If you still do not understand recursion, you had best go back to the chapter on that topic and study it (again), so that you can follow the discussion.
The idea behind merge sort is as follows.
Suppose that I have two lists that are sorted. I can merge these together to form one sorted list. I simply create a third, empty list, check the first items on the two lists, and then remove the lowest and place it in the third list. I then do the same with what are now the first items on the two lists, and I continue doing that until one of the two lists is empty. At that point, I append the remaining items of the non-empty list at the end of the third list. The third list now contains all the elements from the first two lists, sorted. Since every item of both lists is handled exactly once, such a merge operation can be implemented as O(m+n), whereby m is the length of the first list and n is the length of the second list (I show this implementation below).
So merging two sorted lists is easy and efficient.
The merge sort algorithm takes a list, splits it into two halves, then calls some sorting procedure on each of the halves to produce sorted versions of the halves, and then merges the two halves together to produce the sorted list.
That sounds nice, but what is that “some sorting procedure” supposed to be? We are developing merge sort precisely because we need a sorting procedure, right? So what do we call to sort the two halves? The answer is simple: we use merge sort to sort the two halves!
Wait a moment… That’s circular. Merge sort needs merge sort to sort a list. How is that going to work?
The trick is that every time merge sort calls merge sort to sort another list, that list is at most half the length of the original list! So merge sort gets called with smaller and smaller lists, until it gets called with a list that is of length 1. And a list of length 1 is already sorted!
I can imagine that this sounds confusing, so here is an example.
Merge sort gets called with the following list:
5 2 11 12 4 1 0 9
It splits that list in two halves, and calls merge sort with each of these halves, i.e., with:
5 2 11 12 and 4 1 0 9
Each of these calls, again, calls merge sort with halved lists, so with:
5 2 and 11 12 and 4 1 and 0 9
Each of these calls, once more, calls merge sort with halved lists, so with:
5 and 2 and 11 and 12 and 4 and 1 and 0 and 9
These lists all have length 1, so they are all sorted. So now they get merged: the 5 and the 2 get merged, the 11 and the 12 get merged, the 4 and the 1 get merged, and the 0 and the 9 get merged, so we have:
2 5 and 11 12 and 1 4 and 0 9
Then the first two sorted sublists get merged and the second two sorted sublists get merged, so we have:
2 5 11 12 and 0 1 4 9
Finally, these two sorted sublists get merged, and we have:
0 1 2 4 5 9 11 12
And there you have it. The way I see it, merge sort implements a sorting procedure without doing any sorting – it only merges.
def merge( sublist1, sublist2 ):
    # Merge two sorted lists into a new, sorted list.
    newlist = []
    i1, i2 = 0, 0
    # Repeatedly move the lowest front item to the new list.
    while i1 < len( sublist1 ) and i2 < len( sublist2 ):
        if sublist1[i1] < sublist2[i2]:
            newlist.append( sublist1[i1] )
            i1 += 1
        else:
            newlist.append( sublist2[i2] )
            i2 += 1
    # One sublist is exhausted; append the remainder of the other.
    newlist.extend( sublist1[i1:] )
    newlist.extend( sublist2[i2:] )
    return newlist
def merge_sort( tlist ):
    # A list of length 0 or 1 is already sorted.
    if len( tlist ) <= 1:
        return tlist
    # Sort the two halves recursively, then merge them.
    sublist1 = merge_sort( tlist[:len( tlist )//2] )
    sublist2 = merge_sort( tlist[len( tlist )//2:] )
    return merge( sublist1, sublist2 )
numlist = [8, 3, 5, 11, 5, 33, 2, 2, 0, 9, 12, 26, 1, 9, 18]
numlist = merge_sort( numlist )
print( numlist )
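If you want to watch the halving and merging from the example above happen in practice, you can instrument the function with a few print() calls. The sketch below is a hypothetical variant that I call merge_sort_traced(); it behaves exactly like merge_sort(), but reports every split and every merge, indented by recursion depth.

def merge_sort_traced( tlist, depth=0 ):
    # Same algorithm as merge_sort(), with prints showing the recursion.
    indent = "    " * depth
    if len( tlist ) <= 1:
        return tlist
    print( indent + "splitting", tlist )
    sublist1 = merge_sort_traced( tlist[:len( tlist )//2], depth+1 )
    sublist2 = merge_sort_traced( tlist[len( tlist )//2:], depth+1 )
    newlist = merge( sublist1, sublist2 )
    print( indent + "merging", sublist1, "and", sublist2, "into", newlist )
    return newlist

merge_sort_traced( [5, 2, 11, 12, 4, 1, 0, 9] )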
Merge sort is O(n*log(n)). You can envision the algorithm as a tree that has the whole list at the root, two branches under the root, each ending in a node that contains one half of the list, underneath each node again two branches, ending in nodes that contain a quarter of the list, etc. At each level of the tree, all items on the list are processed once. This means that the n items on the list are all processed about as often as the depth of the tree, which is about log₂(n). Therefore merge sort is O(n*log(n)).
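You can get a feeling for this growth rate with a rough doubling experiment: when you double the length of a random list, the running time of merge sort should grow by a factor of only slightly more than two, rather than the factor of four that an O(n²) algorithm would show. The snippet below is just an illustration of that idea; the exact timings depend on your machine.

from random import random
from timeit import timeit

for n in (1000, 2000, 4000, 8000):
    testlist = [random() for i in range( n )]
    # Time 10 runs of merge sort on a random list of length n.
    t = timeit( lambda: merge_sort( testlist ), number=10 )
    print( n, round( t, 4 ) )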
Note that I deliberately made sure that in the merge() function I do not remove any items from the sublists. If I did that, the algorithm would become more expensive, as removing items from a list is an expensive operation, as I have shown before. Instead, I just keep track of the items at the front of the sublists with their respective indices.
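To see why, consider this hypothetical variant that removes the front items with pop(0) instead of tracking indices. Every pop(0) shifts all the remaining items one place to the left, which makes each removal a linear-time operation, so the merge as a whole degrades badly (note that, unlike merge(), this variant also destroys its input lists).

def merge_slow( sublist1, sublist2 ):
    # Hypothetical variant for comparison: pop(0) removes the front item,
    # but shifts all remaining items, which is expensive.
    newlist = []
    while len( sublist1 ) > 0 and len( sublist2 ) > 0:
        if sublist1[0] < sublist2[0]:
            newlist.append( sublist1.pop( 0 ) )
        else:
            newlist.append( sublist2.pop( 0 ) )
    newlist.extend( sublist1 )
    newlist.extend( sublist2 )
    return newlist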
I see two big disadvantages of merge sort. First, it needs quite a bit of extra memory to store all these temporary lists. It can be implemented as an in-place algorithm, but that may hurt the time complexity. Second, it is recursive, which not only means that it needs a lot of (stack) memory to store data on function calls, but also that it tends to be slow and may run into problems if the lists get too big.
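If the recursion ever becomes a problem, one possible remedy is a so-called "bottom-up" merge sort, which replaces the recursion by a loop: it starts from runs of length 1 and repeatedly merges neighboring runs of equal size. The sketch below is one way to write that, reusing the merge() function from above; the name merge_sort_bottom_up() is my own choice.

def merge_sort_bottom_up( tlist ):
    # Start with runs of length 1, which are sorted by definition.
    runs = [tlist[i:i+1] for i in range( len( tlist ) )]
    # Repeatedly merge neighboring runs until one run remains.
    while len( runs ) > 1:
        newruns = []
        for i in range( 0, len( runs )-1, 2 ):
            newruns.append( merge( runs[i], runs[i+1] ) )
        if len( runs ) % 2 == 1:
            newruns.append( runs[-1] )
        runs = newruns
    return runs[0] if len( runs ) > 0 else []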
Practical merge sort implementations should attempt to use the least amount of stack space possible, and try to reduce the number of calls to merge(), for instance by combining merge sort with insertion sort when the lists get small. This is actually how the list sort() method in Python is implemented: it uses Timsort, which is a complex combination of merge sort and insertion sort, with several optimizations to make it work well with big sets of real-world data.
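To give an impression of that idea (not of Timsort itself, which is far more intricate), here is a hypothetical hybrid that hands short sublists to insertion sort instead of recursing all the way down to length 1. The cutoff of 16 is an arbitrary choice for illustration purposes.

def insertion_sort( tlist ):
    # Simple insertion sort; fast for short lists.
    newlist = list( tlist )
    for i in range( 1, len( newlist ) ):
        value = newlist[i]
        j = i
        # Shift larger items one place to the right.
        while j > 0 and newlist[j-1] > value:
            newlist[j] = newlist[j-1]
            j -= 1
        newlist[j] = value
    return newlist

def hybrid_sort( tlist, cutoff=16 ):
    # Below the cutoff, insertion sort is cheaper than further recursion.
    if len( tlist ) <= cutoff:
        return insertion_sort( tlist )
    sublist1 = hybrid_sort( tlist[:len( tlist )//2], cutoff )
    sublist2 = hybrid_sort( tlist[len( tlist )//2:], cutoff )
    return merge( sublist1, sublist2 )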