Selection sort is quite dumb. Insertion sort, at first glance, seems an improvement. The basic definition of insertion sort uses two lists: the original list, and a new list which contains the sorted elements. The new list starts empty. The algorithm repeatedly removes an item from the original list and inserts it into the new list. Since the new list is kept sorted, the algorithm can determine reasonably efficiently where each item needs to be inserted.

The main improvement over selection sort is that insertion sort does not need to select a particular item from the list (e.g., the minimum). It can simply take an arbitrary item from the list. If it takes the last one, removing it from the list is cheap, since no elements on the list need to be moved.
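For illustration: in Python, pop() without an argument removes and returns the last item of a list, which is exactly the cheap case described above.

```python
# pop() without an index removes and returns the last item,
# which is cheap because no other elements need to shift.
numlist = [8, 3, 5, 11]
num = numlist.pop()
print( num )      # 11
print( numlist )  # [8, 3, 5]
```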

Here is an example implementation of insertion sort:

def find_index( tlist, num ):
    # Returns the first index in the sorted list tlist at which
    # num can be inserted while keeping tlist sorted.
    i = 0
    while i < len( tlist ):
        if tlist[i] >= num:
            break
        i += 1
    return i

def insertion_sort( numlist ):
    newlist = []
    while len( numlist ) > 0:
        num = numlist.pop()
        i = find_index( newlist, num ) # returns the spot where num must be inserted
        if i >= len( newlist ):
            newlist.append( num )
        else:
            newlist.insert( i, num )
    return newlist

numlist = [8, 3, 5, 11, 5, 33, 2, 2, 0, 9, 12, 26, 1, 9, 18]
numlist = insertion_sort( numlist )
print( numlist )

Insertion sort is O(n²). It processes n items. For each item, it has to pop it (1 step), find the index (in the worst case about (0.5)*n steps on average, since the new list grows from 0 to n items), and insert it (in the worst case about (0.5)*n steps on average, as elements after the insertion point must be shifted). So it needs about n * (1 + (0.5)*n + (0.5)*n) steps, which means that it is O(n²).

Exercise: You can make the find_index() function more efficient by implementing it as a binary search. That does not change the time complexity of insertion sort, though. Why not? (Hint: Which of the terms in the steps formula above represents the call to find_index()? Replace it with log(n), which is the time complexity of binary search.)
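If you want to compare your binary search against code, here is one possible sketch (find_index_binary is a name of my own choosing); it returns the same index as find_index() above:

```python
def find_index_binary( tlist, num ):
    # Binary search for the first index in the sorted list tlist
    # where tlist[index] >= num; returns len( tlist ) if there is none.
    low, high = 0, len( tlist )
    while low < high:
        mid = (low + high) // 2
        if tlist[mid] < num:
            low = mid + 1
        else:
            high = mid
    return low

print( find_index_binary( [2, 5, 11, 12], 4 ) )  # 1
```

This only speeds up finding the index; inserting the item into the new list still takes up to n steps, which is why the overall time complexity stays the same.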

The “in-place” version of insertion sort considers the start of the list the sorted part, and the end of the list the unsorted part. At the start, the sorted part is empty, and the unsorted part is (therefore) the whole list. The items of the unsorted part are processed from front to back (i.e., from low index to high index). To find the place where the insertion must be made, the item that must be inserted is “swapped” with items of the sorted part, from back to front, until it is in the correct spot. For example, suppose that of the following list the first 4 items are already sorted, and the last 4 are not yet:

 2  5  11  12  4  1  0  9

The 4 is the next item to be processed. So it gets first swapped with the 12:

 2  5  11  4  12  1  0  9

Then with the 11:

 2  5  4  11  12  1  0  9

Then with the 5:

 2  4  5  11  12  1  0  9

And then the algorithm sees that the 4 is now in the correct spot, so it stops swapping. Now the sorted part of the list is 5 items long, and the unsorted only 3.

This is an elegant way of implementing insertion sort, which needs very little code.

def in_place_insertion_sort( numlist ):
    if len( numlist ) <= 1: # a list of 0 or 1 items is already sorted
        return
    for i in range( len( numlist ) ): # numlist[:i] is the sorted part
        j = i
        # Swap the new item backward until it is in the correct spot.
        while j > 0 and numlist[j-1] > numlist[j]:
            numlist[j-1], numlist[j] = numlist[j], numlist[j-1]
            j -= 1
            
numlist = [8, 3, 5, 11, 5, 33, 2, 2, 0, 9, 12, 26, 1, 9, 18]
in_place_insertion_sort( numlist )
print( numlist )

This is still insertion sort, so it is O(n²) (check it if you have any doubts). However, when you study the algorithm, you might notice something about its big-Omega.

Exercise: What is the big-Omega of in_place_insertion_sort()?

Answer: The best-case scenario is that the list is already sorted to begin with. In that case, no swaps will take place at all; each item is checked only once, and then no longer handled. This means this in-place version of insertion sort is Ω(n).

Note that the previously discussed selection sort does not have this advantage, as it has to seek the minimum each time it processes a new item. So selection sort is Ω(n²).

I said before that computer scientists are usually not interested in big-Omega. However, if you are working with a list that you know is already mostly sorted, the in-place version of insertion sort works well. In fact, if you know that, on average, an item on the list is k places away from the place where it ultimately should end up, insertion sort is actually O(nk).
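Both claims can be checked with a sketch that counts the swaps made by the in-place version (insertion_sort_swaps is my own name for this counting variant):

```python
def insertion_sort_swaps( numlist ):
    # In-place insertion sort that also counts the swaps it performs.
    swaps = 0
    for i in range( len( numlist ) ):
        j = i
        while j > 0 and numlist[j-1] > numlist[j]:
            numlist[j-1], numlist[j] = numlist[j], numlist[j-1]
            j -= 1
            swaps += 1
    return swaps

print( insertion_sort_swaps( [1, 2, 3, 4, 5, 6, 7, 8] ) ) # already sorted: 0 swaps
# Each item at most 1 place from its final spot: far fewer than n*n swaps.
print( insertion_sort_swaps( [2, 1, 4, 3, 6, 5, 8, 7] ) ) # 4 swaps
print( insertion_sort_swaps( [8, 7, 6, 5, 4, 3, 2, 1] ) ) # reverse-sorted: 28 swaps, about (0.5)*n*n
```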