Time complexity of hash tables

The implementation above uses the count() method to determine the number of empty slots in the table. This is not a good idea: count() is an expensive operation, as it traverses the whole list, making it O(n). However, count() can be avoided by keeping track of the number of filled slots, which is easily achieved by implementing the hash table as a class (I did not do so above, to avoid the overhead methods that come with a class). I therefore assume that count() is not part of the hash table algorithm.
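
As a minimal sketch of this idea (the class name HashTable, the default size, and the use of chaining are my own choices here, not taken from the implementation above), the class can maintain a filled counter that is updated whenever a slot receives its first element, so that the number of empty slots is available in constant time:

    class HashTable:
        def __init__(self, size=100):
            self.size = size
            self.table = [[] for _ in range(size)]  # each slot holds a chain of (key, value) pairs
            self.filled = 0                         # number of non-empty slots, kept up to date

        def store(self, key, value):
            chain = self.table[hash(key) % self.size]
            for i, (k, _) in enumerate(chain):
                if k == key:                        # key already present: replace its value
                    chain[i] = (key, value)
                    return
            if not chain:
                self.filled += 1                    # slot goes from empty to filled
            chain.append((key, value))

        def empty_slots(self):
            return self.size - self.filled          # O(1); no list traversal needed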

If each hash number that gets calculated is mapped to a unique index in the table, then any element can be found by a straightforward calculation, i.e., in constant time. This is the ideal situation.
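
In code, such an ideal lookup is nothing more than one hash calculation and one slot access. A sketch, as a method added to the hypothetical HashTable class above:

    def lookup(self, key):
        # One calculation gives the slot; if the mapping is unique, the chain
        # holds at most one pair, so this returns after a single comparison.
        chain = self.table[hash(key) % self.size]
        for k, value in chain:
            if k == key:
                return value
        return None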

The worst-case scenario is a hash table that is completely filled, in which all keys map to the same hash number and the element searched for is the last one in the chain. In that case, all elements of the table get traversed, which means that finding an element is O(n). However, hash tables are designed to avoid this: not only are they never completely filled, but the hashing is designed to map keys to different indices. Naturally, the more elements in the table, the higher the chance of a collision. But even a collision is not that bad; it only becomes a problem when too many collisions happen at the same index.
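
To illustrate the worst case (a contrived example of my own, not taken from the implementation above), a hash function that returns the same number for every key forces all pairs into one chain, and a lookup degenerates into a linear scan:

    def bad_hash(key):
        return 0  # every key gets the same hash number: all pairs collide

    table = [[] for _ in range(100)]
    for n in range(1000):
        table[bad_hash(n) % 100].append((n, n * n))

    # Finding the last key stored now traverses the entire chain: O(n).
    chain = table[bad_hash(999) % 100]
    print(len(chain))                              # 1000
    print(next(v for k, v in chain if k == 999))   # 998001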

With a proper hash function and a properly designed hash table, storing an element or looking it up leads to no collisions, or at worst 2 or 3. In that case, storing and looking up keys are O(1). If you find that you get substantially more than, say, 2 collisions on average, there are two situations to consider: either the table is getting too full, in which case it should be enlarged and the elements rehashed into the bigger table, or the hash function spreads the keys poorly, mapping too many of them to the same index, in which case it should be improved.
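
A common remedy for the first situation is to grow the table and rehash every element into it. A sketch, again as a method for the hypothetical HashTable class (the doubling factor is an arbitrary choice):

    def grow(self):
        old_table = self.table
        self.size *= 2                              # arbitrary growth factor
        self.table = [[] for _ in range(self.size)]
        self.filled = 0
        for chain in old_table:
            for key, value in chain:
                self.store(key, value)              # rehash into the bigger table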

In practice, therefore, hash tables are considered to be O(1) with a proper implementation.


What you learned

In this chapter, you learned about: