Although the suffix tree decreases memory requirements from O(|Text|2) to O(|Text|), on average it still requires about 20 times as much memory as Text. In the case of a 3 GB human genome, 60 GB of RAM still presents a memory challenge for most machines. This reveals a dark secret of big-O notation, which is that it ignores constant factors. For long strings such as the human genome, we will need to pay attention to this constant factor, since the expression O(|Text|) applies to both an algorithm with 2 · |Text| memory and an algorithm with 1000 · |Text| memory.

Yet before seeing how we can further reduce the memory needed for multiple pattern matching, we ask you to solve three problems showing how suffix trees can be applied to other pattern matching challenges. The first such problem is the Longest Repeat Problem.

Assignment

Implement the longest_repeat method which takes a string.

The function should return a longest substring of the given string that occurs more than once in the given string.

Example

>>> longest_repeat('TCGAAGTGGATGGAAGGGGTAC')
'GAAG'