The Levenshtein distance between two DNA sequences is the minimal amount of treatments that are necessary in order to convert one sequence to another. This linear measure is named after Vladimir Levenshtein, who invented it in 1965. Apart from its use in the spelling checker, the Levenshtein disctance is also used to make astimates of the evolutionary distance betwee two DNA sequences (where the variants of the Needleman-Wunsh or the Smith-Waterman are usually used). Here, a DNA sequence can be represented by a sequence of symbols that consists only of the letters A, C, G and T. The collection of valid treatments is then insertion (inserting a letter), deletion (deleting a letter) and substitution (replacing a letter by another letter).

For the calculation of the Levenshtein distance between two given DNA sequences, a two-dimensional matrix $$d(0,\ldots,m;0,\ldots,n)$$ is used, where $$m$$ represents the length of the first DNA sequence and $$n$$ the length of the second DNA sequence. Initially you fill out the first column of the matrix $$d$$ with the numbers $$0,1,\ldots,m$$ and the first row with the numbers $$0,1,\ldots,n$$. The other numbers are calculated from left to right and from top to bottom, where the value on row $$i$$ and column $$j$$ are given by
\[ d_{i,j} = \textrm{minimum} \left\{ \begin{array}{ll} d_{i-1,j}+1&\textrm{deletion}\\ d_{i,j-1}+1&\textrm{insertion}\\ d_{i-1,j-1}+\textrm{cost}\quad\quad&\textrm{substitution}\\ \end{array} \right. \]
Here, the cost is equal to 0 if the $$i$$th letter of the first DNA sequence equals the $$j$$th letter of the second DNA sequence. Otherwise, the cost equals 1. At the end of this procedure, the Levenshtein distance can be read as the value on the bottom right of the matrix.

An example, the Levenshtein distance between ACTCA and AGTCCG is 3 and can thus be given: 

The end result after calculating the two-dimensional matrix $$d$$ looks as follows:

Levenshtein voorbeeld

Assignment

Define a class Sequence which can be used to make instances of valid DNA sequences. The objects of the class Sequence must support the following methods:

Example

>>> seq1 = Sequence('ACTCA')
>>> seq2 = Sequence('AGTCCG')
>>> seq1.distance(seq2)
3
>>> seq3 = Sequence('GTCNAAC')
Traceback (most recent call last):
AssertionError: invalid sequence