In "Finding a motif with modifications1" we considered a problem in which we were given a motif and a long string (perhaps representing a genome), and we aimed to find the "closest" substring of the long string to the motif. In that problem, "closest" was defined as a minimum with respect to edit distance.
Yet there may be multiple substring candidates from the genome that achieve the minimum distance to the motif; this situation might occur in practice when the motif forms a repeat that occurs multiple times with variations deriving from mutations.
In this assignment, we would like to find all substrings of a genome that are within a certain fixed distance of the desired motif.
In the following interactive session, we assume the FASTA file data.fna2 to be located in the current directory.
>>> similarMotifs(2, 'ACGTAG', 'ACGGATCGGCATCGT') {(1, 4), (1, 5), (1, 6)} >>> from Bio import SeqIO >>> similarMotifs(2, *SeqIO.parse('data.fna', 'fasta')) {(1, 4), (1, 5), (1, 6)}