We now extend "Most frequent words with mismatches" to find frequent words with both mismatches and reverse complements. Recall that $$\overline{p}$$ refers to the reverse complement of $$p$$.
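As a quick refresher, here is a minimal Python sketch of the reverse complement $$\overline{p}$$. It complements every base (A↔T, C↔G) and then reverses the string. The name reverse_complement and the COMPLEMENT table are illustrative choices, not part of the assignment, and the sketch assumes uppercase input over the alphabet ACGT.

```python
# Complement table for the four DNA bases (assumes uppercase ACGT input).
COMPLEMENT = str.maketrans('ACGT', 'TGCA')


def reverse_complement(pattern):
    """Return the reverse complement of a DNA string."""
    # Complement every base, then read the result backwards.
    return pattern.translate(COMPLEMENT)[::-1]


print(reverse_complement('ACAT'))  # ATGT
```

Applying the function twice returns the original string.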

Assignment

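Write a function most_frequent_kmers that takes a DNA string $$s$$ and two integers $$k$$ and $$d$$. The function must return the set of all $$k$$-mers $$p$$ that maximize the sum $$\text{PatternCount}_d(s, p) + \text{PatternCount}_d(s, \overline{p})$$ over all possible $$k$$-mers. Here $$\text{PatternCount}_d(s, p)$$ is the number of (possibly overlapping) positions at which $$s$$ contains a $$k$$-mer that differs from $$p$$ in at most $$d$$ positions.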
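A direct transcription of this definition tries every one of the $$4^k$$ candidate $$k$$-mers and keeps those with the highest combined score. The sketch below is only a brute-force reference under that reading, not the intended solution. The helper names hamming_distance, pattern_count_d and most_frequent_kmers_brute_force are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans('ACGT', 'TGCA')


def reverse_complement(pattern):
    """Reverse complement of a DNA string (uppercase ACGT assumed)."""
    return pattern.translate(COMPLEMENT)[::-1]


def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def pattern_count_d(s, p, d):
    """Count the (possibly overlapping) windows of s within distance d of p."""
    k = len(p)
    return sum(
        hamming_distance(s[i:i + k], p) <= d
        for i in range(len(s) - k + 1)
    )


def most_frequent_kmers_brute_force(s, k, d):
    """Score all 4**k k-mers; only practical for small k."""
    best, winners = -1, set()
    for letters in product('ACGT', repeat=k):
        p = ''.join(letters)
        score = pattern_count_d(s, p, d) + pattern_count_d(s, reverse_complement(p), d)
        if score > best:
            best, winners = score, {p}
        elif score == best:
            winners.add(p)
    return winners
```

On the first example string, pattern_count_d gives 5 for ATGT and 4 for ACAT with $$d = 1$$, so both k-mers reach a combined score of 9. For $$k = 10$$, as in the FASTA example, there are already $$4^{10} = 1\,048\,576$$ candidates, each scanned against the whole of $$s$$. This approach is therefore only useful as a cross-check on small inputs.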

Example

In the following interactive session, we assume that the FASTA file data.fna is located in the current directory.

>>> most_frequent_kmers('ACGTTGCATGTCGCATGATGCATGAGAGCT', 4, 1)
{'ACAT', 'ATGT'}
>>> most_frequent_kmers('AACAAGCTGATAAACATTTAAAGAG', 5, 1)
{'TTAAA', 'TTTAA', 'TTTTA', 'TAAAA'}

>>> from Bio import SeqIO
>>> most_frequent_kmers(*SeqIO.parse('data.fna', 'fasta'), 10, 2)
{'CCGGCGGCCG', 'CGGCCGCCGG'}

Note

As in "Most frequent words with mismatches" it is not necessary for the pattern $$p$$ to appear in $$s$$ in order to consider $$p$$ as a most frequent word with mismatches and reverse complements of $$s$$.
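Because $$p$$ need not occur in $$s$$, candidates cannot be taken from the $$k$$-mers of $$s$$ alone. One common technique, assumed here and not necessarily the intended solution, enumerates the $$d$$-neighborhood of every window of $$s$$: all strings within Hamming distance $$d$$ of it. Every window then adds 1 to each neighbor $$q$$, which feeds the $$\text{PatternCount}_d(s, q)$$ term. It also adds 1 to $$\overline{q}$$, which feeds the $$\text{PatternCount}_d(s, \overline{p})$$ term for $$p = \overline{q}$$. The helper names neighbors, hamming_distance and reverse_complement are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans('ACGT', 'TGCA')


def reverse_complement(pattern):
    """Reverse complement of a DNA string (uppercase ACGT assumed)."""
    return pattern.translate(COMPLEMENT)[::-1]


def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def neighbors(pattern, d):
    """All strings over ACGT within Hamming distance d of pattern (len >= 1)."""
    if d == 0:
        return {pattern}
    if len(pattern) == 1:
        return {'A', 'C', 'G', 'T'}
    result = set()
    for suffix in neighbors(pattern[1:], d):
        if hamming_distance(pattern[1:], suffix) < d:
            # A mismatch is still available, so any first base is allowed.
            result.update(base + suffix for base in 'ACGT')
        else:
            # The mismatch budget is used up: keep the original first base.
            result.add(pattern[0] + suffix)
    return result


def most_frequent_kmers(s, k, d):
    counts = Counter()
    for i in range(len(s) - k + 1):
        for q in neighbors(s[i:i + k], d):
            # This window is within distance d of q: +1 to PatternCount_d(s, q).
            counts[q] += 1
            # For p = reverse_complement(q) we have p-bar = q, so the same
            # window also adds +1 to the PatternCount_d(s, p-bar) term of p.
            counts[reverse_complement(q)] += 1
    if not counts:
        # len(s) < k: every k-mer scores 0; this sketch returns an empty set.
        return set()
    best = max(counts.values())
    return {p for p, c in counts.items() if c == best}
```

On the first example this sketch returns ACAT even though ACAT never occurs in ACGTTGCATGTCGCATGATGCATGAGAGCT. It is counted only as a neighbor of windows such as GCAT. The work grows with the number of windows times the neighborhood size, which is 436 strings for $$k = 10$$ and $$d = 2$$. For larger $$k$$ this is far less than the $$4^k$$ candidates of an exhaustive search.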