We now extend "Most frequent words with mismatches¹" to find frequent words with both mismatches and reverse complements. Recall that $$\overline{p}$$ refers to the reverse complement of $$p$$.
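
As a reminder of what $$\overline{p}$$ means in practice, the following is a minimal sketch of a reverse complement helper. The name reverse_complement is an assumption made for illustration (the assignment does not prescribe it), and the sketch assumes that $$p$$ contains only the uppercase letters A, C, G and T.

```python
# Hypothetical helper, not part of the assignment's required interface.
# Translation table mapping each nucleotide to its complement.
COMPLEMENT = str.maketrans('ACGT', 'TGCA')


def reverse_complement(p):
    """Return the reverse complement of the DNA string p (uppercase ACGT only)."""
    # Complement every nucleotide, then reverse the resulting string.
    return p.translate(COMPLEMENT)[::-1]


print(reverse_complement('ACAT'))  # ATGT
```

Taking the reverse complement twice gives back the original string, so $$p$$ and $$\overline{p}$$ always receive the same score in the sum defined below.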

Assignment

Write a function most_frequent_kmers that takes a DNA string $$s$$ and two integers $$k$$ and $$d$$. The function must return the set of all $$k$$-mers $$p$$ that maximize the sum $$\text{PatternCount}_d(s, p) + \text{PatternCount}_d(s, \overline{p})$$ over all possible $$k$$-mers, where $$\text{PatternCount}_d(s, p)$$ is the number of occurrences of $$p$$ in $$s$$ with at most $$d$$ mismatches.
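
One possible approach, sketched here under stated assumptions rather than as the reference solution, is to avoid scoring all $$4^k$$ candidate $$k$$-mers directly. Instead, every window of length $$k$$ in $$s$$ gives one vote to each $$k$$-mer within Hamming distance $$d$$ of it, so that the number of votes for $$p$$ equals $$\text{PatternCount}_d(s, p)$$. The helper names reverse_complement and neighbors are hypothetical, and the sketch assumes that $$s$$ contains only uppercase A, C, G and T and that $$s$$ has at least $$k$$ characters.

```python
from collections import Counter
from itertools import combinations, product

# Translation table mapping each nucleotide to its complement.
COMPLEMENT = str.maketrans('ACGT', 'TGCA')


def reverse_complement(p):
    """Hypothetical helper: reverse complement of p (uppercase ACGT only)."""
    return p.translate(COMPLEMENT)[::-1]


def neighbors(pattern, d):
    """Hypothetical helper: yield each string within Hamming distance d once."""
    k = len(pattern)
    for m in range(min(d, k) + 1):
        # Choose exactly m positions that will differ from the pattern.
        for positions in combinations(range(k), m):
            # At each chosen position, substitute one of the three other letters.
            choices = [[c for c in 'ACGT' if c != pattern[i]] for i in positions]
            for substitution in product(*choices):
                chars = list(pattern)
                for i, c in zip(positions, substitution):
                    chars[i] = c
                yield ''.join(chars)


def most_frequent_kmers(s, k, d):
    """Sketch: k-mers p maximizing PatternCount_d(s, p) + PatternCount_d(s, rc(p))."""
    # counts[p] ends up equal to PatternCount_d(s, p) for every p with a nonzero count.
    counts = Counter()
    for i in range(len(s) - k + 1):
        counts.update(neighbors(s[i:i + k], d))
    # A k-mer can have a positive score only if it or its reverse complement was counted.
    candidates = set(counts) | {reverse_complement(p) for p in counts}
    # Counter returns 0 for missing keys, so unseen k-mers contribute nothing.
    scores = {p: counts[p] + counts[reverse_complement(p)] for p in candidates}
    best = max(scores.values())  # assumes len(s) >= k, so scores is not empty
    return {p for p, score in scores.items() if score == best}


print(sorted(most_frequent_kmers('ACGTTGCATGTCGCATGATGCATGAGAGCT', 4, 1)))
# ['ACAT', 'ATGT']
```

The design choice here is to trade memory for time: the counter holds one entry per distinct neighbor of a window, but the work grows with the length of $$s$$ rather than with $$4^k$$.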

Example

In the following interactive session, we assume the FASTA file data.fna² to be located in the current directory.

>>> most_frequent_kmers('ACGTTGCATGTCGCATGATGCATGAGAGCT', 4, 1)
{'ACAT', 'ATGT'}
>>> most_frequent_kmers('AACAAGCTGATAAACATTTAAAGAG', 5, 1)
{'TTAAA', 'TTTAA', 'TTTTA', 'TAAAA'}

>>> from Bio import SeqIO
>>> most_frequent_kmers(*SeqIO.parse('data.fna', 'fasta'), 10, 2)
{'CCGGCGGCCG', 'CGGCCGCCGG'}

Note

As in "Most frequent words with mismatches³", the pattern $$p$$ does not need to appear in $$s$$ to be considered a most frequent word with mismatches and reverse complements of $$s$$.
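
The first example above illustrates this note: ACAT is part of the answer even though it never occurs exactly in the input string. The following sketch checks this with a direct count. The helper names hamming_distance and pattern_count are hypothetical and chosen only for this illustration.

```python
def hamming_distance(a, b):
    """Hypothetical helper: number of mismatching positions (equal-length strings)."""
    return sum(x != y for x, y in zip(a, b))


def pattern_count(s, p, d):
    """Hypothetical helper: occurrences of p in s with at most d mismatches."""
    k = len(p)
    return sum(
        hamming_distance(s[i:i + k], p) <= d
        for i in range(len(s) - k + 1)
    )


s = 'ACGTTGCATGTCGCATGATGCATGAGAGCT'

print('ACAT' in s)                  # False: no exact occurrence
print(pattern_count(s, 'ACAT', 1))  # 4
print(pattern_count(s, 'ATGT', 1))  # 5
```

Since ATGT is the reverse complement of ACAT, both $$k$$-mers obtain the same sum of 4 + 5 = 9.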