We now extend "Most frequent words with mismatches" to find frequent words with both mismatches and reverse complements. Recall that $$\overline{p}$$ refers to the reverse complement of $$p$$.
Write a function most_frequent_kmers that takes a DNA string $$s$$ and two integers $$k$$ and $$d$$. The function must return the set of all $$k$$-mers $$p$$ maximizing the sum $$\text{PatternCount}_d(s, p) + \text{PatternCount}_d(s, \overline{p})$$ over all possible $$k$$-mers.
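For reference, the count with mismatches can be written out as follows. This assumes the usual definition from the earlier assignment, which is not restated here; the symbols $$H$$ (Hamming distance) and $$s_{i..i+k-1}$$ (the $$k$$-mer of $$s$$ starting at 0-based position $$i$$) are introduced only for illustration.

$$\text{PatternCount}_d(s, p) = \left|\{\, i : 0 \le i \le |s| - k,\ H(s_{i..i+k-1}, p) \le d \,\}\right|$$

Since reversing and complementing both strings leaves their Hamming distance unchanged, $$H(w, \overline{p}) = H(\overline{w}, p)$$ for any $$k$$-mer $$w$$. Counting approximate occurrences of $$\overline{p}$$ in $$s$$ is therefore the same as counting the windows of $$s$$ whose reverse complement lies within distance $$d$$ of $$p$$.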
In the following interactive session, we assume that the FASTA file data.fna is located in the current directory.
    >>> most_frequent_kmers('ACGTTGCATGTCGCATGATGCATGAGAGCT', 4, 1)
    {'ACAT', 'ATGT'}
    >>> most_frequent_kmers('AACAAGCTGATAAACATTTAAAGAG', 5, 1)
    {'TTAAA', 'TTTAA', 'TTTTA', 'TAAAA'}
    >>> from Bio import SeqIO
    >>> most_frequent_kmers(*SeqIO.parse('data.fna', 'fasta'), 10, 2)
    {'CCGGCGGCCG', 'CGGCCGCCGG'}
As in "Most frequent words with mismatches", it is not necessary for the pattern $$p$$ to appear in $$s$$ in order to consider $$p$$ as a most frequent word with mismatches and reverse complements of $$s$$.
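One possible approach, given here as a minimal sketch rather than a reference solution, is to score candidates by generating the mismatch neighborhood of every window of $$s$$ instead of enumerating all $$4^k$$ possible $$k$$-mers. Each neighbor of a window gains one count, and so does the reverse complement of that neighbor, so patterns that never occur literally in $$s$$ are still scored. The helper names reverse_complement and neighbors are hypothetical, and the sketch assumes $$s$$ is a plain uppercase string over the alphabet ACGT.

```python
from collections import Counter
from itertools import combinations, product

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans('ACGT', 'TGCA')


def reverse_complement(pattern):
    """Return the reverse complement of a DNA string (hypothetical helper)."""
    return pattern.translate(COMPLEMENT)[::-1]


def neighbors(pattern, d):
    """Return all strings within Hamming distance d of pattern (hypothetical helper)."""
    result = {pattern}
    for r in range(1, min(d, len(pattern)) + 1):
        # Choose exactly r positions to change ...
        for positions in combinations(range(len(pattern)), r):
            # ... and at each of them try the three other bases.
            choices = [[b for b in 'ACGT' if b != pattern[i]] for i in positions]
            for replacement in product(*choices):
                chars = list(pattern)
                for i, base in zip(positions, replacement):
                    chars[i] = base
                result.add(''.join(chars))
    return result


def most_frequent_kmers(s, k, d):
    """Sketch: k-mers p maximizing PatternCount_d(s, p) + PatternCount_d(s, rc(p)).

    Assumes s is a plain str over ACGT; a Biopython record would first
    need converting, e.g. with str(record.seq).
    """
    scores = Counter()
    for i in range(len(s) - k + 1):
        for candidate in neighbors(s[i:i + k], d):
            # This window is an approximate occurrence of candidate ...
            scores[candidate] += 1
            # ... and therefore an approximate occurrence of the reverse
            # complement of reverse_complement(candidate).
            scores[reverse_complement(candidate)] += 1
    if not scores:
        # s is shorter than k: every k-mer scores 0; this sketch returns
        # an empty set rather than all 4**k tied k-mers.
        return set()
    best = max(scores.values())
    return {kmer for kmer, score in scores.items() if score == best}
```

The number of neighbors per window grows quickly with $$d$$ and $$k$$, so this trades memory for avoiding a scan of $$s$$ per candidate; for small $$k$$, a direct loop over all $$4^k$$ patterns would be an equally valid design.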