We implemented a greedy approach for searching motifs in "Greedy motif search". In this assignment, we will power it up with pseudocounts.

Assignment

Write a function greedy_motif_search that takes an integer $$k$$ and the location of a FASTA file containing a collection of DNA strings $$\mathcal{C}_\text{DNA}$$. The function must return a tuple containing the $$k$$-mers (one per DNA string) resulting from a greedy motif search with pseudocounts in $$\mathcal{C}_\text{DNA}$$. If at any step the function finds more than one $$\mathcal{P}$$-most probable $$k$$-mer in a given DNA string, where $$\mathcal{P}$$ is the current profile, it must use the one occurring first (the leftmost one).
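The procedure described above can be sketched as follows. This is one possible outline, not a reference solution, and it rests on assumptions the assignment does not spell out: the pseudocount is taken to be 1 per nucleotide per column (Laplace's rule), motifs are scored by counting mismatches against the most frequent nucleotide in each column, ties in score keep the earliest motif set, and every DNA string has length at least $$k$$. The helper names parse_fasta, pseudo_counts, most_probable_kmer, score and greedy_search are hypothetical. Because every profile column shares the same denominator, the sketch compares products of integer counts rather than floating-point probabilities, so that ties are detected exactly and the leftmost $$k$$-mer wins.

```python
from collections import Counter


def parse_fasta(lines):
    """Collect the sequences of a FASTA stream, joining multi-line records."""
    sequences = []
    for line in lines:
        line = line.strip()
        if line.startswith('>'):
            sequences.append('')              # a header line opens a new record
        elif line and sequences:
            sequences[-1] += line.upper()
    return sequences


def pseudo_counts(motifs):
    """Per-column nucleotide counts, each increased by 1 (assumed pseudocount)."""
    columns = [Counter(column) for column in zip(*motifs)]
    return [{n: counts[n] + 1 for n in 'ACGT'} for counts in columns]


def most_probable_kmer(text, k, counts):
    """Leftmost k-mer of text that maximises the product of column counts."""
    best_kmer, best_weight = None, -1
    for i in range(len(text) - k + 1):
        kmer = text[i:i + k]
        weight = 1
        for column, nucleotide in zip(counts, kmer):
            weight *= column.get(nucleotide, 1)
        if weight > best_weight:              # strict: the leftmost one wins ties
            best_kmer, best_weight = kmer, weight
    return best_kmer


def score(motifs):
    """Total number of mismatches against each column's most frequent symbol."""
    return sum(len(column) - max(Counter(column).values())
               for column in zip(*motifs))


def greedy_search(k, dna):
    """Greedy motif search with pseudocounts on a list of DNA strings."""
    best = [text[:k] for text in dna]         # start from the first k-mers
    best_score = score(best)
    for i in range(len(dna[0]) - k + 1):
        motifs = [dna[0][i:i + k]]
        for text in dna[1:]:
            # The profile is rebuilt from the motifs chosen so far.
            motifs.append(most_probable_kmer(text, k, pseudo_counts(motifs)))
        current = score(motifs)
        if current < best_score:              # strict: earliest motif set wins
            best, best_score = motifs, current
    return tuple(best)


def greedy_motif_search(k, location):
    """Read the FASTA file at location and run the greedy search on it."""
    with open(location) as handle:
        return greedy_search(k, parse_fasta(handle))


if __name__ == '__main__':
    # Small made-up collection in which every string contains ACG.
    print(greedy_search(3, ['TTACG', 'ACGTT', 'GACGA']))  # ('ACG', 'ACG', 'ACG')
```

Keeping the file handling in greedy_motif_search and the algorithm in greedy_search makes it possible to try the search on in-memory strings without creating a FASTA file first.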

Example

In the following interactive session, we assume the FASTA files data01.fna, data02.fna, data03.fna and data04.fna to be located in the current directory.

>>> greedy_motif_search(3, 'data01.fna')
('TTC', 'ATC', 'TTC', 'ATC', 'TTC')
>>> greedy_motif_search(5, 'data02.fna')
('AGGCG', 'ATCCG', 'AAGCG', 'AGTCG', 'AACCG', 'AGGCG', 'AGGCG', 'AGGCG')
>>> greedy_motif_search(5, 'data03.fna')
('AGGCG', 'TGGCA', 'AAGCG', 'AGGCA', 'CGGCA', 'AGGCG', 'AGGCG', 'AGGCG')
>>> greedy_motif_search(5, 'data04.fna')
('GGCGG', 'GGCTC', 'GGCGG', 'GGCAG', 'GACGG', 'GACGG', 'GGCGC', 'GGCGC')